− convert HTML into XML on the standard output.
aggressively converts a single HTML or XML file, obtained
from the standard input or FILE, into a well formed XML
file, written to the standard output, that can be processed
by xml-coreutils(7) without errors.
The output that
is produced by xml-fixtags is almost certainly not
what you want, and you should nearly always use a more
sophisticated tool such as tidy(1), or
xmllint(1) for ordinary conversions.
is useful for processing documents which are not well formed
to begin with, and where it does not matter if the
corrections resemble closely what the original author
intended, or when there are no alternatives installed on the
system. This makes the xml-coreutils(7) more robust
in a transparent way, without duplicating the repair
heuristics in each command.
uses a very simple algorithm which tries to localise the
effect of well formedness errors in the input with minimal
disruption to the other parts of the input. If the input is
already well formed XML, then no modifications are
The output of
xml-fixtags is not guaranteed to be valid, and does
not follow any rules specific to certain XML or HTML
documents. It is merely guaranteed to be well formed.
describes the main transformations that are performed by
If the file
does not start with ’<’, then an extra root
tag will be added automatically (same effect as
--root-wrap). As soon as a zero depth closing tag is
encountered, the output ends.
If a closing
tag is found which is not properly nested, all the children
of the tag are closed immediately as well. If a closing tag
is found which was not previously opened, it is opened and
closed immediately. For the purposes of the preceding rules,
tag names are searched case insensitively.
If an unknown
entity reference "&name;" is found which has
not been declared before, it is replaced with the text
If the --html
switch is used, then the input is assumed to be HTML and the
rules for opening and closing tags will also depend on the
type of tag. The html, head, and body tags are inserted if
they are missing, but full DTD compliance is not
Adds a standard root wrapper
around the document, thereby incrementing the depth of every
tag. This can be used to prevent early truncation of the
document when a zero depth closing tag would otherwise be
Assume that the input document is HTML. This switches on
some extra heuristics. It does not imply valid XHTML on
Assume that the input document is XML. This is the
returns 0 on success, or 1 otherwise.
still primitive, and can fail to fix an input document.
A. Breyer is the original author of this software. The
source code (GPLv3 or later) for the latest version is
available at the following locations: