XML-FIXTAGS
NAME
xml-fixtags - convert HTML into XML on the standard output.
SYNOPSIS
DESCRIPTION
xml-fixtags aggressively converts a single HTML or XML file,
read from the standard input or FILE, into a well-formed XML
file, written to the standard output, that can be processed
by xml-coreutils(7) without errors.
The output produced by xml-fixtags is almost certainly not
what you want, and you should nearly always use a more
sophisticated tool such as tidy(1) or xmllint(1) for
ordinary conversions.
xml-fixtags is useful for processing documents which are not
well-formed to begin with, when it does not matter whether
the corrections closely resemble what the original author
intended, or when there are no alternatives installed on the
system. This makes the xml-coreutils(7) commands more robust
in a transparent way, without duplicating the repair
heuristics in each command.
xml-fixtags uses a very simple algorithm which tries to
localise the effect of well-formedness errors in the input,
with minimal disruption to the other parts of the input. If
the input is already well-formed XML, then no modifications
are performed.
The output of xml-fixtags is not guaranteed to be valid, and
does not follow any rules specific to particular types of
XML or HTML document. It is merely guaranteed to be
well-formed.
TRANSFORMATIONS
This section describes the main transformations that are
performed by xml-fixtags.
If the file does not start with '<', then an extra root tag
is added automatically (same effect as --root-wrap). As soon
as a closing tag at depth zero is encountered, that is, one
which closes the root element, the output ends.
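
The interaction between the automatic root tag and the end of output can be illustrated with a short Python sketch. This is not the xml-fixtags source: the function name, the wrapper element name "root" and the simplified tag pattern are all assumptions made for illustration, and the input is assumed to have balanced tags.

```python
import re

# Simplified tag pattern (assumption): ignores comments, CDATA, DOCTYPE, etc.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*?(/?)>")

def truncate_at_root_close(text, root_wrap=False):
    """Return text cut off after the closing tag that brings the depth to zero."""
    # Wrap when asked to, or when the input does not start with '<'.
    # The element name "root" is hypothetical.
    if root_wrap or not text.startswith("<"):
        text = "<root>" + text + "</root>"
    depth = 0
    for m in TAG.finditer(text):
        closing, _name, selfclosing = m.groups()
        if selfclosing:
            continue  # <x/> does not change the depth
        if closing:
            depth -= 1
            if depth == 0:
                return text[:m.end()]  # output ends with the root element
        else:
            depth += 1
    return text

print(truncate_at_root_close("<a>x</a><b>y</b>"))
print(truncate_at_root_close("<a>x</a><b>y</b>", root_wrap=True))
```

In this sketch the first call drops the second top-level element, while the wrapped call keeps both, which is the situation --root-wrap is meant to address.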
If a closing tag is found which is not properly nested, all
the children of the tag that are still open are closed
immediately as well. If a closing tag is found which was not
previously opened, it is opened and closed immediately. For
the purposes of the preceding rules, tag names are matched
case-insensitively.
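
The two closing-tag rules can be pictured as a stack of open elements. The Python sketch below is only an illustration under stated assumptions, not the algorithm used by xml-fixtags itself: the function name and the simplified tag pattern are hypothetical, and elements still open at the end of the input are deliberately left alone because the rules above do not cover them.

```python
import re

# Simplified tag pattern (assumption): ignores comments, CDATA, DOCTYPE, etc.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)([^>]*?)(/?)>")

def fix_nesting(text):
    """Apply the two closing-tag repair rules to a string of markup."""
    out = []
    stack = []  # names of open elements, spelled as in the input
    pos = 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])  # copy text between tags unchanged
        pos = m.end()
        closing, name, _attrs, selfclosing = m.groups()
        if not closing:
            if not selfclosing:
                stack.append(name)
            out.append(m.group(0))
            continue
        lowered = [n.lower() for n in stack]
        if name.lower() in lowered:
            # Find the innermost matching open element, then close its
            # still-open children (innermost first) and the element itself.
            idx = len(lowered) - 1 - lowered[::-1].index(name.lower())
            while len(stack) > idx:
                out.append("</%s>" % stack.pop())
        else:
            # Closing tag that was never opened: open and close it at once.
            out.append("<%s></%s>" % (name, name))
    out.append(text[pos:])
    return "".join(out)

print(fix_nesting("<a><b>x</a>"))   # <a><b>x</b></a>
print(fix_nesting("<p>x</i></p>"))  # <p>x<i></i></p>
```

Closing tags are emitted with the spelling of the matching opening tag, so that a case-insensitive match such as `<B>...</b>` still yields a well-formed pair.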
If an unknown entity reference "&name;" is found which has
not been declared before, it is replaced with the text
"&amp;name;", so that the reference appears literally in the
output.
If the --html switch is used, then the input is assumed to
be HTML and the rules for opening and closing tags will also
depend on the type of tag. The html, head, and body tags are
inserted if they are missing, but full DTD compliance is not
attempted.
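
Returning to the entity rule in this section, one plausible reading is that an undeclared reference is made literal by escaping its ampersand. The Python sketch below shows that reading only. The function name, the `declared` parameter and the treatment of the five entities predefined by XML as always known are assumptions, not documented behaviour of xml-fixtags.

```python
import re

# The five entities predefined by XML 1.0 never need a declaration.
PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}

# Named references only; numeric ones such as &#160; are left untouched.
ENTITY = re.compile(r"&([A-Za-z_][\w.-]*);")

def escape_unknown_entities(text, declared=()):
    """Escape the ampersand of every named reference that is not known."""
    known = PREDEFINED | set(declared)

    def repl(m):
        name = m.group(1)
        return m.group(0) if name in known else "&amp;" + name + ";"

    return ENTITY.sub(repl, text)

print(escape_unknown_entities("a &nbsp; b &lt; c"))  # a &amp;nbsp; b &lt; c
```

Passing `declared=["nbsp"]` in this sketch stands in for an earlier entity declaration in the document, in which case the reference is left as it is.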
OPTIONS
--root-wrap
Adds a standard root wrapper
around the document, thereby incrementing the depth of every
tag. This can be used to prevent early truncation of the
document when a zero depth closing tag would otherwise be
found.
--html
Assume that the input document is HTML. This switches on
some extra heuristics. It does not imply valid XHTML on
output.
--xml
Assume that the input document is XML. This is the
default.
EXIT STATUS
xml-fixtags returns 0 on success, or 1 otherwise.
BUGS
xml-fixtags is still primitive, and can fail to fix an input
document.
AUTHORS
Laird A. Breyer is the original author of this software. The
source code (GPLv3 or later) for the latest version is
available at the following locations:
http://www.lbreyer.com/gpl.html
http://xml-coreutils.sourceforge.net
SEE ALSO
xml-coreutils(7)