1 | .\" This manpage has been automatically generated by docbook2man | |
2 | .\" from a DocBook document. This tool can be found at: | |
3 | .\" <http://shell.ipoline.com/~elmert/comp/docbook2X/> | |
4 | .\" Please send any bug reports, improvements, comments, patches, | |
5 | .\" etc. to Steve Cheng <steve@ggi-project.org>. | |
6 | .TH "XMLWF" "1" "24 January 2003" "" "" | |
7 | .SH NAME | |
8 | xmlwf \- Determines if an XML document is well-formed | |
9 | .SH SYNOPSIS | |
10 | ||
11 | \fBxmlwf\fR [ \fB-s\fR] [ \fB-n\fR] [ \fB-p\fR] [ \fB-x\fR] [ \fB-e \fIencoding\fB\fR] [ \fB-w\fR] [ \fB-d \fIoutput-dir\fB\fR] [ \fB-c\fR] [ \fB-m\fR] [ \fB-r\fR] [ \fB-t\fR] [ \fB-v\fR] [ \fBfile ...\fR] | |
12 | ||
13 | .SH "DESCRIPTION" | |
.PP
\fBxmlwf\fR uses the Expat library to
determine if an XML document is well-formed. It is
non-validating.
.PP
If you do not specify any files on the command line, recent
versions of \fBxmlwf\fR read the input document from standard input.
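.PP
For illustration, the same basic check can be sketched with Python's
standard \fBxml.parsers.expat\fR module, which is a binding to the Expat
library. This is not \fBxmlwf\fR's own code: the helper names
\fIcheck_stream\fR and \fImain\fR are hypothetical, and returning the
error text instead of printing it is a choice made for the sketch.
.nf
```python
import io
import sys
from xml.parsers import expat


def check_stream(stream):
    """Parse a binary stream with Expat.

    Returns None if the document is well-formed, otherwise a short
    description of the first error (message, line and column).
    """
    parser = expat.ParserCreate()
    try:
        parser.ParseFile(stream)
    except expat.ExpatError as err:
        return str(err)
    return None


def main(paths):
    """Hypothetical driver: with no file names, read standard input."""
    if not paths:
        return [check_stream(sys.stdin.buffer)]
    results = []
    for path in paths:
        with open(path, "rb") as f:
            results.append(check_stream(f))
    return results


# In-memory examples instead of real files:
ok = check_stream(io.BytesIO(b"<doc><p>hi</p></doc>"))   # None
bad = check_stream(io.BytesIO(b"<doc><p>hi</doc>"))      # "mismatched tag: ..."
```
.fi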
.SH "WELL-FORMED DOCUMENTS"
.PP
A well-formed document must adhere to the
following rules:
.TP 0.2i
\(bu
The file should begin with an XML declaration, for instance
<?xml version="1.0" standalone="yes"?>.
The W3C XML 1.0 specification recommends this declaration but
does not require it.
\fBNOTE:\fR
\fBxmlwf\fR does not report an error when the XML declaration
is missing.
.TP 0.2i
\(bu
Every start tag is either empty (<tag/>)
or has a corresponding end tag.
.TP 0.2i
\(bu
There is exactly one root element. This element must contain
all other elements in the document. Only comments, white
space, and processing instructions may come after the close
of the root element.
.TP 0.2i
\(bu
All elements nest properly.
.TP 0.2i
\(bu
All attribute values are enclosed in quotes (either single
or double).
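.PP
The rules above can be tried directly against Expat. The sketch below
uses Python's standard \fBxml.parsers.expat\fR binding (an assumption of
this example; it is not part of \fBxmlwf\fR) and the hypothetical helper
\fIis_well_formed\fR. Each sample breaks, or deliberately keeps, one
rule; only the samples without an XML declaration and with a comment
after the root element are accepted.
.nf
```python
from xml.parsers import expat


def is_well_formed(text):
    """Return True if Expat accepts text as a complete document."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except expat.ExpatError:
        return False
    return True


# One small sample per rule above.
samples = {
    "no XML declaration": "<doc/>",
    "start tag never closed": "<doc><p></doc>",
    "two root elements": "<a/><b/>",
    "comment after the root": "<a/><!-- fine -->",
    "improper nesting": "<a><b></a></b>",
    "unquoted attribute value": "<a x=1/>",
}
results = {label: is_well_formed(text) for label, text in samples.items()}
```
.fi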
.PP
If the document has a DTD, and it strictly complies with that
DTD, then the document is also considered \fBvalid\fR.
\fBxmlwf\fR is a non-validating parser: it does not check
the document against its DTD. However, it does support
external entities (see the \fB-x\fR option).
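.PP
The difference between well-formed and valid can be shown with a
document whose DTD declares an element EMPTY although the element
contains text. The following is a sketch using Python's
\fBxml.parsers.expat\fR binding, not \fBxmlwf\fR itself. Expat accepts
the document because it is well-formed, even though it is not valid.
The helper name \fIaccepted_by_expat\fR is hypothetical.
.nf
```python
from xml.parsers import expat

# The DTD says <note> must be EMPTY, yet the instance contains text:
# the document is well-formed but not valid.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note EMPTY>
]>
<note>not empty</note>"""


def accepted_by_expat(text):
    """Return True if Expat parses text without a well-formedness error."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except expat.ExpatError:
        return False
    return True


accepted = accepted_by_expat(DOC)  # True: no validation is performed
```
.fi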
.SH "OPTIONS"
.PP
When an option takes an argument, you may give the argument either
separately ("\fB-d\fR output") or concatenated with the
option ("\fB-d\fRoutput").
.TP
\fB-c\fR
If the input file is well-formed and \fBxmlwf\fR
does not encounter any errors, the input file is simply copied to
the output directory unchanged.
This implies no namespaces (turns off \fB-n\fR) and
requires \fB-d\fR to specify an output directory.
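
A rough sketch of what \fB-c\fR combined with \fB-d\fR amounts to,
written with Python's \fBxml.parsers.expat\fR binding. The helper
\fIcopy_if_well_formed\fR is hypothetical, and its refusal to write into
the input file's own directory is an extra guard added for the sketch
(the hazard is the one described under \fB-d\fR); it is not something
\fBxmlwf\fR is stated to do.
.nf
```python
import os
import shutil
import tempfile
from xml.parsers import expat


def copy_if_well_formed(src, outdir):
    """Hypothetical helper: copy src unchanged into outdir, but only
    if Expat parses it without error."""
    dest = os.path.join(outdir, os.path.basename(src))
    # Guard against overwriting the input file with its own copy.
    if os.path.abspath(dest) == os.path.abspath(src):
        raise ValueError("output directory must differ from input directory")
    parser = expat.ParserCreate()
    with open(src, "rb") as f:
        parser.ParseFile(f)  # raises expat.ExpatError if not well-formed
    shutil.copyfile(src, dest)
    return dest


# Demonstration in throwaway directories.
with tempfile.TemporaryDirectory() as indir, tempfile.TemporaryDirectory() as outdir:
    src = os.path.join(indir, "doc.xml")
    with open(src, "wb") as f:
        f.write(b"<doc>  unchanged  </doc>")
    copied = copy_if_well_formed(src, outdir)
    with open(copied, "rb") as f:
        copy_is_identical = f.read() == b"<doc>  unchanged  </doc>"
    try:
        copy_if_well_formed(src, indir)
        same_dir_refused = False
    except ValueError:
        same_dir_refused = True
```
.fi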
.TP
\fB-d output-dir\fR
Specifies a directory to contain transformed
representations of the input files.
By default, \fB-d\fR outputs a canonical representation
(described below).
You can select different output formats using \fB-c\fR
and \fB-m\fR.

The output filenames will
be exactly the same as the input filenames, or "STDIN" if the input is
coming from standard input. Therefore, you must make sure that the
output directory is not the directory that contains the input
file. Otherwise, \fBxmlwf\fR will truncate the
input file before it generates the output file (just like running
cat < file > file in most shells).

Two structurally equivalent XML documents have a byte-for-byte
identical canonical XML representation.
Note that ignorable white space is considered significant and
is treated equivalently to data.
More on canonical XML can be found at
http://www.jclark.com/xml/canonxml.html .
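
The following is a minimal sketch of producing canonical XML in the sense
of the page referenced above, driven by Expat events through Python's
\fBxml.parsers.expat\fR binding. It is not \fBxmlwf\fR's implementation:
it handles only elements, attributes, character data and processing
instructions, ignores comments and the document type declaration, and
the names \fIcanonicalize\fR and \fIescape\fR are hypothetical. Two
differently written but structurally equivalent inputs produce the same
string.
.nf
```python
from xml.parsers import expat

# Characters written as references in canonical XML.
ESCAPES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    chr(9): "&#9;",
    chr(10): "&#10;",
    chr(13): "&#13;",
}


def escape(text):
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def canonicalize(xml_text):
    """Sketch of a canonical-XML writer driven by Expat events."""
    out = []
    parser = expat.ParserCreate()

    def start(name, attrs):
        out.append("<" + name)
        for key in sorted(attrs):  # attributes in lexicographic order
            out.append(' %s="%s"' % (key, escape(attrs[key])))
        out.append(">")

    def end(name):
        out.append("</%s>" % name)  # empty elements get an explicit end tag

    def chars(data):
        out.append(escape(data))

    def pi(target, data):
        out.append("<?%s %s?>" % (target, data))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.ProcessingInstructionHandler = pi
    parser.Parse(xml_text, True)
    return "".join(out)


a = canonicalize('<r b="2" a="1"><e/>x &amp; y</r>')
b = canonicalize("<r a='1' b='2'><e></e>x &#38; y</r>")
# both are '<r a="1" b="2"><e></e>x &amp; y</r>'
```
.fi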
.TP
\fB-e encoding\fR
Specifies the character encoding for the document, overriding
any document encoding declaration. \fBxmlwf\fR
supports four built-in encodings:
US-ASCII,
UTF-8,
UTF-16, and
ISO-8859-1.
Also see the \fB-w\fR option.
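
Expat accepts an encoding name when a parser is created, and that name
takes precedence over the document's own declaration. The sketch below
assumes that \fB-e\fR works the same way and demonstrates the override
with Python's \fBxml.parsers.expat\fR binding. The helper
\fItext_of\fR is hypothetical. The data is passed as bytes so that Expat
itself does the decoding.
.nf
```python
from xml.parsers import expat

# The declaration claims UTF-8, but byte 0xE9 is really ISO-8859-1
# (a lowercase e with an acute accent).
DATA = (b'<?xml version="1.0" encoding="UTF-8"?><a>'
        + bytes([0xE9]) + b"</a>")


def text_of(data, encoding=None):
    """Collect character data, optionally overriding the encoding."""
    parser = expat.ParserCreate(encoding)
    chunks = []
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)  # data is bytes, decoded by Expat itself
    return "".join(chunks)


overridden = text_of(DATA, "ISO-8859-1")  # the declaration is ignored
try:
    text_of(DATA)  # trusts the declaration: 0xE9 is invalid UTF-8 here
    declared_ok = True
except expat.ExpatError:
    declared_ok = False
```
.fi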
.TP
\fB-m\fR
Outputs a special XML file that completely
describes the input file, including character positions.
Requires \fB-d\fR to specify an output directory.
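
The position information that Expat reports for each parsing event can
be read as follows. This sketch uses Python's \fBxml.parsers.expat\fR
binding and the hypothetical helper \fIstart_tag_positions\fR. It shows
only where the positions come from. It does not reproduce the format that
\fB-m\fR writes. Lines are counted from 1, while columns and byte offsets
are counted from 0.
.nf
```python
from xml.parsers import expat


def start_tag_positions(xml_text):
    """Record (name, line, column, byte offset) for each start tag.
    Illustrative only: this is not the -m output format."""
    parser = expat.ParserCreate()
    found = []

    def start(name, attrs):
        # Inside a handler, the Current* attributes describe the
        # position where the current event (the start tag) begins.
        found.append((name, parser.CurrentLineNumber,
                      parser.CurrentColumnNumber, parser.CurrentByteIndex))

    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return found


NL = chr(10)
positions = start_tag_positions("<doc>" + NL + "  <item/>" + NL + "</doc>")
# [("doc", 1, 0, 0), ("item", 2, 2, 8)]
```
.fi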
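.TP
\fB-n\fR
Turns on namespace processing. Expat then resolves element and
attribute name prefixes to the namespace URIs declared with
\fBxmlns\fR attributes, and using an undeclared prefix becomes an error.
\fB-c\fR disables namespaces.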
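
The effect of namespace processing can be shown with Python's
\fBxml.parsers.expat\fR binding, where passing a separator character to
the parser turns it on. The space used as the separator and the helper
name \fIelement_names\fR are choices made for this sketch. They do not
describe what \fBxmlwf\fR uses internally.
.nf
```python
from xml.parsers import expat


def element_names(xml_text, namespaces):
    """Return element names as Expat reports them."""
    # With a namespace_separator, Expat reports "URI<sep>local-name".
    parser = expat.ParserCreate(namespace_separator=" " if namespaces else None)
    names = []
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)
    return names


doc = '<p:item xmlns:p="http://example.com/ns"/>'
with_ns = element_names(doc, True)      # ["http://example.com/ns item"]
without_ns = element_names(doc, False)  # ["p:item"]

try:
    element_names("<q:item/>", True)  # prefix q is never declared
    unbound_rejected = False
except expat.ExpatError:
    unbound_rejected = True
```
.fi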
.TP
\fB-p\fR
Tells \fBxmlwf\fR to process external DTDs and parameter
entities.

Normally \fBxmlwf\fR never parses parameter
entities. \fB-p\fR tells it to always parse them.
\fB-p\fR implies \fB-x\fR.
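
A sketch of the two Expat settings involved, using Python's
\fBxml.parsers.expat\fR binding: parameter entity parsing is switched on
with \fISetParamEntityParsing\fR, and an external entity handler supplies
the DTD text. The in-memory \fIFILES\fR table and the helper
\fIparse_text\fR are hypothetical stand-ins for real files. Without
parameter entity parsing, the external DTD is never read, and the
reference to the entity it declares is silently skipped.
.nf
```python
from xml.parsers import expat

# Hypothetical in-memory "files" standing in for the filesystem.
FILES = {"greeting.dtd": '<!ENTITY greeting "hello">'}
DOC = '<!DOCTYPE doc SYSTEM "greeting.dtd"><doc>&greeting;</doc>'


def parse_text(xml_text, param_entities):
    parser = expat.ParserCreate()
    chunks = []
    parser.CharacterDataHandler = chunks.append

    def load_external(context, base, system_id, public_id):
        # For the external DTD subset Expat passes context=None.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(FILES[system_id], True)
        return 1  # non-zero: the entity was handled successfully

    parser.ExternalEntityRefHandler = load_external
    if param_entities:
        parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.Parse(xml_text, True)
    return "".join(chunks)


with_p = parse_text(DOC, True)      # DTD read, &greeting; becomes "hello"
without_p = parse_text(DOC, False)  # DTD skipped, reference not expanded
```
.fi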
.TP
\fB-r\fR
Normally \fBxmlwf\fR memory-maps the XML file
before parsing; this can result in faster parsing on many
platforms.
\fB-r\fR turns off memory-mapping and uses normal file
IO calls instead.
Of course, memory-mapping is automatically turned off
when reading from standard input.

Use of memory-mapping can cause some platforms to report
substantially higher memory usage for
\fBxmlwf\fR, but this appears to be a matter of
the operating system reporting memory in a strange way; there is
not a leak in \fBxmlwf\fR.
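
The two input strategies can be contrasted in a short sketch using
Python's \fImmap\fR module and \fBxml.parsers.expat\fR binding. The
function names are hypothetical. The first function hands Expat a
read-only memory map of the whole file, and the second reads the file
with ordinary \fIread\fR calls, as \fB-r\fR does. Both give the same
parse result; only the I/O differs.
.nf
```python
import mmap
import os
import tempfile
from xml.parsers import expat


def count_elements_mmap(path):
    """Parse the file through a read-only memory map."""
    parser = expat.ParserCreate()
    names = []
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            parser.Parse(mm, True)  # mmap objects expose the buffer protocol
    return len(names)


def count_elements_read(path):
    """Parse the file with ordinary read() calls, in the spirit of -r."""
    parser = expat.ParserCreate()
    names = []
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    with open(path, "rb") as f:
        parser.ParseFile(f)
    return len(names)


fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(b"<a><b/><c/></a>")
same_result = count_elements_mmap(path) == count_elements_read(path) == 3
os.remove(path)
```
.fi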
.TP
\fB-s\fR
Prints an error if the document is not standalone.
A document is standalone if it has no external subset and no
references to parameter entities.
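
Expat supports this check through a "not standalone" handler: if the
handler returns 0, the parse fails with a "document is not standalone"
error. The sketch below assumes that \fB-s\fR relies on that hook and uses
Python's \fBxml.parsers.expat\fR binding. The helper name
\fIpasses_standalone_check\fR is hypothetical, and it also returns False
for any other parse error.
.nf
```python
from xml.parsers import expat


def passes_standalone_check(xml_text):
    """Sketch of -s: reject documents that are not standalone."""
    parser = expat.ParserCreate()
    # Expat calls this handler when the document is not declared
    # standalone but has an external subset or parameter entity
    # references; returning 0 makes the parse fail.
    parser.NotStandaloneHandler = lambda: 0
    try:
        parser.Parse(xml_text, True)
    except expat.ExpatError:
        return False
    return True


plain = passes_standalone_check("<a/>")
external = passes_standalone_check('<!DOCTYPE a SYSTEM "a.dtd"><a/>')
declared = passes_standalone_check(
    '<?xml version="1.0" standalone="yes"?><!DOCTYPE a SYSTEM "a.dtd"><a/>')
```
.fi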
.TP
\fB-t\fR
Turns on timings. This tells Expat to parse the entire file,
but not perform any processing.
This gives a fairly accurate idea of the raw speed of Expat itself
without client overhead.
\fB-t\fR turns off most of the output options
(\fB-d\fR, \fB-m\fR, \fB-c\fR,
\&...).
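
The idea behind \fB-t\fR can be sketched as timing a parse with no
handlers installed. This example uses Python's \fBxml.parsers.expat\fR
binding and \fItime.perf_counter\fR, and it is not \fBxmlwf\fR's timing
code. It also includes a little Python call overhead, so it only
approximates the raw speed of Expat. The helper name \fItime_raw_parse\fR
and the generated sample are hypothetical.
.nf
```python
import time
from xml.parsers import expat


def time_raw_parse(data, repeat=5):
    """Best of several runs parsing data with no handlers installed."""
    best = float("inf")
    for _ in range(repeat):
        parser = expat.ParserCreate()  # a fresh parser for every run
        start = time.perf_counter()
        parser.Parse(data, True)
        best = min(best, time.perf_counter() - start)
    return best


sample = b"<root>" + b"<item>text</item>" * 10000 + b"</root>"
seconds = time_raw_parse(sample)
```
.fi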
.TP
\fB-v\fR
Prints the version of the Expat library being used, including some
information on the compile-time configuration of the library, and
then exits.
.TP
\fB-w\fR
Enables support for Windows code pages.
Normally, \fBxmlwf\fR reports an error if it
encounters an encoding that it is not equipped to handle itself. With
\fB-w\fR, \fBxmlwf\fR will try to use a Windows code
page. See also \fB-e\fR.
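
Expat lets an application supply encodings it does not know through an
unknown-encoding handler; it is an assumption here that \fB-w\fR is built
on that hook. The sketch below does not touch Windows code-page APIs.
Python's \fBxml.parsers.expat\fR binding installs its own handler, which
builds the byte-to-character map from the Python codec of the same name,
so it shows the mechanism on any platform.
.nf
```python
from xml.parsers import expat

# windows-1252 is not one of Expat's built-in encodings; in that
# code page, byte 0x80 is the euro sign.
DATA = (b'<?xml version="1.0" encoding="windows-1252"?><price>'
        + bytes([0x80]) + b"5</price>")

parser = expat.ParserCreate()
chunks = []
parser.CharacterDataHandler = chunks.append
# pyexpat's unknown-encoding handler maps each byte through Python's
# codec, so this parse succeeds instead of failing on the encoding.
parser.Parse(DATA, True)
text = "".join(chunks)  # the euro sign followed by "5"
```
.fi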
.TP
\fB-x\fR
Turns on parsing external entities.

Non-validating parsers are not required to read external
entities.
Expat expands internal entities by default,
but external entity parsing must be enabled explicitly.

External entities are simply entities that obtain their
data from outside the XML file currently being parsed.

This is an example of an internal entity:

.nf
<!ENTITY vers '1.0.2'>
.fi

And here are some examples of external entities:

.nf
<!ENTITY header SYSTEM "header.xml">  (parsed)
<!ENTITY logo SYSTEM "logo.png" NDATA PNG>  (unparsed)
.fi
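
A parsed external entity like \fIheader\fR above is only expanded when
the application installs an external entity handler. This sketch uses
Python's \fBxml.parsers.expat\fR binding. The in-memory \fIFILES\fR
table and the helper \fIcollect_text\fR are hypothetical stand-ins for
reading real files. Without the handler, Expat reports no error but does
not expand the reference.
.nf
```python
from xml.parsers import expat

# Hypothetical in-memory stand-ins for external files.
FILES = {"header.xml": "<title>Hello from the header</title>"}
DOC = """<!DOCTYPE doc [
<!ENTITY header SYSTEM "header.xml">
]>
<doc>&header;</doc>"""


def collect_text(xml_text, resolve_external):
    parser = expat.ParserCreate()
    chunks = []
    parser.CharacterDataHandler = chunks.append

    def load(context, base, system_id, public_id):
        # Parse the entity's text with a child parser sharing the DTD.
        sub = parser.ExternalEntityParserCreate(context)
        sub.CharacterDataHandler = chunks.append
        sub.Parse(FILES[system_id], True)
        return 1

    if resolve_external:
        parser.ExternalEntityRefHandler = load
    parser.Parse(xml_text, True)
    return "".join(chunks)


resolved = collect_text(DOC, True)   # "Hello from the header"
skipped = collect_text(DOC, False)   # "" (reference not expanded)
```
.fi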
.TP
\fB--\fR
(Two hyphens.)
Terminates the list of options. This is only needed if a filename
starts with a hyphen. For example:

.nf
xmlwf -- -myfile.xml
.fi

will run \fBxmlwf\fR on the file
\fI-myfile.xml\fR.
.PP
Older versions of \fBxmlwf\fR do not support
reading from standard input.
.SH "OUTPUT"
.PP
If an input file is not well-formed,
\fBxmlwf\fR prints a single line describing
the problem to standard output. If a file is well-formed,
\fBxmlwf\fR outputs nothing.
Note that the exit status is \fBnot\fR set to indicate the result.
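.PP
A one-line report of this kind can be imitated with Python's
\fBxml.parsers.expat\fR binding, whose errors carry a line number, a
column number and an error code. The "filename:line:column: message"
layout and the helper name \fIreport\fR are assumptions of this sketch,
not a description of \fBxmlwf\fR's exact output. A calling script can
treat any non-empty report as a failure, which is also what it has to do
with \fBxmlwf\fR, because the exit status does not reflect the result.
.nf
```python
from xml.parsers import expat


def report(xml_text, filename):
    """Return None for a well-formed document, otherwise one line of
    the form filename:line:column: message (format assumed)."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except expat.ExpatError as err:
        return "%s:%d:%d: %s" % (filename, err.lineno, err.offset,
                                 expat.ErrorString(err.code))
    return None


good = report("<a><b/></a>", "good.xml")  # None
bad = report("<a><b></a>", "bad.xml")     # "bad.xml:1:...: mismatched tag"
```
.fi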
.SH "BUGS"
.PP
\fBxmlwf\fR accepts files that do not begin with an XML declaration.
This is permitted: the W3C XML 1.0 specification recommends, but
does not require, that a document begin with an XML declaration.
.PP
\fBxmlwf\fR exits with status 0 (no error),
even if the file is not well-formed. There is no good way for
a program to use \fBxmlwf\fR to quickly
check a file: it must parse \fBxmlwf\fR's
standard output.
.PP
The errors should go to standard error, not standard output.
.PP
There should be a way to get \fB-d\fR to send its
output to standard output rather than forcing the user to send
it to a file.
.PP
I have no idea why anyone would want to use the
\fB-d\fR, \fB-c\fR, and
\fB-m\fR options. If someone could explain it to
me, I'd like to add this information to this manpage.
.SH "ALTERNATIVES"
.PP
Here are some XML validators on the web:

.nf
http://www.hcrc.ed.ac.uk/~richard/xml-check.html
http://www.stg.brown.edu/service/xmlvalid/
http://www.scripting.com/frontier5/xml/code/xmlValidator.html
http://www.xml.com/pub/a/tools/ruwf/check.html
.fi
.SH "SEE ALSO"
.PP

.nf
The Expat home page: http://www.libexpat.org/
The W3C XML specification: http://www.w3.org/TR/REC-xml
.fi
.SH "AUTHOR"
.PP
This manual page was written by Scott Bronson <bronson@rinspin.com> for
the Debian GNU/Linux system (but may be used by others). Permission is
granted to copy, distribute and/or modify this document under
the terms of the GNU Free Documentation
License, Version 1.1.