Implementation notes
--------------------

Files
-----

The library tex2any.lib contains the generic LaTeX parser.
It comprises tex2any.cc, tex2any.h and texutils.cc.

The executable Tex2RTF is made up of tex2any.lib,
tex2rtf.cc (main driver and user interface), and specific
drivers for generating output: rtfutils.cc, htmlutil.cc
and xlputils.cc.

Data structures
---------------

Class declarations are found in tex2any.h.

TexMacroDef holds a macro (LaTeX command) definition: name, identifier,
number of arguments, whether it should be ignored, etc. Each LaTeX
command has an integer identifier so that commands can be tested
efficiently when generating output. A hash table, MacroDefs, stores
all the TexMacroDefs, indexed on command name.

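As a rough illustration of the macro table, here is a minimal sketch.
The names MacroDefSketch, MacroTableSketch, AddMacroSketch and
FindMacroSketch, the field names and the identifier values are all
assumptions made for the example; the real declarations are in tex2any.h.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for TexMacroDef; the field names are illustrative.
struct MacroDefSketch {
    int id;            // integer identifier tested during generation
    std::string name;  // command name without the backslash
    int argCount;      // number of arguments the command takes
    bool ignore;       // true if the command should produce no output
};

// Hypothetical identifiers, not the values used by Tex2RTF.
enum { SKETCH_BF = 1, SKETCH_SECTION = 2 };

// Stand-in for the MacroDefs hash table, indexed on command name.
using MacroTableSketch = std::unordered_map<std::string, MacroDefSketch>;

inline void AddMacroSketch(MacroTableSketch &table, int id,
                           const std::string &name, int argCount,
                           bool ignore = false) {
    table[name] = MacroDefSketch{id, name, argCount, ignore};
}

// Returns the definition for a command name, or nullptr if unknown.
inline const MacroDefSketch *FindMacroSketch(const MacroTableSketch &table,
                                             const std::string &name) {
    auto it = table.find(name);
    return it == table.end() ? nullptr : &it->second;
}
```

The name is looked up once, at parse time; after that the generator
only needs to compare the integer id.
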
Each unit of a LaTeX file is stored in a TexChunk. A TexChunk can be
a macro, an argument or just a string: a macro chunk has one child
chunk per argument, and each argument chunk has one or more
children, each representing another command or a simple string.

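The chunk hierarchy can be pictured with a small sketch. ChunkSketch,
ChunkKindSketch and the helper functions are hypothetical names for
this example only and do not reproduce the TexChunk class.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chunk kinds mirroring the three cases described above.
enum class ChunkKindSketch { Macro, Argument, String };

// Hypothetical stand-in for TexChunk; the field names are illustrative.
struct ChunkSketch {
    ChunkKindSketch kind;
    std::string text;  // command name for a macro, literal text for a string
    std::vector<std::unique_ptr<ChunkSketch>> children;
};

inline std::unique_ptr<ChunkSketch> MakeChunk(ChunkKindSketch kind,
                                              const std::string &text = "") {
    auto chunk = std::make_unique<ChunkSketch>();
    chunk->kind = kind;
    chunk->text = text;
    return chunk;
}

// Builds the tree for {\bf thing}: a macro with one argument that
// holds a single string chunk.
inline std::unique_ptr<ChunkSketch> BuildBoldExample() {
    auto str = MakeChunk(ChunkKindSketch::String, "thing");
    auto arg = MakeChunk(ChunkKindSketch::Argument);
    arg->children.push_back(std::move(str));
    auto macro = MakeChunk(ChunkKindSketch::Macro, "bf");
    macro->children.push_back(std::move(arg));
    return macro;
}

// Counts every chunk in the tree, the root included.
inline int CountChunks(const ChunkSketch &chunk) {
    int total = 1;
    for (const auto &child : chunk.children) total += CountChunks(*child);
    return total;
}
```
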
Parsing
-------

Parsing is relatively ad hoc. read_a_line reads in a line at a time,
doing some processing for file commands (e.g. input, verbatiminclude).
File handles are stored in a stack so file input commands may be nested.

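The stack of file handles can be sketched as follows. This is not the
read_a_line code: ReadAllLinesSketch and FileSetSketch are hypothetical,
the "files" are in-memory strings so that the example is self-contained,
and only a whole-line \input{name} is recognised.

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <stack>
#include <string>
#include <vector>

// In-memory "files" keep the sketch self-contained; a real reader
// would open file handles instead.
using FileSetSketch = std::map<std::string, std::string>;

// Reads 'root' line by line. A line of the form \input{name} pushes
// that file on the stack; reaching end of file pops back to the parent.
inline std::vector<std::string> ReadAllLinesSketch(const FileSetSketch &files,
                                                   const std::string &root) {
    std::vector<std::string> lines;
    std::stack<std::unique_ptr<std::istringstream>> inputs;
    auto open = [&](const std::string &name) {
        auto it = files.find(name);
        if (it != files.end())
            inputs.push(std::make_unique<std::istringstream>(it->second));
    };
    open(root);
    const std::string prefix = "\\input{";
    while (!inputs.empty()) {
        std::string line;
        if (!std::getline(*inputs.top(), line)) {
            inputs.pop();  // end of this file: resume the including file
            continue;
        }
        if (line.compare(0, prefix.size(), prefix) == 0 &&
            line.back() == '}') {
            open(line.substr(prefix.size(),
                             line.size() - prefix.size() - 1));
            continue;
        }
        lines.push_back(line);
    }
    return lines;
}
```

The sketch does not guard against a file that includes itself; a real
reader would want a depth limit or a check for that case.
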
ParseArg parses an argument, a single command, or a command
argument; the whole LaTeX input is itself treated as an argument.
The parsing gets a little hairy because an environment,
a normal command and bracketed commands (e.g. {\bf thing}) all get
parsed into the same format. An environment, for example,
is usually a one-argument command, as is {\bf thing}. ParseArg also
deals with user-defined macros.

Whilst parsing, the function MatchMacro gets called to
attempt to find a command following a backslash (or the
start of an environment). ParseMacroBody parses the
arguments of a command when one is found.

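The idea of matching a command after a backslash can be sketched like
this. MatchMacroSketch is a hypothetical helper with an invented
signature, not the real MatchMacro: it handles only alphabetic command
names and ignores environments and single-character commands such as \\.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper, not the real MatchMacro signature. If 'input'
// has a backslash at 'pos' followed by a known command name, stores the
// name, advances 'pos' past it and returns true; otherwise leaves 'pos'
// untouched and returns false.
inline bool MatchMacroSketch(const std::string &input, std::size_t &pos,
                             const std::set<std::string> &known,
                             std::string &name) {
    if (pos >= input.size() || input[pos] != '\\') return false;
    std::size_t end = pos + 1;
    while (end < input.size() &&
           std::isalpha(static_cast<unsigned char>(input[end])))
        ++end;
    std::string candidate = input.substr(pos + 1, end - pos - 1);
    if (candidate.empty() || known.count(candidate) == 0) return false;
    name = candidate;
    pos = end;
    return true;
}
```

On a successful match the caller would go on to parse the command's
arguments, which is the job the notes assign to ParseMacroBody.
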
Generation
----------

The upshot of parsing is a hierarchy of TexChunks.
The 'client' converter application calls TraverseFromDocument
to start the generation process, and TraverseFromDocument in turn
calls the recursive TraverseFromChunk. TraverseFromChunk
calls the two functions OnMacro and OnArgument,
twice for each chunk, to allow for preprocessing
and postprocessing of each macro or argument.

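The two-call traversal can be sketched as below. NodeSketch,
TraceSketch and TraverseSketch are hypothetical, and the sketch assumes
that the first call comes before the children are visited and the
second after; the callback signatures are not those of tex2any.h.

```cpp
#include <string>
#include <vector>

// Hypothetical chunk type for this sketch only.
struct NodeSketch {
    bool isMacro;      // true for a macro chunk, false for an argument
    std::string name;
    std::vector<NodeSketch> children;
};

// Records each callback so the call order can be inspected.
struct TraceSketch {
    std::vector<std::string> calls;
    void OnMacro(const NodeSketch &n, bool start) {
        calls.push_back((start ? "begin macro " : "end macro ") + n.name);
    }
    void OnArgument(const NodeSketch &n, bool start) {
        calls.push_back((start ? "begin arg " : "end arg ") + n.name);
    }
};

// Calls the matching handler once before and once after the children.
inline void TraverseSketch(const NodeSketch &n, TraceSketch &client) {
    if (n.isMacro) client.OnMacro(n, true); else client.OnArgument(n, true);
    for (const NodeSketch &child : n.children) TraverseSketch(child, client);
    if (n.isMacro) client.OnMacro(n, false); else client.OnArgument(n, false);
}
```

A converter would emit opening markup on the first call and closing
markup on the second, so nested commands come out properly bracketed.
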
The client defines OnMacro and OnArgument to test
the command identifier, and output the appropriate
code. To help do this, the function TexOutput
outputs to the current stream(s), and
SetCurrentOutput(s) allows the setting of one
or two output streams for the output to be sent to.
Two outputs at a time are usually sufficient for
hypertext applications, where a title is likely
to appear both in an index and as a section header.

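A minimal sketch of writing to one or two streams at once follows.
DualOutputSketch is a hypothetical class that borrows the function
names from the notes; the real functions are free functions whose
signatures are not reproduced here.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the TexOutput / SetCurrentOutput(s) pair.
class DualOutputSketch {
public:
    // Selects a single destination.
    void SetCurrentOutput(std::ostream *first) {
        SetCurrentOutputs(first, nullptr);
    }
    // Selects two destinations; either pointer may be null.
    void SetCurrentOutputs(std::ostream *first, std::ostream *second) {
        first_ = first;
        second_ = second;
    }
    // Writes the text to every destination currently selected.
    void TexOutput(const std::string &text) {
        if (first_) *first_ << text;
        if (second_) *second_ << text;
    }
private:
    std::ostream *first_ = nullptr;
    std::ostream *second_ = nullptr;
};
```

A section title would be written while both streams are selected, so
that it lands in the body and in the index; the client would then
switch back to a single stream for the text that follows.
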
There are support functions for getting the string
data for the current chunk (GetArgData) and the
current chunk itself (GetArgChunk). If you have a handle
on a chunk, you can output it several times by calling
TraverseChildrenFromChunk (not TraverseFromChunk because
that causes infinite recursion).

The client (here, Tex2RTF) also defines OnError and OnInform output
functions appropriate to the desired user interface.

References
----------

Adding, finding and resolving references are supported
with functions from texutils.cc. WriteTexReferences
and ReadTexReferences allow saving and reading references
between conversion processes, rather like real LaTeX.

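Saving references between runs can be sketched as a simple round trip.
WriteRefsSketch, ReadRefsSketch and the one-pair-per-line format are
assumptions made for the example; they are not the functions or the
file format used by texutils.cc.

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Maps a label to the text printed where it is referenced,
// e.g. a section number.
using RefTableSketch = std::map<std::string, std::string>;

// Writes one "label value" pair per line. This format is an assumption
// made for the sketch, not the format Tex2RTF uses.
inline void WriteRefsSketch(const RefTableSketch &refs, std::ostream &out) {
    for (const auto &entry : refs)
        out << entry.first << ' ' << entry.second << '\n';
}

// Reads pairs written by WriteRefsSketch. Labels must not contain spaces.
inline RefTableSketch ReadRefsSketch(std::istream &in) {
    RefTableSketch refs;
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type space = line.find(' ');
        if (space == std::string::npos) continue;  // skip malformed lines
        refs[line.substr(0, space)] = line.substr(space + 1);
    }
    return refs;
}
```

Reading the table back at the start of the next run lets a reference
that appears before its label resolve on the second pass.
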
Bibliography
------------

Again texutils.cc provides functions for reading in .bib files and
resolving references. The function OutputBibItem gives a generic way
of outputting bibliography items, by 'faking' calls to OnMacro and
OnArgument, allowing the existing low-level client code to take care of
formatting.

Units
-----

Unit parsing code is in texutils.cc as ParseUnitArgument. It converts
units to points.

Common errors
-------------

1) Macro not found: \end{center} ...

Rewrite:

    \begin{center}
    {\large{\underline{A}}}
    \end{center}

as:

    \begin{center}
    {\large \underline{A}}
    \end{center}

2) Tables crash RTF. Set 'compatibility' to TRUE in the .ini file; also
check for \\ end-of-row characters on their own on a line, and insert
the correct number of ampersands for the number of columns. E.g.

    hello & world\\
    \\

becomes

    hello & world\\
    &\\

3) If list items indent erratically, try increasing
listItemIndent to give more space between label and following text.
A global replace of '\item [' with '\item[' may also be helpful to remove
unnecessary space before the item label.

4) Missing figure or section references: ensure all labels _directly_ follow captions
or sections (no intervening white space).
138
139 4) Missing figure or section references: ensure all labels _directly_ follow captions
140 or sections (no intervening white space).