Implementation notes
--------------------

Files
-----

The library tex2any.lib contains the generic Latex parser.
It comprises tex2any.cc, tex2any.h and texutils.cc.

The executable Tex2RTF is made up of tex2any.lib,
tex2rtf.cc (main driver and user interface), and specific
drivers for generating output: rtfutils.cc, htmlutil.cc
and xlputils.cc.

Data structures
---------------

Class declarations are found in tex2any.h.

TexMacroDef holds a macro (Latex command) definition: name, identifier,
number of arguments, whether it should be ignored, etc. Integer
identifiers are used for each Latex command for efficiency when
generating output. A hash table MacroDefs stores all the TexMacroDefs,
indexed on command name.
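
A minimal C++ sketch of this arrangement, assuming std::unordered_map in place of the original hash table class. The field names, the ltBF/ltSECTION identifiers and the AddMacroDef/FindMacro helpers are hypothetical, not the declarations in tex2any.h.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical command identifiers. Integers let the output code switch on a
// command cheaply instead of comparing names.
enum MacroId { ltUNKNOWN = 0, ltBF, ltSECTION };

// Illustrative stand-in for TexMacroDef: one entry per Latex command.
struct TexMacroDef {
    std::string name;  // command name without the backslash, e.g. "section"
    int id;            // integer identifier used during generation
    int noArgs;        // number of arguments the command takes
    bool ignore;       // true if the command should be skipped
};

// Stand-in for the MacroDefs hash table, indexed on command name.
std::unordered_map<std::string, TexMacroDef> MacroDefs;

void AddMacroDef(int id, const std::string &name, int noArgs, bool ignore = false) {
    MacroDefs[name] = TexMacroDef{name, id, noArgs, ignore};
}

// Returns the definition for a command name, or nullptr if it is unknown.
const TexMacroDef *FindMacro(const std::string &name) {
    auto it = MacroDefs.find(name);
    return it == MacroDefs.end() ? nullptr : &it->second;
}
```

The name lookup happens once, while parsing; from then on the generation code only needs the integer id.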

Each unit of a Latex file is stored in a TexChunk. A TexChunk can be
a macro, an argument or just a string: a macro chunk has child
chunks for its arguments, and each argument has one or more
children, each representing another command or a simple string.
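
A sketch of the chunk tree, assuming the three kinds of chunk live in one struct that owns its children. This TexChunk is a simplified stand-in, and PlainText is a hypothetical helper that collects the string leaves.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// The three kinds of chunk described above.
enum class ChunkType { Macro, Arg, String };

// Simplified stand-in for TexChunk: each chunk owns its children.
struct TexChunk {
    ChunkType type;
    std::string value;  // command name for a macro, text for a string
    std::vector<std::unique_ptr<TexChunk>> children;

    explicit TexChunk(ChunkType t, std::string v = "") : type(t), value(std::move(v)) {}

    // Appends a child chunk and returns a pointer to it.
    TexChunk *Add(ChunkType t, std::string v = "") {
        children.push_back(std::make_unique<TexChunk>(t, std::move(v)));
        return children.back().get();
    }
};

// Hypothetical helper: concatenates the string leaves, ignoring commands.
std::string PlainText(const TexChunk &c) {
    if (c.type == ChunkType::String) return c.value;
    std::string out;
    for (const auto &child : c.children) out += PlainText(*child);
    return out;
}
```

For \section{Intro to {\bf RTF}}, the section macro chunk has one argument chunk, whose children are the string "Intro to " and a bf macro chunk with its own one argument.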

Parsing
-------

Parsing is relatively ad hoc. read_a_line reads in a line at a time,
doing some processing for file commands (e.g. \input, \verbatiminclude).
File handles are stored in a stack so that file input commands may be nested.
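
A minimal sketch of a line reader built on a stack of open streams, assuming C++ iostreams rather than the original's file handles. LineReader and MemoryOpener are hypothetical names; the sketch only recognises \input{...} at the start of a line and ignores other file commands such as \verbatiminclude.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <stack>
#include <string>
#include <utility>

// Reads Latex source line by line through a stack of open streams, so that
// \input{file} can nest. Simplification: \input is only recognised at the
// start of a line, and the rest of that line is dropped.
class LineReader {
public:
    using Opener = std::function<std::unique_ptr<std::istream>(const std::string &)>;

    LineReader(std::unique_ptr<std::istream> top, Opener open) : open_(std::move(open)) {
        if (top) files_.push(std::move(top));
    }

    // Returns the next line, or false once every file is exhausted.
    bool ReadLine(std::string &line) {
        const std::string cmd = "\\input{";
        while (!files_.empty()) {
            if (!std::getline(*files_.top(), line)) {
                files_.pop();  // this file is done; resume the one that included it
                continue;
            }
            if (line.rfind(cmd, 0) == 0) {
                std::size_t close = line.find('}', cmd.size());
                if (close != std::string::npos) {
                    auto in = open_(line.substr(cmd.size(), close - cmd.size()));
                    if (in) files_.push(std::move(in));  // unknown files are skipped
                    continue;
                }
            }
            return true;
        }
        return false;
    }

private:
    std::stack<std::unique_ptr<std::istream>> files_;
    Opener open_;
};

// Serves "files" from memory (name -> contents), for trying the reader out.
LineReader::Opener MemoryOpener(const std::map<std::string, std::string> &files) {
    return [files](const std::string &name) -> std::unique_ptr<std::istream> {
        auto it = files.find(name);
        if (it == files.end()) return nullptr;
        return std::make_unique<std::istringstream>(it->second);
    };
}
```

When an included file runs out, the reader pops it and carries on with the line after the \input in the including file, which is what makes nesting work.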

ParseArg parses an argument (the whole Latex input is itself treated
as an argument), a single command, or a command argument. The parsing
gets a little hairy because an environment, a normal command and
bracketed commands (e.g. {\bf thing}) are all parsed into the same
format: an environment, for example, is usually a one-argument
command, as is {\bf thing}. ParseArg also deals with user-defined macros.
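
To make the "same format" point concrete, here is a sketch that maps all three surface forms onto one name-plus-argument record. OneArgCommand and ParseOneArgForm are hypothetical, and the sketch ignores nesting and error recovery, which a real parser has to deal with.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// A one-argument command, whatever surface form it was written in.
struct OneArgCommand {
    std::string name;
    std::string arg;
};

// Reads a run of letters (a Latex command name) starting at pos.
static std::string ReadName(const std::string &s, std::size_t &pos) {
    std::size_t start = pos;
    while (pos < s.size() && std::isalpha(static_cast<unsigned char>(s[pos]))) ++pos;
    return s.substr(start, pos - start);
}

// Maps \begin{env}body\end{env}, {\cmd body} and \cmd{body} onto the same
// record. Simplification: no nested braces or environments, no error recovery.
bool ParseOneArgForm(const std::string &s, OneArgCommand &out) {
    const std::string begin = "\\begin{";
    if (s.rfind(begin, 0) == 0) {  // environment
        std::size_t close = s.find('}', begin.size());
        if (close == std::string::npos) return false;
        out.name = s.substr(begin.size(), close - begin.size());
        std::size_t endPos = s.rfind("\\end{" + out.name + "}");
        if (endPos == std::string::npos || endPos < close) return false;
        out.arg = s.substr(close + 1, endPos - close - 1);
        return true;
    }
    if (s.size() > 2 && s[0] == '{' && s[1] == '\\' && s.back() == '}') {  // {\cmd body}
        std::size_t pos = 2;
        out.name = ReadName(s, pos);
        if (pos < s.size() && s[pos] == ' ') ++pos;  // the space ending the name
        out.arg = s.substr(pos, s.size() - 1 - pos);
        return !out.name.empty();
    }
    if (s.size() > 1 && s[0] == '\\') {  // \cmd{body}
        std::size_t pos = 1;
        out.name = ReadName(s, pos);
        if (pos >= s.size() || s[pos] != '{' || s.back() != '}') return false;
        out.arg = s.substr(pos + 1, s.size() - pos - 2);
        return !out.name.empty();
    }
    return false;
}
```

All three come out as a command name plus one argument, so later stages can treat an environment and {\bf thing} like any other one-argument command.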

Whilst parsing, MatchMacro is called to find a command following a
backslash (or at the start of an environment). When a command is
found, ParseMacroBody parses its arguments.
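
A sketch of these two steps, assuming a small name-to-argument-count table in place of MacroDefs and string positions in place of the parser's own state. The signatures of MatchMacro and ParseMacroBody here are illustrative only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Stand-in for the macro table: command name -> number of arguments.
const std::map<std::string, int> kArgCounts = {{"section", 1}, {"frac", 2}, {"\\", 0}};

// Called with pos just after a backslash. Reads a command name (a run of
// letters, or one non-letter as in \\ or \{) and looks it up.
bool MatchMacro(const std::string &s, std::size_t &pos, std::string &name, int &noArgs) {
    std::size_t start = pos;
    if (pos < s.size() && !std::isalpha(static_cast<unsigned char>(s[pos]))) {
        ++pos;
    } else {
        while (pos < s.size() && std::isalpha(static_cast<unsigned char>(s[pos]))) ++pos;
    }
    name = s.substr(start, pos - start);
    auto it = kArgCounts.find(name);
    if (it == kArgCounts.end()) return false;
    noArgs = it->second;
    return true;
}

// Reads noArgs brace-delimited arguments starting at pos, keeping nested
// braces inside an argument. Returns false on a missing or unclosed brace.
bool ParseMacroBody(const std::string &s, std::size_t &pos, int noArgs,
                    std::vector<std::string> &args) {
    for (int i = 0; i < noArgs; ++i) {
        while (pos < s.size() && s[pos] == ' ') ++pos;
        if (pos >= s.size() || s[pos] != '{') return false;
        std::size_t start = pos + 1;
        int depth = 0;
        for (; pos < s.size(); ++pos) {
            if (s[pos] == '{') ++depth;
            else if (s[pos] == '}' && --depth == 0) break;
        }
        if (pos >= s.size()) return false;  // unbalanced braces
        args.push_back(s.substr(start, pos - start));
        ++pos;  // step past the closing brace
    }
    return true;
}
```

Counting brace depth is what lets an argument such as {{b}c} come through whole rather than stopping at the first closing brace.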

Generation
----------

The upshot of parsing is a hierarchy of TexChunks. The 'client'
converter application starts the generation process by calling
TraverseFromDocument, which calls the recursive TraverseFromChunk.
TraverseFromChunk calls OnMacro and OnArgument twice for each chunk,
once before and once after its contents, to allow for preprocessing
and postprocessing of each macro or argument.
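
A minimal sketch of that traversal, assuming a simplified Chunk type and hook signatures that take the chunk itself. The real hooks are driven by the command identifier and their exact signatures are not given here. These toy hooks just log the calls so the before/after order is visible.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified chunk: a macro whose children are argument chunks, an argument
// chunk whose children are its contents, or a piece of text.
struct Chunk {
    enum Kind { Macro, Arg, Text } kind;
    std::string text;             // macro name, or literal text
    std::vector<Chunk> children;
};

// Toy client hooks: start is true on the first call and false on the second.
// They only record the calls, so the order is visible.
std::vector<std::string> Log;
void OnMacro(const Chunk &m, bool start) { Log.push_back((start ? "+" : "-") + m.text); }
void OnArgument(const Chunk &, int argNo, bool start) {
    Log.push_back((start ? "+arg" : "-arg") + std::to_string(argNo));
}
void TexOutput(const std::string &s) { Log.push_back(s); }

void TraverseFromChunk(const Chunk &c) {
    switch (c.kind) {
    case Chunk::Text:
        TexOutput(c.text);
        break;
    case Chunk::Arg:
        for (const Chunk &child : c.children) TraverseFromChunk(child);
        break;
    case Chunk::Macro: {
        OnMacro(c, true);  // preprocessing, e.g. open a formatting group
        int argNo = 1;
        for (const Chunk &arg : c.children) {
            OnArgument(c, argNo, true);
            for (const Chunk &child : arg.children) TraverseFromChunk(child);
            OnArgument(c, argNo, false);
            ++argNo;
        }
        OnMacro(c, false);  // postprocessing, e.g. close the group
        break;
    }
    }
}
```

For {\bf hi} this logs +bf, +arg1, hi, -arg1, -bf: the client gets a chance to emit code on each side of both the macro and its argument.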

The client defines OnMacro and OnArgument to test the command
identifier and output the appropriate code. To help with this,
TexOutput writes to the current stream(s), and SetCurrentOutput(s)
sets one or two streams to receive the output. Two streams at a
time are usually enough for hypertext applications, where a title
is likely to appear both in an index and as a section header.
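
A sketch of the one-or-two-stream arrangement, assuming std::ostream destinations. The signatures of SetCurrentOutput, SetCurrentOutputs and TexOutput below are illustrative, not copied from tex2any.h.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>  // std::ostringstream, handy as an in-memory stream
#include <string>

// Up to two destinations receive output at once.
static std::ostream *CurrentOutput1 = nullptr;
static std::ostream *CurrentOutput2 = nullptr;

// Send output to one stream only.
void SetCurrentOutput(std::ostream *out) {
    CurrentOutput1 = out;
    CurrentOutput2 = nullptr;
}

// Send output to two streams, e.g. a contents file and the main body.
void SetCurrentOutputs(std::ostream *out1, std::ostream *out2) {
    CurrentOutput1 = out1;
    CurrentOutput2 = out2;
}

// Writes the same text to every active stream.
void TexOutput(const std::string &s) {
    if (CurrentOutput1) *CurrentOutput1 << s;
    if (CurrentOutput2) *CurrentOutput2 << s;
}
```

A client handling a section title might call SetCurrentOutputs(&contents, &body) before the title argument is traversed and SetCurrentOutput(&body) afterwards, so the title lands in both places while the rest of the text goes only to the body.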

Support functions return the string data for the current chunk
(GetArgData) and the current chunk itself (GetArgChunk). If you have
a handle on a chunk, you can output it several times by calling
TraverseChildrenFromChunk (not TraverseFromChunk, because that
causes infinite recursion).
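
A sketch of why TraverseChildrenFromChunk is the safe choice, using a simplified chunk type, a string in place of the output stream, and a hypothetical \section handler that emits its title twice (as a contents entry and as a heading). The function names follow the text; their signatures are assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified chunk: a macro has a name and argument children; a text chunk
// has text; an argument chunk has neither, only children.
struct Chunk {
    std::string macro;
    std::string text;
    std::vector<Chunk> children;
};

std::string Out;  // stands in for the current output stream

void TraverseFromChunk(const Chunk &c);

// Visits the children only, so it never re-enters the handler for c itself.
void TraverseChildrenFromChunk(const Chunk &c) {
    for (const Chunk &child : c.children) TraverseFromChunk(child);
}

// Hypothetical client handler: for \section it writes the title once as a
// contents entry and again, via the normal traversal, as the heading.
void OnMacro(const Chunk &c, bool start) {
    if (c.macro != "section" || c.children.empty()) return;
    if (start) {
        Out += "[toc:";
        TraverseChildrenFromChunk(c.children[0]);  // the title argument
        Out += "]<h>";
        // TraverseFromChunk(c) here would call OnMacro(c, true) again,
        // which would call TraverseFromChunk(c) again, without end.
    } else {
        Out += "</h>";
    }
}

void TraverseFromChunk(const Chunk &c) {
    if (c.macro.empty()) {  // text or argument chunk
        Out += c.text;
        TraverseChildrenFromChunk(c);
        return;
    }
    OnMacro(c, true);
    TraverseChildrenFromChunk(c);  // each child is an argument chunk
    OnMacro(c, false);
}
```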

The client (here, Tex2RTF) also defines OnError and OnInform output
functions appropriate to the desired user interface.

References
----------

Functions in texutils.cc support adding, finding and resolving
references. WriteTexReferences and ReadTexReferences save references
and read them back between conversion runs, rather like real LaTeX.
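
A minimal sketch of saving and reloading references between runs. The one "label value" pair per line format, the RefTable type and the stream-based signatures are assumptions; the real functions and their file format may differ.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical reference table: label -> resolved text, e.g. a section number.
using RefTable = std::map<std::string, std::string>;

// Writes one "label value" pair per line (an assumed format). Labels and
// values must not contain whitespace.
void WriteTexReferences(const RefTable &refs, std::ostream &out) {
    for (const auto &entry : refs) out << entry.first << ' ' << entry.second << '\n';
}

// Reads pairs written by WriteTexReferences, adding or replacing entries.
void ReadTexReferences(std::istream &in, RefTable &refs) {
    std::string label, value;
    while (in >> label >> value) refs[label] = value;
}
```

As with LaTeX, one run writes the table and the next run reads it back, so references to labels defined later in the document can be resolved.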

Bibliography
------------

Again, texutils.cc provides functions for reading in .bib files and
resolving references. The function OutputBibItem gives a generic way
of outputting bibliography items by 'faking' calls to OnMacro and
OnArgument, allowing the existing low-level client code to take care
of formatting.
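
A sketch of the "faking" idea, assuming integer command identifiers and a toy client that formats italics and bold with HTML-style tags. BibEntry, the ltIT/ltBF identifiers and the hook signatures are hypothetical; the point is that OutputBibItem never formats anything itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical identifiers the client already knows how to format.
enum { ltIT = 1, ltBF = 2 };

// Client side: the same handlers used for ordinary document text.
std::string Out;
void TexOutput(const std::string &s) { Out += s; }
void OnMacro(int macroId, bool start) {
    if (macroId == ltIT) TexOutput(start ? "<i>" : "</i>");
    if (macroId == ltBF) TexOutput(start ? "<b>" : "</b>");
}
void OnArgument(int, int, bool) {}  // nothing extra needed in this client

// Hypothetical bibliography entry as read from a .bib file.
struct BibEntry {
    std::string author, title, year;
};

// Emits "author, title (year)." with the title formatted as if the parser
// had met {\it title}, so the client's italic handling does the work.
void OutputBibItem(const BibEntry &e) {
    TexOutput(e.author + ", ");
    OnMacro(ltIT, true);
    OnArgument(ltIT, 1, true);
    TexOutput(e.title);
    OnArgument(ltIT, 1, false);
    OnMacro(ltIT, false);
    TexOutput(" (" + e.year + ").");
}
```

Because the formatting decisions stay in OnMacro, a different client (RTF rather than HTML-like tags, say) gets correctly formatted bibliography entries without any change to OutputBibItem.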

Units
-----

Unit parsing code is in texutils.cc as ParseUnitArgument, which
converts a dimension argument to points.
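
A sketch of such a conversion, assuming TeX's standard factors (72.27pt to the inch, 2.54cm to the inch, 12pt to the pica). ParseUnitArgument in texutils.cc may use different factors (for example 72 points to the inch) and may return an integer; this version returns a double and treats unknown units as points.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Converts a Latex dimension such as "2in", "10pt" or "3 pc" to points.
// Assumes TeX's standard factors; unknown or missing units count as points.
double ParseUnitArgument(const std::string &arg) {
    const char *start = arg.c_str();
    char *rest = nullptr;
    double value = std::strtod(start, &rest);  // numeric part
    std::string unit(rest);                    // whatever follows the number
    unit.erase(0, unit.find_first_not_of(' '));
    unit.erase(unit.find_last_not_of(' ') + 1);

    double factor = 1.0;                           // pt
    if (unit == "in") factor = 72.27;              // 72.27pt = 1in
    else if (unit == "cm") factor = 72.27 / 2.54;  // 2.54cm = 1in
    else if (unit == "mm") factor = 72.27 / 25.4;
    else if (unit == "pc") factor = 12.0;          // 1pc = 12pt
    else if (unit == "bp") factor = 72.27 / 72.0;  // 72bp = 1in
    return value * factor;
}
```

For example, "2.54cm" and "1in" both come out at 72.27 points.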

Common errors
-------------

1) Macro not found: \end{center} ...

Rewrite:

\begin{center}
{\large{\underline{A}}}
\end{center}

as:

\begin{center}
{\large \underline{A}}
\end{center}

2) Tables crash RTF. Set 'compatibility' to TRUE in the .ini file. Also
check for \\ end-of-row markers on a line of their own, and insert the
correct number of ampersands for the number of columns. For example,

hello & world\\
\\

becomes

hello & world\\
&\\

3) If list items indent erratically, try increasing listItemIndent
to give more space between the label and the following text. Globally
replacing '\item [' with '\item[' may also help, by removing
unnecessary space before the item label.

4) Missing figure or section references: ensure all labels _directly_
follow captions or sections (no intervening white space).
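
Fixes 2 and 3 are mechanical enough to apply with a script before conversion. A minimal sketch; PadEmptyRow and TightenItemLabels are hypothetical helpers, not part of Tex2RTF, and the row check assumes the lone \\ sits on a line by itself, as in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Fix 2: a row consisting only of "\\" gets (columns - 1) ampersands, so it
// has as many cells as the other rows. Other lines are returned unchanged.
std::string PadEmptyRow(const std::string &line, int columns) {
    std::size_t first = line.find_first_not_of(" \t");
    if (first == std::string::npos || columns < 2) return line;
    std::size_t last = line.find_last_not_of(" \t");
    if (line.substr(first, last - first + 1) != "\\\\") return line;
    return std::string(static_cast<std::size_t>(columns - 1), '&') + "\\\\";
}

// Fix 3: replace every "\item [" with "\item[".
std::string TightenItemLabels(std::string text) {
    const std::string from = "\\item [";
    const std::string to = "\\item[";
    for (std::size_t pos = text.find(from); pos != std::string::npos;
         pos = text.find(from, pos + to.size()))
        text.replace(pos, from.size(), to);
    return text;
}
```

For the two-column table in error 2, PadEmptyRow turns the lone \\ line into &\\.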