Using Structured Text

The goal of StructuredText is to make it possible to express
structured text using a relatively simple plain text format. Simple
structures, like bullets or headings, are indicated through
conventions that are natural, for some definition of
"natural". Hierarchical structures are indicated through
indentation. The use of indentation to express hierarchical
structure is inspired by the Python programming language.

Use of StructuredText consists of one to three logical steps. In the
first step, a text string is converted to a network of objects using
the 'StructuredText.Basic' facility, as in the following
example::

    import StructuredText
    with open("mydocument.txt") as f:
        raw = f.read()
    st = StructuredText.Basic(raw)

The output of 'StructuredText.Basic' is simply a
StructuredTextDocument object containing StructuredTextParagraph
objects arranged in a hierarchy. Paragraphs are delimited by strings
of two or more whitespace characters beginning and ending with
newline characters. Hierarchy is indicated by indentation. The
indentation of a paragraph is the minimum number of leading spaces
over all of its lines that contain non-whitespace characters, after
converting tab characters to spaces (assuming a tab stop every eight
characters).
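
The paragraph and indentation rules above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the 'StructuredText.Basic' implementation: 'Paragraph', 'indentation' and 'parse' are hypothetical names, and nesting each paragraph under the nearest preceding paragraph with a smaller indentation is an assumed reading of "hierarchy is indicated by indentation".

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Paragraph:
    """Hypothetical stand-in for StructuredTextParagraph."""
    text: str
    indent: int
    children: List["Paragraph"] = field(default_factory=list)


# A paragraph break: a newline, any run of whitespace, then another newline.
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")


def indentation(paragraph: str) -> int:
    """Minimum leading spaces over non-blank lines, tab stops every 8 columns."""
    widths = [
        len(line) - len(line.lstrip(" "))
        for line in paragraph.expandtabs(8).split("\n")
        if line.strip()
    ]
    return min(widths) if widths else 0


def parse(raw: str) -> List[Paragraph]:
    """Split 'raw' into paragraphs and nest each one under the nearest
    preceding paragraph that is indented less than it is."""
    root = Paragraph("", -1)  # sentinel: never popped, since indents are >= 0
    stack = [root]
    for chunk in PARAGRAPH_BREAK.split(raw):
        if not chunk.strip():
            continue  # skip whitespace-only chunks at the start or end
        para = Paragraph(chunk.strip(), indentation(chunk))
        # Close every open paragraph indented at least as deeply as this one.
        while stack[-1].indent >= para.indent:
            stack.pop()
        stack[-1].children.append(para)
        stack.append(para)
    return root.children
```

For example, 'parse("Title\n\n  Body")' returns one top-level paragraph, 'Title', whose single child is the indented 'Body' paragraph.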

StructuredTextNode objects support the read-only subset of the
Document Object Model (DOM) API. It should be possible to process
'StructuredTextNode' hierarchies using XML tools such as XSLT.

The second step in using StructuredText is to apply additional
structuring rules based on text content. A variety of different
rules can be used. Typically, these are used to implement a
structured text language for producing documents, but any sort of
structured text language could be implemented in the second
step. For example, it is possible to use StructuredText to implement
structured text formats for representing structured data. The second
step, which could consist of multiple processing steps, is
performed by processing, or "coloring", the hierarchy of generic
StructuredTextParagraph objects into a network of more specialized
objects. Typically, the objects produced should also implement the DOM
API to allow processing with XML tools.
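
Coloring can be pictured as a tree transformation applied in one or more passes. The sketch below assumes a hypothetical 'Node' class for generic paragraphs and a hypothetical 'Heading' subclass for a specialized one; each pass is a function from node to node, applied to the subparagraphs before their parent. The heading rule (a one-line paragraph that has subparagraphs) is only an illustrative convention, not necessarily the one the document processor uses.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical generic paragraph: some text plus its subparagraphs."""
    text: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class Heading(Node):
    """Hypothetical specialized node produced by a coloring pass."""


ColoringPass = Callable[[Node], Node]


def color_tree(node: Node, color: ColoringPass) -> Node:
    """Color the subparagraphs first, then hand the rebuilt node to the pass."""
    children = [color_tree(child, color) for child in node.children]
    return color(replace(node, children=children))


def run_passes(root: Node, passes: List[ColoringPass]) -> Node:
    """The second step may consist of several passes, applied in order."""
    for color in passes:
        root = color_tree(root, color)
    return root


def strip_text(node: Node) -> Node:
    # A trivial pass: trim surrounding whitespace from every node's text.
    return replace(node, text=node.text.strip())


def headings(node: Node) -> Node:
    # Illustrative rule: a non-empty one-line paragraph with subparagraphs
    # becomes a Heading; everything else is returned unchanged.
    if type(node) is Node and node.text and "\n" not in node.text and node.children:
        return Heading(node.text, node.children)
    return node
```

Pass order matters: with '[strip_text, headings]' a paragraph such as " Intro \n" is trimmed first and then recognized as a heading, while '[headings, strip_text]' leaves it a plain 'Node' because the newline is still present when 'headings' runs.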

A document processor is provided to convert a StructuredTextDocument
object containing only StructuredTextParagraph objects
into a StructuredTextDocument object containing a richer collection
of objects, such as bullets, headings, and emphasis, using
hints in the text. Hints are selected based on conventions of the
sort typically seen in electronic mail or news-group postings. It
should be noted, however, that these conventions are somewhat
culturally dependent; fortunately, the document processor is easily
customized to implement alternative rules. Here's an example of
using the document processor to convert the output of the previous example::

    doc = StructuredText.Document(st)

The final step is to process the colored network produced by the
second step into additional outputs. The final step could be
performed by Python programs or by XML tools. For the document
processor's output, a Python outputter is provided that produces
Hypertext Markup Language (HTML) text::

    html = StructuredText.HTML(doc)
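
An outputter is essentially a recursive walk over the colored network. The following sketch does not reproduce 'StructuredText.HTML'; it assumes a hypothetical 'Element' class whose 'kind' names the node type, escapes all text with the standard 'html.escape', and, as a further assumption, maps heading nesting depth to '<h1>' through '<h6>'.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Union


@dataclass
class Element:
    """Hypothetical colored node: a kind, inline content, and subparagraphs."""
    kind: str  # e.g. "document", "heading", "paragraph" or "emphasis"
    content: List[Union[str, "Element"]] = field(default_factory=list)
    children: List["Element"] = field(default_factory=list)


# Tags for non-heading kinds; kinds not listed (such as "document")
# produce no tag of their own.
TAGS = {"paragraph": "p", "emphasis": "em"}


def to_html(node: Union[str, Element], level: int = 1) -> str:
    """Render a colored node (or a plain string) as HTML."""
    if isinstance(node, str):
        return escape(node)
    inner = "".join(to_html(part, level) for part in node.content)
    if node.kind == "heading":
        tag = f"h{min(level, 6)}"
        child_level = level + 1  # subparagraphs of a heading nest one level deeper
    else:
        tag = TAGS.get(node.kind)
        child_level = level
    body = "".join(to_html(child, child_level) for child in node.children)
    if tag is None:
        return inner + body
    return f"<{tag}>{inner}</{tag}>" + body
```

Keeping the kind-to-tag mapping in a table means an alternative output format only needs a different table and wrapper strings, not a different walk.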

Customizing the document processor

The document processor is driven by two tables. The first table,
named 'paragraph_types', is a sequence of callable objects or method
names for coloring paragraphs. If a table entry is a string, it is
the name of a document processor method to be used. For each input
paragraph, the objects in the table are called until one returns a
value other than 'None'. The value returned replaces the original
input paragraph in the output. If none of the objects in the
paragraph types table returns a value, a copy of the original
paragraph is used. The new object returned by calling a paragraph
type should implement the ReadOnlyDOM, StructuredTextColorizable,
and StructuredTextSubparagraphContainer interfaces. See the
'Document.py' source file for examples.

A paragraph type may return a list or tuple of replacement
paragraphs, thus allowing a paragraph to be split into multiple
paragraphs.
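
The table-driven dispatch described above can be sketched as follows. The class, its 'doc_bullet' and 'doc_split' methods, and the 'shout' function are hypothetical; paragraphs are plain strings here rather than StructuredTextParagraph objects, and calling non-string entries with the paragraph as their only argument is an assumption about the calling convention.

```python
from typing import Callable, List, Sequence, Union


class Bullet:
    """Hypothetical specialized paragraph object."""

    def __init__(self, text: str):
        self.text = text


def shout(paragraph: str):
    # A plain callable entry: upper-case paragraphs that end with "!".
    return paragraph.upper() if paragraph.endswith("!") else None


class DocumentProcessor:
    # Entries are method names (strings) or callables, tried in order.
    paragraph_types: Sequence[Union[str, Callable]] = ("doc_bullet", "doc_split", shout)

    def doc_bullet(self, paragraph: str):
        # Illustrative convention: "* text" becomes a Bullet.
        return Bullet(paragraph[2:]) if paragraph.startswith("* ") else None

    def doc_split(self, paragraph: str):
        # Returning a list splits one paragraph into several.
        return paragraph.split(" // ") if " // " in paragraph else None

    def color_paragraphs(self, paragraphs: List[str]) -> list:
        result = []
        for paragraph in paragraphs:
            for entry in self.paragraph_types:
                # A string names a processor method; anything else is called directly.
                rule = getattr(self, entry) if isinstance(entry, str) else entry
                replacement = rule(paragraph)
                if replacement is not None:
                    break  # the first non-None value wins
            else:
                # Nothing matched: keep the original (strings are immutable,
                # so no explicit copy is needed here).
                replacement = paragraph
            if isinstance(replacement, (list, tuple)):
                result.extend(replacement)
            else:
                result.append(replacement)
        return result
```

With this table, coloring '["* milk", "a // b", "hi!", "plain"]' yields a 'Bullet' for "milk", the two strings "a" and "b", "HI!", and the unchanged "plain".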

The second table, 'text_types', is a sequence of callable objects or
method names for coloring text. The callable objects in this table
are used in sequence to transform the input text into new text or
objects. The callable objects are passed a string and return
nothing ('None') or a three-element tuple consisting of:

- a replacement object,

- a starting position, and

- an ending position.

The text from the starting position to the ending position is
(logically) replaced with the replacement object. The replacement
object is typically an object that implements the ReadOnlyDOM and
StructuredTextColorizable interfaces. The replacement object can
also be a string or a list of strings or objects. Replacement is
done from beginning to end, and the text after the replacement's
ending position is passed to the text type objects again for
further processing.
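
A driver for the 'text_types' table might look like this minimal sketch. 'Emphasis', the '*word*' rule and 'color_text' are hypothetical names, and the exact order in which the real processor applies each text type is an assumption: here each type is applied in turn to every remaining plain-text piece, and objects that are already colored are left alone.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Emphasis:
    """Hypothetical colored text object."""
    text: str


TextMatch = Optional[Tuple[object, int, int]]
TextType = Callable[[str], TextMatch]


def emphasis(text: str) -> TextMatch:
    """Illustrative text type: '*word*' becomes Emphasis('word')."""
    m = re.search(r"\*([^*\s][^*]*)\*", text)
    if m is None:
        return None
    return Emphasis(m.group(1)), m.start(), m.end()


def color_text(text: str, text_types: List[TextType]) -> list:
    """Apply each text type in sequence; return strings and colored objects."""
    pieces: list = [text]
    for text_type in text_types:
        colored: list = []
        for piece in pieces:
            if not isinstance(piece, str):
                colored.append(piece)  # already colored: leave it alone
                continue
            while piece:
                match = text_type(piece)
                if match is None or match[2] <= 0:
                    # No match, or one that consumes nothing (avoids looping forever).
                    colored.append(piece)
                    break
                replacement, start, end = match
                if start > 0:
                    colored.append(piece[:start])  # plain text before the match
                if isinstance(replacement, list):
                    colored.extend(replacement)
                else:
                    colored.append(replacement)
                piece = piece[end:]  # text after the match is scanned again
        pieces = colored
    return pieces
```

For instance, 'color_text("a *b* c *d*", [emphasis])' produces the plain strings "a " and " c " interleaved with 'Emphasis("b")' and 'Emphasis("d")'.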

Example: adding wiki links

We want to add support for Wiki links. A Wiki link is a string of
mixed-case letters in which the first letter is upper case and at
least two of the letters are upper case.
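
Following the 'text_types' protocol, a Wiki link recognizer is a callable that takes a string and returns either 'None' or a (replacement, start, end) tuple. In this sketch, 'WikiLink', 'is_wiki_name' and 'wiki_link' are hypothetical names; it considers ASCII letters only and reads "mixed-case" as requiring at least one lower-case letter, both assumptions. How the function is registered in the document processor's table is not covered here.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class WikiLink:
    """Hypothetical colored object for a Wiki link."""
    name: str


# Candidate words: an upper-case letter followed by at least one more letter.
CANDIDATE = re.compile(r"\b[A-Z][A-Za-z]+\b")


def is_wiki_name(word: str) -> bool:
    """First letter upper case, at least two upper-case letters, and mixed case."""
    upper = sum(1 for ch in word if ch.isupper())
    has_lower = any(ch.islower() for ch in word)
    return word[:1].isupper() and upper >= 2 and has_lower


def wiki_link(text: str) -> Optional[Tuple[WikiLink, int, int]]:
    """A text type: return (replacement, start, end) for the first
    Wiki link in 'text', or None if there is none."""
    for m in CANDIDATE.finditer(text):
        if is_wiki_name(m.group()):
            return WikiLink(m.group()), m.start(), m.end()
    return None
```

Under these rules "FrontPage" is a Wiki link, while "Python" (only one upper-case letter) and "NASA" (no lower-case letters) are not.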