The goal of StructuredText is to make it possible to express
structured text using a relatively simple plain text format. Simple
structures, like bullets or headings, are indicated through
conventions that are natural, for some definition of
"natural". Hierarchical structures are indicated through
indentation. The use of indentation to express hierarchical
structure is inspired by the Python programming language.

Use of StructuredText consists of one to three logical steps. In the
first step, a text string is converted to a network of objects using
the 'StructuredText.Basic' facility, as in the following example::

  import StructuredText
  raw=open("mydocument.txt").read()   # read the source text
  st=StructuredText.Basic(raw)        # build the paragraph hierarchy

The output of 'StructuredText.Basic' is simply a
StructuredTextDocument object containing StructuredTextParagraph
objects arranged in a hierarchy. Paragraphs are delimited by strings
of two or more whitespace characters beginning and ending with
newline characters. Hierarchy is indicated by indentation. The
indentation of a paragraph is the minimum number of leading spaces
in a line containing non-whitespace characters after converting tab
characters to spaces (assuming a tab stop every eight characters).
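
The two rules above (paragraph delimiting and indentation) can be
sketched in a few lines of Python. This is only an illustration of the
rules as stated, not the code used by 'StructuredText.Basic'; the names
'split_paragraphs' and 'indentation' are hypothetical.

```python
import re

# Hypothetical helpers for illustration; not part of StructuredText.
# A paragraph break is a run of whitespace that begins and ends with a newline.
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")

def split_paragraphs(text):
    """Split text at paragraph breaks, dropping empty pieces."""
    return [p for p in PARAGRAPH_BREAK.split(text) if p.strip()]

def indentation(paragraph, tabsize=8):
    """Minimum count of leading spaces over the non-blank lines."""
    widths = []
    # Tabs are expanded first, assuming a tab stop every `tabsize` columns.
    for line in paragraph.expandtabs(tabsize).splitlines():
        if line.strip():
            widths.append(len(line) - len(line.lstrip(" ")))
    return min(widths) if widths else 0

sample = "Title\n\n  first line\n\tsecond line\n\n    deeper"
paragraphs = split_paragraphs(sample)
levels = [indentation(p) for p in paragraphs]
```

In the sample, the second paragraph mixes a two-space line with a
tab-indented line, so its indentation is the smaller of the two values.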

StructuredTextNode objects support the read-only subset of the
Document Object Model (DOM) API. It should be possible to process
'StructuredTextNode' hierarchies using XML tools such as XSLT.

The second step in using StructuredText is to apply additional
structuring rules based on text content. A variety of different
rules can be used. Typically, these are used to implement a
structured text language for producing documents, but any sort of
structured text language could be implemented in the second
step. For example, it is possible to use StructuredText to implement
structured text formats for representing structured data. The second
step, which could consist of multiple processing steps, is
performed by processing, or "coloring", the hierarchy of generic
StructuredTextParagraph objects into a network of more specialized
objects. Typically, the objects produced should also implement the DOM
API to allow processing with XML tools.

A document processor is provided to convert a StructuredTextDocument
object containing only StructuredTextParagraph objects
into a StructuredTextDocument object containing a richer collection
of objects such as bullets, headings, emphasis, and so on using
hints in the text. Hints are selected based on conventions of the
sort typically seen in electronic mail or news-group postings. It
should be noted, however, that these conventions are somewhat
culturally dependent. Fortunately, the document processor is easily
customized to implement alternative rules. Here's an example of
using the document processor to convert the output of the previous
example::

  doc=StructuredText.Document(st)

The final step is to process the colored networks produced from the
second step to produce additional outputs. The final step could be
performed by Python programs, or by XML tools. A Python outputter is
provided for the document processor output that produces Hypertext Markup
Language (HTML) text::

  html=StructuredText.HTML(doc)

Customizing the document processor

The document processor is driven by two tables. The first table,
named 'paragraph_types', is a sequence of callable objects or method
names for coloring paragraphs. If a table entry is a string, then it
is the name of a method of the document processor to be used. For
each input paragraph, the objects in the table are called until one
returns a value (not 'None'). The value returned replaces the
original input paragraph in the output. If none of the objects in
the paragraph types table returns a value, then a copy of the
original paragraph is used. The new object returned by calling a
paragraph type should implement the ReadOnlyDOM,
StructuredTextColorizable, and StructuredTextSubparagraphContainer
interfaces. See the 'Document.py' source file for examples.

A paragraph type may return a list or tuple of replacement
paragraphs, which allows a paragraph to be split into multiple
paragraphs.
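
The dispatch rule for 'paragraph_types' can be sketched as follows.
This is a simplified illustration, not the code in 'Document.py':
paragraphs are plain strings here, and 'SketchProcessor', 'Bullet',
'doc_bullet', and 'split_on_bar' are hypothetical names. The sketch
also assumes that replacement paragraphs are not themselves colored
again.

```python
import copy
from dataclasses import dataclass

@dataclass
class Bullet:
    """Hypothetical colored paragraph; real ones implement the DOM interfaces."""
    text: str

def split_on_bar(paragraph):
    """Illustrative rule: split 'a | b' into two replacement paragraphs."""
    if " | " in paragraph:
        return paragraph.split(" | ")
    return None

class SketchProcessor:
    # Entries are callables or names of methods on the processor.
    paragraph_types = ["doc_bullet", split_on_bar]

    def doc_bullet(self, paragraph):
        # Illustrative rule: a leading "- " marks a bullet.
        if paragraph.startswith("- "):
            return Bullet(paragraph[2:])
        return None

    def color_paragraphs(self, paragraphs):
        output = []
        for paragraph in paragraphs:
            for entry in self.paragraph_types:
                # A string entry names a method of the processor.
                handler = getattr(self, entry) if isinstance(entry, str) else entry
                result = handler(paragraph)
                if result is not None:
                    break  # the first non-None value wins
            else:
                # No paragraph type matched: use a copy of the original.
                result = copy.copy(paragraph)
            if isinstance(result, (list, tuple)):
                output.extend(result)  # one paragraph split into several
            else:
                output.append(result)
        return output

colored = SketchProcessor().color_paragraphs(["- item", "a | b", "plain"])
```

Because string entries are looked up with 'getattr', a subclass can
change a rule by overriding the method without touching the table.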

The second table, 'text_types', is a sequence of callable objects or
method names for coloring text. The callable objects in this table
are used in sequence to transform the input text into new text or
objects. The callable objects are passed a string and return
nothing ('None') or a three-element tuple consisting of:

- a replacement object,

- a starting position, and

- an ending position.

The text from the starting position up to the ending position is
(logically) replaced with the replacement object. The replacement
object is typically an object that implements the ReadOnlyDOM and
StructuredTextColorizable interfaces. The replacement object can
also be a string or a list of strings or objects. Replacement is
done from beginning to end, and text after the replacement ending
position will be passed to the text type objects for processing.
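
The replacement protocol can be sketched with a single text type
applied repeatedly from left to right. This is an illustration under
stated assumptions, not the document processor's own loop: 'color_text',
'Emphasis', and 'emphasis' are hypothetical names, and the sketch
assumes every match has an ending position greater than zero.

```python
import re
from dataclasses import dataclass

@dataclass
class Emphasis:
    """Hypothetical replacement object for text marked as *emphasized*."""
    text: str

EMPHASIS = re.compile(r"\*([^*]+)\*")

def emphasis(text):
    """Text type: return (replacement, start, end) or None."""
    m = EMPHASIS.search(text)
    if m is None:
        return None
    return Emphasis(m.group(1)), m.start(), m.end()

def color_text(text, text_type):
    """Apply one text type repeatedly, collecting strings and objects."""
    pieces = []
    while text:
        match = text_type(text)
        if match is None:
            pieces.append(text)  # nothing left to color
            break
        replacement, start, end = match
        if end <= 0:
            raise ValueError("a text type must consume at least one character")
        if start:
            pieces.append(text[:start])  # text before the match is kept
        if isinstance(replacement, list):
            pieces.extend(replacement)   # a list of strings or objects
        else:
            pieces.append(replacement)
        text = text[end:]                # the remainder is processed next
    return pieces

pieces = color_text("a *b* c *d*", emphasis)
```

A fuller version would run every entry of 'text_types' in sequence over
the string pieces left by the previous entry.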

Example: adding wiki links

We want to add support for Wiki links. A Wiki link is a string of
text containing mixed-case letters, such that at least two of the
letters are upper case and such that the first letter is upper case.
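
A text type for this definition might look like the sketch below. It
follows the three-element return convention described for 'text_types',
but 'WikiLink', 'is_wiki_name', and 'doc_wiki_link' are hypothetical
names, and two points are assumptions: only ASCII letters are
considered, and "mixed-case" is read as requiring at least one
lower-case letter.

```python
import re
from dataclasses import dataclass

@dataclass
class WikiLink:
    """Hypothetical replacement object for a Wiki link."""
    name: str

# Assumption: a candidate is a whole word made of ASCII letters only.
WORD = re.compile(r"\b[A-Za-z]+\b")

def is_wiki_name(word):
    """First letter upper case, at least two upper-case letters, mixed case."""
    uppers = sum(1 for c in word if c.isupper())
    has_lower = any(c.islower() for c in word)
    return word[:1].isupper() and uppers >= 2 and has_lower

def doc_wiki_link(text):
    """Text type: return (replacement, start, end) for the first Wiki link."""
    for m in WORD.finditer(text):
        if is_wiki_name(m.group()):
            return WikiLink(m.group()), m.start(), m.end()
    return None

found = doc_wiki_link("See FrontPage for details")
```

Under these assumptions "FrontPage" qualifies, while "See" (one
upper-case letter) and "NASA" (no lower-case letter) do not.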