  The goal of StructuredText is to make it possible to express
  structured text using a relatively simple plain text format. Simple
  structures, like bullets or headings, are indicated through
  conventions that are natural, for some definition of
  "natural". Hierarchical structures are indicated through
  indentation. The use of indentation to express hierarchical
  structure is inspired by the Python programming language.
 
  Use of StructuredText consists of one to three logical steps. In the
  first step, a text string is converted to a network of objects using
  the 'StructuredText.Basic' facility, as in the following example::
 
    import StructuredText

    raw = open("mydocument.txt").read()

    st = StructuredText.Basic(raw)
 
  The output of 'StructuredText.Basic' is simply a
  StructuredTextDocument object containing StructuredTextParagraph
  objects arranged in a hierarchy. Paragraphs are delimited by strings
  of two or more whitespace characters beginning and ending with
  newline characters. Hierarchy is indicated by indentation. The
  indentation of a paragraph is the minimum number of leading spaces
  in a line containing non-white-space characters after converting tab
  characters to spaces (assuming a tab stop every eight characters).
 
  StructuredTextNode objects support the read-only subset of the
  Document Object Model (DOM) API. It should be possible to process
  'StructuredTextNode' hierarchies using XML tools such as XSLT.
 
  The second step in using StructuredText is to apply additional
  structuring rules based on text content. A variety of different
  rules can be used. Typically, these are used to implement a
  structured text language for producing documents, but any sort of
  structured text language could be implemented in the second
  step. For example, it is possible to use StructuredText to implement
  structured text formats for representing structured data. The second
  step, which could consist of multiple processing steps, is
  performed by processing, or "coloring", the hierarchy of generic
  StructuredTextParagraph objects into a network of more specialized
  objects. Typically, the objects produced should also implement the DOM
  API to allow processing with XML tools.
 
  A document processor is provided to convert a StructuredTextDocument
  object containing only StructuredTextParagraph objects
  into a StructuredTextDocument object containing a richer collection
  of objects such as bullets, headings, emphasis, and so on using
  hints in the text. Hints are selected based on conventions of the
  sort typically seen in electronic mail or news-group postings. It
  should be noted, however, that these conventions are somewhat
  culturally dependent. Fortunately, the document processor is easily
  customized to implement alternative rules. Here's an example of
  using the document processor to convert the output of the previous example::
 
    doc = StructuredText.Document(st)
 
  The final step is to process the colored networks produced from the
  second step to produce additional outputs. The final step could be
  performed by Python programs, or by XML tools. A Python outputter is
  provided for the document processor output that produces Hypertext Markup
  Language (HTML) text::
 
    html = StructuredText.HTML(doc)
 
Customizing the document processor
 
  The document processor is driven by two tables. The first table,
  named 'paragraph_types', is a sequence of callable objects or method
  names for coloring paragraphs. If a table entry is a string, then it
  is the name of a method of the document processor to be used. For
  each input paragraph, the objects in the table are called until one
  returns a value (not 'None'). The value returned replaces the
  original input paragraph in the output. If none of the objects in
  the paragraph types table returns a value, then a copy of the
  original paragraph is used. The new object returned by calling a
  paragraph type should implement the ReadOnlyDOM,
  StructuredTextColorizable, and StructuredTextSubparagraphContainer
  interfaces. See the 'Document.py' source file for examples.
 
  A paragraph type may return a list or tuple of replacement
  paragraphs, thus allowing a paragraph to be split into multiple
  paragraphs.
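
  The dispatch rule for 'paragraph_types' can be sketched as follows.
  This is a simplified model rather than the code in 'Document.py':
  paragraphs are plain strings, and 'SketchProcessor', 'Colored',
  'bullet', and 'split_on_semicolons' are hypothetical names invented
  for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Colored:
    """Hypothetical stand-in for a specialized paragraph object."""
    kind: str
    text: str


def split_on_semicolons(text):
    # A callable table entry that splits one paragraph into several.
    if ";" in text:
        return [Colored("plain", part.strip()) for part in text.split(";")]
    return None


class SketchProcessor:
    """Hypothetical stand-in for the document processor."""

    def __init__(self, paragraph_types):
        # Entries are callables or names of methods on this object.
        self.paragraph_types = paragraph_types

    def bullet(self, text):
        # A method-name table entry: recognize "- item" paragraphs.
        if text.startswith("- "):
            return Colored("bullet", text[2:])
        return None

    def color_paragraphs(self, paragraphs):
        output = []
        for paragraph in paragraphs:
            result = None
            for entry in self.paragraph_types:
                handler = getattr(self, entry) if isinstance(entry, str) else entry
                result = handler(paragraph)
                if result is not None:
                    break  # the first entry that returns a value wins
            if result is None:
                # No entry matched: keep the original (strings need no copy).
                output.append(paragraph)
            elif isinstance(result, (list, tuple)):
                output.extend(result)  # one paragraph became several
            else:
                output.append(result)
        return output


processor = SketchProcessor(["bullet", split_on_semicolons])
colored = processor.color_paragraphs(["- one", "a; b", "plain text"])
```

  Because entries are tried in table order, more specific paragraph
  types would normally be listed before more general ones.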
 
  The second table, 'text_types', is a sequence of callable objects or
  method names for coloring text. The callable objects in this table
  are used in sequence to transform the input text into new text or
  objects. The callable objects are passed a string and return
  nothing ('None') or a three-element tuple consisting of:
 
    - a replacement object,

    - a starting position, and

    - an ending position.
 
  The text from the starting position up to the ending position is
  (logically) replaced with the replacement object. The replacement
  object is typically an object that implements the ReadOnlyDOM and
  StructuredTextColorizable interfaces. The replacement object can
  also be a string or a list of strings or objects. Replacement is
  done from beginning to end, and text after the replacement ending
  position will be passed to the text type objects for processing.
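
  As a rough illustration of the 'text_types' protocol, the sketch
  below defines one text type and a loop that applies a table of them.
  It is an assumption-laden model, not the processor's actual code: the
  names 'emphasis' and 'color_text' are hypothetical, and the
  replacement object is a plain tuple rather than a DOM-style object.

```python
import re


def emphasis(text):
    """Hypothetical text type: report the first *word* span, if any."""
    match = re.search(r"\*([^*\s][^*]*)\*", text)
    if match is None:
        return None
    # (replacement object, starting position, ending position)
    return ("em", match.group(1)), match.start(), match.end()


def color_text(text, text_types):
    """Apply each text type in sequence, scanning strings left to right."""
    pieces = [text]
    for text_type in text_types:
        next_pieces = []
        for piece in pieces:
            if not isinstance(piece, str):
                next_pieces.append(piece)  # already replaced by an object
                continue
            while True:
                found = text_type(piece)
                if found is None:
                    break
                replacement, start, end = found
                if start:
                    next_pieces.append(piece[:start])
                next_pieces.append(replacement)
                piece = piece[end:]  # continue after the ending position
            if piece:
                next_pieces.append(piece)
        pieces = next_pieces
    return pieces


colored_text = color_text("a *b* c *d*", [emphasis])
```

  The loop assumes every text type consumes at least one character per
  match; a text type that returned an empty span would need extra
  guarding.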
 
Example: adding wiki links
 
  We want to add support for Wiki links. A Wiki link is a string of
  text containing mixed-case letters, such that at least two of the
  letters are upper case and such that the first letter is upper case.
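
  The definition above can be written down as a small predicate. This
  is one possible reading, with "mixed-case" taken to mean that the
  word also contains at least one lower-case letter; the names
  'is_wiki_link' and 'find_wiki_links' are hypothetical.

```python
import re


def is_wiki_link(word):
    """Check a single word against the Wiki link definition."""
    if not word.isalpha() or not word[0].isupper():
        return False  # letters only, and the first one upper case
    upper_count = sum(1 for ch in word if ch.isupper())
    has_lower = any(ch.islower() for ch in word)
    return upper_count >= 2 and has_lower


def find_wiki_links(text):
    """Return the runs of ASCII letters in text that qualify as Wiki links."""
    return [m.group() for m in re.finditer(r"[A-Za-z]+", text)
            if is_wiki_link(m.group())]


links = find_wiki_links("See FrontPage and ZopeWiki, not Zope or HTML.")
```

  Under this reading, "Zope" is rejected because it has only one
  upper-case letter, and "HTML" is rejected because it is not mixed
  case.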