SecuritySNACCRuntime/doc/design.tex

   1 % file: .../doc/design.tex
   2
   3 % $Header: /cvs/Darwin/Security/SecuritySNACCRuntime/doc/design.tex,v 1.1.1.1 2001/05/18 23:14:10 mb Exp $
   4 % $Log: design.tex,v $
   5 % Revision 1.1.1.1  2001/05/18 23:14:10  mb
   6 % Move from private repository to open source repository
   7 %
   8 % Revision 1.1.1.1  1999/03/16 18:05:52  aram
   9 % Originals from SMIME Free Library.
  10 %
  11 % Revision 1.1  1997/01/01 22:47:31  rj
  12 % first check-in
  13 %
  14
  15 \chapter{\label{comp-des-chapter}Compiler Design}
  16
  17 \section{\label{comp-overview-section}Overview}
  18 The Snacc compiler is implemented with {\ufn yacc}, {\ufn lex}
  19 (actually GNU's equivalents, {\ufn bison} and {\ufn flex}) and
  20 \verb$C$.  Despite the shortcomings of {\ufn lex} and {\ufn yacc},
  21 they provide reasonable performance without too much programming
  22 effort.  Since {\ufn yacc} parsers are extremely difficult to modify
  23 during runtime, any macro that you want the compiler to handle must
  24 be hand coded into the ASN.1 {\ufn yacc} grammar
  25 ({\ufn \dots/compiler/core/parse-asn1.y}) followed by recompilation of snacc.
  26 Macro definitions do not need special consideration since they are
  27 skipped by the compiler.  Macro definitions and complex value notation
  28 are kept as text in the data structure resulting from a parse if you
  29 want to try to parse and process them.
  30
  31 To handle the anti-compiler nature of ASN.1's syntax, snacc makes
  32 several passes on parse tree data structure when compiling.  None of
  33 these passes creates temporary files; this allows snacc to process
  34 large ASN.1 specifications quite quickly.  Each compiler pass is
  35 explained in the next sections.  The main passes of the compiler are
  36 executed in the following order:
  37
  38 \begin{enumerate}
  39 \item parse useful types ASN.1 module
  40 \item parse all user specified ASN.1 modules
  41 \item link local and imported type references in all modules
  42 \item parse values in all modules
  43 \item link local and imported value references in all modules
  44 \item process any macro types
  45 \item normalize types
  46 \item mark recursive types and signal any recursion related errors
  47 \item check for semantic errors in all modules
  48 \item generate C/C++ type information for each ASN.1 type
  49 \item Sort the types from least dependent to most dependent
  50 \item generate the C, C++, IDL or type table
  51 \end{enumerate}
  52
  53 The source code for the compiler resides in {\ufn \dots/compiler/} and the
  54 back ends are in {\ufn \dots/compiler/back-ends/c-gen/}, {\ufn \dots/compiler/back-ends/c++-gen/} and {\ufn \dots/compiler/back-ends/idl-gen/}.
  55
  56 \section{\label{comp-pass1-section}Pass 1: Parsing the Useful Types Module}
  57 The ASN.1 useful types are not hardwired into snacc.  Instead they
  58 have been placed in a separate ASN.1 module.  This allows the user to
  59 define his own useful types or re-define the existing ones without
  60 modifying snacc.  This also has the benefit that names of useful types
  61 are not keywords in the lexical analyzer.  This step is not really a
  62 compiler pass on the module data, however it is described as one for
  63 simplicity.
  64
  65 The useful types module should be passed to snacc with the {\ufn -u}
  66 flag in front of it.  The {\ufn -u} flag tells snacc to treat the
  67 module in a special way.  Instead of parsing the module and generating
  68 code for it, snacc parses the module and makes the types in it
  69 accessible to all of the other modules being parsed.  Note that the
  70 other modules do not need to explicitly import from the useful types
  71 module.  See Section~\ref{comp-pass3-section} for more information on how
  72 useful types affect linking.
  73
  74 The encode, decode, and other routines for the useful types are in the
  75 runtime library.  Currently, the useful types library routines are the
  76 same as the ones the compiler would normally generate given the useful
  77 types module.  However, since they are in the library, you can modify
  78 them to check character sets (string types), or convert local time
  79 formats into their BER equivalent (UTCTime, GeneralizedTime).
  80
  81 The following types are in the useful types module:
  82 \begin{small}
  83 \begin{verbatim}
  84 ASN-USEFUL DEFINITIONS ::=
  85 BEGIN
  86 ObjectDescriptor ::= [UNIVERSAL 7]  IMPLICIT OCTET STRING
  87 NumericString    ::= [UNIVERSAL 18] IMPLICIT OCTET STRING
  88 PrintableString  ::= [UNIVERSAL 19] IMPLICIT OCTET STRING
  89 TeletexString    ::= [UNIVERSAL 20] IMPLICIT OCTET STRING
  90 T61String        ::= [UNIVERSAL 20] IMPLICIT OCTET STRING
  91 VideotexString   ::= [UNIVERSAL 21] IMPLICIT OCTET STRING
  92 IA5String        ::= [UNIVERSAL 22] IMPLICIT OCTET STRING
  93 GraphicString    ::= [UNIVERSAL 25] IMPLICIT OCTET STRING
  94 VisibleString    ::= [UNIVERSAL 26] IMPLICIT OCTET STRING
  95 ISO646String     ::= [UNIVERSAL 26] IMPLICIT OCTET STRING
  96 GeneralString    ::= [UNIVERSAL 27] IMPLICIT OCTET STRING
  97 UTCTime          ::= [UNIVERSAL 23] IMPLICIT OCTET STRING
  98 GeneralizedTime  ::= [UNIVERSAL 24] IMPLICIT OCTET STRING
  99
 100 EXTERNAL         ::= [UNIVERSAL 8] IMPLICIT SEQUENCE
 101 {
 102   direct-reference      OBJECT IDENTIFIER OPTIONAL,
 103   indirect-reference    INTEGER OPTIONAL,
 104   data-value-descriptor ObjectDescriptor OPTIONAL,
 105   encoding CHOICE
 106   {
 107     single-ASN1-type [0] OCTET STRING,  -- should be ANY
 108     octet-aligned    [1] IMPLICIT OCTET STRING,
 109     arbitrary        [2] IMPLICIT BIT STRING
 110   }
 111 }
 112 END
 113 \end{verbatim}
 114 \end{small}
 115
 116 If you use the EXTERNAL type, you must provide the mechanism to encode
 117 and decode the value in the embedded CHOICE, \verb$encoding$.  The
 118 type and transfer syntax of the value in an EXTERNAL type is not known
 119 when the ASN.1 code is compiled by snacc.  Snacc cannot generate
 120 encoders and decoders without complete type information and only
 121 supports a single set of encoding rules, BER\@.
 122
 123 \section{\label{comp-pass2-section}Pass 2: Parsing ASN.1 Modules}
 124 During this pass, all of the specified modules are parsed into the {\em
 125 Module} data structure.  The ASN.1 source files are not consulted
 126 again, after they are parsed.  {\ufn Yacc} and {\ufn lex} are doing the work in
 127 this step. (see files {\ufn snacc.c}, {\ufn lex-asn1.l}, {\ufn parse-asn1.y}
 128 and {\ufn asn1module.h}).
 129
 130 A lexical tie-in is where the yacc parser puts the lexical analyzer
 131 into a different mode (and is usually considered a hack).  The
 132 different modes tokenize symbols differently, which is useful for
 133 skipping well delimited sections that cannot be parsed easily by a
 134 {\ufn yacc} parser on the first pass.  Lexical tie-ins are used in two
 135 places to simplify the ASN.1 grammar sufficiently for {\ufn yacc} and
 136 {\ufn lex}.  There are two special modes in the lexical analyzer, one
 137 for ASN.1 macro definitions and the other for ASN.1 values enclosed in
 138 \{\}'s.
 139
 140 The lexical tie-in for eating macro definition bodies works with macro
 141 definitions of the following form:
 142
 143 \begin{verbatim}
 144 <upper case identifier> MACRO ::= BEGIN ... END
 145 \end{verbatim}
 146
 147 Everything between the {\ASN BEGIN} and {\ASN END} is stuffed into a
 148 string by {\ufn lex} and passed back as single token to the
 149 {\ufn yacc} parser.
 150
 151 Values within \{\}'s are grabbed in a similar way.  Value parsing
 152 cannot really be done at this stage since complete type information is
 153 needed and the types are not fully parsed or linked yet.
 154
 155 Most syntax errors are reported during this pass.  If syntax errors
 156 are encountered, snacc will report as many as it can from the
 157 offending module before the parser is hopelessly lost and then exit.
 158 If the types and values are separated with semi-colons, the parser can
 159 recover after a syntax error and attempt to find more errors in that
 160 module before exiting.
 161
 162
 163 \section{\label{comp-pass3-section}Pass 3: Linking Types}
 164 The third pass links all type references.  Snacc attempts to resolve
 165 any currently visible (i.\ e.\  not in macro definitions or constructed
 166 values) type reference.  This includes type references in simple value
 167 definitions and subtyping information.  The useful types module (if
 168 given) is linked first.
 169
 170 Snacc will exit after this pass if any type references could not be
 171 resolved.  Error messages with file and line number information will
 172 be printed to {\C stderr}.
 173
 174 This pass also counts and stores the number of times a type definition is
 175 referenced locally and from other modules.  This information is used
 176 during the type sorting pass.
 177
 178 First, each module identifier is checked for conflicts with the
 179 others.  If the module identifier includes an OBJECT IDENTIFIER, snacc
 180 only checks for conflicts with the other module identifier OBJECT
 181 IDENTIFIERs.  When only a module name is provided, snacc checks for
 182 conflicts with the the other module names, even if the other module
 183 identifiers include OBJECT IDENTIFIERs.  If the OBJECT IDENTIFIER of
 184 a module identifier contains any value references, it will be ignored
 185 for module look-up purposes.  Note that value references within the
 186 module identifier OBJECT IDENTIFIERs are not allowed in the 1992
 187 version of ASN.1 due to the difficulty in module name resolution they
 188 present.
 189
 190 Two modules with the same name but different OBJECT IDENTIFIERs are
 191 not considered an error within ASN.1.  However, because the generated
 192 files use the module name as part of their name, the code generation
 193 pass will gripe about and fail for modules with the same name.
 194
 195 Next, each module's import {\em lists} are resolved by finding the
 196 named module and then verifying that the named module contains all of
 197 the imported types.
 198
 199 Then for each module, each type reference (except those of the form
 200 {\em modulename.typename}) is assumed to be a local type reference and
 201 the linker attempts to find a local type definition of the same name
 202 to resolve it with.  If a matching local definition is found, the type
 203 reference is resolved and the linker continues with the next type
 204 reference.
 205
 206 For each type reference of the form {\em modulename.typename}, the
 207 linker looks in the module with name {\em modulename} for the type
 208 {\em typename}.  If the type is found the reference is resolved,
 209 otherwise a linking error is reported.  Note that this form of type
 210 reference provides a special scope that does not conflict with other
 211 local or imported types in that module.
 212
 213 For type references that failed to resolve locally and are not of the
 214 form {\em modulename.typename}, the linker looks in the import lists
 215 of the current type reference's module for a type to resolve with.  If
 216 the type is found in the import lists, the reference is resolved.
 217
 218 For the remaining unresolved type references (failed local and legal
 219 import resolution and are not of the form {\em modulename.typename}),
 220 the linker looks in the useful types module, if one was specified with
 221 the {\ufn -u} option.  If the type is found in the useful types module
 222 then the reference is resolved, otherwise a linking error is reported.
 223
 224 Note that when a useful types module is specified, it is globally
 225 available to all modules, but it has the lowest linking priority.
 226 That is, if a type reference can be resolved legally without the
 227 useful types module, it will be.
 228
 229 Some type checking must be done in this pass to link certain types
 230 properly.  These include:
 231 \begin{itemize}
 232 \item {a SELECTION type must reference a field of a CHOICE type.}
 233 \item {a COMPONENTS OF type in a SET must reference a SET.}
 234 \item {a COMPONENTS OF type in a SEQUENCE must reference a SEQUENCE.}
 235 \end{itemize}
 236
 237
 238
 239 \section{\label{comp-pass4-section}Pass 4: Parsing Values}
 240 The fourth pass attempts to parse any value that is enclosed in \{\}'s in
 241 the given modules.  INTEGERS, REALs and BOOLEANS that are not enclosed in
 242 braces are parsed in the first pass.
 243
 244 The value parser is implemented without {\ufn yacc} and {\ufn lex} and
 245 uses each value's type information to help parse the value.  Values
 246 within \{\}'s hidden within types such as default values and parts of
 247 subtypes are not parsed.  Since subtypes and default values do not
 248 affect the generated code, upgrading the value parser in this respect
 249 is not very useful.
 250
 251 The only type of value in \{\}'s that is parsed is the OBJECT
 252 IDENTIFIER\@.  All of the OBJECT IDENTIFIER value forms are supported
 253 but snacc loosens the restrictions on using arc names defined in the
 254 OBJECT IDENTIFIER tree.
 255
 256 ASN.1 allows OBJECT IDENTIFIER values to reference special built-in
 257 arc names from the OBJECT IDENTIFIER tree defined in Annexes B, C and
 258 D of X.208.  For example the first arc in an OBJECT IDENTIFIER value
 259 can be either {\ASN ccitt} {\ASN iso} or {\ASN joint-iso-ccitt}.  The
 260 acceptable arc names are context dependent; for example the second arc
 261 can be one of {\ASN standard}, {\ASN registration-authority},
 262 {\ASN member-body} or {\ASN identified-organization} only if the first
 263 arc was {\ASN iso} or 1.
 264
 265 Snacc uses a simplified algorithm to handle references to the arc
 266 names defined in the OBJECT IDENTIFIER tree.  Any arc value that is
 267 represented by a single identifier is checked to see if it is one of
 268 the arc names defined in the OBJECT IDENTIFIER tree; context is
 269 ignored.  If the identifier matches one of the arc names then its
 270 value is set accordingly.  The lack of context sensitivity in snacc's
 271 algorithm may cause the arc name to link with an arc name from the
 272 OBJECT IDENTIFIER tree when a local or imported INTEGER was desired.
 273 The following is the list special arc names that snacc understands and
 274 their values (see {\ufn \dots/compiler/core/oid.c}):
 275
 276 \begin{itemize}
 277 \setlength{\itemsep}{0pt}
 278 \setlength{\parsep}{0pt}
 279 \nspace{0}
 280 \item {ccitt = 0}
 281 \item {iso = 1}
 282 \item {joint-iso-ccitt = 2}
 283 \item {standard = 0}
 284 \item {registration-authority = 1}
 285 \item {member-body = 2}
 286 \item {identified-organization = 3}
 287 \item {recommendation = 0}
 288 \item {question = 1}
 289 \item {administration = 2}
 290 \item {network-operator = 3}
 291 \end{itemize}
 292
 293 \section{\label{comp-pass5-section}Pass 5: Linking Values}
 294 The fifth pass links value references. The value linker looks for
 295 value references to resolve in value definitions and type definitions,
 296 including default values and subtyping information.  The value linking
 297 algorithm is virtually identical to the type linking pass (see Section
 298 \ref{comp-pass3-section}).
 299
 300 Currently the value parsing is limited to OBJECT IDENTIFIER values.
 301 Simple values that are not between \{\}'s are parsed in the first
 302 pass.  Here is an example that illustrates the OBJECT IDENTIFIER
 303 parsing and linking.  The following values:
 304
 305 \begin{small}
 306 \begin{verbatim}
 307 foo OBJECT IDENTIFIER ::= { joint-iso-ccitt 2 88 28 }
 308 bar OBJECT IDENTIFIER ::= { foo 1 }
 309 bell INTEGER ::= 2
 310 gumby OBJECT IDENTIFIER ::= { foo bell }
 311 pokie OBJECT IDENTIFIER ::= { foo stimpy(3) }
 312 \end{verbatim}
 313 \end{small}
 314
 315 \noindent
 316 are equivalent to this:
 317
 318 \begin{small}
 319 \begin{verbatim}
 320 foo OBJECT IDENTIFIER ::= { 2 2 88 28 }
 321 bar OBJECT IDENTIFIER ::= { 2 2 88 28  1 }
 322 bell INTEGER ::= 2
 323 gumby OBJECT IDENTIFIER ::= { 2 2 88 28 2 }
 324 pokie OBJECT IDENTIFIER ::= { 2 2 88 28 3 }
 325 \end{verbatim}
 326 \end{small}
 327
 328 Note that in version 1.0, named arcs (e.g. {\ASN stimpy(3)}) were
 329 promoted to full integer values.  This was wrong---many standards
 330 re-used them (e.g. X.500 and {\ASN ds(5)}) leading to multiply defined
 331 integer values.  If you want to improve the value parsing, look in
 332 {\ufn \dots/compiler/core/val-parser.c}
 333
 334 \section{\label{comp-pass6-section}Pass 6: Processing Macros}
 335
 336 The fifth pass processes macros.  For all macros currently handled,
 337 snacc converts type definitions inside the macro to type references
 338 and puts the type definition in the normal scope.  This way, the code
 339 generator does not have to know about macros to generate code for the
 340 types defined within them.
 341
 342 The only macro that receives any special processing is the SNMP
 343 OBJECT-TYPE macro.  This macro's information defines an OBJECT
 344 IDENTIFIER or INTEGER to type mapping for use with any ANY DEFINED BY
 345 type.  Note that the OBJECT-TYPE macro has been extended beyond its
 346 SNMP definition to allow integer values for INTEGER to type mappings.
 347
 348 ASN.1 allows you to define new macros within an ASN.1 module;  this
 349 can change the grammar of the ASN.1 language.  Since snacc is
 350 implemented with {\ufn yacc} and yacc grammars cannot be modified
 351 easily during runtime, snacc cannot change its parser in response to
 352 macro definitions it parses.
 353
 354 Any macro that snacc can parse has been explicitly added to the yacc
 355 grammar before compiling snacc.  When a macro that snacc can parse is
 356 parsed, a data structure that holds the relevant information from the
 357 macro is added to the parse tree.  The type and value linking passes
 358 as well as the macro processing and possibly the normalization pass
 359 need to be modified to handle any new macros that you add.
 360
 361 The following macros are parsed:
 362
 363 \begin{itemize}
 364 %\begin{linespacing}{0.5}
 365 \setlength{\itemsep}{0pt}
 366 \setlength{\parsep}{0pt}
 367 \nspace{0}
 368 \item{ OPERATION (ROS) }
 369 \item{ ERROR (ROS) }
 370 \item{ BIND (ROS) }
 371 \item{ UNBIND (ROS) }
 372 \item{ APPLICATION-SERVICE-ELEMENT (ROS) }
 373 \item{ APPLICATION-CONTEXT }
 374 \item{ EXTENSION (MTSAS)}
 375 \item{ EXTENSIONS (MTSAS) }
 376 \item{ EXTENSION-ATTRIBUTE (MTSAS) }
 377 \item{ TOKEN (MTSAS) }
 378 \item{ TOKEN-DATA (MTSAS)}
 379 \item{ SECURITY-CATEGORY (MTSAS) }
 380 \item{ OBJECT (X.407) }
 381 \item{ PORT   (X.407) }
 382 \item{ REFINE (X.407)}
 383 \item{ ABSTRACT-BIND (X.407) }
 384 \item{ ABSTRACT-UNBIND (X.407) }
 385 \item{ ABSTRACT-OPERATION (X.407) }
 386 \item{ ABSTRACT-ERROR (X.407) }
 387 \item{ ALGORITHM (X.509)}
 388 \item{ ENCRYPTED (X.509)}
 389 \item{ PROTECTED (X.509)}
 390 \item{ SIGNATURE (X.509)}
 391 \item{ SIGNED    (X.509)}
 392 \item{ OBJECT-TYPE (SNMP) }
 393 %\end{linespacing}
 394 \end{itemize}
 395
 396 However, no code is generated for these macros.  As stated above, only
 397 the OBJECT-TYPE macro affects the encoders and decoders.
 398
 399 \section{\label{comp-pass7-section}Pass 7: Normalizing Types}
 400 The sixth pass normalizes the types to make code generation simpler.
 401 The following is done during normalization:
 402 \begin{itemize}
 403
 404 \item[1.] { COMPONENTS OF types are replaced with the contents of the SET
 405 or SEQUENCE components that they reference.}
 406
 407 \item[2.] { SELECTION types are replaced with the type they reference.}
 408
 409 \item[3.] { SEQUENCE, SET, CHOICE, SET OF and SEQUENCE OF {\em definitions}
 410 embedded in other types are made into separate type definitions. }
 411
 412 \item[4.] { For modules in which ``IMPLICIT TAGS'' is specified, tagged
 413 type references such as {\ASN [APPLICATION 2] Foo} are marked IMPLICIT
 414 if the referenced type ({\ASN FOO} in this case) is not an untagged
 415 CHOICE or untagged ANY type.}
 416
 417 \item[5.] { INTEGERs with named numbers, BIT STRINGs with named bits and
 418 ENUMERATED types embedded in other types are made into separate type
 419 definitions.}
 420 \end{itemize}
 421
 422 The COMPONENTS OF and SELECTION type simplifications are obvious but
 423 the motivation for the others may not be so obvious.  The third type of
 424 simplification makes type definitions only one level deep.  This
 425 simplifies the decoding routines since snacc uses local variables for
 426 expected lengths, running length totals and tags instead of stacks.
 427
 428 The implicit references caused by ``IMPLICIT TAGS'' are marked
 429 directly on type references that need it.  This saves the code
 430 generators from worrying about whether implicit tagging is in effect
 431 and which types can be referenced implicitly.
 432
 433 The types with named numbers or bits are made into a separate type to
 434 allow the C++ back end to simply make a class that inherits from the
 435 INTEGER or BIT STRING class and defines the named numbers or bits
 436 inside an enum in the new class.  This is described further in the C++
 437 code generation chapter.
 438
 439 \section{\label{comp-pass8-section}Pass 8: Marking Recursive Types}
 440
 441
 442 This pass marks recursive types and checks for recursion related
 443 errors.  To determine whether a type definition is recursive, each
 444 type definition is traced to its leaves, checking for references to
 445 itself.  Both local and imported type references within a type are
 446 followed to reach the leaves of the type.  A leaf type is a simple
 447 (non-aggregate) built-in type such as an INTEGER or BOOLEAN\@. At the
 448 moment, recursion information is only used during the type dependency
 449 sorting pass.
 450
 451 {\em Snacc} attempts to detect two types of recursion related errors.  The
 452 first type of error results from a recursive type that is composed
 453 solely of type references.  Types of this form contain no real type
 454 information and would result in zero-sized values.  For example the
 455 following recursive types will generate this type of warning:
 456 \begin{small}
 457 \begin{verbatim}
 458 A ::= B
 459 B ::= C
 460 C ::= A
 461 \end{verbatim}
 462 \end{small}
 463
 464 The other recursion related error results from a type whose value will
 465 always be infinite in size.  This is caused by recursion with no
 466 optional component that can terminate the recursion.  If the recursion
 467 includes an OPTIONAL member of a SET or SEQUENCE, a CHOICE member, or
 468 a SET OF or SEQUENCE OF, the recursion can terminate.
 469
 470 Both of the recursion errors generate warnings from snacc but will
 471 not stop code generation.
 472
 473
 474 \section{\label{comp-pass9-section}Pass 9: Semantic Error Checking}
 475 The ninth pass checks for semantic errors in the ASN.1 specification
 476 that have not been checked already. Both the type linking pass and the
 477 recursive type marking pass do some error checking as well. Snacc attempts
 478 to detect the following errors in this pass:
 479
 480 \begin{itemize}
 481 \item { elements of CHOICE and SET types must have distinct tags.}
 482
 483 \item { CHOICE, ANY, and ANY DEFINED BY types cannot be implicitly tagged. }
 484
 485 \item { type and value names within the same scope must be unique. }
 486
 487 \item { field names in a SET, SEQUENCE or CHOICE must be distinct.  If
 488 a CHOICE is a member of a SET, SEQUENCE or CHOICE and has no field name,
 489 then the embedded CHOICE's field names must be distinct from its
 490 parents to avoid ambiguity in value notation.}
 491
 492 \item { an APPLICATION tag code can only be used once per module. }
 493
 494 \item { each value in a named bit list (BIT STRINGs) or named number
 495 list (INTEGERs and ENUMERATED) must be unique within its list.}
 496
 497 \item { each identifier in a named bit list or named number list must
 498 be unique within its list.}
 499
 500 \item { the tags on a series of one or more consecutive OPTIONAL or DEFAULT
 501 SEQUENCE elements and the following element must be distinct. }
 502
 503 \item { gives a warning if an ANY DEFINED BY type appears in a
 504 SEQUENCE before its identifier or in a SET\@.  These would allow encodings
 505 where the ANY DEFINED BY value was prior to its identifier in the
 506 encoded value; ANY DEFINED BY values are difficult to decode without
 507 knowing their identifier.}
 508
 509 \end{itemize}
 510
 511 Snacc does not attempt to detect the following errors due the
 512 limitations of the value parser.
 513 \begin{itemize}
 514 \item { SET and SEQUENCE values can be empty (\{\}) only if the SET or
 515 SEQUENCE type was defined as empty or all of its elements are marked
 516 as OPTIONAL or DEFAULT.}
 517
 518 \item { each identifier in a BIT STRING value must from that BIT
 519 STRING's named bit list (this could be done in an improved value
 520 linker instead of this pass).}
 521 \end{itemize}
 522
 523
 524 \section{\label{comp-pass10-section}Pass 10: Generating C/C++ Type Information}
 525
 526 This pass fills in the target language type information.  The process
 527 is different for the C and C++ back ends since the C++ ASN.1 model is
 528 different and it was developed later (more design flaws had been
 529 corrected for the C++ backend).
 530
 531 For C and C++ there is an array that contains the type {\em definition}
 532 information for each built-in type.  For each built-in ASN.1 type, the
 533 C array holds:
 534
 535 \begin{description}
 536 \item[typename] {the C {\C typedef} name for this type definition.}
 537
 538 \item[isPdu]    {TRUE if this type definition is a PDU\@. This is set
 539 for types used in ANY and ANY DEFINED BY types and those indicated by
 540 the user via compiler directives.  Additional interfaces to the encode
 541 and decode routines are generated for PDU types.  The SNMP OBJECT-TYPE
 542 macro is the current means of indicating whether a type is used within
 543 an ANY or ANY DEFINED BY type.}
 544
 545 \item[isPtrForTypeDef] { TRUE if other types defined solely by this type
 546 definition are defined as a pointer to this type.}
 547
 548 \item[isPtrForTypeRef] { TRUE if type references to this type
 549 definition from a SET or SEQUENCE are by pointer.}
 550
 551 \item[isPtrForOpt] { TRUE if OPTIONAL type references to this type
 552 definition from a SET or SEQUENCE are by pointer.}
 553
 554 \item[isPtrInChoice] { TRUE if type references to this type
 555 definition from a CHOICE are by pointer.}
 556
 557 \item[optTestRoutineName] { name of the routine to test whether an
 558 OPTIONAL element of this type in a SET or SEQUENCE is present.
 559 Usually just the name of a C macro that tests for NULL.}
 560
 561 \item[printRoutineName] { name of this type definition's printing routine.}
 562 \item[encodeRoutineName]{ name of this type definition's encoding routine.}
 563 \item[decodeRoutineName]{ name of this type definition's decoding routine.}
 564 \item[freeRoutineName]  { name of this type definition's freeing routine.}
 565 \end{description}
 566
 567 The C++ type definition array is similar to C's. It contains:
 568
 569 \begin{description}
 570 \item[classname] { holds the C++ {\C class} name for this type definition.}
 571 \item[isPdu] { same as C isPdu except that is does not affect the code
 572 generation since the C++ back end includes the extra PDU encode and
 573 decode routines by default.}
 574 \item[isPtrForTypeDef] { same as C isPtrForTypeDef. }
 575 \item[isPtrForOpt] { same as C isPtrForOpt.}
 576 \item[isPtrInChoice] { same as C isPtrInChoice}
 577 \item[isPtrInSetAndSeq] { whether type references to this class
 578 from a SET or SEQUENCE are by pointer.}
 579 \item[isPtrInList] {whether type references to this class
 580 from a SET OF or SEQUENCE OF are by pointer.}
 581 \item[optTestRoutineName] { name of the routine to test whether an
 582 OPTIONAL element of this type in a SET or SEQUENCE is present.
 583 Usually is just name of a C macro that tests for NULL.}
 584 \end{description}
 585
 586 The first step of this pass uses the type arrays to fill in the C or
 587 C++ type {\em definition} information for each module's ASN.1 type
 588 definitions.  This is done for the useful types module as well.
 589
 590 The next step goes through each constructed type and fills in the type
 591 {\em reference} information for each reference to a built-in, user defined
 592 or useful type.  Much of the type reference information is taken from
 593 the referenced type's definition information.  The type reference
 594 information contains the following (for both C and C++):
 595
 596 \begin{description}
 597 \item[fieldName] { field name for this type if it is referenced from
 598 a CHOICE, SET or SEQUENCE.}
 599 \item[typeName] { type name of the referenced type.}
 600 \item[isPtr] { whether this reference is by pointer.}
 601 \item[namedElmts] { named elements for INTEGER, ENUMERATED or BIT
 602 STRING types with their C names and values.}
 603 \item[choiceIdValue] { if this type reference is in a CHOICE, this
 604 holds the value of the CHOICE's choiceId that indicates the presence
 605 of this field.}
 606 \item[choiceIdSymbol] { if this type reference is in a CHOICE, this
 607 holds the C enum value symbol that has the choiceIdValue value.}
 608 \item[optTestRoutineName] { name of the routine or macro to test for
 609 the presence of this element if it is an OPTIONAL element of a SET or SEQUENCE.}
 610 \end{description}
 611
 612 \section{\label{comp-pass11-section}Pass 11: Sorting Types}
 613
 614 This pass sorts the type definitions within each module in order of
 615 dependence.  ASN.1 does not require the types to be defined before
 616 they are referenced but both C and C++ do.  Without this pass, the
 617 generated types/classes would probably not compile due to type
 618 dependency problems.  There is no attempt to order the modules;
 619 command line order is used for the module dependence.  If you have
 620 problems with mutually dependent modules, the simplest approach is to
 621 combine the dependent modules into a single ASN.1 module.
 622
 623 Some compilers such as CASN1 \cite{CASN1} require the user to order
 624 the types within the ASN.1 modules.  This can be tedious and since
 625 snacc may generate new type definitions from nested aggregate type
 626 definitions in the normalization pass, the user does not have complete
 627 control over the order of every type definition.  (The user could use
 628 the {\ufn -P} option to get the normalized ASN.1 and then order it but
 629 that is painful as well.)
 630
 631 Snacc attempts to sort the types from least dependent to most
 632 dependent using the following convoluted algorithm:
 633
 634 First, separate the type definitions within a module into the groups:
 635 \begin{itemize}
 636 \item[1.] { type definitions that are defined directly from simple built-in
 637 types such as INTEGER.}
 638
 639 \item[2.] { types such as SET, SEQUENCE, SET OF, SEQUENCE OF and CHOICE
 640 that contain no references to types defined in this module.  That, is
 641 they are defined from only simple built-in types, imported types or
 642 useful types.}
 643
 644 \item[3.] { type definitions that reference locally defined types.}
 645
 646 \item[4.] { type definitions that are not referenced by any local types.}
 647 \end{itemize}
 648
 649 Only the 3rd group of type definitions needs more sorting.  After it
 650 has been sorted, the groups are merged in the order 1, 2, 3, 4 to
 651 yield a sorted type definition list.
 652
 653 Now we describe how the 3rd group of type definitions is sorted.
 654 \begin{itemize}
 655
 656 \item[1.] {for each type definition in the third group, a list of its local type
 657 references is built and attached to it.  This type reference list only
 658 goes one level deep; it does not follow type references to find more
 659 type references.}
 660
 661 \item[2.] { all of the linearly-dependent types are removed and sorted.
 662 This is done by repeatedly removing type definitions that do not
 663 directly depend on any other type definitions that remain in the 3rd
 664 group.  The process of removing the type definitions sorts them.}
 665
 666 \item[3.] { the type definitions that were not removed in step 2 are
 667 divided into two groups: recursive and non-recursive.  The
 668 non-recursive types depend on the recursive ones since they are still
 669 in the list after step 2.}
 670
 671 \item[4.] { the non-recursive types from step 3 are sorted as in step
 672 2.  All of them should sort linearly since none are recursive. }
 673
 674 \item[5.] { if the target language is C, any SET OF or SEQUENCE OF
 675 types are separated from the recursive type definitions built in step 3.
 676 This is done because the C representation of a list type is generic
 677 (uses a {\C void~*} to reference the list element) and therefore does
 678 not really depend on the list's element type.}
 679
 680 \item[6] { the list of local type references for the recursive types
 681 from step 3 is re-generated as in step 1 using a relaxation:  types
 682 referenced as pointers are not added to a type's reference list.}
 683
 684 \item[7] { the recursive types from step two are re-sorted as in step
 685 2 using their new local type reference lists. Two lists are formed,
 686 those that sorted linearly and those that did not. Hopefully the
 687 latter list will be empty.}
 688 \end{itemize}
 689
 690 To form a sorted third group, the lists are merged in the following order:
 691 \begin{itemize}
 692 \item {linearly sorted types from step 2}
 693 \item {separated list types (C only) from step 5}
 694 \item {sorted recursive types from step 7}
 695 \item {unsorted recursive types from step 7 (hopefully empty)}
 696 \item {sorted non-recursive types from step 4}
 697 \end{itemize}
 698
 699
 700 In C, the code generator defines both {\C typedef} names and
 701 {\C struct} tags (names).  For example,
 702 \begin{verbatim}
 703 Foo ::= SET { a INTEGER, b BOOLEAN }
 704
 705 Bar ::= SEQUENCE { a OBJECT IDENTIFIER, b Foo }
 706 \end{verbatim}
 707 translates to the following C data types:
 708 \begin{verbatim}
 709 typedef struct Foo /* SET */
 710 {
 711     AsnInt a; /* INTEGER */
 712     AsnBool b; /* BOOLEAN */
 713 } Foo;
 714
 715 typedef struct Bar /* SEQUENCE */
 716 {
 717     AsnOid a; /* OBJECT IDENTIFIER */
 718     struct Foo *b; /* Foo */
 719 } Bar;
 720 \end{verbatim}
 721
 722 Note that both the {\C struct} and the {\C typedef} have the name
 723 {\C Foo}.  Also note that the Bar type references the {\C Foo} via
 724 {\C struct Foo~*}.
 725
 726 For types such as {\C Bar} that contain the {\C Foo} type,
 727 {\C Foo} is referenced as {\C struct Foo~*} instead of just
 728 {\C Foo~*} because C allows you to use the type {\C struct Foo~*}
 729 (incomplete type) in defining types even prior to the actual
 730 declaration of the the {\C struct Foo}. The {\C Foo~*} type can
 731 {\em only} be used after the {\C Foo typedef} declaration.  The use
 732 of incomplete types can often overcome recursion related type ordering
 733 problems (not relevant in this example since they are not recursive).
 734
 735 \section{\label{comp-pass12-section}Pass 12: Generating Code}
 736
 737 This pass creates and fills the source files with C or C++ code or
 738 produces a type table containing the type descriptions from all of the
 739 parsed modules, including the useful types module (if given).  The
 740 purpose of the normalization, sorting and error detection passes is to
 741 simplify this pass.
 742
 743 The normalization pass simplified the ASN.1 types in various ways to
 744 make C/C++ type and code generation simpler.
 745
 746 The type sorting pass hopefully eliminates type dependency problems in the
 747 generated code.  The C/C++ type generator simply proceeds through the
 748 ordered type list writing the C/C++ type definitions to a header file.
 749
 750 The error detection and linking passes will make snacc exit if errors
 751 are found, so the code generation pass can assume the ASN.1 types are
 752 virtually error free.  This usually allows snacc to exit gracefully
 753 instead of crashing due to an undetected error.
 754
 755 The type table data structure is similar to snacc's parse tree for the
 756 ASN.1 modules but it is much simpler.  This is because all of the type
 757 linking and error checking has been done.  The type definitions in the
 758 type tables are in defined by the type sorting pass (dependency).
 759
 760 The next chapters describe the code that is generated by snacc and the
 761 libraries the generated code uses.