]> git.saurik.com Git - apple/security.git/blob - SecuritySNACCRuntime/doc/design.tex
Security-54.1.tar.gz
[apple/security.git] / SecuritySNACCRuntime / doc / design.tex
1 % file: .../doc/design.tex
2
3 % $Header: /cvs/Darwin/Security/SecuritySNACCRuntime/doc/design.tex,v 1.1.1.1 2001/05/18 23:14:10 mb Exp $
4 % $Log: design.tex,v $
5 % Revision 1.1.1.1 2001/05/18 23:14:10 mb
6 % Move from private repository to open source repository
7 %
8 % Revision 1.1.1.1 1999/03/16 18:05:52 aram
9 % Originals from SMIME Free Library.
10 %
11 % Revision 1.1 1997/01/01 22:47:31 rj
12 % first check-in
13 %
14
15 \chapter{\label{comp-des-chapter}Compiler Design}
16
17 \section{\label{comp-overview-section}Overview}
18 The Snacc compiler is implemented with {\ufn yacc}, {\ufn lex}
19 (actually GNU's equivalents, {\ufn bison} and {\ufn flex}) and
20 \verb$C$. Despite the shortcomings of {\ufn lex} and {\ufn yacc},
21 they provide reasonable performance without too much programming
22 effort. Since {\ufn yacc} parsers are extremely difficult to modify
23 during runtime, any macro that you want the compiler to handle must
24 be hand coded into the ASN.1 {\ufn yacc} grammar
25 ({\ufn \dots/compiler/core/parse-asn1.y}) followed by recompilation of snacc.
26 Macro definitions do not need special consideration since they are
27 skipped by the compiler. Macro definitions and complex value notation
28 are kept as text in the data structure resulting from a parse if you
29 want to try to parse and process them.
30
31 To handle the anti-compiler nature of ASN.1's syntax, snacc makes
32 several passes on parse tree data structure when compiling. None of
33 these passes creates temporary files; this allows snacc to process
34 large ASN.1 specifications quite quickly. Each compiler pass is
35 explained in the next sections. The main passes of the compiler are
36 executed in the following order:
37
38 \begin{enumerate}
39 \item parse useful types ASN.1 module
40 \item parse all user specified ASN.1 modules
41 \item link local and imported type references in all modules
42 \item parse values in all modules
43 \item link local and imported value references in all modules
44 \item process any macro types
45 \item normalize types
46 \item mark recursive types and signal any recursion related errors
47 \item check for semantic errors in all modules
48 \item generate C/C++ type information for each ASN.1 type
49 \item Sort the types from least dependent to most dependent
50 \item generate the C, C++, IDL or type table
51 \end{enumerate}
52
53 The source code for the compiler resides in {\ufn \dots/compiler/} and the
54 back ends are in {\ufn \dots/compiler/back-ends/c-gen/}, {\ufn \dots/compiler/back-ends/c++-gen/} and {\ufn \dots/compiler/back-ends/idl-gen/}.
55
56 \section{\label{comp-pass1-section}Pass 1: Parsing the Useful Types Module}
57 The ASN.1 useful types are not hardwired into snacc. Instead they
58 have been placed in a separate ASN.1 module. This allows the user to
59 define his own useful types or re-define the existing ones without
60 modifying snacc. This also has the benefit that names of useful types
61 are not keywords in the lexical analyzer. This step is not really a
62 compiler pass on the module data, however it is described as one for
63 simplicity.
64
65 The useful types module should be passed to snacc with the {\ufn -u}
66 flag in front of it. The {\ufn -u} flag tells snacc to treat the
67 module in a special way. Instead of parsing the module and generating
68 code for it, snacc parses the module and makes the types in it
69 accessible to all of the other modules being parsed. Note that the
70 other modules do not need to explicitly import from the useful types
71 module. See Section~\ref{comp-pass3-section} for more information on how
72 useful types affect linking.
73
74 The encode, decode, and other routines for the useful types are in the
75 runtime library. Currently, the useful types library routines are the
76 same as the ones the compiler would normally generate given the useful
77 types module. However, since they are in the library, you can modify
78 them to check character sets (string types), or convert local time
79 formats into their BER equivalent (UTCTime, GeneralizedTime).
80
81 The following types are in the useful types module:
82 \begin{small}
83 \begin{verbatim}
84 ASN-USEFUL DEFINITIONS ::=
85 BEGIN
86 ObjectDescriptor ::= [UNIVERSAL 7] IMPLICIT OCTET STRING
87 NumericString ::= [UNIVERSAL 18] IMPLICIT OCTET STRING
88 PrintableString ::= [UNIVERSAL 19] IMPLICIT OCTET STRING
89 TeletexString ::= [UNIVERSAL 20] IMPLICIT OCTET STRING
90 T61String ::= [UNIVERSAL 20] IMPLICIT OCTET STRING
91 VideotexString ::= [UNIVERSAL 21] IMPLICIT OCTET STRING
92 IA5String ::= [UNIVERSAL 22] IMPLICIT OCTET STRING
93 GraphicString ::= [UNIVERSAL 25] IMPLICIT OCTET STRING
94 VisibleString ::= [UNIVERSAL 26] IMPLICIT OCTET STRING
95 ISO646String ::= [UNIVERSAL 26] IMPLICIT OCTET STRING
96 GeneralString ::= [UNIVERSAL 27] IMPLICIT OCTET STRING
97 UTCTime ::= [UNIVERSAL 23] IMPLICIT OCTET STRING
98 GeneralizedTime ::= [UNIVERSAL 24] IMPLICIT OCTET STRING
99
100 EXTERNAL ::= [UNIVERSAL 8] IMPLICIT SEQUENCE
101 {
102 direct-reference OBJECT IDENTIFIER OPTIONAL,
103 indirect-reference INTEGER OPTIONAL,
104 data-value-descriptor ObjectDescriptor OPTIONAL,
105 encoding CHOICE
106 {
107 single-ASN1-type [0] OCTET STRING, -- should be ANY
108 octet-aligned [1] IMPLICIT OCTET STRING,
109 arbitrary [2] IMPLICIT BIT STRING
110 }
111 }
112 END
113 \end{verbatim}
114 \end{small}
115
116 If you use the EXTERNAL type, you must provide the mechanism to encode
117 and decode the value in the embedded CHOICE, \verb$encoding$. The
118 type and transfer syntax of the value in an EXTERNAL type is not known
119 when the ASN.1 code is compiled by snacc. Snacc cannot generate
120 encoders and decoders without complete type information and only
121 supports a single set of encoding rules, BER\@.
122
123 \section{\label{comp-pass2-section}Pass 2: Parsing ASN.1 Modules}
124 During this pass, all of the specified modules are parsed into the {\em
125 Module} data structure. The ASN.1 source files are not consulted
126 again, after they are parsed. {\ufn Yacc} and {\ufn lex} are doing the work in
127 this step. (see files {\ufn snacc.c}, {\ufn lex-asn1.l}, {\ufn parse-asn1.y}
128 and {\ufn asn1module.h}).
129
130 A lexical tie-in is where the yacc parser puts the lexical analyzer
131 into a different mode (and is usually considered a hack). The
132 different modes tokenize symbols differently, which is useful for
133 skipping well delimited sections that cannot be parsed easily by a
134 {\ufn yacc} parser on the first pass. Lexical tie-ins are used in two
135 places to simplify the ASN.1 grammar sufficiently for {\ufn yacc} and
136 {\ufn lex}. There are two special modes in the lexical analyzer, one
137 for ASN.1 macro definitions and the other for ASN.1 values enclosed in
138 \{\}'s.
139
140 The lexical tie-in for eating macro definition bodies works with macro
141 definitions of the following form:
142
143 \begin{verbatim}
144 <upper case identifier> MACRO ::= BEGIN ... END
145 \end{verbatim}
146
147 Everything between the {\ASN BEGIN} and {\ASN END} is stuffed into a
148 string by {\ufn lex} and passed back as single token to the
149 {\ufn yacc} parser.
150
151 Values within \{\}'s are grabbed in a similar way. Value parsing
152 cannot really be done at this stage since complete type information is
153 needed and the types are not fully parsed or linked yet.
154
155 Most syntax errors are reported during this pass. If syntax errors
156 are encountered, snacc will report as many as it can from the
157 offending module before the parser is hopelessly lost and then exit.
158 If the types and values are separated with semi-colons, the parser can
159 recover after a syntax error and attempt to find more errors in that
160 module before exiting.
161
162
163 \section{\label{comp-pass3-section}Pass 3: Linking Types}
164 The third pass links all type references. Snacc attempts to resolve
165 any currently visible (i.\ e.\ not in macro definitions or constructed
166 values) type reference. This includes type references in simple value
167 definitions and subtyping information. The useful types module (if
168 given) is linked first.
169
170 Snacc will exit after this pass if any type references could not be
171 resolved. Error messages with file and line number information will
172 be printed to {\C stderr}.
173
174 This pass also counts and stores the number of times a type definition is
175 referenced locally and from other modules. This information is used
176 during the type sorting pass.
177
178 First, each module identifier is checked for conflicts with the
179 others. If the module identifier includes an OBJECT IDENTIFIER, snacc
180 only checks for conflicts with the other module identifier OBJECT
181 IDENTIFIERs. When only a module name is provided, snacc checks for
182 conflicts with the the other module names, even if the other module
183 identifiers include OBJECT IDENTIFIERs. If the OBJECT IDENTIFIER of
184 a module identifier contains any value references, it will be ignored
185 for module look-up purposes. Note that value references within the
186 module identifier OBJECT IDENTIFIERs are not allowed in the 1992
187 version of ASN.1 due to the difficulty in module name resolution they
188 present.
189
190 Two modules with the same name but different OBJECT IDENTIFIERs are
191 not considered an error within ASN.1. However, because the generated
192 files use the module name as part of their name, the code generation
193 pass will gripe about and fail for modules with the same name.
194
195 Next, each module's import {\em lists} are resolved by finding the
196 named module and then verifying that the named module contains all of
197 the imported types.
198
199 Then for each module, each type reference (except those of the form
200 {\em modulename.typename}) is assumed to be a local type reference and
201 the linker attempts to find a local type definition of the same name
202 to resolve it with. If a matching local definition is found, the type
203 reference is resolved and the linker continues with the next type
204 reference.
205
206 For each type reference of the form {\em modulename.typename}, the
207 linker looks in the module with name {\em modulename} for the type
208 {\em typename}. If the type is found the reference is resolved,
209 otherwise a linking error is reported. Note that this form of type
210 reference provides a special scope that does not conflict with other
211 local or imported types in that module.
212
213 For type references that failed to resolve locally and are not of the
214 form {\em modulename.typename}, the linker looks in the import lists
215 of the current type reference's module for a type to resolve with. If
216 the type is found in the import lists, the reference is resolved.
217
218 For the remaining unresolved type references (failed local and legal
219 import resolution and are not of the form {\em modulename.typename}),
220 the linker looks in the useful types module, if one was specified with
221 the {\ufn -u} option. If the type is found in the useful types module
222 then the reference is resolved, otherwise a linking error is reported.
223
224 Note that when a useful types module is specified, it is globally
225 available to all modules, but it has the lowest linking priority.
226 That is, if a type reference can be resolved legally without the
227 useful types module, it will be.
228
229 Some type checking must be done in this pass to link certain types
230 properly. These include:
231 \begin{itemize}
232 \item {a SELECTION type must reference a field of a CHOICE type.}
233 \item {a COMPONENTS OF type in a SET must reference a SET.}
234 \item {a COMPONENTS OF type in a SEQUENCE must reference a SEQUENCE.}
235 \end{itemize}
236
237
238
239 \section{\label{comp-pass4-section}Pass 4: Parsing Values}
240 The fourth pass attempts to parse any value that is enclosed in \{\}'s in
241 the given modules. INTEGERS, REALs and BOOLEANS that are not enclosed in
242 braces are parsed in the first pass.
243
244 The value parser is implemented without {\ufn yacc} and {\ufn lex} and
245 uses each value's type information to help parse the value. Values
246 within \{\}'s hidden within types such as default values and parts of
247 subtypes are not parsed. Since subtypes and default values do not
248 affect the generated code, upgrading the value parser in this respect
249 is not very useful.
250
251 The only type of value in \{\}'s that is parsed is the OBJECT
252 IDENTIFIER\@. All of the OBJECT IDENTIFIER value forms are supported
253 but snacc loosens the restrictions on using arc names defined in the
254 OBJECT IDENTIFIER tree.
255
256 ASN.1 allows OBJECT IDENTIFIER values to reference special built-in
257 arc names from the OBJECT IDENTIFIER tree defined in Annexes B, C and
258 D of X.208. For example the first arc in an OBJECT IDENTIFIER value
259 can be either {\ASN ccitt} {\ASN iso} or {\ASN joint-iso-ccitt}. The
260 acceptable arc names are context dependent; for example the second arc
261 can be one of {\ASN standard}, {\ASN registration-authority},
262 {\ASN member-body} or {\ASN identified-organization} only if the first
263 arc was {\ASN iso} or 1.
264
265 Snacc uses a simplified algorithm to handle references to the arc
266 names defined in the OBJECT IDENTIFIER tree. Any arc value that is
267 represented by a single identifier is checked to see if it is one of
268 the arc names defined in the OBJECT IDENTIFIER tree; context is
269 ignored. If the identifier matches one of the arc names then its
270 value is set accordingly. The lack of context sensitivity in snacc's
271 algorithm may cause the arc name to link with an arc name from the
272 OBJECT IDENTIFIER tree when a local or imported INTEGER was desired.
273 The following is the list special arc names that snacc understands and
274 their values (see {\ufn \dots/compiler/core/oid.c}):
275
276 \begin{itemize}
277 \setlength{\itemsep}{0pt}
278 \setlength{\parsep}{0pt}
279 \nspace{0}
280 \item {ccitt = 0}
281 \item {iso = 1}
282 \item {joint-iso-ccitt = 2}
283 \item {standard = 0}
284 \item {registration-authority = 1}
285 \item {member-body = 2}
286 \item {identified-organization = 3}
287 \item {recommendation = 0}
288 \item {question = 1}
289 \item {administration = 2}
290 \item {network-operator = 3}
291 \end{itemize}
292
293 \section{\label{comp-pass5-section}Pass 5: Linking Values}
294 The fifth pass links value references. The value linker looks for
295 value references to resolve in value definitions and type definitions,
296 including default values and subtyping information. The value linking
297 algorithm is virtually identical to the type linking pass (see Section
298 \ref{comp-pass3-section}).
299
300 Currently the value parsing is limited to OBJECT IDENTIFIER values.
301 Simple values that are not between \{\}'s are parsed in the first
302 pass. Here is an example that illustrates the OBJECT IDENTIFIER
303 parsing and linking. The following values:
304
305 \begin{small}
306 \begin{verbatim}
307 foo OBJECT IDENTIFIER ::= { joint-iso-ccitt 2 88 28 }
308 bar OBJECT IDENTIFIER ::= { foo 1 }
309 bell INTEGER ::= 2
310 gumby OBJECT IDENTIFIER ::= { foo bell }
311 pokie OBJECT IDENTIFIER ::= { foo stimpy(3) }
312 \end{verbatim}
313 \end{small}
314
315 \noindent
316 are equivalent to this:
317
318 \begin{small}
319 \begin{verbatim}
320 foo OBJECT IDENTIFIER ::= { 2 2 88 28 }
321 bar OBJECT IDENTIFIER ::= { 2 2 88 28 1 }
322 bell INTEGER ::= 2
323 gumby OBJECT IDENTIFIER ::= { 2 2 88 28 2 }
324 pokie OBJECT IDENTIFIER ::= { 2 2 88 28 3 }
325 \end{verbatim}
326 \end{small}
327
328 Note that in version 1.0, named arcs (e.g. {\ASN stimpy(3)}) were
329 promoted to full integer values. This was wrong---many standards
330 re-used them (e.g. X.500 and {\ASN ds(5)}) leading to multiply defined
331 integer values. If you want to improve the value parsing, look in
332 {\ufn \dots/compiler/core/val-parser.c}
333
334 \section{\label{comp-pass6-section}Pass 6: Processing Macros}
335
336 The fifth pass processes macros. For all macros currently handled,
337 snacc converts type definitions inside the macro to type references
338 and puts the type definition in the normal scope. This way, the code
339 generator does not have to know about macros to generate code for the
340 types defined within them.
341
342 The only macro that receives any special processing is the SNMP
343 OBJECT-TYPE macro. This macro's information defines an OBJECT
344 IDENTIFIER or INTEGER to type mapping for use with any ANY DEFINED BY
345 type. Note that the OBJECT-TYPE macro has been extended beyond its
346 SNMP definition to allow integer values for INTEGER to type mappings.
347
348 ASN.1 allows you to define new macros within an ASN.1 module; this
349 can change the grammar of the ASN.1 language. Since snacc is
350 implemented with {\ufn yacc} and yacc grammars cannot be modified
351 easily during runtime, snacc cannot change its parser in response to
352 macro definitions it parses.
353
354 Any macro that snacc can parse has been explicitly added to the yacc
355 grammar before compiling snacc. When a macro that snacc can parse is
356 parsed, a data structure that holds the relevant information from the
357 macro is added to the parse tree. The type and value linking passes
358 as well as the macro processing and possibly the normalization pass
359 need to be modified to handle any new macros that you add.
360
361 The following macros are parsed:
362
363 \begin{itemize}
364 %\begin{linespacing}{0.5}
365 \setlength{\itemsep}{0pt}
366 \setlength{\parsep}{0pt}
367 \nspace{0}
368 \item{ OPERATION (ROS) }
369 \item{ ERROR (ROS) }
370 \item{ BIND (ROS) }
371 \item{ UNBIND (ROS) }
372 \item{ APPLICATION-SERVICE-ELEMENT (ROS) }
373 \item{ APPLICATION-CONTEXT }
374 \item{ EXTENSION (MTSAS)}
375 \item{ EXTENSIONS (MTSAS) }
376 \item{ EXTENSION-ATTRIBUTE (MTSAS) }
377 \item{ TOKEN (MTSAS) }
378 \item{ TOKEN-DATA (MTSAS)}
379 \item{ SECURITY-CATEGORY (MTSAS) }
380 \item{ OBJECT (X.407) }
381 \item{ PORT (X.407) }
382 \item{ REFINE (X.407)}
383 \item{ ABSTRACT-BIND (X.407) }
384 \item{ ABSTRACT-UNBIND (X.407) }
385 \item{ ABSTRACT-OPERATION (X.407) }
386 \item{ ABSTRACT-ERROR (X.407) }
387 \item{ ALGORITHM (X.509)}
388 \item{ ENCRYPTED (X.509)}
389 \item{ PROTECTED (X.509)}
390 \item{ SIGNATURE (X.509)}
391 \item{ SIGNED (X.509)}
392 \item{ OBJECT-TYPE (SNMP) }
393 %\end{linespacing}
394 \end{itemize}
395
396 However, no code is generated for these macros. As stated above, only
397 the OBJECT-TYPE macro affects the encoders and decoders.
398
399 \section{\label{comp-pass7-section}Pass 7: Normalizing Types}
400 The sixth pass normalizes the types to make code generation simpler.
401 The following is done during normalization:
402 \begin{itemize}
403
404 \item[1.] { COMPONENTS OF types are replaced with the contents of the SET
405 or SEQUENCE components that they reference.}
406
407 \item[2.] { SELECTION types are replaced with the type they reference.}
408
409 \item[3.] { SEQUENCE, SET, CHOICE, SET OF and SEQUENCE OF {\em definitions}
410 embedded in other types are made into separate type definitions. }
411
412 \item[4.] { For modules in which ``IMPLICIT TAGS'' is specified, tagged
413 type references such as {\ASN [APPLICATION 2] Foo} are marked IMPLICIT
414 if the referenced type ({\ASN FOO} in this case) is not an untagged
415 CHOICE or untagged ANY type.}
416
417 \item[5.] { INTEGERs with named numbers, BIT STRINGs with named bits and
418 ENUMERATED types embedded in other types are made into separate type
419 definitions.}
420 \end{itemize}
421
422 The COMPONENTS OF and SELECTION type simplifications are obvious but
423 the motivation for the others may not be so obvious. The third type of
424 simplification makes type definitions only one level deep. This
425 simplifies the decoding routines since snacc uses local variables for
426 expected lengths, running length totals and tags instead of stacks.
427
428 The implicit references caused by ``IMPLICIT TAGS'' are marked
429 directly on type references that need it. This saves the code
430 generators from worrying about whether implicit tagging is in effect
431 and which types can be referenced implicitly.
432
433 The types with named numbers or bits are made into a separate type to
434 allow the C++ back end to simply make a class that inherits from the
435 INTEGER or BIT STRING class and defines the named numbers or bits
436 inside an enum in the new class. This is described further in the C++
437 code generation chapter.
438
439 \section{\label{comp-pass8-section}Pass 8: Marking Recursive Types}
440
441
442 This pass marks recursive types and checks for recursion related
443 errors. To determine whether a type definition is recursive, each
444 type definition is traced to its leaves, checking for references to
445 itself. Both local and imported type references within a type are
446 followed to reach the leaves of the type. A leaf type is a simple
447 (non-aggregate) built-in type such as an INTEGER or BOOLEAN\@. At the
448 moment, recursion information is only used during the type dependency
449 sorting pass.
450
451 {\em Snacc} attempts to detect two types of recursion related errors. The
452 first type of error results from a recursive type that is composed
453 solely of type references. Types of this form contain no real type
454 information and would result in zero-sized values. For example the
455 following recursive types will generate this type of warning:
456 \begin{small}
457 \begin{verbatim}
458 A ::= B
459 B ::= C
460 C ::= A
461 \end{verbatim}
462 \end{small}
463
464 The other recursion related error results from a type whose value will
465 always be infinite in size. This is caused by recursion with no
466 optional component that can terminate the recursion. If the recursion
467 includes an OPTIONAL member of a SET or SEQUENCE, a CHOICE member, or
468 a SET OF or SEQUENCE OF, the recursion can terminate.
469
470 Both of the recursion errors generate warnings from snacc but will
471 not stop code generation.
472
473
474 \section{\label{comp-pass9-section}Pass 9: Semantic Error Checking}
475 The ninth pass checks for semantic errors in the ASN.1 specification
476 that have not been checked already. Both the type linking pass and the
477 recursive type marking pass do some error checking as well. Snacc attempts
478 to detect the following errors in this pass:
479
480 \begin{itemize}
481 \item { elements of CHOICE and SET types must have distinct tags.}
482
483 \item { CHOICE, ANY, and ANY DEFINED BY types cannot be implicitly tagged. }
484
485 \item { type and value names within the same scope must be unique. }
486
487 \item { field names in a SET, SEQUENCE or CHOICE must be distinct. If
488 a CHOICE is a member of a SET, SEQUENCE or CHOICE and has no field name,
489 then the embedded CHOICE's field names must be distinct from its
490 parents to avoid ambiguity in value notation.}
491
492 \item { an APPLICATION tag code can only be used once per module. }
493
494 \item { each value in a named bit list (BIT STRINGs) or named number
495 list (INTEGERs and ENUMERATED) must be unique within its list.}
496
497 \item { each identifier in a named bit list or named number list must
498 be unique within its list.}
499
500 \item { the tags on a series of one or more consecutive OPTIONAL or DEFAULT
501 SEQUENCE elements and the following element must be distinct. }
502
503 \item { gives a warning if an ANY DEFINED BY type appears in a
504 SEQUENCE before its identifier or in a SET\@. These would allow encodings
505 where the ANY DEFINED BY value was prior to its identifier in the
506 encoded value; ANY DEFINED BY values are difficult to decode without
507 knowing their identifier.}
508
509 \end{itemize}
510
511 Snacc does not attempt to detect the following errors due the
512 limitations of the value parser.
513 \begin{itemize}
514 \item { SET and SEQUENCE values can be empty (\{\}) only if the SET or
515 SEQUENCE type was defined as empty or all of its elements are marked
516 as OPTIONAL or DEFAULT.}
517
518 \item { each identifier in a BIT STRING value must from that BIT
519 STRING's named bit list (this could be done in an improved value
520 linker instead of this pass).}
521 \end{itemize}
522
523
524 \section{\label{comp-pass10-section}Pass 10: Generating C/C++ Type Information}
525
526 This pass fills in the target language type information. The process
527 is different for the C and C++ back ends since the C++ ASN.1 model is
528 different and it was developed later (more design flaws had been
529 corrected for the C++ backend).
530
531 For C and C++ there is an array that contains the type {\em definition}
532 information for each built-in type. For each built-in ASN.1 type, the
533 C array holds:
534
535 \begin{description}
536 \item[typename] {the C {\C typedef} name for this type definition.}
537
538 \item[isPdu] {TRUE if this type definition is a PDU\@. This is set
539 for types used in ANY and ANY DEFINED BY types and those indicated by
540 the user via compiler directives. Additional interfaces to the encode
541 and decode routines are generated for PDU types. The SNMP OBJECT-TYPE
542 macro is the current means of indicating whether a type is used within
543 an ANY or ANY DEFINED BY type.}
544
545 \item[isPtrForTypeDef] { TRUE if other types defined solely by this type
546 definition are defined as a pointer to this type.}
547
548 \item[isPtrForTypeRef] { TRUE if type references to this type
549 definition from a SET or SEQUENCE are by pointer.}
550
551 \item[isPtrForOpt] { TRUE if OPTIONAL type references to this type
552 definition from a SET or SEQUENCE are by pointer.}
553
554 \item[isPtrInChoice] { TRUE if type references to this type
555 definition from a CHOICE are by pointer.}
556
557 \item[optTestRoutineName] { name of the routine to test whether an
558 OPTIONAL element of this type in a SET or SEQUENCE is present.
559 Usually just the name of a C macro that tests for NULL.}
560
561 \item[printRoutineName] { name of this type definition's printing routine.}
562 \item[encodeRoutineName]{ name of this type definition's encoding routine.}
563 \item[decodeRoutineName]{ name of this type definition's decoding routine.}
564 \item[freeRoutineName] { name of this type definition's freeing routine.}
565 \end{description}
566
567 The C++ type definition array is similar to C's. It contains:
568
569 \begin{description}
570 \item[classname] { holds the C++ {\C class} name for this type definition.}
571 \item[isPdu] { same as C isPdu except that is does not affect the code
572 generation since the C++ back end includes the extra PDU encode and
573 decode routines by default.}
574 \item[isPtrForTypeDef] { same as C isPtrForTypeDef. }
575 \item[isPtrForOpt] { same as C isPtrForOpt.}
576 \item[isPtrInChoice] { same as C isPtrInChoice}
577 \item[isPtrInSetAndSeq] { whether type references to this class
578 from a SET or SEQUENCE are by pointer.}
579 \item[isPtrInList] {whether type references to this class
580 from a SET OF or SEQUENCE OF are by pointer.}
581 \item[optTestRoutineName] { name of the routine to test whether an
582 OPTIONAL element of this type in a SET or SEQUENCE is present.
583 Usually is just name of a C macro that tests for NULL.}
584 \end{description}
585
586 The first step of this pass uses the type arrays to fill in the C or
587 C++ type {\em definition} information for each module's ASN.1 type
588 definitions. This is done for the useful types module as well.
589
590 The next step goes through each constructed type and fills in the type
591 {\em reference} information for each reference to a built-in, user defined
592 or useful type. Much of the type reference information is taken from
593 the referenced type's definition information. The type reference
594 information contains the following (for both C and C++):
595
596 \begin{description}
597 \item[fieldName] { field name for this type if it is referenced from
598 a CHOICE, SET or SEQUENCE.}
599 \item[typeName] { type name of the referenced type.}
600 \item[isPtr] { whether this reference is by pointer.}
601 \item[namedElmts] { named elements for INTEGER, ENUMERATED or BIT
602 STRING types with their C names and values.}
603 \item[choiceIdValue] { if this type reference is in a CHOICE, this
604 holds the value of the CHOICE's choiceId that indicates the presence
605 of this field.}
606 \item[choiceIdSymbol] { if this type reference is in a CHOICE, this
607 holds the C enum value symbol that has the choiceIdValue value.}
608 \item[optTestRoutineName] { name of the routine or macro to test for
609 the presence of this element if it is an OPTIONAL element of a SET or SEQUENCE.}
610 \end{description}
611
612 \section{\label{comp-pass11-section}Pass 11: Sorting Types}
613
614 This pass sorts the type definitions within each module in order of
615 dependence. ASN.1 does not require the types to be defined before
616 they are referenced but both C and C++ do. Without this pass, the
617 generated types/classes would probably not compile due to type
618 dependency problems. There is no attempt to order the modules;
619 command line order is used for the module dependence. If you have
620 problems with mutually dependent modules, the simplest approach is to
621 combine the dependent modules into a single ASN.1 module.
622
623 Some compilers such as CASN1 \cite{CASN1} require the user to order
624 the types within the ASN.1 modules. This can be tedious and since
625 snacc may generate new type definitions from nested aggregate type
626 definitions in the normalization pass, the user does not have complete
627 control over the order of every type definition. (The user could use
628 the {\ufn -P} option to get the normalized ASN.1 and then order it but
629 that is painful as well.)
630
631 Snacc attempts to sort the types from least dependent to most
632 dependent using the following convoluted algorithm:
633
634 First, separate the type definitions within a module into the groups:
635 \begin{itemize}
636 \item[1.] { type definitions that are defined directly from simple built-in
637 types such as INTEGER.}
638
639 \item[2.] { types such as SET, SEQUENCE, SET OF, SEQUENCE OF and CHOICE
640 that contain no references to types defined in this module. That, is
641 they are defined from only simple built-in types, imported types or
642 useful types.}
643
644 \item[3.] { type definitions that reference locally defined types.}
645
646 \item[4.] { type definitions that are not referenced by any local types.}
647 \end{itemize}
648
649 Only the 3rd group of type definitions needs more sorting. After it
650 has been sorted, the groups are merged in the order 1, 2, 3, 4 to
651 yield a sorted type definition list.
652
653 Now we describe how the 3rd group of type definitions is sorted.
654 \begin{itemize}
655
656 \item[1.] {for each type definition in the third group, a list of its local type
657 references is built and attached to it. This type reference list only
658 goes one level deep; it does not follow type references to find more
659 type references.}
660
661 \item[2.] { all of the linearly-dependent types are removed and sorted.
662 This is done by repeatedly removing type definitions that do not
663 directly depend on any other type definitions that remain in the 3rd
664 group. The process of removing the type definitions sorts them.}
665
666 \item[3.] { the type definitions that were not removed in step 2 are
667 divided into two groups: recursive and non-recursive. The
668 non-recursive types depend on the recursive ones since they are still
669 in the list after step 2.}
670
671 \item[4.] { the non-recursive types from step 3 are sorted as in step
672 2. All of them should sort linearly since none are recursive. }
673
674 \item[5.] { if the target language is C, any SET OF or SEQUENCE OF
675 types are separated from the recursive type definitions built in step 3.
676 This is done because the C representation of a list type is generic
677 (uses a {\C void~*} to reference the list element) and therefore does
678 not really depend on the list's element type.}
679
680 \item[6] { the list of local type references for the recursive types
681 from step 3 is re-generated as in step 1 using a relaxation: types
682 referenced as pointers are not added to a type's reference list.}
683
684 \item[7] { the recursive types from step two are re-sorted as in step
685 2 using their new local type reference lists. Two lists are formed,
686 those that sorted linearly and those that did not. Hopefully the
687 latter list will be empty.}
688 \end{itemize}
689
690 To form a sorted third group, the lists are merged in the following order:
691 \begin{itemize}
692 \item {linearly sorted types from step 2}
693 \item {separated list types (C only) from step 5}
694 \item {sorted recursive types from step 7}
695 \item {unsorted recursive types from step 7 (hopefully empty)}
696 \item {sorted non-recursive types from step 4}
697 \end{itemize}
698
699
700 In C, the code generator defines both {\C typedef} names and
701 {\C struct} tags (names). For example,
702 \begin{verbatim}
703 Foo ::= SET { a INTEGER, b BOOLEAN }
704
705 Bar ::= SEQUENCE { a OBJECT IDENTIFIER, b Foo }
706 \end{verbatim}
707 translates to the following C data types:
708 \begin{verbatim}
709 typedef struct Foo /* SET */
710 {
711 AsnInt a; /* INTEGER */
712 AsnBool b; /* BOOLEAN */
713 } Foo;
714
715 typedef struct Bar /* SEQUENCE */
716 {
717 AsnOid a; /* OBJECT IDENTIFIER */
718 struct Foo *b; /* Foo */
719 } Bar;
720 \end{verbatim}
721
722 Note that both the {\C struct} and the {\C typedef} have the name
723 {\C Foo}. Also note that the Bar type references the {\C Foo} via
724 {\C struct Foo~*}.
725
726 For types such as {\C Bar} that contain the {\C Foo} type,
727 {\C Foo} is referenced as {\C struct Foo~*} instead of just
728 {\C Foo~*} because C allows you to use the type {\C struct Foo~*}
729 (incomplete type) in defining types even prior to the actual
730 declaration of the the {\C struct Foo}. The {\C Foo~*} type can
731 {\em only} be used after the {\C Foo typedef} declaration. The use
732 of incomplete types can often overcome recursion related type ordering
733 problems (not relevant in this example since they are not recursive).
734
735 \section{\label{comp-pass12-section}Pass 12: Generating Code}
736
737 This pass creates and fills the source files with C or C++ code or
738 produces a type table containing the type descriptions from all of the
739 parsed modules, including the useful types module (if given). The
740 purpose of the normalization, sorting and error detection passes is to
741 simplify this pass.
742
743 The normalization pass simplified the ASN.1 types in various ways to
744 make C/C++ type and code generation simpler.
745
746 The type sorting pass hopefully eliminates type dependency problems in the
747 generated code. The C/C++ type generator simply proceeds through the
748 ordered type list writing the C/C++ type definitions to a header file.
749
750 The error detection and linking passes will make snacc exit if errors
751 are found, so the code generation pass can assume the ASN.1 types are
752 virtually error free. This usually allows snacc to exit gracefully
753 instead of crashing due to an undetected error.
754
755 The type table data structure is similar to snacc's parse tree for the
756 ASN.1 modules but it is much simpler. This is because all of the type
757 linking and error checking has been done. The type definitions in the
758 type tables are in defined by the type sorting pass (dependency).
759
760 The next chapters describe the code that is generated by snacc and the
761 libraries the generated code uses.