# @(#)TOUR 8.1 (Berkeley) 5/31/93
# $FreeBSD$

NOTE -- This is the original TOUR paper distributed with ash and
does not represent the current state of the shell.  It is provided
anyway since it provides helpful information for how the shell is
structured, but be warned that things have changed -- the current
shell is still under development.

================================================================

A Tour through Ash

Copyright 1989 by Kenneth Almquist.


DIRECTORIES:  The subdirectory bltin contains commands which can
be compiled stand-alone.  The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS:  Files whose names begin with "mk" are
programs that generate source code.  A complete list of these
programs is:

        program         input files     generates
        -------         -----------     ---------
        mkbuiltins      builtins        builtins.h builtins.c
        mknodes         nodetypes       nodes.h nodes.c
        mksyntax        -               syntax.h syntax.c
        mktokens        -               token.h

There are undoubtedly too many of these.

EXCEPTIONS:  Code for dealing with exceptions appears in
exceptions.c.  The C language doesn't include exception handling,
so I implement it using setjmp and longjmp.  The global variable
exception contains the type of exception.  EXERROR is raised by
calling error.  EXINT is an interrupt.
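
The setjmp/longjmp scheme can be sketched roughly as follows.  This
is an illustration only, not the shell's code: the names handler,
raise_exception and run_protected and the numeric values of EXINT
and EXERROR are assumptions made for the example.  Only the global
variable exception and the two exception names come from the text.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Exception types named in the text; the numeric values are assumed. */
enum { EXINT = 0, EXERROR = 1 };

/* Type of the exception being raised, as described in the text. */
static int exception;

/* Hypothetical: innermost active handler, or NULL if there is none. */
static jmp_buf *handler;

/* Raise an exception by jumping to the innermost handler. */
static void raise_exception(int type)
{
    assert(handler != NULL);
    exception = type;
    longjmp(*handler, 1);
}

/*
 * Run fn with a handler installed.  Returns -1 if fn returned
 * normally, otherwise the type of the exception it raised.
 */
static int run_protected(void (*fn)(void))
{
    jmp_buf here;
    jmp_buf *volatile saved = handler;  /* read again after longjmp */

    if (setjmp(here) != 0) {
        handler = saved;                /* unwind to the outer handler */
        return exception;
    }
    handler = &here;
    fn();
    handler = saved;
    return -1;
}

/* Two trivial callees used to exercise the sketch. */
static void fails(void)    { raise_exception(EXERROR); }
static void succeeds(void) { }
```

Calling run_protected(fails) yields EXERROR and
run_protected(succeeds) yields -1.  Handlers nest because each call
saves and restores the previous value of handler.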

INTERRUPTS:  In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop.  (Exception:
EXINT is not raised if the user traps interrupts using the trap
command.)  The INTOFF and INTON macros (defined in exception.h)
provide uninterruptible critical sections.  Between the execution
of INTOFF and the execution of INTON, interrupt signals will be
held for later delivery.  INTOFF and INTON can be nested.
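
One way to get nestable critical sections is a depth counter plus a
pending flag.  The sketch below assumes that design; the variable
names and the deliver function are hypothetical, and the real macros
in the shell may differ in detail.

```c
#include <assert.h>
#include <signal.h>

static volatile sig_atomic_t suppress_depth; /* INTOFF nesting depth */
static volatile sig_atomic_t int_pending;    /* interrupt held back */
static int int_delivered;                    /* demo: interrupts acted on */

/* Stand-in for raising EXINT. */
static void deliver(void)
{
    int_pending = 0;
    int_delivered++;
}

#define INTOFF  (suppress_depth++)
#define INTON   do { \
        if (--suppress_depth == 0 && int_pending) \
            deliver(); \
    } while (0)

/* Called where the interrupt signal handler would run. */
static void on_interrupt(void)
{
    if (suppress_depth > 0)
        int_pending = 1;    /* hold for later delivery */
    else
        deliver();
}
```

An interrupt that arrives inside nested INTOFF/INTON pairs is acted
on only when the outermost INTON brings the depth back to zero.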

MEMALLOC.C:  Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left.  It also defines a
stack oriented memory allocation scheme.  Allocating off a stack
is probably more efficient than allocation using malloc, but the
big advantage is that when an exception occurs all we have to do
to free up the memory in use at the time of the exception is to
restore the stack pointer.  The stack is implemented using a
linked list of blocks.
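
A minimal sketch of a stack allocator built from a linked list of
blocks is given below.  The block size, the struct layouts and the
names stalloc_sketch, set_mark and pop_to_mark are assumptions for
illustration; the routines in memalloc.c are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCK_SIZE 256          /* assumed payload bytes per block */

struct block {
    struct block *prev;         /* linked list of blocks, newest first */
    size_t used;                /* bytes handed out from space[] */
    char space[BLOCK_SIZE];
};

struct mark {                   /* snapshot of the stack pointer */
    struct block *top;
    size_t used;
};

static struct block *top;

/* Allocate n bytes (n <= BLOCK_SIZE), adding a block when full. */
static void *stalloc_sketch(size_t n)
{
    void *p;

    /* Round up so that returned pointers stay aligned. */
    n = (n + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
    assert(n <= BLOCK_SIZE);
    if (top == NULL || top->used + n > BLOCK_SIZE) {
        struct block *b = malloc(sizeof *b);
        if (b == NULL)
            abort();            /* the shell would call error here */
        b->prev = top;
        b->used = 0;
        top = b;
    }
    p = top->space + top->used;
    top->used += n;
    return p;
}

/* Record the current stack pointer. */
static struct mark set_mark(void)
{
    struct mark m;

    m.top = top;
    m.used = top != NULL ? top->used : 0;
    return m;
}

/* Free everything allocated since the mark was taken. */
static void pop_to_mark(struct mark m)
{
    while (top != m.top) {
        struct block *b = top;
        top = b->prev;
        free(b);
    }
    if (top != NULL)
        top->used = m.used;
}
```

An exception handler only needs the mark taken before the failed
operation; one call to pop_to_mark releases everything allocated
after it, however many individual allocations there were.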

STPUTC:  If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:
        p = stackptr;
        *p++ = c;               /* repeated as many times as needed */
        stackptr = p;
The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:
        STARTSTACKSTR(p);
        STPUTC(c, p);           /* repeated as many times as needed */
        grabstackstr(p);
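
The idea of "append a character, growing the storage if needed" can
be shown with a single heap buffer standing in for the shell's
stack.  START_STR, PUT_CH, start_str and grow are hypothetical names
chosen for this sketch; they are not the macros from memalloc.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the stack: one growable buffer. */
static char *buf;
static size_t buf_size;

/* Make sure a buffer exists and return its start. */
static char *start_str(void)
{
    if (buf == NULL) {
        buf_size = 8;
        buf = malloc(buf_size);
        if (buf == NULL)
            abort();
    }
    return buf;
}

/* Double the buffer; return the write pointer at the same offset. */
static char *grow(char *p)
{
    size_t off = (size_t)(p - buf);
    char *nbuf = realloc(buf, buf_size * 2);

    if (nbuf == NULL)
        abort();
    buf = nbuf;
    buf_size *= 2;
    return buf + off;
}

#define START_STR(p)    ((p) = start_str())
#define PUT_CH(c, p)    do { \
        if ((p) == buf + buf_size) \
            (p) = grow(p); \
        *(p)++ = (c); \
    } while (0)
```

The caller keeps the write pointer in a local variable, so the
common case costs one comparison and one store.  The pointer is
recomputed only when the buffer moves.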

We now start a top-down look at the code:

MAIN.C:  The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop.  Cmdloop
repeatedly parses and executes commands.

OPTIONS.C:  This file contains the option processing code.  It is
called from main to parse the shell arguments when the shell is
invoked, and it also contains the set builtin.  The -i and -m
options (the latter turns on job control) require changes in
signal handling.  The routines setjobctl (in jobs.c) and
setinteractive (in trap.c) are called to handle changes to these
options.

PARSING:  The parser code is all in parser.c.  A recursive
descent parser is used.  Syntax tables (generated by mksyntax) are
used to classify characters during lexical analysis.  There are
four tables:  one for normal use, one for use when inside single
quotes and dollar single quotes, one for use when inside double
quotes and one for use in arithmetic.  The tables are machine
dependent because they are indexed by character variables and
the range of a char varies from machine to machine.

PARSE OUTPUT:  The output of the parser consists of a tree of
nodes.  The various types of nodes are defined in the file
nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents.  An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance.  It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word.  The text consists of ordinary characters and a number of
special codes defined in parser.h.  The special codes are:

        CTLVAR              Variable substitution
        CTLENDVAR           End of variable substitution
        CTLBACKQ            Command substitution
        CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
        CTLESC              Escape next character

A variable substitution contains the following elements:

        CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution.  The possible types are:

        VSNORMAL            $var
        VSMINUS             ${var-text}
        VSMINUS|VSNUL       ${var:-text}
        VSPLUS              ${var+text}
        VSPLUS|VSNUL        ${var:+text}
        VSQUESTION          ${var?text}
        VSQUESTION|VSNUL    ${var:?text}
        VSASSIGN            ${var=text}
        VSASSIGN|VSNUL      ${var:=text}

In addition, the type field will have the VSQUOTE flag set if the
variable is enclosed in double quotes.  The name of the variable
comes next, terminated by an equals sign.  If the type is not
VSNORMAL, then the text field in the substitution follows,
terminated by a CTLENDVAR byte.
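
As an illustration of this byte layout, the sketch below encodes a
substitution into a buffer.  The numeric values of the codes and
flags, the VSTYPE_MASK constant and the encode_var function are
assumptions made for the example; the real values are defined in
parser.h.

```c
#include <assert.h>
#include <string.h>

/* Assumed values, for illustration only. */
#define CTLVAR      0x82
#define CTLENDVAR   0x83
#define VSNORMAL    0x01
#define VSMINUS     0x02
#define VSNUL       0x10    /* the ':' variants */
#define VSQUOTE     0x80    /* inside double quotes */
#define VSTYPE_MASK 0x0f    /* hypothetical: strips the flag bits */

/*
 * Write  CTLVAR type name '=' [ alt CTLENDVAR ]  to out.
 * alt is ignored for VSNORMAL.  Returns the number of bytes written.
 */
static size_t encode_var(unsigned char *out, int type,
                         const char *name, const char *alt)
{
    unsigned char *p = out;
    size_t n = strlen(name);

    *p++ = CTLVAR;
    *p++ = (unsigned char)type;
    memcpy(p, name, n);
    p += n;
    *p++ = '=';
    if ((type & VSTYPE_MASK) != VSNORMAL) {
        assert(alt != NULL);
        n = strlen(alt);
        memcpy(p, alt, n);
        p += n;
        *p++ = CTLENDVAR;
    }
    return (size_t)(p - out);
}
```

With these assumed values, ${var:-text} becomes the eleven bytes
CTLVAR, VSMINUS|VSNUL, "var=", "text", CTLENDVAR, while $x becomes
the four bytes CTLVAR, VSNORMAL, "x=".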

Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether
the back quotes were enclosed in double quotes.

The character CTLESC escapes the next character, so that in case
any of the CTL characters mentioned above appear in the input,
they can be passed through transparently.  CTLESC is also used to
escape '*', '?', '[', and '!' characters which were quoted by the
user and thus should not be used for file name generation.

CTLESC characters have proved to be particularly tricky to get
right.  In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing).  Other here
documents, and words which are not subject to splitting and file
name generation, have the CTLESC characters removed during the
variable and command substitution phase.  Words which are subject
to splitting and file name generation have the CTLESC characters
removed as part of the file name generation phase.

EXECUTION:  Command execution is handled by the following files:
        eval.c     The top level routines.
        redir.c    Code to handle redirection of input and output.
        jobs.c     Code to handle forking, waiting, and job control.
        exec.c     Code to do path searches and the actual exec sys call.
        expand.c   Code to evaluate arguments.
        var.c      Maintains the variable symbol table.  Called from expand.c.

EVAL.C:  Evaltree recursively executes a parse tree.  The exit
status is returned in the global variable exitstatus.  The
alternative entry evalbackcmd is called to evaluate commands in
back quotes.  It saves the result in memory if the command is a
builtin; otherwise it forks off a child to execute the command and
connects the standard output of the child to a pipe.

JOBS.C:  To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process.  Waitforjob waits for a job
to complete.  These routines take care of process groups if job
control is defined.

REDIR.C:  Ash allows file descriptors to be redirected and then
restored without forking off a child process.  This is
accomplished by duplicating the original file descriptors.  The
redirtab structure records where the file descriptors have been
duplicated to.
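
The duplicate-then-restore technique can be sketched with the POSIX
calls dup, dup2 and close.  The struct saved_fd and the functions
redirect and restore are hypothetical miniatures written for this
example; they are not the redirtab code from redir.c.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical one-entry version of a redirection table. */
struct saved_fd {
    int fd;         /* descriptor that was redirected */
    int copy;       /* where the original was duplicated to, or -1 */
};

/*
 * Make fd refer to whatever newfd refers to, remembering the
 * original in *save.  Returns 0 on success, -1 on failure.
 */
static int redirect(int fd, int newfd, struct saved_fd *save)
{
    save->fd = fd;
    save->copy = dup(fd);           /* -1 if fd was not open */
    if (dup2(newfd, fd) < 0) {
        if (save->copy >= 0)
            close(save->copy);
        save->copy = -1;
        return -1;
    }
    return 0;
}

/* Undo a successful redirect(): put the original descriptor back. */
static void restore(struct saved_fd *save)
{
    if (save->copy >= 0) {
        dup2(save->copy, save->fd);
        close(save->copy);
    } else {
        close(save->fd);            /* fd was closed beforehand */
    }
    save->copy = -1;
}
```

Because the original descriptor survives as a copy, a builtin can
run with redirected output in the shell's own process and have the
old descriptor back afterwards.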

EXEC.C:  The routine find_command locates a command, and enters
the command in the hash table if it is not already there.  The
third argument specifies whether it is to print an error message
if the command is not found.  (When a pipeline is set up,
find_command is called for all the commands in the pipeline
before any forking is done, so as to get the commands into the
hash table of the parent process.  But to make command hashing as
transparent as possible, we silently ignore errors at that point
and only print error messages if the command cannot be found
later.)

The routine shellexec is the interface to the exec system call.

EXPAND.C:  Arguments are processed in three passes.  The first
(performed by the routine argstr) performs variable and command
substitution.  The second (ifsbreakup) performs word splitting
and the third (expandmeta) performs file name generation.

VAR.C:  Variables are stored in a hash table.  Probably we should
switch to extensible hashing.  The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command.  Variables which the shell references internally are
preallocated so that the shell can reference the values of these
variables without doing a lookup.
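
A small chained hash table keyed on the name part of a "name=value"
string might look like the sketch below.  The table size, the hash
function and the names setvar_eq and lookupvar_sketch are
assumptions for illustration and are not taken from var.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define VTABSIZE 39         /* assumed number of hash buckets */

struct var {
    struct var *next;       /* next variable in the same bucket */
    char *text;             /* "name=value", usable as an envp entry */
};

static struct var *vartab[VTABSIZE];

/* Hash the name part only (up to '=' or the end of the string). */
static unsigned hash_name(const char *s)
{
    unsigned h = 0;

    while (*s != '\0' && *s != '=')
        h = h * 31 + (unsigned char)*s++;
    return h % VTABSIZE;
}

/* Does the stored string text define the len-byte name? */
static int name_matches(const char *text, const char *name, size_t len)
{
    return strncmp(text, name, len) == 0 && text[len] == '=';
}

/* Store a copy of "name=value", replacing any earlier definition. */
static void setvar_eq(const char *text)
{
    size_t len = strcspn(text, "=");
    struct var **vpp = &vartab[hash_name(text)];
    struct var *vp;
    char *copy = malloc(strlen(text) + 1);

    if (copy == NULL)
        abort();
    strcpy(copy, text);
    for (vp = *vpp; vp != NULL; vp = vp->next) {
        if (name_matches(vp->text, text, len)) {
            free(vp->text);
            vp->text = copy;
            return;
        }
    }
    vp = malloc(sizeof *vp);
    if (vp == NULL)
        abort();
    vp->text = copy;
    vp->next = *vpp;
    *vpp = vp;
}

/* Return a pointer to the value inside the stored string, or NULL. */
static const char *lookupvar_sketch(const char *name)
{
    size_t len = strlen(name);
    struct var *vp;

    for (vp = vartab[hash_name(name)]; vp != NULL; vp = vp->next)
        if (name_matches(vp->text, name, len))
            return vp->text + len + 1;
    return NULL;
}
```

Since each entry already has the form expected in an environment
array, building the environment for a command amounts to collecting
the text pointers; no string needs to be copied.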

When a program is run, the code in eval.c sticks any environment
variables which precede the command (as in "PATH=xxx command") in
the variable table as the simplest way to strip duplicates, and
then calls "environment" to get the value of the environment.

BUILTIN COMMANDS:  The procedures for handling these are
scattered throughout the code, depending on which location appears
most appropriate.  They can be recognized because their names
always end in "cmd".  The mapping from names to procedures is
specified in the file builtins, which is processed by the
mkbuiltins command.

A builtin command is invoked with argc and argv set up like a
normal program.  A builtin command is allowed to overwrite its
arguments.  Builtin routines can call nextopt to do option
parsing.  This is kind of like getopt, but you don't pass argc and
argv to it.  Builtin routines can also call error.  This routine
normally terminates the shell (or returns to the main command
loop if the shell is interactive), but when called from a builtin
command it causes the builtin command to terminate with an exit
status of 2.

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for efficiency
reasons.  The makefile in this directory compiles these programs
in the normal fashion (so that they can be run regardless of
whether the invoker is ash), but also creates a library named
bltinlib.a which can be linked with ash.  The header file bltin.h
takes care of most of the differences between the ash and the
stand-alone environment.  The user should call the main routine
"main", and #define main to be the name of the routine to use
when the program is linked into ash.  This #define should appear
before bltin.h is included; bltin.h will #undef main if the
program is to be compiled stand-alone.

CD.C:  This file defines the cd and pwd builtins.

SIGNALS:  Trap.c implements the trap command.  The routine
setsignal figures out what action should be taken when a signal is
received and invokes the signal system call to set the signal
action appropriately.  When a signal that a user has set a trap
for is caught, the routine "onsig" sets a flag.  The routine
dotrap is called at appropriate points to actually handle the
signal.  When an interrupt is caught and no trap has been set for
that signal, the routine "onint" in error.c is called.
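
The "set a flag in the handler, act on it later" pattern is
sketched below.  The names onsig and dotrap come from the text, but
the bodies, the gotsig and pendingsigs variables, MAXSIG and the
handled_count counter are assumptions for this example.

```c
#include <assert.h>
#include <signal.h>

#define MAXSIG 64   /* assumed upper bound on signal numbers */

static volatile sig_atomic_t gotsig[MAXSIG];  /* per-signal flags */
static volatile sig_atomic_t pendingsigs;     /* any flag set? */
static int handled_count;                     /* demo: traps run */

/* Signal handler: only records that the signal arrived. */
static void onsig(int signo)
{
    if (signo > 0 && signo < MAXSIG)
        gotsig[signo] = 1;
    pendingsigs = 1;
}

/* Called at safe points to act on the recorded signals. */
static void dotrap(void)
{
    int i;

    if (!pendingsigs)
        return;
    pendingsigs = 0;
    for (i = 1; i < MAXSIG; i++) {
        if (gotsig[i]) {
            gotsig[i] = 0;
            /* The shell would evaluate the trap command here. */
            handled_count++;
        }
    }
}
```

Deferring the work keeps the handler trivial and means the trap
command itself runs only at points where the shell's data
structures are consistent.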

OUTPUT:  Ash uses its own output routines.  There are three
output structures allocated.  "Output" represents the standard
output, "errout" the standard error, and "memout" contains output
which is to be stored in memory.  This last is used when a
builtin command appears in backquotes, to allow its output to be
collected without doing any I/O through the UNIX operating system.
The variables out1 and out2 normally point to output and errout,
respectively, but they are set to point to memout when
appropriate inside backquotes.

INPUT:  The basic input routine is pgetc, which reads from the
current input file.  There is a stack of input files; the current
input file is the top file on this stack.  The code allows the
input to come from a string rather than a file.  (This is for the
-c option and the "." and eval builtin commands.)  The global
variable plinno is saved and restored when files are pushed and
popped from the stack.  The parser routines store the number of
the current line in this variable.

DEBUGGING:  If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace.  Most of
this is done using the TRACE macro, which takes a set of printf
arguments inside two sets of parentheses.  Example:
"TRACE(("n=%d\n", n))".  The double parentheses are necessary
because the preprocessor can't handle macros with a variable
number of arguments.  Defining DEBUG also causes the shell to
generate a core dump if it is sent a quit signal.  The tracing
code is in show.c.
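
The double-parenthesis trick works because the macro receives the
whole parenthesized argument list as a single argument and pastes
it after a function name.  A minimal sketch follows.  The trace
function and the tracefile variable are assumptions for the
example, and DEBUG is defined inline here only so that the sketch
is active; it does not reproduce show.c.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1     /* defined here only to enable the sketch */

static FILE *tracefile;     /* trace output; NULL disables tracing */

/* printf-style function that writes to the trace file. */
static void trace(const char *fmt, ...)
{
    va_list ap;

    if (tracefile == NULL)
        return;
    va_start(ap, fmt);
    vfprintf(tracefile, fmt, ap);
    va_end(ap);
    fflush(tracefile);
}

#ifdef DEBUG
/* TRACE(("n=%d\n", n)) expands to: trace ("n=%d\n", n) */
#define TRACE(param)    trace param
#else
/* Without DEBUG the call and its arguments vanish entirely. */
#define TRACE(param)
#endif
```

With DEBUG undefined, every TRACE call compiles to nothing, so the
trace statements can stay in the source at no run-time cost.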