Commit | Line | Data |
---|---|---|
1 | # @(#)TOUR 8.1 (Berkeley) 5/31/93 | |
2 | # $FreeBSD: head/bin/sh/TOUR 317882 2017-05-06 13:28:42Z jilles $ | |
3 | ||
4 | NOTE -- This is the original TOUR paper distributed with ash and | |
5 | does not represent the current state of the shell. It is provided anyway | |
6 | since it provides helpful information for how the shell is structured, | |
7 | but be warned that things have changed -- the current shell is | |
8 | still under development. | |
9 | ||
10 | ================================================================ | |
11 | ||
12 | A Tour through Ash | |
13 | ||
14 | Copyright 1989 by Kenneth Almquist. | |
15 | ||
16 | ||
17 | DIRECTORIES: The subdirectory bltin contains commands which can | |
18 | be compiled stand-alone. The rest of the source is in the main | |
19 | ash directory. | |
20 | ||
21 | SOURCE CODE GENERATORS: Files whose names begin with "mk" are | |
22 | programs that generate source code. A complete list of these | |
23 | programs is: | |
24 | ||
25 | program input files generates | |
26 | ------- ----------- --------- | |
27 | mkbuiltins builtins.def builtins.h builtins.c | |
28 | mknodes nodetypes nodes.h nodes.c | |
29 | mksyntax - syntax.h syntax.c | |
30 | mktokens - token.h | |
31 | ||
32 | There are undoubtedly too many of these. | |
33 | ||
34 | EXCEPTIONS: Code for dealing with exceptions appears in | |
35 | error.c. The C language doesn't include exception handling, | |
36 | so I implement it using setjmp and longjmp. The global variable | |
37 | exception contains the type of exception. EXERROR is raised by | |
38 | calling error. EXINT is an interrupt. | |
39 | ||
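The setjmp/longjmp technique described in the EXCEPTIONS paragraph can be sketched as follows. This is a minimal illustration, not the shell's actual code: the names handler, raise_exception, sh_error and call_protected are hypothetical, and the numeric values of EXINT and EXERROR are chosen only for the example.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

enum { EXINT, EXERROR };        /* exception types; the values are illustrative */

jmp_buf *handler;               /* innermost active handler (hypothetical name) */
volatile int exception;         /* type of the exception being raised */

/* Record the exception type and jump to the innermost handler. */
void raise_exception(int type)
{
    assert(handler != NULL);
    exception = type;
    longjmp(*handler, 1);
}

/* Stand-in for the shell's error routine: it always raises EXERROR. */
void sh_error(void)
{
    raise_exception(EXERROR);
}

/*
 * Call fn with a fresh handler installed.  Returns -1 if fn returned
 * normally, otherwise the type of the exception that fn raised.
 */
int call_protected(void (*fn)(void))
{
    jmp_buf here;
    jmp_buf *saved = handler;   /* set before setjmp, never changed afterwards */

    if (setjmp(here) != 0) {    /* reached a second time, via longjmp */
        handler = saved;
        return exception;
    }
    handler = &here;
    fn();
    handler = saved;
    return -1;
}

/* Three sample routines used to exercise call_protected. */
void succeeds(void)    { }
void fails(void)       { sh_error(); }
void interrupted(void) { raise_exception(EXINT); }
```

Saving and restoring the previous handler around each protected call is what lets handlers nest: an inner routine can catch an exception, clean up, and raise it again to the outer handler.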
40 | INTERRUPTS: In an interactive shell, an interrupt will cause an | |
41 | EXINT exception to return to the main command loop. (Exception: | |
42 | EXINT is not raised if the user traps interrupts using the trap | |
43 | command.) The INTOFF and INTON macros (defined in error.h) | |
44 | provide uninterruptible critical sections. Between the execution | |
45 | of INTOFF and the execution of INTON, interrupt signals will be | |
46 | held for later delivery. INTOFF and INTON can be nested. | |
47 | ||
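One plausible way to get nestable critical sections like those described in the INTERRUPTS paragraph is a nesting counter plus a pending flag. The sketch below assumes that design. The variable names suppressint and intpending, and the helper routines, are assumptions made for the example, and delivery is simulated with a counter instead of raising EXINT.

```c
#include <assert.h>
#include <signal.h>

volatile sig_atomic_t suppressint;  /* nesting depth of INTOFF (assumed name) */
volatile sig_atomic_t intpending;   /* an interrupt arrived while suppressed */
int delivered;                      /* interrupts acted upon, for demonstration */

/* Act on an interrupt; a real shell would raise EXINT here. */
void deliver_interrupt(void)
{
    intpending = 0;
    delivered++;
}

/* Enter a critical section; calls may nest. */
#define INTOFF  do { suppressint++; } while (0)

/* Leave a critical section; deliver a held interrupt at the outermost level. */
#define INTON   do { \
        assert(suppressint > 0); \
        if (--suppressint == 0 && intpending) \
            deliver_interrupt(); \
    } while (0)

/* What the signal handler would do when an interrupt arrives. */
void on_interrupt(void)
{
    if (suppressint > 0)
        intpending = 1;         /* hold for later delivery */
    else
        deliver_interrupt();
}
```

With two nested INTOFF calls, an interrupt that arrives in between is held until the second INTON brings the counter back to zero.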
48 | MEMALLOC.C: Memalloc.c defines versions of malloc and realloc | |
49 | which call error when there is no memory left. It also defines a | |
50 | stack oriented memory allocation scheme. Allocating off a stack | |
51 | is probably more efficient than allocation using malloc, but the | |
52 | big advantage is that when an exception occurs all we have to do | |
53 | to free up the memory in use at the time of the exception is to | |
54 | restore the stack pointer. The stack is implemented using a | |
55 | linked list of blocks. | |
56 | ||
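The idea of a stack built from a linked list of blocks, where cleanup after an exception amounts to restoring a saved position, can be illustrated with the sketch below. It is a simplified model under stated assumptions: the names stack_alloc, set_mark and pop_to_mark are hypothetical, the block size is deliberately tiny, and alignment and oversized requests are ignored.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCKSIZE 64              /* tiny on purpose, to force several blocks */

struct block {
    struct block *prev;           /* older block, or NULL */
    char space[BLOCKSIZE];
};

struct mark {                     /* a saved stack position */
    struct block *blk;
    size_t used;
};

struct block *top;                /* current block, NULL if the stack is empty */
size_t used;                      /* bytes used in *top */
int nblocks;                      /* live blocks, for demonstration */

/* Allocate n bytes from the top of the stack, adding a block if needed. */
void *stack_alloc(size_t n)
{
    void *p;

    assert(n <= BLOCKSIZE);       /* this sketch does not handle big requests */
    if (top == NULL || used + n > BLOCKSIZE) {
        struct block *b = malloc(sizeof *b);
        if (b == NULL)
            abort();              /* the shell would call error instead */
        b->prev = top;
        top = b;
        used = 0;
        nblocks++;
    }
    p = top->space + used;
    used += n;
    return p;
}

/* Remember the current stack position. */
void set_mark(struct mark *m)
{
    m->blk = top;
    m->used = used;
}

/* Free everything allocated since the mark was taken. */
void pop_to_mark(const struct mark *m)
{
    while (top != m->blk) {
        struct block *b = top;
        top = b->prev;
        free(b);
        nblocks--;
    }
    used = m->used;
}
```

An exception handler only needs the mark taken before the protected code ran; it does not have to know what was allocated in between.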
57 | STPUTC: If the stack were contiguous, it would be easy to store | |
58 | strings on the stack without knowing in advance how long the | |
59 | string was going to be: | |
60 | p = stackptr; | |
61 | *p++ = c; /* repeated as many times as needed */ | |
62 | stackptr = p; | |
63 | The following three macros (defined in memalloc.h) perform these | |
64 | operations, but grow the stack if you run off the end: | |
65 | STARTSTACKSTR(p); | |
66 | STPUTC(c, p); /* repeated as many times as needed */ | |
67 | grabstackstr(p); | |
68 | ||
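The grow-on-demand behaviour of the three macros can be imitated with a single heap buffer, as in the sketch below. It is not the implementation in memalloc.h: the real macros grow the block stack, whereas this example uses realloc, and grow_stack, grab_string and copy_via_stack are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

char  *stack_base;      /* start of the string under construction */
size_t stack_size;      /* bytes currently available at stack_base */

/* Double the buffer and return the write pointer relocated into it. */
char *grow_stack(char *p)
{
    size_t len = stack_base ? (size_t)(p - stack_base) : 0;
    size_t newsize = stack_size ? stack_size * 2 : 8;
    char *nb = realloc(stack_base, newsize);

    if (nb == NULL)
        abort();        /* the shell would call error instead */
    stack_base = nb;
    stack_size = newsize;
    return nb + len;
}

/* Begin a string: p points at the first free byte. */
#define STARTSTACKSTR(p) \
    ((p) = stack_base ? stack_base : grow_stack(NULL))

/* Append one character, growing the buffer when p has reached its end. */
#define STPUTC(c, p) \
    do { \
        if ((p) == stack_base + stack_size) \
            (p) = grow_stack(p); \
        *(p)++ = (c); \
    } while (0)

/* Terminate the string and hand it to the caller, who must free it. */
char *grab_string(char *p)
{
    char *s;

    STPUTC('\0', p);
    s = stack_base;
    stack_base = NULL;
    stack_size = 0;
    return s;
}

/* Example use: copy src one character at a time. */
char *copy_via_stack(const char *src)
{
    char *p;

    STARTSTACKSTR(p);
    while (*src != '\0')
        STPUTC(*src++, p);
    return grab_string(p);
}
```

The caller keeps the write pointer in a local variable, so the common case of STPUTC is one comparison and one store; only the rare growth step calls a function.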
69 | We now start a top-down look at the code: | |
70 | ||
71 | MAIN.C: The main routine performs some initialization, executes | |
72 | the user's profile if necessary, and calls cmdloop. Cmdloop | |
73 | repeatedly parses and executes commands. | |
74 | ||
75 | OPTIONS.C: This file contains the option processing code. It is | |
76 | called from main to parse the shell arguments when the shell is | |
77 | invoked, and it also contains the set builtin. The -i and -m | |
78 | options (the latter turns on job control) require changes in signal | |
79 | handling. The routines setjobctl (in jobs.c) and setinteractive | |
80 | (in trap.c) are called to handle changes to these options. | |
81 | ||
82 | PARSING: The parser code is all in parser.c. A recursive | |
83 | descent parser is used. Syntax tables (generated by mksyntax) are | |
84 | used to classify characters during lexical analysis. There are | |
85 | four tables: one for normal use, one for use when inside single | |
86 | quotes and dollar single quotes, one for use when inside double | |
87 | quotes and one for use in arithmetic. The tables are machine | |
88 | dependent because they are indexed by character variables and | |
89 | the range of a char varies from machine to machine. | |
90 | ||
91 | PARSE OUTPUT: The output of the parser consists of a tree of | |
92 | nodes. The various types of nodes are defined in the file | |
93 | nodetypes. | |
94 | ||
95 | Nodes of type NARG are used to represent both words and the | |
96 | contents of here documents. An early version of ash kept the | |
97 | contents of here documents in temporary files, but keeping here | |
98 | documents in memory typically results in significantly better | |
99 | performance. It would have been nice to make it an option to use | |
100 | temporary files for here documents, for the benefit of small | |
101 | machines, but the code to keep track of when to delete the | |
102 | temporary files was complex and I never fixed all the bugs in it. | |
103 | (AT&T has been maintaining the Bourne shell for more than ten | |
104 | years, and to the best of my knowledge they still haven't gotten | |
105 | it to handle temporary files correctly in obscure cases.) | |
106 | ||
107 | The text field of a NARG structure points to the text of the | |
108 | word. The text consists of ordinary characters and a number of | |
109 | special codes defined in parser.h. The special codes are: | |
110 | ||
111 | CTLVAR Parameter expansion | |
112 | CTLENDVAR End of parameter expansion | |
113 | CTLBACKQ Command substitution | |
114 | CTLBACKQ|CTLQUOTE Command substitution inside double quotes | |
115 | CTLARI Arithmetic expansion | |
116 | CTLENDARI End of arithmetic expansion | |
117 | CTLESC Escape next character | |
118 | ||
119 | A variable substitution contains the following elements: | |
120 | ||
121 | CTLVAR type name '=' [ alternative-text CTLENDVAR ] | |
122 | ||
123 | The type field is a single character specifying the type of | |
124 | substitution. The possible types are: | |
125 | ||
126 | VSNORMAL $var | |
127 | VSMINUS ${var-text} | |
128 | VSMINUS|VSNUL ${var:-text} | |
129 | VSPLUS ${var+text} | |
130 | VSPLUS|VSNUL ${var:+text} | |
131 | VSQUESTION ${var?text} | |
132 | VSQUESTION|VSNUL ${var:?text} | |
133 | VSASSIGN ${var=text} | |
134 | VSASSIGN|VSNUL ${var:=text} | |
135 | VSTRIMLEFT ${var#text} | |
136 | VSTRIMLEFTMAX ${var##text} | |
137 | VSTRIMRIGHT ${var%text} | |
138 | VSTRIMRIGHTMAX ${var%%text} | |
139 | VSLENGTH ${#var} | |
140 | VSERROR delayed error | |
141 | ||
142 | In addition, the type field will have the VSQUOTE flag set if the | |
143 | variable is enclosed in double quotes and the VSLINENO flag if | |
144 | LINENO is being expanded (the parameter name is the decimal line | |
145 | number). The parameter's name comes next, terminated by an equals | |
146 | sign. If the type is not VSNORMAL (including when it is VSLENGTH), | |
147 | then the text field in the substitution follows, terminated by a | |
148 | CTLENDVAR byte. | |
149 | ||
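The layout of a variable substitution described above (CTLVAR, a type byte, the name, an equals sign, then optional text closed by CTLENDVAR) can be decoded as in the sketch below. The byte values are placeholders chosen for the example and are not copied from parser.h. The struct and function names are hypothetical, and nested substitutions or CTLESC bytes inside the text are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder values; the real definitions live in parser.h. */
enum { CTLVAR = 0x81, CTLENDVAR = 0x82 };
enum { VSTYPE = 0x0f, VSNUL = 0x10, VSNORMAL = 1, VSMINUS = 2 };

struct varsub {
    int  type;          /* VSNORMAL, VSMINUS, ... */
    int  nul;           /* nonzero if the ':' form was used */
    char name[32];
    char text[64];      /* text of the substitution, empty if absent */
};

/*
 * Decode one substitution starting at s, which must point at CTLVAR.
 * Returns a pointer just past the substitution, or NULL if malformed.
 */
const unsigned char *decode_varsub(const unsigned char *s, struct varsub *out)
{
    size_t n;

    if (*s++ != CTLVAR)
        return NULL;
    out->type = *s & VSTYPE;
    out->nul = (*s & VSNUL) != 0;
    s++;

    /* The name runs up to the equals sign. */
    for (n = 0; *s != '='; s++) {
        if (*s == '\0' || n + 1 >= sizeof out->name)
            return NULL;
        out->name[n++] = (char)*s;
    }
    out->name[n] = '\0';
    s++;                /* skip '=' */

    /* Every type except VSNORMAL carries text closed by CTLENDVAR. */
    n = 0;
    if (out->type != VSNORMAL) {
        for (; *s != CTLENDVAR; s++) {
            if (*s == '\0' || n + 1 >= sizeof out->text)
                return NULL;
            out->text[n++] = (char)*s;
        }
        s++;            /* skip CTLENDVAR */
    }
    out->text[n] = '\0';
    return s;
}
```

Under these placeholder values, ${HOME:-/tmp} would be stored as CTLVAR, VSMINUS|VSNUL, "HOME=", "/tmp", CTLENDVAR.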
150 | The type VSERROR is used to allow parsing bad substitutions like | |
151 | ${var[7]} and generate an error when they are expanded. | |
152 | ||
153 | Commands in back quotes are parsed and stored in a linked list. | |
154 | The locations of these commands in the string are indicated by | |
155 | CTLBACKQ and CTLBACKQ+CTLQUOTE characters, depending upon whether | |
156 | the back quotes were enclosed in double quotes. | |
157 | ||
158 | Arithmetic expansion starts with CTLARI and ends with CTLENDARI. | |
159 | ||
160 | The character CTLESC escapes the next character, so that in case | |
161 | any of the CTL characters mentioned above appear in the input, | |
162 | they can be passed through transparently. CTLESC is also used to | |
163 | escape '*', '?', '[', and '!' characters which were quoted by the | |
164 | user and thus should not be used for file name generation. | |
165 | ||
166 | CTLESC characters have proved to be particularly tricky to get | |
167 | right. In the case of here documents which are not subject to | |
168 | variable and command substitution, the parser doesn't insert any | |
169 | CTLESC characters to begin with (so the contents of the text | |
170 | field can be written without any processing). Other here | |
171 | documents, and words which are not subject to file name generation, | |
172 | have the CTLESC characters removed during the variable and command | |
173 | substitution phase. Words which are subject to file name | |
174 | generation have the CTLESC characters removed as part of the file | |
175 | name phase. | |
176 | ||
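Removing CTLESC characters, as the paragraphs above describe for the substitution and file name phases, comes down to dropping each escape byte and keeping the byte after it. A minimal sketch follows. The value of CTLESC is a placeholder and remove_escapes is a hypothetical name, not necessarily the routine the shell uses.

```c
#include <assert.h>
#include <string.h>

enum { CTLESC = 0x81 };     /* placeholder value; see parser.h for the real one */

/* Remove CTLESC bytes in place, keeping each escaped byte literally. */
void remove_escapes(char *str)
{
    const unsigned char *in = (const unsigned char *)str;
    char *out = str;

    while (*in != '\0') {
        if (*in == CTLESC && in[1] != '\0')
            in++;           /* drop the escape, keep what follows */
        *out++ = (char)*in++;
    }
    *out = '\0';
}
```

Because the byte after an escape is copied without being examined, an escaped CTLESC survives as a literal byte, which is how control characters in the input pass through transparently.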
177 | EXECUTION: Command execution is handled by the following files: | |
178 | eval.c The top level routines. | |
179 | redir.c Code to handle redirection of input and output. | |
180 | jobs.c Code to handle forking, waiting, and job control. | |
181 | exec.c Code to do path searches and the actual exec sys call. | |
182 | expand.c Code to evaluate arguments. | |
183 | var.c Maintains the variable symbol table. Called from expand.c. | |
184 | ||
185 | EVAL.C: Evaltree recursively executes a parse tree. The exit | |
186 | status is returned in the global variable exitstatus. The | |
187 | alternative entry evalbackcmd is called to evaluate commands in back | |
188 | quotes. It saves the result in memory if the command is a | |
189 | builtin; otherwise it forks off a child to execute the command and | |
190 | connects the standard output of the child to a pipe. | |
191 | ||
192 | JOBS.C: To create a process, you call makejob to return a job | |
193 | structure, and then call forkshell (passing the job structure as | |
194 | an argument) to create the process. Waitforjob waits for a job | |
195 | to complete. These routines take care of process groups if job | |
196 | control is defined. | |
197 | ||
198 | REDIR.C: Ash allows file descriptors to be redirected and then | |
199 | restored without forking off a child process. This is | |
200 | accomplished by duplicating the original file descriptors. The | |
201 | redirtab structure records where the file descriptors have been | |
202 | duplicated to. | |
203 | ||
204 | EXEC.C: The routine find_command locates a command, and enters | |
205 | the command in the hash table if it is not already there. The | |
206 | third argument specifies whether it is to print an error message | |
207 | if the command is not found. (When a pipeline is set up, | |
208 | find_command is called for all the commands in the pipeline | |
209 | before any forking is done, so as to get the commands into the hash | |
210 | table of the parent process. But to make command hashing as | |
211 | transparent as possible, we silently ignore errors at that point | |
212 | and only print error messages if the command cannot be found | |
213 | later.) | |
214 | ||
215 | The routine shellexec is the interface to the exec system call. | |
216 | ||
217 | EXPAND.C: As the routine argstr generates words by parameter | |
218 | expansion, command substitution and arithmetic expansion, it | |
219 | performs word splitting on the result. As each word is output, | |
220 | the routine expandmeta performs file name generation (if enabled). | |
221 | ||
222 | VAR.C: Variables are stored in a hash table. Probably we should | |
223 | switch to extensible hashing. The variable name is stored in the | |
224 | same string as the value (using the format "name=value") so that | |
225 | no string copying is needed to create the environment of a | |
226 | command. Variables which the shell references internally are | |
227 | preallocated so that the shell can reference the values of these | |
228 | variables without doing a lookup. | |
229 | ||
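The benefit of keeping each variable as one "name=value" string is that building an environment only requires collecting pointers. The sketch below illustrates this with a small chained hash table. The table size, the hash function and all routine names are assumptions made for the example, not the definitions in var.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define VTABSIZE 39                 /* arbitrary table size */

struct var {
    struct var *next;               /* next entry in the same bucket */
    char *text;                     /* "name=value" */
};

struct var *vartab[VTABSIZE];

/* Hash the name part only, stopping at '=' or at the end of the string. */
unsigned hashname(const char *p)
{
    unsigned h = 0;

    while (*p != '\0' && *p != '=')
        h = h * 31 + (unsigned char)*p++;
    return h % VTABSIZE;
}

/* Nonzero if the stored "name=value" text has the same name as `name`. */
int name_matches(const char *text, const char *name)
{
    size_t n = strcspn(name, "=");

    return strncmp(text, name, n) == 0 && text[n] == '=';
}

/* Insert or replace; text must be "name=value" and must outlive the table. */
void set_variable(char *text)
{
    struct var **bucket = &vartab[hashname(text)];
    struct var *vp;

    for (vp = *bucket; vp != NULL; vp = vp->next) {
        if (name_matches(vp->text, text)) {
            vp->text = text;
            return;
        }
    }
    vp = malloc(sizeof *vp);
    if (vp == NULL)
        abort();
    vp->text = text;
    vp->next = *bucket;
    *bucket = vp;
}

/* Return a pointer to the value part, or NULL if the name is not set. */
const char *lookup_value(const char *name)
{
    struct var *vp;

    for (vp = vartab[hashname(name)]; vp != NULL; vp = vp->next)
        if (name_matches(vp->text, name))
            return strchr(vp->text, '=') + 1;
    return NULL;
}

/* Fill envp (max >= 1 slots) with the stored strings; nothing is copied. */
size_t make_environment(char **envp, size_t max)
{
    size_t n = 0;
    int i;
    struct var *vp;

    for (i = 0; i < VTABSIZE; i++)
        for (vp = vartab[i]; vp != NULL; vp = vp->next)
            if (n + 1 < max)
                envp[n++] = vp->text;
    envp[n] = NULL;
    return n;
}
```

Setting a name twice replaces the stored string, so inserting a command's temporary assignments into the table before collecting the environment removes duplicates as a side effect.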
230 | When a program is run, the code in eval.c sticks any environment | |
231 | variables which precede the command (as in "PATH=xxx command") in | |
232 | the variable table as the simplest way to strip duplicates, and | |
233 | then calls "environment" to get the value of the environment. | |
234 | ||
235 | BUILTIN COMMANDS: The procedures for handling these are | |
236 | scattered throughout the code, depending on which location appears | |
237 | most appropriate. They can be recognized because their names | |
238 | always end in "cmd". The mapping from names to procedures is | |
239 | specified in the file builtins.def, which is processed by the | |
240 | mkbuiltins command. | |
241 | ||
242 | A builtin command is invoked with argc and argv set up like a | |
243 | normal program. A builtin command is allowed to overwrite its | |
244 | arguments. Builtin routines can call nextopt to do option | |
245 | parsing. This is kind of like getopt, but you don't pass argc and | |
246 | argv to it. Builtin routines can also call error. This routine | |
247 | normally terminates the shell (or returns to the main command | |
248 | loop if the shell is interactive), but when called from a | |
249 | non-special builtin command it causes the builtin command to | |
250 | terminate with an exit status of 2. | |
251 | ||
252 | The directory bltin contains commands which can be compiled | |
253 | independently but can also be built into the shell for efficiency | |
254 | reasons. The header file bltin.h takes care of most of the | |
255 | differences between the ash and the stand-alone environment. | |
256 | The user should call the main routine "main", and #define main to | |
257 | be the name of the routine to use when the program is linked into | |
258 | ash. This #define should appear before bltin.h is included; | |
259 | bltin.h will #undef main if the program is to be compiled | |
260 | stand-alone. A similar approach is used for a few utilities from | |
261 | bin and usr.bin. | |
262 | ||
263 | CD.C: This file defines the cd and pwd builtins. | |
264 | ||
265 | SIGNALS: Trap.c implements the trap command. The routine | |
266 | setsignal figures out what action should be taken when a signal is | |
267 | received and invokes the signal system call to set the signal | |
268 | action appropriately. When a signal that a user has set a trap for | |
269 | is caught, the routine "onsig" sets a flag. The routine dotrap | |
270 | is called at appropriate points to actually handle the signal. | |
271 | When an interrupt is caught and no trap has been set for that | |
272 | signal, the routine "onint" in error.c is called. | |
273 | ||
274 | OUTPUT: Ash uses its own output routines. There are three | |
275 | output structures allocated. "Output" represents the standard | |
276 | output, "errout" the standard error, and "memout" contains output | |
277 | which is to be stored in memory. This last is used when a | |
278 | builtin command appears in backquotes, to allow its output to be | |
279 | collected without doing any I/O through the UNIX operating system. | |
280 | The variables out1 and out2 normally point to output and errout, | |
281 | respectively, but they are set to point to memout when | |
282 | appropriate inside backquotes. | |
283 | ||
284 | INPUT: The basic input routine is pgetc, which reads from the | |
285 | current input file. There is a stack of input files; the current | |
286 | input file is the top file on this stack. The code allows the | |
287 | input to come from a string rather than a file. (This is for the | |
288 | -c option and the "." and eval builtin commands.) The global | |
289 | variable plinno is saved and restored when files are pushed and | |
290 | popped from the stack. The parser routines store the number of | |
291 | the current line in this variable. | |
292 | ||
293 | DEBUGGING: If DEBUG is defined in shell.h, then the shell will | |
294 | write debugging information to the file $HOME/trace. Most of | |
295 | this is done using the TRACE macro, which takes a set of printf | |
296 | arguments inside two sets of parentheses. Example: | |
297 | "TRACE(("n=%d\n", n))". The double parentheses are necessary | |
298 | because the preprocessor can't handle macros with a variable | |
299 | number of arguments. Defining DEBUG also causes the shell to | |
300 | generate a core dump if it is sent a quit signal. The tracing | |
301 | code is in show.c. |
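The double-parenthesis trick can be shown in a few lines. In this sketch the inner parentheses turn the whole argument list into a single macro argument, which the macro pastes after the name of an ordinary variadic function. The function trace and the buffer trace_buf are hypothetical stand-ins: the buffer replaces the $HOME/trace file so that the example is self-contained.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define DEBUG 1

char trace_buf[256];    /* hypothetical sink standing in for the trace file */

/* Ordinary variadic function that formats and appends the message. */
void trace(const char *fmt, ...)
{
    va_list ap;
    size_t len = strlen(trace_buf);

    va_start(ap, fmt);
    vsnprintf(trace_buf + len, sizeof trace_buf - len, fmt, ap);
    va_end(ap);
}

/*
 * TRACE takes exactly one macro argument: a parenthesized argument list.
 * TRACE(("n=%d\n", n)) expands to  trace ("n=%d\n", n)  when DEBUG is
 * defined, and to nothing otherwise.
 */
#ifdef DEBUG
#define TRACE(args)  trace args
#else
#define TRACE(args)
#endif
```

When DEBUG is not defined the call and its arguments vanish at compile time, so tracing costs nothing in a production build.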