# @(#)TOUR 8.1 (Berkeley) 5/31/93
# $FreeBSD: head/bin/sh/TOUR 253650 2013-07-25 15:08:41Z jilles $

NOTE -- This is the original TOUR paper distributed with ash and
does not represent the current state of the shell. It is included
anyway because it gives helpful information about how the shell is
structured, but be warned that things have changed -- the current
shell is still under development.

================================================================

A Tour through Ash

Copyright 1989 by Kenneth Almquist.


DIRECTORIES: The subdirectory bltin contains commands which can
be compiled stand-alone. The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS: Files whose names begin with "mk" are
programs that generate source code. A complete list of these
programs is:

        program         input files     generates
        -------         -----------     ---------
        mkbuiltins      builtins        builtins.h builtins.c
        mknodes         nodetypes       nodes.h nodes.c
        mksyntax        -               syntax.h syntax.c
        mktokens        -               token.h

There are undoubtedly too many of these.

EXCEPTIONS: Code for dealing with exceptions appears in
exceptions.c. The C language doesn't include exception handling,
so I implement it using setjmp and longjmp. The global variable
exception contains the type of exception. EXERROR is raised by
calling error. EXINT is an interrupt.
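
A minimal sketch of the setjmp/longjmp technique described above. The names struct jmploc, handler, exraise, and run_protected are hypothetical here, the numeric values of EXINT and EXERROR are assumed, and error takes a single message instead of printf-style arguments.

```c
#include <setjmp.h>
#include <stdio.h>

/* Exception codes named as in the text; the values are assumptions. */
#define EXINT   0               /* interrupt */
#define EXERROR 1               /* raised by error() */

struct jmploc { jmp_buf loc; }; /* hypothetical wrapper around jmp_buf */

static struct jmploc *handler;  /* innermost active handler */
static int exception;           /* type of the exception being raised */

/* Raise an exception by jumping to the innermost handler. */
static void exraise(int e)
{
    exception = e;
    longjmp(handler->loc, 1);
}

/* Print a message and raise EXERROR (simplified signature). */
static void error(const char *msg)
{
    fprintf(stderr, "sh: %s\n", msg);
    exraise(EXERROR);
}

/* Run cmd under a new handler; return the caught exception, or -1. */
static int run_protected(void (*cmd)(void))
{
    struct jmploc jmploc;
    struct jmploc *volatile savehandler = handler;
    int caught = -1;

    if (setjmp(jmploc.loc) == 0) {
        handler = &jmploc;      /* install our handler, then run */
        cmd();
    } else {
        caught = exception;     /* longjmp landed here */
    }
    handler = savehandler;      /* restore the outer handler either way */
    return caught;
}
```

A command loop would call something like run_protected once per command, so an error deep inside evaluation unwinds straight back to the loop.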

INTERRUPTS: In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop. (Exception:
EXINT is not raised if the user traps interrupts using the trap
command.) The INTOFF and INTON macros (defined in exception.h)
provide uninterruptible critical sections. Between the execution
of INTOFF and the execution of INTON, interrupt signals will be
held for later delivery. INTOFF and INTON can be nested.
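
One way nested INTOFF/INTON could hold an interrupt is sketched below, assuming a depth counter and a pending flag. Only the macro names INTOFF and INTON come from the text. suppressint, intpending, deliver_interrupt, and sigint_handler are hypothetical, and deliver_interrupt just counts deliveries where the shell would raise EXINT.

```c
#include <signal.h>

static volatile sig_atomic_t suppressint;  /* INTOFF nesting depth */
static volatile sig_atomic_t intpending;   /* an interrupt is being held */
static int interrupts_taken;               /* stands in for raising EXINT */

/* Deliver a (possibly held) interrupt; the shell would raise EXINT. */
static void deliver_interrupt(void)
{
    intpending = 0;
    interrupts_taken++;
}

#define INTOFF (suppressint++)
#define INTON                                       \
    do {                                            \
        if (--suppressint == 0 && intpending)       \
            deliver_interrupt();                    \
    } while (0)

/* SIGINT handler body: act now, or hold the interrupt until INTON. */
static void sigint_handler(int signo)
{
    (void)signo;
    if (suppressint)
        intpending = 1;
    else
        deliver_interrupt();
}
```

Only the outermost INTON delivers a held interrupt, which is what makes nesting safe. The test calls sigint_handler directly instead of sending a real signal.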

MEMALLOC.C: Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left. It also defines a
stack-oriented memory allocation scheme. Allocating off a stack
is probably more efficient than allocation using malloc, but the
big advantage is that when an exception occurs all we have to do
to free up the memory in use at the time of the exception is to
restore the stack pointer. The stack is implemented using a
linked list of blocks.
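
A minimal sketch of a stack allocator built from a linked list of blocks, in which "restoring the stack pointer" means popping back to a saved mark. The names stalloc, struct stackmark, setstackmark, popstackmark, and BLOCKSIZE are assumptions for illustration. abort() stands in for the shell's error().

```c
#include <stddef.h>
#include <stdlib.h>

#define BLOCKSIZE 4096              /* assumed minimum block size */

struct stack_block {
    struct stack_block *prev;       /* block underneath this one */
    size_t used, size;
    char space[];                   /* the allocatable bytes */
};

struct stackmark {                  /* a saved "stack pointer" */
    struct stack_block *block;
    size_t used;
};

static struct stack_block *stackp;  /* top block, NULL when empty */

/* Allocate n bytes off the stack, starting a new block if needed. */
static void *stalloc(size_t n)
{
    void *p;

    n = (n + 7) & ~(size_t)7;       /* round up to a multiple of 8 */
    if (stackp == NULL || stackp->size - stackp->used < n) {
        size_t size = n > BLOCKSIZE ? n : BLOCKSIZE;
        struct stack_block *b = malloc(sizeof *b + size);
        if (b == NULL)
            abort();                /* the shell would call error() */
        b->prev = stackp;
        b->used = 0;
        b->size = size;
        stackp = b;
    }
    p = stackp->space + stackp->used;
    stackp->used += n;
    return p;
}

static void setstackmark(struct stackmark *m)
{
    m->block = stackp;
    m->used = stackp ? stackp->used : 0;
}

/* Free everything allocated since the mark was set. */
static void popstackmark(const struct stackmark *m)
{
    while (stackp != m->block) {
        struct stack_block *b = stackp;
        stackp = b->prev;
        free(b);
    }
    if (stackp != NULL)
        stackp->used = m->used;
}
```

An exception handler only needs a mark taken before the command ran. Popping to that mark releases every block allocated since then, with no per-object bookkeeping.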

STPUTC: If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:
        p = stackptr;
        *p++ = c;       /* repeated as many times as needed */
        stackptr = p;
The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:
        STARTSTACKSTR(p);
        STPUTC(c, p);   /* repeated as many times as needed */
        grabstackstr(p);
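
Here is one way these three operations could behave, assuming a single current block. When the string reaches the end of the block, the partial string is copied into a bigger block. This sketch simply abandons the old block instead of keeping it on a linked list as the shell does. stacknxt, stackend, and growstackstr are hypothetical names, and grabstackstr is written as a function rather than a macro for simplicity.

```c
#include <stdlib.h>
#include <string.h>

/* Free space in the current block is [stacknxt, stackend). */
static char *stacknxt;  /* a string under construction starts here */
static char *stackend;  /* one past the end of the current block */

/* Called when p hits stackend: move the partial string into a larger
 * block and return the new write position. The old block is leaked
 * here; the real shell keeps it so earlier grabbed strings survive. */
static char *growstackstr(char *p)
{
    size_t len = stacknxt ? (size_t)(p - stacknxt) : 0;
    size_t size = len * 2 + 64;
    char *blk = malloc(size);

    if (blk == NULL)
        abort();                /* the shell would call error() */
    if (len > 0)
        memcpy(blk, stacknxt, len);
    stacknxt = blk;
    stackend = blk + size;
    return blk + len;
}

#define STARTSTACKSTR(p) ((p) = stacknxt)
#define STPUTC(c, p)                        \
    do {                                    \
        if ((p) == stackend)                \
            (p) = growstackstr(p);          \
        *(p)++ = (char)(c);                 \
    } while (0)

/* Claim the bytes written so far and return the start of the string. */
static char *grabstackstr(char *p)
{
    char *s = stacknxt;
    stacknxt = p;
    return s;
}
```

The caller writes the terminating NUL with STPUTC before grabbing. A string still being built may move when the block grows, but a grabbed string stays put.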

We now start a top-down look at the code:

MAIN.C: The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop. Cmdloop
repeatedly parses and executes commands.

OPTIONS.C: This file contains the option processing code. It is
called from main to parse the shell arguments when the shell is
invoked, and it also contains the set builtin. The -i and -m
options (the latter turns on job control) require changes in
signal handling. The routines setjobctl (in jobs.c) and
setinteractive (in trap.c) are called to handle changes to these
options.

PARSING: The parser code is all in parser.c. A recursive descent
parser is used. Syntax tables (generated by mksyntax) are used to
classify characters during lexical analysis. There are four
tables: one for normal use, one for use inside single quotes and
dollar single quotes, one for use inside double quotes, and one
for use in arithmetic. The tables are machine-dependent because
they are indexed by character variables and the range of a char
varies from machine to machine (a plain char is signed on some
machines and unsigned on others).
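
The sketch below shows why a table indexed by char depends on the machine. An offset derived from CHAR_MIN makes every char value a valid index, whichever signedness char has. The class names, SYNBASE, BASESYNTAX, and init_basesyntax are hypothetical. mksyntax generates its tables at build time, whereas this fills in a single "normal" table at run time.

```c
#include <limits.h>

/* Hypothetical character classes for the normal (unquoted) table. */
enum { CWORD, CNL, CBACK, CSQUOTE, CDQUOTE, CVAR, CSPCL };

/* Offset so any char can index the table: 128 where char is signed
 * with CHAR_MIN == -128, 0 where char is unsigned. */
#define SYNBASE (-CHAR_MIN)
#define BASESYNTAX(c) (basesyntax[(c) + SYNBASE])

static char basesyntax[CHAR_MAX - CHAR_MIN + 1];

static void init_basesyntax(void)
{
    const char *p;

    for (int c = CHAR_MIN; c <= CHAR_MAX; c++)
        basesyntax[c + SYNBASE] = CWORD;    /* ordinary characters */
    BASESYNTAX('\n') = CNL;
    BASESYNTAX('\\') = CBACK;
    BASESYNTAX('\'') = CSQUOTE;
    BASESYNTAX('"') = CDQUOTE;
    BASESYNTAX('$') = CVAR;
    for (p = "<>;&|() \t"; *p != '\0'; p++)
        BASESYNTAX(*p) = CSPCL;             /* word-ending specials */
}
```

The lexer can then classify a character with one array lookup instead of a chain of comparisons.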

PARSE OUTPUT: The output of the parser consists of a tree of
nodes. The various types of nodes are defined in the file
nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents. An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance. It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word. The text consists of ordinary characters and a number of
special codes defined in parser.h. The special codes are:

        CTLVAR              Variable substitution
        CTLENDVAR           End of variable substitution
        CTLBACKQ            Command substitution
        CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
        CTLESC              Escape next character

A variable substitution contains the following elements:

        CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution. The possible types are:

        VSNORMAL            $var
        VSMINUS             ${var-text}
        VSMINUS|VSNUL       ${var:-text}
        VSPLUS              ${var+text}
        VSPLUS|VSNUL        ${var:+text}
        VSQUESTION          ${var?text}
        VSQUESTION|VSNUL    ${var:?text}
        VSASSIGN            ${var=text}
        VSASSIGN|VSNUL      ${var:=text}

In addition, the type field will have the VSQUOTE flag set if the
variable is enclosed in double quotes. The name of the variable
comes next, terminated by an equals sign. If the type is not
VSNORMAL, the alternative text follows, terminated by a CTLENDVAR
byte.
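
To make the layout concrete, here is a sketch of an encoder for one substitution. The byte values of CTLVAR and CTLENDVAR, the numeric VS* encodings, the VSTYPE mask, and the function name encode_var are all assumptions. The real definitions live in parser.h and may differ.

```c
#include <stddef.h>
#include <string.h>

/* Assumed control-byte values (see parser.h for the real ones). */
#define CTLVAR    '\201'
#define CTLENDVAR '\203'

/* Assumed encodings: low bits select the type, high bits are flags. */
#define VSTYPE     0x0f     /* mask selecting the type */
#define VSNORMAL   0x1      /* $var */
#define VSMINUS    0x2      /* ${var-text} */
#define VSPLUS     0x3      /* ${var+text} */
#define VSQUESTION 0x4      /* ${var?text} */
#define VSASSIGN   0x5      /* ${var=text} */
#define VSNUL      0x10     /* the ':' forms, e.g. ${var:-text} */
#define VSQUOTE    0x80     /* substitution inside double quotes */

/* Write  CTLVAR type name '=' [ alternative-text CTLENDVAR ]  to out,
 * which must be large enough; returns the number of bytes written. */
static size_t encode_var(char *out, int type, const char *name, const char *text)
{
    size_t n = 0, len;

    out[n++] = CTLVAR;
    out[n++] = (char)type;
    len = strlen(name);
    memcpy(out + n, name, len);
    n += len;
    out[n++] = '=';
    if ((type & VSTYPE) != VSNORMAL) {  /* only these carry text */
        len = strlen(text);
        memcpy(out + n, text, len);
        n += len;
        out[n++] = CTLENDVAR;
    }
    return n;
}
```

For example, ${x:-foo} becomes the eight bytes CTLVAR, VSMINUS|VSNUL, 'x', '=', 'f', 'o', 'o', CTLENDVAR, while $HOME stops right after the '='.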

Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether
the back quotes were enclosed in double quotes.

The character CTLESC escapes the next character, so that in case
any of the CTL characters mentioned above appear in the input,
they can be passed through transparently. CTLESC is also used to
escape '*', '?', '[', and '!' characters which were quoted by the
user and thus should not be used for file name generation.

CTLESC characters have proved to be particularly tricky to get
right. In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing). Other here
documents, and words which are not subject to splitting and file
name generation, have the CTLESC characters removed during the
variable and command substitution phase. Words which are subject
to splitting and file name generation have the CTLESC characters
removed as part of the file name phase.
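
The removal step can be sketched as an in-place copy that drops each CTLESC and keeps the following byte literally. The CTLESC value below is assumed, and the name rmescapes is used here only for illustration.

```c
/* Assumed value; the real CTLESC is defined in parser.h. */
#define CTLESC '\202'

/* Remove CTLESC markers in place. Each CTLESC is dropped and the byte
 * after it is kept literally, even if that byte is itself CTLESC. */
static void rmescapes(char *str)
{
    char *p = str, *q = str;

    while (*p != '\0') {
        if (*p == CTLESC) {
            p++;
            if (*p == '\0')
                break;          /* stray escape at the very end */
        }
        *q++ = *p++;
    }
    *q = '\0';
}
```

Running this only after file name generation is what keeps a quoted '*' from being treated as a pattern character.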

EXECUTION: Command execution is handled by the following files:
        eval.c     The top-level routines.
        redir.c    Code to handle redirection of input and output.
        jobs.c     Code to handle forking, waiting, and job control.
        exec.c     Code to do path searches and the actual exec sys call.
        expand.c   Code to evaluate arguments.
        var.c      Maintains the variable symbol table. Called from expand.c.

EVAL.C: Evaltree recursively executes a parse tree. The exit
status is returned in the global variable exitstatus. The
alternative entry evalbackcmd is called to evaluate commands in
back quotes. It saves the result in memory if the command is a
builtin; otherwise it forks off a child to execute the command and
connects the standard output of the child to a pipe.
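
The non-builtin path, in which a forked child writes to a pipe that the parent reads, could look roughly like the sketch below. It uses only POSIX calls. capture_child is a hypothetical name, and the function pointer stands in for "execute the command". Because the child leaves through _exit, fn should write with write(2) or flush stdio itself.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run fn in a child whose stdout is the write end of a pipe, and read
 * up to cap - 1 bytes of its output into buf (cap must be > 0).
 * Returns the byte count, or -1 if pipe or fork fails. */
static ssize_t capture_child(void (*fn)(void), char *buf, size_t cap)
{
    int pfd[2];
    pid_t pid;
    size_t n = 0;
    ssize_t r;

    if (pipe(pfd) < 0)
        return -1;
    if ((pid = fork()) < 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (pid == 0) {                     /* child */
        close(pfd[0]);
        if (pfd[1] != STDOUT_FILENO) {
            dup2(pfd[1], STDOUT_FILENO);
            close(pfd[1]);
        }
        fn();
        _exit(0);                       /* don't run the parent's exit code */
    }
    close(pfd[1]);                      /* parent: read until EOF */
    while (n < cap - 1 && (r = read(pfd[0], buf + n, cap - 1 - n)) > 0)
        n += (size_t)r;
    buf[n] = '\0';
    close(pfd[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)n;
}
```

The parent closes its copy of the write end before reading. Without that step, read would never see end-of-file.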

JOBS.C: To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process. Waitforjob waits for a job
to complete. These routines take care of process groups if job
control is defined.

REDIR.C: Ash allows file descriptors to be redirected and then
restored without forking off a child process. This is
accomplished by duplicating the original file descriptors. The
redirtab structure records where the file descriptors have been
duplicated to.
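
A minimal sketch of redirecting and restoring a descriptor without forking, using only POSIX calls. struct saved_fd is a hypothetical one-entry stand-in for the redirtab, and redirect_out and restore_fd are hypothetical names. The copy is parked at descriptor 10 or above, and this sketch simply fails if the descriptor was not open to begin with.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical miniature of one redirtab entry. */
struct saved_fd {
    int fd;     /* descriptor that was redirected */
    int copy;   /* duplicate of its original, at 10 or above */
};

/* Point fd at path (opened for writing) without forking.
 * Returns 0 on success, -1 on failure (fd is then unchanged). */
static int redirect_out(int fd, const char *path, struct saved_fd *save)
{
    int newfd;

    save->fd = fd;
    save->copy = fcntl(fd, F_DUPFD, 10);    /* park the original */
    if (save->copy < 0)
        return -1;                          /* sketch: fd must be open */
    newfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (newfd < 0 || dup2(newfd, fd) < 0) {
        if (newfd >= 0)
            close(newfd);
        close(save->copy);
        return -1;
    }
    close(newfd);
    return 0;
}

/* Undo redirect_out: put the original descriptor back. */
static void restore_fd(const struct saved_fd *save)
{
    dup2(save->copy, save->fd);
    close(save->copy);
}
```

The original is saved before the file is opened, so the new descriptor can never collide with the one being redirected.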

EXEC.C: The routine find_command locates a command, and enters
the command in the hash table if it is not already there. The
third argument specifies whether it is to print an error message
if the command is not found. (When a pipeline is set up,
find_command is called for all the commands in the pipeline
before any forking is done, so as to get the commands into the
hash table of the parent process. But to make command hashing as
transparent as possible, we silently ignore errors at that point
and only print error messages if the command cannot be found
later.)

The routine shellexec is the interface to the exec system call.
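
A rough sketch of a path search with a command hash table and a "print error?" flag. Unlike the real routine, this hypothetical find_command takes the search path as an explicit argument and returns the full path. The table layout, sizes, and hash function are assumptions, and access(X_OK) stands in for the fuller checks a shell would make.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CMDTABLESIZE 31                 /* assumed table size */

struct tblentry {
    struct tblentry *next;
    char *path;                         /* full path of the executable */
    char name[];                        /* command name */
};

static struct tblentry *cmdtable[CMDTABLESIZE];

static unsigned hashcmd(const char *s)
{
    unsigned h = 0;
    while (*s != '\0')
        h = h * 31 + (unsigned char)*s++;
    return h % CMDTABLESIZE;
}

/* Look name up in the table, else search the colon-separated pathlist
 * and cache the hit. printerr says whether to report a failure. */
static const char *find_command(const char *name, const char *pathlist, int printerr)
{
    struct tblentry **bucket = &cmdtable[hashcmd(name)];
    char buf[4096];

    for (struct tblentry *e = *bucket; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e->path;             /* already hashed */

    for (const char *dir = pathlist;;) {
        const char *end = strchr(dir, ':');
        int dlen = end ? (int)(end - dir) : (int)strlen(dir);
        /* an empty component means the current directory */
        int n = dlen ? snprintf(buf, sizeof buf, "%.*s/%s", dlen, dir, name)
                     : snprintf(buf, sizeof buf, "./%s", name);

        if (n > 0 && (size_t)n < sizeof buf && access(buf, X_OK) == 0) {
            struct tblentry *e = malloc(sizeof *e + strlen(name) + 1);
            char *path = malloc((size_t)n + 1);
            if (e == NULL || path == NULL)
                abort();                /* the shell would call error() */
            strcpy(e->name, name);
            memcpy(path, buf, (size_t)n + 1);
            e->path = path;
            e->next = *bucket;
            *bucket = e;
            return path;
        }
        if (end == NULL)
            break;
        dir = end + 1;
    }
    if (printerr)
        fprintf(stderr, "%s: not found\n", name);
    return NULL;
}
```

Calling it with printerr set to 0 for every command in a pipeline before forking mirrors the "silently ignore errors at that point" behavior described above.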

EXPAND.C: Arguments are processed in three passes. The first
(performed by the routine argstr) performs variable and command
substitution. The second (ifsbreakup) performs word splitting
and the third (expandmeta) performs file name generation.
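
The second pass can be illustrated with a simplified field splitter. ifssplit is a hypothetical name, not ifsbreakup's real interface. It treats every IFS character like whitespace, so runs collapse and leading and trailing separators are ignored. POSIX gives non-whitespace IFS characters slightly different rules.

```c
#include <string.h>

/* Split s in place at characters found in ifs, storing at most max
 * field pointers in words; returns the number of fields stored. */
static int ifssplit(char *s, const char *ifs, char **words, int max)
{
    int n = 0;

    for (;;) {
        while (*s != '\0' && strchr(ifs, *s) != NULL)
            s++;                    /* skip separators */
        if (*s == '\0' || n == max)
            break;
        words[n++] = s;             /* start of a field */
        while (*s != '\0' && strchr(ifs, *s) == NULL)
            s++;                    /* scan to its end */
        if (*s != '\0')
            *s++ = '\0';            /* terminate the field */
    }
    return n;
}
```

The explicit check for '\0' before each strchr call matters, because strchr reports a match for the terminating NUL.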

VAR.C: Variables are stored in a hash table. Probably we should
switch to extensible hashing. The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command. Variables which the shell references internally are
preallocated so that the shell can reference the values of these
variables without doing a lookup.

When a program is run, the code in eval.c sticks any environment
variables which precede the command (as in "PATH=xxx command") in
the variable table as the simplest way to strip duplicates, and
then calls "environment" to get the value of the environment.
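
The following sketch shows how storing "name=value" as one string lets an environment be built from pointers alone, with later assignments replacing earlier ones. struct var, vartab, hashvar, varequal, setvareq, and lookupvar are assumed names for illustration, and environment here only mimics the role the text gives it.

```c
#include <stdlib.h>
#include <string.h>

struct var {
    struct var *next;
    int exported;
    char *text;             /* "name=value", used as-is in environments */
};

#define VTABSIZE 39         /* assumed table size */
static struct var *vartab[VTABSIZE];

/* Hash the name part, which ends at '=' or at the end of the string. */
static unsigned hashvar(const char *p)
{
    unsigned h = 0;
    while (*p != '\0' && *p != '=')
        h = h * 33 + (unsigned char)*p++;
    return h % VTABSIZE;
}

/* True if the names in a and b match; each ends at '=' or NUL. */
static int varequal(const char *a, const char *b)
{
    while (*a == *b && *a != '\0' && *a != '=') {
        a++;
        b++;
    }
    return (*a == '\0' || *a == '=') && (*b == '\0' || *b == '=');
}

/* Set a variable from a "name=value" string; duplicates collapse. */
static void setvareq(char *text, int exported)
{
    struct var **bucket = &vartab[hashvar(text)];
    struct var *vp;

    for (vp = *bucket; vp != NULL; vp = vp->next) {
        if (varequal(vp->text, text)) {
            vp->text = text;
            vp->exported |= exported;
            return;
        }
    }
    if ((vp = malloc(sizeof *vp)) == NULL)
        abort();
    vp->text = text;
    vp->exported = exported;
    vp->next = *bucket;
    *bucket = vp;
}

static const char *lookupvar(const char *name)
{
    for (struct var *vp = vartab[hashvar(name)]; vp != NULL; vp = vp->next)
        if (varequal(vp->text, name))
            return strchr(vp->text, '=') + 1;
    return NULL;
}

/* NULL-terminated array of exported "name=value" strings, uncopied. */
static char **environment(void)
{
    size_t n = 0, i = 0;
    char **env;

    for (int h = 0; h < VTABSIZE; h++)
        for (struct var *vp = vartab[h]; vp != NULL; vp = vp->next)
            n += vp->exported != 0;
    if ((env = malloc((n + 1) * sizeof *env)) == NULL)
        abort();
    for (int h = 0; h < VTABSIZE; h++)
        for (struct var *vp = vartab[h]; vp != NULL; vp = vp->next)
            if (vp->exported)
                env[i++] = vp->text;
    env[i] = NULL;
    return env;
}
```

Running the assignment "PATH=/usr/bin" through setvareq before calling environment replaces any earlier PATH entry, so the resulting array never holds two PATH strings.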

BUILTIN COMMANDS: The procedures for handling these are
scattered throughout the code, depending on which location
appears most appropriate. They can be recognized because their
names always end in "cmd". The mapping from names to procedures
is specified in the file builtins, which is processed by the
mkbuiltins command.

A builtin command is invoked with argc and argv set up like a
normal program. A builtin command is allowed to overwrite its
arguments. Builtin routines can call nextopt to do option
parsing. This is kind of like getopt, but you don't pass argc and
argv to it. Builtin routines can also call error. This routine
normally terminates the shell (or returns to the main command
loop if the shell is interactive), but when called from a builtin
command it causes the builtin command to terminate with an exit
status of 2.
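
A nextopt-style parser might keep its position in globals, so the builtin sets up the argument pointer once and never passes argc and argv again. The globals argptr and optptr, the exact return conventions, and the example builtin democmd are assumptions. Options that take arguments are not handled, and returning 2 stands in for error().

```c
#include <string.h>

/* Assumed globals: the next argument to examine, and the position
 * inside a cluster of option letters such as "-abc". */
static char **argptr;
static char *optptr;

/* Return the next option letter, '?' for one not in optstring, or '\0'
 * when options end ("--", a lone "-", or a non-option word). */
static int nextopt(const char *optstring)
{
    int c;

    if (optptr == NULL || *optptr == '\0') {
        char *p = *argptr;
        if (p == NULL || p[0] != '-' || p[1] == '\0')
            return '\0';
        argptr++;
        if (strcmp(p, "--") == 0)
            return '\0';
        optptr = p + 1;
    }
    c = (unsigned char)*optptr++;
    return strchr(optstring, c) != NULL ? c : '?';
}

/* Hypothetical builtin: counts -n flags and the remaining operands. */
static int democmd(int argc, char **argv)
{
    int c, nflag = 0, operands = 0;

    (void)argc;
    argptr = argv + 1;          /* skip the command name */
    optptr = NULL;
    while ((c = nextopt("n")) != '\0') {
        if (c != 'n')
            return 2;           /* where the shell would call error() */
        nflag++;
    }
    while (argptr[operands] != NULL)
        operands++;
    return nflag * 10 + operands;
}
```

After the loop, argptr points at the first operand, so the builtin reads its remaining arguments straight from there.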

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for efficiency
reasons. The makefile in this directory compiles these programs
in the normal fashion (so that they can be run regardless of
whether the invoker is ash), but also creates a library named
bltinlib.a which can be linked with ash. The header file bltin.h
takes care of most of the differences between the ash and the
stand-alone environments. The user should call the main routine
"main", and #define main to be the name of the routine to use
when the program is linked into ash. This #define should appear
before bltin.h is included; bltin.h will #undef main if the
program is to be compiled stand-alone.

CD.C: This file defines the cd and pwd builtins.

SIGNALS: Trap.c implements the trap command. The routine
setsignal figures out what action should be taken when a signal
is received and invokes the signal system call to set the signal
action appropriately. When a signal that a user has set a trap for
is caught, the routine "onsig" sets a flag. The routine dotrap
is called at appropriate points to actually handle the signal.
When an interrupt is caught and no trap has been set for that
signal, the routine "onint" in error.c is called.
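
The flag-then-handle split between onsig and dotrap is sketched below. Only those two names come from the text. Their bodies, the gotsig and pendingsigs flags, and the traps_run counter (which stands in for evaluating each trap's command) are assumptions. NSIG is a common but not universal macro, so a fallback is provided.

```c
#include <signal.h>

#ifndef NSIG
#define NSIG 65                     /* assumed if <signal.h> lacks NSIG */
#endif

static volatile sig_atomic_t gotsig[NSIG];  /* per-signal caught flags */
static volatile sig_atomic_t pendingsigs;   /* is any flag set? */
static int traps_run;                       /* stands in for running traps */

/* Signal handler: only records the signal (async-signal-safe). */
static void onsig(int signo)
{
    if (signo > 0 && signo < NSIG) {
        gotsig[signo] = 1;
        pendingsigs = 1;
    }
}

/* Called at safe points; the shell would evaluate each trap here. */
static void dotrap(void)
{
    while (pendingsigs) {
        pendingsigs = 0;
        for (int i = 1; i < NSIG; i++) {
            if (gotsig[i]) {
                gotsig[i] = 0;
                traps_run++;
            }
        }
    }
}
```

Deferring the real work to dotrap means the trap command runs in normal shell context rather than inside a signal handler, where almost nothing is safe to call.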

OUTPUT: Ash uses its own output routines. There are three output
structures allocated. "output" represents the standard output,
"errout" the standard error, and "memout" contains output which
is to be stored in memory. This last is used when a builtin
command appears in backquotes, to allow its output to be collected
without doing any I/O through the UNIX operating system. The
variables out1 and out2 normally point to output and errout,
respectively, but they are set to point to memout when
appropriate inside backquotes.
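
A sketch of the three output structures and the out1/out2 indirection follows. The field layout of struct output and the routines outstr and flushout are assumptions. Here an fd of -1 marks memout, whose text stays in its buffer.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct output {
    char *buf;
    size_t len, cap;
    int fd;                 /* -1: keep the text in memory (memout) */
};

static struct output output = { NULL, 0, 0, 1 };
static struct output errout = { NULL, 0, 0, 2 };
static struct output memout = { NULL, 0, 0, -1 };
static struct output *out1 = &output;   /* may point to &memout */
static struct output *out2 = &errout;

/* Append a string to an output buffer, growing it as needed. */
static void outstr(const char *s, struct output *o)
{
    size_t n = strlen(s);

    if (n == 0)
        return;
    if (o->len + n > o->cap) {
        size_t cap = (o->len + n) * 2;
        char *nb = realloc(o->buf, cap);
        if (nb == NULL)
            abort();
        o->buf = nb;
        o->cap = cap;
    }
    memcpy(o->buf + o->len, s, n);
    o->len += n;
}

/* Write buffered text to its descriptor; memout is left untouched. */
static void flushout(struct output *o)
{
    size_t off = 0;

    if (o->fd < 0)
        return;
    while (off < o->len) {
        ssize_t w = write(o->fd, o->buf + off, o->len - off);
        if (w <= 0)
            break;
        off += (size_t)w;
    }
    o->len = 0;
}
```

A builtin that always prints through out1 needs no changes to run inside backquotes. Pointing out1 at &memout is enough to capture what it prints.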

INPUT: The basic input routine is pgetc, which reads from the
current input file. There is a stack of input files; the current
input file is the top file on this stack. The code allows the
input to come from a string rather than a file. (This is for the
-c option and the "." and eval builtin commands.) The global
variable plinno is saved and restored when files are pushed onto
and popped from the stack. The parser routines store the number
of the current line in this variable.
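
The input stack and the saving of plinno could be sketched with string inputs only. struct parsefile, pushstring, and popfile are assumed names and layouts here, while pgetc and plinno come from the text. In this sketch the caller bumps plinno, which is the parser's job in the shell.

```c
#include <stdio.h>                  /* EOF */
#include <stdlib.h>

struct parsefile {
    struct parsefile *prev;         /* input underneath this one */
    const char *nextc;              /* next character of a string input */
    int linno;                      /* plinno of the input underneath */
};

static struct parsefile *parsefile; /* top of the input stack */
static int plinno = 1;              /* current line number */

/* Push a string as the new current input, saving plinno. */
static void pushstring(const char *s)
{
    struct parsefile *pf = malloc(sizeof *pf);

    if (pf == NULL)
        abort();
    pf->prev = parsefile;
    pf->nextc = s;
    pf->linno = plinno;
    parsefile = pf;
    plinno = 1;
}

/* Pop the current input and restore the saved line number. */
static void popfile(void)
{
    struct parsefile *pf = parsefile;

    plinno = pf->linno;
    parsefile = pf->prev;
    free(pf);
}

/* Read the next character of the current input, or EOF. */
static int pgetc(void)
{
    if (parsefile == NULL || *parsefile->nextc == '\0')
        return EOF;
    return (unsigned char)*parsefile->nextc++;
}
```

When eval or "." pushes a new input, line numbers inside it start over at 1. After the pop, error messages again report lines from the outer file.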

DEBUGGING: If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace. Most of
this is done using the TRACE macro, which takes a set of printf
arguments inside two sets of parentheses. Example:
TRACE(("n=%d\n", n)). The double parentheses are necessary
because the C preprocessor of the time could not handle macros
with a variable number of arguments. Defining DEBUG also causes
the shell to generate a core dump if it is sent a quit signal.
The tracing code is in show.c.
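
The double-parentheses trick works because the whole parenthesized printf argument list becomes one macro parameter. In the sketch below, tracefile and trace are assumed names, the file is handed in by the caller rather than opened at $HOME/trace, and DEBUG is defined at the top to mimic a debug build.

```c
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1                 /* assume a debug build for this sketch */

static FILE *tracefile;         /* the shell would open $HOME/trace */

/* printf-style writer behind the TRACE macro. */
static void trace(const char *fmt, ...)
{
    va_list ap;

    if (tracefile == NULL)
        return;
    va_start(ap, fmt);
    vfprintf(tracefile, fmt, ap);
    va_end(ap);
    fflush(tracefile);
}

/* One parameter carries the whole parenthesized argument list, so
 * TRACE(("n=%d\n", n)) expands to: trace ("n=%d\n", n) */
#ifdef DEBUG
#define TRACE(param) trace param
#else
#define TRACE(param) ((void)0)
#endif
```

Without DEBUG, each TRACE call compiles to nothing and its arguments are never evaluated.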