Commit | Line | Data |
---|---|---|
14c7c974 A |
1 | The Netwide Assembler: NASM |
2 | =========================== | |
3 | ||
4 | Chapter 1: Introduction | |
5 | ----------------------- | |
6 | ||
7 | 1.1 What Is NASM? | |
8 | ||
9 | The Netwide Assembler, NASM, is an 80x86 assembler designed for | |
10 | portability and modularity. It supports a range of object file | |
11 | formats, including Linux `a.out' and ELF, NetBSD/FreeBSD, COFF, | |
12 | Microsoft 16-bit OBJ and Win32. It will also output plain binary | |
13 | files. Its syntax is designed to be simple and easy to understand, | |
14 | similar to Intel's but less complex. It supports Pentium, P6 and MMX | |
15 | opcodes, and has macro capability. | |
16 | ||
17 | 1.1.1 Why Yet Another Assembler? | |
18 | ||
19 | The Netwide Assembler grew out of an idea on `comp.lang.asm.x86' (or | |
20 | possibly `alt.lang.asm' - I forget which), which was essentially | |
21 | that there didn't seem to be a good free x86-series assembler | |
22 | around, and that maybe someone ought to write one. | |
23 | ||
24 | (*) `a86' is good, but not free, and in particular you don't get any | |
25 | 32-bit capability until you pay. It's DOS only, too. | |
26 | ||
27 | (*) `gas' is free, and has been ported to DOS and Unix, but it's | |
28 | not very good, since it's designed to be a back end to `gcc', which | |
29 | always feeds it correct code. So its error checking is minimal. | |
30 | Also, its syntax is horrible, from the point of view of anyone | |
31 | trying to actually _write_ anything in it. Plus you can't write | |
32 | 16-bit code in it (properly). | |
33 | ||
34 | (*) `as86' is Linux-specific, and (my version at least) doesn't seem | |
35 | to have much (or any) documentation. | |
36 | ||
37 | (*) MASM isn't very good, and it's expensive, and it runs only under | |
38 | DOS. | |
39 | ||
40 | (*) TASM is better, but still strives for MASM compatibility, which | |
41 | means millions of directives and tons of red tape. And its | |
42 | syntax is essentially MASM's, with the contradictions and quirks | |
43 | that entails (although it sorts out some of those by means of | |
44 | Ideal mode). It's expensive too. And it's DOS-only. | |
45 | ||
46 | So here, for your coding pleasure, is NASM. At present it's still in | |
47 | prototype stage - we don't promise that it can outperform any of | |
48 | these assemblers. But please, _please_ send us bug reports, fixes, | |
49 | helpful information, and anything else you can get your hands on | |
50 | (and thanks to the many people who've done this already! You all | |
51 | know who you are), and we'll improve it out of all recognition. | |
52 | Again. | |
53 | ||
54 | 1.1.2 Licence Conditions | |
55 | ||
56 | Please see the file `Licence', supplied as part of any NASM | |
57 | distribution archive, for the licence conditions under which you may | |
58 | use NASM. | |
59 | ||
60 | 1.2 Contact Information | |
61 | ||
62 | NASM has a WWW page at `http://www.cryogen.com/Nasm'. The authors | |
63 | are e-mailable as `jules@earthcorp.com' and `anakin@pobox.com'. If | |
64 | you want to report a bug to us, please read section 10.2 first. | |
65 | ||
66 | New releases of NASM are uploaded to `sunsite.unc.edu', | |
67 | `ftp.simtel.net' and `ftp.coast.net'. Announcements are posted to | |
68 | `comp.lang.asm.x86', `alt.lang.asm', `comp.os.linux.announce' and | |
69 | `comp.archives.msdos.announce' (the last one is done automagically | |
70 | by uploading to `ftp.simtel.net'). | |
71 | ||
72 | If you don't have Usenet access, or would rather be informed by | |
73 | e-mail when new releases come out, e-mail `anakin@pobox.com' and | |
74 | ask. | |
75 | ||
76 | 1.3 Installation | |
77 | ||
78 | 1.3.1 Installing NASM under MS-DOS or Windows | |
79 | ||
80 | Once you've obtained the DOS archive for NASM, `nasmXXX.zip' (where | |
81 | `XXX' denotes the version number of NASM contained in the archive), | |
82 | unpack it into its own directory (for example `c:\nasm'). | |
83 | ||
84 | The archive will contain four executable files: the NASM executable | |
85 | files `nasm.exe' and `nasmw.exe', and the NDISASM executable files | |
86 | `ndisasm.exe' and `ndisasmw.exe'. In each case, the file whose name | |
87 | ends in `w' is a Win32 executable, designed to run under Windows 95 | |
88 | or Windows NT on Intel, and the other one is a 16-bit DOS executable. | |
89 | ||
90 | The only file NASM needs to run is its own executable, so copy (at | |
91 | least) one of `nasm.exe' and `nasmw.exe' to a directory on your | |
92 | PATH, or alternatively edit `autoexec.bat' to add the `nasm' | |
93 | directory to your `PATH'. (If you're only installing the Win32 | |
94 | version, you may wish to rename it to `nasm.exe'.) | |
95 | ||
96 | That's it - NASM is installed. You don't need the `nasm' directory | |
97 | to be present to run NASM (unless you've added it to your `PATH'), | |
98 | so you can delete it if you need to save space; however, you may | |
99 | want to keep the documentation or test programs. | |
100 | ||
101 | If you've downloaded the DOS source archive, `nasmXXXs.zip', the | |
102 | `nasm' directory will also contain the full NASM source code, and a | |
103 | selection of Makefiles you can (hopefully) use to rebuild your copy | |
104 | of NASM from scratch. The file `Readme' lists the various Makefiles | |
105 | and which compilers they work with. Note that the source files | |
106 | `insnsa.c' and `insnsd.c' are automatically generated from the | |
107 | master instruction table `insns.dat' by a Perl script; a QBasic | |
108 | version of the program is provided, but it is recommended that you | |
109 | use the Perl version. A DOS port of Perl is available from | |
110 | www.perl.org. | |
111 | ||
112 | 1.3.2 Installing NASM under Unix | |
113 | ||
114 | Once you've obtained the Unix source archive for NASM, | |
115 | `nasm-X.XX.tar.gz' (where `X.XX' denotes the version number of NASM | |
116 | contained in the archive), unpack it into a directory such as | |
117 | `/usr/local/src'. The archive, when unpacked, will create its own | |
118 | subdirectory `nasm-X.XX'. | |
119 | ||
120 | NASM is an auto-configuring package: once you've unpacked it, `cd' | |
121 | to the directory it's been unpacked into and type `./configure'. | |
122 | This shell script will find the best C compiler to use for building | |
123 | NASM and set up Makefiles accordingly. | |
124 | ||
125 | Once NASM has auto-configured, you can type `make' to build the | |
126 | `nasm' and `ndisasm' binaries, and then `make install' to install | |
127 | them in `/usr/local/bin' and install the man pages `nasm.1' and | |
128 | `ndisasm.1' in `/usr/local/man/man1'. Alternatively, you can give | |
129 | options such as `--prefix' to the `configure' script (see the file | |
130 | `INSTALL' for more details), or install the programs yourself. | |
131 | ||
132 | NASM also comes with a set of utilities for handling the RDOFF | |
133 | custom object-file format, which are in the `rdoff' subdirectory of | |
134 | the NASM archive. You can build these with `make rdf' and install | |
135 | them with `make rdf_install', if you want them. | |
136 | ||
137 | If NASM fails to auto-configure, you may still be able to make it | |
138 | compile by using the fall-back Unix makefile `Makefile.unx'. Copy or | |
139 | rename that file to `Makefile' and try typing `make'. There is also | |
140 | a `Makefile.unx' file in the `rdoff' subdirectory. | |
141 | ||
142 | Chapter 2: Running NASM | |
143 | ----------------------- | |
144 | ||
145 | 2.1 NASM Command-Line Syntax | |
146 | ||
147 | To assemble a file, you issue a command of the form | |
148 | ||
149 | nasm -f <format> <filename> [-o <output>] | |
150 | ||
151 | For example, | |
152 | ||
153 | nasm -f elf myfile.asm | |
154 | ||
155 | will assemble `myfile.asm' into an ELF object file `myfile.o'. And | |
156 | ||
157 | nasm -f bin myfile.asm -o myfile.com | |
158 | ||
159 | will assemble `myfile.asm' into a raw binary file `myfile.com'. | |
160 | ||
161 | To produce a listing file, with the hex codes output from NASM | |
162 | displayed on the left of the original sources, use the `-l' option | |
163 | to give a listing file name, for example: | |
164 | ||
165 | nasm -f coff myfile.asm -l myfile.lst | |
166 | ||
167 | To get further usage instructions from NASM, try typing | |
168 | ||
169 | nasm -h | |
170 | ||
171 | This will also list the available output file formats, and what they | |
172 | are. | |
173 | ||
174 | If you use Linux but aren't sure whether your system is `a.out' or | |
175 | ELF, type | |
176 | ||
177 | file nasm | |
178 | ||
179 | (in the directory in which you put the NASM binary when you | |
180 | installed it). If it says something like | |
181 | ||
182 | nasm: ELF 32-bit LSB executable i386 (386 and up) Version 1 | |
183 | ||
184 | then your system is ELF, and you should use the option `-f elf' when | |
185 | you want NASM to produce Linux object files. If it says | |
186 | ||
187 | nasm: Linux/i386 demand-paged executable (QMAGIC) | |
188 | ||
189 | or something similar, your system is `a.out', and you should use | |
190 | `-f aout' instead. | |
191 | ||
192 | Like Unix compilers and assemblers, NASM is silent unless something | |
193 | goes wrong: you won't see any output at all, unless it gives error | |
194 | messages. | |
195 | ||
196 | 2.1.1 The `-o' Option: Specifying the Output File Name | |
197 | ||
198 | NASM will normally choose the name of your output file for you; | |
199 | precisely how it does this is dependent on the object file format. | |
200 | For Microsoft object file formats (`obj' and `win32'), it will | |
201 | remove the `.asm' extension (or whatever extension you like to use - | |
202 | NASM doesn't care) from your source file name and substitute `.obj'. | |
203 | For Unix object file formats (`aout', `coff', `elf' and `as86') it | |
204 | will substitute `.o'. For `rdf', it will use `.rdf', and for the | |
205 | `bin' format it will simply remove the extension, so that | |
206 | `myfile.asm' produces the output file `myfile'. | |
207 | ||
208 | If the output file already exists, NASM will overwrite it, unless it | |
209 | has the same name as the input file, in which case it will give a | |
210 | warning and use `nasm.out' as the output file name instead. | |
211 | ||
212 | For situations in which this behaviour is unacceptable, NASM | |
213 | provides the `-o' command-line option, which allows you to specify | |
214 | your desired output file name. You invoke `-o' by following it with | |
215 | the name you wish for the output file, either with or without an | |
216 | intervening space. For example: | |
217 | ||
218 | nasm -f bin program.asm -o program.com | |
219 | nasm -f bin driver.asm -odriver.sys | |
220 | ||
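As a rough illustration, the default naming rules described in this section can be modelled in a few lines of Python. This is only a sketch of the rules as stated in the text, not NASM's actual implementation: the names `default_output_name' and `DEFAULT_EXT' are invented for the example, and the extension is stripped with Python's `os.path.splitext', which may differ from NASM in corner cases.

```python
import os

# Extension substituted for each output format, as listed in the text.
# For `bin' the extension is simply removed.
DEFAULT_EXT = {
    "obj": ".obj", "win32": ".obj",
    "aout": ".o", "coff": ".o", "elf": ".o", "as86": ".o",
    "rdf": ".rdf",
    "bin": "",
}

def default_output_name(source, fmt):
    """Model of the output name chosen when no `-o' option is given."""
    stem, _old_ext = os.path.splitext(source)
    candidate = stem + DEFAULT_EXT[fmt]
    # An output name equal to the input name would overwrite the source,
    # so the text says `nasm.out' is used instead (with a warning).
    if candidate == source:
        return "nasm.out"
    return candidate
```

With this model, `default_output_name("myfile.asm", "elf")' gives `myfile.o', while a source file literally named `myfile' assembled with `-f bin' falls back to `nasm.out'.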
221 | 2.1.2 The `-f' Option: Specifying the Output File Format | |
222 | ||
223 | If you do not supply the `-f' option to NASM, it will choose an | |
224 | output file format for you. In the distribution versions of | |
225 | NASM, the default is always `bin'; if you've compiled your own copy | |
226 | of NASM, you can redefine `OF_DEFAULT' at compile time and choose | |
227 | what you want the default to be. | |
228 | ||
229 | Like `-o', the intervening space between `-f' and the output file | |
230 | format is optional; so `-f elf' and `-felf' are both valid. | |
231 | ||
232 | A complete list of the available output file formats can be given by | |
233 | issuing the command `nasm -h'. | |
234 | ||
235 | 2.1.3 The `-l' Option: Generating a Listing File | |
236 | ||
237 | If you supply the `-l' option to NASM, followed (with the usual | |
238 | optional space) by a file name, NASM will generate a source-listing | |
239 | file for you, in which addresses and generated code are listed on | |
240 | the left, and the actual source code, with expansions of multi-line | |
241 | macros (except those which specifically request no expansion in | |
242 | source listings: see section 4.2.9) on the right. For example: | |
243 | ||
244 | nasm -f elf myfile.asm -l myfile.lst | |
245 | ||
246 | 2.1.4 The `-s' Option: Send Errors to `stdout' | |
247 | ||
248 | Under MS-DOS it can be difficult (though there are ways) to redirect | |
249 | the standard-error output of a program to a file. Since NASM usually | |
250 | produces its warning and error messages on `stderr', this can make | |
251 | it hard to capture the errors if (for example) you want to load them | |
252 | into an editor. | |
253 | ||
254 | NASM therefore provides the `-s' option, requiring no argument, | |
255 | which causes errors to be sent to standard output rather than | |
256 | standard error. You can then redirect the errors into a file by | |
257 | typing | |
258 | ||
259 | nasm -s -f obj myfile.asm > myfile.err | |
260 | ||
261 | 2.1.5 The `-i' Option: Include File Search Directories | |
262 | ||
263 | When NASM sees the `%include' directive in a source file (see | |
264 | section 4.5), it will search for the given file not only in the | |
265 | current directory, but also in any directories specified on the | |
266 | command line by the use of the `-i' option. Therefore you can | |
267 | include files from a macro library, for example, by typing | |
268 | ||
269 | nasm -ic:\macrolib\ -f obj myfile.asm | |
270 | ||
271 | (As usual, a space between `-i' and the path name is allowed, and | |
272 | optional). | |
273 | ||
274 | NASM, in the interests of complete source-code portability, does not | |
275 | understand the file naming conventions of the OS it is running on; | |
276 | the string you provide as an argument to the `-i' option will be | |
277 | prepended exactly as written to the name of the include file. | |
278 | Therefore the trailing backslash in the above example is necessary. | |
279 | Under Unix, a trailing forward slash is similarly necessary. | |
280 | ||
281 | (You can use this to your advantage, if you're really perverse, by | |
282 | noting that the option `-ifoo' will cause `%include "bar.i"' to | |
283 | search for the file `foobar.i'...) | |
284 | ||
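The prefix rule described above amounts to plain string concatenation, which the following Python sketch makes explicit. The function name `include_candidates' is hypothetical, and the order in which the candidates are tried (current directory first, then each `-i' prefix in turn) is an assumption; the text only says that all of these places are searched.

```python
def include_candidates(name, prefixes):
    """List the file names tried for `%include "name"'.

    Each `-i' argument is prepended exactly as written, with no path
    separator added, so a missing trailing slash changes the file name.
    """
    return [name] + [prefix + name for prefix in prefixes]
```

For example, `include_candidates("bar.i", ["foo"])' yields `bar.i' and `foobar.i', whereas a prefix of `/usr/local/nasm/' yields `/usr/local/nasm/bar.i'.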
285 | If you want to define a _standard_ include search path, similar to | |
286 | `/usr/include' on Unix systems, you should place one or more `-i' | |
287 | directives in the `NASM' environment variable (see section 2.1.11). | |
288 | ||
289 | 2.1.6 The `-p' Option: Pre-Include a File | |
290 | ||
291 | NASM allows you to specify files to be _pre-included_ into your | |
292 | source file, by the use of the `-p' option. So running | |
293 | ||
294 | nasm myfile.asm -p myinc.inc | |
295 | ||
296 | is equivalent to running `nasm myfile.asm' and placing the directive | |
297 | `%include "myinc.inc"' at the start of the file. | |
298 | ||
299 | 2.1.7 The `-d' Option: Pre-Define a Macro | |
300 | ||
301 | Just as the `-p' option gives an alternative to placing `%include' | |
302 | directives at the start of a source file, the `-d' option gives an | |
303 | alternative to placing a `%define' directive. You could code | |
304 | ||
305 | nasm myfile.asm -dFOO=100 | |
306 | ||
307 | as an alternative to placing the directive | |
308 | ||
309 | %define FOO 100 | |
310 | ||
311 | at the start of the file. You can also omit the macro value: the | |
312 | option `-dFOO' is equivalent to coding `%define FOO'. This form of | |
313 | the option may be useful for selecting assembly-time options which | |
314 | are then tested using `%ifdef', for example `-dDEBUG'. | |
315 | ||
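The correspondence between a `-d' option and the `%define' line it stands for can be written out as a small Python sketch. The helper name `define_directive' is made up for illustration, and the sketch assumes the macro is attached directly to `-d' as in the examples above.

```python
def define_directive(option):
    """Return the `%define' line equivalent to a `-dNAME[=VALUE]' option."""
    if not option.startswith("-d"):
        raise ValueError("not a -d option: " + option)
    # Split at the first `=' only, so the value may itself contain `='.
    name, has_value, value = option[2:].partition("=")
    if has_value:
        return "%define " + name + " " + value
    return "%define " + name
```

So `define_directive("-dFOO=100")' gives `%define FOO 100' and `define_directive("-dDEBUG")' gives `%define DEBUG'.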
316 | 2.1.8 The `-e' Option: Preprocess Only | |
317 | ||
318 | NASM allows the preprocessor to be run on its own, up to a point. | |
319 | Using the `-e' option (which requires no arguments) will cause NASM | |
320 | to preprocess its input file, expand all the macro references, | |
321 | remove all the comments and preprocessor directives, and print the | |
322 | resulting file on standard output (or save it to a file, if the `-o' | |
323 | option is also used). | |
324 | ||
325 | This option cannot be applied to programs which require the | |
326 | preprocessor to evaluate expressions which depend on the values of | |
327 | symbols: so code such as | |
328 | ||
329 | %assign tablesize ($-tablestart) | |
330 | ||
331 | will cause an error in preprocess-only mode. | |
332 | ||
333 | 2.1.9 The `-a' Option: Don't Preprocess At All | |
334 | ||
335 | If NASM is being used as the back end to a compiler, it might be | |
336 | desirable to suppress preprocessing completely and assume the | |
337 | compiler has already done it, to save time and increase compilation | |
338 | speeds. The `-a' option, requiring no argument, instructs NASM to | |
339 | replace its powerful preprocessor with a stub preprocessor which | |
340 | does nothing. | |
341 | ||
342 | 2.1.10 The `-w' Option: Enable or Disable Assembly Warnings | |
343 | ||
344 | NASM can observe many conditions during the course of assembly which | |
345 | are worth mentioning to the user, but are not sufficiently severe | |
346 | errors to justify NASM refusing to generate an output file. These | |
347 | conditions are reported like errors, but come up with the word | |
348 | `warning' before the message. Warnings do not prevent NASM from | |
349 | generating an output file and returning a success status to the | |
350 | operating system. | |
351 | ||
352 | Some conditions are even less severe than that: they are only | |
353 | sometimes worth mentioning to the user. Therefore NASM supports the | |
354 | `-w' command-line option, which enables or disables certain classes | |
355 | of assembly warning. Such warning classes are described by a name, | |
356 | for example `orphan-labels'; you can enable warnings of this class | |
357 | with the command-line option `-w+orphan-labels' and disable them | |
358 | with `-w-orphan-labels'. | |
359 | ||
360 | The suppressible warning classes are: | |
361 | ||
362 | (*) `macro-params' covers warnings about multi-line macros being | |
363 | invoked with the wrong number of parameters. This warning class | |
364 | is enabled by default; see section 4.2.1 for an example of why | |
365 | you might want to disable it. | |
366 | ||
367 | (*) `orphan-labels' covers warnings about source lines which contain | |
368 | no instruction but define a label without a trailing colon. NASM | |
369 | does not warn about this somewhat obscure condition by default; | |
370 | see section 3.1 for an example of why you might want it to. | |
371 | ||
372 | (*) `number-overflow' covers warnings about numeric constants which | |
373 | don't fit in 32 bits (for example, it's easy to type one too | |
374 | many Fs and produce `0x7ffffffff' by mistake). This warning | |
375 | class is enabled by default. | |
376 | ||
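The defaults and the `-w+name' / `-w-name' switches listed above can be summarised in a short Python model. It is a sketch only: the names `DEFAULT_WARNINGS', `apply_warning_options' and `overflows_32_bits' are invented here, and the overflow check uses Python's C-style literal parsing, which covers `0x...' constants but not every numeric syntax NASM accepts.

```python
# Default state of each suppressible warning class, as listed above.
DEFAULT_WARNINGS = {
    "macro-params": True,
    "orphan-labels": False,
    "number-overflow": True,
}

def apply_warning_options(options):
    """Return the warning state after applying `-w+name' / `-w-name' options."""
    state = dict(DEFAULT_WARNINGS)
    for opt in options:
        if not opt.startswith("-w") or len(opt) < 4 or opt[2] not in "+-":
            continue  # not a warning option; ignored in this sketch
        name = opt[3:]
        if name not in state:
            raise ValueError("unknown warning class: " + name)
        state[name] = (opt[2] == "+")
    return state

def overflows_32_bits(literal):
    """True if a C-style numeric literal needs more than 32 bits."""
    return int(literal, 0) > 0xFFFFFFFF
```

Here `overflows_32_bits("0x7ffffffff")' is true because the constant has nine hex digits, which is the slip the `number-overflow' class is meant to catch.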
377 | 2.1.11 The `NASM' Environment Variable | |
378 | ||
379 | If you define an environment variable called `NASM', the program | |
380 | will interpret it as a list of extra command-line options, which are | |
381 | processed before the real command line. You can use this to define | |
382 | standard search directories for include files, by putting `-i' | |
383 | options in the `NASM' variable. | |
384 | ||
385 | The value of the variable is split up at white space, so that the | |
386 | value `-s -ic:\nasmlib' will be treated as two separate options. | |
387 | However, that means that the value `-dNAME="my name"' won't do what | |
388 | you might want, because it will be split at the space and the NASM | |
389 | command-line processing will get confused by the two nonsensical | |
390 | words `-dNAME="my' and `name"'. | |
391 | ||
392 | To get round this, NASM provides a feature whereby, if you begin the | |
393 | `NASM' environment variable with some character that isn't a minus | |
394 | sign, then NASM will treat this character as the separator character | |
395 | for options. So setting the `NASM' variable to the value | |
396 | `!-s!-ic:\nasmlib' is equivalent to setting it to `-s -ic:\nasmlib', | |
397 | but `!-dNAME="my name"' will work. | |
398 | ||
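The splitting rule for the `NASM' variable can be sketched in Python as follows. This models the behaviour described in the text rather than NASM's own code, and the function name `split_nasm_env' is hypothetical.

```python
def split_nasm_env(value):
    """Split the value of the `NASM' environment variable into options."""
    if not value:
        return []
    if value[0] != "-":
        # A leading character other than a minus sign is the separator.
        separator = value[0]
        return [opt for opt in value[1:].split(separator) if opt]
    # Otherwise the value is split at white space.
    return value.split()
```

Under this model `-s -ic:\nasmlib' and `!-s!-ic:\nasmlib' produce the same two options, while `!-dNAME="my name"' survives as a single option instead of being cut at the space.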
399 | 2.2 Quick Start for MASM Users | |
400 | ||
401 | If you're used to writing programs with MASM, or with TASM in MASM- | |
402 | compatible (non-Ideal) mode, or with `a86', this section attempts to | |
403 | outline the major differences between MASM's syntax and NASM's. If | |
404 | you're not already used to MASM, it's probably worth skipping this | |
405 | section. | |
406 | ||
407 | 2.2.1 NASM Is Case-Sensitive | |
408 | ||
409 | One simple difference is that NASM is case-sensitive. It makes a | |
410 | difference whether you call your label `foo', `Foo' or `FOO'. If | |
411 | you're assembling to DOS or OS/2 `.OBJ' files, you can invoke the | |
412 | `UPPERCASE' directive (documented in section 6.2) to ensure that all | |
413 | symbols exported to other code modules are forced to be upper case; | |
414 | but even then, _within_ a single module, NASM will distinguish | |
415 | between labels differing only in case. | |
416 | ||
417 | 2.2.2 NASM Requires Square Brackets For Memory References | |
418 | ||
419 | NASM was designed with simplicity of syntax in mind. One of the | |
420 | design goals of NASM is that it should be possible, as far as is | |
421 | practical, for the user to look at a single line of NASM code and | |
422 | tell what opcode is generated by it. You can't do this in MASM: if | |
423 | you declare, for example, | |
424 | ||
425 | foo equ 1 | |
426 | bar dw 2 | |
427 | ||
428 | then the two lines of code | |
429 | ||
430 | mov ax,foo | |
431 | mov ax,bar | |
432 | ||
433 | generate completely different opcodes, despite having identical- | |
434 | looking syntaxes. | |
435 | ||
436 | NASM avoids this undesirable situation by having a much simpler | |
437 | syntax for memory references. The rule is simply that any access to | |
438 | the _contents_ of a memory location requires square brackets around | |
439 | the address, and any access to the _address_ of a variable doesn't. | |
440 | So an instruction of the form `mov ax,foo' will _always_ refer to a | |
441 | compile-time constant, whether it's an `EQU' or the address of a | |
442 | variable; and to access the _contents_ of the variable `bar', you | |
443 | must code `mov ax,[bar]'. | |
444 | ||
445 | This also means that NASM has no need for MASM's `OFFSET' keyword, | |
446 | since the MASM code `mov ax,offset bar' means exactly the same thing | |
447 | as NASM's `mov ax,bar'. If you're trying to get large amounts of | |
448 | MASM code to assemble sensibly under NASM, you can always code | |
449 | `%idefine offset' to make the preprocessor treat the `OFFSET' | |
450 | keyword as a no-op. | |
451 | ||
452 | This issue is even more confusing in `a86', where declaring a label | |
453 | with a trailing colon defines it to be a `label' as opposed to a | |
454 | `variable' and causes `a86' to adopt NASM-style semantics; so in | |
455 | `a86', `mov ax,var' has different behaviour depending on whether | |
456 | `var' was declared as `var: dw 0' (a label) or `var dw 0' (a word- | |
457 | size variable). NASM is very simple by comparison: _everything_ is a | |
458 | label. | |
459 | ||
460 | NASM, in the interests of simplicity, also does not support the | |
461 | hybrid syntaxes supported by MASM and its clones, such as | |
462 | `mov ax,table[bx]', where a memory reference is denoted by one | |
463 | portion outside square brackets and another portion inside. The | |
464 | correct syntax for the above is `mov ax,[table+bx]'. Likewise, | |
465 | `mov ax,es:[di]' is wrong and `mov ax,[es:di]' is right. | |
466 | ||
467 | 2.2.3 NASM Doesn't Store Variable Types | |
468 | ||
469 | NASM, by design, chooses not to remember the types of variables you | |
470 | declare. Whereas MASM will remember, on seeing `var dw 0', that you | |
471 | declared `var' as a word-size variable, and will then be able to | |
472 | fill in the ambiguity in the size of the instruction `mov var,2', | |
473 | NASM will deliberately remember nothing about the symbol `var' | |
474 | except where it begins, and so you must explicitly code | |
475 | `mov word [var],2'. | |
476 | ||
477 | For this reason, NASM doesn't support the `LODS', `MOVS', `STOS', | |
478 | `SCAS', `CMPS', `INS', or `OUTS' instructions, but only supports the | |
479 | forms such as `LODSB', `MOVSW', and `SCASD', which explicitly | |
480 | specify the size of the components of the strings being manipulated. | |
481 | ||
482 | 2.2.4 NASM Doesn't `ASSUME' | |
483 | ||
484 | As part of NASM's drive for simplicity, it also does not support the | |
485 | `ASSUME' directive. NASM will not keep track of what values you | |
486 | choose to put in your segment registers, and will never | |
487 | _automatically_ generate a segment override prefix. | |
488 | ||
489 | 2.2.5 NASM Doesn't Support Memory Models | |
490 | ||
491 | NASM also does not have any directives to support different 16-bit | |
492 | memory models. The programmer has to keep track of which functions | |
493 | are supposed to be called with a far call and which with a near | |
494 | call, and is responsible for putting the correct form of `RET' | |
495 | instruction (`RETN' or `RETF'; NASM accepts `RET' itself as an | |
496 | alternate form for `RETN'); in addition, the programmer is | |
497 | responsible for coding CALL FAR instructions where necessary when | |
498 | calling _external_ functions, and must also keep track of which | |
499 | external variable definitions are far and which are near. | |
500 | ||
501 | 2.2.6 Floating-Point Differences | |
502 | ||
503 | NASM uses different names to refer to floating-point registers from | |
504 | MASM: where MASM would call them `ST(0)', `ST(1)' and so on, and | |
505 | `a86' would call them simply `0', `1' and so on, NASM chooses to | |
506 | call them `st0', `st1' etc. | |
507 | ||
508 | As of version 0.96, NASM now treats the instructions with `nowait' | |
509 | forms in the same way as MASM-compatible assemblers. The | |
510 | idiosyncratic treatment employed by 0.95 and earlier was based on a | |
511 | misunderstanding by the authors. | |
512 | ||
513 | 2.2.7 Other Differences | |
514 | ||
515 | For historical reasons, NASM uses the keyword `TWORD' where MASM and | |
516 | compatible assemblers use `TBYTE'. | |
517 | ||
518 | NASM does not declare uninitialised storage in the same way as MASM: | |
519 | where a MASM programmer might use `stack db 64 dup (?)', NASM | |
520 | requires `stack resb 64', intended to be read as `reserve 64 bytes'. | |
521 | For a limited amount of compatibility, since NASM treats `?' as a | |
522 | valid character in symbol names, you can code `? equ 0' and then | |
523 | writing `dw ?' will at least do something vaguely useful. `DUP' is | |
524 | still not a supported syntax, however. | |
525 | ||
526 | In addition to all of this, macros and directives work completely | |
527 | differently to MASM. See chapter 4 and chapter 5 for further | |
528 | details. | |
529 | ||
530 | Chapter 3: The NASM Language | |
531 | ---------------------------- | |
532 | ||
533 | 3.1 Layout of a NASM Source Line | |
534 | ||
535 | As in most assemblers, each NASM source line contains (unless it is | |
536 | a macro, a preprocessor directive or an assembler directive: see | |
537 | chapter 4 and chapter 5) some combination of the four fields | |
538 | ||
539 | label: instruction operands ; comment | |
540 | ||
541 | As usual, most of these fields are optional; the presence or absence | |
542 | of any combination of a label, an instruction and a comment is | |
543 | allowed. Of course, the operand field is either required or | |
544 | forbidden by the presence and nature of the instruction field. | |
545 | ||
546 | NASM places no restrictions on white space within a line: labels may | |
547 | have white space before them, or instructions may have no space | |
548 | before them, or anything. The colon after a label is also optional. | |
549 | (Note that this means that if you intend to code `lodsb' alone on a | |
550 | line, and type `lodab' by accident, then that's still a valid source | |
551 | line which does nothing but define a label. Running NASM with the | |
552 | command-line option `-w+orphan-labels' will cause it to warn you if | |
553 | you define a label alone on a line without a trailing colon.) | |
554 | ||
555 | Valid characters in labels are letters, numbers, `_', `$', `#', `@', | |
556 | `~', `.', and `?'. The only characters which may be used as the | |
557 | _first_ character of an identifier are letters, `.' (with special | |
558 | meaning: see section 3.8), `_' and `?'. An identifier may also be | |
559 | prefixed with a `$' to indicate that it is intended to be read as an | |
560 | identifier and not a reserved word; thus, if some other module you | |
561 | are linking with defines a symbol called `eax', you can refer to | |
562 | `$eax' in NASM code to distinguish the symbol from the register. | |
563 | ||
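The character rules for labels given above translate fairly directly into a regular expression. The Python sketch below is one possible reading: it assumes that `letters' means ASCII letters and that the optional `$' escape is followed by an otherwise ordinary identifier. The names `LABEL_RE' and `is_valid_label' are invented for the example.

```python
import re

# Optional `$' escape, one permitted first character, then any number of
# permitted following characters (letters are assumed to be ASCII here).
LABEL_RE = re.compile(r"\$?[A-Za-z._?][A-Za-z0-9_$#@~.?]*")

def is_valid_label(text):
    """True if the whole of `text' is a label under the rules above."""
    return LABEL_RE.fullmatch(text) is not None
```

Note that `is_valid_label("lodab")' is true, which is exactly why a mistyped `lodsb' alone on a line is accepted as a label definition.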
564 | The instruction field may contain any machine instruction: Pentium | |
565 | and P6 instructions, FPU instructions, MMX instructions and even | |
566 | undocumented instructions are all supported. The instruction may be | |
567 | prefixed by `LOCK', `REP', `REPE'/`REPZ' or `REPNE'/`REPNZ', in the | |
568 | usual way. Explicit address-size and operand-size prefixes `A16', | |
569 | `A32', `O16' and `O32' are provided - one example of their use is | |
570 | given in chapter 9. You can also use the name of a segment register | |
571 | as an instruction prefix: coding `es mov [bx],ax' is equivalent to | |
572 | coding `mov [es:bx],ax'. We recommend the latter syntax, since it is | |
573 | consistent with other syntactic features of the language, but for | |
574 | instructions such as `LODSB', which has no operands and yet can | |
575 | require a segment override, there is no clean syntactic way to | |
576 | proceed apart from `es lodsb'. | |
577 | ||
578 | A prefix is not required to be attached to an instruction: prefixes | |
579 | such as `CS', `A32', `LOCK' or `REPE' can appear on a line by | |
580 | themselves, and NASM will just generate the prefix bytes. | |
581 | ||
582 | In addition to actual machine instructions, NASM also supports a | |
583 | number of pseudo-instructions, described in section 3.2. | |
584 | ||
585 | Instruction operands may take a number of forms: they can be | |
586 | registers, described simply by the register name (e.g. `ax', `bp', | |
587 | `ebx', `cr0': NASM does not use the `gas'-style syntax in which | |
588 | register names must be prefixed by a `%' sign), or they can be | |
589 | effective addresses (see section 3.3), constants (section 3.4) or | |
590 | expressions (section 3.5). | |
591 | ||
592 | For floating-point instructions, NASM accepts a wide range of | |
593 | syntaxes: you can use two-operand forms like MASM supports, or you | |
594 | can use NASM's native single-operand forms in most cases. Details of | |
595 | all forms of each supported instruction are given in appendix A. For | |
596 | example, you can code: | |
597 | ||
598 | fadd st1 ; this sets st0 := st0 + st1 | |
599 | fadd st0,st1 ; so does this | |
600 | ||
601 | fadd st1,st0 ; this sets st1 := st1 + st0 | |
602 | fadd to st1 ; so does this | |
603 | ||
604 | Almost any floating-point instruction that references memory must | |
605 | use one of the prefixes `DWORD', `QWORD' or `TWORD' to indicate what | |
606 | size of memory operand it refers to. | |
607 | ||
608 | 3.2 Pseudo-Instructions | |
609 | ||
610 | Pseudo-instructions are things which, though not real x86 machine | |
611 | instructions, are used in the instruction field anyway because | |
612 | that's the most convenient place to put them. The current pseudo- | |
613 | instructions are `DB', `DW', `DD', `DQ' and `DT', their | |
614 | uninitialised counterparts `RESB', `RESW', `RESD', `RESQ' and | |
615 | `REST', the `INCBIN' command, the `EQU' command, and the `TIMES' | |
616 | prefix. | |
617 | ||
618 | 3.2.1 `DB' and friends: Declaring Initialised Data | |
619 | ||
620 | `DB', `DW', `DD', `DQ' and `DT' are used, much as in MASM, to | |
621 | declare initialised data in the output file. They can be invoked in | |
622 | a wide range of ways: | |
623 | ||
624 | db 0x55 ; just the byte 0x55 | |
625 | db 0x55,0x56,0x57 ; three bytes in succession | |
626 | db 'a',0x55 ; character constants are OK | |
627 | db 'hello',13,10,'$' ; so are string constants | |
628 | dw 0x1234 ; 0x34 0x12 | |
629 | dw 'a' ; 0x61 0x00 (it's just a number) | |
630 | dw 'ab' ; 0x61 0x62 (character constant) | |
631 | dw 'abc' ; 0x61 0x62 0x63 0x00 (string) | |
632 | dd 0x12345678 ; 0x78 0x56 0x34 0x12 | |
633 | dd 1.234567e20 ; floating-point constant | |
634 | dq 1.234567e20 ; double-precision float | |
635 | dt 1.234567e20 ; extended-precision float | |
636 | ||
637 | `DQ' and `DT' do not accept numeric constants or string constants as | |
638 | operands. | |
639 | ||
640 | 3.2.2 `RESB' and friends: Declaring Uninitialised Data | |
641 | ||
642 | `RESB', `RESW', `RESD', `RESQ' and `REST' are designed to be used in | |
643 | the BSS section of a module: they declare _uninitialised_ storage | |
644 | space. Each takes a single operand, which is the number of bytes, | |
645 | words, doublewords or whatever to reserve. As stated in section | |
646 | 2.2.7, NASM does not support the MASM/TASM syntax of reserving | |
647 | uninitialised space by writing `DW ?' or similar things: this is | |
648 | what it does instead. The operand to a `RESB'-type pseudo- | |
649 | instruction is a _critical expression_: see section 3.7. | |
650 | ||
651 | For example: | |
652 | ||
653 | buffer: resb 64 ; reserve 64 bytes | |
654 | wordvar: resw 1 ; reserve a word | |
655 | realarray resq 10 ; array of ten reals | |
656 | ||
657 | 3.2.3 `INCBIN': Including External Binary Files | |
658 | ||
659 | `INCBIN' is borrowed from the old Amiga assembler DevPac: it | |
660 | includes a binary file verbatim into the output file. This can be | |
661 | handy for including, say, graphics and sound data directly | |
662 | into a game executable file. It can be called in one of these three | |
663 | ways: | |
664 | ||
665 | incbin "file.dat" ; include the whole file | |
666 | incbin "file.dat",1024 ; skip the first 1024 bytes | |
667 | incbin "file.dat",1024,512 ; skip the first 1024, and | |
668 | ; actually include at most 512 | |
669 | ||
670 | 3.2.4 `EQU': Defining Constants | |
671 | ||
672 | `EQU' defines a symbol to a given constant value: when `EQU' is | |
673 | used, the source line must contain a label. The action of `EQU' is | |
674 | to define the given label name to the value of its (only) operand. | |
675 | This definition is absolute, and cannot change later. So, for | |
676 | example, | |
677 | ||
678 | message db 'hello, world' | |
679 | msglen equ $-message | |
680 | ||
681 | defines `msglen' to be the constant 12. `msglen' may not then be | |
682 | redefined later. This is not a preprocessor definition either: the | |
683 | value of `msglen' is evaluated _once_, using the value of `$' (see | |
684 | section 3.5 for an explanation of `$') at the point of definition, | |
685 | rather than being evaluated wherever it is referenced and using the | |
686 | value of `$' at the point of reference. Note that the operand to an | |
687 | `EQU' is also a critical expression (section 3.7). | |
688 | ||
689 | 3.2.5 `TIMES': Repeating Instructions or Data | |
690 | ||
691 | The `TIMES' prefix causes the instruction to be assembled multiple | |
692 | times. This is partly present as NASM's equivalent of the `DUP' | |
693 | syntax supported by MASM-compatible assemblers, in that you can code | |
694 | ||
695 | zerobuf: times 64 db 0 | |
696 | ||
697 | or similar things; but `TIMES' is more versatile than that. The | |
698 | argument to `TIMES' is not just a numeric constant, but a numeric | |
699 | _expression_, so you can do things like | |
700 | ||
701 | buffer: db 'hello, world' | |
702 | times 64-$+buffer db ' ' | |
703 | ||
704 | which will store exactly enough spaces to make the total length of | |
705 | `buffer' up to 64. Finally, `TIMES' can be applied to ordinary | |
706 | instructions, so you can code trivial unrolled loops in it: | |
707 | ||
708 | times 100 movsb | |
709 | ||
710 | Note that there is no effective difference between | |
711 | `times 100 resb 1' and `resb 100', except that the latter will be | |
712 | assembled about 100 times faster due to the internal structure of | |
713 | the assembler. | |
714 | ||
715 | The operand to `TIMES', like that of `EQU' and those of `RESB' and | |
716 | friends, is a critical expression (section 3.7). | |
717 | ||
718 | Note also that `TIMES' can't be applied to macros: the reason for | |
719 | this is that `TIMES' is processed after the macro phase, which | |
720 | allows the argument to `TIMES' to contain expressions such as | |
721 | `64-$+buffer' as above. To repeat more than one line of code, or a | |
722 | complex macro, use the preprocessor `%rep' directive. | |
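
As a minimal sketch of the `%rep' alternative mentioned above (the
loop body is invented purely for illustration, and assumes 16-bit
code with the direction flag clear), a two-line sequence can be
repeated like this:

```nasm
%rep 4
        movsb                   ; copy one byte from [ds:si] to [es:di]
        inc si                  ; then skip the next source byte
%endrep
```

The preprocessor simply emits the two lines four times over, which
`TIMES' cannot do because it applies to a single instruction only.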
723 | ||
724 | 3.3 Effective Addresses | |
725 | ||
726 | An effective address is any operand to an instruction which | |
727 | references memory. Effective addresses, in NASM, have a very simple | |
728 | syntax: they consist of an expression evaluating to the desired | |
729 | address, enclosed in square brackets. For example: | |
730 | ||
731 | wordvar dw 123 | |
732 | mov ax,[wordvar] | |
733 | mov ax,[wordvar+1] | |
734 | mov ax,[es:wordvar+bx] | |
735 | ||
736 | Anything not conforming to this simple system is not a valid memory | |
737 | reference in NASM, for example `es:wordvar[bx]'. | |
738 | ||
739 | More complicated effective addresses, such as those involving more | |
740 | than one register, work in exactly the same way: | |
741 | ||
742 | mov eax,[ebx*2+ecx+offset] | |
743 | mov ax,[bp+di+8] | |
744 | ||
745 | NASM is capable of doing algebra on these effective addresses, so | |
746 | that things which don't necessarily _look_ legal are perfectly all | |
747 | right: | |
748 | ||
749 | mov eax,[ebx*5] ; assembles as [ebx*4+ebx] | |
750 | mov eax,[label1*2-label2] ; ie [label1+(label1-label2)] | |
751 | ||
752 | Some forms of effective address have more than one assembled form; | |
753 | in most such cases NASM will generate the smallest form it can. For | |
754 | example, there are distinct assembled forms for the 32-bit effective | |
755 | addresses `[eax*2+0]' and `[eax+eax]', and NASM will generally | |
756 | generate the latter on the grounds that the former requires four | |
757 | bytes to store a zero offset. | |
758 | ||
759 | NASM has a hinting mechanism which will cause `[eax+ebx]' and | |
760 | `[ebx+eax]' to generate different opcodes; this is occasionally | |
761 | useful because `[esi+ebp]' and `[ebp+esi]' have different default | |
762 | segment registers. | |
763 | ||
764 | However, you can force NASM to generate an effective address in a | |
765 | particular form by the use of the keywords `BYTE', `WORD', `DWORD' | |
766 | and `NOSPLIT'. If you need `[eax+3]' to be assembled using a double- | |
767 | word offset field instead of the one byte NASM will normally | |
768 | generate, you can code `[dword eax+3]'. Similarly, you can force | |
769 | NASM to use a byte offset for a small value which it hasn't seen on | |
770 | the first pass (see section 3.7 for an example of such a code | |
771 | fragment) by using `[byte eax+offset]'. As special cases, | |
772 | `[byte eax]' will code `[eax+0]' with a byte offset of zero, and | |
773 | `[dword eax]' will code it with a double-word offset of zero. The | |
774 | normal form, `[eax]', will be coded with no offset field. | |
775 | ||
776 | Similarly, NASM will split `[eax*2]' into `[eax+eax]' because that | |
777 | allows the offset field to be absent and space to be saved; in fact, | |
778 | it will also split `[eax*2+offset]' into `[eax+eax+offset]'. You can | |
779 | combat this behaviour by the use of the `NOSPLIT' keyword: | |
780 | `[nosplit eax*2]' will force `[eax*2+0]' to be generated literally. | |
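
The forms described above can be gathered into one short sketch.
The choice of `eax' as the destination register is arbitrary, and
the comments merely restate the encodings described in the text:

```nasm
        bits 32

        mov eax,[eax+3]         ; default: one-byte offset field
        mov eax,[dword eax+3]   ; same address, double-word offset field
        mov eax,[byte eax]      ; [eax+0] with a byte offset of zero
        mov eax,[dword eax]     ; [eax+0] with a double-word offset of zero
        mov eax,[nosplit eax*2] ; kept as [eax*2+0], not split to [eax+eax]
```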
781 | ||
782 | 3.4 Constants | |
783 | ||
784 | NASM understands four different types of constant: numeric, | |
785 | character, string and floating-point. | |
786 | ||
787 | 3.4.1 Numeric Constants | |
788 | ||
789 | A numeric constant is simply a number. NASM allows you to specify | |
790 | numbers in a variety of number bases, in a variety of ways: you can | |
791 | suffix `H', `Q' and `B' for hex, octal and binary, or you can prefix | |
792 | `0x' for hex in the style of C, or you can prefix `$' for hex in the | |
793 | style of Borland Pascal. Note, though, that the `$' prefix does | |
794 | double duty as a prefix on identifiers (see section 3.1), so a hex | |
795 | number prefixed with a `$' sign must have a digit after the `$' | |
796 | rather than a letter. | |
797 | ||
798 | Some examples: | |
799 | ||
800 | mov ax,100 ; decimal | |
801 | mov ax,0a2h ; hex | |
802 | mov ax,$0a2 ; hex again: the 0 is required | |
803 | mov ax,0xa2 ; hex yet again | |
804 | mov ax,777q ; octal | |
805 | mov ax,10010011b ; binary | |
806 | ||
807 | 3.4.2 Character Constants | |
808 | ||
809 | A character constant consists of up to four characters enclosed in | |
810 | either single or double quotes. The type of quote makes no | |
811 | difference to NASM, except of course that surrounding the constant | |
812 | with single quotes allows double quotes to appear within it and vice | |
813 | versa. | |
814 | ||
815 | A character constant with more than one character will be arranged | |
816 | with little-endian order in mind: if you code | |
817 | ||
818 | mov eax,'abcd' | |
819 | ||
820 | then the constant generated is not `0x61626364', but `0x64636261', | |
821 | so that if you were then to store the value into memory, it would | |
822 | read `abcd' rather than `dcba'. This is also the sense of character | |
823 | constants understood by the Pentium's `CPUID' instruction (see | |
824 | section A.22). | |
825 | ||
826 | 3.4.3 String Constants | |
827 | ||
828 | String constants are only acceptable to some pseudo-instructions, | |
829 | namely the `DB' family and `INCBIN'. | |
830 | ||
831 | A string constant looks like a character constant, only longer. It | |
832 | is treated as a concatenation of character constants of the maximum | |
833 | size the pseudo-instruction accepts. So the following are equivalent: | |
834 | ||
835 | db 'hello' ; string constant | |
836 | db 'h','e','l','l','o' ; equivalent character constants | |
837 | ||
838 | And the following are also equivalent: | |
839 | ||
840 | dd 'ninechars' ; doubleword string constant | |
841 | dd 'nine','char','s' ; becomes three doublewords | |
842 | db 'ninechars',0,0,0 ; and really looks like this | |
843 | ||
844 | Note that when used as an operand to `db', a constant like `'ab'' is | |
845 | treated as a string constant despite being short enough to be a | |
846 | character constant, because otherwise `db 'ab'' would have the same | |
847 | effect as `db 'a'', which would be silly. Similarly, three-character | |
848 | or four-character constants are treated as strings when they are | |
849 | operands to `dw'. | |
850 | ||
851 | 3.4.4 Floating-Point Constants | |
852 | ||
853 | Floating-point constants are acceptable only as arguments to `DD', | |
854 | `DQ' and `DT'. They are expressed in the traditional form: digits, | |
855 | then a period, then optionally more digits, then optionally an `E' | |
856 | followed by an exponent. The period is mandatory, so that NASM can | |
857 | distinguish between `dd 1', which declares an integer constant, and | |
858 | `dd 1.0' which declares a floating-point constant. | |
859 | ||
860 | Some examples: | |
861 | ||
862 | dd 1.2 ; an easy one | |
863 | dq 1.e10 ; 10,000,000,000 | |
864 | dq 1.e+10 ; synonymous with 1.e10 | |
865 | dq 1.e-10 ; 0.000 000 000 1 | |
866 | dt 3.141592653589793238462 ; pi | |
867 | ||
868 | NASM cannot do compile-time arithmetic on floating-point constants. | |
869 | This is because NASM is designed to be portable - although it always | |
870 | generates code to run on x86 processors, the assembler itself can | |
871 | run on any system with an ANSI C compiler. Therefore, the assembler | |
872 | cannot guarantee the presence of a floating-point unit capable of | |
873 | handling the Intel number formats, and so for NASM to be able to do | |
874 | floating arithmetic it would have to include its own complete set of | |
875 | floating-point routines, which would significantly increase the size | |
876 | of the assembler for very little benefit. | |
877 | ||
878 | 3.5 Expressions | |
879 | ||
880 | Expressions in NASM are similar in syntax to those in C. | |
881 | ||
882 | NASM does not guarantee the size of the integers used to evaluate | |
883 | expressions at compile time: since NASM can compile and run on 64- | |
884 | bit systems quite happily, don't assume that expressions are | |
885 | evaluated in 32-bit registers and so try to make deliberate use of | |
886 | integer overflow. It might not always work. The only thing NASM will | |
887 | guarantee is what's guaranteed by ANSI C: you always have _at least_ | |
888 | 32 bits to work in. | |
889 | ||
890 | NASM supports two special tokens in expressions, allowing | |
891 | calculations to involve the current assembly position: the `$' and | |
892 | `$$' tokens. `$' evaluates to the assembly position at the beginning | |
893 | of the line containing the expression; so you can code an infinite | |
894 | loop using `JMP $'. `$$' evaluates to the beginning of the current | |
895 | section; so you can tell how far into the section you are by using | |
896 | `($-$$)'. | |
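
A small sketch of both tokens at work follows. The label `hang' and
the figure 512 are chosen only for illustration:

```nasm
hang:   jmp $                   ; jumps to itself: an infinite loop
        times 512-($-$$) db 0   ; pad the section with zeros up to 512 bytes
```

Since `$' and `$$' both refer to positions in the same section,
`($-$$)' is a plain number, and it depends on nothing defined later
in the file.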
897 | ||
898 | The arithmetic operators provided by NASM are listed here, in | |
899 | increasing order of precedence. | |
900 | ||
901 | 3.5.1 `|': Bitwise OR Operator | |
902 | ||
903 | The `|' operator gives a bitwise OR, exactly as performed by the | |
904 | `OR' machine instruction. Bitwise OR is the lowest-priority | |
905 | arithmetic operator supported by NASM. | |
906 | ||
907 | 3.5.2 `^': Bitwise XOR Operator | |
908 | ||
909 | `^' provides the bitwise XOR operation. | |
910 | ||
911 | 3.5.3 `&': Bitwise AND Operator | |
912 | ||
913 | `&' provides the bitwise AND operation. | |
914 | ||
915 | 3.5.4 `<<' and `>>': Bit Shift Operators | |
916 | ||
917 | `<<' gives a bit-shift to the left, just as it does in C. So `5<<3' | |
918 | evaluates to 5 times 8, or 40. `>>' gives a bit-shift to the right; | |
919 | in NASM, such a shift is _always_ unsigned, so that the bits shifted | |
920 | in from the left-hand end are filled with zero rather than a sign- | |
921 | extension of the previous highest bit. | |
922 | ||
923 | 3.5.5 `+' and `-': Addition and Subtraction Operators | |
924 | ||
925 | The `+' and `-' operators do perfectly ordinary addition and | |
926 | subtraction. | |
927 | ||
928 | 3.5.6 `*', `/', `//', `%' and `%%': Multiplication and Division | |
929 | ||
930 | `*' is the multiplication operator. `/' and `//' are both division | |
931 | operators: `/' is unsigned division and `//' is signed division. | |
932 | Similarly, `%' and `%%' provide unsigned and signed modulo operators | |
933 | respectively. | |
934 | ||
935 | NASM, like ANSI C, provides no guarantees about the sensible | |
936 | operation of the signed modulo operator. | |
937 | ||
938 | Since the `%' character is used extensively by the macro | |
939 | preprocessor, you should ensure that both the signed and unsigned | |
940 | modulo operators are followed by white space wherever they appear. | |
941 | ||
942 | 3.5.7 Unary Operators: `+', `-', `~' and `SEG' | |
943 | ||
944 | The highest-priority operators in NASM's expression grammar are | |
945 | those which only apply to one argument. `-' negates its operand, `+' | |
946 | does nothing (it's provided for symmetry with `-'), `~' computes the | |
947 | one's complement of its operand, and `SEG' provides the segment | |
948 | address of its operand (explained in more detail in section 3.6). | |
949 | ||
950 | 3.6 `SEG' and `WRT' | |
951 | ||
952 | When writing large 16-bit programs, which must be split into | |
953 | multiple segments, it is often necessary to be able to refer to the | |
954 | segment part of the address of a symbol. NASM supports the `SEG' | |
955 | operator to perform this function. | |
956 | ||
957 | The `SEG' operator returns the _preferred_ segment base of a symbol, | |
958 | defined as the segment base relative to which the offset of the | |
959 | symbol makes sense. So the code | |
960 | ||
961 | mov ax,seg symbol | |
962 | mov es,ax | |
963 | mov bx,symbol | |
964 | ||
965 | will load `ES:BX' with a valid pointer to the symbol `symbol'. | |
966 | ||
967 | Things can be more complex than this: since 16-bit segments and | |
968 | groups may overlap, you might occasionally want to refer to some | |
969 | symbol using a different segment base from the preferred one. NASM | |
970 | lets you do this, by the use of the `WRT' (With Reference To) | |
971 | keyword. So you can do things like | |
972 | ||
973 | mov ax,weird_seg ; weird_seg is a segment base | |
974 | mov es,ax | |
975 | mov bx,symbol wrt weird_seg | |
976 | ||
977 | to load `ES:BX' with a different, but functionally equivalent, | |
978 | pointer to the symbol `symbol'. | |
979 | ||
980 | NASM supports far (inter-segment) calls and jumps by means of the | |
981 | syntax `call segment:offset', where `segment' and `offset' both | |
982 | represent immediate values. So to call a far procedure, you could | |
983 | code either of | |
984 | ||
985 | call (seg procedure):procedure | |
986 | call weird_seg:(procedure wrt weird_seg) | |
987 | ||
988 | (The parentheses are included for clarity, to show the intended | |
989 | parsing of the above instructions. They are not necessary in | |
990 | practice.) | |
991 | ||
992 | NASM supports the syntax `call far procedure' as a synonym for the | |
993 | first of the above usages. `JMP' works identically to `CALL' in | |
994 | these examples. | |
995 | ||
996 | To declare a far pointer to a data item in a data segment, you must | |
997 | code | |
998 | ||
999 | dw symbol, seg symbol | |
1000 | ||
1001 | NASM supports no convenient synonym for this, though you can always | |
1002 | invent one using the macro processor. | |
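
One possible shape for such a macro is sketched below, using the
multi-line macro syntax described in section 4.2. The names `farptr',
`fp_msg' and `message' are hypothetical, and the output format is
assumed to be one that understands segments, such as `obj':

```nasm
%macro farptr 1
        dw %1, seg %1           ; offset first, then segment
%endmacro

fp_msg: farptr message          ; expands to: dw message, seg message
```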
1003 | ||
1004 | 3.7 Critical Expressions | |
1005 | ||
1006 | A limitation of NASM is that it is a two-pass assembler; unlike TASM | |
1007 | and others, it will always do exactly two assembly passes. Therefore | |
1008 | it is unable to cope with source files that are complex enough to | |
1009 | require three or more passes. | |
1010 | ||
1011 | The first pass is used to determine the size of all the assembled | |
1012 | code and data, so that the second pass, when generating all the | |
1013 | code, knows all the symbol addresses the code refers to. So one | |
1014 | thing NASM can't handle is code whose size depends on the value of a | |
1015 | symbol declared after the code in question. For example, | |
1016 | ||
1017 | times (label-$) db 0 | |
1018 | label: db 'Where am I?' | |
1019 | ||
1020 | The argument to `TIMES' in this case could equally legally evaluate | |
1021 | to anything at all; NASM will reject this example because it cannot | |
1022 | tell the size of the `TIMES' line when it first sees it. It will | |
1023 | just as firmly reject the slightly paradoxical code | |
1024 | ||
1025 | times (label-$+1) db 0 | |
1026 | label: db 'NOW where am I?' | |
1027 | ||
1028 | in which _any_ value for the `TIMES' argument is by definition | |
1029 | wrong! | |
1030 | ||
1031 | NASM rejects these examples by means of a concept called a _critical | |
1032 | expression_, which is defined to be an expression whose value is | |
1033 | required to be computable in the first pass, and which must | |
1034 | therefore depend only on symbols defined before it. The argument to | |
1035 | the `TIMES' prefix is a critical expression; for the same reason, | |
1036 | the arguments to the `RESB' family of pseudo-instructions are also | |
1037 | critical expressions. | |
1038 | ||
1039 | Critical expressions can crop up in other contexts as well: consider | |
1040 | the following code. | |
1041 | ||
1042 | mov ax,symbol1 | |
1043 | symbol1 equ symbol2 | |
1044 | symbol2: | |
1045 | ||
1046 | On the first pass, NASM cannot determine the value of `symbol1', | |
1047 | because `symbol1' is defined to be equal to `symbol2' which NASM | |
1048 | hasn't seen yet. On the second pass, therefore, when it encounters | |
1049 | the line `mov ax,symbol1', it is unable to generate the code for it | |
1050 | because it still doesn't know the value of `symbol1'. On the next | |
1051 | line, it would see the `EQU' again and be able to determine the | |
1052 | value of `symbol1', but by then it would be too late. | |
1053 | ||
1054 | NASM avoids this problem by defining the right-hand side of an `EQU' | |
1055 | statement to be a critical expression, so the definition of | |
1056 | `symbol1' would be rejected in the first pass. | |
1057 | ||
1058 | There is a related issue involving forward references: consider this | |
1059 | code fragment. | |
1060 | ||
1061 | mov eax,[ebx+offset] | |
1062 | offset equ 10 | |
1063 | ||
1064 | NASM, on pass one, must calculate the size of the instruction | |
1065 | `mov eax,[ebx+offset]' without knowing the value of `offset'. It has | |
1066 | no way of knowing that `offset' is small enough to fit into a one- | |
1067 | byte offset field and that it could therefore get away with | |
1068 | generating a shorter form of the effective-address encoding; for all | |
1069 | it knows, in pass one, `offset' could be a symbol in the code | |
1070 | segment, and it might need the full four-byte form. So it is forced | |
1071 | to compute the size of the instruction to accommodate a four-byte | |
1072 | address part. In pass two, having made this decision, it is now | |
1073 | forced to honour it and keep the instruction large, so the code | |
1074 | generated in this case is not as small as it could have been. This | |
1075 | problem can be solved by defining `offset' before using it, or by | |
1076 | forcing byte size in the effective address by coding | |
1077 | `[byte ebx+offset]'. | |
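
Both remedies are sketched here. The symbol `later' is a
hypothetical name standing in for any constant defined after its
first use:

```nasm
offset  equ 10                   ; defined before use, so pass one
        mov eax,[ebx+offset]     ; can see that a byte offset will do

        mov eax,[byte ebx+later] ; forward reference: byte size forced
later   equ 10
```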
1078 | ||
1079 | 3.8 Local Labels | |
1080 | ||
1081 | NASM gives special treatment to symbols beginning with a period. A | |
1082 | label beginning with a single period is treated as a _local_ label, | |
1083 | which means that it is associated with the previous non-local label. | |
1084 | So, for example: | |
1085 | ||
1086 | label1 ; some code | |
1087 | .loop ; some more code | |
1088 | jne .loop | |
1089 | ret | |
1090 | label2 ; some code | |
1091 | .loop ; some more code | |
1092 | jne .loop | |
1093 | ret | |
1094 | ||
1095 | In the above code fragment, each `JNE' instruction jumps to the line | |
1096 | immediately before it, because the two definitions of `.loop' are | |
1097 | kept separate by virtue of each being associated with the previous | |
1098 | non-local label. | |
1099 | ||
1100 | This form of local label handling is borrowed from the old Amiga | |
1101 | assembler DevPac; however, NASM goes one step further, in allowing | |
1102 | access to local labels from other parts of the code. This is | |
1103 | achieved by means of _defining_ a local label in terms of the | |
1104 | previous non-local label: the first definition of `.loop' above is | |
1105 | really defining a symbol called `label1.loop', and the second | |
1106 | defines a symbol called `label2.loop'. So, if you really needed to, | |
1107 | you could write | |
1108 | ||
1109 | label3 ; some more code | |
1110 | ; and some more | |
1111 | jmp label1.loop | |
1112 | ||
1113 | Sometimes it is useful - in a macro, for instance - to be able to | |
1114 | define a label which can be referenced from anywhere but which | |
1115 | doesn't interfere with the normal local-label mechanism. Such a | |
1116 | label can't be non-local because it would interfere with subsequent | |
1117 | definitions of, and references to, local labels; and it can't be | |
1118 | local because the macro that defined it wouldn't know the label's | |
1119 | full name. NASM therefore introduces a third type of label, which is | |
1120 | probably only useful in macro definitions: if a label begins with | |
1121 | the special prefix `..@', then it does nothing to the local label | |
1122 | mechanism. So you could code | |
1123 | ||
1124 | label1: ; a non-local label | |
1125 | .local: ; this is really label1.local | |
1126 | ..@foo: ; this is a special symbol | |
1127 | label2: ; another non-local label | |
1128 | .local: ; this is really label2.local | |
1129 | jmp ..@foo ; this will jump three lines up | |
1130 | ||
1131 | NASM has the capacity to define other special symbols beginning with | |
1132 | a double period: for example, `..start' is used to specify the entry | |
1133 | point in the `obj' output format (see section 6.2.6). | |
1134 | ||
1135 | Chapter 4: The NASM Preprocessor | |
1136 | -------------------------------- | |
1137 | ||
1138 | NASM contains a powerful macro processor, which supports conditional | |
1139 | assembly, multi-level file inclusion, two forms of macro (single- | |
1140 | line and multi-line), and a `context stack' mechanism for extra | |
1141 | macro power. Preprocessor directives all begin with a `%' sign. | |
1142 | ||
1143 | 4.1 Single-Line Macros | |
1144 | ||
1145 | 4.1.1 The Normal Way: `%define' | |
1146 | ||
1147 | Single-line macros are defined using the `%define' preprocessor | |
1148 | directive. The definitions work in a similar way to C; so you can do | |
1149 | things like | |
1150 | ||
1151 | %define ctrl 0x1F & | |
1152 | %define param(a,b) ((a)+(a)*(b)) | |
1153 | mov byte [param(2,ebx)], ctrl 'D' | |
1154 | ||
1155 | which will expand to | |
1156 | ||
1157 | mov byte [(2)+(2)*(ebx)], 0x1F & 'D' | |
1158 | ||
1159 | When the expansion of a single-line macro contains tokens which | |
1160 | invoke another macro, the expansion is performed at invocation time, | |
1161 | not at definition time. Thus the code | |
1162 | ||
1163 | %define a(x) 1+b(x) | |
1164 | %define b(x) 2*x | |
1165 | mov ax,a(8) | |
1166 | ||
1167 | will evaluate in the expected way to `mov ax,1+2*8', even though the | |
1168 | macro `b' wasn't defined at the time of definition of `a'. | |
1169 | ||
1170 | Macros defined with `%define' are case sensitive: after | |
1171 | `%define foo bar', only `foo' will expand to `bar': `Foo' or `FOO' | |
1172 | will not. By using `%idefine' instead of `%define' (the `i' stands | |
1173 | for `insensitive') you can define all the case variants of a macro | |
1174 | at once, so that `%idefine foo bar' would cause `foo', `Foo', `FOO', | |
1175 | `fOO' and so on all to expand to `bar'. | |
1176 | ||
1177 | There is a mechanism which detects when a macro call has occurred as | |
1178 | a result of a previous expansion of the same macro, to guard against | |
1179 | circular references and infinite loops. If this happens, the | |
1180 | preprocessor will only expand the first occurrence of the macro. | |
1181 | Hence, if you code | |
1182 | ||
1183 | %define a(x) 1+a(x) | |
1184 | mov ax,a(3) | |
1185 | ||
1186 | the macro `a(3)' will expand once, becoming `1+a(3)', and will then | |
1187 | expand no further. This behaviour can be useful: see section 8.1 for | |
1188 | an example of its use. | |
1189 | ||
1190 | You can overload single-line macros: if you write | |
1191 | ||
1192 | %define foo(x) 1+x | |
1193 | %define foo(x,y) 1+x*y | |
1194 | ||
1195 | the preprocessor will be able to handle both types of macro call, by | |
1196 | counting the parameters you pass; so `foo(3)' will become `1+3' | |
1197 | whereas `foo(ebx,2)' will become `1+ebx*2'. However, if you define | |
1198 | ||
1199 | %define foo bar | |
1200 | ||
1201 | then no other definition of `foo' will be accepted: a macro with no | |
1202 | parameters prohibits the definition of the same name as a macro | |
1203 | _with_ parameters, and vice versa. | |
1204 | ||
1205 | This doesn't prevent single-line macros being _redefined_: you can | |
1206 | perfectly well define a macro with | |
1207 | ||
1208 | %define foo bar | |
1209 | ||
1210 | and then re-define it later in the same source file with | |
1211 | ||
1212 | %define foo baz | |
1213 | ||
1214 | Then everywhere the macro `foo' is invoked, it will be expanded | |
1215 | according to the most recent definition. This is particularly useful | |
1216 | when defining single-line macros with `%assign' (see section 4.1.2). | |
1217 | ||
1218 | You can pre-define single-line macros using the `-d' option on the | |
1219 | NASM command line: see section 2.1.7. | |
1220 | ||
1221 | 4.1.2 Preprocessor Variables: `%assign' | |
1222 | ||
1223 | An alternative way to define single-line macros is by means of the | |
1224 | `%assign' command (and its case-insensitive | |
1225 | counterpart `%iassign', which differs from `%assign' in exactly the | |
1226 | same way that `%idefine' differs from `%define'). | |
1227 | ||
1228 | `%assign' is used to define single-line macros which take no | |
1229 | parameters and have a numeric value. This value can be specified in | |
1230 | the form of an expression, and it will be evaluated once, when the | |
1231 | `%assign' directive is processed. | |
1232 | ||
1233 | Like `%define', macros defined using `%assign' can be re-defined | |
1234 | later, so you can do things like | |
1235 | ||
1236 | %assign i i+1 | |
1237 | ||
1238 | to increment the numeric value of a macro. | |
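
As a brief sketch of this in use (the counter name `i' and the
repeat count are arbitrary), a `%rep' loop can emit a run of
consecutive byte values:

```nasm
%assign i 0
%rep 4
        db i                    ; emits 0, then 1, 2 and 3
%assign i i+1
%endrep
```

Because `%assign' evaluates its expression at once, each pass
through the loop sees `i' as a plain number rather than as an
ever-growing chain of `+1' terms.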
1239 | ||
1240 | `%assign' is useful for controlling the termination of `%rep' | |
1241 | preprocessor loops: see section 4.4 for an example of this. Other | |
1242 | uses for `%assign' are given in section 7.4 and section 8.1. | |
1243 | ||
1244 | The expression passed to `%assign' is a critical expression (see | |
1245 | section 3.7), and must also evaluate to a pure number (rather than a | |
1246 | relocatable reference such as a code or data address, or anything | |
1247 | involving a register). | |
1248 | ||
1249 | 4.2 Multi-Line Macros: `%macro' | |
1250 | ||
1251 | Multi-line macros are much more like the type of macro seen in MASM | |
1252 | and TASM: a multi-line macro definition in NASM looks something like | |
1253 | this. | |
1254 | ||
1255 | %macro prologue 1 | |
1256 | push ebp | |
1257 | mov ebp,esp | |
1258 | sub esp,%1 | |
1259 | %endmacro | |
1260 | ||
1261 | This defines a C-like function prologue as a macro: so you would | |
1262 | invoke the macro with a call such as | |
1263 | ||
1264 | myfunc: prologue 12 | |
1265 | ||
1266 | which would expand to the three lines of code | |
1267 | ||
1268 | myfunc: push ebp | |
1269 | mov ebp,esp | |
1270 | sub esp,12 | |
1271 | ||
1272 | The number `1' after the macro name in the `%macro' line defines the | |
1273 | number of parameters the macro `prologue' expects to receive. The | |
1274 | use of `%1' inside the macro definition refers to the first | |
1275 | parameter to the macro call. With a macro taking more than one | |
1276 | parameter, subsequent parameters would be referred to as `%2', `%3' | |
1277 | and so on. | |
1278 | ||
1279 | Multi-line macros, like single-line macros, are case-sensitive, | |
1280 | unless you define them using the alternative directive `%imacro'. | |
1281 | ||
1282 | If you need to pass a comma as _part_ of a parameter to a multi-line | |
1283 | macro, you can do that by enclosing the entire parameter in braces. | |
1284 | So you could code things like | |
1285 | ||
1286 | %macro silly 2 | |
1287 | %2: db %1 | |
1288 | %endmacro | |
1289 | silly 'a', letter_a ; letter_a: db 'a' | |
1290 | silly 'ab', string_ab ; string_ab: db 'ab' | |
1291 | silly {13,10}, crlf ; crlf: db 13,10 | |
1292 | ||
1293 | 4.2.1 Overloading Multi-Line Macros | |
1294 | ||
1295 | As with single-line macros, multi-line macros can be overloaded by | |
1296 | defining the same macro name several times with different numbers of | |
1297 | parameters. This time, no exception is made for macros with no | |
1298 | parameters at all. So you could define | |
1299 | ||
1300 | %macro prologue 0 | |
1301 | push ebp | |
1302 | mov ebp,esp | |
1303 | %endmacro | |
1304 | ||
1305 | to define an alternative form of the function prologue which | |
1306 | allocates no local stack space. | |
1307 | ||
1308 | Sometimes, however, you might want to `overload' a machine | |
1309 | instruction; for example, you might want to define | |
1310 | ||
1311 | %macro push 2 | |
1312 | push %1 | |
1313 | push %2 | |
1314 | %endmacro | |
1315 | ||
1316 | so that you could code | |
1317 | ||
1318 | push ebx ; this line is not a macro call | |
1319 | push eax,ecx ; but this one is | |
1320 | ||
1321 | Ordinarily, NASM will give a warning for the first of the above two | |
1322 | lines, since `push' is now defined to be a macro, and is being | |
1323 | invoked with a number of parameters for which no definition has been | |
1324 | given. The correct code will still be generated. You can disable | |
1325 | this warning with the `-w-macro-params' command-line option (see | |
1326 | section 2.1.10). | |
1327 | ||
1328 | 4.2.2 Macro-Local Labels | |
1329 | ||
1330 | NASM allows you to define labels within a multi-line macro | |
1331 | definition in such a way as to make them local to the macro call: so | |
1332 | calling the same macro multiple times will use a different label | |
1333 | each time. You do this by prefixing `%%' to the label name. So you | |
1334 | can invent an instruction which executes a `RET' if the `Z' flag is | |
1335 | set by doing this: | |
1336 | ||
1337 | %macro retz 0 | |
1338 | jnz %%skip | |
1339 | ret | |
1340 | %%skip: | |
1341 | %endmacro | |
1342 | ||
1343 | You can call this macro as many times as you want, and every time | |
1344 | you call it NASM will make up a different `real' name to substitute | |
1345 | for the label `%%skip'. The names NASM invents are of the form | |
1346 | `..@2345.skip', where the number 2345 changes with every macro call. | |
1347 | The `..@' prefix prevents macro-local labels from interfering with | |
1348 | the local label mechanism, as described in section 3.8. You should | |
1349 | avoid defining your own labels in this form (the `..@' prefix, then | |
1350 | a number, then another period) in case they interfere with macro- | |
1351 | local labels. | |
1352 | ||
1353 | 4.2.3 Greedy Macro Parameters | |
1354 | ||
1355 | Occasionally it is useful to define a macro which lumps its entire | |
1356 | command line into one parameter definition, possibly after | |
1357 | extracting one or two smaller parameters from the front. An example | |
1358 | might be a macro to write a text string to a file in MS-DOS, where | |
1359 | you might want to be able to write | |
1360 | ||
1361 | writefile [filehandle],"hello, world",13,10 | |
1362 | ||
1363 | NASM allows you to define the last parameter of a macro to be | |
1364 | _greedy_, meaning that if you invoke the macro with more parameters | |
1365 | than it expects, all the spare parameters get lumped into the last | |
1366 | defined one along with the separating commas. So if you code: | |
1367 | ||
1368 | %macro writefile 2+ | |
1369 | jmp %%endstr | |
1370 | %%str: db %2 | |
1371 | %%endstr: mov dx,%%str | |
1372 | mov cx,%%endstr-%%str | |
1373 | mov bx,%1 | |
1374 | mov ah,0x40 | |
1375 | int 0x21 | |
1376 | %endmacro | |
1377 | ||
1378 | then the example call to `writefile' above will work as expected: | |
1379 | the text before the first comma, `[filehandle]', is used as the | |
1380 | first macro parameter and expanded when `%1' is referred to, and all | |
1381 | the subsequent text is lumped into `%2' and placed after the `db'. | |
1382 | ||
1383 | The greedy nature of the macro is indicated to NASM by the use of | |
1384 | the `+' sign after the parameter count on the `%macro' line. | |
1385 | ||
1386 | If you define a greedy macro, you are effectively telling NASM how | |
1387 | it should expand the macro given _any_ number of parameters from the | |
1388 | actual number specified up to infinity; in this case, for example, | |
1389 | NASM now knows what to do when it sees a call to `writefile' with 2, | |
1390 | 3, 4 or more parameters. NASM will take this into account when | |
1391 | overloading macros, and will not allow you to define another form of | |
1392 | `writefile' taking 4 parameters (for example). | |
1393 | ||
1394 | Of course, the above macro could have been implemented as a non- | |
1395 | greedy macro, in which case the call to it would have had to look | |
1396 | like | |
1397 | ||
1398 | writefile [filehandle], {"hello, world",13,10} | |
1399 | ||
1400 | NASM provides both mechanisms for putting commas in macro | |
1401 | parameters, and you choose which one you prefer for each macro | |
1402 | definition. | |
1403 | ||
1404 | See section 5.2.1 for a better way to write the above macro. | |
1405 | ||
1406 | 4.2.4 Default Macro Parameters | |
1407 | ||
1408 | NASM also allows you to define a multi-line macro with a _range_ of | |
1409 | allowable parameter counts. If you do this, you can specify defaults | |
1410 | for omitted parameters. So, for example: | |
1411 | ||
1412 | %macro die 0-1 "Painful program death has occurred." | |
1413 | writefile 2,%1 | |
1414 | mov ax,0x4c01 | |
1415 | int 0x21 | |
1416 | %endmacro | |
1417 | ||
1418 | This macro (which makes use of the `writefile' macro defined in | |
1419 | section 4.2.3) can be called with an explicit error message, which | |
1420 | it will display on the error output stream before exiting, or it can | |
1421 | be called with no parameters, in which case it will use the default | |
1422 | error message supplied in the macro definition. | |
1423 | ||
1424 | In general, you supply a minimum and maximum number of parameters | |
1425 | for a macro of this type; the minimum number of parameters is then | |
1426 | required in the macro call, and you provide defaults for the | |
1427 | optional ones. So if a macro definition began with the line | |
1428 | ||
1429 | %macro foobar 1-3 eax,[ebx+2] | |
1430 | ||
1431 | then it could be called with between one and three parameters, and | |
1432 | `%1' would always be taken from the macro call. `%2', if not | |
1433 | specified by the macro call, would default to `eax', and `%3' if not | |
1434 | specified would default to `[ebx+2]'. | |
1435 | ||
1436 | You may omit parameter defaults from the macro definition, in which | |
1437 | case the parameter default is taken to be blank. This can be useful | |
1438 | for macros which can take a variable number of parameters, since the | |
1439 | `%0' token (see section 4.2.5) allows you to determine how many | |
1440 | parameters were really passed to the macro call. | |
1441 | ||
1442 | This defaulting mechanism can be combined with the greedy-parameter | |
1443 | mechanism; so the `die' macro above could be made more powerful, and | |
1444 | more useful, by changing the first line of the definition to | |
1445 | ||
1446 | %macro die 0-1+ "Painful program death has occurred.",13,10 | |
1447 | ||
1448 | The maximum parameter count can be infinite, denoted by `*'. In this | |
1449 | case, of course, it is impossible to provide a _full_ set of default | |
1450 | parameters. Examples of this usage are shown in section 4.2.6. | |
1451 | ||
1452 | 4.2.5 `%0': Macro Parameter Counter | |
1453 | ||
1454 | For a macro which can take a variable number of parameters, the | |
1455 | parameter reference `%0' will return a numeric constant giving the | |
1456 | number of parameters passed to the macro. This can be used as an | |
1457 | argument to `%rep' (see section 4.4) in order to iterate through all | |
1458 | the parameters of a macro. Examples are given in section 4.2.6. | |
1459 | ||
1460 | 4.2.6 `%rotate': Rotating Macro Parameters | |
1461 | ||
1462 | Unix shell programmers will be familiar with the `shift' shell | |
1463 | command, which allows the arguments passed to a shell script | |
1464 | (referenced as `$1', `$2' and so on) to be moved left by one place, | |
1465 | so that the argument previously referenced as `$2' becomes available | |
1466 | as `$1', and the argument previously referenced as `$1' is no longer | |
1467 | available at all. | |
1468 | ||
1469 | NASM provides a similar mechanism, in the form of `%rotate'. As its | |
1470 | name suggests, it differs from the Unix `shift' in that no | |
1471 | parameters are lost: parameters rotated off the left end of the | |
1472 | argument list reappear on the right, and vice versa. | |
1473 | ||
1474 | `%rotate' is invoked with a single numeric argument (which may be an | |
1475 | expression). The macro parameters are rotated to the left by that | |
1476 | many places. If the argument to `%rotate' is negative, the macro | |
1477 | parameters are rotated to the right. | |
1478 | ||
1479 | So a pair of macros to save and restore a set of registers might | |
1480 | work as follows: | |
1481 | ||
1482 | %macro multipush 1-* | |
1483 | %rep %0 | |
1484 | push %1 | |
1485 | %rotate 1 | |
1486 | %endrep | |
1487 | %endmacro | |
1488 | ||
1489 | This macro invokes the `PUSH' instruction on each of its arguments | |
1490 | in turn, from left to right. It begins by pushing its first | |
1491 | argument, `%1', then invokes `%rotate' to move all the arguments one | |
1492 | place to the left, so that the original second argument is now | |
1493 | available as `%1'. Repeating this procedure as many times as there | |
1494 | were arguments (achieved by supplying `%0' as the argument to | |
1495 | `%rep') causes each argument in turn to be pushed. | |
1496 | ||
1497 | Note also the use of `*' as the maximum parameter count, indicating | |
1498 | that there is no upper limit on the number of parameters you may | |
1499 | supply to the `multipush' macro. | |
1500 | ||
1501 | It would be convenient, when using this macro, to have a `POP' | |
1502 | equivalent, which _didn't_ require the arguments to be given in | |
1503 | reverse order. Ideally, you would write the `multipush' macro call, | |
1504 | then cut-and-paste the line to where the pop needed to be done, and | |
1505 | change the name of the called macro to `multipop', and the macro | |
1506 | would take care of popping the registers in the opposite order from | |
1507 | the one in which they were pushed. | |
1508 | ||
1509 | This can be done by the following definition: | |
1510 | ||
1511 | %macro multipop 1-* | |
1512 | %rep %0 | |
1513 | %rotate -1 | |
1514 | pop %1 | |
1515 | %endrep | |
1516 | %endmacro | |
1517 | ||
1518 | This macro begins by rotating its arguments one place to the | |
1519 | _right_, so that the original _last_ argument appears as `%1'. This | |
1520 | is then popped, and the arguments are rotated right again, so the | |
1521 | second-to-last argument becomes `%1'. Thus the arguments are | |
1522 | iterated through in reverse order. | |
1523 | ||
1524 | 4.2.7 Concatenating Macro Parameters | |
1525 | ||
1526 | NASM can concatenate macro parameters on to other text surrounding | |
1527 | them. This allows you to declare a family of symbols, for example, | |
1528 | in a macro definition. If, for example, you wanted to generate a | |
1529 | table of key codes along with offsets into the table, you could code | |
1530 | something like | |
1531 | ||
1532 | %macro keytab_entry 2 | |
1533 | keypos%1 equ $-keytab | |
1534 | db %2 | |
1535 | %endmacro | |
1536 | keytab: | |
1537 | keytab_entry F1,128+1 | |
1538 | keytab_entry F2,128+2 | |
1539 | keytab_entry Return,13 | |
1540 | ||
1541 | which would expand to | |
1542 | ||
1543 | keytab: | |
1544 | keyposF1 equ $-keytab | |
1545 | db 128+1 | |
1546 | keyposF2 equ $-keytab | |
1547 | db 128+2 | |
1548 | keyposReturn equ $-keytab | |
1549 | db 13 | |
1550 | ||
1551 | You can just as easily concatenate text on to the other end of a | |
1552 | macro parameter, by writing `%1foo'. | |
1553 | ||
1554 | If you need to append a _digit_ to a macro parameter, for example | |
1555 | defining labels `foo1' and `foo2' when passed the parameter `foo', | |
1556 | you can't code `%11' because that would be taken as the eleventh | |
1557 | macro parameter. Instead, you must code `%{1}1', which will separate | |
1558 | the first `1' (giving the number of the macro parameter) from the | |
1559 | second (literal text to be concatenated to the parameter). | |
1560 | ||
1561 | This concatenation can also be applied to other preprocessor in-line | |
1562 | objects, such as macro-local labels (section 4.2.2) and context- | |
1563 | local labels (section 4.6.2). In all cases, ambiguities in syntax | |
1564 | can be resolved by enclosing everything after the `%' sign and | |
1565 | before the literal text in braces: so `%{%foo}bar' concatenates the | |
1566 | text `bar' to the end of the real name of the macro-local label | |
1567 | `%%foo'. (This is unnecessary, since the form NASM uses for the real | |
1568 | names of macro-local labels means that the two usages `%{%foo}bar' | |
1569 | and `%%foobar' would both expand to the same thing anyway; | |
1570 | nevertheless, the capability is there.) | |
1571 | ||
1572 | 4.2.8 Condition Codes as Macro Parameters | |
1573 | ||
1574 | NASM can give special treatment to a macro parameter which contains | |
1575 | a condition code. For a start, you can refer to the macro parameter | |
1576 | `%1' by means of the alternative syntax `%+1', which informs NASM | |
1577 | that this macro parameter is supposed to contain a condition code, | |
1578 | and will cause the preprocessor to report an error message if the | |
1579 | macro is called with a parameter which is _not_ a valid condition | |
1580 | code. | |
1581 | ||
1582 | Far more usefully, though, you can refer to the macro parameter by | |
1583 | means of `%-1', which NASM will expand as the _inverse_ condition | |
1584 | code. So the `retz' macro defined in section 4.2.2 can be replaced | |
1585 | by a general conditional-return macro like this: | |
1586 | ||
1587 | %macro retc 1 | |
1588 | j%-1 %%skip | |
1589 | ret | |
1590 | %%skip: | |
1591 | %endmacro | |
1592 | ||
1593 | This macro can now be invoked using calls like `retc ne', which will | |
1594 | cause the conditional-jump instruction in the macro expansion to | |
1595 | come out as `JE', or `retc po' which will make the jump a `JPE'. | |
1596 | ||
1597 | The `%+1' macro-parameter reference is quite happy to interpret the | |
1598 | arguments `CXZ' and `ECXZ' as valid condition codes; however, `%-1' | |
1599 | will report an error if passed either of these, because no inverse | |
1600 | condition code exists. | |
1601 | ||
1602 | 4.2.9 Disabling Listing Expansion | |
1603 | ||
1604 | When NASM is generating a listing file from your program, it will | |
1605 | generally expand multi-line macros by means of writing the macro | |
1606 | call and then listing each line of the expansion. This allows you to | |
1607 | see which instructions in the macro expansion are generating what | |
1608 | code; however, for some macros this clutters the listing up | |
1609 | unnecessarily. | |
1610 | ||
1611 | NASM therefore provides the `.nolist' qualifier, which you can | |
1612 | include in a macro definition to inhibit the expansion of the macro | |
1613 | in the listing file. The `.nolist' qualifier comes directly after | |
1614 | the number of parameters, like this: | |
1615 | ||
1616 | %macro foo 1.nolist | |
1617 | ||
1618 | Or like this: | |
1619 | ||
1620 | %macro bar 1-5+.nolist a,b,c,d,e,f,g,h | |
1621 | ||
1622 | 4.3 Conditional Assembly | |
1623 | ||
1624 | Similarly to the C preprocessor, NASM allows sections of a source | |
1625 | file to be assembled only if certain conditions are met. The general | |
1626 | syntax of this feature looks like this: | |
1627 | ||
1628 | %if<condition> | |
1629 | ; some code which only appears if <condition> is met | |
1630 | %elif<condition2> | |
1631 | ; only appears if <condition> is not met but <condition2> is | |
1632 | %else | |
1633 | ; this appears if neither <condition> nor <condition2> was met | |
1634 | %endif | |
1635 | ||
1636 | The `%else' clause is optional, as is the `%elif' clause. You can | |
1637 | have more than one `%elif' clause as well. | |
1638 | ||
1639 | 4.3.1 `%ifdef': Testing Single-Line Macro Existence | |
1640 | ||
1641 | Beginning a conditional-assembly block with the line `%ifdef MACRO' | |
1642 | will assemble the subsequent code if, and only if, a single-line | |
1643 | macro called `MACRO' is defined. If not, then the `%elif' and | |
1644 | `%else' blocks (if any) will be processed instead. | |
1645 | ||
1646 | For example, when debugging a program, you might want to write code | |
1647 | such as | |
1648 | ||
1649 | ; perform some function | |
1650 | %ifdef DEBUG | |
1651 | writefile 2,"Function performed successfully",13,10 | |
1652 | %endif | |
1653 | ; go and do something else | |
1654 | ||
1655 | Then you could use the command-line option `-dDEBUG' to create a | |
1656 | version of the program which produced debugging messages, and remove | |
1657 | the option to generate the final release version of the program. | |
1658 | ||
1659 | You can test for a macro _not_ being defined by using `%ifndef' | |
1660 | instead of `%ifdef'. You can also test for macro definitions in | |
1661 | `%elif' blocks by using `%elifdef' and `%elifndef'. | |
1662 | ||
1663 | 4.3.2 `%ifctx': Testing the Context Stack | |
1664 | ||
1665 | The conditional-assembly construct `%ifctx ctxname' will cause the | |
1666 | subsequent code to be assembled if and only if the top context on | |
1667 | the preprocessor's context stack has the name `ctxname'. As with | |
1668 | `%ifdef', the inverse and `%elif' forms `%ifnctx', `%elifctx' and | |
1669 | `%elifnctx' are also supported. | |
1670 | ||
1671 | For more details of the context stack, see section 4.6. For a sample | |
1672 | use of `%ifctx', see section 4.6.5. | |
1673 | ||
1674 | 4.3.3 `%if': Testing Arbitrary Numeric Expressions | |
1675 | ||
1676 | The conditional-assembly construct `%if expr' will cause the | |
1677 | subsequent code to be assembled if and only if the value of the | |
1678 | numeric expression `expr' is non-zero. An example of the use of this | |
1679 | feature is in deciding when to break out of a `%rep' preprocessor | |
1680 | loop: see section 4.4 for a detailed example. | |
1681 | ||
1682 | The expression given to `%if', and its counterpart `%elif', is a | |
1683 | critical expression (see section 3.7). | |
1684 | ||
1685 | `%if' extends the normal NASM expression syntax, by providing a set | |
1686 | of relational operators which are not normally available in | |
1687 | expressions. The operators `=', `<', `>', `<=', `>=' and `<>' test | |
1688 | equality, less-than, greater-than, less-or-equal, greater-or-equal | |
1689 | and not-equal respectively. The C-like forms `==' and `!=' are | |
1690 | supported as alternative forms of `=' and `<>'. In addition, low- | |
1691 | priority logical operators `&&', `^^' and `||' are provided, | |
1692 | supplying logical AND, logical XOR and logical OR. These work like | |
1693 | the C logical operators (although C has no logical XOR), in that | |
1694 | they always return either 0 or 1, and treat any non-zero input as 1 | |
1695 | (so that `^^', for example, returns 1 if exactly one of its inputs | |
1696 | is zero, and 0 otherwise). The relational operators also return 1 | |
1697 | for true and 0 for false. | |
1698 | ||
1699 | 4.3.4 `%ifidn' and `%ifidni': Testing Exact Text Identity | |
1700 | ||
1701 | The construct `%ifidn text1,text2' will cause the subsequent code to | |
1702 | be assembled if and only if `text1' and `text2', after expanding | |
1703 | single-line macros, are identical pieces of text. Differences in | |
1704 | white space are not counted. | |
1705 | ||
1706 | `%ifidni' is similar to `%ifidn', but is case-insensitive. | |
1707 | ||
1708 | For example, the following macro pushes a register or number on the | |
1709 | stack, and allows you to treat `IP' as a real register: | |
1710 | ||
1711 | %macro pushparam 1 | |
1712 | %ifidni %1,ip | |
1713 | call %%label | |
1714 | %%label: | |
1715 | %else | |
1716 | push %1 | |
1717 | %endif | |
1718 | %endmacro | |
1719 | ||
1720 | Like most other `%if' constructs, `%ifidn' has a counterpart | |
1721 | `%elifidn', and negative forms `%ifnidn' and `%elifnidn'. Similarly, | |
1722 | `%ifidni' has counterparts `%elifidni', `%ifnidni' and `%elifnidni'. | |
1723 | ||
1724 | 4.3.5 `%ifid', `%ifnum', `%ifstr': Testing Token Types | |
1725 | ||
1726 | Some macros will want to perform different tasks depending on | |
1727 | whether they are passed a number, a string, or an identifier. For | |
1728 | example, a string output macro might want to be able to cope with | |
1729 | being passed either a string constant or a pointer to an existing | |
1730 | string. | |
1731 | ||
1732 | The conditional assembly construct `%ifid', taking one parameter | |
1733 | (which may be blank), assembles the subsequent code if and only if | |
1734 | the first token in the parameter exists and is an identifier. | |
1735 | `%ifnum' works similarly, but tests for the token being a numeric | |
1736 | constant; `%ifstr' tests for it being a string. | |
1737 | ||
1738 | For example, the `writefile' macro defined in section 4.2.3 can be | |
1739 | extended to take advantage of `%ifstr' in the following fashion: | |
1740 | ||
1741 | %macro writefile 2-3+ | |
1742 | %ifstr %2 | |
1743 | jmp %%endstr | |
1744 | %if %0 = 3 | |
1745 | %%str: db %2,%3 | |
1746 | %else | |
1747 | %%str: db %2 | |
1748 | %endif | |
1749 | %%endstr: mov dx,%%str | |
1750 | mov cx,%%endstr-%%str | |
1751 | %else | |
1752 | mov dx,%2 | |
1753 | mov cx,%3 | |
1754 | %endif | |
1755 | mov bx,%1 | |
1756 | mov ah,0x40 | |
1757 | int 0x21 | |
1758 | %endmacro | |
1759 | ||
1760 | Then the `writefile' macro can cope with being called in either of | |
1761 | the following two ways: | |
1762 | ||
1763 | writefile [file], strpointer, length | |
1764 | writefile [file], "hello", 13, 10 | |
1765 | ||
1766 | In the first, `strpointer' is used as the address of an already- | |
1767 | declared string, and `length' is used as its length; in the second, | |
1768 | a string is given to the macro, which therefore declares it itself | |
1769 | and works out the address and length for itself. | |
1770 | ||
1771 | Note the use of `%if' inside the `%ifstr': this is to detect whether | |
1772 | the macro was passed two arguments (so the string would be a single | |
1773 | string constant, and `db %2' would be adequate) or more (in which | |
1774 | case, all but the first two would be lumped together into `%3', and | |
1775 | `db %2,%3' would be required). | |
1776 | ||
1777 | The usual `%elifXXX', `%ifnXXX' and `%elifnXXX' versions exist for | |
1778 | each of `%ifid', `%ifnum' and `%ifstr'. | |
1779 | ||
1780 | 4.3.6 `%error': Reporting User-Defined Errors | |
1781 | ||
1782 | The preprocessor directive `%error' will cause NASM to report an | |
1783 | error if it occurs in assembled code. So if other users are going to | |
1784 | try to assemble your source files, you can ensure that they define | |
1785 | the right macros by means of code like this: | |
1786 | ||
1787 | %ifdef SOME_MACRO | |
1788 | ; do some setup | |
1789 | %elifdef SOME_OTHER_MACRO | |
1790 | ; do some different setup | |
1791 | %else | |
1792 | %error Neither SOME_MACRO nor SOME_OTHER_MACRO was defined. | |
1793 | %endif | |
1794 | ||
1795 | Then any user who fails to understand the way your code is supposed | |
1796 | to be assembled will be quickly warned of their mistake, rather than | |
1797 | having to wait until the program crashes on being run and then not | |
1798 | knowing what went wrong. | |
1799 | ||
1800 | 4.4 Preprocessor Loops: `%rep' | |
1801 | ||
1802 | NASM's `TIMES' prefix, though useful, cannot be used to invoke a | |
1803 | multi-line macro multiple times, because it is processed by NASM | |
1804 | after macros have already been expanded. Therefore NASM provides | |
1805 | another form of loop, this time at the preprocessor level: `%rep'. | |
1806 | ||
1807 | The directives `%rep' and `%endrep' (`%rep' takes a numeric | |
1808 | argument, which can be an expression; `%endrep' takes no arguments) | |
1809 | can be used to enclose a chunk of code, which is then replicated as | |
1810 | many times as specified by the preprocessor: | |
1811 | ||
1812 | %assign i 0 | |
1813 | %rep 64 | |
1814 | inc word [table+2*i] | |
1815 | %assign i i+1 | |
1816 | %endrep | |
1817 | ||
1818 | This will generate a sequence of 64 `INC' instructions, incrementing | |
1819 | every word of memory from `[table]' to `[table+126]'. | |
1820 | ||
1821 | For more complex termination conditions, or to break out of a repeat | |
1822 | loop part way along, you can use the `%exitrep' directive to | |
1823 | terminate the loop, like this: | |
1824 | ||
1825 | fibonacci: | |
1826 | %assign i 0 | |
1827 | %assign j 1 | |
1828 | %rep 100 | |
1829 | %if j > 65535 | |
1830 | %exitrep | |
1831 | %endif | |
1832 | dw j | |
1833 | %assign k j+i | |
1834 | %assign i j | |
1835 | %assign j k | |
1836 | %endrep | |
1837 | fib_number equ ($-fibonacci)/2 | |
1838 | ||
1839 | This produces a list of all the Fibonacci numbers that will fit in | |
1840 | 16 bits. Note that a maximum repeat count must still be given to | |
1841 | `%rep'. This is to prevent the possibility of NASM getting into an | |
1842 | infinite loop in the preprocessor, which (on multitasking or multi- | |
1843 | user systems) would typically cause all the system memory to be | |
1844 | gradually used up and other applications to start crashing. | |
1845 | ||
1846 | 4.5 Including Other Files | |
1847 | ||
1848 | Using, once again, a very similar syntax to the C preprocessor, | |
1849 | NASM's preprocessor lets you include other source files into your | |
1850 | code. This is done by the use of the `%include' directive: | |
1851 | ||
1852 | %include "macros.mac" | |
1853 | ||
1854 | will include the contents of the file `macros.mac' into the source | |
1855 | file containing the `%include' directive. | |
1856 | ||
1857 | Include files are searched for in the current directory (the | |
1858 | directory you're in when you run NASM, as opposed to the location of | |
1859 | the NASM executable or the location of the source file), plus any | |
1860 | directories specified on the NASM command line using the `-i' | |
1861 | option. | |
1862 | ||
1863 | The standard C idiom for preventing a file being included more than | |
1864 | once is just as applicable in NASM: if the file `macros.mac' has the | |
1865 | form | |
1866 | ||
1867 | %ifndef MACROS_MAC | |
1868 | %define MACROS_MAC | |
1869 | ; now define some macros | |
1870 | %endif | |
1871 | ||
1872 | then including the file more than once will not cause errors: the | |
1873 | second time the file is included, nothing will happen, because the | |
1874 | macro `MACROS_MAC' will already be defined. | |
1875 | ||
1876 | You can force a file to be included even if there is no `%include' | |
1877 | directive that explicitly includes it, by using the `-p' option on | |
1878 | the NASM command line (see section 2.1.6). | |
1879 | ||
1880 | 4.6 The Context Stack | |
1881 | ||
1882 | Having labels that are local to a macro definition is sometimes not | |
1883 | quite powerful enough: sometimes you want to be able to share labels | |
1884 | between several macro calls. An example might be a `REPEAT' ... | |
1885 | `UNTIL' loop, in which the expansion of the `REPEAT' macro would | |
1886 | need to be able to refer to a label which the `UNTIL' macro had | |
1887 | defined. However, for such a macro you would also want to be able to | |
1888 | nest these loops. | |
1889 | ||
1890 | NASM provides this level of power by means of a _context stack_. The | |
1891 | preprocessor maintains a stack of _contexts_, each of which is | |
1892 | characterised by a name. You add a new context to the stack using | |
1893 | the `%push' directive, and remove one using `%pop'. You can define | |
1894 | labels that are local to a particular context on the stack. | |
1895 | ||
1896 | 4.6.1 `%push' and `%pop': Creating and Removing Contexts | |
1897 | ||
1898 | The `%push' directive is used to create a new context and place it | |
1899 | on the top of the context stack. `%push' requires one argument, | |
1900 | which is the name of the context. For example: | |
1901 | ||
1902 | %push foobar | |
1903 | ||
1904 | This pushes a new context called `foobar' on the stack. You can have | |
1905 | several contexts on the stack with the same name: they can still be | |
1906 | distinguished. | |
1907 | ||
1908 | The directive `%pop', requiring no arguments, removes the top | |
1909 | context from the context stack and destroys it, along with any | |
1910 | labels associated with it. | |
1911 | ||
1912 | 4.6.2 Context-Local Labels | |
1913 | ||
1914 | Just as the usage `%%foo' defines a label which is local to the | |
1915 | particular macro call in which it is used, the usage `%$foo' is used | |
1916 | to define a label which is local to the context on the top of the | |
1917 | context stack. So the `REPEAT' and `UNTIL' example given above could | |
1918 | be implemented by means of: | |
1919 | ||
1920 | %macro repeat 0 | |
1921 | %push repeat | |
1922 | %$begin: | |
1923 | %endmacro | |
1924 | ||
1925 | %macro until 1 | |
1926 | j%-1 %$begin | |
1927 | %pop | |
1928 | %endmacro | |
1929 | ||
1930 | and invoked by means of, for example, | |
1931 | ||
1932 | mov di,string | |
1933 | repeat | |
1934 | add di,3 | |
1935 | scasb | |
1936 | until e | |
1937 | ||
1938 | which would scan every fourth byte of a string in search of the byte | |
1939 | in `AL'. | |
1940 | ||
1941 | If you need to define, or access, labels local to the context | |
1942 | _below_ the top one on the stack, you can use `%$$foo', or `%$$$foo' | |
1943 | for the context below that, and so on. | |
1944 | ||
1945 | 4.6.3 Context-Local Single-Line Macros | |
1946 | ||
1947 | NASM also allows you to define single-line macros which are local to | |
1948 | a particular context, in just the same way: | |
1949 | ||
1950 | %define %$localmac 3 | |
1951 | ||
1952 | will define the single-line macro `%$localmac' to be local to the | |
1953 | top context on the stack. Of course, after a subsequent `%push', it | |
1954 | can then still be accessed by the name `%$$localmac'. | |
1955 | ||
1956 | 4.6.4 `%repl': Renaming a Context | |
1957 | ||
1958 | If you need to change the name of the top context on the stack (in | |
1959 | order, for example, to have it respond differently to `%ifctx'), you | |
1960 | can execute a `%pop' followed by a `%push'; but this will have the | |
1961 | side effect of destroying all context-local labels and macros | |
1962 | associated with the context that was just popped. | |
1963 | ||
1964 | NASM provides the directive `%repl', which _replaces_ a context with | |
1965 | a different name, without touching the associated macros and labels. | |
1966 | So you could replace the destructive code | |
1967 | ||
1968 | %pop | |
1969 | %push newname | |
1970 | ||
1971 | with the non-destructive version `%repl newname'. | |
1972 | ||
1973 | 4.6.5 Example Use of the Context Stack: Block IFs | |
1974 | ||
1975 | This example makes use of almost all the context-stack features, | |
1976 | including the conditional-assembly construct `%ifctx', to implement | |
1977 | a block IF statement as a set of macros. | |
1978 | ||
1979 | %macro if 1 | |
1980 | %push if | |
1981 | j%-1 %$ifnot | |
1982 | %endmacro | |
1983 | ||
1984 | %macro else 0 | |
1985 | %ifctx if | |
1986 | %repl else | |
1987 | jmp %$ifend | |
1988 | %$ifnot: | |
1989 | %else | |
1990 | %error "expected `if' before `else'" | |
1991 | %endif | |
1992 | %endmacro | |
1993 | ||
1994 | %macro endif 0 | |
1995 | %ifctx if | |
1996 | %$ifnot: | |
1997 | %pop | |
1998 | %elifctx else | |
1999 | %$ifend: | |
2000 | %pop | |
2001 | %else | |
2002 | %error "expected `if' or `else' before `endif'" | |
2003 | %endif | |
2004 | %endmacro | |
2005 | ||
2006 | This code is more robust than the `REPEAT' and `UNTIL' macros given | |
2007 | in section 4.6.2, because it uses conditional assembly to check that | |
2008 | the macros are issued in the right order (for example, not calling | |
2009 | `endif' before `if') and issues a `%error' if they're not. | |
2010 | ||
2011 | In addition, the `endif' macro has to be able to cope with the two | |
2012 | distinct cases of either directly following an `if', or following an | |
2013 | `else'. It achieves this, again, by using conditional assembly to do | |
2014 | different things depending on whether the context on top of the | |
2015 | stack is `if' or `else'. | |
2016 | ||
2017 | The `else' macro has to preserve the context on the stack, so that | |
2018 | the `%$ifnot' it defines is the one referred to by the `if' macro and | |
2019 | the `%$ifend' it refers to is the one defined by `endif', but has to | |
2020 | change the context's name so that `endif' will know there was an | |
2021 | intervening `else'. It does this by the use of `%repl'. | |
2022 | ||
2023 | A sample usage of these macros might look like: | |
2024 | ||
2025 | cmp ax,bx | |
2026 | if ae | |
2027 | cmp bx,cx | |
2028 | if ae | |
2029 | mov ax,cx | |
2030 | else | |
2031 | mov ax,bx | |
2032 | endif | |
2033 | else | |
2034 | cmp ax,cx | |
2035 | if ae | |
2036 | mov ax,cx | |
2037 | endif | |
2038 | endif | |
2039 | ||
2040 | The block-`IF' macros handle nesting quite happily, by means of | |
2041 | pushing another context, describing the inner `if', on top of the | |
2042 | one describing the outer `if'; thus `else' and `endif' always refer | |
2043 | to the last unmatched `if' or `else'. | |
2044 | ||
2045 | 4.7 Standard Macros | |
2046 | ||
2047 | NASM defines a set of standard macros, which are already defined | |
2048 | when it starts to process any source file. If you really need a | |
2049 | program to be assembled with no pre-defined macros, you can use the | |
2050 | `%clear' directive to empty the preprocessor of everything. | |
2051 | ||
2052 | Most user-level assembler directives (see chapter 5) are implemented | |
2053 | as macros which invoke primitive directives; these are described in | |
2054 | chapter 5. The rest of the standard macro set is described here. | |
2055 | ||
2056 | 4.7.1 `__NASM_MAJOR__' and `__NASM_MINOR__': NASM Version | |
2057 | ||
2058 | The single-line macros `__NASM_MAJOR__' and `__NASM_MINOR__' expand | |
2059 | to the major and minor parts of the version number of NASM being | |
2060 | used. So, under NASM 0.96 for example, `__NASM_MAJOR__' would be | |
2061 | defined to be 0 and `__NASM_MINOR__' would be defined as 96. | |
2062 | ||
2063 | 4.7.2 `__FILE__' and `__LINE__': File Name and Line Number | |
2064 | ||
2065 | Like the C preprocessor, NASM allows the user to find out the file | |
2066 | name and line number containing the current instruction. The macro | |
2067 | `__FILE__' expands to a string constant giving the name of the | |
2068 | current input file (which may change through the course of assembly | |
2069 | if `%include' directives are used), and `__LINE__' expands to a | |
2070 | numeric constant giving the current line number in the input file. | |
2071 | ||
2072 | These macros could be used, for example, to communicate debugging | |
2073 | information to a macro, since invoking `__LINE__' inside a macro | |
2074 | definition (either single-line or multi-line) will return the line | |
2075 | number of the macro _call_, rather than _definition_. So to | |
2076 | determine where in a piece of code a crash is occurring, for | |
2077 | example, one could write a routine `stillhere', which is passed a | |
2078 | line number in `EAX' and outputs something like `line 155: still | |
2079 | here'. You could then write a macro | |
2080 | ||
2081 | %macro notdeadyet 0 | |
2082 | push eax | |
2083 | mov eax,__LINE__ | |
2084 | call stillhere | |
2085 | pop eax | |
2086 | %endmacro | |
2087 | ||
2088 | and then pepper your code with calls to `notdeadyet' until you find | |
2089 | the crash point. | |
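     | ||
     | Since `__FILE__' expands to a string constant, it can also be | |
     | placed directly in a data declaration. A minimal sketch (the label | |
     | names here are hypothetical): | |
     | ||
     |         srcname: db __FILE__,0   ; source file name, zero-terminated | |
     |         srcline: dw __LINE__     ; number of this source line | |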
2090 | ||
2091 | 4.7.3 `STRUC' and `ENDSTRUC': Declaring Structure Data Types | |
2092 | ||
2093 | The core of NASM contains no intrinsic means of defining data | |
2094 | structures; instead, the preprocessor is sufficiently powerful that | |
2095 | data structures can be implemented as a set of macros. The macros | |
2096 | `STRUC' and `ENDSTRUC' are used to define a structure data type. | |
2097 | ||
2098 | `STRUC' takes one parameter, which is the name of the data type. | |
2099 | This name is defined as a symbol with the value zero, and also has | |
2100 | the suffix `_size' appended to it and is then defined as an `EQU' | |
2101 | giving the size of the structure. Once `STRUC' has been issued, you | |
2102 | are defining the structure, and should define fields using the | |
2103 | `RESB' family of pseudo-instructions, and then invoke `ENDSTRUC' to | |
2104 | finish the definition. | |
2105 | ||
2106 | For example, to define a structure called `mytype' containing a | |
2107 | longword, a word, a byte and a string of bytes, you might code | |
2108 | ||
2109 | struc mytype | |
2110 | mt_long: resd 1 | |
2111 | mt_word: resw 1 | |
2112 | mt_byte: resb 1 | |
2113 | mt_str: resb 32 | |
2114 | endstruc | |
2115 | ||
2116 | The above code defines six symbols: `mt_long' as 0 (the offset from | |
2117 | the beginning of a `mytype' structure to the longword field), | |
2118 | `mt_word' as 4, `mt_byte' as 6, `mt_str' as 7, `mytype_size' as 39, | |
2119 | and `mytype' itself as zero. | |
2120 | ||
2121 | The reason why the structure type name is defined at zero is a side | |
2122 | effect of allowing structures to work with the local label | |
2123 | mechanism: if your structure members tend to have the same names in | |
2124 | more than one structure, you can define the above structure like | |
2125 | this: | |
2126 | ||
2127 | struc mytype | |
2128 | .long: resd 1 | |
2129 | .word: resw 1 | |
2130 | .byte: resb 1 | |
2131 | .str: resb 32 | |
2132 | endstruc | |
2133 | ||
2134 | This defines the offsets to the structure fields as `mytype.long', | |
2135 | `mytype.word', `mytype.byte' and `mytype.str'. | |
2136 | ||
2137 | NASM, since it has no _intrinsic_ structure support, does not | |
2138 | support any form of period notation to refer to the elements of a | |
2139 | structure once you have one (except the above local-label notation), | |
2140 | so code such as `mov ax,[mystruc.mt_word]' is not valid. `mt_word' | |
2141 | is a constant just like any other constant, so the correct syntax is | |
2142 | `mov ax,[mystruc+mt_word]' or `mov ax,[mystruc+mytype.word]'. | |
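     | ||
     | For illustration, assuming an instance labelled `mystruc' (such as | |
     | the one declared in section 4.7.4) and a hypothetical label | |
     | `buffer', the constants defined by `STRUC' might be used like this: | |
     | ||
     |         mov ax,[mystruc+mt_word]   ; load the word field | |
     |         mov bx,mystruc | |
     |         mov al,[bx+mt_byte]        ; same idea via a base register | |
     | ||
     |         buffer: resb mytype_size   ; room for one `mytype' in the BSS | |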
2143 | ||
2144 | 4.7.4 `ISTRUC', `AT' and `IEND': Declaring Instances of Structures | |
2145 | ||
2146 | Having defined a structure type, the next thing you typically want | |
2147 | to do is to declare instances of that structure in your data | |
2148 | segment. NASM provides an easy way to do this in the `ISTRUC' | |
2149 | mechanism. To declare a structure of type `mytype' in a program, you | |
2150 | code something like this: | |
2151 | ||
2152 | mystruc: istruc mytype | |
2153 | at mt_long, dd 123456 | |
2154 | at mt_word, dw 1024 | |
2155 | at mt_byte, db 'x' | |
2156 | at mt_str, db 'hello, world', 13, 10, 0 | |
2157 | iend | |
2158 | ||
2159 | The function of the `AT' macro is to make use of the `TIMES' prefix | |
2160 | to advance the assembly position to the correct point for the | |
2161 | specified structure field, and then to declare the specified data. | |
2162 | Therefore the structure fields must be declared in the same order as | |
2163 | they were specified in the structure definition. | |
2164 | ||
2165 | If the data to go in a structure field requires more than one source | |
2166 | line to specify, the remaining source lines can easily come after | |
2167 | the `AT' line. For example: | |
2168 | ||
2169 | at mt_str, db 123,134,145,156,167,178,189 | |
2170 | db 190,100,0 | |
2171 | ||
2172 | Depending on personal taste, you can also omit the code part of the | |
2173 | `AT' line completely, and start the structure field on the next | |
2174 | line: | |
2175 | ||
2176 | at mt_str | |
2177 | db 'hello, world' | |
2178 | db 13,10,0 | |
2179 | ||
2180 | 4.7.5 `ALIGN' and `ALIGNB': Data Alignment | |
2181 | ||
2182 | The `ALIGN' and `ALIGNB' macros provide a convenient way to align | |
2183 | code or data on a word, longword, paragraph or other boundary. (Some | |
2184 | assemblers call this directive `EVEN'.) The syntax of the `ALIGN' | |
2185 | and `ALIGNB' macros is | |
2186 | ||
2187 | align 4 ; align on 4-byte boundary | |
2188 | align 16 ; align on 16-byte boundary | |
2189 | align 8,db 0 ; pad with 0s rather than NOPs | |
2190 | align 4,resb 1 ; align to 4 in the BSS | |
2191 | alignb 4 ; equivalent to previous line | |
2192 | ||
2193 | Both macros require their first argument to be a power of two; they | |
2194 | both compute the number of additional bytes required to bring the | |
2195 | length of the current section up to a multiple of that power of two, | |
2196 | and then apply the `TIMES' prefix to their second argument to | |
2197 | perform the alignment. | |
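     | ||
     | As a rough sketch of that computation (not necessarily the exact | |
     | text of the standard macro), `align 4' behaves like the | |
     | hand-written line | |
     | ||
     |         times ($$-$) & 3 nop | |
     | ||
     | where `$$' is the start of the current section and `$' is the | |
     | current position. If five bytes have been assembled so far, `$$-$' | |
     | is -5, and -5 & 3 is 3, so three `NOP's are emitted and the section | |
     | length becomes 8. | |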
2198 | ||
2199 | If the second argument is not specified, the default for `ALIGN' is | |
2200 | `NOP', and the default for `ALIGNB' is `RESB 1'. So if the second | |
2201 | argument is specified, the two macros are equivalent. Normally, you | |
2202 | can just use `ALIGN' in code and data sections and `ALIGNB' in BSS | |
2203 | sections, and never need the second argument except for special | |
2204 | purposes. | |
2205 | ||
2206 | `ALIGN' and `ALIGNB', being simple macros, perform no error | |
2207 | checking: they cannot warn you if their first argument fails to be a | |
2208 | power of two, or if their second argument generates more than one | |
2209 | byte of code. In each of these cases they will silently do the wrong | |
2210 | thing. | |
2211 | ||
2212 | `ALIGNB' (or `ALIGN' with a second argument of `RESB 1') can be used | |
2213 | within structure definitions: | |
2214 | ||
2215 | struc mytype2 | |
2216 | mt_byte: resb 1 | |
2217 | alignb 2 | |
2218 | mt_word: resw 1 | |
2219 | alignb 4 | |
2220 | mt_long: resd 1 | |
2221 | mt_str: resb 32 | |
2222 | endstruc | |
2223 | ||
2224 | This will ensure that the structure members are sensibly aligned | |
2225 | relative to the base of the structure. | |
2226 | ||
2227 | A final caveat: `ALIGN' and `ALIGNB' work relative to the beginning | |
2228 | of the _section_, not the beginning of the address space in the | |
2229 | final executable. Aligning to a 16-byte boundary when the section | |
2230 | you're in is only guaranteed to be aligned to a 4-byte boundary, for | |
2231 | example, is a waste of effort. Again, NASM does not check that the | |
2232 | section's alignment characteristics are sensible for the use of | |
2233 | `ALIGN' or `ALIGNB'. | |
2234 | ||
2235 | Chapter 5: Assembler Directives | |
2236 | ------------------------------- | |
2237 | ||
2238 | NASM, though it attempts to avoid the bureaucracy of assemblers like | |
2239 | MASM and TASM, is nevertheless forced to support a _few_ directives. | |
2240 | These are described in this chapter. | |
2241 | ||
2242 | NASM's directives come in two types: _user-level_ | |
2243 | directives and _primitive_ directives. Typically, each | |
2244 | directive has a user-level form and a primitive form. In | |
2245 | almost all cases, we recommend that users use the | |
2246 | user-level forms of the directives, which are implemented | |
2247 | as macros which call the primitive forms. | |
2248 | ||
2249 | Primitive directives are enclosed in square brackets; user-level | |
2250 | directives are not. | |
2251 | ||
2252 | In addition to the universal directives described in this chapter, | |
2253 | each object file format can optionally supply extra directives in | |
2254 | order to control particular features of that file format. | |
2255 | These _format-specific_ directives are documented along | |
2256 | with the formats that implement them, in chapter 6. | |
2257 | ||
2258 | 5.1 `BITS': Specifying Target Processor Mode | |
2259 | ||
2260 | The `BITS' directive specifies whether NASM should generate code | |
2261 | designed to run on a processor operating in 16-bit mode, or code | |
2262 | designed to run on a processor operating in 32-bit mode. The syntax | |
2263 | is `BITS 16' or `BITS 32'. | |
2264 | ||
2265 | In most cases, you should not need to use `BITS' explicitly. The | |
2266 | `aout', `coff', `elf' and `win32' object formats, which are designed | |
2267 | for use in 32-bit operating systems, all cause NASM to select 32-bit | |
2268 | mode by default. The `obj' object format allows you to specify each | |
2269 | segment you define as either `USE16' or `USE32', and NASM will set | |
2270 | its operating mode accordingly, so the use of the `BITS' directive | |
2271 | is once again unnecessary. | |
2272 | ||
2273 | The most likely reason for using the `BITS' directive is to write | |
2274 | 32-bit code in a flat binary file; this is because the `bin' output | |
2275 | format defaults to 16-bit mode in anticipation of it being used most | |
2276 | frequently to write DOS `.COM' programs, DOS `.SYS' device drivers | |
2277 | and boot loader software. | |
2278 | ||
2279 | You do _not_ need to specify `BITS 32' merely in order to use 32-bit | |
2280 | instructions in a 16-bit DOS program; if you do, the assembler will | |
2281 | generate incorrect code because it will be writing code targeted at | |
2282 | a 32-bit platform, to be run on a 16-bit one. | |
2283 | ||
2284 | When NASM is in `BITS 16' state, instructions which use 32-bit data | |
2285 | are prefixed with an 0x66 byte, and those referring to 32-bit | |
2286 | addresses have an 0x67 prefix. In `BITS 32' state, the reverse is | |
2287 | true: 32-bit instructions require no prefixes, whereas instructions | |
2288 | using 16-bit data need an 0x66 and those working in 16-bit addresses | |
2289 | need an 0x67. | |
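     | ||
     | The effect can be seen by assembling the same instructions in each | |
     | state. This is only a sketch; the bytes in the comments are the | |
     | standard encodings of these forms of `MOV': | |
     | ||
     |         bits 16 | |
     |         mov ax,1        ; B8 01 00 | |
     |         mov eax,1       ; 66 B8 01 00 00 00 | |
     |         mov ax,[ebx]    ; 67 8B 03 | |
     | ||
     |         bits 32 | |
     |         mov eax,1       ; B8 01 00 00 00 | |
     |         mov ax,1        ; 66 B8 01 00 | |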
2290 | ||
2291 | The `BITS' directive has an exactly equivalent primitive form, | |
2292 | `[BITS 16]' and `[BITS 32]'. The user-level form is a macro which | |
2293 | has no function other than to call the primitive form. | |
2294 | ||
2295 | 5.2 `SECTION' or `SEGMENT': Changing and Defining Sections | |
2296 | ||
2297 | The `SECTION' directive (`SEGMENT' is an exactly equivalent synonym) | |
2298 | changes which section of the output file the code you write will be | |
2299 | assembled into. In some object file formats, the number and names of | |
2300 | sections are fixed; in others, the user may make up as many as they | |
2301 | wish. Hence `SECTION' may sometimes give an error message, or may | |
2302 | define a new section, if you try to switch to a section that does | |
2303 | not (yet) exist. | |
2304 | ||
2305 | The Unix object formats, and the `bin' object format, all support | |
2306 | the standardised section names `.text', `.data' and `.bss' for the | |
2307 | code, data and uninitialised-data sections. The `obj' format, by | |
2308 | contrast, does not recognise these section names as being special, | |
2309 | and indeed will strip off the leading period of any section name | |
2310 | that has one. | |
2311 | ||
2312 | 5.2.1 The `__SECT__' Macro | |
2313 | ||
2314 | The `SECTION' directive is unusual in that its user-level form | |
2315 | functions differently from its primitive form. The primitive form, | |
2316 | `[SECTION xyz]', simply switches the current target section to the | |
2317 | one given. The user-level form, `SECTION xyz', however, first | |
2318 | defines the single-line macro `__SECT__' to be the primitive | |
2319 | `[SECTION]' directive which it is about to issue, and then issues | |
2320 | it. So the user-level directive | |
2321 | ||
2322 | SECTION .text | |
2323 | ||
2324 | expands to the two lines | |
2325 | ||
2326 | %define __SECT__ [SECTION .text] | |
2327 | [SECTION .text] | |
2328 | ||
2329 | Users may find it useful to make use of this in their own macros. | |
2330 | For example, the `writefile' macro defined in section 4.2.3 can be | |
2331 | usefully rewritten in the following more sophisticated form: | |
2332 | ||
2333 | %macro writefile 2+ | |
2334 | [section .data] | |
2335 | %%str: db %2 | |
2336 | %%endstr: | |
2337 | __SECT__ | |
2338 | mov dx,%%str | |
2339 | mov cx,%%endstr-%%str | |
2340 | mov bx,%1 | |
2341 | mov ah,0x40 | |
2342 | int 0x21 | |
2343 | %endmacro | |
2344 | ||
2345 | This form of the macro, once passed a string to output, first | |
2346 | switches temporarily to the data section of the file, using the | |
2347 | primitive form of the `SECTION' directive so as not to modify | |
2348 | `__SECT__'. It then declares its string in the data section, and | |
2349 | then invokes `__SECT__' to switch back to _whichever_ section the | |
2350 | user was previously working in. It thus avoids the need, in the | |
2351 | previous version of the macro, to include a `JMP' instruction to | |
2352 | jump over the data, and also does not fail if, in a complicated | |
2353 | `OBJ' format module, the user could potentially be assembling the | |
2354 | code in any of several separate code sections. | |
2355 | ||
2356 | 5.3 `ABSOLUTE': Defining Absolute Labels | |
2357 | ||
2358 | The `ABSOLUTE' directive can be thought of as an alternative form of | |
2359 | `SECTION': it causes the subsequent code to be directed at no | |
2360 | physical section, but at the hypothetical section starting at the | |
2361 | given absolute address. The only instructions you can use in this | |
2362 | mode are the `RESB' family. | |
2363 | ||
2364 | `ABSOLUTE' is used as follows: | |
2365 | ||
2366 | absolute 0x1A | |
2367 | kbuf_chr resw 1 | |
2368 | kbuf_free resw 1 | |
2369 | kbuf resw 16 | |
2370 | ||
2371 | This example describes a section of the PC BIOS data area, at | |
2372 | segment address 0x40: the above code defines `kbuf_chr' to be 0x1A, | |
2373 | `kbuf_free' to be 0x1C, and `kbuf' to be 0x1E. | |
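     | ||
     | Since these labels are plain offsets, they are used together with a | |
     | segment register that you load yourself. A minimal sketch, assuming | |
     | 16-bit real-mode code: | |
     | ||
     |         mov ax,0x40 | |
     |         mov es,ax               ; ES = BIOS data segment | |
     |         mov bx,[es:kbuf_free]   ; read the word at 0x40:0x1C | |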
2374 | ||
2375 | The user-level form of `ABSOLUTE', like that of `SECTION', redefines | |
2376 | the `__SECT__' macro when it is invoked. | |
2377 | ||
2378 | `STRUC' and `ENDSTRUC' are defined as macros which use `ABSOLUTE' | |
2379 | (and also `__SECT__'). | |
2380 | ||
2381 | `ABSOLUTE' doesn't have to take an absolute constant as an argument: | |
2382 | it can take an expression (actually, a critical expression: see | |
2383 | section 3.7) and it can be a value in a segment. For example, a TSR | |
2384 | can re-use its setup code as run-time BSS like this: | |
2385 | ||
2386 | org 100h ; it's a .COM program | |
2387 | jmp setup ; setup code comes last | |
2388 | ; the resident part of the TSR goes here | |
2389 | setup: ; now write the code that installs the TSR here | |
2390 | absolute setup | |
2391 | runtimevar1 resw 1 | |
2392 | runtimevar2 resd 20 | |
2393 | tsr_end: | |
2394 | ||
2395 | This defines some variables `on top of' the setup code, so that | |
2396 | after the setup has finished running, the space it took up can be | |
2397 | re-used as data storage for the running TSR. The symbol `tsr_end' | |
2398 | can be used to calculate the total size of the part of the TSR that | |
2399 | needs to be made resident. | |
2400 | ||
2401 | 5.4 `EXTERN': Importing Symbols from Other Modules | |
2402 | ||
2403 | `EXTERN' is similar to the MASM directive `EXTRN' and the C keyword | |
2404 | `extern': it is used to declare a symbol which is not defined | |
2405 | anywhere in the module being assembled, but is assumed to be defined | |
2406 | in some other module and needs to be referred to by this one. Not | |
2407 | every object-file format can support external variables: the `bin' | |
2408 | format cannot. | |
2409 | ||
2410 | The `EXTERN' directive takes as many arguments as you like. Each | |
2411 | argument is the name of a symbol: | |
2412 | ||
2413 | extern _printf | |
2414 | extern _sscanf,_fscanf | |
2415 | ||
2416 | Some object-file formats provide extra features to the `EXTERN' | |
2417 | directive. In all cases, the extra features are used by suffixing a | |
2418 | colon to the symbol name followed by object-format specific text. | |
2419 | For example, the `obj' format allows you to declare that the default | |
2420 | segment base of an external should be the group `dgroup' by means of | |
2421 | the directive | |
2422 | ||
2423 | extern _variable:wrt dgroup | |
2424 | ||
2425 | The primitive form of `EXTERN' differs from the user-level form only | |
2426 | in that it can take only one argument at a time: the support for | |
2427 | multiple arguments is implemented at the preprocessor level. | |
2428 | ||
2429 | You can declare the same variable as `EXTERN' more than once: NASM | |
2430 | will quietly ignore the second and later redeclarations. You can't | |
2431 | declare a variable as `EXTERN' as well as something else, though. | |
2432 | ||
2433 | 5.5 `GLOBAL': Exporting Symbols to Other Modules | |
2434 | ||
2435 | `GLOBAL' is the other end of `EXTERN': if one module declares a | |
2436 | symbol as `EXTERN' and refers to it, then in order to prevent linker | |
2437 | errors, some other module must actually _define_ the symbol and | |
2438 | declare it as `GLOBAL'. Some assemblers use the name `PUBLIC' for | |
2439 | this purpose. | |
2440 | ||
2441 | The `GLOBAL' directive applying to a symbol must appear _before_ the | |
2442 | definition of the symbol. | |
2443 | ||
2444 | `GLOBAL' uses the same syntax as `EXTERN', except that it must refer | |
2445 | to symbols which _are_ defined in the same module as the `GLOBAL' | |
2446 | directive. For example: | |
2447 | ||
2448 | global _main | |
2449 | _main: ; some code | |
2450 | ||
2451 | `GLOBAL', like `EXTERN', allows object formats to define private | |
2452 | extensions by means of a colon. The `elf' object format, for | |
2453 | example, lets you specify whether global data items are functions or | |
2454 | data: | |
2455 | ||
2456 | global hashlookup:function, hashtable:data | |
2457 | ||
2458 | Like `EXTERN', the primitive form of `GLOBAL' differs from the user- | |
2459 | level form only in that it can take only one argument at a time. | |
2460 | ||
2461 | 5.6 `COMMON': Defining Common Data Areas | |
2462 | ||
2463 | The `COMMON' directive is used to declare _common variables_. A | |
2464 | common variable is much like a global variable declared in the | |
2465 | uninitialised data section, so that | |
2466 | ||
2467 | common intvar 4 | |
2468 | ||
2469 | is similar in function to | |
2470 | ||
2471 | global intvar | |
2472 | section .bss | |
2473 | intvar resd 1 | |
2474 | ||
2475 | The difference is that if more than one module defines the same | |
2476 | common variable, then at link time those variables will be _merged_, | |
2477 | and references to `intvar' in all modules will point at the same | |
2478 | piece of memory. | |
2479 | ||
2480 | Like `GLOBAL' and `EXTERN', `COMMON' supports object-format specific | |
2481 | extensions. For example, the `obj' format allows common variables to | |
2482 | be NEAR or FAR, and the `elf' format allows you to specify the | |
2483 | alignment requirements of a common variable: | |
2484 | ||
2485 | common commvar 4:near ; works in OBJ | |
2486 | common intarray 100:4 ; works in ELF: 4 byte aligned | |
2487 | ||
2488 | Once again, like `EXTERN' and `GLOBAL', the primitive form of | |
2489 | `COMMON' differs from the user-level form only in that it can take | |
2490 | only one argument at a time. | |
2491 | ||
2492 | Chapter 6: Output Formats | |
2493 | ------------------------- | |
2494 | ||
2495 | NASM is a portable assembler, designed to be able to compile on any | |
2496 | ANSI C-supporting platform and produce output to run on a variety of | |
2497 | Intel x86 operating systems. For this reason, it has a large number | |
2498 | of available output formats, selected using the `-f' option on the | |
2499 | NASM command line. Each of these formats, along with its extensions | |
2500 | to the base NASM syntax, is detailed in this chapter. | |
2501 | ||
2502 | As stated in section 2.1.1, NASM chooses a default name for your | |
2503 | output file based on the input file name and the chosen output | |
2504 | format. This will be generated by removing the extension (`.asm', | |
2505 | `.s', or whatever you like to use) from the input file name, and | |
2506 | substituting an extension defined by the output format. The | |
2507 | extensions are given with each format below. | |
2508 | ||
2509 | 6.1 `bin': Flat-Form Binary Output | |
2510 | ||
2511 | The `bin' format does not produce object files: it generates nothing | |
2512 | in the output file except the code you wrote. Such `pure binary' | |
2513 | files are used by MS-DOS: `.COM' executables and `.SYS' device | |
2514 | drivers are pure binary files. Pure binary output is also useful for | |
2515 | operating-system and boot loader development. | |
2516 | ||
2517 | `bin' supports the three standardised section names `.text', `.data' | |
2518 | and `.bss' only. The file NASM outputs will contain the contents of | |
2519 | the `.text' section first, followed by the contents of the `.data' | |
2520 | section, aligned on a four-byte boundary. The `.bss' section is not | |
2521 | stored in the output file at all, but is assumed to appear directly | |
2522 | after the end of the `.data' section, again aligned on a four-byte | |
2523 | boundary. | |
2524 | ||
2525 | If you specify no explicit `SECTION' directive, the code you write | |
2526 | will be directed by default into the `.text' section. | |
2527 | ||
2528 | Using the `bin' format puts NASM by default into 16-bit mode (see | |
2529 | section 5.1). In order to use `bin' to write 32-bit code such as an | |
2530 | OS kernel, you need to explicitly issue the `BITS 32' directive. | |
2531 | ||
2532 | `bin' has no default output file name extension: instead, it leaves | |
2533 | your file name as it is once the original extension has been | |
2534 | removed. Thus, the default is for NASM to assemble `binprog.asm' | |
2535 | into a binary file called `binprog'. | |
2536 | ||
2537 | 6.1.1 `ORG': Binary File Program Origin | |
2538 | ||
2539 | The `bin' format provides an additional directive to the list given | |
2540 | in chapter 5: `ORG'. The function of the `ORG' directive is to | |
2541 | specify the origin address which NASM will assume the program begins | |
2542 | at when it is loaded into memory. | |
2543 | ||
2544 | For example, the following code will generate the longword | |
2545 | `0x00000104': | |
2546 | ||
2547 | org 0x100 | |
2548 | dd label | |
2549 | label: | |
2550 | ||
2551 | Unlike the `ORG' directive provided by MASM-compatible assemblers, | |
2552 | which allows you to jump around in the object file and overwrite | |
2553 | code you have already generated, NASM's `ORG' does exactly what the | |
2554 | directive says: _origin_. Its sole function is to specify one offset | |
2555 | which is added to all internal address references within the file; | |
2556 | it does not permit any of the trickery that MASM's version does. See | |
2557 | section 10.1.3 for further comments. | |
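     | ||
     | A minimal sketch of a DOS `.COM' program relying on `ORG' (the | |
     | label `msg' is made up; INT 21h function 9 prints the | |
     | `$'-terminated string at DS:DX): | |
     | ||
     |         org 0x100 | |
     |         mov dx,msg      ; 0x100 plus the file offset of msg | |
     |         mov ah,9 | |
     |         int 0x21 | |
     |         mov ax,0x4C00   ; terminate the program | |
     |         int 0x21 | |
     |         msg: db 'hello, world',13,10,'$' | |
     | ||
     | Without the `ORG' line, `msg' would be taken as its offset from the | |
     | start of the file, and DOS, which loads `.COM' files at offset | |
     | 0x100, would print from the wrong address. | |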
2558 | ||
2559 | 6.1.2 `bin' Extensions to the `SECTION' Directive | |
2560 | ||
2561 | The `bin' output format extends the `SECTION' (or `SEGMENT') | |
2562 | directive to allow you to specify the alignment requirements of | |
2563 | segments. This is done by appending the `ALIGN' qualifier to the end | |
2564 | of the section-definition line. For example, | |
2565 | ||
2566 | section .data align=16 | |
2567 | ||
2568 | switches to the section `.data' and also specifies that it must be | |
2569 | aligned on a 16-byte boundary. | |
2570 | ||
2571 | The parameter to `ALIGN' is the alignment in bytes, so `align=16' | |
2572 | forces the low four bits of the section start address to zero. The | |
2573 | alignment value given may be any power of two. | |
2574 | ||
2575 | 6.2 `obj': Microsoft OMF Object Files | |
2576 | ||
2577 | The `obj' file format (NASM calls it `obj' rather than `omf' for | |
2578 | historical reasons) is the one produced by MASM and TASM, which is | |
2579 | typically fed to 16-bit DOS linkers to produce `.EXE' files. It is | |
2580 | also the format used by OS/2. | |
2581 | ||
2582 | `obj' provides a default output file-name extension of `.obj'. | |
2583 | ||
2584 | `obj' is not exclusively a 16-bit format, though: NASM has full | |
2585 | support for the 32-bit extensions to the format. In particular, 32- | |
2586 | bit `obj' format files are used by Borland's Win32 compilers, | |
2587 | instead of using Microsoft's newer `win32' object file format. | |
2588 | ||
2589 | The `obj' format does not define any special segment names: you can | |
2590 | call your segments anything you like. Typical names for segments in | |
2591 | `obj' format files are `CODE', `DATA' and `BSS'. | |
2592 | ||
2593 | If your source file contains code before specifying an explicit | |
2594 | `SEGMENT' directive, then NASM will invent its own segment called | |
2595 | `__NASMDEFSEG' for you. | |
2596 | ||
2597 | When you define a segment in an `obj' file, NASM defines the segment | |
2598 | name as a symbol as well, so that you can access the segment address | |
2599 | of the segment. So, for example: | |
2600 | ||
2601 | segment data | |
2602 | dvar: dw 1234 | |
2603 | segment code | |
2604 | function: mov ax,data ; get segment address of data | |
2605 | mov ds,ax ; and move it into DS | |
2606 | inc word [dvar] ; now this reference will work | |
2607 | ret | |
2608 | ||
2609 | The `obj' format also enables the use of the `SEG' and `WRT' | |
2610 | operators, so that you can write code which does things like | |
2611 | ||
2612 | extern foo | |
2613 | mov ax,seg foo ; get preferred segment of foo | |
2614 | mov ds,ax | |
2615 | mov ax,data ; a different segment | |
2616 | mov es,ax | |
2617 | mov ax,[ds:foo] ; this accesses `foo' | |
2618 | mov [es:foo wrt data],bx ; so does this | |
2619 | ||
2620 | 6.2.1 `obj' Extensions to the `SEGMENT' Directive | |
2621 | ||
2622 | The `obj' output format extends the `SEGMENT' (or `SECTION') | |
2623 | directive to allow you to specify various properties of the segment | |
2624 | you are defining. This is done by appending extra qualifiers to the | |
2625 | end of the segment-definition line. For example, | |
2626 | ||
2627 | segment code private align=16 | |
2628 | ||
2629 | defines the segment `code', but also declares it to be a private | |
2630 | segment, and requires that the portion of it described in this code | |
2631 | module must be aligned on a 16-byte boundary. | |
2632 | ||
2633 | The available qualifiers are: | |
2634 | ||
2635 | (*) `PRIVATE', `PUBLIC', `COMMON' and `STACK' specify the | |
2636 | combination characteristics of the segment. `PRIVATE' segments | |
2637 | do not get combined with any others by the linker; `PUBLIC' and | |
2638 | `STACK' segments get concatenated together at link time; and | |
2639 | `COMMON' segments all get overlaid on top of each other rather | |
2640 | than stuck end-to-end. | |
2641 | ||
2642 | (*) `ALIGN' is used, as shown above, to specify the byte boundary on | |
2643 | which the segment start address must fall. The alignment | |
2644 | value given may be any power of two from 1 to 4096; in reality, | |
2645 | the only values supported are 1, 2, 4, 16, 256 and 4096, so if 8 | |
2646 | is specified it will be rounded up to 16, and 32, 64 and 128 | |
2647 | will all be rounded up to 256, and so on. Note that alignment to | |
2648 | 4096-byte boundaries is a PharLap extension to the format and | |
2649 | may not be supported by all linkers. | |
2650 | ||
2651 | (*) `CLASS' can be used to specify the segment class; this feature | |
2652 | indicates to the linker that segments of the same class should | |
2653 | be placed near each other in the output file. The class name can | |
2654 | be any word, e.g. `CLASS=CODE'. | |
2655 | ||
2656 | (*) `OVERLAY', like `CLASS', is specified with an arbitrary word as | |
2657 | an argument, and provides overlay information to an overlay- | |
2658 | capable linker. | |
2659 | ||
2660 | (*) Segments can be declared as `USE16' or `USE32', which has the | |
2661 | effect of recording the choice in the object file and also | |
2662 | ensuring that NASM's default assembly mode when assembling in | |
2663 | that segment is 16-bit or 32-bit respectively. | |
2664 | ||
2665 | (*) When writing OS/2 object files, you should declare 32-bit | |
2666 | segments as `FLAT', which causes the default segment base for | |
2667 | anything in the segment to be the special group `FLAT', and also | |
2668 | defines the group if it is not already defined. | |
2669 | ||
2670 | (*) The `obj' file format also allows segments to be declared as | |
2671 | having a pre-defined absolute segment address, although no | |
2672 | linkers are currently known to make sensible use of this | |
2673 | feature; nevertheless, NASM allows you to declare a segment such | |
2674 | as `SEGMENT SCREEN ABSOLUTE=0xB800' if you need to. The | |
2675 | `ABSOLUTE' and `ALIGN' keywords are mutually exclusive. | |
2676 | ||
2677 | NASM's default segment attributes are `PUBLIC', `ALIGN=1', no class, | |
2678 | no overlay, and `USE16'. | |
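     | ||
     | As a sketch of how these qualifiers combine (the segment and | |
     | class names below are illustrative assumptions, not names that | |
     | NASM requires), a small 16-bit module might declare: | |
     | ||
     |         segment code public align=16 class=CODE use16 | |
     |         segment data public align=4 class=DATA | |
     |         segment stack stack align=16 class=STACK | |
     |         segment screen absolute=0xB800 | |
     | ||
     | The `screen' line carries no `ALIGN' qualifier, since `ABSOLUTE' | |
     | and `ALIGN' are mutually exclusive. | |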
2679 | ||
2680 | 6.2.2 `GROUP': Defining Groups of Segments | |
2681 | ||
2682 | The `obj' format also allows segments to be grouped, so that a | |
2683 | single segment register can be used to refer to all the segments in | |
2684 | a group. NASM therefore supplies the `GROUP' directive, whereby you | |
2685 | can code | |
2686 | ||
2687 | segment data | |
2688 | ; some data | |
2689 | segment bss | |
2690 | ; some uninitialised data | |
2691 | group dgroup data bss | |
2692 | ||
2693 | which will define a group called `dgroup' to contain the segments | |
2694 | `data' and `bss'. Like `SEGMENT', `GROUP' causes the group name to | |
2695 | be defined as a symbol, so that you can refer to a variable `var' in | |
2696 | the `data' segment as `var wrt data' or as `var wrt dgroup', | |
2697 | depending on which segment value is currently in your segment | |
2698 | register. | |
2699 | ||
2700 | If you just refer to `var', however, and `var' is declared in a | |
2701 | segment which is part of a group, then NASM will default to giving | |
2702 | you the offset of `var' from the beginning of the _group_, not the | |
2703 | _segment_. Therefore `SEG var', also, will return the group base | |
2704 | rather than the segment base. | |
2705 | ||
2706 | NASM will allow a segment to be part of more than one group, but | |
2707 | will generate a warning if you do this. Variables declared in a | |
2708 | segment which is part of more than one group will default to being | |
2709 | relative to the first group that was defined to contain the segment. | |
2710 | ||
2711 | A group does not have to contain any segments; you can still make | |
2712 | `WRT' references to a group which does not contain the variable you | |
2713 | are referring to. OS/2, for example, defines the special group | |
2714 | `FLAT' with no segments in it. | |
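     | ||
     | A minimal sketch of the default-offset rule described above, | |
     | assuming a variable `var' and a buffer `buf' (both hypothetical | |
     | names) placed in the grouped segments: | |
     | ||
     |         segment data | |
     | var:    dw 1234 | |
     |         segment bss | |
     | buf:    resb 64 | |
     |         group dgroup data bss | |
     | ||
     |         segment code | |
     |         mov ax,dgroup | |
     |         mov ds,ax            ; DS now holds the group base | |
     |         mov ax,[var]         ; offset taken from start of dgroup | |
     |         mov bx,var wrt data  ; offset taken from start of data | |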
2715 | ||
2716 | 6.2.3 `UPPERCASE': Disabling Case Sensitivity in Output | |
2717 | ||
2718 | Although NASM itself is case sensitive, some OMF linkers are not; | |
2719 | therefore it can be useful for NASM to output single-case object | |
2720 | files. The `UPPERCASE' format-specific directive causes all segment, | |
2721 | group and symbol names that are written to the object file to be | |
2722 | forced to upper case just before being written. Within a source | |
2723 | file, NASM is still case-sensitive; but the object file can be | |
2724 | written entirely in upper case if desired. | |
2725 | ||
2726 | `UPPERCASE' is used alone on a line; it requires no parameters. | |
2727 | ||
2728 | 6.2.4 `IMPORT': Importing DLL Symbols | |
2729 | ||
2730 | The `IMPORT' format-specific directive defines a symbol to be | |
2731 | imported from a DLL, for use if you are writing a DLL's import | |
2732 | library in NASM. You still need to declare the symbol as `EXTERN' as | |
2733 | well as using the `IMPORT' directive. | |
2734 | ||
2735 | The `IMPORT' directive takes two required parameters, separated by | |
2736 | white space, which are (respectively) the name of the symbol you | |
2737 | wish to import and the name of the library you wish to import it | |
2738 | from. For example: | |
2739 | ||
2740 | import WSAStartup wsock32.dll | |
2741 | ||
2742 | A third optional parameter gives the name by which the symbol is | |
2743 | known in the library you are importing it from, in case this is not | |
2744 | the same as the name you wish the symbol to be known by to your code | |
2745 | once you have imported it. For example: | |
2746 | ||
2747 | import asyncsel wsock32.dll WSAAsyncSelect | |
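     | ||
     | Since the symbol must also be declared `EXTERN', the complete | |
     | declaration for this second example would look something like | |
     | the following (a sketch; the order of the two lines is an | |
     | assumption): | |
     | ||
     |         extern asyncsel | |
     |         import asyncsel wsock32.dll WSAAsyncSelect | |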
2748 | ||
2749 | 6.2.5 `EXPORT': Exporting DLL Symbols | |
2750 | ||
2751 | The `EXPORT' format-specific directive defines a global symbol to be | |
2752 | exported as a DLL symbol, for use if you are writing a DLL in NASM. | |
2753 | You still need to declare the symbol as `GLOBAL' as well as using | |
2754 | the `EXPORT' directive. | |
2755 | ||
2756 | `EXPORT' takes one required parameter, which is the name of the | |
2757 | symbol you wish to export, as it was defined in your source file. An | |
2758 | optional second parameter (separated by white space from the first) | |
2759 | gives the _external_ name of the symbol: the name by which you wish | |
2760 | the symbol to be known to programs using the DLL. If this name is | |
2761 | the same as the internal name, you may leave the second parameter | |
2762 | off. | |
2763 | ||
2764 | Further parameters can be given to define attributes of the exported | |
2765 | symbol. These parameters, like the second, are separated by white | |
2766 | space. If further parameters are given, the external name must also | |
2767 | be specified, even if it is the same as the internal name. The | |
2768 | available attributes are: | |
2769 | ||
2770 | (*) `resident' indicates that the exported name is to be kept | |
2771 | resident by the system loader. This is an optimisation for | |
2772 | frequently used symbols imported by name. | |
2773 | ||
2774 | (*) `nodata' indicates that the exported symbol is a function which | |
2775 | does not make use of any initialised data. | |
2776 | ||
2777 | (*) `parm=NNN', where `NNN' is an integer, sets the number of | |
2778 | parameter words for the case in which the symbol is a call gate | |
2779 | between 32-bit and 16-bit segments. | |
2780 | ||
2781 | (*) An attribute which is just a number indicates that the symbol | |
2782 | should be exported with an identifying number (ordinal), and | |
2783 | gives the desired number. | |
2784 | ||
2785 | For example: | |
2786 | ||
2787 | export myfunc | |
2788 | export myfunc TheRealMoreFormalLookingFunctionName | |
2789 | export myfunc myfunc 1234 ; export by ordinal | |
2790 | export myfunc myfunc resident parm=23 nodata | |
2791 | ||
2792 | 6.2.6 `..start': Defining the Program Entry Point | |
2793 | ||
2794 | OMF linkers require exactly one of the object files being linked to | |
2795 | define the program entry point, where execution will begin when the | |
2796 | program is run. If the object file that defines the entry point is | |
2797 | assembled using NASM, you specify the entry point by declaring the | |
2798 | special symbol `..start' at the point where you wish execution to | |
2799 | begin. | |
2800 | ||
2801 | 6.2.7 `obj' Extensions to the `EXTERN' Directive | |
2802 | ||
2803 | If you declare an external symbol with the directive | |
2804 | ||
2805 | extern foo | |
2806 | ||
2807 | then references such as `mov ax,foo' will give you the offset of | |
2808 | `foo' from its preferred segment base (as specified in whichever | |
2809 | module `foo' is actually defined in). So to access the contents of | |
2810 | `foo' you will usually need to do something like | |
2811 | ||
2812 | mov ax,seg foo ; get preferred segment base | |
2813 | mov es,ax ; move it into ES | |
2814 | mov ax,[es:foo] ; and use offset `foo' from it | |
2815 | ||
2816 | This is a little unwieldy, particularly if you know that an external | |
2817 | is going to be accessible from a given segment or group, say | |
2818 | `dgroup'. So if `DS' already contained `dgroup', you could simply | |
2819 | code | |
2820 | ||
2821 | mov ax,[foo wrt dgroup] | |
2822 | ||
2823 | However, having to type this every time you want to access `foo' can | |
2824 | be a pain; so NASM allows you to declare `foo' in the alternative | |
2825 | form | |
2826 | ||
2827 | extern foo:wrt dgroup | |
2828 | ||
2829 | This form causes NASM to pretend that the preferred segment base of | |
2830 | `foo' is in fact `dgroup'; so the expression `seg foo' will now | |
2831 | return `dgroup', and the expression `foo' is equivalent to | |
2832 | `foo wrt dgroup'. | |
2833 | ||
2834 | This default-`WRT' mechanism can be used to make externals appear to | |
2835 | be relative to any group or segment in your program. It can also be | |
2836 | applied to common variables: see section 6.2.8. | |
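     | ||
     | Putting this together, a sketch of a module that reads an | |
     | external `foo' through `DS' (assuming that `foo' really does | |
     | live in a segment belonging to `dgroup' in the module which | |
     | defines it) might be: | |
     | ||
     |         segment data | |
     |         group dgroup data | |
     |         extern foo:wrt dgroup | |
     | ||
     |         segment code | |
     |         mov ax,dgroup | |
     |         mov ds,ax | |
     |         mov ax,[foo]         ; same as [foo wrt dgroup] | |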
2837 | ||
2838 | 6.2.8 `obj' Extensions to the `COMMON' Directive | |
2839 | ||
2840 | The `obj' format allows common variables to be either near or far; | |
2841 | NASM allows you to specify which your variables should be by the use | |
2842 | of the syntax | |
2843 | ||
2844 | common nearvar 2:near ; `nearvar' is a near common | |
2845 | common farvar 10:far ; and `farvar' is far | |
2846 | ||
2847 | Far common variables may be greater in size than 64K, and so the | |
2848 | OMF specification says that they are declared as a number of | |
2849 | _elements_ of a given size. So a 10-byte far common variable could | |
2850 | be declared as ten one-byte elements, five two-byte elements, two | |
2851 | five-byte elements or one ten-byte element. | |
2852 | ||
2853 | Some OMF linkers require the element size, as well as the variable | |
2854 | size, to match when resolving common variables declared in more than | |
2855 | one module. Therefore NASM must allow you to specify the element | |
2856 | size on your far common variables. This is done by the following | |
2857 | syntax: | |
2858 | ||
2859 | common c_5by2 10:far 5 ; two five-byte elements | |
2860 | common c_2by5 10:far 2 ; five two-byte elements | |
2861 | ||
2862 | If no element size is specified, the default is 1. Also, the `FAR' | |
2863 | keyword is not required when an element size is specified, since | |
2864 | only far commons may have element sizes at all. So the above | |
2865 | declarations could equivalently be | |
2866 | ||
2867 | common c_5by2 10:5 ; two five-byte elements | |
2868 | common c_2by5 10:2 ; five two-byte elements | |
2869 | ||
2870 | In addition to these extensions, the `COMMON' directive in `obj' | |
2871 | also supports default-`WRT' specification like `EXTERN' does | |
2872 | (explained in section 6.2.7). So you can also declare things like | |
2873 | ||
2874 | common foo 10:wrt dgroup | |
2875 | common bar 16:far 2:wrt data | |
2876 | common baz 24:wrt data:6 | |
2877 | ||
2878 | 6.3 `win32': Microsoft Win32 Object Files | |
2879 | ||
2880 | The `win32' output format generates Microsoft Win32 object files, | |
2881 | suitable for passing to Microsoft linkers such as the one supplied | |
2882 | with Visual C++. Note that Borland Win32 compilers do not use this | |
2883 | format, but use `obj' instead (see section 6.2). | |
2884 | ||
2885 | `win32' provides a default output file-name extension of `.obj'. | |
2886 | ||
2887 | Note that although Microsoft say that Win32 object files follow the | |
2888 | COFF (Common Object File Format) standard, the object files produced | |
2889 | by Microsoft Win32 compilers are not compatible with COFF linkers | |
2890 | such as DJGPP's, and vice versa. This is due to a difference of | |
2891 | opinion over the precise semantics of PC-relative relocations. To | |
2892 | produce COFF files suitable for DJGPP, use NASM's `coff' output | |
2893 | format; conversely, the `coff' format does not produce object files | |
2894 | that Win32 linkers can generate correct output from. | |
2895 | ||
2896 | 6.3.1 `win32' Extensions to the `SECTION' Directive | |
2897 | ||
2898 | Like the `obj' format, `win32' allows you to specify additional | |
2899 | information on the `SECTION' directive line, to control the type and | |
2900 | properties of sections you declare. Section types and properties are | |
2901 | generated automatically by NASM for the standard section names | |
2902 | `.text', `.data' and `.bss', but may still be overridden by these | |
2903 | qualifiers. | |
2904 | ||
2905 | The available qualifiers are: | |
2906 | ||
2907 | (*) `code', or equivalently `text', defines the section to be a code | |
2908 | section. This marks the section as readable and executable, but | |
2909 | not writable, and also indicates to the linker that the type of | |
2910 | the section is code. | |
2911 | ||
2912 | (*) `data' and `bss' define the section to be a data section, | |
2913 | analogously to `code'. Data sections are marked as readable and | |
2914 | writable, but not executable. `data' declares an initialised | |
2915 | data section, whereas `bss' declares an uninitialised data | |
2916 | section. | |
2917 | ||
2918 | (*) `info' defines the section to be an informational section, which | |
2919 | is not included in the executable file by the linker, but may | |
2920 | (for example) pass information _to_ the linker. For example, | |
2921 | declaring an `info'-type section called `.drectve' causes the | |
2922 | linker to interpret the contents of the section as command-line | |
2923 | options. | |
2924 | ||
2925 | (*) `align=', used with a trailing number as in `obj', gives the | |
2926 | alignment requirements of the section. The maximum you may | |
2927 | specify is 64: the Win32 object file format contains no means to | |
2928 | request a greater section alignment than this. If alignment is | |
2929 | not explicitly specified, the defaults are 16-byte alignment for | |
2930 | code sections, and 4-byte alignment for data (and BSS) sections. | |
2931 | Informational sections get a default alignment of 1 byte (no | |
2932 | alignment), though the value does not matter. | |
2933 | ||
2934 | The defaults assumed by NASM if you do not specify the above | |
2935 | qualifiers are: | |
2936 | ||
2937 | section .text code align=16 | |
2938 | section .data data align=4 | |
2939 | section .bss bss align=4 | |
2940 | ||
2941 | Any other section name is treated by default like `.text'. | |
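     | ||
     | A sketch of overriding these defaults follows. The section names | |
     | `hotcode' and `tables' are hypothetical, and the `/defaultlib' | |
     | option is assumed to be one the linker in use understands: | |
     | ||
     |         section hotcode code align=32 | |
     |         section tables data align=8 | |
     |         section .drectve info | |
     |         db '/defaultlib:user32.lib ' | |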
2942 | ||
2943 | 6.4 `coff': Common Object File Format | |
2944 | ||
2945 | The `coff' output type produces COFF object files suitable for | |
2946 | linking with the DJGPP linker. | |
2947 | ||
2948 | `coff' provides a default output file-name extension of `.o'. | |
2949 | ||
2950 | The `coff' format supports the same extensions to the `SECTION' | |
2951 | directive as `win32' does, except that the `align' qualifier and the | |
2952 | `info' section type are not supported. | |
2953 | ||
2954 | 6.5 `elf': Linux ELF Object Files | |
2955 | ||
2956 | The `elf' output format generates ELF32 (Executable and Linkable | |
2957 | Format) object files, as used by Linux. `elf' provides a default | |
2958 | output file-name extension of `.o'. | |
2959 | ||
2960 | 6.5.1 `elf' Extensions to the `SECTION' Directive | |
2961 | ||
2962 | Like the `obj' format, `elf' allows you to specify additional | |
2963 | information on the `SECTION' directive line, to control the type and | |
2964 | properties of sections you declare. Section types and properties are | |
2965 | generated automatically by NASM for the standard section names | |
2966 | `.text', `.data' and `.bss', but may still be overridden by these | |
2967 | qualifiers. | |
2968 | ||
2969 | The available qualifiers are: | |
2970 | ||
2971 | (*) `alloc' defines the section to be one which is loaded into | |
2972 | memory when the program is run. `noalloc' defines it to be one | |
2973 | which is not, such as an informational or comment section. | |
2974 | ||
2975 | (*) `exec' defines the section to be one which should have execute | |
2976 | permission when the program is run. `noexec' defines it as one | |
2977 | which should not. | |
2978 | ||
2979 | (*) `write' defines the section to be one which should be writable | |
2980 | when the program is run. `nowrite' defines it as one which | |
2981 | should not. | |
2982 | ||
2983 | (*) `progbits' defines the section to be one with explicit contents | |
2984 | stored in the object file: an ordinary code or data section, for | |
2985 | example. `nobits' defines the section to be one with no explicit | |
2986 | contents given, such as a BSS section. | |
2987 | ||
2988 | (*) `align=', used with a trailing number as in `obj', gives the | |
2989 | alignment requirements of the section. | |
2990 | ||
2991 | The defaults assumed by NASM if you do not specify the above | |
2992 | qualifiers are: | |
2993 | ||
2994 | section .text progbits alloc exec nowrite align=16 | |
2995 | section .data progbits alloc noexec write align=4 | |
2996 | section .bss nobits alloc noexec write align=4 | |
2997 | section other progbits alloc noexec nowrite align=1 | |
2998 | ||
2999 | (Any section name other than `.text', `.data' and `.bss' is treated | |
3000 | by default like `other' in the above code.) | |
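     | ||
     | As a sketch, sections with non-standard names can be given | |
     | explicit properties like this (the section names here are | |
     | illustrative assumptions): | |
     | ||
     |         section .rodata progbits alloc noexec nowrite align=4 | |
     |         section .comment progbits noalloc noexec nowrite | |
     |         section scratch nobits alloc noexec write align=16 | |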
3001 | ||
3002 | 6.5.2 Position-Independent Code: `elf' Special Symbols and `WRT' | |
3003 | ||
3004 | The ELF specification contains enough features to allow position- | |
3005 | independent code (PIC) to be written, which makes ELF shared | |
3006 | libraries very flexible. However, it also means NASM has to be able | |
3007 | to generate a variety of strange relocation types in ELF object | |
3008 | files, if it is to be an assembler which can write PIC. | |
3009 | ||
3010 | Since ELF does not support segment-base references, the `WRT' | |
3011 | operator is not used for its normal purpose; therefore NASM's `elf' | |
3012 | output format makes use of `WRT' for a different purpose, namely the | |
3013 | PIC-specific relocation types. | |
3014 | ||
3015 | `elf' defines five special symbols which you can use as the right- | |
3016 | hand side of the `WRT' operator to obtain PIC relocation types. They | |
3017 | are `..gotpc', `..gotoff', `..got', `..plt' and `..sym'. Their | |
3018 | functions are summarised here: | |
3019 | ||
3020 | (*) Referring to the symbol marking the global offset table base | |
3021 | using `wrt ..gotpc' will end up giving the distance from the | |
3022 | beginning of the current section to the global offset table. | |
3023 | (`_GLOBAL_OFFSET_TABLE_' is the standard symbol name used to | |
3024 | refer to the GOT.) So you would then need to add `$$' to the | |
3025 | result to get the real address of the GOT. | |
3026 | ||
3027 | (*) Referring to a location in one of your own sections using | |
3028 | `wrt ..gotoff' will give the distance from the beginning of the | |
3029 | GOT to the specified location, so that adding on the address of | |
3030 | the GOT would give the real address of the location you wanted. | |
3031 | ||
3032 | (*) Referring to an external or global symbol using `wrt ..got' | |
3033 | causes the linker to build an entry _in_ the GOT containing the | |
3034 | address of the symbol, and the reference gives the distance from | |
3035 | the beginning of the GOT to the entry; so you can add on the | |
3036 | address of the GOT, load from the resulting address, and end up | |
3037 | with the address of the symbol. | |
3038 | ||
3039 | (*) Referring to a procedure name using `wrt ..plt' causes the | |
3040 | linker to build a procedure linkage table entry for the symbol, | |
3041 | and the reference gives the address of the PLT entry. You can | |
3042 | only use this in contexts which would generate a PC-relative | |
3043 | relocation normally (i.e. as the destination for `CALL' or | |
3044 | `JMP'), since ELF contains no relocation type to refer to PLT | |
3045 | entries absolutely. | |
3046 | ||
3047 | (*) Referring to a symbol name using `wrt ..sym' causes NASM to | |
3048 | write an ordinary relocation, but instead of making the | |
3049 | relocation relative to the start of the section and then adding | |
3050 | on the offset to the symbol, it will write a relocation record | |
3051 | aimed directly at the symbol in question. The distinction is a | |
3052 | necessary one due to a peculiarity of the dynamic linker. | |
3053 | ||
3054 | A fuller explanation of how to use these relocation types to write | |
3055 | shared libraries entirely in NASM is given in section 8.2. | |
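     | ||
     | As a brief sketch of these relocation types in use, following | |
     | the conventional i386 idiom of keeping the GOT address in `EBX' | |
     | (the names `func', `localvar', `extvar' and `extfunc' are | |
     | hypothetical): | |
     | ||
     |         extern _GLOBAL_OFFSET_TABLE_ | |
     |         extern extvar, extfunc | |
     | ||
     |         section .data | |
     | localvar: dd 0 | |
     | ||
     |         section .text | |
     | func:   push ebx | |
     |         call .get_GOT | |
     | .get_GOT: | |
     |         pop ebx              ; EBX = address of .get_GOT | |
     |         add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc | |
     |         lea eax,[ebx+localvar wrt ..gotoff] ; address of local | |
     |         mov eax,[ebx+extvar wrt ..got]      ; address of extvar | |
     |         call extfunc wrt ..plt | |
     |         pop ebx | |
     |         ret | |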
3056 | ||
3057 | 6.5.3 `elf' Extensions to the `GLOBAL' Directive | |
3058 | ||
3059 | ELF object files can contain more information about a global symbol | |
3060 | than just its address: they can contain the size of the symbol and | |
3061 | its type as well. These are not merely debugger conveniences, but | |
3062 | are actually necessary when the program being written is a shared | |
3063 | library. NASM therefore supports some extensions to the `GLOBAL' | |
3064 | directive, allowing you to specify these features. | |
3065 | ||
3066 | You can specify whether a global variable is a function or a data | |
3067 | object by suffixing the name with a colon and the word `function' or | |
3068 | `data'. (`object' is a synonym for `data'.) For example: | |
3069 | ||
3070 | global hashlookup:function, hashtable:data | |
3071 | ||
3072 | exports the global symbol `hashlookup' as a function and `hashtable' | |
3073 | as a data object. | |
3074 | ||
3075 | You can also specify the size of the data associated with the | |
3076 | symbol, as a numeric expression (which may involve labels, and even | |
3077 | forward references) after the type specifier. Like this: | |
3078 | ||
3079 | global hashtable:data (hashtable.end - hashtable) | |
3080 | hashtable: | |
3081 | db this,that,theother ; some data here | |
3082 | .end: | |
3083 | ||
3084 | This makes NASM automatically calculate the length of the table and | |
3085 | place that information into the ELF symbol table. | |
3086 | ||
3087 | Declaring the type and size of global symbols is necessary when | |
3088 | writing shared library code. For more information, see section | |
3089 | 8.2.4. | |
3090 | ||
3091 | 6.5.4 `elf' Extensions to the `COMMON' Directive | |
3092 | ||
3093 | ELF also allows you to specify alignment requirements on common | |
3094 | variables. This is done by putting a number (which must be a power | |
3095 | of two) after the name and size of the common variable, separated | |
3096 | (as usual) by a colon. For example, an array of doublewords would | |
3097 | benefit from 4-byte alignment: | |
3098 | ||
3099 | common dwordarray 128:4 | |
3100 | ||
3101 | This declares the total size of the array to be 128 bytes, and | |
3102 | requires that it be aligned on a 4-byte boundary. | |
3103 | ||
3104 | 6.6 `aout': Linux `a.out' Object Files | |
3105 | ||
3106 | The `aout' format generates `a.out' object files, in the form used | |
3107 | by early Linux systems. (These differ from other `a.out' object | |
3108 | files in that the magic number in the first four bytes of the file | |
3109 | is different. Also, some implementations of `a.out', for example | |
3110 | NetBSD's, support position-independent code, which Linux's | |
3111 | implementation doesn't.) | |
3112 | ||
3113 | `aout' provides a default output file-name extension of `.o'. | |
3114 | ||
3115 | `a.out' is a very simple object format. It supports no special | |
3116 | directives, no special symbols, no use of `SEG' or `WRT', and no | |
3117 | extensions to any standard directives. It supports only the three | |
3118 | standard section names `.text', `.data' and `.bss'. | |
3119 | ||
3120 | 6.7 `aoutb': NetBSD/FreeBSD/OpenBSD `a.out' Object Files | |
3121 | ||
3122 | The `aoutb' format generates `a.out' object files, in the form used | |
3123 | by the various free BSD Unix clones, NetBSD, FreeBSD and OpenBSD. | |
3124 | For simple object files, this object format is exactly the same as | |
3125 | `aout' except for the magic number in the first four bytes of the | |
3126 | file. However, the `aoutb' format supports position-independent code | |
3127 | in the same way as the `elf' format, so you can use it to write BSD | |
3128 | shared libraries. | |
3129 | ||
3130 | `aoutb' provides a default output file-name extension of `.o'. | |
3131 | ||
3132 | `aoutb' supports no special directives, no special symbols, and only | |
3133 | the three standard section names `.text', `.data' and `.bss'. | |
3134 | However, it also supports the same use of `WRT' as `elf' does, to | |
3135 | provide position-independent code relocation types. See section | |
3136 | 6.5.2 for full documentation of this feature. | |
3137 | ||
3138 | `aoutb' also supports the same extensions to the `GLOBAL' directive | |
3139 | as `elf' does: see section 6.5.3 for documentation of this. | |
3140 | ||
3141 | 6.8 `as86': Linux `as86' Object Files | |
3142 | ||
3143 | The Linux 16-bit assembler `as86' has its own non-standard object | |
3144 | file format. Although its companion linker `ld86' produces something | |
3145 | close to ordinary `a.out' binaries as output, the object file format | |
3146 | used to communicate between `as86' and `ld86' is not itself `a.out'. | |
3147 | ||
3148 | NASM supports this format, just in case it is useful, as `as86'. | |
3149 | `as86' provides a default output file-name extension of `.o'. | |
3150 | ||
3151 | `as86' is a very simple object format (from the NASM user's point of | |
3152 | view). It supports no special directives, no special symbols, no use | |
3153 | of `SEG' or `WRT', and no extensions to any standard directives. It | |
3154 | supports only the three standard section names `.text', `.data' and | |
3155 | `.bss'. | |
3156 | ||
3157 | 6.9 `rdf': Relocatable Dynamic Object File Format | |
3158 | ||
3159 | The `rdf' output format produces RDOFF object files. RDOFF | |
3160 | (Relocatable Dynamic Object File Format) is a home-grown object-file | |
3161 | format, designed alongside NASM itself and reflecting in its file | |
3162 | format the internal structure of the assembler. | |
3163 | ||
3164 | RDOFF is not used by any well-known operating systems. Those writing | |
3165 | their own systems, however, may well wish to use RDOFF as their | |
3166 | object format, on the grounds that it is designed primarily for | |
3167 | simplicity and contains very little file-header bureaucracy. | |
3168 | ||
3169 | The Unix NASM archive, and the DOS archive which includes sources, | |
3170 | both contain an `rdoff' subdirectory holding a set of RDOFF | |
3171 | utilities: an RDF linker, an RDF static-library manager, an RDF file | |
3172 | dump utility, and a program which will load and execute an RDF | |
3173 | executable under Linux. | |
3174 | ||
3175 | `rdf' supports only the standard section names `.text', `.data' and | |
3176 | `.bss'. | |
3177 | ||
3178 | 6.9.1 Requiring a Library: The `LIBRARY' Directive | |
3179 | ||
3180 | RDOFF contains a mechanism for an object file to demand a given | |
3181 | library to be linked to the module, either at load time or run time. | |
3182 | This is done by the `LIBRARY' directive, which takes one argument | |
3183 | which is the name of the module: | |
3184 | ||
3185 | library mylib.rdl | |
3186 | ||
3187 | 6.10 `dbg': Debugging Format | |
3188 | ||
3189 | The `dbg' output format is not built into NASM in the default | |
3190 | configuration. If you are building your own NASM executable from the | |
3191 | sources, you can define `OF_DBG' in `outform.h' or on the compiler | |
3192 | command line, and obtain the `dbg' output format. | |
3193 | ||
3194 | The `dbg' format does not output an object file as such; instead, it | |
3195 | outputs a text file which contains a complete list of all the | |
3196 | transactions between the main body of NASM and the output-format | |
3197 | back end module. It is primarily intended to aid people who want to | |
3198 | write their own output drivers, so that they can get a clearer idea | |
3199 | of the various requests the main program makes of the output driver, | |
3200 | and in what order they happen. | |
3201 | ||
3202 | For simple files, one can easily use the `dbg' format like this: | |
3203 | ||
3204 | nasm -f dbg filename.asm | |
3205 | ||
3206 | which will generate a diagnostic file called `filename.dbg'. | |
3207 | However, this will not work well on files which were designed for a | |
3208 | different object format, because each object format defines its own | |
3209 | macros (usually user-level forms of directives), and those macros | |
3210 | will not be defined in the `dbg' format. Therefore it can be useful | |
3211 | to run NASM twice, in order to do the preprocessing with the native | |
3212 | object format selected: | |
3213 | ||
3214 | nasm -e -f rdf -o rdfprog.i rdfprog.asm | |
3215 | nasm -a -f dbg rdfprog.i | |
3216 | ||
3217 | This preprocesses `rdfprog.asm' into `rdfprog.i', keeping the `rdf' | |
3218 | object format selected in order to make sure RDF special directives | |
3219 | are converted into primitive form correctly. Then the preprocessed | |
3220 | source is fed through the `dbg' format to generate the final | |
3221 | diagnostic output. | |
3222 | ||
3223 | This workaround will still typically not work for programs intended | |
3224 | for `obj' format, because the `obj' `SEGMENT' and `GROUP' directives | |
3225 | have side effects of defining the segment and group names as | |
3226 | symbols; `dbg' will not do this, so the program will not assemble. | |
3227 | You will have to work around that by defining the symbols yourself | |
3228 | (using `EXTERN', for example) if you really need to get a `dbg' | |
3229 | trace of an `obj'-specific source file. | |
3230 | ||
3231 | `dbg' accepts any section name and any directives at all, and logs | |
3232 | them all to its output file. | |
3233 | ||
3234 | Chapter 7: Writing 16-bit Code (DOS, Windows 3/3.1) | |
3235 | --------------------------------------------------- | |
3236 | ||
3237 | This chapter attempts to cover some of the common issues encountered | |
3238 | when writing 16-bit code to run under MS-DOS or Windows 3.x. It | |
3239 | covers how to link programs to produce `.EXE' or `.COM' files, how | |
3240 | to write `.SYS' device drivers, and how to interface assembly | |
3241 | language code with 16-bit C compilers and with Borland Pascal. | |
3242 | ||
3243 | 7.1 Producing `.EXE' Files | |
3244 | ||
3245 | Any large program written under DOS needs to be built as a `.EXE' | |
3246 | file: only `.EXE' files have the necessary internal structure | |
3247 | required to span more than one 64K segment. Windows programs, also, | |
3248 | have to be built as `.EXE' files, since Windows does not support the | |
3249 | `.COM' format. | |
3250 | ||
3251 | In general, you generate `.EXE' files by using the `obj' output | |
3252 | format to produce one or more `.OBJ' files, and then linking them | |
3253 | together using a linker. However, NASM also supports the direct | |
3254 | generation of simple DOS `.EXE' files using the `bin' output format | |
3255 | (by using `DB' and `DW' to construct the `.EXE' file header), and a | |
3256 | macro package is supplied to do this. Thanks to Yann Guidon for | |
3257 | contributing the code for this. | |
3258 | ||
3259 | NASM may also support `.EXE' natively as another output format in | |
3260 | future releases. | |
3261 | ||
3262 | 7.1.1 Using the `obj' Format To Generate `.EXE' Files | |
3263 | ||
3264 | This section describes the usual method of generating `.EXE' files | |
3265 | by linking `.OBJ' files together. | |
3266 | ||
3267 | Most 16-bit programming language packages come with a suitable | |
3268 | linker; if you have none of these, there is a free linker called | |
3269 | VAL, available in `LZH' archive format from `x2ftp.oulu.fi'. An LZH | |
3270 | archiver can be found at `ftp.simtel.net'. There is another `free' | |
3271 | linker (though this one doesn't come with sources) called FREELINK, | |
3272 | available from `www.pcorner.com'. A third, `djlink', written by DJ | |
3273 | Delorie, is available at `www.delorie.com'. | |
3274 | ||
3275 | When linking several `.OBJ' files into a `.EXE' file, you should | |
3276 | ensure that exactly one of them has a start point defined (using the | |
3277 | `..start' special symbol defined by the `obj' format: see section | |
3278 | 6.2.6). If no module defines a start point, the linker will not know | |
3279 | what value to give the entry-point field in the output file header; | |
3280 | if more than one defines a start point, the linker will not know | |
3281 | _which_ value to use. | |
3282 | ||
3283 | An example of a NASM source file which can be assembled to a `.OBJ' | |
3284 | file and linked on its own to a `.EXE' is given here. It | |
3285 | demonstrates the basic principles of defining a stack, initialising | |
3286 | the segment registers, and declaring a start point. This file is | |
3287 | also provided in the `test' subdirectory of the NASM archives, under | |
3288 | the name `objexe.asm'. | |
3289 | ||
3290 | segment code | |
3291 | ||
3292 | ..start: mov ax,data | |
3293 | mov ds,ax | |
3294 | mov ax,stack | |
3295 | mov ss,ax | |
3296 | mov sp,stacktop | |
3297 | ||
3298 | This initial piece of code sets up `DS' to point to the data | |
3299 | segment, and initialises `SS' and `SP' to point to the top of the | |
3300 | provided stack. Notice that interrupts are implicitly disabled for | |
3301 | one instruction after a move into `SS', precisely for this | |
3302 | situation, so that there's no chance of an interrupt occurring | |
3303 | between the loads of `SS' and `SP' and not having a stack to execute | |
3304 | on. | |
3305 | ||
3306 | Note also that the special symbol `..start' is defined at the | |
3307 | beginning of this code, which makes it the entry point into the | |
3308 | resulting executable file. | |
3309 | ||
3310 | mov dx,hello | |
3311 | mov ah,9 | |
3312 | int 0x21 | |
3313 | ||
3314 | The above is the main program: load `DS:DX' with a pointer to the | |
3315 | greeting message (`hello' is implicitly relative to the segment | |
3316 | `data', which was loaded into `DS' in the setup code, so the full | |
3317 | pointer is valid), and call the DOS print-string function. | |
3318 | ||
3319 | mov ax,0x4c00 | |
3320 | int 0x21 | |
3321 | ||
3322 | This terminates the program using another DOS system call. | |
3323 | ||
3324 | segment data | |
3325 | hello: db 'hello, world', 13, 10, '$' | |
3326 | ||
3327 | The data segment contains the string we want to display. | |
3328 | ||
3329 | segment stack stack | |
3330 | resb 64 | |
3331 | stacktop: | |
3332 | ||
3333 | The above code declares a stack segment containing 64 bytes of | |
3334 | uninitialised stack space, and points `stacktop' at the top of it. | |
3335 | The directive `segment stack stack' defines a segment _called_ | |
3336 | `stack', and also of _type_ `STACK'. The latter is not necessary to | |
3337 | the correct running of the program, but linkers are likely to issue | |
3338 | warnings or errors if your program has no segment of type `STACK'. | |
3339 | ||
3340 | The above file, when assembled into a `.OBJ' file, will link on its | |
3341 | own to a valid `.EXE' file, which when run will print `hello, world' | |
3342 | and then exit. | |
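
For reference, the fragments discussed above join up into the following single source file. Nothing here is new; only the comments have been added. It should assemble with something like `nasm -fobj objexe.asm', after which the link step depends on whichever linker you are using.

```nasm
; objexe.asm: the fragments above, joined into one source file
        segment code

..start:
        mov     ax,data
        mov     ds,ax           ; DS -> data segment
        mov     ax,stack
        mov     ss,ax           ; interrupts are held off for one instruction
        mov     sp,stacktop     ; SS:SP -> top of our stack

        mov     dx,hello        ; DS:DX -> '$'-terminated message
        mov     ah,9            ; DOS function 9: print string
        int     0x21

        mov     ax,0x4c00       ; DOS function 4Ch: exit, return code 0
        int     0x21

        segment data
hello:  db      'hello, world', 13, 10, '$'

        segment stack stack
        resb    64              ; 64 bytes of uninitialised stack space
stacktop:
```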
3343 | ||
3344 | 7.1.2 Using the `bin' Format To Generate `.EXE' Files | |
3345 | ||
3346 | The `.EXE' file format is simple enough that it's possible to build | |
3347 | a `.EXE' file by writing a pure-binary program and sticking a 32- | |
3348 | byte header on the front. This header is simple enough that it can | |
3349 | be generated using `DB' and `DW' commands by NASM itself, so that | |
3350 | you can use the `bin' output format to directly generate `.EXE' | |
3351 | files. | |
3352 | ||
3353 | Included in the NASM archives, in the `misc' subdirectory, is a file | |
3354 | `exebin.mac' of macros. It defines three macros: `EXE_begin', | |
3355 | `EXE_stack' and `EXE_end'. | |
3356 | ||
3357 | To produce a `.EXE' file using this method, you should start by | |
3358 | using `%include' to load the `exebin.mac' macro package into your | |
3359 | source file. You should then issue the `EXE_begin' macro call (which | |
3360 | takes no arguments) to generate the file header data. Then write | |
3361 | code as normal for the `bin' format - you can use all three standard | |
3362 | sections `.text', `.data' and `.bss'. At the end of the file you | |
3363 | should call the `EXE_end' macro (again, no arguments), which defines | |
3364 | some symbols to mark section sizes, and these symbols are referred | |
3365 | to in the header code generated by `EXE_begin'. | |
3366 | ||
3367 | In this model, the code you end up writing starts at `0x100', just | |
3368 | like a `.COM' file - in fact, if you strip off the 32-byte header | |
3369 | from the resulting `.EXE' file, you will have a valid `.COM' | |
3370 | program. All the segment bases are the same, so you are limited to a | |
3371 | 64K program, again just like a `.COM' file. Note that an `ORG' | |
3372 | directive is issued by the `EXE_begin' macro, so you should not | |
3373 | explicitly issue one of your own. | |
3374 | ||
3375 | You can't directly refer to your segment base value, unfortunately, | |
3376 | since this would require a relocation in the header, and things | |
3377 | would get a lot more complicated. So you should get your segment | |
3378 | base by copying it out of `CS' instead. | |
3379 | ||
3380 | On entry to your `.EXE' file, `SS:SP' are already set up to point to | |
3381 | the top of a 2KB stack. You can adjust the default stack size of 2KB | |
3382 | by calling the `EXE_stack' macro. For example, to change the stack | |
3383 | size of your program to 64 bytes, you would call `EXE_stack 64'. | |
3384 | ||
3385 | A sample program which generates a `.EXE' file in this way is given | |
3386 | in the `test' subdirectory of the NASM archive, as `binexe.asm'. | |
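
The steps described in this section can be sketched as follows. This is a minimal illustration rather than a copy of the supplied sample: it assumes `exebin.mac' can be found by `%include', and the placement of `EXE_stack' straight after `EXE_begin' is an assumption, since the text above does not say where the call must go.

```nasm
; Sketch of a `bin'-format .EXE built with the exebin.mac macros.
%include "exebin.mac"

        EXE_begin               ; emits the header data and issues the ORG
        EXE_stack 64            ; optional: override the default stack size

        section .text
        mov     ax,cs           ; no relocations available, so take the
        mov     ds,ax           ; segment base from CS
        mov     dx,hello        ; DS:DX -> '$'-terminated message
        mov     ah,9            ; DOS function 9: print string
        int     0x21
        mov     ax,0x4c00       ; DOS function 4Ch: exit, return code 0
        int     0x21

        section .data
hello:  db      'hello, world', 13, 10, '$'

        EXE_end                 ; defines the size symbols the header uses
```

Because the output is still produced by the `bin' format, you would give the output file name explicitly, along the lines of `nasm myprog.asm -fbin -o myprog.exe'.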
3387 | ||
3388 | 7.2 Producing `.COM' Files | |
3389 | ||
3390 | While large DOS programs must be written as `.EXE' files, small ones | |
3391 | are often better written as `.COM' files. `.COM' files are pure | |
3392 | binary, and therefore most easily produced using the `bin' output | |
3393 | format. | |
3394 | ||
3395 | 7.2.1 Using the `bin' Format To Generate `.COM' Files | |
3396 | ||
3397 | `.COM' files expect to be loaded at offset `100h' into their segment | |
3398 | (though the segment itself may differ from run to run). Execution | |
3399 | then begins at `100h', i.e. right at the start of the program. So to | |
3400 | write a `.COM' program, you would create a source file looking like | |
3401 | ||
3402 | org 100h | |
3403 | section .text | |
3404 | start: ; put your code here | |
3405 | section .data | |
3406 | ; put data items here | |
3407 | section .bss | |
3408 | ; put uninitialised data here | |
3409 | ||
3410 | The `bin' format puts the `.text' section first in the file, so you | |
3411 | can declare data or BSS items before beginning to write code if you | |
3412 | want to, and the code will still end up at the front of the file | |
3413 | where it belongs. | |
3414 | ||
3415 | The BSS (uninitialised data) section does not take up space in the | |
3416 | `.COM' file itself: instead, addresses of BSS items are resolved to | |
3417 | point at space beyond the end of the file, on the grounds that this | |
3418 | will be free memory when the program is run. Therefore you should | |
3419 | not rely on your BSS being initialised to all zeros when you run. | |
3420 | ||
3421 | To assemble the above program, you should use a command line like | |
3422 | ||
3423 | nasm myprog.asm -fbin -o myprog.com | |
3424 | ||
3425 | The `bin' format would produce a file called `myprog' if no explicit | |
3426 | output file name were specified, so you have to override it and give | |
3427 | the desired file name. | |
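
As a concrete instance of the skeleton above, here is a complete `.COM' program that prints a greeting. It reuses the two DOS calls from the `.EXE' example in section 7.1.1; no segment registers need setting up, because they all hold the same value when a `.COM' file is loaded.

```nasm
; myprog.asm: a minimal `.COM' program for the `bin' format
        org     100h

        section .text
start:  mov     dx,hello        ; DS:DX -> '$'-terminated message
        mov     ah,9            ; DOS function 9: print string
        int     0x21
        mov     ax,0x4c00       ; DOS function 4Ch: exit, return code 0
        int     0x21

        section .data
hello:  db      'hello, world', 13, 10, '$'
```

This assembles with the command line shown above.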
3428 | ||
3429 | 7.2.2 Using the `obj' Format To Generate `.COM' Files | |
3430 | ||
3431 | If you are writing a `.COM' program as more than one module, you may | |
3432 | wish to assemble several `.OBJ' files and link them together into a | |
3433 | `.COM' program. You can do this, provided you have a linker capable | |
3434 | of outputting `.COM' files directly (TLINK does this), or | |
3435 | alternatively a converter program such as `EXE2BIN' to transform the | |
3436 | `.EXE' file output from the linker into a `.COM' file. | |
3437 | ||
3438 | If you do this, you need to take care of several things: | |
3439 | ||
3440 | (*) The first object file containing code should start its code | |
3441 | segment with a line like `RESB 100h'. This is to ensure that the | |
3442 | code begins at offset `100h' relative to the beginning of the | |
3443 | code segment, so that the linker or converter program does not | |
3444 | have to adjust address references within the file when | |
3445 | generating the `.COM' file. Other assemblers use an `ORG' | |
3446 | directive for this purpose, but `ORG' in NASM is a format- | |
3447 | specific directive to the `bin' output format, and does not mean | |
3448 | the same thing as it does in MASM-compatible assemblers. | |
3449 | ||
3450 | (*) You don't need to define a stack segment. | |
3451 | ||
3452 | (*) All your segments should be in the same group, so that every | |
3453 | time your code or data references a symbol offset, all offsets | |
3454 | are relative to the same segment base. This is because, when a | |
3455 | `.COM' file is loaded, all the segment registers contain the | |
3456 | same value. | |
3457 | ||
3458 | 7.3 Producing `.SYS' Files | |
3459 | ||
3460 | MS-DOS device drivers - `.SYS' files - are pure binary files, | |
3461 | similar to `.COM' files, except that they start at origin zero | |
3462 | rather than `100h'. Therefore, if you are writing a device driver | |
3463 | using the `bin' format, you do not need the `ORG' directive, since | |
3464 | the default origin for `bin' is zero. Similarly, if you are using | |
3465 | `obj', you do not need the `RESB 100h' at the start of your code | |
3466 | segment. | |
3467 | ||
3468 | `.SYS' files start with a header structure, containing pointers to | |
3469 | the various routines inside the driver which do the work. This | |
3470 | structure should be defined at the start of the code segment, even | |
3471 | though it is not actually code. | |
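
A rough sketch of how such a header might be laid out in the `bin' format is given below. The field order follows the commonly documented MS-DOS device header and is not taken from this manual, so check it against one of the references mentioned below. The device name `MYDEV' and the labels `strategy', `interrupt' and `reqptr' are placeholders, and the interrupt routine is a stub that does no real work.

```nasm
; Skeleton of a character device driver (bin format, default origin 0).
        section .text

header: dd      -1              ; link to next driver; DOS fills this in
        dw      0x8000          ; attribute word: bit 15 set = character device
        dw      strategy        ; offset of the strategy routine
        dw      interrupt       ; offset of the interrupt routine
        db      'MYDEV   '      ; device name, space-padded to 8 bytes

strategy:                       ; entered with ES:BX -> request header
        mov     [cs:reqptr],bx
        mov     [cs:reqptr+2],es
        retf

interrupt:                      ; a real driver would process the saved
        retf                    ; request here; this stub does nothing

reqptr: dd      0               ; far pointer saved by the strategy routine
```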
3472 | ||
3473 | For more information on the format of `.SYS' files, and the data | |
3474 | which has to go in the header structure, a list of books is given in | |
3475 | the Frequently Asked Questions list for the newsgroup | |
3476 | `comp.os.msdos.programmer'. | |
3477 | ||
3478 | 7.4 Interfacing to 16-bit C Programs | |
3479 | ||
3480 | This section covers the basics of writing assembly routines that | |
3481 | call, or are called from, C programs. To do this, you would | |
3482 | typically write an assembly module as a `.OBJ' file, and link it | |
3483 | with your C modules to produce a mixed-language program. | |
3484 | ||
3485 | 7.4.1 External Symbol Names | |
3486 | ||
3487 | C compilers have the convention that the names of all global symbols | |
3488 | (functions or data) they define are formed by prefixing an | |
3489 | underscore to the name as it appears in the C program. So, for | |
3490 | example, the function a C programmer thinks of as `printf' appears | |
3491 | to an assembly language programmer as `_printf'. This means that in | |
3492 | your assembly programs, you can define symbols without a leading | |
3493 | underscore, and not have to worry about name clashes with C symbols. | |
3494 | ||
3495 | If you find the underscores inconvenient, you can define macros to | |
3496 | replace the `GLOBAL' and `EXTERN' directives as follows: | |
3497 | ||
3498 | %macro cglobal 1 | |
3499 | global _%1 | |
3500 | %define %1 _%1 | |
3501 | %endmacro | |
3502 | ||
3503 | %macro cextern 1 | |
3504 | extern _%1 | |
3505 | %define %1 _%1 | |
3506 | %endmacro | |
3507 | ||
3508 | (These forms of the macros only take one argument at a time; a | |
3509 | `%rep' construct could solve this.) | |
3510 | ||
3511 | If you then declare an external like this: | |
3512 | ||
3513 | cextern printf | |
3514 | ||
3515 | then the macro will expand it as | |
3516 | ||
3517 | extern _printf | |
3518 | %define printf _printf | |
3519 | ||
3520 | Thereafter, you can reference `printf' as if it were a symbol, and | |
3521 | the preprocessor will put the leading underscore on where necessary. | |
3522 | ||
3523 | The `cglobal' macro works similarly. You must use `cglobal' before | |
3524 | defining the symbol in question, but you would have had to do that | |
3525 | anyway if you used `GLOBAL'. | |
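
The `%rep' idea mentioned earlier in this section might look like the following sketch, which lets each macro accept any number of names. It relies on the preprocessor's `1-*' parameter count, `%0' and `%rotate'; the behaviour for each individual name is the same as in the single-argument versions.

```nasm
; Multi-argument variants of cglobal and cextern.
; %0 is the number of arguments; %rotate 1 makes the next one become %1.
%macro  cglobal 1-*
%rep    %0
        global  _%1
%define %1 _%1
%rotate 1
%endrep
%endmacro

%macro  cextern 1-*
%rep    %0
        extern  _%1
%define %1 _%1
%rotate 1
%endrep
%endmacro

        cextern printf, puts    ; declares _printf and _puts in one go
```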
3526 | ||
3527 | 7.4.2 Memory Models | |
3528 | ||
3529 | NASM contains no mechanism to support the various C memory models | |
3530 | directly; you have to keep track yourself of which one you are | |
3531 | writing for. This means you have to keep track of the following | |
3532 | things: | |
3533 | ||
3534 | (*) In models using a single code segment (tiny, small and compact), | |
3535 | functions are near. This means that function pointers, when | |
3536 | stored in data segments or pushed on the stack as function | |
3537 | arguments, are 16 bits long and contain only an offset field | |
3538 | (the `CS' register never changes its value, and always gives the | |
3539 | segment part of the full function address), and that functions | |
3540 | are called using ordinary near `CALL' instructions and return | |
3541 | using `RETN' (which, in NASM, is synonymous with `RET' anyway). | |
3542 | This means both that you should write your own routines to | |
3543 | return with `RETN', and that you should call external C routines | |
3544 | with near `CALL' instructions. | |
3545 | ||
3546 | (*) In models using more than one code segment (medium, large and | |
3547 | huge), functions are far. This means that function pointers are | |
3548 | 32 bits long (consisting of a 16-bit offset followed by a 16-bit | |
3549 | segment), and that functions are called using `CALL FAR' (or | |
3550 | `CALL seg:offset') and return using `RETF'. Again, you should | |
3551 | therefore write your own routines to return with `RETF' and use | |
3552 | `CALL FAR' to call external routines. | |
3553 | ||
3554 | (*) In models using a single data segment (tiny, small and medium), | |
3555 | data pointers are 16 bits long, containing only an offset field | |
3556 | (the `DS' register doesn't change its value, and always gives | |
3557 | the segment part of the full data item address). | |
3558 | ||
3559 | (*) In models using more than one data segment (compact, large and | |
3560 | huge), data pointers are 32 bits long, consisting of a 16-bit | |
3561 | offset followed by a 16-bit segment. You should still be careful | |
3562 | not to modify `DS' in your routines without restoring it | |
3563 | afterwards, but `ES' is free for you to use to access the | |
3564 | contents of 32-bit data pointers you are passed. | |
3565 | ||
3566 | (*) The huge memory model allows single data items to exceed 64K in | |
3567 | size. In all other memory models, you can access the whole of a | |
3568 | data item just by doing arithmetic on the offset field of the | |
3569 | pointer you are given, whether a segment field is present or | |
3570 | not; in huge model, you have to be more careful of your pointer | |
3571 | arithmetic. | |
3572 | ||
3573 | (*) In most memory models, there is a _default_ data segment, whose | |
3574 | segment address is kept in `DS' throughout the program. This | |
3575 | data segment is typically the same segment as the stack, kept in | |
3576 | `SS', so that functions' local variables (which are stored on | |
3577 | the stack) and global data items can both be accessed easily | |
3578 | without changing `DS'. Particularly large data items are | |
3579 | typically stored in other segments. However, some memory models | |
3580 | (though not the standard ones, usually) allow the assumption | |
3581 | that `SS' and `DS' hold the same value to be removed. Be careful | |
3582 | about functions' local variables in this latter case. | |
3583 | ||
3584 | In models with a single code segment, the segment is called `_TEXT', | |
3585 | so your code segment must also go by this name in order to be linked | |
3586 | into the same place as the main code segment. In models with a | |
3587 | single data segment, or with a default data segment, it is called | |
3588 | `_DATA'. | |
3589 | ||
3590 | 7.4.3 Function Definitions and Function Calls | |
3591 | ||
3592 | The C calling convention in 16-bit programs is as follows. In the | |
3593 | following description, the words _caller_ and _callee_ are used to | |
3594 | denote the function doing the calling and the function which gets | |
3595 | called. | |
3596 | ||
3597 | (*) The caller pushes the function's parameters on the stack, one | |
3598 | after another, in reverse order (right to left, so that the | |
3599 | first argument specified to the function is pushed last). | |
3600 | ||
3601 | (*) The caller then executes a `CALL' instruction to pass control to | |
3602 | the callee. This `CALL' is either near or far depending on the | |
3603 | memory model. | |
3604 | ||
3605 | (*) The callee receives control, and typically (although this is not | |
3606 | actually necessary, in functions which do not need to access | |
3607 | their parameters) starts by saving the value of `SP' in `BP' so | |
3608 | as to be able to use `BP' as a base pointer to find its | |
3609 | parameters on the stack. However, the caller was probably doing | |
3610 | this too, so part of the calling convention states that `BP' | |
3611 | must be preserved by any C function. Hence the callee, if it is | |
3612 | going to set up `BP' as a _frame pointer_, must push the | |
3613 | previous value first. | |
3614 | ||
3615 | (*) The callee may then access its parameters relative to `BP'. The | |
3616 | word at `[BP]' holds the previous value of `BP' as it was | |
3617 | pushed; the next word, at `[BP+2]', holds the offset part of the | |
3618 | return address, pushed implicitly by `CALL'. In a small-model | |
3619 | (near) function, the parameters start after that, at `[BP+4]'; | |
3620 | in a large-model (far) function, the segment part of the return | |
3621 | address lives at `[BP+4]', and the parameters begin at `[BP+6]'. | |
3622 | The leftmost parameter of the function, since it was pushed | |
3623 | last, is accessible at this offset from `BP'; the others follow, | |
3624 | at successively greater offsets. Thus, in a function such as | |
3625 | `printf' which takes a variable number of parameters, the | |
3626 | pushing of the parameters in reverse order means that the | |
3627 | function knows where to find its first parameter, which tells it | |
3628 | the number and type of the remaining ones. | |
3629 | ||
3630 | (*) The callee may also wish to decrease `SP' further, so as to | |
3631 | allocate space on the stack for local variables, which will then | |
3632 | be accessible at negative offsets from `BP'. | |
3633 | ||
3634 | (*) The callee, if it wishes to return a value to the caller, should | |
3635 | leave the value in `AL', `AX' or `DX:AX' depending on the size | |
3636 | of the value. Floating-point results are sometimes (depending on | |
3637 | the compiler) returned in `ST0'. | |
3638 | ||
3639 | (*) Once the callee has finished processing, it restores `SP' from | |
3640 | `BP' if it had allocated local stack space, then pops the | |
3641 | previous value of `BP', and returns via `RETN' or `RETF' | |
3642 | depending on memory model. | |
3643 | ||
3644 | (*) When the caller regains control from the callee, the function | |
3645 | parameters are still on the stack, so it typically adds an | |
3646 | immediate constant to `SP' to remove them (instead of executing | |
3647 | a number of slow `POP' instructions). Thus, if a function is | |
3648 | accidentally called with the wrong number of parameters due to a | |
3649 | prototype mismatch, the stack will still be returned to a | |
3650 | sensible state since the caller, which _knows_ how many | |
3651 | parameters it pushed, does the removing. | |
3652 | ||
3653 | It is instructive to compare this calling convention with that for | |
3654 | Pascal programs (described in section 7.5.1). Pascal has a simpler | |
3655 | convention, since no functions have variable numbers of parameters. | |
3656 | Therefore the callee knows how many parameters it should have been | |
3657 | passed, and is able to deallocate them from the stack itself by | |
3658 | passing an immediate argument to the `RET' or `RETF' instruction, so | |
3659 | the caller does not have to do it. Also, the parameters are pushed | |
3660 | in left-to-right order, not right-to-left, which means that a | |
3661 | compiler can give better guarantees about sequence points without | |
3662 | performance suffering. | |
3663 | ||
3664 | Thus, you would define a function in C style in the following way. | |
3665 | The following example is for small model: | |
3666 | ||
3667 | global _myfunc | |
3668 | _myfunc: push bp | |
3669 | mov bp,sp | |
3670 | sub sp,0x40 ; 64 bytes of local stack space | |
3671 | mov bx,[bp+4] ; first parameter to function | |
3672 | ; some more code | |
3673 | mov sp,bp ; undo "sub sp,0x40" above | |
3674 | pop bp | |
3675 | ret | |
3676 | ||
3677 | For a large-model function, you would replace `RET' by `RETF', and | |
3678 | look for the first parameter at `[BP+6]' instead of `[BP+4]'. Of | |
3679 | course, if one of the parameters is a pointer, then the offsets of | |
3680 | _subsequent_ parameters will change depending on the memory model as | |
3681 | well: far pointers take up four bytes on the stack when passed as a | |
3682 | parameter, whereas near pointers take up two. | |
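
To illustrate both points, here is a sketch of a large-model function that takes a pointer followed by an integer. The name `addvia' is invented for this example; in C terms it would be `int addvia(int *p, int n)', returning `*p + n'. The far pointer occupies four bytes starting at `[BP+6]', so the integer is found at `[BP+10]'.

```nasm
; Large-model sketch of: int addvia(int *p, int n) { return *p + n; }
        global  _addvia
_addvia:
        push    bp
        mov     bp,sp
        les     bx,[bp+6]       ; p: offset at [bp+6], segment at [bp+8]
        mov     ax,[es:bx]      ; AX = *p, read through ES
        add     ax,[bp+10]      ; n follows the four-byte far pointer
        pop     bp
        retf                    ; far return, as large model requires
```

In small model the same function would find `p' as a two-byte near pointer at `[BP+4]' and `n' at `[BP+6]', and would return with `RET'.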
3683 | ||
3684 | At the other end of the process, to call a C function from your | |
3685 | assembly code, you would do something like this: | |
3686 | ||
3687 | extern _printf | |
3688 | ; and then, further down... | |
3689 | push word [myint] ; one of my integer variables | |
3690 | push word mystring ; pointer into my data segment | |
3691 | call _printf | |
3692 | add sp,byte 4 ; `byte' saves space | |
3693 | ; then those data items... | |
3694 | segment _DATA | |
3695 | myint dw 1234 | |
3696 | mystring db 'This number -> %d <- should be 1234',10,0 | |
3697 | ||
3698 | This piece of code is the small-model assembly equivalent of the C | |
3699 | code | |
3700 | ||
3701 | int myint = 1234; | |
3702 | printf("This number -> %d <- should be 1234\n", myint); | |
3703 | ||
3704 | In large model, the function-call code might look more like this. In | |
3705 | this example, it is assumed that `DS' already holds the segment base | |
3706 | of the segment `_DATA'. If not, you would have to initialise it | |
3707 | first. | |
3708 | ||
3709 | push word [myint] | |
3710 | push word seg mystring ; Now push the segment, and... | |
3711 | push word mystring ; ... offset of "mystring" | |
3712 | call far _printf | |
3713 | add sp,byte 6 | |
3714 | ||
3715 | The integer value still takes up one word on the stack, since large | |
3716 | model does not affect the size of the `int' data type. The first | |
3717 | argument (pushed last) to `printf', however, is a data pointer, and | |
3718 | therefore has to contain a segment and offset part. The segment | |
3719 | should be stored second in memory, and therefore must be pushed | |
3720 | first. (Of course, `PUSH DS' would have been a shorter instruction | |
3721 | than `PUSH WORD SEG mystring', if `DS' was set up as the above | |
3722 | example assumed.) Then the actual call becomes a far call, since | |
3723 | functions expect far calls in large model; and `SP' has to be | |
3724 | increased by 6 rather than 4 afterwards to make up for the extra | |
3725 | word of parameters. | |
3726 | ||
3727 | 7.4.4 Accessing Data Items | |
3728 | ||
3729 | To get at the contents of C variables, or to declare variables which | |
3730 | C can access, you need only declare the names as `GLOBAL' or | |
3731 | `EXTERN'. (Again, the names require leading underscores, as stated | |
3732 | in section 7.4.1.) Thus, a C variable declared as `int i' can be | |
3733 | accessed from assembler as | |
3734 | ||
3735 | extern _i | |
3736 | mov ax,[_i] | |
3737 | ||
3738 | And to declare your own integer variable which C programs can access | |
3739 | as `extern int j', you do this (making sure you are assembling in | |
3740 | the `_DATA' segment, if necessary): | |
3741 | ||
3742 | global _j | |
3743 | _j dw 0 | |
3744 | ||
3745 | To access a C array, you need to know the size of the components of | |
3746 | the array. For example, `int' variables are two bytes long, so if a | |
3747 | C program declares an array as `int a[10]', you can access `a[3]' by | |
3748 | coding `mov ax,[_a+6]'. (The byte offset 6 is obtained by | |
3749 | multiplying the desired array index, 3, by the size of the array | |
3750 | element, 2.) The sizes of the C base types in 16-bit compilers are: | |
3751 | 1 for `char', 2 for `short' and `int', 4 for `long' and `float', and | |
3752 | 8 for `double'. | |
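
When the index is only known at run time, the same multiplication has to be done in code. The following small-model sketch assumes the C array `int a[10]' from above; the function name `get' is made up for the example and corresponds to `int get(int k) { return a[k]; }'.

```nasm
; Small-model sketch: return a[k] for the C array `int a[10]'
        extern  _a
        global  _get
_get:   push    bp
        mov     bp,sp
        mov     bx,[bp+4]       ; BX = k, the first parameter
        shl     bx,1            ; scale the index by the element size, 2
        mov     ax,[_a+bx]      ; AX = a[k]
        pop     bp
        ret
```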
3753 | ||
3754 | To access a C data structure, you need to know the offset from the | |
3755 | base of the structure to the field you are interested in. You can | |
3756 | either do this by converting the C structure definition into a NASM | |
3757 | structure definition (using `STRUC'), or by calculating the one | |
3758 | offset and using just that. | |
3759 | ||
3760 | To do either of these, you should read your C compiler's manual to | |
3761 | find out how it organises data structures. NASM gives no special | |
3762 | alignment to structure members in its own `STRUC' macro, so you have | |
3763 | to specify alignment yourself if the C compiler generates it. | |
3764 | Typically, you might find that a structure like | |
3765 | ||
3766 | struct { | |
3767 | char c; | |
3768 | int i; | |
3769 | } foo; | |
3770 | ||
3771 | might be four bytes long rather than three, since the `int' field | |
3772 | would be aligned to a two-byte boundary. However, this sort of | |
3773 | feature tends to be a configurable option in the C compiler, either | |
3774 | using command-line options or `#pragma' lines, so you have to find | |
3775 | out how your own compiler does it. | |
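
Assuming the compiler does align the `int' field to a two-byte boundary, the structure above could be mirrored in NASM as in the sketch below. The names `foo_t', `foo_c', `foo_pad' and `foo_i' are chosen for this example only; if your compiler packs structures, drop the padding byte.

```nasm
; NASM view of `struct { char c; int i; } foo' with two-byte alignment
        struc   foo_t
foo_c:   resb    1              ; char c at offset 0
foo_pad: resb    1              ; padding inserted by the C compiler
foo_i:   resw    1              ; int i at offset 2
        endstruc                ; also defines foo_t_size, 4 in this layout

        extern  _foo
        mov     al,[_foo+foo_c] ; AL = foo.c
        mov     bx,[_foo+foo_i] ; BX = foo.i
```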
3776 | ||
3777 | 7.4.5 `c16.mac': Helper Macros for the 16-bit C Interface | |
3778 | ||
3779 | Included in the NASM archives, in the `misc' directory, is a file | |
3780 | `c16.mac' of macros. It defines three macros: `proc', `arg' and | |
3781 | `endproc'. These are intended to be used for C-style procedure | |
3782 | definitions, and they automate a lot of the work involved in keeping | |
3783 | track of the calling convention. | |
3784 | ||
3785 | An example of an assembly function using the macro set is given | |
3786 | here: | |
3787 | ||
3788 | proc _nearproc | |
3789 | %$i arg | |
3790 | %$j arg | |
3791 | mov ax,[bp + %$i] | |
3792 | mov bx,[bp + %$j] | |
3793 | add ax,[bx] | |
3794 | endproc | |
3795 | ||
3796 | This defines `_nearproc' to be a procedure taking two arguments, the | |
3797 | first (`i') an integer and the second (`j') a pointer to an integer. | |
3798 | It returns `i + *j'. | |
3799 | ||
3800 | Note that the `arg' macro has an `EQU' as the first line of its | |
3801 | expansion, and since the label before the macro call gets prepended | |
3802 | to the first line of the expanded macro, the `EQU' works, defining | |
3803 | `%$i' to be an offset from `BP'. A context-local variable is used, | |
3804 | local to the context pushed by the `proc' macro and popped by the | |
3805 | `endproc' macro, so that the same argument name can be used in later | |
3806 | procedures. Of course, you don't _have_ to do that. | |
3807 | ||
3808 | The macro set produces code for near functions (tiny, small and | |
3809 | compact-model code) by default. You can have it generate far | |
3810 | functions (medium, large and huge-model code) by means of coding | |
3811 | `%define FARCODE'. This changes the kind of return instruction | |
3812 | generated by `endproc', and also changes the starting point for the | |
3813 | argument offsets. The macro set contains no intrinsic dependency on | |
3814 | whether data pointers are far or not. | |
3815 | ||
3816 | `arg' can take an optional parameter, giving the size of the | |
3817 | argument. If no size is given, 2 is assumed, since it is likely that | |
3818 | many function parameters will be of type `int'. | |
3819 | ||
3820 | The large-model equivalent of the above function would look like | |
3821 | this: | |
3822 | ||
3823 | %define FARCODE | |
3824 | proc _farproc | |
3825 | %$i arg | |
3826 | %$j arg 4 | |
3827 | mov ax,[bp + %$i] | |
3828 | mov bx,[bp + %$j] | |
3829 | mov es,[bp + %$j + 2] | |
3830 | add ax,[es:bx] | |
3831 | endproc | |
3832 | ||
3833 | This makes use of the argument to the `arg' macro to define a | |
3834 | parameter of size 4, because `j' is now a far pointer. When we load | |
3835 | from `j', we must load a segment and an offset. | |
3836 | ||
3837 | 7.5 Interfacing to Borland Pascal Programs | |
3838 | ||
3839 | Interfacing to Borland Pascal programs is similar in concept to | |
3840 | interfacing to 16-bit C programs. The differences are: | |
3841 | ||
3842 | (*) The leading underscore required for interfacing to C programs is | |
3843 | not required for Pascal. | |
3844 | ||
3845 | (*) The memory model is always large: functions are far, data | |
3846 | pointers are far, and no data item can be more than 64K long. | |
3847 | (Actually, some functions are near, but only those functions | |
3848 | that are local to a Pascal unit and never called from outside | |
3849 | it. All assembly functions that Pascal calls, and all Pascal | |
3850 | functions that assembly routines are able to call, are far.) | |
3851 | However, all static data declared in a Pascal program goes into | |
3852 | the default data segment, which is the one whose segment address | |
3853 | will be in `DS' when control is passed to your assembly code. | |
3854 | The only things that do not live in the default data segment are | |
3855 | local variables (they live in the stack segment) and dynamically | |
3856 | allocated variables. All data _pointers_, however, are far. | |
3857 | ||
3858 | (*) The function calling convention is different - described below. | |
3859 | ||
3860 | (*) Some data types, such as strings, are stored differently. | |
3861 | ||
3862 | (*) There are restrictions on the segment names you are allowed to | |
3863 | use - Borland Pascal will ignore code or data declared in a | |
3864 | segment it doesn't like the name of. The restrictions are | |
3865 | described below. | |
3866 | ||
3867 | 7.5.1 The Pascal Calling Convention | |
3868 | ||
3869 | The 16-bit Pascal calling convention is as follows. In the following | |
3870 | description, the words _caller_ and _callee_ are used to denote the | |
3871 | function doing the calling and the function which gets called. | |
3872 | ||
3873 | (*) The caller pushes the function's parameters on the stack, one | |
3874 | after another, in normal order (left to right, so that the first | |
3875 | argument specified to the function is pushed first). | |
3876 | ||
3877 | (*) The caller then executes a far `CALL' instruction to pass | |
3878 | control to the callee. | |
3879 | ||
3880 | (*) The callee receives control, and typically (although this is not | |
3881 | actually necessary, in functions which do not need to access | |
3882 | their parameters) starts by saving the value of `SP' in `BP' so | |
3883 | as to be able to use `BP' as a base pointer to find its | |
3884 | parameters on the stack. However, the caller was probably doing | |
3885 | this too, so part of the calling convention states that `BP' | |
3886 | must be preserved by any function. Hence the callee, if it is | |
3887 | going to set up `BP' as a frame pointer, must push the previous | |
3888 | value first. | |
3889 | ||
3890 | (*) The callee may then access its parameters relative to `BP'. The | |
3891 | word at `[BP]' holds the previous value of `BP' as it was | |
3892 | pushed. The next word, at `[BP+2]', holds the offset part of the | |
3893 | return address, and the next one at `[BP+4]' the segment part. | |
3894 | The parameters begin at `[BP+6]'. The rightmost parameter of the | |
3895 | function, since it was pushed last, is accessible at this offset | |
3896 | from `BP'; the others follow, at successively greater offsets. | |
3897 | ||
3898 | (*) The callee may also wish to decrease `SP' further, so as to | |
3899 | allocate space on the stack for local variables, which will then | |
3900 | be accessible at negative offsets from `BP'. | |
3901 | ||
3902 | (*) The callee, if it wishes to return a value to the caller, should | |
3903 | leave the value in `AL', `AX' or `DX:AX' depending on the size | |
3904 | of the value. Floating-point results are returned in `ST0'. | |
3905 | Results of type `Real' (Borland's own custom floating-point data | |
3906 | type, not handled directly by the FPU) are returned in | |
3907 | `DX:BX:AX'. To return a result of type `String', the caller | |
3908 | pushes a pointer to a temporary string before pushing the | |
3909 | parameters, and the callee places the returned string value at | |
3910 | that location. The pointer is not a parameter, and should not be | |
3911 | removed from the stack by the `RETF' instruction. | |
3912 | ||
3913 | (*) Once the callee has finished processing, it restores `SP' from | |
3914 | `BP' if it had allocated local stack space, then pops the | |
3915 | previous value of `BP', and returns via `RETF'. It uses the form | |
3916 | of `RETF' with an immediate parameter, giving the number of | |
3917 | bytes taken up by the parameters on the stack. This causes the | |
3918 | parameters to be removed from the stack as a side effect of the | |
3919 | return instruction. | |
3920 | ||
3921 | (*) When the caller regains control from the callee, the function | |
3922 | parameters have already been removed from the stack, so it needs | |
3923 | to do nothing further. | |
3924 | ||
3925 | Thus, you would define a function in Pascal style, taking two | |
3926 | `Integer'-type parameters, in the following way: | |
3927 | ||
3928 | global myfunc | |
3929 | myfunc: push bp | |
3930 | mov bp,sp | |
3931 | sub sp,0x40 ; 64 bytes of local stack space | |
3932 | mov bx,[bp+8] ; first parameter to function | |
3933 | mov cx,[bp+6] ; second parameter to function | |
3934 | ; some more code | |
3935 | mov sp,bp ; undo "sub sp,0x40" above | |
3936 | pop bp | |
3937 | retf 4 ; total size of params is 4 | |
3938 | ||
3939 | At the other end of the process, to call a Pascal function from your | |
3940 | assembly code, you would do something like this: | |
3941 | ||
3942 | extern SomeFunc | |
3943 | ; and then, further down... | |
3944 | push word seg mystring ; Now push the segment, and... | |
3945 | push word mystring ; ... offset of "mystring" | |
3946 | push word [myint] ; one of my variables | |
3947 | call far SomeFunc | |
3948 | ||
3949 | This is equivalent to the Pascal code | |
3950 | ||
3951 | procedure SomeFunc(String: PChar; Int: Integer); | |
3952 | SomeFunc(@mystring, myint); | |
3953 | ||
3954 | 7.5.2 Borland Pascal Segment Name Restrictions | |
3955 | ||
3956 | Since Borland Pascal's internal unit file format is completely | |
3957 | different from `OBJ', it only makes a very sketchy job of actually | |
3958 | reading and understanding the various information contained in a | |
3959 | real `OBJ' file when it links that in. Therefore an object file | |
3960 | intended to be linked to a Pascal program must obey a number of | |
3961 | restrictions: | |
3962 | ||
3963 | (*) Procedures and functions must be in a segment whose name is | |
3964 | either `CODE', `CSEG', or something ending in `_TEXT'. | |
3965 | ||
3966 | (*) Initialised data must be in a segment whose name is either | |
3967 | `CONST' or something ending in `_DATA'. | |
3968 | ||
3969 | (*) Uninitialised data must be in a segment whose name is either | |
3970 | `DATA', `DSEG', or something ending in `_BSS'. | |
3971 | ||
3972 | (*) Any other segments in the object file are completely ignored. | |
3973 | `GROUP' directives and segment attributes are also ignored. | |
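
As an illustration of these rules, a module intended for linking into a Pascal program might be laid out as in the sketch below. The names `GetCount', `limit' and `counter' are hypothetical, and the sketch assumes the data segments end up in the default data segment addressed by `DS', as described in section 7.5.

```nasm
        segment CODE                    ; procedures and functions
        global  GetCount
GetCount:
        mov     ax,[counter]            ; Integer result is returned in AX
        retf                            ; far return, no parameters to remove

        segment CONST                   ; initialised data
limit:  dw      100

        segment DATA                    ; uninitialised data
counter: resw   1
```

A segment given any other name would simply be dropped, so a misspelt segment name tends to show up as missing code or data rather than as an error.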
3974 | ||
3975 | 7.5.3 Using `c16.mac' With Pascal Programs | |
3976 | ||
3977 | The `c16.mac' macro package, described in section 7.4.5, can also be | |
3978 | used to simplify writing functions to be called from Pascal | |
3979 | programs, if you code `%define PASCAL'. This definition ensures that | |
3980 | functions are far (it implies `FARCODE'), and also causes procedure | |
3981 | return instructions to be generated with an operand. | |
3982 | ||
3983 | Defining `PASCAL' does not change the code which calculates the | |
3984 | argument offsets; you must declare your function's arguments in | |
3985 | reverse order. For example: | |
3986 | ||
3987 | %define PASCAL | |
3988 | proc _pascalproc | |
3989 | %$j arg 4 | |
3990 | %$i arg | |
3991 | mov ax,[bp + %$i] | |
3992 | mov bx,[bp + %$j] | |
3993 | mov es,[bp + %$j + 2] | |
3994 | add ax,[es:bx] | |
3995 | endproc | |
3996 | ||
3997 | This defines the same routine, conceptually, as the example in | |
3998 | section 7.4.5: it defines a function taking two arguments, an | |
3999 | integer and a pointer to an integer, which returns the sum of the | |
4000 | integer and the contents of the pointer. The only difference between | |
4001 | this code and the large-model C version is that `PASCAL' is defined | |
4002 | instead of `FARCODE', and that the arguments are declared in reverse | |
4003 | order. | |
4004 | ||
4005 | Chapter 8: Writing 32-bit Code (Unix, Win32, DJGPP) | |
4006 | --------------------------------------------------- | |
4007 | ||
4008 | This chapter attempts to cover some of the common issues involved | |
4009 | when writing 32-bit code, to run under Win32 or Unix, or to be | |
4010 | linked with C code generated by a Unix-style C compiler such as | |
4011 | DJGPP. It covers how to write assembly code to interface with 32-bit | |
4012 | C routines, and how to write position-independent code for shared | |
4013 | libraries. | |
4014 | ||
4015 | Almost all 32-bit code, and in particular all code running under | |
4016 | Win32, DJGPP or any of the PC Unix variants, runs in _flat_ memory | |
4017 | model. This means that the segment registers and paging have already | |
4018 | been set up to give you the same 32-bit 4GB address space no matter | |
4019 | what segment you work relative to, and that you should ignore all | |
4020 | segment registers completely. When writing flat-model application | |
4021 | code, you never need to use a segment override or modify any segment | |
4022 | register, and the code-section addresses you pass to `CALL' and | |
4023 | `JMP' live in the same address space as the data-section addresses | |
4024 | you access your variables by and the stack-section addresses you | |
4025 | access local variables and procedure parameters by. Every address is | |
4026 | 32 bits long and contains only an offset part. | |
4027 | ||
4028 | 8.1 Interfacing to 32-bit C Programs | |
4029 | ||
4030 | A lot of the discussion in section 7.4, about interfacing to 16-bit | |
4031 | C programs, still applies when working in 32 bits. The absence of | |
4032 | memory models or segmentation worries simplifies things a lot. | |
4033 | ||
4034 | 8.1.1 External Symbol Names | |
4035 | ||
4036 | Most 32-bit C compilers share the convention used by 16-bit | |
4037 | compilers, that the names of all global symbols (functions or data) | |
4038 | they define are formed by prefixing an underscore to the name as it | |
4039 | appears in the C program. However, not all of them do: the ELF | |
4040 | specification states that C symbols do _not_ have a leading | |
4041 | underscore on their assembly-language names. | |
4042 | ||
4043 | The older Linux `a.out' C compiler, all Win32 compilers, DJGPP, and | |
4044 | NetBSD and FreeBSD, all use the leading underscore; for these | |
4045 | compilers, the macros `cextern' and `cglobal', as given in section | |
4046 | 7.4.1, will still work. For ELF, though, the leading underscore | |
4047 | should not be used. | |
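
One way to keep a single source file usable under both conventions is to choose the macro definition at assembly time. The sketch below is modelled on the `cextern' macro of section 7.4.1; the symbol `ELF_TARGET' is hypothetical and would have to be defined by you, for example on the command line with `-dELF_TARGET'.

```nasm
; ELF_TARGET is a hypothetical symbol, not something NASM defines itself.
%ifdef ELF_TARGET
%macro  cextern 1
        extern  %1
%endmacro
%else
%macro  cextern 1
        extern  _%1
%define %1 _%1
%endmacro
%endif

        cextern printf
        call    printf          ; becomes `call _printf' unless ELF_TARGET is set
```

A matching `cglobal' macro can be written in the same way.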
4048 | ||
4049 | 8.1.2 Function Definitions and Function Calls | |
4050 | ||
4051 | The C calling convention in 32-bit programs | |
4052 | is as follows. In the following description, the words _caller_ and | |
4053 | _callee_ are used to denote the function doing the calling and the | |
4054 | function which gets called. | |
4055 | ||
4056 | (*) The caller pushes the function's parameters on the stack, one | |
4057 | after another, in reverse order (right to left, so that the | |
4058 | first argument specified to the function is pushed last). | |
4059 | ||
4060 | (*) The caller then executes a near `CALL' instruction to pass | |
4061 | control to the callee. | |
4062 | ||
4063 | (*) The callee receives control, and typically (although this is not | |
4064 | actually necessary, in functions which do not need to access | |
4065 | their parameters) starts by saving the value of `ESP' in `EBP' | |
4066 | so as to be able to use `EBP' as a base pointer to find its | |
4067 | parameters on the stack. However, the caller was probably doing | |
4068 | this too, so part of the calling convention states that `EBP' | |
4069 | must be preserved by any C function. Hence the callee, if it is | |
4070 | going to set up `EBP' as a frame pointer, must push the previous | |
4071 | value first. | |
4072 | ||
4073 | (*) The callee may then access its parameters relative to `EBP'. The | |
4074 | doubleword at `[EBP]' holds the previous value of `EBP' as it | |
4075 | was pushed; the next doubleword, at `[EBP+4]', holds the return | |
4076 | address, pushed implicitly by `CALL'. The parameters start after | |
4077 | that, at `[EBP+8]'. The leftmost parameter of the function, | |
4078 | since it was pushed last, is accessible at this offset from | |
4079 | `EBP'; the others follow, at successively greater offsets. Thus, | |
4080 | in a function such as `printf' which takes a variable number of | |
4081 | parameters, the pushing of the parameters in reverse order means | |
4082 | that the function knows where to find its first parameter, which | |
4083 | tells it the number and type of the remaining ones. | |
4084 | ||
4085 | (*) The callee may also wish to decrease `ESP' further, so as to | |
4086 | allocate space on the stack for local variables, which will then | |
4087 | be accessible at negative offsets from `EBP'. | |
4088 | ||
4089 | (*) The callee, if it wishes to return a value to the caller, should | |
4090 | leave the value in `AL', `AX' or `EAX' depending on the size of | |
4091 | the value. Floating-point results are typically returned in | |
4092 | `ST0'. | |
4093 | ||
4094 | (*) Once the callee has finished processing, it restores `ESP' from | |
4095 | `EBP' if it had allocated local stack space, then pops the | |
4096 | previous value of `EBP', and returns via `RET' (equivalently, | |
4097 | `RETN'). | |
4098 | ||
4099 | (*) When the caller regains control from the callee, the function | |
4100 | parameters are still on the stack, so it typically adds an | |
4101 | immediate constant to `ESP' to remove them (instead of executing | |
4102 | a number of slow `POP' instructions). Thus, if a function is | |
4103 | accidentally called with the wrong number of parameters due to a | |
4104 | prototype mismatch, the stack will still be returned to a | |
4105 | sensible state since the caller, which _knows_ how many | |
4106 | parameters it pushed, does the removing. | |
4107 | ||
4108 | There is an alternative calling convention used by Win32 programs | |
4109 | for Windows API calls, and also for functions called _by_ the | |
4110 | Windows API such as window procedures: they follow what Microsoft | |
4111 | calls the `__stdcall' convention. This is slightly closer to the | |
4112 | Pascal convention, in that the callee clears the stack by passing a | |
4113 | parameter to the `RET' instruction. However, the parameters are | |
4114 | still pushed in right-to-left order. | |
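
A minimal sketch of a function written to this convention is given here. The function `_addtwo@8' is hypothetical; its name follows the decoration commonly used by Win32 compilers (a leading underscore, then `@' and the number of parameter bytes), which you should check against your own toolchain.

```nasm
        global  _addtwo@8
_addtwo@8:
        push    ebp
        mov     ebp,esp
        mov     eax,[ebp+8]     ; first (leftmost) parameter
        add     eax,[ebp+12]    ; second parameter
        pop     ebp
        ret     8               ; callee removes 8 bytes of parameters
```

The caller pushes the two parameters right to left, as in C, but does not adjust `ESP' after the call.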
4115 | ||
4116 | Thus, you would define a function in C style in the following way: | |
4117 | ||
4118 | global _myfunc | |
4119 | _myfunc: push ebp | |
4120 | mov ebp,esp | |
4121 | sub esp,0x40 ; 64 bytes of local stack space | |
4122 | mov ebx,[ebp+8] ; first parameter to function | |
4123 | ; some more code | |
4124 | leave ; mov esp,ebp / pop ebp | |
4125 | ret | |
4126 | ||
4127 | At the other end of the process, to call a C function from your | |
4128 | assembly code, you would do something like this: | |
4129 | ||
4130 | extern _printf | |
4131 | ; and then, further down... | |
4132 | push dword [myint] ; one of my integer variables | |
4133 | push dword mystring ; pointer into my data segment | |
4134 | call _printf | |
4135 | add esp,byte 8 ; `byte' saves space | |
4136 | ; then those data items... | |
4137 | segment _DATA | |
4138 | myint dd 1234 | |
4139 | mystring db 'This number -> %d <- should be 1234',10,0 | |
4140 | ||
4141 | This piece of code is the assembly equivalent of the C code | |
4142 | ||
4143 | int myint = 1234; | |
4144 | printf("This number -> %d <- should be 1234\n", myint); | |
4145 | ||
4146 | 8.1.3 Accessing Data Items | |
4147 | ||
4148 | To get at the contents of C variables, or to declare variables which | |
4149 | C can access, you need only declare the names as `GLOBAL' or | |
4150 | `EXTERN'. (Again, except under ELF, the names require leading | |
4151 | underscores, as stated in section 8.1.1.) Thus, a C variable | |
4152 | declared as `int i' can be accessed from assembler as | |
4153 | ||
4154 | extern _i | |
4155 | mov eax,[_i] | |
4156 | ||
4157 | And to declare your own integer variable which C programs can access | |
4158 | as `extern int j', you do this (making sure you are assembling in | |
4159 | the `_DATA' segment, if necessary): | |
4160 | ||
4161 | global _j | |
4162 | _j dd 0 | |
4163 | ||
4164 | To access a C array, you need to know the size of the components of | |
4165 | the array. For example, `int' variables are four bytes long, so if a | |
4166 | C program declares an array as `int a[10]', you can access `a[3]' by | |
4167 | coding `mov eax,[_a+12]'. (The byte offset 12 is obtained by | |
4168 | multiplying the desired array index, 3, by the size of the array | |
4169 | element, 4.) The sizes of the C base types in 32-bit compilers are: | |
4170 | 1 for `char', 2 for `short', 4 for `int', `long' and `float', and 8 | |
4171 | for `double'. Pointers, being 32-bit addresses, are also 4 bytes | |
4172 | long. | |
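
When the index is only known at run time, the scaled-index addressing forms can do the multiplication for you. The following sketch assumes the array `a' above and a compiler that uses the leading underscore:

```nasm
        extern  _a              ; C: int a[10];
        mov     eax,[_a+12]     ; a[3]: index 3 times element size 4
        mov     ecx,3
        mov     eax,[_a+ecx*4]  ; the same element, with the index in ECX
```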
4173 | ||
4174 | To access a C data structure, you need to know the offset from the | |
4175 | base of the structure to the field you are interested in. You can | |
4176 | either do this by converting the C structure definition into a NASM | |
4177 | structure definition (using `STRUC'), or by calculating the one | |
4178 | offset and using just that. | |
4179 | ||
4180 | To do either of these, you should read your C compiler's manual to | |
4181 | find out how it organises data structures. NASM gives no special | |
4182 | alignment to structure members in its own `STRUC' macro, so you have | |
4183 | to specify alignment yourself if the C compiler generates it. | |
4184 | Typically, you might find that a structure like | |
4185 | ||
4186 | struct { | |
4187 | char c; | |
4188 | int i; | |
4189 | } foo; | |
4190 | ||
4191 | might be eight bytes long rather than five, since the `int' field | |
4192 | would be aligned to a four-byte boundary. However, this sort of | |
4193 | feature is sometimes a configurable option in the C compiler, either | |
4194 | using command-line options or `#pragma' lines, so you have to find | |
4195 | out how your own compiler does it. | |
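
Assuming the compiler does align the `int' field to four bytes, the structure above could be described to NASM roughly as follows. The name `foo_t' is hypothetical, and the padding is requested explicitly with `ALIGNB':

```nasm
        struc   foo_t
.c:     resb    1               ; char c, at offset 0
        alignb  4               ; pad up to the next 4-byte boundary
.i:     resd    1               ; int i, at offset 4
        endstruc                ; foo_t_size is now 8

        extern  _foo
        mov     eax,[_foo+foo_t.i] ; load foo.i
```

If your compiler packs the structure instead, drop the `ALIGNB' line and the field offset becomes 1.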
4196 | ||
4197 | 8.1.4 `c32.mac': Helper Macros for the 32-bit C Interface | |
4198 | ||
4199 | Included in the NASM archives, in the `misc' directory, is a file | |
4200 | `c32.mac' of macros. It defines three macros: `proc', `arg' and | |
4201 | `endproc'. These are intended to be used for C-style procedure | |
4202 | definitions, and they automate a lot of the work involved in keeping | |
4203 | track of the calling convention. | |
4204 | ||
4205 | An example of an assembly function using the macro set is given | |
4206 | here: | |
4207 | ||
4208 | proc _proc32 | |
4209 | %$i arg | |
4210 | %$j arg | |
4211 | mov eax,[ebp + %$i] | |
4212 | mov ebx,[ebp + %$j] | |
4213 | add eax,[ebx] | |
4214 | endproc | |
4215 | ||
4216 | This defines `_proc32' to be a procedure taking two arguments, the | |
4217 | first (`i') an integer and the second (`j') a pointer to an integer. | |
4218 | It returns `i + *j'. | |
4219 | ||
4220 | Note that the `arg' macro has an `EQU' as the first line of its | |
4221 | expansion, and since the label before the macro call gets prepended | |
4222 | to the first line of the expanded macro, the `EQU' works, defining | |
4223 | `%$i' to be an offset from `EBP'. A context-local variable is used, | |
4224 | local to the context pushed by the `proc' macro and popped by the | |
4225 | `endproc' macro, so that the same argument name can be used in later | |
4226 | procedures. Of course, you don't _have_ to do that. | |
4227 | ||
4228 | `arg' can take an optional parameter, giving the size of the | |
4229 | argument. If no size is given, 4 is assumed, since it is likely that | |
4230 | many function parameters will be of type `int' or pointers. | |
4231 | ||
4232 | 8.2 Writing NetBSD/FreeBSD/OpenBSD and Linux/ELF Shared Libraries | |
4233 | ||
4234 | ELF replaced the older `a.out' object file format under Linux | |
4235 | because it contains support for position-independent code (PIC), | |
4236 | which makes writing shared libraries much easier. NASM supports the | |
4237 | ELF position-independent code features, so you can write Linux ELF | |
4238 | shared libraries in NASM. | |
4239 | ||
4240 | NetBSD, and its close cousins FreeBSD and OpenBSD, take a different | |
4241 | approach by hacking PIC support into the `a.out' format. NASM | |
4242 | supports this as the `aoutb' output format, so you can write BSD | |
4243 | shared libraries in NASM too. | |
4244 | ||
4245 | The operating system loads a PIC shared library by memory-mapping | |
4246 | the library file at an arbitrarily chosen point in the address space | |
4247 | of the running process. The contents of the library's code section | |
4248 | must therefore not depend on where it is loaded in memory. | |
4249 | ||
4250 | Therefore, you cannot get at your variables by writing code like | |
4251 | this: | |
4252 | ||
4253 | mov eax,[myvar] ; WRONG | |
4254 | ||
4255 | Instead, the linker provides an area of memory called the _global | |
4256 | offset table_, or GOT; the GOT is situated at a constant distance | |
4257 | from your library's code, so if you can find out where your library | |
4258 | is loaded (which is typically done using a `CALL' and `POP' | |
4259 | combination), you can obtain the address of the GOT, and you can | |
4260 | then load the addresses of your variables out of linker-generated | |
4261 | entries in the GOT. | |
4262 | ||
4263 | The _data_ section of a PIC shared library does not have these | |
4264 | restrictions: since the data section is writable, it has to be | |
4265 | copied into memory anyway rather than just paged in from the library | |
4266 | file, so as long as it's being copied it can be relocated too. So | |
4267 | you can put ordinary types of relocation in the data section without | |
4268 | too much worry (but see section 8.2.4 for a caveat). | |
4269 | ||
4270 | 8.2.1 Obtaining the Address of the GOT | |
4271 | ||
4272 | Each code module in your shared library should define the GOT as an | |
4273 | external symbol: | |
4274 | ||
4275 | extern _GLOBAL_OFFSET_TABLE_ ; in ELF | |
4276 | extern __GLOBAL_OFFSET_TABLE_ ; in BSD a.out | |
4277 | ||
4278 | At the beginning of any function in your shared library which plans | |
4279 | to access your data or BSS sections, you must first calculate the | |
4280 | address of the GOT. This is typically done by writing the function | |
4281 | in this form: | |
4282 | ||
4283 | func: push ebp | |
4284 | mov ebp,esp | |
4285 | push ebx | |
4286 | call .get_GOT | |
4287 | .get_GOT: pop ebx | |
4288 | add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc | |
4289 | ; the function body comes here | |
4290 | mov ebx,[ebp-4] | |
4291 | mov esp,ebp | |
4292 | pop ebp | |
4293 | ret | |
4294 | ||
4295 | (For BSD, again, the symbol `_GLOBAL_OFFSET_TABLE_' requires a second | |
4296 | leading underscore.) | |
4297 | ||
4298 | The first two lines of this function are simply the standard C | |
4299 | prologue to set up a stack frame, and the last three lines are | |
4300 | the standard C epilogue. The third line, and the fourth-to-last | |
4301 | line, save and restore the `EBX' register, because PIC shared | |
4302 | libraries use this register to store the address of the GOT. | |
4303 | ||
4304 | The interesting bit is the `CALL' instruction and the following two | |
4305 | lines. The `CALL' and `POP' combination obtains the address of the | |
4306 | label `.get_GOT', without having to know in advance where the | |
4307 | program was loaded (since the `CALL' instruction is encoded relative | |
4308 | to the current position). The `ADD' instruction makes use of one of | |
4309 | the special PIC relocation types: GOTPC relocation. With the | |
4310 | `WRT ..gotpc' qualifier specified, the symbol referenced (here | |
4311 | `_GLOBAL_OFFSET_TABLE_', the special symbol assigned to the GOT) is | |
4312 | given as an offset from the beginning of the section. (Actually, ELF | |
4313 | encodes it as the offset from the operand field of the `ADD' | |
4314 | instruction, but NASM simplifies this deliberately, so you do things | |
4315 | the same way for both ELF and BSD.) So the instruction then _adds_ | |
4316 | the beginning of the section, to get the real address of the GOT, | |
4317 | and subtracts the value of `.get_GOT' which it knows is in `EBX'. | |
4318 | Therefore, by the time that instruction has finished, `EBX' contains | |
4319 | the address of the GOT. | |
4320 | ||
4321 | If you didn't follow that, don't worry: it's never necessary to | |
4322 | obtain the address of the GOT by any other means, so you can put | |
4323 | those three instructions into a macro and safely ignore them: | |
4324 | ||
4325 | %macro get_GOT 0 | |
4326 | call %%getgot | |
4327 | %%getgot: pop ebx | |
4328 | add ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc | |
4329 | %endmacro | |
4330 | ||
4331 | 8.2.2 Finding Your Local Data Items | |
4332 | ||
4333 | Having got the GOT, you can then use it to obtain the addresses of | |
4334 | your data items. Most variables will reside in the sections you have | |
4335 | declared; they can be accessed using the `..gotoff' special `WRT' | |
4336 | type. It is used like this: | |
4337 | ||
4338 | lea eax,[ebx+myvar wrt ..gotoff] | |
4339 | ||
4340 | The expression `myvar wrt ..gotoff' is calculated, when the shared | |
4341 | library is linked, to be the offset to the local variable `myvar' | |
4342 | from the beginning of the GOT. Therefore, adding it to `EBX' as | |
4343 | above will place the real address of `myvar' in `EAX'. | |
4344 | ||
4345 | If you declare variables as `GLOBAL' without specifying a size for | |
4346 | them, they are shared between code modules in the library, but do | |
4347 | not get exported from the library to the program that loaded it. | |
4348 | They will still be in your ordinary data and BSS sections, so you | |
4349 | can access them in the same way as local variables, using the above | |
4350 | `..gotoff' mechanism. | |
4351 | ||
4352 | Note that due to a peculiarity of the way BSD `a.out' format handles | |
4353 | this relocation type, there must be at least one non-local symbol in | |
4354 | the same section as the address you're trying to access. | |
4355 | ||
4356 | 8.2.3 Finding External and Common Data Items | |
4357 | ||
4358 | If your library needs to get at an external variable (external to | |
4359 | the _library_, not just to one of the modules within it), you must | |
4360 | use the `..got' type to get at it. The `..got' type, instead of | |
4361 | giving you the offset from the GOT base to the variable, gives you | |
4362 | the offset from the GOT base to a GOT _entry_ containing the address | |
4363 | of the variable. The linker will set up this GOT entry when it | |
4364 | builds the library, and the dynamic linker will place the correct | |
4365 | address in it at load time. So to obtain the address of an external | |
4366 | variable `extvar' in `EAX', you would code | |
4367 | ||
4368 | mov eax,[ebx+extvar wrt ..got] | |
4369 | ||
4370 | This loads the address of `extvar' out of an entry in the GOT. The | |
4371 | linker, when it builds the shared library, collects together every | |
4372 | relocation of type `..got', and builds the GOT so as to ensure it | |
4373 | has every necessary entry present. | |
4374 | ||
4375 | Common variables must also be accessed in this way. | |
4376 | ||
4377 | 8.2.4 Exporting Symbols to the Library User | |
4378 | ||
4379 | If you want to export symbols to the user of the library, you have | |
4380 | to declare whether they are functions or data, and if they are data, | |
4381 | you have to give the size of the data item. This is because the | |
4382 | dynamic linker has to build procedure linkage table entries for any | |
4383 | exported functions, and also moves exported data items away from the | |
4384 | library's data section in which they were declared. | |
4385 | ||
4386 | So to export a function to users of the library, you must use | |
4387 | ||
4388 | global func:function ; declare it as a function | |
4389 | func: push ebp | |
4390 | ; etc. | |
4391 | ||
4392 | And to export a data item such as an array, you would have to code | |
4393 | ||
4394 | global array:data array.end-array ; give the size too | |
4395 | array: resd 128 | |
4396 | .end: | |
4397 | ||
4398 | Be careful: If you export a variable to the library user, by | |
4399 | declaring it as `GLOBAL' and supplying a size, the variable will end | |
4400 | up living in the data section of the main program, rather than in | |
4401 | your library's data section, where you declared it. So you will have | |
4402 | to access your own global variable with the `..got' mechanism rather | |
4403 | than `..gotoff', as if it were external (which, effectively, it has | |
4404 | become). | |
4405 | ||
4406 | Equally, if you need to store the address of an exported global in | |
4407 | one of your data sections, you can't do it by means of the standard | |
4408 | sort of code: | |
4409 | ||
4410 | dataptr: dd global_data_item ; WRONG | |
4411 | ||
4412 | NASM will interpret this code as an ordinary relocation, in which | |
4413 | `global_data_item' is merely an offset from the beginning of the | |
4414 | `.data' section (or whatever); so this reference will end up | |
4415 | pointing at your data section instead of at the exported global | |
4416 | which resides elsewhere. | |
4417 | ||
4418 | Instead of the above code, then, you must write | |
4419 | ||
4420 | dataptr: dd global_data_item wrt ..sym | |
4421 | ||
4422 | which makes use of the special `WRT' type `..sym' to instruct NASM | |
4423 | to search the symbol table for a particular symbol at that address, | |
4424 | rather than just relocating by section base. | |
4425 | ||
4426 | Either method will work for functions: referring to one of your | |
4427 | functions by means of | |
4428 | ||
4429 | funcptr: dd my_function | |
4430 | ||
4431 | will give the user the address of the code you wrote, whereas | |
4432 | ||
4433 | funcptr: dd my_function wrt ..sym | |
4434 | ||
4435 | will give the address of the procedure linkage table for the | |
4436 | function, which is where the calling program will _believe_ the | |
4437 | function lives. Either address is a valid way to call the function. | |
4438 | ||
4439 | 8.2.5 Calling Procedures Outside the Library | |
4440 | ||
4441 | Calling procedures outside your shared library has to be done by | |
4442 | means of a _procedure linkage table_, or PLT. The PLT is placed at a | |
4443 | known offset from where the library is loaded, so the library code | |
4444 | can make calls to the PLT in a position-independent way. Within the | |
4445 | PLT there is code to jump to offsets contained in the GOT, so | |
4446 | function calls to other shared libraries or to routines in the main | |
4447 | program can be transparently passed off to their real destinations. | |
4448 | ||
4449 | To call an external routine, you must use another special PIC | |
4450 | relocation type, `WRT ..plt'. This is much easier than the GOT-based | |
4451 | ones: you simply replace calls such as `CALL printf' with the PLT- | |
4452 | relative version `CALL printf WRT ..plt'. | |
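
Putting the pieces of sections 8.2.1 to 8.2.5 together, a complete ELF library function might look like the sketch below. The names `show', `fmt' and `extvar' are hypothetical, `extvar' is assumed to be a dword variable defined outside the library, and BSD `a.out' would need the extra underscore on the GOT symbol.

```nasm
; The get_GOT macro is the one given in section 8.2.1.
%macro  get_GOT 0
        call    %%getgot
%%getgot: pop   ebx
        add     ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc
%endmacro

        extern  _GLOBAL_OFFSET_TABLE_   ; ELF spelling of the GOT symbol
        extern  printf                  ; routine outside the library
        extern  extvar                  ; hypothetical external variable

        section .data
fmt:    db      'extvar = %d',10,0      ; hypothetical local data item

        section .text
        global  show:function           ; exported, see section 8.2.4
show:   push    ebp
        mov     ebp,esp
        push    ebx                     ; EBX must be preserved
        get_GOT                         ; EBX = address of the GOT
        mov     eax,[ebx+extvar wrt ..got]   ; EAX = address of extvar
        push    dword [eax]             ; push its value
        lea     eax,[ebx+fmt wrt ..gotoff]   ; EAX = address of fmt
        push    eax
        call    printf wrt ..plt        ; call through the PLT
        add     esp,byte 8              ; caller removes the parameters
        mov     ebx,[ebp-4]
        mov     esp,ebp
        pop     ebp
        ret
```

Note that `EBX' still holds the GOT address when the `CALL' through the PLT is made.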
4453 | ||
4454 | 8.2.6 Generating the Library File | |
4455 | ||
4456 | Having written some code modules and assembled them to `.o' files, | |
4457 | you then generate your shared library with a command such as | |
4458 | ||
4459 | ld -shared -o library.so module1.o module2.o # for ELF | |
4460 | ld -Bshareable -o library.so module1.o module2.o # for BSD | |
4461 | ||
4462 | For ELF, if your shared library is going to reside in system | |
4463 | directories such as `/usr/lib' or `/lib', it is usually worth using | |
4464 | the `-soname' flag to the linker, to store the final library file | |
4465 | name, with a version number, into the library: | |
4466 | ||
4467 | ld -shared -soname library.so.1 -o library.so.1.2 *.o | |
4468 | ||
4469 | You would then copy `library.so.1.2' into the library directory, and | |
4470 | create `library.so.1' as a symbolic link to it. | |
4471 | ||
4472 | Chapter 9: Mixing 16 and 32 Bit Code | |
4473 | ------------------------------------ | |
4474 | ||
4475 | This chapter tries to cover some of the issues, largely related to | |
4476 | unusual forms of addressing and jump instructions, encountered when | |
4477 | writing operating system code such as protected-mode initialisation | |
4478 | routines, which require code that operates in mixed segment sizes, | |
4479 | such as code in a 16-bit segment trying to modify data in a 32-bit | |
4480 | one, or jumps between different-size segments. | |
4481 | ||
4482 | 9.1 Mixed-Size Jumps | |
4483 | ||
4484 | The most common form of mixed-size instruction is the one used when | |
4485 | writing a 32-bit OS: having done your setup in 16-bit mode, such as | |
4486 | loading the kernel, you then have to boot it by switching into | |
4487 | protected mode and jumping to the 32-bit kernel start address. In a | |
4488 | fully 32-bit OS, this tends to be the _only_ mixed-size instruction | |
4489 | you need, since everything before it can be done in pure 16-bit | |
4490 | code, and everything after it can be pure 32-bit. | |
4491 | ||
4492 | This jump must specify a 48-bit far address, since the target | |
4493 | segment is a 32-bit one. However, it must be assembled in a 16-bit | |
4494 | segment, so just coding, for example, | |
4495 | ||
4496 | jmp 0x1234:0x56789ABC ; wrong! | |
4497 | ||
4498 | will not work, since the offset part of the address will be | |
4499 | truncated to `0x9ABC' and the jump will be an ordinary 16-bit far | |
4500 | one. | |
4501 | ||
4502 | The Linux kernel setup code gets round the inability of `as86' to | |
4503 | generate the required instruction by coding it manually, using `DB' | |
4504 | instructions. NASM can go one better than that, by actually | |
4505 | generating the right instruction itself. Here's how to do it right: | |
4506 | ||
4507 | jmp dword 0x1234:0x56789ABC ; right | |
4508 | ||
4509 | The `DWORD' prefix (strictly speaking, it should come _after_ the | |
4510 | colon, since it is declaring the _offset_ field to be a doubleword; | |
4511 | but NASM will accept either form, since both are unambiguous) forces | |
4512 | the offset part to be 32 bits wide, on the assumption that you are | |
4513 | deliberately writing a jump from a 16-bit segment to a 32-bit one. | |
4514 | ||
4515 | You can do the reverse operation, jumping from a 32-bit segment to a | |
4516 | 16-bit one, by means of the `WORD' prefix: | |
4517 | ||
4518 | jmp word 0x8765:0x4321 ; 32 to 16 bit | |
4519 | ||
4520 | If the `WORD' prefix is specified in 16-bit mode, or the `DWORD' | |
4521 | prefix in 32-bit mode, they will be ignored, since each is | |
4522 | explicitly forcing NASM into a mode it was in anyway. | |
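
As a rough illustration of where such a jump sits, the following
sketch switches from real mode to protected mode and then jumps to
32-bit code. It is not taken from any real kernel: the labels
(`start', `start32', `hang', `gdt', `gdt_desc'), the selector values
and the load address 0x7C00 are all assumptions made for the
example, and details such as enabling the A20 line, setting up a
stack and adding the boot signature (see section 10.1.3) are left
out.

```nasm
        bits 16
        org 0x7C00              ; assumed load address (boot sector)

CODE_SEL equ 0x08               ; hypothetical selector: GDT entry 1
DATA_SEL equ 0x10               ; hypothetical selector: GDT entry 2

start:  cli                     ; no interrupts while switching modes
        xor ax,ax
        mov ds,ax               ; DS = 0, to match the ORG above
        lgdt [gdt_desc]         ; load the GDT base and limit
        mov eax,cr0
        or al,1                 ; set the PE (protection enable) bit
        mov cr0,eax
        jmp dword CODE_SEL:start32 ; the one mixed-size instruction

        bits 32
start32:
        mov ax,DATA_SEL         ; from here on, everything is 32-bit code
        mov ds,ax
        mov es,ax
        mov ss,ax
hang:   hlt
        jmp hang

gdt:    dd 0,0                  ; entry 0: mandatory null descriptor
        dw 0xFFFF,0x0000        ; entry 1: limit 15:0, base 15:0
        db 0x00,0x9A,0xCF,0x00  ; base 23:16, code, 4K/32-bit, base 31:24
        dw 0xFFFF,0x0000        ; entry 2: same layout, data segment
        db 0x00,0x92,0xCF,0x00
gdt_desc:
        dw gdt_desc-gdt-1       ; GDT limit (size in bytes, minus one)
        dd gdt                  ; GDT linear base address
```

In 16-bit mode the jump should assemble as a `66' operand-size
prefix, the opcode `EA', a four-byte offset and a two-byte selector.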
4523 | ||
4524 | 9.2 Addressing Between Different-Size Segments | |
4525 | ||
4526 | If your OS is mixed 16 and 32-bit, or if you are writing a DOS | |
4527 | extender, you are likely to have to deal with some 16-bit segments | |
4528 | and some 32-bit ones. At some point, you will probably end up | |
4529 | writing code in a 16-bit segment which has to access data in a | |
4530 | 32-bit segment, or vice versa. | |
4531 | ||
4532 | If the data you are trying to access in a 32-bit segment lies within | |
4533 | the first 64K of the segment, you may be able to get away with using | |
4534 | an ordinary 16-bit addressing operation for the purpose; but sooner | |
4535 | or later, you will want to do 32-bit addressing from 16-bit mode. | |
4536 | ||
4537 | The easiest way to do this is to make sure you use a register for | |
4538 | the address, since any effective address containing a 32-bit | |
4539 | register is forced to be a 32-bit address. So you can do | |
4540 | ||
4541 | mov eax,offset_into_32_bit_segment_specified_by_fs | |
4542 | mov dword [fs:eax],0x11223344 | |
4543 | ||
4544 | This is fine, but slightly cumbersome (since it wastes an | |
4545 | instruction and a register) if you already know the precise offset | |
4546 | you are aiming at. The x86 architecture does allow 32-bit effective | |
4547 | addresses to specify nothing but a 4-byte offset, so why shouldn't | |
4548 | NASM be able to generate the best instruction for the purpose? | |
4549 | ||
4550 | It can. As in section 9.1, you need only prefix the address with the | |
4551 | `DWORD' keyword, and it will be forced to be a 32-bit address: | |
4552 | ||
4553 | mov dword [fs:dword my_offset],0x11223344 | |
4554 | ||
4555 | Also as in section 9.1, NASM is not fussy about whether the `DWORD' | |
4556 | prefix comes before or after the segment override, so arguably a | |
4557 | nicer-looking way to code the above instruction is | |
4558 | ||
4559 | mov dword [dword fs:my_offset],0x11223344 | |
4560 | ||
4561 | Don't confuse the `DWORD' prefix _outside_ the square brackets, | |
4562 | which controls the size of the data stored at the address, with the | |
4563 | one _inside_ the square brackets, which controls the length of the | |
4564 | address itself. The two can quite easily be different: | |
4565 | ||
4566 | mov word [dword 0x12345678],0x9ABC | |
4567 | ||
4568 | This moves 16 bits of data to an address specified by a 32-bit | |
4569 | offset. | |
4570 | ||
4571 | You can also specify `WORD' or `DWORD' prefixes along with the `FAR' | |
4572 | prefix to indirect far jumps or calls. For example: | |
4573 | ||
4574 | call dword far [fs:word 0x4321] | |
4575 | ||
4576 | This instruction contains an address specified by a 16-bit offset; | |
4577 | it loads a 48-bit far pointer from that (16-bit segment and 32-bit | |
4578 | offset), and calls that address. | |
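
The forms described in this section can be put side by side. The
sketch below is assembled as 16-bit code; `FS' is assumed to hold a
selector for a 32-bit data segment, and the name `far_ptr', the
offsets and the selector value are invented for the example.

```nasm
        bits 16

        ; 32-bit address through a register (costs an instruction)
        mov eax,0x12345678
        mov dword [fs:eax],0x11223344

        ; the same store with the 32-bit offset in the instruction
        mov dword [dword fs:0x12345678],0x11223344

        ; 16 bits of data, 32-bit address: the two sizes are independent
        mov word [dword fs:0x12345678],0x9ABC

        ; indirect far call through a 48-bit pointer at a 16-bit offset
        call dword far [far_ptr]
        ret                     ; keeps execution out of the data below

far_ptr: dd 0x00100000          ; the 32-bit offset comes first in memory
        dw 0x0008               ; then the 16-bit segment selector
```

The last two lines show the layout that the indirect far call
expects: six bytes, with the offset stored before the selector.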
4579 | ||
4580 | 9.3 Other Mixed-Size Instructions | |
4581 | ||
4582 | Another way you might want to access data is by using the | |
4583 | string instructions (`LODSx', `STOSx' and so on) or the `XLATB' | |
4584 | instruction. These instructions, since they take no parameters, | |
4585 | might seem to have no easy way to make them perform 32-bit | |
4586 | addressing when assembled in a 16-bit segment. | |
4587 | ||
4588 | This is the purpose of NASM's `a16' and `a32' prefixes. If you are | |
4589 | coding `LODSB' in a 16-bit segment but it is supposed to be | |
4590 | accessing a string in a 32-bit segment, you should load the desired | |
4591 | address into `ESI' and then code | |
4592 | ||
4593 | a32 lodsb | |
4594 | ||
4595 | The prefix forces the addressing size to 32 bits, meaning that | |
4596 | `LODSB' loads from `[DS:ESI]' instead of `[DS:SI]'. To access a | |
4597 | string in a 16-bit segment when coding in a 32-bit one, the | |
4598 | corresponding `a16' prefix can be used. | |
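
The prefix combines with `REP' in the usual way. As a sketch, the
following 16-bit fragment copies a block to a destination above the
first 64K of a 32-bit segment. The offsets and the length are made
up for illustration, and `DS' and `ES' are assumed to refer to
segments whose limits allow 32-bit offsets.

```nasm
        bits 16
        mov esi,0x00020000      ; source offset in DS (assumed value)
        mov edi,0x00100000      ; destination offset in ES (assumed value)
        mov ecx,4096            ; byte count (assumed value)
        cld                     ; copy upwards
        a32 rep movsb           ; uses [DS:ESI], [ES:EDI] and ECX
```

With the 32-bit address size in force, the repeat count is taken
from `ECX' rather than `CX', so all three registers need to be
loaded in full.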
4599 | ||
4600 | The `a16' and `a32' prefixes can be applied to any instruction in | |
4601 | NASM's instruction table, but most of them can generate all the | |
4602 | useful forms without them. The prefixes are necessary only for | |
4603 | instructions with implicit addressing: `CMPSx' (section A.19), | |
4604 | `SCASx' (section A.149), `LODSx' (section A.98), `STOSx' (section | |
4605 | A.157), `MOVSx' (section A.105), `INSx' (section A.80), `OUTSx' | |
4606 | (section A.112), and `XLATB' (section A.169). Also, the various push | |
4607 | and pop instructions (`PUSHA' and `POPF' as well as the more usual | |
4608 | `PUSH' and `POP') can accept `a16' or `a32' prefixes to force a | |
4609 | particular one of `SP' or `ESP' to be used as a stack pointer, in | |
4610 | case the stack segment in use is a different size from the code | |
4611 | segment. | |
4612 | ||
4613 | `PUSH' and `POP', when applied to segment registers in 32-bit mode, | |
4614 | also have the slightly odd behaviour that they push and pop 4 bytes | |
4615 | at a time, of which the top two are ignored and the bottom two give | |
4616 | the value of the segment register being manipulated. To force the | |
4617 | 16-bit behaviour of segment-register push and pop instructions, you | |
4618 | can use the operand-size prefix `o16': | |
4619 | ||
4620 | o16 push ss | |
4621 | o16 push ds | |
4622 | ||
4623 | This code saves a doubleword of stack space by fitting two segment | |
4624 | registers into the space which would normally be consumed by pushing | |
4625 | one. | |
4626 | ||
4627 | (You can also use the `o32' prefix to force the 32-bit behaviour | |
4628 | when in 16-bit mode, but this seems less useful.) | |
4629 | ||
4630 | Chapter 10: Troubleshooting | |
4631 | --------------------------- | |
4632 | ||
4633 | This chapter describes some of the common problems that users have | |
4634 | been known to encounter with NASM, and answers them. It also gives | |
4635 | instructions for reporting bugs in NASM if you find a difficulty | |
4636 | that isn't listed here. | |
4637 | ||
4638 | 10.1 Common Problems | |
4639 | ||
4640 | 10.1.1 NASM Generates Inefficient Code | |
4641 | ||
4642 | I get a lot of `bug' reports about NASM generating inefficient, or | |
4643 | even `wrong', code on instructions such as `ADD ESP,8'. This is a | |
4644 | deliberate design feature, connected to predictability of output: | |
4645 | NASM, on seeing `ADD ESP,8', will generate the form of the | |
4646 | instruction which leaves room for a 32-bit offset. You need to code | |
4647 | `ADD ESP,BYTE 8' if you want the space-efficient form of the | |
4648 | instruction. This isn't a bug: at worst it's a misfeature, and | |
4649 | that's a matter of opinion only. | |
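
To make the difference concrete, here are the two forms assembled as
32-bit code. The byte values in the comments are worked out from the
`81 /0 id' and `83 /0 ib' encodings, and they assume NASM behaves as
described in this section and does not pick the short form for you.

```nasm
        bits 32
        add esp,8               ; 81 C4 08 00 00 00  (room for 32 bits)
        add esp,byte 8          ; 83 C4 08           (sign-extended byte)
```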
4650 | ||
4651 | 10.1.2 My Jumps are Out of Range | |
4652 | ||
4653 | Similarly, people complain that when they issue conditional jumps | |
4654 | (which are `SHORT' by default) that try to jump too far, NASM | |
4655 | reports `short jump out of range' instead of making the jumps | |
4656 | longer. | |
4657 | ||
4658 | This, again, is partly a predictability issue, but in fact has a | |
4659 | more practical reason as well. NASM has no means of being told what | |
4660 | type of processor the code it is generating will be run on; so it | |
4661 | cannot decide for itself that it should generate `Jcc NEAR' type | |
4662 | instructions, because it doesn't know that it's working for a 386 or | |
4663 | above. Alternatively, it could replace the out-of-range short `JNE' | |
4664 | instruction with a very short `JE' instruction that jumps over a | |
4665 | `JMP NEAR'; this is a sensible solution for processors below a 386, | |
4666 | but hardly efficient on processors which have good branch prediction | |
4667 | _and_ could have used `JNE NEAR' instead. So, once again, it's up to | |
4668 | the user, not the assembler, to decide what instructions should be | |
4669 | generated. | |
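
The choices discussed above look like this in source form. This is
only a sketch, assembled as 16-bit code; the labels `far_away' and
`skip' are invented, and the `TIMES' line merely pushes the target
beyond the reach of a short jump.

```nasm
        bits 16

        ; 386 or above: ask for the long conditional jump explicitly
        jne near far_away

        ; any processor: invert the condition and skip over a near jump
        je skip
        jmp near far_away
skip:
        times 200 nop           ; filler, so far_away is out of short range
far_away:
        ret
```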
4670 | ||
4671 | 10.1.3 `ORG' Doesn't Work | |
4672 | ||
4673 | People writing boot sector programs in the `bin' format often | |
4674 | complain that `ORG' doesn't work the way they'd like: in order to | |
4675 | place the `0xAA55' signature word at the end of a 512-byte boot | |
4676 | sector, people who are used to MASM tend to code | |
4677 | ||
4678 | ORG 0 | |
4679 | ; some boot sector code | |
4680 | ORG 510 | |
4681 | DW 0xAA55 | |
4682 | ||
4683 | This is not the intended use of the `ORG' directive in NASM, and | |
4684 | will not work. The correct way to solve this problem in NASM is to | |
4685 | use the `TIMES' directive, like this: | |
4686 | ||
4687 | ORG 0 | |
4688 | ; some boot sector code | |
4689 | TIMES 510-($-$$) DB 0 | |
4690 | DW 0xAA55 | |
4691 | ||
4692 | The `TIMES' directive will insert exactly enough zero bytes into the | |
4693 | output to move the assembly point up to 510. This method also has | |
4694 | the advantage that if you accidentally fill your boot sector too | |
4695 | full, NASM will catch the problem at assembly time and report it, so | |
4696 | you won't end up with a boot sector that you have to disassemble to | |
4697 | find out what's wrong with it. | |
4698 | ||
4699 | 10.1.4 `TIMES' Doesn't Work | |
4700 | ||
4701 | The other common problem with the above code arises when people | |
4702 | write the `TIMES' line as | |
4703 | ||
4704 | TIMES 510-$ DB 0 | |
4705 | ||
4706 | by reasoning that `$' should be a pure number, just like 510, so the | |
4707 | difference between them is also a pure number and can happily be fed | |
4708 | to `TIMES'. | |
4709 | ||
4710 | NASM is a _modular_ assembler: the various component parts are | |
4711 | designed to be easily separable for re-use, so they don't exchange | |
4712 | information unnecessarily. In consequence, the `bin' output format, | |
4713 | even though it has been told by the `ORG' directive that the `.text' | |
4714 | section should start at 0, does not pass that information back to | |
4715 | the expression evaluator. So from the evaluator's point of view, `$' | |
4716 | isn't a pure number: it's an offset from a section base. Therefore | |
4717 | the difference between `$' and 510 is also not a pure number, but | |
4718 | involves a section base. Values involving section bases cannot be | |
4719 | passed as arguments to `TIMES'. | |
4720 | ||
4721 | The solution, as in the previous section, is to code the `TIMES' | |
4722 | line in the form | |
4723 | ||
4724 | TIMES 510-($-$$) DB 0 | |
4725 | ||
4726 | in which `$' and `$$' are offsets from the same section base, and so | |
4727 | their difference is a pure number. This will solve the problem and | |
4728 | generate sensible code. | |
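
Putting this together, a complete (if useless) boot sector might
look like the sketch below. The load address 0x7C00 and the label
names are assumptions; the point is only that the output comes to
512 bytes with the signature in the last two.

```nasm
        bits 16
        org 0x7C00              ; assumed: where the BIOS loads the sector

start:  cli
hang:   hlt                     ; nothing to do, so stop here
        jmp hang

        times 510-($-$$) db 0   ; $-$$ is the number of bytes emitted so far
        dw 0xAA55               ; signature occupies bytes 510 and 511
```

Assembling this with `nasm -f bin boot.asm -o boot.bin' (the file
names are arbitrary) should give a file of exactly 512 bytes.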
4729 | ||
4730 | 10.2 Bugs | |
4731 | ||
4732 | We have never yet released a version of NASM with any _known_ bugs. | |
4733 | That doesn't usually stop there being plenty we didn't know about, | |
4734 | though. Any that you find should be reported to `anakin@pobox.com'. | |
4735 | ||
4736 | Please read section 2.2 first, and don't report the bug if it's | |
4737 | listed in there as a deliberate feature. (If you think the feature | |
4738 | is badly thought out, feel free to send us reasons why you think it | |
4739 | should be changed, but don't just send us mail saying `This is a | |
4740 | bug' if the documentation says we did it on purpose.) Then read | |
4741 | section 10.1, and don't bother reporting the bug if it's listed | |
4742 | there. | |
4743 | ||
4744 | If you do report a bug, _please_ give us all of the following | |
4745 | information: | |
4746 | ||
4747 | (*) What operating system you're running NASM under. DOS, Linux, | |
4748 | NetBSD, Win16, Win32, VMS (I'd be impressed), whatever. | |
4749 | ||
4750 | (*) If you're running NASM under DOS or Win32, tell us whether | |
4751 | you've compiled your own executable from the DOS source archive, | |
4752 | or whether you were using the standard distribution binaries out | |
4753 | of the archive. If you were using a locally built executable, | |
4754 | try to reproduce the problem using one of the standard binaries, | |
4755 | as this will make it easier for us to reproduce your problem | |
4756 | prior to fixing it. | |
4757 | ||
4758 | (*) Which version of NASM you're using, and exactly how you invoked | |
4759 | it. Give us the precise command line, and the contents of the | |
4760 | `NASM' environment variable if any. | |
4761 | ||
4762 | (*) Which versions of any supplementary programs you're using, and | |
4763 | how you invoked them. If the problem only becomes visible at | |
4764 | link time, tell us what linker you're using, what version of it | |
4765 | you've got, and the exact linker command line. If the problem | |
4766 | involves linking against object files generated by a compiler, | |
4767 | tell us what compiler, what version, and what command line or | |
4768 | options you used. (If you're compiling in an IDE, please try to | |
4769 | reproduce the problem with the command-line version of the | |
4770 | compiler.) | |
4771 | ||
4772 | (*) If at all possible, send us a NASM source file which exhibits | |
4773 | the problem. If this causes copyright problems (e.g. you can | |
4774 | only reproduce the bug in restricted-distribution code) then | |
4775 | bear in mind the following two points: firstly, we guarantee | |
4776 | that any source code sent to us for the purposes of debugging | |
4777 | NASM will be used _only_ for the purposes of debugging NASM, and | |
4778 | that we will delete all our copies of it as soon as we have | |
4779 | found and fixed the bug or bugs in question; and secondly, we | |
4780 | would prefer _not_ to be mailed large chunks of code anyway. The | |
4781 | smaller the file, the better. A three-line sample file that does | |
4782 | nothing useful _except_ demonstrate the problem is much easier | |
4783 | to work with than a fully fledged ten-thousand-line program. (Of | |
4784 | course, some errors _do_ only crop up in large files, so this | |
4785 | may not be possible.) | |
4786 | ||
4787 | (*) A description of what the problem actually _is_. `It doesn't | |
4788 | work' is _not_ a helpful description! Please describe exactly | |
4789 | what is happening that shouldn't be, or what isn't happening | |
4790 | that should. Examples might be: `NASM generates an error message | |
4791 | saying Line 3 for an error that's actually on Line 5'; `NASM | |
4792 | generates an error message that I believe it shouldn't be | |
4793 | generating at all'; `NASM fails to generate an error message | |
4794 | that I believe it _should_ be generating'; `the object file | |
4795 | produced from this source code crashes my linker'; `the ninth | |
4796 | byte of the output file is 66 and I think it should be 77 | |
4797 | instead'. | |
4798 | ||
4799 | (*) If you believe the output file from NASM to be faulty, send it | |
4800 | to us. That allows us to determine whether our own copy of NASM | |
4801 | generates the same file, or whether the problem is related to | |
4802 | portability issues between our development platforms and yours. | |
4803 | We can handle binary files mailed to us as MIME attachments, | |
4804 | uuencoded, and even BinHex. Alternatively, we may be able to | |
4805 | provide an FTP site you can upload the suspect files to; but | |
4806 | mailing them is easier for us. | |
4807 | ||
4808 | (*) Any other information or data files that might be helpful. If, | |
4809 | for example, the problem involves NASM failing to generate an | |
4810 | object file while TASM can generate an equivalent file without | |
4811 | trouble, then send us _both_ object files, so we can see what | |
4812 | TASM is doing differently from us. | |
4813 | ||
4814 | Appendix A: Intel x86 Instruction Reference | |
4815 | ------------------------------------------- | |
4816 | ||
4817 | This appendix provides a complete list of the machine instructions | |
4818 | which NASM will assemble, and a short description of the function of | |
4819 | each one. | |
4820 | ||
4821 | It is not intended to be exhaustive documentation on the fine | |
4822 | details of the instructions' function, such as which exceptions they | |
4823 | can trigger: for such documentation, you should go to Intel's Web | |
4824 | site, `http://www.intel.com'. | |
4825 | ||
4826 | Instead, this appendix is intended primarily to provide | |
4827 | documentation on the way the instructions may be used within NASM. | |
4828 | For example, looking up `LOOP' will tell you that NASM allows `CX' | |
4829 | or `ECX' to be specified as an optional second argument to the | |
4830 | `LOOP' instruction, to enforce which of the two possible counter | |
4831 | registers should be used if the default is not the one desired. | |
4832 | ||
4833 | The instructions are not quite listed in alphabetical order, since | |
4834 | groups of instructions with similar functions are lumped together in | |
4835 | the same entry. Even so, most of them do not move very far from | |
4836 | their alphabetic position. | |
4837 | ||
4838 | A.1 Key to Operand Specifications | |
4839 | ||
4840 | The instruction descriptions in this appendix specify their operands | |
4841 | using the following notation: | |
4842 | ||
4843 | (*) Registers: `reg8' denotes an 8-bit general purpose register, | |
4844 | `reg16' denotes a 16-bit general purpose register, and `reg32' a | |
4845 | 32-bit one. `fpureg' denotes one of the eight FPU stack | |
4846 | registers, `mmxreg' denotes one of the eight 64-bit MMX | |
4847 | registers, and `segreg' denotes a segment register. In addition, | |
4848 | some registers (such as `AL', `DX' or `ECX') may be specified | |
4849 | explicitly. | |
4850 | ||
4851 | (*) Immediate operands: `imm' denotes a generic immediate operand. | |
4852 | `imm8', `imm16' and `imm32' are used when the operand is | |
4853 | intended to be a specific size. For some of these instructions, | |
4854 | NASM needs an explicit specifier: for example, `ADD ESP,16' | |
4855 | could be interpreted as either `ADD r/m32,imm32' or | |
4856 | `ADD r/m32,imm8'. NASM chooses the former by default, and so you | |
4857 | must specify `ADD ESP,BYTE 16' for the latter. | |
4858 | ||
4859 | (*) Memory references: `mem' denotes a generic memory reference; | |
4860 | `mem8', `mem16', `mem32', `mem64' and `mem80' are used when the | |
4861 | operand needs to be a specific size. Again, a specifier is | |
4862 | needed in some cases: `DEC [address]' is ambiguous and will be | |
4863 | rejected by NASM. You must specify `DEC BYTE [address]', | |
4864 | `DEC WORD [address]' or `DEC DWORD [address]' instead. | |
4865 | ||
4866 | (*) Restricted memory references: one form of the `MOV' instruction | |
4867 | allows a memory address to be specified _without_ allowing the | |
4868 | normal range of register combinations and effective address | |
4869 | processing. This is denoted by `memoffs8', `memoffs16' and | |
4870 | `memoffs32'. | |
4871 | ||
4872 | (*) Register or memory choices: many instructions can accept either | |
4873 | a register _or_ a memory reference as an operand. `r/m8' is a | |
4874 | shorthand for `reg8/mem8'; similarly `r/m16' and `r/m32'. | |
4875 | `r/m64' is MMX-related, and is a shorthand for `mmxreg/mem64'. | |
4876 | ||
4877 | A.2 Key to Opcode Descriptions | |
4878 | ||
4879 | This appendix also provides the opcodes which NASM will generate for | |
4880 | each form of each instruction. The opcodes are listed in the | |
4881 | following way: | |
4882 | ||
4883 | (*) A hex number, such as `3F', indicates a fixed byte containing | |
4884 | that number. | |
4885 | ||
4886 | (*) A hex number followed by `+r', such as `C8+r', indicates that | |
4887 | one of the operands to the instruction is a register, and the | |
4888 | `register value' of that register should be added to the hex | |
4889 | number to produce the generated byte. For example, EDX has | |
4890 | register value 2, so the code `C8+r', when the register operand | |
4891 | is EDX, generates the hex byte `CA'. Register values for | |
4892 | specific registers are given in section A.2.1. | |
4893 | ||
4894 | (*) A hex number followed by `+cc', such as `40+cc', indicates that | |
4895 | the instruction name has a condition code suffix, and the | |
4896 | numeric representation of the condition code should be added to | |
4897 | the hex number to produce the generated byte. For example, the | |
4898 | code `40+cc', when the instruction contains the `NE' condition, | |
4899 | generates the hex byte `45'. Condition codes and their numeric | |
4900 | representations are given in section A.2.2. | |
4901 | ||
4902 | (*) A slash followed by a digit, such as `/2', indicates that one of | |
4903 | the operands to the instruction is a memory address or register | |
4904 | (denoted `mem' or `r/m', with an optional size). This is to be | |
4905 | encoded as an effective address, with a ModR/M byte, an optional | |
4906 | SIB byte, and an optional displacement, and the spare (register) | |
4907 | field of the ModR/M byte should be the digit given (which will | |
4908 | be from 0 to 7, so it fits in three bits). The encoding of | |
4909 | effective addresses is given in section A.2.3. | |
4910 | ||
4911 | (*) The code `/r' combines the above two: it indicates that one of | |
4912 | the operands is a memory address or `r/m', and another is a | |
4913 | register, and that an effective address should be generated with | |
4914 | the spare (register) field in the ModR/M byte being equal to the | |
4915 | `register value' of the register operand. The encoding of | |
4916 | effective addresses is given in section A.2.3; register values | |
4917 | are given in section A.2.1. | |
4918 | ||
4919 | (*) The codes `ib', `iw' and `id' indicate that one of the operands | |
4920 | to the instruction is an immediate value, and that this is to be | |
4921 | encoded as a byte, little-endian word or little-endian | |
4922 | doubleword respectively. | |
4923 | ||
4924 | (*) The codes `rb', `rw' and `rd' indicate that one of the operands | |
4925 | to the instruction is an immediate value, and that the | |
4926 | _difference_ between this value and the address of the end of | |
4927 | the instruction is to be encoded as a byte, word or doubleword | |
4928 | respectively. Where the form `rw/rd' appears, it indicates that | |
4929 | either `rw' or `rd' should be used according to whether assembly | |
4930 | is being performed in `BITS 16' or `BITS 32' state respectively. | |
4931 | ||
4932 | (*) The codes `ow' and `od' indicate that one of the operands to the | |
4933 | instruction is a reference to the contents of a memory address | |
4934 | specified as an immediate value: this encoding is used in some | |
4935 | forms of the `MOV' instruction in place of the standard | |
4936 | effective-address mechanism. The displacement is encoded as a | |
4937 | word or doubleword. Again, `ow/od' denotes that `ow' or `od' | |
4938 | should be chosen according to the `BITS' setting. | |
4939 | ||
4940 | (*) The codes `o16' and `o32' indicate that the given form of the | |
4941 | instruction should be assembled with operand size 16 or 32 bits. | |
4942 | In other words, `o16' indicates a `66' prefix in `BITS 32' | |
4943 | state, but generates no code in `BITS 16' state; and `o32' | |
4944 | indicates a `66' prefix in `BITS 16' state but generates nothing | |
4945 | in `BITS 32'. | |
4946 | ||
4947 | (*) The codes `a16' and `a32', similarly to `o16' and `o32', | |
4948 | indicate the address size of the given form of the instruction. | |
4949 | Where this does not match the `BITS' setting, a `67' prefix is | |
4950 | required. | |
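
A few worked examples may help in reading these codes. The opcode
descriptions in the comments are written in the notation just
described but are a reading of the relevant entries rather than
quotations from them, so check the entry itself if in doubt. All are
assembled as 32-bit code, and `target' is an invented label.

```nasm
        bits 32
target: bswap edx               ; 0F C8+r       gives 0F CA
        mov ecx,0x11223344      ; o32 B8+r id   gives B9 44 33 22 11
        inc dx                  ; o16 40+r      gives 66 42
        cmovne eax,ebx          ; 0F 40+cc /r   gives 0F 45 C3
        add esp,byte 8          ; o32 83 /0 ib  gives 83 C4 08
        mov eax,[0x12345678]    ; o32 A1 ow/od  gives A1 78 56 34 12
        jne target              ; 70+cc rb      gives 75, then one byte
```

Note how `inc dx' needs the `66' prefix because its operand size
does not match the `BITS 32' setting, while the `o32' forms generate
no prefix at all.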
4951 | ||
4952 | A.2.1 Register Values | |
4953 | ||
4954 | Where an instruction requires a register value, it is already | |
4955 | implicit in the encoding of the rest of the instruction what type of | |
4956 | register is intended: an 8-bit general-purpose register, a segment | |
4957 | register, a debug register, an MMX register, or whatever. Therefore | |
4958 | there is no problem with registers of different types sharing an | |
4959 | encoding value. | |
4960 | ||
4961 | The encodings for the various classes of register are: | |
4962 | ||
4963 | (*) 8-bit general registers: `AL' is 0, `CL' is 1, `DL' is 2, `BL' | |
4964 | is 3, `AH' is 4, `CH' is 5, `DH' is 6, and `BH' is 7. | |
4965 | ||
4966 | (*) 16-bit general registers: `AX' is 0, `CX' is 1, `DX' is 2, `BX' | |
4967 | is 3, `SP' is 4, `BP' is 5, `SI' is 6, and `DI' is 7. | |
4968 | ||
4969 | (*) 32-bit general registers: `EAX' is 0, `ECX' is 1, `EDX' is 2, | |
4970 | `EBX' is 3, `ESP' is 4, `EBP' is 5, `ESI' is 6, and `EDI' is 7. | |
4971 | ||
4972 | (*) Segment registers: `ES' is 0, `CS' is 1, `SS' is 2, `DS' is 3, | |
4973 | `FS' is 4, and `GS' is 5. | |
4974 | ||
4975 | (*) Floating-point registers: `ST0' is 0, `ST1' is 1, `ST2' is 2, | |
4976 | `ST3' is 3, `ST4' is 4, `ST5' is 5, `ST6' is 6, and `ST7' is 7. | |
4977 | ||
4978 | (*) 64-bit MMX registers: `MM0' is 0, `MM1' is 1, `MM2' is 2, `MM3' | |
4979 | is 3, `MM4' is 4, `MM5' is 5, `MM6' is 6, and `MM7' is 7. | |
4980 | ||
4981 | (*) Control registers: `CR0' is 0, `CR2' is 2, `CR3' is 3, and `CR4' | |
4982 | is 4. | |
4983 | ||
4984 | (*) Debug registers: `DR0' is 0, `DR1' is 1, `DR2' is 2, `DR3' is 3, | |
4985 | `DR6' is 6, and `DR7' is 7. | |
4986 | ||
4987 | (*) Test registers: `TR3' is 3, `TR4' is 4, `TR5' is 5, `TR6' is 6, | |
4988 | and `TR7' is 7. | |
4989 | ||
4990 | (Note that wherever a register name contains a number, that number | |
4991 | is also the register value for that register.) | |
4992 | ||
4993 | A.2.2 Condition Codes | |
4994 | ||
4995 | The available condition codes are given here, along with their | |
4996 | numeric representations as part of opcodes. Many of these condition | |
4997 | codes have synonyms, so several will be listed at a time. | |
4998 | ||
4999 | In the following descriptions, the word `either', when applied to | |
5000 | two possible trigger conditions, is used to mean `either or both'. | |
5001 | If `either but not both' is meant, the phrase `exactly one of' is | |
5002 | used. | |
5003 | ||
5004 | (*) `O' is 0 (trigger if the overflow flag is set); `NO' is 1. | |
5005 | ||
5006 | (*) `B', `C' and `NAE' are 2 (trigger if the carry flag is set); | |
5007 | `AE', `NB' and `NC' are 3. | |
5008 | ||
5009 | (*) `E' and `Z' are 4 (trigger if the zero flag is set); `NE' and | |
5010 | `NZ' are 5. | |
5011 | ||
5012 | (*) `BE' and `NA' are 6 (trigger if either of the carry or zero | |
5013 | flags is set); `A' and `NBE' are 7. | |
5014 | ||
5015 | (*) `S' is 8 (trigger if the sign flag is set); `NS' is 9. | |
5016 | ||
5017 | (*) `P' and `PE' are 10 (trigger if the parity flag is set); `NP' | |
5018 | and `PO' are 11. | |
5019 | ||
5020 | (*) `L' and `NGE' are 12 (trigger if exactly one of the sign and | |
5021 | overflow flags is set); `GE' and `NL' are 13. | |
5022 | ||
5023 | (*) `LE' and `NG' are 14 (trigger if either the zero flag is set, or | |
5024 | exactly one of the sign and overflow flags is set); `G' and | |
5025 | `NLE' are 15. | |
5026 | ||
5027 | Note that in all cases, the sense of a condition code may be | |
5028 | reversed by changing the low bit of the numeric representation. | |
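
For instance, the short conditional jumps are encoded as `70+cc', so
each pair of opposite conditions differs only in the low bit of the
opcode byte. In this sketch `top' is an invented label, and the
displacement byte that follows each opcode is not shown.

```nasm
        bits 32
top:    je top                  ; 0x70 + 4  = 0x74
        jne top                 ; 0x70 + 5  = 0x75
        jl top                  ; 0x70 + 12 = 0x7C
        jge top                 ; 0x70 + 13 = 0x7D
```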
5029 | ||
5030 | A.2.3 Effective Address Encoding: ModR/M and SIB | |
5031 | ||
5032 | An effective address is encoded in up to three parts: a ModR/M byte, | |
5033 | an optional SIB byte, and an optional byte, word or doubleword | |
5034 | displacement field. | |
5035 | ||
5036 | The ModR/M byte consists of three fields: the `mod' field, ranging | |
5037 | from 0 to 3, in the upper two bits of the byte, the `r/m' field, | |
5038 | ranging from 0 to 7, in the lower three bits, and the spare | |
5039 | (register) field in the middle (bit 3 to bit 5). The spare field is | |
5040 | not relevant to the effective address being encoded, and either | |
5041 | contains an extension to the instruction opcode or the register | |
5042 | value of another operand. | |
5043 | ||
5044 | The ModR/M system can be used to encode a direct register reference | |
5045 | rather than a memory access. This is always done by setting the | |
5046 | `mod' field to 3 and the `r/m' field to the register value of the | |
5047 | register in question (it must be a general-purpose register, and the | |
5048 | size of the register must already be implicit in the encoding of the | |
5049 | rest of the instruction). In this case, the SIB byte and | |
5050 | displacement field are both absent. | |
5051 | ||
5052 | In 16-bit addressing mode (either `BITS 16' with no `67' prefix, or | |
5053 | `BITS 32' with a `67' prefix), the SIB byte is never used. The | |
5054 | general rules for `mod' and `r/m' (there is an exception, given | |
5055 | below) are: | |
5056 | ||
5057 | (*) The `mod' field gives the length of the displacement field: 0 | |
5058 | means no displacement, 1 means one byte, and 2 means two bytes. | |
5059 | ||
5060 | (*) The `r/m' field encodes the combination of registers to be added | |
5061 | to the displacement to give the accessed address: 0 means | |
5062 | `BX+SI', 1 means `BX+DI', 2 means `BP+SI', 3 means `BP+DI', 4 | |
5063 | means `SI' only, 5 means `DI' only, 6 means `BP' only, and 7 | |
5064 | means `BX' only. | |
5065 | ||
5066 | However, there is a special case: | |
5067 | ||
5068 | (*) If `mod' is 0 and `r/m' is 6, the effective address encoded is | |
5069 | not `[BP]' as the above rules would suggest, but instead | |
5070 | `[disp16]': the displacement field is present and is two bytes | |
5071 | long, and no registers are added to the displacement. | |
5072 | ||
5073 | Therefore the effective address `[BP]' cannot be encoded as | |
5074 | efficiently as `[BX]'; so if you code `[BP]' in a program, NASM adds | |
5075 | a notional 8-bit zero displacement, and sets `mod' to 1, `r/m' to 6, | |
5076 | and the one-byte displacement field to 0. | |
5077 | ||
5078 | In 32-bit addressing mode (either `BITS 16' with a `67' prefix, or | |
5079 | `BITS 32' with no `67' prefix) the general rules (again, there are | |
5080 | exceptions) for `mod' and `r/m' are: | |
5081 | ||
5082 | (*) The `mod' field gives the length of the displacement field: 0 | |
5083 | means no displacement, 1 means one byte, and 2 means four bytes. | |
5084 | ||
5085 | (*) If only one register is to be added to the displacement, and it | |
5086 | is not `ESP', the `r/m' field gives its register value, and the | |
5087 | SIB byte is absent. If the `r/m' field is 4 (which would encode | |
5088 | `ESP'), the SIB byte is present and gives the combination and | |
5089 | scaling of registers to be added to the displacement. | |
5090 | ||
5091 | If the SIB byte is present, it describes the combination of | |
5092 | registers (an optional base register, and an optional index register | |
5093 | scaled by multiplication by 1, 2, 4 or 8) to be added to the | |
5094 | displacement. The SIB byte is divided into the `scale' field, in the | |
5095 | top two bits, the `index' field in the next three, and the `base' | |
5096 | field in the bottom three. The general rules are: | |
5097 | ||
5098 | (*) The `base' field encodes the register value of the base | |
5099 | register. | |
5100 | ||
5101 | (*) The `index' field encodes the register value of the index | |
5102 | register, unless it is 4, in which case no index register is | |
5103 | used (so `ESP' cannot be used as an index register). | |
5104 | ||
5105 | (*) The `scale' field encodes the multiplier by which the index | |
5106 | register is scaled before adding it to the base and | |
5107 | displacement: 0 encodes a multiplier of 1, 1 encodes 2, 2 | |
5108 | encodes 4 and 3 encodes 8. | |
5109 | ||
5110 | The exceptions to the 32-bit encoding rules are: | |
5111 | ||
5112 | (*) If `mod' is 0 and `r/m' is 5, the effective address encoded is | |
5113 | not `[EBP]' as the above rules would suggest, but instead | |
5114 | `[disp32]': the displacement field is present and is four bytes | |
5115 | long, and no registers are added to the displacement. | |
5116 | ||
5117 | (*) If `mod' is 0, `r/m' is 4 (meaning the SIB byte is present) and | |
5118 | `base' is 5, the effective address encoded is not `[EBP+index]' | |
5119 | as the above rules would suggest, but instead `[disp32+index]': | |
5120 | the displacement field is present and is four bytes long, and | |
5121 | there is no base register (but the index register is still | |
5122 | processed in the normal way). | |
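As a rough illustration of the 32-bit rules and their exceptions, the following sketch decodes a `mod'/`r/m' pair and an optional SIB byte. The names `REG32' and `decode_ea32' are hypothetical, only `mod' 0 to 2 are handled, and the register numbering (`EAX' = 0 through `EDI' = 7, so `ESP' = 4 and `EBP' = 5) is the standard Intel one.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def decode_ea32(mod, rm, sib=None):
    """Return (base, index, scale, displacement bytes) for mod 0..2."""
    disp_bytes = (0, 1, 4)[mod]
    if rm != 4:                              # no SIB byte follows
        if mod == 0 and rm == 5:
            return (None, None, 1, 4)        # exception: [disp32]
        return (REG32[rm], None, 1, disp_bytes)
    if sib is None:
        raise ValueError("r/m = 4 requires a SIB byte")
    scale = 1 << (sib >> 6)                  # top two bits: 1, 2, 4 or 8
    index_field = (sib >> 3) & 7             # next three bits
    base_field = sib & 7                     # bottom three bits
    index = None if index_field == 4 else REG32[index_field]
    if mod == 0 and base_field == 5:         # base field holds EBP's value
        return (None, index, scale, 4)       # exception: [disp32+index]
    return (REG32[base_field], index, scale, disp_bytes)

print(decode_ea32(0, 4, 0x24))   # [ESP]: SIB with no index, base ESP
print(decode_ea32(0, 4, 0x85))   # [EAX*4+disp32]: no base register
```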
5123 | ||
5124 | A.3 Key to Instruction Flags | |
5125 | ||
5126 | Given along with each instruction in this appendix is a set of | |
5127 | flags, denoting the type of the instruction. The types are as | |
5128 | follows: | |
5129 | ||
5130 | (*) `8086', `186', `286', `386', `486', `PENT' and `P6' denote the | |
5131 | lowest processor type that supports the instruction. Most | |
5132 | instructions run on all processors above the given type; those | |
5133 | that do not are documented. The Pentium II contains no | |
5134 | additional instructions beyond the P6 (Pentium Pro); from the | |
5135 | point of view of its instruction set, it can be thought of as a | |
5136 | P6 with MMX capability. | |
5137 | ||
5138 | (*) `CYRIX' indicates that the instruction is specific to Cyrix | |
5139 | processors, for example the extra MMX instructions in the Cyrix | |
5140 | extended MMX instruction set. | |
5141 | ||
5142 | (*) `FPU' indicates that the instruction is a floating-point one, | |
5143 | and will only run on machines with a coprocessor (automatically | |
5144 | including 486DX, Pentium and above). | |
5145 | ||
5146 | (*) `MMX' indicates that the instruction is an MMX one, and will run | |
5147 | on MMX-capable Pentium processors and the Pentium II. | |
5148 | ||
5149 | (*) `PRIV' indicates that the instruction is a protected-mode | |
5150 | management instruction. Many of these may only be used in | |
5151 | protected mode, or only at privilege level zero. | |
5152 | ||
5153 | (*) `UNDOC' indicates that the instruction is an undocumented one, | |
5154 | and not part of the official Intel Architecture; it may or may | |
5155 | not be supported on any given machine. | |
5156 | ||
5157 | A.4 `AAA', `AAS', `AAM', `AAD': ASCII Adjustments | |
5158 | ||
5159 | AAA ; 37 [8086] | |
5160 | ||
5161 | AAS ; 3F [8086] | |
5162 | ||
5163 | AAD ; D5 0A [8086] | |
5164 | AAD imm ; D5 ib [8086] | |
5165 | ||
5166 | AAM ; D4 0A [8086] | |
5167 | AAM imm ; D4 ib [8086] | |
5168 | ||
5169 | These instructions are used in conjunction with the add, subtract, | |
5170 | multiply and divide instructions to perform binary-coded decimal | |
5171 | arithmetic in _unpacked_ (one BCD digit per byte - easy to translate | |
5172 | to and from ASCII, hence the instruction names) form. There are also | |
5173 | packed BCD instructions `DAA' and `DAS': see section A.23. | |
5174 | ||
5175 | `AAA' should be used after a one-byte `ADD' instruction whose | |
5176 | destination was the `AL' register: by means of examining the value | |
5177 | in the low nibble of `AL' and also the auxiliary carry flag `AF', it | |
5178 | determines whether the addition has overflowed, and adjusts it (and | |
5179 | sets the carry flag) if so. You can add long BCD strings together by | |
5180 | doing `ADD'/`AAA' on the low digits, then doing `ADC'/`AAA' on each | |
5181 | subsequent digit. | |
5182 | ||
5183 | `AAS' works similarly to `AAA', but is for use after `SUB' | |
5184 | instructions rather than `ADD'. | |
5185 | ||
5186 | `AAM' is for use after you have multiplied two decimal digits | |
5187 | together and left the result in `AL': it divides `AL' by ten and | |
5188 | stores the quotient in `AH', leaving the remainder in `AL'. The | |
5189 | divisor 10 can be changed by specifying an operand to the | |
5190 | instruction: a particularly handy use of this is `AAM 16', causing | |
5191 | the two nibbles in `AL' to be separated into `AH' and `AL'. | |
5192 | ||
5193 | `AAD' performs the inverse operation to `AAM': it multiplies `AH' by | |
5194 | ten, adds it to `AL', and sets `AH' to zero. Again, the multiplier | |
5195 | 10 can be changed. | |
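The arithmetic performed by these instructions can be modelled in a few lines. This is a sketch under simplifying assumptions: registers are passed as plain integers, only the carry flag is reported, and `aaa' follows the 8086-style description (adjust `AL', then increment `AH') for sums of two valid BCD digits. The function names are invented for illustration.

```python
def aaa(al, ah, af):
    """Model `AAA'. Returns (al, ah, carry)."""
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF
        ah = (ah + 1) & 0xFF
        carry = True
    else:
        carry = False
    return al & 0x0F, ah, carry

def aam(al, base=10):
    """Model `AAM': returns (ah, al) = (quotient, remainder)."""
    return al // base, al % base

def aad(ah, al, base=10):
    """Model `AAD': returns (ah, al) with ah cleared."""
    return 0, (al + ah * base) & 0xFF

print(aaa(0x12, 0, True))   # 9 + 9 = 0x12 with AF set -> digit 8, AH bumped
print(aam(63))              # 7 * 9 = 63 -> digits 6 and 3
print(aam(0xAB, 16))        # `AAM 16' splits the two nibbles
```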
5196 | ||
5197 | A.5 `ADC': Add with Carry | |
5198 | ||
5199 | ADC r/m8,reg8 ; 10 /r [8086] | |
5200 | ADC r/m16,reg16 ; o16 11 /r [8086] | |
5201 | ADC r/m32,reg32 ; o32 11 /r [386] | |
5202 | ||
5203 | ADC reg8,r/m8 ; 12 /r [8086] | |
5204 | ADC reg16,r/m16 ; o16 13 /r [8086] | |
5205 | ADC reg32,r/m32 ; o32 13 /r [386] | |
5206 | ||
5207 | ADC r/m8,imm8 ; 80 /2 ib [8086] | |
5208 | ADC r/m16,imm16 ; o16 81 /2 iw [8086] | |
5209 | ADC r/m32,imm32 ; o32 81 /2 id [386] | |
5210 | ||
5211 | ADC r/m16,imm8 ; o16 83 /2 ib [8086] | |
5212 | ADC r/m32,imm8 ; o32 83 /2 ib [386] | |
5213 | ||
5214 | ADC AL,imm8 ; 14 ib [8086] | |
5215 | ADC AX,imm16 ; o16 15 iw [8086] | |
5216 | ADC EAX,imm32 ; o32 15 id [386] | |
5217 | ||
5218 | `ADC' performs integer addition: it adds its two operands together, | |
5219 | plus the value of the carry flag, and leaves the result in its | |
5220 | destination (first) operand. The flags are set according to the | |
5221 | result of the operation: in particular, the carry flag is affected | |
5222 | and can be used by a subsequent `ADC' instruction. | |
5223 | ||
5224 | In the forms with an 8-bit immediate second operand and a longer | |
5225 | first operand, the second operand is considered to be signed, and is | |
5226 | sign-extended to the length of the first operand. In these cases, | |
5227 | the `BYTE' qualifier is necessary to force NASM to generate this | |
5228 | form of the instruction. | |
5229 | ||
5230 | To add two numbers without also adding the contents of the carry | |
5231 | flag, use `ADD' (section A.6). | |
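The usual reason to chain `ADC' instructions is multi-word addition. The sketch below models that carry chain in Python; `add_with_carry' and `add_multiword' are hypothetical helper names, and the operands are lists of 32-bit words stored least significant first.

```python
def add_with_carry(a, b, carry, bits=32):
    """Model one `ADC': returns (result, carry out)."""
    total = a + b + carry
    return total & ((1 << bits) - 1), total >> bits

def add_multiword(x, y, bits=32):
    """Add two equal-length word lists, least significant word first."""
    out, carry = [], 0          # carry starts clear, as after a plain `ADD'
    for a, b in zip(x, y):
        word, carry = add_with_carry(a, b, carry, bits)
        out.append(word)
    return out, carry

# 0x1_FFFFFFFF + 1 = 0x2_00000000: the carry ripples into the high word.
print(add_multiword([0xFFFFFFFF, 0x1], [0x1, 0x0]))
```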
5232 | ||
5233 | A.6 `ADD': Add Integers | |
5234 | ||
5235 | ADD r/m8,reg8 ; 00 /r [8086] | |
5236 | ADD r/m16,reg16 ; o16 01 /r [8086] | |
5237 | ADD r/m32,reg32 ; o32 01 /r [386] | |
5238 | ||
5239 | ADD reg8,r/m8 ; 02 /r [8086] | |
5240 | ADD reg16,r/m16 ; o16 03 /r [8086] | |
5241 | ADD reg32,r/m32 ; o32 03 /r [386] | |
5242 | ||
5243 | ADD r/m8,imm8 ; 80 /0 ib [8086] | |
5244 | ADD r/m16,imm16 ; o16 81 /0 iw [8086] | |
5245 | ADD r/m32,imm32 ; o32 81 /0 id [386] | |
5246 | ||
5247 | ADD r/m16,imm8 ; o16 83 /0 ib [8086] | |
5248 | ADD r/m32,imm8 ; o32 83 /0 ib [386] | |
5249 | ||
5250 | ADD AL,imm8 ; 04 ib [8086] | |
5251 | ADD AX,imm16 ; o16 05 iw [8086] | |
5252 | ADD EAX,imm32 ; o32 05 id [386] | |
5253 | ||
5254 | `ADD' performs integer addition: it adds its two operands together, | |
5255 | and leaves the result in its destination (first) operand. The flags | |
5256 | are set according to the result of the operation: in particular, the | |
5257 | carry flag is affected and can be used by a subsequent `ADC' | |
5258 | instruction (section A.5). | |
5259 | ||
5260 | In the forms with an 8-bit immediate second operand and a longer | |
5261 | first operand, the second operand is considered to be signed, and is | |
5262 | sign-extended to the length of the first operand. In these cases, | |
5263 | the `BYTE' qualifier is necessary to force NASM to generate this | |
5264 | form of the instruction. | |
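To see which immediates the short `83 /0 ib' form can represent, it may help to model the sign extension directly. The helpers below are illustrative only (`sign_extend8' and `fits_in_imm8' are made-up names, not anything NASM exposes).

```python
def sign_extend8(imm8, bits):
    """Sign-extend an 8-bit immediate to a `bits'-wide operand."""
    imm8 &= 0xFF
    value = imm8 - 0x100 if imm8 & 0x80 else imm8
    return value & ((1 << bits) - 1)

def fits_in_imm8(value, bits):
    """True if `value' survives a round trip through the 8-bit form."""
    value &= (1 << bits) - 1
    return sign_extend8(value & 0xFF, bits) == value

print(hex(sign_extend8(0xFF, 16)))   # 0xffff, i.e. -1
print(fits_in_imm8(0xFFFF, 16))      # -1 fits
print(fits_in_imm8(0x80, 16))        # +128 does not
```

So a 16-bit `ADD' of 0xFFFF can use the byte form, while an `ADD' of 0x80 needs the full `81 /0 iw' encoding.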
5265 | ||
5266 | A.7 `AND': Bitwise AND | |
5267 | ||
5268 | AND r/m8,reg8 ; 20 /r [8086] | |
5269 | AND r/m16,reg16 ; o16 21 /r [8086] | |
5270 | AND r/m32,reg32 ; o32 21 /r [386] | |
5271 | ||
5272 | AND reg8,r/m8 ; 22 /r [8086] | |
5273 | AND reg16,r/m16 ; o16 23 /r [8086] | |
5274 | AND reg32,r/m32 ; o32 23 /r [386] | |
5275 | ||
5276 | AND r/m8,imm8 ; 80 /4 ib [8086] | |
5277 | AND r/m16,imm16 ; o16 81 /4 iw [8086] | |
5278 | AND r/m32,imm32 ; o32 81 /4 id [386] | |
5279 | ||
5280 | AND r/m16,imm8 ; o16 83 /4 ib [8086] | |
5281 | AND r/m32,imm8 ; o32 83 /4 ib [386] | |
5282 | ||
5283 | AND AL,imm8 ; 24 ib [8086] | |
5284 | AND AX,imm16 ; o16 25 iw [8086] | |
5285 | AND EAX,imm32 ; o32 25 id [386] | |
5286 | ||
5287 | `AND' performs a bitwise AND operation between its two operands | |
5288 | (i.e. each bit of the result is 1 if and only if the corresponding | |
5289 | bits of the two inputs were both 1), and stores the result in the | |
5290 | destination (first) operand. | |
5291 | ||
5292 | In the forms with an 8-bit immediate second operand and a longer | |
5293 | first operand, the second operand is considered to be signed, and is | |
5294 | sign-extended to the length of the first operand. In these cases, | |
5295 | the `BYTE' qualifier is necessary to force NASM to generate this | |
5296 | form of the instruction. | |
5297 | ||
5298 | The MMX instruction `PAND' (see section A.116) performs the same | |
5299 | operation on the 64-bit MMX registers. | |
5300 | ||
5301 | A.8 `ARPL': Adjust RPL Field of Selector | |
5302 | ||
5303 | ARPL r/m16,reg16 ; 63 /r [286,PRIV] | |
5304 | ||
5305 | `ARPL' expects its two word operands to be segment selectors. It | |
5306 | adjusts the RPL (requested privilege level - stored in the bottom | |
5307 | two bits of the selector) field of the destination (first) operand | |
5308 | to ensure that it is no less than (i.e. no more privileged than) the | |
5309 | RPL field of the source operand. The zero flag is set if and only if a | |
5310 | change had to be made. | |
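A minimal model of the adjustment, with selectors passed as 16-bit integers; the name `arpl' and the tuple return value are choices made for this sketch.

```python
def arpl(dest, src):
    """Model `ARPL': returns (new destination selector, zero flag)."""
    if (dest & 3) < (src & 3):
        # Raise the destination RPL (lower its privilege) to match the source.
        return (dest & ~3 & 0xFFFF) | (src & 3), True
    return dest & 0xFFFF, False

print(arpl(0x0008, 0x0013))   # RPL 0 raised to 3, zero flag set
print(arpl(0x000B, 0x0010))   # RPL 3 already >= 0, unchanged
```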
5311 | ||
5312 | A.9 `BOUND': Check Array Index against Bounds | |
5313 | ||
5314 | BOUND reg16,mem ; o16 62 /r [186] | |
5315 | BOUND reg32,mem ; o32 62 /r [386] | |
5316 | ||
5317 | `BOUND' expects its second operand to point to an area of memory | |
5318 | containing two signed values of the same size as its first operand | |
5319 | (i.e. two words for the 16-bit form; two doublewords for the 32-bit | |
5320 | form). It performs two signed comparisons: if the value in the | |
5321 | register passed as its first operand is less than the first of the | |
5322 | in-memory values, or is greater than the second, it | |
5323 | throws a BR exception. Otherwise, it does nothing. | |
5324 | ||
5325 | A.10 `BSF', `BSR': Bit Scan | |
5326 | ||
5327 | BSF reg16,r/m16 ; o16 0F BC /r [386] | |
5328 | BSF reg32,r/m32 ; o32 0F BC /r [386] | |
5329 | ||
5330 | BSR reg16,r/m16 ; o16 0F BD /r [386] | |
5331 | BSR reg32,r/m32 ; o32 0F BD /r [386] | |
5332 | ||
5333 | `BSF' searches for a set bit in its source (second) operand, | |
5334 | starting from the bottom, and if it finds one, stores the index in | |
5335 | its destination (first) operand. If no set bit is found, the | |
5336 | contents of the destination operand are undefined. | |
5337 | ||
5338 | `BSR' performs the same function, but searches from the top instead, | |
5339 | so it finds the most significant set bit. | |
5340 | ||
5341 | Bit indices are from 0 (least significant) to 15 or 31 (most | |
5342 | significant). | |
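The two scans can be mimicked with integer operations. In this sketch `None' stands in for the undefined destination when no bit is set; the function names simply mirror the mnemonics.

```python
def bsf(value):
    """Index of the lowest set bit, or None if value is zero."""
    if value == 0:
        return None
    return (value & -value).bit_length() - 1   # isolate lowest set bit

def bsr(value):
    """Index of the highest set bit, or None if value is zero."""
    if value == 0:
        return None
    return value.bit_length() - 1

print(bsf(0b101000), bsr(0b101000))
```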
5343 | ||
5344 | A.11 `BSWAP': Byte Swap | |
5345 | ||
5346 | BSWAP reg32 ; o32 0F C8+r [486] | |
5347 | ||
5348 | `BSWAP' swaps the order of the four bytes of a 32-bit register: bits | |
5349 | 0-7 exchange places with bits 24-31, and bits 8-15 swap with bits | |
5350 | 16-23. There is no explicit 16-bit equivalent: to byte-swap `AX', | |
5351 | `BX', `CX' or `DX', `XCHG' can be used (for example, `XCHG AL,AH'). | |
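The byte reordering is easy to check with a short model. Both helper names below are invented for this sketch; `swap_halves16' corresponds to exchanging the two byte halves of a 16-bit register with `XCHG'.

```python
def bswap32(value):
    """Reverse the four bytes of a 32-bit value."""
    return int.from_bytes((value & 0xFFFFFFFF).to_bytes(4, "little"), "big")

def swap_halves16(value):
    """Exchange the low and high bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)

print(hex(bswap32(0x12345678)))
print(hex(swap_halves16(0x1234)))
```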
5352 | ||
5353 | A.12 `BT', `BTC', `BTR', `BTS': Bit Test | |
5354 | ||
5355 | BT r/m16,reg16 ; o16 0F A3 /r [386] | |
5356 | BT r/m32,reg32 ; o32 0F A3 /r [386] | |
5357 | BT r/m16,imm8 ; o16 0F BA /4 ib [386] | |
5358 | BT r/m32,imm8 ; o32 0F BA /4 ib [386] | |
5359 | ||
5360 | BTC r/m16,reg16 ; o16 0F BB /r [386] | |
5361 | BTC r/m32,reg32 ; o32 0F BB /r [386] | |
5362 | BTC r/m16,imm8 ; o16 0F BA /7 ib [386] | |
5363 | BTC r/m32,imm8 ; o32 0F BA /7 ib [386] | |
5364 | ||
5365 | BTR r/m16,reg16 ; o16 0F B3 /r [386] | |
5366 | BTR r/m32,reg32 ; o32 0F B3 /r [386] | |
5367 | BTR r/m16,imm8 ; o16 0F BA /6 ib [386] | |
5368 | BTR r/m32,imm8 ; o32 0F BA /6 ib [386] | |
5369 | ||
5370 | BTS r/m16,reg16 ; o16 0F AB /r [386] | |
5371 | BTS r/m32,reg32 ; o32 0F AB /r [386] | |
5372 | BTS r/m16,imm8 ; o16 0F BA /5 ib [386] | |
5373 | BTS r/m32,imm8 ; o32 0F BA /5 ib [386] | |
5374 | ||
5375 | These instructions all test one bit of their first operand, whose | |
5376 | index is given by the second operand, and store the value of that | |
5377 | bit into the carry flag. Bit indices are from 0 (least significant) | |
5378 | to 15 or 31 (most significant). | |
5379 | ||
5380 | In addition to storing the original value of the bit into the carry | |
5381 | flag, `BTR' also resets (clears) the bit in the operand itself. | |
5382 | `BTS' sets the bit, and `BTC' complements the bit. `BT' does not | |
5383 | modify its operands. | |
5384 | ||
5385 | The bit offset should be less than the size of the operand in bits. | |
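The four instructions differ only in what they do to the tested bit, which the following sketch makes explicit. It models a register first operand, for which the bit index is taken modulo the operand size; the function name and the string-valued `op' parameter are conveniences of this example.

```python
def bit_test(value, index, op="bt", bits=32):
    """Return (new value, carry flag) for BT/BTS/BTR/BTC on a register."""
    index %= bits                    # register operands wrap the bit index
    mask = 1 << index
    carry = bool(value & mask)       # original bit goes to the carry flag
    if op == "bts":
        value |= mask                # set
    elif op == "btr":
        value &= ~mask               # reset (clear)
    elif op == "btc":
        value ^= mask                # complement
    elif op != "bt":
        raise ValueError("unknown operation: %r" % op)
    return value, carry

print(bit_test(0b0100, 2))           # BT leaves the operand alone
print(bit_test(0b0100, 2, "btr"))    # BTR clears the bit it reported
```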
5386 | ||
5387 | A.13 `CALL': Call Subroutine | |
5388 | ||
5389 | CALL imm ; E8 rw/rd [8086] | |
5390 | CALL imm:imm16 ; o16 9A iw iw [8086] | |
5391 | CALL imm:imm32 ; o32 9A id iw [386] | |
5392 | CALL FAR mem16 ; o16 FF /3 [8086] | |
5393 | CALL FAR mem32 ; o32 FF /3 [386] | |
5394 | CALL r/m16 ; o16 FF /2 [8086] | |
5395 | CALL r/m32 ; o32 FF /2 [386] | |
5396 | ||
5397 | `CALL' calls a subroutine, by means of pushing the current | |
5398 | instruction pointer (`IP') and optionally `CS' as well on the stack, | |
5399 | and then jumping to a given address. | |
5400 | ||
5401 | `CS' is pushed as well as `IP' if and only if the call is a far | |
5402 | call, i.e. a destination segment address is specified in the | |
5403 | instruction. The forms involving two colon-separated arguments are | |
5404 | far calls; so are the `CALL FAR mem' forms. | |
5405 | ||
5406 | You can choose between the two immediate far call forms | |
5407 | (`CALL imm:imm') by the use of the `WORD' and `DWORD' keywords: | |
5408 | `CALL WORD 0x1234:0x5678' or `CALL DWORD 0x1234:0x56789abc'. | |
5409 | ||
5410 | The `CALL FAR mem' forms execute a far call by loading the | |
5411 | destination address out of memory. The address loaded consists of 16 | |
5412 | or 32 bits of offset (depending on the operand size), and 16 bits of | |
5413 | segment. The operand size may be overridden using | |
5414 | `CALL WORD FAR mem' or `CALL DWORD FAR mem'. | |
5415 | ||
5416 | The `CALL r/m' forms execute a near call (within the same segment), | |
5417 | loading the destination address out of memory or out of a register. | |
5418 | The keyword `NEAR' may be specified, for clarity, in these forms, | |
5419 | but is not necessary. Again, operand size can be overridden using | |
5420 | `CALL WORD mem' or `CALL DWORD mem'. | |
5421 | ||
5422 | As a convenience, NASM does not require you to call a far procedure | |
5423 | symbol by coding the cumbersome `CALL SEG routine:routine', but | |
5424 | instead allows the easier synonym `CALL FAR routine'. | |
5425 | ||
5426 | The `CALL r/m' forms given above are near calls; NASM will accept | |
5427 | the `NEAR' keyword (e.g. `CALL NEAR [address]'), even though it is | |
5428 | not strictly necessary. | |
5429 | ||
5430 | A.14 `CBW', `CWD', `CDQ', `CWDE': Sign Extensions | |
5431 | ||
5432 | CBW ; o16 98 [8086] | |
5433 | CWD ; o16 99 [8086] | |
5434 | CDQ ; o32 99 [386] | |
5435 | CWDE ; o32 98 [386] | |
5436 | ||
5437 | All these instructions sign-extend a short value into a longer one, | |
5438 | by replicating the top bit of the original value to fill the | |
5439 | extended one. | |
5440 | ||
5441 | `CBW' extends `AL' into `AX' by repeating the top bit of `AL' in | |
5442 | every bit of `AH'. `CWD' extends `AX' into `DX:AX' by repeating the | |
5443 | top bit of `AX' throughout `DX'. `CWDE' extends `AX' into `EAX', and | |
5444 | `CDQ' extends `EAX' into `EDX:EAX'. | |
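The effect of each instruction can be reproduced with one generic helper. This is a sketch with invented names; the register pairs `DX:AX' and `EDX:EAX' are returned as (high, low) tuples.

```python
def sign_extend(value, from_bits, to_bits):
    """Replicate the top bit of a from_bits value across to_bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value

def cbw(al):
    return sign_extend(al, 8, 16)              # AL -> AX

def cwde(ax):
    return sign_extend(ax, 16, 32)             # AX -> EAX

def cwd(ax):
    wide = sign_extend(ax, 16, 32)
    return wide >> 16, wide & 0xFFFF           # (DX, AX)

def cdq(eax):
    wide = sign_extend(eax, 32, 64)
    return wide >> 32, wide & 0xFFFFFFFF       # (EDX, EAX)

print(hex(cbw(0x80)))          # negative byte: AH filled with ones
print(cwd(0x8000))             # DX becomes 0xFFFF
```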
5445 | ||
5446 | A.15 `CLC', `CLD', `CLI', `CLTS': Clear Flags | |
5447 | ||
5448 | CLC ; F8 [8086] | |
5449 | CLD ; FC [8086] | |
5450 | CLI ; FA [8086] | |
5451 | CLTS ; 0F 06 [286,PRIV] | |
5452 | ||
5453 | These instructions clear various flags. `CLC' clears the carry flag; | |
5454 | `CLD' clears the direction flag; `CLI' clears the interrupt flag | |
5455 | (thus disabling interrupts); and `CLTS' clears the task-switched | |
5456 | (`TS') flag in `CR0'. | |
5457 | ||
5458 | To set the carry, direction, or interrupt flags, use the `STC', | |
5459 | `STD' and `STI' instructions (section A.156). To invert the carry | |
5460 | flag, use `CMC' (section A.16). | |
5461 | ||
5462 | A.16 `CMC': Complement Carry Flag | |
5463 | ||
5464 | CMC ; F5 [8086] | |
5465 | ||
5466 | `CMC' changes the value of the carry flag: if it was 0, it sets it | |
5467 | to 1, and vice versa. | |
5468 | ||
5469 | A.17 `CMOVcc': Conditional Move | |
5470 | ||
5471 | CMOVcc reg16,r/m16 ; o16 0F 40+cc /r [P6] | |
5472 | CMOVcc reg32,r/m32 ; o32 0F 40+cc /r [P6] | |
5473 | ||
5474 | `CMOV' moves its source (second) operand into its destination | |
5475 | (first) operand if the given condition code is satisfied; otherwise | |
5476 | it does nothing. | |
5477 | ||
5478 | For a list of condition codes, see section A.2.2. | |
5479 | ||
5480 | Although the `CMOV' instructions are flagged `P6' above, they may | |
5481 | not be supported by all Pentium Pro processors; the `CPUID' | |
5482 | instruction (section A.22) will return a bit which indicates whether | |
5483 | conditional moves are supported. | |
5484 | ||
5485 | A.18 `CMP': Compare Integers | |
5486 | ||
5487 | CMP r/m8,reg8 ; 38 /r [8086] | |
5488 | CMP r/m16,reg16 ; o16 39 /r [8086] | |
5489 | CMP r/m32,reg32 ; o32 39 /r [386] | |
5490 | ||
5491 | CMP reg8,r/m8 ; 3A /r [8086] | |
5492 | CMP reg16,r/m16 ; o16 3B /r [8086] | |
5493 | CMP reg32,r/m32 ; o32 3B /r [386] | |
5494 | ||
5495 | CMP r/m8,imm8 ; 80 /7 ib [8086] | |
5496 | CMP r/m16,imm16 ; o16 81 /7 iw [8086] | |
5497 | CMP r/m32,imm32 ; o32 81 /7 id [386] | |
5498 | ||
5499 | CMP r/m16,imm8 ; o16 83 /7 ib [8086] | |
5500 | CMP r/m32,imm8 ; o32 83 /7 ib [386] | |
5501 | ||
5502 | CMP AL,imm8 ; 3C ib [8086] | |
5503 | CMP AX,imm16 ; o16 3D iw [8086] | |
5504 | CMP EAX,imm32 ; o32 3D id [386] | |
5505 | ||
5506 | `CMP' performs a `mental' subtraction of its second operand from its | |
5507 | first operand, and affects the flags as if the subtraction had taken | |
5508 | place, but does not store the result of the subtraction anywhere. | |
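The flags a comparison produces can be modelled by doing the subtraction and throwing the result away. The sketch below reports only the zero, carry, sign and overflow flags; `cmp_flags' is a name chosen for this example.

```python
def cmp_flags(a, b, bits=32):
    """Flags left by `CMP a,b': subtract b from a, keep only the flags."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "ZF": result == 0,                            # operands equal
        "CF": a < b,                                  # unsigned borrow
        "SF": bool(result & sign),                    # top bit of result
        "OF": bool((a ^ b) & (a ^ result) & sign),    # signed overflow
    }

print(cmp_flags(5, 5))
print(cmp_flags(1, 2, bits=8))
```

A conditional jump after `CMP' then tests combinations of these flags: unsigned comparisons look at `CF' and `ZF', signed ones at `SF', `OF' and `ZF'.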
5509 | ||
5510 | In the forms with an 8-bit immediate second operand and a longer | |
5511 | first operand, the second operand is considered to be signed, and is | |
5512 | sign-extended to the length of the first operand. In these cases, | |
5513 | the `BYTE' qualifier is necessary to force NASM to generate this | |
5514 | form of the instruction. | |
5515 | ||
5516 | A.19 `CMPSB', `CMPSW', `CMPSD': Compare Strings | |
5517 | ||
5518 | CMPSB ; A6 [8086] | |
5519 | CMPSW ; o16 A7 [8086] | |
5520 | CMPSD ; o32 A7 [386] | |
5521 | ||
5522 | `CMPSB' compares the byte at `[DS:SI]' or `[DS:ESI]' with the byte | |
5523 | at `[ES:DI]' or `[ES:EDI]', and sets the flags accordingly. It then | |
5524 | increments or decrements (depending on the direction flag: | |
5525 | increments if the flag is clear, decrements if it is set) `SI' and | |
5526 | `DI' (or `ESI' and `EDI'). | |
5527 | ||
5528 | The registers used are `SI' and `DI' if the address size is 16 bits, | |
5529 | and `ESI' and `EDI' if it is 32 bits. If you need to use an address | |
5530 | size not equal to the current `BITS' setting, you can use an | |
5531 | explicit `a16' or `a32' prefix. | |
5532 | ||
5533 | The segment register used to load from `[SI]' or `[ESI]' can be | |
5534 | overridden by using a segment register name as a prefix (for | |
5535 | example, `es cmpsb'). The use of `ES' for the load from `[DI]' or | |
5536 | `[EDI]' cannot be overridden. | |
5537 | ||
5538 | `CMPSW' and `CMPSD' work in the same way, but they compare a word or | |
5539 | a doubleword instead of a byte, and increment or decrement the | |
5540 | addressing registers by 2 or 4 instead of 1. | |
5541 | ||
5542 | The `REPE' and `REPNE' prefixes (equivalently, `REPZ' and `REPNZ') | |
5543 | may be used to repeat the instruction up to `CX' (or `ECX' - again, | |
5544 | the address size chooses which) times, stopping at the first unequal | |
5545 | (`REPE') or equal (`REPNE') pair of elements. | |
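A repeated compare can be pictured as the loop below. This is only a model of `REPE CMPSB' with the direction flag clear: the two strings are Python byte strings, `pos' stands for how far `SI' and `DI' have advanced, and the zero flag is the only flag tracked. All names are illustrative.

```python
def repe_cmpsb(src, dst, count, zf=False):
    """Return (bytes advanced, count left, zero flag) after REPE CMPSB."""
    pos = 0
    while count > 0:
        zf = src[pos] == dst[pos]   # CMPSB sets the flags from this pair
        pos += 1                    # SI and DI both move on
        count -= 1                  # CX counts every comparison made
        if not zf:
            break                   # REPE stops at the first mismatch
    return pos, count, zf

print(repe_cmpsb(b"hello", b"help!", 5))   # mismatch at the fourth byte
print(repe_cmpsb(b"abc", b"abc", 3))       # ran to completion, still equal
```

Note that the pointers have already moved past the mismatching byte when the loop stops, and that a zero count performs no comparison and leaves the flags as they were.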
5546 | ||
5547 | A.20 `CMPXCHG', `CMPXCHG486': Compare and Exchange | |
5548 | ||
5549 | CMPXCHG r/m8,reg8 ; 0F B0 /r [PENT] | |
5550 | CMPXCHG r/m16,reg16 ; o16 0F B1 /r [PENT] | |
5551 | CMPXCHG r/m32,reg32 ; o32 0F B1 /r [PENT] | |
5552 | ||
5553 | CMPXCHG486 r/m8,reg8 ; 0F A6 /r [486,UNDOC] | |
5554 | CMPXCHG486 r/m16,reg16 ; o16 0F A7 /r [486,UNDOC] | |
5555 | CMPXCHG486 r/m32,reg32 ; o32 0F A7 /r [486,UNDOC] | |
5556 | ||
5557 | These two instructions perform exactly the same operation; however, | |
5558 | apparently some (not all) 486 processors support it under a non- | |
5559 | standard opcode, so NASM provides the undocumented `CMPXCHG486' form | |
5560 | to generate the non-standard opcode. | |
5561 | ||
5562 | `CMPXCHG' compares its destination (first) operand to the value in | |
5563 | `AL', `AX' or `EAX' (depending on the size of the instruction). If | |
5564 | they are equal, it copies its source (second) operand into the | |
5565 | destination and sets the zero flag. Otherwise, it clears the zero | |
5566 | flag, leaves the destination alone, and copies it into the accumulator. | |
5567 | ||
5568 | `CMPXCHG' is intended to be used for atomic operations in | |
5569 | multitasking or multiprocessor environments. To safely update a | |
5570 | value in shared memory, for example, you might load the value into | |
5571 | `EAX', load the updated value into `EBX', and then execute the | |
5572 | instruction `lock cmpxchg [value],ebx'. If `value' has not changed | |
5573 | since being loaded, it is updated with your desired new value, and | |
5574 | the zero flag is set to let you know it has worked. (The `LOCK' | |
5575 | prefix prevents another processor doing anything in the middle of | |
5576 | this operation: it guarantees atomicity.) However, if another | |
5577 | processor has modified the value in between your load and your | |
5578 | attempted store, the store does not happen, and you are notified of | |
5579 | the failure by a cleared zero flag, so you can go round and try | |
5580 | again. | |
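The load, compute, compare-and-exchange, retry pattern described above might look like this when modelled in Python. It is a single-threaded sketch, so the retry path is never actually taken; `memory' is a plain dictionary standing in for shared memory, and the function names are invented. On failure the model hands back the value found in memory, which is what the real instruction leaves in the accumulator.

```python
def cmpxchg(memory, addr, accumulator, source):
    """Model `CMPXCHG [addr],source'. Returns (zero flag, accumulator)."""
    current = memory[addr]
    if current == accumulator:
        memory[addr] = source       # values matched: store the new value
        return True, accumulator
    return False, current           # mismatch: accumulator gets the old value

def atomic_add(memory, addr, delta):
    """Retry loop: keep trying until the store goes through."""
    eax = memory[addr]                              # load the current value
    while True:
        ebx = (eax + delta) & 0xFFFFFFFF            # compute the update
        zf, eax = cmpxchg(memory, addr, eax, ebx)   # attempt the store
        if zf:
            return ebx                              # success
        # otherwise eax now holds the fresh value; go round again

shared = {0: 41}
print(atomic_add(shared, 0, 1), shared)
```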
5581 | ||
5582 | A.21 `CMPXCHG8B': Compare and Exchange Eight Bytes | |
5583 | ||
5584 | CMPXCHG8B mem ; 0F C7 /1 [PENT] | |
5585 | ||
5586 | This is a larger and more unwieldy version of `CMPXCHG': it compares | |
5587 | the 64-bit (eight-byte) value stored at `[mem]' with the value in | |
5588 | `EDX:EAX'. If they are equal, it sets the zero flag and stores | |
5589 | `ECX:EBX' into the memory area. If they are unequal, it clears the | |
5590 | zero flag and leaves the memory area untouched. | |
5591 | ||
5592 | A.22 `CPUID': Get CPU Identification Code | |
5593 | ||
5594 | CPUID ; 0F A2 [PENT] | |
5595 | ||
5596 | `CPUID' returns various information about the processor it is being | |
5597 | executed on. It fills the four registers `EAX', `EBX', `ECX' and | |
5598 | `EDX' with information, which varies depending on the input contents | |
5599 | of `EAX'. | |
5600 | ||
5601 | `CPUID' also acts as a barrier to serialise instruction execution: | |
5602 | executing the `CPUID' instruction guarantees that all the effects | |
5603 | (memory modification, flag modification, register modification) of | |
5604 | previous instructions have been completed before the next | |
5605 | instruction gets fetched. | |
5606 | ||
5607 | The information returned is as follows: | |
5608 | ||
5609 | (*) If `EAX' is zero on input, `EAX' on output holds the maximum | |
5610 | acceptable input value of `EAX', and `EBX:EDX:ECX' contain the | |
5611 | string `"GenuineIntel"' (or not, if you have a clone processor). | |
5612 | That is to say, `EBX' contains `"Genu"' (in NASM's own sense of | |
5613 | character constants, described in section 3.4.2), `EDX' contains | |
5614 | `"ineI"' and `ECX' contains `"ntel"'. | |
5615 | ||
5616 | (*) If `EAX' is one on input, `EAX' on output contains version | |
5617 | information about the processor, and `EDX' contains a set of | |
5618 | feature flags, showing the presence and absence of various | |
5619 | features. For example, bit 8 is set if the `CMPXCHG8B' | |
5620 | instruction (section A.21) is supported, bit 15 is set if the | |
5621 | conditional move instructions (section A.17 and section A.34) | |
5622 | are supported, and bit 23 is set if MMX instructions are | |
5623 | supported. | |
5624 | ||
5625 | (*) If `EAX' is two on input, `EAX', `EBX', `ECX' and `EDX' all | |
5626 | contain information about caches and TLBs (Translation Lookaside | |
5627 | Buffers). | |
5628 | ||
5629 | For more information on the data returned from `CPUID', see the | |
5630 | documentation on Intel's web site. | |
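Given the register values `CPUID' returns, decoding them is mechanical. The sketch below does not execute `CPUID' itself; it only interprets register contents passed in as integers, using the bit positions listed above. `vendor_string', `features' and `FEATURE_BITS' are names made up for this example.

```python
import struct

# Feature flag bit positions in EDX (input EAX = 1), as listed above.
FEATURE_BITS = {"CMPXCHG8B": 8, "CMOV": 15, "MMX": 23}

def vendor_string(ebx, edx, ecx):
    """Join EBX:EDX:ECX (input EAX = 0) into the 12-character vendor ID."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")

def features(edx):
    """Map each known feature name to whether its bit is set."""
    return {name: bool(edx >> bit & 1) for name, bit in FEATURE_BITS.items()}

# "Genu", "ineI", "ntel" as little-endian doublewords.
print(vendor_string(0x756E6547, 0x49656E69, 0x6C65746E))
print(features((1 << 8) | (1 << 23)))
```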
5631 | ||
5632 | A.23 `DAA', `DAS': Decimal Adjustments | |
5633 | ||
5634 | DAA ; 27 [8086] | |
5635 | DAS ; 2F [8086] | |
5636 | ||
5637 | These instructions are used in conjunction with the add and subtract | |
5638 | instructions to perform binary-coded decimal arithmetic in _packed_ | |
5639 | (one BCD digit per nibble) form. For the unpacked equivalents, see | |
5640 | section A.4. | |
5641 | ||
5642 | `DAA' should be used after a one-byte `ADD' instruction whose | |
5643 | destination was the `AL' register: by means of examining the value | |
5644 | in `AL' and also the auxiliary carry flag `AF', it determines | |
5645 | whether either digit of the addition has overflowed, and adjusts it | |
5646 | (and sets the carry and auxiliary-carry flags) if so. You can add | |
5647 | long BCD strings together by doing `ADD'/`DAA' on the low two | |
5648 | digits, then doing `ADC'/`DAA' on each subsequent pair of digits. | |
5649 | ||
5650 | `DAS' works similarly to `DAA', but is for use after `SUB' | |
5651 | instructions rather than `ADD'. | |
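Packed BCD addition can be modelled as a binary add followed by the decimal adjustment. The sketch follows the adjustment procedure given in Intel's instruction reference; the helper names are invented, and only `AL', the carry flag and the auxiliary carry flag going into `DAA' are tracked.

```python
def add_al(a, b, carry_in=0):
    """Binary byte add. Returns (al, carry flag, auxiliary carry flag)."""
    total = a + b + carry_in
    af = ((a & 0x0F) + (b & 0x0F) + carry_in) > 0x0F
    return total & 0xFF, total > 0xFF, af

def daa(al, cf, af):
    """Model `DAA'. Returns (adjusted al, carry flag)."""
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF          # fix up the low digit
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF       # fix up the high digit
        cf = True
    else:
        cf = False
    return al, cf

def bcd_add(a, b, carry_in=0):
    """`ADD'/`ADC' followed by `DAA' on two packed BCD bytes."""
    return daa(*add_al(a, b, carry_in))

print(bcd_add(0x38, 0x45))   # 38 + 45 = 83
print(bcd_add(0x99, 0x01))   # 99 + 1 = 100: digits 00 with carry out
```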
5652 | ||
5653 | A.24 `DEC': Decrement Integer | |
5654 | ||
5655 | DEC reg16 ; o16 48+r [8086] | |
5656 | DEC reg32 ; o32 48+r [386] | |
5657 | DEC r/m8 ; FE /1 [8086] | |
5658 | DEC r/m16 ; o16 FF /1 [8086] | |
5659 | DEC r/m32 ; o32 FF /1 [386] | |
5660 | ||
5661 | `DEC' subtracts 1 from its operand. It does _not_ affect the carry | |
5662 | flag: to affect the carry flag, use `SUB something,1' (see section | |
5663 | A.159). See also `INC' (section A.79). | |
5664 | ||
5665 | A.25 `DIV': Unsigned Integer Divide | |
5666 | ||
5667 | DIV r/m8 ; F6 /6 [8086] | |
5668 | DIV r/m16 ; o16 F7 /6 [8086] | |
5669 | DIV r/m32 ; o32 F7 /6 [386] | |
5670 | ||
5671 | `DIV' performs unsigned integer division. The explicit operand | |
5672 | provided is the divisor; the dividend and destination operands are | |
5673 | implicit, in the following way: | |
5674 | ||
5675 | (*) For `DIV r/m8', `AX' is divided by the given operand; the | |
5676 | quotient is stored in `AL' and the remainder in `AH'. | |
5677 | ||
5678 | (*) For `DIV r/m16', `DX:AX' is divided by the given operand; the | |
5679 | quotient is stored in `AX' and the remainder in `DX'. | |
5680 | ||
5681 | (*) For `DIV r/m32', `EDX:EAX' is divided by the given operand; the | |
5682 | quotient is stored in `EAX' and the remainder in `EDX'. | |
5683 | ||
5684 | Signed integer division is performed by the `IDIV' instruction: see | |
5685 | section A.76. | |
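The three forms differ only in operand width, so one parameterised model covers them. In this sketch `bits' is the width of the divisor (8, 16 or 32) and the dividend is the combined double-width value; Python exceptions stand in for the divide error the processor raises when the divisor is zero or the quotient does not fit.

```python
def div(dividend, divisor, bits):
    """Model `DIV'. Returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divide error: divisor is zero")
    quotient, remainder = divmod(dividend, divisor)
    if quotient >> bits:
        raise OverflowError("divide error: quotient does not fit")
    return quotient, remainder

print(div(100, 7, 8))           # DIV r/m8: AL = 14, AH = 2
print(div(0x10000, 0x100, 16))  # DIV r/m16 with DX:AX = 0x0001:0x0000
```

The overflow case is why the high half of the dividend (`AH', `DX' or `EDX') is normally cleared before dividing a single-width value.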
5686 | ||
5687 | A.26 `EMMS': Empty MMX State | |
5688 | ||
5689 | EMMS ; 0F 77 [PENT,MMX] | |
5690 | ||
5691 | `EMMS' sets the FPU tag word (marking which floating-point registers | |
5692 | are available) to all ones, meaning all registers are available for | |
5693 | the FPU to use. It should be used after executing MMX instructions | |
5694 | and before executing any subsequent floating-point operations. | |
5695 | ||
5696 | A.27 `ENTER': Create Stack Frame | |
5697 | ||
5698 | ENTER imm,imm ; C8 iw ib [186] | |
5699 | ||
5700 | `ENTER' constructs a stack frame for a high-level language procedure | |
5701 | call. The first operand (the `iw' in the opcode definition above | |
5702 | refers to the first operand) gives the amount of stack space to | |
5703 | allocate for local variables; the second (the `ib' above) gives the | |
5704 | nesting level of the procedure (for languages like Pascal, with | |
5705 | nested procedures). | |
5706 | ||
5707 | The function of `ENTER', with a nesting level of zero, is equivalent | |
5708 | to | |
5709 | ||
5710 | PUSH EBP ; or PUSH BP in 16 bits | |
5711 | MOV EBP,ESP ; or MOV BP,SP in 16 bits | |
5712 | SUB ESP,operand1 ; or SUB SP,operand1 in 16 bits | |
5713 | ||
5714 | This creates a stack frame with the procedure parameters accessible | |
5715 | upwards from `EBP', and local variables accessible downwards from | |
5716 | `EBP'. | |
5717 | ||
5718 | With a nesting level of one, the stack frame created is 4 (or 2) | |
5719 | bytes bigger, and the value of the final frame pointer `EBP' is | |
5720 | accessible in memory at `[EBP-4]'. | |
5721 | ||
5722 | This allows `ENTER', when called with a nesting level of two, to | |
5723 | look at the stack frame described by the _previous_ value of `EBP', | |
5724 | find the frame pointer at offset -4 from that, and push it along | |
5725 | with its new frame pointer, so that when a level-two procedure is | |
5726 | called from within a level-one procedure, `[EBP-4]' holds the frame | |
5727 | pointer of the most recent level-one procedure call and `[EBP-8]' | |
5728 | holds that of the most recent level-two call. And so on, for nesting | |
5729 | levels up to 31. | |
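The frame-pointer copying is easier to follow on a simulated stack. The sketch below models the 32-bit form only, following the operation described in Intel's instruction reference; `mem' is a dictionary standing in for stack memory, and the function returns the new `ESP' and `EBP'. The name `enter' and the calling convention of the model are assumptions of this example.

```python
def enter(mem, esp, ebp, size, level):
    """Model 32-bit `ENTER size,level'. Returns (new ESP, new EBP)."""
    def push(value):
        nonlocal esp
        esp -= 4
        mem[esp] = value

    push(ebp)                       # save the caller's frame pointer
    frame = esp                     # this becomes the new EBP
    if level > 0:
        for _ in range(level - 1):
            ebp -= 4
            push(mem[ebp])          # copy an enclosing frame pointer
        push(frame)                 # then push our own frame pointer
    return esp - size, frame        # reserve the local variables

mem = {}
esp, ebp = enter(mem, 0x1000, 0x2000, 8, 1)    # a level-one procedure
esp, ebp = enter(mem, esp, ebp, 0, 2)          # calls a level-two one
print(hex(mem[ebp - 4]), hex(mem[ebp - 8]))    # level-one and level-two frames
```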
5730 | ||
5731 | Stack frames created by `ENTER' can be destroyed by the `LEAVE' | |
5732 | instruction: see section A.94. | |
5733 | ||
5734 | A.28 `F2XM1': Calculate 2**X-1 | |
5735 | ||
5736 | F2XM1 ; D9 F0 [8086,FPU] | |
5737 | ||
5738 | `F2XM1' raises 2 to the power of `ST0', subtracts one, and stores | |
5739 | the result back into `ST0'. The initial contents of `ST0' must be a | |
5740 | number in the range -1 to +1. | |
5741 | ||
5742 | A.29 `FABS': Floating-Point Absolute Value | |
5743 | ||
5744 | FABS ; D9 E1 [8086,FPU] | |
5745 | ||
5746 | `FABS' computes the absolute value of `ST0', storing the result back | |
5747 | in `ST0'. | |
5748 | ||
5749 | A.30 `FADD', `FADDP': Floating-Point Addition | |
5750 | ||
5751 | FADD mem32 ; D8 /0 [8086,FPU] | |
5752 | FADD mem64 ; DC /0 [8086,FPU] | |
5753 | ||
5754 | FADD fpureg ; D8 C0+r [8086,FPU] | |
5755 | FADD ST0,fpureg ; D8 C0+r [8086,FPU] | |
5756 | ||
5757 | FADD TO fpureg ; DC C0+r [8086,FPU] | |
5758 | FADD fpureg,ST0 ; DC C0+r [8086,FPU] | |
5759 | ||
5760 | FADDP fpureg ; DE C0+r [8086,FPU] | |
5761 | FADDP fpureg,ST0 ; DE C0+r [8086,FPU] | |
5762 | ||
5763 | `FADD', given one operand, adds the operand to `ST0' and stores the | |
5764 | result back in `ST0'. If the operand has the `TO' modifier, the | |
5765 | result is stored in the register given rather than in `ST0'. | |
5766 | ||
5767 | `FADDP' performs the same function as `FADD TO', but pops the | |
5768 | register stack after storing the result. | |
5769 | ||
5770 | The given two-operand forms are synonyms for the one-operand forms. | |
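
As a minimal sketch of the forms above, the following routine adds two double-precision values with `FADDP'. The labels `a', `b', `sum' and `add_doubles' are hypothetical, and 32-bit flat code is assumed.

```nasm
        bits 32

section .data
a:      dq 1.5                  ; hypothetical inputs
b:      dq 2.25
sum:    dq 0.0

section .text
add_doubles:
        fld     qword [a]       ; ST0 = a
        fld     qword [b]       ; ST0 = b, ST1 = a
        faddp   st1             ; ST1 = ST1 + ST0, then pop: ST0 = a + b
        fstp    qword [sum]     ; store the sum and pop it off the stack
        ret
```

Using `FADDP' rather than `FADD' here leaves the register stack at the same depth it had on entry.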
5771 | ||
5772 | A.31 `FBLD', `FBSTP': BCD Floating-Point Load and Store | |
5773 | ||
5774 | FBLD mem80 ; DF /4 [8086,FPU] | |
5775 | FBSTP mem80 ; DF /6 [8086,FPU] | |
5776 | ||
5777 | `FBLD' loads an 80-bit (ten-byte) packed binary-coded decimal number | |
5778 | from the given memory address, converts it to a real, and pushes it | |
5779 | on the register stack. `FBSTP' stores the value of `ST0', in packed | |
5780 | BCD, at the given address and then pops the register stack. | |
5781 | ||
5782 | A.32 `FCHS': Floating-Point Change Sign | |
5783 | ||
5784 | FCHS ; D9 E0 [8086,FPU] | |
5785 | ||
5786 | `FCHS' negates the number in `ST0': negative numbers become | |
5787 | positive, and vice versa. | |
5788 | ||
5789 | A.33 `FCLEX', `FNCLEX': Clear Floating-Point Exceptions | |
5790 | ||
5791 | FCLEX ; 9B DB E2 [8086,FPU] | |
5792 | FNCLEX ; DB E2 [8086,FPU] | |
5793 | ||
5794 | `FCLEX' clears any floating-point exceptions which may be pending. | |
5795 | `FNCLEX' does the same thing but doesn't wait for previous floating- | |
5796 | point operations (including the _handling_ of pending exceptions) to | |
5797 | finish first. | |
5798 | ||
5799 | A.34 `FCMOVcc': Floating-Point Conditional Move | |
5800 | ||
5801 | FCMOVB fpureg ; DA C0+r [P6,FPU] | |
5802 | FCMOVB ST0,fpureg ; DA C0+r [P6,FPU] | |
5803 | ||
5804 | FCMOVBE fpureg ; DA D0+r [P6,FPU] | |
5805 | FCMOVBE ST0,fpureg ; DA D0+r [P6,FPU] | |
5806 | ||
5807 | FCMOVE fpureg ; DA C8+r [P6,FPU] | |
5808 | FCMOVE ST0,fpureg ; DA C8+r [P6,FPU] | |
5809 | ||
5810 | FCMOVNB fpureg ; DB C0+r [P6,FPU] | |
5811 | FCMOVNB ST0,fpureg ; DB C0+r [P6,FPU] | |
5812 | ||
5813 | FCMOVNBE fpureg ; DB D0+r [P6,FPU] | |
5814 | FCMOVNBE ST0,fpureg ; DB D0+r [P6,FPU] | |
5815 | ||
5816 | FCMOVNE fpureg ; DB C8+r [P6,FPU] | |
5817 | FCMOVNE ST0,fpureg ; DB C8+r [P6,FPU] | |
5818 | ||
5819 | FCMOVNU fpureg ; DB D8+r [P6,FPU] | |
5820 | FCMOVNU ST0,fpureg ; DB D8+r [P6,FPU] | |
5821 | ||
5822 | FCMOVU fpureg ; DA D8+r [P6,FPU] | |
5823 | FCMOVU ST0,fpureg ; DA D8+r [P6,FPU] | |
5824 | ||
5825 | The `FCMOV' instructions perform conditional move operations: each | |
5826 | of them moves the contents of the given register into `ST0' if its | |
5827 | condition is satisfied, and does nothing if not. | |
5828 | ||
5829 | The conditions are not the same as the standard condition codes used | |
5830 | with conditional jump instructions. The conditions `B', `BE', `NB', | |
5831 | `NBE', `E' and `NE' are exactly as normal, but none of the other | |
5832 | standard ones are supported. Instead, the condition `U' and its | |
5833 | counterpart `NU' are provided; the `U' condition is satisfied if the | |
5834 | last two floating-point numbers compared were _unordered_, i.e. they | |
5835 | were not equal but neither one could be said to be greater than the | |
5836 | other, for example if they were NaNs. (The flag state which signals | |
5837 | this is the setting of the parity flag: so the `U' condition is | |
5838 | notionally equivalent to `PE', and `NU' is equivalent to `PO'.) | |
5839 | ||
5840 | The `FCMOV' conditions test the main processor's status flags, not | |
5841 | the FPU status flags, so using `FCMOV' directly after `FCOM' will | |
5842 | not work. Instead, you should either use `FCOMI', which writes | |
5843 | directly to the main CPU flags word, or use `FSTSW AX' followed by | |
5844 | `SAHF' to copy the FPU flags into the CPU flags. | |
5845 | ||
5846 | Although the `FCMOV' instructions are flagged `P6' above, they may | |
5847 | not be supported by all Pentium Pro processors; the `CPUID' | |
5848 | instruction (section A.22) will return a bit which indicates whether | |
5849 | conditional moves are supported. | |
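
One possible use of `FCMOV' is a branch-free maximum. This sketch assumes a P6-class processor that reports conditional-move support, 32-bit code, and a hypothetical routine name `fmax_st0_st1'.

```nasm
        bits 32

section .text
; On entry ST0 = x and ST1 = y; on exit ST0 = the larger of the two.
fmax_st0_st1:
        fcomi   st0, st1        ; compare ST0 with ST1, result in CPU flags
        fcmovb  st0, st1        ; if ST0 < ST1 (carry set), copy ST1 into ST0
        ret
```

Note that `FCOMI' also sets the carry flag for an unordered result, so the move is taken as well when either operand is a NaN.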
5850 | ||
5851 | A.35 `FCOM', `FCOMP', `FCOMPP', `FCOMI', `FCOMIP': Floating-Point Compare | |
5852 | ||
5853 | FCOM mem32 ; D8 /2 [8086,FPU] | |
5854 | FCOM mem64 ; DC /2 [8086,FPU] | |
5855 | FCOM fpureg ; D8 D0+r [8086,FPU] | |
5856 | FCOM ST0,fpureg ; D8 D0+r [8086,FPU] | |
5857 | ||
5858 | FCOMP mem32 ; D8 /3 [8086,FPU] | |
5859 | FCOMP mem64 ; DC /3 [8086,FPU] | |
5860 | FCOMP fpureg ; D8 D8+r [8086,FPU] | |
5861 | FCOMP ST0,fpureg ; D8 D8+r [8086,FPU] | |
5862 | ||
5863 | FCOMPP ; DE D9 [8086,FPU] | |
5864 | ||
5865 | FCOMI fpureg ; DB F0+r [P6,FPU] | |
5866 | FCOMI ST0,fpureg ; DB F0+r [P6,FPU] | |
5867 | ||
5868 | FCOMIP fpureg ; DF F0+r [P6,FPU] | |
5869 | FCOMIP ST0,fpureg ; DF F0+r [P6,FPU] | |
5870 | ||
5871 | `FCOM' compares `ST0' with the given operand, and sets the FPU flags | |
5872 | accordingly. `ST0' is treated as the left-hand side of the | |
5873 | comparison, so that the carry flag is set (for a `less-than' result) | |
5874 | if `ST0' is less than the given operand. | |
5875 | ||
5876 | `FCOMP' does the same as `FCOM', but pops the register stack | |
5877 | afterwards. `FCOMPP' compares `ST0' with `ST1' and then pops the | |
5878 | register stack twice. | |
5879 | ||
5880 | `FCOMI' and `FCOMIP' work like the corresponding forms of `FCOM' and | |
5881 | `FCOMP', but write their results directly to the CPU flags register | |
5882 | rather than the FPU status word, so they can be immediately followed | |
5883 | by conditional jump or conditional move instructions. | |
5884 | ||
5885 | The `FCOM' instructions differ from the `FUCOM' instructions | |
5886 | (section A.69) only in the way they handle quiet NaNs: `FUCOM' will | |
5887 | handle them silently and set the condition code flags to an | |
5888 | `unordered' result, whereas `FCOM' will generate an exception. | |
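
On processors without `FCOMI', the FPU flags have to be copied to the CPU flags by hand before branching. The sketch below shows one common way to do that; the labels `x', `y' and `compare_xy' and the return-value convention are assumptions made for illustration.

```nasm
        bits 32

section .data
x:      dq 1.0                  ; hypothetical operands
y:      dq 2.0

section .text
; Returns in EAX: -1 if x < y, 0 if x = y, 1 if x > y, 2 if unordered.
compare_xy:
        fld     qword [x]       ; ST0 = x
        fcomp   qword [y]       ; compare ST0 with y, then pop
        fnstsw  ax              ; AX = FPU status word
        sahf                    ; C0 -> CF, C2 -> PF, C3 -> ZF
        jp      .unordered      ; C2 set: operands were unordered
        jb      .less           ; C0 set: x < y
        je      .equal          ; C3 set: x = y
        mov     eax, 1
        ret
.less:  mov     eax, -1
        ret
.equal: xor     eax, eax
        ret
.unordered:
        mov     eax, 2
        ret
```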
5889 | ||
5890 | A.36 `FCOS': Cosine | |
5891 | ||
5892 | FCOS ; D9 FF [386,FPU] | |
5893 | ||
5894 | `FCOS' computes the cosine of `ST0' (in radians), and stores the | |
5895 | result in `ST0'. See also `FSINCOS' (section A.61). | |
5896 | ||
5897 | A.37 `FDECSTP': Decrement Floating-Point Stack Pointer | |
5898 | ||
5899 | FDECSTP ; D9 F6 [8086,FPU] | |
5900 | ||
5901 | `FDECSTP' decrements the `top' field in the floating-point status | |
5902 | word. This has the effect of rotating the FPU register stack by one, | |
5903 | as if the contents of `ST7' had been pushed on the stack. See also | |
5904 | `FINCSTP' (section A.46). | |
5905 | ||
5906 | A.38 `FxDISI', `FxENI': Disable and Enable Floating-Point Interrupts | |
5907 | ||
5908 | FDISI ; 9B DB E1 [8086,FPU] | |
5909 | FNDISI ; DB E1 [8086,FPU] | |
5910 | ||
5911 | FENI ; 9B DB E0 [8086,FPU] | |
5912 | FNENI ; DB E0 [8086,FPU] | |
5913 | ||
5914 | `FDISI' and `FENI' disable and enable floating-point interrupts. | |
5915 | These instructions are only meaningful on original 8087 processors: | |
5916 | the 287 and above treat them as no-operation instructions. | |
5917 | ||
5918 | `FNDISI' and `FNENI' do the same thing as `FDISI' and `FENI' | |
5919 | respectively, but without waiting for the floating-point processor | |
5920 | to finish what it was doing first. | |
5921 | ||
5922 | A.39 `FDIV', `FDIVP', `FDIVR', `FDIVRP': Floating-Point Division | |
5923 | ||
5924 | FDIV mem32 ; D8 /6 [8086,FPU] | |
5925 | FDIV mem64 ; DC /6 [8086,FPU] | |
5926 | ||
5927 | FDIV fpureg ; D8 F0+r [8086,FPU] | |
5928 | FDIV ST0,fpureg ; D8 F0+r [8086,FPU] | |
5929 | ||
5930 | FDIV TO fpureg ; DC F8+r [8086,FPU] | |
5931 | FDIV fpureg,ST0 ; DC F8+r [8086,FPU] | |
5932 | ||
5933 | FDIVR mem32 ; D8 /7 [8086,FPU] | |
5934 | FDIVR mem64 ; DC /7 [8086,FPU] | |
5935 | ||
5936 | FDIVR fpureg ; D8 F8+r [8086,FPU] | |
5937 | FDIVR ST0,fpureg ; D8 F8+r [8086,FPU] | |
5938 | ||
5939 | FDIVR TO fpureg ; DC F0+r [8086,FPU] | |
5940 | FDIVR fpureg,ST0 ; DC F0+r [8086,FPU] | |
5941 | ||
5942 | FDIVP fpureg ; DE F8+r [8086,FPU] | |
5943 | FDIVP fpureg,ST0 ; DE F8+r [8086,FPU] | |
5944 | ||
5945 | FDIVRP fpureg ; DE F0+r [8086,FPU] | |
5946 | FDIVRP fpureg,ST0 ; DE F0+r [8086,FPU] | |
5947 | ||
5948 | `FDIV' divides `ST0' by the given operand and stores the result back | |
5949 | in `ST0', unless the `TO' qualifier is given, in which case it | |
5950 | divides the given operand by `ST0' and stores the result in the | |
5951 | operand. | |
5952 | ||
5953 | `FDIVR' does the same thing, but does the division the other way up: | |
5954 | so if `TO' is not given, it divides the given operand by `ST0' and | |
5955 | stores the result in `ST0', whereas if `TO' is given it divides | |
5956 | `ST0' by its operand and stores the result in the operand. | |
5957 | ||
5958 | `FDIVP' operates like `FDIV TO', but pops the register stack once it | |
5959 | has finished. `FDIVRP' operates like `FDIVR TO', but pops the | |
5960 | register stack once it has finished. | |
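
The difference between `FDIV' and `FDIVR' is easiest to see with a memory operand. In this sketch the labels `num', `den', `q', `r' and `divide_both_ways' are hypothetical, and 32-bit code is assumed.

```nasm
        bits 32

section .data
num:    dq 1.0                  ; hypothetical inputs
den:    dq 4.0
q:      dq 0.0                  ; receives num / den
r:      dq 0.0                  ; receives den / num

section .text
divide_both_ways:
        fld     qword [num]     ; ST0 = num
        fdiv    qword [den]     ; ST0 = ST0 / den = num / den
        fstp    qword [q]
        fld     qword [num]     ; ST0 = num
        fdivr   qword [den]     ; ST0 = den / ST0 = den / num
        fstp    qword [r]
        ret
```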
5961 | ||
5962 | A.40 `FFREE': Flag Floating-Point Register as Unused | |
5963 | ||
5964 | FFREE fpureg ; DD C0+r [8086,FPU] | |
5965 | ||
5966 | `FFREE' marks the given register as being empty. | |
5967 | ||
5968 | A.41 `FIADD': Floating-Point/Integer Addition | |
5969 | ||
5970 | FIADD mem16 ; DE /0 [8086,FPU] | |
5971 | FIADD mem32 ; DA /0 [8086,FPU] | |
5972 | ||
5973 | `FIADD' adds the 16-bit or 32-bit integer stored in the given memory | |
5974 | location to `ST0', storing the result in `ST0'. | |
5975 | ||
5976 | A.42 `FICOM', `FICOMP': Floating-Point/Integer Compare | |
5977 | ||
5978 | FICOM mem16 ; DE /2 [8086,FPU] | |
5979 | FICOM mem32 ; DA /2 [8086,FPU] | |
5980 | ||
5981 | FICOMP mem16 ; DE /3 [8086,FPU] | |
5982 | FICOMP mem32 ; DA /3 [8086,FPU] | |
5983 | ||
5984 | `FICOM' compares `ST0' with the 16-bit or 32-bit integer stored in | |
5985 | the given memory location, and sets the FPU flags accordingly. | |
5986 | `FICOMP' does the same, but pops the register stack afterwards. | |
5987 | ||
5988 | A.43 `FIDIV', `FIDIVR': Floating-Point/Integer Division | |
5989 | ||
5990 | FIDIV mem16 ; DE /6 [8086,FPU] | |
5991 | FIDIV mem32 ; DA /6 [8086,FPU] | |
5992 | ||
5993 | FIDIVR mem16 ; DE /7 [8086,FPU] | |
5994 | FIDIVR mem32 ; DA /7 [8086,FPU] | |
5995 | ||
5996 | `FIDIV' divides `ST0' by the 16-bit or 32-bit integer stored in the | |
5997 | given memory location, and stores the result in `ST0'. `FIDIVR' does | |
5998 | the division the other way up: it divides the integer by `ST0', but | |
5999 | still stores the result in `ST0'. | |
6000 | ||
6001 | A.44 `FILD', `FIST', `FISTP': Floating-Point/Integer Conversion | |
6002 | ||
6003 | FILD mem16 ; DF /0 [8086,FPU] | |
6004 | FILD mem32 ; DB /0 [8086,FPU] | |
6005 | FILD mem64 ; DF /5 [8086,FPU] | |
6006 | ||
6007 | FIST mem16 ; DF /2 [8086,FPU] | |
6008 | FIST mem32 ; DB /2 [8086,FPU] | |
6009 | ||
6010 | FISTP mem16 ; DF /3 [8086,FPU] | |
6011 | FISTP mem32 ; DB /3 [8086,FPU] | |
6012 | FISTP mem64 ; DF /7 [8086,FPU] | |
6013 | ||
6014 | `FILD' loads an integer out of a memory location, converts it to a | |
6015 | real, and pushes it on the FPU register stack. `FIST' converts `ST0' | |
6016 | to an integer and stores that in memory; `FISTP' does the same as | |
6017 | `FIST', but pops the register stack afterwards. | |
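
A short sketch of a round trip through the FPU: load an integer, scale it, and store it back as an integer. The labels `n', `half', `result' and `halve_n' are hypothetical; 32-bit code is assumed.

```nasm
        bits 32

section .data
n:      dd 7                    ; 32-bit integer input
half:   dd 0.5                  ; single-precision constant
result: dd 0

section .text
halve_n:
        fild    dword [n]       ; ST0 = 7.0
        fmul    dword [half]    ; ST0 = 3.5
        fistp   dword [result]  ; convert to integer, store, and pop
        ret
```

The conversion performed by `FISTP' follows the rounding mode in the FPU control word; assuming the default round-to-nearest mode is in force, 3.5 is stored as 4.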
6018 | ||
6019 | A.45 `FIMUL': Floating-Point/Integer Multiplication | |
6020 | ||
6021 | FIMUL mem16 ; DE /1 [8086,FPU] | |
6022 | FIMUL mem32 ; DA /1 [8086,FPU] | |
6023 | ||
6024 | `FIMUL' multiplies `ST0' by the 16-bit or 32-bit integer stored in | |
6025 | the given memory location, and stores the result in `ST0'. | |
6026 | ||
6027 | A.46 `FINCSTP': Increment Floating-Point Stack Pointer | |
6028 | ||
6029 | FINCSTP ; D9 F7 [8086,FPU] | |
6030 | ||
6031 | `FINCSTP' increments the `top' field in the floating-point status | |
6032 | word. This has the effect of rotating the FPU register stack by one, | |
6033 | as if the register stack had been popped; however, unlike the | |
6034 | popping of the stack performed by many FPU instructions, it does not | |
6035 | flag the new `ST7' (previously `ST0') as empty. See also `FDECSTP' | |
6036 | (section A.37). | |
6037 | ||
6038 | A.47 `FINIT', `FNINIT': Initialise Floating-Point Unit | |
6039 | ||
6040 | FINIT ; 9B DB E3 [8086,FPU] | |
6041 | FNINIT ; DB E3 [8086,FPU] | |
6042 | ||
6043 | `FINIT' initialises the FPU to its default state. It flags all | |
6044 | registers as empty, though it does not actually change their values. | |
6045 | `FNINIT' does the same, without first waiting for pending exceptions | |
6046 | to clear. | |
6047 | ||
6048 | A.48 `FISUB': Floating-Point/Integer Subtraction | |
6049 | ||
6050 | FISUB mem16 ; DE /4 [8086,FPU] | |
6051 | FISUB mem32 ; DA /4 [8086,FPU] | |
6052 | ||
6053 | FISUBR mem16 ; DE /5 [8086,FPU] | |
6054 | FISUBR mem32 ; DA /5 [8086,FPU] | |
6055 | ||
6056 | `FISUB' subtracts the 16-bit or 32-bit integer stored in the given | |
6057 | memory location from `ST0', and stores the result in `ST0'. `FISUBR' | |
6058 | does the subtraction the other way round, i.e. it subtracts `ST0' | |
6059 | from the given integer, but still stores the result in `ST0'. | |
6060 | ||
6061 | A.49 `FLD': Floating-Point Load | |
6062 | ||
6063 | FLD mem32 ; D9 /0 [8086,FPU] | |
6064 | FLD mem64 ; DD /0 [8086,FPU] | |
6065 | FLD mem80 ; DB /5 [8086,FPU] | |
6066 | FLD fpureg ; D9 C0+r [8086,FPU] | |
6067 | ||
6068 | `FLD' loads a floating-point value out of the given register or | |
6069 | memory location, and pushes it on the FPU register stack. | |
6070 | ||
6071 | A.50 `FLDxx': Floating-Point Load Constants | |
6072 | ||
6073 | FLD1 ; D9 E8 [8086,FPU] | |
6074 | FLDL2E ; D9 EA [8086,FPU] | |
6075 | FLDL2T ; D9 E9 [8086,FPU] | |
6076 | FLDLG2 ; D9 EC [8086,FPU] | |
6077 | FLDLN2 ; D9 ED [8086,FPU] | |
6078 | FLDPI ; D9 EB [8086,FPU] | |
6079 | FLDZ ; D9 EE [8086,FPU] | |
6080 | ||
6081 | These instructions push specific standard constants on the FPU | |
6082 | register stack. `FLD1' pushes the value 1; `FLDL2E' pushes the base- | |
6083 | 2 logarithm of e; `FLDL2T' pushes the base-2 log of 10; `FLDLG2' | |
6084 | pushes the base-10 log of 2; `FLDLN2' pushes the base-e log of 2; | |
6085 | `FLDPI' pushes pi; and `FLDZ' pushes zero. | |
6086 | ||
6087 | A.51 `FLDCW': Load Floating-Point Control Word | |
6088 | ||
6089 | FLDCW mem16 ; D9 /5 [8086,FPU] | |
6090 | ||
6091 | `FLDCW' loads a 16-bit value out of memory and stores it into the | |
6092 | FPU control word (governing things like the rounding mode, the | |
6093 | precision, and the exception masks). See also `FSTCW' (section | |
6094 | A.64). | |
6095 | ||
6096 | A.52 `FLDENV': Load Floating-Point Environment | |
6097 | ||
6098 | FLDENV mem ; D9 /4 [8086,FPU] | |
6099 | ||
6100 | `FLDENV' loads the FPU operating environment (control word, status | |
6101 | word, tag word, instruction pointer, data pointer and last opcode) | |
6102 | from memory. The memory area is 14 or 28 bytes long, depending on | |
6103 | the CPU mode at the time. See also `FSTENV' (section A.65). | |
6104 | ||
6105 | A.53 `FMUL', `FMULP': Floating-Point Multiply | |
6106 | ||
6107 | FMUL mem32 ; D8 /1 [8086,FPU] | |
6108 | FMUL mem64 ; DC /1 [8086,FPU] | |
6109 | ||
6110 | FMUL fpureg ; D8 C8+r [8086,FPU] | |
6111 | FMUL ST0,fpureg ; D8 C8+r [8086,FPU] | |
6112 | ||
6113 | FMUL TO fpureg ; DC C8+r [8086,FPU] | |
6114 | FMUL fpureg,ST0 ; DC C8+r [8086,FPU] | |
6115 | ||
6116 | FMULP fpureg ; DE C8+r [8086,FPU] | |
6117 | FMULP fpureg,ST0 ; DE C8+r [8086,FPU] | |
6118 | ||
6119 | `FMUL' multiplies `ST0' by the given operand, and stores the result | |
6120 | in `ST0', unless the `TO' qualifier is used in which case it stores | |
6121 | the result in the operand. `FMULP' performs the same operation as | |
6122 | `FMUL TO', and then pops the register stack. | |
6123 | ||
6124 | A.54 `FNOP': Floating-Point No Operation | |
6125 | ||
6126 | FNOP ; D9 D0 [8086,FPU] | |
6127 | ||
6128 | `FNOP' does nothing. | |
6129 | ||
6130 | A.55 `FPATAN', `FPTAN': Arctangent and Tangent | |
6131 | ||
6132 | FPATAN ; D9 F3 [8086,FPU] | |
6133 | FPTAN ; D9 F2 [8086,FPU] | |
6134 | ||
6135 | `FPATAN' computes the arctangent, in radians, of the result of | |
6136 | dividing `ST1' by `ST0', stores the result in `ST1', and pops the | |
6137 | register stack. It works like the C `atan2' function, in that | |
6138 | changing the sign of both `ST0' and `ST1' changes the output value | |
6139 | by pi (so it performs true rectangular-to-polar coordinate | |
6140 | conversion, with `ST1' being the Y coordinate and `ST0' being the X | |
6141 | coordinate, not merely an arctangent). | |
6142 | ||
6143 | `FPTAN' computes the tangent of the value in `ST0' (in radians), | |
6144 | stores it in `ST0', and then pushes 1.0, so the tangent ends up in `ST1'. | |
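
The rectangular-to-polar use of `FPATAN' described above might be sketched as follows. The labels `x', `y', `angle' and `polar_angle' are hypothetical, and 32-bit code is assumed.

```nasm
        bits 32

section .data
x:      dq -1.0                 ; hypothetical X coordinate
y:      dq 1.0                  ; hypothetical Y coordinate
angle:  dq 0.0

section .text
polar_angle:
        fld     qword [y]       ; ST0 = y
        fld     qword [x]       ; ST0 = x, ST1 = y
        fpatan                  ; ST1 = arctangent of y/x, then pop
        fstp    qword [angle]   ; store the angle (in radians) and pop
        ret
```

The Y coordinate is loaded first so that it ends up in `ST1' once the X coordinate has been pushed on top of it.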
6145 | ||
6146 | A.56 `FPREM', `FPREM1': Floating-Point Partial Remainder | |
6147 | ||
6148 | FPREM ; D9 F8 [8086,FPU] | |
6149 | FPREM1 ; D9 F5 [386,FPU] | |
6150 | ||
6151 | These instructions both produce the remainder obtained by dividing | |
6152 | `ST0' by `ST1'. This is calculated, notionally, by dividing `ST0' by | |
6153 | `ST1', rounding the result to an integer, multiplying by `ST1' | |
6154 | again, and computing the value which would need to be added back on | |
6155 | to the result to get back to the original value in `ST0'. | |
6156 | ||
6157 | The two instructions differ in the way the notional round-to-integer | |
6158 | operation is performed. `FPREM' does it by rounding towards zero, so | |
6159 | that the remainder it returns always has the same sign as the | |
6160 | original value in `ST0'; `FPREM1' does it by rounding to the nearest | |
6161 | integer, so that the remainder always has at most half the magnitude | |
6162 | of `ST1'. | |
6163 | ||
6164 | Both instructions calculate _partial_ remainders, meaning that they | |
6165 | may not manage to provide the final result, but might leave | |
6166 | intermediate results in `ST0' instead. If this happens, they will | |
6167 | set the C2 flag in the FPU status word; therefore, to calculate a | |
6168 | remainder, you should repeatedly execute `FPREM' or `FPREM1' until | |
6169 | C2 becomes clear. | |
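
The repeat-until-C2-clears loop can be written as below. This is a sketch only: the routine name `fprem_loop' is hypothetical, and `AX' is assumed to be free for scratch use.

```nasm
        bits 32

section .text
; On entry ST0 = dividend and ST1 = divisor; on exit ST0 = remainder.
fprem_loop:
.again: fprem                   ; one partial-remainder step
        fnstsw  ax              ; AX = FPU status word
        test    ah, 0x04        ; C2 is bit 10 of the word, i.e. bit 2 of AH
        jnz     .again          ; C2 set: reduction incomplete, go round again
        ret
```

Substituting `FPREM1' for `FPREM' in the loop gives the round-to-nearest variant with no other changes.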
6170 | ||
6171 | A.57 `FRNDINT': Floating-Point Round to Integer | |
6172 | ||
6173 | FRNDINT ; D9 FC [8086,FPU] | |
6174 | ||
6175 | `FRNDINT' rounds the contents of `ST0' to an integer, according to | |
6176 | the current rounding mode set in the FPU control word, and stores | |
6177 | the result back in `ST0'. | |
6178 | ||
6179 | A.58 `FSAVE', `FRSTOR': Save/Restore Floating-Point State | |
6180 | ||
6181 | FSAVE mem ; 9B DD /6 [8086,FPU] | |
6182 | FNSAVE mem ; DD /6 [8086,FPU] | |
6183 | ||
6184 | FRSTOR mem ; DD /4 [8086,FPU] | |
6185 | ||
6186 | `FSAVE' saves the entire floating-point unit state, including all | |
6187 | the information saved by `FSTENV' (section A.65) plus the contents | |
6188 | of all the registers, to a 94 or 108 byte area of memory (depending | |
6189 | on the CPU mode). `FRSTOR' restores the floating-point state from | |
6190 | the same area of memory. | |
6191 | ||
6192 | `FNSAVE' does the same as `FSAVE', without first waiting for pending | |
6193 | floating-point exceptions to clear. | |
6194 | ||
6195 | A.59 `FSCALE': Scale Floating-Point Value by Power of Two | |
6196 | ||
6197 | FSCALE ; D9 FD [8086,FPU] | |
6198 | ||
6199 | `FSCALE' scales a number by a power of two: it rounds `ST1' towards | |
6200 | zero to obtain an integer, then multiplies `ST0' by two to the power | |
6201 | of that integer, and stores the result in `ST0'. | |
6202 | ||
6203 | A.60 `FSETPM': Set Protected Mode | |
6204 | ||
6205 | FSETPM ; DB E4 [286,FPU] | |
6206 | ||
6207 | This instruction initialises protected mode on the 287 floating-point | |
6208 | coprocessor. It is only meaningful on that processor: the 387 and | |
6209 | above treat the instruction as a no-operation. | |
6210 | ||
6211 | A.61 `FSIN', `FSINCOS': Sine and Cosine | |
6212 | ||
6213 | FSIN ; D9 FE [386,FPU] | |
6214 | FSINCOS ; D9 FB [386,FPU] | |
6215 | ||
6216 | `FSIN' calculates the sine of `ST0' (in radians) and stores the | |
6217 | result in `ST0'. `FSINCOS' does the same, but then pushes the cosine | |
6218 | of the same value on the register stack, so that the sine ends up in | |
6219 | `ST1' and the cosine in `ST0'. `FSINCOS' is faster than executing | |
6220 | `FSIN' and `FCOS' (see section A.36) in succession. | |
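
A brief sketch of retrieving both results from `FSINCOS'. The labels `theta', `sine', `cosine' and `sin_and_cos' are hypothetical; 32-bit code is assumed.

```nasm
        bits 32

section .data
theta:  dq 0.5                  ; hypothetical angle in radians
sine:   dq 0.0
cosine: dq 0.0

section .text
sin_and_cos:
        fld     qword [theta]   ; ST0 = theta
        fsincos                 ; ST0 = cos(theta), ST1 = sin(theta)
        fstp    qword [cosine]  ; pop the cosine first, since it is on top
        fstp    qword [sine]    ; then the sine
        ret
```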
6221 | ||
6222 | A.62 `FSQRT': Floating-Point Square Root | |
6223 | ||
6224 | FSQRT ; D9 FA [8086,FPU] | |
6225 | ||
6226 | `FSQRT' calculates the square root of `ST0' and stores the result in | |
6227 | `ST0'. | |
6228 | ||
6229 | A.63 `FST', `FSTP': Floating-Point Store | |
6230 | ||
6231 | FST mem32 ; D9 /2 [8086,FPU] | |
6232 | FST mem64 ; DD /2 [8086,FPU] | |
6233 | FST fpureg ; DD D0+r [8086,FPU] | |
6234 | ||
6235 | FSTP mem32 ; D9 /3 [8086,FPU] | |
6236 | FSTP mem64 ; DD /3 [8086,FPU] | |
6237 | FSTP mem80 ; DB /7 [8086,FPU] | |
6238 | FSTP fpureg ; DD D8+r [8086,FPU] | |
6239 | ||
6240 | `FST' stores the value in `ST0' into the given memory location or | |
6241 | other FPU register. `FSTP' does the same, but then pops the register | |
6242 | stack. | |
6243 | ||
6244 | A.64 `FSTCW': Store Floating-Point Control Word | |
6245 | ||
6246 | FSTCW mem16 ; 9B D9 /7 [8086,FPU] | |
6247 | FNSTCW mem16 ; D9 /7 [8086,FPU] | |
6248 | ||
6249 | `FSTCW' stores the FPU control word (governing things like the | |
6250 | rounding mode, the precision, and the exception masks) into a 2-byte | |
6251 | memory area. See also `FLDCW' (section A.51). | |
6252 | ||
6253 | `FNSTCW' does the same thing as `FSTCW', without first waiting for | |
6254 | pending floating-point exceptions to clear. | |
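
A typical pairing of `FNSTCW' and `FLDCW' is to switch the rounding mode temporarily. The sketch below forces round-toward-zero around an integer store; the labels `oldcw', `newcw', `value', `ivalue' and `truncate_value' are hypothetical, and 32-bit code is assumed.

```nasm
        bits 32

section .bss
oldcw:  resw 1                  ; saved control word
newcw:  resw 1                  ; modified control word

section .data
value:  dq 2.75                 ; hypothetical input
ivalue: dd 0

section .text
truncate_value:
        fnstcw  word [oldcw]    ; save the current control word
        mov     ax, [oldcw]
        or      ax, 0x0C00      ; rounding control (bits 10-11) = 11b: toward zero
        mov     [newcw], ax
        fldcw   word [newcw]    ; switch to truncation
        fld     qword [value]
        fistp   dword [ivalue]  ; 2.75 is stored as 2
        fldcw   word [oldcw]    ; restore the original rounding mode
        ret
```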
6255 | ||
6256 | A.65 `FSTENV': Store Floating-Point Environment | |
6257 | ||
6258 | FSTENV mem ; 9B D9 /6 [8086,FPU] | |
6259 | FNSTENV mem ; D9 /6 [8086,FPU] | |
6260 | ||
6261 | `FSTENV' stores the FPU operating environment (control word, status | |
6262 | word, tag word, instruction pointer, data pointer and last opcode) | |
6263 | into memory. The memory area is 14 or 28 bytes long, depending on | |
6264 | the CPU mode at the time. See also `FLDENV' (section A.52). | |
6265 | ||
6266 | `FNSTENV' does the same thing as `FSTENV', without first waiting for | |
6267 | pending floating-point exceptions to clear. | |
6268 | ||
6269 | A.66 `FSTSW': Store Floating-Point Status Word | |
6270 | ||
6271 | FSTSW mem16 ; 9B DD /7 [8086,FPU] | |
6272 | FSTSW AX ; 9B DF E0 [286,FPU] | |
6273 | ||
6274 | FNSTSW mem16 ; DD /7 [8086,FPU] | |
6275 | FNSTSW AX ; DF E0 [286,FPU] | |
6276 | ||
6277 | `FSTSW' stores the FPU status word into `AX' or into a 2-byte memory | |
6278 | area. | |
6279 | ||
6280 | `FNSTSW' does the same thing as `FSTSW', without first waiting for | |
6281 | pending floating-point exceptions to clear. | |
6282 | ||
6283 | A.67 `FSUB', `FSUBP', `FSUBR', `FSUBRP': Floating-Point Subtract | |
6284 | ||
6285 | FSUB mem32 ; D8 /4 [8086,FPU] | |
6286 | FSUB mem64 ; DC /4 [8086,FPU] | |
6287 | ||
6288 | FSUB fpureg ; D8 E0+r [8086,FPU] | |
6289 | FSUB ST0,fpureg ; D8 E0+r [8086,FPU] | |
6290 | ||
6291 | FSUB TO fpureg ; DC E8+r [8086,FPU] | |
6292 | FSUB fpureg,ST0 ; DC E8+r [8086,FPU] | |
6293 | ||
6294 | FSUBR mem32 ; D8 /5 [8086,FPU] | |
6295 | FSUBR mem64 ; DC /5 [8086,FPU] | |
6296 | ||
6297 | FSUBR fpureg ; D8 E8+r [8086,FPU] | |
6298 | FSUBR ST0,fpureg ; D8 E8+r [8086,FPU] | |
6299 | ||
6300 | FSUBR TO fpureg ; DC E0+r [8086,FPU] | |
6301 | FSUBR fpureg,ST0 ; DC E0+r [8086,FPU] | |
6302 | ||
6303 | FSUBP fpureg ; DE E8+r [8086,FPU] | |
6304 | FSUBP fpureg,ST0 ; DE E8+r [8086,FPU] | |
6305 | ||
6306 | FSUBRP fpureg ; DE E0+r [8086,FPU] | |
6307 | FSUBRP fpureg,ST0 ; DE E0+r [8086,FPU] | |
6308 | ||
6309 | `FSUB' subtracts the given operand from `ST0' and stores the result | |
6310 | back in `ST0', unless the `TO' qualifier is given, in which case it | |
6311 | subtracts `ST0' from the given operand and stores the result in the | |
6312 | operand. | |
6313 | ||
6314 | `FSUBR' does the same thing, but does the subtraction the other way | |
6315 | up: so if `TO' is not given, it subtracts `ST0' from the given | |
6316 | operand and stores the result in `ST0', whereas if `TO' is given it | |
6317 | subtracts its operand from `ST0' and stores the result in the | |
6318 | operand. | |
6319 | ||
6320 | `FSUBP' operates like `FSUB TO', but pops the register stack once it | |
6321 | has finished. `FSUBRP' operates like `FSUBR TO', but pops the | |
6322 | register stack once it has finished. | |
6323 | ||
6324 | A.68 `FTST': Test `ST0' Against Zero | |
6325 | ||
6326 | FTST ; D9 E4 [8086,FPU] | |
6327 | ||
6328 | `FTST' compares `ST0' with zero and sets the FPU flags accordingly. | |
6329 | `ST0' is treated as the left-hand side of the comparison, so that a | |
6330 | `less-than' result is generated if `ST0' is negative. | |
6331 | ||
6332 | A.69 `FUCOMxx': Floating-Point Unordered Compare | |
6333 | ||
6334 | FUCOM fpureg ; DD E0+r [386,FPU] | |
6335 | FUCOM ST0,fpureg ; DD E0+r [386,FPU] | |
6336 | ||
6337 | FUCOMP fpureg ; DD E8+r [386,FPU] | |
6338 | FUCOMP ST0,fpureg ; DD E8+r [386,FPU] | |
6339 | ||
6340 | FUCOMPP ; DA E9 [386,FPU] | |
6341 | ||
6342 | FUCOMI fpureg ; DB E8+r [P6,FPU] | |
6343 | FUCOMI ST0,fpureg ; DB E8+r [P6,FPU] | |
6344 | ||
6345 | FUCOMIP fpureg ; DF E8+r [P6,FPU] | |
6346 | FUCOMIP ST0,fpureg ; DF E8+r [P6,FPU] | |
6347 | ||
6348 | `FUCOM' compares `ST0' with the given operand, and sets the FPU | |
6349 | flags accordingly. `ST0' is treated as the left-hand side of the | |
6350 | comparison, so that the carry flag is set (for a `less-than' result) | |
6351 | if `ST0' is less than the given operand. | |
6352 | ||
6353 | `FUCOMP' does the same as `FUCOM', but pops the register stack | |
6354 | afterwards. `FUCOMPP' compares `ST0' with `ST1' and then pops the | |
6355 | register stack twice. | |
6356 | ||
6357 | `FUCOMI' and `FUCOMIP' work like the corresponding forms of `FUCOM' | |
6358 | and `FUCOMP', but write their results directly to the CPU flags | |
6359 | register rather than the FPU status word, so they can be immediately | |
6360 | followed by conditional jump or conditional move instructions. | |
6361 | ||
6362 | The `FUCOM' instructions differ from the `FCOM' instructions | |
6363 | (section A.35) only in the way they handle quiet NaNs: `FUCOM' will | |
6364 | handle them silently and set the condition code flags to an | |
6365 | `unordered' result, whereas `FCOM' will generate an exception. | |
6366 | ||
6367 | A.70 `FXAM': Examine Class of Value in `ST0' | |
6368 | ||
6369 | FXAM ; D9 E5 [8086,FPU] | |
6370 | ||
6371 | `FXAM' sets the FPU flags C3, C2 and C0 depending on the type of | |
6372 | value stored in `ST0': 000 (respectively) for an unsupported format, | |
6373 | 001 for a NaN, 010 for a normal finite number, 011 for an infinity, | |
6374 | 100 for a zero, 101 for an empty register, and 110 for a denormal. | |
6375 | It also sets the C1 flag to the sign of the number. | |
6376 | ||
6377 | A.71 `FXCH': Floating-Point Exchange | |
6378 | ||
6379 | FXCH ; D9 C9 [8086,FPU] | |
6380 | FXCH fpureg ; D9 C8+r [8086,FPU] | |
6381 | FXCH fpureg,ST0 ; D9 C8+r [8086,FPU] | |
6382 | FXCH ST0,fpureg ; D9 C8+r [8086,FPU] | |
6383 | ||
6384 | `FXCH' exchanges `ST0' with a given FPU register. The no-operand | |
6385 | form exchanges `ST0' with `ST1'. | |
6386 | ||
6387 | A.72 `FXTRACT': Extract Exponent and Significand | |
6388 | ||
6389 | FXTRACT ; D9 F4 [8086,FPU] | |
6390 | ||
6391 | `FXTRACT' separates the number in `ST0' into its exponent and | |
6392 | significand (mantissa), stores the exponent back into `ST0', and | |
6393 | then pushes the significand on the register stack (so that the | |
6394 | significand ends up in `ST0', and the exponent in `ST1'). | |
6395 | ||
6396 | A.73 `FYL2X', `FYL2XP1': Compute Y times Log2(X) or Log2(X+1) | |
6397 | ||
6398 | FYL2X ; D9 F1 [8086,FPU] | |
6399 | FYL2XP1 ; D9 F9 [8086,FPU] | |
6400 | ||
6401 | `FYL2X' multiplies `ST1' by the base-2 logarithm of `ST0', stores | |
6402 | the result in `ST1', and pops the register stack (so that the result | |
6403 | ends up in `ST0'). `ST0' must be non-zero and positive. | |
6404 | ||
6405 | `FYL2XP1' works the same way, but replacing the base-2 log of `ST0' | |
6406 | with that of `ST0' plus one. This time, `ST0' must have magnitude no | |
6407 | greater than 1 minus half the square root of two. | |
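
Because `FYL2X' multiplies by `ST1', logarithms to other bases fall out by loading the right constant first. This sketch computes a natural logarithm using `FLDLN2'; the labels `x', `lnx' and `natural_log' are hypothetical, and 32-bit code is assumed.

```nasm
        bits 32

section .data
x:      dq 10.0                 ; hypothetical input, must be positive
lnx:    dq 0.0

section .text
natural_log:
        fldln2                  ; ST0 = ln 2
        fld     qword [x]       ; ST0 = x, ST1 = ln 2
        fyl2x                   ; ST1 = ln 2 * log2(x) = ln x, then pop
        fstp    qword [lnx]     ; store ln x and pop
        ret
```

Loading `FLDLG2' instead of `FLDLN2' would give the base-10 logarithm by the same route.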
6408 | ||
6409 | A.74 `HLT': Halt Processor | |
6410 | ||
6411 | HLT ; F4 [8086] | |
6412 | ||
6413 | `HLT' puts the processor into a halted state, where it will perform | |
6414 | no more operations until restarted by an interrupt or a reset. | |
6415 | ||
6416 | A.75 `IBTS': Insert Bit String | |
6417 | ||
6418 | IBTS r/m16,reg16 ; o16 0F A7 /r [386,UNDOC] | |
6419 | IBTS r/m32,reg32 ; o32 0F A7 /r [386,UNDOC] | |
6420 | ||
6421 | No clear documentation seems to be available for this instruction: | |
6422 | the best I've been able to find reads `Takes a string of bits from | |
6423 | the second operand and puts them in the first operand'. It is | |
6424 | present only in early 386 processors, and conflicts with the opcodes | |
6425 | for `CMPXCHG486'. NASM supports it only for completeness. Its | |
6426 | counterpart is `XBTS' (see section A.167). | |
6427 | ||
6428 | A.76 `IDIV': Signed Integer Divide | |
6429 | ||
6430 | IDIV r/m8 ; F6 /7 [8086] | |
6431 | IDIV r/m16 ; o16 F7 /7 [8086] | |
6432 | IDIV r/m32 ; o32 F7 /7 [386] | |
6433 | ||
6434 | `IDIV' performs signed integer division. The explicit operand | |
6435 | provided is the divisor; the dividend and destination operands are | |
6436 | implicit, in the following way: | |
6437 | ||
6438 | (*) For `IDIV r/m8', `AX' is divided by the given operand; the | |
6439 | quotient is stored in `AL' and the remainder in `AH'. | |
6440 | ||
6441 | (*) For `IDIV r/m16', `DX:AX' is divided by the given operand; the | |
6442 | quotient is stored in `AX' and the remainder in `DX'. | |
6443 | ||
6444 | (*) For `IDIV r/m32', `EDX:EAX' is divided by the given operand; the | |
6445 | quotient is stored in `EAX' and the remainder in `EDX'. | |
6446 | ||
6447 | Unsigned integer division is performed by the `DIV' instruction: see | |
6448 | section A.25. | |
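
Since the dividend is twice the width of the divisor, its upper half has to be set up before `IDIV' is executed. A minimal sketch for the 32-bit form, using `CDQ' to sign-extend `EAX' into `EDX'; the routine name `divide_example' is hypothetical.

```nasm
        bits 32

section .text
divide_example:
        mov     eax, -7         ; dividend (low half)
        cdq                     ; sign-extend EAX into EDX:EAX
        mov     ecx, 2          ; divisor
        idiv    ecx             ; EAX = -3 (quotient), EDX = -1 (remainder)
        ret
```

The quotient is truncated towards zero, so the remainder takes the sign of the dividend.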
6449 | ||
6450 | A.77 `IMUL': Signed Integer Multiply | |
6451 | ||
6452 | IMUL r/m8 ; F6 /5 [8086] | |
6453 | IMUL r/m16 ; o16 F7 /5 [8086] | |
6454 | IMUL r/m32 ; o32 F7 /5 [386] | |
6455 | ||
6456 | IMUL reg16,r/m16 ; o16 0F AF /r [386] | |
6457 | IMUL reg32,r/m32 ; o32 0F AF /r [386] | |
6458 | ||
6459 | IMUL reg16,imm8 ; o16 6B /r ib [286] | |
6460 | IMUL reg16,imm16 ; o16 69 /r iw [286] | |
6461 | IMUL reg32,imm8 ; o32 6B /r ib [386] | |
6462 | IMUL reg32,imm32 ; o32 69 /r id [386] | |
6463 | ||
6464 | IMUL reg16,r/m16,imm8 ; o16 6B /r ib [286] | |
6465 | IMUL reg16,r/m16,imm16 ; o16 69 /r iw [286] | |
6466 | IMUL reg32,r/m32,imm8 ; o32 6B /r ib [386] | |
6467 | IMUL reg32,r/m32,imm32 ; o32 69 /r id [386] | |
6468 | ||
6469 | `IMUL' performs signed integer multiplication. For the single- | |
6470 | operand form, the other operand and destination are implicit, in the | |
6471 | following way: | |
6472 | ||
6473 | (*) For `IMUL r/m8', `AL' is multiplied by the given operand; the | |
6474 | product is stored in `AX'. | |
6475 | ||
6476 | (*) For `IMUL r/m16', `AX' is multiplied by the given operand; the | |
6477 | product is stored in `DX:AX'. | |
6478 | ||
6479 | (*) For `IMUL r/m32', `EAX' is multiplied by the given operand; the | |
6480 | product is stored in `EDX:EAX'. | |
6481 | ||
6482 | The two-operand form multiplies its two operands and stores the | |
6483 | result in the destination (first) operand. The three-operand form | |
6484 | multiplies its last two operands and stores the result in the first | |
6485 | operand. | |
6486 | ||
6487 | The two-operand form is in fact a shorthand for the three-operand | |
6488 | form, as can be seen by examining the opcode descriptions: in the | |
6489 | two-operand form, the code `/r' takes both its register and `r/m' | |
6490 | parts from the same operand (the first one). | |
6491 | ||
6492 | In the forms with an 8-bit immediate operand and another longer | |
6493 | source operand, the immediate operand is considered to be signed, | |
6494 | and is sign-extended to the length of the other source operand. In | |
6495 | these cases, the `BYTE' qualifier is necessary to force NASM to | |
6496 | generate this form of the instruction. | |
6497 | ||
6498 | Unsigned integer multiplication is performed by the `MUL' | |
6499 | instruction: see section A.107. | |
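
As a brief illustration of the operand forms described above, the
following sketch assumes 32-bit code; the registers and constants
are arbitrary choices made for the example:

```nasm
        BITS 32

        mov     eax, -3
        mov     ebx, 7
        imul    ebx                 ; one operand: EDX:EAX = EAX * EBX = -21
        mov     ecx, 5
        imul    ecx, ebx            ; two operands: ECX = ECX * EBX = 35
        imul    ecx, ebx, byte 10   ; imm8 form: ECX = EBX * 10 = 70
        imul    ecx, ebx, 1000      ; imm32 form: ECX = EBX * 1000 = 7000
        imul    ecx, byte -2        ; same as `imul ecx, ecx, byte -2'
```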
6500 | ||
6501 | A.78 `IN': Input from I/O Port | |
6502 | ||
6503 | IN AL,imm8 ; E4 ib [8086] | |
6504 | IN AX,imm8 ; o16 E5 ib [8086] | |
6505 | IN EAX,imm8 ; o32 E5 ib [386] | |
6506 | IN AL,DX ; EC [8086] | |
6507 | IN AX,DX ; o16 ED [8086] | |
6508 | IN EAX,DX ; o32 ED [386] | |
6509 | ||
6510 | `IN' reads a byte, word or doubleword from the specified I/O port, | |
6511 | and stores it in the given destination register. The port number may | |
6512 | be specified as an immediate value if it is between 0 and 255, and | |
6513 | otherwise must be stored in `DX'. See also `OUT' (section A.111). | |
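
A minimal sketch of the two ways of naming the port, assuming 16-bit
code. The port numbers `0x60' and `0x3F8' are only illustrative
values, one below 256 and one above:

```nasm
        BITS 16

        in      al, 0x60            ; port number fits in 8 bits: immediate form
        mov     dx, 0x3F8           ; a port number above 255 has to go in DX
        in      al, dx              ; byte-sized read
        in      ax, dx              ; word-sized read from the same port
```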
6514 | ||
6515 | A.79 `INC': Increment Integer | |
6516 | ||
6517 | INC reg16 ; o16 40+r [8086] | |
6518 | INC reg32 ; o32 40+r [386] | |
6519 | INC r/m8 ; FE /0 [8086] | |
6520 | INC r/m16 ; o16 FF /0 [8086] | |
6521 | INC r/m32 ; o32 FF /0 [386] | |
6522 | ||
6523 | `INC' adds 1 to its operand. It does _not_ affect the carry flag: to | |
6524 | affect the carry flag, use `ADD something,1' (see section A.6). See | |
6525 | also `DEC' (section A.24). | |
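
The difference in carry-flag behaviour can be seen in a short sketch
like the following (32-bit code assumed, values chosen so that the
addition wraps round):

```nasm
        BITS 32

        mov     eax, 0xFFFFFFFF
        clc                         ; start with the carry flag clear
        inc     eax                 ; EAX wraps to 0; ZF is set, CF stays clear

        mov     eax, 0xFFFFFFFF
        add     eax, 1              ; EAX wraps to 0; ZF is set, CF is set too
```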
6526 | ||
6527 | A.80 `INSB', `INSW', `INSD': Input String from I/O Port | |
6528 | ||
6529 | INSB ; 6C [186] | |
6530 | INSW ; o16 6D [186] | |
6531 | INSD ; o32 6D [386] | |
6532 | ||
6533 | `INSB' inputs a byte from the I/O port specified in `DX' and stores | |
6534 | it at `[ES:DI]' or `[ES:EDI]'. It then increments or decrements | |
6535 | (depending on the direction flag: increments if the flag is clear, | |
6536 | decrements if it is set) `DI' or `EDI'. | |
6537 | ||
6538 | The register used is `DI' if the address size is 16 bits, and `EDI' | |
6539 | if it is 32 bits. If you need to use an address size not equal to | |
6540 | the current `BITS' setting, you can use an explicit `a16' or `a32' | |
6541 | prefix. | |
6542 | ||
6543 | Segment override prefixes have no effect for this instruction: the | |
6544 | use of `ES' for the store to `[DI]' or `[EDI]' cannot be | |
6545 | overridden. | |
6546 | ||
6547 | `INSW' and `INSD' work in the same way, but they input a word or a | |
6548 | doubleword instead of a byte, and increment or decrement the | |
6549 | addressing register by 2 or 4 instead of 1. | |
6550 | ||
6551 | The `REP' prefix may be used to repeat the instruction `CX' (or | |
6552 | `ECX' - again, the address size chooses which) times. | |
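
A minimal sketch of a repeated input, assuming 16-bit code, that
`ES' already addresses the segment holding the buffer, and a
hypothetical device at port `0x1F0' (the label `buffer' and the port
number are assumptions made for the example):

```nasm
        BITS 16

        mov     dx, 0x1F0           ; hypothetical port number
        mov     di, buffer          ; destination offset, relative to ES
        mov     cx, 256             ; number of words to read
        cld                         ; direction flag clear: DI counts upwards
        rep insw                    ; read CX words from port DX to [ES:DI]
        ret

buffer: times 256 dw 0
```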
6553 | ||
6554 | See also `OUTSB', `OUTSW' and `OUTSD' (section A.112). | |
6555 | ||
6556 | A.81 `INT': Software Interrupt | |
6557 | ||
6558 | INT imm8 ; CD ib [8086] | |
6559 | ||
6560 | `INT' causes a software interrupt through a specified vector number | |
6561 | from 0 to 255. | |
6562 | ||
6563 | The code generated by the `INT' instruction is always two bytes | |
6564 | long: although there are short forms for some `INT' instructions, | |
6565 | NASM does not generate them when it sees the `INT' mnemonic. In | |
6566 | order to generate single-byte breakpoint instructions, use the | |
6567 | `INT3' or `INT1' instructions (see section A.82) instead. | |
6568 | ||
6569 | A.82 `INT3', `INT1', `ICEBP', `INT01': Breakpoints | |
6570 | ||
6571 | INT1 ; F1 [P6] | |
6572 | ICEBP ; F1 [P6] | |
6573 | INT01 ; F1 [P6] | |
6574 | ||
6575 | INT3 ; CC [8086] | |
6576 | ||
6577 | `INT1' and `INT3' are short one-byte forms of the instructions | |
6578 | `INT 1' and `INT 3' (see section A.81). They perform a similar | |
6579 | function to their longer counterparts, but take up less code space. | |
6580 | They are used as breakpoints by debuggers. | |
6581 | ||
6582 | `INT1', and its alternative synonyms `INT01' and `ICEBP', is an | |
6583 | instruction used by in-circuit emulators (ICEs). It is present, | |
6584 | though not documented, on some processors down to the 286, but is | |
6585 | only documented for the Pentium Pro. `INT3' is the instruction | |
6586 | normally used as a breakpoint by debuggers. | |
6587 | ||
6588 | `INT3' is not precisely equivalent to `INT 3': the short form, since | |
6589 | it is designed to be used as a breakpoint, bypasses the normal IOPL | |
6590 | checks in virtual-8086 mode, and also does not go through interrupt | |
6591 | redirection. | |
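
The size difference follows directly from the opcode descriptions
given here and in section A.81, as this two-line sketch shows:

```nasm
        int     3                   ; two bytes: CD 03
        int3                        ; one byte:  CC
```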
6592 | ||
6593 | A.83 `INTO': Interrupt if Overflow | |
6594 | ||
6595 | INTO ; CE [8086] | |
6596 | ||
6597 | `INTO' performs an `INT 4' software interrupt (see section A.81) if | |
6598 | and only if the overflow flag is set. | |
6599 | ||
6600 | A.84 `INVD': Invalidate Internal Caches | |
6601 | ||
6602 | INVD ; 0F 08 [486] | |
6603 | ||
6604 | `INVD' invalidates and empties the processor's internal caches, and | |
6605 | causes the processor to instruct external caches to do the same. It | |
6606 | does not write the contents of the caches back to memory first: any | |
6607 | modified data held in the caches will be lost. To write the data | |
6608 | back first, use `WBINVD' (section A.164). | |
6609 | ||
6610 | A.85 `INVLPG': Invalidate TLB Entry | |
6611 | ||
6612 | INVLPG mem ; 0F 01 /7 [486] | |
6613 | ||
6614 | `INVLPG' invalidates the translation lookaside buffer (TLB) entry | |
6615 | associated with the supplied memory address. | |
6616 | ||
6617 | A.86 `IRET', `IRETW', `IRETD': Return from Interrupt | |
6618 | ||
6619 | IRET ; CF [8086] | |
6620 | IRETW ; o16 CF [8086] | |
6621 | IRETD ; o32 CF [386] | |
6622 | ||
6623 | `IRET' returns from an interrupt (hardware or software) by means of | |
6624 | popping `IP' (or `EIP'), `CS' and the flags off the stack and then | |
6625 | continuing execution from the new `CS:IP'. | |
6626 | ||
6627 | `IRETW' pops `IP', `CS' and the flags as 2 bytes each, taking 6 | |
6628 | bytes off the stack in total. `IRETD' pops `EIP' as 4 bytes, pops a | |
6629 | further 4 bytes of which the top two are discarded and the bottom | |
6630 | two go into `CS', and pops the flags as 4 bytes as well, taking 12 | |
6631 | bytes off the stack. | |
6632 | ||
6633 | `IRET' is a shorthand for either `IRETW' or `IRETD', depending on | |
6634 | the default `BITS' setting at the time. | |
6635 | ||
6636 | A.87 `JCXZ', `JECXZ': Jump if CX/ECX Zero | |
6637 | ||
6638 | JCXZ imm ; a16 E3 rb [8086] | |
6639 | JECXZ imm ; a32 E3 rb [386] | |
6640 | ||
6641 | `JCXZ' performs a short jump (with maximum range 128 bytes) if and | |
6642 | only if the contents of the `CX' register is 0. `JECXZ' does the | |
6643 | same thing, but with `ECX'. | |
6644 | ||
6645 | A.88 `JMP': Jump | |
6646 | ||
6647 | JMP imm ; E9 rw/rd [8086] | |
6648 | JMP SHORT imm ; EB rb [8086] | |
6649 | JMP imm:imm16 ; o16 EA iw iw [8086] | |
6650 | JMP imm:imm32 ; o32 EA id iw [386] | |
6651 | JMP FAR mem ; o16 FF /5 [8086] | |
6652 | JMP FAR mem ; o32 FF /5 [386] | |
6653 | JMP r/m16 ; o16 FF /4 [8086] | |
6654 | JMP r/m32 ; o32 FF /4 [386] | |
6655 | ||
6656 | `JMP' jumps to a given address. The address may be specified as an | |
6657 | absolute segment and offset, or as a relative jump within the | |
6658 | current segment. | |
6659 | ||
6660 | `JMP SHORT imm' has a maximum range of 128 bytes, since the | |
6661 | displacement is specified as only 8 bits, but takes up less code | |
6662 | space. NASM does not choose when to generate `JMP SHORT' for you: | |
6663 | you must explicitly code `SHORT' every time you want a short jump. | |
6664 | ||
6665 | You can choose between the two immediate far jump forms | |
6666 | (`JMP imm:imm') by the use of the `WORD' and `DWORD' keywords: | |
6667 | `JMP WORD 0x1234:0x5678' or `JMP DWORD 0x1234:0x56789abc'. | |
6668 | ||
6669 | The `JMP FAR mem' forms execute a far jump by loading the | |
6670 | destination address out of memory. The address loaded consists of 16 | |
6671 | or 32 bits of offset (depending on the operand size), and 16 bits of | |
6672 | segment. The operand size may be overridden using `JMP WORD FAR mem' | |
6673 | or `JMP DWORD FAR mem'. | |
6674 | ||
6675 | The `JMP r/m' forms execute a near jump (within the same segment), | |
6676 | loading the destination address out of memory or out of a register. | |
6677 | The keyword `NEAR' may be specified, for clarity, in these forms, | |
6678 | but is not necessary. Again, operand size can be overridden using | |
6679 | `JMP WORD mem' or `JMP DWORD mem'. | |
6680 | ||
6681 | As a convenience, NASM does not require you to jump to a far symbol | |
6682 | by coding the cumbersome `JMP SEG routine:routine', but instead | |
6683 | allows the easier synonym `JMP FAR routine'. | |
6684 | ||
6685 | The `JMP r/m' forms given above are near jumps; NASM will accept | |
6686 | the `NEAR' keyword (e.g. `JMP NEAR [address]'), even though it is | |
6687 | not strictly necessary. | |
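
The various forms might be written as in the sketch below, which
assumes 16-bit code. Each line stands alone as an illustration: the
sequence is not meant to be executed in order, and the labels
`vector' and `table' are invented for the example.

```nasm
        BITS 16

start:  jmp     short skip          ; short jump, 8-bit displacement
        nop
skip:   jmp     start               ; near jump, relative displacement
        jmp     0x1234:0x5678       ; far jump to an immediate segment:offset
        jmp     dword 0x1234:0x56789abc ; the same with a 32-bit offset
        jmp     far [vector]        ; far jump through a pointer in memory
        jmp     bx                  ; near jump to the offset held in BX
        jmp     near [table+bx]     ; near jump through a table in memory

vector: dw      0x5678, 0x1234      ; offset first, then segment
table:  dw      start, skip
```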
6688 | ||
6689 | A.89 `Jcc': Conditional Branch | |
6690 | ||
6691 | Jcc imm ; 70+cc rb [8086] | |
6692 | Jcc NEAR imm ; 0F 80+cc rw/rd [386] | |
6693 | ||
6694 | The conditional jump instructions execute a near (same segment) jump | |
6695 | if and only if their conditions are satisfied. For example, `JNZ' | |
6696 | jumps only if the zero flag is not set. | |
6697 | ||
6698 | The ordinary form of the instructions has only a 128-byte range; the | |
6699 | `NEAR' form is a 386 extension to the instruction set, and can span | |
6700 | the full size of a segment. NASM will not override your choice of | |
6701 | jump instruction: if you want `Jcc NEAR', you have to use the `NEAR' | |
6702 | keyword. | |
6703 | ||
6704 | The `SHORT' keyword is allowed on the first form of the instruction, | |
6705 | for clarity, but is not necessary. | |
6706 | ||
6707 | A.90 `LAHF': Load AH from Flags | |
6708 | ||
6709 | LAHF ; 9F [8086] | |
6710 | ||
6711 | `LAHF' sets the `AH' register according to the contents of the low | |
6712 | byte of the flags word. See also `SAHF' (section A.145). | |
6713 | ||
6714 | A.91 `LAR': Load Access Rights | |
6715 | ||
6716 | LAR reg16,r/m16 ; o16 0F 02 /r [286,PRIV] | |
6717 | LAR reg32,r/m32 ; o32 0F 02 /r [386,PRIV] | |
6718 | ||
6719 | `LAR' takes the segment selector specified by its source (second) | |
6720 | operand, finds the corresponding segment descriptor in the GDT or | |
6721 | LDT, and loads the access-rights byte of the descriptor into its | |
6722 | destination (first) operand. | |
6723 | ||
6724 | A.92 `LDS', `LES', `LFS', `LGS', `LSS': Load Far Pointer | |
6725 | ||
6726 | LDS reg16,mem ; o16 C5 /r [8086] | |
6727 | LDS reg32,mem ; o32 C5 /r [386] | |
6728 | ||
6729 | LES reg16,mem ; o16 C4 /r [8086] | |
6730 | LES reg32,mem ; o32 C4 /r [386] | |
6731 | ||
6732 | LFS reg16,mem ; o16 0F B4 /r [386] | |
6733 | LFS reg32,mem ; o32 0F B4 /r [386] | |
6734 | ||
6735 | LGS reg16,mem ; o16 0F B5 /r [386] | |
6736 | LGS reg32,mem ; o32 0F B5 /r [386] | |
6737 | ||
6738 | LSS reg16,mem ; o16 0F B2 /r [386] | |
6739 | LSS reg32,mem ; o32 0F B2 /r [386] | |
6740 | ||
6741 | These instructions load an entire far pointer (16 or 32 bits of | |
6742 | offset, plus 16 bits of segment) out of memory in one go. `LDS', for | |
6743 | example, loads 16 or 32 bits from the given memory address into the | |
6744 | given register (depending on the size of the register), then loads | |
6745 | the _next_ 16 bits from memory into `DS'. `LES', `LFS', `LGS' and | |
6746 | `LSS' work in the same way but use the other segment registers. | |
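
A minimal sketch of the memory layout these instructions expect,
assuming 16-bit code; the label `farptr' and the values stored in it
are invented for the example. `LDS' is issued last because it
changes `DS', which the memory references themselves rely on.

```nasm
        BITS 16

        les     di, [farptr]        ; DI = 0x5678, ES = 0x1234
        lds     si, [farptr]        ; SI = 0x5678, DS = 0x1234
        ret

farptr: dw      0x5678              ; the offset comes first in memory
        dw      0x1234              ; followed by the segment
```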
6747 | ||
6748 | A.93 `LEA': Load Effective Address | |
6749 | ||
6750 | LEA reg16,mem ; o16 8D /r [8086] | |
6751 | LEA reg32,mem ; o32 8D /r [386] | |
6752 | ||
6753 | `LEA', despite its syntax, does not access memory. It calculates the | |
6754 | effective address specified by its second operand as if it were | |
6755 | going to load or store data from it, but instead it stores the | |
6756 | calculated address into the register specified by its first operand. | |
6757 | This can be used to perform quite complex calculations (e.g. | |
6758 | `LEA EAX,[EBX+ECX*4+100]') in one instruction. | |
6759 | ||
6760 | `LEA', despite being a purely arithmetic instruction which accesses | |
6761 | no memory, still requires square brackets around its second operand, | |
6762 | as if it were a memory reference. | |
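
A few typical uses, sketched for 32-bit code (the label `table' is
an assumption made for the example):

```nasm
        BITS 32

        lea     eax, [ebx+ecx*4+100] ; EAX = EBX + ECX*4 + 100
        lea     edx, [edx+edx*4]     ; EDX = EDX * 5, flags left untouched
        lea     esi, [table]         ; same result as `mov esi, table'
        ret

table:  dd      0
```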
6763 | ||
6764 | A.94 `LEAVE': Destroy Stack Frame | |
6765 | ||
6766 | LEAVE ; C9 [186] | |
6767 | ||
6768 | `LEAVE' destroys a stack frame of the form created by the `ENTER' | |
6769 | instruction (see section A.27). It is functionally equivalent to | |
6770 | `MOV ESP,EBP' followed by `POP EBP' (or `MOV SP,BP' followed by | |
6771 | `POP BP' in 16-bit mode). | |
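
The sketch below shows `LEAVE' undoing a hand-written stack frame in
32-bit code; the routine name `func' and the amount of local storage
are arbitrary:

```nasm
        BITS 32

func:   push    ebp                 ; save the caller's frame pointer
        mov     ebp, esp            ; establish the new frame
        sub     esp, 8              ; reserve two doublewords of locals
        mov     dword [ebp-4], 0    ; use one of the locals
        leave                       ; same effect as mov esp,ebp / pop ebp
        ret
```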
6772 | ||
6773 | A.95 `LGDT', `LIDT', `LLDT': Load Descriptor Tables | |
6774 | ||
6775 | LGDT mem ; 0F 01 /2 [286,PRIV] | |
6776 | LIDT mem ; 0F 01 /3 [286,PRIV] | |
6777 | LLDT r/m16 ; 0F 00 /2 [286,PRIV] | |
6778 | ||
6779 | `LGDT' and `LIDT' both take a 6-byte memory area as an operand: they | |
6780 | load a 32-bit linear address and a 16-bit size limit from that area | |
6781 | (in the opposite order) into the GDTR (global descriptor table | |
6782 | register) or IDTR (interrupt descriptor table register). These are | |
6783 | the only instructions which directly use _linear_ addresses, rather | |
6784 | than segment/offset pairs. | |
6785 | ||
6786 | `LLDT' takes a segment selector as an operand. The processor looks | |
6787 | up that selector in the GDT and stores the limit and base address | |
6788 | given there into the LDTR (local descriptor table register). | |
6789 | ||
6790 | See also `SGDT', `SIDT' and `SLDT' (section A.151). | |
6791 | ||
6792 | A.96 `LMSW': Load/Store Machine Status Word | |
6793 | ||
6794 | LMSW r/m16 ; 0F 01 /6 [286,PRIV] | |
6795 | ||
6796 | `LMSW' loads the bottom four bits of the source operand into the | |
6797 | bottom four bits of the `CR0' control register (or the Machine | |
6798 | Status Word, on 286 processors). See also `SMSW' (section A.155). | |
6799 | ||
6800 | A.97 `LOADALL', `LOADALL286': Load Processor State | |
6801 | ||
6802 | LOADALL ; 0F 07 [386,UNDOC] | |
6803 | LOADALL286 ; 0F 05 [286,UNDOC] | |
6804 | ||
6805 | This instruction, in its two different-opcode forms, is apparently | |
6806 | supported on most 286 processors, some 386 and possibly some 486. | |
6807 | The opcode differs between the 286 and the 386. | |
6808 | ||
6809 | The function of the instruction is to load all information relating | |
6810 | to the state of the processor out of a block of memory: on the 286, | |
6811 | this block is located implicitly at absolute address `0x800', and on | |
6812 | the 386 and 486 it is at `[ES:EDI]'. | |
6813 | ||
6814 | A.98 `LODSB', `LODSW', `LODSD': Load from String | |
6815 | ||
6816 | LODSB ; AC [8086] | |
6817 | LODSW ; o16 AD [8086] | |
6818 | LODSD ; o32 AD [386] | |
6819 | ||
6820 | `LODSB' loads a byte from `[DS:SI]' or `[DS:ESI]' into `AL'. It then | |
6821 | increments or decrements (depending on the direction flag: | |
6822 | increments if the flag is clear, decrements if it is set) `SI' or | |
6823 | `ESI'. | |
6824 | ||
6825 | The register used is `SI' if the address size is 16 bits, and `ESI' | |
6826 | if it is 32 bits. If you need to use an address size not equal to | |
6827 | the current `BITS' setting, you can use an explicit `a16' or `a32' | |
6828 | prefix. | |
6829 | ||
6830 | The segment register used to load from `[SI]' or `[ESI]' can be | |
6831 | overridden by using a segment register name as a prefix (for | |
6832 | example, `es lodsb'). | |
6833 | ||
6834 | `LODSW' and `LODSD' work in the same way, but they load a word or a | |
6835 | doubleword instead of a byte, and increment or decrement the | |
6836 | addressing registers by 2 or 4 instead of 1. | |
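
As an illustration, the following sketch sums four bytes using
`LODSB'. It assumes 16-bit code with `DS' addressing the segment
that holds the (invented) label `data':

```nasm
        BITS 16

        mov     si, data
        mov     cx, 4               ; number of bytes to process
        xor     bx, bx              ; running total
        cld                         ; direction flag clear: SI counts upwards
next:   lodsb                       ; AL = [DS:SI], then SI = SI + 1
        xor     ah, ah              ; zero-extend AL into AX
        add     bx, ax
        loop    next
        ret                         ; BX = 10 at this point

data:   db      1, 2, 3, 4
```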
6837 | ||
6838 | A.99 `LOOP', `LOOPE', `LOOPZ', `LOOPNE', `LOOPNZ': Loop with Counter | |
6839 | ||
6840 | LOOP imm ; E2 rb [8086] | |
6841 | LOOP imm,CX ; a16 E2 rb [8086] | |
6842 | LOOP imm,ECX ; a32 E2 rb [386] | |
6843 | ||
6844 | LOOPE imm ; E1 rb [8086] | |
6845 | LOOPE imm,CX ; a16 E1 rb [8086] | |
6846 | LOOPE imm,ECX ; a32 E1 rb [386] | |
6847 | LOOPZ imm ; E1 rb [8086] | |
6848 | LOOPZ imm,CX ; a16 E1 rb [8086] | |
6849 | LOOPZ imm,ECX ; a32 E1 rb [386] | |
6850 | ||
6851 | LOOPNE imm ; E0 rb [8086] | |
6852 | LOOPNE imm,CX ; a16 E0 rb [8086] | |
6853 | LOOPNE imm,ECX ; a32 E0 rb [386] | |
6854 | LOOPNZ imm ; E0 rb [8086] | |
6855 | LOOPNZ imm,CX ; a16 E0 rb [8086] | |
6856 | LOOPNZ imm,ECX ; a32 E0 rb [386] | |
6857 | ||
6858 | `LOOP' decrements its counter register (either `CX' or `ECX' - if | |
6859 | one is not specified explicitly, the `BITS' setting dictates which | |
6860 | is used) by one, and if the counter does not become zero as a result | |
6861 | of this operation, it jumps to the given label. The jump has a range | |
6862 | of 128 bytes. | |
6863 | ||
6864 | `LOOPE' (or its synonym `LOOPZ') adds the additional condition that | |
6865 | it only jumps if the counter is nonzero _and_ the zero flag is set. | |
6866 | Similarly, `LOOPNE' (and `LOOPNZ') jumps only if the counter is | |
6867 | nonzero and the zero flag is clear. | |
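
Because `LOOP' decrements before it tests, entering a loop with the
counter already at zero makes it run 65536 times in 16-bit code. A
common guard is `JCXZ' (section A.87), as in this sketch, where the
label `count' and its value are invented for the example:

```nasm
        BITS 16

        xor     ax, ax
        mov     cx, [count]
        jcxz    done                ; skip the loop entirely if CX = 0
again:  add     ax, cx              ; loop body: AX = CX + (CX-1) + ... + 1
        loop    again               ; CX = CX - 1; jump if CX is not zero
done:   ret                         ; with count = 10, AX = 55

count:  dw      10
```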
6868 | ||
6869 | A.100 `LSL': Load Segment Limit | |
6870 | ||
6871 | LSL reg16,r/m16 ; o16 0F 03 /r [286,PRIV] | |
6872 | LSL reg32,r/m32 ; o32 0F 03 /r [386,PRIV] | |
6873 | ||
6874 | `LSL' is given a segment selector in its source (second) operand; it | |
6875 | computes the segment limit value by loading the segment limit field | |
6876 | from the associated segment descriptor in the GDT or LDT. (This | |
6877 | involves shifting left by 12 bits if the segment limit is page- | |
6878 | granular, and not if it is byte-granular; so you end up with a byte | |
6879 | limit in either case.) The segment limit obtained is then loaded | |
6880 | into the destination (first) operand. | |
6881 | ||
6882 | A.101 `LTR': Load Task Register | |
6883 | ||
6884 | LTR r/m16 ; 0F 00 /3 [286,PRIV] | |
6885 | ||
6886 | `LTR' looks up the segment base and limit in the GDT or LDT | |
6887 | descriptor specified by the segment selector given as its operand, | |
6888 | and loads them into the Task Register. | |
6889 | ||
6890 | A.102 `MOV': Move Data | |
6891 | ||
6892 | MOV r/m8,reg8 ; 88 /r [8086] | |
6893 | MOV r/m16,reg16 ; o16 89 /r [8086] | |
6894 | MOV r/m32,reg32 ; o32 89 /r [386] | |
6895 | MOV reg8,r/m8 ; 8A /r [8086] | |
6896 | MOV reg16,r/m16 ; o16 8B /r [8086] | |
6897 | MOV reg32,r/m32 ; o32 8B /r [386] | |
6898 | ||
6899 | MOV reg8,imm8 ; B0+r ib [8086] | |
6900 | MOV reg16,imm16 ; o16 B8+r iw [8086] | |
6901 | MOV reg32,imm32 ; o32 B8+r id [386] | |
6902 | MOV r/m8,imm8 ; C6 /0 ib [8086] | |
6903 | MOV r/m16,imm16 ; o16 C7 /0 iw [8086] | |
6904 | MOV r/m32,imm32 ; o32 C7 /0 id [386] | |
6905 | ||
6906 | MOV AL,memoffs8 ; A0 ow/od [8086] | |
6907 | MOV AX,memoffs16 ; o16 A1 ow/od [8086] | |
6908 | MOV EAX,memoffs32 ; o32 A1 ow/od [386] | |
6909 | MOV memoffs8,AL ; A2 ow/od [8086] | |
6910 | MOV memoffs16,AX ; o16 A3 ow/od [8086] | |
6911 | MOV memoffs32,EAX ; o32 A3 ow/od [386] | |
6912 | ||
6913 | MOV r/m16,segreg ; o16 8C /r [8086] | |
6914 | MOV r/m32,segreg ; o32 8C /r [386] | |
6915 | MOV segreg,r/m16 ; o16 8E /r [8086] | |
6916 | MOV segreg,r/m32 ; o32 8E /r [386] | |
6917 | ||
6918 | MOV reg32,CR0/2/3/4 ; 0F 20 /r [386] | |
6919 | MOV reg32,DR0/1/2/3/6/7 ; 0F 21 /r [386] | |
6920 | MOV reg32,TR3/4/5/6/7 ; 0F 24 /r [386] | |
6921 | MOV CR0/2/3/4,reg32 ; 0F 22 /r [386] | |
6922 | MOV DR0/1/2/3/6/7,reg32 ; 0F 23 /r [386] | |
6923 | MOV TR3/4/5/6/7,reg32 ; 0F 26 /r [386] | |
6924 | ||
6925 | `MOV' copies the contents of its source (second) operand into its | |
6926 | destination (first) operand. | |
6927 | ||
6928 | In all forms of the `MOV' instruction, the two operands are the same | |
6929 | size, except for moving between a segment register and an `r/m32' | |
6930 | operand. These instructions are treated exactly like the | |
6931 | corresponding 16-bit equivalent (so that, for example, `MOV DS,EAX' | |
6932 | functions identically to `MOV DS,AX' but saves a prefix when in 32- | |
6933 | bit mode), except that when a segment register is moved into a 32- | |
6934 | bit destination, the top two bytes of the result are undefined. | |
6935 | ||
6936 | `MOV' may not use `CS' as a destination. | |
6937 | ||
6938 | `CR4' is only a supported register on the Pentium and above. | |
6939 | ||
6940 | A.103 `MOVD': Move Doubleword to/from MMX Register | |
6941 | ||
6942 | MOVD mmxreg,r/m32 ; 0F 6E /r [PENT,MMX] | |
6943 | MOVD r/m32,mmxreg ; 0F 7E /r [PENT,MMX] | |
6944 | ||
6945 | `MOVD' copies 32 bits from its source (second) operand into its | |
6946 | destination (first) operand. When the destination is a 64-bit MMX | |
6947 | register, the top 32 bits are set to zero. | |
6948 | ||
6949 | A.104 `MOVQ': Move Quadword to/from MMX Register | |
6950 | ||
6951 | MOVQ mmxreg,r/m64 ; 0F 6F /r [PENT,MMX] | |
6952 | MOVQ r/m64,mmxreg ; 0F 7F /r [PENT,MMX] | |
6953 | ||
6954 | `MOVQ' copies 64 bits from its source (second) operand into its | |
6955 | destination (first) operand. | |
6956 | ||
6957 | A.105 `MOVSB', `MOVSW', `MOVSD': Move String | |
6958 | ||
6959 | MOVSB ; A4 [8086] | |
6960 | MOVSW ; o16 A5 [8086] | |
6961 | MOVSD ; o32 A5 [386] | |
6962 | ||
6963 | `MOVSB' copies the byte at `[DS:SI]' or `[DS:ESI]' to `[ES:DI]' or | |
6964 | `[ES:EDI]'. It then increments or decrements (depending on the | |
6965 | direction flag: increments if the flag is clear, decrements if it is | |
6966 | set) `SI' and `DI' (or `ESI' and `EDI'). | |
6967 | ||
6968 | The registers used are `SI' and `DI' if the address size is 16 bits, | |
6969 | and `ESI' and `EDI' if it is 32 bits. If you need to use an address | |
6970 | size not equal to the current `BITS' setting, you can use an | |
6971 | explicit `a16' or `a32' prefix. | |
6972 | ||
6973 | The segment register used to load from `[SI]' or `[ESI]' can be | |
6974 | overridden by using a segment register name as a prefix (for | |
6975 | example, `es movsb'). The use of `ES' for the store to `[DI]' or | |
6976 | `[EDI]' cannot be overridden. | |
6977 | ||
6978 | `MOVSW' and `MOVSD' work in the same way, but they copy a word or a | |
6979 | doubleword instead of a byte, and increment or decrement the | |
6980 | addressing registers by 2 or 4 instead of 1. | |
6981 | ||
6982 | The `REP' prefix may be used to repeat the instruction `CX' (or | |
6983 | `ECX' - again, the address size chooses which) times. | |
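
A minimal block-copy sketch, assuming 16-bit code in which `DS' and
`ES' both address the segment holding the (invented) labels `source'
and `dest':

```nasm
        BITS 16

        mov     si, source          ; bytes are loaded from [DS:SI]
        mov     di, dest            ; and stored to [ES:DI]
        mov     cx, len             ; number of bytes to copy
        cld                         ; direction flag clear: copy upwards
        rep movsb
        ret

source: db      'hello'
len     equ     $ - source
dest:   times len db 0
```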
6984 | ||
6985 | A.106 `MOVSX', `MOVZX': Move Data with Sign or Zero Extend | |
6986 | ||
6987 | MOVSX reg16,r/m8 ; o16 0F BE /r [386] | |
6988 | MOVSX reg32,r/m8 ; o32 0F BE /r [386] | |
6989 | MOVSX reg32,r/m16 ; o32 0F BF /r [386] | |
6990 | ||
6991 | MOVZX reg16,r/m8 ; o16 0F B6 /r [386] | |
6992 | MOVZX reg32,r/m8 ; o32 0F B6 /r [386] | |
6993 | MOVZX reg32,r/m16 ; o32 0F B7 /r [386] | |
6994 | ||
6995 | `MOVSX' sign-extends its source (second) operand to the length of | |
6996 | its destination (first) operand, and copies the result into the | |
6997 | destination operand. `MOVZX' does the same, but zero-extends rather | |
6998 | than sign-extending. | |
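
The difference shows up only when the top bit of the source is set,
as in this short sketch for 32-bit code:

```nasm
        BITS 32

        mov     bl, 0x80
        movsx   eax, bl             ; EAX = 0xFFFFFF80
        movzx   ecx, bl             ; ECX = 0x00000080
        movsx   dx, bl              ; DX  = 0xFF80
```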
6999 | ||
7000 | A.107 `MUL': Unsigned Integer Multiply | |
7001 | ||
7002 | MUL r/m8 ; F6 /4 [8086] | |
7003 | MUL r/m16 ; o16 F7 /4 [8086] | |
7004 | MUL r/m32 ; o32 F7 /4 [386] | |
7005 | ||
7006 | `MUL' performs unsigned integer multiplication. The other operand to | |
7007 | the multiplication, and the destination operand, are implicit, in | |
7008 | the following way: | |
7009 | ||
7010 | (*) For `MUL r/m8', `AL' is multiplied by the given operand; the | |
7011 | product is stored in `AX'. | |
7012 | ||
7013 | (*) For `MUL r/m16', `AX' is multiplied by the given operand; the | |
7014 | product is stored in `DX:AX'. | |
7015 | ||
7016 | (*) For `MUL r/m32', `EAX' is multiplied by the given operand; the | |
7017 | product is stored in `EDX:EAX'. | |
7018 | ||
7019 | Signed integer multiplication is performed by the `IMUL' | |
7020 | instruction: see section A.77. | |
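
The following sketch, assuming 16-bit code, multiplies the same bit
pattern with `MUL' and with `IMUL' to show how the two
interpretations differ:

```nasm
        BITS 16

        mov     bx, 4
        mov     ax, 0x8000
        mul     bx                  ; unsigned: 32768 * 4, DX:AX = 0002:0000

        mov     ax, 0x8000
        imul    bx                  ; signed: -32768 * 4, DX:AX = FFFE:0000
```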
7021 | ||
7022 | A.108 `NEG', `NOT': Two's and One's Complement | |
7023 | ||
7024 | NEG r/m8 ; F6 /3 [8086] | |
7025 | NEG r/m16 ; o16 F7 /3 [8086] | |
7026 | NEG r/m32 ; o32 F7 /3 [386] | |
7027 | ||
7028 | NOT r/m8 ; F6 /2 [8086] | |
7029 | NOT r/m16 ; o16 F7 /2 [8086] | |
7030 | NOT r/m32 ; o32 F7 /2 [386] | |
7031 | ||
7032 | `NEG' replaces the contents of its operand by the two's complement | |
7033 | negation (invert all the bits and then add one) of the original | |
7034 | value. `NOT', similarly, performs one's complement (inverts all the | |
7035 | bits). | |
7036 | ||
7037 | A.109 `NOP': No Operation | |
7038 | ||
7039 | NOP ; 90 [8086] | |
7040 | ||
7041 | `NOP' performs no operation. Its opcode is the same as that | |
7042 | generated by `XCHG AX,AX' or `XCHG EAX,EAX' (depending on the | |
7043 | processor mode; see section A.168). | |
7044 | ||
7045 | A.110 `OR': Bitwise OR | |
7046 | ||
7047 | OR r/m8,reg8 ; 08 /r [8086] | |
7048 | OR r/m16,reg16 ; o16 09 /r [8086] | |
7049 | OR r/m32,reg32 ; o32 09 /r [386] | |
7050 | ||
7051 | OR reg8,r/m8 ; 0A /r [8086] | |
7052 | OR reg16,r/m16 ; o16 0B /r [8086] | |
7053 | OR reg32,r/m32 ; o32 0B /r [386] | |
7054 | ||
7055 | OR r/m8,imm8 ; 80 /1 ib [8086] | |
7056 | OR r/m16,imm16 ; o16 81 /1 iw [8086] | |
7057 | OR r/m32,imm32 ; o32 81 /1 id [386] | |
7058 | ||
7059 | OR r/m16,imm8 ; o16 83 /1 ib [8086] | |
7060 | OR r/m32,imm8 ; o32 83 /1 ib [386] | |
7061 | ||
7062 | OR AL,imm8 ; 0C ib [8086] | |
7063 | OR AX,imm16 ; o16 0D iw [8086] | |
7064 | OR EAX,imm32 ; o32 0D id [386] | |
7065 | ||
7066 | `OR' performs a bitwise OR operation between its two operands (i.e. | |
7067 | each bit of the result is 1 if and only if at least one of the | |
7068 | corresponding bits of the two inputs was 1), and stores the result | |
7069 | in the destination (first) operand. | |
7070 | ||
7071 | In the forms with an 8-bit immediate second operand and a longer | |
7072 | first operand, the second operand is considered to be signed, and is | |
7073 | sign-extended to the length of the first operand. In these cases, | |
7074 | the `BYTE' qualifier is necessary to force NASM to generate this | |
7075 | form of the instruction. | |
7076 | ||
7077 | The MMX instruction `POR' (see section A.129) performs the same | |
7078 | operation on the 64-bit MMX registers. | |
7079 | ||
7080 | A.111 `OUT': Output Data to I/O Port | |
7081 | ||
7082 | OUT imm8,AL ; E6 ib [8086] | |
7083 | OUT imm8,AX ; o16 E7 ib [8086] | |
7084 | OUT imm8,EAX ; o32 E7 ib [386] | |
7085 | OUT DX,AL ; EE [8086] | |
7086 | OUT DX,AX ; o16 EF [8086] | |
7087 | OUT DX,EAX ; o32 EF [386] | |
7088 | ||
7089 | `OUT' writes the contents of the given source register to the | |
7090 | specified I/O port. The port number may be specified as an immediate | |
7091 | value if it is between 0 and 255, and otherwise must be stored in | |
7092 | `DX'. See also `IN' (section A.78). | |
7093 | ||
7094 | A.112 `OUTSB', `OUTSW', `OUTSD': Output String to I/O Port | |
7095 | ||
7096 | OUTSB ; 6E [186] | |
7097 | ||
7098 | OUTSW ; o16 6F [186] | |
7099 | ||
7100 | OUTSD ; o32 6F [386] | |
7101 | ||
7102 | `OUTSB' loads a byte from `[DS:SI]' or `[DS:ESI]' and writes it to | |
7103 | the I/O port specified in `DX'. It then increments or decrements | |
7104 | (depending on the direction flag: increments if the flag is clear, | |
7105 | decrements if it is set) `SI' or `ESI'. | |
7106 | ||
7107 | The register used is `SI' if the address size is 16 bits, and `ESI' | |
7108 | if it is 32 bits. If you need to use an address size not equal to | |
7109 | the current `BITS' setting, you can use an explicit `a16' or `a32' | |
7110 | prefix. | |
7111 | ||
7112 | The segment register used to load from `[SI]' or `[ESI]' can be | |
7113 | overridden by using a segment register name as a prefix (for | |
7114 | example, `es outsb'). | |
7115 | ||
7116 | `OUTSW' and `OUTSD' work in the same way, but they output a word or | |
7117 | a doubleword instead of a byte, and increment or decrement the | |
7118 | addressing registers by 2 or 4 instead of 1. | |
7119 | ||
7120 | The `REP' prefix may be used to repeat the instruction `CX' (or | |
7121 | `ECX' - again, the address size chooses which) times. | |
7122 | ||
7123 | A.113 `PACKSSDW', `PACKSSWB', `PACKUSWB': Pack Data | |
7124 | ||
7125 | PACKSSDW mmxreg,r/m64 ; 0F 6B /r [PENT,MMX] | |
7126 | PACKSSWB mmxreg,r/m64 ; 0F 63 /r [PENT,MMX] | |
7127 | PACKUSWB mmxreg,r/m64 ; 0F 67 /r [PENT,MMX] | |
7128 | ||
7129 | All these instructions start by forming a notional 128-bit word by | |
7130 | placing the source (second) operand on the left of the destination | |
7131 | (first) operand. `PACKSSDW' then splits this 128-bit word into four | |
7132 | doublewords, converts each to a word, and loads them side by side | |
7133 | into the destination register; `PACKSSWB' and `PACKUSWB' both split | |
7134 | the 128-bit word into eight words, convert each to a byte, and | |
7135 | load _those_ side by side into the destination register. | |
7136 | ||
7137 | `PACKSSDW' and `PACKSSWB' perform signed saturation when reducing | |
7138 | the length of numbers: if the number is too large to fit into the | |
7139 | reduced space, they replace it by the largest signed number (`7FFFh' | |
7140 | or `7Fh') that _will_ fit, and if it is too small then they replace | |
7141 | it by the smallest signed number (`8000h' or `80h') that will fit. | |
7142 | `PACKUSWB' performs unsigned saturation: it treats its input as | |
7143 | signed, and clips each word to the unsigned byte range, replacing | |
7144 | values above `0FFh' by `0FFh' and negative values by zero. | |
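
A small sketch of signed saturation with `PACKSSDW', assuming 32-bit
code on an MMX-capable processor; the labels `lo' and `hi' and their
contents are invented for the example:

```nasm
        BITS 32

        movq    mm0, [lo]           ; destination: doublewords 70000, -70000
        movq    mm1, [hi]           ; source: doublewords 1, -1
        packssdw mm0, mm1           ; words, lowest first:
                                    ;   7FFFh, 8000h, 0001h, 0FFFFh
        ret

lo:     dd      70000, -70000       ; both are out of range for a signed word
hi:     dd      1, -1               ; both fit, so they pass through unchanged
```

The two words taken from the destination end up in the low half of
the result and the two taken from the source in the high half.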
7145 | ||
7146 | A.114 `PADDxx': MMX Packed Addition | |
7147 | ||
7148 | PADDB mmxreg,r/m64 ; 0F FC /r [PENT,MMX] | |
7149 | PADDW mmxreg,r/m64 ; 0F FD /r [PENT,MMX] | |
7150 | PADDD mmxreg,r/m64 ; 0F FE /r [PENT,MMX] | |
7151 | ||
7152 | PADDSB mmxreg,r/m64 ; 0F EC /r [PENT,MMX] | |
7153 | PADDSW mmxreg,r/m64 ; 0F ED /r [PENT,MMX] | |
7154 | ||
7155 | PADDUSB mmxreg,r/m64 ; 0F DC /r [PENT,MMX] | |
7156 | PADDUSW mmxreg,r/m64 ; 0F DD /r [PENT,MMX] | |
7157 | ||
7158 | `PADDxx' all perform packed addition between their two 64-bit | |
7159 | operands, storing the result in the destination (first) operand. The | |
7160 | `PADDxB' forms treat the 64-bit operands as vectors of eight bytes, | |
7161 | and add each byte individually; `PADDxW' treat the operands as | |
7162 | vectors of four words; and `PADDD' treats its operands as vectors of | |
7163 | two doublewords. | |
7164 | ||
7165 | `PADDSB' and `PADDSW' perform signed saturation on the sum of each | |
7166 | pair of bytes or words: if the result of an addition is too large or | |
7167 | too small to fit into a signed byte or word result, it is clipped | |
7168 | (saturated) to the largest or smallest value which _will_ fit. | |
7169 | `PADDUSB' and `PADDUSW' similarly perform unsigned saturation, | |
7170 | clipping to `0FFh' or `0FFFFh' if the result is larger than that. | |
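
The sketch below contrasts wrap-around and unsigned saturating
addition on the same data, assuming 32-bit code on an MMX-capable
processor; the labels `a' and `b' are invented for the example:

```nasm
        BITS 32

        movq    mm0, [a]
        movq    mm1, mm0
        paddb   mm0, [b]            ; wraps: bytes 0 and 1 become 44 and 200
        paddusb mm1, [b]            ; saturates: bytes 0 and 1 become 255 and 200
        ret

a:      db      200, 100
        times 6 db 0
b:      db      100, 100
        times 6 db 0
```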
7171 | ||
7172 | A.115 `PADDSIW': MMX Packed Addition to Implicit Destination | |
7173 | ||
7174 | PADDSIW mmxreg,r/m64 ; 0F 51 /r [CYRIX,MMX] | |
7175 | ||
7176 | `PADDSIW', specific to the Cyrix extensions to the MMX instruction | |
7177 | set, performs the same function as `PADDSW', except that the result | |
7178 | is not placed in the register specified by the first operand, but | |
7179 | instead in the register whose number differs from the first operand | |
7180 | only in the last bit. So `PADDSIW MM0,MM2' would put the result in | |
7181 | `MM1', but `PADDSIW MM1,MM2' would put the result in `MM0'. | |
7182 | ||
7183 | A.116 `PAND', `PANDN': MMX Bitwise AND and AND-NOT | |
7184 | ||
7185 | PAND mmxreg,r/m64 ; 0F DB /r [PENT,MMX] | |
7186 | PANDN mmxreg,r/m64 ; 0F DF /r [PENT,MMX] | |
7187 | ||
7188 | `PAND' performs a bitwise AND operation between its two operands | |
7189 | (i.e. each bit of the result is 1 if and only if the corresponding | |
7190 | bits of the two inputs were both 1), and stores the result in the | |
7191 | destination (first) operand. | |
7192 | ||
7193 | `PANDN' performs the same operation, but performs a one's complement | |
7194 | operation on the destination (first) operand first. | |
7195 | ||
7196 | A.117 `PAVEB': MMX Packed Average | |
7197 | ||
7198 | PAVEB mmxreg,r/m64 ; 0F 50 /r [CYRIX,MMX] | |
7199 | ||
7200 | `PAVEB', specific to the Cyrix MMX extensions, treats its two | |
7201 | operands as vectors of eight unsigned bytes, and calculates the | |
7202 | average of the corresponding bytes in the operands. The resulting | |
7203 | vector of eight averages is stored in the first operand. | |
7204 | ||
7205 | A.118 `PCMPxx': MMX Packed Comparison | |
7206 | ||
7207 | PCMPEQB mmxreg,r/m64 ; 0F 74 /r [PENT,MMX] | |
7208 | PCMPEQW mmxreg,r/m64 ; 0F 75 /r [PENT,MMX] | |
7209 | PCMPEQD mmxreg,r/m64 ; 0F 76 /r [PENT,MMX] | |
7210 | ||
7211 | PCMPGTB mmxreg,r/m64 ; 0F 64 /r [PENT,MMX] | |
7212 | PCMPGTW mmxreg,r/m64 ; 0F 65 /r [PENT,MMX] | |
7213 | PCMPGTD mmxreg,r/m64 ; 0F 66 /r [PENT,MMX] | |
7214 | ||
7215 | The `PCMPxx' instructions all treat their operands as vectors of | |
7216 | bytes, words, or doublewords; corresponding elements of the source | |
7217 | and destination are compared, and the corresponding element of the | |
7218 | destination (first) operand is set to all zeros or all ones | |
7219 | depending on the result of the comparison. | |
7220 | ||
7221 | `PCMPxxB' treats the operands as vectors of eight bytes, `PCMPxxW' | |
7222 | treats them as vectors of four words, and `PCMPxxD' as two | |
7223 | doublewords. | |
7224 | ||
7225 | `PCMPEQx' sets the corresponding element of the destination operand | |
7226 | to all ones if the two elements compared are equal; `PCMPGTx' sets | |
7227 | the destination element to all ones if the element of the first | |
7228 | (destination) operand is greater (treated as a signed integer) than | |
7229 | that of the second (source) operand. | |
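
A minimal Python model of the comparison, treating the operands as 64-bit integers, might look like this. The function `pcmp' and its parameters are assumptions made for the example.

```python
def pcmp(dst, src, width=8, op="eq"):
    """Model PCMPEQx (op="eq") or PCMPGTx (op="gt") on 64-bit values."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    result = 0
    for shift in range(0, 64, width):
        a = (dst >> shift) & mask
        b = (src >> shift) & mask
        if op == "gt":
            # PCMPGTx compares the elements as signed integers.
            a = a - (1 << width) if a & sign else a
            b = b - (1 << width) if b & sign else b
            hit = a > b
        else:
            hit = a == b
        if hit:
            result |= mask << shift   # all ones in this element
    return result
```

Note that a byte of `0xFF' counts as -1 in the `gt' case, so it is not greater than zero.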
7230 | ||
7231 | A.119 `PDISTIB': MMX Packed Distance and Accumulate with Implied Register | |
7232 | ||
7233 | PDISTIB mmxreg,mem64 ; 0F 54 /r [CYRIX,MMX] | |
7234 | ||
7235 | `PDISTIB', specific to the Cyrix MMX extensions, treats its two | |
7236 | input operands as vectors of eight unsigned bytes. For each byte | |
7237 | position, it finds the absolute difference between the bytes in that | |
7238 | position in the two input operands, and adds that value to the byte | |
7239 | in the same position in the implied output register. The addition is | |
7240 | saturated to an unsigned byte in the same way as `PADDUSB'. | |
7241 | ||
7242 | The implied output register is found in the same way as `PADDSIW' | |
7243 | (section A.115). | |
7244 | ||
7245 | Note that `PDISTIB' cannot take a register as its second source | |
7246 | operand. | |
7247 | ||
7248 | A.120 `PMACHRIW': MMX Packed Multiply and Accumulate with Rounding | |
7249 | ||
7250 | PMACHRIW mmxreg,mem64 ; 0F 5E /r [CYRIX,MMX] | |
7251 | ||
7252 | `PMACHRIW' acts almost identically to `PMULHRIW' (section A.123), | |
7253 | but instead of _storing_ its result in the implied destination | |
7254 | register, it _adds_ its result, as four packed words, to the implied | |
7255 | destination register. No saturation is done: the addition can wrap | |
7256 | around. | |
7257 | ||
7258 | Note that `PMACHRIW' cannot take a register as its second source | |
7259 | operand. | |
7260 | ||
7261 | A.121 `PMADDWD': MMX Packed Multiply and Add | |
7262 | ||
7263 | PMADDWD mmxreg,r/m64 ; 0F F5 /r [PENT,MMX] | |
7264 | ||
7265 | `PMADDWD' treats its two inputs as vectors of four signed words. It | |
7266 | multiplies corresponding elements of the two operands, giving four | |
7267 | signed doubleword results. The top two of these are added and placed | |
7268 | in the top 32 bits of the destination (first) operand; the bottom | |
7269 | two are added and placed in the bottom 32 bits. | |
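
The following Python sketch models that description on 64-bit integers. The function name `pmaddwd' simply mirrors the mnemonic, and the wraparound of each sum to 32 bits is an assumption of the model.

```python
def pmaddwd(dst, src):
    """Model PMADDWD on two 64-bit integers."""
    def words(value):
        # Four signed 16-bit words, lowest first.
        out = []
        for shift in range(0, 64, 16):
            w = (value >> shift) & 0xFFFF
            out.append(w - 0x10000 if w & 0x8000 else w)
        return out

    a, b = words(dst), words(src)
    products = [x * y for x, y in zip(a, b)]
    low = (products[0] + products[1]) & 0xFFFFFFFF    # bottom two
    high = (products[2] + products[3]) & 0xFFFFFFFF   # top two
    return (high << 32) | low
```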
7270 | ||
7271 | A.122 `PMAGW': MMX Packed Magnitude | |
7272 | ||
7273 | PMAGW mmxreg,r/m64 ; 0F 52 /r [CYRIX,MMX] | |
7274 | ||
7275 | `PMAGW', specific to the Cyrix MMX extensions, treats both its | |
7276 | operands as vectors of four signed words. It compares the absolute | |
7277 | values of the words in corresponding positions, and sets each word | |
7278 | of the destination (first) operand to whichever of the two words in | |
7279 | that position had the larger absolute value. | |
7280 | ||
7281 | A.123 `PMULHRW', `PMULHRIW': MMX Packed Multiply High with Rounding | |
7282 | ||
7283 | PMULHRW mmxreg,r/m64 ; 0F 59 /r [CYRIX,MMX] | |
7284 | PMULHRIW mmxreg,r/m64 ; 0F 5D /r [CYRIX,MMX] | |
7285 | ||
7286 | These instructions, specific to the Cyrix MMX extensions, treat | |
7287 | their operands as vectors of four signed words. Words in | |
7288 | corresponding positions are multiplied, to give a 32-bit value in | |
7289 | which bits 30 and 31 are guaranteed equal. Bits 30 to 15 of this | |
7290 | value (bit mask `0x7FFF8000') are taken and stored in the | |
7291 | corresponding position of the destination operand, after first | |
7292 | rounding the low bit (equivalent to adding `0x4000' before | |
7293 | extracting bits 30 to 15). | |
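
As a sketch of the arithmetic for a single element, following the wording above rather than any test on Cyrix hardware, the rounding step can be modelled in Python. The name `pmulhr_word' is invented for this example.

```python
def pmulhr_word(x, y):
    """Model one element: x and y are 16-bit patterns, result likewise."""
    def signed(w):
        return w - 0x10000 if w & 0x8000 else w

    product = signed(x) * signed(y)
    # Add 0x4000 to round, then take bits 30 to 15 of the result.
    return ((product + 0x4000) >> 15) & 0xFFFF
```

If the words are read as fixed-point fractions with 15 fractional bits, `0x4000' is one half, and the model gives `0x2000' (one quarter) for `0x4000' times `0x4000'.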
7294 | ||
7295 | For `PMULHRW', the destination operand is the first operand; for | |
7296 | `PMULHRIW' the destination operand is implied by the first operand | |
7297 | in the manner of `PADDSIW' (section A.115). | |
7298 | ||
7299 | A.124 `PMULHW', `PMULLW': MMX Packed Multiply | |
7300 | ||
7301 | PMULHW mmxreg,r/m64 ; 0F E5 /r [PENT,MMX] | |
7302 | PMULLW mmxreg,r/m64 ; 0F D5 /r [PENT,MMX] | |
7303 | ||
7304 | `PMULxW' treats its two inputs as vectors of four signed words. It | |
7305 | multiplies corresponding elements of the two operands, giving four | |
7306 | signed doubleword results. | |
7307 | ||
7308 | `PMULHW' then stores the top 16 bits of each doubleword in the | |
7309 | destination (first) operand; `PMULLW' stores the bottom 16 bits of | |
7310 | each doubleword in the destination operand. | |
7311 | ||
7312 | A.125 `PMVccZB': MMX Packed Conditional Move | |
7313 | ||
7314 | PMVZB mmxreg,mem64 ; 0F 58 /r [CYRIX,MMX] | |
7315 | PMVNZB mmxreg,mem64 ; 0F 5A /r [CYRIX,MMX] | |
7316 | PMVLZB mmxreg,mem64 ; 0F 5B /r [CYRIX,MMX] | |
7317 | PMVGEZB mmxreg,mem64 ; 0F 5C /r [CYRIX,MMX] | |
7318 | ||
7319 | These instructions, specific to the Cyrix MMX extensions, perform | |
7320 | parallel conditional moves. The two input operands are treated as | |
7321 | vectors of eight bytes. Each byte of the destination (first) operand | |
7322 | is either written from the corresponding byte of the source (second) | |
7323 | operand, or left alone, depending on the value of the byte in the | |
7324 | _implied_ operand (specified in the same way as `PADDSIW', in | |
7325 | section A.115). | |
7326 | ||
7327 | `PMVZB' performs each move if the corresponding byte in the implied | |
7328 | operand is zero. `PMVNZB' moves if the byte is non-zero. `PMVLZB' | |
7329 | moves if the byte is less than zero, and `PMVGEZB' moves if the byte | |
7330 | is greater than or equal to zero. | |
7331 | ||
7332 | Note that these instructions cannot take a register as their second | |
7333 | source operand. | |
7334 | ||
7335 | A.126 `POP': Pop Data from Stack | |
7336 | ||
7337 | POP reg16 ; o16 58+r [8086] | |
7338 | POP reg32 ; o32 58+r [386] | |
7339 | ||
7340 | POP r/m16 ; o16 8F /0 [8086] | |
7341 | POP r/m32 ; o32 8F /0 [386] | |
7342 | ||
7343 | POP CS ; 0F [8086,UNDOC] | |
7344 | POP DS ; 1F [8086] | |
7345 | POP ES ; 07 [8086] | |
7346 | POP SS ; 17 [8086] | |
7347 | POP FS ; 0F A1 [386] | |
7348 | POP GS ; 0F A9 [386] | |
7349 | ||
7350 | `POP' loads a value from the stack (from `[SS:SP]' or `[SS:ESP]') | |
7351 | and then increments the stack pointer. | |
7352 | ||
7353 | The address-size attribute of the instruction determines whether | |
7354 | `SP' or `ESP' is used as the stack pointer: to deliberately override | |
7355 | the default given by the `BITS' setting, you can use an `a16' or | |
7356 | `a32' prefix. | |
7357 | ||
7358 | The operand-size attribute of the instruction determines whether the | |
7359 | stack pointer is incremented by 2 or 4: this means that segment | |
7360 | register pops in `BITS 32' mode will pop 4 bytes off the stack and | |
7361 | discard the upper two of them. If you need to override that, you can | |
7362 | use an `o16' or `o32' prefix. | |
7363 | ||
7364 | The above opcode listings give two forms for general-purpose | |
7365 | register pop instructions: for example, `POP BX' has the two forms | |
7366 | `5B' and `8F C3'. NASM will always generate the shorter form when | |
7367 | given `POP BX'. NDISASM will disassemble both. | |
7368 | ||
7369 | `POP CS' is not a documented instruction, and is not supported on | |
7370 | any processor above the 8086 (since they use `0Fh' as an opcode | |
7371 | prefix for instruction set extensions). However, at least some 8086 | |
7372 | processors do support it, and so NASM generates it for completeness. | |
7373 | ||
7374 | A.127 `POPAx': Pop All General-Purpose Registers | |
7375 | ||
7376 | POPA ; 61 [186] | |
7377 | POPAW ; o16 61 [186] | |
7378 | POPAD ; o32 61 [386] | |
7379 | ||
7380 | `POPAW' pops a word from the stack into each of, successively, `DI', | |
7381 | `SI', `BP', nothing (it discards a word from the stack which was a | |
7382 | placeholder for `SP'), `BX', `DX', `CX' and `AX'. It is intended to | |
7383 | reverse the operation of `PUSHAW' (see section A.135), but it | |
7384 | ignores the value for `SP' that was pushed on the stack by `PUSHAW'. | |
7385 | ||
7386 | `POPAD' pops twice as much data, and places the results in `EDI', | |
7387 | `ESI', `EBP', nothing (placeholder for `ESP'), `EBX', `EDX', `ECX' | |
7388 | and `EAX'. It reverses the operation of `PUSHAD'. | |
7389 | ||
7390 | `POPA' is an alias mnemonic for either `POPAW' or `POPAD', depending | |
7391 | on the current `BITS' setting. | |
7392 | ||
7393 | Note that the registers are popped in reverse order of their numeric | |
7394 | values in opcodes (see section A.2.1). | |
7395 | ||
7396 | A.128 `POPFx': Pop Flags Register | |
7397 | ||
7398 | POPF ; 9D [8086] | |
7399 | POPFW ; o16 9D [8086] | |
7400 | POPFD ; o32 9D [386] | |
7401 | ||
7402 | `POPFW' pops a word from the stack and stores it in the bottom 16 | |
7403 | bits of the flags register (or the whole flags register, on | |
7404 | processors below a 386). `POPFD' pops a doubleword and stores it in | |
7405 | the entire flags register. | |
7406 | ||
7407 | `POPF' is an alias mnemonic for either `POPFW' or `POPFD', depending | |
7408 | on the current `BITS' setting. | |
7409 | ||
7410 | See also `PUSHF' (section A.136). | |
7411 | ||
7412 | A.129 `POR': MMX Bitwise OR | |
7413 | ||
7414 | POR mmxreg,r/m64 ; 0F EB /r [PENT,MMX] | |
7415 | ||
7416 | `POR' performs a bitwise OR operation between its two operands (i.e. | |
7417 | each bit of the result is 1 if and only if at least one of the | |
7418 | corresponding bits of the two inputs was 1), and stores the result | |
7419 | in the destination (first) operand. | |
7420 | ||
7421 | A.130 `PSLLx', `PSRLx', `PSRAx': MMX Bit Shifts | |
7422 | ||
7423 | PSLLW mmxreg,r/m64 ; 0F F1 /r [PENT,MMX] | |
7424 | PSLLW mmxreg,imm8 ; 0F 71 /6 ib [PENT,MMX] | |
7425 | ||
7426 | PSLLD mmxreg,r/m64 ; 0F F2 /r [PENT,MMX] | |
7427 | PSLLD mmxreg,imm8 ; 0F 72 /6 ib [PENT,MMX] | |
7428 | ||
7429 | PSLLQ mmxreg,r/m64 ; 0F F3 /r [PENT,MMX] | |
7430 | PSLLQ mmxreg,imm8 ; 0F 73 /6 ib [PENT,MMX] | |
7431 | ||
7432 | PSRAW mmxreg,r/m64 ; 0F E1 /r [PENT,MMX] | |
7433 | PSRAW mmxreg,imm8 ; 0F 71 /4 ib [PENT,MMX] | |
7434 | ||
7435 | PSRAD mmxreg,r/m64 ; 0F E2 /r [PENT,MMX] | |
7436 | PSRAD mmxreg,imm8 ; 0F 72 /4 ib [PENT,MMX] | |
7437 | ||
7438 | PSRLW mmxreg,r/m64 ; 0F D1 /r [PENT,MMX] | |
7439 | PSRLW mmxreg,imm8 ; 0F 71 /2 ib [PENT,MMX] | |
7440 | ||
7441 | PSRLD mmxreg,r/m64 ; 0F D2 /r [PENT,MMX] | |
7442 | PSRLD mmxreg,imm8 ; 0F 72 /2 ib [PENT,MMX] | |
7443 | ||
7444 | PSRLQ mmxreg,r/m64 ; 0F D3 /r [PENT,MMX] | |
7445 | PSRLQ mmxreg,imm8 ; 0F 73 /2 ib [PENT,MMX] | |
7446 | ||
7447 | `PSxxQ' perform simple bit shifts on the 64-bit MMX registers: the | |
7448 | destination (first) operand is shifted left or right by the number | |
7449 | of bits given in the source (second) operand, and the vacated bits | |
7450 | are filled in with zeros (for a logical shift) or copies of the | |
7451 | original sign bit (for an arithmetic right shift). | |
7452 | ||
7453 | `PSxxW' and `PSxxD' perform packed bit shifts: the destination | |
7454 | operand is treated as a vector of four words or two doublewords, and | |
7455 | each element is shifted individually, so bits shifted out of one | |
7456 | element do not interfere with empty bits coming into the next. | |
7457 | ||
7458 | `PSLLx' and `PSRLx' perform logical shifts: the vacated bits at one | |
7459 | end of the shifted number are filled with zeros. `PSRAx' performs an | |
7460 | arithmetic right shift: the vacated bits at the top of the shifted | |
7461 | number are filled with copies of the original top (sign) bit. | |
7462 | ||
7463 | A.131 `PSUBxx': MMX Packed Subtraction | |
7464 | ||
7465 | PSUBB mmxreg,r/m64 ; 0F F8 /r [PENT,MMX] | |
7466 | PSUBW mmxreg,r/m64 ; 0F F9 /r [PENT,MMX] | |
7467 | PSUBD mmxreg,r/m64 ; 0F FA /r [PENT,MMX] | |
7468 | ||
7469 | PSUBSB mmxreg,r/m64 ; 0F E8 /r [PENT,MMX] | |
7470 | PSUBSW mmxreg,r/m64 ; 0F E9 /r [PENT,MMX] | |
7471 | ||
7472 | PSUBUSB mmxreg,r/m64 ; 0F D8 /r [PENT,MMX] | |
7473 | PSUBUSW mmxreg,r/m64 ; 0F D9 /r [PENT,MMX] | |
7474 | ||
7475 | `PSUBxx' all perform packed subtraction between their two 64-bit | |
7476 | operands, storing the result in the destination (first) operand. The | |
7477 | `PSUBxB' forms treat the 64-bit operands as vectors of eight bytes, | |
7478 | and subtract each byte individually; `PSUBxW' treat the operands as | |
7479 | vectors of four words; and `PSUBD' treats its operands as vectors of | |
7480 | two doublewords. | |
7481 | ||
7482 | In all cases, the elements of the operand on the right are | |
7483 | subtracted from the corresponding elements of the operand on the | |
7484 | left, not the other way round. | |
7485 | ||
7486 | `PSUBSB' and `PSUBSW' perform signed saturation on the difference of | |
7487 | each pair of bytes or words: if the result of a subtraction is too | |
7488 | large or too small to fit into a signed byte or word result, it is | |
7489 | clipped (saturated) to the largest or smallest value which _will_ | |
7490 | fit. `PSUBUSB' and `PSUBUSW' similarly perform unsigned saturation, | |
7491 | clipping to zero if the result would otherwise be negative. | |
7492 | ||
7493 | A.132 `PSUBSIW': MMX Packed Subtract with Saturation to Implied Destination | |
7494 | ||
7495 | PSUBSIW mmxreg,r/m64 ; 0F 55 /r [CYRIX,MMX] | |
7496 | ||
7497 | `PSUBSIW', specific to the Cyrix extensions to the MMX instruction | |
7498 | set, performs the same function as `PSUBSW', except that the result | |
7499 | is not placed in the register specified by the first operand, but | |
7500 | instead in the implied destination register, specified as for | |
7501 | `PADDSIW' (section A.115). | |
7502 | ||
7503 | A.133 `PUNPCKxxx': Unpack Data | |
7504 | ||
7505 | PUNPCKHBW mmxreg,r/m64 ; 0F 68 /r [PENT,MMX] | |
7506 | PUNPCKHWD mmxreg,r/m64 ; 0F 69 /r [PENT,MMX] | |
7507 | PUNPCKHDQ mmxreg,r/m64 ; 0F 6A /r [PENT,MMX] | |
7508 | ||
7509 | PUNPCKLBW mmxreg,r/m64 ; 0F 60 /r [PENT,MMX] | |
7510 | PUNPCKLWD mmxreg,r/m64 ; 0F 61 /r [PENT,MMX] | |
7511 | PUNPCKLDQ mmxreg,r/m64 ; 0F 62 /r [PENT,MMX] | |
7512 | ||
7513 | `PUNPCKxx' all treat their operands as vectors, and produce a new | |
7514 | vector generated by interleaving elements from the two inputs. The | |
7515 | `PUNPCKHxx' instructions start by throwing away the bottom half of | |
7516 | each input operand, and the `PUNPCKLxx' instructions throw away the | |
7517 | top half. | |
7518 | ||
7519 | The remaining elements, totalling 64 bits, are then interleaved into | |
7520 | the destination, alternating elements from the second (source) | |
7521 | operand and the first (destination) operand: so the leftmost element | |
7522 | in the result always comes from the second operand, and the | |
7523 | rightmost from the destination. | |
7524 | ||
7525 | `PUNPCKxBW' works a byte at a time, `PUNPCKxWD' a word at a time, | |
7526 | and `PUNPCKxDQ' a doubleword at a time. | |
7527 | ||
7528 | So, for example, if the first operand held `0x7A6A5A4A3A2A1A0A' and | |
7529 | the second held `0x7B6B5B4B3B2B1B0B', then: | |
7530 | ||
7531 | (*) `PUNPCKHBW' would return `0x7B7A6B6A5B5A4B4A'. | |
7532 | ||
7533 | (*) `PUNPCKHWD' would return `0x7B6B7A6A5B4B5A4A'. | |
7534 | ||
7535 | (*) `PUNPCKHDQ' would return `0x7B6B5B4B7A6A5A4A'. | |
7536 | ||
7537 | (*) `PUNPCKLBW' would return `0x3B3A2B2A1B1A0B0A'. | |
7538 | ||
7539 | (*) `PUNPCKLWD' would return `0x3B2B3A2A1B0B1A0A'. | |
7540 | ||
7541 | (*) `PUNPCKLDQ' would return `0x3B2B1B0B3A2A1A0A'. | |
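
The interleaving can be modelled in a few lines of Python, which reproduces the six results listed above. The function `punpck' and its parameters are assumptions made for this sketch.

```python
def punpck(dst, src, width, high):
    """Model PUNPCKHxx (high=True) or PUNPCKLxx on 64-bit integers.

    width is the element size in bits: 8, 16 or 32.
    """
    mask = (1 << width) - 1
    count = 64 // width                 # elements per operand
    start = count // 2 if high else 0   # which half survives
    result = 0
    position = 0
    for index in range(start, start + count // 2):
        d = (dst >> (index * width)) & mask
        s = (src >> (index * width)) & mask
        # Destination element goes low, source element just above it.
        result |= d << (position * width)
        result |= s << ((position + 1) * width)
        position += 2
    return result
```

For instance, `punpck(0x7A6A5A4A3A2A1A0A, 0x7B6B5B4B3B2B1B0B, 8, True)' models `PUNPCKHBW' on the operands used in the example.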
7542 | ||
7543 | A.134 `PUSH': Push Data on Stack | |
7544 | ||
7545 | PUSH reg16 ; o16 50+r [8086] | |
7546 | PUSH reg32 ; o32 50+r [386] | |
7547 | ||
7548 | PUSH r/m16 ; o16 FF /6 [8086] | |
7549 | PUSH r/m32 ; o32 FF /6 [386] | |
7550 | ||
7551 | PUSH CS ; 0E [8086] | |
7552 | PUSH DS ; 1E [8086] | |
7553 | PUSH ES ; 06 [8086] | |
7554 | PUSH SS ; 16 [8086] | |
7555 | PUSH FS ; 0F A0 [386] | |
7556 | PUSH GS ; 0F A8 [386] | |
7557 | ||
7558 | PUSH imm8 ; 6A ib [286] | |
7559 | PUSH imm16 ; o16 68 iw [286] | |
7560 | PUSH imm32 ; o32 68 id [386] | |
7561 | ||
7562 | `PUSH' decrements the stack pointer (`SP' or `ESP') by 2 or 4, and | |
7563 | then stores the given value at `[SS:SP]' or `[SS:ESP]'. | |
7564 | ||
7565 | The address-size attribute of the instruction determines whether | |
7566 | `SP' or `ESP' is used as the stack pointer: to deliberately override | |
7567 | the default given by the `BITS' setting, you can use an `a16' or | |
7568 | `a32' prefix. | |
7569 | ||
7570 | The operand-size attribute of the instruction determines whether the | |
7571 | stack pointer is decremented by 2 or 4: this means that segment | |
7572 | register pushes in `BITS 32' mode will push 4 bytes on the stack, of | |
7573 | which the upper two are undefined. If you need to override that, you | |
7574 | can use an `o16' or `o32' prefix. | |
7575 | ||
7576 | The above opcode listings give two forms for general-purpose | |
7577 | register push instructions: for example, `PUSH BX' has the two forms | |
7578 | `53' and `FF F3'. NASM will always generate the shorter form when | |
7579 | given `PUSH BX'. NDISASM will disassemble both. | |
7580 | ||
7581 | Unlike the undocumented and barely supported `POP CS', `PUSH CS' is | |
7582 | a perfectly valid and sensible instruction, supported on all | |
7583 | processors. | |
7584 | ||
7585 | The instruction `PUSH SP' may be used to distinguish an 8086 from | |
7586 | later processors: on an 8086, the value of `SP' stored is the value | |
7587 | it has _after_ the push instruction, whereas on later processors it | |
7588 | is the value _before_ the push instruction. | |
7589 | ||
7590 | A.135 `PUSHAx': Push All General-Purpose Registers | |
7591 | ||
7592 | PUSHA ; 60 [186] | |
7593 | PUSHAD ; o32 60 [386] | |
7594 | PUSHAW ; o16 60 [186] | |
7595 | ||
7596 | `PUSHAW' pushes, in succession, `AX', `CX', `DX', `BX', `SP', `BP', | |
7597 | `SI' and `DI' on the stack, decrementing the stack pointer by a | |
7598 | total of 16. | |
7599 | ||
7600 | `PUSHAD' pushes, in succession, `EAX', `ECX', `EDX', `EBX', `ESP', | |
7601 | `EBP', `ESI' and `EDI' on the stack, decrementing the stack pointer | |
7602 | by a total of 32. | |
7603 | ||
7604 | In both cases, the value of `SP' or `ESP' pushed is its _original_ | |
7605 | value, as it had before the instruction was executed. | |
7606 | ||
7607 | `PUSHA' is an alias mnemonic for either `PUSHAW' or `PUSHAD', | |
7608 | depending on the current `BITS' setting. | |
7609 | ||
7610 | Note that the registers are pushed in order of their numeric values | |
7611 | in opcodes (see section A.2.1). | |
7612 | ||
7613 | See also `POPA' (section A.127). | |
7614 | ||
7615 | A.136 `PUSHFx': Push Flags Register | |
7616 | ||
7617 | PUSHF ; 9C [8086] | |
7618 | PUSHFD ; o32 9C [386] | |
7619 | PUSHFW ; o16 9C [8086] | |
7620 | ||
7621 | `PUSHFW' pushes the bottom 16 bits of the flags register (or the | |
7622 | whole flags register, on processors below a 386) onto the stack, | |
7623 | decrementing the stack pointer by 2. `PUSHFD' pushes the entire | |
7624 | 32-bit flags register, decrementing the stack pointer by 4. | |
7625 | ||
7626 | `PUSHF' is an alias mnemonic for either `PUSHFW' or `PUSHFD', | |
7627 | depending on the current `BITS' setting. | |
7628 | ||
7629 | See also `POPF' (section A.128). | |
7630 | ||
7631 | A.137 `PXOR': MMX Bitwise XOR | |
7632 | ||
7633 | PXOR mmxreg,r/m64 ; 0F EF /r [PENT,MMX] | |
7634 | ||
7635 | `PXOR' performs a bitwise XOR operation between its two operands | |
7636 | (i.e. each bit of the result is 1 if and only if exactly one of the | |
7637 | corresponding bits of the two inputs was 1), and stores the result | |
7638 | in the destination (first) operand. | |
7639 | ||
7640 | A.138 `RCL', `RCR': Bitwise Rotate through Carry Bit | |
7641 | ||
7642 | RCL r/m8,1 ; D0 /2 [8086] | |
7643 | RCL r/m8,CL ; D2 /2 [8086] | |
7644 | RCL r/m8,imm8 ; C0 /2 ib [286] | |
7645 | RCL r/m16,1 ; o16 D1 /2 [8086] | |
7646 | RCL r/m16,CL ; o16 D3 /2 [8086] | |
7647 | RCL r/m16,imm8 ; o16 C1 /2 ib [286] | |
7648 | RCL r/m32,1 ; o32 D1 /2 [386] | |
7649 | RCL r/m32,CL ; o32 D3 /2 [386] | |
7650 | RCL r/m32,imm8 ; o32 C1 /2 ib [386] | |
7651 | ||
7652 | RCR r/m8,1 ; D0 /3 [8086] | |
7653 | RCR r/m8,CL ; D2 /3 [8086] | |
7654 | RCR r/m8,imm8 ; C0 /3 ib [286] | |
7655 | RCR r/m16,1 ; o16 D1 /3 [8086] | |
7656 | RCR r/m16,CL ; o16 D3 /3 [8086] | |
7657 | RCR r/m16,imm8 ; o16 C1 /3 ib [286] | |
7658 | RCR r/m32,1 ; o32 D1 /3 [386] | |
7659 | RCR r/m32,CL ; o32 D3 /3 [386] | |
7660 | RCR r/m32,imm8 ; o32 C1 /3 ib [386] | |
7661 | ||
7662 | `RCL' and `RCR' perform a 9-bit, 17-bit or 33-bit bitwise rotation | |
7663 | operation, involving the given source/destination (first) operand | |
7664 | and the carry bit. Thus, for example, in the operation `RCL AL,1', a | |
7665 | 9-bit rotation is performed in which `AL' is shifted left by 1, the | |
7666 | top bit of `AL' moves into the carry flag, and the original value of | |
7667 | the carry flag is placed in the low bit of `AL'. | |
7668 | ||
7669 | The number of bits to rotate by is given by the second operand. Only | |
7670 | the bottom five bits of the rotation count are considered by | |
7671 | processors above the 8086. | |
7672 | ||
7673 | You can force the longer (286 and upwards, beginning with a `C1' | |
7674 | byte) form of `RCL foo,1' by using a `BYTE' prefix: | |
7675 | `RCL foo,BYTE 1'. Similarly with `RCR'. | |
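
A small Python model of the 9-bit case (an 8-bit operand plus the carry flag) may make the data flow clearer. The function names `rcl8' and `rcr8' are invented for the example, and `count' is taken to be the count the processor actually uses.

```python
def rcl8(value, carry, count=1):
    """Rotate an 8-bit value left through carry; returns (value, carry)."""
    combined = (carry << 8) | (value & 0xFF)   # carry acts as bit 8
    for _ in range(count):
        combined = ((combined << 1) | (combined >> 8)) & 0x1FF
    return combined & 0xFF, combined >> 8

def rcr8(value, carry, count=1):
    """Rotate an 8-bit value right through carry; returns (value, carry)."""
    combined = (carry << 8) | (value & 0xFF)
    for _ in range(count):
        combined = ((combined >> 1) | ((combined & 1) << 8)) & 0x1FF
    return combined & 0xFF, combined >> 8
```

Rotating nine times in either direction brings both the value and the carry back to where they started, which is what makes this a 9-bit rotation.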
7676 | ||
7677 | A.139 `RDMSR': Read Model-Specific Registers | |
7678 | ||
7679 | RDMSR ; 0F 32 [PENT] | |
7680 | ||
7681 | `RDMSR' reads the processor Model-Specific Register (MSR) whose | |
7682 | index is stored in `ECX', and stores the result in `EDX:EAX'. See | |
7683 | also `WRMSR' (section A.165). | |
7684 | ||
7685 | A.140 `RDPMC': Read Performance-Monitoring Counters | |
7686 | ||
7687 | RDPMC ; 0F 33 [P6] | |
7688 | ||
7689 | `RDPMC' reads the processor performance-monitoring counter whose | |
7690 | index is stored in `ECX', and stores the result in `EDX:EAX'. | |
7691 | ||
7692 | A.141 `RDTSC': Read Time-Stamp Counter | |
7693 | ||
7694 | RDTSC ; 0F 31 [PENT] | |
7695 | ||
7696 | `RDTSC' reads the processor's time-stamp counter into `EDX:EAX'. | |
7697 | ||
7698 | A.142 `RET', `RETF', `RETN': Return from Procedure Call | |
7699 | ||
7700 | RET ; C3 [8086] | |
7701 | RET imm16 ; C2 iw [8086] | |
7702 | ||
7703 | RETF ; CB [8086] | |
7704 | RETF imm16 ; CA iw [8086] | |
7705 | ||
7706 | RETN ; C3 [8086] | |
7707 | RETN imm16 ; C2 iw [8086] | |
7708 | ||
7709 | `RET', and its exact synonym `RETN', pop `IP' or `EIP' from the | |
7710 | stack and transfer control to the new address. Optionally, if a | |
7711 | numeric operand is provided, they increment the stack pointer | |
7712 | by a further `imm16' bytes after popping the return address. | |
7713 | ||
7714 | `RETF' executes a far return: after popping `IP'/`EIP', it then pops | |
7715 | `CS', and _then_ increments the stack pointer by the optional | |
7716 | argument if present. | |
7717 | ||
7718 | A.143 `ROL', `ROR': Bitwise Rotate | |
7719 | ||
7720 | ROL r/m8,1 ; D0 /0 [8086] | |
7721 | ROL r/m8,CL ; D2 /0 [8086] | |
7722 | ROL r/m8,imm8 ; C0 /0 ib [286] | |
7723 | ROL r/m16,1 ; o16 D1 /0 [8086] | |
7724 | ROL r/m16,CL ; o16 D3 /0 [8086] | |
7725 | ROL r/m16,imm8 ; o16 C1 /0 ib [286] | |
7726 | ROL r/m32,1 ; o32 D1 /0 [386] | |
7727 | ROL r/m32,CL ; o32 D3 /0 [386] | |
7728 | ROL r/m32,imm8 ; o32 C1 /0 ib [386] | |
7729 | ||
7730 | ROR r/m8,1 ; D0 /1 [8086] | |
7731 | ROR r/m8,CL ; D2 /1 [8086] | |
7732 | ROR r/m8,imm8 ; C0 /1 ib [286] | |
7733 | ROR r/m16,1 ; o16 D1 /1 [8086] | |
7734 | ROR r/m16,CL ; o16 D3 /1 [8086] | |
7735 | ROR r/m16,imm8 ; o16 C1 /1 ib [286] | |
7736 | ROR r/m32,1 ; o32 D1 /1 [386] | |
7737 | ROR r/m32,CL ; o32 D3 /1 [386] | |
7738 | ROR r/m32,imm8 ; o32 C1 /1 ib [386] | |
7739 | ||
7740 | `ROL' and `ROR' perform a bitwise rotation operation on the given | |
7741 | source/destination (first) operand. Thus, for example, in the | |
7742 | operation `ROL AL,1', an 8-bit rotation is performed in which `AL' | |
7743 | is shifted left by 1 and the original top bit of `AL' moves round | |
7744 | into the low bit. | |
7745 | ||
7746 | The number of bits to rotate by is given by the second operand. Only | |
7747 | the bottom five bits of the rotation count are considered by | |
7748 | processors above the 8086, whatever the operand size. | |
7749 | ||
7750 | You can force the longer (286 and upwards, beginning with a `C1' | |
7751 | byte) form of `ROL foo,1' by using a `BYTE' prefix: | |
7752 | `ROL foo,BYTE 1'. Similarly with `ROR'. | |
7753 | ||
7754 | A.144 `RSM': Resume from System-Management Mode | |
7755 | ||
7756 | RSM ; 0F AA [PENT] | |
7757 | ||
7758 | `RSM' returns the processor to its normal operating mode when it was | |
7759 | in System-Management Mode. | |
7760 | ||
7761 | A.145 `SAHF': Store AH to Flags | |
7762 | ||
7763 | SAHF ; 9E [8086] | |
7764 | ||
7765 | `SAHF' sets the low byte of the flags word according to the contents | |
7766 | of the `AH' register. See also `LAHF' (section A.90). | |
7767 | ||
7768 | A.146 `SAL', `SAR': Bitwise Arithmetic Shifts | |
7769 | ||
7770 | SAL r/m8,1 ; D0 /4 [8086] | |
7771 | SAL r/m8,CL ; D2 /4 [8086] | |
7772 | SAL r/m8,imm8 ; C0 /4 ib [286] | |
7773 | SAL r/m16,1 ; o16 D1 /4 [8086] | |
7774 | SAL r/m16,CL ; o16 D3 /4 [8086] | |
7775 | SAL r/m16,imm8 ; o16 C1 /4 ib [286] | |
7776 | SAL r/m32,1 ; o32 D1 /4 [386] | |
7777 | SAL r/m32,CL ; o32 D3 /4 [386] | |
7778 | SAL r/m32,imm8 ; o32 C1 /4 ib [386] | |
7779 | ||
7780 | SAR r/m8,1 ; D0 /7 [8086] | |
7781 | SAR r/m8,CL ; D2 /7 [8086] | |
7782 | SAR r/m8,imm8 ; C0 /7 ib [286] | |
7783 | SAR r/m16,1 ; o16 D1 /7 [8086] | |
7784 | SAR r/m16,CL ; o16 D3 /7 [8086] | |
7785 | SAR r/m16,imm8 ; o16 C1 /7 ib [286] | |
7786 | SAR r/m32,1 ; o32 D1 /7 [386] | |
7787 | SAR r/m32,CL ; o32 D3 /7 [386] | |
7788 | SAR r/m32,imm8 ; o32 C1 /7 ib [386] | |
7789 | ||
7790 | `SAL' and `SAR' perform an arithmetic shift operation on the given | |
7791 | source/destination (first) operand. The vacated bits are filled with | |
7792 | zero for `SAL', and with copies of the original high bit of the | |
7793 | source operand for `SAR'. | |
7794 | ||
7795 | `SAL' is a synonym for `SHL' (see section A.152). NASM will assemble | |
7796 | either one to the same code, but NDISASM will always disassemble | |
7797 | that code as `SHL'. | |
7798 | ||
7799 | The number of bits to shift by is given by the second operand. Only | |
7800 | the bottom five bits of the shift count are considered by | |
7801 | processors above the 8086, whatever the operand size. | |
7802 | ||
7803 | You can force the longer (286 and upwards, beginning with a `C1' | |
7804 | byte) form of `SAL foo,1' by using a `BYTE' prefix: | |
7805 | `SAL foo,BYTE 1'. Similarly with `SAR'. | |
7806 | ||
7807 | A.147 `SALC': Set AL from Carry Flag | |
7808 | ||
7809 | SALC ; D6 [8086,UNDOC] | |
7810 | ||
7811 | `SALC' is an early undocumented instruction similar in concept to | |
7812 | `SETcc' (section A.150). Its function is to set `AL' to zero if the | |
7813 | carry flag is clear, or to `0xFF' if it is set. | |
7814 | ||
7815 | A.148 `SBB': Subtract with Borrow | |
7816 | ||
7817 | SBB r/m8,reg8 ; 18 /r [8086] | |
7818 | SBB r/m16,reg16 ; o16 19 /r [8086] | |
7819 | SBB r/m32,reg32 ; o32 19 /r [386] | |
7820 | ||
7821 | SBB reg8,r/m8 ; 1A /r [8086] | |
7822 | SBB reg16,r/m16 ; o16 1B /r [8086] | |
7823 | SBB reg32,r/m32 ; o32 1B /r [386] | |
7824 | ||
7825 | SBB r/m8,imm8 ; 80 /3 ib [8086] | |
7826 | SBB r/m16,imm16 ; o16 81 /3 iw [8086] | |
7827 | SBB r/m32,imm32 ; o32 81 /3 id [386] | |
7828 | ||
7829 | SBB r/m16,imm8 ; o16 83 /3 ib [8086] | |
7830 | SBB r/m32,imm8 ; o32 83 /3 ib [386] | |
7831 | ||
7832 | SBB AL,imm8 ; 1C ib [8086] | |
7833 | SBB AX,imm16 ; o16 1D iw [8086] | |
7834 | SBB EAX,imm32 ; o32 1D id [386] | |
7835 | ||
7836 | `SBB' performs integer subtraction: it subtracts its second operand, | |
7837 | plus the value of the carry flag, from its first, and leaves the | |
7838 | result in its destination (first) operand. The flags are set | |
7839 | according to the result of the operation: in particular, the carry | |
7840 | flag is affected and can be used by a subsequent `SBB' instruction. | |
7841 | ||
7842 | In the forms with an 8-bit immediate second operand and a longer | |
7843 | first operand, the second operand is considered to be signed, and is | |
7844 | sign-extended to the length of the first operand. In these cases, | |
7845 | the `BYTE' qualifier is necessary to force NASM to generate this | |
7846 | form of the instruction. | |
7847 | ||
7848 | To subtract one number from another without also subtracting the | |
7849 | contents of the carry flag, use `SUB' (section A.159). | |
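
The usual reason for wanting the carry flag included is multi-word subtraction. The Python sketch below models a 32-bit subtraction done as a 16-bit `SUB' on the low words followed by a 16-bit `SBB' on the high words. The function names `sub16' and `sub32_via_words' are hypothetical.

```python
def sub16(a, b, borrow_in=0):
    """Model a 16-bit SUB (borrow_in=0) or SBB; returns (result, carry)."""
    total = a - b - borrow_in
    return total & 0xFFFF, 1 if total < 0 else 0

def sub32_via_words(x, y):
    """Subtract two 32-bit values a word at a time, low word first."""
    low, carry = sub16(x & 0xFFFF, y & 0xFFFF)     # SUB on the low words
    high, carry = sub16(x >> 16, y >> 16, carry)   # SBB on the high words
    return (high << 16) | low, carry
```

The carry returned by the second step is what a further `SBB' would consume if the numbers were longer still.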
7850 | ||
7851 | A.149 `SCASB', `SCASW', `SCASD': Scan String | |
7852 | ||
7853 | SCASB ; AE [8086] | |
7854 | SCASW ; o16 AF [8086] | |
7855 | SCASD ; o32 AF [386] | |
7856 | ||
7857 | `SCASB' compares the byte in `AL' with the byte at `[ES:DI]' or | |
7858 | `[ES:EDI]', and sets the flags accordingly. It then increments or | |
7859 | decrements (depending on the direction flag: increments if the flag | |
7860 | is clear, decrements if it is set) `DI' (or `EDI'). | |
7861 | ||
7862 | The register used is `DI' if the address size is 16 bits, and `EDI' | |
7863 | if it is 32 bits. If you need to use an address size not equal to | |
7864 | the current `BITS' setting, you can use an explicit `a16' or `a32' | |
7865 | prefix. | |
7866 | ||
7867 | Segment override prefixes have no effect for this instruction: the | |
7868 | use of `ES' for the load from `[DI]' or `[EDI]' cannot be | |
7869 | overridden. | |
7870 | ||
7871 | `SCASW' and `SCASD' work in the same way, but they compare a word to | |
7872 | `AX' or a doubleword to `EAX' instead of a byte to `AL', and | |
7873 | increment or decrement the addressing registers by 2 or 4 instead of | |
7874 | 1. | |
7875 | ||
7876 | The `REPE' and `REPNE' prefixes (equivalently, `REPZ' and `REPNZ') | |
7877 | may be used to repeat the instruction up to `CX' (or `ECX' - again, | |
7878 | the address size chooses which) times until the first unequal or | |
7879 | equal byte is found. | |
7880 | ||
7881 | A.150 `SETcc': Set Register from Condition | |
7882 | ||
7883 | SETcc r/m8 ; 0F 90+cc /2 [386] | |
7884 | ||
7885 | `SETcc' sets the given 8-bit operand to zero if its condition is not | |
7886 | satisfied, and to 1 if it is. | |
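
For instance, `SETL' can turn the outcome of a signed comparison
into a 0-or-1 value without a branch. This is only an illustrative
fragment, assuming 32-bit code; the choice of registers is
arbitrary.

```nasm
        bits 32

        cmp     eax,ebx         ; compare EAX with EBX
        setl    cl              ; CL = 1 if EAX < EBX (signed), else 0
        movzx   ecx,cl          ; widen the 0/1 result to 32 bits
```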
7887 | ||
7888 | A.151 `SGDT', `SIDT', `SLDT': Store Descriptor Table Pointers | |
7889 | ||
7890 | SGDT mem ; 0F 01 /0 [286,PRIV] | |
7891 | SIDT mem ; 0F 01 /1 [286,PRIV] | |
7892 | SLDT r/m16 ; 0F 00 /0 [286,PRIV] | |
7893 | ||
7894 | `SGDT' and `SIDT' both take a 6-byte memory area as an operand: they | |
7895 | store the contents of the GDTR (global descriptor table register) or | |
7896 | IDTR (interrupt descriptor table register) into that area as a | |
7897 | 16-bit size limit followed by a 32-bit linear address. These, with | |
7898 | `LGDT' and `LIDT', are the only instructions which directly use | |
7899 | _linear_ addresses, rather than segment/offset pairs. | |
7900 | ||
7901 | `SLDT' stores the segment selector corresponding to the LDT (local | |
7902 | descriptor table) into the given operand. | |
7903 | ||
7904 | See also `LGDT', `LIDT' and `LLDT' (section A.95). | |
7905 | ||
7906 | A.152 `SHL', `SHR': Bitwise Logical Shifts | |
7907 | ||
7908 | SHL r/m8,1 ; D0 /4 [8086] | |
7909 | SHL r/m8,CL ; D2 /4 [8086] | |
7910 | SHL r/m8,imm8 ; C0 /4 ib [286] | |
7911 | SHL r/m16,1 ; o16 D1 /4 [8086] | |
7912 | SHL r/m16,CL ; o16 D3 /4 [8086] | |
7913 | SHL r/m16,imm8 ; o16 C1 /4 ib [286] | |
7914 | SHL r/m32,1 ; o32 D1 /4 [386] | |
7915 | SHL r/m32,CL ; o32 D3 /4 [386] | |
7916 | SHL r/m32,imm8 ; o32 C1 /4 ib [386] | |
7917 | ||
7918 | SHR r/m8,1 ; D0 /5 [8086] | |
7919 | SHR r/m8,CL ; D2 /5 [8086] | |
7920 | SHR r/m8,imm8 ; C0 /5 ib [286] | |
7921 | SHR r/m16,1 ; o16 D1 /5 [8086] | |
7922 | SHR r/m16,CL ; o16 D3 /5 [8086] | |
7923 | SHR r/m16,imm8 ; o16 C1 /5 ib [286] | |
7924 | SHR r/m32,1 ; o32 D1 /5 [386] | |
7925 | SHR r/m32,CL ; o32 D3 /5 [386] | |
7926 | SHR r/m32,imm8 ; o32 C1 /5 ib [386] | |
7927 | ||
7928 | `SHL' and `SHR' perform a logical shift operation on the given | |
7929 | source/destination (first) operand. The vacated bits are filled with | |
7930 | zero. | |
7931 | ||
7932 | A synonym for `SHL' is `SAL' (see section A.146). NASM will assemble | |
7933 | either one to the same code, but NDISASM will always disassemble | |
7934 | that code as `SHL'. | |
7935 | ||
7936 | The number of bits to shift by is given by the second operand. Only | |
7937 | the bottom 5 bits of the shift count are considered by processors | |
7938 | above the 8086, whatever the size of the operand being shifted. | |
7939 | ||
7940 | You can force the longer (286 and upwards, beginning with a `C1' | |
7941 | byte) form of `SHL foo,1' by using a `BYTE' prefix: | |
7942 | `SHL foo,BYTE 1'. Similarly with `SHR'. | |
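
The different count forms can be sketched as follows. The fragment
assumes 16-bit code and arbitrary starting values; the `BYTE' line
simply follows the syntax described above.

```nasm
        bits 16

        mov     ax,5
        shl     ax,1            ; short form (D1 /4): AX = 10
        shl     ax,byte 1       ; long form requested with BYTE: AX = 20
        mov     cl,2
        shr     ax,cl           ; count taken from CL: AX = 5
        shl     al,4            ; immediate count (286 and upwards): AL = 0x50
```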
7943 | ||
7944 | A.153 `SHLD', `SHRD': Bitwise Double-Precision Shifts | |
7945 | ||
7946 | SHLD r/m16,reg16,imm8 ; o16 0F A4 /r ib [386] | |
7947 | SHLD r/m32,reg32,imm8 ; o32 0F A4 /r ib [386] | |
7948 | SHLD r/m16,reg16,CL ; o16 0F A5 /r [386] | |
7949 | SHLD r/m32,reg32,CL ; o32 0F A5 /r [386] | |
7950 | ||
7951 | SHRD r/m16,reg16,imm8 ; o16 0F AC /r ib [386] | |
7952 | SHRD r/m32,reg32,imm8 ; o32 0F AC /r ib [386] | |
7953 | SHRD r/m16,reg16,CL ; o16 0F AD /r [386] | |
7954 | SHRD r/m32,reg32,CL ; o32 0F AD /r [386] | |
7955 | ||
7956 | `SHLD' performs a double-precision left shift. It notionally places | |
7957 | its second operand to the right of its first, then shifts the entire | |
7958 | bit string thus generated to the left by a number of bits specified | |
7959 | in the third operand. It then updates only the _first_ operand | |
7960 | according to the result of this. The second operand is not modified. | |
7961 | ||
7962 | `SHRD' performs the corresponding right shift: it notionally places | |
7963 | the second operand to the _left_ of the first, shifts the whole bit | |
7964 | string right, and updates only the first operand. | |
7965 | ||
7966 | For example, if `EAX' holds `0x01234567' and `EBX' holds | |
7967 | `0x89ABCDEF', then the instruction `SHLD EAX,EBX,4' would update | |
7968 | `EAX' to hold `0x12345678'. Under the same conditions, | |
7969 | `SHRD EAX,EBX,4' would update `EAX' to hold `0xF0123456'. | |
7970 | ||
7971 | The number of bits to shift by is given by the third operand. Only | |
7972 | the bottom 5 bits of the shift count are considered. | |
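
The worked example above, written out as code, together with one
common use of `SHLD': shifting a 64-bit quantity held in a register
pair. The use of `EDX:EAX' as the pair, and the assumption that `CL'
is less than 32, belong to this sketch rather than to the
instruction.

```nasm
        bits 32

        mov     eax,0x01234567
        mov     ebx,0x89ABCDEF
        shld    eax,ebx,4       ; EAX = 0x12345678; EBX is unchanged

        mov     eax,0x01234567
        shrd    eax,ebx,4       ; EAX = 0xF0123456; EBX is unchanged

        ; Shift the 64-bit value in EDX:EAX left by CL bits (CL < 32).
        shld    edx,eax,cl      ; high half takes in the top bits of EAX
        shl     eax,cl          ; then shift the low half itself
```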
7973 | ||
7974 | A.154 `SMI': System Management Interrupt | |
7975 | ||
7976 | SMI ; F1 [386,UNDOC] | |
7977 | ||
7978 | This is an opcode apparently supported by some AMD processors (which | |
7979 | is why it can generate the same opcode as `INT1'), and places the | |
7980 | machine into system-management mode, a special debugging mode. | |
7981 | ||
7982 | A.155 `SMSW': Store Machine Status Word | |
7983 | ||
7984 | SMSW r/m16 ; 0F 01 /4 [286,PRIV] | |
7985 | ||
7986 | `SMSW' stores the bottom half of the `CR0' control register (or the | |
7987 | Machine Status Word, on 286 processors) into the destination | |
7988 | operand. See also `LMSW' (section A.96). | |
7989 | ||
7990 | A.156 `STC', `STD', `STI': Set Flags | |
7991 | ||
7992 | STC ; F9 [8086] | |
7993 | STD ; FD [8086] | |
7994 | STI ; FB [8086] | |
7995 | ||
7996 | These instructions set various flags. `STC' sets the carry flag; | |
7997 | `STD' sets the direction flag; and `STI' sets the interrupt flag | |
7998 | (thus enabling interrupts). | |
7999 | ||
8000 | To clear the carry, direction, or interrupt flags, use the `CLC', | |
8001 | `CLD' and `CLI' instructions (section A.15). To invert the carry | |
8002 | flag, use `CMC' (section A.16). | |
8003 | ||
8004 | A.157 `STOSB', `STOSW', `STOSD': Store Byte to String | |
8005 | ||
8006 | STOSB ; AA [8086] | |
8007 | STOSW ; o16 AB [8086] | |
8008 | STOSD ; o32 AB [386] | |
8009 | ||
8010 | `STOSB' stores the byte in `AL' at `[ES:DI]' or `[ES:EDI]'; it does | |
8011 | not affect the flags. It then increments or decrements (depending | |
8012 | on the direction flag: increments if the flag is clear, decrements | |
8013 | if it is set) `DI' (or `EDI'). | |
8014 | ||
8015 | The register used is `DI' if the address size is 16 bits, and `EDI' | |
8016 | if it is 32 bits. If you need to use an address size not equal to | |
8017 | the current `BITS' setting, you can use an explicit `a16' or `a32' | |
8018 | prefix. | |
8019 | ||
8020 | Segment override prefixes have no effect for this instruction: the | |
8021 | use of `ES' for the store to `[DI]' or `[EDI]' cannot be overridden. | |
8022 | ||
8023 | `STOSW' and `STOSD' work in the same way, but they store the word in | |
8024 | `AX' or the doubleword in `EAX' instead of the byte in `AL', and | |
8025 | increment or decrement the addressing registers by 2 or 4 instead of | |
8026 | 1. | |
8027 | ||
8028 | The `REP' prefix may be used to repeat the instruction `CX' (or | |
8029 | `ECX' - again, the address size chooses which) times. | |
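
A typical use of `REP STOSB' is filling a block of memory. The
fragment below is a sketch only: it assumes 32-bit code with `ES'
equal to `DS', and the names `fill_buffer' and `buffer' are
invented for the example.

```nasm
        bits 32

        section .text
fill_buffer:
        cld                     ; clear DF so that EDI increments
        mov     edi,buffer      ; destination, addressed through ES
        xor     al,al           ; the byte value to store: 0
        mov     ecx,64          ; number of bytes to store
        rep stosb               ; store AL at [ES:EDI], ECX times
        ret

        section .bss
buffer: resb    64
```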
8030 | ||
8031 | A.158 `STR': Store Task Register | |
8032 | ||
8033 | STR r/m16 ; 0F 00 /1 [286,PRIV] | |
8034 | ||
8035 | `STR' stores the segment selector corresponding to the contents of | |
8036 | the Task Register into its operand. | |
8037 | ||
8038 | A.159 `SUB': Subtract Integers | |
8039 | ||
8040 | SUB r/m8,reg8 ; 28 /r [8086] | |
8041 | SUB r/m16,reg16 ; o16 29 /r [8086] | |
8042 | SUB r/m32,reg32 ; o32 29 /r [386] | |
8043 | ||
8044 | SUB reg8,r/m8 ; 2A /r [8086] | |
8045 | SUB reg16,r/m16 ; o16 2B /r [8086] | |
8046 | SUB reg32,r/m32 ; o32 2B /r [386] | |
8047 | ||
8048 | SUB r/m8,imm8 ; 80 /5 ib [8086] | |
8049 | SUB r/m16,imm16 ; o16 81 /5 iw [8086] | |
8050 | SUB r/m32,imm32 ; o32 81 /5 id [386] | |
8051 | ||
8052 | SUB r/m16,imm8 ; o16 83 /5 ib [8086] | |
8053 | SUB r/m32,imm8 ; o32 83 /5 ib [386] | |
8054 | ||
8055 | SUB AL,imm8 ; 2C ib [8086] | |
8056 | SUB AX,imm16 ; o16 2D iw [8086] | |
8057 | SUB EAX,imm32 ; o32 2D id [386] | |
8058 | ||
8059 | `SUB' performs integer subtraction: it subtracts its second operand | |
8060 | from its first, and leaves the result in its destination (first) | |
8061 | operand. The flags are set according to the result of the operation: | |
8062 | in particular, the carry flag is affected and can be used by a | |
8063 | subsequent `SBB' instruction (section A.148). | |
8064 | ||
8065 | In the forms with an 8-bit immediate second operand and a longer | |
8066 | first operand, the second operand is considered to be signed, and is | |
8067 | sign-extended to the length of the first operand. In these cases, | |
8068 | the `BYTE' qualifier is necessary to force NASM to generate this | |
8069 | form of the instruction. | |
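
As a brief illustration of `SUB' feeding `SBB', a 64-bit
subtraction can be built from two 32-bit ones. The register pairs
used here (`EDX:EAX' minus `ECX:EBX') are an arbitrary choice for
the sketch.

```nasm
        bits 32

        sub     eax,ebx         ; low halves; CF is set on borrow
        sbb     edx,ecx         ; high halves, minus the borrow in CF

        sub     esi,byte 4      ; sign-extended imm8 form (83 /5 ib)
```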
8070 | ||
8071 | A.160 `TEST': Test Bits (notional bitwise AND) | |
8072 | ||
8073 | TEST r/m8,reg8 ; 84 /r [8086] | |
8074 | TEST r/m16,reg16 ; o16 85 /r [8086] | |
8075 | TEST r/m32,reg32 ; o32 85 /r [386] | |
8076 | ||
8077 | TEST r/m8,imm8 ; F6 /0 ib [8086] | |
8078 | TEST r/m16,imm16 ; o16 F7 /0 iw [8086] | |
8079 | TEST r/m32,imm32 ; o32 F7 /0 id [386] | |
8080 | ||
8081 | TEST AL,imm8 ; A8 ib [8086] | |
8082 | TEST AX,imm16 ; o16 A9 iw [8086] | |
8083 | TEST EAX,imm32 ; o32 A9 id [386] | |
8084 | ||
8085 | `TEST' performs a `mental' bitwise AND of its two operands, and | |
8086 | affects the flags as if the operation had taken place, but does not | |
8087 | store the result of the operation anywhere. | |
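
Two common uses, sketched under the assumption of 32-bit code: the
first checks a single bit, the second checks a register for zero
without changing it. The labels are hypothetical.

```nasm
        bits 32

        test    al,0x01         ; AND AL with 1, discarding the result
        jz      is_even         ; ZF set: bit 0 of AL was clear
        inc     ebx             ; reached only when bit 0 was set
is_even:
        test    ecx,ecx         ; ZF set if and only if ECX is zero
        jz      count_zero
        dec     ecx             ; reached only when ECX was non-zero
count_zero:
```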
8088 | ||
8089 | A.161 `UMOV': User Move Data | |
8090 | ||
8091 | UMOV r/m8,reg8 ; 0F 10 /r [386,UNDOC] | |
8092 | UMOV r/m16,reg16 ; o16 0F 11 /r [386,UNDOC] | |
8093 | UMOV r/m32,reg32 ; o32 0F 11 /r [386,UNDOC] | |
8094 | ||
8095 | UMOV reg8,r/m8 ; 0F 12 /r [386,UNDOC] | |
8096 | UMOV reg16,r/m16 ; o16 0F 13 /r [386,UNDOC] | |
8097 | UMOV reg32,r/m32 ; o32 0F 13 /r [386,UNDOC] | |
8098 | ||
8099 | This undocumented instruction is used by in-circuit emulators to | |
8100 | access user memory (as opposed to host memory). It is used just like | |
8101 | an ordinary memory/register or register/register `MOV' instruction, | |
8102 | but accesses user space. | |
8103 | ||
8104 | A.162 `VERR', `VERW': Verify Segment Readability/Writability | |
8105 | ||
8106 | VERR r/m16 ; 0F 00 /4 [286,PRIV] | |
8107 | ||
8108 | VERW r/m16 ; 0F 00 /5 [286,PRIV] | |
8109 | ||
8110 | `VERR' sets the zero flag if the segment specified by the selector | |
8111 | in its operand can be read from at the current privilege level. | |
8112 | `VERW' sets the zero flag if the segment can be written. | |
8113 | ||
8114 | A.163 `WAIT': Wait for Floating-Point Processor | |
8115 | ||
8116 | WAIT ; 9B [8086] | |
8117 | ||
8118 | `WAIT', on 8086 systems with a separate 8087 FPU, waits for the FPU | |
8119 | to have finished any operation it is engaged in before continuing | |
8120 | main processor operations, so that (for example) an FPU store to | |
8121 | main memory can be guaranteed to have completed before the CPU tries | |
8122 | to read the result back out. | |
8123 | ||
8124 | On higher processors, `WAIT' is unnecessary for this purpose, and it | |
8125 | has the alternative purpose of ensuring that any pending unmasked | |
8126 | FPU exceptions have happened before execution continues. | |
8127 | ||
8128 | A.164 `WBINVD': Write Back and Invalidate Cache | |
8129 | ||
8130 | WBINVD ; 0F 09 [486] | |
8131 | ||
8132 | `WBINVD' invalidates and empties the processor's internal caches, | |
8133 | and causes the processor to instruct external caches to do the same. | |
8134 | It writes the contents of the caches back to memory first, so no | |
8135 | data is lost. To flush the caches quickly without bothering to write | |
8136 | the data back first, use `INVD' (section A.84). | |
8137 | ||
8138 | A.165 `WRMSR': Write Model-Specific Registers | |
8139 | ||
8140 | WRMSR ; 0F 30 [PENT] | |
8141 | ||
8142 | `WRMSR' writes the value in `EDX:EAX' to the processor Model- | |
8143 | Specific Register (MSR) whose index is stored in `ECX'. See also | |
8144 | `RDMSR' (section A.139). | |
8145 | ||
8146 | A.166 `XADD': Exchange and Add | |
8147 | ||
8148 | XADD r/m8,reg8 ; 0F C0 /r [486] | |
8149 | XADD r/m16,reg16 ; o16 0F C1 /r [486] | |
8150 | XADD r/m32,reg32 ; o32 0F C1 /r [486] | |
8151 | ||
8152 | `XADD' exchanges the values in its two operands, and then adds them | |
8153 | together and writes the result into the destination (first) operand. | |
8154 | This instruction can be used with a `LOCK' prefix for multi- | |
8155 | processor synchronisation purposes. | |
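
A minimal sketch of the synchronisation use: an atomic
fetch-and-add on a shared counter. The names `next_ticket' and
`counter' are invented here, and 32-bit code is assumed.

```nasm
        bits 32

        section .text
next_ticket:
        mov     eax,1           ; amount to add
        lock xadd [counter],eax ; EAX = old value; counter = old value + 1
        ret

        section .data
counter: dd     0
```

After the instruction, `EAX' holds the value the counter had before
the addition, so each caller obtains a distinct number.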
8156 | ||
8157 | A.167 `XBTS': Extract Bit String | |
8158 | ||
8159 | XBTS reg16,r/m16 ; o16 0F A6 /r [386,UNDOC] | |
8160 | XBTS reg32,r/m32 ; o32 0F A6 /r [386,UNDOC] | |
8161 | ||
8162 | No clear documentation seems to be available for this instruction: | |
8163 | the best I've been able to find reads `Takes a string of bits from | |
8164 | the first operand and puts them in the second operand'. It is | |
8165 | present only in early 386 processors, and conflicts with the opcodes | |
8166 | for `CMPXCHG486'. NASM supports it only for completeness. Its | |
8167 | counterpart is `IBTS' (see section A.75). | |
8168 | ||
8169 | A.168 `XCHG': Exchange | |
8170 | ||
8171 | XCHG reg8,r/m8 ; 86 /r [8086] | |
8172 | XCHG reg16,r/m16 ; o16 87 /r [8086] | |
8173 | XCHG reg32,r/m32 ; o32 87 /r [386] | |
8174 | ||
8175 | XCHG r/m8,reg8 ; 86 /r [8086] | |
8176 | XCHG r/m16,reg16 ; o16 87 /r [8086] | |
8177 | XCHG r/m32,reg32 ; o32 87 /r [386] | |
8178 | ||
8179 | XCHG AX,reg16 ; o16 90+r [8086] | |
8180 | XCHG EAX,reg32 ; o32 90+r [386] | |
8181 | XCHG reg16,AX ; o16 90+r [8086] | |
8182 | XCHG reg32,EAX ; o32 90+r [386] | |
8183 | ||
8184 | `XCHG' exchanges the values in its two operands. It can be used with | |
8185 | a `LOCK' prefix for purposes of multi-processor synchronisation. | |
8186 | ||
8187 | `XCHG AX,AX' or `XCHG EAX,EAX' (depending on the `BITS' setting) | |
8188 | generates the opcode `90h', and so is a synonym for `NOP' (section | |
8189 | A.109). | |
8190 | ||
8191 | A.169 `XLATB': Translate Byte in Lookup Table | |
8192 | ||
8193 | XLATB ; D7 [8086] | |
8194 | ||
8195 | `XLATB' adds the value in `AL', treated as an unsigned byte, to `BX' | |
8196 | or `EBX', and loads the byte from the resulting address (in the | |
8197 | segment specified by `DS') back into `AL'. | |
8198 | ||
8199 | The base register used is `BX' if the address size is 16 bits, and | |
8200 | `EBX' if it is 32 bits. If you need to use an address size not equal | |
8201 | to the current `BITS' setting, you can use an explicit `a16' or | |
8202 | `a32' prefix. | |
8203 | ||
8204 | The segment register used to load from `[BX+AL]' or `[EBX+AL]' can | |
8205 | be overridden by using a segment register name as a prefix (for | |
8206 | example, `es xlatb'). | |
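
For example, `XLATB' can convert a value from 0 to 15 into its
hexadecimal digit. The fragment assumes 16-bit code with `DS'
addressing the table; `hex_digits' is a name made up for the
sketch.

```nasm
        bits 16

        mov     bx,hex_digits   ; BX = base of the lookup table
        mov     al,0x0B         ; index into the table
        xlatb                   ; AL = byte at [DS:BX+0x0B] = 'B'
        ret

hex_digits:
        db      '0123456789ABCDEF'
```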
8207 | ||
8208 | A.170 `XOR': Bitwise Exclusive OR | |
8209 | ||
8210 | XOR r/m8,reg8 ; 30 /r [8086] | |
8211 | XOR r/m16,reg16 ; o16 31 /r [8086] | |
8212 | XOR r/m32,reg32 ; o32 31 /r [386] | |
8213 | ||
8214 | XOR reg8,r/m8 ; 32 /r [8086] | |
8215 | XOR reg16,r/m16 ; o16 33 /r [8086] | |
8216 | XOR reg32,r/m32 ; o32 33 /r [386] | |
8217 | ||
8218 | XOR r/m8,imm8 ; 80 /6 ib [8086] | |
8219 | XOR r/m16,imm16 ; o16 81 /6 iw [8086] | |
8220 | XOR r/m32,imm32 ; o32 81 /6 id [386] | |
8221 | ||
8222 | XOR r/m16,imm8 ; o16 83 /6 ib [8086] | |
8223 | XOR r/m32,imm8 ; o32 83 /6 ib [386] | |
8224 | ||
8225 | XOR AL,imm8 ; 34 ib [8086] | |
8226 | XOR AX,imm16 ; o16 35 iw [8086] | |
8227 | XOR EAX,imm32 ; o32 35 id [386] | |
8228 | ||
8229 | `XOR' performs a bitwise XOR operation between its two operands | |
8230 | (i.e. each bit of the result is 1 if and only if exactly one of the | |
8231 | corresponding bits of the two inputs was 1), and stores the result | |
8232 | in the destination (first) operand. | |
8233 | ||
8234 | In the forms with an 8-bit immediate second operand and a longer | |
8235 | first operand, the second operand is considered to be signed, and is | |
8236 | sign-extended to the length of the first operand. In these cases, | |
8237 | the `BYTE' qualifier is necessary to force NASM to generate this | |
8238 | form of the instruction. | |
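
A few representative uses, as a sketch assuming 32-bit code; the
registers and constants are arbitrary.

```nasm
        bits 32

        xor     eax,eax         ; any value XORed with itself is 0
        xor     bl,0x20         ; toggle bit 5 of BL
        xor     ecx,byte -1     ; imm8 sign-extended to 0xFFFFFFFF:
                                ; inverts every bit of ECX
```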
8239 | ||
8240 | The MMX instruction `PXOR' (see section A.137) performs the same | |
8241 | operation on the 64-bit MMX registers. |