Commit | Line | Data |
---|---|---|
8414a40c VZ |
1 | <pre> |
2 | DRAFT TIFF Technical Note #2 17-Mar-95 | |
3 | ============================ | |
4 | ||
5 | This Technical Note describes serious problems that have been found in | |
6 | TIFF 6.0's design for embedding JPEG-compressed data in TIFF (Section 22 | |
7 | of the TIFF 6.0 spec of 3 June 1992). A replacement TIFF/JPEG | |
8 | specification is given. Some corrections to Section 21 are also given. | |
9 | ||
10 | To permit TIFF implementations to continue to read existing files, the 6.0 | |
11 | JPEG fields and tag values will remain reserved indefinitely. However, | |
12 | TIFF writers are strongly discouraged from using the 6.0 JPEG design. It | |
13 | is expected that the next full release of the TIFF specification will not | |
14 | describe the old design at all, except to note that certain tag numbers | |
15 | are reserved. The existing Section 22 will be replaced by the | |
16 | specification text given in the second part of this Tech Note. | |
17 | ||
18 | ||
19 | Problems in TIFF 6.0 JPEG | |
20 | ========================= | |
21 | ||
22 | Abandoning a published spec is not a step to be taken lightly. This | |
23 | section summarizes the reasons that have forced this decision. | |
24 | TIFF 6.0's JPEG design suffers from design errors and limitations, | |
25 | ambiguities, and unnecessary complexity. | |
26 | ||
27 | ||
28 | Design errors and limitations | |
29 | ----------------------------- | |
30 | ||
31 | The fundamental design error in the existing Section 22 is that JPEG's | |
32 | various tables and parameters are broken out as separate fields which the | |
33 | TIFF control logic must manage. This is bad software engineering: that | |
34 | information should be treated as private to the JPEG codec | |
35 | (compressor/decompressor). Worse, the fields themselves are specified | |
36 | without sufficient thought for future extension and without regard to | |
37 | well-established TIFF conventions. Here are some of the significant | |
38 | problems: | |
39 | ||
40 | * The JPEGxxTable fields do not store the table data directly in the | |
41 | IFD/field structure; rather, the fields hold pointers to information | |
42 | elsewhere in the file. This requires special-purpose code to be added to | |
43 | *every* TIFF-manipulating application, whether it needs to decode JPEG | |
44 | image data or not. Even a trivial TIFF editor, for example a program to | |
45 | add an ImageDescription field to a TIFF file, must be explicitly aware of | |
46 | the internal structure of the JPEG-related tables, or else it will probably | |
47 | break the file. Every other auxiliary field in the TIFF spec contains | |
48 | data, not pointers, and can be copied or relocated by standard code that | |
49 | doesn't know anything about the particular field. This is a crucial | |
50 | property of the TIFF format that must not be given up. | |
51 | ||
52 | * To manipulate these fields, the TIFF control logic is required to know a | |
53 | great deal about JPEG details, for example such arcana as how to compute | |
54 | the length of a Huffman code table --- the length is not supplied in the | |
55 | field structure and can only be found by inspecting the table contents. | |
56 | This is again a violation of good software practice. Moreover, it will | |
57 | prevent easy adoption of future JPEG extensions that might change these | |
58 | low-level details. | |
59 | ||
60 | * The design neglects the fact that baseline JPEG codecs support only two | |
61 | sets of Huffman tables: it specifies a separate table for each color | |
62 | component. This implies that encoders must waste space (by storing | |
63 | duplicate Huffman tables) or else violate the well-founded TIFF convention | |
64 | that prohibits duplicate pointers. Furthermore, baseline decoders must | |
65 | test to find out which tables are identical, a waste of time and code | |
66 | space. | |
67 | ||
68 | * The JPEGInterchangeFormat field also violates TIFF's proscription against | |
69 | duplicate pointers: the normal strip/tile pointers are expected to point | |
70 | into the larger data area pointed to by JPEGInterchangeFormat. All TIFF | |
71 | editing applications must be specifically aware of this relationship, since | |
72 | they must maintain it or else delete the JPEGInterchangeFormat field. The | |
73 | JPEGxxTables fields are also likely to point into the JPEGInterchangeFormat | |
74 | area, creating additional pointer relationships that must be maintained. | |
75 | ||
76 | * The JPEGQTables field is fixed at a byte per table entry; there is no | |
77 | way to support 16-bit quantization values. This is a serious impediment | |
78 | to extending TIFF to use 12-bit JPEG. | |
79 | ||
80 | * The 6.0 design cannot support using different quantization tables in | |
81 | different strips/tiles of an image (so as to encode some areas at higher | |
82 | quality than others). Furthermore, since quantization tables are tied | |
83 | one-for-one to color components, the design cannot support table switching | |
84 | options that are likely to be added in future JPEG revisions. | |
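To make the Huffman-table bullet above concrete, here is a Python sketch (an illustration only, not part of any TIFF specification) of the computation that the 6.0 design pushes onto the TIFF control logic. It assumes the table uses the JPEG DHT layout of 16 code-length counts followed by one value byte per code; the function name is hypothetical.

```python
def huffman_table_length(data: bytes, offset: int = 0) -> int:
    """Return the byte length of a Huffman table starting at `offset`.

    Assumed layout: 16 count bytes (number of codes of each length 1..16)
    followed by one value byte per code.  The length is not stored
    anywhere, so it has to be derived from the counts.
    """
    counts = data[offset:offset + 16]
    if len(counts) < 16:
        raise ValueError("truncated Huffman table")
    return 16 + sum(counts)


# Example: two 2-bit codes and one 3-bit code, hence three value bytes.
example = bytes([0, 2, 1] + [0] * 13) + bytes([0x00, 0x01, 0x02])
print(huffman_table_length(example))  # 19
```

A generic TIFF editor that merely relocates data would need this JPEG-specific logic just to know how many bytes to copy.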
85 | ||
86 | ||
87 | Ambiguities | |
88 | ----------- | |
89 | ||
90 | Several incompatible interpretations are possible for 6.0's treatment of | |
91 | JPEG restart markers: | |
92 | ||
93 | * It is unclear whether restart markers must be omitted at TIFF segment | |
94 | (strip/tile) boundaries, or whether they are optional. | |
95 | ||
96 | * It is unclear whether the segment size is required to be chosen as | |
97 | a multiple of the specified restart interval (if any); perhaps the | |
98 | JPEG codec is supposed to be reset at each segment boundary as if | |
99 | there were a restart marker there, even if the boundary does not fall | |
100 | at a multiple of the nominal restart interval. | |
101 | ||
102 | * The spec fails to address the question of restart marker numbering: | |
103 | do the numbers begin again within each segment, or not? | |
104 | ||
105 | That last point is particularly nasty. If we make numbering begin again | |
106 | within each segment, we give up the ability to impose a TIFF strip/tile | |
107 | structure on an existing JPEG datastream with restarts (which was clearly a | |
108 | goal of Section 22's authors). But the other choice interferes with random | |
109 | access to the image segments: a reader must compute the first restart | |
110 | number to be expected within a segment, and must have a way to reset its | |
111 | JPEG decoder to expect a nonzero restart number first. This may not even | |
112 | be possible with some JPEG chips. | |
113 | ||
114 | The tile height restriction found on page 104 contradicts Section 15's | |
115 | general description of tiles. For an image that is not vertically | |
116 | downsampled, page 104 specifies a tile height of one MCU or 8 pixels; but | |
117 | Section 15 requires tiles to be a multiple of 16 pixels high. | |
118 | ||
119 | This Tech Note does not attempt to resolve these ambiguities, so | |
120 | implementations that follow the 6.0 design should be aware that | |
121 | inter-application compatibility problems are likely to arise. | |
122 | ||
123 | ||
124 | Unnecessary complexity | |
125 | ---------------------- | |
126 | ||
127 | The 6.0 design creates problems for implementations that need to keep the | |
128 | JPEG codec separate from the TIFF control logic --- for example, consider | |
129 | using a JPEG chip that was not designed specifically for TIFF. JPEG codecs | |
130 | generally want to produce or consume a standard ISO JPEG datastream, not | |
131 | just raw compressed data. (If they were to handle raw data, a separate | |
132 | out-of-band mechanism would be needed to load tables into the codec.) | |
133 | With such a codec, the TIFF control logic must parse JPEG markers emitted | |
134 | by the codec to create the TIFF table fields (when writing) or synthesize | |
135 | JPEG markers from the TIFF fields to feed the codec (when reading). This | |
136 | means that the control logic must know a great deal more about JPEG details | |
137 | than we would like. The parsing and reconstruction of the markers also | |
138 | represents a fair amount of unnecessary work. | |
139 | ||
140 | Quite a few implementors have proposed writing "TIFF/JPEG" files in which | |
141 | a standard JPEG datastream is simply dumped into the file and pointed to | |
142 | by JPEGInterchangeFormat. To avoid parsing the JPEG datastream, they | |
143 | suggest not writing the JPEG auxiliary fields (JPEGxxTables etc) nor even | |
144 | the basic TIFF strip/tile data pointers. This approach is incompatible | |
145 | with implementations that handle the full TIFF 6.0 JPEG design, since they | |
146 | will expect to find strip/tile pointers and auxiliary fields. Indeed this | |
147 | is arguably not TIFF at all, since *all* TIFF-reading applications expect | |
148 | to find strip or tile pointers. A subset implementation that is not | |
149 | upward-compatible with the full spec is clearly unacceptable. However, | |
150 | the frequency with which this idea has come up makes it clear that | |
151 | implementors find the existing Section 22 too complex. | |
152 | ||
153 | ||
154 | Overview of the solution | |
155 | ======================== | |
156 | ||
157 | To solve these problems, we adopt a new design for embedding | |
158 | JPEG-compressed data in TIFF files. The new design uses only complete, | |
159 | uninterpreted ISO JPEG datastreams, so it should be much more forgiving of | |
160 | extensions to the ISO standard. It should also be far easier to implement | |
161 | using unmodified JPEG codecs. | |
162 | ||
163 | To reduce overhead in multi-segment TIFF files, we allow JPEG overhead | |
164 | tables to be stored just once in a JPEGTables auxiliary field. This | |
165 | feature does not violate the integrity of the JPEG datastreams, because it | |
166 | uses the notions of "tables-only datastreams" and "abbreviated image | |
167 | datastreams" as defined by the ISO standard. | |
168 | ||
169 | To prevent confusion with the old design, the new design is given a new | |
170 | Compression tag value, Compression=7. Readers that need to handle | |
171 | existing 6.0 JPEG files may read both old and new files, using whatever | |
172 | interpretation of the 6.0 spec they used before. Compression tag value 6 | |
173 | and the field tag numbers defined by 6.0 section 22 will remain reserved | |
174 | indefinitely, even though detailed descriptions of them will be dropped | |
175 | from future editions of the TIFF specification. | |
176 | ||
177 | ||
178 | Replacement TIFF/JPEG specification | |
179 | =================================== | |
180 | ||
181 | [This section of the Tech Note is expected to replace Section 22 in the | |
182 | next release of the TIFF specification.] | |
183 | ||
184 | This section describes TIFF compression scheme 7, a high-performance | |
185 | compression method for continuous-tone images. | |
186 | ||
187 | Introduction | |
188 | ------------ | |
189 | ||
190 | This TIFF compression method uses the international standard for image | |
191 | compression ISO/IEC 10918-1, usually known as "JPEG" (after the original | |
192 | name of the standards committee, Joint Photographic Experts Group). JPEG | |
193 | is a joint ISO/CCITT standard for compression of continuous-tone images. | |
194 | ||
195 | The JPEG committee decided that because of the broad scope of the standard, | |
196 | no one algorithmic procedure was able to satisfy the requirements of all | |
197 | applications. Instead, the JPEG standard became a "toolkit" of multiple | |
198 | algorithms and optional capabilities. Individual applications may select | |
199 | a subset of the JPEG standard that meets their requirements. | |
200 | ||
201 | The most important distinction among the JPEG processes is between lossy | |
202 | and lossless compression. Lossy compression methods provide high | |
203 | compression but allow only approximate reconstruction of the original | |
204 | image. JPEG's lossy processes allow the encoder to trade off compressed | |
205 | file size against reconstruction fidelity over a wide range. Typically, | |
206 | 10:1 or more compression of full-color data can be obtained while keeping | |
207 | the reconstructed image visually indistinguishable from the original. Much | |
208 | higher compression ratios are possible if a low-quality reconstructed image | |
209 | is acceptable. Lossless compression provides exact reconstruction of the | |
210 | source data, but the achievable compression ratio is much lower than for | |
211 | the lossy processes; JPEG's rather simple lossless process typically | |
212 | achieves around 2:1 compression of full-color data. | |
213 | ||
214 | The most widely implemented JPEG subset is the "baseline" JPEG process. | |
215 | This provides lossy compression of 8-bit-per-channel data. Optional | |
216 | extensions include 12-bit-per-channel data, arithmetic entropy coding for | |
217 | better compression, and progressive/hierarchical representations. The | |
218 | lossless process is an independent algorithm that has little in | |
219 | common with the lossy processes. | |
220 | ||
221 | It should be noted that the optional arithmetic-coding extension is subject | |
222 | to several US and Japanese patents. To avoid patent problems, use of | |
223 | arithmetic coding processes in TIFF files intended for inter-application | |
224 | interchange is discouraged. | |
225 | ||
226 | All of the JPEG processes are useful only for "continuous tone" data, | |
227 | in which the difference between adjacent pixel values is usually small. | |
228 | Low-bit-depth source data is not appropriate for JPEG compression, nor | |
229 | are palette-color images good candidates. The JPEG processes work well | |
230 | on grayscale and full-color data. | |
231 | ||
232 | Describing the JPEG compression algorithms in sufficient detail to permit | |
233 | implementation would require more space than we have here. Instead, we | |
234 | refer the reader to the References section. | |
235 | ||
236 | ||
237 | What data is being compressed? | |
238 | ------------------------------ | |
239 | ||
240 | In lossy JPEG compression, it is customary to convert color source data | |
241 | to YCbCr and then downsample it before JPEG compression. This gives | |
242 | 2:1 data compression with hardly any visible image degradation, and it | |
243 | permits additional space savings within the JPEG compression step proper. | |
244 | However, these steps are not considered part of the ISO JPEG standard. | |
245 | The ISO standard is "color blind": it accepts data in any color space. | |
246 | ||
247 | For TIFF purposes, the JPEG compression tag is considered to represent the | |
248 | ISO JPEG compression standard only. The ISO standard is applied to the | |
249 | same data that would be stored in the TIFF file if no compression were | |
250 | used. Therefore, if color conversion or downsampling are used, they must | |
251 | be reflected in the regular TIFF fields; these steps are not considered to | |
252 | be implicit in the JPEG compression tag value. PhotometricInterpretation | |
253 | and related fields shall describe the color space actually stored in the | |
254 | file. With the TIFF 6.0 field definitions, downsampling is permissible | |
255 | only for YCbCr data, and it must correspond to the YCbCrSubSampling field. | |
256 | (Note that the default value for this field is not 1,1; so the default for | |
257 | YCbCr is to apply downsampling!) It is likely that future versions of TIFF | |
258 | will provide additional PhotometricInterpretation values and a more general | |
259 | way of defining subsampling, so as to allow more flexibility in | |
260 | JPEG-compressed files. But that issue is not addressed in this Tech Note. | |
261 | ||
262 | Implementors should note that many popular JPEG codecs | |
263 | (compressors/decompressors) provide automatic color conversion and | |
264 | downsampling, so that the application may supply full-size RGB data which | |
265 | is nonetheless converted to downsampled YCbCr. This is an implementation | |
266 | convenience which does not excuse the TIFF control layer from its | |
267 | responsibility to know what is really going on. The | |
268 | PhotometricInterpretation and subsampling fields written to the file must | |
269 | describe what is actually in the file. | |
270 | ||
271 | A JPEG-compressed TIFF file will typically have PhotometricInterpretation = | |
272 | YCbCr and YCbCrSubSampling = [2,1] or [2,2], unless the source data was | |
273 | grayscale or CMYK. | |
274 | ||
275 | ||
276 | Basic representation of JPEG-compressed images | |
277 | ---------------------------------------------- | |
278 | ||
279 | JPEG compression works in either strip-based or tile-based TIFF files. | |
280 | Rather than repeating "strip or tile" constantly, we will use the term | |
281 | "segment" to mean either a strip or a tile. | |
282 | ||
283 | When the Compression field has the value 7, each image segment contains | |
284 | a complete JPEG datastream which is valid according to the ISO JPEG | |
285 | standard (ISO/IEC 10918-1). Any sequential JPEG process can be used, | |
286 | including lossless JPEG, but progressive and hierarchical processes are not | |
287 | supported. Since JPEG is useful only for continuous-tone images, the | |
288 | PhotometricInterpretation of the image shall not be 3 (palette color) nor | |
289 | 4 (transparency mask). The bit depth of the data is also restricted as | |
290 | specified below. | |
291 | ||
292 | Each image segment in a JPEG-compressed TIFF file shall contain a valid | |
293 | JPEG datastream according to the ISO JPEG standard's rules for | |
294 | interchange-format or abbreviated-image-format data. The datastream shall | |
295 | contain a single JPEG frame storing that segment of the image. The | |
296 | required JPEG markers within a segment are: | |
297 |     SOI   (must appear at very beginning of segment) | |
298 |     SOFn | |
299 |     SOS   (one for each scan, if there is more than one scan) | |
300 |     EOI   (must appear at very end of segment) | |
301 | The actual compressed data follows SOS; it may contain RSTn markers if DRI | |
302 | is used. | |
303 | ||
304 | Additional JPEG "tables and miscellaneous" markers may appear between SOI | |
305 | and SOFn, between SOFn and SOS, and before each subsequent SOS if there is | |
306 | more than one scan. These markers include: | |
307 |     DQT | |
308 |     DHT | |
309 |     DAC   (not to appear unless arithmetic coding is used) | |
310 |     DRI | |
311 |     APPn  (shall be ignored by TIFF readers) | |
312 |     COM   (shall be ignored by TIFF readers) | |
313 | DNL markers shall not be used in TIFF files. Readers should abort if any | |
314 | other marker type is found, especially the JPEG reserved markers; | |
315 | occurrence of such a marker is likely to indicate a JPEG extension. | |
316 | ||
317 | The tables/miscellaneous markers may appear in any order. Readers are | |
318 | cautioned that although the SOFn marker refers to DQT tables, JPEG does not | |
319 | require those tables to precede the SOFn, only the SOS. Missing-table | |
320 | checks should be made when SOS is reached. | |
321 | ||
322 | If no JPEGTables field is used, then each image segment shall be a complete | |
323 | JPEG interchange datastream. Each segment must define all the tables it | |
324 | references. To allow readers to decode segments in any order, no segment | |
325 | may rely on tables being carried over from a previous segment. | |
326 | ||
327 | When a JPEGTables field is used, image segments may omit tables that have | |
328 | been specified in the JPEGTables field. Further details appear below. | |
329 | ||
330 | The SOFn marker shall be of type SOF0 for strict baseline JPEG data, of | |
331 | type SOF1 for non-baseline lossy JPEG data, or of type SOF3 for lossless | |
332 | JPEG data. (SOF9 or SOF11 would be used for arithmetic coding.) All | |
333 | segments of a JPEG-compressed TIFF image shall use the same JPEG | |
334 | compression process, in particular the same SOFn type. | |
335 | ||
336 | The data precision field of the SOFn marker shall agree with the TIFF | |
337 | BitsPerSample field. (Note that when PlanarConfiguration=1, this implies | |
338 | that all components must have the same BitsPerSample value; when | |
339 | PlanarConfiguration=2, different components could have different bit | |
340 | depths.) For SOF0 only precision 8 is permitted; for SOF1, precision 8 or | |
341 | 12 is permitted; for SOF3, precisions 2 to 16 are permitted. | |
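The precision rule can be written as a small lookup. The Python sketch below is an illustration under stated assumptions: the names are hypothetical, and bits_per_sample is taken to be the single BitsPerSample value that applies to the components stored in the segment.

```python
# Data precisions permitted for each SOFn type, as listed above.
ALLOWED_PRECISION = {
    "SOF0": {8},
    "SOF1": {8, 12},
    "SOF3": set(range(2, 17)),  # 2 to 16 inclusive
}


def precision_ok(sof_type: str, sof_precision: int, bits_per_sample: int) -> bool:
    """True if the SOFn precision matches BitsPerSample and suits the process."""
    return (sof_precision == bits_per_sample
            and sof_precision in ALLOWED_PRECISION[sof_type])


print(precision_ok("SOF0", 8, 8))    # True
print(precision_ok("SOF0", 12, 12))  # False: baseline is 8-bit only
print(precision_ok("SOF1", 12, 12))  # True
```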
342 | ||
343 | The image dimensions given in the SOFn marker shall agree with the logical | |
344 | dimensions of that particular strip or tile. For strip images, the SOFn | |
345 | image width shall equal ImageWidth and the height shall equal RowsPerStrip, | |
346 | except in the last strip; its SOFn height shall equal the number of rows | |
347 | remaining in the ImageLength. (In other words, no padding data is counted | |
348 | in the SOFn dimensions.) For tile images, each SOFn shall have width | |
349 | TileWidth and height TileLength; adding and removing any padding needed in | |
350 | the edge tiles is the concern of some higher level of the TIFF software. | |
351 | (The dimensional rules are slightly different when PlanarConfiguration=2, | |
352 | as described below.) | |
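For strip images with PlanarConfiguration=1, the expected SOFn dimensions can be computed as in this Python sketch. The function name is hypothetical and the code is meant only to illustrate the rule above, including the shorter last strip.

```python
def strip_sof_dimensions(image_width: int, image_length: int,
                         rows_per_strip: int, strip_index: int) -> tuple:
    """Return the (width, height) that a strip's SOFn marker should carry."""
    # Single-strip files may declare a RowsPerStrip larger than the image.
    rows_per_strip = min(rows_per_strip, image_length)
    strip_count = -(-image_length // rows_per_strip)  # ceiling division
    if not 0 <= strip_index < strip_count:
        raise IndexError("strip index out of range")
    first_row = strip_index * rows_per_strip
    # The last strip holds only the rows that remain; padding is not counted.
    height = min(rows_per_strip, image_length - first_row)
    return image_width, height


# A 100 x 70 image with RowsPerStrip = 16 has five strips.
print(strip_sof_dimensions(100, 70, 16, 0))  # (100, 16)
print(strip_sof_dimensions(100, 70, 16, 4))  # (100, 6)
```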
353 | ||
354 | The ISO JPEG standard only permits images up to 65535 pixels in width or | |
355 | height, due to 2-byte fields in the SOFn markers. In TIFF, this limits | |
356 | the size of an individual JPEG-compressed strip or tile, but the total | |
357 | image size can be greater. | |
358 | ||
359 | The number of components in the JPEG datastream shall equal SamplesPerPixel | |
360 | for PlanarConfiguration=1, and shall be 1 for PlanarConfiguration=2. The | |
361 | components shall be stored in the same order as they are described at the | |
362 | TIFF field level. (This applies both to their order in the SOFn marker, | |
363 | and to the order in which they are scanned if multiple JPEG scans are | |
364 | used.) The component ID bytes are arbitrary so long as each component | |
365 | within an image segment is given a distinct ID. To avoid any possible | |
366 | confusion, we require that all segments of a TIFF image use the same ID | |
367 | code for a given component. | |
368 | ||
369 | In PlanarConfiguration 1, the sampling factors given in SOFn markers shall | |
370 | agree with the sampling factors defined by the related TIFF fields (or with | |
371 | the default values that are specified in the absence of those fields). | |
372 | ||
373 | When DCT-based JPEG is used in a strip TIFF file, RowsPerStrip is required | |
374 | to be a multiple of 8 times the largest vertical sampling factor, i.e., a | |
375 | multiple of the height of an interleaved MCU. (For simplicity of | |
376 | specification, we require this even if the data is not actually | |
377 | interleaved.) For example, if YCbCrSubSampling = [2,2] then RowsPerStrip | |
378 | must be a multiple of 16. An exception to this rule is made for | |
379 | single-strip images (RowsPerStrip >= ImageLength): the exact value of | |
380 | RowsPerStrip is unimportant in that case. This rule ensures that no data | |
381 | padding is needed at the bottom of a strip, except perhaps the last strip. | |
382 | Any padding required at the right edge of the image, or at the bottom of | |
383 | the last strip, is expected to occur internally to the JPEG codec. | |
384 | ||
385 | When DCT-based JPEG is used in a tiled TIFF file, TileLength is required | |
386 | to be a multiple of 8 times the largest vertical sampling factor, i.e., | |
387 | a multiple of the height of an interleaved MCU; and TileWidth is required | |
388 | to be a multiple of 8 times the largest horizontal sampling factor, i.e., | |
389 | a multiple of the width of an interleaved MCU. (For simplicity of | |
390 | specification, we require this even if the data is not actually | |
391 | interleaved.) All edge padding required will therefore occur in the course | |
392 | of normal TIFF tile padding; it is not special to JPEG. | |
393 | ||
394 | Lossless JPEG does not impose these constraints on strip and tile sizes, | |
395 | since it is not DCT-based. | |
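The size constraints for DCT-based JPEG reduce to simple divisibility tests. In this Python sketch (illustrative only, with hypothetical names) the sampling factors are given as one (horizontal, vertical) pair per component.

```python
def mcu_size(sampling_factors: list) -> tuple:
    """Return (width, height) in pixels of an interleaved MCU."""
    max_h = max(h for h, _ in sampling_factors)
    max_v = max(v for _, v in sampling_factors)
    return 8 * max_h, 8 * max_v


def strip_rows_ok(rows_per_strip: int, image_length: int,
                  sampling_factors: list) -> bool:
    """Check RowsPerStrip; single-strip images are exempt from the rule."""
    if rows_per_strip >= image_length:
        return True
    return rows_per_strip % mcu_size(sampling_factors)[1] == 0


def tile_size_ok(tile_width: int, tile_length: int,
                 sampling_factors: list) -> bool:
    """Check that TileWidth and TileLength are whole multiples of the MCU."""
    mcu_w, mcu_h = mcu_size(sampling_factors)
    return tile_width % mcu_w == 0 and tile_length % mcu_h == 0


ycbcr_2x2 = [(2, 2), (1, 1), (1, 1)]      # YCbCrSubSampling = [2,2]
print(strip_rows_ok(16, 70, ycbcr_2x2))   # True
print(strip_rows_ok(8, 70, ycbcr_2x2))    # False: must be a multiple of 16
print(tile_size_ok(32, 16, [(2, 1), (1, 1), (1, 1)]))  # True
```

These checks apply only to the DCT-based processes; a reader handling lossless JPEG (SOF3) would skip them.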
396 | ||
397 | Note that within JPEG datastreams, multibyte values appear in the MSB-first | |
398 | order specified by the JPEG standard, regardless of the byte ordering of | |
399 | the surrounding TIFF file. | |
400 | ||
401 | ||
402 | JPEGTables field | |
403 | ---------------- | |
404 | ||
405 | The only auxiliary TIFF field added for Compression=7 is the optional | |
406 | JPEGTables field. The purpose of JPEGTables is to predefine JPEG | |
407 | quantization and/or Huffman tables for subsequent use by JPEG image | |
408 | segments. When this is done, these rather bulky tables need not be | |
409 | duplicated in each segment, thus saving space and processing time. | |
410 | JPEGTables may be used even in a single-segment file, although it saves | |
411 | no space in that case. | |
412 | ||
413 | JPEGTables: | |
414 |     Tag  = 347 (15B.H) | |
415 |     Type = UNDEFINED | |
416 |     N    = number of bytes in tables datastream, typically a few hundred | |
417 | JPEGTables provides default JPEG quantization and/or Huffman tables which | |
418 | are used whenever a segment datastream does not contain its own tables, as | |
419 | specified below. | |
420 | ||
421 | Notice that the JPEGTables field is required to have type code UNDEFINED, | |
422 | not type code BYTE. This is to cue readers that expanding individual bytes | |
423 | to short or long integers is not appropriate. A TIFF reader will generally | |
424 | need to store the field value as an uninterpreted byte sequence until it is | |
425 | fed to the JPEG decoder. | |
426 | ||
427 | Multibyte quantities within the tables follow the ISO JPEG convention of | |
428 | MSB-first storage, regardless of the byte ordering of the surrounding TIFF | |
429 | file. | |
430 | ||
431 | When the JPEGTables field is present, it shall contain a valid JPEG | |
432 | "abbreviated table specification" datastream. This datastream shall begin | |
433 | with SOI and end with EOI. It may contain zero or more JPEG "tables and | |
434 | miscellaneous" markers, namely: | |
435 |     DQT | |
436 |     DHT | |
437 |     DAC   (not to appear unless arithmetic coding is used) | |
438 |     DRI | |
439 |     APPn  (shall be ignored by TIFF readers) | |
440 |     COM   (shall be ignored by TIFF readers) | |
441 | Since JPEG defines the SOI marker to reset the DAC and DRI state, these two | |
442 | markers' values cannot be carried over into any image datastream, and thus | |
443 | they are effectively no-ops in the JPEGTables field. To avoid confusion, | |
444 | it is recommended that writers not place DAC or DRI markers in JPEGTables. | |
445 | However, readers must properly skip over them if they appear. | |
446 | ||
447 | When JPEGTables is present, readers shall load the table specifications | |
448 | contained in JPEGTables before processing image segment datastreams. | |
449 | Image segments may simply refer to these preloaded tables without defining | |
450 | them. An image segment can still define and use its own tables, subject to | |
451 | the restrictions below. | |
452 | ||
453 | An image segment may not redefine any table defined in JPEGTables. (This | |
454 | restriction is imposed to allow readers to process image segments in random | |
455 | order without having to reload JPEGTables between segments.) Therefore, use | |
456 | of JPEGTables divides the available table slots into two groups: "global" | |
457 | slots are defined in JPEGTables and may be used but not redefined by | |
458 | segments; "local" slots are available for local definition and use in each | |
459 | segment. To permit random access, a segment may not reference any local | |
460 | tables that it does not itself define. | |
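A reader or validator could enforce the global/local rule by collecting the table slots that each datastream defines. The Python sketch below is an illustration with hypothetical names. It assumes the DQT and DHT payload layouts of ISO/IEC 10918-1, and for brevity it looks only at markers that precede the first SOS.

```python
import struct


def defined_slots(stream: bytes) -> set:
    """Return the table slots defined between SOI and the first SOS or EOI."""
    if stream[:2] != b"\xff\xd8":
        raise ValueError("datastream does not start with SOI")
    slots, pos = set(), 2
    while pos + 2 <= len(stream):
        if stream[pos] != 0xFF:
            raise ValueError("no marker at offset %d" % pos)
        code = stream[pos + 1]
        if code in (0xD9, 0xDA):  # EOI or SOS ends the header
            break
        (length,) = struct.unpack(">H", stream[pos + 2:pos + 4])
        body = stream[pos + 4:pos + 2 + length]
        i = 0
        if code == 0xDB:  # DQT: one or more quantization tables
            while i < len(body):
                precision, table_id = body[i] >> 4, body[i] & 0x0F
                slots.add(("DQT", table_id))
                i += 1 + 64 * (2 if precision else 1)
        elif code == 0xC4:  # DHT: one or more Huffman tables
            while i < len(body):
                table_class, table_id = body[i] >> 4, body[i] & 0x0F
                slots.add(("DHT", table_class, table_id))
                i += 17 + sum(body[i + 1:i + 17])
        pos += 2 + length
    return slots


def _marker(code: int, payload: bytes) -> bytes:
    """Build a marker segment with a correct length field (test helper)."""
    return bytes([0xFF, code]) + struct.pack(">H", len(payload) + 2) + payload


# JPEGTables defines quantization slot 0 and DC Huffman slot 0 ("global").
tables = (b"\xff\xd8" + _marker(0xDB, bytes([0x00]) + bytes(64))
          + _marker(0xC4, bytes([0x00, 0, 1] + [0] * 14 + [0x00]))
          + b"\xff\xd9")
# The segment defines quantization slot 1 for its own use ("local").
segment = (b"\xff\xd8" + _marker(0xDB, bytes([0x01]) + bytes(64))
           + _marker(0xC0, bytes(9)) + _marker(0xDA, bytes(6)) + b"\xff\xd9")

redefined = defined_slots(tables) & defined_slots(segment)
print(sorted(redefined))  # []
```

An empty intersection means the segment redefines no global slot. A complete check would also confirm that every slot the segment references is either global or defined locally.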
461 | ||
462 | ||
463 | Special considerations for PlanarConfiguration 2 | |
464 | ------------------------------------------------ | |
465 | ||
466 | In PlanarConfiguration 2, each image segment contains data for only one | |
467 | color component. To avoid confusing the JPEG codec, we wish the segments | |
468 | to look like valid single-channel (i.e., grayscale) JPEG datastreams. This | |
469 | means that different rules must be used for the SOFn parameters. | |
470 | ||
471 | In PlanarConfiguration 2, the dimensions given in the SOFn of a subsampled | |
472 | component shall be scaled down by the sampling factors compared to the SOFn | |
473 | dimensions that would be used in PlanarConfiguration 1. This is necessary | |
474 | to match the actual number of samples stored in that segment, so that the | |
475 | JPEG codec doesn't complain about too much or too little data. In strip | |
476 | TIFF files the computed dimensions may need to be rounded up to the next | |
477 | integer; in tiled files, the restrictions on tile size make this case | |
478 | impossible. | |
479 | ||
480 | Furthermore, all SOFn sampling factors shall be given as 1. (This is | |
481 | merely to avoid confusion, since the sampling factors in a single-channel | |
482 | JPEG datastream have no real effect.) | |
483 | ||
484 | Any downsampling will need to happen externally to the JPEG codec, since | |
485 | JPEG sampling factors are defined with reference to the full-precision | |
486 | component. In PlanarConfiguration 2, the JPEG codec will be working on | |
487 | only one component at a time and thus will have no reference component to | |
488 | downsample against. | |
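The dimension rule for PlanarConfiguration 2 can be illustrated as follows. This Python sketch uses a hypothetical function name and assumes the caller passes the subsampling that applies to the component in question: (1, 1) for a full-resolution component, or the YCbCrSubSampling values for a subsampled chroma component.

```python
def planar2_sof_dimensions(width: int, height: int,
                           subsampling: tuple = (1, 1)) -> tuple:
    """Scale PlanarConfiguration 1 SOFn dimensions for one component.

    `width` and `height` are the SOFn dimensions the segment would have
    in PlanarConfiguration 1; results are rounded up to the next integer.
    """
    h_sub, v_sub = subsampling
    return -(-width // h_sub), -(-height // v_sub)  # ceiling division


# A 101-pixel-wide, 16-row strip with YCbCrSubSampling = [2,2]:
print(planar2_sof_dimensions(101, 16))          # (101, 16) for Y
print(planar2_sof_dimensions(101, 16, (2, 2)))  # (51, 8) for Cb and Cr
```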
489 | ||
490 | ||
491 | Minimum requirements for TIFF/JPEG | |
492 | ---------------------------------- | |
493 | ||
494 | ISO JPEG is a large and complex standard; most implementations support only | |
495 | a subset of it. Here we define a "core" subset of TIFF/JPEG which readers | |
496 | must support to claim TIFF/JPEG compatibility. For maximum | |
497 | cross-application compatibility, we recommend that writers confine | |
498 | themselves to this subset unless there is very good reason to do otherwise. | |
499 | ||
500 | Use the ISO baseline JPEG process: 8-bit data precision, Huffman coding, | |
501 | with no more than 2 DC and 2 AC Huffman tables. Note that this implies | |
502 | BitsPerSample = 8 for each component. We recommend deviating from baseline | |
503 | JPEG only if 12-bit data precision or lossless coding is required. | |
504 | ||
505 | Use no subsampling (all JPEG sampling factors = 1) for color spaces other | |
506 | than YCbCr. (This is, in fact, required with the TIFF 6.0 field | |
507 | definitions, but may not be so in future revisions.) For YCbCr, use one of | |
508 | the following choices: | |
509 |     YCbCrSubSampling field     JPEG sampling factors | |
510 |     1,1                        1h1v, 1h1v, 1h1v | |
511 |     2,1                        2h1v, 1h1v, 1h1v | |
512 |     2,2 (default value)        2h2v, 1h1v, 1h1v | |
513 | We recommend that RGB source data be converted to YCbCr for best compression | |
514 | results. Other source data colorspaces should probably be left alone. | |
515 | Minimal readers need not support JPEG images with colorspaces other than | |
516 | YCbCr and grayscale (PhotometricInterpretation = 6 or 1). | |
517 | ||
518 | A minimal reader also need not support JPEG YCbCr images with nondefault | |
519 | values of YCbCrCoefficients or YCbCrPositioning, nor with values of | |
520 | ReferenceBlackWhite other than [0,255,128,255,128,255]. (These values | |
521 | correspond to the RGB<=>YCbCr conversion specified by JFIF, which is widely | |
522 | implemented in JPEG codecs.) | |
523 | ||
524 | Writers are reminded that a ReferenceBlackWhite field *must* be included | |
525 | when PhotometricInterpretation is YCbCr, because the default | |
526 | ReferenceBlackWhite values are inappropriate for YCbCr. | |
527 | ||
528 | If any subsampling is used, PlanarConfiguration=1 is preferred to avoid the | |
529 | possibly-confusing requirements of PlanarConfiguration=2. In any case, | |
530 | readers are not required to support PlanarConfiguration=2. | |
531 | ||
532 | If possible, use a single interleaved scan in each image segment. This is | |
533 | not legal JPEG if there are more than 4 SamplesPerPixel or if the sampling | |
534 | factors are such that more than 10 blocks would be needed per MCU; in that | |
535 | case, use a separate scan for each component. (The recommended color | |
536 | spaces and sampling factors will not run into that restriction, so a | |
537 | minimal reader need not support more than one scan per segment.) | |
538 | ||
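The single-scan restriction described above can be checked with simple arithmetic: each component contributes h*v blocks to an MCU, and an interleaved scan may hold at most 4 components and 10 blocks per MCU. The sketch below assumes the sampling factors are given as (h, v) pairs, one per component; the function and constant names are hypothetical.

```python
# Limits on a single interleaved scan, as stated in the text above.
MAX_COMPONENTS_PER_SCAN = 4
MAX_BLOCKS_PER_MCU = 10


def single_scan_is_legal(factors):
    """Return True if all components may share one interleaved scan.

    `factors` is a list of (h, v) JPEG sampling factors, one per component.
    """
    blocks_per_mcu = sum(h * v for h, v in factors)
    return (len(factors) <= MAX_COMPONENTS_PER_SCAN
            and blocks_per_mcu <= MAX_BLOCKS_PER_MCU)
```

With the recommended 2h2v, 1h1v, 1h1v factors the MCU holds 4 + 1 + 1 = 6 blocks, so one scan suffices; 4h4v, 1h1v, 1h1v would need 18 blocks and must use a separate scan per component.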
539 | To claim TIFF/JPEG compatibility, readers shall support multiple-strip TIFF | |
540 | files and the optional JPEGTables field; it is not acceptable to read only | |
541 | single-datastream files. Support for tiled TIFF files is strongly | |
542 | recommended but not required. | |
543 | ||
544 | ||
545 | Other recommendations for implementors | |
546 | -------------------------------------- | |
547 | ||
548 | The TIFF tag Compression=7 guarantees only that the compressed data is | |
549 | represented as ISO JPEG datastreams. Since JPEG is a large and evolving | |
550 | standard, readers should apply careful error checking to the JPEG markers | |
551 | to ensure that the compression process is within their capabilities. In | |
552 | particular, to avoid being confused by future extensions to the JPEG | |
553 | standard, it is important to abort if unknown marker codes are seen. | |
554 | ||
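As a rough illustration of the marker checking recommended above, the following sketch walks the marker segments of one JPEG datastream up to the SOS marker and aborts on any marker it does not recognize. It assumes a hypothetical reader that supports only baseline JPEG (SOF0) and ignores APPn and COM segments; the names `UnsupportedJPEG` and `check_header_markers` are invented for this example. It does not handle optional 0xFF fill bytes before a marker and does not examine the entropy-coded data after SOS.

```python
SOI, SOS, COM = 0xD8, 0xDA, 0xFE
# Segments this hypothetical baseline-only reader knows how to process.
KNOWN_SEGMENTS = {0xC0: "SOF0", 0xC4: "DHT", 0xDB: "DQT", 0xDD: "DRI", SOS: "SOS"}


class UnsupportedJPEG(Exception):
    """Raised when the datastream is outside the reader's capabilities."""


def check_header_markers(data):
    """Return the names of the markers seen up to and including SOS."""
    if data[:2] != bytes([0xFF, SOI]):
        raise UnsupportedJPEG("datastream does not start with SOI")
    seen = ["SOI"]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise UnsupportedJPEG("expected a marker at offset %d" % pos)
        code = data[pos + 1]
        is_noise = 0xE0 <= code <= 0xEF or code == COM  # APPn or COM: ignored
        if not is_noise and code not in KNOWN_SEGMENTS:
            # Unknown marker: abort rather than guess at its meaning.
            raise UnsupportedJPEG("unknown marker 0xFF%02X" % code)
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        if length < 2:
            raise UnsupportedJPEG("bad segment length at offset %d" % pos)
        if not is_noise:
            seen.append(KNOWN_SEGMENTS[code])
        if code == SOS:
            return seen
        pos += 2 + length  # skip the marker plus its length-prefixed payload
    raise UnsupportedJPEG("no SOS marker found")
```

A reader built this way rejects, for instance, a progressive-mode datastream (SOF2, marker 0xFFC2) at the first segment it cannot handle.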
555 | The point of requiring that all image segments use the same JPEG process is | |
556 | to ensure that a reader need check only one segment to determine whether it | |
557 | can handle the image. For example, consider a TIFF reader that has access | |
558 | to fast but restricted JPEG hardware, as well as a slower, more general | |
559 | software implementation. It is desirable to check only one image segment | |
560 | to find out whether the fast hardware can be used. Thus, writers should | |
561 | try to ensure that all segments of an image look as much "alike" as | |
562 | possible: there should be no variation in scan layout, use of options such | |
563 | as DRI, etc. Ideally, segments will be processed identically except | |
564 | perhaps for using different local quantization or entropy-coding tables. | |
565 | ||
566 | Writers should avoid including "noise" JPEG markers (COM and APPn markers). | |
567 | Standard TIFF fields provide a better way to transport any non-image data. | |
568 | Some JPEG codecs may change behavior if they see an APPn marker they | |
569 | think they understand; since the TIFF spec requires these markers to be | |
570 | ignored, this behavior is undesirable. | |
571 | ||
572 | It is possible to convert an interchange-JPEG file (e.g., a JFIF file) to | |
573 | TIFF simply by dropping the interchange datastream into a single strip. | |
574 | (However, designers are reminded that the TIFF spec discourages huge | |
575 | strips; splitting the image is somewhat more work but may give better | |
576 | results.) Conversion from TIFF to interchange JPEG is more complex. A | |
577 | strip-based TIFF/JPEG file can be converted fairly easily if all strips use | |
578 | identical JPEG tables and no RSTn markers: just delete the overhead markers | |
579 | and insert RSTn markers between strips. Converting tiled images is harder, | |
580 | since the data will usually not be in the right order (unless the tiles are | |
581 | only one MCU high). This can still be done losslessly, but it will require | |
582 | undoing and redoing the entropy coding so that the DC coefficient | |
583 | differences can be updated. | |
584 | ||
585 | There is no default value for JPEGTables: standard TIFF files must define all | |
586 | tables that they reference. For some closed systems in which many files will | |
587 | have identical tables, it might make sense to define a default JPEGTables | |
588 | value to avoid actually storing the tables. Or even better, invent a | |
589 | private field selecting one of N default JPEGTables settings, so as to allow | |
590 | for future expansion. Either of these must be regarded as a private | |
591 | extension that will render the files unreadable by other applications. | |
592 | ||
593 | ||
594 | References | |
595 | ---------- | |
596 | ||
597 | [1] Wallace, Gregory K. "The JPEG Still Picture Compression Standard", | |
598 | Communications of the ACM, April 1991 (vol. 34 no. 4), pp. 30-44. | |
599 | ||
600 | This is the best short technical introduction to the JPEG algorithms. | |
601 | It is a good overview but does not provide sufficiently detailed | |
602 | information to write an implementation. | |
603 | ||
604 | [2] Pennebaker, William B. and Mitchell, Joan L. "JPEG Still Image Data | |
605 | Compression Standard", Van Nostrand Reinhold, 1993, ISBN 0-442-01272-1. | |
606 | 638pp. | |
607 | ||
608 | This textbook is by far the most complete exposition of JPEG in existence. | |
609 | It includes the full text of the ISO JPEG standards (DIS 10918-1 and draft | |
610 | DIS 10918-2). No would-be JPEG implementor should be without it. | |
611 | ||
612 | [3] ISO/IEC IS 10918-1, "Digital Compression and Coding of Continuous-tone | |
613 | Still Images, Part 1: Requirements and guidelines", February 1994. | |
614 | ISO/IEC DIS 10918-2, "Digital Compression and Coding of Continuous-tone | |
615 | Still Images, Part 2: Compliance testing", final approval expected 1994. | |
616 | ||
617 | These are the official standards documents. Note that the Pennebaker and | |
618 | Mitchell textbook is likely to be cheaper and more useful than the official | |
619 | standards. | |
620 | ||
621 | ||
622 | Changes to Section 21: YCbCr Images | |
623 | =================================== | |
624 | ||
625 | [This section of the Tech Note clarifies section 21 to make clear the | |
626 | interpretation of image dimensions in a subsampled image. Furthermore, | |
627 | the section is changed to allow the original image dimensions not to be | |
628 | multiples of the sampling factors. This change is necessary to support use | |
629 | of JPEG compression on odd-size images.] | |
630 | ||
631 | Add the following paragraphs to the Section 21 introduction (p. 89), | |
632 | just after the paragraph beginning "When a Class Y image is subsampled": | |
633 | ||
634 | In a subsampled image, it is understood that all TIFF image | |
635 | dimensions are measured in terms of the highest-resolution | |
636 | (luminance) component. In particular, ImageWidth, ImageLength, | |
637 | RowsPerStrip, TileWidth, TileLength, XResolution, and YResolution | |
638 | are measured in luminance samples. | |
639 | ||
640 | RowsPerStrip, TileWidth, and TileLength are constrained so that | |
641 | there are an integral number of samples of each component in a | |
642 | complete strip or tile. However, ImageWidth/ImageLength are not | |
643 | constrained. If an odd-size image is to be converted to subsampled | |
644 | format, the writer should pad the source data to a multiple of the | |
645 | sampling factors by replication of the last column and/or row, then | |
646 | downsample. The number of luminance samples actually stored in the | |
647 | file will be a multiple of the sampling factors. Conversely, | |
648 | readers must ignore any extra data (outside the specified image | |
649 | dimensions) after upsampling. | |
650 | ||
651 | When PlanarConfiguration=2, each strip or tile covers the same | |
652 | image area despite subsampling; that is, the total number of strips | |
653 | or tiles in the image is the same for each component. Therefore | |
654 | strips or tiles of the subsampled components contain fewer samples | |
655 | than strips or tiles of the luminance component. | |
656 | ||
657 | If there are extra samples per pixel (see field ExtraSamples), | |
658 | these data channels have the same number of samples as the | |
659 | luminance component. | |
660 | ||
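The padding rule in the paragraphs above amounts to rounding each image dimension up to a multiple of the corresponding sampling factor. The sketch below shows the arithmetic for the image as a whole; the function name `stored_dimensions` is hypothetical, and any additional padding out to strip or tile boundaries is not modeled.

```python
def stored_dimensions(image_width, image_length, subsampling):
    """Return the luminance and chroma sample counts after padding.

    `subsampling` is the YCbCrSubSampling pair (horizontal, vertical).
    """
    horiz, vert = subsampling
    # Round up to the next multiple of each sampling factor.
    padded_width = -(-image_width // horiz) * horiz
    padded_length = -(-image_length // vert) * vert
    return {
        "luma": (padded_width, padded_length),
        "chroma": (padded_width // horiz, padded_length // vert),
    }
```

For a 101x75 image with YCbCrSubSampling = 2,2, the writer stores 102x76 luminance samples and 51x38 samples for each chroma component, while ImageWidth and ImageLength remain 101 and 75. A reader discards the extra column and row after upsampling.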
661 | Rewrite the YCbCrSubSampling field description (pp. 91-92) as follows | |
662 | (largely to eliminate possibly-misleading references to | |
663 | ImageWidth/ImageLength of the subsampled components): | |
664 | ||
665 | (first paragraph unchanged) | |
666 | ||
667 | The two elements of this field are defined as follows: | |
668 | ||
669 | Short 0: ChromaSubsampleHoriz: | |
670 | ||
671 | 1 = there are equal numbers of luma and chroma samples horizontally. | |
672 | ||
673 | 2 = there are twice as many luma samples as chroma samples | |
674 | horizontally. | |
675 | ||
676 | 4 = there are four times as many luma samples as chroma samples | |
677 | horizontally. | |
678 | ||
679 | Short 1: ChromaSubsampleVert: | |
680 | ||
681 | 1 = there are equal numbers of luma and chroma samples vertically. | |
682 | ||
683 | 2 = there are twice as many luma samples as chroma samples | |
684 | vertically. | |
685 | ||
686 | 4 = there are four times as many luma samples as chroma samples | |
687 | vertically. | |
688 | ||
689 | ChromaSubsampleVert shall always be less than or equal to | |
690 | ChromaSubsampleHoriz. Note that Cb and Cr have the same sampling | |
691 | ratios. | |
692 | ||
693 | In a strip-based TIFF file, RowsPerStrip is required to be an integer | |
694 | multiple of ChromaSubsampleVert (unless RowsPerStrip >= | |
695 | ImageLength, in which case its exact value is unimportant). | |
696 | If ImageWidth and ImageLength are not multiples of | |
697 | ChromaSubsampleHoriz and ChromaSubsampleVert respectively, then the | |
698 | source data shall be padded to the next integer multiple of these | |
699 | values before downsampling. | |
700 | ||
701 | In a tiled TIFF file, TileWidth must be an integer multiple of | |
702 | ChromaSubsampleHoriz and TileLength must be an integer multiple of | |
703 | ChromaSubsampleVert. Padding will occur to tile boundaries. | |
704 | ||
705 | The default values of this field are [ 2,2 ]. Thus, YCbCr data is | |
706 | downsampled by default! | |
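
The constraints in the revised field description can be gathered into one consistency check. The following is a minimal sketch, assuming the field values have already been read from the file; the function name and its parameters are hypothetical. A file is treated as tiled when both tile dimensions are supplied, otherwise as strip-based.

```python
def check_subsampling_layout(subsampling, image_length,
                             rows_per_strip=None,
                             tile_width=None, tile_length=None):
    """Return a list of constraint violations (empty if the layout is valid)."""
    horiz, vert = subsampling
    problems = []
    if horiz not in (1, 2, 4) or vert not in (1, 2, 4):
        problems.append("subsampling values must be 1, 2 or 4")
        return problems
    if vert > horiz:
        problems.append("ChromaSubsampleVert exceeds ChromaSubsampleHoriz")
    if tile_width is not None and tile_length is not None:
        # Tiled file: both tile dimensions must be multiples of the factors.
        if tile_width % horiz:
            problems.append("TileWidth is not a multiple of ChromaSubsampleHoriz")
        if tile_length % vert:
            problems.append("TileLength is not a multiple of ChromaSubsampleVert")
    elif rows_per_strip is not None:
        # Strip-based file: the value is unimportant when one strip holds the image.
        if rows_per_strip < image_length and rows_per_strip % vert:
            problems.append("RowsPerStrip is not a multiple of ChromaSubsampleVert")
    return problems
```

For example, with the default subsampling of 2,2 and ImageLength = 100, RowsPerStrip = 16 passes while RowsPerStrip = 15 is reported as a violation.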
707 | </pre> |