/*
*******************************************************************************
*
*   Copyright (C) 2002-2004, International Business Machines
*   Corporation and others. All Rights Reserved.
*
*******************************************************************************
*   file name:  utf_old.h
*   encoding:   US-ASCII
*   tab size:   8 (not used)
*   indentation:4
*
*   created on: 2002sep21
*   created by: Markus W. Scherer
*/

/**
 * \file
 * The macros in utf_old.h are all deprecated and their use discouraged.
 * Some of the design principles behind the set of UTF macros
 * have changed or proved impractical.
 * Almost all of the old "UTF macros" are at least renamed.
 * If you are looking for a new equivalent to an old macro, please see the
 * comment at the old one.
 *
 * utf_old.h is included by utf.h after unicode/umachine.h
 * and some common definitions, to not break old code.
 *
 * Brief summary of reasons for deprecation:
 * - Switch on UTF_SIZE (selection of UTF-8/16/32 default string processing)
 *   was impractical.
 * - Switch on UTF_SAFE etc. (selection of unsafe/safe/strict default string processing)
 *   was of little use and impractical.
 * - Whole classes of macros became obsolete outside of the UTF_SIZE/UTF_SAFE
 *   selection framework: UTF32_ macros (all trivial)
 *   and UTF_ default and intermediate macros (all aliases).
 * - The selection framework also caused many macro aliases.
 * - Change in Unicode standard: "irregular" sequences (3.0) became illegal (3.2).
 * - Change of language in Unicode standard:
 *   Growing distinction between internal x-bit Unicode strings and external UTF-x
 *   forms, with the former more lenient.
 *   Suggests renaming of UTF16_ macros to U16_.
 * - The prefix "UTF_" without a width number confused some users.
 * - "Safe" append macros needed the addition of an error indicator output.
 * - "Safe" UTF-8 macros used legitimate (if rarely used) code point values
 *   to indicate error conditions.
 * - The use of the "_CHAR" infix for code point operations confused some users.
 *
 * More details:
 *
 * Until ICU 2.2, utf.h theoretically allowed one to choose among UTF-8/16/32
 * for string processing, and among unsafe/safe/strict default macros for that.
 *
 * It proved nearly impossible to write non-trivial, high-performance code
 * that is UTF-generic.
 * Unsafe default macros would be dangerous for default string processing,
 * and the main reason for the "strict" versions disappeared:
 * Between Unicode 3.0 and 3.2 all "irregular" UTF-8 sequences became illegal.
 * The only other conditions that "strict" checked for were non-characters,
 * which are valid during processing. Only during text input/output should they
 * be checked, and at that time other well-formedness checks may be
 * necessary or useful as well.
 * This can still be done by using U16_NEXT and U_IS_UNICODE_NONCHAR
 * or U_IS_UNICODE_CHAR.
 *
 * The old UTF8_..._SAFE macros also used some normal Unicode code points
 * to indicate malformed sequences.
 * The new U8_ macros (such as U8_NEXT) use negative values instead.
 *
 * The entire contents of utf32.h were moved here without replacement
 * because all those macros were trivial and
 * were meaningful only in the framework of choosing the UTF size.
 *
 * See Jitterbug 2150 and its discussion on the ICU mailing list
 * in September 2002.
 *
 * <hr>
 *
 * <em>Obsolete part</em> of pre-ICU 2.4 utf.h file documentation:
 *
 * <p>The original concept for these files was for ICU to allow,
 * in principle, setting which UTF (UTF-8/16/32) is used internally
 * by defining UTF_SIZE to either 8, 16, or 32. utf.h would then define the UChar type
 * accordingly. UTF-16 was the default.</p>
 *
 * <p>This concept has been abandoned.
 * A lot of the ICU source code &#8212; especially low-level code like
 * conversion, normalization, and collation &#8212; assumes UTF-16,
 * so utf.h enforces the default of UTF-16.
 * The UTF-8 and UTF-32 macros remain for now for completeness and backward compatibility.</p>
 *
 * <p>Accordingly, utf.h defines UChar to be an unsigned 16-bit integer. If this matches wchar_t, then
 * UChar is defined to be exactly wchar_t, otherwise uint16_t.</p>
 *
 * <p>UChar32 is defined to be a signed 32-bit integer (int32_t), large enough for a 21-bit
 * Unicode code point (Unicode scalar value, 0..0x10ffff).
 * Before ICU 2.4, the definition of UChar32 was platform-dependent in the same way as
 * the definition of UChar. For details see the documentation for UChar32 itself.</p>
 *
 * <p>utf.h also defines a number of C macros for handling single Unicode code points and
 * for using UTF Unicode strings. It includes utf8.h, utf16.h, and utf32.h for the actual
 * implementations of those macros and then aliases one set of them (for UTF-16) for general use.
 * The UTF-specific macros have the UTF size in the macro name prefixes (UTF16_...), while
 * the general alias macros always begin with UTF_...</p>
 *
 * <p>Many string operations can be done with or without error checking.
 * Where such a distinction is useful, there are two versions of the macros, "unsafe" and "safe"
 * ones with ..._UNSAFE and ..._SAFE suffixes. The unsafe macros are fast but may cause
 * program failures if the strings are not well-formed. The safe macros have an additional, boolean
 * parameter "strict". If strict is FALSE, then only illegal sequences are detected.
 * Otherwise, irregular sequences and non-characters (such as single surrogates) are detected as well.
 * Safe macros return special error code points for illegal/irregular sequences:
 * Typically, U+ffff, or values that would result in a code unit sequence of the same length
 * as the erroneous input sequence.<br>
 * Note that _UNSAFE macros have fewer parameters: They do not have the strictness parameter, and
 * they do not have start/length parameters for boundary checking.</p>
 *
 * <p>Here, the macros are aliased in two steps:
 * In the first step, the UTF-specific macros with UTF16_ prefix and _UNSAFE and _SAFE suffixes are
 * aliased according to the UTF_SIZE to macros with UTF_ prefix and the same suffixes and signatures.
 * Then, in a second step, the default, general alias macros are set to use either the unsafe or
 * the safe/not strict (default) or the safe/strict macro;
 * these general macros do not have a strictness parameter.</p>
 *
 * <p>It is possible to change the default choice for the general alias macros to be unsafe, safe/not strict or safe/strict.
 * The default is safe/not strict. It is not recommended to select the unsafe macros as the basis for
 * Unicode string handling in ICU! To select one of them, define UTF_SAFE, UTF_STRICT, or UTF_UNSAFE.</p>
 *
 * <p>For general use, one should use the default, general macros with UTF_ prefix and no _SAFE/_UNSAFE suffix.
 * Only in some cases may it be necessary to control the choice of macro directly and use a less generic alias.
 * For example, if it can be assumed that a string is well-formed and the index will stay within the bounds,
 * then the _UNSAFE version may be used.
 * If a UTF-8 string is to be processed, then the macros with UTF8_ prefixes need to be used.</p>
 *
 * <hr>
 *
 * @deprecated ICU 2.4. Use the macros in utf.h, utf16.h, utf8.h instead.
 */

#ifndef __UTF_OLD_H__
#define __UTF_OLD_H__

#ifndef U_HIDE_DEPRECATED_API

/* utf.h must be included first. */
#ifndef __UTF_H__
#   include "unicode/utf.h"
#endif

/* Formerly utf.h, part 1 --------------------------------------------------- */

#ifdef U_USE_UTF_DEPRECATES
/**
 * Unicode string and array offset and index type.
 * ICU always counts Unicode code units (UChars) for
 * string offsets, indexes, and lengths, not Unicode code points.
 *
 * @obsolete ICU 2.6. Use int32_t directly instead since this API will be removed in that release.
 */
typedef int32_t UTextOffset;
#endif

/** Number of bits in a Unicode string code unit - ICU uses 16-bit Unicode. @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF_SIZE 16

/**
 * The default choice for general Unicode string macros is to use the ..._SAFE macro implementations
 * with strict=FALSE.
 *
 * @deprecated ICU 2.4. Obsolete, see utf_old.h.
 */
#define UTF_SAFE
/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#undef UTF_UNSAFE
/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#undef UTF_STRICT

/**
 * <p>UTF8_ERROR_VALUE_1 and UTF8_ERROR_VALUE_2 are special error values for UTF-8,
 * which need 1 or 2 bytes in UTF-8:<br>
 * U+0015 = NAK = Negative Acknowledge, C0 control character<br>
 * U+009f = highest C1 control character</p>
 *
 * <p>These are used by UTF8_..._SAFE macros so that they can return an error value
 * that needs the same number of code units (bytes) as were seen by
 * a macro. They should be tested with UTF_IS_ERROR() or UTF_IS_VALID().</p>
 *
 * @deprecated ICU 2.4. Obsolete, see utf_old.h.
 */
#define UTF8_ERROR_VALUE_1 0x15

/**
 * See documentation on UTF8_ERROR_VALUE_1 for details.
 *
 * @deprecated ICU 2.4. Obsolete, see utf_old.h.
 */
#define UTF8_ERROR_VALUE_2 0x9f

/**
 * Error value for all UTFs. This code point value will be set by macros with error
 * checking if an error is detected.
 *
 * @deprecated ICU 2.4. Obsolete, see utf_old.h.
 */
#define UTF_ERROR_VALUE 0xffff

/**
 * Is a given 32-bit code an error value
 * as returned by one of the macros for any UTF?
 *
 * @deprecated ICU 2.4. Obsolete, see utf_old.h.
 */
#define UTF_IS_ERROR(c) \
    (((c)&0xfffe)==0xfffe || (c)==UTF8_ERROR_VALUE_1 || (c)==UTF8_ERROR_VALUE_2)
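
/*
 * Example (a minimal sketch; the function name is hypothetical and not part
 * of ICU): the test performed by UTF_IS_ERROR() restated as a function so it
 * can be tried on sample values. It illustrates one reason these macros were
 * deprecated: U+0015 and U+009F are legitimate code points, so a string that
 * really contains them cannot be told apart from a malformed UTF-8 sequence,
 * and every U+xxFFFE/U+xxFFFF noncharacter is reported as an error as well.
 */
#include <stdint.h>

static int sketch_old_is_error(int32_t c) {
    /* Same three tests as UTF_IS_ERROR(): U+xxFFFE/U+xxFFFF, 0x15, 0x9f. */
    return (c & 0xfffe) == 0xfffe || c == 0x15 || c == 0x9f;
}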

/**
 * This is a combined macro: Is c a valid Unicode value _and_ not an error code?
 *
 * @deprecated ICU 2.4. Obsolete, see utf_old.h.
 */
#define UTF_IS_VALID(c) \
    (UTF_IS_UNICODE_CHAR(c) && \
     (c)!=UTF8_ERROR_VALUE_1 && (c)!=UTF8_ERROR_VALUE_2)

/**
 * Is this code unit or code point a surrogate (U+d800..U+dfff)?
 * @deprecated ICU 2.4. Renamed to U_IS_SURROGATE and U16_IS_SURROGATE, see utf_old.h.
 */
#define UTF_IS_SURROGATE(uchar) (((uchar)&0xfffff800)==0xd800)

/**
 * Is a given 32-bit code point a Unicode noncharacter?
 *
 * @deprecated ICU 2.4. Renamed to U_IS_UNICODE_NONCHAR, see utf_old.h.
 */
#define UTF_IS_UNICODE_NONCHAR(c) \
    ((c)>=0xfdd0 && \
     ((uint32_t)(c)<=0xfdef || ((c)&0xfffe)==0xfffe) && \
     (uint32_t)(c)<=0x10ffff)

/**
 * Is a given 32-bit value a Unicode code point value (0..U+10ffff)
 * that can be assigned a character?
 *
 * Code points that are not characters include:
 * - single surrogate code points (U+d800..U+dfff, 2048 code points)
 * - the last two code points on each plane (U+__fffe and U+__ffff, 34 code points)
 * - U+fdd0..U+fdef (new with Unicode 3.1, 32 code points)
 * Values above U+10ffff, the highest Unicode code point value, are rejected as well.
 *
 * This means that all code points below U+d800 are character code points,
 * and that boundary is tested first for performance.
 *
 * @deprecated ICU 2.4. Renamed to U_IS_UNICODE_CHAR, see utf_old.h.
 */
#define UTF_IS_UNICODE_CHAR(c) \
    ((uint32_t)(c)<0xd800 || \
        ((uint32_t)(c)>0xdfff && \
         (uint32_t)(c)<=0x10ffff && \
         !UTF_IS_UNICODE_NONCHAR(c)))
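
/*
 * Example (a minimal sketch; the function names are hypothetical and not
 * part of ICU): UTF_IS_UNICODE_NONCHAR() and UTF_IS_UNICODE_CHAR() restated
 * as functions, plus a brute-force count over 0..0x10ffff. Under these
 * definitions the count confirms the numbers listed above:
 * 32 + 34 = 66 noncharacters, and 2048 + 66 = 2114 code points that are
 * not character code points.
 */
#include <stdint.h>

static int sketch_is_nonchar(int32_t c) {
    return c >= 0xfdd0 &&
           ((uint32_t)c <= 0xfdef || (c & 0xfffe) == 0xfffe) &&
           (uint32_t)c <= 0x10ffff;
}

static int sketch_is_unicode_char(int32_t c) {
    /* Everything below the surrogates is a character; test that first. */
    return (uint32_t)c < 0xd800 ||
           ((uint32_t)c > 0xdfff &&
            (uint32_t)c <= 0x10ffff &&
            !sketch_is_nonchar(c));
}

/* Counts the code points in 0..0x10ffff for which pred() is nonzero. */
static int32_t sketch_count_code_points(int (*pred)(int32_t)) {
    int32_t c, n = 0;
    for (c = 0; c <= 0x10ffff; ++c) {
        if (pred(c)) {
            ++n;
        }
    }
    return n;
}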

/* Formerly utf8.h ---------------------------------------------------------- */

/**
 * Count the trail bytes for a UTF-8 lead byte.
 * @deprecated ICU 2.4. Renamed to U8_COUNT_TRAIL_BYTES, see utf_old.h.
 */
#define UTF8_COUNT_TRAIL_BYTES(leadByte) (utf8_countTrailBytes[(uint8_t)(leadByte)])

/**
 * Mask a UTF-8 lead byte in place, leaving only the lower bits that form part of the code point value.
 * @deprecated ICU 2.4. Renamed to U8_MASK_LEAD_BYTE, see utf_old.h.
 */
#define UTF8_MASK_LEAD_BYTE(leadByte, countTrailBytes) ((leadByte)&=(1<<(6-(countTrailBytes)))-1)

/** Is this code unit (byte) a complete, single-byte code point? @deprecated ICU 2.4. Renamed to U8_IS_SINGLE, see utf_old.h. */
#define UTF8_IS_SINGLE(uchar) (((uchar)&0x80)==0)
/** Is this code unit the lead code unit (byte) of a code point? @deprecated ICU 2.4. Renamed to U8_IS_LEAD, see utf_old.h. */
#define UTF8_IS_LEAD(uchar) ((uint8_t)((uchar)-0xc0)<0x3e)
/** Is this code unit a trailing code unit (byte) of a code point? @deprecated ICU 2.4. Renamed to U8_IS_TRAIL, see utf_old.h. */
#define UTF8_IS_TRAIL(uchar) (((uchar)&0xc0)==0x80)
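
/*
 * Example (a minimal sketch; the enum and function names are hypothetical
 * and not part of ICU): the byte classes tested by UTF8_IS_SINGLE(),
 * UTF8_IS_LEAD() and UTF8_IS_TRAIL(). The lead-byte test relies on unsigned
 * 8-bit wrap-around: (uint8_t)(b-0xc0)<0x3e holds exactly for 0xc0..0xfd,
 * so it also admits the obsolete 5- and 6-byte lead bytes 0xf8..0xfd, while
 * 0xfe and 0xff are neither lead nor trail bytes.
 */
#include <stdint.h>

enum sketch_utf8_byte_class {
    SKETCH_UTF8_SINGLE,  /* 0x00..0x7f */
    SKETCH_UTF8_TRAIL,   /* 0x80..0xbf */
    SKETCH_UTF8_LEAD,    /* 0xc0..0xfd */
    SKETCH_UTF8_INVALID  /* 0xfe, 0xff */
};

static enum sketch_utf8_byte_class sketch_utf8_classify(uint8_t b) {
    if ((b & 0x80) == 0) {
        return SKETCH_UTF8_SINGLE;
    } else if ((b & 0xc0) == 0x80) {
        return SKETCH_UTF8_TRAIL;
    } else if ((uint8_t)(b - 0xc0) < 0x3e) {
        return SKETCH_UTF8_LEAD;
    }
    return SKETCH_UTF8_INVALID;
}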

/** Does this scalar Unicode value need multiple code units for storage? @deprecated ICU 2.4. Use U8_LENGTH or test ((uint32_t)(c)>0x7f) instead, see utf_old.h. */
#define UTF8_NEED_MULTIPLE_UCHAR(c) ((uint32_t)(c)>0x7f)

/**
 * Given a code point value, how many bytes does it take in UTF-8?
 * ICU does not deal with code points >0x10ffff
 * unless necessary for advancing in the byte stream.
 *
 * These length macros take into account that for values >0x10ffff
 * the UTF8_APPEND_CHAR_SAFE macros would write the error code point 0xffff
 * with 3 bytes.
 * Code point comparisons need to be in uint32_t because UChar32
 * may be a signed type, and negative values must be recognized.
 *
 * @deprecated ICU 2.4. Use U8_LENGTH instead, see utf_old.h.
 */
#if 1
#   define UTF8_CHAR_LENGTH(c) \
        ((uint32_t)(c)<=0x7f ? 1 : \
            ((uint32_t)(c)<=0x7ff ? 2 : \
                ((uint32_t)((c)-0x10000)>0xfffff ? 3 : 4) \
            ) \
        )
#else
#   define UTF8_CHAR_LENGTH(c) \
        ((uint32_t)(c)<=0x7f ? 1 : \
            ((uint32_t)(c)<=0x7ff ? 2 : \
                ((uint32_t)(c)<=0xffff ? 3 : \
                    ((uint32_t)(c)<=0x10ffff ? 4 : \
                        ((uint32_t)(c)<=0x3ffffff ? 5 : \
                            ((uint32_t)(c)<=0x7fffffff ? 6 : 3) \
                        ) \
                    ) \
                ) \
            ) \
        )
#endif
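
/*
 * Example (a minimal sketch; the function name is hypothetical and not part
 * of ICU): the active UTF8_CHAR_LENGTH() branch restated as a function. The
 * third test subtracts 0x10000 in uint32_t arithmetic, so values 0x800..0xffff
 * wrap around to large numbers and, like values above 0x10ffff and negative
 * values, yield 3 (the length of the error code point U+FFFF). Only
 * 0x10000..0x10ffff yields 4.
 */
#include <stdint.h>

static int32_t sketch_utf8_char_length(int32_t c) {
    uint32_t u = (uint32_t)c;
    if (u <= 0x7f) {
        return 1;
    } else if (u <= 0x7ff) {
        return 2;
    } else if (u - 0x10000 > 0xfffff) {
        return 3; /* BMP code point, or out of range (error value U+FFFF) */
    }
    return 4;
}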

/** The maximum number of bytes per code point. @deprecated ICU 2.4. Renamed to U8_MAX_LENGTH, see utf_old.h. */
#define UTF8_MAX_CHAR_LENGTH 4

/** Average number of code units compared to UTF-16. @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF8_ARRAY_SIZE(size) ((5*(size))/2)

/** @deprecated ICU 2.4. Renamed to U8_GET_UNSAFE, see utf_old.h. */
#define UTF8_GET_CHAR_UNSAFE(s, i, c) { \
    int32_t _utf8_get_char_unsafe_index=(int32_t)(i); \
    UTF8_SET_CHAR_START_UNSAFE(s, _utf8_get_char_unsafe_index); \
    UTF8_NEXT_CHAR_UNSAFE(s, _utf8_get_char_unsafe_index, c); \
}

/** @deprecated ICU 2.4. Use U8_GET instead, see utf_old.h. */
#define UTF8_GET_CHAR_SAFE(s, start, i, length, c, strict) { \
    int32_t _utf8_get_char_safe_index=(int32_t)(i); \
    UTF8_SET_CHAR_START_SAFE(s, start, _utf8_get_char_safe_index); \
    UTF8_NEXT_CHAR_SAFE(s, _utf8_get_char_safe_index, length, c, strict); \
}

/** @deprecated ICU 2.4. Renamed to U8_NEXT_UNSAFE, see utf_old.h. */
#define UTF8_NEXT_CHAR_UNSAFE(s, i, c) { \
    (c)=(s)[(i)++]; \
    if((uint8_t)((c)-0xc0)<0x35) { \
        uint8_t __count=UTF8_COUNT_TRAIL_BYTES(c); \
        UTF8_MASK_LEAD_BYTE(c, __count); \
        switch(__count) { \
        /* each following branch falls through to the next one */ \
        case 3: \
            (c)=((c)<<6)|((s)[(i)++]&0x3f); \
        case 2: \
            (c)=((c)<<6)|((s)[(i)++]&0x3f); \
        case 1: \
            (c)=((c)<<6)|((s)[(i)++]&0x3f); \
        /* no other branches to optimize switch() */ \
            break; \
        } \
    } \
}
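
/*
 * Example (a minimal sketch; the function name is hypothetical and not part
 * of ICU): forward decoding in the style of UTF8_NEXT_CHAR_UNSAFE(). The real
 * macro looks up the trail-byte count in ICU's utf8_countTrailBytes[] table;
 * here that lookup is replaced by an assumed range check covering only the
 * lead bytes 0xc0..0xf4 that the macro decodes. The lead byte is masked as in
 * UTF8_MASK_LEAD_BYTE(), and each trail byte contributes 6 more bits. Like
 * the macro, it assumes well-formed input and does no bounds checking.
 */
#include <stdint.h>

static int32_t sketch_utf8_next_unsafe(const uint8_t *s, int32_t *pi) {
    int32_t c = s[(*pi)++];
    int count;
    if (c < 0xc0 || c > 0xf4) {
        return c; /* single byte; other bytes are passed through, as in the macro */
    }
    count = c < 0xe0 ? 1 : (c < 0xf0 ? 2 : 3); /* assumed stand-in for the table */
    c &= (1 << (6 - count)) - 1;               /* keep the payload bits of the lead byte */
    while (count-- > 0) {
        c = (c << 6) | (s[(*pi)++] & 0x3f);    /* append 6 bits per trail byte */
    }
    return c;
}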

/** @deprecated ICU 2.4. Renamed to U8_APPEND_UNSAFE, see utf_old.h. */
#define UTF8_APPEND_CHAR_UNSAFE(s, i, c) { \
    if((uint32_t)(c)<=0x7f) { \
        (s)[(i)++]=(uint8_t)(c); \
    } else { \
        if((uint32_t)(c)<=0x7ff) { \
            (s)[(i)++]=(uint8_t)(((c)>>6)|0xc0); \
        } else { \
            if((uint32_t)(c)<=0xffff) { \
                (s)[(i)++]=(uint8_t)(((c)>>12)|0xe0); \
            } else { \
                (s)[(i)++]=(uint8_t)(((c)>>18)|0xf0); \
                (s)[(i)++]=(uint8_t)((((c)>>12)&0x3f)|0x80); \
            } \
            (s)[(i)++]=(uint8_t)((((c)>>6)&0x3f)|0x80); \
        } \
        (s)[(i)++]=(uint8_t)(((c)&0x3f)|0x80); \
    } \
}
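
/*
 * Example (a minimal sketch; the function name is hypothetical and not part
 * of ICU): the byte layout produced by UTF8_APPEND_CHAR_UNSAFE(), written as
 * a function that returns the new index. The lead byte carries the length
 * marker (0xc0, 0xe0 or 0xf0) plus the high bits; every trail byte carries
 * 6 bits under the 0x80 marker. As with the macro, the caller must guarantee
 * room for up to 4 bytes and a valid code point; nothing is checked.
 */
#include <stdint.h>

static int32_t sketch_utf8_append_unsafe(uint8_t *s, int32_t i, uint32_t c) {
    if (c <= 0x7f) {
        s[i++] = (uint8_t)c;
    } else {
        if (c <= 0x7ff) {
            s[i++] = (uint8_t)((c >> 6) | 0xc0);
        } else {
            if (c <= 0xffff) {
                s[i++] = (uint8_t)((c >> 12) | 0xe0);
            } else {
                s[i++] = (uint8_t)((c >> 18) | 0xf0);
                s[i++] = (uint8_t)(((c >> 12) & 0x3f) | 0x80);
            }
            s[i++] = (uint8_t)(((c >> 6) & 0x3f) | 0x80);
        }
        s[i++] = (uint8_t)((c & 0x3f) | 0x80);
    }
    return i;
}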

/** @deprecated ICU 2.4. Renamed to U8_FWD_1_UNSAFE, see utf_old.h. */
#define UTF8_FWD_1_UNSAFE(s, i) { \
    (i)+=1+UTF8_COUNT_TRAIL_BYTES((s)[i]); \
}

/** @deprecated ICU 2.4. Renamed to U8_FWD_N_UNSAFE, see utf_old.h. */
#define UTF8_FWD_N_UNSAFE(s, i, n) { \
    int32_t __N=(n); \
    while(__N>0) { \
        UTF8_FWD_1_UNSAFE(s, i); \
        --__N; \
    } \
}

/** @deprecated ICU 2.4. Renamed to U8_SET_CP_START_UNSAFE, see utf_old.h. */
#define UTF8_SET_CHAR_START_UNSAFE(s, i) { \
    while(UTF8_IS_TRAIL((s)[i])) { --(i); } \
}

/** @deprecated ICU 2.4. Use U8_NEXT instead, see utf_old.h. */
#define UTF8_NEXT_CHAR_SAFE(s, i, length, c, strict) { \
    (c)=(s)[(i)++]; \
    if((c)>=0x80) { \
        if(UTF8_IS_LEAD(c)) { \
            (c)=utf8_nextCharSafeBody(s, &(i), (int32_t)(length), c, strict); \
        } else { \
            (c)=UTF8_ERROR_VALUE_1; \
        } \
    } \
}

/** @deprecated ICU 2.4. Use U8_APPEND instead, see utf_old.h. */
#define UTF8_APPEND_CHAR_SAFE(s, i, length, c) { \
    if((uint32_t)(c)<=0x7f) { \
        (s)[(i)++]=(uint8_t)(c); \
    } else { \
        (i)=utf8_appendCharSafeBody(s, (int32_t)(i), (int32_t)(length), c, NULL); \
    } \
}

/** @deprecated ICU 2.4. Renamed to U8_FWD_1, see utf_old.h. */
#define UTF8_FWD_1_SAFE(s, i, length) U8_FWD_1(s, i, length)

/** @deprecated ICU 2.4. Renamed to U8_FWD_N, see utf_old.h. */
#define UTF8_FWD_N_SAFE(s, i, length, n) U8_FWD_N(s, i, length, n)

/** @deprecated ICU 2.4. Renamed to U8_SET_CP_START, see utf_old.h. */
#define UTF8_SET_CHAR_START_SAFE(s, start, i) U8_SET_CP_START(s, start, i)

/** @deprecated ICU 2.4. Renamed to U8_PREV_UNSAFE, see utf_old.h. */
#define UTF8_PREV_CHAR_UNSAFE(s, i, c) { \
    (c)=(s)[--(i)]; \
    if(UTF8_IS_TRAIL(c)) { \
        uint8_t __b, __count=1, __shift=6; \
\
        /* c is a trail byte */ \
        (c)&=0x3f; \
        for(;;) { \
            __b=(s)[--(i)]; \
            if(__b>=0xc0) { \
                UTF8_MASK_LEAD_BYTE(__b, __count); \
                (c)|=(UChar32)__b<<__shift; \
                break; \
            } else { \
                (c)|=(UChar32)(__b&0x3f)<<__shift; \
                ++__count; \
                __shift+=6; \
            } \
        } \
    } \
}
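
/*
 * Example (a minimal sketch; the function name is hypothetical and not part
 * of ICU): backward decoding in the style of UTF8_PREV_CHAR_UNSAFE().
 * Starting just after a code point, it steps back over trail bytes,
 * collecting 6 bits from each, until it reaches a byte >= 0xc0; that lead
 * byte is masked according to the number of trail bytes seen and shifted
 * above the collected bits. Well-formed input is assumed and there is no
 * start-of-buffer check.
 */
#include <stdint.h>

static int32_t sketch_utf8_prev_unsafe(const uint8_t *s, int32_t *pi) {
    int32_t c = s[--(*pi)];
    if ((c & 0xc0) == 0x80) {          /* c is a trail byte */
        int count = 1, shift = 6;
        c &= 0x3f;
        for (;;) {
            uint8_t b = s[--(*pi)];
            if (b >= 0xc0) {           /* lead byte: mask it and finish */
                b &= (uint8_t)((1 << (6 - count)) - 1);
                c |= (int32_t)b << shift;
                break;
            }
            c |= (int32_t)(b & 0x3f) << shift;
            ++count;
            shift += 6;
        }
    }
    return c;
}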

/** @deprecated ICU 2.4. Renamed to U8_BACK_1_UNSAFE, see utf_old.h. */
#define UTF8_BACK_1_UNSAFE(s, i) { \
    while(UTF8_IS_TRAIL((s)[--(i)])) {} \
}

/** @deprecated ICU 2.4. Renamed to U8_BACK_N_UNSAFE, see utf_old.h. */
#define UTF8_BACK_N_UNSAFE(s, i, n) { \
    int32_t __N=(n); \
    while(__N>0) { \
        UTF8_BACK_1_UNSAFE(s, i); \
        --__N; \
    } \
}

/** @deprecated ICU 2.4. Renamed to U8_SET_CP_LIMIT_UNSAFE, see utf_old.h. */
#define UTF8_SET_CHAR_LIMIT_UNSAFE(s, i) { \
    UTF8_BACK_1_UNSAFE(s, i); \
    UTF8_FWD_1_UNSAFE(s, i); \
}

/** @deprecated ICU 2.4. Use U8_PREV instead, see utf_old.h. */
#define UTF8_PREV_CHAR_SAFE(s, start, i, c, strict) { \
    (c)=(s)[--(i)]; \
    if((c)>=0x80) { \
        if((c)<=0xbf) { \
            (c)=utf8_prevCharSafeBody(s, start, &(i), c, strict); \
        } else { \
            (c)=UTF8_ERROR_VALUE_1; \
        } \
    } \
}

/** @deprecated ICU 2.4. Renamed to U8_BACK_1, see utf_old.h. */
#define UTF8_BACK_1_SAFE(s, start, i) U8_BACK_1(s, start, i)

/** @deprecated ICU 2.4. Renamed to U8_BACK_N, see utf_old.h. */
#define UTF8_BACK_N_SAFE(s, start, i, n) U8_BACK_N(s, start, i, n)

/** @deprecated ICU 2.4. Renamed to U8_SET_CP_LIMIT, see utf_old.h. */
#define UTF8_SET_CHAR_LIMIT_SAFE(s, start, i, length) U8_SET_CP_LIMIT(s, start, i, length)

/* Formerly utf16.h --------------------------------------------------------- */

/** Is uchar a first/lead surrogate? @deprecated ICU 2.4. Renamed to U_IS_LEAD and U16_IS_LEAD, see utf_old.h. */
#define UTF_IS_FIRST_SURROGATE(uchar) (((uchar)&0xfffffc00)==0xd800)

/** Is uchar a second/trail surrogate? @deprecated ICU 2.4. Renamed to U_IS_TRAIL and U16_IS_TRAIL, see utf_old.h. */
#define UTF_IS_SECOND_SURROGATE(uchar) (((uchar)&0xfffffc00)==0xdc00)

/** Assuming c is a surrogate, is it a first/lead surrogate? @deprecated ICU 2.4. Renamed to U_IS_SURROGATE_LEAD and U16_IS_SURROGATE_LEAD, see utf_old.h. */
#define UTF_IS_SURROGATE_FIRST(c) (((c)&0x400)==0)

/** Helper constant for UTF16_GET_PAIR_VALUE. @deprecated ICU 2.4. Renamed to U16_SURROGATE_OFFSET, see utf_old.h. */
#define UTF_SURROGATE_OFFSET ((0xd800<<10UL)+0xdc00-0x10000)

/** Get the UTF-32 value from the surrogate code units. @deprecated ICU 2.4. Renamed to U16_GET_SUPPLEMENTARY, see utf_old.h. */
#define UTF16_GET_PAIR_VALUE(first, second) \
    (((first)<<10UL)+(second)-UTF_SURROGATE_OFFSET)

/** @deprecated ICU 2.4. Renamed to U16_LEAD, see utf_old.h. */
#define UTF_FIRST_SURROGATE(supplementary) (UChar)(((supplementary)>>10)+0xd7c0)

/** @deprecated ICU 2.4. Renamed to U16_TRAIL, see utf_old.h. */
#define UTF_SECOND_SURROGATE(supplementary) (UChar)(((supplementary)&0x3ff)|0xdc00)

/** @deprecated ICU 2.4. Renamed to U16_LEAD, see utf_old.h. */
#define UTF16_LEAD(supplementary) UTF_FIRST_SURROGATE(supplementary)

/** @deprecated ICU 2.4. Renamed to U16_TRAIL, see utf_old.h. */
#define UTF16_TRAIL(supplementary) UTF_SECOND_SURROGATE(supplementary)
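
/*
 * Example (a minimal sketch; the function names are hypothetical and not
 * part of ICU): the surrogate arithmetic behind UTF_FIRST_SURROGATE(),
 * UTF_SECOND_SURROGATE() and UTF16_GET_PAIR_VALUE(). Adding
 * 0xd7c0 (= 0xd800 - (0x10000 >> 10)) to c >> 10 folds the "subtract
 * 0x10000" step into the lead surrogate, and the UTF_SURROGATE_OFFSET
 * constant removes both surrogate base values and re-adds 0x10000 in a
 * single subtraction.
 */
#include <stdint.h>

static uint16_t sketch_lead_surrogate(uint32_t c) {
    return (uint16_t)((c >> 10) + 0xd7c0);
}

static uint16_t sketch_trail_surrogate(uint32_t c) {
    return (uint16_t)((c & 0x3ff) | 0xdc00);
}

static uint32_t sketch_pair_value(uint16_t lead, uint16_t trail) {
    /* Same value as UTF_SURROGATE_OFFSET: (0xd800<<10)+0xdc00-0x10000 */
    const uint32_t offset = (0xd800UL << 10) + 0xdc00 - 0x10000;
    return ((uint32_t)lead << 10) + trail - offset;
}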

/** @deprecated ICU 2.4. Renamed to U16_IS_SINGLE, see utf_old.h. */
#define UTF16_IS_SINGLE(uchar) (!UTF_IS_SURROGATE(uchar))

/** @deprecated ICU 2.4. Renamed to U16_IS_LEAD, see utf_old.h. */
#define UTF16_IS_LEAD(uchar) UTF_IS_FIRST_SURROGATE(uchar)

/** @deprecated ICU 2.4. Renamed to U16_IS_TRAIL, see utf_old.h. */
#define UTF16_IS_TRAIL(uchar) UTF_IS_SECOND_SURROGATE(uchar)

/** Does this scalar Unicode value need multiple code units for storage? @deprecated ICU 2.4. Use U16_LENGTH or test ((uint32_t)(c)>0xffff) instead, see utf_old.h. */
#define UTF16_NEED_MULTIPLE_UCHAR(c) ((uint32_t)(c)>0xffff)

/** @deprecated ICU 2.4. Renamed to U16_LENGTH, see utf_old.h. */
#define UTF16_CHAR_LENGTH(c) ((uint32_t)(c)<=0xffff ? 1 : 2)

/** @deprecated ICU 2.4. Renamed to U16_MAX_LENGTH, see utf_old.h. */
#define UTF16_MAX_CHAR_LENGTH 2

/** Average number of code units compared to UTF-16. @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF16_ARRAY_SIZE(size) (size)

/**
 * Get a single code point from an offset that points to any
 * of the code units that belong to that code point.
 * Assumes 0<=i<length.
 *
 * This could be used for iteration together with
 * UTF16_CHAR_LENGTH() and UTF_IS_ERROR(),
 * but the use of UTF16_NEXT_CHAR[_UNSAFE]() and
 * UTF16_PREV_CHAR[_UNSAFE]() is more efficient for that.
 * @deprecated ICU 2.4. Renamed to U16_GET_UNSAFE, see utf_old.h.
 */
#define UTF16_GET_CHAR_UNSAFE(s, i, c) { \
    (c)=(s)[i]; \
    if(UTF_IS_SURROGATE(c)) { \
        if(UTF_IS_SURROGATE_FIRST(c)) { \
            (c)=UTF16_GET_PAIR_VALUE((c), (s)[(i)+1]); \
        } else { \
            (c)=UTF16_GET_PAIR_VALUE((s)[(i)-1], (c)); \
        } \
    } \
}

/** @deprecated ICU 2.4. Use U16_GET instead, see utf_old.h. */
#define UTF16_GET_CHAR_SAFE(s, start, i, length, c, strict) { \
    (c)=(s)[i]; \
    if(UTF_IS_SURROGATE(c)) { \
        uint16_t __c2; \
        if(UTF_IS_SURROGATE_FIRST(c)) { \
            if((i)+1<(length) && UTF_IS_SECOND_SURROGATE(__c2=(s)[(i)+1])) { \
                (c)=UTF16_GET_PAIR_VALUE((c), __c2); \
                /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \
            } else if(strict) {\
                /* unmatched first surrogate */ \
                (c)=UTF_ERROR_VALUE; \
            } \
        } else { \
            if((i)-1>=(start) && UTF_IS_FIRST_SURROGATE(__c2=(s)[(i)-1])) { \
                (c)=UTF16_GET_PAIR_VALUE(__c2, (c)); \
                /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \
            } else if(strict) {\
                /* unmatched second surrogate */ \
                (c)=UTF_ERROR_VALUE; \
            } \
        } \
    } else if((strict) && !UTF_IS_UNICODE_CHAR(c)) { \
        (c)=UTF_ERROR_VALUE; \
    } \
}
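
/*
 * Example (a minimal sketch; the function names are hypothetical and not
 * part of ICU): the decision logic of UTF16_GET_CHAR_SAFE(). The index i may
 * point at either half of a surrogate pair; the partner is looked up only
 * within start..length-1. With strict==0 an unpaired surrogate is returned
 * unchanged; with strict!=0 it, like any BMP noncharacter, becomes the error
 * value 0xffff (UTF_ERROR_VALUE). As in the macro, a paired supplementary
 * noncharacter such as U+1FFFE is returned as-is and left to
 * UTF_IS_ERROR()/UTF_IS_UNICODE_CHAR().
 */
#include <stdint.h>

static int sketch_u16_is_char(uint32_t c) {
    return c < 0xd800 ||
           (c > 0xdfff && c <= 0x10ffff &&
            !(c >= 0xfdd0 && (c <= 0xfdef || (c & 0xfffe) == 0xfffe)));
}

static int32_t sketch_u16_pair(uint32_t lead, uint32_t trail) {
    return (int32_t)(((lead - 0xd800) << 10) + (trail - 0xdc00) + 0x10000);
}

static int32_t sketch_u16_get_safe(const uint16_t *s, int32_t start, int32_t i,
                                   int32_t length, int strict) {
    uint32_t c = s[i];
    if ((c & 0xf800) == 0xd800) {                   /* any surrogate */
        if ((c & 0x400) == 0) {                     /* lead surrogate: look ahead */
            if (i + 1 < length && (s[i + 1] & 0xfc00) == 0xdc00) {
                return sketch_u16_pair(c, s[i + 1]);
            }
        } else if (i - 1 >= start && (s[i - 1] & 0xfc00) == 0xd800) {
            return sketch_u16_pair(s[i - 1], c);    /* trail surrogate: look back */
        }
        /* unpaired surrogate: kept as-is unless strict */
        return strict ? 0xffff : (int32_t)c;
    }
    /* BMP code point: in strict mode, reject noncharacters such as U+FDD0 */
    return (strict && !sketch_u16_is_char(c)) ? 0xffff : (int32_t)c;
}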

/** @deprecated ICU 2.4. Renamed to U16_NEXT_UNSAFE, see utf_old.h. */
#define UTF16_NEXT_CHAR_UNSAFE(s, i, c) { \
    (c)=(s)[(i)++]; \
    if(UTF_IS_FIRST_SURROGATE(c)) { \
        (c)=UTF16_GET_PAIR_VALUE((c), (s)[(i)++]); \
    } \
}

/** @deprecated ICU 2.4. Renamed to U16_APPEND_UNSAFE, see utf_old.h. */
#define UTF16_APPEND_CHAR_UNSAFE(s, i, c) { \
    if((uint32_t)(c)<=0xffff) { \
        (s)[(i)++]=(uint16_t)(c); \
    } else { \
        (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
        (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
    } \
}

/** @deprecated ICU 2.4. Renamed to U16_FWD_1_UNSAFE, see utf_old.h. */
#define UTF16_FWD_1_UNSAFE(s, i) { \
    if(UTF_IS_FIRST_SURROGATE((s)[(i)++])) { \
        ++(i); \
    } \
}

/** @deprecated ICU 2.4. Renamed to U16_FWD_N_UNSAFE, see utf_old.h. */
#define UTF16_FWD_N_UNSAFE(s, i, n) { \
    int32_t __N=(n); \
    while(__N>0) { \
        UTF16_FWD_1_UNSAFE(s, i); \
        --__N; \
    } \
}
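
/*
 * Example (a minimal sketch; the function name is hypothetical and not part
 * of ICU): counting code points by repeating the UTF16_FWD_1_UNSAFE() step,
 * which UTF16_FWD_N_UNSAFE() performs n times. A lead surrogate always skips
 * the following unit without looking at it, so an unpaired lead surrogate
 * swallows the next code point; the safe U16_FWD_1() additionally checks the
 * length and that the next unit really is a trail surrogate.
 */
#include <stdint.h>

static int32_t sketch_u16_count_unsafe(const uint16_t *s, int32_t length) {
    int32_t i = 0, count = 0;
    while (i < length) {
        if ((s[i++] & 0xfc00) == 0xd800) {  /* lead surrogate: skip its partner */
            ++i;
        }
        ++count;
    }
    return count;
}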

/** @deprecated ICU 2.4. Renamed to U16_SET_CP_START_UNSAFE, see utf_old.h. */
#define UTF16_SET_CHAR_START_UNSAFE(s, i) { \
    if(UTF_IS_SECOND_SURROGATE((s)[i])) { \
        --(i); \
    } \
}

/** @deprecated ICU 2.4. Use U16_NEXT instead, see utf_old.h. */
#define UTF16_NEXT_CHAR_SAFE(s, i, length, c, strict) { \
    (c)=(s)[(i)++]; \
    if(UTF_IS_FIRST_SURROGATE(c)) { \
        uint16_t __c2; \
        if((i)<(length) && UTF_IS_SECOND_SURROGATE(__c2=(s)[(i)])) { \
            ++(i); \
            (c)=UTF16_GET_PAIR_VALUE((c), __c2); \
            /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \
        } else if(strict) {\
            /* unmatched first surrogate */ \
            (c)=UTF_ERROR_VALUE; \
        } \
    } else if((strict) && !UTF_IS_UNICODE_CHAR(c)) { \
        /* unmatched second surrogate or other non-character */ \
        (c)=UTF_ERROR_VALUE; \
    } \
}

/** @deprecated ICU 2.4. Use U16_APPEND instead, see utf_old.h. */
#define UTF16_APPEND_CHAR_SAFE(s, i, length, c) { \
    if((uint32_t)(c)<=0xffff) { \
        (s)[(i)++]=(uint16_t)(c); \
    } else if((uint32_t)(c)<=0x10ffff) { \
        if((i)+1<(length)) { \
            (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
            (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
        } else /* not enough space */ { \
            (s)[(i)++]=UTF_ERROR_VALUE; \
        } \
    } else /* c>0x10ffff, write error value */ { \
        (s)[(i)++]=UTF_ERROR_VALUE; \
    } \
}
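
/*
 * Example (a minimal sketch; the function name is hypothetical and not part
 * of ICU): the behavior of UTF16_APPEND_CHAR_SAFE() as a function returning
 * the new index. A supplementary code point needs two free units
 * (i+1 < length); if only one is left, or c is above 0x10ffff, a single
 * 0xffff is written instead. As in the macro, the BMP branch does not check
 * i < length, and the only sign of an error is that value, which is why the
 * replacement U16_APPEND() gained an error output.
 */
#include <stdint.h>

static int32_t sketch_u16_append_safe(uint16_t *s, int32_t i, int32_t length, uint32_t c) {
    if (c <= 0xffff) {
        s[i++] = (uint16_t)c;           /* caller must ensure i < length */
    } else if (c <= 0x10ffff && i + 1 < length) {
        s[i++] = (uint16_t)((c >> 10) + 0xd7c0);
        s[i++] = (uint16_t)((c & 0x3ff) | 0xdc00);
    } else {
        s[i++] = 0xffff;                /* no room for a pair, or c > 0x10ffff */
    }
    return i;
}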

/** @deprecated ICU 2.4. Renamed to U16_FWD_1, see utf_old.h. */
#define UTF16_FWD_1_SAFE(s, i, length) U16_FWD_1(s, i, length)

/** @deprecated ICU 2.4. Renamed to U16_FWD_N, see utf_old.h. */
#define UTF16_FWD_N_SAFE(s, i, length, n) U16_FWD_N(s, i, length, n)

/** @deprecated ICU 2.4. Renamed to U16_SET_CP_START, see utf_old.h. */
#define UTF16_SET_CHAR_START_SAFE(s, start, i) U16_SET_CP_START(s, start, i)

/** @deprecated ICU 2.4. Renamed to U16_PREV_UNSAFE, see utf_old.h. */
#define UTF16_PREV_CHAR_UNSAFE(s, i, c) { \
    (c)=(s)[--(i)]; \
    if(UTF_IS_SECOND_SURROGATE(c)) { \
        (c)=UTF16_GET_PAIR_VALUE((s)[--(i)], (c)); \
    } \
}

/** @deprecated ICU 2.4. Renamed to U16_BACK_1_UNSAFE, see utf_old.h. */
#define UTF16_BACK_1_UNSAFE(s, i) { \
    if(UTF_IS_SECOND_SURROGATE((s)[--(i)])) { \
        --(i); \
    } \
}

/** @deprecated ICU 2.4. Renamed to U16_BACK_N_UNSAFE, see utf_old.h. */
#define UTF16_BACK_N_UNSAFE(s, i, n) { \
    int32_t __N=(n); \
    while(__N>0) { \
        UTF16_BACK_1_UNSAFE(s, i); \
        --__N; \
    } \
}

/** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT_UNSAFE, see utf_old.h. */
#define UTF16_SET_CHAR_LIMIT_UNSAFE(s, i) { \
    if(UTF_IS_FIRST_SURROGATE((s)[(i)-1])) { \
        ++(i); \
    } \
}

/** @deprecated ICU 2.4. Use U16_PREV instead, see utf_old.h. */
#define UTF16_PREV_CHAR_SAFE(s, start, i, c, strict) { \
    (c)=(s)[--(i)]; \
    if(UTF_IS_SECOND_SURROGATE(c)) { \
        uint16_t __c2; \
        if((i)>(start) && UTF_IS_FIRST_SURROGATE(__c2=(s)[(i)-1])) { \
            --(i); \
            (c)=UTF16_GET_PAIR_VALUE(__c2, (c)); \
            /* strict: ((c)&0xfffe)==0xfffe is caught by UTF_IS_ERROR() and UTF_IS_UNICODE_CHAR() */ \
        } else if(strict) {\
            /* unmatched second surrogate */ \
            (c)=UTF_ERROR_VALUE; \
        } \
    } else if((strict) && !UTF_IS_UNICODE_CHAR(c)) { \
        /* unmatched first surrogate or other non-character */ \
        (c)=UTF_ERROR_VALUE; \
    } \
}

/** @deprecated ICU 2.4. Renamed to U16_BACK_1, see utf_old.h. */
#define UTF16_BACK_1_SAFE(s, start, i) U16_BACK_1(s, start, i)

/** @deprecated ICU 2.4. Renamed to U16_BACK_N, see utf_old.h. */
#define UTF16_BACK_N_SAFE(s, start, i, n) U16_BACK_N(s, start, i, n)

/** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT, see utf_old.h. */
#define UTF16_SET_CHAR_LIMIT_SAFE(s, start, i, length) U16_SET_CP_LIMIT(s, start, i, length)

/* Formerly utf32.h --------------------------------------------------------- */

/*
 * Old documentation:
 *
 * This file defines macros to deal with UTF-32 code units and code points.
 * Signatures and semantics are the same as for the similarly named macros
 * in utf16.h.
 * utf32.h is included by utf.h after unicode/umachine.h
 * and some common definitions.
 * <p><b>Usage:</b> ICU coding guidelines for if() statements should be followed when using these macros.
 * Compound statements (curly braces {}) must be used for if-else-while...
 * bodies and all macro statements should be terminated with a semicolon.</p>
 */

/* internal definitions ----------------------------------------------------- */

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_IS_SAFE(c, strict) \
    (!(strict) ? \
        (uint32_t)(c)<=0x10ffff : \
        UTF_IS_UNICODE_CHAR(c))

/*
 * For the semantics of all of these macros, see utf16.h.
 * The UTF-32 versions are trivial because any code point is
 * encoded using exactly one code unit.
 */

/* single-code point definitions -------------------------------------------- */

/* classes of code unit values */

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_IS_SINGLE(uchar) 1
/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_IS_LEAD(uchar) 0
/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_IS_TRAIL(uchar) 0

/* number of code units per code point */

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_NEED_MULTIPLE_UCHAR(c) 0
/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_CHAR_LENGTH(c) 1
/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_MAX_CHAR_LENGTH 1

/* average number of code units compared to UTF-16 */

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_ARRAY_SIZE(size) (size)

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_GET_CHAR_UNSAFE(s, i, c) { \
    (c)=(s)[i]; \
}

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_GET_CHAR_SAFE(s, start, i, length, c, strict) { \
    (c)=(s)[i]; \
    if(!UTF32_IS_SAFE(c, strict)) { \
        (c)=UTF_ERROR_VALUE; \
    } \
}

/* definitions with forward iteration --------------------------------------- */

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_NEXT_CHAR_UNSAFE(s, i, c) { \
    (c)=(s)[(i)++]; \
}

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_APPEND_CHAR_UNSAFE(s, i, c) { \
    (s)[(i)++]=(c); \
}

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_FWD_1_UNSAFE(s, i) { \
    ++(i); \
}

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_FWD_N_UNSAFE(s, i, n) { \
    (i)+=(n); \
}

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_SET_CHAR_START_UNSAFE(s, i) { \
}

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_NEXT_CHAR_SAFE(s, i, length, c, strict) { \
    (c)=(s)[(i)++]; \
    if(!UTF32_IS_SAFE(c, strict)) { \
        (c)=UTF_ERROR_VALUE; \
    } \
}
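
/*
 * Migration sketch (illustrative only, not part of the ICU API): because every
 * UTF-32 code unit is a whole code point, UTF32_NEXT_CHAR_SAFE with strict==FALSE
 * reduces to a plain array read plus a range check, so it can be replaced with
 * ordinary C. Here s, i and c are hypothetical caller variables
 * (const UChar32 *s; int32_t i; UChar32 c). The macro above never reads its
 * length argument, so the caller must already guarantee i<length.
 *
 *     c=s[i++];
 *     if((uint32_t)c>0x10ffff) {
 *         c=UTF_ERROR_VALUE;  // same result as UTF32_NEXT_CHAR_SAFE(s, i, length, c, FALSE)
 *     }
 */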

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_APPEND_CHAR_SAFE(s, i, length, c) { \
    if((uint32_t)(c)<=0x10ffff) { \
        (s)[(i)++]=(c); \
    } else /* c>0x10ffff, write 0xfffd */ { \
        (s)[(i)++]=0xfffd; \
    } \
}
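
/*
 * Usage sketch (illustrative only; buf and i are hypothetical caller
 * variables): UTF32_APPEND_CHAR_SAFE substitutes U+FFFD for any value above
 * 0x10ffff. Note that the macro above does not check its length argument,
 * so the caller must ensure there is room at index i.
 *
 *     UChar32 buf[2];
 *     int32_t i=0;
 *     UTF32_APPEND_CHAR_SAFE(buf, i, 2, 0x1f600);   // buf[0]==0x1f600, i==1
 *     UTF32_APPEND_CHAR_SAFE(buf, i, 2, 0x110000);  // out of range: buf[1]==0xfffd, i==2
 */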

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_FWD_1_SAFE(s, i, length) { \
    ++(i); \
}

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_FWD_N_SAFE(s, i, length, n) { \
    if(((i)+=(n))>(length)) { \
        (i)=(length); \
    } \
}

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_SET_CHAR_START_SAFE(s, start, i) { \
}

/* definitions with backward iteration -------------------------------------- */

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_PREV_CHAR_UNSAFE(s, i, c) { \
    (c)=(s)[--(i)]; \
}

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_BACK_1_UNSAFE(s, i) { \
    --(i); \
}

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_BACK_N_UNSAFE(s, i, n) { \
    (i)-=(n); \
}

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_SET_CHAR_LIMIT_UNSAFE(s, i) { \
}

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_PREV_CHAR_SAFE(s, start, i, c, strict) { \
    (c)=(s)[--(i)]; \
    if(!UTF32_IS_SAFE(c, strict)) { \
        (c)=UTF_ERROR_VALUE; \
    } \
}

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_BACK_1_SAFE(s, start, i) { \
    --(i); \
}

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_BACK_N_SAFE(s, start, i, n) { \
    (i)-=(n); \
    if((i)<(start)) { \
        (i)=(start); \
    } \
}

/** @deprecated ICU 2.4. Obsolete, see utf_old.h. */
#define UTF32_SET_CHAR_LIMIT_SAFE(s, i, length) { \
}

/* Formerly utf.h, part 2 --------------------------------------------------- */

/**
 * Estimate the number of code units for a string based on the number of UTF-16 code units.
 *
 * @deprecated ICU 2.4. Obsolete, see utf_old.h.
 */
#define UTF_ARRAY_SIZE(size) UTF16_ARRAY_SIZE(size)

/** @deprecated ICU 2.4. Renamed to U16_GET_UNSAFE, see utf_old.h. */
#define UTF_GET_CHAR_UNSAFE(s, i, c) UTF16_GET_CHAR_UNSAFE(s, i, c)

/** @deprecated ICU 2.4. Use U16_GET instead, see utf_old.h. */
#define UTF_GET_CHAR_SAFE(s, start, i, length, c, strict) UTF16_GET_CHAR_SAFE(s, start, i, length, c, strict)


/** @deprecated ICU 2.4. Renamed to U16_NEXT_UNSAFE, see utf_old.h. */
#define UTF_NEXT_CHAR_UNSAFE(s, i, c) UTF16_NEXT_CHAR_UNSAFE(s, i, c)

/** @deprecated ICU 2.4. Use U16_NEXT instead, see utf_old.h. */
#define UTF_NEXT_CHAR_SAFE(s, i, length, c, strict) UTF16_NEXT_CHAR_SAFE(s, i, length, c, strict)


/** @deprecated ICU 2.4. Renamed to U16_APPEND_UNSAFE, see utf_old.h. */
#define UTF_APPEND_CHAR_UNSAFE(s, i, c) UTF16_APPEND_CHAR_UNSAFE(s, i, c)

/** @deprecated ICU 2.4. Use U16_APPEND instead, see utf_old.h. */
#define UTF_APPEND_CHAR_SAFE(s, i, length, c) UTF16_APPEND_CHAR_SAFE(s, i, length, c)


/** @deprecated ICU 2.4. Renamed to U16_FWD_1_UNSAFE, see utf_old.h. */
#define UTF_FWD_1_UNSAFE(s, i) UTF16_FWD_1_UNSAFE(s, i)

/** @deprecated ICU 2.4. Renamed to U16_FWD_1, see utf_old.h. */
#define UTF_FWD_1_SAFE(s, i, length) UTF16_FWD_1_SAFE(s, i, length)


/** @deprecated ICU 2.4. Renamed to U16_FWD_N_UNSAFE, see utf_old.h. */
#define UTF_FWD_N_UNSAFE(s, i, n) UTF16_FWD_N_UNSAFE(s, i, n)

/** @deprecated ICU 2.4. Renamed to U16_FWD_N, see utf_old.h. */
#define UTF_FWD_N_SAFE(s, i, length, n) UTF16_FWD_N_SAFE(s, i, length, n)


/** @deprecated ICU 2.4. Renamed to U16_SET_CP_START_UNSAFE, see utf_old.h. */
#define UTF_SET_CHAR_START_UNSAFE(s, i) UTF16_SET_CHAR_START_UNSAFE(s, i)

/** @deprecated ICU 2.4. Renamed to U16_SET_CP_START, see utf_old.h. */
#define UTF_SET_CHAR_START_SAFE(s, start, i) UTF16_SET_CHAR_START_SAFE(s, start, i)


/** @deprecated ICU 2.4. Renamed to U16_PREV_UNSAFE, see utf_old.h. */
#define UTF_PREV_CHAR_UNSAFE(s, i, c) UTF16_PREV_CHAR_UNSAFE(s, i, c)

/** @deprecated ICU 2.4. Use U16_PREV instead, see utf_old.h. */
#define UTF_PREV_CHAR_SAFE(s, start, i, c, strict) UTF16_PREV_CHAR_SAFE(s, start, i, c, strict)


/** @deprecated ICU 2.4. Renamed to U16_BACK_1_UNSAFE, see utf_old.h. */
#define UTF_BACK_1_UNSAFE(s, i) UTF16_BACK_1_UNSAFE(s, i)

/** @deprecated ICU 2.4. Renamed to U16_BACK_1, see utf_old.h. */
#define UTF_BACK_1_SAFE(s, start, i) UTF16_BACK_1_SAFE(s, start, i)


/** @deprecated ICU 2.4. Renamed to U16_BACK_N_UNSAFE, see utf_old.h. */
#define UTF_BACK_N_UNSAFE(s, i, n) UTF16_BACK_N_UNSAFE(s, i, n)

/** @deprecated ICU 2.4. Renamed to U16_BACK_N, see utf_old.h. */
#define UTF_BACK_N_SAFE(s, start, i, n) UTF16_BACK_N_SAFE(s, start, i, n)


/** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT_UNSAFE, see utf_old.h. */
#define UTF_SET_CHAR_LIMIT_UNSAFE(s, i) UTF16_SET_CHAR_LIMIT_UNSAFE(s, i)

/** @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT, see utf_old.h. */
#define UTF_SET_CHAR_LIMIT_SAFE(s, start, i, length) UTF16_SET_CHAR_LIMIT_SAFE(s, start, i, length)

/* Define default macros (UTF-16 "safe") ------------------------------------ */

/**
 * Does this code unit alone encode a code point (BMP, not a surrogate)?
 * Same as UTF16_IS_SINGLE.
 * @deprecated ICU 2.4. Renamed to U_IS_SINGLE and U16_IS_SINGLE, see utf_old.h.
 */
#define UTF_IS_SINGLE(uchar) U16_IS_SINGLE(uchar)

/**
 * Is this code unit the first one of several (a lead surrogate)?
 * Same as UTF16_IS_LEAD.
 * @deprecated ICU 2.4. Renamed to U_IS_LEAD and U16_IS_LEAD, see utf_old.h.
 */
#define UTF_IS_LEAD(uchar) U16_IS_LEAD(uchar)

/**
 * Is this code unit one of several but not the first one (a trail surrogate)?
 * Same as UTF16_IS_TRAIL.
 * @deprecated ICU 2.4. Renamed to U_IS_TRAIL and U16_IS_TRAIL, see utf_old.h.
 */
#define UTF_IS_TRAIL(uchar) U16_IS_TRAIL(uchar)

/**
 * Does this code point require multiple code units (is it a supplementary code point)?
 * Same as UTF16_NEED_MULTIPLE_UCHAR.
 * @deprecated ICU 2.4. Use U16_LENGTH or test ((uint32_t)(c)>0xffff) instead.
 */
#define UTF_NEED_MULTIPLE_UCHAR(c) UTF16_NEED_MULTIPLE_UCHAR(c)

/**
 * How many code units are used to encode this code point (1 or 2)?
 * Same as UTF16_CHAR_LENGTH.
 * @deprecated ICU 2.4. Renamed to U16_LENGTH, see utf_old.h.
 */
#define UTF_CHAR_LENGTH(c) U16_LENGTH(c)

/**
 * How many code units are used at most for any Unicode code point (2)?
 * Same as UTF16_MAX_CHAR_LENGTH.
 * @deprecated ICU 2.4. Renamed to U16_MAX_LENGTH, see utf_old.h.
 */
#define UTF_MAX_CHAR_LENGTH U16_MAX_LENGTH

/**
 * Set c to the code point that contains the code unit i.
 * i could point to the lead or the trail surrogate for the code point.
 * i is not modified.
 * Same as UTF16_GET_CHAR.
 * \pre start<=i<length
 *
 * @deprecated ICU 2.4. Renamed to U16_GET, see utf_old.h.
 */
#define UTF_GET_CHAR(s, start, i, length, c) U16_GET(s, start, i, length, c)

/**
 * Set c to the code point that starts at code unit i
 * and advance i to beyond the code units of this code point (post-increment).
 * i must point to the first code unit of a code point.
 * If i points to a trail surrogate instead, c is set to that trail surrogate itself.
 * Same as UTF16_NEXT_CHAR.
 * \pre 0<=i<length
 * \post 0<i<=length
 *
 * @deprecated ICU 2.4. Renamed to U16_NEXT, see utf_old.h.
 */
#define UTF_NEXT_CHAR(s, i, length, c) U16_NEXT(s, i, length, c)
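
/*
 * Migration sketch (illustrative only; count_supplementary is a hypothetical
 * helper, not an ICU function): a forward loop over a UTF-16 string using
 * U16_NEXT, which takes the same arguments as UTF_NEXT_CHAR. A lead surrogate
 * followed by a trail surrogate is combined into one supplementary code point;
 * an unpaired surrogate is returned as itself.
 *
 *     static int32_t count_supplementary(const UChar *s, int32_t length) {
 *         int32_t i=0, count=0;
 *         UChar32 c;
 *         while(i<length) {
 *             U16_NEXT(s, i, length, c);  // advances i by 1 or 2 code units
 *             if(c>0xffff) {
 *                 ++count;
 *             }
 *         }
 *         return count;
 *     }
 *
 * For the three code units { 0x0061, 0xd83d, 0xde00 } it returns 1.
 */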

/**
 * Append the code units of code point c to the string at index i
 * and advance i to beyond the new code units (post-increment).
 * The code units beginning at index i will be overwritten.
 * Same as UTF16_APPEND_CHAR.
 * \pre 0<=c<=0x10ffff
 * \pre 0<=i<length
 * \post 0<i<=length
 *
 * @deprecated ICU 2.4. Use U16_APPEND instead, see utf_old.h.
 */
#define UTF_APPEND_CHAR(s, i, length, c) UTF16_APPEND_CHAR_SAFE(s, i, length, c)
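
/*
 * Migration sketch (illustrative only; buf, i and isError are hypothetical
 * caller variables; check utf16.h for the exact contract): U16_APPEND takes
 * the buffer capacity plus an extra UBool isError argument instead of the old
 * length-only form. It sets isError to TRUE, without writing, when c is above
 * 0x10ffff or a surrogate pair does not fit; it never resets isError to FALSE,
 * so initialize it before use.
 *
 *     UChar buf[4];
 *     int32_t i=0;
 *     UBool isError=FALSE;
 *     U16_APPEND(buf, i, 4, 0x61, isError);     // buf[0]==0x61, i==1
 *     U16_APPEND(buf, i, 4, 0x1f600, isError);  // buf[1]==0xd83d, buf[2]==0xde00, i==3
 *     U16_APPEND(buf, i, 4, 0x1f600, isError);  // only 1 unit left: isError==TRUE, i==3
 */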

/**
 * Advance i to beyond the code units of the code point that begins at i.
 * I.e., advance i by one code point.
 * Same as UTF16_FWD_1.
 * \pre 0<=i<length
 * \post 0<i<=length
 *
 * @deprecated ICU 2.4. Renamed to U16_FWD_1, see utf_old.h.
 */
#define UTF_FWD_1(s, i, length) U16_FWD_1(s, i, length)

/**
 * Advance i to beyond the code units of the n code points where the first one begins at i.
 * I.e., advance i by n code points.
 * Same as UTF16_FWD_N.
 * \pre 0<=i<length
 * \post 0<i<=length
 *
 * @deprecated ICU 2.4. Renamed to U16_FWD_N, see utf_old.h.
 */
#define UTF_FWD_N(s, i, length, n) U16_FWD_N(s, i, length, n)

/**
 * Take the random-access index i and adjust it so that it points to the beginning
 * of a code point.
 * The input index points to any code unit of a code point and is moved to point to
 * the first code unit of the same code point. i is never incremented.
 * In other words, if i points to a trail surrogate that is preceded by a matching
 * lead surrogate, then i is decremented. Otherwise it is not modified.
 * This can be used to start an iteration with UTF_NEXT_CHAR() from a random index.
 * Same as UTF16_SET_CHAR_START.
 * \pre start<=i<length
 * \post start<=i<length
 *
 * @deprecated ICU 2.4. Renamed to U16_SET_CP_START, see utf_old.h.
 */
#define UTF_SET_CHAR_START(s, start, i) U16_SET_CP_START(s, start, i)
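
/*
 * Migration sketch (illustrative only; s, length, k and c are hypothetical
 * caller variables, with 0<=k<length): reading the whole code point that
 * contains an arbitrary code unit index k with the non-deprecated macros.
 *
 *     int32_t i=k;
 *     UChar32 c;
 *     U16_SET_CP_START(s, 0, i);   // step back to the lead surrogate if s[k] is a paired trail
 *     U16_NEXT(s, i, length, c);   // c is the full code point; i is now just past it
 *
 * When only c is needed, U16_GET(s, 0, k, length, c) yields the same value
 * without moving the index.
 */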

/**
 * Set c to the code point that has code units before i
 * and move i backward (towards the beginning of the string)
 * to the first code unit of this code point (pre-decrement).
 * i must point to the first code unit after the last unit of a code point (i==length is allowed).
 * Same as UTF16_PREV_CHAR.
 * \pre start<i<=length
 * \post start<=i<length
 *
 * @deprecated ICU 2.4. Renamed to U16_PREV, see utf_old.h.
 */
#define UTF_PREV_CHAR(s, start, i, c) U16_PREV(s, start, i, c)
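
/*
 * Migration sketch (illustrative only; s, length and c are hypothetical
 * caller variables): iterating backward from the end of a UTF-16 string
 * with U16_PREV, which takes the same arguments as UTF_PREV_CHAR.
 * A valid surrogate pair is returned as one supplementary code point.
 *
 *     int32_t i=length;
 *     UChar32 c;
 *     while(i>0) {
 *         U16_PREV(s, 0, i, c);   // moves i back by 1 or 2 code units
 *         // ... process c ...
 *     }
 */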

/**
 * Move i backward (towards the beginning of the string)
 * to the first code unit of the code point that has code units before i.
 * I.e., move i backward by one code point.
 * i must point to the first code unit after the last unit of a code point (i==length is allowed).
 * Same as UTF16_BACK_1.
 * \pre start<i<=length
 * \post start<=i<length
 *
 * @deprecated ICU 2.4. Renamed to U16_BACK_1, see utf_old.h.
 */
#define UTF_BACK_1(s, start, i) U16_BACK_1(s, start, i)

/**
 * Move i backward (towards the beginning of the string)
 * to the first code unit of the n code points that have code units before i.
 * I.e., move i backward by n code points.
 * i must point to the first code unit after the last unit of a code point (i==length is allowed).
 * Same as UTF16_BACK_N.
 * \pre start<i<=length
 * \post start<=i<length
 *
 * @deprecated ICU 2.4. Renamed to U16_BACK_N, see utf_old.h.
 */
#define UTF_BACK_N(s, start, i, n) U16_BACK_N(s, start, i, n)

/**
 * Take the random-access index i and adjust it so that it points beyond
 * a code point. The input index points beyond any code unit
 * of a code point and is moved to point beyond the last code unit of the same
 * code point. i is never decremented.
 * In other words, if i points to a trail surrogate that is preceded by a matching
 * lead surrogate, then i is incremented. Otherwise it is not modified.
 * This can be used to start an iteration with UTF_PREV_CHAR() from a random index.
 * Same as UTF16_SET_CHAR_LIMIT.
 * \pre start<i<=length
 * \post start<i<=length
 *
 * @deprecated ICU 2.4. Renamed to U16_SET_CP_LIMIT, see utf_old.h.
 */
#define UTF_SET_CHAR_LIMIT(s, start, i, length) U16_SET_CP_LIMIT(s, start, i, length)
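
/*
 * Migration sketch (illustrative only; s, length, k and c are hypothetical
 * caller variables, with 0<k<=length): reading the code point that contains
 * the code unit at k-1, going backward, with the non-deprecated macros.
 *
 *     int32_t i=k;
 *     UChar32 c;
 *     U16_SET_CP_LIMIT(s, 0, i, length);  // step past a trail surrogate if k splits a pair
 *     U16_PREV(s, 0, i, c);               // c is the code point just before i
 */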

#endif /* U_HIDE_DEPRECATED_API */

#endif
