# Copyright (C) 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html
# Copyright (c) 2012-2015 International Business Machines
# Corporation and others. All Rights Reserved.
#
# This file should be in UTF-8 with a signature byte sequence ("BOM").
#
# collationtest.txt: Collation test data.
#
# created on: 2012apr13
# created by: Markus W. Scherer

# A line with "** test: description" is used for verbose and error output.

# A collator can be set with "@ root" or "@ locale language-tag",
# for example "@ locale de-u-co-phonebk".
# An old-style locale ID can also be used, for example "@ locale de@collation=phonebook".

# A collator can be built with "@ rules".
# An "@ rules" line is followed by one or more lines with the tailoring rules.

# A collator can be modified with "% attribute=value".

# "* compare" tests the order (= or <) of the following strings.
# The relation can be "=" or "<" (the level of the difference is not specified)
# or "<1", "<2", "<c", "<3", "<4", "<i" (indicating the level of the difference:
# "<c" is the case level and "<i" the identical level).

# Test sections ("* compare") are terminated by
# definitions of new collators, changing attributes, or new test sections.

** test: simple CEs & expansions
# Many types of mappings are tested elsewhere, including via the UCA conformance tests.
# Here we mostly cover a few unusual mappings.
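The header comments above describe a small line-oriented format. As a rough sketch (not ICU's own test parser), one `* compare` line can be split into its relation token and its test string with escapes decoded. `RELATIONS`, `unescape` and `parse_compare_line` are hypothetical names. The escape forms `\xXX`, `\uXXXX` and `\UXXXXXXXX` are inferred from the strings in this file. The comment rule assumes that no test string contains whitespace followed by `#`.

```python
import re

# Relation tokens for "* compare" lines. The header documents "=", "<",
# "<1", "<2", "<c", "<3" and "<4"; "<i" marks identical-level differences
# elsewhere in this file. The descriptions are informal labels.
RELATIONS = {
    "=": "equal",
    "<": "less, level unspecified",
    "<1": "primary",
    "<2": "secondary",
    "<c": "case level",
    "<3": "tertiary",
    "<4": "quaternary",
    "<i": "identical",
}

# Escape forms seen in this file: \xXX, \uXXXX and \UXXXXXXXX.
_ESCAPE = re.compile(r"\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))")


def unescape(text):
    """Replace each escape sequence with the code point it names."""
    def repl(match):
        digits = next(g for g in match.groups() if g is not None)
        return chr(int(digits, 16))
    return _ESCAPE.sub(repl, text)


def parse_compare_line(line):
    """Split one '* compare' line into (relation, decoded string).

    A trailing comment (whitespace followed by '#') is dropped, which assumes
    the test string itself never contains such a sequence.
    """
    relation, _, rest = line.strip().partition(" ")
    if relation not in RELATIONS:
        raise ValueError("unknown relation: %r" % relation)
    rest = re.split(r"\s#", rest, maxsplit=1)[0]
    return relation, unescape(rest.strip())
```

For example, `parse_compare_line("<1 \\U0001D158\\U0001D165")` yields the relation `<1` and the two-code-point string for the musical quarter note. A full runner would also remember the previous string so that each relation is checked against it, starting from the empty string.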
@ rules
&\x01 # most control codes are ignorable
<<<\u0300 # tertiary CE
&9<\x00 # NUL not ignorable
&\uA00A\uA00B=\uA002 # two long-primary CEs
&\uA00A\uA00B\u00050005=\uA003 # three CEs, require 64 bits

* compare
= \x01
= \x02
<3 \u0300
<1 9
<1 \x00
= \x01\x00\x02
<1 a
<3 a\u0300
<2 a\u0308
= ä
<1 b
<1 か # Hiragana Ka (U+304B)
<2 か\u3099 # plus voiced sound mark
= が # Hiragana Ga (U+304C)
<1 \uA00A\uA00B
= \uA002
<1 \uA00A\uA00B\u00050004
<1 \uA00A\uA00B\u00050005
= \uA003
<1 \uA00A\uA00B\u00050006

** test: contractions
# Create some interesting mappings, and map some normalization-inert characters
# (which are not subject to canonical reordering)
# to some of the same CEs to check the sequence of CEs.
@ rules

# Contractions starting with 'a' should not continue with any character < U+0300
# so that we can test a shortcut for that.
&a=ⓐ
&b<bz=ⓑ
&d<dz\u0301=ⓓ # d+z+acute
&z
<a\u0301=Ⓐ # a+acute sorts after z
<a\u0301\u0301=Ⓑ # a+acute+acute
<a\u0301\u0301\u0358=Ⓒ # a+acute+acute+dot above right
<a\u030a=Ⓓ # a+ring
<a\u0323=Ⓔ # a+dot below
<a\u0323\u0358=Ⓕ # a+dot below+dot above right
<a\u0327\u0323\u030a=Ⓖ # a+cedilla+dot below+ring
<a\u0327\u0323bz=Ⓗ # a+cedilla+dot below+b+z

&\U0001D158=⁰ # musical notehead black (has a symbol primary)
<\U0001D158\U0001D165=¼ # musical quarter note

# deliberately missing prefix contractions:
# dz
# a\u0327
# a\u0327\u0323
# a\u0327\u0323b

&\x01
<<<\U0001D165=¹ # musical stem (ccc=216)
<<<\U0001D16D=² # musical augmentation dot (ccc=226)
<<<\U0001D165\U0001D16D=³ # stem+dot (ccc=216 226)
&\u0301=❶ # acute (ccc=230)
&\u030a=❷ # ring (ccc=230)
&\u0308=❸ # diaeresis (ccc=230)
<<\u0308\u0301=❹ # diaeresis+acute (=dialytika tonos) (ccc=230 230)
&\u0327=❺ # cedilla (ccc=202)
&\u0323=❻ # dot below (ccc=220)
&\u0331=❼ # macron below (ccc=220)
<<\u0331\u0358=❽ # macron below+dot above right (ccc=220 232)
&\u0334=❾ # tilde overlay (ccc=1)
&\u0358=❿ # dot above right (ccc=232)

&\u0f71=① # tibetan vowel sign aa
&\u0f72=② # tibetan vowel sign i
# \u0f71\u0f72 # tibetan vowel sign aa + i = ii = U+0F73
&\u0f73=③ # tibetan vowel sign ii (ccc=0 but lccc=129)

** test: simple contractions

# Some strings are chosen to cause incremental contiguous contraction matching to
# go into partial matches for prefixes of contractions
# (where the prefixes are deliberately not also contractions).
# When there is no complete match, then the matching code must back out of those
# so that discontiguous contractions work as specified.

* compare
# contraction starter with no following text, or mismatch, or blocked
<1 a
= ⓐ
<1 aa
= ⓐⓐ
<1 ab
= ⓐb
<1 az
= ⓐz

* compare
<1 a
<2 a\u0308\u030a # ring blocked by diaeresis
= ⓐ❸❷
<2 a\u0327
= ⓐ❺

* compare
<2 \u0308
= ❸
<2 \u0308\u030a\u0301 # acute blocked by ring
= ❸❷❶

* compare
<1 \U0001D158
= ⁰
<1 \U0001D158\U0001D165
= ¼

# no discontiguous contraction because of missing prefix contraction d+z,
# and a starter ('z') after the 'd'
* compare
<1 dz\u0323\u0301
= dz❻❶

# contiguous contractions
* compare
<1 abz
= ⓐⓑ
<1 abzz
= ⓐⓑz

* compare
<1 a
<1 z
<1 a\u0301
= Ⓐ
<1 a\u0301\u0301
= Ⓑ
<1 a\u0301\u0301\u0358
= Ⓒ
<1 a\u030a
= Ⓓ
<1 a\u0323\u0358
= Ⓕ
<1 a\u0327\u0323\u030a # match despite missing prefix
= Ⓖ
<1 a\u0327\u0323bz
= Ⓗ

* compare
<2 \u0308\u0308\u0301 # acute blocked from first diaeresis, contracts with second
= ❸❹

* compare
<1 \U0001D158\U0001D165
= ¼

* compare
<3 \U0001D165\U0001D16D
= ³

** test: discontiguous contractions
* compare
<1 a\u0327\u030a # a+ring skips cedilla
= Ⓓ❺
<2 a\u0327\u0327\u030a # a+ring skips 2 cedillas
= Ⓓ❺❺
<2 a\u0327\u0327\u0327\u030a # a+ring skips 3 cedillas
= Ⓓ❺❺❺
<2 a\u0334\u0327\u0327\u030a # a+ring skips tilde overlay & 2 cedillas
= Ⓓ❾❺❺
<1 a\u0327\u0323 # a+dot below skips cedilla
= Ⓔ❺
<1 a\u0323\u0301\u0358 # a+dot below+dot ab.r.: 2-char match, then skips acute
= Ⓕ❶
<2 a\u0334\u0323\u0358 # a+dot below skips tilde overlay
= Ⓕ❾

* compare
<2 \u0331\u0331\u0358 # macron below+dot ab.r. skips the second macron below
= ❽❼

* compare
<1 a\u0327\u0331\u0323\u030a # a+ring skips cedilla, macron below, dot below (dot blocked by macron)
= Ⓓ❺❼❻
<1 a\u0327\u0323\U0001D16D\u030a # a+dot below skips cedilla
= Ⓔ❺²❷
<2 a\u0327\u0327\u0323\u030a # a+dot below skips 2 cedillas
= Ⓔ❺❺❷
<2 a\u0327\u0323\u0323\u030a # a+dot below skips cedilla
= Ⓔ❺❻❷
<2 a\u0334\u0327\u0323\u030a # a+dot below skips tilde overlay & cedilla
= Ⓔ❾❺❷

* compare
<1 \U0001D158\u0327\U0001D165 # quarter note skips cedilla
= ¼❺
<1 a\U0001D165\u0323 # a+dot below skips stem
= Ⓔ¹

# partial contiguous match, backs up, matches discontiguous contraction
<1 a\u0327\u0323b
= Ⓔ❺b
<1 a\u0327\u0323ba
= Ⓔ❺bⓐ

# a+acute+acute+dot above right skips cedilla, continues matching 2 same-ccc combining marks
* compare
<1 a\u0327\u0301\u0301\u0358
= Ⓒ❺

# FCD but not NFD
* compare
<1 a\u0f73\u0301 # a+acute skips tibetan ii
= Ⓐ③

# FCD but the 0f71 inside the 0f73 must be skipped
# to match the discontiguous contraction of the first 0f71 with the trailing 0f72 inside the 0f73
* compare
<1 \u0f71\u0f73 # == \u0f73\u0f71 == \u0f71\u0f71\u0f72
= ③①

** test: discontiguous contractions with nested contractions
* compare
<1 a\u0323\u0308\u0301\u0358
= Ⓕ❹
<2 a\u0323\u0308\u0301\u0308\u0301\u0358
= Ⓕ❹❹

** test: discontiguous contractions with interleaved contractions
* compare
# a+ring & cedilla & macron below+dot above right
<1 a\u0327\u0331\u030a\u0358
= Ⓓ❺❽

# a+ring & 1x..3x macron below+dot above right
<2 a\u0331\u030a\u0358
= Ⓓ❽
<2 a\u0331\u0331\u030a\u0358\u0358
= Ⓓ❽❽
# also skips acute
<2 a\u0331\u0331\u0331\u030a\u0301\u0358\u0358\u0358
= Ⓓ❽❽❽❶

# a+dot below & stem+augmentation dot, followed by contiguous d+z+acute
<1 a\U0001D165\u0323\U0001D16Ddz\u0301
= Ⓔ³ⓓ

** test: some simple string comparisons
@ root
* compare
# first string compares against ""
= \u0000
< a
<1 b
<3 B
= \u0000B\u0000

** test: compare with strength=primary
% strength=primary
* compare
<1 a
<1 b
= B

** test: compare with strength=secondary
% strength=secondary
* compare
<1 a
<1 b
= B

** test: compare with strength=tertiary
% strength=tertiary
* compare
<1 a
<1 b
<3 B

** test: compare with strength=quaternary
% strength=quaternary
* compare
<1 a
<1 b
<3 B

** test: compare with strength=identical
% strength=identical
* compare
<1 a
<1 b
<3 B

** test: côté with forwards secondary
@ root
* compare
<1 cote
<2 coté
<2 côte
<2 côté

** test: côté with forwards secondary vs. U+FFFE merge separator
# Merged sort keys: On each level, any difference in the first segment
# must trump any further difference.
* compare
<1 cote\uFFFEcôté
<2 coté\uFFFEcôte
<2 côte\uFFFEcoté
<2 côté\uFFFEcote

** test: côté with backwards secondary
% backwards=on
* compare
<1 cote
<2 côte
<2 coté
<2 côté

** test: côté with backwards secondary vs. U+FFFE merge separator
# Merged sort keys: On each level, any difference in the first segment
# must trump any further difference.
* compare
<1 cote\uFFFEcôté
<2 côte\uFFFEcoté
<2 coté\uFFFEcôte
<2 côté\uFFFEcote

** test: U+FFFE on identical level
@ root
% strength=identical
* compare
# All of these control codes are completely ignorable, so that
# their low code points are compared with the merge separator.
# The merge separator must compare less than any other character.
<1 \uFFFE\u0001\u0002\u0003
<i \u0001\uFFFE\u0002\u0003
<i \u0001\u0002\uFFFE\u0003
<i \u0001\u0002\u0003\uFFFE

* compare
# The merge separator must even compare less than U+0000.
<1 \uFFFE\u0000\u0000
<i \u0000\uFFFE\u0000
<i \u0000\u0000\uFFFE

** test: Hani < surrogates < U+FFFD
# Note: compareUTF8() treats unpaired surrogates like U+FFFD,
# so with that the strings with surrogates will compare equal to each other
# and equal to the string with U+FFFD.
@ root
% strength=identical
* compare
<1 abz
<1 a\u4e00z
<1 a\U00020000z
<1 a\ud800z
<1 a\udbffz
<1 a\udc00z
<1 a\udfffz
<1 a\ufffdz

** test: script reordering
@ root
% reorder Hani Zzzz digit
* compare
<1 ?
<1 +
<1 丂
<1 a
<1 α
<1 5

% reorder default
* compare
<1 ?
<1 +
<1 5
<1 a
<1 α
<1 丂

** test: empty rules
@ rules
* compare
<1 a
<2 ä
<3 Ä
<1 b

** test: very simple rules
@ rules
&a=e<<<<q<<<<r<x<<<X<<y<<<Y;z,Z
% strength=quaternary
* compare
<1 a
= e
<4 q
<4 r
<1 x
<3 X
<2 y
<3 Y
<2 z
<3 Z

** test: tailoring twice before a root position: primary
@ rules
&[before 1]b<p
&[before 1]b<q
* compare
<1 a
<1 p
<1 q
<1 b

** test: tailoring twice before a root position: secondary
@ rules
&[before 2]ſ<<p
&[before 2]ſ<<q
* compare
<1 s
<2 p
<2 q
<2 ſ

# secondary-before common weight
@ rules
&[before 2]b<<p
&[before 2]b<<q
* compare
<1 a
<1 p
<2 q
<2 b

** test: tailoring twice before a root position: tertiary
@ rules
&[before 3]B<<<p
&[before 3]B<<<q
* compare
<1 b
<3 p
<3 q
<3 B

# tertiary-before common weight
@ rules
&[before 3]b<<<p
&[before 3]b<<<q
* compare
<1 a
<1 p
<3 q
<3 b

@ rules
&[before 2]b<<s
&[before 3]s<<<p
&[before 3]s<<<q
* compare
<1 a
<1 p
<3 q
<3 s
<2 b

** test: tailor after completely ignorable
@ rules
&\x00<<<x<<y
* compare
= \x00
= \x1F
<3 x
<2 y

** test: secondary tailoring gaps, ICU ticket 9362
@ rules
&[before 2]s<<'_'
&s<<r # secondary between s and ſ (long s)
&ſ<<*a-q # more than 15 between ſ and secondary CE boundary
&[before 2][first primary ignorable]<<u<<v # between secondary CE boundary & lowest secondary CE
&[last primary ignorable]<<y<<z

* compare
<2 u
<2 v
<2 \u0332 # lowest secondary CE
<2 \u0308
<2 y
<2 z
<1 s_
<2 ss
<2 sr
<2 sſ
<2 sa
<2 sb
<2 sp
<2 sq
<2 sus
<2 svs
<2 rs

** test: tertiary tailoring gaps, ICU ticket 9362
@ rules
&[before 3]t<<<'_'
&t<<<r # tertiary between t and fullwidth t
&ᵀ<<<*a-q # more than 15 between ᵀ (modifier letter T) and tertiary CE boundary
&[before 3][first secondary ignorable]<<<u<<<v # between tertiary CE boundary & lowest tertiary CE
&[last secondary ignorable]<<<y<<<z

* compare
<3 u
<3 v
# Note: The root collator currently does not map any characters to tertiary CEs.
<3 y
<3 z
<1 t_
<3 tt
<3 tr
<3 tｔ
<3 tᵀ
<3 ta
<3 tb
<3 tp
<3 tq
<3 tut
<3 tvt
<3 rt

** test: secondary & tertiary around root character
@ rules
&[before 2]m<<r
&m<<s
&[before 3]m<<<u
&m<<<v
* compare
<1 l
<1 r
<2 u
<3 m
<3 v
<2 s
<1 n

** test: secondary & tertiary around tailored item
@ rules
&m<x
&[before 2]x<<r
&x<<s
&[before 3]x<<<u
&x<<<v
* compare
<1 m
<1 r
<2 u
<3 x
<3 v
<2 s
<1 n

** test: more nesting of secondary & tertiary before
@ rules
&[before 3]m<<<u
&[before 2]m<<r
&[before 3]r<<<q
&m<<<w
&m<<t
&[before 3]w<<<v
&w<<<x
&w<<s
* compare
<1 l
<1 q
<3 r
<2 u
<3 m
<3 v
<3 w
<3 x
<2 s
<2 t
<1 n

** test: case bits
@ rules
&w<x # tailored CE getting case bits
=uv=uV=Uv=UV # 2 chars -> 1 CE
&ae=ch=cH=Ch=CH # 2 chars -> 2 CEs
&rst=yz=yZ=Yz=YZ # 2 chars -> 3 CEs
% caseFirst=lower
* compare
<1 ae
= ch
<3 cH
<3 Ch
<3 CH
<1 rst
= yz
<3 yZ
<3 Yz
<3 YZ
<1 w
<1 x
= uv
<3 uV
= Uv # mixed case on single CE cannot distinguish variations
<3 UV

** test: tertiary CEs, tertiary, caseLevel=off, caseFirst=lower
@ rules
&\u0001<<<t<<<T # tertiary CEs
% caseFirst=lower
* compare
<1 aa
<3 aat
<3 aaT
<3 aA
<3 aAt
<3 ata
<3 aTa

** test: tertiary CEs, tertiary, caseLevel=off, caseFirst=upper
% caseFirst=upper
* compare
<1 aA
<3 aAt
<3 aa
<3 aat
<3 aaT
<3 ata
<3 aTa

** test: reset on expansion, ICU tickets 9415 & 9593
@ rules
&æ<x # tailor the last primary CE so that x sorts between ae and af
&æb=bæ # copy all reset CEs to make bæ sort the same
&각<h # copy/tailor 3 CEs to make h sort before the next Hangul syllable 갂
&⒀<<y # copy/tailor 4 CEs to make y sort with only a secondary difference
&l·=z # handle the pre-context for · when fetching reset CEs
<<u # copy/tailor 2 CEs

* compare
<1 ae
<2 æ
<1 x
<1 af

* compare
<1 aeb
<2 æb
= bæ

* compare
<1 각
<1 h
<1 갂
<1 갃

* compare
<1 · # by itself: primary CE
<1 l
<2 l· # l+middle dot has only a secondary difference from l
= z
<2 u

* compare
<1 (13)
<3 ⒀ # DUCET sets special tertiary weights in all CEs
<2 y
<1 (13[

% alternate=shifted
* compare
<1 (13)
= 13
<3 ⒀
= y # alternate=shifted removes the tailoring difference on the last CE
<1 14

** test: contraction inside extension, ICU ticket 9378
@ rules
&а<<х/й # all letters are Cyrillic
* compare
<1 ай
<2 х

** test: no duplicate tailored CEs for different reset positions with same CEs, ICU ticket 10104
@ rules
&t<x &ᵀ<y # same primary weights
&q<u &[before 1]ꝗ<v # q and ꝗ are primary adjacent
* compare
<1 q
<1 u
<1 v
<1 ꝗ
<1 t
<3 ᵀ
<1 y
<1 x

# Principle: Each rule builds on the state of preceding rules and ignores following rules.
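The principle above can be made concrete with a deliberately simplified model. This Python sketch is a hypothetical illustration, not ICU's tailoring builder: it assumes one primary weight per character, handles only `&reset < new` rules, and places each new item halfway between the reset's last weight and the next larger single-character weight. `build_keys` and `sort_key` are names invented for this sketch, and the root order `abouvwx` is an assumption. The sketch replays the rules of the next test, `&a < u < v < w &ov < x &b < v`.

```python
from fractions import Fraction


def build_keys(root_order, rules):
    """Toy, primary-only model of applying '&reset < new' rules in order.

    root_order: characters in root order; each gets one primary weight.
    rules: (reset, new) pairs. Returns a dict mapping char -> weight tuple.
    Hypothetical illustration only; this is not ICU's tailoring builder.
    """
    keys = {ch: (Fraction(i),) for i, ch in enumerate(root_order)}
    for reset, new in rules:
        # Resolve the reset with the weights as they are *now*; rules that
        # come later cannot change what this reset meant.
        base = tuple(w for ch in reset for w in keys[ch])
        last = base[-1]
        # Smallest single-character weight currently above the reset's last
        # weight; the new item is placed halfway in between.
        higher = [k[0] for k in keys.values() if len(k) == 1 and k[0] > last]
        upper = min(higher) if higher else last + 1
        keys[new] = base[:-1] + ((last + upper) / 2,)
    return keys


def sort_key(text, keys):
    """Concatenate the per-character weights of a string."""
    return tuple(w for ch in text for w in keys[ch])


# The rules "&a < u < v < w &ov < x &b < v", one (reset, new) pair per "<".
RULES = [("a", "u"), ("u", "v"), ("v", "w"), ("ov", "x"), ("b", "v")]
KEYS = build_keys("abouvwx", RULES)
print(sorted(["ov", "ob", "ow", "x", "ou", "oa"], key=lambda s: sort_key(s, KEYS)))
# → ['oa', 'ou', 'x', 'ow', 'ob', 'ov']
```

In this toy model, x keeps the position it received from `&ov` (CE(o) followed by a weight between u and w), even though the last rule moves v after b. This matches the order expected by the test that follows.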

** test: later rule does not affect earlier reset position, ICU ticket 10105
@ rules
&a < u < v < w &ov < x &b < v
* compare
<1 oa
<1 ou
<1 x # CE(o) followed by CE between u and w
<1 ow
<1 ob
<1 ov

** test: later rule does not affect earlier extension (1), ICU ticket 10105
@ rules
&a=x/b &v=b
% strength=secondary
* compare
<1 B
<1 c
<1 v
= b
* compare
<1 AB
= x
<1 ac
<1 av
= ab

** test: later rule does not affect earlier extension (2), ICU ticket 10105
@ rules
&a <<< c / e &g <<< e / l
% strength=secondary
* compare
<1 AE
= c
<2 æ
<1 agl
= ae

** test: later rule does not affect earlier extension (3), ICU ticket 10105
@ rules
&a = b / c &d = c / e
% strength=secondary
* compare
<1 AC # C is still only tertiary different from the original c
= b
<1 ade
= ac

** test: extension contains tailored character, ICU ticket 10105
@ rules
&a=e &b=u/e
* compare
<1 a
= e
<1 ba
= be
= u

** test: add simple mappings for characters with root context
@ rules
&z=· # middle dot has a prefix mapping in the CLDR root
&n=и # и (U+0438) has contractions in the root
* compare
<1 l
<2 l· # root mapping for l|· still works
<1 z
= ·
* compare
<1 n
= и
<1 И
<1 и\u0306 # root mapping for й=и\u0306 still works
= й
<3 Й

** test: add context mappings around characters with root context
@ rules
&z=·h # middle dot has a prefix mapping in the CLDR root
&n=ә|и # и (U+0438) has contractions in the root
* compare
<1 l
<2 l· # root mapping for l|· still works
<1 z
= ·h
* compare
<1 и
<3 И
<1 и\u0306 # root mapping for й=и\u0306 still works
= й
* compare
<1 әn
= әи
<1 әo

** test: many secondary CEs at the top of their range
@ rules
&[last primary ignorable]<<*\u2801-\u28ff
* compare
<2 \u0308
<2 \u2801
<2 \u2802
<2 \u2803
<2 \u2804
<2 \u28fd
<2 \u28fe
<2 \u28ff
<1 \x20

** test: many tertiary CEs at the top of their range
@ rules
&[last secondary ignorable]<<<*a-z
* compare
<3 a
<3 b
<3 c
<3 d
# e..w
<3 x
<3 y
<3 z
<2 \u0308

** test: tailor contraction together with nearly equivalent prefix, ICU ticket 10101
@ rules
&a=p|x &b=px &c=op
* compare
<1 b
= px
<3 B
<1 c
= op
<3 C
* compare
<1 ca
= opx # first contraction op, then prefix p|x
<3 cA
<3 Ca

** test: reset position with prefix (pre-context), ICU ticket 10102
@ rules
&a=p|x &px=y
* compare
<1 pa
= px
= y
<3 pA
<1 q
<1 x

** test: prefix+contraction together (1), ICU ticket 10071
@ rules
&x=a|bc
* compare
<1 ab
<1 Abc
<1 abd
<1 ac
<1 aw
<1 ax
= abc
<3 aX
<3 Ax
<1 b
<1 bb
<1 bc
<3 bC
<3 Bc
<1 bd

** test: prefix+contraction together (2), ICU ticket 10071
@ rules
&w=bc &x=a|b
* compare
<1 w
= bc
<3 W
* compare
<1 aw
<1 ax
= ab
<3 aX
<1 axb
<1 axc
= abc # prefix match a|b takes precedence over contraction match bc
<3 abC
<1 abd
<1 ay

** test: prefix+contraction together (3), ICU ticket 10071
@ rules
&x=a|b &w=bc # same rules as the previous test in reverse order; order should not matter here
* compare # same "compare" sequences as the previous test
<1 w
= bc
<3 W
* compare
<1 aw
<1 ax
= ab
<3 aX
<1 axb
<1 axc
= abc # prefix match a|b takes precedence over contraction match bc
<3 abC
<1 abd
<1 ay

** test: no mapping p|c, falls back to contraction ch, CLDR ticket 5962
@ rules
&d=ch &v=p|ci
* compare
<1 pc
<3 pC
<1 pcH
<1 pcI
<1 pd
= pch # no-prefix contraction ch matches
<3 pD
<1 pv
= pci # prefix+contraction p|ci matches
<3 pV

** test: tailor in & around compact ranges of root primaries
# The Ogham characters U+1681..U+169A are in simple ascending order of primary CEs
# which should be reliably encoded as one range in the root elements data.
@ rules
&[before 1]ᚁ<a
&ᚁ<b
&[before 1]ᚂ<c
&ᚂ<d
&[before 1]ᚚ<y
&ᚚ<z
&[before 2]ᚁ<<r
&ᚁ<<s
&[before 3]ᚚ<<<t
&ᚚ<<<u
* compare
<1 ᣵ # U+18F5 last Canadian Aboriginal
<1 a
<1 r
<2 ᚁ
<2 s
<1 b
<1 c
<1 ᚂ
<1 d
<1 ᚃ
<1 ᚙ
<1 y
<1 t
<3 ᚚ
<3 u
<1 z
<1 ᚠ # U+16A0 first Runic

** test: suppressContractions
@ rules
&z<ch<әж [suppressContractions [·cә]]
* compare
<1 ch
<3 cH # ch was suppressed
<1 l
<1 l· # primary difference, not secondary, because l|· was suppressed
<1 ә
<2 ә\u0308 # secondary difference, not primary, because contractions for ә were suppressed
<1 әж
<3 әЖ

** test: Hangul & Jamo
@ rules
&L=\u1100 # first Jamo L
&V=\u1161 # first Jamo V
&T=\u11A8 # first Jamo T
&\uAC01<<*\u4E00-\u4EFF # first Hangul LVT syllable & lots of secondary diffs
* compare
<1 Lv
<3 LV
= \u1100\u1161
= \uAC00
<1 LVt
<3 LVT
= \u1100\u1161\u11A8
= \uAC00\u11A8
= \uAC01
<2 LVT\u0308
<2 \u4E00
<2 \u4E01
<2 \u4E80
<2 \u4EFF
<2 LV\u0308T
<1 \uAC02

** test: adjust special reset positions according to previous rules, CLDR ticket 6070
@ rules
&[last variable]<x
[maxVariable space] # has effect only after building, no effect on following rules
&[last variable]<y
&[before 1][first regular]<z
* compare
<1 ? # some punctuation
<1 x
<1 y
<1 z
<1 $ # some symbol

@ rules
&[last primary ignorable]<<x<<<y
&[last primary ignorable]<<z
* compare
<2 \u0358
<2 x
<3 y
<2 z
<1 \x20

@ rules
&[last secondary ignorable]<<<x
&[last secondary ignorable]<<<y
* compare
<3 x
<3 y
<2 \u0358

@ rules
&[before 2][first variable]<<z
&[before 2][first variable]<<y
&[before 3][first variable]<<<x
&[before 3][first variable]<<<w
&[before 1][first variable]<v
&[before 2][first variable]<<u
&[before 3][first variable]<<<t
&[before 2]\uFDD1\xA0<<s # FractionalUCA.txt: FDD1 00A0, SPACE first primary
* compare
<2 \u0358
<1 s
<2 \uFDD1\xA0
<1 t
<3 u
<2 v
<1 w
<3 x
<3 y
<2 z
<2 \t

@ rules
&[before 2][first regular]<<z
&[before 3][first regular]<<<y
&[before 1][first regular]<x
&[before 3][first regular]<<<w
&[before 2]\uFDD1\u263A<<v # FractionalUCA.txt: FDD1 263A, SYMBOL first primary
&[before 3][first regular]<<<u
&[before 1][first regular]<p # primary before the boundary: becomes variable
&[before 3][first regular]<<<t # not affected by p
&[last variable]<q # after p!
* compare
<1 ?
<1 p
<1 q
<1 t
<3 u
<3 v
<1 w
<3 x
<1 y
<3 z
<1 $

# check that p & q are indeed variable
% alternate=shifted
* compare
= ?
= p
= q
<1 t
<3 u
<3 v
<1 w
<3 x
<1 y
<3 z
<1 $

@ rules
&[before 2][first trailing]<<z
&[before 1][first trailing]<y
&[before 3][first trailing]<<<x
* compare
<1 \u4E00 # first Han, first implicit
<1 \uFDD1\uFDD0 # FractionalUCA.txt: unassigned first primary
# Note: The root collator currently does not map any characters to the trailing first boundary primary.
<1 x
<3 y
<1 z
<2 \uFFFD # The root collator currently maps U+FFFD to the first real trailing primary.

@ rules
&[before 2][first primary ignorable]<<z
&[before 2][first primary ignorable]<<y
&[before 3][first primary ignorable]<<<x
&[before 3][first primary ignorable]<<<w
* compare
= \x01
<2 w
<3 x
<3 y
<2 z
<2 \u0301

@ rules
&[before 3][first secondary ignorable]<<<y
&[before 3][first secondary ignorable]<<<x
* compare
= \x01
<3 x
<3 y
<2 \u0301

** test: canonical closure
@ rules
&X=A &U=Â
* compare
<1 U
= Â
= A\u0302
<2 Ú # U with acute
= U\u0301
= Ấ # A with circumflex & acute
= Â\u0301
= A\u0302\u0301
<1 X
= A
<2 X\u030A # with ring above
= Å
= A\u030A
= \u212B # Angstrom sign

@ rules
&x=\u5140\u55C0
* compare
<1 x
= \u5140\u55C0
= \u5140\uFA0D
= \uFA0C\u55C0
= \uFA0C\uFA0D # CJK compatibility characters
<3 X

# canonical closure on prefix rules, ICU ticket 9444
@ rules
&x=ä|ŝ
* compare
<1 äs # not tailored
<1 äx
= äŝ
= a\u0308s\u0302
= a\u0308ŝ
= äs\u0302
<3 äX

** test: conjoining Jamo map to expansions
@ rules
&gg=\u1101 # Jamo Lead consonant GG
&nj=\u11AC # Jamo Trail consonant NJ
* compare
<1 gg\u1161nj
= \u1101\u1161\u11AC
= \uAE4C\u11AC
= \uAE51
<3 gg\u1161nJ
<1 \u1100\u1100

** test: canonical tail closure, ICU ticket 5913
@ rules
&a<â
* compare
<1 a
<1 â # tailored
= a\u0302
<2 a\u0323\u0302 # discontiguous contraction
= ạ\u0302 # equivalent
= ậ # equivalent
<1 b

@ rules
&a<ạ
* compare
<1 a
<1 ạ # tailored
= a\u0323
<2 a\u0323\u0302 # contiguous contraction plus extra diacritic
= ạ\u0302 # equivalent
= ậ # equivalent
<1 b

# Tail closure should work even if there is a prefix and/or contraction.
@ rules
&a<\u5140|câ
# In order to find discontiguous contractions for \u5140|câ
# there must exist a mapping for \u5140|ca, regardless of what it maps to.
# (This follows from the UCA spec.)
&x=\u5140|ca
* compare
<1 \u5140a
= \uFA0Ca
<1 \u5140câ # tailored
= \uFA0Ccâ
= \u5140ca\u0302
= \uFA0Cca\u0302
<2 \u5140ca\u0323\u0302 # discontiguous contraction
= \uFA0Cca\u0323\u0302
= \u5140cạ\u0302
= \uFA0Ccạ\u0302
= \u5140cậ
= \uFA0Ccậ
<1 \u5140b
= \uFA0Cb
<1 \u5140x
= \u5140ca

# Double-check that without the extra mapping there will be no discontiguous match.
@ rules
&a<\u5140|câ
* compare
<1 \u5140a
= \uFA0Ca
<1 \u5140câ # tailored
= \uFA0Ccâ
= \u5140ca\u0302
= \uFA0Cca\u0302
<1 \u5140b
= \uFA0Cb
<1 \u5140ca\u0323\u0302 # no discontiguous contraction
= \uFA0Cca\u0323\u0302
= \u5140cạ\u0302
= \uFA0Ccạ\u0302
= \u5140cậ
= \uFA0Ccậ

@ rules
&a<cạ
* compare
<1 a
<1 cạ # tailored
= ca\u0323
<2 ca\u0323\u0302 # contiguous contraction plus extra diacritic
= cạ\u0302 # equivalent
= cậ # equivalent
<1 b

# ᾢ = U+1FA2 GREEK SMALL LETTER OMEGA WITH PSILI AND VARIA AND YPOGEGRAMMENI
# = 03C9 0313 0300 0345
# ccc = 0, 230, 230, 240
@ rules
&δ=αῳ
# In order to find discontiguous contractions for αῳ
# there must exist a mapping for αω, regardless of what it maps to.
# (This follows from the UCA spec.)
&ε=αω
* compare
<1 δ
= αῳ
= αω\u0345
<2 αω\u0313\u0300\u0345 # discontiguous contraction
= αὠ\u0300\u0345
= αὢ\u0345
= αᾢ
<2 αω\u0300\u0313\u0345
= αὼ\u0313\u0345
= αῲ\u0313 # not FCD
<1 ε
= αω

# Double-check that without the extra mapping there will be no discontiguous match.
@ rules
&δ=αῳ
* compare
<1 αω\u0313\u0300\u0345 # no discontiguous contraction
= αὠ\u0300\u0345
= αὢ\u0345
= αᾢ
<2 αω\u0300\u0313\u0345
= αὼ\u0313\u0345
= αῲ\u0313 # not FCD
<1 δ
= αῳ
= αω\u0345

# Add U+0315 COMBINING COMMA ABOVE RIGHT which has ccc=232.
# Tests code paths where the tailored string has a combining mark
# that does not occur in any composite's decomposition.
@ rules
&δ=αὼ\u0315
* compare
<1 αω\u0313\u0300\u0315 # Not tailored: The grave accent blocks the comma above.
= αὠ\u0300\u0315
= αὢ\u0315
<1 δ
= αὼ\u0315
= αω\u0300\u0315
<2 αω\u0300\u0315\u0345
= αὼ\u0315\u0345
= αῲ\u0315 # not FCD

** test: danish a+a vs. a-umlaut, ICU ticket 9319
@ rules
&z<aa
* compare
<1 z
<1 aa
<2 aa\u0308
= aä

** test: Jamo L with and in prefix
# Useful for the Korean "searchjl" tailoring (instead of contractions of pairs of Jamo L).
@ rules
# Jamo Lead consonant G after G or GG
&[last primary ignorable]<<\u1100|\u1100=\u1101|\u1100
# Jamo Lead consonant GG sorts like G+G
&\u1100\u1100=\u1101
# Note: Making G|GG and GG|GG sort the same as G|G+G
# would require the ability to reset on G|G+G,
# or we could make G-after-G equal to some secondary-CE character,
# and reset on a pair of those.
# (It does not matter much if there are at most two G in a row in real text.)
* compare
<1 \u1100
<2 \u1100\u1100 # only one primary from a sequence of G lead consonants
= \u1101
<2 \u1100\u1100\u1100
= \u1101\u1100
# but not = \u1100\u1101, see above
<1 \u1100\u1161
= \uAC00
<2 \u1100\u1100\u1161
= \u1100\uAC00 # prefix match from the L of the LV syllable
= \u1101\u1161
= \uAE4C

** test: proposed Korean "searchjl" tailoring with prefixes, CLDR ticket 6546
@ rules
# Low secondary CEs for Jamo V & T.
# Note: T should sort before V for proper syllable order.
&\u0332 # COMBINING LOW LINE (first primary ignorable)
<<\u1161<<\u1162

# Korean Jamo lead consonant search rules, part 2:
# Make modern compound L jamo primary equivalent to non-compound forms.

# Secondary CEs for Jamo L-after-L, greater than Jamo V & T.
&\u0313 # COMBINING COMMA ABOVE (second primary ignorable)
=\u1100|\u1100
=\u1103|\u1103
=\u1107|\u1107
=\u1109|\u1109
=\u110C|\u110C

# Compound L Jamo map to equivalent expansions of primary+secondary CE.
&\u1100\u0313=\u1101<<<\u3132 # HANGUL CHOSEONG SSANGKIYEOK, HANGUL LETTER SSANGKIYEOK
&\u1103\u0313=\u1104<<<\u3138 # HANGUL CHOSEONG SSANGTIKEUT, HANGUL LETTER SSANGTIKEUT
&\u1107\u0313=\u1108<<<\u3143 # HANGUL CHOSEONG SSANGPIEUP, HANGUL LETTER SSANGPIEUP
&\u1109\u0313=\u110A<<<\u3146 # HANGUL CHOSEONG SSANGSIOS, HANGUL LETTER SSANGSIOS
&\u110C\u0313=\u110D<<<\u3149 # HANGUL CHOSEONG SSANGCIEUC, HANGUL LETTER SSANGCIEUC

* compare
<1 \u1100\u1161
= \uAC00
<2 \u1100\u1162
= \uAC1C
<2 \u1100\u1100\u1161
= \u1100\uAC00
= \u1101\u1161
= \uAE4C
<3 \u3132\u1161

** test: Hangul syllables in prefix & in the interior of a contraction
@ rules
&x=\u1100\u1161|a\u1102\u1162z
* compare
<1 \u1100\u1161x
= \u1100\u1161a\u1102\u1162z
= \u1100\u1161a\uB0B4z
= \uAC00a\u1102\u1162z
= \uAC00a\uB0B4z

** test: digits are unsafe-backwards when numeric=on
@ root
% numeric=on
* compare
# If digits are not unsafe, then numeric collation sees "1"=="01" and "b">"a".
# We need to back up before the identical prefix "1" and compare the full numbers.
<1 11b
<1 101a

** test: simple locale data test
@ locale de
* compare
<1 a
<2 ä
<1 ae
<2 æ

@ locale de-u-co-phonebk
* compare
<1 a
<1 ae
<2 ä
<2 æ

# The following test cases were moved here from ICU 52's DataDrivenCollationTest.txt.

** test: DataDrivenCollationTest/TestMorePinyin
# Testing the primary strength.
@ locale zh
% strength=primary
* compare
< lā
= lĀ
= Lā
= LĀ
< lān
= lĀn
< lē
= lĒ
= Lē
= LĒ
< lēn
= lĒn

** test: DataDrivenCollationTest/TestLithuanian
# Lithuanian sort order.
@ locale lt
* compare
< cz
< č
< d
< iz
< j
< sz
< š
< t
< zz
< ž

** test: DataDrivenCollationTest/TestLatvian
# Latvian sort order.
@ locale lv
* compare
< az
< ā
< b
< cz
< č
< d
< ez
< ē
< f
< gz
< ģ
< h
< iz
< y
< ī
< j
< kz
< ķ
< l
< lz
< ļ
< m
< nz
< ņ
< o
< oz
< ō
< p
< rz
< ŗ
< s
< sz
< š
< t
< uz
< ū
< v
< zz
< ž

** test: DataDrivenCollationTest/TestEstonian
# Estonian sort order.
@ locale et
* compare
< sy
< š
< šy
< z
< zy
< ž
< v
< va
< w
< õ
< õy
< ä
< äy
< ö
< öy
< ü
< üy
< x

** test: DataDrivenCollationTest/TestAlbanian
# Albanian sort order.
@ locale sq
* compare
< cz
< ç
< d
< dz
< dh
< e
< ez
< ë
< f
< gz
< gj
< h
< lz
< ll
< m
< nz
< nj
< o
< rz
< rr
< s
< sz
< sh
< t
< tz
< th
< u
< xz
< xh
< y
< zz
< zh

** test: DataDrivenCollationTest/TestSimplifiedChineseOrder
# Sorted file has different order.
@ root
# normalization=on turned on & off automatically.
* compare
< \u5F20
< \u5F20\u4E00\u8E3F

** test: DataDrivenCollationTest/TestTibetanNormalizedIterativeCrash
# This pretty much crashes.
@ root
* compare
< \u0f71\u0f72\u0f80\u0f71\u0f72
< \u0f80

** test: DataDrivenCollationTest/TestThaiPartialSortKeyProblems
# These are examples of strings that caused trouble in partial sort key testing.
@ locale th-TH
* compare
< \u0E01\u0E01\u0E38\u0E18\u0E20\u0E31\u0E13\u0E11\u0E4C
< \u0E01\u0E01\u0E38\u0E2A\u0E31\u0E19\u0E42\u0E18
* compare
< \u0E01\u0E07\u0E01\u0E32\u0E23
< \u0E01\u0E07\u0E42\u0E01\u0E49
* compare
< \u0E01\u0E23\u0E19\u0E17\u0E32
< \u0E01\u0E23\u0E19\u0E19\u0E40\u0E0A\u0E49\u0E32
* compare
< \u0E01\u0E23\u0E30\u0E40\u0E08\u0E35\u0E22\u0E27
< \u0E01\u0E23\u0E30\u0E40\u0E08\u0E35\u0E4A\u0E22\u0E27
* compare
< \u0E01\u0E23\u0E23\u0E40\u0E0A\u0E2D
< \u0E01\u0E23\u0E23\u0E40\u0E0A\u0E49\u0E32

** test: DataDrivenCollationTest/TestJavaStyleRule
# java.text allows rules to start as '<<<x<<<y...'
# We emulate this by assuming a &[first tertiary ignorable] in this case.
@ rules
&\u0001=equal<<<z<<x<<<w &[first tertiary ignorable]=a &[first primary ignorable]=b
* compare
= a
= equal
< z
< x
= b # x had become the new first primary ignorable
< w

** test: DataDrivenCollationTest/TestShiftedIgnorable
# The UCA states that primary ignorables should be completely
# ignorable when following a shifted code point.
@ root
% alternate=shifted
% strength=quaternary
* compare
< a\u0020b
= a\u0020\u0300b
= a\u0020\u0301b
< a_b
= a_\u0300b
= a_\u0301b
< A\u0020b
= A\u0020\u0300b
= A\u0020\u0301b
< A_b
= A_\u0300b
= A_\u0301b
< a\u0301b
< A\u0301b
< a\u0300b
< A\u0300b

** test: DataDrivenCollationTest/TestNShiftedIgnorable
# The UCA states that primary ignorables should be completely
# ignorable when following a shifted code point.
@ root
% alternate=non-ignorable
% strength=tertiary
* compare
< a\u0020b
< A\u0020b
< a\u0020\u0301b
< A\u0020\u0301b
< a\u0020\u0300b
< A\u0020\u0300b
< a_b
< A_b
< a_\u0301b
< A_\u0301b
< a_\u0300b
< A_\u0300b
< a\u0301b
< A\u0301b
< a\u0300b
< A\u0300b

** test: DataDrivenCollationTest/TestSafeSurrogates
# It turned out that surrogates were not skipped properly
# when iterating backwards if they were in the middle of a
# contraction. This test ensures that this is fixed.
@ rules
&a < x\ud800\udc00b
* compare
< a
< x\ud800\udc00b

** test: DataDrivenCollationTest/da_TestPrimary
# This test goes through primary strength cases
@ locale da
% strength=primary
* compare
< Lvi
< Lwi
* compare
< L\u00e4vi
< L\u00f6wi
* compare
< L\u00fcbeck
= Lybeck

** test: DataDrivenCollationTest/da_TestTertiary
# This test goes through tertiary strength cases
@ locale da
% strength=tertiary
* compare
< Luc
< luck
* compare
< luck
< L\u00fcbeck
* compare
< lybeck
< L\u00fcbeck
* compare
< L\u00e4vi
< L\u00f6we
* compare
< L\u00f6ww
< mast

* compare
< A/S
< ANDRE
< ANDR\u00c9
< ANDREAS
< AS
< CA
< \u00c7A
< CB
< \u00c7C
< D.S.B.
< DA
< \u00d0A
< DB
< \u00d0C
< DSB
< DSC
< EKSTRA_ARBEJDE
< EKSTRABUD0
< H\u00d8ST
< HAAG
< H\u00c5NDBOG
< HAANDV\u00c6RKSBANKEN
< Karl
< karl
< NIELS\u0020J\u00d8RGEN
< NIELS-J\u00d8RGEN
< NIELSEN
< R\u00c9E,\u0020A
< REE,\u0020B
< R\u00c9E,\u0020L
< REE,\u0020V
< SCHYTT,\u0020B
< SCHYTT,\u0020H
< SCH\u00dcTT,\u0020H
< SCHYTT,\u0020L
< SCH\u00dcTT,\u0020M
< SS
< \u00df
< SSA
< STORE\u0020VILDMOSE
< STOREK\u00c6R0
< STORM\u0020PETERSEN
< STORMLY
< THORVALD
< THORVARDUR
< \u00feORVAR\u00d0UR
< THYGESEN
< VESTERG\u00c5RD,\u0020A
< VESTERGAARD,\u0020A
< VESTERG\u00c5RD,\u0020B
< \u00c6BLE
< \u00c4BLE
< \u00d8BERG
< \u00d6BERG

* compare
< andere
< chaque
< chemin
< cote
< cot\u00e9
< c\u00f4te
< c\u00f4t\u00e9
< \u010du\u010d\u0113t
< Czech
< hi\u0161a
< irdisch
< lie
< lire
< llama
< l\u00f5ug
< l\u00f2za
< lu\u010d
< luck
< L\u00fcbeck
< lye
< l\u00e4vi
< L\u00f6wen
< m\u00e0\u0161ta
< m\u00eer
< myndig
< M\u00e4nner
< m\u00f6chten
< pi\u00f1a
< pint
< pylon
< \u0161\u00e0ran
< savoir
< \u0160erb\u016bra
< Sietla
< \u015blub
< subtle
< symbol
< s\u00e4mtlich
< verkehrt
< vox
< v\u00e4ga
< waffle
< wood
< yen
< yuan
< yucca
< \u017eal
< \u017eena
< \u017den\u0113va
< zoo0
< Zviedrija
< Z\u00fcrich
< zysk0
< \u00e4ndere

** test: DataDrivenCollationTest/hi_TestNewRules
# This test goes through new rules and tests against old rules
@ locale hi
* compare
< कॐ
< कं
< कँ
< कः

** test: DataDrivenCollationTest/ro_TestNewRules
# This test goes through new rules and tests against old rules
@ locale ro
* compare
< xAx
< xă
< xĂ
< Xă
< XĂ
< xăx
< xĂx
< xâ
< xÂ
< Xâ
< XÂ
< xâx
< xÂx
< xb
< xIx
< xî
< xÎ
< Xî
< XÎ
< xîx
< xÎx
< xj
< xSx
< xș
= xş
< xȘ
= xŞ
< Xș
= Xş
< XȘ
= XŞ
< xșx
= xşx
< xȘx
= xŞx
< xT
< xTx
< xț
= xţ
< xȚ
= xŢ
< Xț
= Xţ
< XȚ
= XŢ
< xțx
= xţx
< xȚx
= xŢx
< xU

** test: DataDrivenCollationTest/testOffsets
# This tests cases where forwards and backwards iteration get different offsets
@ locale en
% strength=tertiary
* compare
< a\uD800\uDC00\uDC00
< b\uD800\uDC00\uDC00
* compare
< \u0301A\u0301\u0301
< \u0301B\u0301\u0301
* compare
< abcd\r\u0301
< abce\r\u0301
# TODO: test offsets in new CollationTest

# End of test cases moved here from ICU 52's DataDrivenCollationTest.txt.

** test: was ICU 52 cmsccoll/TestRedundantRules
@ rules
& a < b < c < d& [before 1] c < m
* compare
<1 a
<1 b
<1 m
<1 c
<1 d

@ rules
& a < b <<< c << d <<< e& [before 3] e <<< x
* compare
<1 a
<1 b
<3 c
<2 d
<3 x
<3 e

@ rules
& a < b <<< c << d <<< e <<< f < g& [before 1] g < x
* compare
<1 a
<1 b
<3 c
<2 d
<3 e
<3 f
<1 x
<1 g

@ rules
& a <<< b << c < d& a < m
* compare
<1 a
<3 b
<2 c
<1 m
<1 d

@ rules
&a<b<<b\u0301 &z<b
* compare
<1 a
<1 b\u0301
<1 z
<1 b

@ rules
&z<m<<<q<<<m
* compare
<1 z
<1 q
<3 m

@ rules
&z<<<m<q<<<m
* compare
<1 z
<1 q
<3 m

@ rules
& a < b < c < d& r < c
* compare
<1 a
<1 b
<1 d
<1 r
<1 c

@ rules
& a < b < c < d& c < m
* compare
<1 a
<1 b
<1 c
<1 m
<1 d

@ rules
& a < b < c < d& a < m
* compare
<1 a
<1 m
<1 b
<1 c
<1 d

** test: was ICU 52 cmsccoll/TestExpansionSyntax
# The following two rules should sort the particular list of strings the same.
@ rules
&AE <<< a << b <<< c &d <<< f
* compare
<1 AE
<3 a
<2 b
<3 c
<1 d
<3 f

@ rules
&A <<< a / E << b / E <<< c /E &d <<< f
* compare
<1 AE
<3 a
<2 b
<3 c
<1 d
<3 f

# The following two rules should sort the particular list of strings the same.
@ rules
&AE <<< a <<< b << c << d < e < f <<< g
* compare
<1 AE
<3 a
<3 b
<2 c
<2 d
<1 e
<1 f
<3 g

@ rules
&A <<< a / E <<< b / E << c / E << d / E < e < f <<< g
* compare
<1 AE
<3 a
<3 b
<2 c
<2 d
<1 e
<1 f
<3 g

# The following two rules should sort the particular list of strings the same.
@ rules
&AE <<< B <<< C / D <<< F
* compare
<1 AE
<3 B
<3 F
<1 AED
<3 C

@ rules
&A <<< B / E <<< C / ED <<< F / E
* compare
<1 AE
<3 B
<3 F
<1 AED
<3 C

** test: never reorder trailing primaries
@ root
% reorder Zzzz Grek
* compare
<1 L
<1 字
<1 Ω
<1 \uFFFD
<1 \uFFFF

** test: fall back to mappings with shorter prefixes, not immediately to ones with no prefixes
@ rules
&u=ab|cd
&v=b|ce
* compare
<1 abc
<1 abcc
<1 abcf
<1 abcd
= abu
<1 abce
= abv

# With the following rules, there is only one prefix per composite ĉ or ç,
# but both prefixes apply to just c in NFD form.
# We would get different results for composed vs. NFD input
# if we fell back directly from longest-prefix mappings to no-prefix mappings.
@ rules
&x=op|ĉ
&y=p|ç
* compare
<1 opc
<2 opć
<1 opcz
<1 opd
<1 opĉ
= opc\u0302
= opx
<1 opç
= opc\u0327
= opy

# The mapping used is the one with the longest matching prefix for which
# there is also a suffix match, with the longest suffix match among several for that prefix.
@ rules
&❶=d
&❷=de
&❸=def
&①=c|d
&②=c|de
&③=c|def
&④=bc|d
&⑤=bc|de
&⑥=bc|def
&⑦=abc|d
&⑧=abc|de
&⑨=abc|def
* compare
<1 9aadzz
= 9aa❶zz
<1 9aadez
= 9aa❷z
<1 9aadef
= 9aa❸
<1 9acdzz
= 9ac①zz
<1 9acdez
= 9ac②z
<1 9acdef
= 9ac③
<1 9bcdzz
= 9bc④zz
<1 9bcdez
= 9bc⑤z
<1 9bcdef
= 9bc⑥
<1 abcdzz
= abc⑦zz
<1 abcdez
= abc⑧z
<1 abcdef
= abc⑨

** test: prefix + discontiguous contraction with missing prefix contraction
# Unfortunate terminology: The first "prefix" here is the pre-context,
# the second "prefix" refers to the contraction/relation string that is
# one shorter than the one being tested.
@ rules
&x=p|e
&y=p|ê
&z=op|ê
# No mapping for op|e:
# Discontiguous contraction matching should not match op|ê in opệ
# because it would have to skip the dot below and extend a match on op|e by the circumflex,
# but there is no match on op|e.
* compare
<1 oPe
<1 ope
= opx
<1 opệ
= opy\u0323 # y not z
<1 opê
= opz

# We cannot test for fallback by whether the contraction default CE32
# is for another contraction. With the following rules, there is no mapping for op|e,
# and the fallback to prefix p has no contractions.
@ rules
&x=p|e
&z=op|ê
* compare
<1 oPe
<1 ope
= opx
<2 opệ
= opx\u0323\u0302 # x not z
<1 opê
= opz

# One more variation: Fallback to the simple code point, no shorter non-empty prefix.
@ rules
&x=e
&z=op|ê
* compare
<1 ope
= opx
<3 oPe
= oPx
<2 opệ
= opx\u0323\u0302 # x not z
<1 opê
= opz

** test: maxVariable via rules
@ rules
[maxVariable space][alternate shifted]
* compare
= \u0020
= \u000A
<1 .
<1 ° # degree sign
<1 $
<1 0

** test: maxVariable via setting
@ root
% maxVariable=currency
% alternate=shifted
* compare
= \u0020
= \u000A
= .
= ° # degree sign
= $
<1 0

** test: ICU4J CollationMiscTest/TestContractionClosure (ää)
# This tests canonical closure, but it also tests that CollationFastLatin
# bails out properly for contractions with combining marks.
# For that we need pairs of strings that remain in the Latin fastpath
# long enough, hence the extra "= b" lines.
@ rules
&b=\u00e4\u00e4
* compare
<1 b
= \u00e4\u00e4
= b
= a\u0308a\u0308
= b
= \u00e4a\u0308
= b
= a\u0308\u00e4

** test: ICU4J CollationMiscTest/TestContractionClosure (Å)
@ rules
&b=\u00C5
* compare
<1 b
= \u00C5
= b
= A\u030A
= b
= \u212B

** test: reset-before on already-tailored characters, ICU ticket 10108
@ rules
&a<w<<x &[before 2]x<<y
* compare
<1 a
<1 w
<2 y
<2 x

@ rules
&a<<w<<<x &[before 2]x<<y
* compare
<1 a
<2 y
<2 w
<3 x

@ rules
&a<w<x &[before 2]x<<y
* compare
<1 a
<1 w
<1 y
<2 x

@ rules
&a<w<<<x &[before 2]x<<y
* compare
<1 a
<1 y
<2 w
<3 x

** test: numeric collation with other settings, ICU ticket 9092
@ root
% strength=identical
% caseFirst=upper
% numeric=on
* compare
<1 100\u0020a
<1 101

** test: collation type fallback from unsupported type, ICU ticket 10149
@ locale fr-CA-u-co-phonebk
# Expect the same result as with fr-CA, using backwards-secondary order.
# That is, we should fall back from the unsupported collation type
# to the locale's default collation type.
* compare
<1 cote
<2 côte
<2 coté
<2 côté

** test: @ is equivalent to [backwards 2], ICU ticket 9956
@ rules
&b<a @ &v<<w
* compare
<1 b
<1 a
<1 cote
<2 côte
<2 coté
<2 côté
<1 v
<2 w
<1 x

** test: shifted+reordering, ICU ticket 9507
@ root
% reorder Grek punct space
% alternate=shifted
% strength=quaternary
# Which primaries are "variable" should be determined without script reordering,
# and then primaries should be reordered whether they are shifted to quaternary or not.
* compare
<4 ( # punctuation
<4 )
<4 \u0020 # space
<1 ` # symbol
<1 ^
<1 $ # currency symbol
<1 €
<1 0 # numbers
<1 ε # Greek
<1 e # Latin
<1 e(e
<4 e)e
<4 e\u0020e
<4 ee
<3 e(E
<4 e)E
<4 e\u0020E
<4 eE

** test: "uppercase first" could sort a string before its prefix, ICU ticket 9351
@ rules
&\u0001<<<b<<<B
% caseFirst=upper
* compare
<1 aaa
<3 aaaB

** test: secondary+case ignores secondary ignorables, ICU ticket 9355
@ rules
&\u0001<<<b<<<B
% strength=secondary
% caseLevel=on
* compare
<1 a
= ab
= aB

** test: custom collation rules involving tail of a contraction in Malayalam, ICU ticket 6328
@ rules
&[before 2] ൌ << ൗ # U+0D57 << U+0D4C == 0D46+0D57
* compare
<1 ൗx
<2 ൌx
<1 ൗy
<2 ൌy

** test: quoted apostrophe in compact syntax, ICU ticket 8204
@ rules
&q<<*a''c
* compare
<1 d
<1 p
<1 q
<2 a
<2 \u0027
<2 c
<1 r

# ICU ticket #8260 "Support all collation-related keywords in Collator.getInstance()"
** test: locale -u- with collation keywords, ICU ticket 8260
@ locale de-u-kv-sPace-ka-shifTed-kn-kk-falsE-kf-Upper-kc-tRue-ks-leVel4
* compare
<4 \u0020 # space is shifted, strength=quaternary
<1 ! # punctuation is regular
<1 2
<1 12 # numeric sorting
<1 B
<c b # uppercase first on case level
<1 x\u0301\u0308
<2 x\u0308\u0301 # normalization off

** test: locale @ with collation keywords, ICU ticket 8260
@ locale fr@colbAckwards=yes;ColStrength=Quaternary;kv=currencY;colalternate=shifted
* compare
<4 $ # currency symbols are shifted, strength=quaternary
<1 àla
<2 alà # backwards secondary level

** test: locale -u- with script reordering, ICU ticket 8260
@ locale el-u-kr-kana-SYMBOL-Grek-hani-cyrl-latn-digit-armn-deva-ethi-thai
* compare
<1 \u0020
<1 あ
<1 ☂
<1 Ω
<1 丂
<1 ж
<1 L
<1 4
<1 Ձ
<1 अ
<1 ሄ
<1 ฉ

** test: locale @collation=type should be case-insensitive
@ locale de@coLLation=PhoneBook
* compare
<1 ae
<2 ä
<3 Ä

** test: import root search rules plus German phonebook rules, ICU ticket 8962
@ locale de-u-co-search
* compare
<1 =
<1 ≠
<1 a
<1 ae
<2 ä

# Once more, but with the runtime builder.
@ rules
[import und-u-co-search][import de-u-co-phonebk]
* compare
<1 =
<1 ≠
<1 a
<1 ae
<2 ä

# Once again, with import from "root" not "und" (as in a proper language tag).
@ rules
[import root-u-co-search][import de-u-co-phonebk]
* compare
<1 =
<1 ≠
<1 a
<1 ae
<2 ä

** test: import rules from a language with non-Latin native script, and reset the reordering, ICU ticket 10998
# Greek should sort Greek first.
@ rules
[import el]
* compare
<1 4
<1 Ω
<1 L

# Import Greek, and then reset the reordering.
@ rules
[import el][reorder Zzzz]
* compare
<1 4
<1 L
<1 Ω

# "others" is a synonym for Zzzz.
@ rules
[import el][reorder others]
* compare
<1 4
<1 L
<1 Ω

** test: regression test for CollationFastLatinBuilder, ICU ticket 11388
@ rules
&x<<aa<<<Aa<<<AA
% strength=secondary
* compare
<1 AA
<2 Aẩ
<2 aą
* compare
<1 AA
<2 aą

** test: tailor tertiary-after a common tertiary where there is a lower one
# Assume that Hiragana small A has a below-common tertiary, and Hiragana A has a common one.
# See ICU ticket 11448 & CLDR ticket 7222.
@ rules
&あ<<<x<<<y<<<z
* compare
<1 ぁ
<3 あ
<3 x
<3 y
<3 z
<3 ァ
<1 い

** test: tailor tertiary-after a below-common tertiary
@ rules
&ぁ<<<x<<<y<<<z
* compare
<1 ぁ
<3 x
<3 y
<3 z
<3 あ
<3 ァ
<1 い

** test: tailor tertiary-before a common tertiary where there is a lower one
@ rules
&[before 3]あ<<<x<<<y<<<z
* compare
<1 ぁ
<3 x
<3 y
<3 z
<3 あ
<3 ァ
<1 い

** test: tailor tertiary-before a below-common tertiary
@ rules
&[before 3]ぁ<<<x<<<y<<<z
* compare
<1 x
<3 y
<3 z
<3 ぁ
<3 あ
<3 ァ
<1 い

** test: reorder single scripts not groups, ICU ticket 11449
@ root
% reorder Goth Latn
* compare
<1 4
<1 𐌰 # Gothic
<1 L
<1 Ω
# Before ICU 55, the following reordered together with Gothic.
<1 𐌈 # Old Italic
<1 𐑐 # Shavian

# Check for presence of certain chars 乛冂刂卜又小彑艹日月爫牛辶 in
# zh pinyin and stroke, ICU-13790
# (bracket pinyin test with 卬..作, stroke test with 一..乾)

** test: DataDrivenCollationTest/VerifyCertainCharsInPinyin
@ locale zh-u-co-pinyin
* compare
< 卬
< 卜
< 艹
< 辶
< 刂
< 彑
< 冂
< 牛
< 日
< 小
< 乛
< 又
< 月
< 爫
< 作

** test: DataDrivenCollationTest/VerifyCertainCharsInStroke
@ locale zh-u-co-stroke
* compare
< 一
< 乛
< 冂
< 刂
< 卜
< 又
< 小
< 彑
< 艹
< 日
< 月
< 爫
< 牛
< 辶
< 乾