      1 * Copyright (C) 2016 and later: Unicode, Inc. and others.
      2 * License & terms of use: http://www.unicode.org/copyright.html
      3 * Copyright (C) 2004-2016, International Business Machines
      4 * Corporation and others.  All Rights Reserved.
      5 *
      6 *   file name:  changes.txt
      7 *   encoding:   US-ASCII
      8 *   tab size:   8 (not used)
      9 *   indentation:4
     10 *
     11 *   created on: 2004may06
     12 *   created by: Markus W. Scherer
     13 
     14 * change log for Unicode updates
     15 
     16 For an overview, see https://unicode-org.github.io/icu/processes/unicode-update
     17 
     18 Notes:
     19 
     20 This log includes several command lines as used in the update process.
     21 Some of them include a console prompt with the present working directory (pwd) followed by a $ sign.
     22 Use a console window that is set to that directory, or cd to there,
     23 and then paste the command that follows the $ sign.
     24 
     25 Most command lines use environment variables to make them more portable across versions
     26 and machine configurations. When you set up a console window, copy & paste the `export` commands
     27 from near the top of the current section before pasting tool command lines.
     28 Adjust the environment variables to the current version and your machine setup.
     29 (The command lines are currently as used on Linux.)
     30 
     31 Syntax of this file:
     32 
     33 `***` - section heading
     34 `*` - sub heading
     35 `-` - 1st level bullet
     36 `+` - 2nd level bullet
     37 `=` - 1st level bullet
     38 `->` - "the previous thing leads to...", OR a 2nd level bullet/item
     39 
     40 ---------------------------------------------------------------------------- ***
     41 
     42 * New ISO 15924 script codes
     43 
     44 Normally, add new script codes as part of a Unicode update.
     45 See https://unicode-org.github.io/icu/processes/release/tasks/standards#update-script-code-enums
     46 and see the change logs below.
     47 
     48 ---------------------------------------------------------------------------- ***
     49 
     50 TODO: Run gencolusb for Unicode updates.
     51 - https://github.com/markusicu/icu/blob/main/icu4c/source/tools/gencolusb/README.md
     52 - until ICU-12062 is done
     53 
     54 ---------------------------------------------------------------------------- ***
     55 
     56 Unicode 17.0 update for ICU 78
     57 
     58 https://www.unicode.org/versions/Unicode17.0.0/
     59 https://www.unicode.org/versions/beta-17.0.0.html
     60 https://www.unicode.org/Public/draft/
     61 https://www.unicode.org/reports/uax-proposed-updates.html
     62 https://www.unicode.org/reports/tr44/tr44-35.html
     63 
     64 https://unicode-org.atlassian.net/browse/ICU-23038 Unicode 17
     65 https://unicode-org.atlassian.net/browse/CLDR-18283 BRS Unicode 17
     66 
     67 * Command-line environment setup
     68 
     69 Markus:
     70 
     71 export UNIDATA_ROOT=~/unidata
     72 export UNICODE_DATA=$UNIDATA_ROOT/uni17/final
     73 export CLDR_SRC=~/cldr/uni/src
     74 export ICU_ROOT=~/icu/uni
     75 export ICU_SRC=$ICU_ROOT/src
     76 export ICU_OUT=$ICU_ROOT/dbg
     77 export ICUDT=icudt78b
     78 export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
     79 export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
     80 export LD_LIBRARY_PATH=$ICU_OUT/icu4c/lib
     81 export UNICODE_TOOLS=~/unitools/mine/src
     82 
     83 Elango:
     84 
     85 export UNIDATA_ROOT=~/oss/unidata
     86 export UNICODE_DATA=$UNIDATA_ROOT/uni17/final
     87 export CLDR_SRC=~/oss/cldr/mine/src
     88 export ICU_ROOT=~/oss/icu
     89 export ICU_SRC=$ICU_ROOT
     90 export ICU_OUT=$ICU_ROOT
     91 export ICUDT=icudt78b
     92 export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
     93 export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
     94 export LD_LIBRARY_PATH=$ICU_OUT/icu4c/lib
     95 export UNICODE_TOOLS=~/oss/unicodetools/mine/src
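The exports above build on each other. For example, UNICODE_DATA is defined in terms of UNIDATA_ROOT. The following minimal Python sketch is not one of the ICU tools. It resolves such `export NAME=value` lines in order, so you can check where each variable points before pasting tool commands. It handles only the simple unquoted form used here, and `resolve_exports` is a hypothetical helper name.

```python
import os
import string


def resolve_exports(lines, home=None):
    """Resolve `export NAME=value` lines in order, like a shell would for this simple form.

    Handles a leading ~ or ~/ and $NAME references to earlier exports.
    Unknown variables (e.g. $LD_LIBRARY_PATH) are left as-is; quoting is not supported.
    """
    if home is None:
        home = os.path.expanduser("~")
    env = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("export "):
            continue  # skip blank lines and commentary such as "Markus:"
        name, _, value = line[len("export "):].partition("=")
        if value == "~" or value.startswith("~/"):
            value = home + value[1:]
        # safe_substitute only replaces $NAME values that were exported earlier.
        env[name.strip()] = string.Template(value).safe_substitute(env)
    return env
```

For example, paste one person's block into a file and print `resolve_exports(open(path))`.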
     96 
     97 *** Unicode version numbers
     98 - icu4c/source/data/makedata.mak
     99 - icu4c/source/common/unicode/uchar.h
    100 - com.ibm.icu.util.VersionInfo
    101 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
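After editing these files, a quick check can confirm that the new version number is in place. The following minimal sketch assumes that uchar.h spells the version as `#define U_UNICODE_VERSION "17.0"`; verify this against the actual header. The helper name is hypothetical, and the sketch reads only the C header, not the Java files.

```python
import re

# Matches e.g.: #define U_UNICODE_VERSION "17.0"
_VERSION_RE = re.compile(r'^\s*#\s*define\s+U_UNICODE_VERSION\s+"([0-9.]+)"', re.MULTILINE)


def unicode_version_from_header(text):
    """Return the U_UNICODE_VERSION string found in uchar.h text, or None if absent."""
    m = _VERSION_RE.search(text)
    return m.group(1) if m else None
```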
    102 
    103 *** Configure: Build Unicode data for ICU4J
    104 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
    105    so that the makefiles see the new version number.
    106 - FYI: The option that adds the additional Unicode data files for ICU4J is
    107    ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data
    108 - Markus's version:
    109  cd $ICU_OUT/icu4c
    110  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data CXXFLAGS="-DU_USING_ICU_NAMESPACE=0 -Wimplicit-fallthrough" CPPFLAGS="-DU_NO_DEFAULT_INCLUDE_UTF_HEADERS=1 -fsanitize=bounds" LDFLAGS=-fsanitize=bounds ../../src/icu4c/source/runConfigureICU --enable-debug --disable-release Linux/clang --prefix=/usr/local/google/home/mscherer/icu/mine/inst/icu4c > config.out 2>&1 ; tail config.out
    111 - Elango's version (different default C++ compiler & in-source build paths):
    112  cd $ICU_OUT/icu4c/source
    113  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data CXXFLAGS="-DU_USING_ICU_NAMESPACE=0 -Wimplicit-fallthrough" CPPFLAGS="-DU_NO_DEFAULT_INCLUDE_UTF_HEADERS=1 -fsanitize=bounds" LDFLAGS=-fsanitize=bounds ./runConfigureICU --enable-debug --disable-release Linux/gcc --prefix=/usr/local/google/home/elango/oss/icu/icu4c > config.out 2>&1 ; tail config.out
    114 
    115 *** data files & enums & parser code
    116 
    117 * download files
    118 - same as for the early Unicode Tools setup and data refresh:
    119  https://github.com/unicode-org/unicodetools/blob/main/docs/index.md
    120  https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md
    121 - mkdir -p $UNICODE_DATA
    122 - download Unicode files into $UNICODE_DATA
    123  + use an FTP client; anonymous FTP from www.unicode.org at /Public/draft
    124  + subfolders: emoji, idna, security, ucd, uca
    125  + for pre-release (alpha, beta) data files:
    126    ~ if one of us produces the alpha.zip or beta.zip collection of data files for publication,
    127      then we can use its contents directly (no FTP from unicode.org necessary)
    128    ~ otherwise download from https://www.unicode.org/Public/draft/
    129    ~ you can omit or discard the charts/ and ucdxml/ files/folders
    130    ~ you can omit or discard ucd/UCD.zip & ucd/Unihan.zip & security/*.zip
    131  + alternate way of fetching files, if available:
    132    copy the files from a Unicode Tools workspace that is up to date with
    133    https://github.com/unicode-org/unicodetools
    134    and which might at this point be *ahead* of "Public"
    135    ~ before the Unicode release, copy files from the "dev" subfolders, for example
    136      https://github.com/unicode-org/unicodetools/tree/main/unicodetools/data/ucd/dev
    137  + for final-release data files, the source of truth is the files in
    138    https://www.unicode.org/Public/(version)
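Before you run preparseucd.py, check that the download is complete. The following minimal sketch reports which of the subfolders listed above are missing under a directory such as $UNICODE_DATA. `missing_subfolders` is a hypothetical helper, and it checks only folder names, not file contents.

```python
import os

# Subfolders named in the download notes above.
EXPECTED_SUBFOLDERS = ("emoji", "idna", "security", "ucd", "uca")


def missing_subfolders(unicode_data_dir, expected=EXPECTED_SUBFOLDERS):
    """Return the expected subfolder names that are not directories under unicode_data_dir."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(unicode_data_dir, name))]
```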
    139 
    140 * process and/or copy files
    141 - cd $ICU_SRC/tools/unicode
    142    py/preparseucd.py $UNICODE_DATA $ICU_SRC
    143  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
    144  + For debugging, and tweaking how ppucd.txt is written,
    145    the tool has an --only_ppucd option:
    146      py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
    147    e.g.
    148      py/preparseucd.py $UNICODE_DATA --only_ppucd /tmp/ppucd.txt
    149 
    150 * new constants for new property values
    151 - preparseucd.py error:
    152    ValueError: missing uchar.h enum constants for some property values: [('blk', {'Tangut_Components_Sup', 'Misc_Symbols_Sup', 'CJK_Ext_J', 'Tai_Yo', 'Chisoi', 'Sharada_Sup', 'Beria_Erfe', 'Tolong_Siki', 'Sidetic'}), ('jg', {'Thin_Noon'}), ('lb', {'HH'}), ('sc', {'Chis', 'Sidt', 'Tols', 'Berf', 'Tayo'})]
    153  = PropertyValueAliases.txt new property values (diff old & new .txt files)
    154    (cd $UNIDATA_ROOT && diff -u uni16/final/UCD/ucd/PropertyValueAliases.txt uni17/beta/ucd/PropertyValueAliases.txt | egrep '^[-+][a-zA-Z]')
    155    +age; 17.0                             ; V17_0
    156    +blk; Beria_Erfe                       ; Beria_Erfe
    157    +blk; Chisoi                           ; Chisoi
    158    +blk; CJK_Ext_J                        ; CJK_Unified_Ideographs_Extension_J
    159    +blk; Misc_Symbols_Sup                 ; Miscellaneous_Symbols_Supplement
    160    +blk; Sharada_Sup                      ; Sharada_Supplement
    161    +blk; Sidetic                          ; Sidetic
    162    +blk; Tai_Yo                           ; Tai_Yo
    163    +blk; Tangut_Components_Sup            ; Tangut_Components_Supplement
    164    +blk; Tolong_Siki                      ; Tolong_Siki
    165    +jg ; Thin_Noon                        ; Thin_Noon
    166    +lb ; HH                               ; Unambiguous_Hyphen
    167    +sc ; Berf                             ; Beria_Erfe
    168    +sc ; Chis                             ; Chisoi
    169    +sc ; Sidt                             ; Sidetic
    170    +sc ; Tayo                             ; Tai_Yo
    171    +sc ; Tols                             ; Tolong_Siki
    172  + copy new API constants from the preparseucd.py output into the .h/.java files,
    173    add/adjust comments, wrap lines, and set numeric values
    174  + (ignore Age: no API constants for that)
    175  + Block:
    176      uchar.h before UBLOCK_COUNT,
    177      UCharacter.UnicodeBlock IDs before COUNT,
    178      UCharacter.UnicodeBlock objects before INVALID_CODE
    179  + Script: uscript.h & com.ibm.icu.lang.UScript
    180  + for new scripts: fix expectedLong names
    181      in cintltst/cucdapi.c/TestUScriptCodeAPI()
    182      and in com.ibm.icu.dev.test.lang.TestUScript.java
    183  + Indic_Syllabic_Category: uchar.h & UCharacter.IndicSyllabicCategory
    184  + Note: preparseucd.py does not write constants for values of every property.
    185    Add some manually, or write more generator code.
    186  + after adding new API constants, run preparseucd.py again
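The diff of PropertyValueAliases.txt determines which constants are new. The following minimal Python sketch illustrates that step; it is not how preparseucd.py works internally. It groups the added lines by property and skips Age, as noted above. It treats a value that appears with both `-` and `+` as a changed alias rather than a new value. It assumes the `prop ; short ; long [; aliases]` layout shown above, and `new_property_values` is a hypothetical name.

```python
def new_property_values(diff_lines, skip=("age",)):
    """Group added PropertyValueAliases.txt entries by property.

    Returns {property: [(short_name, long_name), ...]}, excluding values that were
    also removed (i.e. existing values whose alias lists changed).
    """
    added, removed = [], set()
    for line in diff_lines:
        line = line.strip()
        if not line or line[0] not in "+-" or line.startswith(("+++", "---")):
            continue  # keep only added/removed data lines, not blank lines or diff headers
        fields = [f.strip() for f in line[1:].split(";")]
        if len(fields) < 3 or fields[0] in skip:
            continue
        if line[0] == "-":
            removed.add((fields[0], fields[1]))
        else:
            added.append((fields[0], fields[1], fields[2]))
    result = {}
    for prop, short_name, long_name in added:
        if (prop, short_name) in removed:
            continue  # existing value whose alias list changed, not a new value
        result.setdefault(prop, []).append((short_name, long_name))
    return result
```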
    187 
    188 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    189    (not strictly necessary for NOT_ENCODED scripts)
    190  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
    191 
    192 * build ICU
    193  to make sure that there are no syntax errors
    194 
    195  $ICU_OUT/icu4c$ echo;echo; date; make -j20 tests &> out.txt ; tail -n 30 out.txt ; date
    196 
    197 * Bazel build process
    198 
    199 See https://unicode-org.github.io/icu/processes/unicode-update#bazel-build-process
    200 for an overview and for setup instructions.
    201 
    202 Consider running `bazelisk --version` outside of the $ICU_SRC folder
    203 to find out the latest `bazel` version, and
    204 copying that version number into the $ICU_SRC/.bazeliskrc config file.
    205 (Revert if you find incompatibilities, or, better, update our build & config files.)
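The version bump can be scripted. The following minimal sketch makes two assumptions; check both against the actual tool output and file before relying on them. First, it assumes that `bazelisk --version` prints a line like `bazel 8.2.1` (or `Build label: 8.2.1`). Second, it assumes that $ICU_SRC/.bazeliskrc sets the version with a `USE_BAZEL_VERSION=...` line. The helper names are hypothetical.

```python
import re


def latest_version_from_output(output):
    """Extract a version like 8.2.1 from assumed `bazelisk --version` output."""
    m = re.search(r"(?:^bazel|Build label:)\s+(\d+\.\d+\.\d+)", output, re.MULTILINE)
    return m.group(1) if m else None


def set_bazel_version(rc_text, version):
    """Return .bazeliskrc text with USE_BAZEL_VERSION set; other lines are kept unchanged."""
    lines = rc_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("USE_BAZEL_VERSION="):
            lines[i] = "USE_BAZEL_VERSION=" + version
            break
    else:
        lines.append("USE_BAZEL_VERSION=" + version)  # no existing setting: add one
    return "\n".join(lines) + "\n"
```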
    206 
    207 * TODO:
    208  Error when upgrading from Bazel 7.2.1 to Bazel 8.2.1:
    209    ERROR: Skipping '//icu4c/source/tools/gennorm2': error loading package 'icu4c/source/tools/gennorm2': Unable to find package for @@[unknown repo 'rules_cc' requested from @@]//cc:defs.bzl: The repository '@@[unknown repo 'rules_cc' requested from @@]' could not be resolved: No repository visible as '@rules_cc' from main repository. Was the repository introduced in WORKSPACE? The WORKSPACE file is disabled by default in Bazel 8 (late 2024) and will be removed in Bazel 9 (late 2025), please migrate to Bzlmod. See https://bazel.build/external/migration.
    210  Need to revisit!
    211 
    212 * generate data files
    213 
    214 - remember to define the environment variables
    215  (see the start of the section for this Unicode version)
    216 - cd $ICU_SRC
    217 - optional (not normally necessary):
    218    bazelisk clean
    219      or even
    220    bazelisk clean --expunge
    221 - build/bootstrap/generate new files:
    222    icu4c/source/data/unidata/generate.sh
    223 
    224 * NOTE: propsVectorsTrie_index in uprops.icu / uchar_props_data.h
    225  (and a bit of propsVectors)
    226  increased by some 22kB, probably mostly due to the revised Identifier_Type data,
    227  especially for Unified_Ideograph characters.
    228 
    229 * run & fix ICU4C tests
    230 - Note: Some of the collation data and test data will be updated below,
    231  so at this time we might get some collation test failures.
    232  Ignore these for now.
    233 - Some properties are hardcoded in the ICU libraries because they apply to
    234  few characters or ranges, and are not expected to change often.
    235  They are tested at least in C++ intltest (e.g., against ppucd.txt).
    236  If these tests fail, then update the implementation and the tests.
    237 - Robin or Andy helps with RBBI & spoof check test failures
    238 
    239 * collation: CLDR collation root, UCA DUCET
    240 
    241 - UCA DUCET goes into Mark's Unicode tools,
    242  and a tool-tailored version goes into CLDR, see
    243    https://github.com/unicode-org/unicodetools/blob/main/docs/uca/index.md
    244 
    245 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    246    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
    247 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    248    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
    249    (note: the target file name drops the underscore before "Rules")
    250    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
    251 - restore TODO diffs in UCARules.txt; adjust boundaries as needed, e.g., for new currency symbols
    252    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
    253 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
    254  and (ICU4J)/main/collate/src/test/resources/com/ibm/icu/dev/data/CollationTest_*.txt
    255  from the CLDR root files (..._CLDR_..._SHORT.txt)
    256    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    257    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    258    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/collate/src/test/resources/com/ibm/icu/dev/data
    259 - if CLDR common/uca/unihan-index.txt changes, then update
    260  CLDR common/collation/root.xml <collation type="private-unihan">
    261  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
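The copy steps above rename files on the way in. The following minimal sketch lists the same (source, destination) pairs, using the paths from the cp commands above. It lists only the two CollationTest variants, whereas the ICU4J cp command uses a glob. `collation_copy_plan` is a hypothetical helper. The pairs are ordered so that the ICU4C test files are updated before they are copied to ICU4J.

```python
import os


def collation_copy_plan(cldr_src, icu_src):
    """Return (source, destination) pairs mirroring the cp commands above, in a safe order."""
    uca = os.path.join(cldr_src, "common", "uca")
    unidata = os.path.join(icu_src, "icu4c", "source", "data", "unidata")
    testdata = os.path.join(icu_src, "icu4c", "source", "test", "testdata")
    icu4j_data = os.path.join(icu_src, "icu4j", "main", "collate", "src", "test",
                              "resources", "com", "ibm", "icu", "dev", "data")
    plan = [
        (os.path.join(uca, "FractionalUCA_SHORT.txt"), os.path.join(unidata, "FractionalUCA.txt")),
        # The target name drops the underscore before "Rules" and the _SHORT suffix.
        (os.path.join(uca, "UCA_Rules_SHORT.txt"), os.path.join(unidata, "UCARules.txt")),
    ]
    for variant in ("NON_IGNORABLE", "SHIFTED"):
        icu_name = f"CollationTest_{variant}_SHORT.txt"
        icu4c_test = os.path.join(testdata, icu_name)
        # The CLDR file name has an extra "_CLDR" component.
        plan.append((os.path.join(uca, f"CollationTest_CLDR_{variant}_SHORT.txt"), icu4c_test))
        # ICU4J then gets a copy of the updated ICU4C test file under the same name.
        plan.append((icu4c_test, os.path.join(icu4j_data, icu_name)))
    return plan
```

To copy, run `shutil.copy2(src, dst)` for each pair in list order.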
    262 
    263 - update CollationFCD.java:
    264  copy & paste the initializers of lcccIndex[] etc.
    265  from
    266    $ICU_SRC/icu4c/source/i18n/collationfcd.cpp
    267  to
    268    $ICU_SRC/icu4j/main/collate/src/main/java/com/ibm/icu/impl/coll/CollationFCD.java
    269 - generate data files, as above (generate.sh), now to pick up new collation data
    270 - rebuild ICU4C (make clean, make check, as usual)
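Copying the CollationFCD initializers by hand is error-prone. The following minimal sketch automates the copy & paste under three assumptions. It assumes that each C++ table is written as `name[...]={ ... };` and each Java table as `name={ ... };`, with no nested braces. It also assumes that the values are valid Java literals as-is, which the copy & paste instruction implies. The helper names are hypothetical; review the result in a diff before committing.

```python
import re


def extract_initializer(cpp_text, array_name):
    """Return the text between the braces of `array_name[...]={ ... };` in C++ source, or None."""
    pattern = re.compile(
        r"\b" + re.escape(array_name) + r"\s*\[[^\]]*\]\s*=\s*\{(.*?)\}\s*;", re.DOTALL)
    m = pattern.search(cpp_text)
    return m.group(1).strip() if m else None


def replace_java_initializer(java_text, array_name, body):
    """Replace the body of `array_name={ ... };` in Java source; return None if not found."""
    pattern = re.compile(
        r"(\b" + re.escape(array_name) + r"\s*=\s*\{)(.*?)(\}\s*;)", re.DOTALL)
    # A function replacement avoids backslash processing of the copied body.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + "\n" + body + "\n" + m.group(3), java_text, count=1)
    return new_text if count else None
```

Repeat this for each table, for example lcccIndex, lcccBits, and so on, as listed in collationfcd.cpp.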
    271 
    272 * Unihan collators
    273    https://github.com/unicode-org/unicodetools/blob/main/docs/unihan.md
    274 - run Unicode Tools GenerateUnihanCollators & GenerateUnihanCollatorFiles,
    275  check CLDR diffs, copy to CLDR, test CLDR, ... as documented there
    276 - generate ICU zh collation data
    277    Follow the tools/cldr/cldr-to-icu/README.md file.
    278  + setup:
    279    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
    280        (didn't work without setting JAVA_HOME,
    281         nor with the Google default of /usr/local/buildtools/java/jdk
    282         [Google security limitations in the XML parser])
    283    export PATH=$JAVA_HOME/bin:$PATH
    284    export TOOLS_ROOT=$ICU_SRC/tools
    285    export ICU_DIR=$ICU_SRC
    286    export CLDR_DIR=$CLDR_SRC
    287    export CLDR_DATA_DIR=$CLDR_DIR
    288        (pointing to the "raw" data rather than cldr-staging/.../production should be OK for the relevant files)
    289  + build & run Java code
    290    Follow the instructions in the TLDR section of the cldr-to-icu/README.md file.
    291    In TLDR "Run the conversion tool" add parameters to generate only the files we need:
    292      java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar --outDir=/tmp/icu --outputTypes=coll,transforms --localeIdFilter='zh.*' --dontGenCode
    293  + diff
    294    cd $ICU_SRC
    295    meld icu4c/source/data/coll/zh.txt /tmp/icu/coll/zh.txt
    296    meld icu4c/source/data/translit/Hani_Latn.txt /tmp/icu/translit/Hani_Latn.txt
    297  + copy into the source tree
    298    cd $ICU_SRC
    299    cp /tmp/icu/coll/zh.txt icu4c/source/data/coll/zh.txt
    300    cp /tmp/icu/translit/Hani_Latn.txt icu4c/source/data/translit/Hani_Latn.txt
    301 - rebuild ICU4C
    302 
    303 * run & fix ICU4C tests, now with new CLDR collation root data
    304 - run all tests with the collation test data *_SHORT.txt or the full files
    305  (the full ones have comments, useful for debugging)
    306 - note on intltest: if collate/UCAConformanceTest fails, then
    307  utility/MultithreadTest/TestCollators will fail as well;
    308  fix the conformance test before looking into the multi-thread test
    309 
    310 * update Java data files
    311 - refresh only the UCD/UCA-related/derived files, to be safe
    312 - see (ICU4C)/source/data/icu4j-readme.txt
    313 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    314 - $ICU_OUT/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    315    NOTE: If you get the error "No rule to make target 'out/build/icudt70l/uprops.icu'",
    316    you need to reconfigure with unicore data; see the "configure" line above.
    317  output:
    318    ...
    319    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
    320    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudata
    321    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudata
    322    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt78l.dat ./out/icu4j/icudt78b.dat -s ./out/build/icudt78l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudata
    323    mv ./out/icu4j/"com/ibm/icu/impl/data/icudata/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudata/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudata/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudata/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudata"
    324    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudata/
    325    mkdir -p /tmp/icu4j/main/shared/data
    326    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    327    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudata/
    328    mkdir -p /tmp/icu4j/main/shared/data
    329    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    330    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
    331 - copy the binary data files into the ICU4J tree
    332    cd $ICU_OUT/icu4c/data/out/icu4j
    333    cp -v com/ibm/icu/impl/data/icudata/coll/* $ICU_SRC/icu4j/main/collate/src/main/resources/com/ibm/icu/impl/data/icudata/coll
    334    cp -v com/ibm/icu/impl/data/icudata/brkitr/* $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/icudata/brkitr
    335    cp -v com/ibm/icu/impl/data/icudata/confusables.cfu $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/icudata
    336    cp -v com/ibm/icu/impl/data/icudata/*.nrm $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/icudata
    337    cd com/ibm/icu/impl/data/icudata/
    338    ls *.icu | egrep -v "cnvalias.icu" | awk '{print "cp " $0 " $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/icudata";}' | sh
    339 - The procedure above is very conservative:
    340  It refreshes only the parts of the ICU4J data that we think are affected by a Unicode data update.
    341  It avoids dealing with any other discrepancies
    342  between the source and generated data files.
    343  *If* instead we wanted to refresh *all* of the ICU4J data from ICU4C:
    344      $ICU_OUT/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
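The `ls *.icu | egrep -v ... | awk ... | sh` pipeline in the conservative procedure copies every .icu file except cnvalias.icu. The following minimal Python sketch does the same. Unlike `egrep -v`, it excludes files by exact name rather than by regular expression. `copy_icu_files` is a hypothetical helper.

```python
import glob
import os
import shutil


def copy_icu_files(src_dir, dst_dir, exclude=("cnvalias.icu",)):
    """Copy every *.icu file from src_dir to dst_dir except the excluded names.

    Returns the sorted list of copied file names.
    """
    copied = []
    for path in sorted(glob.glob(os.path.join(src_dir, "*.icu"))):
        name = os.path.basename(path)
        if name in exclude:
            continue  # converter alias data is not refreshed by this procedure
        shutil.copy2(path, os.path.join(dst_dir, name))
        copied.append(name)
    return copied
```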
    345 
    346 * refresh Java test .txt files
    347 - copy new .txt files into ICU4J's main/core/src/test/resources/com/ibm/icu/dev/data/unicode
    348    cd $ICU_SRC/icu4c/source/data/unidata
    349    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/unicode
    350    cd ../../test/testdata
    351    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/unicode
    352    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/unicode
    353 
    354 * run & fix ICU4J tests
    355 
    356 *** API additions
    357 - send notice to icu-design about new born-@stable API (enum constants etc.)
    358 
    359 *** CLDR numbering systems
    360 - look for new sets of decimal digits (gc=Nd & nv=4, which matches one digit per set) and add them to CLDR
    361  for example:
    362    ~/unitools/mine/src$ diff -u unicodetools/data/ucd/16.0.0/extracted/DerivedGeneralCategory.txt unicodetools/data/ucd/dev/extracted/DerivedGeneralCategory.txt | grep '; Nd' | egrep '^\+'
    363    -->
    364      +11DE0..11DE9  ; Nd #  [10] TOLONG SIKI DIGIT ZERO..TOLONG SIKI DIGIT NINE
    365      +16DA0..16DA9  ; Nd #  [10] CHISOI DIGIT ZERO..CHISOI DIGIT NINE
    366  --> https://github.com/unicode-org/cldr/pull/4726
    367      (FYI: Chisoi was later removed from Unicode 17)
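The grep above can be approximated in Python. The following minimal sketch assumes diff lines of the form `+start..end ; Nd # ...`, as shown above. It keeps only added ranges of exactly ten code points and builds the digit string for each one, which is the form in which CLDR numbering systems list their digits. `new_digit_sets` is a hypothetical helper.

```python
import re

# An added line such as: +11DE0..11DE9  ; Nd #  [10] ...
_ND_LINE = re.compile(r"^\+([0-9A-F]{4,6})\.\.([0-9A-F]{4,6})\s*;\s*Nd\b")


def new_digit_sets(diff_lines):
    """Return (start, end, digits) for each added Nd range of exactly ten code points."""
    sets = []
    for line in diff_lines:
        m = _ND_LINE.match(line.strip())
        if not m:
            continue  # context lines, removed lines, and other categories
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        if end - start == 9:  # one complete set of digits zero..nine
            digits = "".join(chr(cp) for cp in range(start, end + 1))
            sets.append((start, end, digits))
    return sets
```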
    368 
    369 *** merge the Unicode update branch back onto the main branch
    370 - make sure that changes to Unicode tools are checked in:
    371  https://github.com/unicode-org/unicodetools
    372 
    373 ---------------------------------------------------------------------------- ***
    374 
    375 Unicode 16.0 update for ICU 76
    376 
    377 https://www.unicode.org/versions/Unicode16.0.0/
    378 https://www.unicode.org/versions/beta-16.0.0.html
    379 https://www.unicode.org/Public/draft/
    380 https://www.unicode.org/reports/uax-proposed-updates.html
    381 https://www.unicode.org/reports/tr44/tr44-33.html
    382 
    383 https://unicode-org.atlassian.net/browse/ICU-22707 Unicode 16
    384 https://unicode-org.atlassian.net/browse/CLDR-17226 BRS Unicode 16
    385 
    386 https://github.com/unicode-org/unicodetools/pull/774 delete the RecommendedSetGenerator
    387 
    388 https://github.com/unicode-org/unicodetools/issues/492 adjust cldr/*BreakTest generation for Unicode 15.1
    389 
    390 * Command-line environment setup
    391 
    392 Markus:
    393 
    394 export UNIDATA_ROOT=~/unidata
    395 export UNICODE_DATA=$UNIDATA_ROOT/uni16/final
    396 export CLDR_SRC=~/cldr/uni/src
    397 export ICU_ROOT=~/icu/uni
    398 export ICU_SRC=$ICU_ROOT/src
    399 export ICU_OUT=$ICU_ROOT/dbg
    400 export ICUDT=icudt76b
    401 export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
    402 export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
    403 export LD_LIBRARY_PATH=$ICU_OUT/icu4c/lib
    404 export UNICODE_TOOLS=~/unitools/mine/src
    405 
    406 Elango:
    407 
    408 export UNIDATA_ROOT=~/oss/unidata
    409 export UNICODE_DATA=$UNIDATA_ROOT/uni16/final
    410 export CLDR_SRC=~/oss/cldr/mine/src
    411 export ICU_ROOT=~/oss/icu
    412 export ICU_SRC=$ICU_ROOT
    413 export ICU_OUT=$ICU_ROOT
    414 export ICUDT=icudt76b
    415 export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
    416 export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
    417 export LD_LIBRARY_PATH=$ICU_OUT/icu4c/lib
    418 export UNICODE_TOOLS=~/oss/unicodetools/mine/src
    419 
    420 *** Unicode version numbers
    421 - icu4c/source/data/makedata.mak
    422 - icu4c/source/common/unicode/uchar.h
    423 - com.ibm.icu.util.VersionInfo
    424 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
    425 
    426 *** Configure: Build Unicode data for ICU4J
    427 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
    428    so that the makefiles see the new version number.
    429 - FYI: The option that adds the additional Unicode data files for ICU4J is
    430    ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data
    431 - Markus's version:
    432  cd $ICU_OUT/icu4c
    433  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data CXXFLAGS="-DU_USING_ICU_NAMESPACE=0 -Wimplicit-fallthrough" CPPFLAGS="-DU_NO_DEFAULT_INCLUDE_UTF_HEADERS=1 -fsanitize=bounds" LDFLAGS=-fsanitize=bounds ../../src/icu4c/source/runConfigureICU --enable-debug --disable-release Linux/clang --prefix=/usr/local/google/home/mscherer/icu/mine/inst/icu4c > config.out 2>&1 ; tail config.out
    434 - Elango's version (different default C++ compiler & in-source build paths):
    435  cd $ICU_OUT/icu4c/source
    436  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data CXXFLAGS="-DU_USING_ICU_NAMESPACE=0 -Wimplicit-fallthrough" CPPFLAGS="-DU_NO_DEFAULT_INCLUDE_UTF_HEADERS=1 -fsanitize=bounds" LDFLAGS=-fsanitize=bounds ./runConfigureICU --enable-debug --disable-release Linux/gcc --prefix=/usr/local/google/home/elango/oss/icu/icu4c > config.out 2>&1 ; tail config.out
    437 
    438 *** data files & enums & parser code
    439 
    440 * download files
    441 - same as for the early Unicode Tools setup and data refresh:
    442  https://github.com/unicode-org/unicodetools/blob/main/docs/index.md
    443  https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md
    444 - mkdir -p $UNICODE_DATA
    445 - download Unicode files into $UNICODE_DATA
    446  + use an FTP client; anonymous FTP from www.unicode.org at /Public/draft etc.
    447  + subfolders: emoji, idna, security, ucd, uca
    448  + for pre-release (alpha, beta) data files:
    449    ~ if one of us produces the alpha.zip or beta.zip collection of data files for publication,
    450      then we can use its contents directly (no FTP from unicode.org necessary)
    451    ~ otherwise download from https://www.unicode.org/Public/draft/
    452    ~ you can omit or discard the UCD/charts/ and UCD/ucdxml/ files/folders
    453    ~ you can omit or discard UCD/ucd/Unihan.zip
    454  + alternate way of fetching files, if available:
    455    copy the files from a Unicode Tools workspace that is up to date with
    456    https://github.com/unicode-org/unicodetools
    457    and which might at this point be *ahead* of "Public"
    458    ~ before the Unicode release, copy files from the "dev" subfolders, for example
    459      https://github.com/unicode-org/unicodetools/tree/main/unicodetools/data/ucd/dev
    460  + for final-release data files, the source of truth is the files in
    461    https://www.unicode.org/Public/(version) [=UCD],
    462    https://www.unicode.org/Public/UCA/(version),
    463    https://www.unicode.org/Public/idna/(version),
    464    etc.
    465 - get the CLDR version of GraphemeBreakTest.txt from CLDR (if it has been updated there already)
    466  or from the UCD/cldr/ output folder of the Unicode Tools:
    467  From Unicode 12/CLDR 35/ICU 64 to Unicode 15.0/CLDR 43/ICU 73,
    468  CLDR used modified grapheme break rules.
    469  This might happen again.
    470  + To check in the Unicode Tools workspace:
    471    ~/unitools/mine/Generated$ meld UCD/16.0.0/auxiliary/*GraphemeBreakTest.txt UCD/16.0.0/cldr/GraphemeBreakTest-cldr.txt
    472  + If different, and after copying into CLDR:
    473    cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
    474  or
    475    cp ~/unitools/mine/Generated/UCD/16.0.0/cldr/GraphemeBreakTest-cldr.txt icu4c/source/test/testdata/GraphemeBreakTest.txt
    476    cp ~/unitools/mine/Generated/UCD/16.0.0/cldr/GraphemeBreakTest-cldr.txt $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
    477    cp ~/unitools/mine/Generated/UCD/16.0.0/cldr/GraphemeBreakTest-cldr.html $CLDR_SRC/common/properties/segments/GraphemeBreakTest.html
    478  + We may need CLDR versions of WordBreakTest.txt and LineBreakTest.txt
    479    unless Unicode 16 and CLDR 46 eliminate their differences:
    480    unicodetools issue #492
    481 
    482 * process and/or copy files
    483 - cd $ICU_SRC/tools/unicode
    484    py/preparseucd.py $UNICODE_DATA $ICU_SRC
    485  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
    486  + For debugging, and tweaking how ppucd.txt is written,
    487    the tool has an --only_ppucd option:
    488      py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
    489    e.g.
    490      py/preparseucd.py $UNICODE_DATA --only_ppucd /tmp/ppucd.txt
    491 
    492 * new constants for new property values
    493 - preparseucd.py error:
    494    ValueError: missing uchar.h enum constants for some property values:
    495    [('blk', {'Garay', 'Tulu_Tigalari', 'Todhri', 'Sunuwar', 'Egyptian_Hieroglyphs_Ext_A', 'Kirat_Rai', 'Symbols_For_Legacy_Computing_Sup', 'Myanmar_Ext_C', 'Ol_Onal', 'Gurung_Khema'}),
    496    ('sc', {'Gara', 'Onao', 'Todr', 'Krai', 'Tutg', 'Sunu', 'Gukh'}),
    497    ('InSC', {'Reordering_Killer'})]
    498  = PropertyValueAliases.txt new property values (diff old & new .txt files)
    499    (cd $UNIDATA_ROOT && diff -u uni15.1/final/ucd/PropertyValueAliases.txt uni16/alpha/UCD/ucd/PropertyValueAliases.txt | egrep '^[-+][a-zA-Z]')
    500    +age; 16.0                             ; V16_0
    501    +blk; Egyptian_Hieroglyphs_Ext_A       ; Egyptian_Hieroglyphs_Extended_A
    502    +blk; Garay                            ; Garay
    503    +blk; Gurung_Khema                     ; Gurung_Khema
    504    +blk; Kirat_Rai                        ; Kirat_Rai
    505    +blk; Myanmar_Ext_C                    ; Myanmar_Extended_C
    506    +blk; Ol_Onal                          ; Ol_Onal
    507    +blk; Sunuwar                          ; Sunuwar
    508    +blk; Symbols_For_Legacy_Computing_Sup ; Symbols_For_Legacy_Computing_Supplement
    509    +blk; Todhri                           ; Todhri
    510    +blk; Tulu_Tigalari                    ; Tulu_Tigalari
    511    +InSC; Reordering_Killer               ; Reordering_Killer
    512    -jg ; Teh_Marbuta_Goal                 ; Hamza_On_Heh_Goal
    513    +jg ; Teh_Marbuta_Goal                 ; Teh_Marbuta_Goal                 ; Hamza_On_Heh_Goal
    514    +sc ; Gara                             ; Garay
    515    +sc ; Gukh                             ; Gurung_Khema
    516    +sc ; Krai                             ; Kirat_Rai
    517    +sc ; Onao                             ; Ol_Onal
    518    +sc ; Sunu                             ; Sunuwar
    519    +sc ; Todr                             ; Todhri
    520    +sc ; Tutg                             ; Tulu_Tigalari
    521  + copy new API constants from the preparseucd.py output into the .h/.java files,
    522    add/adjust comments, wrap lines, and set numeric values
    523  + (ignore Age: no API constants for that)
    524  + Block: uchar.h before UBLOCK_COUNT,
    525      UCharacter.UnicodeBlock IDs, UCharacter.UnicodeBlock objects
    526  + Script: uscript.h & com.ibm.icu.lang.UScript
    527  + for new scripts: fix expectedLong names
    528      in cintltst/cucdapi.c/TestUScriptCodeAPI()
    529      and in com.ibm.icu.dev.test.lang.TestUScript.java
    530  + Indic_Syllabic_Category: uchar.h & UCharacter.IndicSyllabicCategory
    531  + after adding new API constants, run preparseucd.py again
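Optional: to list only the new values of one property (for example the new `sc` codes for uscript.h and UScript), the PropertyValueAliases.txt diff can be filtered. This is a sketch, not an ICU tool. The function name new_values is hypothetical, and it assumes the `prop ; short ; long [; more aliases]` line format shown above.

```shell
# Hypothetical helper (not part of the ICU tools).
# new_values PROP: read (filtered or unfiltered) `diff -u` output of
# PropertyValueAliases.txt on stdin and print "short long" for each
# added ("+") line of property PROP.
new_values() {
  awk -F';' -v prop="$1" '
    /^[+]/ {
      p = substr($1, 2)                 # drop the leading "+"
      gsub(/[ \t]/, "", p)
      if (p != prop) next               # other property, or a "+++" header
      short = $2; long = $3
      gsub(/[ \t]/, "", short)
      gsub(/[ \t]/, "", long)
      print short, long
    }'
}
```

Usage, with the diff command from above:
   (cd $UNIDATA_ROOT && diff -u uni15.1/final/ucd/PropertyValueAliases.txt uni16/alpha/UCD/ucd/PropertyValueAliases.txt) | new_values sc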
    532 
    533 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    534    (not strictly necessary for NOT_ENCODED scripts)
    535  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
    536 
    537 * build ICU
    538  to make sure that there are no syntax errors
    539 
    540  $ICU_OUT/icu4c$ echo;echo; date; make -j20 tests &> out.txt ; tail -n 30 out.txt ; date
    541 
    542 * Bazel build process
    543 
    544 See https://unicode-org.github.io/icu/processes/unicode-update#bazel-build-process
    545 for an overview and for setup instructions.
    546 
    547 Consider running `bazelisk --version` outside of the $ICU_SRC folder
    548 to find out the latest `bazel` version, and
    549 copying that version number into the $ICU_SRC/.bazeliskrc config file.
    550 (Revert if you find incompatibilities, or, better, update our build & config files.)
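A sketch for the .bazeliskrc step. It assumes that `bazelisk --version` prints a line like `bazel 7.4.1` and that .bazeliskrc pins the version with a `USE_BAZEL_VERSION=...` line; check both against your setup. The function name is hypothetical.

```shell
# Hypothetical helper (not part of the ICU tools).
# set_bazel_version RCFILE: read a line like "bazel 7.4.1" on stdin and
# replace (or add) the USE_BAZEL_VERSION=... line in RCFILE,
# keeping any other lines.
set_bazel_version() {
  rc=$1
  version=$(awk '$1 == "bazel" { print $2; exit }')
  if [ -z "$version" ]; then
    echo "set_bazel_version: no 'bazel X.Y.Z' line on stdin" >&2
    return 1
  fi
  tmp=$(mktemp)
  if [ -f "$rc" ]; then
    grep -v '^USE_BAZEL_VERSION=' "$rc" > "$tmp"
  fi
  echo "USE_BAZEL_VERSION=$version" >> "$tmp"
  cat "$tmp" > "$rc"   # overwrite in place, keeping the file's permissions
  rm -f "$tmp"
}
```

Usage (run bazelisk outside of $ICU_SRC, then review with git diff):
   (cd /tmp && bazelisk --version) | set_bazel_version $ICU_SRC/.bazeliskrc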
    551 
    552 * generate data files
    553 
    554 - remember to define the environment variables
    555  (see the start of the section for this Unicode version)
    556 - cd $ICU_SRC
- optional:
    558    bazelisk clean
    559      or even
    560    bazelisk clean --expunge
    561 - build/bootstrap/generate new files:
    562    icu4c/source/data/unidata/generate.sh
    563 
    564 * run & fix ICU4C tests
    565 - Note: Some of the collation data and test data will be updated below,
    566  so at this time we might get some collation test failures.
    567  Ignore these for now.
    568 - Some properties are hardcoded in the ICU libraries because they apply to
    569  few characters or ranges, and are not expected to change often.
    570  They are tested at least in C++ intltest (e.g., against ppucd.txt).
    571  If these tests fail, then update the implementation and the tests.
    572 - update CLDR GraphemeBreakTest.txt
    573  (see the download section above about this file)
    574    cd ~/unitools/mine/Generated
    575    cp UCD/16.0.0/cldr/GraphemeBreakTest-cldr.txt $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
    576    cp UCD/16.0.0/cldr/GraphemeBreakTest-cldr.html $CLDR_SRC/common/properties/segments/GraphemeBreakTest.html
    577    cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt $ICU_SRC/icu4c/source/test/testdata
    578 - Robin or Andy helps with RBBI & spoof check test failures
    579 
    580 * collation: CLDR collation root, UCA DUCET
    581 
    582 - UCA DUCET goes into Mark's Unicode tools,
    583  and a tool-tailored version goes into CLDR, see
    584    https://github.com/unicode-org/unicodetools/blob/main/docs/uca/index.md
    585 
    586 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    587    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   (first save the old file, for restoring TODO diffs in the next step;
   note that the ICU file name has no underscore before "Rules")
   cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
   cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
    592 - restore TODO diffs in UCARules.txt
    593    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
    594 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
 and (ICU4J)/main/collate/src/test/resources/com/ibm/icu/dev/data/CollationTest_*.txt
    596  from the CLDR root files (..._CLDR_..._SHORT.txt)
    597    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    598    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    599    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/collate/src/test/resources/com/ibm/icu/dev/data
    600 - if CLDR common/uca/unihan-index.txt changes, then update
    601  CLDR common/collation/root.xml <collation type="private-unihan">
    602  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
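The two renaming cp commands for the CollationTest files can also be written as a loop that derives each ICU file name from the CLDR name instead of typing it by hand. A sketch with a hypothetical function name, assuming the CLDR naming shown above (CollationTest_CLDR_<TYPE>_SHORT.txt).

```shell
# Hypothetical helper (not part of the ICU tools).
# copy_collation_tests SRC_DIR DEST_DIR: copy each
# SRC_DIR/CollationTest_CLDR_<TYPE>_SHORT.txt to
# DEST_DIR/CollationTest_<TYPE>_SHORT.txt (dropping the "CLDR_" infix).
copy_collation_tests() {
  src=$1 dest=$2
  for f in "$src"/CollationTest_CLDR_*_SHORT.txt; do
    if [ ! -e "$f" ]; then   # the glob matched nothing
      echo "copy_collation_tests: no CLDR test files in $src" >&2
      return 1
    fi
    name=$(basename "$f")
    cp -v "$f" "$dest/CollationTest_${name#CollationTest_CLDR_}"
  done
}
```

Usage (then copy testdata/CollationTest_*.txt into ICU4J as before):
   copy_collation_tests $CLDR_SRC/common/uca $ICU_SRC/icu4c/source/test/testdata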
    603 
    604 - update CollationFCD.java:
    605  copy & paste the initializers of lcccIndex[] etc.
    606  from
    607    $ICU_SRC/icu4c/source/i18n/collationfcd.cpp
    608  to
    609    $ICU_SRC/icu4j/main/collate/src/main/java/com/ibm/icu/impl/coll/CollationFCD.java
    610 - generate data files, as above (generate.sh), now to pick up new collation data
    611 - rebuild ICU4C (make clean, make check, as usual)
    612 
    613 * Unihan collators
    614    https://github.com/unicode-org/unicodetools/blob/main/docs/unihan.md
    615 - run Unicode Tools GenerateUnihanCollators & GenerateUnihanCollatorFiles,
    616  check CLDR diffs, copy to CLDR, test CLDR, ... as documented there
    617 - generate ICU zh collation data
    618    WARNING: outdated, don't do this, follow the tools/cldr/cldr-to-icu/README.md file!
    619    --- Old text from here:
    620    instructions inspired by
    621    https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/README.txt and
    622    https://github.com/unicode-org/icu/blob/main/icu4c/source/data/cldr-icu-readme.txt
    623  + setup:
    624    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
    625        (didn't work without setting JAVA_HOME,
    626         nor with the Google default of /usr/local/buildtools/java/jdk
    627         [Google security limitations in the XML parser])
    628    export TOOLS_ROOT=$ICU_SRC/tools
    629    export CLDR_DIR=$CLDR_SRC
    630    export CLDR_DATA_DIR=$CLDR_DIR
    631        (pointing to the "raw" data, not cldr-staging/.../production should be ok for the relevant files)
    632    cd "$TOOLS_ROOT/cldr/lib"
    633    ./install-cldr-jars.sh "$CLDR_DIR"
    634  + generate the files we need
    635    cd "$TOOLS_ROOT/cldr/cldr-to-icu"
    636    ant -f build-icu-data.xml -DoutDir=/tmp/icu -DoutputTypes=coll,transforms -DlocaleIdFilter='zh.*'
    637  + diff
    638    cd $ICU_SRC
    639    meld icu4c/source/data/coll/zh.txt /tmp/icu/coll/zh.txt
    640    meld icu4c/source/data/translit/Hani_Latn.txt /tmp/icu/translit/Hani_Latn.txt
    641  + copy into the source tree
    642    cd $ICU_SRC
    643    cp /tmp/icu/coll/zh.txt icu4c/source/data/coll/zh.txt
    644    cp /tmp/icu/translit/Hani_Latn.txt icu4c/source/data/translit/Hani_Latn.txt
    645 - rebuild ICU4C
    646 
    647 * run & fix ICU4C tests, now with new CLDR collation root data
    648 - run all tests with the collation test data *_SHORT.txt or the full files
    649  (the full ones have comments, useful for debugging)
    650 - note on intltest: if collate/UCAConformanceTest fails, then
    651  utility/MultithreadTest/TestCollators will fail as well;
    652  fix the conformance test before looking into the multi-thread test
    653 
    654 * update Java data files
    655 - refresh just the UCD/UCA-related/derived files, just to be safe
    656 - see (ICU4C)/source/data/icu4j-readme.txt
    657 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    658 - $ICU_OUT/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    659    NOTE: If you get the error "No rule to make target 'out/build/icudt70l/uprops.icu'",
    660    you need to reconfigure with unicore data; see the "configure" line above.
    661  output:
    662    ...
    663    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
    664    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt76b
    665    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt76b
    666    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt76l.dat ./out/icu4j/icudt76b.dat -s ./out/build/icudt76l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt76b
    667    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt76b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt76b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt76b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt76b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt76b"
    668    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt76b/
    669    mkdir -p /tmp/icu4j/main/shared/data
    670    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    671    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt76b/
    672    mkdir -p /tmp/icu4j/main/shared/data
    673    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    674    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
    675 - copy the binary data files into the ICU4J tree
    676    cd $ICU_OUT/icu4c/data/out/icu4j
    677    cp -v com/ibm/icu/impl/data/icudata/coll/* $ICU_SRC/icu4j/main/collate/src/main/resources/com/ibm/icu/impl/data/icudata/coll
    678    cp -v com/ibm/icu/impl/data/icudata/brkitr/* $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/icudata/brkitr
    679    cp -v com/ibm/icu/impl/data/icudata/confusables.cfu $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/icudata
    680    cp -v com/ibm/icu/impl/data/icudata/*.nrm $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/icudata
    681    cd com/ibm/icu/impl/data/icudata/
    682    ls *.icu | egrep -v "cnvalias.icu" | awk '{print "cp " $0 " $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/icudata";}' | sh
    683 - The procedure above is very conservative:
    684  It refreshes only the parts of the ICU4J data that we think are affected by a Unicode data update.
    685  It avoids dealing with any other discrepancies
    686  between the source and generated data files.
    687  *If* instead we wanted to refresh *all* of the ICU4J data from ICU4C:
    688      $ICU_OUT/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
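The `ls *.icu | egrep -v "cnvalias.icu" | awk ... | sh` pipeline in the copy step generates cp commands as text, and the generated commands only work if $ICU_SRC is exported. A plain loop selects the same files without parsing ls output. A sketch with a hypothetical function name:

```shell
# Hypothetical helper (not part of the ICU tools).
# copy_icu_files SRC_DIR DEST_DIR: copy every SRC_DIR/*.icu file except
# cnvalias.icu to DEST_DIR (same selection as the egrep -v filter).
copy_icu_files() {
  src=$1 dest=$2
  for f in "$src"/*.icu; do
    [ -e "$f" ] || continue          # no *.icu files at all
    case $(basename "$f") in
      cnvalias.icu) ;;               # skipped, as in the original pipeline
      *) cp -v "$f" "$dest" ;;
    esac
  done
}
```

Usage:
   cd $ICU_OUT/icu4c/data/out/icu4j
   copy_icu_files com/ibm/icu/impl/data/icudata $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/icudata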
    689 
    690 * refresh Java test .txt files
    691 - copy new .txt files into ICU4J's main/core/src/test/resources/com/ibm/icu/dev/data/unicode
    692    cd $ICU_SRC/icu4c/source/data/unidata
    693    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/unicode
    694    cd ../../test/testdata
    695    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/unicode
    696    cp -v $UNICODE_DATA/UCD/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/unicode
    697 
    698 * run & fix ICU4J tests
    699 
    700 *** API additions
    701 - send notice to icu-design about new born-@stable API (enum constants etc.)
    702 
    703 *** CLDR numbering systems
- look for new sets of decimal digits (gc=Nd & nv=4) and add them to CLDR
    705  for example:
    706    ~/unitools/mine/src$ diff -u unicodetools/data/ucd/15.1.0/extracted/DerivedGeneralCategory.txt unicodetools/data/ucd/dev/extracted/DerivedGeneralCategory.txt | grep '; Nd' | egrep '^\+'
    707    -->
    708      +10D40..10D49  ; Nd #  [10] GARAY DIGIT ZERO..GARAY DIGIT NINE
    709      +116D0..116E3  ; Nd #  [20] MYANMAR PAO DIGIT ZERO..MYANMAR EASTERN PWO KAREN DIGIT NINE
    710      +11BF0..11BF9  ; Nd #  [10] SUNUWAR DIGIT ZERO..SUNUWAR DIGIT NINE
    711      +16130..16139  ; Nd #  [10] GURUNG KHEMA DIGIT ZERO..GURUNG KHEMA DIGIT NINE
    712      +16D70..16D79  ; Nd #  [10] KIRAT RAI DIGIT ZERO..KIRAT RAI DIGIT NINE
    713      +1CCF0..1CCF9  ; Nd #  [10] OUTLINED DIGIT ZERO..OUTLINED DIGIT NINE
    714      +1E5F1..1E5FA  ; Nd #  [10] OL ONAL DIGIT ZERO..OL ONAL DIGIT NINE
    715  --> https://github.com/unicode-org/cldr/pull/3658
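A quick sanity check on such diff output: each new set of decimal digits should be a contiguous range whose length is a multiple of 10 (the Myanmar range above has 20 code points, for two digit sets). A sketch, assuming the DerivedGeneralCategory.txt line format shown above; digit_sets is a hypothetical name.

```shell
# Hypothetical helper (not part of the ICU tools).
# digit_sets: read DerivedGeneralCategory.txt lines (optionally with a
# leading diff "+") on stdin; for each gc=Nd range print
# "START..END LENGTH" plus how many 10-digit sets it holds.
digit_sets() {
  awk '
    function hex(s,    i, v) {
      v = 0
      s = toupper(s)
      for (i = 1; i <= length(s); i++)
        v = v * 16 + index("0123456789ABCDEF", substr(s, i, 1)) - 1
      return v
    }
    /;[ \t]*Nd/ {
      r = $1
      sub(/^[+]/, "", r)                # drop the diff marker, if any
      start = r; sub(/[.][.].*/, "", start)
      end = r;   sub(/.*[.][.]/, "", end)
      len = hex(end) - hex(start) + 1
      if (len % 10 == 0) note = (len / 10) " set(s)"
      else note = "NOT a multiple of 10"
      printf "%s..%s %d %s\n", start, end, len, note
    }'
}
```

Usage: append `| digit_sets` to the diff/grep command above.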
    716 
    717 *** merge the Unicode update branch back onto the main branch
    718 - make sure that changes to Unicode tools are checked in:
    719  https://github.com/unicode-org/unicodetools
    720 
    721 ---------------------------------------------------------------------------- ***
    722 
    723 Unicode 15.1 update for ICU 74
    724 
    725 https://www.unicode.org/versions/Unicode15.1.0/
    726 https://www.unicode.org/versions/beta-15.1.0.html
    727 https://www.unicode.org/Public/draft/
    728 https://www.unicode.org/reports/uax-proposed-updates.html
    729 https://www.unicode.org/reports/tr44/tr44-31.html
    730 
    731 https://unicode-org.atlassian.net/browse/ICU-22404 Unicode 15.1
    732 https://unicode-org.atlassian.net/browse/CLDR-16669 BRS Unicode 15.1
    733 
    734 https://github.com/unicode-org/unicodetools/issues/492 adjust cldr/*BreakTest generation for Unicode 15.1
    735 
    736 * Command-line environment setup
    737 
    738 Markus:
    739 
    740 export UNIDATA_ROOT=~/unidata
    741 export UNICODE_DATA=$UNIDATA_ROOT/uni15.1/final
    742 export CLDR_SRC=~/cldr/uni/src
    743 export ICU_ROOT=~/icu/uni
    744 export ICU_SRC=$ICU_ROOT/src
    745 export ICU_OUT=$ICU_ROOT/dbg
    746 export ICUDT=icudt74b
    747 export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
    748 export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
    749 export LD_LIBRARY_PATH=$ICU_OUT/icu4c/lib
    750 export UNICODE_TOOLS=~/unitools/mine/src
    751 
    752 Elango:
    753 
    754 export UNIDATA_ROOT=~/oss/unidata
    755 export UNICODE_DATA=$UNIDATA_ROOT/uni15.1/snapshot
    756 export CLDR_SRC=~/oss/cldr/mine/src
    757 export ICU_ROOT=~/oss/icu
    758 export ICU_SRC=$ICU_ROOT
    759 export ICU_OUT=$ICU_ROOT
    760 export ICUDT=icudt74b
    761 export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
    762 export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
    763 export LD_LIBRARY_PATH=$ICU_OUT/icu4c/lib
    764 export UNICODE_TOOLS=~/oss/unicodetools/mine/src
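Before pasting tool command lines, it can help to verify that all of these variables are set in the current console. A sketch; check_icu_env is a hypothetical helper, and its variable list is copied from the export lines above.

```shell
# Hypothetical helper (not part of the ICU tools).
# check_icu_env: print each variable from the export lists above that is
# unset or empty; return non-zero if any is missing.
check_icu_env() {
  missing=0
  for v in UNIDATA_ROOT UNICODE_DATA CLDR_SRC ICU_ROOT ICU_SRC ICU_OUT \
           ICUDT ICU4C_DATA_IN ICU4C_UNIDATA LD_LIBRARY_PATH UNICODE_TOOLS; do
    eval "val=\${$v:-}"              # indirect lookup of the variable named $v
    if [ -z "$val" ]; then
      echo "not set: $v" >&2
      missing=1
    fi
  done
  return "$missing"
}
```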
    765 
    766 *** Unicode version numbers
    767 - makedata.mak
    768 - uchar.h
    769 - com.ibm.icu.util.VersionInfo
    770 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
    771 
    772 *** Configure: Build Unicode data for ICU4J
    773 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
    774    so that the makefiles see the new version number.
    775  cd $ICU_OUT/icu4c
    776  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
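To confirm that uchar.h already has the new version before rerunning configure, print its U_UNICODE_VERSION string (for this update it should print 15.1). A sketch, assuming the `#define U_UNICODE_VERSION "15.1"` form used in uchar.h; the function name is hypothetical.

```shell
# Hypothetical helper (not part of the ICU tools).
# uchar_unicode_version FILE: print the version string from a line like
#   #define U_UNICODE_VERSION "15.1"
uchar_unicode_version() {
  sed -n 's/^#[[:space:]]*define[[:space:]][[:space:]]*U_UNICODE_VERSION[[:space:]][[:space:]]*"\([0-9.]*\)".*/\1/p' "$1"
}
```

Usage:
   uchar_unicode_version $ICU_SRC/icu4c/source/common/unicode/uchar.h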
    777 
    778 *** data files & enums & parser code
    779 
    780 * download files
    781 - same as for the early Unicode Tools setup and data refresh:
    782  https://github.com/unicode-org/unicodetools/blob/main/docs/index.md
    783  https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md
    784 - mkdir -p $UNICODE_DATA
    785 - download Unicode files into $UNICODE_DATA
    786  + new since Unicode 15.1:
    787    for the pre-release (alpha, beta) data files,
    788    download all of https://www.unicode.org/Public/draft/
    789    (you can omit or discard the UCD/charts/ and UCD/ucdxml/ files/folders)
    790  + if one of us produces the alpha.zip or beta.zip collection of data files for publication,
    791    then we can use its contents directly (no FTP from unicode.org necessary)
 + for final-release data files, the sources of truth are the files in
    793    https://www.unicode.org/Public/(version) [=UCD],
    794    https://www.unicode.org/Public/UCA/(version),
    795    https://www.unicode.org/Public/idna/(version),
    796    etc.
    797  + use an FTP client; anonymous FTP from www.unicode.org at /Public/draft etc.
    798  + subfolders: emoji, idna, security, ucd, uca
    799  + whichever way you download the files:
    800    ~ inside ucd: extract Unihan.zip to "here" (.../UCD/ucd/Unihan/*.txt), delete Unihan.zip
    801    ~ split Unihan into single-property files
    802      ~/unitools/mine/src$ py/splitunihan.py $UNICODE_DATA/UCD/ucd/Unihan
    803    ~ FYI: for updating ICU, we do not actually need Unihan.zip contents
    804  + alternate way of fetching files, if available:
    805    copy the files from a Unicode Tools workspace that is up to date with
    806    https://github.com/unicode-org/unicodetools
    807    and which might at this point be *ahead* of "Public"
    808    ~ before the Unicode release copy files from "dev" subfolders, for example
    809      https://github.com/unicode-org/unicodetools/tree/main/unicodetools/data/ucd/dev
    810 - get the CLDR version of GraphemeBreakTest.txt from CLDR (if it has been updated there already)
    811    or from the UCD/cldr/ output folder of the Unicode Tools:
    812    From Unicode 12/CLDR 35/ICU 64 to Unicode 15.0/CLDR 43/ICU 73,
    813    CLDR used modified grapheme break rules.
    814    This might happen again.
    815  cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
    816    or
    817  cp ~/unitools/mine/Generated/UCD/15.1.0/cldr/GraphemeBreakTest-cldr.txt icu4c/source/test/testdata/GraphemeBreakTest.txt
    818  cp ~/unitools/mine/Generated/UCD/15.1.0/cldr/GraphemeBreakTest-cldr.txt $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
    819  cp ~/unitools/mine/Generated/UCD/15.1.0/cldr/GraphemeBreakTest-cldr.html $CLDR_SRC/common/properties/segments/GraphemeBreakTest.html
    820  + Done: figure out whether we need a CLDR version of LineBreakTest.txt:
    821    unicodetools issue #492
   We should have had one; instead, rbbitst.cpp has a "known issue" exception.
    823    Unicode 16 and CLDR 46 might get back to having the same behavior.
    824 - cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
    825  + done in ICU 76: modify preparseucd.py to copy this file
    826 
    827 * Note: Since Unicode 15.1, data files are no longer published with version suffixes
    828  even during the alpha or beta.
    829  Thus we no longer need steps & tools to remove those suffixes.
    830  (remove this note next time)
    831 
    832 * process and/or copy files
    833 - cd $ICU_SRC/tools/unicode
    834  py/preparseucd.py $UNICODE_DATA $ICU_SRC
    835  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
    836  + For debugging, and tweaking how ppucd.txt is written,
    837    the tool has an --only_ppucd option:
    838    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
    839 
    840 * new constants for new property values
    841 - preparseucd.py error:
    842    ValueError: missing uchar.h enum constants for some property values: [('blk', {'CJK_Ext_I'}), ('lb', {'VF', 'VI', 'AS', 'AK', 'AP'})]
    843  = PropertyValueAliases.txt new property values (diff old & new .txt files)
    844    cd $UNIDATA_ROOT
    845    $ diff -u uni15.0/ucd/PropertyValueAliases.txt uni15.1/snapshot/UCD/ucd/PropertyValueAliases.txt | egrep '^[-+][a-zA-Z]'
    846    +age; 15.1                             ; V15_1
    847    +blk; CJK_Ext_I                        ; CJK_Unified_Ideographs_Extension_I
    848    +IDSU; N                               ; No                               ; F                                ; False
    849    +IDSU; Y                               ; Yes                              ; T                                ; True
    850    +ID_Compat_Math_Continue; N            ; No                               ; F                                ; False
    851    +ID_Compat_Math_Continue; Y            ; Yes                              ; T                                ; True
    852    +ID_Compat_Math_Start; N               ; No                               ; F                                ; False
    853    +ID_Compat_Math_Start; Y               ; Yes                              ; T                                ; True
    854    +lb ; AK                               ; Aksara
    855    +lb ; AP                               ; Aksara_Prebase
    856    +lb ; AS                               ; Aksara_Start
    857    +lb ; VF                               ; Virama_Final
    858    +lb ; VI                               ; Virama
    859  -> add new blocks to uchar.h before UBLOCK_COUNT
    860    use long property names for enum constants,
    861    for the trailing comment get the block start code point: diff old & new Blocks.txt
    862    cd $UNIDATA_ROOT
    863    $ diff -u uni15.0/ucd/Blocks.txt uni15.1/snapshot/UCD/ucd/Blocks.txt | egrep '^[-+][0-9A-Z]'
    864    +2EBF0..2EE4F; CJK Unified Ideographs Extension I
    865    (ignore blocks whose end code point changed)
    866  -> add new blocks to UCharacter.UnicodeBlock IDs
    867    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
    868            replace  public static final int \1_ID = \2; \3
    869  -> add new blocks to UCharacter.UnicodeBlock objects
    870    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
    871            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
    872  -> add new line break values to uchar.h & UCharacter.LineBreak
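Command-line alternatives to the block steps above, as a sketch. new_block_enums drafts uchar.h lines from the Blocks.txt diff, numbering them from a given first value (normally the old UBLOCK_COUNT) and skipping "+" lines whose start code point also appears on a "-" line (blocks whose range merely changed). The two sed functions apply the same regular expressions as the Eclipse find/replace patterns. All function names are hypothetical; the enum names are derived from the Blocks.txt block names and should be checked against the long names in PropertyValueAliases.txt, and the /*[START]*/ comment form is an assumption about the existing uchar.h lines.

```shell
# Hypothetical helpers (not part of the ICU tools).
# new_block_enums FIRST_VALUE: read `diff -u` output of Blocks.txt on stdin
# and print draft uchar.h enum lines for added blocks, numbered from
# FIRST_VALUE. "+" lines whose start code point also occurs on a "-" line
# (blocks whose range changed) are skipped.
new_block_enums() {
  awk -v n="$1" '
    /^[-+][0-9A-F]+[.][.][0-9A-F]+;/ {
      sign = substr($0, 1, 1)
      line = substr($0, 2)
      start = line; sub(/[.][.].*/, "", start)
      if (sign == "-") { old[start] = 1; next }
      name = line; sub(/^[^;]*;[ \t]*/, "", name)
      sub(/[ \t]+$/, "", name)
      gsub(/[ -]/, "_", name)           # "Extended-A" -> "Extended_A"
      count++; starts[count] = start; names[count] = toupper(name)
    }
    END {
      for (i = 1; i <= count; i++)
        if (!(starts[i] in old))
          printf "    UBLOCK_%s = %d, /*[%s]*/\n", names[i], n++, starts[i]
    }'
}

# Same patterns as the Eclipse find/replace steps, for UBLOCK_ lines on stdin
# (output lines are not indented; re-indent after pasting).
ublock_to_java_ids() {
  sed -E -n 's#^[[:space:]]*UBLOCK_([^ ]+) = ([0-9]+), (/.+)#public static final int \1_ID = \2; \3#p'
}
ublock_to_java_objects() {
  sed -E -n 's#^[[:space:]]*UBLOCK_([^ ]+) = [0-9]+, (/.+)#public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2#p'
}
```

Usage: pipe the Blocks.txt diff (without the egrep filter) into new_block_enums, paste the result into uchar.h, then pipe the new UBLOCK_ lines through the two sed functions for UCharacter.UnicodeBlock.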
    873 
    874 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    875    (not strictly necessary for NOT_ENCODED scripts)
    876  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
    877 
    878 * build ICU
    879  to make sure that there are no syntax errors
    880 
    881  $ICU_OUT/icu4c$ echo;echo; date; make -j7 tests &> out.txt ; tail -n 30 out.txt ; date
    882 
    883 * update spoof checker UnicodeSet initializers:
    884    inclusionPat & recommendedPat in i18n/uspoof.cpp
    885    INCLUSION & RECOMMENDED in SpoofChecker.java
    886 - make sure that the Unicode Tools tree contains the latest security data files
    887 - go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
    888 - run the tool (no special environment variables needed)
    889  cd $UNICODE_TOOLS
 mvn -s ~/.m2/settings.xml compile exec:java -Dexec.mainClass="org.unicode.text.tools.RecommendedSetGenerator" \
    891      -Dexec.args="" -am -pl unicodetools  -DCLDR_DIR=$(cd ../../../cldr/mine/src ; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd)
    892 - copy & paste from the Console output into the .cpp & .java files
    893 
    894 * check hardcoded IDS_Unary_Operator
    895 - new in Unicode 15.1, hardcoded because trivial, and unlikely to change
    896 - check that it has not changed:
    897    (cd $UNICODE_DATA && grep -r --include=PropList.txt IDS_Unary_Operator)
    898 - if it has changed, then update the implementation and the tests
    899 - Since ICU 75, this property is tested in C++ intltest against ppucd.txt.
    900 
    901 * check hardcoded ID_Compat_Math_Start & ID_Compat_Math_Continue
    902 - new in Unicode 15.1, hardcoded because trivial, and unlikely to change
    903 - check that they have not changed:
    904    (cd $UNICODE_DATA && grep -r --include=PropList.txt ID_Compat_Math)
    905 - if they have changed, then update the implementation and the tests
    906 - Since ICU 75, these properties are tested in C++ intltest against ppucd.txt.
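Instead of reading the grep output for these hardcoded properties by eye, the old and new PropList.txt ranges can be compared directly. A sketch with hypothetical function names, assuming the usual `range ; Property # comment` PropList.txt line format.

```shell
# Hypothetical helpers (not part of the ICU tools).
# prop_lines FILE PROP: print the data lines of a PropList.txt-style FILE
# for property PROP, with comments and spaces removed.
prop_lines() {
  sed -n "s/#.*//; /;[[:space:]]*$2[[:space:]]*\$/p" "$1" | tr -d ' \t'
}

# compare_prop OLD_FILE NEW_FILE PROP: report whether PROP's ranges changed;
# returns non-zero if they did.
compare_prop() {
  a=$(prop_lines "$1" "$3")
  b=$(prop_lines "$2" "$3")
  if [ "$a" = "$b" ]; then
    echo "$3: unchanged"
  else
    echo "$3: CHANGED"
    return 1
  fi
}
```

Usage, with OLD being the previous version's ucd folder:
   for p in IDS_Unary_Operator ID_Compat_Math_Start ID_Compat_Math_Continue; do
     compare_prop OLD/PropList.txt $UNICODE_DATA/UCD/ucd/PropList.txt $p
   done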
    907 
    908 * Bazel build process
    909 
    910 See https://unicode-org.github.io/icu/processes/unicode-update#bazel-build-process
    911 for an overview and for setup instructions.
    912 
    913 Consider running `bazelisk --version` outside of the $ICU_SRC folder
    914 to find out the latest `bazel` version, and
    915 copying that version number into the $ICU_SRC/.bazeliskrc config file.
    916 (Revert if you find incompatibilities, or, better, update our build & config files.)
    917 
    918 * generate data files
    919 
    920 - remember to define the environment variables
    921  (see the start of the section for this Unicode version)
    922 - cd $ICU_SRC
- optional:
    924    bazelisk clean
    925      or even
    926    bazelisk clean --expunge
    927 - build/bootstrap/generate new files:
    928    icu4c/source/data/unidata/generate.sh
    929 
* Since Unicode 15.1, the UTS #46 data derivation no longer looks at the decompositions (NFD).
 As a result, U+2260, U+226E, U+226F (which decompose to =, <, > plus U+0338)
 are now just valid, no longer disallowed_STD3_valid.
 Remove their special handling (isNonASCIIDisallowedSTD3Valid())
    933  from uts46.cpp & UTS46.java,
    934  and special test code from uts46test.cpp & UTS46Test.java.
    935  (remove this section next time)
    936 
    937 * run & fix ICU4C tests
    938 - Note: Some of the collation data and test data will be updated below,
    939  so at this time we might get some collation test failures.
    940  Ignore these for now.
    941 - fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
    942 - update CLDR GraphemeBreakTest.txt
    943    cd ~/unitools/mine/Generated
    944    cp UCD/15.1.0/cldr/GraphemeBreakTest-cldr.txt $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
    945    cp UCD/15.1.0/cldr/GraphemeBreakTest-cldr.html $CLDR_SRC/common/properties/segments/GraphemeBreakTest.html
    946    cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt $ICU_SRC/icu4c/source/test/testdata
    947 - Robin or Andy helps with RBBI & spoof check test failures
    948 
    949 * collation: CLDR collation root, UCA DUCET
    950 
    951 - UCA DUCET goes into Mark's Unicode tools,
    952  and a tool-tailored version goes into CLDR, see
    953    https://github.com/unicode-org/unicodetools/blob/main/docs/uca/index.md
    954 
    955 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    956    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   (first save the old file, for restoring TODO diffs in the next step;
   note that the ICU file name has no underscore before "Rules")
   cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
   cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
    961 - restore TODO diffs in UCARules.txt
    962    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
    963 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
    964  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
    965  from the CLDR root files (..._CLDR_..._SHORT.txt)
    966    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    967    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    968    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
    969 - if CLDR common/uca/unihan-index.txt changes, then update
    970  CLDR common/collation/root.xml <collation type="private-unihan">
    971  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
    972 
    973 - generate data files, as above (generate.sh), now to pick up new collation data
    974 - update CollationFCD.java:
    975  copy & paste the initializers of lcccIndex[] etc. from
    976    ICU4C/source/i18n/collationfcd.cpp to
    977    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
    978 - rebuild ICU4C (make clean, make check, as usual)
    979 
    980 * Unihan collators
    981    https://github.com/unicode-org/unicodetools/blob/main/docs/unihan.md
    982 - run Unicode Tools GenerateUnihanCollators & GenerateUnihanCollatorFiles,
    983  check CLDR diffs, copy to CLDR, test CLDR, ... as documented there
    984 - generate ICU zh collation data
    985    instructions inspired by
    986    https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/README.txt and
    987    https://github.com/unicode-org/icu/blob/main/icu4c/source/data/cldr-icu-readme.txt
    988  + setup:
    989    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
    990        (didn't work without setting JAVA_HOME,
    991         nor with the Google default of /usr/local/buildtools/java/jdk
    992         [Google security limitations in the XML parser])
    993    export TOOLS_ROOT=$ICU_SRC/tools
    994    export CLDR_DIR=$CLDR_SRC
    995    export CLDR_DATA_DIR=$CLDR_DIR
    996        (pointing to the "raw" data, not cldr-staging/.../production should be ok for the relevant files)
    997    cd "$TOOLS_ROOT/cldr/lib"
    998    ./install-cldr-jars.sh "$CLDR_DIR"
    999  + generate the files we need
   1000    cd "$TOOLS_ROOT/cldr/cldr-to-icu"
   1001    ant -f build-icu-data.xml -DoutDir=/tmp/icu -DoutputTypes=coll,transforms -DlocaleIdFilter='zh.*'
   1002  + diff
   1003    cd $ICU_SRC
   1004    meld icu4c/source/data/coll/zh.txt /tmp/icu/coll/zh.txt
   1005    meld icu4c/source/data/translit/Hani_Latn.txt /tmp/icu/translit/Hani_Latn.txt
   1006  + copy into the source tree
   1007    cd $ICU_SRC
   1008    cp /tmp/icu/coll/zh.txt icu4c/source/data/coll/zh.txt
   1009    cp /tmp/icu/translit/Hani_Latn.txt icu4c/source/data/translit/Hani_Latn.txt
   1010 - rebuild ICU4C
   1011 
   1012 * run & fix ICU4C tests, now with new CLDR collation root data
   1013 - run all tests with the collation test data *_SHORT.txt or the full files
   1014  (the full ones have comments, useful for debugging)
   1015 - note on intltest: if collate/UCAConformanceTest fails, then
   1016  utility/MultithreadTest/TestCollators will fail as well;
   1017  fix the conformance test before looking into the multi-thread test
   1018 
   1019 * update Java data files
   1020 - refresh just the UCD/UCA-related/derived files, just to be safe
   1021 - see (ICU4C)/source/data/icu4j-readme.txt
   1022 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   1023 - $ICU_OUT/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1024    NOTE: If you get the error "No rule to make target 'out/build/icudt70l/uprops.icu'",
   1025    you need to reconfigure with unicore data; see the "configure" line above.
   1026  output:
   1027    ...
   1028    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
   1029    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt74b
   1030    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt74b
   1031    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt74l.dat ./out/icu4j/icudt74b.dat -s ./out/build/icudt74l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt74b
   1032    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt74b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt74b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt74b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt74b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt74b"
   1033    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt74b/
   1034    mkdir -p /tmp/icu4j/main/shared/data
   1035    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   1036    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt74b/
   1037    mkdir -p /tmp/icu4j/main/shared/data
   1038    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   1039    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
   1040 - copy the binary data files into the ICU4J tree
   1041    cd $ICU_OUT/icu4c/data/out/icu4j
   1042    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/$ICUDT/coll
   1043    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/$ICUDT/brkitr
   1044    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/$ICUDT
   1045    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/$ICUDT
   1046    cd com/ibm/icu/impl/data/$ICUDT/
   1047    ls *.icu | egrep -v "cnvalias.icu" | awk '{print "cp " $0 " $ICU_SRC/icu4j/main/core/src/main/resources/com/ibm/icu/impl/data/$ICUDT";}' | sh
   1048 - The procedure above is very conservative:
   1049  It refreshes only the parts of the ICU4J data that we think are affected by a Unicode data update.
   1050  It avoids dealing with any other discrepancies
   1051  between the source and generated data files.
   1052  *If* instead we wanted to refresh *all* of the ICU4J data from ICU4C:
   1053      $ICU_OUT/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
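The `ls *.icu | egrep -v "cnvalias.icu" | awk ... | sh` pipeline in the copy step above copies every .icu file except cnvalias.icu. If you prefer to script this step, here is a minimal Python sketch of the same selection. `copy_icu_files` is a hypothetical helper, not part of the ICU tools. It assumes both folders already exist and that any environment variables in the paths have already been expanded.

```python
import shutil
from pathlib import Path


def copy_icu_files(src_dir, dest_dir, exclude=("cnvalias.icu",)):
    """Copy each *.icu file in src_dir to dest_dir, except the excluded names.

    Roughly the same selection as:
      ls *.icu | egrep -v "cnvalias.icu" | awk '{print "cp " $0 " <dest>";}' | sh
    Returns the sorted names of the copied files.
    """
    dest = Path(dest_dir)
    copied = []
    for path in sorted(Path(src_dir).glob("*.icu")):
        if path.name in exclude:
            continue  # excluded, like the egrep -v filter above
        shutil.copy2(path, dest / path.name)
        copied.append(path.name)
    return copied
```

Call it with the com/ibm/icu/impl/data/$ICUDT output folder as src_dir and the matching ICU4J resources folder as dest_dir.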
   1054 
   1055 * refresh Java test .txt files
   1056 - copy new .txt files into ICU4J's main/core/src/test/resources/com/ibm/icu/dev/data/unicode
   1057    cd $ICU_SRC/icu4c/source/data/unidata
   1058    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/unicode
   1059    cd ../../test/testdata
   1060    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/unicode
   1061    cp -v $UNICODE_DATA/UCD/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/unicode
   1062 
   1063 * run & fix ICU4J tests
   1064 
   1065 *** API additions
   1066 - send notice to icu-design about new born-@stable API (enum constants etc.)
   1067 
   1068 *** CLDR numbering systems
   1069 - look for new sets of decimal digits (gc=Nd & nv=4) and add them to CLDR
   1070  for example:
   1071    ~/icu/mine/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-15.txt
   1072    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-15.1.txt
   1073    ~/icu/uni/src$ diff -u /tmp/icu/nv4-15.txt /tmp/icu/nv4-15.1.txt
   1074    -->
   1075    (empty this time)
   1076  or:
   1077    ~/unitools/mine/src$ diff -u unicodetools/data/ucd/15.0.0/extracted/DerivedGeneralCategory.txt unicodetools/data/ucd/dev/extracted/DerivedGeneralCategory.txt | grep '; Nd' | egrep '^\+'
   1078    -->
   1079    (empty this time)
   1080  Unicode 15.1:
   1081    (none this time)
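The egrep/diff pair above finds decimal digit sets that were added since the previous version. Each set has exactly one digit with value 4, so this filter yields one line per set. Below is a rough Python equivalent. It assumes the semicolon-separated ppucd.txt "cp" line format shown in the Unicode 15 example later in this file. Like the egrep command, it only sees properties that are written explicitly on a "cp" line. The function names are hypothetical.

```python
def digit_four_code_points(ppucd_lines):
    """Map code point -> name for ppucd.txt "cp" lines with gc=Nd and nv=4.

    Like: egrep ';gc=Nd.+;nv=4' ppucd.txt
    Each decimal digit set has exactly one digit with value 4,
    so each result stands for one set of ten digits.
    """
    result = {}
    for line in ppucd_lines:
        fields = line.rstrip("\n").split(";")
        if fields[0] != "cp" or len(fields) < 3:
            continue  # comments, blank lines, block/default lines
        props = fields[2:]
        if "gc=Nd" in props and "nv=4" in props:
            name = next((p[3:] for p in props if p.startswith("na=")), "")
            start = fields[1].split("..")[0]  # tolerate a range, use its start
            result[int(start, 16)] = name
    return result


def new_digit_sets(old_lines, new_lines):
    """Return the digit-four entries present in new_lines but not in old_lines."""
    old = digit_four_code_points(old_lines)
    return {cp: name for cp, name in digit_four_code_points(new_lines).items()
            if cp not in old}
```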
   1082 
   1083 *** merge the Unicode update branch back onto the main branch
   1084 - do not merge the icudata.jar and testdata.jar,
   1085  instead rebuild them from merged & tested ICU4C
   1086 - if there is a merge conflict in icudata.jar, here is one way to deal with it:
   1087  +   remove icudata.jar from the commit so that rebasing is trivial
   1088  + ~/icu/uni/src$ git restore --source=main icu4j/main/shared/data/icudata.jar
   1089  + ~/icu/uni/src$ git commit -a --amend
   1090  +   switch to main, pull updates, switch back to the dev branch
   1091  + ~/icu/uni/src$ git rebase main
   1092  +   rebuild icudata.jar
   1093  + ~/icu/uni/src$ git commit -a --amend
   1094  + ~/icu/uni/src$ git push -f
   1095 - make sure that changes to Unicode tools are checked in:
   1096  https://github.com/unicode-org/unicodetools
   1097 
   1098 ---------------------------------------------------------------------------- ***
   1099 
   1100 CLDR 43 root collation update for ICU 73
   1101 
   1102 Partial update only for the root collation.
   1103 See
   1104 - https://unicode-org.atlassian.net/browse/CLDR-15946
   1105  Treat quote marks as equivalent when strength=UCOL_PRIMARY
   1106 - https://github.com/unicode-org/cldr/pull/2691
   1107  CLDR-15946 make fancy quotes primary-equal to ASCII fallbacks
   1108 - https://github.com/unicode-org/cldr/pull/2833
   1109  CLDR-15946 make fancy quotes secondary-different from each other
   1110 
   1111 The related changes to tailorings were already integrated in an earlier PR for
   1112 https://unicode-org.atlassian.net/browse/ICU-22220 ICU 73rc BRS.
   1113 
   1114 This update is for the root collation,
   1115 which is handled by different tools than the locale data updates.
   1116 
   1117 * Command-line environment setup
   1118 
   1119 export UNICODE_DATA=~/unidata/uni15/20220830
   1120 export CLDR_SRC=~/cldr/uni/src
   1121 export ICU_ROOT=~/icu/uni
   1122 export ICU_SRC=$ICU_ROOT/src
   1123 export ICUDT=icudt73b
   1124 export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
   1125 export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
   1126 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
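The ICUDT value tracks the ICU major version. The make output elsewhere in this file shows both forms: the ICU4C build writes icudt73l (little-endian), and ICU4J uses icudt73b (big-endian). If you script the environment setup, a small hypothetical helper like the following can derive the names instead of editing them by hand.

```python
def icu_data_names(icu_major_version, big_endian=True):
    """Return (data package name, ICU4J resource folder) for an ICU major version.

    The package name is "icudt" + major version + a letter for the data format:
    "b" for big-endian (the ICU4J form, used for $ICUDT above) and
    "l" for little-endian (the ICU4C build output on x86 Linux machines).
    """
    name = "icudt%d%s" % (icu_major_version, "b" if big_endian else "l")
    return name, "com/ibm/icu/impl/data/" + name
```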
   1127 
   1128 *** Configure: Build Unicode data for ICU4J
   1129  cd $ICU_ROOT/dbg/icu4c
   1130  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
   1131 
   1132 * Bazel build process
   1133 
   1134 See https://unicode-org.github.io/icu/processes/unicode-update#bazel-build-process
   1135 for an overview and for setup instructions.
   1136 
   1137 Consider running `bazelisk --version` outside of the $ICU_SRC folder
   1138 to find out the latest `bazel` version, and
   1139 copying that version number into the $ICU_SRC/.bazeliskrc config file.
   1140 (Revert if you find incompatibilities, or, better, update our build & config files.)
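bazelisk reads the Bazel version to use from a `USE_BAZEL_VERSION=...` line in .bazeliskrc. Below is a minimal sketch for updating that line. It assumes the file uses this KEY=value form, so check the existing $ICU_SRC/.bazeliskrc first. The helper name is hypothetical, and the version numbers in the test are illustrative only.

```python
import re

_VERSION_LINE = re.compile(r"^USE_BAZEL_VERSION=.*$", re.MULTILINE)


def set_bazel_version(bazeliskrc_text, version):
    """Return .bazeliskrc text with USE_BAZEL_VERSION set to version.

    An existing USE_BAZEL_VERSION=... line is replaced; otherwise one is appended.
    All other lines are kept unchanged.
    """
    new_line = "USE_BAZEL_VERSION=" + version
    if _VERSION_LINE.search(bazeliskrc_text):
        # A function replacement avoids treating backslashes in version as escapes.
        return _VERSION_LINE.sub(lambda _m: new_line, bazeliskrc_text)
    if bazeliskrc_text and not bazeliskrc_text.endswith("\n"):
        bazeliskrc_text += "\n"
    return bazeliskrc_text + new_line + "\n"
```

Read the file, pass its text and the version number reported by `bazelisk --version`, and write the result back. Revert as described above if the new version is incompatible.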
   1141 
   1142 * generate data files
   1143 
   1144 - remember to define the environment variables
   1145  (see the start of the section for this Unicode version)
   1146 - cd $ICU_SRC
   1147 - optional:
   1148    bazelisk clean
   1149      or even
   1150    bazelisk clean --expunge
   1151 - build/bootstrap/generate new files:
   1152    icu4c/source/data/unidata/generate.sh
   1153 
   1154 * collation: CLDR collation root, UCA DUCET
   1155 
   1156 - UCA DUCET goes into Mark's Unicode tools,
   1157  and a tool-tailored version goes into CLDR, see
   1158    https://github.com/unicode-org/unicodetools/blob/main/docs/uca/index.md
   1159 
   1160 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   1161    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
   1162 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   1163    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
   1164    (note that the target file name drops the underscore before "Rules")
   1165    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
   1166 - restore TODO diffs in UCARules.txt
   1167    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
   1168 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
   1169  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   1170  from the CLDR root files (..._CLDR_..._SHORT.txt)
   1171    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
   1172    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
   1173    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
   1174 - if CLDR common/uca/unihan-index.txt changes, then update
   1175  CLDR common/collation/root.xml <collation type="private-unihan">
   1176  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
   1177 
   1178 - generate data files, as above (generate.sh), now to pick up new collation data
   1179 - rebuild ICU4C (make clean, make check, as usual)
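The collation root copies above follow a fixed file-name mapping. Two details in that mapping are easy to get wrong: "UCA_Rules" loses its underscore, and the test files lose "CLDR_". Keeping the mapping in a table makes both renames explicit. This is a hypothetical Python sketch that only computes the source/destination pairs, taken from the cp commands above. It does not copy anything, so the UCARules.txt backup and meld step still applies.

```python
from pathlib import PurePosixPath

# (file in $CLDR_SRC/common/uca, destination relative to icu4c/source),
# taken from the cp commands above.
COLLATION_FILES = [
    ("FractionalUCA_SHORT.txt", "data/unidata/FractionalUCA.txt"),
    ("UCA_Rules_SHORT.txt", "data/unidata/UCARules.txt"),
    ("CollationTest_CLDR_NON_IGNORABLE_SHORT.txt",
     "test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt"),
    ("CollationTest_CLDR_SHIFTED_SHORT.txt",
     "test/testdata/CollationTest_SHIFTED_SHORT.txt"),
]


def collation_copy_plan(cldr_src, icu_src):
    """Return (source, destination) path strings for the collation root refresh."""
    uca = PurePosixPath(cldr_src, "common", "uca")
    icu4c = PurePosixPath(icu_src, "icu4c", "source")
    return [(str(uca / src), str(icu4c / dest)) for src, dest in COLLATION_FILES]
```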
   1180 
   1181 * run & fix ICU4C tests, now with new CLDR collation root data
   1182 - run all tests with the collation test data *_SHORT.txt or the full files
   1183  (the full ones have comments, useful for debugging)
   1184 - note on intltest: if collate/UCAConformanceTest fails, then
   1185  utility/MultithreadTest/TestCollators will fail as well;
   1186  fix the conformance test before looking into the multi-thread test
   1187 
   1188 * update Java data files
   1189 - refresh only the UCD/UCA-related/derived files, to be safe
   1190 - see (ICU4C)/source/data/icu4j-readme.txt
   1191 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   1192 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1193    NOTE: If you get the error "No rule to make target 'out/build/icudt70l/uprops.icu'",
   1194    you need to reconfigure with unicore data; see the "configure" line above.
   1195  output:
   1196    ...
   1197    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
   1198    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt73b
   1199    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt73b
   1200    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt73l.dat ./out/icu4j/icudt73b.dat -s ./out/build/icudt73l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt73b
   1201    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt73b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt73b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt73b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt73b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt73b"
   1202    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt73b/
   1203    mkdir -p /tmp/icu4j/main/shared/data
   1204    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   1205    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt73b/
   1206    mkdir -p /tmp/icu4j/main/shared/data
   1207    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   1208    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
   1209 - copy the big-endian Unicode data files to another location,
   1210  separate from the other data files,
   1211  and then refresh ICU4J
   1212    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
   1213    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   1214    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   1215    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
   1216 - new for ICU 73: also copy the binary data files directly into the ICU4J tree
   1217    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* $ICU_SRC/icu4j/maven-build/maven-icu4j-datafiles/src/main/resources/com/ibm/icu/impl/data/$ICUDT/coll
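`jar uvf icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT` adds or replaces the staged entries in the jar. A jar is a zip archive, but Python's zipfile module cannot overwrite an entry in place, so the sketch below rebuilds the archive instead. It is an alternative to the jar command, not a replacement for it: it does not add directory entries or touch the manifest. `update_jar` is a hypothetical name.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def update_jar(jar_path, base_dir, subdir):
    """Add or replace the files under base_dir/subdir inside jar_path.

    Similar in effect to: jar uvf <jar_path> -C <base_dir> <subdir>
    but it does not add directory entries or touch the manifest.
    Entries that are not being replaced are copied over unchanged.
    Returns the sorted entry names written from base_dir.
    """
    base = Path(base_dir)
    new_entries = {
        p.relative_to(base).as_posix(): p
        for p in sorted((base / subdir).rglob("*"))
        if p.is_file()
    }
    # Write a new archive next to the old one, then swap it in.
    fd, tmp_name = tempfile.mkstemp(suffix=".jar", dir=os.path.dirname(os.path.abspath(jar_path)))
    os.close(fd)
    try:
        with zipfile.ZipFile(jar_path) as old, \
                zipfile.ZipFile(tmp_name, "w", zipfile.ZIP_DEFLATED) as new:
            for info in old.infolist():
                if info.filename not in new_entries:
                    new.writestr(info, old.read(info.filename))  # keep as-is
            for name, path in new_entries.items():
                new.write(path, name)  # add or replace
        os.replace(tmp_name, jar_path)
    except BaseException:
        os.remove(tmp_name)
        raise
    return sorted(new_entries)
```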
   1218 
   1219 * When refreshing all of ICU4J data from ICU4C
   1220 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1221 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
   1222 or
   1223 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
   1224 
   1225 * refresh Java test .txt files
   1226 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   1227    cd $ICU_SRC/icu4c/source/data/unidata
   1228    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   1229    cd ../../test/testdata
   1230    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   1231    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   1232 
   1233 * run & fix ICU4J tests
   1234 
   1235 *** merge the Unicode update branch back onto the main branch
   1236 - do not merge the icudata.jar and testdata.jar,
   1237  instead rebuild them from merged & tested ICU4C
   1238 - if there is a merge conflict in icudata.jar, here is one way to deal with it:
   1239  +   remove icudata.jar from the commit so that rebasing is trivial
   1240  + ~/icu/uni/src$ git restore --source=main icu4j/main/shared/data/icudata.jar
   1241  + ~/icu/uni/src$ git commit -a --amend
   1242  +   switch to main, pull updates, switch back to the dev branch
   1243  + ~/icu/uni/src$ git rebase main
   1244  +   rebuild icudata.jar
   1245  + ~/icu/uni/src$ git commit -a --amend
   1246  + ~/icu/uni/src$ git push -f
   1247 - make sure that changes to Unicode tools are checked in:
   1248  https://github.com/unicode-org/unicodetools
   1249 
   1250 ---------------------------------------------------------------------------- ***
   1251 
   1252 Unicode 15.0 update for ICU 72
   1253 
   1254 https://www.unicode.org/versions/Unicode15.0.0/
   1255 https://www.unicode.org/versions/beta-15.0.0.html
   1256 https://www.unicode.org/Public/15.0.0/ucd/
   1257 https://www.unicode.org/reports/uax-proposed-updates.html
   1258 https://www.unicode.org/reports/tr44/tr44-29.html
   1259 
   1260 https://unicode-org.atlassian.net/browse/ICU-21980 Unicode 15
   1261 https://unicode-org.atlassian.net/browse/CLDR-15516 Unicode 15
   1262 https://unicode-org.atlassian.net/browse/CLDR-15253 Unicode 15 script metadata (in CLDR 41)
   1263 
   1264 * Command-line environment setup
   1265 
   1266 export UNICODE_DATA=~/unidata/uni15/20220830
   1267 export CLDR_SRC=~/cldr/uni/src
   1268 export ICU_ROOT=~/icu/uni
   1269 export ICU_SRC=$ICU_ROOT/src
   1270 export ICUDT=icudt72b
   1271 export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
   1272 export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
   1273 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
   1274 
   1275 *** Unicode version numbers
   1276 - makedata.mak
   1277 - uchar.h
   1278 - com.ibm.icu.util.VersionInfo
   1279 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
   1280 
   1281 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
   1282    so that the makefiles see the new version number.
   1283  cd $ICU_ROOT/dbg/icu4c
   1284  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
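configure only picks up the new version if uchar.h has already been edited, so it can help to check the header before running configure. Below is a minimal sketch. It assumes uchar.h defines the version as a string macro of the form `#define U_UNICODE_VERSION "15.0"`; verify this against the actual header. The helper name is hypothetical.

```python
import re

_UNICODE_VERSION = re.compile(
    r'^\s*#\s*define\s+U_UNICODE_VERSION\s+"([0-9.]+)"', re.MULTILINE)


def unicode_version_in_header(header_text):
    """Return the U_UNICODE_VERSION string from uchar.h text, or None if absent."""
    m = _UNICODE_VERSION.search(header_text)
    return m.group(1) if m else None
```

For example, read $ICU_SRC/icu4c/source/common/unicode/uchar.h and compare the result with the new Unicode version before running configure.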
   1285 
   1286 *** data files & enums & parser code
   1287 
   1288 * download files
   1289 - same as for the early Unicode Tools setup and data refresh:
   1290  https://github.com/unicode-org/unicodetools/blob/main/docs/index.md
   1291  https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md
   1292 - mkdir -p $UNICODE_DATA
   1293 - download Unicode files into $UNICODE_DATA
   1294  + subfolders: emoji, idna, security, ucd, uca
   1295  + old way of fetching files: from the "Public" area on unicode.org
   1296    ~ inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
   1297    ~ split Unihan into single-property files
   1298      ~/unitools/mine/src$ py/splitunihan.py $UNICODE_DATA/ucd/Unihan
   1299  + new way of fetching files, if available:
   1300    copy the files from a Unicode Tools workspace that is up to date with
   1301    https://github.com/unicode-org/unicodetools
   1302    and which might at this point be *ahead* of "Public"
   1303    ~ before the Unicode release copy files from "dev" subfolders, for example
   1304      https://github.com/unicode-org/unicodetools/tree/main/unicodetools/data/ucd/dev
   1305  + get GraphemeBreakTest-cldr.txt from $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
   1306    or from the UCD/cldr/ output folder of the Unicode Tools
   1307    (since Unicode 12/CLDR 35/ICU 64, CLDR uses modified break rules):
   1308  cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
   1309    or
   1310  cp ~/unitools/mine/Generated/UCD/15.0.0/cldr/GraphemeBreakTest-cldr.txt icu4c/source/test/testdata/GraphemeBreakTest.txt
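The Unicode Tools script py/splitunihan.py splits the Unihan files into single-property files. The sketch below is not that script. It is a hypothetical Python illustration of the idea, assuming the tab-separated Unihan line format (code point, property, value) and writing one `<property>.txt` file per property. The real tool's output names and layout may differ.

```python
from collections import defaultdict
from pathlib import Path


def split_unihan_lines(lines):
    """Group Unihan data lines by property name.

    Unihan lines look like: U+3400<TAB>kCantonese<TAB>jau1
    Comment lines (#) and blank lines are skipped.
    Returns {property: [line, ...]} in input order.
    """
    by_prop = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a code point/property/value line
        by_prop[fields[1]].append(line)
    return dict(by_prop)


def split_unihan_dir(unihan_dir, out_dir):
    """Write out_dir/<property>.txt for every property in unihan_dir/Unihan_*.txt."""
    by_prop = defaultdict(list)
    for path in sorted(Path(unihan_dir).glob("Unihan_*.txt")):
        with open(path, encoding="utf-8") as f:
            for prop, lines in split_unihan_lines(f).items():
                by_prop[prop].extend(lines)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for prop, lines in by_prop.items():
        (out / f"{prop}.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return sorted(by_prop)
```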
   1311 
   1312 * for manual diffs and for Unicode Tools input data updates:
   1313  remove version suffixes from the file names
   1314    ~$ unidata/desuffixucd.py $UNICODE_DATA
   1315  (see https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md)
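desuffixucd.py is the tool for this step. The following is only a hypothetical Python sketch of the renaming. It assumes the version suffix looks like "-<major>.<minor>.<micro>", optionally followed by a draft marker such as "d5", directly before the extension (for example "Blocks-15.0.0d4.txt" becomes "Blocks.txt"). Files without such a suffix are left alone.

```python
import re
from pathlib import Path

# Assumed suffix shape: "-15.0.0" or "-15.0.0d4" right before the extension.
_SUFFIX = re.compile(r"^(?P<stem>.+?)-\d+\.\d+\.\d+(?:d\d+)?(?P<ext>\.[A-Za-z0-9]+)$")


def desuffixed_name(name):
    """Return name without a version suffix, or name unchanged if it has none."""
    m = _SUFFIX.match(name)
    return m.group("stem") + m.group("ext") if m else name


def desuffix_tree(root):
    """Rename suffixed files anywhere under root; return [(old name, new name)]."""
    renamed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        new_name = desuffixed_name(path.name)
        if new_name != path.name:
            path.rename(path.with_name(new_name))
            renamed.append((path.name, new_name))
    return renamed
```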
   1316 
   1317 * process and/or copy files
   1318 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
   1319  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
   1320  + For debugging and for tweaking how ppucd.txt is written,
   1321    the tool has an --only_ppucd option:
   1322    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
   1323 
   1324 - cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
   1325 
   1326 * new constants for new property values
   1327 - preparseucd.py error:
   1328    ValueError: missing uchar.h enum constants for some property values: [('blk', {'Nag_Mundari', 'CJK_Ext_H', 'Kawi', 'Kaktovik_Numerals', 'Devanagari_Ext_A', 'Arabic_Ext_C', 'Cyrillic_Ext_D'}), ('sc', {'Nagm', 'Kawi'})]
   1329  = PropertyValueAliases.txt new property values (diff old & new .txt files)
   1330    ~/unidata$ diff -u uni14/20210922/ucd/PropertyValueAliases.txt uni15/beta/ucd/PropertyValueAliases.txt | egrep '^[-+][a-zA-Z]'
   1331    +age; 15.0                             ; V15_0
   1332    +blk; Arabic_Ext_C                     ; Arabic_Extended_C
   1333    +blk; CJK_Ext_H                        ; CJK_Unified_Ideographs_Extension_H
   1334    +blk; Cyrillic_Ext_D                   ; Cyrillic_Extended_D
   1335    +blk; Devanagari_Ext_A                 ; Devanagari_Extended_A
   1336    +blk; Kaktovik_Numerals                ; Kaktovik_Numerals
   1337    +blk; Kawi                             ; Kawi
   1338    +blk; Nag_Mundari                      ; Nag_Mundari
   1339    +sc ; Kawi                             ; Kawi
   1340    +sc ; Nagm                             ; Nag_Mundari
   1341  -> add new blocks to uchar.h before UBLOCK_COUNT
   1342    use long property names for enum constants,
   1343    for the trailing comment get the block start code point: diff old & new Blocks.txt
   1344    ~/unidata$ diff -u uni14/20210922/ucd/Blocks.txt uni15/beta/ucd/Blocks.txt | egrep '^[-+][0-9A-Z]'
   1345    +10EC0..10EFF; Arabic Extended-C
   1346    +11B00..11B5F; Devanagari Extended-A
   1347    +11F00..11F5F; Kawi
   1348    -13430..1343F; Egyptian Hieroglyph Format Controls
   1349    +13430..1345F; Egyptian Hieroglyph Format Controls
   1350    +1D2C0..1D2DF; Kaktovik Numerals
   1351    +1E030..1E08F; Cyrillic Extended-D
   1352    +1E4D0..1E4FF; Nag Mundari
   1353    +31350..323AF; CJK Unified Ideographs Extension H
   1354    (ignore blocks whose end code point changed)
   1355  -> add new blocks to UCharacter.UnicodeBlock IDs
   1356    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
   1357            replace  public static final int \1_ID = \2; \3
   1358  -> add new blocks to UCharacter.UnicodeBlock objects
   1359    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
   1360            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
   1361  -> add new scripts to uscript.h & com.ibm.icu.lang.UScript
   1362    Eclipse find     USCRIPT_([^ ]+) *= ([0-9]+),(/.+)
   1363            replace  public static final int \1 = \2; \3
   1364  -> for new scripts: fix expectedLong names in cintltst/cucdapi.c/TestUScriptCodeAPI()
   1365      and in com.ibm.icu.dev.test.lang.TestUScript.java
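The three Eclipse find/replace patterns above use \1-style group references, which Python's `re` module also understands. The sketch below applies the same patterns outside Eclipse to lines copied from uchar.h or uscript.h. `convert` and the rule names are hypothetical. The enum values in the tests are illustrative only, not the real constants. Lines that do not match a pattern, such as `UBLOCK_COUNT`, are dropped.

```python
import re

# The Eclipse find/replace pairs from above, unchanged, as Python regexes.
BLOCK_ID = (r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
            r"public static final int \1_ID = \2; \3")
BLOCK_OBJECT = (r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
                r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2')
SCRIPT_CODE = (r"USCRIPT_([^ ]+) *= ([0-9]+),(/.+)",
               r"public static final int \1 = \2; \3")


def convert(lines, rule):
    """Apply one find/replace rule to each matching C header line.

    Leading indentation is kept because only the matched text is replaced.
    Non-matching lines are dropped from the result.
    """
    find, replace = rule
    pattern = re.compile(find)
    return [pattern.sub(replace, line) for line in lines if pattern.search(line)]
```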
   1366 
   1367 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
   1368    (not strictly necessary for NOT_ENCODED scripts)
   1369  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
   1370 
   1371 * build ICU
   1372  to make sure that there are no syntax errors
   1373 
   1374  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 tests &> out.txt ; tail -n 30 out.txt ; date
   1375 
   1376 * update spoof checker UnicodeSet initializers:
   1377    inclusionPat & recommendedPat in i18n/uspoof.cpp
   1378    INCLUSION & RECOMMENDED in SpoofChecker.java
   1379 - make sure that the Unicode Tools tree contains the latest security data files
   1380 - go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
   1381 - run the tool (no special environment variables needed)
   1382 - copy & paste from the Console output into the .cpp & .java files
   1383 
   1384 * Bazel build process
   1385 
   1386 See https://unicode-org.github.io/icu/processes/unicode-update#bazel-build-process
   1387 for an overview and for setup instructions.
   1388 
   1389 Consider running `bazelisk --version` outside of the $ICU_SRC folder
   1390 to find out the latest `bazel` version, and
   1391 copying that version number into the $ICU_SRC/.bazeliskrc config file.
   1392 (Revert if you find incompatibilities, or, better, update our build & config files.)
   1393 
   1394 * generate data files
   1395 
   1396 - remember to define the environment variables
   1397  (see the start of the section for this Unicode version)
   1398 - cd $ICU_SRC
   1399 - optional:
   1400    bazelisk clean
   1401 - build/bootstrap/generate new files:
   1402    icu4c/source/data/unidata/generate.sh
   1403 
   1404 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
   1405  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
   1406 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
   1407    ~/unitools/mine/src$ grep disallowed_STD3_valid unicodetools/data/idna/dev/IdnaMappingTable.txt
   1408 - Unicode 6.0..15.0: U+2260, U+226E, U+226F
   1409 - nothing new in this Unicode version, no test file to update
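The grep above prints every disallowed_STD3_valid line, including the ASCII ranges, so the non-ASCII characters still have to be picked out by hand. Below is a minimal Python sketch that expands ranges and keeps only code points above U+007F. It assumes the semicolon-separated IdnaMappingTable.txt layout shown in the docstring. The helper name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Return sorted non-ASCII code points whose status is disallowed_STD3_valid.

    Expects IdnaMappingTable.txt-style lines such as:
      2260          ; disallowed_STD3_valid   # 1.1  NOT EQUAL TO
      226E..226F    ; disallowed_STD3_valid   # 1.1  NOT LESS-THAN..NOT GREATER-THAN
    """
    result = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        result.extend(cp for cp in range(first, last + 1) if cp > 0x7F)
    return sorted(result)
```

Formatting the result as `[f"U+{cp:04X}" for cp in ...]` gives a list that can be compared directly with the "Unicode 6.0..15.0" line above.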
   1410 
   1411 * run & fix ICU4C tests
   1412 - Note: Some of the collation data and test data will be updated below,
   1413  so at this time we might get some collation test failures.
   1414  Ignore these for now.
   1415 - fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
   1416  (no rule changes in Unicode 15)
   1417 - update CLDR GraphemeBreakTest.txt
   1418    cd ~/unitools/mine/Generated
   1419    cp UCD/15.0.0/cldr/GraphemeBreakTest-cldr.txt $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
   1420    cp UCD/15.0.0/cldr/GraphemeBreakTest-cldr.html $CLDR_SRC/common/properties/segments/GraphemeBreakTest.html
   1421    cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt $ICU_SRC/icu4c/source/test/testdata
   1422 - Andy helps with RBBI & spoof check test failures
   1423 
   1424 * collation: CLDR collation root, UCA DUCET
   1425 
   1426 - UCA DUCET goes into Mark's Unicode tools,
   1427  and a tool-tailored version goes into CLDR, see
   1428    https://github.com/unicode-org/unicodetools/blob/main/docs/uca/index.md
   1429 
   1430 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   1431    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
   1432 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   1433    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
   1434    (note that the target file name drops the underscore before "Rules")
   1435    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
   1436 - restore TODO diffs in UCARules.txt
   1437    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
   1438 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
   1439  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   1440  from the CLDR root files (..._CLDR_..._SHORT.txt)
   1441    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
   1442    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
   1443    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
   1444 - if CLDR common/uca/unihan-index.txt changes, then update
   1445  CLDR common/collation/root.xml <collation type="private-unihan">
   1446  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
   1447 
   1448 - generate data files, as above (generate.sh), now to pick up new collation data
   1449 - update CollationFCD.java:
   1450  copy & paste the initializers of lcccIndex[] etc. from
   1451    ICU4C/source/i18n/collationfcd.cpp to
   1452    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
   1453 - rebuild ICU4C (make clean, make check, as usual)
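The CollationFCD.java step above is a manual copy of array initializers. Below is a hypothetical Python sketch of that copy for one array name; run it once per array. It assumes the C++ definitions look like `const uint8_t CollationFCD::lcccIndex[...]={...};` and the Java fields like `lcccIndex={...};`. It also assumes the initializer values can be pasted unchanged, as the copy & paste instruction implies. Review the resulting Java diff before committing.

```python
import re


def cpp_initializer(cpp_text, name):
    """Return the text between '{' and '};' of the C++ array definition `name`."""
    m = re.search(r"\b" + re.escape(name) + r"\[[^\]]*\]\s*=\s*\{(.*?)\};",
                  cpp_text, re.DOTALL)
    if m is None:
        raise ValueError(f"no C++ initializer for {name}")
    return m.group(1)


def replace_java_initializer(java_text, name, body):
    """Replace the body of the Java array initializer `name={ ... };`."""
    pattern = re.compile(r"(\b" + re.escape(name) + r"\s*=\s*\{)(.*?)(\};)", re.DOTALL)
    if pattern.search(java_text) is None:
        raise ValueError(f"no Java initializer for {name}")
    # Function replacement so the copied numbers are inserted literally.
    return pattern.sub(lambda m: m.group(1) + body + m.group(3), java_text, count=1)
```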
   1454 
   1455 * Unihan collators
   1456    https://github.com/unicode-org/unicodetools/blob/main/docs/unihan.md
   1457 - run Unicode Tools GenerateUnihanCollators & GenerateUnihanCollatorFiles,
   1458  check CLDR diffs, copy to CLDR, test CLDR, ... as documented there
   1459 - generate ICU zh collation data
   1460    instructions inspired by
   1461    https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/README.txt and
   1462    https://github.com/unicode-org/icu/blob/main/icu4c/source/data/cldr-icu-readme.txt
   1463  + setup:
   1464    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
   1465        (didn't work without setting JAVA_HOME,
   1466         nor with the Google default of /usr/local/buildtools/java/jdk
   1467         [Google security limitations in the XML parser])
   1468    export TOOLS_ROOT=~/icu/uni/src/tools
   1469    export CLDR_DIR=~/cldr/uni/src
   1470    export CLDR_DATA_DIR=~/cldr/uni/src
   1471        (pointing to the "raw" data rather than to cldr-staging/.../production should be OK for the relevant files)
   1472    cd "$TOOLS_ROOT/cldr/lib"
   1473    ./install-cldr-jars.sh "$CLDR_DIR"
   1474  + generate the files we need
   1475    cd "$TOOLS_ROOT/cldr/cldr-to-icu"
   1476    ant -f build-icu-data.xml -DoutDir=/tmp/icu -DoutputTypes=coll,transforms -DlocaleIdFilter='zh.*'
   1477  + diff
   1478    cd $ICU_SRC
   1479    meld icu4c/source/data/coll/zh.txt /tmp/icu/coll/zh.txt
   1480    meld icu4c/source/data/translit/Hani_Latn.txt /tmp/icu/translit/Hani_Latn.txt
   1481  + copy into the source tree
   1482    cd $ICU_SRC
   1483    cp /tmp/icu/coll/zh.txt icu4c/source/data/coll/zh.txt
   1484    cp /tmp/icu/translit/Hani_Latn.txt icu4c/source/data/translit/Hani_Latn.txt
   1485 - rebuild ICU4C
   1486 
   1487 * run & fix ICU4C tests, now with new CLDR collation root data
   1488 - run all tests with the collation test data *_SHORT.txt or the full files
   1489  (the full ones have comments, useful for debugging)
   1490 - note on intltest: if collate/UCAConformanceTest fails, then
   1491  utility/MultithreadTest/TestCollators will fail as well;
   1492  fix the conformance test before looking into the multi-thread test
   1493 
   1494 * update Java data files
   1495 - refresh only the UCD/UCA-related/derived files, to be safe
   1496 - see (ICU4C)/source/data/icu4j-readme.txt
   1497 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   1498 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1499    NOTE: If you get the error "No rule to make target 'out/build/icudt70l/uprops.icu'",
   1500    you need to reconfigure with unicore data; see the "configure" line above.
   1501  output:
   1502    ...
   1503    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
   1504    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt72b
   1505    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt72b
   1506    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt72l.dat ./out/icu4j/icudt72b.dat -s ./out/build/icudt72l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt72b
   1507    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt72b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt72b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt72b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt72b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt72b"
   1508    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt72b/
   1509    mkdir -p /tmp/icu4j/main/shared/data
   1510    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   1511    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt72b/
   1512    mkdir -p /tmp/icu4j/main/shared/data
   1513    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   1514    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
   1515 - copy the big-endian Unicode data files to another location,
   1516  separate from the other data files,
   1517  and then refresh ICU4J
   1518    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
   1519    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   1520    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   1521    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   1522    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   1523    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
   1524    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   1525    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   1526    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   1527    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
   1528 
   1529 * When refreshing all of ICU4J data from ICU4C
   1530 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   1531 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
   1532 or
   1533 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
   1534 
   1535 * refresh Java test .txt files
   1536 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   1537    cd $ICU_SRC/icu4c/source/data/unidata
   1538    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   1539    cd ../../test/testdata
   1540    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   1541    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   1542 
   1543 * run & fix ICU4J tests
   1544 
   1545 *** API additions
   1546 - send notice to icu-design about new born-@stable API (enum constants etc.)
   1547 
   1548 *** CLDR numbering systems
   1549 - look for new sets of decimal digits (gc=Nd & nv=4) and add them to CLDR
   1550  for example:
   1551    ~/icu/mine/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-14.txt
   1552    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-15.txt
   1553    ~/icu/uni/src$ diff -u /tmp/icu/nv4-14.txt /tmp/icu/nv4-15.txt
   1554    -->
   1555    +cp;11F54;-Alpha;gc=Nd;InSC=Number;lb=NU;na=KAWI DIGIT FOUR;nt=De;nv=4;SB=NU;WB=NU;-XIDS
   1556    +cp;1E4F4;-Alpha;gc=Nd;-IDS;lb=NU;na=NAG MUNDARI DIGIT FOUR;nt=De;nv=4;SB=NU;WB=NU;-XIDS
   1557  or:
   1558    ~/unitools/mine/src$ diff -u unicodetools/data/ucd/14.0.0-Update/extracted/DerivedGeneralCategory.txt unicodetools/data/ucd/dev/extracted/DerivedGeneralCategory.txt | grep '; Nd' | egrep '^\+'
   1559    -->
   1560    +11F50..11F59  ; Nd #  [10] KAWI DIGIT ZERO..KAWI DIGIT NINE
   1561    +1E4F0..1E4F9  ; Nd #  [10] NAG MUNDARI DIGIT ZERO..NAG MUNDARI DIGIT NINE
   1562  Unicode 15:
   1563    kawi 11F50..11F59 Kawi
   1564    nagm 1E4F0..1E4F9 Nag Mundari
   1565    https://github.com/unicode-org/cldr/pull/2041
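The "or:" variant above diffs DerivedGeneralCategory.txt instead of ppucd.txt. The sketch below parses the added `Nd` lines of such a unified diff into code point ranges. It then builds the ten-character digit string, zero through nine, for each new set, which is the data a new numeric numbering system needs. The line format is assumed to match the example output above. The function names are hypothetical.

```python
import re

# Added lines of a unified diff of DerivedGeneralCategory.txt, for example:
# +11F50..11F59  ; Nd #  [10] KAWI DIGIT ZERO..KAWI DIGIT NINE
_ADDED_ND = re.compile(
    r"^\+([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*Nd\s*#\s*(?:\[\d+\]\s*)?(.*)$")


def new_digit_ranges(diff_lines):
    """Return (first, last, comment) for each added gc=Nd line in the diff."""
    ranges = []
    for line in diff_lines:
        m = _ADDED_ND.match(line.rstrip("\n"))
        if m is None:
            continue  # context, removed, or header lines
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        ranges.append((first, last, m.group(3)))
    return ranges


def digits_string(first, last):
    """Return the ten digit characters of one decimal digit set, zero through nine."""
    if last - first != 9:
        raise ValueError("expected exactly ten decimal digits")
    return "".join(chr(cp) for cp in range(first, last + 1))
```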

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
 instead rebuild them from merged & tested ICU4C
- if there is a merge conflict in icudata.jar, here is one way to deal with it:
 +   remove icudata.jar from the commit so that rebasing is trivial
 + ~/icu/uni/src$ git restore --source=main icu4j/main/shared/data/icudata.jar
 + ~/icu/uni/src$ git commit -a --amend
 +   switch to main, pull updates, switch back to the dev branch
 + ~/icu/uni/src$ git rebase main
 +   rebuild icudata.jar
 + ~/icu/uni/src$ git commit -a --amend
 + ~/icu/uni/src$ git push -f
- make sure that changes to Unicode tools are checked in:
 https://github.com/unicode-org/unicodetools

---------------------------------------------------------------------------- ***

Unicode 14.0 update for ICU 70

https://www.unicode.org/versions/Unicode14.0.0/
https://www.unicode.org/versions/beta-14.0.0.html
https://www.unicode.org/Public/14.0.0/ucd/
https://www.unicode.org/reports/uax-proposed-updates.html
https://www.unicode.org/reports/tr44/tr44-27.html

https://unicode-org.atlassian.net/browse/CLDR-14801
https://unicode-org.atlassian.net/browse/ICU-21635

* Command-line environment setup

export UNICODE_DATA=~/unidata/uni14/20210903
export CLDR_SRC=~/cldr/uni/src
export ICU_ROOT=~/icu/uni
export ICU_SRC=$ICU_ROOT/src
export ICUDT=icudt70b
export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib

*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
   so that the makefiles see the new version number.
 cd $ICU_ROOT/dbg/icu4c
 ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh

*** data files & enums & parser code

* download files
- same as for the early Unicode Tools setup and data refresh:
 https://github.com/unicode-org/unicodetools/blob/main/docs/index.md
 https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md
- mkdir -p $UNICODE_DATA
- download Unicode files into $UNICODE_DATA
 + subfolders: emoji, idna, security, ucd, uca
 + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
 + split Unihan into single-property files
   ~/unitools/mine/src$ py/splitunihan.py $UNICODE_DATA/ucd/Unihan
 + get GraphemeBreakTest-cldr.txt from $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
   or from the UCD/cldr/ output folder of the Unicode Tools
   (since Unicode 12/CLDR 35/ICU 64, CLDR uses modified break rules):
 cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
   or
 cp ~/unitools/mine/Generated/UCD/d19/cldr/GraphemeBreakTest-cldr-14.0.0d19.txt icu4c/source/test/testdata/GraphemeBreakTest.txt
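
To illustrate what "split Unihan into single-property files" means, here is a rough Python sketch. It assumes the usual tab-separated Unihan data line format (code point, property name, value) and writes one <property>.txt file per property. That output naming is an assumption, and this is not the actual py/splitunihan.py.

```python
from collections import defaultdict
from pathlib import Path


def split_unihan(lines):
    """Group Unihan data lines by property name.

    Each data line is "<code point><TAB><property><TAB><value>".
    Comment lines (starting with '#') and blank lines are skipped.
    """
    by_property = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        _code_point, prop, _value = line.split("\t", 2)
        by_property[prop].append(line)
    return dict(by_property)


def write_property_files(by_property, out_dir):
    """Write one <property>.txt file per property (hypothetical naming)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for prop, prop_lines in sorted(by_property.items()):
        (out / f"{prop}.txt").write_text("\n".join(prop_lines) + "\n", encoding="utf-8")
```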

* for manual diffs and for Unicode Tools input data updates:
 remove version suffixes from the file names
   ~$ unidata/desuffixucd.py $UNICODE_DATA
 (see https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md)
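
A hypothetical Python sketch of the renaming step. It assumes the suffix is a version number like "-14.0.0", optionally followed by a draft number like "d19", directly before the file extension (compare GraphemeBreakTest-cldr-14.0.0d19.txt above). It is not the desuffixucd.py script.

```python
import re
from pathlib import Path

# Assumed suffix shape: "-<major>.<minor>.<micro>" plus an optional "d<n>",
# immediately followed by the file extension.
VERSION_SUFFIX = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z]+$)")


def desuffix(file_name):
    """Return the file name without its version suffix."""
    return VERSION_SUFFIX.sub("", file_name)


def desuffix_tree(root):
    """Rename every file below root whose name carries a version suffix."""
    for path in list(Path(root).rglob("*")):  # list first: renaming while walking
        if path.is_file():
            new_name = desuffix(path.name)
            if new_name != path.name:
                path.rename(path.with_name(new_name))
```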

* process and/or copy files
- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
 + For debugging, and tweaking how ppucd.txt is written,
   the tool has an --only_ppucd option:
   py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile

- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA

* new constants for new property values
- preparseucd.py error:
   ValueError: missing uchar.h enum constants for some property values:
   [(u'blk', set([u'Toto', u'Tangsa', u'Cypro_Minoan', u'Arabic_Ext_B', u'Vithkuqi', u'Old_Uyghur', u'Latin_Ext_F', u'UCAS_Ext_A', u'Kana_Ext_B', u'Ethiopic_Ext_B', u'Latin_Ext_G', u'Znamenny_Music'])),
   (u'jg', set([u'Vertical_Tail', u'Thin_Yeh'])),
   (u'sc', set([u'Toto', u'Ougr', u'Vith', u'Tnsa', u'Cpmn']))]
 = PropertyValueAliases.txt new property values (diff old & new .txt files)
   ~/unidata$ diff -u uni13/20200304/ucd/PropertyValueAliases.txt uni14/20210609/ucd/PropertyValueAliases.txt | egrep '^[-+][a-zA-Z]'
   +age; 14.0                             ; V14_0
   +blk; Arabic_Ext_B                     ; Arabic_Extended_B
   +blk; Cypro_Minoan                     ; Cypro_Minoan
   +blk; Ethiopic_Ext_B                   ; Ethiopic_Extended_B
   +blk; Kana_Ext_B                       ; Kana_Extended_B
   +blk; Latin_Ext_F                      ; Latin_Extended_F
   +blk; Latin_Ext_G                      ; Latin_Extended_G
   +blk; Old_Uyghur                       ; Old_Uyghur
   +blk; Tangsa                           ; Tangsa
   +blk; Toto                             ; Toto
   +blk; UCAS_Ext_A                       ; Unified_Canadian_Aboriginal_Syllabics_Extended_A
   +blk; Vithkuqi                         ; Vithkuqi
   +blk; Znamenny_Music                   ; Znamenny_Musical_Notation
   +jg ; Thin_Yeh                         ; Thin_Yeh
   +jg ; Vertical_Tail                    ; Vertical_Tail
   +sc ; Cpmn                             ; Cypro_Minoan
   +sc ; Ougr                             ; Old_Uyghur
   +sc ; Tnsa                             ; Tangsa
   +sc ; Toto                             ; Toto
   +sc ; Vith                             ; Vithkuqi
 -> add new blocks to uchar.h before UBLOCK_COUNT
   use long property names for enum constants,
   for the trailing comment get the block start code point: diff old & new Blocks.txt
   ~/unidata$ diff -u uni13/20200304/ucd/Blocks.txt uni14/20210609/ucd/Blocks.txt | egrep '^[-+][0-9A-Z]'
   +0870..089F; Arabic Extended-B
   +10570..105BF; Vithkuqi
   +10780..107BF; Latin Extended-F
   +10F70..10FAF; Old Uyghur
   -11700..1173F; Ahom
   +11700..1174F; Ahom
   +11AB0..11ABF; Unified Canadian Aboriginal Syllabics Extended-A
   +12F90..12FFF; Cypro-Minoan
   +16A70..16ACF; Tangsa
   -18D00..18D8F; Tangut Supplement
   +18D00..18D7F; Tangut Supplement
   +1AFF0..1AFFF; Kana Extended-B
   +1CF00..1CFCF; Znamenny Musical Notation
   +1DF00..1DFFF; Latin Extended-G
   +1E290..1E2BF; Toto
   +1E7E0..1E7FF; Ethiopic Extended-B
   (ignore blocks whose end code point changed)
 -> add new blocks to UCharacter.UnicodeBlock IDs
   Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
           replace  public static final int \1_ID = \2; \3
 -> add new blocks to UCharacter.UnicodeBlock objects
   Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
           replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
 -> add new scripts to uscript.h & com.ibm.icu.lang.UScript
   Eclipse find     USCRIPT_([^ ]+) *= ([0-9]+),(/.+)
           replace  public static final int \1 = \2; \3
 -> for new scripts: fix expectedLong names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
 -> add new joining groups to uchar.h & UCharacter.JoiningGroup
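
The Blocks.txt comparison, including the rule to ignore blocks whose end code point changed, can be sketched in Python. The sketch treats a block as new when its name does not occur in the old file, which skips resized blocks such as Ahom and Tangut Supplement above. The enum_name() helper only illustrates the long-name-to-constant convention and is an assumption; check its output against uchar.h.

```python
import re

# Blocks.txt data lines look like "0870..089F; Arabic Extended-B".
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+);\s*(.+)$")


def parse_blocks(lines):
    """Map block name -> (start, end) code points; comments are ignored."""
    blocks = {}
    for line in lines:
        m = BLOCK_LINE.match(line.strip())
        if m:
            blocks[m.group(3)] = (int(m.group(1), 16), int(m.group(2), 16))
    return blocks


def new_blocks(old_lines, new_lines):
    """(start, name) of blocks whose names are new, sorted by start code point.

    Blocks that exist in both files but changed their end code point are skipped.
    """
    old = parse_blocks(old_lines)
    new = parse_blocks(new_lines)
    return sorted((start, name) for name, (start, _end) in new.items() if name not in old)


def enum_name(block_name):
    """Hypothetical helper: 'Arabic Extended-B' -> 'ARABIC_EXTENDED_B'."""
    return re.sub(r"[ -]", "_", block_name).upper()
```

The start code point of each new block is what goes into the trailing comment, for example f"{start:04X}" gives "0870" for Arabic Extended-B.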

* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
   (not strictly necessary for NOT_ENCODED scripts)
 $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt

* build ICU
 to make sure that there are no syntax errors

 $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 tests &> out.txt ; tail -n 30 out.txt ; date

* update spoof checker UnicodeSet initializers:
   inclusionPat & recommendedPat in i18n/uspoof.cpp
   INCLUSION & RECOMMENDED in SpoofChecker.java
- make sure that the Unicode Tools tree contains the latest security data files
- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
- run the tool (no special environment variables needed)
- copy & paste from the Console output into the .cpp & .java files

* Bazel build process

See https://unicode-org.github.io/icu/processes/unicode-update#bazel-build-process
for an overview and for setup instructions.

Consider running `bazelisk --version` outside of the $ICU_SRC folder
to find out the latest `bazel` version, and
copying that version number into the $ICU_SRC/.bazeliskrc config file.
(Revert if you find incompatibilities, or, better, update our build & config files.)

* generate data files

- remember to define the environment variables
 (see the start of the section for this Unicode version)
- cd $ICU_SRC
- optional:
   bazelisk clean
- build/bootstrap/generate new files:
   icu4c/source/data/unidata/generate.sh

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..14.0: U+2260, U+226E, U+226F
- nothing new in this Unicode version, no test file to update
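
A small Python sketch of that grep, assuming the IdnaMappingTable.txt data line layout "code point or range ; status ; ..." with '#' comments. It reports only ranges above U+007F, because ASCII characters with that status are expected and need no test update. The function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Return (start, end) ranges with status disallowed_STD3_valid above U+007F."""
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        lo, hi = int(start, 16), int(end or start, 16)
        if hi > 0x7F:
            ranges.append((max(lo, 0x80), hi))
    return ranges
```

A non-empty result beyond U+2260, U+226E and U+226F would point to new characters for uts46test.cpp and UTS46Test.java.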

* run & fix ICU4C tests
- fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
- update CLDR GraphemeBreakTest.txt
   cd ~/unitools/mine/Generated
   cp UCD/d22d/cldr/GraphemeBreakTest-cldr.txt $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
   cp UCD/d22d/cldr/GraphemeBreakTest-cldr.html $CLDR_SRC/common/properties/segments/GraphemeBreakTest.html
   cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt $ICU_SRC/icu4c/source/test/testdata
- Andy helps with RBBI & spoof check test failures

* collation: CLDR collation root, UCA DUCET

- UCA DUCET goes into Mark's Unicode tools,
 and a tool-tailored version goes into CLDR, see
   https://github.com/unicode-org/unicodetools/blob/main/docs/uca/index.md

- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
   (note that the target file name drops the underscore before "Rules")
   cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
- restore TODO diffs in UCARules.txt
   meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
 from the CLDR root files (..._CLDR_..._SHORT.txt)
   cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
   cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
   cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
- if CLDR common/uca/unihan-index.txt changes, then update
 CLDR common/collation/root.xml <collation type="private-unihan">
 and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt

- generate data files, as above (generate.sh), now to pick up new collation data
- update CollationFCD.java:
 copy & paste the initializers of lcccIndex[] etc. from
   ICU4C/source/i18n/collationfcd.cpp to
   ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
- rebuild ICU4C (make clean, make check, as usual)

* Unihan collators
   https://github.com/unicode-org/unicodetools/blob/main/docs/unihan.md
- run Unicode Tools GenerateUnihanCollators & GenerateUnihanCollatorFiles,
 check CLDR diffs, copy to CLDR, test CLDR, ... as documented there
- generate ICU zh collation data
   instructions inspired by
   https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/README.txt and
   https://github.com/unicode-org/icu/blob/main/icu4c/source/data/cldr-icu-readme.txt
 + setup:
   export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
       (didn't work without setting JAVA_HOME,
        nor with the Google default of /usr/local/buildtools/java/jdk
        [Google security limitations in the XML parser])
   export TOOLS_ROOT=~/icu/uni/src/tools
   export CLDR_DIR=~/cldr/uni/src
   export CLDR_DATA_DIR=~/cldr/uni/src
       (pointing to the "raw" data rather than to cldr-staging/.../production should be OK for the relevant files)
   cd "$TOOLS_ROOT/cldr/lib"
   ./install-cldr-jars.sh "$CLDR_DIR"
 + generate the files we need
   cd "$TOOLS_ROOT/cldr/cldr-to-icu"
   ant -f build-icu-data.xml -DoutDir=/tmp/icu -DoutputTypes=coll,transforms -DlocaleIdFilter='zh.*'
 + diff
   cd $ICU_SRC
   meld icu4c/source/data/coll/zh.txt /tmp/icu/coll/zh.txt
   meld icu4c/source/data/translit/Hani_Latn.txt /tmp/icu/translit/Hani_Latn.txt
 + copy into the source tree
   cd $ICU_SRC
   cp /tmp/icu/coll/zh.txt icu4c/source/data/coll/zh.txt
   cp /tmp/icu/translit/Hani_Latn.txt icu4c/source/data/translit/Hani_Latn.txt
- rebuild ICU4C

* run & fix ICU4C tests, now with new CLDR collation root data
- run all tests with the collation test data *_SHORT.txt or the full files
 (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
 utility/MultithreadTest/TestCollators will fail as well;
 fix the conformance test before looking into the multi-thread test

* update Java data files
- refresh only the UCD/UCA-related/derived files, to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   NOTE: If you get the error "No rule to make target 'out/build/icudt70l/uprops.icu'",
   you need to reconfigure with unicore data; see the "configure" line above.
 output:
   ...
   make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
   mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt70b
   mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt70b
   LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt70l.dat ./out/icu4j/icudt70b.dat -s ./out/build/icudt70l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt70b
   mv ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt70b"
   jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt70b/
   mkdir -p /tmp/icu4j/main/shared/data
   cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt70b/
   mkdir -p /tmp/icu4j/main/shared/data
   cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
- copy the big-endian Unicode data files to another location,
 separate from the other data files,
 and then refresh ICU4J
   cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
   mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
   cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
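
If you want to script the selection above, here is a Python sketch of which files get copied: confusables.cfu, the top-level *.icu files except cnvalias.icu, the *.nrm files, and the files directly inside coll/ and brkitr/. It only mirrors the cp/rm commands; the function names are hypothetical, and the jar uvf step is left to the jar tool.

```python
import shutil
from fnmatch import fnmatchcase
from pathlib import Path

TOP_LEVEL_PATTERNS = ("confusables.cfu", "*.icu", "*.nrm")
SUBDIRECTORIES = ("coll", "brkitr")


def wanted(rel_path):
    """True if a file (path relative to the $ICUDT folder, '/'-separated) is copied."""
    if "/" not in rel_path:
        if rel_path == "cnvalias.icu":  # removed again by the rm command above
            return False
        return any(fnmatchcase(rel_path, p) for p in TOP_LEVEL_PATTERNS)
    subdir, _, rest = rel_path.partition("/")
    # cp without -r copies only the files directly inside coll/ and brkitr/
    return subdir in SUBDIRECTORIES and "/" not in rest


def copy_unicode_data(src_dir, dst_dir):
    """Copy the wanted files from src_dir to dst_dir; return their relative paths."""
    src, dst = Path(src_dir), Path(dst_dir)
    copied = []
    for path in sorted(src.rglob("*")):
        rel = path.relative_to(src).as_posix()
        if path.is_file() and wanted(rel):
            target = dst / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(rel)
    return copied
```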

* When refreshing all of ICU4J data from ICU4C
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
or
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   cd $ICU_SRC/icu4c/source/data/unidata
   cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   cd ../../test/testdata
   cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode

* run & fix ICU4J tests

*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)

*** CLDR numbering systems
- look for new sets of decimal digits (gc=Nd & nv=4) and add them to CLDR
 for example:
   ~/icu/mine/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-13.txt
   ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-14.txt
   ~/icu/uni/src$ diff -u /tmp/icu/nv4-13.txt /tmp/icu/nv4-14.txt
   -->
   +cp;16AC4;-Alpha;gc=Nd;-IDS;lb=NU;na=TANGSA DIGIT FOUR;nt=De;nv=4;SB=NU;WB=NU;-XIDS
 Unicode 14:
   tnsa 16AC0..16AC9 Tangsa
   https://github.com/unicode-org/cldr/pull/1326

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
 instead rebuild them from merged & tested ICU4C
- make sure that changes to Unicode tools are checked in:
 https://github.com/unicode-org/unicodetools

---------------------------------------------------------------------------- ***

Unicode 13.0 update for ICU 66

https://www.unicode.org/versions/Unicode13.0.0/
https://www.unicode.org/versions/beta-13.0.0.html
https://www.unicode.org/Public/13.0.0/ucd/
https://www.unicode.org/reports/uax-proposed-updates.html
https://www.unicode.org/reports/tr44/tr44-25.html

https://unicode-org.atlassian.net/browse/CLDR-13387
https://unicode-org.atlassian.net/browse/ICU-20893

* Command-line environment setup

UNICODE_DATA=~/unidata/uni13/20200212
CLDR_SRC=~/cldr/uni/src
ICU_ROOT=~/icu/uni
ICU_SRC=$ICU_ROOT/src
ICUDT=icudt66b
ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib

*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
   so that the makefiles see the new version number.
 cd $ICU_ROOT/dbg/icu4c
 ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh

*** data files & enums & parser code

* download files
- mkdir -p $UNICODE_DATA
- download Unicode files into $UNICODE_DATA
 + subfolders: emoji, idna, security, ucd, uca
 + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
 + split Unihan into single-property files
   ~/unitools/trunk/src$ py/splitunihan.py $UNICODE_DATA/ucd/Unihan
 + get GraphemeBreakTest-cldr.txt from $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
   or from the ucd/cldr/ output folder of the Unicode Tools
   (since Unicode 12/CLDR 35/ICU 64, CLDR uses modified break rules):
 cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata

* for manual diffs and for Unicode Tools input data updates:
 remove version suffixes from the file names
   ~$ unidata/desuffixucd.py $UNICODE_DATA
 (see https://sites.google.com/site/unicodetools/inputdata)

* process and/or copy files
- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
 + For debugging, and tweaking how ppucd.txt is written,
   the tool has an --only_ppucd option:
   py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile

- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA

* new constants for new property values
- preparseucd.py error:
   ValueError: missing uchar.h enum constants for some property values:
   [(u'blk', set([u'Symbols_For_Legacy_Computing', u'Dives_Akuru', u'Yezidi',
       u'Tangut_Sup', u'CJK_Ext_G', u'Khitan_Small_Script', u'Chorasmian', u'Lisu_Sup'])),
   (u'sc', set([u'Chrs', u'Diak', u'Kits', u'Yezi'])),
   (u'InPC', set([u'Top_And_Bottom_And_Left']))]
 = PropertyValueAliases.txt new property values (diff old & new .txt files)
   blk; Chorasmian                       ; Chorasmian
   blk; CJK_Ext_G                        ; CJK_Unified_Ideographs_Extension_G
   blk; Dives_Akuru                      ; Dives_Akuru
   blk; Khitan_Small_Script              ; Khitan_Small_Script
   blk; Lisu_Sup                         ; Lisu_Supplement
   blk; Symbols_For_Legacy_Computing     ; Symbols_For_Legacy_Computing
   blk; Tangut_Sup                       ; Tangut_Supplement
   blk; Yezidi                           ; Yezidi
 -> add to uchar.h before UBLOCK_COUNT
   use long property names for enum constants,
   for the trailing comment get the block start code point: diff old & new Blocks.txt
 -> add to UCharacter.UnicodeBlock IDs
   Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
           replace  public static final int \1_ID = \2; \3
 -> add to UCharacter.UnicodeBlock objects
   Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
           replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2

   sc ; Chrs                             ; Chorasmian
   sc ; Diak                             ; Dives_Akuru
   sc ; Kits                             ; Khitan_Small_Script
   sc ; Yezi                             ; Yezidi
 -> uscript.h & com.ibm.icu.lang.UScript
 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java

   InPC; Top_And_Bottom_And_Left         ; Top_And_Bottom_And_Left
 -> uchar.h enum UIndicPositionalCategory & UCharacter.java IndicPositionalCategory
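
Outside Eclipse, the find/replace steps for UCharacter.UnicodeBlock can be done with Python's re module. Eclipse's \1 group references become \g<1> in Python replacement strings. This is a sketch that assumes the uchar.h lines have the form "UBLOCK_NAME = number, /*...*/" expected by the find patterns.

```python
import re

# (find, replace) pairs translated from the Eclipse patterns above.
BLOCK_IDS = (re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"),
             r"public static final int \g<1>_ID = \g<2>; \g<3>")
BLOCK_OBJECTS = (re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"),
                 r'public static final UnicodeBlock \g<1> = '
                 r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>')


def convert(lines, rule):
    """Apply one find/replace rule to the matching lines; drop all other lines."""
    pattern, replacement = rule
    return [pattern.sub(replacement, line) for line in lines if pattern.search(line)]
```

Feed it the new UBLOCK_ lines from uchar.h, once with BLOCK_IDS and once with BLOCK_OBJECTS. Leading indentation is preserved because only the matched part of each line is replaced.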

* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
   (not strictly necessary for NOT_ENCODED scripts)
 $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt

* build ICU (make install)
 to make sure that there are no syntax errors, and
 so that the tools build can pick up the new definitions from the installed header files.

 $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date

* update spoof checker UnicodeSet initializers:
   inclusionPat & recommendedPat in i18n/uspoof.cpp
   INCLUSION & RECOMMENDED in SpoofChecker.java
- make sure that the Unicode Tools tree contains the latest security data files
- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
- update the hardcoded version number there in the DIRECTORY path
- run the tool (no special environment variables needed)
- copy & paste from the Console output into the .cpp & .java files

* generate normalization data files
 cd $ICU_ROOT/dbg/icu4c
 bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
 bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
 bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
 so that the tools build can pick up the new definitions from the installed header files.

 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date

* build Unicode tools using CMake+make

$ICU_SRC/tools/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
# Location of the ICU4C source tree.
set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)

 $ICU_ROOT/dbg$
   mkdir -p tools/unicode/c
   cd tools/unicode/c

 $ICU_ROOT/dbg/tools/unicode/c$
   cmake ../../../../src/tools/unicode/c
   make

* generate core properties data files
 $ICU_ROOT/dbg/tools/unicode/c$
   genprops/genprops $ICU_SRC/icu4c
- tool failure:
   genprops: Script_Extensions indexes overflow bit field
   genprops: error parsing or setting values from ppucd.txt line 32696 - U_BUFFER_OVERFLOW_ERROR
 -> uprops.icu data file format:
    add two more bits to store a script code or Script_Extensions index
 -> generator code, C++ & Java runtime, uprops.icu format version 7.7
- rebuild ICU (make install) & tools
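
A worked illustration of the overflow: an unsigned field of w bits holds 2**w distinct values, so widening it by two bits quadruples the number of script codes or Script_Extensions indexes it can store. The widths below are made-up examples, not the actual uprops.icu layout.

```python
def capacity(bits):
    """Number of distinct values an unsigned bit field of this width can hold."""
    return 1 << bits


def fits(value, bits):
    """True if value can be stored in an unsigned field of the given width."""
    return 0 <= value < capacity(bits)


# Illustrative widths only: two more bits give four times the range.
assert capacity(12) == 4 * capacity(10)
assert fits(1023, 10) and not fits(1024, 10) and fits(1024, 12)
```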

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..13.0: U+2260, U+226E, U+226F
- nothing new in this Unicode version, no test file to update

* run & fix ICU4C tests
- fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
- Andy helps with RBBI & spoof check test failures

* collation: CLDR collation root, UCA DUCET

- UCA DUCET goes into Mark's Unicode tools, see
   https://sites.google.com/site/unicodetools/home#TOC-UCA
 diff the main mapping file, look for bad changes
 (for example, more bytes per weight for common characters)
   ~/svn.unitools/trunk$ sed -r -f ~/cldr/uni/src/tools/scripts/uca/blankweights.sed ../Generated/UCA/13.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-13.0.txt
   ~/svn.unitools/trunk$ meld ../frac-12.1.txt ../frac-13.0.txt

- CLDR root data files are checked into $CLDR_SRC/common/uca/
   cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/

- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
   (note that the target file name drops the underscore before "Rules")
   2093    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
   2094 - restore TODO diffs in UCARules.txt
   2095    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
   2096 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
   2097  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   2098  from the CLDR root files (..._CLDR_..._SHORT.txt)
   2099    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
   2100    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
   2101    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
   2102 - if CLDR common/uca/unihan-index.txt changes, then update
   2103  CLDR common/collation/root.xml <collation type="private-unihan">
   2104  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
   2105 
   2106 - run genuca
   2107  $ICU_ROOT/dbg/tools/unicode/c$
   2108    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
   2109    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
   2110 - rebuild ICU4C
   2111 
   2112 * Unihan collators
   2113    https://sites.google.com/site/unicodetools/unihan
   2114 - run Unicode Tools
   2115    org.unicode.draft.GenerateUnihanCollators
   2116  with VM arguments
   2117    -ea
   2118    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
   2119    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
   2120    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
   2121    -DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
   2122    -DUVERSION=13.0.0
   2123 - run Unicode Tools
   2124    org.unicode.draft.GenerateUnihanCollatorFiles
   2125  with the same arguments
   2126 - check CLDR diffs
   2127    cd $CLDR_SRC
   2128    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
   2129    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
   2130 - copy to CLDR
   2131    cd $CLDR_SRC
   2132    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
   2133    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
   2134 - run CLDR unit tests, commit to CLDR
   2135 - generate ICU zh collation data: run CLDR
   2136    org.unicode.cldr.icu.NewLdml2IcuConverter
   2137  with program arguments
   2138    -t collation
   2139    -s /usr/local/google/home/mscherer/cldr/uni/src/common/collation
   2140    -m /usr/local/google/home/mscherer/cldr/uni/src/common/supplemental
   2141    -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
   2142    -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
   2143    zh
   2144  and VM arguments
   2145    -ea
   2146    -DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
   2147 - rebuild ICU4C
   2148 
   2149 * run & fix ICU4C tests, now with new CLDR collation root data
   2150 - run all tests with the collation test data *_SHORT.txt or the full files
   2151  (the full ones have comments, useful for debugging)
   2152 - note on intltest: if collate/UCAConformanceTest fails, then
   2153  utility/MultithreadTest/TestCollators will fail as well;
   2154 - to be safe, refresh only the UCD/UCA-related and derived files
   2155 
   2156 * update Java data files
   2157 - refresh just the UCD/UCA-related/derived files, just to be safe
   2158 - see (ICU4C)/source/data/icu4j-readme.txt
   2159 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   2160 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   2161  output:
   2162    ...
   2163    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
   2164    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt66b
   2165    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b
   2166    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt66l.dat ./out/icu4j/icudt66b.dat -s ./out/build/icudt66l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt66b
   2167    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b"
   2168    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt66b/
   2169    mkdir -p /tmp/icu4j/main/shared/data
   2170    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   2171    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt66b/
   2172    mkdir -p /tmp/icu4j/main/shared/data
   2173    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   2174    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
   2175 - copy the big-endian Unicode data files to another location,
   2176  separate from the other data files,
   2177  and then refresh ICU4J
   2178    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
   2179    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   2180    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   2181    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   2182    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   2183    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
   2184    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   2185    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   2186    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   2187    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
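The cp/rm sequence above selects a subset of the big-endian data files. The sketch below expresses the same selection in Python over a list of paths relative to com/ibm/icu/impl/data/$ICUDT. This makes it easy to preview what would end up in /tmp/icu4j before running jar uvf. The helper name select_unicode_data is hypothetical. The sketch only models the globs shown here and does not touch the file system.

```python
from fnmatch import fnmatchcase

# Globs from the cp commands above, relative to com/ibm/icu/impl/data/$ICUDT.
INCLUDE = ["confusables.cfu", "*.icu", "*.nrm", "coll/*", "brkitr/*"]
# Removed again by the rm command after the *.icu copy.
EXCLUDE = {"cnvalias.icu"}


def select_unicode_data(relative_paths):
    """Return the paths that the cp/rm sequence would leave in /tmp/icu4j."""
    selected = []
    for path in relative_paths:
        if path in EXCLUDE:
            continue
        for pattern in INCLUDE:
            # fnmatch lets "*" cross "/", unlike a shell glob, so also require
            # the same directory depth ("*.icu" must not match "coll/x.icu").
            if path.count("/") == pattern.count("/") and fnmatchcase(path, pattern):
                selected.append(path)
                break
    return selected
```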
   2188 
   2189 * When refreshing all of ICU4J data from ICU4C
   2190 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   2191 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
   2192 or
   2193 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
   2194 
   2195 * update CollationFCD.java
   2196  + copy & paste the initializers of lcccIndex[] etc. from
   2197    ICU4C/source/i18n/collationfcd.cpp to
   2198    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
   2199 
   2200 * refresh Java test .txt files
   2201 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   2202    cd $ICU_SRC/icu4c/source/data/unidata
   2203    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   2204    cd ../../test/testdata
   2205    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   2206    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   2207 
   2208 * run & fix ICU4J tests
   2209 
   2210 *** API additions
   2211 - send notice to icu-design about new born-@stable API (enum constants etc.)
   2212 
   2213 *** CLDR numbering systems
   2214 - look for new sets of decimal digits (gc=Nd & nv=4, which matches one digit per set) and add them to CLDR
   2215  for example, look for
   2216    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
   2217    in new blocks (Blocks.txt)
   2218  Unicode 13:
   2219    diak 11950..11959 Dives_Akuru
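Matching nv=4 finds exactly one digit per decimal digit set. Unicode encodes decimal digits as contiguous runs 0..9, so the full range follows from that one code point. The Python sketch below derives ranges like the Dives_Akuru one above. It assumes ppucd.txt lines of the form "cp;<hex>;<prop>;<prop>..." with one line per code point; that exact layout is an assumption here. The helper name decimal_digit_sets is hypothetical.

```python
def decimal_digit_sets(ppucd_lines):
    """Yield (first, last) code points of each decimal digit set.

    Assumes one "cp;<hex>;..." line per code point with ";"-separated
    properties, as the egrep pattern above expects.
    """
    for line in ppucd_lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) < 3 or fields[0] != "cp":
            continue
        props = set(fields[2:])
        if "gc=Nd" in props and "nv=4" in props:
            four = int(fields[1], 16)
            # Nd digits are encoded as contiguous 0..9 runs, so the set
            # starts 4 below the digit four and ends 5 above it.
            yield four - 4, four + 5
```

Formatting each pair with "%04X..%04X" gives the same notation as the list above.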
   2220 
   2221 *** merge the Unicode update branches back onto the trunk
   2222 - do not merge the icudata.jar and testdata.jar,
   2223  instead rebuild them from merged & tested ICU4C
   2224 - make sure that changes to Unicode tools are checked in:
   2225  http://www.unicode.org/utility/trac/log/trunk/unicodetools
   2226 
   2227 ---------------------------------------------------------------------------- ***
   2228 
   2229 Unicode 12.1 update for ICU 64.2
   2230 
   2231 ** This is an abbreviated update with one new character for the new
   2232 ** Japanese era expected to start on 2019-May-01: U+32FF SQUARE ERA NAME REIWA
   2233 https://en.wikipedia.org/wiki/Reiwa_period
   2234 
   2235 http://www.unicode.org/versions/Unicode12.1.0/
   2236 
   2237 ICU-20497 Unicode 12.1
   2238 
   2239 cldrbug 11978: Unicode 12.1
   2240 
   2241 * Command-line environment setup
   2242 
   2243 UNICODE_DATA=~/unidata/uni121/20190403
   2244 CLDR_SRC=~/svn.cldr/uni
   2245 ICU_ROOT=~/icu/uni
   2246 ICU_SRC=$ICU_ROOT/src
   2247 ICUDT=icudt64b
   2248 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
   2249 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
   2250 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
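ICUDT names the big-endian data package used by ICU4J. The icu4j-data-install output later in this section shows icupkg converting the little-endian icudt64l.dat into icudt64b. The Python sketch below captures the naming rule, assuming only the standard type letters l (little-endian ASCII), b (big-endian ASCII) and e (EBCDIC). The dictionary keys and the function name are our own labels.

```python
# Last letter of the data package name: "l" = little-endian ASCII,
# "b" = big-endian ASCII, "e" = EBCDIC. The dict keys are our own labels.
PLATFORM_LETTER = {"little": "l", "big": "b", "ebcdic": "e"}


def icudt_name(icu_major, platform="big"):
    """Build an ICU data package name such as icudt64b."""
    return f"icudt{icu_major}{PLATFORM_LETTER[platform]}"
```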
   2251 
   2252 *** Unicode version numbers
   2253 - makedata.mak
   2254 - uchar.h
   2255 - com.ibm.icu.util.VersionInfo
   2256 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
   2257 
   2258 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
   2259    so that the makefiles see the new version number.
   2260  cd $ICU_ROOT/dbg/icu4c
   2261  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
   2262 
   2263 *** data files & enums & parser code
   2264 
   2265 * download files
   2266 - mkdir -p $UNICODE_DATA
   2267 - download Unicode files into $UNICODE_DATA
   2268  + subfolders: emoji, idna, security, ucd, uca
   2269  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
   2270 
   2271 * for manual diffs and for Unicode Tools input data updates:
   2272  remove version suffixes from the file names
   2273    ~$ unidata/desuffixucd.py $UNICODE_DATA
   2274  (see https://sites.google.com/site/unicodetools/inputdata)
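The Python sketch below shows one way to do the renaming. It assumes suffixes of the form "-12.1.0", or a draft "-12.1.0d3", directly before the extension. It is a hypothetical re-implementation for illustration, not desuffixucd.py itself. plan_renames only lists the renames and leaves applying them (for example with Path.rename) to the caller.

```python
import re
from pathlib import Path

# Assumed suffix shapes: "-12.1.0" or a draft "-12.1.0d3" before the extension.
VERSION_SUFFIX = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z]+$)")


def desuffixed(filename):
    """Return filename without a trailing version suffix, if any."""
    return VERSION_SUFFIX.sub("", filename)


def plan_renames(root):
    """List (old, new) paths under root; the caller decides whether to rename."""
    plan = []
    for path in sorted(Path(root).rglob("*")):
        new_name = desuffixed(path.name)
        if path.is_file() and new_name != path.name:
            plan.append((path, path.with_name(new_name)))
    return plan
```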
   2275 
   2276 * process and/or copy files
   2277 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
   2278  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
   2279  + For debugging, and tweaking how ppucd.txt is written,
   2280    the tool has an --only_ppucd option:
   2281    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
   2282 
   2283 - cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
   2284 
   2285 * build ICU (make install)
   2286  so that the tools build can pick up the new definitions from the installed header files.
   2287 
   2288  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
   2289 
   2290 * update spoof checker UnicodeSet initializers:
   2291    inclusionPat & recommendedPat in uspoof.cpp
   2292    INCLUSION & RECOMMENDED in SpoofChecker.java
   2293 - make sure that the Unicode Tools tree contains the latest security data files
   2294 - go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
   2295 - update the hardcoded version number there in the DIRECTORY path
   2296 - run the tool (no special environment variables needed)
   2297 - copy & paste from the Console output into the .cpp & .java files
   2298 
   2299 * generate normalization data files
   2300  cd $ICU_ROOT/dbg/icu4c
   2301  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
   2302  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
   2303  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
   2304  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
   2305  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
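Each .nrm file above is built from nfc.txt plus the files that extend it, always listed in the same order. The Python sketch below generates the five command lines from one table, which keeps the source lists consistent when a data file is added. The helper names are hypothetical. The sketch only builds argument lists and does not run gennorm2.

```python
import shlex

# Source files per output, in the order given on the gennorm2 lines above.
NORM2_SOURCES = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(icu_src, data_in, unidata):
    """Return gennorm2 argument lists, meant to run from $ICU_ROOT/dbg/icu4c."""
    norm2_dir = f"{unidata}/norm2"
    commands = [["bin/gennorm2", "-o", f"{icu_src}/icu4c/source/common/norm2_nfc_data.h",
                 "-s", norm2_dir, "nfc.txt", "--csource"]]
    for output, sources in NORM2_SOURCES.items():
        commands.append(["bin/gennorm2", "-o", f"{data_in}/{output}", "-s", norm2_dir, *sources])
    return commands


def as_shell(commands):
    """Render each argument list as a quoted shell command line."""
    return [shlex.join(argv) for argv in commands]
```

To run the commands, pass the expanded values of $ICU_SRC, $ICU4C_DATA_IN and $ICU4C_UNIDATA, then call subprocess.run(argv, check=True) for each list.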
   2306 
   2307 * build ICU (make install)
   2308  so that the tools build can pick up the new definitions from the installed header files.
   2309 
   2310  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
   2311 
   2312 * build Unicode tools using CMake+make
   2313 
   2314 $ICU_SRC/tools/unicode/c/icudefs.txt:
   2315 
   2316 # Location (--prefix) of where ICU was installed.
   2317 set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
   2318 # Location of the ICU4C source tree.
   2319 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
   2320 
   2321  $ICU_ROOT/dbg$
   2322    mkdir -p tools/unicode/c
   2323    cd tools/unicode/c
   2324 
   2325  $ICU_ROOT/dbg/tools/unicode/c$
   2326    cmake ../../../../src/tools/unicode/c
   2327    make
   2328 
   2329 * generate core properties data files
   2330  $ICU_ROOT/dbg/tools/unicode/c$
   2331    genprops/genprops $ICU_SRC/icu4c
   2332    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
   2333    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
   2334 - rebuild ICU (make install) & tools
   2335 
   2336 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
   2337  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
   2338 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
   2339 - Unicode 6.0..12.1: U+2260, U+226E, U+226F
   2340 - nothing new in this Unicode version, no test file to update
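The grep above can be made stricter by expanding ranges, so that a range that starts in ASCII but extends past U+007F is not missed. The Python sketch below assumes the IdnaMappingTable.txt line syntax "code point(s) ; status ; mapping # comment". It does not read ICU's derived uts46.txt. Per the note above, the expected result on current data is U+2260, U+226E and U+226F. The function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Return non-ASCII code points with status disallowed_STD3_valid.

    Expects IdnaMappingTable.txt syntax: "code point(s) ; status ; ... # comment".
    """
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        found.extend(cp for cp in range(start, end + 1) if cp > 0x7F)
    return found
```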
   2341 
   2342 * run & fix ICU4C tests
   2343 - Andy handles RBBI & spoof check test failures
   2344 
   2345 * collation: CLDR collation root, UCA DUCET
   2346 
   2347 - UCA DUCET goes into Mark's Unicode tools, see
   2348    https://sites.google.com/site/unicodetools/home#TOC-UCA
   2349  diff the main mapping file, look for bad changes
   2350  (for example, more bytes per weight for common characters)
   2351    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.1.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.1.txt
   2352    ~/svn.unitools/trunk$ meld ../frac-12.txt ../frac-12.1.txt
   2353 
   2354 - CLDR root data files are checked into $CLDR_SRC/common/uca/
   2355    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
   2356 
   2357 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   2358    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
   2359 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   2360    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
   2361    (note: the target file name drops the underscore before "Rules")
   2362    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
   2363 - restore TODO diffs in UCARules.txt
   2364    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
   2365 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
   2366  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   2367  from the CLDR root files (..._CLDR_..._SHORT.txt)
   2368    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
   2369    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
   2370    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
   2371 - if CLDR common/uca/unihan-index.txt changes, then update
   2372  CLDR common/collation/root.xml <collation type="private-unihan">
   2373  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
   2374 
   2375 - run genuca, see command line above
   2376 - rebuild ICU4C
   2377 
   2378 * Unihan collators
   2379    https://sites.google.com/site/unicodetools/unihan
   2380 - run Unicode Tools
   2381    org.unicode.draft.GenerateUnihanCollators
   2382  with VM arguments
   2383    -ea
   2384    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
   2385    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
   2386    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
   2387    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
   2388    -DUVERSION=12.1.0
   2389 - run Unicode Tools
   2390    org.unicode.draft.GenerateUnihanCollatorFiles
   2391  with the same arguments
   2392 - check CLDR diffs
   2393    cd $CLDR_SRC
   2394    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
   2395    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
   2396 - copy to CLDR
   2397    cd $CLDR_SRC
   2398    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
   2399    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
   2400 - run CLDR unit tests, commit to CLDR
   2401 - generate ICU zh collation data: run CLDR
   2402    org.unicode.cldr.icu.NewLdml2IcuConverter
   2403  with program arguments
   2404    -t collation
   2405    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
   2406    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
   2407    -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
   2408    -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
   2409    zh
   2410  and VM arguments
   2411    -ea
   2412    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
   2413 - rebuild ICU4C
   2414 
   2415 * run & fix ICU4C tests, now with new CLDR collation root data
   2416 - run all tests with the collation test data *_SHORT.txt or the full files
   2417  (the full ones have comments, useful for debugging)
   2418 - note on intltest: if collate/UCAConformanceTest fails, then
   2419  utility/MultithreadTest/TestCollators will fail as well;
   2420  fix the conformance test before looking into the multi-thread test
   2421 
   2422 * update Java data files
   2423 - to be safe, refresh only the UCD/UCA-related and derived files
   2424 - see (ICU4C)/source/data/icu4j-readme.txt
   2425 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   2426 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   2427  output:
   2428    ...
   2429    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
   2430    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt64b
   2431    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b
   2432    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt64l.dat ./out/icu4j/icudt64b.dat -s ./out/build/icudt64l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt64b
   2433    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b"
   2434    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt64b/
   2435    mkdir -p /tmp/icu4j/main/shared/data
   2436    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   2437    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt64b/
   2438    mkdir -p /tmp/icu4j/main/shared/data
   2439    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   2440    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
   2441 - copy the big-endian Unicode data files to another location,
   2442  separate from the other data files,
   2443  and then refresh ICU4J
   2444    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
   2445    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   2446    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   2447    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   2448    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   2449    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
   2450    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   2451    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   2452    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   2453    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
   2454 
   2455 * When refreshing all of ICU4J data from ICU4C
   2456 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   2457 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
   2458 or
   2459 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
   2460 
   2461 * update CollationFCD.java
   2462  + copy & paste the initializers of lcccIndex[] etc. from
   2463    ICU4C/source/i18n/collationfcd.cpp to
   2464    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
   2465 
   2466 * refresh Java test .txt files
   2467 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   2468    cd $ICU_SRC/icu4c/source/data/unidata
   2469    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   2470    cd ../../test/testdata
   2471    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   2472    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   2473 
   2474 * run & fix ICU4J tests
   2475 
   2476 *** API additions
   2477 - send notice to icu-design about new born-@stable API (enum constants etc.)
   2478 
   2479 *** CLDR numbering systems
   2480 - look for new sets of decimal digits (gc=Nd & nv=4, which matches one digit per set) and add them to CLDR
   2481  for example, look for
   2482    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
   2483    in new blocks (Blocks.txt)
   2484  Unicode 12: using Unicode 12 CLDR ticket #11478
   2485    hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
   2486    wcho 1E2F0..1E2F9 Wancho
   2487  Unicode 11: using Unicode 11 CLDR ticket #10978
   2488    rohg 10D30..10D39 Hanifi_Rohingya
   2489    gong 11DA0..11DA9 Gunjala_Gondi
   2490  Earlier: CLDR tickets specific to adding new numbering systems.
   2491  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
   2492  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
   2493 
   2494 *** merge the Unicode update branches back onto the trunk
   2495 - do not merge the icudata.jar and testdata.jar,
   2496  instead rebuild them from merged & tested ICU4C
   2497 - make sure that changes to Unicode tools are checked in:
   2498  http://www.unicode.org/utility/trac/log/trunk/unicodetools
   2499 
   2500 ---------------------------------------------------------------------------- ***
   2501 
   2502 Unicode 12.0 update for ICU 64
   2503 
   2504 http://www.unicode.org/versions/Unicode12.0.0/
   2505 http://unicode.org/versions/beta-12.0.0.html
   2506 https://www.unicode.org/review/pri389/
   2507 http://www.unicode.org/reports/uax-proposed-updates.html
   2508 http://www.unicode.org/reports/tr44/tr44-23.html
   2509 
   2510 ICU-20203 Unicode 12
   2511 
   2512 ICU-20111 move text layout properties data into a data file
   2513 
   2514 cldrbug 11478: Unicode 12
   2515 Accidentally used ^/trunk instead of ^/branches/markus/uni12
   2516 
   2517 * Command-line environment setup
   2518 
   2519 UNICODE_DATA=~/unidata/uni12/20190309
   2520 CLDR_SRC=~/svn.cldr/uni
   2521 ICU_ROOT=~/icu/uni
   2522 ICU_SRC=$ICU_ROOT/src
   2523 ICUDT=icudt63b
   2524 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
   2525 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
   2526 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
   2527 
   2528 *** Unicode version numbers
   2529 - makedata.mak
   2530 - uchar.h
   2531 - com.ibm.icu.util.VersionInfo
   2532 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
   2533 
   2534 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
   2535  so that the makefiles see the new version number.
   2536 
   2537 *** data files & enums & parser code
   2538 
   2539 * download files
   2540 - mkdir -p $UNICODE_DATA
   2541 - download Unicode files into $UNICODE_DATA
   2542  + subfolders: emoji, idna, security, ucd, uca
   2543  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
   2544 
   2545 * for manual diffs and for Unicode Tools input data updates:
   2546  remove version suffixes from the file names
   2547    ~$ unidata/desuffixucd.py $UNICODE_DATA
   2548  (see https://sites.google.com/site/unicodetools/inputdata)
   2549 
   2550 * process and/or copy files
   2551 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
   2552  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
   2553  + For debugging, and tweaking how ppucd.txt is written,
   2554    the tool has an --only_ppucd option:
   2555    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
   2556 
   2557 - cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
   2558 
   2559 * build ICU (make install)
   2560  so that the tools build can pick up the new definitions from the installed header files.
   2561 
   2562  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
   2563 
   2564 * new constants for new property values
   2565 - preparseucd.py error:
   2566    ValueError: missing uchar.h enum constants for some property values:
   2567    [(u'blk', set([u'Symbols_And_Pictographs_Ext_A', u'Elymaic',
   2568        u'Ottoman_Siyaq_Numbers', u'Nandinagari', u'Nyiakeng_Puachue_Hmong',
   2569        u'Small_Kana_Ext', u'Egyptian_Hieroglyph_Format_Controls', u'Wancho', u'Tamil_Sup'])),
   2570    (u'sc', set([u'Nand', u'Wcho', u'Elym', u'Hmnp']))]
   2571  = PropertyValueAliases.txt new property values (diff old & new .txt files)
   2572    blk; Egyptian_Hieroglyph_Format_Controls; Egyptian_Hieroglyph_Format_Controls
   2573    blk; Elymaic                          ; Elymaic
   2574    blk; Nandinagari                      ; Nandinagari
   2575    blk; Nyiakeng_Puachue_Hmong           ; Nyiakeng_Puachue_Hmong
   2576    blk; Ottoman_Siyaq_Numbers            ; Ottoman_Siyaq_Numbers
   2577    blk; Small_Kana_Ext                   ; Small_Kana_Extension
   2578    blk; Symbols_And_Pictographs_Ext_A    ; Symbols_And_Pictographs_Extended_A
   2579    blk; Tamil_Sup                        ; Tamil_Supplement
   2580    blk; Wancho                           ; Wancho
   2581  -> add to uchar.h
   2582    use long property names for enum constants,
   2583    for the trailing comment get the block start code point: diff old & new Blocks.txt
   2584  -> add to UCharacter.UnicodeBlock IDs
   2585    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
   2586            replace  public static final int \1_ID = \2; \3
   2587  -> add to UCharacter.UnicodeBlock objects
   2588    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
   2589            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
   2590 
   2591    sc ; Elym                             ; Elymaic
   2592    sc ; Hmnp                             ; Nyiakeng_Puachue_Hmong
   2593    sc ; Nand                             ; Nandinagari
   2594    sc ; Wcho                             ; Wancho
   2595  -> uscript.h & com.ibm.icu.lang.UScript
   2596  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
   2597      and in com.ibm.icu.dev.test.lang.TestUScript.java
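The block steps above are: diff the blk values, derive the uchar.h names, then convert them with the two Eclipse patterns. The Python sketch below covers those steps. The helper names are hypothetical; the regexes are the Eclipse ones translated to Python's re syntax. The second find pattern does not capture the number, so the trailing comment is its second group (\2). The sample uchar.h line in the test is illustrative only.

```python
import re


def blk_aliases(lines):
    """Collect (short, long) block aliases from PropertyValueAliases.txt lines."""
    aliases = set()
    for line in lines:
        fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
        if len(fields) >= 3 and fields[0] == "blk":
            aliases.add((fields[1], fields[2]))
    return aliases


def new_blocks(old_lines, new_lines):
    """Block aliases present in the new file but not in the old one."""
    return sorted(blk_aliases(new_lines) - blk_aliases(old_lines))


def uchar_enum_name(long_name):
    # uchar.h uses the long property value name, upper-cased.
    return "UBLOCK_" + long_name.upper()


# The two Eclipse find/replace pairs above, in Python re syntax.
ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
ID_REPLACE = r"public static final int \1_ID = \2; \3"
OBJECT_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
OBJECT_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'
```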
   2598 
   2599 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
   2600    (not strictly necessary for NOT_ENCODED scripts)
   2601  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
   2602 
   2603 * update spoof checker UnicodeSet initializers:
   2604    inclusionPat & recommendedPat in uspoof.cpp
   2605    INCLUSION & RECOMMENDED in SpoofChecker.java
   2606 - make sure that the Unicode Tools tree contains the latest security data files
   2607 - go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
   2608 - update the hardcoded version number there in the DIRECTORY path
   2609 - run the tool (no special environment variables needed)
   2610 - copy & paste from the Console output into the .cpp & .java files
   2611 
   2612 * generate normalization data files
   2613  cd $ICU_ROOT/dbg/icu4c
   2614  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
   2615  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
   2616  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
   2617  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
   2618  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
   2619 
   2620 * build ICU (make install)
   2621  so that the tools build can pick up the new definitions from the installed header files.
   2622 
   2623  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
   2624 
   2625 * build Unicode tools using CMake+make
   2626 
   2627 $ICU_SRC/tools/unicode/c/icudefs.txt:
   2628 
   2629 # Location (--prefix) of where ICU was installed.
   2630 set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
   2631 # Location of the ICU4C source tree.
   2632 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
   2633 
   2634  $ICU_ROOT/dbg$
   2635    mkdir -p tools/unicode/c
   2636    cd tools/unicode/c
   2637 
   2638  $ICU_ROOT/dbg/tools/unicode/c$
   2639    cmake ../../../../src/tools/unicode/c
   2640    make
   2641 
   2642 * generate core properties data files
   2643  $ICU_ROOT/dbg/tools/unicode/c$
   2644    genprops/genprops $ICU_SRC/icu4c
   2645    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
   2646    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
   2647 - rebuild ICU (make install) & tools
   2648 
   2649 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
   2650  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
   2651 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
   2652 - Unicode 6.0..12.0: U+2260, U+226E, U+226F
   2653 - nothing new in this Unicode version, no test file to update
   2654 
   2655 * run & fix ICU4C tests
   2656 - update test of default bidi classes:
   2657  Bidi range \U0001ED00-\U0001ED4F changes default from R to AL,
   2658  see diffs in DerivedBidiClass.txt
   2659  + /tsutil/cucdtst/TestUnicodeData enumDefaultsRange() defaultBidi[]
   2660  + UCharacterTest.java TestIteration() defaultBidi[]
   2661 - Andy handles RBBI & spoof check test failures
   2662 
   2663 * collation: CLDR collation root, UCA DUCET
   2664 
   2665 - UCA DUCET goes into Mark's Unicode tools, see
   2666    https://sites.google.com/site/unicodetools/home#TOC-UCA
   2667  diff the main mapping file, look for bad changes
   2668  (for example, more bytes per weight for common characters)
   2669    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.txt
   2670    ~/svn.unitools/trunk$ meld ../frac-11.txt ../frac-12.txt
   2671 
   2672 - CLDR root data files are checked into $CLDR_SRC/common/uca/
   2673    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
   2674 
   2675 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   2676    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
   2677 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   2678    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
   2679    (note: the target file name drops the underscore before "Rules")
   2680    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
   2681 - restore TODO diffs in UCARules.txt
   2682    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
   2683 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
   2684  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   2685  from the CLDR root files (..._CLDR_..._SHORT.txt)
   2686    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
   2687    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
   2688    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
   2689 - if CLDR common/uca/unihan-index.txt changes, then update
   2690  CLDR common/collation/root.xml <collation type="private-unihan">
   2691  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
   2692 
   2693 - run genuca, see command line above;
   2694  deal with
   2695    Error: Unknown script for first-primary sample character U+119CE on line 29233 of /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
   2696    FDD1 119CE;	[71 CD 02, 05, 05]	# Nandinagari first primary (compressible)
   2697        (add the character to genuca.cpp sampleCharsToScripts[])
   2698  + This time, I added code to genuca.cpp to use uscript_getSampleUnicodeString(script)
   2699    and cache its values.
   2700    This works as long as the script metadata is updated before the collation data.
   2701 - rebuild ICU4C
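
The check for "more bytes per weight for common characters" from the FractionalUCA diff step can also be scripted for a handful of characters. The following is a minimal bash sketch. It assumes the FractionalUCA.txt line layout visible in the genuca error output: code points, `;`, then `[primary, secondary, tertiary]`. `primary_len` and `compare_primary_len` are hypothetical helpers and are not part of the Unicode Tools or CLDR tooling. The sketch compares only primary weights, so it does not replace reviewing the full diff.

```shell
#!/usr/bin/env bash
# Print the number of primary weight bytes for one mapping, assuming lines like
#   0061;<TAB>[29, 05, 05]<TAB># comment
primary_len() {  # <FractionalUCA file> <code point key, e.g. 0061>
  awk -F';' -v key="$2" '
    ($1 "") == key {                       # force a string comparison
      s = substr($2, index($2, "[") + 1)   # text after the first bracket
      sub(/,.*/, "", s)                    # keep only the primary part
      print split(s, bytes, " ")           # count space-separated bytes
      exit
    }' "$1"
}

# Report code points whose primary weight got longer from old to new file.
compare_primary_len() {  # <old file> <new file> <code point>...
  local old=$1 new=$2 cp o n
  shift 2
  for cp in "$@"; do
    o=$(primary_len "$old" "$cp")
    n=$(primary_len "$new" "$cp")
    if [[ -n $o && -n $n && $n -gt $o ]]; then
      echo "$cp: $o -> $n primary bytes"
    fi
  done
}
```

Run it on the generated FractionalUCA.txt files themselves rather than on copies that have had their weights blanked. For example: `compare_primary_len old/FractionalUCA.txt new/FractionalUCA.txt 0020 0030 0061`.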
   2702 
   2703 * Unihan collators
   2704    https://sites.google.com/site/unicodetools/unihan
   2705 - run Unicode Tools
   2706    org.unicode.draft.GenerateUnihanCollators
   2707  with VM arguments
   2708    -ea
   2709    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
   2710    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
   2711    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
   2712    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
   2713    -DUVERSION=12.0.0
   2714 - run Unicode Tools
   2715    org.unicode.draft.GenerateUnihanCollatorFiles
   2716  with the same arguments
   2717 - check CLDR diffs
   2718    cd $CLDR_SRC
   2719    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
   2720    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
   2721 - copy to CLDR
   2722    cd $CLDR_SRC
   2723    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
   2724    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
   2725 - run CLDR unit tests, commit to CLDR
   2726 - generate ICU zh collation data: run CLDR
   2727    org.unicode.cldr.icu.NewLdml2IcuConverter
   2728  with program arguments
   2729    -t collation
   2730    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
   2731    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
   2732    -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
   2733    -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
   2734    zh
   2735  and VM arguments
   2736    -ea
   2737    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
   2738 - rebuild ICU4C
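
Both Unicode Tools programs above take the same VM arguments, so generating them from the checkout roots avoids editing two run configurations by hand. The following is a bash sketch. `unihan_vm_args` is a hypothetical helper, and the argument names are copied from the list above.

```shell
#!/usr/bin/env bash
# Print the JVM arguments shared by GenerateUnihanCollators and
# GenerateUnihanCollatorFiles, one per line.
# Arguments: <Unicode Tools checkout root> <CLDR checkout> <Unicode version>
unihan_vm_args() {
  local unitools_root=$1 cldr_root=$2 uversion=$3
  printf '%s\n' \
    -ea \
    "-DSVN_WORKSPACE=$unitools_root/trunk" \
    "-DOTHER_WORKSPACE=$unitools_root" \
    "-DUCD_DIR=$unitools_root/trunk/data" \
    "-DCLDR_DIR=$cldr_root" \
    "-DUVERSION=$uversion"
}
```

For example, `mapfile -t VM_ARGS < <(unihan_vm_args ~/svn.unitools ~/svn.cldr/uni 12.0.0)` collects the arguments into an array for `java "${VM_ARGS[@]}" ...`. The classpath and launch details are not recorded in this log.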
   2739 
   2740 * run & fix ICU4C tests, now with new CLDR collation root data
   2741 - run all tests with the collation test data *_SHORT.txt or the full files
   2742  (the full ones have comments, useful for debugging)
   2743 - note on intltest: if collate/UCAConformanceTest fails, then
   2744  utility/MultithreadTest/TestCollators will fail as well;
   2745  fix the conformance test before looking into the multi-thread test
   2746 
   2747 * update Java data files
   2748 - refresh only the UCD/UCA-related and derived files, to be safe
   2749 - see (ICU4C)/source/data/icu4j-readme.txt
   2750 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   2751 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   2752  output:
   2753    ...
   2754    Unicode .icu files built to ./out/build/icudt63l
   2755    echo timestamp > uni-core-data
   2756    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt63b
   2757    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b
   2758    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
   2759    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt63l.dat ./out/icu4j/icudt63b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt63l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt63b
   2760    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b"
   2761    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt63b/
   2762    mkdir -p /tmp/icu4j/main/shared/data
   2763    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   2764    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt63b/
   2765    mkdir -p /tmp/icu4j/main/shared/data
   2766    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   2767    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
   2768 - copy the big-endian Unicode data files to another location,
   2769  separate from the other data files,
   2770  and then refresh ICU4J
   2771    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
   2772    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   2773    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   2774    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   2775    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   2776    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
   2777    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   2778    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   2779    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   2780    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
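
Wrapping the copy steps above in a function makes it easier to repeat them the same way in each update. The following bash sketch mirrors the listed `mkdir`/`cp`/`rm` commands. `stage_unicode_data` is a hypothetical helper. The final `jar uvf` is left to be run as shown.

```shell
#!/usr/bin/env bash
# Stage the big-endian Unicode data files from the icu4j-data-install output
# (data/out/icu4j) into a separate tree, as in the commands above.
stage_unicode_data() {  # <out/icu4j dir> <staging root> <icudt name>
  local src=$1/com/ibm/icu/impl/data/$3
  local dst=$2/com/ibm/icu/impl/data/$3
  mkdir -p "$dst/coll" "$dst/brkitr" || return
  cp "$src/confusables.cfu" "$src"/*.icu "$src"/*.nrm "$dst/" || return
  # The log removes the converter alias table after copying all *.icu files.
  rm -f "$dst/cnvalias.icu"
  cp "$src"/coll/* "$dst/coll/" || return
  cp "$src"/brkitr/* "$dst/brkitr/"
}
```

Usage: `stage_unicode_data "$ICU_ROOT/dbg/icu4c/data/out/icu4j" /tmp/icu4j "$ICUDT"`, followed by the `jar uvf` line above.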
   2781 
   2782 * When refreshing all of ICU4J data from ICU4C
   2783 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   2784 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
   2785 or
   2786 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
   2787 
   2788 * update CollationFCD.java
   2789  + copy & paste the initializers of lcccIndex[] etc. from
   2790    ICU4C/source/i18n/collationfcd.cpp to
   2791    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
   2792 
   2793 * refresh Java test .txt files
   2794 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   2795    cd $ICU_SRC/icu4c/source/data/unidata
   2796    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   2797    cd ../../test/testdata
   2798    cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   2799    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   2800 
   2801 * run & fix ICU4J tests
   2802 
   2803 *** API additions
   2804 - send notice to icu-design about new born-@stable API (enum constants etc.)
   2805 
   2806 *** CLDR numbering systems
   2807 - look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
   2808  for example, look for
   2809    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
   2810    in new blocks (Blocks.txt)
   2811  Unicode 12: using Unicode 12 CLDR ticket #11478
   2812    hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
   2813    wcho 1E2F0..1E2F9 Wancho
   2814  Unicode 11: using Unicode 11 CLDR ticket #10978
   2815    rohg 10D30..10D39 Hanifi_Rohingya
   2816    gong 11DA0..11DA9 Gunjala_Gondi
   2817  Earlier: CLDR tickets specific to adding new numbering systems.
   2818  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
   2819  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
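
Each new decimal digit set has exactly one character with nv=4, so subtracting 4 from an egrep hit's code point gives digit zero of that set. The following is a bash sketch under two assumptions. First, ppucd.txt lines start with `cp;<hex code point>;`. Second, the ten digits are contiguous, as they are for the sets listed above: 1E144 gives 1E140..1E149 for hmnp.

```shell
#!/usr/bin/env bash
# Print the 0..9 range for a digit whose numeric value is 4.
digit_range() {  # <hex code point of the nv=4 digit>
  local cp=$((16#$1))
  printf '%04X..%04X\n' $((cp - 4)) $((cp + 5))
}

# Run the egrep from above and print one range per digit set.
# Assumes the code point is the second ';'-separated field.
nd_ranges() {  # <ppucd.txt>
  grep -E ';gc=Nd.+;nv=4' "$1" | cut -d';' -f2 | while read -r cp; do
    digit_range "$cp"
  done
}
```

The ranges can then be checked against Blocks.txt to spot the sets that are in new blocks.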
   2820 
   2821 *** merge the Unicode update branches back onto the trunk
   2822 - do not merge the icudata.jar and testdata.jar,
   2823  instead rebuild them from merged & tested ICU4C
   2824 - make sure that changes to Unicode tools are checked in:
   2825  http://www.unicode.org/utility/trac/log/trunk/unicodetools
   2826 
   2827 ---------------------------------------------------------------------------- ***
   2828 
   2829 ICU 63 addition of ICU support for text layout properties InPC, InSC, vo
   2830 
   2831 * Command-line environment setup
   2832 
   2833 UNICODE_DATA=~/unidata/uni11/20180609
   2834 CLDR_SRC=~/svn.cldr/uni
   2835 ICU_ROOT=~/icu/mine
   2836 ICU_SRC=$ICU_ROOT/src
   2837 ICUDT=icudt62b
   2838 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
   2839 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
   2840 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
   2841 
   2842 *** Links
   2843 
   2844 https://unicode-org.atlassian.net/browse/ICU-8966 InPC & InSC
   2845 https://unicode-org.atlassian.net/browse/ICU-12850 vo
   2846 
   2847 *** data files & enums & parser code
   2848 
   2849 * API additions
   2850 - for each of the three new enumerated properties
   2851  + uchar.h: add the enum UProperty constant UCHAR_<long prop name>
   2852  + uchar.h: update UCHAR_INT_LIMIT
   2853  + uchar.h: add the enum U<long prop name>
   2854    with constants U_<short prop name>_<long value name>
   2855  + UProperty.java: add the constant <long prop name>
   2856  + UProperty.java: update INT_LIMIT
   2857  + UCharacter.java: add the interface <long prop name>
   2858    with constants <long value name>
   2859 
   2860 * process and/or copy files
   2861 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
   2862  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
   2863  + It also writes tools/unicode/c/genprops/pnames_data.h with property and value
   2864    names and aliases.
   2865  + For debugging, and tweaking how ppucd.txt is written,
   2866    the tool has an --only_ppucd option:
   2867    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
   2868 
   2869 * preparseucd.py changes
   2870 - add new property short names (uppercase) to _prop_and_value_re
   2871  so that ParseUCharHeader() parses the new enum constants
   2872 
   2873 * build ICU (make install)
   2874  so that the tools build can pick up the new definitions from the installed header files.
   2875 
   2876  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
   2877 
   2878 * build Unicode tools using CMake+make
   2879 
   2880 $ICU_SRC/tools/unicode/c/icudefs.txt:
   2881 
   2882 # Location (--prefix) of where ICU was installed.
   2883 set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
   2884 # Location of the ICU4C source tree.
   2885 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/mine/src/icu4c)
   2886 
   2887  $ICU_ROOT/dbg$
   2888    mkdir -p tools/unicode/c
   2889    cd tools/unicode/c
   2890 
   2891  $ICU_ROOT/dbg/tools/unicode/c$
   2892    cmake ../../../../src/tools/unicode/c
   2893    make
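
The relative source path for cmake can be computed instead of counting `../` segments by hand. The following is a bash sketch. It assumes GNU coreutils `realpath`, which provides `--relative-to` and `-m`, and the $ICU_ROOT/dbg and $ICU_ROOT/src layout used in this section. `tools_src_relpath` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Relative path from the CMake build dir to the Unicode tools source dir.
# -m: do not require the paths to exist (purely lexical computation).
tools_src_relpath() {  # <ICU_ROOT>
  local icu_root=$1
  realpath -m --relative-to="$icu_root/dbg/tools/unicode/c" \
    "$icu_root/src/tools/unicode/c"
}
```

Usage: `cd "$ICU_ROOT/dbg/tools/unicode/c" && cmake "$(tools_src_relpath "$ICU_ROOT")"`. Passing the absolute path `$ICU_SRC/tools/unicode/c` to cmake works as well.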
   2894 
   2895 * generate core properties data files
   2896  $ICU_ROOT/dbg/tools/unicode/c$
   2897    genprops/genprops $ICU_SRC/icu4c
   2898 - rebuild ICU (make install) & tools
   2899 
   2900 * write data for runtime, hardcoded for now
   2901 - add genprops/layoutpropsbuilder.cpp with pieces from sibling files
   2902 - generate new icu4c/source/common/ulayout_props_data.h
   2903 - for each of the three new enumerated properties
   2904  + int property max value
   2905  + small, 8-bit UCPTrie
   2906    (A small 16-bit trie with bit fields for these three properties
   2907    is very nearly the same size as the sum of the three.)
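
The alternative mentioned in the note is a single 16-bit value with bit fields. The following bash sketch packs three small enum values into one such value and unpacks them again. The field widths (4 bits InPC, 6 bits InSC, 2 bits vo) are assumptions chosen only for illustration, and this is not the layout ICU uses.

```shell
#!/usr/bin/env bash
# Hypothetical layout: bits 0-1 vo, bits 2-7 InSC, bits 8-11 InPC.
pack_layout() {  # <inpc> <insc> <vo>
  echo $(( ($1 << 8) | ($2 << 2) | $3 ))
}

unpack_layout() {  # <packed value>; prints "<inpc> <insc> <vo>"
  local v=$1
  echo "$(( (v >> 8) & 0xF )) $(( (v >> 2) & 0x3F )) $(( v & 0x3 ))"
}
```

With one trie per property, each lookup reads a whole 8-bit value without masking. The note above reports that the combined 16-bit trie would not have been noticeably smaller.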
   2908 
   2909 * wire into C++
   2910 - uprops.cpp: #include ulayout_props_data.h
   2911 - uprops.cpp: add getInPC() etc. functions
   2912 - uprops.cpp: add lines to intProps[], include max values
   2913 - uprops.h: add UPropertySource constants
   2914 - uprops.cpp: add uprops_addPropertyStarts(src)
   2915 - uniset_props.cpp: add to UnicodeSet_initInclusion()
   2916 - intltest/ucdtest.cpp: write unit tests
   2917 
   2918 * update Java data files
   2919 - refresh only the pnames.icu file with the new property [value] names, to be safe
   2920 - see $ICU_SRC/icu4c/source/data/icu4j-readme.txt
   2921 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   2922 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   2923 - copy the big-endian Unicode data files to another location,
   2924  separate from the other data files,
   2925  and then refresh ICU4J
   2926    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
   2927    cp com/ibm/icu/impl/data/$ICUDT/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   2928    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
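
$ICUDT names a data package. The name is `icudt`, followed by the ICU major version, followed by a letter for the platform type: `l` for little-endian ASCII, `b` for big-endian ASCII, or `e` for EBCDIC. ICU4J uses the big-endian `b` files, which is why the icu4j-data-install output shown earlier builds icudt63l but writes ICU4J paths with icudt63b. The following is a small bash sketch. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Build a data package name such as icudt63b.
icudt_name() {  # <major version> <l|b|e>
  case $2 in
    l|b|e) echo "icudt$1$2" ;;
    *) echo "icudt_name: bad type '$2'" >&2; return 1 ;;
  esac
}

# Map a little-endian package name (e.g. from out/build/) to big-endian.
to_big_endian_name() {  # <icudtNNl>
  [[ $1 == icudt*l ]] || return 1
  echo "${1%l}b"
}
```

For example, `ICUDT=$(icudt_name 62 b)` sets the value used in this section.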
   2929 
   2930 * wire into Java
   2931 - UCharacterProperty.java: add new SRC_INPC etc. constants as in C++
   2932 - UCharacterProperty.java: for each new property
   2933  + create a nested class to hold its CodePointTrie
   2934  + initialize it from a string literal
   2935  + paste in the initializer printed by genprops
   2936  + add a new IntProperty object to the intProps[] array
   2937  + use the correct max int value for each property, also printed by genprops
   2938 - UCharacterProperty.java: add ulayout_addPropertyStarts(src, set)
   2939 - UnicodeSet.java: add to getInclusions()
   2940 - UCharacterTest.java: write unit tests
   2941 
   2942 ---------------------------------------------------------------------------- ***
   2943 
   2944 Unicode 11.0 update for ICU 62
   2945 
   2946 http://www.unicode.org/versions/Unicode11.0.0/
   2947 http://unicode.org/versions/beta-11.0.0.html
   2948 https://www.unicode.org/review/pri372/
   2949 http://www.unicode.org/reports/uax-proposed-updates.html
   2950 http://www.unicode.org/reports/tr44/tr44-21.html
   2951 
   2952 * Command-line environment setup
   2953 
   2954 UNICODE_DATA=~/unidata/uni11/20180521
   2955 CLDR_SRC=~/svn.cldr/uni
   2956 ICU_ROOT=~/svn.icu/uni
   2957 ICU_SRC=$ICU_ROOT/src
   2958 ICUDT=icudt61b
   2959 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
   2960 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
   2961 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
   2962 
   2963 *** ICU Trac
   2964 
   2965 - ticket:13630: Unicode 11
   2966 - ^/branches/markus/uni11
   2967 
   2968 *** CLDR Trac
   2969 
   2970 - cldrbug 10978: Unicode 11
   2971 - ^/branches/markus/uni11
   2972 
   2973 *** Unicode version numbers
   2974 - makedata.mak
   2975 - uchar.h
   2976 - com.ibm.icu.util.VersionInfo
   2977 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
   2978 
   2979 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
   2980  so that the makefiles see the new version number.
   2981 
   2982 *** data files & enums & parser code
   2983 
   2984 * download files
   2985 - mkdir -p $UNICODE_DATA
   2986 - download Unicode files into $UNICODE_DATA
   2987  + subfolders: emoji, idna, security, ucd, uca
   2988  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
   2989 
   2990 * for manual diffs and for Unicode Tools input data updates:
   2991  remove version suffixes from the file names
   2992    ~$ unidata/desuffixucd.py $UNICODE_DATA
   2993  (see https://sites.google.com/site/unicodetools/inputdata)
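
For illustration only, the following is a bash sketch of the kind of renaming described above. It turns names like `UnicodeData-11.0.0d12.txt` into `UnicodeData.txt`. `desuffix_tree` is a hypothetical stand-in and not the actual desuffixucd.py. The real script may recognize other naming patterns, so use the script for the actual update.

```shell
#!/usr/bin/env bash
# Strip a "-X.Y.Z" or "-X.Y.ZdN" version suffix from *.txt files under a dir.
desuffix_tree() {  # <directory>
  local f new files
  mapfile -d '' files < <(find "$1" -type f -name '*-[0-9]*.txt' -print0)
  for f in "${files[@]}"; do
    new=$(printf '%s\n' "$f" | sed -E 's/-[0-9]+\.[0-9]+\.[0-9]+(d[0-9]+)?\.txt$/.txt/')
    if [[ $new != "$f" ]]; then
      mv -- "$f" "$new"
    fi
  done
}
```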
   2994 
   2995 * process and/or copy files
   2996 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
   2997  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
   2998  + For debugging, and tweaking how ppucd.txt is written,
   2999    the tool has an --only_ppucd option:
   3000    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
   3001 
   3002 - cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
   3003 
   3004 * build ICU (make install)
   3005  so that the tools build can pick up the new definitions from the installed header files.
   3006 
   3007  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
   3008 
   3009 * preparseucd.py changes
   3010 - fix other errors
   3011    NameError: unknown property Extended_Pictographic
   3012  -> add Extended_Pictographic binary property
   3013  -> add new short names for all Emoji properties
   3014 
   3015 * new constants for new property values
   3016 - preparseucd.py error:
   3017    ValueError: missing uchar.h enum constants for some property values:
   3018    [(u'blk', set([u'Georgian_Ext', u'Hanifi_Rohingya', u'Medefaidrin', u'Sogdian', u'Makasar',
   3019                   u'Old_Sogdian', u'Dogra', u'Gunjala_Gondi', u'Chess_Symbols', u'Mayan_Numerals',
   3020                   u'Indic_Siyaq_Numbers'])),
   3021     (u'jg', set([u'Hanifi_Rohingya_Kinna_Ya', u'Hanifi_Rohingya_Pa'])),
   3022     (u'sc', set([u'Medf', u'Sogd', u'Dogr', u'Rohg', u'Maka', u'Sogo', u'Gong'])),
   3023     (u'GCB', set([u'LinkC', u'Virama'])),
   3024     (u'WB', set([u'WSegSpace']))]
   3025  = PropertyValueAliases.txt new property values (diff old & new .txt files)
   3026    blk; Chess_Symbols                    ; Chess_Symbols
   3027    blk; Dogra                            ; Dogra
   3028    blk; Georgian_Ext                     ; Georgian_Extended
   3029    blk; Gunjala_Gondi                    ; Gunjala_Gondi
   3030    blk; Hanifi_Rohingya                  ; Hanifi_Rohingya
   3031    blk; Indic_Siyaq_Numbers              ; Indic_Siyaq_Numbers
   3032    blk; Makasar                          ; Makasar
   3033    blk; Mayan_Numerals                   ; Mayan_Numerals
   3034    blk; Medefaidrin                      ; Medefaidrin
   3035    blk; Old_Sogdian                      ; Old_Sogdian
   3036    blk; Sogdian                          ; Sogdian
   3037  -> add to uchar.h
   3038    use long property names for enum constants,
   3039    for the trailing comment get the block start code point: diff old & new Blocks.txt
   3040  -> add to UCharacter.UnicodeBlock IDs
   3041    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
   3042            replace  public static final int \1_ID = \2; \3
   3043  -> add to UCharacter.UnicodeBlock objects
   3044    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
   3045            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
   3046 
   3047    GCB; LinkC                            ; LinkingConsonant
   3048    GCB; Virama                           ; Virama
   3049  -> uchar.h & UCharacter.GraphemeClusterBreak
   3050  -> these two later removed again: http://www.unicode.org/L2/L2018/18115.htm#155-A76
   3051 
   3052    InSC; Consonant_Initial_Postfixed     ; Consonant_Initial_Postfixed
   3053  -> ignore: ICU does not yet support this property
   3054 
   3055    jg ; Hanifi_Rohingya_Kinna_Ya         ; Hanifi_Rohingya_Kinna_Ya
   3056    jg ; Hanifi_Rohingya_Pa               ; Hanifi_Rohingya_Pa
   3057  -> uchar.h & UCharacter.JoiningGroup
   3058 
   3059    sc ; Dogr                             ; Dogra
   3060    sc ; Gong                             ; Gunjala_Gondi
   3061    sc ; Maka                             ; Makasar
   3062    sc ; Medf                             ; Medefaidrin
   3063    sc ; Rohg                             ; Hanifi_Rohingya
   3064    sc ; Sogd                             ; Sogdian
   3065    sc ; Sogo                             ; Old_Sogdian
   3066  -> uscript.h & com.ibm.icu.lang.UScript
   3067  -> Nushu had been added already
   3068  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
   3069      and in com.ibm.icu.dev.test.lang.TestUScript.java
   3070 
   3071    WB ; WSegSpace                        ; WSegSpace
   3072  -> uchar.h & UCharacter.WordBreak
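
The two Eclipse find/replace patterns for UCharacter.UnicodeBlock can also be applied from the command line. The following is a bash sketch that uses `sed -E` with the same regular expressions and reads uchar.h enum lines on standard input. The enum value in the test line is illustrative.

```shell
#!/usr/bin/env bash
# Turn "UBLOCK_X = n, /*[start]*/" into the ..._ID constant line.
block_id_lines() {
  sed -nE 's#UBLOCK_([^ ]+) = ([0-9]+), (/.+)#public static final int \1_ID = \2; \3#p'
}

# Turn the same uchar.h line into the UnicodeBlock object line.
block_object_lines() {
  sed -nE 's#UBLOCK_([^ ]+) = [0-9]+, (/.+)#public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2#p'
}
```

Usage: `grep UBLOCK_ icu4c/source/common/unicode/uchar.h | block_id_lines`. Then keep only the lines for the new blocks.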
   3073 
   3074 * New short names for emoji properties
   3075 - see UTS #51
   3076 - short names set in preparseucd.py
   3077 
   3078 * New properties
   3079 - boolean emoji property Extended_Pictographic
   3080  -> added in preparseucd.py
   3081  -> uchar.h & UProperty.java
   3082 - misc. property Equivalent_Unified_Ideograph (EqUIdeo)
   3083  as shown in PropertyValueAliases.txt
   3084  -> ignore for now
   3085 
   3086 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
   3087    (not strictly necessary for NOT_ENCODED scripts)
   3088  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
   3089 
   3090 * update spoof checker UnicodeSet initializers:
   3091    inclusionPat & recommendedPat in uspoof.cpp
   3092    INCLUSION & RECOMMENDED in SpoofChecker.java
   3093 - make sure that the Unicode Tools tree contains the latest security data files
   3094 - go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
   3095 - update the hardcoded version number there in the DIRECTORY path
   3096 - run the tool (no special environment variables needed)
   3097 - copy & paste from the Console output into the .cpp & .java files
   3098 
   3099 * generate normalization data files
   3100  cd $ICU_ROOT/dbg/icu4c
   3101  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
   3102  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
   3103  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
   3104  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
   3105  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
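
The four binary .nrm invocations above differ only in their output file and input list, so they can be driven from a table. The following is a bash sketch with a dry-run mode. `gen_norm2` and `NORM2_TARGETS` are hypothetical names. The sketch does not cover the norm2_nfc_data.h (--csource) line, which stays as written.

```shell
#!/usr/bin/env bash
# Each entry: <output .nrm file>|<space-separated input .txt files>
NORM2_TARGETS=(
  "nfc.nrm|nfc.txt"
  "nfkc.nrm|nfc.txt nfkc.txt"
  "nfkc_cf.nrm|nfc.txt nfkc.txt nfkc_cf.txt"
  "uts46.nrm|nfc.txt uts46.txt"
)

gen_norm2() {  # <bin dir> <unidata dir> <data/in dir> [--dry-run]
  local bin=$1 unidata=$2 datain=$3 dry=$4 t out inputs
  for t in "${NORM2_TARGETS[@]}"; do
    out=${t%%|*}
    read -r -a inputs <<< "${t#*|}"
    if [[ $dry == --dry-run ]]; then
      echo "$bin/gennorm2 -o $datain/$out -s $unidata/norm2 ${inputs[*]}"
    else
      "$bin/gennorm2" -o "$datain/$out" -s "$unidata/norm2" "${inputs[@]}" || return
    fi
  done
}
```

Usage: `gen_norm2 "$ICU_ROOT/dbg/icu4c/bin" "$ICU4C_UNIDATA" "$ICU4C_DATA_IN" --dry-run`. Drop `--dry-run` to run the commands.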
   3106 
   3107 * build ICU (make install)
   3108  so that the tools build can pick up the new definitions from the installed header files.
   3109 
   3110  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
   3111 
   3112 * build Unicode tools using CMake+make
   3113 
   3114 $ICU_SRC/tools/unicode/c/icudefs.txt:
   3115 
   3116 # Location (--prefix) of where ICU was installed.
   3117 set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
   3118 # Location of the ICU4C source tree.
   3119 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c)
   3120 
   3121  $ICU_ROOT/dbg$
   3122    mkdir -p tools/unicode/c
   3123    cd tools/unicode/c
   3124 
   3125  $ICU_ROOT/dbg/tools/unicode/c$
   3126    cmake ../../../../src/tools/unicode/c
   3127    make
   3128 
   3129 * generate core properties data files
   3130  $ICU_ROOT/dbg/tools/unicode/c$
   3131    genprops/genprops $ICU_SRC/icu4c
   3132    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
   3133    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
   3134 - rebuild ICU (make install) & tools
   3135 
   3136 * Fix case props
   3137    genprops error: casepropsbuilder: too many exceptions words
   3138    genprops error: failure finalizing the data - U_BUFFER_OVERFLOW_ERROR
   3139 - With the addition of Georgian Mtavruli capital letters,
   3140  there are now too many simple case mappings with big mapping deltas
   3141  that yield uncompressible exceptions.
   3142 - Changed the data structure (now formatVersion 4):
   3143  added one bit for no-simple-case-folding (for Cherokee), and
   3144  one optional slot for a big delta (for most far-away mappings),
   3145  together with another bit for whether that delta is negative.
   3146  This makes most Cherokee, Georgian, etc. case mappings compressible,
   3147  reducing the number of exceptions words.
   3148 - Made further changes to gain one more bit for the exceptions index,
   3149  for future growth. For details, see casepropsbuilder.cpp.
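
To see the size of the deltas involved, compute the difference between a character and its simple case mapping. Two examples: U+10D0 GEORGIAN LETTER AN uppercases to U+1C90 GEORGIAN MTAVRULI CAPITAL LETTER AN in Unicode 11, and U+AB70 CHEROKEE SMALL LETTER A case-folds to U+13A0 CHEROKEE LETTER A. The following is a bash sketch. The magnitude/sign split only mirrors the idea of a big-delta slot plus a sign bit. The actual field layout is in casepropsbuilder.cpp and is not reproduced here.

```shell
#!/usr/bin/env bash
# Print "<magnitude> <positive|negative>" for (mapped - original).
case_delta() {  # <code point hex> <mapped code point hex>
  local d=$(( 16#$2 - 16#$1 ))
  if (( d < 0 )); then
    echo "$(( -d )) negative"
  else
    echo "$d positive"
  fi
}
```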
   3150 
   3151 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
   3152  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
   3153 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
   3154 - Unicode 6.0..11.0: U+2260, U+226E, U+226F
   3155 - nothing new in this Unicode version, no test file to update
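
The grep for non-ASCII "disallowed_STD3_valid" entries can be scripted so that ASCII ranges are filtered out automatically. The following is a bash sketch. It assumes the IdnaMappingTable.txt layout `<cp>[..<cp>] ; <status> ... # comment` and may need adjusting for ICU's uts46.txt. `non_ascii_std3_valid` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the code point or range field of non-ASCII disallowed_STD3_valid lines.
non_ascii_std3_valid() {  # <IdnaMappingTable.txt>
  local range status start
  while IFS=';' read -r range status _; do
    range=${range//[[:space:]]/}
    status=${status%%#*}                 # drop a trailing comment
    status=${status//[[:space:]]/}
    [[ $status == disallowed_STD3_valid ]] || continue
    start=${range%%..*}                  # first code point of a range
    if (( 16#$start > 0x7F )); then
      echo "$range"
    fi
  done < <(grep -v '^#' "$1")
}
```

For the Unicode 6.0..11.0 data, the expected output is U+2260 and the U+226E..U+226F range.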
   3156 
   3157 * run & fix ICU4C tests
   3158 - Andy handles RBBI & spoof check test failures
   3159 
   3160 - Errors in char.txt, word.txt, word_POSIX.txt like
   3161    createRuleBasedBreakIterator: ICU Error "U_BRK_RULE_EMPTY_SET"  at line 46, column 16
   3162  because \p{Grapheme_Cluster_Break = EBG} and \p{Word_Break = EBG} are empty.
   3163  -> Temporary(!) workaround: Add an arbitrary code point to these sets to make them
   3164     not empty, just to get ICU building.
   3165  -> Intermediate workaround: Remove $E_Base_GAZ and other now-unused variables
   3166     and properties together with the rules that used them (GB 10, WB 14).
   3167  -> Andy adjusts the rule sets further to sync with
   3168     Unicode 11 grapheme, word, and line break spec changes.
   3169 
   3170 * collation: CLDR collation root, UCA DUCET
   3171 
   3172 - UCA DUCET goes into Mark's Unicode tools, see
   3173    https://sites.google.com/site/unicodetools/home#TOC-UCA
   3174  diff the main mapping file, look for bad changes
   3175  (for example, more bytes per weight for common characters)
   3176    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/uca/11.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-11.txt
   3177    ~/svn.unitools/trunk$ meld ../frac-10.txt ../frac-11.txt
   3178 
   3179 - CLDR root data files are checked into $CLDR_SRC/common/uca/
   3180    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
   3181 
   3182 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   3183    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
   3184 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   3185    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
   3186    (note removing the underscore before "Rules")
   3187    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
   3188 - restore TODO diffs in UCARules.txt
   3189    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
   3190 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
   3191  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   3192  from the CLDR root files (..._CLDR_..._SHORT.txt)
   3193    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
   3194    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
   3195    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
   3196 - if CLDR common/uca/unihan-index.txt changes, then update
   3197  CLDR common/collation/root.xml <collation type="private-unihan">
   3198  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
   3199 
   3200 - run genuca, see command line above;
   3201  deal with
   3202    Error: Unknown script for first-primary sample character U+1180B on line 28649 of /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
   3203    FDD1 1180B;	[71 CC 02, 05, 05]	# Dogra first primary (compressible)
   3204        (add the character to genuca.cpp sampleCharsToScripts[])
   3205  + look up the USCRIPT_ code for the new sample characters
   3206    (should be obvious from the comment in the error output)
   3207  + *add* mappings to sampleCharsToScripts[], do not replace them
   3208    (in case the script sample characters flip-flop)
   3209  + insert new scripts in DUCET script order, see the top_byte table
   3210    at the beginning of FractionalUCA.txt
   3211 - rebuild ICU4C
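
Before running genuca, you can list the script first-primary sample characters and check them against genuca.cpp. The following is a bash sketch that assumes the FractionalUCA.txt layout shown in the error output above. It also assumes, without confirmation, that sampleCharsToScripts[] spells each code point as `0x<hex>`. The helper names are hypothetical. The sketch only flags candidates. Choosing the USCRIPT_ code and the insertion position still follows the steps above.

```shell
#!/usr/bin/env bash
# Print "<code point> <script>" for each "first primary" line, e.g.
#   FDD1 1180B;<TAB>[71 CC 02, 05, 05]<TAB># Dogra first primary (compressible)
first_primary_samples() {  # <FractionalUCA.txt>
  sed -nE 's/^FDD1 ([0-9A-F]+);.*# (.+) first primary.*/\1 \2/p' "$1"
}

# Print samples whose code point does not occur as a whole word 0x<hex>
# (case-insensitive) in the given source file.
missing_samples() {  # <FractionalUCA.txt> <genuca.cpp>
  local cp script
  while read -r cp script; do
    grep -qiw "0x$cp" "$2" || echo "$cp $script"
  done < <(first_primary_samples "$1")
}
```

Usage: `missing_samples $ICU4C_UNIDATA/FractionalUCA.txt $ICU_SRC/tools/unicode/c/genuca/genuca.cpp`.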
   3212 
   3213 * Unihan collators
   3214    https://sites.google.com/site/unicodetools/unihan
   3215 - run Unicode Tools
   3216    org.unicode.draft.GenerateUnihanCollators
   3217  with VM arguments
   3218    -ea
   3219    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
   3220    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
   3221    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
   3222    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
   3223    -DUVERSION=11.0.0
   3224 - run Unicode Tools
   3225    org.unicode.draft.GenerateUnihanCollatorFiles
   3226  with the same arguments
   3227 - check CLDR diffs
   3228    cd $CLDR_SRC
   3229    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
   3230    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
   3231 - copy to CLDR
   3232    cd $CLDR_SRC
   3233    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
   3234    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
   3235 - run CLDR unit tests, commit to CLDR
   3236 - generate ICU zh collation data: run CLDR
   3237    org.unicode.cldr.icu.NewLdml2IcuConverter
   3238  with program arguments
   3239    -t collation
   3240    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
   3241    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
   3242    -d /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/coll
   3243    -p /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/xml/collation
   3244    zh
   3245  and VM arguments
   3246    -ea
   3247    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
   3248 - rebuild ICU4C
   3249 
   3250 * run & fix ICU4C tests, now with new CLDR collation root data
   3251 - run all tests with the collation test data *_SHORT.txt or the full files
   3252  (the full ones have comments, useful for debugging)
   3253 - note on intltest: if collate/UCAConformanceTest fails, then
   3254  utility/MultithreadTest/TestCollators will fail as well;
   3255  fix the conformance test before looking into the multi-thread test
   3256 
   3257 * update Java data files
   3258 - refresh only the UCD/UCA-related and derived files, to be safe
   3259 - see (ICU4C)/source/data/icu4j-readme.txt
   3260 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   3261 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   3262  output:
   3263    ...
   3264    Unicode .icu files built to ./out/build/icudt61l
   3265    echo timestamp > uni-core-data
   3266    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt61b
   3267    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b
   3268    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
   3269    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt61l.dat ./out/icu4j/icudt61b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt61l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt61b
   3270    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b"
   3271    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt61b/
   3272    mkdir -p /tmp/icu4j/main/shared/data
   3273    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   3274    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt61b/
   3275    mkdir -p /tmp/icu4j/main/shared/data
   3276    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   3277    make[1]: Leaving directory '/usr/local/google/home/mscherer/svn.icu/uni/dbg/icu4c/data'
   3278 - copy the big-endian Unicode data files to another location,
   3279  separate from the other data files,
   3280  and then refresh ICU4J
   3281    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
   3282    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   3283    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   3284    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   3285    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   3286    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
   3287    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   3288    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   3289    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   3290    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
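For reference only, here is a rough Python sketch of the same copy step. stage_unicode_data() is a hypothetical helper and is not part of the ICU build. Its file patterns are just the ones from the cp/rm commands above, and the jar uvf step stays as it is.

```python
import glob
import os
import shutil


def stage_unicode_data(src_root, dst_root, icudt):
    """Copy the big-endian Unicode data files into a staging tree.

    Mirrors the cp/rm commands in the log: confusables.cfu, *.icu (minus
    cnvalias.icu), *.nrm, and the files in the coll/ and brkitr/ subfolders.
    Returns the sorted list of copied paths, relative to dst_root.
    """
    src = os.path.join(src_root, "com", "ibm", "icu", "impl", "data", icudt)
    dst = os.path.join(dst_root, "com", "ibm", "icu", "impl", "data", icudt)
    patterns = ["confusables.cfu", "*.icu", "*.nrm",
                os.path.join("coll", "*"), os.path.join("brkitr", "*")]
    copied = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(src, pattern)):
            if not os.path.isfile(path):
                continue
            rel = os.path.relpath(path, src)
            if rel == "cnvalias.icu":
                continue  # the log deletes this one after copying *.icu
            target = os.path.join(dst, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(path, target)
            copied.append(os.path.relpath(target, dst_root))
    return sorted(copied)
```

For example, you could call stage_unicode_data("out/icu4j", "/tmp/icu4j", "icudt60b") from $ICU_ROOT/dbg/icu4c/data and then run the jar uvf command as before.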
   3291 
   3292 * When refreshing all of ICU4J data from ICU4C
   3293 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   3294 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
   3295 or
   3296 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
   3297 
   3298 * update CollationFCD.java
   3299  + copy & paste the initializers of lcccIndex[] etc. from
   3300    ICU4C/source/i18n/collationfcd.cpp to
   3301    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
   3302 
   3303 * refresh Java test .txt files
   3304 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   3305    cd $ICU_SRC/icu4c/source/data/unidata
   3306    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   3307    cd ../../test/testdata
   3308    cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   3309    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   3310 
   3311 * run & fix ICU4J tests
   3312 
   3313 *** API additions
   3314 - send notice to icu-design about new born-@stable API (enum constants etc.)
   3315 
   3316 *** CLDR numbering systems
- look for new sets of decimal digits (gc=Nd & nv=4) and add them to CLDR
   3318  Unicode 11: using Unicode 11 CLDR ticket #10978
   3319    rohg 10D30..10D39 Hanifi_Rohingya
   3320    gong 11DA0..11DA9 Gunjala_Gondi
   3321  Earlier: CLDR tickets specific to adding new numbering systems.
   3322  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
   3323  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
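Here is a minimal Python sketch of the gc=Nd & nv=4 search. It assumes that each decimal digit set is a contiguous run from zero to nine around its digit four. Python's unicodedata module reports the Unicode version that the interpreter was built with (see unicodedata.unidata_version), and that version may lag behind the one being integrated. decimal_digit_sets() is a hypothetical helper.

```python
import sys
import unicodedata


def decimal_digit_sets():
    """Return (zero, nine) code point ranges of all decimal digit sets.

    Heuristic from the log: find each character with gc=Nd and nv=4,
    then assume its set runs from digit zero (cp - 4) to digit nine (cp + 5).
    """
    sets = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Nd" and unicodedata.decimal(ch, None) == 4:
            sets.append((cp - 4, cp + 5))
    return sets


if __name__ == "__main__":
    print("Unicode", unicodedata.unidata_version)
    for lo, hi in decimal_digit_sets():
        print(f"{lo:04X}..{hi:04X} {unicodedata.name(chr(lo), '?')}")
```

To find the new sets, compare the printed ranges against the numbering systems that CLDR already has.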
   3324 
   3325 *** merge the Unicode update branches back onto the trunk
   3326 - do not merge the icudata.jar and testdata.jar,
   3327  instead rebuild them from merged & tested ICU4C
   3328 - make sure that changes to Unicode tools are checked in:
   3329  http://www.unicode.org/utility/trac/log/trunk/unicodetools
   3330 
   3331 ---------------------------------------------------------------------------- ***
   3332 
   3333 Unicode 10.0 update for ICU 60
   3334 
   3335 http://www.unicode.org/versions/Unicode10.0.0/
   3336 http://www.unicode.org/versions/beta-10.0.0.html
   3337 http://blog.unicode.org/2017/03/unicode-100-beta-review.html
   3338 http://www.unicode.org/review/pri350/
   3339 http://www.unicode.org/reports/uax-proposed-updates.html
   3340 http://www.unicode.org/reports/tr44/tr44-19.html
   3341 
   3342 * Command-line environment setup
   3343 
   3344 UNICODE_DATA=~/unidata/uni10/20170605
   3345 CLDR_SRC=~/svn.cldr/uni10
   3346 ICU_ROOT=~/svn.icu/uni10
   3347 ICU_SRC=$ICU_ROOT/src
   3348 ICUDT=icudt60b
   3349 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
   3350 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
   3351 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
   3352 
   3353 *** ICU Trac
   3354 
   3355 - ticket:12985: Unicode 10
   3356 - ticket:13061: undo hacks from emoji 5.0 update
   3357 - ticket:13062: add Emoji_Component property
   3358 - ^/branches/markus/uni10
   3359 
   3360 *** CLDR Trac
   3361 
   3362 - cldrbug 10055: Unicode 10
   3363 - cldrbug 9882: Unicode 10 script metadata
   3364 - cldrbug 10219: numbering systems for Unicode 10
   3365 
   3366 *** Unicode version numbers
   3367 - makedata.mak
   3368 - uchar.h
   3369 - com.ibm.icu.util.VersionInfo
   3370 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
   3371 
   3372 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
   3373  so that the makefiles see the new version number.
   3374 
   3375 *** data files & enums & parser code
   3376 
   3377 * download files
   3378 - mkdir -p $UNICODE_DATA
   3379 - download Unicode 10.0 files into $UNICODE_DATA
   3380  + subfolders: ucd, uca, idna, security
   3381  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
   3382 - download emoji 5.0 files into $UNICODE_DATA/emoji
   3383 
   3384 * for manual diffs: remove version suffixes from the file names
   3385  ~$ unidata/desuffixucd.py $UNICODE_DATA
   3386  (see https://sites.google.com/site/unicodetools/inputdata)
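For illustration only, here is a Python sketch of suffix removal. It is not the actual desuffixucd.py. It assumes suffixes of the form -10.0.0 or -10.0.0d5 in front of .txt. desuffixed() and desuffix_tree() are hypothetical names.

```python
import os
import re

# Assumed suffix shape: "-<version>" plus an optional draft tag, e.g.
# "Blocks-10.0.0d5.txt" or "UnicodeData-10.0.0.txt".
_SUFFIX = re.compile(r"^(?P<base>.+?)-\d+(?:\.\d+)*(?:d\d+)?(?P<ext>\.txt)$")


def desuffixed(name):
    """Return the file name without a version suffix (unchanged if none)."""
    m = _SUFFIX.match(name)
    return m.group("base") + m.group("ext") if m else name


def desuffix_tree(root):
    """Rename suffixed .txt files under root in place; return (old, new) pairs."""
    renamed = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            new = desuffixed(name)
            if new != name:
                os.rename(os.path.join(dirpath, name), os.path.join(dirpath, new))
                renamed.append((name, new))
    return renamed
```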
   3387 
   3388 * process and/or copy files
   3389 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
   3390  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
   3391  + For debugging, and tweaking how ppucd.txt is written,
   3392    the tool has an --only_ppucd option:
   3393    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
   3394 
   3395 - cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
   3396 
   3397 * build ICU (make install)
   3398  so that the tools build can pick up the new definitions from the installed header files.
   3399 
   3400  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
   3401 
   3402 * preparseucd.py changes
   3403 - remove or add new Unicode scripts from/to the
   3404  only-in-ISO-15924 list according to the error messages:
   3405    ValueError: remove ['Nshu'] from _scripts_only_in_iso15924
   3406  -> adjust _scripts_only_in_iso15924 as indicated
   3407 - fix other errors
   3408    Exception: no default values (@missing lines) for some Catalog or Enumerated properties: [u'vo'] 
   3409  -> add vo=Vertical_Orientation to _ignored_properties
 -> later removed from _ignored_properties again: the tool now parses the file, even though we do not yet store the data for runtime use
   3411 
   3412 * new constants for new property values
   3413 - preparseucd.py error:
   3414    ValueError: missing uchar.h enum constants for some property values:
   3415    [(u'blk', set([u'Zanabazar_Square', u'Nushu', u'CJK_Ext_F',
   3416                   u'Kana_Ext_A', u'Syriac_Sup', u'Masaram_Gondi', u'Soyombo'])),
   3417     (u'jg', set([u'Malayalam_Bha', u'Malayalam_Llla', u'Malayalam_Nya', u'Malayalam_Lla',
   3418                  u'Malayalam_Nga', u'Malayalam_Ssa', u'Malayalam_Tta', u'Malayalam_Ra',
   3419                  u'Malayalam_Nna', u'Malayalam_Ja', u'Malayalam_Nnna'])),
   3420     (u'sc', set([u'Soyo', u'Gonm', u'Zanb']))]
   3421  = PropertyValueAliases.txt new property values (diff old & new .txt files)
   3422    blk; CJK_Ext_F                        ; CJK_Unified_Ideographs_Extension_F
   3423    blk; Kana_Ext_A                       ; Kana_Extended_A
   3424    blk; Masaram_Gondi                    ; Masaram_Gondi
   3425    blk; Nushu                            ; Nushu
   3426    blk; Soyombo                          ; Soyombo
   3427    blk; Syriac_Sup                       ; Syriac_Supplement
   3428    blk; Zanabazar_Square                 ; Zanabazar_Square
   3429  -> add to uchar.h
   3430    use long property names for enum constants,
   3431    for the trailing comment get the block start code point: diff old & new Blocks.txt
   3432  -> add to UCharacter.UnicodeBlock IDs
   3433    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
   3434            replace  public static final int \1_ID = \2; \3
   3435  -> add to UCharacter.UnicodeBlock objects
   3436    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
   3437            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
   3438 
   3439    jg ; Malayalam_Bha                    ; Malayalam_Bha
   3440    jg ; Malayalam_Ja                     ; Malayalam_Ja
   3441    jg ; Malayalam_Lla                    ; Malayalam_Lla
   3442    jg ; Malayalam_Llla                   ; Malayalam_Llla
   3443    jg ; Malayalam_Nga                    ; Malayalam_Nga
   3444    jg ; Malayalam_Nna                    ; Malayalam_Nna
   3445    jg ; Malayalam_Nnna                   ; Malayalam_Nnna
   3446    jg ; Malayalam_Nya                    ; Malayalam_Nya
   3447    jg ; Malayalam_Ra                     ; Malayalam_Ra
   3448    jg ; Malayalam_Ssa                    ; Malayalam_Ssa
   3449    jg ; Malayalam_Tta                    ; Malayalam_Tta
   3450  -> uchar.h & UCharacter.JoiningGroup
   3451 
   3452    sc ; Gonm                             ; Masaram_Gondi
   3453    sc ; Nshu                             ; Nushu
   3454    sc ; Soyo                             ; Soyombo
   3455    sc ; Zanb                             ; Zanabazar_Square
   3456  -> uscript.h & com.ibm.icu.lang.UScript
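The two Eclipse find/replace patterns for UCharacter.UnicodeBlock can also be applied outside Eclipse. Here is a sketch using Python's re.sub, which accepts \1-style back-references in the replacement, so the patterns carry over unchanged. The sample uchar.h line and its enum value 278 are made up for illustration.

```python
import re

# The find/replace patterns from the log, unchanged.
ID_FIND = r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"
ID_REPLACE = r"public static final int \1_ID = \2; \3"
OBJ_FIND = r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"
OBJ_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'


def to_java(uchar_h_lines):
    """Turn uchar.h UBLOCK_ enum lines into Java ID constants and block objects."""
    ids = [re.sub(ID_FIND, ID_REPLACE, line) for line in uchar_h_lines]
    objs = [re.sub(OBJ_FIND, OBJ_REPLACE, line) for line in uchar_h_lines]
    return ids, objs


if __name__ == "__main__":
    # Hypothetical input line; the enum value is illustrative only.
    ids, objs = to_java(["    UBLOCK_SOYOMBO = 278, /*[11A50]*/"])
    print(ids[0])
    print(objs[0])
```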
   3457  -> Nushu had been added already
   3458  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
   3459      and in com.ibm.icu.dev.test.lang.TestUScript.java
   3460 
   3461 * New properties as shown in PropertyValueAliases.txt changes
   3462 - boolean Emoji_Component from emoji 5
   3463  -> uchar.h & UProperty.java
   3464 - boolean
   3465    # Regional_Indicator (RI)
   3466 
   3467    RI ; N                                ; No                               ; F                                ; False
   3468    RI ; Y                                ; Yes                              ; T                                ; True
   3469  -> uchar.h & UProperty.java
   3470  -> single immutable range, to be hardcoded
   3471 - boolean
   3472    # Prepended_Concatenation_Mark (PCM)
   3473 
   3474    PCM; N                                ; No                               ; F                                ; False
   3475    PCM; Y                                ; Yes                              ; T                                ; True
   3476  -> was new in Unicode 9
   3477  -> uchar.h & UProperty.java
   3478 - enumerated
   3479    # Vertical_Orientation (vo)
   3480 
   3481    vo ; R                                ; Rotated
   3482    vo ; Tr                               ; Transformed_Rotated
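To list new properties and property values without a manual diff, here is a Python sketch that compares the (property, short value alias) pairs of the old and new PropertyValueAliases.txt. It assumes the usual "prop ; short ; long ..." line layout with # comments. The function names are hypothetical.

```python
import sys


def property_values(lines):
    """Collect (property, short value alias) pairs from PropertyValueAliases.txt lines."""
    pairs = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 3:  # prop ; short ; long [; more aliases]
            pairs.add((fields[0], fields[1]))
    return pairs


def new_property_values(old_lines, new_lines):
    """Return the pairs present in the new file but not in the old one, sorted."""
    return sorted(property_values(new_lines) - property_values(old_lines))


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8") as old, open(sys.argv[2], encoding="utf-8") as new:
        for prop, value in new_property_values(old, new):
            print(f"{prop} ; {value}")
```

Run it as a script with the old and new file paths as its two arguments. A property that is new appears as new pairs under a new property name, such as RI above.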
   3483    vo ; Tu                               ; Transformed_Upright
   3484    vo ; U                                ; Upright
   3485  -> only pre-parsed for now, but not yet stored for runtime use
   3486 
   3487 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
   3488    (not strictly necessary for NOT_ENCODED scripts)
   3489  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
   3490 
   3491 * generate normalization data files
   3492  cd $ICU_ROOT/dbg/icu4c
   3493  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
   3494  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
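The gennorm2 invocations differ only in their output file and input list. Here is a Python sketch that generates the same argument lists. The parameters stand in for $ICU_SRC/icu4c, $ICU4C_DATA_IN and $ICU4C_UNIDATA/norm2, and gennorm2_commands() is a hypothetical helper.

```python
import os

# Output .nrm file -> gennorm2 source files, as in the command lines above.
NRM_INPUTS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(icu4c_src, data_in, norm2_dir, gennorm2="bin/gennorm2"):
    """Return argv lists equivalent to the gennorm2 command lines in the log."""
    # The NFC data is also written as C source (--csource) into the common library sources.
    header = os.path.join(icu4c_src, "source", "common", "norm2_nfc_data.h")
    commands = [[gennorm2, "-o", header, "-s", norm2_dir, "nfc.txt", "--csource"]]
    for out, inputs in NRM_INPUTS.items():
        commands.append([gennorm2, "-o", os.path.join(data_in, out), "-s", norm2_dir, *inputs])
    return commands
```

Each list can be passed to subprocess.run(cmd, check=True) from $ICU_ROOT/dbg/icu4c, with LD_LIBRARY_PATH exported as in the environment setup.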
   3495  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
   3496  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
   3497  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
   3498 
   3499 * build ICU (make install)
   3500  so that the tools build can pick up the new definitions from the installed header files.
   3501 
   3502  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
   3503 
   3504 * build Unicode tools using CMake+make
   3505 
   3506 $ICU_SRC/tools/unicode/c/icudefs.txt:
   3507 
   3508 # Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/uni10/inst/icu4c)
   3510 # Location of the ICU4C source tree.
   3511 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c)
   3512 
   3513  $ICU_ROOT/dbg/tools/unicode/c$
   3514    cmake ../../../../src/tools/unicode/c
   3515    make
   3516 
   3517 * generate core properties data files
   3518  $ICU_ROOT/dbg/tools/unicode/c$
   3519    genprops/genprops $ICU_SRC/icu4c
   3520    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
   3521    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
   3522 - rebuild ICU (make install) & tools
   3523 
   3524 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
   3525  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
   3526 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
   3527 - Unicode 6.0..10.0: U+2260, U+226E, U+226F
   3528 - nothing new in this Unicode version, no test file to update
   3529 
   3530 * run & fix ICU4C tests
   3531 - Andy handles RBBI & spoof check test failures
   3532 
   3533 * collation: CLDR collation root, UCA DUCET
   3534 
   3535 - UCA DUCET goes into Mark's Unicode tools, see
   3536  https://sites.google.com/site/unicodetools/home#TOC-UCA
   3537 - CLDR root data files are checked into $CLDR_SRC/common/uca/
   3538    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
   3539 
   3540 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   3541    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
   3542 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
   cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
   (note that the target file name drops the underscore before "Rules")
   3546 - restore TODO diffs in UCARules.txt
   3547    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
   3548 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
   3549  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   3550  from the CLDR root files (..._CLDR_..._SHORT.txt)
   3551    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
   3552    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
   3553    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
   3554 - if CLDR common/uca/unihan-index.txt changes, then update
   3555  CLDR common/collation/root.xml <collation type="private-unihan">
   3556  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
   3557 
   3558 - run genuca, see command line above;
   3559  deal with
   3560    Error: Unknown script for first-primary sample character U+11D10 on line 28117 of /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/unidata/FractionalUCA.txt:
   3561    FDD1 11D10;     [70 D5 02, 05, 05]      # Masaram_Gondi first primary (compressible)
   3562        (add the character to genuca.cpp sampleCharsToScripts[])
   3563  + look up the USCRIPT_ code for the new sample characters
   3564    (should be obvious from the comment in the error output)
   3565  + *add* mappings to sampleCharsToScripts[], do not replace them
   3566    (in case the script sample characters flip-flop)
   3567  + insert new scripts in DUCET script order, see the top_byte table
   3568    at the beginning of FractionalUCA.txt
   3569 - rebuild ICU4C
   3570 
   3571 * Unihan collators
   3572    https://sites.google.com/site/unicodetools/unihan
   3573 - run Unicode Tools
   3574    org.unicode.draft.GenerateUnihanCollators
   3575  with VM arguments
   3576    -ea
   3577    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
   3578    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
   3579    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
   3580    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
   3581    -DUVERSION=10.0.0
   3582 - run Unicode Tools
   3583    org.unicode.draft.GenerateUnihanCollatorFiles
   3584  with the same arguments
   3585 - check CLDR diffs
   3586    cd $CLDR_SRC
   3587    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
   3588    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
   3589 - copy to CLDR
   3590    cd $CLDR_SRC
   3591    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
   3592    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
   3593 - run CLDR unit tests, commit to CLDR
   3594 - generate ICU zh collation data: run CLDR
   3595    org.unicode.cldr.icu.NewLdml2IcuConverter
   3596  with program arguments
   3597    -t collation
   3598    -s /usr/local/google/home/mscherer/svn.cldr/uni10/common/collation
   3599    -m /usr/local/google/home/mscherer/svn.cldr/uni10/common/supplemental
   3600    -d /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/coll
   3601    -p /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/xml/collation
   3602    zh
   3603  and VM arguments
   3604    -ea
   3605    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
   3606 - rebuild ICU4C
   3607 
   3608 * run & fix ICU4C tests, now with new CLDR collation root data
   3609 - run all tests with the collation test data *_SHORT.txt or the full files
   3610  (the full ones have comments, useful for debugging)
   3611 - note on intltest: if collate/UCAConformanceTest fails, then
   3612  utility/MultithreadTest/TestCollators will fail as well;
   3613  fix the conformance test before looking into the multi-thread test
   3614 
   3615 * update Java data files
   3616 - refresh just the UCD/UCA-related/derived files, just to be safe
   3617 - see (ICU4C)/source/data/icu4j-readme.txt
   3618 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   3619 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   3620  output:
   3621    ...
   3622    Unicode .icu files built to ./out/build/icudt60l
   3623    echo timestamp > uni-core-data
   3624    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt60b
   3625    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b
   3626    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
   3627    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt60l.dat ./out/icu4j/icudt60b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt60l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt60b
   3628    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b"
   3629    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt60b/
   3630    mkdir -p /tmp/icu4j/main/shared/data
   3631    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   3632    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt60b/
   3633    mkdir -p /tmp/icu4j/main/shared/data
   3634    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   3635    make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/uni10/dbg/icu4c/data'
   3636 - copy the big-endian Unicode data files to another location,
   3637  separate from the other data files,
   3638  and then refresh ICU4J
   3639    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
   3640    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   3641    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   3642    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   3643    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   3644    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
   3645    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   3646    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   3647    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   3648    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
   3649 
   3650 * When refreshing all of ICU4J data from ICU4C
   3651 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   3652 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
   3653 or
   3654 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
   3655 
   3656 * update CollationFCD.java
   3657  + copy & paste the initializers of lcccIndex[] etc. from
   3658    ICU4C/source/i18n/collationfcd.cpp to
   3659    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
   3660 
   3661 * refresh Java test .txt files
   3662 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   3663    cd $ICU_SRC/icu4c/source/data/unidata
   3664    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   3665    cd ../../test/testdata
   3666    cp BidiCharacterTest.txt BidiTest.txt IdnaTest.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   3667    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   3668 
   3669 * run & fix ICU4J tests
   3670 
   3671 *** API additions
   3672 - send notice to icu-design about new born-@stable API (enum constants etc.)
   3673 
   3674 *** CLDR numbering systems
- look for new sets of decimal digits (gc=Nd & nv=4) and submit a CLDR ticket
   3676  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
   3677  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
   3678 
   3679 *** merge the Unicode update branches back onto the trunk
   3680 - do not merge the icudata.jar and testdata.jar,
   3681  instead rebuild them from merged & tested ICU4C
   3682 - make sure that changes to Unicode tools are checked in:
   3683  http://www.unicode.org/utility/trac/log/trunk/unicodetools
   3684 
   3685 ---------------------------------------------------------------------------- ***
   3686 
   3687 Emoji 5.0 update for ICU 59
   3688 - ICU 59 mostly remains on Unicode 9.0
- except that it updates bidi and segmentation data to Unicode 10 beta
   3690 
   3691 First run of tools on combined icu4c/icu4j/tools trunk after svn repository reorg.
   3692 
   3693 * Command-line environment setup
   3694 
   3695 ICU_ROOT=~/svn.icu/trunk
   3696 ICU_SRC_DIR=$ICU_ROOT/src
   3697 ICU4C_SRC_DIR=$ICU_SRC_DIR/icu4c
   3698 ICUDT=icudt59b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
   3700 SRC_DATA_IN=$ICU4C_SRC_DIR/source/data/in
   3701 UNIDATA=$ICU4C_SRC_DIR/source/data/unidata
   3702 
   3703 *** ICU Trac
   3704 
   3705 - ticket:12900: take Emoji 5.0 properties data into ICU 59 once it's released
   3706 - changes directly on trunk
   3707 
   3708 *** data files & enums & parser code
   3709 
   3710 * download files
   3711 
   3712 - download Unicode 9.0 files into a uni90e50 folder: ucd, idna, security (skip uca)
   3713 - download emoji 5.0 beta files into the same uni90e50 folder
   3714 - download Unicode 10.0 beta files: ucd
   3715  + copy Unicode 10 bidi files to the uni90e50/ucd folder:
   3716    BidiBrackets.txt
   3717    BidiCharacterTest.txt
   3718    BidiMirroring.txt
   3719    BidiTest.txt
   3720    extracted/DerivedBidiClass.txt
   3721  + copy Unicode 10 segmentation files to the uni90e50/ucd folder:
   3722    LineBreak.txt
   3723    auxiliary/*
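For reference, here is the same overlay as a Python sketch. overlay_ucd() is a hypothetical helper, the file list is the one above, and shutil.copytree(..., dirs_exist_ok=True) needs Python 3.8 or later.

```python
import os
import shutil

# Unicode 10 beta files copied over the uni90e50 ucd folder, per the list above.
BIDI_FILES = ["BidiBrackets.txt", "BidiCharacterTest.txt", "BidiMirroring.txt",
              "BidiTest.txt", os.path.join("extracted", "DerivedBidiClass.txt")]
SEGMENTATION_FILES = ["LineBreak.txt"]


def overlay_ucd(uni10_ucd, target_ucd):
    """Copy the bidi and segmentation files, plus auxiliary/*, into target_ucd."""
    for rel in BIDI_FILES + SEGMENTATION_FILES:
        dst = os.path.join(target_ucd, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(uni10_ucd, rel), dst)
    shutil.copytree(os.path.join(uni10_ucd, "auxiliary"),
                    os.path.join(target_ucd, "auxiliary"), dirs_exist_ok=True)
```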
   3724 
   3725 * preparseucd.py changes
   3726 - adjust for combined trunks
   3727 - write new copyright lines
   3728 - ignore new Emoji_Component property for now
   3729 
   3730 * process and/or copy files
   3731 - ~/svn.icu/trunk/src/tools/unicode$ py/preparseucd.py ~/unidata/uni90e50/20170322 $ICU_SRC_DIR
   3732  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
   3733 
   3734 - cp ~/unidata/uni90e50/20170322/security/confusables.txt $UNIDATA
   3735 
   3736 * build ICU (make install)
   3737  so that the tools build can pick up the new definitions from the installed header files.
   3738 
   3739  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
   3740 
   3741 * build Unicode tools using CMake+make
   3742 
   3743 ~/svn.icu/trunk/src/tools/unicode/c/icudefs.txt:
   3744 
   3745 # Location (--prefix) of where ICU was installed.
   3746 set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
   3747 # Location of the ICU4C source tree.
   3748 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/trunk/src/icu4c)
   3749 
   3750  ~/svn.icu/trunk/dbg/tools/unicode/c$
   3751    cmake ../../../../src/tools/unicode/c
   3752    make
   3753 
   3754 * generate core properties data files
   3755  ~/svn.icu/trunk/dbg/tools/unicode/c$
   3756    genprops/genprops $ICU4C_SRC_DIR
   3757 - rebuild ICU (make install) & tools
   3758 
   3759 * run & fix ICU4C tests
   3760 - Andy handles RBBI & spoof check test failures
   3761 
   3762 * update Java data files
   3763 - refresh just the UCD/UCA-related/derived files, just to be safe
   3764 - see (ICU4C)/source/data/icu4j-readme.txt
   3765 - mkdir /tmp/icu4j
   3766 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   3767  output:
   3768    ...
   3769    Unicode .icu files built to ./out/build/icudt59l
   3770    echo timestamp > uni-core-data
   3771    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt59b
   3772    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b
   3773    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
   3774    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt59l.dat ./out/icu4j/icudt59b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt59l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt59b
   3775    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b"
   3776    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt59b/
   3777    mkdir -p /tmp/icu4j/main/shared/data
   3778    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   3779    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt59b/
   3780    mkdir -p /tmp/icu4j/main/shared/data
   3781    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   3782    make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/trunk/dbg/icu4c/data'
   3783 - copy the big-endian Unicode data files to another location,
   3784  separate from the other data files,
   3785  and then refresh ICU4J
   3786    cd ~/svn.icu/trunk/dbg/icu4c/data/out/icu4j
   3787    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   3788    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   3789    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   3790    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
   3791    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   3792    jar uvf ~/svn.icu/trunk/src/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
   3793 
   3794 * When refreshing all of ICU4J data from ICU4C
   3795 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   3796 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu/trunk/src/icu4j/main/shared/data
   3797 or
   3798 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=~/svn.icu/trunk/src/icu4j icu4j-data-install
   3799 
   3800 * refresh Java test .txt files
   3801 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   3802    cd $ICU4C_SRC_DIR/source/data/unidata
   3803    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   3804    cd ../../test/testdata
   3805    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   3806    cp ~/unidata/uni90e50/20170322/ucd/CompositionExclusions.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
   3807 
   3808 * run & fix ICU4J tests
   3809 
   3810 ---------------------------------------------------------------------------- ***
   3811 
   3812 Unicode 9.0 update for ICU 58
   3813 
   3814 * Command-line environment setup
   3815 
   3816 ICU_ROOT=~/svn.icu/trunk
   3817 ICU_SRC_DIR=$ICU_ROOT/src
   3818 ICUDT=icudt58b
   3819 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
   3820 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
   3821 UNIDATA=$ICU_SRC_DIR/source/data/unidata
   3822 
   3823 http://www.unicode.org/review/pri323/  -- beta review
   3824 http://www.unicode.org/reports/uax-proposed-updates.html
   3825 http://www.unicode.org/versions/beta-9.0.0.html
   3826 http://www.unicode.org/versions/Unicode9.0.0/
   3827 http://www.unicode.org/reports/tr44/tr44-17.html
   3828 
   3829 *** ICU Trac
   3830 
   3831 - ticket:12526: integrate Unicode 9
   3832 - C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
   3833 - Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b
   3834 
   3835 *** CLDR Trac
   3836 
   3837 - cldrbug 9414: UCA 9
   3838 - ^/branches/markus/uni90 at r11518 from trunk at r11517
   3839 
   3840 - cldrbug 8745: Unicode 9.0 script metadata
   3841 
   3842 *** Unicode version numbers
   3843 - makedata.mak
   3844 - uchar.h
   3845 - com.ibm.icu.util.VersionInfo
   3846 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
   3847 
   3848 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
   3849  so that the makefiles see the new version number.
   3850 
   3851 *** data files & enums & parser code
   3852 
   3853 * file preparation
   3854 
   3855 - download UCD & IDNA files
   3856 - make sure that the Unicode data folder passed into preparseucd.py
   3857  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
   3858 - only for manual diffs: remove version suffixes from the file names
   3859  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
   3860  (see https://sites.google.com/site/unicodetools/inputdata)
   3861 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
   3862 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
   3863 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
   3864 
   3865 - also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
   3866  and copy to $UNIDATA
   3867    cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA
   3868 
   3869 * preparseucd.py changes
   3870 - remove or add new Unicode scripts from/to the
   3871  only-in-ISO-15924 list according to the error messages:
   3872    ValueError: remove ['Tang'] from _scripts_only_in_iso15924
   3873    ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
   3874    ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
   3875    ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
   3876  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
   3877      and in com.ibm.icu.dev.test.lang.TestUScript.java
   3878 - DerivedNumericValues.txt new numeric values
   3879    0D58          ; 0.00625 ; ; 1/160 # No       MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
   3880    0D59          ; 0.025 ; ; 1/40 # No       MALAYALAM FRACTION ONE FORTIETH
   3881    0D5A          ; 0.0375 ; ; 3/80 # No       MALAYALAM FRACTION THREE EIGHTIETHS
   3882    0D5B          ; 0.05 ; ; 1/20 # No       MALAYALAM FRACTION ONE TWENTIETH
   3883    0D5D          ; 0.15 ; ; 3/20 # No       MALAYALAM FRACTION THREE TWENTIETHS
   3884  -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
   3885     uchar.c, UCharacterProperty.java
   3886     to support a new series of values
   3887 - adjust preparseucd.py for Tangut algorithmic names
   3888  in ppucd.txt:
   3889    algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
   3890  ->
   3891    algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
   3892 - avoid block-compressing most String/Miscellaneous property values;
   3893  this was triggered by genprops not coping with a multi-code point Case_Folding on
   3894    block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
   3895  but keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
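As a quick arithmetic check of the new Malayalam numeric values listed above, the Python sketch below confirms that each decimal field equals its rational field, and shows that all five fit a single series of the form numerator / (20 * 2^k) with k from 0 to 3. This is only one way to see them as "a new series of values"; it is not necessarily how uprops.h and encodeNumericValue() actually encode them.

```python
from fractions import Fraction

# New Malayalam values from DerivedNumericValues.txt (Unicode 9.0):
# (code point, decimal field, rational field)
NEW_VALUES = [
    (0x0D58, "0.00625", "1/160"),
    (0x0D59, "0.025", "1/40"),
    (0x0D5A, "0.0375", "3/80"),
    (0x0D5B, "0.05", "1/20"),
    (0x0D5D, "0.15", "3/20"),
]


def as_twentieths_series(value):
    """Return (numerator, k) with value == numerator / (20 * 2**k),
    using the smallest k in 0..3, or None if there is no such k."""
    for k in range(4):
        scaled = value * 20 * 2**k
        if scaled.denominator == 1:
            return scaled.numerator, k
    return None


for cp, decimal, rational in NEW_VALUES:
    # The decimal and rational columns must agree exactly.
    assert Fraction(decimal) == Fraction(rational)
    print(f"U+{cp:04X} {rational:>5} -> {as_twentieths_series(Fraction(rational))}")
```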
   3896 
   3897 * PropertyAliases.txt changes
   3898 - 1 new property: PCM=Prepended_Concatenation_Mark
   3899  Ignore: only useful for layout engines.
   3900  OK to list in ppucd.txt.
   3901 
   3902 * PropertyValueAliases.txt new property values
   3903    blk; Adlam                            ; Adlam
   3904    blk; Bhaiksuki                        ; Bhaiksuki
   3905    blk; Cyrillic_Ext_C                   ; Cyrillic_Extended_C
   3906    blk; Glagolitic_Sup                   ; Glagolitic_Supplement
   3907    blk; Ideographic_Symbols              ; Ideographic_Symbols_And_Punctuation
   3908    blk; Marchen                          ; Marchen
   3909    blk; Mongolian_Sup                    ; Mongolian_Supplement
   3910    blk; Newa                             ; Newa
   3911    blk; Osage                            ; Osage
   3912    blk; Tangut                           ; Tangut
   3913    blk; Tangut_Components                ; Tangut_Components
   3914  -> add to uchar.h
   3915    use long property names for enum constants
   3916  -> add to UCharacter.UnicodeBlock IDs
   3917    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
   3918            replace  public static final int \1_ID = \2; \3
   3919  -> add to UCharacter.UnicodeBlock objects
   3920    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
   3921            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
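The two Eclipse find/replace steps above can also be scripted. A minimal Python sketch using the same patterns, assuming each uchar.h enum line carries the name, the number and a trailing comment on one line; the sample line and its value are illustrative only.

```python
import re

# The Eclipse find/replace patterns from the log, in Python syntax.
BLOCK_ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
BLOCK_ID_REPLACE = r"public static final int \1_ID = \2; \3"
BLOCK_OBJ_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
BLOCK_OBJ_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'


def to_java(uchar_h_lines):
    """Turn UBLOCK_ enum lines into UnicodeBlock ID constants and objects.
    Indentation and trailing comments are kept; non-matching lines pass through."""
    ids = [BLOCK_ID_FIND.sub(BLOCK_ID_REPLACE, line) for line in uchar_h_lines]
    objs = [BLOCK_OBJ_FIND.sub(BLOCK_OBJ_REPLACE, line) for line in uchar_h_lines]
    return ids, objs


# Illustrative input in the uchar.h enum style.
ids, objs = to_java(["    UBLOCK_ADLAM = 263, /*[1E900]*/"])
print(ids[0])   # "    public static final int ADLAM_ID = 263; /*[1E900]*/"
print(objs[0])  # '    public static final UnicodeBlock ADLAM = new UnicodeBlock("ADLAM", ADLAM_ID); /*[1E900]*/'
```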
   3922 
   3923    GCB; EB                               ; E_Base
   3924    GCB; EBG                              ; E_Base_GAZ
   3925    GCB; EM                               ; E_Modifier
   3926    GCB; GAZ                              ; Glue_After_Zwj
   3927    GCB; ZWJ                              ; ZWJ
   3928  -> uchar.h & UCharacter.GraphemeClusterBreak
   3929 
   3930    jg ; African_Feh                      ; African_Feh
   3931    jg ; African_Noon                     ; African_Noon
   3932    jg ; African_Qaf                      ; African_Qaf
   3933  -> uchar.h & UCharacter.JoiningGroup
   3934 
   3935    lb ; EB                               ; E_Base
   3936    lb ; EM                               ; E_Modifier
   3937    lb ; ZWJ                              ; ZWJ
   3938  -> uchar.h & UCharacter.LineBreak
   3939 
   3940    sc ; Adlm                             ; Adlam
   3941    sc ; Bhks                             ; Bhaiksuki
   3942    sc ; Marc                             ; Marchen
   3943    sc ; Newa                             ; Newa
   3944    sc ; Osge                             ; Osage
   3945    sc ; Tang                             ; Tangut
   3946  -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
   3947 
   3948    WB ; EB                               ; E_Base
   3949    WB ; EBG                              ; E_Base_GAZ
   3950    WB ; EM                               ; E_Modifier
   3951    WB ; GAZ                              ; Glue_After_Zwj
   3952    WB ; ZWJ                              ; ZWJ
   3953  -> uchar.h & UCharacter.WordBreak
   3954 
   3955 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
   3956    (not strictly necessary for NOT_ENCODED scripts)
   3957  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
   3958 
   3959 * generate normalization data files
   3960  cd $ICU_ROOT/dbg
   3961  bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
   3962  bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
   3963  bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
   3964  bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
   3965  bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
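The five gennorm2 runs differ only in the output path and in how many source files are layered on top of nfc.txt. A hypothetical Python helper that only assembles these command lines (it does not run gennorm2) makes that pattern explicit; the source path in the dry run is a placeholder.

```python
import shlex

# (output path template, input .txt files, extra options), as in the log.
# Each later data file layers more input files on top of nfc.txt.
NORM2_BUILDS = [
    ("{src}/source/common/norm2_nfc_data.h", ["nfc.txt"], ["--csource"]),
    ("{data_in}/nfc.nrm", ["nfc.txt"], []),
    ("{data_in}/nfkc.nrm", ["nfc.txt", "nfkc.txt"], []),
    ("{data_in}/nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"], []),
    ("{data_in}/uts46.nrm", ["nfc.txt", "uts46.txt"], []),
]


def gennorm2_commands(icu_src_dir):
    """Return the gennorm2 argument lists; run them from $ICU_ROOT/dbg."""
    data_in = f"{icu_src_dir}/source/data/in"        # $SRC_DATA_IN
    unidata = f"{icu_src_dir}/source/data/unidata"   # $UNIDATA
    commands = []
    for out, inputs, extra in NORM2_BUILDS:
        out_path = out.format(src=icu_src_dir, data_in=data_in)
        commands.append(["bin/gennorm2", "-o", out_path, "-s", f"{unidata}/norm2", *inputs, *extra])
    return commands


if __name__ == "__main__":
    # Dry run with a placeholder path: print the commands instead of executing them.
    for cmd in gennorm2_commands("/path/to/icu/src"):
        print(shlex.join(cmd))
```

To actually run them, each list could be passed to subprocess.run(cmd, cwd=...) with the dbg build directory as the working directory.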
   3966 
   3967 * build ICU (make install)
   3968  so that the tools build can pick up the new definitions from the installed header files.
   3969 
   3970  $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt
   3971 
   3972 * build Unicode tools using CMake+make
   3973 
   3974 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
   3975 
   3976  # Location (--prefix) of where ICU was installed.
   3977  set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
   3978  # Location of the ICU source tree.
   3979  set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
   3980 
   3981  ~/svn.icutools/trunk/dbg/unicode/c$
   3982    cmake ../../../src/unicode/c
   3983    make
   3984 
   3985 * generate core properties data files
   3986  ~/svn.icutools/trunk/dbg/unicode/c$
   3987    genprops/genprops $ICU_SRC_DIR
   3988    genuca/genuca --hanOrder implicit $ICU_SRC_DIR
   3989    genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
   3990 - rebuild ICU (make install) & tools
   3991 
   3992 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
   3993  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
   3994 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
   3995 - Unicode 6.0..9.0: U+2260, U+226E, U+226F
   3996 - nothing new in 9.0, no test file to update
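The grep step can also be done with a small Python filter. A sketch, assuming the usual semicolon-separated IdnaMappingTable.txt layout (code point or range, status, optional mapping, then a # comment); the log records U+2260, U+226E and U+226F as the non-ASCII hits for Unicode 6.0..9.0.

```python
def non_ascii_std3_valid(lines):
    """Yield (start, end) code point ranges whose status is
    disallowed_STD3_valid, clipped to the non-ASCII part."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        lo = int(start, 16)
        hi = int(end, 16) if end else lo
        if hi > 0x7F:
            yield max(lo, 0x80), hi


def report(path):
    """Print the matching ranges from an IdnaMappingTable.txt file."""
    with open(path, encoding="utf-8") as f:
        for lo, hi in non_ascii_std3_valid(f):
            print(f"U+{lo:04X}" if lo == hi else f"U+{lo:04X}..U+{hi:04X}")
```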
   3997 
   3998 * run & fix ICU4C tests
   3999 - Andy handles RBBI & spoof check test failures
   4000 
   4001 * collation: CLDR collation root, UCA DUCET
   4002 
   4003 - UCA DUCET goes into Mark's Unicode tools, see
   4004  https://sites.google.com/site/unicodetools/home#TOC-UCA
   4005 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
   4006    cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
   4007 
   4008 - cd (CLDR UCA branch)/common/uca/
   4009 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   4010    cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
   4011 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   4012    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
   4013    (note: the next command drops the underscore before "Rules")
   4014    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
   4015 - restore TODO diffs in UCARules.txt
   4016    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
   4017 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
   4018  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   4019  from the CLDR root files (..._CLDR_..._SHORT.txt)
   4020    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
   4021    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
   4022    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
   4023 - if CLDR common/uca/unihan-index.txt changes, then update
   4024  CLDR common/collation/root.xml <collation type="private-unihan">
   4025  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
   4026 
   4027 - run genuca, see command line above;
   4028  deal with
   4029    Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
   4030    FDD1 104B5;     [75 B8 02, 05, 05]      # Osage first primary (compressible)
   4031        (add the character to genuca.cpp sampleCharsToScripts[])
   4032  + look up the USCRIPT_ code for the new sample characters
   4033    (should be obvious from the comment in the error output)
   4034  + *add* mappings to sampleCharsToScripts[], do not replace them
   4035    (in case the script sample characters flip-flop)
   4036  + insert new scripts in DUCET script order, see the top_byte table
   4037    at the beginning of FractionalUCA.txt
   4038 - rebuild ICU4C
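Reading the genuca error output can be partly automated. A hypothetical Python helper (not part of genuca or ICU) that extracts the code point and line number from the error message and guesses a USCRIPT_ constant from the script name in the FractionalUCA.txt comment; the guess is only a naming heuristic and must be checked against uscript.h.

```python
import re

# genuca's "Unknown script for first-primary sample character" message.
ERROR_RE = re.compile(r"sample character U\+([0-9A-Fa-f]+) on line (\d+)")
# Script name in a FractionalUCA.txt "... first primary" comment.
COMMENT_RE = re.compile(r"#\s*(.+?)\s+first primary")


def parse_genuca_error(error_line, fractional_uca_line):
    """Return (code point, line number, script name, guessed USCRIPT_ constant)."""
    m = ERROR_RE.search(error_line)
    if m is None:
        raise ValueError("not a genuca sample-character error: " + error_line)
    cp, line_no = int(m.group(1), 16), int(m.group(2))
    c = COMMENT_RE.search(fractional_uca_line)
    script = c.group(1) if c else None
    # Heuristic only: uppercase the name and join words with underscores.
    guess = "USCRIPT_" + re.sub(r"[^A-Z0-9]+", "_", script.upper()) if script else None
    return cp, line_no, script, guess
```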
   4039 
   4040 * Unihan collators
   4041 - run Unicode Tools
   4042    org.unicode.draft.GenerateUnihanCollators
   4043  with VM arguments
   4044    -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
   4045    -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
   4046    -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
   4047    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
   4048    -DUVERSION=9.0.0
   4049    -ea
   4050 - run Unicode Tools
   4051    org.unicode.draft.GenerateUnihanCollatorFiles
   4052  with the same arguments
   4053 - check CLDR diffs
   4054    cd ~/svn.cldr/trunk
   4055    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
   4056    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
   4057 - copy to CLDR
   4058    cd ~/svn.cldr/trunk
   4059    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
   4060    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
   4061 - commit to CLDR
   4062 - generate ICU zh collation data: run CLDR
   4063    org.unicode.cldr.icu.NewLdml2IcuConverter
   4064  with program arguments
   4065    -t collation
   4066    -s /home/mscherer/svn.cldr/trunk/common/collation
   4067    -m /home/mscherer/svn.cldr/trunk/common/supplemental
   4068    -d /home/mscherer/svn.icu/trunk/src/source/data/coll
   4069    -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
   4070    zh
   4071  and VM arguments
   4072    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
   4073 - rebuild ICU4C
   4074 
   4075 * run & fix ICU4C tests, now with new CLDR collation root data
   4076 - run all tests with the collation test data *_SHORT.txt or the full files
   4077  (the full ones have comments, useful for debugging)
   4078 - note on intltest: if collate/UCAConformanceTest fails, then
   4079  utility/MultithreadTest/TestCollators will fail as well;
   4080  fix the conformance test before looking into the multi-thread test
   4081 
   4082 * update Java data files
   4083 - refresh only the UCD/UCA-related/derived files, to be safe
   4084 - see (ICU4C)/source/data/icu4j-readme.txt
   4085 - mkdir /tmp/icu4j
   4086 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   4087  output:
   4088    ...
   4089    Unicode .icu files built to ./out/build/icudt58l
   4090    echo timestamp > uni-core-data
   4091    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
   4092    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
   4093    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
   4094    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
   4095    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
   4096    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
   4097    mkdir -p /tmp/icu4j/main/shared/data
   4098    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   4099    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
   4100    mkdir -p /tmp/icu4j/main/shared/data
   4101    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   4102    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
   4103 - copy the big-endian Unicode data files to another location,
   4104  separate from the other data files,
   4105  and then refresh ICU4J
   4106    cd ~/svn.icu/trunk/dbg/data/out/icu4j
   4107    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   4108    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   4109    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   4110    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   4111    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
   4112    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   4113    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   4114    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   4115    jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
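The cp/rm sequence above selects a fixed subset of the big-endian data files. A Python sketch of the same selection, assuming the directory layout shown in the log (com/ibm/icu/impl/data/$ICUDT with coll/ and brkitr/ subfolders); it only stages the files, and the jar uvf step is still run by hand.

```python
import glob
import os
import shutil


def stage_unicode_data(src_root, dst_root, icudt):
    """Copy confusables.cfu, *.icu (except cnvalias.icu), *.nrm, and the
    coll/ and brkitr/ files from src_root to dst_root.
    Returns the copied paths relative to the $ICUDT folder."""
    rel = os.path.join("com", "ibm", "icu", "impl", "data", icudt)
    src = os.path.join(src_root, rel)
    dst = os.path.join(dst_root, rel)
    os.makedirs(dst, exist_ok=True)
    copied = []
    for pattern in ("confusables.cfu", "*.icu", "*.nrm"):
        for path in sorted(glob.glob(os.path.join(src, pattern))):
            name = os.path.basename(path)
            if name == "cnvalias.icu":
                continue  # the log copies *.icu and then deletes this one
            shutil.copy2(path, os.path.join(dst, name))
            copied.append(name)
    for sub in ("coll", "brkitr"):
        os.makedirs(os.path.join(dst, sub), exist_ok=True)
        for path in sorted(glob.glob(os.path.join(src, sub, "*"))):
            if os.path.isfile(path):
                name = os.path.basename(path)
                shutil.copy2(path, os.path.join(dst, sub, name))
                copied.append(f"{sub}/{name}")
    return copied
```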
   4116 
   4117 * When refreshing all of ICU4J data from ICU4C
   4118 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   4119 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
   4120 or
   4121 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
   4122 
   4123 * update CollationFCD.java
   4124  + copy & paste the initializers of lcccIndex[] etc. from
   4125    ICU4C/source/i18n/collationfcd.cpp to
   4126    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
   4127 
   4128 * refresh Java test .txt files
   4129 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   4130    cd $ICU_SRC_DIR/source/data/unidata
   4131    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
   4132    cd ../../test/testdata
   4133    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
   4134    cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
   4135 
   4136 * run & fix ICU4J tests
   4137 
   4138 *** LayoutEngine script information
   4139 
   4140 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
   4141  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
   4142  in the working directory.
   4143 
   4144  (It also generates ScriptRunData.cpp, which is no longer needed.)
   4145 
   4146  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
   4147  (a plain text file)
   4148  which maps ICU versions to the numbers of script/language constants
   4149  that were added then.
   4150  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
   4151 
   4152  The generated files have a current copyright date and "@deprecated" statement.
   4153 
   4154 * Review changes, fix Java tool if necessary, and copy to ICU4C
   4155  cd ~/svn.icu4j/trunk/src
   4156  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
   4157  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
   4158  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
   4159 
   4160 *** API additions
   4161 - send notice to icu-design about new born-@stable API (enum constants etc.)
   4162 
   4163 *** merge the Unicode update branches back onto the trunk
   4164 - do not merge the icudata.jar and testdata.jar,
   4165  instead rebuild them from merged & tested ICU4C
   4166 - make sure that changes to Unicode tools & ICU tools are checked in
   4167  http://www.unicode.org/utility/trac/log/trunk/unicodetools
   4168  http://bugs.icu-project.org/trac/log/tools/trunk
   4169 
   4170 ---------------------------------------------------------------------------- ***
   4171 
   4172 New script codes early in ICU 58: https://unicode-org.atlassian.net/browse/ICU-11764
   4173 
   4174 Adding
   4175 - new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
   4176 - new combination/alias codes: Hanb, Jamo
   4177  + used in CLDR 29 and in the spoof checker
   4178 - new Z* code: Zsye
   4179 
   4180 Add new codes to uscript.h & UScript.java, see Unicode update logs.
   4181  -> com.ibm.icu.lang.UScript
   4182    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
   4183    replace  public static final int \1 = \2; \3
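The same uscript.h to UScript.java find/replace can be applied with Python's re module. A minimal sketch; the sample line imitates the uscript.h enum style, and its exact spacing and value are illustrative.

```python
import re

# The find/replace pattern from the log, in Python syntax.
SCRIPT_FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
SCRIPT_REPLACE = r"public static final int \1 = \2; \3"


def uscript_to_java(line):
    """Convert one USCRIPT_ enum line; lines without the trailing comma and comment pass through."""
    return SCRIPT_FIND.sub(SCRIPT_REPLACE, line)


sample = "      USCRIPT_ADLAM                         = 167,/* Adlm */"
print(uscript_to_java(sample))  # "      public static final int ADLAM = 167; /* Adlm */"
```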
   4184 
   4185 Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
   4186 add new script codes.
   4187 "Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
   4188 
   4189 Note: If we have to run preparseucd.py again before the Unicode 9 update,
   4190 then we need to manually keep/restore the new script codes.
   4191 
   4192 ICU_ROOT=~/svn.icu/trunk
   4193 ICU_SRC_DIR=$ICU_ROOT/src
   4194 ICUDT=icudt57b
   4195 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
   4196 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
   4197 UNIDATA=$ICU_SRC_DIR/source/data/unidata
   4198 
   4199 Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
   4200 see https://unicode-org.atlassian.net/browse/ICU-12141
   4201 
   4202 make install, then icutools cmake & make, then
   4203 ~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
   4204 
   4205 Generate Java data as usual, but only update pnames.icu & uprops.icu.
   4206 
   4207 *** LayoutEngine script information
   4208 
   4209 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
   4210  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
   4211  in the working directory.
   4212 
   4213  (It also generates ScriptRunData.cpp, which is no longer needed.)
   4214 
   4215  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
   4216  (a plain text file)
   4217  which maps ICU versions to the numbers of script/language constants
   4218  that were added then.
   4219  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
   4220 
   4221  The generated files have a current copyright date and "@deprecated" statement.
   4222 
   4223 * Review changes, fix Java tool if necessary, and copy to ICU4C
   4224  cd ~/svn.icu4j/trunk/src
   4225  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
   4226  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
   4227  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
   4228 
   4229 ---------------------------------------------------------------------------- ***
   4230 
   4231 Emoji properties added in ICU 57: https://unicode-org.atlassian.net/browse/ICU-11802
   4232 
   4233 Edit preparseucd.py to add & parse new properties.
   4234 They share the UCD property namespace but are not listed in PropertyAliases.txt.
   4235 
   4236 Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
   4237 Initial data from emoji/2.0/
   4238 
   4239 ICU_ROOT=~/svn.icu/trunk
   4240 ICU_SRC_DIR=$ICU_ROOT/src
   4241 ICUDT=icudt56b
   4242 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
   4243 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
   4244 UNIDATA=$ICU_SRC_DIR/source/data/unidata
   4245 
   4246 Add binary-property constants to uchar.h enum UProperty & UProperty.java.
   4247 
   4248 ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
   4249 (Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)
   4250 
   4251 Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java
   4252 
   4253 make install, then icutools cmake & make, then
   4254 ~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
   4255 
   4256 Generate Java data as usual, but only update pnames.icu & uprops.icu.
   4257 
   4258 ---------------------------------------------------------------------------- ***
   4259 
   4260 Unicode 8.0 update for ICU 56
   4261 
   4262 * Command-line environment setup
   4263 
   4264 ICU_ROOT=~/svn.icu/trunk
   4265 ICU_SRC_DIR=$ICU_ROOT/src
   4266 ICUDT=icudt56b
   4267 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
   4268 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
   4269 UNIDATA=$ICU_SRC_DIR/source/data/unidata
   4270 
   4271 http://www.unicode.org/review/pri297/  -- beta review
   4272 http://www.unicode.org/reports/uax-proposed-updates.html
   4273 http://unicode.org/versions/beta-8.0.0.html
   4274 http://www.unicode.org/versions/Unicode8.0.0/
   4275 http://www.unicode.org/reports/tr44/tr44-15.html
   4276 
   4277 *** ICU Trac
   4278 
   4279 - ticket:11574: Unicode 8
   4280 - C++ branches/markus/uni80 at r37351 from trunk at r37343
   4281 - Java branches/markus/uni80 at r37352 from trunk at r37338
   4282 
   4283 *** CLDR Trac
   4284 
   4285 - cldrbug 8311: UCA 8
   4286 - branches/markus/uni80 at r11518 from trunk at r11517
   4287 
   4288 - cldrbug 8109: Unicode 8.0 script metadata
   4289 - cldrbug 8418: Updated segmentation for Unicode 8.0
   4290 
   4291 *** Unicode version numbers
   4292 - makedata.mak
   4293 - uchar.h
   4294 - com.ibm.icu.util.VersionInfo
   4295 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
   4296 
   4297 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
   4298  so that the makefiles see the new version number.
   4299 
   4300 *** data files & enums & parser code
   4301 
   4302 * file preparation
   4303 
   4304 - download UCD & IDNA files
   4305 - make sure that the Unicode data folder passed into preparseucd.py
   4306  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
   4307 - only for manual diffs: remove version suffixes from the file names
   4308  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
   4309  (see https://sites.google.com/site/unicodetools/inputdata)
   4310 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
   4311 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
   4312 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
   4313 
   4314 - also: from http://unicode.org/Public/security/8.0.0/ download new
   4315  confusables.txt & confusablesWholeScript.txt
   4316  and copy to $UNIDATA
   4317    ~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
   4318    ~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA
   4319 
   4320 * initial preparseucd.py changes
   4321 - remove new Unicode scripts from the
   4322  only-in-ISO-15924 list according to the error message:
   4323    ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
   4324    from _scripts_only_in_iso15924
   4325  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
   4326      and in com.ibm.icu.dev.test.lang.TestUScript.java
   4327 - property and file name change:
   4328    IndicMatraCategory -> IndicPositionalCategory
   4329 - UnicodeData.txt unusual numeric values (unreduced fractions)
   4330    109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
   4331    109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
   4332    109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
   4333    109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
   4334    109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
   4335    109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
   4336    109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
   4337    109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
   4338    109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
   4339    109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
   4340  -> change preparseucd.py to map them to reduced fractions (e.g., 2/12 -> 1/6),
   4341     which is how DerivedNumericValues.txt lists them;
   4342     this keeps storage in the data file simple
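The reduction of the Meroitic twelfths can be illustrated with Python's fractions module. This sketch only shows the arithmetic; it is not the actual preparseucd.py code.

```python
from fractions import Fraction


def reduce_numeric_value(value):
    """Reduce a UnicodeData.txt numeric field such as '2/12' to lowest terms ('1/6').
    Integers and already-reduced fractions come back unchanged."""
    if "/" not in value:
        return value
    numerator, denominator = value.split("/")
    return str(Fraction(int(numerator), int(denominator)))


for twelfths in range(1, 11):
    print(f"{twelfths}/12 -> {reduce_numeric_value(f'{twelfths}/12')}")
```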
   4343 
   4344 * PropertyValueAliases.txt changes
   4345 - 10 new Block (blk) values:
   4346    blk; Ahom                             ; Ahom
   4347    blk; Anatolian_Hieroglyphs            ; Anatolian_Hieroglyphs
   4348    blk; Cherokee_Sup                     ; Cherokee_Supplement
   4349    blk; CJK_Ext_E                        ; CJK_Unified_Ideographs_Extension_E
   4350    blk; Early_Dynastic_Cuneiform         ; Early_Dynastic_Cuneiform
   4351    blk; Hatran                           ; Hatran
   4352    blk; Multani                          ; Multani
   4353    blk; Old_Hungarian                    ; Old_Hungarian
   4354    blk; Sup_Symbols_And_Pictographs      ; Supplemental_Symbols_And_Pictographs
   4355    blk; Sutton_SignWriting               ; Sutton_SignWriting
   4356  -> add to uchar.h
   4357    use long property names for enum constants
   4358  -> add to UCharacter.UnicodeBlock IDs
   4359    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
   4360            replace  public static final int \1_ID = \2; \3
   4361  -> add to UCharacter.UnicodeBlock objects
   4362    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
   4363            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
   4364 - 6 new Script (sc) values:
   4365    sc ; Ahom                             ; Ahom
   4366    sc ; Hatr                             ; Hatran
   4367    sc ; Hluw                             ; Anatolian_Hieroglyphs
   4368    sc ; Hung                             ; Old_Hungarian
   4369    sc ; Mult                             ; Multani
   4370    sc ; Sgnw                             ; SignWriting
   4371  -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
   4372 
   4373 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
   4374    (not strictly necessary for NOT_ENCODED scripts)
   4375  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
   4376 
   4377 * generate normalization data files
   4378  cd $ICU_ROOT/dbg
   4379  bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
   4380  bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
   4381  bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
   4382  bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
   4383  bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
   4384 
   4385 * build ICU (make install)
   4386  so that the tools build can pick up the new definitions from the installed header files.
   4387 
   4388  $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
   4389 
   4390 * build Unicode tools using CMake+make
   4391 
   4392 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
   4393 
   4394  # Location (--prefix) of where ICU was installed.
   4395  set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
   4396  # Location of the ICU source tree.
   4397  set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
   4398 
   4399  ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
   4400  ~/svn.icutools/trunk/dbg/unicode/c$ make
   4401 
   4402 * generate core properties data files
   4403 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
   4404 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
   4405 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
   4406 - rebuild ICU (make install) & tools
   4407 - run genuca again (see step above) so that it picks up the new nfc.nrm
   4408 - rebuild ICU (make install) & tools
   4409 
   4410 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
   4411  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
   4412 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
   4413 - Unicode 6.0..8.0: U+2260, U+226E, U+226F
   4414 - nothing new in 8.0, no test file to update
   4415 
   4416 * run & fix ICU4C tests
   4417 - bad Cherokee case folding due to difference in fallbacks:
   4418  UCD case folding falls back to no mapping,
   4419  ICU runtime case folding falls back to lowercasing;
   4420  fixed casepropsbuilder.cpp to generate scf mappings to self
   4421  when there is an slc mapping but no scf
   4422 - Andy handles RBBI & spoof check test failures
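The Cherokee case folding fix is easiest to see with one letter pair. Since Unicode 8.0, U+13A0 CHEROKEE LETTER A lowercases to U+AB70 CHEROKEE SMALL LETTER A, while CaseFolding.txt folds U+AB70 to U+13A0 and lists no folding for U+13A0. The simplified Python model below uses dictionaries to stand in for the slc and scf mappings; it is not the casepropsbuilder.cpp code.

```python
def fill_self_scf(slc, scf):
    """Return a copy of scf with an explicit c -> c mapping for every code
    point that has a simple lowercase (slc) mapping but no scf mapping."""
    filled = dict(scf)
    for cp in slc:
        filled.setdefault(cp, cp)
    return filled


def runtime_fold(cp, slc, scf):
    """Model of the runtime fallback described above: use scf if present,
    otherwise fall back to lowercasing, otherwise the code point itself."""
    if cp in scf:
        return scf[cp]
    return slc.get(cp, cp)


SLC = {0x13A0: 0xAB70}  # U+13A0 lowercases to U+AB70
SCF = {0xAB70: 0x13A0}  # U+AB70 folds to U+13A0; U+13A0 has no entry
```

Without the explicit self mapping, runtime_fold(0x13A0, SLC, SCF) falls back to lowercasing and yields U+AB70, although the UCD folds U+13A0 to itself.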
   4423 
   4424 * collation: CLDR collation root, UCA DUCET
   4425 
   4426 - UCA DUCET goes into Mark's Unicode tools, see
   4427  https://sites.google.com/site/unicodetools/home#TOC-UCA
   4428 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
   4429 - cd (CLDR UCA branch)/common/uca/
   4430 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   4431  cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
   4432 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   4433    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
   4434    (note: the next command drops the underscore before "Rules")
   4435    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
   4436 - restore TODO diffs in UCARules.txt
   4437    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
   4438 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
   4439  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   4440  from the CLDR root files (..._CLDR_..._SHORT.txt)
   4441    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
   4442    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
   4443    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
   4444 - if CLDR common/uca/unihan-index.txt changes, then update
   4445  CLDR common/collation/root.xml <collation type="private-unihan">
   4446  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
   4447 - run genuca, see command line above;
   4448  deal with
   4449    Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
   4450        (add the character to genuca.cpp sampleCharsToScripts[])
   4451  + look up the script for the new sample characters
   4452    (e.g., in FractionalUCA.txt)
   4453  + *add* mappings to sampleCharsToScripts[], do not replace them
   4454    (in case the script sample characters flip-flop)
   4455  + insert new scripts in DUCET script order, see the top_byte table
   4456    at the beginning of FractionalUCA.txt
   4457 - rebuild ICU4C
   4458 
   4459 * run & fix ICU4C tests, now with new CLDR collation root data
   4460 - run all tests with the collation test data *_SHORT.txt or the full files
   4461  (the full ones have comments, useful for debugging)
   4462 - note on intltest: if collate/UCAConformanceTest fails, then
   4463  utility/MultithreadTest/TestCollators will fail as well;
   4464  fix the conformance test before looking into the multi-thread test
   4465 - fixed bug in CollationWeights::getWeightRanges()
   4466  exposed by new data and CollationTest::TestRootElements
   4467 
   4468 * update Java data files
   4469 - refresh only the UCD/UCA-related and derived files, to be safe
   4470 - see (ICU4C)/source/data/icu4j-readme.txt
   4471 - mkdir /tmp/icu4j
   4472 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   4473  output:
   4474    ...
   4475    Unicode .icu files built to ./out/build/icudt56l
   4476    echo timestamp > uni-core-data
   4477    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
   4478    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
   4479    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
   4480    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
   4481    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
   4482    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
   4483    mkdir -p /tmp/icu4j/main/shared/data
   4484    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   4485    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
   4486    mkdir -p /tmp/icu4j/main/shared/data
   4487    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   4488    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
   4489 - copy the big-endian Unicode data files to another location,
   4490  separate from the other data files,
   4491  and then refresh ICU4J
   4492    cd ~/svn.icu/trunk/dbg/data/out/icu4j
   4493    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   4494    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   4495    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   4496    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   4497    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
   4498    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   4499    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   4500    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   4501    jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
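After jar uf, a quick check that the expected entries actually landed in icudata.jar can save a test run. This Python sketch uses a hypothetical helper; a .jar is a zip file, so the standard zipfile module can read it. It lists which names are missing under com/ibm/icu/impl/data/<ICUDT>/.

```python
import zipfile


def missing_entries(jar, icudt, names):
    """Return the names that are not stored under
    com/ibm/icu/impl/data/<icudt>/ in the given jar (path or file object)."""
    prefix = f"com/ibm/icu/impl/data/{icudt}/"
    with zipfile.ZipFile(jar) as archive:
        present = set(archive.namelist())
    return [name for name in names if prefix + name not in present]


if __name__ == "__main__":
    import os
    import sys

    # usage: python checkjar.py <jar> <icudt> <name>...   (script name is hypothetical)
    jar_path = os.path.expanduser(sys.argv[1])
    missing = missing_entries(jar_path, sys.argv[2], sys.argv[3:])
    print("missing:", " ".join(missing) if missing else "(none)")
```

For example, pass the updated icudata.jar, icudt56b, and the names from add.txt in the make output (pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm).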
   4502 
   4503 * When refreshing all of ICU4J data from ICU4C
   4504 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   4505 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
   4506 or
   4507 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
   4508 
   4509 * update CollationFCD.java
   4510  + copy & paste the initializers of lcccIndex[] etc. from
   4511    ICU4C/source/i18n/collationfcd.cpp to
   4512    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
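Copying the initializers by hand is error-prone for arrays of this size. A small Python sketch can pull the brace-enclosed body of a named array out of collationfcd.cpp. It assumes each definition has the shape name[size]={ ... }; and it does not touch the Java declarations, so element types and any casts still need to be checked by hand. The helper and script names are hypothetical.

```python
import re
import sys


def initializer_body(cpp_source, name):
    """Return the text between '{' and '};' of the C++ array definition
    whose declarator is name[...], for example lcccIndex."""
    pattern = re.compile(
        r"\b" + re.escape(name) + r"\[[^\]]*\]\s*=\s*\{(.*?)\};", re.DOTALL)
    match = pattern.search(cpp_source)
    if match is None:
        raise ValueError(f"no initializer found for {name}")
    return match.group(1).strip()


if __name__ == "__main__":
    # usage: python fcdinit.py path/to/collationfcd.cpp lcccIndex
    with open(sys.argv[1], encoding="utf-8") as f:
        print(initializer_body(f.read(), sys.argv[2]))
```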
   4513 
   4514 * refresh Java test .txt files
   4515 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   4516    cd $ICU_SRC_DIR/source/data/unidata
   4517    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
   4518    cd ../../test/testdata
   4519    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
   4520    cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
   4521 
   4522 * run & fix ICU4J tests
   4523 
   4524 *** LayoutEngine script information
   4525 
   4526 * ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
   4527  because the layout engine was deprecated in ICU 54.
   4528  Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
   4529  to write lines that we used to add manually.
   4530 
   4531 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
   4532  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
   4533  in the working directory.
   4534 
   4535  (It also generates ScriptRunData.cpp, which is no longer needed.)
   4536 
   4537  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
   4538  (a plain text file),
   4539  which maps each ICU version to the number of script/language constants
   4540  that were added in that version.
   4541  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
   4542 
   4543  The generated files have a current copyright date and "@deprecated" statement.
   4544 
   4545 * Review changes, fix Java tool if necessary, and copy to ICU4C
   4546  cd ~/svn.icu4j/trunk/src
   4547  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
   4548  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
   4549  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
   4550 
   4551 *** API additions
   4552 - send notice to icu-design about new born-@stable API (enum constants etc.)
   4553 
   4554 *** merge the Unicode update branches back onto the trunk
   4555 - do not merge the icudata.jar and testdata.jar,
   4556  instead rebuild them from merged & tested ICU4C
   4557 - make sure that changes to Unicode tools & ICU tools are checked in
   4558  http://www.unicode.org/utility/trac/log/trunk/unicodetools
   4559  http://bugs.icu-project.org/trac/log/tools/trunk
   4560 
   4561 ---------------------------------------------------------------------------- ***
   4562 
   4563 Unicode 7.0 update for ICU 54
   4564 
   4565 http://www.unicode.org/review/pri271/  -- beta review
   4566 http://www.unicode.org/reports/uax-proposed-updates.html
   4567 http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
   4568 http://www.unicode.org/reports/tr44/tr44-13.html
   4569 
   4570 *** ICU Trac
   4571 
   4572 - ticket 10821: Unicode 7.0, UCA 7.0
   4573 - C++ branches/markus/uni70 at r35584 from trunk at r35580
   4574 - Java branches/markus/uni70 at r35587 from trunk at r35545
   4575 
   4576 *** CLDR Trac
   4577 
   4578 - ticket 7195: UCA 7.0 CLDR root collation
   4579 - branches/markus/uni70 at r10062 from trunk at r10061
   4580 
   4581 - ticket 6762: script metadata for Unicode 7.0 new scripts
   4582 
   4583 *** Unicode version numbers
   4584 - makedata.mak
   4585 - uchar.h
   4586 - com.ibm.icu.util.VersionInfo
   4587 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
   4588 
   4589 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
   4590  so that the makefiles see the new version number.
   4591 
   4592 *** data files & enums & parser code
   4593 
   4594 * file preparation
   4595 
   4596 - download UCD & IDNA files
   4597 - make sure that the Unicode data folder passed into preparseucd.py
   4598  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
   4599 - only for manual diffs: remove version suffixes from the file names
   4600  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
   4601  (see https://sites.google.com/site/unicodetools/inputdata)
   4602 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
   4603 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
   4604 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
   4605 - Restore TODO diffs in source/data/unidata/UCARules.txt
   4606    cd $ICU_SRC_DIR
   4607    meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
   4608 - Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
   4609 
   4610 - also: from http://unicode.org/Public/security/7.0.0/ download new
   4611  confusables.txt & confusablesWholeScript.txt
   4612  and copy to $ICU_ROOT/src/source/data/unidata/
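For the manual diffs, the renaming done by desuffixucd.py can be approximated in Python as below. This is not that script: it assumes the suffix looks like -7.0.0 or -7.0.0d21 right before the file extension (for example UnicodeData-7.0.0d21.txt), and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed suffix shape: "-7.0.0" or "-7.0.0d21" right before the extension.
SUFFIX = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z]+$)")


def desuffixed_name(name):
    return SUFFIX.sub("", name)


def desuffix_tree(root):
    """Rename matching files under root in place; return (old, new) name pairs."""
    renamed = []
    for path in sorted(Path(root).rglob("*")):  # list is built before renaming
        new_name = desuffixed_name(path.name)
        if path.is_file() and new_name != path.name:
            path.rename(path.with_name(new_name))
            renamed.append((path.name, new_name))
    return renamed
```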
   4613 
   4614 * initial preparseucd.py changes
   4615 - remove new Unicode scripts from the
   4616  only-in-ISO-15924 list according to the error message:
   4617    ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
   4618                        'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
   4619                        'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
   4620    from _scripts_only_in_iso15924
   4621  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
   4622      and in com.ibm.icu.dev.test.lang.TestUScript.java
   4623 - NamesList.txt now has a heading with a non-ASCII character
   4624  + keep ppucd.txt in platform charset, rather than changing tool/test parsers
   4625  + escape non-ASCII characters in heading comments
   4626 - preparseucd.py took the Unicode copyright line from PropertyAliases.txt, whose copyright year was still 2013
   4627  + get the copyright from the first file whose copyright line contains the current year
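The three preparseucd.py adjustments above can be sketched in Python. This is an illustration, not the actual preparseucd.py code: the function names are hypothetical, the escape format (\uXXXX and \U00XXXXXX) is an assumption, and the copyright check simply looks for the year in lines that contain "Copyright".

```python
def check_iso_only_scripts(ucd_script_codes, only_in_iso):
    """Fail like the error message above when a supposedly ISO-only code is now in the UCD."""
    overlap = [code for code in only_in_iso if code in ucd_script_codes]
    if overlap:
        raise ValueError(f"remove {overlap} from _scripts_only_in_iso15924")


def escape_non_ascii(text):
    r"""Keep ASCII as is; write other characters as \uXXXX or \U00XXXXXX."""
    pieces = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            pieces.append(ch)
        elif cp <= 0xFFFF:
            pieces.append(f"\\u{cp:04X}")
        else:
            pieces.append(f"\\U{cp:08X}")
    return "".join(pieces)


def pick_copyright(copyright_lines, year):
    """Return the first line that contains 'Copyright' and the given year."""
    for line in copyright_lines:
        if "Copyright" in line and str(year) in line:
            return line
    return None
```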
   4628 
   4629 * PropertyValueAliases.txt changes
   4630 - 32 new Block (blk) values:
   4631    blk; Bassa_Vah                        ; Bassa_Vah
   4632    blk; Caucasian_Albanian               ; Caucasian_Albanian
   4633    blk; Coptic_Epact_Numbers             ; Coptic_Epact_Numbers
   4634    blk; Diacriticals_Ext                 ; Combining_Diacritical_Marks_Extended
   4635    blk; Duployan                         ; Duployan
   4636    blk; Elbasan                          ; Elbasan
   4637    blk; Geometric_Shapes_Ext             ; Geometric_Shapes_Extended
   4638    blk; Grantha                          ; Grantha
   4639    blk; Khojki                           ; Khojki
   4640    blk; Khudawadi                        ; Khudawadi
   4641    blk; Latin_Ext_E                      ; Latin_Extended_E
   4642    blk; Linear_A                         ; Linear_A
   4643    blk; Mahajani                         ; Mahajani
   4644    blk; Manichaean                       ; Manichaean
   4645    blk; Mende_Kikakui                    ; Mende_Kikakui
   4646    blk; Modi                             ; Modi
   4647    blk; Mro                              ; Mro
   4648    blk; Myanmar_Ext_B                    ; Myanmar_Extended_B
   4649    blk; Nabataean                        ; Nabataean
   4650    blk; Old_North_Arabian                ; Old_North_Arabian
   4651    blk; Old_Permic                       ; Old_Permic
   4652    blk; Ornamental_Dingbats              ; Ornamental_Dingbats
   4653    blk; Pahawh_Hmong                     ; Pahawh_Hmong
   4654    blk; Palmyrene                        ; Palmyrene
   4655    blk; Pau_Cin_Hau                      ; Pau_Cin_Hau
   4656    blk; Psalter_Pahlavi                  ; Psalter_Pahlavi
   4657    blk; Shorthand_Format_Controls        ; Shorthand_Format_Controls
   4658    blk; Siddham                          ; Siddham
   4659    blk; Sinhala_Archaic_Numbers          ; Sinhala_Archaic_Numbers
   4660    blk; Sup_Arrows_C                     ; Supplemental_Arrows_C
   4661    blk; Tirhuta                          ; Tirhuta
   4662    blk; Warang_Citi                      ; Warang_Citi
   4663  -> add to uchar.h
   4664    use long property names for enum constants
   4665  -> add to UCharacter.UnicodeBlock IDs
   4666    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
   4667            replace  public static final int \1_ID = \2; \3
   4668  -> add to UCharacter.UnicodeBlock objects
   4669    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
   4670            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
   4671 - 28 new Joining_Group (jg) values:
   4672    jg ; Manichaean_Aleph                 ; Manichaean_Aleph
   4673    jg ; Manichaean_Ayin                  ; Manichaean_Ayin
   4674    jg ; Manichaean_Beth                  ; Manichaean_Beth
   4675    jg ; Manichaean_Daleth                ; Manichaean_Daleth
   4676    jg ; Manichaean_Dhamedh               ; Manichaean_Dhamedh
   4677    jg ; Manichaean_Five                  ; Manichaean_Five
   4678    jg ; Manichaean_Gimel                 ; Manichaean_Gimel
   4679    jg ; Manichaean_Heth                  ; Manichaean_Heth
   4680    jg ; Manichaean_Hundred               ; Manichaean_Hundred
   4681    jg ; Manichaean_Kaph                  ; Manichaean_Kaph
   4682    jg ; Manichaean_Lamedh                ; Manichaean_Lamedh
   4683    jg ; Manichaean_Mem                   ; Manichaean_Mem
   4684    jg ; Manichaean_Nun                   ; Manichaean_Nun
   4685    jg ; Manichaean_One                   ; Manichaean_One
   4686    jg ; Manichaean_Pe                    ; Manichaean_Pe
   4687    jg ; Manichaean_Qoph                  ; Manichaean_Qoph
   4688    jg ; Manichaean_Resh                  ; Manichaean_Resh
   4689    jg ; Manichaean_Sadhe                 ; Manichaean_Sadhe
   4690    jg ; Manichaean_Samekh                ; Manichaean_Samekh
   4691    jg ; Manichaean_Taw                   ; Manichaean_Taw
   4692    jg ; Manichaean_Ten                   ; Manichaean_Ten
   4693    jg ; Manichaean_Teth                  ; Manichaean_Teth
   4694    jg ; Manichaean_Thamedh               ; Manichaean_Thamedh
   4695    jg ; Manichaean_Twenty                ; Manichaean_Twenty
   4696    jg ; Manichaean_Waw                   ; Manichaean_Waw
   4697    jg ; Manichaean_Yodh                  ; Manichaean_Yodh
   4698    jg ; Manichaean_Zayin                 ; Manichaean_Zayin
   4699    jg ; Straight_Waw                     ; Straight_Waw
   4700  -> uchar.h & UCharacter.JoiningGroup
   4701 - 23 new Script (sc) values:
   4702    sc ; Aghb                             ; Caucasian_Albanian
   4703    sc ; Bass                             ; Bassa_Vah
   4704    sc ; Dupl                             ; Duployan
   4705    sc ; Elba                             ; Elbasan
   4706    sc ; Gran                             ; Grantha
   4707    sc ; Hmng                             ; Pahawh_Hmong
   4708    sc ; Khoj                             ; Khojki
   4709    sc ; Lina                             ; Linear_A
   4710    sc ; Mahj                             ; Mahajani
   4711    sc ; Mani                             ; Manichaean
   4712    sc ; Mend                             ; Mende_Kikakui
   4713    sc ; Modi                             ; Modi
   4714    sc ; Mroo                             ; Mro
   4715    sc ; Narb                             ; Old_North_Arabian
   4716    sc ; Nbat                             ; Nabataean
   4717    sc ; Palm                             ; Palmyrene
   4718    sc ; Pauc                             ; Pau_Cin_Hau
   4719    sc ; Perm                             ; Old_Permic
   4720    sc ; Phlp                             ; Psalter_Pahlavi
   4721    sc ; Sidd                             ; Siddham
   4722    sc ; Sind                             ; Khudawadi
   4723    sc ; Tirh                             ; Tirhuta
   4724    sc ; Wara                             ; Warang_Citi
   4725  -> uscript.h (many were added before)
   4726    comment "Mende Kikakui" for USCRIPT_MENDE
   4727    add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
   4728  -> com.ibm.icu.lang.UScript
   4729    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
   4730    replace  public static final int \1 = \2; \3
   4731 - 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
   4732  (added 2012-11-01)
   4733    Ahom        338     Ahom
   4734    Hatr        127     Hatran
   4735    Mult        323     Multani
   4736  (added 2013-10-12)
   4737    Modi        324     Modi
   4738    Pauc        263     Pau Cin Hau
   4739    Sidd        302     Siddham
   4740  -> uscript.h (some overlap with additions from Unicode)
   4741  -> com.ibm.icu.lang.UScript
   4742    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
   4743    replace  public static final int \1 = \2; \3
   4744  -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
   4745  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
   4746      and in com.ibm.icu.dev.test.lang.TestUScript.java
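The Eclipse find/replace patterns above use ordinary regular-expression groups, so they can also be applied outside the IDE. This Python sketch reuses them unchanged; the sample lines in the checks only imitate the layout of uchar.h and uscript.h, and their numbers are placeholders rather than the real enum values. The script name in the usage comment is hypothetical.

```python
import re
import sys

# (pattern, replacement) pairs copied from the Eclipse find/replace notes above.
BLOCK_IDS = (re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"),
             r"public static final int \1_ID = \2; \3")
BLOCK_OBJECTS = (re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"),
                 r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2')
SCRIPT_CODES = (re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)"),
                r"public static final int \1 = \2; \3")


def convert(lines, rule):
    """Apply one rule to every matching line; non-matching lines are dropped."""
    pattern, replacement = rule
    return [pattern.sub(replacement, line) for line in lines if pattern.search(line)]


if __name__ == "__main__":
    # usage: python enum2java.py < new-enum-lines.txt
    new_lines = sys.stdin.read().splitlines()
    for rule in (BLOCK_IDS, BLOCK_OBJECTS, SCRIPT_CODES):
        print("\n".join(convert(new_lines, rule)))
```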
   4747 
   4748 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
   4749    (not strictly necessary for NOT_ENCODED scripts)
   4750  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
   4751 
   4752 * generate normalization data files
   4753 - cd $ICU_ROOT/dbg
   4754 - export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
   4755 - SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
   4756 - UNIDATA=$ICU_SRC_DIR/source/data/unidata
   4757 - bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
   4758 - bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
   4759 - bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
   4760 - bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
   4761 - bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
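The five gennorm2 invocations differ only in output file, input files and the --csource flag, so they can be generated from a small table. The Python sketch below uses hypothetical helper names and only the flags shown above. It assumes that it is run from $ICU_ROOT/dbg with LD_LIBRARY_PATH set as above, and it only prints the commands unless --run is given.

```python
import os
import shlex
import subprocess
import sys


def gennorm2_commands(icu_src_dir):
    """Return the gennorm2 command lines from the list above as argument lists."""
    unidata = os.path.join(icu_src_dir, "source/data/unidata")
    src_data_in = os.path.join(icu_src_dir, "source/data/in")
    norm2 = os.path.join(unidata, "norm2")
    jobs = [  # (output file, input files, extra flags)
        (os.path.join(icu_src_dir, "source/common/norm2_nfc_data.h"), ["nfc.txt"], ["--csource"]),
        (os.path.join(src_data_in, "nfc.nrm"), ["nfc.txt"], []),
        (os.path.join(src_data_in, "nfkc.nrm"), ["nfc.txt", "nfkc.txt"], []),
        (os.path.join(src_data_in, "nfkc_cf.nrm"), ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"], []),
        (os.path.join(src_data_in, "uts46.nrm"), ["nfc.txt", "uts46.txt"], []),
    ]
    return [["bin/gennorm2", "-o", out, "-s", norm2, *inputs, *extra]
            for out, inputs, extra in jobs]


def run_all(icu_src_dir, dry_run=True):
    for cmd in gennorm2_commands(icu_src_dir):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(os.environ["ICU_SRC_DIR"], dry_run="--run" not in sys.argv)
```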
   4762 
   4763 * build ICU (make install)
   4764  so that the tools build can pick up the new definitions from the installed header files.
   4765 
   4766 ~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
   4767 
   4768 * build Unicode tools using CMake+make
   4769 
   4770 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
   4771 
   4772 # Location (--prefix) of where ICU was installed.
   4773 set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
   4774 # Location of the ICU source tree.
   4775 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
   4776 
   4777 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
   4778 ~/svn.icutools/trunk/dbg/unicode/c$ make
   4779 
   4780 * genprops work
   4781 - new code point range for Joining_Group values: 10AC0..10AFF Manichaean
   4782  + add a second array of Joining_Group values, covering at most 10800..10FFF
   4783    icutools: unicode/c/genprops/bidipropsbuilder.cpp
   4784    icu: source/common/ubidi_props.h/.c/_data.h
   4785    icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
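The second Joining_Group array amounts to a lookup over two separate code point ranges. The Python sketch below models that idea with hypothetical names and only two sample values (U+0628 ARABIC LETTER BEH, U+10AC0 MANICHAEAN LETTER ALEPH). The actual storage in ubidi_props and UBiDiProps.java may differ.

```python
from dataclasses import dataclass, field


@dataclass
class JoiningGroupTable:
    """Two dense arrays, each covering one contiguous code point range."""
    start: int
    values: list = field(default_factory=list)
    start2: int = 0x10800
    values2: list = field(default_factory=list)

    def __post_init__(self):
        # The second array is meant to cover at most 10800..10FFF.
        if self.values2 and not (0x10800 <= self.start2
                                 and self.start2 + len(self.values2) <= 0x11000):
            raise ValueError("second range must lie within 10800..10FFF")

    def lookup(self, cp, default="No_Joining_Group"):
        for begin, vals in ((self.start, self.values), (self.start2, self.values2)):
            if begin <= cp < begin + len(vals):
                return vals[cp - begin]
        return default
```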
   4786 
   4787 * generate core properties data files
   4788 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
   4789 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
   4790 - rebuild ICU (make install) & tools
   4791 - run genuca again (see step above) so that it picks up the new nfc.nrm
   4792 - rebuild ICU (make install) & tools
   4793 
   4794 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
   4795  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
   4796 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
   4797 - Unicode 6.0..7.0: U+2260, U+226E, U+226F
   4798 - nothing new in 7.0, no test file to update
   4799 
   4800 * run & fix ICU4C tests
   4801 
   4802 * update Java data files
   4803 - refresh only the UCD-related files, to be safe
   4804 - see (ICU4C)/source/data/icu4j-readme.txt
   4805 - mkdir /tmp/icu4j
   4806 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   4807  output:
   4808    ...
   4809    Unicode .icu files built to ./out/build/icudt53l
   4810    echo timestamp > uni-core-data
   4811    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
   4812    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
   4813    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
   4814    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
   4815    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
   4816    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
   4817    mkdir -p /tmp/icu4j/main/shared/data
   4818    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   4819    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
   4820    mkdir -p /tmp/icu4j/main/shared/data
   4821    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   4822    make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
   4823 - copy the big-endian Unicode data files to another location,
   4824  separate from the other data files
   4825    ICUDT=icudt54b
   4826    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   4827    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   4828    cd ~/svn.icu/uni70/dbg/data/out/icu4j
   4829    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   4830    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   4831    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
   4832    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
   4833    cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   4834    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
   4835 - refresh ICU4J
   4836    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
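The ICUDT value used above follows ICU's data package naming: "icudt", the short version number, and a type letter (l = little-endian ASCII, b = big-endian ASCII, e = big-endian EBCDIC). A tiny Python sketch with a hypothetical helper computes it. Whatever it returns, the directory name has to match the one the build actually produced, as shown in the make output.

```python
# Type letters as used in ICU data package names such as icudt54b.
TYPE_LETTERS = {
    ("little", "ascii"): "l",
    ("big", "ascii"): "b",
    ("big", "ebcdic"): "e",
}


def icudt_name(version_short, byteorder="big", charset="ascii"):
    """Return e.g. 'icudt54b' for version_short '54' on a big-endian ASCII build."""
    return f"icudt{version_short}{TYPE_LETTERS[(byteorder, charset)]}"


if __name__ == "__main__":
    import sys
    # ICU4J gets the big-endian files, hence the default byteorder.
    print(icudt_name(sys.argv[1]))
```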
   4837 
   4838 * update CollationFCD.java
   4839  + copy & paste the initializers of lcccIndex[] etc. from
   4840    ICU4C/source/i18n/collationfcd.cpp to
   4841    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
   4842 
   4843 * refresh Java test .txt files
   4844 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   4845    cd $ICU_SRC_DIR/source/data/unidata
   4846    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
   4847    cd ../../test/testdata
   4848    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
   4849    cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
   4850 
   4851 * UCA
   4852 
   4853 - download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
   4854 - run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
   4855 - update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
   4856 - run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
   4857 - output files are in ~/svn.unitools/Generated/uca/7.0.0/
   4858 - review data; compare files, use blankweights.sed or similar
   4859  ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
   4860 - cd ~/svn.unitools/Generated/uca/7.0.0/
   4861 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   4862  cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
   4863 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   4864    (note removing the underscore before "Rules")
   4865    cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
   4866 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
   4867  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   4868  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
   4869    cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
   4870    cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
   4871    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
   4872 - run genuca, see command line above
   4873 - rebuild ICU4C
   4874 - refresh ICU4J collation data:
   4875  (a subset of the instructions above for the properties data refresh, except that it copies all of coll/*)
   4876    ICUDT=icudt54b
   4877    ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   4878    ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   4879    ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
   4880    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
   4881 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
   4882 - note on intltest: if collate/UCAConformanceTest fails, then
   4883  utility/MultithreadTest/TestCollators will fail as well;
   4884  fix the conformance test before looking into the multi-thread test
   4885 - copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
   4886 - copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
   4887  ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
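The point of blankweights.sed is to diff two FractionalUCA.txt versions while ignoring weight changes, so that only changes in content and order stand out. Below is a Python approximation. It is not the actual sed script: it assumes weights appear in square brackets as hex bytes separated by commas and spaces, for example [29, 05, 05]. The script name in the usage comment is hypothetical.

```python
import difflib
import re
import sys

# Assumed weight syntax: "[29, 05, 05]", possibly several in a row.
WEIGHTS = re.compile(r"\[[0-9A-Fa-f, ]*\]")


def blank_weights(line):
    return WEIGHTS.sub("[]", line)


def order_diff(old_lines, new_lines):
    """Unified diff of two files after blanking all weights."""
    return list(difflib.unified_diff(
        [blank_weights(line) for line in old_lines],
        [blank_weights(line) for line in new_lines],
        lineterm=""))


if __name__ == "__main__":
    # usage: python blankdiff.py old/FractionalUCA.txt new/FractionalUCA.txt
    with open(sys.argv[1], encoding="utf-8") as f_old, open(sys.argv[2], encoding="utf-8") as f_new:
        print("\n".join(order_diff(f_old.read().splitlines(), f_new.read().splitlines())))
```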
   4888 
   4889 * When refreshing all of ICU4J data from ICU4C
   4890 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   4891 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
   4892 or
   4893 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
   4894 
   4895 * run & fix ICU4J tests
   4896 
   4897 *** LayoutEngine script information
   4898 
   4899 (For details see the Unicode 5.2 change log below.)
   4900 
   4901 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
   4902  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
   4903  in the working directory.
   4904  (It also generates ScriptRunData.cpp, which is no longer needed.)
   4905 
   4906  The generated files have a current copyright date and "@stable" statement.
   4907  ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
   4908  for "born stable" Unicode API constants, and so that it no longer parses ICU version numbers,
   4909  which may not contain dots any more.
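The version-number problem is easy to reproduce: a parser that expects "major.minor" breaks on "ICU 54". Below is a Python sketch of a tolerant parser. It is hypothetical; the actual fix lives in ScriptIDModuleWriter.java. It accepts both forms and returns tuples that compare numerically.

```python
import re

# "ICU 4.8" (old style) and "ICU 54" (no dot) both match.
ICU_VERSION = re.compile(r"\bICU (\d+)(?:\.(\d+))?")


def parse_icu_version(text):
    """Return (major, minor) for the first 'ICU n[.m]' in text, or None."""
    match = ICU_VERSION.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)
```

Because the result is a tuple of integers, (4, 8) sorts before (49, 0) and (54, 0).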
   4910 
   4911 - diff current <icu>/source/layout files vs. generated ones
   4912    ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
   4913  review and manually merge desired changes;
   4914  fix gratuitous changes, incorrect @draft/@stable and missing aliases;
   4915  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
   4916 - if you just copy the above files, then
   4917  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
   4918  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
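Fixing mixed line endings after copying the generated files can be done with a short Python sketch like the one below. It assumes LF is the wanted convention for these files; the function names are hypothetical.

```python
import sys
from pathlib import Path


def line_ending_counts(data):
    """Count CRLF, lone CR and lone LF line endings in a bytes object."""
    crlf = data.count(b"\r\n")
    return {"CRLF": crlf, "CR": data.count(b"\r") - crlf, "LF": data.count(b"\n") - crlf}


def normalize_to_lf(data):
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def fix_file(path):
    """Rewrite the file with LF endings; return True if it changed."""
    data = Path(path).read_bytes()
    fixed = normalize_to_lf(data)
    if fixed != data:
        Path(path).write_bytes(fixed)
    return fixed != data


if __name__ == "__main__":
    for name in sys.argv[1:]:
        counts = line_ending_counts(Path(name).read_bytes())
        print(name, counts, "fixed" if fix_file(name) else "ok")
```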
   4919 
   4920 *** API additions
   4921 - send notice to icu-design about new born-@stable API (enum constants etc.)
   4922 
   4923 *** merge the Unicode update branches back onto the trunk
   4924 - do not merge the icudata.jar and testdata.jar,
   4925  instead rebuild them from merged & tested ICU4C
   4926 
   4927 ---------------------------------------------------------------------------- ***
   4928 
   4929 Unicode 6.3 update
   4930 
   4931 http://www.unicode.org/review/pri249/  -- beta review
   4932 http://www.unicode.org/reports/uax-proposed-updates.html
   4933 http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
   4934 http://www.unicode.org/reports/tr44/tr44-11.html
   4935 
   4936 *** ICU Trac
   4937 
   4938 - ticket 10128: update ICU to Unicode 6.3 beta
   4939 - ticket 10168: update ICU to Unicode 6.3 final
   4940 - C++ branches/markus/uni63 at r33552 from trunk at r33551
   4941 - Java branches/markus/uni63 at r33550 from trunk at r33553
   4942 
   4943 - ticket 10142: implement Unicode 6.3 bidi algorithm additions
   4944 
   4945 *** Unicode version numbers
   4946 - makedata.mak
   4947 - uchar.h
   4948  (configure.in & configure have been modified to extract the version from uchar.h)
   4949 - com.ibm.icu.util.VersionInfo
   4950 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
   4951 
   4952 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
   4953  so that the makefiles see the new version number.
   4954 
   4955 *** data files & enums & parser code
   4956 
   4957 * file preparation
   4958 
   4959 - download UCD, UCA & IDNA files
   4960 - make sure that the Unicode data folder passed into preparseucd.py
   4961  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
   4962 - modify preparseucd.py:
   4963  parse new file BidiBrackets.txt
   4964  with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
   4965 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
   4966 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
   4967 - Check test file diffs for previously commented-out, known-failing data lines;
   4968  those probably need to stay commented out.
   4969 
   4970 * PropertyAliases.txt changes
   4971 - 1 new Enumerated Property
   4972  bpt                      ; Bidi_Paired_Bracket_Type
   4973  -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
   4974  -> ubidi_props.h & .c & UBiDiProps.java
   4975  -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
   4976  -> uprops.cpp
   4977  -> change ubidi.icu format version from 2.0 to 2.1
   4978 - 1 new Miscellaneous Property
   4979  bpb                      ; Bidi_Paired_Bracket
   4980  -> uchar.h & UProperty.java
   4981  -> ppucd.h & .cpp
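
BidiBrackets.txt has one semicolon-separated line per bracket character: the code point, its Bidi_Paired_Bracket (bpb) and its Bidi_Paired_Bracket_Type (bpt: o, c or n), followed by a # comment, for example `0028; 0029; o # LEFT PARENTHESIS`. The Python sketch below reads that layout into a dictionary. It is an independent illustration, not the parser in preparseucd.py, and treating a bpb field of `<none>` as "no mapping" is an assumption.

```python
from typing import Dict, Iterable, Optional, Tuple

_VALID_BPT = {"o", "c", "n"}  # Open, Close, None


def parse_bidi_brackets(lines: Iterable[str]) -> Dict[int, Tuple[Optional[int], str]]:
    """Map each code point to (Bidi_Paired_Bracket, Bidi_Paired_Bracket_Type)."""
    result: Dict[int, Tuple[Optional[int], str]] = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comments and blank lines
        if not data:
            continue
        # Exactly three fields are expected; anything else raises ValueError on unpacking.
        cp_field, bpb_field, bpt = (field.strip() for field in data.split(";"))
        if bpt not in _VALID_BPT:
            raise ValueError(f"unexpected bpt value {bpt!r} in line {line!r}")
        # Assumption: a paired bracket of "<none>" means no mapping.
        bpb = None if bpb_field == "<none>" else int(bpb_field, 16)
        result[int(cp_field, 16)] = (bpb, bpt)
    return result
```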

* PropertyValueAliases.txt changes
- 3 Bidi_Paired_Bracket_Type (bpt) values:
 bpt; c                                ; Close
 bpt; n                                ; None
 bpt; o                                ; Open
 -> uchar.h & UCharacter.BidiPairedBracketType
 -> ubidi_props.h & .c & UBiDiProps.java
 -> change ubidi.icu format version from 2.0 to 2.1
- 4 new Bidi_Class (bc) values:
 bc ; FSI                              ; First_Strong_Isolate
 bc ; LRI                              ; Left_To_Right_Isolate
 bc ; RLI                              ; Right_To_Left_Isolate
 bc ; PDI                              ; Pop_Directional_Isolate
 -> uchar.h & UCharacterEnums.ECharacterDirection
 -> until the bidi code gets updated,
    Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
- 3 new Word_Break (WB) values:
 WB ; HL                               ; Hebrew_Letter
 WB ; SQ                               ; Single_Quote
 WB ; DQ                               ; Double_Quote
 -> uchar.h & UCharacter.WordBreak
 -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
 (added 2012-10-16)
 Aghb  239     Caucasian Albanian
 Mahj  314     Mahajani
 -> uscript.h
 -> com.ibm.icu.lang.UScript
   find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
   replace  public static final int \1 = \2;\3
 -> preparseucd.py _scripts_only_in_iso15924
 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
 -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    (not strictly necessary for NOT_ENCODED scripts)
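
The find/replace pair above for com.ibm.icu.lang.UScript can also be run as a script, which helps when several new codes arrive at once. This Python sketch applies the same pattern and replacement line by line; like the original pattern, it expects exactly one space after `=`. The helper names are hypothetical.

```python
import re
from typing import Iterable, List

# Same pattern and replacement as the find/replace for com.ibm.icu.lang.UScript.
_USCRIPT_RE = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
_JAVA_TEMPLATE = r"public static final int \1 = \2;\3"


def uscript_line_to_java(line: str) -> str:
    """Convert one uscript.h enum line to a Java constant; other lines pass through unchanged."""
    return _USCRIPT_RE.sub(_JAVA_TEMPLATE, line)


def uscript_lines_to_java(lines: Iterable[str]) -> List[str]:
    """Convert a sequence of lines (e.g. an open file), stripping trailing newlines."""
    return [uscript_line_to_java(line.rstrip("\n")) for line in lines]
```

For a made-up input, `uscript_line_to_java("    USCRIPT_FOO = 42,/* Fooo */")` returns `"    public static final int FOO = 42;/* Fooo */"`; indentation and the trailing comment are preserved.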

* generate normalization data files
- ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
- ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
- ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
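
Each gennorm2 run above layers more source files on the previous set (nfc.txt, then nfkc.txt, then nfkc_cf.txt), while uts46.txt builds directly on nfc.txt. The Python sketch below records those input lists once and builds the four command lines from them, using only the -o and -s options shown above. It assumes the working directory and LD_LIBRARY_PATH are set up as in the export line; the helper names are hypothetical.

```python
import posixpath
import subprocess
from typing import Dict, List

# Output file -> source files in $UNIDATA/norm2, copied from the command lines above.
NRM_INPUTS: Dict[str, List[str]] = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(src_data_in: str, unidata: str,
                      tool: str = "bin/gennorm2") -> List[List[str]]:
    """Build one argument list per .nrm output file."""
    source_dir = posixpath.join(unidata, "norm2")
    return [
        [tool, "-o", posixpath.join(src_data_in, output), "-s", source_dir, *inputs]
        for output, inputs in NRM_INPUTS.items()
    ]


def run_gennorm2(src_data_in: str, unidata: str) -> None:
    """Run the commands in order, stopping at the first failure (check=True)."""
    for command in gennorm2_commands(src_data_in, unidata):
        subprocess.run(command, check=True)
```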

* build ICU (make install)
 so that the tools build can pick up the new definitions from the installed header files.

~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt

* build Unicode tools using CMake+make

~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
# Location of the ICU source tree.
set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)

~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
~/svn.icutools/trunk/dbg/unicode/c$ make

* generate core properties data files
- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
- rebuild ICU (make install) & tools
- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..6.3: U+2260, U+226E, U+226F
- nothing new in 6.3, no test file to update
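
The grep step above can be narrowed to non-ASCII code points with a short script. This Python sketch parses IdnaMappingTable.txt-style lines (`code point or range ; status ... # comment`) and returns the disallowed_STD3_valid ranges above U+007F, clipping ranges that start in ASCII. It does not attempt to parse ICU's generated uts46.txt; the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def non_ascii_std3_valid(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (start, end) code point ranges above U+007F with status disallowed_STD3_valid."""
    ranges: List[Tuple[int, int]] = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        # Field 0 is either a single code point or a range like 226E..226F.
        start_hex, _, end_hex = fields[0].partition("..")
        start = int(start_hex, 16)
        end = int(end_hex, 16) if end_hex else start
        if end > 0x7F:
            ranges.append((max(start, 0x80), end))
    return ranges
```

An open IdnaMappingTable.txt file object can be passed directly as `lines`; for Unicode 6.0 through 6.3 the notes above list U+2260, U+226E and U+226F as the characters of interest.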

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 output:
   ...
   Unicode .icu files built to ./out/build/icudt52l
   mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
   mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
   echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
   LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
   mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
   jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
   mkdir -p /tmp/icu4j/main/shared/data
   cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
   mkdir -p /tmp/icu4j/main/shared/data
   cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
- copy the big-endian Unicode data files to another location,
 separate from the other data files
   mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
   mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
   ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
   ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
   ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
   ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
   ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
- refresh ICU4J
   ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
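
The ICU4C build writes little-endian data under icudt52l, while the ICU4J steps use the big-endian icudt52b tree. Both names are "icudt" plus the ICU major version plus an endianness letter, so reusing these command lines for another release mostly means changing those names. The Python sketch below is a hypothetical helper that builds the names and retargets a command line. It covers only the l and b letters seen in this log.

```python
import re

# Endianness letters as they appear in the data directory names in this log.
_ENDIAN_LETTER = {"little": "l", "big": "b"}


def icu_data_name(icu_major: int, endianness: str = "little") -> str:
    """E.g. icu_data_name(52) -> 'icudt52l', icu_data_name(52, 'big') -> 'icudt52b'."""
    return f"icudt{icu_major}{_ENDIAN_LETTER[endianness]}"


def icu4j_data_path(icu_major: int) -> str:
    """Package path of the big-endian data inside icudata.jar, as used in the jar commands."""
    return f"com/ibm/icu/impl/data/{icu_data_name(icu_major, 'big')}"


def retarget_command(command: str, old_major: int, new_major: int) -> str:
    """Rewrite icudt<old>l / icudt<old>b names in a command line to the new major version."""
    return re.sub(rf"icudt{old_major}([lb])", rf"icudt{new_major}\1", command)
```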

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files

- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
 (note: the target file name drops the underscore before "Rules")
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
- check test file diffs for previously commented-out, known-failing data lines;
 probably need to keep those commented out
- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
- run genuca, see command line above
- rebuild ICU4C
- refresh ICU4J collation data:
 (subset of instructions above for properties data refresh, except copies all coll/*)
   ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
   ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
   ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
 utility/MultithreadTest/TestCollators will fail as well;
 fix the conformance test before looking into the multi-thread test

* test ICU, fix test code where necessary

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information
- skipped for Unicode 6.3: no new scripts

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
 instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

Unicode 6.2 update

http://www.unicode.org/review/pri230/
http://www.unicode.org/versions/beta-6.2.0.html
http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
http://www.unicode.org/review/pri227/  Changes to Script Extensions Property Values
http://www.unicode.org/review/pri228/  Changing some common characters from Punctuation to Symbol
http://www.unicode.org/review/pri229/  Linebreaking Changes for Pictographic Symbols
http://www.unicode.org/reports/tr46/tr46-8.html  IDNA
http://unicode.org/Public/idna/6.2.0/

*** ICU Trac

- ticket 9515: Unicode 6.2: final ICU update

- ticket 9514: UCA 6.2: fix UCARules.txt

- ticket 9437: update ICU to Unicode 6.2
- C++ branches/markus/uni62 at r32050 from trunk at r32041
- Java branches/markus/uni62 at r32068 from trunk at r32066

*** Unicode version numbers
- makedata.mak
- uchar.h
 (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

*** data files & enums & parser code

* file preparation

- download UCD, UCA & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- modify preparseucd.py: NamesList.txt is now in UTF-8
- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
 probably need to keep those commented out.

* PropertyValueAliases.txt changes
- 1 new Line_Break (lb) value:
 lb ; RI                               ; Regional_Indicator
 -> uchar.h & UCharacter.LineBreak
- 1 new Word_Break (WB) value:
 WB ; RI                               ; Regional_Indicator
 -> uchar.h & UCharacter.WordBreak
- 1 new Grapheme_Cluster_Break (GCB) value:
 GCB; RI                               ; Regional_Indicator
 -> uchar.h & UCharacter.GraphemeClusterBreak

* 3 new numeric values
 The new value -1 (really intended to mean NaN, which would have required
 new UnicodeData.txt syntax) can already be represented as the "fraction" -1/1,
 but encodeNumericValue() in corepropsbuilder.cpp had to be fixed to handle it.
   cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
   cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
 The two new values 216000 and 432000 require an addition to the encoding of numeric values.
   cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
   cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
 -> uprops.h, uchar.c & UCharacterProperty.java
 -> cucdtst.c & UCharacterTest.java
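
A little arithmetic shows why these values need special handling: -1 is the fraction -1/1, while 216000 = 60^3 and 432000 = 2 * 60^3, consistent with the base-60 (sexagesimal) cuneiform numerals. The Python sketch below only demonstrates that decomposition. It is not the actual bit layout in uprops.h, which this log does not describe, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import Optional, Tuple


def as_fraction(value: int) -> Tuple[int, int]:
    """Return (numerator, denominator); e.g. -1 -> (-1, 1)."""
    f = Fraction(value)
    return f.numerator, f.denominator


def as_base60(value: int) -> Optional[Tuple[int, int]]:
    """Return (mantissa, exponent) with value == mantissa * 60**exponent and the
    exponent as large as possible, or None if value is not a positive multiple of 60."""
    if value <= 0 or value % 60 != 0:
        return None
    exponent = 0
    while value % 60 == 0:
        value //= 60
        exponent += 1
    return value, exponent
```

With these helpers, `as_base60(216000)` gives `(1, 3)` and `as_base60(432000)` gives `(2, 3)`.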

* generate normalization data files
- ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
- ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
- ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
 so that the tools build can pick up the new definitions from the installed header files.
* build Unicode tools using CMake+make

* generate core properties data files
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
- in initial bootstrapping, change the UCA version
 in source/data/unidata/FractionalUCA.txt to match the new Unicode version
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
- rebuild ICU (make install) & tools
 + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
   check if the UCA version in FractionalUCA.txt matches the new Unicode version
   (see step above)
- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..6.2: U+2260, U+226E, U+226F
- nothing new in 6.2, no test file to update

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 output:
   ...
   Unicode .icu files built to ./out/build/icudt50l
   mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
   mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
   echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
   LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
   mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
   jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
   mkdir -p /tmp/icu4j/main/shared/data
   cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
   mkdir -p /tmp/icu4j/main/shared/data
   cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
- copy the big-endian Unicode data files to another location,
 separate from the other data files
   mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
   mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
   ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
   ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
   ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
   ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
   ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
- refresh ICU4J
   ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* UCA

- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
 (note: the target file name drops the underscore before "Rules")
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
- check test file diffs for previously commented-out, known-failing data lines;
 probably need to keep those commented out
- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
- run genuca, see command line above
- rebuild ICU4C
- refresh ICU4J collation data:
 (subset of instructions above for properties data refresh, except copies all coll/*)
   ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
   ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
   ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
 utility/MultithreadTest/TestCollators will fail as well;
 fix the conformance test before looking into the multi-thread test

* test ICU, fix test code where necessary

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information
- skipped for Unicode 6.2: no new scripts

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
 instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

Future Unicode update

Tools simplified since the Unicode 6.1 update. See
- https://icu.unicode.org/design/props/ppucd
- http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972

* Unicode version numbers
- icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates

* file preparation
- ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
 probably need to keep those commented out.

* PropertyValueAliases.txt changes
- Script codes that are in ISO 15924 but not in Unicode are now listed in
 preparseucd.py, in the _scripts_only_in_iso15924 variable.
 If there are new ISO codes, then add them.
 If Unicode adds some of them, then remove them from the .py variable.
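
The two maintenance rules for _scripts_only_in_iso15924 amount to a set union followed by a set difference. This Python sketch assumes the variable holds four-letter script codes; the function name is illustrative and not part of preparseucd.py.

```python
from typing import Iterable, List


def update_iso_only_scripts(current: Iterable[str],
                            new_iso_codes: Iterable[str],
                            unicode_codes: Iterable[str]) -> List[str]:
    """Add newly registered ISO 15924 codes, then drop every code that Unicode now encodes.

    Returns a sorted list so the result is easy to paste back into the .py variable."""
    return sorted((set(current) | set(new_iso_codes)) - set(unicode_codes))
```

With made-up inputs, `update_iso_only_scripts(["Khoj", "Tirh"], ["Mahj"], ["Khoj", "Latn"])` returns `["Mahj", "Tirh"]`.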

* UnicodeData.txt changes
- No more manual changes for CJK ranges for algorithmic names;
 those are now written to ppucd.txt and genprops reads them from there.

* generate core properties data files (makeprops.sh was deleted)
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src

* no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
- it is now generated by preparseucd.py

* no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
- it is now generated by preparseucd.py
- make sure that the Unicode data folder passed into preparseucd.py
 includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
 (can be in some subfolder)

* generate normalization data files
- ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
- ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
- ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
* build Unicode tools using CMake+make

* new way to call genuca (makeuca.sh was deleted)
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src

---------------------------------------------------------------------------- ***

Unicode 6.1 update

*** ICU Trac

- ticket 8995 final update to Unicode 6.1
- ticket 8994 regenerate source/layout/CanonData.cpp

- ticket 8961 support Unicode "Age" value *names*
- ticket 8963 support multiple character name aliases & types

- ticket 8827 "update ICU to Unicode 6.1"
- C++ branches/markus/uni61 at r30864 from trunk at r30843
- Java branches/markus/uni61 at r30865 from trunk at r30863

*** Unicode version numbers
- makedata.mak
- uchar.h
 (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo
- icutools/unicode/makedefs.sh
 + also review & update other definitions in that file,
   e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l

*** data files & enums & parser code

* file preparation

~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
- This prepares both unidata and testdata files in respective output subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
 probably need to keep those commented out.

* PropertyValueAliases.txt changes
- 11 new block names:
 Arabic_Extended_A
 Arabic_Mathematical_Alphabetic_Symbols
 Chakma
 Meetei_Mayek_Extensions
 Meroitic_Cursive
 Meroitic_Hieroglyphs
 Miao
 Sharada
 Sora_Sompeng
 Sundanese_Supplement
 Takri
 -> add to uchar.h
 -> add to UCharacter.UnicodeBlock IDs
   Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
           replace  public static final int \1_ID = \2; \3
 -> add to UCharacter.UnicodeBlock objects
   Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
           replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- 1 new Joining_Group (jg) value:
 Rohingya_Yeh
 -> uchar.h & UCharacter.JoiningGroup
- 2 new Line_Break (lb) values:
 CJ=Conditional_Japanese_Starter
 HL=Hebrew_Letter
 -> uchar.h & UCharacter.LineBreak
- 7 new scripts:
 sc ; Cakm      ; Chakma
 sc ; Merc      ; Meroitic_Cursive
 sc ; Mero      ; Meroitic_Hieroglyphs
 sc ; Plrd      ; Miao
 sc ; Shrd      ; Sharada
 sc ; Sora      ; Sora_Sompeng
 sc ; Takr      ; Takri
 -> remove these from SyntheticPropertyValueAliases.txt
 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
 (added 2011-06-21)
 Khoj        322     Khojki
 Tirh        326     Tirhuta
   and another one added 2011-12-09
 Hluw        080     Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
 -> uscript.h
 -> com.ibm.icu.lang.UScript
   find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
   replace  public static final int \1 = \2;\3
 -> SyntheticPropertyValueAliases.txt
 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
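
The two Eclipse find/replace pairs for UCharacter.UnicodeBlock can be scripted as well. This Python sketch reuses the patterns and replacements verbatim, so it expects uchar.h lines of the form `UBLOCK_NAME = number, /*comment*/` with single spaces; the function names are hypothetical.

```python
import re

# Patterns and replacements copied from the two Eclipse find/replace steps above.
_BLOCK_ID_RE = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
_BLOCK_ID_JAVA = r"public static final int \1_ID = \2; \3"
_BLOCK_OBJ_RE = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
_BLOCK_OBJ_JAVA = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'


def block_id_line(c_line: str) -> str:
    """uchar.h UBLOCK_ line -> Java int constant for the UnicodeBlock IDs."""
    return _BLOCK_ID_RE.sub(_BLOCK_ID_JAVA, c_line)


def block_object_line(c_line: str) -> str:
    """uchar.h UBLOCK_ line -> Java UnicodeBlock object declaration."""
    return _BLOCK_OBJ_RE.sub(_BLOCK_OBJ_JAVA, c_line)
```

For a made-up line `UBLOCK_FOO_BAR = 123, /*[12345]*/`, the first function yields `public static final int FOO_BAR_ID = 123; /*[12345]*/` and the second yields `public static final UnicodeBlock FOO_BAR = new UnicodeBlock("FOO_BAR", FOO_BAR_ID); /*[12345]*/`. Lines that do not match pass through unchanged.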

* UnicodeData.txt changes
- the last Unihan code point changes from U+9FCB to U+9FCC
 search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
 + do change gennames.c
 + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
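
The search above is a plain case-insensitive regular expression, so `grep -rniE '9FC[BC]'` over the source trees is enough. For a scripted variant that reports file names and line numbers, here is a Python sketch; the list of file suffixes is an assumption about which files matter, and the function names are hypothetical.

```python
import os
import re
from typing import Iterable, Iterator, Tuple

# Old end (U+9FCB) and new limit (U+9FCC) of the Unihan range, in any letter case.
_UNIHAN_END_RE = re.compile(r"9FC[BC]", re.IGNORECASE)


def matching_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (1-based line number, line) for lines that mention 9FCB or 9FCC."""
    for lineno, line in enumerate(lines, 1):
        if _UNIHAN_END_RE.search(line):
            yield lineno, line.rstrip("\n")


def find_unihan_limit_refs(root: str,
                           suffixes: Tuple[str, ...] = (".c", ".cpp", ".h", ".java"),
                           ) -> Iterator[Tuple[str, int, str]]:
    """Walk `root` and yield (path, line number, line) for each match in source files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, line in matching_lines(f):
                        yield path, lineno, line
```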

* DerivedBidiClass.txt changes
- 2 new default-AL blocks:
#     Arabic Extended-A: U+08A0  -  U+08FF  (was default-R)
#     Arabic Mathematical Alphabetic Symbols:
#                       U+1EE00  - U+1EEFF  (was default-R)
- 2 new default-R blocks:
#     Meroitic Hieroglyphs:
#                        U+10980 - U+1099F
#     Meroitic Cursive:  U+109A0 - U+109FF
 -> should be picked up by the explicit data in the file

* NameAliases.txt changes
- from
   # Each line has two fields
   # First field: Code point
   # Second field: Alias
- to
   # Each line has three fields, as described here:
   #
   # First field:  Code point
   # Second field: Alias
   # Third field:  Type
- Also, the file format previously allowed multiple aliases per code point, but only now
 does the file actually provide several, even several of the same type. For example,
   FEFF;BYTE ORDER MARK;alternate
   FEFF;BOM;abbreviation
   FEFF;ZWNBSP;abbreviation
- This breaks our gennames parser, unames.icu data structure, and API.
 Fix gennames to only pick up "correction" aliases.
 New ticket #8963 for further changes.
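
Parsing the new three-field format needs only one more field, and the gennames fix described above then reduces to filtering on the type. This Python sketch illustrates that filter; it is not the gennames implementation, and it assumes at most one correction alias per code point (raising an error otherwise), which is an assumption rather than a documented guarantee of NameAliases.txt.

```python
from typing import Dict, Iterable, List, Tuple


def parse_name_aliases(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Parse NameAliases.txt lines of the form 'code point;alias;type'."""
    entries: List[Tuple[int, str, str]] = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        # Old-style two-field lines raise ValueError here, which is intentional.
        cp, alias, alias_type = (field.strip() for field in data.split(";"))
        entries.append((int(cp, 16), alias, alias_type))
    return entries


def correction_aliases(entries: Iterable[Tuple[int, str, str]]) -> Dict[int, str]:
    """Keep only aliases of type 'correction', one per code point (assumed)."""
    result: Dict[int, str] = {}
    for cp, alias, alias_type in entries:
        if alias_type != "correction":
            continue
        if cp in result:
            raise ValueError(f"more than one correction alias for U+{cp:04X}")
        result[cp] = alias
    return result
```

Applied to the three FEFF lines above, correction_aliases() returns an empty dictionary, since none of them has the type "correction".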

* run genpname/preparse.pl (on Linux)
 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
 + make sure that data.h is writable
 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
 + preparse.pl shows no errors, out.txt Info and Warning lines look ok

* build ICU (make install)
 so that the tools build can pick up the new definitions from the installed header files.
* build Unicode tools (at least genpname) using CMake+make

* run genpname
 (builds both pnames.icu and propname_data.h)
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource

* build ICU (make install)
* build Unicode tools using CMake+make

* update source/data/unidata/norm2/nfkc_cf.txt
- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt

* update source/data/unidata/norm2/uts46.txt
- download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
 to ~/svn.icu/tools/trunk/src/unicode/py
- adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..6.1: U+2260, U+226E, U+226F
- nothing new in 6.1, no test file to update

* generate core properties data files
- in initial bootstrapping, change the UCA version
 in source/data/unidata/FractionalUCA.txt to match the new Unicode version
- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools
 + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
   5525    check if the UCA version in FractionalUCA.txt matches the new Unicode version
   5526    (see step above)
   5527 - run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
   5528  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
   5529 - rebuild ICU & tools
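The UCA-version check in the genrb note above can be sketched in Python. This assumes that FractionalUCA.txt states its version on a line of the form `[UCA version = X.Y.Z]`. The function names are made up for illustration and are not part of the ICU tools.

```python
import re

# Assumption: the version appears on a line such as "[UCA version = 6.1.0]".
UCA_VERSION_RE = re.compile(r"^\[UCA version\s*=\s*([0-9.]+)\]")


def uca_version(lines):
    """Return the first UCA version string found, or None."""
    for line in lines:
        m = UCA_VERSION_RE.match(line.strip())
        if m:
            return m.group(1)
    return None


def check_uca_version(lines, unicode_version):
    """Raise ValueError if the file's UCA version differs from unicode_version."""
    found = uca_version(lines)
    if found != unicode_version:
        raise ValueError(
            f"FractionalUCA.txt has UCA version {found}, expected {unicode_version}")
```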
   5530 
   5531 * update Java data files
   5532  - refresh only the UCD-related files, to be safe
   5533 - see (ICU4C)/source/data/icu4j-readme.txt
   5534 - mkdir /tmp/icu4j
   5535 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   5536  output:
   5537    ...
   5538    Unicode .icu files built to ./out/build/icudt49l
   5539    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
   5540    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
   5541    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
   5542    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
   5543    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
   5544    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
   5545    mkdir -p /tmp/icu4j/main/shared/data
   5546    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   5547    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
   5548    mkdir -p /tmp/icu4j/main/shared/data
   5549    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
   5550    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
   5551 - copy the big-endian Unicode data files to another location,
   5552  separate from the other data files
   5553    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
   5554    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
   5555    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
   5556    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
   5557    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
   5558    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
   5559    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
   5560 - refresh ICU4J
   5561    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
   5562 
   5563 * refresh Java test .txt files
   5564 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   5565 
   5566 * test ICU so far, fix test code where necessary
   5567 - temporarily ignore collation issues that look like UCA/UCD mismatches,
   5568  until UCA data is updated
   5569 
   5570 * UCA
   5571 
   5572 - get output from Mark's tools; look in
   5573    http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
   5574 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   5575 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   5576  (note: the underscore before "Rules" is dropped in the target file name)
   5577 - update (ICU)/source/test/testdata/CollationTest_*.txt
   5578  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   5579  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
   5580 - check test file diffs for previously commented-out, known-failing data lines;
   5581  probably need to keep those commented out
   5582 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
   5583 - run makeuca.sh:
   5584  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
   5585 - rebuild ICU4C
   5586 - refresh ICU4J collation data:
   5587  (subset of instructions above for properties data refresh, except copies all coll/*)
   5588    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   5589    ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
   5590    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
   5591    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
   5592 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
   5593 - note on intltest: if collate/UCAConformanceTest fails, then
   5594  utility/MultithreadTest/TestCollators will fail as well;
   5595  fix the conformance test before looking into the multi-thread test
   5596 
   5597 * When refreshing all of ICU4J data from ICU4C
   5598 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   5599 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
   5600 or
   5601 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
   5602 
   5603 *** LayoutEngine script information
   5604 
   5605 (For details see the Unicode 5.2 change log below.)
   5606 
   5607 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
   5608  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
   5609  in the working directory.
   5610  (It also generates ScriptRunData.cpp, which is no longer needed.)
   5611 
   5612  The generated files have a current copyright date and "@draft" statement.
   5613 
   5614 - diff current <icu>/source/layout files vs. generated ones
   5615    ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
   5616  review and manually merge desired changes;
   5617  fix gratuitous changes, incorrect @draft and missing aliases;
   5618  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
   5619 - if you just copy the above files, then
   5620  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
   5621  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
   5622 
   5623 *** merge the Unicode update branches back onto the trunk
   5624 - do not merge the icudata.jar and testdata.jar,
   5625  instead rebuild them from merged & tested ICU4C
   5626 
   5627 ---------------------------------------------------------------------------- ***
   5628 
   5629 ICU 4.8 (no Unicode update, just new script codes)
   5630 
   5631 * 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
   5632  (added 2010-12-21)
   5633    Afak    439     Afaka
   5634    Jurc    510     Jurchen
   5635    Mroo    199     Mro, Mru
   5636    Nshu    499     Nüshu
   5637    Shrd    319     Sharada, Śāradā
   5638    Sora    398     Sora Sompeng
   5639    Takr    321     Takri, Ṭākrī, Ṭāṅkrī
   5640    Tang    520     Tangut
   5641    Wole    480     Woleai
   5642  -> uscript.h
   5643  -> com.ibm.icu.lang.UScript
   5644    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
   5645    replace  public static final int \1 = \2;\3
   5646  -> genpname/SyntheticPropertyValueAliases.txt
   5647  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
   5648      and in com.ibm.icu.dev.test.lang.TestUScript.java
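For reference, the same find/replace can also be run outside Eclipse. Below is a Python port of the pattern above, offered as a sketch. The sample USCRIPT_ line and its numeric value are illustrative only.

```python
import re

# Same pattern as the Eclipse find expression above.
USCRIPT_RE = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")


def uscript_to_java(line):
    """Turn a uscript.h enum line into a UScript.java constant line."""
    return USCRIPT_RE.sub(r"public static final int \1 = \2;\3", line)
```

Lines that do not match the pattern, such as API doc comments, are returned unchanged.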
   5649 
   5650 * run genpname/preparse.pl (on Linux)
   5651  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
   5652  + make sure that data.h is writable
   5653  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
   5654  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
   5655 
   5656 * rebuild Unicode tools (at least genpname) using make
   5657 - You might first need to "make install" ICU so that the tools build can pick
   5658  up the new definitions from the installed header files.
   5659 
   5660 * run genpname
   5661  (builds both pnames.icu and propname_data.h)
   5662 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
   5663 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
   5664 - rebuild ICU & tools
   5665 
   5666 * run genprops
   5667 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
   5668 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
   5669 - rebuild ICU & tools
   5670 
   5671 * update Java data files
   5672  - refresh only the UCD-related files, to be safe
   5673 - see (ICU4C)/source/data/icu4j-readme.txt
   5674 - mkdir /tmp/icu4j
   5675 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   5676 - copy the big-endian Unicode data files to another location,
   5677  separate from the other data files
   5678    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
   5679    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
   5680    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
   5681 - refresh ICU4J
   5682    ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
   5683 
   5684 * should have updated the layout engine script codes but forgot
   5685 
   5686 ---------------------------------------------------------------------------- ***
   5687 
   5688 Unicode 6.0 update
   5689 
   5690 *** related ICU Trac tickets
   5691 
   5692 7264 Unicode 6.0 Update
   5693 
   5694 *** Unicode version numbers
   5695 - makedata.mak
   5696 - uchar.h
   5697  (configure.in & configure have been modified to extract the version from uchar.h)
   5698 - com.ibm.icu.util.VersionInfo
   5699 
   5700 *** data files & enums & parser code
   5701 
   5702 * file preparation
   5703 
   5704 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
   5705 - This now prepares both unidata and testdata files in respective output subfolders.
   5706 
   5707 * PropertyAliases.txt changes
   5708 - new Script_Extensions property defined in the new ScriptExtensions.txt file
   5709  but not listed in PropertyAliases.txt; reported to unicode.org;
   5710  -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
   5711    scx; Script_Extensions
   5712  -> uchar.h with new UProperty section
   5713  -> com.ibm.icu.lang.UProperty, parallel with uchar.h
   5714 
   5715 * PropertyValueAliases.txt changes
   5716 - 12 new block names:
   5717  Alchemical_Symbols
   5718  Bamum_Supplement
   5719  Batak
   5720  Brahmi
   5721  CJK_Unified_Ideographs_Extension_D
   5722  Emoticons
   5723  Ethiopic_Extended_A
   5724  Kana_Supplement
   5725  Mandaic
   5726  Miscellaneous_Symbols_And_Pictographs
   5727  Playing_Cards
   5728  Transport_And_Map_Symbols
   5729  -> add to uchar.h
   5730  -> add to UCharacter.UnicodeBlock
   5731    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
   5732            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
   5733 - Joining_Group (jg) values:
   5734  Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
   5735  -> uchar.h & UCharacter.JoiningGroup
   5736 - 3 new scripts:
   5737  sc ; Batk      ; Batak
   5738  sc ; Brah      ; Brahmi
   5739  sc ; Mand      ; Mandaic
   5740  -> remove these from SyntheticPropertyValueAliases.txt
   5741  -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
   5742  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
   5743      and in com.ibm.icu.dev.test.lang.TestUScript.java
   5744 - 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
   5745  (added 2009-11-11..2010-07-18)
   5746  Bass        259     Bassa Vah
   5747  Dupl        755     Duployan shorthand
   5748  Elba        226     Elbasan
   5749  Gran        343     Grantha
   5750  Kpel        436     Kpelle
   5751  Loma        437     Loma
   5752  Mend        438     Mende
   5753  Merc        101     Meroitic Cursive
   5754  Narb        106     Old North Arabian
   5755  Nbat        159     Nabataean
   5756  Palm        126     Palmyrene
   5757  Sind        318     Sindhi
   5758  Wara        262     Warang Citi
   5759  -> uscript.h
   5760  -> com.ibm.icu.lang.UScript
   5761    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
   5762    replace  public static final int \1 = \2;\3
   5763  -> SyntheticPropertyValueAliases.txt
   5764  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
   5765      and in com.ibm.icu.dev.test.lang.TestUScript.java
   5766 - ISO 15924 name change
   5767  Mero        100     Meroitic Hieroglyphs (was Meroitic)
   5768  -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
   5769 - property value alias added for Cham (it had already been moved out of SyntheticPropertyValueAliases.txt)
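The Eclipse UnicodeBlock pattern above can likewise be applied with Python's `re` module. This is a sketch. It only produces the class instances, so the matching `_ID` constants still have to be added separately. The sample block value is illustrative.

```python
import re

# Same pattern as the Eclipse find expression above.
UBLOCK_RE = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")


def ublock_to_java(line):
    """Turn a uchar.h UBLOCK_ line into a UCharacter.UnicodeBlock instance line."""
    return UBLOCK_RE.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2', line)
```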
   5770 
   5771 * UnicodeData.txt changes
   5772 - new CJK block:
   5773  2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
   5774  2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
   5775  -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
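Range pairs such as the new CJK Extension D entries can be collected from UnicodeData.txt before gennames.c is edited. This is a sketch that assumes each `<..., First>` entry is matched by a `<..., Last>` entry with the same name before the next range starts. It does not generate any gennames.c code.

```python
def unicodedata_ranges(lines):
    """Collect (first, last, name) tuples for <..., First>/<..., Last> pairs."""
    ranges = []
    pending = None  # (first code point, range name) awaiting its Last line
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) < 2:
            continue
        cp = int(fields[0], 16)
        name = fields[1]
        if name.endswith(", First>"):
            pending = (cp, name[1:-len(", First>")])
        elif name.endswith(", Last>"):
            base = name[1:-len(", Last>")]
            if pending is None or pending[1] != base:
                raise ValueError(f"unmatched range end at {cp:04X}")
            ranges.append((pending[0], cp, base))
            pending = None
    return ranges
```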
   5776 
   5777 * build Unicode tools using CMake+make
   5778 
   5779 * run genpname/preparse.pl (on Linux)
   5780  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
   5781  + make sure that data.h is writable
   5782  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
   5783  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
   5784 
   5785 * rebuild Unicode tools (at least genpname) using make
   5786 - You might first need to "make install" ICU so that the tools build can pick
   5787  up the new definitions from the installed header files.
   5788 
   5789 * run genpname
   5790 - ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
   5791 - rebuild ICU & tools
   5792 
   5793 * update source/data/unidata/norm2/nfkc_cf.txt
   5794 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
   5795 
   5796 * update source/data/unidata/norm2/uts46.txt
   5797 - download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
   5798  to ~/svn.icu/tools/trunk/src/unicode/py
   5799 - adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
   5800 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
   5801 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
   5802 
   5803 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
   5804  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
   5805 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
   5806 - Unicode 6.0: U+2260, U+226E, U+226F
   5807 
   5808 * generate core properties data files
   5809 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
   5810 - rebuild ICU & tools
   5811 - run makeuca.sh so that genuca picks up the new nfc.nrm:
   5812  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
   5813 - rebuild ICU & tools
   5814 
   5815 * implement new Script_Extensions property (provisional)
   5816 - parser & generator: genprops & uprops.icu
   5817 - uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
   5818 - UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
   5819 
   5820 * switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
   5821 - (one-time change)
   5822 - genbidi/gencase/genprops tools changes
   5823 - re-run makeprops.sh (see above)
   5824 - UCharacterProperty.java, UCharacterTypeIterator.java,
   5825  UBiDiProps.java, UCaseProps.java, and several others with minor changes;
   5826  UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
   5827 
   5828 * update Java data files
   5829  - refresh only the UCD-related files, to be safe
   5830 - see (ICU4C)/source/data/icu4j-readme.txt
   5831 - mkdir /tmp/icu4j
   5832 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   5833  output:
   5834    ...
   5835    Unicode .icu files built to ./out/build/icudt45l
   5836    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
   5837    echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
   5838    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
   5839    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
   5840    mkdir -p /tmp/icu4j/main/shared/data
   5841    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
   5842 - copy the big-endian Unicode data files to another location,
   5843  separate from the other data files
   5844    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
   5845    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
   5846    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
   5847    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
   5848    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
   5849    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
   5850    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
   5851 - refresh ICU4J
   5852    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
   5853 
   5854 * refresh Java test .txt files
   5855 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
   5856 
   5857 * un-hardcode normalization skippable (NF*_Inert) test data
   5858 - removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
   5859 
   5860 * copy updated break iterator test files
   5861 - now handled by the earlier ucdcopy.py step and by
   5862  copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
   5863  (old instructions:
   5864   copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
   5865   to ~/svn.icu/trunk/src/source/test/testdata)
   5866 - they are not used in ICU4J
   5867 
   5868 * UCA
   5869 
   5870 - get output from Mark's tools; look in
   5871    http://www.unicode.org/~book/incoming/mark/uca6.0.0/
   5872    http://www.macchiato.com/unicode/utc/additional-uca-files
   5873    http://www.unicode.org/Public/UCA/6.0.0/
   5874    http://www.unicode.org/~mdavis/uca/
   5875 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
   5876 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
   5877 - update Han-implicit ranges for new CJK extensions:
   5878  swapCJK() in ucol.cpp & ImplicitCEGenerator.java
   5879 - genuca: allow bytes 02 for U+FFFE, new merge-sort character;
   5880  do not add it into invuca so that tailoring primary-after an ignorable works
   5881 - genuca: permit space between [variable top] bytes
   5882 - ucol.cpp: treat noncharacters like unassigned rather than ignorable
   5883 - run makeuca.sh:
   5884  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
   5885 - rebuild ICU4C
   5886 - refresh ICU4J collation data:
   5887  (subset of instructions above for properties data refresh, except copies all coll/*)
   5888    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   5889    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
   5890    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
   5891    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
   5892 - update (ICU)/source/test/testdata/CollationTest_*.txt
   5893  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
   5894  with output from Mark's Unicode tools
   5895 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
   5896 - note on intltest: if collate/UCAConformanceTest fails, then
   5897  utility/MultithreadTest/TestCollators will fail as well;
   5898  fix the conformance test before looking into the multi-thread test
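To sanity-check which code points count as Han when the Han-implicit ranges change, here is the UTS #10 implicit-weight formula in Python. This is the generic UCA rule, not ICU's swapCJK() or ImplicitCEGenerator code (ICU uses its own byte encoding). The range tables are hardcoded from the Unicode 6.0 Unified_Ideograph data purely for illustration.

```python
# Unified_Ideograph ranges as of Unicode 6.0, hardcoded for illustration.
CORE_HAN = [  # CJK Unified Ideographs block plus the unified compatibility ideographs
    (0x4E00, 0x9FCB), (0xFA0E, 0xFA0F), (0xFA11, 0xFA11), (0xFA13, 0xFA14),
    (0xFA1F, 0xFA1F), (0xFA21, 0xFA21), (0xFA23, 0xFA24), (0xFA27, 0xFA29),
]
OTHER_HAN = [  # Extensions A, B, C and D
    (0x3400, 0x4DB5), (0x20000, 0x2A6D6), (0x2A700, 0x2B734), (0x2B740, 0x2B81D),
]


def in_ranges(cp, ranges):
    return any(lo <= cp <= hi for lo, hi in ranges)


def implicit_primaries(cp):
    """Return the two implicit primary weights (AAAA, BBBB) per UTS #10."""
    if in_ranges(cp, CORE_HAN):
        base = 0xFB40
    elif in_ranges(cp, OTHER_HAN):
        base = 0xFB80
    else:
        base = 0xFBC0  # all other code points, including unassigned ones
    aaaa = base + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return aaaa, bbbb
```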
   5899 
   5900 * When refreshing all of ICU4J data from ICU4C
   5901 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
   5902 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
   5903 or
   5904 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
   5905 
   5906 *** LayoutEngine script information
   5907 
   5908 (For details see the Unicode 5.2 change log below.)
   5909 
   5910 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
   5911 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
   5912 ScriptRunData.cpp, which is no longer needed.)
   5913 
   5914 The generated files have a current copyright date and "@draft" statement.
   5915 
   5916 * copy the above files into <icu>/source/layout, replacing the old files.
   5917 * fix mixed line endings
   5918 * review the diffs and fix incorrect @draft and missing aliases;
   5919  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
   5920 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
   5921 
   5922 ---------------------------------------------------------------------------- ***
   5923 
   5924 Unicode 5.2 update
   5925 
   5926 *** related ICU Trac tickets
   5927 
   5928 7084 Unicode 5.2
   5929 
   5930 7167 verify collation bytes
   5931 7235 Java test NAME_ALIAS
   5932 7236 Java DerivedCoreProperties.txt test
   5933 7237 Java BidiTest.txt
   5934 7238 UTrie2 in core unidata
   5935 7239 test for tailoring gaps
   5936 7240 Java fix CollationMiscTest
   5937 7243 update layout engine for Unicode 5.2
   5938 
   5939 *** Unicode version numbers
   5940 - makedata.mak
   5941 - uchar.h
   5942 - configure.in & configure
   5943 - update ucdVersion in gennames.c if an algorithmic range changes
   5944 
   5945 *** data files & enums & parser code
   5946 
   5947 * file preparation
   5948 
   5949 python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
   5950 - includes finding files regardless of version numbers,
   5951  copying them, and performing the equivalent processing of the
   5952  ucdstrip and ucdmerge tools on the desired set of files
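Below is a sketch of the "regardless of version numbers" file-name matching. It assumes that draft files are named like `GraphemeBreakTest-6.0.0d3.txt` and released files like `GraphemeBreakTest.txt`. The regex and function name are hypothetical and are not taken from ucdcopy.py.

```python
import re

# Hypothetical pattern: "<base>-X.Y.Z.txt" or "<base>-X.Y.ZdN.txt" (draft number).
VERSIONED = re.compile(r"^(?P<base>.+?)-\d+\.\d+\.\d+(?:d\d+)?\.txt$")


def unversioned_name(filename):
    """Map a versioned UCD file name to its unversioned form; leave others alone."""
    m = VERSIONED.match(filename)
    return f"{m.group('base')}.txt" if m else filename
```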
   5953 
   5954 * notes on changes
   5955 - PropertyAliases.txt
   5956  moved from numeric to enumerated:
   5957    ccc       ; Canonical_Combining_Class
   5958  new string properties:
   5959    NFKC_CF   ; NFKC_Casefold
   5960    Name_Alias; Name_Alias
   5961  new binary properties:
   5962    Cased     ; Cased
   5963    CI        ; Case_Ignorable
   5964    CWCF      ; Changes_When_Casefolded
   5965    CWCM      ; Changes_When_Casemapped
   5966    CWKCF     ; Changes_When_NFKC_Casefolded
   5967    CWL       ; Changes_When_Lowercased
   5968    CWT       ; Changes_When_Titlecased
   5969    CWU       ; Changes_When_Uppercased
   5970  new CJK Unihan properties (not supported by ICU)
   5971 - PropertyValueAliases.txt
   5972  new block names
   5973  new scripts
   5974  one script code change:
   5975    sc ; Qaai      ; Inherited
   5976    ->
   5977    sc ; Zinh      ; Inherited                        ; Qaai
   5978  new Line_Break (lb) value:
   5979    lb ; CP        ; Close_Parenthesis
   5980  new Joining_Group (jg) values: Farsi_Yeh, Nya
   5981  other new values:
   5982    ccc; 214; ATA  ; Attached_Above
   5983 - DerivedBidiClass.txt
   5984  new default-R range: U+1E800 - U+1EFFF
   5985 - UnicodeData.txt
   5986  all of the ISO comments are gone
   5987  new CJK block end:
   5988    9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
   5989  new CJK block:
   5990    2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
   5991    2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
   5992 
   5993 * genpname
   5994 - run preparse.pl
   5995  + cd \svn\icuproj\icu\trunk\source\tools\genpname
   5996  + make sure that data.h is writable
   5997  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
   5998  + preparse.pl complains with errors like the following:
   5999      Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
   6000    This is because ICU 4.0 had scripts from ISO 15924 which are now
   6001    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
   6002    and PropertyValueAliases.txt.
   6003    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
   6004       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
   6005  + preparse.pl complains with errors about block names missing from uchar.h; add them
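Duplicate script entries like the ones above can be found before running preparse.pl. This sketch assumes that both files use the `sc ; <short> ; <long>` line format of PropertyValueAliases.txt. The helper names are hypothetical.

```python
def parse_sc_aliases(lines):
    """Map short script code -> long name from 'sc ; Short ; Long ...' lines."""
    aliases = {}
    for line in lines:
        data = line.split("#", 1)[0]  # drop comments
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            aliases[fields[1]] = fields[2]
    return aliases


def duplicate_scripts(synthetic_lines, pva_lines):
    """Short script codes defined in both the synthetic and the official file."""
    synthetic = parse_sc_aliases(synthetic_lines)
    official = parse_sc_aliases(pva_lines)
    return sorted(set(synthetic) & set(official))
```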
   6006 
   6007 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
   6008 - new block & script values
   6009  + 26 new blocks
   6010    copy new blocks from Blocks.txt
   6011    MS VC++ 2008 regular expression:
   6012      find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
   6013      replace with "    UBLOCK_\3 = 172, /*[\1]*/"
   6014  + several new script values already added in ICU 4.0 for ISO 15924 coverage
   6015    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
   6016  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
   6017  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
   6018    (added to SyntheticPropertyValueAliases.txt)
   6019 - new Joining Group (JG) values: Farsi_Yeh, Nya
   6020 - new Line_Break (lb) value:
   6021    lb ; CP        ; Close_Parenthesis
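The VC++ replace pattern above hardcodes 172 and keeps Blocks.txt's mixed-case names, so each generated line needs hand editing. The following Python sketch numbers the new constants sequentially and derives identifiers by uppercasing names and turning spaces and hyphens into underscores. That naming rule is an assumption, so compare the output against existing uchar.h names, and take the first value from the current UBLOCK_COUNT.

```python
import re

# Same shape as the VC++ find expression: start..end; Block Name
BLOCK_RE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def ublock_enum_lines(blocks_lines, first_value):
    """Generate numbered UBLOCK_ enum lines from Blocks.txt lines."""
    out = []
    value = first_value
    for line in blocks_lines:
        m = BLOCK_RE.match(line.strip())
        if not m:
            continue  # comments, blank lines
        start, _end, name = m.groups()
        ident = re.sub(r"[ -]+", "_", name.strip()).upper()  # assumed naming rule
        out.append(f"    UBLOCK_{ident} = {value}, /*[{start}]*/")
        value += 1
    return out
```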
   6022 
   6023 * hardcoded Unihan range end/limit
   6024 - Unihan range end moves from 9FC3 to 9FCB
   6025  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
   6026  + do change gennames.c
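The 9FC[34] search can also be scripted. In this Python sketch, the list of file suffixes is an assumption, and every hit still needs manual review, since not every match is a Unihan bound.

```python
import os
import re

# The hardcoded Unihan end (9FC3) or limit (9FC4), case-insensitive.
UNIHAN_END_OR_LIMIT = re.compile(r"9FC[34]", re.IGNORECASE)


def matching_lines(lines):
    """Yield (line_number, text) for lines that mention 9FC3 or 9FC4."""
    for lineno, line in enumerate(lines, 1):
        if UNIHAN_END_OR_LIMIT.search(line):
            yield lineno, line.rstrip("\n")


def find_unihan_limits(root, suffixes=(".c", ".cpp", ".h", ".java", ".txt")):
    """Walk a source tree and collect (path, line_number, text) hits."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                hits.extend((path, n, text) for n, text in matching_lines(f))
    return hits
```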
   6027 
   6028 * Compare definitions of new binary properties with what we used to use
   6029  in algorithms, to see if the definitions changed.
   6030 - Verified that definitions for Cased and Case_Ignorable are unchanged.
   6031  The gencase tool now parses the newly public Case_Ignorable values
   6032  in case the definition changes in the future.
   6033 
   6034 * uchar.c & uprops.h & uprops.c & genprops
   6035 - new numeric values that didn't exist in Unicode data before:
   6036    1/7, 1/9, 1/10, 3/10, 1/16, 3/16
   6037  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
   6038  therefore redesign the encoding of numeric types and values for formatVersion 6;
   6039  design for simple numbers up to at least 144 ("one gross"),
   6040  large values up to at least 10^20,
   6041  and fractions with numerators -1..17 and denominators 1..16
   6042  to cover current and expected future values
   6043  (e.g., more Han numeric values, Meroitic twelfths)
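The following Python sketch, based on `fractions.Fraction`, checks whether new numeric values fit the design targets listed above. Only the ranges come from this log. The "mantissa 1..9 times a power of ten" form for large values is a hypothetical stand-in, not the actual formatVersion 6 encoding.

```python
from fractions import Fraction

# Design targets quoted in the log; the bit layout itself is not modeled here.
MAX_SIMPLE = 144
MAX_LARGE_EXPONENT = 20
NUMERATORS = range(-1, 18)    # -1..17
DENOMINATORS = range(1, 17)   # 1..16


def design_category(value):
    """Return 'simple', 'large', 'fraction', or None if outside the targets."""
    v = Fraction(value)
    if v.denominator == 1:
        n = v.numerator
        if 0 <= n <= MAX_SIMPLE:
            return "simple"
        # Hypothetical large-value form: mantissa 1..9 times 10**exponent.
        exponent = 0
        while n > 9 and n % 10 == 0:
            n //= 10
            exponent += 1
        if 1 <= n <= 9 and exponent <= MAX_LARGE_EXPONENT:
            return "large"
        return None
    if v.numerator in NUMERATORS and v.denominator in DENOMINATORS:
        return "fraction"
    return None
```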
   6044 
   6045 * reimplement Hangul_Syllable_Type for new Jamo characters
   6046 - the old code assumed that all Jamo characters are in the 11xx block
   6047 - Unicode 5.2 fills holes there and adds new Jamo characters in
   6048    A960..A97F; Hangul Jamo Extended-A
   6049  and in
   6050    D7B0..D7FF; Hangul Jamo Extended-B
   6051 - Hangul_Syllable_Type can be trivially derived from a subset of
   6052  Grapheme_Cluster_Break values
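The derivation can be sketched in Python: the Grapheme_Cluster_Break values L, V, T, LV and LVT map directly onto the Hangul_Syllable_Type values of the same names, and everything else is NA. The parser assumes GraphemeBreakProperty.txt's `<range> ; <value> # comment` line format. The function names are hypothetical.

```python
# Grapheme_Cluster_Break values that correspond one-to-one to Hangul_Syllable_Type.
GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hst_ranges(gcb_lines):
    """Extract (start, end, hst) ranges from GraphemeBreakProperty.txt lines."""
    ranges = []
    for line in gcb_lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        cps, value = (f.strip() for f in data.split(";"))
        if value in GCB_TO_HST:
            bounds = cps.split("..")
            ranges.append((int(bounds[0], 16), int(bounds[-1], 16), GCB_TO_HST[value]))
    return ranges


def hangul_syllable_type(cp, ranges):
    """Look up the derived Hangul_Syllable_Type; NA if not listed."""
    for lo, hi, hst in ranges:
        if lo <= cp <= hi:
            return hst
    return "NA"
```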
   6053 
   6054 * build Unicode data source code for hardcoding core data
   6055 C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
   6056 
   6057 ICU data make path is \svn\icuproj\icu\trunk\source\data\
   6058 ICU root path is \svn\icuproj\icu\trunk
   6059 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
   6060 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
   6061 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
   6062 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
   6063 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
   6064 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
   6065 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
   6066 Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
   6067 Creating data file for Unicode Property Names
   6068 Creating data file for Unicode Character Properties
   6069 Creating data file for Unicode Case Mapping Properties
   6070 Creating data file for Unicode BiDi/Shaping Properties
   6071 Creating data file for Unicode Normalization
   6072 Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
   6073 Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
   6074 
   6075 - copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
   6076  and rebuild the common library
   6077 
   6078 *** UCA
   6079 
   6080 - update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
   6081 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
   6082 - update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
   6083 [ Begin obsolete instructions:
   6084  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
   6085    - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
   6086      on Windows:
   6087        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
   6088        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
   6089  End obsolete instructions]
   6090 - run all tests with the *_SHORT.txt files or the full files (the full ones have comments),
   6091  not just with the *_STUB.txt files
   6092 - note on intltest: if collate/UCAConformanceTest fails, then
   6093  utility/MultithreadTest/TestCollators will fail as well;
   6094  fix the conformance test before looking into the multi-thread test
   6095 
   6096 *** Implement Cased & Case_Ignorable properties
   6097 - via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
   6098 - Problem: These properties should be disjoint, but aren't
   6099 - UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
   6100 - change ucase.icu to be able to store any combination of Cased and Case_Ignorable
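A minimal Python sketch of the two points above, assuming a hypothetical flag layout (not the real ucase.icu format). Cased and Case_Ignorable are stored as independent bits, so all four combinations can be represented. A backward context scan, like the one behind a Final_Sigma-style "preceded by a cased letter" check, tests Case_Ignorable first, so a character that is both Cased and Case_Ignorable is skipped. The PROPS table is made-up test data, not Unicode data.

```python
# Hypothetical flag bits; the real ucase.icu encoding is different.
CASED = 1
CASE_IGNORABLE = 2

# Made-up property table for illustration only (not Unicode data).
# 'x' stands in for a character that is both Cased and Case_Ignorable.
PROPS = {
    "A": CASED,
    "b": CASED,
    "'": CASE_IGNORABLE,
    "x": CASED | CASE_IGNORABLE,
    " ": 0,
}


def props(ch: str) -> int:
    """Return the flag bits for ch; unknown characters have neither flag."""
    return PROPS.get(ch, 0)


def preceded_by_cased(text: str, index: int) -> bool:
    """True if the nearest non-Case_Ignorable character before text[index] is Cased.

    Case_Ignorable is tested first, so a character that is both Cased and
    Case_Ignorable is skipped (the 2009nov UTC decision).
    """
    i = index - 1
    while i >= 0:
        p = props(text[i])
        if p & CASE_IGNORABLE:
            i -= 1
            continue
        return bool(p & CASED)
    return False
```

Checking Case_Ignorable before Cased is what implements "skip all Case_Ignorable regardless of whether they are Cased or not".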
   6101 
   6102 *** Implement Changes_When_Xyz properties
   6103 - without stored data
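Without stored data, each Changes_When_Xyz value can be derived from the case mappings. This sketch assumes the definitions in the Unicode Standard's default case algorithms, for example Changes_When_Lowercased(X) is true when toLowercase(toNFD(X)) != toNFD(X). It uses Python's unicodedata and str case methods as stand-ins for ICU's case-mapping code, so its results follow Python's Unicode version, not ICU's.

```python
import unicodedata


def _nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)


def changes_when_lowercased(s: str) -> bool:
    # toLowercase(toNFD(X)) != toNFD(X)
    d = _nfd(s)
    return d.lower() != d


def changes_when_uppercased(s: str) -> bool:
    # toUppercase(toNFD(X)) != toNFD(X)
    d = _nfd(s)
    return d.upper() != d


def changes_when_casefolded(s: str) -> bool:
    # toCasefold(toNFD(X)) != toNFD(X)
    d = _nfd(s)
    return d.casefold() != d
```

For example, U+00DF (sharp s) changes when uppercased or case-folded, because its full mappings expand to "SS" and "ss".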
   6104 
   6105 *** Implement Name_Alias property
   6106 - add it as another name field in unames.icu
   6107 - make it available via u_charName() and UCharNameChoice
   6108 - consider it in u_charFromName()
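Python's unicodedata module (3.3 and later) already treats Name_Alias the way these items describe: a code point's primary name is unchanged, and lookup by name also accepts an alias. This is only an illustration of the intended behavior, not ICU code. U+01A2 is used because NameAliases.txt gives it the correction alias LATIN CAPITAL LETTER GHA.

```python
import unicodedata

OI = "\u01A2"

# Primary Name property, analogous to what u_charName() returns for the
# modern Unicode name choice.
primary = unicodedata.name(OI)

# Lookup by name also accepts the correction alias from NameAliases.txt,
# which is the behavior the u_charFromName() item asks for.
from_alias = unicodedata.lookup("LATIN CAPITAL LETTER GHA")

print(f"U+{ord(OI):04X} {primary}; alias lookup -> U+{ord(from_alias):04X}")
```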
   6109 
   6110 *** Break iterators
   6111 
   6112 * Update break iterator rules to new UAX versions and new property values
   6113 * Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
   6114 
   6115 *** new BidiTest file
   6116 - review format and data
   6117 - copy BidiTest.txt to source/test/testdata
   6118 - write test code using this data
   6119 - fix ICU code where it fails the conformance test
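A minimal reader for BidiTest.txt, as a possible starting point for the test code. It assumes the file's documented layout. @Levels: and @Reorder: lines apply to the data lines that follow them. Each data line has the form "<bidi classes> ; <bitset>", where the bitset selects paragraph levels 1=auto, 2=LTR, and 4=RTL. The sample lines in the test are illustrative.

```python
# Paragraph-level bits used in BidiTest.txt data lines.
PARAGRAPH_LEVELS = {1: "auto", 2: "LTR", 4: "RTL"}


def parse_bidi_test(lines):
    """Yield (bidi classes, paragraph directions, levels, reorder) per data line.

    levels stay strings because removed characters are marked "x".
    """
    levels, reorder = [], []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("@Levels:"):
            levels = line[len("@Levels:"):].split()
        elif line.startswith("@Reorder:"):
            reorder = [int(x) for x in line[len("@Reorder:"):].split()]
        elif line.startswith("@"):
            continue  # other header lines are ignored in this sketch
        else:
            classes, bitset = line.split(";")
            bits = int(bitset)
            dirs = [name for bit, name in PARAGRAPH_LEVELS.items() if bits & bit]
            yield classes.split(), dirs, levels, reorder
```

Each yielded case can be run once per selected paragraph direction. The resulting levels and visual order are then compared with the expected values.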
   6120 
   6121 *** Java
   6122 - generally, find and update code corresponding to C/C++
   6123 - UCharacter.UnicodeBlock constants:
   6124  a) add an _ID integer per new block, update COUNT
   6125  b) add a class instance per new block
   6126     Visual Studio regex:
   6127        find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
   6128        replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
   6129 - CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
   6130 
   6131 - port test changes to Java
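The Visual Studio regex above uses the old VS syntax, where {...} marks a tagged group. Below is a Python equivalent with re.sub. It adds a second group for the block number so that step a) can be generated as well. java_block_id is a hypothetical helper, and the sample uchar.h line in the test is illustrative.

```python
import re

# Old VS syntax {...} becomes (...) in Python. The number is captured too,
# which the original pattern did not need.
UBLOCK_LINE = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")


def java_block_instance(line: str) -> str:
    """Step b): 'UBLOCK_X = n, /*...*/' -> UnicodeBlock X instance declaration."""
    return UBLOCK_LINE.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \3',
        line)


def java_block_id(line: str) -> str:
    """Step a): 'UBLOCK_X = n, /*...*/' -> int X_ID constant (hypothetical helper)."""
    return UBLOCK_LINE.sub(r"public static final int \1_ID = \2; \3", line)
```

Lines that do not match, such as a COUNT line without a trailing comment, are returned unchanged. COUNT still has to be updated by hand.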
   6132 
   6133 *** LayoutEngine script information
   6134 
   6135 (For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
   6136 
   6137 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
   6138 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
   6139 ScriptRunData.cpp, which is no longer needed.)
   6140 
   6141 The generated files have a current copyright date and "@draft" statement.
   6142 
   6143 -> Eric Mader wrote in email on 20090930:
   6144    "I think the tool has been modified to update @draft to @stable for
   6145     older scripts and to add @draft for new scripts.
   6146     (I worked with an intern on this last year.)
   6147     You should check the output after you run it."
   6148 
   6149 * copy the above files into <icu>/source/layout, replacing the old files.
   6150 * fix mixed line endings
   6151 * review the diffs and fix incorrect @draft and missing aliases
   6152 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
   6153 
   6154 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
   6155 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
   6156 
   6157 -> Eric Mader wrote in email on 20090930:
   6158    "This is just a matter of making sure that all the per-script tables have
   6159     entries for any new scripts that were added.
   6160     If any new Indic characters were added, then the class tables in
   6161     IndicClassTables.cpp should be updated to reflect this.
   6162     John Emmons should know how to do this if it's required."
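The check Eric Mader describes (every per-script table has an entry for every script) can be sketched as follows. In ICU the tables are C++ arrays indexed by script code. Here they are modeled as Python dicts, and the table names, script codes, and contents in the test are hypothetical.

```python
def missing_entries(script_codes, tables):
    """Return {table name: [script codes lacking an entry]} for incomplete tables."""
    report = {}
    for name, table in tables.items():
        missing = [code for code in script_codes if code not in table]
        if missing:
            report[name] = missing
    return report
```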
   6163 
   6164 * rebuild the layout and layoutex libraries.
   6165 
   6166 *** Documentation
   6167 - Update User Guide
   6168  + Jamo_Short_Name, sfc->scf, binary property value aliases
   6169 
   6170 ---------------------------------------------------------------------------- ***
   6171 
   6172 Unicode 5.1 update
   6173 
   6174 *** related ICU Trac tickets
   6175 
   6176 5696 Update to Unicode 5.1
   6177 
   6178 *** Unicode version numbers
   6179 - makedata.mak
   6180 - uchar.h
   6181 - configure.in & configure
   6182 - update ucdVersion in gennames.c if an algorithmic range changes
   6183 
   6184 *** data files & enums & parser code
   6185 
   6186 * file preparation
   6187 - ucdstrip:
   6188    DerivedCoreProperties.txt
   6189    DerivedNormalizationProps.txt
   6190    NormalizationTest.txt
   6191    PropList.txt
   6192    Scripts.txt
   6193    GraphemeBreakProperty.txt
   6194    SentenceBreakProperty.txt
   6195    WordBreakProperty.txt
   6196 - ucdstrip and ucdmerge:
   6197    EastAsianWidth.txt
   6198    LineBreak.txt
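A Python sketch of what the two preparation steps are assumed to do. This behavior is an assumption and is not taken from the ucdstrip/ucdmerge sources. ucdstrip is assumed to drop # comments and blank lines. ucdmerge is assumed to join adjacent code point ranges that carry the same value, with input sorted by code point as in the UCD files.

```python
import re


def ucdstrip(lines):
    """Hypothetical stand-in for ucdstrip: drop '#' comments and blank lines."""
    out = []
    for line in lines:
        data = line.split("#", 1)[0].rstrip()
        if data:
            out.append(data)
    return out


_RANGE = re.compile(r"^([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*(.+?)\s*$")


def ucdmerge(lines):
    """Hypothetical stand-in for ucdmerge: merge adjacent ranges with equal values."""
    merged = []  # entries are [start, end, value]
    for line in lines:
        m = _RANGE.match(line)
        if m is None:
            raise ValueError(f"unexpected line: {line!r}")
        start = int(m.group(1), 16)
        end = int(m.group(2), 16) if m.group(2) else start
        value = m.group(3)
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1][1] = end  # extend the previous range
        else:
            merged.append([start, end, value])
    return [f"{s:04X};{v}" if s == e else f"{s:04X}..{e:04X};{v}"
            for s, e, v in merged]
```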
   6199 
   6200 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
   6201 copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
   6202 copy 5.1.0\ucd\Blocks.txt ..\unidata\
   6203 copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
   6204 copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
   6205 copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
   6206 copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
   6207 copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
   6208 copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
   6209 copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
   6210 copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
   6211 copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
   6212 copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
   6213 copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
   6214 
   6215 ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
   6216 ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
   6217 ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
   6218 ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
   6219 ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
   6220 ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
   6221 ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
   6222 ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
   6223 ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
   6224 ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
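Because the batch file must be edited for every UCD version, its lines can be generated instead. Below is a Python sketch in which ucd2unidata is a hypothetical helper that rebuilds the 5.1.0 commands above from a version string. The file lists are copied from the batch file, and the commands are only returned, not executed.

```python
import ntpath

# File lists copied from the ucd2unidata.bat lines above.
COPY = [
    "BidiMirroring.txt", "Blocks.txt", "CaseFolding.txt", "DerivedAge.txt",
    r"extracted\DerivedBidiClass.txt", r"extracted\DerivedJoiningGroup.txt",
    r"extracted\DerivedJoiningType.txt", r"extracted\DerivedNumericValues.txt",
    "NormalizationCorrections.txt", "PropertyAliases.txt",
    "PropertyValueAliases.txt", "SpecialCasing.txt", "UnicodeData.txt",
]
STRIP = [
    "DerivedCoreProperties.txt", "DerivedNormalizationProps.txt",
    "NormalizationTest.txt", "PropList.txt", "Scripts.txt",
    r"auxiliary\GraphemeBreakProperty.txt", r"auxiliary\SentenceBreakProperty.txt",
    r"auxiliary\WordBreakProperty.txt",
]
STRIP_AND_MERGE = ["EastAsianWidth.txt", "LineBreak.txt"]


def ucd2unidata(version):
    """Return the batch-file lines for one UCD version, e.g. "5.1.0"."""
    src = f"{version}\\ucd\\"
    lines = [f"copy {src}{f} ..\\unidata\\" for f in COPY]
    lines.append("")
    for f in STRIP:
        # Output files drop the auxiliary\ subdirectory.
        lines.append(f"ucdstrip < {src}{f} > ..\\unidata\\{ntpath.basename(f)}")
    for f in STRIP_AND_MERGE:
        lines.append(f"ucdstrip < {src}{f} | ucdmerge > ..\\unidata\\{f}")
    return lines
```

Usage: print("\n".join(ucd2unidata("5.2.0"))). New or renamed UCD files still have to be added to the lists by hand.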
   6225 
   6226 * genpname
   6227 - run preparse.pl
   6228  + cd \svn\icuproj\icu\uni51\source\tools\genpname
   6229  + make sure that data.h is writable
   6230  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
   6231  + preparse.pl complains with errors like the following:
   6232      Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
   6233    This is because ICU 3.8 already had some ISO 15924 scripts that are now
   6234    added to Unicode 5.1, so preparse.pl finds conflicting entries in SyntheticPropertyValueAliases.txt
   6235    and PropertyValueAliases.txt.
   6236    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
   6237       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
   6238  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
   6239      N/Y, No/Yes, F/T, False/True
   6240    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
   6241       It will use further values from the file if present.
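The duplicates that preparse.pl complained about can be listed in advance by intersecting the sc (Script) short names in the two files. This Python sketch assumes the usual "sc ; <short> ; <long>" line format. The sample lines in the test are made up.

```python
def script_aliases(lines):
    """Map short script code -> long name from 'sc ; Short ; Long' lines."""
    result = {}
    for line in lines:
        fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            result[fields[1]] = fields[2]
    return result


def duplicate_scripts(synthetic_lines, pva_lines):
    """Short script codes defined in both files, i.e. candidates for removal
    from SyntheticPropertyValueAliases.txt."""
    synthetic = script_aliases(synthetic_lines)
    official = script_aliases(pva_lines)
    return sorted(synthetic.keys() & official.keys())
```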
   6242 
   6243 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
   6244 - new block & script values
   6245  + 17 new blocks
   6246  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
   6247    (removed from SyntheticPropertyValueAliases.txt)
   6248  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
   6249    (added to SyntheticPropertyValueAliases.txt)
   6250 - uprops.icu (uprops.h) only provides 7 bits for script codes.
   6251  In ICU 4.0 there are now USCRIPT_CODE_LIMIT=130 script codes.
   6252  None of the codes above 127 is used yet by an assigned Unicode character,
   6253  so ICU 4.0 uprops.icu does not store any per-character
   6254  script code value greater than 127.
   6255  However, it does need to store the maximum script value USCRIPT_CODE_LIMIT-1=129
   6256  in a parallel bit field, and that value no longer fits in 7 bits.
   6257  Also, future per-character values >=128 would not fit in 7 bits anyway.
   6258  uprops.h is modified to move several of the bit fields around
   6259  in the properties vector words, and now uses 8 bits for the script code.
   6260  Two other bit fields also grow to leave room for future values:
   6261  Block (current count: 172) grows from 8 to 9 bits,
   6262  and Word_Break grows from 4 to 5 bits.
   6263 - renamed property Simple_Case_Folding (sfc->scf)
   6264  + nothing to be done: handled as normal alias
   6265 - new property JSN Jamo_Short_Name
   6266  + no new API: only contributes to the Name property
   6267 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
   6268 - new Joining Group (JG) value: Burushashki_Yeh_Barree
   6269 - new Sentence_Break (SB) values:
   6270    SB ; CR        ; CR
   6271    SB ; EX        ; Extend
   6272    SB ; LF        ; LF
   6273    SB ; SC        ; SContinue
   6274 - new Word_Break (WB) values:
   6275    WB ; CR        ; CR
   6276    WB ; Extend    ; Extend
   6277    WB ; LF        ; LF
   6278    WB ; MB        ; MidNumLet
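The uprops.h overflow described above comes down to field widths: values 0..129 need 8 bits, and 7 bits hold at most 127. Below is a Python sketch of a generic packed bit field. The shift and width values are illustrative and are not the real uprops.h layout.

```python
def bits_needed(max_value: int) -> int:
    """Number of bits required to store values 0..max_value."""
    return max(1, max_value.bit_length())


def pack(word: int, value: int, shift: int, width: int) -> int:
    """Store value in bits [shift, shift+width) of word; reject overflow."""
    mask = (1 << width) - 1
    if not 0 <= value <= mask:
        raise OverflowError(f"{value} does not fit in {width} bits")
    return (word & ~(mask << shift)) | (value << shift)


def unpack(word: int, shift: int, width: int) -> int:
    """Read bits [shift, shift+width) of word."""
    return (word >> shift) & ((1 << width) - 1)
```

With 172 blocks, the largest Block value (171) still fits in 8 bits. The move to 9 bits, like Word_Break's move to 5 bits, is headroom for later versions.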
   6279 
   6280 * Further changes in the 2008-02-29 update:
   6281 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, and noncharacters from DICP
   6282  because they should not normally be invisible.
   6283 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
   6284 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend
   6285 - new Word_Break (WB) value: NL=Newline
   6286 
   6287 * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
   6288 - Unihan range end moves from 9FBB to 9FC3
   6289  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
   6290  + do change gennames.c
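The search for the old Unihan end/limit can be scripted. In this Python sketch, the default pattern is the 5.1 regex 9FB[BC] from above, and the file suffixes are an assumption. Every hit still needs a manual decision, because some data, such as encoding tables and arbitrary test data, must not change.

```python
import os
import re


def find_hardcoded(lines, pattern=r"9FB[BC]"):
    """Return (line number, text) for each line matching pattern, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [(n, line.rstrip("\n"))
            for n, line in enumerate(lines, start=1) if regex.search(line)]


def scan_tree(root, pattern=r"9FB[BC]", suffixes=(".c", ".cpp", ".h", ".txt")):
    """Walk root and report (path, line number, text) for every match."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    hits.extend((path, n, text) for n, text in find_hardcoded(f, pattern))
    return hits
```

Passing pattern=r"9FA[56]" repeats the Unicode 4.1 search.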
   6291 
   6292 * build Unicode data source code for hardcoding core data
   6293 C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
   6294 
   6295 ICU data make path is \svn\icuproj\icu\uni51\source\data\
   6296 ICU root path is \svn\icuproj\icu\uni51
   6297 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
   6298 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
   6299 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
   6300 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
   6301 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
   6302 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
   6303 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
   6304 Creating data file for Unicode Character Properties
   6305 Creating data file for Unicode Case Mapping Properties
   6306 Creating data file for Unicode BiDi/Shaping Properties
   6307 Creating data file for Unicode Normalization
   6308 Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
   6309 Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
   6310 
   6311 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
   6312  and rebuild the common library
   6313 
   6314 *** Break iterators
   6315 
   6316 * Update break iterator rules to new UAX versions and new property values
   6317 
   6318 *** UCA
   6319 
   6320 * update FractionalUCA.txt and UCARules.txt with new canonical closure
   6321 
   6322 *** Test suites
   6323 - Test that APIs using Unicode property value aliases (like UnicodeSet)
   6324  support all of the boolean values N/Y, No/Yes, F/T, False/True
   6325  -> TestBinaryValues() tests in both cintltst and intltest
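All eight boolean aliases can be accepted with one small lookup. This Python sketch uses UAX #44-style loose matching, which ignores case, spaces, underscores, and hyphens. parse_binary_value is a hypothetical helper and is not the ICU API that TestBinaryValues() exercises.

```python
_TRUE = {"y", "yes", "t", "true"}
_FALSE = {"n", "no", "f", "false"}


def _loose(name: str) -> str:
    # Loose matching: ignore case, spaces, '_' and '-'.
    return "".join(ch for ch in name.lower() if ch not in " _-")


def parse_binary_value(name: str) -> bool:
    """Map N/Y, No/Yes, F/T, False/True (loosely matched) to a bool."""
    key = _loose(name)
    if key in _TRUE:
        return True
    if key in _FALSE:
        return False
    raise ValueError(f"not a binary property value alias: {name!r}")
```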
   6326 
   6327 *** LayoutEngine script information
   6328 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
   6329 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
   6330 ScriptRunData.cpp, which is no longer needed.)
   6331 
   6332 The generated files have a current copyright date and "@draft" statement.
   6333 
   6334 * copy the above files into <icu>/source/layout, replacing the old files.
   6335 
   6336 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
   6337 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
   6338 
   6339 * rebuild the layout and layoutex libraries.
   6340 
   6341 *** Documentation
   6342 - Update User Guide
   6343  + Jamo_Short_Name, sfc->scf, binary property value aliases
   6344 
   6345 ---------------------------------------------------------------------------- ***
   6346 
   6347 Unicode 5.0 update
   6348 
   6349 *** related Jitterbugs
   6350 
   6351 5084 RFE: Update to Unicode 5.0
   6352 
   6353 *** data files & enums & parser code
   6354 
   6355 * file preparation
   6356 - ucdstrip:
   6357    DerivedCoreProperties.txt
   6358    DerivedNormalizationProps.txt
   6359    NormalizationTest.txt
   6360    PropList.txt
   6361    Scripts.txt
   6362    GraphemeBreakProperty.txt
   6363    SentenceBreakProperty.txt
   6364    WordBreakProperty.txt
   6365 - ucdstrip and ucdmerge:
   6366    EastAsianWidth.txt
   6367    LineBreak.txt
   6368 
   6369 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
   6370 copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
   6371 copy 5.0.0\ucd\Blocks.txt ..\unidata\
   6372 copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
   6373 copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
   6374 copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
   6375 copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
   6376 copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
   6377 copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
   6378 copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
   6379 copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
   6380 copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
   6381 copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
   6382 copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
   6383 
   6384 ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
   6385 ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
   6386 ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
   6387 ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
   6388 ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
   6389 ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
   6390 ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
   6391 ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
   6392 ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
   6393 ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
   6394 
   6395 * update FractionalUCA.txt and UCARules.txt with new canonical closure
   6396 
   6397 * genpname
   6398 - run preparse.pl
   6399  + make sure that data.h is writable
   6400  + perl preparse.pl \cvs\oss\icu > out.txt
   6401 
   6402 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
   6403 - new block & script values
   6404  + script values already added in ICU 3.6 because all of ISO 15924 is now covered
   6405 
   6406 * build Unicode data source code for hardcoding core data
   6407 C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
   6408 
   6409 ICU data make path is \cvs\oss\icu\source\data\
   6410 ICU root path is \cvs\oss\icu
   6411 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
   6412 [etc.]
   6413 Creating data file for Unicode Character Properties
   6414 Creating data file for Unicode Case Mapping Properties
   6415 Creating data file for Unicode BiDi/Shaping Properties
   6416 Creating data file for Unicode Normalization
   6417 Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
   6418 Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
   6419 
   6420 - copy the .c source files to C:\cvs\oss\icu\source\common
   6421  and rebuild the common library
   6422 
   6423 *** Unicode version numbers
   6424 - makedata.mak
   6425 - uchar.h
   6426 - configure.in
   6427 
   6428 *** LayoutEngine script information
   6429 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
   6430 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
   6431 ScriptRunData.cpp, which is no longer needed.)
   6432 
   6433 The generated files have a current copyright date and "@draft" statement.
   6434 
   6435 * copy the above files into <icu>/source/layout, replacing the old files.
   6436 
   6437 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
   6438 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
   6439 
   6440 * rebuild the layout and layoutex libraries.
   6441 
   6442 ---------------------------------------------------------------------------- ***
   6443 
   6444 Unicode 4.1 update
   6445 
   6446 *** related Jitterbugs
   6447 
   6448 4332 RFE: Update to Unicode 4.1
   6449 4157 RBBI, TR29 4.1 updates
   6450 
   6451 *** data files & enums & parser code
   6452 
   6453 * file preparation
   6454 - ucdstrip:
   6455    DerivedCoreProperties.txt
   6456    DerivedNormalizationProps.txt
   6457    NormalizationTest.txt
   6458    GraphemeBreakProperty.txt
   6459    SentenceBreakProperty.txt
   6460    WordBreakProperty.txt
   6461 - ucdstrip and ucdmerge:
   6462    EastAsianWidth.txt
   6463    LineBreak.txt
   6464 
   6465 * add new files to the repository
   6466    GraphemeBreakProperty.txt
   6467    SentenceBreakProperty.txt
   6468    WordBreakProperty.txt
   6469 
   6470 * update FractionalUCA.txt and UCARules.txt with new canonical closure
   6471 
   6472 * genpname
   6473 - handle new enumerated properties in sub read_uchar
   6474 - run preparse.pl
   6475 
   6476 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
   6477 - new binary properties
   6478  + Pattern_Syntax
   6479  + Pattern_White_Space
   6480 - new enumerated properties
   6481  + Grapheme_Cluster_Break
   6482  + Sentence_Break
   6483  + Word_Break
   6484 - new block & script & line break values
   6485 
   6486 * gencase
   6487 - case-ignorable changes
   6488  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
   6489  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
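The Unicode 4.1 case-ignorable rule quoted above can be written as a Python predicate. General_Category comes from Python's unicodedata. Word_Break is not in the Python standard library, so in this sketch the caller passes it in.

```python
import unicodedata

# General_Category values that make a character case-ignorable under D47a.
CASE_IGNORABLE_GC = {"Mn", "Me", "Cf", "Lm", "Sk"}


def is_case_ignorable_41(ch: str, word_break: str) -> bool:
    """Unicode 4.1 D47a as summarized above; word_break is supplied by the caller."""
    return word_break == "MidLetter" or unicodedata.category(ch) in CASE_IGNORABLE_GC
```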
   6490 
   6491 *** Unicode version numbers
   6492 - makedata.mak
   6493 - uchar.h
   6494 - configure.in
   6495 
   6496 *** tests
   6497 - verify that u_charMirror() round-trips
   6498 - test all new properties and some new values of old properties
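A round-trip check in the spirit of the u_charMirror() test, sketched in Python over BidiMirroring.txt data lines of the form "<code point>; <mirror>". A code point without an entry maps to itself, as with u_charMirror(). Every c with mirror(mirror(c)) != c is reported. The sample lines in the test are illustrative.

```python
def parse_bidi_mirroring(lines):
    """Map code point -> mirrored code point from BidiMirroring.txt-style lines."""
    mapping = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        src, dst = (int(field.strip(), 16) for field in data.split(";"))
        mapping[src] = dst
    return mapping


def non_round_trips(mapping):
    """Code points c where mirror(mirror(c)) != c; unmapped code points map to themselves."""
    return sorted(c for c, m in mapping.items() if mapping.get(m, m) != c)
```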
   6499 
   6500 *** other code
   6501 
   6502 * hardcoded Unihan range end/limit
   6503 - Unihan range end moves from 9FA5 to 9FBB
   6504  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
   6505  + do not modify BOCU/BOCSU code because that would change the encoding
   6506    and break binary compatibility!
   6507  + similarly, do not change the GB 18030 range data (ucnvmbcs.c)
   6508    or NamePrepProfile.txt
   6509  + ignore trietest.c: test data is arbitrary
   6510  + ignore tstnorm.cpp: test optimization, not important
   6511  + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
   6512  + do change line_th.txt and word_th.txt
   6513    by replacing hardcoded ranges with the new property values
   6514  + do change gennames.c
   6515 
   6516 source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
   6517 source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
   6518 source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,
   6519 
   6520 * case mappings
   6521 - compare new special casing context conditions with previous ones
   6522  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
   6523 
   6524 * genpname
   6525 - consider storing only the short name if it is the same as the long name
   6526 
   6527 *** other reviews
   6528 - UAX #29 changes (grapheme/word/sentence breaks)
   6529 - UAX #14 changes (line breaks)
   6530 - Pattern_Syntax & Pattern_White_Space
   6531 
   6532 ---------------------------------------------------------------------------- ***
   6533 
   6534 Unicode 4.0.1 update
   6535 
   6536 *** related Jitterbugs
   6537 
   6538 3170 RFE: Update to Unicode 4.0.1
   6539 3171 Add new Unicode 4.0.1 properties
   6540 3520 use Unicode 4.0.1 updates for break iteration
   6541 
   6542 *** data files & enums & parser code
   6543 
   6544 * file preparation
   6545 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
   6546 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
   6547 
   6548 * file fixes
   6549 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
   6550  according to PRI #26
   6551  http://www.unicode.org/review/resolved-pri.html#pri26
   6552 - this fix was undone again because no corrigendum was in sight;
   6553  instead, the tests were modified not to check consistency on this for Unicode 4.0.1
   6554 
   6555 * ucdterms.txt
   6556 - update from http://www.unicode.org/copyright.html
   6557  formatted for plain text
   6558 
   6559 * uchar.h & uprops.h & uprops.c & genprops
   6560 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
   6561 - add U_LB_INSEPARABLE due to a spelling fix
   6562  + put the short-name comment only on the line with the new constant,
   6563    for the genpname perl script parser
   6564 - new binary properties
   6565  + STerm
   6566  + Variation_Selector
   6567 
   6568 * genpname
   6569 - fix genpname perl script so that it doesn't choke on more than 2 names per property value
   6570 - perl script: correctly calculate the maximum number of fields per row
   6571 
   6572 * uscript.h
   6573 - new script code Hrkt=Katakana_Or_Hiragana
   6574 
   6575 * gennorm.c track changes in DerivedNormalizationProps.txt
   6576 - "FNC" -> "FC_NFKC"
   6577 - single field "NFD_NO" -> two fields "NFD_QC; N" etc.
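A Python sketch of a DerivedNormalizationProps.txt line reader that accepts both forms named above. It handles only the FNC rename and NFD_NO-style quick-check fields. Treating the other "etc." cases as the same <form>_NO pattern is an assumption.

```python
# Old property names that were renamed in DerivedNormalizationProps.txt.
RENAMED = {"FNC": "FC_NFKC"}


def parse_dnp(line):
    """Parse one data line into (start, end, property, value), or None.

    value is None for binary properties that have no value field.
    """
    fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
    if len(fields) < 2 or not fields[0]:
        return None
    start, _, end = fields[0].partition("..")
    prop = RENAMED.get(fields[1], fields[1])
    value = fields[2] if len(fields) > 2 else None
    # Old single-field quick check, e.g. "NFD_NO" -> ("NFD_QC", "N").
    if value is None and prop.endswith("_NO"):
        prop, value = prop[:-3] + "_QC", "N"
    return int(start, 16), int(end or start, 16), prop, value
```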
   6578 
   6579 * genprops/props2.c track changes in DerivedNumericValues.txt
   6580 - changed from 3 columns to 2, dropping the numeric type
   6581  + assume that the type is always numeric for Han characters,
   6582    and that only those are added in addition to what UnicodeData.txt lists
   6583 
   6584 *** Unicode version numbers
   6585 - makedata.mak
   6586 - uchar.h
   6587 - configure.in
   6588 
   6589 *** tests
   6590 - update test of default bidi classes according to PRI #28
   6591  /tsutil/cucdtst/TestUnicodeData
   6592  http://www.unicode.org/review/resolved-pri.html#pri28
   6593 - bidi tests: change exemplar character for ES depending on Unicode version
   6594 - change hardcoded expected property values where they change
   6595 
   6596 *** other code
   6597 
   6598 * name matching
   6599 - read UCD.html
   6600 
   6601 * scripts
   6602 - use new Hrkt=Katakana_Or_Hiragana
   6603 
   6604 * ZWJ & ZWNJ
   6605 - are now part of combining character sequences
   6606 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ