gendict.1.in (3418B)
1 .\" Hey, Emacs! This is -*-nroff-*- you know... 2 .\" 3 .\" gendict.1: manual page for the gendict utility 4 .\" 5 .\" Copyright (C) 2016 and later: Unicode, Inc. and others. 6 .\" License & terms of use: http://www.unicode.org/copyright.html 7 .\" Copyright (C) 2012 International Business Machines Corporation and others 8 .\" 9 .TH GENDICT 1 "1 June 2012" "ICU MANPAGE" "ICU @VERSION@ Manual" 10 .SH NAME 11 .B gendict 12 \- Compiles word list into ICU string trie dictionary 13 .SH SYNOPSIS 14 .B gendict 15 [ 16 .BR "\fB\-\-uchars" 17 | 18 .BR "\fB\-\-bytes" 19 .BI "\fB\-\-transform" " transform" 20 ] 21 [ 22 .BR "\-h\fP, \fB\-?\fP, \fB\-\-help" 23 ] 24 [ 25 .BR "\-V\fP, \fB\-\-version" 26 ] 27 [ 28 .BR "\-c\fP, \fB\-\-copyright" 29 ] 30 [ 31 .BR "\-v\fP, \fB\-\-verbose" 32 ] 33 [ 34 .BI "\-i\fP, \fB\-\-icudatadir" " directory" 35 ] 36 .IR " input-file" 37 .IR " output\-file" 38 .SH DESCRIPTION 39 .B gendict 40 reads the word list from 41 .I dictionary-file 42 and creates a string trie dictionary file. Normally this data file has the 43 .B .dict 44 extension. 45 .PP 46 Words begin at the beginning of a line and are terminated by the first whitespace. 47 Lines that begin with whitespace are ignored. 48 .SH OPTIONS 49 .TP 50 .BR "\-h\fP, \fB\-?\fP, \fB\-\-help" 51 Print help about usage and exit. 52 .TP 53 .BR "\-V\fP, \fB\-\-version" 54 Print the version of 55 .B gendict 56 and exit. 57 .TP 58 .BR "\-c\fP, \fB\-\-copyright" 59 Embeds the standard ICU copyright into the 60 .IR output-file . 61 .TP 62 .BR "\-v\fP, \fB\-\-verbose" 63 Display extra informative messages during execution. 64 .TP 65 .BI "\-i\fP, \fB\-\-icudatadir" " directory" 66 Look for any necessary ICU data files in 67 .IR directory . 68 For example, the file 69 .B pnames.icu 70 must be located when ICU's data is not built as a shared library. 71 The default ICU data directory is specified by the environment variable 72 .BR ICU_DATA . 73 Most configurations of ICU do not require this argument. 74 .TP 75 .BR "\fB\-\-uchars" 76 Set the output trie type to UChar. Mutually exclusive with 77 .BR --bytes. 78 .TP 79 .BR "\fB\-\-bytes" 80 Set the output trie type to Bytes. Mutually exclusive with 81 .BR --uchars. 82 .TP 83 .BR "\fB\-\-transform" 84 Set the transform type. Should only be specified with 85 .BR --bytes. 86 Currently supported transforms are: 87 .BR offset-<hex-number>, 88 which specifies an offset to subtract from all input characters. 89 It should be noted that the offset transform also maps U+200D 90 to 0xFF and U+200C to 0xFE, in order to offer compatibility to 91 languages that require these characters. 92 A transform must be specified for a bytes trie, and when applied 93 to the non-value characters in the 94 .IR input-file 95 must produce output between 0x00 and 0xFF. 96 .TP 97 .BI " input\-file" 98 The source file to read. 99 .TP 100 .BI " output\-file" 101 The file to write the output dictionary to. 102 .SH CAVEATS 103 The 104 .IR input-file 105 is assumed to be encoded in UTF-8. 106 The integers in the 107 .IR input-file 108 that are used as values must be made up of ASCII digits. They 109 may be specified either in hex, by using a 0x prefix, or in 110 decimal. 111 Either 112 .BI --bytes 113 or 114 .BI --uchars 115 must be specified. 116 .SH ENVIRONMENT 117 .TP 10 118 .B ICU_DATA 119 Specifies the directory containing ICU data. Defaults to 120 .BR @thepkgicudatadir@/@PACKAGE@/@VERSION@/ . 121 Some tools in ICU depend on the presence of the trailing slash. It is thus 122 important to make sure that it is present if 123 .B ICU_DATA 124 is set. 125 .SH AUTHORS 126 Maxime Serrano 127 .SH VERSION 128 1.0 129 .SH COPYRIGHT 130 Copyright (C) 2012 International Business Machines Corporation and others 131 .SH SEE ALSO 132 .BR http://www.icu-project.org/userguide/boundaryAnalysis.html 133