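ISO/IEC 14651:2007 Information technology - International string ordering and comparison - Method for comparing character strings and description of the common template tailorable ordering
Standard number: ISO/IEC 14651:2007
Chinese title: 信息技术 国际串排序和比较 字符串比较方法和公共模板定制说明
English title: Information technology - International string ordering and comparison - Method for comparing character strings and description of the common template tailorable ordering
Publication date: 2007-12
Scope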
ISO/IEC 14651:2007 defines the following.
- A reference comparison method. This method is applied to two character strings to determine their collating order in a sorted list. It can be applied to strings containing characters from the full repertoire of ISO/IEC 10646. It is also applicable to subsets of that repertoire, such as those of the different ISO/IEC 8-bit standard character sets, or to any other character set, standardised or not, to produce ordering results valid (after tailoring) for a given set of languages for each script. The method uses collation tables derived either from the Common Template Table defined in ISO/IEC 14651 or from one of its tailorings. The method provides a reference format, which is described using the Backus-Naur Form (BNF). This format is used to describe the Common Template Table and is used normatively within ISO/IEC 14651:2007.
- A Common Template Table. A given tailoring of the Common Template Table is used by the reference comparison method. The Common Template Table describes an order for all characters encoded in ISO/IEC 10646:2003 up to Amendment 2, plus characters DEVANAGARI LETTER GGA, DEVANAGARI LETTER JJA, DEVANAGARI LETTER DDDA and DEVANAGARI LETTER BBA (characters U097B, U097C, U097E and U097F, respectively). It allows for the specification of a fully deterministic ordering. The table enables the specification of a string ordering adapted to local ordering rules, without requiring an implementer to have knowledge of all the different scripts already encoded in the UCS.
- A reference name. The reference name refers to this particular version of the Common Template Table, for use as a reference when tailoring. In particular, this name implies that the table is linked to a particular stage of development of the ISO/IEC 10646 Universal multiple-octet coded character set.
- Requirements for a declaration of the differences (delta) between the collation table and the Common Template Table.
Note 1: The Common Template Table is meant to be modified to suit the needs of a local environment. The main worldwide benefit is that, for other scripts, no modification is generally required, and the order remains as consistent and predictable as possible from an international point of view.
Note 2: The character repertoire used in ISO/IEC 14651:2007 is equivalent to that of the Unicode Standard, version 5.0.
ISO/IEC 14651:2007 does not mandate the following.
- A specific comparison method; any equivalent method giving the same results is acceptable.
- A specific format for describing or tailoring tables in a given implementation.
- Specific symbols to be used by implementations, except for the name of the Common Template Table.
- Any specific user interface for choosing options.
- Any specific internal format for intermediate keys used when comparing, or for the table used. The use of numeric keys is not mandated either.
- A context-dependent ordering.
- Any particular preparation of character strings prior to comparison.
Note 1: Although ISO/IEC 14651:2007 does not mandate it, preparation of strings prior to comparison is normally necessary (see informative Annex C).
Note 2: Although no user interface is required for choosing options or for specifying a tailoring of the Common Template Table, conformance requires that the applicable delta, that is, the declaration of differences from that table, always be declared. It is recommended that processes present the available tailoring options to the user.
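Because the standard leaves the comparison method and the key format open, the following is only a minimal Python sketch of what comparing strings through a tailored table could look like. Everything in it is hypothetical: the table `TOY_TEMPLATE`, the delta `TOY_DELTA`, the functions `tailor` and `sort_key`, and all the weights are invented for illustration and are not taken from ISO/IEC 14651 or its Common Template Table.

```python
# Illustrative sketch only: a toy multi-level comparison with a tailoring delta.
# All names and weights are assumptions, not data from ISO/IEC 14651.

A_UML = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS

# Toy stand-in for a collation table:
# character -> (primary, secondary, tertiary) weight.
TOY_TEMPLATE = {
    "a": (1, 1, 1), "A": (1, 1, 2),
    A_UML: (1, 2, 1),           # treated as an accented variant of "a"
    "b": (2, 1, 1), "B": (2, 1, 2),
    "z": (4, 1, 1), "Z": (4, 1, 2),
}

# Hypothetical delta: the declared differences from the template.
# Here the accented letter becomes a separate letter sorted after "z".
TOY_DELTA = {A_UML: (5, 1, 1)}


def tailor(template, delta):
    """Return a new table: the template with the delta applied on top."""
    table = dict(template)  # copy, so the template itself stays unchanged
    table.update(delta)
    return table


def sort_key(text, table, levels=3):
    """Build a key holding all primary weights, then all secondary, and so on.

    Comparing two such keys compares level 1 across the whole string first
    and only falls back to level 2 (then 3) when the earlier level ties.
    Characters missing from the toy table raise KeyError.
    """
    weights = [table[ch] for ch in text]
    return tuple(tuple(w[level] for w in weights) for level in range(levels))


words = ["za", A_UML + "b", "ab", "Ab", "b"]
untailored_order = sorted(words, key=lambda w: sort_key(w, TOY_TEMPLATE))
tailored_table = tailor(TOY_TEMPLATE, TOY_DELTA)
tailored_order = sorted(words, key=lambda w: sort_key(w, tailored_table))
```

With the toy template the accented word sorts right after "Ab", since it differs from "ab" only at the secondary level. With the delta applied it moves after "za". Keeping the delta as a separate small mapping loosely mirrors the requirement that differences from the Common Template Table be declared, although a conforming declaration would use the standard's own format rather than a Python dictionary.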