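ISO/IEC 14651:2007 Information technology - International string ordering and comparison - Method for comparing character strings and description of the common template tailorable ordering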

Standard number: ISO/IEC 14651:2007

Chinese title: 信息技术 国际串排序和比较 字符串比较方法和公共模板定制说明

English title: Information technology - International string ordering and comparison - Method for comparing character strings and description of the common template tailorable ordering

Publication date: 2007-12

Scope

ISO/IEC 14651:2007 defines the following.

  • A reference comparison method. This method is applicable to two character strings to determine their collating order in a sorted list. The method can be applied to strings containing characters from the full repertoire of ISO/IEC 10646. This method is also applicable to subsets of that repertoire, such as those of the different ISO/IEC 8-bit standard character sets, or any other character set, standardised or not, to produce ordering results valid (after tailoring) for a given set of languages for each script. This method uses collation tables derived either from the Common Template Table defined in ISO/IEC 14651 or from one of its tailorings. This method provides a reference format. The format is described using the Backus-Naur Form (BNF). This format is used to describe the Common Template Table. The format is used normatively within ISO/IEC 14651:2007.
  • A Common Template Table. A given tailoring of the Common Template Table is used by the reference comparison method. The Common Template Table describes an order for all characters encoded in ISO/IEC 10646:2003 up to Amendment 2, plus characters DEVANAGARI LETTER GGA, DEVANAGARI LETTER JJA, DEVANAGARI LETTER DDDA and DEVANAGARI LETTER BBA (characters U097B, U097C, U097E and U097F, respectively). It allows for a specification of a fully deterministic ordering. This table enables the specification of a string ordering adapted to local ordering rules, without requiring an implementer to have knowledge of all the different scripts already encoded in the UCS.
NOTE 1 This Common Template Table is to be modified to suit the needs of a local environment. The main worldwide benefit is that, for other scripts, often no modification is required and the order will remain as consistent as possible and predictable from an international point of view.
NOTE 2 The character repertoire used in ISO/IEC 14651:2007 is equivalent to that of the Unicode Standard version 5.0.
  • A reference name. The reference name refers to this particular version of the Common Template Table, for use as a reference when tailoring. In particular, this name implies that the table is linked to a particular stage of development of the ISO/IEC 10646 Universal multiple-octet coded character set.
  • Requirements for a declaration of the differences (delta) between the collation table and the Common Template Table.
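
The relationship between a template table, a declared delta and a key-based comparison can be sketched in a few lines of Python. This is only an illustrative sketch: the weights, the three-level layout, and the names TEMPLATE, DELTA, tailor, sort_key and compare are assumptions made for this example and are not taken from ISO/IEC 14651 or its Common Template Table. The standard itself accepts any method that gives the same results as its reference method.

```python
# Toy illustration only: these weights are invented for the example and are
# NOT taken from the Common Template Table of ISO/IEC 14651.
# Each character maps to hypothetical (primary, secondary, tertiary) weights.
TEMPLATE = {
    "a": (1, 1, 1), "A": (1, 1, 2),
    "\u00e4": (1, 2, 1), "\u00c4": (1, 2, 2),   # a/A with diaeresis
    "b": (2, 1, 1), "B": (2, 1, 2),
    "z": (3, 1, 1), "Z": (3, 1, 2),
}

# Hypothetical delta: treat a-diaeresis as a separate letter sorted after "z".
# The delta lists only the entries that differ from the template.
DELTA = {"\u00e4": (4, 1, 1), "\u00c4": (4, 1, 2)}


def tailor(template, delta):
    """Return a new table: the template with the delta applied on top."""
    table = dict(template)
    table.update(delta)
    return table


def sort_key(s, table):
    """Build a key that lists all primary weights, then all secondary
    weights, then all tertiary weights, so earlier levels dominate."""
    weights = [table[ch] for ch in s]
    return tuple(tuple(w[level] for w in weights) for level in range(3))


def compare(a, b, table):
    """Return -1, 0 or 1 depending on how a orders relative to b."""
    ka, kb = sort_key(a, table), sort_key(b, table)
    return (ka > kb) - (ka < kb)


words = ["zb", "\u00e4b", "Ab", "ab"]
untailored = sorted(words, key=lambda w: sort_key(w, TEMPLATE))
tailored = sorted(words, key=lambda w: sort_key(w, tailor(TEMPLATE, DELTA)))
print(untailored)
print(tailored)
```

With the toy template alone, the word starting with a-diaeresis sorts next to "ab" and "Ab"; with the delta applied it moves after "zb". Keeping the delta as a separate, small mapping mirrors the idea of declaring only the differences from the template.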
ISO/IEC 14651:2007 does not mandate the following.
  • A specific comparison method; any equivalent method giving the same results is acceptable.
  • A specific format for describing or tailoring tables in a given implementation.
  • Specific symbols to be used by implementations, except for the name of the Common Template Table.
  • Any specific user interface for choosing options.
  • Any specific internal format for intermediate keys used when comparing, nor for the table used. The use of numeric keys is not mandated either.
  • A context-dependent ordering.
  • Any particular preparation of character strings prior to comparison.
NOTE 1 It is normally necessary to do preparation of character strings prior to comparison even if it is not prescribed by ISO/IEC 14651:2007 (see informative Annex C).
NOTE 2 Although no user interface is required to choose options or to specify tailoring of the Common Template Table, conformance requires always declaring the applicable delta, a declaration of differences with this table. It is recommended that processes present available tailoring options to users.
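
As a small illustration of why preparation before comparison matters, the Python sketch below normalizes strings so that two encodings of the same visible text become identical. Using Unicode canonical decomposition (NFD) as the preparation step is an assumption made for this example; the text above does not say which preparation is appropriate, and the helper name prepare is hypothetical.

```python
import unicodedata


def prepare(s: str) -> str:
    """Hypothetical preparation step: canonical decomposition (NFD).
    This is an assumed example, not a step prescribed by ISO/IEC 14651."""
    return unicodedata.normalize("NFD", s)


precomposed = "\u00e9"   # e with acute accent as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Without preparation the two strings differ code point by code point.
print(precomposed == decomposed)
# After preparation they are identical, so a comparison treats them alike.
print(prepare(precomposed) == prepare(decomposed))
```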
