unicode: introduce code for UTF-8 normalization - linux.git - dakr's fork of kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git


author	Olaf Weber <olaf@sgi.com>	2019-04-25 13:45:46 -0400
committer	Theodore Ts'o <tytso@mit.edu>	2019-04-25 13:45:46 -0400
commit	44594c2fbf42528001dfb1597d26adb40ba6d178 (patch)
tree	08f72b2f0b6413988dd8785f06468fbcc2d8e8b0 /kernel
parent	955405d1174eebcd1b89ab335f720adc27d52b67 (diff)

unicode: introduce code for UTF-8 normalization

Supporting functions for UTF-8 normalization are in utf8norm.c with the
header utf8norm.h. Two normalization forms are supported: nfdi and
nfdicf.

  nfdi:
   - Apply Unicode normalization form NFD.
   - Remove any Default_Ignorable_Code_Point.

  nfdicf:
   - Apply Unicode normalization form NFD.
   - Remove any Default_Ignorable_Code_Point.
   - Apply a full casefold (C + F).

For the purposes of the code, a string is valid UTF-8 if:

 - The values encoded are 0x1..0x10FFFF.
 - The surrogate codepoints 0xD800..0xDFFF are not encoded.
 - The shortest possible encoding is used for all values.

The supporting functions work on null-terminated strings (utf8 prefix)
and on length-limited strings (utf8n prefix).

From the original SGI patch and for conformity with coding standards,
the utf8data_t typedef was dropped, since it was just masking the struct
keyword. The utf8leaf_t and utf8trie_t typedefs were kept, since they
are simple pointers to memory buffers, and using uchars here wouldn't
provide any more meaningful information.

From the original submission, we also converted from the compatibility
form to the canonical form.

Changes made by Gabriel:
  Rebase to Mainline
  Fix up checkpatch.pl warnings
  Drop typedefs
  Move out of libxfs
  Convert from NFKD to NFD

Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
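
The three validity rules in the message can be illustrated with a small standalone checker. This is only a sketch under stated assumptions, not the code from utf8norm.c: the names utf8n_is_valid() and utf8_is_valid() are hypothetical and merely mirror the utf8n/utf8 prefix convention (length-limited versus null-terminated), and treating an embedded NUL byte as invalid is a literal reading of the 0x1..0x10FFFF rule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of utf8norm.c): check a length-limited
 * buffer against the three rules from the commit message.
 */
static bool utf8n_is_valid(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		unsigned long cp, min;
		size_t n, k;

		/* Decode the lead byte: payload bits, length, smallest legal value. */
		if (c < 0x80) {
			cp = c; n = 1; min = 0x1;	/* assumption: NUL is rejected */
		} else if ((c & 0xE0) == 0xC0) {
			cp = c & 0x1F; n = 2; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {
			cp = c & 0x0F; n = 3; min = 0x800;
		} else if ((c & 0xF8) == 0xF0) {
			cp = c & 0x07; n = 4; min = 0x10000;
		} else {
			return false;	/* stray continuation or invalid lead byte */
		}

		if (n > len - i)
			return false;	/* sequence truncated by the length limit */

		/* Each continuation byte must look like 10xxxxxx. */
		for (k = 1; k < n; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return false;
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}

		if (cp < min)
			return false;	/* not the shortest possible encoding */
		if (cp > 0x10FFFF)
			return false;	/* outside 0x1..0x10FFFF */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return false;	/* surrogate codepoint */

		i += n;
	}
	return true;
}

/* Hypothetical null-terminated counterpart, following the utf8 prefix. */
static bool utf8_is_valid(const char *s)
{
	return utf8n_is_valid((const unsigned char *)s, strlen(s));
}
```

With this sketch, utf8_is_valid("\xC3\xA9") accepts the two-byte encoding of U+00E9, while the overlong sequence "\xC0\xAF" and the encoded surrogate "\xED\xA0\x80" are both rejected.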

Diffstat (limited to 'kernel')

0 files changed, 0 insertions, 0 deletions

