author Gabriel Krisman Bertazi <krisman@collabora.com> 2019-04-25 13:38:44 -0400
committer Theodore Ts'o <tytso@mit.edu> 2019-04-25 13:38:44 -0400
commit 955405d1174eebcd1b89ab335f720adc27d52b67
tree df420b2703e110c3ac1cc51f508918500f30715f /fs/unicode/Makefile
parent 310a997fd74de778b9a4848a64be9cda9f18764a
unicode: introduce UTF-8 character database
The decomposition and casefolding of UTF-8 characters are described in a prefix tree in utf8data.h, which is generated from the Unicode Character Database (UCD), published by the Unicode Consortium, and should not be edited by hand. The structures in utf8data.h are meant to be used for lookup operations by the unicode subsystem when decoding a UTF-8 string.

mkutf8data.c is the source for a program that generates utf8data.h. It was written by Olaf Weber from SGI and originally proposed to be merged into Linux in 2014. The original proposal performed the compatibility decomposition, NFKD, but the current version was modified by me to do canonical decomposition, NFD, as suggested by the community.

The changes from the original submission are:

  * Rebase to mainline.
  * Fix out-of-tree build.
  * Update Makefile to build 11.0.0 UCD files.
  * Drop references to xfs.
  * Convert NFKD to NFD.
  * Merge back robustness fixes from original patch. Requested by Dave Chinner.

The original submission is archived at:

<https://linux-xfs.oss.sgi.narkive.com/Xx10wjVY/rfc-unicode-utf-8-support-for-xfs>

The utf8data.h file can be regenerated using the instructions in fs/unicode/README.utf8data.

- Notes on the update from 8.0.0 to 11.0.0:

The structure of the UCD files and special cases have not experienced any changes between versions 8.0.0 and 11.0.0. 8.0.0 saw the addition of Cherokee lowercase (LC) characters, which is an interesting case for case-folding. The update is accompanied by new tests on the test_ucd module to catch specific cases. No changes to the mkutf8data script were required for the updates.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
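
As a rough illustration of the NFKD versus NFD distinction mentioned in the commit message, the following Python sketch compares the two decompositions on a couple of characters. It uses Python's standard `unicodedata` module, not the kernel's utf8data.h tables, and the helper names `codepoints` and `decompose` are made up for this example.

```python
import unicodedata


def codepoints(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def decompose(s):
    """Return (NFD, NFKD) forms of s using Python's bundled Unicode data."""
    return unicodedata.normalize("NFD", s), unicodedata.normalize("NFKD", s)


# Illustrative inputs only: a precomposed letter and a compatibility ligature.
samples = {
    "e with acute": "\u00e9",
    "fi ligature": "\ufb01",
}

for name, text in samples.items():
    nfd, nfkd = decompose(text)
    print(f"{name}: NFD = {codepoints(nfd)}, NFKD = {codepoints(nfkd)}")
```

Canonical decomposition (NFD) splits the precomposed letter into a base letter plus a combining accent but leaves the ligature alone, while compatibility decomposition (NFKD) also expands the ligature into two plain letters. The exact results for other characters depend on the Unicode version bundled with the Python interpreter, which may differ from the 11.0.0 data used by this commit.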
Diffstat (limited to 'fs/unicode/Makefile')
-rw-r--r-- fs/unicode/Makefile 14
1 files changed, 14 insertions, 0 deletions
diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile
new file mode 100644
index 000000000000..764f8e5da4bb
--- /dev/null
+++ b/fs/unicode/Makefile
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# This rule is not invoked during the kernel compilation. It is used to
+# regenerate the utf8data.h header file.
+utf8data.h.new: *.txt $(objdir)/scripts/mkutf8data
+	$(objdir)/scripts/mkutf8data \
+		-a DerivedAge.txt \
+		-c DerivedCombiningClass.txt \
+		-p DerivedCoreProperties.txt \
+		-d UnicodeData.txt \
+		-f CaseFolding.txt \
+		-n NormalizationCorrections.txt \
+		-t NormalizationTest.txt \
+		-o $@