Quickjs.Unicode
Unicode utilities from QuickJS's libunicode
This module provides Unicode character classification, case conversion, and normalization functions. It uses the same battle-tested Unicode tables as QuickJS's ES2023-compliant JavaScript engine.
val normalize : normalization -> string -> string option
normalize form str normalizes a UTF-8 string to the specified form. Returns None on memory allocation failure or invalid input.
Example:
  normalize NFC "café"  (* composed form *)
  normalize NFD "café"  (* decomposed form *)
lowercase str converts a UTF-8 string to lowercase. Handles Unicode characters like "ÉCOLE" → "école".
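The composed and decomposed spellings in the normalize example render identically but differ at the code-point level, which is why strings are usually normalized before being compared. The following is a minimal stdlib-only sketch (OCaml >= 4.14). count_code_points and equal_normalized are hypothetical helper names, not part of this module. The normalizer is passed in as a function so the sketch runs without the library.

```ocaml
(* Hypothetical helpers for illustration; stdlib only, OCaml >= 4.14. *)

(* Number of code points in a UTF-8 string. Each invalid byte sequence
   counts as one (replacement) code point. *)
let count_code_points (s : string) : int =
  let rec go i n =
    if i >= String.length s then n
    else go (i + Uchar.utf_decode_length (String.get_utf_8_uchar s i)) (n + 1)
  in
  go 0 0

(* Compare two strings after normalizing both with [normalize], e.g. a
   partial application of Quickjs.Unicode.normalize (assumed). A failed
   normalization (None) makes the strings compare unequal. *)
let equal_normalized ~(normalize : string -> string option) (a : string)
    (b : string) : bool =
  match normalize a, normalize b with
  | Some a', Some b' -> String.equal a' b'
  | _ -> false

(* Two spellings of "café". *)
let composed = "caf\u{00E9}"     (* é as U+00E9: 4 code points *)
let decomposed = "cafe\u{0301}"  (* e + U+0301 COMBINING ACUTE ACCENT: 5 code points *)

let () =
  Printf.printf "%d %d %b\n"
    (count_code_points composed)
    (count_code_points decomposed)
    (String.equal composed decomposed)
```

With ~normalize:(Quickjs.Unicode.normalize Quickjs.Unicode.NFC), assuming the constructors are reachable under that path as the example suggests, equal_normalized composed decomposed would be expected to return true, because NFC composes e + U+0301 into U+00E9.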
uppercase str converts a UTF-8 string to uppercase. Handles special cases like "ß" → "SS".
lowercase_char c returns the lowercase form of a code point. It returns a list because case mapping can expand one code point into several, although lowercasing rarely does.
uppercase_char c returns the uppercase form of a code point. It returns a list because some characters expand into several code points, e.g., 'ß' → ['S'; 'S'].
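Because lowercase_char and uppercase_char may return several code points for one input, mapping either over a string means flattening the resulting lists. The following is a minimal stdlib-only sketch (OCaml >= 4.14). It assumes these functions have type Uchar.t -> Uchar.t list, since their exact signatures are not shown here. uchars_of_string, map_code_points and upper_stub are hypothetical names. upper_stub is a tiny stand-in for uppercase_char so the code runs on its own.

```ocaml
(* Hypothetical helpers for illustration; stdlib only, OCaml >= 4.14. *)

(* Decode a UTF-8 string into code points; each invalid byte sequence
   decodes to U+FFFD. *)
let uchars_of_string (s : string) : Uchar.t list =
  let rec go i acc =
    if i >= String.length s then List.rev acc
    else
      let d = String.get_utf_8_uchar s i in
      go (i + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
  in
  go 0 []

(* Apply a per-code-point mapping that may expand one code point into
   several (the assumed shape of lowercase_char / uppercase_char) and
   re-encode the result as UTF-8. *)
let map_code_points (f : Uchar.t -> Uchar.t list) (s : string) : string =
  let buf = Buffer.create (String.length s) in
  List.iter
    (fun u -> List.iter (Buffer.add_utf_8_uchar buf) (f u))
    (uchars_of_string s);
  Buffer.contents buf

(* Stand-in for uppercase_char covering only ASCII and U+00DF 'ß'. *)
let upper_stub (u : Uchar.t) : Uchar.t list =
  if Uchar.to_int u = 0xDF then [ Uchar.of_char 'S'; Uchar.of_char 'S' ]
  else if Uchar.is_char u then
    [ Uchar.of_char (Char.uppercase_ascii (Uchar.to_char u)) ]
  else [ u ]

let () = print_endline (map_code_points upper_stub "stra\u{00DF}e")
```

A per-code-point mapping like this ignores context-sensitive rules such as the Greek final sigma, which a whole-string conversion like lowercase can take into account.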
val is_cased : Uchar.t -> bool
is_cased c returns true if the character has the Unicode Cased property, i.e. it is uppercase, lowercase, or titlecase. Examples: 'a', 'A', 'é' are cased; '1', '!' are not.
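val is_case_ignorable : Uchar.t -> bool
is_case_ignorable c returns true if the character has the Unicode Case_Ignorable property (e.g., combining marks). Such characters are skipped when case-mapping rules inspect the surrounding context, as in the Greek final-sigma rule.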
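The Cased and Case_Ignorable properties are the inputs to the Unicode Final_Sigma casing context, which decides whether a capital sigma (Σ) lowercases to final ς or medial σ. The following is a minimal sketch of that test. is_final_sigma is a hypothetical helper, and the predicates are passed in. cased_stub and ignorable_stub are small stand-ins so the code runs without the library; in real code they would be is_cased and is_case_ignorable from this module.

```ocaml
(* Hypothetical helper for illustration; stdlib only. *)

(* Final_Sigma context: cps.(i) is in final position if it is preceded by a
   cased character followed by zero or more case-ignorable characters, and
   is NOT followed by zero or more case-ignorable characters and then a
   cased character. *)
let is_final_sigma ~(is_cased : Uchar.t -> bool)
    ~(is_case_ignorable : Uchar.t -> bool) (cps : Uchar.t array) (i : int) :
    bool =
  (* Walk left from j over case-ignorables; true if a cased char is reached. *)
  let rec cased_before j =
    j >= 0
    && (is_cased cps.(j) || (is_case_ignorable cps.(j) && cased_before (j - 1)))
  in
  (* Walk right from j over case-ignorables; true if a cased char is reached. *)
  let rec cased_after j =
    j < Array.length cps
    && (is_cased cps.(j) || (is_case_ignorable cps.(j) && cased_after (j + 1)))
  in
  cased_before (i - 1) && not (cased_after (i + 1))

(* Stand-ins: ASCII and basic Greek letters are "cased"; the apostrophe and
   U+0300..U+036F combining marks are "case-ignorable". *)
let cased_stub u =
  let c = Uchar.to_int u in
  (c >= 0x41 && c <= 0x5A) || (c >= 0x61 && c <= 0x7A)
  || (c >= 0x391 && c <= 0x3C9)

let ignorable_stub u =
  let c = Uchar.to_int u in
  c = 0x27 || (c >= 0x300 && c <= 0x36F)

let of_ints l = Array.of_list (List.map Uchar.of_int l)

(* "ΟΔΟΣ": is the sigma at index 3 in final position? *)
let () =
  Printf.printf "%b\n"
    (is_final_sigma ~is_cased:cased_stub ~is_case_ignorable:ignorable_stub
       (of_ints [ 0x39F; 0x394; 0x39F; 0x3A3 ])
       3)
```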
val is_id_start : Uchar.t -> bool
is_id_start c returns true if the character can start a JavaScript/Unicode identifier (letters, $, _).
val is_id_continue : Uchar.t -> bool
is_id_continue c returns true if the character can continue a JavaScript/Unicode identifier (letters, digits, $, _, combining marks).
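is_id_start and is_id_continue are typically combined to validate a whole identifier: the first code point must satisfy the start predicate and every later one the continue predicate. The following is a minimal stdlib-only sketch (OCaml >= 4.14). is_identifier, start_stub and continue_stub are hypothetical names; in practice you would pass ~id_start:Quickjs.Unicode.is_id_start ~id_continue:Quickjs.Unicode.is_id_continue. The sketch does not handle \u escape sequences or reserved words, which a full JavaScript identifier check would also need.

```ocaml
(* Hypothetical helper for illustration; stdlib only, OCaml >= 4.14. *)

(* True if [s] is non-empty, valid UTF-8, its first code point satisfies
   [id_start], and every remaining code point satisfies [id_continue]. *)
let is_identifier ~(id_start : Uchar.t -> bool)
    ~(id_continue : Uchar.t -> bool) (s : string) : bool =
  let n = String.length s in
  let rec go i first =
    if i >= n then not first (* the empty string is rejected *)
    else
      let d = String.get_utf_8_uchar s i in
      Uchar.utf_decode_is_valid d
      && (let u = Uchar.utf_decode_uchar d in
          if first then id_start u else id_continue u)
      && go (i + Uchar.utf_decode_length d) false
  in
  go 0 true

(* Stand-ins: ASCII letters, '$', '_' and U+00E9 may start an identifier;
   ASCII digits may additionally continue one. *)
let start_stub u =
  Uchar.to_int u = 0xE9
  || (Uchar.is_char u
      && (match Uchar.to_char u with
          | 'a' .. 'z' | 'A' .. 'Z' | '$' | '_' -> true
          | _ -> false))

let continue_stub u =
  start_stub u
  || (Uchar.is_char u
      && (match Uchar.to_char u with '0' .. '9' -> true | _ -> false))

let () =
  List.iter
    (fun s ->
      Printf.printf "%b\n"
        (is_identifier ~id_start:start_stub ~id_continue:continue_stub s))
    [ "caf\u{00E9}1"; "$x_2"; "1abc"; "" ]
```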
val is_whitespace : Uchar.t -> bool
is_whitespace c returns true if the character is Unicode whitespace. Includes ASCII space, tab, newline, and Unicode spaces like U+00A0 (NBSP).
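OCaml's String.trim only strips ASCII whitespace (' ', '\012', '\n', '\r', '\t'), so a leading or trailing non-breaking space survives it. The following is a minimal stdlib-only sketch (OCaml >= 4.14) of a Unicode-aware trim that takes a whitespace predicate. trim_unicode and space_stub are hypothetical names; in real code is_space would be Quickjs.Unicode.is_whitespace.

```ocaml
(* Hypothetical helper for illustration; stdlib only, OCaml >= 4.14. *)

(* Strip leading and trailing code points that satisfy [is_space] from a
   UTF-8 string. Works on byte offsets, so the kept middle part is copied
   unchanged. *)
let trim_unicode ~(is_space : Uchar.t -> bool) (s : string) : string =
  let n = String.length s in
  (* Byte offset of the first non-space code point (n if there is none). *)
  let rec first i =
    if i >= n then n
    else
      let d = String.get_utf_8_uchar s i in
      if is_space (Uchar.utf_decode_uchar d) then
        first (i + Uchar.utf_decode_length d)
      else i
  in
  let start = first 0 in
  (* Scan forward, remembering the offset just past the last non-space. *)
  let rec last i stop =
    if i >= n then stop
    else
      let d = String.get_utf_8_uchar s i in
      let next = i + Uchar.utf_decode_length d in
      if is_space (Uchar.utf_decode_uchar d) then last next stop
      else last next next
  in
  let stop = last start start in
  String.sub s start (stop - start)

(* Stand-in predicate: space, tab, LF, CR and U+00A0 NO-BREAK SPACE. *)
let space_stub u =
  match Uchar.to_int u with
  | 0x20 | 0x09 | 0x0A | 0x0D | 0xA0 -> true
  | _ -> false

let () =
  print_endline (trim_unicode ~is_space:space_stub "\u{00A0} hi there\t\u{00A0}\n")
```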