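Lone surrogates
JavaScript strings are sequences of 16-bit code units. A surrogate is a code unit, such as \uD914, that falls in one of the two ranges below:
- Leading surrogates: 0xD800 to 0xDBFF
- Trailing surrogates: 0xDC00 to 0xDFFF
A leading surrogate immediately followed by a trailing surrogate forms a valid pair that encodes a single character. A lone surrogate is a leading or trailing surrogate that is not part of such a pair. A string is not well-formed if it contains lone surrogates. String provides the two methods below to check for well-formed strings and to convert strings to well-formed ones.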
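To make the definition concrete, here is a minimal sketch that scans a string code unit by code unit and reports whether it contains a lone surrogate. The helper name hasLoneSurrogate is hypothetical and used only for illustration; it is not a built-in.

```javascript
// Hypothetical helper (not a built-in): returns true if `str` contains a
// surrogate code unit that is not part of a leading + trailing pair.
function hasLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // Leading surrogate: the next code unit must be a trailing surrogate.
      const next = str.charCodeAt(i + 1); // NaN at the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair, skip the trailing half
      } else {
        return true;
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Trailing surrogate with no leading surrogate right before it.
      return true;
    }
  }
  return false;
}

console.log(hasLoneSurrogate("hello\uD914")); // true
console.log(hasLoneSurrogate("welcome 😃")); // false
console.log(hasLoneSurrogate("\uDE03\uD83D")); // true (pair in the wrong order)
```

The last call shows that order matters: a trailing surrogate followed by a leading surrogate is two lone surrogates, not a pair.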
String.prototype.isWellFormed method
Checks whether the string contains lone surrogates.
Returns true if the string contains no lone surrogates, and false otherwise.
const str1 = "hello\uD914";
const str2 = "welcome";
console.log(str1.isWellFormed()); // false
console.log(str2.isWellFormed()); // true
Another example with an emoji and a lone leading surrogate
// An emoji is a valid surrogate pair, so the string is well-formed
const str = "welcome 😃 ";
console.log(str.isWellFormed()); // true
// Not a well-formed string: it ends with a lone leading surrogate
const illFormed = "user \uD83C";
console.log(illFormed.isWellFormed()); // false
String.prototype.toWellFormed method
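This method returns a new string in which every unpaired surrogate is replaced with the U+FFFD replacement character (�). The original string is not modified.
An unpaired (lone) surrogate is a leading surrogate that is not followed by a trailing surrogate, or a trailing surrogate that is not preceded by a leading surrogate. Valid pairs are left unchanged.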
const str1 = "hello\uD914";
const str2 = "welcome";
console.log(str1.toWellFormed()); // hello�
console.log(str2.toWellFormed()); // welcome
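The replacement behaviour can be approximated with a regular expression, which may help in seeing exactly which code units are touched. This is only a sketch: the name toWellFormedFallback is hypothetical, and the regular expression is an assumption about how to mimic the method, not how engines implement it.

```javascript
// Matches a leading surrogate not followed by a trailing surrogate,
// or a trailing surrogate not preceded by a leading surrogate.
// No `u` flag, so the pattern works on individual 16-bit code units.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical stand-in for String.prototype.toWellFormed.
function toWellFormedFallback(str) {
  return str.replace(LONE_SURROGATE, "\uFFFD");
}

// Lone leading surrogate, a valid emoji pair, then a lone trailing surrogate.
const mixed = "a\uD83Db😃c\uDE03";
const fixed = toWellFormedFallback(mixed);

console.log(fixed); // a�b😃c�
console.log(fixed.length === mixed.length); // true
```

Each lone surrogate is swapped for exactly one U+FFFD code unit, so the length of the string does not change, and the emoji in the middle is preserved because its two halves form a valid pair.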
Where do we use these methods?
encodeURI
The encodeURI method throws a URIError if the string is not well-formed.
// \uD914 is a lone leading surrogate
const str = "https://domain.com/query?q=\uD914";
try {
  encodeURI(str);
} catch (e) {
  console.log(e); // URIError: URI malformed
}
To avoid encodeURI errors, check the string and convert it when needed
const str = "https://domain.com/query?q=\uD914";
// Check whether the string is well-formed
if (str.isWellFormed()) {
  // Safe to encode as is
  console.log(encodeURI(str));
} else {
  // Replace lone surrogates first, then encode
  console.log(encodeURI(str.toWellFormed()));
}
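The check-and-convert step can be wrapped in a small helper so that callers never see a URIError from lone surrogates. This is a sketch under assumptions: safeEncodeURI is a hypothetical name, and the regular-expression branch is only an approximate fallback for runtimes that do not ship toWellFormed.

```javascript
// Hypothetical helper: encodes a URI after replacing lone surrogates.
function safeEncodeURI(uri) {
  const wellFormed =
    typeof uri.toWellFormed === "function"
      ? uri.toWellFormed() // native method where available
      : uri.replace(
          // Assumed fallback: replace unpaired surrogates with U+FFFD.
          /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g,
          "\uFFFD"
        );
  return encodeURI(wellFormed);
}

console.log(safeEncodeURI("https://domain.com/query?q=\uD914"));
// https://domain.com/query?q=%EF%BF%BD
```

Calling toWellFormed unconditionally is fine here because it returns an equivalent string when the input is already well-formed. Note that the lone surrogate is replaced rather than recovered, so the encoded query carries %EF%BF%BD (the UTF-8 bytes of U+FFFD) in its place.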
Supported Browsers
- Chrome 111 and later
- Firefox 119 and later
- Safari 16.4 and later
- Edge 111 and later
In summary, checking for well-formed strings and converting ill-formed ones helps developers avoid errors when encoding strings.