Since 2015, JavaScript has improved immensely.
It’s much more pleasant to use it now than ever.
In this article, we’ll look at the best features of ES2018.
Regex Property Escapes
Every Unicode character has properties, which are metadata describing it.
For example, the General_Category value Lowercase_Letter describes lowercase letters, and the White_Space property describes whitespace characters.
There’re several types of properties.
An enumerated property is a property whose values are few and named.
General_Category is an enumerated property.
A closed enumerated property is an enumerated property whose set of values is fixed.
A boolean property is a closed enumerated property whose value is either true or false.
Numeric properties have values that are real numbers.
A string-valued property is a property whose values are strings.
A catalog property is an enumerated property that may be extended as Unicode changes.
A miscellaneous property is a property whose values aren’t any of the above.
The Unicode standard recommends loose matching for property names and values, so General_Category is considered the same as GeneralCategory and other variants.
JavaScript regexes are stricter: only the exact names and aliases, such as General_Category or gc, are accepted.
Unicode Property Escapes for Regex
We can use \p{...} in a regex to match characters by their Unicode properties.
This must be used with the /u flag, which enables Unicode mode.
Without Unicode mode, \p is the same as p.
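To see why the /u flag matters, here is a minimal sketch that runs the same pattern with and without it. The variable names are only for illustration.

```javascript
// Without the u flag, \p is just an escaped "p",
// so the rest of the pattern, including the braces, is matched literally.
const withoutUnicodeMode = /^\p{Letter}$/;
console.log(withoutUnicodeMode.test('a')); // false
console.log(withoutUnicodeMode.test('p{Letter}')); // true

// With the u flag, \p{Letter} is a Unicode property escape.
const withUnicodeMode = /^\p{Letter}$/u;
console.log(withUnicodeMode.test('a')); // true
console.log(withUnicodeMode.test('p{Letter}')); // false
```

Forgetting the flag doesn’t throw an error. The regex just silently matches something else, so it’s worth double-checking.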
For instance, we can write:
const result = /^\p{White_Space}+$/u.test(' ')
and result would be true.
This means \p{White_Space} matches whitespace.
This is more descriptive than a hand-written character class.
We can also write:
const result = /^\p{Letter}+$/u.test('abc')
to match letters.
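The names inside \p{...} have to be spelled exactly, but each property and value also has a short alias. The sketch below checks a few spellings. isValidUnicodePattern is a hypothetical helper written for this example, not a built-in function.

```javascript
// Hypothetical helper: reports whether the engine accepts a pattern in Unicode mode.
function isValidUnicodePattern(source) {
  try {
    new RegExp(source, 'u');
    return true;
  } catch (error) {
    // An unknown property name or value makes the constructor throw.
    return false;
  }
}

const spellings = [
  '\\p{Letter}',                  // long value name
  '\\p{L}',                       // short alias for Letter
  '\\p{General_Category=Letter}', // explicit property name
  '\\p{gc=L}',                    // short aliases for both parts
  '\\p{GeneralCategory=Letter}',  // loosely matched spelling
  '\\p{letter}',                  // wrong case
];

// Only the first four spellings should be accepted.
const accepted = spellings.filter(isValidUnicodePattern);
console.log(accepted);
```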
To match Greek letters, we write:
const result = /^\p{Script=Greek}+$/u.test('μ')
And we can match Latin letters with:
const result = /^\p{Script=Latin}+$/u.test('ç')
Lone surrogates can also be matched:
const result = /^\p{Surrogate}+$/u.test('\u{D83D}')
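Property escapes are also handy with match and the g flag. As a rough sketch, assuming a made-up sample string, we can pull words out of mixed-script text like this:

```javascript
// Sample text with Latin, Cyrillic and Han characters plus some digits.
const text = 'Hello мир 世界 123';

// Runs of letters from any script; the digits are skipped.
const words = text.match(/\p{Letter}+/gu);
console.log(words); // ['Hello', 'мир', '世界']

// Runs of letters from one script only.
const cyrillicWords = text.match(/\p{Script=Cyrillic}+/gu);
console.log(cyrillicWords); // ['мир']

const hanWords = text.match(/\p{Script=Han}+/gu);
console.log(hanWords); // ['世界']
```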
Lookbehind Assertions
A lookbehind assertion is a construct in a regex that specifies what must come immediately before the current location, without making it part of the match.
For instance, we can write:
const RE_DOLLAR_PREFIX = /(?<=\$)\d+/g;
const result = '$123'.replace(RE_DOLLAR_PREFIX, '456');
The (?<=\$) group is a lookbehind assertion, so \d+ only matches digits that have a dollar sign right before them.
The dollar sign itself isn’t part of the match, so when we call replace, only the number is replaced.
So result is '$456'.
Without lookbehind, we could capture the prefix in a group and put it back in the replacement, but that doesn’t work if the prefix should be part of the previous match.
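Here is a small sketch of both points. The sample strings are made up for illustration. The first part extracts only the dollar amounts. The second part shows a case where the prefix b is also the end of the previous match, so a pattern that consumes the prefix misses matches.

```javascript
// Only numbers with a dollar sign right before them are matched.
const prices = 'Total: $120, discount: 15%, shipping: $8';
const dollarAmounts = prices.match(/(?<=\$)\d+/g);
console.log(dollarAmounts); // ['120', '8']

// Each "b" ends one match and is also the prefix of the next one.
const overlapping = 'a1ba2ba3b';

// The lookbehind doesn't consume the "b", so both later groups are found.
const withLookbehind = overlapping.match(/(?<=b)a.b/g);
console.log(withLookbehind); // ['a2b', 'a3b']

// Consuming the "b" as part of the match skips the last group.
const withoutLookbehind = overlapping.match(/ba.b/g);
console.log(withoutLookbehind); // ['ba2b']
```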
We can also use ! instead of = to write a negative lookbehind assertion.
For instance, we can write:
const RE_DOLLAR_PREFIX = /(?<!\$)baz/g;
const result = '&baz'.replace(RE_DOLLAR_PREFIX, 'qux');
The regex looks for baz without a dollar sign right before it.
So if we call replace as we did, we get '&qux' returned, since baz is preceded by & rather than $.
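The effect is easier to see when the string has several occurrences. This is a minimal sketch with a made-up input:

```javascript
// Three occurrences of baz: one after "$", one after "&", one after a space.
const input = '$baz &baz baz';

// Only the occurrences that don't follow a dollar sign are replaced.
const output = input.replace(/(?<!\$)baz/g, 'qux');
console.log(output); // '$baz &qux qux'
```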
s (dotAll) Flag for Regex
The dotAll flag is an enhancement of the . (dot) in a regex.
By default, the . in a regex doesn’t match line terminator characters.
So if we have:
/^.$/.test('\n')
We get false.
Before ES2018, to match line terminator characters, we had to write:
/^[^]$/.test('\n')
which matches any character, since [^] is a negated empty character class, or:
/^[\s\S]$/.test('\n')
which matches any whitespace or non-whitespace character.
There’re various kinds of line termination characters.
They include:
- U+000A line feed (LF) (\n)
- U+000D carriage return (CR) (\r)
- U+2028 line separator
- U+2029 paragraph separator
With ES2018, we can use the /s flag to make the . match line terminators as well:
const result = /^.$/s.test('\n')
So result should be true.
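A typical use is matching text that spans several lines. The sketch below, with a made-up source string, looks for a block comment that contains a line break:

```javascript
// A block comment that spans two lines.
const source = 'before /* first line\nsecond line */ after';

// Without the s flag, the dot stops at the line feed, so nothing matches.
const withoutDotAll = /\/\*.*?\*\//.exec(source);
console.log(withoutDotAll); // null

// With the s flag, the dot also matches the line feed.
const withDotAll = /\/\*.*?\*\//s.exec(source);
console.log(withDotAll[0]); // '/* first line\nsecond line */'

// The flag covers the other line terminators as well, such as U+2028.
console.log(/^.$/.test('\u2028')); // false
console.log(/^.$/s.test('\u2028')); // true
```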
The long name of the flag is dotAll.
So we can write:
/./s.dotAll
and that returns true.
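The same information is available when the regex is created with the RegExp constructor, and the flag also shows up in the flags string. A short sketch:

```javascript
// A regex without the s flag reports dotAll as false.
const plainRegex = /./;
console.log(plainRegex.dotAll); // false

// The flag can also be passed to the RegExp constructor.
const dotAllRegex = new RegExp('.', 's');
console.log(dotAllRegex.dotAll); // true

// The flags property lists every flag that is set.
const combinedFlags = /./gs.flags;
console.log(combinedFlags); // 'gs'
```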
Conclusion
Regex property escapes, lookbehind assertions, and the dotAll flag are new additions to JavaScript regexes that let us match various special cases.