Best Features of ES2018 — New Regex Features

Since 2015, JavaScript has improved immensely.

It’s much more pleasant to use it now than ever.

In this article, we’ll look at the best features of ES2018.

Regex Property Escapes

Every Unicode character has properties, which are metadata describing it.

There are names like Lowercase_Letter to describe lowercase letters, White_Space to describe whitespace, and so on.

There are several types of properties.

An enumerated property is a property whose values are few and named.

General_Category is an enumerated property.

A closed enumerated property is an enumerated property whose set of values is fixed.

A boolean property is a closed enumerated property whose value is true or false.

A numeric property has values that are real numbers.

A string-valued property is a property whose values are strings.

A catalog property is an enumerated property that may be extended as Unicode changes.

A miscellaneous property is a property whose values aren’t any of the above.

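There are various ways of matching property names and property values.

Unicode recommends loose matching, so General_Category is considered the same as GeneralCategory and other variants. JavaScript regexes don’t support loose matching, though: a property name or value must be written exactly, using either its long name or its short alias.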

Unicode Property Escapes for Regex

We can use \p{...} in a regex to match characters that have a given Unicode property, and \P{...} to match characters that don’t have it.

This must be used with the /u flag to enable Unicode mode.

Without Unicode mode, \p is the same as p.
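As a quick check of the flag requirement, the following sketch runs the same pattern with and without the u flag. The sample strings and variable names are only illustrative.

```javascript
// With the u flag, \p{Letter} is a Unicode property escape.
const withUnicodeFlag = /^\p{Letter}$/u.test('a');

// Without the u flag, \p is just an escaped "p", so the pattern
// looks for the literal text "p{Letter}" instead of a letter.
const withoutUnicodeFlag = /^\p{Letter}$/.test('a');
const matchesLiteralText = /^\p{Letter}$/.test('p{Letter}');

console.log(withUnicodeFlag, withoutUnicodeFlag, matchesLiteralText);
```

Forgetting the flag doesn’t throw an error, so a pattern that silently never matches is the usual symptom.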

For instance, we can write:

const result = /^\p{White_Space}+$/u.test(' ')

and result would be true.

This means \p{White_Space} matches whitespace.

This is more descriptive than a hand-written character class that lists each whitespace character.
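Here is a minimal sketch of how the White_Space property could be used to normalize spacing. The input string is made up for illustration: it mixes a no-break space (U+00A0), an em space (U+2003) and a tab.

```javascript
// All three separators below have the Unicode White_Space property.
const messy = 'a\u00A0\u2003b\tc';

// Collapse every run of Unicode whitespace into one ASCII space.
const normalized = messy.replace(/\p{White_Space}+/gu, ' ');

// \P{...} (capital P) is the negated form: runs of non-whitespace.
const tokens = messy.match(/\P{White_Space}+/gu);

console.log(normalized, tokens);
```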

We can also write:

const result = /^\p{Letter}+$/u.test('abc')

to match letters.
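Letter isn’t limited to ASCII. The sketch below, with made-up sample text, pulls words out of a string that mixes Latin, Greek and Cyrillic letters. It also shows that Letter can be written in several equivalent ways.

```javascript
// Latin, Greek (alpha beta gamma) and Cyrillic words, then digits.
const text = 'abc \u03B1\u03B2\u03B3 \u0433\u0434\u0435 123';

// Runs of letters in any script; the digits are skipped.
const words = text.match(/\p{Letter}+/gu);

// Equivalent spellings: long value name, short alias, and the
// explicit General_Category (gc) property with either value name.
const spellings = [
  /^\p{Letter}$/u,
  /^\p{L}$/u,
  /^\p{General_Category=Letter}$/u,
  /^\p{gc=L}$/u,
];
const allAgree = spellings.every((re) => re.test('\u03B1'));

// \P{Letter} matches everything that is not a letter.
const lettersOnly = 'a1b2c3'.replace(/\P{Letter}/gu, '');

console.log(words, allAgree, lettersOnly);
```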

To match Greek letters, we write:

const result = /^\p{Script=Greek}+$/u.test('μ')

And we can match Latin letters with:

const result = /^\p{Script=Latin}+$/u.test('ç')
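To illustrate the Script property on longer input, this sketch splits a mixed string into Greek and Latin runs. The sample string is an assumption for the example; sc and Grek are the short aliases for Script and Greek.

```javascript
// Latin text with a Greek run (alpha beta gamma) in the middle.
const mixed = 'abc \u03B1\u03B2\u03B3 xyz';

// Spaces belong to neither script, so they separate the runs.
const greekRuns = mixed.match(/\p{Script=Greek}+/gu);
const latinRuns = mixed.match(/\p{Script=Latin}+/gu);

// The short aliases behave the same as the long names.
const shortAliasMatches = /^\p{sc=Grek}+$/u.test('\u03BC');

console.log(greekRuns, latinRuns, shortAliasMatches);
```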

Lone surrogate characters can also be matched:

const result = /^\p{Surrogate}+$/u.test('\u{D83D}')

Lookbehind Assertions

A lookbehind assertion is a construct in a regex that specifies what must come before the current location, without making it part of the match.

For instance, we can write:

const RE_DOLLAR_PREFIX = /(?<=\$)\d+/g;

const result = '$123'.replace(RE_DOLLAR_PREFIX, '456');

We have the (?<=\$) group to look for digits that have a dollar sign before them.

Then when we call replace, only the number is replaced, since the dollar sign isn’t part of the match.

So result is '$456'.
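The same kind of pattern can be used for extraction as well as replacement. The following is a small sketch with an invented receipt string that keeps only the amounts written with a dollar sign.

```javascript
const PRICE_IN_DOLLARS = /(?<=\$)\d+/g;

// Hypothetical input mixing dollar and euro amounts.
const receipt = 'coffee $4, tea €3, cake $12';

// The $ must precede the digits but is not included in each match.
const dollarAmounts = receipt.match(PRICE_IN_DOLLARS);

// Sum the extracted amounts.
const total = dollarAmounts.reduce((sum, n) => sum + Number(n), 0);

console.log(dollarAmounts, total);
```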

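Because the assertion doesn’t consume the prefix, it also works when the prefix is part of the previous match, which the older workaround of capturing the prefix in a group can’t handle.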
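A sketch of the case where the prefix is also the end of the previous match, using a made-up input string. A capture group consumes the prefix, so it misses matches that share it, while a lookbehind only checks it.

```javascript
const input = 'a1ba2ba3b';

// Capture-group approach: the leading "b" is consumed by the match,
// so the "b" that ends one match cannot also start the next one.
const withCaptureGroup = input.match(/(b)a.b/g);

// Lookbehind approach: the leading "b" is only checked, not consumed,
// so the "b" that ends "a2b" can serve as the prefix of "a3b".
const withLookbehind = input.match(/(?<=b)a.b/g);

console.log(withCaptureGroup, withLookbehind);
```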

We can also use (?<!...), with a ! in place of the =, to add a negative lookbehind assertion.

For instance, we can write:

const RE_NO_DOLLAR_PREFIX = /(?<!\$)baz/g;

const result = '&baz'.replace(RE_NO_DOLLAR_PREFIX, 'qux');

The regex looks for baz without a dollar sign before it.

So if we call replace as we did, we get '&qux' returned, since baz is preceded by & rather than $.
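One thing to watch with negative lookbehind is that the assertion is tested at every position, not only at the start of a word or number. The sketch below uses an invented input to show the pitfall with digits and one possible way around it.

```javascript
const amounts = '$123 456';

// Naive attempt: "23" still matches, because the "2" is preceded
// by "1", which is not a dollar sign.
const naive = amounts.match(/(?<!\$)\d+/g);

// Also rejecting a preceding digit means a match can only start at
// the first digit of a number.
const strict = amounts.match(/(?<![$\d])\d+/g);

console.log(naive, strict);
```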

s (dotAll) Flag for Regex

The dotAll flag is an enhancement of the . character in a regex.

Without it, the . in a regex doesn’t match line terminator characters.

So if we have:

/^.$/.test('\n')

We get false.

To match line terminator characters, we have to write:

/^[^]$/.test('\n')

to match any character, since [^] is an empty negated character class, or:

/^[\s\S]$/.test('\n')

to match any whitespace or non-whitespace character.

There are various kinds of line terminator characters.

They include:

U+000A line feed (LF) (\n)
U+000D carriage return (CR) (\r)
U+2028 line separator
U+2029 paragraph separator
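The list above can be checked directly. This sketch tests each of the four line terminators against a lone . and then against the same pattern with the s flag that this section covers.

```javascript
// LF, CR, line separator, paragraph separator.
const lineTerminators = ['\n', '\r', '\u2028', '\u2029'];

// Without the s flag, . rejects every line terminator.
const withoutSFlag = lineTerminators.map((ch) => /^.$/.test(ch));

// With the s flag, . accepts all of them.
const withSFlag = lineTerminators.map((ch) => /^.$/s.test(ch));

console.log(withoutSFlag, withSFlag);
```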

With ES2018, we can use the /s flag to make the . match line terminators as well:

const result = /^.$/s.test('\n')

So result should be true.

The long name of the flag is dotAll.

So we can write:

/./s.dotAll

and that returns true .
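A typical use for the flag is matching text that spans several lines. The following sketch, with a made-up source string, looks for a block comment with and without the s flag.

```javascript
// Hypothetical snippet containing a two-line block comment.
const source = 'let a = 1; /* first line\nsecond line */ let b = 2;';

// Without s, .*? stops at the newline, so the closing */ is never
// reached and there is no match.
const withoutFlag = /\/\*.*?\*\//.exec(source);

// With s, .*? can cross the newline.
const withFlag = /\/\*.*?\*\//s.exec(source);

// The flag also shows up in the flags string next to other flags.
const flags = /./gs.flags;

console.log(withoutFlag, withFlag[0], flags);
```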

Conclusion

Regex property escapes, lookbehind assertions, and the dotAll flag are new additions to JavaScript regexes that let us match various special cases.


By John Au-Yeung
