Regular expressions let us manipulate strings with ease. They are patterns that let us match text in pretty much any way we can imagine. Without them, searching text for complex patterns would be much harder. They’re also useful for form validation, where we check inputs against a pattern.
In JavaScript, regular expressions are objects. We can create one with a regular expression literal or with the RegExp constructor.
To define regular expression literals, we wrap our regular expression pattern with slashes. For example, we can write:
const re = /a/
to define a regular expression object that looks for the letter ‘a’. Alternatively, we can pass the pattern as a string to the RegExp constructor as follows:
const re = new RegExp('a');
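As a quick check that both forms behave the same, here is a minimal sketch. The variable names and sample strings are made up for illustration; test is the standard RegExp method that returns true or false.

```javascript
// Two ways to build the same pattern.
const literalRe = /a/;
const constructedRe = new RegExp('a');

console.log(literalRe.test('cat'));     // true
console.log(constructedRe.test('cat')); // true
console.log(literalRe.test('dog'));     // false

// The constructor takes a string, so the pattern can be built at run time.
const letter = 'o'; // hypothetical value, for example from user input
const dynamicRe = new RegExp(letter);
console.log(dynamicRe.test('dog'));     // true
```

The literal form is usually easier to read when the pattern is fixed, while the constructor helps when the pattern is only known while the program runs.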
Plain letters and digits like the ones above are useful for searching for simple patterns. To search for more complex patterns, we have to build the regular expression with special characters. Most of them work the same way in most programming languages, but there may also be language-specific extensions.
Special Characters
Below are the special characters that we can use with JavaScript regular expressions:
\
A backslash that precedes a non-special character indicates that the next character is special and not to be interpreted literally. For instance, \b means a word boundary, while b matches the letter ‘b’.
A backslash that precedes a special character indicates that the next character isn’t special and should be interpreted literally. For example, \^ will search for the ‘^’ character.
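Both uses of the backslash can be tried out as follows. This is only a sketch with made-up sample strings. The last lines show a detail of string patterns: inside a string passed to RegExp, the backslash itself has to be escaped.

```javascript
// \b is a word boundary; a plain b is just the letter.
const wordStartRe = /\bcat/;
console.log(wordStartRe.test('a cat'));   // true: 'cat' starts a word
console.log(wordStartRe.test('bobcat'));  // false: 'cat' is in the middle of a word

// \^ matches a literal caret instead of acting as a special character.
const caretRe = /\^/;
console.log(caretRe.test('2^3'));         // true
console.log(caretRe.test('23'));          // false

// In a string pattern, write two backslashes to get one in the regular expression.
const caretFromString = new RegExp('\\^');
console.log(caretFromString.test('2^3')); // true
```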
^
The ^ character matches the beginning of the input. If the multiline flag is set, it also matches immediately after a line break character.
For example, /^a/ matches the ‘a’ in the string abc but not the ‘a’ in cba.
$
The $ matches the end of the input. If the multiline flag is set, it also matches immediately before a line break character.
For example, /t$/ matches the ‘t’ in bat, but not the ‘t’ in tab.
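The two anchors, with and without the multiline flag (m), can be compared in a short sketch. The two-line sample string is an assumption chosen for illustration.

```javascript
console.log(/^a/.test('abc'));  // true
console.log(/^a/.test('cba'));  // false
console.log(/t$/.test('bat'));  // true
console.log(/t$/.test('tab'));  // false

// A hypothetical two-line input: 'cb' on the first line, 'at' on the second.
const twoLines = 'cb\nat';
const startPlain = /^a/.test(twoLines);   // false: the input starts with 'c'
const startMulti = /^a/m.test(twoLines);  // true: 'a' starts the second line
const endPlain = /b$/.test(twoLines);     // false: the input ends with 't'
const endMulti = /b$/m.test(twoLines);    // true: 'b' ends the first line
console.log(startPlain, startMulti, endPlain, endMulti);
```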
*
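The * matches the preceding expression 0 or more times. It’s the same as {0,}.
For instance, /mo*/ matches ‘moo’ in moo and ‘m’ in map. Because * allows zero occurrences, /o*/ on its own still succeeds on tab, where it matches the empty string at the start.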
+
This character matches the expression preceding it 1 or more times. It’s the same as {1,}.
For example, /b+/ matches bb in cabbage but nothing in moo.
?
The question mark matches the expression preceding it 0 or 1 time. It’s the same as {0,1}.
For example, /b?a/ matches ba in bat, and a in cat.
If it’s used immediately after any of the quantifiers *, +, ? or {}, then it makes that quantifier non-greedy, which means it matches the fewest possible characters. This is the opposite of the default, which is greedy and matches as many characters as possible.
For example, in the string abc, [a-z]+ matches abc but [a-z]+? matches only a.
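The quantifiers *, + and ? can be compared with a small sketch. The firstMatch helper is hypothetical and only wraps the standard String match method so that a failed match prints as null.

```javascript
// Hypothetical helper: return the first matched substring, or null if nothing matches.
const firstMatch = (str, re) => {
  const m = str.match(re);
  return m === null ? null : m[0];
};

console.log(firstMatch('moo', /mo*/));      // 'moo'
console.log(firstMatch('tab', /o*/));       // '' (zero occurrences still count as a match)
console.log(firstMatch('cabbage', /b+/));   // 'bb'
console.log(firstMatch('moo', /b+/));       // null
console.log(firstMatch('bat', /b?a/));      // 'ba'
console.log(firstMatch('cat', /b?a/));      // 'a'
console.log(firstMatch('abc', /[a-z]+/));   // 'abc' (greedy)
console.log(firstMatch('abc', /[a-z]+?/));  // 'a' (non-greedy)
```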
.
The period matches any single character except the newline character by default.
For example, .a matches ba in both ba and bat.
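A short sketch of the period in action follows. The last line uses the s (dotAll) flag, which changes the default so that the period matches newlines as well; it assumes an engine with ES2018 support.

```javascript
const dotMatch = 'bat'.match(/.a/);
console.log(dotMatch[0]);             // 'ba'

const sameLine = /a.c/.test('abc');   // true: '.' matches 'b'
const acrossLines = /a.c/.test('a\nc');   // false: '.' skips the newline by default
const withDotAll = /a.c/s.test('a\nc');   // true: with the s flag '.' matches the newline
console.log(sameLine, acrossLines, withDotAll);
```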
(x)
Matches whatever pattern is in the parentheses and remembers that match. The parentheses are called capturing parentheses, and the whole construct is called a capturing group.
For example, if we match the pattern (abc)(def) \1 against the string abcdef abcdef, we get the full match abcdef abc, along with the captured substrings abc and def. The \1 is a backreference to the first captured substring, which is abc.
We can have more than one capturing group in one regular expression. For example, if we match (abc) (def) \1 \2 against the string abc def abc def, we get the full match abc def abc def, along with the captured substrings abc and def. As we can see, the result contains both the overall match and whatever each capturing group matched. \1 stands for the same thing as in the previous example, and \2 stands for the second captured substring, which is def.
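The results described above can be inspected with the standard String match method, which returns the full match at index 0 and the captured substrings after it. The variable names are illustrative only.

```javascript
// \1 must repeat whatever the first group captured.
const oneBackref = 'abcdef abcdef'.match(/(abc)(def) \1/);
console.log(oneBackref[0]); // 'abcdef abc' (the full match)
console.log(oneBackref[1]); // 'abc' (first capturing group)
console.log(oneBackref[2]); // 'def' (second capturing group)

// \1 and \2 refer to the first and second groups respectively.
const twoBackrefs = 'abc def abc def'.match(/(abc) (def) \1 \2/);
console.log(twoBackrefs[0]); // 'abc def abc def'
console.log(twoBackrefs[1]); // 'abc'
console.log(twoBackrefs[2]); // 'def'
```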
(?:x)
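This is similar to (x) but doesn’t remember the match. It’s useful for grouping, so that a quantifier applies to the whole group rather than to a single character.
For example, if we have the pattern (?:abc){1,2} and the string abc def abc def, then the first abc will be matched, and nothing is captured.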
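To see the difference from a capturing group, we can compare the arrays that match returns. This is a sketch with illustrative variable names; the array holds the full match plus one entry per capturing group.

```javascript
const nonCapturing = 'abc def abc def'.match(/(?:abc){1,2}/);
console.log(nonCapturing[0]);     // 'abc'
console.log(nonCapturing.length); // 1: only the full match, nothing was remembered

const capturing = 'abc def abc def'.match(/(abc){1,2}/);
console.log(capturing.length);    // 2: the full match plus one captured group
console.log(capturing[1]);        // 'abc'

// The {1,2} applies to the whole group, so two adjacent copies match together.
const repeated = 'abcabc'.match(/(?:abc){1,2}/);
console.log(repeated[0]);         // 'abcabc'
```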
x(?=y)
The pattern matches x only if x is followed by y. It’s called a lookahead.
For example, if we have the pattern x(?=y) and the string xy, then we get only the x as the match. The y is checked but isn’t part of the match.
x(?!y)
This pattern matches x only if x isn’t followed by y. This is called a negated lookahead.
For instance, x(?!y) won’t match anything in xylophone, but it’ll match the x in axe.
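Both x(?=y) and x(?!y) check what comes after x without including it in the match. A minimal sketch using the sample strings from the text, plus xz as an extra made-up input:

```javascript
const followedByY = 'xy'.match(/x(?=y)/);
console.log(followedByY[0]);             // 'x' (the y is checked but not consumed)
console.log(/x(?=y)/.test('xz'));        // false: x is not followed by y

const notFollowedByY = 'axe'.match(/x(?!y)/);
console.log(notFollowedByY[0]);          // 'x'
console.log(notFollowedByY.index);       // 1: position of the x in 'axe'
console.log(/x(?!y)/.test('xylophone')); // false: the only x is followed by y
```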
(?<=y)x
This pattern matches x only if x is preceded by y. This is called a lookbehind.
For example, if we have the pattern (?<=b)a and the string bat, then it’ll match the a in bat. However, the same pattern won’t match the word cat since b isn’t before a.
(?<!y)x
This pattern matches x only if x isn’t preceded by y. This is called a negated lookbehind.
For instance, if we have the pattern (?<!b)a and the string cat, then it’ll match the a in cat. However, it won’t match anything in bat.
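The patterns (?<=b)a and (?<!b)a can be tried side by side. This sketch assumes a JavaScript engine that supports lookbehind assertions, which were added in ES2018; older engines throw a syntax error on these patterns.

```javascript
// (?<=b)a: match a only when b comes right before it.
const afterB = 'bat'.match(/(?<=b)a/);
console.log(afterB[0]);                 // 'a'
console.log(/(?<=b)a/.test('cat'));     // false: a is preceded by c

// (?<!b)a: match a only when b does not come right before it.
const notAfterB = 'cat'.match(/(?<!b)a/);
console.log(notAfterB[0]);              // 'a'
console.log(/(?<!b)a/.test('bat'));     // false: a is preceded by b
```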
x|y
Matches x or y. For example, if we have the pattern x|y and the word xylophone, then it’ll match the x.
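A brief sketch of alternation follows. The g flag and the cat and dog patterns are additions for illustration; the last two lines show that | applies to everything on each side of it, which is why grouping matters once anchors are involved.

```javascript
const firstAlt = 'xylophone'.match(/x|y/);
console.log(firstAlt[0]);   // 'x' (the first match found in the string)

// With the g flag, match returns every match instead of only the first.
const allAlts = 'xylophone'.match(/x|y/g);
console.log(allAlts);       // [ 'x', 'y' ]

// Without a group this reads as (^cat) or (dog$), so 'catalog' passes.
const loose = /^cat|dog$/.test('catalog');       // true
// Grouping the alternatives makes the anchors apply to both of them.
const strict = /^(?:cat|dog)$/.test('catalog');  // false
console.log(loose, strict);
```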
{n}
Matches exactly n occurrences of the preceding expression. For example, if we have a{2}, then we match the aa in baa.
{n,}
This pattern matches at least n occurrences of the preceding expression, where n is a non-negative integer.
For example, b{2,} matches bb, bbb, bbbb and so on.
{a,b}
Matches the preceding expression at least a and at most b times. For example, if we have b{2,4}, it’ll match the strings bb, bbb and bbbb, and nothing else.
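The three counted forms can be checked with a few made-up strings. This is a sketch; the anchored pattern at the end is an addition that shows how to reject runs that are too long rather than matching only part of them.

```javascript
const exactlyTwo = 'baa'.match(/a{2}/);
console.log(exactlyTwo[0]);        // 'aa'

const atLeastTwo = 'abbbb'.match(/b{2,}/);
console.log(atLeastTwo[0]);        // 'bbbb' (as many as are available)

const twoToFour = 'abbbbbb'.match(/b{2,4}/);
console.log(twoToFour[0]);         // 'bbbb' (stops after four of the six b characters)

console.log(/b{2,4}/.test('ab'));        // false: only one b
console.log(/^b{2,4}$/.test('bbbbb'));   // false: five b characters is too many for the whole string
console.log(/^b{2,4}$/.test('bbb'));     // true
```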
[xyz]
A pattern that matches one of the characters within the brackets, including escape sequences. For example, [xyz] will match the x in xylophone, and y in yawn.
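A small sketch of character sets follows. The range [0-9] and the sample strings are additions for illustration. The last lines show that a period inside brackets is a literal period, not the any-character wildcard.

```javascript
const inXylophone = 'xylophone'.match(/[xyz]/);
console.log(inXylophone[0]);   // 'x'

const inYawn = 'yawn'.match(/[xyz]/);
console.log(inYawn[0]);        // 'y'

// A hyphen between two characters denotes a range.
const digits = 'Room 42'.match(/[0-9]+/);
console.log(digits[0]);        // '42'

// Inside brackets, a period only matches a period.
console.log(/[.]/.test('abc'));   // false
console.log(/[.]/.test('a.c'));   // true
```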
Combining Special Characters to Form Complex Patterns
We can combine these special characters to search for more complex patterns. One common example is validating email addresses.
A simple regular expression for email may be:
[a-z0-9.]+@[a-z0-9.]+\.[a-z]+
In the regular expression above, we have [a-z0-9.]+ which matches any lowercase letter, digit or period occurring one or more times. This is followed by an @ sign, which is in every email address to separate the username from the domain name. Then we have [a-z0-9.]+ again to match one or more letters, digits or periods. This is followed by \. which matches a literal period. The backslash is needed because an unescaped period would match any character. Finally, [a-z]+ matches the last part of the domain name, such as com.
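As a rough sketch of how this could be used for form validation, the snippet below uses the pattern with the final period escaped and a + after the last character set. The ^ and $ anchors, the i flag and the looksLikeEmail function name are assumptions added here so that the whole input has to match and uppercase letters are accepted. Real email addresses allow many more characters, so this is an illustration rather than a complete validator.

```javascript
// Assumption: anchors and the i flag are added to the article's pattern.
const simpleEmailRe = /^[a-z0-9.]+@[a-z0-9.]+\.[a-z]+$/i;

// Hypothetical helper for checking a form field.
function looksLikeEmail(input) {
  return simpleEmailRe.test(input);
}

console.log(looksLikeEmail('jane.doe@example.com'));  // true
console.log(looksLikeEmail('Jane.Doe@Example.COM'));  // true (because of the i flag)
console.log(looksLikeEmail('jane.doe@example'));      // false: no period after the @ part
console.log(looksLikeEmail('not an email'));          // false
```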
In this article, we just began looking at regular expressions in JavaScript. We can define a regular expression with a literal or with the RegExp constructor. We can construct regular expressions with special characters that denote certain patterns, and then combine them into bigger regular expressions to search for more complex patterns like email addresses.