Predefined Character Classes & Shortcuts
. Any character (except for line terminators)
\d A digit (same as [0-9] )
\D A non-digit (same as [^0-9] )
\s A whitespace character (same as [ \t\n\x0B\f\r] )
\S A non-whitespace character (same as [^\s] )
\w A word character (same as [a-zA-Z_0-9] for ASCII text; see “Word Characters” below for the full Unicode definition)
\W A non-word character (same as [^\w] )
\s+ One or more consecutive whitespace characters
(?i) Ignore case (must be placed at the start of the regex)
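As a minimal sketch, the shortcuts above can be tried with Python's standard re module. The article does not name a regex engine, so this assumes Python with the re.ASCII flag, which limits \d, \s and \w to the ASCII sets listed above. Without that flag, Python's \d also matches other Unicode decimal digits. The sample string and the helper find_all are hypothetical.

```python
import re

# Assumption: Python's re module with re.ASCII, so \d, \s and \w match only
# the ASCII sets listed above ([0-9], [ \t\n\x0B\f\r], [a-zA-Z_0-9]).
FLAGS = re.ASCII

def find_all(pattern: str, text: str) -> list[str]:
    """Hypothetical helper: every non-overlapping match of pattern in text."""
    return re.findall(pattern, text, FLAGS)

sample = "Order #42 shipped\tto  user_7"

digits = find_all(r"\d", sample)       # \d: one digit per match
words = find_all(r"\w+", sample)       # \w+: runs of word characters
gaps = find_all(r"\s+", sample)        # \s+: each run of whitespace
tokens = find_all(r"\S+", sample)      # \S+: runs of non-whitespace

# \x0B (vertical tab) counts as whitespace for \s.
vertical_tab_is_space = re.fullmatch(r"\s", "\x0b", FLAGS) is not None

# (?i) at the very start makes the whole pattern case-insensitive.
ignores_case = re.fullmatch(r"(?i)order", "ORDER", FLAGS) is not None

# "." does not match a newline unless re.DOTALL is also passed.
dot_skips_newline = re.fullmatch(r".+", "a\nb", FLAGS) is None
```

Here (?i) sits at the very beginning of the pattern; Python 3.11 and later reject a global inline flag placed anywhere else.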
Word Characters
A word character is defined as any member of the following Unicode categories:
- Ll (Letter, Lowercase)
- Lu (Letter, Uppercase)
- Lt (Letter, Titlecase)
- Lo (Letter, Other)
- Lm (Letter, Modifier)
- Nd (Number, Decimal Digit)
- Pc (Punctuation, Connector)
The Pc (Punctuation, Connector) category contains ten characters, the most commonly used of which is LOW LINE (_), U+005F.
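The category list above can be checked directly with Python's unicodedata module. This is a hypothetical illustration of the definition itself, not how any particular regex engine implements \w. The names WORD_CATEGORIES, is_word_char and chars_in_category are invented for this example, and results depend on the Unicode version bundled with your Python.

```python
import sys
import unicodedata

# The general categories listed above as "word character" categories.
WORD_CATEGORIES = {"Ll", "Lu", "Lt", "Lo", "Lm", "Nd", "Pc"}

def is_word_char(ch: str) -> bool:
    """True if the single character ch belongs to one of WORD_CATEGORIES."""
    return unicodedata.category(ch) in WORD_CATEGORIES

def chars_in_category(category: str) -> list[str]:
    """Every code point whose general category equals category,
    according to this Python's Unicode database."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == category]

# All connector punctuation characters (category Pc), in code point order.
connectors = chars_in_category("Pc")
```

Scanning every code point is slow but simple; it is enough to confirm which characters fall into a small category such as Pc.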