Predefined Character Classes & Shortcuts

. Any character (except for line terminators)
\d A digit (same as [0-9])
\D A non-digit (same as [^0-9])
\s A whitespace character (same as [ \t\n\x0B\f\r])
\S A non-whitespace character (same as [^\s])
\w A word character (for ASCII input, same as [a-zA-Z_0-9]; see “Word Characters” below for the full definition)
\W A non-word character (same as [^\w])
\s+ One or more consecutive whitespace characters
(?i) Ignore case (the flag must appear at the very start of the regex)
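
The shortcuts above can be tried out with a short sketch. This page does not name a regex engine, so the example assumes Python's built-in `re` module purely for illustration; the sample strings are made up. In Python, `\d`, `\s`, and `\w` are Unicode-aware by default, so the `re.ASCII` flag is used here to get the ASCII equivalences listed in the table.

```python
import re

# Hypothetical sample text, used only for illustration.
sample = "Order 42 shipped\tto  Unit_7\n"

# \d: each ASCII digit (re.ASCII limits \d to [0-9]).
digits = re.findall(r"\d", sample, flags=re.ASCII)

# \w+: runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", sample, flags=re.ASCII)

# \s+: collapse every run of whitespace into a single space.
collapsed = re.sub(r"\s+", " ", sample, flags=re.ASCII).strip()

# \D: remove everything that is not a digit.
digits_only = re.sub(r"\D", "", "tel: 555-0100", flags=re.ASCII)

# \S+: runs of non-whitespace characters.
tokens = re.findall(r"\S+", "a b\tc", flags=re.ASCII)

# \W: every non-word character.
non_word = re.findall(r"\W", "a-b c", flags=re.ASCII)

# . matches any character except a line terminator.
dot_matches = re.findall(r".", "a\nb")

# (?i) at the start of the pattern turns on case-insensitive matching.
ignore_case = re.fullmatch(r"(?i)order", "ORDER") is not None
```

Other engines may differ in which characters `\s` and `\w` cover by default, so it is worth checking the engine's own documentation before relying on these equivalences.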

Word Characters

A word character is defined as any member of the following Unicode categories:

  • Ll (Letter, Lowercase)
  • Lu (Letter, Uppercase)
  • Lt (Letter, Titlecase)
  • Lo (Letter, Other)
  • Lm (Letter, Modifier)
  • Nd (Number, Decimal Digit)
  • Pc (Punctuation, Connector)
    This category includes ten characters, the most commonly used of which is the LOWLINE character (_), U+005F.
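
The category list above can be checked mechanically. The following is a minimal sketch, assuming Python's standard `unicodedata` module; `WORD_CATEGORIES` and `is_word_char` are hypothetical names introduced here, and the function tests membership in the listed categories rather than calling any particular regex engine's `\w`.

```python
import unicodedata

# The Unicode general categories listed above (assumed to be the complete set).
WORD_CATEGORIES = {"Ll", "Lu", "Lt", "Lo", "Lm", "Nd", "Pc"}


def is_word_char(ch: str) -> bool:
    """Return True if the single character ch falls in one of the listed categories."""
    return unicodedata.category(ch) in WORD_CATEGORIES


# One example per category, plus a few characters outside the list.
examples = {
    "a": is_word_char("a"),            # Ll
    "Z": is_word_char("Z"),            # Lu
    "U+01C5": is_word_char("\u01c5"),  # Lt (titlecase digraph)
    "U+4E2D": is_word_char("\u4e2d"),  # Lo (CJK ideograph)
    "U+02B0": is_word_char("\u02b0"),  # Lm (modifier letter small h)
    "U+0663": is_word_char("\u0663"),  # Nd (Arabic-Indic digit three)
    "_": is_word_char("_"),            # Pc (LOW LINE, U+005F)
    "U+203F": is_word_char("\u203f"),  # Pc (undertie)
    "-": is_word_char("-"),            # Pd, not in the list
    " ": is_word_char(" "),            # Zs, not in the list
    "U+00BD": is_word_char("\u00bd"),  # No (vulgar fraction), not Nd
}
```

A regex engine's own `\w` may not agree with this list in every case, so a helper like this is best treated as a way to inspect the definition rather than as a replacement for testing against the engine in use.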