From MathWiki



Regular expressions

Regular expressions are used to find complex patterns in text, or to replace those patterns with something else. We will use this site

  • Special characters: these do not stand for themselves. To find them literally in text we have to escape them with a backslash (\), for example \$ or \^.
. ^ $ * + ? { } [ ] \ | ( )
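
To illustrate escaping, here is a minimal sketch using Python's re module. Python is an assumption of this example and need not be the tool behind the site mentioned above. The sample strings are made up for illustration.

```python
import re

text = "Price: $5.00 (approx.)"

# Unescaped, '.' matches any character, so "5.00" also matches "5x00".
loose = re.search(r"5.00", "5x00")

# Escaped, '\$' and '\.' only match a literal dollar sign and a literal dot.
strict = re.search(r"\$5\.00", text)

# re.escape produces the escaped form of an arbitrary literal string.
escaped = re.escape("(approx.)")
found = re.search(escaped, text)

print(loose is not None)
print(strict.group())
print(found.group())
```

In Python, writing patterns as raw strings (r"...") keeps the backslashes from being interpreted by the string literal itself.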

Character classes

For now we only consider patterns that match a single character.

  • \d: any digit; \D: any character that is not a digit.
  • \w: any alphanumeric character, that is a letter, a digit, or an underscore (_); \W: any non-alphanumeric character.
  • \s: any whitespace character (tab, end of line, space); \S: any non-whitespace character.
  • We can create custom character classes, e.g. [xyz], or exclusions, e.g. [^xyz]. The former matches x, y or z; the latter matches any character that is not x, y or z. With a dash we can specify ranges: [a-z] matches any lowercase letter, and [A-Za-z0-9] matches any uppercase letter, lowercase letter or digit.
  • ^: beginning of line; $: end of line.
  • A . matches any character (in most engines, any character except a newline by default).
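
The classes above can be tried out with a short sketch. It assumes Python's re module, and the sample string is invented for illustration.

```python
import re

sample = "Room_42 opens at 9:30\tsharp"

digits = re.findall(r"\d", sample)            # every single digit
non_word = re.findall(r"\W", sample)          # anything that is not a letter, digit or _
whitespace = re.findall(r"\s", sample)        # spaces and the tab
custom = re.findall(r"[^a-z\s]", sample)      # not a lowercase letter, not whitespace
first = re.search(r"^\w", sample).group()     # word character at the beginning
last = re.search(r"\w$", sample).group()      # word character at the end

print(digits, non_word, whitespace, custom, first, last)
```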


Notation   Number of repetitions                                    Example
*          0 or more                                                \d* matches '123', and it also matches the empty string
+          1 or more                                                \d+ matches any non-empty run of digits
?          0 or 1                                                   the?an matches 'than' and 'thean' (but not 'then')
{m,n}      at least m and at most n; either bound may be omitted    :D{4,10} does not match the whole of ':DDDDDDDDDDDDDD'
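
A minimal sketch of the quantifiers, assuming Python's re module. The helper matches_fully is hypothetical and only wraps re.fullmatch, so each pattern has to cover the entire string instead of a part of it.

```python
import re

def matches_fully(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

print(matches_fully(r"\d*", "123"))            # digits repeated several times
print(matches_fully(r"\d*", ""))               # zero repetitions are allowed
print(matches_fully(r"\d+", ""))               # at least one digit is required
print(matches_fully(r"the?an", "than"))        # the optional 'e' is absent
print(matches_fully(r"the?an", "thean"))       # the optional 'e' is present
print(matches_fully(r":D{4,10}", ":" + "D" * 4))
print(matches_fully(r":D{4,10}", ":" + "D" * 14))
```

A search such as re.search would still find a shorter match inside the long smiley, which is why the sketch checks the whole string.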


  • The pattern a|e|i|o|u matches any vowel. Try the GetValue|Get|Set|SetValue expression. What do we get for the text SetValue?
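
The alternation exercise can be checked with a small sketch, assuming Python's re module. Python tries the alternatives from left to right and stops at the first one that succeeds; other engines may prefer the longest alternative instead, so the result on the site used here could differ.

```python
import re

vowel = re.compile(r"a|e|i|o|u")
vowels_found = vowel.findall("regular")

# 'Set' is listed before 'SetValue', so it wins at the same position.
accessor = re.compile(r"GetValue|Get|Set|SetValue")
first_choice = accessor.match("SetValue").group()

# Listing the longer alternatives first changes the result.
reordered = re.compile(r"GetValue|SetValue|Get|Set")
longest_choice = reordered.match("SetValue").group()

print(vowels_found, first_choice, longest_choice)
```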


We can specify groups within the expression using parentheses. The following example matches any string that consists of some text immediately followed by a copy of itself:
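
One possible pattern for such a repeated string, sketched with Python's re module: the group (.+) captures some text and the backreference \1 requires the same text again. The pattern and the test strings are assumptions chosen for illustration.

```python
import re

# (.+) captures a non-empty piece of text, \1 must repeat it exactly.
doubled = re.compile(r"(.+)\1")

def is_doubled(text):
    """Return True if the text is some string written twice in a row."""
    return doubled.fullmatch(text) is not None

print(is_doubled("abcabc"))
print(is_doubled("abcabd"))
print(doubled.fullmatch("hellohello").group(1))
```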


We can match for HTML tags:
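
A rough sketch of matching simple HTML tags, assuming Python's re module. The pattern is an illustrative assumption: it only handles tags without attributes and without nesting, and uses a backreference so that the closing tag has the same name as the opening one.

```python
import re

# <(\w+)> opens a tag, (.*?) is the shortest possible content, </\1> closes the same tag.
tag = re.compile(r"<(\w+)>(.*?)</\1>")

html = "<b>bold</b> and <i>italic</i>, but <b>broken</i>"
pairs = tag.findall(html)

print(pairs)
```

Each result is a pair of tag name and content; the mismatched <b>...</i> at the end is not reported.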


We can specify multiple groups; they are numbered by the order of their opening parentheses. Replace the ending of email addresses with .hu!
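
The numbering of groups and the replacement can be sketched as follows, assuming Python's re module and deliberately simplified addresses of the form name@domain.ending (no dots in the name or the domain). The addresses are made up.

```python
import re

# Groups are numbered by their opening parenthesis:
# 1 = name@domain, 2 = name, 3 = domain, 4 = ending
email = re.compile(r"((\w+)@(\w+))\.(\w+)")

text = "alice@example.com, bob@mail.org"

groups = email.search(text).groups()

# \1 refers to group 1, so everything before the ending is kept.
replaced = email.sub(r"\1.hu", text)

print(groups)
print(replaced)
```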



  • Date formats:
  • Mobile numbers starting with +36 20, +36 30, +36 70
  • Link tags (<a>anything here</a>)
  • Webpage addresses
  • Find the BME logo with patterns on:
  • Two-digit numbers divisible by 4
  • Leap years
  • Date format with custom separator:

The separator can be any of , . - or a space, but the two separators must be the same.
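
The requirement that both separators are the same can be expressed with a backreference. The sketch below assumes Python's re module and a year, month, day layout with 4, 2 and 2 digits; the actual date format intended by the task is not given here, so the layout is only an assumption.

```python
import re

# ([,.\- ]) captures the first separator, \1 forces the second one to be identical.
date = re.compile(r"\d{4}([,.\- ])\d{2}\1\d{2}")

def is_date(text):
    """Return True if the text is a date whose two separators are the same."""
    return date.fullmatch(text) is not None

print(is_date("2024-03-15"))
print(is_date("2024.03.15"))
print(is_date("2024 03 15"))
print(is_date("2024-03.15"))
```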

  • Swap two columns of a tab-separated text file

Advanced tasks

  • Roman numerals written with capital letters

Thousands: M{0,4}, hundreds: CM|CD|D?C{0,3}, tens: XC|XL|L?X{0,3}, ones: IX|IV|V?I{0,3}.
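
The four parts above can be combined into one pattern. This is a sketch assuming Python's re module; the lookahead at the start is an addition of this example, needed because every part may be empty and the plain concatenation would also accept the empty string.

```python
import re

ROMAN = re.compile(
    r"(?=[MDCLXVI])"        # added here: require at least one numeral
    r"M{0,4}"               # M repeated up to four times
    r"(CM|CD|D?C{0,3})"     # 900, 400, or optional D followed by up to three C
    r"(XC|XL|L?X{0,3})"     # 90, 40, or optional L followed by up to three X
    r"(IX|IV|V?I{0,3})"     # 9, 4, or optional V followed by up to three I
)

def is_roman(text):
    """Return True if the whole text is a Roman numeral in capital letters."""
    return ROMAN.fullmatch(text) is not None

print(is_roman("MCMXCIV"))
print(is_roman("XLII"))
print(is_roman("IIII"))
print(is_roman(""))
```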

  • Positive integers; very long numbers may contain spaces that group the digits in threes (1 000, 435 000 000).
  • Hexadecimal color code in HTML (3 or 6 hexadecimal digits)
