Informatics1-2017/Practice3

From MathWiki

Current revision as of 06:37, 18 September 2017


Regular expressions

Regular expressions are used to find complex patterns in text, or to substitute something else for those patterns. We will use this site: https://regex101.com/#python

  • Special characters: these do not stand for themselves. To find them literally in a text we have to escape them with \, for example \$, \^, etc.
. ^ $ * + ? { } [ ] \ | ( )
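As a small illustration of escaping, here is a sketch using Python's standard `re` module. The sample strings are made up for this example.

```python
import re

text = "Price: $5.00 (approx.)"

# Unescaped, '.' matches any character, so "5.00" also matches "5x00".
unescaped_hit = re.search(r"5.00", "5x00") is not None
print(unescaped_hit)  # True

# Escaped, '\$' and '\.' match only the literal characters.
literal = re.search(r"\$5\.00", text)
print(literal.group())  # $5.00

# re.escape builds an escaped pattern from an arbitrary literal string.
escaped = re.escape("(approx.)")
found = re.search(escaped, text).group()
print(found)  # (approx.)
```

`re.escape` is convenient when the text to search for comes from user input and may contain special characters.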

Character classes

For now we only use patterns that match a single character.

  • \d: any digit, \D: any character that is not a digit.
  • \w: any alphanumeric character, that is a letter, a digit, or an underscore (_), \W: any non-alphanumeric character.
  • \s: whitespace, that is tab, end of line, or space, \S: any non-whitespace character.
  • We can create custom character classes, e.g. [xyz], or exclude characters, e.g. [^xyz]. The former matches x, y or z, the latter matches any character that is not x, y or z. Using a dash we can specify ranges: [a-z] matches any lowercase letter, and [A-Za-z0-9] matches any uppercase letter, lowercase letter, or digit.
  • ^: beginning of line, $: end of line.
  • . matches any character (except the newline character, by default).
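The classes above can be tried out with `re.findall`, which returns every match in a string. This is only a sketch; the sample text and variable names are invented for illustration.

```python
import re

sample = "Room 42_b, floor 3\tEast wing"

digits = re.findall(r"\d", sample)        # every single digit
uppercase = re.findall(r"[A-Z]", sample)  # custom class: uppercase ASCII letters
non_word = re.findall(r"\W", sample)      # not a letter, digit or underscore

print(digits)     # ['4', '2', '3']
print(uppercase)  # ['R', 'E']
print(non_word)   # [' ', ',', ' ', ' ', '\t', ' ']

# ^ and $ anchor the pattern to the beginning and the end of the text.
starts_with_room = re.search(r"^Room", sample) is not None
ends_with_digit = re.search(r"\d$", sample) is not None
print(starts_with_room, ends_with_digit)  # True False
```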

Repetition

Notation | Number of repetitions | Example
*        | 0, 1, or more | \d* matches '123', and it matches the empty string as well
+        | at least 1 | \d+ matches one or more digits
?        | 0 or 1 | the?an matches 'than' and 'thean' (but not 'then'; th[ea]n matches both 'then' and 'than')
{m,n}    | at least m, at most n; either bound may be omitted | :D{4,10} does not match the whole of ':DDDDDDDDDDDDDD' (14 D's), only the part up to the 10th D
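The quantifiers can be checked with `re.fullmatch`, which succeeds only if the whole string matches, and `re.match`, which only needs a match at the start. A minimal sketch follows; the variable names are chosen for this example.

```python
import re

# '*' allows zero repetitions, '+' needs at least one.
star_on_empty = re.fullmatch(r"\d*", "") is not None
plus_on_empty = re.fullmatch(r"\d+", "") is not None
print(star_on_empty, plus_on_empty)  # True False

# '?' makes only the preceding character optional.
than_ok = re.fullmatch(r"the?an", "than") is not None
then_ok = re.fullmatch(r"the?an", "then") is not None
print(than_ok, then_ok)  # True False

# {4,10} accepts 4 to 10 repetitions of 'D'.
smiley = ":" + "D" * 14
whole = re.fullmatch(r":D{4,10}", smiley)
prefix = re.match(r":D{4,10}", smiley)
print(whole)                # None
print(len(prefix.group()))  # 11 (the colon plus 10 D's)
```

On regex101 the pattern is searched for anywhere in the text, so there the smiley with 14 D's is highlighted up to the 10th D.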

Choice

  • The pattern a|e|i|o|u matches any vowel. Try the expression GetValue|Get|Set|SetValue. What does it match in the text SetValue?
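The question above can be checked in Python. In the `re` module the alternatives are tried from left to right and the first one that succeeds wins, so the order matters. The second pattern below is a reordered variant added for comparison; it is not part of the original exercise.

```python
import re

original = re.compile(r"GetValue|Get|Set|SetValue")
first_hit = original.match("SetValue").group()
print(first_hit)  # Set

# Putting the longer alternative first changes the result.
reordered = re.compile(r"GetValue|Get|SetValue|Set")
second_hit = reordered.match("SetValue").group()
print(second_hit)  # SetValue
```

A common rule of thumb is to list longer alternatives before their prefixes, or to factor out the common part, as in (Get|Set)(Value)?.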

Grouping

We can mark groups within an expression with parentheses and refer back to them by number. The following example matches text in which something is repeated twice in a row:

(.*)\1
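A sketch of how the backreference behaves in Python. The doubled-word pattern at the end is an extra illustration that is not part of the original notes.

```python
import re

doubled = re.compile(r"(.*)\1")

# The whole string must be "something" followed by the same "something".
half = doubled.fullmatch("abcabc").group(1)
print(half)  # abc

odd = doubled.fullmatch("abcab")
print(odd)  # None

# The empty string also qualifies: "nothing" repeated twice.
empty_ok = doubled.fullmatch("") is not None
print(empty_ok)  # True

# A more practical use: finding an accidentally doubled word.
repeat = re.search(r"\b(\w+) \1\b", "this is is a test")
print(repeat.group(1))  # is
```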

We can also match HTML elements whose closing tag has the same name as the opening tag:

<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>
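The pattern is written with uppercase letters, so for lowercase tags it needs the case-insensitive flag. Here is a sketch with a made-up HTML fragment:

```python
import re

# re.IGNORECASE lets [A-Z] and the backreference match lowercase tag names too.
tag = re.compile(r"<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>", re.IGNORECASE)

html = '<p class="x">Hello <b>world</b></p><h1>Title</h1>'

elements = [(m.group(1), m.group(0)) for m in tag.finditer(html)]
for name, source in elements:
    print(name, "->", source)
# p -> <p class="x">Hello <b>world</b></p>
# h1 -> <h1>Title</h1>
```

Because `.` does not match a newline by default, elements spanning several lines would also need `re.DOTALL`. Elements nested inside an element of the same name are not handled correctly by this pattern.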

We can use several groups; they are numbered by the order of their opening parentheses. Replace the ending of email addresses with .hu!

(\w+)@((\w+)\.)+(\w+)
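One way to attempt the replacement in Python is `re.sub`, where \1, \2, ... in the replacement text refer to the groups. The sketch below uses an invented address. Note that a repeated group only remembers its last repetition, so the second pattern, which is an alternative suggested here and not taken from the notes, captures everything before the ending in a single group.

```python
import re

address = "joe@mail.example.com"

original = re.compile(r"(\w+)@((\w+)\.)+(\w+)")
groups = original.fullmatch(address).groups()
print(groups)  # ('joe', 'example.', 'example', 'com')

# Group 2 holds only the last repetition, so "mail." is lost here.
naive = original.sub(r"\1@\2hu", address)
print(naive)  # joe@example.hu

# Alternative: one group for everything before the ending,
# with a non-capturing group (?:...) for the repeated part.
fixed = re.sub(r"(\w+@(?:\w+\.)+)\w+", r"\1hu", address)
print(fixed)  # joe@mail.example.hu
```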

Tasks

  • Dates in the format yyyy.mm.dd
  • Mobile numbers starting with +36 20, +36 30, +36 70
  • Link tags (<a>anything here</a>)
  • Web page addresses
  • Find the BME logo with patterns on: http://www.bme.hu/?language=en
  • Two-digit numbers divisible by 4
  • Leap years
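A candidate solution can be tested against a direct computation instead of by eye. The sketch below does this for the task on two-digit numbers divisible by 4. The pattern is one possible solution and not necessarily the intended one, and it assumes that "two-digit" means 10 to 99.

```python
import re

# Even tens digit: the ones digit must be 0, 4 or 8.
# Odd tens digit: the ones digit must be 2 or 6.
div4 = re.compile(r"[2468][048]|[13579][26]")

# Compare the regex with arithmetic for every two-digit number.
mismatches = [
    n for n in range(10, 100)
    if (div4.fullmatch(str(n)) is not None) != (n % 4 == 0)
]
print(mismatches)  # []
```

The same idea, comparing the pattern with a straightforward computed answer over a range of inputs, works for the leap year task as well.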

Advanced tasks

  • Roman numerals written with capital letters

Millennium: M{0,4}, century: CM|CD|D?C{0,3}, decade: XC|XL|L?X{0,3}, year: IX|IV|V?I{0,3}.

  • Positive integers; very long numbers may be grouped into blocks of 3 digits separated by spaces (1 000, 435 000 000).
  • Hexadecimal color codes in HTML (3 or 6 hex digits)
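The following is a sketch of possible solutions to the three advanced tasks, not the only correct ones. The Roman numeral pattern joins the four parts listed above; since each part can match nothing, a lookahead (?=[MDCLXVI]) is added so that the empty string is rejected. The leading # in the color pattern is an assumption about how the color codes appear in the HTML source. All variable names are hypothetical.

```python
import re

# Roman numerals: the four parts in sequence, at least one letter required.
roman = re.compile(
    r"(?=[MDCLXVI])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)

# Positive integers: either blocks of 3 digits separated by spaces, or plain digits.
grouped_int = re.compile(r"[1-9](\d{0,2}( \d{3})+|\d*)")

# Hex color: '#' (assumed) followed by 3 or 6 hex digits.
hex_color = re.compile(r"#([0-9A-Fa-f]{3}){1,2}")

def whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None

print(whole(roman, "MCMXCIV"), whole(roman, ""), whole(roman, "IIII"))
# True False False
print(whole(grouped_int, "435 000 000"), whole(grouped_int, "1000"),
      whole(grouped_int, "1 00"))
# True True False
print(whole(hex_color, "#fff"), whole(hex_color, "#1A2b3C"),
      whole(hex_color, "#abcd"))
# True True False
```

To find these inside longer text instead of validating a whole string, the patterns can be wrapped in \b word boundaries and used with `re.findall` or `re.finditer`.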