<div>[[Informatics1-2019/Lab02|Previous]] - [[Informatics1-2019#Labs|Up]] - [[Informatics1-2019/Lab04|Next]]<br />
<br />
== Regular expressions ==<br />
Regular expressions are used to find complex patterns in text, or to replace such patterns with something else. We will use the site https://regex101.com/#python<br />
* Special characters: these do not match themselves; to find them in text we have to escape them with a backslash (\), for example \$, \^, etc.<br />
<pre><br />
. ^ $ * + ? { } [ ] \ | ( )<br />
</pre><br />
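A minimal Python sketch of escaping, assuming Python 3.7 or later (where `re.escape()` only escapes special characters). The sample text is made up for illustration:

```python
import re

text = "Total: $5 (approx.)"

# Unescaped, "$" is an anchor (end of line), so "$\d" can never match here.
unescaped = re.findall(r"$\d", text)
# Escaped, "\$" matches a literal dollar sign.
escaped = re.findall(r"\$\d", text)
print(unescaped, escaped)  # [] ['$5']

# re.escape() adds the backslashes for us, which helps when the literal
# text to search for is not known in advance.
pattern = re.escape("(approx.)")
print(pattern)  # \(approx\.\)
print(re.search(pattern, text) is not None)  # True
```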
=== Character classes === <br />
For the time being we only use patterns that match a single character.<br />
* '''\d''': any digit, '''\D''': any character that is not a digit.<br />
* '''\w''': any word character, i.e. a letter, a digit, or an underscore (_), '''\W''': any character that is not a word character.<br />
* '''\s''': any whitespace character (space, tab, newline, etc.), '''\S''': any non-whitespace character.<br />
* We can create custom character classes: '''[xyz]''', or we can make exclusions, e.g. '''[^xyz]'''. The former matches x, y or z, the latter matches any character that is not x, y or z. Using a dash we can specify intervals: '''[a-z]''' matches any lowercase letter, and '''[A-Za-z0-9]''' matches any uppercase letter, lowercase letter or digit.<br />
* '''^''': beginning of line, '''$''': end of line.<br />
* A '''.''' matches any character except a newline.<br />
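The single-character classes can be tried out with Python's `re.findall()`, which returns every non-overlapping match. A short sketch on a made-up sample string:

```python
import re

text = "Room 42b, floor_3\tcall +36 20"

digits = re.findall(r"\d", text)              # \d: any digit
non_word = re.findall(r"\W", text)            # \W: not a letter, digit or _
others = re.findall(r"[^A-Za-z0-9\s]", text)  # custom negated class
first = re.findall(r"^R", text)               # ^: beginning of the line
after_4 = re.findall(r"4.", text)             # .: any character (here "2")

print(digits)    # ['4', '2', '3', '3', '6', '2', '0']
print(non_word)  # [' ', ',', ' ', '\t', ' ', '+', ' ']
print(others)    # [',', '_', '+']
print(first)     # ['R']
print(after_4)   # ['42']
```

Note that '_' shows up in the custom negated class but not in the \W result, because \w counts the underscore as a word character.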
<br />
=== Repetition ===<br />
{| class="wikitable"<br />
|-<br />
| Notation || Number of repetitions || Example<br />
|-<br />
| '''*''' || 0 or more || '''\d*''' matches '123', and it also matches the empty string<br />
|-<br />
| '''+''' || at least 1 || '''\d+''' matches one or more digits, e.g. '7' or '2019', but not the empty string<br />
|-<br />
| '''?''' || 0 or 1 || '''colou?r''' matches both 'color' and 'colour'<br />
|-<br />
| '''{m,n}''' || at least ''m'', at most ''n''; either bound can be omitted ('''{m,}''', '''{,n}''') || '''^:D{4,10}$''' does not match ':DDDDDDDDDDDDDD', although ''':D{4,10}''' still finds a match inside it (the colon and the first 10 D's)<br />
|}<br />
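A Python sketch of the quantifiers, using `re.fullmatch()` (the whole string must match) and `re.search()` (a match anywhere is enough). The sample strings are invented for illustration:

```python
import re

# *: zero or more, so even the empty string is accepted
print(re.fullmatch(r"\d*", "") is not None)   # True
# +: at least one
print(re.fullmatch(r"\d+", "") is not None)   # False
# ?: the preceding "u" is optional
print(re.findall(r"colou?r", "color or colour"))  # ['color', 'colour']

# {m,n}: a search finds 4..10 D's even inside a longer run,
# but anchoring with ^ and $ rejects the whole string.
smiley = ":" + "D" * 14
print(len(re.search(r":D{4,10}", smiley).group()))  # 11 (':' plus 10 D's)
print(re.search(r"^:D{4,10}$", smiley))             # None
```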
=== Choice ===<br />
* The pattern '''a|e|i|o|u''' matches any lowercase vowel. Try the '''GetValue|Get|Set|SetValue''' expression. What do we get for the text ''SetValue''?<br />
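A small Python sketch of alternation with made-up sample strings. When every alternative is a single character, the choice behaves like a character class:

```python
import re

text = "regular expression"

# Each alternative is a single letter here, so this behaves like [aeiou].
vowels = re.findall(r"a|e|i|o|u", text)
print(vowels)  # ['e', 'u', 'a', 'e', 'e', 'i', 'o']
print(vowels == re.findall(r"[aeiou]", text))  # True

# Alternatives can also be longer than one character.
print(re.findall(r"cat|dog", "hotdog catalog"))  # ['dog', 'cat']
```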
=== Grouping ===<br />
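We can specify groups within the expression and refer back to what a group matched with '''\1''', '''\2''', etc. The following example matches any string that is immediately repeated once, such as ''abcabc'':<br />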
<pre><br />
(.*)\1<br />
</pre><br />
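A Python sketch of the backreference; the sample strings, including the repeated-word sentence, are made up for illustration:

```python
import re

# (.*) captures some text, \1 requires the same text again right after it.
doubled = re.compile(r"(.*)\1")
m = doubled.fullmatch("abcabc")
print(m.group(1))                  # abc
print(doubled.fullmatch("abcab"))  # None: no split into two equal halves

# A common use: find accidentally repeated words (\b is a word boundary).
repeats = re.findall(r"\b(\w+) \1\b", "this is is a test test")
print(repeats)  # ['is', 'test']
```

With `re.search()` instead of `re.fullmatch()`, the pattern (.*)\1 also matches the empty string, since (.*) is allowed to capture nothing.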
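We can match HTML elements whose closing tag repeats the name of the opening tag (as written, the pattern only accepts uppercase tag names unless case-insensitive matching is switched on):<br />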
<pre><br />
<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1><br />
</pre><br />
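A Python sketch running the tag pattern on an invented HTML snippet, with and without the `re.IGNORECASE` flag:

```python
import re

html = '<B>bold</B> and <A HREF="x">link</A> and <i>lower</i>'
pattern = r"<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>"

# As written, [A-Z] only accepts uppercase tag names.
upper_only = [m.group(0) for m in re.finditer(pattern, html)]
print(upper_only)  # ['<B>bold</B>', '<A HREF="x">link</A>']

# With re.IGNORECASE the lowercase <i> element is found as well.
any_case = [m.group(0) for m in re.finditer(pattern, html, re.IGNORECASE)]
print(any_case)    # ['<B>bold</B>', '<A HREF="x">link</A>', '<i>lower</i>']

# group(1) holds the tag name that \1 had to repeat.
names = [m.group(1) for m in re.finditer(pattern, html)]
print(names)       # ['B', 'A']
```

The lazy '''.*?''' stops at the first matching closing tag, so nested elements with the same name are not handled correctly; for real HTML a parser such as Python's `html.parser` is the safer tool.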
We can use multiple groups; they are numbered by the order of their opening parentheses. Exercise: change the ending of email addresses to .hu!<br />
<pre><br />
(\w+)@((\w+)\.)+(\w+)<br />
</pre><br />
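A Python sketch showing how the groups of the email pattern are numbered: group 2 is the outer parenthesis, group 3 the inner one. The addresses are invented examples:

```python
import re

email = re.compile(r"(\w+)@((\w+)\.)+(\w+)")

first = email.match("john@example.com")
print(first.groups())  # ('john', 'example.', 'example', 'com')

# A repeated group only keeps its last repetition.
second = email.match("jane@math.bme.hu")
print(second.groups())  # ('jane', 'bme.', 'bme', 'hu')
print(second.group(0))  # jane@math.bme.hu
```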
== Tasks ==<br />
* Dates in the format yyyy.mm.dd<br />
* Mobile phone numbers starting with +36 20, +36 30 or +36 70<br />
* Link tags (<a>anything here</a>)<br />
* Web page addresses<br />
* Find the BME logo in the page source of http://www.bme.hu/?language=en using patterns<br />
* Two-digit numbers divisible by 4<br />
* Leap years<br />
* Dates with a custom separator:<br />
<br />
yyyy.mm.dd<br />
yyyy,mm,dd<br />
yyyy-mm-dd<br />
<br />
The separator can be a comma (''','''), a period ('''.'''), a hyphen ('''-''') or a space, but both separators must be the same.<br />
<br />
* Swap two columns of a text file (columns separated by tab characters)<br />
<br />
=== Advanced tasks ===<br />
* Roman numerals written with capital letters<br />
Thousands: <code>M{0,4}</code>, hundreds: <code>CM|CD|D?C{0,3}</code>, tens: <code>XC|XL|L?X{0,3}</code>, ones: <code>IX|IV|V?I{0,3}</code>.<br />
* Positive integers; long numbers may contain spaces separating groups of 3 digits (1 000, 435 000 000).<br />
* Hexadecimal color codes in HTML (# followed by 3 or 6 hexadecimal digits)<br />
<br />
</div>Gaebor