LEG-011
### 1. Regular Expression Basics
#### What’s the difference between using `(...)` and `[...]` in a regex:
- `(...)` is used for grouping. It creates a capture group that can be referenced later or used for applying more complex operations like alternation or repetition.
- `[...]` is used for character sets. It matches any single character that is inside the set. For example, `[abc]` will match `a`, `b`, or `c`.
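The difference is easiest to see side by side. The following is a minimal sketch using Python's `re` module; the sample strings are made up for illustration.

```python
import re

# Character set: each match is ONE character taken from the set.
set_matches = re.findall(r"[abc]", "cab")

# Group: the sequence "abc" is treated as a unit, so it can be repeated
# as a whole and retrieved later by its group number.
group_match = re.search(r"(abc)+", "abcabc cab")

print(set_matches)           # ['c', 'a', 'b']
print(group_match.group(0))  # abcabc  (the whole match)
print(group_match.group(1))  # abc     (last repetition of group 1)
```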
#### Find all versions (upper and lower) of the string “hello” in a document:
- `[Hh][Ee][Ll][Ll][Oo]`
#### Capture all versions of the name “Kelly”:
- `[Kk][Ee][Ll][Ll][Yy]`
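As a quick check, the two patterns can be run in Python. The sample sentence is invented, and the `re.IGNORECASE` flag is shown only as an alternative to spelling out each character set; it is not part of the original answers.

```python
import re

# Hypothetical sample text for illustration only.
text = "Hello, HELLO, hello and hElLo; Kelly, KELLY, kelly."

# One character set per letter covers every upper/lower combination.
hello_sets = re.findall(r"[Hh][Ee][Ll][Ll][Oo]", text)
kelly_sets = re.findall(r"[Kk][Ee][Ll][Ll][Yy]", text)

# Alternative: a plain pattern plus the case-insensitive flag.
hello_flag = re.findall(r"hello", text, flags=re.IGNORECASE)

print(hello_sets)  # ['Hello', 'HELLO', 'hello', 'hElLo']
print(kelly_sets)  # ['Kelly', 'KELLY', 'kelly']
print(hello_flag)  # ['Hello', 'HELLO', 'hello', 'hElLo']
```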
### 2. Regular Expressions in Python
#### Write the command to get the number of true matches in a text:
To get the number of true matches in a text, you can use Python's `re` module to compile a regex pattern and then count the number of matches found. Here’s how you can do it:
```python
import re
# define the regex pattern
pattern = re.compile(r'pattern')  # replace 'pattern' with the actual regex
# sample input; replace with the actual text
text = 'a pattern inside a pattern'
# get an iterator over all non-overlapping matches in the text
matches = pattern.finditer(text)
# count the number of matches
num_matches = sum(1 for _ in matches)
print(f'Number of matches: {num_matches}')
```
### 3. The `re` module
#### Separate the text into one match per element:
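To split a text into one match per element with Python's `re` module, use `re.findall()`, which returns a list with one string per match (or one tuple per match when the pattern has several capture groups), or `re.finditer()`, which yields one match object per match. Here’s an example using `findall()`: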
```python
import re
# define the regex pattern
pattern = re.compile(r'pattern')  # replace 'pattern' with the actual regex
# sample input; replace with the actual text
text = 'a pattern inside a pattern'
# get all matches in the text as a list of strings
matches = pattern.findall(text)
# print each match
for match in matches:
    print(match)
```
### 4. Regular Expression Challenges
#### Capturing only the American flight numbers:
To capture only the American flight numbers (i.e., flights that start with AA), assuming the code is followed by four digits, you can use the following regex:
```regex
AA[0-9]{4}
```
#### Write a regex that captures all flight numbers:
To capture all flight numbers, assuming they consist of two uppercase letters followed by four digits, you can use:
```regex
[A-Z]{2}[0-9]{4}
```
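A short sketch of both flight-number patterns in Python follows. It assumes the format stated above (two uppercase letters followed by four digits); the sample sentence and flight numbers are invented.

```python
import re

# Hypothetical sample text; the flight numbers are made up.
text = "Flights AA1234 and UA5678 depart before BA0042; gate 12 is closed."

# Any carrier: two uppercase letters followed by four digits.
all_flights = re.findall(r"[A-Z]{2}[0-9]{4}", text)

# American only: the literal prefix AA followed by four digits.
american_flights = re.findall(r"AA[0-9]{4}", text)

print(all_flights)       # ['AA1234', 'UA5678', 'BA0042']
print(american_flights)  # ['AA1234']
```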
#### Write a regex that only captures the airplane numbers:
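To capture only the airplane numbers, assuming they consist solely of four digits, you can use the pattern below. The `\b` word boundaries keep it from matching the digits inside a flight number such as `AA1234`:
```regex
\b[0-9]{4}\b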
```
#### Find only the airplane numbers in a text and print all the results:
To pick out only the airplane numbers in a text, so that flight numbers are not mistaken for them, and print each one, you can use:
```regex
\b[0-9]{4}\b
```
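The sketch below shows why telling the two kinds of number apart matters, and prints the results. It assumes four-digit airplane numbers as above; the sample text is invented. A bare `[0-9]{4}` also picks up the digits inside flight numbers, while the `\b` word boundaries restrict matches to standalone four-digit numbers.

```python
import re

# Hypothetical sample text; all numbers are made up.
text = "Flights AA1234 and UA5678 use airplanes 7371 and 3209."

# Without boundaries, digits inside flight numbers are matched too.
bare = re.findall(r"[0-9]{4}", text)

# With word boundaries, only standalone four-digit numbers match.
airplane_numbers = re.findall(r"\b[0-9]{4}\b", text)

print(bare)  # ['1234', '5678', '7371', '3209']
for number in airplane_numbers:
    print(number)  # prints 7371, then 3209
```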
### 5. Using regex for cleaning data
#### Write a regex that only captures those using the “@” in their email addresses:
To capture only the addresses written with `@`, you can match one or more characters that are neither whitespace nor `@` on each side of a single `@`:
```regex
[^\s@]+@[^\s@]+
```
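As an illustration, the sketch below filters a small list of entries, keeping only those written with `@`. The entries are invented, and the pattern `[^\s@]+@[^\s@]+` is a deliberately loose assumption (something, then `@`, then something), not a full email validator.

```python
import re

# Hypothetical dataset entries for illustration.
entries = [
    "kelly@example.com",
    "kelly at example.com",
    "sam@example.org",
    "no-email",
]

# Loose pattern: non-space, non-@ characters on both sides of one @.
at_pattern = re.compile(r"[^\s@]+@[^\s@]+")

# fullmatch requires the whole entry to fit the pattern.
with_at = [entry for entry in entries if at_pattern.fullmatch(entry)]

print(with_at)  # ['kelly@example.com', 'sam@example.org']
```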
#### Transform the dataset by making only the telephone numbers lowercase:
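To lowercase only the telephone numbers in the dataset, capture each number in a group and apply the transformation to the captured text alone, leaving the rest of each record untouched. Digits have no case, so this only has a visible effect where a number contains letters. Assuming four-digit numbers, the capture group is: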
```regex
([0-9]{4})
```
#### Transform the dataset by converting all telephone numbers to uppercase:
To convert only the telephone numbers to uppercase, capture each number (assumed to be four digits) in a group and apply the uppercase transformation to the captured text only:
```regex
([0-9]{4})
```
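In Python, a capture group can be combined with `re.sub()` and a replacement function so that only the matched numbers are transformed. The sketch below is illustrative: since plain digits have no case, it assumes a hypothetical format in which a four-digit number may carry a one-letter suffix, and the sample text is invented.

```python
import re

# Hypothetical sample text; numbers with a letter suffix are an assumption
# made so that the case change is visible.
text = "Call Kelly on 4421X or Sam on 1234x."

# Capture group: four digits plus an optional single letter.
phone = re.compile(r"([0-9]{4}[A-Za-z]?)")

# The function receives each match and returns its replacement,
# so the names around the numbers are left untouched.
lowered = phone.sub(lambda m: m.group(1).lower(), text)
uppered = phone.sub(lambda m: m.group(1).upper(), text)

print(lowered)  # Call Kelly on 4421x or Sam on 1234x.
print(uppered)  # Call Kelly on 4421X or Sam on 1234X.
```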
#### Filter a text by keeping only the numbers starting with 442 in a document:
To capture only the numbers starting with 442 in a document, assuming four more digits follow the prefix, you can use:
```regex
442[0-9]{4}
```
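A minimal sketch of this filter in Python is shown below, with invented sample numbers. Adding `\b` word boundaries is an optional refinement, not part of the pattern above: it prevents a match that starts in the middle of a longer number.

```python
import re

# Hypothetical sample text; the numbers are made up.
text = "Numbers: 4421234, 4429876, 5551234, 14421234."

# The pattern as given: 442 followed by four digits, anywhere in the text.
loose = re.findall(r"442[0-9]{4}", text)

# With word boundaries: the whole number must be 442 plus four digits.
strict = re.findall(r"\b442[0-9]{4}\b", text)

print(loose)   # ['4421234', '4429876', '4421234']
print(strict)  # ['4421234', '4429876']
```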
### 6. Practical Problems
#### Extract only the telephone numbers in a text and print all the results:
To pick out only the telephone numbers in a text and print all the results, assuming four-digit numbers, you can use:
```regex
([0-9]{4})
```
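#### Match only the valid telephone numbers by requiring that each number starts with a given pattern:
To match only the valid telephone numbers, place the required prefix in front of the digit pattern. No prefix is specified here, so the expression below shows only the four-digit capture group to extend: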
```regex
([0-9]{4})
```
#### Write a regex that captures only the names starting with sw:
To capture only the names starting with `sw`, match `sw` at a word boundary (`\b`) followed by any further letters:
```regex
\bsw[A-Za-z]*
```
#### Write a regex that captures only the names ending with ke:
To capture only the names ending with `ke`, you can use:
```regex
\b[A-Za-z]*ke\b
```
#### Write a regex that captures only the names ending with sw or ke:
To capture only the names ending with `sw` or `ke`, you can use:
```regex
\b[A-Za-z]*(sw|ke)\b
```
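The start and end conditions can be tried out on a list of names. This is a sketch with invented names; it uses `re.fullmatch()` so that each whole name must fit the pattern, which makes word boundaries unnecessary, and it assumes names consist only of ASCII letters.

```python
import re

# Hypothetical list of names for illustration.
names = ["swan", "Mike", "Luke", "Kelly", "swift", "Blake", "Jakesw"]

# Starts with "sw": the prefix, then any letters.
starts_sw = [n for n in names if re.fullmatch(r"sw[A-Za-z]*", n)]

# Ends with "ke": any letters, then the suffix.
ends_ke = [n for n in names if re.fullmatch(r"[A-Za-z]*ke", n)]

# Ends with "sw" or "ke": alternation inside a group.
ends_sw_or_ke = [n for n in names if re.fullmatch(r"[A-Za-z]*(sw|ke)", n)]

print(starts_sw)      # ['swan', 'swift']
print(ends_ke)        # ['Mike', 'Luke', 'Blake']
print(ends_sw_or_ke)  # ['Mike', 'Luke', 'Blake', 'Jakesw']
```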
#### Write a regex that captures only the names starting with a or ending with b:
To capture only the names starting with `a` or ending with `b`, you can use:
```regex
\b(a[A-Za-z]*|[A-Za-z]*b)\b
```
#### Write a regex that captures only the names starting with Q or ending with Q:
To capture only the names starting with `Q` or ending with `Q`, you can use:
```regex
\b(Q[A-Za-z]*|[A-Za-z]*Q)\b
```
#### Write a regex that captures only the names starting with Q or ending with H:
To capture only the names starting with `Q` or ending with `H`, you can use:
```regex
\b(Q[A-Za-z]*|[A-Za-z]*H)\b
```
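The "starts with X or ends with Y" questions all share one shape: an alternation between a prefix branch and a suffix branch. The sketch below wraps that shape in a helper; `starts_or_ends` is a hypothetical name introduced here, the names are invented, and matching is case-sensitive, so `Q` does not match `q`.

```python
import re

def starts_or_ends(start, end):
    """Build a pattern for names starting with `start` OR ending with `end`."""
    # re.escape guards against letters being replaced by regex metacharacters.
    return re.compile(
        rf"(?:{re.escape(start)}[A-Za-z]*|[A-Za-z]*{re.escape(end)})"
    )

# Hypothetical list of names for illustration.
names = ["anna", "Bob", "caleb", "Quinn", "IRAQ", "JOSH", "Sarah"]

def keep(pattern):
    # fullmatch: the whole name must fit one of the two branches.
    return [n for n in names if pattern.fullmatch(n)]

a_or_b = keep(starts_or_ends("a", "b"))
q_or_q = keep(starts_or_ends("Q", "Q"))
q_or_h = keep(starts_or_ends("Q", "H"))

print(a_or_b)  # ['anna', 'Bob', 'caleb']
print(q_or_q)  # ['Quinn', 'IRAQ']
print(q_or_h)  # ['Quinn', 'JOSH']
```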
11 Dec 2010