## Regular Expressions

### Quantifiers

A quantifier applies to the item immediately before it in the regular expression (a single character, or a character class such as `[AG]`) and determines how many times that item must or may occur.

If you want the quantifier to affect a sequence of characters, enclose those characters in parentheses.

The quantifiers are:

| Quantifier | Meaning |
| --- | --- |
| `{n}` | Must occur exactly n times |
| `{n,m}` | Must occur at least n times but no more than m times |
| `{n,}` | Must occur at least n times |
| `*` | 0 or more times (same as `{0,}`) |
| `+` | 1 or more times (same as `{1,}`) |
| `?` | 0 or 1 time (same as `{0,1}`) |
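The shorthand equivalences above can be checked directly. The following is a minimal sketch, assuming a handful of made-up inputs (strings of zero to three `A`s); it compares each shorthand quantifier with its brace form and counts any disagreement.

```perl
use strict;
use warnings;

# Each pair holds a shorthand quantifier and its brace form.
my @pairs = (
    [ qr/^A*$/, qr/^A{0,}$/  ],   # *  is the same as {0,}
    [ qr/^A+$/, qr/^A{1,}$/  ],   # +  is the same as {1,}
    [ qr/^A?$/, qr/^A{0,1}$/ ],   # ?  is the same as {0,1}
);

# Made-up inputs: zero, one, two and three A's.
my @inputs = ('', 'A', 'AA', 'AAA');

my $mismatches = 0;
for my $pair (@pairs) {
    my ($short, $long) = @$pair;
    for my $in (@inputs) {
        my $short_hit = ($in =~ $short) ? 1 : 0;
        my $long_hit  = ($in =~ $long)  ? 1 : 0;
        $mismatches++ if $short_hit != $long_hit;
    }
}
print "mismatches: $mismatches\n";
```

The patterns are anchored with `^` and `$` so that the whole input must match; without anchors, `A*` would succeed on every string.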

#### Example 1

We would like to find out whether the consensus sequence ACCCC, followed by three bases that are each A or G, followed by GTGT, is contained (somewhere) in a given sequence `$a`.

Without quantifiers:
```
if ($a =~ /ACCCC[AG][AG][AG]GTGT/) {...};
```
With quantifiers:
```
if ($a =~ /AC{4}[AG]{3}(GT){2}/) {...};
```
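As a quick sanity check that the two patterns accept the same strings, here is a small sketch. The test sequences are made up for illustration and are not from any real data set.

```perl
use strict;
use warnings;

my $plain      = qr/ACCCC[AG][AG][AG]GTGT/;
my $quantified = qr/AC{4}[AG]{3}(GT){2}/;

# Made-up sequences; the value is 1 if a match is expected, 0 otherwise.
my %expected = (
    'TTACCCCAGAGTGTCC' => 1,   # ACCCC + AGA + GTGT, with flanking bases
    'ACCCCGGGGTGT'     => 1,   # ACCCC + GGG + GTGT
    'ACCCAGAGTGT'      => 0,   # only three C's
    'ACCCCATAGTGT'     => 0,   # T is not allowed in the [AG] positions
);

for my $seq (sort keys %expected) {
    my $p = ($seq =~ $plain)      ? 1 : 0;
    my $q = ($seq =~ $quantified) ? 1 : 0;
    print "$seq plain=$p quantified=$q\n";
}
```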

#### Example 2

The date and time example from the previous slide will look much nicer if we use quantifiers:
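```
#!/usr/local/bin/perl
use strict;
use warnings;

print "Please enter date and time, as in \"08-OCT-1997  16:30\"\n";
my $entry = <STDIN>;
chomp($entry);    # remove the trailing newline (chop would remove any last character)

# two digits, "-", three word characters, "-", four digits,
# two spaces, then two digits, ":", two digits
if ($entry =~ /\d{2}-\w{3}-\d{4}  \d{2}:\d{2}/) {
    print "good!\n";
} else {
    print "wrong format!\n";
}
```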
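The pattern above succeeds if the date and time appear anywhere in the entry, so input with extra characters before or after is also accepted. A possible stricter variant is sketched below. The helper name `looks_like_date_time` is hypothetical, and the `^` and `$` anchors are an addition that is not part of the original example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the whole entry has the form
# DD-MON-YYYY, two spaces, HH:MM; returns 0 otherwise.
sub looks_like_date_time {
    my ($entry) = @_;
    return ($entry =~ /^\d{2}-\w{3}-\d{4}  \d{2}:\d{2}$/) ? 1 : 0;
}

# Made-up entries for illustration.
for my $entry ('08-OCT-1997  16:30', '8-OCT-1997  16:30', '08-OCT-1997  16:30 or so') {
    printf "%-26s %s\n", $entry, looks_like_date_time($entry) ? 'good!' : 'wrong format!';
}
```

Only the first entry is accepted: the second has a one-digit day, and the third has trailing text that the `$` anchor rejects.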

#### Example 3

To check whether a given sequence contains 2 or more repeats of the GATA tetranucleotide write:
```
if ($seq =~ /(GATA){2,}/) {  }

# note that we enclosed the sequence to be repeated in parentheses
```
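To see why the parentheses matter, the sketch below compares the grouped pattern with an ungrouped one on a few made-up sequences. Without parentheses, `{2,}` applies only to the final `A`.

```perl
use strict;
use warnings;

# Made-up sequences for illustration only.
my @seqs = ('GATA', 'GATAGATA', 'GATACGATA', 'GATAA');

for my $seq (@seqs) {
    # (GATA){2,}: the whole tetranucleotide, two or more times in a row
    my $grouped   = ($seq =~ /(GATA){2,}/) ? 'match' : 'no match';
    # GATA{2,}: "GAT" followed by two or more A's
    my $ungrouped = ($seq =~ /GATA{2,}/)   ? 'match' : 'no match';
    printf "%-10s grouped: %-8s ungrouped: %s\n", $seq, $grouped, $ungrouped;
}
```

Only `GATAGATA` matches the grouped pattern. `GATACGATA` does not, because its two copies of GATA are separated by another base and so are not consecutive repeats. `GATAA` matches only the ungrouped pattern.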

#### Example 4

The Genome Database accession IDs are composed of the characters GDB: followed by several digits (see example).
To check whether a Genome Database accession ID is entered correctly, use the following conditional:
```
if ($entry =~ /GDB:\d+/) {  }

# i.e. "GDB:" followed by one or more digits
```
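The choice of `+` rather than `*` is what makes at least one digit mandatory. The sketch below illustrates the difference. The entries are made up, and the ID number is hypothetical.

```perl
use strict;
use warnings;

# Made-up inputs; the ID number is hypothetical.
my @entries = ('GDB:120231', 'GDB:', 'GDB 120231');

for my $entry (@entries) {
    my $plus = ($entry =~ /GDB:\d+/) ? 'accepted' : 'rejected';   # one or more digits
    my $star = ($entry =~ /GDB:\d*/) ? 'accepted' : 'rejected';   # zero or more digits
    printf "%-12s plus: %-9s star: %s\n", $entry, $plus, $star;
}
```

With `\d*`, the bare prefix `GDB:` would be accepted even though it has no digits; with `\d+` it is rejected. Neither pattern accepts `GDB 120231`, since the colon is missing.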

#### Example 5

To check whether a sentence contains either the word "color" or "colour", write:
```
if ($sentence =~ /colou?r/) {  }

# the question mark here denotes an optional "u"
```

#### Example 6

HTML allows extra whitespace before the closing `>` of a tag. This example is more lenient and also tolerates whitespace after the opening `<`.
For example, `< TITLE    >` and `<\tTITLE>` are treated the same as `<TITLE>`.
To check whether an HTML text contains the TITLE tag, write:
```
if ($text =~ /<\s*TITLE\s*>/) {  }

# the word "TITLE" may optionally be surrounded by any number
# of spaces, tabs etc.
```
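HTML tag names are case-insensitive, so a lowercase `<title>` would be missed by the pattern above. The sketch below compares it with a variant that uses the `/i` modifier. The modifier is an addition not covered in this section, and the snippets are made up for illustration.

```perl
use strict;
use warnings;

# Made-up snippets; "\t" inside double quotes is a real tab character.
my @snippets = ('<TITLE>', '< TITLE    >', "<\tTITLE>", '<title >', '<TITLES>');

for my $text (@snippets) {
    my $strict  = ($text =~ /<\s*TITLE\s*>/)  ? 1 : 0;   # pattern from the example
    my $relaxed = ($text =~ /<\s*TITLE\s*>/i) ? 1 : 0;   # same pattern, ignoring case
    print "strict=$strict relaxed=$relaxed\n";
}
```

The first three snippets match both patterns, `<title >` matches only the case-insensitive one, and `<TITLES>` matches neither because `\s*` allows only whitespace between the tag name and `>`.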