So far in our regex tutorials we have only seen literal matches and single-character matches. Regex can also repeat atoms, which means a pattern can match more than one character at a time. This is done with quantifiers.
- The star * quantifier matches 0 or more times.
- Example: [1-9]* matches 123456789
- It matches any digit from 1 to 9, 0 or more times. Because zero repetitions count, it also matches an empty string.
- The plus quantifier + matches 1 or more times.
- Example: [a-z]+ matches hello
- matches anything a-z character 1 or more times. Some letter in the range needs to be present for there to be a match.
NOTE 1: By default, quantifiers are ‘greedy’. This means they try to consume as much text as possible.
NOTE 2: By default, a period . matches any character except a newline. To match a literal period, escape it with a backslash: \.
<p>.*</p> matches <p>paragraph one</p><p>paragraph two</p>
? Quantifier (Non-Greediness)
To make a quantifier non-greedy (lazy), add a question mark ? after it. It will then match as few characters as possible instead of as many, therefore:
- <p>.*?</p> ~> matches ~> <p>paragraph one</p><p>paragraph two</p>
- As you can see only one html ‘paragraph’ is matched at a time here.
Additional ? Points
On its own, the question mark ? is also a quantifier: it matches the preceding atom 0 or 1 times, which makes a single character optional. (Like * and +, it is greedy by default, so the character is included whenever it is present.)
https?:// matches either http:// or https://
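Here is a small Python sketch of the optional-character case, assuming Python 3 and the standard `re` module. The example URLs are made up for illustration. `re.match` anchors at the start of the string, so only strings that begin with http:// or https:// pass. The last two lines show that a plain s? is greedy, while s?? (the lazy form) prefers to skip the "s".

```python
import re

# s? : the "s" may appear zero or one time
pattern = re.compile(r"https?://")

for url in ["http://example.com", "https://example.com", "ftp://example.com"]:
    print(url, bool(pattern.match(url)))
# http://example.com True
# https://example.com True
# ftp://example.com False

# ? is greedy by default: when the "s" is there, it is consumed.
print(re.match(r"https?", "https").group())   # https
# Adding a second ? makes it lazy, so it prefers to leave the "s" out.
print(re.match(r"https??", "https").group())  # http
```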