“You won’t get IN with THIS E-Mail” – Python Regex to validate email (formally)
Recently I did some text analysis with Python and regex and came up with an E-Mail format validation method. It could help you with the same problem, and it will remind me of it in the future.
Import the re package and use its compile() function to get a regex pattern object that you can match against your string.
For evaluation I use the pattern.match() method, which only matches if the regex is found at the beginning of the string. Use pattern.search() if you want to scan the whole (multi-line) string for the first place where the regex matches. If you want all matches and to iterate through them, use pattern.finditer(), which returns an iterator of match objects.
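Here is a minimal sketch of the three methods side by side. The sample text and the simplified, case-sensitive pattern are made up for illustration and are not the email pattern used below:

```python
import re

# Hypothetical sample text and a deliberately simplified pattern
text = "Contact: alice@example.com or bob@example.org"
pattern = re.compile(r"[a-z]+@[a-z]+\.[a-z]{2,}")

# match() only tries position 0, and "C" is not in [a-z]
print(pattern.match(text))   # None

# search() scans the string and returns the first match it finds
first = pattern.search(text)
print(first.group())         # alice@example.com

# finditer() yields one match object per non-overlapping match
for m in pattern.finditer(text):
    print(m.start(), m.group())
```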
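import re
# Local part, "@", one domain label, ".", top-level domain; re.IGNORECASE also admits A-Z
email_pattern = re.compile(r"[a-z0-9][a-z0-9_.-]*[a-z0-9]@[a-z0-9][a-z0-9-]*[a-z0-9]\.[a-z0-9]{2,}", flags=re.IGNORECASE)
def check_email(email: str) -> bool:
    # match() only checks the beginning of the string; it returns None if there is no match
    return email_pattern.match(email) is not None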
my_email = 'My_Email@monster-soft.com'
print(f"Email format of '{my_email}' is {'valid' if check_email(my_email) else 'invalid'}")
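To get a feel for what the pattern accepts, here is a small check with a few hypothetical addresses. The last case shows a side effect of match(): it only anchors at the beginning, so trailing text after a valid-looking address still passes.

```python
import re

# Same pattern and function as above
email_pattern = re.compile(r"[a-z0-9][a-z0-9_.-]*[a-z0-9]@[a-z0-9][a-z0-9-]*[a-z0-9]\.[a-z0-9]{2,}", flags=re.IGNORECASE)

def check_email(email: str) -> bool:
    return email_pattern.match(email) is not None

# Hypothetical test addresses with the result check_email gives for each
samples = [
    "My_Email@monster-soft.com",   # True
    "a@example.com",               # False: local part needs at least two characters
    "-user@example.com",           # False: must start with a letter or digit
    "user@example.c",              # False: top-level domain needs two or more characters
    "user@example.com, and more",  # True: match() ignores the trailing text
]
for s in samples:
    print(f"{s!r}: {check_email(s)}")
```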
Regex cheat sheet:
Character sets:
A character set matches exactly one of the characters it lists at this point in the regex.
[] – character set delimiters
[a-z] – only lowercase letters
[A-Z] – only uppercase letters
[0-9] – only digits
[^0-2] – any character except 0, 1, or 2 (a ^ right after the opening [ negates the set)
. – any character except a newline (inside a character set, . is just a literal dot)
@ – exactly @. This works for all other characters that are not special regex characters, too.
\. – exactly . (a literal dot). Outside of character sets, regex special characters have to be escaped with \ to match literally.
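The character set entries above can be tried out with re.findall(), which returns all non-overlapping matches as a list. The input strings are arbitrary examples:

```python
import re

print(re.findall(r"[a-z]", "aB3"))    # ['a']
print(re.findall(r"[A-Z]", "aB3"))    # ['B']
print(re.findall(r"[0-9]", "aB3"))    # ['3']
print(re.findall(r"[^0-2]", "0123"))  # ['3']
# Outside a set, "." matches any character except a newline
print(re.findall(r".", "a.b"))        # ['a', '.', 'b']
# "\." matches only a literal dot
print(re.findall(r"\.", "a.b"))       # ['.']
# Inside a set, "." is already literal
print(re.findall(r"[.]", "a.b"))      # ['.']
```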
Quantifiers:
How often the preceding character or set can or has to appear to match.
? – zero or one
+ – one or more
* – zero or more
{2,5} – greedy by default: two to five, but as many as possible
{2,5}? – non-greedy: two to five, but as few as possible
{5} – exactly five
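A quick way to see the quantifiers in action is to apply each one with re.match() to the same made-up string of six a's and print what got matched:

```python
import re

text = "aaaaaa"  # six a's
for regex in [r"a?", r"a+", r"b*", r"a{2,5}", r"a{2,5}?", r"a{5}"]:
    # match() anchors at the start; group() returns the matched substring
    print(f"{regex} -> {re.match(regex, text).group()!r}")
# Prints:
# a? -> 'a'
# a+ -> 'aaaaaa'
# b* -> ''
# a{2,5} -> 'aaaaa'
# a{2,5}? -> 'aa'
# a{5} -> 'aaaaa'
```

Note that b* still succeeds: zero occurrences are allowed, so it matches the empty string.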
Positioning:
You can use ^ (start) and/or $ (end) to dictate where in the string the regex has to match.
^Start – will only match if the string starts with “Start”
end$ – will only match if the string ends with “end”
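A short sketch of ^ and $, followed by one possible tightening of the email pattern (a suggestion, not the method above): appending $ means nothing may follow the top-level domain. Python's $ also matches right before a single trailing newline; pattern.fullmatch() is the stricter alternative if that matters.

```python
import re

print(bool(re.search(r"^Start", "Start here")))    # True
print(bool(re.search(r"^Start", "Do not Start")))  # False
print(bool(re.search(r"end$", "the end")))         # True
print(bool(re.search(r"end$", "end of it")))       # False

# Suggested variant (not from the original post): $ forbids trailing text
anchored = re.compile(r"[a-z0-9][a-z0-9_.-]*[a-z0-9]@[a-z0-9][a-z0-9-]*[a-z0-9]\.[a-z0-9]{2,}$", flags=re.IGNORECASE)
print(anchored.match("user@example.com") is not None)            # True
print(anchored.match("user@example.com, and more") is not None)  # False
print(anchored.match("user@mail.example.com") is not None)       # False
```

Anchoring has a trade-off with this particular pattern: the domain part allows only one dot, so an address with a subdomain such as user@mail.example.com is now rejected. The unanchored version only accepted it because match() stopped after user@mail.example.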
For the complete list, read the docs.
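The “r” before the string literal makes it a raw string: Python leaves backslashes alone, so “\n” and “\t” stay a backslash plus a letter instead of turning into a line break or a tab. That way escapes like “\.” reach the regex engine unchanged.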
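A small demonstration of why the r prefix matters for regex. The word-boundary escape \b is not in the cheat sheet above; it is used here only because \b means backspace in a normal string literal, which silently breaks the regex:

```python
import re

# "\n" is one newline character; r"\n" is a backslash followed by "n"
print(len("\n"), len(r"\n"))                # 1 2
# In a normal string "\b" becomes a backspace character, so nothing matches
print(re.findall("\bcat\b", "a cat sat"))   # []
# As a raw string, \b reaches the regex engine and means "word boundary"
print(re.findall(r"\bcat\b", "a cat sat"))  # ['cat']
```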
The keyword argument “flags=re.IGNORECASE” (short form “flags=re.I”) makes the regex match case-insensitively. You could also do this in the regex itself by listing both cases, e.g. [a-zA-Z]. Do that if only parts of the regex should ignore case while others have to match case-sensitively. As that isn’t the case here, the flag lets you drop A-Z from every character set, which makes the regex shorter and a little more readable.
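A sketch comparing the flag with doing it in the regex itself. The ID- format is hypothetical. The scoped inline flag (?i:...) at the end is a further option available since Python 3.6, not something the paragraph above mentions:

```python
import re

# Whole regex case-insensitive: the flag lets [a-z] match A-Z too
with_flag = re.compile(r"[a-z]+", flags=re.IGNORECASE)
print(with_flag.fullmatch("MixedCase") is not None)   # True

# Only part case-insensitive: no flag, list both cases where needed
# (hypothetical format: uppercase "ID-" followed by letters of any case)
partial = re.compile(r"ID-[a-zA-Z]+")
print(partial.fullmatch("ID-MixedCase") is not None)  # True
print(partial.fullmatch("id-MixedCase") is not None)  # False: "ID-" stays case-sensitive

# Alternative: scope the flag to one group with (?i:...)
scoped = re.compile(r"ID-(?i:[a-z]+)")
print(scoped.fullmatch("ID-MixedCase") is not None)   # True
print(scoped.fullmatch("id-MixedCase") is not None)   # False
```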
Thanks, Corey Schafer, for your great videos on regex in general and regex in Python!
Thanks Kaya Yanar for this great title image! I just had to use it because it’s perfect. Hope you are not mad 🙂
Buy Kaya Yanar stuff on Amazon (no affiliate link)