Simple Email RegEx Validation
I know I've been leaving you all hanging the past couple of weeks without imparting some tumblelogging loveliness, but it's about that time when my girlfriend and I had to find a new place to live (our lease expires in a month... talk about last minute). But don't worry: after weeks of headaches and stress, we found a great condo not too far from where we live now. But I digress.
Instead of my typical posts regarding the current startup environment, I figured I would share something a little more technical I came across this week that may be helpful in working on your website: Regular Expressions.
Wikipedia defines regular expressions as providing a "concise and flexible means to 'match' (specify and recognize) strings of text, such as particular characters, words, or patterns of characters."
Something like this is particularly handy when you want to validate an input field, such as the email field on a sign-up form.
When we built Lokalite, we actually had users confirm their email addresses via a message we sent to the address they provided. However, after testing and researching the usability of this feature, we saw it as a major (and unnecessary) barrier to users accessing our site and opted to remove it.
Email registration was going smoothly without this feature until we noticed that a user had signed up with an email ending in '.edy' instead of '.edu'. Leaving this problem unfixed could have had huge implications. What's the point of having a user sign up if you can't communicate with them one-on-one, can't drive traffic to your site via email marketing, etc.?
How did we decide to solve this problem in the near term?
A simple regular expression that validates the email string. Now I know some will say that doing it this way will not handle all potential scenarios and doesn't let the application infer and suggest what a user meant to type when the input is invalid. I completely agree. This is only a near term solution for a web application that is localized to a specific geographic region and doesn't have massive amounts of registrations every day, and for those who don't have the technical savvy to implement a complete state engine (which is well beyond my current technical competency).
Our Solution
Here is the regular expression we're currently using to validate email registration:
/^([a-zA-Z0-9_\.\-\'\+])+\@(([a-zA-Z0-9\-])+\.)+(?:[a-zA-Z0-9]{2}|aero|asia|biz|cat|com|coop|info|int|jobs|mobi|museum|name|net|org|pro|tel|travel|xxx|edu|gov|mil|nom|firm|gen|idv\.)$/
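To see the expression in action, here is a minimal sketch in JavaScript. The pattern is copied unchanged from above; the helper name isValidEmail and the sample addresses are my own assumptions for illustration and not part of the Lokalite code.

```javascript
// The expression from this post, copied unchanged.
const EMAIL_PATTERN = /^([a-zA-Z0-9_\.\-\'\+])+\@(([a-zA-Z0-9\-])+\.)+(?:[a-zA-Z0-9]{2}|aero|asia|biz|cat|com|coop|info|int|jobs|mobi|museum|name|net|org|pro|tel|travel|xxx|edu|gov|mil|nom|firm|gen|idv\.)$/;

// Hypothetical helper: trims surrounding whitespace, then tests the pattern.
function isValidEmail(input) {
  return EMAIL_PATTERN.test(String(input).trim());
}

console.log(isValidEmail("student@example.edu"));             // true
console.log(isValidEmail("student@example.edy"));             // false (the typo that started all this)
console.log(isValidEmail("o'brien+news@mail.example.co.uk")); // true
console.log(isValidEmail("no-at-sign.example.com"));          // false
```

In a sign-up form you would probably call something like this when the form is submitted and show an error next to the email field if it returns false.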
What the hell does all of this mean?
1. ([a-zA-Z0-9_\.\-\'\+])+
This pattern matches one or more of the following characters: a through z (lowercase and/or uppercase), 0 through 9, an underscore (_), a dot (.), a hyphen (-), an apostrophe (') or a plus sign (+).
2. \@
Then followed by a required @ symbol.
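3. (([a-zA-Z0-9\-])+\.)+
Then followed by one or more groups, each made up of one or more characters from a through z (upper and/or lower), 0 through 9, and/or a hyphen, followed by a required dot (.). Because the group can repeat, subdomains and multi-part endings such as mail.example. or example.co. are covered here.
4. (?:[a-zA-Z0-9]{2}|aero|asia|biz|cat|com|coop|info|int|jobs|mobi|museum|name|net|org|pro|tel|travel|xxx|edu|gov|mil|nom|firm|gen|idv\.)
Then lastly followed by exactly one of these: an alpha-numeric string that is 2 characters long (to handle country domains and 2 character TLD's), one of the listed generic TLD's (aero through mil, which are not all US-specific), or one of a few one off endings (nom, firm, gen, and idv). Note that the trailing \. sits inside the group and only attaches to idv, so that last alternative matches the literal text 'idv.' and not an optional dot followed by a 2 character string. Endings like .co.uk still pass, because step 3 can repeat and the 2 character alternative picks up the last part.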
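If it helps to see how the pieces fit together, here is a rough sketch in JavaScript that rebuilds the expression from one sub-expression per step, keeping each + attached to the group it repeats. The constant names are assumptions of mine for readability; only the patterns themselves come from this post. The checks at the end show how the last group behaves around 'idv'.

```javascript
// One sub-expression per step of the walkthrough (names are hypothetical).
const LOCAL_PART = /([a-zA-Z0-9_\.\-\'\+])+/;   // one or more allowed characters
const AT_SIGN = /\@/;                           // required @ symbol
const DOMAIN_LABELS = /(([a-zA-Z0-9\-])+\.)+/;  // one or more "label." groups
const ENDING = /(?:[a-zA-Z0-9]{2}|aero|asia|biz|cat|com|coop|info|int|jobs|mobi|museum|name|net|org|pro|tel|travel|xxx|edu|gov|mil|nom|firm|gen|idv\.)/;

// Join the sources and anchor the result to the start and end of the string.
const assembled = new RegExp(
  "^" +
    [LOCAL_PART, AT_SIGN, DOMAIN_LABELS, ENDING].map((part) => part.source).join("") +
    "$"
);

console.log(assembled.test("user@example.com"));    // true
console.log(assembled.test("user@example.idv.tw")); // true: "example.idv." then the 2 character "tw"
console.log(assembled.test("user@example.idv."));   // true: the last alternative is the literal "idv."
console.log(assembled.test("user@example.idv"));    // false: "idv" without the dot is not in the list
```

Splitting the pattern up like this is only one way to keep it readable; it also makes the TLD list easier to edit when the expression needs refining.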
This regular expression should satisfy most cases but will need refining as your application's user base grows. This is just a start, and there are definitely areas to improve. I welcome anyone to use and add to this expression... just make sure to share the love.
Give credit where credit is due... inspired by http://www.dustindiaz.com/update-your-email-regexp/ & http://stackoverflow.com/questions/1487789/regular-expression-for-domain-from-email-address











