A question that programmers often ask is "How do I validate an email address?"
At first glance that appears to be a sensible question. If you're writing a web form or some other application that needs to accept an email address, you might want to detect errors (say, typing fred4my.com instead of email@example.com) and give the user a chance to correct the error.
But the question of what is a valid email address is much harder than you might expect. The official standard for email accepts a very broad range of email address formats.
[Aside: what's with Google? Try searching for "how to validate email addresses" (without the quotes). I get a 403 error page:
... but your query looks similar to automated requests from a computer virus or spyware application. [...]
After some experiments, it looks like Google UK blocks (almost) any search containing "email" and "address". But Google Australia doesn't seem to care; and even Google UK will accept the query if it comes from Konqueror's toolbar.]
The best advice for validating email addresses is: Just Say No. At most, check that the email address isn't blank. If you absolutely know that the address can't be a local address, check for the presence of at least one at-sign @. (Yes, you read me right the first time: at least one.) And that's it -- leave the validation up to the mail server. If the mail server can deliver it, it is valid, and if it can't, it isn't.
If you want to guard against user typos, get the user to type the address twice, like they do for a password.
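To make that advice concrete, here is a minimal sketch in Python of the only checks recommended above. The function names `looks_plausible` and `entries_match` and the `allow_local` flag are my own inventions for illustration, not part of any library, and the whitespace handling is an assumption you may want to change.

```python
def looks_plausible(address: str, allow_local: bool = False) -> bool:
    """Return False only for addresses that are obviously unusable.

    Hypothetical helper: it checks that the address isn't blank and,
    unless local addresses are allowed, that it has at least one @.
    Everything else is left to the mail server.
    """
    if not address.strip():
        # Blank (or whitespace-only) input can't be delivered anywhere.
        return False
    if not allow_local and "@" not in address:
        # At least one at-sign; more than one is fine.
        return False
    return True


def entries_match(first: str, second: str) -> bool:
    """Typo guard: the user typed the address twice, like a password."""
    # Assumption: surrounding whitespace is an accident, so ignore it.
    return first.strip() == second.strip()
```

Note that nothing here tries to decide whether the address is well formed. A form would call both functions, and then simply try to send mail to whatever the user typed.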
But ignorant programmers -- and it's frightening how many programmers fall into that category -- insist on doing incorrect validation. This example shows the danger of false negatives: anyone using this code will wrongly reject perfectly valid email addresses like:
somebody (see me @ the pub) @somewhere.com
Yes, that address is valid: the part between ( and ) is a comment, and is ignored by any compliant mail server.
Another common mistake is to reject addresses like firstname+lastname@example.org: plus signs in the user name part are allowed.
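To see how easily false negatives creep in, here is a sketch of the sort of strict check that causes them. The pattern `NAIVE_PATTERN` is a made-up example that I wrote to be typical of this mistake; it is not taken from any particular program, and the addresses are purely illustrative.

```python
import re

# Hypothetical, deliberately naive pattern: letters, digits, dots,
# underscores and hyphens, then @, then a dotted domain name.
NAIVE_PATTERN = re.compile(r"^[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$")


def naive_is_valid(address: str) -> bool:
    """Return True if the address matches the naive pattern."""
    return NAIVE_PATTERN.match(address) is not None


# Illustrative addresses only.
samples = [
    "fred@example.com",
    "fred+spam@example.com",
    "somebody (see me @ the pub) @somewhere.com",
]

for address in samples:
    verdict = "accepted" if naive_is_valid(address) else "REJECTED"
    print(f"{verdict}: {address}")
```

Only the first address gets through. The other two are rejected for containing a plus sign and a comment respectively, even though both are allowed.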
And then there are the commercial sites that won't let you register with a Hotmail, Yahoo or Gmail address. Don't get me started on the sheer pig-ignorance and stupidity of that...
But ultimately, even if an email address is syntactically valid (and it is a horrific task to check that!) there's no guarantee that the address is valid until you've actually sent to it successfully.
An address such as email@example.com is syntactically valid, but you still have to send an email to that address to find out whether the address is valid or not! That's why using a validator that works for "99% of email addresses" is bad practice -- not only do you needlessly reject the 1% of valid email addresses that your software can't handle, but you still don't know whether the address is valid until you actually try it.
The only thing worse than people who insist on validating email addresses is people who insist on validating email addresses with a regular expression. To quote Jamie Zawinski:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Somebody, I think in the spirit of George Leigh Mallory ("because it's there"), wrote a regular expression to almost validate email addresses (it can't deal with comments, and naturally it can't tell whether or not the address actually exists). To give you a flavour of this regex, here are the first sixty-five characters of this 6343-character monster:
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
Multiply that by a hundred. Now imagine trying to track down a bug in this beast. How confident are you that the creator of this regex has correctly dealt with all the odd corner cases?