Regular expression engine simplifies e-mail validation in PHP
Takeaway: It is extremely easy to validate user-supplied e-mail addresses in PHP, thanks largely to a very powerful regular expression engine built into the language.
A common task when building Web sites is validating user-supplied e-mail addresses. This is particularly important for sites that need a working e-mail address to complete a transaction, such as e-commerce sites, Web mail services and mailing lists.
If your Web site uses PHP, you're in luck: PHP's built-in regular expression engine makes this task straightforward. In this article, I'll demonstrate how easy it is.
To begin, assume you have the following Web form (Listing A), which asks the user to enter an e-mail address.
Listing A
<html>
<head></head>
<body>
<form action="validate.php" method="post">
Enter e-mail address: <input type="text" name="e-mail">
<input type="submit" value="Submit">
</form>
</body>
</html>
As the code above shows, this form is submitted to the PHP script validate.php. If the e-mail address is needed for the next step of a transaction, the script should verify that the address is valid before using it.
A simple way to accomplish this is with a regular expression, which checks that the e-mail address conforms to the standard format of user@domain.ext (a format check cannot confirm that the mailbox actually exists). Here's an example (Listing B):
Listing B
<?php
// check e-mail address
// display success or failure message
// isset() guards against a missing form field
if (!isset($_POST['e-mail']) || !preg_match('/^([a-zA-Z0-9])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+/', $_POST['e-mail'])) {
die("Invalid e-mail address");
}
echo "Valid e-mail address, processing...";
?>
Try it for yourself and see. The script will flag all e-mail addresses that are not in the format user@domain.ext. This is accomplished with the regular expression /^([a-zA-Z0-9])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+/. Let's look at each bit of it in detail:
- The caret (^) indicates the beginning of the string.
- The expression ([a-zA-Z0-9])+ indicates the range of allowed characters for the user part of the e-mail address. The plus (+) symbol appended to the end of this range indicates that at least one character from this range is mandatory.
- The @ symbol is exactly what it looks like: the literal @ sign that separates the user part from the domain.
- The expression ([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+ represents the domain.ext part of the e-mail address. The first range does not include a period (.), while the second group begins with a literal period (\.). Together they ensure that the domain name contains at least one character and is followed by at least one .ext segment. Again, the plus (+) symbols scattered throughout the pattern indicate that at least one valid character is required.
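PCRE's x (extended) modifier ignores unescaped whitespace outside character classes and treats text after # as a comment, so Listing B's pattern can be laid out one piece per line, mirroring the breakdown above. The sketch below is illustrative only; the variable names are hypothetical, and the loop simply confirms that the laid-out and compact forms agree on a few made-up inputs.

```php
<?php
// Listing B's pattern laid out with the x (extended) modifier: whitespace
// outside character classes is ignored and "#" starts a comment.
$listingB = '/
    ^                     # start of the string
    ([a-zA-Z0-9])+        # user part: one or more letters or digits
    @                     # the literal @ sign
    ([a-zA-Z0-9_-])+      # domain name: letters, digits, underscores or dashes
    (\.[a-zA-Z0-9_-]+)+   # one or more ".ext" segments, each starting with a period
/x';

// The same pattern in the compact form used in Listing B.
$compact = '/^([a-zA-Z0-9])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+/';

// Both versions should give the same answer for every input.
foreach (['user@domain.ext', 'user@domain', 'first.last@domain.ext'] as $address) {
    printf("%-22s extended: %d  compact: %d\n", $address,
        preg_match($listingB, $address), preg_match($compact, $address));
}
```

The laid-out form is longer but easier to maintain, since each comment sits next to the piece of the pattern it describes.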
Of course, this expression isn't perfect: it rejects valid addresses in the format first.last@domain.ext, and it accepts invalid domain extensions. You can improve the regular expression by allowing periods in the username part and restricting the length of the domain extension. Here's an example (Listing C):
Listing C
<?php
// check e-mail address
// display success or failure message
// isset() guards against a missing form field
if (!isset($_POST['e-mail']) || !preg_match('/^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)*\.([a-zA-Z]{2,6})$/', $_POST['e-mail'])) {
die("Invalid e-mail address");
}
echo "Valid e-mail address, processing...";
?>
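To see the difference between Listings B and C in practice, the sketch below runs both patterns against a few made-up sample strings. The third pattern, $listingCStrict, is an addition not found in either listing: it appends PCRE's D modifier, because by default $ also matches just before a newline at the very end of the subject, which lets an address with a trailing newline pass Listing C.

```php
<?php
// Listing B's pattern (no $ anchor, no periods in the user part).
$listingB = '/^([a-zA-Z0-9])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+/';

// Listing C's pattern, joined onto a single line.
$listingC = '/^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)*\.([a-zA-Z]{2,6})$/';

// Listing C plus the D modifier, so $ matches only at the very end of the string.
$listingCStrict = $listingC . 'D';

// Made-up sample inputs chosen to exercise each difference between the patterns.
$samples = [
    'first.last@domain.ext',    // period in the user part
    'user@mail.domain.ext',     // subdomain before the extension
    'user@domain.notarealext',  // extension longer than 6 letters
    'user@domain.ext<junk>',    // trailing text after the address
    "user@domain.ext\n",        // trailing newline
];

foreach ($samples as $address) {
    printf(
        "%-27s B: %d  C: %d  C+D: %d\n",
        json_encode($address),  // json_encode makes the trailing \n visible
        preg_match($listingB, $address),
        preg_match($listingC, $address),
        preg_match($listingCStrict, $address)
    );
}
```

Listing B accepts the over-long extension and the trailing text (it has no $ anchor) but rejects first.last@domain.ext; Listing C reverses all three verdicts, and only the D-modifier version also rejects the trailing newline.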
Some of the interesting enhancements here are:
- The username part of the e-mail address now has two ranges: the first contains only letters and digits, so the address must start with one of these, and the second also allows periods (.), underscores and dashes. This allows usernames of the form first.last@domain.ext.
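- The extension part of the e-mail address, ([a-zA-Z]{2,6}), now has a size specifier enclosed in curly braces. This forces the extension to be between 2 and 6 characters long. When this article was written, all valid domain extensions fell within this range; newer top-level domains such as .photography are longer, so you may need to raise the upper limit.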
Caution: This restriction reduces the number of too-long or too-short domain extensions that get through, but it doesn't solve the problem entirely; users can still enter invalid extensions between 2 and 6 characters long. You can fix this by replacing the final part of the expression with an explicit list of valid domain extensions (this hasn't been done here because it significantly increases the length of the expression and reduces its processing efficiency).
- The dollar ($) symbol anchors the end of the string, so nothing may follow the domain extension.
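The caution above suggests replacing ([a-zA-Z]{2,6}) with a list of valid extensions. A minimal sketch of that idea follows, assuming a hypothetical, deliberately short list ($allowedExtensions); a real list would have to be kept in sync with the official set of top-level domains. Building the alternation from an array keeps the source readable, although the compiled pattern still grows with every entry.

```php
<?php
// Hypothetical, deliberately short whitelist for illustration only.
$allowedExtensions = ['com', 'net', 'org', 'edu', 'gov', 'info', 'uk', 'de'];

// Escape each entry for use inside the pattern and join them into com|net|org|...
$alternation = implode('|', array_map(function ($ext) {
    return preg_quote($ext, '/');
}, $allowedExtensions));

// Listing C's pattern with ([a-zA-Z]{2,6}) replaced by the whitelist.
// i: match the extension case-insensitively (COM, Org, ...).
// D: let $ match only at the very end of the string, not before a final newline.
$pattern = '/^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)*\.('
    . $alternation . ')$/iD';

foreach (['user@domain.com', 'user@Domain.ORG', 'user@mail.domain.co.uk',
          'user@domain.xyz', 'user@domain.community'] as $address) {
    echo $address, ' => ', preg_match($pattern, $address) ? 'valid' : 'invalid', PHP_EOL;
}
```

The $ anchor matters here: without it, the alternative com would match the start of .community and the address would be accepted.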
These are just two examples of regular expressions you can use to validate e-mail addresses. Many more variants exist, each with its own advantages and drawbacks. Remember that, given efficiency constraints, no pattern is completely foolproof, and so you should choose a pattern that has an appropriate combination of rigidity and performance for your needs. Happy coding!
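Whichever pattern you settle on, it can help to wrap it in a function that returns true or false instead of calling die() inline, so the same check can be reused and the pattern swapped in one place. The sketch below is one way to do that; the function name is_valid_email() and the constant EMAIL_PATTERN are hypothetical, and the default pattern is Listing C's with PCRE's D modifier added.

```php
<?php
// Hypothetical default: Listing C's pattern with the D modifier added.
const EMAIL_PATTERN = '/^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)*\.([a-zA-Z]{2,6})$/D';

/**
 * Hypothetical helper: returns true if $address is a string matching $pattern.
 */
function is_valid_email($address, $pattern = EMAIL_PATTERN)
{
    // A missing form field (null) or an array from a tampered name="e-mail[]" is rejected outright.
    if (!is_string($address)) {
        return false;
    }
    return preg_match($pattern, $address) === 1;
}

// Swapping in Listing B's looser pattern changes the verdict for first.last addresses.
$loose = '/^([a-zA-Z0-9])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+/';

var_dump(is_valid_email('first.last@domain.ext'));          // bool(true)
var_dump(is_valid_email('first.last@domain.ext', $loose));  // bool(false)
var_dump(is_valid_email(null));                             // bool(false)
```

In validate.php you could then write something like if (!is_valid_email($_POST['e-mail'] ?? null)) { ... }; the ?? operator requires PHP 7 or later.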
