
How to Find Valid Email Addresses in Java with Regular Expressions (RegEx) or with EmailValidator from Apache Commons?

BY MARKUS SPRUNCK

 
Validating email addresses can be a tricky task. On a Unix system you would usually run the regex with the grep program, but sometimes the job has to be done within a Java program.

Basics - Regular Expression for Email Addresses

The regular expression used in the example code: 


  [A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}

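is one of the simplest possible. In many cases this simple expression is good enough. Only upper-case letters are listed because the example code converts each line to upper case before matching. It consists of five parts: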

 
  [A-Z0-9._%+-]+ the first part of the mail address (before the @) may contain letters, digits, dots, underscores, percent, plus and minus signs.
  @ the @ character is mandatory.
  [A-Z0-9.-]+ the second part of the mail address may contain letters, digits, dots and hyphens (no underscores).
  \\. the dot is mandatory; the backslash is doubled because the expression is written as a Java string literal.
  [A-Z]{2,4} the top-level domain may contain letters only. Its length is limited to between 2 and 4 characters.
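
The effect of each part can be checked with a few sample inputs. The following is a minimal sketch and not part of the original example: the class name SimplePatternDemo, the method isSimpleAddress and the sample addresses are made up for illustration, and Pattern.CASE_INSENSITIVE is used so that the input does not have to be converted to upper case first.

```java
import java.util.regex.Pattern;

class SimplePatternDemo {

    // Same simple expression; CASE_INSENSITIVE lets [A-Z] match lower-case letters too
    static final Pattern SIMPLE = Pattern.compile(
            "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}", Pattern.CASE_INSENSITIVE);

    // True only if the whole input is an address according to the simple expression
    static boolean isSimpleAddress(String input) {
        return SIMPLE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "john.doe+news@example.com",  // all five parts present
            "john.doe@",                  // nothing after the @
            "@example.com",               // empty first part
            "john.doe@example",           // no dot and no top-level domain
            "john_doe@example.museum"     // top-level domain longer than 4 letters
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isSimpleAddress(sample));
        }
    }
}
```

Only the first sample is accepted. The last one shows the main limitation of the {2,4} quantifier: addresses with longer top-level domains such as .museum are rejected.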

To get a better impression of how regular expressions work, you may visit Regular Expressions - User Guide.

More Complex Regular Expression 

To dig deeper into the topic, you may visit a specialized page [1]. There you will find several expressions of varying complexity, like the following:

 
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|
"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")
@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\
[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|
[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

This is a very complex expression from [1]. It is a single expression; the line breaks are only there for display.

Java Example Code - Call Regular Expression

The class RegularExpression.java reads a file and tries to find all valid email addresses.

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class RegularExpression {
    public static void main(String[] args) throws IOException {

        // Simple expression to find a valid e-mail address in a file
        Pattern pattern = Pattern.compile("[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}");
        // Read file, find valid mail addresses and print result
        File file = new File("test.txt");
        int lines = 0;
        int matches = 0;
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                lines++;
                // The pattern lists upper-case letters only, so convert before matching
                Matcher matcher = pattern.matcher(line.toUpperCase());
                // matches() succeeds only if the whole line is an address
                if (matcher.matches()) {
                    System.out.println(lines + ": '" + line + "'");
                    matches++;
                }
            }
        }
        // output of summary
        if (matches == 0) {    
            System.out.println("No matches in " + lines + " lines");
        } else {
            System.out.println("\n" + matches + " matches in " + lines + " lines");
        }
    }
}   

With the following test.txt file:

markus.sprunck@
markus.sprunck@online.de
markus.sprunck@sampledomain.eu
markus.sprunck@online
@online.de

The expected output for this input file is:

2: 'markus.sprunck@online.de'
3: 'markus.sprunck@sampledomain.eu'
 
2 matches in 5 lines
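
Note that matches() accepts a line only if the whole line is an address. If the addresses are embedded in running text, find() can be used instead. The following is a minimal sketch under that assumption: the class name AddressFinder and the method findAddresses are illustrative, and the word boundaries (\b) are an addition to the simple expression, as suggested in [1].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressFinder {

    // \b keeps a match from starting or ending in the middle of a word,
    // so a long top-level domain is not silently cut off after 4 letters
    static final Pattern SIMPLE = Pattern.compile(
            "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b", Pattern.CASE_INSENSITIVE);

    // Returns every address found anywhere in the given text, in order of appearance
    static List<String> findAddresses(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = SIMPLE.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "Contact markus.sprunck@online.de or markus.sprunck@sampledomain.eu today.";
        // prints [markus.sprunck@online.de, markus.sprunck@sampledomain.eu]
        System.out.println(findAddresses(line));
    }
}
```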

If the correct encoding is important, the Scanner class can be used with an explicit encoding, as in the following code snippet:

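// filePath and encoding (e.g. "UTF-8") are supplied by the caller
try (Scanner scan = new Scanner(new File(filePath), encoding)) {
    // hasNextLine() also covers an empty file and the last line
    while (scan.hasNextLine()) {
        String line = scan.nextLine();
        // insert here the code for e-mail scan
    }
}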
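
Put together with the simple expression, a complete program might look like the following. This is a sketch under assumptions: the class name EncodingAwareScan and the method countAddressLines are made up for illustration, and test.txt is assumed to be stored as UTF-8.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Pattern;

class EncodingAwareScan {

    static final Pattern SIMPLE = Pattern.compile(
            "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}", Pattern.CASE_INSENSITIVE);

    // Counts the lines that consist of exactly one address
    static int countAddressLines(Scanner scan) {
        int matches = 0;
        while (scan.hasNextLine()) {
            String line = scan.nextLine();
            if (SIMPLE.matcher(line).matches()) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Assumption: test.txt is stored as UTF-8; pass the real encoding of your file
        try (Scanner scan = new Scanner(new File("test.txt"), "UTF-8")) {
            System.out.println(countAddressLines(scan) + " matches");
        }
    }
}
```

Passing the Scanner into the counting method keeps the file handling and the encoding choice in one place, and the method can be tried out on a Scanner built from a plain String.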

Java Example Code - Call EmailValidator from Apache Commons 

If you prefer not to use a RegEx, the class EmailValidator from Apache Commons is a good alternative. Just add the file commons-validator-1.4.0.jar to your project [2].

package com.sw_engineering_candies.main;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import org.apache.commons.validator.routines.EmailValidator;

public class ApacheCommonsEmailValidator {

	public static void main(final String[] args) throws IOException {

		// Get an instance
		final EmailValidator emailValidator = EmailValidator.getInstance();

		// Read file, find valid mail addresses and print result
		final File file = new File("test.txt");
		int lines = 0;
		int matches = 0;
		// try-with-resources closes the reader even if reading fails
		try (final BufferedReader in = new BufferedReader(new FileReader(file))) {
			for (String line = in.readLine(); line != null; line = in.readLine()) {
				lines++;
				final boolean valid = emailValidator.isValid(line);
				if (valid) {
					System.out.println(lines + ": '" + line + "'");
					matches++;
				}
			}
		}
		// output of summary
		if (matches == 0) {
			System.out.println("No matches in " + lines + " lines");
		} else {
			System.out.println("\n" + matches + " matches in " + lines + " lines");
		}
	}
}

With the following test.txt file:

markus.sprunck@
markus.sprunck@online.de
markus.sprunck@sampledomain.eu
markus.sprunck@online
@online.de

The expected output for this input file is:

2: 'markus.sprunck@online.de'
3: 'markus.sprunck@sampledomain.eu'
 
2 matches in 5 lines

References

[1] How to Find or Validate an Email Address, Regular-Expressions.info;
     http://www.regular-expressions.info/email.html

[2] Commons Validator;
     http://commons.apache.org/proper/commons-validator
