How do I generate an MD5 digest for a web page?
Author: Deron Eriksson
Description: This Java tutorial describes how to generate an MD5 digest for a URL.
Tutorial created using: Windows XP || JDK 1.5.0_09 || Eclipse Web Tools Platform 2.0 (Eclipse 3.3.0)


The MessageDigest class makes it easy to generate a digest from the bytes read from an InputStream. Since the URL class has an openStream() method that returns an InputStream, we can feed the content of a URL to a MessageDigest to generate a digest of that content. The MessageDigestForUrl class generates an MD5 digest for http://www.google.com.

MessageDigestForUrl.java

package test;

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

import org.apache.commons.codec.binary.Hex;

public class MessageDigestForUrl {

	public static void main(String[] args) throws NoSuchAlgorithmException, IOException {

		URL url = new URL("http://www.google.com");
		MessageDigest md = MessageDigest.getInstance("MD5");
		InputStream is = url.openStream();
		String digest;
		try {
			digest = getDigest(is, md, 2048);
		} finally {
			// Always release the connection, even if reading fails.
			is.close();
		}

		System.out.println("MD5 Digest:" + digest);

	}

	// Reads the stream to its end in chunks of up to byteArraySize bytes
	// and returns the resulting digest as a hex string.
	public static String getDigest(InputStream is, MessageDigest md, int byteArraySize)
			throws IOException {

		md.reset();
		byte[] bytes = new byte[byteArraySize];
		int numBytes;
		while ((numBytes = is.read(bytes)) != -1) {
			// Feed only the bytes actually read on this pass.
			md.update(bytes, 0, numBytes);
		}
		byte[] digest = md.digest();
		return new String(Hex.encodeHex(digest));
	}

}
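
The listing above relies on Apache Commons Codec for the hex conversion. As a rough sketch, assuming you would rather depend only on the JDK (Java 7 or later, for try-with-resources), a similar digest can be produced with DigestInputStream, which updates the MessageDigest as the stream is read. The class name JdkOnlyDigest and its method names are made up for this illustration, and the demo hashes an in-memory string so that it does not need a network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the original tutorial.
public class JdkOnlyDigest {

	// Reads the stream to its end, closes it, and returns the MD5 digest as lowercase hex.
	public static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
		MessageDigest md = MessageDigest.getInstance("MD5");
		try (DigestInputStream dis = new DigestInputStream(in, md)) {
			byte[] buffer = new byte[2048];
			while (dis.read(buffer) != -1) {
				// Nothing to do here: DigestInputStream updates md on every read.
			}
		}
		return toHex(md.digest());
	}

	// Converts each byte to two lowercase hex characters.
	public static String toHex(byte[] bytes) {
		StringBuilder sb = new StringBuilder(bytes.length * 2);
		for (byte b : bytes) {
			sb.append(String.format("%02x", b & 0xff));
		}
		return sb.toString();
	}

	public static void main(String[] args) throws Exception {
		// In-memory input keeps the demo independent of the network.
		InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
		System.out.println("MD5 Digest:" + md5Hex(in));
	}
}
```

To hash a web page instead, pass the stream returned by url.openStream() to md5Hex; the try-with-resources block closes it once the content has been read.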

If I execute MessageDigestForUrl right now, I get the following result:

MD5 Digest:4197db86818b67b66903ac62cd2bd04b

If I execute it again a moment later, I get the following:

MD5 Digest:bcd1e22ac22842d3a7465584bca36a82

As you can see, the MD5 digest is different, indicating that the web page source code changed between the two requests! If you compare the www.google.com source code from the two requests, you can see that it has indeed changed in a couple of places. Most likely, these changes are a timestamp or a unique identifier, possibly so that Google can gather statistical information to improve its searches.
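
To illustrate why a change as small as an embedded timestamp produces a different digest, here is a minimal sketch that hashes two in-memory strings differing in a single character. The class name DigestSensitivity and the sample HTML strings are invented for this illustration; they are not Google's actual page source.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class, not part of the original tutorial.
public class DigestSensitivity {

	// Returns the raw 16-byte MD5 digest of the text encoded as UTF-8.
	public static byte[] md5(String text) throws NoSuchAlgorithmException {
		return MessageDigest.getInstance("MD5").digest(text.getBytes(StandardCharsets.UTF_8));
	}

	public static void main(String[] args) throws NoSuchAlgorithmException {
		// Made-up page sources that differ only in the last digit of the timestamp.
		String first = "<html><!-- served at 10:00:01 --><body>Search</body></html>";
		String second = "<html><!-- served at 10:00:02 --><body>Search</body></html>";

		System.out.println("Identical content, equal digests: "
				+ MessageDigest.isEqual(md5(first), md5(first)));
		System.out.println("One character differs, equal digests: "
				+ MessageDigest.isEqual(md5(first), md5(second)));
	}
}
```

The first comparison reports true and the second reports false: the digest depends on every byte of the input, so it shows that something changed but not what or how much.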

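MD5 digests can be used to determine whether a web page has changed, which can be useful if you manage a lot of web pages. Instead of storing the full source code of every page to look for changes, you can store just its MD5 digest (16 bytes, or 32 hex characters), which will be different if the source code for that page changes. Keep in mind that a page that embeds per-request values, as www.google.com does above, will appear changed on every request.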
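
The bookkeeping described above might look something like the following sketch. It assumes the page content has already been downloaded into a byte array and keeps the last digest for each page in an in-memory map; a real setup would presumably persist the digests somewhere. The class name PageChangeTracker, its methods, and the example.com address are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the original tutorial.
public class PageChangeTracker {

	// Last recorded digest (hex) for each page address.
	private final Map<String, String> lastDigests = new HashMap<String, String>();

	// Records the digest of the given content and reports whether it differs
	// from the digest recorded for the same page on the previous call.
	// The first sighting of a page returns false: there is nothing to compare against.
	public boolean hasChanged(String pageUrl, byte[] content) throws NoSuchAlgorithmException {
		String current = md5Hex(content);
		String previous = lastDigests.put(pageUrl, current);
		return previous != null && !previous.equals(current);
	}

	// Returns the MD5 digest of the content as lowercase hex.
	public static String md5Hex(byte[] content) throws NoSuchAlgorithmException {
		byte[] digest = MessageDigest.getInstance("MD5").digest(content);
		StringBuilder sb = new StringBuilder(digest.length * 2);
		for (byte b : digest) {
			sb.append(String.format("%02x", b & 0xff));
		}
		return sb.toString();
	}

	public static void main(String[] args) throws NoSuchAlgorithmException {
		PageChangeTracker tracker = new PageChangeTracker();
		// Made-up page content standing in for downloaded source code.
		byte[] v1 = "<html>version 1</html>".getBytes(StandardCharsets.UTF_8);
		byte[] v2 = "<html>version 2</html>".getBytes(StandardCharsets.UTF_8);

		System.out.println(tracker.hasChanged("http://example.com/", v1)); // first sighting
		System.out.println(tracker.hasChanged("http://example.com/", v1)); // same content
		System.out.println(tracker.hasChanged("http://example.com/", v2)); // edited content
	}
}
```

Only the 32-character digest is kept per page, so the map stays small no matter how large the pages are.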