Should I update my MessageDigest with byte arrays or one byte at a time?
Author: Deron Eriksson
Description: This Java tutorial compares the speeds of updates to MessageDigests.
Tutorial created using: Windows XP || JDK 1.5.0_09 || Eclipse Web Tools Platform 2.0 (Eclipse 3.3.0)


Performance is much better when you update a MessageDigest from an InputStream by reading and updating byte arrays rather than reading and updating one byte at a time. The MessageDigestTest class demonstrates this by reading a file through a FileInputStream into byte arrays of several different sizes and updating a MessageDigest with each array. For comparison, it also reads the file one byte at a time and updates the MessageDigest with each byte. It uses the Apache Commons Codec library to output the digest results in hex.
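The two update patterns can be sketched with the JDK alone. This minimal example is not part of the tutorial's code: the class and method names are hypothetical, it assumes Java 7 or later (for StandardCharsets), and it does its own hex encoding instead of using Commons Codec. It digests the same in-memory bytes both ways to show that the result is identical; only the number of read() and update() calls differs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestUpdateDemo {

    // Reads the stream in chunks of up to bufferSize bytes and passes each chunk to update().
    public static String digestViaArray(InputStream in, int bufferSize)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[bufferSize];
        int n;
        while ((n = in.read(buffer)) != -1) {
            md.update(buffer, 0, n); // only the n bytes actually read
        }
        return toHex(md.digest());
    }

    // Reads one byte per read() call and passes it to update(byte).
    public static String digestOneByteAtATime(InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        int b;
        while ((b = in.read()) != -1) {
            md.update((byte) b);
        }
        return toHex(md.digest());
    }

    // JDK-only hex encoding, standing in for Commons Codec's Hex.encodeHex.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte x : bytes) {
            sb.append(String.format("%02x", x & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(digestViaArray(new ByteArrayInputStream(data), 2));
        System.out.println(digestOneByteAtATime(new ByteArrayInputStream(data)));
    }
}
```

Both lines print 900150983cd24fb0d6963f7d28e17f72, the MD5 digest of "abc" listed in the RFC 1321 test suite. A 2-byte buffer is used here only so that the array version needs more than one read().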

The MessageDigestTest class generates an MD5 digest of the httpd-2.2.6-win32-src-r2.zip file, which I downloaded from Apache's web site. This file is about 9 megabytes in size.

MessageDigestTest.java

package test;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;

import org.apache.commons.codec.binary.Hex;

public class MessageDigestTest {

	public static void main(String[] args) {

		try {
			String file = "httpd-2.2.6-win32-src-r2.zip";
			MessageDigest md = MessageDigest.getInstance("MD5");

			// Digest the file using byte arrays of increasing size.
			// Each pass opens a fresh stream and closes it when done.
			int[] arraySizes = { 128, 256, 512, 1024, 2048, 4096, 8192 };
			for (int arraySize : arraySizes) {
				InputStream is = new FileInputStream(file);
				try {
					getDigestViaByteArray(is, md, arraySize);
				} finally {
					is.close();
				}
			}

			// Digest the same file by reading/updating one byte at a time.
			InputStream is = new FileInputStream(file);
			try {
				getDigestViaOneByteAtATime(is, md);
			} finally {
				is.close();
			}

		} catch (Exception e) {
			e.printStackTrace();
		}

	}

	// Reads the stream into a byte array of arraySize bytes and updates the digest
	// with however many bytes each read() returned.
	public static String getDigestViaByteArray(InputStream is, MessageDigest md, int arraySize)
			throws IOException {
		// System.nanoTime() (Java 5+) is intended for measuring elapsed time
		long start = System.nanoTime();

		md.reset();
		byte[] bytes = new byte[arraySize];
		int numBytes;
		while ((numBytes = is.read(bytes)) != -1) {
			// read() may fill only part of the array, so update with numBytes bytes only
			md.update(bytes, 0, numBytes);
		}
		byte[] digest = md.digest();
		String result = new String(Hex.encodeHex(digest));

		long elapsedMillis = (System.nanoTime() - start) / 1000000L;

		System.out.println("MD5 Digest:" + result);
		System.out.print("Using byte array (size " + arraySize + "): ");
		System.out.println(elapsedMillis + " milliseconds\n");

		return result;
	}

	// Reads the stream and updates the digest one byte per call.
	public static String getDigestViaOneByteAtATime(InputStream is, MessageDigest md) throws IOException {
		long start = System.nanoTime();

		md.reset();
		int oneByte;
		while ((oneByte = is.read()) != -1) {
			md.update((byte) oneByte);
		}

		byte[] digest = md.digest();
		String result = new String(Hex.encodeHex(digest));

		long elapsedMillis = (System.nanoTime() - start) / 1000000L;

		System.out.println("MD5 Digest:" + result);
		System.out.print("One byte at a time: ");
		System.out.println(elapsedMillis + " milliseconds\n");

		return result;
	}

}

For each byte array size (128, 256, 512, 1024, 2048, 4096, and 8192 bytes), MessageDigestTest prints the digest and the time it took to read the file, update the MessageDigest, and compute the digest. It then prints the same information for reading/updating one byte at a time.

The console output from executing MessageDigestTest is shown here:

MD5 Digest:301d61853fc9ce94bbfb55b56c218d06
Using byte array (size 128): 359 milliseconds

MD5 Digest:301d61853fc9ce94bbfb55b56c218d06
Using byte array (size 256): 219 milliseconds

MD5 Digest:301d61853fc9ce94bbfb55b56c218d06
Using byte array (size 512): 188 milliseconds

MD5 Digest:301d61853fc9ce94bbfb55b56c218d06
Using byte array (size 1024): 171 milliseconds

MD5 Digest:301d61853fc9ce94bbfb55b56c218d06
Using byte array (size 2048): 157 milliseconds

MD5 Digest:301d61853fc9ce94bbfb55b56c218d06
Using byte array (size 4096): 140 milliseconds

MD5 Digest:301d61853fc9ce94bbfb55b56c218d06
Using byte array (size 8192): 141 milliseconds

MD5 Digest:301d61853fc9ce94bbfb55b56c218d06
One byte at a time: 18687 milliseconds

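Executing MessageDigestTest multiple times produces results similar to those above. As you can see, reading/updating one byte at a time took more than 100 times longer than reading/updating with a byte array of 1024 bytes or more (18687 milliseconds versus 140 to 171 milliseconds)! Part of that cost comes from the reading rather than the updating: FileInputStream does no buffering of its own, so each one-byte read() call goes to the underlying file, on top of one update() call per byte. Based on these results, on my machine, using byte arrays of around 2048 bytes or larger is a good idea; going beyond 4096 bytes brought no further improvement.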
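To separate the cost of unbuffered reads from the cost of per-byte update() calls, the one-byte loop can be run over a BufferedInputStream. The sketch below is an illustration rather than part of the tutorial: the class name BufferedOneByteDigest is hypothetical, it assumes Java 7 or later for try-with-resources, and it uses its own hex encoding instead of Commons Codec. It still calls update() once per byte, but most read() calls are served from the wrapper's in-memory buffer (8192 bytes by default).

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BufferedOneByteDigest {

    // Same loop as getDigestViaOneByteAtATime, but reading through a BufferedInputStream.
    // The caller remains responsible for closing raw.
    public static String digestOneByteBuffered(InputStream raw, MessageDigest md) throws IOException {
        md.reset();
        InputStream in = new BufferedInputStream(raw);
        int oneByte;
        while ((oneByte = in.read()) != -1) {
            md.update((byte) oneByte); // still one update() call per byte
        }
        return toHex(md.digest());
    }

    // JDK-only hex encoding, standing in for Commons Codec's Hex.encodeHex.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException, NoSuchAlgorithmException {
        // Defaults to the tutorial's file; pass another path as the first argument.
        String file = args.length > 0 ? args[0] : "httpd-2.2.6-win32-src-r2.zip";
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream fis = new FileInputStream(file)) {
            long start = System.nanoTime();
            String result = digestOneByteBuffered(fis, md);
            long elapsedMillis = (System.nanoTime() - start) / 1000000L;
            System.out.println("MD5 Digest:" + result);
            System.out.println("Buffered, one byte at a time: " + elapsedMillis + " milliseconds");
        }
    }
}
```

Running it alongside MessageDigestTest on the same file gives a rough idea of how much of the gap is due to unbuffered reads and how much remains from calling update() once per byte.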

For more accurate results, you could put the code in a loop and run it many times to compute average times for each size.
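One way to set up such a loop is sketched below, assuming Java 7 or later. The class name DigestTimingLoop and the choice of three untimed warm-up passes and ten timed passes are arbitrary and only for illustration. The warm-up passes are discarded so that JIT compilation and a file that is not yet in the operating system's cache are less likely to skew whichever size happens to be measured first.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestTimingLoop {

    // Digests the whole stream using reads of up to arraySize bytes.
    public static byte[] digest(InputStream in, MessageDigest md, int arraySize) throws IOException {
        md.reset();
        byte[] buf = new byte[arraySize];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        return md.digest();
    }

    // Average elapsed milliseconds over `runs` timed passes, after `warmups` untimed passes.
    public static double averageMillis(String file, int arraySize, int warmups, int runs)
            throws IOException, NoSuchAlgorithmException {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        MessageDigest md = MessageDigest.getInstance("MD5");
        long totalNanos = 0;
        for (int i = 0; i < warmups + runs; i++) {
            try (InputStream in = new FileInputStream(file)) {
                long start = System.nanoTime();
                digest(in, md, arraySize);
                long elapsed = System.nanoTime() - start;
                if (i >= warmups) { // only count the timed passes
                    totalNanos += elapsed;
                }
            }
        }
        return totalNanos / (runs * 1e6);
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the tutorial's file; pass another path as the first argument.
        String file = args.length > 0 ? args[0] : "httpd-2.2.6-win32-src-r2.zip";
        int[] sizes = {128, 256, 512, 1024, 2048, 4096, 8192};
        for (int size : sizes) {
            double avg = averageMillis(file, size, 3, 10);
            System.out.printf("Using byte array (size %d): %.1f milliseconds (average of 10 runs)%n", size, avg);
        }
    }
}
```

Timing only the digest loop (stream opening and closing happen outside the measured region) keeps the comparison focused on the read/update pattern.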