How do I convert a file system index to a memory index?
Author: Deron Eriksson
Description: This Java tutorial shows how to convert a Lucene file system index to an in-memory index.
Tutorial created using: Windows XP || JDK 1.5.0_09 || Eclipse Web Tools Platform 2.0 (Eclipse 3.3.0)


In certain situations, if you have enough memory, it can be useful to copy an index from the file system into memory. Searches against an in-memory index are faster, since they do not need to read from the hard drive.
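Whether you have "enough memory" can be estimated before copying anything. The following is a minimal sketch, not part of the original project: the IndexMemoryCheck class, its method names, and the 0.5 threshold are all assumptions made for illustration. It treats the on-disk size of the index files as a rough proxy for the memory the copy will need and compares that against the JVM's maximum heap size.

```java
import java.io.File;

// Hypothetical helper, not part of the tutorial's project.
final class IndexMemoryCheck {

    // Sums the sizes of the regular files directly inside indexDir.
    // Returns 0 if the directory does not exist or cannot be listed.
    static long directorySizeBytes(File indexDir) {
        File[] files = indexDir.listFiles();
        if (files == null) {
            return 0L;
        }
        long total = 0L;
        for (File file : files) {
            if (file.isFile()) {
                total += file.length();
            }
        }
        return total;
    }

    // Assumption: the index "fits" if it needs at most maxFraction of the heap limit.
    static boolean fitsInHeap(long indexBytes, long maxHeapBytes, double maxFraction) {
        return indexBytes <= (long) (maxHeapBytes * maxFraction);
    }

    public static void main(String[] args) {
        // "indexDirectory" is the index location used by the demo class.
        long indexBytes = directorySizeBytes(new File("indexDirectory"));
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Index size in bytes: " + indexBytes);
        System.out.println("Fits in half of the heap: " + fitsInHeap(indexBytes, maxHeapBytes, 0.5));
    }
}
```

The fraction is a judgment call. The rest of the application needs heap space too, so leaving generous headroom is the safer choice.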

This can be accomplished via the RAMDirectory(Directory directory) constructor of the RAMDirectory class, where we pass the FSDirectory to be copied as the parameter. The constructor makes a copy of the Directory it receives, so the new RAMDirectory is independent of the original: changes made to the original directory after the copy has taken place will not be reflected in the RAMDirectory.
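The "independent copy" behavior can be illustrated without Lucene. The sketch below is an analogy only, not how RAMDirectory is implemented: a Map from file name to bytes stands in for a directory, and the SnapshotCopyDemo class and the "segment-a"/"segment-b" names are hypothetical. It shows that once the contents have been copied, later changes to the source do not appear in the copy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for "copy a directory into memory"; not Lucene code.
final class SnapshotCopyDemo {

    // Copies every entry, cloning the byte arrays so the copy shares no data with the source.
    static Map<String, byte[]> copyOf(Map<String, byte[]> source) {
        Map<String, byte[]> copy = new HashMap<String, byte[]>();
        for (Map.Entry<String, byte[]> entry : source.entrySet()) {
            copy.put(entry.getKey(), entry.getValue().clone());
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, byte[]> onDisk = new HashMap<String, byte[]>();
        onDisk.put("segment-a", new byte[] {1, 2, 3});

        Map<String, byte[]> inMemory = copyOf(onDisk);

        // A change made after the copy is visible only in the original.
        onDisk.put("segment-b", new byte[] {4});

        System.out.println("Original entries: " + onDisk.size()); // prints 2
        System.out.println("Copied entries: " + inMemory.size()); // prints 1
    }
}
```

In practice this means that if the file system index is updated later, a fresh in-memory copy has to be created for searches to see the new documents.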

We will demonstrate this with the following project:

project structure

Two text files in the filesToIndex directory will be indexed. The first one, deron-foods.txt, lists some foods that I like.

deron-foods.txt

Here are some foods that Deron likes:
hamburger
french fries
steak
mushrooms
artichokes

The second text file, nicole-foods.txt, lists some foods that Nicole likes.

nicole-foods.txt

Here are some foods that Nicole likes:
apples
bananas
salad
mushrooms
cheese

The LuceneFileSystemToRamDemo class is our demonstration class. If we look at its main() method, we can see that it first creates a file system index via its createFileSystemIndex() method. This occurs using FSDirectory, which is obtained via:

fileSystemDirectory = FSDirectory.getDirectory(INDEX_DIRECTORY);

Following this, we make a copy of the file system index into memory via the aforementioned RAMDirectory constructor:

memoryDirectory = new RAMDirectory(fileSystemDirectory);

Next, we perform 5 searches against the file system index and measure the total time, and then we perform the same 5 searches against the memory index and measure the total time. I won't describe the details involved in the search process, since these are covered in another tutorial.

LuceneFileSystemToRamDemo.java

package avajava;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Date;
import java.util.Iterator;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hit;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.RAMDirectory;

public class LuceneFileSystemToRamDemo {

	public static final String FILES_TO_INDEX_DIRECTORY = "filesToIndex";
	public static final String INDEX_DIRECTORY = "indexDirectory";

	public static final String FIELD_PATH = "path";
	public static final String FIELD_CONTENTS = "contents";

	public static Directory fileSystemDirectory = null;
	public static Directory memoryDirectory = null;

	public static void main(String[] args) throws Exception {
		createFileSystemIndex();
		memoryDirectory = new RAMDirectory(fileSystemDirectory);
		doSearches(fileSystemDirectory);
		doSearches(memoryDirectory);
	}

	public static void doSearches(Directory directory) throws IOException, ParseException {
		long start = new Date().getTime();
		searchIndex(directory, "mushrooms");
		searchIndex(directory, "steak");
		searchIndex(directory, "steak AND cheese");
		searchIndex(directory, "steak and cheese");
		searchIndex(directory, "bacon OR cheese");
		long end = new Date().getTime();
		System.out.println("TOTAL SEARCH TIME (using " + directory.getClass().getSimpleName() + ") in milliseconds:"
				+ (end - start));
	}

	public static void createFileSystemIndex() throws CorruptIndexException, LockObtainFailedException, IOException {
		Analyzer analyzer = new StandardAnalyzer();
		boolean recreateIndexIfExists = true;
		fileSystemDirectory = FSDirectory.getDirectory(INDEX_DIRECTORY);
		IndexWriter indexWriter = new IndexWriter(fileSystemDirectory, analyzer, recreateIndexIfExists);

		File dir = new File(FILES_TO_INDEX_DIRECTORY);
		File[] files = dir.listFiles();
		for (File file : files) {
			Document document = new Document();

			String path = file.getCanonicalPath();
			document.add(new Field(FIELD_PATH, path, Field.Store.YES, Field.Index.UN_TOKENIZED));

			Reader reader = new FileReader(file);
			document.add(new Field(FIELD_CONTENTS, reader));

			indexWriter.addDocument(document);
		}
		indexWriter.optimize();
		indexWriter.close();
	}

	public static void searchIndex(Directory directory, String searchString) throws IOException, ParseException {
		System.out.println("Searching for '" + searchString + "'");
		IndexSearcher indexSearcher = new IndexSearcher(directory);

		Analyzer analyzer = new StandardAnalyzer();
		QueryParser queryParser = new QueryParser(FIELD_CONTENTS, analyzer);
		Query query = queryParser.parse(searchString);
		Hits hits = indexSearcher.search(query);
		System.out.println("Number of hits: " + hits.length());

		Iterator<Hit> it = hits.iterator();
		while (it.hasNext()) {
			Hit hit = it.next();
			Document document = hit.getDocument();
			String path = document.get(FIELD_PATH);
			System.out.println("Hit: " + path);
		}
		// Release the resources held by the searcher once we are done with it.
		indexSearcher.close();
	}

}

Let's look at the console output from the execution of LuceneFileSystemToRamDemo.

Console Output

Searching for 'mushrooms'
Number of hits: 2
Hit: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Searching for 'steak'
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Searching for 'steak AND cheese'
Number of hits: 0
Searching for 'steak and cheese'
Number of hits: 2
Hit: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Searching for 'bacon OR cheese'
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt
TOTAL SEARCH TIME (using FSDirectory) in milliseconds:125

Searching for 'mushrooms'
Number of hits: 2
Hit: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Searching for 'steak'
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Searching for 'steak AND cheese'
Number of hits: 0
Searching for 'steak and cheese'
Number of hits: 2
Hit: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Searching for 'bacon OR cheese'
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt
TOTAL SEARCH TIME (using RAMDirectory) in milliseconds:16

In the console output, notice that the 5 searches against the file system index took 125 milliseconds, while the same 5 searches against the memory index took 16 milliseconds. This is what we would expect: the file system index searches required hard disk I/O operations, while the memory index searches ran against data already held in memory, so no disk I/O was required.
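Keep in mind that these numbers come from a single run in which the file system searches ran first, so that batch probably also absorbed one-time costs such as class loading. A slightly fairer measurement runs each batch once untimed before timing it. The following is a minimal sketch of that idea, assuming a hypothetical SearchTimer helper that is not part of the original project. The workload in main() is a placeholder, not a real search.

```java
// Hypothetical timing helper, not part of the tutorial's project.
final class SearchTimer {

    // Runs the task once untimed as a warm-up, then times `repetitions` further runs.
    // Returns the elapsed time of the timed runs in milliseconds.
    static long timeMillis(Runnable task, int repetitions) {
        task.run();
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        // Placeholder workload. In the demo this would wrap a call to
        // doSearches(directory), with its checked exceptions rethrown unchecked.
        Runnable placeholderSearch = new Runnable() {
            public void run() {
                StringBuilder builder = new StringBuilder();
                for (int i = 0; i < 10000; i++) {
                    builder.append(i);
                }
            }
        };
        long elapsed = timeMillis(placeholderSearch, 5);
        System.out.println("Elapsed milliseconds for 5 timed runs: " + elapsed);
    }
}
```

System.nanoTime() is used here instead of new Date().getTime() because it is intended for measuring elapsed time. With an index this small, the absolute numbers will remain noisy either way.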