How do I search an index for a term?
Author: Deron Eriksson
Description: This Java tutorial shows how to query an index for a particular term using the TermQuery class of Lucene.
Tutorial created using: Windows XP || JDK 1.5.0_09 || Eclipse Web Tools Platform 2.0 (Eclipse 3.3.0)


With Lucene, you can use the TermQuery class to search for a particular word that has been indexed. This tutorial compares TermQuery searches with QueryParser searches and shows some of the nuances involved with a term query.

This example will utilize the following project structure. The "indexDirectory" directory contains an index that gets created (and searched) by the LuceneTermQueryDemo class.

(Figure: project structure)

Two text files in the "filesToIndex" directory will be indexed. The first one, deron-foods.txt, lists some foods that I like.

deron-foods.txt

Here are some foods that Deron likes:
hamburger
french fries
steak
mushrooms
artichokes

The second text file, nicole-foods.txt, lists some foods that Nicole likes.

nicole-foods.txt

Here are some foods that Nicole likes:
apples
bananas
salad
mushrooms
cheese

If we look at the main() method of LuceneTermQueryDemo, we can see that it first creates an index and then performs 6 searches. We won't cover index creation here, but we will look at the 6 searches that are performed against the index.

LuceneTermQueryDemo.java

package avajava;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hit;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;

public class LuceneTermQueryDemo {

	public static final String FILES_TO_INDEX_DIRECTORY = "filesToIndex";
	public static final String INDEX_DIRECTORY = "indexDirectory";

	public static final String FIELD_PATH = "path";
	public static final String FIELD_CONTENTS = "contents";

	public static void main(String[] args) throws Exception {

		createIndex();
		searchIndex("deron");
		searchIndexWithTermQuery("deron");
		searchIndex("Deron");
		searchIndexWithTermQuery("Deron");
		searchIndex("french fries");
		searchIndexWithTermQuery("french fries");
	}

	public static void createIndex() throws CorruptIndexException, LockObtainFailedException, IOException {
		Analyzer analyzer = new StandardAnalyzer();
		boolean recreateIndexIfExists = true;
		IndexWriter indexWriter = new IndexWriter(INDEX_DIRECTORY, analyzer, recreateIndexIfExists);
		File dir = new File(FILES_TO_INDEX_DIRECTORY);
		File[] files = dir.listFiles();
		for (File file : files) {
			Document document = new Document();

			String path = file.getCanonicalPath();
			document.add(new Field(FIELD_PATH, path, Field.Store.YES, Field.Index.UN_TOKENIZED));

			Reader reader = new FileReader(file);
			document.add(new Field(FIELD_CONTENTS, reader));

			indexWriter.addDocument(document);
		}
		indexWriter.optimize();
		indexWriter.close();
	}

	public static void searchIndex(String searchString) throws IOException, ParseException {
		System.out.println("\nSearching for '" + searchString + "' using QueryParser");
		Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
		IndexSearcher indexSearcher = new IndexSearcher(directory);

		QueryParser queryParser = new QueryParser(FIELD_CONTENTS, new StandardAnalyzer());
		Query query = queryParser.parse(searchString);
		System.out.println("Type of query: " + query.getClass().getSimpleName());
		Hits hits = indexSearcher.search(query);
		displayHits(hits);
		// Hits loads documents lazily, so close the searcher only after the hits are displayed.
		indexSearcher.close();
	}

	public static void searchIndexWithTermQuery(String searchString) throws IOException {
		System.out.println("\nSearching for '" + searchString + "' using TermQuery");
		Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
		IndexSearcher indexSearcher = new IndexSearcher(directory);

		// The search string is used as-is: no analyzer touches it here.
		Term term = new Term(FIELD_CONTENTS, searchString);
		Query query = new TermQuery(term);
		Hits hits = indexSearcher.search(query);
		displayHits(hits);
		// Hits loads documents lazily, so close the searcher only after the hits are displayed.
		indexSearcher.close();
	}

	public static void displayHits(Hits hits) throws CorruptIndexException, IOException {
		System.out.println("Number of hits: " + hits.length());

		Iterator<Hit> it = hits.iterator();
		while (it.hasNext()) {
			Hit hit = it.next();
			Document document = hit.getDocument();
			String path = document.get(FIELD_PATH);
			System.out.println("Hit: " + path);
		}
	}
}

First, let's look at the searchIndex() method, which performs a search using QueryParser. QueryParser is useful because it can take user input and convert it into an appropriate search query. The following code shows how QueryParser is used:

	Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
	IndexSearcher indexSearcher = new IndexSearcher(directory);
	QueryParser queryParser = new QueryParser(FIELD_CONTENTS, new StandardAnalyzer());
	Query query = queryParser.parse(searchString);
	Hits hits = indexSearcher.search(query);

We create a QueryParser object, specifying the field that we'd like to search and an analyzer. In this case, we use a StandardAnalyzer, which, among other things, tokenizes the search string and converts its words to lower case. Calling parse() on the QueryParser object with the search string as a parameter returns a Query object. This object can be an instance of one of several different Query classes, as we shall see soon.
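To make the analyzer's role more concrete, here is a minimal plain-Java sketch of the two steps mentioned above, tokenizing and lowercasing. It is a simplified stand-in and not Lucene's StandardAnalyzer, which applies more rules than this. The class name AnalysisSketch and the method analyze() are hypothetical names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for an analyzer. This is NOT Lucene's StandardAnalyzer;
// the class and method names are made up for illustration.
class AnalysisSketch {

    // Splits the text on anything that is not a letter or digit,
    // then lower-cases each remaining token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Deron"));        // prints [deron]
        System.out.println(analyze("french fries")); // prints [french, fries]
    }
}
```

In this sketch a single word yields one token and a two-word string yields two, which hints at why a parser that analyzes its input may produce different kinds of queries for different search strings.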

Now, let's look at the searchIndexWithTermQuery() method, which performs a search using a TermQuery. The TermQuery constructor takes a Term object. A Term object consists of a field name and the text of a term in that field. In general, user input is much easier to query with QueryParser than with TermQuery, since QueryParser can handle a variety of user inputs without further code. A term query, on the other hand, is designed to take a single term and search for it in a field. Here is the code in searchIndexWithTermQuery() that searches the index using the term query:

	Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
	IndexSearcher indexSearcher = new IndexSearcher(directory);
	Term term = new Term(FIELD_CONTENTS, searchString);
	Query query = new TermQuery(term);
	Hits hits = indexSearcher.search(query);
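The idea of a term as "a field plus an exact piece of text" can be modelled with a toy in-memory index. The following plain-Java sketch is an assumption-laden illustration and not how Lucene is implemented: the class TermLookupSketch and its methods index() and lookup() are hypothetical, and the indexing step simply lower-cases and splits the text instead of running a real analyzer. The point it shows is that the lookup compares the search text to the stored terms exactly, with no processing of the search text.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy single-process model of term lookup. NOT Lucene's implementation;
// all names here are hypothetical.
class TermLookupSketch {

    // "field:termText" -> names of the documents that contain that term
    private final Map<String, Set<String>> postings = new HashMap<String, Set<String>>();

    // Index time: lower-case the contents and store one entry per word.
    void index(String docName, String field, String contents) {
        for (String token : contents.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            String key = field + ":" + token;
            Set<String> docs = postings.get(key);
            if (docs == null) {
                docs = new TreeSet<String>();
                postings.put(key, docs);
            }
            docs.add(docName);
        }
    }

    // Search time: exact match on field and term text, no lowercasing or splitting.
    Set<String> lookup(String field, String termText) {
        Set<String> docs = postings.get(field + ":" + termText);
        return docs == null ? Collections.<String>emptySet() : docs;
    }

    public static void main(String[] args) {
        TermLookupSketch sketch = new TermLookupSketch();
        sketch.index("deron-foods.txt", "contents",
                "Here are some foods that Deron likes: hamburger french fries steak mushrooms artichokes");
        sketch.index("nicole-foods.txt", "contents",
                "Here are some foods that Nicole likes: apples bananas salad mushrooms cheese");

        System.out.println(sketch.lookup("contents", "mushrooms"));    // prints [deron-foods.txt, nicole-foods.txt]
        System.out.println(sketch.lookup("contents", "deron"));        // prints [deron-foods.txt]
        System.out.println(sketch.lookup("contents", "Deron"));        // prints []
        System.out.println(sketch.lookup("contents", "french fries")); // prints []
    }
}
```

In this toy model, a search text that differs in case from the stored term, or that spans two words, matches nothing. Whether a real index behaves the same way depends on the analyzer that was used when the index was built.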

In our main() method, we can see that 6 searches are performed. First we query for "deron" using a QueryParser and then a TermQuery. Next, we query for "Deron" with a QueryParser and then a TermQuery. Finally, we query for "french fries" with the QueryParser and then a TermQuery.

	searchIndex("deron");
	searchIndexWithTermQuery("deron");
	searchIndex("Deron");
	searchIndexWithTermQuery("Deron");
	searchIndex("french fries");
	searchIndexWithTermQuery("french fries");

(Continued on page 2)
