How do I perform a range query?
Author: Deron Eriksson
Description: This Java tutorial shows how to query a Lucene index for a range of values using RangeQuery and QueryParser.
Tutorial created using: Windows XP || JDK 1.5.0_09 || Eclipse Web Tools Platform 2.0 (Eclipse 3.3.0)


In this tutorial, we'll see that with Lucene, it's possible to perform queries across a range of values using the RangeQuery and ConstantScoreRangeQuery classes. It's also possible to query ranges using QueryParser, as we shall see.

This tutorial will utilize a project with the following structure. The "filesToIndex" directory contains two text files that will be indexed, and the "indexDirectory" will contain a file system index that we will create based on the text files.

project structure

Two text files in the "filesToIndex" directory will be indexed. The first one, deron-foods.txt, lists some foods that I like.

deron-foods.txt

Here are some foods that Deron likes:
hamburger
french fries
steak
mushrooms
artichokes

The second text file, nicole-foods.txt, lists some foods that Nicole likes.

nicole-foods.txt

Here are some foods that Nicole likes:
apples
bananas
salad
mushrooms
cheese

Now, on to our demonstration Java class. The LuceneRangeQueryDemo class first creates an index based on the files in "filesToIndex". Each document in this index has fields for the file's canonical path, its last modified date, and its contents. I will mostly skip over index creation, since it is covered in other tutorials, but notice that the FIELD_LAST_MODIFIED field stores the file's last modified date using the pattern "yyyy-MM-dd-HH-mm-ss". All values in the index need to be stored as Strings, and range queries compare them in lexicographic (character-by-character) order. It therefore makes sense to put the most general time value on the left (the year) and work down to more and more specific values (ending with seconds), with every part zero-padded to a fixed width. With this pattern, String order matches chronological order, which lets us query ranges of date values on the FIELD_LAST_MODIFIED field. The dashes between the parts of the date/time pattern are only a visual aid.
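
As a quick check of that ordering claim, here is a minimal sketch that uses only the JDK (no Lucene). The class name SortableTimestampSketch, the format() helper, and the fixed UTC time zone are assumptions made for this illustration; the demo class below uses the machine's default time zone instead.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class SortableTimestampSketch {

	// Formats a timestamp with the same pattern the tutorial uses.
	// The fixed UTC zone is an assumption of this sketch, so that the
	// result does not depend on the machine's default time zone.
	static String format(long epochMillis) {
		SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss");
		sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
		return sdf.format(new Date(epochMillis));
	}

	public static void main(String[] args) {
		String earlier = format(0L);
		String later = format(36000L); // 36 seconds after the first value

		System.out.println(earlier);
		System.out.println(later);

		// A negative result means "earlier" sorts before "later" as a String,
		// which is the same order the two timestamps have in time.
		System.out.println(earlier.compareTo(later) < 0);
	}
}
```

Because every part of the pattern has a fixed width, comparing the Strings character by character gives the same answer as comparing the underlying times.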

LuceneRangeQueryDemo.java

package avajava;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Iterator;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hit;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RangeQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;

public class LuceneRangeQueryDemo {

	public static final String FILES_TO_INDEX_DIRECTORY = "filesToIndex";
	public static final String INDEX_DIRECTORY = "indexDirectory";

	public static final String FIELD_PATH = "path";
	public static final String FIELD_CONTENTS = "contents";
	public static final String FIELD_LAST_MODIFIED = "lastModified";

	public static final boolean INCLUSIVE = true;
	public static final boolean EXCLUSIVE = false;

	public static void main(String[] args) throws Exception {

		createIndex();

		searchIndexWithRangeQuery(FIELD_LAST_MODIFIED, "2008-07-10-00-00-00", "2008-07-10-23-59-59", INCLUSIVE);
		searchIndexWithRangeQuery(FIELD_LAST_MODIFIED, "2008-07-11-00-00-00", "2008-07-11-23-59-59", INCLUSIVE);
		searchIndexWithRangeQuery(FIELD_LAST_MODIFIED, "2008-07-10-00-00-00", "2008-07-10-21-21-02", INCLUSIVE);
		searchIndexWithRangeQuery(FIELD_LAST_MODIFIED, "2008-07-10-00-00-00", "2008-07-10-21-21-02", EXCLUSIVE);

		// equivalent range searches using QueryParser
		searchIndexWithQueryParser(FIELD_LAST_MODIFIED, "[2008-07-10-00-00-00 TO 2008-07-10-23-59-59]");
		searchIndexWithQueryParser(FIELD_LAST_MODIFIED, "[2008-07-11-00-00-00 TO 2008-07-11-23-59-59]");
		searchIndexWithQueryParser(FIELD_LAST_MODIFIED, "[2008-07-10-00-00-00 TO 2008-07-10-21-21-02]");
		searchIndexWithQueryParser(FIELD_LAST_MODIFIED, "{2008-07-10-00-00-00 TO 2008-07-10-21-21-02}");

	}

	public static void createIndex() throws CorruptIndexException, LockObtainFailedException, IOException {
		SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss");

		Analyzer analyzer = new StandardAnalyzer();
		boolean recreateIndexIfExists = true;
		IndexWriter indexWriter = new IndexWriter(INDEX_DIRECTORY, analyzer, recreateIndexIfExists);
		File dir = new File(FILES_TO_INDEX_DIRECTORY);
		File[] files = dir.listFiles();
		for (File file : files) {
			Document document = new Document();

			String path = file.getCanonicalPath();
			document.add(new Field(FIELD_PATH, path, Field.Store.YES, Field.Index.UN_TOKENIZED));

			Reader reader = new FileReader(file);
			document.add(new Field(FIELD_CONTENTS, reader));

			String lastModified = sdf.format(new Date(file.lastModified()));
			document.add(new Field(FIELD_LAST_MODIFIED, lastModified, Field.Store.YES, Field.Index.UN_TOKENIZED));

			System.out.println("indexing file: " + path + " (last modified: " + lastModified + ")");

			indexWriter.addDocument(document);
		}
		indexWriter.optimize();
		indexWriter.close();
	}

	public static void searchIndexWithQueryParser(String whichField, String searchString) throws IOException,
			ParseException {
		System.out.println("\nSearching for '" + searchString + "' using QueryParser");
		Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
		IndexSearcher indexSearcher = new IndexSearcher(directory);

		QueryParser queryParser = new QueryParser(whichField, new StandardAnalyzer());
		Query query = queryParser.parse(searchString);
		System.out.println("Type of query: " + query.getClass().getSimpleName());
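		Hits hits = indexSearcher.search(query);
		displayHits(hits);
		// Hits loads documents lazily, so close the searcher only after displaying them
		indexSearcher.close();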
	}

	public static void searchIndexWithRangeQuery(String whichField, String start, String end, boolean inclusive)
			throws IOException, ParseException {
		System.out.println("\nSearching for range '" + start + " to " + end + "' using RangeQuery");
		Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
		IndexSearcher indexSearcher = new IndexSearcher(directory);

		Term startTerm = new Term(whichField, start);
		Term endTerm = new Term(whichField, end);
		Query query = new RangeQuery(startTerm, endTerm, inclusive);
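		Hits hits = indexSearcher.search(query);
		displayHits(hits);
		// Hits loads documents lazily, so close the searcher only after displaying them
		indexSearcher.close();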
	}

	public static void displayHits(Hits hits) throws CorruptIndexException, IOException {
		System.out.println("Number of hits: " + hits.length());

		Iterator<Hit> it = hits.iterator();
		while (it.hasNext()) {
			Hit hit = it.next();
			Document document = hit.getDocument();
			String path = document.get(FIELD_PATH);
			System.out.println("Hit: " + path);
		}
	}
}

In the searchIndexWithRangeQuery() method, we perform a range query. The RangeQuery constructor requires a Term indicating the start of the range, a Term indicating the end of the range, and a boolean value indicating whether the search is inclusive of the start and end values ("true") or exclusive of them ("false"). The following code snippet from searchIndexWithRangeQuery() performs the range query against the index stored in the file system.

	Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
	IndexSearcher indexSearcher = new IndexSearcher(directory);
	Term startTerm = new Term(whichField, start);
	Term endTerm = new Term(whichField, end);
	Query query = new RangeQuery(startTerm, endTerm, inclusive);
	Hits hits = indexSearcher.search(query);
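
To make the meaning of the inclusive flag concrete, here is a small sketch of the comparison it implies, written with plain String comparisons. This is only a model of the semantics described above, not Lucene's implementation, and the RangeBoundsSketch class and inRange() method are hypothetical names introduced for the illustration.

```java
class RangeBoundsSketch {

	// Returns true when value lies between start and end in String order.
	// With inclusive == true, a value equal to either bound is accepted;
	// with inclusive == false, a value equal to either bound is rejected.
	static boolean inRange(String value, String start, String end, boolean inclusive) {
		int comparedToStart = value.compareTo(start);
		int comparedToEnd = value.compareTo(end);
		if (inclusive) {
			return comparedToStart >= 0 && comparedToEnd <= 0;
		}
		return comparedToStart > 0 && comparedToEnd < 0;
	}

	public static void main(String[] args) {
		String start = "2008-07-10-00-00-00";
		String end = "2008-07-10-21-21-02";
		String value = "2008-07-10-21-21-02"; // exactly equal to the end bound

		System.out.println("inclusive: " + inRange(value, start, end, true));
		System.out.println("exclusive: " + inRange(value, start, end, false));
	}
}
```

A value that sits exactly on a bound is the only case where the two modes disagree, which is why the demo's main() method queries with an end value that may coincide with a stored timestamp.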

We can also perform range queries using a QueryParser. To do so, we need to specify our range query in the following formats:

[start_term TO end_term]    (inclusive range query)
{start_term TO end_term}    (exclusive range query)

The inclusive range query uses square brackets ([ and ]) around the range, and the exclusive range query uses curly brackets ({ and }). By default, QueryParser's parse() method will return a ConstantScoreRangeQuery rather than a RangeQuery unless QueryParser's useOldRangeQuery is set to true. According to the javadocs for RangeQuery:

The QueryParser default behaviour is to use the newer ConstantScoreRangeQuery class. This is generally preferable because:
  • It is faster than RangeQuery
  • Unlike RangeQuery, it does not cause a BooleanQuery.TooManyClauses exception if the range of values is large
  • Unlike RangeQuery it does not influence scoring based on the scarcity of individual terms that may match

If we examine the main() method of LuceneRangeQueryDemo, we can see that it performs four range queries using the RangeQuery class directly. Following this, it performs the four equivalent queries using a QueryParser.

	searchIndexWithRangeQuery(FIELD_LAST_MODIFIED, "2008-07-10-00-00-00", "2008-07-10-23-59-59", INCLUSIVE);
	searchIndexWithRangeQuery(FIELD_LAST_MODIFIED, "2008-07-11-00-00-00", "2008-07-11-23-59-59", INCLUSIVE);
	searchIndexWithRangeQuery(FIELD_LAST_MODIFIED, "2008-07-10-00-00-00", "2008-07-10-21-21-02", INCLUSIVE);
	searchIndexWithRangeQuery(FIELD_LAST_MODIFIED, "2008-07-10-00-00-00", "2008-07-10-21-21-02", EXCLUSIVE);

	// equivalent range searches using QueryParser
	searchIndexWithQueryParser(FIELD_LAST_MODIFIED, "[2008-07-10-00-00-00 TO 2008-07-10-23-59-59]");
	searchIndexWithQueryParser(FIELD_LAST_MODIFIED, "[2008-07-11-00-00-00 TO 2008-07-11-23-59-59]");
	searchIndexWithQueryParser(FIELD_LAST_MODIFIED, "[2008-07-10-00-00-00 TO 2008-07-10-21-21-02]");
	searchIndexWithQueryParser(FIELD_LAST_MODIFIED, "{2008-07-10-00-00-00 TO 2008-07-10-21-21-02}");

Let's look at the console output from executing LuceneRangeQueryDemo:

Console Output

indexing file: C:\projects\workspace\demo\filesToIndex\deron-foods.txt (last modified: 2008-07-10-21-21-02)
indexing file: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt (last modified: 2008-07-10-21-21-38)

Searching for range '2008-07-10-00-00-00 to 2008-07-10-23-59-59' using RangeQuery
Number of hits: 2
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Hit: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt

Searching for range '2008-07-11-00-00-00 to 2008-07-11-23-59-59' using RangeQuery
Number of hits: 0

Searching for range '2008-07-10-00-00-00 to 2008-07-10-21-21-02' using RangeQuery
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt

Searching for range '2008-07-10-00-00-00 to 2008-07-10-21-21-02' using RangeQuery
Number of hits: 0

Searching for '[2008-07-10-00-00-00 TO 2008-07-10-23-59-59]' using QueryParser
Type of query: ConstantScoreRangeQuery
Number of hits: 2
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Hit: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt

Searching for '[2008-07-11-00-00-00 TO 2008-07-11-23-59-59]' using QueryParser
Type of query: ConstantScoreRangeQuery
Number of hits: 0

Searching for '[2008-07-10-00-00-00 TO 2008-07-10-21-21-02]' using QueryParser
Type of query: ConstantScoreRangeQuery
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt

Searching for '{2008-07-10-00-00-00 TO 2008-07-10-21-21-02}' using QueryParser
Type of query: ConstantScoreRangeQuery
Number of hits: 0

At the top of the console output, we can see that the two files get indexed, and that the "last modified" times for these files are "2008-07-10-21-21-02" (for deron-foods.txt) and "2008-07-10-21-21-38" (for nicole-foods.txt).

In the first range query, we search for all files that were last modified on July 10th, 2008. This returns 2 hits since both files were last modified on this date. In the second range query, we search for all files that were last modified on July 11th, 2008. This returns 0 hits, since both documents were last modified on July 10th, 2008.
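
The "whole day" searches work because the start and end bounds are simply the first and last second of that day in our date pattern. As a rough sketch, the same selection can be reproduced without Lucene by keeping the values in a sorted set and taking an inclusive sub-range. The DayRangeSketch class and its helper methods are hypothetical names used only for this illustration.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

class DayRangeSketch {

	// First and last second of a day written as "yyyy-MM-dd".
	static String startOfDay(String day) {
		return day + "-00-00-00";
	}

	static String endOfDay(String day) {
		return day + "-23-59-59";
	}

	// Returns the values that fall on the given day, bounds included.
	static NavigableSet<String> modifiedOn(NavigableSet<String> sortedValues, String day) {
		return sortedValues.subSet(startOfDay(day), true, endOfDay(day), true);
	}

	public static void main(String[] args) {
		// The two last-modified values from the console output above
		NavigableSet<String> values = new TreeSet<String>();
		values.add("2008-07-10-21-21-02");
		values.add("2008-07-10-21-21-38");

		System.out.println("2008-07-10: " + modifiedOn(values, "2008-07-10").size());
		System.out.println("2008-07-11: " + modifiedOn(values, "2008-07-11").size());
	}
}
```

The counts printed for the two days line up with the 2 hits and 0 hits that the first two range queries return.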

Next, we search in a range from 2008-07-10-00-00-00 to 2008-07-10-21-21-02, inclusively. Since deron-foods.txt was last modified at 2008-07-10-21-21-02 and the range query includes this end value, we get one search hit. Following this, we search the same range from 2008-07-10-00-00-00 to 2008-07-10-21-21-02, exclusively. Since deron-foods.txt was last modified at exactly 2008-07-10-21-21-02 and an exclusive range leaves out its end value, this search returns 0 hits.

After this, our next four searches show the equivalent searches performed using a QueryParser object. Notice that QueryParser's parse() method returns a ConstantScoreRangeQuery object rather than a RangeQuery object for each of these queries, as we can see from the console output for these queries.

Range queries are a great way to query across a range of values. They are probably most useful for ranges of dates, although they can also be used for other things such as file sizes and alphabetical ranges, provided the values are indexed as Strings whose lexicographic order matches the intended order.
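
For numeric values such as file sizes, plain decimal Strings do not sort in numeric order ("9" comes after "10"). One common workaround is to left-pad the numbers with zeros to a fixed width before indexing them. The sketch below shows the idea in plain Java; the PaddedNumberSketch class, the pad() helper, and the 12-digit width are assumptions chosen for this example, and the same padding would have to be applied to the range bounds at query time.

```java
import java.util.Locale;

class PaddedNumberSketch {

	// Left-pads a non-negative number with zeros to 12 digits so that
	// String order matches numeric order. The width of 12 is arbitrary
	// and must be large enough for the biggest value you expect.
	static String pad(long size) {
		if (size < 0) {
			throw new IllegalArgumentException("size must not be negative: " + size);
		}
		return String.format(Locale.ROOT, "%012d", size);
	}

	public static void main(String[] args) {
		// Unpadded: "9" sorts after "10" because '9' > '1'
		System.out.println("9".compareTo("10") > 0);

		// Padded: 9 sorts before 10, as it does numerically
		System.out.println(pad(9));
		System.out.println(pad(10));
		System.out.println(pad(9).compareTo(pad(10)) < 0);
	}
}
```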