How do I delete a document from a Lucene index using the value of a field?
Author: Deron Eriksson
Description: This Java tutorial shows how to delete a Document using the value of a Field from an index that has been created using Lucene.
Tutorial created using: Windows XP || JDK 1.5.0_09 || Eclipse Web Tools Platform 2.0 (Eclipse 3.3.0)


In another tutorial, we examined how to create an index based on text files in a directory and then search that index. Here, we'll see how to delete one or more documents from the index using the text value of one of their fields.

We'll use the same project structure, shown below. The "filesToIndex" directory contains the files that are indexed, and the "indexDirectory" directory contains the resulting Lucene index files.

project structure

Two text files in the "filesToIndex" directory will be indexed. The first one, deron-foods.txt, lists some foods that I like.

deron-foods.txt

Here are some foods that Deron likes:
hamburger
french fries
steak
mushrooms
artichokes

The second text file, nicole-foods.txt, lists some foods that Nicole likes.

nicole-foods.txt

Here are some foods that Nicole likes:
apples
bananas
salad
mushrooms
cheese

The LuceneDeleteFromIndexDemo class is shown below. Its createIndex() method creates an index based on the files in "filesToIndex". Each Document added to the index has two fields: the canonical path to the file (referenced using the FIELD_PATH constant) and the textual contents of the file (referenced using the FIELD_CONTENTS constant). The searchIndex() method lets us search the contents of the documents in the index for a search term. Both methods were described in the other tutorial, where they appeared in a different JavaSW class.

The deleteDocumentsFromIndexUsingTerm() method deletes the documents associated with a Term from the index. A Term is essentially a pair: the name of a field and the text of that field. The deleteDocuments() method of the IndexReader object marks the documents associated with the Term for deletion. The deletion is saved to the index when we call the close() method of the IndexReader, so until that point, searches performed through other readers on the index will still take into account the documents to be deleted.
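To make the "field name plus text" pairing and the delayed deletion concrete, here is a small toy model in plain Java. It is not Lucene code and does not reflect how Lucene is implemented: ToyIndex, ToyReader, and their methods are hypothetical names invented for this sketch, and the "index" is just a list of maps. It only illustrates the behavior described above, under the assumption that deletions are buffered by the reader and applied when the reader is closed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy in-memory model, NOT Lucene code: every name here is hypothetical.
final class PendingDeleteSketch {

    // The "committed index": each document is a map of field name -> field text.
    static final class ToyIndex {
        final List<Map<String, String>> documents = new ArrayList<Map<String, String>>();

        void add(String path, String contents) {
            Map<String, String> document = new HashMap<String, String>();
            document.put("path", path);
            document.put("contents", contents);
            documents.add(document);
        }

        // Counts committed documents whose contents mention the word.
        int countContaining(String word) {
            int count = 0;
            for (Map<String, String> document : documents) {
                if (document.get("contents").contains(word)) {
                    count++;
                }
            }
            return count;
        }
    }

    // Stand-in for a reader that buffers deletions until close() is called.
    static final class ToyReader {
        private final ToyIndex index;
        private final List<String[]> pendingTerms = new ArrayList<String[]>();

        ToyReader(ToyIndex index) {
            this.index = index;
        }

        // Records a (field, text) pair; nothing is removed from the index yet.
        void deleteDocuments(String field, String text) {
            pendingTerms.add(new String[] { field, text });
        }

        // Applies the buffered deletions. A document is removed only when the
        // named field equals the given text exactly.
        void close() {
            for (String[] term : pendingTerms) {
                Iterator<Map<String, String>> it = index.documents.iterator();
                while (it.hasNext()) {
                    if (term[1].equals(it.next().get(term[0]))) {
                        it.remove();
                    }
                }
            }
            pendingTerms.clear();
        }
    }

    // Returns the hit counts before deleting, after deleteDocuments(), and after close().
    static int[] demo() {
        ToyIndex index = new ToyIndex();
        index.add("filesToIndex/deron-foods.txt", "hamburger french fries steak mushrooms artichokes");
        index.add("filesToIndex/nicole-foods.txt", "apples bananas salad mushrooms cheese");

        int before = index.countContaining("mushrooms");
        ToyReader reader = new ToyReader(index);
        reader.deleteDocuments("path", "filesToIndex/nicole-foods.txt");
        int afterDelete = index.countContaining("mushrooms");
        reader.close();
        int afterClose = index.countContaining("mushrooms");
        return new int[] { before, afterDelete, afterClose };
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("Before delete: " + counts[0]);
        System.out.println("After deleteDocuments(): " + counts[1]);
        System.out.println("After close(): " + counts[2]);
    }
}
```

In this model the count stays at 2 after deleteDocuments() and drops to 1 only after close(), which mirrors the ordering to watch for with the real IndexReader.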

LuceneDeleteFromIndexDemo.java

package avajava;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hit;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;

public class LuceneDeleteFromIndexDemo {

	public static final String FILES_TO_INDEX_DIRECTORY = "filesToIndex";
	public static final String INDEX_DIRECTORY = "indexDirectory";

	public static final String FIELD_PATH = "path";
	public static final String FIELD_CONTENTS = "contents";

	public static void main(String[] args) throws Exception {

		createIndex();
		searchIndex("mushrooms");

		Term term = new Term(FIELD_PATH, "C:\\projects\\workspace\\demo\\filesToIndex\\nicole-foods.txt");
		deleteDocumentsFromIndexUsingTerm(term);
		searchIndex("mushrooms");

	}

	public static void createIndex() throws CorruptIndexException, LockObtainFailedException, IOException {
		Analyzer analyzer = new StandardAnalyzer();
		boolean recreateIndexIfExists = true;
		IndexWriter indexWriter = new IndexWriter(INDEX_DIRECTORY, analyzer, recreateIndexIfExists);
		File dir = new File(FILES_TO_INDEX_DIRECTORY);
		File[] files = dir.listFiles();
		for (File file : files) {
			Document document = new Document();

			String path = file.getCanonicalPath();
			document.add(new Field(FIELD_PATH, path, Field.Store.YES, Field.Index.UN_TOKENIZED));

			Reader reader = new FileReader(file);
			document.add(new Field(FIELD_CONTENTS, reader));

			indexWriter.addDocument(document);
		}
		indexWriter.optimize();
		indexWriter.close();
	}

	public static void searchIndex(String searchString) throws IOException, ParseException {
		System.out.println("Searching for '" + searchString + "'");
		Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
		IndexReader indexReader = IndexReader.open(directory);
		IndexSearcher indexSearcher = new IndexSearcher(indexReader);

		Analyzer analyzer = new StandardAnalyzer();
		QueryParser queryParser = new QueryParser(FIELD_CONTENTS, analyzer);
		Query query = queryParser.parse(searchString);
		Hits hits = indexSearcher.search(query);
		System.out.println("Number of hits: " + hits.length());

		Iterator<Hit> it = hits.iterator();
		while (it.hasNext()) {
			Hit hit = it.next();
			Document document = hit.getDocument();
			String path = document.get(FIELD_PATH);
			System.out.println("Hit: " + path);
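		}

		// Release the searcher and the reader once the hits have been printed
		indexSearcher.close();
		indexReader.close();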
	}

	public static void deleteDocumentsFromIndexUsingTerm(Term term) throws IOException {

		System.out.println("Deleting documents with field '" + term.field() + "' with text '" + term.text() + "'");
		Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
		IndexReader indexReader = IndexReader.open(directory);
		indexReader.deleteDocuments(term);
		indexReader.close();

	}

}

In the main() method of LuceneDeleteFromIndexDemo, we can see that first an index is created and then a search is performed for "mushrooms" in the contents of the documents in the index. This search returns 2 hits. Next, we delete the document representing nicole-foods.txt from the index. We do this by creating a Term object consisting of the FIELD_PATH field and the canonical path to the "nicole-foods.txt" file, which serves as a unique identifier for the document. We pass this term to the deleteDocumentsFromIndexUsingTerm() method. Following the deletion, we once again search for "mushrooms". This search returns 1 hit (rather than the 2 hits we saw previously) because we deleted one of the documents from the index.
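Because the path field is indexed UN_TOKENIZED, the text of the Term has to equal the indexed path exactly, and the hard-coded Windows path in main() only matches when the project lives in that location. One way to avoid this, sketched below, is to compute the value the same way createIndex() does, with getCanonicalPath(). PathTermValueSketch and pathValueFor() are hypothetical names that are not part of the tutorial's class, and UncheckedIOException assumes Java 8 or later rather than the JDK 1.5 used for this tutorial.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the tutorial's class.
final class PathTermValueSketch {

    // Mirrors the file.getCanonicalPath() call in createIndex() so that the
    // text used for deletion equals the text that was indexed.
    static String pathValueFor(String directory, String fileName) {
        try {
            return new File(directory, fileName).getCanonicalPath();
        } catch (IOException e) {
            // Wrapped so callers do not have to declare IOException.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // With Lucene on the classpath, this value could be passed to
        // new Term("path", value) in place of the hard-coded string.
        System.out.println(pathValueFor("filesToIndex", "nicole-foods.txt"));
    }
}
```

The printed value depends on the directory the program is run from, so it should match the indexed path as long as both the indexing and the deletion are run from the same working directory.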

The console output from executing LuceneDeleteFromIndexDemo is shown here. Notice that before the deletion, we receive two hits for "mushrooms", while after the deletion, we receive one hit for "mushrooms".

Console Output

Searching for 'mushrooms'
Number of hits: 2
Hit: C:\projects\workspace\demo\filesToIndex\nicole-foods.txt
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt
Deleting documents with field 'path' with text 'C:\projects\workspace\demo\filesToIndex\nicole-foods.txt'
Searching for 'mushrooms'
Number of hits: 1
Hit: C:\projects\workspace\demo\filesToIndex\deron-foods.txt

As we have seen, deleting documents from a Lucene index via a Term (consisting of a field and its text) is very straightforward. The main thing to keep in mind is that the deletion isn't saved to the index until the IndexReader is closed.