Lucene 1.2 API

Jakarta Lucene API


Packages

org.apache.lucene.analysis	API and code to convert text into indexable tokens.
org.apache.lucene.analysis.de	Support for indexing and searching of German text.
org.apache.lucene.analysis.standard	A grammar-based tokenizer constructed with JavaCC.
org.apache.lucene.document	The Document abstraction.
org.apache.lucene.index	Code to maintain and access indices.
org.apache.lucene.queryParser	A simple query parser implemented with JavaCC.
org.apache.lucene.search	Search over indices.
org.apache.lucene.store	Binary i/o API, for storing index data.
org.apache.lucene.util	Some utility classes.

Jakarta Lucene API

The Jakarta Lucene API is divided into several packages:

org.apache.lucene.util contains a few handy data structures, e.g., BitVector and PriorityQueue.
org.apache.lucene.store defines an abstract class for storing persistent data, the Directory, a collection of named files written by an OutputStream and read by an InputStream. Two implementations are provided: FSDirectory, which uses a file system directory to store files, and RAMDirectory, which implements files as memory-resident data structures.
org.apache.lucene.document provides a simple Document class. A document is simply a set of named Fields, whose values may be strings or instances of java.io.Reader.
org.apache.lucene.analysis defines an abstract Analyzer API for converting text from a java.io.Reader into a TokenStream, an enumeration of Tokens. A TokenStream is composed by applying TokenFilters to the output of a Tokenizer. A few simple implementations are provided, including StopAnalyzer and the grammar-based StandardAnalyzer.
org.apache.lucene.index provides two primary classes: IndexWriter, which creates and adds documents to indices; and IndexReader, which accesses the data in the index.
org.apache.lucene.search provides data structures to represent queries (TermQuery for individual words, PhraseQuery for phrases, and BooleanQuery for boolean combinations of queries) and the abstract Searcher which turns queries into Hits. IndexSearcher implements search over a single IndexReader.
org.apache.lucene.queryParser uses JavaCC to implement a QueryParser.
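
The way a TokenStream is composed, with filters applied to the output of a tokenizer, can be pictured with a small JDK-only sketch. Nothing below is a Lucene class: ToyAnalyzer, its method names, and the stop-word list are all hypothetical and exist only to show the shape of the chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of an analysis chain. None of these names are Lucene classes.
final class ToyAnalyzer {
    // Hypothetical stop list, chosen only for this example.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "and", "of", "the", "with"));

    // "Tokenizer" stage: split raw text into runs of letters and digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // First "filter" stage: lower-case every token.
    static List<String> lowerCase(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String token : in) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Second "filter" stage: drop tokens that appear in the stop list.
    static List<String> removeStopWords(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String token : in) {
            if (!STOP_WORDS.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    // The "analyzer" is just the composition: filters wrapped around a tokenizer.
    static List<String> analyze(String text) {
        return removeStopWords(lowerCase(tokenize(text)));
    }

    public static void main(String[] args) {
        // prints [chowder, manhattan, clams]
        System.out.println(analyze("The Chowder of Manhattan, with clams"));
    }
}
```

Each stage consumes the previous stage's output, so adding another filter means wrapping one more call around the chain.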

To use Lucene, an application should:

Create Documents by adding Fields;
Create an IndexWriter and add documents to it with addDocument();
Call QueryParser.parse() to build a query from a string; and
Create an IndexSearcher and pass the query to its search() method.
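
The four steps above can be sketched without lucene.jar on the classpath. The toy below only mirrors the flow (build documents from named fields, add them to an index, turn a query string into a field and a term, look it up). ToyIndex and all of its methods are hypothetical stand-ins, not the Lucene API, and the real IndexWriter, QueryParser and IndexSearcher do considerably more.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy in-memory index that mimics the workflow only. Not Lucene code.
final class ToyIndex {
    // Postings: "field:term" -> ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextId = 0;

    // Step 1: a "document" is a set of named fields (here: path and contents).
    static Map<String, String> document(String path, String contents) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("path", path);
        doc.put("contents", contents);
        return doc;
    }

    // Step 2: add a document, indexing the lower-cased words of every field.
    int addDocument(Map<String, String> doc) {
        int id = nextId++;
        for (Map.Entry<String, String> field : doc.entrySet()) {
            String text = field.getValue().toLowerCase(Locale.ROOT);
            for (String term : text.split("[^\\p{L}\\p{N}]+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(field.getKey() + ":" + term,
                            k -> new TreeSet<>()).add(id);
                }
            }
        }
        return id;
    }

    // Steps 3 and 4: read "field:term" (or a bare term, searched in
    // defaultField) and return the ids of the matching documents.
    List<Integer> search(String query, String defaultField) {
        int colon = query.indexOf(':');
        String field = colon < 0 ? defaultField : query.substring(0, colon);
        String term = query.substring(colon + 1).toLowerCase(Locale.ROOT);
        Set<Integer> hits = postings.get(field + ":" + term);
        return hits == null ? new ArrayList<Integer>() : new ArrayList<Integer>(hits);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument(document("soups/clam-chowder", "Manhattan clam chowder with tomatoes"));
        index.addDocument(document("soups/minestrone", "A bean soup, not a chowder"));
        System.out.println(index.search("chowder", "contents"));      // prints [0, 1]
        System.out.println(index.search("path:chowder", "contents")); // prints [0]
    }
}
```

The second query matches fewer documents because it is restricted to the path field, which is the same effect the demo session shows for path:chowder.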

Some simple code examples that do this are:

FileDocument.java contains code to create a Document for a file.
IndexFiles.java creates an index for all the files contained in a directory.
DeleteFiles.java deletes some of these files from the index.
SearchFiles.java prompts for queries and searches an index.

To demonstrate these, try something like:

> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexFiles rec.food.recipes/soups
adding rec.food.recipes/soups/abalone-chowder
[ ... ]
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.SearchFiles
Query: chowder
Searching for: chowder
34 total matching documents
0. rec.food.recipes/soups/spam-chowder
[ ... thirty-four documents contain the word "chowder", "spam-chowder" with the greatest density.]
Query: path:chowder
Searching for: path:chowder
31 total matching documents
0. rec.food.recipes/soups/abalone-chowder
[ ... only thirty-one have "chowder" in the "path" field. ]
Query: path:"clam chowder"
Searching for: path:"clam chowder"
10 total matching documents
0. rec.food.recipes/soups/clam-chowder
[ ... only ten have "clam chowder" in the "path" field. ]
Query: path:"clam chowder" AND manhattan
Searching for: +path:"clam chowder" +manhattan
2 total matching documents
0. rec.food.recipes/soups/clam-chowder
[ ... only two also have "manhattan" in the contents. ]
[ Note: "+" and "-" are canonical, but "AND", "OR" and "NOT" may be used. ]
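
The relationship between the keyword operators and the canonical "+" and "-" prefixes, as seen in the "Searching for:" lines above, can be illustrated with a toy rewriter. This is a sketch under stated assumptions: it handles only flat expressions (no parentheses or nesting), ToyOperatorRewriter is a hypothetical name, and it is not how QueryParser is implemented (that is a JavaCC grammar).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy rewrite of AND / OR / NOT into "+" and "-" prefixes. Not Lucene code.
final class ToyOperatorRewriter {
    // A token is a run of non-space characters; quoted phrases stay in one piece.
    private static final Pattern TOKEN = Pattern.compile("(?:[^\\s\"]+|\"[^\"]*\")+");

    static String canonicalize(String query) {
        List<String> marks = new ArrayList<>();    // "", "+" or "-" per clause
        List<String> clauses = new ArrayList<>();
        String pending = "";                       // mark for the next clause
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            String token = m.group();
            if (token.equals("AND")) {
                // AND makes the clause before it required, unless already marked.
                int last = marks.size() - 1;
                if (last >= 0 && marks.get(last).isEmpty()) {
                    marks.set(last, "+");
                }
                pending = "+";
            } else if (token.equals("OR")) {
                pending = "";
            } else if (token.equals("NOT")) {
                pending = "-";
            } else if (token.length() > 1
                    && (token.charAt(0) == '+' || token.charAt(0) == '-')) {
                // Already canonical: keep the prefix the user wrote.
                marks.add(token.substring(0, 1));
                clauses.add(token.substring(1));
                pending = "";
            } else {
                marks.add(pending);
                clauses.add(token);
                pending = "";
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < clauses.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(marks.get(i)).append(clauses.get(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints +path:"clam chowder" +manhattan
        System.out.println(canonicalize("path:\"clam chowder\" AND manhattan"));
        // prints chowder -spam
        System.out.println(canonicalize("chowder NOT spam"));
    }
}
```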

The IndexHTML demo is more sophisticated. It incrementally maintains an index of HTML files, adding new files as they appear, deleting old files as they disappear, and re-indexing files as they change.

> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML -create java/jdk1.1.6/docs/relnotes
adding java/jdk1.1.6/docs/relnotes/SMICopyright.html
[ ... create an index containing all the relnotes ]
> rm java/jdk1.1.6/docs/relnotes/SMICopyright.html
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML java/jdk1.1.6/docs/relnotes
deleting java/jdk1.1.6/docs/relnotes/SMICopyright.html
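
The incremental behaviour described above (add new files, delete vanished ones, re-index changed ones) amounts to comparing what the index recorded with what is on disk. The sketch below plans those actions from two maps of path to last-modified time. How IndexHTML actually detects changes is not stated here, so the timestamp comparison is an assumption, and ToyIncrementalSync is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy planner for incremental index maintenance. Not the IndexHTML source.
final class ToyIncrementalSync {
    // indexed: path -> modification time recorded when the file was indexed.
    // onDisk:  path -> modification time of the file as it exists now.
    static List<String> plan(Map<String, Long> indexed, Map<String, Long> onDisk) {
        List<String> actions = new ArrayList<>();
        // Entries that vanished from disk, or whose timestamp differs, are stale.
        for (Map.Entry<String, Long> e : new TreeMap<>(indexed).entrySet()) {
            Long current = onDisk.get(e.getKey());
            if (current == null || !current.equals(e.getValue())) {
                actions.add("deleting " + e.getKey());
            }
        }
        // Files that are new, or changed since indexing, are (re-)added.
        for (Map.Entry<String, Long> e : new TreeMap<>(onDisk).entrySet()) {
            Long recorded = indexed.get(e.getKey());
            if (recorded == null || !recorded.equals(e.getValue())) {
                actions.add("adding " + e.getKey());
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, Long> indexed = new TreeMap<>();
        indexed.put("a.html", 1L);
        indexed.put("b.html", 1L);
        indexed.put("c.html", 1L);
        Map<String, Long> onDisk = new TreeMap<>();
        onDisk.put("a.html", 1L);   // unchanged
        onDisk.put("b.html", 2L);   // modified
        onDisk.put("d.html", 5L);   // new; c.html was removed
        // prints [deleting b.html, deleting c.html, adding b.html, adding d.html]
        System.out.println(plan(indexed, onDisk));
    }
}
```

A changed file shows up as a delete followed by an add, which is one simple way to express re-indexing.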

HTML indexes are searched using Sun's Java Web Server (JWS) and Search.jhtml. To use this:

copy Search.html and Search.jhtml to JWS's public_html directory;
copy lucene.jar to JWS's lib directory;
create and maintain your indexes with demo.IndexHTML in JWS's top-level directory;
launch JWS, with the demo directory on CLASSPATH (only one class is actually needed);
visit Search.html.

Note that indexes can be updated while searches are going on. Search.jhtml will re-open the index when it is updated so that the latest version is immediately available.
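
One way to picture re-opening on update is a holder that remembers which version of the index it opened and re-opens when the version moves. The mechanism Search.jhtml uses is not described here, so the version-number check is an assumption, and ReopeningHolder with its suppliers is a hypothetical, JDK-only sketch.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Toy holder that re-opens a resource when the underlying version changes.
final class ReopeningHolder<T> {
    private final LongSupplier currentVersion; // reports the index's current version
    private final Supplier<T> opener;          // opens a fresh view of the index
    private long openedVersion;
    private T resource;
    private int openCount;

    ReopeningHolder(LongSupplier currentVersion, Supplier<T> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
    }

    // Returns the cached resource, re-opening it first if the version moved.
    synchronized T get() {
        long version = currentVersion.getAsLong();
        if (resource == null || version != openedVersion) {
            resource = opener.get();
            openedVersion = version;
            openCount++;
        }
        return resource;
    }

    synchronized int openCount() {
        return openCount;
    }

    public static void main(String[] args) {
        AtomicLong version = new AtomicLong(1);
        ReopeningHolder<String> holder =
                new ReopeningHolder<>(version::get, () -> "searcher@" + version.get());
        System.out.println(holder.get()); // prints searcher@1
        System.out.println(holder.get()); // prints searcher@1 (cached, not re-opened)
        version.set(2);                   // simulate an index update
        System.out.println(holder.get()); // prints searcher@2
    }
}
```

Searches between updates reuse the cached resource, so the cost of opening is paid only when the index has actually changed.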

