Class Summary |
Analyzer | An Analyzer builds TokenStreams, which analyze text. |
CharTokenizer | An abstract base class for simple, character-oriented tokenizers. |
LetterTokenizer | A LetterTokenizer is a tokenizer that divides text at non-letters. |
LowerCaseFilter | Normalizes token text to lower case. |
LowerCaseTokenizer | LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. |
PorterStemFilter | Transforms the token stream as per the Porter stemming algorithm. |
SimpleAnalyzer | An Analyzer that filters LetterTokenizer with LowerCaseFilter. |
StopAnalyzer | Filters LetterTokenizer with LowerCaseFilter and StopFilter. |
StopFilter | Removes stop words from a token stream. |
Token | A Token is an occurrence of a term from the text of a field. |
TokenFilter | A TokenFilter is a TokenStream whose input is another token stream. |
Tokenizer | A Tokenizer is a TokenStream whose input is a Reader. |
TokenStream | A TokenStream enumerates the sequence of tokens, either from fields of a document or from query text. |
WhitespaceAnalyzer | An Analyzer that uses WhitespaceTokenizer. |
WhitespaceTokenizer | A WhitespaceTokenizer is a tokenizer that divides text at whitespace. |
API and code to convert text into indexable tokens.
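
The summary above describes a chain: a tokenizer reads from a Reader, and each filter wraps another token stream. The following is a minimal, self-contained sketch of that idea in plain Java, following the StopAnalyzer description (split at non-letters, lower-case, drop stop words). It does not use this package's classes. `AnalysisSketch`, `SketchStream`, and the method names are hypothetical stand-ins. Tokens are plain strings here, and the convention that `next()` returns `null` at the end of input is an assumption of the sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative stand-ins only: none of these types are the package's real classes.
class AnalysisSketch {

    /** Hypothetical token stream: next() returns null when the input is exhausted. */
    interface SketchStream {
        String next() throws IOException;
    }

    /** Tokenizer stage: reads from a Reader and divides the text at non-letters. */
    static SketchStream letterTokenizer(Reader input) {
        return () -> {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isLetter((char) c)) {
                    sb.append((char) c);
                } else if (sb.length() > 0) {
                    break; // a non-letter ends the current token
                }
            }
            return sb.length() > 0 ? sb.toString() : null;
        };
    }

    /** Filter stage: wraps another stream and lower-cases each token. */
    static SketchStream lowerCaseFilter(SketchStream in) {
        return () -> {
            String t = in.next();
            return t == null ? null : t.toLowerCase(Locale.ROOT);
        };
    }

    /** Filter stage: wraps another stream and skips tokens found in stopWords. */
    static SketchStream stopFilter(SketchStream in, Set<String> stopWords) {
        return () -> {
            String t;
            while ((t = in.next()) != null) {
                if (!stopWords.contains(t)) {
                    return t;
                }
            }
            return null;
        };
    }

    /** Analyzer stage: builds the chain for one piece of text and drains it. */
    static List<String> analyze(String text, Set<String> stopWords) {
        SketchStream stream = stopFilter(
                lowerCaseFilter(letterTokenizer(new StringReader(text))), stopWords);
        List<String> tokens = new ArrayList<>();
        try {
            for (String t = stream.next(); t != null; t = stream.next()) {
                tokens.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "a", "of"));
        // prints [art, text, analysis, v]
        System.out.println(analyze("The Art of Text-Analysis, v2", stop));
    }
}
```

Because the stop filter sits after the lower-casing stage, the stop-word set only needs lower-case entries. Reordering the two filters would let "The" slip through, which is one reason the order of stages in such a chain matters.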