org.apache.lucene.analysis
Class CharTokenizer
java.lang.Object
  |
  +--org.apache.lucene.analysis.TokenStream
        |
        +--org.apache.lucene.analysis.Tokenizer
              |
              +--org.apache.lucene.analysis.CharTokenizer
- Direct Known Subclasses:
- LetterTokenizer, WhitespaceTokenizer
- public abstract class CharTokenizer
- extends Tokenizer
An abstract base class for simple, character-oriented tokenizers.
Fields inherited from class org.apache.lucene.analysis.Tokenizer
    input
Method Summary
    protected abstract boolean isTokenChar(char c)
        Returns true iff a character should be included in a token.
    Token next()
        Returns the next token in the stream, or null at EOS.
    protected char normalize(char c)
        Called on each token character to normalize it before it is added to the token.
Methods inherited from class org.apache.lucene.analysis.Tokenizer
    close
Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CharTokenizer
public CharTokenizer(Reader input)
isTokenChar
protected abstract boolean isTokenChar(char c)
- Returns true iff a character should be included in a token. The
tokenizer emits each maximal run of adjacent characters that satisfy
this predicate as one token. Characters for which it returns false
mark token boundaries and are not included in tokens.
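The boundary rule described above can be illustrated without Lucene on the classpath. The following is a minimal sketch, not the library's implementation: the class PredicateSplitSketch and its split method are hypothetical names introduced here, and the predicate is passed in as a java.util.function.IntPredicate rather than supplied by overriding isTokenChar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-alone sketch; this is NOT Lucene code.
final class PredicateSplitSketch {

    // Collects each maximal run of characters accepted by the predicate.
    // Rejected characters end the current run and are discarded.
    static List<String> split(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar.test(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the final run
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Letters form tokens; punctuation and spaces act as boundaries.
        System.out.println(split("Hello, wide  world!", Character::isLetter));
        // prints [Hello, wide, world]
    }
}
```

Consecutive boundary characters (the comma followed by a space, or the double space) do not produce empty tokens, because a run is only emitted when it contains at least one accepted character.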
normalize
protected char normalize(char c)
- Called on each token character to normalize it before it is added to the
token. The default implementation returns the character unchanged.
Subclasses may override this to, for example, lowercase tokens.
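How the two hooks interact can be shown with a small stand-alone model. This is a sketch under stated assumptions: MiniCharSplitter and LowerCaseLetterSplitter are hypothetical classes written for illustration, they return plain strings instead of Token objects, and they only mirror the isTokenChar/normalize pairing documented here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the two hooks; this is NOT the Lucene class.
abstract class MiniCharSplitter {

    // Decides whether a character belongs inside a token.
    protected abstract boolean isTokenChar(char c);

    // Default: leave the character unchanged.
    protected char normalize(char c) {
        return c;
    }

    // Applies normalize only to characters that isTokenChar accepted.
    final List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(normalize(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}

// Letters form tokens, and each letter is lowercased as it is added.
final class LowerCaseLetterSplitter extends MiniCharSplitter {
    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }

    @Override
    protected char normalize(char c) {
        return Character.toLowerCase(c);
    }
}
```

Calling new LowerCaseLetterSplitter().tokenize("Quick BROWN fox") yields the three strings quick, brown and fox. Because normalization happens per character while the token is being built, no second pass over the finished token is needed.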
next
public final Token next()
throws IOException
- Returns the next token in the stream, or null at end of stream (EOS).
- Overrides: next in class TokenStream
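The null-at-end-of-stream contract implies a simple pull loop on the caller's side. The sketch below models that loop with the JDK only: PullSplitterSketch is a hypothetical stand-in that reads from a java.io.Reader, splits on whitespace, and returns String values rather than Token objects, so it approximates the calling pattern and not the library's behavior.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token stream; this is NOT Lucene code.
final class PullSplitterSketch {
    private final Reader input;

    PullSplitterSketch(Reader input) {
        this.input = input;
    }

    // Returns the next whitespace-delimited run, or null at end of stream.
    String next() throws IOException {
        StringBuilder buf = new StringBuilder();
        int ch;
        while ((ch = input.read()) != -1) {
            if (!Character.isWhitespace((char) ch)) {
                buf.append((char) ch);
            } else if (buf.length() > 0) {
                return buf.toString(); // boundary reached after a token
            }
        }
        return buf.length() > 0 ? buf.toString() : null;
    }

    // Typical consumer loop: keep calling next() until it returns null.
    static List<String> drain(String text) {
        PullSplitterSketch stream = new PullSplitterSketch(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            for (String t = stream.next(); t != null; t = stream.next()) {
                out.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

For the input "  the quick  fox " the drain helper collects the, quick and fox, and the following call to next() returns null. With the real class the loop has the same shape, except that each call returns a Token and the caller is expected to handle the declared IOException.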
Copyright © 2000-2002 Apache Software Foundation. All Rights Reserved.