org.apache.lucene.analysis.standard
Class StandardTokenizer

java.lang.Object
  |
  +--org.apache.lucene.analysis.TokenStream
        |
        +--org.apache.lucene.analysis.Tokenizer
              |
              +--org.apache.lucene.analysis.standard.StandardTokenizer
All Implemented Interfaces:
StandardTokenizerConstants

public class StandardTokenizer
extends Tokenizer
implements StandardTokenizerConstants

A grammar-based tokenizer constructed with JavaCC.

This should be a good tokenizer for most European-language documents.

Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.
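To give a feel for what maintaining your own tokenizer involves, here is a tiny hand-rolled tokenizer built with `java.util.regex` instead of a JavaCC grammar. It is not Lucene's grammar: the class name `RegexTokenizerSketch`, its three token kinds and their patterns are assumptions chosen for illustration. It only mirrors the overall shape of this class: a `next()` method that returns a typed token, or `null` when the input is exhausted, with type strings written in JavaCC's `"<KIND>"` style.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokenizerSketch {
    /** Hypothetical token: matched text, a JavaCC-style type string, and offsets. */
    public static final class Tok {
        public final String text, type;
        public final int start, end;
        Tok(String text, String type, int start, int end) {
            this.text = text; this.type = type; this.start = start; this.end = end;
        }
        @Override public String toString() { return type + " '" + text + "' [" + start + "," + end + ")"; }
    }

    // Hypothetical token kinds, tried left to right at each position.
    // UNICODE_CHARACTER_CLASS makes \p{Alnum}, \p{Alpha}, \p{Digit} cover accented letters too.
    private static final Pattern TOKEN = Pattern.compile(
          "(?<EMAIL>[\\p{Alnum}._-]+@[\\p{Alnum}-]+(?:\\.[\\p{Alnum}-]+)+)"
        + "|(?<ALPHANUM>\\p{Alnum}*\\p{Alpha}\\p{Alnum}*)"   // at least one letter
        + "|(?<NUM>\\p{Digit}+(?:[.,]\\p{Digit}+)*)",       // digits, optional separators
        Pattern.UNICODE_CHARACTER_CLASS);
    private static final String[] KINDS = { "EMAIL", "ALPHANUM", "NUM" };

    private final Matcher matcher;

    public RegexTokenizerSketch(CharSequence input) { this.matcher = TOKEN.matcher(input); }

    /** Returns the next token, or null once no further token can be found. */
    public Tok next() {
        if (!matcher.find()) return null;          // find() skips characters that cannot start a token
        for (String kind : KINDS) {
            if (matcher.group(kind) != null) {     // the named group that participated in the match
                return new Tok(matcher.group(), "<" + kind + ">", matcher.start(), matcher.end());
            }
        }
        throw new IllegalStateException("every alternative is a named group");
    }

    public static void main(String[] args) {
        RegexTokenizerSketch t = new RegexTokenizerSketch("Mail me@example.com by 2002, ok?");
        for (Tok tok = t.next(); tok != null; tok = t.next()) {
            System.out.println(tok);
        }
    }
}
```

Alternation order matters in this sketch: `EMAIL` is tried before `ALPHANUM`, so `me@example.com` comes out as one token rather than several words. A JavaCC token manager instead resolves overlaps by longest match (ties going to the rule declared first), which is one reason to start from the existing grammar when your needs are close to it.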


Field Summary
 Token jj_nt
           
 Token token
           
 StandardTokenizerTokenManager token_source
           
 
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
 
Fields inherited from interface org.apache.lucene.analysis.standard.StandardTokenizerConstants
ACRONYM, ALPHA, ALPHANUM, APOSTROPHE, COMPANY, DEFAULT, DIGIT, EMAIL, EOF, HAS_DIGIT, HOST, LETTER, NOISE, NUM, P, tokenImage
 
Constructor Summary
StandardTokenizer(CharStream stream)
           
StandardTokenizer(Reader reader)
          Constructs a tokenizer for this Reader.
StandardTokenizer(StandardTokenizerTokenManager tm)
           
 
Method Summary
 void disable_tracing()
           
 void enable_tracing()
           
 ParseException generateParseException()
           
 Token getNextToken()
           
 Token getToken(int index)
           
 Token next()
          Returns the next token in the stream, or null at the end of the stream (EOS).
 void ReInit(CharStream stream)
           
 void ReInit(StandardTokenizerTokenManager tm)
           
 
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

token_source

public StandardTokenizerTokenManager token_source

token

public Token token

jj_nt

public Token jj_nt

Constructor Detail

StandardTokenizer

public StandardTokenizer(Reader reader)
Constructs a tokenizer for this Reader.

StandardTokenizer

public StandardTokenizer(CharStream stream)

StandardTokenizer

public StandardTokenizer(StandardTokenizerTokenManager tm)

Method Detail

next

public final Token next()
                 throws ParseException,
                        IOException
Returns the next token in the stream, or null once the end of the stream (EOS) has been reached.

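The returned token's type is set to one of the strings in StandardTokenizerConstants.tokenImage, namely the entry for the grammar rule that matched (for example, tokenImage[ALPHANUM] for an alphanumeric token).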
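The loop below illustrates that contract: call `next()` until it returns `null`, and compare a token's type against an entry of a `tokenImage`-style array rather than a hard-coded string (the same idea as looking up `tokenImage[ALPHANUM]`). Everything in it is a hypothetical stand-in so that it compiles without Lucene: `Tok`, `ListStream` and the three-entry `TOKEN_IMAGE` array do not reproduce the real `StandardTokenizerConstants` indices or contents, which come from the generated interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class NextLoopSketch {
    // Hypothetical stand-ins for StandardTokenizerConstants; the real values differ.
    public static final int EOF = 0, ALPHANUM = 1, EMAIL = 2;
    public static final String[] TOKEN_IMAGE = { "<EOF>", "<ALPHANUM>", "<EMAIL>" };

    /** Hypothetical minimal token: text plus a type string. */
    public static final class Tok {
        public final String text, type;
        public Tok(String text, String type) { this.text = text; this.type = type; }
    }

    /** Hypothetical stream honouring the next() contract: a token, or null at end of stream. */
    public static final class ListStream {
        private final Iterator<Tok> it;
        public ListStream(List<Tok> toks) { this.it = toks.iterator(); }
        public Tok next() { return it.hasNext() ? it.next() : null; }
    }

    /** Collects the text of every remaining token whose type is tokenImage[kind]. */
    public static List<String> textsOfKind(ListStream ts, int kind) {
        String wanted = TOKEN_IMAGE[kind];
        List<String> out = new ArrayList<>();
        for (Tok t = ts.next(); t != null; t = ts.next()) {   // stop at the null end-of-stream marker
            if (wanted.equals(t.type)) out.add(t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        ListStream ts = new ListStream(Arrays.asList(
            new Tok("mail", TOKEN_IMAGE[ALPHANUM]),
            new Tok("me@example.com", TOKEN_IMAGE[EMAIL]),
            new Tok("today", TOKEN_IMAGE[ALPHANUM])));
        System.out.println(textsOfKind(ts, ALPHANUM)); // [mail, today]
    }
}
```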

Overrides:
next in class TokenStream

ReInit

public void ReInit(CharStream stream)

ReInit

public void ReInit(StandardTokenizerTokenManager tm)
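This page does not say what `ReInit` does. In JavaCC-generated parsers, `ReInit` methods conventionally reset an existing instance onto a new input source so it can be reused rather than reallocated. The sketch below shows that reuse pattern on a deliberately simple whitespace tokenizer. `ReInitSketch`, its `reInit(Reader)` method and the `drain` helper are hypothetical; the real methods listed here take a `CharStream` or a `StandardTokenizerTokenManager`, not a `Reader`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReInitSketch {
    private Reader input;
    private final StringBuilder buf = new StringBuilder(); // reused across inputs

    public ReInitSketch(Reader input) { this.input = input; }

    // Re-target this instance at new input and discard any buffered text,
    // instead of constructing a fresh tokenizer. Closing the previous
    // Reader remains the caller's responsibility.
    public void reInit(Reader newInput) {
        this.input = newInput;
        buf.setLength(0);
    }

    // Returns the next whitespace-delimited word, or null at end of stream.
    public String next() throws IOException {
        buf.setLength(0);
        int c;
        while ((c = input.read()) != -1) {
            if (!Character.isWhitespace(c)) {
                buf.append((char) c);
            } else if (buf.length() > 0) {
                return buf.toString();
            }
        }
        return buf.length() > 0 ? buf.toString() : null;
    }

    // Collects every remaining word from the current input.
    public static List<String> drain(ReInitSketch t) {
        List<String> out = new ArrayList<>();
        try {
            for (String w = t.next(); w != null; w = t.next()) out.add(w);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        ReInitSketch t = new ReInitSketch(new StringReader("first document"));
        System.out.println(drain(t));               // [first, document]
        t.reInit(new StringReader("second one"));   // same instance, new input
        System.out.println(drain(t));               // [second, one]
    }
}
```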

getNextToken

public final Token getNextToken()

getToken

public final Token getToken(int index)
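`getNextToken` and `getToken(int)` are left undocumented here. In JavaCC-generated parsers they typically implement lookahead over a linked list of tokens: the parser starts from a sentinel token, `getToken(0)` returns the current token, `getToken(k)` peeks `k` tokens ahead (fetching from the token manager and caching as needed), and `getNextToken()` advances, reusing any token already fetched by a peek. The sketch below reproduces that pattern with a hypothetical string iterator in place of `StandardTokenizerTokenManager`; `LookaheadSketch` and its `fetch` helper are illustrative, not the generated code.

```java
import java.util.Arrays;
import java.util.Iterator;

public class LookaheadSketch {
    /** Mirrors the shape of a JavaCC Token: an image plus a link to the following token. */
    public static final class Token {
        public final String image;
        public Token next;
        public Token(String image) { this.image = image; }
    }

    private final Iterator<String> source;   // hypothetical stand-in for token_source
    private Token token = new Token(null);   // sentinel "current" token before the first read

    public LookaheadSketch(String... images) { this.source = Arrays.asList(images).iterator(); }

    // Stand-in for the token manager; yields an "<EOF>" token once input is exhausted.
    private Token fetch() { return new Token(source.hasNext() ? source.next() : "<EOF>"); }

    /** Advances: reuse a token already fetched by lookahead, otherwise fetch a new one. */
    public Token getNextToken() {
        if (token.next != null) token = token.next;
        else token = token.next = fetch();
        return token;
    }

    /** Peeks without consuming: index 0 is the current token, index k is k tokens ahead. */
    public Token getToken(int index) {
        Token t = token;
        for (int i = 0; i < index; i++) {
            if (t.next != null) t = t.next;
            else t = t.next = fetch();        // cache so getNextToken() sees it later
        }
        return t;
    }
}
```

Caching peeked tokens on the `next` links is what lets a parser look ahead arbitrarily far without asking the token manager to rescan input.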

generateParseException

public final ParseException generateParseException()

enable_tracing

public final void enable_tracing()

disable_tracing

public final void disable_tracing()


Copyright © 2000-2002 Apache Software Foundation. All Rights Reserved.