Preface: an introduction to Lucene, based on version 7.0
Lucene 7.0 package index
Lucene 7.0 official documentation
org.apache.lucene.analysis defines an abstract Analyzer API for converting text from a Reader into a TokenStream, an enumeration of token Attributes. A TokenStream can be composed by applying TokenFilters to the output of a Tokenizer. Tokenizers and TokenFilters are strung together and applied with an Analyzer. analyzers-common provides a number of Analyzer implementations, including StopAnalyzer and the grammar-based StandardAnalyzer.
org.apache.lucene.codecs provides an abstraction over the encoding and decoding of the inverted index structure, as well as different implementations that can be chosen depending upon application needs.
org.apache.lucene.document provides a simple Document class. A Document is simply a set of named Fields, whose values may be strings or instances of Reader.
org.apache.lucene.index provides two primary classes: IndexWriter, which creates and adds documents to indices; and IndexReader, which accesses the data in the index.
org.apache.lucene.search provides data structures to represent queries (e.g. TermQuery for individual words, PhraseQuery for phrases, and BooleanQuery for boolean combinations of queries) and the IndexSearcher, which turns queries into TopDocs. A number of QueryParsers are provided for producing query structures from strings or XML.
org.apache.lucene.store defines an abstract class for storing persistent data, the Directory, which is a collection of named files written by an IndexOutput and read by an IndexInput. Multiple implementations are provided, including FSDirectory, which uses a file system directory to store files, and RAMDirectory which implements files as memory-resident data structures.
org.apache.lucene.util contains a few handy data structures and utility classes, e.g. FixedBitSet and PriorityQueue.
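To make the Tokenizer-plus-TokenFilter chain and the inverted index mentioned above more concrete, here is a minimal plain-Java sketch. It deliberately does not use Lucene's classes: `AnalysisSketch`, `tokenize`, `analyze`, `buildIndex`, and the stop-word list are all hypothetical names made up for illustration, and Lucene's real analyzers and codecs are far more elaborate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class AnalysisSketch {
    // Hypothetical stop-word list for illustration; not Lucene's default set.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of"));

    // "Tokenizer" step: split the text on runs of non-letter, non-digit characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // "TokenFilter" steps applied to the tokenizer output: lowercase, then drop stop words.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : tokenize(text)) {
            String lower = t.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    // Inverted index: each term maps to the sorted ids of the documents that contain it.
    static Map<String, SortedSet<Integer>> buildIndex(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("The Quick Fox", "A quick search of the index");
        System.out.println(analyze(docs.get(0)));          // tokens after filtering
        System.out.println(buildIndex(docs).get("quick")); // ids of documents containing "quick"
    }
}
```

Looking a term up in the map is what makes search fast: the query never scans the document text, only the list of document ids stored under that term.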
Explanation
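analysis: defines the abstract Analyzer API and supplies several commonly used analyzers. An analyzer's role during indexing is to split text into tokens, remove stop words, reduce words to their stems, and so on. (For a deeper look at analyzers, see chapter 4 of Lucene in Action, on Lucene's analysis process.)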
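codecs: provides an abstraction over the encoding and decoding of the inverted index structure, along with different implementations to choose from depending on the application's needs.
document: provides a simple Document class. A Document is just a set of named Fields, whose values may be strings or Reader instances.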
index: provides two primary classes: IndexWriter, which creates indices and adds documents to them, and IndexReader, which accesses the data in an index.
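search: provides data structures that represent queries (TermQuery, PhraseQuery, BooleanQuery) and the IndexSearcher, which runs a query and returns the results as TopDocs. It also provides QueryParsers that build query structures from strings or XML.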
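store: defines an abstract class for storing persistent data, the Directory, which is a collection of named files written by an IndexOutput and read by an IndexInput. Several implementations are provided, including FSDirectory, which stores files in a file system directory, and RAMDirectory, which implements files as memory-resident data structures.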
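util: contains some useful data structures and utility classes, such as FixedBitSet and PriorityQueue.
geo: geospatial utility implementations in Lucene core.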
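The packages above can be seen working together in a short sketch. It assumes the Lucene 7.x core jar is on the classpath; the class name `LuceneQuickTour`, the field name `title`, and the sample text are hypothetical choices for illustration. It indexes one document into a RAMDirectory through an IndexWriter and StandardAnalyzer, then looks it up with a TermQuery via IndexSearcher.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class LuceneQuickTour {
    // Indexes one document in memory and returns the stored title of the best hit,
    // or null if nothing matched.
    static String indexAndSearch() {
        try (Directory dir = new RAMDirectory();                  // store
             StandardAnalyzer analyzer = new StandardAnalyzer()) { // analysis
            // index: IndexWriter analyzes the field text and writes it to the Directory.
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();                     // document
                doc.add(new TextField("title", "Lucene in Action", Field.Store.YES));
                writer.addDocument(doc);
            }
            // search: IndexSearcher turns a query into TopDocs.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // StandardAnalyzer lowercases tokens, so the term is written in lowercase.
                TopDocs hits = searcher.search(new TermQuery(new Term("title", "lucene")), 10);
                if (hits.scoreDocs.length == 0) {
                    return null;
                }
                return searcher.doc(hits.scoreDocs[0].doc).get("title");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(indexAndSearch());
    }
}
```

RAMDirectory keeps the demo free of disk state. For an index that should persist, an FSDirectory opened on a real path would take its place, and the rest of the code would stay the same.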
Note: my ability and experience are limited. If anything here is wrong, please point it out and I will gladly take the correction.