Analyzers are pre-processors that run on text before the inverted index is built.
So why do we need Analyzers in Elasticsearch?
Consider a field for which we want to build an inverted index, and two documents with these values:
Document 1: "Hello Elasticsearch World "
Document 2: "hello Elasticsearch world"
If each value is only split on whitespace, with no other processing, the inverted index looks like this:
Term | (Document,Frequency) |
Hello | (1,1) |
World | (1,1) |
Elasticsearch | (1,1),(2,1) |
hello | (2,1) |
world | (2,1) |
It's evident that "Hello" and "hello" are the same word with different case. Because the index holds two separate entries for that one word, a search that should be case-insensitive returns only partial results: a search for "hello" matches Document 2 but not Document 1.
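The index above can be reproduced with a toy Python sketch. It is only a model of the idea, not how Elasticsearch (or Lucene underneath it) stores postings: each document is split on whitespace with no other processing, and `build_index` and `search` are hypothetical helpers written for this illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to a sorted list of (doc_id, frequency) pairs.
    Tokens come from a plain whitespace split; nothing is normalized."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(postings.items()) for term, postings in index.items()}

def search(index, term):
    """Hypothetical exact-term lookup: ids of documents containing `term` as-is."""
    return [doc_id for doc_id, _ in index.get(term, [])]

docs = {1: "Hello Elasticsearch World ", 2: "hello Elasticsearch world"}
index = build_index(docs)

print(index["Elasticsearch"])  # [(1, 1), (2, 1)]
print(search(index, "hello"))  # [2]
```

Searching for "hello" finds only Document 2, which is exactly the partial result described above.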
To fix this, let's convert the words to lowercase before building the inverted index:
Term | (Document,Frequency) |
hello | (1,1),(2,1) |
world | (1,1),(2,1) |
elasticsearch | (1,1),(2,1) |
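A toy Python sketch of this fix: every token is lowercased before it is indexed. `build_index` and `search` are hypothetical helpers, not Elasticsearch APIs. The sketch also lowercases the query, because a lowercased index can only be matched by lowercased terms; Elasticsearch's full-text queries such as `match` similarly analyze the query text, by default with the field's analyzer.

```python
from collections import defaultdict

def build_index(docs, normalize=str.lower):
    """Split each document on whitespace, normalize every token
    (lowercase by default), then record (doc_id, frequency) pairs."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for token in text.split():
            term = normalize(token)
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(postings.items()) for term, postings in index.items()}

def search(index, query, normalize=str.lower):
    """Hypothetical lookup that normalizes the query the same way as the documents."""
    return [doc_id for doc_id, _ in index.get(normalize(query), [])]

docs = {1: "Hello Elasticsearch World ", 2: "hello Elasticsearch world"}
index = build_index(docs)

print(sorted(index))           # ['elasticsearch', 'hello', 'world']
print(index["hello"])          # [(1, 1), (2, 1)]
print(search(index, "Hello"))  # [1, 2]
```

Both documents now match regardless of the case used in the query.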
This is what analyzers in Elasticsearch are for. This is a simple illustration; analyzers are designed to do much more than this example shows.
In simple terms, an analyzer does two things:
- Splits the text into individual terms, or tokens (for example, on whitespace).
- Normalizes the individual terms (for example, by lowercasing them) so they are searchable.
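These two steps can be modelled as a small pipeline: a tokenizer that produces tokens, followed by a chain of token filters that each transform the token list. This loosely mirrors how Elasticsearch describes an analyzer (optional character filters, one tokenizer, then token filters), but the Python functions below are hypothetical stand-ins, not Elasticsearch code. `strip_punctuation_filter` in particular is an invented extra normalization step, not a built-in Elasticsearch filter.

```python
import string
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]

def whitespace_tokenizer(text: str) -> List[str]:
    """Step 1 (split): break the text into tokens on whitespace."""
    return text.split()

def lowercase_filter(tokens: List[str]) -> List[str]:
    """Step 2 (normalize): lowercase every token."""
    return [t.lower() for t in tokens]

def strip_punctuation_filter(tokens: List[str]) -> List[str]:
    """Hypothetical extra filter: trim leading/trailing ASCII punctuation
    and drop tokens that end up empty."""
    stripped = (t.strip(string.punctuation) for t in tokens)
    return [t for t in stripped if t]

def make_analyzer(tokenizer: Tokenizer, filters: List[TokenFilter]) -> Callable[[str], List[str]]:
    """Compose one tokenizer with an ordered list of token filters."""
    def analyze(text: str) -> List[str]:
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze

analyzer = make_analyzer(whitespace_tokenizer, [lowercase_filter, strip_punctuation_filter])
print(analyzer("Hello, Elasticsearch World!"))  # ['hello', 'elasticsearch', 'world']
```

Keeping the tokenizer and filters separate makes it easy to swap in a different splitting rule or add further normalization without touching the rest of the pipeline.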