Friday, 8 April 2016

What are Elasticsearch Analyzers?

Analyzers are pre-processors that run on the text of a field before the inverted index is generated.

So why do we need Analyzers in Elasticsearch?

Consider a field for which we want to build an inverted index, with these two documents:

Document 1: "Hello Elasticsearch World "
Document 2: "hello Elasticsearch world"

Without any pre-processing, the inverted index looks like this:

Term(Document,Frequency)
Hello(1,1)
World(1,1)
Elasticsearch(1,1),(2,1)
hello(2,1)
world(2,1)
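
As a rough illustration of how such an index comes about, here is a minimal Python sketch. It is not how Elasticsearch is implemented; the function name build_inverted_index and the dictionary layout are assumptions made for this example. It splits each document on whitespace and records (document, frequency) pairs per term, with no other processing.

```python
from collections import Counter, defaultdict


def build_inverted_index(documents):
    """Map each term to a list of (document id, frequency) postings."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        # str.split() with no arguments splits on runs of whitespace
        counts = Counter(text.split())
        for term, frequency in counts.items():
            index[term].append((doc_id, frequency))
    return dict(index)


documents = {
    1: "Hello Elasticsearch World ",
    2: "hello Elasticsearch world",
}

index = build_inverted_index(documents)
for term, postings in index.items():
    print(term, postings)

# A lookup for "hello" only finds document 2, because "Hello" is a different term
print(index.get("hello"))
```

Looking up "hello" in this index misses document 1, which is the partial-result problem in concrete form.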

It's evident that "Hello" and "hello" are the same word, differing only in case. Because the index holds two separate entries for the same word, a query for either form returns only partial results when we need a case-insensitive search.

To fix this issue, let's convert the words to lowercase before creating the inverted index.

Term(Document,Frequency)
hello(1,1),(2,1)
world(1,1),(2,1)
elasticsearch(1,1),(2,1)
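
The effect of lowercasing can be sketched in Python as well. The helper names build_lowercased_index and search are hypothetical and exist only for this illustration. Note that the query term is lowercased in the same way as the indexed terms; otherwise a search for "Hello" would still find nothing.

```python
from collections import Counter, defaultdict


def build_lowercased_index(documents):
    """Build an inverted index whose terms are lowercased first."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        terms = [token.lower() for token in text.split()]
        for term, frequency in Counter(terms).items():
            index[term].append((doc_id, frequency))
    return dict(index)


def search(index, query_term):
    """Return the ids of documents containing the term, ignoring case."""
    # Apply the same lowercasing to the query as to the indexed terms
    postings = index.get(query_term.lower(), [])
    return [doc_id for doc_id, _ in postings]


documents = {
    1: "Hello Elasticsearch World ",
    2: "hello Elasticsearch world",
}

index = build_lowercased_index(documents)
print(search(index, "Hello"))  # [1, 2]
print(search(index, "WORLD"))  # [1, 2]
```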

This is what analyzers in Elasticsearch are for. This is a simple illustration; analyzers are designed to do much more than this example shows.

In simple terms, an analyzer does two things:
  1. Splits the text into individual terms (tokens), for example on whitespace.
  2. Standardizes the individual terms, for example by lowercasing them, so they are searchable.
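
The two steps above can be sketched as a small pipeline in Python. The function names tokenize, normalize and analyze are made up for this example and are not Elasticsearch APIs; real analyzers offer many more kinds of tokenization and normalization than whitespace splitting and lowercasing.

```python
def tokenize(text):
    """Step 1: split the text into individual tokens on whitespace."""
    return text.split()


def normalize(tokens):
    """Step 2: standardize each token, here simply by lowercasing it."""
    return [token.lower() for token in tokens]


def analyze(text):
    """Run both steps in order, as a very simplified analyzer would."""
    return normalize(tokenize(text))


print(analyze("Hello Elasticsearch World "))
# ['hello', 'elasticsearch', 'world']
```

Keeping the two steps as separate functions mirrors the idea that each stage can be swapped independently, for instance a different splitting rule with the same lowercasing step.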


