<h1 id="analyzer">Analyzer<a aria-hidden="true" class="anchor-heading icon-link" href="#analyzer"></a></h1>
<blockquote>
<p>Before text is indexed, it’s passed through an analyzer. The analyzer, specified in the <a href="/notes/xll842rugo8cjkepz9d87bm">IndexWriter</a>'s constructor, is in charge of extracting those tokens out of text that should be indexed and eliminating the rest. If the content to be indexed isn’t plain text, you should first extract plain text from it before indexing. Chapter 7 shows how to use Tika to extract text from the most common rich-media document formats. Analyzer is an abstract class, but Lucene comes with several implementations of it. Some
of them deal with skipping stop words (frequently used words that don’t help distinguish one document from the other, such as a, an, the, in, and on); some deal with conversion of tokens to lowercase letters, so that searches aren’t case sensitive; and so on. Analyzers are an important part of Lucene and can be used for much more than simple input filtering. For a developer integrating Lucene into an application, the choice of analyzer(s) is a critical element of application design. You’ll learn much more about them in chapter 4.
The analysis process requires a document, containing separate fields to be indexed. - <a href="/notes/i099m1uvs7ztfvugwjyn3hj">Lucene in Action 2nd Edition</a></p>
</blockquote>
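To make the two filtering steps named in the quote concrete (lowercasing and stop-word removal), here is a minimal sketch in plain Java. It does not use Lucene and does not show Lucene's `Analyzer` API. The class name `SimpleAnalyzerSketch`, the method `analyze`, and the tokenization rule (split on anything that is not a letter or digit) are assumptions made for illustration. The stop-word list is the one given as an example in the passage above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer; this is NOT Lucene's Analyzer class.
class SimpleAnalyzerSketch {
    // Stop words taken from the examples in the quoted passage.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "in", "on");

    // Splits text on runs of non-letter, non-digit characters (an assumed rule),
    // lowercases each token, and drops tokens that are stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield an empty leading element
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

Calling `SimpleAnalyzerSketch.analyze("The Quick Brown Fox jumped on a Lazy Dog")` keeps only the lowercase tokens that would be worth indexing. A real Lucene analyzer does considerably more, and its behavior depends on the implementation chosen.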

