Lucene is a Java-based library.
It is not a web application and it doesn't run as a server.
It doesn't have any configuration files.
Lucene provides APIs (Java classes: Document, Field, IndexWriter, IndexSearcher, ...) that can be used to index and search documents.
A document in Lucene is a collection of fields which are name-value pairs.
A value can be a string, number, date, location, ...
Lucene supports multi-valued fields, which means a single field can store several values.
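The document model described above can be sketched with plain Java collections. This is only an illustration: ToyDocument and its methods are hypothetical names and not Lucene's Document or Field classes. In Lucene itself, a multi-valued field is typically produced by adding several fields with the same name to one document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a document: NOT Lucene's Document/Field classes.
final class ToyDocument {
    // Field name -> values, in insertion order. The list models multi-valued fields.
    private final Map<String, List<Object>> fields = new LinkedHashMap<>();

    // Adding the same name again appends a value instead of overwriting it.
    ToyDocument add(String name, Object value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        return this;
    }

    // All values stored under a field name (empty list if the field is absent).
    List<Object> values(String name) {
        return fields.getOrDefault(name, new ArrayList<>());
    }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument()
            .add("title", "Lucene in a nutshell") // string value
            .add("year", 2024)                    // numeric value
            .add("tag", "search")                 // first value of "tag"
            .add("tag", "java");                  // second value of "tag"
        System.out.println(doc.values("tag"));
    }
}
```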
When indexing a text (document), Lucene will use text analyzers to tokenize the unstructured text into a stream of words (tokens).
Lucene can be configured to apply further operations to the extracted tokens: for example, a token can be discarded, substituted, or reduced to its root form (stemming).
The result of text analysis is a list of tokens called terms (text -> tokens -> terms).
Text analysis can be applied to each field of a document and Lucene will index the terms of each field.
Lucene indexes each field's terms along with their ordinal positions in the text and links to the documents that contain them
(this is why it's called an inverted index: it maps terms to documents rather than documents to terms).
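The structure described above (term, position, link to documents) can be sketched as a nested map. This is a simplified in-memory illustration, not Lucene's actual index format; ToyInvertedIndex and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy positional inverted index: an illustration, NOT Lucene's index format.
final class ToyInvertedIndex {
    // term -> (document id -> positions of the term in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Index the already-analyzed terms of one document; position = ordinal in the list.
    void addDocument(int docId, List<String> terms) {
        for (int pos = 0; pos < terms.size(); pos++) {
            postings.computeIfAbsent(terms.get(pos), t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // The "inverted" lookup: from a term to the documents that contain it.
    Set<Integer> documentsFor(String term) {
        Map<Integer, List<Integer>> byDoc = postings.get(term);
        return byDoc == null ? Collections.<Integer>emptySet() : byDoc.keySet();
    }

    // Positions of a term inside one document (empty if it does not occur).
    List<Integer> positions(String term, int docId) {
        Map<Integer, List<Integer>> byDoc = postings.get(term);
        if (byDoc == null || !byDoc.containsKey(docId)) {
            return Collections.emptyList();
        }
        return byDoc.get(docId);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument(1, java.util.Arrays.asList("lucene", "is", "a", "library"));
        index.addDocument(2, java.util.Arrays.asList("a", "search", "library", "a"));
        System.out.println(index.documentsFor("library"));
        System.out.println(index.positions("a", 2));
    }
}
```

Storing positions is what makes position-dependent queries (such as phrase matching) possible without re-reading the original text.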
Text Analysis includes tokenization and filtering:
Tokenizers: They extract tokens from the provided unstructured text (whitespace, regular expressions, ...).
Filters: They process the stream of tokens extracted by the tokenizers.
They can remove unneeded tokens from the stream (punctuation, ...),
they can transform tokens (lower-case, upper-case, ...),
and they can add new tokens to the stream (synonyms, ...).
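The tokenizer and filter steps above can be sketched as a small chain in plain Java. This does not use Lucene's Analyzer or TokenStream API; ToyAnalyzer, the stop-word list, and the quick/fast synonym are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy analysis chain (tokenizer + filters); NOT Lucene's Analyzer/TokenStream API.
final class ToyAnalyzer {
    // Hypothetical stop-word list, chosen only for illustration.
    private static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "the", "is"));

    // Tokenizer: split the raw text on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Filters: strip punctuation, lower-case, drop stop words, add a synonym.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            // Keep only letters and digits, then normalize the case.
            String t = token.replaceAll("[^\\p{L}\\p{N}]", "").toLowerCase(Locale.ROOT);
            if (t.isEmpty() || STOP_WORDS.contains(t)) {
                continue; // token removed from the stream
            }
            terms.add(t);
            if (t.equals("quick")) {
                terms.add("fast"); // hypothetical synonym added to the stream
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick, brown fox!"));
    }
}
```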
When searching for documents, Lucene will parse the query string using a query parser (lucene, dismax and edismax are the names of the parsers that Solr provides on top of Lucene)
and it will apply the same text analysis (if configured) to each field of the query before executing the search query.
A relevance score will be assigned to each document in the search results.
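The search flow above can be sketched as follows: the query goes through the same analysis as the indexed text, and each matching document gets a score. The score here (the number of occurrences of query terms in the document) is a deliberately naive assumption, not Lucene's scoring formula (recent Lucene versions use BM25 by default), and the sketch scans documents directly instead of using an inverted index. ToySearch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy search: same analysis at index and query time, plus a naive score.
// NOT Lucene's IndexSearcher, query parser, or scoring formula.
final class ToySearch {
    // docId -> analyzed terms of the document
    private final Map<Integer, List<String>> docs = new LinkedHashMap<>();

    // Shared analysis: lower-case, then split on non-alphanumeric characters.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    void index(int docId, String text) {
        docs.put(docId, analyze(text));
    }

    // Returns docId -> score for matching documents, highest score first.
    Map<Integer, Integer> search(String query) {
        List<String> queryTerms = analyze(query); // same analysis as at index time
        List<int[]> hits = new ArrayList<>();     // each entry is {docId, score}
        for (Map.Entry<Integer, List<String>> e : docs.entrySet()) {
            int score = 0;
            for (String term : e.getValue()) {
                if (queryTerms.contains(term)) {
                    score++; // naive score: count occurrences of query terms
                }
            }
            if (score > 0) {
                hits.add(new int[] {e.getKey(), score});
            }
        }
        hits.sort((a, b) -> Integer.compare(b[1], a[1])); // descending by score
        Map<Integer, Integer> ranked = new LinkedHashMap<>();
        for (int[] h : hits) {
            ranked.put(h[0], h[1]);
        }
        return ranked;
    }

    public static void main(String[] args) {
        ToySearch s = new ToySearch();
        s.index(1, "Lucene is a search library");
        s.index(2, "Search, search and more SEARCH");
        System.out.println(s.search("SEARCH"));
    }
}
```

Because both sides are lower-cased by the same analysis, the query SEARCH matches the indexed term search; skipping analysis on either side would make the lookup miss.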
Lucene provides:
Text analysis that transforms a text string into a list of terms (tokenizers, filters).
Query parser
Scoring algorithm
Inverted index to find documents related to an indexed term.
Search features (query completion, query spell checker, highlighter, ...)
Search enhancing features (faceted navigation, spatial search, ...)