Apache Solr | schema.xml
  1. "schema.xml": Structure
  2. Unique Key
  3. Valid attributes for fields
  4. Field naming convention
  5. Dynamic fields
  6. Field types
  7. Tokenizers
  8. Filters

  1. "schema.xml": Structure
    Visit the Solr Wiki page for more information: https://cwiki.apache.org/confluence/display/solr/SchemaXml

    See these sample schema files shipped with Solr 9.8.1 for more information:
    ► ${SOLR_HOME}/configsets/_default/conf/managed-schema.xml
    ► ${SOLR_HOME}/configsets/sample_techproducts_configs/conf/managed-schema.xml

    <schema name="" version="1.6">

        <uniqueKey />

        <field />

        <dynamicField />

        <copyField />

        <fieldType />

        <fieldType name="" class="">
            <analyzer type="index">
                <tokenizer class="" />
                <filter class="" />
            </analyzer>

            <analyzer type="query">
                <tokenizer class="" />
                <filter class="" />
            </analyzer>
        </fieldType>

    </schema>
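
    To make the skeleton concrete, here is a minimal sketch of a complete schema. The schema name "example" and the field name "title" are assumptions chosen for illustration, and the type attributes are simplified compared with the sample configsets.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Hypothetical minimal schema: all elements are children of <schema>. -->
    <schema name="example" version="1.6">

        <uniqueKey>id</uniqueKey>

        <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
        <field name="_version_" type="plong" indexed="false" stored="false" />
        <field name="title" type="text_general" indexed="true" stored="true" />
        <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true" />

        <!-- Any field whose name ends in _s is treated as a string. -->
        <dynamicField name="*_s" type="string" indexed="true" stored="true" />

        <!-- Copy the title into the catch-all _text_ field at index time. -->
        <copyField source="title" dest="_text_" />

        <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" />
        <fieldType name="plong" class="solr.LongPointField" docValues="true" />

        <!-- A single <analyzer> without a type applies to both indexing and querying. -->
        <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
            <analyzer>
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
            </analyzer>
        </fieldType>

    </schema>
    ```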
  2. Unique Key
    The field used to determine and enforce document uniqueness.
    This field is required unless it is explicitly marked with required="false".

    <uniqueKey>id</uniqueKey>
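
    As a short sketch, the unique key is usually paired with a matching field declaration. The example assumes a single-valued string field named id, as in the sample schemas.

    ```xml
    <!-- The field referenced by <uniqueKey> must also be declared as a field. -->
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

    <!-- Adding a document whose id already exists replaces the earlier document. -->
    <uniqueKey>id</uniqueKey>
    ```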
  3. Valid attributes for fields
    <field ... />

    • name: [mandatory] - the name of the field.

    • type: [mandatory] - a name of a field type from the <fieldType> section.

    • indexed: [default=true] - if this field should be indexed (searchable or sortable).

    • stored: [default=true] - if this field should be retrievable.

    • required: [default=false] - whether a value for this field is mandatory.
      Indexing a document that has no value for a required field fails with an error.

    • default: a value that should be used if no value is specified when adding a document.

    • multiValued: [default=false] - whether this field may contain multiple values per document.

    • termPositions: stores position information with the term vector.
      This will increase storage costs.

    • termOffsets: stores offset information with the term vector.
      This will increase storage costs.

    • docValues: [default=true] - whether this field should have doc values.
      Doc values are recommended (and required, if you are using *PointField fields) for faceting, grouping, sorting and function queries.
      Doc values make the index faster to load, more NRT-friendly and more memory-efficient.
      They are currently only supported by StrField, UUIDField and all *PointField fields. Depending on the field type, they might require the field to be single-valued, to be required, or to have a default value
      (check the documentation of the field type you're interested in for more information).

    • omitNorms: (expert) set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory).
      Only full-text fields or fields that need an index-time boost need norms.
      Norms are omitted for primitive (non-analyzed) types by default.

    • termVectors: [default=false] set to true to store the term vector for a given field.
      When using MoreLikeThis, fields used for similarity should be stored for best performance.
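
    The attributes above can be combined as in the following sketch. The field names (title, price, tags, body) are hypothetical; the types are the ones defined in the sample configsets.

    ```xml
    <!-- Searchable, retrievable, mandatory single-valued text. -->
    <field name="title" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />

    <!-- Numeric field with a fallback value, backed by doc values for sorting and faceting. -->
    <field name="price" type="pfloat" indexed="true" stored="true" default="0.0" docValues="true" />

    <!-- String field that accepts several values per document. -->
    <field name="tags" type="string" indexed="true" stored="true" multiValued="true" />

    <!-- Full-text field that also stores term vectors with positions and offsets. -->
    <field name="body" type="text_general" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true" />
    ```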
  4. Field naming convention
    Field names should consist of alphanumeric or underscore characters only and not start with a digit.
    Names with both leading and trailing underscores (e.g. _version_) are reserved.

    Special Names:
    • id
      <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

    • _version_
      <field name="_version_" type="plong" indexed="false" stored="false" />

    • _root_
      <field name="_root_" type="string" indexed="true" stored="false" docValues="false" />

    • _text_
      <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true" />
  5. Dynamic fields
    Dynamic field definitions allow using convention over configuration for fields via the specification of patterns to match field names.
    Example: <dynamicField name="*_i" /> will match any field ending in _i (like myid_i, z_i).
    Restriction: the glob-like pattern in the name attribute must have a "*" only at the start or the end.

    <dynamicField name="*_i" type="pint" indexed="true" stored="true" />
    <dynamicField name="*_is" type="pints" indexed="true" stored="true" />

    <dynamicField name="*_s" type="string" indexed="true" stored="true" />
    <dynamicField name="*_ss" type="strings" indexed="true" stored="true" />

    <dynamicField name="*_l" type="plong" indexed="true" stored="true" />
    <dynamicField name="*_ls" type="plongs" indexed="true" stored="true" />

    <dynamicField name="*_t" type="text_general" indexed="true" stored="true" multiValued="false" />
    <dynamicField name="*_txt" type="text_general" indexed="true" stored="true" />

    <dynamicField name="*_b" type="boolean" indexed="true" stored="true" />
    <dynamicField name="*_bs" type="booleans" indexed="true" stored="true" />

    <dynamicField name="*_f" type="pfloat" indexed="true" stored="true" />
    <dynamicField name="*_fs" type="pfloats" indexed="true" stored="true" />

    <dynamicField name="*_d" type="pdouble" indexed="true" stored="true" />
    <dynamicField name="*_ds" type="pdoubles" indexed="true" stored="true" />

    <dynamicField name="random_*" type="random" />

    <dynamicField name="ignored_*" type="ignored" />

    <dynamicField name="*_str" type="strings" indexed="false" stored="false" docValues="true" useDocValuesAsStored="false" />

    <dynamicField name="*_dt" type="pdate" indexed="true" stored="true" />
    <dynamicField name="*_dts" type="pdate" indexed="true" stored="true" multiValued="true" />

    <dynamicField name="*_p" type="location" indexed="true" stored="true" />
    <dynamicField name="*_srpt" type="location_rpt" indexed="true" stored="true" />

    <!-- payloaded dynamic fields -->
    <dynamicField name="*_dpf" type="delimited_payloads_float" indexed="true" stored="true" />
    <dynamicField name="*_dpi" type="delimited_payloads_int" indexed="true" stored="true" />
    <dynamicField name="*_dps" type="delimited_payloads_string" indexed="true" stored="true" />

    <dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true" />

    <dynamicField name="*_ws" type="text_ws" indexed="true" stored="true" />

    <dynamicField name="*_t_sort" type="text_gen_sort" indexed="true" stored="true" multiValued="false" />
    <dynamicField name="*_txt_sort" type="text_gen_sort" indexed="true" stored="true" />

    <dynamicField name="*_txt_rev" type="text_general_rev" indexed="true" stored="true" />

    <dynamicField name="*_phon_en" type="phonetic_en" indexed="true" stored="true" />

    <dynamicField name="*_s_lower" type="lowercase" indexed="true" stored="true" />

    <dynamicField name="*_descendent_path" type="descendent_path" indexed="true" stored="true" />
    <dynamicField name="*_ancestor_path" type="ancestor_path" indexed="true" stored="true" />

    <dynamicField name="*_point" type="point" indexed="true" stored="true" />

    <dynamicField name="*_txt_en" type="text_en" indexed="true" stored="true" />
    <dynamicField name="*_txt_en_split" type="text_en_splitting" indexed="true" stored="true" />
    <dynamicField name="*_txt_en_split_tight" type="text_en_splitting_tight" indexed="true" stored="true" />

    <dynamicField name="*_txt_ar" type="text_ar" indexed="true" stored="true" />
    <dynamicField name="*_txt_bg" type="text_bg" indexed="true" stored="true" />
    <dynamicField name="*_txt_ca" type="text_ca" indexed="true" stored="true" />
    <dynamicField name="*_txt_cjk" type="text_cjk" indexed="true" stored="true" />
    <dynamicField name="*_txt_cz" type="text_cz" indexed="true" stored="true" />
    <dynamicField name="*_txt_da" type="text_da" indexed="true" stored="true" />
    <dynamicField name="*_txt_de" type="text_de" indexed="true" stored="true" />
    <dynamicField name="*_txt_el" type="text_el" indexed="true" stored="true" />
    <dynamicField name="*_txt_es" type="text_es" indexed="true" stored="true" />
    <dynamicField name="*_txt_eu" type="text_eu" indexed="true" stored="true" />
    <dynamicField name="*_txt_fa" type="text_fa" indexed="true" stored="true" />
    <dynamicField name="*_txt_fi" type="text_fi" indexed="true" stored="true" />
    <dynamicField name="*_txt_fr" type="text_fr" indexed="true" stored="true" />
    <dynamicField name="*_txt_ga" type="text_ga" indexed="true" stored="true" />
    <dynamicField name="*_txt_gl" type="text_gl" indexed="true" stored="true" />
    <dynamicField name="*_txt_hi" type="text_hi" indexed="true" stored="true" />
    <dynamicField name="*_txt_hu" type="text_hu" indexed="true" stored="true" />
    <dynamicField name="*_txt_hy" type="text_hy" indexed="true" stored="true" />
    <dynamicField name="*_txt_id" type="text_id" indexed="true" stored="true" />
    <dynamicField name="*_txt_it" type="text_it" indexed="true" stored="true" />
    <dynamicField name="*_txt_ja" type="text_ja" indexed="true" stored="true" />
    <dynamicField name="*_txt_ko" type="text_ko" indexed="true" stored="true" />
    <dynamicField name="*_txt_lv" type="text_lv" indexed="true" stored="true" />
    <dynamicField name="*_txt_nl" type="text_nl" indexed="true" stored="true" />
    <dynamicField name="*_txt_no" type="text_no" indexed="true" stored="true" />
    <dynamicField name="*_txt_pt" type="text_pt" indexed="true" stored="true" />
    <dynamicField name="*_txt_ro" type="text_ro" indexed="true" stored="true" />
    <dynamicField name="*_txt_ru" type="text_ru" indexed="true" stored="true" />
    <dynamicField name="*_txt_sv" type="text_sv" indexed="true" stored="true" />
    <dynamicField name="*_txt_th" type="text_th" indexed="true" stored="true" />
    <dynamicField name="*_txt_tr" type="text_tr" indexed="true" stored="true" />
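
    To illustrate how these patterns are applied, here is a sketch of a document in Solr's XML update format. The field names and values are made up; none of the suffixed fields needs an explicit <field> declaration because each one matches a dynamic field pattern listed above.

    ```xml
    <add>
        <doc>
            <field name="id">doc-1</field>
            <!-- Matches *_i, so it is indexed as pint. -->
            <field name="views_i">42</field>
            <!-- Matches *_ss (multi-valued strings), so the field can repeat. -->
            <field name="tags_ss">solr</field>
            <field name="tags_ss">search</field>
            <!-- Matches *_t, so it is analyzed as text_general. -->
            <field name="title_t">Hello Solr</field>
            <!-- Matches *_dt, so it is indexed as pdate. -->
            <field name="created_dt">2025-01-01T00:00:00Z</field>
        </doc>
    </add>
    ```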
  6. Field types
    <fieldType ... />

    • String field types:
      string [class: solr.StrField]
      strings [class: solr.StrField]

    • Boolean field types:
      boolean [class: solr.BoolField]
      booleans [class: solr.BoolField]

    • Numeric field types (point-based, so the Trie-era precisionStep attribute does not apply):
      pint [class: solr.IntPointField]
      pints [class: solr.IntPointField]

      plong [class: solr.LongPointField]
      plongs [class: solr.LongPointField]

      pfloat [class: solr.FloatPointField]
      pfloats [class: solr.FloatPointField]

      pdouble [class: solr.DoublePointField]
      pdoubles [class: solr.DoublePointField]

    • Date field types (point-based, so the Trie-era precisionStep attribute does not apply):
      pdate [class: solr.DatePointField]
      pdates [class: solr.DatePointField]

    • Binary field types:
      binary [class: solr.BinaryField]

    • Random field types:
      random [class: solr.RandomSortField]

    • Generic field types:
      text_general [class: solr.TextField]
      text_ws [class: solr.TextField]
      text_general_rev [class: solr.TextField]

      text_en [class: solr.TextField]
      text_en_splitting [class: solr.TextField]
      text_en_splitting_tight [class: solr.TextField]

      text_[ar|bg|ca|cjk|cz|da|de|el|es|eu|fa|fi|fr|ga|gl|hi|hu|hy|id|it|ja|ko|lv|nl|no|pt|ro|ru|sv|th|tr] [class: solr.TextField]

      text_gen_sort [class: solr.SortableTextField]

      point [class: solr.PointType]

      location [class: solr.LatLonPointSpatialField]
      location_rpt [class: solr.SpatialRecursivePrefixTreeFieldType]

      delimited_payloads_float [class: solr.TextField]
      delimited_payloads_int [class: solr.TextField]
      delimited_payloads_string [class: solr.TextField]

      phonetic_en [class: solr.TextField]

      lowercase [class: solr.TextField]

      descendent_path [class: solr.TextField]
      ancestor_path [class: solr.TextField]
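
    For orientation, the simple (non-analyzed) types in this list are declared roughly as follows. This is a sketch modeled on the sample configsets; the exact attribute values are assumptions and may differ from the files shipped with your Solr version.

    ```xml
    <!-- The singular name is single-valued; the plural name sets multiValued="true". -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" />
    <fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true" docValues="true" />

    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" />

    <fieldType name="pint" class="solr.IntPointField" docValues="true" />
    <fieldType name="pints" class="solr.IntPointField" docValues="true" multiValued="true" />

    <fieldType name="pdate" class="solr.DatePointField" docValues="true" />

    <fieldType name="binary" class="solr.BinaryField" />
    ```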
  7. Tokenizers
    Visit the Solr wiki page for more information: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#TokenizerFactories

    solr.KeywordTokenizerFactory
    solr.LetterTokenizerFactory
    solr.WhitespaceTokenizerFactory
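    solr.LowerCaseTokenizerFactory (no longer available in Solr 9; use solr.LetterTokenizerFactory followed by solr.LowerCaseFilterFactory)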
    solr.StandardTokenizerFactory
    solr.ClassicTokenizerFactory
    solr.UAX29URLEmailTokenizerFactory
    solr.PatternTokenizerFactory
    solr.PathHierarchyTokenizerFactory
    solr.ICUTokenizerFactory
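
    As a sketch of how a tokenizer is wired into a field type, the following definition is modeled on the descendent_path type from the sample configsets. It uses a different tokenizer at index time and at query time.

    ```xml
    <fieldType name="descendent_path" class="solr.TextField">
        <!-- Index time: "a/b/c" is split into the tokens "a", "a/b" and "a/b/c". -->
        <analyzer type="index">
            <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
        </analyzer>
        <!-- Query time: the whole input is kept as a single token. -->
        <analyzer type="query">
            <tokenizer class="solr.KeywordTokenizerFactory" />
        </analyzer>
    </fieldType>
    ```

    With this setup, a query for a/b matches every document whose path starts with a/b.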
  8. Filters
    Visit the Solr wiki page for more information: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#TokenFilterFactories

    solr.ClassicFilterFactory
    solr.ApostropheFilterFactory
    solr.LowerCaseFilterFactory
    solr.TypeTokenFilterFactory
    solr.TrimFilterFactory
    solr.TruncateTokenFilterFactory
    solr.PatternCaptureGroupFilterFactory
    solr.PatternReplaceFilterFactory
    solr.StopFilterFactory
    solr.CommonGramsFilterFactory
    solr.EdgeNGramFilterFactory
    solr.KeepWordFilterFactory
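    solr.WordDelimiterFilterFactory (deprecated; prefer solr.WordDelimiterGraphFilterFactory)
    solr.SynonymFilterFactory (deprecated; prefer solr.SynonymGraphFilterFactory)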
    solr.RemoveDuplicatesTokenFilterFactory
    solr.ISOLatin1AccentFilterFactory (no longer available in Solr 9; use solr.ASCIIFoldingFilterFactory)
    solr.ASCIIFoldingFilterFactory
    solr.PhoneticFilterFactory
    solr.DoubleMetaphoneFilterFactory
    solr.BeiderMorseFilterFactory
    solr.ShingleFilterFactory
    solr.PositionFilterFactory (no longer available in Solr 9)
    solr.ReversedWildcardFilterFactory
    solr.CollationKeyFilterFactory
    solr.ICUCollationKeyFilterFactory
    solr.ICUNormalizer2FilterFactory
    solr.ICUFoldingFilterFactory
    solr.ICUTransformFilterFactory
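
    Filters are applied in the order they are declared, each one receiving the tokens produced by the previous step. The following sketch chains two of the filters listed above after a standard tokenizer; the type name text_folded is hypothetical.

    ```xml
    <fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
        <!-- No type attribute: the same chain is used for indexing and querying. -->
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory" />
            <!-- "Café" becomes "café". -->
            <filter class="solr.LowerCaseFilterFactory" />
            <!-- "café" becomes "cafe". -->
            <filter class="solr.ASCIIFoldingFilterFactory" />
        </analyzer>
    </fieldType>
    ```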
© 2025  mtitek