Apache Solr | SolrJ: Extract text using the request handler /update/extract
  1. Notes
  2. Example
  3. Notes

  1. Notes
    Make sure the request handler "/update/extract" is configured in the solrconfig.xml file.

    For the code below to work:

    ► Make sure to update the variables ("solrUrl", "collectionName", ...) with your information.

    For automatic commits to make newly indexed documents searchable, set the property "openSearcher" to true (solrconfig.xml file -> updateHandler -> autoCommit).

    Note: You can also force a commit by calling the URL: http://localhost:8983/solr/COLLECTION-NAME/update?commit=true
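
    The commit URL above can also be called from Java. The following is a minimal sketch using only the JDK HTTP client (Java 11+), not SolrJ; the class and method names are illustrative, and the base URL and collection name are placeholders to replace with your own.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ForceCommit {

    // Builds the commit URL for a collection (same shape as the URL in the note above).
    static URI commitUri(String solrUrl, String collectionName) {
        return URI.create(solrUrl + "/" + collectionName + "/update?commit=true");
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values: adjust to your Solr installation.
        URI uri = commitUri("http://localhost:8983/solr", "COLLECTION-NAME");

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // Print the HTTP status and the response body returned by Solr.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```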
  2. Example
    Extract text using the request handler /update/extract:
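
    A minimal SolrJ sketch of such a request is shown here. It assumes the solr-solrj library (8.x or 9.x) is on the classpath and uses HttpSolrClient, which is deprecated in 9.x in favor of Http2SolrClient. The class name, the file path, the content type and the "literal.id" parameter (which assumes a unique key field named "id") are illustrative assumptions, not values taken from the original page.

```java
import java.io.File;
import java.io.IOException;
import java.util.UUID;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.NamedList;

public class ExtractTextExample {

    // Builds the extract request without contacting Solr.
    static ContentStreamUpdateRequest buildRequest(File file, String contentType, String id) throws IOException {
        ContentStreamUpdateRequest request = new ContentStreamUpdateRequest("/update/extract");
        request.addFile(file, contentType);
        // Assumption: the schema's unique key field is named "id".
        request.setParam("literal.id", id);
        // Explicit commit: waitFlush = true, waitSearcher = true.
        request.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        return request;
    }

    public static void main(String[] args) throws IOException, SolrServerException {
        // Placeholder values: update them with your information.
        String solrUrl = "http://localhost:8983/solr";
        String collectionName = "COLLECTION-NAME";
        File file = new File("/path/to/document.pdf"); // hypothetical path

        try (SolrClient solrClient = new HttpSolrClient.Builder(solrUrl).build()) {
            ContentStreamUpdateRequest request =
                    buildRequest(file, "application/pdf", UUID.randomUUID().toString());
            NamedList<Object> response = solrClient.request(request, collectionName);
            System.out.println(response);
        }
    }
}
```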

    This should create the following document:
  3. Notes
    • If your Solr schema defines a required unique key, you need to generate a value for that field automatically (see an example below).

    • You can configure the request handler to capture Tika attributes and save them in specific fields.

      To save Tika attributes in a separate field "meta", add the following option to the request handler:

      In this case, the "content" field will hold only the extracted text.

      To lowercase the names of the extracted fields/attributes, add the following option to the request handler:

      To map the extracted fields/attributes to separate fields, add options with the prefix "fmap." to the request handler:
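
    As an alternative to handler defaults, the options from these notes can be sent as parameters of each extract request. The sketch below only builds the parameter set with the JDK (Java 10+). "literal.id", "captureAttr", "lowernames" and the "fmap." prefix are Solr Cell parameters; the class name, the unique key name "id" and the target field "file_name" are hypothetical and used for illustration only.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;
import java.util.stream.Collectors;

public class ExtractParams {

    // Collects the request parameters discussed in the notes above.
    static Map<String, String> extractParams(String id) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", id);                 // value for the unique key (assumed to be "id")
        params.put("captureAttr", "true");            // capture attributes into separate fields
        params.put("lowernames", "true");             // lowercase the extracted field names
        params.put("fmap.stream_name", "file_name");  // hypothetical mapping: stream_name -> file_name
        return params;
    }

    // Encodes the parameters as a URL query string.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // A random UUID is one simple way to generate a unique key value on the client side.
        String id = UUID.randomUUID().toString();
        System.out.println(toQueryString(extractParams(id)));
    }
}
```

    With SolrJ, the same key/value pairs could be set on the request object through its setParam(name, value) method.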

    To apply the notes mentioned above, adjust the "solrconfig.xml" file with the following:



    You also need to adjust the Solr schema and add a special dynamic field (*) to be able to index:
    ► Tika fields (x_parsed_by, ...)
    ► and Solr fields (stream_name, stream_source_info, stream_size, stream_content_type)
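
    If the collection uses a managed schema, a catch-all dynamic field could also be added through the Solr Schema API instead of editing the schema file by hand. The following sketch uses only the JDK HTTP client (Java 11+); the class name, the field type "text_general" and the field attributes are assumptions to adapt to your own schema.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AddCatchAllDynamicField {

    // Builds the Schema API command that adds the dynamic field "*".
    static String addDynamicFieldJson(String fieldType) {
        return "{\"add-dynamic-field\":{"
                + "\"name\":\"*\","
                + "\"type\":\"" + fieldType + "\","
                + "\"indexed\":true,\"stored\":true,\"multiValued\":true}}";
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values: adjust to your Solr installation.
        String solrUrl = "http://localhost:8983/solr";
        String collectionName = "COLLECTION-NAME";

        HttpRequest request = HttpRequest.newBuilder(URI.create(solrUrl + "/" + collectionName + "/schema"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(addDynamicFieldJson("text_general")))
                .build();

        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```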


    To apply these changes, you need to reload the collections that use the updated configuration (Solr schema and config).
    If the changes are not applied, try restarting Solr, and check the Solr logs for errors in your configuration.
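
    In SolrCloud mode, a collection can be reloaded through the Collections API (action=RELOAD). The following is a minimal sketch with the JDK HTTP client (Java 11+); the class name is illustrative, and the base URL and collection name are placeholders.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class ReloadCollection {

    // Builds the Collections API URL that reloads the given collection.
    static URI reloadUri(String solrUrl, String collectionName) {
        String name = URLEncoder.encode(collectionName, StandardCharsets.UTF_8);
        return URI.create(solrUrl + "/admin/collections?action=RELOAD&name=" + name);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values: adjust to your Solr installation.
        URI uri = reloadUri("http://localhost:8983/solr", "COLLECTION-NAME");

        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```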
© 2025  mtitek