LLMs | Topic Modeling
  1. Topic Modeling
  2. Creating topics
  3. Using a model to create labeled topics

  1. Topic Modeling
    Topic modeling is about finding themes (topics) within clusters of text documents. Each topic is described by labels (keywords) that capture the meaning of its cluster.

    BERTopic is a modular topic modeling technique that extracts topic representations. It extracts topics in two steps: text clustering and topic representation. The text clustering step groups semantically similar documents into clusters. The topic representation step then extracts the keywords that describe each cluster.
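    The two steps can be illustrated with a toy, dependency-free sketch. This is not BERTopic's implementation. By default, BERTopic embeds documents with a sentence-transformers model, reduces the embeddings with UMAP, clusters them with HDBSCAN, and scores words with a class-based TF-IDF (c-TF-IDF). In this sketch, the embeddings are hypothetical hand-made 2-D vectors. The clustering is a simple greedy cosine-similarity grouping that stands in for HDBSCAN. The stop-word list is an assumption. Only the weighting tf(t, c) * log(1 + A / f(t)) follows the c-TF-IDF formula from the BERTopic documentation, without the per-class tf normalization. The names DOCS, STOP_WORDS, cluster and c_tf_idf_keywords are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy data: each document maps to a hand-made 2-D vector that
# stands in for the sentence embedding BERTopic would compute.
DOCS = {
    "the cat chases the mouse": (0.9, 0.1),
    "a cat and a kitten": (0.8, 0.2),
    "the kitten sleeps": (0.95, 0.05),
    "the car needs fuel": (0.1, 0.9),
    "a truck and a car": (0.2, 0.8),
    "the truck has a driver": (0.15, 0.85),
}
STOP_WORDS = {"the", "a", "and"}  # assumption: minimal stop-word list


def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def cluster(docs, threshold=0.9):
    """Step 1 (toy stand-in for UMAP + HDBSCAN): put each document into the
    first cluster whose seed (first member) is similar enough, else a new one."""
    clusters = []
    for doc, vec in docs.items():
        for members in clusters:
            if cosine(vec, docs[members[0]]) >= threshold:
                members.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters


def c_tf_idf_keywords(clusters, top_n=3):
    """Step 2: class-based TF-IDF. All documents of a cluster form one class;
    term t in class c scores tf(t, c) * log(1 + A / f(t)), where A is the
    average number of words per class and f(t) is t's frequency over all classes."""
    counts = [
        Counter(w for doc in members for w in doc.split() if w not in STOP_WORDS)
        for members in clusters
    ]
    freq = sum(counts, Counter())
    avg_words = sum(sum(c.values()) for c in counts) / len(counts)
    keywords = []
    for c in counts:
        scores = {t: tf * math.log(1 + avg_words / freq[t]) for t, tf in c.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties: alphabetical
        keywords.append(ranked[:top_n])
    return keywords


clusters = cluster(DOCS)
for topic_id, words in enumerate(c_tf_idf_keywords(clusters)):
    print(topic_id, words)
```

    With these toy vectors, the first three documents form one cluster and the last three form another. Their top keywords come out as cat/kitten/chases and car/truck/driver.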

    Example (to keep the example simple, the documents are one-word sentences):

    See this page for more details about BERTopic:
    https://maartengr.github.io/BERTopic/index.html
  2. Creating topics
    We will create topics from a set of words.

    Install the required modules:

    Python code:

    Run the Python script:

    Output:

    Each topic is represented by the main keywords of its documents. A topic's name is the topic number followed by these keywords, all joined with the underscore character ("_"). A special topic with the number "-1" may also be listed. It collects the outliers, i.e. the documents (here, words) that do not match any of the found topics.
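    The naming convention and the outlier topic can be sketched in plain Python. The topic assignments and keywords below are hypothetical, and topic_name is an illustrative helper, not a BERTopic function. The limit of four keywords per name is also an assumption.

```python
def topic_name(topic_id, keywords, n_words=4):
    """Hypothetical helper (not a BERTopic function): join the topic number and
    its first n_words keywords with underscores, e.g. "0_cat_kitten_dog"."""
    return "_".join([str(topic_id)] + list(keywords[:n_words]))


# Hypothetical topic assignments for one-word documents; -1 marks outliers.
assignments = {"cat": 0, "dog": 0, "kitten": 0, "car": 1, "truck": 1, "banana": -1}
# Hypothetical keywords per topic, most representative first.
topic_keywords = {-1: ["banana"], 0: ["cat", "kitten", "dog"], 1: ["car", "truck"]}

for topic_id, words in sorted(topic_keywords.items()):
    members = [doc for doc, t in assignments.items() if t == topic_id]
    print(topic_name(topic_id, words), members)
```

    Here "banana" fits neither cluster, so it lands in topic -1 rather than being forced into one of the real topics.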

    Chart of the topics (Topic Word Scores): bertopic-barchart-figure.html [figure: Topics Plot]
  3. Using a model to create labeled topics
    BERTopic can use a language model to generate readable labels for the topics, rather than just lists of keywords.

    We will create labeled topics from a set of words.

    To do this, we need to write a prompt with two parts:
    • A subset of documents that best represent the topic, inserted using the [DOCUMENTS] tag.
    • The keywords that describe the topic's cluster, inserted using the [KEYWORDS] tag.
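    BERTopic performs the tag substitution itself when such a prompt is passed to one of its LLM-based representation models (for example, bertopic.representation.TextGeneration). The sketch below only illustrates the idea in plain Python. The template text, fill_prompt, label_topic and the stub model are hypothetical. The exact formatting BERTopic uses for the inserted documents and keywords may differ.

```python
# Hypothetical prompt template using the two tags described above.
PROMPT = """I have a topic that contains the following documents:
[DOCUMENTS]
The topic is described by the following keywords: [KEYWORDS]

Based on the information above, give a short label for this topic."""


def fill_prompt(template, documents, keywords):
    """Replace [DOCUMENTS] with one "- doc" line per representative document
    and [KEYWORDS] with a comma-separated keyword list (assumed formatting)."""
    docs_text = "".join(f"- {doc}\n" for doc in documents)
    return template.replace("[DOCUMENTS]", docs_text).replace("[KEYWORDS]", ", ".join(keywords))


def label_topic(documents, keywords, generate):
    """`generate` is any callable that sends a prompt to a language model and
    returns its text answer (hypothetical; plug in a real model here)."""
    return generate(fill_prompt(PROMPT, documents, keywords)).strip()


def fake_llm(prompt):
    """Stub "model" so the sketch runs without any LLM."""
    return " Pets\n" if "kitten" in prompt else " Vehicles\n"


print(label_topic(["cat", "kitten"], ["cat", "kitten", "dog"], fake_llm))
```

    Passing the model in as a callable keeps the prompt logic independent of any specific LLM client.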


    Python code:

    Run the Python script:

    Output:
© 2025  mtitek