Text Classification
  1. Text Classification
  2. Example: Task-specific model (sentiment analysis)
  3. Example: Text classification with generative models (OpenAI GPT)

  1. Text Classification
    Text classification is a fundamental natural language processing task that assigns predefined labels or categories to text documents. Traditionally framed as supervised learning, it enables machines to categorize text automatically based on its content.

    Modern text classification with Large Language Models follows two primary approaches:
    • Representation Models
      These models convert text into numerical representations (embeddings) that capture semantic meaning:
      • Task-Specific Models: Fine-tuned for particular classification tasks (e.g., sentiment analysis, spam detection). These models are trained on domain-specific datasets and optimized for specific use cases.
      • Embedding Models: Generate general-purpose text embeddings that can be used with traditional machine learning classifiers or similarity-based approaches (a minimal sketch follows this list).

      Both types typically start with pre-trained transformer models like BERT, RoBERTa, or DistilBERT, which are then fine-tuned on task-specific datasets.

    • Generative Models
      Large language models like GPT-4, Claude, or Gemini can perform text classification through:
      • Zero-shot Classification: Classifying text without task-specific training using natural language instructions.
      • Few-shot Learning: Providing a few examples to guide the model's classification behavior.
      • Prompt Engineering: Crafting effective prompts to elicit accurate classifications.
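
    To make the embedding-based approach concrete, here is a minimal sketch that pairs general-purpose sentence embeddings with a traditional classifier. The embedding model ("all-MiniLM-L6-v2") and the toy training data are illustrative assumptions, not a prescribed setup:
    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    # general-purpose embedding model (illustrative choice)
    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    # toy labeled dataset; replace with a real domain-specific dataset
    texts = [
        "I love this product, it works perfectly!",
        "Absolutely fantastic experience.",
        "Terrible quality, broke after one day.",
        "I want my money back, this is awful.",
    ]
    labels = ["positive", "positive", "negative", "negative"]

    # convert texts to fixed-size numerical vectors
    embeddings = encoder.encode(texts)

    # train a traditional classifier on top of the embeddings
    classifier = LogisticRegression().fit(embeddings, labels)

    # classify new text by embedding it with the same encoder
    print(classifier.predict(encoder.encode(["The weather today is great!"])))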

    Advantages of Task-Specific Models
    • High Accuracy: Optimized for specific tasks with domain-relevant training data.
    • Fast Inference: Efficient processing with smaller model sizes.
    • Consistent Performance: Reliable results for the trained task.

    Advantages of Generative Models
    • Flexibility: Handle diverse classification tasks without retraining.
    • Zero-shot Capability: Classify into new categories without examples.
    • Reasoning: Provide explanations for classifications.

    Common Applications:
    • Sentiment Analysis: Determining emotional tone (positive, negative, neutral) in reviews, social media posts, or customer feedback.
    • Topic Classification: Categorizing documents by subject matter (sports, politics, technology, etc.); a zero-shot sketch follows this list.
    • Spam Detection: Filtering unwanted emails or messages.
    • Content Moderation: Identifying inappropriate or harmful content.
    • Document Classification: Organizing legal documents, research papers, or business reports.
    • Language Detection: Identifying the language of a given text.
    • Intent Classification: Understanding user intentions in chatbots and virtual assistants.
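
    Several of these applications can be prototyped without any task-specific training. As an illustration, topic classification can be done with the transformers zero-shot classification pipeline; the model choice (facebook/bart-large-mnli) and the candidate labels below are assumptions for this sketch:
    from transformers import pipeline

    # NLI-based zero-shot classifier (illustrative model choice)
    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

    result = classifier(
        "The team won the championship after a dramatic overtime.",
        candidate_labels=["sports", "politics", "technology"],
    )

    # labels come back sorted by score, highest first
    print(result["labels"][0], result["scores"][0])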
  2. Example: Task-specific model (sentiment analysis)
    Task-specific models offer high accuracy and efficiency for well-defined classification tasks. Here's an example using a fine-tuned RoBERTa model for sentiment analysis.

    Python code:
    $ vi representation-sentiment.py
    from transformers import AutoTokenizer, AutoConfig, AutoModelForSequenceClassification
    import numpy as np
    from scipy.special import softmax
    
    MODEL = f"cardiffnlp/twitter-roberta-base-sentiment-latest"
    
    # load the pre-trained sentiment analysis model, tokenizer, and configuration.
    model = AutoModelForSequenceClassification.from_pretrained(MODEL)
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    config = AutoConfig.from_pretrained(MODEL)
    
    # tokenize input text
    encoded_input = tokenizer("The weather today is great!", return_tensors='pt')
    
    # run the model forward pass to get the raw classification logits
    output = model(**encoded_input)
    
    # extract and normalize scores
    scores = output.logits[0].detach().numpy()
    scores = softmax(scores)
    
    # rank predictions by confidence
    ranking = np.argsort(scores)
    ranking = ranking[::-1]
    for i in range(scores.shape[0]):
        label = config.id2label[ranking[i]]
        score = scores[ranking[i]]
        print(f"{i+1}) {label} {np.round(float(score), 4)}")
    Run the Python script:
    $ python3 representation-sentiment.py
    Output:
    1) positive 0.9899
    2) neutral 0.0068
    3) negative 0.0033
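
    For comparison, here is a minimal sketch of the same classification using the higher-level transformers pipeline helper, which handles tokenization and softmax internally (by default it returns only the top label):
    from transformers import pipeline

    sentiment = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment-latest")

    # the pipeline tokenizes, runs the model, and applies softmax internally
    print(sentiment("The weather today is great!"))
    # e.g. [{'label': 'positive', 'score': 0.9899}]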
  3. Example: Text classification with generative models (OpenAI GPT)
    Generative models provide flexibility and can handle diverse classification tasks without task-specific training. Here's an implementation using OpenAI's GPT models.

    Python code:
    $ vi generative-sentiment.py
    from openai import OpenAI

    # create the client (openai>=1.0); replace the placeholder with your actual API key
    client = OpenAI(api_key="YOUR_API_KEY")
    
    prompt = """Can you tell if the following sentence is a positive, negative, or neutral statement:
    
    The weather today is great!
    
    If it is positive return Positive. If it is negative return Negative. Otherwise return Neutral.
    Also return the confidence score of your prediction.
    """
    
    messages = [
        {"role": "user", "content": prompt}
    ]

    # temperature=0 makes the classification as deterministic as possible
    output = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0
    )
    
    print(output.choices[0].message.content)
    Run the Python script:
    $ python3 generative-sentiment.py
    Output:
    Positive, with a confidence score of 95%.
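
    The same setup can also be steered with few-shot examples, as described in section 1. A minimal sketch reusing the client from the script above; the example sentences are illustrative, and restricting the answer to a single word makes the output easier to parse programmatically:
    few_shot_prompt = """Classify the sentiment of the sentence as Positive, Negative, or Neutral.
    Answer with a single word.

    Sentence: "The food was cold and the service was slow."
    Sentiment: Negative

    Sentence: "The package arrived on Tuesday."
    Sentiment: Neutral

    Sentence: "The weather today is great!"
    Sentiment:"""

    output = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": few_shot_prompt}],
        temperature=0
    )

    # expected answer: Positive
    print(output.choices[0].message.content)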