LLMs | Introduction
  1. Introduction
  2. Representation models
  3. Generative models
  4. Creating a language model

  1. Introduction
    Language models are a type of AI model designed to understand, generate, and interact with human language. LLMs (Large Language Models) are trained on vast amounts of text data, allowing them to capture complex language patterns. They are large neural network models made up of layers of interconnected nodes. Each connection between two nodes carries a weight: a numeric value (parameter) that is adjusted during training and encodes part of the model's understanding of the language.

    Types of LLMs:
    • Representation models: encoder-only models that are used for specific tasks.
      Representation models are sequence-to-value models; they take a text (sequence of tokens) as input and produce a value.
      Example: "The weather today is great!" ==> "1" (e.g., a label meaning positive sentiment)

    • Generative models: decoder-only models that generate text.
      Generative models are sequence-to-sequence models; they take a text (sequence of tokens) as input and generate text (sequence of tokens). They are not trained on specific tasks. When given a text, a generative model needs to understand the context and must be given clear instructions on the expected output (see the sketch after this list).
      Example: "What's 1 + 1?" ==> "The answer is 2"

    The Transformer architecture is a neural network model primarily used for processing text. It's widely adopted for various tasks such as machine translation, text summarization, and question-answering.

    The Transformer architecture is an encoder-decoder architecture. Encoder-decoder models are sequence-to-sequence models. They are pre-trained using masked language modeling, where sets of tokens are masked and the model learns to predict them (see the sketch below).
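
    A minimal sketch of an encoder-decoder (sequence-to-sequence) model in use, assuming the Hugging Face transformers library; t5-small is an illustrative checkpoint:

      from transformers import pipeline

      # T5 is an encoder-decoder model: the encoder reads the input sequence,
      # the decoder generates the output sequence token by token.
      translator = pipeline("text2text-generation", model="t5-small")
      print(translator("translate English to French: The weather today is great!"))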

    Applications of LLMs:
    • text generation
    • text classification
    • text clustering
    • semantic search (see the sketch after this list)
    • ...
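
    As an illustration of semantic search: a minimal sketch that embeds texts with a representation model and ranks them by cosine similarity (assumes the Hugging Face transformers library and PyTorch; the checkpoint is an illustrative choice):

      import torch
      from transformers import AutoModel, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
      model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

      def embed(texts):
          # Mean-pool the token embeddings of the last hidden layer into one vector per text.
          inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
          with torch.no_grad():
              hidden = model(**inputs).last_hidden_state  # (batch, tokens, dim)
          mask = inputs["attention_mask"].unsqueeze(-1)
          return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

      documents = ["How do I reset my password?", "The weather today is great!"]
      query_embedding = embed(["I forgot my login credentials"])
      document_embeddings = embed(documents)

      # Higher cosine similarity means the document is semantically closer to the query.
      scores = torch.nn.functional.cosine_similarity(query_embedding, document_embeddings)
      for document, score in zip(documents, scores):
          print(f"{score.item():.3f}  {document}")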
  2. Representation models
    They are used for specific tasks, for example text classification.

    Example: predict a masked token (see the sketch below)
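
    A minimal sketch of masked-token prediction with a BERT checkpoint, assuming the Hugging Face transformers library:

      from transformers import pipeline

      # BERT was pre-trained to predict tokens hidden behind the [MASK] placeholder.
      unmasker = pipeline("fill-mask", model="bert-base-uncased")
      for prediction in unmasker("The weather today is [MASK]."):
          print(f"{prediction['score']:.3f}  {prediction['sequence']}")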

    Examples of representation models:
    • Google BERT large model (open source) consists of 340 million parameters.
      BERT: Bidirectional Encoder Representations from Transformers
  3. Generative models
    The model takes an input (a.k.a. the user prompt, user query) and returns an output that's expected to follow the user prompt.

    Generative models are also called completion models (they auto-complete the user prompt).

    Example: predict the next token

    Example: auto completion (see the sketch below)
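
    A minimal sketch of next-token prediction and auto completion with a small generative checkpoint (gpt2 is an illustrative choice), assuming the Hugging Face transformers library and PyTorch:

      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("gpt2")
      model = AutoModelForCausalLM.from_pretrained("gpt2")

      prompt = "The capital of France is"
      inputs = tokenizer(prompt, return_tensors="pt")

      # Predict the next token: rank the vocabulary by the probability of following the prompt.
      with torch.no_grad():
          logits = model(**inputs).logits  # (batch, sequence_length, vocab_size)
      probabilities = torch.softmax(logits[0, -1, :], dim=-1)
      top = torch.topk(probabilities, k=5)
      for probability, token_id in zip(top.values, top.indices):
          print(f"{probability.item():.3f}  {tokenizer.decode(token_id.item())!r}")

      # Auto completion: repeatedly predict the next token to extend the prompt.
      output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
      print(tokenizer.decode(output_ids[0], skip_special_tokens=True))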

    Examples of generative models:
    • OpenAI GPT-4 model (proprietary): estimated at about 1.75 trillion parameters (the exact size is not officially disclosed).
      GPT: Generative Pre-trained Transformer

    • Google Gemini models (proprietary): parameter counts are not officially disclosed (Google's related open Gemma family includes a 27-billion-parameter model).

    • Meta AI LLaMA 4 models (open weights): mixture-of-experts models whose largest announced variant has about 2 trillion total parameters.

    • Google T5 (open source) is an encoder-decoder model; its largest variant has 11 billion parameters.
      T5: Text-To-Text Transfer Transformer
  4. Creating a language model
    Creating a language model takes place in two steps: training and fine-tuning.

    Training:
    • The trained models (a.k.a. the pre-trained models) are called foundation models or base models.
    • It allows the model to learn the grammar of the language and understand the semantics, context, and patterns of text.
    • It doesn't target specific tasks.
    • It allows the model to predict the next token.
    • It takes a lot of computation (GPUs, VRAM).
    • It requires a lot of data (unsupervised).
    • It takes a lot of training time.

    Fine-tuning:
    • Fine-tuning takes the pre-trained model and trains it further on a specific task (for example, text classification); see the sketch after this list.
    • It produces fine-tuned models.
    • It requires less computation (GPUs, VRAM).
    • It requires less data (supervised).
    • It takes less training time.
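
    A minimal fine-tuning sketch for a text classification task, assuming the Hugging Face transformers and datasets libraries are installed; the checkpoint, dataset, and hyperparameters are illustrative choices:

      from datasets import load_dataset
      from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                                Trainer, TrainingArguments)

      # Load a small labeled dataset (supervised) and the pre-trained tokenizer.
      dataset = load_dataset("imdb")
      tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

      def tokenize(batch):
          return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

      tokenized = dataset.map(tokenize, batched=True)

      # Start from the pre-trained (foundation) model and add a classification head.
      model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

      training_args = TrainingArguments(output_dir="finetuned-classifier",
                                        num_train_epochs=1,
                                        per_device_train_batch_size=8)

      trainer = Trainer(model=model,
                        args=training_args,
                        train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
                        eval_dataset=tokenized["test"].select(range(500)))
      trainer.train()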

    Training techniques: supervised and unsupervised
    • Supervised training techniques use labeled data (example: supervised text classification).
    • Unsupervised training techniques require no prior labeling of the data (example: text clustering).

    Generative models can be fine-tuned to create models that respond to instructions.

    Training representation models uses a technique called masked language modeling: it masks some tokens of the input and instructs the model to predict them (see the sketch below).
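
    A minimal sketch of how masked language modeling inputs are prepared, assuming the Hugging Face transformers library; the masking is random, so the output varies between runs:

      from transformers import AutoTokenizer, DataCollatorForLanguageModeling

      tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

      # Randomly replaces about 15% of the input tokens with [MASK] and keeps the
      # original token ids as labels so the model can be trained to predict them.
      collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

      encoded = tokenizer(["The weather today is great!"], return_tensors="pt")
      batch = collator([{key: value[0] for key, value in encoded.items()}])

      print(tokenizer.decode(batch["input_ids"][0]))  # input with some tokens masked
      print(batch["labels"][0])                       # -100 everywhere except the masked positions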