LLMs (Large Language Models) are large neural network models built from interconnected layers of nodes.
An LLM's parameters are the weights of the connections between the nodes of one layer (starting from the input tokens) and the nodes of the next layer.
Each parameter is a numerical value assigned to a connection, and together these values encode the model's understanding of a domain (global or specific).
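A minimal sketch in PyTorch of what "parameters" means in practice: a single linear layer whose weight matrix holds one numerical value per connection between an input node and an output node (the layer sizes here are arbitrary, chosen only for illustration).

```python
import torch

# One linear layer: the weight matrix holds one parameter (weight) per
# connection between an input node and an output node.
layer = torch.nn.Linear(in_features=4, out_features=2)

print(layer.weight.shape)  # torch.Size([2, 4]) -> 8 connection weights
print(layer.bias.shape)    # torch.Size([2])    -> 2 bias parameters

# Total number of parameters in the layer.
print(sum(p.numel() for p in layer.parameters()))  # 10
```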
Types of LLMs:
- Representation models: encoder-only models used for specific tasks.
Representation models are sequence-to-value models: they take a text (a sequence of tokens) as input and produce a value (see the encoder-only sketch after this list).
Example: "The weather today is great!" ==> "1"
- Generative models: decoder-only models that generate text.
Generative models are sequence-to-sequence models: they take a text (a sequence of tokens) as input and generate text (a sequence of tokens) as output (see the decoder-only sketch after this list).
Unlike representation models, they are not trained for one specific task.
When given a text, a generative model has to understand the context and must be given clear instructions about the expected output.
Example: "What's 1 + 1?" ==> "The answer is 2"
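As a sketch of a representation (encoder-only) model in use, the Hugging Face sentiment-analysis pipeline below maps a sentence to a single label and score; the checkpoint name is just one commonly used example.

```python
from transformers import pipeline

# Encoder-only (representation) model fine-tuned for one specific task:
# sentiment classification, mapping a token sequence to a single value/label.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The weather today is great!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```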
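And a sketch of a generative (decoder-only) model: GPT-2 is used here only as a small, freely available example, and the actual continuation depends on the model and prompt.

```python
from transformers import pipeline

# Decoder-only (generative) model: takes a token sequence and generates
# a continuation, one token at a time.
generator = pipeline("text-generation", model="gpt2")

prompt = "Q: What's 1 + 1?\nA:"
output = generator(prompt, max_new_tokens=10, do_sample=False)
print(output[0]["generated_text"])
# The continuation depends on the model; larger instruction-tuned models
# answer such questions more reliably.
```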
The original Transformer architecture is an encoder-decoder architecture: it stacks 6 encoder blocks and 6 decoder blocks.
Encoder-decoder models are sequence-to-sequence models.
They are pre-trained using masked language modeling, where spans of tokens are masked and the model learns to reconstruct them.
T5 (Text-To-Text Transfer Transformer) is an example of an encoder-decoder architecture (see the sketch below).
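A sketch of an encoder-decoder model and its span-masking pre-training objective, using the small public t5-small checkpoint: `<extra_id_0>` is the sentinel token T5 uses to mark a masked span, and the model is asked to fill it in.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Span masking: <extra_id_0> marks a masked span the decoder should fill in.
text = "The <extra_id_0> today is great!"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)

print(tokenizer.decode(outputs[0], skip_special_tokens=False))
# e.g. "<pad> <extra_id_0> weather <extra_id_1> ..." (exact output may vary)
```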
Applications of LLMs:
- text generation
- text classification
- text clustering
- semantic search
- ...
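As one concrete example from the list above, a minimal semantic-search sketch with the sentence-transformers library (the model name and documents are arbitrary choices for illustration): documents and a query are embedded, then ranked by cosine similarity.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "The weather today is great!",
    "Transformers are encoder-decoder models.",
    "A recipe for chocolate cake.",
]
query = "What is the weather like?"

# Embed documents and query, then rank documents by cosine similarity.
doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_emb, doc_emb)[0]
best = int(scores.argmax())
print(docs[best], float(scores[best]))
```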