LLMs | Installation
  1. Python Environment Setup
  2. Hugging Face CLI
  3. Hugging Face Transformers
  4. llama-cpp-python
  5. LangChain

  1. Python Environment Setup
    See this page for details on installing Python and the minimum set of libraries required for your development environment:
    Install Python
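    For a quick start, a typical setup creates and activates a virtual environment before installing the packages in the following steps (a minimal sketch; the directory name .venv is an example):
    $ python3 -m venv .venv
    $ source .venv/bin/activate
    $ pip install --upgrade pip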
  2. Hugging Face CLI
    Access and download models from the Hugging Face Hub.

    See this page for more details: https://huggingface.co/docs/huggingface_hub/main/en/guides/cli

    Install Hugging Face CLI:
    $ pip install "huggingface_hub[cli]"
    Verify installation and check version:
    $ huggingface-cli version
    huggingface_hub version: 0.30.2
    You can use the CLI to download models:
    $ huggingface-cli download microsoft/DialoGPT-small
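    The download command can also fetch a single file or place the files in a specific directory (the file name and target directory below are examples):
    $ huggingface-cli download microsoft/DialoGPT-small config.json
    $ huggingface-cli download microsoft/DialoGPT-small --local-dir ./DialoGPT-small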
  3. Hugging Face Transformers
    Main library for working with transformer models (BERT, GPT, etc.).

    See this page for more details: https://huggingface.co/docs/transformers/en/installation

    Install transformers:
    $ pip install transformers
    To install Transformers with PyTorch as the backend (defaults to CPU if no GPU/CUDA is available), run:
    $ pip install 'transformers[torch]'
    To test if the installation was successful, run the following command. It should return a label and a score for the provided text:
    $ python3 -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('hugging face is the best'))"
    [{'label': 'POSITIVE', 'score': 0.999839186668396}]
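    The bare pipeline call picks a default model and prints a warning about it; you can pin the model explicitly instead (the model id below is the usual sentiment-analysis default, but check the warning output for the exact id):
    $ python3 -c "from transformers import pipeline; print(pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')('hugging face is the best'))"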
    Example: Loading a model and its tokenizer using Hugging Face Transformers:
    $ vi huggingface-llm.py
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    # Download (on first use) and cache the model weights and tokenizer from the Hub
    model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
    tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
    
    # Print the tokenizer configuration, then the model architecture
    print(tokenizer)
    print(model)
    Run the Python script:
    $ python3 huggingface-llm.py
    Output:
    GPT2TokenizerFast(
        name_or_path='microsoft/DialoGPT-small',
        vocab_size=50257,
        model_max_length=1024,
        is_fast=True,
        padding_side='right',
        truncation_side='right',
        special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>'},
        clean_up_tokenization_spaces=True,
        added_tokens_decoder={50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),}
    )
    GPT2LMHeadModel(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 1024)
        (wpe): Embedding(1024, 1024)
        ...
      )
      (lm_head): Linear(in_features=1024, out_features=50257, bias=False)
    )
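    Loading the model only prints its configuration. To actually generate text, you can extend the script with a generate() call (a minimal sketch following the common DialoGPT usage; the file name, prompt, and max_length are examples):
    $ vi huggingface-generate.py
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
    tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
    
    # Encode the user prompt, terminated by the end-of-sequence token
    inputs = tokenizer("Hello, how are you?" + tokenizer.eos_token, return_tensors="pt")
    
    # Generate a reply (greedy decoding by default)
    outputs = model.generate(**inputs, max_length=100, pad_token_id=tokenizer.eos_token_id)
    
    # Decode only the tokens generated after the prompt
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
    Run the Python script:
    $ python3 huggingface-generate.py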
  4. llama-cpp-python
    Python bindings for llama.cpp; run quantized LLMs locally with efficient inference.

    See these pages for more details:
    https://pypi.org/project/llama-cpp-python/
    https://python.langchain.com/docs/integrations/llms/llamacpp/

    Install llama-cpp-python:
    $ pip install llama-cpp-python
    Test llama-cpp-python:

    Download a compatible model (in GGUF format, which llama-cpp-python requires):
    $ wget https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf
    Python code:
    $ vi llama-llm.py
    from llama_cpp import Llama
    
    # Load the GGUF model from the local path; the loader prints the model metadata
    model = Llama(model_path="./Phi-3-mini-4k-instruct-q4.gguf")
    Run the Python script:
    $ python3 llama-llm.py
    Output:
    llama_model_loader: loaded meta data with 24 key-value pairs and 195 tensors from Phi-3-mini-4k-instruct-q4.gguf (version GGUF V3 (latest))
    llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
    llama_model_loader: - kv   0:                       general.architecture str              = phi3
    ...
    Using gguf chat template: {{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|user|>' + '
    ' + message['content'] + '<|end|>' + '
    ' + '<|assistant|>' + '
    '}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|end|>' + '
    '}}{% endif %}{% endfor %}
    Using chat eos_token: <|endoftext|>
    Using chat bos_token: <s>
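    Once the model loads, you can run inference through the chat completion API (a minimal sketch; the file name, n_ctx, max_tokens, and the prompt are example values):
    $ vi llama-chat.py
    from llama_cpp import Llama
    
    # verbose=False suppresses the loader log shown above
    llm = Llama(model_path="./Phi-3-mini-4k-instruct-q4.gguf", n_ctx=4096, verbose=False)
    
    # The GGUF file embeds the Phi-3 chat template, so chat completion works directly
    output = llm.create_chat_completion(
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        max_tokens=64,
    )
    print(output["choices"][0]["message"]["content"])
    Run the Python script:
    $ python3 llama-chat.py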
  5. LangChain
    Framework for building applications powered by language models.

    See this page for more details: https://python.langchain.com/docs/how_to/installation/

    To install the main LangChain package:
    $ pip install langchain
    To install the LangChain core package:
    $ pip install langchain-core
    To install the LangChain community package:
    $ pip install langchain-community
    To install the LangChain Command Line Interface (CLI) package:
    $ pip install langchain-cli
    To test the installation of the LangChain CLI package:
    $ langchain-cli --version
    langchain-cli 0.0.36
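    As a quick end-to-end test, the community package can drive the GGUF model downloaded in step 4 through LangChain's LlamaCpp integration (a minimal sketch; the file name and prompt are examples, and llama-cpp-python must already be installed):
    $ vi langchain-llm.py
    from langchain_community.llms import LlamaCpp
    
    # Reuse the Phi-3 GGUF model downloaded in the llama-cpp-python step
    llm = LlamaCpp(model_path="./Phi-3-mini-4k-instruct-q4.gguf", n_ctx=4096, verbose=False)
    
    # invoke() sends a single prompt string and returns the completion text
    print(llm.invoke("Name the capital of France."))
    Run the Python script:
    $ python3 langchain-llm.py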