33. Using Hugging Face LLMs#

The Hugging Face transformers library gives access to a very large number of LLMs hosted on the Hugging Face (HF) Hub. In this notebook we explore what it is like to work with such LLMs for tasks like text generation.

  • Since transformers runs on PyTorch, we can either install PyTorch ourselves or use an AWS instance (machine image) that comes with it preinstalled.

  • Also note that most LLMs need a GPU, so choose a machine accordingly. Very large LLMs with billions of parameters often require a large GPU with enough GPU RAM to hold the model.
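To get a feel for how much GPU RAM "sufficient" means, here is a rough back-of-the-envelope sketch. It assumes the weights dominate memory use and ignores activations, the KV cache, and framework overhead, so real usage will be higher. The helper name `weight_memory_gib` and the parameter counts are illustrative assumptions, not values taken from any model card.

```python
# Rough estimate of the GPU memory needed just to hold model weights.
# Assumption: weights dominate; activations, KV cache and framework
# overhead are ignored, so actual usage will be higher.

BYTES_PER_PARAM = {
    "fp32": 4.0,   # 32-bit floats
    "fp16": 2.0,   # 16-bit floats
    "int8": 1.0,   # 8-bit quantization
    "4-bit": 0.5,  # 4-bit quantization
}

def weight_memory_gib(n_params, precision):
    """Return the approximate size of the weights in GiB."""
    return n_params * BYTES_PER_PARAM[precision] / 1024**3

# Nominal, round parameter counts chosen for illustration only.
models = {"1.3B-parameter model": 1.3e9, "7B-parameter model": 7e9}

for name, n_params in models.items():
    for precision in BYTES_PER_PARAM:
        gib = weight_memory_gib(n_params, precision)
        print(f"{name:22s} {precision:6s} ~{gib:5.1f} GiB")
```

Under these assumptions, halving the bytes per parameter halves the memory needed for the weights, which is the main reason quantized loading is attractive on smaller GPUs.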

This tutorial gives a nice recap: https://huggingface.co/docs/transformers/llm_tutorial

%%time
!pip install --upgrade pip --quiet
# !pip install torch torchvision torchaudio --quiet
!pip install transformers --quiet
!pip install accelerate --quiet
!pip install --upgrade bitsandbytes --quiet
!pip install SentencePiece --quiet
!pip install datasets --quiet
!pip install einops --quiet
# !pip install "sagemaker>=2.175.0" --upgrade --quiet
!pip install "tensorflow>=2.14" --quiet
CPU times: user 14.3 s, sys: 2.13 s, total: 16.5 s
Wall time: 53.6 s
import torch
from transformers import BertTokenizer, BertForMaskedLM, AutoModel
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score
import accelerate
import bitsandbytes
import sentencepiece
from datasets import load_dataset
from tqdm import tqdm

from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM
from transformers import AutoConfig
from transformers import pipeline

import textwrap
def p80(text):
    print(textwrap.fill(text, 80))
    return None
# Check if the GPU is being accessed
torch.cuda.is_available()
True

33.1. Using the Hugging Face pipeline#

This is the simplest way to use the LLMs on HF.

%%time
pipe = pipeline("text-generation", model="gpt2", max_new_tokens=128, do_sample=True)
prompt = """ Universal Basic Income, also known as UBI, is """
res = pipe(prompt)
/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
CPU times: user 5.36 s, sys: 1.99 s, total: 7.35 s
Wall time: 22.2 s
p80(res[0]['generated_text'])
 Universal Basic Income, also known as UBI, is  a program of the United States
Government that will help to raise employment and income for American workers.
The government also will provide a stipend to workers as well as other benefits
such as health care, education, and other benefits. The program will also
provide for public education, a free public library, the building of a $500
million public health department, the creation of a $200 million government-
funded health care program and to create an effective public hospital system.
The program is expected to be implemented by 2020. The Social Security
Administration estimated that the program would be worth $3.3 trillion in 2010.
The Social Security
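The pipeline call above sets do_sample=True, which is why rerunning the cell produces a different continuation each time. The sketch below contrasts greedy decoding (always pick the most likely token) with sampling from the softmax distribution. The four-token vocabulary and the logits are made up for illustration; a real model produces logits over tens of thousands of tokens.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token candidates and scores (not from a real model).
vocab = ["a", "the", "program", "policy"]
logits = [1.0, 2.0, 0.5, 1.5]
probs = softmax(logits)

# Greedy decoding: deterministic, always the highest-probability token.
greedy_token = vocab[probs.index(max(probs))]

# Sampling: draw tokens in proportion to their probabilities.
rng = random.Random(0)  # seeded only to make the sketch repeatable
sampled = [rng.choices(vocab, weights=probs, k=1)[0] for _ in range(5)]

print("probabilities:", [round(p, 3) for p in probs])
print("greedy choice:", greedy_token)
print("five samples: ", sampled)
```

Greedy decoding would return the same text on every run, while sampling trades that repeatability for more varied output.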

33.2. Instantiate the LLM#

A more flexible way to use an LLM is to initialize both of the following:

  1. Tokenizer

  2. Model

Below, the model is initialized first. You can comment/uncomment the options for 4-bit and 8-bit quantization. (You will get slightly different results under different quantizations.) When using a GPU, you need to make sure all tensors are loaded onto the GPU; to avoid forgetting, use the option device_map="auto", which automatically places the model on the GPU if one is available. The tokenizer is also initialized below.

%%time

def load_model_tokenizer(model_name_or_path):
    config = AutoConfig.from_pretrained(model_name_or_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name_or_path, config=config, trust_remote_code=True, cache_dir = f"cache/{model_name_or_path.split('/')[-1]}",
        # load_in_4bit=True,
        # load_in_8bit=True,
        device_map="auto",
        offload_folder=f"offloads/{model_name_or_path.split('/')[-1]}"
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_name_or_path,
        padding_side="left",
        cache_dir=f"cache/{model_name_or_path.split('/')[-1]}"
    )
    tokenizer.pad_token = tokenizer.eos_token
    return model, tokenizer

# LLM
# model_name = "Writer/palmyra-small" # https://huggingface.co/Writer/palmyra-small
# model_name = "mistralai/Mistral-7B-v0.1" # https://huggingface.co/mistralai/Mistral-7B-v0.1
# model_name = "HuggingFaceH4/zephyr-7b-alpha" # https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha
model_name="microsoft/phi-1_5" # https://huggingface.co/microsoft/phi-1_5

model, tokenizer = load_model_tokenizer(model_name)
CPU times: user 6.01 s, sys: 8.2 s, total: 14.2 s
Wall time: 47.3 s
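The remark above that different quantizations give slightly different results can be illustrated with a toy example. The sketch below applies simple absmax 8-bit quantization to a handful of made-up weights and measures the round-trip error. This is only an illustration of the general idea; it is not the algorithm bitsandbytes uses, and the function names are hypothetical.

```python
def quantize_absmax_int8(weights):
    """Map floats to integers in [-127, 127] using a single scale factor."""
    largest = max(abs(w) for w in weights)
    scale = largest / 127.0 if largest > 0 else 1.0  # avoid division by zero
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]

# Made-up weights for illustration.
weights = [0.12, -0.57, 0.33, 0.91, -0.04]

q, scale = quantize_absmax_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("int8 values:", q)
print("restored:   ", [round(r, 4) for r in restored])
print("max round-trip error:", max_error)
```

Each weight is recovered only to within half a quantization step, and these small errors accumulate through the network, which is why quantized and full-precision models can generate somewhat different text.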

33.3. Using the LLM for text generation#

Here we use the .generate method on the tokenized inputs (which are loaded onto the GPU; see the .to("cuda") call below). Set max_new_tokens according to how many tokens you need to generate (the default is 20). The generated token IDs are decoded back into text using the .batch_decode method.

This use of token IDs is explained nicely in this video.

def generate_response(prompt, tokenizer, model):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    generate_ids = model.generate(inputs.input_ids, max_new_tokens=512, do_sample=True)
    response = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    response = response.split(prompt)[-1]
    return response

And here we go ahead and pass in the prompt and get the generated text.

%%time
res = generate_response(prompt, tokenizer, model)
p80(prompt + res)
 Universal Basic Income, also known as UBI, is _________. A: A concept. To stop
the spreading of germs, one should avoid touching others with their bare hands.
The child needed help with his homework but the homework was too difficult.  To
protect his skin from the sun's harmful rays, Joe put on a hat, because Joe was
concerned about the sun’s harmfulness.  It was best to buy the wooden planks
from the hardware store, because the wood was a better material for the job.
Because David saw the bird flying away, he ran in the opposite direction.
Topic: Social studies--Economics--Economics concepts and principles  Once upon a
time, in a small town named Glenville, there were two best friends, Emily and
Lily. They both had a passion for photography, but their approaches towards
their work were quite different.   Emily, who was a professional photographer,
always followed traditional techniques and principles in her photographs. She
believed in capturing timeless moments and valued the importance of composition
and lighting. On the other hand, Lily, being an amateur photographer, embraced a
more experimental approach, often challenging conventional principles and
exploring new angles and perspectives.  Their different approaches became
evident when they worked on a project together for an economics conference. They
had to document various economic concepts and principles for their presentation.
One day, Emily and Lily met up to discuss their progress. Emily said, "Lily, I
noticed that you were able to capture unique angles and perspectives in your
photographs, which adds a fresh and contemporary touch to our project. However,
I think we should also focus on maintaining a professional approach to ensure
that our audience understands the concepts clearly."  Lily nodded and replied,
"I understand your concern, Emily. I believe incorporating a more abstract and
modern perspective will attract a wider audience. Besides, as an amateur, I am
more open to taking risks and pushing the boundaries of conventional
photography."  Emily considered Lily's points and smiled, "You're right, Lily.
Sometimes, stepping out of our comfort zones and exploring new artistic
territories can help us communicate complex ideas in a more accessible way.
Let's balance our photographs with a touch of professionalism."  As they
continued their work, Emily and Lily's collaboration became more fruitful.
Emily's traditional approach beautifully complemented Lily's experimental style.
They managed to capture significant economic concepts such as supply and demand,
inflation, and market equilibrium while maintaining a visually appealing
CPU times: user 13.9 s, sys: 23.4 ms, total: 13.9 s
Wall time: 14.1 s
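To make the token-ID round trip described above concrete, here is a toy sketch with no model involved: text is encoded to IDs, new IDs are appended one at a time up to max_new_tokens, and the result is decoded back to text. The vocabulary, the `encode`/`decode`/`toy_generate` helpers, and the random "model" are all hypothetical stand-ins; real tokenizers use subword units and real models choose the next ID from learned probabilities.

```python
import random

# Hypothetical toy vocabulary; "<eos>" plays the role of a special token.
vocab = ["<eos>", "UBI", "is", "a", "policy", "idea"]
token_to_id = {tok: i for i, tok in enumerate(vocab)}

def encode(text):
    """Whitespace 'tokenizer': map each word to its integer ID."""
    return [token_to_id[tok] for tok in text.split()]

def decode(ids, skip_special_tokens=True):
    """Map IDs back to text, optionally dropping the special token."""
    toks = [vocab[i] for i in ids]
    if skip_special_tokens:
        toks = [t for t in toks if t != "<eos>"]
    return " ".join(toks)

def toy_generate(input_ids, max_new_tokens, rng):
    """Append up to max_new_tokens IDs, stopping early at <eos>."""
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        next_id = rng.randrange(len(vocab))  # stand-in for the model's choice
        ids.append(next_id)
        if next_id == token_to_id["<eos>"]:
            break
    return ids

prompt = "UBI is"
input_ids = encode(prompt)
output_ids = toy_generate(input_ids, max_new_tokens=8, rng=random.Random(0))

# The output contains the prompt IDs followed by the new IDs.
full_text = decode(output_ids)
new_text = decode(output_ids[len(input_ids):])  # slice off the prompt by count
print("input IDs: ", input_ids)
print("output IDs:", output_ids)
print("full text: ", full_text)
print("new text:  ", new_text)
```

Slicing off the prompt by token count, as in the last step, is one alternative to splitting the decoded string on the prompt; it does not depend on the decoded text reproducing the prompt exactly.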

33.4. Using an LLM for Perplexity type calculations#

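We have discussed perplexity before; here is code that uses an LLM to compute the perplexity of a string of text. Note that device_map="auto" is commented out in the initialization below, so the model stays on the CPU, matching the input tensors, which are not moved to the GPU in this section.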

def load_model_tokenizer(model_name_or_path):
    config = AutoConfig.from_pretrained(model_name_or_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name_or_path, config=config, trust_remote_code=True, cache_dir = f"cache/{model_name_or_path.split('/')[-1]}",
        # load_in_4bit=True,
        # load_in_8bit=True,
        # device_map="auto", # comment this out if there is a CPU/GPU mismatch
        offload_folder=f"offloads/{model_name_or_path.split('/')[-1]}"
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_name_or_path,
        padding_side="left",
        cache_dir=f"cache/{model_name_or_path.split('/')[-1]}"
    )
    tokenizer.pad_token = tokenizer.eos_token
    return model, tokenizer

# LLM
model_name = "Writer/palmyra-small" # https://huggingface.co/Writer/palmyra-small
# model_name = "mistralai/Mistral-7B-v0.1" # https://huggingface.co/mistralai/Mistral-7B-v0.1
# model_name = "HuggingFaceH4/zephyr-7b-alpha" # https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha
# model_name="microsoft/phi-1_5" # https://huggingface.co/microsoft/phi-1_5

model, tokenizer = load_model_tokenizer(model_name)

This implements perplexity, see https://srdas.github.io/NLPBook/Topic_Modeling.html?highlight=perplex#perplexity

import torch
from tqdm import tqdm

def get_ppl(text, model, tokenizer, stride=512):
    encodings = tokenizer(text, return_tensors="pt")
    max_length = stride #model.config.n_positions
    seq_len = encodings.input_ids.size(1)
    nlls = []
    prev_end_loc = 0
    for begin_loc in range(0, seq_len, stride):
        end_loc = min(begin_loc + max_length, seq_len)
        trg_len = end_loc - prev_end_loc  # may be different from stride on last loop
        input_ids = encodings.input_ids[:, begin_loc:end_loc]#.to(device)
        target_ids = input_ids.clone()
        target_ids[:, :-trg_len] = -100

        with torch.no_grad():
            outputs = model(input_ids, labels=target_ids)

            # loss is calculated using CrossEntropyLoss which averages over valid labels
            # N.B. the model only calculates loss over trg_len - 1 labels, because it internally shifts the labels
            # to the left by 1.
            neg_log_likelihood = outputs.loss

        nlls.append(neg_log_likelihood)

        prev_end_loc = end_loc
        if end_loc == seq_len:
            break

    ppl = torch.exp(torch.stack(nlls).mean())
    return ppl.numpy().tolist()
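The last step of get_ppl exponentiates an average negative log-likelihood. As a minimal sketch of that relationship, the snippet below computes perplexity directly from a list of per-token probabilities. The probabilities are made up for illustration, and the `perplexity` helper is hypothetical rather than part of any library.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood of the tokens)."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model that assigns high probability to each observed token...
confident = perplexity([0.9, 0.8, 0.95, 0.85])

# ...versus one that spreads probability evenly over 4 choices.
uniform4 = perplexity([0.25, 0.25, 0.25, 0.25])

print("confident model:", round(confident, 3))
print("uniform over 4: ", round(uniform4, 3))  # approximately 4
```

A uniform guess over k options gives a perplexity of about k, so perplexity can be read as the effective number of choices the model is hesitating between at each token.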
## EXAMPLE 1

# model_name = "gpt2"
# model, tokenizer = load_model_tokenizer(model_name)
get_ppl(""" I love PyTorch """, model, tokenizer)
`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
846.648681640625
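In get_ppl, max_length is set equal to stride, so a long text is split into non-overlapping windows and each window is scored without context from the previous one. The sketch below reproduces only the index arithmetic of that loop, using a hypothetical helper `window_bounds`, to show which tokens each window covers and how many of them count toward the loss (trg_len). It also shows what would happen if max_length were larger than stride.

```python
def window_bounds(seq_len, stride, max_length):
    """Return (begin_loc, end_loc, trg_len) for each window of the loop."""
    bounds = []
    prev_end_loc = 0
    for begin_loc in range(0, seq_len, stride):
        end_loc = min(begin_loc + max_length, seq_len)
        trg_len = end_loc - prev_end_loc  # tokens not scored by earlier windows
        bounds.append((begin_loc, end_loc, trg_len))
        prev_end_loc = end_loc
        if end_loc == seq_len:
            break
    return bounds

# As in get_ppl: max_length == stride, so windows do not overlap.
print(window_bounds(seq_len=1200, stride=512, max_length=512))
# [(0, 512, 512), (512, 1024, 512), (1024, 1200, 176)]

# With max_length > stride, windows overlap and only the new tokens are scored.
print(window_bounds(seq_len=1200, stride=256, max_length=512))
# [(0, 512, 512), (256, 768, 256), (512, 1024, 256), (768, 1200, 176)]
```

Because get_ppl takes a plain mean of the per-window losses, a short final window carries the same weight as a full one, so the result is an approximation of the token-level average when the text spans several windows.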
## EXAMPLE 2
txt = """ 'Alexis Sanchez had offers from several clubs when he left Barcelona last summer, including Arsenal and Liverpool. He chose to join Arsenal in a decision that delighted boss Arsene Wenger. Sanchez has made an instant impact in English football and has scored 19 goals for the Gunners so far this season. Arsenal face Liverpool in the Premier League on Saturday as the two sides compete for a top-four finish.', 'Luis Suarez the famous chilean striker has departed Barcelona FC and is currently being courted by teams in the Barclays Premier League. Manchester United is the most likely landing spot for the famous Chilean striker who has scored hundreds of goals throughout his famous career. Negotiations between interested parties are ongoing and a decision is expected to be heard in the coming weeks.', 'An impressive english football player is being scouted out by several teams that hope to have him on their team. These bidding wars are intense and the player still has yet to make a decision.', 'Wenger is happy to have Alexis Sanchez. Sanchez was previously dismissed by Barcelona. He has been influential with 19 goals this season.', 'Arsenal Manager Arsene Wenger elated that Alexis Sanchez will side with his team for Liverpool on Saturday.', '26 year old Alexis Sanchez will be lining up for Arsenal this Saturday, and manager Arsene Wenger is certainly happy about it. When Barcelona allowed Sanchez to leave, he could have chosen any team. Luckily, he chose Arsenal and has already scored nineteen goals this season.', 'Alexis Sanchez chose Emirates over Anfield. Alexis Sanchez impressed immediately with his stellar play. Wenger has been rumored to be linked with Liverpool winger Raheem Sterling.', 'Alexis Sanchez picked Emirates over Anfield. Alexis Sanchez has made an instant impact with 19 goals. 
The speculation that is being linked with Wenger is Raheem Sterling.', "Chile forward Alexis Sanchez recently chose to play for Arsenal at the Emirates Statium over Liverpool at Anfield. Sanchez came into the league strong, scoring 19 goals this season. Arsenal manager Arsene Wenger is trying to downplay speculation that links him to Liverpool's winger Raheem Sterling.", 'Alexus Sabcgez chose the play at Emerates Stadium over Anfield in the premier league. Sanchez has mad a great impact with the teal scoring 19 goals this year for the North London Club. Wenger has been impressed by the way Rodger has developed his team but has down played speculation linking him to Raheem Sterling, the Liverpool winger.', "Alexis Sanchez picked Arsenal and Emirates Stadium to play for instead of Liverpool and Anfield, and his coach Arsene Wenger is sure happy about it. Sanchez has already scored nineteen goals for his team this year. Wenger also downplayed an interest in Raheem Sterling, who is a forward for Liverpool. Sterling rejected his team's latest contract offer." """

print("INPUT:")
p80(txt)

print("\nPPL metric =", get_ppl(txt, model, tokenizer))
INPUT:
 'Alexis Sanchez had offers from several clubs when he left Barcelona last
summer, including Arsenal and Liverpool. He chose to join Arsenal in a decision
that delighted boss Arsene Wenger. Sanchez has made an instant impact in English
football and has scored 19 goals for the Gunners so far this season. Arsenal
face Liverpool in the Premier League on Saturday as the two sides compete for a
top-four finish.', 'Luis Suarez the famous chilean striker has departed
Barcelona FC and is currently being courted by teams in the Barclays Premier
League. Manchester United is the most likely landing spot for the famous Chilean
striker who has scored hundreds of goals throughout his famous career.
Negotiations between interested parties are ongoing and a decision is expected
to be heard in the coming weeks.', 'An impressive english football player is
being scouted out by several teams that hope to have him on their team. These
bidding wars are intense and the player still has yet to make a decision.',
'Wenger is happy to have Alexis Sanchez. Sanchez was previously dismissed by
Barcelona. He has been influential with 19 goals this season.', 'Arsenal Manager
Arsene Wenger elated that Alexis Sanchez will side with his team for Liverpool
on Saturday.', '26 year old Alexis Sanchez will be lining up for Arsenal this
Saturday, and manager Arsene Wenger is certainly happy about it. When Barcelona
allowed Sanchez to leave, he could have chosen any team. Luckily, he chose
Arsenal and has already scored nineteen goals this season.', 'Alexis Sanchez
chose Emirates over Anfield. Alexis Sanchez impressed immediately with his
stellar play. Wenger has been rumored to be linked with Liverpool winger Raheem
Sterling.', 'Alexis Sanchez picked Emirates over Anfield. Alexis Sanchez has
made an instant impact with 19 goals. The speculation that is being linked with
Wenger is Raheem Sterling.', "Chile forward Alexis Sanchez recently chose to
play for Arsenal at the Emirates Statium over Liverpool at Anfield. Sanchez came
into the league strong, scoring 19 goals this season. Arsenal manager Arsene
Wenger is trying to downplay speculation that links him to Liverpool's winger
Raheem Sterling.", 'Alexus Sabcgez chose the play at Emerates Stadium over
Anfield in the premier league. Sanchez has mad a great impact with the teal
scoring 19 goals this year for the North London Club. Wenger has been impressed
by the way Rodger has developed his team but has down played speculation linking
him to Raheem Sterling, the Liverpool winger.', "Alexis Sanchez picked Arsenal
and Emirates Stadium to play for instead of Liverpool and Anfield, and his coach
Arsene Wenger is sure happy about it. Sanchez has already scored nineteen goals
for his team this year. Wenger also downplayed an interest in Raheem Sterling,
who is a forward for Liverpool. Sterling rejected his team's latest contract
offer."

PPL metric = 34.58359909057617

The longer text is much more self-consistent than the shorter one, so it has a much lower perplexity (about 34.6 versus 846.6)!

33.5. GPUs#

Going forward, understanding and using GPUs is going to be a useful skill. See: https://codeconfessions.substack.com/p/gpu-computing