Using Hugging Face LLMs

33. Using Hugging Face LLMs

The Hugging Face transformers library provides access to LLMs, and an enormous number of them are available on the Hugging Face (HF) Hub. In this notebook we explore what it is like to work with such LLMs for tasks like text generation.

Since transformers runs on PyTorch, we can either install PyTorch ourselves or use an AWS instance (machine image) that comes with it preinstalled.
Also note that most LLMs need a GPU, so choose a machine accordingly. Very large LLMs with billions of parameters often require a large GPU with sufficient GPU RAM.
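
To get a feel for how much GPU RAM a model needs, a back-of-the-envelope estimate multiplies the parameter count by the bytes used per parameter. The sketch below is only a rough guide under a stated assumption: it counts the weights alone, and the helper name `weight_memory_gb` and the 7-billion-parameter example are illustrative choices, not part of the notebook.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: only the weights are counted; activations, the KV cache,
# and framework overhead need extra memory on top of this.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(n_params, dtype):
    """Return the approximate size of the weights in GB (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

# Hypothetical 7-billion-parameter model at different precisions.
for dtype in BYTES_PER_PARAM:
    print(f"7B parameters in {dtype}: ~{weight_memory_gb(7e9, dtype):.1f} GB")
```

This is also why lower-precision loading matters: halving the bytes per parameter roughly halves the memory needed for the weights.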

This tutorial gives a nice recap: https://huggingface.co/docs/transformers/llm_tutorial

%%time
!pip install --upgrade pip --quiet
# !pip install torch torchvision torchaudio --quiet
!pip install transformers --quiet
!pip install accelerate --quiet
!pip install --upgrade bitsandbytes --quiet
!pip install SentencePiece --quiet
!pip install datasets --quiet
!pip install einops --quiet
# !pip install "sagemaker>=2.175.0" --upgrade --quiet
!pip install "tensorflow>=2.14" --quiet

CPU times: user 683 ms, sys: 112 ms, total: 795 ms
Wall time: 2min 9s

import torch
from transformers import BertTokenizer, BertForMaskedLM, AutoModel
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score
import accelerate
import bitsandbytes
import sentencepiece
from datasets import load_dataset
from tqdm import tqdm

from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM
from transformers import AutoConfig
from transformers import pipeline

import textwrap
def p80(text):
    print(textwrap.fill(text, 80))
    return None

# Check the GPU is being accessed
torch.cuda.is_available()

True

33.1. Using the Hugging Face pipeline

This is the simplest way to use the LLMs on HF.

%%time
pipe = pipeline("text-generation", model="gpt2", max_new_tokens=128, do_sample=True)
prompt = """ Universal Basic Income, also known as UBI, is """
res = pipe(prompt)

/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(

Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

CPU times: user 4.36 s, sys: 1.81 s, total: 6.17 s
Wall time: 13 s

p80(res[0]['generated_text'])

 Universal Basic Income, also known as UBI, is  (b) an unconditional basic
income guaranteed for all. Budgeted for 2014-15, UBI would include: $80.1
trillion $25 trillion of personal income, including all benefits above £18,500
(if any) and £14,000 of benefit to those under £14,000 per year or £14,000 for
an individual. The government estimates that by 2020-21, every person would have
$25,000 higher income than what their parents would be paying if they enjoyed
UBI. However, the new proposals to introduce the universal Basic Income (VBF)
have been

33.2. Instantiate the LLM

A more detailed way to use the LLMs is to initialize both of the following:

Tokenizer
Model

Below, the model is initialized first. You can comment/uncomment the options for 4-bit and 8-bit quantization. (You will get slightly different results under different quantizations.) When using a GPU, all tensors must be loaded onto the GPU; to avoid forgetting this, use the option device_map="auto", which automatically places the model on the GPU when one is available. The tokenizer is also initialized below.

%%time

def load_model_tokenizer(model_name_or_path):
    config = AutoConfig.from_pretrained(model_name_or_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name_or_path, config=config, trust_remote_code=True, cache_dir = f"cache/{model_name_or_path.split('/')[-1]}",
        # load_in_4bit=True,
        # load_in_8bit=True,
        device_map="auto",
        offload_folder=f"offloads/{model_name_or_path.split('/')[-1]}"
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_name_or_path,
        padding_side="left",
        cache_dir=f"cache/{model_name_or_path.split('/')[-1]}"
    )
    tokenizer.pad_token = tokenizer.eos_token
    return model, tokenizer

# LLM
# model_name = "Writer/palmyra-small" # https://huggingface.co/Writer/palmyra-small
# model_name = "mistralai/Mistral-7B-v0.1" # https://huggingface.co/mistralai/Mistral-7B-v0.1
# model_name = "HuggingFaceH4/zephyr-7b-alpha" # https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha
model_name="microsoft/phi-1_5" # https://huggingface.co/microsoft/phi-1_5

model, tokenizer = load_model_tokenizer(model_name)

CPU times: user 6.01 s, sys: 7.91 s, total: 13.9 s
Wall time: 26.6 s
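
To see why quantized models give slightly different results, it helps to look at what rounding weights to 8 bits does. The following is a toy sketch of absmax int8 quantization in plain Python, with made-up weight values and hypothetical helper names. It is not how the load_in_8bit or load_in_4bit options are implemented (the libraries behind them use more elaborate schemes); it only illustrates the rounding error that any low-bit representation introduces.

```python
def quantize_absmax_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q_weights, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in q_weights]

weights = [0.12, -0.5, 0.031, 0.9, -0.77]  # made-up values for illustration
q_weights, scale = quantize_absmax_int8(weights)
restored = dequantize(q_weights, scale)

# The round trip is close but not exact, which is why outputs can shift.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q_weights)
print(f"largest round-trip error: {max_error:.5f}")
```

Each restored weight differs from the original by at most half a quantization step, and those small differences accumulate across layers into slightly different generated text.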

33.3. Using the LLM for text generation

Here we call the .generate method on the tokenized inputs, which are moved onto the GPU (see the .to("cuda") call below). Set max_new_tokens according to how many tokens you need to generate (the default is 20). The generated token IDs are decoded back into text using the .batch_decode method.

This use of token IDs is explained nicely in this video.

def generate_response(prompt, tokenizer, model):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    generate_ids = model.generate(inputs.input_ids, max_new_tokens=512, do_sample=True)
    response = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    response = response.split(prompt)[-1]
    return response
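
One detail of the function above worth a second look is how the prompt is stripped from the decoded text. Splitting on the prompt and taking the last piece works when the prompt occurs only once, but if the model happens to repeat the prompt, everything before the last repetition is dropped. The plain-Python sketch below compares the two behaviours on a made-up string; `strip_prompt_by_prefix` is a hypothetical alternative, not something the notebook uses.

```python
def strip_prompt_by_split(decoded, prompt):
    # Same approach as response.split(prompt)[-1] in generate_response.
    return decoded.split(prompt)[-1]

def strip_prompt_by_prefix(decoded, prompt):
    # Hypothetical alternative: drop the prompt only when it is a leading prefix.
    return decoded[len(prompt):] if decoded.startswith(prompt) else decoded

prompt = "UBI is"
# Made-up "generated" text in which the prompt phrase appears twice.
decoded = "UBI is a policy. Critics ask whether UBI is affordable."

print(repr(strip_prompt_by_split(decoded, prompt)))   # ' affordable.'
print(repr(strip_prompt_by_prefix(decoded, prompt)))  # ' a policy. Critics ask whether UBI is affordable.'
```

Whether this matters depends on the prompt; for a long, distinctive prompt a repeat is unlikely, so the split approach is usually fine.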

And here we go ahead and pass in the prompt and get the generated text.

%%time
res = generate_response(prompt, tokenizer, model)
p80(prompt + res)

 Universal Basic Income, also known as UBI, is a topic of great interest in our
current age. In the past, it has been a subject of debate, but as we move into
the future, it is essential to take both positive and negative aspects into
consideration.  On the positive side, the implementation of a Universal Basic
Income (UBI) in Finland, the first nation to consider a fully universal basic
income, has proven to have significant benefits. The Finnish government has been
studying UBI since the late 1990s, and they have concluded that it can provide a
positive impact on society. By introducing a guaranteed Basic Income for all
citizens, regardless of their employment status, the Finnish government hopes to
address various concerns.  One of the major concerns in Finland is long-term
welfare dependency, where vulnerable individuals rely too heavily on social
welfare benefits. UBI aims to combat this issue by reducing financial stress and
providing a safety net for citizens in times of need. Additionally, UBI studies
indicate that an increase in overall happiness and improved health can also be
attributed to the provision of a Universal Basic Income. By alleviating the
financial burden faced by many individuals, UBI promotes a happier and healthier
population.  The Finnish concept of basic well-being reflects the understanding
that happiness should not solely depend on economic factors. According to the
World Happiness Report, happiness is influenced by various aspects, including
health, safety, social relations, freedom, and a sense of community. UBI takes
these factors into account by ensuring that everyone has the means to meet their
basic needs and have a support system in place.  UBI also recognizes the
importance of maintaining the workforce's skills and capacity. In many
countries, there are concerns about individuals becoming dependent on welfare
benefits, which may inadvertently lead to fewer people actively seeking
employment. However, UBI aims to address this issue by providing individuals
with financial security, allowing them to have the confidence to engage in the
labor market and contribute to society. By doing so, UBI helps to improve the
overall quality of the workforce and ensures that everyone has a fair chance at
employment.  The implementation of UBI in Finland will require careful
consideration and extensive research. The Finnish government is actively working
on refining the concept and gathering data to support their decision. This
involves conducting comprehensive studies to assess the potential impact of UBI
on various sectors of society. It also aims to explore how UBI can be designed
to be financially sustainable for the country, taking into account the unique
needs of Finland and its cultural values.  As we delve deeper into the world of
U
CPU times: user 14.3 s, sys: 12.2 ms, total: 14.3 s
Wall time: 14.6 s

33.4. Using an LLM for Perplexity type calculations

We have discussed perplexity before and here is the code to use the LLM to compute perplexity for a string of text. Note that device_map is not set to “auto” now in the initialization below.

def load_model_tokenizer(model_name_or_path):
    config = AutoConfig.from_pretrained(model_name_or_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name_or_path, config=config, trust_remote_code=True, cache_dir = f"cache/{model_name_or_path.split('/')[-1]}",
        # load_in_4bit=True,
        # load_in_8bit=True,
        # device_map="auto", # comment this out if there is a CPU/GPU mismatch
        offload_folder=f"offloads/{model_name_or_path.split('/')[-1]}"
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_name_or_path,
        padding_side="left",
        cache_dir=f"cache/{model_name_or_path.split('/')[-1]}"
    )
    tokenizer.pad_token = tokenizer.eos_token
    return model, tokenizer

# LLM
model_name = "Writer/palmyra-small" # https://huggingface.co/Writer/palmyra-small
# model_name = "mistralai/Mistral-7B-v0.1" # https://huggingface.co/mistralai/Mistral-7B-v0.1
# model_name = "HuggingFaceH4/zephyr-7b-alpha" # https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha
# model_name="microsoft/phi-1_5" # https://huggingface.co/microsoft/phi-1_5

model, tokenizer = load_model_tokenizer(model_name)

This implements perplexity, see https://srdas.github.io/NLPBook/Topic_Modeling.html?highlight=perplex#perplexity
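
As a reminder of what is being computed, perplexity is the exponential of the average negative log-probability that the model assigns to each observed token. The sketch below works through that arithmetic in plain Python using made-up token probabilities; in the real computation these probabilities come from the LLM.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability of the observed tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# Made-up probabilities the model assigned to each token that actually occurred.
confident = [0.5, 0.5, 0.5, 0.5]      # every token was a coin flip
uncertain = [0.01, 0.02, 0.01, 0.05]  # every token was a surprise

print(perplexity(confident))
print(perplexity(uncertain))
```

A useful reading: a perplexity of k means the model was, on average, as uncertain as if it were choosing uniformly among k tokens, so lower values mean the text was more predictable to the model.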

import torch
from tqdm import tqdm

def get_ppl(text, model, tokenizer, stride=512):
    encodings = tokenizer(text, return_tensors="pt")
    max_length = stride #model.config.n_positions
    seq_len = encodings.input_ids.size(1)
    nlls = []
    prev_end_loc = 0
    for begin_loc in range(0, seq_len, stride):
        end_loc = min(begin_loc + max_length, seq_len)
        trg_len = end_loc - prev_end_loc  # may be different from stride on last loop
        input_ids = encodings.input_ids[:, begin_loc:end_loc].to(model.device)  # keep inputs on the model's device
        target_ids = input_ids.clone()
        target_ids[:, :-trg_len] = -100

        with torch.no_grad():
            outputs = model(input_ids, labels=target_ids)

            # loss is calculated using CrossEntropyLoss which averages over valid labels
            # N.B. the model only calculates loss over trg_len - 1 labels, because it internally shifts the labels
            # to the left by 1.
            neg_log_likelihood = outputs.loss

        nlls.append(neg_log_likelihood)

        prev_end_loc = end_loc
        if end_loc == seq_len:
            break

    ppl = torch.exp(torch.stack(nlls).mean())
    return ppl.item()  # Python float; works whether the tensor is on CPU or GPU
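
Two aspects of get_ppl are easier to see without a model in the loop: how the text is cut into windows, and how the per-window losses are combined. The plain-Python sketch below reproduces the window arithmetic of the loop and then compares the equal-weight average used above with a token-weighted average. The loss values are made up, the helper names are hypothetical, and the window sizes are used as approximate token counts (the model actually scores one fewer label per window because of the internal shift).

```python
import math

def window_bounds(seq_len, stride=512):
    """Return (begin, end, n_targets) triples the way the get_ppl loop does."""
    bounds, prev_end = [], 0
    for begin in range(0, seq_len, stride):
        end = min(begin + stride, seq_len)
        bounds.append((begin, end, end - prev_end))
        prev_end = end
        if end == seq_len:
            break
    return bounds

def ppl_mean_of_windows(losses):
    # What get_ppl does: every window counts equally.
    return math.exp(sum(losses) / len(losses))

def ppl_token_weighted(losses, sizes):
    # Alternative: weight each window's mean loss by its token count.
    return math.exp(sum(l * n for l, n in zip(losses, sizes)) / sum(sizes))

bounds = window_bounds(1100)
print(bounds)  # [(0, 512, 512), (512, 1024, 512), (1024, 1100, 76)]

losses = [3.0, 3.2, 5.0]          # made-up mean losses per window
sizes = [n for _, _, n in bounds]
print(ppl_mean_of_windows(losses), ppl_token_weighted(losses, sizes))
```

With a short final window, the equal-weight average lets a few tokens move the result as much as a full window does, so the two numbers can differ noticeably for texts only slightly longer than a multiple of the stride.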

## EXAMPLE 1

# model_name = "gpt2"
# model, tokenizer = load_model_tokenizer(model_name)
get_ppl(""" I love PyTorch """, model, tokenizer)

`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.

846.6474609375

## EXAMPLE 2
txt = """ 'Alexis Sanchez had offers from several clubs when he left Barcelona last summer, including Arsenal and Liverpool. He chose to join Arsenal in a decision that delighted boss Arsene Wenger. Sanchez has made an instant impact in English football and has scored 19 goals for the Gunners so far this season. Arsenal face Liverpool in the Premier League on Saturday as the two sides compete for a top-four finish.', 'Luis Suarez the famous chilean striker has departed Barcelona FC and is currently being courted by teams in the Barclays Premier League. Manchester United is the most likely landing spot for the famous Chilean striker who has scored hundreds of goals throughout his famous career. Negotiations between interested parties are ongoing and a decision is expected to be heard in the coming weeks.', 'An impressive english football player is being scouted out by several teams that hope to have him on their team. These bidding wars are intense and the player still has yet to make a decision.', 'Wenger is happy to have Alexis Sanchez. Sanchez was previously dismissed by Barcelona. He has been influential with 19 goals this season.', 'Arsenal Manager Arsene Wenger elated that Alexis Sanchez will side with his team for Liverpool on Saturday.', '26 year old Alexis Sanchez will be lining up for Arsenal this Saturday, and manager Arsene Wenger is certainly happy about it. When Barcelona allowed Sanchez to leave, he could have chosen any team. Luckily, he chose Arsenal and has already scored nineteen goals this season.', 'Alexis Sanchez chose Emirates over Anfield. Alexis Sanchez impressed immediately with his stellar play. Wenger has been rumored to be linked with Liverpool winger Raheem Sterling.', 'Alexis Sanchez picked Emirates over Anfield. Alexis Sanchez has made an instant impact with 19 goals. 
The speculation that is being linked with Wenger is Raheem Sterling.', "Chile forward Alexis Sanchez recently chose to play for Arsenal at the Emirates Statium over Liverpool at Anfield. Sanchez came into the league strong, scoring 19 goals this season. Arsenal manager Arsene Wenger is trying to downplay speculation that links him to Liverpool's winger Raheem Sterling.", 'Alexus Sabcgez chose the play at Emerates Stadium over Anfield in the premier league. Sanchez has mad a great impact with the teal scoring 19 goals this year for the North London Club. Wenger has been impressed by the way Rodger has developed his team but has down played speculation linking him to Raheem Sterling, the Liverpool winger.', "Alexis Sanchez picked Arsenal and Emirates Stadium to play for instead of Liverpool and Anfield, and his coach Arsene Wenger is sure happy about it. Sanchez has already scored nineteen goals for his team this year. Wenger also downplayed an interest in Raheem Sterling, who is a forward for Liverpool. Sterling rejected his team's latest contract offer." """

print("INPUT:")
p80(txt)

print("\nPPL metric =", get_ppl(txt, model, tokenizer))

INPUT:
 'Alexis Sanchez had offers from several clubs when he left Barcelona last
summer, including Arsenal and Liverpool. He chose to join Arsenal in a decision
that delighted boss Arsene Wenger. Sanchez has made an instant impact in English
football and has scored 19 goals for the Gunners so far this season. Arsenal
face Liverpool in the Premier League on Saturday as the two sides compete for a
top-four finish.', 'Luis Suarez the famous chilean striker has departed
Barcelona FC and is currently being courted by teams in the Barclays Premier
League. Manchester United is the most likely landing spot for the famous Chilean
striker who has scored hundreds of goals throughout his famous career.
Negotiations between interested parties are ongoing and a decision is expected
to be heard in the coming weeks.', 'An impressive english football player is
being scouted out by several teams that hope to have him on their team. These
bidding wars are intense and the player still has yet to make a decision.',
'Wenger is happy to have Alexis Sanchez. Sanchez was previously dismissed by
Barcelona. He has been influential with 19 goals this season.', 'Arsenal Manager
Arsene Wenger elated that Alexis Sanchez will side with his team for Liverpool
on Saturday.', '26 year old Alexis Sanchez will be lining up for Arsenal this
Saturday, and manager Arsene Wenger is certainly happy about it. When Barcelona
allowed Sanchez to leave, he could have chosen any team. Luckily, he chose
Arsenal and has already scored nineteen goals this season.', 'Alexis Sanchez
chose Emirates over Anfield. Alexis Sanchez impressed immediately with his
stellar play. Wenger has been rumored to be linked with Liverpool winger Raheem
Sterling.', 'Alexis Sanchez picked Emirates over Anfield. Alexis Sanchez has
made an instant impact with 19 goals. The speculation that is being linked with
Wenger is Raheem Sterling.', "Chile forward Alexis Sanchez recently chose to
play for Arsenal at the Emirates Statium over Liverpool at Anfield. Sanchez came
into the league strong, scoring 19 goals this season. Arsenal manager Arsene
Wenger is trying to downplay speculation that links him to Liverpool's winger
Raheem Sterling.", 'Alexus Sabcgez chose the play at Emerates Stadium over
Anfield in the premier league. Sanchez has mad a great impact with the teal
scoring 19 goals this year for the North London Club. Wenger has been impressed
by the way Rodger has developed his team but has down played speculation linking
him to Raheem Sterling, the Liverpool winger.', "Alexis Sanchez picked Arsenal
and Emirates Stadium to play for instead of Liverpool and Anfield, and his coach
Arsene Wenger is sure happy about it. Sanchez has already scored nineteen goals
for his team this year. Wenger also downplayed an interest in Raheem Sterling,
who is a forward for Liverpool. Sterling rejected his team's latest contract
offer."

PPL metric = 34.58329391479492

The longer text is much more self-consistent than the shorter one, so it has a much lower perplexity (about 34.6 versus about 846.6)!

33.5. GPUs

Going forward, understanding and using GPUs is going to be a useful skill. See: https://codeconfessions.substack.com/p/gpu-computing