
How to Run Alpaca-LoRA on Your Device

Learn how to run Alpaca-LoRA on your device with this comprehensive guide. Discover how this open-source model leverages LoRA technology to offer a powerful yet efficient AI chatbot solution.
Sep 26, 2023  · 7 min read

As generative AI continues to gain traction, developers worldwide are leaping at the opportunity to build exciting applications using natural language. One tool in particular has garnered plenty of attention recently: ChatGPT.

ChatGPT is a language model developed by OpenAI. Its purpose is to serve as an AI-powered chatbot capable of engaging in human-like dialogue. Although it’s a highly useful tool, it’s not without its problems. ChatGPT is not open-source, which means the source code is not accessible and cannot be modified. It’s also extremely resource-intensive, which makes building and hosting your own implementation impractical.

Such problems gave rise to a series of ChatGPT alternatives, such as Alpaca-LoRA, that can function like ChatGPT but with an open-source license and lower resource requirements.

In this tutorial, we will focus our attention specifically on Alpaca-LoRA. We will cover what it is, the prerequisites to run it on your device, and the steps to execute it.

What is Alpaca-LoRA?

In early March 2023, Eric J. Wang released the Alpaca-LoRA project. It contains code for reproducing the Stanford Alpaca results using Parameter-Efficient Fine-Tuning (PEFT), a library that enables developers to fine-tune transformer-based models using LoRA.

Understanding LoRA

Low-Rank Adaptation of Large Language Models (LoRA) is a method used to accelerate the process of training large models while consuming less memory.

Here's how it works:

  • Freezing existing weights. Imagine the model as a complex web of interconnected nodes (these are the "weights"). Normally, you'd adjust all these nodes during training to improve the model. LoRA says, "Let's not touch these; let's keep them as they are."
  • Adding new weights. LoRA then adds a few new, simpler connections (new weights) to this web.
  • Training only the new weights. Instead of adjusting the entire complex web, you only focus on improving these new, simpler connections.

By doing this, you save time and computer memory while still making your model better at its tasks.
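
To make this concrete, here’s a minimal sketch of a LoRA-style layer in PyTorch. This is an illustrative toy, not the actual implementation used by the PEFT library: a frozen linear layer is augmented with two small trainable matrices whose product forms the low-rank update.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        # 1. Freeze the existing weights
        for p in self.base.parameters():
            p.requires_grad = False
        # 2. Add new low-rank weights A (r x in) and B (out x r)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Original output plus the scaled low-rank correction
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# 3. Train only the new weights: just A and B require gradients
layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"{trainable:,} trainable out of {total:,} total parameters")

Because B starts at zero, the layer initially behaves exactly like the original, and training only ever touches the small A and B matrices.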

Advantages of LoRA

The advantages of LoRA include:

  • Portability – Rank-decomposition weight matrices contain far fewer trainable parameters than the original model; thus, the trained LoRA weights are easily portable and can even run on a Raspberry Pi.
  • Accessibility – Compared to conventional fine-tuning, LoRA has been demonstrated to significantly reduce GPU memory usage; this makes it possible to perform fine-tuning on consumer GPUs such as the Tesla T4, RTX 3080, or even the RTX 2080 Ti.

Alpaca: The Open-Source Model

Alpaca, on the other hand, is an open-source instruction-finetuned AI language model based on the Large Language Model Meta AI (LLaMA). It was developed by a team of researchers at Stanford University with the intent of making large language models (LLMs) more accessible.

And this brings us to Alpaca-LoRA.

The Alpaca-LoRA model is a less resource-intensive version of the Stanford Alpaca model that leverages LoRA to speed up the training process while consuming less memory.

Alpaca-LoRA Prerequisites

To run the Alpaca-LoRA model locally, you must have a GPU. It can be a low-spec GPU, such as an NVIDIA T4, or a consumer GPU such as the RTX 4090. According to Eric J. Wang, the creator of the project, the model “runs within hours on a single RTX 4090.”
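
If you want to confirm that PyTorch can see your GPU before going further, a quick sanity check (assuming PyTorch is already installed) is:

python -c "import torch; print(torch.cuda.is_available())"

This should print True on a correctly configured CUDA machine.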

Note: the instructions in this article follow those provided in the Alpaca-LoRA repository by Eric J. Wang.

How to Run Alpaca-LoRA in 4 Steps

Step 1: Create the virtual environment (Optional)

Virtual environments are isolated containers used to store the Python-related dependencies required for a specific project. This helps to keep the dependencies required for different projects separate, thereby making it easier to share projects and reduce dependency conflicts.

It’s not mandatory to use one to run the Alpaca-LoRA model, but it’s recommended.

To create a virtual environment using the command prompt on the Windows operating system, run the following:

py -m venv venv

This will create a virtual environment called venv in your current working directory.

Note: You may use whatever name you wish for your virtual environment by replacing the second venv with your preferred name.

Before you install any dependencies, you must activate the virtual environment. Run the following command to activate your virtual environment:

venv\Scripts\activate.bat
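
Note: the commands above are for the Windows command prompt. On macOS or Linux, the equivalent commands are typically:

python3 -m venv venv
source venv/bin/activate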

When you are no longer using the virtual environment, run the following command to deactivate it:

deactivate

Now you’re ready to get to work on running Alpaca-LoRA.

Step 2: Setup

The first step to run the Alpaca-LoRA model is to clone the repository from GitHub and install the dependencies required for execution.

Use the following command to clone the GitHub repository:

git clone https://github.com/tloen/alpaca-lora.git

Then navigate to the alpaca-lora repository you just cloned using:

cd alpaca-lora

And run the following command to install the dependencies:

pip install -r requirements.txt

Step 3: Fine-Tuning the Model (Optional)

The alpaca-lora repository contains a file named finetune.py, which, among other things, applies Parameter-Efficient Fine-Tuning (PEFT) to the LLaMA model.

This is the file you must execute if you wish to tweak the hyperparameters of the model, but it’s not mandatory. According to the author of the repository, “Without hyperparameter tuning, the LoRA model produces outputs comparable to the Stanford Alpaca model. Further tuning might be able to achieve better performance [...].”

Here’s an example of how to use the finetune.py file:

python finetune.py \
    --base_model 'decapoda-research/llama-7b-hf' \
    --data_path 'yahma/alpaca-cleaned' \
    --output_dir './lora-alpaca' \
    --batch_size 128 \
    --micro_batch_size 4 \
    --num_epochs 3 \
    --learning_rate 1e-4 \
    --cutoff_len 512 \
    --val_set_size 2000 \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target_modules '[q_proj,v_proj]' \
    --train_on_inputs \
    --group_by_length
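
For context, here’s a rough sketch of what finetune.py does internally with the PEFT library. This is simplified and abbreviated — the actual script also handles data loading, tokenization, and the training loop:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the frozen base model
base_model = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")

# Mirror the LoRA hyperparameters from the command above
config = LoraConfig(
    r=8,                                  # --lora_r
    lora_alpha=16,                        # --lora_alpha
    lora_dropout=0.05,                    # --lora_dropout
    target_modules=["q_proj", "v_proj"],  # --lora_target_modules
    task_type="CAUSAL_LM",
)

# Wrap the base model so only the LoRA weights are trainable
model = get_peft_model(base_model, config)
model.print_trainable_parameters()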

Step 4: Running the model / Inference

Also in the alpaca-lora repository is a file named generate.py. Executing generate.py will do the following:

  • Read the foundational model from the Hugging Face model hub
  • Read the model weights from tloen/alpaca-lora-7b
  • Start up a Gradio interface where inference is performed on a specified input

At the time of writing, the most recent Alpaca-LoRA adapter used to train the model is alpaca-lora-7b, which was trained on March 26, 2023, using the following command:

python finetune.py \
    --base_model='decapoda-research/llama-7b-hf' \
    --num_epochs=10 \
    --cutoff_len=512 \
    --group_by_length \
    --output_dir='./lora-alpaca' \
    --lora_target_modules='[q_proj,k_proj,v_proj,o_proj]' \
    --lora_r=16 \
    --micro_batch_size=8

If you wish to use a different adapter, you can do so by running generate.py and pointing the --lora_weights flag at your preferred adapter:

python generate.py \
    --load_8bit \
    --base_model 'decapoda-research/llama-7b-hf' \
    --lora_weights 'tloen/alpaca-lora-7b'
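
If you’d rather skip the Gradio interface and run inference directly in Python, a minimal sketch using the same base model and adapter looks like this. The prompt template follows the Alpaca instruction format, and max_new_tokens and the example instruction are illustrative choices — adjust them to taste:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Apply the trained LoRA adapter weights on top of the frozen base model
model = PeftModel.from_pretrained(model, "tloen/alpaca-lora-7b")

prompt = "### Instruction:\nExplain LoRA in one sentence.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))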

Wrap up

Alpaca-LoRA is a less resource-intensive version of the Stanford Alpaca model. It achieves this goal by leveraging low-rank adaptation of large language models (LoRA), which speeds up the training process while consuming far less memory than the original Alpaca model.
