
How to Run Alpaca-LoRA on Your Device

Learn how to run Alpaca-LoRA on your device with this comprehensive guide. Discover how this open-source model leverages LoRA technology to offer a powerful yet efficient AI chatbot solution.
Updated Sep 2023  · 7 min read

As generative AI continues to gain traction, developers worldwide are leaping at the opportunity to build exciting applications using natural language. One tool in particular has garnered plenty of attention recently: ChatGPT.

ChatGPT is a language model developed by OpenAI. Its purpose is to serve as an AI-powered chatbot capable of engaging in human-like dialogue. Although it’s a highly useful tool, it’s not without its problems. ChatGPT is not open-source, which means the source code is not accessible and cannot be modified. It’s also extremely resource-intensive, which makes building your own implementation impractical for most developers.

These problems gave rise to a series of ChatGPT alternatives, such as Alpaca-LoRA, that can function like ChatGPT but with an open-source license and lower resource requirements.

In this tutorial, we will focus our attention specifically on Alpaca-LoRA. We will cover what it is, the prerequisites to run it on your device, and the steps to execute it.

What is Alpaca-LoRA?

In early March 2023, Eric J. Wang released the Alpaca-LoRA project. It contains code to reproduce the Stanford Alpaca results using Parameter-Efficient Fine-Tuning (PEFT), a library that enables developers to fine-tune transformer-based models using LoRA.

Understanding LoRA

Low-Rank Adaptation of Large Language Models (LoRA) is a method used to accelerate the process of training large models while consuming less memory.

Here's how it works:

  • Freezing existing weights. Imagine the model as a complex web of interconnected nodes (these are the "weights"). Normally, you'd adjust all these nodes during training to improve the model. LoRA says, "Let's not touch these; let's keep them as they are."
  • Adding new weights. LoRA then adds a few new, simpler connections (new weights) to this web.
  • Training only the new weights. Instead of adjusting the entire complex web, you only focus on improving these new, simpler connections.

By doing this, you save time and computer memory while still making your model better at its tasks.
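To make this concrete, here's a minimal sketch of what applying LoRA looks like with Hugging Face's peft library. The base model ("gpt2") and the hyperparameter values are illustrative choices, not the ones Alpaca-LoRA itself uses:

# A minimal sketch of applying LoRA with the peft library
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the updates
    lora_dropout=0.05,
    target_modules=["c_attn"],  # the attention layers that receive new weights
)

model = get_peft_model(base_model, config)  # freezes the existing weights
model.print_trainable_parameters()          # only the new weights will train

On a setup like this, print_trainable_parameters() typically reports that well under 1% of the model's parameters are trainable, which is exactly where the time and memory savings come from.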

Advantages of LoRA

The advantages of LoRA include:

  • Portability – Rank-decomposition weight matrices contain far fewer trainable parameters than the original model; thus, the trained LoRA weights are easily portable and can even run on a Raspberry Pi (the sketch after this list shows the scale of the reduction).
  • Accessibility – Compared to conventional fine-tuning, LoRA has been demonstrated to significantly reduce GPU memory usage; this makes it possible to perform fine-tuning on accessible GPUs such as the Tesla T4, or consumer cards like the RTX 3080 or RTX 2080 Ti.
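To get a feel for the scale of that reduction, here's a quick back-of-the-envelope calculation in Python; the hidden size and rank below are illustrative values, not Alpaca-LoRA's exact configuration:

# LoRA replaces the update to a (d, d) weight matrix W with the product
# A @ B, where A has shape (d, r), B has shape (r, d), and the rank r << d.
d, r = 4096, 8                 # illustrative hidden size and LoRA rank
full_params = d * d            # trainable parameters in full fine-tuning
lora_params = d * r + r * d    # trainable parameters with LoRA
print(full_params, lora_params, full_params // lora_params)
# prints: 16777216 65536 256 -- the LoRA update is 256x smaller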

Alpaca: The Open-Source Model

Alpaca, on the other hand, is an open-source instruction-finetuned AI language model based on the Large Language Model Meta AI (LLaMA). It was developed by a team of researchers at Stanford University with the intent of making large language models (LLMs) more accessible.

And this brings us to Alpaca-LoRA.

The Alpaca-LoRA model is a less resource-intensive version of the Stanford Alpaca model that leverages LoRA to speed up the training process while consuming less memory.

Alpaca-LoRA Prerequisites

To run the Alpaca-LoRA model locally, you must have a GPU. It can be a low-spec GPU such as the NVIDIA T4 or a consumer GPU like the RTX 4090. According to Eric J. Wang, the creator of the project, the model “runs within hours on a single RTX 4090.”
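If you want to confirm that a compatible GPU is visible before going further, a quick check with PyTorch (which the repository's requirements install in Step 2) looks like this:

# Check that PyTorch can see a CUDA-capable GPU
import torch

print(torch.cuda.is_available())           # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g. "NVIDIA GeForce RTX 4090"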

Note: the instructions in this article follow those provided in the Alpaca-LoRA repository by Eric J. Wang.

How to Run Alpaca-LoRA in 4 Steps

Step 1: Create the virtual environment (Optional)

Virtual environments are isolated containers used to store the Python-related dependencies required for a specific project. This helps to keep the dependencies required for different projects separate, thereby making it easier to share projects and reduce dependency conflicts.

It’s not mandatory to use one to run the Alpaca-LoRA model, but it’s recommended.

To create a virtual environment using the command prompt on the Windows operating system, run the following:

py -m venv venv

This will create a virtual environment called venv in your current working directory.

Note: You may use whatever name you wish for your virtual environment by replacing the second venv with your preferred name.

Before you install any dependencies, you must activate the virtual environment. Run the following command to activate your virtual environment:

venv\Scripts\activate.bat
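Note: On Linux or macOS, create the environment with python3 -m venv venv and activate it with source venv/bin/activate instead.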

When you are no longer using the virtual environment, run the following command to deactivate it:

deactivate

Now you’re ready to get to work on running Alpaca-LoRA.

Step 2: Setup

The first step to run the Alpaca-LoRA model is to clone the repository from GitHub and install the dependencies required for execution.

Use the following command to clone the GitHub repository:

git clone https://github.com/tloen/alpaca-lora.git

Then navigate to the alpaca-lora repository you just cloned using:

cd alpaca-lora

And run the following command to install the dependencies:

pip install -r requirements.txt

Step 3: Fine-Tuning the Model (Optional)

The alpaca-lora repository contains a file named finetune.py, which holds a straightforward application of Parameter-Efficient Fine-Tuning (PEFT) to the LLaMA model, among other things.

This is the file you must execute if you wish to tweak the model’s hyperparameters, but it’s not mandatory. According to the author of the repository, “Without hyperparameter tuning, the LoRA model produces outputs comparable to the Stanford Alpaca model. Further tuning might be able to achieve better performance [...].”

Here’s an example of how to use the finetune.py file:

python finetune.py \
    --base_model 'decapoda-research/llama-7b-hf' \
    --data_path 'yahma/alpaca-cleaned' \
    --output_dir './lora-alpaca' \
    --batch_size 128 \
    --micro_batch_size 4 \
    --num_epochs 3 \
    --learning_rate 1e-4 \
    --cutoff_len 512 \
    --val_set_size 2000 \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target_modules '[q_proj,v_proj]' \
    --train_on_inputs \
    --group_by_length
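A note on two of these flags: in this script, micro_batch_size controls how many examples are processed on the GPU at once, while batch_size is the effective batch size reached through gradient accumulation. Lowering micro_batch_size is therefore a simple way to fit training into less GPU memory without changing the effective batch size.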

Step 4: Running the model / Inference

Also in the alpaca-lora repository is a file named generate.py. Executing generate.py will perform the following:

  • Load the foundation model from the Hugging Face model hub
  • Load the LoRA weights from tloen/alpaca-lora-7b
  • Start a Gradio interface where inference is performed on a specified input.

At the time of writing, the most recent Alpaca-LoRA adapter used to train the model is alpaca-lora-7b. It was trained on the 26th of March, 2023, using the following command:

python finetune.py \
    --base_model='decapoda-research/llama-7b-hf' \
    --num_epochs=10 \
    --cutoff_len=512 \
    --group_by_length \
    --output_dir='./lora-alpaca' \
    --lora_target_modules='[q_proj,k_proj,v_proj,o_proj]' \
    --lora_r=16 \
    --micro_batch_size=8

If you wish to use a different adapter, you may do so by running the generate.py file with the --lora_weights argument pointed at your preferred adapter:

python generate.py \
    --load_8bit \
    --base_model 'decapoda-research/llama-7b-hf' \
    --lora_weights 'tloen/alpaca-lora-7b'
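Once the script is running, Gradio prints a local URL (by default http://localhost:7860) where you can interact with the model. If you'd rather call the model from your own Python code instead of the web interface, the sketch below is a simplified version of what generate.py does internally; the prompt template here is abbreviated, and the real script additionally handles 8-bit loading, device placement, and the full Alpaca prompt format:

# A simplified sketch of programmatic inference with the Alpaca-LoRA adapter
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_model = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")
model = PeftModel.from_pretrained(base_model, "tloen/alpaca-lora-7b")
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

prompt = "### Instruction:\nName three primary colors.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))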

Wrap up

Alpaca-LoRA is a less resource-intensive version of the Stanford Alpaca model. It achieves this goal by leveraging low-rank adaptation of large language models (LoRA), which speeds up the training process while consuming far less memory than the original Alpaca model.


Author: Kurtis Pykes
