
How to Run Llama 3.2 1B on an Android Phone With Torchchat

Get step-by-step instructions on how to set up and run Llama 3.2 1B on your Android device using the Torchchat framework.
Oct 31, 2024 · 12 min read

I recently wrote an article on how to run LLMs with Python and Torchchat. Torchchat is a flexible framework designed to execute LLMs efficiently on various hardware platforms. Running an LLM locally offers several benefits, including:

  • Offline access: Because the model is running on our device, we don’t need to be connected to the internet to use it.
  • Privacy: Since we run the model ourselves, the prompts and data we input remain private.
  • Cost: We can run the model for free.

Following Meta's recent release of Llama 3.2, I’ve decided to extend my previous article to show how to deploy the model to our mobile phones, focusing on Android.

Here’s a high-level overview of what we need to do:

  • Download the model files, which are freely available because the model is open source.
  • Generate the necessary files to run the model on a mobile device.
  • Set up an Android chat application that executes the model.

Torchchat handles all of these steps: it has commands to download the model and to generate the files required to run it. On top of that, it comes with a demo Android chat application, so we don’t need to build one ourselves.


Downloading Torchchat

The first step is to clone the Torchchat repository using Git:

git clone git@github.com:pytorch/torchchat.git
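
The command above clones over SSH, which assumes an SSH key is configured with GitHub. If it isn’t, cloning over HTTPS works just as well:

git clone https://github.com/pytorch/torchchat.git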

We can also skip Git entirely and download the repository as a ZIP using the download button on GitHub:

Downloading Torchchat from GitHub

Important note: After cloning (or downloading) the repository, a folder named torchchat will become available on our computer. All commands mentioned in this article should be executed from within that folder.

Torchchat Installation

We’ll assume that Python and Anaconda have already been installed. If they are not, refer to these two tutorials: Installing Anaconda on Windows and How to Install Anaconda on Mac OS X.

Alternatively, follow the steps on the Torchchat repository, which uses a virtual environment instead of Anaconda.

We begin by creating an Anaconda environment using the command:

conda create -yn llama python=3.10.0

This creates an environment named llama that uses version 3.10 of Python. The -yn option combines -y and -n:

  • The -y option allows the command to proceed without asking for confirmation before creating the environment.
  • The -n option specifies the environment's name, which in this case is llama.

After creating the environment, we activate it with the following command:

conda activate llama

To install the necessary dependencies, we use the installation script provided by Torchchat:

./install/install_requirements.sh
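
To verify that the installation succeeded, we can print the command-line help, which lists the available subcommands (download, export, generate, and so on):

python torchchat.py --help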

Downloading the Llama 3.2 1B Model

In this tutorial, we’ll use the Llama 3.2 1B model, a one billion-parameter model. We can download it using the command:

python torchchat.py download llama3.2-1b

The process is the same for other models: we just replace llama3.2-1b with the alias of the desired model. It’s important to select only models marked as mobile-friendly.
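
To see all available aliases, Torchchat provides a list command:

python torchchat.py list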

Torchchat leverages Hugging Face for model management. Consequently, downloading a model requires a Hugging Face account. For a comprehensive guide on how to create an account and log in from the terminal, check the “Downloading a model” section from this Torchchat tutorial.
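
Assuming the Hugging Face CLI was installed by the requirements script, the login flow boils down to generating an access token at https://huggingface.co/settings/tokens and pasting it into:

huggingface-cli login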

Alternatively, the model can also be downloaded from Llama’s official website. However, using Torchchat is more convenient, as it facilitates exporting the model files for mobile deployment.

Exporting the Model

To deploy to a mobile device, we first export the model to generate a .pte artifact, the file format used by ExecuTorch. ExecuTorch is the engine Torchchat uses to run the LLM on a mobile device.

To generate the PTE file, we must first install ExecuTorch. To do this, run the following commands:

export TORCHCHAT_ROOT=${PWD} 
./torchchat/utils/scripts/install_et.sh

After the installation is complete, we can generate the PTE file using the command:

python torchchat.py export llama3.2-1b --quantize torchchat/quant_config/mobile.json --output-pte-path llama3_2-1b.pte

This command will also quantize the model, reducing its size and increasing the inference speed. We use Torchchat's default configuration in torchchat/quant_config/mobile.json.
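
As a rough illustration of what such a config expresses (this is an assumption about its general shape, not the verbatim contents of mobile.json), a mobile-oriented config combines low-bit-width settings for the embedding and linear layers:

{
  "embedding": {"bitwidth": 4, "groupsize": 32},
  "linear:a8w4dq": {"groupsize": 256}
}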

When this process is completed, a file named llama3_2-1b.pte will be created in the torchchat folder.
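
Before moving to the phone, we can sanity-check the exported artifact by running it on the computer. Per the Torchchat docs, the generate command accepts a path to a .pte file (adjust the flags if your version differs):

python torchchat.py generate llama3.2-1b --pte-path llama3_2-1b.pte --prompt "Hello, my name is"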

Preparing the Mobile App

Torchchat offers a demo Android app that lets us run Llama directly on our smartphones. This demo project can be found in the Torchchat folder that was downloaded; its relative path is torchchat/edge/android/torchchat. Note that the first torchchat in the path refers to a folder inside the repository, not the repository root folder.

Download and set up the Java library

  1. Download the .aar file provided by Torchchat. It contains the Java library and the corresponding JNI library needed to build and run the app.
  2. Navigate to the app directory: torchchat/edge/android/torchchat/app/
  3. Create a directory named libs (if one doesn’t exist already)
  4. Rename the downloaded file to executorch.aar and place it in the libs folder (see the sketch after this list).
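
Put together, these steps look like this when run from the repository root (the downloaded filename below is hypothetical; use whatever name the .aar actually has):

mkdir -p torchchat/edge/android/torchchat/app/libs
mv ~/Downloads/executorch-llama.aar torchchat/edge/android/torchchat/app/libs/executorch.aar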

Load the Android project on Android Studio

Download and install Android Studio from the official website. Once the installation is complete, open the project:

Opening the demo app on Android Studio

The demo project is located at torchchat/edge/android/torchchat.

When we open the project, we’re prompted with this window asking us if we trust the code:

Popup after opening the demo app

Here, choose the “Trust Project” option.

After opening the project, Android Studio will need some time to load the configuration and complete the initial setup. It's important to wait until this process is finished. We can monitor the status in the bottom right corner of the window. Once it’s ready, the run button at the top should become green:

How to know when the project is loaded

Setting up developer mode on Android

To be able to run the app on the phone via Android Studio, we need to enable developer mode on our device. Here’s how we can do that:

  1. Go to "Settings"
  2. Select "About device" or "About phone"
  3. Open “Software information”
  4. Find the “Build number” and press it seven times
  5. Open the newly unlocked “Developer options” menu and enable “USB debugging”

That's it! Now, we can use our device to run the app from Android Studio. When we do this, the app will remain on our phone, enabling us to run it even without a connection to the computer.

Installing adb

The app needs the model files to be located in a specific folder on the phone. To send those files to the phone, we use adb (Android Debug Bridge). We can install it using Android Studio by following these steps:

  1. Navigate to the settings and start typing “Android SDK” on the search bar
  2. Locate the menu named “Android SDK” and click on it
  3. Select the “Android SDK Command-line Tools (latest)” option
  4. Click “Apply” to install it

Installing adb from Android Studio

Note that at the top of the window, Android Studio shows the location of the Android SDK:

Finding the path to adb

We copy that path and execute the following command (replacing <android_sdk_location> below with the path that was just copied):

export PATH=$PATH:<android_sdk_location>/platform-tools/

To confirm that adb was installed successfully, use the command adb --version, which will display the version of adb.
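
Note that this export only lasts for the current terminal session. To make it permanent, we can append the same line to the shell profile, for example with bash:

echo 'export PATH=$PATH:<android_sdk_location>/platform-tools/' >> ~/.bashrc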

Setting the app on the phone

Connect the phone to the computer with a cable. The name of the device will appear on the list of devices next to the run button:

Device selection in Android Studio

Before running the app on the phone, we now use adb to copy the necessary model files to our device.

  1. Find the id of the device using the adb devices command. There could be multiple lines in the output; locate the one of the form <device_code> device and copy the device code.
  2. Create the directory to store the model files using adb -s <device_code> shell mkdir -p /data/local/tmp/llama.
  3. Copy the .pte file located in the torchchat folder with adb -s <device_code> push llama3_2-1b.pte /data/local/tmp/llama.
  4. Locate the model tokenizer file using python torchchat.py where llama3.2-1b. This will output several lines; we care about the path shown in the last line.

Locating the tokenizer.model file

  5. Copy the tokenizer file to the device using adb -s <device_code> push <model_path>/tokenizer.model /data/local/tmp/llama, replacing <model_path> with the path obtained in the previous step.

After completing these steps, the model files should be on our phone and ready to use. We can check it by listing all files located in the device folder we just created:

adb -s <device_code> shell ls /data/local/tmp/llama

The output should have the two files we just copied:

llama3_2-1b.pte
tokenizer.model

Running the app

Everything is now ready to run the demo app on our phone. We can click the green arrow to run it.

Running the app on the phone

This will open the app on our device. The app prompts us to select the model and tokenizer files.

Selecting the model and tokenizer

Using the Llama 3.2 1B App

We can now start chatting with Llama 3.2 1B on our phone! The app interface is minimal: we can use the “Prompt” textbox to write a prompt and send it to the model with the “Generate” button.

Since the model runs offline, I could use it on the plane to my next destination to ask, for example, for some dish recommendations:

Example of using the app

With this interaction, we observe a few limitations of the demo app:

  1. The answer repeats our prompt.
  2. The answer begins with “I’ve been to Taiwan…”, suggesting that the app doesn’t format our input as an assistant-style prompt and the model simply continues the text.
  3. The responses are cut abruptly, likely because the token limit has been reached.

We can compare the behavior by running the same model on the terminal with the command:

python torchchat.py chat llama3.2-1b

We provided the same prompt to see if the behavior was different. Indeed, in this case, the answer seems much more natural and useful, despite also being cut off:

Behavior of the model on the computer

I reached out to the team behind Torchchat, and they told me that Llama 3.2 was too recent and that the demo app needed to be slightly updated to support it. 

However, the setup process will remain the same, and the knowledge acquired here will still apply to the updated version. It is possible that by the time you are reading this article, it has already been updated and is functioning correctly.

Conclusion

In this guide, we learned how to set up Llama 3.2 1B directly on an Android device using Torchchat. We covered the step-by-step process of downloading and installing the necessary components, including the model files and the demo Android app.

Though there are some limitations with the current demo app, particularly in response formatting and length, the underlying potential of such implementations is immense.

If you want to read more on Llama 3.2, I recommend these blogs:

  • Llama 3.2 Guide: How It Works, Use Cases & More
  • Llama 3.2 and Gradio Tutorial: Build a Multimodal Web App
Author: François Aubry
Teaching has always been my passion. From my early days as a student, I eagerly sought out opportunities to tutor and assist other students. This passion led me to pursue a PhD, where I also served as a teaching assistant to support my academic endeavors. During those years, I found immense fulfillment in the traditional classroom setting, fostering connections and facilitating learning. However, with the advent of online learning platforms, I recognized the transformative potential of digital education. In fact, I was actively involved in the development of one such platform at our university. I am deeply committed to integrating traditional teaching principles with innovative digital methodologies. My passion is to create courses that are not only engaging and informative but also accessible to learners in this digital age.