
How to Set Up and Run Gemma 3 Locally With Ollama

Learn how to install, set up, and run Gemma 3 locally with Ollama and build a simple file assistant on your own device.
Mar 17, 2025  · 12 min read

Google DeepMind just released Gemma 3, the next iteration of their open-source models. Gemma 3 is designed to run directly on low-resource devices like phones and laptops. These models are optimized for quick performance on a single GPU or TPU and come in various sizes to suit different hardware needs.

In this tutorial, I’ll explain step by step how to set up and run Gemma 3 locally using Ollama. Once we do that, I’ll show you how you can use Gemma 3 and Python to build a file assistant.

Why Run Gemma 3 Locally?

Running a large language model (LLM) like Gemma 3 locally comes with several key benefits:

  • Privacy: Data stays on the device, protecting sensitive information.
  • Low latency: Eliminates the need for internet transmission, resulting in faster responses.
  • Customization: Models can be adjusted to suit specific needs and experiments.
  • Cost efficiency: Reduces cloud usage fees by utilizing existing hardware.
  • Offline access: Applications remain operational without internet connectivity.
  • Control: Enhanced security and control over our computing environment.

Set Up Gemma 3 Locally With Ollama

Installing Ollama

Ollama is a platform available for Windows, macOS, and Linux that supports running and distributing AI models, making it easier for developers to integrate these models into their projects. We'll use it to download and run Gemma 3 locally.

The first step is to download and install it from the official Ollama website.

Downloading Ollama to use Gemma 3 locally

When installing it, make sure to also install the command-line tool:

Install Ollama command line

After completing the installation, we can check that it was installed correctly by running the ollama command in the terminal. Here's what the result should look like:

Ollama installation check
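If you prefer a quicker check, printing the installed version also confirms that the command-line tool is available on your path:

ollama --version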

Downloading Gemma 3

To download a model with Ollama, use the pull command:

ollama pull <model_name>:<model_version>

The list of available models can be found in Ollama’s library.

Gemma 3, in particular, is available in four sizes: 1b, 4b, 12b, and 27b, where b stands for billion and refers to the number of parameters in the model.

For example, to download the 1B version of Gemma 3, we use the command:

ollama pull gemma3:1b

If we don't specify the model version, the 4b model will be downloaded by default:

ollama pull gemma3

Listing local models

We can list the models we have locally using the following command:

ollama list

In my case, the output shows that I have two models: 

NAME             ID              SIZE      MODIFIED       
gemma3:1b        2d27a774bc62    815 MB    38 seconds ago    
gemma3:latest    c0494fe00251    3.3 GB    22 minutes ago 

Chatting in the terminal

We can use Ollama to chat with a model using the run command:

ollama run gemma3

Chatting with Ollama and Gemma 3 in the terminal

Note that if we use the run command with a model we haven't downloaded yet, Ollama will download it automatically first, just as pull would.

Running Gemma 3 in the background

To use Gemma 3 with Python, we need the Ollama server running in the background. We can start it with the serve command:

ollama serve

If you get the following error when executing the command, it likely means that Ollama is already running:

Error: listen tcp 127.0.0.1:11434: bind: address already in use

This typically happens because the Ollama desktop app already keeps a server running in the background, in which case there's nothing more to do.
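To double-check that a server is reachable, we can query Ollama's default address (127.0.0.1, port 11434) from another terminal; if the server is up, it replies with a short status message:

curl http://127.0.0.1:11434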

Using Gemma 3 Locally with Python

Set up the Python environment

Ollama offers a Python package that makes it easy to connect to the models running on our computer.

We'll use Anaconda to set up a Python environment and add the necessary dependencies. Doing it this way helps prevent possible issues with other Python packages we may already have.

Once Anaconda is installed, we can set up the environment by using the command:

conda create -n gemma3-demo -y python=3.9

This command creates an environment called gemma3-demo using Python version 3.9. The -y option automatically answers yes to any confirmation prompts during setup.
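To confirm that the environment was created, we can list all conda environments:

conda env list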

Next, we activate the environment using:

conda activate gemma3-demo

Finally, we install the ollama package using the command:

pip install ollama
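To verify that the package was installed into the active environment, we can ask pip to show its metadata:

pip show ollama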

Sending a message to Gemma 3 with Python

Here's how we can send a message to Gemma 3 using Python:

from ollama import chat

# Send a single message to the locally running Gemma 3 model
response = chat(
    model="gemma3",
    messages=[
        {
            "role": "user",
            "content": "Why is the sky blue?",
        },
    ],
)

# Print the model's answer
print(response.message.content)

Depending on your hardware, the model may take a while to answer, so be patient when executing the script.

We previously saw that gemma3 refers to gemma3:4b by default, so when we specify model="gemma3", that's the model that will be used.

To use another model, say the 1B one, we pass model="gemma3:1b" instead (provided we pulled it beforehand with the ollama pull gemma3:1b command). To list the available models, we can use the ollama list command.
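The ollama package also exposes this information programmatically. Here's a minimal sketch that simply prints the response from ollama.list(); the exact structure of the response may vary between package versions, so check the documentation for details:

import ollama

# Show the models available locally (same information as `ollama list`)
print(ollama.list())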

If we want to stream the answer word by word, we can instead use stream=True and print the response chunk by chunk:

from ollama import chat

# Request a streaming response and print each chunk as it arrives
stream = chat(
    model="gemma3",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)

This provides a better user experience because the user doesn’t have to wait for the complete answer to be generated.

To learn more about the ollama package, check its documentation.

Building a File Assistant Using Gemma 3

In this section, we'll build a Python script that lets us ask questions about the content of a text file right from the terminal. This script is handy for tasks like checking for bugs in a code file or querying information from any document.

Script overview

Our goal is to create a command-line tool using Python that reads a text file and uses Gemma 3 to answer questions related to its content. Here's the step-by-step guide to achieving this:

Setting up the script structure

First, we need to set up the Python script with the necessary imports and basic structure:

import sys
from ollama import chat

def ask_questions_from_file(file_path):
    # Read the content of the text file
    with open(file_path, "r") as file:
        content = file.read()
    # Loop to keep asking questions
    while True:
        question = input("> ")
        print()
        if question.strip().lower() == "exit":
            break
        # Use Gemma 3 to get answers, streaming the response
        stream = chat(
            model="gemma3",
            messages=[
                {"role": "user", "content": content},
                {"role": "user", "content": question},
            ],
            stream=True,
        )
        for chunk in stream:
            print(chunk["message"]["content"], end="", flush=True)
        # Add a blank line so the next prompt starts on a fresh line
        print("\n")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python ask.py <path_to_text_file>")
    else:
        file_path = sys.argv[1]
        ask_questions_from_file(file_path)

The core functionality is encapsulated in the ask_questions_from_file() function. This function takes a file path as an argument and starts by opening and reading the content from the specified text file. This content will be used as background information to answer questions.

Once the file content is loaded, the script enters a loop where it continuously prompts us to input questions. When we type a question, the script sends the content of the file along with our question to the Gemma 3 model, which processes this information to generate an answer.

The interaction with the model takes place through a streaming mechanism, allowing answers to be displayed in real time as they are generated. If we type exit, the loop breaks, and the script stops running.

At the end of the script, there is a check to ensure that the script is run with exactly one command-line argument: the path to the text file. If the argument is not provided, it prints a usage message showing how to run the script correctly from the command line.

Executing the script

Save the above code in a file named, for example, ask.py. To test the script, run the command:

python ask.py ask.py

This runs the script on its own source file, so we can ask questions about the script itself (that's why ask.py appears twice in the command above). Here's an example of asking it to explain how the script works:

Using the script as a local Gemma 3 file assistant

Conclusion

We've successfully set up and learned how to run Gemma 3 locally using Ollama and Python. This approach ensures the privacy of our data, offers low latency, provides customization options, and can lead to cost savings. The steps we've covered aren't just limited to Gemma 3—they can be applied to other models hosted on Ollama too.

If we want to improve the functionality of our script, we could, for instance, extend it to handle PDFs. One way to do that is to use the Mistral OCR API to convert PDF files into text, allowing our script to answer questions about PDFs and making it even more versatile.
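As a simpler starting point (instead of the Mistral OCR API), here's a minimal sketch that uses the pypdf package (pip install pypdf) to extract the text of a text-based PDF so it can be passed to the chat loop; the file name is just a placeholder, and scanned documents would still need OCR:

from pypdf import PdfReader

# Extract the plain text from every page of a PDF file
reader = PdfReader("document.pdf")
content = "\n".join(page.extract_text() or "" for page in reader.pages)

# Preview the first few hundred characters of the extracted text
print(content[:500])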

With these tools, we're now equipped to explore and interact with large models right from our own devices.

FAQs

What are the hardware requirements for running Gemma 3 locally?

The ability to run Gemma 3 locally depends on the specific model size you choose to download. Smaller models like gemma3:1b are designed to be efficient and can run on less powerful devices, such as laptops with a single GPU or even some high-powered tablets. Larger models like gemma3:27b require more computational resources, such as a multi-core workstation with a high-end GPU. Always check your device's specifications against the model's resource requirements to ensure compatibility.

Can I run multiple instances of Gemma 3 simultaneously?

Yes, you can run multiple instances of Gemma 3 simultaneously, provided your hardware resources (CPU, GPU, and memory) are sufficient to support the additional computational load. Each instance of the model running in the background will consume additional system resources, so make sure your system can handle the multiple processes without significant performance degradation.
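For example, one way to start a second Ollama server on the same machine (assuming a Unix-like shell) is to bind it to a different port with the OLLAMA_HOST environment variable:

OLLAMA_HOST=127.0.0.1:11435 ollama serve

Clients then need to be configured to use that address instead of the default one.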

Is it necessary to use Anaconda to set up a Python environment?

While Anaconda is highly recommended for its ease of creating isolated environments, managing dependencies, and avoiding conflicts with other Python packages, it is not strictly necessary. You can also use other virtual environment tools like venv or virtualenv.
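For example, the equivalent setup with the built-in venv module (assuming a Unix-like shell) looks like this:

python -m venv gemma3-demo
source gemma3-demo/bin/activate
pip install ollama

On Windows, run gemma3-demo\Scripts\activate instead of the source command.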


Author: François Aubry