Skip to main content

Gemini 1.5 Pro API Tutorial: Getting Started With Google's LLM

To connect to the Gemini 1.5 Pro API, obtain your API key from Google AI for Developers, install the necessary Python libraries, and send requests and receive responses from the Gemini 1.5 Pro model.
May 27, 2024  · 8 min read

Gemini 1.5 Pro represents a significant advancement in the Gemini family. It introduces long-context reasoning, which empowers us to work with massive datasets across various modalities (text, video, and audio).

In this tutorial, we’ll learn how to connect to the Gemini 1.5 Pro API. We’ll learn to execute tasks such as retrieval, long-document and video question answering, long-context automatic speech recognition (ASR), and in-context learning.

If you want to learn more about Gemini, check out this article on What is Google Gemini.

The Gemini Family

Gemini AI encompasses a series of generative AI models collaboratively developed by multiple teams within Google, including Google Research and Google DeepMind.

These models, equipped with strong capabilities for general-purpose multi-modal tasks, are designed to assess developers in content generation and problem-solving tasks. Each model variant is optimized to specific use cases, ensuring optimal performance across diverse scenarios.

Gemini AI addresses the balance between computational resources and functionality by offering three model sizes:

Model

Size

Capabilities

Ideal Use Cases

Gemini Ultra

Largest

Most capable, handles highly complex tasks

Demanding applications, large-scale projects, intricate problem-solving

Gemini Pro

Medium

Versatile, suitable for a wide range of tasks, scalable

General-purpose applications, adaptable to diverse scenarios, projects requiring a balance of power and efficiency

Gemini Nano

Smallest

Lightweight and efficient, optimized for on-device and resource-constrained environments

Mobile applications, embedded systems, tasks with limited computational resources, real-time processing

pen_spark

Our primary focus will be on Gemini 1.5 Pro, the first model released within the Gemini 1.5 series.

Gemini 1.5 Pro Capabilities

Gemini 1.5 Pro, with its large context window of up to at least 10 million tokens, can address the challenge of understanding long contexts across a broad spectrum of applications in real-world scenarios.

A comprehensive evaluation of long dependency tasks has been conducted to thoroughly assess the Gemini 1.5 Pro model’s long-context capabilities. The Gemini 1.5 Pro model adeptly addressed the challenging "needle-in-a-haystack" task, achieving near-perfect (>99%) recall of the "needle" even amidst haystacks spanning multiple millions of tokens across all modalities, encompassing text, video, and audio.

This recall performance was notably maintained even with haystack sizes reaching 10 million tokens within the text modality.

The Gemini 1.5 Pro model surpassed all competing models, including those augmented with external retrieval methods, particularly on tasks necessitating an understanding of the interdependencies across multiple evidence widely spanning the entire long content, such as long dependency Question-Answer (QA) instances.

Gemini 1.5 Pro has also shown its remarkable ability, enabled by very long text, to learn tasks in context, such as translating a new language from a single set of linguistic documentation.

This advancement in Gemini 1.5 Pro's long-context performance is noteworthy, as it doesn't compromise the model's fundamental multi-modal capabilities. Compared to its predecessor (1.0 Pro), it substantially improved in various areas (+28.9% in Math, Science, and Reasoning).

It even surpasses the top-of-the-line 1.0 Ultra model in over half of the benchmarks, particularly excelling in text and vision tasks (as shown in the table below). This is achieved despite requiring fewer resources and being more computationally efficient.

Gemini Pro 1.5 Comparison with 1.0 Pro and 1.0 UltraData source.

For more detailed insights, I recommend consulting the technical report: “Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context”for more detailed insights.

Gemini 1.5 Pro Applications

With the ability to process multiple millions of tokens, novel practical applications emerge.

Gemini 1.5 Pro can be widely used in software engineering. Given the entire 746,152 token JAX codebase in context, Gemini 1.5 Pro can identify the specific location of a core automatic differentiation method.

With access to a reference grammar book and a bilingual wordlist (total of ∼250k), Gemini 1.5 Pro can translate from English to Kalamang with a quality comparable to that of a human who studied using the same resources.

The limited availability of online data for Kalamang (less than 200 speakers) forces Gemini 1.5 Pro to rely solely on the context provided in each prompt to understand and translate the language. This finding suggests promising opportunities for utilizing LLMs with extended contextual abilities to aid in preserving and reviving endangered languages and fostering communication and comprehension among different linguistic communities.

Gemini 1.5 Pro can also answer an image query given the entire text of Les Misérables (1382 pages, 732k tokens). Being natively multimodal, the model can identify a famous scene from a hand-drawn sketch.

Given a 45-minute silent film “Sherlock Jr.” (2,674 frames at 1FPS, 684k tokens), Gemini 1.5 Pro retrieves and extracts textual information from a specific frame and provides the corresponding timestamp. The model can also identify a scene in the movie from a hand-drawn sketch.

Gemini 1.5 Pro API: How to Use It

Now that we know about Gemini 1.5 Pro’s capabilities and applications, let’s learn how to connect to its API.

Step 1: Generate an API key

First, we need to get an API key from the Google AI for Developers page (make sure you are logged in to your Google account). We can do this by clicking the Get an API key button:

Google AI for Developers

Next, we need to establish the project and produce the API key.

Google AI for Developers

Step 2: Import the API library into Python

Let’s first install the Gemini Python API package using pip.

%pip install google-generativeai

Next, we’ll launch our Jupyter notebook and import the essential Python libraries required for our project.

import google.generativeai as genai
from google.generativeai.types import ContentType
from PIL import Image
from IPython.display import Markdown
import time
import cv2

Step 3: Make an API call

Let’s start by filling our API key in the GOOGLE_API_KEY variable and configure it:

GOOGLE_API_KEY = ‘your-api-key-goes-here’
genai.configure(api_key=GOOGLE_API_KEY)

Before calling the API, let’s first check the available models through the free API.

for m in genai.list_models():
	if 'generateContent' in m.supported_generation_methods:
    	print(m.name)
models/gemini-1.0-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-latest
models/gemini-1.0-pro-vision-latest
models/gemini-1.5-flash-latest
models/gemini-1.5-pro-latest
models/gemini-pro
models/gemini-pro-vision

Let's access the gemini-1.5-pro-latest model:

model = genai.GenerativeModel('gemini-1.5-pro-latest')

Now let’s make our first API call and prompt Gemini 1.5 Pro!

response = model.generate_content("Please provide a list of the most influential people in the world.")
print(response.text)

The model provided a clear and detailed response with just a few lines of code.

Gemini AI can produce multiple responses called candidates for a prompt, allowing you to select the most appropriate one.

Gemini 1.5 Pro API: Image Prompting

In this section, we’ll learn how to leverage Gemini 1.5 Pro to automate extracting a list of books from an image!

Let’s first consider this image:

An image of a bookshelf with several books stacked on it.

Let’s first instruct the model to identify and list book titles in the image following a specific order:

text_prompt = "List all the books and help me organize them into three categories."

Next, we’ll open our "bookshelf.jpeg" image using the Image class, preparing it for the model's analysis.

bookshelf_image = Image.open('bookshelf.jpeg')

Let’s put both text_prompt and bookshelf_image into a list and prompt the model:

prompt = [text_prompt, bookshelf_image]
response = model.generate_content(prompt)

To avoid having the output in Markdown format, let's use the IPython Markdown tool.

Markdown(response.text)

The model has successfully managed to categorize all the books present in the uploaded image into three different categories .

Conclusion

In this tutorial, we have presented Gemini 1.5 Pro, the first release from the Gemini 1.5 family. Beyond its multi-modal capabilities and efficiency, Gemini 1.5 Pro expands the content window compared to the Gemini 1.0 series from 32k to multiple millions of tokens.

Whether working with text, images, video, or audio, the Gemini 1.5 Pro API offers the flexibility to tackle various applications, from information retrieval and question-answering to content generation and translation.

If you want to learn more about AI, check out this six-course AI Fundamentals skill track.


Photo of Natasha Al-Khatib
Author
Natasha Al-Khatib
LinkedIn

I'm a computer and communication engineer and researcher. I spend my time building deep learning tools to advance the cybersecurity field! 

Gemini FAQs

Is the Gemini 1.5 Pro API available?

Yes, the Gemini 1.5 Pro API is available in public preview in over 180 countries. You can access it through Google Cloud's Vertex AI platform.

Does Gemini have an API?

Yes, Gemini models are available through the Gemini API, which allows developers to integrate Gemini's capabilities into their applications.

What are the key differences between Gemini Ultra, Gemini Pro, and Gemini Nano?

These models differ primarily in size and computational capabilities. Gemini Ultra is the largest and most powerful, designed for complex tasks. Gemini Pro is a versatile option suitable for a wide range of applications. Gemini Nano is the smallest and most efficient, ideal for on-device or resource-constrained scenarios.

What is the API limit for Gemini Pro?

The Gemini Pro models have usage limits based on your Google Cloud project's quota. These limits can vary depending on your specific plan and usage patterns. You can find more details about the API limits and quotas in the Google Cloud documentation.

How can I get started with using the Gemini 1.5 Pro API in my projects?

Begin by obtaining the necessary API credentials (key or token) from the Gemini Pro 1.5 platform. Then, install the official client library for your preferred programming language to simplify interactions with the API. You can find code examples and tutorials in the API documentation to guide you through the implementation process.

Topics

Learn more about LLMs!

track

AI Fundamentals

10hrs hr
Discover the fundamentals of AI, dive into models like ChatGPT, and decode generative AI secrets to navigate the dynamic AI landscape.
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related

tutorial

Introducing Google Gemini API: Discover the Power of the New Gemini AI Models

Learn how to use Gemini Python API and its various functions to build AI-enabled applications for free.
Abid Ali Awan's photo

Abid Ali Awan

13 min

tutorial

GPT-4o API Tutorial: Getting Started with OpenAI's API

To connect through the GPT-4o API, obtain your API key from OpenAI, install the OpenAI Python library, and use it to send requests and receive responses from the GPT-4o models.
Ryan Ong's photo

Ryan Ong

8 min

tutorial

What is Google Gemini? Everything You Need To Know About Google’s ChatGPT Rival

Gemini defines a family of multimodal LLMs capable of understanding texts, images, videos, and audio. It’s also said to be capable of performing complex tasks in math and physics, as well as being able to generate high-quality code in several programming languages.
Kurtis Pykes 's photo

Kurtis Pykes

9 min

tutorial

Claude Sonnet 3.5 API Tutorial: Getting Started With Anthropic's API

To connect through the Claude 3.5 Sonnet API, obtain your API key from Anthropic, install the anthropic Python library, and use it to send requests and receive responses from Claude 3.5 Sonnet.
Ryan Ong's photo

Ryan Ong

8 min

code-along

Getting Started with the OpenAI API and ChatGPT

Get an introduction to the OpenAI API and the GPT-3 model.
Richie Cotton's photo

Richie Cotton

code-along

Fine-tuning GPT3.5 with the OpenAI API

In this code along, you'll learn how to use the OpenAI API and Python to get started fine-tuning GPT3.5.
Zoumana Keita 's photo

Zoumana Keita

See MoreSee More