Blog

Introducing Google Gemini API: Discover the Power of the New Gemini AI Models

Learn how to use Gemini Python API and its various functions to build AI-enabled applications for free.

Updated Jan 2024 · 13 min read

Google Bard has faced increasing competition from ChatGPT, but with the release of their latest Gemini AI models, they hope to regain their footing in the market. In an effort to improve user experience, Google has recently revamped Bard AI assistance with Gemini Pro models, making it more accessible for users who wish to input both text and images to receive accurate and natural responses.

In this tutorial, we will explore the Google Gemini API, which allows you to create advanced AI-based applications. By leveraging its powerful capabilities, you can easily incorporate both text and image inputs to generate precise and contextually relevant outputs. Gemini Python API simplifies integration into your existing projects, enabling seamless implementation of cutting-edge artificial intelligence features.

What is Gemini AI?

Gemini AI is a cutting-edge artificial intelligence (AI) model created through a joint effort by various teams within Google, such as Google Research and Google DeepMind. Designed to be multimodal, Gemini AI can comprehend and process numerous forms of data, including text, code, audio, images, and videos.

Google DeepMind’s pursuit of artificial intelligence has long been driven by the hope of using its capabilities for the collective benefit of humanity. With that, Gemini AI has made significant advancements in creating a new generation of AI models that are inspired by how humans perceive and interact with the world.

As the most sophisticated and largest AI model produced by Google thus far, Gemini AI was designed to be highly flexible and handle diverse types of information. Its versatility allows it to run efficiently on a variety of systems, ranging from powerful data center servers to mobile devices.

Here are three versions of the Gemini models trained for specific use cases:

Gemini Ultra: The most advanced version, capable of handling intricate tasks.
Gemini Pro: A well-balanced choice offering good performance and scalability.
Gemini Nano: Optimized for mobile devices, providing maximum efficiency.

Image source

Among these variants, Gemini Ultra has displayed exceptional performance, surpassing GPT-4 on multiple criteria. It is the first model to outperform human experts on the Massive Multitask Language Understanding benchmark, which assesses world knowledge and problem-solving abilities across 57 subject areas. This achievement demonstrates Gemini Ultra's superior comprehension and problem-solving capabilities.

If you're new to Artificial Intelligence, explore the AI Fundamentals skill track. It covers popular AI topics such as ChatGPT, large language models, and generative AI.

Environment Setup

Before we start using the API, we have to get the API key from the Google AI for Developers.

1. Click on the “Get an API key” button.

2. After that, create the project and generate the API key.

3. Copy the API key and add an environment variable called "Gemini_API_KEY" in your system. In order to create a secure environment variable on Kaggle, go to "Add-ons" and select "Secrets." Then, click on the "Add a new secret" button to add a label and its corresponding value, and save it.

4. Install the Gemini Python API package using pip.

%pip install google-generativeai

5. Configure the Gemini API by adding the API key. To access the secret you have to create the UserSecretsClient object and then provide it with the label.

import google.generativeai as genai
from kaggle_secrets import UserSecretsClient

user_secrets = UserSecretsClient()
gemini_key = user_secrets.get_secret("GEMINI_API_KEY")

genai.configure(api_key = gemini_key)

Generating the Response

In this section, we will learn to generate responses using the Gemini Pro model. If you are facing issues while running the code on your own, you can follow the Gemini API Kaggle Notebook.

We will now check the available model access with the free API. To do so, run the code below. It will display the supported generation models using the list_models function.

for m in genai.list_models():
    if 'generateContent' in m.supported_generation_methods:
        print(m.name)

It appears that we have access to only two models, "gemini-pro" and "gemini-pro-vision".

models/gemini-pro
models/gemini-pro-vision

Let's access the "Gemini Pro" model and generate a response by providing a simple prompt.

model = genai.GenerativeModel('gemini-pro')

response = model.generate_content("Please provide a list of the most influential people in the world.")

print(response.text)

We were able to generate a highly accurate response with just a few lines of code.

Gemini AI can produce multiple responses called candidates for a prompt, allowing you to select the most appropriate one. However, the free version of the API only offers one candidate option.

response.candidates

Our output is in Markdown format. Let's fix that by using the IPython Markdown tool. We will now ask Gemini to generate Python code.

from IPython.display import Markdown

response = model.generate_content("Build a simple Python web application.")

Markdown(response.text)

The model has done a great job. It has generated a step-by-step guide with an explanation of how to build and run a simple web application using Flask.

Streaming

It is important to use streaming to increase the perceived speed of the application. By turning on the 'stream' argument, the response will be generated and displayed as soon as it becomes available. Our output will be generated in chunks, so we have to iterate over response.

from IPython.display import display

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(
    "How can I make authentic Italian pasta?", stream=True
)

for chunk in response:
    display(Markdown(chunk.text))
    display(Markdown("_" * 80))

Each chunk is separated by a dashed line. As we can see below, our content is generated in chunks. The streaming in Gemini APi is similar to OpenAI and Anthropic API.

Fine-tuning the Response

We can adjust the generated response to suit our requirements.

In the following example, we are generating one candidate with a maximum of 1000 tokens and a temperature of 0.7. Additionally, we have set a stop sequence, so that when the word "time" appears in the generated text, the generation process automatically stops.

response = model.generate_content(
    'How to be productive during a burnout stage.',
    generation_config=genai.types.GenerationConfig(
        candidate_count=1,
        stop_sequences=['time'],
        max_output_tokens=1000,
        temperature=0.7)
)

Markdown(response.text)

Trying Gemini Pro Vision

In this section, we will use Gemini Pro Vision to access the multi-modality of Gemini AI.

We will download the photo by Mayumi Maciel from pexel.com using the curl tool.

!curl -o landscape.jpg "https://images.pexels.com/photos/18776367/pexels-photo-18776367/free-photo-of-colorful-houses-line-the-canal-in-burano-italy.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=2"

To load and view the image in the Kaggle Notebook, we will use the Pillow Python package.

import PIL.Image

img = PIL.Image.open('landscape.jpg')
display(img)

To understand the content of this image, we must first load the "gemini-pro-vision" model and provide it with an image object.

model = genai.GenerativeModel('gemini-pro-vision')

response = model.generate_content(img)

Markdown(response.text)

It has accurately identified the location in the image and provided a description.

A beautiful day in Burano, Italy. The colorful houses and the calm water make this a perfect place to relax and enjoy the scenery.

Let's feed the prompt text and image object to our generation function to ask our model question about the image.

response = model.generate_content(
    ["Can you provide the exact location with coordinates?", img]
)

Markdown(response.text)

45.4408° N, 12.3150° E

It has accurately provided the coordinates of the place, and we have also verified it by adding the coordinates to Google Maps.

Chat Conversations

It is possible to have a conversation with the AI model that retains the context of the chat, using the history of the previous conversations. This way, the model can remember the content of the conversation and provide appropriate responses based on the previous interactions.

We just have to create a chat object using the model.start_chat and provide it with the first message.

model = genai.GenerativeModel('gemini-pro')

chat = model.start_chat(history=[])

chat.send_message("Who created the Mona Lisa?")

for message in chat.history:
    display(Markdown(f'**{message.role}**: {message.parts[0].text}'))

Let’s ask another question and display the response in the chat format.

chat.send_message("What else he created?")

for message in chat.history:
    display(Markdown(f'**{message.role}**: {message.parts[0].text}'))

The model has remembered the context and provided an appropriate response.

Embeddings

The API offers access to the embedding models, which is a popular tool for context-aware applications. With the simple Python function, users can convert their prompt or content to embedding, enabling them to represent words, sentences, or entire documents as dense vectors that encode semantic meaning.

We will provide the embed_content function with the model name, content, task type, and the title.

result = genai.embed_content(
    model="models/embedding-001",
    content="Who created the Mona Lisa?",
    task_type="retrieval_document",
    title="Mona Lisa Research")

print(result['embedding'][:10])

[0.086675204, -0.027617611, -0.015689207, -0.00445945, 0.07286186, 0.00017529335, 0.07243656, -0.018299067, 0.018799499, 0.028495966]

To convert multiple content into the embedding, you need to provide a list of strings to the 'content' argument.

result = genai.embed_content(
    model="models/embedding-001",
    content=[
        "Who created the Mona Lisa?",
        "What else he created?",
        "When did he died?"
    ],
    task_type="retrieval_document",
    title="Mona Lisa Research 2")

for emb in result['embedding']:
    print(emb[:10],"\n")

[0.07942627, -0.032543894, -0.010346633, -0.0039942865, 0.071596086, -0.0016670177, 0.07821064, -0.011955016, 0.019478166, 0.03784406] 

[0.027897138, -0.030693276, -0.0012639443, 0.0018902065, 0.07171923, -0.011544562, 0.04235154, -0.024570161, 0.013215181, 0.03026724] 

[0.04341321, 0.013262799, -0.0152797215, -0.009688456, 0.07342798, 0.0033503908, 0.05274988, -0.010907041, 0.05933596, 0.019402765]

Enroll in Introduction to Embeddings with the OpenAI API course to learn about embedding models for advanced AI applications like semantic search and recommendation engines.

Advanced Use Cases

In order to harness the full potential of the Gemini API, developers may need to dive deeper into advanced features such as safety settings and low-level API.

Safety settings enable users to configure content blocking and allowance in prompts and responses, ensuring compliance with community guidelines.
Lower-level API provides increasing flexibility for customization.
Multi-turn conversations can be extended using the GenaiChatSession class, enhancing conversational experiences.

Using advanced features enables developers to create more efficient applications and unlocks new possibilities for user interactions.

Conclusion

The Gemini API opens up exciting possibilities for developers to build advanced AI applications by using both text and visual inputs. Its state-of-the-art models, like Gemini Ultra, push the boundaries of what AI systems can comprehend and generate.

Google has made integrating AI into applications more seamless with the convenient access provided through a Python API. Gemini API offers a range of features such as content generation, embedding, and multi-turn conversations, making AI more accessible. Gemini AI signifies a major leap forward in multi-modal understanding.

If you're interested in learning more, start your journey developing AI-powered applications with the OpenAI API by enrolling in a short Working with the OpenAI API course. Discover the functionality behind popular AI applications, like ChatGPT and DataLab, DataCamp's AI-enabled data notebook.

If you are a professional and don't need the practical course, you can explore the OpenAI API in Python with our cheat sheet.

Author

Abid Ali Awan

Topics

Artificial Intelligence (AI)

Start Your AI Journey Today!

Course

Generative AI Concepts

2 hr

13.4K

Discover how to begin responsibly leveraging generative AI. Learn how generative AI models are developed and how they will impact society moving forward.

See Details

Start Course

Course

Working with the OpenAI API

3 hr

8.9K

Start your journey developing AI-powered applications with the OpenAI API. Learn about the functionality that underpins popular AI applications like ChatGPT.

See Details

Start Course

Course

ChatGPT Prompt Engineering for Developers

4 hr

3.2K

Dive deep into the principles and best practices of prompt engineering to leverage powerful language models like ChatGPT to solve real-world problems.

See Details

Start Course

blog

You’re invited! Join us for Radar: AI Edition

Join us for two days of events sharing best practices from thought leaders in the AI space

DataCamp Team

2 min

blog

What is Llama 3? The Experts' View on The Next Generation of Open Source LLMs

Discover Meta’s Llama3 model: the latest iteration of one of today's most powerful open-source large language models.

Richie Cotton

5 min

podcast

How Walmart Leverages Data & AI with Swati Kirti, Sr Director of Data Science at Walmart

Swati and Richie explore the role of data and AI at Walmart, how Walmart improves customer experience through the use of data, supply chain optimization, demand forecasting, scaling AI solutions, and much more.

Richie Cotton

31 min

podcast

Creating an AI-First Culture with Sanjay Srivastava, Chief Digital Strategist at Genpact

Sanjay and Richie cover the shift from experimentation to production seen in the AI space over the past 12 months, how AI automation is revolutionizing business processes at GENPACT, how change management contributes to how we leverage AI tools at work, and much more.

Richie Cotton

36 min

tutorial

Serving an LLM Application as an API Endpoint using FastAPI in Python

Unlock the power of Large Language Models (LLMs) in your applications with our latest blog on "Serving LLM Application as an API Endpoint Using FastAPI in Python." LLMs like GPT, Claude, and LLaMA are revolutionizing chatbots, content creation, and many more use-cases. Discover how APIs act as crucial bridges, enabling seamless integration of sophisticated language understanding and generation features into your projects.

Moez Ali

tutorial

How to Improve RAG Performance: 5 Key Techniques with Examples

Explore different approaches to enhance RAG systems: Chunking, Reranking, and Query Transformations.

Eugenia Anello

See More See More

What is Gemini AI?

Environment Setup

Generating the Response

Streaming

Fine-tuning the Response

Trying Gemini Pro Vision

Chat Conversations

Embeddings

Advanced Use Cases

Conclusion

You’re invited! Join us for Radar: AI Edition

What is Llama 3? The Experts' View on The Next Generation of Open Source LLMs

How Walmart Leverages Data & AI with Swati Kirti, Sr Director of Data Science at Walmart

Creating an AI-First Culture with Sanjay Srivastava, Chief Digital Strategist at Genpact

Serving an LLM Application as an API Endpoint using FastAPI in Python

How to Improve RAG Performance: 5 Key Techniques with Examples

.css-1531qan{-webkit-text-decoration:none;text-decoration:none;color:inherit;}Generative AI Concepts

Working with the OpenAI API

ChatGPT Prompt Engineering for Developers

You’re invited! Join us for Radar: AI Edition

What is Llama 3? The Experts' View on The Next Generation of Open Source LLMs

How Walmart Leverages Data & AI with Swati Kirti, Sr Director of Data Science at Walmart

Creating an AI-First Culture with Sanjay Srivastava, Chief Digital Strategist at Genpact

Serving an LLM Application as an API Endpoint using FastAPI in Python

How to Improve RAG Performance: 5 Key Techniques with Examples

Generative AI Concepts