
Beginner's Guide to Google's Vision API in Python

Learn what the Vision API is and what it offers. By the end of this tutorial, you will also know how to call the Vision API from your Python code.
Dec 2018  · 10 min read

It's been quite a while since Google released a dedicated API, called the Vision API, for performing computer vision tasks. Computer vision is a field concerned with how a computer processes an image. It is quite easy for us humans to derive useful insights from a given image, but how does a computer do it?

Say, for example, you supply an image of a dog to your computer, and some software tells you that the image shows a dog. This is where computer vision comes in. Computer vision is a whole world of study unto itself, and the Vision API provides a number of utilities for performing computer vision tasks with ease. The best part is that developers with no previous experience in computer vision can use the Vision API by going through its documentation.

This tutorial attempts to introduce you to Vision API and how it can be called from Python code.

Note: If you want to know more about APIs in general (from Python and Machine Learning perspectives) before getting started with this tutorial, the following are some excellent resources:

What is Google's Vision API? A more Detailed Introduction

Google have encapsulated their Machine Learning models in an API to allow developers to use their Vision technology. The Vision API can quickly classify images into thousands of categories and assign them sensible labels. It can even detect individual objects, faces, and pieces of text within an image.

On a very high level, Google's Vision API lets you do two things:

  • Use the API directly from your code to perform powerful image analysis, and to do so at scale.
  • Build custom models using the API to accommodate more flexibility for your particular use case.

This API is particularly handy because the demand for "Full-Stack" practitioners is growing rapidly. Consider a scenario where a Full-Stack web developer (that is, a developer equipped with both front-end and back-end web technologies) is asked to build a website that takes images and detects what kind of image each one is. This would require a good amount of Computer Vision knowledge, which the developer may not already have, because the back-end code has to classify a given image accurately. Also, assume that the deadline is tight.

In a situation like this, developers who start learning Computer Vision from scratch and then implement the required tasks are likely to miss the deadline. It would be more practical to use pre-trained computer vision models and learn the underlying concepts as development proceeds. This is precisely the kind of situation where the Vision API comes in handy. The API provides many state-of-the-art pre-trained models that serve many real-world business use-cases.

The term "Full-Stack" is also getting associated with roles like Machine Learning Engineer, Data Scientist, etc. A Full-Stack Machine Learning Practitioner/Data Scientist is supposed to design and develop, or at least know, the end-to-end business processes. This includes "Making Production-ready Models" as one of the most crucial steps, in which the person or team concerned wraps the developed model in an API or a set of APIs and deploys it to the production environment. The meaning of "production" varies according to the use-case, but the general framework of the process remains the same. The Vision API also lets you efficiently train custom vision models with AutoML Vision (Beta).

Great! By now, you should have a pretty good overview of the Vision API. A nice little experiment that you can do on the Vision homepage is to analyze your favorite images and derive useful insights with the help of Vision. Here are the steps to do this:

  • Go to the Vision homepage.
  • It has a section called "Try the API", which lets you drag or upload an image into its interface.
  • Once you supply an image, it returns a range of information about that image. Vision detects many facts about the image almost instantly. Feel free to explore the other tabs as well to learn even more about the image.

Now imagine performing this task on a set of a billion images. Using an API like this would undoubtedly pay off at that scale. Next, let's look at what the Vision API offers, along with some real-world use-cases it has served.

What are the Offerings of Vision API - Some Niche Use-Cases

The Vision API is known for its accurate results. The Vision API documentation provides an excellent collection of tutorials that give you a very detailed insight into the API. At first glance, however, these may appear overwhelming. So, to keep things simple, you will learn about a few use-cases that the Vision API already serves.

  • Optical Character Recognition (OCR): This is a classic example of Computer Vision, which primarily deals with extracting text from an image. The Vision API comprises many state-of-the-art approaches for doing this.
  • Detection of Image Properties: This is the task that you performed in the earlier section. With the Vision API, you can retrieve general attributes of an image, such as its dominant colors.
  • Label Detection: This task annotates an image with a label (or "tag") based on the image content. For example, a picture of a dog may produce a label of "dog", "animal", or some other similar annotation. This is an essential step in the field of Content-based Information Retrieval.
  • Face Detection: Given an image or a set of images, the task is to detect the faces present in them. This has several large-scale applications, such as Surveillance Systems.
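As a rough illustration of how the four use-cases above are requested, the sketch below maps each one to the feature type name used in the `features` field of a Vision REST request. The dictionary keys and the helper name `build_features` are hypothetical and chosen only for this example; the sketch builds plain Python data and does not call the API.

```python
# Feature type names as they appear in the `type` field of a Vision REST
# request. The dictionary keys on the left are made up for this example.
USE_CASE_TO_FEATURE = {
    "ocr": "TEXT_DETECTION",
    "image_properties": "IMAGE_PROPERTIES",
    "labels": "LABEL_DETECTION",
    "faces": "FACE_DETECTION",
}


def build_features(use_cases, max_results=None):
    """Return the `features` list for the given use-case keys."""
    features = []
    for name in use_cases:
        feature = {"type": USE_CASE_TO_FEATURE[name]}
        if max_results is not None:
            # Optional cap on the number of results returned per feature.
            feature["maxResults"] = max_results
        features.append(feature)
    return features


print(build_features(["labels", "faces"], max_results=5))
```

Several features can be requested for the same image in a single call, which is why the helper returns a list.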

These are some of the use-cases that the Vision API handles well, and you can integrate any of them into your applications in very little time. If you want to learn about more use-cases like these, be sure to check out these tutorials.

Vision API provides support for a wide range of languages like Go, C#, Java, PHP, Node.js, Python, Ruby. In the next sections, you will see how to use Vision API in Python.

Vision API Client Library for Python

The first step in using the Python variant of the Vision API is to install it. The best way to install it is through pip.

!pip install google-cloud-vision

Once the installation finishes, the next step is to verify that it succeeded.

from google.cloud import vision

If the above line of code executes successfully, you are ready to proceed. Google provides a series of fantastic tutorials on using Vision API in Python.
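If you would rather check for the package without triggering an ImportError, a minimal sketch using only the standard library looks like this. The helper name `is_installed` is hypothetical; the module path `google.cloud.vision` is the one provided by the google-cloud-vision package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located, without importing it fully."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example `google.cloud`) is missing.
        return False


# The result depends on your environment.
print(is_installed("google.cloud.vision"))
```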

Now, you will build a simple application in Python which will be able to detect some general attributes of an image, such as dominant color.

A Case Study with Vision API in Python

Your application will take the path to an image as its input and display the general attributes of that image. This is useful when the images are stored on the computer where the application runs. But what if you need to read an image from the internet? The Vision API supports reading images from the internet as well.

In this case study, you will learn to tackle the first scenario. But it's only a matter of one line of code to accommodate the internet variant.
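To make the difference between the two scenarios concrete, here is a hedged sketch of the JSON body that the Vision REST endpoint (`images:annotate`) accepts. A local image is sent as base64-encoded bytes in `content`, while a remote image is referenced through `source.imageUri`. This is not the client-library call used in the rest of this tutorial; the helper name `build_annotate_request` and the example URL are hypothetical, and nothing is sent over the network.

```python
import base64
import json


def build_annotate_request(feature_type, image_bytes=None, image_uri=None):
    """Build an `images:annotate` request body for one image and one feature."""
    if (image_bytes is None) == (image_uri is None):
        raise ValueError("Provide exactly one of image_bytes or image_uri")
    if image_bytes is not None:
        # Local image: the raw bytes travel inside the request, base64-encoded.
        image = {"content": base64.b64encode(image_bytes).decode("ascii")}
    else:
        # Remote image: the service fetches the image from the given URI.
        image = {"source": {"imageUri": image_uri}}
    return {"requests": [{"image": image, "features": [{"type": feature_type}]}]}


# Hypothetical URL, used only to show the shape of the payload.
body = build_annotate_request("IMAGE_PROPERTIES", image_uri="https://example.com/dog.jpg")
print(json.dumps(body, indent=2))
```

Only the `image` part of the request changes between the two variants; the requested features stay the same.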

As always, you will start off by importing vision from the google.cloud module.

from google.cloud import vision

The next step is to call ImageAnnotatorClient() which contains the utilities for extracting image properties.

client = vision.ImageAnnotatorClient()

You will most likely run into an error if the GOOGLE_APPLICATION_CREDENTIALS environment variable is not set. This is because the client libraries use Application Default Credentials (ADC) to locate your application's credentials, and ADC checks this environment variable for the path to a credentials file.

Follow this link to learn how to generate the credentials that GOOGLE_APPLICATION_CREDENTIALS points to. Your aim is to generate a client_secrets.json file, which you will use for authentication.

Once client_secrets.json is obtained, you will execute the following code to set the GOOGLE_APPLICATION_CREDENTIALS environment variable.

import os

# Assumes client_secrets.json is in the current working directory
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'client_secrets.json'

Now running the below code should not give you any error.

client = vision.ImageAnnotatorClient()
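If the client still fails to start, a small pre-flight check can tell you whether the environment variable is the culprit. The sketch below uses only the standard library; the helper name `credentials_problem` is hypothetical, and it checks only that the variable is set and points to an existing file, not that the credentials are valid.

```python
import os


def credentials_problem(environ=None):
    """Return a description of the problem, or None if the variable looks fine."""
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return "credentials file not found: {}".format(path)
    return None


problem = credentials_problem()
print(problem or "credentials variable points to an existing file")
```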

You will now write code for reading an image through a given path.


import io

path = 'Image.jpeg'
# Read the image file as raw bytes
with io.open(path, 'rb') as image_file:
    content = image_file.read()

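You have successfully loaded an image into your workspace. Now, you will instantiate an object of type vision.types.Image and supply content=content as its argument. (In google-cloud-vision 2.0 and later, the vision.types namespace was removed and the class is available directly as vision.Image.)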

image = vision.types.Image(content=content)

You are only left with the final steps of your Image Properties detection application. In these steps, you will:

  • Call client.image_properties() with image=image as its argument.
  • Store the response of image_properties() in a variable response and extract the image properties from the image_properties_annotation attribute of response.
  • Display several properties of the image in a formatted manner.

response = client.image_properties(image=image)
props = response.image_properties_annotation
print('Properties of the image:')

# Each entry holds an RGB color and the fraction of pixels it covers
for color in props.dominant_colors.colors:
    print('Fraction: {}'.format(color.pixel_fraction))
    print('\tr: {}'.format(color.color.red))
    print('\tg: {}'.format(color.color.green))
    print('\tb: {}'.format(color.color.blue))
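If you want a more compact summary than one line per channel, the dominant colors can be ranked by pixel fraction and shown as hex codes. The sketch below is only an illustration: the helper names are hypothetical, and the sample objects are stand-ins that mimic the `color.red`, `color.green`, `color.blue`, and `pixel_fraction` attributes used above, so that it runs without calling the API.

```python
from types import SimpleNamespace


def to_hex(color):
    """Format an object with red/green/blue attributes (0-255) as '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(
        int(round(color.red)), int(round(color.green)), int(round(color.blue))
    )


def summarize_colors(colors, top=3):
    """Return (hex, pixel_fraction) pairs, largest pixel fraction first."""
    ranked = sorted(colors, key=lambda c: c.pixel_fraction, reverse=True)
    return [(to_hex(c.color), c.pixel_fraction) for c in ranked[:top]]


# Stand-in data shaped like the entries of props.dominant_colors.colors.
sample = [
    SimpleNamespace(color=SimpleNamespace(red=255.0, green=0.0, blue=0.0), pixel_fraction=0.2),
    SimpleNamespace(color=SimpleNamespace(red=0.0, green=128.0, blue=255.0), pixel_fraction=0.5),
]
print(summarize_colors(sample))  # [('#0080ff', 0.5), ('#ff0000', 0.2)]
```

With a real response, you would pass `props.dominant_colors.colors` in place of `sample`.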

You might again run into errors if you have not enabled Vision API for your application. Enabling the API is extremely easy, and the error trace provides the instructions so that you can enable it quickly.

After enabling the API, you will have to enable Billing as well to use the Vision API. Utilities for Image Properties detection cost only $0.60. Once that is done, the code runs successfully and prints the dominant colors of the image.


You saw how easy it is to use the Vision API and the kind of utilities it provides, all at a remarkably low cost. Today, many companies and organizations benefit from this API, whether for business purposes or for research. In this tutorial, you merely scratched the surface of the Vision API, but this should serve as a good starting point for using Machine Learning APIs in your applications.

Make sure you check out the whole suite of Machine Learning APIs that Google provides, known as CloudML.

You can build several cool applications with the help of these easily callable APIs. The links to Vision API and CloudML provide an amazing compilation of tutorials so that you can easily play with them. Good luck!

If you are interested in knowing more about Image Processing, take DataCamp's Convolutional Neural Networks for Image Processing course.

