Beginner's Guide to Google's Vision API in Python

Learn what the Vision API is and what it offers. By the end of this tutorial, you will also know how to call the Vision API from your Python code.

It's been quite a while since Google released a dedicated API, the Vision API, for performing computer vision tasks. Computer vision is a field that concerns how a computer processes an image. It is quite easy for us humans to derive useful insights from a given image, but how does a computer do it?

Say, for example, you supply an image of a dog to your computer, and some software on the computer tells you that the supplied image is a dog's image. This is where computer vision comes in. Computer vision is a whole field of study unto itself, and the Vision API provides a number of utilities for performing computer vision tasks with ease. The best part is that developers with no previous computer vision experience can use the Vision API by going through its documentation.

This tutorial attempts to introduce you to Vision API and how it can be called from Python code. Specifically speaking, this tutorial is going to cover:

  • What is Google's Vision API (a more detailed introduction)
  • What are the offerings of Vision API
  • Vision API Client Library for Python
  • A case study with Vision API in Python

Note: If you feel that you want to know more about APIs in general (from Python and Machine Learning perspectives) before getting started with this tutorial, it is worth working through an introductory resource on the topic first.

What is Google's Vision API (a more detailed introduction)?

Google has encapsulated its machine learning models in an API to allow developers to use its vision technology. The Vision API can quickly classify images into thousands of categories and assign them sensible labels. It can even detect individual objects, faces, and pieces of text within an image.

On a very high level, Google's Vision API lets you do two things:

  • Use the API directly from your code to do powerful image analysis, at scale.
  • Build custom models using the API to accommodate more flexibility for your particular use case.

This API is particularly handy because the need for "Full-Stack" practitioners is increasing rapidly. Consider a scenario where a Full-Stack web developer (meaning a developer equipped with both the front-end and back-end technologies of web development) is asked to build a website that takes images and detects what they contain. This would require a good amount of knowledge in computer vision (if the developer does not have it already), because the back-end code must be able to accurately classify a given image. Also, assume that the deadline is not far off.

Now, in a situation like this, a developer who starts learning computer vision from scratch and then implements the required tasks is likely to miss the deadline. It would be more practical to use some pre-trained computer vision models and learn the underlying concepts while proceeding with the development. This is precisely one of those situations where the Vision API comes in handy. The API provides many state-of-the-art pre-trained models that serve many real-world business use cases.

The term "Full-Stack" is also becoming associated with roles like Machine Learning Engineer and Data Scientist. A Full-Stack Machine Learning Practitioner/Data Scientist is expected to design and develop, or at least understand, the end-to-end business processes. This includes "making production-ready models" as one of the most crucial steps, wherein the concerned person or team wraps the developed model into an API (or a set of APIs) and deploys it to the production environment. What "production" means varies with the use case, but the general framework of the process remains the same. The Vision API also lets you efficiently train custom vision models with AutoML Vision BETA.

Great! By now, you should have a pretty good overview of the Vision API. A nice little experiment you can do on the Vision homepage is to analyze your favorite images and derive useful insights with the help of Vision. Here are the steps to do this:

  • Go to Vision homepage.
  • It has a section called "Try the API". It lets you drag/upload an image in its interface.
  • Once you supply an image, it provides you with a bunch of information about it. Vision detects many facts about the image in no time. Feel free to explore the other tabs as well to learn even more about the image.

Now consider this task performed on a set of a billion images; using an API like this would undoubtedly be fruitful in that regard. Next, let's look at the offerings of the Vision API and some of the real-world use cases it has served.

What are the offerings of Vision API - Some niche use-cases:

The Vision API is known for its accurate results. The Vision API documentation provides an excellent collection of tutorials that give you very detailed insight into the API. At first glance, though, this material can be overwhelming. So, to keep things simple, you will first learn about a few use cases that the Vision API has already served.

  • Optical Character Recognition (OCR): This is a classic example of Computer Vision which primarily deals with extraction of text from an image. The Vision API comprises many state-of-the-art approaches for doing this.
  • Detection of Image Properties: This is the task you performed in the earlier section. With the Vision API, you can retrieve general attributes of an image, such as its dominant colors.
  • Label Detection: This task annotates an image with a label (or "tag") based on the image content. For example, a picture of a dog may produce a label of "dog", "animal", or some other similar annotation. This is an essential step in the field of Content-based Information Retrieval.
  • Face Detection: Given an image or a set of images, the task is to detect the faces present in them. This has several large-scale applications, such as surveillance systems.
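To make the label-detection use case concrete, here is a small sketch of how you might post-process label results. The `Label` tuple and the sample scores below are hypothetical stand-ins for the annotations a Vision API response returns (each annotation carries a textual description and a confidence score):

```python
from collections import namedtuple

# Hypothetical stand-in for a Vision API label annotation
Label = namedtuple('Label', ['description', 'score'])

def confident_labels(annotations, threshold=0.75):
    """Keep only the labels whose confidence score meets the threshold."""
    return [a.description for a in annotations if a.score >= threshold]

# Sample annotations, as you might get for a picture of a dog
sample = [Label('dog', 0.98), Label('animal', 0.92), Label('snout', 0.61)]
print(confident_labels(sample))
```

Filtering on the score like this is a common way to discard low-confidence tags before storing or displaying them.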

These are some of the excellent use cases on which the Vision API performs seamlessly, and you can integrate any of the above into your applications in very little time. If you want to learn about more use cases like these, be sure to check out the tutorials in the documentation.

The Vision API provides support for a wide range of languages, such as Go, C#, Java, PHP, Node.js, Python, and Ruby. In the next sections, you will see how to use the Vision API in Python.

Vision API Client Library for Python:

The first step in using the Python variant of the Vision API is to install it. The best way to install it is with pip.

!pip install google-cloud-vision

Once the installation is complete, the next step is to verify it by importing the library.

from google.cloud import vision

If the above line of code executes successfully, you are ready to proceed. Google provides a series of fantastic tutorials on using Vision API in Python.

Now, you will build a simple application in Python which will be able to detect some general attributes of an image, such as dominant color.

A case study with Vision API in Python:

Your application will take the path of an image as its input, and it will display the general attributes of the corresponding image. This works when the images are located on the computer on which the application is executed. But what if you need to read an image from the internet? The Vision API supports reading images from the internet as well.

In this case study, you will tackle the first scenario, but it's only a matter of one line of code to accommodate the internet variant.
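To illustrate the difference, the two variants diverge only in how the image payload is built: a local file sends the raw bytes as the image content, while an internet image sends just a URI that Google fetches server-side. Sketched below as plain dicts in the snake_case shape that the client library's annotate_image() method accepts (the URL is a placeholder, not a real image):

```python
# Local-file variant: raw bytes read from disk become the image content.
local_image = {'content': b'...raw bytes from image_file.read()...'}

# Internet variant: only a URI is supplied; Google fetches the image itself.
remote_image = {'source': {'image_uri': 'https://example.com/Image.jpeg'}}

print(list(local_image), list(remote_image))
```

Everything else in the request stays the same, which is why switching between the two scenarios is a one-line change.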

As always, you will start off by importing vision from the google.cloud module.

from google.cloud import vision

The next step is to call ImageAnnotatorClient() which contains the utilities for extracting image properties.

client = vision.ImageAnnotatorClient()

You will most likely run into an error if the GOOGLE_APPLICATION_CREDENTIALS environment variable is not set. This is because the client libraries use Application Default Credentials (ADC) to locate your application's credentials: when your code uses a library like this, ADC checks a set of well-known locations for your credentials, starting with this environment variable.

Google's authentication documentation explains how to generate the credentials for GOOGLE_APPLICATION_CREDENTIALS. Your aim is to generate a client_secrets.json file, which you will use for authentication purposes.

Once client_secrets.json is obtained, you will execute the following code to set the GOOGLE_APPLICATION_CREDENTIALS environment variable.

import os

# Point Application Default Credentials to the key file you generated
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'client_secrets.json'

Now, running the code below should not give you any errors.

client = vision.ImageAnnotatorClient()

You will now write code for reading an image through a given path.


import io

path = 'Image.jpeg'
with io.open(path, 'rb') as image_file:
    content = image_file.read()

You have successfully loaded an image into your workspace. Now, you will instantiate an object of type vision.types.Image and you will supply content=content as its argument.

image = vision.types.Image(content=content)
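As an aside: if you were to call the Vision REST endpoint directly instead of going through the client library, the image bytes would need to be base64-encoded first; the client library handles that encoding for you. A minimal standard-library sketch of the round trip, using dummy bytes in place of a real image file:

```python
import base64

# Dummy bytes standing in for content = image_file.read()
content = b'\x89PNG...not a real image...'

# The REST API expects the image content as a base64-encoded string.
encoded = base64.b64encode(content).decode('utf-8')

# Decoding recovers the original bytes exactly.
decoded = base64.b64decode(encoded)
print(decoded == content)
```

Since you are using the Python client library here, you can keep passing the raw bytes and let it do this work behind the scenes.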

You are only left with the final steps of your Image Properties detection application. In these steps, you will:

  • Call client.image_properties() with image=image as its argument.
  • Store the response of image_properties() in a variable response, and extract the image properties from the image_properties_annotation attribute of response.
  • Display several properties of the image in a formatted manner.
response = client.image_properties(image=image)
props = response.image_properties_annotation
print('Properties of the image:')

for color in props.dominant_colors.colors:
    print('Fraction: {}'.format(color.pixel_fraction))
    print('\tr: {}'.format(color.color.red))
    print('\tg: {}'.format(color.color.green))
    print('\tb: {}'.format(color.color.blue))
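The red, green, and blue components in the response are floats in the 0–255 range. If you want a compact representation of a dominant color, you can convert it to a hex string. A small helper, sketched here on stand-in values rather than a live API response:

```python
def rgb_to_hex(r, g, b):
    """Convert 0-255 color components to a '#rrggbb' hex string."""
    return '#{:02x}{:02x}{:02x}'.format(int(round(r)), int(round(g)), int(round(b)))

# Stand-in values for color.color.red / .green / .blue from the response
print(rgb_to_hex(213.0, 179.0, 132.0))
```

Inside the loop above, you could call rgb_to_hex(color.color.red, color.color.green, color.color.blue) to print each dominant color in a form that is easy to use in CSS or design tools.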

You might again run into errors if you have not enabled Vision API for your application. Enabling the API is extremely easy, and the error trace provides the instructions so that you can enable it quickly.

After enabling the API, you will also have to enable billing in order to use the Vision API. Image Properties detection is inexpensive, costing as little as $0.60 per 1,000 images. Once that is done, the code executes successfully and produces the output.


You saw how easy it is to use the Vision API, and the kind of utilities it provides, at a remarkably low cost. Today, many companies and organizations benefit from this API, whether for business purposes or for research. In this tutorial, you merely scratched the surface of the Vision API, but this should serve as a good starting point for using machine learning APIs in your applications.

Make sure you check out the whole suite of machine learning APIs that Google provides, known as CloudML.

You can build several cool applications with the help of these easily callable APIs. The Vision API and CloudML documentation provide an amazing compilation of tutorials so that you can easily experiment with them. Good luck!

If you are interested in knowing more about Image Processing, take DataCamp's Convolutional Neural Networks for Image Processing course.
