Skip to main content
HomeTutorialsArtificial Intelligence (AI)

Cohere API Tutorial: Getting Started With Cohere Models

Cohere offers powerful large language models for various language tasks through their user-friendly Playground or API.
May 2024  · 12 min read

In the fast-changing landscape of AI, the arrival of large language models has revolutionized how we interact with and generate text content. Since the release of OpenAI’s ChatGPT in November 2022, we’ve seen a fresh wave of large language models being released daily.

Among these models, Cohere Command has become a top choice among practitioners. Cohere offers cutting-edge tools to empower developers and businesses to harness foundational models' power in their applications and use cases.

This beginner's guide aims to describe Cohere's API offerings, providing an in-depth exploration of its capabilities, advantages, and practical implementation strategies.

We’ll begin with a quick overview of large language models, but if you’re looking for a more comprehensive approach, check out this course on Large Language Models (LLMs).

Cohere and LLMs

Large Language Models (LLMs) are advanced AI systems that understand and generate human-like text. They are trained on vast amounts of data to grasp complex patterns, comprehend language nuances, and generate coherent responses.

LLMs can perform various language-related tasks, including translation, text completion, summarization, and conversational interactions.

Cohere is a key player in this space, offering a user-friendly API that grants data scientists access to these LLM capabilities, streamlining their workflows.

What Is Cohere

Established in 2019, Cohere is a Canadian enterprise dedicated to creating artificial intelligence solutions for businesses. It has rapidly become a significant figure in the large language model (LLM) industry, frequently mentioned alongside competitors like OpenAI GPT and Anthropic Claude.

llms landscape

Source: Cobus Greyling, Large Language Model Landscape, Medium

Primarily, Cohere develops and provides foundational models for language generation, which are accessible to developers and businesses mainly through its API service.

Cohere primarily offers three different types of models:

  • Cohere Command: Cohere Command R is a family of highly scalable language models that balance high performance with strong accuracy. The latest version, Command R+, ranks among the top models in the LMSYS Chatbot Arena leaderboard.
  • Cohere Embed: Embed is Cohere’s leading text representation language model. It improves the accuracy of search results, retrieval-augmented generation (RAG), classification, and clustering.
  • Cohere Rerank: Cohere’s Rerank provides a powerful semantic boost to the search quality of any keyword or vector search system without requiring any overhaul or replacement.

We can access Cohere models in two ways:

  1. Through the Playground.
  2. Programmatically via APIs in Python or other programming languages.

Cohere also offers an exclusive chat interface similar to ChatGPT—powered by Command R+ under the hood.

Cohere Playground

Cohere Playground provides a user-friendly interface for interacting with Cohere Models, offering an intuitive experience similar to working with GPT.

Users can explore and experiment with various model capabilities, generating text and gaining insights into its behavior. With a sleek and accessible design, Cohere Playground enables users to quickly prototype and test Cohere’s model offerings quickly.

Cohere Playground is free for users to explore and experiment with models (it’s free until you go into production).

To use Cohere Playground, we first need to sign up on Cohere’s website.

Then, we can sign in, and we’ll land on the page we see below—which is pretty similar to a playground in the OpenAI platform. Using the dropdown on the right-hand side, we can choose different Cohere models.

The Playground has slightly different interfaces for different tasks. Towards the top, you have four options to choose from: Chat, Classify, Embed, and Generate.

cohere playground

Cohere Playground’s dashboard

Cohere Playground is great for quickly testing the capabilities of Cohere models, but if you are building an application or workflow, you need a programmatic way to access these models. That’s where Cohere’s API comes in!

Cohere API

Accessing Cohere’s API is pretty straightforward. In a nutshell, we’ll need to:

  1. Install the Cohere library.
  2. Create an API key.
  3. Make an API call.

Let’s discuss these steps one by one.

Install the Cohere library

We can install the cohere client using pip.

!pip install cohere

Create an API key

Now, we need an API key.

As we can see in the image below, there are two kinds of API keys: production and trial. For this tutorial’s purposes, we only need a trial key. We can copy it by clicking the copy icon next to the key.

cohere api

Make an API call

Now that we have the cohere library installed and our API key, we can make an API call. In the code below, we:

  • Import the cohere library to interact with their API.
  • Create a Client object using our API key.
  • Define a prompt to send to the LLM.
  • Call the model using
  • Extract the response object and print it.
import cohere

co = cohere.Client('your_token_here')
message = "What is Machine Learning?"
response =

answer = response.text
Machine learning is a branch of computer science that focuses on developing algorithms and models that enable computers to learn and make predictions or decisions without being explicitly programmed. It is a key component of artificial intelligence (AI).

Machine Learning allows systems to identify patterns in data, uncover insights, and improve processes through iterative learning. Instead of writing explicit programs, ML enables the creation of models that can analyze data, identify patterns, and predict outcomes. These models are refined and optimized through a process known as training, which involves feeding the models annotated examples or real-world data.


Response structure

Earlier, we accessed the answer using response.text. But the response object contains more information, and we can access it using response.dict()—this will return a JSON file containing all the metadata including the response message:

   "text":"Machine Learning (ML) is a ....",
         "message":"What is Machine Learning?",
         "message":"Machine Learning (ML) is a ....",

Let’s break down each field of the response object:

  • text: Contains the message generated by the model.
  • generation_id: Contains the ID associated with the generated answer, which can be used with the Feedback API endpoint to highlight good responses and report poor ones.
  • finish_reason: Indicates why the generation ended and can be one of the following:
    • COMPLETE: The model successfully finished generating the message.
    • MAX_TOKENS: The model's context limit was reached before the generation could be completed.
  • chat_history: Contains the conversation presented in a chat log format.
  • meta: Contains information about token counts, billing, etc.
  • response_id: While generation_id focuses on a single attempt at text generation, response_id identifies the entire response object containing that generated text and other relevant data.

Multi-turn conversations in Cohere’s API

Imagine you are developing a chatbot designed to answer user queries. Having context from previous conversations can enable the language model to provide more relevant answers.

In multi-turn conversations, retaining the context of previous interactions is essential for generating coherent and pertinent responses. Cohere’s API facilitates this by allowing us to include the chat history through the chat_history parameter.

This helps the model comprehend the conversation flow and respond appropriately based on earlier exchanges. Let’s see an example.

message = "How is this related to Artificial Intelligence?"

response =
    {"role": "USER", "text": "What is Large Language Model?"},
    {"role": "CHATBOT", "text": "Large Language Model is a type ..."},
  message="How is this related to Artificial Intelligence?"

Large language models (LLMs) are a key component of artificial intelligence (AI), and their development has played a significant role in advancing the field of AI. Here's how they are related:

1. Language Understanding and Generation: AI aims to create intelligent systems that can understand and respond to human language. LLMs are designed to process and generate human-like language, which is a fundamental aspect of human intelligence. By training on vast amounts of text data, LLMs learn to recognize patterns, syntax, semantics, and context, enabling them to interpret and generate language in a way that is similar to how humans do.

2. Machine Learning and Neural Networks: LLMs are typically built using machine learning techniques, particularly deep learning and neural networks. These are core areas of AI research. Neural networks, inspired by the structure of the human brain, enable LLMs to learn and make connections between different pieces of information. By training on large datasets, LLMs can identify patterns, relationships, and meanings in the data, which is a fundamental aspect of AI.

3. (...)

When the user asked "How is this related to Artificial Intelligence?," the model understood that "this" referred to large language models (LLMs). This is thanks to the chat_history parameter, which provided the necessary context from the previous interactions. Maintaining context in multi-turn conversations is crucial for generating relevant and coherent responses.

This works great but it’s not realistic to manually pass the chat_history to the API like we did in the above example. Instead of hardcoding the chat_history, we can build it dynamically as we have a conversation.

There are multiple ways to achieve this. The simplest one is using conversation_id in the API. Let’s see an example:

# First message
response =
  message="What is Large Language Models?",

Large Language Models (LLMs) are a type of artificial intelligence model specifically designed to process and generate human language. These models are typically based on deep learning and neural network architectures and are trained on vast amounts of text data. Here's an overview of LLMs and their key characteristics:

1. Definition:

- Large Language Models are machine learning models that are trained on large-scale textual data to understand, generate, and manipulate human language. They aim to capture the complexities of language, including syntax, semantics, and context.

2. Training Data:

- LLMs are trained on massive amounts of text data, often consisting of billions or even trillions of words. This training data can come from various sources such as books, articles, websites, social media, and other text-based documents.

3. (...)

And now let’s prompt the model with another message—notice how conversation_id is kept consistent with the first message. As long as conversation_id is kept consistent, the API will automatically use previous messages as context.

# Second message
response =
  message="How is it related to Artificial Intelligence?",
Large Language Models (LLMs) are a crucial component and a powerful manifestation of Artificial Intelligence (AI). Here's how LLMs are closely related to AI:

1. AI Foundation:

- Artificial Intelligence is a broad field that encompasses the development of computer systems capable of performing tasks typically associated with human intelligence. This includes areas like machine learning, natural language processing, computer vision, robotics, and decision-making.

- LLMs fall under the umbrella of AI as they are designed to understand, generate, and respond to human language, which is a fundamental aspect of human intelligence.

2. Machine Learning:

- Machine learning is a subset of AI that focuses on the development of algorithms and models that enable computers to learn and improve over time, based on data and experience, without being explicitly programmed.

- LLMs are built using machine learning techniques, particularly deep learning and neural networks. They are trained on large datasets to identify patterns, understand context, and generate responses, demonstrating their ability to learn and improve from data.

3. (...)

If you want to continue the conversation, we need to keep the conversation_id consistent, and it’ll automatically remember all previous conversations. The conversation_id should not be used in conjunction with the chat_history. They are mutually exclusive.

If you want to learn more, check out Cohere’s documentation.

Cohere Pricing

Let's examine the pricing of Cohere's most advanced model and compare it with the top proprietary language models from OpenAI and Anthropic (we’ll focus on each company's most powerful model). As of May 2024, these are the prices:


$ / million input token

$ / million output token

Cohere Command R+



Anthropic Claude Opus



OpenAI GPT-4 Turbo



While Cohere Command R+ offers an attractive cost advantage, it's important to consider the broader performance spectrum of these models. Anthropic Claude Opus and GPT-4 turbo have consistently demonstrated stronger performance in benchmarks.


Whether you're a developer looking to enhance your applications or a business aiming to incorporate cutting-edge AI, Cohere offers large models that are competitive with other foundational models, like GPT and Claude—but at a significantly lower price.

In this blog, we’ve learned what Cohere is and how to access it using either the Playground or the API.

If you want to explore the topic further, I recommend this code-along on Using Large Language Models with the Cohere API.

If you feel like going even deeper and learn how to develop end-to-end AI applications, check out Developing LLM Applications with LangChain.

Photo of Moez Ali
Moez Ali

Data Scientist, Founder & Creator of PyCaret


Learn more about APIs and LLMs with these courses!


Working with the OpenAI API

3 hr
Start your journey developing AI-powered applications with the OpenAI API. Learn about the functionality that underpins popular AI applications like ChatGPT.
See DetailsRight Arrow
Start Course
See MoreRight Arrow


OpenAI Assistants API Tutorial

A comprehensive overview of the Assistants API with our article, offering an in-depth look at its features, industry uses, setup guidance, and best practices to maximize its potential in various business applications.
Zoumana Keita 's photo

Zoumana Keita

14 min


A Beginner's Guide to The OpenAI API: Hands-On Tutorial and Best Practices

This tutorial introduces you to the OpenAI API, it's use-cases, a hands-on approach to using the API, and all the best practices to follow.
Arunn Thevapalan's photo

Arunn Thevapalan

13 min


Using Large Language Models with the Cohere API

In this session, you'll learn to use the Cohere API with Python to generate content based on a given prompt, extract information from documents, and build a semantic search engine.
Rishit Dholakia's photo

Rishit Dholakia


Building Multimodal AI Applications with LangChain & the OpenAI API

Combine the power of text and audio AI models to build a bot that answers questions about YouTube videos.
Korey Stegared-Pace's photo

Korey Stegared-Pace


Introduction to Large Language Models with GPT & LangChain

Learn the fundamentals of working with large language models and build a bot that analyzes data.
Richie Cotton's photo

Richie Cotton


Getting Started with the OpenAI API and ChatGPT

Get an introduction to the OpenAI API and the GPT-3 model.
Richie Cotton's photo

Richie Cotton

See MoreSee More