Code-Along Webinar | 2023-10-03 | Getting Started with Cohere API
What does Cohere do?
Cohere offers an API that adds cutting-edge language processing to any system. Cohere trains massive language models and puts them behind a simple API. Moreover, through custom training, users can create models tailored to their use case and trained on their own data. This way, Cohere handles the complexities of collecting massive amounts of text data, the ever-evolving neural network architectures, distributed training, and serving models around the clock.
Cohere offers access to both generation models (through the generate endpoint) and representation models (through the embed endpoint which returns an embedding vector for the input text).
You can read more about the API in the Cohere documentation.
There are many advantages to using Cohere.ai for your app, including faster model building, improved accuracy and performance, and better scalability and maintainability.
Cohere Signup
- Click on the "TRY NOW" tab.
- All you need is your email address to sign up.
- Click on the "API Keys" tab and scroll down; you should see a trial API key present.
Set up a Workspace integration
- In Workspace, click on Environment.
- Click on the "Add environment variables" plus button.
- In the "Name" field, type "COHERE". In the "Value" field, paste in your secret key.
- Click "Create", and connect the new integration.
Please follow the instructions below:
- Install the required packages:
!pip install cohere
!pip install -U urllib3
!pip install requests==2.29.0
- Restart your kernel after the installations.
- Insert your API key:
import os
import cohere

co_key = os.environ["COHERE_KEY"]
co = cohere.Client(co_key)
# Install the above requirements
!pip install cohere
!pip install -U urllib3
!pip install requests==2.29.0

# Add Cohere API key
Let's Begin
Consider yourself the tech founder of Nanogood, a B2B e-commerce website where you sell all kinds of goods at wholesale. You are short of employees in the creative space who can help you create advertisements, product descriptions, and product titles.
Let us use Cohere's generative AI API, co.generate(), to generate product titles and product descriptions.
Cohere offers multiple language models as part of the co.generate API. Cohere's text generation model is tuned to follow user commands and deliver leading-edge performance with continuous updates. Cohere's flagship model is called "Command".
It's available in two sizes: command-light and command. The command model demonstrates better performance, while command-light is a great option for developers who require fast responses, such as those building chatbots.
To reduce the turnaround time for releases, we have nightly versions of command available. This means that every week, you can expect the performance of command-nightly-* to improve.
The following code snippet shows how to use co.generate() in Python; other platforms are supported as well.
response = co.generate(
    model='command-nightly',
    prompt='Write 5 titles for blog ideas for the keywords "large language model" or "text generation"',
    max_tokens=300,
    temperature=0.9,
    stop_sequences=[],
    return_likelihoods='NONE')
print('Prediction: {}'.format(response.generations[0].text))
There are a few things to unpack here:
- model - the model we would like to use for generation.
- max_tokens - sets the limit on the number of tokens to be generated.
- temperature - tells the model how creative it should be while generating content; higher values produce more varied output.
- stop_sequences - a list of sequences; when the model generates one of them, it stops generating any further tokens. This could be ".", "\n", or any sequence of your choice.
- return_likelihoods - whether to return likelihoods for each of the generated tokens.
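To build intuition for how stop_sequences behaves, here is a minimal pure-Python sketch (not part of the Cohere SDK; the helper name is hypothetical) that truncates output at the first stop sequence found:

```python
def apply_stop_sequences(generated_text, stop_sequences):
    """Cut generated text at the first occurrence of any stop sequence."""
    cut = len(generated_text)
    for seq in stop_sequences:
        idx = generated_text.find(seq)
        if idx != -1:
            cut = min(cut, idx)
    return generated_text[:cut]

# Generation conceptually stops at the first "\n"
print(apply_stop_sequences("Title one.\nTitle two.", ["\n"]))  # Title one.
```

In the real API the model stops emitting tokens as soon as a stop sequence appears; this helper only mimics that behavior on a finished string.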
Exercise 1 - Generate Product Title and Product Description
In this exercise, we will pass a string that contains a product name and brand. We will use co.generate() to generate a product title along with a short description.
# include the user prompt in the co.generate() module to receive generations
text = ' You are an Apple merchandising specialist, draft a product title and product description for a smartwatch from Fitbit, a fitness brand in a brand new ocean blue color, has accurate tracking.'
# use co.generate with temp = 0.9, max_tokens=200
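One possible solution sketch for this exercise, following the parameter hints above; the API call is guarded by an environment check so the cell only hits the API when a key is configured:

```python
import os

text = ('You are an Apple merchandising specialist, draft a product title and '
        'product description for a smartwatch from Fitbit, a fitness brand, in a '
        'brand new ocean blue color, with accurate tracking.')

# Parameters suggested in the exercise hint
params = dict(model='command-nightly', prompt=text, max_tokens=200, temperature=0.9)

if os.environ.get('COHERE_KEY'):
    import cohere
    co = cohere.Client(os.environ['COHERE_KEY'])
    response = co.generate(**params)
    print('Prediction: {}'.format(response.generations[0].text))
```

The generated text will vary from run to run because of the high temperature.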
Exercise 2 - Alternate Product Search
It's been a few months, and many clients now buy products from you in large quantities. You are facing an inventory management problem: clients need alternate products for items that are out of stock. Clients have different requirements; some want alternates in the same price range, while others want products with a higher profit margin.
The problem you are facing is that you do not have any annotated attributes for the products, and asking annotators to add them might cost you a lot.
So let's build a system that uses just the product names to search for alternate products, with matching dimensions as the criterion.
This is how we are going to build this system using Cohere, in fewer than 100 lines of code :)
- Build a semantic search method to retrieve similar product items
- Extract information from the text data to rank the products by the most similar dimensions.
Task 1: Build a semantic search method using Co:Embed
This endpoint is one of Cohere's most popular across users and enterprises. It returns text embeddings. An embedding is a list of floating point numbers that captures semantic information about the text it represents.
Embeddings can be used to create text classifiers as well as empower semantic search. To learn more about embeddings, see the embedding page.
Co:Embed is multilingual, which means we can embed multilingual text documents and use the resulting representations for many different tasks, such as language detection. For search, it supports:
- Multilingual semantic search - when the query and the result are in the same language. For example, an English query of “places to eat” returning an English result of “Bob's Burgers.”
- Cross-lingual search - when the query and the result are in different languages. For example, a Hindi query of “खाने की जगह” (places to eat) returning an English result of “Bob's Burgers.”
Semantic Search with Co:Embed
Language models give computers the ability to search by meaning and go beyond searching by matching keywords. This capability is called semantic search.
Now for the exciting part! Let's build a semantic search engine that can power internal web search. At the end of this task you should be able to use this knowledge to build your own custom search engine, which could power document search on your laptop, or something as cool as a Stack Overflow assistant that retrieves documents for your natural language queries.
We will proceed with the following steps:
- Get the archive of product names
- Embed the archive
- Search using nearest neighbor search
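The nearest-neighbor step above can be sketched with toy vectors and cosine similarity in NumPy; the vectors below are made up, and in the task itself they will come from co.embed():

```python
import numpy as np

def nearest_neighbors(query_vec, archive_vecs, k=2):
    """Return indices of the k archive vectors most similar to the query (cosine)."""
    archive = np.asarray(archive_vecs, dtype=float)
    query = np.asarray(query_vec, dtype=float)
    sims = archive @ query / (np.linalg.norm(archive, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)[:k]

# Toy 3-d "embeddings" for four products
archive = [[1.0, 0.0, 0.0],
           [0.9, 0.1, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
print(nearest_neighbors([1.0, 0.05, 0.0], archive))  # the two closest are 0 and 1
```

Brute-force cosine search like this is fine for a few thousand items; larger archives would call for an approximate nearest-neighbor index.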
import numpy as np
import pandas as pd
import torch
import re  # regex
1. Get The Archive of Product Names
We will be using the Amazon product description dataset, which contains product names.
# read a csv file using pandas
df = pd.read_csv("amazon_product_name.csv")

# select all rows of the column 'Product Name'
product_names = df['Product Name'].tolist()
2. Embed the Archive
Using co.embed()
co.embed(texts=product_names, model="embed-english-light-v2.0", truncate="END").embeddings
"embed-english-light-v2.0" is an embedding model for English text that produces 1024-dimensional vectors.
A few details to unpack here:
texts : a list of strings to be embedded.
model : the model you would like to use; english-large, english-small, and multilingual models are available.
truncate : whether to truncate the START (start of string) or the END (end of string). Embedding models allow a maximum context length of 512 tokens; if the input is longer than 512 tokens, it is truncated from whichever side you specify.
# Use co.embed with product_names as texts, model "embed-english-light-v2.0", and truncate="END"
embeds =
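A possible completion of the exercise cell. The sample product names are stand-ins for the CSV data, and the API call is guarded so the cell runs even without a key configured:

```python
import os

# Stand-in product names; in the exercise these come from the CSV above.
product_names = ["Storage Box 12 x 8 x 3 inches", "USB-C charging cable, 6 ft"]

embed_params = dict(model="embed-english-light-v2.0", truncate="END")

if os.environ.get("COHERE_KEY"):
    import cohere
    co = cohere.Client(os.environ["COHERE_KEY"])
    embeds = co.embed(texts=product_names, **embed_params).embeddings
    print(len(embeds), "embeddings of dimension", len(embeds[0]))
```

Each entry of embeds is a 1024-dimensional list of floats, one per input string, ready to feed into the nearest-neighbor search sketched earlier.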