Enriching stock market data using the OpenAI API
The Nasdaq-100 is a stock market index made up of 101 equity securities issued by 100 of the largest non-financial companies listed on the Nasdaq stock exchange. It helps investors compare stock prices with previous prices to determine market performance.
In this project you are provided with two CSV files containing Nasdaq-100 stock information:
- nasdaq100.csv: contains information about companies in the index such as symbol, name, etc.
- nasdaq100_price_change.csv: contains price changes per stock across periods including (but not limited to) one day, five days, one month, six months, one year, etc.
As an AI developer, you will leverage the OpenAI API to classify companies into sectors and produce a summary of sector and company performance for this year.
Add an environment variable with your OpenAI key
- In DataLab, click "Environment" in the menu.
- Click "Environment variables" to add environment variables.
- In the "Name" field, type "OPENAI_API_KEY". In the "Value" field, paste your secret key.
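Before creating the client, it can help to confirm that the variable is actually visible to the notebook. The sketch below is one possible check, not part of the project files: `require_env` is a hypothetical helper introduced here, and it only reads the environment without contacting the API.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration. `env` defaults to os.environ and
    can be replaced with a plain dict when testing.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Add it under 'Environment variables' first."
        )
    return value
```

With the variable in place, `require_env("OPENAI_API_KEY")` returns the key, which can then be passed to `OpenAI(api_key=...)`.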
# Start your code here!
import os
import pandas as pd
from openai import OpenAI
# Instantiate an API client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # Variable name set in the steps above
# Read data
nasdaq100 = pd.read_csv("nasdaq100.csv") # Contains basic company info
price_change = pd.read_csv("nasdaq100_price_change.csv") # Contains YTD price data
print(nasdaq100.columns)
print(price_change.columns)

# Add the year-to-date price change to the company table
nasdaq100 = pd.merge(nasdaq100, price_change[['symbol', 'ytd']], on='symbol')
print(nasdaq100.columns)

def classify_sector(company_name):
    """Ask the model to assign a single company to one sector."""
    prompt = (
        f"Classify the company '{company_name}' into one of the following sectors:\n"
        "Technology, Consumer Cyclical, Industrials, Utilities, Healthcare, "
        "Communication, Energy, Consumer Defensive, Real Estate, or Financial.\n\n"
        "Only return the name of the sector."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    # Take the text of the first choice and trim surrounding whitespace
    return response.choices[0].message.content.strip()
Explanation of return response.choices[0].message.content.strip()
- response: This is the ChatCompletion object returned by client.chat.completions.create(...). Its fields are read with attribute access (response.choices) rather than dictionary keys (response['choices']). Serialized as JSON, it has a structure like:
    {
      "id": "...",
      "object": "chat.completion",
      "created": 123456,
      "model": "gpt-3.5-turbo",
      "choices": [
        {
          "index": 0,
          "message": { "role": "assistant", "content": "Technology" },
          "finish_reason": "stop"
        }
      ],
      ...
    }
- response.choices[0]: OpenAI returns responses in a list under the choices field. Even if there is just one answer, it is stored as a list, so this grabs the first response.
- .message.content: This navigates into that first choice and retrieves the actual text the model returned, that is, the assistant's answer, like "Technology".
- .strip(): This removes any leading or trailing whitespace (like newlines or extra spaces). So, " Technology\n" becomes "Technology".
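The access chain can be tried without calling the API. The sketch below builds a stand-in object with `types.SimpleNamespace` that mimics only the attributes `classify_sector` reads. `fake_response` is an assumption made for illustration and is not a real API response.

```python
from types import SimpleNamespace

# Stand-in for a real API response (illustration only): it mimics just the
# attributes that classify_sector reads, nothing more.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            index=0,
            message=SimpleNamespace(role="assistant", content=" Technology\n"),
            finish_reason="stop",
        )
    ]
)

first_choice = fake_response.choices[0]   # first (and here only) answer
raw_text = first_choice.message.content   # text exactly as the model sent it
sector = raw_text.strip()                 # whitespace and newline removed

print(repr(raw_text))  # ' Technology\n'
print(repr(sector))    # 'Technology'
```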
# Only classify each unique company name once to avoid redundant API calls
unique_names = nasdaq100['name'].drop_duplicates()
# Create a name-to-sector map
name_to_sector = {name: classify_sector(name) for name in unique_names}
# Map back to the DataFrame
nasdaq100['sector'] = nasdaq100['name'].map(name_to_sector)
sector_counts = nasdaq100['sector'].value_counts()
print(sector_counts)
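The prompt asks for only a sector name, but nothing in the API call enforces that, so the counts may contain near-duplicates such as "Technology." or "technology". The following is a minimal sketch of one way to tidy the labels, assuming the ten sectors listed in the prompt are the only valid ones. `ALLOWED_SECTORS`, `normalize_sector`, and the "Unknown" fallback are hypothetical names introduced here.

```python
# Sectors copied from the classification prompt
ALLOWED_SECTORS = [
    "Technology", "Consumer Cyclical", "Industrials", "Utilities",
    "Healthcare", "Communication", "Energy", "Consumer Defensive",
    "Real Estate", "Financial",
]

# Case-insensitive lookup from lowercased label to canonical spelling
_LOOKUP = {sector.lower(): sector for sector in ALLOWED_SECTORS}


def normalize_sector(raw_label, fallback="Unknown"):
    """Map a model answer to a canonical sector, or to `fallback`.

    Trims whitespace, surrounding quotes, and trailing periods, then
    matches case-insensitively against ALLOWED_SECTORS.
    """
    cleaned = raw_label.strip().strip(".'\"").strip().lower()
    return _LOOKUP.get(cleaned, fallback)
```

Applied to the DataFrame, this could look like `nasdaq100['sector'] = nasdaq100['sector'].map(normalize_sector)`, after which any rows labelled "Unknown" can be inspected or reclassified by hand.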
# Format the data into a readable string for the model
def generate_summary_prompt(df):
    top_stocks = df.sort_values(by="ytd", ascending=False).head(20)
    summary = "Company YTD Performance:\n"
    for _, row in top_stocks.iterrows():
        summary += f"{row['name']} ({row['symbol']}), Sector: {row['sector']}, YTD: {row['ytd']}%\n"
    prompt = (
        summary +
        "\nBased on this data, recommend the '3' unique best performing sectors year-to-date, "
        "and for 'each', list 2-3 top companies. Format your response as a table with columns: "
        "'Sector', 'Company', 'Symbol', 'YTD (%)'."
    )
    return prompt
# Create and send the prompt
prompt = generate_summary_prompt(nasdaq100)