Medical professionals often summarize patient encounters in transcripts written in natural language, which include details about symptoms, diagnoses, and treatments. These transcripts can be reused for other medical documentation, such as insurance paperwork, but because they are densely packed with medical information, extracting the key data accurately can be challenging.
You and your team at Lakeside Healthcare Network have decided to leverage the OpenAI API to automatically extract medical information from these transcripts and match it with the appropriate ICD-10 codes. ICD-10 is a standardized coding system used worldwide for diagnostic and billing purposes, such as insurance claims processing.
The Data
The dataset contains anonymized medical transcriptions categorized by specialty.
transcriptions.csv
Column | Description |
---|---|
"medical_specialty" | The medical specialty associated with each transcription. |
"transcription" | Detailed medical transcription texts, with insights into the medical case. |
Before you start
In order to complete the project you will need to create a developer account with OpenAI and store your API key as a secure environment variable. Instructions for these steps are outlined below.
Create a developer account with OpenAI
- Go to the API signup page.
- Create your account (you'll need to provide your email address and your phone number).
- Go to the API keys page.
- Create a new secret key.
- Take a copy of it. (If you lose it, delete the key and create a new one.)
Add a payment method
OpenAI sometimes provides free credits for the API, but this can vary depending on geography. You may need to add debit/credit card details.
This project should cost less than 10 US cents with GPT-3.5-Turbo (but if you rerun tasks, you will be charged every time).
- Go to the Payment Methods page.
- Click Add payment method.
- Fill in your card details.
Add an environment variable with your OpenAI key
- In the workbook, click on "Environment" in the left sidebar.
- Click on the plus button next to "Environment variables" to add environment variables.
- In the "Name" field, type "OPENAI_API_KEY". In the "Value" field, paste in your secret key.
- Click "Create", then click "Connect" in the pop-up window that appears. Wait 5-10 seconds for the kernel to restart, or restart it manually in the Run menu.
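Before running the cells that call the API, it can help to confirm that the key is actually visible to the kernel. The following is a minimal sketch, assuming the variable is named OPENAI_API_KEY as in the steps above; `api_key_is_set` is a hypothetical helper written for this check and is not part of the OpenAI library.

```python
import os

def api_key_is_set(env=None, name="OPENAI_API_KEY"):
    """Return True if the named variable exists and is not blank."""
    # Default to the real process environment; a plain dict can be passed for testing
    env = os.environ if env is None else env
    return bool(env.get(name, "").strip())

if not api_key_is_set():
    print("OPENAI_API_KEY is not set; add it and restart the kernel.")
```

If the message is printed after you have added the variable, the kernel most likely has not been restarted yet.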
# Import the necessary libraries
import pandas as pd
from openai import OpenAI
import json
# Load the data
df = pd.read_csv("data/transcriptions.csv")
df.head()
## Start coding here, use as many cells as you need
# Initialize the OpenAI client: make sure you have a valid API key named OPENAI_API_KEY in your Environment Variables
client = OpenAI()
def extract_info_with_openai(transcription):
    """Extracts age and recommended treatment or procedure from a transcription using OpenAI."""
    # The system prompt and the user prompt are two separate messages
    messages = [
        {
            "role": "system",
            "content": "You are a healthcare professional and need to get the age and recommended treatment or procedure from a medical record transcript. Always return both age and recommended treatment or procedure: if any of the fields is missing in the transcript, return Not Found."
        },
        {
            "role": "user",
            "content": f"Return the age and recommended treatment or procedure for the patients from the body of the following transcription: {transcription}. "
        }
    ]
    # Schema of the function the model is asked to "call" with the extracted fields
    function_definition = [
        {
            'type': 'function',
            'function': {
                'name': 'extract_medical_data',
                'description': 'Get the age and recommended treatment or procedure from the input text. Always return both age and recommended treatment or procedure: if any of the fields is missing in the transcript, return Not Found.',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'Age': {
                            'type': 'integer',
                            'description': 'Age of the patient'
                        },
                        'Recommended Treatment/Procedure': {
                            'type': 'string',
                            'description': 'Recommended treatment or procedure for the patient'
                        }
                    }
                }
            }
        }
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # gpt-3.5-turbo
        messages=messages,
        tools=function_definition,
        # Require a call to extract_medical_data so that tool_calls is populated
        tool_choice={"type": "function", "function": {"name": "extract_medical_data"}}
    )
    # The function arguments arrive as a JSON string; convert them to a dict
    return json.loads(response.choices[0].message.tool_calls[0].function.arguments)
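The last line of the function above assumes that the reply always contains a tool call with valid JSON arguments. When the model answers in plain text instead, `tool_calls` is `None` and indexing it raises an error. The following is a minimal sketch of a more defensive parser; `parse_tool_arguments` is a hypothetical helper, and the `SimpleNamespace` objects are made-up stand-ins that only mimic the shape of `response.choices[0].message` so the example runs without calling the API.

```python
import json
from types import SimpleNamespace

def parse_tool_arguments(message):
    """Return the first tool call's arguments as a dict, or {} if unavailable."""
    tool_calls = getattr(message, "tool_calls", None)
    if not tool_calls:
        return {}
    try:
        parsed = json.loads(tool_calls[0].function.arguments)
    except (json.JSONDecodeError, TypeError):
        return {}
    # Only a JSON object is useful here; anything else is treated as missing
    return parsed if isinstance(parsed, dict) else {}

# Illustrative stand-ins (not real API responses)
with_call = SimpleNamespace(tool_calls=[SimpleNamespace(
    function=SimpleNamespace(
        arguments='{"Age": 42, "Recommended Treatment/Procedure": "Physical therapy"}'
    )
)])
without_call = SimpleNamespace(tool_calls=None, content="No function call was made.")

print(parse_tool_arguments(with_call))
print(parse_tool_arguments(without_call))
```

An empty dict lets the calling code decide how to handle the row, for example by filling in "Not Found", instead of stopping the whole run.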
def get_icd_codes(treatment):
    """Retrieves ICD codes for a given treatment using OpenAI."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # gpt-3.5-turbo
        messages=[{
            "role": "user",
            "content": f"Provide the ICD codes for the following treatment or procedure: {treatment}. Return the answer as a list of codes with corresponding definition."
        }],
        temperature=0.3
    )
    return response.choices[0].message.content
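`get_icd_codes` returns the model's reply as free text, so the codes are mixed in with their definitions. If only the codes are needed, one option is to pull out code-shaped strings with a regular expression. The following is a rough sketch under stated assumptions: the pattern covers only the diagnosis-style ICD-10 shape (a letter, a digit, an alphanumeric character, and an optional decimal part of up to four characters), it does not check that a match is a real code, and it can pick up look-alike tokens. `find_icd10_codes` and `sample_reply` are hypothetical names, and the sample text is made up for illustration.

```python
import re

# Diagnosis-style ICD-10 shape: letter, digit, alphanumeric, optional ".xxxx"
ICD10_PATTERN = re.compile(r"\b[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")

def find_icd10_codes(text):
    """Return unique code-shaped strings in order of first appearance."""
    seen = []
    for code in ICD10_PATTERN.findall(text):
        if code not in seen:
            seen.append(code)
    return seen

# Made-up reply in the "code plus definition" format requested by the prompt
sample_reply = (
    "1. M54.5 - Low back pain\n"
    "2. Z47.1 - Aftercare following joint replacement surgery\n"
    "3. M54.5 (repeated)"
)
print(find_icd10_codes(sample_reply))  # ['M54.5', 'Z47.1']
```

Because the codes themselves are generated by the model, they should still be checked against an official ICD-10 reference before being used for billing.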
processed_data = []
for index, row in df.iterrows():
    transcription = row['transcription']
    medical_specialty = row['medical_specialty']
    extracted_data = extract_info_with_openai(transcription)
    icd_code = get_icd_codes(extracted_data["Recommended Treatment/Procedure"])
    extracted_data["Medical Specialty"] = medical_specialty
    extracted_data["ICD Code"] = icd_code
    processed_data.append(extracted_data)
df_structured = pd.DataFrame(processed_data)
df_structured.head()