Weaviate workshop
Goals:
What you will see:
- Create a vector database with Weaviate,
- Add data to the database, and
- Interact with the data, including searching it and using LLMs with your data in Weaviate.
What you will learn today:
- What Weaviate is,
- How it stores the data (based on its "meaning"), and
- What you can do with Weaviate, like semantic searches, and using LLMs to transform data.
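To make "stored by meaning" a little more concrete, here is a minimal sketch of the idea behind a semantic search: each item is represented as a vector, and the closest vectors are treated as the most similar items. The three-dimensional vectors below are made up by hand for illustration and the `nearest` helper is hypothetical. In practice the vectors come from an embedding model and have hundreds of dimensions or more, and this is not Weaviate's actual implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy "embeddings" (assumption: purely illustrative values)
toy_vectors = {
    "lion": [0.9, 0.1, 0.0],
    "tiger": [0.8, 0.2, 0.1],
    "spreadsheet": [0.0, 0.1, 0.9],
}

def nearest(query, vectors):
    """Return the key (other than `query`) whose vector is most similar to the query's."""
    candidates = (key for key in vectors if key != query)
    return max(candidates, key=lambda key: cosine_similarity(vectors[query], vectors[key]))

print(nearest("lion", toy_vectors))
```

With these toy values, "lion" lands closer to "tiger" than to "spreadsheet", even though the words share no text.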
Preparation
Install the Weaviate Python client if your environment doesn't already have it.
!pip install -U weaviate-client
Get the data
We'll use a subset of the Jeopardy! quiz dataset:
https://www.kaggle.com/datasets/tunguz/200000-jeopardy-questions
Pre-processed version:
Load (or download) the data, and preview it
import requests
import json

def load_data():
    # Read the raw JSON text from a local copy of the dataset
    with open("jeopardy_1k.json", "r") as f:
        raw_data = f.read()
    return raw_data

def download_data():
    # Fetch the raw JSON text from GitHub; fail loudly on an HTTP error
    response = requests.get('https://raw.githubusercontent.com/databyjp/wv_demo_uploader/main/weaviate_datasets/data/jeopardy_1k.json')
    response.raise_for_status()
    raw_data = response.text
    return raw_data

# Parse the JSON and preview it
# (call download_data() instead if you don't have a local copy)
json_data = load_data()
data = json.loads(json_data)
print(type(data), len(data))
print(json.dumps(data[0], indent=2))
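If you would rather not choose between loading and downloading by hand, the two can be combined. The following is a rough sketch, not part of the original workshop code: `load_or_download` is a hypothetical helper that reads the local file when it exists and otherwise calls a fetch function you pass in, saving the result for next time.

```python
from pathlib import Path

def load_or_download(path, fetch):
    """Return the raw text at `path`; if the file is missing, call `fetch()` and cache its result there.

    `fetch` is any zero-argument function returning a string (an assumption of this sketch).
    """
    file_path = Path(path)
    if file_path.exists():
        return file_path.read_text(encoding="utf-8")
    raw_data = fetch()
    file_path.write_text(raw_data, encoding="utf-8")
    return raw_data
```

With the functions defined above, this could be used as `json_data = load_or_download("jeopardy_1k.json", download_data)`, so the network is only hit on the first run.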
Step 1: Create a Weaviate instance (database)
We'll use Embedded Weaviate, which is a quick way to create a Weaviate database.
You can also use:
- A free sandbox with Weaviate Cloud Services
- Open-source Weaviate directly, available cross-platform with Docker
- Or use Kubernetes in production :)
# Read the OpenAI API key from the environment instead of hard-coding it in the notebook
# (assumes you have set it as OPENAI_API_KEY)
import os
openai_key = os.environ["OPENAI_API_KEY"]