Weaviate workshop

Goals:

What you will see:

How to create a vector database with Weaviate,
How to add data to the database, and
How to interact with the data, including searching it and using LLMs with your data in Weaviate.

You will learn today:

What Weaviate is,
How it stores the data (based on its "meaning"), and
What you can do with Weaviate, such as semantic search and using LLMs to transform data.
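To make "storing data by meaning" a little more concrete, here is a minimal sketch of the idea behind vector search. The three-dimensional vectors are made up for illustration (real embeddings come from a model and have hundreds or thousands of dimensions), and the brute-force ranking below is only a stand-in: Weaviate itself uses an approximate nearest-neighbor index rather than comparing against every object.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means 'similar direction'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, vectors):
    """Return the keys of `vectors`, ranked by similarity to `query` (best first)."""
    return sorted(
        vectors,
        key=lambda name: cosine_similarity(query, vectors[name]),
        reverse=True,
    )


# Hypothetical toy "embeddings": similar concepts get similar vectors
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "airplane": [0.0, 0.2, 0.9],
}

# A query vector that points in roughly the same direction as the animal vectors
ranking = nearest([0.85, 0.15, 0.05], vectors)
print(ranking)
```

The two animal entries rank ahead of "airplane" because their vectors point in nearly the same direction as the query, even though no keywords are compared.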

Preparation

Install the Weaviate Python client if your environment doesn't already have it.

!pip install -U weaviate-client
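If you are unsure whether the client is already present, a small check like the one below can tell you before you reinstall. This is a sketch using only the standard library; the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the client is installed, otherwise None
print(installed_version("weaviate-client"))
```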

Get the data

We'll use a subset of the Jeopardy! quiz dataset:

https://www.kaggle.com/datasets/tunguz/200000-jeopardy-questions

Pre-processed version:

https://raw.githubusercontent.com/databyjp/wv_demo_uploader/main/weaviate_datasets/data/jeopardy_1k.json

Load (or download) the data, and preview it

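import json
from pathlib import Path

import requests

DATA_URL = "https://raw.githubusercontent.com/databyjp/wv_demo_uploader/main/weaviate_datasets/data/jeopardy_1k.json"
DATA_FILE = Path("jeopardy_1k.json")

def load_data():
    """Read the raw JSON text from the local copy of the dataset."""
    with open(DATA_FILE, "r", encoding="utf-8") as f:
        return f.read()

def download_data():
    """Fetch the raw JSON text from GitHub."""
    response = requests.get(DATA_URL, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text

# Use the local file if it exists; otherwise download the data
json_data = load_data() if DATA_FILE.exists() else download_data()

# Parse the JSON and preview it
data = json.loads(json_data)
print(type(data), len(data))
print(json.dumps(data[0], indent=2))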
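Beyond previewing the first record, it can help to confirm that every record has the same fields before loading anything into a database. The sketch below counts how often each key appears. The helper name `summarize_keys` and the two sample records are made up for illustration; run it on the parsed `data` list to see the real field names.

```python
from collections import Counter


def summarize_keys(records):
    """Count how many records contain each key."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return dict(counts)


# Hypothetical sample records, standing in for the parsed dataset
sample = [
    {"Question": "placeholder question 1", "Answer": "placeholder answer 1"},
    {"Question": "placeholder question 2"},
]

summary = summarize_keys(sample)
print(summary)
```

A key whose count is lower than the number of records (here, "Answer") points to missing values that may need handling before import.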

Step 1: Create a Weaviate instance (database)

We'll use Embedded Weaviate, which is a quick way to create a Weaviate database.

You can also use:

A free sandbox with Weaviate Cloud Services

Open-source Weaviate directly, available cross-platform with Docker

Or use Kubernetes in production :)

import os

# Read the OpenAI API key from the environment instead of hardcoding it in the notebook
openai_key = os.environ["OPENAI_API_KEY"]
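With a key available, starting Embedded Weaviate might look like the sketch below. It assumes the v4 Python client, where `weaviate.connect_to_embedded` is available; older v3 clients used `weaviate.Client(embedded_options=EmbeddedOptions())` instead, so adjust to whichever version you have installed. The helper names `build_headers` and `connect_embedded` are made up for this example.

```python
def build_headers(openai_key):
    """Headers that pass the OpenAI key through to Weaviate's OpenAI integrations."""
    return {"X-OpenAI-Api-Key": openai_key}


def connect_embedded(openai_key):
    """Start (or attach to) an Embedded Weaviate instance and return a client.

    Assumes the v4 client; the import is done lazily so that build_headers
    can be used even where weaviate-client is not installed.
    """
    import weaviate

    return weaviate.connect_to_embedded(headers=build_headers(openai_key))


# Example usage (downloads and starts the Weaviate binary on first run):
#
#   client = connect_embedded(openai_key)
#   try:
#       print(client.is_ready())
#   finally:
#       client.close()
```

Closing the client when you are done also shuts down the embedded instance cleanly.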
