Weaviate workshop

Goals:

What you will see:
  • Create a vector database with Weaviate,
  • Add data to the database, and
  • Interact with the data, including searching and using LLMs with your data in Weaviate.

What you will learn today:
  • What Weaviate is,
  • How it stores data (based on its "meaning"), and
  • What you can do with Weaviate, such as semantic search and using LLMs to transform data.
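To make "storing data based on its meaning" concrete, here is a minimal sketch of the idea behind semantic search: each item is represented as a vector, and a query returns the items whose vectors point in the most similar direction. The three-dimensional vectors, the `toy_vectors` names, and the `nearest` helper are made up for illustration. In practice the vectors come from an embedding model, and Weaviate uses an approximate nearest-neighbor index rather than the brute-force loop shown here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings"; real models produce far more dimensions
toy_vectors = {
    "lion": [0.9, 0.8, 0.1],
    "tiger": [0.85, 0.75, 0.2],
    "spreadsheet": [0.1, 0.2, 0.95],
}

def nearest(query_vector, vectors, limit=2):
    """Brute-force search: score every stored vector, return the closest `limit`."""
    scored = [(name, cosine_similarity(query_vector, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]

query = [0.88, 0.82, 0.12]  # made-up vector standing in for the query "big cat"
results = nearest(query, toy_vectors)
print(results)
```

The two animal entries rank above "spreadsheet" because their vectors lie close to the query vector, even though no keywords are compared.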

Preparation

Install the Weaviate Python client if your environment doesn't already have it.

!pip install -U weaviate-client

Get the data

We'll use a subset of the Jeopardy! quiz dataset:

https://www.kaggle.com/datasets/tunguz/200000-jeopardy-questions

Pre-processed version:

https://raw.githubusercontent.com/databyjp/wv_demo_uploader/main/weaviate_datasets/data/jeopardy_1k.json

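Load the data (downloading it if there is no local copy) and preview it

import json
import os

import requests

DATA_URL = "https://raw.githubusercontent.com/databyjp/wv_demo_uploader/main/weaviate_datasets/data/jeopardy_1k.json"
DATA_FILE = "jeopardy_1k.json"

def load_data():
    # Read the raw JSON text from the local copy of the dataset
    with open(DATA_FILE, "r", encoding="utf-8") as f:
        return f.read()

def download_data():
    # Fetch the raw JSON text from GitHub; raise on HTTP errors instead of parsing an error page
    response = requests.get(DATA_URL, timeout=30)
    response.raise_for_status()
    return response.text

# Use the local file if it exists, otherwise download; then parse the JSON and preview it
json_data = load_data() if os.path.exists(DATA_FILE) else download_data()
data = json.loads(json_data)
print(type(data), len(data))
print(json.dumps(data[0], indent=2))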

Step 1: Create a Weaviate instance (database)

We'll use Embedded Weaviate, which is a quick way to create a Weaviate database.

You can also use:

  • A free sandbox with Weaviate Cloud Services
  • Open-source Weaviate directly, available cross-platform with Docker
  • Or use Kubernetes in production :)

# OpenAI API key: read it from the OPENAI_API_KEY environment variable
# rather than hardcoding a secret in the notebook
import os
openai_key = os.environ.get("OPENAI_API_KEY", "")