🐕 Choose my dog breed
Build an AI chatbot that finds your perfect pup
📖 Background
You’re working as part of an innovation team at a smart robotics company launching a new AI-powered robot dog. To attract customers, they want a fun, interactive chatbot that helps users discover their ideal real-world dog breed match based on their personality and lifestyle.
Your challenge: build a chatbot recommender system that asks questions, interprets responses, and recommends the top three matching dog breeds, complete with images. Bonus: include generated videos for social media flair!
💾 The data
The data consists of three datasets: data/breed_traits.csv, data/trait_description.csv, and a GitHub repository containing 30+ images per dog breed.
Dog breed traits
| Variable | Class | Description |
|---|---|---|
| Breed | character | Dog breed name |
| Affectionate With Family | character | Placement on scale of 1–5 for the breed’s tendency to be affectionate with family |
| Good With Young Children | character | Placement on scale of 1–5 for the breed’s tendency to be good with young children |
| Good With Other Dogs | character | Placement on scale of 1–5 for the breed’s tendency to be good with other dogs |
| Shedding Level | character | Placement on scale of 1–5 for the breed’s shedding level |
| Coat Grooming Frequency | character | Placement on scale of 1–5 for the breed’s grooming frequency |
| Drooling Level | character | Placement on scale of 1–5 for the breed’s drooling level |
| Coat Type | character | Description of the breed’s coat type |
| Coat Length | character | Description of the breed’s coat length |
| Openness To Strangers | character | Placement on scale of 1–5 for the breed’s openness to strangers |
| Playfulness Level | character | Placement on scale of 1–5 for the breed’s playfulness |
| Watchdog/Protective Nature | character | Placement on scale of 1–5 for the breed’s protective instincts |
| Adaptability Level | character | Placement on scale of 1–5 for the breed’s adaptability |
| Trainability Level | character | Placement on scale of 1–5 for the breed’s trainability |
| Energy Level | character | Placement on scale of 1–5 for the breed’s energy level |
| Barking Level | character | Placement on scale of 1–5 for the breed’s barking level |
| Mental Stimulation Needs | character | Placement on scale of 1–5 for the breed’s mental stimulation needs |
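The table lists the 1–5 ratings as `character`, while the notebook later converts them with `astype(int)`. A minimal sketch of a safer conversion that also checks the 1–5 range, so bad rows surface before matching; the helper name `coerce_scores` and the toy rows are hypothetical:

```python
from typing import List

import pandas as pd


def coerce_scores(df: pd.DataFrame, score_cols: List[str]) -> pd.DataFrame:
    """Return a copy of df with score_cols as integers on the 1-5 scale.

    Unparsable or out-of-range values raise a ValueError instead of
    silently flowing into the similarity computation.
    """
    out = df.copy()
    for col in score_cols:
        numeric = pd.to_numeric(out[col], errors="coerce")
        bad = numeric.isna() | ~numeric.between(1, 5)
        if bad.any():
            raise ValueError(f"{col}: {int(bad.sum())} value(s) outside the 1-5 scale")
        out[col] = numeric.astype(int)
    return out


# Toy rows for illustration only; the real data is data/breed_traits.csv
toy = pd.DataFrame({"Breed": ["A", "B"], "Energy Level": ["3", "5"], "Shedding Level": [1, 4]})
clean = coerce_scores(toy, ["Energy Level", "Shedding Level"])
print(clean)
```

Raising rather than dropping rows is a design choice: it forces a decision about any breed whose ratings are missing.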
Trait descriptions
| Variable | Class | Description |
|---|---|---|
| Trait | character | Dog trait name |
| Trait_1 | character | Description when scale = 1 |
| Trait_5 | character | Description when scale = 5 |
| Description | character | Long-form explanation of the trait |
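Because `Trait_1` and `Trait_5` only describe the two ends of each scale, the chatbot can phrase a recommendation by borrowing the nearer anchor. A minimal sketch, assuming a hypothetical helper `describe_score` and made-up anchor texts (check the real wording in data/trait_description.csv):

```python
import pandas as pd


def describe_score(trait_df: pd.DataFrame, trait: str, score: int) -> str:
    """Explain a 1-5 score using the low-end (Trait_1) and high-end (Trait_5) texts.

    Scores 1-2 borrow the low-end text, 4-5 the high-end text, and 3 is
    reported as sitting between the two anchors.
    """
    row = trait_df.loc[trait_df["Trait"] == trait]
    if row.empty:
        raise KeyError(f"Unknown trait: {trait}")
    low, high = row.iloc[0]["Trait_1"], row.iloc[0]["Trait_5"]
    if score <= 2:
        return f"{trait} {score}/5: closer to '{low}'"
    if score >= 4:
        return f"{trait} {score}/5: closer to '{high}'"
    return f"{trait} {score}/5: between '{low}' and '{high}'"


# Hypothetical anchor texts, for illustration only
toy = pd.DataFrame({
    "Trait": ["Energy Level"],
    "Trait_1": ["Low energy"],
    "Trait_5": ["High energy"],
    "Description": ["Toy description"],
})
print(describe_score(toy, "Energy Level", 5))
```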
Dog breed images
| Detail | Description |
|---|---|
| Coverage | Breeds recognized by the Fédération Cynologique Internationale (FCI) |
| Images per breed | 35 |
| Total breeds | 356 |
| Format | High-resolution JPEG images |
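Before relying on the numbers in this table (35 images, 356 breeds), it can help to count what the clone actually contains. A minimal sketch, assuming one folder per breed directly under the repository root (the same assumption the notebook's `os.listdir` call makes); `count_images_per_breed` is a hypothetical helper:

```python
import os
import tempfile
from collections import Counter
from typing import Dict

IMAGE_EXTS = (".jpg", ".jpeg")


def count_images_per_breed(root: str) -> Dict[str, int]:
    """Map each breed sub-folder of root to its number of JPEG files.

    Hidden folders such as .git are skipped so they are not mistaken for breeds.
    """
    counts = {}
    for entry in sorted(os.listdir(root)):
        folder = os.path.join(root, entry)
        if os.path.isdir(folder) and not entry.startswith("."):
            counts[entry] = sum(
                1 for f in os.listdir(folder) if f.lower().endswith(IMAGE_EXTS)
            )
    return counts


# Self-contained demo on a throwaway folder tree (hypothetical breed names)
with tempfile.TemporaryDirectory() as demo_root:
    for breed, n in [("beagle", 3), ("poodle", 2)]:
        os.makedirs(os.path.join(demo_root, breed))
        for i in range(n):
            open(os.path.join(demo_root, breed, f"{i}.jpg"), "w").close()
    os.makedirs(os.path.join(demo_root, ".git"))
    demo_counts = count_images_per_breed(demo_root)

print(demo_counts)
print(Counter(demo_counts.values()).most_common())

# On the real clone: count_images_per_breed("Dog-Breeds-Dataset")
```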
import pandas as pd
dog_breeds = pd.read_csv('data/breed_traits.csv')
dog_breeds.head()
trait_descriptions = pd.read_csv('data/trait_description.csv')
trait_descriptions.head()
💪 Competition challenge
- Build an interactive chatbot that converses with users in natural language to extract their preferences and lifestyle (e.g., energy level, space, family status, allergies).
- Match user preferences to breed traits and recommend the top 3 dog breeds that best fit.
- Showcase breed results with images (from the GitHub dataset) and optionally, a short generated video or post-style description for social media.
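One way to turn extracted preferences into a top-3 list is a weighted distance between the user's 1–5 targets and each breed's ratings. This is a minimal sketch, not the notebook's method: the notebook imports `cosine_similarity`, but cosine similarity is scale-invariant (a breed rated 1 on every trait and one rated 5 on every trait have cosine similarity 1), so a mean absolute difference is used here instead. The function name `top_matches`, the weights, and the toy breeds are hypothetical; doubling the weight on shedding illustrates how an allergy could count more.

```python
from typing import Dict, Optional

import pandas as pd


def top_matches(
    breeds: pd.DataFrame,
    prefs: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
    k: int = 3,
) -> pd.DataFrame:
    """Return the k breeds whose 1-5 trait ratings are closest to prefs.

    match_score = 1 - (weighted mean absolute difference) / 4, so 1.0 means
    every requested trait matches exactly and 0.0 is the largest possible
    gap on a 1-5 scale. Only traits present in prefs are compared.
    """
    traits = list(prefs)
    weights = weights or {}
    w = pd.Series({t: float(weights.get(t, 1.0)) for t in traits})
    target = pd.Series(prefs, dtype=float)
    diffs = (breeds[traits].astype(float) - target).abs()  # aligns on column names
    weighted_mad = (diffs * w).sum(axis=1) / w.sum()
    scored = breeds.assign(match_score=1 - weighted_mad / 4)
    return scored.nlargest(k, "match_score")[["Breed", "match_score"]]


# Toy data for illustration only
breeds = pd.DataFrame({
    "Breed": ["Breed A", "Breed B", "Breed C", "Breed D"],
    "Energy Level": [5, 2, 3, 1],
    "Shedding Level": [4, 1, 2, 5],
    "Good With Young Children": [5, 4, 3, 2],
})
prefs = {"Energy Level": 2, "Shedding Level": 1, "Good With Young Children": 5}
result = top_matches(breeds, prefs, weights={"Shedding Level": 2.0})
print(result)
```

Comparing only the traits the user actually discussed keeps unmentioned traits from pulling every breed toward the same score.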
Present your findings and logic clearly in a reproducible notebook or hosted app.
🧑⚖️ Judging criteria
| Category | Description | Weight |
|---|---|---|
| Creativity & Functionality | Does your chatbot feel natural, engaging, and provide relevant recommendations? | 40% |
| Data Use & Insight | Are dog breed traits used effectively? Are results backed by clear logic or analysis? | 35% |
| Presentation & Storytelling | Is your notebook well-structured, easy to follow, and visually engaging? | 25% |
✅ Checklist before publishing into the competition
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- Remove redundant cells like the judging criteria, so the workbook is focused on your story.
- Make sure the workbook reads well and explains how you found your insights.
- Try to include an executive summary of your recommendations at the beginning.
- Check that all the cells run without error.
⌛️ Time is ticking. Good luck!
%%capture
!pip install --upgrade "transformers>=4.43.0" torch diffusers invisible_watermark accelerate safetensors scikit-learn
!pip install pandas numpy ipython huggingface_hub openai
# 🐕 Choose my dog breed
# **Build an AI chatbot that finds your perfect pup**
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from IPython.display import Image, display
import os
import random
from huggingface_hub import login, InferenceClient
import torch
import getpass
from openai import OpenAI
# Step 1: Clone the Dog Breeds Dataset from GitHub (only if not already cloned)
if not os.path.exists('Dog-Breeds-Dataset'):
    !git clone https://github.com/maartenvandenbroeck/Dog-Breeds-Dataset.git
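Once the repository is cloned, the recommendations can show a few photos per breed. A minimal sketch of picking random images from one breed folder; `sample_images` is a hypothetical helper, and the folder path in the commented display loop is a placeholder, since the repository's exact folder names are not shown here:

```python
import os
import random
import tempfile
from typing import List, Optional


def sample_images(breed_dir: str, k: int = 3, seed: Optional[int] = None) -> List[str]:
    """Return up to k random image paths from one breed folder.

    Files are sorted before sampling so a fixed seed gives the same picks
    regardless of the order os.listdir returns them in.
    """
    files = sorted(
        f for f in os.listdir(breed_dir)
        if f.lower().endswith((".jpg", ".jpeg", ".png"))
    )
    rng = random.Random(seed)
    chosen = rng.sample(files, min(k, len(files)))
    return [os.path.join(breed_dir, f) for f in chosen]


# Demo on a throwaway folder: five fake image files and one non-image
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["a.jpg", "b.jpg", "c.JPG", "d.jpeg", "e.png", "notes.txt"]:
        open(os.path.join(demo_dir, name), "w").close()
    picked_names = [os.path.basename(p) for p in sample_images(demo_dir, k=3, seed=0)]
    all_names = [os.path.basename(p) for p in sample_images(demo_dir, k=10)]

print(picked_names)

# In the notebook (folder name is a placeholder):
# from IPython.display import Image, display
# for path in sample_images("Dog-Breeds-Dataset/<breed-folder>", k=3):
#     display(Image(filename=path, width=300))
```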
import re # For cleaning breed names
dog_breeds = pd.read_csv('data/breed_traits.csv')
trait_descriptions = pd.read_csv('data/trait_description.csv')
# Clean breed names for matching with image folders
dog_breeds['Breed_clean'] = dog_breeds['Breed'].str.replace(r'\s*\(.*\)', '', regex=True).str.strip().str.lower().str.replace(' ', '-')
# List available breed folders from the cloned repo
image_repo_path = 'Dog-Breeds-Dataset/'
available_breeds = [folder.lower() for folder in os.listdir(image_repo_path) if os.path.isdir(os.path.join(image_repo_path, folder))]
# Filter dog_breeds to only include breeds with available images
# dog_breeds = dog_breeds[dog_breeds['Breed_clean'].isin(available_breeds)] # Temporarily commented out to avoid empty dataset
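The commented-out filter suggests that exact string matching between `Breed_clean` and the folder names finds no overlap. Note that the regex `\s*\(.*\)` above deletes any parenthesised qualifier, so a name like "Retrievers (Labrador)" would become "retrievers". A minimal sketch of a fuzzier mapping using the standard library's `difflib.get_close_matches`; it assumes, without having checked the real files, that some CSV names follow a "Group (Qualifier)" pattern and that folder names use hyphens or underscores. The names `normalize` and `match_folders` and all sample names are hypothetical:

```python
import difflib
import re
from typing import Dict, List, Optional


def normalize(name: str) -> str:
    """Lower-case, hyphenated form; 'Group (Qualifier)' becomes 'qualifier-group'.

    In Python 3 str patterns, \\s also matches non-breaking spaces.
    """
    name = name.strip()
    m = re.match(r"^(.*?)\s*\((.*)\)\s*$", name)
    if m:
        name = f"{m.group(2)} {m.group(1)}"
    return re.sub(r"[\s_]+", "-", name.lower())


def match_folders(breeds: List[str], folders: List[str], cutoff: float = 0.6) -> Dict[str, Optional[str]]:
    """Map each breed name to its most similar folder name, or None if nothing is close."""
    norm_to_folder = {normalize(f): f for f in folders}
    mapping: Dict[str, Optional[str]] = {}
    for breed in breeds:
        hits = difflib.get_close_matches(normalize(breed), list(norm_to_folder), n=1, cutoff=cutoff)
        mapping[breed] = norm_to_folder[hits[0]] if hits else None
    return mapping


# Hypothetical names, for illustration only
folders = ["labrador_retriever", "beagle", "german_shepherd"]
breeds = ["Retrievers (Labrador)", "Beagles", "Xoloitzcuintli"]
mapping = match_folders(breeds, folders)
print(mapping)
```

Unmatched breeds come back as `None`, so they can be reviewed by hand rather than silently dropped.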
# Numerical traits for similarity computation
numerical_traits = [
'Affectionate With Family', 'Good With Young Children', 'Good With Other Dogs',
'Shedding Level', 'Coat Grooming Frequency', 'Drooling Level',
'Openness To Strangers', 'Playfulness Level', 'Watchdog/Protective Nature',
'Adaptability Level', 'Trainability Level', 'Energy Level',
'Barking Level', 'Mental Stimulation Needs'
]
# Categorical traits
categorical_traits = ['Coat Type', 'Coat Length']
# Convert numerical traits to int
for trait in numerical_traits:
    dog_breeds[trait] = dog_breeds[trait].astype(int)
from huggingface_hub import login
import getpass
hf_token = getpass.getpass("Enter your Hugging Face token: ")
login(hf_token)
# Set up OpenAI client for Hugging Face router
os.environ["HF_TOKEN"] = hf_token
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=hf_token,
)
model_id = "meta-llama/Meta-Llama-3-8B-Instruct:novita"
image_client = InferenceClient(
    provider="nebius",
    api_key=hf_token,
)
# System prompt for the chatbot
system_prompt = """
You are a friendly dog breed recommendation assistant. Your goal is to help users find their perfect dog breed by asking natural, engaging questions about their lifestyle, preferences, and needs.
Key traits to gather information on (indirectly through conversation):
- Family situation (e.g., kids, other pets) -> Maps to 'Good With Young Children', 'Good With Other Dogs', 'Affectionate With Family'
- Living space (e.g., apartment vs. house) -> Maps to 'Adaptability Level', 'Energy Level'
- Activity level (e.g., active, sedentary) -> Maps to 'Energy Level', 'Playfulness Level', 'Mental Stimulation Needs'
- Allergies or grooming preferences -> Maps to 'Shedding Level', 'Coat Grooming Frequency', 'Drooling Level'
- Personality preferences (e.g., protective, friendly to strangers) -> Maps to 'Watchdog/Protective Nature', 'Openness To Strangers', 'Barking Level'
- Training experience -> Maps to 'Trainability Level'
Ask 1-2 questions at a time to keep the conversation natural. After 4-6 exchanges, summarize the user's preferences and then output them as valid JSON in exactly this format (every score an integer from 1 to 5, no comments inside the JSON):
{
  "preferences": {
    "Affectionate With Family": <1-5>,
    "Good With Young Children": <1-5>,
    "Good With Other Dogs": <1-5>,
    "Shedding Level": <1-5>,
    "Coat Grooming Frequency": <1-5>,
    "Drooling Level": <1-5>,
    "Openness To Strangers": <1-5>,
    "Playfulness Level": <1-5>,
    "Watchdog/Protective Nature": <1-5>,
    "Adaptability Level": <1-5>,
    "Trainability Level": <1-5>,
    "Energy Level": <1-5>,
    "Barking Level": <1-5>,
    "Mental Stimulation Needs": <1-5>
  },
  "coat_type": "<preferred coat type, e.g., Smooth, Double, or any>",
  "coat_length": "<preferred coat length, e.g., Short, Medium, Long, or any>"
}
Scoring hints: use a lower "Shedding Level" if the user has allergies, a lower "Coat Grooming Frequency" for low-maintenance owners, a lower "Barking Level" if a quiet dog is preferred, and usually a low "Drooling Level".
Do not output JSON until you have enough information. If the user gives incomplete information, ask follow-up questions. End the conversation only when you are ready to recommend.
"""
# Function to generate response from LLM
def generate_response(conversation_history):
    messages = [{"role": "system", "content": system_prompt}] + conversation_history
    completion = client.chat.completions.create(
        model=model_id,
        messages=messages,
        max_tokens=512,  # room for a summary plus the full 14-trait JSON; 200 risks cutting it off
        temperature=0.7,
        top_p=0.95,
    )
    return completion.choices[0].message.content
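The system prompt asks the model to keep chatting until it can emit the preference JSON, so the notebook needs a loop that alternates turns and stops on the first parsable reply. A minimal sketch with the model call and the parser passed in as functions, so it can be exercised offline with stubs; `run_chat` and the stub replies are hypothetical, and in the notebook one would pass `generate_response` and `extract_preferences`:

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

Prefs = Tuple[Optional[dict], Optional[str], Optional[str]]


def run_chat(
    respond: Callable[[List[Dict[str, str]]], str],
    extract: Callable[[str], Prefs],
    get_user_input: Callable[[str], str] = input,
    max_turns: int = 8,
) -> Optional[Prefs]:
    """Alternate user and assistant turns until a reply contains preference JSON.

    Returns the (preferences, coat_type, coat_length) tuple from extract,
    or None if no parsable JSON appears within max_turns.
    """
    history: List[Dict[str, str]] = []
    for _ in range(max_turns):
        history.append({"role": "user", "content": get_user_input("You: ")})
        reply = respond(history)
        history.append({"role": "assistant", "content": reply})
        prefs = extract(reply)
        if prefs[0] is not None:
            return prefs
        print("Bot:", reply)
    return None


# Offline demo: scripted user turns and a stub model stand in for input() and the LLM
scripted = iter(["We live in a flat with two kids", "We jog every morning"])
stub_replies = iter([
    "Sounds lively! How active are you?",
    'Great, here you go: {"preferences": {"Energy Level": 4}, "coat_type": "any", "coat_length": "any"}',
])


def stub_respond(history):
    return next(stub_replies)


def stub_extract(reply):
    # Minimal stand-in for extract_preferences(): parse from the first "{" onward
    if "{" not in reply:
        return None, None, None
    data = json.loads(reply[reply.find("{"):])
    return data["preferences"], data["coat_type"], data["coat_length"]


result = run_chat(stub_respond, stub_extract, get_user_input=lambda prompt: next(scripted))
print(result)

# In the notebook: result = run_chat(generate_response, extract_preferences)
```

Passing `respond` and `get_user_input` as parameters keeps the loop testable without network access or a live input box.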
# Function to extract JSON preferences from LLM output
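import json
def extract_preferences(response):
    """Return (preferences, coat_type, coat_length), or (None, None, None) if no valid JSON is found."""
    try:
        json_start = response.find('{')
        json_end = response.rfind('}') + 1
        prefs_json = response[json_start:json_end]
        prefs = json.loads(prefs_json)
        return prefs['preferences'], prefs.get('coat_type', 'any'), prefs.get('coat_length', 'any')
    except (AttributeError, KeyError, TypeError, ValueError):
        # No JSON yet, or malformed JSON: keep the conversation going
        return None, None, None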
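`extract_preferences` returns whatever values the model wrote, which may be strings, out-of-range numbers, or missing traits. A minimal sketch of a cleanup step before matching; the helper name `sanitize_preferences` is hypothetical, and dropping unparsable or missing traits (rather than filling them with a midpoint) is one design choice among several:

```python
from typing import Dict, List


def sanitize_preferences(prefs: dict, traits: List[str]) -> Dict[str, int]:
    """Keep only known traits, coerce values to int, and clamp them to 1..5.

    Traits that are missing or cannot be parsed are dropped, so a trait the
    user never discussed does not pull the ranking toward an arbitrary value.
    """
    clean: Dict[str, int] = {}
    for trait in traits:
        if trait not in prefs:
            continue
        try:
            value = int(round(float(prefs[trait])))
        except (TypeError, ValueError, OverflowError):
            continue
        clean[trait] = min(5, max(1, value))
    return clean


# Example of messy model output (hypothetical values)
raw = {"Energy Level": "4", "Shedding Level": 9, "Barking Level": "quiet", "Extra": 2}
cleaned = sanitize_preferences(
    raw, ["Energy Level", "Shedding Level", "Barking Level", "Drooling Level"]
)
print(cleaned)
```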