Competition - 🐕 Choose my dog breed

🐕 Choose my dog breed

Build an AI chatbot that finds your perfect pup

📖 Background

You’re working as part of an innovation team at a smart robotics company that is launching a new AI-powered robot dog. To attract customers, the company wants a fun, interactive chatbot that helps users discover the real-world dog breed that best matches their personality and lifestyle.

Your challenge: build a chatbot recommender system that asks questions, interprets responses, and recommends the top three matching dog breeds, complete with images. Bonus: include generated videos for social media flair!

💾 The data

The data consists of three datasets: data/breed_traits.csv, data/trait_description.csv, and a GitHub repository containing 30+ images per dog breed.

Dog breed traits

Variable	Class	Description
Breed	character	Dog breed name
Affectionate With Family	character	Placement on a scale of 1–5 for the breed’s tendency to be affectionate with family
Good With Young Children	character	Placement on a scale of 1–5 for the breed’s tendency to be good with young children
Good With Other Dogs	character	Placement on a scale of 1–5 for the breed’s tendency to be good with other dogs
Shedding Level	character	Placement on a scale of 1–5 for the breed’s shedding level
Coat Grooming Frequency	character	Placement on a scale of 1–5 for the breed’s grooming frequency
Drooling Level	character	Placement on a scale of 1–5 for the breed’s drooling level
Coat Type	character	Description of the breed’s coat type
Coat Length	character	Description of the breed’s coat length
Openness To Strangers	character	Placement on a scale of 1–5 for the breed’s openness to strangers
Playfulness Level	character	Placement on a scale of 1–5 for the breed’s playfulness
Watchdog/Protective Nature	character	Placement on a scale of 1–5 for the breed’s protective instincts
Adaptability Level	character	Placement on a scale of 1–5 for the breed’s adaptability
Trainability Level	character	Placement on a scale of 1–5 for the breed’s trainability
Energy Level	character	Placement on a scale of 1–5 for the breed’s energy level
Barking Level	character	Placement on a scale of 1–5 for the breed’s barking level
Mental Stimulation Needs	character	Placement on a scale of 1–5 for the breed’s mental stimulation needs

Trait descriptions

Variable	Class	Description
Trait	character	Dog trait name
Trait_1	character	Description when scale = 1
Trait_5	character	Description when scale = 5
Description	character	Long-form explanation of the trait
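Because Trait_1 and Trait_5 only describe the two ends of each scale, intermediate scores need some interpretation before they can be shown to users. A minimal sketch, assuming trait_descriptions has the Trait, Trait_1 and Trait_5 columns listed above; the helper name `describe_score`, the "leans toward" wording for scores 2 and 4, and the demo rows are illustrative, not part of the dataset.

```python
import pandas as pd

def describe_score(trait_descriptions: pd.DataFrame, trait: str, score: int) -> str:
    """Turn a 1-5 trait score into text using the Trait_1 / Trait_5 endpoint descriptions."""
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    rows = trait_descriptions.loc[trait_descriptions["Trait"] == trait]
    if rows.empty:
        raise KeyError(f"unknown trait: {trait}")
    low, high = rows.iloc[0]["Trait_1"], rows.iloc[0]["Trait_5"]
    if score == 1:
        return low
    if score == 5:
        return high
    if score == 3:
        return f"between '{low}' and '{high}'"
    # Scores 2 and 4 lean toward whichever end of the scale is closer
    return f"leans toward '{low if score < 3 else high}'"

# Hypothetical stand-in rows; real values come from data/trait_description.csv
demo = pd.DataFrame({
    "Trait": ["Energy Level"],
    "Trait_1": ["very low energy"],
    "Trait_5": ["very high energy"],
    "Description": ["placeholder description"],
})
print(describe_score(demo, "Energy Level", 4))  # leans toward 'very high energy'
```

Phrases like these can make a recommendation explainable, e.g. "you asked for 4/5 energy, which leans toward the high-energy end".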

Dog breed images

Detail	Description
Coverage	Breeds recognized by the Fédération Cynologique Internationale (FCI)
Images per breed	35
Total breeds	356
Format	High-resolution JPEG images

import pandas as pd
dog_breeds = pd.read_csv('data/breed_traits.csv')
dog_breeds.head()

trait_descriptions = pd.read_csv('data/trait_description.csv')
trait_descriptions.head()

💪 Competition challenge

Build an interactive chatbot that converses with users in natural language to extract their preferences and lifestyle details (e.g., energy level, living space, family status, allergies).
Match user preferences to breed traits and recommend the top 3 dog breeds that best fit.
Showcase breed results with images (from the GitHub dataset) and, optionally, a short generated video or a post-style description for social media.
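The matching step can be expressed as a distance between the user's preferred 1–5 scores and each breed's scores. A minimal sketch, assuming preferences arrive as a dict of trait name to integer score and that the coat columns are named as in the table above; `top_breeds`, the equal default weights, and the rule of skipping a coat filter that would leave no candidates are illustrative design choices. It deliberately uses a weighted mean absolute gap rather than cosine similarity, because cosine similarity ignores magnitude: a breed scoring 1 on every trait and one scoring 5 on every trait point in the same direction and would look identical.

```python
import pandas as pd

def top_breeds(breeds, preferences, coat_type="any", coat_length="any", k=3, weights=None):
    """Rank breeds by how closely their 1-5 trait scores match the user's preferred scores."""
    # Optional categorical filters; "any" (or None) disables a filter
    candidates = breeds
    for column, wanted in (("Coat Type", coat_type), ("Coat Length", coat_length)):
        if wanted and str(wanted).strip().lower() != "any" and column in breeds.columns:
            keep = candidates[column].astype(str).str.strip().str.lower() == str(wanted).strip().lower()
            # Skip a filter that would leave no candidates instead of recommending nothing
            if keep.any():
                candidates = candidates[keep]

    traits = [t for t in preferences if t in candidates.columns]
    if not traits:
        raise ValueError("no preference keys match the breed trait columns")
    weights = weights or {}
    w = pd.Series({t: float(weights.get(t, 1.0)) for t in traits})
    target = pd.Series({t: float(preferences[t]) for t in traits})

    # Weighted mean absolute gap per breed: 0 is a perfect match, 4 the worst on a 1-5 scale
    gap = ((candidates[traits].astype(float) - target).abs() * w).sum(axis=1) / w.sum()
    return candidates.assign(match_score=1 - gap / 4).nlargest(k, "match_score")

# Hypothetical toy breeds, not taken from breed_traits.csv
breeds = pd.DataFrame({
    "Breed": ["A", "B", "C", "D"],
    "Energy Level": [5, 2, 3, 1],
    "Shedding Level": [4, 1, 2, 5],
    "Coat Type": ["Smooth", "Double", "Smooth", "Smooth"],
})
prefs = {"Energy Level": 2, "Shedding Level": 1}
print(top_breeds(breeds, prefs)[["Breed", "match_score"]])
```

Per-trait `weights` (e.g. doubling "Shedding Level" for an allergic user) are one way to let hard constraints dominate softer preferences.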

Present your findings and logic clearly in a reproducible notebook or hosted app.

🧑‍⚖️ Judging criteria

Category	Description	Weight
Creativity & Functionality	Does your chatbot feel natural, engaging, and provide relevant recommendations?	40%
Data Use & Insight	Are dog breed traits used effectively? Are results backed by clear logic or analysis?	35%
Presentation & Storytelling	Is your notebook well-structured, easy to follow, and visually engaging?	25%

✅ Checklist before publishing into the competition

Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
Remove redundant cells like the judging criteria, so the workbook is focused on your story.
Make sure the workbook reads well and explains how you found your insights.
Try to include an executive summary of your recommendations at the beginning.
Check that all the cells run without error.

⌛️ Time is ticking. Good luck!

print("hello world")

%%capture
!pip install --upgrade "transformers>=4.43.0" torch diffusers invisible_watermark accelerate safetensors scikit-learn
!pip install pandas numpy ipython huggingface_hub openai

# 🐕 Choose my dog breed
# **Build an AI chatbot that finds your perfect pup**

# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from IPython.display import Image, display
import os
import random
from huggingface_hub import login, InferenceClient
import torch
import getpass
from openai import OpenAI


# Step 1: Clone the Dog Breeds Dataset from GitHub (only if not already cloned)
if not os.path.exists('Dog-Breeds-Dataset'):
    !git clone https://github.com/maartenvandenbroeck/Dog-Breeds-Dataset.git
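Once the repository is cloned, breed folders can be sampled for display. A minimal sketch, assuming each breed folder holds its image files directly (the repository layout is not documented here); `sample_breed_images` is a hypothetical helper, and the optional `seed` makes the sample reproducible.

```python
import os
import random

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")

def sample_breed_images(repo_path, folder, n=3, seed=None):
    """Return up to n random image paths from one breed folder (assumes images sit directly inside it)."""
    breed_dir = os.path.join(repo_path, folder)
    if not os.path.isdir(breed_dir):
        return []
    # Sort first so a fixed seed always yields the same sample
    files = sorted(f for f in os.listdir(breed_dir) if f.lower().endswith(IMAGE_EXTENSIONS))
    rng = random.Random(seed)
    return [os.path.join(breed_dir, f) for f in rng.sample(files, min(n, len(files)))]
```

In the notebook, each returned path could be shown with IPython's `display(Image(filename=path, width=300))`; the folder name passed in must be whatever the repository actually uses.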

import pandas as pd
import numpy as np
import re  # For cleaning breed names

dog_breeds = pd.read_csv('data/breed_traits.csv')
trait_descriptions = pd.read_csv('data/trait_description.csv')

# Clean breed names for matching with image folders
dog_breeds['Breed_clean'] = (
    dog_breeds['Breed']
    .str.replace(r'\s*\(.*\)', '', regex=True)  # drop parenthetical qualifiers
    .str.strip()
    .str.lower()
    .str.replace(' ', '-')
)

# List available breed folders from the cloned repo
image_repo_path = 'Dog-Breeds-Dataset/'
available_breeds = [folder.lower() for folder in os.listdir(image_repo_path) if os.path.isdir(os.path.join(image_repo_path, folder))]

# Filter dog_breeds to only include breeds with available images
# dog_breeds = dog_breeds[dog_breeds['Breed_clean'].isin(available_breeds)]  # Temporarily commented out to avoid empty dataset
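The commented-out filter suggests that exact slug matching left no rows. One possible remedy, assuming the breed names and folder names differ only in small ways such as pluralization or word order, is fuzzy matching with Python's standard `difflib`. The sketch also moves a parenthetical qualifier to the front instead of dropping it, since the regex above would collapse any two names that share a base word (e.g. a hypothetical "Retrievers (Labrador)" and "Retrievers (Golden)") to the same slug. `slugify`, `match_breed_folders`, and the 0.75 cutoff are illustrative assumptions.

```python
import difflib
import re

def slugify(name):
    """Lowercase a breed name and hyphenate it, moving a parenthetical qualifier to the front."""
    # e.g. "Retrievers (Labrador)" -> "Labrador Retrievers" instead of dropping "(Labrador)"
    m = re.match(r"^(.*?)\s*\((.*)\)\s*$", name)
    if m:
        name = f"{m.group(2)} {m.group(1)}"
    # \s+ also collapses runs of whitespace, including non-breaking spaces
    return re.sub(r"\s+", "-", name.strip().lower())

def match_breed_folders(breed_names, folders, cutoff=0.75):
    """Map each breed name to its most similar folder name, or None if no folder is similar enough."""
    lookup = {folder.lower(): folder for folder in folders}
    candidates = list(lookup)
    matches = {}
    for breed in breed_names:
        best = difflib.get_close_matches(slugify(breed), candidates, n=1, cutoff=cutoff)
        matches[breed] = lookup[best[0]] if best else None
    return matches
```

With real data, `dog_breeds['Breed'].map(match_breed_folders(dog_breeds['Breed'], available_breeds))` would attach a folder (or None) to each breed; spot-check the pairs before filtering, since fuzzy matching can occasionally pair unrelated names.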

# Numerical traits for similarity computation
numerical_traits = [
    'Affectionate With Family', 'Good With Young Children', 'Good With Other Dogs',
    'Shedding Level', 'Coat Grooming Frequency', 'Drooling Level',
    'Openness To Strangers', 'Playfulness Level', 'Watchdog/Protective Nature',
    'Adaptability Level', 'Trainability Level', 'Energy Level',
    'Barking Level', 'Mental Stimulation Needs'
]

# Categorical traits
categorical_traits = ['Coat Type', 'Coat Length']

# Convert numerical traits to int
for trait in numerical_traits:
    dog_breeds[trait] = dog_breeds[trait].astype(int)
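`astype(int)` raises as soon as one cell is blank or non-numeric. A more forgiving variant, sketched with `pandas.to_numeric(errors="coerce")`, converts what it can to pandas' nullable `Int64` type and returns the rows that were missing or outside 1–5 so they can be inspected rather than crashing the notebook; `coerce_scores` is a hypothetical helper.

```python
import pandas as pd

def coerce_scores(df, columns):
    """Convert 1-5 trait columns to nullable Int64; return (converted frame, rows with missing/out-of-range scores)."""
    out = df.copy()
    bad = pd.Series(False, index=df.index)
    for col in columns:
        values = pd.to_numeric(out[col], errors="coerce")  # unparseable text becomes NaN
        # NaN fails between(), so missing values are flagged too
        bad |= values.isna() | ~values.between(1, 5)
        out[col] = values.round().astype("Int64")
    return out, df[bad]
```

Usage in this notebook would be `dog_breeds, bad_rows = coerce_scores(dog_breeds, numerical_traits)`, followed by a look at `bad_rows` before deciding whether to drop or impute them.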

from huggingface_hub import login
import getpass

# Prompt for the token at runtime; never hard-code or commit access tokens
hf_token = getpass.getpass("Enter your Hugging Face token: ")
login(token=hf_token)

# Set up OpenAI client for Hugging Face router
os.environ["HF_TOKEN"] = hf_token
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=hf_token,
)
model_id = "meta-llama/Meta-Llama-3-8B-Instruct:novita"

# Separate inference client for image generation, routed through the Nebius provider
image_client = InferenceClient(
    provider="nebius",
    api_key=hf_token,
)
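The cell creates `image_client` but does not show how it is called. A minimal sketch, assuming image generation goes through huggingface_hub's `InferenceClient.text_to_image(prompt, model=...)`, which returns a PIL image; the prompt wording, caption template, `#DogMatch` hashtag, and helper names are illustrative, and the `model` ID must be a text-to-image model that the Nebius provider actually serves.

```python
def build_breed_post(breed, highlights, match_score):
    """Return (image_prompt, caption) strings for a social-media style breed card."""
    image_prompt = (
        f"A cheerful, photorealistic {breed} dog playing outdoors, "
        "soft natural light, high detail"
    )
    # Keep the caption short: at most three highlights
    bullet_text = " | ".join(highlights[:3])
    caption = f"Your top match: the {breed} ({match_score:.0%} fit)! {bullet_text} #DogMatch"
    return image_prompt, caption

def generate_breed_image(client, image_prompt, model):
    """Ask an InferenceClient-like object for a text-to-image generation."""
    return client.text_to_image(image_prompt, model=model)
```

Passing the client in as an argument keeps the caption logic testable offline; in the notebook the call would be `generate_breed_image(image_client, prompt, model=...)` with a model ID you have confirmed is available.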

# System prompt for the chatbot
system_prompt = """
You are a friendly dog breed recommendation assistant. Your goal is to help users find their perfect dog breed by asking natural, engaging questions about their lifestyle, preferences, and needs.

Key traits to gather information on (indirectly through conversation):
- Family situation (e.g., kids, other pets) -> Maps to 'Good With Young Children', 'Good With Other Dogs', 'Affectionate With Family'
- Living space (e.g., apartment vs. house) -> Maps to 'Adaptability Level', 'Energy Level'
- Activity level (e.g., active, sedentary) -> Maps to 'Energy Level', 'Playfulness Level', 'Mental Stimulation Needs'
- Allergies or grooming preferences -> Maps to 'Shedding Level', 'Coat Grooming Frequency', 'Drooling Level'
- Personality preferences (e.g., protective, friendly to strangers) -> Maps to 'Watchdog/Protective Nature', 'Openness To Strangers', 'Barking Level'
- Training experience -> Maps to 'Trainability Level'

Ask 1-2 questions at a time to keep the conversation natural. After 4-6 exchanges, summarize the user's preferences and output them as a single JSON object in exactly this format, where every value under "preferences" is a plain integer from 1 to 5:
{
  "preferences": {
    "Affectionate With Family": <1-5>,
    "Good With Young Children": <1-5>,
    "Good With Other Dogs": <1-5>,
    "Shedding Level": <1-5>,
    "Coat Grooming Frequency": <1-5>,
    "Drooling Level": <1-5>,
    "Openness To Strangers": <1-5>,
    "Playfulness Level": <1-5>,
    "Watchdog/Protective Nature": <1-5>,
    "Adaptability Level": <1-5>,
    "Trainability Level": <1-5>,
    "Energy Level": <1-5>,
    "Barking Level": <1-5>,
    "Mental Stimulation Needs": <1-5>
  },
  "coat_type": "<preferred coat type, e.g., Smooth, Double, or any>",
  "coat_length": "<preferred coat length, e.g., Short, Medium, Long, or any>"
}

Scoring guidance: use a lower "Shedding Level" if the user has allergies, a lower "Coat Grooming Frequency" if they want low maintenance, a lower "Barking Level" if they prefer a quiet dog, and usually a low "Drooling Level". Do not put comments or explanations inside the JSON.

Do not output JSON until you have enough information. If the user provides incomplete information, ask follow-up questions. End the conversation only when you are ready to recommend.
"""

# Function to generate response from LLM
def generate_response(conversation_history):
    """Send the system prompt plus the chat history to the LLM and return the assistant's reply text."""
    messages = [{"role": "system", "content": system_prompt}] + conversation_history
    completion = client.chat.completions.create(
        model=model_id,
        messages=messages,
        max_tokens=512,  # 200 tokens can cut the 14-key JSON summary off mid-object
        temperature=0.7,
        top_p=0.95,
    )
    return completion.choices[0].message.content
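`generate_response` handles a single turn; the conversation itself still needs a loop. A minimal sketch, assuming `respond` is `generate_response` and `extract` is `extract_preferences` (both passed in, so the loop can be tested with stand-ins); the opening line, the quit keywords, and the `run_chat` name are illustrative.

```python
def run_chat(respond, extract, read_input=input, max_turns=10,
             opening="Hi! Tell me a bit about your home and your daily routine."):
    """Alternate user and assistant turns until the assistant emits parseable preferences."""
    # Seed the history with the greeting so the model knows what it already asked
    history = [{"role": "assistant", "content": opening}]
    print(f"Bot: {opening}")
    for _ in range(max_turns):
        user_text = read_input("You: ").strip()
        if user_text.lower() in {"quit", "exit"}:
            break
        history.append({"role": "user", "content": user_text})
        reply = respond(history)
        history.append({"role": "assistant", "content": reply})
        prefs, coat_type, coat_length = extract(reply)
        if prefs is not None:
            return prefs, coat_type, coat_length, history
        print(f"Bot: {reply}")
    return None, None, None, history
```

In the notebook: `prefs, coat_type, coat_length, history = run_chat(generate_response, extract_preferences)`. The `max_turns` cap stops a conversation that never converges on a JSON summary.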

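# Function to extract JSON preferences from LLM output
import json

def extract_preferences(response):
    """Return (preferences, coat_type, coat_length), or (None, None, None) if no valid JSON block is found."""
    try:
        json_start = response.find('{')
        json_end = response.rfind('}') + 1
        prefs_json = response[json_start:json_end]
        prefs = json.loads(prefs_json)
        return prefs['preferences'], prefs.get('coat_type', 'any'), prefs.get('coat_length', 'any')
    except (ValueError, KeyError, TypeError, AttributeError):
        # ValueError covers json.JSONDecodeError; KeyError a missing "preferences" key; AttributeError a None reply
        return None, None, None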
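`extract_preferences` slices from the first `{` to the last `}`, so any brace in prose after the JSON (for example "reply with {yes}") makes the slice invalid and the preferences are lost. A stricter alternative, sketched with the standard library's `json.JSONDecoder.raw_decode`, tries each `{` in turn and returns the first object that actually contains a "preferences" dictionary; `extract_preferences_strict` is a hypothetical name.

```python
import json

def extract_preferences_strict(response):
    """Return (preferences, coat_type, coat_length) from the first JSON object with a 'preferences' dict."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text
            obj, _ = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and isinstance(obj.get("preferences"), dict):
            return obj["preferences"], obj.get("coat_type", "any"), obj.get("coat_length", "any")
        start = response.find("{", start + 1)
    return None, None, None
```

It returns the same triple as `extract_preferences`, so it can be swapped in wherever the original function is called.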
