
🌐 Web prototype (work in progress)

For this competition, this notebook is the reference implementation of the Dog Matchmaker: it contains the full data pipeline, scoring logic, and explanations.

I am also experimenting with a small web prototype built with Lovable: https://dog-match-mate.lovable.app

⚠️ This web app is not the final or official version of the project: it is a work in progress and may not reflect all the logic implemented in this notebook (e.g. some questions, filters, and scoring details are still being integrated).

Last update of the web prototype:
21/11/2025 — 22:25 (GMT+1)

🐕 Dog Matchmaker – From Robot Dog to Real-World Breeds

A robotics company is launching a robot dog and wants its personality to feel as natural as a real pet.
This notebook transforms their dataset into an AI Dog Matchmaker that recommends the top 3 dog breeds for any lifestyle.

We integrate:

  • Structured breed traits (affection, kid-friendliness, energy, barking, shedding, trainability, etc.)
  • A conversational flow that turns user answers into a structured profile
  • A hybrid matching engine (hard constraints + weighted scoring on 1–5 behavioral scales)
  • Optional semantic embeddings to interpret the user’s free-text description
  • Breed images and a social-media-ready caption for the top match

At the end of this notebook, you can:

  • Run an automatic demo that illustrates the full recommendation pipeline
  • Interact with the Dog Matchmaker chatbot directly inside the notebook

Objective: help the product team design a robot dog persona grounded in real-world canine behavior and user expectations.

🔍 Methodology – How the system matches people with dog breeds

The goal of this project is not only to process a dataset, but to design a simple, interpretable recommendation engine that mimics how a real expert would advise someone choosing a dog.

1. Data used

We rely on:

  • A structured table of breed traits (energy, barking, shedding, trainability, child-friendliness…)
  • A companion table with human-readable trait descriptions
  • A large repository of breed images for visual results

These datasets let us translate lifestyle questions into measurable criteria.


2. Converting user answers into a profile

The chatbot turns the conversation into a clear user profile:

  • Family context (kids, other dogs, allergies)
  • Preferences (energy, noise, grooming, shedding)
  • Behavioral expectations (trainability, friendliness, adaptability)
  • Optional free-text description of their “ideal dog vibe”

This profile becomes the input of the recommender.


3. Matching logic

The engine combines two layers:

Hard constraints
(e.g., allergies → remove heavy-shedding breeds)

Soft compatibility scoring
A weighted comparison between user preferences and breed traits:

  • Energy match
  • Barking tolerance
  • Trainability importance
  • Compatibility with children / other dogs
  • Grooming and shedding constraints
  • Stranger friendliness
  • Adaptability

Everything is normalized to produce an interpretable match score between 0 and 1.
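
For example, if a user asks for energy 4/5 and a breed's Energy Level is 2/5, the energy sub-score is 1 - |2 - 4| / 4 = 0.5. Sub-scores like this are averaged with weights (energy and child-friendliness count roughly twice as much as grooming), which keeps the final match score in [0, 1].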

An optional semantic layer (sentence-transformers) captures signals from the user’s free-text description, adding a small “personal taste” boost.


4. Results & user experience

For each profile, the system:

  • Ranks all breeds by match score
  • Returns the top 3 recommendations
  • Displays images, interpretable explanations, and a social-media-ready caption
  • Allows either a one-shot demo or a fully interactive chat session inside the notebook

This creates a transparent AI assistant that connects human preferences with real-world breed characteristics.

# ============================================================
# 1. Imports and data loading
# ============================================================
import os
import re
import textwrap

import numpy as np
import pandas as pd
from dataclasses import dataclass, asdict
from IPython.display import display, Image, Markdown


os.environ["TOKENIZERS_PARALLELISM"] = "false"
# ------------------------------------------------------------------
# Expected files:
#   - data/breed_traits.csv
#   - data/trait_description.csv
# ------------------------------------------------------------------

breed_traits_path = "data/breed_traits.csv"
trait_desc_path = "data/trait_description.csv"

dog_breeds = pd.read_csv(breed_traits_path)
trait_descriptions = pd.read_csv(trait_desc_path)

dog_breeds["_row_id"] = np.arange(len(dog_breeds))

# Quick sanity check
print("Breed traits shape:", dog_breeds.shape)
print("Trait description shape:", trait_descriptions.shape)
print("Sample breeds:", dog_breeds["Breed"].head().tolist())

# ============================================================
# 2. Data preprocessing and helpers
# ============================================================

# Columns that are numeric scales 1–5 stored as text
NUMERIC_TRAIT_COLS = [
    "Affectionate With Family",
    "Good With Young Children",
    "Good With Other Dogs",
    "Shedding Level",
    "Coat Grooming Frequency",
    "Drooling Level",
    "Openness To Strangers",
    "Playfulness Level",
    "Watchdog/Protective Nature",
    "Adaptability Level",
    "Trainability Level",
    "Energy Level",
    "Barking Level",
    "Mental Stimulation Needs",
]

# Convert numeric traits to integers
for col in NUMERIC_TRAIT_COLS:
    dog_breeds[col] = pd.to_numeric(dog_breeds[col], errors="coerce").astype("Int64")

# Build a dictionary for trait descriptions
trait_desc_map = {}
for _, row in trait_descriptions.iterrows():
    name = row["Trait"]
    trait_desc_map[name] = {
        "low_label": row["Trait_1"],
        "high_label": row["Trait_5"],
        "long_desc": row["Description"],
    }

def describe_trait_level(trait_name: str, level: int | float | None) -> str:
    """
    Turn a numeric level (1–5) into a human-readable short description
    using trait_descriptions when available.
    """
    if level is None or pd.isna(level):
        return f"{trait_name}: unknown"

    level = int(level)
    base = trait_desc_map.get(trait_name, None)
    if base is None:
        return f"{trait_name}: {level}/5"

    low = base["low_label"]
    high = base["high_label"]
    if level <= 2:
        qual = low
    elif level >= 4:
        qual = high
    else:
        qual = f"Moderate {trait_name.lower()}"
    return f"{trait_name}: {level}/5 ({qual})"

# ============================================================
# 3. User profile definition
# ============================================================

@dataclass
class UserProfile:
    # lifestyle and family
    has_kids: bool | None = None
    has_other_dogs: bool | None = None
    allergies: bool | None = None         # to hair/dander
    wants_guard_dog: bool | None = None
    # numeric preferences on 1–5 scale
    desired_energy: int | None = None       # maps to Energy Level
    noise_tolerance: int | None = None      # maps to Barking Level tolerance
    grooming_tolerance: int | None = None   # tolerance for Coat Grooming Frequency
    shedding_tolerance: int | None = None    # maps to Shedding Level tolerance
    stranger_friendly_pref: int | None = None # Openness To Strangers
    trainability_importance: int | None = None # Trainability Level importance
    mental_stimulation_importance: int | None = None # Mental Stimulation Needs alignment
    adaptability_importance: int | None = None       # Adaptability Level
    # free text (used by the optional semantic matching layer, not by the structured scoring)
    free_text_intro: str | None = None


# ============================================================
# 4. Simple NLU helpers (parsing yes/no and scales)
# ============================================================

def parse_yes_no(text: str) -> bool | None:
    t = text.strip().lower()
    # Match whole words (not substrings) so e.g. "only" does not count as "n"
    words = set(re.findall(r"[a-z]+", t))
    yes_words = {"yes", "y", "yeah", "yep", "oui", "sure"}
    no_words = {"no", "n", "nope", "non"}
    if "of course" in t or words & yes_words:
        return True
    if words & no_words:
        return False
    return None


def parse_scale_1_5(text: str, default: int = 3) -> int:
    """
    Parse a 1–5 answer. If parsing fails, return default.
    """
    t = text.strip().lower()
    # Try to extract an integer
    m = re.search(r"[1-5]", t)
    if m:
        val = int(m.group(0))
        return max(1, min(5, val))

    # Try some qualitative mappings
    if any(w in t for w in ["very low", "calm", "quiet", "minimal"]):
        return 1
    if any(w in t for w in ["low", "rather low", "not much"]):
        return 2
    if any(w in t for w in ["medium", "moderate", "balanced"]):
        return 3
    if any(w in t for w in ["high", "a lot", "active"]):
        return 4
    if any(w in t for w in ["very high", "hyper", "intense"]):
        return 5

    return default
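
# Quick sanity checks for the parsers (illustrative examples only).
assert parse_yes_no("Yes, we have two kids") is True
assert parse_yes_no("nope") is False
assert parse_scale_1_5("around 4 I think") == 4
assert parse_scale_1_5("moderate, please") == 3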


# ============================================================
# 5. Matching engine (hard filters + soft scoring)
# ============================================================

def hard_filter_breeds(df: pd.DataFrame, user: UserProfile) -> pd.DataFrame:
    """
    Apply strong constraints where it makes sense, e.g. allergies.
    Returns a filtered copy.
    """
    filtered = df.copy()

    # Allergies: reduce to low shedding if user has allergies
    if user.allergies is True:
        # 1 = No shedding, 5 = Hair everywhere
        filtered = filtered[filtered["Shedding Level"] <= 2]

    # If user definitely does NOT want a guard/protective dog
    if user.wants_guard_dog is False:
        # Keep breeds with moderate or lower protective nature
        filtered = filtered[filtered["Watchdog/Protective Nature"] <= 3]

    return filtered
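
# Illustrative check with a hypothetical profile: how many breeds survive
# the allergy hard filter compared to the full table.
_allergic_user = UserProfile(allergies=True)
print("Low-shedding breeds kept:", len(hard_filter_breeds(dog_breeds, _allergic_user)), "of", len(dog_breeds))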


def score_breed_row(row: pd.Series, user: UserProfile) -> float:
    """
    Compute a weighted similarity score between a breed and the user's preferences.
    All component scores are in [0, 1]; final score is a weighted average in [0, 1].
    """
    scores = []
    weights = []

    # Energy level (match desired energy)
    if user.desired_energy is not None and not pd.isna(row["Energy Level"]):
        diff = abs(int(row["Energy Level"]) - int(user.desired_energy))
        s_energy = 1 - diff / 4    # max diff = |1-5| = 4
        scores.append(s_energy)
        weights.append(2.0)        # energy is important

    # Barking level vs noise tolerance:
    # if breed barking > tolerance, penalty; else ok.
    if user.noise_tolerance is not None and not pd.isna(row["Barking Level"]):
        barking = int(row["Barking Level"])
        tolerance = int(user.noise_tolerance)
        penalty = max(0, barking - tolerance)
        s_noise = 1 - penalty / 4
        scores.append(s_noise)
        weights.append(1.5)

    # Shedding level vs shedding tolerance
    if user.shedding_tolerance is not None and not pd.isna(row["Shedding Level"]):
        shedding = int(row["Shedding Level"])
        tol = int(user.shedding_tolerance)
        diff = max(0, shedding - tol)
        s_shed = 1 - diff / 4
        scores.append(s_shed)
        # If allergies, make this more important
        w_shed = 2.0 if user.allergies else 1.5
        weights.append(w_shed)

    # Grooming vs grooming tolerance
    if user.grooming_tolerance is not None and not pd.isna(row["Coat Grooming Frequency"]):
        groom_need = int(row["Coat Grooming Frequency"])
        tol = int(user.grooming_tolerance)
        diff = max(0, groom_need - tol)
        s_groom = 1 - diff / 4
        scores.append(s_groom)
        weights.append(1.0)

    # Kids: if user has kids, push "Good With Young Children" high
    if user.has_kids is True and not pd.isna(row["Good With Young Children"]):
        kids = int(row["Good With Young Children"])
        s_kids = kids / 5
        scores.append(s_kids)
        weights.append(2.0)

    # Other dogs at home: use "Good With Other Dogs"
    if user.has_other_dogs is True and not pd.isna(row["Good With Other Dogs"]):
        other = int(row["Good With Other Dogs"])
        s_other = other / 5
        scores.append(s_other)
        weights.append(1.5)

    # Trainability importance
    if user.trainability_importance is not None and not pd.isna(row["Trainability Level"]):
        train = int(row["Trainability Level"])
        importance = int(user.trainability_importance) / 5
        s_train = train / 5
        scores.append(s_train)
        weights.append(1.5 * importance + 0.5)

    # Mental stimulation / playfulness for users who want to work with the dog
    if user.mental_stimulation_importance is not None and not pd.isna(row["Mental Stimulation Needs"]):
        mental = int(row["Mental Stimulation Needs"])
        importance = int(user.mental_stimulation_importance) / 5
        s_mental = mental / 5
        scores.append(s_mental)
        weights.append(1.2 * importance + 0.3)

    # Stranger friendliness preference
    if user.stranger_friendly_pref is not None and not pd.isna(row["Openness To Strangers"]):
        desired_open = int(user.stranger_friendly_pref)
        diff = abs(int(row["Openness To Strangers"]) - desired_open)
        s_stranger = 1 - diff / 4
        scores.append(s_stranger)
        weights.append(1.0)

    # Adaptability importance
    if user.adaptability_importance is not None and not pd.isna(row["Adaptability Level"]):
        adapt = int(row["Adaptability Level"])
        importance = int(user.adaptability_importance) / 5
        s_adapt = adapt / 5
        scores.append(s_adapt)
        weights.append(1.0 * importance + 0.5)

    if not scores or sum(weights) == 0:
        return 0.0

    scores = np.array(scores, dtype=float)
    weights = np.array(weights, dtype=float)
    return float(np.clip((scores * weights).sum() / weights.sum(), 0.0, 1.0))
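
# Illustrative example with a hypothetical profile: score the first breed in
# the table against a calm, family-oriented user.
_calm_family_user = UserProfile(has_kids=True, desired_energy=2, noise_tolerance=2)
_first_row = dog_breeds.iloc[0]
print(_first_row["Breed"], "->", round(score_breed_row(_first_row, _calm_family_user), 3))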


def rank_breeds(dog_breeds: pd.DataFrame, user: UserProfile, top_k: int = 3) -> pd.DataFrame:
    """
    Apply hard filters, compute scores, and return top_k breeds with:
      - match_score_structured : score based on the numeric breed traits
      - match_score_semantic   : (optional) score based on text embeddings
      - match_score            : combination of the two
    """
    filtered = hard_filter_breeds(dog_breeds, user)
    if filtered.empty:
        return filtered

    # 1) Score based on the structured traits
    struct_scores = []
    for _, row in filtered.iterrows():
        struct_scores.append(score_breed_row(row, user))

    filtered = filtered.copy()
    filtered["match_score_structured"] = struct_scores

    # 2) Optional: semantic score from the user's free-text description
    #    (EMBEDDING_AVAILABLE, EMBEDDING_MODEL and breed_embs come from the
    #    embeddings cell further down in the notebook)
    use_semantic = (
        EMBEDDING_AVAILABLE
        and EMBEDDING_MODEL is not None
        and breed_embs is not None
        and isinstance(user.free_text_intro, str)
        and user.free_text_intro.strip() != ""
    )

    if use_semantic:
        try:
            # encode user text
            u_emb = EMBEDDING_MODEL.encode([user.free_text_intro], normalize_embeddings=True)[0]

            # Align each filtered row with its precomputed breed embedding
            row_ids = filtered["_row_id"].astype(int).values
            # breed_embs[row_ids] has shape (n_filtered, dim)
            sims = breed_embs[row_ids] @ u_emb  # dot product = cosine similarity (embeddings are normalized)
            # Rescale from [-1,1] to [0,1] just in case
            sims = (sims + 1.0) / 2.0

            filtered["match_score_semantic"] = sims

            # Blend: 70% structured traits, 30% semantic similarity
            alpha = 0.7
            filtered["match_score"] = alpha * filtered["match_score_structured"] + (1 - alpha) * filtered["match_score_semantic"]
        except Exception as e:
            print("Semantic scoring failed, falling back to structured only:", e)
            filtered["match_score_semantic"] = np.nan
            filtered["match_score"] = filtered["match_score_structured"]
    else:
        # No embeddings available or no free text: fall back to structured traits only
        filtered["match_score_semantic"] = np.nan
        filtered["match_score"] = filtered["match_score_structured"]

    # Final sort by combined score
    filtered = filtered.sort_values("match_score", ascending=False)
    return filtered.head(top_k)


# ============================================================
# 6. Dog image retrieval from GitHub dataset
# ============================================================



IMAGE_ROOT = "data/dog_images/Dog-Breeds-Dataset"
if not os.path.isdir(IMAGE_ROOT):
    # clone only if folder does not exist
    !git clone https://github.com/maartenvandenbroeck/Dog-Breeds-Dataset.git $IMAGE_ROOT
    
if os.path.isdir(IMAGE_ROOT):
    image_folders = [f for f in os.listdir(IMAGE_ROOT) if os.path.isdir(os.path.join(IMAGE_ROOT, f))]
    print(f"Found {len(image_folders)} image folders under {IMAGE_ROOT}.")
else:
    image_folders = []
    print(f"No image root folder '{IMAGE_ROOT}' found. Images will be skipped.")


def find_best_image_folder(breed_name: str) -> str | None:
    """
    Fuzzy match breed_name to a folder in the cloned Dog-Breeds-Dataset.
    Strategy: compare word overlap between AKC breed name and folder names.
    """
    if not image_folders:
        return None

    breed_words = set(re.sub(r"[^a-z ]", " ", breed_name.lower()).split())
    best_folder = None
    best_score = 0

    for folder in image_folders:
        folder_words = set(re.sub(r"[^a-z ]", " ", folder.lower()).split())
        # Intersection size as a simple similarity score
        score = len(breed_words & folder_words)
        if score > best_score:
            best_score = score
            best_folder = folder

    if best_score == 0:
        return None
    return best_folder


def get_example_images_for_breed(breed_name: str, max_images: int = 3) -> list[str]:
    """
    Return a list of local image file paths (relative to notebook) for the given breed,
    using the cloned GitHub repository. If no images found, return [].
    """
    folder = find_best_image_folder(breed_name)
    if folder is None:
        return []

    folder_path = os.path.join(IMAGE_ROOT, folder)
    if not os.path.isdir(folder_path):
        return []

    # Take the first few JPEGs
    files = [f for f in os.listdir(folder_path) if f.lower().endswith((".jpg", ".jpeg", ".png"))]
    files = sorted(files)[:max_images]
    return [os.path.join(folder_path, f) for f in files]
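
# Illustrative check: try to locate one sample image for the first breed in
# the traits table (returns an empty list if the clone or folder match failed).
_first_breed = dog_breeds.iloc[0]["Breed"]
print(_first_breed, "->", get_example_images_for_breed(_first_breed, max_images=1))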


# ============================================================
# 7. Presentation helpers (executive summary, result cards)
# ============================================================

def summarize_breed(breed_row: pd.Series) -> str:
    """
    Use trait_descriptions and key traits to build a short explanation for the breed.
    """
    lines = []

    lines.append(f"Breed: {breed_row['Breed']}")
    # Family and kids
    lines.append(
        describe_trait_level("Affectionate With Family", breed_row["Affectionate With Family"])
    )
    lines.append(
        describe_trait_level("Good With Young Children", breed_row["Good With Young Children"])
    )
    # Energy and mental needs
    lines.append(describe_trait_level("Energy Level", breed_row["Energy Level"]))
    lines.append(
        describe_trait_level("Mental Stimulation Needs", breed_row["Mental Stimulation Needs"])
    )
    # Shedding / grooming / barking
    lines.append(describe_trait_level("Shedding Level", breed_row["Shedding Level"]))
    lines.append(
        describe_trait_level("Coat Grooming Frequency", breed_row["Coat Grooming Frequency"])
    )
    lines.append(describe_trait_level("Barking Level", breed_row["Barking Level"]))

    text = "\n".join(lines)
    return textwrap.dedent(text)
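
# Illustrative preview: the summary card produced for the first breed in the table.
print(summarize_breed(dog_breeds.iloc[0]))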


def generate_social_media_caption(breed_row: pd.Series, user: UserProfile) -> str:
    """
    Simple text-only post-style description for the top match.
    """
    traits = []
    traits.append(f"Energy level {breed_row['Energy Level']}/5")
    traits.append(f"Affection with family {breed_row['Affectionate With Family']}/5")
    traits.append(f"Good with children {breed_row['Good With Young Children']}/5")
    traits_str = ", ".join(traits)

    profile_bits = []
    if user.has_kids:
        profile_bits.append("family with children")
    if user.allergies:
        profile_bits.append("sensitive to shedding")
    if user.desired_energy is not None:
        profile_bits.append(f"likes activity level {user.desired_energy}/5")

    profile_str = ", ".join(profile_bits) if profile_bits else "my lifestyle"

    caption = f"""\
My ideal dog match: {breed_row['Breed']}.

They fit {profile_str} with traits like: {traits_str}.

If you are looking for a dog with similar energy and temperament, this breed is worth considering."""
    return textwrap.dedent(caption)


def display_results(top_breeds: pd.DataFrame, user: UserProfile) -> None:
    """
    Nicely render the top 3 matches with images (if available).
    """
    if top_breeds.empty:
        display(Markdown("### No suitable breeds found\n\nYour constraints are very strict. You may want to relax shedding or barking constraints and try again."))
        return

    display(Markdown("## Your top dog breed matches"))

    for rank, (_, row) in enumerate(top_breeds.iterrows(), start=1):
        score = row["match_score"]
        display(Markdown(f"### #{rank}: {row['Breed']} (match score: {score:.2f})"))

        # Show images if available
        image_paths = get_example_images_for_breed(row["Breed"], max_images=3)
        if image_paths:
            for img_path in image_paths:
                try:
                    display(Image(filename=img_path, width=250))
                except Exception as e:
                    print(f"Could not display image {img_path}: {e}")
        else:
            display(Markdown("_No local images found for this breed (check Dog-Breeds-Dataset clone)._"))

        # Text summary
        summary = summarize_breed(row)
        display(Markdown("**Why this is a good match for you**"))
        display(Markdown(f"```text\n{summary}\n```"))

    # Social media style caption for the top match
    best_row = top_breeds.iloc[0]
    caption = generate_social_media_caption(best_row, user)
    display(Markdown("## Social-media-ready caption for your top match"))
    display(Markdown(f"```text\n{caption}\n```"))


# ============================================================
# 8. Chatbot flow (interactive CLI in the notebook)
# ============================================================

def run_dog_matchmaker_chat():
    """
    Simple console chatbot that collects user preferences
    and prints top 3 breed recommendations.
    You run it in a notebook cell and answer in the console.
    """
    print("Welcome to the Dog Matchmaker.")
    print("I will ask you a few questions about your lifestyle and preferences.")
    print("Then I will recommend the top 3 dog breeds that best fit you.")
    print("-" * 60)

    user = UserProfile()

    # Free-text intro (used by the optional semantic matching layer when available)
    intro = input("First, in a few words, describe your ideal life with a dog:\n> ")
    user.free_text_intro = intro

    # Family and home context
    while user.has_kids is None:
        ans = input("Do you have young children at home? (yes/no)\n> ")
        user.has_kids = parse_yes_no(ans)
        if user.has_kids is None:
            print("I did not understand. Please answer with yes or no.")

    while user.has_other_dogs is None:
        ans = input("Do you already have other dogs at home? (yes/no)\n> ")
        user.has_other_dogs = parse_yes_no(ans)
        if user.has_other_dogs is None:
            print("I did not understand. Please answer with yes or no.")

    while user.allergies is None:
        ans = input("Does anyone in your home have dog-related allergies (hair/dander)? (yes/no)\n> ")
        user.allergies = parse_yes_no(ans)
        if user.allergies is None:
            print("I did not understand. Please answer with yes or no.")

    while user.wants_guard_dog is None:
        ans = input("Are you specifically looking for a watchdog/protective dog? (yes/no)\n> ")
        user.wants_guard_dog = parse_yes_no(ans)
        if user.wants_guard_dog is None:
            print("I did not understand. Please answer with yes or no.")

    # Preferences on numeric scales
    de = input("On a scale from 1 (very calm) to 5 (very energetic), what energy level do you prefer?\n> ")
    user.desired_energy = parse_scale_1_5(de, default=3)

    nt = input("On a scale from 1 (needs to be very quiet) to 5 (noise is fine), how much barking can you tolerate?\n> ")
    user.noise_tolerance = parse_scale_1_5(nt, default=3)

    gt = input("On a scale from 1 (almost no grooming) to 5 (regular grooming is fine), how much grooming can you handle?\n> ")
    user.grooming_tolerance = parse_scale_1_5(gt, default=3)

    st = input("On a scale from 1 (very low shedding) to 5 (shedding is not a problem), how much shedding can you tolerate?\n> ")
    user.shedding_tolerance = parse_scale_1_5(st, default=3)

    sf = input("On a scale from 1 (prefers reserved with strangers) to 5 (very friendly with strangers), what do you prefer?\n> ")
    user.stranger_friendly_pref = parse_scale_1_5(sf, default=3)

    tr = input("On a scale from 1 (trainability does not matter) to 5 (very important that the dog is easy to train), how important is trainability for you?\n> ")
    user.trainability_importance = parse_scale_1_5(tr, default=4)

    ms = input("On a scale from 1 (no need for games/mental work) to 5 (I want to do many activities and games), how important is mental stimulation with your dog?\n> ")
    user.mental_stimulation_importance = parse_scale_1_5(ms, default=3)

    ad = input("On a scale from 1 (adaptability not important) to 5 (very important that the dog adapts to changes, travel, etc.), how important is adaptability?\n> ")
    user.adaptability_importance = parse_scale_1_5(ad, default=3)

    print("\nThank you. Here is the profile I built from your answers:")
    for k, v in asdict(user).items():
        print(f"  - {k}: {v}")

    print("\nComputing your best dog breed matches...")
    top = rank_breeds(dog_breeds, user, top_k=3)
    display_results(top, user)
    print("\nEnd of Dog Matchmaker session.")
# ============================================================
# 9. Optional: semantic embeddings for free-text intro
# ============================================================
try:
    from sentence_transformers import SentenceTransformer
    EMBEDDING_AVAILABLE = True
except Exception as e:
    print("Warning: sentence_transformers not available, semantic matching disabled:", e)
    SentenceTransformer = None
    EMBEDDING_AVAILABLE = False

EMBEDDING_MODEL = None
breed_embs = None  # will store embeddings aligned with dog_breeds["_row_id"]

def build_breed_text(row: pd.Series) -> str:
    """
    Build a textual description per breed that we will embed.
    The goal is not to be perfect, but to capture overall 'vibe':
    family, kids, energy, barking, shedding, trainability, etc.
    """
    parts = [str(row["Breed"])]

    def safe_int(x):
        try:
            return int(x)
        except Exception:
            return None

    aff = safe_int(row.get("Affectionate With Family", None))
    kids = safe_int(row.get("Good With Young Children", None))
    energy = safe_int(row.get("Energy Level", None))
    bark = safe_int(row.get("Barking Level", None))
    shed = safe_int(row.get("Shedding Level", None))
    train = safe_int(row.get("Trainability Level", None))
    mental = safe_int(row.get("Mental Stimulation Needs", None))

    if aff is not None:
        parts.append(f"affection with family {aff}/5")
    if kids is not None:
        parts.append(f"good with children {kids}/5")
    if energy is not None:
        parts.append(f"energy level {energy}/5")
    if bark is not None:
        parts.append(f"barking level {bark}/5")
    if shed is not None:
        parts.append(f"shedding level {shed}/5")
    if train is not None:
        parts.append(f"trainability {train}/5")
    if mental is not None:
        parts.append(f"mental stimulation needs {mental}/5")

    return ". ".join(parts)


if EMBEDDING_AVAILABLE:
    try:
        print("Loading sentence-transformers model for semantic matching...")
        EMBEDDING_MODEL = SentenceTransformer("all-MiniLM-L6-v2")
        breed_texts = dog_breeds.apply(build_breed_text, axis=1).tolist()
        breed_embs = EMBEDDING_MODEL.encode(breed_texts, normalize_embeddings=True)
        print("Semantic embeddings ready for", len(breed_embs), "breeds.")
    except Exception as e:
        print("Could not load embedding model or compute embeddings, disabling semantic matching:", e)
        EMBEDDING_MODEL = None
        breed_embs = None
        EMBEDDING_AVAILABLE = False
else:
    print("Semantic matching is disabled (no sentence_transformers).")
# ============================================================
# 10. Automatic demo (non-interactive)
# ============================================================

demo_profile = UserProfile(
    has_kids=True,
    has_other_dogs=False,
    allergies=False,
    wants_guard_dog=False,
    desired_energy=3,
    noise_tolerance=3,
    grooming_tolerance=3,
    shedding_tolerance=3,
    stranger_friendly_pref=4,
    trainability_importance=4,
    mental_stimulation_importance=3,
    adaptability_importance=4,
    free_text_intro="I want a playful family dog that fits apartment life."
)

demo_top = rank_breeds(dog_breeds, demo_profile, top_k=3)
display_results(demo_top, demo_profile)
# ============================================================
# 11. Optional interactive chatbot session
# ============================================================

# If you want to talk to the Dog Matchmaker in the notebook,
# uncomment the line below and run this cell manually.

# run_dog_matchmaker_chat()