Skip to content
Competition - Bee friendly plants
0
  • AI Chat
  • Code
  • Report
  • 🎯 Problem and objectives:

    The challenge is to determine which plants are preferred by native vs non-native bee species, to optimize spaces dedicated to them. The ultimate goal is to recommend the top three plant species that most benefit native bees.

    In sort, the following points are to be answered:

    • Which plants are preferred by native vs non-native bee species?
    • A visualization of the distribution of bee and plant species across one of the samples.
    • Select the top three plant species you would recommend to the agency to support native bees.
    💾 Data exploration

    We will read the CSV file plants_and_bees.csv that contains relevant information and explore the data using exploratory analysis methods. Each row represents a sample that was taken from a patch of land where the plant species were being studied.

    ColumnDescription
    sample_idThe ID number of the sample taken.
    species_numThe number of different bee species in the sample.
    dateDate the sample was taken.
    seasonSeason during sample collection ("early.season" or "late.season").
    siteName of collection site.
    native_or_nonWhether the sample was from a native or non-native plant.
    samplingThe sampling method.
    plant_speciesThe name of the plant species the sample was taken from. None indicates the sample was taken from the air.
    timeThe time the sample was taken.
    bee_speciesThe bee species in the sample.
    sexThe gender of the bee species.
    specialized_onThe plant genus the bee species preferred.
    parasiticWhether or not the bee is parasitic (0:no, 1:yes).
    nestingThe bees nesting method.
    statusThe status of the bee species.
    nonnative_beeWhether the bee species is native or not (0:no, 1:yes).

    Source (data has been modified)

    import pandas as pd
    
    data_frame_plants_and_bees_original = pd.read_csv("data/plants_and_bees.csv", parse_dates=['date'])
    
    display(data_frame_plants_and_bees_original.head())
    🔍 Data validation

    In examining the dataset, we found that the status and specialized_on columns have a significant number of missing values. The specialized_on column only contains two distinct values, Penstemon and Ipomoea, each corresponding exclusively to Osmia distincta and Melitoma taurea bee species, respectively. Similarly, the status column only holds three unique values, which are vulnerable (IUCN), uncommon, and common. Given these findings, we have decided to exclude both status and specialized_on columns from our analysis due to their limited variety and substantial missing data.

    Based on the information given in the parasitic column, the sub-categorization within the nesting column seems redundant. To streamline our data analysis, we will simplify the nesting categories as follows: Wood/cavities and Wood/shell will be recategorized as Wood. We will carry out similar consolidation for the rest of the categories as well.

    import seaborn as sns
    
    df = data_frame_plants_and_bees_original.drop(columns=['specialized_on', 'status'])
    
    nesting_replacements = {'wood/cavities': 'wood',
                            'wood/shell': 'wood',
                            'parasite [ground]': 'ground'}
    
    df['nesting'] = df['nesting'].replace(nesting_replacements)
    import matplotlib.pyplot as plt
    
    most_common_species = df.groupby('native_or_non')['plant_species'].apply(lambda x: x.value_counts().nlargest(2).idxmin()).values
    
    native_data = df.loc[df['native_or_non'] == 'native']
    native_counts = native_data['plant_species'].value_counts()
    native_counts = native_counts.drop(native_counts.idxmax())
    
    non_native_data = df.loc[df['native_or_non'] == 'non-native']
    non_native_counts = non_native_data['plant_species'].value_counts()
    non_native_counts = non_native_counts.drop(non_native_counts.idxmax())
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
    
    ax1.bar(native_counts.index, native_counts.values)
    ax1.set_title('Native Plants')
    ax1.set_xlabel('Plant Species')
    ax1.set_ylabel('Visit Counts')
    ax1.tick_params(axis='x', rotation=90)
    
    ax2.bar(non_native_counts.index, non_native_counts.values)
    ax2.set_title('Non-Native Plants')
    ax2.set_xlabel('Plant Species')
    ax2.set_ylabel('Visit Counts')
    ax2.tick_params(axis='x', rotation=90)
    
    plt.tight_layout()
    plt.show()
    
    print(f'The most visited plant by native bees is: {most_common_species[0]}')
    print(f'The most visited plant by non-native bees is: {most_common_species[1]}')
    visits_count = df.groupby(['native_or_non', 'bee_species', 'plant_species']).size().reset_index(name='count')
    
    visits_count = visits_count[visits_count['plant_species'] != 'None']
    
    heatmap_data = visits_count.pivot_table(index='bee_species', columns=['native_or_non', 'plant_species'], values='count', fill_value=0)
    
    plt.figure(figsize=(15, 7))
    sns.heatmap(heatmap_data, cmap="BuGn", linewidths=.5)
    plt.title("Visits count by bee species")
    plt.xlabel("Plant Species")
    plt.ylabel("Bee Species")
    plt.show()
    
    most_common_species
    for tag, bee in heatmap_data.columns:
        print(bee)
    df[df.plant_species != 'None'].sample_id.values