Which plants are better for bees: native or non-native?

📖 Background

You work for the local government environment agency and have taken on a project about creating pollinator bee-friendly spaces. You can use both native and non-native plants to create these spaces and therefore need to ensure that you use the correct plants to optimize the environment for these bees.

The team has collected data on native and non-native plants and their effects on pollinator bees. Your task will be to analyze this data and provide recommendations on which plants create an optimized environment for pollinator bees.

💾 The Data

You have assembled information on the plants and bees research in a file called plants_and_bees.csv. Each row represents a sample that was taken from a patch of land where the plant species were being studied.

| Column | Description |
| --- | --- |
| sample_id | The ID number of the sample taken. |
| bees_num | The total number of bee individuals in the sample. |
| date | Date the sample was taken. |
| season | Season during sample collection ("early.season" or "late.season"). |
| site | Name of collection site. |
| native_or_non | Whether the sample was from a native or non-native plot. |
| sampling | The sampling method. |
| plant_species | The name of the plant species the sample was taken from. None indicates the sample was taken from the air. |
| time | The time the sample was taken. |
| bee_species | The bee species in the sample. |
| sex | The sex of the bee. |
| specialized_on | The plant genus the bee species preferred. |
| parasitic | Whether or not the bee is parasitic (0:no, 1:yes). |
| nesting | The bee's nesting method. |
| status | The status of the bee species. |
| nonnative_bee | Whether the bee species is non-native (0:no, 1:yes). |

Source (data has been modified)

💪 Challenge

Provide your agency with a report that covers the following:

  • Which plants are preferred by native vs non-native bee species?
  • A visualization of the distribution of bee and plant species across one of the samples.
  • Select the top three plant species you would recommend to the agency to support native bees.

πŸ§‘β€βš–οΈ Judging criteria

This is a community-based competition. The top 5 most upvoted entries will win.

The winners will receive DataCamp merchandise.

✅ Checklist before publishing

  • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the judging criteria, so the workbook is focused on your work.
  • Check that all the cells run without error.

βŒ›οΈ Time is ticking. Good luck!

# Importing libraries
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.style as style
import seaborn as sns
import numpy as np
from datetime import datetime

data = pd.read_csv("data/plants_and_bees.csv")

df_bees_clean = data.copy()

df_bees_clean
# a. State whether the values match the description given in the table above.

# b. State the number of missing values in each column.

# Check the data types in the columns
df_bees_clean.dtypes
# Check the missing values in the columns
# df_bees_clean.isna().sum()

# Sort the rows by the number of bees in the sample
df_bees_clean = df_bees_clean.sort_values(by='bees_num')

# Combine 'date' (date the sample was taken) and 'time' (time the sample was taken)
# Zero-pad 'time' to four characters, e.g. 930 -> '0930' (assumes the values are HHMM)
df_bees_clean['time'] = df_bees_clean['time'].astype(str).str.zfill(4)
# Parse 'date' from its month/day/year text format
df_bees_clean['date'] = pd.to_datetime(df_bees_clean['date'], format='%m/%d/%Y')
# Join date and time into one string and parse it with an explicit format.
# The result has minute resolution, so no flooring is needed afterwards.
df_bees_clean['datetime'] = pd.to_datetime(
    df_bees_clean['date'].dt.strftime('%Y-%m-%d') + ' ' + df_bees_clean['time'],
    format='%Y-%m-%d %H%M')
# Drop the original 'time' and 'date' columns, which are no longer needed
df_bees_clean = df_bees_clean.drop(columns=['time', 'date'])

# specialized_on: the plant genus the bee species preferred (missing -> 'unknown')
df_bees_clean['specialized_on'] = df_bees_clean['specialized_on'].fillna('unknown')

# parasitic: whether or not the bee is parasitic (0:no, 1:yes); missing values are treated as 'No'
df_bees_clean['parasitic'] = df_bees_clean['parasitic'].fillna(0).replace({0: 'No', 1: 'Yes'})

# nesting: the bee's nesting method (missing -> 'unknown')
df_bees_clean['nesting'] = df_bees_clean['nesting'].fillna('unknown')

# status: the status of the bee species (missing -> 'unknown')
df_bees_clean['status'] = df_bees_clean['status'].fillna('unknown')

# nonnative_bee: whether the bee species is non-native (0:no, 1:yes); missing values are treated as 'No'
df_bees_clean['nonnative_bee'] = df_bees_clean['nonnative_bee'].fillna(0).replace({0: 'No', 1: 'Yes'})
# Order Categories
ordered_cats = {"season":['early.season', 'late.season'], 
                "site": ['A', 'B', 'C'], 
                "native_or_non": ['native', 'non-native'],
                "sampling": ['pan traps', 'hand netting'],
                
                "plant_species": ['None', 'Trifolium incarnatum', 'Viola cornuta',
       'Trifolium repens', 'Leucanthemum vulgare',
       'Melilotus officinalis', 'Tradescantia virginiana',
       'Penstemon digitalis', 'Trifolium pratense', 'Monarda punctata',
       'Asclepias tuberosa', 'Rudbeckia hirta', 'Coronilla varia',
       'Lobularia maritima', 'Daucus carota', 'Chamaecrista fasciculata',
       'Pycnanthemum tenuifolium', 'Agastache foeniculum',
       'Cosmos bipinnatus', 'Helenium flexuosum', 'Origanum vulgare',
       'Lotus corniculatus', 'Cichorium intybus', 'Rudbeckia triloba'],
                
                "bee_species": ['Augochlorella aurata', 'Agapostemon texanus', 'Andrena carlini',
       'Andrena perplexa', 'Apis mellifera', 'Lasioglossum tegulare',
       'Lasioglossum pectorale', 'Lasioglossum pilosum',
       'Lasioglossum cressonii', 'Lasioglossum trigeminum',
       'Osmia pumila', 'Andrena miserabilis', 'Lasioglossum versatum',
       'Halictus poeyi/ligatus', 'Osmia atriventris',
       'Nomada bidentate_group', 'Osmia bucephala',
       'Lasioglossum callidum', 'Ceratina calcarata',
       'Agapostemon splendens', 'Lasioglossum coreopsis',
       'Nomada australis', 'Ceratina', 'Megachile brevis',
       'Halictus parallelus', 'Ceratina strenua',
       'Andrena (Trachandrena)', 'Andrena nasonii', 'Ceratina mikmaqi',
       'Agapostemon virescens', 'Osmia subfasciata',
       'Lasioglossum coriaceum', 'Lasioglossum vierecki',
       'Nomada pygmaea', 'Nomada articulata', 'Osmia taurus',
       'Andrena banksi', 'Osmia distincta', 'Eucera hamata',
       'Hoplitis producta', 'Augochloropsis metallica_metallica',
       'Halictus confusus', 'Ceratina dupla', 'Andrena barbara',
       'Osmia georgica', 'Lasioglossum oblongum',
       'Lasioglossum floridanum', 'Nomada parva', 'Osmia sandhouseae',
       'Lasioglossum bruneri', 'Megachile mendica', 'Lasioglossum weemsi',
       'Hoplitis pilosifrons', 'Bombus bimaculatus', 'Lasioglossum',
       'Lasioglossum subviridatum', 'Bombus impatiens',
       'Bombus griseocollis', 'Lasioglossum hitchensi',
       'Agapostemon sericeus', 'Andrena wilkella', 'Andrena macra',
       'Hoplitis truncata', 'Augochloropsis metallica_fulgida',
       'Andrena atlantica', 'Calliopsis andreniformis',
       'Melissodes subillatus', 'Anthidiellum notatum',
       'Megachile exilis', 'Heriades carinata', 'Lasioglossum ephialtum',
       'Megachile georgica', 'Lasioglossum admirandum',
       'Lasioglossum gotham', 'Lasioglossum abanci', 'Megachile texana',
       'Triepeolus lunatus', 'Melissodes', 'Melissodes bimaculatus',
       'Melissodes comptoides', 'Melissodes trinodis',
       'Bombus fervidus/pensylvanicus', 'Nomada texana',
       'Augochlora pura', 'Bombus citrinus', 'Hylaeus affinis/modestus',
       'Hylaeus modestus', 'Melitoma taurea', 'Triepeolus remigatus',
       'Anthidium manicatum', 'Bombus pensylvanicus', 'Bombus fervidus',
       'Nomada vegana'],
               "sex": ['f', 'm'],
               "specialized_on": ['Penstemon', 'Ipomoea','unknown'],
                "parasitic": ['No', 'Yes'],
               "nesting": ['ground', 'hive', 'wood', 'parasite [ground]', 'wood/shell','wood/cavities', 'unknown'],
               "status": ['uncommon', 'vulnerable (IUCN)', 'common', 'unknown'],
               "nonnative_bee": ['No', 'Yes']}


# Loop through the DataFrame columns and convert each one to a more compact data type
for col in df_bees_clean.columns:

    # Downcast integer columns to the smallest integer type that holds their values
    if pd.api.types.is_integer_dtype(df_bees_clean[col]):
        df_bees_clean[col] = pd.to_numeric(df_bees_clean[col], downcast='integer')

    # Downcast float columns (float32 at the smallest)
    elif pd.api.types.is_float_dtype(df_bees_clean[col]):
        df_bees_clean[col] = pd.to_numeric(df_bees_clean[col], downcast='float')

    # Convert the columns listed in ordered_cats to ordered categories.
    # Any value that is not in a column's category list becomes NaN.
    elif col in ordered_cats:
        category = pd.CategoricalDtype(ordered_cats[col], ordered=True)
        df_bees_clean[col] = df_bees_clean[col].astype(category)

    # The 'datetime' column is already datetime64 and is left unchanged

df_bees_clean.dtypes
df_bees_clean
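
One thing worth checking around the conversion above: casting a column to a `CategoricalDtype` turns every value that is not in the category list into `NaN` without raising an error. The following is a minimal sketch on made-up data, not on plants_and_bees.csv, and the helper name `unlisted_values` is hypothetical. It shows one way to list such values before casting.

```python
import pandas as pd

def unlisted_values(series, categories):
    """Return the distinct non-missing values of `series` that are absent from `categories`."""
    present = series.dropna().unique()
    return sorted(set(present) - set(categories))

# Toy data for illustration only; not taken from plants_and_bees.csv
toy = pd.Series(['ground', 'wood', 'hive', 'wood/shell', None])
allowed = ['ground', 'hive', 'wood']

# Values that the cast below would silently drop
missing_from_list = unlisted_values(toy, allowed)

# Casting keeps the listed values and replaces the unlisted one with NaN
cast = toy.astype(pd.CategoricalDtype(allowed, ordered=True))
nan_before, nan_after = toy.isna().sum(), cast.isna().sum()
```

In the notebook, the same helper could be called once per key of `ordered_cats` before the loop runs, so that a typo in a category list shows up as a non-empty result rather than as unexpected missing values.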

Which plants are preferred by native vs non-native bee species?

# Take the two bar colors from seaborn's colorblind-friendly palette
palette = sns.color_palette('colorblind')
native_color, non_native_color = palette[0], palette[1]

# Keep only samples taken from a plant ('None' marks samples taken from the air)
selected_bees = df_bees_clean[df_bees_clean['plant_species'] != 'None']

# Count records per plant species, split by bee origin.
# 'nonnative_bee' describes the bee; 'native_or_non' describes the plot, not the bee.
plant_counts_by_bee = (selected_bees
                       .groupby(['plant_species', 'nonnative_bee'], observed=True)
                       .size()
                       .unstack(fill_value=0)
                       .reindex(columns=['No', 'Yes'], fill_value=0))
plant_counts_by_bee.columns = ['native', 'non-native']

# Sum the counts for each plant species across native and non-native bees
total_counts = plant_counts_by_bee.sum(axis=1)

# Keep the 20 plant species with the most bee records
top_plants = total_counts.nlargest(20)

# Difference in counts between native and non-native bees for those plants
difference_counts = plant_counts_by_bee['native'] - plant_counts_by_bee['non-native']
top_plants_difference = difference_counts[top_plants.index]

# Bar plot of the absolute difference, colored by which bee group is more frequent
plt.figure(figsize=(10, 6))
bars = plt.bar(top_plants_difference.index.astype(str), top_plants_difference.abs().values,
               color=[native_color if d > 0 else non_native_color for d in top_plants_difference])
plt.xlabel('Plant Species')
plt.ylabel('Difference in Bee Counts (Absolute)')
plt.title('Native vs Non-Native Bee Counts for Top Plants')
plt.xticks(rotation=45, ha='right')

# Add the value above each bar
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width() / 2, height, f'{int(height):d}', ha='center', va='bottom')

# Add a custom legend (square markers only, no connecting line)
legend_native = plt.Line2D([], [], color=native_color, marker='s', markersize=10, linestyle='None', label='More native bees')
legend_non_native = plt.Line2D([], [], color=non_native_color, marker='s', markersize=10, linestyle='None', label='More non-native bees')
plt.legend(handles=[legend_native, legend_non_native], loc='upper right')

# Show the plot
plt.show()
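
The challenge also asks for the top three plant species to recommend for native bees. A minimal sketch of one possible ranking follows: it counts the native-bee records collected on each plant. The function name `top_plants_for_native_bees` is hypothetical, the toy records are invented for illustration, and the `'No'`/`'Yes'` labels assume the cleaned `nonnative_bee` column from earlier in this notebook.

```python
import pandas as pd

def top_plants_for_native_bees(df, n=3):
    """Rank plant species by the number of native-bee records collected on them."""
    on_plants = df[df['plant_species'] != 'None']           # drop samples taken from the air
    native = on_plants[on_plants['nonnative_bee'] == 'No']  # keep native bees only
    return native['plant_species'].dropna().astype(str).value_counts().head(n)

# Toy records for illustration only; the counts are invented, not results from the data
toy = pd.DataFrame({
    'plant_species': ['Rudbeckia hirta'] * 3 + ['Daucus carota'] * 2
                     + ['Cichorium intybus', 'None'] + ['Trifolium repens'] * 2,
    'nonnative_bee': ['No'] * 3 + ['No'] * 2 + ['No', 'No'] + ['Yes'] * 2,
})
ranking = top_plants_for_native_bees(toy)
```

In the notebook the function would be called as `top_plants_for_native_bees(df_bees_clean)`. Raw counts do not correct for how often each plant was sampled, so the ranking is best read as a starting point for the recommendation rather than a final answer.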

A visualization of the distribution of bee and plant species across one of the samples.
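
A minimal sketch of how such a visualization could be built: it filters the data to one `sample_id`, cross-tabulates plant species against bee species, and draws stacked bars. The function name `plot_sample_distribution` and the toy rows are hypothetical and used for illustration only.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_sample_distribution(df, sample_id):
    """Stacked bars of bee records per plant species, split by bee species, for one sample."""
    sample = df[df['sample_id'] == sample_id]
    # Rows are plant species, columns are bee species, cells are record counts
    counts = pd.crosstab(sample['plant_species'].astype(str),
                         sample['bee_species'].astype(str))
    ax = counts.plot(kind='bar', stacked=True, figsize=(10, 6))
    ax.set_xlabel('Plant species')
    ax.set_ylabel('Number of bee records')
    ax.set_title(f'Bee species by plant species in sample {sample_id}')
    ax.legend(title='Bee species')
    return counts, ax

# Toy rows for illustration only; sample IDs and counts are invented
toy = pd.DataFrame({
    'sample_id': [1, 1, 1, 1, 2],
    'plant_species': ['Daucus carota', 'Daucus carota', 'Rudbeckia hirta',
                      'Rudbeckia hirta', 'Daucus carota'],
    'bee_species': ['Apis mellifera', 'Bombus impatiens', 'Apis mellifera',
                    'Apis mellifera', 'Apis mellifera'],
})
counts, ax = plot_sample_distribution(toy, sample_id=1)
```

In the notebook this would be called with `df_bees_clean` and a sample ID from the data, followed by `plt.show()`. If the chosen sample covers a single plant species, the chart reduces to one stacked bar, which still shows the mix of bee species in that sample.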
