Competition - The best plants for bees
  • Which plants are better for bees: native or non-native?

    📖 Background

    You work for the local government environment agency and have taken on a project about creating pollinator bee-friendly spaces. You can use both native and non-native plants to create these spaces and therefore need to ensure that you use the correct plants to optimize the environment for these bees.

    The team has collected data on native and non-native plants and their effects on pollinator bees. Your task will be to analyze this data and provide recommendations on which plants create an optimized environment for pollinator bees.

    💪 Challenge

    Provide your agency with a report that covers the following:

    • Which plants are preferred by native vs non-native bee species?
    • A visualization of the distribution of bee and plant species across one of the samples.
    • Select the top three plant species you would recommend to the agency to support native bees.

    Preprocessing

    In this step, we preprocess the data so that the analysis code in later sections can use it.

    Imports

    The project uses the following modules:

    import matplotlib.pyplot as plt
    import plotly.express as px
    import seaborn as sns
    import pandas as pd
    import numpy as np

    Loading the data

    data = pd.read_csv("data/plants_and_bees.csv")
    display(data)

    Checking for missing values

    prep = data.copy()
    prep.isna().sum()
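
    Raw counts from isna().sum() are easier to judge when expressed as a share of all rows. The following is a minimal sketch on a small made-up frame; the toy values and the plant_species column name are illustrative assumptions, not taken from plants_and_bees.csv.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; all values here are invented for illustration.
toy = pd.DataFrame({
    "specialized_on": [np.nan, "Penstemon", np.nan, np.nan],
    "status": [np.nan, np.nan, np.nan, "vulnerable"],
    "plant_species": ["A", "B", None, "C"],  # hypothetical column name
})

# isna().mean() gives the fraction of missing entries in each column.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

    Columns with a high share are candidates for dropping or filling; columns with a low share can usually be handled by removing the affected rows.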

    The specialized_on and status columns have too many missing values to fix by removing rows. We drop status entirely but keep specialized_on, because we need it for the first challenge question.

    prep = prep.drop('status', axis=1)

    For specialized_on, we replace every missing value with the string 'None', on the assumption that a missing entry means the bee does not specialize on any plant.

    prep['specialized_on'] = prep['specialized_on'].fillna(value='None')
    data['specialized_on'] = data['specialized_on'].fillna(value='None')

    For the remaining columns, we can simply remove the rows that contain missing values, since only a few rows are affected.
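
    The corresponding cell is not shown here, so the following is only a sketch of how this step could look, assuming a plain dropna() over the remaining columns. The toy frame and the column names plant_species and bee_species are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for prep after 'status' was dropped and 'specialized_on' was filled.
toy = pd.DataFrame({
    "plant_species": ["A", None, "C", "D"],      # hypothetical column name
    "bee_species": ["x", "y", np.nan, "z"],      # hypothetical column name
    "specialized_on": ["None", "None", "Penstemon", "None"],
})

rows_before = len(toy)

# Drop every row that still has at least one missing value, then renumber the index.
cleaned = toy.dropna().reset_index(drop=True)

rows_dropped = rows_before - len(cleaned)
print(f"Dropped {rows_dropped} of {rows_before} rows")
```

    Because specialized_on was filled with the string 'None' rather than left as NaN, dropna() does not treat those entries as missing, so the rows are kept. Printing the number of dropped rows is a quick way to confirm that little data is lost.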
