Which plants are better for bees: native or non-native?
📖 Background
You work for the local government environment agency and have taken on a project about creating pollinator bee-friendly spaces. You can use both native and non-native plants to create these spaces and therefore need to ensure that you use the correct plants to optimize the environment for these bees.
The team has collected data on native and non-native plants and their effects on pollinator bees. Your task will be to analyze this data and provide recommendations on which plants create an optimized environment for pollinator bees.
💪 Challenge
Provide your agency with a report that covers the following:
- Which plants are preferred by native vs non-native bee species?
- A visualization of the distribution of bee and plant species across one of the samples.
- Select the top three plant species you would recommend to the agency to support native bees.
Preprocessing
In this step, we preprocess the data so that it can be used in the analysis that follows.
Imports
We will need these modules for the project:
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import pandas as pd
import numpy as np
Loading the data
data = pd.read_csv("data/plants_and_bees.csv")
display(data)
Checking for missing values
prep = data.copy()
prep.isna().sum()
There are too many missing values for specialized_on and status. We have to delete status, but not specialized_on, as we will need it in the first challenge.
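To judge how many missing values is "too many", it can help to look at the share of missing values per column rather than the raw counts. The sketch below uses a small made-up frame, not the real dataset: only the column names specialized_on and status come from this report, and bee_species and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset. Only the column names
# specialized_on and status come from the report; the rest is made up.
toy = pd.DataFrame({
    "bee_species": ["a", "b", "c", "d"],
    "specialized_on": [np.nan, np.nan, "plant_x", np.nan],
    "status": [np.nan, np.nan, np.nan, np.nan],
})

# isna() gives a boolean frame; the column mean is the fraction missing.
missing_pct = toy.isna().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```

The same expression should work on prep in place of toy, which gives a quick way to decide which columns to drop and which to fill.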
prep = prep.drop('status', axis=1)
For specialized_on, we can replace all missing values with 'None', treating a missing value as meaning that the bee is not specialized on any plant.
prep['specialized_on'] = prep['specialized_on'].fillna(value='None')
data['specialized_on'] = data['specialized_on'].fillna(value='None')
For the other columns, we can simply drop the rows that have missing values, since only a few rows are affected.
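A minimal sketch of that row-dropping step, again on a small made-up frame rather than the real data (bee_species, plant_species and all values are hypothetical). It also counts how many rows are lost, to check that the loss really is small.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: specialized_on is already filled with 'None',
# while two other (made-up) columns still contain missing values.
toy = pd.DataFrame({
    "bee_species": ["a", None, "c", "d"],
    "specialized_on": ["None", "None", "plant_x", "None"],
    "plant_species": ["p1", "p2", np.nan, "p4"],
})

rows_before = len(toy)

# Drop every row that still has at least one missing value.
cleaned = toy.dropna().reset_index(drop=True)

rows_dropped = rows_before - len(cleaned)
print(f"Dropped {rows_dropped} of {rows_before} rows")
```

Applied to the report's own frame, this would presumably be prep = prep.dropna(), run after the fillna step so that the filled specialized_on column does not cause extra rows to be removed.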