Competition - Bee friendly plants
💾 The Data
You have assembled information on the plants and bees research in a file called plants_and_bees.csv. Each row represents a sample that was taken from a patch of land where the plant species were being studied.
Column | Description |
---|---|
sample_id | The ID number of the sample taken. |
species_num | The number of different bee species in the sample. |
date | Date the sample was taken. |
season | Season during sample collection ("early.season" or "late.season"). |
site | Name of collection site. |
native_or_non | Whether the sample was from a native or non-native plant. |
sampling | The sampling method. |
plant_species | The name of the plant species the sample was taken from. None indicates the sample was taken from the air. |
time | The time the sample was taken. |
bee_species | The bee species in the sample. |
sex | The sex of the bee. |
specialized_on | The plant genus the bee species preferred. |
parasitic | Whether or not the bee is parasitic (0:no, 1:yes). |
nesting | The bee's nesting method. |
status | The status of the bee species. |
nonnative_bee | Whether the bee species is non-native (0:no, 1:yes). |
Source (data has been modified)
import matplotlib.pyplot as plt
💪 Challenge
Provide your agency with a report that covers the following:
- Which plants are preferred by native vs non-native bee species?
- A visualization of the distribution of bee and plant species across one of the samples.
- Select the top three plant species you would recommend to the agency to support native bees.
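One way to approach the first question is a cross-tabulation of plant species against bee origin. The sketch below runs on a tiny made-up frame (not the real file) and assumes that `nonnative_bee` equal to 1 means a non-native bee; `toy`, `bee_origin`, and `preference` are illustrative names only.

```python
import pandas as pd

# Tiny made-up sample that mimics two of the columns described above (not real data).
toy = pd.DataFrame({
    "plant_species": ["Daucus carota", "Daucus carota", "Rudbeckia hirta",
                      "Rudbeckia hirta", "Rudbeckia hirta", "Leucanthemum vulgare"],
    "nonnative_bee": [0, 1, 0, 0, 0, 1],
})

# Map the 0/1 flag to readable labels (assumption: 1 means non-native).
toy["bee_origin"] = toy["nonnative_bee"].map({0: "native", 1: "non-native"})

# Count bees per plant species, split by bee origin; absent combinations become 0.
preference = pd.crosstab(toy["plant_species"], toy["bee_origin"])
print(preference)
```

On the real data, rows where `nonnative_bee` is missing would map to NaN and be left out of the table, so the totals may be smaller than the raw row counts.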
import pandas as pd
data = pd.read_csv("data/plants_and_bees.csv")
data
data.info()
data.describe()
We have 1250 rows of data and 16 columns, and some columns contain empty values (such as status, specialized_on, nesting and nonnative_bee).
data.isna().sum()/len(data)*100
data.value_counts("sampling")
The specialized_on and status columns are 99.44% empty, so we can drop them.
data = data.drop(columns=['specialized_on','status'])
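Instead of naming the columns by hand, the same decision could be driven by a missing-value threshold. This is only a sketch on made-up data; the 0.9 cutoff and the names `toy`, `threshold`, `to_drop`, and `cleaned` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up frame: 'status' is entirely empty, 'nesting' is only partly empty.
toy = pd.DataFrame({
    "bee_species": ["a", "b", "c", "d", "e"],
    "status": [np.nan, np.nan, np.nan, np.nan, np.nan],
    "nesting": ["ground", np.nan, "wood", "ground", "hive"],
})

threshold = 0.9  # hypothetical cutoff: drop columns that are more than 90% empty

# Share of missing values per column (0.0 to 1.0).
missing_share = toy.isna().mean()

# Keep only the names of columns above the cutoff, then drop them.
to_drop = missing_share[missing_share > threshold].index.tolist()
cleaned = toy.drop(columns=to_drop)
print(to_drop)
```

A threshold keeps the rule explicit and reusable, at the cost of possibly dropping a column that is sparse but still informative.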
#data = data[data["plant_species"] !="None"]
data.value_counts("sampling")
import plotly.graph_objects as go
import plotly.express as px
data = data.sort_values('plant_species')
fig = px.histogram(data_frame = data,
y="plant_species",title='plant species with sex',color='sex')
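# The x-axis shows counts, so bound it by the largest per-species count (not by a species name)
fig.update_layout(xaxis=dict(range=[0, data['plant_species'].value_counts().max()]))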
fig.show()
Females are the sweeping majority of the bees sampled on plants:
- Leucanthemum vulgare has the largest number, with 101 females and 4 males
- Rudbeckia hirta is second, with 49 females and 10 males
- Daucus carota is third, with 27 females and 6 males
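The counts listed above can also be tabulated directly rather than read off the chart. The sketch below uses a small made-up frame and assumes the `sex` column holds the labels "f" and "m", which may not match the real encoding; `toy` and `sex_counts` are illustrative names.

```python
import pandas as pd

# Made-up rows (not real data); the "f"/"m" labels are an assumption.
toy = pd.DataFrame({
    "plant_species": ["Rudbeckia hirta"] * 3 + ["Daucus carota"] * 2,
    "sex": ["f", "f", "m", "f", "m"],
})

# One row per plant species, one column per sex, zeros where a pair is absent.
sex_counts = toy.groupby(["plant_species", "sex"]).size().unstack(fill_value=0)

# Add a total and rank plants by how many bees were sampled on them.
sex_counts["total"] = sex_counts.sum(axis=1)
sex_counts = sex_counts.sort_values("total", ascending=False)
print(sex_counts)
```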
data.value_counts("plant_species")