
💾 The Data

You have assembled information on the plants and bees research in a file called plants_and_bees.csv. Each row represents a sample that was taken from a patch of land where the plant species were being studied.

| Column | Description |
| --- | --- |
| sample_id | The ID number of the sample taken. |
| species_num | The number of different bee species in the sample. |
| date | Date the sample was taken. |
| season | Season during sample collection ("early.season" or "late.season"). |
| site | Name of collection site. |
| native_or_non | Whether the sample was from a native or non-native plant. |
| sampling | The sampling method. |
| plant_species | The name of the plant species the sample was taken from. None indicates the sample was taken from the air. |
| time | The time the sample was taken. |
| bee_species | The bee species in the sample. |
| sex | The gender of the bee species. |
| specialized_on | The plant genus the bee species preferred. |
| parasitic | Whether or not the bee is parasitic (0:no, 1:yes). |
| nesting | The bee's nesting method. |
| status | The status of the bee species. |
| nonnative_bee | Whether the bee species is native or not (0:no, 1:yes). |

Source (data has been modified)

import matplotlib.pyplot as plt

💪 Challenge

Provide your agency with a report that covers the following:

  • Which plants are preferred by native vs non-native bee species?
  • A visualization of the distribution of bee and plant species across one of the samples.
  • Select the top three plant species you would recommend to the agency to support native bees.
import pandas as pd
data = pd.read_csv("data/plants_and_bees.csv")
data
data.info()
data.describe()

The dataset has 1,250 rows and 16 columns, and several columns contain missing values (status, specialized_on, nesting and nonnative_bee).

data.isna().sum()/len(data)*100
data.value_counts("sampling")

99.44% of the values in the specialized_on and status columns are missing, so we can drop both columns.

data = data.drop(columns=['specialized_on','status'])
#data = data[data["plant_species"] !="None"]
data.value_counts("sampling")
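The drop above names the two columns by hand. As a minimal sketch of the same idea, the columns to drop could be selected from the missing-value share instead. The helper name `drop_sparse_columns`, the 0.9 threshold and the toy frame are assumptions made for illustration; they are not part of the original analysis or the real data.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; the values are made up.
toy = pd.DataFrame({
    "bee_species": ["a", "b", "c", "d"],
    "status": [np.nan, np.nan, np.nan, np.nan],
    "nesting": ["ground", np.nan, "wood", "ground"],
})

def drop_sparse_columns(df, threshold=0.9):
    """Return a copy of df without columns whose missing share exceeds threshold."""
    # isna().mean() gives the fraction of missing values per column.
    missing_share = df.isna().mean()
    sparse = missing_share[missing_share > threshold].index
    return df.drop(columns=sparse)

cleaned = drop_sparse_columns(toy, threshold=0.9)
print(list(cleaned.columns))
```

On the real data the call would presumably be `drop_sparse_columns(data)`, with the threshold chosen after looking at the percentages printed earlier.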
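import plotly.graph_objects as go
import plotly.express as px

# Sort the rows by plant species name.
data = data.sort_values('plant_species')

# Horizontal histogram: one bar per plant species, coloured by bee sex.
fig = px.histogram(data_frame=data,
                   y="plant_species", title='Plant species by bee sex', color='sex')
# The x-axis holds counts, so cap it at the largest per-species count.
fig.update_layout(xaxis=dict(range=[0, data['plant_species'].value_counts().max()]))
fig.show()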

Female bees make up the large majority of the bees found on plants:

  • 1. Leucanthemum vulgare has the largest count, with 101 females and 4 males
  • 2. Rudbeckia hirta is second, with 49 females and 10 males
  • 3. Daucus carota is third, with 27 females and 6 males
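The counts listed above can also be read from a table rather than from the chart. The following is a rough sketch using `pd.crosstab`; the toy rows and the sex codes "f" and "m" are assumptions for illustration and do not come from the real file.

```python
import pandas as pd

# Toy rows standing in for the real data; the values are made up.
toy = pd.DataFrame({
    "plant_species": ["Rudbeckia hirta", "Rudbeckia hirta",
                      "Daucus carota", "Daucus carota", "Daucus carota"],
    "sex": ["f", "m", "f", "f", "m"],
})

# One row per plant species, one column per sex, cells are row counts.
sex_counts = pd.crosstab(toy["plant_species"], toy["sex"])

# Add a total and list the most frequently sampled plants first.
sex_counts["total"] = sex_counts.sum(axis=1)
sex_counts = sex_counts.sort_values("total", ascending=False)
print(sex_counts)
```

Replacing `toy` with `data` should give the per-species female and male counts directly, provided the sex column uses consistent codes.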
data.value_counts("plant_species")
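The first challenge question asks which plants are preferred by native versus non-native bee species. One possible way to approach it is to cross-tabulate plant species against the nonnative_bee flag. This is only a sketch: it assumes that 1 in nonnative_bee means non-native and 0 means native, and the toy rows are made up rather than taken from the real file.

```python
import pandas as pd

# Toy rows standing in for the real data; the values are made up.
toy = pd.DataFrame({
    "plant_species": ["Rudbeckia hirta", "Rudbeckia hirta",
                      "Daucus carota", "Daucus carota", "Daucus carota"],
    "nonnative_bee": [0, 0, 1, 0, 1],
})

# Assumption: 0 = native bee, 1 = non-native bee.
labels = toy["nonnative_bee"].map({0: "native", 1: "non-native"})

# Count sampled bees per plant species, split by native / non-native.
visits = pd.crosstab(toy["plant_species"], labels)

# Plants with the most native-bee observations, up to three.
top_native = visits["native"].sort_values(ascending=False).head(3)
print(visits)
print(top_native)
```

On the real data, rows where plant_species is missing (samples taken from the air) would probably need to be excluded first, since they do not correspond to a plant.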