Sowing Success: How Machine Learning Helps Farmers Select the Best Crops
Measuring essential soil metrics such as nitrogen, phosphorus, potassium levels, and pH is crucial for assessing soil health. However, this testing can be expensive and time-consuming, often forcing farmers to prioritize which metrics they measure based on budget constraints.
When choosing which crop to plant each season, farmers aim to maximize crop yield by considering factors such as soil quality. Each crop has specific ideal soil conditions that support optimal growth.
A farmer has approached us for assistance in selecting the best crop for their field using machine learning. They provided a dataset, soil_measures.csv, containing:
- "N": Nitrogen content ratio in the soil
- "P": Phosphorus content ratio in the soil
- "K": Potassium content ratio in the soil
- "ph": pH value of the soil
- "crop": categorical values that contain various crops (target variable)
Each row in the dataset represents a sample of soil measurements from a particular field, along with the corresponding optimal crop.
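Before modeling, it can help to confirm that a loaded file actually has the columns described above. The following is a minimal sketch, not part of the original project: the helper name `missing_columns` and the constant `EXPECTED_COLUMNS` are hypothetical, and the lowercase column name "ph" is taken from the code used later in this article.

```python
# Column names as used in the modeling code later in this article
EXPECTED_COLUMNS = ("N", "P", "K", "ph", "crop")


def missing_columns(columns):
    """Return the expected column names absent from `columns`, in order."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


# A complete header passes the check; a header without "ph" is flagged
print(missing_columns(["N", "P", "K", "ph", "crop"]))  # → []
print(missing_columns(["N", "P", "K", "crop"]))        # → ['ph']
```

Once the DataFrame is loaded, the same check could be run as missing_columns(crops.columns).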
🎯 Project Objective
In this project, we will:
- Build multi-class classification models using logistic regression
- Evaluate how well each individual feature predicts the crop
- Identify the single most important soil metric for accurate crop prediction
This work helps guide cost-effective testing by identifying which soil metric contributes the most predictive value.
📥 1. Load and Inspect the Data
We'll begin by importing the necessary libraries and loading the dataset soil_measures.csv.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
# Load the dataset
crops = pd.read_csv("soil_measures.csv")
print(crops.head())
🧼 2. Data Overview
This dataset includes four numeric features:
- Nitrogen (N)
- Phosphorus (P)
- Potassium (K)
- pH value (ph)
Our target is a categorical label: crop, with 22 unique crop types.
We'll also confirm there are no missing values.
# Count missing values per column
print(crops.isna().sum().sort_values())
# List the distinct crop labels
print(crops["crop"].unique())
🤖 3. Model Training
We'll build separate logistic regression models using one soil metric at a time to predict the crop.
This helps determine which individual feature has the strongest predictive power.
We use:
- LogisticRegression from scikit-learn
- F1-score (weighted) for performance evaluation
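To make the weighted F1-score concrete: it is the mean of the per-class F1-scores, with each class weighted by its share of the true labels. The sketch below computes it by hand in plain Python. The function name `weighted_f1` and the toy labels are invented for illustration only; in the project itself, scikit-learn's f1_score with average="weighted" does this work.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1-scores."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Each class contributes in proportion to its share of the true labels
        score += (n_true / total) * f1
    return score


# Toy labels, invented purely for illustration (not taken from the dataset)
y_true = ["rice", "rice", "rice", "maize", "maize", "lentil"]
y_pred = ["rice", "rice", "maize", "maize", "maize", "rice"]
print(round(weighted_f1(y_true, y_pred), 3))  # → 0.6
```

Here "rice" scores 2/3, "maize" scores 0.8, and "lentil" scores 0 because it is never predicted. Weighting by class frequency (3/6, 2/6, 1/6) gives 0.6.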
# Separate the features from the target
X = crops.drop("crop", axis=1)
y = crops["crop"]
# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
📈 4. Performance Evaluation
The following loop computes the weighted F1-score for each feature when used alone in a multinomial logistic regression model:
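feature_performance = {}
# Train and score one model per soil metric
for feature in ["N", "P", "K", "ph"]:
    # With the default lbfgs solver, scikit-learn fits a multinomial model for multi-class targets
    log_reg = LogisticRegression()
    log_reg.fit(X_train[[feature]].values, y_train)
    y_pred = log_reg.predict(X_test[[feature]].values)
    feature_performance[feature] = f1_score(y_test, y_pred, average="weighted")
    print(f"F1-score for {feature}: {feature_performance[feature]}")
print(feature_performance)
# Keep the metric with the highest weighted F1-score
best_feature = max(feature_performance, key=feature_performance.get)
best_predictive_feature = {best_feature: feature_performance[best_feature]}
print(best_predictive_feature)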
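One way to sanity-check this per-feature comparison is to run the same procedure on synthetic data where the informative feature is known in advance. The sketch below does that under stated assumptions: the function name `rank_features`, the feature names "signal" and "noise", and the max_iter=1000 setting are all hypothetical choices made for illustration. They are not part of the original project, and the data is randomly generated rather than drawn from soil_measures.csv.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def rank_features(X, y, names, seed=42):
    """Fit one logistic regression per column; return {name: weighted F1}."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    scores = {}
    for i, name in enumerate(names):
        # max_iter is raised here as an assumption, to give the solver room to converge
        model = LogisticRegression(max_iter=1000)
        model.fit(X_train[:, [i]], y_train)
        y_pred = model.predict(X_test[:, [i]])
        scores[name] = f1_score(y_test, y_pred, average="weighted")
    return scores


# Synthetic stand-in data: "signal" separates the classes, "noise" does not
rng = np.random.default_rng(0)
y = np.repeat(["a", "b", "c"], 100)
signal = np.concatenate([rng.normal(m, 1.0, 100) for m in (0.0, 5.0, 10.0)])
noise = rng.normal(0.0, 1.0, 300)
X = np.column_stack([signal, noise])

scores = rank_features(X, y, ["signal", "noise"])
best = max(scores, key=scores.get)
print(scores)
print("Best single feature:", best)
```

If the procedure works as intended, "signal" should score far higher than "noise", which is the same pattern the real analysis looks for among N, P, K, and ph.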