
Evaluate your model using the F-score

The F-score or F-measure is a metric for indicating the accuracy of a classification model. After obtaining the precision and recall on your test set, you can calculate the F1-score with the following formula:

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

The maximum possible value is 1, which indicates a perfect model. If either precision or recall is 0, the F1-score is 0 as well.
In the formula above, we state that precision is as important as recall for our application; that is why we set β = 1. We will use this version of the metric in this template via scikit-learn's f1_score() function. However, you can also weight precision or recall differently with the more general Fβ-score.
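As a quick sanity check, the sketch below computes the F1-score by hand from precision and recall and compares it with scikit-learn's f1_score() and fbeta_score() with beta=1. The labels and predictions here are made up for illustration only and are not part of the template's data.

from sklearn.metrics import f1_score, fbeta_score, precision_score, recall_score

# Small made-up example: true labels and binary predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision = precision_score(y_true, y_pred)   # TP / (TP + FP)
recall = recall_score(y_true, y_pred)         # TP / (TP + FN)

# F1 computed directly from the formula above
f1_manual = 2 * precision * recall / (precision + recall)

print(f1_manual)
print(f1_score(y_true, y_pred))               # same value
print(fbeta_score(y_true, y_pred, beta=1))    # F-beta with beta = 1 equals F1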

# Load packages
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
from sklearn.metrics import f1_score
%config InlineBackend.figure_format = 'retina'
# Load data from the csv file
df = pd.read_csv("F1-score_data.csv")
df.head()
# Inspect the data for missing values and incorrect data
df.info()
df['PROBABILITY'].describe()
df['ACTUAL LABEL'].value_counts() 
# Create histogram for 'PROBABILITY' column
plt.hist(df['PROBABILITY'])

First, we need to convert the probabilities predicted by our model to actual binary predictions. In this example, we will use a threshold of 0.65; predictions under this threshold will be mapped to 0, the other predictions to 1.

THRESHOLD = 0.65                # Choose your threshold for which you want to calculate the F-score

# Convert the probability predicted by the model to an actual binary prediction
def convert_to_pred(x):
    if x < THRESHOLD:            # If under threshold,
        return 0                   # map to 0.
    else:                        # Else,
        return 1                   # map to 1.

df['PREDICTION'] = df['PROBABILITY'].apply(convert_to_pred)

df.head()
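The same mapping can also be written without a Python-level function, as a vectorized comparison. This is just an alternative sketch; it produces the same 'PREDICTION' column as the apply() call above.

# Vectorized alternative: the True/False comparison is cast to 1/0
df['PREDICTION'] = (df['PROBABILITY'] >= THRESHOLD).astype(int)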

Now we can go ahead and calculate the F1-score using the scikit-learn package.

f1_score(df['ACTUAL LABEL'], df['PREDICTION'])
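It can also help to report precision and recall next to the F1-score, since the F1-score alone hides which of the two is dragging the result down. A short optional sketch using the same columns:

from sklearn.metrics import precision_score, recall_score

# Precision and recall for the positive class at the chosen threshold
print('Precision:', precision_score(df['ACTUAL LABEL'], df['PREDICTION']))
print('Recall:   ', recall_score(df['ACTUAL LABEL'], df['PREDICTION']))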

Now, it would be easier to automate this process. I am assuming that this data set has an imbalance problem, as in fraud detection. In that case, our threshold must be determined in a robust way, because we are interested in a model that can effectively detect positive cases.

Finding the Optimal Threshold

# search thresholds for imbalanced classification
from numpy import arange
from numpy import argmax
from sklearn.metrics import f1_score


# define thresholds
thresholds = arange(0, 1, 0.1)

# These should be the predicted probabilities for the positive class, but our sample is very small,
# so for practicality all the observations are used.
probs = df['PROBABILITY']  

# apply thresholds to positive probabilities to create labels
def to_labels(probs, threshold):
    return (probs >= threshold).astype(int)

# Calculate scores for each threshold
f1scores = [f1_score(df['ACTUAL LABEL'], to_labels(probs, t)) for t in thresholds]

# get best threshold
ix = argmax(f1scores)
print('Best Threshold=%.3f, F-Score=%.5f' % (thresholds[ix], f1scores[ix]))

A threshold of 30% (0.30) gives the best F-score for our model.
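To see how sensitive the metric is to the threshold, it may also be worth plotting the F1-score across the candidate thresholds. This is a small optional sketch that reuses the thresholds, f1scores, and ix computed above; the red marker is only there to highlight the best threshold.

# Plot the F1-score against each candidate threshold
plt.plot(thresholds, f1scores, marker='o')
plt.scatter(thresholds[ix], f1scores[ix], color='red', label='best threshold')
plt.xlabel('Threshold')
plt.ylabel('F1-score')
plt.legend()
plt.show()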

Thank you!

I appreciate you stopping by. I hope you enjoy the content. If so, please show your appreciation with an upvote!

Also, feel free to check out other materials, too! Happy Learning!