INSPIRATION
Every day we sign up on websites or mobile apps that require us to provide passwords, and we are expected to make those passwords hard for attackers to crack. But how do we know what a 'bad password' is? The National Institute of Standards and Technology (NIST) addresses this in NIST Special Publication 800-63B, which sets out guidelines to help make sure you are at least not using a bad password.
Taking the guidelines from NIST, adding other common ways of preventing bad passwords, and using the usernames and passwords of a fictional company, this project checks whether passwords meet the following guidelines:
- Must be at least 8 characters long
- Must have at least one digit
- Must have at least one letter
- Must have at least one upper case letter
- Must have at least one lower case letter
- Must have at least one special character
- Must not have more than four repeating characters
- Must not be a common English word
- Must not be in one of the common passwords lists
- Must not be the name of the user
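The character-based rules above can be tried out on a single string with only the standard library. The following is a minimal sketch, not the approach used in the rest of this project: the function name `check_password` and the rule names are hypothetical, "special character" is assumed to mean anything that is not an ASCII letter or digit, and "more than four repeating characters" is assumed to mean five or more identical characters in a row. The word-list and username rules are left out because they need external data.

```python
import re


def check_password(password: str) -> dict:
    """Map each rule name to True if the password passes that rule."""
    return {
        "min_length": len(password) >= 8,
        "has_digit": re.search(r"\d", password) is not None,
        "has_upper": re.search(r"[A-Z]", password) is not None,
        "has_lower": re.search(r"[a-z]", password) is not None,
        # Assumption: "special" means not an ASCII letter or digit.
        "has_special": re.search(r"[^A-Za-z0-9]", password) is not None,
        # Assumption: five or more identical consecutive characters fail.
        "no_long_repeat": re.search(r"(.)\1{4}", password) is None,
    }


# List the rules a made-up example password fails.
result = check_password("Passw0rd!")
failed = [rule for rule, ok in result.items() if not ok]
print(failed)
```

The "at least one letter" rule is implied whenever either of the upper case or lower case checks passes, so it is not listed separately here.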
The idea for this project and the users CSV file both come from the DataCamp project Bad Passwords and the NIST Guidelines.
NOTE: The list of passwords and the fictional user database both contain real passwords leaked from real websites. These passwords have not been filtered in any way and include words that are explicit, derogatory and offensive.
Load Data
# Import necessary libraries.
import pandas as pd
import re
import numpy
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
# Loading in dataset
users = pd.read_csv('users.csv')
# Taking a look at first 12 users
users.head(12)
# Getting total users
print(len(users))
Password Guidelines Analysis
# Defining a function for plotting data
def plot_bar(col, title):
    # Bar chart of how many users fall into each category
    col.value_counts().plot(kind = 'bar')
    plt.title(title)
    plt.xlabel('State of User')
    plt.xticks(rotation = 0)
    plt.ylabel('Count of Users')
    total = len(col)
    # Label each bar with its share of all users
    for i, value in enumerate(col.value_counts()):
        percentage = value / total * 100
        plt.annotate(f"{percentage:.1f}%", (i, value), ha='center', va='bottom')
    plt.show()
1. Must be at least 8 characters long.
# Calculating the lengths of users' passwords
users['length'] = users['password'].str.len()
# Flagging the users with too short passwords
users['too_short'] = users['length'] < 8
# Taking a look at the first 12 rows
users.head(12)
# Visualizing
plot_bar(users['too_short'], 'Users with Passwords More or Less Than 8 Characters')
It can be observed from the chart above that about 62% of users have passwords of at least 8 characters, while 38% do not.
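The same share can be read off without a chart, because the mean of a boolean column is the fraction of True values. The sketch below illustrates this on a small made-up DataFrame; the name `toy` and the passwords in it are hypothetical and are not taken from users.csv.

```python
import pandas as pd

# Hypothetical passwords for illustration only; not taken from users.csv.
toy = pd.DataFrame({"password": ["abc123", "longenough1", "12345678", "short"]})

# Same two steps as above: measure the length, then flag anything under 8.
toy["length"] = toy["password"].str.len()
toy["too_short"] = toy["length"] < 8

# The mean of a boolean column is the fraction of True values.
pct_too_short = toy["too_short"].mean() * 100
print(f"{pct_too_short:.1f}% of passwords are too short")
```

Note that a password of exactly 8 characters, such as "12345678" here, is not flagged, since the rule is "at least 8".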
2. Must have at least 1 digit
# Flagging the users whose passwords contain at least one digit
users['has_number'] = users['password'].str.contains(r'\d+')
# Taking a look at the first 12 rows
users.head(12)
# Visualizing
plot_bar(users['has_number'], 'Users with Passwords Having at Least One Digit')
It can be observed that 59% of the users have at least one digit in their passwords, while 41% do not.
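One detail worth knowing about `str.contains` is how it treats missing values: by default a missing password yields a missing result rather than False. The sketch below shows the `na=False` option on a small made-up Series; the name `passwords` and its values are hypothetical, and whether users.csv actually contains missing passwords is not something this article establishes.

```python
import pandas as pd

# Hypothetical passwords for illustration only; None stands in for a missing value.
passwords = pd.Series(["hunter2", "password", None, "2020"])

# na=False counts a missing password as "no digit" instead of leaving it missing.
# r"\d" and r"\d+" give the same answer here, since one digit is enough to match.
has_number = passwords.str.contains(r"\d", na=False)
print(has_number.tolist())
```

With `na=False` the resulting column holds only True and False, so every user is counted in one of the two bars.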