
City Hospitals Analysis

City growth affects each city's hospitals, schools and other services. This growth is driven by people's demands, and it also affects the price of everything. The government wants to open a new hospital, but first it wants to see the number of hospitals in each region, an analysis of patient ages and the total admission deposit of each department.

Project Overview and Objective: Hospital Data Analysis

This project aims to analyze a dataset related to hospital information. The dataset contains various factors within hospitals, and the goal is to understand if these factors impact hospital performance.

Description of Columns:

  1. Unnamed: 0:

    • Contains an unnamed sequence number.
    • This column doesn't provide specific information and won't be used in the analysis.
  2. case_id:

    • Uniquely identifies a hospital case.
    • Contains no missing values.
  3. Hospital_region_code:

    • Indicates the region code of the hospital.
    • Contains no missing values.
  4. Available Extra Rooms in Hospital:

    • Represents the number of available extra rooms in the hospital.
    • Contains no missing values.
  5. Department:

    • Specifies the department in the hospital (e.g., Surgery, Pediatrics, etc.).
    • Contains no missing values.
  6. Ward_Type:

    • Indicates the type of ward where the patient is placed.
    • Contains no missing values.
  7. Ward_Facility_Code:

    • Contains the code of the facility where the ward is located.
    • Contains no missing values.
  8. Type of Admission:

    • Specifies the type of patient admission (Emergency, Elective, etc.).
    • Contains no missing values.
  9. Severity of Illness:

    • Indicates the severity of the patient's illness (Minor, Moderate, Major).
    • Contains no missing values.
  10. Visitors with Patient:

    • Represents the number of visitors with the patient.
    • Contains no missing values.
  11. Age:

    • Specifies the age of the patient.
    • Contains no missing values.
  12. Admission_Deposit:

    • Represents the deposit amount required when admitted to the hospital.
    • Contains no missing values.

The project involves analyzing factors that influence hospital performance and using these insights to predict relevant outcomes. The results of the analysis can be utilized to enhance or optimize the quality of services provided by hospitals.

EDA (Exploratory Data Analysis)

Data Set Review

In this project, we will use a dataset containing hospital data, downloaded from [https://app.gamboo.io/]. It contains information such as Unnamed: 0, case_id, Hospital_region_code, Available Extra Rooms in Hospital, Department, Ward_Type, Ward_Facility_Code, Type of Admission, Severity of Illness, Visitors with Patient and Admission_Deposit. The dataset has a total of 318438 rows and 18 columns.

import pandas as pd
import seaborn as sns
import scipy.stats as stats

# Load the hospital dataset from the local CSV file
data = pd.read_csv("hospitaldataset.csv")
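Because the column list above is what the rest of the analysis relies on, a quick check after loading can catch renamed or missing headers early. The following is a minimal sketch, assuming the CSV headers match the names listed in the column description exactly; the helper `check_columns` and the small inline DataFrame are hypothetical stand-ins for the real file. "Unnamed: 0" is left out because it is not used in the analysis.

```python
import pandas as pd

# Column names copied from the column description (assumed to match the CSV headers)
EXPECTED_COLUMNS = [
    "case_id", "Hospital_region_code", "Available Extra Rooms in Hospital",
    "Department", "Ward_Type", "Ward_Facility_Code", "Type of Admission",
    "Severity of Illness", "Visitors with Patient", "Age", "Admission_Deposit",
]


def check_columns(df, expected):
    """Hypothetical helper: return the expected column names missing from df."""
    return [col for col in expected if col not in df.columns]


# Toy stand-in for pd.read_csv("hospitaldataset.csv")
toy = pd.DataFrame({col: [1, 2] for col in EXPECTED_COLUMNS})

print(toy.shape)                             # (2, 11)
print(check_columns(toy, EXPECTED_COLUMNS))  # [] when every column is present
```

On the real data, `check_columns(data, EXPECTED_COLUMNS)` returning a non-empty list would point to a header mismatch before any plotting starts.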

Data Cleaning and Preprocessing

Before analysing the dataset, we will perform data cleaning and pre-processing steps. These steps are as follows:

We will use the data.head() and data.info() functions to examine the structure and properties of the data set.

We will use the data.isnull().sum() function to check the data set for missing values. If any are found, we will fill them with the mean of the relevant column using data.fillna(data.mean(numeric_only=True)). The numeric_only=True argument restricts the means to numeric columns, since text columns such as Department have no mean.
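A minimal sketch of the missing-value check and mean imputation, run on a small, made-up DataFrame rather than the real dataset. In pandas 2.0 and later, calling `mean()` on a frame that contains text columns raises a TypeError, so the sketch passes `numeric_only=True`.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing numeric value and one missing text value
df = pd.DataFrame({
    "Department": ["gynecology", "surgery", None],
    "Visitors with Patient": [2.0, np.nan, 4.0],
    "Admission_Deposit": [4000.0, 5000.0, 6000.0],
})

# Count missing values in each column
missing = df.isnull().sum()
print(missing)

# Fill numeric gaps with each column's mean; text columns are skipped
# because they do not appear in the Series returned by mean(numeric_only=True)
filled = df.fillna(df.mean(numeric_only=True))
print(filled)
```

The missing Department value is left untouched, so text columns would need their own strategy (for example, filling with the most frequent value).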

We will use the function pd.get_dummies() to numerically encode the categorical variables in the dataset. This will make the dataset suitable for machine learning models.
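As a sketch of what one-hot encoding produces, the example below encodes the Severity of Illness column of a small, made-up DataFrame. Each category becomes its own 0/1 column named `<column>_<value>`, and non-encoded columns such as Admission_Deposit are kept as they are.

```python
import pandas as pd

# Made-up rows using the severity levels named in the column description
df = pd.DataFrame({
    "Severity of Illness": ["Minor", "Moderate", "Major"],
    "Admission_Deposit": [4000.0, 5000.0, 6000.0],
})

# One 0/1 indicator column per severity level; dtype=int avoids boolean columns
encoded = pd.get_dummies(df, columns=["Severity of Illness"], dtype=int)
print(encoded)
```

Because Severity of Illness is ordered (Minor < Moderate < Major), mapping it to integers is an alternative worth considering; one-hot encoding discards that ordering.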

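# Inspect the first rows and the column types
print(data.head())
data.info()
# Count missing values per column, then show the mean of each numeric column
print(data.isnull().sum())
print(data.mean(numeric_only=True))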

Analyses and Visualisations

After cleaning and pre-processing the dataset, we will use various graphs to analyse and visualise the relationships and distributions between variables in the dataset. These graphs are the following:

Regional Hospital Number & Analysis

# Each row is one case, so this counts cases (rows) per region code
regional_hospital_counts = data['Hospital_region_code'].value_counts()
print("Number of Regional Hospitals:")
print(regional_hospital_counts)

import matplotlib.pyplot as plt

# Visualise the number of cases in each hospital region
plt.figure(figsize=(10, 6))
sns.countplot(x='Hospital_region_code', data=data)
plt.title('Regional Hospital Number & Analysis')
plt.show()
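Since every row is one case identified by case_id, `value_counts()` on Hospital_region_code counts cases per region rather than distinct hospitals; counting hospitals would need a hospital identifier, which is not among the columns described above. A minimal sketch on made-up data (the region codes X, Y and Z are illustrative) that reports both the counts and each region's share of all cases:

```python
import pandas as pd

# Made-up cases; region codes are illustrative only
df = pd.DataFrame({
    "case_id": [1, 2, 3, 4, 5],
    "Hospital_region_code": ["X", "Y", "X", "Z", "X"],
})

# Number of cases per region and the share of all cases each region holds
counts = df["Hospital_region_code"].value_counts()
shares = df["Hospital_region_code"].value_counts(normalize=True)

summary = pd.DataFrame({"cases": counts, "share": shares.round(3)})
print(summary)
```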

Age Analysis

# Number of cases in each age group
age_analysis = data['Age'].value_counts()
print("Age Analysis")
print(age_analysis)

# Analyses on age groups
plt.figure(figsize=(12, 8))
sns.countplot(x='Age', data=data, hue='Department')
plt.title('Departments by Age Groups')
plt.show()
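The countplot above has a tabular counterpart in `pd.crosstab`, which gives the exact number of cases for every age group and department pair. The sketch below uses made-up rows and assumes Age is stored as age-range labels (e.g. "21-30"); if it is numeric in the real file, it would first need binning, for example with `pd.cut`.

```python
import pandas as pd

# Made-up rows; the age-range labels are an assumption about the Age column
df = pd.DataFrame({
    "Age": ["21-30", "31-40", "21-30", "31-40", "21-30"],
    "Department": ["gynecology", "gynecology", "surgery", "anesthesia", "gynecology"],
})

# Case counts for each (age group, department) pair; absent pairs show 0
table = pd.crosstab(df["Age"], df["Department"])
print(table)

# Within each age group, the share of cases going to each department
shares = pd.crosstab(df["Age"], df["Department"], normalize="index")
print(shares.round(2))
```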

Total Deposits of Each Department

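We first compute the total admission deposit of each department. The bar chart then shows the average deposit per department, and the box plot shows the distribution of deposits in each hospital region, which can help us understand the impact of regional differences on deposits.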

department_deposit_sum = data.groupby('Department')['Admission_Deposit'].sum()
print("Total Deposits of Each Department")
print(department_deposit_sum)

# Average deposit in each department (barplot's default estimator is the mean)
plt.figure(figsize=(10, 6))
sns.barplot(x='Department', y='Admission_Deposit', data=data)
plt.title('Average Deposit in Each Department')
plt.show()
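The bar chart shows averages while the printed table shows totals, and the two can rank departments differently when departments have different numbers of cases. A minimal sketch on made-up deposits that computes the sum, mean and count side by side:

```python
import pandas as pd

# Made-up deposits for two departments with different case counts
df = pd.DataFrame({
    "Department": ["surgery", "surgery", "gynecology", "gynecology", "gynecology"],
    "Admission_Deposit": [6000.0, 4000.0, 5000.0, 3000.0, 4000.0],
})

# Total, average and number of cases per department
summary = df.groupby("Department")["Admission_Deposit"].agg(["sum", "mean", "count"])
print(summary)
```

Here gynecology has the larger total (12000 vs 10000) but the smaller average (4000 vs 5000), simply because it has more cases. To chart the totals instead of the averages, one option is to plot the aggregated Series directly, for example `department_deposit_sum.plot(kind='bar')`.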

# Distribution of deposits in each hospital region
plt.figure(figsize=(12, 8))
sns.boxplot(x='Hospital_region_code', y='Admission_Deposit', data=data)
plt.title('Distribution of Deposits in Each Hospital Region')
plt.show()
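A box plot draws each group's median as the middle line and the first and third quartiles as the box edges (seaborn extends the whiskers to 1.5 times the interquartile range by default). A minimal sketch on made-up deposits that prints those same numbers per region with `describe()`:

```python
import pandas as pd

# Made-up deposits for two illustrative region codes
df = pd.DataFrame({
    "Hospital_region_code": ["X", "X", "X", "Y", "Y", "Y"],
    "Admission_Deposit": [3000.0, 4000.0, 5000.0, 4000.0, 6000.0, 8000.0],
})

# Quartiles per region: the values a box plot draws as box edges and median line
stats_by_region = df.groupby("Hospital_region_code")["Admission_Deposit"].describe()
quartiles = stats_by_region[["25%", "50%", "75%"]]
print(quartiles)
```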

Regional Distribution of the Number of Patients