KHOJ - Know Your High Court Judges
  • Exploratory Data Analysis on KHOJ using Python

    👉 Introduction

    In this notebook, you are going to do data analysis on an interesting dataset named KHOJ - Know Your High Court Judges. This dataset contains information about the judges of all high courts in India from 1993 to 2021.

    💾 The Data

    The dataset contains 1708 rows and 44 columns, but you don't need all of them. The table below describes the columns used in this analysis.

    Column | Description
    Name of the Judge | The name of the judge
    Gender | The gender of the judge
    Date of Birth | The date on which the judge was born
    Date of Appointment | The date on which the person was elevated as a Judge of any High Court (appointment as an Additional Judge is also considered here)
    Date of Retirement | The date on which the person demits office as a Judge of a High Court or of the Supreme Court (if elevated to it)
    If appointed Chief Justice in any High Court | Categorical column specifying whether the judge was appointed Chief Justice
    If appointed to the Supreme Court | Categorical column specifying whether the judge was appointed to the Supreme Court
    Foreign Degree in Law | Whether the judge has a foreign degree in law
    Post-Graduate in Law | Whether the judge has a PG degree in law
    Post-Graduate in another subject | Whether the judge has a PG degree in another subject
    Graduation Specialization | The subject the judge specialized in during graduation
    file Title | The name of the court

    If you want to know about all the columns in this data, you should check this link. If you want to analyze the data of a specific court, you can check this website, as it contains all the information about the judges of all courts.

    What to do🤔?

    You know about the data. Now ask yourself: what do you want to learn from it? Below are some questions I want to answer with this data.

    • What is the average age of judges when they are appointed as a Judge of any High Court?
    • What is the average retirement age of a High Court Judge?
    • What is the average duration of service as a judge?
    • What is the ratio of male to female judges?
    • What are the educational qualifications of the judges? This has four subparts.
      • How many of them have a post-graduate degree in law?
      • How many of them have a foreign degree in law?
      • Which subject did they choose as their graduation specialization?
      • How many of them have a post-graduate degree in a subject other than law?
    • What is the judge's designation? This has two subparts.
      • How many judges per state have been promoted to Chief Justice of any High Court?
      • How many judges per state have been promoted to the Supreme Court?

    The questions you have in mind may not be on this list; feel free to add them. Now that you know the data and what you want to learn from it, you can move on to the data analysis.

    🧹 Analyzing and Cleaning the Data

    Importing the necessary libraries

    As we are not doing any data visualization in Python here, we are not going to import a visualization library. For our task, Pandas and NumPy are sufficient. Let's import them.

    import pandas as pd
    import numpy as np

    A quick look at the data

    Now, let's see what our data looks like. As the file is in .csv format, we load it into Pandas using the read_csv() function.

    judge_data = pd.read_csv("khoj-1.8.csv")
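
    Since only a dozen of the 44 columns are used in this analysis, one option is to load just those columns with the usecols parameter of read_csv(). The sketch below is a minimal illustration, not the notebook's own code: it reads a tiny made-up CSV from memory as a stand-in for khoj-1.8.csv, and the extra column name is hypothetical.

```python
import io
import pandas as pd

# Made-up stand-in for khoj-1.8.csv so the snippet runs on its own.
sample_csv = io.StringIO(
    "Name of the Judge,Gender,Date of Birth,Some Other Column\n"
    "Judge A,Male,1950-02-01,x\n"
    "Judge B,Female,Not Available,y\n"
)

# Only the columns listed here are loaded; the rest are skipped.
wanted = ["Name of the Judge", "Gender", "Date of Birth"]
sample = pd.read_csv(sample_csv, usecols=wanted)

print(sample.shape)   # (rows, columns) actually loaded
print(sample.dtypes)  # the date column is not parsed as a date yet
```

    With the real file, the same call would take the file name and the full list of columns from the table above.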

    At first glance, we can notice that:

    • There are three date columns: Date of Birth, Date of Appointment and Date of Retirement. Surprisingly, Pandas reads them with the object dtype, which is how Pandas typically stores columns of strings rather than parsed dates.
    • Some columns contain the values Not Available and Not Applicable. Here, Not Available denotes a missing value, while Not Applicable means that the column does not apply to that specific judge.

    date_cols = ['Date of Birth', 'Date of Appointment', 'Date of Retirement']
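
    Both observations can be checked directly. The following is a minimal sketch on made-up rows (the dates, their format, and where the placeholders appear are all assumptions, not values from the real file). It tests whether each date column has been parsed as a datetime and counts the two placeholder strings per column.

```python
import pandas as pd

# Made-up rows standing in for judge_data.
toy = pd.DataFrame({
    "Date of Birth": ["1950-02-01", "Not Available"],
    "Date of Appointment": ["1995-06-15", "2001-11-20"],
    "Date of Retirement": ["Not Applicable", "Not Available"],
})
date_cols = ["Date of Birth", "Date of Appointment", "Date of Retirement"]

# True only for columns that already hold datetime values.
is_parsed = {col: pd.api.types.is_datetime64_any_dtype(toy[col]) for col in date_cols}

# One row per date column, one column per placeholder string.
placeholders = ["Not Available", "Not Applicable"]
counts = pd.DataFrame({p: (toy[date_cols] == p).sum() for p in placeholders})

print(is_parsed)
print(counts)
```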

    You don't need the full dates in these three columns. Since you are only interested in the judges' ages, the year alone is sufficient. Before extracting it, replace the Not Available values with np.nan; otherwise pd.to_datetime() will raise an error when it meets a string it cannot parse.

    for col in date_cols:
        # Replace the "Not Available" placeholder with np.nan
        judge_data[col] = judge_data[col].replace("Not Available", np.nan)
        # Parse the date strings into datetime values
        judge_data[col] = pd.to_datetime(judge_data[col])
        # Keep only the year (missing dates stay NaN)
        judge_data[col] = judge_data[col].dt.year
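
    To see what this conversion produces, here is a minimal sketch on made-up rows (the dates and their ISO format are assumptions; the real file may use a different format). It also tries errors="coerce", an option of pd.to_datetime() that turns every unparseable string into a missing value, as a possible alternative to the explicit replacement. That option is broader than the approach above, because it hides typos as well as placeholders.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for one date column of judge_data.
toy = pd.DataFrame({"Date of Birth": ["1950-02-01", "Not Available", "1961-07-15"]})

# Same steps as the loop above: replace the placeholder, parse, keep the year.
explicit = pd.to_datetime(toy["Date of Birth"].replace("Not Available", np.nan)).dt.year

# Alternative: let to_datetime turn anything unparseable into NaT.
coerced = pd.to_datetime(toy["Date of Birth"], errors="coerce").dt.year

print(explicit.tolist())         # years, with NaN where the date was missing
print(coerced.equals(explicit))  # whether both approaches agree on this sample
```

    Because a missing date has no year, the resulting column holds floating-point values with NaN rather than plain integers.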

    After this operation, these columns contain only the year, so it no longer makes sense to call them date columns. It's time to rename them.