Project: Analyzing River Thames Water Levels

    Time series data is everywhere, from watching your stock portfolio to monitoring climate change, and even live-tracking as local cases of a virus become a global pandemic. In this project, you’ll work with a time series that tracks the tide levels of the Thames River. You’ll first load the data and inspect it visually, and then perform calculations on the dataset to generate some summary statistics. You’ll end by reducing the time series to its component attributes and analyzing them.

    The original dataset is available from the British Oceanographic Data Centre.

    Here's a map of the locations of the tidal meters along the River Thames in London.

    The provided datasets are in the data folder in this workspace. For this project, you will work with one of these files, 10-11_London_Bridge.txt, which contains comma-separated values for water levels in the Thames River at London Bridge. After you've finished the project, you can reuse the same code to analyze data from the other files (at other spots in the UK where tidal data is collected) if you'd like.

    The TXT file contains data for three variables, described in the table below.

    Variable Name | Description | Format
    Date and time | Date and time of measurement to GMT. Note the tide gauge is accurate to one minute. | dd/mm/yyyy hh:mm:ss
    Water level | High or low water level measured by tide meter. Tide gauges are accurate to 1 centimetre. | metres (Admiralty Chart Datum (CD), Ordnance Datum Newlyn (ODN) or Trinity High Water (THW))
    Flag | High water flag = 1, low water flag = 0 | Categorical (0 or 1)
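
The date format listed in the table is day-first, which pandas does not necessarily assume when it parses strings. As a minimal sketch, assuming rows shaped like the table above (the two sample rows and their values are made up for illustration and are not taken from the dataset), an explicit format string removes the ambiguity:

```python
import io
import pandas as pd

# Hypothetical sample rows in the documented layout; values are illustrative only.
sample = io.StringIO(
    "Date and time,water level,flag\n"
    "01/05/1911 15:40:00,3.7130,1\n"
    "02/05/1911 11:25:00,-2.9415,0\n"
)

df = pd.read_csv(sample)
df.columns = ["datetime", "water_level", "is_high_tide"]

# dd/mm/yyyy hh:mm:ss, as stated in the variable table
df["datetime"] = pd.to_datetime(df["datetime"], format="%d/%m/%Y %H:%M:%S")

print(df["datetime"].dt.month.tolist())  # [5, 5]
print(df["datetime"].dt.day.tolist())    # [1, 2]
```

Both rows land in May rather than in January and February, which is what a month-first reading of the same strings would give.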
    import pandas as pd               
    
    def IQR(column): 
        """ Calculates the interquartile range (IQR) for a given DataFrame column using the quantile method """
        q25, q75 = column.quantile([0.25, 0.75])
        return q75-q25
    
    #Importing the comma-separated text file and keeping only the first three columns
    london_bridge_import = pd.read_csv('data/10-11_London_Bridge.txt')
    london_bridge = london_bridge_import.copy()
    london_bridge = london_bridge.iloc[:,:3]
    london_bridge.columns = ['datetime','water_level','is_high_tide']
    
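    #Preparing data: parse timestamps and make sure water levels are floats
    #dayfirst=True follows the dd/mm/yyyy hh:mm:ss format given in the variable table
    london_bridge['datetime'] = pd.to_datetime(london_bridge['datetime'], dayfirst=True)
    london_bridge['water_level'] = london_bridge.water_level.astype('float')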
    
    #Extracting month and year columns for easy access
    london_bridge['month'] = london_bridge['datetime'].dt.month
    london_bridge['year'] = london_bridge['datetime'].dt.year
    
    #Separate high and low tides
    high_tide = london_bridge[london_bridge['is_high_tide'] == 1]
    low_tide = london_bridge[london_bridge['is_high_tide'] == 0]
    
    #Summary statistics
    high_statistics = high_tide['water_level'].agg(['mean','median', IQR])
    low_statistics = low_tide['water_level'].agg(['mean','median', IQR])
    
    #Share of high-tide readings per year that exceed the overall 90th percentile
    all_high_days = high_tide.groupby('year')['water_level'].count()
    very_high_days = high_tide[high_tide['water_level'] > high_tide['water_level'].quantile(0.90)].groupby('year')['water_level'].count()
    very_high_ratio = (very_high_days/all_high_days).reset_index()
    
    #Share of low-tide readings per year that fall below the overall 10th percentile
    all_low_days = low_tide.groupby('year')['water_level'].count()
    very_low_days = low_tide[low_tide['water_level'] < low_tide['water_level'].quantile(0.10)].groupby('year')['water_level'].count()
    very_low_ratio = (very_low_days/all_low_days).reset_index()
    
    solution = {"high_statistics": high_statistics, "low_statistics": low_statistics, 
                "very_high_ratio": very_high_ratio, "very_low_ratio": very_low_ratio}
    print(solution)
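
To make the ratio and IQR steps above easier to check by hand, here is a small sketch on made-up readings. The `toy` frame, its values, and the `ratio` column name are assumptions for illustration only, not data or names from the file. It counts, per year, the share of readings above the overall 90th percentile, in the same way as the `very_high_ratio` step, and evaluates the interquartile range on the same values.

```python
import pandas as pd

# Hypothetical readings: five per year, values chosen to be easy to verify.
toy = pd.DataFrame({
    "year": [1911] * 5 + [1912] * 5,
    "water_level": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})

# Overall 90th percentile (pandas interpolates linearly by default).
threshold = toy["water_level"].quantile(0.90)

all_counts = toy.groupby("year")["water_level"].count()
high_counts = (
    toy[toy["water_level"] > threshold]
    .groupby("year")["water_level"]
    .count()
)

# A year with no readings above the threshold is missing from high_counts,
# so the division gives NaN for it; fillna(0) reports it as 0 instead.
ratio = (high_counts / all_counts).fillna(0).reset_index(name="ratio")

# Interquartile range, computed the same way as the IQR helper.
q25, q75 = toy["water_level"].quantile([0.25, 0.75])
iqr = q75 - q25

print(ratio)
print(iqr)
```

Only the reading of 10.0 exceeds the threshold here, so 1912 gets a ratio of one in five and 1911 gets zero. Without the `fillna(0)` call, 1911 would show up as NaN, which is worth keeping in mind when reading the per-year ratios for a real file.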