
Full Stack Data Engineering with Python

Some background context, briefly summarized:

  • We will deal with HDF5 files
  • The data will come from the LIGO Detector
  • We will manipulate HDF5 files with Python, extracting the data and preparing it for scientific analysis.
  • We are closer to science and the exploration of the Universe than we believe. This data is open and available to us, and can be used for a variety of purposes.

Task 0: Setup

Import the following packages.

  • Import numpy using the alias np.
  • From the scipy package, import signal, interp1d (from scipy.interpolate), and butter and filtfilt (from scipy.signal).
  • From matplotlib, import pyplot (as plt) and mlab.
  • Don't forget the h5py library (imported here as h5), which will help us explore the HDF5 file.
import numpy as np
from scipy import signal
from scipy.interpolate import interp1d
from scipy.signal import butter, filtfilt
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import h5py as h5

Task 1: Import the LIGO data

l1 = h5.File('L-L1_GWOSC_4KHZ_R1-1126259447-32.hdf5', 'r')
h1 = h5.File('H-H1_GWOSC_4KHZ_R1-1126259447-32.hdf5', 'r')

The keys of the file are the names of its top-level groups.

The 'strain' group contains the time series.

The following call lists the groups in the file:

l1.keys()

To access a group, pass its name inside square brackets, just as you would with a dictionary.

l1['strain'].keys()
l1['strain']['Strain']
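
The dictionary-style access above can also be used to walk an entire file. The following is a minimal sketch, assuming a small stand-in file built on the spot rather than the real LIGO files: the 'strain/Strain' path mirrors the one used above, while the 'meta/Duration' entry, the file name toy_strain.hdf5, the helper record, and all values are made up for illustration. It uses h5py's visititems to list every group and dataset below the root.

```python
import os
import tempfile

import h5py as h5
import numpy as np

# Build a tiny stand-in file. The 'strain/Strain' path mirrors the one used
# above; the 'meta/Duration' entry and all values are made up for illustration.
path = os.path.join(tempfile.mkdtemp(), "toy_strain.hdf5")
with h5.File(path, "w") as f:
    f.create_group("strain").create_dataset("Strain", data=np.zeros(8))
    f.create_group("meta").create_dataset("Duration", data=32)

entries = []

def record(name, obj):
    # visititems calls this once for every group and dataset below the root.
    kind = "dataset" if isinstance(obj, h5.Dataset) else "group"
    entries.append((name, kind))

with h5.File(path, "r") as f:
    top_level = sorted(f.keys())  # names of the top-level groups
    f.visititems(record)          # recurse through the whole hierarchy

print(top_level)
for name, kind in entries:
    print(f"{kind:8s}{name}")
```

Running the same two calls on l1 or h1 should give a quick overview of what each file holds before picking out individual datasets.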

Once we reach a dataset (an object with a shape and a dtype), we can index it as if it were a NumPy array:

l1['strain']['Strain'][:]
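
To illustrate what the slice does, here is a small sketch on a toy file (the file name and sample values are invented, not LIGO data). It assumes only standard h5py behavior: the dataset object exposes shape and dtype without reading the data, and slicing it returns a NumPy array holding just the requested samples.

```python
import os
import tempfile

import h5py as h5
import numpy as np

# Toy file standing in for the LIGO data; the sample values are invented.
path = os.path.join(tempfile.mkdtemp(), "toy_strain.hdf5")
samples = np.arange(10, dtype=np.float64)
with h5.File(path, "w") as f:
    # Intermediate groups in the path are created automatically.
    f.create_dataset("strain/Strain", data=samples)

with h5.File(path, "r") as f:
    dset = f["strain"]["Strain"]          # h5py Dataset: no samples read yet
    shape, dtype = dset.shape, dset.dtype
    first_three = dset[:3]                # reads only the first three samples
    everything = dset[:]                  # reads the whole dataset into memory

# The arrays are ordinary NumPy arrays and stay usable after the file closes.
print(type(everything).__name__, shape, dtype)
print(first_three)
```

For a short file like the 32-second segments used here, reading everything with [:] is convenient; for much larger datasets, slicing only the needed range keeps memory use down.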

Assign the time series to variables