Full Stack Data Engineering with Python
Some brief background context:
- We will deal with HDF5 files
- The data will come from the LIGO Detector
- Manipulate HDF5 with Python: extract the data and prepare it for scientific analysis.
- We are closer to science and the exploration of the Universe than we think. This data is open and available to us, and can be used for a variety of purposes.
Task 0: Setup
Import the following packages.
- Import numpy using the alias np.
- From the scipy package, import signal, interpolate.interp1d, and butter and filtfilt (both from scipy.signal).
- From matplotlib, import pyplot and mlab.
- Don't forget the h5py library, which will help us explore the HDF5 file.
import numpy as np
from scipy import signal
from scipy.interpolate import interp1d
from scipy.signal import butter, filtfilt
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import h5py as h5
Task 1: Import the LIGO data
l1 = h5.File('L-L1_GWOSC_4KHZ_R1-1126259447-32.hdf5', 'r')
h1 = h5.File('H-H1_GWOSC_4KHZ_R1-1126259447-32.hdf5', 'r')
We can list the keys, which are the names of the groups.
The 'strain' group contains the time series.
The file contains the following groups:
l1.keys()
To access a group, pass its name inside brackets, just as you would with a dictionary.
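The dictionary-style navigation can be tried without downloading anything. The following is a minimal sketch that builds a tiny in-memory file; the file name demo.hdf5, the 'meta' group, and the zero-filled data are assumptions made up for illustration and only imitate the layout described in this post.

```python
import h5py
import numpy as np

# Build a tiny in-memory HDF5 file so the example runs without the LIGO data.
# The names below are illustrative stand-ins, not the real file contents.
with h5py.File("demo.hdf5", "w", driver="core", backing_store=False) as f:
    f.create_group("meta")
    strain = f.create_group("strain")
    strain.create_dataset("Strain", data=np.zeros(8))

    top_level = sorted(f.keys())               # group names at the root
    inside_strain = list(f["strain"].keys())   # members of one group

    # visititems walks the whole tree and calls the function on every object.
    tree = []

    def record(name, obj):
        kind = "dataset" if isinstance(obj, h5py.Dataset) else "group"
        tree.append((name, kind))

    f.visititems(record)

print(top_level)
print(inside_strain)
print(tree)
```

Walking the tree with visititems is handy when you open an unfamiliar file and do not yet know how deeply the groups are nested.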
l1['strain'].keys()
l1['strain']['Strain']
Once we reach a dataset with a meaningful shape, we can read it as if it were a NumPy array:
l1['strain']['Strain'][:]
Store the time series in variables
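It may help to see the difference between a dataset handle and the array you get by slicing it. This is a rough sketch on synthetic data: the file name demo.hdf5 and the 16 sample values are assumptions, not LIGO measurements.

```python
import h5py
import numpy as np

with h5py.File("demo.hdf5", "w", driver="core", backing_store=False) as f:
    # Synthetic stand-in for the strain data; values and length are made up.
    f.create_dataset("strain/Strain", data=np.arange(16, dtype="float64"))

    dset = f["strain"]["Strain"]   # h5py.Dataset: a handle, nothing read yet
    shape, dtype = dset.shape, dset.dtype

    full = dset[:]                 # reads everything into a numpy.ndarray
    head = dset[:4]                # reads only the first four samples

# The arrays are in-memory copies, so they stay usable after the file closes.
print(type(full).__name__, shape, dtype)
print(head)
```

Slicing only the range you need keeps memory use low on large files, while [:] is convenient when the whole series fits comfortably in memory.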