Learn Speech Recognition and Spoken Language Processing in Python
We learn to speak long before we learn to read. Even in the digital age, our main method of communication is speech. Spoken Language Processing in Python will help you load, transform, and transcribe audio files. You’ll start by seeing what raw audio looks like in Python, and move on to exploring popular libraries and working through an example business use case.
Use Python SpeechRecognition and PyDub to Transcribe Audio Files
Python has a number of popular libraries that help you to process spoken language. SpeechRecognition offers you an easy way to integrate with speech-to-text APIs, while PyDub helps you to programmatically alter audio file attributes to get them ready for transcription. Each of these libraries is covered in an in-depth chapter, offering you the opportunity to put theory into practice to cement your knowledge.
Practice Speech Transcription with an In-Course Project
The final chapter in this course offers you the opportunity to put everything you’ve learned together by building a speech processing proof of concept for a fictional technology company. You’ll build a system that transcribes phone call audio to text and then performs sentiment analysis to review customer support phone calls.
By the end of this course, you’ll have both the knowledge and hands-on experience to put your learning into practice within your job or personal projects.
Introduction to Spoken Language Processing with Python (Free)
Audio files are different from most other types of data. Before you can start working with them, they require some preprocessing. In this chapter, you'll learn the first steps to working with speech files by converting two different audio files into soundwaves and comparing them visually.
Introduction to audio data in Python (50 xp)
The right frequency (50 xp)
Importing an audio file with Python (100 xp)
Converting sound wave bytes to integers (50 xp)
The right data type (50 xp)
Bytes to integers (100 xp)
Finding the time stamps (100 xp)
Visualizing sound waves (50 xp)
Staying consistent (50 xp)
Processing audio data with Python (100 xp)
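To give a feel for the byte-to-integer and time-stamp steps this chapter names, here is a minimal sketch using only Python's standard library. It is not the course's own code: the helper names (`make_test_tone`, `read_samples`, `sample_times`) are made up for illustration, the audio is a synthetic tone built in memory rather than a real speech file, and the sketch assumes mono 16-bit WAV data. With NumPy installed, `numpy.frombuffer` is a common alternative to `struct.unpack`.

```python
import io
import math
import struct
import wave


def make_test_tone(frame_rate=8000, seconds=0.5, freq=440.0):
    """Build a small mono 16-bit WAV in memory (hypothetical stand-in for a speech file)."""
    n_frames = int(frame_rate * seconds)
    values = [
        int(10000 * math.sin(2 * math.pi * freq * i / frame_rate))
        for i in range(n_frames)
    ]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_out:
        wav_out.setnchannels(1)   # mono
        wav_out.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_out.setframerate(frame_rate)
        # WAV stores samples as little-endian signed integers.
        wav_out.writeframes(struct.pack(f"<{n_frames}h", *values))
    buffer.seek(0)
    return buffer


def read_samples(wav_source):
    """Read a mono 16-bit WAV and return (samples as integers, frame rate)."""
    with wave.open(wav_source, "rb") as wav_in:
        frame_rate = wav_in.getframerate()
        n_frames = wav_in.getnframes()
        sample_width = wav_in.getsampwidth()
        n_channels = wav_in.getnchannels()
        raw_bytes = wav_in.readframes(n_frames)
    if sample_width != 2 or n_channels != 1:
        raise ValueError("this sketch only handles mono 16-bit audio")
    # Interpret every 2 bytes as one signed 16-bit integer.
    samples = struct.unpack(f"<{n_frames}h", raw_bytes)
    return samples, frame_rate


def sample_times(n_samples, frame_rate):
    """Time stamp in seconds of each sample: sample index divided by frame rate."""
    return [i / frame_rate for i in range(n_samples)]


samples, frame_rate = read_samples(make_test_tone())
times = sample_times(len(samples), frame_rate)
print(len(samples), frame_rate, times[:3])
```

The integers and time stamps produced this way are what you would pass to a plotting library to draw the sound wave; `read_samples` also accepts a path to a WAV file on disk in place of the in-memory buffer.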
Using the Python SpeechRecognition library
Speech recognition is still far from perfect. But the SpeechRecognition library provides an easy way to interact with many speech-to-text APIs. In this chapter, you'll learn how to use the SpeechRecognition library to easily start converting the spoken language in your audio files to text.
SpeechRecognition Python library (50 xp)
Pick the wrong speech_recognition API (50 xp)
Using the SpeechRecognition library (100 xp)
Using the Recognizer class (100 xp)
Reading audio files with SpeechRecognition (50 xp)
From AudioFile to AudioData (100 xp)
Recording the audio we need (100 xp)
Dealing with different kinds of audio (50 xp)
Different kinds of audio (100 xp)
Multiple Speakers 1 (100 xp)
Multiple Speakers 2 (100 xp)
Working with noisy audio (100 xp)
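As a rough illustration of the Recognizer, AudioFile and AudioData steps listed for this chapter, the sketch below wraps them in one function. The function name `transcribe_file` and the file name in the usage comment are hypothetical, and the choice of `recognize_google` (Google's free web speech API, which needs a network connection) is an assumption made for illustration; the library offers other `recognize_*` methods too.

```python
def transcribe_file(path, language="en-US", offset=None, duration=None):
    """Return the transcript of an audio file, or None if no speech was understood."""
    # Imported inside the function so this sketch can be loaded even where the
    # third-party package (pip install SpeechRecognition) is not installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()

    # AudioFile reads WAV, AIFF and FLAC; other formats need converting first.
    with sr.AudioFile(path) as source:
        # record() turns the file, or a slice of it chosen with offset and
        # duration (both in seconds), into an AudioData object.
        audio_data = recognizer.record(source, offset=offset, duration=duration)

    try:
        # Sends the audio to Google's web speech API and returns the text.
        return recognizer.recognize_google(audio_data, language=language)
    except sr.UnknownValueError:
        # Raised when the API could not make out any speech.
        return None


# Hypothetical usage; "support_call.wav" is a placeholder file name:
# print(transcribe_file("support_call.wav", duration=10))
```

Returning `None` for unintelligible audio is a design choice of this sketch, not library behaviour. For noisy recordings, the library also provides `Recognizer.adjust_for_ambient_noise`, which can be called on the source before `record()`.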
Manipulating Audio Files with PyDub
Not all audio files come in the same shape, size or format. Luckily, the PyDub library by James Robert provides tools to programmatically alter audio file attributes such as frame rate, number of channels, file format and more. In this chapter, you'll learn how to use this helpful library to ensure all of your audio files are in the right shape for transcription.
Introduction to PyDub (50 xp)
Import an audio file with PyDub (100 xp)
Play an audio file with PyDub (100 xp)
Audio parameters with PyDub (100 xp)
Adjusting audio parameters (100 xp)
Manipulating audio files with PyDub (50 xp)
Turning it down... then up (100 xp)
Normalizing an audio file with PyDub (100 xp)
Chopping and changing audio files (100 xp)
Splitting stereo audio to mono with PyDub (100 xp)
Converting and saving audio files with PyDub (50 xp)
Exporting and reformatting audio files (100 xp)
Manipulating multiple audio files with PyDub (100 xp)
An audio processing workflow (100 xp)
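The kind of clean-up this chapter describes (channels, frame rate, volume, file format) might look like the sketch below. The function name `prepare_for_transcription`, the 16000 Hz target and the file names in the usage comment are assumptions chosen for illustration, not values taken from the course.

```python
def prepare_for_transcription(in_path, out_path, frame_rate=16000):
    """Convert an audio file to mono WAV at the given frame rate and return the segment."""
    # Imported inside the function so this sketch can be loaded even where the
    # third-party package (pip install pydub) is not installed. Formats other
    # than WAV also require ffmpeg to be available on the system.
    from pydub import AudioSegment
    from pydub.effects import normalize

    audio = AudioSegment.from_file(in_path)

    # Each set_* call returns a new AudioSegment; the original is not modified.
    audio = audio.set_channels(1)             # stereo -> mono
    audio = audio.set_frame_rate(frame_rate)  # resample, e.g. 44100 -> 16000 Hz

    # Even out the volume so quiet recordings are easier to transcribe.
    audio = normalize(audio)

    # Write the result as WAV, a format SpeechRecognition's AudioFile can read.
    audio.export(out_path, format="wav")
    return audio


# Hypothetical usage; both file names are placeholders:
# prepare_for_transcription("call_raw.mp3", "call_clean.wav")
```

Other operations named in the exercise list map onto similar one-liners: adding or subtracting a number changes the volume in decibels (`audio + 10`), slicing works in milliseconds (`audio[:5000]`), and `audio.split_to_mono()` returns one segment per channel.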
Processing text transcribed from spoken language
In this chapter, you'll put everything you've learned together by building a speech processing proof of concept project for a technology company, Acme Studios. You'll start by transcribing customer support phone call audio snippets to text. Then you'll perform sentiment analysis using NLTK, named entity recognition using spaCy and text classification using scikit-learn on the transcribed text.
Creating transcription helper functions (50 xp)
Converting audio to the right format (100 xp)
Finding PyDub stats (100 xp)
Transcribing audio with one line (100 xp)
Using the helper functions you've built (100 xp)
Sentiment analysis on spoken language text (50 xp)
Analyzing sentiment of a phone call (100 xp)
Sentiment analysis on formatted text (100 xp)
Named entity recognition on transcribed text (50 xp)
Named entity recognition in spaCy (100 xp)
Creating a custom named entity in spaCy (100 xp)
Classifying transcribed speech with Sklearn (50 xp)
Preparing audio files for text classification (100 xp)
Transcribing phone call excerpts (100 xp)
Organizing transcribed phone call data (100 xp)
Create a spoken language text classifier (100 xp)
Congratulations! (50 xp)
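For the sentiment analysis step on transcribed text, one possible approach with NLTK is its VADER analyzer, sketched below. This is an assumption about which NLTK tool fits the task, not a description of the course's project code. The helper names (`score_transcript`, `label_sentiment`) are hypothetical, and the 0.05 cut-off is a commonly used convention for VADER's compound score rather than a value from the course.

```python
def score_transcript(text):
    """Return VADER polarity scores (keys: neg, neu, pos, compound) for a transcript."""
    # Imported inside the function so this sketch can be loaded even where NLTK
    # is not installed. VADER also needs its lexicon downloaded once with
    # nltk.download("vader_lexicon").
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return analyzer.polarity_scores(text)


def label_sentiment(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse label (threshold is an assumption)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical usage on a transcribed support call:
# scores = score_transcript("Thanks so much, the replacement arrived quickly.")
# print(label_sentiment(scores["compound"]))
```

Transcripts from speech-to-text APIs often come back without punctuation, and VADER takes cues from punctuation and capitalisation, so scores on raw transcripts can differ from scores on the same text once it has been formatted into sentences.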
In the following tracks
Natural Language Processing in Python