Engine knock detection AI part 2/5
After the videos were sliced up and the audio tracks saved as WAV files in the first part, the magnificent librosa library is now used for pre-processing and for generating the spectrogram images that will serve as training data.
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
base_dir = root_dir + 'Colab Notebooks/fast.ai/KnockKnock/data/' # /content/gdrive/My Drive/Colab Notebooks/fast.ai/KnockKnock/data/
Importing the needed modules. Librosa is used for spectral decomposition.
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
import librosa
import librosa.display
import IPython.display as ipd
Selecting an audio file to experiment on.
audiofile_path = base_dir+'knocking/0005.wav'
y, sr = librosa.load(audiofile_path,
                     duration=2,
                     offset=0)
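Just to verify what librosa.load returns (a quick check, not in the original notebook): by default it resamples to 22050 Hz and returns a mono float array, so a two second clip should give 44100 samples.
print(y.shape, sr)  # expect roughly (44100,) and 22050 with librosa's defaults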
Pre-processing audio
Calculate the Short-time Fourier transform using the stft function and perform median-filtering harmonic percussive source separation with the decompose.hpss function.
Sounds real fancy, but the gist of it is that the latter function attempts to split the audio into harmonic and percussive elements. Since the sound one listens for when determining whether an engine is knocking is of a percussive, transient kind, isolating it should make the classifier model easier to train.
D = librosa.stft(y)
D_harmonic, D_percussive = librosa.decompose.hpss(D)
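As an aside, decompose.hpss also takes a margin argument for a stricter separation, leaving ambiguous energy in a residual instead of forcing it into one of the two components. A minimal sketch; the margin values are illustrative, not tuned for this data:
# Larger margins demand stronger evidence before energy is assigned
# to the harmonic or percussive component.
D_harmonic_strict, D_percussive_strict = librosa.decompose.hpss(D, margin=(1.0, 5.0))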
# Pre-compute a global reference power from the input spectrum
rp = np.max(np.abs(D))
plt.figure(figsize=(12, 8))
plt.subplot(3, 1, 1)
librosa.display.specshow(librosa.amplitude_to_db(np.abs(D), ref=rp), y_axis='log')
plt.colorbar()
plt.title('Full spectrogram')
plt.subplot(3, 1, 2)
librosa.display.specshow(librosa.amplitude_to_db(np.abs(D_harmonic), ref=rp), y_axis='log')
plt.colorbar()
plt.title('Harmonic spectrogram')
plt.subplot(3, 1, 3)
librosa.display.specshow(librosa.amplitude_to_db(np.abs(D_percussive), ref=rp), y_axis='log', x_axis='time')
plt.colorbar()
plt.title('Percussive spectrogram')
plt.tight_layout()
It is possible to run an inverse Fourier transform (here via istft) on the harmonic and percussive content in order to hear the difference.
First up is the original sound.
ipd.Audio(y,rate=sr)
Next is the harmonic content.
ipd.Audio(librosa.istft(D_harmonic),rate=sr)
And lastly the percussive. The knocking is quite pronounced.
ipd.Audio(librosa.istft(D_percussive),rate=sr)
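To put a rough number on how pronounced the knocking is, the RMS energy of the two reconstructions can be compared. A quick sketch, not part of the original pipeline:
y_harmonic = librosa.istft(D_harmonic)
y_percussive = librosa.istft(D_percussive)
# Root-mean-square energy of each separated component
print('harmonic RMS:  ', np.sqrt(np.mean(y_harmonic**2)))
print('percussive RMS:', np.sqrt(np.mean(y_percussive**2)))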
The training images should come out at 256×256 pixels, hence the DPI and side-length constants below.
mydpi=150
pix_side=256
Here is the constant-Q spectrogram of the original, unprocessed audio for comparison.
plt.figure(figsize=(pix_side/mydpi, pix_side/mydpi), dpi=mydpi)  # dpi must be set for the figure to really be pix_side pixels square
CQT = librosa.amplitude_to_db(np.abs(librosa.cqt(y, sr=sr)), ref=np.max)
librosa.display.specshow(CQT, x_axis=None, y_axis=None)
plt.axis('off')
And this is the spectrogram of the percussive content in the same format.
plt.figure(figsize=(pix_side/mydpi, pix_side/mydpi), dpi=mydpi)
CQT = librosa.amplitude_to_db(np.abs(librosa.cqt(librosa.istft(D_percussive), sr=sr)), ref=np.max)
p = librosa.display.specshow(CQT, x_axis=None, y_axis=None)
plt.axis('off')
Save the file
p.figure.savefig('test.png', dpi=mydpi)
and try opening and displaying it.
from IPython.display import Image
Image(filename='test.png')
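To confirm the saved image really is 256×256 pixels, it can be opened with PIL (available in Colab; aliased here to avoid clashing with IPython's Image):
from PIL import Image as PILImage
print(PILImage.open('test.png').size)  # expect (256, 256)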
Now load each sound file extracted from the videos and generate a spectrogram image for every two seconds of audio.
The soundfile package is used to save out each two-second slice so that it can be listened to when evaluating the performance of the classifier after training.
!pip install soundfile
First, a class to represent the spectrogram. The idea was to give it a method for each step in the process detailed above and call them in a loop, but that was deemed the concern of a future refactoring. As it stands the class is a bit redundant; a plain function would have sufficed.
class Spectrogram:
    def __init__(self, audiofile_path, dpi=150, side_px=256, total_duration=10, duration=2):
        import numpy as np
        import matplotlib.pyplot as plt
        import librosa
        import librosa.display
        import os
        import soundfile as sf
        filepath, extension = os.path.splitext(audiofile_path)
        slices = int(total_duration / duration)
        for i in range(slices):
            spectrogram_path = filepath + '_' + str(i) + '.png'
            audio_slice_path = filepath + '_' + str(i) + '.wav'
            # Load the next duration-second slice of the source file
            y, sr = librosa.load(audiofile_path,
                                 duration=duration,
                                 offset=duration*i)
            # Save the raw slice so it can be listened to when evaluating the classifier
            sf.write(audio_slice_path, y, sr)
            D = librosa.stft(y)
            D_harmonic, D_percussive = librosa.decompose.hpss(D)
            # Constant-Q spectrogram of the percussive content only
            CQT = librosa.amplitude_to_db(np.abs(librosa.cqt(librosa.istft(D_percussive), sr=sr)), ref=np.max)
            plt.figure(figsize=(side_px/dpi, side_px/dpi), dpi=dpi)
            p = librosa.display.specshow(CQT, x_axis=None, y_axis=None)
            plt.axis('off')
            figure = p.figure
            figure.savefig(spectrogram_path, dpi=dpi)
            plt.close(figure)
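Used on a single file, the class writes one image and one wave file per two second slice. For example, with the sample file from earlier (assuming it is at least ten seconds long, the default total_duration):
Spectrogram(base_dir + 'knocking/0005.wav')  # writes 0005_0.png .. 0005_4.png plus matching .wav slices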
The actual batch job is run by these nested for loops.
import os
dirs = [base_dir+'knocking/', base_dir+'normal/']
for dirry in dirs:
    print(dirry)
    for filename in os.listdir(dirry):
        if filename.endswith('.wav'):
            print(filename)
            Spectrogram(dirry+filename)
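As an optional sanity check, not part of the original notebook, the number of generated images per class can be counted afterwards:
for dirry in dirs:
    pngs = [f for f in os.listdir(dirry) if f.endswith('.png')]
    print(dirry, len(pngs), 'spectrogram images')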