Important: This notebook is not guaranteed to work without modification. It has been a while since I wrote it, and it has not been tested since. For a summary of this project, see Engine knock detection AI part 5/5.

Setup

Basics

Since this ran on Google Colab and the audio files are in a Drive folder, Google Drive is first mounted.

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
base_dir = root_dir + 'Colab Notebooks/fast.ai/KnockKnock/data/' # /content/gdrive/My Drive/Colab Notebooks/fast.ai/KnockKnock/data/

Mounted at /content/gdrive

Importing the needed modules. Librosa is used for spectral decomposition.

from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt

import librosa
import librosa.display

import IPython.display as ipd

Selecting an audio file to experiment on.

audiofile_path = base_dir+'knocking/0005.wav'

Loading the audio file

Load two seconds of audio from the specified file. The load function returns both the audio data, y, and the sample rate, sr.

y, sr = librosa.load(audiofile_path,
                     duration=2,
                     offset=0)
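
A quick sanity check (an addition, not in the original notebook): librosa.load resamples to 22050 Hz and downmixes to mono by default, so two seconds of audio should come back as a one-dimensional array of 2 × 22050 samples.

print(y.shape, sr)  # expected with the defaults: (44100,) 22050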

Pre-processing audio

Calculate the short-time Fourier transform with the stft function and perform median-filtering harmonic/percussive source separation with the decompose.hpss function.

Sounds real fancy, but the gist of it is that the latter function attempts to split the audio into harmonic and percussive components. Since the sound one listens for when judging whether an engine is knocking is percussive or transient in nature, the separation should make it easier to train the classifier model.

D = librosa.stft(y)
D_harmonic, D_percussive = librosa.decompose.hpss(D)
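
With the default settings the separation uses soft masks that sum to one, so nothing is lost in the split; the two components add back up to the original spectrogram. A quick check (an addition, not in the original run):

print(np.allclose(D_harmonic + D_percussive, D))  # True with the default margin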

Examining results

Plot the spectrograms of the original, harmonic, and percussive content. Even though the knocking is visible in the full spectrogram, it is a lot clearer in the percussive one.

# Pre-compute a global reference power from the input spectrum
rp = np.max(np.abs(D))

plt.figure(figsize=(12, 8))

plt.subplot(3, 1, 1)
librosa.display.specshow(librosa.amplitude_to_db(np.abs(D), ref=rp), y_axis='log')
plt.colorbar()
plt.title('Full spectrogram')

plt.subplot(3, 1, 2)
librosa.display.specshow(librosa.amplitude_to_db(np.abs(D_harmonic), ref=rp), y_axis='log')
plt.colorbar()
plt.title('Harmonic spectrogram')


plt.subplot(3, 1, 3)
librosa.display.specshow(librosa.amplitude_to_db(np.abs(D_percussive), ref=rp), y_axis='log', x_axis='time')
plt.colorbar()
plt.title('Percussive spectrogram')

plt.tight_layout()

It is possible to run an inverse Fourier transform on the harmonic and percussive content in order to hear the difference.

First up is the original sound.

ipd.Audio(y,rate=sr)

Next is the harmonic content.

ipd.Audio(librosa.istft(D_harmonic),rate=sr)

And lastly the percussive. The knocking is quite pronounced.

ipd.Audio(librosa.istft(D_percussive),rate=sr)
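
To put a rough number on how pronounced the knock is, the energy in the two components can be compared (an added illustration; the classifier works on spectrogram images, not on this ratio).

# Fraction of the total spectral energy that sits in the percussive component.
perc_ratio = np.sum(np.abs(D_percussive)**2) / np.sum(np.abs(D)**2)
print('Percussive energy fraction: %.2f' % perc_ratio)

A knocking clip should show a noticeably higher fraction than a smoothly running engine.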

Preparing plots for output

The images used to train a classifier should ideally be square and relatively small. The following documents the steps necessary to get a borderless square plot of a spectrogram.

mydpi = 150
pix_side = 256  # desired side length in pixels; figsize in inches is pix_side/mydpi

Here is the original sound file for comparison.

plt.figure(figsize=(pix_side/mydpi, pix_side/mydpi), dpi=mydpi)  # dpi must match mydpi for a true 256x256 px figure

CQT = librosa.amplitude_to_db(np.abs(librosa.cqt(y, sr=sr)), ref=np.max)
librosa.display.specshow(CQT, x_axis=None, y_axis=None)
plt.axis('off')

And this is the spectrogram of the percussive content in the same format.

plt.figure(figsize=(pix_side/mydpi, pix_side/mydpi), dpi=mydpi)

CQT = librosa.amplitude_to_db(np.abs(librosa.cqt(librosa.istft(D_percussive), sr=sr)), ref=np.max)
p = librosa.display.specshow(CQT, x_axis=None, y_axis=None)
plt.axis('off')

Save the file

p.figure.savefig('test.png')

and try opening and displaying it.

from IPython.display import Image
Image(filename='test.png') 
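
Note that plt.axis('off') only hides the axes; the figure still reserves its default margins, so the saved PNG can end up with a white border. A minimal sketch of a fully borderless, pixel-exact save (an alternative approach, not what the original notebook used; 'test_borderless.png' is just an illustrative name):

fig = plt.figure(figsize=(pix_side/mydpi, pix_side/mydpi), dpi=mydpi)
fig.add_axes([0, 0, 1, 1])  # one axes covering the whole figure, no margins
librosa.display.specshow(CQT, x_axis=None, y_axis=None)  # draws into the current axes
plt.axis('off')
fig.savefig('test_borderless.png')  # exactly pix_side x pix_side pixels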

Creating the training dataset

Load each sound file extracted from the videos and generate a spectrogram image for every two seconds of audio.

The soundfile package is used to save out each 2-second audio slice so that it can be listened to when evaluating the classifier's performance after training.

!pip install soundfile
Successfully installed soundfile-0.10.2
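
soundfile's write function takes the path, the data array, and the sample rate, in that order. A minimal example using the clip loaded earlier ('slice_demo.wav' is just an illustrative name):

import soundfile as sf
sf.write('slice_demo.wav', y, sr)  # (path, data, samplerate)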

First, a class to represent the spectrogram. The idea was to give it a method for each step in the process detailed above and call them in a loop, but that was deemed a concern for a future refactoring. As it stands the class is a bit redundant; a function would have sufficed.

import os
import soundfile as sf

class Spectrogram:
  def __init__(self, audiofile_path, dpi=150, side_px=256, total_duration=10, duration=2):
    """Cut the audio file into duration-second slices and save a CQT
    spectrogram of the percussive content of every slice as a PNG."""
    filepath, extension = os.path.splitext(audiofile_path)

    slices = int(total_duration / duration)

    for i in range(slices):
      spectrogram_path = filepath + '_' + str(i) + '.png'
      audio_slice_path = filepath + '_' + str(i) + '.wav'
      # Load the i-th slice of the file.
      y, sr = librosa.load(audiofile_path,
                           duration=duration,
                           offset=duration*i)
      # Save the raw audio slice so it can be listened to later.
      sf.write(audio_slice_path, y, sr)
      # Split into harmonic and percussive content.
      D = librosa.stft(y)
      D_harmonic, D_percussive = librosa.decompose.hpss(D)
      # Plot the CQT spectrogram of the percussive content; setting dpi
      # makes the saved figure side_px x side_px pixels.
      plt.figure(figsize=(side_px/dpi, side_px/dpi), dpi=dpi)
      CQT = librosa.amplitude_to_db(np.abs(librosa.cqt(librosa.istft(D_percussive), sr=sr)), ref=np.max)
      p = librosa.display.specshow(CQT, x_axis=None, y_axis=None)
      plt.axis('off')
      figure = p.figure
      figure.savefig(spectrogram_path)
      plt.close(figure)
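
Trying the class on the file from earlier before launching the full batch:

Spectrogram(audiofile_path)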

The actual batch job is run by these nested for loops. Since the generated slices are written next to the source recordings, the loop skips any .wav with an underscore in its name so that a re-run does not reprocess them.

import os
dirs = [base_dir+'knocking/',base_dir+'normal/']

for dirry in dirs:
  print(dirry)
  for filename in os.listdir(dirry):
    # Only process the original recordings; the generated slices also end
    # in .wav but carry an underscore in their names (e.g. 0005_0.wav).
    if filename.endswith('.wav') and '_' not in filename:
      print(filename)
      Spectrogram(dirry+filename)

/content/gdrive/My Drive/Colab Notebooks/fast.ai/KnockKnock/data/knocking/
0005.wav
0004.wav
0010.wav
0003.wav
0011.wav
0002.wav
0009.wav
0006.wav
0007.wav
0008.wav
0013.wav
0014.wav
0012.wav
0020.wav
0021.wav
0016.wav
0015.wav
0017.wav
0018.wav
0019.wav
0022.wav
0029.wav
0030.wav
0027.wav
0031.wav
0028.wav
0026.wav
0023.wav
0024.wav
0025.wav
0037.wav
0033.wav
0035.wav
0040.wav
0034.wav
0036.wav
0042.wav
0032.wav
0039.wav
0041.wav
0044.wav
0045.wav
0001.wav
0043.wav
0046.wav
/content/gdrive/My Drive/Colab Notebooks/fast.ai/KnockKnock/data/normal/
0004.wav
0012.wav
0011.wav
0010.wav
0007.wav
0009.wav
0013.wav
0008.wav
0006.wav
0005.wav
0023.wav
0021.wav
0019.wav
0022.wav
0014.wav
0020.wav
0017.wav
0016.wav
0018.wav
0015.wav
0029.wav
0027.wav
0026.wav
0032.wav
0030.wav
0028.wav
0024.wav
0031.wav
0033.wav
0025.wav
0034.wav
0043.wav
0036.wav
0042.wav
0039.wav
0041.wav
0037.wav
0035.wav
0038.wav
0040.wav
0003.wav
0046.wav
0002.wav
0001.wav
0044.wav
0045.wav
0047.wav
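
As a final sanity check (an addition, not in the original notebook), the number of generated spectrograms can be compared against the number of source files; every 10-second recording should have produced five PNGs.

import glob
for dirry in dirs:
  # Source recordings have no underscore in their names; the generated
  # slices and spectrograms do (e.g. 0005_0.wav, 0005_0.png).
  n_src = len([f for f in os.listdir(dirry) if f.endswith('.wav') and '_' not in f])
  n_png = len(glob.glob(dirry + '*.png'))
  print(dirry, '-', n_src, 'source files,', n_png, 'spectrograms')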