In the ever-growing field of machine learning, demand for efficient information retrieval methods increases as the amount of data to be processed grows. In audio-related machine learning, the short-time Fourier transform is one of the most common basic information retrieval methods. Established computing libraries provide a multitude of short-time Fourier transform implementations, but they all rely on central processing units. Because the short-time Fourier transform parallelizes effectively, it ought to be possible to execute it more efficiently on graphics processing units.

The purpose of this study was to evaluate the effectiveness of graphics processing units in calculating short-time Fourier transforms, in order to provide a more efficient means of information retrieval for audio machine learning systems. The first step was to develop a novel implementation of the short-time Fourier transform for graphics processing units using PyTorch. Once the implementation had been shown to be correct by testing its accuracy against established implementations, its time efficiency was evaluated against the same established methods.

In the time efficiency tests, even the slowest graphics processing unit run times were faster than the fastest central processing unit run times. Overall, the results demonstrated run times orders of magnitude faster on graphics processing units with the novel implementation than with established central processing unit implementations, while maintaining accuracy significantly above the threshold.

