Application of Short Time Fourier Transform and Wavelet Transform for Sound Source Localization Using Single Moving Microphone in Machine Condition Monitoring

This paper discusses a method for predicting the position of sound sources emitted by faulty machine components using a single microphone moving along a linear track at constant speed. The position of a sound source whose spectrum contains several frequencies is detected from the time-frequency distribution of the sound signal, computed with the Short Time Fourier Transform (STFT) and the Continuous Wavelet Transform (CWT). Because the sound pressure amplitude increases as the microphone moves closer to the source, the source position and frequency are predicted from the peaks of the time-frequency contour map. First, a numerical simulation is conducted using two sound sources that generate four different sound frequencies. The second case is an experimental analysis of a rotating machine with unbalance, misalignment, and a bearing defect. The results show that both STFT and CWT are able to detect the positions of multiple sound sources with multiple frequency peaks caused by machine faults. The STFT indicates the frequency very clearly, but not the peak position. The CWT, on the other hand, predicts the position of the sound at low frequency very clearly, but fails to detect the exact frequency because of overlapping.


Introduction
Sound source localization is a complex task that acoustic engineers face today. Some standards based on microphone arrays are used to analyze noise sources. In general, the methods fall into three categories: near-field acoustic holography, beamforming, and inverse methods. Most of these methods were developed using 20 or more channels of microphones and data acquisition [1]. Meanwhile, other researchers have been trying to develop sound source localization methods with a minimal number of microphones, such as binaural and inter-aural sound localization [2].
Likewise, vibration and acoustic signal analysis are very important methods in condition monitoring and fault diagnostics of machine components. With the rapid development of signal processing techniques, the analysis of stationary and transient signals has largely been based on well-known spectral techniques such as the Fourier transform and the wavelet transform [3,4,5]. Several methods have also been developed to localize sound sources using acoustic signals. Examples include sound source localization using a time-frequency histogram from two microphones [6], the fusion of visual reconstruction from a stereoscopic camera pair with several microphones [7], and the application of envelope and wavelet transforms to enhance the resolution of the received signals through the combination of different time-frequency contents [8].
This paper discusses the use of a single microphone moving along a linear track at constant speed to detect the sound source position. The source position and frequency peaks are detected from the peaks of the time-frequency distribution of the sound signal, obtained using the short time Fourier transform (STFT) and the continuous wavelet transform (CWT). First, a numerical simulation is conducted with two loudspeakers as the sound sources, generating sound signals at four different frequencies. The second case is the sound generated by a rotating machine with unbalance, misalignment, and a bearing defect. Each defect generates vibration and sound at a specific frequency. The single moving microphone method has the benefit that it could be developed into an autonomous or robotic condition monitoring system with simpler and cheaper devices and analysis than a multi-channel microphone array.

Short Time Fourier Transform and Continuous Wavelet Transform in Moving Microphone
The acoustic signals from faulty components are basically non-stationary. There are two basic approaches to analyzing a non-stationary vibration or acoustic signal in the time and frequency domains simultaneously. One approach is the short time Fourier transform (STFT), which splits the acoustic signal into segments in the time domain using a suitably chosen window function and then carries out a Fourier transform on each segment separately to deliver an instantaneous spectrum. The other approach is the continuous wavelet transform (CWT), in which the non-stationary acoustic signal is filtered into different frequency bands, which are split into segments in the time domain so that their frequency content and energy can be analyzed. Wavelet analysis overcomes the fixed-resolution disadvantage of the STFT because the CWT uses a windowing technique with variable-sized regions: long time intervals where more precise low-frequency information is wanted, and shorter regions where high-frequency information is wanted. In this analysis, the Morlet function is used as the basic wavelet function.
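The segment-window-transform procedure of the STFT can be illustrated with a minimal sketch. It is not the implementation used in this study: the function name `stft_magnitude`, the Hann window, the sampling rate, the frame length, and the hop size are all assumptions chosen for illustration.

```python
import numpy as np


def stft_magnitude(signal, fs, frame_len, hop):
    """Hann-windowed STFT magnitude (illustrative helper, not from the paper).

    Returns frame-centre times (s), bin frequencies (Hz), and an array of
    shape (n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Split the signal into overlapping segments and window each one.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One Fourier transform per segment gives the spectrum of that segment.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return times, freqs, magnitude


fs = 1000                                   # Hz, assumed sampling rate
t = np.arange(2 * fs) / fs                  # 2 s of sample times
tone = np.sin(2 * np.pi * 50.0 * t)         # stationary 50 Hz test tone
times, freqs, mag = stft_magnitude(tone, fs, frame_len=200, hop=100)
peak_freqs = freqs[np.argmax(mag, axis=1)]  # strongest bin in every frame
```

With a 200-sample frame at 1000 Hz the frequency bins are 5 Hz apart and each frame spans 0.2 s. A longer frame would sharpen the frequency axis but blur the time (and therefore position) axis, which is the fixed trade-off that the variable-sized windows of the CWT are meant to relax.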
Furthermore, when a fixed source is measured by a moving microphone over a period of time, the distance between source and microphone is no longer constant but a function of time. When the moving microphone approaches the sound source, the emitted sound needs less time to reach the microphone and the waveform is compressed, resulting in an increase in frequency, especially when the microphone moves at very high speed. This phenomenon is known as the Doppler effect [9]. Likewise, because the distance decreases, the amplitude increases and reaches its maximum when the microphone is at the position closest to the sound source. The highest peak in the time-frequency distribution of the sound signal at each frequency is then taken as the sound source position.
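The two effects described above, the Doppler shift and the growth of amplitude with decreasing distance, can be sketched numerically. The sketch assumes free-field spherical spreading (amplitude proportional to 1/r), a speed of sound of 343 m/s, and a hypothetical geometry with the source at 0.5 m along the track and 0.1 m away from it; none of these values come from the measurements in this paper, and the function names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def mic_geometry(t, mic_speed, source_x, offset):
    """Distance to the source and closing speed for a microphone at x = mic_speed * t."""
    dx = source_x - mic_speed * t
    r = math.hypot(dx, offset)
    closing_speed = mic_speed * dx / r  # positive while approaching the source
    return r, closing_speed


def observed_frequency(f_source, closing_speed, c=SPEED_OF_SOUND):
    """Doppler-shifted frequency for a moving observer and a stationary source."""
    return f_source * (1.0 + closing_speed / c)


def relative_amplitude(r):
    """Pressure amplitude under spherical spreading (assumed), proportional to 1/r."""
    return 1.0 / r


# Hypothetical pass: 0.1 m/s along a 1 m track, source at x = 0.5 m, 0.1 m off the track.
times = [i * 0.01 for i in range(1001)]
closest_t = max(
    times, key=lambda t: relative_amplitude(mic_geometry(t, 0.1, 0.5, 0.1)[0])
)
estimated_x = 0.1 * closest_t  # microphone position at the amplitude maximum
max_shift = max(
    abs(observed_frequency(125.0, mic_geometry(t, 0.1, 0.5, 0.1)[1]) - 125.0)
    for t in times
)
```

Under these assumptions the amplitude maximum falls at the source position, while the Doppler shift of a 125 Hz tone stays below 0.04 Hz. At the low microphone speeds considered here, the amplitude peak therefore carries the position information and the frequency shift is negligible.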

Numerical Simulation
First, a numerical simulation is conducted using two loudspeakers generating four different sound frequencies, as illustrated in Fig. 1. The first loudspeaker generates sinusoidal sound at frequencies of 25 and 50 Hz, and the second one generates signals at 85 and 125 Hz. The microphone moves along a track one meter long at a low speed of 0.1 m/s. Both sound sources also generate random noise with a signal-to-noise ratio of 5. The sound signal from the microphone in the time domain and its time-frequency distribution by STFT are depicted in Fig. 2, while the time-frequency wavelet energy distribution is shown in Fig. 3.
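A simulation of this kind can be sketched as follows. Only the four frequencies, the track length, and the microphone speed are taken from the text. The loudspeaker positions (0.3 m and 0.7 m), the 0.1 m distance from the track, the 1000 Hz sampling rate, the 1/r amplitude law, and the reading of the signal-to-noise ratio as a power ratio are assumptions, and the noise is simply added at the microphone. Each tone is tracked by following a single frequency through a sliding rectangular window, which corresponds to one row of a rectangular-window STFT with a hop of one sample.

```python
import numpy as np

fs = 1000          # Hz, sampling rate (assumed)
mic_speed = 0.1    # m/s, from the text
track_len = 1.0    # m, from the text
offset = 0.1       # m, distance from the track to the loudspeakers (assumed)
snr = 5.0          # signal-to-noise ratio, treated as a power ratio (assumed)
# Loudspeaker positions along the track are assumed; the frequencies are from the text.
sources = {0.3: (25.0, 50.0), 0.7: (85.0, 125.0)}

n = round(track_len / mic_speed * fs)
t = np.arange(n) / fs
mic_x = mic_speed * t  # microphone position at every sample

# Each tone reaches the microphone with amplitude 1/r (spherical spreading, assumed).
clean = np.zeros(n)
for src_x, tones in sources.items():
    r = np.hypot(mic_x - src_x, offset)
    for f in tones:
        clean += np.sin(2 * np.pi * f * t) / r

rng = np.random.default_rng(0)
noise = rng.normal(0.0, np.sqrt(np.mean(clean**2) / snr), n)
signal = clean + noise


def locate_tone(x, f, win=400):
    """Microphone position at which the short-time amplitude at frequency f peaks."""
    demod = x * np.exp(-2j * np.pi * f * t)  # shift frequency f down to 0 Hz
    envelope = 2 * np.abs(np.convolve(demod, np.ones(win) / win, mode="same"))
    return mic_x[np.argmax(envelope)]


estimated = {f: locate_tone(signal, f) for tones in sources.values() for f in tones}
```

The 400-sample window (0.4 s, or 4 cm of microphone travel) holds a whole number of cycles of every difference between the four frequencies, which keeps the tones from leaking into each other's estimates. Under these assumptions the estimated positions for 25 and 50 Hz should cluster near the first loudspeaker and those for 85 and 125 Hz near the second.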
The result of the wavelet transform is harder to interpret than the FFT result, because it is expressed in scales rather than frequencies. In a broad sense, it is better to speak of the approximate frequency corresponding to a scale, given by the relationship $F_a = F_c / (a \, T_s)$, where $F_c$ is the center frequency of the wavelet, and $a$ and $T_s$ are the scale and the sampling period, respectively. Fig. 2 shows that, by STFT, the sounds at 25 Hz and 50 Hz are detected at the first loudspeaker position, while the sounds at 85 Hz and 125 Hz are detected at the second loudspeaker position. However, the peak is not easy to distinguish from the frequency line.
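The scale-to-frequency relationship can be made concrete with a short sketch. The function names are illustrative, the 1000 Hz sampling rate is an assumption, and the center frequency of 0.8125 is the value commonly quoted for the real Morlet wavelet in wavelet toolboxes rather than a value reported in this paper.

```python
def scale_to_frequency(center_freq, scale, sampling_period):
    """Approximate frequency F_a = F_c / (a * T_s) for a given wavelet scale."""
    if scale <= 0 or sampling_period <= 0:
        raise ValueError("scale and sampling_period must be positive")
    return center_freq / (scale * sampling_period)


def frequency_to_scale(center_freq, frequency, sampling_period):
    """Inverse relation: the scale whose approximate frequency is `frequency`."""
    if frequency <= 0 or sampling_period <= 0:
        raise ValueError("frequency and sampling_period must be positive")
    return center_freq / (frequency * sampling_period)


MORLET_CENTER_FREQ = 0.8125  # commonly quoted value for the real Morlet wavelet (assumed)
T_S = 1.0 / 1000.0           # sampling period for an assumed 1000 Hz sampling rate

# Scales that would correspond to the four simulated tones under these assumptions.
scales = {
    f: frequency_to_scale(MORLET_CENTER_FREQ, f, T_S)
    for f in (25.0, 50.0, 85.0, 125.0)
}
```

Because frequency is inversely proportional to scale, doubling the scale halves the frequency: under these assumptions 50 Hz corresponds to a scale of 16.25 and 25 Hz to a scale of 32.5. Low frequencies therefore sit at high scales, and the higher tones are squeezed into a narrow range of small scales.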
On the other hand, in the time-frequency wavelet energy distribution shown in Fig. 3, the sound position at low frequency can be detected more clearly at the first sound source. However, the two higher frequencies at the second sound source are not detected well. Although the 100 Hz to 160 Hz frequency range was zoomed in to increase the frequency resolution, the sounds at 85 Hz and 125 Hz are found to merge into a single peak distributed from 85 Hz to 120 Hz. One of the drawbacks of the CWT is the overlapping that occurs at higher frequencies [4]. The spectrum peaks at 85 Hz and 125 Hz are too close to each other in both position and frequency, so the overlapping makes the two frequency peaks merge into a single distributed peak. The resolution of the CWT is very good at high scales, which means it is very satisfactory at low frequencies. Therefore, the CWT contributes well to detecting sound sources at relatively low frequencies or when the spectral peaks are farther apart in frequency.

The Rotor Dynamics Model
The second case is the application of both methods to a simple rotating machine consisting of an electric motor and two pairs of ball bearings with two thick rotors and shafts, as illustrated in Fig. 4. The right pair has faulty, worn-out bearings, so they have more clearance or looseness between the inner ring, the outer ring, and the balls. The shafts between the electric motor and the two bearing-shaft-rotor assemblies are connected by flexible couplings. Moreover, alignment and balancing procedures were not available in this case, so unbalance and misalignment signals may appear in the sound analysis. The microphone moves at a very low speed of 0.024 m/s. The time waveform of the sound signal and the time-frequency spectrogram using STFT are shown in Fig. 5, while the time-frequency wavelet energy spectrum is shown in Fig. 6.
The sound signal in the time domain presented in Fig. 5 shows that a higher sound pressure level is generated by the worn-out pair of bearings. Most of the energy is emitted at the first harmonic of the rotation frequency, about 48 Hz, as confirmed by the time-frequency spectrogram using STFT. Both bearing-shaft-rotor parts generate sound at the first harmonic; however, the worn-out part generates the highest sound energy. This is possibly caused by unbalance and misalignment occurring in both parts. Moreover, the second, third, fourth, and fifth harmonics are also generated along the system, possibly caused by the bearing looseness or clearance. In addition, some signals at sub-harmonic frequencies are observed to be emitted by the electric motor. Fig. 6 shows the time-frequency spectrogram using CWT in various frequency ranges. In the sub-harmonic frequency range, some transient sound is found to be emitted by the last worn-out bearing around the fundamental train frequency (FTF) of the bearing. Furthermore, the CWT also shows that the highest energy is emitted from the worn-out part at the rotation frequency of 48 Hz, higher than that generated by the normal one. In addition, higher energy is also emitted by the last bearing at the second and third harmonics. The same problem is found when using the CWT at higher frequencies: it is difficult to observe the signal frequency clearly because of the overlapping phenomenon. The overlapping increases significantly with a longer sampling period.

Conclusion
This paper has presented the application of a single moving microphone to predict the sound source position for machine condition monitoring. It is observed qualitatively that the time-frequency distributions of the sound signal obtained by STFT and CWT can be used to predict the sound source position. The STFT time-frequency distribution shows the frequency clearly, although the position of the peak is less clear. The CWT, on the other hand, predicts the position of the sound at low frequency very clearly, but fails to show the exact frequency because of overlapping. A combination of STFT and CWT could be applied to overcome both drawbacks in machine condition monitoring. Furthermore, a band-pass filter could be used to improve the CWT at higher frequencies.

Figure 2: The time waveform (left) and the time-frequency spectrogram using STFT (right) of the sound signal in the numerical model.

Figure 3: Time-frequency wavelet energy spectrum at different frequency ranges.

Figure 5: Time waveform of the sound signal (left) and time-frequency spectrogram by STFT (right).

Figure 6: The time-frequency spectrogram using CWT in various frequency ranges.