


Title:
MOBILE SOUND ANALYSIS BASED ON DEEP LEARNING
Document Type and Number:
WIPO Patent Application WO/2020/147928
Kind Code:
A1
Abstract:
The sound analysis of an object for determining defects shall be simplified. Therefore, there is provided a method of analysing a sound example of an object to be examined including the steps of recording (S1) a plurality of sound samples output from a reference object, deep learning (S3, S4) a relation between the plurality of sound samples and at least one property of the reference object by a mobile phone or a computing device, recording (12) the sound example of the object to be examined by the mobile phone (11) and assigning the at least one property to the object depending on the recorded sound example and the learned relation.

Inventors:
FISHKIN ALEXEY (DE)
PFEIFER UWE (DE)
ROSHCHIN MIKHAIL (DE)
SASU LUCIAN-MIRCEA (RO)
SOLER GARRIDO JOSEP (DE)
Application Number:
PCT/EP2019/050927
Publication Date:
July 23, 2020
Filing Date:
January 15, 2019
Assignee:
SIEMENS AG (DE)
International Classes:
G01M5/00; G01M7/08
Domestic Patent References:
WO 2004/017038 A1 (2004-02-26)
Foreign References:
CN 103776903 B (2016-07-27)
CN 107292286 A (2017-10-24)
CN 206931362 U (2018-01-26)
US 6463805 B1 (2002-10-15)
US 7059191 B2 (2006-06-13)
Other References:
ALAIN DUFAUX: "Detection and Recognition of Impulsive Sound Signals", Ph.D. thesis, University of Neuchatel, 2001
HANS-JOACHIM KORB; JOACHIM WAGNER: "Automatic Quality Control of Roofing Tiles", in Neural Networks: Artificial Intelligence and Industrial Applications, Proceedings of the Third Annual SNN Symposium on Neural Networks
LIPAR, PRIMOZ ET AL.: "Automatic Recognition of Machinery Noise in the Working Environment", STROJNISKI VESTNIK - JOURNAL OF MECHANICAL ENGINEERING, vol. 61, no. 12, December 2015 (2015-12-01), pages 698-708, ISSN: 0039-2480, Retrieved from the Internet
GERMEN, E.; BASARAN, M.; FIDAN, M.: "Sound based induction motor fault diagnosis using Kohonen self-organizing map", MECHANICAL SYSTEMS AND SIGNAL PROCESSING, vol. 46, no. 1, 2014, pages 45-58
BENKO, U.; PETROVIC, J.; JURCIC, D.; TAVCAR, J.; REJEC, J.: "An approach to fault diagnosis of vacuum cleaner motors based on sound analysis", MECHANICAL SYSTEMS AND SIGNAL PROCESSING, vol. 19, no. 2, 2005, pages 427-445, XP004536746, DOI: 10.1016/j.ymssp.2003.09.004
D. CIRESAN; U. MEIER; J. MASCI; J. SCHMIDHUBER: "A Committee of Neural Networks for Traffic Sign Classification", IJCNN, 2011
Claims

1. Method of analyzing a sound example of an object to be examined including the steps of

recording (S1) a plurality of sound samples output from a reference object,

deep learning (S3, S4) a relation between the plurality of sound samples and at least one property of the reference object by a mobile phone (11) or a computing device (4), recording the sound example (12) of the object to be examined by the mobile phone (11) and

assigning the at least one property to the object depending on the recorded sound example (12) and the learnt relation.

2. Method according to claim 1, wherein the object is a turbine (1) and specifically an arrangement of blades (6) of the turbine (1).

3. Method according to claim 1 or 2, wherein the sound example (12) and the sound samples are generated by hammering on the object.

4. Method according to one of the preceding claims, wherein the object is firmly integrated into an assembly.

5. Method according to one of the preceding claims, wherein the relation learnt by said deep learning (S3, S4) relates to a plurality of properties of the object, and based on this plurality of properties a classification (S5) with respect to a status of the object is performed by the mobile phone (11).

6. Method according to one of the preceding claims, wherein prior to the step of deep learning (S3, S4) a preprocessing (S2) of the sound example (12) and/or sound samples is performed on the mobile phone (11).

7. Method according to claim 6, wherein the preprocessing (S2) includes FFT, spectrum analysis, wavelet analysis, Hidden Markov Model and/or a statistical algorithm.

8. Method according to one of the preceding claims, wherein the mobile phone (11) is controlled by a software application stored thereon to perform the method.

9. Method according to one of the preceding claims, wherein further sound samples are generated for said deep learning (S3, S4) by adding noise to at least some of the recorded sound samples.

10. Method according to one of the preceding claims, wherein said deep learning (S3, S4) is performed permanently.

11. Method according to one of the preceding claims, wherein the step of assigning and respective calculations after recording the sound samples are performed on the mobile phone (11).

12. Method according to one of the claims 1 to 10, wherein the recorded sound example (12) is wirelessly transferred to an external calculation device, the step of assigning and respective calculations after recording the sound samples are performed on the external calculation device and a result of said assigning is retransferred to the mobile phone.

13. Method according to one of the preceding claims, wherein the step of recording the sound example (12) is performed on the mobile phone (11).

Description

Mobile Sound Analysis based on Deep Learning

The present invention relates to a method of analysing a sound example of an object to be examined including the step of recording a plurality of sound samples output from a reference object.

Objects like machines, components of machines but also structures like bridges, buildings etc. have to be analysed with respect to the quality of their manufacturing. For example, the blades of a turbine have to be examined. The blades are mounted into a specific turbine structure. The aim is to provide a system which is capable of automatically assessing the health of the integrated blades. The basic approach for this consists of a visual inspection of the turbine, which implies opening the equipment and inspecting the status of the blades one by one. The disassembly, inspection and reassembly of such a unit are time-prohibitive, with a negative impact on production and a requirement for specialised human labour.

Another approach for analysing the mechanical structure of objects is based on sound analysis. Such an acoustical analysis of a turbine can be performed by recording the sound produced by hammering the integrated blades and storing respective sound samples for a later analysis. The audio files may be analysed in situ or transferred to a sound expert who then comes to a decision. Eventually, the results are communicated to the beneficiary. Although this approach avoids disassembly at the same scale as the first approach, the key issues here are the availability of a sound expert and the delayed feedback.

There is a plurality of tools which can be used for sound signal analysis. However, such tools are usually devoted to sound processing in general. Therefore, they are only a part of the whole processing and decision-support workflow for judging the quality of the object.

Furthermore, there are research results targeting recognition systems based on sounds. For example, Alain Dufaux, "Detection and Recognition of Impulsive Sound Signals", Ph.D. thesis, University of Neuchatel, 2001 (online: http://lpm.epfl.ch/webdav/site/lpm/users/175547/public/Sound_recognition.pdf) describes detection and recognition of impulsive sound signals, targeting alarm issuing.

Automatic quality control of roofing tiles is proposed in Hans-Joachim Korb, Joachim Wagner, "Automatic Quality Control of Roofing Tiles", in Neural Networks: Artificial Intelligence and Industrial Applications, Proceedings of the Third Annual SNN Symposium on Neural Networks, Nijmegen, The Netherlands, 14-15 September 1995, Bert Kappen and Stan Gielen (Eds.). This procedure restricts the preprocessing stage to frequency analysis based on the fast Fourier transformation and a shallow neural network based architecture for classification.

Automatic recognition of machinery noise in the working environment is discussed in LIPAR, Primoz et al., Automatic Recognition of Machinery Noise in the Working Environment. Strojniski vestnik - Journal of Mechanical Engineering, vol. 61, no. 12, p. 698-708, Dec. 2015, ISSN 0039-2480 (available at: http://ojs.sv-jme.eu/index.php/sv-jme/article/view/sv-jme.2015.2781/870).

In the paper Germen, E., Basaran, M., Fidan, M. (2014), Sound based induction motor fault diagnosis using Kohonen self-organizing map, Mechanical Systems and Signal Processing, vol. 46, no. 1, p. 45-58, DOI: 10.1016/j.ymssp.2013.12.002, the authors discuss classification of mechanical and electrical faults in induction motors, after extracting features from the acoustic data.

In Benko, U., Petrovic, J., Jurcic, D., Tavcar, J., Rejec, J. (2005), An approach to fault diagnosis of vacuum cleaner motors based on sound analysis, Mechanical Systems and Signal Processing, vol. 19, no. 2, p. 427-445, DOI: 10.1016/j.ymssp.2003.09.004, the authors develop a sound-analysis based system to detect and localise bearing faults, defects in fan impellers, improper brush-commutator contacts and rubbing of rotating surfaces.

Moreover, the company 3D Signals proposes a solution where the platform requires sensor modules mounted near rotating equipment and cloud data aggregation. The same idea of relying on remote services in order to get the decision is found in a solution of the company Augury. There is a need to contact a remote server to find out the inferred sound category.

Finally, the KSB® Sonolyzer app targets asynchronous motors, potential energy saving and increased efficiency. It analyses "the audio signal of the motor fan to determine the current rotational speed of the asynchronous motor", and targets mainly KSB-supported devices. At the moment it is restricted to machines having a drive rating of up to 200 kW.

Document WO 2004 017038 A1 presents a portable tool for diagnostics of industrial rotating machines. The tool is constituted by a parabolic-shape collector of sound waves, a microphone, a supporting block, a handle, an electronic amplifier, an analogue/digital converter, a microcomputer and application software. The primary function of the tool is to visualize malfunctions of rotating machines in order to reduce diagnostic services for machine maintenance. The tool makes a comparison of frequency characteristics of sound waves between those emitted by machines "with defect" and those of machines "without defect". The tool can substitute sophisticated electronic maintenance devices with relatively good accuracy.

Furthermore, document US 6463805 B1 discloses an apparatus comprising an inspection jig including a sample mounting portion for mounting a sample, a vibrator for applying vibration to the sample and a sound collector for collecting a vibration sound when the vibration is applied to the sample by the vibrator, and a sound detector for conducting frequency analysis of the vibration sound collected by the sound collector. The defects of the sample are determined by sampling and analysing the vibration sound on the time series when the vibration is applied to the sample by the vibrator.

In addition, document US 7059191 B2 discloses a method for determining whether a device is defective by analysing the sound signals generated by the device. Digital samples are generated to represent the sound signals. The digital samples are transformed from the time domain to the frequency domain to generate a frequency spectrum. By comparing the levels of intensity at a corresponding frequency to threshold levels of intensity, defective devices can be determined.

It is the object of the present invention to automatically assess the health of an object with simpler tools.

According to the present invention this object is solved by a method according to claim 1. Further favourable developments are defined in the subclaims.

Specifically, according to the invention there is provided a method of analysing a sound example of an object to be examined including the step of recording by a mobile phone a plurality of sound samples output from a reference object. This means that a common mobile phone is used for gathering a plurality of sound samples. Thus, a commonly available device like the mobile phone is used for analysing the object. Particularly, the microphone or the microphones of the mobile phone are used to record the sound samples. Additionally, the processor and memory of the mobile phone are used for storing and further processing the sound signals delivered by the microphone of the mobile phone. Audio files may be provided from the respective sound samples. The sound samples are output from the object, where the object is excited respectively. The object may be the reference object at a later point of time (i.e. the object aged) or the object is not identical with the reference object but of the same type.

In a second step of the inventive method there is performed deep learning of a relation between the plurality of sound samples and at least one property of the object by the mobile phone or (another) computing device. Deep learning is also known as deep structured learning or hierarchical learning. It is a certain kind of machine learning and can be supervised, partially supervised or unsupervised. Deep neural networks can be used for deep learning methods. The concept of deep learning is based on layers which are assumed to correspond to levels of abstraction or composition. Higher levels representing more abstract concepts are learnt from lower levels. In the end the mobile phone learns to classify the sound samples in a more or less abstract manner.

The above two steps of recording a plurality of sound samples and deep learning are performed prior to actually analysing the object. These two steps represent a training phase of the method. After such training with the reference object, an object to be examined can be analysed. This step of recording the sound example of the object to be examined is performed by the mobile phone. I.e. the mobile phone may record, in addition to the sound samples from the reference object, a sound sample of the object to be examined. This sound sample will be further processed by the mobile phone on the basis of the deep-learned relation. Specifically, there is performed a step of assigning the at least one property to the object depending on the recorded sound sample and the learnt relation. Thus, if the sound example indicates the at least one property, this property is assigned to the object. Otherwise, if the sound example does not indicate the at least one property, this property is not assigned to the object. Perhaps the sound example indicates another property which can be assigned to the object. Thus, the object can be classified in a specific way.

The advantage of the inventive method is that the analysis of the object can be performed by a mobile phone and a respective app. Thus, there is an integrated solution which allows for automatic sound classification in the industrial domain based entirely on mobile applications.

In one embodiment the object is a turbine and specifically an arrangement of blades of the turbine. Turbines usually have a very complex structure and it is not easy to analyse them visually. Therefore, it is very favourable to perform the analysis of the object simply by a mobile phone and its app.

The sound example and the sound samples may be generated by hammering on the object. Such hammering may be performed by a wooden hammer. Of course, other methods may be used for exciting the object to emit sound. For instance, piezo elements may be attached to the object and stimulated electrically.

In a further embodiment the object is firmly integrated into an assembly. Even if the object is not directly accessible, the whole assembly may be excited to emit sound. If the object does not have the predetermined quality, the complete assembly will emit another sound example than in the case that the object has the predetermined quality. Thus, it is not necessary to isolate the object from the assembly.

In a further embodiment the relation learnt by deep learning relates to a plurality of properties of the object, and based on this plurality of properties a classification with respect to a status of the object is performed by the mobile phone. For instance, the object may be classified into quality levels like "good", "warning" and "alert". If the quality of the object is perfect, it may be classified as "good". Otherwise, if the quality of the object is degraded but the object can be used for some more operating hours, a respective warning can be given as classification result. Finally, if the object is defective an alert can be output as classification result. In this example three classes are provided for determining the property of the object. However, in other instances more or fewer classes may be used.

Prior to the step of deep learning a preprocessing of the sound example and/or the sound samples may be performed on the mobile phone or the computing/measuring device. With such preprocessing the sound example or the sound samples may be prepared for enhancing the deep learning with respect to quantity or quality.

Specifically, the preprocessing may include FFT, spectrum analysis, wavelet analysis, Hidden Markov Model and/or a statistical algorithm. Particularly, spectral information may help to classify the status of the object. Such preprocessing features may be implemented by standard audio processing tools.
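Such an FFT-based preprocessing step can be sketched as follows. This is only a minimal illustration using NumPy, not the implementation of the invention; the sampling rate, FFT size, Hann window and function name are assumptions chosen for the example.

```python
import numpy as np

def spectral_density(samples, rate=44100, n_fft=1024):
    """Power spectral density of a mono sound sample (illustrative sketch)."""
    # Apply a Hann window to the first n_fft samples to reduce spectral leakage.
    frame = samples[:n_fft] * np.hanning(len(samples[:n_fft]))
    spectrum = np.fft.rfft(frame, n=n_fft)
    return np.abs(spectrum) ** 2 / n_fft

# A 1 kHz test tone sampled at 44.1 kHz should peak near the 1 kHz bin.
t = np.arange(44100) / 44100.0
tone = np.sin(2 * np.pi * 1000.0 * t)
psd = spectral_density(tone)
peak_hz = np.argmax(psd) * 44100 / 1024
```

The resulting spectral vector could then serve as input to the deep learning stage described below.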

As already indicated above, the mobile phone (e.g. smartphone) may be controlled by a software application stored thereon to perform the method of analysing the sound example of the object. Such a software application (in short "app") may be downloaded from a supplier of the object or from specific analysing suppliers. The download can be performed via usual channels. Furthermore, such an app does not require specific hardware of the mobile phone.

Further sound examples for the deep learning may be generated by adding noise to at least some of the recorded sound samples. Thus, artificial sound samples can be obtained by such adding of artificial noise. Such further sound samples enhance the quality and speed up the process of deep learning.

In a preferred embodiment the deep learning is performed permanently. Such permanent learning is also called "active learning". It may be performed in the background to improve the analysis over time.
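The noise-based augmentation mentioned above can be sketched as follows; the Gaussian noise model, the signal-to-noise ratio and the number of copies are illustrative assumptions, not taken from the description.

```python
import numpy as np

def augment_with_noise(samples, n_copies=4, snr_db=20.0, seed=0):
    """Create artificial training samples by adding white noise at a given SNR.

    n_copies, snr_db and the Gaussian noise model are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    signal_power = np.mean(samples ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return [samples + rng.normal(0.0, np.sqrt(noise_power), size=samples.shape)
            for _ in range(n_copies)]

# One clean recording yields several noisy variants for training.
clean = np.sin(2 * np.pi * 440.0 * np.arange(8000) / 8000.0)
augmented = augment_with_noise(clean)
```

Each noisy variant keeps the label of the clean recording, enlarging the training set at no recording cost.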

In one embodiment the step of assigning and respective calculations after recording the sound samples are performed on the mobile phone. The advantage is that the mobile phone can be used offline. The user does not need any other calculation means than the mobile phone including the app.

Alternatively, the recorded sound example (preprocessed or not) is wirelessly transferred to an external calculation device, the step of assigning and respective calculations after recording the sound samples are performed on the external calculation device and a result of said assigning is retransferred to the mobile phone. The advantage of this solution is that a highly sophisticated analysis can be performed on a powerful computer and the mobile phone just has to display the result of the assignment or classification.

The present invention will now be described in more detail along with the attached drawings showing in

FIG 1 a turbine with a measuring probe;

FIG 2 a part of the turbine together with the measuring probe;

FIG 3 a workflow diagram for producing a classification model; and

FIG 4 an example of a pipeline for sound classification in a mobile phone.

The following examples are preferred embodiments of the present invention.

The status of an object shall be analysed by acoustical methods. In the following examples the object is a turbine. However, the analysing method is not limited to turbines. Other constructions may be analysed as well.

FIG 1 shows a turbine 1. The turbine 1 has a driving part 2 with a plurality of blade wheels. Specifically, the blades of the blade wheels shall be analysed. The acoustical analysis is performed by a measuring probe 3. For learning a proper analysing model a computing device 4 may be used. Such a computing device may be a laptop, a PC or a smartphone, for instance.

FIG 2 shows an enlarged view of the driving part 2 including blade wheels 5. Each blade wheel has a plurality of blades 6. The blades 6 are firmly fixed between a hub 7 and a circular frame 8. The mechanical status of the blades 6, the blade wheels 5 or the complete turbine 1 has to be examined. For this analysis the acoustical measuring probe 3 is used. The measuring probe 3 has a hammer 9 and a microphone 10, for instance.

First of all, a model has to be created which maps a sound example to a specific status of the object. Such a model may be created as shown in FIG 3. A plurality of sound samples of one or more reference objects (e.g. a perfect quality turbine, a defective turbine etc.) has to be recorded in a first step S1. The sound samples have to be recorded in connection with the respective status of the reference object. I.e. each sound sample has to be assigned to the corresponding status of the reference object. For example, some sound samples have to be recorded for a healthy turbine or other objects. Other sound samples have to be recorded for an unhealthy turbine. Further sound samples could be recorded for an intermediate condition state of the turbine. In this example there are three different labels for the condition state of the turbine. However, more or fewer labels may be used for the condition state of the reference object.

In a further optional step of the model creation process there is performed a preprocessing S2 of the sound samples. The preprocessing may include a sound analysis and a specific labelling by a domain expert. Furthermore, the preprocessing stages may include any classical methods to be found in the literature (e.g. FFT, spectrum and wavelet analysis or customized algorithms).

As a result of the preprocessing step S2 or immediately from the sound recording of step S1 there may be provided a plurality of training samples in step S3. The derived training samples are further provided as inputs for a machine learning process S4. The outcome of the training step S4 is a classification pipeline or classification model in step S5.
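The workflow S1 to S5 can be sketched end to end as follows. This is a toy illustration only: a log-spectrum feature extractor and a nearest-centroid classifier stand in for the preprocessing and deep learning stages of the invention, and all rates, sizes and labels are assumptions.

```python
import numpy as np

def features(recording, n_fft=256):
    # S2: preprocessing - log power spectrum as a fixed-length feature vector.
    spectrum = np.abs(np.fft.rfft(recording[:n_fft], n=n_fft)) ** 2
    return np.log1p(spectrum)

def train(recordings, labels):
    # S3/S4: derive training samples and learn a sound -> status relation.
    # A nearest-centroid model stands in here for the deep network.
    X = np.array([features(r) for r in recordings])
    y = np.array(labels)
    return {lab: X[y == lab].mean(axis=0) for lab in set(labels)}

def classify(model, recording):
    # S5: assign the status label whose learned centroid is closest.
    f = features(recording)
    return min(model, key=lambda lab: np.linalg.norm(model[lab] - f))

# S1: synthetic "hammer" recordings - healthy rings at 440 Hz, unhealthy at 880 Hz.
t = np.arange(2048) / 8000.0
recs = [np.sin(2 * np.pi * 440.0 * t) * a for a in (0.8, 1.0, 1.2)] + \
       [np.sin(2 * np.pi * 880.0 * t) * a for a in (0.8, 1.0, 1.2)]
labs = ["healthy"] * 3 + ["unhealthy"] * 3
model = train(recs, labs)
status = classify(model, 0.9 * np.sin(2 * np.pi * 440.0 * t))
```

In the real method the nearest-centroid stage would be replaced by the deep network trained in step S4.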

The whole pipeline is customized preferably offline, based on previously recorded and labelled sound files or sound samples. Even if the workflow shown in FIG 3, which produces a pipeline for classification of sound data, depicts a laptop for training in step S4, any other computing device like a mobile phone or the like may be employed. Thus, based on previously recorded sound samples, e.g. produced by hammering parts of a turbine, together with corresponding status labels (for example healthy/unhealthy/intermediate condition state, issued by sound analysis), a model or pipeline is built consisting of e.g. data preprocessing stages and a data-driven model which learnt the relation or the association between input sound and its corresponding status label.

To be used in production of an object, the model or pipeline is deployed on a mobile phone 11 as shown in FIG 4. The model may allow one to estimate the health status of the blades integrated in the turbine or other objects. The result is provided instantaneously without requiring access to external analysis services and/or domain experts. In the example of FIG 4 a model or pipeline for sound classification is depicted symbolically within a mobile phone 11. With this mobile phone 11 a mobile sound input 12 may be realised in the time domain. An optional preprocessing step in the mobile phone 11 may perform an FFT in order to obtain a spectral density 13 in a frequency domain. The model for analysing the sound samples may include e.g. 32 features 14. The spectral density function 13 is fed to the 32 features 14 by a first convolution 15. In the present example a max-pooling 16 is performed in order to combine the features to a second stage of features 17. A second convolution 18 may be performed in order to obtain feature maps 19 in a third feature stage. A neural network 20 may be fed with the feature maps 19 by a second max-pooling 21, where the features are combined again. The network 20 may include several node stages which are fully connected. The neural network may be able to classify the input sound example 12 into three classes: healthy, intermediate and unhealthy.
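The stage sequence of FIG 4 (spectral density 13, convolution 15, max-pooling 16, second convolution 18, second max-pooling 21, fully connected network 20) can be sketched in shape terms as follows. The weights here are random and untrained, and the kernel sizes, input length and channel counts (apart from the 32 features and the three output classes named above) are assumptions chosen only to show how each stage transforms the data.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, kernels):
    """Valid 1-D convolution followed by ReLU.

    x: (length, ch_in), kernels: (k, ch_in, ch_out) -> (length - k + 1, ch_out)
    """
    k, _, ch_out = kernels.shape
    out = np.empty((x.shape[0] - k + 1, ch_out))
    for i in range(out.shape[0]):
        out[i] = np.tensordot(x[i:i + k], kernels, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)

def max_pool(x, size=2):
    """Halve the time axis, keeping the channel count."""
    trimmed = x[:(x.shape[0] // size) * size]
    return trimmed.reshape(-1, size, x.shape[1]).max(axis=1)

spectral = rng.random((128, 1))                                           # spectral density 13
stage1 = max_pool(conv1d_relu(spectral, rng.standard_normal((5, 1, 32)))) # conv 15 + pool 16 -> 32 features
stage2 = max_pool(conv1d_relu(stage1, rng.standard_normal((5, 32, 16)))) # conv 18 + pool 21 -> feature maps 19
logits = stage2.flatten() @ rng.standard_normal((stage2.size, 3))         # fully connected network 20
status = ["healthy", "intermediate", "unhealthy"][int(np.argmax(logits))]
```

With trained weights, the final three logits would correspond to the healthy, intermediate and unhealthy classes.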

Due to its usage simplicity, the application can be operated by a regular worker after minimal instruction on using it. Furthermore, the pipeline deployed in the application can be changed, as it may be inappropriate for other sound-specific contexts (e.g. turbine type and family, rotor type, etc.).

In the following, some optional details and advantages of the inventive method are shown. So far, for the diagnosis phase one had either to perform a visual inspection of the blades (hence the disassembly and then reassembly of the turbine was needed, which is time-consuming), or to record the acoustic sounds emitted by rings of integrated blades after hammering and finally ask a sound expert to perform offline analysis.

In this latter scenario, one may encounter a considerable delay between sound recording and its analysis. Also, one has to manage the secure transfer of the recorded sound between the turbine's place and the expert's location.

In the proposed method, the model can be retrained any time on broader collections of labelled data. Retraining the model and re-deploying the updated application can last up to a few hours (depending on the training set size and hardware used). The domain expert is needed only for providing labels on a size-limited training subset and, unlike so far, he/she is no longer required for later diagnosis.

There is a combination of factors which realises the advantages mentioned in the previous subsections:

a) The inventive method avoids time-demanding visual inspection and off-line sound analysis as performed so far;

b) The inventive method combines state-of-the-art algorithms for sound processing and machine learning; for visual recognition, machine learning is known to produce better accuracy than human experts as stated in D. Ciresan, U. Meier, J. Masci, J. Schmidhuber - A Committee of Neural Networks for Traffic Sign Classification, IJCNN 2011;

c) The method runs at least partly on a mobile phone and specifically a smartphone, and can be operated in situ by a regular worker without a sound processing or pattern recognition background. The result is provided instantaneously without requiring external services;

d) Updating the model (e.g. for another turbine type) and redeploying the method can be done in a few hours;

e) The impact of the human expert in the loop is reduced to a minimum, hence mitigating the risk of the expert's unavailability due to retirement or overwhelming load.

As the analysis can be made in situ, the delay between sound recording and the turbine's status assessment is reduced to a minimum. Thus the time required for diagnosis is optimized. It also cuts financial costs which are normally induced by involving a sound expert in the analysis. Further recorded cases (i.e. an enriched training data set) can be easily integrated in the application, thus providing enriched knowledge anywhere, anytime, to anyone.

The above embodiments may be varied in plural ways. For instance, in the above embodiment the analysis runs entirely on the mobile phone. An alternative solution is to submit the recorded sound to another calculation device or application (e.g. stored in the cloud) through e.g. a secure channel and to perform the analysis remotely. Finally the result is received on the mobile phone.

If the initial training data is scarce, one may get more sound samples by adding noise or via other sound-processing-specific methods. Furthermore, the recorded sounds may be stored for later analysis or for enriching the training with further samples. In another embodiment the data may be grouped based on turbine type, family etc., and later transfer learning may be used to enrich the knowledge of the scarce groups. Moreover, one may use active learning to ask for labelling of the problematic cases, i.e. the cases where there is high uncertainty on correct labelling from the model's perspective. After a sound expert performs labelling, the training data set is enriched and later on one may retrain the model and redeploy the application.