Sound identification system by parametric classification of derived series
Patent abstract:
The object of the present invention is a sound identification system based on the description and selection of a small number of characterizing parameters, on obtaining series derived from the classification scores of the sound windows, and on the definitive assignment of the sound to a class through the parametric characterization and classification of those derived series. The invention falls within the field of electronic and communications technology, specifically for application in information processing systems and in archiving and retrieval systems, among others. (Machine translation by Google Translate, not legally binding)

Publication number: ES2667626A1
Application number: ES201600960
Filing date: 2016-11-11
Publication date: 2018-05-11
Inventors: Alejandro CARRASCO MUÑOZ; Amalia LUQUE SANDRA; Javier ROMERO LEMOS; Julio BARBANCHO CONCEJERO
Applicant: Universidad de Sevilla
Patent description:
DESCRIPTION

Sound identification system by parametric classification of derived series

Object of the invention

The present invention aims at a sound identification system based on the description and selection of a small number of characterizing parameters, on obtaining series derived from their classification scores, and on the definitive assignment to a sound class through the parametric characterization and classification of the derived series. The invention falls within the sector of electronic and communications technology, specifically for application in information processing systems and in archiving and retrieval systems, among others.

State of the art

The first step in the identification of sounds is the extraction of their characteristics, that is, obtaining a set of parameters that represent them. These parameters usually take the form of a vector that evolves over time, and can be obtained by temporal, spectral, homomorphic or linear predictive coding processes, among others. A summary of sound feature extraction techniques can be found in [1], [2], [3].

Many feature extraction procedures divide the sound into temporal fragments (windows) of very short duration (typically a few hundredths of a second). A few parameters characterizing each window are obtained; the use of those defined in the MPEG-7 standard [4], or of those derived from the Mel coefficients (MFCCs) [5], is very widespread. The number of parameters extracted is typically around 20.

The extraction phase is frequently followed by a feature construction process, that is, by obtaining derived parameters that reflect additional behaviours of the sound (or of the sound window). A typical set of constructed parameters are those that try to represent the temporal evolution of the sound, among them, for example, the first- and second-order differences [6], [7], or the joint use of the parameters of consecutive windows by means of the sliding window technique [8].

The above processes can yield feature vectors with a large number of parameters (several hundred), which significantly increases the processing time needed to classify the sounds. In addition, the relevance of each parameter to the classification task can be very different. The next step in the sound identification process is therefore usually feature selection, that is, obtaining a subset of parameters that is as small as possible without substantially affecting the subsequent classification capability. A summary of the techniques used for feature selection can be found in [10], [11]. Among them, filter techniques [12], [13] tend to offer the best computational efficiency.

Once the characteristics have been extracted, constructed and selected, the resulting vectors are used to identify the sounds. To this end, different classification techniques are used that compare the characteristics of the sounds with those of one or more patterns. A good summary of the most commonly used techniques can be found in [14], [15]. Among them are those based on hidden Markov models [16], which is also the technique recommended in the MPEG-7 standard.
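To make the window-level extraction and construction steps described above concrete, the following Python sketch frames a signal into 10 ms windows, computes a few simple spectral descriptors per window (stand-ins for the MPEG-7 low-level descriptors of [4]), and stacks neighbouring windows with the sliding-window technique of [8]. The descriptor set, window length and context size are illustrative assumptions, not prescriptions taken from the patent.

```python
# Illustrative sketch (not part of the patent text): window-level feature
# extraction followed by sliding-window feature construction.
import numpy as np

def frame_signal(x, sr, win_s=0.01):
    """Split a mono signal into non-overlapping windows of win_s seconds."""
    win = int(sr * win_s)
    n = len(x) // win
    return x[:n * win].reshape(n, win)

def window_descriptors(frames, sr):
    """Per-window descriptors: log power, spectral centroid, spectral spread."""
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2            # power spectrum
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    power = spec.sum(axis=1) + 1e-12
    centroid = (spec * freqs).sum(axis=1) / power
    spread = np.sqrt((spec * (freqs - centroid[:, None]) ** 2).sum(axis=1) / power)
    return np.column_stack([np.log(power), centroid, spread])

def sliding_context(feats, context=2):
    """Concatenate each window's descriptors with those of its neighbours
    (sliding-window construction); edge windows are padded by repetition."""
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(feats)] for i in range(2 * context + 1)])

if __name__ == "__main__":
    sr = 44100
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 440 * t)                    # 1 s test tone
    feats = window_descriptors(frame_signal(x, sr), sr)
    context_feats = sliding_context(feats, context=2)
    print(feats.shape, context_feats.shape)            # (100, 3) (100, 15)
```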
The invention proposes a novel and efficient system for identifying a sound, that is, for recognizing it as belonging to one class within a predetermined set of classes.

The proposed solution is based on the extraction of characteristics from sound windows using MPEG-7 standardized parameters, followed by the construction of parameters by means of sliding windows. The generated characteristics are used to classify each window with a standard data mining technique (decision trees, Bayesian classifiers, etc.). Both the use of MPEG-7 parameters and their classification by data mining have already been described in the technical literature and are not part of the patent claims.

References

[1] Lu, L., & Hanjalic, A. (2009). Audio Representation. In Encyclopedia of Database Systems (pp. 160-167). Springer US.
[2] Sharan, R. V., & Moir, T. J. (2016). An Overview of Applications and Advancements in Automatic Sound Recognition. Neurocomputing.
[3] Cowling, M., & Sitte, R. (2003). Comparison of techniques for environmental sound recognition. Pattern Recognition Letters, 24(15), 2895-2907.
[4] ISO (2001). ISO/IEC FDIS 15938-4:2001: Information Technology - Multimedia Content Description Interface - Part 4: Audio.
[5] Young, S., Evermann, G., & Gales, M. (2012). The HTK Book.
[6] Sharma, S., Shukla, A., & Mishra, P. (2014). Speech and Language Recognition using MFCC and DELTA-MFCC. International Journal of Engineering Trends and Technology (IJETT), 12(9), 449-452.
[7] Hossan, M. A., Memon, S., & Gregory, M. A. (2010, December). A novel approach for MFCC feature extraction. In Signal Processing and Communication Systems (ICSPCS), 2010 4th International Conference on (pp. 1-5). IEEE.
[8] Chu, C. S. J. (1995). Time series segmentation: A sliding window approach. Information Sciences, 85(1), 147-173.
[9] Beniwal, S., & Arora, J. (2012). Classification and feature selection techniques in data mining. International Journal of Engineering Research & Technology (IJERT), 1(6).
[10] Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. The Journal of Machine Learning Research, 3, 1157-1182.
[11] Liu, H., & Motoda, H. (1998). Feature Extraction, Construction and Selection: A Data Mining Perspective. Springer Science & Business Media.
[12] Guyon, I., Gunn, S., Nikravesh, M., & Zadeh, L. A. (Eds.). (2008). Feature Extraction: Foundations and Applications (Vol. 207). Springer.
[13] Liu, H., Hussain, F., Tan, C. L., & Dash, M. (2002). Discretization: An enabling technique. Data Mining and Knowledge Discovery, 6(4), 393-423.
[14] Aggarwal, C. C. (2007). Data Streams: Models and Algorithms (Vol. 31). Springer Science & Business Media.
[15] Fu, T. C. (2011). A review on time series data mining. Engineering Applications of Artificial Intelligence, 24(1), 164-181.
[16] Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257-286.

Description of the figures

Figure 1. Diagram of the sound classification process that is the object of the invention.

Description of the invention

The present invention aims at a sound identification system by parametric classification of derived series comprising the following steps:

a. Obtaining the derived series Pi from a sound classifier that assigns a score Pik measuring the proximity of each window k to each i-th sound class.
b. Characterization of each derived series Pi, each derived series being treated as a single sound window from which a set of MPEG-7 parameters is obtained.
c. Feature selection, reducing the number of MPEG-7 parameters representing each derived series by means of the corrected Jensen-Shannon distance.
d. Sound identification by applying standard data mining techniques to the selected MPEG-7 parameters.
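The following minimal sketch illustrates steps a and b: a matrix of per-window class scores (from any window classifier) is split into one derived series per class, and each series is summarised as if it were a single window. The three summary descriptors used here are simplified placeholders for the MPEG-7 parameters named in step b, chosen only to keep the example self-contained.

```python
# Illustrative sketch of steps a-b: per-window class scores -> derived series
# -> one descriptor vector per sound.
import numpy as np

def derived_series(scores):
    """scores: (n_windows, n_classes) matrix of per-window classification scores
    (e.g. class-membership probabilities). Column i is the derived series P_i."""
    return [scores[:, i] for i in range(scores.shape[1])]

def series_descriptors(p):
    """Characterize one derived series as if it were a single 'window' of signal."""
    spec = np.abs(np.fft.rfft(p - p.mean())) ** 2
    centroid = (spec * np.arange(len(spec))).sum() / (spec.sum() + 1e-12)
    return np.array([p.mean(), p.std(), centroid])

def sound_features(scores):
    """Concatenate the descriptors of every derived series into one vector."""
    return np.concatenate([series_descriptors(p) for p in derived_series(scores)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.dirichlet(np.ones(4), size=200)   # 200 windows, 4 sound classes
    print(sound_features(scores).shape)            # (12,) = 4 classes x 3 descriptors
```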
The novelty of the invention lies in the following two aspects:

• Feature selection by means of the corrected Jensen-Shannon distance:
  o The several tens of generated characteristics are reduced by a selection method based on the corrected Jensen-Shannon distance.

• Sound classification from derived series:
  o The window classifier assigns a score (usually a probability) to each window for each sound class. This produces a set of "derived" time series, as many as there are sound classes to be identified.
  o Each of the derived series is in turn characterized by MPEG-7 parameters, the whole series being treated as a single window.
  o Sound classification is performed by applying a standard data mining technique to the MPEG-7 parameters of the derived series.

Once the characteristics of each window of a sound have been obtained and selected, the classification method proposed in the invention is as follows:

1. For each k-th window, a classification technique assigns a score Pik that measures the proximity (usually the probability) of that window to each i-th class.
2. Analyzing all the windows of the sound, a time series of scores Pi, or derived series, is obtained for each i-th class.
3. Each derived series Pi is treated as if it were a single sound window (it is not segmented), and a set of MPEG-7 parameters is obtained from it.
4. The number of parameters representing each derived series is reduced by the same feature selection method based on the corrected Jensen-Shannon distance detailed below.
5. To the sound, now represented by a small set of parameters, a classification technique is applied that definitively identifies it with one of the predetermined classes.

Selection method based on the Jensen-Shannon distance

The first step of the feature selection method proposed in the invention is to determine the separability of the sound classes according to each of the parameters. The class separability index $\psi_\theta$ according to the $\theta$-th parameter is computed as follows:

1. For each i-th class, the values of the $\theta$-th parameter are obtained in all the sound windows belonging to that class.
2. For each i-th class, the probability density function $f_{\theta i}$ of the values of the $\theta$-th parameter in the sound windows belonging to that class is calculated.
3. For each possible pair of classes i and j, the Jensen-Shannon divergence is calculated as

$$D_{JS}(f_{\theta i}, f_{\theta j}) = \frac{1}{2}\int_{-\infty}^{\infty} f_{\theta i}\,\log_2\frac{2 f_{\theta i}}{f_{\theta i}+f_{\theta j}}\,dx + \frac{1}{2}\int_{-\infty}^{\infty} f_{\theta j}\,\log_2\frac{2 f_{\theta j}}{f_{\theta i}+f_{\theta j}}\,dx$$

4. For each possible pair of classes i and j, the Jensen-Shannon distance is calculated as

$$d_{JS}(f_{\theta i}, f_{\theta j}) = \sqrt{D_{JS}(f_{\theta i}, f_{\theta j})}$$

5. The separability index $\psi_\theta$ is calculated as

$$\psi_\theta = \frac{1}{N}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} d_{JS}(f_{\theta i}, f_{\theta j})$$

where n is the number of sound classes to be identified and N is the number of Jensen-Shannon distances computed, given by

$$N = \frac{n(n-1)}{2}$$
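As a concrete illustration of the separability index defined above, the sketch below approximates each per-class density $f_{\theta i}$ with a normalized histogram on a shared grid and averages the Jensen-Shannon distances over all class pairs. The bin count and the smoothing epsilon are implementation choices, not part of the patent text.

```python
# Illustrative sketch: Jensen-Shannon separability of one parameter across classes.
import numpy as np
from itertools import combinations

def class_densities(values_per_class, bins=64):
    """Histogram approximation (probability mass per bin) of one parameter's
    distribution in each class, on a shared grid."""
    lo = min(v.min() for v in values_per_class)
    hi = max(v.max() for v in values_per_class)
    edges = np.linspace(lo, hi, bins + 1)
    out = []
    for v in values_per_class:
        h = np.histogram(v, bins=edges)[0].astype(float) + 1e-12
        out.append(h / h.sum())
    return out

def js_distance(p, q):
    """Jensen-Shannon distance (square root of the divergence, log base 2)."""
    m = 0.5 * (p + q)
    d = 0.5 * np.sum(p * np.log2(p / m)) + 0.5 * np.sum(q * np.log2(q / m))
    return np.sqrt(max(d, 0.0))

def separability(values_per_class):
    """Mean JS distance over the n(n-1)/2 class pairs for one parameter."""
    dens = class_densities(values_per_class)
    pairs = list(combinations(range(len(dens)), 2))
    return sum(js_distance(dens[i], dens[j]) for i, j in pairs) / len(pairs)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    classes = [rng.normal(mu, 1.0, 500) for mu in (0.0, 1.0, 3.0)]
    print(round(separability(classes), 3))   # higher means better class separation
```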
Selection method based on the corrected Jensen-Shannon distance

The feature selection method proposed in the invention uses the separability index computed in the previous section, but corrects it on the basis of the correlation between parameters. The proposed procedure is as follows:

1. The parameter-parameter correlation matrix $\rho_P$ is calculated. It is formed by the elements $\rho_{ij}$ that represent the correlation between parameters i and j, computed as

$$\rho_{ij} = \frac{\mathrm{cov}(x_i, x_j)}{\sqrt{\mathrm{var}(x_i)\,\mathrm{var}(x_j)}} = \frac{\sum_{k=1}^{m}(x_{ik}-\bar{x}_i)(x_{jk}-\bar{x}_j)}{\sqrt{\sum_{k=1}^{m}(x_{ik}-\bar{x}_i)^2\,\sum_{k=1}^{m}(x_{jk}-\bar{x}_j)^2}}$$

where $x_{ik}$ is the value of the i-th parameter in the k-th window, $\bar{x}_i$ is the mean value of the i-th parameter, and m is the total number of windows.

2. From the correlation matrix $\rho_P$, the matrix of independence between parameters is calculated, defined as $\Delta = 1 - |\rho_P|$.

3. For each parameter, the separability index $\psi_i$ is calculated following the procedure indicated in the previous section.

4. The parameter with the highest value of $\psi_i$ is chosen as the most relevant one. It is added to the (ordered) set of relevant parameters R and removed from the set of parameters pending analysis P.

5. For each j-th parameter in P, the independence $\delta_{jk}$ with respect to each k-th parameter in R is calculated.

6. For each j-th parameter in P, the minimum independence with respect to R is calculated, defined as $\mu_j = \min_k \delta_{jk}$.

7. For each j-th parameter in P, the corrected separability is calculated, defined as $\kappa_j = \psi_j\,\mu_j$.

8. The parameter with the highest value of $\kappa_j$ is chosen as the most relevant one. It is added to the (ordered) set of relevant parameters R and removed from the set of parameters pending analysis P.

9. Steps 5 to 8 are iterated until all the parameters have been analyzed, that is, until $P = \emptyset$.
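A minimal sketch of the corrected selection procedure above, assuming the per-parameter separability indices $\psi_i$ have already been computed (for example with the previous sketch): candidates are ranked greedily, each one penalized by its minimum independence from the parameters already selected.

```python
# Illustrative sketch: greedy ranking by correlation-corrected separability.
import numpy as np

def corrected_selection(X, psi):
    """X: (n_windows, n_params) matrix of parameter values.
    psi: (n_params,) separability indices, e.g. from the JS-based method.
    Returns parameter indices ordered from most to least relevant."""
    n_params = X.shape[1]
    indep = 1.0 - np.abs(np.corrcoef(X, rowvar=False))   # independence matrix
    pending = set(range(n_params))
    ranked = [int(np.argmax(psi))]                       # seed: highest separability
    pending.remove(ranked[0])
    while pending:
        best, best_score = None, -np.inf
        for j in pending:
            mu_j = min(indep[j, k] for k in ranked)      # minimum independence
            kappa_j = psi[j] * mu_j                      # corrected separability
            if kappa_j > best_score:
                best, best_score = j, kappa_j
        ranked.append(best)
        pending.remove(best)
    return ranked

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    a = rng.normal(size=500)
    X = np.column_stack([a, a + 0.01 * rng.normal(size=500), rng.normal(size=500)])
    psi = np.array([0.9, 0.85, 0.4])
    print(corrected_selection(X, psi))   # the near-duplicate of column 0 is demoted
```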
The classification method can be, for example, the decision tree. This classifier generates a5 10 15 20 25 score of the proximity of each window to each class (probability of belonging to that class). 9. The application of the classifier to the sequence of the windows of a sound produces a derived series (scores) for each of the sound classes. 10. Each of the derived series is considered as if it were a single window of a sound and from it the MPEG-7 parameters expressed above are extracted. 11. Through the feature selection system proposed in the invention, the number of parameters with which each derived series is characterized is reduced by choosing, for example, the 5 most significant. If we have, for example, 10 classes, the series derived from each sound are characterized by 50 (5x10) parameters. 12. Each sound is classified by comparison between the characteristics of its derived series and the characteristics of the series derived from the patterns. The classification method can be, for example, the decision tree.
Claims:
1. Sound identification system by parametric classification of derived series, characterized in that it comprises:

e. Obtaining the derived series Pi from a sound classifier that assigns a score Pik measuring the proximity of each window k to each i-th sound class.
f. Characterization of each derived series Pi, each derived series being treated as a single sound window from which a set of MPEG-7 parameters is obtained.
g. Feature selection, reducing the number of MPEG-7 parameters representing each derived series by means of the corrected Jensen-Shannon distance.
h. Sound identification by applying standard data mining techniques to the selected MPEG-7 parameters.

2. Sound identification system by parametric classification of derived series, characterized in that the feature selection of a sound window based on the Jensen-Shannon distance is obtained by calculating the class separability index $\psi_\theta$ according to the $\theta$-th MPEG-7 parameter, comprising:

i) For each i-th class, the values of the $\theta$-th parameter are obtained in all the sound windows belonging to that class.
ii) For each i-th class, the probability density functions $f_{\theta i}$ of the values of the $\theta$-th parameter in the sound windows belonging to that class are calculated.
iii) For each possible pair of classes i and j, the Jensen-Shannon divergence $D_{JS}$ is calculated, given by

$$D_{JS}(f_{\theta i}, f_{\theta j}) = \frac{1}{2}\int_{-\infty}^{\infty} f_{\theta i}\,\log_2\frac{2 f_{\theta i}}{f_{\theta i}+f_{\theta j}}\,dx + \frac{1}{2}\int_{-\infty}^{\infty} f_{\theta j}\,\log_2\frac{2 f_{\theta j}}{f_{\theta i}+f_{\theta j}}\,dx$$

iv) For each possible pair of classes i and j, the Jensen-Shannon distance $d_{JS}$ is calculated, given by

$$d_{JS}(f_{\theta i}, f_{\theta j}) = \sqrt{D_{JS}(f_{\theta i}, f_{\theta j})}$$

v) The separability index $\psi_\theta$ is calculated as

$$\psi_\theta = \frac{1}{N}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} d_{JS}(f_{\theta i}, f_{\theta j})$$

where n is the number of sound classes to be identified and N is the number of Jensen-Shannon distances computed, given by

$$N = \frac{n(n-1)}{2}$$

3. Sound identification system by parametric classification of derived series according to the preceding claims, characterized in that the feature selection of a sound window based on the Jensen-Shannon distance described in claim 2 is corrected according to the correlation between MPEG-7 parameters, comprising:

i) The calculation of the parameter-parameter correlation matrix $\rho_P$, formed by the elements $\rho_{ij}$ that represent the correlation between parameters i and j, calculated as

$$\rho_{ij} = \frac{\mathrm{cov}(x_i, x_j)}{\sqrt{\mathrm{var}(x_i)\,\mathrm{var}(x_j)}} = \frac{\sum_{k=1}^{m}(x_{ik}-\bar{x}_i)(x_{jk}-\bar{x}_j)}{\sqrt{\sum_{k=1}^{m}(x_{ik}-\bar{x}_i)^2\,\sum_{k=1}^{m}(x_{jk}-\bar{x}_j)^2}}$$

where $x_{ik}$ is the value of the i-th parameter in the k-th window, $\bar{x}_i$ is the mean value of the i-th parameter, and m is the total number of windows.

ii) From the correlation matrix $\rho_P$, the matrix of independence between parameters is calculated, defined as $\Delta = 1 - |\rho_P|$.
iii) For each parameter, the separability index $\psi_i$ is calculated following the procedure indicated in the previous section.
iv) The parameter with the highest value of $\psi_i$ is chosen as the most relevant one. It is added to the (ordered) set of relevant parameters R and removed from the set of parameters pending analysis P.
v) For each j-th parameter in P, the independence $\delta_{jk}$ with respect to each k-th parameter in R is calculated.
vi) For each j-th parameter in P, the minimum independence with respect to R is calculated, defined as $\mu_j = \min_k \delta_{jk}$.
vii) For each j-th parameter in P, the corrected separability is calculated, defined as $\kappa_j = \psi_j\,\mu_j$.
viii) The parameter with the highest value of $\kappa_j$ is chosen as the most relevant one. It is added to the (ordered) set of relevant parameters R and removed from the set of parameters pending analysis P.
ix) Steps v) to viii) are iterated until all the parameters have been analyzed, that is, until $P = \emptyset$.

4. Sound identification system by parametric classification of derived series, characterized in that the feature selection of a derived series based on the Jensen-Shannon distance is obtained by calculating the class separability index $\psi_\theta$ according to the $\theta$-th MPEG-7 parameter, comprising:

i) For each i-th class, the values of the $\theta$-th parameter are obtained in all the sound windows belonging to that class.
ii) For each i-th class, the probability density functions $f_{\theta i}$ of the values of the $\theta$-th parameter in the sound windows belonging to that class are calculated.
iii) For each possible pair of classes i and j, the Jensen-Shannon divergence $D_{JS}$ is calculated, given by

$$D_{JS}(f_{\theta i}, f_{\theta j}) = \frac{1}{2}\int_{-\infty}^{\infty} f_{\theta i}\,\log_2\frac{2 f_{\theta i}}{f_{\theta i}+f_{\theta j}}\,dx + \frac{1}{2}\int_{-\infty}^{\infty} f_{\theta j}\,\log_2\frac{2 f_{\theta j}}{f_{\theta i}+f_{\theta j}}\,dx$$

iv) For each possible pair of classes i and j, the Jensen-Shannon distance $d_{JS}$ is calculated, given by

$$d_{JS}(f_{\theta i}, f_{\theta j}) = \sqrt{D_{JS}(f_{\theta i}, f_{\theta j})}$$

v) The separability index $\psi_\theta$ is calculated as

$$\psi_\theta = \frac{1}{N}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} d_{JS}(f_{\theta i}, f_{\theta j})$$

where n is the number of sound classes to be identified and N is the number of Jensen-Shannon distances computed, given by

$$N = \frac{n(n-1)}{2}$$

5. Sound identification system by parametric classification of derived series according to claims 1 and 4, characterized in that the feature selection of a derived series based on the Jensen-Shannon distance described in claim 4 is corrected according to the correlation between MPEG-7 parameters, comprising:

i) The calculation of the parameter-parameter correlation matrix $\rho_P$, formed by the elements $\rho_{ij}$ that represent the correlation between parameters i and j, calculated as

$$\rho_{ij} = \frac{\mathrm{cov}(x_i, x_j)}{\sqrt{\mathrm{var}(x_i)\,\mathrm{var}(x_j)}} = \frac{\sum_{k=1}^{m}(x_{ik}-\bar{x}_i)(x_{jk}-\bar{x}_j)}{\sqrt{\sum_{k=1}^{m}(x_{ik}-\bar{x}_i)^2\,\sum_{k=1}^{m}(x_{jk}-\bar{x}_j)^2}}$$

where $x_{ik}$ is the value of the i-th parameter in the k-th window, $\bar{x}_i$ is the mean value of the i-th parameter, and m is the total number of windows.

ii) From the correlation matrix $\rho_P$, the matrix of independence between parameters is calculated, defined as $\Delta = 1 - |\rho_P|$.
iii) For each parameter, the separability index $\psi_i$ is calculated following the procedure indicated in the previous section.
iv) The parameter with the highest value of $\psi_i$ is chosen as the most relevant one. It is added to the (ordered) set of relevant parameters R and removed from the set of parameters pending analysis P.
v) For each j-th parameter in P, the independence $\delta_{jk}$ with respect to each k-th parameter in R is calculated.
vi) For each j-th parameter in P, the minimum independence with respect to R is calculated, defined as $\mu_j = \min_k \delta_{jk}$.
vii) For each j-th parameter in P, the corrected separability is calculated, defined as $\kappa_j = \psi_j\,\mu_j$.
viii) The parameter with the highest value of $\kappa_j$ is chosen as the most relevant one. It is added to the (ordered) set of relevant parameters R and removed from the set of parameters pending analysis P.
ix) Steps v) to viii) are iterated until all the parameters have been analyzed, that is, until $P = \emptyset$.
Patent family: WO2018087406A1 (published 2018-05-17); ES2667626B1 (published 2019-03-28)