Patent abstract:
APPARATUS AND METHOD FOR DECOMPOSING AN INPUT SIGNAL USING A DOWNMIXER. An apparatus for decomposing an input signal having a number of at least three input channels comprises a downmixer (12) for downmixing the input signal to obtain a reduced signal having a smaller number of channels. In addition, an analyzer (16) for analyzing the reduced signal to derive an analysis result (18) is provided, and the analysis result (18) is transmitted to a signal processor (20) to process the input signal, or a signal derived from the input signal, to obtain the decomposed signal (26).
Publication number: BR112013014172B1
Application number: R112013014172-7
Filing date: 2011-11-22
Publication date: 2021-03-09
Inventor: Andreas Walther
Applicant: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
IPC main class:
Patent description:

Field of the Invention
The present invention relates to audio processing and, in particular, to the decomposition of an audio signal into different components, such as perceptually distinct components.
The human auditory system perceives sound from all directions. The perceived auditory environment (the adjective auditory indicates what is perceived, while the word sound is used to describe the physical phenomenon) creates an impression of the acoustic properties of the surrounding space and of the sound events produced. The auditory impression perceived in a specific sound field can (at least partially) be modeled by considering three different types of signals at the ear entrances: direct sound, early reflections and diffuse reflections. These signals contribute to the formation of a perceived auditory spatial image.
Direct sound refers to the waves of each sound event that reach the listener first, directly from the sound source without disturbance. It is characteristic of the sound source and offers the least compromised information about the direction of incidence of the sound event. The main cues for estimating the direction of a sound source in the horizontal plane are the differences between the input signals of the left and right ears, that is, interaural time differences (ITDs) and interaural level differences (ILDs). Subsequently, a series of reflections of the direct sound reaches the ears from different directions and with different relative time delays and levels. With increasing delay time relative to the direct sound, the density of the reflections increases until they constitute statistical clutter.
The reflected sound contributes to the perception of distance and to the spatial auditory impression, which is composed of at least two components: Apparent Source Width (ASW; another widely used term for ASW is auditory spaciousness) and Listener Envelopment (LEV). ASW is defined as a broadening of the apparent width of a sound source and is mainly determined by the early lateral reflections. LEV refers to the listener's sense of being enveloped by the sound and is mainly determined by the late reflections. The purpose of electro-acoustic stereophonic sound reproduction is to evoke the perception of a pleasant spatial auditory image. This can have a natural or architectural reference (for example, a recording of a concert in a hall), or it can be a sound field that does not exist in reality (for example, electroacoustic music).
In the field of auditorium acoustics it is well known that, in order to obtain a subjectively pleasant sound field, a strong sense of auditory spatial impression is important, of which LEV is an integral part. The ability of loudspeaker configurations to reproduce an enveloping sound field through the reproduction of a diffuse sound field is of interest. In a synthetic sound field, it is not possible to reproduce all naturally occurring reflections using dedicated transducers. This is especially true for the diffuse late reflections. The time and level properties of diffuse reflections can be simulated by feeding "reverberated" signals to the loudspeakers. If these signals are sufficiently uncorrelated, the number and location of the loudspeakers used for reproduction determine whether the sound field is perceived as diffuse. The aim is to evoke the perception of a diffuse, continuous sound field using only a discrete number of transducers, that is, the creation of sound fields in which no direction of arrival of the sound can be estimated and, especially, no transducers can be localized. The subjective diffuseness of synthetic sound fields can be evaluated in subjective tests.
Stereophonic sound reproduction aims to evoke the perception of a continuous sound field using only a discrete number of transducers. The most desired features are directional stability of localized sources and a realistic rendering of the enveloping listening environment. Most formats used today to store or transmit stereophonic recordings are channel-based. Each channel carries a signal that is intended to be reproduced over a loudspeaker placed at a specific position. A specific auditory image is created during the recording or mixing process. This image is accurately recreated if the loudspeaker configuration used for playback resembles the target configuration for which the recording was designed.
The number of feasible channels for transmission and reproduction is constantly growing, and with every emerging audio reproduction format comes the desire to play back legacy content over the current reproduction system.
Upmix algorithms are a solution to this desire, computing a signal with more channels from a legacy signal. A number of stereo upmix algorithms have been proposed in the literature, for example, Carlos Avendano and Jean-Marc Jot, "A frequency-domain approach to multichannel upmix", Journal of the Audio Engineering Society, vol. 52, no. 7/8, pp. 740-749, 2004; Christof Faller, "Multiple-loudspeaker playback of stereo signals", Journal of the Audio Engineering Society, vol. 54, no. 11, pp. 1051-1064, November 2006; John Usher and Jacob Benesty, "Enhancement of spatial sound quality: A new reverberation-extraction audio upmixer", IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 7, pp. 2141-2150, September 2007. Most of these algorithms are based on a direct/ambient signal decomposition, followed by processing adapted to the target loudspeaker configuration.
The described direct/ambient signal decompositions are not easily applicable to multichannel surround signals. It is not easy to formulate a signal model and a filtering that obtain, from N channels of audio, N corresponding channels of direct sound and N channels of ambient sound. The simple signal model used in the stereo case (see, for example, Christof Faller, "Multiple-loudspeaker playback of stereo signals", Journal of the Audio Engineering Society, vol. 54, no. 11, pp. 1051-1064, November 2006), which considers the direct sound to be correlated between all channels, does not capture the diversity of channel relationships that may exist between the channels of a surround signal.
The general objective of stereophonic sound reproduction is to evoke the perception of a continuous sound field using only a limited number of transmission channels and transducers. Two loudspeakers are the minimum requirement for reproducing spatial sound. Modern consumer systems often offer a greater number of reproduction channels.
Basically, stereo signals (regardless of the number of channels) are recorded or mixed in such a way that, for each source, the direct sound enters coherently (= dependently) into a number of channels with source-specific directional cues, while reflected, independent sounds enter into a number of channels, carrying the cues that determine the apparent source width and the listener envelopment. The correct perception of the intended auditory image is generally possible only at the ideal listening position of the reproduction configuration for which the recording was intended. Adding more loudspeakers to a given loudspeaker configuration generally allows a more realistic reconstruction/simulation of a natural sound field. To take full advantage of an extended loudspeaker configuration when the input signals are provided in a different format, or to manipulate the perceptually distinct parts of the input signal, those parts must be accessible separately. This specification describes a method for separating dependent and independent components of stereophonic recordings comprising an arbitrary number of input channels.
The decomposition of audio signals into perceptually distinct components is necessary for high-quality signal modification, enhancement, adaptive reproduction and perceptual coding. A number of methods have recently been proposed that allow the manipulation and/or extraction of perceptually distinct signal components from two-channel input signals. As input signals with more than two channels become more and more common, the described manipulations are also desirable for multichannel input signals. However, most of the concepts described for two-channel input cannot be easily extended to work with input signals having an arbitrary number of channels.
If a signal analysis into direct and ambient parts is to be performed with, for example, a 5.1-channel surround signal having a left channel, a center channel, a right channel, a left surround channel, a right surround channel and a low-frequency enhancement (subwoofer) channel, it is not straightforward how to apply a direct/ambient signal analysis. One could think of comparing each pair of the six channels, resulting in a hierarchical processing which, in the end, comprises up to 15 different comparison operations (one per channel pair, i.e. 6·5/2 = 15). Then, when all of these 15 comparison operations have been done, where each channel has been compared to every other channel, it would be necessary to determine how to evaluate the 15 results. This is time-consuming, the results are difficult to interpret and, due to the considerable amount of processing resources required, the procedure is, for example, not usable for real-time direct/ambient separation applications or, in general, for signal decompositions that can be used, for example, in the context of upmixing or any other audio processing operation.
In M. M. Goodwin and J. M. Jot, "Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement", in Proc. of ICASSP 2007, 2007, principal component analysis is applied to the input channel signals to perform a primary (= direct) and ambient signal decomposition.
The models used by Christof Faller, "Multiple-loudspeaker playback of stereo signals", Journal of the Audio Engineering Society, vol. 54, no. 11, pp. 1051-1064, November 2006 and C. Faller, "A highly directive 2-capsule based microphone system", in Preprint 123rd Conv. Aud. Eng. Soc., Oct. 2007 consider diffuse or partially correlated sounds in stereo and microphone signals, respectively. They derive filters to extract the diffuse/ambient signal under this assumption. These approaches are limited to one- and two-channel audio signals.
Another reference is C. Avendano and J.-M. Jot, "A frequency-domain approach to multichannel upmix", Journal of the Audio Engineering Society, vol. 52, no. 7/8, pp. 740-749, 2004. The reference of M. M. Goodwin and J. M. Jot, "Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement", in Proc. of ICASSP 2007, 2007, comments on the Avendano/Jot reference as follows: that reference offers an approach that involves creating a time-frequency mask to extract the ambience from a stereo input signal. The mask is based on the cross-correlation between the left and right channel signals, however, so this approach is not immediately applicable to the problem of extracting the ambience from an arbitrary multichannel input. In order to use any correlation-based method in this higher-order case, either a hierarchical pairwise correlation analysis would be required, which would imply a significant computational cost, or some alternative measure of multichannel correlation.
Spatial Impulse Response Rendering (SIRR) (Juha Merimaa and Ville Pulkki, "Spatial impulse response rendering", in Proc. of the 7th Int. Conf. on Digital Audio Effects (DAFx'04), 2004) estimates the direct sound with direction and the diffuse sound in B-format impulse responses. Very similar to SIRR, Directional Audio Coding (DirAC) implements a similar direct and diffuse sound analysis on continuous B-format audio signals.
The approach presented by Julia Jakka, Binaural to Multichannel Audio Upmix, Master's thesis, Helsinki University of Technology, 2005, describes an upmix using binaural signals as input.
The reference of Boaz Rafaely, "Spatially Optimal Wiener Filtering in a Reverberant Sound Field", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2001, October 21-24, 2001, New Paltz, New York, describes the derivation of Wiener filters that are spatially optimal for reverberant sound fields. An application to noise cancellation with two microphones in reverberant rooms is given. The optimal filters, which are derived from the spatial correlation of diffuse sound fields, capture the local behavior of the sound fields and are therefore of lower order and potentially more spatially robust than conventional adaptive noise cancellation filters in reverberant rooms. Formulations for unconstrained and causally constrained optimal filters are presented, and an example application to two-microphone speech enhancement is demonstrated using computer simulation.
It is the object of the present invention to provide an improved concept for decomposing an input signal.
This object is achieved by an apparatus for decomposing the input signal according to claim 1, a method for decomposing an input signal according to claim 14 or a computer program according to claim 15.
The present invention is based on the finding that, for the decomposition of a multichannel signal, it is advantageous not to carry out the analysis with respect to the different signal components on the input signal directly, that is, on the signal having at least three input channels. Instead, the multichannel input signal having at least three input channels is processed by a downmixer in order to downmix the input signal and obtain a reduced signal. The reduced signal has a smaller number of channels than the number of input channels, which is preferably two. Then, the analysis is performed on the reduced signal instead of directly on the input signal, and the analysis yields an analysis result. This analysis result, however, is not applied to the reduced signal, but is applied to the input signal or, alternatively, to a signal derived from the input signal, where the signal derived from the input signal can be an upmix signal or, depending on the number of channels of the input signal, also a downmix signal; this signal derived from the input signal will, however, be different from the reduced signal on which the analysis was performed. When, for example, the input signal is a 5.1-channel signal, then the downmix signal on which the analysis is performed can be a stereo downmix signal having two channels. The analysis result is then applied to the 5.1 input signal directly, to a higher upmix such as a 7.1 output signal, or to a multichannel downmix of the input signal having, for example, only three channels, which are the left channel, the center channel and the right channel, when only a three-channel audio reproduction device is at hand.
In any case, however, the signal to which the analysis result is applied by the signal processor is different from the reduced signal on which the analysis was performed, and typically has more channels than the reduced signal on which the analysis with respect to the signal components was performed.
This so-called "indirect" processing/analysis is possible due to the fact that any signal components in the individual input channels can also be assumed to occur in the reduced channels, since a downmix typically consists of an addition of the different input channels in different ways. A simple downmix is obtained, for example, when the individual input channels are weighted, as required by a downmix rule or a downmix matrix, and are then added together after having been weighted. An alternative downmix consists of filtering the input channels with certain filters, such as HRTF filters, the downmix being performed using the filtered signals, that is, the signals filtered by HRTF filters as known in the art. For a five-channel input signal, 10 HRTF filters are required; the outputs of the HRTF filters for the left ear are added, and the outputs of the HRTF filters for the right ear are added. Either downmix can be applied in order to reduce the number of channels that must be processed in the signal analyzer.
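As an illustration, a matrix downmix of the simple kind described above can be sketched as follows. This is a minimal numpy sketch; the 5-channel layout and the coefficient 1/√2 are assumptions of this example, not values prescribed by this description.

```python
import numpy as np

# Hypothetical 5-channel input (L, R, C, Ls, Rs) as rows: shape (5, samples).
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 1024))

# Illustrative downmix matrix: each reduced channel is a weighted sum
# of the input channels (the weights are chosen only for this example).
g = 1.0 / np.sqrt(2.0)
H = np.array([
    [1.0, 0.0, g, g, 0.0],   # left downmix:  L + g*C + g*Ls
    [0.0, 1.0, g, 0.0, g],   # right downmix: R + g*C + g*Rs
])

d = H @ x  # reduced two-channel signal, shape (2, samples)
```

Replacing the constant entries of H with per-frequency filter responses (e.g. HRTFs) yields the alternative, filtered downmix mentioned above.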
Therefore, the applications of the present invention describe a new concept for extracting perceptually distinct components from arbitrary input signals by considering an analysis signal, while the analysis result is applied to the input signal. Such an analysis signal can be obtained, for example, by considering a model of the propagation of the channel signals or loudspeaker signals to the ears. This is partly motivated by the fact that the human auditory system also uses only two sensors (at the left and right ears) to evaluate sound fields. Thus, the extraction of perceptually distinct components is basically reduced to the consideration of an analysis signal, which will be referred to as the downmix in the following. Throughout this document, the term downmix is used for any pre-processing of the multichannel signal resulting in an analysis signal (which may include, for example, a propagation model, HRTFs, BRIRs, or a simple cross-factor downmix).
Knowing the format of a given input and the desired characteristics of the signal to be extracted, ideal inter-channel relationships can be defined for the reduced format. An analysis of this analysis signal is sufficient to generate a weighting mask (or multiple weighting masks) for the decomposition of the multichannel signal.
In one application, the multichannel problem is simplified by using a stereo downmix of a surround signal and applying a direct/ambient analysis to the downmix. Based on the result, that is, on the short-term power spectrum estimates of the direct and ambient sound, filters are derived for the decomposition of the N-channel signal into N channels of direct sound and N channels of ambient sound.
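One hedged way to sketch the derivation of such filters from short-term power estimates is the following numpy sketch. The square-root weighting is a common choice for deriving spectral gains and is used here purely for illustration; it is not necessarily the exact filter of this description.

```python
import numpy as np

def direct_ambient_gains(p_direct, p_ambient, eps=1e-12):
    """Per time-frequency tile: derive gains from short-term power
    estimates of the direct and ambient sound. The square-root rule
    keeps the two extracted parts power-complementary."""
    total = p_direct + p_ambient + eps
    return np.sqrt(p_direct / total), np.sqrt(p_ambient / total)

# Example: a tile estimated as 3 parts direct power to 1 part ambient.
g_d, g_a = direct_ambient_gains(np.array(3.0), np.array(1.0))
```

Multiplying each of the N channel spectra by g_d and g_a then yields the N direct-sound and N ambient-sound channels, respectively.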
The present invention is advantageous due to the fact that the signal analysis is applied to a smaller number of channels, which significantly reduces the necessary processing time, so that the inventive concept can even be applied in real-time applications for upmixing, downmixing or any other signal processing operation in which different components are needed, such as perceptually distinct components of a signal.
Another advantage of the present invention is that, although a downmix is performed, it has been found that this does not impair the detectability of the perceptually distinct components in the input signal. In other words, even when the input channels are downmixed, the individual signal components can nevertheless be separated to a large extent. In addition, the downmix acts as a kind of "collection" of all signal components of all input channels into two channels, and the single analysis applied to this reduced "collected" signal provides a unique result, which does not need to be interpreted further and can be used directly for signal processing.
In a preferred application, particular efficiency for the purpose of signal decomposition is obtained when the signal analysis is performed based on a pre-calculated similarity curve as a function of frequency, used as a reference curve. The term "similarity" includes correlation and coherence, where, in a rigorous mathematical sense, the correlation is calculated between two signals without an additional time shift, while the coherence is calculated by shifting the two signals in time/phase so that the signals have a maximum correlation, the actual correlation over frequency then being calculated with the time/phase shift applied. For this text, similarity, correlation and coherence are considered to have the same meaning, that is, a quantitative degree of similarity between two signals; for example, a higher absolute value of similarity means that the two signals are more similar, and a lower absolute value of similarity means that the two signals are less similar.
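A short-time estimate of such a similarity measure over frequency can be sketched as follows (numpy sketch; averaging over STFT frames is one possible estimator, assumed here for illustration rather than prescribed by the text):

```python
import numpy as np

def coherence_per_band(X1, X2, eps=1e-12):
    """Magnitude of the normalized cross-correlation per frequency bin,
    estimated from STFT coefficients of shape (frames, bins)."""
    cross = np.mean(X1 * np.conj(X2), axis=0)
    p1 = np.mean(np.abs(X1) ** 2, axis=0)
    p2 = np.mean(np.abs(X2) ** 2, axis=0)
    return np.abs(cross) / (np.sqrt(p1 * p2) + eps)

rng = np.random.default_rng(3)
A = rng.standard_normal((400, 64)) + 1j * rng.standard_normal((400, 64))
B = rng.standard_normal((400, 64)) + 1j * rng.standard_normal((400, 64))
c_same = coherence_per_band(A, A)   # identical channels: near 1 per bin
c_indep = coherence_per_band(A, B)  # independent channels: near 0 per bin
```

Comparing such an estimated curve against the pre-calculated reference curve per frequency band is the simple comparison operation referred to below.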
It has been shown that the use of such a correlation curve as a reference curve allows a simple and efficiently implementable analysis, since the curve can be used for simple comparison operations and/or weighting factor calculations. The use of a correlation curve pre-calculated as a function of frequency requires only simple calculations to be performed, rather than more complex operations such as Wiener filtering. Furthermore, the application of the correlation curve as a function of frequency is particularly useful due to the fact that the problem is not approached from a statistical point of view, but in a more analytical way, since the maximum possible information about the current configuration is provided in order to obtain a solution to the problem. In addition, the flexibility of this procedure is very high, since the reference curve can be obtained in many different ways. One way is to measure the two or more signals of a certain configuration and then calculate the correlation curve over frequency from the measured signals. Therefore, one can play back independent signals from the different loudspeakers, or signals with a certain degree of dependency that is known in advance.
The other preferred alternative is simply to calculate the correlation curve under the assumption of independent signals. In this case, no actual signals are necessary, since the result is signal-independent.
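As an example of such a signal-independent curve, the coherence of two omnidirectional sensors in an ideal diffuse field built from independent sources has a closed-form sinc shape. The following numpy sketch computes such a curve; the two-sensor free-field model and the 18 cm spacing (roughly mimicking the ears) are assumptions of this example, not the specific reference curve of this description.

```python
import numpy as np

def diffuse_field_coherence(f, d, c=343.0):
    """Coherence of two omni sensors spaced d metres apart in an ideal
    diffuse field: sin(2*pi*f*d/c) / (2*pi*f*d/c).
    np.sinc(x) computes sin(pi*x)/(pi*x), hence the argument 2*f*d/c."""
    return np.sinc(2.0 * f * d / c)

f = np.linspace(0.0, 8000.0, 512)               # frequency axis in Hz
ref_curve = diffuse_field_coherence(f, d=0.18)  # assumed ~ear spacing
```

The curve equals 1 at 0 Hz and decays toward 0 with frequency, which is the qualitative shape of the reference curves shown in FIGURES 9A-9E.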
Signal decomposition using a reference curve for signal analysis can be applied to stereo processing, that is, to the decomposition of a stereo signal. Alternatively, this procedure can also be applied in conjunction with a downmixer to decompose multichannel signals. Alternatively, this procedure can also be applied to multichannel signals without using a downmixer, when an evaluation of signal pairs in a hierarchical manner is provided.
The preferred applications of the present invention are discussed further in relation to the accompanying figures, where:
FIGURE 1 is a block diagram illustrating an apparatus for decomposing an input signal using a downmixer;
FIGURE 2 is a block diagram illustrating an implementation of an apparatus for decomposing a signal having a number of at least three input channels, using an analyzer with a pre-calculated correlation curve as a function of frequency, in accordance with a further aspect of the invention;
FIGURE 3 illustrates a further preferred implementation of the present invention with frequency-domain processing for downmix, analysis and signal processing;
FIGURE 4 illustrates an example of a pre-calculated correlation curve as a function of frequency used as a reference curve for the analysis indicated in FIGURE 1 or FIGURE 2;
FIGURE 5 illustrates a block diagram that illustrates further processing in order to extract the independent components;
FIGURE 6 illustrates a block diagram of an additional implementation of the further processing, where independent diffuse, independent direct and direct components are extracted;
FIGURE 7 illustrates a block diagram for implementing the downmixer as an analysis signal generator;
FIGURE 8 illustrates a flow chart for indicating a preferred form of processing in the signal analyzer of FIGURE 1 or FIGURE 2;
FIGURES 9A - 9E illustrate the different pre-calculated frequency-related correlation curves that can be used as reference curves for several different configurations, with different numbers and positions of the sound sources (such as speakers);
FIGURE 10 illustrates a block diagram to illustrate another application for a diffusion estimate where the diffuse components are the components to be decomposed, and
FIGURES 11A and 11B illustrate examples of equations for applying a signal analysis without a frequency-related correlation curve, but relying on the Wiener filtering approach.
FIGURE 1 illustrates an apparatus for decomposing an input signal 10 having a number of at least three input channels or, in general, N input channels. These input channels are input into a downmixer 12 for downmixing the input signal to obtain a reduced signal 14, wherein the downmixer 12 is arranged to perform the downmix so that the number (m) of downmix channels of the reduced signal 14 is at least two and smaller than the number of input channels of the input signal 10. The (m) downmix channels are input into an analyzer 16 for analyzing the reduced signal to derive an analysis result 18. The analysis result 18 is input into a signal processor 20, wherein the signal processor is arranged to process the input signal 10, or a signal derived from the input signal by a signal deriver 22, using the analysis result, and wherein the signal processor 20 is configured to apply the analysis result to the input channels, or to the channels of the signal 24 derived from the input signal, to obtain a decomposed signal 26.
In the application illustrated in FIGURE 1, the number of input channels is (n), the number of downmix channels is (m), the number of derived channels is (l), and the number of output channels is equal to (l) when the derived signal, instead of the input signal, is processed by the signal processor. Alternatively, when the signal deriver 22 does not exist, the input signal is processed directly by the signal processor, and the number of channels of the decomposed signal 26, indicated by (l) in FIGURE 1, will then be equal to (n). Thus, FIGURE 1 illustrates two different examples. In one example, the signal deriver 22 is absent and the input signal is applied directly to the signal processor 20. In the other example, the signal deriver 22 is present, and the derived signal 24, instead of the input signal 10, is processed by the signal processor 20. The signal deriver may be, for example, an audio channel mixer, such as an upmixer for generating more output channels. In this case, (l) would be greater than (n). In another application, the signal deriver could be another audio processor that performs weighting, delays or any other tasks on the input channels, in which case the number (l) of output channels of the signal deriver 22 would be equal to the number (n) of input channels. In yet another implementation, the signal deriver could be a downmixer that reduces the number of channels from the input signal to the derived signal. In this implementation, it is preferred that the number (l) still be greater than the number (m) of downmix channels, in order to retain one of the advantages of the present invention, namely that the signal analysis is applied to a smaller number of channel signals.
The analyzer is operative to analyze the reduced signal with respect to the perceptually distinct components.
These perceptually distinct components can be independent components in the individual channels, on the one hand, and dependent components, on the other hand. Alternative signal components to be analyzed by the present invention are direct components on the one hand and ambient components on the other hand. There are many other components that can be separated by the present invention, such as speech components from music components, noise components from speech components, noise components from music components, high-frequency noise components from low-frequency noise components or, in signals containing several instruments, the components provided by the different instruments, etc. This is due to the fact that there are powerful analysis tools, such as Wiener filtering, as discussed in the context of FIGURES 11A and 11B, or other analysis procedures, such as using a correlation curve as a function of frequency, as discussed in the context of, for example, FIGURE 8, in accordance with the present invention.
FIGURE 2 illustrates a further aspect, in which the analyzer 16 is implemented to use a pre-calculated correlation curve as a function of frequency. Thus, the apparatus for decomposing a signal 28 having a plurality of channels comprises the analyzer 16 for analyzing the correlation between two channels of an analysis signal, which is identical to the input signal or related to the input signal, for example, by a downmix operation, as illustrated in the context of FIGURE 1. The analysis signal analyzed by the analyzer 16 has at least two analysis channels, and the analyzer 16 is configured to use a pre-calculated correlation curve as a function of frequency as a reference curve in determining the analysis result 18. The signal processor 20 can operate in the same way as discussed in the context of FIGURE 1 and is configured to process the analysis signal, or a signal derived from the analysis signal by a signal deriver 22, where the signal deriver 22 can be implemented in a manner similar to what has been discussed in the context of the signal deriver 22 of FIGURE 1. Alternatively, the signal processor can process a signal from which the analysis signal has been derived, the signal processing using the analysis result to obtain a decomposed signal. Thus, in the application of FIGURE 2, the input signal can be identical to the analysis signal, and in this case the analysis signal can also be a stereo signal having only two channels, as illustrated in FIGURE 2. Alternatively, the analysis signal can be derived from an input signal by any type of processing, such as a downmix, as described in the context of FIGURE 1, or by any other processing such as an upmix or the like.
In addition, the signal processor 20 can be operative to apply signal processing to the same signal that has been input into the analyzer, or the signal processor can apply signal processing to a signal from which the analysis signal was derived, as indicated in the context of FIGURE 1, or the signal processor may apply signal processing to a signal that was derived from the analysis signal, such as by an upmix or the like.
Thus, there are several possibilities for the signal processor, and all these possibilities are advantageous due to the distinctive operation of the analyzer, which uses a pre-calculated correlation curve as a function of frequency as a reference curve in determining the analysis result.
Subsequently, additional applications will be discussed. It should be noted that, as discussed in the context of FIGURE 2, even the use of a two-channel analysis signal (without a downmix) is considered. Thus, in the present invention, as discussed for the different aspects in the context of FIGURE 1 and FIGURE 2, which can be used together or as separate aspects, either a downmix can be processed by the analyzer, or a two-channel signal that may not have been generated by a downmix can be processed by the signal analyzer using the pre-calculated reference curve. In this context, it is noted that the subsequent description of implementation aspects can be applied to both aspects schematically illustrated in FIGURE 1 and FIGURE 2, even when certain characteristics are described for only one aspect instead of both. If, for example, FIGURE 3 is considered, it becomes clear that the frequency-domain characteristics of FIGURE 3 are described in the context of the aspect illustrated in FIGURE 1, but it is also clear that the time/frequency transformation, as later described with respect to FIGURE 3, and the inverse transformation can likewise be applied to the implementation of FIGURE 2, which does not have a downmixer, but which has a specific analyzer using a pre-calculated correlation curve as a function of frequency.
In particular, the time/frequency converter would be placed to convert the analysis signal before the analysis signal is introduced into the analyzer, and the frequency/time converter would be placed at the signal processor output to convert the processed signal back to the time domain. When a signal deriver exists, the time/frequency converter can be placed at an input of the signal deriver so that the signal deriver, the analyzer and the signal processor all operate in the frequency/subband domain. In this context, a frequency band or subband basically means a part of the frequency range of a frequency representation.
Furthermore, it is evident that the analyzer in FIGURE 1 can be implemented in many different ways, but this analyzer is also, in an application, implemented as the analyzer discussed in FIGURE 2, that is, as an analyzer that uses a pre-calculated correlation curve as a function of frequency as an alternative to Wiener filtering or any other method of analysis.
The application of FIGURE 3 refers to a process of downmixing an arbitrary input signal to obtain a two-channel representation. An analysis in the time/frequency domain is performed, and weighting masks are calculated and multiplied by the time/frequency representation of the input signal, as shown in FIGURE 3.
In the figure, T/F denotes a time/frequency transformation, commonly a Short-Time Fourier Transform (STFT). iT/F denotes the respective inverse transformation. x1(n), ..., xN(n) are the input signals in the time domain, where n is the time index. X1(m, i), ..., XN(m, i) denote the frequency decomposition coefficients, where m is the decomposition time index and i represents the decomposition frequency index. D1(m, i), D2(m, i) are the two channels of the reduced signal.
W(m, i) are the calculated weights. Y1(m, i), ..., YN(m, i) are the weighted frequency decompositions for each channel. Hij(i) are the downmix coefficients, which can be real or complex values, and the coefficients can be constant in time or time-varying. Thus, the downmix coefficients can be just constants or filters, such as HRTF filters, reverberation filters or similar filters.

In FIGURE 3, the process of applying the same weighting to all channels is represented. y1(n), ..., yN(n) are the output signals in the time domain comprising the extracted signal components. (The input signal can have an arbitrary number of channels (N), produced for an arbitrary target playback speaker configuration. The downmix can include HRTFs for obtaining ear input signals, simulation of auditory filters, etc. The downmix can also be performed in the time domain.)
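The downmix step of FIGURE 3 can be sketched as follows; this is a minimal illustration in which the channel ordering and the 1/sqrt(2) coefficient for the center channel are assumptions of this example, not requirements of the description:

```python
import numpy as np

def downmix_stft(X, H):
    """Downmix N-channel STFT spectra to two channels.

    X: array (N, M, I) of complex STFT coefficients X_j(m, i)
    H: array (2, N) of (possibly complex) downmix coefficients H_ij
    Returns D with D_d(m, i) = sum_j H_dj * X_j(m, i), shape (2, M, I).
    """
    return np.einsum('dj,jmi->dmi', H, X)

# Example: ITU-style stereo downmix of a 5-channel signal (L, R, C, LS, RS).
# The coefficient g = 1/sqrt(2) for the center channel is an assumption here;
# the description leaves the coefficients open (they may even be filters).
g = 1.0 / np.sqrt(2.0)
H = np.array([[1.0, 0.0, g, 1.0, 0.0],    # left downmix channel D1
              [0.0, 1.0, g, 0.0, 1.0]])   # right downmix channel D2
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4, 8)) + 1j * rng.standard_normal((5, 4, 8))
D = downmix_stft(X, H)
print(D.shape)  # (2, 4, 8)
```

The same routine also covers time-varying coefficients if it is called per block m with an updated H.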
In an application, the difference between a reference correlation as a function of frequency, cref(ω), and the actual correlation of the reduced input signal, c(ω), is computed. (Throughout this text, the term "correlation" is used as a synonym for similarity between channels and can therefore also include evaluations with time shifts, for which usually the term "coherence" is used. Even if time shifts are evaluated, the resulting value can have a sign; normally, coherence is defined as having only positive values.) Depending on the deviation of the real curve from the reference curve, a weighting factor for each time/frequency portion is calculated, indicating whether it comprises dependent or independent components. The obtained time/frequency weighting indicates the independent components and can now be applied to each channel of the input signal to produce a multichannel signal (number of channels equal to the number of input channels), including independent parts that can be perceived as distinct or diffuse.
The reference curve can be defined in different ways. Examples are:
• The ideal theoretical reference curve for a diffuse two- or three-dimensional sound field composed of independent components.
• The ideal curve achievable with the reference destination speaker configuration for the input signal (for example, a standard stereo configuration with azimuth angles ±30°, or a standard five-channel configuration according to ITU-R BS.775 with azimuth angles 0°, ±30°, ±110°).
• The ideal curve for the speaker configuration actually present (the actual positions can be measured or known to the user; the reference curve can be calculated assuming the reproduction of independent signals by the considered speakers).
• The frequency-dependent short-term energy of each input channel can be incorporated into the reference calculation.
Given a frequency-dependent reference curve cref(ω), an upper limit chi(ω) and a lower limit clo(ω) can be defined (see FIGURE 4). The limit curves can coincide with the reference curve (clo(ω) = cref(ω) = chi(ω)), or be defined assuming detectability limits, or they can be derived heuristically.
If the deviation of the real curve from the reference curve is within the neighborhood indicated by the limits, the current compartment is weighted as indicating independent components. Above the upper limit or below the lower limit, the compartment is indicated as dependent. This indication can be binary or gradual (that is, following a smooth decision function). In particular, if the upper and lower limits coincide with the reference curve, the applied coefficient is directly related to the deviation from the reference curve.
With reference to FIGURE 3, reference numeral 32 illustrates a time/frequency converter that can be implemented as a Short-Time Fourier Transform or as any type of filter bank for generating subband signals, such as a QMF filter bank or similar. Regardless of the detailed implementation of the time/frequency converter 32, the output of the time/frequency converter is, for each input channel xi, a spectrum for each time period of the input signal. The time/frequency converter is implemented to always take a block of input samples from an individual channel signal and calculate the frequency representation, such as an FFT spectrum, having spectral lines that extend from a lowest frequency to a highest frequency. Then, for a next block in time, the same procedure is performed so that, in the end, a sequence of short-time spectra is calculated for each input channel signal. A certain frequency range of a certain spectrum relative to a given block of input samples from an input channel is referred to as a "time/frequency portion" and, preferably, the analysis of the analyzer 16 is performed based on these time/frequency portions. Consequently, the analyzer receives, as an input for a time/frequency portion, the spectral value at a first frequency of a given block of input samples of the first downmix channel D1 and the value for the same frequency and the same block (in time) of the second downmix channel D2.
Then, as for example illustrated in FIGURE 8, the analyzer 16 is configured to determine (80) a correlation value between the two input channels per subband and time block, that is, a correlation value for a time/frequency portion. Then, the analyzer 16 retrieves, in the application illustrated with reference to FIGURE 2 or FIGURE 4, a correlation value (82) for the corresponding subband from the reference correlation curve. When, for example, the subband is the subband indicated at 40 in FIGURE 4, then step 82 results in the value 41 indicating a correlation between -1 and +1, and the value 41 is then the retrieved correlation value. Then, in step 83, the result for the subband is calculated, using the correlation value determined in step 80 and the retrieved correlation value 41 obtained in step 82, either through a comparison and a subsequent decision or through the calculation of an actual difference. The result can be, as previously discussed, a binary sequence indicating that the actual time/frequency portion considered in the downmix/analysis signal has independent components. This decision will be made when the correlation value actually determined (in step 80) is equal to the reference correlation value or is close to the reference correlation value.
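Step 80, the determination of a correlation value per subband and time block, can be sketched as follows; the short averaging window around block m is an assumption of this example, not prescribed by the description:

```python
import numpy as np

def tile_correlation(D1, D2, m, i, m_win=3):
    """Correlation value for the time/frequency portion (m, i).

    D1, D2: complex STFT arrays of the two downmix channels, shape (M, I).
    A short window of blocks around m approximates the short-term average;
    the window length m_win is an assumption of this sketch. Taking the
    real part keeps the sign, so opposite-phase signals give -1.
    """
    sl = slice(max(0, m - m_win), m + m_win + 1)
    a, b = D1[sl, i], D2[sl, i]
    num = np.real(np.sum(a * np.conj(b)))
    den = np.sqrt(np.sum(np.abs(a) ** 2) * np.sum(np.abs(b) ** 2))
    return num / den if den > 0 else 0.0

rng = np.random.default_rng(1)
D = rng.standard_normal((10, 4)) + 1j * rng.standard_normal((10, 4))
print(tile_correlation(D, D, 5, 2))   # identical channels give 1.0
print(tile_correlation(D, -D, 5, 2))  # phase-inverted channels give -1.0
```

The returned value lies in [-1, +1] and is what step 83 compares against the value retrieved from the reference curve.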
When, however, it is determined that the determined correlation value indicates a higher absolute correlation than the reference correlation value, then it is determined that the time/frequency portion under consideration comprises dependent components. Thus, when the correlation of a time/frequency portion of the downmix signal or analysis signal indicates an absolute correlation value greater than the reference curve, then it can be said that the components of this time/frequency portion are dependent on each other. When, however, the correlation is very close to the reference curve, then it can be said that the components are independent. Dependent components can receive a first weighting value, such as 1, and independent components can receive a second weighting value, such as 0. Preferably, as shown in FIGURE 4, the upper and lower limits are spaced apart from the reference line and are used to provide a better result, which is more appropriate than using the reference curve in isolation.
In addition, with respect to FIGURE 4, it is noted that the correlation can vary between -1 and +1. A correlation having a negative sign additionally indicates a 180° phase shift between the signals. Therefore, other correlation measures that only extend between 0 and 1 could also be applied, with the negative part of the correlation simply being made positive. In such a procedure, a time shift or phase shift can then be ignored for the purpose of determining the correlation.
The alternative way of calculating the result is to actually calculate the distance between the correlation value determined in block 80 and the retrieved correlation value obtained in block 82, and then to determine a value between 0 and 1 as a weighting factor based on the distance. While the first alternative (1) in FIGURE 8 only results in values of 0 or 1, possibility (2) results in values between 0 and 1 and is, in some implementations, preferred.
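Both alternatives, the binary decision (1) and the distance-based mapping (2), can be sketched as follows; the gradual mapping used here is an assumption of this example, chosen only so that the weight is 1 on the reference curve and 0 at a correlation of +1 or -1:

```python
def weight_from_deviation(c_sig, c_ref, mode="gradual", c_lo=None, c_hi=None):
    """Map the deviation of the measured correlation c_sig from the
    reference c_ref to a weighting factor in [0, 1].

    mode "binary":  1 inside the [c_lo, c_hi] neighbourhood, else 0.
    mode "gradual": 1 - deviation / maximum possible deviation, where the
    maximum deviation is taken toward +1 or -1 depending on the sign of
    the deviation (an assumption consistent with the text, which requires
    a weight of 0 at a correlation of +1 or -1).
    """
    if mode == "binary":
        lo = c_ref if c_lo is None else c_lo
        hi = c_ref if c_hi is None else c_hi
        return 1.0 if lo <= c_sig <= hi else 0.0
    d = abs(c_sig - c_ref)
    d_max = (1.0 - c_ref) if c_sig >= c_ref else (1.0 + c_ref)
    return 1.0 - d / d_max if d_max > 0 else 0.0
```

For example, with c_ref = 0.3, a measured correlation of 0.3 maps to weight 1, while +1 and -1 both map to weight 0.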
The signal processor 20 in FIGURE 3 is illustrated as multipliers, and the analysis result is just a determined weighting factor that is routed from the analyzer and transmitted to the signal processor, as shown at 84 in FIGURE 8, and then applied to the corresponding time/frequency portion of the input signal 10. When, for example, the spectrum considered is the 20th spectrum in the sequence of spectra, and when the frequency considered is the 5th frequency of this 20th spectrum, then the time/frequency portion can be indicated as (20, 5), where the first number indicates the block number in time and the second number indicates the frequency compartment in this spectrum. Then, the analysis result for the time/frequency portion (20, 5) is applied to the corresponding time/frequency portion (20, 5) of each channel of the input signal in FIGURE 3, or, when a signal deriver as illustrated in FIGURE 1 is implemented, to the corresponding time/frequency portion of each channel of the derived signal.
Subsequently, the calculation of the reference curve is discussed in more detail. For the present invention, however, it is not fundamentally important how the reference curve was derived. It can be an arbitrary curve or, for example, the values of a table indicating an ideal or desired relation of the input signals xj in the downmix signal D or, in the context of FIGURE 2, in the analysis signal. The following derivation is exemplary.
The physical diffusion of a sound field can be evaluated by a method introduced by Cook et al. (Richard K. Cook, R. V. Waterhouse, R. D. Berendt, Seymour Edelman, and M. C. Thompson Jr., "Measurement of correlation coefficients in reverberant sound fields," Journal of the Acoustical Society of America, vol. 27, no. 6, pp. 1072-1077, November 1955), using the correlation coefficient (r) of the steady-state sound pressures of plane waves at two spatially separated points, as given in the following equation (4):
r = <p1(n) p2(n)> / sqrt(<p1²(n)> <p2²(n)>) (4)
where p1(n) and p2(n) are the sound pressure measurements at the two points, n is the time index, and <·> denotes the time average. In a steady-state sound field, the following relationships can be derived:
r(k, d) = sin(kd) / (kd) (for three-dimensional sound fields) (5)
and
r(k, d) = J0(kd) (for two-dimensional sound fields), (6)
where d is the distance between the two measurement points and
k = 2π / λ
is the wave number, with λ being the wavelength. (The physical reference curve r(k, d) can now be used as cref for further processing.)
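The two reference relations (5) and (6) can be evaluated numerically as follows; the series approximation of the Bessel function J0 is used here only to keep the sketch self-contained:

```python
import numpy as np

def r_diffuse_3d(k, d):
    # Equation (5): r(k, d) = sin(kd) / (kd) for a 3-D diffuse field
    kd = k * d
    return np.sinc(kd / np.pi)  # np.sinc(x) = sin(pi x) / (pi x)

def bessel_j0(x, terms=40):
    # Power-series expansion of the Bessel function J0 (avoids an
    # external dependency; adequate for the moderate kd values needed)
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    term = np.ones_like(x)
    for n in range(terms):
        if n > 0:
            term = term * (-(x / 2.0) ** 2) / (n * n)
        out = out + term
    return out

def r_diffuse_2d(k, d):
    # Equation (6): r(k, d) = J0(kd) for a 2-D diffuse field
    return bessel_j0(k * d)
```

Both curves start at 1 for kd = 0 (coincident points) and decay with oscillations as the spacing grows relative to the wavelength.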
The measure for the perceptual diffusion of a sound field is the interaural cross-correlation coefficient (ρ), measured in a sound field. The measurement of ρ implies that the distance between the pressure sensors (the ears) is fixed. Including this restriction, r becomes a function of frequency, with the radian frequency ω = kc, where c is the speed of sound in air. In addition, the pressure signals differ from the free-field signals previously considered due to the reflection, diffraction and shadowing effects caused by the listener's pinna (outer ear), head and torso. These effects, which are substantial for spatial hearing, are described by head-related transfer functions (HRTFs). Considering these influences, the resulting pressure signals at the ear entrances are pL(n, ω) and pR(n, ω). For the calculation, measured HRTF data can be used, or approximations can be obtained using an analytical model (e.g. Richard O. Duda and William L. Martens, "Range dependence of the response of a spherical head model," Journal of the Acoustical Society of America, vol. 104, no. 5, pp. 3048-3058, November 1998).
Since the human auditory system acts as a frequency analyzer with limited frequency selectivity, this frequency selectivity can additionally be incorporated. Auditory filters are assumed to behave like overlapping bandpass filters. In the following explanatory example, a critical-band approach is used to approximate these overlapping bandpass filters by rectangular filters. The equivalent rectangular bandwidth (ERB) can be calculated as a function of the center frequency. Considering that binaural processing follows auditory filtering, ρ has to be calculated for separate frequency channels, producing the following pressure signals as a function of frequency:
where the limits of integration are given by the limits of the critical band according to the actual center frequency ω. The factors 1/b(ω) may or may not be used in equations (7) and (8).
If one of the sound pressure measurements is advanced or delayed by a time offset, the coherence of the signals can be assessed. The human auditory system is capable of making use of such a time-alignment property. Normally, interaural coherence is calculated within a range of ±1 ms. Depending on the available processing capacity, the calculations can be implemented using only the zero-delay value (for low complexity) or the coherence including a time advance and delay (if high complexity is possible). In what follows, no distinction is made between the two cases.
The ideal behavior is obtained considering an ideal diffuse sound field, which can be idealized as a wave field composed of uncorrelated, equally intense plane waves propagating in all directions (that is, a superposition of an infinite number of propagating plane waves with random phase relations and uniformly distributed directions). A signal radiated by a loudspeaker can be considered a plane wave for a listener positioned far enough away. This plane-wave assumption is common in stereo reproduction over loudspeakers. In this way, a synthetic sound field reproduced by the speakers consists of plane waves from a limited number of directions.
Consider an input signal with N channels, intended for reproduction through a configuration with speaker positions [l1, l2, l3, ..., lN]. (In the case of a purely horizontal reproduction configuration, li indicates the azimuth angle. In the general case, li = (azimuth, elevation) indicates the position of the speaker in relation to the listener's head. If the configuration present in the reproduction room differs from the reference configuration, li can alternatively represent the speaker positions of the actual playback configuration.) With this information, an interaural coherence reference curve ρref for a diffuse field simulation can be calculated for this configuration under the assumption that independent signals are fed to each of the speakers. The signal power contributed by each input channel in each time/frequency tile can be included in the calculation of the reference curve. In the implementation example, ρref is used as cref.
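A reference curve of this kind can be sketched as follows; instead of measured HRTFs, the example uses a simple spherical-head (Woodworth-style) ITD approximation, which is an assumption of this sketch and not the model prescribed above:

```python
import numpy as np

def ref_curve_free_field(freqs, azimuths_deg, head_radius=0.0875, c=343.0):
    """Reference correlation curve for a loudspeaker configuration,
    assuming independent, equally intense signals on every speaker.

    For independent unit-power sources, the normalized interaural
    cross-spectrum reduces to the average of cos(w * ITD_i) over the
    sources; the Woodworth ITD model used for each azimuth is an
    assumption of this sketch (no pinna/torso effects).
    """
    w = 2.0 * np.pi * np.asarray(freqs, dtype=float)
    num = np.zeros_like(w)
    for az in azimuths_deg:
        th = np.radians(az)
        itd = (head_radius / c) * (np.sin(th) + th)  # Woodworth ITD
        num += np.cos(w * itd)
    return num / len(azimuths_deg)

# Standard stereo setup at +/-30 degrees azimuth
curve = ref_curve_free_field([0.0, 200.0, 1000.0, 5000.0], [-30.0, 30.0])
```

At 0 Hz the curve equals 1 for any configuration, and it stays bounded in [-1, +1], qualitatively matching the curves of FIGURES 9A to 9E.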
Different examples of reference curves as a function of frequency, or correlation curves, are illustrated in FIGURES 9A to 9E for different numbers of sound sources, different positions of the sound sources and different head orientations, as indicated in the figures.
Subsequently, the calculation of the analysis results discussed in the context of FIGURE 8, based on the reference curves, is described in more detail.
The objective is to derive a weighting coefficient that is equal to 1 if the correlation between the downmix channels is equal to the reference correlation calculated under the assumption of independent signals being reproduced by all speakers. If the downmix correlation is +1 or -1, the derived weight must be 0, indicating that there are no independent components present. Between these extreme cases, the weighting must represent a reasonable transition between the indication as fully independent (W = 1) and totally dependent (W = 0).
Given the reference correlation curve cref(ω) and the estimate of the correlation/coherence of the actual input signal reproduced through the actual reproduction configuration, csig(ω) (csig is the correlation or coherence of the downmix), the deviation of csig(ω) from cref(ω) can be calculated. This deviation (possibly including an upper and lower limit) is mapped to the interval [0, 1] to obtain a weight W(m, i) that is applied to all input channels to separate the independent components.
The following example illustrates a possible mapping when the limits correspond to the reference curve:
The magnitude of the deviation (denoted as Δ) of the real curve csig from the reference cref is given by:
Δ(ω) = |csig(ω) - cref(ω)| (9)
Considering that the correlation/coherence is bounded between [-1, +1], the maximum possible deviation toward +1 or -1 for each frequency is given by:
Δmax(ω) = 1 - cref(ω) if csig(ω) ≥ cref(ω), and Δmax(ω) = 1 + cref(ω) otherwise.
The weighting for each frequency is obtained by:
W(ω) = 1 - Δ(ω) / Δmax(ω)
Considering the time dependence and the limited frequency resolution of the frequency decomposition, the weighting values are derived as follows (here, the general case of a reference curve that can change over time is shown; a time-independent reference curve, i.e. cref(ω), is also possible):
W(m, i) = 1 - Δ(m, i) / Δmax(m, i)
Such processing can be performed in a frequency decomposition with frequency coefficients grouped into perceptually motivated subbands, for reasons of computational complexity and to obtain filters with shorter impulse responses. In addition, smoothing filters and compression functions (that is, functions distorting the weighting coefficient in a desired way, or additionally introducing maximum or minimum weighting values) can be applied.
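The weighting derivation with clipping and temporal smoothing can be sketched as follows; the smoothing constant and the clipping limits are assumptions of this example:

```python
import numpy as np

def weights_tf(c_sig, c_ref, w_min=0.0, w_max=1.0, alpha=0.8):
    """Weighting W(m, i) from the per-tile correlation c_sig(m, i) and the
    (possibly time-varying) reference c_ref, with clipping to
    [w_min, w_max] and first-order recursive smoothing over time blocks.
    alpha, w_min and w_max are sketch parameters, not from the text."""
    c_sig = np.asarray(c_sig, float)
    c_ref = np.broadcast_to(np.asarray(c_ref, float), c_sig.shape)
    d = np.abs(c_sig - c_ref)
    # maximum possible deviation toward +1 or -1
    d_max = np.where(c_sig >= c_ref, 1.0 - c_ref, 1.0 + c_ref)
    with np.errstate(divide='ignore', invalid='ignore'):
        w = np.where(d_max > 0, 1.0 - d / d_max, 0.0)
    w = np.clip(w, w_min, w_max)
    out = np.empty_like(w)
    out[0] = w[0]
    for m in range(1, w.shape[0]):  # recursive smoothing over time
        out[m] = alpha * out[m - 1] + (1.0 - alpha) * w[m]
    return out
```

Tiles whose correlation matches the reference keep a weight of 1; tiles at a correlation of +1 or -1 receive 0, with the smoothing avoiding abrupt changes between adjacent blocks.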
FIGURE 5 illustrates a further implementation of the present invention, in which the downmixer is implemented using HRTF filters and auditory filters, as illustrated. In addition, FIGURE 5 further illustrates that the analysis results output by the analyzer 16 are the weighting factors for each time/frequency compartment, and the signal processor 20 is illustrated as an extractor for the extraction of independent components. Then, the output of processor 20 is again N channels, but each channel now includes only the independent components and no longer includes any dependent components. In this implementation, the analyzer can calculate the weighting coefficients so that, in a first application of FIGURE 8, an independent component receives a weighting value of 1 and a dependent component receives a weighting value of 0. Then, the time/frequency portions of the original N channels processed by processor 20 that have dependent components would be set to 0.
In another alternative, where there are weighting values between 0 and 1 as in FIGURE 8, the analyzer would calculate the weighting coefficients so that a time/frequency portion having a small distance from the reference curve receives a high value (closer to 1), and a time/frequency portion having a large distance from the reference curve receives a small weighting factor (closer to 0). In the subsequent weighting illustrated, for example, at 20 in FIGURE 3, the independent components are then amplified, while the dependent components are attenuated.
When, however, the signal processor 20 is implemented not to extract the independent components but to extract the dependent components, then the weighting coefficients would be assigned in the opposite direction, so that when the weighting coefficients are used in the multipliers 20 illustrated in FIGURE 3, the independent components are attenuated and the dependent components are amplified. Thus, the same signal processor can be applied for the extraction of either kind of signal components, since the determination of which components are extracted is made by the actual assignment of the weighting values.
FIGURE 6 illustrates a further implementation of the inventive concept, but now with a different implementation of processor 20. In the application of FIGURE 6, processor 20 is implemented for the extraction of independent diffuse parts, independent direct parts and direct parts/direct components as such.
To obtain, from the separated independent components (Y1, ..., YN), the parts that contribute to the perception of an ambient/surrounding sound field, additional restrictions have to be considered. One such restriction may be the assumption that the surrounding ambient sound is equally intense from each direction. Thus, for example, the minimum energy of each time/frequency portion across the channels of the independent sound signals can be extracted to obtain a surrounding ambient signal (which can be further processed to obtain a greater number of ambient channels). Example:

where P denotes a short-term power estimate. (This example shows the simplest case. An obvious exceptional case where it is not applicable is when one of the channels includes a signal pause, during which the power in that channel is very low or zero.)
In some cases, it is advantageous to extract equal energy parts from all input channels and to calculate the weighting coefficient using only those extracted spectra.
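The minimum-energy ambience extraction described above can be sketched as follows; the per-tile power estimate and the handling of near-zero channels are assumptions of this example, and the patent's exact formula may differ (e.g. in how signal pauses are treated):

```python
import numpy as np

def extract_ambience(Y, eps=1e-12):
    """Equal-energy ambience sketch: for every time/frequency tile, scale
    each channel of the independent components Y (shape N, M, I) down to
    the minimum short-term power found across the channels, so the
    resulting ambience is equally intense in all channels."""
    P = np.abs(Y) ** 2                       # per-tile power estimate
    P_min = P.min(axis=0, keepdims=True)     # minimum over channels
    g = np.sqrt(P_min / np.maximum(P, eps))  # gain toward equal energy
    return g * Y
```

After this step every channel carries the same per-tile power, namely the minimum over the original channels, which matches the assumption of an equally intense surrounding sound field.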

The extracted dependent parts (for example, those that can be derived as Ydependent,j(m, i) = Xj(m, i) - Yj(m, i)) can be used to detect channel dependencies and to estimate the directional signals inherent in the input signal, allowing additional processing such as, for example, panning-based repositioning.
FIGURE 7 illustrates a variant of the general concept. The N-channel input signal is fed to an analysis signal generator (ASG). The generation of the M-channel analysis signal may, for example, include a model of the propagation from the channels/speakers to the ears, or other methods indicated as a downmix throughout this document. The indication of the different components is based on the analysis signal. The masks that indicate the different components are applied to the input signals (extraction A / extraction D (20a, 20b)). The weighted input signals can be further processed (post A / post D (70a, 70b)) to produce output signals with specific characteristics, where, in this example, the designations "A" and "D" have been chosen to indicate that the components to be extracted can be ambient sound and direct sound.
Subsequently, FIGURE 10 is described. A stationary sound field is called diffuse if the directional distribution of sound energy does not depend on direction. The directional energy distribution can be assessed by measuring over all directions using a highly directional microphone. In room acoustics, the reverberant sound field in an enclosed space is often modeled as a diffuse field. A diffuse sound field can be thought of as a wave field composed of plane, uncorrelated and equally intense waves propagating in all directions. Such a sound field is isotropic and homogeneous.
If uniformity of the energy distribution is of particular interest, the point-to-point correlation coefficient
r = <p1(t) p2(t)> / sqrt(<p1²(t)> <p2²(t)>)
of the steady-state sound pressures p1(t) and p2(t) at two spatially separated points can be used to evaluate the physical diffusion of a sound field. Considering ideal two-dimensional and three-dimensional steady-state diffuse sound fields induced by a sinusoidal source, the following relationships can be derived:
r(k, d) = sin(kd) / (kd) (three-dimensional) and r(k, d) = J0(kd) (two-dimensional),
where k = 2π / λ (with λ = wavelength) is the wave number and d is the distance between the measurement points. Given these relationships, the diffusion of a sound field can be assessed by comparing measurement data to the reference curves. Since the ideal relationships are only necessary, but not sufficient, a series of measurements with different orientations of the axis connecting the microphones can be considered.
Considering a listener in a sound field, the sound pressure measurements are given by the ear input signals pl(t) and pr(t). Thus, the assumed distance d between the measuring points is fixed, and r becomes a function only of the frequency f = kc / 2π, where c is the speed of sound in air. The signals differ from those considered previously due to the influence of the effects caused by the listener's ears (pinnae), head and torso. These effects, which are substantial for spatial hearing, are described by head-related transfer functions (HRTFs). Measured HRTF data can be used to incorporate these effects. Here, an analytical model is used to simulate an HRTF approximation. The head is modeled as a rigid sphere with a radius of 8.75 centimeters and ear locations at azimuth ±100° and elevation 0°. Given the theoretical behavior of r in an ideal diffuse sound field and the influence of the HRTFs, it is possible to determine a frequency-dependent interaural cross-correlation reference curve for diffuse sound fields.
The diffusion estimate is based on the comparison of simulated signals with assumed diffuse-field reference signals. This comparison is subject to the limitations of human hearing. In the auditory system, binaural processing follows the auditory periphery, which consists of the outer ear, middle ear and inner ear. Effects of the outer ear that are not approximated by the sphere model (for example, pinna shape, ear canal) and the effects of the middle ear are not considered. The spectral selectivity of the inner ear is modeled as a bank of overlapping bandpass filters (auditory filters, shown in FIGURE 10). A critical-band approach is used to approximate these overlapping bands by rectangular filters. The equivalent rectangular bandwidth (ERB) is calculated as a function of the center frequency according to b(fc) = 24.7 · (0.00437 · fc + 1).
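The quoted ERB formula and the resulting critical-band integration limits can be expressed directly:

```python
def erb_bandwidth(fc_hz):
    """Equivalent rectangular bandwidth b(fc) = 24.7 * (0.00437 * fc + 1),
    with fc in Hz (the Glasberg/Moore-style formula quoted in the text)."""
    return 24.7 * (0.00437 * fc_hz + 1.0)

def critical_band_edges(fc_hz):
    """Lower and upper integration limits fc -/+ b(fc)/2 used for the
    critical-band spectra around center frequency fc."""
    b = erb_bandwidth(fc_hz)
    return fc_hz - b / 2.0, fc_hz + b / 2.0

print(erb_bandwidth(1000.0))  # about 132.6 Hz at 1 kHz
```

For example, at fc = 1 kHz the rectangular band spans roughly 934 Hz to 1066 Hz.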
It is considered that the human auditory system is capable of performing a time alignment for the detection of coherent signal components, and that a cross-correlation analysis is used to estimate the alignment time τ (corresponding to ITDs, interaural time differences) in the presence of complex sounds. Up to about 1-1.5 kHz, time shifts of the carrier signal are evaluated using the waveform cross-correlation, whereas at higher frequencies the cross-correlation of the envelope becomes the relevant cue. In the following, this distinction is not made. Interaural coherence (IC) is modeled as the maximum absolute value of the normalized interaural cross-correlation function.
Some models of binaural perception consider a running interaural cross-correlation analysis. Since stationary signals are considered here, the time dependence is not taken into account. To model the influence of critical-band processing, the normalized, frequency-dependent cross-correlation function is calculated as
Γ(fc, τ) = A / sqrt(B · C),
where A is the critical-band cross-correlation function, and B and C are the critical-band autocorrelation functions. Their relationship with the frequency domain, through the critical-band cross-spectrum and auto-spectra, can be formulated as follows:
A = ∫ L(f) R*(f) e^(j2πfτ) df, B = ∫ |L(f)|² df, C = ∫ |R(f)|² df,
where L(f) and R(f) are the Fourier transforms of the ear input signals, the integrals run from fc - b(fc)/2 to fc + b(fc)/2, that is, the lower and upper limits of integration of the critical band according to the actual center frequency fc, and * indicates the complex conjugate.
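The critical-band coherence computed from the ear-signal spectra can be sketched as follows; the discrete sum in place of the integrals and the uniform frequency grid are assumptions of this example:

```python
import numpy as np

def band_coherence(L, R, f, fc, tau=0.0):
    """Normalized cross-correlation of one critical band at lag tau,
    computed from the ear-signal spectra L(f), R(f) over the band
    fc - b(fc)/2 .. fc + b(fc)/2, with b(fc) = 24.7*(0.00437*fc + 1).
    A discrete sum replaces the integrals; magnitude is returned, so the
    result lies in [0, 1]."""
    b = 24.7 * (0.00437 * fc + 1.0)
    m = (f >= fc - b / 2.0) & (f <= fc + b / 2.0)
    A = np.abs(np.sum(L[m] * np.conj(R[m]) * np.exp(2j * np.pi * f[m] * tau)))
    B = np.sum(np.abs(L[m]) ** 2)
    C = np.sum(np.abs(R[m]) ** 2)
    return A / np.sqrt(B * C) if B > 0 and C > 0 else 0.0
```

Identical ear signals give a coherence of 1 in every band; evaluating over a range of lags and taking the maximum yields the IC measure described above.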
If signals from two or more sources at different angles are superposed, fluctuating ILD and ITD cues are evoked. Such ILD and ITD variations as a function of time and/or frequency can give rise to the perception of spaciousness. However, in the long-term average, there should be no ILDs and ITDs in a diffuse sound field. An average ITD of zero means that the correlation between the signals cannot be increased by time alignment. ILDs can, in principle, be evaluated over the entire audible frequency range. Because the head is not an obstacle at low frequencies, ILDs are most effective at medium and high frequencies.
Subsequently, FIGURES 11A and 11B are discussed in order to illustrate an alternative application of the analyzer without using a reference curve, as discussed in the context of FIGURE 10 or FIGURE 4.
A short-time Fourier transform (STFT) is applied to the surround audio input channels x1(n) to xN(n), obtaining the short-time spectra X1(m, i) to XN(m, i), respectively, where m is the spectrum (time) index and i the frequency index. The spectra of a stereo downmix of the surround input signal, denoted X1(m, i) and X2(m, i), are computed. For 5.1 surround, an ITU downmix according to equation (1) is suitable. X1(m, i) to X5(m, i) correspond, in this order, to the channels left (L), right (R), center (C), left surround (LS), and right surround (RS). In the following, the time and frequency indices are omitted most of the time for brevity of notation.
Based on the stereo downmix signal, the filters WD and WA are calculated to obtain the direct and ambient surround signal estimates according to equations (2) and (3).
Considering the hypothesis that the ambient sound signal is uncorrelated between all input channels, the downmix coefficients were chosen such that this assumption is also valid for the downmix channels. Thus, the downmix signal model in equation (4) can be formulated.
D1 and D2 represent the STFT spectra of the correlated direct sound, and A1 and A2 represent the uncorrelated ambient sound. It is also assumed that the direct sound and the ambience in each channel are mutually uncorrelated.
An estimate of the direct sound, optimal in the least-mean-square sense, is achieved by applying a Wiener filter to the original surround sound signal to suppress the ambience. To derive a single filter that can be applied to all input channels, the direct components in the downmix can be estimated using the same filter for the left and right channels, as in equation (5).
The joint mean-square error function for this estimate is given by equation (6). E{·} is the expectation operator, and PD and PA are the sums of the short-term power estimates of the direct and ambient components (equation (7)).
The error function (6) is minimized by setting its derivative to zero. The resulting filter for estimating the direct sound is given in equation (8).
Similarly, the estimation filter for the ambient sound is obtained as equation (9).
Next, the estimates for PD and PA needed to compute WD and WA are derived. The cross-correlation of the downmix is given by equation (10), where, given the downmix signal model (4), equation (11) holds.
Further, assuming that the ambient components in the downmix have the same power in the left and right downmix channels, equation (12) can be written.
Substituting equation (12) into the last line of equation (10) and considering equation (13), equations (14) and (15) are obtained.
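The derivation above can be illustrated with a short sketch. Since equations (6) to (15) are not reproduced in this text, the closed-form expressions below are reconstructed from the stated assumptions (fully correlated direct components, uncorrelated ambience of equal power in both downmix channels) and are an illustration rather than the claimed formulation:

```python
import numpy as np

# Sketch: direct/ambient Wiener weights from a stereo downmix STFT.
# Reconstructed from the stated model assumptions; equations (6)-(15)
# are not reproduced in the text, so this closed form is illustrative.
def direct_ambient_weights(X1, X2, axis=-1):
    # Short-term expectations approximated by averaging over STFT frames.
    P1 = np.mean(np.abs(X1) ** 2, axis=axis)          # power, left downmix
    P2 = np.mean(np.abs(X2) ** 2, axis=axis)          # power, right downmix
    R = np.abs(np.mean(X1 * np.conj(X2), axis=axis))  # |cross-correlation|
    s = P1 + P2
    # With fully correlated direct parts, |R|^2 = PD1*PD2, and with equal
    # per-channel ambient power u = PA/2, u solves
    #   u^2 - (P1+P2)*u + (P1*P2 - |R|^2) = 0   (smaller root).
    disc = np.maximum(s ** 2 - 4.0 * (P1 * P2 - R ** 2), 0.0)
    PA = s - np.sqrt(disc)          # total ambient power estimate (= 2u)
    PD = np.maximum(s - PA, 0.0)    # total direct power estimate
    eps = 1e-12                     # guard against division by zero
    WD = PD / (s + eps)             # Wiener weight for the direct part
    WA = 1.0 - WD                   # Wiener weight for the ambient part
    return WD, WA
```

For identical downmix channels (pure direct sound) the weights collapse to WD = 1, while for independent channels (pure ambience) WD tends toward 0 as the averaging length grows.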
As discussed in the context of FIGURE 4, the generation of reference curves for minimal correlation can be imagined by placing two or more different sound sources in a playback configuration and placing the listener's head at a certain position in this playback configuration. Then, completely independent signals are emitted from the different loudspeakers. For a two-loudspeaker configuration, the two channels would be completely uncorrelated, with a correlation equal to 0, in which case there would be no cross-mixing products. However, such cross-mixing products do occur due to the cross-coupling from the left side to the right side of the human hearing system and, on the other hand, cross-coupling can also occur due to room reverberation, etc. Therefore, the resulting reference curves, as illustrated in FIGURE 4 or in FIGURES 9a to 9d, are not always at 0, but have values that are partly different from 0, although the reference signals imagined for this scenario are completely independent. It is, however, important to understand that these signals are not actually needed. It is also sufficient to assume complete independence between the two or more signals when calculating the reference curve. In this context, it should be noted, however, that other reference curves can be calculated for other situations, for example using or assuming signals that are not completely independent but have a certain, pre-known, dependency or degree of dependence on each other. When such a different reference curve is calculated, the interpretation or provision of weighting factors differs from that for a reference curve where completely independent signals were assumed.
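As an illustration of such a pre-computed reference curve, the coherence of an ideal diffuse sound field between two points spaced d apart follows the well-known sin(kd)/(kd) curve. This textbook free-field model is only a stand-in for the measured or simulated curves described here, and the ear-spacing value below is an assumption:

```python
import numpy as np

# Illustrative reference curve: free-field coherence of an ideal diffuse
# sound field at two points spaced d apart is sin(kd)/(kd), k = 2*pi*f/c.
# This is a textbook stand-in for the pre-computed reference curves the
# text describes; curves measured with BRIRs would differ. The spacing
# of 0.17 m (approximate ear distance) is an assumed parameter.
def diffuse_field_reference_curve(freqs_hz, spacing_m=0.17, c=343.0):
    k = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float) / c
    kd = k * spacing_m
    return np.sinc(kd / np.pi)   # np.sinc(x) = sin(pi*x)/(pi*x)

freqs = np.linspace(0.0, 8000.0, 256)
curve = diffuse_field_reference_curve(freqs)
```

The curve equals 1 at 0 Hz and decays toward 0 with frequency, matching the qualitative behavior of reference curves that are "not always at 0" despite fully independent source signals.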
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or a device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
The inventive decomposed signal can be stored on a digital storage medium or can be transmitted on a transmission medium, such as a wireless transmission medium or a wired transmission medium, such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a non-transitory data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.
In general, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured for or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example, a field-programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field-programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. In general, the methods are preferably performed by some hardware apparatus.
The above-described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
Claims (14)
[0001]
1. Apparatus for decomposing an input signal (10) having a number of at least three input channels, comprising: a downmixer (12) for downmixing the input signal to obtain a downmix signal, characterized in that the downmixer (12) is configured for downmixing such that a number of downmix channels of the downmix signal (14) is at least 2 and smaller than the number of input channels; an analyzer (16) for analyzing the downmix signal to derive an analysis result (18); and a signal processor (20) for processing the input signal (10) or a signal (24) derived from the input signal, or a signal from which the input signal is derived, using the analysis result (18), wherein the signal processor (20) is configured to apply the analysis result to the input channels of the input signal or to channels of the signal derived from the input signal to obtain the decomposed signal (26).
[0002]
2. Apparatus according to claim 1, further comprising a time/frequency converter (32) for converting the input channels into a time sequence of frequency representations of the input channels, each frequency representation of an input channel having a plurality of subbands, or wherein the downmixer (12) comprises a time/frequency converter for converting the downmix signal, characterized in that the analyzer (16) is configured to generate an individual analysis result (18) for individual subbands, and wherein the signal processor (20) is configured to apply the individual analysis results to corresponding subbands of the input signal or of the signal derived from the input signal.
[0003]
3. Apparatus according to claim 1 or 2, characterized in that the analyzer (16) is configured to output, as the analysis result, weighting factors (W(m, i)), and wherein the signal processor (20) is configured to apply the weighting factors to the input signal or to the signal derived from the input signal.
[0004]
4. Apparatus according to one of the preceding claims, characterized in that the downmixer is configured for adding weighted or unweighted input channels in accordance with a downmix rule such that at least two downmix channels are different from each other.
[0005]
5. Apparatus according to one of the preceding claims, characterized in that the downmixer (12) is configured to filter the input signal (10) using filters based on room impulse responses, binaural filters based on binaural room impulse responses (BRIRs), or filters based on HRTFs.
[0006]
6. Apparatus according to one of the preceding claims, characterized in that the signal processor (20) is configured to apply a Wiener filter to the input signal or to the signal derived from the input signal, and wherein the analyzer (16) is configured to calculate the Wiener filter using expectation values derived from the downmix channels.
[0007]
7. Apparatus according to one of the preceding claims, characterized in that it further comprises a signal deriver (22) for deriving the signal from the input signal such that the signal derived from the input signal has a number of channels different from that of the downmix signal or of the input signal.
[0008]
8. Apparatus according to one of the preceding claims, characterized in that the analyzer (16) is configured to use a pre-stored frequency-dependent similarity curve indicating a frequency-dependent similarity between two signals generated from previously known reference signals.
[0009]
9. Apparatus according to any one of claims 1 to 8, characterized in that the analyzer is configured to use a pre-stored frequency-dependent similarity curve indicating a frequency-dependent similarity between two or more signals at a listener position under the assumption that the signals have a known characteristic similarity and that the signals are emitted by loudspeakers at known loudspeaker positions.
[0010]
10. Apparatus according to one of claims 1 to 7, characterized in that the analyzer is configured to calculate a frequency-dependent and signal-dependent similarity curve using frequency-dependent short-term powers of the input channels.
[0011]
11. Apparatus according to any one of claims 8 to 10, characterized in that the analyzer (16) is configured to calculate a similarity of the downmix channels in a frequency subband (80), to compare the similarity result with a similarity indicated by the reference curve (82, 83) and to generate the weighting factor, as the analysis result, based on the result of the comparison, or to calculate a distance between the similarity result and a similarity indicated by the reference curve for the same frequency subband and to calculate a weighting factor, as the analysis result, based on the distance.
[0012]
12. Apparatus according to one of the preceding claims, characterized in that the analyzer (16) is configured to analyze the downmix channels in subbands determined by a frequency resolution of the human ear.
[0013]
13. Apparatus according to one of claims 1 to 12, characterized in that the analyzer (16) is configured to analyze the downmix signal to generate an analysis result allowing a direct/ambience decomposition, and wherein the signal processor (20) is configured to extract the direct part or the ambience part using the analysis result.
[0014]
14. Method for decomposing an input signal (10) having a number of at least three input channels, comprising: downmixing (12) the input signal to obtain a downmix signal, such that a number of downmix channels of the downmix signal (14) is at least 2 and smaller than the number of input channels; analyzing (16) the downmix signal to derive an analysis result (18); and processing (20) the input signal (10) or a signal (24) derived from the input signal, or a signal from which the input signal is derived, using the analysis result (18), characterized in that the analysis result is applied to the input channels of the input signal or to channels of the signal derived from the input signal to obtain the decomposed signal (26).
Similar technologies:
Publication number | Publication date | Patent title
BR112013014172B1|2021-03-09|apparatus and method for decomposing an input signal using a downmixer
ES2895436T3|2022-02-21|Apparatus and method for generating an audio output signal having at least two output channels
AU2015255287B2|2017-11-23|Apparatus and method for generating an output signal employing a decomposer
Walther2009|Perception and rendering of three-dimensional surround sound
Family patents:
Publication number | Publication date
ES2534180T3|2015-04-20|
CN103348703A|2013-10-09|
TWI519178B|2016-01-21|
EP2649814B1|2015-01-14|
WO2012076332A1|2012-06-14|
WO2012076331A1|2012-06-14|
JP5595602B2|2014-09-24|
CN103355001A|2013-10-16|
AR084175A1|2013-04-24|
JP2014502478A|2014-01-30|
RU2013131774A|2015-01-20|
EP2649815B1|2015-01-21|
US20130268281A1|2013-10-10|
MX2013006358A|2013-08-08|
RU2554552C2|2015-06-27|
CN103348703B|2016-08-10|
KR20130105881A|2013-09-26|
CA2820376A1|2012-06-14|
CA2820351A1|2012-06-14|
BR112013014172A2|2016-09-27|
US9241218B2|2016-01-19|
RU2013131775A|2015-01-20|
TW201238367A|2012-09-16|
EP2464146A1|2012-06-13|
BR112013014173A2|2018-09-18|
HK1190553A1|2014-07-04|
US10187725B2|2019-01-22|
AU2011340891A1|2013-06-27|
JP2014502479A|2014-01-30|
PL2649815T3|2015-06-30|
AU2011340890B2|2015-07-16|
BR112013014173B1|2021-07-20|
PL2649814T3|2015-08-31|
AU2011340890A1|2013-07-04|
US20190110129A1|2019-04-11|
EP2649814A1|2013-10-16|
TWI524786B|2016-03-01|
RU2555237C2|2015-07-10|
KR101480258B1|2015-01-09|
ES2530960T3|2015-03-09|
KR20130133242A|2013-12-06|
US20130272526A1|2013-10-17|
MX2013006364A|2013-08-08|
EP2464145A1|2012-06-13|
AR084176A1|2013-04-24|
CA2820376C|2015-09-29|
KR101471798B1|2014-12-10|
AU2011340891B2|2015-08-20|
CA2820351C|2015-08-04|
US10531198B2|2020-01-07|
EP2649815A1|2013-10-16|
TW201234871A|2012-08-16|
CN103355001B|2016-06-29|
HK1190552A1|2014-07-04|
JP5654692B2|2015-01-14|
Cited documents:
Publication number | Application date | Publication date | Applicant | Patent title

US5065759A|1990-08-30|1991-11-19|Vitatron Medical B.V.|Pacemaker with optimized rate responsiveness and method of rate control|
US5912976A|1996-11-07|1999-06-15|Srs Labs, Inc.|Multi-channel audio enhancement system for use in recording and playback and methods for providing same|
TW358925B|1997-12-31|1999-05-21|Ind Tech Res Inst|Improvement of oscillation encoding of a low bit rate sine conversion language encoder|
SE514862C2|1999-02-24|2001-05-07|Akzo Nobel Nv|Use of a quaternary ammonium glycoside surfactant as an effect enhancing chemical for fertilizers or pesticides and compositions containing pesticides or fertilizers|
US6694027B1|1999-03-09|2004-02-17|Smart Devices, Inc.|Discrete multi-channel/5-2-5 matrix system|
ES2294300T3|2002-07-12|2008-04-01|Koninklijke Philips Electronics N.V.|AUDIO CODING|
WO2004059643A1|2002-12-28|2004-07-15|Samsung Electronics Co., Ltd.|Method and apparatus for mixing audio stream and information storage medium|
US7254500B2|2003-03-31|2007-08-07|The Salk Institute For Biological Studies|Monitoring and representing complex signals|
JP2004354589A|2003-05-28|2004-12-16|Nippon Telegr & Teleph Corp <Ntt>|Method, device, and program for sound signal discrimination|
EP1810280B1|2004-10-28|2017-08-02|DTS, Inc.|Audio spatial environment engine|
DE602005022641D1|2004-03-01|2010-09-09|Dolby Lab Licensing Corp|Multi-channel audio decoding|
EP1722359B1|2004-03-05|2011-09-07|Panasonic Corporation|Error conceal device and error conceal method|
US7392195B2|2004-03-25|2008-06-24|Dts, Inc.|Lossless multi-channel audio codec|
US8843378B2|2004-06-30|2014-09-23|Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.|Multi-channel synthesizer and method for generating a multi-channel output signal|
US7961890B2|2005-04-15|2011-06-14|Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung, E.V.|Multi-channel hierarchical audio coding with compact side information|
US7468763B2|2005-08-09|2008-12-23|Texas Instruments Incorporated|Method and apparatus for digital MTS receiver|
US7563975B2|2005-09-14|2009-07-21|Mattel, Inc.|Music production system|
KR100739798B1|2005-12-22|2007-07-13|삼성전자주식회사|Method and apparatus for reproducing a virtual sound of two channels based on the position of listener|
SG136836A1|2006-04-28|2007-11-29|St Microelectronics Asia|Adaptive rate control algorithm for low complexity aac encoding|
US8379868B2|2006-05-17|2013-02-19|Creative Technology Ltd|Spatial audio coding based on universal spatial cues|
US7877317B2|2006-11-21|2011-01-25|Yahoo! Inc.|Method and system for finding similar charts for financial analysis|
US8023707B2|2007-03-26|2011-09-20|Siemens Aktiengesellschaft|Evaluation method for mapping the myocardium of a patient|
DE102008009024A1|2008-02-14|2009-08-27|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and method for synchronizing multichannel extension data with an audio signal and for processing the audio signal|
EP2272169B1|2008-03-31|2017-09-06|Creative Technology Ltd.|Adaptive primary-ambient decomposition of audio signals|
US8023660B2|2008-09-11|2011-09-20|Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.|Apparatus, method and computer program for providing a set of spatial cues on the basis of a microphone signal and apparatus for providing a two-channel audio signal and a set of spatial cues|
EP2393463B1|2009-02-09|2016-09-21|Waves Audio Ltd.|Multiple microphone based directional sound filter|
WO2010125228A1|2009-04-30|2010-11-04|Nokia Corporation|Encoding of multiview audio signals|
KR101566967B1|2009-09-10|2015-11-06|삼성전자주식회사|Method and apparatus for decoding packet in digital broadcasting system|
EP2323130A1|2009-11-12|2011-05-18|Koninklijke Philips Electronics N.V.|Parametric encoding and decoding|
EP2578000A1|2010-06-02|2013-04-10|Koninklijke Philips Electronics N.V.|System and method for sound processing|
US9183849B2|2012-12-21|2015-11-10|The Nielsen Company , Llc|Audio matching with semantic audio recognition and report generation|US9600021B2|2011-02-01|2017-03-21|Fu Da Tong Technology Co., Ltd.|Operating clock synchronization adjusting method for induction type power supply system|
US9628147B2|2011-02-01|2017-04-18|Fu Da Tong Technology Co., Ltd.|Method of automatically adjusting determination voltage and voltage adjusting device thereof|
US9048881B2|2011-06-07|2015-06-02|Fu Da Tong Technology Co., Ltd.|Method of time-synchronized data transmission in induction type power supply system|
US10056944B2|2011-02-01|2018-08-21|Fu Da Tong Technology Co., Ltd.|Data determination method for supplying-end module of induction type power supply system and related supplying-end module|
US10038338B2|2011-02-01|2018-07-31|Fu Da Tong Technology Co., Ltd.|Signal modulation method and signal rectification and modulation device|
TWI429165B|2011-02-01|2014-03-01|Fu Da Tong Technology Co Ltd|Method of data transmission in high power|
US9671444B2|2011-02-01|2017-06-06|Fu Da Tong Technology Co., Ltd.|Current signal sensing method for supplying-end module of induction type power supply system|
US9831687B2|2011-02-01|2017-11-28|Fu Da Tong Technology Co., Ltd.|Supplying-end module for induction-type power supply system and signal analysis circuit therein|
US8941267B2|2011-06-07|2015-01-27|Fu Da Tong Technology Co., Ltd.|High-power induction-type power supply system and its bi-phase decoding method|
KR20120132342A|2011-05-25|2012-12-05|삼성전자주식회사|Apparatus and method for removing vocal signal|
US9253574B2|2011-09-13|2016-02-02|Dts, Inc.|Direct-diffuse decomposition|
US9075587B2|2012-07-03|2015-07-07|Fu Da Tong Technology Co., Ltd.|Induction type power supply system with synchronous rectification control for data transmission|
PT2896221T|2012-09-12|2017-01-30|Fraunhofer Ges Zur Förderung Der Angewandten Forschung E V|Apparatus and method for providing enhanced guided downmix capabilities for 3d audio|
WO2014147551A1|2013-03-19|2014-09-25|Koninklijke Philips N.V.|Method and apparatus for determining a position of a microphone|
EP2790419A1|2013-04-12|2014-10-15|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and method for center signal scaling and stereophonic enhancement based on a signal-to-downmix ratio|
TWI472897B|2013-05-03|2015-02-11|Fu Da Tong Technology Co Ltd|Method and Device of Automatically Adjusting Determination Voltage And Induction Type Power Supply System Thereof|
US9319819B2|2013-07-25|2016-04-19|Etri|Binaural rendering method and apparatus for decoding multi channel audio|
US10469969B2|2013-09-17|2019-11-05|Wilus Institute Of Standards And Technology Inc.|Method and apparatus for processing multimedia signals|
EP3062535B1|2013-10-22|2019-07-03|Industry-Academic Cooperation Foundation, Yonsei University|Method and apparatus for processing audio signal|
KR20210094125A|2013-12-23|2021-07-28|주식회사 윌러스표준기술연구소|Method for generating filter for audio signal, and parameterization device for same|
CN107770718B|2014-01-03|2020-01-17|杜比实验室特许公司|Generating binaural audio by using at least one feedback delay network in response to multi-channel audio|
CN104768121A|2014-01-03|2015-07-08|杜比实验室特许公司|Generating binaural audio in response to multi-channel audio using at least one feedback delay network|
EP3122073A4|2014-03-19|2017-10-18|Wilus Institute of Standards and Technology Inc.|Audio signal processing method and apparatus|
KR101856540B1|2014-04-02|2018-05-11|주식회사 윌러스표준기술연구소|Audio signal processing method and device|
EP2942982A1|2014-05-05|2015-11-11|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|System, apparatus and method for consistent acoustic scene reproduction based on informed spatial filtering|
US9883314B2|2014-07-03|2018-01-30|Dolby Laboratories Licensing Corporation|Auxiliary augmentation of soundfields|
CN105336332A|2014-07-17|2016-02-17|杜比实验室特许公司|Decomposed audio signals|
EP3197182B1|2014-08-13|2020-09-30|Samsung Electronics Co., Ltd.|Method and device for generating and playing back audio signal|
US10559303B2|2015-05-26|2020-02-11|Nuance Communications, Inc.|Methods and apparatus for reducing latency in speech recognition applications|
US9666192B2|2015-05-26|2017-05-30|Nuance Communications, Inc.|Methods and apparatus for reducing latency in speech recognition applications|
TWI596953B|2016-02-02|2017-08-21|美律實業股份有限公司|Sound recording module|
EP3335218B1|2016-03-16|2019-06-05|Huawei Technologies Co., Ltd.|An audio signal processing apparatus and method for processing an input audio signal|
EP3232688A1|2016-04-12|2017-10-18|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and method for providing individual sound zones|
US10187740B2|2016-09-23|2019-01-22|Apple Inc.|Producing headphone driver signals in a digital audio signal processing binaural rendering environment|
US10659904B2|2016-09-23|2020-05-19|Gaudio Lab, Inc.|Method and device for processing binaural audio signal|
JP6788272B2|2017-02-21|2020-11-25|オンフューチャー株式会社|Sound source detection method and its detection device|
IT201700040732A1|2017-04-12|2018-10-12|Inst Rundfunktechnik Gmbh|VERFAHREN UND VORRICHTUNG ZUM MISCHEN VON N INFORMATIONSSIGNALEN|
CN111107481B|2018-10-26|2021-06-22|华为技术有限公司|Audio rendering method and device|
Legal status:
2018-12-18| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|
2019-10-29| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2021-01-05| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2021-03-09| B16A| Patent or certificate of addition of invention granted|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 22/11/2011, SUBJECT TO THE LEGAL CONDITIONS. |
Priority:
Application number | Application date | Patent title
US42192710P| true| 2010-12-10|2010-12-10|
US61/421,927|2010-12-10|
EP11165742A|EP2464145A1|2010-12-10|2011-05-11|Apparatus and method for decomposing an input signal using a downmixer|
EP11165742.5|2011-05-11|
PCT/EP2011/070702|WO2012076332A1|2010-12-10|2011-11-22|Apparatus and method for decomposing an input signal using a downmixer|