Patent abstract:
A SPATIAL AUDIO PROCESSOR AND A METHOD FOR PROVIDING SPATIAL PARAMETERS BASED ON AN ACOUSTIC INPUT SIGNAL. A spatial audio processor for providing spatial parameters based on an acoustic input signal comprises a signal characteristic determiner and a controllable parameter estimator. The signal characteristic determiner is configured to determine a signal characteristic of the acoustic input signal. The controllable parameter estimator, for calculating the spatial parameters for the acoustic input signal according to a variable spatial parameter calculation standard, is configured to modify the variable spatial parameter calculation standard according to the determined signal characteristic.
Publication number: BR112012025013B1
Application number: R112012025013-2
Filing date: 2011-03-16
Publication date: 2021-08-31
Inventors: Oliver Thiergart; Richard Schultz-Amling; Markus Kallinger; Giovanni Del Galdo; Achim Kuntz; Dirk Mahne; Ville Pulkki; Mikko-Ville Laitinen; Fabian KÜCH
Applicant: Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V.
IPC main class:
Patent description:

DESCRIPTION
TECHNICAL FIELD
Embodiments of the present invention create a spatial audio processor for providing spatial parameters based on an acoustic input signal. Further embodiments of the present invention create a method for providing spatial parameters based on an acoustic input signal. Embodiments of the present invention may relate to the field of acoustic analysis, parametric description and spatial sound reproduction, for example based on microphone recordings.
BACKGROUND OF THE INVENTION
Spatial sound recording aims to capture a sound field with multiple microphones such that, on the playback side, a listener perceives the sound image as if it were present at the recording location. Standard approaches to spatial sound recording use simple stereo microphones or more sophisticated combinations of directional microphones, for example such as the B-format microphones used in Ambisonics. These methods are often referred to as coincident-microphone techniques.
Alternatively, methods based on a parametric representation of sound fields can be applied, which are referred to as parametric spatial audio processors. Recently, several techniques for the analysis, parametric description and reproduction of spatial audio have been proposed. Each system has unique advantages and disadvantages with regard to the type of parametric description, the type of input signals required, the dependence on or independence from a specific loudspeaker configuration, etc.
An example of an efficient parametric description of spatial sound is given by Directional Audio Coding (DirAC) (V. Pulkki: Spatial Sound Reproduction with Directional Audio Coding, Journal of the AES, Vol. 55, No. 6, 2007). DirAC represents an approach to the acoustic analysis and parametric description of spatial sound (DirAC analysis), as well as to its reproduction (DirAC synthesis). The DirAC analysis takes multiple microphone signals as input. The description of spatial sound is provided for several frequency subbands in terms of one or several downmix audio signals and parallel parametric information containing the sound direction and the diffusion. The latter parameter describes how diffuse the recorded sound field is. Furthermore, the diffusion can be used as a confidence measure for the direction estimation. Another application is direction-dependent processing of the spatial audio signal (M. Kallinger et al.: A Spatial Filtering Approach for Directional Audio Coding, 126th AES Convention, Munich, May 2009). Based on the parametric representation, spatial audio can be reproduced with arbitrary loudspeaker configurations. Furthermore, the DirAC analysis can be considered an acoustic front end for parametric coding systems that are capable of encoding, transmitting and reproducing multichannel spatial audio, e.g. MPEG Surround.
Another approach to spatial sound field analysis is represented by the so-called Spatial Audio Microphone (SAM) (C. Faller: Microphone Front-Ends for Spatial Audio Coders, in Proceedings of the AES 125th International Convention, San Francisco, Oct. 2008). SAM takes signals from coincident directional microphones as input. Similar to DirAC, SAM determines the direction of arrival (DOA) of the sound for a parametric description of the sound field, along with an estimate of the diffuse sound components.
Parametric techniques for recording and analyzing spatial audio, such as DirAC and SAM, rely on estimates of specific sound field parameters. The performance of these approaches is therefore strongly dependent on the estimation performance for the spatial cue parameters, such as the direction of arrival of the sound or the diffusion of the sound field.
Generally speaking, when estimating spatial cue parameters, specific assumptions can be made about the acoustic input signals (e.g., about stationarity or tonality) in order to employ the best (i.e., most efficient or most accurate) algorithm for the audio processing. Traditionally, a single time-invariant signal model is defined for this purpose. However, a problem that commonly arises is that different audio signals can exhibit significant temporal variation, so that a general time-invariant model describing the audio input is generally inadequate. In particular, when considering a single time-invariant signal model for the audio processing, model mismatches may occur that degrade the performance of the applied algorithm.
It is an object of embodiments of the present invention to provide spatial parameters for an acoustic input signal with reduced model mismatches caused by a temporal variation or a temporal non-stationarity of the acoustic input signal.
SUMMARY OF THE INVENTION
This object is achieved by a spatial audio processor according to claim 1, a method for providing spatial parameters based on an acoustic input signal according to claim 14, and a computer program according to claim 15.
Embodiments of the present invention create a spatial audio processor for providing spatial parameters based on an acoustic input signal. The spatial audio processor comprises a signal characteristic determiner and a controllable parameter estimator. The signal characteristic determiner is configured to determine a signal characteristic of the acoustic input signal. The controllable parameter estimator is configured to calculate the spatial parameters for the acoustic input signal according to a variable spatial parameter calculation standard. The parameter estimator is further configured to modify the variable spatial parameter calculation standard according to the determined signal characteristic.
It is an idea of embodiments of the present invention that a spatial audio processor for providing spatial parameters based on an acoustic input signal, which reduces model mismatches caused by a temporal variation of the acoustic input signal, can be created when a calculation standard for calculating the spatial parameters is modified based on a signal characteristic of the acoustic input signal. It has been found that model mismatches can be reduced when a signal characteristic of the acoustic input signal is determined and, based on that determined signal characteristic, the spatial parameters for the acoustic input signal are calculated.
In other words, embodiments of the present invention can address the problem of model mismatches caused by a temporal variation of the acoustic input signal by determining characteristics (signal characteristics) of the acoustic input signals, for example in a pre-processing step (in the signal characteristic determiner), and then identifying the signal model (for example, a spatial parameter calculation standard or parameters of the spatial parameter calculation standard) that best fits the current situation (the current signal characteristics). This information can be fed to the parameter estimator, which can then select the best parameter estimation strategy (with respect to the temporal variation of the acoustic input signal) to calculate the spatial parameters. Therefore, it is an advantage of embodiments of the present invention that a parametric sound field description (the spatial parameters) can be achieved with a significantly reduced model mismatch.
The acoustic input signal can, for example, be a signal measured with one or more microphones, for example with microphone arrays or with a B-format microphone. Different microphones can have different orientations. Acoustic input signals can comprise, for example, a sound pressure "P" or a particle velocity "U", for example in a time domain or a frequency domain (e.g., in an STFT domain, STFT = short-time Fourier transform) or, in other words, in a time representation or in a frequency representation. The acoustic input signal may, for example, comprise components in three different (e.g., orthogonal) directions (e.g., an x component, a y component and a z component) and an omnidirectional component (e.g., a w component). Furthermore, acoustic input signals can contain only the three directional components and no omnidirectional component. Furthermore, the acoustic input signal can comprise only the omnidirectional component. Furthermore, the acoustic input signal may comprise two directional components (for example the x component and the y component, the x component and the z component, or the y component and the z component), with or without the omnidirectional component. Furthermore, the acoustic input signal may comprise only one directional component (for example the x component, the y component or the z component), with or without the omnidirectional component.
The signal characteristic determined by the signal characteristic determiner for the acoustic input signal, for example for microphone signals, can be, for example: stationarity intervals with respect to time, frequency or space; the presence of double talk or multiple sound sources; the presence of tonality or transients; a signal-to-noise ratio of the acoustic input signal; or the presence of signals such as applause.
Signals such as applause are defined here as signals that comprise a fast temporal sequence of transients, for example with different directions. The information obtained by the signal characteristic determiner can be used to control the controllable parameter estimator, for example in directional audio coding (DirAC) or with the spatial audio microphone (SAM), for example to select the estimation strategy or estimator settings (or, in other words, to modify the variable spatial parameter calculation standard) that best fit the current situation (the current signal characteristic of the acoustic input signal).
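To make the role of such a signal characteristic determiner concrete, the sketch below counts how often the short-time frame energy jumps far above its running median, which is one conceivable way of flagging applause-like input (a fast temporal sequence of transients). The function names and thresholds (`transient_rate`, `looks_like_applause`, the factors 4.0 and 0.2) are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def transient_rate(frames, threshold=4.0):
    """Fraction of frames whose energy exceeds `threshold` times the running
    median frame energy -- a crude indicator of applause-like transients.

    frames: 2-D array (n_frames, frame_len) of a time-domain microphone signal.
    Returns a value between 0.0 and 1.0.
    """
    energies = np.sum(frames.astype(float) ** 2, axis=1)
    med = np.median(energies) + 1e-12          # guard against all-silence input
    jumps = energies > threshold * med
    return float(np.count_nonzero(jumps)) / len(energies)

def looks_like_applause(frames, rate_threshold=0.2):
    # Many isolated energy bursts within a short time suggest applause.
    return transient_rate(frames) > rate_threshold
```

A determiner like this would simply hand its boolean (or the raw rate) to the controllable parameter estimator as the determined signal characteristic.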
Embodiments of the present invention can be applied equally to spatial audio microphone (SAM) and directional audio coding (DirAC) systems, or to any other parametric system. In the following, the main focus will be on the directional audio coding analysis.
According to some embodiments of the present invention, the controllable parameter estimator can be configured to calculate the spatial parameters as directional audio coding parameters, comprising a diffusion parameter for a time period and a frequency subband and/or a direction-of-arrival parameter for a time period and a frequency subband, or as spatial audio microphone parameters.
In the following, directional audio coding and the spatial audio microphone are considered acoustic front ends for systems that operate on spatial parameters, such as the direction of arrival and the diffusion of the sound. It should be noted that it is straightforward to apply the concept of the present invention also to other acoustic front ends. Both directional audio coding and the spatial audio microphone provide specific (spatial) parameters, obtained from the acoustic input signals, to describe the spatial sound. Traditionally, when processing spatial audio with acoustic front ends such as directional audio coding and the spatial audio microphone, a single general model for the acoustic input signals is defined so that ideal or near-ideal parameter estimators can be derived. The estimators work as desired as long as the design assumptions taken into account by the model are met. As mentioned earlier, if this is not the case, model mismatches arise, which often lead to serious estimation errors. These model mismatches represent a recurring problem, as acoustic input signals are often highly time-varying.
BRIEF DESCRIPTION OF THE FIGURES
Embodiments in accordance with the present invention will be described with reference to the accompanying figures, in which:
Figure 1 shows a schematic block diagram of a spatial audio processor in accordance with an embodiment of the present invention;
Figure 2 shows a schematic block diagram of a directional audio coder as a reference example;
Figure 3 shows a schematic block diagram of a spatial audio processor in accordance with a further embodiment of the present invention;
Figure 4 shows a schematic block diagram of a spatial audio processor in accordance with a further embodiment of the present invention;
Figure 5 shows a schematic block diagram of a spatial audio processor in accordance with a further embodiment of the present invention;
Figure 6 shows a schematic block diagram of a spatial audio processor in accordance with a further embodiment of the present invention;
Figure 7a shows a schematic block diagram of a parameter estimator that may be used in a spatial audio processor in accordance with an embodiment of the present invention;
Figure 7b shows a schematic block diagram of a parameter estimator that may be used in a spatial audio processor in accordance with an embodiment of the present invention;
Figure 8 shows a schematic block diagram of a spatial audio processor in accordance with a further embodiment of the present invention;
Figure 9 shows a schematic block diagram of a spatial audio processor in accordance with a further embodiment of the present invention; and
Figure 10 shows a flowchart of a method in accordance with a further embodiment of the present invention.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE PRESENT INVENTION
Before embodiments of the present invention are explained in more detail using the attached figures, it should be noted that identical elements, or elements with the same functionality, are provided with the same reference numerals, and that a repeated description of these elements is omitted. The descriptions of elements provided with the same reference numerals are therefore mutually interchangeable.
SPATIAL AUDIO PROCESSOR ACCORDING TO FIGURE 1
In the following, a spatial audio processor 100 will be described with reference to Figure 1, which shows a schematic block diagram of this spatial audio processor. The spatial audio processor 100 for providing spatial parameters 102, or spatial parameter estimates 102, based on an acoustic input signal 104 (or a plurality of acoustic input signals 104) comprises a controllable parameter estimator 106 and a signal characteristic determiner 108. The signal characteristic determiner 108 is configured to determine a signal characteristic 110 of the acoustic input signal 104. The controllable parameter estimator 106 is configured to calculate spatial parameters 102 for the acoustic input signal 104 according to a variable spatial parameter calculation standard. The controllable parameter estimator 106 is further configured to modify the variable spatial parameter calculation standard in accordance with the determined signal characteristic 110. In other words, the controllable parameter estimator 106 is controlled depending on the characteristics of the acoustic input signal 104 or of the acoustic input signals.
The acoustic input signal 104 may, as described above, comprise directional components and/or omnidirectional components. A suitable signal characteristic 110, as already mentioned, may be, for example, stationarity intervals with respect to time, frequency or space of the acoustic input signal 104, the presence of double talk or multiple sound sources in the acoustic input signal 104, the presence of tonality or transients within the acoustic input signal 104, the presence of applause-like signals, or a signal-to-noise ratio of the acoustic input signal 104. This enumeration of suitable signal characteristics is only an example of signal characteristics that the signal characteristic determiner 108 can determine. According to further embodiments of the present invention, the signal characteristic determiner 108 can also determine other (not mentioned) signal characteristics of the acoustic input signal 104, and the controllable parameter estimator 106 can modify the variable spatial parameter calculation standard based on these other signal characteristics of the acoustic input signal 104.
The controllable parameter estimator 106 can be configured to calculate the spatial parameters 102 as directional audio coding parameters, comprising a diffusion parameter Ψ(k,n) for a time period n and a frequency subband k and/or a direction-of-arrival parameter φ(k,n) for a time period n and a frequency subband k, or as spatial audio microphone parameters, e.g., for a time period n and a frequency subband k. The controllable parameter estimator 106 can be further configured to calculate the spatial parameters 102 using a concept other than DirAC or SAM. The calculation of DirAC parameters and SAM parameters should only be understood as an example. The controllable parameter estimator can, for example, be configured to calculate the spatial parameters 102 such that the spatial parameters comprise a sound direction, a sound diffusion or a statistical measure of the sound direction.
The acoustic input signal 104 may, for example, be provided in a time domain or in a (short-time) frequency domain, for example in the STFT domain.
For example, the acoustic input signal 104, when provided in the time domain, may comprise a plurality of acoustic input streams x1(t) to xN(t), each comprising a plurality of acoustic input samples over time. Each of the acoustic input streams can, for example, be provided by a different microphone and can correspond to a different look direction. For example, a first acoustic input stream x1(t) may correspond to a first direction (e.g., an x direction), a second acoustic input stream x2(t) may correspond to a second direction, which may be orthogonal to the first direction (e.g., a y direction), a third acoustic input stream x3(t) may correspond to a third direction, which may be orthogonal to the first direction and the second direction (e.g., a z direction), and a fourth acoustic input stream x4(t) may be an omnidirectional component. These different acoustic input streams can be recorded with different microphones, for example in an orthogonal arrangement, and can be digitized using an analog-to-digital converter.
According to further embodiments of the present invention, the acoustic input signal 104 may comprise acoustic input streams in a frequency representation, for example in a time-frequency domain such as the STFT domain. For example, the acoustic input signal 104 may be provided in the B-format, comprising a particle velocity vector U(k,n) and a sound pressure P(k,n), where k denotes a frequency subband and n denotes a time period. The particle velocity vector U(k,n) is a directional component of the acoustic input signal 104, while the sound pressure P(k,n) represents an omnidirectional component of the acoustic input signal 104.
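As a minimal sketch, such a short-time frequency representation can be obtained by applying a windowed FFT per time block. The function name `stft`, the frame parameters, and the toy B-format construction below are illustrative assumptions, not part of the patent.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Minimal short-time Fourier transform. Returns a 2-D complex array
    indexed as [n, k]: time block n, frequency subband k."""
    window = np.hanning(frame_len)
    n_blocks = 1 + (len(x) - frame_len) // hop
    blocks = [np.fft.rfft(window * x[n * hop : n * hop + frame_len])
              for n in range(n_blocks)]
    return np.array(blocks)

# Toy B-format input: an omnidirectional channel w(t) yields P(k, n); three
# dipole channels would yield the components of U(k, n).
fs = 16000
t = np.arange(fs) / fs
w = np.sin(2 * np.pi * 440 * t)              # pressure (omnidirectional) channel
P = stft(w)                                  # P(k, n), stored here as [n, k]
U = np.stack([stft(0.5 * w)] * 3, axis=-1)   # toy particle-velocity vector [n, k, 3]
```

In a real front end the dipole channels would of course carry different signals per direction; here they are duplicated only to fix the array shapes.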
As mentioned above, the controllable parameter estimator 106 can be configured to provide the spatial parameters 102 as directional audio coding parameters or as spatial audio microphone parameters. In the following, a conventional directional audio coder will be presented as a reference example. A schematic block diagram of this conventional directional audio coder is shown in Figure 2.
CONVENTIONAL DIRECTIONAL AUDIO CODER ACCORDING TO FIGURE 2
Figure 2 shows a schematic block diagram of a directional audio coder 200. The directional audio coder 200 comprises a B-format estimator 202. The B-format estimator 202 comprises a filter bank. The directional audio coder 200 further comprises a directional audio coding parameter estimator 204. The directional audio coding parameter estimator 204 comprises an energy analyzer 206 for performing an energy analysis.
Furthermore, the directional audio coding parameter estimator 204 comprises a direction estimator 208 and a diffusion estimator 210.
Directional Audio Coding (DirAC) (V. Pulkki: Spatial Sound Reproduction with Directional Audio Coding, Journal of the AES, Vol. 55, No. 6, 2007) represents an efficient, perceptually motivated approach to the analysis and reproduction of spatial sound. The DirAC analysis provides a parametric description of the sound field in terms of a downmix audio signal and additional parallel information, e.g., the direction of arrival (DOA) of the sound and the diffusion of the sound field. DirAC takes into account aspects that are relevant to human hearing. For example, it is assumed that inter-aural time differences (ITD) and inter-aural level differences (ILD) can be described by the DOA of the sound. Correspondingly, it is assumed that the inter-aural coherence (IC) can be represented by the diffusion of the sound field. From the output of the DirAC analysis, a sound reproduction system can generate signals to reproduce the sound with the original spatial impression on an arbitrary set of loudspeakers. It should be noted that the diffusion can also be considered a confidence measure for the estimated DOAs: the greater the diffusion, the lower the reliability of the DOA, and vice versa. This information can be used by many DirAC-based tools, such as source localization (O. Thiergart et al.: Localization of Sound Sources in Reverberant Environments Based on Directional Audio Coding Parameters, 127th AES Convention, NY, October 2009). Embodiments of the present invention focus on the DirAC analysis part rather than on the sound reproduction.
In the DirAC analysis, the parameters are estimated through an energy analysis of the sound field, performed by the energy analyzer 206, based on B-format signals provided by the B-format estimator 202. The B-format signals consist of one omnidirectional signal, corresponding to the sound pressure P(k,n), and one, two or three dipole signals aligned with the x, y and z directions of a Cartesian coordinate system. The dipole signals correspond to the elements of the particle velocity vector U(k,n). The DirAC analysis is depicted in Figure 2. The time-domain microphone signals, namely x1(t), x2(t), ..., xN(t), are provided to the B-format estimator 202. These time-domain microphone signals may be referred to as "time-domain acoustic input signals" below. The B-format estimator 202, which contains a short-time Fourier transform (STFT) or another filter bank (FB), computes the B-format signals in the short-time frequency domain, that is, the sound pressure P(k,n) and the particle velocity vector U(k,n), where k and n denote the frequency index (a frequency subband) and the time block index (a time period), respectively. The signals P(k,n) and U(k,n) can be referred to as "acoustic input signals in the short-time frequency domain" below. B-format signals can be obtained from measurements with microphone arrays, as explained in R. Schultz-Amling et al.: Planar Microphone Array Processing for the Analysis and Reproduction of Spatial Audio using Directional Audio Coding, 124th AES Convention, Amsterdam, The Netherlands, May 2008, or directly using, for example, a B-format microphone. In the energy analysis, the active sound intensity vector Ia(k,n) can be estimated separately for the different frequency bands, using

Ia(k,n) = (1/2) Re{P(k,n) U*(k,n)},     (1)
where Re{·} yields the real part and U*(k,n) denotes the complex conjugate of the particle velocity vector U(k,n).
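A direct translation of equation 1 into code might look as follows; the array layout (time blocks × frequency subbands × Cartesian components) and the helper names are assumptions for illustration only.

```python
import numpy as np

def active_intensity(P, U):
    """Active sound intensity vector per equation 1:
    Ia(k, n) = 0.5 * Re{ P(k, n) * conj(U(k, n)) }.

    P: complex array (n_blocks, n_bins); U: complex array (n_blocks, n_bins, 3).
    Returns a real array (n_blocks, n_bins, 3).
    """
    return 0.5 * np.real(P[..., np.newaxis] * np.conj(U))

def doa_azimuth(Ia):
    """DOA as the direction opposite to the intensity vector (x/y plane only)."""
    return np.arctan2(-Ia[..., 1], -Ia[..., 0])
```

This is what the direction estimator 208 conceptually does: negate the intensity vector and read off its direction.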
In the following, the active sound intensity vector will also be called the intensity parameter.
Using the STFT-domain representation in equation 1, the DOA of the sound φ(k,n) can be determined in the direction estimator 208, for each k and n, as the direction opposite to the active sound intensity vector Ia(k,n). In the diffusion estimator 210, the diffusion of the sound field Ψ(k,n) can be computed based on fluctuations of the active intensity, according to

Ψ(k,n) = 1 − ||E{Ia(k,n)}|| / E{||Ia(k,n)||},     (2)
where ||·|| denotes the vector norm and E{·} denotes the expectation. In a practical application, the expectation E{·} can be approximated by a finite average along one or more specific dimensions, for example over time, frequency or space.
It was found that the expectation E{·} in equation 2 can be approximated by an average along a specific dimension. To this end, the average can be performed over time (temporal average), frequency (spectral average) or space (spatial average). Spatial averaging means, for example, that the active sound intensity vector Ia(k,n) in equation 2 is estimated with multiple microphone arrays placed at different points. For example, one can place four different microphone arrays at four different points within the environment. As a result, for each time-frequency point (k,n), there are four intensity vectors Ia(k,n) that can be averaged (in the same way as, for example, in the spectral average) to obtain an approximation of the expectation operator E{·}.
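The approximation of the expectation by a finite average can be sketched as follows. The samples passed in could come from a temporal, spectral or spatial average (e.g., the four microphone arrays mentioned above); `diffuseness` is a hypothetical helper name.

```python
import numpy as np

def diffuseness(Ia_samples):
    """Diffusion per equation 2, Psi = 1 - ||E{Ia}|| / E{||Ia||}, with the
    expectation approximated by the mean over the given samples.

    Ia_samples: array (n_samples, 3) of intensity vectors for one (k, n) point.
    """
    mean_vec = np.mean(Ia_samples, axis=0)
    mean_norm = np.mean(np.linalg.norm(Ia_samples, axis=1))
    if mean_norm == 0.0:
        return 1.0   # no net energy flow: treat as fully diffuse
    return 1.0 - np.linalg.norm(mean_vec) / mean_norm
```

Intensity vectors that all point the same way give a diffusion near 0 (a single plane wave); vectors with random directions cancel in the numerator and drive the result toward 1.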
For example, by using a temporal average over several n, we obtain an estimate Ψ̂(k,n) for the diffusion parameter, given by

Ψ̂(k,n) = 1 − ||⟨Ia(k,n)⟩n|| / ⟨||Ia(k,n)||⟩n,     (3)

where ⟨·⟩n denotes a temporal average over the time indices n.
There are common methods for performing the temporal averaging needed in (3). One method is the block average (interval average) over a specific number N of time instances n, given by

ȳ(k,n) = (1/N) Σ_{i=0}^{N−1} y(k, n − i),     (4)
where y(k,n) is the quantity to be averaged, for example Ia(k,n) or ||Ia(k,n)||. A second method for computing temporal averages, which is generally used in DirAC due to its efficiency, is to apply infinite impulse response (IIR) filters. For example, when using a first-order low-pass filter with filter coefficient α ∈ [0,1], a temporal average of a given signal y(k,n) over n can be obtained with

ȳ(k,n) = α y(k,n) + (1 − α) ȳ(k, n − 1),     (5)
where ȳ(k,n) denotes the current averaging result and ȳ(k,n−1) is the previous averaging result, that is, the result of the average for the time instance (n−1). A longer temporal average is achieved for a smaller α, while a larger α produces more instantaneous results, in which the previous result ȳ(k,n−1) counts less. A typical value for α used in DirAC is α = 0.1.
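The two averaging methods of equations 4 and 5 can be sketched as follows, with hypothetical helper names and the typical DirAC value α = 0.1 as the default.

```python
import numpy as np

def block_average(y, N):
    """Equation 4: mean over the last N time instances of y (1-D array over n)."""
    return float(np.mean(y[-N:]))

def iir_average(y, alpha=0.1):
    """Equation 5: y_bar(n) = alpha * y(n) + (1 - alpha) * y_bar(n - 1).
    A smaller alpha gives a longer effective averaging time."""
    y_bar = y[0]
    for sample in y[1:]:
        y_bar = alpha * sample + (1.0 - alpha) * y_bar
    return y_bar
```

The IIR form needs only one stored value per (k,n) bin instead of a buffer of N past samples, which is why it is the more efficient choice in practice.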
It was found that, in addition to using a temporal average, the expectation operator in equation 2 can also be approximated by a spectral average over several or all frequency subbands k. This method is only applicable if independent diffusion estimates are not required for the different frequency subbands in the subsequent processing, e.g., when only a single sound source is present. Therefore, in general, the most adequate way to compute the diffusion in practice may be to employ the temporal average.
In general, when approximating an expectation operator, as in equation 2, by an averaging process, we assume stationarity of the considered signal with respect to the quantity to be averaged. The longer the average, that is, the more samples taken into account, the more accurate the results generally are.
In the following, the spatial audio microphone (SAM) analysis will also be explained briefly.
SPATIAL AUDIO MICROPHONE (SAM) ANALYSIS
Similar to DirAC, the SAM analysis (C. Faller: Microphone Front-Ends for Spatial Audio Coders, in Proceedings of the AES 125th International Convention, San Francisco, Oct. 2008) provides a parametric description of spatial sound. The sound field representation is based on a downmix audio signal and parallel parametric information, namely the DOA of the sound and estimates of the levels of the direct and diffuse sound components. The input to the SAM analysis are the signals measured with multiple coincident directional microphones, for example two cardioid sensors placed at the same point. The basis for the SAM analysis are the power spectral densities (PSDs) and the cross spectral densities (CSDs) of the input signals.
For example, consider X1(k,n) and X2(k,n) as the signals in the time-frequency domain measured by two coincident directional microphones. The PSDs of both input signals can be determined with

Φ11(k,n) = E{|X1(k,n)|²},     (6a)
Φ22(k,n) = E{|X2(k,n)|²}.     (6b)
The CSD between both inputs is given by the correlation

Φ12(k,n) = E{X1(k,n) X2*(k,n)}.     (7)
SAM assumes that the measured input signals X1(k,n) and X2(k,n) represent an overlap of direct sound and diffuse sound, where the direct sound and the diffuse sound are mutually uncorrelated. Based on this assumption, it is shown in C. Faller: Microphone Front-Ends for Spatial Audio Coders, in Proceedings of the AES 125th International Convention, San Francisco, Oct. 2008, that it is possible to derive, from equations 6a, 6b and 7, for each sensor, the PSD of the measured direct sound and of the measured diffuse sound. From the ratio between the direct sound PSDs, it is then possible to determine the DOA φ(k,n) of the sound with prior knowledge of the directional responses of the microphones.
It was found that, in a practical application, the expectations E{·} in equations 6a, 6b and 7 can be approximated by temporal and/or spectral averaging operations. This is similar to the DirAC diffusion computation described in the previous section. Likewise, the average can be performed using, for example, equation 4 or 5. To give an example, the estimation of the CSD can be performed based on a recursive temporal average, according to

Φ̂12(k,n) = α X1(k,n) X2*(k,n) + (1 − α) Φ̂12(k, n − 1).     (8)
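These recursive SAM estimates at a single time-frequency point can be sketched as follows, mirroring the IIR average of equation 5; the function names are illustrative assumptions.

```python
import numpy as np

def update_csd(X1, X2, csd_prev, alpha=0.1):
    """One recursive CSD update between two coincident directional microphone
    signals at one (k, n) point, per equation 8:
    Phi12(k, n) = alpha * X1 * conj(X2) + (1 - alpha) * Phi12(k, n - 1)."""
    return alpha * X1 * np.conj(X2) + (1.0 - alpha) * csd_prev

def update_psd(X, psd_prev, alpha=0.1):
    """Corresponding recursive PSD update approximating E{|X|^2}."""
    return alpha * (np.abs(X) ** 2) + (1.0 - alpha) * psd_prev
```

Running these updates once per time block keeps, as with the DirAC IIR average, only one stored value per frequency subband.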
As discussed in the previous section, when approximating an expectation operator, as in equations 6a, 6b and 7, by an averaging process, stationarity of the considered signal with respect to the quantity to be averaged is assumed.
In the following, an embodiment of the present invention that performs a time-variant parameter estimation depending on a stationarity interval will be explained.
SPATIAL AUDIO PROCESSOR ACCORDING TO FIGURE 3
Figure 3 shows a spatial audio processor 300 in accordance with an embodiment of the present invention. The functionality of the spatial audio processor 300 may be similar to the functionality of the spatial audio processor 100 according to Figure 1. The spatial audio processor 300 may comprise the additional aspects shown in Figure 3. The spatial audio processor 300 comprises a controllable parameter estimator 306, the functionality of which may be similar to the functionality of the controllable parameter estimator 106 according to Figure 1, and which may comprise the additional aspects described below. The spatial audio processor 300 further comprises a signal characteristic determiner 308, the functionality of which may be similar to the functionality of the signal characteristic determiner 108 according to Figure 1, and which may comprise the additional aspects described below.
The signal characteristic determiner 308 can be configured to determine a stationarity interval of the acoustic input signal 104, which constitutes the determined signal characteristic 110, for example using a stationarity interval determiner 310. The parameter estimator 306 can be configured to modify the variable spatial parameter calculation standard in accordance with the determined signal characteristic 110, i.e., the determined stationarity interval. The parameter estimator 306 can be configured to modify the variable spatial parameter calculation standard so that an averaging period or averaging length for calculating the spatial parameters 102 is comparatively longer for a comparatively longer stationarity interval and comparatively shorter for a comparatively shorter stationarity interval. The averaging length can, for example, be equal to the stationarity interval.
In other words, the spatial audio processor 300 creates a concept for improving the diffusion estimation in directional audio coding by considering the varying stationarity interval of the acoustic input signal 104 or of the acoustic input signals. The stationarity interval of the acoustic input signal 104 may, for example, define a period of time in which there has been no movement (or only an insignificantly small one) of a sound source of the acoustic input signal 104. The stationarity interval of the acoustic input signal 104 can also define a time period in which a certain signal characteristic of the acoustic input signal 104 remains constant over time. The signal characteristic can, for example, be a signal energy, a spatial spread, a tonality, a signal-to-noise ratio and/or others. By taking the stationarity interval of the acoustic input signal 104 into account for calculating the spatial parameters 102, an averaging length for calculating the spatial parameters 102 can be modified, so that the precision with which the spatial parameters 102 represent the acoustic input signal 104 can be improved. For example, for a longer stationarity interval, which means that the sound source of the acoustic input signal 104 has not moved for a longer interval, a longer temporal average may be applied than for a shorter stationarity interval. Therefore, an at least nearly ideal (or, in some cases, even ideal) spatial parameter estimation can (always) be performed by the controllable parameter estimator 306, depending on the stationarity interval of the acoustic input signal 104. The controllable parameter estimator 306 can, for example, be configured to provide a diffusion parameter Ψ(k,n), for example in an STFT domain, for a frequency subband k and a time period or time block n. The controllable parameter estimator 306 may comprise a diffusion estimator 312 for calculating the diffusion parameter Ψ(k,n), e.g., based on a temporal average of an intensity parameter Ia(k,n) of the acoustic input signal 104 in an STFT domain.
In addition, the controllable parameter estimator 306 may comprise an energy analyzer 314 for performing an energy analysis of the acoustic input signal 104 to determine the intensity parameter Ia(k,n). The intensity parameter Ia(k, n) can also be called the active sound intensity vector and can be calculated by the energy analyzer 314, according to equation 1.
Therefore, the acoustic input signal 104 can also be provided in the STFT domain, e.g., in the B-format, comprising a sound pressure P(k, n) and a particle velocity vector U(k, n) for a frequency sub-band k and a time period n. The diffuseness estimator 312 can calculate the diffuseness parameter Ψ(k, n) based on a temporal average of the intensity parameters Ia(k, n) of the acoustic input signal 104, for example, of the same frequency sub-band k. The diffuseness estimator 312 can calculate the diffuseness parameter Ψ(k, n) according to equation 3, in which the number of intensity parameters, and therefore the averaging length, can be varied by the diffuseness estimator 312 in dependence on the determined stationarity interval.
As a numerical example, if a comparatively long stationarity interval is determined by the stationarity interval determiner 310, the diffuseness estimator 312 can perform the temporal average of the intensity parameters Ia(k, n) over the intensity parameters Ia(k, n-10) to Ia(k, n-1). For a comparatively short stationarity interval determined by the stationarity interval determiner 310, the diffuseness estimator 312 can perform the temporal average of the intensity parameters Ia(k, n) over the intensity parameters Ia(k, n-4) to Ia(k, n-1).
As can be seen, the averaging length of the temporal average applied by the diffuseness estimator 312 corresponds to the number of intensity parameters Ia(k, n) used for the temporal average.
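The adjustable block average can be sketched as follows. This is an illustrative sketch, not the patented implementation: it assumes that equation 3 has the usual DirAC form Ψ = 1 − ‖⟨Ia⟩‖ / ⟨‖Ia‖⟩, and the function name and frame counts (10 and 4, taken from the numerical example above) are hypothetical choices.

```python
import numpy as np

def diffuseness_block_average(intensity_history, averaging_length):
    """Diffuseness estimate for one frequency sub-band k from the last
    N = averaging_length active intensity vectors Ia(k, n-N) .. Ia(k, n-1),
    assuming the usual form Psi = 1 - ||<Ia>|| / <||Ia||> for equation 3."""
    block = np.asarray(intensity_history[-averaging_length:])
    mean_vector = block.mean(axis=0)                   # <Ia(k, n)> over time
    mean_norm = np.linalg.norm(block, axis=1).mean()   # <||Ia(k, n)||>
    if mean_norm == 0.0:
        return 0.0                                     # silent sub-band
    return 1.0 - np.linalg.norm(mean_vector) / mean_norm

# Averaging length chosen in dependence on the determined stationarity
# interval: e.g., 10 frames for a long interval, 4 for a short one.
rng = np.random.default_rng(0)
history = rng.normal(size=(20, 3))        # dummy 3-D intensity vectors
psi_long = diffuseness_block_average(history, averaging_length=10)
psi_short = diffuseness_block_average(history, averaging_length=4)
```

By the triangle inequality the result always lies in [0, 1]: identical vectors over the whole block (a single plane wave) give Ψ = 0, while vectors that cancel on average (a diffuse field) drive Ψ towards 1.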
In other words, the diffuseness estimation in directional audio coding is improved by considering the time-variant stationarity interval (also called the coherence time) of the acoustic input signals or of the acoustic input signal 104. As explained earlier, a common way, in practice, to estimate the diffuseness parameter Ψ(k, n) is to use equation 3, which comprises a temporal average of the active intensity vector Ia(k, n). It has been found that the ideal averaging length depends on the temporal stationarity of the acoustic input signals or of the acoustic input signal 104, and that the most accurate results are obtained when the averaging length is chosen equal to the stationarity interval.
Traditionally, as presented with the conventional directional audio encoder 200, a general time-invariant model for the acoustic input signal is defined, from which the optimal parameter estimation strategy is then derived, which in this case means the ideal temporal averaging length. For the diffuseness estimation, it is typically assumed that the acoustic input signal is stationary within a certain time interval, e.g., 20 ms. In other words, the considered stationarity interval is set to a constant value which is typical for many input signals. From the assumed stationarity interval, the ideal temporal averaging strategy is then derived, for example, the best value for α when using an IIR average, as shown in equation 5, or the best N when using a block average, as shown in equation 4.
However, it has been found that different acoustic input signals are generally characterized by different stationarity intervals. Thus, the traditional approach of assuming a time-invariant model for the acoustic input signal does not hold. In other words, when the input signal has stationarity intervals that are different from the one assumed by the estimator, a model mismatch can occur, which can result in poor parameter estimates.
Therefore, the proposed innovative approach (for example, performed in the spatial audio processor 300) adapts the parameter estimation strategy (the variable spatial parameter calculation standard) depending on the actual signal characteristic, as visualized in Figure 3 for the diffuseness estimation: the stationarity interval of the acoustic input signal 104, i.e., of the B-format signal, is determined in a pre-processing step (by the signal characteristic determiner 308). From this information (from the determined stationarity interval), the best (or, in some cases, approximately the best) length of the temporal average, i.e., the best value for α or for N, is chosen, and the (spatial) parameter calculation is then performed by the diffuseness estimator 312.
It should be mentioned that, in addition to a signal-adaptive diffuseness estimation in DirAC, it is possible to improve the direction estimation in SAM in a very similar way. In fact, computing the PSDs and CSDs of the acoustic input signals in equations 5a and 5b also requires approximating the expectation operators by a time averaging process (e.g., using equations 4 or 5). As explained above, the most accurate results can be obtained when the averaging length matches the stationarity interval of the acoustic input signals. This means that the SAM analysis can be improved by first determining the stationarity interval of the acoustic input signals and then choosing from this information the best averaging length. The stationarity interval of the acoustic input signals and the corresponding ideal averaging filter can be determined as explained below.
Next, an exemplary approach for determining the stationarity interval of the acoustic input signal 104 will be presented. From this information, the ideal temporal averaging length for the diffuseness computation, presented in equation 3, is then chosen.
DETERMINATION OF THE STATIONARITY INTERVAL
A possible way of determining the stationarity interval of an acoustic input signal (e.g., the acoustic input signal 104), as well as the ideal IIR filter coefficient α (e.g., used in equation 5) which produces a corresponding temporal average, is described below. The stationarity interval determination described below can be performed by the stationarity interval determiner 310 of the signal characteristic determiner 308. The method shown allows using equation 3 to precisely estimate the diffuseness (parameter) Ψ(k, n) depending on the stationarity interval of the acoustic input signal 104. The frequency domain sound pressure P(k, n), which is part of the B-format signal, can be considered the acoustic input signal 104. In other words, the acoustic input signal 104 can comprise at least one component corresponding to the sound pressure P(k, n).
Acoustic input signals generally have a short stationarity interval if the signal energy varies strongly within a short time interval, and vice versa. Typical examples for which the stationarity interval is short are transients, speech onsets, and speech offsets, namely when a speaker stops talking. The last case is characterized by a strongly decreasing signal energy (negative gain) within a short time, while in the first two cases the energy strongly increases (positive gain). The desired algorithm, which aims to find the ideal filter coefficient α, has to provide values close to α = 1 (corresponding to a short temporal average) for highly non-stationary signals and values close to α = α' in the stationary case. The symbol α' denotes a signal-independent filter coefficient suitable for averaging stationary signals. Expressed in mathematical terms, a suitable algorithm is given by

α+(k, n) = α' · W(k, n) / [(1 − α') · W̄(k, n) + α' · W(k, n)],     (equation 7)
where α+(k, n) is the ideal filter coefficient for each time-frequency bin, W(k, n) = |P(k, n)|² is the absolute value of the instantaneous signal energy of P(k, n), and W̄(k, n) is a temporal average of W(k, n). For stationary signals, the instantaneous energy W(k, n) equals the temporal average W̄(k, n), which produces α+ = α', as desired. In the case of highly non-stationary signals due to positive energy gains, the denominator of equation 7 becomes close to α' · W(k, n), since W(k, n) is large compared to W̄(k, n). Thus, α+ ≈ 1 is obtained, as desired. In the case of non-stationarity due to negative energy gains, the unwanted result α+ ≈ 0 is obtained, since W̄(k, n) becomes large compared to W(k, n). Therefore, an alternative candidate for the ideal filter coefficient α, namely

α−(k, n) = α' · W̄(k, n) / [(1 − α') · W(k, n) + α' · W̄(k, n)],     (equation 8)
is introduced, which is similar to equation 7 but has the inverse behavior in the case of non-stationarity. This means that, in the case of non-stationarity due to positive energy gains, α− ≈ 0 is obtained, while for negative energy gains, α− ≈ 1 is obtained. Thus, considering the maximum of equation 7 and equation 8, that is,

α(k, n) = max{α+(k, n), α−(k, n)},     (equation 9)
the desired ideal value for the recursive averaging coefficient α is produced, leading to a temporal average that corresponds to the stationarity interval of the acoustic input signals.
In other words, the signal characteristic determiner 308 is configured to determine the weighting parameter α based on a ratio of a current (instantaneous) signal energy of at least one (omnidirectional) component (e.g., the sound pressure P(k, n)) of the acoustic input signal 104 and a temporal average, over a given (earlier) time segment, of the signal energy of the at least one (omnidirectional) component of the acoustic input signal 104. The given time segment can, for example, correspond to a certain number of signal energy coefficients for different (earlier) time periods.
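A minimal sketch of this adaptive coefficient selection is given below. It assumes that equations 7–9 take the form implied by the limiting behavior described in the text (α+ = α− = α' for stationary signals, a coefficient near 1 for strong positive or negative energy gains); the names `adaptive_alpha`, `alpha_stationary` and `recursive_average` are hypothetical.

```python
def adaptive_alpha(w_inst, w_avg, alpha_stationary=0.1):
    """Signal-adaptive IIR coefficient per time-frequency bin.

    w_inst: instantaneous energy W(k, n) = |P(k, n)|^2
    w_avg:  its temporal average
    alpha_stationary: the coefficient alpha' suited for stationary input

    Assumed forms, consistent with the limits described in the text:
      alpha_plus  = a'*W    / ((1-a')*Wavg + a'*W)    -> 1 for positive gains
      alpha_minus = a'*Wavg / ((1-a')*W    + a'*Wavg) -> 1 for negative gains
    The final coefficient is their maximum (equation 9)."""
    a = alpha_stationary
    alpha_plus = a * w_inst / ((1.0 - a) * w_avg + a * w_inst)
    alpha_minus = a * w_avg / ((1.0 - a) * w_inst + a * w_avg)
    return max(alpha_plus, alpha_minus)

def recursive_average(previous_average, current_value, alpha):
    """First-order IIR (low-pass) temporal average steered by alpha."""
    return alpha * current_value + (1.0 - alpha) * previous_average
```

For a stationary signal (W = W̄) both candidates reduce to α', reproducing the long average; for a transient or a speech offset the coefficient jumps towards 1, shortening the effective averaging length exactly when the stationarity interval is short.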
In the case of a SAM analysis, the signal energy W(k, n) can be composed of the energies of the two microphone signals X1(k, n) and X2(k, n), for example, W(k, n) = |X1(k, n)|² + |X2(k, n)|². The coefficient α for the recursive estimation of the correlations in equation 5a or equation 5b, according to equation 5c, can be chosen properly using the criterion of equation 9, described above.
As can be seen from the above, the controllable parameter estimator 306 can be configured to apply the temporal average of the intensity parameters Ia(k, n) of the acoustic input signal 104 using a low-pass filter (e.g., the mentioned infinite impulse response (IIR) filter or a finite impulse response (FIR) filter). In addition, the controllable parameter estimator 306 can be configured to adjust a weighting between a current intensity parameter of the acoustic input signal 104 and previous intensity parameters of the acoustic input signal 104 based on the weighting parameter α. In the special case of the first-order IIR filter, as presented with equation 5, a weighting between the current intensity parameter and one previous intensity parameter can be adjusted. The greater the weighting parameter, the shorter the length of the temporal average and, therefore, the greater the weight of the current intensity parameter compared to the weight of the previous intensity parameters. In other words, the length of the temporal average is based on the weighting parameter α. The controllable parameter estimator 306 can, for example, be configured so that the weight of the current intensity parameter compared to the weight of the previous intensity parameters is comparatively larger for a comparatively smaller stationarity interval, and so that the weight of the current intensity parameter compared to the weight of the previous intensity parameters is comparatively smaller for a comparatively larger stationarity interval. Therefore, the temporal averaging length is comparatively shorter for a comparatively smaller stationarity interval and comparatively longer for a comparatively larger stationarity interval.
In accordance with further embodiments of the present invention, a controllable parameter estimator of a spatial audio processor, according to an embodiment of the present invention, may be configured to select a spatial parameter calculation standard from a plurality of spatial parameter calculation standards to calculate the spatial parameters in dependence on the determined signal characteristic.
The spatial parameter calculation standards of such a plurality may, for example, differ in calculation parameters or may even be completely different from each other. As shown with equations 4 and 5, a temporal average can be calculated using a block average, as shown in equation 4, or a low-pass filter, as shown in equation 5. A first spatial parameter calculation standard can, for example, correspond to the block average, according to equation 4, and a second spatial parameter calculation standard can, for example, correspond to the average using the low-pass filter, according to equation 5. The controllable parameter estimator can choose, from the plurality of calculation standards, the calculation standard which provides the most accurate estimate of the spatial parameters, based on the determined signal characteristic.
According to further embodiments of the present invention, the controllable parameter estimator may be configured so that a first spatial parameter calculation standard of the plurality of spatial parameter calculation standards is different from a second spatial parameter calculation standard of the plurality of spatial parameter calculation standards. The first spatial parameter calculation standard and the second spatial parameter calculation standard can be selected from a group consisting of: time averaging over a plurality of time periods in a frequency sub-band (for example, as presented in equation 3), frequency averaging over a plurality of frequency sub-bands within a period of time, combined frequency and time averaging, spatial averaging, and no averaging.
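As an illustration, the two temporal-averaging calculation standards mentioned above — the block average of equation 4 and the first-order low-pass (IIR) average of equation 5 — can be sketched as follows. The class names are hypothetical and the exact forms of equations 4 and 5 are assumed to be the usual ones.

```python
from collections import deque

class BlockAverage:
    """Temporal block average over the last N values (equation 4 style)."""
    def __init__(self, n):
        self.buffer = deque(maxlen=n)   # holds at most the last N values
    def update(self, value):
        self.buffer.append(value)
        return sum(self.buffer) / len(self.buffer)

class IIRAverage:
    """First-order low-pass average (equation 5 style):
    avg(n) = alpha * x(n) + (1 - alpha) * avg(n - 1)."""
    def __init__(self, alpha, initial=0.0):
        self.alpha, self.state = alpha, initial
    def update(self, value):
        self.state = self.alpha * value + (1.0 - self.alpha) * self.state
        return self.state

# The controllable parameter estimator could select one of the two
# standards in dependence on the determined signal characteristic:
estimator = BlockAverage(4)            # or IIRAverage(0.25)
smoothed = [estimator.update(x) for x in (1.0, 1.0, 3.0, 3.0)]
```

The block average forgets a sample abruptly after N steps, while the IIR average forgets exponentially; both expose a single knob (N or α) that the estimator can tune to the determined signal characteristic.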
Next, this concept of choosing a spatial parameter calculation standard from a plurality of spatial parameter calculation standards by a controllable parameter estimator will be described using two exemplary embodiments of the present invention, shown in Figures 4 and 5.
TIME-VARIANT DIRECTION-OF-ARRIVAL AND DIFFUSENESS ESTIMATION DEPENDING ON DOUBLE TALK, USING A SPATIAL AUDIO PROCESSOR ACCORDING TO FIGURE 4
Figure 4 presents a schematic block diagram of a spatial audio processor 400, in accordance with one embodiment of the present invention. A functionality of the spatial audio processor 400 may be similar to the functionality of the spatial audio processor 100, in accordance with Figure 1. The spatial audio processor 400 may comprise the additional aspects described below. The spatial audio processor 400 comprises a controllable parameter estimator 406, the functionality of which may be similar to the functionality of the controllable parameter estimator 106, in accordance with Figure 1, and which may comprise the additional aspects described below. The spatial audio processor 400 further comprises a signal characteristic determiner 408, the functionality of which may be similar to the functionality of the signal characteristic determiner 108 of Figure 1, and which may comprise the additional aspects described below.
The controllable parameter estimator 406 is configured to select a spatial parameter calculation standard from a plurality of spatial parameter calculation standards for calculating the spatial parameters 102, in dependence on a determined signal characteristic 110, which is determined by the signal characteristic determiner 408. In the exemplary embodiment shown in Figure 4, the signal characteristic determiner is configured to determine whether an acoustic input signal 104 comprises components from different sound sources or only comprises components from one sound source. Based on that determination, the controllable parameter estimator 406 can choose a first spatial parameter calculation standard 410 to calculate the spatial parameters 102 if the acoustic input signal 104 only comprises components of one sound source, and can choose a second spatial parameter calculation standard 412 to calculate the spatial parameters 102 if the acoustic input signal 104 comprises components from more than one sound source. The first spatial parameter calculation standard 410 may, for example, comprise a spectral average or frequency average over a plurality of frequency sub-bands, and the second spatial parameter calculation standard 412 may comprise no spectral average or frequency average.
Determining whether the acoustic input signal 104 comprises components from more than one sound source can, for example, be performed by a double-talk detector 414 of the signal characteristic determiner 408. The parameter estimator 406 may, for example, be configured to provide a diffuseness parameter Ψ(k, n) of the acoustic input signal 104 in the STFT domain for a frequency sub-band k and a time block n.
In other words, the spatial audio processor 400 presents a concept for improving the diffuseness estimation in directional audio coding by considering double-talk situations.
Or, in other words, the signal characteristic determiner 408 is configured to determine whether the acoustic input signal 104 comprises components from different sound sources at the same time. The controllable parameter estimator 406 is configured to select, according to a result of the signal characteristic determination, a spatial parameter calculation standard (for example, the first spatial parameter calculation standard 410 or the second spatial parameter calculation standard 412) from the plurality of spatial parameter calculation standards, to calculate the spatial parameters 102 (e.g., to calculate the diffuseness parameter Ψ(k, n)). The first spatial parameter calculation standard 410 is chosen when the acoustic input signal 104 comprises components of at most one sound source, and the second spatial parameter calculation standard 412 of the plurality of spatial parameter calculation standards is chosen when the acoustic input signal 104 comprises components from more than one sound source at the same time. The first spatial parameter calculation standard 410 includes a frequency average (e.g., of intensity parameters Ia(k, n)) of the acoustic input signal 104 over a plurality of frequency sub-bands. The second spatial parameter calculation standard 412 does not include a frequency average.
In the example shown in Figure 4, the estimation of the diffuseness parameter Ψ(k, n) and/or of an arrival direction parameter φ(k, n) in the directional audio coding analysis is improved by adjusting the corresponding estimators depending on double-talk situations. It has been found that the computation of the diffuseness in equation 2 can be carried out, in practice, by averaging the active intensity vector Ia(k, n) over the frequency sub-bands k, or by combining a temporal and a spectral averaging. However, spectral averaging is not adequate if independent diffuseness estimates are needed for different frequency sub-bands, as is the case in a so-called double-talk situation, where multiple sound sources (e.g., speakers) are active at the same time. Therefore, traditionally (as in the directional audio encoder shown in Figure 2), spectral averaging is not used, since the general model of the acoustic input signals always assumes double-talk situations. This model assumption was found not to be ideal in the case of single-talk situations, as it was found that in single-talk situations a spectral average can improve the accuracy of the parameter estimate.
The proposed innovative approach, as shown in Figure 4, chooses the optimal parameter estimation strategy (the ideal spatial parameter calculation standard) by selecting the underlying model for the acoustic input signal 104 or for the acoustic input signals. In other words, Figure 4 presents an application of an embodiment of the present invention to improve the diffuseness estimation depending on double-talk situations: first, the double-talk detector 414 is employed, which determines from the acoustic input signal 104 or from the acoustic input signals whether or not double talk is present in the current situation. If not, a parameter estimator is chosen (or, in other words, the controllable parameter estimator 406 chooses a spatial parameter calculation standard) that computes the diffuseness (parameter) Ψ(k, n) by approximating equation 2 using the spectral (frequency) and temporal average of the active intensity vector Ia(k, n), that is,
Ψ(n) = 1 − ‖⟨Ia(k, n)⟩_{k,n}‖ / ⟨‖Ia(k, n)‖⟩_{k,n},

where ⟨·⟩_{k,n} denotes the averaging over the frequency sub-bands k and the time blocks n.
Otherwise, if double talk is present, an estimator is chosen (or, in other words, the controllable parameter estimator 406 chooses a spatial parameter calculation standard) that uses only the temporal average, as in equation 3. A similar idea can be applied to the direction estimation: in the case of single-talk situations, but only in this case, the direction estimate φ(k, n) could be improved by a spectral average of the results over several or all frequency sub-bands k, that is,
φ̄(n) = ⟨φ(k, n)⟩_k,

where ⟨·⟩_k denotes the averaging over the frequency sub-bands k.
According to some embodiments of the present invention, it is also conceivable to apply the (spectral) averaging only over parts of the spectrum and not necessarily across the entire frequency range.
To perform the temporal and spectral averaging, the controllable parameter estimator 406 can determine the active intensity vector Ia(k, n), for example, in the STFT domain for each frequency sub-band k and each time period n, for example, using an energy analysis, e.g., by employing an energy analyzer 416 of the controllable parameter estimator 406.
In other words, the parameter estimator 406 can be configured to determine a current diffuseness parameter Ψ(k, n) for a current frequency sub-band k and a current time period n of the acoustic input signal 104 either based on the spectral and temporal average of the determined active intensity vectors Ia(k, n) of the acoustic input signal 104, included in the first spatial parameter calculation standard 410, or based only on the temporal average of the determined active intensity vectors Ia(k, n), depending on the determined signal characteristic.
In the following, another exemplary embodiment of the present invention will be described, which is also based on the concept of choosing a suitable spatial parameter calculation standard for improving the calculation of the spatial parameters of the acoustic input signal, using a spatial audio processor 500 shown in Figure 5, based on a tonality of the acoustic input signal.
TONALITY-DEPENDENT PARAMETER ESTIMATION USING A SPATIAL AUDIO PROCESSOR ACCORDING TO FIGURE 5
Figure 5 presents a schematic block diagram of a spatial audio processor 500 in accordance with an embodiment of the present invention. A functionality of the spatial audio processor 500 may be similar to the functionality of the spatial audio processor 100, in accordance with Figure 1. The spatial audio processor 500 may further comprise the additional aspects described below. The spatial audio processor 500 comprises a controllable parameter estimator 506 and a signal characteristic determiner 508. A functionality of the controllable parameter estimator 506 may be similar to the functionality of the controllable parameter estimator 106, in accordance with Figure 1, and the controllable parameter estimator 506 may comprise the additional aspects described below. A functionality of the signal characteristic determiner 508 may be similar to the functionality of the signal characteristic determiner 108 according to Figure 1. The signal characteristic determiner 508 may comprise the additional aspects described below.
The spatial audio processor 500 differs from the spatial audio processor 400 in that the calculation of the spatial parameters 102 is modified based on a determined tonality of the acoustic input signal 104. The signal characteristic determiner 508 can determine the tonality of the acoustic input signal 104, and the controllable parameter estimator 506 can choose, based on the determined tonality of the acoustic input signal 104, a spatial parameter calculation standard from a plurality of spatial parameter calculation standards for calculating the spatial parameters 102.
In other words, the spatial audio processor 500 presents a concept for improving the estimation of the directional audio coding parameters by considering the tonality of the acoustic input signal 104 or of the acoustic input signals.
The signal characteristic determiner 508 can determine the tonality of the acoustic input signal using a tonality estimator, for example, using a tonality estimator 510 of the signal characteristic determiner 508. The signal characteristic determiner 508 can, therefore, provide the tonality of the acoustic input signal 104, or information corresponding to the tonality of the acoustic input signal 104, as the determined signal characteristic 110 of the acoustic input signal 104.
The controllable parameter estimator 506 can be configured to select, according to a result of the signal characteristic determination (of the tonality estimation), a spatial parameter calculation standard from the plurality of spatial parameter calculation standards to calculate the spatial parameters 102, so that a first spatial parameter calculation standard of the plurality of spatial parameter calculation standards is chosen when the tonality of the acoustic input signal 104 is below a certain tonality threshold level, and so that a second spatial parameter calculation standard of the plurality of spatial parameter calculation standards is chosen when the tonality of the acoustic input signal 104 is above the certain tonality threshold level. Similar to the controllable parameter estimator 406, according to Figure 4, the first spatial parameter calculation standard may include a frequency average and the second spatial parameter calculation standard may not include a frequency average.
Generally speaking, the tonality of an acoustic signal provides information on whether the signal has a wide-band spectrum or not. A high tonality indicates that the signal spectrum contains only a few high-energy frequencies. In contrast, a low tonality indicates wide-band signals, that is, signals where similar energy is present over a wide frequency range.
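One common way to quantify this, sketched below, is the spectral flatness measure (geometric mean over arithmetic mean of the power spectrum). Note that this is a generic proxy for tonality and not necessarily the estimator implemented by the tonality estimator 510; the function name is illustrative.

```python
import numpy as np

def tonality_from_flatness(power_spectrum, eps=1e-12):
    """Tonality proxy in [0, 1]: 1 - spectral flatness.
    Flatness = geometric mean / arithmetic mean of the power spectrum;
    it is ~1 for wide-band spectra and ~0 when only a few frequencies
    carry significant energy, so its complement behaves like tonality."""
    p = np.asarray(power_spectrum, dtype=float) + eps   # eps avoids log(0)
    flatness = np.exp(np.mean(np.log(p))) / np.mean(p)
    return 1.0 - flatness

flat = np.ones(64)                      # wide-band: similar energy everywhere
peaky = np.zeros(64)
peaky[3] = 1.0                          # tonal: one dominant frequency
low_tonality = tonality_from_flatness(flat)     # close to 0
high_tonality = tonality_from_flatness(peaky)   # close to 1
```

Comparing such a measure against the tonality threshold level mentioned above would then steer the choice between the two calculation standards.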
Such information about the tonality of an acoustic input signal (about the tonality of the acoustic input signal 104) can be exploited to improve, for example, the parameter estimation of directional audio coding. Referring to the schematic block diagram shown in Figure 5, the tonality of the acoustic input signal 104 or of the acoustic input signals is first determined (for example, as explained in S. Molla and B. Torresani: Determining Local Transientness of Audio Signals, IEEE Signal Processing Letters, Vol. 11, No. 7, July 2004) from the input, using a tonality detector or tonality estimator 510. The tonality information (the determined signal characteristic 110) controls the estimation of the directional audio coding parameters (of the spatial parameters 102). An output of the controllable parameter estimator 506 is the spatial parameters 102 with improved accuracy compared to the traditional method presented with the directional audio encoder, according to Figure 2.
The estimation of the diffuseness Ψ(k, n) can profit from knowledge of the tonality of the input signal as follows: the computation of the diffuseness Ψ(k, n) needs an averaging process, as shown in equation 3. This averaging is traditionally performed only over time n. Particularly in diffuse sound fields, an accurate estimation of the diffuseness is only possible when the average is sufficiently long. A long temporal average, however, is generally not possible due to the short stationarity interval of acoustic input signals. To improve the diffuseness estimate, the temporal average can be combined with a spectral average over the frequency bands k,
Ψ(n) = 1 − ‖⟨Ia(k, n)⟩_{k,n}‖ / ⟨‖Ia(k, n)‖⟩_{k,n}.
However, this method requires wide-band signals, where the diffuseness is similar for different frequency bands. In the case of tonal signals, where only a few frequencies have significant energy, the actual diffuseness of the sound field may vary strongly across the frequency bands k.
This means that, when the tonality detector (the tonality estimator 510 of the signal characteristic determiner 508) indicates a high tonality of the acoustic input signal 104, spectral averaging is avoided.
In other words, the controllable parameter estimator 506 is configured to derive the spatial parameters 102, e.g., a diffuseness parameter Ψ(k, n), e.g., in the STFT domain, for a frequency sub-band k and a time period n, based on a temporal and spectral average of intensity parameters Ia(k, n) of the acoustic input signal 104 if the determined tonality of the acoustic input signal 104 is comparatively low, and to provide the spatial parameters 102, for example, the diffuseness parameter Ψ(k, n), based only on the temporal average and not on the spectral average of the intensity parameters Ia(k, n) of the acoustic input signal 104 if the determined tonality of the acoustic input signal 104 is comparatively high.
The same idea can be applied to the estimation of the arrival direction parameter φ(k, n) to improve the signal-to-noise ratio of the results (of the determined spatial parameters 102). In other words, the controllable parameter estimator 506 can be configured to determine the arrival direction parameter φ(k, n) based on a spectral average if the determined tonality of the acoustic input signal 104 is comparatively low, and to derive the arrival direction parameter φ(k, n) without performing a spectral average if the tonality is comparatively high.
This idea of improving the signal-to-noise ratio by the spectral average of the arrival direction parameter φ(k, n) will be described in more detail below, using another embodiment of the present invention. The spectral averaging can be applied to the acoustic input signal 104 or the acoustic input signals, to the active sound intensity, or directly to the arrival direction parameter φ(k, n).
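When averaging the direction parameter directly, a subtlety is worth noting: azimuth angles wrap around at ±180°, so a naive arithmetic mean of φ(k, n) values can point in the wrong direction, whereas averaging intensity (or unit direction) vectors first avoids the problem. A small sketch of the 2-D azimuth case, with illustrative names:

```python
import numpy as np

# Two direction estimates, both pointing close to 180 degrees:
angles = np.array([np.pi - 0.1, -np.pi + 0.1])

# Naive arithmetic mean of the angles: 0.0, i.e. the opposite direction.
naive_mean = float(np.mean(angles))

# Averaging the corresponding unit direction vectors instead, and taking
# the angle of the averaged vector, recovers a direction near 180 degrees:
unit_vectors = np.stack([np.cos(angles), np.sin(angles)], axis=1)
mean_vector = unit_vectors.mean(axis=0)
vector_mean = float(np.arctan2(mean_vector[1], mean_vector[0]))
```

This is one reason why applying the average to the active sound intensity vectors and only then extracting φ(k, n) can be the more robust of the options mentioned above.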
For a person skilled in the art, it becomes clear that the spatial audio processor 500 can also be applied to the spatial audio microphone analysis in a similar way, with the difference that now the expectation operators in equation 5a and equation 5b are approximated by considering a spectral average in the case where no double talk is present or in the case of a low tonality.
In the following, two further embodiments of the present invention will be explained, which perform a signal-to-noise-ratio-dependent direction estimation to improve the calculation of the spatial parameters.
SIGNAL-TO-NOISE-RATIO-DEPENDENT DIRECTION ESTIMATION USING A SPATIAL AUDIO PROCESSOR ACCORDING TO FIGURE 6
Figure 6 presents a schematic block diagram of a spatial audio processor 600. The spatial audio processor 600 is configured to perform the signal-to-noise-ratio-dependent direction estimation mentioned above.
A functionality of the spatial audio processor 600 may be similar to the functionality of the spatial audio processor 100, in accordance with Figure 1. The spatial audio processor 600 may comprise the additional aspects described below. The spatial audio processor 600 comprises a controllable parameter estimator 606 and a signal characteristic determiner 608. A functionality of the controllable parameter estimator 606 may be similar to the functionality of the controllable parameter estimator 106, in accordance with Figure 1, and the controllable parameter estimator 606 may comprise the additional aspects described below. A functionality of the signal characteristic determiner 608 may be similar to the functionality of the signal characteristic determiner 108 according to Figure 1, and the signal characteristic determiner 608 may comprise the additional aspects described below. The signal characteristic determiner 608 may be configured to determine a signal-to-noise ratio (SNR) of an acoustic input signal 104 as a signal characteristic 110 of the acoustic input signal 104. The controllable parameter estimator 606 may be configured to modify a variable spatial parameter calculation standard for calculating the spatial parameters 102 of the acoustic input signal 104 based on the determined signal-to-noise ratio of the acoustic input signal 104. The controllable parameter estimator 606 may, for example, perform a temporal average to determine the spatial parameters 102 and can vary an averaging length of the temporal average (or a number of elements used for the temporal average) depending on the determined signal-to-noise ratio of the acoustic input signal 104.
For example, the parameter estimator 606 can be configured to vary the averaging length of the temporal average so that the averaging length is comparatively long for a comparatively low signal-to-noise ratio of the acoustic input signal 104, and so that the averaging length is comparatively short for a comparatively high signal-to-noise ratio of the acoustic input signal 104.
The parameter estimator 606 can be configured to provide an arrival direction parameter φ(k, n) as spatial parameter 102 based on the mentioned temporal average. As mentioned before, the arrival direction parameter φ(k, n) can be determined in the controllable parameter estimator 606 (for example, in a direction estimator 610 of the parameter estimator 606) for each frequency sub-band k and time period n as the opposite direction of the active sound intensity vector Ia(k, n). The parameter estimator 606 can therefore comprise an energy analyzer 612 to perform an energy analysis on the acoustic input signal 104 to determine the active sound intensity vector Ia(k, n) for each frequency sub-band k and each time period n. The direction estimator 610 may perform the temporal averaging, for example, on the determined active intensity vector Ia(k, n) for a frequency sub-band k over a plurality of time periods n. In other words, the direction estimator 610 can perform a temporal average of intensity parameters Ia(k, n) for a frequency sub-band k and a plurality of (earlier) time periods to calculate the arrival direction parameter φ(k, n) for a frequency sub-band k and a time period n. In accordance with further embodiments of the present invention, the direction estimator 610 may also (e.g., instead of a temporal average of the intensity parameters Ia(k, n)) perform the temporal average on a plurality of determined arrival direction parameters φ(k, n) for a frequency sub-band k and a plurality of (earlier) time periods. The average length of the temporal average therefore corresponds to the number of intensity parameters or the number of arrival direction parameters used to perform the temporal averaging.
In other words, the parameter estimator 606 can be configured to apply the temporal average to a subset of intensity parameters Ia(k, n) for a plurality of time periods and a frequency sub-band k or to a subset of arrival direction parameters φ(k, n) for a plurality of time periods and a frequency sub-band k. The number of intensity parameters in the intensity parameter subset or the number of arrival direction parameters in the arrival direction parameter subset used for the temporal average corresponds to the average length of the temporal average. The controllable parameter estimator 606 is configured to adjust the number of intensity parameters or the number of arrival direction parameters in the subset used to calculate the temporal average, so that the number of intensity parameters in the intensity parameter subset or the number of arrival direction parameters in the arrival direction parameter subset is comparatively low for a comparatively high signal-to-noise ratio of the acoustic input signal 104 and so that the number of intensity parameters or the number of arrival direction parameters is comparatively high for a comparatively low signal-to-noise ratio of the acoustic input signal 104.
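The dependence of the average length on the signal-to-noise ratio described above can be sketched as follows (an illustrative Python sketch; the function name, the clamping bounds and the linear SNR-gain model are assumptions for illustration, not taken from the embodiment):

```python
import numpy as np

def average_length_for_snr(snr_in_db, snr_target_db, n_min=1, n_max=64):
    """Choose how many past time blocks to average for one sub-band k,
    given the measured input SNR and the desired target SNR (in dB).

    Assumption: averaging N independent, equally noisy estimates improves
    the SNR by a factor of N, i.e. by 10*log10(N) dB."""
    gain_needed_db = snr_target_db - snr_in_db
    if gain_needed_db <= 0:            # input already meets the target
        return n_min
    n = int(np.ceil(10.0 ** (gain_needed_db / 10.0)))
    return min(max(n, n_min), n_max)   # clamp to a practical range

# Low input SNR -> comparatively long average; high input SNR -> short one.
assert average_length_for_snr(0.0, 12.0) > average_length_for_snr(9.0, 12.0)
```

A real implementation would evaluate this per frequency sub-band k and time period n.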
In other words, this embodiment of the present invention provides a directional audio coding direction estimation that is based on the signal-to-noise ratio of the acoustic input signals or of the acoustic input signal 104.
Generally speaking, the accuracy of the estimated direction φ(k, n) (or arrival direction parameter φ(k, n)) of the sound, defined in accordance with the directional audio encoder 200, in accordance with Figure 2, is influenced by noise, which is always present within the acoustic input signals.
The impact of noise on the accuracy of the estimate depends on the SNR, that is, on the ratio between the energy of the sound signal arriving at the collection (the microphones) and the noise energy. A significantly small SNR reduces the estimation accuracy of the direction φ(k, n). The noise signal is usually introduced by the measuring equipment, e.g., the microphones and the microphone amplifier, and leads to errors in φ(k, n). The direction φ(k, n) has been found to be equally likely to be underestimated or overestimated, but the expectation of φ(k, n) is still correct.
It was found that by having several independent estimates of the arrival direction parameter φ(k, n), for example, by repeating the measurement several times, the influence of noise can be reduced and thus the accuracy of the direction estimate can be increased by averaging the arrival direction parameter φ(k, n) over the different measurement cases. Effectively, the averaging process increases the signal-to-noise ratio of the estimator. The lower the signal-to-noise ratio in the microphones or, in general, in the sound recording devices, or the higher the target signal-to-noise ratio desired in the estimator, the greater the number of measurement cases that may be needed in the averaging process.
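The effect of averaging over several independent measurement cases can be illustrated numerically (a hypothetical example; the true direction, the noise level and the counts are arbitrary choices, not values from the embodiment):

```python
import numpy as np

# Hypothetical demonstration: repeating a noisy direction measurement and
# averaging the cases reduces the error of the estimate.
rng = np.random.default_rng(0)
true_angle = 40.0                                      # degrees (arbitrary)
noisy = true_angle + 10.0 * rng.standard_normal(1000)  # single noisy estimates

single_err = np.abs(noisy[:100] - true_angle).mean()   # typical single-shot error
averaged = noisy.reshape(10, 100).mean(axis=1)         # 10 averages of 100 cases
avg_err = np.abs(averaged - true_angle).mean()         # error after averaging

assert avg_err < single_err  # averaging over measurement cases helps
```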
The spatial audio processor 600 shown in Figure 6 performs this averaging process in dependence on the signal-to-noise ratio of the acoustic input signal 104. Or, in other words, the spatial audio processor 600 presents a concept for improving the direction estimation in directional audio coding by considering the SNR at the acoustic input or of the acoustic input signal 104.
Before estimating the direction φ(k, n) with the direction estimator 610, the signal-to-noise ratio of the acoustic input signal 104 or of the acoustic input signals is determined with the signal-to-noise ratio estimator 614 of the signal characteristic determiner 608. The signal-to-noise ratio can be estimated for each time block n in the frequency band k, for example, in the STFT domain. Information on the actual signal-to-noise ratio of the acoustic input signal 104 is provided as the determined signal characteristic 110 from the signal-to-noise ratio estimator 614 to the direction estimator 610, which includes a frequency- and time-dependent temporal average of the specific directional audio coding signals to improve the signal-to-noise ratio. In addition, a desired target signal-to-noise ratio can be passed to the direction estimator 610. The desired target signal-to-noise ratio can be set externally, for example, by a user. The direction estimator 610 can adjust the average length of the temporal average such that an achieved signal-to-noise ratio of the acoustic input signal 104 at an output of the controllable parameter estimator 606 (after the averaging) matches the desired target signal-to-noise ratio. Or, in other words, the averaging (in the direction estimator 610) is performed until the desired target signal-to-noise ratio is obtained.
The direction estimator 610 can continuously compare the achieved signal-to-noise ratio of the acoustic input signal 104 with the target signal-to-noise ratio and can average until the desired target signal-to-noise ratio is reached. Using this concept, the achieved signal-to-noise ratio of the acoustic input signal 104 is continuously monitored and the averaging is terminated when the achieved signal-to-noise ratio of the acoustic input signal 104 matches the target signal-to-noise ratio; thus, there is no need to calculate the average length in advance.
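The continuous-monitoring concept can be sketched as follows (a simplified sketch; the assumption that the achieved SNR grows linearly with the number of averaged independent estimates, and all names, are illustrative):

```python
def average_until_target(estimates, snr_in, snr_target):
    """Average per-block estimates (newest last) of one sub-band until the
    achieved SNR reaches the target, then stop.

    Assumption: the achieved SNR grows linearly with the number of averaged
    independent estimates, snr_achieved = n * snr_in (linear scale)."""
    acc, n = 0.0, 0
    for est in reversed(estimates):      # walk back through earlier blocks
        acc += est
        n += 1
        if n * snr_in >= snr_target:     # achieved SNR monitored continuously
            break
    return acc / n

# With an input SNR of 2 and a target of 8, four blocks are averaged.
assert average_until_target([1.0, 2.0, 3.0, 4.0], 2.0, 8.0) == 2.5
```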
In addition, the direction estimator 610 can determine, based on the signal-to-noise ratio of the acoustic input signal 104 at the input of the controllable parameter estimator 606, the average length to be used for the averaging, such that the achieved signal-to-noise ratio of the acoustic input signal 104 at the output of the controllable parameter estimator 606 corresponds to the target signal-to-noise ratio. Thus, using this concept, the achieved signal-to-noise ratio of the acoustic input signal 104 is not continuously monitored.
A result generated by the two concepts for the direction estimator 610 described above is the same: during the estimation of the spatial parameters 102, a precision of the spatial parameters 102 can be achieved as if the acoustic input signal 104 had the target signal-to-noise ratio, although the actual signal-to-noise ratio of the acoustic input signal 104 (at the input of the controllable parameter estimator 606) is worse.
The lower the signal-to-noise ratio of the acoustic input signal 104 compared to the target signal-to-noise ratio, the longer the temporal average. An output of the direction estimator 610 is, for example, an estimate φ(k, n), that is, the arrival direction parameter φ(k, n) with improved accuracy.
As mentioned before, there are different possibilities for averaging the directional audio coding signals: averaging the active sound intensity vector Ia(k, n) for a frequency sub-band k and a plurality of time periods, as provided by equation 1, or averaging directly the estimated direction φ(k, n) (the arrival direction parameter φ(k, n)), already defined earlier as the opposite direction of the active sound intensity vector Ia(k, n), over time.
The spatial audio processor 600 can also be applied to spatial audio microphone (SAM) direction analysis in a similar manner. The accuracy of the direction estimate can be increased by averaging the results over several measurement cases. This means that, similar to DirAC in Figure 6, the SAM estimator is improved by first determining the SNR of the acoustic input signal(s) 104. The information about the actual SNR and the desired target SNR is passed to the SAM direction estimator, which includes a frequency- and time-dependent temporal average of the specific SAM signals to improve the SNR. The averaging is performed until the desired target SNR is obtained. In fact, two SAM signals can be averaged, namely, the estimated direction φ(k, n) or the PSDs and CSDs defined in equation 5a and equation 5b. The latter average simply means that the expectation operators are approximated by an averaging process, the length of which depends on the actual and desired (target) SNR. The averaging of the estimated direction φ(k, n) is explained for DirAC, according to Figure 7b, but keeps the same form for SAM.
According to a further embodiment of the present invention, which will be explained later using Figure 8, instead of explicitly averaging physical quantities with these two methods, it is possible to change the filterbank used, since the filterbank may contain an inherent averaging of the input signals. In the following, the two mentioned methods for averaging directional audio coding signals will be explained in more detail using Figures 7a and 7b. The alternative method of exchanging the filterbank with a spatial audio processor is shown in Figure 8.
AVERAGING THE ACTIVE SOUND INTENSITY VECTOR IN DIRECTIONAL AUDIO CODING, ACCORDING TO FIGURE 7a
Figure 7a presents, in a schematic block diagram, a first possible realization of the signal-to-noise-ratio-dependent direction estimator 610 of Figure 6. The realization, which is shown in Figure 7a, is based on a temporal average of the acoustic sound intensity or of the sound intensity parameters Ia(k, n) by a direction estimator 610a. The functionality of the direction estimator 610a may be similar to the functionality of the direction estimator 610 of Figure 6, and the direction estimator 610a may comprise the additional aspects described below. The direction estimator 610a is configured to perform an averaging and a direction estimation. The direction estimator 610a is connected to the energy analyzer 612 of Figure 6; the direction estimator 610a together with the energy analyzer 612 can constitute a controllable parameter estimator 606a, whose functionality is similar to the functionality of the controllable parameter estimator 606 shown in Figure 6. The controllable parameter estimator 606a first determines from the acoustic input signal 104 or the acoustic input signals an active sound intensity vector 706 (Ia(k, n)) in the energy analysis performed by the energy analyzer 612 using equation 1, as explained before. In an averaging block 702 of the direction estimator 610a, this vector (the sound intensity vector 706) is averaged over time n independently for all (or at least a part of all) frequency ranges or frequency sub-bands k, which leads to an average acoustic intensity vector 708 (Iavg(k, n)), according to the following equation:
Iavg(k, n) = (1/N) Σ_{n' = n−N+1, …, n} Ia(k, n'), where N denotes the average length.
To perform the averaging, the direction estimator 610a considers previous intensity estimates. One input to the averaging block 702 is the actual signal-to-noise ratio 710 of the acoustic input 104 or of the acoustic input signal 104, which is determined with the signal-to-noise ratio estimator 614 shown in Figure 6. The actual signal-to-noise ratio 710 of the acoustic input signal 104 constitutes the determined signal characteristic 110 of the acoustic input signal 104. The signal-to-noise ratio is determined for each frequency sub-band k and each time period n in the time-frequency domain. A second input to the averaging block 702 is a desired signal-to-noise ratio or target signal-to-noise ratio 712, which must be obtained at an output of the controllable parameter estimator 606a. The target signal-to-noise ratio 712 is an external input, given, for example, by the user. The averaging block 702 averages the intensity vector 706 (Ia(k, n)) until the target signal-to-noise ratio 712 is reached. Based on the averaged (acoustic) intensity vector 708 (Iavg(k, n)), finally the direction φ(k, n) of the sound can be computed using a direction estimation block 704 of the direction estimator 610a, which performs the direction estimation as explained above. The arrival direction parameter φ(k, n) constitutes a spatial parameter 102 determined by the controllable parameter estimator 606a. The direction estimator 610a can determine the arrival direction parameter φ(k, n) for each frequency sub-band k and time period n as the opposite direction of the averaged sound intensity vector 708 (Iavg(k, n)) of the corresponding frequency sub-band k and the corresponding time period n.
Depending on the desired target signal-to-noise ratio 712, the direction estimator 610a may vary the average length for the averaging of the sound intensity parameters 706 (Ia(k, n)), such that a signal-to-noise ratio at the output of the controllable parameter estimator 606a matches (i.e., is equal to) the target signal-to-noise ratio 712. Typically, the direction estimator 610a can choose a comparatively long average length for a comparatively high difference between the actual signal-to-noise ratio 710 of the acoustic input signal 104 and the target signal-to-noise ratio 712. For a comparatively low difference between the actual signal-to-noise ratio 710 of the acoustic input signal 104 and the target signal-to-noise ratio 712, the direction estimator 610a will choose a comparatively short average length.
Or, in other words, the direction estimator 610a is based on averaging the acoustic intensity parameters.
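The averaging of Figure 7a can be sketched as follows (an illustrative sketch; the restriction to a 2-D intensity vector, the array shapes and the function name are assumptions for illustration):

```python
import numpy as np

def doa_from_averaged_intensity(ia_blocks):
    """Figure 7a sketch for one frequency sub-band: temporally average the
    active sound intensity vectors Ia(k, n') over past time blocks, then
    take the direction of arrival as the opposite direction of the average.

    ia_blocks: array of shape (N, 2) holding 2-D intensity vectors."""
    i_avg = np.mean(ia_blocks, axis=0)                   # temporal average
    return np.degrees(np.arctan2(-i_avg[1], -i_avg[0]))  # opposite of Iavg

# Intensity vectors jittering around (-1, -1): sound arrives from about 45 deg.
ia = np.array([[-1.0, -0.9], [-0.9, -1.1], [-1.1, -1.0]])
assert 40.0 < doa_from_averaged_intensity(ia) < 50.0
```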
AVERAGING THE DIRECTIONAL AUDIO CODING DIRECTION PARAMETER DIRECTLY, ACCORDING TO FIGURE 7b
Figure 7b presents a schematic block diagram of a controllable parameter estimator 606b, the functionality of which may be similar to the functionality of the controllable parameter estimator 606 shown in Figure 6. The controllable parameter estimator 606b comprises the energy analyzer 612 and a direction estimator 610b configured to perform a direction estimation and an averaging. The direction estimator 610b differs from the direction estimator 610a in that it first performs the direction estimation to determine an arrival direction parameter 718 (φ(k, n)) for each frequency sub-band k and each time period n and, second, averages the determined arrival direction parameter 718 to determine an average arrival direction parameter φavg(k, n) for each frequency sub-band k and each time period n. The average arrival direction parameter φavg(k, n) constitutes a spatial parameter 102 determined by the controllable parameter estimator 606b.
In other words, Figure 7b presents another possible realization of the signal-to-noise-ratio-dependent direction estimator 610, which is shown in Figure 6. The realization, which is shown in Figure 7b, is based on a temporal average of the estimated direction (the arrival direction parameter 718 (φ(k, n))), which can be obtained with a conventional directional audio coding approach, for example, for each frequency sub-band k and each time period n as the opposite direction of the active sound intensity vector 706 (Ia(k, n)).
From the acoustic input or the acoustic input signal 104, the energy analysis is performed using the energy analyzer 612 and then the direction of the sound (the arrival direction parameter 718 (φ(k, n))) is determined in a direction estimation block 714 of the direction estimator 610b, which performs the direction estimation, for example, with a conventional directional audio coding method explained earlier.
In an averaging block 716 of the direction estimator 610b, a temporal average of this direction (of the arrival direction parameter 718 (φ(k, n))) is applied. As explained before, the averaging is performed over time and for all (or at least a part of all) frequency bands or frequency sub-bands k, which produces the average direction φavg(k, n):
φavg(k, n) = (1/N) Σ_{n' = n−N+1, …, n} φ(k, n'), where N denotes the average length.
The average direction φavg(k, n) for each frequency sub-band k and each time period n constitutes a spatial parameter 102 determined by the controllable parameter estimator 606b.
As described above, the inputs to the averaging block 716 are the actual signal-to-noise ratio 710 of the acoustic input or of the acoustic input signal 104, as well as the target signal-to-noise ratio 712, which must be obtained at an output of the controllable parameter estimator 606b. The actual signal-to-noise ratio 710 is determined for each frequency sub-band k and each time period n, for example, in the STFT domain. The averaging 716 is performed over a sufficient number of time blocks (or time periods) until the target signal-to-noise ratio 712 is reached. The final result is the temporally averaged direction φavg(k, n) with increased accuracy.
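Averaging the direction parameter directly, as in Figure 7b, can be sketched as follows (the circular, unit-vector averaging used here is one reasonable way to average angles and is an assumption for illustration, not taken from the embodiment):

```python
import numpy as np

def average_direction(phis_deg):
    """Figure 7b sketch: temporally average already-estimated directions
    phi(k, n') of one sub-band. The angles are averaged via unit vectors,
    so the wrap-around at +/-180 degrees is handled correctly."""
    phis = np.radians(np.asarray(phis_deg))
    return np.degrees(np.arctan2(np.sin(phis).mean(), np.cos(phis).mean()))

# A naive arithmetic mean of 170 deg and -170 deg would give 0 deg;
# the circular average correctly yields +/-180 deg.
assert abs(abs(average_direction([170.0, -170.0])) - 180.0) < 1e-9
```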
To summarize, the signal characteristic determiner 608 is configured to provide the signal-to-noise ratio 710 of the acoustic input signal 104 as a plurality of signal-to-noise ratio parameters for a frequency sub-band k and a time period n of the acoustic input signal 104. The controllable parameter estimators 606a, 606b are configured to receive the target signal-to-noise ratio 712 as a plurality of target signal-to-noise ratio parameters for a frequency sub-band k and a time period n. The controllable parameter estimators 606a, 606b are further configured to derive the average length of the temporal average in accordance with a current signal-to-noise ratio parameter of the acoustic input signal, such that a current signal-to-noise ratio parameter of the current (averaged) arrival direction parameter φavg(k, n) corresponds to a current target signal-to-noise ratio parameter.
The controllable parameter estimators 606a, 606b are configured to derive intensity parameters Ia(k, n) for each frequency sub-band k and each time period n of the acoustic input signal 104. In addition, the controllable parameter estimators 606a, 606b are configured to derive arrival direction parameters φ(k, n) for each frequency sub-band k and each time period n of the acoustic input signal 104 based on the intensity parameters Ia(k, n) of the acoustic audio signal determined by the controllable parameter estimators 606a, 606b. The controllable parameter estimators 606a, 606b are further configured to derive the current arrival direction parameter φ(k, n) for a current frequency sub-band and a current time period based on the temporal average of at least a subset of the intensity parameters derived from the acoustic input signal 104 or based on the temporal average of at least a subset of the derived arrival direction parameters.
The controllable parameter estimators 606a, 606b are configured to derive the intensity parameters Ia(k, n) for each frequency sub-band k and each time period n, e.g., in the STFT domain. In addition, the controllable parameter estimators 606a, 606b are configured to derive the arrival direction parameter φ(k, n) for each frequency sub-band k and each time period n, for example, in the STFT domain. The controllable parameter estimator 606a is configured to choose the intensity parameter subset for performing the temporal average such that a frequency sub-band associated with all intensity parameters of the intensity parameter subset is equal to a current frequency sub-band associated with the current arrival direction parameter. The controllable parameter estimator 606b is configured to choose the arrival direction parameter subset for performing the temporal average 716 such that a frequency sub-band associated with all arrival direction parameters of the arrival direction parameter subset is equal to the current frequency sub-band associated with the current arrival direction parameter.
In addition, the controllable parameter estimator 606a is configured to choose the intensity parameter subset such that the time periods associated with the intensity parameters of the intensity parameter subset are adjacent in time. The controllable parameter estimator 606b is configured to choose the arrival direction parameter subset such that the time periods associated with the arrival direction parameters of the arrival direction parameter subset are adjacent in time. The number of intensity parameters in the intensity parameter subset or the number of arrival direction parameters in the arrival direction parameter subset corresponds to the average length of the temporal average. The controllable parameter estimator 606a is configured to derive the number of intensity parameters in the intensity parameter subset for performing the temporal average based on the difference between the current signal-to-noise ratio of the acoustic input signal 104 and the current target signal-to-noise ratio. The controllable parameter estimator 606b is configured to derive the number of arrival direction parameters in the arrival direction parameter subset for performing the temporal average based on the difference between the current signal-to-noise ratio of the acoustic input signal 104 and the current target signal-to-noise ratio.
Or, in other words, the direction estimator 610b is based on averaging the direction 718 (φ(k, n)) obtained with a conventional directional audio coding approach.
Next, another embodiment of a spatial audio processor will be described, which also performs signal-to-noise ratio dependent parameter estimation.
USING A FILTERBANK WITH A SUITABLE SPECTRO-TEMPORAL RESOLUTION IN DIRECTIONAL AUDIO CODING USING A SPATIAL AUDIO PROCESSOR ACCORDING TO FIGURE 8
Figure 8 shows a spatial audio processor 800 comprising a controllable parameter estimator 806 and a signal characteristic determiner 808. The functionality of the spatial audio processor 800 may be similar to the functionality of the spatial audio processor 100. The spatial audio processor 800 may comprise the additional aspects described below. A functionality of the controllable parameter estimator 806 may be similar to the functionality of the controllable parameter estimator 106 and a functionality of the signal characteristic determiner 808 may be similar to a functionality of the signal characteristic determiner 108. The controllable parameter estimator 806 and the signal characteristic determiner 808 may comprise the additional aspects described below.
The signal characteristic determiner 808 differs from the signal characteristic determiner 608 in that it determines a signal-to-noise ratio 810 of the acoustic input signal 104, which is also denoted as the input signal-to-noise ratio, in the time domain and not in the STFT domain. The signal-to-noise ratio 810 of the acoustic input signal 104 constitutes a signal characteristic determined by the signal characteristic determiner 808. The controllable parameter estimator 806 differs from the controllable parameter estimator 606 shown in Figure 6 in that it comprises a B-format estimator 812 comprising a filterbank 814 and a B-format computation block 816, which is configured to transform the acoustic input signal 104 in the time domain into the B-format representation, e.g., in the STFT domain.
In addition, the B-format estimator 812 is configured to vary the B-format determination of the acoustic input signal 104 based on the signal characteristic determined by the signal characteristic determiner 808 or, in other words, in dependence on the signal-to-noise ratio 810 of the acoustic input signal 104 in the time domain.
An output of the B-format estimator 812 is a B-format representation 818 of the acoustic input signal 104. The B-format representation 818 comprises an omnidirectional component, e.g., the sound pressure P(k, n) mentioned above, and a directional component, for example, the velocity vector U(k, n) mentioned above, for each frequency sub-band k and each time period n.
A direction estimator 820 of the controllable parameter estimator 806 derives an arrival direction parameter φ(k, n) of the acoustic input signal 104 for each frequency sub-band k and each time period n. The arrival direction parameter φ(k, n) constitutes a spatial parameter 102 determined by the controllable parameter estimator 806. The direction estimator 820 can perform the direction estimation by determining an active intensity parameter Ia(k, n) for each frequency sub-band k and each time period n and by deriving the arrival direction parameters φ(k, n) based on the active intensity parameters Ia(k, n).
The filterbank 814 of the B-format estimator 812 is configured to receive the actual signal-to-noise ratio 810 of the acoustic input signal 104 and to receive a target signal-to-noise ratio 822. The controllable parameter estimator 806 is configured to vary a block length of the filterbank 814 in dependence on a difference between the actual signal-to-noise ratio 810 of the acoustic input signal 104 and the target signal-to-noise ratio 822. An output of the filterbank 814 is a frequency representation (e.g., in the STFT domain) of the acoustic input signal 104, on the basis of which the B-format computation block 816 computes the B-format representation 818 of the acoustic input signal 104. In other words, the conversion of the time-domain acoustic input signal 104 to the frequency representation can be performed by the filterbank 814 in dependence on the determined actual signal-to-noise ratio 810 of the acoustic input signal 104 and in dependence on the target signal-to-noise ratio 822. In short, the B-format computation can be performed by the B-format computation block 816 in dependence on the determined signal-to-noise ratio 810 and the target signal-to-noise ratio 822.
In other words, the signal characteristic determiner 808 is configured to determine the signal-to-noise ratio 810 of the acoustic input signal 104 in the time domain. The controllable parameter estimator 806 comprises the filterbank 814 for converting the acoustic input signal 104 from the time domain to the frequency representation. The controllable parameter estimator 806 is configured to vary the block length of the filterbank 814 in accordance with the determined signal-to-noise ratio 810 of the acoustic input signal 104. The controllable parameter estimator 806 is configured to receive the target signal-to-noise ratio 822 and to vary the block length of the filterbank 814 so that the signal-to-noise ratio of the acoustic input signal 104 in the frequency domain matches the target signal-to-noise ratio 822 or, in other words, so that the signal-to-noise ratio of the frequency representation 824 of the acoustic input signal 104 corresponds to the target signal-to-noise ratio 822.
The controllable parameter estimator 806 shown in Figure 8 can also be understood as another realization of the signal-to-noise-ratio-dependent direction estimator 610 shown in Figure 6. The realization that is shown in Figure 8 is based on the choice of an adequate temporal and spectral resolution of the filterbank 814. As explained earlier, directional audio coding operates in the STFT domain. Thus, the acoustic input signals or the time-domain acoustic input signal 104, for example, measured with microphones, are transformed using, for example, a short-time Fourier transform or any other filterbank. The B-format estimator 812 then provides the short-time frequency representation 818 of the acoustic input signal 104 or, in other words, provides the B-format signal, as denoted by the sound pressure P(k, n) and by the particle velocity vector U(k, n), respectively. The application of the filterbank 814 to the time-domain acoustic input signals (to the acoustic input signal 104 in the time domain) inherently averages the transformed signal (the short-time frequency representation 824 of the acoustic input signal 104), while the averaging length corresponds to the transform length (or block length) of the filterbank 814. The averaging method described in conjunction with the spatial audio processor 800 exploits this inherent temporal averaging of the input signals. The acoustic input or the acoustic input signal 104, which can be measured with the microphones, is transformed into the short-time frequency domain using the filterbank 814. The transform length or filter length or block length is controlled by the actual input signal-to-noise ratio 810 of the acoustic input signal 104 or the acoustic input signals and by the desired target signal-to-noise ratio 822, which is to be obtained by the averaging process.
In other words, it is desired to adjust the filterbank 814 so that the signal-to-noise ratio of the frequency and time representation 824 of the acoustic input signal 104 corresponds or is equal to the target signal-to-noise ratio 822. The signal-to-noise ratio is determined from the acoustic input signal 104 or the acoustic input signals in the time domain. In case of a high input signal-to-noise ratio 810, a shorter transform length is chosen; conversely, for a low input signal-to-noise ratio 810, a longer transform length is chosen.
As explained in the previous section, the input signal-to-noise ratio 810 of the acoustic input signal 104 is provided by a signal-to-noise ratio estimator of the signal characteristic determiner 808, while the target signal-to-noise ratio 822 can be controlled externally, for example, by a user. The outputs of the filterbank 814 and of the subsequent B-format computation performed by the B-format computation block 816 are the acoustic input signals 818, e.g., in the STFT domain, namely, P(k, n) and/or U(k, n). These signals (the acoustic input signal 818 in the STFT domain) are further processed, for example, with conventional directional audio coding processing in the direction estimator 820 to obtain the direction φ(k, n) for each frequency sub-band k and each time period n.
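The SNR-dependent choice of the transform length can be sketched as follows (the candidate lengths and the model of roughly 3 dB of inherent averaging gain per doubling of the block length are illustrative assumptions, not values from the embodiment):

```python
import numpy as np

def choose_block_length(snr_in_db, snr_target_db,
                        lengths=(256, 512, 1024, 2048, 4096)):
    """Figure 8 sketch: pick the filterbank transform (block) length so
    that its inherent averaging lifts the input SNR to the target SNR.

    Assumption: relative to the shortest candidate, doubling the block
    length gains about 3 dB of SNR."""
    for n in lengths:                           # shortest acceptable length
        gain_db = 10.0 * np.log10(n / lengths[0])
        if snr_in_db + gain_db >= snr_target_db:
            return n
    return lengths[-1]                          # fall back to the longest

# High input SNR -> shorter transform length; low input SNR -> longer one.
assert choose_block_length(20.0, 20.0) < choose_block_length(8.0, 20.0)
```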
Or, in other words, the spatial audio processor 800 or its direction estimator is based on choosing a suitable filterbank for the acoustic input signal 104 or for the acoustic input signals.
In short, the signal characteristic determiner 808 is configured to determine the signal-to-noise ratio 810 of the acoustic input signal 104 in the time domain. The controllable parameter estimator 806 comprises the filterbank 814 configured to convert the acoustic input signal 104 from the time domain to the frequency representation. The controllable parameter estimator 806 is configured to vary the block length of the filterbank 814 in accordance with the determined signal-to-noise ratio 810 of the acoustic input signal 104. In addition, the controllable parameter estimator 806 is configured to receive the target signal-to-noise ratio 822 and to vary the block length of the filterbank 814 so that the signal-to-noise ratio of the acoustic input signal 824 in the frequency representation matches the target signal-to-noise ratio 822.
The signal-to-noise ratio estimation performed by the signal characteristic determiner 608, 808 is a well-known problem. In the following, a possible implementation of a signal-to-noise ratio estimator will be described.
POSSIBLE IMPLEMENTATION OF AN SNR ESTIMATOR In the following, a possible implementation of the input signal-to-noise ratio estimator 614 shown in Figure 6 will be described. The signal-to-noise ratio estimator described below can be used for the controllable parameter estimator 606a and the controllable parameter estimator 606b shown in Figures 7a and 7b. The signal-to-noise ratio estimator estimates the signal-to-noise ratio of the acoustic input signal 104, for example, in the STFT domain. A time domain implementation (e.g., implemented in the signal characteristic determiner 808) can be performed in a similar manner.
The SNR estimator can estimate the SNR of the acoustic input signals, for example, in the STFT domain, for each time block n in each frequency band k, or for a time domain signal. The SNR is estimated by computing the signal energy for the considered time-frequency bin. Assume that x(k,n) is the acoustic input signal. The signal energy S(k,n) can be determined with
S(k,n) = |x(k,n)|²
To obtain the SNR, the ratio of the signal energy to the noise energy N(k) is computed, i.e.,
SNR(k,n) = S(k,n) / N(k)
As S(k,n) already contains noise, a more accurate SNR estimate in the case of low SNR is given by
SNR(k,n) = (S(k,n) − N(k)) / N(k)
The noise energy N(k) is assumed to be constant over time n. It can be determined for each frequency band k of the acoustic input signal. In fact, it is equal to the average energy of the acoustic input signal in case there is no sound present, that is, during silence. Expressed in mathematical terms,
N(k) = E{ |x(k,n)|² }, with the expectation E{·} taken over time periods n during silence.
In other words, according to some embodiments of the present invention, a signal characteristic determiner is configured to measure a noise signal during a silence phase of the acoustic input signal 104 and to calculate an energy N(k) of the noise signal. The signal characteristic determiner may be further configured to measure an active signal during a non-silent phase of the acoustic input signal 104 and to calculate an energy S(k,n) of the active signal. The signal characteristic determiner may further be configured to determine the signal-to-noise ratio of the acoustic input signal 104 based on the calculated energy N(k) of the noise signal and the calculated energy S(k,n) of the active signal.
This scheme can also be applied to the signal characteristic determiner 808, with the difference that the signal characteristic determiner 808 determines an energy S(t) of the active signal in the time domain and an energy N(t) of the noise signal in the time domain, to obtain the actual signal-to-noise ratio of the acoustic input signal 104 in the time domain.
In other words, the signal characteristic determiners 608, 808 are configured to measure a noise signal during a silence phase of the acoustic input signal 104 and to calculate an energy N(k) of the noise signal.
The signal characteristic determiners 608, 808 are configured to measure an active signal during a non-silence phase of the acoustic input signal 104 and to calculate an energy S(k,n) of the active signal. In addition, the signal characteristic determiners 608, 808 are configured to determine a signal-to-noise ratio of the acoustic input signal 104 based on the calculated energy N(k) of the noise signal and the calculated energy S(k,n) of the active signal.
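A minimal numerical sketch of this estimator follows, assuming the STFT coefficients are already available as an array x_tf[k, n] and that the silent frames are known (e.g., flagged by a voice activity detector, which is outside this description). It implements S(k,n) = |x(k,n)|², N(k) as the mean energy over silent frames, and the low-SNR-corrected ratio (S − N)/N.

```python
import numpy as np

def estimate_snr(x_tf, silence_frames):
    """Per-band SNR of an STFT signal x_tf[k, n], following the scheme
    above: the noise energy N(k) is the mean energy over frames known to
    be silent, and the low-SNR-corrected estimate is (S - N) / N."""
    S = np.abs(x_tf) ** 2                     # S(k, n) = |x(k, n)|^2
    N = S[:, silence_frames].mean(axis=1)     # N(k), assumed constant in n
    snr = (S - N[:, None]) / N[:, None]       # subtract noise already in S
    return np.maximum(snr, 0.0)               # clip negative estimates
```

The clipping of negative values is an added safeguard for frames whose energy falls below the estimated noise floor.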
In the following, another embodiment of the present invention will be described, which performs a clap-dependent parameter estimation. APPLAUSE-DEPENDENT PARAMETER ESTIMATION USING A SPATIAL AUDIO PROCESSOR ACCORDING TO FIGURE 9 Figure 9 presents a schematic block diagram of a spatial audio processor 900 according to an embodiment of the present invention. A functionality of the spatial audio processor 900 may be similar to the functionality of the spatial audio processor 100, and the spatial audio processor 900 may comprise the additional aspects described below. The spatial audio processor 900 comprises a controllable parameter estimator 906 and a signal characteristic determiner 908. A functionality of the controllable parameter estimator 906 may be similar to the functionality of the controllable parameter estimator 106, and the controllable parameter estimator 906 may comprise the additional aspects described below. A functionality of the signal characteristic determiner 908 may be similar to the functionality of the signal characteristic determiner 108, and the signal characteristic determiner 908 may comprise the additional aspects described below.
The signal characteristic determiner 908 is configured to determine whether the acoustic input signal 104 comprises transient components that correspond to signals such as clapping, for example, using a clap detector 910.
Signals such as applause are defined here as signals which comprise a fast temporal sequence of transients, for example, arriving from different directions. The controllable parameter estimator 906 comprises a filterbank 912 that is configured to convert the time domain acoustic input signal 104 to a frequency representation (e.g., to an STFT domain) based on a conversion calculation standard. The controllable parameter estimator 906 is configured to choose the conversion calculation standard for converting the time domain acoustic input signal 104 to the frequency representation from a plurality of conversion calculation standards, according to a result of a signal characteristic determination performed by the signal characteristic determiner 908. The result of the signal characteristic determination constitutes the determined signal characteristic 110 of the signal characteristic determiner 908. The controllable parameter estimator 906 chooses the conversion calculation standard from the plurality of conversion calculation standards so that a first conversion calculation standard of the plurality of conversion calculation standards is chosen to convert the acoustic input signal 104 from the time domain to the frequency representation when the acoustic input signal comprises components corresponding to applause, and so that a second conversion calculation standard of the plurality of conversion calculation standards is chosen to convert the acoustic input signal 104 from the time domain to the frequency representation when the acoustic input signal 104 does not comprise components corresponding to applause.
Or, in other words, the controllable parameter estimator 906 is configured to choose a suitable conversion calculation standard for converting the acoustic input signal 104 from the time domain to the frequency representation in dependence on a clap detection.
In short, the spatial audio processor 900 is presented as an exemplary embodiment of the invention, where the parametric description of the sound field is determined depending on the characteristic of the acoustic input signals or of the acoustic input signal 104. In case the microphones capture applause, or the acoustic input signal 104 comprises components corresponding to signals such as applause, a special processing is used in order to increase the accuracy of the parameter estimate.
Applause is generally characterized by a rapid variation in the direction of sound arrival within a very short period of time. Furthermore, the captured sound signals contain mostly transients. It has been found that for accurate sound analysis, it is advantageous to have a system that can resolve the rapid temporal variation of the incoming direction and that can preserve the transient character of the signal components.
These goals can be achieved by using a filterbank with high temporal resolution (e.g., an STFT with a short block or transform length) to transform the time-domain acoustic input signals. When using this filterbank, the spectral resolution of the system is reduced. This is not a problem for applause signals, as the DOA of the sound does not vary much across frequency due to the transient character of the sound. However, small spectral resolution has been found to be problematic for other signals, such as speech in a double-talk situation, where a certain spectral resolution is required to be able to distinguish between individual speakers. It has been found that an accurate parameter estimate may need a signal-dependent choice of the filterbank (or of the corresponding filterbank block or transform length) depending on the characteristic of the acoustic input signals or of the acoustic input signal 104.
The spatial audio processor 900 shown in Figure 9 represents a possible embodiment to perform the signal-dependent switching of the filterbank 912, or to choose the conversion calculation standard of the filterbank 912. Before the acoustic input signals or the acoustic input signal 104 are transformed to the frequency representation (e.g., to the STFT domain) with the filterbank 912, the acoustic input signal 104 is passed, in the time domain, to the applause detector 910 of the signal characteristic determiner 908. The applause detector 910 of the signal characteristic determiner 908 controls the filterbank 912 based on the determined signal characteristic 110 (which, in this case, signals whether the acoustic input signal 104 contains components corresponding to signals such as applause or not). If applause is detected in the acoustic input signals or in the acoustic input signal 104, the controllable parameter estimator 906 switches to a filterbank or, in other words, a conversion calculation standard is chosen in the filterbank 912 which is suitable for applause analysis. In case no applause is present, a conventional filterbank or, in other words, a conventional conversion calculation standard, which may for example be known from the directional audio encoder 200, is used. After transforming the acoustic input signal 104 to the STFT domain (or another frequency representation), a conventional directional audio coding processing can be performed (using a B-format computation block 914 and a parameter estimation block 916 of the controllable parameter estimator 906). In other words, the determination of the directional audio coding parameters constituting the spatial parameters 102, which are determined by the spatial audio processor 900, can be performed using the B-format computation block 914 and the parameter estimation block 916, as described according to the directional audio encoder 200 shown in Figure 2. The results are, for example, the directional audio coding parameters, that is, the direction φ(k,n) and the diffuseness Ψ(k,n).
Or, in other words, the spatial audio processor 900 provides a concept in which the estimation of directional audio encoding parameters is improved by switching the filterbank in the case of clapping signals or signals like clapping.
In short, the controllable parameter estimator 906 is configured so that the first conversion calculation standard corresponds to a greater temporal resolution of the acoustic input signal in the frequency representation than the second conversion calculation standard, and so that the second conversion calculation standard corresponds to a higher spectral resolution of the acoustic input signal in the frequency representation than the first conversion calculation standard.
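One way to realize such signal-dependent switching is sketched below. The crest-factor test and the two window lengths (128 samples for applause-like input, 1024 otherwise) are hypothetical tuning choices, since the description leaves the detector implementation and the concrete transform lengths open.

```python
import numpy as np

def detect_applause(frame, crest_threshold=4.0):
    """Crude transient detector: a high crest factor (peak over RMS)
    within a time-domain frame hints at dense, applause-like transients.
    The threshold is an assumed tuning value, not from the description."""
    rms = np.sqrt(np.mean(frame ** 2)) + 1e-12  # guard against silence
    return float(np.max(np.abs(frame)) / rms) > crest_threshold

def select_stft_length(frame, short=128, long_=1024):
    """First conversion standard (short window, high temporal resolution)
    for applause-like input; second standard (long window, high spectral
    resolution) otherwise."""
    return short if detect_applause(frame) else long_
```

An impulsive frame selects the short window, while a sustained tone (crest factor near √2) selects the long one.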
The applause detector 910 of the signal characteristic determiner 908 can, for example, determine whether the acoustic input signal 104 comprises signals such as applause based on metadata, for example, generated by a user.
The spatial audio processor 900 shown in Figure 9 can also be applied to SAM analysis in a similar manner, with the difference that now the SAM filterbank is controlled by the clap detector 910 of the signal characteristic determiner 908.
In a further embodiment of the present invention, the controllable parameter estimator can determine the spatial parameters using different parameter estimation strategies independently of the determined signal characteristic, so that for each parameter estimation strategy, the controllable parameter estimator determines a set of spatial parameters of the acoustic input signal. The controllable parameter estimator can be further configured to select a set of spatial parameters from the determined sets of spatial parameters as the spatial parameters of the acoustic input signal, and therefore as the result of the estimation process, in dependence on the determined signal characteristic. For example, a first variable spatial parameter calculation standard may comprise: determining spatial parameters of the acoustic input signal for each parameter estimation strategy and selecting the set of spatial parameters determined with a first parameter estimation strategy. A second variable spatial parameter calculation standard may comprise: determining spatial parameters of the acoustic input signal for each parameter estimation strategy and selecting the set of spatial parameters determined with a second parameter estimation strategy.
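The parallel-estimation variant described above might look as follows. The two strategies (no averaging versus a five-frame moving average of direction estimates) and the characteristic labels are illustrative assumptions, not part of the description.

```python
import numpy as np

def estimate_all(doa_per_frame, strategies):
    """Run every parameter estimation strategy independently of the
    signal characteristic, keeping one parameter set per strategy."""
    return {name: fn(doa_per_frame) for name, fn in strategies.items()}

def select_set(parameter_sets, characteristic):
    """Select one of the pre-computed sets in dependence on the
    determined characteristic (assumed mapping: heavy averaging for
    low SNR, raw estimates otherwise)."""
    key = "long_avg" if characteristic == "low_snr" else "no_avg"
    return parameter_sets[key]

# Hypothetical strategies operating on per-frame direction estimates.
strategies = {
    "no_avg": lambda d: np.asarray(d, dtype=float),
    "long_avg": lambda d: np.convolve(d, np.ones(5) / 5.0, mode="same"),
}
```

All sets are computed up front; the signal characteristic only decides which set becomes the processor's output.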
Figure 10 presents a flowchart of a method 1000, in accordance with one embodiment of the present invention.
Method 1000 for providing spatial parameters based on an acoustic input signal comprises a step 1010 of determining a signal characteristic of the acoustic input signal.
Method 1000 further comprises a step 1020 of modifying a variable spatial parameter calculation standard in accordance with the determined signal characteristic.
Method 1000 further comprises a step 1030 of calculating spatial parameters of the acoustic input signal in accordance with the variable spatial parameter calculation standard.
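Steps 1010 to 1030 can be sketched as a small pipeline in which the three operations are supplied as callables, since the method leaves their concrete implementations open.

```python
def provide_spatial_parameters(signal, determine, modify, calculate):
    """Sketch of method 1000: determine a signal characteristic (1010),
    modify the variable calculation standard accordingly (1020), and
    calculate the spatial parameters under that standard (1030)."""
    characteristic = determine(signal)       # step 1010
    standard = modify(characteristic)        # step 1020
    return calculate(signal, standard)       # step 1030
```

For instance, `determine` could be an SNR estimator, `modify` could map the SNR to an averaging length, and `calculate` could be a DirAC-style parameter estimator; the placeholder callables below are purely illustrative.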
Embodiments of the present invention relate to a method that controls parameter estimation strategies in systems for spatial sound representation based on the characteristics of acoustic input signals, i.e., microphone signals.
In the following, some aspects of the embodiments of the present invention will be summarized.
At least some embodiments of the present invention are configured to receive multi-channel acoustic audio signals, i.e., microphone signals. From the acoustic input signals, embodiments of the present invention can determine specific signal characteristics. Based on the signal characteristics, embodiments of the present invention can choose the signal model that best fits. The signal model can then control the parameter estimation strategy. Based on the controlled or selected parameter estimation strategy, embodiments of the present invention can estimate the spatial parameters that best fit for the given acoustic input signal.
Estimation of parametric sound field descriptions relies on specific assumptions about the acoustic input signals. However, this input can have significant temporal variation, and so a general time-invariant model is generally inadequate. In parametric encoding, this problem can be solved by identifying the signal characteristics in advance and then choosing the best encoding strategy in a time-varying manner.
Embodiments of the present invention determine the signal characteristics of the acoustic input signals not in advance, but continuously, e.g., block by block, e.g., for a frequency sub-band and a time period, or for a subset of frequency sub-bands and/or a subset of time periods. Embodiments of the present invention can apply this strategy to acoustic front ends for parametric spatial audio processing and/or spatial audio encoding, such as directional audio coding (DirAC) or spatial audio microphone (SAM).
It is an idea of embodiments of the present invention to utilize time-varying signal-dependent data processing strategies for parameter estimation in parametric spatial audio coding, based on microphone signals or other acoustic input signals.
The embodiments of the present invention have been described with a main focus on parameter estimation in directional audio coding; however, the presented concept can also be applied to other parametric approaches, such as the spatial audio microphone.
Embodiments of the present invention provide a signal adaptive parameter estimate for spatial sound based on acoustic input signals.
Different embodiments of the present invention have been described. Some embodiments of the present invention perform parameter estimation depending on a stationarity interval of the input signals. Further embodiments of the present invention perform parameter estimation depending on double-talk situations. Further embodiments of the present invention perform parameter estimation depending on a signal-to-noise ratio of the input signals. Further embodiments of the present invention perform a parameter estimation based on the mean of the sound intensity vector depending on the input signal-to-noise ratio. Further embodiments of the present invention perform parameter estimation based on an average of the estimated direction parameter depending on the input signal-to-noise ratio. Further embodiments of the present invention perform parameter estimation by choosing a suitable filterbank or a suitable conversion calculation standard depending on the input signal-to-noise ratio. Further embodiments of the present invention perform parameter estimation depending on the tonality of the acoustic input signals. Further embodiments of the present invention perform parameter estimation depending on signals such as applause.
A spatial audio processor can be, in general, a device that processes spatial audio and generates or processes parametric information.
IMPLEMENTATION ALTERNATIVES Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a corresponding method description, where a block or device corresponds to a method step or an aspect of a method step. Similarly, aspects described in the context of a method step also represent a description of a corresponding block or item or aspect of a corresponding apparatus. Some or all of the method steps can be performed by (or using) a hardware device, such as a microprocessor, a programmable computer, or an electronic circuit. In some embodiments, one or more of the most important steps of the method can be performed by this apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be carried out using a digital storage medium, for example, a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system so that the respective method is carried out. Therefore, the digital storage medium can be computer readable.
Some embodiments, in accordance with the invention, comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, so that one of the methods described herein is carried out.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative to perform one of the methods when the computer program product runs on a computer. The program code can, for example, be stored on a machine-readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.
In other words, an embodiment of the inventive method is therefore a computer program having program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is therefore a data carrier (or a digital storage medium, or a computer readable medium) comprising, recorded thereon, the computer program for carrying out one of the methods described herein.
A further embodiment of the inventive method is therefore a data stream or signal sequence representing the computer program for carrying out one of the methods described herein. The data stream or signal sequence can, for example, be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having the computer program installed therein to carry out one of the methods described herein.
In some embodiments, a programmable logic device (e.g., a field programmable gate array) can be used to perform some or all of the functionality of the methods described herein. In some embodiments, a field programmable gate array can cooperate with a microprocessor to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware device.
The embodiments described above are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
Claims (15)
[0001]
1. SPATIAL AUDIO PROCESSOR TO PROVIDE SPATIAL PARAMETERS (102, Φ(k,n), Ψ(k,n)) BASED ON AN ACOUSTIC INPUT SIGNAL (104), the spatial audio processor is characterized by comprising: a signal characteristic determiner (108, 308, 408, 508, 608, 808, 908) configured to determine a signal characteristic (110, 710, 810) of the acoustic input signal (104), wherein the acoustic input signal (104) comprises at least one component; and a controllable parameter estimator (106, 306, 406, 506, 606, 606a, 606b, 806, 906) to calculate the spatial parameters (102, Φ(k,n), Ψ(k,n)) for the acoustic input signal (104) according to a variable spatial parameter calculation standard; wherein the controllable parameter estimator (106, 306, 406, 506, 606, 606a, 606b, 806, 906) is configured to modify the variable spatial parameter calculation standard according to the determined signal characteristic (110, 710, 810).
[0002]
A SPATIAL AUDIO PROCESSOR according to claim 1, wherein the spatial parameters (102) are characterized by comprising a sound direction and/or a sound diffusion and/or a statistical measure of the sound direction.
[0003]
A SPATIAL AUDIO PROCESSOR according to claim 1 or 2, wherein the controllable parameter estimator (106, 306, 406, 506, 606, 606a, 606b, 806, 906) is configured to calculate the spatial parameters (102, Φ(k,n), Ψ(k,n)) as directional audio coding parameters, which are characterized by comprising a diffuseness parameter (Ψ(k,n)) for a time period (n) and for a frequency sub-band (k) and/or a direction of arrival parameter (Φ(k,n)) for a time period (n) and for a frequency sub-band (k), or as spatial audio microphone parameters.
[0004]
A SPATIAL AUDIO PROCESSOR according to any one of claims 1 to 3, characterized in that the signal characteristic determiner (308) is configured to determine a stationarity interval of the acoustic input signal (104); and wherein the controllable parameter estimator (306) is configured to modify the variable spatial parameter calculation standard in accordance with the determined stationarity interval, so that an averaging period for calculating the spatial parameters (102, Ψ(k,n), Φ(k,n)) is comparatively larger for a comparatively larger stationarity interval and is comparatively smaller for a comparatively smaller stationarity interval.
[0005]
5. SPATIAL AUDIO PROCESSOR according to claim 4, characterized in that the controllable parameter estimator (306) is configured to calculate the spatial parameters (102, Ψ(k,n)) of the acoustic input signal (104) for a time period (n) and a frequency sub-band (k) based on at least a time average of signal parameters (Ia(k,n)) of the acoustic input signal (104); and wherein the controllable parameter estimator (306) is configured to vary an averaging period of the time average of the signal parameters (Ia(k,n)) of the acoustic input signal (104) in accordance with the determined stationarity interval.
[0006]
6. SPATIAL AUDIO PROCESSOR according to claim 5, characterized in that the controllable parameter estimator (306) is configured to apply the time average to the signal parameters (Ia(k,n)) of the acoustic input signal (104) using a low-pass filter; wherein the controllable parameter estimator (306) is configured to adjust a weight between a current signal parameter of the acoustic input signal (104) and previous signal parameters of the acoustic input signal (104) based on a weight parameter (α), so that the averaging period is based on the weight parameter (α), so that a weight of the current signal parameter compared to the weight of the previous signal parameters is comparatively high for a comparatively short stationarity interval, and so that the weight of the current signal parameter compared to the weight of the previous signal parameters is comparatively low for a comparatively long stationarity interval.
[0007]
7. SPATIAL AUDIO PROCESSOR according to any one of claims 1 to 6, characterized in that the controllable parameter estimator (406, 506, 906) is configured to select a spatial parameter calculation standard (410, 412) from a plurality of spatial parameter calculation standards (410, 412) to calculate the spatial parameters (102, Ψ(k,n), Φ(k,n)) in dependence on the determined signal characteristic (110).
[0008]
8. SPATIAL AUDIO PROCESSOR according to claim 7, characterized in that the controllable parameter estimator (406, 506) is configured so that a first spatial parameter calculation standard (410) of the plurality of spatial parameter calculation standards (410, 412) is different from a second spatial parameter calculation standard (412) of the plurality of spatial parameter calculation standards (410, 412), and wherein the first spatial parameter calculation standard (410) and the second spatial parameter calculation standard (412) are selected from a group consisting of: time averaging over a plurality of time periods in a frequency sub-band, frequency averaging over a plurality of frequency sub-bands in a time period, a combination of time averaging and frequency averaging, and no averaging.
[0009]
A SPATIAL AUDIO PROCESSOR according to any one of claims 1 to 8, wherein the signal characteristic determiner (408) is configured to determine whether the acoustic input signal (104) is characterized by comprising components from different sound sources audible at the same time, or wherein the signal characteristic determiner (508) is configured to determine a tonality of the acoustic input signal (104); wherein the controllable parameter estimator (406, 506) is configured to select, in accordance with a result of the signal characteristic determination, a spatial parameter calculation standard (410, 412) from a plurality of spatial parameter calculation standards (410, 412) to calculate the spatial parameters (102, Ψ(k,n), Φ(k,n)), so that a first spatial parameter calculation standard (410) of the plurality of spatial parameter calculation standards (410, 412) is chosen when the acoustic input signal (104) comprises components of at most one sound source or when the tonality of the acoustic input signal (104) is below a certain tonality threshold level, and so that a second spatial parameter calculation standard (412) of the plurality of spatial parameter calculation standards (410, 412) is chosen when the acoustic input signal (104) comprises components from more than one sound source at the same time or when the tonality of the acoustic input signal (104) is above the certain tonality threshold level; wherein the first spatial parameter calculation standard (410) includes a frequency average over a first number of frequency sub-bands (k) and the second spatial parameter calculation standard (412) includes a frequency average over a second number of frequency sub-bands (k) or does not include a frequency average; and wherein the first number is greater than the second number.
[0010]
10. SPATIAL AUDIO PROCESSOR according to any one of claims 1 to 9, characterized in that the signal characteristic determiner (608) is configured to determine a signal-to-noise ratio (110, 710) of the acoustic input signal (104); wherein the controllable parameter estimator (606, 606a, 606b) is configured to apply a time average over a plurality of time periods in a frequency sub-band (k), a frequency average over a plurality of frequency sub-bands (k) in a time period (n), a spatial average, or a combination of these; and wherein the controllable parameter estimator (606, 606a, 606b) is configured to vary an averaging period of the time average, the frequency average, the spatial average or a combination thereof, according to the determined signal-to-noise ratio (110, 710), so that the averaging period is comparatively longer for a comparatively smaller signal-to-noise ratio (110, 710) of the acoustic input signal and so that the averaging period is comparatively shorter for a comparatively higher signal-to-noise ratio (110, 710) of the acoustic input signal (104).
[0011]
11. SPATIAL AUDIO PROCESSOR according to claim 10, characterized in that the controllable parameter estimator (606a, 606b) is configured to apply the time average to a subset of intensity parameters (Ia(k,n)) over a plurality of time periods and a frequency sub-band (k), or to a subset of direction of arrival parameters (Φ(k,n)) over a plurality of time periods and a frequency sub-band (k); and wherein a number of intensity parameters (Ia(k,n)) in the intensity parameter subset or a number of direction of arrival parameters (Φ(k,n)) in the direction of arrival parameter subset corresponds to the averaging period of time, so that the number of intensity parameters (Ia(k,n)) in the intensity parameter subset or the number of direction of arrival parameters (Φ(k,n)) in the direction of arrival parameter subset is comparatively smaller for a comparatively larger signal-to-noise ratio (110, 710) of the acoustic input signal (104), and so that the number of intensity parameters (Ia(k,n)) in the intensity parameter subset or the number of direction of arrival parameters (Φ(k,n)) in the direction of arrival parameter subset is comparatively larger for a comparatively smaller signal-to-noise ratio (110, 710) of the acoustic input signal (104).
[0012]
12. SPATIAL AUDIO PROCESSOR according to any one of claims 10 to 11, characterized in that the signal characteristic determiner (608) is configured to provide the signal-to-noise ratio (110, 710) of the acoustic input signal (104) as a plurality of signal-to-noise ratio parameters of the acoustic input signal (104), each signal-to-noise ratio parameter of the acoustic input signal (104) being associated with a frequency sub-band and a time period; wherein the controllable parameter estimator (606a, 606b) is configured to receive a target signal-to-noise ratio (712) as a plurality of target signal-to-noise ratio parameters, each target signal-to-noise ratio parameter being associated with a frequency sub-band and a time period; and wherein the controllable parameter estimator (606a, 606b) is configured to vary the averaging period of time in accordance with a current signal-to-noise ratio parameter of the acoustic input signal, so that a current signal-to-noise ratio parameter (102) tries to match a current target signal-to-noise ratio parameter.
[0013]
A SPATIAL AUDIO PROCESSOR according to any one of claims 1 to 12, wherein the signal characteristic determiner (908) is configured to determine whether the acoustic input signal (104) is characterized by comprising transient components corresponding to signals such as applause; wherein the controllable parameter estimator (906) comprises a filterbank (912) that is configured to convert the acoustic input signal (104) from a time domain to a frequency representation based on a conversion calculation standard; and wherein the controllable parameter estimator (906) is configured to choose the conversion calculation standard for converting the acoustic input signal (104) from the time domain to the frequency representation from a plurality of conversion calculation standards, in accordance with a result of the signal characteristic determination, so that a first conversion calculation standard from the plurality of conversion calculation standards is chosen to convert the acoustic input signal (104) from the time domain to the frequency representation when the acoustic input signal comprises components corresponding to signals such as applause, and so that a second conversion calculation standard from the plurality of conversion calculation standards is chosen to convert the acoustic input signal (104) from the time domain to the frequency representation when the acoustic input signal does not comprise components corresponding to signals such as applause.
[0014]
14. METHOD FOR PROVIDING SPATIAL PARAMETERS BASED ON AN ACOUSTIC INPUT SIGNAL, the method being characterized by comprising: determining (1010) a signal characteristic of the acoustic input signal; modifying (1020) a variable spatial parameter calculation standard in accordance with the determined signal characteristic; and calculating (1030) spatial parameters of the acoustic input signal in accordance with the variable spatial parameter calculation standard.
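The three claimed steps can be sketched as a generic pipeline; every callable and the rule table below are illustrative placeholders, not the patent's concrete components:

```python
def provide_spatial_parameters(x, determine_characteristic, rules, estimate):
    """Generic three-step pipeline mirroring the claimed method:
    (1010) determine a signal characteristic of the input,
    (1020) pick/modify the calculation standard accordingly,
    (1030) compute the spatial parameters under that standard."""
    characteristic = determine_characteristic(x)   # step (1010)
    calculation_standard = rules[characteristic]   # step (1020)
    return estimate(x, calculation_standard)       # step (1030)
```

For instance, plugging in a trivial length-based classifier, a two-entry rule table, and a scaling "estimator" exercises the control flow without any acoustic processing:

```python
result = provide_spatial_parameters(
    [1, 2, 3],
    lambda sig: "stationary" if len(sig) > 2 else "transient",
    {"stationary": 2, "transient": 1},
    lambda sig, rule: [v * rule for v in sig])
```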
[0015]
15. NON-TRANSITORY STORAGE MEDIUM having computer-readable instructions recorded thereon, characterized in that the instructions, when executed, perform the method of claim 14.
Similar technologies:
Publication number | Publication date | Patent title
BR112012025013B1|2021-08-31|A spatial audio processor and a method for providing spatial parameters based on an acoustic input signal
US20200194013A1|2020-06-18|Apparatus and Method for Estimating an Inter-Channel Time Difference
KR101591220B1|2016-02-03|Apparatus and method for microphone positioning based on a spatial power density
JP2017503388A|2017-01-26|Extraction of reverberation using a microphone array
JP2010541350A|2010-12-24|Apparatus and method for extracting ambient signal in apparatus and method for obtaining weighting coefficient for extracting ambient signal, and computer program
WO2010061505A1|2010-06-03|Uttered sound detection apparatus
JP6636633B2|2020-01-29|Acoustic signal processing apparatus and method for improving acoustic signal
WO2017202680A1|2017-11-30|Method and apparatus for voice or sound activity detection for spatial audio
RU2762302C1|2021-12-17|Apparatus, method, or computer program for estimating the time difference between channels
BR112021007807A2|2021-07-27|analyzer, similarity evaluator, audio encoder and decoder, format converter, renderer, methods and audio representation
CN113646836A|2021-11-12|Sound field dependent rendering
AU2017229323B2|2020-01-16|A method and apparatus for increasing stability of an inter-channel time difference parameter
TW201921338A|2019-06-01|Temporal offset estimation
Cho et al.2009|Underdetermined audio source separation from anechoic mixtures with long time delay
Herzog et al.2021|Signal-Dependent Mixing for Direction-Preserving Multichannel Noise Reduction
BR112021010964A2|2021-08-31|DEVICE AND METHOD TO GENERATE A SOUND FIELD DESCRIPTION
Habib et al.2008|Experimental evaluation of multi-band position-pitch estimation algorithm for multi-speaker localization|
Family patents:
Publication number | Publication date
JP2013524267A|2013-06-17|
US9626974B2|2017-04-18|
EP2543037B1|2014-03-05|
EP2375410A1|2011-10-12|
RU2012145972A|2014-11-27|
PL2543037T3|2014-08-29|
CA2794946C|2017-02-28|
ES2452557T3|2014-04-01|
EP2543037B8|2014-04-23|
KR20130007634A|2013-01-18|
US20170134876A1|2017-05-11|
US10327088B2|2019-06-18|
RU2596592C2|2016-09-10|
CA2794946A1|2011-10-06|
AU2011234772A1|2012-11-08|
CN102918588B|2014-11-05|
HK1180824A1|2013-10-25|
JP5706513B2|2015-04-22|
US20130022206A1|2013-01-24|
EP2375410B1|2017-11-22|
BR112012025013A2|2020-10-13|
CN102918588A|2013-02-06|
KR101442377B1|2014-09-17|
MX2012011203A|2013-02-15|
EP2543037A1|2013-01-09|
AU2011234772B2|2014-09-04|
WO2011120800A1|2011-10-06|
ES2656815T3|2018-02-28|
Cited documents:
Publication number | Application date | Publication date | Applicant | Patent title

JP3812887B2|2001-12-21|2006-08-23|Fujitsu Limited|Signal processing system and method|
AU2003281128A1|2002-07-16|2004-02-02|Koninklijke Philips Electronics N.V.|Audio coding|
RU2383941C2|2005-06-30|2010-03-10|LG Electronics Inc.|Method and device for encoding and decoding audio signals|
JP2007178684A|2005-12-27|2007-07-12|Matsushita Electric Ind Co Ltd|Multi-channel audio decoding device|
US20080232601A1|2007-03-21|2008-09-25|Ville Pulkki|Method and apparatus for enhancement of audio reconstruction|
US8180062B2|2007-05-30|2012-05-15|Nokia Corporation|Spatial sound zooming|
US8209190B2|2007-10-25|2012-06-26|Motorola Mobility, Inc.|Method and apparatus for generating an enhancement layer within an audio coding system|
EP2229676B1|2007-12-31|2013-11-06|LG Electronics Inc.|A method and an apparatus for processing an audio signal|
US8386267B2|2008-03-19|2013-02-26|Panasonic Corporation|Stereo signal encoding device, stereo signal decoding device and methods for them|
KR101629862B1|2008-05-23|2016-06-24|Koninklijke Philips N.V.|A parametric stereo upmix apparatus, a parametric stereo decoder, a parametric stereo downmix apparatus, a parametric stereo encoder|
EP2146344B1|2008-07-17|2016-07-06|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Audio encoding/decoding scheme having a switchable bypass|
EP2154910A1|2008-08-13|2010-02-17|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus for merging spatial audio streams|
CN101673549B|2009-09-28|2011-12-14|Wuhan University|Spatial audio parameters prediction coding and decoding methods of movable sound source and system|CN103636236B|2011-07-01|2016-11-09|Dolby Laboratories Licensing Corporation|Audio playback system monitors|
JP5752324B2|2011-07-07|2015-07-22|Nuance Communications, Inc.|Single channel suppression of impulsive interference in noisy speech signals.|
US9479886B2|2012-07-20|2016-10-25|Qualcomm Incorporated|Scalable downmix design with feedback for object-based surround codec|
US9761229B2|2012-07-20|2017-09-12|Qualcomm Incorporated|Systems, methods, apparatus, and computer-readable media for audio object clustering|
US20140355769A1|2013-05-29|2014-12-04|Qualcomm Incorporated|Energy preservation for decomposed representations of a sound field|
EP3017446B1|2013-07-05|2021-08-25|Dolby International AB|Enhanced soundfield coding using parametric component generation|
CN104299615B|2013-07-16|2017-11-17|华为技术有限公司|Level difference processing method and processing device between a kind of sound channel|
KR102231755B1|2013-10-25|2021-03-24|Samsung Electronics Co., Ltd.|Method and apparatus for 3D sound reproducing|
KR102112018B1|2013-11-08|2020-05-18|Electronics and Telecommunications Research Institute|Apparatus and method for cancelling acoustic echo in teleconference system|
EP2884491A1|2013-12-11|2015-06-17|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Extraction of reverberant sound using microphone arrays|
US9922656B2|2014-01-30|2018-03-20|Qualcomm Incorporated|Transitioning of ambient higher-order ambisonic coefficients|
US10770087B2|2014-05-16|2020-09-08|Qualcomm Incorporated|Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals|
US9462406B2|2014-07-17|2016-10-04|Nokia Technologies Oy|Method and apparatus for facilitating spatial audio capture with multiple devices|
CN105336333B|2014-08-12|2019-07-05|北京天籁传音数字技术有限公司|Multi-channel sound signal coding method, coding/decoding method and device|
CN105989851B|2015-02-15|2021-05-07|杜比实验室特许公司|Audio source separation|
EP3264802A1|2016-06-30|2018-01-03|Nokia Technologies Oy|Spatial audio processing for moving sound sources|
CN107731238B|2016-08-10|2021-07-16|华为技术有限公司|Coding method and coder for multi-channel signal|
CN107785025B|2016-08-25|2021-06-22|上海英波声学工程技术股份有限公司|Noise removal method and device based on repeated measurement of room impulse response|
EP3297298B1|2016-09-19|2020-05-06|A-Volute|Method for reproducing spatially distributed sounds|
US10187740B2|2016-09-23|2019-01-22|Apple Inc.|Producing headphone driver signals in a digital audio signal processing binaural rendering environment|
US10020813B1|2017-01-09|2018-07-10|Microsoft Technology Licensing, Llc|Scaleable DLL clocking system|
JP6788272B2|2017-02-21|2020-11-25|OnFuture Ltd.|Sound source detection method and its detection device|
JP2020525853A|2017-07-03|2020-08-27|Dolby International AB|Reduced complexity of dense transient detection and coding|
EP3692704A1|2017-10-03|2020-08-12|Bose Corporation|Spatial double-talk detector|
US10165388B1|2017-11-15|2018-12-25|Adobe Systems Incorporated|Particle-based spatial audio visualization|
CN109831731B|2019-02-15|2020-08-04|杭州嘉楠耘智信息科技有限公司|Sound source orientation method and device and computer readable storage medium|
CN110007276B|2019-04-18|2021-01-12|太原理工大学|Sound source positioning method and system|
US10964305B2|2019-05-20|2021-03-30|Bose Corporation|Mitigating impact of double talk for residual echo suppressors|
Legal status:
2020-10-20| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2021-02-17| B06A| Patent application procedure suspended [chapter 6.1 patent gazette]|
2021-06-15| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2021-08-31| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 16/03/2011, SUBJECT TO THE LEGAL CONDITIONS. PATENT GRANTED IN ACCORDANCE WITH ADI 5.529/DF, WHICH DETERMINES THE ALTERATION OF THE GRANT TERM. |
Priority:
Application number | Application date | Patent title
US31868910P|2010-03-29|2010-03-29|
US61/318,689|2010-03-29|
EP10186808.1|2010-10-07|
EP10186808.1A|EP2375410B1|2010-03-29|2010-10-07|A spatial audio processor and a method for providing spatial parameters based on an acoustic input signal|
PCT/EP2011/053958|WO2011120800A1|2010-03-29|2011-03-16|A spatial audio processor and a method for providing spatial parameters based on an acoustic input signal|