Sound acquisition by extracting geometric information from arrival direction estimates
Patent abstract:
SOUND ACQUISITION THROUGH THE EXTRACTION OF GEOMETRIC INFORMATION FROM ARRIVAL DIRECTION ESTIMATES. An apparatus for generating an audio output signal to simulate a recording from a virtual microphone at a configurable virtual position in an environment is provided. The apparatus comprises a sound event position estimator (110) and an information computation module (120). The sound event position estimator (110) is adapted to estimate a sound source position indicating a position of a sound source in the environment, wherein the sound event position estimator (110) is adapted to estimate the sound source position based on first direction information provided by a first real spatial microphone located at a first real microphone position in the environment, and based on second direction information provided by a second real spatial microphone located at a second real microphone position in the environment. The information computation module (120) is adapted to generate the audio output signal based on a first recorded audio input signal, based on the first (...).
Publication number: BR112013013681B1
Application number: R112013013681-2
Application date: 2011-12-02
Publication date: 2020-12-29
Inventors: Giovanni Del Galdo; Herre Jürgen; Küch Fabian; Thiergart Oliver; Kuntz Achim; Kallinger Markus; Mahne Dirk; Kratschmer Michael; Craciun Alexandra
Applicants: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.; Friedrich-Alexander-Universität Erlangen-Nürnberg
IPC main class:
Patent description:
The present invention relates to audio processing and, in particular, to an apparatus and method for sound acquisition by extracting geometric information from arrival direction estimates. Traditional spatial sound recording aims to capture a sound field with multiple microphones such that, on the reproduction side, a listener perceives the sound image as it was at the recording location. Standard approaches to spatial sound recording usually use spaced omnidirectional microphones, for example in AB stereophony, or coincident directional microphones, for example in intensity stereophony, or more sophisticated microphones, such as a B-format microphone, as in Ambisonics; see, for example, [1] R. K. Furness, "Ambisonics - An overview," in AES 8th International Conference, April 1990, pp. 181-189. For sound reproduction, these non-parametric approaches derive the desired audio reproduction signals (for example, the signals to be sent to the loudspeakers) directly from the recorded microphone signals. Alternatively, methods based on a parametric representation of sound fields can be applied, which are referred to as parametric spatial audio coders. These methods often employ microphone arrays to determine one or more audio downmix signals together with spatial side information describing the spatial sound. Examples are Directional Audio Coding (DirAC) or the so-called spatial audio microphones (SAM) approach. More details on DirAC can be found in [2] V. Pulkki, "Directional audio coding in spatial sound reproduction and stereo upmixing," in Proceedings of the AES 28th International Conference, pp. 251-258, Piteå, Sweden, June 30 - July 2, 2006, and [3] V. Pulkki, "Spatial sound reproduction with directional audio coding," J. Audio Eng. Soc., vol. 55, no. 6, pp. 503-516, June 2007. For more details on the spatial audio microphone approach, see [4] C. Faller, "Microphone Front-Ends for Spatial Audio Coders," in Proceedings of the AES 125th International Convention, San Francisco, Oct. 2008. In DirAC, for example, the spatial cue information comprises the direction of arrival (DOA) of the sound and the diffuseness of the sound field, computed in a time-frequency domain. For sound reproduction, the audio reproduction signals can be derived based on this parametric description. In some applications, spatial sound acquisition aims to capture an entire sound scene. In other applications, spatial sound acquisition only aims to capture certain desired components. Close-talking microphones are often used to record individual sound sources with a high signal-to-noise ratio (SNR) and low reverberation, while more distant configurations such as XY stereophony represent a way to capture the spatial image of an entire sound scene. More flexibility with respect to directivity can be achieved with beamforming, where a microphone array can be used to realize steerable pick-up patterns. Even more flexibility is provided by the methods mentioned above, such as Directional Audio Coding (DirAC) (see [2], [3]), in which it is possible to realize spatial filters with arbitrary pick-up patterns, as described in [5] M. Kallinger, H. Ochsenfeld, G. Del Galdo, F. Küch, D. Mahne, R. Schultz-Amling, and O.
Thiergart, "A spatial filtering approach for directional audio coding," in Audio Engineering Society Convention 126, Munich, Germany, May 2009, as well as other signal processing manipulations of the sound scenario, see, for example, [6 ] R. Schultz-Anil ing, F. Küch, 0. Thiergart, and M. Kallinger, Acoustical zooming based on a parametric sound field representation, "in Audio Engineering Society Convention 128, London UK, May 2010, [7] J. Herre, C. Falch, D. Mahne, G. Del Galdo, M. Kallinger, and 0 Thiergart, "Interactive teleconferencing combining spatial audio object coding and DirAC technology," in Audio Engineering Society Convention 128, London UK, May 2010. All the concepts mentioned above have in common that the microphones are arranged in a known fixed geometry. The spacing between the microphones is as small as possible for coincident feedback 25, while it is usually a few centimeters in other methods. In the following, we refer to any device for recording spatial sound capable of retrieving the direction of arrival of the sound (for example, a combination of directional microphones or a set of microphones, etc.) as a space microphone. In addition, all the methods mentioned above have in common to be limited to a representation of the sound field with respect to only one point, namely, the measurement location. Thus, the desired microphones must be placed in very specific positions, carefully selected, for example, close to the sources or so that the spatial image can be captured optimally. However, in many applications this is not feasible and therefore it would be useful to place several microphones further away from the sound sources and still be able to capture the sound as desired. There are several methods of field reconstruction to estimate the sound field at a point in space beyond where it was measured. One method is acoustic holography, as described in [8] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography, Academic Press, 1999. Acoustic holography allows you to calculate the sound field at any point with an arbitrary volume, given that the sound pressure and particle speed are known across the surface. Therefore, when the volume is large, a large number of impractical sensors are needed. In addition, the method assumes that no sound source is present within the volume, making the algorithm unfeasible for our needs. The extrapolation of the related wave field (see also [8]) aims to extrapolate the known sound field on the surface of a volume to external regions. However, the extrapolation accuracy degrades rapidly for longer extrapolation distances as well as for extrapolations in directions orthogonal to the direction of sound propagation, see [9] A. Kuntz and R. Rabenstein, "Limitations in the extrapolation of wave fields from circular measurements , "in 15th European Signal Processing 5 Conference (EUSIPCO 2007), 2007. A. Walther and C. Faller, "Linear simulation of spaced microphone arrays using b-format recordings," in Audio Engineering Society Convention 128, London UK, May 2010, describes a plane wave model, where field extrapolation is only 10 possible at points far from real sound sources, for example, close to the measurement point. A major disadvantage of traditional approaches is that the recorded spatial image is always relative to the spatial microphone used. In many applications, it is not possible or feasible to place a space microphone in the desired position, for example, close to the sound sources. 
In this case, it would be more beneficial to place several spatial microphones farther away from the sound scene and still be able to capture the sound as desired. US 61/287,596, "An Apparatus and a Method for Converting a First Parametric Spatial Audio Signal into a Second Parametric Spatial Audio Signal," proposes a method for virtually moving the actual recording position to another position when playing back over loudspeakers or headphones. However, this approach is limited to a simple sound scene, in which all sound objects are assumed to be at equal distance to the actual spatial microphone used for the recording. Furthermore, the method can only make use of one spatial microphone. It is an object of the present invention to provide improved concepts for sound acquisition through the extraction of geometric information. The object of the present invention is solved by an apparatus according to claim 1, by a method according to claim 24, and by a computer program according to claim 25. According to an embodiment, an apparatus for generating an audio output signal to simulate a recording from a virtual microphone at a configurable virtual position in an environment is provided. The apparatus comprises a sound event position estimator and an information computation module. The sound event position estimator is adapted to estimate a sound source position indicating a position of a sound source in the environment, wherein the sound event position estimator is adapted to estimate the sound source position based on first direction information provided by a first real spatial microphone located at a first real microphone position in the environment, and based on second direction information provided by a second real spatial microphone located at a second real microphone position in the environment. The information computation module is adapted to generate the audio output signal based on a first recorded audio input signal recorded by the first real spatial microphone, based on the first real microphone position, based on the virtual position of the virtual microphone, and based on the position of the sound source. In an embodiment, the information computation module comprises a propagation compensator, wherein the propagation compensator is adapted to generate a first modified audio signal by modifying the first recorded audio input signal based on a first amplitude decay between the sound source and the first real spatial microphone and based on a second amplitude decay between the sound source and the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the first recorded audio input signal, to obtain the audio output signal. In an embodiment, the first amplitude decay may be an amplitude decay of a sound wave emitted by the sound source, and the second amplitude decay may be an amplitude decay of the sound wave emitted by the sound source. According to another embodiment, the information computation module comprises a propagation compensator adapted to generate a first modified audio signal by modifying the first recorded audio input signal by compensating for a first delay between an arrival of the sound wave
emitted by the sound source at the first real spatial microphone and an arrival of the sound wave at the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the first recorded audio input signal, to obtain the audio output signal. According to an embodiment, the use of two or more spatial microphones is assumed, which are referred to as real spatial microphones in the following. For each real spatial microphone, the DOA of the sound can be estimated in the time-frequency domain. From the information gathered by the real spatial microphones, together with knowledge of their relative positions, it is possible to compute the output signal of an arbitrary spatial microphone virtually placed at an arbitrary position in the environment. This spatial microphone is referred to as the virtual spatial microphone in the following. Note that the direction of arrival (DOA) can be expressed as an azimuth angle in the 2D case, or by a pair of azimuth and elevation angles in 3D. Equivalently, a unit-norm vector pointing in the DOA can be used.
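As a brief illustration of these DOA representations, the conversion from an azimuth/elevation pair to a unit-norm direction vector can be sketched as follows (Python is used for illustration only; the function name and axis convention are assumptions, not part of this disclosure):

```python
import numpy as np

def doa_to_unit_vector(azimuth_rad: float, elevation_rad: float = 0.0) -> np.ndarray:
    """Convert a DOA (azimuth, elevation) into a unit-norm vector.

    Convention (assumed): x points forward, y left, z up; in the 2D case
    (elevation = 0) the result reduces to [cos(azimuth), sin(azimuth), 0].
    """
    return np.array([
        np.cos(elevation_rad) * np.cos(azimuth_rad),
        np.cos(elevation_rad) * np.sin(azimuth_rad),
        np.sin(elevation_rad),
    ])

# Example: a sound arriving from 45 degrees azimuth in the horizontal plane.
e = doa_to_unit_vector(np.deg2rad(45.0))
assert np.isclose(np.linalg.norm(e), 1.0)  # unit-norm by construction
```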
In embodiments, means are provided to capture sound in a spatially selective manner; for example, sound originating from a specific target location can be picked up, as if a "spot microphone" had been installed at that location. Instead of actually installing this spot microphone, its output signal can be simulated using two or more spatial microphones placed at other, distant positions. The term "spatial microphone" refers to any device for the acquisition of spatial sound capable of retrieving the direction of arrival of the sound (for example, a combination of directional microphones, a microphone array, etc.). The term "non-spatial microphone" refers to any device that is not adapted to retrieve the direction of arrival of the sound, such as a single directional or omnidirectional microphone. It should be noted that the term "real spatial microphone" refers to a spatial microphone, as defined above, that physically exists. With reference to the virtual spatial microphone, it should be noted that the virtual spatial microphone can represent any desired microphone type or microphone combination; that is, it can, for example, represent a single omnidirectional microphone, a directional microphone, or a pair of directional microphones as used in common stereo microphones, but also a microphone array. The present invention is based on the finding that, when two or more real spatial microphones are used, the position of sound events in 2D or 3D space can be estimated, so that position localization can be achieved. Using the determined positions of the sound events, the sound signal that would have been recorded by a virtual spatial microphone arbitrarily placed and oriented in space can be computed, as well as the corresponding spatial side information, such as the direction of arrival from the point of view of the virtual spatial microphone. For this purpose, each sound event may be assumed to represent a point-like sound source, for example an isotropic point-like sound source. In the following, "real sound source" refers to an actual sound source physically existing in the recording environment, such as a talker or a musical instrument. On the contrary, by "sound source" or "sound event" we refer in the following to an effective sound source, which is active at a certain time instant or at a certain time-frequency slot, where the sound sources may, for example, represent real sound sources or mirror image sources. According to an embodiment, it is implicitly assumed that the sound scene can be modeled as a multitude of such sound events or point-like sound sources. In addition, each source can be assumed to be active only within a specific time-frequency slot of a predefined time-frequency representation. The distance between the real spatial microphones may further be such that the resulting temporal difference in propagation time is shorter than the temporal resolution of the time-frequency representation. The latter assumption ensures that a given sound event is picked up by all spatial microphones within the same time slot. This implies that the DOAs estimated at different spatial microphones for the same time-frequency slot indeed correspond to the same sound event. This assumption is easily met with real spatial microphones placed a few meters apart in large rooms (such as living rooms or conference rooms), with a temporal resolution of a few milliseconds. Microphone arrays can be used to localize sound sources. Localized sound sources can have different physical interpretations depending on their nature. When the microphone arrays receive direct sound, they can localize the position of a true sound source (for example, a talker). When the microphone arrays receive reflections, they can localize the position of a mirror image source. Mirror image sources are also sound sources. A parametric method capable of estimating the sound signal of a virtual microphone placed at an arbitrary location is provided. In contrast to the previously described methods, the proposed method does not aim at directly reconstructing the sound field, but rather at providing a sound that is perceptually similar to the one that would be picked up by a microphone physically placed at this location. This can be achieved by employing a parametric model of the sound field based on point-like sound sources, for example isotropic point-like sound sources (IPLS). The required geometric information, namely the instantaneous position of all IPLS, can be obtained by triangulating the arrival directions estimated with two or more distributed microphone arrays. This requires knowledge of the relative position and orientation of the arrays. However, no a priori knowledge of the number and position of the actual sound sources (e.g., talkers) is necessary. Given the parametric nature of the proposed concepts, for example of the proposed apparatus or method, the virtual microphone can realize an arbitrary directivity pattern as well as arbitrary physical or non-physical behavior, for example with respect to pressure decay with distance. The presented approach was verified by studying the parameter estimation accuracy based on measurements in a reverberant environment. While conventional recording techniques for spatial audio are limited insofar as the spatial image obtained is always relative to the position at which the microphones were physically placed, embodiments of the present invention take into account that in many applications it is desired to place the microphones outside the sound scene and still be able to capture the sound from an arbitrary perspective. According to embodiments, concepts are provided to virtually place a virtual microphone at an arbitrary point in space by computing a signal perceptually similar to the one that would be picked up if the microphone were physically placed in the sound scene.
Embodiments may employ concepts using a parametric model of the sound field based on point-like sound sources, for example isotropic point-like sound sources. The required geometric information can be gathered by two or more distributed microphone arrays. According to an embodiment, the sound event position estimator may be adapted to estimate the position of the sound source based on a first direction of arrival of the sound wave emitted by the sound source at the first real microphone position as the first direction information, and based on a second direction of arrival of the sound wave at the second real microphone position as the second direction information. In another embodiment, the information computation module may comprise a spatial side information computation module for computing spatial side information. The information computation module may be adapted to estimate the direction of arrival or the active sound intensity at the virtual microphone as spatial side information, based on a position vector of the virtual microphone and based on a position vector of the sound event. According to another embodiment, the propagation compensator may be adapted to generate the first modified audio signal in a time-frequency domain, by compensating for the first delay or amplitude decay between the arrival of the sound wave emitted by the sound source at the first real spatial microphone and the arrival of the sound wave at the virtual microphone, by adjusting said magnitude value of the first recorded audio input signal represented in a time-frequency domain. In an embodiment, the propagation compensator may be adapted to conduct the propagation compensation by generating a modified magnitude value of the first modified audio signal by applying the formula:

|Pv(k, n)| = ( d1(k, n) / s(k, n) ) · |Pref(k, n)|,

where d1(k, n) is the distance between the position of the first real spatial microphone and the position of the sound event, where s(k, n) is the distance between the virtual position of the virtual microphone and the sound source position of the sound event, where |Pref(k, n)| is the magnitude value of the first recorded audio input signal represented in a time-frequency domain, and where |Pv(k, n)| is the modified magnitude value. In another embodiment, the information computation module may moreover comprise a combiner, wherein the propagation compensator may further be adapted to modify a second recorded audio input signal, recorded by the second real spatial microphone, by compensating for a second delay or amplitude decay between an arrival of the sound wave emitted by the sound source at the second real spatial microphone and an arrival of the sound wave at the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the second recorded audio input signal, to obtain a second modified audio signal, and wherein the combiner may be adapted to generate a combination signal by combining the first modified audio signal and the second modified audio signal, to obtain the audio output signal.
According to another embodiment, the propagation compensator may further be adapted to modify one or more further recorded audio input signals, recorded by one or more further real spatial microphones, by compensating for delays between an arrival of the sound wave at the virtual microphone and an arrival of the sound wave emitted by the sound source at each of the further real spatial microphones. Each of the delays or amplitude decays may be compensated by adjusting an amplitude value, a magnitude value or a phase value of each of the further recorded audio input signals, to obtain a plurality of third modified audio signals. The combiner may be adapted to generate a combination signal by combining the first modified audio signal, the second modified audio signal and the plurality of third modified audio signals, to obtain the audio output signal. In another embodiment, the information computation module may comprise a spectral weighting unit for generating a weighted audio signal by modifying the first modified audio signal depending on a direction of arrival of the sound wave at the virtual position of the virtual microphone and depending on a virtual orientation of the virtual microphone, to obtain the audio output signal, wherein the first modified audio signal may be modified in a time-frequency domain. Moreover, the information computation module may comprise a spectral weighting unit for generating a weighted audio signal by modifying the combination signal depending on a direction of arrival of the sound wave at the virtual position of the virtual microphone and on a virtual orientation of the virtual microphone, to obtain the audio output signal, wherein the combination signal may be modified in a time-frequency domain. According to another embodiment, the spectral weighting unit may be adapted to apply the weighting factor

α + (1 − α) cos(φv(k, n)),

or the weighting factor

0.5 + 0.5 cos(φv(k, n)),

to the weighted audio signal, wherein φv(k, n) indicates an angle specifying the direction of arrival of the sound wave emitted by the sound source at the virtual position of the virtual microphone. In an embodiment, the propagation compensator is further adapted to generate a third modified audio signal by modifying a third recorded audio input signal, recorded by an omnidirectional microphone, by compensating for a third delay or amplitude decay between an arrival of the sound wave emitted by the sound source at the omnidirectional microphone and an arrival of the sound wave at the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the third recorded audio input signal, to obtain the audio output signal. In another embodiment, the sound event position estimator may be adapted to estimate a sound source position in a three-dimensional environment. Furthermore, according to another embodiment, the information computation module may further comprise a diffuseness computation unit adapted to estimate a diffuse sound energy at the virtual microphone or a direct sound energy at the virtual microphone.
The diffuseness computation unit may, according to another embodiment, be adapted to estimate the diffuse sound energy Ediff(VM) at the virtual microphone by applying the formula:

Ediff(VM) = (1/N) · Σ_{i=1..N} Ediff(SMi),

where N is the number of the plurality of real spatial microphones comprising the first and the second real spatial microphone, and where Ediff(SMi) is the diffuse sound energy at the i-th real spatial microphone. In another embodiment, the diffuseness computation unit may be adapted to estimate the direct sound energy by applying the formula:

Edir(VM) = ( distance SMi-IPLS / distance VM-IPLS )² · Edir(SMi),

where "distance SMi-IPLS" is the distance between the position of the i-th real microphone and the position of the sound source, where "distance VM-IPLS" is the distance between the virtual position and the position of the sound source, and where Edir(SMi) is the direct energy at the i-th real spatial microphone. Moreover, according to another embodiment, the diffuseness computation unit may further be adapted to estimate the diffuseness at the virtual microphone by estimating the diffuse sound energy at the virtual microphone and the direct sound energy at the virtual microphone and by applying the formula:

ψ(VM) = Ediff(VM) / ( Ediff(VM) + Edir(VM) ),

where ψ(VM) indicates the diffuseness at the virtual microphone being estimated, where Ediff(VM) indicates the diffuse sound energy being estimated, and where Edir(VM) indicates the direct sound energy being estimated. Preferred embodiments of the present invention will be described below, in which:
Fig. 1 illustrates an apparatus for generating an audio output signal according to an embodiment,
Fig. 2 illustrates the inputs and outputs of an apparatus and a method for generating an audio output signal according to an embodiment,
Fig. 3 illustrates the basic structure of an apparatus according to an embodiment, comprising a sound event position estimator and an information computation module,
Fig. 4 shows an exemplary scenario in which the real spatial microphones are depicted as Uniform Linear Arrays of 3 microphones each,
Fig. 5 depicts a scenario in which the position of a sound event is estimated in 3D space,
Fig. 6 illustrates a geometry where an isotropic point-like sound source of the current time-frequency slot (k, n) is located at a position pIPLS(k, n),
Fig. 7 depicts the information computation module according to an embodiment,
Fig. 8 depicts the information computation module according to another embodiment,
Fig. 9 shows two real spatial microphones, a localized sound event and the position of a virtual spatial microphone, together with the corresponding delays and amplitude decays,
Fig. 10 illustrates how to obtain the direction of arrival relative to a virtual microphone according to an embodiment,
Fig. 11 depicts a possible way to derive the DOA of the sound from the point of view of the virtual microphone according to an embodiment,
Fig. 12 illustrates an information computation block additionally comprising a diffuseness computation unit according to an embodiment,
Fig. 13 depicts a diffuseness computation unit according to an embodiment,
Fig. 14 illustrates a scenario where the estimation of the position of sound events is not possible, and
Figs. 15a-15c illustrate scenarios where two microphone arrays receive direct sound, sound reflected by a wall, and diffuse sound.
Figure 1 illustrates an apparatus for generating an audio output signal to simulate a recording from a virtual microphone at a configurable virtual position posVmic in an environment. The apparatus comprises a sound event position estimator 110 and an information computation module 120. The sound event position estimator 110 receives first direction information di1 from a first real spatial microphone and second direction information di2 from a second real spatial microphone. The sound event position estimator 110 is adapted to estimate a sound source position ssp indicating the position of a sound source in the environment, the sound source emitting a sound wave, wherein the sound event position estimator 110 is adapted to estimate the sound source position ssp based on the first direction information di1 provided by the first real spatial microphone located at a first real microphone position pos1mic in the environment, and based on the second direction information di2 provided by the second real spatial microphone located at a second real microphone position in the environment. The information computation module 120 is adapted to generate the audio output signal based on a first recorded audio input signal is1 recorded by the first real spatial microphone, based on the first real microphone position pos1mic, and based on the virtual position posVmic of the virtual microphone. The information computation module 120 comprises a propagation compensator adapted to generate a first modified audio signal by modifying the first recorded audio input signal is1 by compensating for a first delay or amplitude decay between an arrival of the sound wave emitted by the sound source at the first real spatial microphone and an arrival of the sound wave at the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the first recorded audio input signal is1, to obtain the audio output signal. Figure 2 illustrates the inputs and outputs of an apparatus and a method according to an embodiment. Information from two or more real spatial microphones 111, 112, ..., 11N is fed to the apparatus / is processed by the method. This information comprises the audio signals picked up by the real spatial microphones as well as direction information from the real spatial microphones, for example direction of arrival (DOA) estimates. The audio signals and the direction information, such as the direction of arrival estimates, may be expressed in a time-frequency domain. If, for example, a 2D geometry reconstruction is desired and a traditional short-time Fourier transform (STFT) domain is chosen for the representation of the signals, the DOA may be expressed as azimuth angles dependent on k and n, namely the frequency and time indices. In embodiments, the sound event localization in space, as well as the description of the position of the virtual microphone, may be conducted based on the positions and orientations of the real and virtual spatial microphones in a common coordinate system. This information may be represented by the inputs 121 ... 12N and input 104 in Fig. 2. Input 104 may additionally specify the characteristics of the virtual spatial microphone, for example its position and pick-up pattern, as will be discussed below.
If the virtual spatial microphone comprises several virtual sensors, their positions and the corresponding different pick-up patterns can be considered. The output of the apparatus, or of a corresponding method, may, when desired, be one or more sound signals 105, which may have been picked up by a spatial microphone defined and placed as specified by 104. Moreover, the apparatus (or rather, the method) may provide as output the corresponding spatial side information 106, which can be estimated using the virtual spatial microphone. Figure 3 illustrates an apparatus according to an embodiment, comprising two main processing units: a sound event position estimator 201 and an information computation module 202. The sound event position estimator 201 may carry out geometric reconstruction based on the DOAs comprised in the inputs 111 ... 11N and based on knowledge of the position and orientation of the real spatial microphones at which the DOAs were computed. The output 205 of the sound event position estimator comprises the position estimates (either in 2D or in 3D) of the sound sources where the sound events occur for each time and frequency slot. The second processing block 202 is an information computation module. According to the embodiment of Figure 3, the second processing block 202 computes a virtual microphone signal and spatial side information. It is therefore also referred to as the virtual microphone signal and side information computation block 202. The virtual microphone signal and side information computation block 202 uses the positions of the sound events 205 to process the audio signals comprised in 111 ... 11N, to output the virtual microphone audio signal 105. Block 202 may, if required, also compute the spatial side information 106 corresponding to the virtual spatial microphone. The embodiments below illustrate possibilities of how blocks 201 and 202 may operate. In the following, the position estimation of a sound event position estimator according to an embodiment is described in more detail. Depending on the dimensionality of the problem (2D or 3D) and the number of spatial microphones, several solutions for the position estimation are possible. If two spatial microphones in 2D exist (the simplest possible case), simple triangulation is possible. Figure 4 shows an exemplary scenario in which the real spatial microphones are depicted as Uniform Linear Arrays (ULAs) of 3 microphones each. The DOA, expressed as the azimuth angles a1(k, n) and a2(k, n), is computed for the time-frequency slot (k, n). This is achieved by employing a proper DOA estimator, such as ESPRIT, [13] R. Roy, A. Paulraj, and T. Kailath, "Direction-of-arrival estimation by subspace rotation methods - ESPRIT," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Stanford, CA, USA, April 1986, or (root) MUSIC, see [14] R. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Transactions on Antennas and Propagation, vol. 34, no. 3, pp. 276-280, 1986, applied to the pressure signals transformed into the time-frequency domain. In Figure 4, two real spatial microphones, here two real spatial microphone arrays 410, 420, are illustrated. The two estimated DOAs a1(k, n) and a2(k, n) are represented by two lines, a first line 430 representing DOA a1(k, n) and a second line 440 representing DOA a2(k, n).
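For illustration only, the following is a minimal sketch of the spectral MUSIC estimator cited above [14], applied to one frequency bin of a uniform linear array; the array geometry, grid resolution, and all names are assumptions, not the estimator configuration prescribed by this disclosure:

```python
import numpy as np

def music_doa(X, mic_spacing, wavelength, n_sources=1,
              grid_deg=np.arange(-90.0, 90.5, 0.5)):
    """Spectral MUSIC for a uniform linear array (ULA).

    X: (n_mics, n_snapshots) complex STFT snapshots of one frequency bin.
    Returns the estimated DOAs in degrees (0 = broadside).
    """
    n_mics = X.shape[0]
    R = X @ X.conj().T / X.shape[1]        # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
    En = eigvecs[:, :n_mics - n_sources]   # noise subspace
    m = np.arange(n_mics)
    spectrum = []
    for theta in grid_deg:
        # ULA steering vector for direction theta
        a = np.exp(-2j * np.pi * mic_spacing / wavelength
                   * m * np.sin(np.deg2rad(theta)))
        # Pseudospectrum peaks where a(theta) is orthogonal to the noise subspace
        spectrum.append(1.0 / np.real(a.conj() @ En @ En.conj().T @ a))
    spectrum = np.asarray(spectrum)
    return grid_deg[np.argsort(spectrum)[-n_sources:]]
```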
Triangulation is possible via simple geometric considerations, knowing the position and orientation of each array. The triangulation fails when the two lines 430, 440 are exactly parallel. In real applications, however, this is very unlikely. Nevertheless, not all triangulation results correspond to a physical or feasible position for the sound event in the considered space. For example, the estimated position of the sound event may be very far away, or even outside the assumed space, indicating that the DOAs probably do not correspond to any sound event that can be physically interpreted with the model used. Such results may be caused by sensor noise or very strong room reverberation. Therefore, according to an embodiment, such undesired results are flagged so that the information computation module 202 can treat them properly. Figure 5 depicts a scenario where the position of a sound event is estimated in 3D space. Proper spatial microphones are used, for example a planar or a 3D microphone array. In Figure 5, a first spatial microphone 510, for example a first 3D microphone array, and a second spatial microphone 520, for example a second 3D microphone array, are illustrated. The DOA in 3D space may, for example, be expressed as azimuth and elevation. Unit vectors 530, 540 may be used to express the DOAs. Two lines 550, 560 are projected according to the DOAs. In 3D, even with very reliable estimates, the two lines 550, 560 projected according to the DOAs might not intersect. However, the triangulation can still be carried out, for example by choosing the middle point of the smallest segment connecting the two lines. Similarly to the 2D case, the triangulation may fail or may yield unfeasible results for certain combinations of directions, which may then also be flagged, for example, to the information computation module 202 of Figure 3. If more than two spatial microphones exist, several solutions are possible. For example, the triangulation explained above could be carried out for all pairs of real spatial microphones (if N = 3: 1 with 2, 1 with 3, and 2 with 3). The resulting positions may then be averaged (along x and y, and, if 3D is considered, z). Alternatively, more complex concepts may be used. For example, probabilistic approaches may be applied as described in [15] J. Michael Steele, "Optimal Triangulation of Random Samples in the Plane," The Annals of Probability, Vol. 10, No. 3 (Aug. 1982), pp. 548-553. According to an embodiment, the sound field may be analyzed in the time-frequency domain, for example obtained via a short-time Fourier transform (STFT), in which k and n denote the frequency index k and the time index n, respectively. The complex pressure Pv(k, n) at an arbitrary position pv for a certain k and n is modeled as a single spherical wave emitted by a narrowband isotropic point-like source, for example by employing the formula:

Pv(k, n) = PIPLS(k, n) · γ(k, pIPLS, pv),     (1)

where PIPLS(k, n) is the signal emitted by the IPLS at its position pIPLS(k, n). The complex factor γ(k, pIPLS, pv) expresses the propagation from pIPLS to pv, for example introducing appropriate magnitude and phase modifications.
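As a hedged sketch of the propagation factor γ in formula (1), a free-field spherical wave can be modeled with a 1/r magnitude decay and a phase shift proportional to the traveled distance; the constant and function names below are illustrative assumptions:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def gamma(freq_hz: float, p_ipls: np.ndarray, p_receiver: np.ndarray) -> complex:
    """Complex propagation factor of a spherical wave from p_ipls to p_receiver.

    The magnitude decays with 1/r (point source in the far field); the phase
    term e^{-j*2*pi*f*r/c} models the propagation delay r/c.
    """
    r = np.linalg.norm(p_receiver - p_ipls)
    return np.exp(-2j * np.pi * freq_hz * r / SPEED_OF_SOUND) / r

# Example: pressure at pv given the signal emitted by the IPLS, as in formula (1).
P_ipls = 1.0 + 0.0j                       # emitted signal in one time-frequency slot
p_ipls = np.array([0.0, 0.0]); pv = np.array([2.0, 1.0])
Pv = P_ipls * gamma(1000.0, p_ipls, pv)   # Pv(k, n) = P_IPLS(k, n) * gamma(...)
```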
Here, the assumption may be applied that in each time-frequency slot only one IPLS is active. Nevertheless, several narrowband IPLSs located at different positions may also be active at a single time instant. Each IPLS models either direct sound or a distinct room reflection. Its position pIPLS(k, n) may ideally correspond to an actual sound source located inside the room, or to a mirror image sound source located outside, respectively. Thus, the position pIPLS(k, n) may also indicate the position of a sound event. Please note that the term "real sound sources" denotes the actual sound sources physically existing in the recording environment, such as talkers or musical instruments. On the contrary, by "sound sources" or "sound events" or "IPLS" we refer to effective sound sources, which are active at certain time instants or at certain time-frequency slots, where the sound sources may, for example, represent real sound sources or mirror image sources. Figures 15a-15c illustrate microphone arrays localizing sound sources. The localized sound sources may have different physical interpretations depending on their nature. When the microphone arrays receive direct sound, they are able to localize the position of a true sound source (for example, a talker). When the microphone arrays receive reflections, they can localize the position of a mirror image source. Mirror image sources are also sound sources. Figure 15a illustrates a scenario where two microphone arrays 151 and 152 receive direct sound from an actual sound source (a physically existing sound source) 153. Figure 15b illustrates a scenario where two microphone arrays 161, 162 receive reflected sound, the sound having been reflected by a wall. Because of the reflection, the microphone arrays 161, 162 localize the position from which the sound appears to arrive, namely the position of a mirror image source 165, which is different from the position of the talker 163. Both the real sound source 153 of Figure 15a and the mirror image source 165 are sound sources. Figure 15c illustrates a scenario where two microphone arrays 171, 172 receive diffuse sound and are not able to localize a sound source. While this single-wave model is accurate only for mildly reverberant environments, provided that the source signals fulfill the W-disjoint orthogonality (WDO) condition, that is, the time-frequency overlap is sufficiently small (this is normally true for speech signals, see, for example, [12] S. Rickard and Z. Yilmaz, "On the approximate W-disjoint orthogonality of speech," in Acoustics, Speech and Signal Processing, 2002 (ICASSP 2002), IEEE International Conference on, April 2002, vol. 1), the model also provides a good estimate for other environments and is therefore still applicable for those environments. In the following, the estimation of the positions pIPLS(k, n) according to an embodiment is explained. The position pIPLS(k, n) of an active IPLS in a certain time-frequency slot, and thus the estimation of a sound event in a time-frequency slot, is estimated via triangulation based on the direction of arrival (DOA) of the sound measured at at least two different observation points. Figure 6 illustrates a geometry where the IPLS of the current time-frequency slot (k, n) is located at the unknown position pIPLS(k, n). To determine the required DOA information, two real spatial microphones, here two microphone arrays, with known geometry, position and orientation are employed, placed at positions 610 and 620, respectively. The vectors p1 and p2 point to positions 610 and 620, respectively. The orientations of the arrays are defined by the unit vectors c1 and c2.
The DOA of the sound is determined at positions 610 and 620 for each (k, n) using a DOA estimation algorithm, for instance as provided by the DirAC analysis (see [2], [3]). Hereby, a first point-of-view unit vector e1^POV(k, n) and a second point-of-view unit vector e2^POV(k, n) with respect to the point of view of the respective microphone array (both not shown in Figure 6) may be provided as output of the DirAC analysis. For example, when operating in 2D, the first point-of-view unit vector results in:

e1^POV(k, n) = [cos(φ1(k, n)), sin(φ1(k, n))]^T.     (2)

Here, φ1(k, n) represents the azimuth of the DOA estimated at the first microphone array, as depicted in Figure 6. The corresponding DOA unit vectors e1(k, n) and e2(k, n), with respect to the global coordinate system at the origin, may be computed by applying the formula:

ei(k, n) = Ri · ei^POV(k, n),     (3)

where Ri is a coordinate transformation matrix; for example, when operating in 2D with c1 = [c1,x, c1,y]^T:

R1 = [ c1,x  −c1,y ; c1,y  c1,x ].

To carry out the triangulation, the direction vectors d1(k, n) and d2(k, n) may be computed as:

d1(k, n) = d1(k, n) · e1(k, n),  d2(k, n) = d2(k, n) · e2(k, n),     (4, 5)

where d1(k, n) = ||d1(k, n)|| and d2(k, n) = ||d2(k, n)|| are the unknown distances between the IPLS and the two microphone arrays. The equation

p1 + d1(k, n) = p2 + d2(k, n)     (6)

may be solved for d1(k, n). Finally, the position pIPLS(k, n) of the IPLS is given by

pIPLS(k, n) = d1(k, n) · e1(k, n) + p1.     (7)

In another embodiment, equation (6) may be solved for d2(k, n), and pIPLS(k, n) is analogously computed employing d2(k, n). Equation (6) always provides a solution when operating in 2D, unless e1(k, n) and e2(k, n) are parallel. However, when using more than two microphone arrays or when operating in 3D, a solution cannot be obtained when the direction vectors d do not intersect. According to an embodiment, in this case the point which is closest to all direction vectors d is computed, and the result can be used as the position of the IPLS. In an embodiment, all observation points p1, p2, ... should be located such that the sound emitted by the IPLS falls into the same time block n. This requirement may simply be fulfilled when the distance Δ between any two of the observation points is smaller than

Δmax = c · nFFT · (1 − R) / fs,     (8)

where nFFT is the STFT window length, 0 ≤ R < 1 specifies the overlap between successive time frames, fs is the sampling frequency, and c is the speed of sound. For example, for a 1024-point STFT at 48 kHz with 50% overlap (R = 0.5), the maximum spacing between the arrays to fulfill the above requirement is Δ = 3.65 m.
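A minimal sketch of the 2D triangulation of equations (2) to (7) follows; solving equation (6) is expressed as a small linear system, and all names are illustrative assumptions:

```python
import numpy as np

def triangulate_ipls_2d(p1, p2, phi1, phi2):
    """Triangulate an IPLS position from two 2D DOA estimates.

    p1, p2: array positions in global coordinates.
    phi1, phi2: DOA azimuths in radians, already rotated into the global
    coordinate system (i.e., after applying equation (3)).
    Solves p1 + d1*e1 = p2 + d2*e2 for the distances d1, d2 (equation (6)).
    """
    e1 = np.array([np.cos(phi1), np.sin(phi1)])
    e2 = np.array([np.cos(phi2), np.sin(phi2)])
    A = np.column_stack((e1, -e2))      # singular if e1 and e2 are parallel
    d = np.linalg.solve(A, p2 - p1)     # d = [d1, d2]
    if np.any(d < 0):
        raise ValueError("DOAs do not intersect in front of both arrays")
    return p1 + d[0] * e1               # equation (7)

# Example: arrays at the origin and at (3, 0); both observe a source at (1.5, 2).
p = triangulate_ipls_2d(np.array([0., 0.]), np.array([3., 0.]),
                        np.arctan2(2.0, 1.5), np.arctan2(2.0, -1.5))
# p is approximately [1.5, 2.0]
```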
In the following, an information computation module 202, for example a virtual microphone signal and side information computation module, according to an embodiment is described in more detail. Figure 7 illustrates a schematic overview of an information computation module 202 according to an embodiment. The information computation unit comprises a propagation compensator 500, a combiner 510 and a spectral weighting unit 520. The information computation module 202 receives the sound source position estimates ssp estimated by a sound event position estimator, one or more audio input signals recorded by one or more of the real spatial microphones, the positions posRealMic of one or more of the real spatial microphones, and the virtual position posVmic of the virtual microphone. It outputs an audio output signal representing an audio signal of the virtual microphone. Figure 8 illustrates an information computation module according to another embodiment. The information computation module of Figure 8 comprises a propagation compensator 500, a combiner 510 and a spectral weighting unit 520. The propagation compensator 500 comprises a propagation parameters computation module 501 and a propagation compensation module 504. The combiner 510 comprises a combination factors computation module 502 and a combination module 505. The spectral weighting unit 520 comprises a spectral weights computation unit 503, a spectral weighting application module 506 and a spatial side information computation module 507. To compute the audio signal of the virtual microphone, the geometric information, for example the position and orientation of the real spatial microphones 121 ... 12N, the position, orientation and characteristics of the virtual spatial microphone 104, and the position estimates of the sound events 205, are fed into the information computation module 202, in particular into the propagation parameters computation module 501 of the propagation compensator 500, into the combination factors computation module 502 of the combiner 510, and into the spectral weights computation unit 503 of the spectral weighting unit 520. The propagation parameters computation module 501, the combination factors computation module 502 and the spectral weights computation unit 503 compute the parameters used in the modification of the audio signals 111 ... 11N in the propagation compensation module 504, the combination module 505 and the spectral weighting application module 506. In the information computation module 202, the audio signals 111 ... 11N may first be modified to compensate for the effects given by the different propagation lengths between the sound event positions and the real spatial microphones. The signals may then be combined to improve, for example, the signal-to-noise ratio (SNR). Finally, the resulting signal may be spectrally weighted to take the directional pick-up pattern of the virtual microphone into account, as well as any distance-dependent gain function. These three steps are discussed in more detail below. Propagation compensation is now explained in more detail. In the upper part of Figure 9, two real spatial microphones (a first microphone array 910 and a second microphone array 920), the position of a localized sound event 930 for the time-frequency slot (k, n), and the position of the virtual spatial microphone 940 are illustrated. The lower part of Figure 9 depicts a time axis. A sound event is assumed to be emitted at time t0 and then to propagate to the real and virtual spatial microphones. The time delays of arrival, as well as the amplitudes, change with distance, so that the longer the propagation length, the weaker the amplitude and the longer the time delay of arrival. The signals at the two real arrays are comparable only if the relative delay Dt12 between them is small. Otherwise, one of the two signals needs to be temporally realigned to compensate for the relative delay Dt12, and possibly scaled to compensate for the different decays. Compensating the delay between the arrival at the virtual microphone and the arrival at the real microphone arrays (at one of the real spatial microphones) changes the delay independently of the localization of the sound event, making it superfluous for most applications.
With reference to Figure 8, the propagation parameters computation module 501 is adapted to compute the delays to be corrected for each real spatial microphone and for each sound event. If desired, it also computes the gain factors to be considered to compensate for the different amplitude decays. The propagation compensation module 504 is configured to use this information to modify the audio signals accordingly. If the signals are to be shifted by a small amount of time (compared to the time window of the filter bank), then a simple phase rotation suffices. If the delays are larger, more complicated implementations are necessary. The output of the propagation compensation module 504 is the modified audio signals expressed in the original time-frequency domain. In the following, a particular estimate of the propagation compensation for a virtual microphone according to an embodiment will be described with reference to Figure 6, which, among other things, illustrates the position 610 of a first real spatial microphone and the position 620 of a second real spatial microphone. In the embodiment that is now explained, it is assumed that at least a first recorded audio input signal, for example a pressure signal of at least one of the real spatial microphones (for example of the microphone arrays), is available, for example the pressure signal of the first real spatial microphone. We refer to the considered microphone as the reference microphone, to its position as the reference position pref, and to its pressure signal as the reference pressure signal Pref(k, n). However, the propagation compensation may not only be conducted with respect to a single pressure signal, but also with respect to the pressure signals of a plurality or of all of the real spatial microphones. The relationship between the pressure signal PIPLS(k, n) emitted by the IPLS and the reference pressure signal Pref(k, n) of a reference microphone located at pref can be expressed by formula (9):

Pref(k, n) = PIPLS(k, n) · γ(k, pIPLS, pref).     (9)

In general, the complex factor γ(k, pa, pb) expresses the phase rotation and the amplitude decay introduced by the propagation of a spherical wave from its origin at pa to pb. However, practical tests indicated that considering only the amplitude decay in γ leads to plausible impressions of the virtual microphone signal with significantly fewer artifacts compared to also considering the phase rotation. The sound energy that can be measured at a certain point in space depends strongly on the distance r from the sound source, in Figure 6 from the position pIPLS of the sound source. In many situations, this dependency can be modeled with sufficient accuracy using well-known physical principles, for example the 1/r decay of the sound pressure in the far field of a point source. When the distance of a reference microphone, for example the first real microphone, from the sound source is known, and when the distance of the virtual microphone from the sound source is also known, then the sound energy at the position of the virtual microphone can be estimated from the signal and the energy of the reference microphone, for example the first real spatial microphone. This means that the output signal of the virtual microphone can be obtained by applying proper gains to the reference pressure signal. Assuming that the first real spatial microphone is the reference microphone, then pref = p1. In Figure 6, the virtual microphone is located at pv.
Since the geometry in Figure 6 is known in detail, the distance d1(k, n) = ||d1(k, n)|| between the reference microphone (in Figure 6: the first real spatial microphone) and the IPLS can easily be determined, as well as the distance s(k, n) = ||s(k, n)|| between the virtual microphone and the IPLS, namely:

s(k, n) = ||s(k, n)|| = ||p1 + d1(k, n) − pv||.     (10)

The sound pressure Pv(k, n) at the position of the virtual microphone is computed by combining formulas (1) and (9), leading to:

Pv(k, n) = ( γ(k, pIPLS, pv) / γ(k, pIPLS, pref) ) · Pref(k, n).     (11)

As mentioned above, in some embodiments the factors γ may only consider the amplitude decay due to the propagation. Assuming, for instance, that the sound pressure decreases with 1/r, then:

Pv(k, n) = ( d1(k, n) / s(k, n) ) · Pref(k, n).     (12)

When the model in formula (1) holds, for example when only direct sound is present, then formula (12) can accurately reconstruct the magnitude information. However, in the case of pure diffuse sound fields, for example when the model assumptions are not met, the presented method yields an implicit dereverberation of the signal when moving the virtual microphone away from the positions of the sensor arrays. In fact, as discussed above, in diffuse sound fields we expect that most IPLS are localized near the two sensor arrays. Thus, when moving the virtual microphone away from these positions, we likely increase the distance s = ||s|| in Figure 6. Therefore, the magnitude of the reference pressure is decreased when applying a weighting according to formula (11). Correspondingly, when moving the virtual microphone close to an actual sound source, the time-frequency slots corresponding to the direct sound will be amplified, such that the overall audio signal will be perceived as less diffuse. By adjusting the rule in formula (12), one can control the amplification of the direct sound and the suppression of the diffuse sound at will. By conducting propagation compensation on the recorded audio input signal (for example the pressure signal) of the first real spatial microphone, a first modified audio signal is obtained. In embodiments, a second modified audio signal may be obtained by conducting propagation compensation on a second recorded audio input signal (a second pressure signal) of the second real spatial microphone. In other embodiments, further audio signals may be obtained by conducting propagation compensation on further recorded audio input signals (further pressure signals) of further real spatial microphones.
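A hedged sketch of the distance-based gain of formula (12), applied to a single time-frequency slot, may look as follows (the phase rotation is deliberately omitted, mirroring the observation above that magnitude-only compensation produces fewer artifacts; names are assumptions):

```python
import numpy as np

def propagation_compensate(P_ref, p_ref, p_v, p_ipls):
    """Apply formula (12): Pv(k, n) = (d1 / s) * Pref(k, n) for one slot.

    P_ref: complex STFT value of the reference microphone at (k, n).
    p_ref, p_v, p_ipls: positions of the reference microphone, the virtual
    microphone, and the localized sound event (IPLS) for this slot.
    """
    d1 = np.linalg.norm(p_ipls - p_ref)   # reference microphone <-> IPLS
    s = np.linalg.norm(p_ipls - p_v)      # virtual microphone <-> IPLS, formula (10)
    # Only the 1/r amplitude decay is compensated; the phase rotation is
    # left out, which the text above reports to cause fewer artifacts.
    return (d1 / s) * P_ref

# Moving the virtual microphone closer to the source than the reference
# microphone (s < d1) amplifies the slot; moving it away attenuates it.
Pv = propagation_compensate(0.3 + 0.1j,
                            p_ref=np.array([0.0, 0.0]),
                            p_v=np.array([1.5, 1.8]),
                            p_ipls=np.array([1.5, 2.0]))
```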
Now, the combining in blocks 502 and 505 of Figure 8 according to an embodiment is explained in more detail. It is assumed that two or more audio signals from a plurality of different real spatial microphones have been modified to compensate for the different propagation paths, to obtain two or more modified audio signals. Once the audio signals from the different real spatial microphones have been modified to compensate for the different propagation paths, they can be combined to improve the audio quality. By doing so, for example, the SNR can be increased or the reverberance can be reduced. Possible solutions for the combining comprise:
- Weighted averaging, for example considering the SNR, the distance to the virtual microphone, or the diffuseness estimated by the real spatial microphones. Traditional solutions, for example Maximum Ratio Combining (MRC) or Equal Gain Combining (EGC), may be employed, or
- Linear combination of some or all of the modified audio signals to obtain a combination signal. The modified audio signals may be weighted in the linear combination to obtain the combination signal, or
- Selection, for example only one signal is used, for example depending on the SNR, the distance, or the diffuseness.
The task of module 502 is, if applicable, to compute the parameters for the combining, which is carried out in module 505. Spectral weighting according to embodiments is now described in more detail. For this, reference is made to blocks 503 and 506 of Figure 8. In this final step, the audio signal resulting from the combining or from the propagation compensation of the audio input signals is weighted in the time-frequency domain according to the spatial characteristics of the virtual spatial microphone as specified by input 104 and/or according to the reconstructed geometry (given in 205). For each time-frequency slot, the geometric reconstruction allows the DOA relative to the virtual microphone to be easily obtained, as shown in Figure 10. Furthermore, the distance between the virtual microphone and the position of the sound event can also be readily computed. The weight for the time-frequency slot is then computed considering the type of virtual microphone desired. In the case of directional microphones, the spectral weights may be computed according to a predefined pick-up pattern. For example, according to an embodiment, a cardioid microphone may have a pick-up pattern defined by the function g(theta),

g(theta) = 0.5 + 0.5 cos(theta),

where theta is the angle between the look direction of the virtual spatial microphone and the DOA of the sound from the point of view of the virtual microphone. Another possibility are artistic (non-physical) decay functions. In certain applications, it may be desired to suppress sound events far away from the virtual microphone with a factor greater than the one characterizing free-field propagation. To this purpose, some embodiments introduce an additional weighting function which depends on the distance between the virtual microphone and the sound event. In an embodiment, only sound events within a certain distance (for example, in meters) from the virtual microphone should be picked up. With respect to the directivity of the virtual microphone, arbitrary directivity patterns can be applied to the virtual microphone. In doing so, one can, for instance, separate a source from a complex sound scene. Since the DOA of the sound can be computed at the position pv of the virtual microphone, namely

φv(k, n) = arccos( (s(k, n) · cv) / ||s(k, n)|| ),

where cv is a unit vector describing the orientation of the virtual microphone, arbitrary directivities for the virtual microphone can be realized. For example, assuming that Pv(k, n) indicates the combination signal or the propagation-compensated modified audio signal, then the formula

Pv(k, n) · [ 0.5 + 0.5 cos(φv(k, n)) ]

computes the output of a virtual microphone with cardioid directivity. The directional patterns which can potentially be generated in this way depend on the accuracy of the position estimation.
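A hedged sketch of this spectral weighting step for a virtual cardioid, applied per time-frequency slot (the vector convention and names are illustrative assumptions):

```python
import numpy as np

def cardioid_weight(P_v, mic_orientation, mic_pos, ipls_pos):
    """Weight one time-frequency value with a cardioid pick-up pattern.

    Implements g(theta) = 0.5 + 0.5*cos(theta), where theta is the angle
    between the look direction of the virtual microphone and the direction
    from which the sound arrives.
    """
    cv = mic_orientation / np.linalg.norm(mic_orientation)  # look direction
    to_source = ipls_pos - mic_pos
    cos_theta = (to_source @ cv) / np.linalg.norm(to_source)
    return P_v * (0.5 + 0.5 * cos_theta)

# A source straight ahead passes with unit gain; one behind is cancelled.
front = cardioid_weight(1.0 + 0j, np.array([1., 0.]),
                        np.array([0., 0.]), np.array([2., 0.]))
back = cardioid_weight(1.0 + 0j, np.array([1., 0.]),
                       np.array([0., 0.]), np.array([-2., 0.]))
# front == 1.0, back == 0.0 (up to floating point)
```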
For the real non-space microphones just mentioned, according to an application, their audio signals and positions are simply fed to the propagation compensation module 504 of Figure 8 for processing, instead of the audio signals of the real space microphones. Propagation compensation is then conducted for the one or more recorded audio signals of the non-space microphones with respect to the position of the one or more non-space microphones. In this way, an application employing additional non-space microphones is realized.

In another application, the computation of the spatial side information of the virtual microphone is carried out. To compute the spatial side information 106 of the microphone, the computational information calculation module 202 of Figure 8 comprises a spatial side information computation module 507, which is adapted to receive as input the positions of the sound sources 205 and the position, orientation and characteristics 104 of the virtual microphone. In certain applications, according to the side information 106 that needs to be computed, the audio signal of the virtual microphone 105 can also be taken into account as input to the spatial side information computation module 507.

The output of the spatial side information computation module 507 is the side information of the virtual microphone 106. This side information can be, for example, the DOA or the diffusion of the sound for each time-frequency position (k, n) from the point of view of the virtual microphone. Another possible side information could, for example, be the active sound intensity vector Ia(k, n) that would have been measured at the position of the virtual microphone. How these parameters can be derived will now be described.

According to an application, the DOA estimation for the virtual space microphone is carried out. The computational information calculation module 120 is adapted to estimate the direction of arrival at the virtual microphone as spatial side information, based on a position vector of the virtual microphone and based on a position vector of the sound event, as illustrated in Figure 11.

Figure 11 depicts a possible way to derive the DOA of the sound from the point of view of the virtual microphone. The position of the sound event, provided by block 205 in Figure 8, can be described for each time-frequency position (k, n) with a position vector r(k, n), the position vector of the sound event. Similarly, the position of the virtual microphone, provided as input 104 in Figure 8, can be described with a position vector s(k, n), the position vector of the virtual microphone. The viewing direction of the virtual microphone can be described by a vector v(k, n). The DOA with respect to the virtual microphone is given by a(k, n). It represents the angle between v and the sound propagation path h(k, n), which can be calculated using the formula:

h(k, n) = s(k, n) - r(k, n).

The desired DOA a(k, n) can now be calculated for each (k, n), for example, via the inner product of h(k, n) and v(k, n), namely

a(k, n) = arccos( (h(k, n) · v(k, n)) / (||h(k, n)|| ||v(k, n)||) ).
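As a worked example of the relation above, the following sketch computes h(k, n) and a(k, n) for a single time-frequency bin; the positions in the usage example are assumptions chosen purely for illustration:

```python
import numpy as np

def doa_at_virtual_mic(r, s, v):
    """Compute a(k, n) for one time-frequency bin, following the
    definitions above: h = s - r, then the angle between h and v.

    r : position vector of the sound event
    s : position vector of the virtual microphone
    v : viewing direction of the virtual microphone
    """
    h = s - r  # sound propagation path h(k, n)
    cos_a = np.dot(h, v) / (np.linalg.norm(h) * np.linalg.norm(v))
    return np.arccos(np.clip(cos_a, -1.0, 1.0))

# Example with assumed positions (in meters):
r = np.array([2.0, 1.0])   # sound event
s = np.array([0.0, 0.0])   # virtual microphone at the origin
v = np.array([1.0, 0.0])   # looking along the x axis
print(np.degrees(doa_at_virtual_mic(r, s, v)))  # roughly 153.4 degrees
```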
In another application, the computational information calculation module 120 can be adapted to estimate the active sound intensity at the virtual microphone as spatial side information, based on a position vector of the virtual microphone and based on a position vector of the sound event, as shown in Figure 11.

From the DOA a(k, n) defined above, the active sound intensity Ia(k, n) at the position of the virtual microphone can be derived. For this, the audio signal of the virtual microphone 105 in Figure 8 is assumed to correspond to the output of an omnidirectional microphone, that is, the virtual microphone is assumed to be an omnidirectional microphone. In addition, the viewing direction v in Figure 11 is assumed to be parallel to the x axis of the coordinate system. Since the desired active sound intensity vector Ia(k, n) describes the net energy flow through the position of the virtual microphone, Ia(k, n) can be calculated, for example, according to the formula:

Ia(k, n) = -(1/(2 rho)) |Pv(k, n)|^2 [cos a(k, n), sin a(k, n)]^T,

where [·]^T denotes a transposed vector, rho is the air density, and Pv(k, n) is the sound pressure measured by the virtual space microphone, for example, the output 105 of block 506 in Figure 8. If the active intensity vector is to be computed expressed in the general coordinate system, but still at the position of the virtual microphone, the following formula can be applied:

Ia(k, n) = (1/(2 rho)) |Pv(k, n)|^2 h(k, n) / ||h(k, n)||.

The diffusion of the sound expresses how diffuse the sound field is in a given time-frequency slot (see, for example, [2]). The diffusion is expressed by a value ψ, where 0 ≤ ψ ≤ 1. A diffusion of 1 indicates that the total sound field energy of a sound field is completely diffuse. This information is important, for example, in the reproduction of spatial sound. Traditionally, the diffusion is calculated at the specific point in space at which a microphone set is placed.

According to an application, the diffusion can be computed as an additional parameter to the side information generated for the virtual microphone (VM), which can be placed at an arbitrary position in the sound scene. In this way, an apparatus that also calculates the diffusion in addition to the audio signal at a virtual position of a virtual microphone can be seen as a virtual DirAC front end, as it is possible to produce a DirAC stream, namely an audio signal, direction of arrival and diffusion, for an arbitrary point in the sound scene. The DirAC stream can further be processed, transmitted and reproduced on an arbitrary multi-loudspeaker setup. In this case, the listener experiences the sound scene as if he or she were at the position specified by the virtual microphone and were looking in the direction determined by its orientation.

Figure 12 illustrates a computational information calculation block according to an application, comprising a diffusion computation unit 801 for computing the diffusion at the virtual microphone. The computational information block 202 is adapted to receive inputs 111 to 11N, which in addition to the inputs of Figure 3 also include the diffusion at the real space microphones. Let ψ(SM 1) to ψ(SM N) denote these values. These additional inputs are fed to the computational information calculation module 202.
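Returning to the active sound intensity formulas above, a minimal sketch of both expressions follows; the air density value of 1.225 kg/m^3 is an assumed standard value, and the function names are chosen for the example:

```python
import numpy as np

RHO = 1.225  # air density in kg/m^3 (assumed standard value)

def active_intensity_local(P_v, a):
    """Ia(k, n) in the virtual microphone's own frame, with the viewing
    direction parallel to the x axis, as in the first formula above."""
    return -(np.abs(P_v) ** 2 / (2.0 * RHO)) * np.array([np.cos(a), np.sin(a)])

def active_intensity_global(P_v, h):
    """Ia(k, n) expressed in the general coordinate system, still at the
    position of the virtual microphone (second formula above)."""
    return (np.abs(P_v) ** 2 / (2.0 * RHO)) * h / np.linalg.norm(h)
```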
In Figure 12, the output 103 of the diffusion computation unit 801 is the diffusion parameter computed at the position of the virtual microphone. A diffusion computation unit 801 of an application is illustrated in more detail in Figure 13. According to an application, the direct and diffuse sound energy at each of the N space microphones is estimated. Then, using the information on the positions of the IPLS and the information on the positions of the space and virtual microphones, N estimates of these energies at the position of the virtual microphone are obtained. Finally, the estimates can be combined to improve the estimation accuracy, and the diffusion parameter at the virtual microphone can be readily calculated.

Let E_dir^(SM 1) to E_dir^(SM N) and E_diff^(SM 1) to E_diff^(SM N) denote the estimates of the direct and diffuse sound energies for the N space microphones computed by the energy analysis unit 810. If Pi is the complex pressure signal and ψi is the diffusion for the i-th space microphone, then the energies can, for example, be calculated according to the formulas:

E_dir^(SM i) = (1 - ψi) |Pi|^2,
E_diff^(SM i) = ψi |Pi|^2.

The energy of diffuse sound should be equal in all positions; therefore, an estimate of the diffuse sound energy E_diff^(VM) at the virtual microphone can be calculated simply by averaging E_diff^(SM 1) to E_diff^(SM N), for example, in a diffusion combination unit 820, for example, according to the formula:

E_diff^(VM) = (1/N) Σ_{i=1..N} E_diff^(SM i).

A more effective combination of the estimates E_diff^(SM 1) to E_diff^(SM N) could be carried out by considering the variance of the estimators, for example, by considering the SNR.

The energy of the direct sound depends on the distance to the source due to propagation. Therefore, E_dir^(SM 1) to E_dir^(SM N) can be modified to take this into account. This can be carried out, for example, by a direct sound propagation adjustment unit 830. For example, if it is assumed that the energy of the direct sound field decays with 1 over the distance squared, then the estimate for the direct sound at the virtual microphone for the i-th space microphone can be calculated according to the formula:

E_dir,i^(VM) = (d(SM i, IPLS) / d(VM, IPLS))^2 · E_dir^(SM i),

where d(·,·) denotes the distance between two positions. Similarly to the diffusion combination unit 820, the estimates of the direct sound energy obtained at the different space microphones can be combined, for example, by a direct sound combination unit 840. The result is E_dir^(VM), for example, the estimate of the direct sound energy at the virtual microphone. The diffusion at the virtual microphone can then be computed, for example, by a diffusion sub-calculator 850, for example, according to the formula:

ψ^(VM) = E_diff^(VM) / (E_diff^(VM) + E_dir^(VM)).

As mentioned above, in some cases the position estimation of sound events carried out by a sound event position estimator fails, for example, in the case of a wrong direction of arrival estimate. Figure 14 illustrates such a scenario. In these cases, regardless of the diffusion parameters estimated at the different space microphones and received as inputs 111 to 11N, the diffusion for the virtual microphone 103 can be set to 1 (that is, completely diffuse), since no spatially coherent reproduction is possible.

In addition, the reliability of the DOA estimates at the N space microphones can be considered. This can be expressed, for example, in terms of the variance of the DOA estimator or the SNR. Such information can be taken into account by the diffusion sub-calculator 850, so that the VM diffusion 103 can be artificially increased in case the DOA estimates are unreliable. Indeed, as a consequence, the position estimates 205 will also be unreliable.
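The complete diffusion computation described above (units 810, 820, 830, 840 and 850) can be sketched as follows; this is a minimal sketch under the 1/r^2 energy-decay assumption and the plain-averaging choice from the text, with function and variable names chosen for the example:

```python
import numpy as np

def diffuseness_at_virtual_mic(P, psi, d_sm, d_vm):
    """Estimate the diffusion psi^(VM) at the virtual microphone from
    the pressures and diffusion values of the N space microphones.

    P    : complex pressure signals of the N space microphones, shape (N,)
    psi  : diffusion values at the N space microphones, shape (N,)
    d_sm : distances space microphone -> IPLS, shape (N,)
    d_vm : distance virtual microphone -> IPLS (scalar)
    """
    P, psi, d_sm = map(np.asarray, (P, psi, d_sm))
    E = np.abs(P) ** 2
    E_dir = (1.0 - psi) * E                 # energy analysis unit 810
    E_diff = psi * E
    E_diff_vm = E_diff.mean()               # diffusion combination unit 820
    E_dir_adj = E_dir * (d_sm / d_vm) ** 2  # propagation adjustment unit 830
    E_dir_vm = E_dir_adj.mean()             # direct sound combination unit 840
    return E_diff_vm / (E_diff_vm + E_dir_vm)  # diffusion sub-calculator 850
```

Setting the returned value to 1, or increasing it when the DOA estimates are unreliable, corresponds to the fallback behavior described in the preceding paragraphs.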
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or to a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.

The inventive decomposed signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.

Depending on certain implementation requirements, applications of the invention can be implemented in hardware or in software. The implementation can be carried out using a digital storage medium, for example, a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is carried out.

Some applications according to the invention comprise a non-transitory data carrier having electronically readable control signals that are capable of cooperating with a programmable computer system, such that one of the methods described herein is carried out.

Generally, applications of the present invention can be implemented as a computer program product with a program code, the program code being operative to carry out one of the methods when the computer program product runs on a computer. The program code can, for example, be stored on a machine-readable carrier. Other applications comprise the computer program for carrying out one of the methods described herein, stored on a machine-readable carrier. In other words, an application of the inventive method is, therefore, a computer program having a program code for carrying out one of the methods described herein when the computer program runs on a computer.

A further application of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for carrying out one of the methods described herein. A further application of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for carrying out one of the methods described herein. The data stream or the sequence of signals can, for example, be configured to be transferred via a data communication connection, for example, via the Internet.

A further application comprises a processing means, for example, a computer or a programmable logic device, configured or adapted to carry out one of the methods described herein. A further application comprises a computer having installed thereon the computer program for carrying out one of the methods described herein.

In some applications, a programmable logic device (for example, a field-programmable gate array) can be used to carry out some or all of the functionalities of the methods described herein. In some applications, a field-programmable gate array can cooperate with a microprocessor in order to carry out one of the methods described herein. Generally, the methods are preferably carried out by any hardware apparatus.

The applications described above are merely illustrative of the principles of the present invention.
It is understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. It is the intention, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the applications herein.

Literature:

[1] R. K. Furness, "Ambisonics - An overview," in AES 8th International Conference, April 1990, pp. 181-189.

[2] V. Pulkki, "Directional audio coding in spatial sound reproduction and stereo upmixing," in Proceedings of the AES 28th International Conference, pp. 251-258, Piteå, Sweden, June 30 - July 2, 2006.

[3] V. Pulkki, "Spatial sound reproduction with directional audio coding," J. Audio Eng. Soc., vol. 55, no. 6, pp. 503-516, June 2007.

[4] C. Faller, "Microphone front-ends for spatial audio coders," in Proceedings of the AES 125th International Convention, San Francisco, Oct. 2008.

[5] M. Kallinger, H. Ochsenfeld, G. Del Galdo, F. Küch, D. Mahne, R. Schultz-Amling, and O. Thiergart, "A spatial filtering approach for directional audio coding," in Audio Engineering Society Convention 126, Munich, Germany, May 2009.

[6] R. Schultz-Amling, F. Küch, O. Thiergart, and M. Kallinger, "Acoustical zooming based on a parametric sound field representation," in Audio Engineering Society Convention 128, London, UK, May 2010.

[7] J. Herre, C. Falch, D. Mahne, G. Del Galdo, M. Kallinger, and O. Thiergart, "Interactive teleconferencing combining spatial audio object coding and DirAC technology," in Audio Engineering Society Convention 128, London, UK, May 2010.

[8] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography, Academic Press, 1999.

[9] A. Kuntz and R. Rabenstein, "Limitations in the extrapolation of wave fields from circular measurements," in 15th European Signal Processing Conference (EUSIPCO 2007), 2007.

[10] A. Walther and C. Faller, "Linear simulation of spaced microphone arrays using B-format recordings," in Audio Engineering Society Convention 128, London, UK, May 2010.

[11] US 61/287,596: An Apparatus and a Method for Converting a First Parametric Spatial Audio Signal into a Second Parametric Spatial Audio Signal.

[12] S. Rickard and Ö. Yilmaz, "On the approximate W-disjoint orthogonality of speech," in Acoustics, Speech and Signal Processing (ICASSP 2002), IEEE International Conference on, April 2002, vol. 1.

[13] R. Roy, A. Paulraj, and T. Kailath, "Direction-of-arrival estimation by subspace rotation methods - ESPRIT," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Stanford, CA, USA, April 1986.

[14] R. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Transactions on Antennas and Propagation, vol. 34, no. 3, pp. 276-280, 1986.

[15] J. Michael Steele, "Optimal triangulation of random samples in the plane," The Annals of Probability, vol. 10, no. 3 (Aug. 1982), pp. 548-553.

[16] F. J. Fahy, Sound Intensity, Essex: Elsevier Science Publishers Ltd., 1989.

[17] R. Schultz-Amling, F. Küch, M. Kallinger, G. Del Galdo, T. Ahonen, and V. Pulkki, "Planar microphone array processing for the analysis and reproduction of spatial audio using directional audio coding," in Audio Engineering Society Convention 124, Amsterdam, The Netherlands, May 2008.

[18] M. Kallinger, F. Küch, R. Schultz-Amling, G. Del Galdo, T. Ahonen, and V. Pulkki, "Enhanced direction estimation using microphone arrays for directional audio coding," in Hands-Free Speech Communication and Microphone Arrays (HSCMA 2008), May 2008, pp. 45-48.
Claims (17)

[0001] 1. An apparatus for generating an audio output signal to simulate a recording of a virtual microphone at a configurable virtual position in an environment, characterized by comprising: a sound event position estimator (110) for estimating a sound event position indicating a position of a sound event in the environment, wherein the sound event is active at a certain time or in a certain time-frequency bin, wherein the sound event is a real sound source or a mirror image source, wherein the sound event position estimator (110) is configured to estimate the sound event position indicating a position of a mirror image source in the environment when the sound event is a mirror image source, and wherein the sound event position estimator (110) is adapted to estimate the sound event position based on first direction information provided by a first real space microphone being located at a first real microphone position in the environment and based on second direction information provided by a second real space microphone being located at a second real microphone position in the environment, wherein the first real space microphone and the second real space microphone are physically existing space microphones; and wherein the first real space microphone and the second real space microphone are apparatuses for the acquisition of spatial sound capable of retrieving the direction of arrival of sound, and an information computation module (120) for generating the audio output signal based on a first recorded audio input signal, based on the first real microphone position, based on the virtual position of the virtual microphone and based on the sound event position, wherein the first real space microphone is configured to record the first recorded audio input signal or wherein a third microphone is configured to record the first recorded audio input signal, wherein the sound event position estimator (110) is adapted to estimate the sound event position based on a first direction of arrival of the sound wave emitted by the sound event at the first real microphone position as the first direction information and based on a second direction of arrival of the sound wave at the second real microphone position as the second direction information, and wherein the information computation module (120) comprises a propagation compensator (500), wherein the propagation compensator (500) is adapted to generate a first modified audio signal by modifying the first recorded audio input signal, based on a first amplitude decay between the sound event and the first real space microphone and based on a second amplitude decay between the sound event and the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the first recorded audio input signal, to obtain the audio output signal; or wherein the propagation compensator (500) is adapted to generate a first modified audio signal by compensating a first time delay between the arrival of a sound wave emitted by the sound event at the first real space microphone and the arrival of the sound wave at the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the first recorded audio input signal, to obtain the audio output signal.
[0002] 2. An apparatus according to claim 1, characterized in that the information computation module (120) comprises a spatial side information computation module (507) for computing spatial side information, wherein the information computation module (120) is adapted to estimate the direction of arrival or an active sound intensity at the virtual microphone as the spatial side information, based on a position vector of the virtual microphone and based on a position vector of the sound event.

[0003] 3. An apparatus according to claim 2, characterized in that the propagation compensator (500) is adapted to generate the first modified audio signal in a time-frequency domain, based on the first amplitude decay between the sound source and the first real space microphone and based on the second amplitude decay between the sound source and the virtual microphone, by adjusting said magnitude value of the first recorded audio input signal being represented in a time-frequency domain.

[0004] 4. An apparatus according to claim 3, characterized in that the propagation compensator (500) is adapted to generate the first modified audio signal in a time-frequency domain, by compensating the first delay between the arrival of the sound wave emitted by the sound source at the first real space microphone and the arrival of the sound wave at the virtual microphone, by adjusting said magnitude value of the first recorded audio input signal being represented in a time-frequency domain.

[0005] 5. An apparatus according to any one of the preceding claims, characterized in that the propagation compensator (500) is adapted to conduct propagation compensation by generating a modified magnitude value of the first modified audio signal using the formula:

Pv(k, n) = (d1(k, n) / s(k, n)) · Pref(k, n),

where d1(k, n) is the distance between the position of the first real space microphone and the position of the sound event, where s(k, n) is the distance between the virtual position of the virtual microphone and the position of the sound event, and where Pref(k, n) is the first recorded audio input signal represented in a time-frequency domain.

[0006] 6. An apparatus according to any one of the preceding claims, characterized in that the information computation module (120) further comprises a combiner (510), wherein the propagation compensator (500) is further adapted to modify a second recorded audio input signal, recorded by the second real space microphone, by compensating a second delay or a second amplitude decay between an arrival of the sound wave emitted by the sound source at the second real space microphone and an arrival of the sound wave at the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the second recorded audio input signal, to obtain a second modified audio signal, and wherein the combiner (510) is adapted to generate a combination signal by combining the first modified audio signal and the second modified audio signal, to obtain the audio output signal.
[0007] 7. An apparatus according to claim 6, characterized in that the propagation compensator (500) is further adapted to modify one or more further recorded audio input signals, recorded by one or more further real space microphones, by compensating delays or amplitude decays between an arrival of the sound wave at the virtual microphone and an arrival of the sound wave emitted by the sound source at each of the further real space microphones, wherein the propagation compensator (500) is adapted to compensate each of the delays or amplitude decays by adjusting an amplitude value, a magnitude value or a phase value of each of the further recorded audio input signals to obtain a plurality of third modified audio signals, and wherein the combiner (510) is adapted to generate a combination signal by combining the first modified audio signal, the second modified audio signal and the plurality of third modified audio signals, to obtain the audio output signal.

[0008] 8. An apparatus according to one of claims 1 to 5, characterized in that the information computation module (120) comprises a spectral weighting unit (520) for generating a weighted audio signal by modifying the first modified audio signal depending on a direction of arrival of the sound wave at the virtual position of the virtual microphone and depending on a virtual orientation of the virtual microphone, to obtain the audio output signal, wherein the first modified audio signal is modified in a time-frequency domain.

[0009] 9. An apparatus according to claim 6 or 7, characterized in that the information computation module (120) comprises a spectral weighting unit (520) for generating a weighted audio signal by modifying the combination signal depending on a direction of arrival of the sound wave at the virtual position of the virtual microphone and on a virtual orientation of the virtual microphone, to obtain the audio output signal, wherein the combination signal is modified in a time-frequency domain.

[0010] 10. An apparatus according to claim 8 or 9, characterized in that the spectral weighting unit (520) is adapted to apply the weighting factor α + (1 - α) cos(Φv(k, n)), or the weighting factor 0.5 + 0.5 cos(Φv(k, n)), to the weighted audio signal, where Φv(k, n) indicates an angle specifying the direction of arrival of the sound wave emitted by the sound source at the virtual position of the virtual microphone.

[0011] 11. An apparatus according to one of claims 1 to 6, characterized in that the propagation compensator (500) is further adapted to generate a third modified audio signal by modifying a third recorded audio input signal recorded by a fourth microphone, by compensating a third delay or a third amplitude decay between an arrival of the sound wave emitted by the sound source at the fourth microphone and an arrival of the sound wave at the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the third recorded audio input signal, to obtain the audio output signal.

[0012] 12. An apparatus according to any one of the preceding claims, characterized in that the sound event position estimator (110) is adapted to estimate a position of a sound source in a three-dimensional environment.
[0013] 13. An apparatus according to any one of the preceding claims, characterized in that the information computation module (120) further comprises a diffusion computation unit (801) adapted to estimate a diffuse sound energy at the virtual microphone or a direct sound energy at the virtual microphone, wherein the diffusion computation unit (801) is adapted to estimate the diffuse sound energy at the virtual microphone based on the diffuse sound energies at the first and second real space microphones.

[0014] 14. An apparatus according to claim 13, characterized in that the diffusion computation unit (801) is adapted to estimate the diffuse sound energy E_diff^(VM) at the virtual microphone by applying the formula:

E_diff^(VM) = (1/N) Σ_{i=1..N} E_diff^(SM i),

where N is the number of real space microphones and E_diff^(SM i) is the diffuse sound energy at the i-th real space microphone.

[0015] 15. An apparatus according to claim 13 or 14, characterized in that the diffusion computation unit (801) is adapted to estimate the direct sound energy using the formula:

E_dir,i^(VM) = (d(SM i, IPLS) / d(VM, IPLS))^2 · E_dir^(SM i),

where d(SM i, IPLS) is the distance between the i-th real space microphone and the sound event, d(VM, IPLS) is the distance between the virtual microphone and the sound event, and E_dir^(SM i) is the direct sound energy at the i-th real space microphone.

[0016] 16. An apparatus according to one of claims 13 to 15, characterized in that the diffusion computation unit (801) is adapted to estimate the diffusion at the virtual microphone by estimating the diffuse sound energy at the virtual microphone and the direct sound energy at the virtual microphone and by applying the formula:

ψ^(VM) = E_diff^(VM) / (E_diff^(VM) + E_dir^(VM)).

[0017] 17. A method for generating an audio output signal to simulate a recording of a virtual microphone at a configurable virtual position in an environment, characterized by comprising: estimating a sound event position indicating a position of a sound event in the environment, wherein the sound event is active at a certain time or in a certain time-frequency bin, wherein the sound event is a real sound source or a mirror image source, wherein the step of estimating the sound event position comprises estimating the sound event position indicating a position of a mirror image source in the environment when the sound event is a mirror image source, and wherein the step of estimating the sound event position is based on first direction information provided by a first real space microphone being located at a first real microphone position in the environment and based on second direction information provided by a second real space microphone being located at a second real microphone position in the environment, wherein the first real space microphone and the second real space microphone are physically existing space microphones; and wherein the first real space microphone and the second real space microphone are apparatuses for the acquisition of spatial sound capable of retrieving the direction of arrival of sound, and generating the audio output signal based on a first recorded audio input signal,
based on the first real microphone position, based on the virtual position of the virtual microphone and based on the sound event position, wherein the first real space microphone is configured to record the first recorded audio input signal or wherein a third microphone is configured to record the first recorded audio input signal, wherein the estimation of the sound event position is conducted based on a first direction of arrival of the sound wave emitted by the sound event at the first real microphone position as the first direction information and based on a second direction of arrival of the sound wave at the second real microphone position as the second direction information, wherein generating the audio output signal comprises generating a first modified audio signal by modifying the first recorded audio input signal, based on a first amplitude decay between the sound event and the first real space microphone and based on a second amplitude decay between the sound event and the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the first recorded audio input signal, to obtain the audio output signal; or wherein the step of generating the audio output signal comprises generating a first modified audio signal by compensating a first time delay between the arrival of a sound wave emitted by the sound event at the first real space microphone and the arrival of the sound wave at the virtual microphone, by adjusting an amplitude value, a magnitude value or a phase value of the first recorded audio input signal, to obtain the audio output signal.