Patent abstract:
METHOD AND DEVICE TO DECODE AN AUDIO SOUND FIELD REPRESENTATION FOR AUDIO REPRODUCTION. The present invention relates to sound field signals, such as Ambisonics signals, which carry a representation of a desired sound field. The Ambisonics format is based on a spherical harmonic decomposition of the sound field, and Higher Order Ambisonics (HOA) uses spherical harmonics of at least second order. However, commonly used speaker configurations are irregular and lead to problems in the decoder design. An improved method for decoding an audio sound field representation for audio reproduction comprises calculating (110) a panning function (W) using a geometric method based on the positions of a plurality of speakers and a plurality of source directions, calculating (120) a mode matrix (Ξ) from the source directions, calculating (130) a pseudo-inverse mode matrix (Ξ+), and decoding (140) the audio sound field representation. The decoding is based on a decoding matrix (D) which is obtained from the panning function (W) and the pseudo-inverse mode matrix (Ξ+).
Publication number: BR112012024528B1
Application number: R112012024528-7
Filing date: 2011-03-25
Publication date: 2021-05-11
Inventors: Johann-Markus Batke; Florian Keiler; Johannes Boehm
Applicant: Dolby International AB
IPC main classification:
Patent description:

Field of the Invention
[0001] The present invention relates to a method and a device for decoding an audio sound field representation, and in particular an Ambisonics-formatted audio representation, for audio reproduction.

Background
[0002] This section is intended to introduce the reader to various aspects of the art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, these statements are to be read in this light, and not as admissions of prior art, unless a source is expressly mentioned.
[0003] Precise localization is a key objective for any spatial audio reproduction system. Such reproduction systems are highly applicable to conference systems, games, or other virtual environments that benefit from 3D sound. 3D sound scenes can be synthesized or captured as a natural sound field. Sound field signals such as Ambisonics carry a representation of a desired sound field. The Ambisonics format is based on a spherical harmonic decomposition of the sound field. While the basic Ambisonics format, or B-format, uses spherical harmonics of order zero and one, the so-called Higher Order Ambisonics (HOA) also uses further spherical harmonics of at least second order. A decoding process is required to obtain the individual speaker signals. To synthesize audio scenes, panning functions that refer to the spatial arrangement of the speakers are needed to obtain a spatial localization of a given sound source. If a natural sound field is to be recorded, microphone arrays are needed to capture the spatial information. The known Ambisonics approach is a well-suited tool for accomplishing this. Ambisonics-formatted signals carry a representation of the desired sound field. A decoding process is required to obtain the individual speaker signals from such Ambisonics-formatted signals. Since in this case, too, panning functions can be derived from the decoding functions, panning functions are the key item for describing the spatial localization task. The spatial arrangement of the speakers is referred to here as the speaker configuration.
[0004] Commonly used speaker configurations are stereo configurations, which employ two speakers, the standard surround configuration using five speakers, and extensions of the surround configuration using more than five speakers. These configurations are well known. However, they are restricted to two dimensions (2D), i.e., no height information is reproduced.
[0005] Speaker configurations for three-dimensional (3D) reproduction are described, for example, in "Wide listening area with exceptional spatial sound quality of a 22.2 multichannel sound system", K. Hamasaki, T. Nishiguchi, R. Okumura, and Y. Nakayama, Audio Engineering Society Preprints, Vienna, Austria, May 2007, which is a proposal for the NHK ultra high definition TV with 22.2 format; the 2+2+2 arrangement by Dabringhaus (mdg-musikproduktion dabringhaus und grimm, www.mdg.de); and a 10.2 setup in "Sound for Film and Television", T. Holman, 2nd ed., Boston: Focal Press, 2002. One of the few known systems that addresses spatial reproduction and panning strategies is the Vector Base Amplitude Panning (VBAP) approach in "Virtual sound source positioning using vector base amplitude panning", Journal of the Audio Engineering Society, vol. 45, no. 6, pp. 456-466, June 1997, herein Pulkki. VBAP was used by Pulkki to reproduce virtual acoustic sources with an arbitrary speaker configuration. To place a virtual source in a 2D plane, a pair of speakers is needed, whereas in the 3D case a triplet of speakers is needed. For each virtual source, a monophonic signal with different gains (depending on the position of the virtual source) is fed to the speakers selected from the complete setup. The speaker signals for all virtual sources are then summed. VBAP applies a geometric approach to calculate the speaker signal gains for panning between the speakers.
[0006] An exemplary 3D speaker configuration considered here has 16 speakers, which are positioned as shown in figure 2. The placement was chosen for practical reasons: four columns with three speakers each, and additional speakers between those columns. In more detail, eight of the speakers are evenly distributed on a circle around the listener's head, enclosing angles of 45 degrees. Four additional speakers are located at the top and bottom, enclosing azimuth angles of 90 degrees. With respect to Ambisonics, this configuration is irregular and causes problems in the decoder design, as mentioned in "An ambisonics format for flexible reproduction layouts", H. Pomberger and F. Zotter, Proceedings of the 1st Ambisonics Symposium, Graz, Austria, July 2009.
[0007] Conventional Ambisonics decoding, as described in "Three-dimensional surround sound systems based on spherical harmonics", M. Poletti, J. Audio Eng. Soc., vol. 53, no. 11, pp. 1004-1025, November 2005, employs the commonly known mode matching process. Modes are described by mode vectors that contain the spherical harmonic values for a given direction of incidence. Combining the mode vectors for all directions given by the individual speakers leads to the mode matrix of the speaker configuration, so the mode matrix represents the speaker positions. To reproduce the mode of a signal from a given source, the speaker modes are weighted in such a way that the superposed modes of the individual speakers add up to the desired mode. To obtain the required weights, an inverse of the speaker mode matrix needs to be calculated. In terms of signal decoding, the weights form the driving signals of the speakers, and the inverse speaker mode matrix is referred to as the "decoding matrix", which is applied to decode an Ambisonics-formatted signal representation. In particular, for many speaker configurations, for example the configuration shown in figure 2, it is difficult to obtain the inverse of the mode matrix.
[0008] As mentioned above, commonly used speaker configurations are restricted to 2D, i.e., no height information is reproduced. Decoding a sound field representation to a speaker configuration with a mathematically non-regular spatial distribution leads to localization and coloration problems with the commonly known techniques. To decode an Ambisonics signal, a decoding matrix (i.e., a matrix of decoding coefficients) is used. In conventional decoding of Ambisonics signals, and particularly HOA signals, at least two problems occur. First, for correct decoding it is necessary to know the signal source directions in order to obtain the decoding matrix. Second, the mapping to an existing speaker configuration is systematically wrong due to the following mathematical problem: a mathematically correct decoding will result in not only positive but also some negative speaker amplitudes. These are, however, erroneously reproduced as positive signals, thus leading to the aforementioned problems.

Invention Summary
[0009] The present invention describes a method for decoding a sound field representation for non-regular spatial distributions with highly improved localization and coloration properties. It represents another way of obtaining the decoding matrix for sound field data, e.g. in Ambisonics format, and employs a process in the manner of a system estimation. Considering a set of possible directions of incidence, the panning functions related to the desired speakers are calculated. The panning functions are taken as the output of an Ambisonics decoding process. The corresponding input signal is the mode matrix of all the considered directions. Therefore, as shown below, the decoding matrix is obtained by multiplying the weight matrix by an inverse version of the mode matrix of the input signals.
[00010] Regarding the second problem mentioned above, it was observed that it is also possible to obtain the decoding matrix from the inverse of the so-called mode matrix, which represents the speaker positions, and position-dependent weighting functions ("panning functions") W. One aspect of the present invention is that these panning functions W can be derived using a different method than is commonly used. Advantageously, a simple geometric method is used that does not require any knowledge of the signal source directions, thus solving the first problem mentioned above. This method is known as Vector Base Amplitude Panning (VBAP). According to the present invention, VBAP is used to calculate the required panning functions, which are then used to calculate the Ambisonics decoding matrix. A further problem occurs where the inverse of the mode matrix (which represents the speaker configuration) is required: the exact inverse is difficult to obtain, which also leads to poor audio reproduction. An additional aspect is therefore that, to obtain the decoding matrix, a pseudo-inverse of the mode matrix is computed, which is much easier to obtain.
[00011] The present invention uses a two-step approach. The first step is the derivation of panning functions that depend on the speaker setup used for playback. In the second step, an Ambisonics decoding matrix is computed from these panning functions for all speakers.
[00012] An advantage of the present invention is that no parametric description of the sound sources is needed; instead, a sound field description such as Ambisonics can be used.
[00013] According to the present invention, a method for decoding an audio sound field representation for audio reproduction comprises the steps of calculating, for each of a plurality of speakers, a panning function using a geometric method based on the positions of the speakers and a plurality of source directions, calculating a mode matrix from the source directions, calculating a pseudo-inverse mode matrix of the mode matrix, and decoding the audio sound field representation, wherein the decoding is based on a decoding matrix that is obtained from at least the panning function and the pseudo-inverse mode matrix.
[00014] According to another aspect, a device for decoding an audio sound field representation for audio reproduction comprises first calculation means for calculating, for each of a plurality of speakers, a panning function using a geometric method based on the positions of the speakers and a plurality of source directions, second calculation means for calculating a mode matrix from the source directions, third calculation means for calculating a pseudo-inverse mode matrix of the mode matrix, and decoding means for decoding the sound field representation, wherein the decoding is based on a decoding matrix and the decoding means uses at least the panning function and the pseudo-inverse mode matrix to obtain the decoding matrix. The first, second and third calculation means can be a single processor or two or more separate processors.
[00015] According to yet another aspect, a computer-readable medium has stored thereon executable instructions for causing a computer to implement a method for decoding an audio sound field representation for audio reproduction, the method comprising the steps of calculating, for each of a plurality of speakers, a panning function using a geometric method based on the positions of the speakers and a plurality of source directions, calculating a mode matrix from the source directions, calculating a pseudo-inverse mode matrix of the mode matrix, and decoding the audio sound field representation, wherein the decoding is based on a decoding matrix that is obtained from at least the panning function and the pseudo-inverse mode matrix.
[00016] Advantageous embodiments of the present invention are disclosed in the claims, the description below and the figures.

Brief Description of Drawings
[00017] Exemplary embodiments of the present invention are described with reference to the attached drawings, which show: Figure 1, a flow chart of the method; Figure 2, an exemplary 3D configuration with 16 speakers; Figure 3, a beam pattern resulting from decoding using a non-regularized mode matrix; Figure 4, a beam pattern resulting from decoding using a smoothed mode matrix; Figure 5, a beam pattern resulting from decoding using the decoding matrix derived from VBAP; Figure 6, a result of a listening test; and Figure 7, a block diagram of a device.

Detailed description of the present invention
[0018] As shown in figure 1, a method for decoding an audio sound field representation SFC for audio reproduction comprises the steps of calculating 110, for each of a plurality of speakers, a panning function W using a geometric method based on the speaker positions 102 (L is the number of speakers) and a plurality of source directions 103 (S is the number of source directions), calculating 120 a mode matrix Ξ from the source directions and a given order N of the sound field representation, calculating 130 a pseudo-inverse mode matrix Ξ+ of the mode matrix Ξ, and decoding 135, 140 the audio sound field representation SFC, whereby the decoded sound data AUdec are obtained. The decoding is based on the decoding matrix D, which is obtained 135 from at least the panning function W and the pseudo-inverse mode matrix Ξ+. In one embodiment, the pseudo-inverse mode matrix is obtained according to Ξ+ = ΞH [ΞΞH]-1. The order N of the sound field representation can be predefined, or it can be extracted 105 from the input signal SFC.
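Under these definitions, steps 120 to 140 reduce to matrix algebra. The following NumPy sketch illustrates the shapes involved; it uses random stand-ins for the VBAP weights and the spherical harmonic entries, and the function names are hypothetical, not part of the patent:

```python
import numpy as np

def decoding_matrix(W, Xi):
    """Step 135: D = W Xi^H (Xi Xi^H)^-1 = W Xi+.

    W  : L x S matrix of panning weights (one row per speaker)
    Xi : O x S mode matrix of the S source directions, O = (N + 1)**2
    """
    Xi_pinv = Xi.conj().T @ np.linalg.inv(Xi @ Xi.conj().T)
    return W @ Xi_pinv  # L x O

def decode(D, a):
    """Step 140: speaker signals from an Ambisonics coefficient vector a of length O."""
    return D @ a

# toy dimensions: L = 16 speakers, order N = 3, S = 324 source directions
L, N, S = 16, 3, 324
O = (N + 1) ** 2
rng = np.random.default_rng(0)
W = rng.standard_normal((L, S))    # would come from the VBAP step 110
Xi = rng.standard_normal((O, S))   # would come from the mode-matrix step 120
D = decoding_matrix(W, Xi)
print(D.shape)  # (16, 16)
```

The decoding matrix is computed once per speaker setup; step 140 is then a single matrix-vector product per sample block.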
[0019] As shown in figure 7, a device for decoding an audio sound field representation for audio reproduction comprises first calculation means 210 for calculating, for each of a plurality of speakers, a panning function W using a geometric method based on the positions 102 of the speakers and a plurality of source directions 103, second calculation means 220 for calculating a mode matrix Ξ from the source directions, third calculation means 230 for calculating a pseudo-inverse mode matrix Ξ+ of the mode matrix Ξ, and decoding means 240 for decoding the sound field representation. The decoding is based on the decoding matrix D, which is obtained from at least the panning function W and the pseudo-inverse mode matrix Ξ+ by decoding matrix calculation means 235 (e.g. a multiplier). The decoding means 240 uses the decoding matrix D to obtain a decoded audio signal AUdec. The first, second and third calculation means 210, 220, 230 can be a single processor, or two or more separate processors. The order N of the sound field representation can be predefined, or it can be obtained by means 205 for extracting the order from the input signal SFC.
[0020] A particularly useful 3D speaker setup has 16 speakers. As shown in figure 2, there are four columns with three speakers each, and additional speakers between those columns. Eight of the speakers are evenly distributed on a circle around the listener's head, enclosing angles of 45 degrees. Four additional speakers are located at the top and bottom, enclosing azimuth angles of 90 degrees. With respect to Ambisonics, this configuration is irregular and generally leads to problems in the decoder design.
[0021] In the following, Vector Base Amplitude Panning (VBAP) is described in more detail. In one embodiment, VBAP is used to place virtual acoustic sources with an arbitrary speaker configuration, where the same distance of the speakers from the listening position is assumed. VBAP uses three speakers to place a virtual source in 3D space. For each virtual source, a monophonic signal with different gains is fed to the speakers to be used. The gains for the different speakers depend on the position of the virtual source. VBAP is a geometric approach to calculate the speaker signal gains for panning between the speakers. In the 3D case, three speakers arranged in a triangle build a vector base. Each vector base is identified by the speaker numbers k, m, n and the speaker position vectors lk, lm, ln given in Cartesian coordinates normalized to unit length. The vector base for the speakers k, m, n is defined by

Lkmn = [lk lm ln] . (1)
[0022] The desired direction Ω = (θ, Φ) of the virtual source is given as an inclination angle θ and an azimuth angle Φ. The unit-length position vector p(Ω) of the virtual source in Cartesian coordinates is therefore defined by

p(Ω) = (cosΦ sinθ, sinΦ sinθ, cosθ)^T . (2)
The position of the virtual source can be represented with the vector base and the gain factors g(Ω) = (gk, gm, gn)^T by

p(Ω) = Lkmn g(Ω) . (3)
By inverting the vector base matrix, the necessary gain factors can be computed by

g(Ω) = Lkmn^-1 p(Ω) . (4)
[0023] The vector base to be used is determined as in Pulkki's paper: First, the gains are calculated according to Pulkki for all vector bases. Then, for each vector base, the minimum over the gain factors is evaluated by gmin = min{gk, gm, gn}. Finally, the vector base where gmin has the highest value is used. The resulting gain factors must not be negative. Depending on the listening room acoustics, the gain factors can be normalized for energy preservation.
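This selection rule can be sketched as follows, assuming the column convention Lkmn = [lk lm ln] of equation (1); the single axis-aligned triplet used in the example is hypothetical:

```python
import numpy as np

def vbap_gains(p, bases):
    """Pick the vector base with the largest minimum gain and return its gains.

    p     : unit-length Cartesian direction of the virtual source, equation (2)
    bases : list of 3x3 matrices whose columns are the speaker unit vectors
            lk, lm, ln of one triplet, equation (1)
    """
    best_g, best_min = None, -np.inf
    for L_kmn in bases:
        g = np.linalg.solve(L_kmn, p)       # g = Lkmn^-1 p, equation (4)
        if g.min() > best_min:              # base where the minimum gain is highest
            best_min, best_g = g.min(), g
    return best_g / np.linalg.norm(best_g)  # normalize for energy preservation

# hypothetical base: three speakers on the x, y and z axes
g = vbap_gains(np.array([1.0, 0.0, 0.0]), [np.eye(3)])
print(g)  # [1. 0. 0.] -- the source coincides with speaker k
```

In a real setup the base list would contain every speaker triplet of the configuration, and directions outside all triangles would yield a negative minimum gain for every base except the enclosing one.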
[0024] In the following, the Ambisonics format is described as an exemplary sound field format. The Ambisonics representation is a sound field description method employing a mathematical approximation of the sound field at one point in space. Using the spherical coordinate system, the pressure at the point r = (r, θ, Φ) in space is described by means of the spherical Fourier transform

P(r, k) = Σ (n = 0..∞) Σ (m = -n..n) Amn(k) jn(kr) Ymn(θ, Φ) , (5)
where k is the wave number. Normally, the series is truncated at a finite order M. The coefficients Amn(k) of the series describe the sound field (assuming sources outside the region of validity), jn(kr) is the spherical Bessel function of the first kind, and Ymn(θ, Φ) denotes the spherical harmonics. The coefficients Amn(k) are referred to as Ambisonics coefficients in the present context. The spherical harmonics Ymn(θ, Φ) depend only on the inclination and azimuth angles and describe a function on the unit sphere.
[0025] For reasons of simplicity, plane waves are often assumed for sound field reproduction. The Ambisonics coefficients describing a plane wave as an acoustic source from direction Ωs are

Amn(k) = 4π i^n Ymn(Ωs)* . (6)
[0026] Their dependence on the wave number k reduces to a pure directional dependence in this special case. For a bounded order M, the coefficients form a vector A which can be arranged as

A = (A00, A1-1, A10, A11, ..., AMM)^T , (7)
holding O = (M + 1)² elements. The same arrangement is used for the spherical harmonics, producing a vector

y(Ω) = (Y00(Ω), Y1-1(Ω), Y10(Ω), Y11(Ω), ..., YMM(Ω))^H .
[00027] The superscript H denotes complex conjugate transposition.
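With this stacking, a mode vector of length O = (M + 1)² can be built directly from SciPy's spherical harmonics. A sketch follows; note that SciPy's `sph_harm` takes the azimuth before the polar angle, and the (n, m) ordering shown is an assumption chosen to match the coefficient vector A above:

```python
import numpy as np
from scipy.special import sph_harm

def mode_vector(theta, phi, M):
    """Stack Ymn(theta, phi) for n = 0..M, m = -n..n into a length-O vector,
    O = (M + 1)**2, in the same order as the coefficient vector A.
    theta: inclination (polar) angle, phi: azimuth angle."""
    return np.array([sph_harm(m, n, phi, theta)   # scipy order: (m, n, azimuth, polar)
                     for n in range(M + 1)
                     for m in range(-n, n + 1)])

y = mode_vector(np.pi / 2, 0.0, 3)
print(len(y))  # 16 components for order M = 3
```

Stacking such vectors column by column for each speaker or source direction yields the mode matrices Ψ and Ξ used below.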
[0028] To calculate the speaker signals from an Ambisonics representation of a sound field, mode matching is a commonly used approach. The basic idea is to express a given Ambisonics sound field description A(Ωs) by a weighted sum of the sound field descriptions A(Ωl) of the individual speakers,

A(Ωs) = Σ (l = 1..L) wl A(Ωl) , (8)
where Ωl denotes the direction of speaker l, wl are the weights, and L is the number of speakers. To derive panning functions from equation (8), a known incidence direction Ωs is assumed. If the source and speaker sound fields are both plane waves, the factor 4π i^n (see equation (6)) cancels, and equation (8) depends only on the complex conjugates of the spherical harmonic vectors, also referred to as "modes". Using matrix notation, this is written as

y*(Ωs) = Ψ w , (9)
where Ψ is the mode matrix of the speaker configuration,

Ψ = [y*(Ω1), ..., y*(ΩL)] , (10)
with O x L elements. Several strategies are known for obtaining the desired weight vector w. If M = 3 is chosen, Ψ is square and can in principle be inverted. Due to the irregular speaker configuration, however, the matrix is poorly conditioned. In such cases the pseudo-inverse is often chosen,

D = Ψ^H [Ψ Ψ^H]^-1 , (11)

[0029] which produces an L x O decoding matrix D. The weights then follow as

w(Ωs) = D y*(Ωs) , (12)

where the weights w(Ωs) are the minimum-energy solution to equation (9). The consequences of using the pseudo-inverse are described below.
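The property of equations (11) and (12) that the resulting weights reproduce the desired mode can be checked numerically. The sketch below uses a random stand-in for Ψ; any full-row-rank O x L matrix behaves the same way:

```python
import numpy as np

O, L = 16, 16  # order M = 3 gives O = 16; the setup of figure 2 has L = 16
rng = np.random.default_rng(1)
Psi = rng.standard_normal((O, L)) + 1j * rng.standard_normal((O, L))  # stand-in mode matrix

D = Psi.conj().T @ np.linalg.inv(Psi @ Psi.conj().T)  # equation (11), an L x O matrix

y = rng.standard_normal(O) + 1j * rng.standard_normal(O)  # desired mode y*(Omega_s)
w = D @ y                                                 # equation (12)
print(np.allclose(Psi @ w, y))  # True: the weights satisfy equation (9)
```

Among all weight vectors satisfying equation (9), the pseudo-inverse picks the one of minimum Euclidean norm, which is the minimum-energy property mentioned above.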
[0030] The following describes the link between the panning functions and the Ambisonics decoding matrix. Starting from Ambisonics, the panning functions for the individual speakers can be calculated using equation (12). Let

Ξ = [y*(Ω1), ..., y*(ΩS)] (13)

be the mode matrix of the S input signal directions Ωs, for example a spherical grid with an inclination angle running in steps of one degree from 1...180° and an azimuth angle from 1...360°, respectively. This mode matrix has O x S elements. Using equation (12), the resulting matrix W has L x S elements; row l holds the S panning weights for the respective loudspeaker:

W = D Ξ . (14)
[0031] As a representative example, the panning function of a single speaker is shown as a beam pattern in figure 3, using a decoding matrix D of order M = 3. As can be seen, the panning function values in no way refer to the physical placement of the speaker. This is due to the mathematically irregular positioning of the speakers, which is not sufficient as a spatial sampling scheme for the chosen order. The corresponding decoding matrix is therefore referred to as a non-regularized mode matrix. This problem can be overcome by regularizing the speaker mode matrix Ψ in equation (11). The regularization comes at the cost of the spatial resolution of the decoding matrix, which in turn can be expressed as a lower effective Ambisonics order. Figure 4 shows an exemplary beam pattern resulting from decoding using the smoothed mode matrix, in particular using the averaged eigenvalues of the mode matrix for smoothing. Compared to figure 3, the direction of the addressed speaker is now clearly recognizable.
[0032] As outlined in the introduction, another way to obtain the decoding matrix D for the reproduction of Ambisonics signals is possible when the panning functions are already known. The panning functions W are seen as the desired signal, defined on a set of virtual source directions Ωs, and the mode matrix Ξ of those directions serves as the input signal. The decoding matrix can then be calculated as

D = W Ξ^H [Ξ Ξ^H]^-1 = W Ξ+ , (15)
where Ξ^H [Ξ Ξ^H]^-1, or simply Ξ+, is the pseudo-inverse of the mode matrix Ξ. In the new approach, the panning functions W are taken from VBAP, and an Ambisonics decoding matrix is calculated from equation (15).
[0033] The panning functions W are taken as the gain values g(Ω) calculated using equation (4), where Ω is chosen according to equation (13). The resulting decoding matrix from equation (15) is an Ambisonics decoding matrix that realizes the VBAP panning functions. An example is illustrated in figure 5, which shows a beam pattern resulting from decoding using the decoding matrix derived from VBAP. Advantageously, the side lobes SL are significantly smaller than the side lobes SLreg of the smoothed mode matching result of figure 4. Furthermore, the VBAP-derived beam pattern for the individual speakers follows the geometry of the speaker configuration, as the VBAP panning functions depend on the vector base of the addressed direction. As a consequence, the new approach according to the present invention produces better results for all directions of the speaker configuration.
[00034] The source directions 103 can be defined relatively freely. A condition for the number S of source directions is that it must be at least (N+1)². Thus, given an order N of the sound field signal SFC, S can be chosen according to S ≥ (N+1)², and the S source directions can be distributed evenly over the unit sphere. As mentioned above, the result can be a spherical grid with an inclination angle θ running in constant steps of x degrees (e.g. x = 1...5, or x = 10, 20 etc.) from 1...180° and an azimuth angle Φ from 1...360°, respectively, where each source direction Ω = (θ, Φ) is given by the azimuth angle Φ and the inclination angle θ.
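A sketch of such a direction grid and the S ≥ (N+1)² check follows; the 10-degree step is an arbitrary example, and note that a regular angle grid is denser near the poles than a truly uniform spherical sampling would be:

```python
import numpy as np

def source_grid(step_deg):
    """Directions (theta, phi): inclination 1..180 deg and azimuth 1..360 deg,
    both in steps of step_deg, as in the grid described above."""
    theta = np.deg2rad(np.arange(1, 181, step_deg))  # inclination angles
    phi = np.deg2rad(np.arange(1, 361, step_deg))    # azimuth angles
    T, P = np.meshgrid(theta, phi, indexing="ij")
    return np.column_stack([T.ravel(), P.ravel()])   # one (theta, phi) row per direction

N = 3
grid = source_grid(10)                       # 18 * 36 = 648 directions
print(len(grid), len(grid) >= (N + 1) ** 2)  # 648 True
```

Each row of the grid supplies one column of the mode matrix Ξ of equation (13).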
[00035] The beneficial effect was confirmed in a listening test. For the assessment of the localization of a single source, a virtual source is compared against a real source as reference. For the real source, a speaker at the desired position is used. The playback methods used are VBAP, Ambisonics mode-matching decoding, and the newly proposed decoding using the VBAP panning functions according to the present invention. For the latter two methods, for each tested position and each tested input signal, a third-order Ambisonics signal is generated. This synthetic Ambisonics signal is then decoded using the corresponding decoding matrices. The test signals used are broadband pink noise and a male speech signal. The tested positions are arranged in the frontal region with the directions

[00036] The listening test was conducted in an acoustic environment with an average reverberation time of approximately 0.2 s. Nine people participated in the listening test. The test subjects were asked to rate the spatial reproduction performance of all reproduction methods compared to the reference. A single rating value had to be found covering the virtual source localization and timbre changes. Figure 6 shows the results of the listening test.
[00037] As the results show, the non-regularized Ambisonics mode-matching decoding is rated perceptually poorer than the other methods tested. This result corresponds to figure 3. The Ambisonics mode-matching method serves as an anchor in this listening test. It can further be seen that the confidence intervals for the noise signal are larger for VBAP than for the other methods. The mean values are highest for the Ambisonics decoding using the VBAP panning functions. Thus, although the spatial resolution is reduced, due to the Ambisonics order used, this method shows advantages over the parametric VBAP approach. Compared to VBAP, the Ambisonics decoding using the VBAP panning functions has the advantage that more than three speakers are used to render the virtual source. In VBAP, a single speaker can become dominant if the position of the virtual source is close to one of the fixed speaker positions. Most subjects reported fewer timbre changes for the VBAP-driven Ambisonics decoding than for directly applied VBAP. The issue of timbre changes for VBAP is already known from Pulkki.
[00038] In contrast to VBAP, the newly proposed method uses more than three speakers to reproduce a virtual source, but surprisingly produces less coloration.
[00039] In conclusion, a new way to obtain an Ambisonics decoding matrix from the VBAP panning functions has been described. For irregular loudspeaker configurations, this approach is advantageous compared to the mode-matching approach. The properties and consequences of the resulting decoding matrices were discussed above. In short, the newly proposed Ambisonics decoding with VBAP panning functions avoids the typical problems of the known mode-matching approaches. A listening test showed that the VBAP-derived Ambisonics decoding can produce better spatial reproduction quality than the direct use of VBAP. Moreover, the proposed method only requires the sound field description, while VBAP requires a parametric description of the virtual sources to be rendered.
[00040] Although novel and fundamental characteristics of the present invention, as applied to the preferred embodiment thereof, have been shown, described and pointed out, it will be understood that various omissions, substitutions and changes in the apparatus and method described, in the form and details of the devices described, and in their operation, may be made by those skilled in the art without departing from the spirit of the present invention. It is expressly intended that all combinations of those elements that perform substantially the same function in substantially the same way to achieve the same results are within the scope of the present invention. Substitutions of elements from one described embodiment to another are also fully intended and contemplated. It will be understood that modifications of detail can be made without departing from the scope of the present invention. Each feature described in the description and (where appropriate) the claims and drawings may be provided independently or in any suitable combination. Features may, where appropriate, be implemented in hardware, software, or a combination of the two.
[00041] Reference numerals appearing in the claims are for illustration only and shall not limit the scope of the claims.
Claims:
Claims (12)
[0001]
1. Method for decoding an audio sound field representation for audio reproduction, characterized in that it comprises the steps of: - calculating (110), for each of a plurality of speakers, a panning function (W) using a geometric method based on the speaker positions and a plurality of source directions; - calculating (120) a mode matrix (Ξ) from the source directions; - calculating (130) a pseudo-inverse mode matrix (Ξ+) of the mode matrix (Ξ); and - decoding (140) the audio sound field representation, wherein the decoding is based on a decoding matrix (D) which is obtained from at least the panning function (W) and the pseudo-inverse mode matrix (Ξ+).
[0002]
2. Method according to claim 1, characterized in that the geometric method used in the step of calculating the panning function is Vector Base Amplitude Panning (VBAP).
[0003]
3. Method according to claim 1 or 2, characterized in that the pseudo-inverse mode matrix (Ξ+) is obtained according to Ξ+ = ΞH [Ξ ΞH]-1, where Ξ is the mode matrix of the plurality of source directions.
[0004]
4. Method according to claim 3, characterized in that the decoding matrix (D) is obtained (135) according to D = W ΞH [Ξ ΞH]-1 = W Ξ+, where W is the set of panning functions for each speaker.
[0005]
5. Device for decoding an audio sound field representation for audio reproduction, characterized in that it comprises: - first calculation means (210) for calculating, for each of a plurality of speakers, a panning function (W) using a geometric method based on the speaker positions and a plurality of source directions; - second calculation means (220) for calculating a mode matrix (Ξ) from the source directions; - third calculation means (230) for calculating a pseudo-inverse mode matrix (Ξ+) of the mode matrix (Ξ); and - decoding means (240) for decoding the sound field representation, wherein the decoding is based on a decoding matrix (D) and the decoding means uses at least the panning function (W) and the pseudo-inverse mode matrix (Ξ+) to obtain the decoding matrix (D).
[0006]
6. Device according to claim 5, characterized in that it further comprises means (235) for calculating the decoding matrix (D) from the panning function (W) and the pseudo-inverse mode matrix (Ξ+).
7. Device according to claim 5 or 6, characterized in that the geometric method used in calculating the panning function is Vector Base Amplitude Panning (VBAP).
8. Device according to any one of claims 5 to 7, characterized in that the pseudo-inverse mode matrix (Ξ+) is obtained according to Ξ+ = ΞH [Ξ ΞH]-1, where Ξ is the mode matrix of the plurality of source directions.
9. Device according to claim 8, characterized in that the decoding matrix (D) is obtained in the means (245) for calculating the decoding matrix according to D = W ΞH [Ξ ΞH]-1 = W Ξ+, where W is the set of panning functions for each speaker.
10. Computer readable medium having stored therein a method to be implemented by a computer for decoding an audio sound field representation for audio reproduction, the method characterized in that it comprises the steps of: - calculating (110), for each of a plurality of speakers, a panning function (W) using a geometric method based on the positions of the speakers and a plurality of source directions; - calculating (120) a mode matrix (Ξ) from the source directions; - calculating (130) a pseudo-inverse mode matrix (Ξ+) of the mode matrix (Ξ); and - decoding (140) the audio sound field representation, wherein the decoding is based on a decoding matrix (D) which is obtained from at least the panning function (W) and the pseudo-inverse mode matrix (Ξ+).
11. Computer readable medium according to claim 10, characterized in that the geometric method used in the step of calculating the panning function is Vector Base Amplitude Panning (VBAP).
12. Computer readable medium according to claim 10 or 11, characterized in that the pseudo-inverse mode matrix (Ξ+) is obtained according to Ξ+ = ΞH [Ξ ΞH]-1, where Ξ is the mode matrix of the plurality of source directions.
Similar technologies:
Publication number | Publication date | Patent title
BR112012024528B1|2021-05-11|method and device for decoding an audio sound field representation for audio reproduction, and computer readable medium
AU2016204408B2|2017-11-23|Method and device for decoding an audio soundfield representation for audio playback
AU2014265108B2|2016-06-30|Method and device for decoding an audio soundfield representation for audio playback
Family patents:
Publication number | Publication date
JP6918896B2|2021-08-11|
KR101795015B1|2017-11-07|
KR20190104450A|2019-09-09|
JP2021184611A|2021-12-02|
AU2011231565A1|2012-08-23|
KR102018824B1|2019-09-05|
KR101953279B1|2019-02-28|
US20190139555A1|2019-05-09|
KR101890229B1|2018-08-21|
EP2553947A1|2013-02-06|
US20130010971A1|2013-01-10|
BR122020001822B1|2021-05-04|
US10134405B2|2018-11-20|
CN102823277A|2012-12-12|
KR20130031823A|2013-03-29|
US9100768B2|2015-08-04|
US9460726B2|2016-10-04|
US20200273470A1|2020-08-27|
JP2013524564A|2013-06-17|
KR20200033997A|2020-03-30|
AU2011231565B2|2014-08-28|
JP2020039148A|2020-03-12|
JP5739041B2|2015-06-24|
JP2017085620A|2017-05-18|
PL2553947T3|2014-08-29|
US20150294672A1|2015-10-15|
KR20210107165A|2021-08-31|
WO2011117399A1|2011-09-29|
HK1174763A1|2013-06-14|
JP6615936B2|2019-12-04|
KR20170125138A|2017-11-13|
CN102823277B|2015-07-15|
KR101755531B1|2017-07-07|
KR102294460B1|2021-08-27|
JP5559415B2|2014-07-23|
US20170025127A1|2017-01-26|
JP2014161122A|2014-09-04|
US10037762B2|2018-07-31|
KR102093390B1|2020-03-25|
JP2018137818A|2018-08-30|
JP2015159598A|2015-09-03|
ES2472456T3|2014-07-01|
US11217258B2|2022-01-04|
EP2553947B1|2014-05-07|
KR20190022914A|2019-03-06|
BR112012024528A2|2016-09-06|
US9767813B2|2017-09-19|
PT2553947E|2014-06-24|
US20190341062A1|2019-11-07|
US10629211B2|2020-04-21|
KR20180094144A|2018-08-22|
JP6336558B2|2018-06-06|
JP6067773B2|2017-01-25|
US10522159B2|2019-12-31|
US20180308498A1|2018-10-25|
US20170372709A1|2017-12-28|
KR20170084335A|2017-07-19|
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title

US4095049A|1976-03-15|1978-06-13|National Research Development Corporation|Non-rotationally-symmetric surround-sound encoding system|
CN1452851A|2000-04-19|2003-10-29|音响方案公司|Multi-channel surround sound mastering and reproduction techniques that preserve spatial harmonics in three dimensions|
JP2002218655A|2001-01-16|2002-08-02|Nippon Telegr & Teleph Corp <Ntt>|Power supply system at airport|
FR2847376B1|2002-11-19|2005-02-04|France Telecom|METHOD FOR PROCESSING SOUND DATA AND SOUND ACQUISITION DEVICE USING THE SAME|
US7558393B2|2003-03-18|2009-07-07|Miller Iii Robert E|System and method for compatible 2D/3D surround sound reproduction|
DE602005003342T2|2005-06-23|2008-09-11|Akg Acoustics Gmbh|Method for modeling a microphone|
JP4928177B2|2006-07-05|2012-05-09|日本放送協会|Sound image forming device|
DE102006053919A1|2006-10-11|2008-04-17|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and method for generating a number of speaker signals for a speaker array defining a playback space|
US8290167B2|2007-03-21|2012-10-16|Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.|Method and apparatus for conversion between multi-channel audio formats|
US20080232601A1|2007-03-21|2008-09-25|Ville Pulkki|Method and apparatus for enhancement of audio reconstruction|
EP2094032A1|2008-02-19|2009-08-26|Deutsche Thomson OHG|Audio signal, method and apparatus for encoding or transmitting the same and method and apparatus for processing the same|
JP4922211B2|2008-03-07|2012-04-25|日本放送協会|Acoustic signal converter, method and program thereof|
ES2425814T3|2008-08-13|2013-10-17|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus for determining a converted spatial audio signal|
WO2011012455A1|2009-07-30|2011-02-03|Oce-Technologies B.V.|Automatic table location in documents|
PL2553947T3|2010-03-26|2014-08-29|Thomson Licensing|Method and device for decoding an audio soundfield representation for audio playback|
JP6589838B2|2016-11-30|2019-10-16|カシオ計算機株式会社|Moving picture editing apparatus and moving picture editing method|
US4968134A|1988-06-29|1990-11-06|Ricoh Company, Ltd.|Overhead projector|
PL2553947T3|2010-03-26|2014-08-29|Thomson Licensing|Method and device for decoding an audio soundfield representation for audio playback|
EP2541547A1|2011-06-30|2013-01-02|Thomson Licensing|Method and apparatus for changing the relative positions of sound objects contained within a higher-order ambisonics representation|
RU2672130C2|2011-07-01|2018-11-12|Долби Лабораторис Лайсэнзин Корпорейшн|System and instrumental means for improved authoring and representation of three-dimensional audio data|
US9084058B2|2011-12-29|2015-07-14|Sonos, Inc.|Sound field calibration using listener localization|
EP2637427A1|2012-03-06|2013-09-11|Thomson Licensing|Method and apparatus for playback of a higher-order ambisonics audio signal|
EP2645748A1|2012-03-28|2013-10-02|Thomson Licensing|Method and apparatus for decoding stereo loudspeaker signals from a higher-order Ambisonics audio signal|
EP2665208A1|2012-05-14|2013-11-20|Thomson Licensing|Method and apparatus for compressing and decompressing a Higher Order Ambisonics signal representation|
US9106192B2|2012-06-28|2015-08-11|Sonos, Inc.|System and method for device playback calibration|
US9288603B2|2012-07-15|2016-03-15|Qualcomm Incorporated|Systems, methods, apparatus, and computer-readable media for backward-compatible audio coding|
CN107071685B|2012-07-16|2020-02-14|杜比国际公司|Method and apparatus for rendering an audio soundfield representation for audio playback|
US9473870B2|2012-07-16|2016-10-18|Qualcomm Incorporated|Loudspeaker position compensation with 3D-audio hierarchical coding|
EP2688066A1|2012-07-16|2014-01-22|Thomson Licensing|Method and apparatus for encoding multi-channel HOA audio signals for noise reduction, and method and apparatus for decoding multi-channel HOA audio signals for noise reduction|
US9479886B2|2012-07-20|2016-10-25|Qualcomm Incorporated|Scalable downmix design with feedback for object-based surround codec|
US9761229B2|2012-07-20|2017-09-12|Qualcomm Incorporated|Systems, methods, apparatus, and computer-readable media for audio object clustering|
EP2738962A1|2012-11-29|2014-06-04|Thomson Licensing|Method and apparatus for determining dominant sound source directions in a higher order ambisonics representation of a sound field|
CN108174341B|2013-01-16|2021-01-08|杜比国际公司|Method and apparatus for measuring higher order ambisonics loudness level|
US9736609B2|2013-02-07|2017-08-15|Qualcomm Incorporated|Determining renderers for spherical harmonic coefficients|
EP2765791A1|2013-02-08|2014-08-13|Thomson Licensing|Method and apparatus for determining directions of uncorrelated sound sources in a higher order ambisonics representation of a sound field|
EP2979467B1|2013-03-28|2019-12-18|Dolby Laboratories Licensing Corporation|Rendering audio using speakers organized as a mesh of arbitrary n-gons|
CN108064014B|2013-04-26|2020-11-06|索尼公司|Sound processing device|
KR102160519B1|2013-04-26|2020-09-28|소니 주식회사|Audio processing device, method, and recording medium|
EP2800401A1|2013-04-29|2014-11-05|Thomson Licensing|Method and Apparatus for compressing and decompressing a Higher Order Ambisonics representation|
CN105340008B|2013-05-29|2019-06-14|高通股份有限公司|The compression through exploded representation of sound field|
US20140355769A1|2013-05-29|2014-12-04|Qualcomm Incorporated|Energy preservation for decomposed representations of a sound field|
US9466305B2|2013-05-29|2016-10-11|Qualcomm Incorporated|Performing positional analysis to code spherical harmonic coefficients|
EP3005354B1|2013-06-05|2019-07-03|Dolby International AB|Method for encoding audio signals, apparatus for encoding audio signals, method for decoding audio signals and apparatus for decoding audio signals|
EP2824661A1|2013-07-11|2015-01-14|Thomson Licensing|Method and Apparatus for generating from a coefficient domain representation of HOA signals a mixed spatial/coefficient domain representation of said HOA signals|
EP2866475A1|2013-10-23|2015-04-29|Thomson Licensing|Method for and apparatus for decoding an audio soundfield representation for audio playback using 2D setups|
EP2879408A1|2013-11-28|2015-06-03|Thomson Licensing|Method and apparatus for higher order ambisonics encoding and decoding using singular value decomposition|
CN111182443B|2014-01-08|2021-10-22|杜比国际公司|Method and apparatus for decoding a bitstream comprising an encoded HOA representation|
US9922656B2|2014-01-30|2018-03-20|Qualcomm Incorporated|Transitioning of ambient higher-order ambisonic coefficients|
US9502045B2|2014-01-30|2016-11-22|Qualcomm Incorporated|Coding independent frames of ambient higher-order ambisonic coefficients|
KR102201726B1|2014-03-21|2021-01-12|돌비 인터네셔널 에이비|Method for compressing a higher order ambisonics signal, method for decompressing a compressed hoa signal, apparatus for compressing a hoa signal, and apparatus for decompressing a compressed hoa signal|
EP2922057A1|2014-03-21|2015-09-23|Thomson Licensing|Method for compressing a Higher Order Ambisonicssignal, method for decompressing a compressed HOA signal, apparatus for compressing a HOA signal, and apparatus for decompressing a compressed HOA signal|
US10412522B2|2014-03-21|2019-09-10|Qualcomm Incorporated|Inserting audio channels into descriptions of soundfields|
WO2015145782A1|2014-03-26|2015-10-01|Panasonic Corporation|Apparatus and method for surround audio signal processing|
RU2666248C2|2014-05-13|2018-09-06|Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф.|Device and method for amplitude panning with front fading|
US10770087B2|2014-05-16|2020-09-08|Qualcomm Incorporated|Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals|
US9847087B2|2014-05-16|2017-12-19|Qualcomm Incorporated|Higher order ambisonics signal compression|
US9852737B2|2014-05-16|2017-12-26|Qualcomm Incorporated|Coding vectors decomposed from higher-order ambisonics audio signals|
US9620137B2|2014-05-16|2017-04-11|Qualcomm Incorporated|Determining between scalar and vector quantization in higher order ambisonic coefficients|
US9910634B2|2014-09-09|2018-03-06|Sonos, Inc.|Microphone calibration|
US9747910B2|2014-09-26|2017-08-29|Qualcomm Incorporated|Switching between predictive and non-predictive quantization techniques in a higher order ambisonicsframework|
US10140996B2|2014-10-10|2018-11-27|Qualcomm Incorporated|Signaling layers for scalable coding of higher order ambisonic audio data|
EP3073488A1|2015-03-24|2016-09-28|Thomson Licensing|Method and apparatus for embedding and regaining watermarks in an ambisonics representation of a sound field|
US9693165B2|2015-09-17|2017-06-27|Sonos, Inc.|Validation of audio calibration using multi-dimensional motion check|
US10070094B2|2015-10-14|2018-09-04|Qualcomm Incorporated|Screen related adaptation of higher order ambisoniccontent|
CN105392102B|2015-11-30|2017-07-25|武汉大学|Three-dimensional sound signal generation method and system for aspherical loudspeaker array|
WO2017119320A1|2016-01-08|2017-07-13|ソニー株式会社|Audio processing device and method, and program|
WO2017119321A1|2016-01-08|2017-07-13|ソニー株式会社|Audio processing device and method, and program|
EP3402221B1|2016-01-08|2020-04-08|Sony Corporation|Audio processing device and method, and program|
US10003899B2|2016-01-25|2018-06-19|Sonos, Inc.|Calibration with particular locations|
US11106423B2|2016-01-25|2021-08-31|Sonos, Inc.|Evaluating calibration of a playback device|
US9860662B2|2016-04-01|2018-01-02|Sonos, Inc.|Updating playback device configuration information based on calibration data|
US9763018B1|2016-04-12|2017-09-12|Sonos, Inc.|Calibration of audio playback devices|
US10372406B2|2016-07-22|2019-08-06|Sonos, Inc.|Calibration interface|
EP3574661B1|2017-01-27|2021-08-11|Auro Technologies NV|Processing method and system for panning audio objects|
US10861467B2|2017-03-01|2020-12-08|Dolby Laboratories Licensing Corporation|Audio processing in adaptive intermediate spatial format|
KR20190139206A|2017-04-13|2019-12-17|소니 주식회사|Signal processing apparatus and method, and program|
CN107147975B|2017-04-26|2019-05-14|北京大学|A kind of Ambisonics matching pursuit coding/decoding method put towards irregular loudspeaker|
US10405126B2|2017-06-30|2019-09-03|Qualcomm Incorporated|Mixed-order ambisonicsaudio data for computer-mediated reality systems|
US10674301B2|2017-08-25|2020-06-02|Google Llc|Fast and memory efficient encoding of sound objects using spherical harmonic symmetries|
US10264386B1|2018-02-09|2019-04-16|Google Llc|Directional emphasis in ambisonics|
US11206484B2|2018-08-28|2021-12-21|Sonos, Inc.|Passive speaker authentication|
Legal status:
2017-12-05| B25A| Requested transfer of rights approved|Owner name: DOLBY INTERNATIONAL AB (NL) |
2019-01-08| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|
2019-11-05| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2021-03-02| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2021-05-11| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: TERM OF VALIDITY: 10 (TEN) YEARS COUNTED FROM 11/05/2021, SUBJECT TO THE LEGAL CONDITIONS. |
Priority:
Application number | Filing date | Patent title
EP10305316|2010-03-26|
EP10305316.1|2010-03-26|
PCT/EP2011/054644|WO2011117399A1|2010-03-26|2011-03-25|Method and device for decoding an audio soundfield representation for audio playback|
BR122020001822-4A|BR122020001822B1|2010-03-26|2011-03-25|METHOD AND DEVICE TO DECODE AN AUDIO SOUND FIELD REPRESENTATION FOR AUDIO REPRODUCTION AND COMPUTER-READABLE MEDIA|