Brazilian Patent BR112013032878B1: METHOD AND APPARATUS TO CHANGE THE RELATIVE POSITIONS OF SOUND OBJECTS CONTAINED WITHIN AN AMBISONIC


Abstract:
Method and apparatus for changing the relative positions of sound objects contained within a higher-order Ambisonics representation. Higher-order Ambisonics (HOA) is a representation of spatial sound fields that facilitates the capture, manipulation, recording, transmission and reproduction of complex audio scenes with superior spatial resolution, both in 2D and 3D. The sound field is approximated at and around a reference point in space by a Fourier-Bessel series. The invention uses space deformation (12, 13, 14; 16) to modify the spatial content and/or the reproduction of sound field information that has been captured or produced as a higher-order Ambisonics representation. Different deformation characteristics are feasible for 2D and 3D sound fields. The deformation is performed in the space domain without performing scene analysis or decomposition. HOA input coefficients of a given order are decoded to the weights or input signals of regularly positioned (virtual) speakers.
Publication number: BR112013032878B1
Application number: R112013032878-9
Filing date: 2012-06-15
Publication date: 2021-04-13
Inventors: Peter Jax; Johann-Markus Batke
Applicant: Interdigital Madison Patent Holdings
Main IPC class:

Description:

The invention relates to a method and an apparatus for changing the relative positions of sound objects contained within a two-dimensional or three-dimensional higher-order Ambisonics representation of an audio scene.
BACKGROUND
Higher order ambisonics (HOA) is a representation of spatial sound fields that facilitates the capture, manipulation, recording, transmission and reproduction of complex audio scenes with superior spatial resolution, both in 2D and 3D. The sound field is approximated at, and around a reference point in space through a Fourier-Bessel series.
There are only a limited number of techniques for manipulating the spatial arrangement of an audio scene captured with HOA techniques. In principle, there are two ways:
A) Decompose the audio scene into separate sound objects and associated position information, for example by means of DirAC, and compose a new scene with manipulated position parameters. The disadvantage is that a sophisticated and error-prone scene decomposition is mandatory.
B) Modify the content of the HOA representation by means of a linear transformation of the HOA vectors. Here, only rotation, mirroring and emphasis of forward/backward directions have been proposed. All of these known transformation-based modification techniques keep the relative positioning of objects within a scene fixed.
To manipulate or modify a scene's content, space deformation has been proposed, including rotation and mirroring of HOA sound fields and emphasis of specific directions: G.J. Barton, M.A. Gerzon, "Ambisonic Decoders for HDTV", AES Convention, 1992; J. Daniel, "Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia", PhD thesis, Université de Paris 6, 2001, Paris, France; M. Chapman, Ph. Cotterell, "Towards a Comprehensive Account of Valid Ambisonic Transformations", Ambisonics Symposium, 2009, Graz, Austria.
INVENTION
A problem to be solved is that of facilitating the change of relative positions of sound objects contained within an HOA-based audio scene, without the need to analyze the composition of the scene. This problem is solved by the method disclosed in claim 1. An apparatus using this method is disclosed in claim 2.
The invention uses space deformation to modify the spatial content and/or the reproduction of sound field information that has been captured or produced as a higher-order Ambisonics representation. The space deformation in the HOA domain can be carried out either as a multi-step approach or, in a computationally more efficient way, as a single-step linear matrix multiplication. Different deformation characteristics are feasible for 2D and 3D sound fields.
Deformation is performed in the space domain without performing scene analysis or decomposition. The HOA input coefficients in a given order are decoded to the weights or input signals from regularly positioned (virtual) speakers.
The inventive space deformation processing has several advantages: - it is very flexible due to the several degrees of freedom in its parameterization; - it can be implemented very efficiently, that is, with comparatively low complexity; - it does not require any scene analysis or decomposition.
In principle, the inventive method is suitable for changing the relative positions of sound objects contained within a two-dimensional or three-dimensional higher-order Ambisonics (HOA) representation of an audio scene, in which an input vector $A_{in}$ with dimension $O_{in}$ determines the coefficients of a Fourier series of the input signal and an output vector $A_{out}$ with dimension $O_{out}$ determines the coefficients of a Fourier series of the correspondingly changed output signal, the method including the steps of: - decoding the input vector $A_{in}$ of input HOA coefficients into input signals $s_{in}$ in the space domain for regularly positioned speaker positions, using the inverse $\Psi_1^{-1}$ of a mode matrix $\Psi_1$, by calculating $s_{in} = \Psi_1^{-1} A_{in}$; - deforming and encoding, in the space domain, the input signals $s_{in}$ into the output vector $A_{out}$ of adapted output HOA coefficients by calculating $A_{out} = \Psi_2 s_{in}$, in which the mode vectors of the mode matrix $\Psi_2$ are modified according to a deformation function $f(\phi)$ by which the angles of the original speaker positions are mapped one-to-one to the target angles of the target speaker positions in the output vector $A_{out}$.
In principle, the inventive apparatus is suitable for changing the relative positions of sound objects contained within a two-dimensional or three-dimensional higher-order Ambisonics (HOA) representation of an audio scene, in which an input vector $A_{in}$ with dimension $O_{in}$ determines the coefficients of a Fourier series of the input signal and an output vector $A_{out}$ with dimension $O_{out}$ determines the coefficients of a Fourier series of the correspondingly changed output signal, the apparatus including: - means being adapted to decode the input vector $A_{in}$ of input HOA coefficients into input signals $s_{in}$ in the space domain for regularly positioned speaker positions, using the inverse $\Psi_1^{-1}$ of a mode matrix $\Psi_1$, by calculating $s_{in} = \Psi_1^{-1} A_{in}$; - means being adapted to deform and encode, in the space domain, the input signals $s_{in}$ into the output vector $A_{out}$ of adapted output HOA coefficients by calculating $A_{out} = \Psi_2 s_{in}$, in which the mode vectors of the mode matrix $\Psi_2$ are modified according to a deformation function $f(\phi)$ by which the angles of the original speaker positions are mapped one-to-one to the target angles of the target speaker positions in the output vector $A_{out}$.
Additional advantageous embodiments of the invention are disclosed in the respective dependent claims.
DRAWINGS
Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:
Figure 1, principle of deformation in the space domain;
Figure 2, example of space deformation with $N_{in} = 3$, $N_{out} = 12$ and the deformation function of Equation (27) with $a = -0.4$;

Figure 3, matrix distortions for different deformation functions and “internal” orders $N_{warp}$.
Exemplary Embodiments
In the following, for comprehensibility, the inventive application of space deformation is described for a two-dimensional scenario: the HOA representation is based on circular harmonics, and it is assumed that the represented sound field comprises only plane sound waves. Subsequently, the description is extended to three-dimensional cases, based on spherical harmonics.
Notation
In Ambisonics theory, the sound field at, and around, a specific point in space is described by means of a truncated Fourier-Bessel series. In general, the reference point is assumed to be at the origin of the chosen coordinate system.
For a three-dimensional application using spherical coordinates, the Fourier series with coefficients $C_n^m$ for all defined indices $n = 0, 1, \dots, \infty$ and $m = -n, \dots, n$ describes the pressure of the sound field at azimuth angle $\phi$, inclination $\theta$ and distance $r$ from the origin:
$$p(r, \theta, \phi) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} C_n^m \, j_n(kr) \, Y_n^m(\theta, \phi), \quad (1)$$
where $k$ is the wave number and $j_n(kr)\,Y_n^m(\theta, \phi)$ is the kernel function of the Fourier-Bessel series, which is strictly related to the spherical harmonic for the direction defined by $\theta$ and $\phi$. For convenience, the HOA coefficients $A_n^m$ are used in the following, with the definition $A_n^m = C_n^m \, j_n(kr)$. For a specific order $N$, the number of coefficients in the Fourier-Bessel series is $O = (N+1)^2$.
For a two-dimensional application using circular coordinates, the kernel functions depend only on the azimuth angle $\phi$. All coefficients with $|m| \neq n$ have a value of zero and can be omitted. Therefore, the number of HOA coefficients is reduced to just $O = 2N + 1$. In addition, the inclination $\theta = \pi/2$ is fixed. Note that for the 2D case and for a perfectly uniform distribution of the sound objects on the circle, that is, with $\phi_i = i \cdot 2\pi / O$, the mode vectors within the mode matrix $\Psi$ are identical to the kernel functions of the well-known discrete Fourier transform (DFT).
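The coefficient counts and the DFT connection can be checked numerically. The following is a minimal sketch, not taken from the patent: it assumes complex circular harmonics $Y_m(\phi) = e^{jm\phi}$ (one of several possible conventions), and the helper names `num_coeffs` and `mode_matrix_2d` are hypothetical.

```python
import numpy as np

def num_coeffs(order, three_d):
    """Number of HOA coefficients: (N+1)^2 in 3D, 2N+1 in 2D."""
    return (order + 1) ** 2 if three_d else 2 * order + 1

def mode_matrix_2d(order, angles):
    """Columns are mode vectors (m = -order..order) for the given azimuths.
    Assumption: Y_m(phi) = exp(j*m*phi); other conventions exist."""
    m = np.arange(-order, order + 1)[:, None]
    return np.exp(1j * m * np.asarray(angles, dtype=float)[None, :])

N = 3
O = num_coeffs(N, three_d=False)          # 7 coefficients in 2D
phi = 2 * np.pi * np.arange(O) / O        # perfectly uniform positions
Psi = mode_matrix_2d(N, phi)

# For uniform positions the rows are DFT kernels, so they are orthogonal:
# Psi Psi^H = O * I (identity up to the scale factor of this convention).
gram = Psi @ Psi.conj().T
```

With this unnormalized convention the Gram matrix is a scaled identity; a normalization by $1/\sqrt{O}$ would make it exactly the identity.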
There are different conventions for the definition of the kernel functions, which also lead to different definitions of the Ambisonics coefficients $A_n^m$. However, the exact definition does not play a role for the basic specification and characteristics of the space deformation techniques described in this application.
The HOA “signal” comprises a vector $A$ of Ambisonics coefficients for each time instant. For a two-dimensional - that is, circular - scenario, the typical composition and ordering of the coefficient vector is
$$A_{2D} := \left(A_N^{-N}, A_{N-1}^{-N+1}, \dots, A_1^{-1}, A_0^0, A_1^1, \dots, A_N^N\right)^T. \quad (2)$$

For a three-dimensional, spherical scenario, the common ordering of the coefficients is different:
$$A_{3D} := \left(A_0^0, A_1^{-1}, A_1^0, A_1^1, A_2^{-2}, \dots, A_N^N\right)^T. \quad (3)$$

The encoding of HOA representations behaves in a linear manner and, therefore, the HOA coefficients for multiple, separate sound objects can be added together to derive the HOA coefficients of the resulting sound field.
Simple Encoding
Simple encoding of multiple sound objects from multiple directions can be performed directly in vector algebra. “Encoding” means the step of deriving the HOA coefficient vector $A(k, l)$ at a time instant $l$ and wave number $k$ from the information on the pressure contributions $s_i(k, l)$ of the individual sound objects ($i = 0 \dots M-1$) at the same time instant $l$, plus the directions $\phi_i$ and $\theta_i$ from which the sound waves arrive at the origin of the coordinate system:
$$A(k, l) = \Psi \cdot s(k, l). \quad (4)$$

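If a two-dimensional scenario and an HOA vector composition as defined in Equation (2) are assumed, the mode matrix $\Psi$ is constructed from the mode vectors $Y(\phi) = \left(Y_N^{-N}(\phi), \dots, Y_0^0(\phi), \dots, Y_N^N(\phi)\right)^T$. The i-th column of $\Psi$ contains the mode vector for the direction $\phi_i$ of the i-th sound object:
$$\Psi = \left(Y(\phi_0), Y(\phi_1), \dots, Y(\phi_{M-1})\right). \quad (5)$$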
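As an illustration of the encoding step, the sketch below builds a 2D mode matrix column by column and encodes a few sound objects. It is only a sketch under assumptions: complex circular harmonics $Y_m(\phi) = e^{jm\phi}$ are used, and the directions, amplitudes and the helper name `mode_matrix_2d` are hypothetical.

```python
import numpy as np

def mode_matrix_2d(order, angles):
    """Columns are mode vectors (m = -order..order) for the given azimuths.
    Assumption: Y_m(phi) = exp(j*m*phi); other conventions exist."""
    m = np.arange(-order, order + 1)[:, None]
    return np.exp(1j * m * np.asarray(angles, dtype=float)[None, :])

N = 3
directions = np.array([0.0, 1.0, 2.5])    # hypothetical azimuths in radians
s = np.array([1.0, 0.5, -0.25])           # hypothetical pressure contributions

Psi = mode_matrix_2d(N, directions)       # shape (2N+1, M) = (7, 3)
A = Psi @ s                               # encoding: A = Psi s

# Linearity: encoding every object on its own and adding the results
# gives the same HOA vector as encoding all objects at once.
A_sum = sum(mode_matrix_2d(N, [d]) @ np.array([x])
            for d, x in zip(directions, s))
```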

As defined above, the encoding of an HOA representation can be interpreted as a space-frequency transformation because the input signals (sound objects) are spatially distributed. This transformation by the matrix $\Psi$ can be reversed without loss of information only if the number of sound objects is identical to the number of HOA coefficients, that is, if $M = O$, and if the directions $\phi_i$ are reasonably spread around the unit circle. In mathematical terms, the conditions for reversibility are that the mode matrix $\Psi$ must be square ($O \times O$) and invertible.
Simple Decoding
Upon decoding, the driver signals of real or virtual speakers are derived that have to be applied in order to accurately reproduce the desired sound field as described by the input HOA coefficients. Such decoding depends on the number $M$ and the positions of the speakers. The following three important cases have to be distinguished (note: these cases are simplified in the sense that they are defined by means of the “number of speakers”, assuming that the speakers are set up in a geometrically reasonable way. More precisely, the definition should be made by means of the rank of the mode matrix of the target speaker configuration). In the exemplary decoding rules shown below, the mode matching decoding principle is employed, but other decoding principles can be used, which may lead to different decoding rules for the three scenarios.
• Overdetermined case: The number of speakers is greater than the number of HOA coefficients, that is, $M > O$. In this case, there is no unique solution to the decoding problem, but there is a range of admissible solutions that are located in an $(M - O)$-dimensional subspace of the $M$-dimensional space of all potential solutions. Typically, the pseudo-inverse of the mode matrix $\Psi$ of the specific speaker configuration is used to determine the speaker signals $s$:
$$s = \Psi^T \left(\Psi \Psi^T\right)^{-1} A. \quad (6)$$
This solution provides the speaker signals with the minimum total reproduction power $s^T s$ (see, for example, L.L. Scharf, "Statistical Signal Processing. Detection, Estimation, and Time Series Analysis", Addison-Wesley Publishing Company, Reading, Massachusetts, 1990). For regular speaker configurations (which can be easily achieved in the 2D case) the matrix operation $\left(\Psi \Psi^T\right)^{-1}$ produces the identity matrix, and the decoding rule from Equation (6) simplifies to $s = \Psi^T A$.
• Determined case: The number of speakers is equal to the number of HOA coefficients. Exactly one unique solution exists for the decoding problem, which is defined by the inverse $\Psi^{-1}$ of the mode matrix:
$$s = \Psi^{-1} A. \quad (7)$$

• Underdetermined case: The number of speakers $M$ is less than the number $O$ of HOA coefficients. Thus, the mathematical problem of decoding the sound field is underdetermined and there is no unique, exact solution. Instead, numerical optimization has to be used to determine the speaker signals that best match the desired sound field.
Regularization can be applied to derive a stable solution, for example, using the formula
$$s = \Psi^T \left(\Psi \Psi^T + \lambda I\right)^{-1} A, \quad (8)$$
where $I$ denotes the identity matrix and the scalar factor $\lambda$ defines the amount of regularization. As an example, $\lambda$ can be set to the mean of the eigenvalues of $\Psi \Psi^T$.
The resulting beam patterns can be suboptimal because in general the beam patterns obtained with this approach are excessively directional, and a large amount of sound information will be underrepresented.
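The three decoding cases can be sketched as one small function. This is an illustrative sketch under assumptions, not the patent's implementation: it uses complex circular harmonics, so the Hermitian transpose takes the place of the plain transpose, and the names `mode_matrix_2d`, `regular` and `decode` are hypothetical.

```python
import numpy as np

def mode_matrix_2d(order, angles):
    """Assumption: Y_m(phi) = exp(j*m*phi), rows m = -order..order."""
    m = np.arange(-order, order + 1)[:, None]
    return np.exp(1j * m * np.asarray(angles, dtype=float)[None, :])

def regular(M):
    """M equally spaced azimuths on the circle."""
    return 2 * np.pi * np.arange(M) / M

def decode(Psi, A, lam=None):
    """Speaker signals s for HOA vector A and speaker mode matrix Psi (O x M)."""
    O, M = Psi.shape
    PsiH = Psi.conj().T
    if M == O:                                 # determined: plain inverse
        return np.linalg.solve(Psi, A)
    if M > O:                                  # overdetermined: pseudo-inverse
        return PsiH @ np.linalg.solve(Psi @ PsiH, A)
    if lam is None:                            # underdetermined: regularized
        lam = np.mean(np.linalg.eigvalsh(Psi @ PsiH))
    return PsiH @ np.linalg.solve(Psi @ PsiH + lam * np.eye(O), A)

N = 3
A = mode_matrix_2d(N, [0.7])[:, 0]             # one plane wave from 0.7 rad
Psi12, Psi7, Psi5 = (mode_matrix_2d(N, regular(M)) for M in (12, 7, 5))
s_over = decode(Psi12, A)
s_det = decode(Psi7, A)
s_under = decode(Psi5, A)
```

In the overdetermined and determined cases, re-encoding the speaker signals reproduces the HOA vector exactly; in the underdetermined case only an approximation is obtained.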
For all the decoder examples described above, the assumption was made that the speakers emit plane waves. Real-world speakers have different reproduction characteristics, which must be taken into account by the decoding rule.
Basic Deformation
The inventive space deformation principle is illustrated in Figure 1a. Deformation is carried out in the space domain. Therefore, the input HOA coefficients $A_{in}$ with order $N_{in}$ and dimension $O_{in}$ are first decoded in step/stage 12 into weights or input signals $s_{in}$ for regularly positioned (virtual) speakers. For this decoding step it is advantageous to employ a determined decoder, that is, one for which the number $O_{warp}$ of virtual speakers is equal to or greater than the number $O_{in}$ of HOA coefficients. In the latter case (more speakers than HOA coefficients), the order or dimension of the vector $A_{in}$ of HOA coefficients can easily be extended by adding zero coefficients for the higher orders in step/stage 11. The dimension of the target vector $s_{in}$ is denoted by $O_{warp}$ in the following.
The decoding rule is
$$s_{in} = \Psi_1^{-1} A_{in}. \quad (9)$$

The virtual positions of the speaker signals should be regular, for example, $\phi_i = i \cdot 2\pi / O_{warp}$ for the two-dimensional case. In this way, it is guaranteed that the mode matrix $\Psi_1$ is well conditioned for determining the decoding matrix $\Psi_1^{-1}$.
Then, the positions of the virtual speakers are modified in the “deformation” processing according to the desired deformation characteristics. In step/stage 14, the deformation processing is combined with the encoding of the target vector $s_{in}$ (or $s_{out}$, respectively) using the mode matrix $\Psi_2$, resulting in the vector $A_{out}$ of deformed HOA coefficients with dimension $O_{warp}$ or, after an additional processing step described below, with dimension $O_{out}$. In principle, the deformation characteristics can be defined completely by a one-to-one mapping of source angles to target angles, that is, for each source angle $\phi_{in} = 0 \dots 2\pi$ and possibly $\theta_{in} = 0 \dots 2\pi$ a target angle is defined, so for the 2D case
$$\phi_{out} = f(\phi_{in}). \quad (10)$$

For understanding, this (virtual) reorientation can be compared to physically moving the speakers to new positions.
A problem produced by this procedure is that the distance between adjacent speakers at certain angles is changed according to the gradient of the deformation function $f(\phi)$ (described for the 2D case in the following): if the gradient of $f(\phi)$ is larger than one, the same angular space in the deformed sound field will be occupied by fewer “speakers” than in the original sound field, and vice versa. In other words, the density $D_s$ of the speakers behaves according to
$$D_s(\phi) = \frac{1}{\mathrm{d}f(\phi)/\mathrm{d}\phi}. \quad (11)$$

In turn, this means that the space deformation changes the sound balance around the listener. Regions in which the speaker density is increased, that is, for which $D_s(\phi) > 1$, will become more dominant, and regions in which $D_s(\phi) < 1$ will become less dominant.
As an option, depending on the application requirements, the aforementioned modification of the speaker density can be countered by applying a gain function $g(\phi)$ to the virtual speaker output signals $s_{in}$ in weighting step/stage 13, resulting in the signal $s_{out}$. In principle, any weighting function $g(\phi)$ can be specified. A particularly advantageous variant has been determined empirically to be proportional to the derivative of the deformation function $f(\phi)$:
$$g(\phi) = \frac{\mathrm{d}f(\phi)}{\mathrm{d}\phi}. \quad (12)$$

With this specific weighting function, under the assumption of an appropriately high internal order and output order (see the section How to establish HOA orders below), the amplitude of a panning function at a specific deformed angle $f(\phi)$ is kept equal to that of the original panning function at the original angle $\phi$. In this way, a homogeneous sound balance (amplitude) per opening angle is obtained.
Apart from the exemplary weighting function above, other weighting functions can be used, for example, to obtain an equal power per opening angle.
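The relation between the deformation function, the speaker density and the derivative-based weighting can be illustrated with a short sketch. The deformation function `warp` used here is a hypothetical example chosen only because it is monotonic and maps the full circle onto itself; the central-difference derivative in `gain` is likewise just one possible way to evaluate the gradient.

```python
import numpy as np

def warp(phi):
    """Hypothetical deformation function (monotonic, f(phi + 2*pi) = f(phi) + 2*pi)."""
    return phi + 0.3 * np.sin(phi)

def gain(f, phi, eps=1e-5):
    """Weighting g(phi) = df/dphi, estimated by a central difference."""
    return (f(phi + eps) - f(phi - eps)) / (2 * eps)

phi = 2 * np.pi * np.arange(16) / 16      # regular virtual speaker angles
g = gain(warp, phi)                       # weight per virtual speaker
density = 1.0 / g                         # speaker density after deformation

# Where the gradient exceeds one the speakers are spread out (density < 1),
# and the gain g > 1 compensates for the thinner coverage.
```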
Finally, in step/stage 14 the weighted virtual speaker signals are deformed and encoded again with the mode matrix $\Psi_2$ by performing $\Psi_2 s_{out}$. $\Psi_2$ comprises different mode vectors than $\Psi_1$, according to the deformation function $f(\phi)$. The result is an $O_{warp}$-dimensional HOA representation of the deformed sound field.
If the order or dimension of the target HOA representation is to be lower than the order of the encoder $\Psi_2$ (see the section How to establish HOA orders below), some (that is, a part) of the deformed coefficients have to be removed (stripped) in step/stage 15. In general, this removal operation can be described by a windowing operation: the encoded vector $\Psi_2 s_{out}$ is multiplied by a window vector $w$ that comprises zero coefficients for the highest orders to be removed; this multiplication can be regarded as a further weighting. In the simplest case, a rectangular window can be employed; however, more sophisticated windows can be used, as described in section 3 of M.A. Poletti, "A Unified Theory of Horizontal Holographic Sound Systems", Journal of the Audio Engineering Society, 48(12), pp. 1155-1182, 2000, or the 'in-phase' or 'max. $r_E$' windows from section 3.3.2 of J. Daniel's PhD thesis mentioned above.
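The multi-step chain (order extension, decoding, weighting, deformation plus encoding, stripping) can be sketched for the 2D case as follows. This is a minimal sketch under several assumptions that are not fixed by the text: complex circular harmonics $Y_m(\phi) = e^{jm\phi}$, a deformation function of first-order all-pass phase form $f(\phi) = \phi + 2\arctan\frac{a \sin\phi}{1 - a\cos\phi}$, the derivative of $f$ as weighting, and a rectangular window. All function names are hypothetical.

```python
import numpy as np

def mode_matrix_2d(order, angles):
    """Assumption: Y_m(phi) = exp(j*m*phi), rows m = -order..order."""
    m = np.arange(-order, order + 1)[:, None]
    return np.exp(1j * m * np.asarray(angles, dtype=float)[None, :])

def allpass_warp(phi, a):
    """Assumed deformation function (first-order all-pass phase form)."""
    return phi + 2 * np.arctan(a * np.sin(phi) / (1 - a * np.cos(phi)))

def allpass_gain(phi, a):
    """Analytic derivative of allpass_warp with respect to phi."""
    return (1 - a**2) / (1 + a**2 - 2 * a * np.cos(phi))

def warp_multistep(A_in, n_in, n_warp, n_out, a):
    o_warp = 2 * n_warp + 1
    pad = n_warp - n_in
    A_ext = np.pad(A_in, (pad, pad))               # step 11: zero-extend order
    phi = 2 * np.pi * np.arange(o_warp) / o_warp   # regular virtual speakers
    s_in = np.linalg.solve(mode_matrix_2d(n_warp, phi), A_ext)   # step 12
    s_out = allpass_gain(phi, a) * s_in            # step 13: weighting
    Psi2 = mode_matrix_2d(n_warp, allpass_warp(phi, a))
    A_warp = Psi2 @ s_out                          # step 14: deform + encode
    cut = n_warp - n_out
    return A_warp[cut:o_warp - cut]                # step 15: strip high orders

A_in = np.exp(1j * np.arange(-3, 4) * 0.5)         # plane wave from 0.5 rad
A_out = warp_multistep(A_in, n_in=3, n_warp=16, n_out=12, a=-0.4)
```

Because the coefficient vector is ordered from $m = -N$ to $m = N$ in this sketch, extension and stripping act symmetrically on both ends of the vector. With $a = 0$ the deformation is the identity and the input vector is returned unchanged.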
Deformation Functions for 3D
The concept of a deformation function $f(\phi)$ and an associated weighting function $g(\phi)$ was described above for the two-dimensional case. What follows is an extension to the three-dimensional case, which is more sophisticated both because of the higher dimension and because of the spherical geometry that has to be employed. Two simplified scenarios are introduced, both of which allow the desired spatial deformation to be specified by means of one-dimensional deformation functions $f_\phi(\phi)$ or $f_\theta(\theta)$.
In the space deformation along longitudes, the space deformation is performed as a function of the azimuth $\phi$ only. This case is very similar to the two-dimensional case introduced above. The deformation function is defined completely by
$$\theta_{out} = \theta_{in}, \qquad \phi_{out} = f_\phi(\phi_{in}).$$

In this way, deformation functions similar to those of the two-dimensional case can be used. Space deformation has its maximum impact for sound objects at the equator, while it has the lowest impact for sound objects at the poles of the sphere. The density of (deformed) sound objects on the sphere depends only on the azimuth. Therefore, the weighting function for constant density is
$$g(\theta, \phi) = \frac{\mathrm{d}f_\phi(\phi)}{\mathrm{d}\phi}.$$

The specific deformation characteristics can be oriented freely in space by (virtually) rotating the sphere before applying the deformation and rotating it back afterwards.
In space deformation along latitudes, space deformation is allowed only along the meridians. The deformation function is defined by
$$\phi_{out} = \phi_{in}, \qquad \theta_{out} = f_\theta(\theta_{in}).$$

An important characteristic of this deformation function on a sphere is that, although the azimuth angle is kept constant, the angular distance of two points in the azimuth direction can change due to the modification of the inclination. The reason is that the angular distance between two meridians is maximum at the equator, but vanishes at both poles. This fact has to be taken into account by the weighting function.
The angular distance $c$ of two points A and B can be determined by the cosine rule of spherical geometry, compare with Eq. (3.188c) in I.N. Bronstein, K.A. Semendjajew, G. Musiol, H. Mühlig, "Taschenbuch der Mathematik", Verlag Harri Deutsch, Thun, Frankfurt/Main, 5th edition, 2000:
$$\cos c = \cos\theta_A \cos\theta_B + \sin\theta_A \sin\theta_B \cos\phi_{AB},$$
where $\phi_{AB}$ denotes the azimuth angle between the two points A and B. For the angular distance between two points at the same inclination $\theta$, this equation simplifies to
$$c = \arccos\left(\cos^2\theta + \sin^2\theta \cos\phi_{AB}\right).$$

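This formula can be used to derive the angular distance between a point in space and another point that is separated from it by a small azimuth angle $\phi_\varepsilon$. “Small” means as small as feasible in practical applications, but not zero; in theory it is the limiting value $\phi_\varepsilon \to 0$. The ratio between such angular distances after and before deformation provides the factor by which the density of sound objects changes in the $\phi$ direction:
$$g_\phi(\theta) = \frac{\arccos\left(\cos^2 f_\theta(\theta) + \sin^2 f_\theta(\theta) \cos\phi_\varepsilon\right)}{\arccos\left(\cos^2\theta + \sin^2\theta \cos\phi_\varepsilon\right)}.$$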

Finally, the weighting function is the product of the two weighting functions in the Φ and θ directions

Again, as in the previous scenario, a free orientation of the specific deformation characteristics in space is feasible by rotation.
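The spherical cosine rule and the resulting azimuth-direction factor can be evaluated numerically. The sketch below is illustrative only: the function names are hypothetical, and the direction of the ratio in `azimuth_gain` (distance after deformation divided by distance before) is an assumption chosen to match the 2D weighting, where the gain equals the gradient of the deformation function.

```python
import numpy as np

def angular_distance(theta_a, phi_a, theta_b, phi_b):
    """Angular distance on the unit sphere (theta = inclination, phi = azimuth)."""
    cos_c = (np.cos(theta_a) * np.cos(theta_b)
             + np.sin(theta_a) * np.sin(theta_b) * np.cos(phi_a - phi_b))
    return np.arccos(np.clip(cos_c, -1.0, 1.0))    # clip guards rounding errors

def azimuth_gain(theta_in, theta_out, phi_eps=1e-3):
    """Ratio of small azimuthal distances after / before moving a point
    from inclination theta_in to theta_out (assumed direction of the ratio)."""
    c_in = angular_distance(theta_in, 0.0, theta_in, phi_eps)
    c_out = angular_distance(theta_out, 0.0, theta_out, phi_eps)
    return c_out / c_in

# Moving a point from the equator (theta = pi/2) to theta = pi/6 halves
# the azimuthal distance, because meridians converge toward the poles.
ratio = azimuth_gain(np.pi / 2, np.pi / 6)
```

For small $\phi_\varepsilon$ the distance at inclination $\theta$ is close to $\phi_\varepsilon \sin\theta$, so the ratio approaches $\sin\theta_{out} / \sin\theta_{in}$.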
Single Step Processing
The steps introduced in connection with Figure 1a, that is, order extension, decoding, weighting, deformation plus encoding, and stripping, are essentially linear operations. Therefore, this sequence of operations can be replaced by multiplying the input HOA coefficients with a single matrix in step/stage 16, as shown in Figure 1b. Omitting the extension and stripping operations, the complete transformation matrix $T$ of size $O_{warp} \times O_{warp}$ is determined as
$$T = \mathrm{diag}(w)\, \Psi_2\, \mathrm{diag}(g)\, \Psi_1^{-1}, \quad (24)$$
where $\mathrm{diag}(\cdot)$ denotes a diagonal matrix that has the values of its vector argument as components of the main diagonal, $g$ is the weighting function, and $w$ is the window vector for preparing the stripping described above. That is, of the two functions of the windowing (weighting in preparation of the stripping, and the stripping of coefficients itself as carried out in step/stage 15), the window vector $w$ in Equation (24) serves only for the weighting.
The two adaptations of orders within the multi-step approach, that is, the extension of the order preceding the decoder and the stripping of HOA coefficients after encoding, can also be integrated into the transformation matrix $T$ by removing the corresponding columns and/or rows. In this way, a matrix of size $O_{out} \times O_{in}$ is derived which can be applied directly to the input HOA vectors. The space deformation operation then becomes
$$A_{out} = T A_{in}. \quad (25)$$
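The construction of the single matrix can be sketched as follows for the 2D case. As before this is only a sketch under assumptions: complex circular harmonics, an all-pass-type deformation function with its derivative as weighting, a rectangular window, and hypothetical function names. The final lines compare the reduced matrix with the equivalent multi-step computation on the zero-extended vector.

```python
import numpy as np

def mode_matrix_2d(order, angles):
    """Assumption: Y_m(phi) = exp(j*m*phi), rows m = -order..order."""
    m = np.arange(-order, order + 1)[:, None]
    return np.exp(1j * m * np.asarray(angles, dtype=float)[None, :])

def warp_matrices(n_in, n_warp, n_out, a):
    """Return the full (O_warp x O_warp) matrix and the reduced (O_out x O_in) one."""
    o_warp = 2 * n_warp + 1
    phi = 2 * np.pi * np.arange(o_warp) / o_warp
    f_phi = phi + 2 * np.arctan(a * np.sin(phi) / (1 - a * np.cos(phi)))
    g = (1 - a**2) / (1 + a**2 - 2 * a * np.cos(phi))   # derivative of f
    w = np.ones(o_warp)                                  # rectangular window
    Psi1 = mode_matrix_2d(n_warp, phi)
    Psi2 = mode_matrix_2d(n_warp, f_phi)
    T_full = np.diag(w) @ Psi2 @ np.diag(g) @ np.linalg.inv(Psi1)
    m = np.arange(-n_warp, n_warp + 1)
    rows = np.abs(m) <= n_out        # keep output orders up to n_out
    cols = np.abs(m) <= n_in         # drop columns that only see padded zeros
    return T_full, T_full[rows][:, cols]

n_in, n_warp, n_out, a = 3, 32, 12, -0.4
T_full, T = warp_matrices(n_in, n_warp, n_out, a)

A_in = np.exp(1j * np.arange(-n_in, n_in + 1) * 0.5)    # plane wave, 0.5 rad
A_single = T @ A_in                                      # single-step result

pad, cut = n_warp - n_in, n_warp - n_out
A_multi = (T_full @ np.pad(A_in, (pad, pad)))[cut:2 * n_warp + 1 - cut]
```

Removing columns is equivalent to multiplying by the zero coefficients added during order extension, and removing rows is equivalent to stripping, which is why both routes give the same output vector.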
Advantageously, due to the effective reduction of the dimensions of the transformation matrix $T$ from $O_{warp} \times O_{warp}$ to $O_{out} \times O_{in}$, the computational complexity required for the single-step processing according to Figure 1b is significantly lower than that required for the multi-step approach of Figure 1a, although single-step processing provides perfectly identical results. In particular, distortions that could arise if multi-step processing were performed with a lower order $N_{warp}$ of its interim signals are avoided (see the section How to establish HOA orders below for details).
State of the art: rotation and mirroring
Rotations and mirroring of a sound field can be considered as “simple” subcategories of space deformation. The special feature of these transformations is that the relative position of the sound objects with respect to each other is not changed. This means that a sound object that was located, for example, 30° to the right of another sound object in the original sound scene will remain 30° to the right of the same sound object in the rotated sound scene. For mirroring, only the sign changes, but the angular distances remain the same.
Algorithms and applications for rotation and mirroring of sound field information have been explored and described, for example, in the articles by Barton/Gerzon and J. Daniel mentioned above, and in M. Noisternig, A. Sontacchi, Th. Musil, R. Höldrich, "A 3D Ambisonic Based Binaural Sound Reproduction System", Proc. of the AES 24th Intl. Conf. on Multichannel Audio, Banff, Canada, 2003, and in H. Pomberger, F. Zotter, "An Ambisonics Format for Flexible Playback Layouts", 1st Ambisonics Symposium, Graz, Austria, 2009.
These approaches are based on analytical expressions for the rotation matrices. For example, the rotation of a circular sound field (2D case) by an arbitrary angle $\alpha$ can be performed by multiplication with the deformation matrix $T_\alpha$, in which only a subset of the coefficients is nonzero:

As in this example, all deformation matrices for rotation and/or mirroring operations have the special characteristic that only coefficients of the same order $n$ affect each other. Therefore, these deformation matrices are very sparsely populated, and the output order $N_{out}$ can be equal to the input order $N_{in}$ without losing any spatial information.
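The sparsity of a rotation can be seen in a short sketch. It assumes complex circular harmonics $Y_m(\phi) = e^{jm\phi}$, for which a 2D rotation is a purely diagonal matrix; with a real-valued (cosine/sine) convention the matrix would instead couple the two coefficients of each order. The function name `rotation_matrix_2d` is hypothetical.

```python
import numpy as np

def rotation_matrix_2d(order, alpha):
    """2D rotation by alpha for coefficients ordered m = -order..order.
    Assumption: complex harmonics, so each coefficient only gets a phase."""
    m = np.arange(-order, order + 1)
    return np.diag(np.exp(1j * m * alpha))

N = 3
m = np.arange(-N, N + 1)
phi0, alpha = 0.4, np.pi / 6

A = np.exp(1j * m * phi0)                  # plane wave from azimuth phi0
T_alpha = rotation_matrix_2d(N, alpha)
A_rot = T_alpha @ A                        # same wave, now from phi0 + alpha

# Only 2N+1 of the (2N+1)^2 matrix entries are nonzero, and the output
# order equals the input order.
nonzero = np.count_nonzero(T_alpha)
```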
There are some interesting applications for which the rotation or mirroring of the sound field information is required. An example is the reproduction of sound fields through headphones with a head tracking system. Instead of interpolating HRTFs (head-related transfer functions) according to the rotation angle of the head, it is advantageous to pre-rotate the sound field according to the position of the head and to use fixed HRTFs for the actual reproduction. This process was described in the Noisternig/Sontacchi/Musil/Höldrich article.
Another example was described in the Pomberger/Zotter article mentioned above, in the context of encoding sound field information. It is possible to limit the spatial region that is described by the HOA vectors to specific parts of a circle (2D case) or a sphere. Due to these restrictions some parts of the HOA vectors will become zero. The idea promoted in that article is to use this redundancy-reducing property for mixed-order coding of the sound field information. As the aforementioned restrictions can only be obtained for very specific regions in space, a rotation operation is generally required to shift the transmitted partial information to the desired region in space.
Example
Figure 2 illustrates an example of space deformation in the two-dimensional (circular) case. The deformation function was chosen to be
$$f(\phi) = \phi + 2 \arctan\left(\frac{a \sin\phi}{1 - a \cos\phi}\right) \quad \text{with } a = -0.4, \quad (27)$$
which resembles the phase response of a discrete-time all-pass filter with a single real-valued parameter, compare with: M. Kappelan, "Eigenschaften von Allpass-Ketten und ihre Anwendung bei der nicht-äquidistanten spektralen Analyse und Synthese", PhD thesis, Aachen University (RWTH), Aachen, Germany, 1998.
The deformation function is shown in Figure 2a. This specific deformation function was selected because it guarantees a $2\pi$-periodic deformation function while allowing the amount of spatial distortion to be modified with a single parameter $a$. The corresponding weighting function shown in Figure 2b results deterministically from that specific deformation function.
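The behaviour of such a single-parameter deformation function can be explored with a few lines of code. The sketch assumes the first-order all-pass phase form $f(\phi) = \phi + 2\arctan\frac{a\sin\phi}{1 - a\cos\phi}$; the closed-form derivative in `allpass_gain` follows from differentiating that expression, and both function names are hypothetical.

```python
import numpy as np

def allpass_warp(phi, a):
    """Assumed deformation function (first-order all-pass phase form)."""
    return phi + 2 * np.arctan(a * np.sin(phi) / (1 - a * np.cos(phi)))

def allpass_gain(phi, a):
    """Derivative df/dphi = (1 - a^2) / (1 + a^2 - 2 a cos(phi))."""
    return (1 - a**2) / (1 + a**2 - 2 * a * np.cos(phi))

a = -0.4
phi = np.linspace(0.0, 2 * np.pi, 9)
f = allpass_warp(phi, a)                  # monotonic, from 0 to 2*pi

front = allpass_gain(0.0, a)              # (1 + a) / (1 - a) = 3/7
back = allpass_gain(np.pi, a)             # (1 - a) / (1 + a) = 7/3
```

For $a = -0.4$ the gradient is below one at the front, so angles are compressed toward $\phi = 0$, and above one at the back. The ratio $7/3 \approx 2.33$ is the same factor that appears in the discussion of the main-lobe resolution for this example.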
Figure 2c illustrates the 7x25 single-step deformation transformation matrix $T$. The logarithmic absolute values of the individual matrix coefficients are indicated by the gray scale or shading types according to the attached gray scale or shading bar. This exemplary matrix was designed for an input HOA order of $N_{in} = 3$ and an output order of $N_{out} = 12$. The higher output order is required to capture most of the information that is spread by the transformation from lower-order coefficients to higher-order coefficients. If the output order were reduced further, the accuracy of the deformation operation would be degraded because non-zero coefficients of the complete deformation matrix would be neglected (see the section How to establish HOA orders below for a more detailed discussion).
A very useful feature of this specific deformation matrix is that large portions of it are zero. This allows a lot of computational power to be saved when implementing this operation, but it is not a general rule that portions of a single-step transformation matrix are zero.
Figure 2d and Figure 2e illustrate the deformation characteristics by the example of beam patterns produced by some plane waves. Both figures result from the same seven incoming plane waves at positions $\phi$ of 0, 2/7π, 4/7π, 6/7π, 8/7π, 10/7π and 12/7π, all with an identical amplitude of one, and show the seven angular amplitude distributions, that is, the result vector $s$ of the following overdetermined, regular decoding operation
$$s = \Psi^T A, \quad (28)$$
where the HOA vector $A$ is either the original or the deformed variant of the set of plane waves. The numbers outside the circle represent the angle $\phi$. The number (for example, 360) of virtual speakers is considerably greater than the number of HOA parameters. The amplitude distribution or beam pattern for the plane wave coming from the front direction is located at $\phi = 0$.
Figure 2d shows the amplitude distributions of the original HOA representation. All seven distributions are shaped similarly and have the same main lobe width. The maxima of the main lobes are located at the angles $\phi = (0, 2/7\pi, \dots)$ of the original seven sound objects, as expected. The main lobes have widths corresponding to the limited order $N_{in} = 3$ of the original HOA vectors.
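The beam patterns of the undeformed scene can be approximated numerically. The sketch below assumes complex circular harmonics and 360 regularly spaced virtual speakers; with this unnormalized convention the regular decoding is the Hermitian transpose of the mode matrix divided by the number of speakers. The helper name `mode_matrix_2d` is hypothetical.

```python
import numpy as np

def mode_matrix_2d(order, angles):
    """Assumption: Y_m(phi) = exp(j*m*phi), rows m = -order..order."""
    m = np.arange(-order, order + 1)[:, None]
    return np.exp(1j * m * np.asarray(angles, dtype=float)[None, :])

N_in, M = 3, 360
sources = 2 * np.pi * np.arange(7) / 7         # 0, 2/7 pi, 4/7 pi, ...
speakers = 2 * np.pi * np.arange(M) / M        # one virtual speaker per degree
Psi_spk = mode_matrix_2d(N_in, speakers)

peaks = []
for phi0 in sources:
    A = mode_matrix_2d(N_in, [phi0])[:, 0]     # unit-amplitude plane wave
    s = Psi_spk.conj().T @ A / M               # regular, overdetermined decode
    peaks.append(int(np.argmax(s.real)))       # main-lobe maximum in degrees
```

Each main lobe peaks at the virtual speaker closest to the direction of its plane wave, and all seven patterns have the same shape because the order is the same for every direction.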
Figure 2e shows the amplitude distributions of the same sound objects, but after the deformation operation has been carried out. In general, the objects have moved toward the front direction at 0 degrees and the beam patterns have been modified: main lobes around the front direction $\phi = 0$ have become narrower and more focused, while the main lobes in the rear direction around 180 degrees have become considerably wider. At the sides, with a maximum impact at 90 and 270 degrees, the beam patterns have become asymmetrical due to the large gradient of the weighting function in Figure 2b for these angles. These considerable modifications (narrowing and reshaping) of the beam patterns have been made possible by the higher order $N_{out} = 12$ of the deformed HOA vector. Theoretically, the resolution of the main lobes in the front direction has been increased by a factor of 2.33, while the resolution in the rear direction has been reduced by a factor of 1/2.33. A mixed-order signal has been created with local orders varying across space. It can be assumed that a minimum output order of $2.33 \cdot N_{in} \approx 7$ is required to represent the deformed HOA coefficients with reasonable accuracy. In the section How to establish HOA orders below, the discussion of local, intrinsic orders is more detailed.
Features
The deformation steps introduced above are rather generic and very flexible. At least the following basic operations can be performed: rotation and/or mirroring along arbitrary axes and/or planes, spatial distortion with a continuous deformation function, and weighting of specific directions (spatial beam forming).
In the following subsections some characteristics of the inventive space deformation are highlighted, and these details provide guidance on what can and cannot be achieved. In addition, some design rules are described. In principle, the following parameters can be adjusted with some degree of freedom to obtain the desired deformation characteristics:
• Deformation function $f(\theta, \phi)$;
• Weighting function $g(\theta, \phi)$;
• Internal order $N_{warp}$;
• Output order $N_{out}$;
• Windowing of the output coefficients with a vector $w$.
Linearity
The basic transformation steps in the multi-step processing are linear by definition. The nonlinear mapping of sound sources to new locations that occurs in between affects the definition of the encoding matrix, but the encoding matrix itself is again linear. Consequently, the combined space deformation operation, i.e. the matrix multiplication with T, is also a linear operation, that is,
$T (A_1 + A_2) = T A_1 + T A_2$

This property is essential because it makes it possible to handle complex sound field information that comprises simultaneous contributions from different sound sources.
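The linearity can be checked numerically on a small two-dimensional case. The following is a minimal sketch, not the patented implementation. It assumes a circular-harmonic convention in which a source in direction d has coefficients exp(-i·m·d), an assumed closed form for the deformation function `f` with parameter a = -0.4, and hypothetical helpers `warp_matrix`, `apply` and `plane_wave`.

```python
import cmath
import math

def f(phi, a=-0.4):
    """Hypothetical deformation function (assumed form, not given in this section)."""
    return phi + 2 * math.atan2(a * math.sin(phi), 1 - a * math.cos(phi))

def warp_matrix(f, n_in, n_out, n_warp):
    """Decode to 2*n_warp+1 regular angles, re-encode at f(angle), keep target orders."""
    count = 2 * n_warp + 1
    angles = [2 * math.pi * j / count for j in range(count)]
    return [[sum(cmath.exp(1j * (m_in * p - m_out * f(p))) for p in angles) / count
             for m_in in range(-n_in, n_in + 1)]
            for m_out in range(-n_out, n_out + 1)]

def apply(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(t * a for t, a in zip(row, vector)) for row in matrix]

def plane_wave(direction, order):
    """HOA vector of one source under the assumed convention A_m = exp(-i*m*direction)."""
    return [cmath.exp(-1j * m * direction) for m in range(-order, order + 1)]

T = warp_matrix(f, n_in=3, n_out=7, n_warp=40)
A1 = plane_wave(math.radians(40), 3)
A2 = plane_wave(math.radians(200), 3)

# Deform the mixed scene once, and deform each source separately and add.
together = apply(T, [x + y for x, y in zip(A1, A2)])
separately = [x + y for x, y in zip(apply(T, A1), apply(T, A2))]
max_diff = max(abs(p - q) for p, q in zip(together, separately))
print(max_diff)
```

Both routes agree up to floating-point rounding, which is why a scene containing several simultaneous sources can be deformed without first separating them.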
Invariance in space
By definition (unless the deformation function is perfectly linear with gradient 1 or -1), the space deformation transformation is not space-invariant. This means that the operation behaves differently for sound objects that are originally located at different positions on the hemisphere. In mathematical terms, this property is the result of the nonlinearity of the deformation function, i.e.
$f(\phi + \alpha) \neq f(\phi) + \alpha$ (30)
for at least some arbitrary angles $\alpha \in \,]0 \ldots 2\pi[$.
Reversibility
Typically, the transformation matrix T cannot simply be reversed by mathematical inversion. One reason is that T is normally not square. Even a square space deformation matrix will not be invertible, because information that is typically spread from lower-order coefficients to higher-order coefficients will be lost (compare the section How to set up HOA orders and the example in the Example section), and losing information in an operation means that the operation cannot be reversed.
Therefore, another way has to be found to at least approximately reverse a space deformation operation. The reverse deformation transformation Trev can be designed by means of the reverse function $f_{rev}(\cdot)$ of the deformation function $f(\cdot)$, for which
$f_{rev}(f(\phi)) = \phi$
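When no closed form of the reverse function is at hand, it can be obtained numerically. The sketch below assumes a deformation function that increases monotonically on [0, 2π] and maps that interval onto itself. The specific `f` and the bisection helper `f_rev` are illustrative assumptions, not taken from the text.

```python
import math

def f(phi, a=-0.4):
    """Hypothetical deformation function (assumed form, not given in this section)."""
    return phi + 2 * math.atan2(a * math.sin(phi), 1 - a * math.cos(phi))

def f_rev(target, lo=0.0, hi=2 * math.pi, steps=60):
    """Invert a monotonically increasing f on [0, 2*pi] by bisection."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

angles = [0.1, 1.0, 2.5, 4.0, 6.0]
errors = [abs(f_rev(f(p)) - p) for p in angles]
print(max(errors))
```

A reverse transformation matrix would then be built in the same way as the forward one, with `f_rev` taking the place of `f` in the encoding step.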

Depending on the choice of HOA orders, this processing approximates the reverse transformation.
HOW TO SET UP HOA ORDERS
An important aspect to consider when designing a space deformation transformation is the choice of HOA orders. While the order Nin of the input vectors Ain is normally predefined by external constraints, both the order Nout of the output vectors Aout and the "internal" order Nwarp of the actual nonlinear deformation operation can be assigned more or less arbitrarily. However, these two orders, Nout and Nwarp, have to be chosen carefully, as explained below.
"Internal" order Nwarp
The "internal" order Nwarp defines the precision of the actual decoding, deformation and encoding steps in the multi-step space deformation processing described above. Typically, the order Nwarp should be considerably larger than both the input order Nin and the output order Nout. The reason for this requirement is that distortions and artifacts will otherwise be produced, because the deformation operation is, in general, a nonlinear operation.
To explain this fact, Figure 3 shows an example of the full deformation matrix for the same deformation function as used in the example of Figure 2. Figures 3a, 3c and 3e illustrate the deformation functions f1(φ), f2(φ) and f3(φ), respectively. Figures 3b, 3d and 3f illustrate the deformation matrices T1 (dB), T2 (dB) and T3 (dB), respectively. For the sake of illustration, these deformation matrices have not been cropped to the deformation matrix for a specific input order Nin or output order Nout. Instead, the dotted box centred within Figures 3b, 3d and 3f marks the target size Nout x Nin of the final, cropped transformation matrix. In this way, the impact of nonlinear distortions on the deformation matrix is clearly visible. In the example, the target orders were set arbitrarily to Nin = 30 and Nout = 100.
The basic challenge can be seen in Figure 3b: due to the nonlinear processing in the space domain, the coefficients within the deformation matrix are spread around the main diagonal, and the spread grows with the distance from the centre of the matrix. At very large distances from the centre, in the example at approximately |y| ≥ 90, y being the vertical axis, the coefficient spread reaches the boundaries of the full matrix, where it appears to "bounce off". This creates a special kind of distortion that extends over a large portion of the deformation matrix. In experimental evaluations it was observed that these distortions significantly impair the transformation performance as soon as distortion products are located within the target area of the matrix (marked by the dotted box in the figure).
In the first example in Figure 3b everything works well because the "internal" order of the processing was chosen as Nwarp = 200, which is considerably larger than the output order Nout = 100. The region of the distortions does not extend into the dotted box.
Another scenario is shown in Figure 3d. Here the internal order was set equal to the output order, that is, Nwarp = Nout = 100. The figure shows that the extent of the distortions scales linearly with the internal order. The result is that the higher-order coefficients of the transformation output are polluted by distortion products. The advantage of this scaling property is that it seems possible to avoid this type of nonlinear distortion by increasing the internal order Nwarp accordingly.
Figure 3f shows an example with a more aggressive deformation function with a larger coefficient a = 0.7. Because of the more aggressive deformation function, the distortions now extend into the target matrix area even for the internal order Nwarp = 200. In this case, as derived in the previous paragraph, the internal order must be increased further for even greater over-provisioning. Experiments for this deformation function show that increasing the internal order to, for example, Nwarp = 400 removes these nonlinear distortions.
In summary, the more aggressive the deformation operation, the higher the internal order Nwarp must be. There is no formal derivation of a minimum internal order yet. However, if in doubt, over-provisioning of the "internal" order is helpful because the nonlinear effects scale linearly with the size of the full deformation matrix. In principle, the "internal" order can be arbitrarily high. In particular, if a single-step transformation matrix is to be derived, the internal order plays no role in the complexity of the final deformation operation.
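The effect of the internal order can be reproduced on a small two-dimensional example. The sketch below is an illustration under assumed conventions (circular harmonics, regularly spaced virtual speakers, no gain function). The closed form of `f` with a = -0.4, the orders Nin = 3 and Nout = 7, and the helper names are assumptions chosen for the demonstration. The target block computed with a small internal order is compared against a heavily over-provisioned reference.

```python
import cmath
import math

def f(phi, a=-0.4):
    """Hypothetical deformation function (assumed form, not given in this section)."""
    return phi + 2 * math.atan2(a * math.sin(phi), 1 - a * math.cos(phi))

def warp_matrix(f, n_in, n_out, n_warp):
    """Decode to 2*n_warp+1 regular angles, re-encode at f(angle).

    Only the target block (orders |m| <= n_out by |m| <= n_in) is computed,
    which equals cropping the full internal-order matrix.
    """
    count = 2 * n_warp + 1
    angles = [2 * math.pi * j / count for j in range(count)]
    return [[sum(cmath.exp(1j * (m_in * p - m_out * f(p))) for p in angles) / count
             for m_in in range(-n_in, n_in + 1)]
            for m_out in range(-n_out, n_out + 1)]

def max_deviation(m1, m2):
    """Largest absolute difference between two matrices of equal size."""
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))

reference = warp_matrix(f, n_in=3, n_out=7, n_warp=100)
deviations = {}
for n_warp in (7, 12, 20, 40):
    deviations[n_warp] = max_deviation(warp_matrix(f, 3, 7, n_warp), reference)
    print(n_warp, deviations[n_warp])
```

With the internal order equal to the output order, distortion products land inside the target block. Raising the internal order moves them out of it, at no extra cost for the final single-step matrix, which keeps its size.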
Output order Nout:
To specify the output order Nout of the deformation transformation, the following two aspects must be considered:
- In general, the output order must be greater than the input order Nin in order to retain all the information that is spread to coefficients of different orders. The actual required size also depends on the characteristics of the deformation function. As a rule of thumb, the smaller the bandwidth of the deformation function f(φ), the smaller the required output order. It appears that in some cases the deformation function can be low-pass filtered to limit the required output order Nout. An example can be seen in Figure 3. For this specific deformation function, an output order of Nout = 100, as indicated by the dotted box, is sufficient to prevent loss of information. If the output order is reduced significantly, for example to Nout = 50, some non-zero coefficients of the transformation matrix will be left out, and a corresponding loss of information should be expected.
- In some cases, the HOA output coefficients will be used for processing or for a device that is capable of handling only a limited order. For example, the target may be a speaker setup with a limited number of speakers. In such applications, the output order must be specified according to the capabilities of the target system.
If Nout is small enough, the deformation transformation effectively reduces spatial information.
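A rough feel for the required output order can be obtained from the gradient of the deformation function. The sketch below is one possible reading, not a rule stated in the text. It assumes a closed form for `f` with a = -0.4, estimates the gradient by central differences, and multiplies the input order by the strongest compression of directions. The input order of 3 is likewise an assumption, chosen to match the figure 2.33 · Nin ≈ 7 quoted for the Figure 2 example.

```python
import math

def f(phi, a=-0.4):
    """Hypothetical deformation function (assumed form, not given in this section)."""
    return phi + 2 * math.atan2(a * math.sin(phi), 1 - a * math.cos(phi))

def gradient(func, phi, step=1e-6):
    """Central-difference estimate of the gradient of func at phi."""
    return (func(phi + step) - func(phi - step)) / (2 * step)

grid = [2 * math.pi * k / 3600 for k in range(3600)]
gradients = [gradient(f, p) for p in grid]

n_in = 3                              # assumed input order
compression = 1 / min(gradients)      # strongest squeezing of directions
expansion = max(gradients)            # strongest stretching of directions
estimated_order = n_in * compression  # heuristic estimate, not a formal bound
print(round(compression, 2), round(expansion, 2), round(estimated_order, 1))
```

For this assumed function the strongest compression occurs at the front and the strongest expansion at the rear, which is consistent with narrower main lobes in front and wider ones behind.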
The reduction from the internal order Nwarp to the output order Nout can be done by simply dropping the higher-order coefficients. This corresponds to applying a rectangular window to the HOA output vectors. Alternatively, more sophisticated bandwidth reduction techniques can be employed, such as those discussed in the article by M. A. Poletti mentioned above or in the article by J. Daniel mentioned above. In this way, even more information is likely to be lost than with the rectangular window, but superior directivity patterns can be obtained.
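The two windowing options can be sketched for a two-dimensional coefficient vector indexed by m = -Nwarp ... Nwarp. The raised-cosine taper below is a generic, hypothetical example of a non-rectangular window and is not the technique of the cited articles. The function names are assumptions as well.

```python
import math

def rectangular_window(n_warp, n_out):
    """Keep orders |m| <= n_out unchanged and drop everything above."""
    return [1.0 if abs(m) <= n_out else 0.0 for m in range(-n_warp, n_warp + 1)]

def tapered_window(n_warp, n_out):
    """Hypothetical raised-cosine taper: attenuates the kept orders towards n_out."""
    return [0.5 * (1 + math.cos(math.pi * abs(m) / (n_out + 1))) if abs(m) <= n_out else 0.0
            for m in range(-n_warp, n_warp + 1)]

def apply_window(window, coefficients):
    """Element-wise weighting, equivalent to multiplying by diag(w)."""
    return [w * c for w, c in zip(window, coefficients)]

coefficients = [1.0] * 9  # toy coefficient vector for n_warp = 4
rect_out = apply_window(rectangular_window(4, 2), coefficients)
taper_out = apply_window(tapered_window(4, 2), coefficients)
print(rect_out)
print([round(v, 3) for v in taper_out])
```

The taper attenuates the highest retained orders in addition to removing the orders above Nout, which illustrates why such windows discard more information than the rectangular one.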
The invention can be used in different parts of an audio processing chain, for example, recording, post-production, transmission, reproduction.

Claims (14):
1. Method for changing the relative positions of sound objects contained within a two-dimensional or three-dimensional Higher-Order Ambisonics (HOA) representation of an audio scene, in which an input vector Ain with dimension Oin determines the coefficients of a Fourier series of the input signal and an output vector Aout with dimension Oout determines the coefficients of a Fourier series of the correspondingly changed output signal, the method CHARACTERIZED by the fact that it includes:
- decoding (12) the input vector Ain of input HOA coefficients into input signals $s_{in}$ in the space domain for regularly positioned speaker positions, using the inverse $\Psi_1^{-1}$ of a mode matrix $\Psi_1$, by calculating $s_{in} = \Psi_1^{-1} A_{in}$;
- deforming and encoding (14) in the space domain the input signals $s_{in}$ into the output vector Aout of adapted output HOA coefficients by calculating $A_{out} = \Psi_2 s_{in}$, in which the mode vectors of the mode matrix $\Psi_2$ are modified according to a deformation function f(φ) by which the angles of the regularly positioned speaker positions are mapped one-to-one to target angles of the target speaker positions in the output vector Aout.
2. Method, according to claim 1, CHARACTERIZED by the fact that the sin space domain input signals are weighted (13) by a gain function
3. Method, according to claim 2, CHARACTERIZED by the fact that for two-dimensional Ambisonics the gain function is
4. Method, according to claim 1 or 2, CHARACTERIZED by the fact that, in case the Owarp number or dimension of virtual speakers is equal to or greater than the Oin number or dimension of HOA coefficients, before decoding (12 ) the order or dimension of the input vector Ain is extended (11) by adding (11) zero coefficients for higher orders.
5. Method, according to claim 1 or 2, CHARACTERIZED by the fact that, if the order or dimension of the HOA coefficients is less than the order or dimension of the mode matrix $\Psi_2$, the deformed and encoded and possibly weighted (13) signal $\Psi_2 s_{in}$ is additionally weighted (15) using a window vector w comprising zero coefficients for the highest orders, in order to remove (15) part of the deformed coefficients and provide the output vector Aout.
6. Method, according to claims 2 and 5, CHARACTERIZED by the fact that decoding (12), weighting (13) and deformation/encoding (14) are performed jointly using a transformation matrix of size Owarp x Owarp, $T = \mathrm{diag}(w)\, \Psi_2\, \mathrm{diag}(g)\, \Psi_1^{-1}$, where diag(w) denotes a diagonal matrix that has the values of the window vector w as components of its main diagonal and diag(g) denotes a diagonal matrix that has the values of the gain function g as components of its main diagonal.
7. Method according to claim 6, CHARACTERIZED by the fact that, in order to shape the transformation matrix T so as to obtain a size Oout x Oin, the corresponding columns and/or rows of the transformation matrix T are removed so as to perform the space deformation operation $A_{out} = T A_{in}$.
8. Apparatus for changing the relative positions of sound objects contained within a two-dimensional or three-dimensional Higher-Order Ambisonics (HOA) representation of an audio scene, in which an input vector Ain with dimension Oin determines the coefficients of a Fourier series of the input signal and an output vector Aout with dimension Oout determines the coefficients of a Fourier series of the correspondingly changed output signal, the apparatus CHARACTERIZED by the fact that it includes:
- a decoder that decodes the input vector Ain of input HOA coefficients into input signals $s_{in}$ in the space domain for regularly positioned speaker positions, using the inverse $\Psi_1^{-1}$ of a mode matrix $\Psi_1$, by calculating $s_{in} = \Psi_1^{-1} A_{in}$;
- a deformation and encoding unit that deforms and encodes in the space domain the input signals $s_{in}$ into the output vector Aout of adapted output HOA coefficients by calculating $A_{out} = \Psi_2 s_{in}$, in which the mode vectors of the mode matrix $\Psi_2$ are modified according to a deformation function f(φ) whereby the angles of the regularly positioned speaker positions are mapped one-to-one to target angles of the target speaker positions in the output vector Aout.
9. Apparatus, according to claim 8, CHARACTERIZED by the fact that it includes a weighting unit that weights the space-domain input signals $s_{in}$ by means of a gain function g(θ, φ) before deformation and encoding (14).
10. Apparatus, according to claim 9, CHARACTERIZED by the fact that for two-dimensional Ambisonics the gain function is
11. Apparatus according to claim 8 or 9, CHARACTERIZED by the fact that it comprises an extension unit that extends, before decoding (12), the order or dimension of the input vector Ain by adding zero coefficients for higher orders , if the Owarp number or dimension of virtual speakers is equal to or greater than the number or dimension Oin of HOA coefficients.
12. Apparatus according to claim 8 or 9, CHARACTERIZED by the fact that it comprises an additional weighting unit that weights the deformed and encoded and possibly weighted signal $\Psi_2 s_{in}$ using a window vector w comprising zero coefficients for the highest orders, and that removes part of the deformed coefficients to provide the output vector Aout.
13. Apparatus according to claim 9 or 12, CHARACTERIZED by the fact that it comprises a unit that jointly performs decoding, weighting and deformation/encoding using a transformation matrix of size
14. Apparatus, according to claim 13, CHARACTERIZED by the fact that, in order to shape the transformation matrix T so as to obtain a size Oout x Oin, in the decoder, the weighting unit, the deformation and encoding unit and the additional weighting unit, which are adapted to jointly perform decoding, weighting and deformation/encoding, the corresponding columns and/or rows of the transformation matrix T are removed so as to perform the space deformation operation $A_{out} = T A_{in}$.

Similar technologies:

Publication number | Publication date | Patent title

BR112013032878B1|2021-04-13|METHOD AND APPARATUS TO CHANGE THE RELATIVE POSITIONS OF SOUND OBJECTS CONTAINED WITHIN AN AMBISONIC REPRESENTATION OF A HIGHER ORDER

US10075799B2|2018-09-11|Method and device for rendering an audio soundfield representation

Davis et al.2005|High order spatial audio capture and its binaural head-tracked playback over headphones with HRTF cues

US10674301B2|2020-06-02|Fast and memory efficient encoding of sound objects using spherical harmonic symmetries

US10515645B2|2019-12-24|Method and apparatus for transforming an HOA signal representation

Gorzel et al.2019|Efficient encoding and decoding of binaural sound with resonance audio

Lecomte et al.2015|On the use of a Lebedev grid for Ambisonics

Støfringsdal et al.2006|Conversion of discretely sampled sound field data to auralization formats

Bai et al.2005|Head-related transfer function | synthesis based on a three-dimensional array model and singular value decomposition

Reddy et al.2017|On the Conditioning of the Spherical Harmonic Matrix for Spatial Audio Applications

Franck et al.2016|Comparison of listener-centric sound field reproduction methods in a convex optimization framework

McKenzie et al.2019|Interaural Level Difference Optimization of Binaural Ambisonic Rendering

Trevino et al.2014|Sound field reproduction using Ambisonics and irregular loudspeaker arrays

KR20200003051A|2020-01-08|Incoherent idempotent Ambisonics rendering

Kirkeby1995|Reproduction of acoustic fields

Franck et al.2012|Efficient rendering of directional sound sources in wave field synthesis

Kronlachner et al.2014|Warping and Directional Loudness Manipulation Tools for Ambisonics

Patent family:

Publication number | Publication date

EP2541547A1|2013-01-02|

TWI526088B|2016-03-11|

US9338574B2|2016-05-10|

BR112013032878A2|2017-01-24|

CN103635964B|2016-05-04|

US20140133660A1|2014-05-15|

KR102012988B1|2019-08-21|

AU2012278094A1|2014-01-16|

JP2014523172A|2014-09-08|

TW201301911A|2013-01-01|

AU2012278094B2|2017-07-27|

KR20140051927A|2014-05-02|

EP2727109A1|2014-05-07|

CN103635964A|2014-03-12|

DK2727109T3|2020-08-31|

JP5921678B2|2016-05-24|

HUE051678T2|2021-03-29|

WO2013000740A1|2013-01-03|

EP2727109B1|2020-08-05|

Cited documents:

Publication number | Filing date | Publication date | Applicant | Patent title

GB2073556B|1980-02-23|1984-02-22|Nat Res Dev|Sound reproduction systems|

DE69839212T2|1997-06-17|2009-03-19|British Telecommunications P.L.C.|SURROUND PLAYBACK|

JP2001084000A|1999-09-08|2001-03-30|Roland Corp|Waveform reproducing device|

JP2005529379A|2001-11-21|2005-09-29|アリフコム|Method and apparatus for removing noise from electronic signals|

FR2836571B1|2002-02-28|2004-07-09|Remy Henri Denis Bruno|METHOD AND DEVICE FOR DRIVING AN ACOUSTIC FIELD RESTITUTION ASSEMBLY|

FR2847376B1|2002-11-19|2005-02-04|France Telecom|METHOD FOR PROCESSING SOUND DATA AND SOUND ACQUISITION DEVICE USING THE SAME|

WO2004060346A2|2002-12-30|2004-07-22|Angiotech International Ag|Drug delivery from rapid gelling polymer composition|

CN1226718C|2003-03-04|2005-11-09|无敌科技股份有限公司|Phonetic speed regulating method|

GB2410164A|2004-01-16|2005-07-20|Anthony John Andrews|Sound feature positioner|

EP1779385B1|2004-07-09|2010-09-22|Electronics and Telecommunications Research Institute|Method and apparatus for encoding and decoding multi-channel audio signal using virtual source location information|

WO2008080012A1|2006-12-21|2008-07-03|Cv Therapeutics, Inc.|Reduction of cardiovascular symptoms|

EP2112653A4|2007-05-24|2013-09-11|Panasonic Corp|Audio decoding device, audio decoding method, program, and integrated circuit|

GB2467534B|2009-02-04|2014-12-24|Richard Furse|Sound system|

JP2010252220A|2009-04-20|2010-11-04|Nippon Hoso Kyokai <Nhk>|Three-dimensional acoustic panning apparatus and program therefor|

US9113281B2|2009-10-07|2015-08-18|The University Of Sydney|Reconstruction of a recorded sound field|

EP2346028A1|2009-12-17|2011-07-20|Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V.|An apparatus and a method for converting a first parametric spatial audio signal into a second parametric spatial audio signal|

PL2553947T3|2010-03-26|2014-08-29|Thomson Licensing|Method and device for decoding an audio soundfield representation for audio playback|

EP2637427A1|2012-03-06|2013-09-11|Thomson Licensing|Method and apparatus for playback of a higher-order ambisonics audio signal|

EP2665208A1|2012-05-14|2013-11-20|Thomson Licensing|Method and apparatus for compressing and decompressing a Higher Order Ambisonics signal representation|

US9288603B2|2012-07-15|2016-03-15|Qualcomm Incorporated|Systems, methods, apparatus, and computer-readable media for backward-compatible audio coding|

US9473870B2|2012-07-16|2016-10-18|Qualcomm Incorporated|Loudspeaker position compensation with 3D-audio hierarchical coding|

EP2898506B1|2012-09-21|2018-01-17|Dolby Laboratories Licensing Corporation|Layered approach to spatial audio coding|

CN108174341B|2013-01-16|2021-01-08|杜比国际公司|Method and apparatus for measuring higher order ambisonics loudness level|

US9736609B2|2013-02-07|2017-08-15|Qualcomm Incorporated|Determining renderers for spherical harmonic coefficients|

EP2765791A1|2013-02-08|2014-08-13|Thomson Licensing|Method and apparatus for determining directions of uncorrelated sound sources in a higher order ambisonics representation of a sound field|

US9883310B2|2013-02-08|2018-01-30|Qualcomm Incorporated|Obtaining symmetry information for higher order ambisonic audio renderers|

US10178489B2|2013-02-08|2019-01-08|Qualcomm Incorporated|Signaling audio rendering information in a bitstream|

US9609452B2|2013-02-08|2017-03-28|Qualcomm Incorporated|Obtaining sparseness information for higher order ambisonic audio renderers|

US9685163B2|2013-03-01|2017-06-20|Qualcomm Incorporated|Transforming spherical harmonic coefficients|

US9466305B2|2013-05-29|2016-10-11|Qualcomm Incorporated|Performing positional analysis to code spherical harmonic coefficients|

CN105340008B|2013-05-29|2019-06-14|高通股份有限公司|The compression through exploded representation of sound field|

US20140355769A1|2013-05-29|2014-12-04|Qualcomm Incorporated|Energy preservation for decomposed representations of a sound field|

EP2824661A1|2013-07-11|2015-01-14|Thomson Licensing|Method and Apparatus for generating from a coefficient domain representation of HOA signals a mixed spatial/coefficient domain representation of said HOA signals|

EP3028476B1|2013-07-30|2019-03-13|Dolby International AB|Panning of audio objects to arbitrary speaker layouts|

EP2866475A1|2013-10-23|2015-04-29|Thomson Licensing|Method for and apparatus for decoding an audio soundfield representation for audio playback using 2D setups|

EP3069528B1|2013-11-14|2017-09-13|Dolby Laboratories Licensing Corporation|Screen-relative rendering of audio and encoding and decoding of audio for such rendering|

CN111182443B|2014-01-08|2021-10-22|杜比国际公司|Method and apparatus for decoding a bitstream comprising an encoded HOA representation|

US9502045B2|2014-01-30|2016-11-22|Qualcomm Incorporated|Coding independent frames of ambient higher-order ambisonic coefficients|

US9922656B2|2014-01-30|2018-03-20|Qualcomm Incorporated|Transitioning of ambient higher-order ambisonic coefficients|

KR102201726B1|2014-03-21|2021-01-12|돌비 인터네셔널 에이비|Method for compressing a higher order ambisonics signal, method for decompressing a compressed hoa signal, apparatus for compressing a hoa signal, and apparatus for decompressing a compressed hoa signal|

CN106105270A|2014-03-25|2016-11-09|英迪股份有限公司|For processing the system and method for audio signal|

US9852737B2|2014-05-16|2017-12-26|Qualcomm Incorporated|Coding vectors decomposed from higher-order ambisonics audio signals|

US10770087B2|2014-05-16|2020-09-08|Qualcomm Incorporated|Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals|

US9620137B2|2014-05-16|2017-04-11|Qualcomm Incorporated|Determining between scalar and vector quantization in higher order ambisonic coefficients|

US9747910B2|2014-09-26|2017-08-29|Qualcomm Incorporated|Switching between predictive and non-predictive quantization techniques in a higher order ambisonics framework|

US9940937B2|2014-10-10|2018-04-10|Qualcomm Incorporated|Screen related adaptation of HOA content|

US10880597B2|2014-11-28|2020-12-29|Saturn Licensing Llc|Transmission device, transmission method, reception device, and reception method|

US10327067B2|2015-05-08|2019-06-18|Samsung Electronics Co., Ltd.|Three-dimensional sound reproduction method and device|

US10070094B2|2015-10-14|2018-09-04|Qualcomm Incorporated|Screen related adaptation of higher order ambisonic content|

EP3400722A1|2016-01-04|2018-11-14|Harman Becker Automotive Systems GmbH|Sound wave field generation|

EP3209036A1|2016-02-19|2017-08-23|Thomson Licensing|Method, computer readable storage medium, and apparatus for determining a target sound scene at a target position from two or more source sound scenes|

US10721578B2|2017-01-06|2020-07-21|Microsoft Technology Licensing, Llc|Spatial audio warp compensator|

Legal status:
2018-12-11| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|

2019-04-09| B25A| Requested transfer of rights approved|Owner name: THOMSON LICENSING DTV (FR) |

2019-04-30| B25A| Requested transfer of rights approved|Owner name: INTERDIGITAL MADISON PATENT HOLDINGS (FR) |

2019-09-17| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|

2021-03-02| B09A| Decision: intention to grant|

2021-03-16| B09W| Decision of grant: rectification|Free format text: THE PRESENT APPLICATION RECEIVED AN OPINION OF ALLOWANCE NOTIFIED IN RPI NO. 2617 OF 02/03/2021. THROUGH FALE CONOSCO MESSAGE 907401, THE APPLICANT REQUESTS CORRECTION OF TRANSLATION ERRORS IN THE SPECIFICATION, CLAIMS, ABSTRACT AND DRAWINGS OF THE ALLOWANCE PETITION. THE CORRECTIONS THAT ARE TO FORM PART OF THE LETTERS PATENT ARE PRESENTED IN PETITION 870210021282 OF 05/03/2021. IN VIEW OF THIS, I CONCLUDE IN FAVOUR OF RECTIFICATION. |

2021-04-13| B16A| Patent or certificate of addition of invention granted|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 15/06/2012, SUBJECT TO THE LEGAL CONDITIONS. |

Priority:

Application number | Filing date | Patent title

EP11305845A|EP2541547A1|2011-06-30|2011-06-30|Method and apparatus for changing the relative positions of sound objects contained within a higher-order ambisonics representation|

EP11305845.7|2011-06-30|

PCT/EP2012/061477|WO2013000740A1|2011-06-30|2012-06-15|Method and apparatus for changing the relative positions of sound objects contained within a higher-order ambisonics representation|
