Patent abstract:
Data structure for Higher Order Ambisonics audio data. The invention relates to a data structure for Higher Order Ambisonics (HOA) audio data, which data structure includes 2D or 3D spatial audio content data for one or more different HOA audio data stream descriptions. The HOA audio data may have an order greater than "3", and the data structure may in addition include single audio signal source data and/or microphone array audio data from fixed or time-varying spatial positions.
Publication number: BR112013010754B1
Application number: R112013010754-5
Filing date: 2011-10-26
Publication date: 2021-06-15
Inventors: Florian Keiler; Sven Kordon; Johannes Boehm; Holger Kropp; Johann-Markus Batke
Applicant: Dolby International Ab
IPC main classification:
Patent description:

[0001] The invention relates to a data structure for Higher Order Ambisonics audio data which includes 2D and/or 3D spatial audio content data and which is also suitable for HOA audio data having an order greater than "3".
Background
[0002] 3D audio can be realized using a sound field description technique called Higher Order Ambisonics (HOA), as described below. Storing HOA data requires some conventions and conditions regarding how these data are to be used by a special decoder in order to create speaker signals for playback on a given playback speaker configuration. No existing storage format defines all of these conditions for HOA. The B-format (based on the extensible "RIFF/WAV" structure) with its *.amb file format realization, as described for example in "File Format for B-Format", 30 March 2009, by Martin Leese, http://www.ambisonia.com/Members/etienne/Members/mleese/file-format-for-b-format, is the most sophisticated format currently available.
[0003] As of 16 July 2010, an overview of existing file formats is given on the Ambisonics Xchange site: "Existing formats", http://ambisonics.iem.at/xchange/format/existing-formats, and a proposal for an Ambisonics exchange format is also given on this site: "A first proposal to specify, define and determine the parameters for an Ambisonics exchange format", http://ambisonics.iem.at/xchange/format/a-first-proposal-for-the-format.
Invention
[0004] With respect to HOA signals, a collection of M = (N+1)^2 (for 3D; 2N+1 for 2D) different audio objects from different sound sources, all at the same frequency, can be recorded (encoded) and played back as distinct sound objects, provided they are spatially evenly distributed. This means that a first-order Ambisonics signal can carry four 3D or three 2D audio objects, and these objects need to be evenly separated around a sphere for 3D or around a circle for 2D. Spatial overlap and more than M signals in the recording will result in blurring: only the strongest signals can be reproduced as coherent objects, while the other, diffuse signals will slightly degrade the coherent signals depending on the overlap in space, frequency and sound similarity.
[0005] With respect to the acoustic situation in a cinema, high spatial sound localization accuracy is required for the front area of the screen in order to match the visual scene. Perception of surrounding sound objects is less critical (reverberation, sound objects not connected to the visual scene). Here, the speaker density can be lower compared to the front area.
[0006] The HOA order of the HOA data relevant to the frontal area needs to be large in order to allow holophonic reproduction. A typical order is N = 10. This requires (N+1)^2 = 121 HOA coefficients. In theory, one could also encode M = 121 audio objects if these audio objects were evenly spatially distributed. However, in our scenario they are restricted to the frontal area (because only there are the higher orders needed). In fact, only around M = 60 audio objects can be encoded without blurring the sound (the front area covers at most half a sphere of directions, hence M/2). With respect to the B-format mentioned above, it only allows a description up to an Ambisonics order of 3, and the file size is restricted to 4 GB. Other items of special information are absent, such as the wave type or the decoding reference radius, which are vital for modern decoders. It is not possible to use different sample formats (word widths) and bandwidths for different Ambisonics components (channels). There is also no standard for storing additional information and metadata for Ambisonics.
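As a worked check of the numbers used in this paragraph (the halving for the frontal half-sphere is the rough estimate given above, not an exact bound):

$$(N+1)^2\big|_{N=10} = 11^2 = 121 \ \text{HOA coefficients}, \qquad M \approx \tfrac{121}{2} \approx 60 \ \text{front-area audio objects}.$$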
[0007] In the known art, recording Ambisonics signals using a microphone array is restricted to order one. This could change in the future once experimental prototype HOA microphones are developed. For 3D content creation, a description of the ambient sound field could be recorded using a first-order Ambisonics microphone array, while directional sources are captured using spot microphones or highly directional microphones together with directional information (i.e. the position of the source). The directional signals can then be encoded into an HOA description, or this can be performed by a sophisticated decoder. In any case, a new Ambisonics file format needs to be able to store more than one sound field description at the same time, but it appears that no existing format can encapsulate more than one Ambisonics description.
[0008] A problem to be solved by the invention is to provide an Ambisonics file format that is capable of storing two or more sound field descriptions at the same time, where the Ambisonics order can be greater than 3. This problem is solved by the data structure disclosed in claim 1 and by the method disclosed in claim 12.
[0009] To recreate realistic 3D audio, next-generation Ambisonics decoders will require various conventions and conditions to accompany the stored data to be processed, or a single file format in which all parameters and related data elements can be coherently stored.
[00010] The inventive file format for spatial sound content can store one or more HOA signals and/or directional mono signals together with directional information, where Ambisonics orders greater than 3 and files larger than 4 GB are possible. In addition, the file format of the invention provides elements that existing formats do not provide:
1) Vital information required by next-generation HOA decoders is stored within the file format: Ambisonics wave type information (plane, spherical, mixed types), region of interest (sources outside or within the listening area), and reference radius (for decoding spherical waves). Related mono directional signals can be stored, and the position information of these directional signals can be described using angle and distance information or a vector of Ambisonics encoding coefficients.
2) All parameters defining the Ambisonics data are contained in additional information, to ensure clarity about the recording: Ambisonics scaling and normalization (SN3D, N3D, Furse-Malham, B-format, ..., user defined), mixed-order information.
3) The Ambisonics data storage format is extended to allow flexible and economical storage of data: the format of the invention allows storing data related to the Ambisonics order (Ambisonics channels) with different PCM word-size resolution as well as with reduced bandwidth.
4) Meta fields allow storing accompanying information about the file, such as recording information for microphone signals: recording reference coordinate system; microphone, source and virtual listener positions; directional characteristics of the microphones; environment and source information.
[00011] This file format for 2D and 3D audio content covers the storage of both Higher Order Ambisonics (HOA) descriptions and single sources with fixed or time-varying positions, and contains all the information that allows next-generation audio decoders to deliver realistic 3D audio.
[00012] Using appropriate parameters, the file format of the invention is also suitable for continuous playback of audio content. Thus, additional content-dependent information (header data) can be sent at time instances as selected by the creator of the file. The inventive file format also serves as a scene description where tracks of an audio scene can start and end at any time.
[00013] In principle, the data structure of the invention is suitable for Higher Order Ambisonics HOA audio data, which data structure includes 2D and/or 3D spatial audio content data for one or more different HOA audio data stream descriptions, is also suitable for HOA audio data having an order greater than "3", and in addition can include single audio signal source data and/or microphone array audio data from fixed or time-varying spatial positions.
[00014] In principle, the method of the invention is suitable for audio presentation, wherein an HOA audio data stream containing at least two different HOA audio data signals is received, a first one of them is used for presentation with a dense array of speakers located in a distinct area of a performance venue, and at least a second and different one of them is used for presentation with a less dense array of speakers surrounding said performance venue.
[00015] Additional advantageous embodiments are disclosed in the respective dependent claims.
Drawings
[00016] Illustrative embodiments of the invention are described with reference to the accompanying drawings, in which:
[00017] Fig. 1 is a holophonic cinema reproduction with dense arrays of speakers in the front region and sparse density of speakers surrounding the listening area;
[00018] Fig. 2 is a sophisticated decoding system;
[00019] Fig. 3 is the creation of HOA content from microphone array recording, single source recording and simple and complex sound field generation;
[00020] Fig. 4 is the next generation immersive content creation;
[00021] Fig. 5 is 2D decoding of HOA signals for a simple surround-effect speaker setup, and HOA signal decoding for a front-stage holophonic speaker setup plus a sparser 3D surround-effect speaker setup;
[00022] Fig. 6 is an interior domain problem, where the sources are outside the region of interest / validity;
[00023] Fig. 7 is the definition of spherical coordinates;
[00024] Fig. 8 is the outer domain problem, where the sources are within the region of interest / validity;
[00025] Fig. 9 is a simple example of the HOA file format;
[00026] Fig. 10 is an example for an HOA file containing multiple frames with multiple tracks;
[00027] Fig. 11 is an HOA file with multiple MetaDataChunks;
[00028] Fig. 12 is the encoding processing of TrackRegion;
[00029] Fig. 13 is the TrackRegion decoding processing;
[00030] Fig. 14 is the implementation of bandwidth reduction using MDCT processing;
[00031] Fig. 15 is the implementation of bandwidth reconstruction using MDCT processing.
Illustrative Embodiments
[00032] With the increasing expansion of 3D Video, immersive audio technologies are becoming an interesting aspect for differentiation. Higher Order Ambisonics (HOA) is one such technology that can provide a means to progressively introduce 3D Audio into theaters. Using HOA soundtracks and HOA decoders, a theater can start with the audio settings of existing surround-effect speakers and invest in more speakers gradually, improving the immersive experience with each step.
[00033] Fig. 1a presents holophonic cinema reproduction with a dense array of speakers 11 in the front region and a sparse density of speakers 12 around the listening area or seats 10, providing accurate reproduction of the sounds related to the visual action and sufficient accuracy for the reproduced ambient sounds. Fig. 1b presents the perceived arrival direction of reproduced frontal plane waves, where the arrival direction of plane waves matches different screen positions, i.e. plane waves are suitable for reproducing depth. Fig. 1c presents the perceived arrival direction of reproduced spherical waves, which leads to better consistency between the perceived direction of sound and the 3D visual action around the screen.
[00034] The need for two different HOA streams is caused by the fact that the main visual action in a cinema takes place in the frontal region of the listeners. In addition, the accuracy of perceiving the direction of a sound is greater for frontal sound sources than for surrounding sources. Therefore, the accuracy of frontal spatial sound reproduction needs to be greater than the spatial accuracy for reproduced ambient sounds. For holophonic sound reproduction, a high number of speakers, a dedicated decoder and related speaker drivers are required for the front screen region, while less costly technology is sufficient for ambient sound reproduction (a lower density of speakers surrounding the listening area and less perfect decoding technology).
[00035] Due to the content creation and sound reproduction technologies, it is advantageous to provide one HOA representation for the ambient sounds and another HOA representation for the foreground action sounds, as shown in Fig. 4. A cinema using a simple setup with sparse reproduction equipment can mix both streams before decoding (as shown in Fig. 5, top). A more sophisticated cinema equipped with fully immersive playback devices can use two decoders: one to decode ambient sounds and a specialized decoder for high-precision positioning of virtual sound sources of the main foreground action, as shown in the sophisticated decoding system in Fig. 2 and at the bottom of Fig. 5. A special HOA file contains at least two tracks that represent HOA sound fields for ambient sounds and for frontal sounds related to the main visual action. Optional streams for directional effects can be provided. Two corresponding decoder systems together with a positioner provide signals for a dense frontal 3D holophonic speaker system 21 and for the less dense (i.e. sparse) 3D surround-effect system 22. The HOA data signal of the Track 1 stream represents ambient sounds and is converted in an HOA converter 231 for input to a specialized Decoder 1 232 for ambient reproduction. For the Track 2 data stream, the HOA signal data (frontal sounds related to the visual scene) is converted in an HOA converter 241 for input to a distance correction filter 242 (Eq. (26)) for better placement of spherical sound sources around the screen area with a dedicated Decoder 2 243. The directional data streams are directly positioned (panned) to the L speakers. The three sets of speaker signals are PCM mixed for playback over the 3D speaker system.
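A minimal sketch of the final PCM mixing stage of this playback chain (illustrative only: the function and variable names are not part of the format, and the two decoders and the panner are assumed to already deliver one feed per speaker):

```python
import numpy as np

def mix_speaker_feeds(ambient_feeds, front_feeds, directional_feeds=None):
    """Sum the PCM speaker feeds produced by the ambient decoder, the front
    decoder and an optional panner for directional tracks into one feed per
    speaker. Each argument is an array of shape (num_speakers, num_samples);
    a path that does not drive a given speaker contributes zeros there."""
    feeds = [ambient_feeds, front_feeds]
    if directional_feeds is not None:
        feeds.append(directional_feeds)
    if len({f.shape for f in feeds}) != 1:
        raise ValueError("all feeds must cover the same speakers and length")
    return np.sum(feeds, axis=0)

# Example: 24 speakers, 480 samples of silence from each path.
mixed = mix_speaker_feeds(np.zeros((24, 480)), np.zeros((24, 480)))
```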
[00036] It appears that there is no dedicated file format for such a scenario. 3D sound field recordings either use full scene descriptions with related sound tracks, or a single sound field description when storing for later playback. Examples of the first type are WFS (Wave Field Synthesis) formats and various container formats. Examples of the second type are Ambisonics formats such as the B-format or the AMB format, according to the above-mentioned article "File Format for B-Format". The latter is restricted to Ambisonics orders up to three, a fixed broadcast format, a fixed decoder model, and a single sound field.
HOA Content Creation and Playback
[00037] The processing for generating HOA sound field descriptions is represented in Fig. 3. In Fig. 3a, natural sound field recordings are created by using microphone arrays. The capsule signals are matrixed and equalized to form HOA signals. Higher-order signals (Ambisonics order > 1) are typically band-limited to reduce artifacts caused by the capsule spacing: low-pass filtered to reduce spatial aliasing at high frequencies, and high-pass filtered to reduce excessive low-frequency levels that increase with the Ambisonics order n (see Eq. (34)). Optionally, distance coding filtering can be applied, see Eqs. (25) and (27). Before storage, the HOA format information is added to the track header.
[00038] Artistic sound field representations are typically created using multiple single-source directional streams. As shown in Fig. 3b, a single source signal can be captured as a PCM recording. This can be done with spot microphones or with highly directional microphones. In addition, the directional parameters (θs, φs) of the sound source relative to a virtual best listening position are recorded (HOA coordinate system, or any reference point for later mapping). Distance information can also be created by artistically placing sounds when synthesizing scenes for movies. As shown in Fig. 3c, the directional information (θs, φs) is then used to create the encoding vector, and the directional source signal is encoded into an Ambisonics signal, see Eq. (18). This is equivalent to a plane wave representation. A final filtering process may use the distance information rs to imprint a spherical source characteristic onto the Ambisonics signal (Eq. (19)), or to apply distance coding filtering, Eqs. (25), (27). Before storage, HOA format information is added to the track header.
[00039] More complex wave field descriptions are generated by mixing HOA Ambisonics signals as depicted in Fig. 3d. Before storage, HOA format information is added to the track header.
[00040] The content generation process for 3D cinema is represented in Fig. 4. Frontal sounds related to the visual action are encoded with high spatial accuracy, mixed into an HOA (wave field) signal and stored as Track 2. The encoders involved encode with high spatial accuracy and the special wave types needed to best match the visual scene. Track 1 contains the sound field of the encoded ambient sounds, without restriction of source direction. Normally, the spatial accuracy of ambient sounds does not need to be as high as for frontal sounds (so the Ambisonics order may be lower) and the wave type is less critical. The ambient sound field can also include reverberant parts of the frontal sound signals. Both tracks are multiplexed for storage and/or exchange. Optionally, directional sounds (e.g. Track 3) can be multiplexed into the file. These sounds can be special-effect sounds, dialogues or supporting information such as narrative speech for the visually impaired.
[00041] Fig. 5 introduces the decoding principles. As shown at the top, a theater with a sparse speaker configuration can mix both HOA signals from Track 1 and Track 2 before simplified HOA decoding, and can truncate the order of Track 2 and reduce the dimension of both tracks to 2D. In case a directional stream is present, it is encoded to 2D HOA. Then all three streams are mixed together to form a single HOA representation which is then decoded and played back. The lower part corresponds to Fig. 2. A cinema equipped with a holophonic system for the front stage and a sparse 3D surround-effect system will use sophisticated dedicated decoders and mix the speaker feeds. For the Track 1 data stream, the HOA data representing the ambient sounds is converted for the specialized Decoder 1 for ambient reproduction. For the Track 2 data stream, the HOA data (frontal sounds related to the visual scene) is converted and distance corrected (Eq. (26)) for better placement of spherical sound sources around the screen area with a dedicated Decoder 2. The directional data streams are directly positioned (panned) to the L speakers. The three sets of speaker signals are PCM mixed for playback over the 3D speaker system.
Sound Field Descriptions Using Higher Order Ambisonics
Sound Field Description Using Spherical Harmonics (SH)
[00042] When using spherical harmonic / Bessel descriptions, the solution of the acoustic wave equation is given in Eq. (1), according to M.A. Poletti, "Three-dimensional surround sound systems based on spherical harmonics", Journal of the Audio Engineering Society, 53(11), pp. 1004-1025, November 2005, and Earl G. Williams, "Fourier Acoustics", Academic Press, 1999. The sound pressure is a function of the spherical coordinates r, θ, φ (see Fig. 7 for their definition) and the spatial frequency k = 2πf/c_s:

$$p(r,\theta,\phi,k) \;=\; \sum_{n=0}^{\infty}\,\sum_{m=-n}^{n} A_n^m(k)\, j_n(kr)\, Y_n^m(\theta,\phi) \qquad (1)$$
[00043] The description is valid for sound sources outside the region of interest or validity (interior domain problem, as shown in Fig. 6) and assumes orthonormal spherical harmonics:

[00044] The $A_n^m(k)$ are called Ambisonics coefficients, $j_n(kr)$ is the spherical Bessel function of the first kind, the $Y_n^m(\theta,\phi)$ are called Spherical Harmonics (SH), n is the index of the Ambisonics order, and m indicates the degree.
[00045] Because the Bessel function has significant values only for small kr values (small distances from the origin or low frequencies), the series can be truncated at some order n = N with sufficient precision. When storing HOA data, normally the Ambisonics coefficients $A_n^m(k)$, or some derivatives thereof (details are described below), are stored up to this order N. N is called the Ambisonics order, and the term "order" is also commonly used in combination with n for the Bessel functions $j_n(kr)$ and Hankel functions $h_n(kr)$.
[00046] The solution of the wave equation for the exterior case, where the sources are located within the region of interest or validity as represented in Fig. 8, is expressed in Eq. (2):

$$p(r,\theta,\phi,k) \;=\; \sum_{n=0}^{\infty}\,\sum_{m=-n}^{n} B_n^m(k)\, h_n^{(1)}(kr)\, Y_n^m(\theta,\phi) \qquad (2)$$
[00047] The $B_n^m(k)$ again are called Ambisonics coefficients, and $h_n^{(1)}(kr)$ denotes the spherical Hankel function of the first kind and n-th order. The formula assumes orthonormal SH. Note: generally, the spherical Hankel function of the first kind $h_n^{(1)}$ is used to describe outgoing waves for positive frequencies and the spherical Hankel function of the second kind $h_n^{(2)}$ is used for incoming waves, for the time convention used in the book "Fourier Acoustics" mentioned above.
Spherical Harmonics
[00048] Spherical harmonics can be complex- or real-valued. The general case for HOA uses real-valued spherical harmonics. A unified description of Ambisonics using real- and complex-valued spherical harmonics can be found in Mark Poletti, "Unified description of Ambisonics using real and complex spherical harmonics", Proceedings of the Ambisonics Symposium 2009, Graz, Austria, June 2009.
[00049] There are different ways to normalize spherical harmonics (independent of whether the spherical harmonics are real or complex); see the following web pages regarding (real) spherical harmonics and normalization schemes: http://www.ipgp.fr/~wiecsor/SHTOOLS/www/conventions.html, http://en.citisendium.org/wiki/Spherical harmonics. The normalization corresponds to the orthogonality relationship between $Y_n^m$ and $Y_{n'}^{m'}$.
[00050] Note:

$$\int_{S^2} Y_n^m(\theta,\phi)\, Y_{n'}^{m'\,*}(\theta,\phi)\, d\Omega \;=\; \delta_{nn'}\,\delta_{mm'} \qquad \text{(for orthonormal SH)}$$

[00051] where $S^2$ is the unit sphere and the Kronecker delta $\delta_{ab}$ equals 1 for a = b and 0 otherwise.
[00052] Complex-valued spherical harmonics are described by:

$$Y_n^m(\theta,\phi) \;=\; (-1)^m\, N_n^{|m|}\, P_n^{|m|}(\cos\theta)\, e^{i m \phi}$$

[00053] where the factor $(-1)^m$ provides an alternating sign for positive m, as in the book "Fourier Acoustics" mentioned above (note: this is a convention term and can be omitted; it has an effect for positive-m SH only). $N_n^{|m|}$ is a normalization term which, for an orthonormal representation, takes the form (! denotes the factorial):

$$N_n^{|m|} \;=\; \sqrt{\frac{2n+1}{4\pi}\,\frac{(n-|m|)!}{(n+|m|)!}}$$
[00054] Table 1 below presents some normalization schemes commonly used for complex-valued spherical harmonics. The $P_{n,|m|}$ are the associated Legendre functions, where the notation with |m| from the above article "Unified description of Ambisonics using real and complex spherical harmonics" avoids the $(-1)^m$ phase term called the Condon-Shortley phase, which is sometimes included within the $P_n^m$ representation in other notations. The associated Legendre functions for $m \ge 0$ can be expressed using Rodrigues' formula as:

$$P_n^m(x) \;=\; \frac{(1-x^2)^{m/2}}{2^n\, n!}\, \frac{d^{\,n+m}}{dx^{\,n+m}}\,(x^2-1)^n$$
Table 1 - Normalization Factors for spherical harmonics with complex values
[00055] Numerically it is advantageous to derive the $P_n^{|m|}$ progressively from a recurrence relation, see William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery, "Numerical Recipes in C", Cambridge University Press, 1992. The associated Legendre functions up to n = 4 are given in Table 2.
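For illustration, a minimal Python sketch of that recurrence, written without the Condon-Shortley phase to stay consistent with the |m| notation above; results can be checked against the closed forms of Table 2:

```python
import math

def assoc_legendre(n, m, x):
    """Associated Legendre function P_n^m(x) for m >= 0 and |x| <= 1,
    computed with the standard upward recurrence and WITHOUT the
    Condon-Shortley phase (-1)^m."""
    if m < 0 or m > n or abs(x) > 1.0:
        raise ValueError("require 0 <= m <= n and |x| <= 1")
    # Seed: P_m^m(x) = (2m-1)!! * (1 - x^2)^(m/2)
    pmm = 1.0
    somx2 = math.sqrt((1.0 - x) * (1.0 + x))
    fact = 1.0
    for _ in range(m):
        pmm *= fact * somx2
        fact += 2.0
    if n == m:
        return pmm
    # P_{m+1}^m(x) = x * (2m+1) * P_m^m(x)
    pmmp1 = x * (2 * m + 1) * pmm
    if n == m + 1:
        return pmmp1
    # (l - m) P_l^m = x (2l-1) P_{l-1}^m - (l + m - 1) P_{l-2}^m
    for l in range(m + 2, n + 1):
        pll = (x * (2 * l - 1) * pmmp1 - (l + m - 1) * pmm) / (l - m)
        pmm, pmmp1 = pmmp1, pll
    return pmmp1

print(assoc_legendre(2, 1, 0.5))  # ~1.299 = 3 * 0.5 * sqrt(1 - 0.25)
```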

[00056] Real-valued SH are derived by combining the complex-conjugate components corresponding to opposite values of m (the $(-1)^m$ term in definition (6) is introduced to obtain sign-free expressions for the real SH, which is the normal case in Ambisonics):

[00057] which can be rewritten as Eq. (7) to highlight the connection with the circular harmonics, which retain just the azimuth term:

[00058] The total number of spherical harmonic components for a given Ambisonics order N equals (N+1)^2. Common normalization schemes for real-valued spherical harmonics are given in Table 3.
Table 3 - Normalization schemes for real 3D SH; δ_m has a value of 1 for m = 0 and 0 otherwise
Circular Harmonics
[00059] For two-dimensional representations only a subset of the harmonics is required: the SH degree can only take the values m = ±n. The total number of components for a given N reduces to 2N+1 because the components representing the inclination θ become obsolete, and the spherical harmonics can be replaced by the circular harmonics given in Eq. (8).
[00060] There are different normalization schemes $N^{|m|}$ for circular harmonics which need to be considered when converting 3D Ambisonics coefficients to 2D coefficients. The most general formula for the circular harmonics becomes:

[00061] Some common normalization factors for circular harmonics are given in Table 4, where the normalization term is the factor in front of the azimuth term.
Table 4 - 2D CH normalization schemes; δ_m has a value of 1 for m = 0 and 0 otherwise
[00062] The conversion between different normalizations is straightforward. In general, the normalization has an effect on the notation describing the pressure (according to Eqs. (1), (2)) and on all derived considerations. The type of normalization also influences the Ambisonics coefficients. There are also weights that can be applied to scale these coefficients, for example the Furse-Malham (FuMa) weights applied to Ambisonics coefficients when storing a file in the AMB format.
[00063] With respect to 2D/3D conversion, CH-to-SH conversion and vice versa can also be applied to Ambisonics coefficients, for example when decoding a 3D Ambisonics representation (recording) with a 2D decoder to a 2D speaker setup. The relationship between the 3D and 2D components for the 3D-to-2D conversion is represented in the following scheme up to an Ambisonics order of 4:

[00064] The 2D-to-3D conversion factor can be derived for positioning in the horizontal plane as follows:

[00065] Converting from 3D to 2D uses the corresponding inverse factor. Details are given in connection with Eqs. (28), (29), (30) below. A 2D-normalized to orthonormal conversion becomes:
Ambisonics Coefficients
[00066] Ambisonics coefficients have the unit of sound pressure:
The Ambisonics coefficients make up the Ambisonics signal and are in general a function of discrete time. Table 5 presents the relationship between the dimensionality of the representation, the Ambisonics order N and the number of Ambisonics coefficients (channels).
Table 5 - Number of Ambisonics coefficients
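A minimal helper reflecting the relationship summarized in Table 5 (function name and usage are illustrative only):

```python
def num_ambisonics_coeffs(order, dimension="3D"):
    """Number of Ambisonics coefficients (channels) for a given order,
    per the relationship summarised in Table 5."""
    if dimension == "3D":
        return (order + 1) ** 2
    if dimension == "2D":
        return 2 * order + 1
    raise ValueError("dimension must be '2D' or '3D'")

assert num_ambisonics_coeffs(1) == 4        # first order, 3D
assert num_ambisonics_coeffs(10) == 121     # the N = 10 front-stage example
assert num_ambisonics_coeffs(2, "2D") == 5
```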
[00067] When dealing with discrete-time representations, the Ambisonics coefficients are normally stored in an interleaved manner, like the PCM channel representations of multi-channel recordings (channel 1 = first Ambisonics coefficient, sample 1, and so on). The sequence of the coefficients is a matter of convention. An example for 3D, N = 2 is:

[00068] and for 2D, N = 2:

[00069] The signal $A_0^0(t)$ can be considered a mono representation of the Ambisonics recording, having no directional information but being representative of the overall timbre impression of the recording. The normalization of the Ambisonics coefficients generally follows the normalization of the SH (as will become apparent below, see Eq. (15)), which must be considered when decoding an external recording (coefficients based on SH with one normalization factor versus coefficients based on SH with another normalization factor):

[00070] which, for the case of converting SN3D to N3D, amounts to a factor of √(2n+1) per order n.
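A minimal sketch of such a rescaling, assuming the widely used convention that an N3D-normalized signal equals the SN3D-normalized signal multiplied by √(2n+1), and an illustrative order-by-order coefficient grouping:

```python
import math
import numpy as np

def sn3d_to_n3d(coeffs_sn3d, order):
    """Rescale a 3D HOA coefficient vector from SN3D to N3D.

    Assumes (for illustration) that the (order+1)**2 coefficients are grouped
    by Ambisonics order n = 0..order with 2n+1 entries per order; each group
    is scaled by sqrt(2n+1)."""
    coeffs = np.asarray(coeffs_sn3d, dtype=float).copy()
    expected = (order + 1) ** 2
    if coeffs.shape[0] != expected:
        raise ValueError(f"expected {expected} coefficients")
    idx = 0
    for n in range(order + 1):
        coeffs[idx:idx + 2 * n + 1] *= math.sqrt(2 * n + 1)
        idx += 2 * n + 1
    return coeffs

# First-order example: the three order-1 components are scaled by sqrt(3).
print(sn3d_to_n3d([1.0, 1.0, 1.0, 1.0], order=1))
```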
[00071] The B-format and the AMB format use additional weights (Gerzon weights, Furse-Malham (FuMa), MaxN) that are applied to the coefficients. The reference normalization then is usually SN3D, according to Jérôme Daniel, "Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia", PhD thesis, Université Paris 6, 2001, and Dave Malham, "3-D acoustic space and its simulation using Ambisonics", http://www.dxarts.washington.edu/courses/567/current/malham3d.pdf.
[00072] The following two specific realizations of the wave equation, for plane waves and for ideal spherical waves, provide more details on the Ambisonics coefficients:
Plane Waves
[00073] Solving the wave equation for plane waves, the coefficients become independent of k and r; θs, φs describe the source angle and '*' denotes the complex conjugate:

[00074] Here, $p_s(t)$ is used to describe the pressure of the source signal measured at the origin of the description coordinate system, which may be a function of time, for orthonormal spherical harmonics. Generally, Ambisonics assumes plane waves, and the Ambisonics coefficients are transmitted or stored under this assumption. This assumption offers the possibility of superimposing different directional signals as well as a simple decoder design. This is also true for signals of a Soundfield(TM) microphone recorded in first-order B-format (N = 1), which becomes obvious when comparing the phase progression of the EQ filters (for the theoretical progression see the article "Unified description of Ambisonics using real and complex spherical harmonics" mentioned above, chapter 2.1, and for a patent-protected progression see US 4042779). Eq. (1) becomes:

[00075] The coefficients $A_n^m$ can be derived by means of Eq. (18):

$$\mathbf{d}(t) \;=\; \boldsymbol{\Xi}(\theta_s,\phi_s)\; p_s(t) \qquad (18)$$

[00076] where d is an Ambisonics signal holding the coefficients $A_n^m(t)$ (for N = 2, d(t) holds the nine coefficients up to order 2), size(d) = (N+1)^2 x 1 = O x 1, $p_s(t)$ is the pressure of the source signal at the reference origin, and $\boldsymbol{\Xi}$ is the encoding vector, size(Ξ) = O x 1. The encoding vector can be derived from the spherical harmonics for the specific source direction θs, φs (equal to the plane wave direction).
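A minimal first-order sketch of this encoding step; the SN3D normalization, the real-valued SH and the coefficient ordering chosen here are illustrative assumptions only (in the file format these choices are declared via the TrackHOAParams):

```python
import numpy as np

def encoding_vector_order1(theta, phi):
    """Real-valued, SN3D-normalised spherical harmonics up to order 1 for a
    source at inclination theta (from the z axis) and azimuth phi.
    Ordering and normalisation are illustrative choices only."""
    return np.array([
        1.0,                              # n=0, m=0
        np.sin(theta) * np.sin(phi),      # n=1, m=-1
        np.cos(theta),                    # n=1, m=0
        np.sin(theta) * np.cos(phi),      # n=1, m=+1
    ])

def encode_plane_wave(mono_signal, theta, phi):
    """d(t) = Xi(theta_s, phi_s) * p_s(t): outer product of the encoding
    vector with the mono pressure signal measured at the origin."""
    xi = encoding_vector_order1(theta, phi)
    return np.outer(xi, np.asarray(mono_signal, dtype=float))

# A source straight ahead in the horizontal plane (theta = pi/2, phi = 0):
d = encode_plane_wave(np.ones(4), theta=np.pi / 2, phi=0.0)
print(d.shape)  # (4, 4): (N+1)^2 = 4 coefficient channels, 4 samples
```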
Spherical Waves
[00077] The Ambisonics coefficients describing incoming spherical waves generated by point sources (near-field sources) are:

[00078] This equation is derived in connection with Eqs. (31) through (36) below. $p_s(t)$ describes the sound pressure at the origin, the zeroth-order term again becomes identical to the plane-wave case, and $h_0^{(2)}$ is the zeroth-order spherical Hankel function of the second kind. Eq. (19) is similar to the description in Jérôme Daniel, "Spatial sound encoding including near field effect: Introducing distance coding filters and a viable, new ambisonic format", AES 23rd International Conference, Denmark, May 2003. Here,
which, with Eq. (11) in mind, can be found in M.A. Gerzon, "General metatheory of auditory localization", 92nd AES Convention, 1992, Preprint 2206, where Gerzon describes the proximity effect for first-order signals. Synthetic creation of spherical-wave Ambisonics signals is less common for higher Ambisonics orders N because the frequency responses are difficult to handle numerically at low frequencies. These numerical problems can be overcome by considering a spherical model for decoding/playback, as described below.
Sound Field Reproduction
Plane Wave Decoding
[00079] In general, Ambisonics assumes reproduction of the sound field by L speakers that are evenly distributed on a circle or sphere. Assuming that the speakers are placed far enough away from the listening position, a plane-wave decoding model is valid at the center. The sound pressure generated by the L speakers is described by:

[00080] with $y_l(t)$ being the signal of speaker l and having the unit of sound pressure, 1 Pa. $y_l$ is often called the speaker driving function. It is desirable that this sound pressure of Eq. (20) be identical to the pressure described by Eq. (17). This leads to:

[00081] This can be rewritten in matrix form, known as the "re-encoding formula" (compare Eq. (18)):

$$\mathbf{d}(n) \;=\; \boldsymbol{\Psi}\; \mathbf{y}(n)$$

[00082] where d is an Ambisonics signal holding the coefficients for sample instant n, size(d) = (N+1)^2 x 1 = O x 1, $\boldsymbol{\Psi}$ is the re-encoding matrix holding the spherical harmonics of the speaker directions, size(Ψ) = O x L, and y holds the signals of the speakers 1 ... L, size(y(n)) = L x 1. y can then be derived using known methods, e.g. mode matching, or by methods that optimize special speaker placement functions.
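A minimal numerical sketch of the mode-matching approach for first order; the pseudo-inverse, the SN3D real-valued SH and the coefficient ordering are illustrative assumptions, not prescribed by the format:

```python
import numpy as np

def sh_order1(theta, phi):
    """Real SN3D spherical harmonics up to order 1 (illustrative ordering)."""
    return np.array([1.0,
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta),
                     np.sin(theta) * np.cos(phi)])

def mode_matching_decoder(hoa_signal, speaker_dirs):
    """Speaker signals from an HOA signal via the re-encoding relation
    d = Psi @ y, solved as y = pinv(Psi) @ d (mode matching).
    speaker_dirs: list of (theta, phi); hoa_signal: shape (4, num_samples)."""
    psi = np.stack([sh_order1(t, p) for t, p in speaker_dirs], axis=1)  # O x L
    return np.linalg.pinv(psi) @ np.asarray(hoa_signal, dtype=float)    # L x samples

# Four horizontal speakers at 90 degree spacing, 4 samples of a test signal:
speakers = [(np.pi / 2, a) for a in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)]
y = mode_matching_decoder(np.ones((4, 4)), speakers)
print(y.shape)  # (4, 4): one feed per speaker
```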
Decoding for Spherical Wave Model
[00083] A more general decoding model again assumes speakers evenly distributed around the origin at a common distance, radiating spherical waves like point sources. The Ambisonics coefficients are given by the general description of Eq. (1), and the sound pressure generated by the L speakers is given according to Eq. (19):

[00084] A more sophisticated decoder can filter the Ambisonics coefficients in order to recover
Distance-Coded Ambisonics Signals

[00085] Creating distance-coded coefficients in the Ambisonics encoder using a speaker reference distance can solve the numerical problems that arise when modeling or recording spherical waves (using Eq. (18)):

[00086] Transmitted or stored are the distance-coded coefficients, the reference distance and an indicator that distance-coded spherical coefficients are used. On the decoder side, a simple decoding processing as given in Eq. (22) is feasible as long as the actual speaker distance is close to the reference distance. If the difference is too big, a correction filtering before Ambisonics decoding is required.
[00087] Other decoding models, such as Eq. (24), result in different formulas for distance-coded Ambisonics:

[00088] Furthermore, the normalization of the spherical harmonics has an influence on the distance-coded Ambisonics formula, i.e. the distance-coded Ambisonics coefficients need a defined context.
[00089] The details of the 2D/3D conversion mentioned above are as follows. The conversion factor for converting a 2D circular component into a 3D spherical component by multiplication can be derived as follows:

[00090] Using a common identity (according to Wikipedia, 12 October 2010, "Associated Legendre polynomials", http://en.wikipedia.org/w/index.php title=Associated Legendre polynomials&oldid=3630

[00091] Eq. (29) inserted into Eq. (28) leads to Eq. (10). The 2D to orthonormal 3D conversion is derived by

[00092] using the relation
and replacing = 2
[00093] The details for the Spherical Wave expansion mentioned above are as follows:
[00094] Solving Eq. (1) for spherical waves, which are generated by point sources and arrive as incoming waves, is more complicated because point sources of infinitesimally vanishing size need to be described using a volume flow, whereby the pressure radiated to a field point by a source at a given position is given by (according to the book "Fourier Acoustics" mentioned above):

[00095] with ρ0 being the specific density and G being the Green's function (32). The Green's function can also be expressed in spherical harmonics by (33), where $h_n^{(2)}$ is the spherical Hankel function of the second kind. Note that the Green's function has a unit scale of meter^-1. Eqs. (31), (33) can be compared with Eq. (1) to derive the Ambisonics coefficients of spherical waves:
[00096]
(34) where the volume flow is in units of m^3 s^-1, and the specific density is in units of kg m^-3. To be able to synthetically create Ambisonics signals and relate them to the above plane wave considerations, it is sensible to express Eq. (34) using the sound pressure generated at the origin of the coordinate system:

[00097] which leads to
Exchange Storage Format
[00098] The storage format according to the invention allows storing more than one HOA representation and additional directional streams together in one data container. This allows different formats of HOA descriptions, which lets decoders optimize playback, and provides efficient data storage for sizes > 4 GB. Additional advantages are:
A) By storing multiple HOA descriptions using different formats together with the related storage format information, an Ambisonics decoder is able to mix and decode both representations.
B) Required information items for next-generation HOA decoders are stored as format information: dimensionality, region of interest (sources outside or within the listening area), normalization of the spherical basis functions; Ambisonics coefficient packing and scaling information; Ambisonics wave type (plane, spherical), reference radius (for decoding spherical waves). Related mono directional signals can be stored, and the position information of these directional signals can be described using angle and distance information or an Ambisonics coefficient encoding vector.
C) The Ambisonics data storage format is extended to allow flexible and economical data storage: storing Ambisonics data related to the Ambisonics components (Ambisonics channels) with different PCM word-size resolution; storing Ambisonics data with reduced bandwidth using resampling or MDCT processing.
D) Metadata fields are available to associate tracks for special decoding (front, ambience) and to allow the storage of accompanying information about the file, such as recording information for microphone signals: recording reference coordinate system; microphone, source and virtual listener positions; microphone directional characteristics; environment and source information.
E) The format is suitable for storing multiple frames containing different tracks, allowing audio scene changes without a scene description. (Note: a Track contains an HOA sound field description or a single source with position information. A Frame is a combination of one or more parallel Tracks.) Tracks can start at the beginning of a frame or end at the end of a frame, so no timecode is required.
F) The format makes it easy to quickly access audio track data (fast forward or skip to cue points) and to determine a time code relative to the start time of the file's data.
HOA Parameters for HOA Data Exchange
[00099] Table 6 summarizes the parameters required to be set for an unambiguous exchange of HOA signal data. The definition of spherical harmonics is fixed for cases with complex values and for cases with real values, according to Eqs. (3) and (6).

Table 6 - Parameters for unambiguous exchange of HOA data
File Format Details
[000100] In the following, the file format for storing audio scenes composed of Higher Order Ambisonics (HOA) signals or single sources with position information is described in detail. The audio scene can contain multiple HOA streams that may use different normalization schemes. A decoder can thus calculate the corresponding speaker signals for the desired speaker configuration as an overlay of all audio tracks of a given file. The file contains all the data required to decode the audio content. The file format according to the invention offers the possibility of storing more than one HOA or single-source signal in a single file. The file format uses a composition of Frames, each of which can contain several Tracks, where the data of a Track is stored in one or more packets called TrackPackets.
[000101] All integer types are stored in little-endian byte order, so the least significant byte comes first. The bit order is always most significant bit first. The notation for integer data types is "int"; a "u" in front indicates an unsigned integer, and the bit resolution is written at the end of the definition. For example, an unsigned 16-bit integer field is defined as "uint16". PCM samples and HOA coefficients in integer format are represented as fixed-point numbers with the decimal point at the most significant bit. All floating-point data types conform to the IEEE-754 Specification, "Standard for binary floating-point arithmetic", http://grouper.ieee.org/groups/754/. The notation for floating-point data types is "float", with the bit resolution written at the end of the definition; for example, a 32-bit floating-point field is defined as "float32". Constant ID identifiers, which identify the beginning of a frame, track or chunk, and string identifiers are of data type byte. Byte arrays are stored byte by byte with the most significant bit first. Therefore the ID "TRCK" is defined in a 32-bit byte field where the bytes are written in the physical order "T", "R", "C", "K" (<0x54; 0x52; 0x43; 0x4b>). Hexadecimal values start with "0x" (for example, 0xAB64C5). Single bits are enclosed in quotes (e.g. "1") and multiple binary values start with "0b" (e.g. 0b0011 = 0x3). Header field names always start with the header name followed by the field name, where the first letter of each word is capitalized (for example, TrackHeaderSize). Field or header name abbreviations are created by using the capital letters only (e.g. TrackHeaderSize = THS). The HOA file format can include more than one Frame, Packet or Track. For discrimination of multiple header fields, a number can follow the field or header name; for example, the second TrackPacket of the third Track is named "Track3Packet2". The HOA file format can include fields with complex values. These complex values are stored as the real and the imaginary part, where the real part is stored first. The complex number 1+i2 in the format "int8" would be stored as "0x01" followed by "0x02". Consequently, fields or coefficients of a complex-valued format type require twice the storage size compared to the corresponding real-valued format type.
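A hedged sketch of readers for these field types using Python's struct module; the byte order of floating-point fields is assumed little-endian here, which the text does not state explicitly:

```python
import struct

def read_uint16(buf, offset=0):
    """Little-endian unsigned 16-bit integer ("uint16")."""
    return struct.unpack_from("<H", buf, offset)[0]

def read_float32(buf, offset=0):
    """IEEE-754 single precision ("float32"), byte order assumed little-endian."""
    return struct.unpack_from("<f", buf, offset)[0]

def read_complex_int8(buf, offset=0):
    """Complex value in "int8" format: real part stored first, then imaginary."""
    re, im = struct.unpack_from("<bb", buf, offset)
    return complex(re, im)

assert read_uint16(b"\x34\x12") == 0x1234          # least significant byte first
assert read_complex_int8(b"\x01\x02") == 1 + 2j    # the 1+i2 example above
assert read_float32(struct.pack("<f", 48000.0)) == 48000.0
```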
Higher Order Ambisonics File Format Structure
Single-Track Format
[000102] The Higher Order Ambisonics file format includes at least a FileHeader, a FrameHeader, a TrackHeader and a TrackPacket, as depicted in Fig. 9, which presents an illustrative simple HOA file containing one Track in one or more Packets. The basic structure of an HOA file is therefore a FileHeader followed by a Frame that includes at least one Track. A Track always consists of a TrackHeader and one or more TrackPackets.
Multiple Track and Frame Format
[000103] In contrast to the FileHeader, the HOA file can contain more than one Frame, where a Frame can contain more than one Track. A new FrameHeader is used if the maximum size of a Frame is exceeded or if Tracks are added to, or removed from, one Frame to the next. The structure of an HOA file with multiple Tracks and Frames is shown in Fig. 10. Such a structure starts with the FrameHeader followed by all TrackHeaders of the Frame. Subsequently, the TrackPackets of each Track follow the FrameHeader and TrackHeaders, where the TrackPackets are interleaved in the same order as the TrackHeaders. In a file with multiple Tracks and Frames, the length of a TrackPacket in samples is defined in the FrameHeader and is constant for all Tracks. Additionally, the samples of each Track are synchronized; for example, the samples of Track1Packet1 are synchronous with the samples of Track2Packet1. Specific TrackCodingTypes can cause a delay on the decoder side; such a specific delay either needs to be known on the decoder side or has to be included in the TrackCodingType-dependent part of the TrackHeader, because the decoder synchronizes all TrackPackets with the maximum delay of all Tracks of a Frame.
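A small sketch of the resulting packet storage order (the packet-major interleaving shown here is inferred from the synchronization description above and from Fig. 10; names are illustrative):

```python
def packet_storage_order(num_tracks, num_packets):
    """Yield (track_index, packet_index) pairs in the assumed order in which
    TrackPackets appear inside a Frame: packet 1 of every Track (in
    TrackHeader order), then packet 2 of every Track, and so on."""
    for packet in range(1, num_packets + 1):
        for track in range(1, num_tracks + 1):
            yield track, packet

# Two Tracks, two Packets each:
print(list(packet_storage_order(2, 2)))
# [(1, 1), (2, 1), (1, 2), (2, 2)] -> Track1Packet1, Track2Packet1, Track1Packet2, ...
```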
File Dependent Metadata
[000104] Metadata referring to the complete HOA file can optionally be added after the FileHeader in MetaDataChunks. A MetaDataChunk starts with a General User ID (GUID) followed by the MetaDataChunkSize. The payload of a MetaDataChunk, i.e. the metadata information, is packaged in an XML format or in any user-defined format. Fig. 11 presents the structure of an HOA file format using multiple MetaDataChunks.
Track Types
[000105] An HOA format Track is differentiated into a HOATrack and a SingleSourceTrack. The HOATrack includes the full sound field encoded as HOACoefficients. Therefore, a scene description, for example the positions of the encoded sources, is not required to decode the coefficients on the decoder side; in other words, an audio scene is stored within the HOACoefficients. Unlike the HOATrack, the SingleSourceTrack only includes one source encoded as PCM samples, along with the position of the source within an audio scene. The SingleSourceTrack position can be fixed or can vary over time. The source position is sent as a TrackHOAEncodingVector or as a TrackPositionVector. The TrackHOAEncodingVector contains the HOA encoding values for obtaining the HOACoefficients for each sample. The TrackPositionVector contains the position of the source as an angle and a distance with respect to the central listening position.

[000106] The FileHeader includes all the information that applies to the complete HOA file. The FileID is used to identify the HOA file format. The sampling rate is constant for all Tracks, even though it is sent in the FrameHeader; HOA files that change their sample rate from one Frame to another are invalid. The number of Frames is indicated in the FileHeader to indicate the frame structure to the decoder.


[000107] The FrameHeader holds the information that is constant for all Tracks of a Frame and indicates changes within the HOA file. FrameID and FrameSize indicate the beginning of a Frame and the length of the Frame. These two fields allow easy access to each Frame and cross-checking of the frame structure. If the Frame length exceeds 32 bits, a Frame can be split into multiple Frames. Each Frame has a unique FrameNumber; the FrameNumber must start with 0 and must be incremented by one for each new Frame. The number of samples in the Frame is constant for all Tracks of a Frame, and the number of Tracks within the Frame is constant for the Frame. A new FrameHeader is sent to end or start Tracks at a desired sample position. The samples of each Track are stored in Packets. The size of these TrackPackets is indicated in samples and is constant for all Tracks. The number of packets is equal to the smallest integer that is required to store the number of Frame samples; therefore, the last packet of a Track may contain fewer samples than the indicated packet size. The sample rate of a Frame is equal to the FileSampleRate and is indicated in the FrameHeader to allow decoding of a Frame without knowledge of the FileHeader. This can be used when decoding starts in the middle of a multi-frame file without FileHeader knowledge, e.g. for streaming applications.
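For illustration, the packet count and the length of the final, possibly shorter packet follow directly from the Frame sample count and the packet size:

```python
import math

def packets_per_frame(frame_num_samples, packet_size):
    """Number of TrackPackets needed to hold a Frame's samples, and the
    number of samples carried by the last (possibly shorter) packet."""
    num_packets = math.ceil(frame_num_samples / packet_size)
    last_packet_samples = frame_num_samples - (num_packets - 1) * packet_size
    return num_packets, last_packet_samples

print(packets_per_frame(48000, 1024))  # (47, 896): the final packet is short
```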


[000108] The term "dyn" refers to a dynamic field size due to conditional fields. The TrackHeader holds information that is constant for the TrackPackets of a specific Track. The TrackHeader is separated into a constant part and a variable part for the two TrackSourceTypes. The TrackHeader starts with a constant TrackID for checking and identifying the beginning of the TrackHeader. A unique TrackNumber is assigned to each Track to indicate coherent Tracks across Frame borders; thus, a Track with the same TrackNumber can continue in the next Frame. The TrackHeaderSize is provided to jump to the next TrackHeader and is signaled as an offset from the end of the TrackHeaderSize field. The TrackMetaDataOffset provides the offset to jump directly to the beginning of the TrackMetaData field, which can be used to skip the variable-length part of the TrackHeader. A TrackMetaDataOffset of zero indicates that the TrackMetaData field does not exist. Depending on the TrackSourceType, either the HOATrackHeader or the SingleSourceTrackHeader is provided. The HOATrackHeader provides additional information for standard HOA coefficients that describe the complete sound field. The SingleSourceTrackHeader holds information for the samples of a mono PCM track and the position of the source. For SingleSourceTracks, the decoder has to include the Tracks within the scene. At the end of the TrackHeader, an optional TrackMetaData field is defined, which uses the XML format to provide Track-dependent metadata, for example additional information for a transmission in A-format (microphone array signals).




[000109] The HOATrackHeader is the part of the TrackHeader that holds the information for decoding a HOATrack. The TrackPackets of a HOATrack carry HOA coefficients that encode the entire sound field of a Track. Basically, the HOATrackHeader holds all the HOA parameters that are required on the decoder side to decode the HOA coefficients for a given speaker configuration. TrackComplexValueFlag and TrackSampleFormat define the format type of the HOA coefficients of each TrackPacket. For encoded or compressed coefficients, TrackSampleFormat defines the format of the decoded or uncompressed coefficients. All format types can be real or complex numbers; more information on complex numbers is provided in the File Format Details section above. All HOA-dependent information is defined in the TrackHOAParams. The TrackHOAParams are reused in other TrackSourceTypes; therefore, the TrackHOAParams fields are defined and described in the Track HOA Parameters section (TrackHOAParams). The TrackCodingType field indicates the encoding (compression) format of the HOA coefficients. The basic version of the HOA file format includes, for example, two CodingTypes. One CodingType is the PCM encoding type (TrackCodingType == "0"), where the uncompressed real or complex coefficients are written to the packets in the selected TrackSampleFormat; the order and normalization of the HOA coefficients are defined in the TrackHOAParams fields. A second CodingType allows changing the sample format and limiting the bandwidth of the coefficients of each HOA order. A detailed description of this CodingType is provided in the Encoding of the TrackRegion section; an overview follows. The TrackBandwidthReductionType determines the type of processing that was used to limit the bandwidth of each HOA order. If the bandwidth is left unchanged, bandwidth reduction can be disabled by setting the TrackBandwidthReductionType field to zero. Two other types of bandwidth reduction processing are defined: the format includes frequency-domain MDCT processing and, optionally, time-domain filter processing. For more information on the MDCT processing see the Bandwidth Reduction via MDCT section. HOA orders can be combined into regions with the same sample format and bandwidth. The number of regions is indicated by the TrackNumberOfOrderRegions field. For each region, the first and last order index, the sample format and optional bandwidth reduction information have to be defined. A region covers at least one order. Orders that are not covered by any region are encoded with full bandwidth using the default format indicated in the TrackSampleFormat field. A special case is the use of no regions (TrackNumberOfOrderRegions == 0); this case can be used to remove the interleaving of HOA coefficients in PCM format, so that the HOA components are not sample-interleaved. The HOA coefficients of the orders of a region are encoded in the TrackRegionSampleFormat. TrackRegionUseBandwidthReduction indicates the use of bandwidth reduction processing for the coefficients of the orders of the region. If the TrackRegionUseBandwidthReduction indicator is set, additional bandwidth reduction information follows. For the MDCT processing, the window type and the first and last coded MDCT bin are defined. Hereby, the first bin is equivalent to the lower cut-off frequency and the last bin defines the upper cut-off frequency. The MDCT bins are also encoded in the TrackRegionSampleFormat, according to the Bandwidth Reduction via MDCT section.
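An illustrative sketch of how such order regions could be represented and resolved on the decoder side; the class and attribute names below are stand-ins, not the header field names of the format:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class OrderRegion:
    """One region of HOA orders sharing a sample format and, optionally,
    a reduced MDCT bin range (illustrative container only)."""
    first_order: int
    last_order: int
    sample_format: str                           # e.g. "int16", "float32"
    mdct_bins: Optional[Tuple[int, int]] = None  # (first_bin, last_bin) if bandwidth-reduced

def format_for_order(order, regions, default_format="float32"):
    """Orders not covered by any region fall back to the Track's default
    sample format at full bandwidth, as described above."""
    for region in regions:
        if region.first_order <= order <= region.last_order:
            return region.sample_format, region.mdct_bins
    return default_format, None

regions = [OrderRegion(0, 1, "float32"),           # low orders: full quality
           OrderRegion(2, 4, "int16", (0, 511))]   # higher orders: reduced word size and bandwidth
print(format_for_order(3, regions))   # ('int16', (0, 511))
print(format_for_order(5, regions))   # ('float32', None) -> default, full bandwidth
```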
Single Source Type
[000110] Single sources are subdivided into fixed-position and moving-position sources. The source type is indicated in the TrackMovingSourceFlag. The difference between the fixed- and moving-position source types is that the position of a fixed source is indicated only in the TrackHeader, while for moving sources it is indicated in each TrackPacket. The position of a source can be indicated explicitly with a position vector in spherical coordinates or implicitly as the HOA encoding vector. The source itself is a mono PCM track which has to be encoded into HOA coefficients on the decoder side in case an Ambisonics decoder is used for playback.



[000111] The fixed-position source type is defined by a TrackMovingSourceFlag of zero. The second field indicates the TrackPositionType, which specifies whether the source position is encoded as a vector in spherical coordinates or as an HOA encoding vector. The encoding format of the mono PCM samples is indicated by the TrackSampleFormat field. If the source position is sent as a TrackPositionVector, the spherical coordinates of the source position are defined in the fields TrackPositionTheta (inclination from the z axis towards the x/y plane), TrackPositionPhi (counterclockwise azimuth) and TrackPositionRadius. If the source position is defined as an HOA encoding vector, the TrackHOAParams are defined first; these parameters are defined in the TrackHOAParams section and indicate the normalizations and definitions used for the HOA encoding vector. The TrackEncodeVectorComplexFlag and TrackEncodeVectorFormat fields define the format type of the following TrackHOAEncodingVector. The TrackHOAEncodingVector consists of TrackHOAParamNumberOfCoeffs values that are encoded in the format "float32" or "float64".


[000112] The moving-position source type is defined by a TrackMovingSourceFlag of "1". The header is identical to the fixed-source header, except that the source position data fields TrackPositionTheta, TrackPositionPhi, TrackPositionRadius and TrackHOAEncodingVector are missing. For moving sources, these are located in the TrackPackets in order to indicate the new (moving) source position in each Packet.





[000113] Various approaches to HOA encoding and decoding have been discussed in the past, however without any conclusion or agreement on how to code the HOA coefficients. Advantageously, the format according to the invention allows the storage of most known HOA representations. The TrackHOAParams are defined to clarify which kind of normalization and which coefficient order sequence were used on the encoder side. These settings have to be taken into account on the decoder side for mixing HOA tracks and for applying the decoder matrix. HOA coefficients can describe the full three-dimensional sound field or just the two-dimensional x/y plane. The dimension of the HOATrack is defined by the TrackHOAParamDimension field. TrackHOAParamRegionOfInterest reflects the two series expansions of the sound pressure, whereby the sources reside either outside or inside the region of interest, and the region of interest itself does not contain any sources. The calculation of the sound pressure for the interior and exterior cases is defined in equations (1) and (2) above, respectively, whereby the direction information of the HOA signal is determined by the conjugate complex spherical harmonic function. This function is defined in a real-valued and a complex-valued version. The encoder and decoder have to apply the spherical harmonic function of the equivalent number type. Therefore, TrackHOAParamSphericalHarmonicType indicates which spherical harmonic type and function has been applied on the encoder side. As mentioned above, the spherical harmonic function is basically defined by the associated Legendre functions and a complex or real trigonometric function. The associated Legendre functions are defined by Eq. (5). The complex-valued spherical harmonic representation is

[000114] where $N_n^{|m|}$ is the scaling factor (according to Eq. (3)). This complex-valued representation can be transformed into a real-valued representation using the following equation:

[000115] where the modified scaling factor for spherical harmonics with real value is

[000116] For 2D representations, the circular harmonic function has to be used for encoding and decoding the HOA coefficients. The complex-valued representation of the circular harmonics is defined by $Y^m(\phi) = N^{|m|}\,e^{im\phi}$. The real-valued representation of the circular harmonics is defined by

[000117] Various normalization factors are used for the spherical or circular harmonic functions, depending on the specific application or requirements. To ensure correct decoding of the HOA coefficients, the normalization of the spherical harmonic function used on the encoder side has to be known on the decoder side. Table 7 below defines the normalizations that can be selected with the TrackHOAParamSphericalHarmonicNorm field.
Table 7 - Normalization of spherical and circular harmonic functions
[000118] For future normalizations, the dedicated value of the TrackHOAParamSphericalHarmonicNorm field is available. For a dedicated normalization, the scaling factor for each HOA coefficient is defined at the end of the TrackHOAParams. The dedicated TrackScalingFactors can be transmitted as real or complex "float32" or "float64" values. The format of the scaling factors is defined in the fields TrackComplexValueScalingFlag and TrackScalingFormat in the case of dedicated scaling.
[000119] Furse-Malham normalization can additionally be applied to the coded HOA coefficients to equalize the amplitudes of the different HOA order coefficients to absolute values of less than "one" for a transmission in integer format types. The Furse-Malham normalization was designed for the SN3D real-valued spherical harmonic function up to order three. Therefore, it is recommended to use Furse-Malham normalization only in combination with the SN3D real-valued spherical harmonic function. Furthermore, the TrackHOAParamFurseMalhamFlag is ignored for Tracks with an HOA order greater than three. The Furse-Malham normalization has to be inverted on the decoder side to decode the HOA coefficients. Table 8 defines the Furse-Malham coefficients.

Table 8 - Furse-Malham normalization factors to be applied on the encoder side
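The table contents themselves are not reproduced above. As an illustration only, the following Python sketch shows how a per-coefficient normalization such as Furse-Malham is applied on the encoder side and inverted on the decoder side; the gain values are placeholders and not the entries of Table 8, and the function names are ours.

```python
import numpy as np

# Placeholder per-channel gains for an order-1 (4-channel) signal; the actual
# Furse-Malham factors up to order 3 are defined in Table 8 of the format.
furse_malham_gains = np.array([0.7071, 1.0, 1.0, 1.0])

def encode_normalize(coeffs, gains):
    """Encoder side: scale each HOA channel so its amplitude stays below one."""
    return coeffs * gains[:, None]

def decode_denormalize(coeffs, gains):
    """Decoder side: invert the normalization before applying the decoder matrix."""
    return coeffs / gains[:, None]

# coeffs: (channels, samples) block of HOA coefficients
coeffs = np.random.randn(4, 1024) * 0.5
restored = decode_denormalize(encode_normalize(coeffs, furse_malham_gains),
                              furse_malham_gains)
assert np.allclose(restored, coeffs)
```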
[000120] TrackHOAParamDecoderType defines which type of decoder is assumed, on the encoder side, to be present on the decoder side. The decoder type determines the speaker model (spherical or plane wave) that is to be used on the decoder side to reproduce the sound field. In this way, the computational complexity of the decoder can be reduced by shifting parts of the decoder equation to the encoder side. Additionally, numerical issues on the encoder side can be reduced. Furthermore, the decoder can be reduced to identical processing for all HOA coefficients because the coefficient-dependent differences can be moved to the encoder side. However, for spherical waves, a constant speaker distance from the listening position has to be assumed. Therefore, the default decoder type is indicated in the Track header, and the speaker radius for spherical wave decoder types is transmitted in millimeters in the optional TrackHOAParamReferenceRadius field. On the decoder side, the differences between the assumed and the actual speaker radius can additionally be equalized. The TrackHOAParamDecoderType normalization of the HOA coefficients depends on whether the interior or the exterior series expansion of the sound field is selected in TrackHOAParamRegionOfInterest. Note: the coefficients in Eq. (18) and in the following equations correspond to the coefficients used in the remainder of this description. On the encoder side, the transmitted coefficients are determined from the interior or exterior expansion coefficients as defined in Table 9, and are stored. The normalization used is indicated in the TrackHOAParamDecoderType field of the TrackHOAParams header:
Table 9 - Transmitted HOA coefficients for various decoder type normalizations
[000121] The HOA coefficients for a time sample comprise the number of coefficients TrackParamNumberOfCoeffs, which depends on the dimension of the HOA coefficients. For 2D sound fields, this number is equal to 2N'+1, where N' is equal to the TrackHOAParamHorizontalOrder field from the TrackHOAParams header. The 2D HOA coefficients are defined as the coefficients with |m| = n and can be represented as a subset of the 3D coefficients as shown in Table 10. For 3D sound fields, the number of coefficients is equal to (N+1)^2, where N is equal to the TrackHOAParamVerticalOrder field from the TrackHOAParams header. The 3D HOA coefficients are the coefficients with 0 <= n <= N and -n <= m <= n. A common representation of the HOA coefficients is given in Table 10.
Table 10 - Representation of HOA coefficients up to the fourth order showing the 2D coefficients in bold as a subset of the 3D coefficients
[000122] In the case of 3D sound fields with a TrackHOAParamHorizontalOrder greater than the TrackHOAParamVerticalOrder, mixed order decoding will be performed. In mixed order signals, some higher order coefficients are transmitted in 2D only. The TrackHOAParamVerticalOrder field determines the vertical order up to which all coefficients are transmitted. From the vertical order up to the TrackHOAParamHorizontalOrder, only 2D coefficients are used. Thus, TrackHOAParamHorizontalOrder is equal to or greater than TrackHOAParamVerticalOrder. An example of a mixed order representation with a horizontal order of four and a vertical order of two is shown in Table 11:
Table 11 - Representation of HOA coefficients for a mixed order representation of vertical order two and horizontal order four.
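To make the coefficient counts of paragraphs [000121] and [000122] concrete, here is a small Python sketch (the helper names are ours, not fields of the format): a 2D sound field of horizontal order N' carries 2N'+1 coefficients, a 3D sound field of vertical order N carries (N+1)^2 coefficients, and a mixed order signal carries the full 3D set up to the vertical order plus the two sectorial coefficients of every order above it.

```python
def num_coeffs_2d(horizontal_order: int) -> int:
    return 2 * horizontal_order + 1

def num_coeffs_3d(vertical_order: int) -> int:
    return (vertical_order + 1) ** 2

def num_coeffs_mixed(vertical_order: int, horizontal_order: int) -> int:
    # Full 3D set up to the vertical order, plus the two sectorial (|m| = n)
    # coefficients of every order between vertical and horizontal order.
    assert horizontal_order >= vertical_order
    return num_coeffs_3d(vertical_order) + 2 * (horizontal_order - vertical_order)

print(num_coeffs_2d(4))        # 9
print(num_coeffs_3d(4))        # 25
print(num_coeffs_mixed(2, 4))  # 13: orders 0..2 in 3D, orders 3 and 4 in 2D only
```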
[000123] The HOA coefficients are stored in the Packets of a Track. The sequence of the coefficients, i.e. which coefficient comes first and which one comes next, has been defined differently in the past. Therefore, the TrackHOAParamCoeffSequence field indicates one of three types of coefficient sequences. The three sequences are derived from the arrangement of the HOA coefficients in Table 10.
[000124] The Format B sequence uses a special naming for the HOA coefficients up to order three, as shown in Table 12:
Table 12 - Format B HOA coefficient naming convention
[000125] For Format B, the HOA coefficients are transmitted from the lowest to the highest order, where the HOA coefficients of each order are transmitted in alphabetical order. For example, the coefficients of a 3D configuration of HOA order three are stored in the sequence W, X, Y, Z, R, S, T, U, V, K, L, M, N, O, P, Q. Format B is defined only up to the third HOA order. For the transmission of the horizontal (2D) coefficients, the supplementary 3D coefficients are ignored, i.e. only W, X, Y, U, V, P, Q are transmitted.
[000126] Coefficients for 3D HOA are transmitted, according to the TrackHOAParamCoeffSequence, in a numerically increasing or decreasing manner from the lowest to the highest HOA order. The increasing number sequence starts at the coefficient with the lowest order and index and increases up to the coefficient with the highest order and index,
[000127]
which is the "CG" sequence defined in Chris Travis, "Four candidate component sequences", http://ambisonics.googlegroups.com/web/Four+candidate+component+ sequences+V09.pdf, 2008. The descending numerical sequence runs in reverse from = to
which is the "QM" string defined in this publication.
[000128] For the 2D HOA coefficients, the ascending and descending TrackHOAParamCoeffSequence number sequences are the same as in the 3D case, but the unused coefficients (i.e. all but the sectorial coefficients with |m| = n from Table 10) are omitted, so that the ascending numerical sequence contains only the sectorial coefficients.
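The exact single-index formulas of the "CG" and "QM" sequences are given in the referenced Travis document and are not reproduced above. The following Python sketch only illustrates the general idea of ascending and descending enumerations of the (n, m) index pairs and of restricting them to the sectorial 2D subset; it is an illustration under our own ordering within each order, not the normative coefficient sequence.

```python
def ascending_3d(order):
    """Enumerate (n, m) pairs from the lowest to the highest HOA order."""
    return [(n, m) for n in range(order + 1) for m in range(-n, n + 1)]

def descending_3d(order):
    """The reverse enumeration, from the highest to the lowest HOA order."""
    return list(reversed(ascending_3d(order)))

def ascending_2d(order):
    """Keep only the sectorial coefficients |m| == n used for 2D sound fields."""
    return [(n, m) for (n, m) in ascending_3d(order) if abs(m) == n]

print(ascending_3d(1))   # [(0, 0), (1, -1), (1, 0), (1, 1)]
print(ascending_2d(2))   # [(0, 0), (1, -1), (1, 1), (2, -2), (2, 2)]
```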
Track Packets
HOAPCM Encoding Type Packet

[000129] This Packet contains the HOA coefficients in the order defined by TrackHOAParamCoeffSequence, where all coefficients of one time sample are transmitted successively. This Packet is used for standard HOA Tracks with a TrackSourceType of zero and a TrackCodingType of zero.

Dynamic Resolution Encoding Type Packet
[000130] The dynamic resolution Packet is used for a TrackSourceType of "zero" and a TrackCodingType of "one". The different resolutions of the TrackOrderRegions lead to different storage sizes for each TrackOrderRegion. Therefore, the HOA coefficients are stored in a de-interleaved manner, i.e. all coefficients of one HOA order are stored in succession.
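As a hedged illustration of the two Packet layouts described above (array shapes and function names are ours): the HOAPCM Packet transmits all coefficients of one time sample together, whereas the dynamic resolution Packet stores the coefficients de-interleaved, channel after channel.

```python
import numpy as np

def pack_hoapcm(coeffs):
    """Interleave: sample 0 of all channels, then sample 1, ... (HOAPCM layout)."""
    # coeffs has shape (num_coeffs, num_samples)
    return coeffs.T.reshape(-1)

def pack_dynamic_resolution(coeffs):
    """De-interleaved: all samples of channel 0, then channel 1, ..."""
    return coeffs.reshape(-1)

coeffs = np.arange(4 * 3).reshape(4, 3)   # 4 HOA channels, 3 time samples
print(pack_hoapcm(coeffs))                # [ 0  3  6  9  1  4  7 10  2  5  8 11]
print(pack_dynamic_resolution(coeffs))    # [ 0  1  2 ... 11]
```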

Single Source Track Packets
Single Source Fixed Position Packet
[000131] The Single Source Fixed Position Packet is used for a TrackSourceType of "one" and a TrackMovingSourceFlag of "zero". The Packet holds the PCM samples of a mono source.



Single Source Moving Position Packet
[000132] The Single Source Moving Position Packet is used for a TrackSourceType of "one" and a TrackMovingSourceFlag of "one". It holds the mono PCM samples and the position information for the samples of the TrackPacket. The TrackDirectionFlag indicates whether the direction has changed for this Packet or whether the direction of the previous Packet is to be used. To ensure decoding from the beginning of each Frame, the TrackDirectionFlag is equal to "one" for the first Moving Source TrackPacket of a Frame. For a TrackDirectionFlag of "one", the direction information for the following PCM samples of the source is transmitted. Depending on the TrackPositionType, the direction information is sent as a TrackPositionVector in spherical coordinates or as a TrackHOAEncodingVector in the format defined by TrackEncodingVectorFormat. The TrackEncodingVector generates HOA coefficients that conform to the definitions of the HOAParamHeader field. Following the direction information, the mono PCM samples of the TrackPacket are transmitted.
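A sketch, under our own naming, of the information carried by a Single Source Moving Position Packet, expressed as a Python data class; the concrete binary layout (word widths, byte order) is defined by the format itself and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class MovingSourcePacket:
    direction_flag: int                  # 1: new direction follows, 0: reuse previous
    position: Optional[Sequence[float]]  # spherical coordinates or HOA encoding vector
    pcm_samples: Sequence[float]         # mono PCM samples of this TrackPacket

def direction_for_packet(packet, previous_direction):
    """Return the direction to use for this Packet's samples."""
    if packet.direction_flag == 1:
        return packet.position
    return previous_direction
```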
Encoding Processing
TrackRegion Encoding
[000133] HOA signals can be derived from sound field recordings with microphone arrays. For example, the Eigenmike disclosed in WO 03/061336 A1 can be used to obtain order-three HOA recordings. However, the finite size of the microphone array leads to restrictions for the recorded HOA coefficients. In WO 03/061336 A1 and in the above-mentioned article "Three-dimensional surround sound systems based on spherical harmonics", issues caused by finite microphone arrays are discussed. The distance between the microphone capsules results in an upper frequency limit given by the spatial sampling theorem. Above this frequency, the microphone array may not produce correct HOA coefficients. Additionally, the finite distance of the microphones from the HOA listening position requires equalization filters. These filters exhibit high gains at low frequencies, which increase further with each HOA order. In WO 03/061336 A1, a lower cut-off frequency for the higher order coefficients is introduced in order to handle the dynamic range of the equalization filters. This shows that the bandwidth of HOA coefficients of different HOA orders can be different. Therefore, the HOA file format offers the TrackRegionBandwidthReduction, which allows the transmission of only the required frequency bandwidth for each HOA order. Due to the high dynamic range of the equalization filters, and because the zero order coefficient is basically the sum of all microphone signals, different HOA order coefficients can have different dynamic ranges. Therefore, the HOA file format also offers the possibility of adapting the format type to the dynamic range of each HOA order.
TrackRegion Encoding Processing
[000134] As shown in Fig. 12, the interleaved HOA coefficients are fed to the first de-interleaving step or stage 1211, which is assigned to the first TrackRegion and separates all HOA coefficients of that TrackRegion into de-interleaved buffers of FramePacketSize samples. The TrackRegion coefficients are derived from the TrackRegionLastOrder field and the TrackRegionFirstOrder field. De-interleaving means that the coefficients for one combination of the order and degree indices are grouped in one buffer. From de-interleaving step or stage 1211, the de-interleaved HOA coefficients are passed to the TrackRegion coding section. The remaining interleaved HOA coefficients are passed to the next TrackRegion de-interleaving step or stage, and so on up to de-interleaving step or stage 121N. The number N of de-interleaving steps or stages is equal to TrackNumberOfOrderRegions plus "one". The additional de-interleaving step or stage 125 de-interleaves the remaining coefficients that are not part of any TrackRegion into a default processing path that includes a format conversion step or stage 126. The TrackRegion encoding path includes an optional bandwidth reduction step or stage 1221 and a format conversion step or stage 1231, and performs parallel processing for each HOA coefficient buffer. The bandwidth reduction is performed if the TrackRegionUseBandwidthReduction field is set to "one". Depending on the selected TrackBandwidthReductionType, a processing is selected that limits the frequency range of the HOA coefficients and critically samples them, in order to reduce the number of HOA coefficient samples to the required minimum. The format conversion converts the current HOA coefficient format into the TrackRegionSampleFormat defined in the HOA Track header. In the default processing path, format conversion 126 is the only step/stage, and it converts the HOA coefficients into the TrackSampleFormat of the HOA Track header. The TrackPacket multiplexer step or stage 124 multiplexes the HOA coefficient buffers into the TrackPacket data or file streams as defined by the selected TrackHOAParamCoeffSequence field, where the coefficients for one combination of the order and degree indices remain de-interleaved (within one buffer).
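A minimal sketch of the TrackRegion encoding data flow of Fig. 12, assuming region objects with first_order, last_order and use_bandwidth_reduction attributes and caller-supplied conversion callables; it only illustrates the de-interleave / per-region processing / multiplex structure, not the normative behaviour of the format.

```python
def encode_frame(interleaved, regions, convert_region, convert_default,
                 bandwidth_reduce=None):
    """interleaved: dict mapping (n, m) -> list of FramePacketSize samples."""
    packets = {}
    remaining = dict(interleaved)
    for region in regions:                      # one path per TrackRegion
        keys = [k for k in remaining
                if region.first_order <= k[0] <= region.last_order]
        for key in keys:
            buf = remaining.pop(key)            # de-interleave (steps 1211..121N)
            if bandwidth_reduce and region.use_bandwidth_reduction:
                buf = bandwidth_reduce(buf, region)      # optional step 1221..122N
            packets[key] = convert_region(buf, region)   # format conversion 1231..123N
    for key, buf in remaining.items():          # default path (steps 125, 126)
        packets[key] = convert_default(buf)
    # multiplexer 124: order the buffers according to the coefficient sequence
    return [packets[key] for key in sorted(packets)]
```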
TrackRegion Decoding Processing
[000135] As shown in Fig. 13, the decoding processing is the inverse of the encoding processing. Demultiplexer step or stage 134 demultiplexes the TrackPacket file or data stream, according to the TrackHOAParamCoeffSequence, into de-interleaved HOA coefficient buffers (not shown). Each buffer contains FramePacketLength coefficients for one combination of the order and degree indices.
[000136] Step/stage 134 initializes TrackNumberOfOrderRegions plus "one" processing paths and passes the contents of the de-interleaved HOA coefficient buffers to the appropriate processing path. The coefficients of each TrackRegion are defined by the TrackRegionLastOrder and TrackRegionFirstOrder fields of the HOA Track header. HOA orders that are not covered by the selected TrackRegions are processed in the default processing path, which includes a format conversion step or stage 136 and a remaining-coefficient interleaving step or stage 135. The default processing path corresponds to a TrackProcessing path without a bandwidth reconstruction step or stage.
[000137] In the TrackProcessing paths, a format conversion step/stage 1331 to 133N converts the HOA coefficients that are encoded in the TrackRegionSampleFormat into the data format that is used for the decoder processing. Depending on the TrackRegionUseBandwidthReduction data field, an optional bandwidth reconstruction step or stage 1321 to 132N follows, in which the critically sampled, bandwidth-limited HOA coefficients are reconstructed to the full bandwidth of the Track. The type of reconstruction processing is defined in the TrackBandwidthReductionType field of the HOA Track header. In the following interleaving steps or stages 1311 to 131N, the contents of the de-interleaved HOA coefficient buffers are interleaved by grouping the HOA coefficients of one time sample together, and the HOA coefficients of the current TrackRegion are combined with the HOA coefficients of the other TrackRegions. The resulting sequence of HOA coefficients can then be used for the Track processing. Additionally, the interleaving steps/stages handle the delays between TrackRegions using bandwidth reduction and TrackRegions not using bandwidth reduction, which delay depends on the selected TrackBandwidthReductionType processing. For example, the MDCT processing adds a delay of FramePacketSize samples, and therefore the interleaving steps/stages of processing paths without bandwidth reduction delay their output by one Packet.
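To illustrate the interleaving at the end of the decoding paths, i.e. regrouping the de-interleaved buffers into time samples, a minimal sketch with our own naming; the delay handling for MDCT regions mentioned above is omitted.

```python
import numpy as np

def interleave(buffers):
    """buffers: dict mapping (n, m) -> 1-D array of decoded samples.

    Returns an array of shape (num_samples, num_coeffs) so that all HOA
    coefficients of one time sample are grouped together.
    """
    keys = sorted(buffers)                 # target coefficient sequence
    return np.stack([buffers[k] for k in keys], axis=1)
```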
Bandwidth Reduction via MDCT
Encoding
[000138] Fig. 14 shows the bandwidth reduction using MDCT (modified discrete cosine transform) processing. Each TrackRegion HOA coefficient buffer of FramePacketSize samples passes via a buffer 141 to 141M through a corresponding MDCT window addition step or stage 1421 to 142M. Each input buffer contains the successive time-domain HOA coefficients of one combination of the order and degree indices, i.e. a buffer is defined as

[000139] The number M of buffers is equal to the number of Ambisonics components, (N + 1)^2 for a full 3D sound field of order N. The buffer handling performs a 50% overlap for the following MDCT processing by combining the previous buffer contents with the current buffer contents into the new contents for the MDCT processing in the corresponding steps or stages 1431 to 143M, and it stores the current buffer contents for the processing of the next buffer contents. The MDCT processing restarts at the beginning of each Frame, which means that all coefficients of a Track in the current Frame can be decoded without knowledge of the previous Frame; following the last buffer contents of the current Frame, an additional buffer content of zeros is processed. Therefore, TrackRegions processed with the MDCT produce an extra TrackPacket. In the window addition steps/stages, the corresponding buffer contents are multiplied by the selected window function, which is defined by the TrackRegionWindowType field of the HOA Track header for each TrackRegion. The Modified Discrete Cosine Transform was first described in J. P. Princen, A. B. Bradley, "Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-34, no. 5, pages 1153-1161, October 1986. The MDCT can be regarded as a critically sampled filter bank with FramePacketSize subbands, and it requires a 50% overlap of the input buffers. An input buffer has a length of twice the number of subbands. The MDCT is defined by the following equation, with N equal to FramePacketSize:
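The equation image is not reproduced above. Since the text refers to Princen and Bradley, the following is assumed to be the standard MDCT definition for an input buffer of length 2N, with N equal to FramePacketSize; the symbols c(n) for the windowed input samples and C̃(k) for the MDCT bins are ours.

$$\tilde C(k) = \sum_{n=0}^{2N-1} c(n)\,\cos\!\left[\frac{\pi}{N}\left(n+\frac{1}{2}+\frac{N}{2}\right)\left(k+\frac{1}{2}\right)\right],\qquad k = 0,\dots,N-1$$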

[000140] The resulting coefficients are called MDCT bins. The MDCT calculation can be implemented using the Fast Fourier Transform. In the following frequency region cutting steps or stages 1441 to 144M, the bandwidth reduction is performed by removing all MDCT bins with k < TrackRegionFirstBin and k > TrackRegionLastBin, reducing the buffer length to TrackRegionLastBin - TrackRegionFirstBin + 1, where TrackRegionFirstBin is the lower cut-off frequency of the TrackRegion and TrackRegionLastBin is the upper cut-off frequency. The omission of MDCT bins can be regarded as a band-pass filter with cut-off frequencies corresponding to the TrackRegionFirstBin and TrackRegionLastBin frequencies. Therefore, only the required MDCT bins are transmitted.
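A hedged Python sketch of the encoder-side bandwidth reduction path: 50% overlap buffering, windowing, MDCT (evaluated directly for clarity; an FFT-based implementation is possible, as noted above) and cropping to the [TrackRegionFirstBin, TrackRegionLastBin] range. All names are ours.

```python
import numpy as np

def mdct(buffer_2n):
    """Direct MDCT of a length-2N buffer, returning N bins."""
    n2 = len(buffer_2n)
    n_bins = n2 // 2
    n = np.arange(n2)
    k = np.arange(n_bins)[:, None]
    basis = np.cos(np.pi / n_bins * (n + 0.5 + n_bins / 2) * (k + 0.5))
    return basis @ buffer_2n

def encode_region_packet(prev_block, cur_block, window, first_bin, last_bin):
    """50% overlap with the previous block, windowing, MDCT and bin cropping."""
    buf = np.concatenate([prev_block, cur_block]) * window
    bins = mdct(buf)
    return bins[first_bin:last_bin + 1]   # only the required MDCT bins are kept
```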
Decoding
[000141] Fig. 15 shows the decoding, i.e. the bandwidth reconstruction, using MDCT processing, in which the HOA coefficients of bandwidth-limited TrackRegions are reconstructed to the full bandwidth of the Track. This bandwidth reconstruction processes the de-interleaved HOA coefficient buffer contents in parallel, where each buffer contains TrackRegionLastBin - TrackRegionFirstBin + 1 MDCT bins. The missing frequency region addition steps or stages 1541 to 154M rebuild the complete MDCT buffer contents of FramePacketLength size by supplementing the received MDCT bins with zeros for the missing MDCT bins k < TrackRegionFirstBin and k > TrackRegionLastBin, after which the inverse MDCT is performed in the corresponding inverse MDCT steps or stages 1531 to 153M in order to reconstruct the HOA coefficients in the time domain. The inverse MDCT can be interpreted as a synthesis filter bank in which FramePacketLength MDCT bins are converted into twice as many time domain coefficients. Completing the reconstruction of the time domain samples further requires a multiplication with the window function used in the encoder and an overlap-add of the first half of the current buffer contents with the second half of the previous buffer contents. The inverse MDCT is defined by the following equation:
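Again the equation image is missing; the standard inverse MDCT consistent with the forward transform sketched above is assumed to be (symbols ours):

$$\hat c(n) = \frac{1}{N}\sum_{k=0}^{N-1}\tilde C(k)\,\cos\!\left[\frac{\pi}{N}\left(n+\frac{1}{2}+\frac{N}{2}\right)\left(k+\frac{1}{2}\right)\right],\qquad n = 0,\dots,2N-1$$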

[000142] In the same way as for the MDCT, the inverse MDCT can be implemented using the Fast Fourier Transform.
[000143] The MDCT window addition steps or stages 1521 to 152M multiply the reconstructed time domain coefficients by the window function defined by TrackRegionWindowType. The subsequent buffers 151 to 151M add the first half of the current TrackPacket buffer contents to the second half of the previous TrackPacket buffer contents in order to reconstruct the FramePacketSize time domain coefficients. The second half of the current TrackPacket buffer contents is stored for the processing of the subsequent TrackPacket; this overlap-add processing removes the opposing time-domain aliasing components of the two buffer contents.
[000144] For HOA files with multiple Frames, the encoder must not use the last buffer contents of the previous Frame for the overlap-add procedure at the beginning of a new Frame. Therefore, at Frame borders, i.e. at the beginning of a new Frame, the buffer contents for the overlap-add are missing, and the reconstruction of the first TrackPacket of a Frame can only be performed together with the second TrackPacket, whereby a delay of one FramePacket and the decoding of an extra TrackPacket are introduced compared to processing paths without bandwidth reduction. This delay is handled by the interleaving steps/stages described in connection with Fig. 13.
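A matching decoder-side Python sketch: zero-filling the received bins back to full length, inverse MDCT, synthesis windowing and 50% overlap-add, mirroring Fig. 15; a minimal illustration under our own naming, not the normative decoder.

```python
import numpy as np

def imdct(bins):
    """Inverse MDCT: N bins -> 2N time-domain samples."""
    n_bins = len(bins)
    n = np.arange(2 * n_bins)[:, None]
    k = np.arange(n_bins)
    basis = np.cos(np.pi / n_bins * (n + 0.5 + n_bins / 2) * (k + 0.5))
    return (basis @ bins) / n_bins

def decode_region_packet(received_bins, first_bin, last_bin, frame_packet_size,
                         window, overlap_state):
    """Zero-fill missing bins, inverse MDCT, window, and overlap-add."""
    full = np.zeros(frame_packet_size)
    full[first_bin:last_bin + 1] = received_bins      # steps 1541..154M
    block = imdct(full) * window                      # steps 1531..153M, 1521..152M
    out = overlap_state + block[:frame_packet_size]   # overlap-add (buffers 151..151M)
    new_state = block[frame_packet_size:]             # kept for the next TrackPacket
    return out, new_state
```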
Claims (14)
[0001]
1. Data structure for High Order Ambisonics (HOA) audio data, including Ambisonics coefficients, whose data structure includes 2D or 3D or both 2D and 3D spatial audio content data for one or more different HOA audio data stream descriptions, and whose data structure is also suitable for HOA audio data having an order greater than "3", and which data structure, in addition, can include single audio signal source data, or microphone array audio data, or both single audio signal source data and microphone array audio data from fixed or time-varying spatial positions, characterized by the fact that the different HOA audio data stream descriptions are related to at least two of: different speaker position densities, HOA encoded wave types, HOA orders and HOA dimensionality, and wherein one HOA audio data stream description contains audio data for a presentation with a given speaker arrangement (11, 21) located in an area distinct from a presentation location (10), and another HOA audio data stream description contains audio data for a presentation with a different speaker arrangement (12, 22) surrounding the presentation location (10), wherein the different speaker arrangement (12, 22) has a speaker position density that is less than that of the given speaker arrangement (11, 21).
[0002]
2. Data structure according to claim 1, characterized in that the audio data for the given speaker arrangement (11, 21) represent spherical waves and a first Ambisonics order, and the audio data for the different speaker arrangement (12, 22) represent plane waves, or a second Ambisonics order, or both plane waves and a second Ambisonics order, the second Ambisonics order being smaller than the first Ambisonics order.
[0003]
3. Data structure, according to claim 1, characterized by the fact that the data structure serves as a scene description, where tracks of an audio scene can start and end at any time.
[0004]
4. Data structure according to claim 1, characterized in that the data structure includes data items with respect to one or more of:
- region of interest related to audio sources outside or within a listening area;
- normalization of the spherical basis functions;
- propagation directionality;
- Ambisonics coefficient scaling information;
- Ambisonics wave type, e.g. plane or spherical;
- in the case of spherical waves, the reference radius for decoding.
[0005]
5. Data structure according to claim 1, characterized by the fact that the Ambisonics coefficients are complex coefficients.
[0006]
6. Data structure according to claim 1, characterized in that the data structure includes at least one of metadata with respect to directions and characteristics of one or more microphones, and an encoding vector for single-source input signals.
[0007]
7. Data structure according to one of claims 1 to 6, characterized in that at least part of the Ambisonics coefficients has a reduced bandwidth, such that, for different HOA orders, the bandwidth of the related Ambisonics coefficients is different (1221-122N).
[0008]
8. Data structure, according to claim 7, characterized in that the bandwidth reduction is based on modified discrete cosine transform (MDCT) processing (1431-143M).
[0009]
9. Data structure according to claim 1, characterized by the fact that the presentation location is a listening or seating area in a cinema.
[0010]
10. Method for encoding and arranging data into a data structure, characterized in that the data structure is a data structure as defined in claim 1.
[0011]
11. Method for audio presentation, characterized in that it comprises the step of: receiving a High Order Ambisonics (HOA) audio data stream containing at least two different HOA audio data signals, wherein at least a first one of the signals (231, 232) is used for presentation with a given speaker arrangement (11, 21) located in a distinct area of a presentation location (10), and wherein at least a second, different one of the signals (241, 242, 243) is used for presentation with a different speaker arrangement (12, 22) surrounding the presentation location (10), wherein the different speaker arrangement (12, 22) has a speaker position density that is less than that of the given speaker arrangement (11, 21).
[0012]
12. Method according to claim 11, characterized in that the audio data for the given speaker arrangement (11, 21) represent spherical waves and a first Ambisonics order, and the audio data for the different speaker arrangement (12, 22) represent plane waves, or a second Ambisonics order, or both plane waves and a second Ambisonics order, the second Ambisonics order being smaller than the first Ambisonics order.
[0013]
13. Method according to claim 11, characterized in that the presentation location is a listening or seating area in a cinema.
[0014]
14. Apparatus for audio presentation, characterized in that it comprises: means for receiving a High Order Ambisonics (HOA) audio data stream containing at least two different HOA audio data signals; means for processing at least a first one of the signals (231, 232) for presentation with a given speaker arrangement (11, 21) located in an area distinct from a presentation location (10); and means for processing at least a second, different one of the signals (241, 242, 243) for presentation with a different speaker arrangement (12, 22) surrounding the presentation location (10), wherein the different speaker arrangement (12, 22) has a speaker position density that is less than that of the given speaker arrangement (11, 21).