system and method for adaptive audio signal generation, encoding, and rendering
Patent Abstract:
SYSTEM AND METHOD FOR ADAPTIVE AUDIO SIGNAL GENERATION, ENCODING AND RENDERING. The present invention relates to embodiments described for an adaptive audio system that processes audio data comprising a number of independent monophonic audio streams. One or more of the streams has associated metadata that specifies whether the stream is an object-based or channel-based stream. Channel-based streams have rendering information encoded via the channel name; and object-based streams have location information encoded through location expressions encoded in the associated metadata. A codec bundles the independent audio streams into a single serial bit stream that contains all of the audio data. This configuration allows the sound to be rendered according to an allocentric frame of reference, where a sound's rendering location is based on the characteristics of the playback environment (eg, room size, format, etc.) to match the intent of the mixer. Object position metadata contains the appropriate allocentric frame of reference information required to reproduce the sound correctly with (...).
Publication number: BR112013033386B1
Application number: R112013033386-3
Filing date: 2012-06-27
Publication date: 2021-05-04
Inventors: Charles Q. Robinson; Nicolas R. Tsingos; Christophe Chabanne
Applicant: Dolby Laboratories Licensing Corporation
IPC main classification:
Patent Description:
CROSS REFERENCE TO RELATED APPLICATIONS [0001] This application claims priority from Provisional Application No. US 61/504,005, filed June 1, 2011, and Provisional Application No. US 61/636,429, filed April 20, 2012, both of which are incorporated herein by reference in their entirety for all purposes. TECHNICAL FIELD [0002] One or more implementations relate generally to audio signal processing, and more specifically to hybrid object- and channel-based audio processing for use in cinema, home and other environments. BACKGROUND [0003] The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions. [0004] Since the introduction of movies with sound, there has been a steady evolution of the technology used to capture the creator's artistic intent for the motion picture soundtrack and to reproduce it accurately in a cinema environment. A fundamental role of cinema sound is to support the story that is shown on the screen. Typical movie soundtracks comprise many different sound elements corresponding to on-screen elements and images, dialogue, noise and sound effects that emanate from different on-screen elements and combine with background music and ambient effects to create the overall listening experience. The artistic intent of the creators and producers represents their desire to have these sounds reproduced in a way that corresponds as closely as possible to what is shown on screen with respect to sound source position, intensity, movement and other similar parameters. [0005] Today's cinema creation, distribution and playback suffer from limitations that restrict the creation of truly immersive and natural audio. Traditional channel-based audio systems send audio content in the form of speaker feeds to individual speakers in a playback environment, such as stereo and 5.1 systems. The introduction of digital cinema has set new standards for sound in film, such as the incorporation of up to 16 audio channels to enable greater creativity for content creators and a more immersive and realistic listening experience for the audience. The introduction of 7.1 surround sound systems provided a new format that increases the number of surround channels by dividing the existing left and right surround channels into four zones, thus increasing the scope for sound designers and mixers to control the placement of audio elements in the movie theater. [0006] To improve the listener experience, sound reproduction in three-dimensional virtual environments has become a growing area of research and development. The spatial presentation of sound uses audio objects, which are audio signals with associated parametric source descriptions of apparent source position (for example, 3D coordinates), apparent source width, and other parameters. Object-based audio is increasingly being used for many current multimedia applications, such as digital movies, video games, simulators, and 3D video.
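For illustration, the parametric source description referred to above might be represented roughly as follows; the field names (position, width, velocity) and the room-relative coordinate convention are assumptions chosen for this sketch, not a format defined by the specification:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class AudioObjectMetadata:
    """Parametric source description for one monophonic object stream."""
    position: Tuple[float, float, float]      # apparent source position (3D coordinates)
    width: float = 0.0                        # apparent source width (0 = point source)
    velocity: Optional[Tuple[float, float, float]] = None  # optional movement information

@dataclass
class ChannelMetadata:
    """Channel-based stream: position is implied by a named channel/speaker zone."""
    channel_label: str                        # eg "L", "C", "R", "Ls", "Rs"

# One object stream (a dialog element placed slightly right of screen center)
# and one channel-based bed stream assigned to the Left screen channel.
dialog_object = AudioObjectMetadata(position=(0.6, 1.0, 0.0), width=0.0)
left_bed = ChannelMetadata(channel_label="L")
print(dialog_object, left_bed)
```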
[0007] Expanding beyond traditional speaker feeds and channel-based audio as a means of distributing spatial audio is critical, and there has been considerable interest in a model-based audio description that holds the promise of allowing the listener/viewer the freedom to select the playback setup that suits their individual needs or budget, with the audio rendered specifically for their chosen setup. At a high level, there are four main spatial audio description formats at present: speaker feed, in which the audio is described as signals intended for speakers at nominal speaker positions; microphone feed, in which audio is described as signals captured by virtual or real microphones in a predefined arrangement; model-based description, in which the audio is described in terms of a sequence of audio events at described positions; and binaural, in which audio is described by the signals that reach the listeners' ears. These four description formats are often associated with one or more rendering technologies that convert audio signals into speaker feeds. Rendering technologies include panning, in which the audio stream is converted to speaker feeds using a set of panning laws and known or assumed speaker positions (typically rendered before distribution); Ambisonics, in which microphone signals are converted into feeds for a scalable array of speakers (typically rendered after distribution); WFS (Wave Field Synthesis), in which sound events are converted into the appropriate speaker signals to synthesize the sound field (typically rendered after distribution); and binaural, in which binaural L/R (left/right) signals are delivered to the L/R ears, typically using headphones, but also through the use of speakers and crosstalk cancellation (rendered before or after distribution). Among these formats, the speaker feed format is the most common because it is simple and effective. The best sonic results (most accurate, most reliable) are achieved by mixing/monitoring and distributing to speaker feeds directly, as there is no processing between the content creator and the listener. If the playback system is known in advance, a speaker feed description generally provides the highest fidelity. However, in many practical applications, the reproduction system is not known. Model-based description is considered the most adaptable because it makes no assumptions about the rendering technology and is therefore most easily applied to any rendering technology. Although model-based description effectively captures spatial information, it becomes increasingly inefficient as the number of audio sources increases. [0008] For many years, cinema systems have had discrete screen channels in the form of left, center, right and occasionally "inner left" and "inner right" channels. These discrete sources generally have sufficient frequency response and power handling to allow sounds to be precisely placed in different screen areas and to allow timbre matching as sounds are panned or moved between locations. Recent developments aimed at improving the listener experience attempt to accurately reproduce the location of sounds relative to the listener. In a 5.1 configuration, the surround "zones" comprise an array of speakers, all of which carry the same audio information within each left surround or right surround zone. Such arrays can be effective with diffuse "ambient" or surround effects; however, in everyday life many sound effects originate from randomly placed point sources.
For example, in a restaurant, ambient music can be played apparently from all around, while subtle but distinct sounds originate from specific points: a person talking from one point, the clatter of a knife on a plate from another. Being able to place such sounds discretely around the auditorium can add a heightened sense of realism without being noticeably obvious. Overhead sounds are also an important component of surround definition. In the real world, sounds originate from all directions, not always from a single horizontal plane. An added sense of realism can be achieved if sound can be heard from above, in other words from the "upper hemisphere". Present systems, however, do not provide truly accurate sound reproduction for different types of audio in a variety of different playback environments. A great deal of processing, knowledge and configuration of the actual playback environment is required using existing systems to attempt accurate representation of location-specific sounds, thus rendering current systems impractical for most applications. [0009] What is needed is a system that supports multiple screen channels, resulting in increased definition and improved audiovisual coherence for on-screen sounds or dialog, and the ability to precisely position sources anywhere in the surround zones to improve the audiovisual transition from the screen into the room. For example, if a character on the screen looks into the room towards a sound source, the sound engineer ("mixer") should be able to precisely position the sound so that it coincides with the character's line of sight, and the effect will be consistent across the entire audience. In a traditional 5.1 or 7.1 surround sound mix, however, the effect is highly dependent on the listener's seating position, which is not advantageous for most large-scale listening environments. The increased surround resolution creates new opportunities to use sound in a room-centered way, as opposed to the traditional approach where content is created assuming a single listener in the "sweet spot". [00010] In addition to spatial issues, current state-of-the-art multichannel systems suffer a disadvantage with respect to timbre. For example, the timbral quality of some sounds, such as hissing steam from a broken pipe, can suffer from being reproduced by an array of speakers. The ability to direct specific sounds to a single speaker gives the mixer the opportunity to eliminate array playback artifacts and deliver a more realistic experience to the audience. Traditionally, surround speakers do not support the same full range of audio frequency and level that the large screen channels do. This has historically created problems for mixers, reducing their ability to freely move full-range sounds from the screen into the room. As a result, movie theater owners have not felt compelled to upgrade their surround channel setup, preventing the widespread adoption of higher quality installations. BRIEF SUMMARY OF MODALITIES [00011] Systems and methods are described for a cinema sound processing system and format that includes a new speaker template (channel configuration) and an associated spatial description format. An adaptive audio system and format are defined that support multiple rendering technologies. Audio streams are transmitted along with metadata describing the "mixer intent", including the desired position of the audio stream.
Position can be expressed as a named channel (from within the predefined channel configuration) or as three-dimensional position information. This channel-plus-object format combines optimal model-based and channel-based audio scene description methods. The audio data for the adaptive audio system comprises several independent monophonic audio streams. Each stream has associated metadata that specifies whether the stream is a channel-based or an object-based stream. Channel-based streams have rendering information encoded via the channel name; and object-based streams have location information encoded through mathematical expressions encoded in additional associated metadata. The original independent audio streams are packaged as a single serial bitstream that contains all the audio data. This setting allows the sound to be rendered according to an allocentric frame of reference, in which a sound's rendering location is based on the characteristics of the playing environment (eg room size, format, etc.) to match to the intent of the mixer. Object position metadata contains the appropriate allocentric frame of reference information required to correctly reproduce sound using the speaker positions available in a room that is set up to play adaptive audio content. This allows the sound to be optimally mixed for a particular playing environment that may be different from the mixing environment experienced by the recording engineer. [00012] Adaptive audio system improves audio quality in different rooms through such benefits as improved room equalization and surround bass management, so that speakers (either with screen or without screen) can be freely addressed by the mixer without having to fuss about the timbre match. Adaptive audio system adds the flexibility and power of dynamic audio objects in traditional channel-based workflows. These audio objects allow the creator to control distinct sound elements independent of any specific playback speaker settings, including overhead speakers. The system also introduces new efficiencies to the post-production process, allowing sound engineers to effectively capture your entire intent and then monitor in real-time or automatically generate surround sound versions 7.1 and 5.1. [00013] The Adaptive Audio System simplifies distribution by encapsulating the essence of audio and artistic intent into a single track file within the Digital Cinema Processor, which can be faithfully reproduced in a wide range of theater settings. The system provides optimal reproduction of artistic intent when mixing and rendering use the same channel setup and a single inventory with downstream adaptation to the render setup, ie, down-mix. [00014] These and other advantages are provided through modalities that target a cinema sound platform, current address system limitations, and deliver an audio experience beyond currently available systems. BRIEF DESCRIPTION OF THE DRAWINGS [00015] In the following drawings, like reference numbers are used to refer to like elements. Although the following figures depict various examples, the one or more deployments are not limited to the examples depicted in the figures. [00016] Figure 1 is a top-level overview of an audio reproduction and creation environment that uses an adaptive audio system, under one modality. [00017] Figure 2 illustrates the combination of data based on channel and object to produce an adaptive audio mix, under one modality. 
[00018] Figure 3 is a block diagram that illustrates the workflow of creating, packaging and rendering adaptive audio content, under one modality. [00019] Figure 4 is a block diagram of a rendering stage of an adaptive audio system, under a modality. [00020] Figure 5 is a table that lists the metadata types and associated metadata elements for adaptive audio, under a modality. [00021] Figure 6 is a diagram that illustrates a post-production and mastering for an adaptive audio system, under a modality. [00022] Figure 7 is a diagram of an example workflow for a digital cinema packaging process that uses adaptive audio files, under one modality. [00023] Figure 8 is a top view of an exemplary template of suggested speaker locations for use with an adaptive audio system in a typical auditorium. [00024] Figure 9 is a front view of an exemplary placement of suggested speaker locations on the screen for use in the typical auditorium. [00025] Figure 10 is a side view of an exemplary template of suggested speaker locations for use with an adaptive audio system in the typical auditorium. [00026] Figure 11 is an example of a placement of top surround type speakers and side surround type speakers in relation to the reference point, under a modality. DETAILED DESCRIPTION [00027] Systems and methods are described for an adaptive audio system and associated audio signal and data format that supports multiple rendering technologies. Aspects of the one or more modalities described in this document may be deployed in an audio or audio-visual system that processes source audio information in a mixing, playback and rendering system that includes one or more computers or processing devices that execute software instructions. Any of the described modalities can be used alone or in conjunction with each other in any combination. Although various modalities may have been motivated by various deficiencies with the prior art, which may be discussed or alluded to in one or more places in the specification, the modalities do not necessarily cover any of these deficiencies. In other words, different modalities can cover different deficiencies that can be discussed in the descriptive report. Some modalities may only partially cover some deficiencies or only a deficiency that can be discussed in the descriptive report, and some modalities may not cover any of these deficiencies. [00028] For the purposes of this description, the following terms have the associated meanings: audio channel or channel: a monophonic audio signal or an audio stream plus metadata in which the position is encoded as a channel ID, for example , Surround Top Right or Front Left. One channel object can drive multiple speakers, for example the Surround Left (Ls) channels will power all speakers in the Ls array. [00029] Channel Setup: a predefined set of speaker zones with associated nominal locations, eg 5.1, 7.1, and so on; 5.1 refers to a six-channel surround sound audio system that has front left and right channels, center channel, two surround-type channels, and one subwoofer channel; 7.1 refers to an eight-channel surround-type system that adds two additional surround-type channels to the 5.1 system. Examples of 5.1 and 7.1 configurations include Dolby® surround-type systems. [00030] Speaker: An audio transducer or set of transducers that render an audio signal. 
[00031] Speaker Zone: an arrangement of one or more speakers can be uniquely referenced and receive a single audio signal, eg Left Surround as typically found in cinema and in particular for exclusion or inclusion for object rendering. [00032] Speaker Channel or Speaker Power Channel: An audio channel that is associated with a named speaker or speaker zone within a defined speaker configuration. A speaker channel is nominally rendered using the associated speaker zone. [00033] Speaker Channel Group: A set of one or more speaker channels that correspond to a channel setting (eg a stereo track, mono track, etc.) [00034] Object or Object Channel: one or more audio channels with a parametric source description such as apparent source position (eg 3D coordinates), apparent source width, etc. An audio stream plus metadata in which the position is encoded as the 3D position in space. [00035] Audio Program: The complete set of speaker channels and/or associated object channels and metadata that describe the desired spatial audio presentation. [00036] Allocentric Reference: a spatial reference in which audio objects are defined in relation to features within the rendering environment such as room walls and corners, default speaker locations and screen location (eg left corner front of a room). [00037] Egocentric Reference: the spatial reference in which audio objects are defined in relation to the listener's perspective (audience) and often specified in relation to angles relative to a listener (eg 30 degrees to the listener's right). [00038] Frame: Frames are short independently decodable segments into which a total audio program is divided. The audio frame rate and threshold are typically aligned with video frames. [00039] Adaptive Audio: Channel-based and/or object-based signals plus metadata that renders the audio signals based on the playback environment. [00040] The cinema sound format and processing system described herein, also referred to as an "adaptive audio system", utilizes a new spatial audio rendering and description technology to enable enhanced audience immersion, more control. scalability and system flexibility, and ease of installation and maintenance. Modalities of a cinema audio platform include several distinct components including mixing tools, packer/encoder, unpacker/decoder, movie theater final mixing and rendering components, new speaker designs and networked amplifiers. The system includes recommendations for a new channel setup to be used by players and content creators. The system uses a template-based description that supports several features such as: unique inventory with downward and upward adaption to the rendering configuration, ie delay rendering and enabling optimal use of available speakers; improved sound wrap, including optimized down-mix to avoid channel-to-channel correlation; increased spatial resolution across arrays through conduction (eg, an audio object dynamically assigned to one or more speakers within a surround-type array); and support for alternative rendering methods. [00041] Figure 1 is a top-level overview of an audio reproduction and creation environment that uses an adaptive audio system, under one modality. As shown in Figure 1, a comprehensive end-to-end environment 100 includes content creation, packaging, distribution, and playback/rendering components across a large number of endpoint devices and use cases. 
The overall system 100 originates with content captured from and for a number of different use cases that comprise different user experiences 112. The content capture element 102 includes, for example, cinema, TV, live broadcast, user generated content, recorded content, games and the like, and may include pure audio or audio/visual content. Content, as it progresses through system 100 from the capture stage 102 to the end user experience 112, goes through key processing steps through distinct system components. These process steps include pre-processing of the audio 104, authoring tools and processes 106, and encoding by an audio codec 108 that captures, for example, audio data, additional metadata and playback information, and object channels. Various processing effects such as compression (lossy or lossless), encryption and the like can be applied to object channels for effective and secure distribution through various means. The appropriate endpoint-specific decoding and rendering processes 110 are then applied to reproduce and deliver a particular adaptive audio user experience 112. The audio experience 112 represents the reproduction of audio or audio/visual content through appropriate speakers and playback devices, and can represent any environment in which a listener is experiencing the playback of captured content, such as a movie theater, concert hall, outdoor theatre, a home or living room, listening booth, car, game console, headphone or headphone system, public address (PA) system, or any other playback environment. [00042] System embodiment 100 includes an audio codec 108 which has the ability to efficiently distribute and store multi-channel audio programs and can therefore be referred to as a "hybrid" codec. Codec 108 combines traditional channel-based audio data with associated metadata to produce audio objects that facilitate the creation and delivery of audio that is tailored and optimized for rendering and playback in environments that may be different from the mixing environment. This allows the sound engineer to encode their intent regarding how the final audio should be heard by the listener, based on the listener's actual listening environment. [00043] Conventional channel-based audio codecs operate under the assumption that the audio program will be played by an array of speakers at predetermined positions relative to the listener. To create a complete multi-channel audio program, sound engineers typically mix a large number of separate audio stems (eg, dialog, music, effects) to create the overall desired impression. Audio mixing decisions are typically made by listening to the audio program as played by an array of speakers at predetermined positions, for example, a particular 5.1 or 7.1 system in a particular theater. The final mixed signal serves as the input to the audio codec. For reproduction, spatially accurate sound fields are achieved only when the speakers are placed in the predetermined positions. [00044] A new form of audio encoding called audio object encoding provides distinct sound sources (audio objects) as input to the encoder in the form of separate audio stems. Examples of audio objects include dialog tracks, single instruments, individual sound effects, and other point sources. Each audio object is associated with spatial parameters, which can include, but are not limited to, sound position, sound width, and velocity information. The audio objects and associated parameters are then encoded for distribution and storage.
Final audio object rendering and mixing is performed at the receiving end of the audio distribution chain as part of audio program playback. This step can be based on knowledge of actual speaker positions so that the result is an audio distribution system that is customizable to the user's specific listening conditions. The two forms of encoding, channel-based and object-based, optimally realize different input signal conditions. Channel-based audio encoders are generally more effective for encoding input signals that contain dense mixes from different audio sources and for diffused sounds. Conversely, audio object encoders are more effective for encoding a small number of highly directional sound sources. [00045] In one embodiment, the methods and components of system 100 comprise an audio encoding, distribution and decoding system configured to generate one or more bit streams that contain both conventional channel-based audio elements and audio encoding elements. audio object. Such a combined approach provides greater rendering flexibility and coding efficiency compared to object-based or channel-based approaches taken separately. [00046] Other aspects of the described modalities include extending a predefined channel-based audio codec in a backward-compatible manner to include audio object encoding elements. A new "extension layer" that contains the audio object encoding elements is defined and added to the "base" or "backward-compatible" layer of the channel-based audio codec bitstream. This approach allows one or more bitstreams that include the extension layer to be processed by pre-existing decoders while providing an improved listening experience for users with new decoders. An example of an improved user experience includes controlling audio object rendering. An additional advantage of this approach is that audio objects can be added or modified anywhere along the distribution chain without decoding/mixing/recoding the multi-channel audio encoded with the channel-based audio codec. [00047] Regarding the frame of reference, the spatial effects of audio signals are critical in providing an immersive experience for the listener. Sounds that are intended to emanate from a specific region of a viewing screen or room must be reproduced through speaker(s) located in the same relative location. Thus, the primary audio metadata of a sound event in a model-based description is position, although other parameters such as size, orientation, velocity and acoustic dispersion can also be described. To drive the position, a model-based 3D audio spatial description requires a 3D coordinate system. The coordinate system used for transmission (Euclidean, spherical, etc.) is generally chosen for convenience or compression, however, other coordinate systems can be used for rendering processing. In addition to a coordinate system, a frame of reference is required to represent the locations of objects in space. For systems to accurately reproduce position-based sound in a variety of different environments, selecting the appropriate frame of reference can be critical. With an allocentric frame of reference, an audio source position is defined in relation to features within the rendering environment such as room walls and corners, default speaker locations, and screen location. In an egocentric frame of reference, locations are represented in relation to the listener's perspective, such as "in front of me, slightly to the left" and so on. 
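As a purely numerical illustration of the difference between the two frames of reference, the following sketch converts a listener-relative (egocentric) direction into a room-relative (allocentric) position. The unit-square room, the nominal listening position, and the convention that the listener faces the screen are all assumptions made for the sketch:

```python
import math

def egocentric_to_allocentric(azimuth_deg, distance, listener_pos=(0.5, 0.5)):
    """Convert a listener-relative direction (egocentric) into a room-relative
    position (allocentric) in a room normalized to the unit square.

    Assumptions (illustrative only): 0 degrees points toward the screen
    (positive y), positive azimuth is to the listener's right, and the
    listener faces the screen from listener_pos."""
    rad = math.radians(azimuth_deg)
    x = listener_pos[0] + distance * math.sin(rad)
    y = listener_pos[1] + distance * math.cos(rad)
    # Clamp to the room boundaries so the position stays expressible
    # relative to walls and corners, as an allocentric reference requires.
    return (min(max(x, 0.0), 1.0), min(max(y, 0.0), 1.0))

# "30 degrees to the listener's right, a quarter of the room away"
print(egocentric_to_allocentric(30.0, 0.25))  # -> a point right of room center, toward the screen
```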
Scientific studies of spatial perception (audio and otherwise) have shown that the egocentric perspective is used almost universally. For cinema, however, allocentric is generally more appropriate for several reasons. For example, the precise location of an audio object is more important when there is an object or screen associated with it. With the use of an allocentric reference, for each listening position and for any screen size, the sound will be located in the same relative position on the screen, for example, one third to the left of the center of the screen. Another reason is that mixers tend to think and mix in allocentric terms and the panning tools are set up with an allocentric frame (room walls) and mixers expect them to be rendered that way, for example, that sound should be on the screen, this sound must be off the screen or from the left wall, etc. [00048] Despite the use of the allocentric frame of reference in the film setting, there are some cases where the egocentric frame of reference can be useful and more appropriate. These include non-diegetic sounds, ie those that are not present in "story space", eg ambient music, for which an egocentrically uniform presentation may be desirable. Another case is near-field effects (eg, a mosquito buzzing in the listener's left ear) that require an egocentric representation. Currently, there is no way to render such a short sound field using headphones or very close-field speakers. Additionally, infinitely distant sound sources (and the resulting plane waves) seem to come from a constant egocentric position (eg, 30 degrees to the left) and such sounds are easier to describe in egocentric terms than in allocentric terms. [00049] In some cases, it is possible to use an allocentric frame of reference as long as a nominal listening position is defined, although some examples require an egocentric representation that is not yet possible to render. While an allocentric reference may be more useful and appropriate, the audio representation must be extensible, as many new features, including egocentric representation, may be more desirable in certain applications and listening environments. Adaptive audio system modalities include a hybrid spatial description approach that includes a recommended channel setting for optimal fidelity and for rendering diffuse or complex multi-point sources (eg, stadium audience, ambient) with the use of an egocentric reference plus an allocentric model-based sound description to effectively enable increased scalability and spatial resolution. System Components [00050] Referring to Figure 1, the original sound content data 102 is first processed in a pre-processing block 104. The pre-processing block 104 of system 100 includes an object channel filtering component. In many cases, audio objects contain individual sim sources to enable panning independent of sounds. In some cases, such as when creating audio programs using natural or "production" sound, it may be necessary to extract individual sound objects from a recording that contains multiple sound sources. Modalities include a method for isolating source-independent signals from a more complex signal. Unwanted elements to be separated from independent source signals may include, but are not limited to, other independent sound sources and background noise. Additionally, reverb can be removed to recover "dry" sound sources. [00051] The preprocessor 104 also includes content-type detection and font separation functionality. 
The system provides automated generation of metadata through the analysis of incoming audio. Positional metadata is derived from a multi-channel recording through an analysis of the relative levels of correlated input between the channel pairs. Content type detection such as "speech" or "music" can be achieved, for example, by classification and resource extraction. Authoring Tools [00052] Authoring Toolkit 106 includes features to improve the authoring of audio programs by optimizing the input and encoding of the sim engineer's creative intent allowing it to create the final audio mix that is optimized for playback on virtually any playback environment. This is achieved through the use of audio objects and positional data that are associated and encoded with the original audio content. In order to accurately place sounds around an auditorium, the sound engineer needs to control how the sound will ultimately be rendered based on the actual constraints and capabilities of the playback environment. The adaptive audio system provides this control by allowing the sound engineer to change how audio content is projected and mixed through the use of audio objects and positional data. [00053] Audio objects can be considered as groups of sound elements that can be perceived to emanate from a particular physical location or locations in the auditorium. Such objects can be static or they can move. In adaptive audio system 100, audio objects are controlled by metadata that, among other things, details the position of the sound at a given point in time. When objects are monitored or played back in a movie theater, they are rendered according to the positional metadata using the speakers that are present, rather than necessarily being broadcast to a physical channel. A track in a session can be an audio object, and standard panning data is analogous to positional metadata. In this way, content placed on the screen can pan just as effectively as with channel-based content, but content placed in the vicinity can be rendered to an individual speaker if desired. While the use of audio objects provides the desired control for distinct effects, other aspects of a movie soundtrack work effectively in a channel-based environment. For example, many ambient or reverb effects actually benefit from being fed into speaker arrays. While these can be treated as objects with enough width to fill an array, it is beneficial to retain some channel-based functionality. [00054] In one embodiment, the adaptive audio system supports "beds" in addition to audio objects, where the beds are effectively trunks or channel-based submixes. They can be delivered for final reproduction (rendering) either individually or combined into a single bed, depending on the intention of the content creator. These beds can be created in different channel-based configurations such as 5.1, 7.1 and are extendable to more extensive formats such as 9.1 and arrangements that include superior speakers. [00055] Figure 2 illustrates the combination of data based on channel and object to produce an adaptive audio mix, under one modality. As shown in process 200, channel-based data 202 which, for example, may be 5.1 or 7.1 surround sound data provided in the form of pulse code modulated (PCM) data is combined with the audio object data 204 to produce an adaptive audio mix 208. 
Audio object data 204 is produced by combining the elements of the original channel-based data with associated metadata that specify certain parameters pertaining to the location of the audio objects. [00056] As shown conceptually in Figure 2, the authoring tools provide the ability to create audio programs that contain a combination of speaker channel groups and object channels simultaneously. For example, an audio program contains one or more speaker channels optionally organized into groups (or tracks, for example a stereo or 5.1 track), descriptive metadata for one or more speaker channels, one or more object channels and the descriptive metadata for one or more object channels. Within an audio program, each speaker channel group and each object channel can be represented using one or more different sample rates. For example, Digital Cinema (D-Cinema) applications support 48kHz and 96kHz sample rates, but other samples may also be supported. In addition, ingesting, storing and editing channels with different sample rates can also be supported. [00057] Creating an audio program requires sound design steps, which include combining sound elements as a sum of level-adjusted constituent sound elements to create a desired new sound effect. Adaptive audio system authoring tools allow you to create sound effects as a collection of sound objects with relative positions using a visual-space design graphical user interface. For example, a visual representation of the sound generating object (eg a car) can be used as a template to assemble audio elements (exhaust note, tire hum, engine noise) as object channels that contain the sound and the proper spatial position (on the exhaust pipe, on the tires, on the hood). Individual object channels can then be linked and manipulated as a group. The authoring tool 106 includes several user interface elements to allow the sound engineer to input control information and visualize mix parameters and improve system functionality. The sound authoring and design process is also improved by allowing object channels and speaker channels to be linked and manipulated as a group. An example is combining an object channel with a distinct dry sound source with a set of speaker channels that contain an associated reverb signal. [00058] Audio authoring tool 106 supports the ability to combine multiple audio channels, commonly called mixing. Multiple mixing methods are supported and can include traditional level-based mixing and sound-based mixing. In level-based mixing, wideband scaling is applied to the audio channels and the scaled audio channels are then summed. The wideband scaling factors for each channel are chosen to control the absolute level of the resulting mixed signal as well as the relative levels of the mixed channels in the mixed signal. In loudness-based mixing, one or more input signals are modified using frequency-dependent amplitude scaling, where the frequency-dependent amplitude is chosen to provide the desired perceived relative and absolute loudness while preserving the perceived timbre input sound. [00059] Authoring tools allow the ability to create speaker channels and speaker channel groups. This allows metadata to be associated with each speaker channel group. Each speaker channel group can be tagged according to content type. The content type is extensible through a text description. Content types can include, but are not limited to, dialog, music, and effects. 
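As a minimal sketch of the level-based mixing described above (wideband scale factors applied to each channel, then summed), assuming plain Python lists of samples; loudness-based mixing would instead apply frequency-dependent gains and is not shown:

```python
def level_based_mix(channels, gains):
    """Wideband (frequency-independent) level-based mix: scale each input
    channel by its gain, then sum sample-by-sample into one mixed signal."""
    assert len(channels) == len(gains)
    length = len(channels[0])
    mixed = [0.0] * length
    for samples, gain in zip(channels, gains):
        for i in range(length):
            mixed[i] += gain * samples[i]
    return mixed

# Two toy input channels; the gains control the relative levels of the
# mixed channels and the absolute level of the resulting mixed signal.
dialog = [0.2, 0.4, -0.1, 0.0]
music  = [0.1, -0.2, 0.3, 0.05]
print(level_based_mix([dialog, music], gains=[1.0, 0.5]))
```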
Each speaker channel group can be assigned unique instructions on how to mix upstream from one channel setup to another, where upstream mix is defined as creating M audio channels from N channels in that M > N. Upmixing instructions may include, but are not limited to, the following: an enable/disable flag to indicate whether upmixing is allowed; an upmix matrix to control the mapping between each input and output channel; and pre-configured matrix and enable settings can be assigned based on content type, for example, enabling upstream mixing for music only. Each speaker channel group can also be assigned unique instructions on how to downmix from one channel setup to another, where downmix is defined as creating Y audio channels from X channels where Y < X. Downmix instructions may include, but are not limited to, the following: an array to control the mapping between each input and output channel; and pre-configured matrix settings can be assigned based on content type, eg dialog must down-mix across the screen; effects should down-mix off-screen. Each speaker channel can also be associated with a metadata flag to disable bass management during rendering. [00060] Modalities include a feature that enables the creation of object channels and object channel groups. This invention allows metadata to be associated with each object channel group. Each object channel group can be tagged according to content type. The content type is extensible through a text description, where content types can include, but are not limited to, dialog, music, and effects. Each object channel group can be assigned metadata to describe how the object(s) should be rendered. [00061] Position information is provided to indicate the desired apparent source position. Position can be indicated using an egocentric or allocentric frame of reference. Egocentric reference is appropriate when the source position is to be referenced to the listener. For the egocentric position, spherical coordinates are useful for the description of position. An allocentric reference is the typical frame of reference for cinema or other audio/visual presentations where the source position is referenced relative to objects in the presentation environment such as a visual display screen or room boundaries. Three-dimensional (3D) trajectory information is provided to enable position interpolation or to use other rendering decisions such as enabling a "snap to mode". Size information is provided to indicate the desired apparent perceived audio source size. [00062] Spatial quantization is provided through a "closest speaker setting" control which indicates an intention by the sound engineer or mixer to have an object rendered exactly by one speaker (with some sacrifice potential to spatial accuracy). A limit to the allowable spatial distortion can be indicated by azimuth and elevation tolerance thresholds so that if the threshold is exceeded, the "tune" function will not occur. In addition to distance thresholds, a cross attenuation rate parameter can be indicated to control how fast a moving object will transition or jump from one speaker to another when the desired position crosses between the speakers. [00063] In one embodiment, dependent spatial metadata is used for certain position metadata. For example, metadata can be automatically generated for a "slave" object by associating it with a "master" object that the slave object must follow. A delay time or relative speed can be assigned to the slave object. 
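One possible reading of this "slave follows master" metadata, sketched with hypothetical structures; the per-frame trajectory, the delay expressed in frames, and the constant offset are assumptions of the sketch rather than requirements of the description:

```python
def slave_positions(master_trajectory, delay_frames, offset=(0.0, 0.0, 0.0)):
    """Derive positions for a 'slave' object that follows a 'master' object
    with a fixed delay (in frames) and a constant spatial offset. Until the
    delay has elapsed, the slave holds the master's first known position.

    master_trajectory: list of (x, y, z) positions, one per frame."""
    positions = []
    for frame in range(len(master_trajectory)):
        src = master_trajectory[max(0, frame - delay_frames)]
        positions.append(tuple(s + o for s, o in zip(src, offset)))
    return positions

# The master object pans from left to right across the screen; the slave
# trails it by two frames, slightly behind it in depth.
master = [(0.1, 1.0, 0.0), (0.3, 1.0, 0.0), (0.5, 1.0, 0.0), (0.7, 1.0, 0.0)]
print(slave_positions(master, delay_frames=2, offset=(0.0, -0.1, 0.0)))
```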
Mechanisms can also be provided to allow the definition of an acoustic center of gravity for sets or groups of objects so that an object can be rendered so that it is perceived to move around another object. In such a case, one or more objects may rotate around an object or a defined area, such as a dominant point or a dry area of the room. The acoustic center of gravity would then be used at the rendering stage to help determine location information for each sound based on the appropriate object, even if the final location information was expressed as a relative room location as opposed to a location relative to another object. [00064] When an object is rendered, it is assigned to one or more speakers according to the position metadata and location of the playback speakers. Additional metadata can be associated with the object to limit the speakers that should be used. The use of restrictions may prohibit the use of designated speakers or merely inhibit the designated speakers (allow less power to the speaker or speakers than would otherwise be applied). The speaker sets to be restricted may include, but are not limited to, any of the aforementioned speakers or speaker zones (eg L, C, R, etc.), or speaker areas such as such as: front wall, back wall, left wall, right wall, ceiling, floor, room speakers, and so on. Likewise, in the course of specifying the desired mix of multiple sound elements, it is possible to have one or more sound elements become inaudible or "masked" due to the presence of other "masking" sound elements. For example, when masked elements are detected, they can be identified to the user through a graphical display. [00065] As described elsewhere, the audio program description can be adapted for rendering on a wide variety of speaker setups and channel configurations. When authoring an audio program, it is important to monitor the program's rendering effect in early playback settings to verify that the desired results are achieved. This invention includes the ability to select target playback settings and monitor the result. In addition, the system can automatically monitor the worst-case (ie, highest) signal levels that would be generated at each early playback setting and provide an indication of whether clipping or throttling will occur. [00066] Figure 3 is a block diagram illustrating the workflow of creating, packaging and rendering adaptive audio content, in one modality. Workflow 300 in Figure 3 is divided into three distinct task groups labeled create/author, packaging, and display. In general, the hybrid bed and object model shown in Figure 2 allows most of the design, editing, pre-mixing and final sound mixing to be carried out the way they are today and without adding excessive overhead to the present processes. In one embodiment, adaptive audio functionality is provided in the form of software, firmware or circuitry that is used in conjunction with sound production and processing equipment, where such equipment may be new hardware systems or upgrades to existing systems. . For example, plug-in applications can be provided for digital audio workstations to allow existing panning technologies in sound editing and design to remain unchanged. In this way, it is possible to place both beds and objects on the workstation in 5.1 or editing rooms equipped with similar surround. Object metadata and audio are recorded in the session in preparation for the pre-mix and final mix stages in the dubbing theater. 
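The worst-case level monitoring mentioned above could, for example, sum the peak contribution of every element into each speaker feed for each candidate playback configuration and flag feeds that would exceed full scale; the data layout used here is an assumption for the sketch:

```python
def worst_case_peak(channel_gains_per_config, object_peaks):
    """For each candidate playback configuration, estimate the worst-case
    (highest) level a speaker feed could reach if all contributing signals
    peaked simultaneously, and flag potential clipping (> 1.0 full scale).

    channel_gains_per_config: {config_name: {speaker: [gain per object]}}
    object_peaks: per-object peak sample magnitudes."""
    report = {}
    for config, speaker_gains in channel_gains_per_config.items():
        peaks = {
            spk: sum(g * p for g, p in zip(gains, object_peaks))
            for spk, gains in speaker_gains.items()
        }
        report[config] = {spk: (level, level > 1.0) for spk, level in peaks.items()}
    return report

object_peaks = [0.9, 0.7]   # two objects
configs = {
    "5.1": {"L": [0.7, 0.5], "C": [0.5, 0.8]},
    "7.1": {"L": [0.5, 0.3], "C": [0.4, 0.6]},
}
for cfg, speakers in worst_case_peak(configs, object_peaks).items():
    print(cfg, speakers)   # the 5.1 center feed exceeds full scale in this toy case
```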
[00067] As shown in Figure 3, authoring or authoring tasks involve entering 302 mix controls by a user, eg a sound engineer in the following example, for a 304 mixing console or audio workstation. In one modality, metadata is integrated into the mixing console surface, allowing channel strip attenuators, panning, and audio processing to work with both audio beds or objects or trunks. Metadata can be edited using either the console surface or the workstation UI, and the sound is monitored using a 306 rendering and mastering unit (RMU). The object and bed audio data and metadata Associates are recorded during the mastering session to create a "print master" that includes an adaptive audio mix 310 and any other rendered deliverables (such as a 7.1 or 5.1 surround cinema mix) 308. Existing authoring tools (eg , digital audio workstations such as Pro Tools) can be used to allow sound engineers to label individual audio tracks in a mixing session. Modalities extend this concept by allowing users to label individual subsegments within a track to aid in quickly searching or identifying audio elements. The user interface for the mixing console that enables the definition and creation of metadata can be implemented through graphical user interface elements, physical controls (eg sliders and buttons) or any combination of these. [00068] At the packaging stage, the print master file is encapsulated using industry standard MXF encapsulation procedures, hashed and optionally encrypted to ensure the integrity of the audio content for delivery to the digital cinema packaging facility . This step can be performed by a 312 digital cinema processor (DCP) or any appropriate audio processor depending on the final playback environment, such as a theater equipped with 318 surround sound, a theater with adaptive audio enabled 320, or any other reproduction environment. As shown in Figure 3, the 312 processor outputs the appropriate audio signals 314 and 316 depending on the display environment. [00069] In one modality, the adaptive audio print master contains an adaptive audio mix, along with a standard DCI compliant Pulse Code Modulated (PCM) mix. The PCM mix can be rendered by the render and master unit in a dubbing theater or created by a separate mix pass if desired. PCM audio forms the default main audio track file on the 312 digital cinema processor and adaptive audio forms form an additional track file. Such a track file may be compliant with existing industry standards and is ignored by DCI compliant servers that cannot use it. [00070] In an example cinema playback environment, the DCP containing an adaptive audio track file is recognized by a server as a valid packet and ingested at the server and then transmitted to an adaptive audio cinema processor. On a system that has both adaptive and linear PCM audio files available, the system can switch between them as needed. For distribution to the exhibition stage, the adaptive audio packaging scheme allows for the delivery of a single type of package to be delivered to a theater. The DCP package contains both adaptive and PCM audio files. The use of security keys such as a key delivery message (KDM) can be incorporated to enable secure delivery of movie content or other similar content. [00071] As shown in Figure 3, adaptive audio methodology is accomplished by enabling a sound engineer to express his intention regarding the rendering and playback of audio content through the 304 audio workstation. 
By controlling certain controls input, the engineer is able to specify where and how audio objects and sound elements are reproduced depending on the listening environment. Metadata is generated at Audio Workstation 304 in response to engineer 302's mix inputs to provide render queues that control spatial parameters (eg, position, velocity, pitch, timbre, etc.) and specify which one(s) speaker(s) or speaker groups in the listening environment play respective sounds during viewing. Metadata is associated with the respective audio data on workstation 304 or RMU 306 for packaging and transport by DCP 312. [00072] A graphical user interface and software tools that provide control of the 304 workstation by the engineer comprise at least part of the authoring tools 106 of Figure 1. Hybrid Audio Codec [00073] As shown in Figure 1, system 100 includes a hybrid audio codec 108. This component comprises an audio encoding, distributing and decoding system that is configured to generate a single bitstream that contains both audio elements based on conventional channel and audio object encoding elements. The hybrid audio coding system is built around a channel-based coding system that is configured to generate a single (unified) bitstream that is simultaneously compatible with (i.e., decoded by) a first decoder configured to decoding audio data encoded in accordance with a first (channel-based) encoding protocol and one or more secondary decoders configured to decode audio data encoded in accordance with one or more secondary (object-based) encoding protocols. The bit stream may include both encoded data (in the form of data bursts) decoded by the first decoder (and ignored by any slave decoder) and encoded data (e.g., other data bursts) decoded by one or more slave decoders ( and ignored by the first decoder). The decoded audio and associated information (metadata) from the first and one or more of the secondary decoders can then be combined in such a way that both channel-based and object-based information are rendered simultaneously to recreate a facsimile of the environment, channels, spatial information and objects presented to the hybrid coding system (ie, within a 3D space or listening environment). [00074] Codec 108 generates a bit stream that contains encoded audio information and information related to multiple sets of channel positions (speakers). In one modality, one set of channel positions is fixed and used for the channel-based protocol, while another set of channel positions is adaptive and used for the audio object-based encoding protocol, so the configuration the channel for an audio object can change as a function of time (depending on where the object is placed in the sound field). Thus, the hybrid audio coding system can carry information about two sets of speaker locations for playback, where one set can be fixed and be a subset of the other. Legacy devices that support encoded audio information decode and render the audio information from the fixed subset, while a device with the ability to support the larger set can encode and render the additional encoded audio information that would be variably assigned in time to different speakers of the larger set. Furthermore, the system does not depend on the first and one or more of the secondary decoders that are simultaneously present in a system and/or device. 
Therefore, a legacy and/or existing system/device that contains only a decoder that supports the first protocol would yield a fully compatible sound field to be rendered via traditional channel-based playback systems. In this case, the unknown and unsupported portion(s) of the hybrid bitstream protocol (ie, the audio information represented by a secondary encoding protocol) would be ignored by the system or decoder device that supports the first hybrid encoding protocol. [00075] In another modality, codec 108 is configured to operate in a mode in which the first encoding subsystem (which supports the first protocol) contains a combined representation of all information (channels and objects) of the sound field represented both in the first or in one or more of the secondary encoder subsystems present in the hybrid encoder. This ensures that the hybrid bitstream includes backward compatibility with decoders that support only the first encoder subsystem protocol by allowing audio objects (typically carried in one or more secondary encoder protocols) to be represented and rendered in the decoders that support only the first protocol. [00076] In yet another embodiment, codec 108 includes two or more encoding subsystems, each of which subsystems is configured to encode audio data according to a different protocol and is configured to combine the outputs of the subsystems to generate a hybrid format (unified) bitstream. [00077] One of the benefits of the modalities is the ability of a hybrid encoded audio bitstream to be transported to a wide range of content distribution systems, each of the distribution systems conventionally supporting only data encoded in accordance with the first encoding protocol. This eliminates the need for system and/or transport level protocol modifications/changes in order to specifically support the hybrid coding system. [00078] Audio coding systems typically use standardized bitstream elements to enable the transport of additional (arbitrary) data in the bitstream itself. This additional (arbitrary) data is typically skipped (ie, ignored) when decoding the encoded audio included in the bit stream, but it can be used for a purpose other than decoding. Different audio encoding standards express these additional data fields using unique naming. Bitstream elements of this general type may include, but are not limited to, auxiliary data, fields to skip, datastream elements, padding elements, overhead data, and substream elements. Unless otherwise noted, the use of the term "auxiliary data" herein does not imply a specific type or format of additional data, but is to be construed as a gene term encompassing any or all of the examples associated with the present invention. [00079] A data channel enabled through "auxiliary" bitstream elements of a first coding protocol in a combined hybrid coding system bitstream may carry one or more secondary (independent or dependent) bitstreams audio (encoded according to one or more secondary encoding protocols). The one or more audio bitstreams can be separated into N-sample blocks and multiplexed into the "auxiliary data" fields of a first bitstream. The first bit stream can be decoded by an appropriate (complementary) decoder. In addition, auxiliary data from the first bitstream could be extracted, recombined into one or more audio bitstreams, decoded by a processor that supports the syntax of one or more of the secondary bitstreams, and then combined and rendered together or independently. 
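A minimal sketch of the multiplexing just described, in which blocks of a secondary (eg, object-coded) bitstream ride inside the auxiliary data field of successive frames of the first bitstream; the frame layout used here is a stand-in for whatever bitstream element the first coding protocol actually provides:

```python
def pack_hybrid_frames(primary_frames, secondary_stream, block_size):
    """Multiplex a secondary bitstream (as fixed-size blocks) into the
    auxiliary data field of successive primary frames. A legacy decoder
    simply skips the 'aux' field; a newer decoder can extract it."""
    frames = []
    for i, payload in enumerate(primary_frames):
        block = secondary_stream[i * block_size:(i + 1) * block_size]
        frames.append({"payload": payload, "aux": block})
    return frames

def extract_secondary(frames):
    """Recombine the auxiliary blocks back into the secondary bitstream."""
    return b"".join(f["aux"] for f in frames)

primary = [b"ch-frame-0", b"ch-frame-1", b"ch-frame-2"]
secondary = b"object-coded-data!"
hybrid = pack_hybrid_frames(primary, secondary, block_size=6)
assert extract_secondary(hybrid) == secondary
print(hybrid)
```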
Furthermore, it is also possible to reverse the roles of the first and second bitstreams, so that data blocks from a first bitstream are multiplexed into the auxiliary data from a second bitstream. [00080] Bitstream elements associated with a secondary encoding protocol also carry and transmit information characteristics (metadata) of the overlying audio, which may include, but are not limited to, desired sound source position, speed and size. This metadata is used during the decoding and rendering processes to recreate the proper (ie, original) position for the associated audio object carried in the applicable bitstream. It is also possible to carry the metadata described above, which are applicable to the audio objects contained in the one or more secondary bitstreams present in the hybrid stream, in the bitstream elements associated with the first encoding protocol. [00081] Bitstream elements associated with one of or both the first and second protocols of the hybrid coding system carry/transmit contextual metadata identifying spatial parameters (ie, the essence of the signal properties themselves) and information additions that describe the underlying audio essence type in the form of specific audio classes that are carried in the hybrid encoded audio bitstream. Such metadata can indicate, for example, the presence of spoken dialogue, music, dialogue about music, applause, singing voice, etc., and can be used to adaptively modify the behavior of pre- or post-processing modules interconnected upstream or downstream of the hybrid coding system. [00082] In one embodiment, codec 108 is configured to operate with a group of common or shared bits in which the bits available for encoding are "shared" among all or part of encoding subsystems that support one or more protocols. Such a codec can distribute the available bits (from the group of common "shared" bits) among the encoding subsystems in order to optimize the overall audio quality of the unified bitstream. For example, during a first time interval, the codec might assign more of the bits available to a first encoding subsystem and a little less of the bits available to the remaining subsystems, while during a second time interval, the codec might assign a little less of the bits available to the first encoding subsystem and more of the bits available to the remaining subsystems. The decision of how to assign bits between encoding subsystems can be dependent, for example, on statistical analysis results of the group of shared bits, and/or analysis of the audio content encoded by each subsystem. The codec can allocate bits from the shared group in such a way that a unified bitstream built by multiplexing the outputs of the encoding subsystems maintains a constant bitrate/frame length for a specified time interval. It is also possible, in some cases, for the bit rate/frame length of the unified bit stream to vary during a specific time interval. [00083] In an alternative embodiment, codec 108 generates a unified bitstream that includes data encoded in accordance with the first encoding protocol configured and transmitted as an independent substream of a encoded data stream (that of a decoder supporting the first encoding protocol will decode) and encoded data according to a second protocol sent as an independent or dependent substream of the encoded data (one whose decoder supporting the first protocol will ignore). 
[00083] In an alternative embodiment, codec 108 generates a unified bitstream that includes data encoded in accordance with the first encoding protocol, configured and transmitted as an independent substream of an encoded data stream (which a decoder supporting the first encoding protocol will decode), and data encoded according to a second protocol, sent as an independent or dependent substream of the encoded data stream (which a decoder supporting the first protocol will ignore). More generally, within a class of embodiments, the codec generates a unified bitstream that includes two or more independent or dependent substreams (where each substream includes data encoded according to an identical or different encoding protocol). [00084] In yet another alternative embodiment, codec 108 generates a unified bitstream that includes data encoded in accordance with the first encoding protocol, configured and transmitted with a unique bitstream identifier (which a decoder supporting the first encoding protocol associated with the unique bitstream identifier will decode), and data encoded according to a second protocol, configured and transmitted with a unique bitstream identifier, which a decoder supporting the first protocol will ignore. More generally, in a class of embodiments, the codec generates a unified bitstream that includes two or more substreams (where each substream includes data encoded according to an identical or different encoding protocol, and each carries a unique bitstream identifier). The methods and systems for creating a unified bitstream described above provide the ability to unambiguously signal (to a decoder) which interleaving and/or protocol has been used in a hybrid bitstream (e.g., to signal whether the AUX data, SKIP, DSE, or the described substream approach is used). [00085] The hybrid coding system is configured to support deinterleaving/demultiplexing and reinterleaving/remultiplexing of bitstreams supporting one or more secondary protocols into a first bitstream (supporting a first protocol) at any processing point found in a media delivery system. The hybrid codec is also configured to be able to encode audio input streams with different sample rates into one bitstream. This provides a means to efficiently encode and distribute sound sources that contain signals with inherently different bandwidths. For example, dialog tracks typically have inherently lower bandwidth than music and effects tracks. Rendering [00086] In one embodiment, the adaptive audio system allows multiple (e.g., up to 128) tracks to be bundled, usually as a combination of beds and objects. The basic audio data format for the adaptive audio system comprises a number of monophonic audio streams. Each stream has associated metadata that specifies whether the stream is a channel-based stream or an object-based stream. Channel-based streams have rendering information encoded via channel name or label; object-based streams have location information encoded through mathematical expressions encoded in the associated metadata. The original independent audio streams are then packaged as a single serial bitstream that contains all of the audio data in an ordered fashion. This adaptive data configuration allows the sound to be rendered according to an allocentric frame of reference, in which the final rendering location of a sound is based on the playback environment so as to match the mixer's intent. Thus, a sound can be specified to originate from a frame of reference of the playback room (e.g., the middle of the left wall), rather than from a labeled speaker or speaker group (e.g., left surround). Object position metadata contains the allocentric frame of reference information necessary to correctly reproduce the sound using the available speaker positions in a room that is set up to play adaptive audio content.
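The sketch below is one possible in-memory representation of the mix just described: a set of independent monophonic streams, each tagged as channel-based or object-based, with the corresponding metadata. The field names and the serialization placeholder are assumptions for illustration only; the actual bitstream syntax is defined by the format itself.

```python
# Illustrative sketch only: channel-based and object-based monophonic streams
# carried with their metadata before packaging into a single serial bitstream.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MonoStream:
    samples: list                                   # monophonic audio essence (PCM samples)
    stream_type: str                                # "channel" or "object"
    channel_label: Optional[str] = None             # e.g. "L", "C", "Rs" for channel-based streams
    position: Optional[Tuple[float, float, float]] = None  # allocentric x, y, z for object-based streams

@dataclass
class AdaptiveAudioMix:
    streams: list                                   # all MonoStream instances, in order

    def serialize(self):
        """Package the independent streams and metadata in an ordered form
        (stand-in for the real single serial bitstream)."""
        return [(s.stream_type, s.channel_label, s.position, s.samples)
                for s in self.streams]

mix = AdaptiveAudioMix(streams=[
    MonoStream(samples=[0.0] * 480, stream_type="channel", channel_label="L"),
    MonoStream(samples=[0.0] * 480, stream_type="object", position=(0.5, 0.9, 0.3)),
])
```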
[00087] The renderer receives the bitstream that encodes the audio tracks and processes the content according to signal type. Beds are fed to arrays, which will potentially require different equalization processing and delays than individual objects. The process supports rendering these beds and objects to multiple (up to 64) speaker outputs. Figure 4 is a block diagram of a rendering stage of an adaptive audio system, in one embodiment. As shown in system 400 of Figure 4, a number of input signals, such as up to 128 audio tracks comprising the adaptive audio signals 402, are provided by certain components of the creation, authoring, and packaging stages of system 300, such as RMU 306 and processor 312. These signals comprise the channel-based beds and objects that are used by renderer 404. The channel-based audio (beds) and objects are input to a level manager 406 that provides control over the amplitudes or output levels of the different audio components. Certain audio components can be processed by an array correction component 408. The adaptive audio signals are then passed through a B-chain processing component 410, which generates a number (e.g., up to 64) of speaker feed output signals. In general, the B-chain feeds refer to the signals processed by power amplifiers, crossovers and speakers, as opposed to the A-chain content that constitutes the soundtrack on the film reel. [00088] In one embodiment, renderer 404 executes a rendering algorithm that intelligently uses the theater's surround speakers to their best capacity. By improving the power handling and frequency response of the surround speakers, and maintaining the same monitoring reference level for each output channel or speaker in the movie theater, objects panned between the screen and the surround speakers can maintain their sound pressure level and have closer timbre compatibility without, importantly, increasing the overall sound pressure level in the theater. An array of appropriately specified surround speakers will typically have sufficient headroom to reproduce the maximum dynamic range available in a 7.1 or 5.1 surround soundtrack (i.e., 20 dB above reference level); however, a single surround speaker is unlikely to have the same capability as a large multi-way screen speaker. As a result, there will likely be instances when an object placed in the surround field will require a greater sound pressure than can be achieved using a single surround speaker. In these cases, the renderer will spread the sound over an appropriate number of speakers in order to achieve the required sound pressure level. The adaptive audio system improves the quality and power handling of the surround speakers to provide an improvement in rendering fidelity. It provides support for bass management of the surround speakers through the use of optional rear subwoofers that allow each surround speaker to achieve improved power handling while potentially using smaller speaker cabinets. It also allows the addition of side surround speakers closer to the screen than current practice to ensure that objects can transition smoothly from screen to surround. [00089] Through the use of metadata to specify the location information of audio objects along with certain rendering processes, system 400 provides a comprehensive and flexible method for content creators to move beyond the constraints of existing systems. As stated previously, current systems create and distribute audio that is fixed to particular speaker locations with limited knowledge of the type of content conveyed in the audio essence (the part of the audio that is played back).
Adaptive audio system 100 provides a new hybrid approach that includes options for both speaker location-specific audio (left channel, right channel, etc.) and object-oriented audio elements that have generalized spatial information, which can include, but is not limited to, position, size and velocity. This hybrid approach provides a balanced approach to fidelity (provided by fixed speaker locations) and rendering flexibility (generalized audio objects). The system also provides additional useful information about the audio content that is paired with the audio essence by the content creator at the time of content creation. This information provides detailed attributes of the audio that can be used in very powerful ways during rendering. Such attributes may include, but are not limited to, content type (dialogue, music, effect, Foley, background/environment, etc.), spatial attributes (3D position, 3D size, velocity) and rendering information (adjustment to speaker locations, channel weights, gain, bass management information, etc.). [00090] The adaptive audio system described in this document provides powerful information that can be used for rendering by a widely varying number of endpoints. In many cases, the optimal rendering technology applied depends largely on the endpoint device. For example, home theater systems and soundbars can have 2, 3, 5, 7 or even 9 separate speakers. Many other types of systems, such as televisions, computers and music docks, have only two speakers, and almost all commonly used devices have a binaural headphone output (PC, laptop computer, tablet computer, cell phone, music player, etc.). However, for the traditional audio that is delivered today (mono, stereo, 5.1 or 7.1 channels), endpoint devices often need to make simplistic decisions and compromises to render and reproduce audio that is delivered in channel/speaker-specific form. Furthermore, there is little or no information conveyed about the actual content being distributed (dialogue, music, ambience, etc.), and little or no information about the content creator's intent for audio playback. Adaptive audio system 100, however, provides this information and, potentially, access to the audio objects, which can be used to create a compelling next-generation user experience. [00091] System 100 allows the content creator to embed the spatial intent of the mix in the bitstream using metadata such as position, size, velocity, and so on, through a powerful and unique metadata and adaptive audio transmission format. This allows for great flexibility in the spatial reproduction of audio. From a spatial rendering standpoint, adaptive audio makes it possible to adapt the mix to the exact position of the speakers in a particular room in order to avoid the spatial distortion that occurs when the geometry of the playback system is not identical to that of the authoring system. In current audio playback systems, where only audio for a speaker channel is sent, the intent of the content creator is unknown. System 100 uses metadata transmitted through the creation and distribution pipeline. An adaptive audio-aware playback system can use this metadata information to play the content in a manner that matches the original intent of the content creator. Likewise, the mix can be adapted to the exact hardware configuration of the playback system. Currently, there are many different possible speaker configurations and types in rendering equipment such as televisions, home theaters, soundbars, portable music playback docks, etc.
When these systems receive channel-specific audio information today (i.e., left and right channel audio or multi-channel audio), they must process the audio to appropriately match the capabilities of the rendering equipment. An example is standard stereo audio sent to a soundbar with more than two speakers. In current audio playback, where only audio for a speaker channel is sent, the intent of the content creator is unknown. Through the use of metadata transmitted throughout the creation and distribution pipeline, an adaptive audio-aware playback system can use this information to reproduce the content in a manner that matches the original intent of the content creator. For example, some soundbars have side-firing speakers to create an enveloping feel. With adaptive audio, spatial information and content type (such as ambient effects) can be used by the soundbar to send only the appropriate audio to these side-firing speakers. [00092] The adaptive audio system allows unlimited interpolation of speakers in a system in all dimensions: front/rear, left/right, up/down, near/far. In current audio reproduction systems there is no information describing how to handle audio where it may be desirable to position the audio so that it is perceived by a listener as being between two speakers. Currently, with audio that is only assigned to a specific speaker, a spatial quantization factor is introduced. With adaptive audio, the spatial positioning of the audio can be known accurately and reproduced accordingly by the audio playback system. [00093] With respect to headphone rendering, the creator's intent is realized by matching Head Related Transfer Functions (HRTFs) to the spatial position. When audio is played back over headphones, spatial virtualization can be achieved by applying a Head Related Transfer Function, which processes the audio and adds perceptual cues that create the perception of the audio being played back in 3D space and not over headphones. The accuracy of the spatial reproduction depends on the selection of the appropriate HRTF, which may vary based on a number of factors including spatial position. Using the spatial information provided by the adaptive audio system may result in the selection of one, or a continuously varying number, of HRTFs to greatly enhance the playback experience. [00094] The spatial information carried by the adaptive audio system can not only be used by a content creator to create a compelling entertainment experience (film, television, music, etc.), but the spatial information can also indicate where a listener is positioned in relation to physical objects such as buildings or geographic points of interest. This would allow the user to interact with a virtualized audio experience that is related to the real world, i.e., augmented reality. [00095] Embodiments also enable spatial upmixing, by performing enhanced upmixing that reads the metadata only if object audio data is not available. Knowing the position of all objects and their types allows the upmixer to better differentiate elements within the channel-based tracks. Existing upmixing algorithms have to infer information such as the type of audio content (speech, music, ambient effects), as well as the position of the different elements within the audio stream, to create a high quality upmix with minimal or no audible artifacts. Inferred information can often be incorrect or inappropriate.
With adaptive audio, the additional information available from the associated metadata, for example, audio content type, spatial position, velocity, audio object size, etc., can be used by an upmix algorithm to create a high quality reproduction result. The system also spatially matches the audio to the video by accurately positioning audio objects on the screen relative to the visual elements. In this case, a compelling audio/video playback experience is possible, particularly with larger screen sizes, if the rendered spatial location of certain audio elements matches the image elements on the screen. One example is having the dialogue in a movie or television show spatially coincide with the person or character who is speaking on the screen. With normal speaker channel-based audio there is no easy method to determine where the dialogue should be spatially positioned to match the location of the person or character on the screen. With the audio information available with adaptive audio, such audio/visual alignment can be achieved. Audiovisual spatial positional alignment can also be used for non-character/dialogue objects such as cars, trucks, animation, and so on. [00096] Spatial masking processing is facilitated by system 100, since knowledge of the spatial content of a mix through the adaptive audio metadata means that the mix can be adapted to any speaker configuration. However, there is a risk of downmixing objects to the same or nearly the same location due to playback system limitations. For example, an object intended to be panned left rear may be downmixed to the left front if surround channels are not present, but if a sound element occurs at the left front at the same time, the downmixed object will be masked and will disappear from the mix. With the use of adaptive audio metadata, spatial masking can be anticipated by the renderer and the spatial downmix and/or loudness parameters of each object can be adjusted so that all audio elements of the mix remain just as perceptible as in the original mix. Because the renderer understands the spatial relationship between the mix and the playback system, it has the ability to "snap" objects to the closest speakers instead of creating a ghost image between two or more speakers. While this can slightly distort the spatial representation of the mix, it also allows the renderer to avoid an unintended ghost image. For example, if the angular position of the left speaker of the mixing stage does not match the angular position of the left speaker of the playback system, using the snap-to-closest-speaker function could avoid having the playback system reproduce a constant ghost image of the left channel of the mix stage. [00097] Regarding content processing, adaptive audio system 100 allows the content creator to create individual audio objects and add information about the content that can be conveyed to the playback system. This allows for a great deal of flexibility in processing the audio before playback. From a content processing and rendering standpoint, the adaptive audio system allows the processing to be adapted to the type of object. For example, dialogue enhancement can be applied to dialogue objects only. Dialogue enhancement refers to a method of processing audio that contains dialogue so that the audibility and/or intelligibility of the dialogue is increased and/or improved. In many cases the audio processing that is applied to dialogue is inappropriate for non-dialogue audio content (i.e., music, environmental effects, etc.) and can result in objectionable audible artifacts. With adaptive audio, an audio object could contain only the dialogue in a piece of content and could be labeled accordingly so that a rendering solution could selectively apply dialogue enhancement only to the dialogue content.
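A minimal sketch of this kind of object-type-selective processing is shown below. The metadata keys and the processing functions are hypothetical placeholders; the point is only that processing is routed by each object's content-type metadata rather than applied blindly to the whole mix.

```python
# Illustrative sketch only: route optional processing by each object's
# content-type metadata. enhance_dialogue() and manage_bass() are stand-ins
# for real processing; the metadata keys shown are assumptions.

def enhance_dialogue(samples):
    return samples   # placeholder for an actual dialogue-enhancement process

def manage_bass(samples):
    return samples   # placeholder for actual bass filtering/attenuation/gain

def process_object(samples, metadata):
    if metadata.get("content_type") == "dialogue":
        samples = enhance_dialogue(samples)   # applied only to dialogue objects
    if metadata.get("bass_managed", False):
        samples = manage_bass(samples)        # applied only where the metadata flags it
    return samples
```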
Additionally, if the audio object is dialogue only (and not a mix of dialogue and other content, which is often the case), then the dialogue enhancement processing can process the dialogue exclusively (thereby limiting any processing performed on any other content). Similarly, bass management (filtering, attenuation, gain) can be targeted at specific objects based on their type. Bass management refers to selectively isolating and processing only the bass (or lower) frequencies in a particular piece of content. With current audio systems and delivery mechanisms this is a "blind" process that is applied to all of the audio. With adaptive audio, specific audio objects for which bass management is appropriate can be identified by the metadata, and the rendering processing can be applied appropriately. [00098] Adaptive audio system 100 also provides object-based dynamic range compression and selective upmixing. Traditional audio tracks have the same duration as the content itself, while an audio object may occur for only a limited amount of time in the content. The metadata associated with an object can contain information about its average and peak signal amplitude, as well as its onset or attack time (particularly for transient material). This information would allow a compressor to better adapt its compression and time constants (attack, release, etc.) to better suit the content. For selective upmixing, content creators can choose to indicate in the adaptive audio bitstream whether an object should be upmixed or not. This information allows the adaptive audio renderer and upmixer to distinguish which audio elements can be safely upmixed while respecting the creator's intent. [00099] Embodiments also allow the adaptive audio system to select a preferred rendering algorithm from a number of available rendering algorithms and/or surround sound formats. Examples of available rendering algorithms include: binaural, stereo dipole, Ambisonics, Wave Field Synthesis (WFS), multi-channel panning, and raw stems with position metadata. Others include dual balance and vector-based amplitude panning. [000100] The binaural distribution format uses a two-channel representation of a sound field in terms of the signals present at the left and right ears. Binaural information can be created through in-ear recording or synthesized using HRTF models. Playback of a binaural representation is typically done over headphones, or by employing crosstalk cancellation. Playback over an arbitrary speaker configuration would require signal analysis to determine the associated sound field and/or signal source(s). [000101] The stereo dipole rendering method is a transaural crosstalk cancellation process that makes binaural signals playable over stereo speakers (e.g., at +10 and -10 degrees off center). [000102] Ambisonics is a distribution format and a rendering method that encodes a sound field in a four-channel form called B-format. The first channel, W, is the non-directional pressure signal; the second channel, X, is the directional pressure gradient that contains the front and back information; the third channel, Y, contains the left and right information, and the fourth channel, Z, contains the up and down information. These channels define a first-order sample of the complete sound field at one point.
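For concreteness, the sketch below encodes a mono signal into the four B-format channels just described. The 1/sqrt(2) weighting on W follows the traditional first-order Ambisonics convention; the text above does not mandate a particular normalization, so this is an assumed convention for illustration.

```python
# Illustrative sketch only: classic first-order B-format encoding of a mono
# sample placed at a given azimuth/elevation (radians).
import math

def encode_b_format(sample, azimuth, elevation):
    w = sample * (1.0 / math.sqrt(2.0))                   # W: omnidirectional pressure
    x = sample * math.cos(azimuth) * math.cos(elevation)  # X: front/back gradient
    y = sample * math.sin(azimuth) * math.cos(elevation)  # Y: left/right gradient
    z = sample * math.sin(elevation)                      # Z: up/down gradient
    return w, x, y, z
```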
Ambisonics uses all of the available speakers to recreate the sampled (or synthesized) sound field within the speaker array, so that while some speakers push, others pull. [000103] Wave Field Synthesis is a sound reproduction rendering method based on the accurate construction of the desired wave field by secondary sources. WFS is based on Huygens' principle and is implemented as an array of speakers (tens or hundreds) that surrounds the listening space and operates in a coordinated, phased manner to recreate each individual sound wave. [000104] Multi-channel panning is a distribution format and/or rendering method and may be termed channel-based audio. In this case, sound is represented as a number of discrete sources to be played through an equal number of speakers at defined angles from the listener. The content creator/mixer can create virtual images by panning signals between adjacent channels to provide directional cues; early reflections, reverberation, etc., can be mixed into many channels to provide environmental and directional cues. [000105] Raw stems with position metadata constitute a distribution format and can also be termed object-based audio. In this format, distinct, close-miked sound sources are represented together with position and environmental metadata. Virtual sources are rendered based on the metadata and on the playback equipment and listening environment. [000106] The adaptive audio format is a hybrid of the multi-channel panning format and the raw stems format. The rendering method in this embodiment is multi-channel panning. For the audio channels, rendering (panning) happens at authoring time, while for the objects, rendering (panning) happens at playback. Metadata and Adaptive Audio Transmission Format [000107] As stated above, metadata is generated during the creation stage to encode certain positional information for the audio objects and to accompany an audio program in order to assist in rendering the audio program, and in particular to describe the audio program in a way that enables it to be rendered on a wide variety of playback equipment and in a wide variety of playback environments. The metadata is generated for a given program by the editors and mixers who create, collect, edit and manipulate the audio during post-production. An important feature of the adaptive audio format is the ability to control how the audio will translate to playback systems and environments that differ from the mixing environment. In particular, a given theater may have lesser capabilities than the mixing environment. [000108] The adaptive audio renderer is designed to make the best use of the available equipment to recreate the mixer's intent. In addition, the adaptive audio authoring tools allow the mixer to preview and adjust how the mix will be rendered in a variety of playback configurations. All metadata values can be conditioned on the playback environment and speaker configuration. For example, a different mix level for a given audio element can be specified based on the playback configuration or mode. In one embodiment, the list of conditioned playback modes is extensible and includes the following: (1) channel-based playback only: 5.1, 7.1, 7.1 (height), 9.1; and (2) discrete speaker reproduction: 3D, 2D (no height). [000109] In one embodiment, the metadata controls or dictates different aspects of the adaptive audio content and is organized based on different types, including: program metadata, audio metadata, and rendering metadata (for channels and objects).
Each type of metadata includes one or more metadata items that provide values for characteristics that are referenced by an identifier (ID). Figure 5 is a table listing the metadata types and associated metadata elements for the adaptive audio system, in one embodiment. [000110] As shown in table 500 of Figure 5, the first type of metadata is program metadata, which includes metadata elements that specify frame rate, track count, extensible channel description, and mix stage description. The frame rate metadata element specifies the frame rate of the audio content in units of frames per second (fps). The raw audio format itself does not need to include framing of the audio or metadata, since the audio is provided as complete tracks (the length of a reel or an entire feature) rather than as audio segments (the length of an object). The raw format does, however, need to carry all of the information required to enable the adaptive audio encoder to frame the audio and metadata, including the actual frame rate. Table 1 shows the ID, example values and description of the frame rate metadata element. TABLE 1 [000111] The track count metadata element indicates the number of audio tracks in a frame. An exemplary adaptive audio decoder/processor can support up to 128 simultaneous audio tracks, while the adaptive audio format will support any number of audio tracks. Table 2 shows the ID, example values, and description of the track count metadata element. TABLE 2 [000112] Channel-based audio can be assigned to non-standard channels, and the extensible channel description metadata element enables mixes to use new channel positions. For each extension channel the following metadata must be provided, as shown in table 3: TABLE 3 [000113] The mix stage description metadata element specifies the frequency at which a particular speaker produces half the power of the passband. Table 4 shows the ID, example values and description of the mix stage description metadata element, where LF = Low Frequency; HF = High Frequency; 3 dB point = edge of the speaker passband. TABLE 4 [000114] As shown in Figure 5, the second type of metadata is audio metadata. Each channel-based or object-based audio element consists of audio essence and metadata. The audio essence is a monophonic audio stream carried on one of many audio tracks. The associated metadata describes how the audio essence is stored (audio metadata, e.g., sample rate) or how it should be rendered (rendering metadata, e.g., desired audio source position). In general, audio tracks are continuous throughout the duration of the audio program. The program editor or mixer is responsible for assigning audio elements to tracks. Track usage is expected to be sparse, i.e., the average simultaneous track usage may be only 16 to 32. In a typical implementation, the audio will be transmitted efficiently using a lossless encoder. However, alternative implementations are possible, for example transmitting unencoded audio data or losslessly encoded audio data. In a typical implementation, the format consists of up to 128 audio tracks, where each track has a single sample rate and a single encoding system. Each track lasts for the duration of the feature (no explicit reel support). The mapping of objects to tracks (time multiplexing) is the responsibility of the content creator (mixer). [000115] As shown in Figure 5, the audio metadata includes the elements of sample rate, bit depth, and encoding system. Table 5 shows the ID, example values, and description of the sample rate metadata element.
TABLE 5 [000116] Table 6 shows the ID, example values and description of the bit depth metadata element (for PCM and lossless compression). TABLE 6 [000117] Table 7 shows the ID, example values and description of the encoding system metadata element. TABLE 7 [000118] As shown in Figure 5, the third type of metadata is rendering metadata. The rendering metadata specifies values that help the renderer match the original mixer's intent as closely as possible, regardless of the playback environment. The set of metadata elements is different for channel-based audio and object-based audio. A first rendering metadata field selects between the two types of audio - channel-based or object-based - as shown in table 8. TABLE 8 [000119] The rendering metadata for channel-based audio comprises a position metadata element that specifies the audio source position as one or more speaker positions. Table 9 shows the ID and values for the position metadata element for the channel-based case. TABLE 9 [000120] The rendering metadata for channel-based audio also comprises a rendering control element that specifies certain characteristics regarding channel-based audio playback, as shown in table 10. TABLE 10 [000121] For object-based audio, the metadata includes elements analogous to those for channel-based audio. Table 11 provides the ID and values for the object position metadata element. Object position is described in one of three ways: three-dimensional coordinates; a plane and two-dimensional coordinates; or a line and a one-dimensional coordinate. The rendering method can adapt based on the type of position information. TABLE 11 [000122] The ID and values for the object rendering control metadata elements are shown in table 12. These values provide additional means to control or optimize rendering for object-based audio. TABLE 12 [000123] In one embodiment, the metadata described above and illustrated in Figure 5 is generated and stored as one or more files that are associated or indexed with corresponding audio content, so that the audio streams are processed by the adaptive audio system interpreting the metadata generated by the mixer. It should be noted that the metadata described above is an example set of IDs, values and definitions, and that other or additional metadata elements can be included for use in the adaptive audio system. [000124] In one embodiment, two (or more) sets of metadata elements are associated with each of the channel-based and object-based audio streams. A first set of metadata is applied to the plurality of audio streams for a first playback environment condition, and a second set of metadata is applied to the plurality of audio streams for a second playback environment condition. The second or a subsequent set of metadata elements replaces the first set of metadata elements for a given audio stream based on the condition of the playback environment. The condition can include factors such as room size, shape, composition of material within the room, current occupancy and density of people in the room, ambient noise characteristics, ambient light characteristics, and any other factor that may affect the sound or even the mood of the playback environment. Post-production and Mastering [000125] The rendering stage 110 of adaptive audio processing system 100 can include audio post-production steps that lead to the creation of a final mix. In a cinema application, the three main categories of sound used in a movie mix are dialogue, music, and effects.
Effects consist of sounds that are not dialogue or music (e.g., ambient sound, background/scene noise). Sound effects can be recorded or synthesized by the sound designer, or they can come from effects libraries. A subset of effects that involve specific noise sources (e.g., footsteps, doors, etc.) is known as Foley and is performed by Foley actors. The different types of sound are marked and panned accordingly by the recording engineers. [000126] Figure 6 illustrates an exemplary workflow for a post-production process in an adaptive audio system, in one embodiment. As shown in diagram 600, all of the individual sound components of music, dialogue, Foley, and effects are brought together in the dubbing theater during the final mix 606, and the re-recording mixer(s) 604 use the premixes (also known as 'mix minus') together with the individual sound objects and positional data to create stems as a way of grouping, for example, dialogue, music, effects, Foley and background sounds. In addition to forming the final mix 606, the music and all of the effects stems can be used as a basis for creating dubbed language versions of the film. Each stem consists of a channel-based bed and several audio objects with metadata. The stems combine to form the final mix. Using object positioning and panning information from both the audio workstation and the mixing console, the rendering and mastering unit 608 renders the audio to the speaker locations in the dubbing theater. This rendering allows the mixers to hear how the channel-based beds and audio objects combine, and also provides the ability to render to different configurations. The mixer can use conditional metadata, which define relevant profiles, to control how content is rendered to the surround channels. In this way, the mixers retain complete control of how the movie plays back in all scalable environments. A monitoring step can be added after either or both of the re-recording step 604 and the final mix step 606 to allow the mixer to hear and evaluate the intermediate content generated during each of these stages. [000127] During the mastering session, the stems, objects, and metadata are gathered into an adaptive audio package 614, which is produced by the printmaster 610. This package also contains the backward-compatible surround sound cinema mix 612 (legacy 5.1 or 7.1). The rendering/mastering unit (RMU) 608 can render this output if desired, thereby eliminating the need for any additional workflow steps in generating the existing channel-based deliverables. In one embodiment, the audio files are packaged using standard Material Exchange Format (MXF) encapsulation. The adaptive audio mix master file can also be used to generate other deliverables, such as consumer multi-channel or stereo mixes. Smart profiles and conditional metadata allow controlled renderings that can significantly reduce the time required to create such mixes. [000128] In one embodiment, a packaging system can be used to create a digital cinema package for the deliverables including an adaptive audio mix. Audio track files can be locked together to help prevent synchronization errors with the adaptive audio track files. Certain territories require the addition of track files during the packaging phase, for example, the addition of Hearing Impaired (HI) or Visually Impaired Narration (VI-N) tracks to the main audio track file. [000129] In one embodiment, the speaker arrangement in the playback environment can comprise any number of surround sound speakers placed and designated in accordance with established surround sound standards.
Any number of additional speakers for accurate rendering of object-based audio content can also be placed based on the conditions of the playback environment. These additional speakers can be set up by a sound engineer, and this setup is provided to the system in the form of a setup file that is used by the system to render the object-based components of the adaptive audio to a specific speaker or speakers within the overall speaker arrangement. The setup file includes at least a list of speaker designations, a mapping of channels to individual speakers, information regarding speaker groupings, and a runtime mapping based on the relative position of the speakers with respect to the playback environment. The runtime mapping is used by the snap-to feature of the system, which renders point-source object-based audio content to the specific speaker that is closest to the perceived location of the sound as intended by the sound engineer. [000130] Figure 7 is a diagram of an exemplary workflow for a digital cinema packaging process using adaptive audio files, in one embodiment. As shown in diagram 700, the audio files comprising both the adaptive audio files and the 5.1 or 7.1 surround sound audio files 702 are input to an encapsulation/encryption block 704. In one embodiment, upon creation of the digital cinema package in block 706, the PCM MXF file (with the appropriate additional tracks appended) is encrypted using SMPTE specifications in accordance with existing practice. The adaptive audio MXF is packaged as an auxiliary track file and is optionally encrypted using a symmetric content key per the SMPTE specification. This single DCP 708 can then be delivered to any Digital Cinema Initiatives (DCI)-compliant server. In general, any installations that are not suitably equipped will simply ignore the additional track file containing the adaptive audio soundtrack and will use the existing main audio track file for standard playback. Facilities equipped with appropriate adaptive audio processors may ingest and play back the adaptive audio soundtrack where applicable, reverting to the standard audio track as needed. The encapsulation/encryption component 704 can also provide input directly to a distribution KDM block 710 to generate an appropriate security key for use in the digital cinema server. Other movie elements or files, such as subtitles 714 and images 716, can be encapsulated and encrypted along with the audio files 702. In that case, certain processing steps can be included, such as compression 712 in the case of image files 716. [000131] Regarding content management, adaptive audio system 100 allows the content creator to create individual audio objects and add information about the content that can be conveyed to the playback system. This allows for a great deal of flexibility in managing the audio content. From a content management standpoint, the adaptive audio methods enable several different features. These include changing the content language by replacing just the dialogue object, with associated savings in space, download/transfer efficiency, geographic playback adaptation, etc. Film, television and other entertainment programs are typically distributed internationally. This often requires that the language in the piece of content be changed depending on where it will be played back (French for films shown in France, German for TV programs shown in Germany, etc.). Today this often requires a completely independent audio soundtrack to be created, packaged and distributed.
With adaptive audio and its inherent concept of audio objects, the dialogue for a piece of content could be an independent audio object. This allows the content language to be changed easily without updating or altering other elements of the audio soundtrack, such as music, effects, etc. This would apply not only to foreign languages, but also to language that is inappropriate for certain audiences (e.g., children's television shows, airline movies, etc.), targeted advertising, and so on. Installation and Equipment Considerations [000132] The adaptive audio file format and the associated processors allow changes in how cinema equipment is installed, calibrated and maintained. With the introduction of many more potential speaker outputs, each individually equalized and balanced, there is a need for intelligent and efficient automatic room equalization, together with the ability to manually adjust any automated room equalization. In one embodiment, the adaptive audio system uses an optimized 1/12th-octave band equalization engine. Up to 64 outputs can be processed to balance the sound in the movie theater more precisely. The system also allows scheduled monitoring of the individual speaker outputs, of the cinema processor output directly, or even of the sound reproduced in the auditorium. Local and network alerts can be created to ensure that appropriate action is taken. The flexible rendering system can automatically remove a damaged speaker or amplifier from the playback chain and render around it, thus allowing the show to continue. [000133] The cinema processor can be connected to the digital cinema server with the existing 8xAES main audio connections and an Ethernet connection for streaming adaptive audio data. Playback of 7.1 or 5.1 surround content uses the existing PCM connections. The adaptive audio data is streamed over Ethernet to the cinema processor for decoding and rendering, and communication between the server and the cinema processor allows the audio to be identified and synchronized. In the event of any problem with adaptive audio track playback, the sound is reverted to Dolby Surround 7.1 or 5.1 PCM audio. [000134] Although embodiments have been described in relation to 5.1 and 7.1 surround sound systems, it should be noted that many other current and future configurations can be used in conjunction with the embodiments, including 9.1, 11.1 and 13.1 and beyond. [000135] The adaptive audio system is designed to allow both content creators and exhibitors to decide how sound content should be rendered in different playback speaker configurations. The optimal number of speaker output channels used will vary depending on the size of the room. The recommended speaker installation therefore depends on many factors, such as room size, composition, seating configuration, environment, average audience size, and so on. Exemplary or representative speaker configurations and layouts are provided herein for illustrative purposes only and are not intended to limit the scope of any claimed embodiment. [000136] The recommended speaker layout for an adaptive audio system remains compatible with existing cinema systems, which is vital so as not to compromise playback of existing 5.1 and 7.1 channel-based formats. In order to preserve the intent of the adaptive audio sound engineer, and the intent of the mixers of 7.1 and 5.1 content, the positions of the existing screen channels should not be changed too radically in an effort to elevate or accentuate the introduction of new speaker locations.
In contrast to using all 64 available output channels, the adaptive audio format can be accurately rendered in the cinema to configurations such as 7.1, thus allowing the format (and its associated benefits) to be used in existing movie theaters with no changes to amplifiers or speakers. [000137] Different speaker locations may have different effectiveness depending on the theater design, so there is currently no industry-specified ideal number or assignment of channels. Adaptive audio is intended to be truly adaptable and capable of accurate reproduction in a variety of auditoriums, whether they have a limited number of playback channels or many channels in highly flexible configurations. [000138] Figure 8 is an overhead view 800 of an exemplary layout of suggested speaker locations for use with an adaptive audio system in a typical auditorium, and Figure 9 is a front view 900 of the exemplary layout of suggested speaker locations at the screen of the auditorium. The reference position referred to hereinafter corresponds to a position 2/3 of the distance back from the screen to the rear wall, on the center line of the screen. The standard screen speakers 801 are shown in their usual positions relative to the screen. Studies of the perception of elevation in the screen plane have shown that additional speakers 804 behind the screen, such as Left Center (Lc) and Right Center (Rc) screen speakers (at the Extra Left and Extra Right channel locations of 70 mm film formats), can be beneficial in creating smoother pans across the screen. Such optional speakers are therefore recommended, particularly in auditoriums with screens wider than 12 m (40 ft.). All screen speakers should be angled so that they are aimed towards the reference position. The recommended placement of the subwoofer 810 behind the screen should remain unchanged, including keeping the cabinet placement asymmetrical relative to the center of the room to prevent the stimulation of standing waves. Additional subwoofers 816 can be placed at the rear of the theater. [000139] The surround speakers 802 should be individually wired back to the amplifier rack and, where possible, individually amplified with a dedicated channel of power amplification matching the speaker's power handling in accordance with the manufacturer's specifications. Ideally, surround speakers should be specified to handle increased SPL for each individual speaker, and likewise with wider frequency response where possible. As a rule of thumb for an average-sized movie theater, the spacing of the surround speakers should be between 2 and 3 m (6'6" and 9'9"), with the left and right surround speakers placed symmetrically. However, the spacing of the surround speakers is more effectively considered as the angle subtended at a given listener between adjacent speakers, as opposed to absolute distances between the speakers. For optimal reproduction throughout the auditorium, the angular spacing between adjacent speakers should be 30 degrees or less, referenced from each of the four corners of the main listening area. Good results can be achieved with spacing of up to 50 degrees. For each surround zone, the speakers should maintain equal linear spacing adjacent to the seating area where possible. The linear spacing beyond the listening area, for example between the front row and the screen, can be slightly larger.
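A minimal sketch of checking this spacing guideline is given below. Speaker and listener positions, the helper names, and the 2D geometry are assumptions for illustration; only the 30-degree target and 50-degree tolerance come from the text above.

```python
# Illustrative sketch only: check the angular spacing guideline for adjacent
# surround speakers as seen from the corners of the main listening area.
# Positions are 2D (x, y) coordinates in metres.
import math

def subtended_angle(listener, spk_a, spk_b):
    """Angle in degrees subtended at the listener by two adjacent speakers."""
    ax, ay = spk_a[0] - listener[0], spk_a[1] - listener[1]
    bx, by = spk_b[0] - listener[0], spk_b[1] - listener[1]
    cos_t = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def check_spacing(corners, surround_positions, target=30.0, limit=50.0):
    """Return the worst subtended angle between adjacent surrounds seen from
    any corner of the listening area, and a verdict against the guideline."""
    worst = 0.0
    for corner in corners:
        for a, b in zip(surround_positions, surround_positions[1:]):
            worst = max(worst, subtended_angle(corner, a, b))
    if worst <= target:
        return worst, "meets 30-degree guideline"
    return worst, "acceptable" if worst <= limit else "too wide"
```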
Figure 11 is an example of the placement of top surround speakers 808 and side surround speakers 806 with respect to the reference position, in one embodiment. [000140] The additional side surround speakers 806 should be mounted closer to the screen than current recommended practice, starting approximately one-third of the distance to the back of the auditorium. These speakers are not used as side surrounds when playing back Dolby Surround 7.1 or 5.1 soundtracks, but will enable smooth transitions and improved timbre matching when panning objects from the screen speakers to the surround zones. To maximize the impression of space, the surround arrays should be placed as low as practical, subject to the following constraints: the vertical placement of the surround speakers at the front of the array should be reasonably close to the height of the acoustic center of the screen speaker, and high enough to maintain good coverage across the seating area according to the directivity of the speakers. The vertical placement of the surround speakers should be such that they form a straight line from front to back and are (typically) slanted upward, so that the relative elevation of the surround speakers above the listeners is maintained toward the rear of the theater as the seating elevation increases, as shown in Figure 10, which is a side view of an exemplary layout of suggested speaker locations for use with an adaptive audio system in a typical auditorium. In practice, this can be achieved most simply by choosing the elevation for the front-most and rear-most side surround speakers, and placing the remaining speakers in a line between these points. [000141] In order to provide optimal coverage of the seating area by each speaker, the side surround speakers 806, rear speakers 816 and top surround speakers 808 should be aimed towards the reference position in the movie theater, following defined guidelines regarding spacing, position, angle, and so on. [000142] Embodiments of the adaptive audio cinema system and format achieve enhanced levels of immersion and audience engagement over current systems by offering powerful new authoring tools for mixers, and a new cinema processor featuring a flexible rendering engine that optimizes the audio quality and surround effects of the soundtrack for each room's speaker feeds and characteristics. Additionally, the system maintains backward compatibility and minimizes the impact on current production and distribution workflows. [000143] Although embodiments have been described in relation to examples and implementations in a cinema environment, where the adaptive audio content is associated with film content for use in digital cinema processing systems, it should be noted that the embodiments can also be implemented in non-cinema environments. Adaptive audio content comprising object-based audio and channel-based audio can be used in conjunction with any related content (associated audio, video, graphics, etc.), or it can constitute stand-alone audio content. The playback environment can be any suitable listening environment, from headphones or near-field monitors to small or large rooms, cars, outdoor arenas, concert halls, and so on. [000144] Aspects of system 100 can be implemented in a computer-based sound processing network environment suitable for processing digital or digitized audio files.
Portions of the adaptive audio system may include one or more networks comprising any desired number of individual machines, including one or more routers (not shown) that serve to buffer and route the data transmitted between the computers. Such a network can be built on a number of different network protocols, and it can be the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or any combination thereof. In an embodiment in which the network comprises the Internet, one or more machines can be configured to access the Internet through web browser programs. [000145] One or more of the components, blocks, processes or other functional components can be implemented through a computer program that controls execution of a processor-based computing device of the system. It should also be noted that the various functions disclosed in this document can be described using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile (non-transitory) physical storage media in various forms, such as optical, magnetic, or semiconductor storage media. [000146] Unless the context clearly requires otherwise, throughout the description and claims the words "comprise," "comprising," and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is, in the sense of "including, but not limited to". Words using the singular or plural number also include the plural or singular number, respectively. Additionally, the words "in this application", "in this document", "above", "below" and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word "or" is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list. [000147] Although one or more implementations have been described by way of example and in terms of specific embodiments, it is to be understood that the one or more implementations are not limited to the disclosed embodiments. On the contrary, they are intended to cover various modifications and similar arrangements as would be apparent to those skilled in the art. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
Claims (18) [0001] 1. A system for processing audio signals, characterized in that it comprises an authoring component configured to: receive a plurality of audio signals; generate an adaptive audio mix comprising a plurality of monophonic audio streams and metadata associated with each of the audio streams and indicating a playback location of a respective monophonic audio stream, wherein at least some of the plurality of monophonic audio streams are identified as channel-based audio and the others of the plurality of monophonic audio streams are identified as object-based audio, and wherein the playback location of a channel-based monophonic audio stream comprises a designation of a speaker in a speaker array, and the playback location of an object-based monophonic audio stream comprises a location in three-dimensional space, and wherein each object-based monophonic audio stream is rendered in at least one specific speaker of the speaker array; and encapsulate the plurality of monophonic audio streams and the metadata in a bitstream for transmission to a rendering system configured to render the plurality of monophonic audio streams to a plurality of speaker feeds corresponding to speakers in a playback environment, wherein the speakers of the speaker array are placed at specific positions within the playback environment, and wherein metadata elements associated with each object-based monophonic audio stream indicate whether rendering of the respective monophonic audio stream on one or more specific speaker feeds of the plurality of speaker feeds is prohibited, such that the respective object-based monophonic audio stream is not rendered on any of the one or more specific speaker feeds of the plurality of speaker feeds. [0002] 2. System according to claim 1, characterized in that the authoring component includes a mixing console with controls operable by a user to indicate playback levels of the plurality of monophonic audio streams, and in that the metadata elements associated with each object-based stream are automatically generated upon user input to the mixing console controls. [0003] 3. System according to claim 1, characterized in that it further comprises an encoder coupled to the authoring component and configured to receive the plurality of monophonic audio streams and metadata and to generate a single digital bitstream containing the plurality of monophonic audio streams in an ordered fashion. [0004] 4. A system for processing audio signals, characterized in that it comprises a rendering system configured to: receive a bitstream encapsulating an adaptive audio mix comprising a plurality of monophonic audio streams and metadata associated with each of the streams and indicating a playback location of a respective monophonic audio stream, wherein at least some of the plurality of monophonic audio streams are identified as channel-based audio and the others of the plurality of monophonic audio streams are identified as object-based audio, and wherein the playback location of a channel-based monophonic audio stream comprises a designation of a speaker in a speaker array and the playback location of an object-based monophonic audio stream comprises a location in three-dimensional space, and wherein each object-based monophonic audio stream is rendered in at least one specific
speaker of the speaker array; and render the plurality of monophonic audio streams to a plurality of speaker feeds corresponding to the speakers in a playback environment, wherein the speakers of the speaker array are placed at specific positions within the playback environment, and wherein metadata elements associated with each respective object-based monophonic audio stream indicate whether rendering of the respective monophonic audio stream on one or more specific speaker feeds of the plurality of speaker feeds is prohibited, such that the respective object-based monophonic audio stream is not rendered on any of the one or more specific speaker feeds of the plurality of speaker feeds. [0005] 5. System according to claim 4, characterized in that the one or more specific speaker feeds on which rendering of the respective monophonic audio stream is prohibited include one or more named speakers or speaker zones. [0006] 6. System according to claim 5, characterized in that the one or more named speakers or speaker zones include one or more of L, C, and R. [0007] 7. System according to claim 4, characterized in that the one or more specific speaker feeds on which rendering of the respective monophonic audio stream is prohibited include one or more speaker areas. [0008] 8. System according to claim 7, characterized in that the one or more speaker areas include one or more of: front wall, rear wall, left wall, right wall, ceiling, floor, and speakers inside the room. [0009] 9. System according to claim 4, characterized in that the metadata elements associated with each object-based monophonic audio stream further indicate spatial parameters controlling the reproduction of a corresponding sound component, comprising one or more of: sound position, sound width and sound velocity. [0010] 10. System according to claim 4, characterized in that the playback location for each of the plurality of object-based monophonic audio streams comprises a spatial position relative to a screen within a playback environment or a surface enclosing the playback environment, and wherein the surface comprises a front plane, back plane, left plane, right plane, top plane and bottom plane. [0011] 11. System according to claim 4, characterized in that the rendering system selects a rendering algorithm used by the processing system, the rendering algorithm being selected from the group consisting of: binaural, stereo dipole, Ambisonics, Wave Field Synthesis (WFS), multi-channel panning, raw stems with position metadata, dual balance, and vector-based amplitude panning. [0012] 12. System according to claim 4, characterized in that the playback location for each of the plurality of object-based monophonic audio streams is independently specified with respect to an egocentric frame of reference or an allocentric frame of reference, wherein the egocentric frame of reference is taken relative to a listener in the playback environment, and wherein the allocentric frame of reference is taken relative to a characteristic of the playback environment. [0013] 13.
[0013] 13. A method for authoring audio signals for rendering, characterized in that it comprises the steps of: receiving a plurality of audio signals; generating an adaptive audio mix comprising a plurality of monophonic audio streams and metadata associated with each of the audio streams and indicating a playback location of a respective monophonic audio stream, wherein at least some of the plurality of monophonic audio streams are identified as channel-based audio and the others of the plurality of monophonic audio streams are identified as object-based audio, wherein the playback location of channel-based audio comprises designations of speakers in an array of speakers and the playback location of object-based audio comprises a location in three-dimensional space, and wherein each object-based monophonic audio stream is rendered on at least one specific speaker of the speaker array; and encapsulating the plurality of monophonic audio streams and the metadata into a bitstream for transmission to a rendering system configured to render the plurality of monophonic audio streams to a plurality of speaker feeds corresponding to the speakers in a playback environment, wherein the speakers of the speaker array are placed at specific positions within the playback environment, and wherein metadata elements associated with each object-based monophonic audio stream indicate whether rendering of the respective monophonic audio stream on one or more specific speaker feeds of the plurality of speaker feeds is prohibited, such that the respective object-based monophonic audio stream is not rendered on any of the one or more specific speaker feeds of the plurality of speaker feeds.
[0014] 14. A method for rendering audio signals, characterized in that it comprises the steps of: receiving a bitstream encapsulating an adaptive audio mix comprising a plurality of monophonic audio streams and metadata associated with each of the audio streams and indicating a playback location of a respective monophonic audio stream, wherein at least some of the plurality of monophonic audio streams are identified as channel-based audio and the others of the plurality of monophonic audio streams are identified as object-based audio, wherein the playback location of a channel-based monophonic audio stream comprises a designation of a speaker in an array of speakers and the playback location of an object-based monophonic audio stream comprises a location in three-dimensional space, and wherein each object-based monophonic audio stream is rendered on at least one specific speaker of the speaker array; and rendering the plurality of monophonic audio streams to a plurality of speaker feeds corresponding to the speakers in a playback environment, wherein the speakers of the speaker array are placed at specific positions within the playback environment, and wherein metadata elements associated with each respective object-based monophonic audio stream indicate whether rendering of the respective monophonic audio stream on one or more specific speaker feeds of the plurality of speaker feeds is prohibited, such that the respective object-based monophonic audio stream is not rendered on any of the one or more specific speaker feeds of the plurality of speaker feeds.
[0015] 15. Method according to claim 14, characterized in that the one or more specific speaker feeds on which rendering of the respective monophonic audio stream is prohibited include one or more named speakers or speaker zones.
[0016] 16. Method according to claim 14, characterized in that the one or more specific speaker feeds on which rendering of the respective monophonic audio stream is prohibited include one or more speaker areas.
[0017] 17. Method according to claim 14, characterized in that the metadata elements associated with each object-based monophonic audio stream further indicate spatial parameters controlling the playback of a corresponding sound component, comprising one or more of: sound position, sound width, and sound velocity.
[0018] 18. Method according to claim 14, characterized in that the playback location for each of the plurality of object-based monophonic audio streams comprises a spatial position relative to a screen within a playback environment, or relative to a surface that encloses the playback environment, wherein the surface comprises a front plane, a back plane, a left plane, a right plane, a top plane, and a bottom plane, and/or is independently specified with respect to an egocentric frame of reference or an allocentric frame of reference, wherein the egocentric frame of reference is taken relative to a listener in the playback environment, and wherein the allocentric frame of reference is taken relative to a characteristic of the playback environment.
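For readers less familiar with claim language, the following minimal sketch illustrates the kind of per-stream metadata recited in claims 1, 4, 13 and 14: each monophonic stream is marked as channel-based or object-based, carries either a speaker designation or a three-dimensional playback location, and may list speaker feeds on which its rendering is prohibited. The class names, fields, and the naive mixing rule below are hypothetical assumptions for illustration only and are not taken from the patent.

```python
# Illustrative sketch only: all names (MonoStream, render, excluded_feeds, ...) are
# hypothetical and merely mirror the claimed metadata elements; a real renderer would
# also use the object's position to pan it, which is omitted here.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MonoStream:
    samples: List[float]                                     # one monophonic audio stream
    kind: str                                                 # "channel" or "object"
    channel_label: Optional[str] = None                       # e.g. "L", "C", "R" for channel-based streams
    position: Optional[Tuple[float, float, float]] = None     # 3D playback location for object-based streams
    excluded_feeds: List[str] = field(default_factory=list)   # speaker feeds on which rendering is prohibited


def render(streams: List[MonoStream], speaker_feeds: List[str]) -> Dict[str, List[float]]:
    """Mix the streams into per-speaker feeds, honouring the exclusion metadata."""
    n = max((len(s.samples) for s in streams), default=0)
    feeds = {name: [0.0] * n for name in speaker_feeds}
    for s in streams:
        if s.kind == "channel":
            # channel-based: route to the named speaker feed only
            targets = [s.channel_label] if s.channel_label in feeds else []
        else:
            # object-based: naively spread over all feeds except the prohibited ones
            targets = [f for f in speaker_feeds if f not in s.excluded_feeds]
        gain = 1.0 / max(len(targets), 1)
        for f in targets:
            for i, x in enumerate(s.samples):
                feeds[f][i] += gain * x
    return feeds


if __name__ == "__main__":
    mix = [
        MonoStream(samples=[0.1, 0.2], kind="channel", channel_label="C"),
        MonoStream(samples=[0.3, 0.3], kind="object",
                   position=(0.5, 0.5, 1.0), excluded_feeds=["L", "R"]),
    ]
    print(render(mix, ["L", "C", "R", "Ls", "Rs"]))
```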
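Claims 12 and 18 distinguish an egocentric frame of reference, taken relative to a listener, from an allocentric frame of reference, taken relative to a characteristic of the playback environment. The sketch below is a hypothetical illustration of an allocentric playback location expressed in normalized room coordinates being mapped to per-speaker gains; the inverse-distance panner and the speaker layout are assumptions chosen for brevity and are not the renderer described in this patent.

```python
# Hypothetical illustration: an object position given as fractions of the room
# (x: left->right, y: front->back, z: floor->ceiling) is allocentric, so the same
# metadata can be rendered in rooms of different sizes.  The simple inverse-distance
# panner below is NOT the patent's rendering algorithm.
import math
from typing import Dict, Tuple

Position = Tuple[float, float, float]

# assumed speaker positions in the same normalized (allocentric) room coordinates
SPEAKERS: Dict[str, Position] = {
    "L":  (0.0, 0.0, 0.5),
    "R":  (1.0, 0.0, 0.5),
    "C":  (0.5, 0.0, 0.5),
    "Ls": (0.0, 1.0, 0.5),
    "Rs": (1.0, 1.0, 0.5),
}


def allocentric_gains(obj: Position, speakers: Dict[str, Position] = SPEAKERS) -> Dict[str, float]:
    """Distribute an object over the array with power-normalized inverse-distance gains."""
    weights = {}
    for name, pos in speakers.items():
        d = math.dist(obj, pos)
        weights[name] = 1.0 / max(d, 1e-6)                    # closer speakers get more weight
    norm = math.sqrt(sum(w * w for w in weights.values()))    # keep total power constant
    return {name: w / norm for name, w in weights.items()}


if __name__ == "__main__":
    # an object on the right side of the room, slightly toward the back, at mid height
    print(allocentric_gains((0.9, 0.6, 0.5)))
```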