Abstract:
SIGNALING FEATURES OF AN MVC OPERATING POINT. Source and destination video devices can use data structures that signal details of an operating point for an MPEG-2 (Moving Picture Experts Group) Systems bitstream. In one example, an apparatus includes a multiplexer that constructs a data structure corresponding to a multiview video coding (MVC) operating point of an MPEG-2 Systems standard bitstream, in which the data structure signals a rendering capability value that describes a rendering capability to be satisfied by a receiving device to use the MVC operating point, a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the MVC operating point, and a bit rate value that describes a bit rate of the MVC operating point, and that includes the data structure as part of the bitstream, and an output interface that outputs the bitstream comprising the data structure.
Publication number: BR112012002259B1
Application number: R112012002259-8
Filing date: 2010-08-06
Publication date: 2021-05-25
Inventors: Ying Chen;Peisong Chen;Marta Karczewicz
Applicant: Qualcomm Incorporated;
Main IPC class:
Description:

Field of Invention
This disclosure relates to the transport of encoded video data.
Description of Prior Art
Digital video capabilities can be incorporated into a wide variety of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), desktop or laptop computers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, satellite or cellular radio phones, video teleconferencing devices, and so on. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263 or ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), and extensions of such standards, to transmit and receive digital video information more efficiently.
Video compression techniques perform spatial prediction and/or temporal prediction to reduce or remove redundancy inherent in video sequences. For block-based video encoding, a video frame or slice can be partitioned into macroblocks. Each macroblock can be further partitioned. Macroblocks in an intra-coded (I) frame or slice are encoded using spatial prediction with respect to neighboring macroblocks. Macroblocks in an inter-coded (P or B) frame or slice can use spatial prediction with respect to neighboring macroblocks in the same frame or slice, or temporal prediction with respect to other reference frames.
After the video data is encoded, the video data can be bundled by a multiplexer for transmission or storage. MPEG-2 includes a "Systems" section that defines a transport level for various video encoding standards. MPEG-2 transport level systems can be used by MPEG-2 video encoders or other video encoders, according to different video encoding standards. For example, MPEG-4 prescribes different encoding and decoding methodologies than MPEG-2, but video encoders implementing MPEG-4 standard techniques can still use MPEG-2 transport level methodologies.
In general, references to "MPEG-2 systems" in this disclosure refer to the video data transport level prescribed by MPEG-2. The transport level prescribed by MPEG-2 is also referred to in this disclosure as an "MPEG-2 transport stream" or simply a "transport stream". Likewise, the transport level of MPEG-2 systems also includes program streams. Transport streams and program streams generally use different formats for delivering similar data: a transport stream comprises one or more "programs" including audio and video data, while a program stream includes a single program including audio and video data.
Efforts have been made to develop new video coding standards based on H.264/AVC. One of these standards is the scalable video coding (SVC) standard, which is the scalable extension to H.264/AVC. Another standard is multiview video coding (MVC), which is the multiview extension to H.264/AVC. The MPEG-2 Systems specification describes how compressed multimedia (video and audio) data streams can be multiplexed together with other data to form a single data stream suitable for digital transmission or storage. The latest specification of MPEG-2 systems is specified in "Information Technology - Generic Coding of Moving Pictures and Associated Audio: Systems, Recommendation H.222.0; International Organisation for Standardisation, ISO/IEC JTC1/SC29/WG11; Coding of Moving Pictures and Associated Audio", May 2006. MPEG recently designed the transport standard for MVC over MPEG-2 systems, and the latest version of this specification is "Study of ISO/IEC 13818-1:2007/FPDAM4 Transport of MVC", MPEG doc. N10572, MPEG of ISO/IEC JTC1/SC29/WG11, Maui, Hawaii, USA, April 2009.
Invention Summary
In general, this disclosure describes techniques for improving multiview video coding in MPEG-2 (Moving Picture Experts Group) systems. In particular, the techniques of this disclosure are directed to a data structure for an operating point of an MPEG-2 Systems bitstream, where the data structure signals a rendering capability for a receiving device, a decoding capability for the receiving device and, in some examples, a bit rate for the operating point. The data structure can correspond to an operating point descriptor that is included in the MPEG-2 Systems bitstream.
To correctly decode and display video data of an operating point, a receiving device must satisfy the properties described by the rendering capability and the decoding capability signaled in the data structure. MPEG-2 Systems bitstreams can include a plurality of operating points that correspond to various views of a program. Using different operating points for a program allows various client devices to perform adaptation. That is, client devices with different decoding and rendering capabilities can extract views from the same program to display two-dimensional or three-dimensional video data. Client devices can also negotiate with a server device to retrieve data at varying bit rates to accommodate transport media of various bandwidth capacities.
In one example, a method includes constructing, with a source device, a data structure corresponding to a multiview video coding (MVC) operating point of an MPEG-2 Systems standard bitstream, in which the data structure signals a rendering capability value that describes a rendering capability to be satisfied by a receiving device to use the MVC operating point, a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the MVC operating point, and a bit rate value that describes a bit rate of the MVC operating point, and in which the data structure is included as part of the bitstream, and outputting the bitstream comprising the data structure.
In another example, an apparatus includes a multiplexer that constructs a data structure corresponding to an MVC operating point of an MPEG-2 System standard bitstream, in which the data structure signals a rendering capability value that describes a rendering capability to be satisfied by a receiving device to use the MVC operating point, a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the MVC operating point, and a bitrate value which describes a bitrate of the MVC operating point, and which includes the data structure as part of the data stream, and an output interface which outputs the bitstream comprising the data structure.
In another example, an apparatus includes means for constructing a data structure corresponding to an MVC operating point of an MPEG-2 Systems standard bitstream, in which the data structure signals a rendering capability value that describes a rendering capability to be satisfied by a receiving device to use the MVC operating point, a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the MVC operating point, and a bit rate value that describes a bit rate of the MVC operating point, and in which the data structure is included as part of the bitstream, and means for outputting the bitstream comprising the data structure.
In another example, a computer-readable medium comprises instructions that cause a processor of a source device to construct a data structure corresponding to an MVC operating point of an MPEG-2 Systems standard bitstream, in which the data structure signals a rendering capability value that describes a rendering capability to be satisfied by a receiving device to use the MVC operating point, a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the MVC operating point, and a bit rate value that describes a bit rate of the MVC operating point, and in which the data structure is included as part of the bitstream, and to cause an output interface to output the bitstream comprising the data structure.
In another example, a method includes receiving, with a target device, a data structure corresponding to an MVC operating point of an MPEG-2 (Moving Picture Experts Group) Systems standard bitstream, in which the data structure signals a rendering capability value that describes a rendering capability to be satisfied by a receiving device to use the MVC operating point, a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the MVC operating point, and a bit rate value that describes a bit rate of the MVC operating point, determining whether a video decoder of the target device is capable of decoding views corresponding to the MVC operating point based on the decoding capability signaled by the data structure, determining whether the target device is capable of rendering the views corresponding to the MVC operating point based on the rendering capability signaled by the data structure, and sending the views corresponding to the MVC operating point to the video decoder of the target device when the video decoder of the target device is determined to be able to decode and render the views corresponding to the MVC operating point.
In another example, an apparatus includes an input interface configured to receive a data structure corresponding to an MVC operating point of an MPEG-2 Systems standard bitstream, in which the data structure signals a rendering capability value that describes a rendering capability to be satisfied by a receiving device to use the MVC operating point, a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the MVC operating point, and a bit rate value that describes a bit rate of the MVC operating point; a video decoder configured to decode video data; and a demultiplexer configured to determine whether the video decoder is capable of decoding views corresponding to the MVC operating point based on the decoding capability signaled by the data structure, to determine whether the apparatus is capable of rendering the views corresponding to the MVC operating point based on the rendering capability signaled by the data structure, and to send the views corresponding to the MVC operating point to the video decoder when the video decoder is determined to be able to decode and render the views corresponding to the MVC operating point.
In another example, an apparatus includes means for receiving a data structure corresponding to an MVC operating point of an MPEG-2 Systems standard bitstream, in which the data structure signals a rendering capability value that describes a rendering capability to be satisfied by a receiving device to use the MVC operating point, a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the MVC operating point, and a bit rate value that describes a bit rate of the MVC operating point, means for determining whether a video decoder of the apparatus is capable of decoding views corresponding to the MVC operating point based on the decoding capability signaled by the data structure, means for determining whether the apparatus is capable of rendering the views corresponding to the MVC operating point based on the rendering capability signaled by the data structure, and means for sending the views corresponding to the MVC operating point to the video decoder of the apparatus when the video decoder of the apparatus is determined to be able to decode and render the views corresponding to the MVC operating point.
In another example, a computer-readable storage medium comprises instructions that cause a processor of a target device to receive a data structure corresponding to an MVC operating point of an MPEG-2 Systems standard bitstream, in which the data structure signals a rendering capability value that describes a rendering capability to be satisfied by a receiving device to use the MVC operating point, a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the MVC operating point, and a bit rate value that describes a bit rate of the MVC operating point, to determine whether a video decoder of the target device is capable of decoding views corresponding to the MVC operating point based on the decoding capability signaled by the data structure, to determine whether the target device is capable of rendering the views corresponding to the MVC operating point based on the rendering capability signaled by the data structure, and to send the views corresponding to the MVC operating point to the video decoder of the target device when the video decoder of the target device is determined to be able to decode and render the views corresponding to the MVC operating point.
Details of one or more examples are set forth in the accompanying drawings and in the description below. Other features, objects and advantages will be evident from the description and drawings, and from the claims.
Brief Description of Drawings
Figure 1 is a block diagram illustrating an exemplary system in which an audio/video (A/V) source device carries audio and video data to an A/V destination device.
Figure 2 is a block diagram illustrating an exemplary arrangement of components of a multiplexer consistent with this disclosure.
Figure 3 is a block diagram illustrating an exemplary set of program-specific information tables consistent with this disclosure.
Figures 4 through 6 are conceptual diagrams that illustrate several examples of data sets that can be included in an operating point descriptor.
Figure 7 is a conceptual diagram illustrating an exemplary MVC prediction pattern.
Figure 8 is a flowchart illustrating an exemplary method for using a data structure that signals characteristics of an operating point.
Detailed Description of the Invention
The techniques of this disclosure are generally directed towards improving Multiview Video Coding (MVC) in MPEG-2 (Moving Picture Experts Group) systems, that is, systems that conform to MPEG-2 with respect to transport-level details. MPEG-4, for example, provides standards for video encoding, but generally assumes that video encoders conforming to the MPEG-4 standard will use MPEG-2 transport level systems. Therefore, the techniques of this disclosure are applicable to video encoders that conform to MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, or any other video encoding standard that uses MPEG-2 transport streams and/or program streams.
In particular, the techniques of this disclosure can modify transport-level syntax elements for MPEG-2 transport streams and program streams. For example, the techniques of this disclosure include a descriptor, which is transmitted in the transport stream, to describe features of an operating point. A server device, for example, may provide multiple operating points in an MPEG-2 transport layer bitstream, each of which corresponds to a respective subset of particular views of multiview video encoding video data. That is, an operating point usually corresponds to a subset of views of a bit stream. In some examples, each view of an operating point includes video data at the same frame rate.
A target device can use operating point descriptors included in a bitstream to select one of the operating points to be decoded and ultimately presented (e.g., displayed) to a user. Instead of passing data for all views to a video decoder upon receipt, the target device can send only the views of a selected operating point to the video decoder. In this way, the target device can discard data for views that will not be decoded. The target device can select the operating point of the highest quality that it supports among the operating points for a bitstream.
The server device may send a plurality of sub-bitstreams (each of which may correspond to an operating point) in a single transport stream or program stream. Although, in various sections, this disclosure may individually refer to a "transport stream" or a "program stream", it should be understood that the techniques of this disclosure are generally applicable to either or both of MPEG-2 transport streams and program streams. In general, this disclosure describes the use of descriptors as exemplary data structures for carrying out the techniques of this disclosure. Descriptors are used to extend the functionality of a stream. The descriptors of this disclosure can be used by both transport streams and program streams to implement the techniques of this disclosure. Although this disclosure mainly focuses on descriptors as an example data structure that can be used to signal a rendering capability value for an operating point, a decoding capability value for the operating point, and a bit rate value for the operating point, it should be understood that other data structures can also be used to perform these techniques.
In accordance with the techniques of this disclosure, source device 20 can construct an operating point descriptor that describes characteristics of an operating point. Characteristics can include, for example, which views are included in an operating point and frame rates for the views of the operating point. The operating point descriptor can specify a rendering capability, which must be supported by a video decoder in order to receive and decode the operating point, a decoding capability, which must be supported by the video decoder in order to receive and decode the operating point, and a bit rate for the operating point.
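The capability check and operating point selection described above can be sketched as follows. This is only a minimal illustration, not the descriptor syntax of the disclosure: the class names, the field names and the ranking used for "highest quality" (more displayed views, then higher frame rate, then higher bit rate) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OperatingPoint:
    """Hypothetical in-memory form of an operating point descriptor."""
    views_to_display: int  # rendering capability the device must satisfy
    views_to_decode: int   # decoding capability the device must satisfy
    frame_rate: float      # frames per second of the views
    bitrate_kbps: int      # bit rate of the operating point


@dataclass(frozen=True)
class DeviceCapabilities:
    """Hypothetical description of what a target device can handle."""
    max_display_views: int
    max_decode_views: int
    max_frame_rate: float
    max_bitrate_kbps: int


def is_usable(op: OperatingPoint, dev: DeviceCapabilities) -> bool:
    """True when the device satisfies every signaled requirement."""
    return (op.views_to_display <= dev.max_display_views
            and op.views_to_decode <= dev.max_decode_views
            and op.frame_rate <= dev.max_frame_rate
            and op.bitrate_kbps <= dev.max_bitrate_kbps)


def select_operating_point(ops: Sequence[OperatingPoint],
                           dev: DeviceCapabilities) -> Optional[OperatingPoint]:
    """Pick the best usable operating point, or None if none is usable."""
    usable = [op for op in ops if is_usable(op, dev)]
    if not usable:
        return None
    # Assumed quality ranking: displayed views, then frame rate, then bit rate.
    return max(usable, key=lambda op: (op.views_to_display,
                                       op.frame_rate,
                                       op.bitrate_kbps))
```

A stereo-capable device offered one-view, two-view and eight-view operating points would, under these assumptions, select the two-view operating point and drop the data for the remaining views.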
The techniques of this disclosure can generally represent each operating point as if the operating point were its own program, signaled by a program map table in a transport stream or a program stream map in a program stream. Alternatively, when one program contains multiple operating points, the techniques of this disclosure provide information on how operating points should be reassembled in operating point descriptors. The operating point descriptors can further signal dependencies between operating points, which can save bits.
Figure 1 is a block diagram illustrating an exemplary system 10 in which audio/video (A/V) source device 20 carries audio and video data to A/V destination device 40. System 10 of Figure 1 can correspond to a video teleconferencing system, a server/client system, a broadcast/receive system or any other system in which video data is sent from a source device, such as an A/V source device 20, to a target device, such as A/V target device 40. In some examples, A/V source device 20 and A/V target device 40 can perform bidirectional information exchange. That is, A/V source device 20 and A/V destination device 40 may be capable of encoding and decoding (and transmitting and receiving) audio and video data. In some examples, audio encoder 26 may comprise a speech encoder, also referred to as a vocoder.
Source device 20, in the example of Figure 1, comprises audio source 22 and video source 24. Audio source 22 may comprise, for example, a microphone that produces electrical signals representative of captured audio data to be encoded by audio encoder 26. Alternatively, audio source 22 may comprise a storage medium storing previously recorded audio data, an audio data generator such as a computerized synthesizer, or any other source of audio data. Video source 24 may comprise a video camera that produces video data to be encoded by video encoder 28, a storage medium encoded with previously recorded video data, a video data generating unit, or any other source of video data.
Raw audio and video data can include analog or digital data. Analog data can be digitized before being encoded by audio encoder 26 and/or video encoder 28. Audio source 22 can obtain audio data from a speaking participant while the participant is speaking, and video source 24 can simultaneously obtain video data of the speaking participant. In other examples, audio source 22 may comprise a computer-readable storage medium comprising stored audio data, and video source 24 may comprise a computer-readable storage medium comprising stored video data. As such, the techniques described in this disclosure can be applied to live, streaming, real-time audio and video data or to pre-recorded, archived audio and video data.
Audio frames that correspond to video frames are generally audio frames containing audio data that was captured by audio source 22 contemporaneously with video data captured by video source 24 that is contained within the video frames. For example, while a speaking participant generally produces audio data by speaking, audio source 22 captures the audio data, and video source 24 captures video data of the speaking participant at the same time, that is, while audio source 22 is capturing the audio data. Hence, an audio frame can temporally correspond to one or more particular video frames. Consequently, an audio frame corresponding to a video frame generally corresponds to a situation in which audio data and video data were captured at the same time, and for which an audio frame and a video frame comprise, respectively, the audio data and the video data that were captured at the same time.
In some examples, audio encoder 26 can encode a timestamp in each encoded audio frame that represents a time at which the audio data for the encoded audio frame was recorded, and similarly, video encoder 28 can encode a timestamp in each encoded video frame that represents a time at which the video data for the encoded video frame was recorded. In such examples, an audio frame corresponding to a video frame may comprise an audio frame comprising a timestamp and a video frame comprising the same timestamp. A/V source device 20 may include an internal clock, from which audio encoder 26 and/or video encoder 28 may generate the timestamps, or which audio source 22 and video source 24 may use to associate audio and video data, respectively, with a timestamp.
In some examples, audio source 22 may send data to audio encoder 26 corresponding to a time at which audio data was recorded, and video source 24 may send data to video encoder 28 corresponding to a time at which video data was recorded. In some examples, audio encoder 26 can encode a sequence identifier in encoded audio data to indicate a relative temporal ordering of encoded audio data, without necessarily indicating an absolute time at which the audio data was recorded, and similarly, video encoder 28 may also use sequence identifiers to indicate a relative temporal ordering of encoded video data. Similarly, in some examples, a sequence identifier may be mapped to or otherwise correlated with a timestamp.
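The timestamp-based correspondence between audio frames and video frames can be illustrated with a small sketch. The `Frame` type, its fields and the `pair_by_timestamp` helper are hypothetical names chosen for the example, and the timestamps are assumed to be integer clock ticks taken from a shared clock.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Frame:
    """Hypothetical encoded frame carrying a capture timestamp."""
    kind: str        # "audio" or "video"
    timestamp: int   # capture time in clock ticks (assumed unit)
    payload: bytes = b""


def pair_by_timestamp(audio_frames: Sequence[Frame],
                      video_frames: Sequence[Frame]
                      ) -> List[Tuple[Frame, List[Frame]]]:
    """Map each audio frame to the video frames with the same timestamp."""
    video_by_ts: Dict[int, List[Frame]] = defaultdict(list)
    for video in video_frames:
        video_by_ts[video.timestamp].append(video)
    # An audio frame may correspond to zero, one or several video frames.
    return [(audio, video_by_ts.get(audio.timestamp, []))
            for audio in audio_frames]
```

When only sequence identifiers are available, the same lookup could be applied after mapping each identifier to a timestamp.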
The techniques of this disclosure are generally directed towards the transport of encoded multimedia data (e.g., audio and video) and the reception, and subsequent interpretation and decoding, of the transported multimedia data. The techniques of this disclosure are particularly applicable to the transport of Multiview Video Coding (MVC) data, i.e., video data comprising a plurality of views. As shown in the example of Figure 1, video source 24 can provide a plurality of views of a scene to video encoder 28. MVC can be useful for generating three-dimensional video data to be used by a three-dimensional display, such as a stereoscopic or autostereoscopic three-dimensional display.
A/V source device 20 can provide a "service" to A/V target device 40. A service generally corresponds to a subset of available views of MVC data. For example, MVC data may be available for eight views, ordered from zero to seven. One service can match stereo video having two views, while another service can match four views, and yet another service can match all eight views. In general, a service matches any combination (that is, any subset) of the available views. A service can also correspond to a combination of available views as well as audio data. An operating point may correspond to a service such that A/V source device 20 can additionally provide an operating point descriptor for each service provided by A/V source device 20.
A/V source device 20, in accordance with the techniques of this disclosure, is capable of providing services that correspond to a subset of views. In general, a view is represented by a view identifier, also referred to as a "view_id". View identifiers generally comprise syntax elements that can be used to identify a view. An MVC encoder provides the view_id of a view when the view is encoded. The view_id can be used by an MVC decoder for inter-view prediction or by other units for other purposes, e.g., for rendering.
Inter-view prediction is a technique for encoding MVC video data of a frame with reference to one or more frames of different views at a common temporal location with the encoded frame. Figure 7, which is discussed in more detail below, provides an exemplary coding scheme for inter-view prediction. In general, an encoded frame of MVC video data can be predictively encoded spatially, temporally, and/or with reference to frames of other views at a common temporal location. Consequently, reference views, from which other views are predicted, are generally decoded before the views for which the reference views act as a reference, so that these decoded views can be used for reference when decoding the referencing views. The decoding order does not necessarily match the order of the view_ids. Therefore, the decoding order of views is described using view order indices. View order indices are indices that indicate the decoding order of corresponding view components in an access unit.
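The distinction between view_id order and decoding order can be shown with a short sketch. The helper names are hypothetical, and the example assumes the decoding order is already known as a list of view_ids (for instance, view 1 predicted from views 0 and 2, so that view 2 must be decoded before view 1).

```python
from typing import Dict, Iterable, List, Sequence


def view_order_indices(view_ids_in_decoding_order: Sequence[int]) -> Dict[int, int]:
    """Assign each view_id its view order index (position in decoding order)."""
    return {view_id: index
            for index, view_id in enumerate(view_ids_in_decoding_order)}


def sort_for_decoding(view_ids: Iterable[int],
                      order_index: Dict[int, int]) -> List[int]:
    """Order the view components of an access unit by view order index."""
    return sorted(view_ids, key=lambda view_id: order_index[view_id])


# View 1 references views 0 and 2, so the decoding order is 0, 2, 1.
order = view_order_indices([0, 2, 1])
decode_sequence = sort_for_decoding([1, 2, 0], order)
```

Here view 2 receives view order index 1 and view 1 receives view order index 2, so sorting by view_id alone would decode view 1 before its reference view 2.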
Each individual stream of data (whether audio or video) is referred to as an elementary stream. An elementary stream is a single, digitally coded (possibly compressed) component of a program. For example, the coded video or audio part of the program can be an elementary stream. An elementary stream can be converted into a packetized elementary stream (PES) before being multiplexed into a program stream or transport stream. Within the same program, a stream ID is used to distinguish the PES packets belonging to one elementary stream from those of another. The basic unit of data of an elementary stream is a packetized elementary stream (PES) packet. Thus, each view of MVC video data corresponds to respective elementary streams. Similarly, audio data corresponds to one or more respective elementary streams.
An MVC coded video sequence can be separated into several sub-bitstreams, each of which is an elementary stream. Each sub-bitstream can be identified using an MVC view_id subset. Based on the concept of each MVC view_id subset, an MVC video sub-bitstream is defined. An MVC video sub-bitstream contains the NAL units of the views listed in the MVC view_id subset. A program stream generally contains only the NAL units that come from its elementary streams. It is also designed so that no two elementary streams can contain an identical view.
In the example of Figure 1, multiplexer 30 receives elementary streams comprising video data from video encoder 28 and elementary streams comprising audio data from audio encoder 26. In some examples, video encoder 28 and audio encoder 26 may each include packetizers for forming PES packets from encoded data. In other examples, video encoder 28 and audio encoder 26 may each interface with respective packetizers for forming PES packets from encoded data. In still other examples, multiplexer 30 may include packetizers for forming PES packets from encoded audio and video data.
A "program", as used in this disclosure, may comprise a combination of audio data and video data, for example, an audio elementary stream and a subset of available views delivered by a service of A/V source device 20. Each PES packet includes a stream_id that identifies the elementary stream to which the PES packet belongs. Multiplexer 30 is responsible for assembling elementary streams into constituent program streams or transport streams. A program stream and a transport stream are two alternative multiplexes targeting different applications.
In general, a program stream includes data for one program, whereas a transport stream can include data for one or more programs. Multiplexer 30 can encode either or both of a program stream and a transport stream, based on a service being provided, a medium into which the stream will be passed, a number of programs to be sent, or other considerations. For example, when video data is to be encoded onto a storage medium, multiplexer 30 may be more likely to form a program stream, whereas, when video data is to be streamed over a network, broadcast or sent as part of video telephony, multiplexer 30 may be more likely to use a transport stream.
Multiplexer 30 may be biased in favor of using a program stream for the storage and display of a single program from a digital storage service. A program stream is intended for use in error-free environments or environments less susceptible to encountering errors, because program streams are rather susceptible to errors. A program stream simply comprises the elementary streams belonging to it and usually contains packets of variable length. In a program stream, PES packets that are derived from the contributing elementary streams are organized into "packs". A pack comprises a pack header, an optional system header, and any number of PES packets taken from any of the contributing elementary streams, in any order. The system header contains a summary of the characteristics of the program stream, such as its maximum data rate, the number of contributing audio and video elementary streams, additional timing information, or other information. A decoder can use the information contained in a system header to determine whether or not the decoder is capable of decoding the program stream.
Multiplexer 30 can utilize a transport stream for simultaneous delivery of a plurality of programs over potentially error-prone channels. A transport stream is a multiplex designed for multi-program applications, such as broadcast, so that a single transport stream can accommodate many independent programs. A transport stream may comprise a succession of transport packets, with each of the transport packets being 188 bytes in length. Using short, fixed-size packets makes the transport stream less susceptible to errors than the program stream. In addition, each 188-byte long transport packet can be given additional error protection by processing the packet through a standard error protection process, such as Reed-Solomon encoding. The improved error resilience of the transport stream means that it has a better chance of surviving the error-prone channels to be found in a broadcast environment, for example.
It might appear that the transport stream is better than a program stream due to its increased error resilience and ability to carry many simultaneous programs. However, the transport stream is a more sophisticated multiplex than the program stream and is consequently more difficult to create and more complicated to demultiplex than a program stream. The first byte of a transport packet can be a sync byte having a value of 0x47 (hex 47, binary "01000111", decimal 71). A single transport stream can carry many different programs, each program comprising many packetized elementary streams. Multiplexer 30 can use a thirteen-bit Packet Identifier (PID) field to distinguish transport packets containing the data of one elementary stream from those carrying the data of other elementary streams. It is the responsibility of the multiplexer to ensure that each elementary stream is given a unique PID value. The last byte of the transport packet header can carry the continuity count field. Multiplexer 30 increments the value of the continuity count field between successive transport packets belonging to the same elementary stream. This allows a decoder or other unit of a target device, such as A/V target device 40, to detect the loss or gain of a transport packet and hopefully conceal the errors that might otherwise result from such an event.
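The sync byte, PID and continuity count fields described above can be read from a transport packet as in the following sketch. It follows the fixed 4-byte transport packet header of ISO/IEC 13818-1, in which the 13-bit PID spans the second and third bytes and the 4-bit continuity counter occupies the low bits of the fourth byte. The function names are hypothetical, and the continuity check is deliberately simplified: it ignores duplicate packets and packets without payload, for which the counter is not incremented.

```python
SYNC_BYTE = 0x47     # first byte of every transport packet
PACKET_SIZE = 188    # transport packets have a fixed length in bytes


def parse_ts_header(packet: bytes) -> dict:
    """Extract the PID and continuity counter from one transport packet."""
    if len(packet) != PACKET_SIZE:
        raise ValueError("transport packets must be 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing 0x47 sync byte")
    # Low 5 bits of byte 1 are the high bits of the 13-bit PID.
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    # Low 4 bits of byte 3 hold the continuity counter.
    continuity_counter = packet[3] & 0x0F
    return {"pid": pid, "continuity_counter": continuity_counter}


def continuity_ok(previous: int, current: int) -> bool:
    """Simplified check: the 4-bit counter should advance by one, modulo 16."""
    return current == (previous + 1) % 16
```

A demultiplexer would typically keep the last counter value seen per PID and flag a lost or inserted packet whenever `continuity_ok` fails for that PID.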
Multiplexer 30 receives PES packets for the elementary streams of a program from audio encoder 26 and video encoder 28 and forms corresponding network abstraction layer (NAL) units from the PES packets. In the H.264/AVC (Advanced Video Coding) example, encoded video segments are organized into NAL units, which provide a "network friendly" video representation addressing applications such as video telephony, storage, broadcast or streaming. NAL units can be categorized into Video Coding Layer (VCL) NAL units and non-VCL NAL units. VCL units contain the core compression engine and can comprise block, macroblock and/or slice levels. Other NAL units are non-VCL NAL units.
Multiplexer 30 can form NAL units comprising a header that identifies a program to which the NAL unit belongs, as well as a payload, e.g., audio data, video data, or data describing the transport or program stream to which the NAL unit corresponds. For example, in H.264/AVC, a NAL unit includes a 1-byte header and a variable-length payload. In one example, a NAL unit header comprises a priority_id element, a temporal_id element, an anchor_pic_flag element, a view_id element, a non_idr_flag element, and an inter_view_flag element. In conventional MVC, the NAL unit defined by H.264 is retained, except for prefix NAL units and MVC encoded slice NAL units, which include a 4-byte MVC NAL unit header and the NAL unit payload. The priority_id element of a NAL header can be used for a simple one-path bitstream adaptation process. The temporal_id element can be used to specify the temporal level of the corresponding NAL unit, where different temporal levels correspond to different frame rates. The anchor_pic_flag element can indicate whether an image is an anchor image or a non-anchor image. Anchor images and all images succeeding them in output order (i.e., display order) can be correctly decoded without decoding previous images in decoding order (i.e., bitstream order), and so can be used as random access points. Anchor images and non-anchor images can have different dependencies, both of which are signaled in the sequence parameter set. Other flags are discussed and used in the following sections of this chapter. Such an anchor image can also be referred to as an open GOP (Group of Pictures) access point, while a closed GOP access point is also supported when the non_idr_flag element is equal to zero. The non_idr_flag element indicates whether an image is an instantaneous decoder refresh (IDR) or view IDR (V-IDR) image.
In general, an IDR image, and all the images following it in output order or bitstream order, can be correctly decoded without decoding previous images in either decoding order or display order. The view_id element may comprise syntax information that can be used to identify a view, which can be used for data interactivity within an MVC decoder, for example, for inter-view prediction, and outside a decoder, for example, for rendering. The inter_view_flag element can specify whether the corresponding NAL unit is used by other views for inter-view prediction. To convey the 4-byte NAL unit header information for a base view, which can be AVC-compliant, a prefix NAL unit is defined in MVC. In the context of MVC, the base view access unit includes the VCL NAL units of the current time instance of the view, as well as its prefix NAL unit, which contains only the NAL unit header. An H.264/AVC decoder can ignore the prefix NAL unit.
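The header elements described above can be unpacked with a few shifts and masks. The following is a sketch only: the bit positions are an assumption based on the usual MVC NAL unit header extension layout (a one-byte NAL header followed by three bytes carrying non_idr_flag, priority_id, view_id, temporal_id, anchor_pic_flag and inter_view_flag), and the function name `parse_mvc_nal_header` is hypothetical.

```python
def parse_mvc_nal_header(data: bytes) -> dict:
    """Unpack a 4-byte MVC NAL unit header (assumed bit layout)."""
    if len(data) < 4:
        raise ValueError("an MVC NAL unit header is 4 bytes long")
    nal_unit_type = data[0] & 0x1F          # low 5 bits of the first byte
    ext = int.from_bytes(data[1:4], "big")  # 24-bit header extension
    return {
        "nal_unit_type": nal_unit_type,
        "non_idr_flag": (ext >> 22) & 0x1,
        "priority_id": (ext >> 16) & 0x3F,   # 6 bits
        "view_id": (ext >> 6) & 0x3FF,       # 10 bits
        "temporal_id": (ext >> 3) & 0x7,     # 3 bits
        "anchor_pic_flag": (ext >> 2) & 0x1,
        "inter_view_flag": (ext >> 1) & 0x1,
    }


# Illustrative header: priority_id 5, view_id 2, temporal_id 1,
# anchor_pic_flag 1, inter_view_flag 1, trailing reserved bit set.
ext = (5 << 16) | (2 << 6) | (1 << 3) | (1 << 2) | (1 << 1) | 1
sample = bytes([0x74]) + ext.to_bytes(3, "big")
fields = parse_mvc_nal_header(sample)
```

With non_idr_flag equal to zero and anchor_pic_flag equal to one, the sample header would mark an image usable as a random access point in the sense described above.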
A NAL unit including video data in its payload can comprise various levels of video data granularity. For example, a NAL unit may comprise data from a video block, a macroblock, a plurality of macroblocks, a video data slice or an entire frame of video data. Multiplexer 30 can receive encoded video data from video encoder 28 in the form of elementary stream PES packets. Multiplexer 30 can associate each elementary stream with a corresponding program by mapping stream IDs to corresponding programs, for example, in a database or other data structure, such as a Program Map Table (PMT) or Program Stream Map (PSM).
Multiplexer 30 can also assemble access units from a plurality of NAL units. In general, an access unit may comprise one or more NAL units to represent data from a video frame, in addition to audio data corresponding to the frame when such audio data is available. An access unit typically includes all NAL units for an output time instance, for example, all audio and video data for a time instance. For example, if each view has a frame rate of 20 frames per second (fps), then each instance of time can correspond to a time span of 0.05 seconds. During this time interval, specific frames for all views of the same access unit (the same time instance) can be rendered simultaneously.
In an example corresponding to H.264/AVC, an access unit can comprise an encoded image at a time instance, which can be presented as a primary coded image. Consequently, an access unit may comprise all audio and video frames of a common temporal instance, for example, all views corresponding to time X. This disclosure also refers to an encoded image of a particular view as a "view component". That is, a view component can comprise an image (or frame) encoded for a particular view at a particular time. Consequently, an access unit can be defined as comprising all view components of a common temporal instance. The decoding order of access units does not necessarily have to be the same as the display or output order.
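The definition of an access unit as all view components of a common temporal instance can be illustrated with a short sketch. The tuple layout `(time_instance, view_id, payload)` and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_access_units(view_components):
    """Group (time_instance, view_id, payload) tuples by time instance."""
    units = defaultdict(dict)
    for time_instance, view_id, payload in view_components:
        units[time_instance][view_id] = payload
    return dict(units)


def time_instance_duration(frame_rate_fps: float) -> float:
    """Length in seconds of one time instance at the given frame rate."""
    return 1.0 / frame_rate_fps


# Two views, two time instances: each access unit holds both view components.
components = [(0, 0, b"v0t0"), (0, 1, b"v1t0"), (1, 0, b"v0t1"), (1, 1, b"v1t1")]
access_units = build_access_units(components)
```

At 20 frames per second, `time_instance_duration(20)` gives the 0.05-second interval during which the frames of one access unit can be rendered together.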
Multiplexer 30 can also embed data regarding a program in a NAL unit. For example, multiplexer 30 can create a NAL unit comprising a Program Map Table (PMT) or a Program Stream Map (PSM). In general, a PMT is used to describe a transport stream, whereas a PSM is used to describe a program stream. As described in more detail with respect to the example of Figure 2, below, multiplexer 30 may comprise or interact with a data storage unit that associates elementary streams received from audio encoder 26 and video encoder 28 with programs and, consequently, with respective transport streams and/or program streams.
As with most video encoding standards, H.264/AVC defines the syntax, semantics and decoding processes for error-free bitstreams, any of which conform to a certain profile or level. H.264/AVC does not specify the encoder, but the encoder is in charge of ensuring that the generated bitstreams are compliant with the standard for a decoder. In the context of the video encoding standard, a "profile" is a subset of algorithms, features or tools, and restrictions that apply to them. As defined by the H.264 standard, for example, a "profile" is a subset of the entire bitstream syntax that is specified by the H.264 standard. A "level" corresponds to decoder resource consumption limitations, such as, for example, computation and decoder memory, which are related to picture resolution, bit rate and macroblock processing rate (MB). The H.264 standard, for example, recognizes that, within the limits imposed by the syntax of a given profile, it is still possible to require a large variation in the performance of encoders and decoders, depending on the values taken by syntax elements in the bitstream, such as the specified size of the decoded images. The H.264 standard still recognizes that, in many applications, it is neither practical nor economical to implement a decoder capable of handling all the hypothetical uses of syntax within a particular profile. Consequently, the H.264 standard defines a "level" as a specific set of restrictions imposed on the values of syntax elements in the bitstream. These restrictions can be simple limits on values. Alternatively, these restrictions can take the form of restrictions on arithmetic combinations of values (eg image width multiplied by image height multiplied by the number of images decoded per second). The H.264 standard further provides that individual implementations can support a different level for each supported profile.
A profile-conforming decoder ordinarily supports all features defined in the profile. For example, as an encoding feature, B-picture coding is not supported in the baseline profile of H.264/AVC, but it is supported in other profiles of H.264/AVC. A level-conforming decoder must be able to decode any bitstream that does not require resources beyond the limits defined in the level. Profile and level definitions can be useful for interoperability. For example, during video streaming, a pair of level and profile definitions can be negotiated and agreed upon for an entire streaming session. More specifically, in H.264/AVC, a level can define, for example, limitations on the number of macroblocks that need to be processed, decoded picture buffer (DPB) size, coded picture buffer (CPB) size, vertical motion vector range, maximum number of motion vectors per two consecutive MBs, and whether a B block can have sub-macroblock partitions smaller than 8x8 pixels. In this way, a decoder can determine whether it is able to correctly decode the bitstream.
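A level constraint of the arithmetic kind mentioned above (image width multiplied by image height multiplied by images per second) can be checked as in the following sketch. The limit values in `EXAMPLE_LIMITS` are illustrative placeholders rather than values quoted from the text, and the names `LevelLimits` and `fits_level` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelLimits:
    max_frame_size_mbs: int   # macroblocks allowed in one frame
    max_mbs_per_second: int   # macroblock processing rate


def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover a frame."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def fits_level(width: int, height: int, fps: int, limits: LevelLimits) -> bool:
    """Check the frame-size and processing-rate constraints of a level."""
    frame_mbs = macroblocks_per_frame(width, height)
    return (frame_mbs <= limits.max_frame_size_mbs
            and frame_mbs * fps <= limits.max_mbs_per_second)


# Illustrative limits only; a real decoder would use the limits of its level.
EXAMPLE_LIMITS = LevelLimits(max_frame_size_mbs=3600, max_mbs_per_second=108000)
```

Under these placeholder limits, a 1280x720 stream at 30 frames per second (3600 macroblocks per frame) passes, while a 1920x1080 stream does not.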
Parameter sets typically contain sequence layer header information in sequence parameter sets (SPS) and the infrequently changing picture layer header information in picture parameter sets (PPS). With parameter sets, this infrequently changing information does not need to be repeated for every sequence or image; hence, coding efficiency can be improved. Furthermore, the use of parameter sets can allow out-of-band transmission of header information, avoiding the need for redundant transmissions to achieve error resilience. In out-of-band transmission, parameter set NAL units are transmitted on a different channel than other NAL units.
The MPEG-2 Systems standard allows for system extensions through "descriptors". PMTs and PSMs include descriptor loops, into which one or more descriptors can be inserted. In general, a descriptor can comprise a data structure that can be used to extend the definition of programs and/or program elements. This disclosure describes operating point descriptors for performing the techniques of this disclosure. In general, the operating point descriptor of this disclosure improves on the conventional MVC extension descriptor by describing a rendering capability, a decoding capability, and a bit rate of an operating point. A target device, such as A/V target device 40, can use the operating point descriptors for each operating point to select one of the operating points of a bitstream to be decoded.
Each PMT or PSM can include an operating point descriptor that describes characteristics of an operating point. For example, source device 20 may provide the operating point descriptor to signal a rendering capability value that describes a rendering capability to be satisfied by client device 40. For client device 40 to correctly render (e.g., display) video data from the operating point, client device 40 must satisfy the rendering capabilities signaled by the rendering capability value. The rendering capability value can describe, for example, a number of views to be displayed (for example, a number of views targeted for rendering) and/or the frame rate of the video data for the views. Thus, client device 40 can determine that the rendering capabilities are satisfied when video output 44 of client device 40 is capable of displaying the number of views of the operating point at the frame rate specified by the operating point descriptor.
In examples where source device 20 transmits an MVC bitstream using broadcast or multicast protocols, source device 20 can package the entire MVC bitstream into transport streams, which can be received by client devices having various rendering capabilities. For example, some three-dimensional programs may have different numbers of views (for example, two views, four views, six views, or eight views), and various devices may be capable of using anywhere from one to four pairs of views. Thus, each client device can determine which operating point to use based on the number of views that the client device can display. For example, client device 40 can determine which of the operating points to use by determining a number of views that can be displayed by video output 44 and a frame rate at which video output 44 is capable of displaying video data, and then selecting an operating point based on the rendering capabilities of video output 44.
In examples where the source device transmits an MVC bitstream using a unicast protocol, client device 40 can establish a session corresponding to a program with an acceptable number of views by checking the rendering capability specified in the corresponding operating point descriptors. Similarly, in examples where the MVC bitstream is encoded on a computer-readable storage medium for local playback, client device 40 may select a suitable program by checking the rendering capability specified in the operating point descriptors of the PMTs or PSMs.
Source device 20 may also provide a decoding capability value in an operating point descriptor. The number of views to be decoded may not necessarily be the same as the number of views to be displayed. Hence, the operating point descriptor can separately signal the number of views to be displayed and the number of views to be decoded for the operating point. In addition, the operating point descriptor can specifically identify the views corresponding to the operating point. Certain client devices may prefer particular views for various purposes, e.g., based on viewing angle.
Consequently, client device 40 can be configured to select an operating point based on which views are available at the operating point.
In some examples, the decoding capabilities signaled in the operating point descriptor may additionally or alternatively specify a profile and level to which the operating point corresponds. In examples where source device 20 transmits the bitstream using broadcast or multicast protocols, several client devices with different decoding capabilities may receive the bitstream. For example, some decoders might only be able to decode two views at 30 fps, while others might be able to decode four views at 60 fps. In examples in which source device 20 transmits the bitstream using a unicast protocol, client device 40 can establish a suitable session (for a specific three-dimensional program) after checking the decoding capability specified in the descriptors in PMTs. Similarly, for local playback, client device 40 can select a suitable program by checking the decoding capability specified in the operating point descriptors of PMTs or PSMs.
Source device 20 may additionally signal bit rate information in the operating point descriptor. The bit rate information can describe one or both of the average bit rate and the maximum bit rate for the operating point. For example, when source device 20 transmits the bitstream using a unicast protocol, the channel used to transmit the bitstream may be limited in terms of bandwidth. Consequently, client device 40 can select an operating point having a maximum or average bit rate that the communication channel can tolerate.
In some examples, source device 20 may further specify the operating point frame rate in the operating point descriptor. Certain views of the operating point may have frame rates that do not match the operating point frame rate. Thus, client device 40 can determine the operating point frame rate and the frame rate of such a view to facilitate reassembling the decoded video data for display. In various examples, when the frame rates of two operating points do not match, client device 40 may drop frames from views of the operating point having the higher frame rate or interpolate frames from views of the operating point having the lower frame rate.
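Dropping frames from a view with the higher frame rate can be sketched as simple decimation. This is a minimal illustration that assumes the higher rate is an integer multiple of the target rate; the function name `drop_to_rate` is hypothetical, and interpolation for the lower-rate case is not shown.

```python
def drop_to_rate(frames, view_fps: int, target_fps: int):
    """Keep every n-th frame so a view at view_fps plays at target_fps."""
    if target_fps <= 0 or view_fps % target_fps != 0:
        raise ValueError("this sketch handles integer rate ratios only")
    step = view_fps // target_fps
    return frames[::step]


# Eight frames captured at 60 fps reduced to match a 30 fps operating point.
reduced = drop_to_rate(list(range(8)), view_fps=60, target_fps=30)
```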
Typically, an elementary stream includes flags "no_sei_nal_unit_present" and "no_prefix_nal_unit_present" that describe, respectively, whether the elementary stream includes SEI messages and prefix NAL units. This disclosure proposes that client devices, such as client device 40, infer whether SEI messages and/or prefix NAL units are present within an operating point, rather than having these values explicitly signaled for the operating point. To determine whether SEI messages are present at an operating point, client device 40 can determine whether the maximum value of the no_sei_nal_unit_present values of the elementary streams for the operating point is equal to one. Similarly, to determine whether prefix NAL units are present at the operating point, client device 40 can determine whether the maximum value of the no_prefix_nal_unit_present values of the elementary streams for the operating point is equal to one.
The examples discussed above have focused on operating point descriptors included for each operating point of an MVC bitstream. As an alternative, source device 20 can provide MVC extension descriptors that signal similar data. For example, source device 20 may associate more than one MVC extension descriptor with an MVC video sub-bitstream that corresponds to an elementary stream. Source device 20 can specify, in the MVC extension descriptor for a sub-bitstream, a frame rate, a view_id subset of the views to be displayed, and a number of views to be decoded. Source device 20 can further signal a mapping between the MVC extension descriptors and the corresponding operating point.
Video compression standards such as ITU-T H.261, H.262, H.263, MPEG-1, MPEG-2 and H.264/MPEG-4 Part 10 make use of motion compensated temporal prediction to reduce temporal redundancy. The encoder uses a motion compensated prediction from some previously encoded images (also referred to here as frames) to predict the current encoded images according to motion vectors. There are three main types of image in typical video encoding. They are intracoded images ("I images" or "I frames"), predicted images ("P images" or "P frames"), and bidirectionally predicted images ("B images" or "B frames"). P images use only reference images that precede the current image in temporal order. In a B image, each block can be predicted from one or two reference images. These reference images can be located before or after the current image in temporal order.
According to the H.264 encoding standard, as an example, B-pictures use two lists of previously encoded reference pictures, list 0 and list 1. These two lists can each contain past and/or future coded pictures in temporal order. Blocks in a B-picture can be predicted in one of several ways: motion compensated prediction from a list 0 reference image, motion compensated prediction from a list 1 reference image, or motion compensated prediction from the combination of both list 0 and list 1 reference images. To obtain the combination of both list 0 and list 1 reference images, two motion compensated reference areas are obtained from the list 0 and list 1 reference images, respectively. Their combination is used to predict the current block. The ITU-T H.264 standard supports intraprediction in various block sizes, such as 16 by 16, 8 by 8 or 4 by 4 for luma components and 8x8 for chroma components, as well as interprediction in various block sizes, such as 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4 for luma components and corresponding scaled sizes for chroma components. In this disclosure, "x" and "by" may be used interchangeably to refer to the pixel dimensions of the block in terms of vertical and horizontal dimensions, for example, 16x16 pixels or 16 by 16 pixels. In general, a 16x16 block will have 16 pixels in a vertical direction (y = 16) and 16 pixels in a horizontal direction (x = 16). Likewise, an NxN block typically has N pixels in a vertical direction and N pixels in a horizontal direction, where N represents a non-negative integer value. Pixels in a block can be arranged in rows and columns.
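One way to combine the two motion compensated reference areas is a rounded per-pixel average. The text does not specify the combination rule, so the averaging below is an assumption made for illustration, and the function name `bi_predict` is hypothetical.

```python
def bi_predict(area0, area1):
    """Combine list 0 and list 1 reference areas by rounded averaging.

    Both arguments are equally sized lists of pixel rows. Averaging is an
    assumed combination rule; weighted schemes are also possible.
    """
    return [
        [(a + b + 1) >> 1 for a, b in zip(row0, row1)]
        for row0, row1 in zip(area0, area1)
    ]


# A 1x2 block predicted from one list 0 area and one list 1 area.
predicted = bi_predict([[100, 50]], [[101, 60]])
```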
Block sizes that are smaller than 16 by 16 may be referred to as partitions of a 16 by 16 macroblock. Video blocks may comprise blocks of pixel data in the pixel domain or blocks of transform coefficients in the transform domain, for example, following application of a transform, such as a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform, to residual video block data representing pixel differences between coded video blocks and predictive video blocks. In some cases, a video block may comprise blocks of quantized transform coefficients in the transform domain.
Smaller video blocks can provide better resolution and can be used for locations within a video frame that include high levels of detail. In general, macroblocks and the various partitions, sometimes referred to as subblocks, can be considered video blocks. In addition, a slice can be thought of as a plurality of video blocks, such as macroblocks and/or subblocks. Each slice can be an independently decodable unit of a video frame. Alternatively, frames themselves can be decodable units, or other parts of a frame can be defined as decodable units. The terms "coded unit" or "coding unit" can refer to any independently decodable unit of a video frame, such as an entire frame, a slice of a frame, a group of pictures (GOP), also referred to as a sequence, or other independently decodable unit defined in accordance with applicable coding techniques. The term macroblock refers to a data structure for encoding image and/or video data according to a two-dimensional pixel array comprising 16x16 pixels. Each pixel comprises a chrominance component and a luminance component. Consequently, the macroblock can define four luminance blocks, each comprising a two-dimensional array of 8x8 pixels, two chrominance blocks, each comprising a two-dimensional array of 16x16 pixels, and a header comprising syntax information such as a coded block pattern (CBP), an encoding mode (e.g., intra (I) or inter (P or B) encoding modes), a partition size for partitions of an intracoded block (e.g., 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 or 4x4) or one or more motion vectors for an intercoded macroblock.
Video encoder 28, video decoder 48, audio encoder 26, audio decoder 46, multiplexer 30 and demultiplexer 38 can each be implemented as any of a variety of suitable encoder or decoder circuitry, as applicable, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic circuitry, software, hardware, firmware, or any combinations thereof. Each of video encoder 28 and video decoder 48 can be included in one or more encoders or decoders, either of which can be integrated as part of a combined video encoder/decoder (CODEC). Likewise, each of audio encoder 26 and audio decoder 46 can be included in one or more encoders or decoders, either of which can be integrated as part of a combined CODEC. An apparatus including video encoder 28, video decoder 48, audio encoder 26, audio decoder 46, multiplexer 30 and/or demultiplexer 38 may comprise an integrated circuit, a microprocessor and/or a wireless communication device such as a cell phone.
The techniques of this disclosure may offer certain advantages over conventional techniques for MVC bit substreams, which do not provide features for signaling operating points. Each substream of bits can include one or more views of the corresponding bitstream. In some cases, an operating point may correspond to views of different bitstreams. The techniques in this disclosure provide an operating point descriptor that identifies the corresponding operating point views.
After multiplexer 30 has assembled a NAL unit and/or an access unit from received data, multiplexer 30 passes the unit to output interface 32 for output. Output interface 32 may comprise, for example, a transmitter, a transceiver, a device for writing data to a computer-readable medium, such as, for example, an optical drive, a magnetic media drive (e.g., a floppy disk drive), a universal serial bus (USB) port, a network interface, or other output interface. Output interface 32 outputs the NAL unit or the access unit to a computer-readable medium 34, such as, for example, a transmission signal, a magnetic medium, an optical medium, a memory, a flash drive or other computer-readable medium.
Ultimately, input interface 36 retrieves the data from computer-readable medium 34. Input interface 36 may comprise, for example, an optical drive, a magnetic media drive, a USB port, a receiver, a transceiver or other computer-readable medium interface. Input interface 36 can provide the NAL unit or access unit to demultiplexer 38. Demultiplexer 38 can demultiplex a transport stream or program stream into constituent PES streams, depacketize the PES streams to retrieve encoded data, and send the encoded data to audio decoder 46 or video decoder 48, depending on whether the encoded data is part of an audio or video stream, for example, as indicated by PES packet headers of the stream. Audio decoder 46 decodes encoded audio data and sends the decoded audio data to audio output 42, while video decoder 48 decodes encoded video data and sends the decoded video data, which may include a plurality of views of a stream, to video output 44. Video output 44 may comprise a display that uses a plurality of views of a scene, for example, a stereoscopic or autostereoscopic display that presents each view of a scene simultaneously.
In particular, demultiplexer 38 can select an operating point from a received bitstream. For example, demultiplexer 38 can compare characteristics of the operating points of the bitstream to select an appropriate operating point to be used by A/V target device 40. In general, demultiplexer 38 may attempt to select the operating point that would provide the highest quality viewing experience for a user and that can be decoded by video decoder 48. For example, demultiplexer 38 can compare the rendering capabilities and decoding capabilities of video decoder 48 to the suggested rendering and decoding capabilities signaled by the operating point descriptors of the bitstream. Of the operating points that demultiplexer 38 determines could be correctly decoded by video decoder 48, demultiplexer 38 can select an operating point that will provide the highest quality video data, e.g., the highest bit rate and/or frame rate. In other examples, demultiplexer 38 may select one of the supported operating points based on other considerations, such as, for example, power consumption.
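The selection logic described above can be sketched as a filter followed by a ranking. The field names, the numeric `level` encoding, and the tie-breaking order (bit rate first, then frame rate) are assumptions made for illustration and are not the descriptor syntax of this disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OperatingPoint:
    views_to_display: int   # rendering capability
    views_to_decode: int    # decoding capability
    frame_rate: int
    level: int              # hypothetical numeric level indicator
    max_bitrate_kbps: int


@dataclass(frozen=True)
class DeviceCapabilities:
    max_display_views: int
    max_decode_views: int
    max_frame_rate: int
    max_level: int
    channel_kbps: int


def usable(op: OperatingPoint, dev: DeviceCapabilities) -> bool:
    """True when the device satisfies every signaled capability."""
    return (op.views_to_display <= dev.max_display_views
            and op.views_to_decode <= dev.max_decode_views
            and op.frame_rate <= dev.max_frame_rate
            and op.level <= dev.max_level
            and op.max_bitrate_kbps <= dev.channel_kbps)


def select_operating_point(points: Sequence[OperatingPoint],
                           dev: DeviceCapabilities) -> Optional[OperatingPoint]:
    """Pick the usable point with the highest bit rate, then frame rate."""
    candidates = [p for p in points if usable(p, dev)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: (p.max_bitrate_kbps, p.frame_rate))


point_a = OperatingPoint(2, 2, 30, 31, 3000)
point_b = OperatingPoint(2, 3, 30, 40, 4500)
point_c = OperatingPoint(4, 4, 60, 42, 9000)
device = DeviceCapabilities(2, 3, 30, 40, 5000)
chosen = select_operating_point([point_a, point_b, point_c], device)
```

A device that prefers low power consumption could swap the `key` function for one that ranks candidates by the number of views to decode instead.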
Figure 2 is a block diagram illustrating an exemplary arrangement of components of multiplexer 30 (Figure 1). In the example of Figure 2, multiplexer 30 includes stream management unit 60, video input interface 80, audio input interface 82, multiplexed stream output interface 84, and program specific information tables 88. Stream management unit 60 includes NAL unit constructor 62, PMT constructor 64, stream identifier (stream ID) lookup unit 66, and program identifier (PID) assignment unit 68.
In the example of Figure 2, video input interface 80 and audio input interface 82 include respective packetizers for forming PES units from encoded video data and encoded audio data. In other examples, audio and/or video packetizers may be included in a unit or module that is external to multiplexer 30. With respect to the example of Figure 2, video input interface 80 may form PES packets from encoded video data received from video encoder 28, and audio input interface 82 can form PES packets from encoded audio data received from audio encoder 26.
Stream management unit 60 receives PES packets from video input interface 80 and audio input interface 82. Each PES packet includes a stream ID identifying the elementary stream to which the PES packet belongs. Stream ID lookup unit 66 can determine the program to which a PES packet corresponds by querying program specific information tables 88. That is, stream ID lookup unit 66 can determine to which program a received PES packet corresponds. Each program can include a plurality of elementary streams, whereas in general, an elementary stream corresponds to just one program. However, in some examples, an elementary stream can be included in a plurality of programs. Each PES packet can be included in a plurality of output streams from multiplexer 30, as various services can each include various subsets of the available audio and video streams. Consequently, stream ID lookup unit 66 can determine whether a PES packet should be included in one or more output streams (e.g., one or more transport or program streams), and particularly in which of the output streams to include the PES packet.
In one example, each elementary stream corresponds to a program. Multiplexer 30 can be responsible for ensuring that each elementary stream is associated with a particular program, and, consequently, with a program ID (PID). When a PES packet is received including a stream ID that is not recognized by multiplexer 30 (e.g., a stream ID not stored in program specific information tables 88), PID assignment unit 68 creates one or more new entries in program specific information tables 88 to associate the new stream ID with an unused PID. After determining a program to which a PES packet corresponds, NAL unit constructor 62 forms a NAL unit comprising the PES packet, for example, by encapsulating the PES packet with a NAL unit header including the PID of the program to which the stream ID of the PES packet corresponds. In some examples, NAL unit constructor 62, or another sub-unit of stream management unit 60, may form an access unit comprising a plurality of NAL units. PMT constructor 64 creates Program Map Tables (PMTs) for a corresponding output stream from multiplexer 30 using information from program specific information tables 88. In another example, stream management unit 60 may comprise a PSM constructor to create program stream maps for a program stream output by multiplexer 30. In some examples, multiplexer 30 may comprise both PMT constructor 64 and a PSM constructor and output one or both of a transport stream and a program stream. In the example of Figure 2, PMT constructor 64 can construct a PMT including the new descriptors described by this disclosure, e.g., an operating point descriptor, as well as any other descriptors and PMT data required for the PMT. PMT constructor 64 may periodically, for example after a certain period of time or after a certain amount of data has been transmitted, send a subsequent PMT for the transport stream.
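The handling of an unrecognized stream ID can be sketched as a lookup table that hands out the next unused PID on a miss. The class name `ProgramTables`, the method `pid_for`, and the PID range starting at 0x0100 are hypothetical choices for illustration; the text does not state which PID values are used.

```python
class ProgramTables:
    """Minimal stand-in for a stream-ID-to-PID table (hypothetical)."""

    def __init__(self, first_pid: int = 0x0100, last_pid: int = 0x1FFE):
        self._pid_by_stream = {}
        self._next_pid = first_pid   # assumed start of the free PID range
        self._last_pid = last_pid    # stay within the 13-bit PID space

    def pid_for(self, stream_id: int) -> int:
        """Return the PID for a stream ID, assigning an unused one if new."""
        if stream_id not in self._pid_by_stream:
            if self._next_pid > self._last_pid:
                raise RuntimeError("no unused PID values remain")
            self._pid_by_stream[stream_id] = self._next_pid
            self._next_pid += 1
        return self._pid_by_stream[stream_id]


tables = ProgramTables()
video_pid = tables.pid_for(0xE0)        # first sighting: new entry created
audio_pid = tables.pid_for(0xC0)        # second stream gets the next PID
video_pid_again = tables.pid_for(0xE0)  # known stream: existing PID reused
```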
PMT constructor 64 can pass created PMTs to NAL unit constructor 62 to form a NAL unit comprising the PMT, for example, by encapsulating the PMT with a corresponding NAL unit header including the corresponding PID. PMT constructor 64 can create a data structure, such as an operating point descriptor, for each operating point in a program. The data structure created by PMT constructor 64 can signal a rendering capability value that describes a rendering capability to be satisfied by a receiving device to use the operating point, a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the operating point, and a bit rate value that describes a bit rate of the operating point. For example, PMT constructor 64 can determine a number of views to be displayed for an operating point and a frame rate for the views of the operating point based on information stored in program specific information tables 88 or information received from video encoder 28 through video input interface 80. PMT constructor 64 can signal one or both of the number of views and the frame rate for the views of the operating point using the rendering capability value of the data structure. PMT constructor 64 can also determine a number of views to be decoded for the operating point and a level value for a profile to which the views of the operating point correspond. For example, PMT constructor 64 can determine a number of macroblocks that need to be processed, a decoded picture buffer size, a coded picture buffer size, a vertical motion vector range, a maximum number of motion vectors per two consecutive macroblocks, and/or whether a B block can have sub-macroblock partitions smaller than 8x8 pixels, and use these determinations to determine the level for the operating point. PMT constructor 64 can receive this information from video encoder 28 via video input interface 80.
PMT constructor 64 can then represent the number of views to be decoded and/or the level value of the profile using the decoding capability value for the operating point. PMT constructor 64 can further determine a bit rate value for the operating point and encode the bit rate value in the data structure. The bit rate value can correspond to an average bit rate or a maximum bit rate for the operating point. PMT constructor 64 can calculate the bit rate for the operating point or receive an indication of the bit rate from video encoder 28.
Multiplexed stream output interface 84 may receive one or more NAL units and/or access units from stream management unit 60, e.g., NAL units comprising PES packets (e.g., audio or video data) and/or NAL units comprising a PMT. In some examples, multiplexed stream output interface 84 may form access units from one or more NAL units corresponding to a common temporal location after the NAL units are received from stream management unit 60. Multiplexed stream output interface 84 transmits the NAL units or access units as output in a corresponding transport stream or program stream. Multiplexed stream output interface 84 can also receive the data structure from PMT constructor 64 and include the data structure as part of the bitstream.
Figure 3 is a block diagram illustrating an exemplary set of program specific information tables 88. The elementary stream to which a transport packet belongs can be determined based on the PID value of the transport packet. In order for a decoder to correctly decode received data, the decoder must be able to determine which elementary streams belong to each program. Program specific information, as included in program specific information tables 88, may explicitly specify relationships between programs and their component elementary streams. In the example of Figure 3, program specific information tables 88 include network information table 100, conditional access table 102, program association table 104, and program map table 106. For the example of Figure 3, it is assumed that the output stream comprises an MPEG-2 transport stream. In an alternative example, the output stream may comprise a program stream, in which case program map table 106 may be replaced by a program stream map.
The MPEG-2 Systems specification specifies that every program carried in a transport stream has a program map table, such as program map table 106, associated with it. Program map table 106 may include details about the program and the elementary streams that the program includes. As an example, a program, identified as program number 3, can contain a video elementary stream with PID 33, an English audio stream with PID 57, and a Chinese audio stream with PID 60. A PMT is permitted to include more than one program.
The basic program map table specified by the MPEG-2 Systems specification may be embellished with any of the many descriptors, e.g., descriptors 108, specified within the MPEG-2 Systems specification. Descriptors 108 may include any or all of the descriptors specified in the MPEG-2 Systems specification. In general, descriptors, such as descriptors 108, carry additional information about a program or its component elementary streams or sub-bitstreams. Descriptors can include video encoding parameters, audio encoding parameters, language identification, pan-and-scan information, conditional access details, copyright information, or other information. A broadcaster or other user can define additional private descriptors.
This disclosure provides an operating point descriptor for describing characteristics of an operating point in a bitstream conforming to MPEG-2 Systems. Descriptors 108 may include an operating point descriptor for each operating point of the corresponding bitstream. As shown in Fig. 3, descriptors 108 include MVC extension descriptors 110, hierarchy descriptor 112, and operating point descriptors 114. Each of operating point descriptors 114 may correspond to a particular operating point of a bitstream and signal, for that operating point, a rendering capability value that describes a rendering capability to be satisfied by a receiving device to use the operating point, a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the operating point, and a bit rate value that describes a bit rate of the operating point. For video-related component elementary streams, there is also a hierarchy descriptor, which provides information to identify the program elements containing components of hierarchically coded video, audio, and private streams.
Table 1 below provides an example of data included in MVC extension descriptors 110. The various fields and bit depths of the fields shown in Table 1 are merely an example. In one example, each MVC video bitstream is associated with a corresponding one of MVC extension descriptors 110, which specifies the characteristics of that MVC video bitstream. Decoding an MVC video bitstream may require assembling it with another MVC video bitstream. That is, in order to decode and present a particular bitstream, a client device may need to extract and decode video data from other bitstreams carried in a common bitstream that includes both.
Table 1 - MVC extension descriptor

In the example of Table 1, the descriptor tag field may correspond to an eight-bit descriptor tag field that is included in every descriptor, as established by the MPEG-2 Systems standard, to identify the particular descriptor. The MPEG-2 Systems standard defines certain descriptor tags and marks other descriptor tag values, for example, values 36 to 63, as "reserved". Amendment 4 to the MPEG-2 Systems standard proposes, however, to set the descriptor tag of the MVC extension descriptor to "49", which corresponds to one of the reserved descriptor tags as specified in the MPEG-2 Systems specification. This disclosure thus proposes to set the value of the descriptor_tag of MVC extension descriptors 110 to a value of "49". Likewise, the descriptor length field may correspond to an eight-bit descriptor length field that is also included in every descriptor, as established by the MPEG-2 Systems standard. Multiplexer 30 can set the value of the descriptor length field equal to the number of bytes of the corresponding MVC extension descriptor 110 immediately following the descriptor length field. Because the length of an MVC extension descriptor does not change, multiplexer 30 can set the value of the descriptor length field for each of MVC extension descriptors 110 to a value of eight, to represent the presence of eight bytes of information following the descriptor length field.
The average bit rate field may comprise a sixteen-bit field that indicates the average bit rate, in kilobits per second, of a reassembled AVC video stream. That is, the average bit rate field describes the average bit rate for a video stream when the video stream is assembled from the constituent parts of the transport stream or program stream to which the one of MVC extension descriptors 110 corresponds. In some examples, multiplexer 30 may set the value of the average bit rate field to zero to indicate that the average bit rate is not indicated by the one of MVC extension descriptors 110.
The maximum bit rate field may comprise a sixteen-bit field that indicates the maximum bit rate, in kilobits per second, of the reassembled AVC video stream. That is, the maximum bit rate field describes the maximum bit rate for a video stream when the video stream is assembled from the constituent parts of the transport stream or program stream to which the one of MVC extension descriptors 110 corresponds. In some examples, multiplexer 30 may set the value of the maximum bit rate field to zero to indicate that the maximum bit rate is not indicated by the one of MVC extension descriptors 110.
The minimum view order index field may comprise a ten-bit field which indicates the minimum view order index value of all NAL units contained in the associated MVC video bitstream. Similarly, the maximum view order index field is a ten-bit field that indicates the maximum view order index value of all NAL units contained in the associated MVC video bitstream.
The start temporal ID field may comprise a three-bit field that indicates the minimum value of the temporal_id NAL unit header syntax element of all NAL units contained in the associated MVC video bitstream. That is, a temporal ID value is included in the header of each NAL unit. In general, the temporal ID value corresponds to a particular frame rate, where larger temporal ID values correspond to relatively higher frame rates. For example, a value of '0' for a temporal ID may correspond to a frame rate of 15 frames per second (fps), and a value of '1' for a temporal ID may correspond to a frame rate of 30 fps. In this way, gathering all pictures having a temporal ID of 0, in this example, into a set can be used to form a video segment having a frame rate of 15 fps, while gathering all pictures having a temporal ID of 0 and all pictures having a temporal ID of 1 into a different set can be used to form a different video segment having a frame rate of 30 fps. Multiplexer 30 determines the smallest temporal ID of all NAL units of the MVC video bitstream and sets the value of the start temporal ID field equal to this smallest temporal ID value.
The end temporal ID field may comprise a three-bit field that indicates the maximum value of the temporal_id NAL unit header syntax element of all NAL units contained in the associated MVC video bitstream. Accordingly, multiplexer 30 determines the largest temporal ID of all NAL units of the MVC video bitstream and sets the value of the end temporal ID field equal to this largest temporal ID value.
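The relationship between temporal ID values, frame-rate subsets, and the start and end temporal ID fields can be illustrated with a short Python sketch. The picture labels and the helper name `select_by_temporal_id` are hypothetical, and the 15 fps and 30 fps figures simply follow the example given in the text.

```python
# (temporal_id, picture label) pairs in output order; labels are hypothetical.
pictures = [(0, "p0"), (1, "p1"), (0, "p2"), (1, "p3")]


def select_by_temporal_id(pictures, max_temporal_id):
    """Keep every picture whose temporal_id does not exceed the target."""
    return [label for tid, label in pictures if tid <= max_temporal_id]


low_rate = select_by_temporal_id(pictures, 0)   # the 15 fps subset in the example
full_rate = select_by_temporal_id(pictures, 1)  # the 30 fps subset in the example

# Values a multiplexer would write into the start and end temporal ID fields.
temporal_id_start = min(tid for tid, _ in pictures)
temporal_id_end = max(tid for tid, _ in pictures)
```

Here the subset for temporal ID 0 contains half as many pictures as the full set, which matches the 15 fps versus 30 fps relationship in the example.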
The no_sei_nal_unit_present field may comprise a one-bit flag which, when set to '1', indicates that no supplemental enhancement information (SEI) NAL units are present in the associated MVC video bitstream. Multiplexer 30 can determine whether one or more SEI NAL units have been placed in the bitstream, and can set the value of the no_sei_nal_unit_present field to '1' when there are no SEI NAL units in the bitstream, or to '0' when at least one SEI NAL unit is present in the bitstream.
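As a rough illustration of how the fields described above fit into the eight bytes that follow the descriptor length field, the following Python sketch packs them into a byte string. The named fields account for 59 of the 64 payload bits. The placement of the remaining five reserved bits (four after the bit rate fields, one at the end), their value of 1, and the function name are assumptions of this sketch rather than something stated in the text.

```python
def pack_mvc_extension_descriptor(avg_kbps, max_kbps, voi_min, voi_max,
                                  tid_start, tid_end, no_sei):
    """Pack tag, length, and 64 payload bits, most significant bit first."""
    bits = avg_kbps & 0xFFFF                   # average bit rate, 16 bits
    bits = (bits << 16) | (max_kbps & 0xFFFF)  # maximum bit rate, 16 bits
    bits = (bits << 4) | 0xF                   # assumed: 4 reserved bits
    bits = (bits << 10) | (voi_min & 0x3FF)    # minimum view order index
    bits = (bits << 10) | (voi_max & 0x3FF)    # maximum view order index
    bits = (bits << 3) | (tid_start & 0x7)     # start temporal ID
    bits = (bits << 3) | (tid_end & 0x7)       # end temporal ID
    bits = (bits << 1) | (1 if no_sei else 0)  # no_sei_nal_unit_present
    bits = (bits << 1) | 0x1                   # assumed: 1 reserved bit
    payload = bits.to_bytes(8, "big")
    # descriptor_tag 49, then descriptor_length (always 8 for this layout).
    return bytes([49, len(payload)]) + payload


descriptor = pack_mvc_extension_descriptor(
    avg_kbps=1000, max_kbps=2000, voi_min=0, voi_max=3,
    tid_start=0, tid_end=2, no_sei=True)
```

The fixed payload size is what allows the descriptor length field to be set to eight for every instance.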
Table 2 below provides an example of data included in hierarchy descriptor 112. In MPEG-2 Systems, a hierarchy descriptor can be defined for a video program stream that contains an embedded video program stream. The various fields and bit depths of the fields shown in Table 2 are provided as an example. The hierarchy_layer_index value identifies the layer index of the current program stream, and the hierarchy_embedded_layer_index value identifies a dependent layer. In the MVC design, one program stream can depend on another program stream, as signaled using the hierarchy descriptor. That is, dependencies between program streams can be determined based on data included in the hierarchy descriptor.
Table 2 - Hierarchy descriptor

As noted above, the MPEG-2 Systems specification specifies that each descriptor includes a descriptor tag field and a descriptor length field. Accordingly, hierarchy descriptor 112 includes a descriptor tag field and a descriptor length field. In accordance with the MPEG-2 Systems specification, multiplexer 30 can set the value of the descriptor tag field to a value of "4" for hierarchy descriptor 112.
The length of hierarchy descriptor 112 can be determined a priori, because each instance of hierarchy descriptor 112 must include the same amount of data. In one example, multiplexer 30 may set the value of the descriptor length field to a value of four, indicative of four bytes in an instance of hierarchy descriptor 112 following the end of the descriptor length field.
The hierarchy type field describes the hierarchical relationship between the associated hierarchy layer and its embedded hierarchy layer. In one example, multiplexer 30 sets the value of the hierarchy type field based on the hierarchical relationship, for example, as described in Table 3 below. As an example, when scalability applies in more than one dimension, multiplexer 30 can set the hierarchy type field to a value of "8" ("Combined Scalability", as shown in Table 3), and multiplexer 30 sets the values of the temporal scalability flag field, the spatial scalability flag field, and the quality scalability flag field according to data retrieved from PES packets and PES packet headers of the respective streams. In general, multiplexer 30 can determine dependencies between different streams corresponding to various views and/or audio data streams. Multiplexer 30 can also determine whether a dependent stream comprising an enhancement layer is a spatial layer, a signal-to-noise ratio (SNR) enhancement layer, a quality enhancement layer, or another type of enhancement layer. As another example, for MVC video bitstreams, multiplexer 30 can set the hierarchy type field to a value of "9" ("MVC", as shown in Table 3) and can set the values of each of the temporal scalability flag field, the spatial scalability flag field, and the quality scalability flag field to "1". As yet another example, for MVC base view bitstreams, multiplexer 30 can set the hierarchy type field to a value of "15" and can set the values of the temporal scalability flag field, the spatial scalability flag field, and the quality scalability flag field to "1". As yet another example, for a prefix MVC bitstream, multiplexer 30 can set the hierarchy type field to a value of "14" and can set the temporal scalability flag field, the spatial scalability flag field, and the quality scalability flag field to "1".
The hierarchy layer index field may comprise a six-bit field that defines a unique index of the associated program element in a table of coding layer hierarchies. Indices may be unique within a single program definition. For video sub-bitstreams of AVC video streams conforming to one or more profiles defined in Annex G of ITU-T Rec. H.264 | ISO/IEC 14496-10, this is the program element index, which is assigned in such a way that the bitstream order will be correct if the associated SVC dependency representations of the video sub-bitstreams of the same access unit are reassembled in increasing order of hierarchy_layer_index. For MVC video sub-bitstreams of AVC video streams conforming to one or more profiles defined in Annex H of ITU-T Rec. H.264 | ISO/IEC 14496-10, this is the program element index, which is assigned such that any of these values is greater than the hierarchy_layer_index value specified in the hierarchy descriptor for the prefix MVC sub-bitstream.
The hierarchy embedded layer index field may comprise a six-bit field that defines the hierarchy table index of the program element that needs to be accessed before decoding the elementary stream associated with the corresponding instance of hierarchy descriptor 112. This disclosure leaves the value of the hierarchy embedded layer index field undefined when the hierarchy type field has a value of "15" (that is, a value corresponding to the base layer).
The hierarchy channel field may comprise a six-bit field that indicates the intended channel number for the associated program element in an ordered set of transmission channels. The most robust transmission channel is defined by the lowest value of the hierarchy channel field, with respect to the overall transmission hierarchy definition. Note that a given hierarchy channel can be assigned to several program elements at the same time.
Reserved fields in Tables 1 and 2 are reserved for future use by future standards development. The techniques of this disclosure do not, at this time, propose assigning semantic meaning to values of the reserved fields.
Table 3 below illustrates the potential values for the hierarchy type field described above. Table 3 — Hierarchy Type Field Values
In some examples, hierarchy descriptor 112 can be used to signal an MVC sub-bitstream in terms of an incremental sub-bitstream and embedded sub-bitstreams. The embedded sub-bitstreams include the directly dependent sub-bitstream corresponding to hierarchy_embedded_layer_index and all embedded sub-bitstreams of this directly dependent sub-bitstream. In this disclosure, views that are explicitly contained are called enhancement views, while views that are embedded are called dependent views.
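The recursive definition of embedded sub-bitstreams can be made concrete with a small Python sketch. It assumes that the hierarchy descriptors of a program have already been parsed into a mapping from hierarchy_layer_index to hierarchy_embedded_layer_index, with `None` marking a layer that depends on nothing. That mapping, the `None` convention, and the function name are illustrative assumptions.

```python
def embedded_sub_bitstreams(layer_index, embedded_of):
    """List every layer that must be available before `layer_index`.

    The result starts with the directly dependent layer and then follows
    the chain of embedded layers until a layer with no dependency is hit.
    """
    result = []
    current = embedded_of.get(layer_index)
    # The membership check guards against malformed, cyclic signaling.
    while current is not None and current not in result:
        result.append(current)
        current = embedded_of.get(current)
    return result


# Layer 2 embeds layer 1, which in turn embeds layer 0 (hypothetical values).
embedded_of = {0: None, 1: 0, 2: 1}
dependencies = embedded_sub_bitstreams(2, embedded_of)
```

In this example the sub-bitstream with layer index 2 has two embedded sub-bitstreams, the directly dependent layer 1 and, through it, layer 0.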
In an example in which the output of multiplexer 30 comprises a program stream, program specific information tables 88 may include a program stream map (PSM). A PSM can provide a description of the elementary streams in the corresponding program stream and the relationships of the elementary streams to one another. In some examples, a program stream map can also correspond to a transport stream. When carried in a corresponding transport stream, the PSM structure must not be modified. Multiplexer 30 can indicate that a PSM is present in a PES packet by setting the stream_id value of the PES packet to 0xBC, that is, the hexadecimal value BC, which corresponds to the binary value 10111100 or the decimal value 188.
Multiplexer 30 maintains a complete list of all programs available in a transport stream in program association table 104. Multiplexer 30 can also embed program association tables in NAL units. Multiplexer 30 may indicate that a NAL unit includes a program association table by assigning the NAL unit a PID value of 0. Multiplexer 30 may list each program, along with the PID value of the transport packets containing the corresponding program map table, in program association table 104. Using the same example mentioned above, the exemplary program map table specifying the elementary streams of program number 3 has a PID of 1001 and another PMT has another PID of 1002. This or similar sets of information may be included in program association table 104.
Program specific information tables 88 also include network information table (NIT) 100 and conditional access table (CAT) 102. Program number zero, as specified in the PAT, has a special meaning. In particular, program number zero can be used to point the way to network information table 100. The table is optional and, when present, can provide information about the physical network carrying the transport stream, such as channel frequencies, satellite transponder details, modulation characteristics, service originator, service name, and details of available alternative networks.
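The role of the program association table as a lookup from program number to PMT PID, with program number zero set aside for the NIT, can be sketched in Python as follows. The dictionary representation, the function name, the second program number, and all PID values are illustrative assumptions of this sketch.

```python
NIT_PROGRAM_NUMBER = 0  # program number zero points to the NIT, not to a PMT

# program_number -> PID of the transport packets carrying the referenced table.
# All numbers here are illustrative.
program_association_table = {0: 16, 3: 1001, 5: 1002}


def pmt_pid_for(program_number, pat):
    """Return the PID of the packets that carry the PMT of a program."""
    if program_number == NIT_PROGRAM_NUMBER:
        raise ValueError("program number 0 refers to the NIT, not to a PMT")
    return pat[program_number]


pmt_pid = pmt_pid_for(3, program_association_table)
nit_pid = program_association_table.get(NIT_PROGRAM_NUMBER)  # None if no NIT
```

A receiver would typically read the table carried on PID 0 first, then use a lookup of this kind to find the PMT of the program it wants to present.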
If any elementary streams within a transport stream are scrambled, then a conditional access table 102 must be present. Conditional access table 102 provides details of the scrambling system(s) in use and provides the PID values of the transport packets that contain the conditional access management and entitlement information. The format of this information is not specified within the MPEG-2 Systems standard.
Figure 4 is a block diagram illustrating an exemplary set of data that can be included in one of operating point descriptors 114 (Figure 3). In the example of Fig. 4, operating point descriptor 118 includes descriptor tag field 120, descriptor length field 122, frame rate field 124, number of display views field 126, number of decoding views field 128, view identifier fields 130, average bit rate field 132, maximum bit rate field 134, temporal identifier field 136, and reserved escape bit fields 138.
In the example of Figure 4, frame rate field 124 and number of display views field 126 correspond to an exemplary rendering capability value, number of decoding views field 128 corresponds to an exemplary decoding capability value, and average bit rate field 132 and maximum bit rate field 134 correspond to an exemplary bit rate value. Operating point descriptor 118 is merely one example of a data structure that can be used to signal characteristics of an operating point, such as a rendering capability, a decoding capability, and a bit rate. Figures 5 and 6 below provide alternative examples of operating point descriptors that signal these characteristics.
As described above, the MPEG-2 Systems specification specifies that each descriptor has a descriptor tag field and a descriptor length field, each of 8 bits. Thus, multiplexer 30 (FIG. 1) may assign a value to descriptor tag field 120 indicative of an MVC operating point descriptor. Multiplexer 30 can also determine a number of views for the operating point and a number of reserved bits for the operating point descriptor, and then calculate the length of operating point descriptor 118, in bytes, following descriptor length field 122. Multiplexer 30 can assign this calculated length value to descriptor length field 122 when instantiating operating point descriptor 118.
Frame rate field 124 may comprise a 16-bit field that indicates the maximum frame rate, in frames per 256 seconds, of the reassembled AVC video stream. That is, multiplexer 30 can calculate the maximum frame rate over a time period of 256 seconds to set the value of frame rate field 124. In some examples, dividing by 256 can result in a conversion of a floating point value to an integer value. In other examples, time periods other than 256 seconds can be used. The time period of 256 seconds, with respect to frame rate field 124, is merely one potential example of a period over which the maximum frame rate of an operating point can be calculated.
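Expressing the frame rate in frames per 256 seconds lets a fractional rate be carried in an integer field. The following Python sketch shows one way a value could be encoded and decoded under that convention. Rounding to the nearest integer and the function names are assumptions of the sketch, since the text does not say how fractional values are handled.

```python
def encode_frame_rate(frames_per_second):
    """Express a frame rate as frames per 256 seconds in a 16-bit integer."""
    value = round(frames_per_second * 256)  # rounding mode is an assumption
    if not 0 <= value <= 0xFFFF:
        raise ValueError("frame rate does not fit in a 16-bit field")
    return value


def decode_frame_rate(field_value):
    """Recover frames per second from the frames-per-256-seconds value."""
    return field_value / 256


thirty = encode_frame_rate(30)             # 30 fps -> 7680
ntsc = encode_frame_rate(30000 / 1001)     # about 29.97 fps -> 7672
recovered = decode_frame_rate(ntsc)        # 29.96875 fps after quantization
```

With 256 steps per frame per second, the quantization error stays below 1/512 fps, and a 16-bit field can represent rates just under 256 fps.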
Number of display views field 126 may comprise a ten-bit field that indicates the number of views targeted for output from the reassembled AVC video stream. In general, number of display views field 126 represents a number of views to be displayed for a corresponding operating point. Because different displays are capable of displaying different numbers of views, a client device can use the value of number of display views field 126 to select an operating point that has as many views to be displayed as the display of the client device can present. For example, if a client device is capable of displaying four views, the client device can select an operating point whose number of display views field indicates that four views will be displayed for that operating point. Accordingly, number of display views field 126 can be included as part of a rendering capability value. Likewise, multiplexer 30 can set the value of number of display views field 126 according to the number of views to be displayed for an operating point.
Number of decoding views field 128 may comprise a ten-bit field that indicates the number of views required to decode the reassembled AVC video stream. This value may differ from the number of views to be displayed, indicated by number of display views field 126. The difference can arise because certain views are required for decoding, due to view dependencies, but are not actually displayed.
Referring briefly to Figure 7 as an example, views S0 and S1 can be the views to be displayed for an operating point. View S0 can be decoded directly, without decoding any other views. However, to decode view S1, view S2 must also be decoded, because view S1 includes prediction data referring to view S2. So, in this example, number of display views field 126 would have a value of two, but number of decoding views field 128 would have a value of three. In some examples, a view to be displayed may be interpolated from one or more other views, such that the number of views to be displayed may be greater than the number of views to be decoded. For example, using a base view and depth information, video decoder 48 (Figure 1) can interpolate a second view. Video decoder 48 can use two or more views to calculate depth information in order to interpolate a new view, or video decoder 48 can receive depth information for a view from source device 20.
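The difference between the two counts in this example can be reproduced with a short Python sketch that follows view dependencies. The dependency map below encodes only the single relationship stated in the text, that S1 refers to S2. Treating S0 and S2 as having no further dependencies is an assumption of the sketch, as is the function name.

```python
def views_to_decode(display_views, depends_on):
    """Return the display views plus every view they need for prediction."""
    needed = set()
    stack = list(display_views)
    while stack:
        view = stack.pop()
        if view not in needed:
            needed.add(view)
            # Follow dependencies transitively.
            stack.extend(depends_on.get(view, ()))
    return needed


depends_on = {"S1": ["S2"]}  # S1 uses S2 as a prediction reference
display = ["S0", "S1"]
decode = views_to_decode(display, depends_on)

num_display_views = len(display)  # value for the number of display views field
num_decode_views = len(decode)    # value for the number of decoding views field
```

For this input the display count is two and the decode count is three, matching the values given in the text.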
Number of decoding views field 128 can correspond to a decoding capability value, in that a decoder of a client device (such as video decoder 48 of destination device 40) should be able to decode a number of views equal to the value of number of decoding views field 128. Accordingly, the client device may select an operating point whose number of decoding views field represents a number of views that a video decoder of the client device is capable of decoding.
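A client-side selection based on these two fields might look like the following Python sketch. The `OperatingPoint` tuple, the candidate list, and the policy of preferring the usable operating point with the most displayed views are assumptions chosen for illustration. The text only states that the client can use the fields to select an operating point that it is able to display and decode.

```python
from collections import namedtuple

# Hypothetical in-memory form of the fields read from each descriptor.
OperatingPoint = namedtuple(
    "OperatingPoint", ["name", "num_display_views", "num_decode_views"])


def select_operating_point(points, max_display, max_decode):
    """Pick the usable operating point that shows the most views."""
    usable = [p for p in points
              if p.num_display_views <= max_display
              and p.num_decode_views <= max_decode]
    return max(usable, key=lambda p: p.num_display_views, default=None)


points = [OperatingPoint("mono", 1, 1),
          OperatingPoint("stereo", 2, 3),
          OperatingPoint("quad", 4, 5)]

# A display that can show four views, paired with a decoder limited to three.
choice = select_operating_point(points, max_display=4, max_decode=3)
```

Here the four-view operating point is rejected because it needs five decoded views, so the two-view operating point is chosen.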
Operating point descriptor 118 of Fig. 4 also includes view identifier fields 130. Each of view identifier fields 130 may comprise a ten-bit field that indicates the view_id value of the NAL units contained in the reassembled AVC video bitstream. Thus, the view identifier of each view displayed for an operating point is signaled using view identifier fields 130. That is, the view identifiers of view identifier fields 130 correspond to the displayed views. Thus, views that are decoded but not displayed are not signaled by view identifier fields 130, in the example of Figure 4.
Average Bit Rate Field 132 may comprise a sixteen bit field that indicates the average bit rate, in kilobits per second, of the reassembled AVC video stream. When set to 0, the average bit rate is not indicated. That is, a value of zero for the average bit rate field 132 implies that the average bit rate field 132 should not be used to determine the average bit rate of the reassembled AVC video stream.
Maximum bit rate field 134 may comprise a sixteen bit field that indicates the maximum bit rate, in kbits per second, of the reassembled AVC video stream. When set to 0, the maximum bit rate is not indicated. That is, when the maximum bitrate field value 134 is set to zero, the maximum bitrate field 134 should not be used to determine the maximum bitrate of the reassembled AVC video stream.
Temporal identifier field 136 may comprise a three-bit field that indicates the value of the temporal_id corresponding to the frame rate of the reassembled AVC video stream. That is, the temporal_id can be used to determine the frame rate of the reassembled AVC video stream, as discussed above.
Exemplary operating point descriptor 118 also includes reserved escape bit fields 138. In one example, e.g., as shown in Table 4 below, the reserved escape bits can be used both for additional signaling and for padding operating point descriptor 118, such that operating point descriptor 118 ends on a byte boundary. For example, as discussed above, operating point descriptor 118 may use ten bits to represent the view identifier of each displayed view. The static number of bits, aside from the bits used for the view identifiers and the reserved escape bits, is 87 in this example. Thus, to ensure that operating point descriptor 118 ends on a byte boundary (that is, has a number of bits that is evenly divisible by eight), multiplexer 30 can add a number of escape bits in accordance with the following formula:
escape bits = (1 + 6 * num_display_views) % 8
where "%" represents the mathematical modulo operator. That is, A % B results in the remainder of A divided by B, such that the remainder is an integer in the range from 0 to B-1.
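The padding formula can be checked numerically. The Python sketch below adds up the field widths given earlier for the Figure 4 layout (87 static bits plus ten bits per displayed view) and confirms that the padded total is a whole number of bytes. The constant and function names are hypothetical.

```python
# tag, length, frame rate, display views, decoding views,
# average bit rate, maximum bit rate, temporal identifier
STATIC_BITS = 8 + 8 + 16 + 10 + 10 + 16 + 16 + 3  # 87 bits
VIEW_ID_BITS = 10


def escape_bits(num_display_views):
    """Number of padding bits according to the formula in the text."""
    return (1 + 6 * num_display_views) % 8


def descriptor_bits(num_display_views):
    """Total size of the descriptor, padding included."""
    return (STATIC_BITS
            + VIEW_ID_BITS * num_display_views
            + escape_bits(num_display_views))


# Every total in this range is a multiple of eight bits.
aligned = all(descriptor_bits(n) % 8 == 0 for n in range(1, 65))
two_view_bytes = descriptor_bits(2) // 8
```

The formula works because 87 + 10n + 1 + 6n equals 88 + 16n, which is always divisible by eight.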
Table 4 summarizes an exemplary set of data that can be included in the example operating point descriptor 118 of Figure 4. Table 4 — MVC operating point descriptor

Figure 5 is a block diagram illustrating an alternative exemplary set of data that may be included in one of operating point descriptors 114 (Figure 3). In general, each of operating point descriptors 114 should have a common format, so that a client device can be configured to receive operating point descriptors of a single format. Thus, each of operating point descriptors 114 may have a format similar to the operating point descriptor of Fig. 4, Fig. 5, or Fig. 6, or another common format that includes similar signaling data.
In the example of Fig. 5, operating point descriptor 140 includes descriptor tag field 142, descriptor length field 144, profile_idc field 146, level_idc field 148, frame rate field 149, number of display views field 150, number of decoding views field 152, average bit rate field 154, maximum bit rate field 156, temporal identifier field 158, reserved bit field 160, view order index fields 162, view identifier fields 164, and reserved escape bit fields 166. IDC stands for "indicator". As explained below, exemplary operating point descriptor 140 explicitly signals the profile_idc and level_idc values for an operating point, as well as information about how the operating point is assembled.
Number of display views field 150 and frame rate field 149 correspond to a rendering capability value signaled by operating point descriptor 140. Profile_idc field 146, level_idc field 148, and number of decoding views field 152, in the example of Fig. 5, represent examples of data that may correspond to a decoding capability value signaled by operating point descriptor 140. Average bit rate field 154 and maximum bit rate field 156 correspond to a bit rate value signaled by operating point descriptor 140.
As described above, the MPEG-2 Systems specification specifies that each descriptor has a descriptor tag field and a descriptor length field, each of which may be 8 bits in length. Thus, multiplexer 30 (FIG. 1) may assign a value to descriptor tag field 142 indicative of an MVC operating point descriptor.
Multiplexer 30 can also determine a number of views for the operating point and a number of reserved bits for the operating point descriptor, and then calculate the length of operating point descriptor 140, in bytes, following descriptor length field 144. Multiplexer 30 can assign this value to descriptor length field 144 when instantiating operating point descriptor 140.
Profile_idc field 146 may comprise an eight-bit field that indicates the profile_idc of the operating point reassembled using the information given in operating point descriptor 140. Level_idc field 148 may comprise an eight-bit field that indicates the level_idc of the operating point reassembled using the information given in operating point descriptor 140.
Frame rate field 149 may comprise a 16-bit field that indicates the maximum frame rate, in frames per 256 seconds, of the reassembled AVC video stream. That is, multiplexer 30 can calculate the maximum frame rate over a time period of 256 seconds to set the value of frame rate field 149. As with frame rate field 124, in other examples for frame rate field 149, time periods other than 256 seconds can be used. Number of display views field 150 may comprise a ten-bit field that indicates the number of views targeted for output from the reassembled AVC video stream. In general, number of display views field 150 represents a number of views to be displayed for a corresponding operating point. Number of decoding views field 152 may comprise a ten-bit field that indicates the number of views required to decode the reassembled AVC video stream. This value may differ from the number of views to be displayed, indicated by number of display views field 150. The difference can arise because certain views are required for decoding, due to view dependencies, but are not actually displayed, for example, as described above with respect to number of decoding views field 128.
Average bit rate field 154 may comprise a sixteen bit field that indicates the average bit rate, in kilobits per second, of the reassembled AVC video stream. When set to 0, the average bit rate is not indicated. That is, a value of zero for average bit rate field 154 implies that average bit rate field 154 should not be used to determine the average bit rate of the reassembled AVC video stream. Maximum bit rate field 156 may comprise a sixteen bit field which indicates the maximum bit rate, in kbits per second, of the reassembled AVC video stream. When set to 0, the maximum bit rate is not indicated. That is, when maximum bit rate field value 156 is set to zero, maximum bit rate field 156 should not be used to determine the maximum bit rate of reassembled AVC video stream.
Temporal identifier field 158 may comprise a three-bit field that indicates the value of the temporal_id corresponding to the frame rate of the reassembled AVC video stream. That is, the temporal_id can be used to determine the frame rate of the reassembled AVC video stream, as discussed above.
Operating point descriptor 140 also includes view order index fields 162 and view identifier fields 164. Each of the view order index fields 162 may comprise a ten-bit field that indicates the view order index value of the NAL units contained in the operating point. A client device can reassemble the NAL units corresponding to all view_order_index values signaled in operating point descriptor 140 by view order index fields 162. View order index fields 162 include view order index fields for each of the views to be decoded. Given a value of view_order_index, a client device can extract the corresponding NAL units from the elementary streams, because the MVC extension descriptor indicates the range of view order index values in each elementary stream, and that range covers the view_order_index value signaled in the operating point descriptor.
Each of the view identifier fields 164 may comprise a ten-bit field that indicates the view_id value of the NAL units contained in the reassembled AVC video bitstream. Thus, the view identifiers of each view displayed for an operating point are flagged using view identifier fields 164. That is, the view identifiers of view identifier fields 164 correspond to the displayed views. Thus, views that are decoded but not displayed are not flagged by view identifier fields 164, in the example of Figure 5.
Operating point descriptor 140 also includes reserved escape bit fields 166. Operating point descriptor 140 can include escape bits as padding such that the number of bits in operating point descriptor 140 is evenly divisible by eight. Because the number of view order index fields and view identifier fields may vary, the number of escape bits that multiplexer 30 includes in operating point descriptor 140 may vary accordingly. For example, the number of escape bits can be determined according to the following formula:
escape bits = (6 * (num_display_views + num_decode_views)) % 8
where "%" represents the modulo operator.
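As a rough illustration of the padding formula above, the following Python sketch computes the number of escape bits and checks that the per-view fields plus padding end on a byte boundary. It assumes, in line with the field widths described for Figure 5, that each view order index and each view identifier occupies ten bits and that the fixed-length part of the descriptor is already byte-aligned. The function names are hypothetical and are not taken from any standard.

```python
def escape_bits(num_display_views: int, num_decode_views: int) -> int:
    """Padding bits per the formula in the text: (6 * (display + decode)) % 8."""
    return (6 * (num_display_views + num_decode_views)) % 8


def per_view_bits(num_display_views: int, num_decode_views: int) -> int:
    """Bits used by the per-view fields, assuming ten bits per field."""
    return 10 * (num_display_views + num_decode_views)


if __name__ == "__main__":
    for display, decode in [(1, 1), (2, 3), (3, 4)]:
        pad = escape_bits(display, decode)
        total = per_view_bits(display, decode) + pad
        # The last column is True when the variable part is byte-aligned.
        print(display, decode, pad, total % 8 == 0)
```

The formula works because 6n and -10n leave the same remainder modulo 8, so adding (6n) % 8 bits to 10n bits always yields a multiple of eight.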
Table 5 summarizes an exemplary set of data that can be included in the exemplary operating point descriptor 140 of Figure 5.
Table 5: MVC operating point descriptor


Figure 6 is a block diagram illustrating another alternative exemplary set of data that may be included in one of the operating point descriptors 114 (Figure 3). In the example of Figure 6, operating point descriptor 170 includes descriptor tag field 172, descriptor length field 174, profile_idc field 176, level_idc field 178, frame rate field 180, number of display views field 182, number of decoding views field 184, average bit rate field 186, maximum bit rate field 188, temporal identifier field 190, reserved bit field 192, operating point identifier field 194, operating point dependent flag field 196, optional dependent operating point identifier field 198, view order index fields 200, view identifier fields 202, and reserved escape bit fields 204. As described below, operating point descriptor 170 provides an exemplary operating point descriptor for an operating point that depends on another operating point and that signals the extra views needed for decoding.
Number of display views field 182 and frame rate field 180 correspond to a renderability value signaled by operating point descriptor 170. In the example of Figure 6, profile_idc field 176, level_idc field 178 and number of decoding views field 184 represent examples of data that may correspond to a decoding capability value signaled by operating point descriptor 170. Average bit rate field 186 and maximum bit rate field 188 correspond to a bit rate value signaled by operating point descriptor 170.
As described above, the MPEG-2 Systems specification specifies that each descriptor has a descriptor tag field and a descriptor length field, each of 8 bits. Thus, multiplexer 30 (FIG. 1) may assign a value to descriptor tag field 172 indicative of an MVC operating point descriptor. Multiplexer 30 can also determine a number of views for the operating point and a number of reserved bits for the operating point descriptor, and then calculate the length, in bytes, of the portion of operating point descriptor 170 that follows descriptor length field 174. Multiplexer 30 assigns this calculated length value to descriptor length field 174 when instantiating operating point descriptor 170.
profile_idc field 176 may comprise an eight-bit field indicating the profile_idc of the operating point reassembled by the information given in operating point descriptor 170. level_idc field 178 may comprise an eight-bit field indicating the level_idc of the operating point reassembled by the information given in operating point descriptor 170.
Frame rate field 180 may comprise a 16-bit field that indicates the maximum frame rate, in frames per 256 seconds, of the reassembled AVC video stream. That is, multiplexer 30 can calculate the maximum frame rate over a time period of 256 seconds to set the value of frame rate field 180. As with frame rate field 124, in other examples for frame rate field 180, time periods other than 256 seconds can be used.
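The unit of the frame rate field can be illustrated with a short Python sketch. It assumes that "frames per 256 seconds" means the number of frames occurring in a 256-second window, so that a rate in frames per second is multiplied by 256 before being stored in the 16-bit field. The function names and the rounding choice are assumptions made for illustration only.

```python
FRAME_RATE_FIELD_BITS = 16  # width of the frame rate field described above
WINDOW_SECONDS = 256        # time period used as the unit of the field


def encode_frame_rate(frames_per_second: float) -> int:
    """Convert frames per second into frames per 256 seconds."""
    value = round(frames_per_second * WINDOW_SECONDS)
    if not 0 <= value < (1 << FRAME_RATE_FIELD_BITS):
        raise ValueError("frame rate does not fit in a 16-bit field")
    return value


def decode_frame_rate(field_value: int) -> float:
    """Convert a field value back into frames per second."""
    return field_value / WINDOW_SECONDS


if __name__ == "__main__":
    for fps in (15, 25, 30, 60):
        field = encode_frame_rate(fps)
        print(fps, field, decode_frame_rate(field))
```

Under this interpretation, the fractional resolution of 1/256 frame per second lets the field carry non-integer rates, at the cost of limiting the representable rate to just under 256 frames per second.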
Number of display views field 182 may comprise a ten-bit field that indicates the value of the number of views targeted for output in the reassembled AVC video stream. In general, number of display views field 182 represents a number of views to be displayed for a corresponding operating point. Number of decoding views field 184 may comprise a ten-bit field that indicates the value of the number of views required to decode the reassembled AVC video stream. This value may differ from the number of views to be displayed, indicated by number of display views field 182. This may result from certain views being required for decoding, due to view dependencies, but not actually being displayed, for example, as described above with respect to number of decoding views field 128.
Average bit rate field 186 may comprise a sixteen-bit field that indicates the average bit rate, in kilobits per second, of the reassembled AVC video stream. When set to 0, the average bit rate is not indicated. That is, a value of zero for average bit rate field 186 implies that average bit rate field 186 should not be used to determine the average bit rate of the reassembled AVC video stream. Maximum bit rate field 188 may comprise a sixteen-bit field that indicates the maximum bit rate, in kilobits per second, of the reassembled AVC video stream. When set to 0, the maximum bit rate is not indicated. In particular, when the value of maximum bit rate field 188 is set to zero, maximum bit rate field 188 should not be used to determine the maximum bit rate of the reassembled AVC video stream.
Temporal identifier field 190 may comprise a three-bit field which indicates the value of temporal_id corresponding to the frame rate of the reassembled AVC video stream. That is, the temporal_id can be used to determine the frame rate of the reassembled AVC video stream, as discussed above. Reserved bit field 192 corresponds to a single bit, which is reserved for future use.
Operating point descriptor 170 also includes operating point identifier field 194 and operating point dependent flag field 196. Operating point identifier field 194 may comprise a ten-bit field indicating the operating point identifier described by operating point descriptor 170. Operating point dependent flag field 196 is a single bit flag that indicates whether a dependency of the current operating point on another operating point is signaled. If operating point dependent flag 196 has a value of one (or true), the dependency is signaled; if the value of operation point dependent flag 196 is zero (or false), the dependency is not signaled.
When the value of operating point dependent flag 196 is true or one, operating point descriptor 170 additionally includes dependent operating point identifier field 198. When present, dependent operating point identifier field 198 may comprise a ten-bit field that indicates the identifier of the operating point on which the current descriptor depends. That is, when multiplexer 30 determines that operating point descriptor 170 corresponds to an operating point that depends on another operating point, multiplexer 30 sets the value of the operating point dependent flag to true or one and then signals the identifier of the operating point on which the operating point corresponding to operating point descriptor 170 depends.
Operating point descriptor 170 also includes view order index fields 200 and view identifier fields 202. Each of the view order index fields 200 may comprise a ten-bit field that indicates the view order index value of the NAL units contained in the current operating point, with an identifier of operation_point_id, but not contained in the operating point with an identifier of dependent_operation_point_id. A client device can reassemble the NAL units corresponding to all view_order_index values signaled in operating point descriptor 170 by view order index fields 200. View order index fields 200 include view order index fields for each of the views to be decoded. Given a value of view_order_index, a client device can extract the corresponding NAL units from the elementary streams, because the MVC extension descriptor indicates the range of view order index values in each elementary stream, and that range covers the view_order_index value signaled in the operating point descriptor. The operating point signaled in operating point descriptor 170 is reassembled from the NAL units corresponding to all view_order_index values signaled by view order index fields 200 together with the NAL units contained in the operating point with an identifier of dependent_operation_point_id.
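The reassembly rule for dependent operating points can be sketched in Python as follows. The sketch assumes that a client has already parsed each descriptor into its dependent operating point identifier (or None when the dependent flag is zero) and its signaled view_order_index values. The dictionary layout and function name are hypothetical, and the check for circular dependencies is an added safeguard rather than something the text describes.

```python
from typing import Dict, List, Optional, Set, Tuple

# operation_point_id -> (dependent_operation_point_id or None,
#                        view_order_index values signaled in that descriptor)
Descriptors = Dict[int, Tuple[Optional[int], List[int]]]


def views_to_decode(descriptors: Descriptors, operation_point_id: int) -> List[int]:
    """Collect every view_order_index needed for an operating point,
    following the chain of dependent operating points."""
    collected: Set[int] = set()
    visited: Set[int] = set()
    current: Optional[int] = operation_point_id
    while current is not None:
        if current in visited:
            raise ValueError("circular operating point dependency")
        visited.add(current)
        dependent_id, view_order_indices = descriptors[current]
        collected.update(view_order_indices)
        current = dependent_id
    # Views are decoded in ascending view order index.
    return sorted(collected)


if __name__ == "__main__":
    example: Descriptors = {
        1: (None, [0, 1]),  # independent operating point
        2: (1, [2, 3]),     # signals only the extra views beyond point 1
    }
    print(views_to_decode(example, 2))
```

Signaling only the extra views in a dependent descriptor keeps each descriptor short, since views shared with the referenced operating point are not repeated.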
Each of the view identifier fields 202 may comprise a ten-bit field that indicates the view_id value of the NAL units contained in the reassembled AVC video bitstream. Thus, the view identifiers of each view displayed for an operating point are signaled using view identifier fields 202. That is, the view identifiers of view identifier fields 202 correspond to the displayed views. Thus, views that are decoded but not displayed are not signaled by view identifier fields 202 in the example of Figure 6.
Operating point descriptor 170 also includes reserved escape bit fields 204. Operating point descriptor 170 can include escape bits as padding, such that the number of bits in operating point descriptor 170 is evenly divisible by eight. Because the number of view order index fields and view identifier fields may vary, the number of escape bits that multiplexer 30 includes in operating point descriptor 170 may vary accordingly. For example, the number of escape bits can be determined according to the following formula:
escape bits = (6 * (num_display_views + num_decode_views)) % 8
where "%" represents the modulo operator.
Table 6 below summarizes an exemplary set of data that can be included in the exemplary operating point descriptor 170 of Figure 6.
Table 6: MVC operating point descriptor


As yet another alternative, source device 20 (Figure 1) signals characteristics of an operating point using a data structure other than an operating point descriptor. For example, source device 20 may signal a renderability value that describes a rendering capability to be satisfied by a receiving device to use the MVC operating point, a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the MVC operating point, and a bit rate value describing a bit rate of the MVC operating point, using a modified MVC extension descriptor. Table 7 below illustrates an example of such a modified MVC extension descriptor.
Table 7: MVC extension descriptor

Multiplexer 30 (Figure 2) can construct MVC extension descriptors 110 according to the syntax defined by Table 7. In general, the semantics of the syntax elements of Table 7 are the same as those of the commonly named elements described with respect to Table 1 above. The example in Table 7 includes additional elements over those in Table 1, namely, a frame rate field, a number of display views field, a number of decoding views field, and view identifier fields for each view of an operating point to which the MVC extension descriptor corresponds. The frame rate field may comprise a sixteen-bit field that indicates the maximum frame rate, in frames per 256 seconds, of the reassembled AVC video stream. The number of display views field "num_display_views" may comprise a ten-bit field that indicates the value of the number of views targeted for output in the reassembled AVC video stream. The number of decoding views field "num_decode_views" may comprise a ten-bit field that indicates the value of the number of views required to decode the reassembled AVC video stream. Each of the view identifier fields "view_id" may comprise a ten-bit field that indicates the view_id value of the NAL units for a corresponding view contained in the reassembled AVC video bitstream.
In some examples, one or more operating point descriptors may include values that indicate a maximum temporal identifier value and a maximum frame rate value of all MVC operating points of a bitstream. In some examples, the maximum temporal identifier value and the maximum frame rate value of all MVC operating points of the bitstream can be signaled in an MVC operating point descriptor.
Figure 7 is a conceptual diagram illustrating an exemplary MVC prediction pattern. In the example in Figure 7, eight views (having view IDs from "S0" to "S7") are illustrated, and twelve temporal locations (from "T0" to "T11") are illustrated for each view. That is, each row in Figure 7 corresponds to a view, while each column indicates a temporal location.
Although MVC has a so-called base view, which is decodable by H.264/AVC decoders, and a stereo view pair can also be supported by MVC, an advantage of MVC is that it can support an example that uses more than two views as a 3D video input and decodes this 3D video represented by the multiple views. A renderer of a client having an MVC decoder can expect 3D video content with multiple views.
Frames in Figure 7 are indicated at the intersection of each row and each column using a shaded block including a letter, designating whether the corresponding frame is intracoded (i.e., an I-frame) or intercoded in one direction (i.e., as a P-frame) or in multiple directions (i.e., as a B-frame). In general, predictions are indicated by arrows, where the pointed-to frame uses the frame from which the arrow originates as a prediction reference. For example, the P-frame of view S2 at temporal location T0 is predicted from the I-frame of view S0 at temporal location T0.
As with single-view video encoding, frames of a multiview video coding video sequence can be encoded predictively with respect to frames at different temporal locations. For example, the b-frame of view S0 at temporal location T1 has an arrow pointing to it from the I-frame of view S0 at temporal location T0, indicating that the b-frame is predicted from the I-frame. Additionally, however, in the context of multiview video encoding, frames can be inter-view predicted. That is, a view component can use view components in other views for reference. In MVC, for example, inter-view prediction is performed as if the view component in another view were an inter-prediction reference. Potential inter-view references are signaled in the Sequence Parameter Set (SPS) MVC extension and can be modified by the reference picture list construction process, which allows flexible ordering of the inter-prediction or inter-view prediction references. Table 8 below provides an exemplary definition for an MVC extension sequence parameter set.
Table 8


Figure 7 provides several examples of inter-view prediction. Frames of view S1 in the example of Figure 7 are illustrated as being predicted from frames at different temporal locations of view S1, as well as inter-view predicted from frames of views S0 and S2 at the same temporal locations. For example, the b-frame of view S1 at temporal location T1 is predicted from each of the B-frames of view S1 at temporal locations T0 and T2, as well as the b-frames of views S0 and S2 at temporal location T1.
In the example in Figure 7, capital "B" and lowercase "b" are intended to indicate different hierarchical relationships between frames, rather than different coding methodologies. In general, uppercase "B" frames are relatively higher in the prediction hierarchy than lowercase "b" frames. Figure 7 also illustrates variations in the prediction hierarchy using different levels of shading, where frames with a greater amount of shading (i.e., relatively darker) are higher in the prediction hierarchy than frames having less shading (i.e., relatively lighter). For example, all of the I-frames in Figure 7 are illustrated with full shading, while P-frames have slightly lighter shading, and B-frames (and lowercase b-frames) have various levels of shading relative to each other, but always lighter than the shading of the P-frames and I-frames.
In general, the prediction hierarchy is related to view order indices, in that frames relatively higher in the prediction hierarchy must be decoded before frames that are relatively lower in the hierarchy, such that the frames relatively higher in the hierarchy can be used as reference frames when decoding the frames relatively lower in the hierarchy. A view order index is an index that indicates the decoding order of view components in an access unit. View order indices are implied in the SPS MVC extension, as specified in Annex H of H.264/AVC (the MVC amendment). In the SPS, for each index i, the corresponding view_id is signaled. Decoding of the view components follows the ascending order of the view order index. If all views are presented, then the view order indices are in a consecutive order from 0 to num_views_minus_1.
In this way, frames used as reference frames can be decoded before decoding the frames that are coded with reference to the reference frames. A view order index is an index that indicates the decoding order of view components in an access unit. For each view order index i, the corresponding view_id is signaled. The decoding of view components follows the ascending order of view order indices. If all views are presented, then the set of view order indices may comprise a consecutively ordered set from zero to one less than the total number of views.
For certain frames at equal levels of the hierarchy, decoding order may not matter relative to each other. For example, the I-frame of view S0 at temporal location T0 is used as a reference frame for the P-frame of view S2 at temporal location T0, which in turn is used as a reference frame for the P-frame of view S4 at temporal location T0. Therefore, the I-frame of view S0 at temporal location T0 must be decoded before the P-frame of view S2 at temporal location T0, which must be decoded before the P-frame of view S4 at temporal location T0. However, between views S1 and S3, decoding order does not matter, because views S1 and S3 do not rely on each other for prediction, but instead are predicted only from views that are higher in the prediction hierarchy. Furthermore, view S1 can be decoded before view S4, so long as view S1 is decoded after views S0 and S2.
In this way, a hierarchical ordering can be used to describe views S0 to S7. Let the notation SA > SB mean that view SA must be decoded before view SB. Using this notation, S0 > S2 > S4 > S6 > S7 in the example in Figure 7. Also, with respect to the example in Figure 7, S0 > S1, S2 > S1, S2 > S3, S4 > S3, S4 > S5 and S6 > S5. Any decoding order for the views that does not violate these requirements is possible. Consequently, many different decoding orders are possible, with only certain limitations. Two exemplary decoding orders are presented below, although it should be understood that many other decoding orders are possible. In one example, illustrated in Table 9 below, views are decoded as soon as possible.
Table 9

The example in Table 9 recognizes that view S1 can be decoded immediately after views S0 and S2 are decoded, view S3 can be decoded immediately after views S2 and S4 are decoded, and view S5 can be decoded immediately after views S4 and S6 are decoded.
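The precedence requirements stated above for Figure 7 can be checked mechanically. The following Python sketch encodes each "SA > SB" requirement as a pair and tests whether a candidate decoding order respects all of them. The candidate order shown is an assumption consistent with the description of decoding each view as soon as its references are available. The function name is hypothetical.

```python
# (earlier, later): the first view must be decoded before the second.
CONSTRAINTS = [
    ("S0", "S2"), ("S2", "S4"), ("S4", "S6"), ("S6", "S7"),
    ("S0", "S1"), ("S2", "S1"), ("S2", "S3"), ("S4", "S3"),
    ("S4", "S5"), ("S6", "S5"),
]


def is_valid_decoding_order(order):
    """Return True if every precedence constraint holds for the given order."""
    position = {view: index for index, view in enumerate(order)}
    return all(position[first] < position[second] for first, second in CONSTRAINTS)


if __name__ == "__main__":
    # Assumed order in which each view is decoded as soon as possible.
    early_order = ["S0", "S2", "S1", "S4", "S3", "S6", "S5", "S7"]
    # An order that violates S0 > S1 and S2 > S1.
    bad_order = ["S1", "S0", "S2", "S3", "S4", "S5", "S6", "S7"]
    print(is_valid_decoding_order(early_order))
    print(is_valid_decoding_order(bad_order))
```

Any order accepted by this check is a topological ordering of the view dependency graph, which is why several distinct decoding orders are possible.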
Table 10 below presents another exemplary decoding order, in which the decoding order is such that any view that is used as a reference to another view is decoded before views that are not used as a reference to any other view. Table 10
The example in Table 10 recognizes that frames from views S1, S3, S5 and S7 do not act as reference frames for frames from any other views, and therefore views S1, S3, S5 and S7 can be decoded after the frames of those views that are used as reference frames, i.e., views S0, S2, S4 and S6 in the example of Figure 7. With respect to each other, views S1, S3, S5 and S7 can be decoded in any order. Consequently, in the example of Table 10, view S7 is decoded before each of views S1, S3 and S5.
To be clear, there can be a hierarchical relationship between the frames of each view, as well as between the temporal locations of the frames of each view. With respect to the example of Figure 7, frames at temporal location T0 are either intra-predicted or inter-view predicted from frames of other views at temporal location T0. Similarly, frames at temporal location T8 are either intra-predicted or inter-view predicted from frames of other views at temporal location T8. Consequently, with respect to a temporal hierarchy, temporal locations T0 and T8 are at the top of the temporal hierarchy.
Frames at temporal location T4 in the example of Figure 7 are lower in the temporal hierarchy than frames at temporal locations T0 and T8, because frames at temporal location T4 are B-coded with reference to frames at temporal locations T0 and T8. Frames at temporal locations T2 and T6 are lower in the temporal hierarchy than frames at temporal location T4. Finally, frames at temporal locations T1, T3, T5, and T7 are lower in the temporal hierarchy than frames at temporal locations T2 and T6.
In MVC, a subset of a complete bitstream can be extracted to form a sub-bitstream that still conforms to MVC. There are many possible sub-bitstreams that specific applications may require, based, for example, on a service provided by a server, the capacity, support and capabilities of decoders of one or more clients, and/or the preference of one or more clients. For example, a client might require only three views, and there might be two scenarios. In one example, one client might require a smooth viewing experience and might prefer views with view_id values S0, S1, and S2, while another client might require view scalability and prefer views with view_id values S0, S2, and S4. If the view_ids are originally ordered with respect to the example in Table 9, the view order index values are {0, 1, 2} and {0, 1, 4} in these two examples, respectively. Note that both sub-bitstreams can be decoded as independent MVC bitstreams and can be supported simultaneously.
There can be many MVC sub-bitstreams that are decodable by MVC decoders. In theory, any combination of views that satisfies the following two properties can be decoded by an MVC decoder compatible with a certain profile or level: (1) the view components in each access unit are ordered in ascending order of view order index, and (2) for each view in the combination, the views on which it depends are also included in the combination.
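The two properties above can be sketched in Python. The dependency map below is an assumption: it lists direct inter-view dependencies inferred from the ordering requirements stated for Figure 7, and the function names are hypothetical. Because every selected view is checked, requiring direct dependencies to be present also guarantees that indirect dependencies are present.

```python
# Direct inter-view dependencies, inferred (as an assumption) from the
# ordering requirements stated for Figure 7.
DEPENDS_ON = {
    "S0": set(),
    "S1": {"S0", "S2"},
    "S2": {"S0"},
    "S3": {"S2", "S4"},
    "S4": {"S2"},
    "S5": {"S4", "S6"},
    "S6": {"S4"},
    "S7": {"S6"},
}


def is_self_contained(views):
    """Property (2): every view's dependencies are also in the combination."""
    selected = set(views)
    return all(DEPENDS_ON[view] <= selected for view in selected)


def in_decoding_order(views, view_order_index):
    """Property (1): arrange views in ascending view order index."""
    return sorted(views, key=lambda view: view_order_index[view])


if __name__ == "__main__":
    print(is_self_contained({"S0", "S1", "S2"}))
    print(is_self_contained({"S0", "S2", "S4"}))
    print(is_self_contained({"S0", "S4"}))  # S4 needs S2, which is missing
```

A client could apply such a check before requesting a combination of views, rejecting combinations that omit a view needed for inter-view prediction.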
Figure 8 is a flowchart illustrating an exemplary method for using a data structure that signals characteristics of an operating point. That is, the method of Figure 8 includes constructing data structures for each operating point of an MPEG-2 Systems bitstream by a source device, e.g., source device 20 (Figure 1). The method of Figure 8 also includes using received data structures to select an operating point from which to retrieve multimedia data for decoding and display by a destination device, such as destination device 40 (Figure 1).
Initially, in the example of Figure 8, source device 20 determines operating points for a program (210). For example, source device 20 can select multiple subsets of views of a program to create multiple operating points that represent client devices having various capabilities, for example, rendering and decoding capabilities. An administrator can interact with source device 20, for example, to select views and create operating points that represent client devices that have different rendering and decoding capabilities, or different operating points could be created automatically by source device 20.
After determining the operating points for a program, source device 20 can generate data structures for each of the operating points in a program map table (PMT) (212), for example, when the bitstream is broadcast as an MPEG-2 Systems transport stream. Alternatively, source device 20 may generate the data structures in a program stream map (PSM) when the bitstream is broadcast as an MPEG-2 Systems program stream. In any case, source device 20 can generate, for each operating point, a data structure representing characteristics of the corresponding operating point. The data structure may comprise an operating point descriptor corresponding to one of the examples in Figures 4 to 6, for example. In this way, the data structure can signal rendering characteristics, decoding characteristics and a bit rate for the corresponding operating point.
Source device 20 can then output the data structures (214), e.g., within the PMT in the example of Figure 8, to a client device, e.g., destination device 40 (Figure 1). In this way, source device 20 can output the data structures as part of the bitstream. Source device 20 may output the bitstream in the form of a broadcast, unicast, multicast, anycast, or other communication protocol over a network, for example, over a wireless or wired network, or broadcast over television frequencies, for example, according to signals conforming to Advanced Television Systems Committee (ATSC) standards or National Television System Committee (NTSC) standards. Alternatively, source device 20 can encode the bitstream onto a computer-readable storage medium, such as a DVD-ROM, Blu-ray disc, flash drive, magnetic disk, or other storage medium, in which case source device 20 can form a PSM that includes the data structures for the operating points and encode the PSM onto the computer-readable storage medium. Destination device 40 may ultimately receive the PMT (or PSM) from source device 20 (216).
Destination device 40 may then select one of the operating points based on characteristics of the operating points signaled by the data structures included in the PMT or PSM (218). In general, destination device 40 may select an operating point for which destination device 40 satisfies the rendering and decoding capabilities signaled by the corresponding data structure. For example, destination device 40 can determine whether video output 44 is capable of rendering the number of views indicated by the data structure as the number of views to be displayed, at a frame rate in accordance with the renderability value signaled by the data structure for the operating point. Likewise, destination device 40 can determine whether video decoder 48 is capable of decoding the number of views to be decoded for the operating point, as signaled by the decoding capability value of the data structure for the operating point. Furthermore, in some examples, destination device 40 may use the bit rate signaled in the data structure to select an operating point that is suitable for a transport medium, for example, based on bandwidth limitations of the transport medium from which destination device 40 receives the bitstream.
When destination device 40 determines that it is capable of rendering and decoding more than one operating point, destination device 40 can select the highest-quality operating point for decoding and rendering. For example, destination device 40 may select the operating point having the highest number of views, the highest bit rate, the highest frame rate, or other indications of quality for an operating point to determine which operating point to select.
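The selection step can be sketched in Python. The class names, field names and the ranking key below are all hypothetical and chosen only for illustration: the text does not prescribe how capabilities are represented or how quality indications are weighed. The sketch also assumes that a bit rate of zero means "not indicated" and therefore skips the bandwidth check in that case, and it compares level_idc values numerically as a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OperatingPoint:
    operation_point_id: int
    num_display_views: int
    num_decode_views: int
    frame_rate: float        # frames per second
    level_idc: int
    max_bit_rate_kbps: int   # 0 means the bit rate is not indicated


@dataclass(frozen=True)
class DeviceCapabilities:
    max_display_views: int
    max_decode_views: int
    max_frame_rate: float
    max_level_idc: int
    bandwidth_kbps: int


def is_usable(point: OperatingPoint, device: DeviceCapabilities) -> bool:
    """Check rendering, decoding and bandwidth constraints for one point."""
    if point.num_display_views > device.max_display_views:
        return False
    if point.frame_rate > device.max_frame_rate:
        return False
    if point.num_decode_views > device.max_decode_views:
        return False
    if point.level_idc > device.max_level_idc:
        return False
    if point.max_bit_rate_kbps and point.max_bit_rate_kbps > device.bandwidth_kbps:
        return False
    return True


def select_operating_point(
    points: List[OperatingPoint], device: DeviceCapabilities
) -> Optional[OperatingPoint]:
    """Pick the usable point ranked highest by views, frame rate, bit rate."""
    usable = [p for p in points if is_usable(p, device)]
    if not usable:
        return None
    return max(
        usable,
        key=lambda p: (p.num_display_views, p.frame_rate, p.max_bit_rate_kbps),
    )
```

For instance, a device that can display and decode two views at 30 frames per second over an 8000 kbit/s link would pass over a three-view operating point and choose the best two-view operating point that fits those limits.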
After selecting an operating point, destination device 40 can retrieve data for the operating point from the bitstream (220). That is, destination device 40 can extract data for each of the views corresponding to the operating point from the program included in the bitstream. In some examples, destination device 40 selects data from one or more sub-bitstreams of the bitstream to extract data for the operating point. After extracting the data, destination device 40 can decode and display the data for the selected operating point (222). Video decoder 48 can decode each of the views that are to be decoded for the operating point, while video output 44 can display each of the views that are to be displayed for the operating point. The views displayed are not necessarily the same as the views that are decoded, as described above.
In one or more examples, the functions described can be implemented in hardware, software, firmware or any combination of these. If implemented in software, the functions can be stored on or transmitted over, as one or more instructions or code, a computer-readable medium. Computer-readable media can include computer-readable storage media, such as data storage media, or communication media, including any medium that facilitates the transfer of a computer program from one place to another. Data storage media can be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. By way of example, and not limitation, such computer-readable storage media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL) or wireless technologies such as infrared, radio and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals or other transient media.
Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks typically reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions can be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Also, the techniques can be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure can be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules or units are described in this disclosure to emphasize functional aspects of devices configured to carry out the disclosed techniques, but do not necessarily require realization by different hardware units. Instead, as described above, multiple units can be combined into one codec hardware unit or provided by a collection of interoperable hardware units, including one or more processors, as described above, together with suitable software and/or firmware.
Several examples have been described. These and other examples are within the scope of the following claims.
Claims:
Claims (14)
[0001]
1. Method characterized in that it comprises: constructing, with a source device, a plurality of multiview video coding (MVC) operating point descriptors (114, 140), each corresponding to an MVC operating point, wherein each MVC operating point descriptor is inserted into the descriptor loops of a Program Map Table (PMT) or Program Stream Map (PSM) of an MPEG-2 (Moving Picture Experts Group) Systems standard bitstream, wherein each MVC operating point descriptor (114, 140) signals a rendering capability value that describes a rendering capability to be satisfied by a receiving device to use the corresponding MVC operating point, and a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the corresponding MVC operating point, wherein the rendering capability value describes at least a number of views targeted for rendering (150) for the corresponding MVC operating point, a frame rate for video data (149) of the corresponding MVC operating point, and a temporal identifier value (158) for the corresponding MVC operating point; wherein the decoding capability value describes at least a number of views to be decoded (152) for the corresponding MVC operating point, a level value (148) corresponding to the MVC operating point, and a profile value (146) corresponding to the MVC operating point; and outputting the bitstream comprising the plurality of MVC operating point descriptors (114, 140).
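As an illustration of the descriptor recited in claim 1, the following minimal Python sketch groups the six signaled values into a rendering capability and a decoding capability. The class names, field names, and the `build_descriptor_loop` helper are hypothetical, chosen only for this example; they are not the normative MPEG-2 Systems descriptor syntax, and the numeric values are illustrative rather than taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RenderingCapability:
    num_display_views: int  # number of views targeted for rendering
    frame_rate: float       # frame rate of the operating point's video data
    temporal_id: int        # temporal identifier value


@dataclass(frozen=True)
class DecodingCapability:
    num_decode_views: int   # number of views to be decoded
    level_idc: int          # level value
    profile_idc: int        # profile value


@dataclass(frozen=True)
class MvcOperatingPointDescriptor:
    rendering: RenderingCapability
    decoding: DecodingCapability


def build_descriptor_loop(
    descriptors: List[MvcOperatingPointDescriptor],
) -> List[MvcOperatingPointDescriptor]:
    """Return a new list standing in for a PMT or PSM descriptor loop."""
    return list(descriptors)


# Illustrative values only; they are not taken from the patent.
stereo = MvcOperatingPointDescriptor(
    RenderingCapability(num_display_views=2, frame_rate=30.0, temporal_id=2),
    DecodingCapability(num_decode_views=2, level_idc=40, profile_idc=128),
)
three_view = MvcOperatingPointDescriptor(
    RenderingCapability(num_display_views=2, frame_rate=30.0, temporal_id=2),
    DecodingCapability(num_decode_views=3, level_idc=41, profile_idc=118),
)
pmt_descriptor_loop = build_descriptor_loop([stereo, three_view])
```

The second descriptor shows why the two capabilities are kept separate: an operating point may require three views to be decoded even though only two are targeted for rendering.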
[0002]
2. Method according to claim 1, characterized in that constructing each MVC operating point descriptor comprises constructing the MVC operating point descriptor to make one or more two-dimensional display devices and three-dimensional display devices adapt the bit stream for the one or more two-dimensional display devices and three-dimensional display devices and to accommodate transport media of various bandwidths for the one or more two-dimensional display devices and three-dimensional display devices.
[0003]
3. Method according to claim 1, characterized in that constructing the plurality of MVC operating point descriptors comprises constructing the plurality of MVC operating point descriptors to signal a bit rate value describing a bit rate of the corresponding MVC operating point, wherein the bit rate value describes one of an average bit rate for the corresponding MVC operating point and a maximum bit rate for the corresponding MVC operating point.
[0004]
4. Method according to claim 1, characterized in that constructing each MVC operating point descriptor comprises: including a frame rate value in the operating point descriptor that describes a maximum frame rate for video data included in a number of views for the MVC operating point; including view identifier values in the operating point descriptor for a number of views targeted for rendering for the MVC operating point, each of the view identifier values corresponding to one of the views targeted for rendering; including view identifier values in the operating point descriptor for a number of views to be decoded for the MVC operating point, each of the view identifier values corresponding to one of the views to be decoded; and including a temporal identifier value in the operating point descriptor that corresponds to a frame rate for a video stream assembled from the video data of the views for the MVC operating point.
[0005]
5. Apparatus, characterized in that it comprises: means for constructing a plurality of multiview video coding (MVC) operating point descriptors, each corresponding to an MVC operating point, wherein the means inserts each MVC operating point descriptor into the descriptor loops of a Program Map Table (PMT) or Program Stream Map (PSM) of an MPEG-2 (Moving Picture Experts Group) Systems standard bitstream, wherein each MVC operating point descriptor signals a rendering capability value that describes a rendering capability to be satisfied by a receiving device to use the corresponding MVC operating point, and a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the corresponding MVC operating point, wherein the rendering capability value describes at least a number of views targeted for rendering for the corresponding MVC operating point, a frame rate for video data of the corresponding MVC operating point, and a temporal identifier value for the corresponding MVC operating point; wherein the decoding capability value describes at least a number of views to be decoded for the corresponding MVC operating point, a level value corresponding to the MVC operating point, and a profile value corresponding to the MVC operating point; and means for outputting the bitstream comprising the plurality of MVC operating point descriptors.
[0006]
6. Apparatus according to claim 5, characterized in that the means for constructing each MVC operating point descriptor further comprises: means for including a frame rate value in the operating point descriptor describing a maximum frame rate for video data included in a number of views for the MVC operating point; means for including view identifier values in the operating point descriptor for a number of views targeted for rendering for the MVC operating point, each of the view identifier values corresponding to one of the views targeted for rendering; means for including view identifier values in the operating point descriptor for a number of views to be decoded for the MVC operating point, each of the view identifier values corresponding to one of the views to be decoded; and means for including a temporal identifier value in the operating point descriptor that corresponds to a frame rate for a video stream assembled from the video data of the views for the MVC operating point.
[0007]
7. Apparatus according to claim 5, characterized in that a multiplexer constructs each MVC operating point descriptor, in that each MVC operating point corresponds to a subset of a number of views of the bitstream, and in that, to construct each MVC operating point descriptor, the multiplexer includes a frame rate value in the operating point descriptor that describes a maximum frame rate for video data included in the views for the corresponding MVC operating point, includes view identifier values in the operating point descriptor for views targeted for rendering for the corresponding MVC operating point, wherein each of the view identifier values corresponds to one of the views targeted for rendering, includes view identifier values in the operating point descriptor for views to be decoded for the corresponding MVC operating point, wherein each of the view identifier values corresponds to one of the views to be decoded, and includes a temporal identifier value in the operating point descriptor that corresponds to a frame rate for a video stream assembled from video data of the views for the corresponding MVC operating point.
[0008]
8. Method characterized in that it comprises: receiving, with a target device, in descriptor loops of a Program Map Table (PMT) or Program Stream Map (PSM) of an MPEG-2 (Moving Picture Experts Group) Systems standard bitstream, a plurality of multiview video coding (MVC) operating point descriptors, each corresponding to an MVC operating point, wherein each MVC operating point descriptor signals a rendering capability value that describes a rendering capability to be satisfied by a receiving device to use the MVC operating point, and a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the MVC operating point, wherein the rendering capability value describes at least a number of views targeted for rendering for the corresponding MVC operating point, a frame rate for video data of the corresponding MVC operating point, and a temporal identifier value for the corresponding MVC operating point, wherein the decoding capability value describes at least a number of views to be decoded for the corresponding MVC operating point, a level value corresponding to the MVC operating point, and a profile value corresponding to the MVC operating point; determining, for each MVC operating point descriptor, whether a video decoder of the target device is capable of decoding the number of views corresponding to the MVC operating point based on the decoding capability signaled by the MVC operating point descriptor; determining, for each MVC operating point descriptor, whether the target device is capable of rendering the views corresponding to the MVC operating point based on the rendering capability signaled by the MVC operating point descriptor; selecting an operating point based on the corresponding MVC operating point descriptor, wherein selecting comprises determining that the video decoder is capable of decoding and rendering views corresponding to the selected operating point; and sending the views corresponding to the selected MVC operating point to the video decoder of the target device.
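The receiver-side steps of claim 8 can be sketched in Python as a filter followed by a choice. Everything named here (`OperatingPoint`, `DeviceCapabilities`, `can_decode`, `can_render`, `select_operating_point`) is hypothetical, and the tie-breaking rule, which prefers the usable operating point with the most rendered views and then the highest frame rate, is an assumption made for this example; the claim only requires that the selected operating point be one the device can decode and render.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class OperatingPoint:
    # Hypothetical flattened view of one MVC operating point descriptor.
    num_display_views: int
    frame_rate: float
    temporal_id: int
    num_decode_views: int
    level_idc: int
    profile_idc: int


@dataclass(frozen=True)
class DeviceCapabilities:
    supported_profiles: FrozenSet[int]
    max_level_idc: int
    max_decode_views: int
    max_display_views: int
    max_frame_rate: float


def can_decode(op: OperatingPoint, dev: DeviceCapabilities) -> bool:
    """Check the signaled decoding capability against the decoder."""
    return (op.profile_idc in dev.supported_profiles
            and op.level_idc <= dev.max_level_idc
            and op.num_decode_views <= dev.max_decode_views)


def can_render(op: OperatingPoint, dev: DeviceCapabilities) -> bool:
    """Check the signaled rendering capability against the display."""
    return (op.num_display_views <= dev.max_display_views
            and op.frame_rate <= dev.max_frame_rate)


def select_operating_point(
    points: Iterable[OperatingPoint], dev: DeviceCapabilities
) -> Optional[OperatingPoint]:
    """Return a decodable and renderable operating point, or None."""
    usable = [op for op in points if can_decode(op, dev) and can_render(op, dev)]
    if not usable:
        return None
    # Assumed preference: more rendered views first, then higher frame rate.
    return max(usable, key=lambda op: (op.num_display_views, op.frame_rate))


# Illustrative values only; they are not taken from the patent.
points = [
    OperatingPoint(1, 30.0, 2, 1, 40, 100),
    OperatingPoint(2, 30.0, 2, 2, 40, 128),
    OperatingPoint(3, 30.0, 2, 3, 41, 118),
]
device = DeviceCapabilities(frozenset({100, 128}), 40, 2, 2, 30.0)
chosen = select_operating_point(points, device)
```

With these sample values the three-view operating point is rejected at the decoding check, so the two-view operating point is chosen and its views would be the ones forwarded to the video decoder.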
[0009]
9. Method according to claim 8, characterized in that each MVC operating point descriptor comprises a frame rate value that describes a maximum frame rate for video data included in the views for the MVC operating point, view identifier values for a number of views targeted for rendering for the MVC operating point, wherein each of the view identifier values corresponds to one of the views targeted for rendering, view identifier values for a number of views to be decoded for the MVC operating point, wherein each of the view identifier values corresponds to one of the views to be decoded, and a temporal identifier value that corresponds to a frame rate for a video stream assembled from the video data of the views for the MVC operating point.
[0010]
10. Method according to claim 9, characterized in that determining whether the video decoder is capable of decoding the views comprises determining whether the video decoder is capable of decoding a number of views equivalent to the number of views to be decoded at the frame rate indicated by the frame rate value.
[0011]
11. Method according to claim 8, characterized in that the target device is configured with a supported number of views that describes a number of views that can be rendered by the target device and a frame rate value that describes a frame rate of video data that can be displayed by the target device, wherein determining whether the target device is capable of rendering the views corresponding to the MVC operating point comprises: comparing a number of views corresponding to the MVC operating point to the supported number of views; and comparing a frame rate of the views corresponding to the MVC operating point to the frame rate value, wherein sending the views corresponding to the MVC operating point to the video decoder comprises sending the views corresponding to the MVC operating point to the video decoder when the number of views corresponding to the MVC operating point is less than or equal to the supported number of views and when the frame rate of the views corresponding to the MVC operating point is less than or equal to the frame rate value.
[0012]
12. Method according to claim 11, characterized by the fact that the supported number of views is inversely proportional to the frame rate value.
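The comparisons of claim 11 and the inverse relationship of claim 12 can be illustrated with a small Python sketch. It assumes, purely for illustration, that the display has a fixed throughput budget measured in view-frames per second, so that the supported number of views is the budget divided by the frame rate value; the function names and the budget figure are hypothetical.

```python
from typing import List, Sequence, Tuple


def device_configurations(
    budget: float, frame_rates: Sequence[float]
) -> List[Tuple[float, int]]:
    """Pair each frame rate value with the number of views supported at it.

    The supported number of views is inversely proportional to the frame
    rate value: doubling the frame rate halves the views that fit the budget.
    """
    return [(fr, int(budget // fr)) for fr in frame_rates]


def can_render(
    op_views: int, op_frame_rate: float, configs: Sequence[Tuple[float, int]]
) -> bool:
    """True if some configuration satisfies both comparisons of claim 11."""
    return any(
        op_views <= supported_views and op_frame_rate <= frame_rate_value
        for frame_rate_value, supported_views in configs
    )


# Hypothetical budget of 120 view-frames per second.
configs = device_configurations(120.0, [15.0, 30.0, 60.0])
```

Under this assumed budget the configurations are 8 views at 15 frames per second, 4 views at 30, and 2 views at 60, so an operating point with 4 views at 30 frames per second passes both comparisons while one with 4 views at 60 frames per second does not.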
[0013]
13. Apparatus characterized in that it comprises: means for receiving a plurality of multiview video coding (MVC) operating point descriptors in descriptor loops of a Program Map Table (PMT) or Program Stream Map (PSM) of an MPEG-2 (Moving Picture Experts Group) Systems standard bitstream, each MVC operating point descriptor corresponding to an MVC operating point, wherein each MVC operating point descriptor signals a rendering capability value that describes a rendering capability to be satisfied by a receiving device to use the MVC operating point, and a decoding capability value that describes a decoding capability to be satisfied by the receiving device to use the MVC operating point, wherein the rendering capability value describes at least a number of views targeted for rendering for the corresponding MVC operating point, a frame rate for video data of the corresponding MVC operating point, and a temporal identifier value for the corresponding MVC operating point, wherein the decoding capability value describes at least a number of views to be decoded for the corresponding MVC operating point, a level value corresponding to the MVC operating point, and a profile value corresponding to the MVC operating point; means for determining, for each MVC operating point descriptor, whether a video decoder of the apparatus is capable of decoding a number of views corresponding to the MVC operating point based on the decoding capability signaled by the MVC operating point descriptor; means for determining, for each MVC operating point descriptor, whether the apparatus is capable of rendering the views corresponding to the MVC operating point based on the rendering capability signaled by the MVC operating point descriptor; means for selecting an operating point based on the corresponding MVC operating point descriptor, wherein selecting comprises determining that the video decoder is capable of decoding and rendering views corresponding to the selected operating point; and means for sending the views corresponding to the selected MVC operating point to the video decoder of the apparatus.
[0014]
14. Memory, characterized in that it comprises instructions stored therein, the instructions being executable by a computer to carry out the method as defined in any one of claims 1 to 4 or 8 to 12.
Patent family:
Publication number | Publication date
AU2010279256A1|2012-03-01|
CN102474655A|2012-05-23|
KR20120054052A|2012-05-29|
EP2462742B1|2017-09-20|
BR112012002259A2|2016-06-14|
CA2768618C|2015-12-08|
RU2530740C2|2014-10-10|
SG177621A1|2012-03-29|
ES2650220T3|2018-01-17|
ZA201201474B|2013-08-28|
JP2013502097A|2013-01-17|
US8948241B2|2015-02-03|
JP5602854B2|2014-10-08|
IL217436A|2015-09-24|
CA2768618A1|2011-02-10|
HK1169247A1|2013-01-18|
KR101293425B1|2013-08-05|
EP2462742A1|2012-06-13|
WO2011017661A1|2011-02-10|
MY180768A|2020-12-08|
HUE037168T2|2018-08-28|
CN102474655B|2016-06-29|
TWI581635B|2017-05-01|
AU2010279256B2|2013-11-28|
IL217436D0|2012-02-29|
TW201112769A|2011-04-01|
US20110032999A1|2011-02-10|
RU2012108618A|2013-11-20|
Legal status:
2018-03-27| B15K| Others concerning applications: alteration of classification|Ipc: H04N 21/435 (2011.01), H04N 21/2343 (2011.01), H04 |
2019-01-15| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|
2019-12-31| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2021-03-09| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2021-05-25| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 06/08/2010, SUBJECT TO THE LEGAL CONDITIONS. PATENT GRANTED IN ACCORDANCE WITH ADI 5.529/DF |
Priority:
Application number | Filing date | Patent title
US23227209P| true| 2009-08-07|2009-08-07|
US61/232,272|2009-08-07|
US24873809P| true| 2009-10-05|2009-10-05|
US61/248,738|2009-10-05|
US26686109P| true| 2009-12-04|2009-12-04|
US61/266,861|2009-12-04|
US12/757,231|US8948241B2|2009-08-07|2010-04-09|Signaling characteristics of an MVC operation point|
US12/757,231|2010-04-09|
PCT/US2010/044780|WO2011017661A1|2009-08-07|2010-08-06|Signaling characteristics of an mvc operation point|