Patent abstract:
Slice header prediction for depth maps in three-dimensional video codecs. In one example, a video encoder is configured to encode a first slice, where the first slice comprises one of a texture slice and a corresponding depth slice, and where the first slice has a slice header comprising complete syntax elements representative of characteristics of the first slice. The video encoder is further configured to determine common syntax elements for a second slice from the slice header of the first slice. The video encoder is also configured to encode the second slice, after encoding the first slice, at least partially based on the determined common syntax elements, where the second slice comprises the one of the texture slice and the depth slice that is not the first slice, and where the second slice has a slice header comprising syntax elements representative of characteristics of the second slice, excluding values for the syntax elements that are common to the first slice.
Publication number: BR112014001461B1
Application number: R112014001461-2
Filing date: 2012-07-20
Publication date: 2021-08-03
Inventors: Marta Karczewicz; Ying Chen
Applicant: Qualcomm Incorporated;
IPC main classification:
Patent description:

[001] This application claims the benefit of U.S. Provisional Applications No. 61/510,738, filed July 22, 2011, No. 61/522,584, filed August 11, 2011, No. 61/563,772, filed November 26, 2011, and No. 61/624,031, filed April 13, 2012, each of which is incorporated herein by reference in its entirety. Technical Field
[002] This disclosure relates to the field of video coding, e.g., coding of three-dimensional video data. Background
[003] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless communication devices such as radiotelephone handsets, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, video gaming devices, video game consoles and the like. Digital video devices implement video compression techniques, such as MPEG-2, MPEG-4, or H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), to transmit and receive digital video more efficiently. Video compression techniques perform spatial and temporal prediction to reduce or remove the redundancy inherent in video sequences.
[004] Video compression techniques perform spatial prediction and/or temporal prediction to reduce or remove the inherent redundancy of video sequences. For block-based video encoding, a video frame or slice can be partitioned into macroblocks. Each macroblock can be additionally partitioned. Macroblocks in an intra-coded (I) frame or slice are encoded using spatial prediction with respect to neighboring macroblocks. Macroblocks in an inter-coded (P or B) frame or slice can use spatial prediction with respect to neighboring macroblocks in the same frame or slice or temporal prediction with respect to other reference frames.
[005] After the video data has been encoded, the video data can be packetized for transmission or storage. The video data can be assembled into a video file conforming to any of a variety of standards, such as the International Organization for Standardization (ISO) base media file format and extensions thereof, such as the AVC file format.
[006] Efforts have been made to develop new video coding standards based on H.264/AVC. One such standard is the scalable video coding (SVC) standard, which is the scalable extension to H.264/AVC. Another standard is multiview video coding (MVC), which has become the multiview extension to H.264/AVC. A joint draft of MVC is described in JVT-AB204, "Joint draft 8.0 on Multiview video coding," 28th JVT meeting, Hannover, Germany, July 2008, available at http://wftp3.itu.int/av-arch/jvt-site/2008_07_Hannover/JVT-AB204.zip. A version of the AVC standard is described in JVT-AD007, "Editors' draft revision to ITU-T Rec. H.264 | ISO/IEC 14496-10 Advanced Video Coding - in preparation for ITU-T SG 16 AAP Consent (in integrated form)," 30th JVT meeting, Geneva, CH, February 2009, available at http://wftp3.itu.int/av-arch/jvt-site/2009_01_Geneva/JVT-AD007.zip. This document integrates SVC and MVC into the AVC specification. Summary
[007] In general, this disclosure describes techniques to support three-dimensional (3D) video rendering. In particular, the techniques of this disclosure relate to encoding and decoding 3D video content. This disclosure also proposes signaling techniques for coded block units of video data. For example, this disclosure proposes reusing syntax elements included in a slice header of a texture view component for corresponding depth view components. Additionally, this disclosure proposes reusing syntax elements included in slice header information of depth view components for texture view components.
[008] In a 3D codec, a view component of each view of video data at a specific time instance can include a texture view component and a depth view component. The texture view component can include luminance components (Y) and chrominance components (Cb and Cr). The luminance (brightness) and chrominance (color) components are collectively referred to herein as "texture" components. The depth view component can be a depth map of an image. In 3D image rendering, depth maps include depth components that are representative of depth values, for example, for corresponding texture components. Depth view components can be used to generate virtual views from a given viewing perspective.
[009] The syntax elements for depth components and texture components can be signaled with a coded block unit. Coded block units, also referred to simply as "coded blocks" in this disclosure, can correspond to macroblocks in ITU-T H.264/AVC (Advanced Video Coding) or to coding units of High Efficiency Video Coding (HEVC).
[0010] In one aspect, a decoding method includes receiving a texture slice for a texture view component associated with one or more coded blocks of video data representative of texture information, the texture slice comprising the one or more coded blocks and a texture slice header comprising syntax elements representative of characteristics of the texture slice. The method further includes receiving a depth slice for a depth view component associated with one or more coded blocks of depth information corresponding to the texture view component, wherein the depth slice comprises the one or more coded blocks of depth information and a depth slice header comprising syntax elements representative of characteristics of the depth slice, and wherein both the depth view component and the texture view component belong to a view and an access unit. The method further comprises decoding a first slice, wherein the first slice comprises one of the texture slice and the depth slice, wherein the first slice has a slice header comprising complete syntax elements representative of characteristics of the first slice, and determining common syntax elements for a second slice from the slice header of the first slice. The method may further include decoding the second slice after decoding the first slice at least partially based on the determined common syntax elements, wherein the second slice comprises the one of the texture slice and the depth slice that is not the first slice, wherein the second slice has a slice header that comprises syntax elements representative of characteristics of the second slice, excluding values for the syntax elements that are common to the first slice.
[0011] In another aspect, a device for decoding data includes a video decoder configured to receive a texture slice for a texture view component associated with one or more coded blocks of video data representative of texture information, wherein the texture slice comprises the one or more coded blocks and a texture slice header comprising syntax elements representative of characteristics of the texture slice, receive a depth slice for a depth view component associated with one or more coded blocks of depth information corresponding to the texture view component, wherein the depth slice comprises the one or more coded blocks of depth information and a depth slice header comprising syntax elements representative of characteristics of the depth slice, and wherein both the depth view component and the texture view component belong to a view and an access unit, decode a first slice, wherein the first slice comprises one of the texture slice and the depth slice, wherein the first slice has a slice header comprising complete syntax elements representative of characteristics of the first slice, determine common syntax elements for a second slice from the slice header of the first slice, and decode the second slice after decoding the first slice at least partially based on the determined common syntax elements, wherein the second slice comprises the one of the texture slice and the depth slice that is not the first slice, wherein the second slice has a slice header comprising syntax elements representative of characteristics of the second slice, excluding values for the syntax elements that are common to the first slice.
[0012] In another aspect, a computer program product comprises a computer-readable storage medium having stored thereon instructions that, when executed, cause a processor of a video decoding device to receive a texture slice for a texture view component associated with one or more coded blocks of video data representative of texture information, the texture slice comprising the one or more coded blocks and a texture slice header comprising syntax elements representative of characteristics of the texture slice. The instructions further cause the processor of the video decoding device to receive a depth slice for a depth view component associated with one or more coded blocks of depth information corresponding to the texture view component, wherein the depth slice comprises the one or more coded blocks of depth information and a depth slice header comprising syntax elements representative of characteristics of the depth slice, and wherein both the depth view component and the texture view component belong to a view and an access unit. The instructions further cause the processor of the video decoding device to decode a first slice, wherein the first slice comprises one of the texture slice and the depth slice, wherein the first slice has a slice header comprising complete syntax elements representative of characteristics of the first slice, and to determine common syntax elements for a second slice from the slice header of the first slice. The instructions further cause the processor of the video decoding device to decode the second slice after decoding the first slice at least partially based on the determined common syntax elements, wherein the second slice comprises the one of the texture slice and the depth slice that is not the first slice, wherein the second slice has a slice header comprising syntax elements representative of characteristics of the second slice, excluding values for the syntax elements that are common to the first slice.
[0013] In another aspect, there is provided a device comprising means for receiving a texture slice for a texture view component associated with one or more coded blocks of video data representative of texture information, wherein the texture slice comprises the one or more coded blocks and a texture slice header comprising syntax elements representative of characteristics of the texture slice. The device further comprises means for receiving a depth slice for a depth view component associated with one or more coded blocks of depth information corresponding to the texture view component, wherein the depth slice comprises the one or more coded blocks of depth information and a depth slice header comprising syntax elements representative of characteristics of the depth slice, and wherein both the depth view component and the texture view component belong to a view and an access unit. The device further comprises means for decoding a first slice, wherein the first slice comprises one of the texture slice and the depth slice, wherein the first slice has a slice header comprising syntax elements representative of characteristics of the first slice, and means for determining common syntax elements for a second slice from the slice header of the first slice. The device further comprises means for decoding the second slice after decoding the first slice at least partially based on the determined common syntax elements, wherein the second slice comprises the one of the texture slice and the depth slice that is not the first slice, wherein the second slice has a slice header that comprises syntax elements representative of characteristics of the second slice, excluding values for the syntax elements that are common to the first slice.
[0014] In one aspect, an encoding method includes receiving a texture slice for a texture view component associated with one or more coded blocks of video data representative of texture information, the texture slice comprising the one or more coded blocks and a texture slice header comprising syntax elements representative of characteristics of the texture slice. The method further includes receiving a depth slice for a depth view component associated with one or more coded blocks of depth information corresponding to the texture view component, wherein the depth slice comprises the one or more coded blocks of depth information and a depth slice header comprising syntax elements representative of characteristics of the depth slice, and wherein both the depth view component and the texture view component belong to a view and an access unit. The method further comprises encoding a first slice, wherein the first slice comprises one of the texture slice and the depth slice, wherein the first slice has a slice header comprising syntax elements representative of characteristics of the first slice, and determining common syntax elements for a second slice from the slice header of the first slice. The method may further include encoding the second slice after encoding the first slice at least partially based on the determined common syntax elements, wherein the second slice comprises the one of the texture slice and the depth slice that is not the first slice, wherein the second slice has a slice header that comprises syntax elements representative of characteristics of the second slice, excluding values for the syntax elements that are common to the first slice.
[0015] In another aspect, a device for encoding data includes a video encoder configured to receive a texture slice for a texture view component associated with one or more coded blocks of video data representative of texture information, wherein the texture slice comprises the one or more coded blocks and a texture slice header comprising syntax elements representative of characteristics of the texture slice, receive a depth slice for a depth view component associated with one or more coded blocks of depth information corresponding to the texture view component, wherein the depth slice comprises the one or more coded blocks of depth information and a depth slice header comprising syntax elements representative of characteristics of the depth slice, and wherein both the depth view component and the texture view component belong to a view and an access unit. The video encoder is further configured to encode a first slice, wherein the first slice comprises one of the texture slice and the depth slice, wherein the first slice has a slice header comprising syntax elements representative of characteristics of the first slice, determine common syntax elements for a second slice from the slice header of the first slice, and encode the second slice after encoding the first slice at least partially based on the determined common syntax elements, wherein the second slice comprises the one of the texture slice and the depth slice that is not the first slice, wherein the second slice has a slice header that comprises syntax elements representative of characteristics of the second slice, excluding values for the syntax elements that are common to the first slice.
[0016] In another aspect, a computer program product comprises a computer-readable storage medium having stored thereon instructions that, when executed, cause a processor of a video encoding device to receive a texture slice for a texture view component associated with one or more coded blocks of video data representative of texture information, the texture slice comprising the one or more coded blocks and a texture slice header comprising syntax elements representative of characteristics of the texture slice. The instructions further cause the processor of the video encoding device to receive a depth slice for a depth view component associated with one or more coded blocks of depth information corresponding to the texture view component, wherein the depth slice comprises the one or more coded blocks of depth information and a depth slice header comprising syntax elements representative of characteristics of the depth slice, and wherein both the depth view component and the texture view component belong to a view and an access unit. The instructions further cause the processor of the video encoding device to encode a first slice, wherein the first slice comprises one of the texture slice and the depth slice, wherein the first slice has a slice header comprising syntax elements representative of characteristics of the first slice, and to determine common syntax elements for a second slice from the slice header of the first slice. The instructions further cause the processor of the video encoding device to encode the second slice after encoding the first slice at least partially based on the determined common syntax elements, wherein the second slice comprises the one of the texture slice and the depth slice that is not the first slice, wherein the second slice has a slice header that comprises syntax elements representative of characteristics of the second slice, excluding values for the syntax elements that are common to the first slice.
[0017] In another aspect, a device is provided comprising means for receiving a texture slice for a texture view component associated with one or more coded blocks of video data representative of texture information, wherein the texture slice comprises the one or more coded blocks and a texture slice header comprising syntax elements representative of characteristics of the texture slice. The device further comprises means for receiving a depth slice for a depth view component associated with one or more coded blocks of depth information corresponding to the texture view component, wherein the depth slice comprises the one or more coded blocks of depth information and a depth slice header comprising syntax elements representative of characteristics of the depth slice, and wherein both the depth view component and the texture view component belong to a view and an access unit. The device further comprises means for encoding a first slice, wherein the first slice comprises one of the texture slice and the depth slice, wherein the first slice has a slice header comprising syntax elements representative of characteristics of the first slice. The device further comprises means for determining common syntax elements for a second slice from the slice header of the first slice. The device further comprises means for encoding the second slice after encoding the first slice at least partially based on the determined common syntax elements, wherein the second slice comprises the one of the texture slice and the depth slice that is not the first slice, wherein the second slice has a slice header that comprises syntax elements representative of characteristics of the second slice, excluding values for the syntax elements that are common to the first slice.
[0018] The techniques described in this disclosure may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the software can be executed on a processor, which can refer to one or more processors, such as a microprocessor, application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), digital signal processor (DSP), or other equivalent integrated or discrete logic circuitry. Software comprising instructions for performing the techniques may be initially stored on a computer-readable medium and loaded and executed by a processor.
[0019] Accordingly, this disclosure also contemplates computer-readable media comprising instructions for causing a processor to perform any of a variety of techniques as described in this disclosure. In some cases, the computer-readable medium may form part of a computer program product, which may be sold to manufacturers and/or used in a device. The computer program product may include the computer-readable medium and, in some cases, may also include packaging materials.
[0020] This disclosure can also apply to electromagnetic signals that carry information. For example, an electromagnetic signal may comprise information relating to the total pixel support used to interpolate a value for a sub-integer pixel of a reference sample. In some examples, a signal can be generated from a device that implements the techniques described in this document or can be transmitted by it. In other examples, this disclosure may apply to signals that can be received in a device that implements the techniques described herein.
[0021] Details of one or more aspects of the disclosure are set forth in the accompanying drawings and in the description below. Other features, objectives and advantages of the techniques described in this disclosure will be apparent from the description and drawings and from the claims. Brief Description of Drawings
[0022] Figure 1 is a block diagram illustrating an example of a video encoding and decoding system, in accordance with the techniques of the present disclosure.
[0023] Figure 2 is a block diagram illustrating an example of the video encoder of Figure 1 in greater detail, in accordance with the techniques of the present disclosure.
[0024] Figure 3 is a diagram of an example of an MVC prediction structure for multi-view video encoding, in accordance with the techniques of the present disclosure.
[0025] Figure 4 is a flowchart illustrating an exemplary operation of a video encoder, in accordance with the techniques of the present disclosure.
[0026] Figure 5 is a block diagram illustrating an example of the video decoder of Figure 1 in greater detail, in accordance with the techniques of the present disclosure.
[0027] Figure 6 is a flowchart illustrating an exemplary operation of a video decoder, in accordance with the techniques of the present disclosure. Detailed Description
[0028] This disclosure describes signaling techniques that an encoder can apply and that a decoder can use during at least the inter-prediction stage of a video encoding or decoding process. The described techniques are related to the coding of three-dimensional ("3D") video content. 3D video content can be represented, for example, as coded blocks of multiview video plus depth ("MVD"). That is, these techniques can be applied to encode or decode a bitstream that resembles a multiview video coding (MVC) bitstream, where any or all views of the MVC bitstream can additionally include depth information.
[0029] More specifically, some techniques in accordance with this disclosure involve receiving at least one two-dimensional image that has texture view components and depth view components. Some texture view components and depth view components can be coded together into a single coded block or as separate blocks. An image can be broken into image slices. Syntax elements for coding texture view components can be signaled in a slice header. Some syntax elements for the depth view components can be predicted from the syntax elements for the texture view components that correspond to the depth view components. The techniques of this disclosure relate to the encoding, decoding, and signaling of data used to render three-dimensional video data from two-dimensional video data, based on estimated depth map data for the two-dimensional video data. In some examples, texture view components are encoded using techniques other than those used to encode depth information. In this disclosure, the term "coding" can refer to encoding, decoding, or both.
[0030] Video conversion based on depth estimation and virtual view synthesis is used to create 3D images, such as for 3D video applications. In particular, virtual views of a scene can be used to create a 3D view of the scene. Generating a virtual view of a scene based on an existing view of the scene is conventionally achieved by estimating object depth values before synthesizing the virtual view. Depth estimation is a process of estimating absolute or relative distances between objects and a camera plane from stereo pairs or monoscopic content. As used herein, depth information includes information useful in forming three-dimensional video, such as a depth map (e.g., depth values on a per-pixel basis) or a parallax map (e.g., horizontal disparity on a per-pixel basis).
[0031] Estimated depth information, usually represented by a gray-level image depth map, can be used to generate arbitrary angles for virtual views using depth image-based rendering (DIBR) techniques. Compared to traditional three-dimensional television (3DTV) systems, where multiview sequences face the challenge of efficient inter-view compression, a depth map-based system can reduce bandwidth usage by transmitting only one or a few views along with the depth map(s), which can be efficiently encoded. The depth map(s) used in depth map-based conversion may be controllable (e.g., through scaling) by end users before the depth map(s) are used in view synthesis. Customized virtual views can be generated with different amounts of perceived depth. In addition, depth estimation can be performed using monoscopic video, where only 2D content of one view is available.
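The core DIBR warping step can be illustrated with a short, non-normative sketch. In the following Python example, the function names (depth_to_disparity, warp_view) and the linear depth-to-disparity mapping are assumptions made only for illustration; texture pixels are shifted horizontally by a disparity derived from an 8-bit depth map to synthesize one virtual view, and disoccluded positions are simply left as holes.

import numpy as np

def depth_to_disparity(depth, d_min=1.0, d_max=4.0):
    # Map 8-bit depth (0 = far, 255 = near) linearly to a horizontal
    # disparity in pixels; the mapping and range are illustrative only.
    return d_min + (depth.astype(np.float32) / 255.0) * (d_max - d_min)

def warp_view(texture, depth):
    # Shift each texture pixel horizontally by its disparity to synthesize
    # one virtual view; disoccluded positions remain holes (value 0).
    h, w = depth.shape
    virtual = np.zeros_like(texture)
    disparity = np.rint(depth_to_disparity(depth)).astype(np.int32)
    for y in range(h):
        for x in range(w):
            x_new = x - disparity[y, x]
            if 0 <= x_new < w:
                virtual[y, x_new] = texture[y, x]
    return virtual

# Tiny example: an 8x8 gradient texture with a constant-depth map.
texture = np.tile(np.arange(8, dtype=np.uint8) * 30, (8, 1))
depth = np.full((8, 8), 128, dtype=np.uint8)
print(warp_view(texture, depth))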
[0032] The techniques described herein can be applied to predict syntax elements for a depth view component from syntax elements stored in a slice header for the co-located texture view component in the same view. For example, values for syntax elements that are common to the depth slice and the texture slice can be included in the slice header for the texture view component, but not in the slice header for the associated depth view component. That is, a video encoder or decoder can code syntax elements that are common to the depth slice and the texture slice in the slice header for the texture view component, where those elements are not present in the slice header for the depth view component. For example, a first value can be provided for a first syntax element in the slice header for the texture view component. The slice header for the depth view component also shares the first syntax element, which means that the first syntax element is common to both the texture slice header and the depth slice header. The first syntax element for the depth view component has a second value. However, the slice header for the depth view component does not include the first syntax element. According to the techniques described herein, the second value of the first syntax element can be predicted from the first value.
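As a rough, non-normative illustration of this kind of slice header prediction, the following Python sketch fills in the syntax elements missing from a depth slice header with the values carried in the corresponding texture slice header, letting explicitly signaled elements override the inherited ones. The dictionary field names are hypothetical and do not correspond to the actual H.264/AVC or HEVC syntax element set.

# Illustrative texture slice header: the complete set of (hypothetical)
# syntax elements for the first slice of the view component pair.
TEXTURE_SLICE_HEADER = {
    "frame_num": 7,
    "pic_order_cnt_lsb": 14,
    "slice_type": "P",
    "num_ref_idx_l0_active": 2,
    "pic_parameter_set_id": 0,
    "slice_qp_delta": 3,
}

# The depth slice header signals only the elements whose values may differ
# (here, its own PPS id and QP delta); the common elements are omitted.
DEPTH_SLICE_HEADER_SIGNALED = {
    "pic_parameter_set_id": 1,
    "slice_qp_delta": -2,
}

def predict_slice_header(first_slice_header, signaled_elements):
    # Common syntax elements are inherited from the slice header of the
    # first slice; explicitly signaled elements override the inherited ones.
    predicted = dict(first_slice_header)
    predicted.update(signaled_elements)
    return predicted

print(predict_slice_header(TEXTURE_SLICE_HEADER, DEPTH_SLICE_HEADER_SIGNALED))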
[0033] In some examples, only a picture parameter set (PPS) id and a delta quantization parameter (QP) of a slice are signaled in the slice header of the depth view component. In other examples, reference picture list construction information is signaled in addition to the PPS id and delta QP. Other syntax elements are inherited or determined from the slice header of the texture view component. In some examples, the values for the common syntax elements are set to be the same as the corresponding syntax elements. That is, the other syntax elements for the slice header of the depth view component are set equal to the corresponding values in the slice header of the corresponding texture view component.
[0034] In another example, the start position of the coded block (macroblock or coding unit) is additionally signaled. That is, the slice header for a slice of depth information signals the location of the first block (for example, first macroblock or CU) of the slice, without signaling other syntax data for the slice header (which can be determined to be equal to the corresponding syntax data of the slice that includes the corresponding texture information). When the starting position of the slice is not signaled, it is inferred to be 0 in some examples. A frame_num and POC value of the depth view component can additionally be signaled. A flag can be used to indicate whether one or more loop filter parameters used for the depth view component are the same as the one or more loop filter parameters signaled for the texture view component.
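A minimal sketch of how a decoder might resolve these optional elements is given below. The field names, the defaults, and the assumption that unsignaled values fall back to the co-located texture slice header are illustrative only, not a normative parsing process.

def resolve_depth_slice_fields(signaled, texture_header):
    # Resolve optional depth slice header elements against the co-located
    # texture slice header; names and defaults here are illustrative only.
    fields = {}
    # Slice start position: inferred to be 0 when not signaled.
    fields["first_block_in_slice"] = signaled.get("first_block_in_slice", 0)
    # frame_num / POC may be signaled for the depth view component,
    # otherwise taken from the corresponding texture slice header.
    fields["frame_num"] = signaled.get("frame_num", texture_header["frame_num"])
    fields["pic_order_cnt"] = signaled.get("pic_order_cnt",
                                           texture_header["pic_order_cnt"])
    # A flag indicates whether the loop filter parameters are shared.
    if signaled.get("loop_filter_same_as_texture", True):
        fields["loop_filter_params"] = texture_header["loop_filter_params"]
    else:
        fields["loop_filter_params"] = signaled["loop_filter_params"]
    return fields

texture_header = {"frame_num": 7, "pic_order_cnt": 14,
                  "loop_filter_params": {"alpha_offset": 0, "beta_offset": 0}}
print(resolve_depth_slice_fields({"loop_filter_same_as_texture": True},
                                 texture_header))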
[0035] Block-based inter-coding is an encoding technique that relies on temporal prediction to reduce or remove temporal redundancy between video blocks of successive coded units of a video sequence. The coded units may comprise video frames, slices of video frames, groups of pictures, or another defined unit of encoded video blocks. For inter-coding, a video encoder performs motion estimation and motion compensation to estimate motion between video blocks of two or more adjacent coded units. Using motion estimation techniques, the video encoder generates motion vectors, which indicate the displacement of video blocks relative to corresponding predictive video blocks in one or more reference frames or other coded units. Using motion compensation techniques, the video encoder uses the motion vectors to generate predictive video blocks from the one or more reference frames or other coded units. After motion compensation, the video encoder calculates residual video blocks by subtracting the predictive video blocks from the original video blocks being encoded.
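The relationship between a motion vector, the predictive block, and the residual can be sketched as follows. This Python example assumes integer-pel motion vectors and ignores fractional-pel interpolation and block-boundary handling, so it is only a conceptual illustration rather than any codec's actual motion compensation process.

import numpy as np

def motion_compensate(reference, mv, block_pos, block_size=4):
    # Fetch the predictive block pointed to by an integer-pel motion vector
    # (dy, dx); real codecs also interpolate fractional-pel positions.
    y, x = block_pos
    dy, dx = mv
    return reference[y + dy:y + dy + block_size, x + dx:x + dx + block_size]

def residual_block(current, reference, mv, block_pos, block_size=4):
    # Residual = original block minus motion-compensated predictive block.
    y, x = block_pos
    original = current[y:y + block_size, x:x + block_size].astype(np.int16)
    prediction = motion_compensate(reference, mv, block_pos, block_size)
    return original - prediction.astype(np.int16)

reference = np.arange(64, dtype=np.uint8).reshape(8, 8)
current = np.roll(reference, 1, axis=1)   # content shifted right by one pixel
mv = (0, -1)                              # motion vector pointing one pixel left
print(residual_block(current, reference, mv, block_pos=(2, 2)))  # all zeros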
[0036] Reference view components (RVCs) can include multiple texture or depth slices. In some examples, where the reference view components comprise multiple slices, a co-located slice can be used when determining the syntax elements of a current slice. Alternatively, a first slice in the RVC can be used to determine the syntax elements of the current slice. In other examples, another slice in the RVC can be used to determine the common syntax elements of the current slice.
[0037] Figure 1 is a block diagram illustrating an example of a video encoding and decoding system 10, in accordance with the techniques of the present disclosure. As shown in the example of Figure 1, the system 10 includes a source device 12 that transmits encoded video to a destination device 14 via a link 15. The link 15 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, link 15 comprises a communication medium for enabling source device 12 to transmit the encoded video data directly to destination device 14 in real time. The encoded video data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the destination device 14. The communication medium can comprise any wired or wireless communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium can form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.
[0038] The source device 12 and the destination device 14 may comprise any of a wide range of devices. In some examples, either the source device 12 or the destination device 14 may comprise wireless communication devices, such as wireless telephone handsets, so-called cellular or satellite radiotelephones, or any wireless devices that can communicate video information over link 15, in which case link 15 is wireless. However, the techniques of this disclosure, which pertain to coding blocks of video data that include both texture and depth information, are not necessarily limited to wireless applications or settings. The techniques can also be useful in a wide range of other settings and devices, including devices that communicate over physical wires, optical fibers, or other physical or wireless media. Additionally, the encoding or decoding techniques can also be applied in a standalone device that does not necessarily communicate with any other device. For example, the video decoder 28 may reside in a digital media player or other device and receive encoded video data via streaming, download, or storage media. Therefore, the description of source device 12 and destination device 14 in communication with each other is provided for purposes of illustrating an exemplary implementation and should not be considered limiting to the techniques described in this disclosure, which may be applicable to video coding in general in a variety of environments, applications, or implementations.
[0039] In the example of Figure 1, the source device 12 includes a video source 20, a depth processing unit 21, a video encoder 22, and an output interface 24. The destination device 14 includes an input interface 26, a video decoder 28, and a display device 30. In accordance with this disclosure, the video encoder 22 of source device 12 may be configured to apply one or more of the techniques of this disclosure as part of a video encoding process. Similarly, the video decoder 28 of destination device 14 may be configured to apply one or more of the techniques of this disclosure as part of a video decoding process.
[0040] The video encoder 22 can also apply transform, quantization, and entropy coding processes to further reduce the bit rate associated with communication of residual blocks. Transform techniques can comprise discrete cosine transforms (DCTs) or conceptually similar processes. Alternatively, wavelet transforms, integer transforms, or other types of transforms can be used. In a DCT process, as an example, a set of pixel values is converted into transform coefficients, which represent the energy of the pixel values in the frequency domain. Video encoder 22 can also quantize the transform coefficients, which generally involves a process that reduces the number of bits associated with the corresponding transform coefficients. Entropy coding can include one or more processes that collectively compress the data for output to a bitstream, where the compressed data can include, for example, a sequence of coding modes, motion information, coded block patterns, and quantized transform coefficients. Examples of entropy coding include, but are not limited to, context-adaptive variable-length coding (CAVLC) and context-adaptive binary arithmetic coding (CABAC).
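As a minimal sketch of the transform and quantization steps just described, the following Python example converts a residual block into quantized transform coefficients using a floating-point orthonormal DCT-II and a single uniform quantization step. This is only a stand-in for illustration; AVC and HEVC use integer transforms and QP-dependent scaling rather than the simple division shown here.

import numpy as np

def dct2_matrix(n=4):
    # Orthonormal DCT-II basis matrix; a floating-point stand-in for the
    # integer transforms actually specified by AVC/HEVC.
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] *= 1.0 / np.sqrt(2.0)
    return m * np.sqrt(2.0 / n)

def transform_and_quantize(residual, qstep=8.0):
    # Forward 2-D transform of the residual block followed by uniform
    # quantization; real codecs use QP-dependent scaling matrices instead.
    t = dct2_matrix(residual.shape[0])
    coeffs = t @ residual.astype(np.float64) @ t.T
    return np.rint(coeffs / qstep).astype(np.int32)

residual = np.array([[ 5,  4,  3,  2],
                     [ 4,  3,  2,  1],
                     [ 3,  2,  1,  0],
                     [ 2,  1,  0, -1]])
print(transform_and_quantize(residual))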
[0041] An encoded video block can be represented by prediction information that can be used to create or identify a predictive block, and a residual block of data that can be applied to the predictive block to recreate the original block. The prediction information can comprise the one or more motion vectors that are used to identify the predictive block of data. Using the motion vectors, the video decoder 28 may be able to reconstruct the predictive blocks that were used to encode the residual blocks. Thus, given a set of residual blocks and a set of motion vectors (and possibly some additional syntax), the video decoder 28 can reconstruct a video frame that was originally encoded. Inter-coding based on motion estimation and motion compensation can achieve relatively high amounts of compression without excessive data loss, because successive video frames or other types of coded units are often similar. An encoded video sequence may comprise blocks of residual data, motion vectors (when inter-prediction encoded), indications of intraprediction modes for intraprediction, and syntax elements.
[0042] Video encoder 22 can also use intraprediction techniques to encode video blocks relative to neighboring video blocks of a common frame or slice. In this way, the video encoder 22 spatially predicts the blocks. Video encoder 22 can be configured with a variety of intraprediction modes, which generally correspond to various spatial prediction directions. As with motion estimation, video encoder 22 can be configured to select an intraprediction mode based on a block's luminance component, and then reuse the intraprediction mode to encode the block's chrominance components. Furthermore, in accordance with the techniques of this disclosure, video encoder 22 can reuse the intraprediction mode to encode the block's depth component.
[0043] By reusing motion and intraprediction mode information to encode a depth component of a block, these techniques can simplify the process of encoding depth maps. Furthermore, the techniques described herein can improve bitstream efficiency. That is, the bitstream need only indicate certain syntax elements once, in a slice header for the texture view component, instead of signaling additional syntax elements in a slice header for a corresponding depth view component slice.
[0044] Optionally, a texture view component can also reuse syntax elements from the slice header of its corresponding depth view component in the same way.
[0045] Again, the system 10 illustrated in Figure 1 is merely an example. The various techniques of this disclosure can be performed by any encoding device that supports block-based predictive encoding or by any decoding device that supports block-based predictive decoding. Source device 12 and destination device 14 are merely examples of such coding devices, in which source device 12 generates encoded video data for transmission to destination device 14. In some cases, devices 12 and 14 may operate in a substantially symmetrical manner, such that each of devices 12 and 14 includes video encoding and decoding components. Therefore, system 10 can support one-way or two-way video transmission between video devices 12 and 14, for example, for video streaming, video playback, video broadcasting, or video telephony.
[0046] Video source 20 of source device 12 includes a video capture device, such as a video camera, a video file containing previously captured video, or a video feed from a video content provider. Alternatively, the video source 20 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and/or computer-generated video. In some cases, if the video source 20 is a video camera, the source device 12 and the destination device 14 may form so-called camera phones or video phones, or other mobile devices configured to handle video data, such as tablet computing devices. In each case, the captured, pre-captured, or computer-generated video can be encoded by video encoder 22. Video source 20 captures a view 2 and provides it to depth processing unit 21.
[0047] Video source 20 provides view 2 to depth processing unit 21 for calculation of depth images for objects in view 2. In some examples, view 2 comprises more than one view. A depth image is determined for objects in view 2 captured by video source 20. Depth processing unit 21 is configured to automatically calculate depth values for objects in the image of view 2. For example, depth processing unit 21 calculates depth values for objects based on luminance information. In some examples, depth processing unit 21 is configured to receive depth information from a user. In some examples, video source 20 captures two views of a scene from different perspectives and then calculates depth information for objects in the scene based on the disparity between the objects in the two views. In various examples, the video source 20 comprises a standard two-dimensional camera, a two-camera system that provides a stereoscopic view of a scene, a camera array that captures multiple views of the scene, or a camera that captures one view plus depth information.
[0048] The depth processing unit 21 provides texture view components 4 and depth view components 6 to the video encoder 22. The depth processing unit 21 can also provide the view 2 directly to the video encoder 22. Depth information 6 comprises a depth map image for view 2. A depth map image may comprise a map of depth values for each pixel region associated with an area (e.g., block, slice, or frame) to be displayed. A pixel region includes a single pixel or a group of one or more pixels. Some examples of depth maps have one depth component per pixel. In other examples, there are multiple depth components per pixel. Depth maps can be encoded in a manner substantially similar to texture data, for example, using intra-prediction or inter-prediction with respect to other, previously encoded depth data. In other examples, depth maps are encoded in a different way than texture data is encoded.
[0049] The depth map can be estimated in some examples. When more than one view is present, stereo matching can be used to estimate depth maps. However, in 2D to 3D conversion, estimating depth can be more difficult. Nevertheless, a depth map estimated by various methods can be used for 3D rendering based on depth image-based rendering (DIBR).
[0050] Although the video source 20 can provide multiple views of a scene and the depth processing unit 21 can calculate depth information based on the multiple views, the source device 12 can generally transmit one view plus depth information for each view of a scene.
[0051] When view 2 is a digital still image, video encoder 22 can be configured to encode view 2 as, for example, a Joint Photographic Experts Group (JPEG) image. When view 2 is a frame of video data, video encoder 22 is configured to encode first view 50 in accordance with a video coding standard such as, for example, Moving Picture Experts Group (MPEG), International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) MPEG-1 Visual, ISO/IEC MPEG-2 Visual, ISO/IEC MPEG-4 Visual, International Telecommunication Union (ITU) H.261, ITU-T H.262, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), the emerging High Efficiency Video Coding (HEVC) standard (also known as H.265), or other video coding standards. Video encoder 22 may include depth information 6 along with the encoded image to form coded block 8, which includes encoded image data along with depth information 6. Video encoder 22 passes coded block 8 to the output interface 24. Coded block 8 may be transferred to input interface 26 in a bitstream that includes signaling information along with coded block 8, over link 15.
[0052] The encoded video information includes texture components 4 and depth information 6. Texture components 4 can include luminance (luma) and chrominance (chroma) components of video information. Luma components generally describe brightness, while chrominance components generally describe shades of color. Depth processing unit 21 extracts depth information 6 from a depth map of view 2. Video encoder 22 can encode texture view components 4 and depth view components 6 into a single coded block 8 of encoded video data. Likewise, video encoder 22 can encode the block so that the motion or intraprediction mode information for the luma component is reused for the chroma components and the depth component. Syntax elements used for the texture view components can be used to predict similar syntax elements for the depth view components.
[0053] In some instances, the depth map view component may not be coded using inter-view prediction techniques even when the corresponding texture view component is coded using inter-view prediction techniques. For example, the depth map view component can be predicted using intra-view prediction when the corresponding texture view component is predicted using inter-view prediction. Inter-view prediction of a texture view component predicts the texture view information from data of a view different from the view corresponding to the texture view component. In contrast, intra-view prediction of the depth view information predicts the depth information from data in the same view as the view corresponding to the depth view information.
[0054] Despite the use of different prediction techniques, some syntax elements for the depth map view component can be predicted from the corresponding syntax elements in the slice header of the corresponding texture view component. However, the slice header information for the depth map view component can contain information related to reference picture list construction. That is, information related to reference picture list construction can be signaled in the slice header for the depth map view component. For example, the number of reference pictures that are used, and an indication of which reference pictures are used to predict the depth map view component, can be signaled in the slice header for the depth map view component. Similar information can also be signaled in a slice header for the corresponding texture view component.
[0055] In some examples, the source device 12 includes a modem that modulates the coded block 8 in accordance with a communication standard, for example, such as code division multiple access (CDMA) or another communication standard. A modem can include various mixers, filters, amplifiers, or other components designed for signal modulation. The output interface 24 can include circuits designed to transmit data, including amplifiers, filters, and one or more antennas. Coded block 8 is transmitted to destination device 14 via output interface 24 and link 15. In some examples, instead of transmission over a communication channel, source device 12 stores encoded video data, including blocks that have texture and depth components, on a storage device 32, such as a digital video disc (DVD), Blu-ray disc, flash drive, or the like.
[0056] The input interface 26 of the destination device 14 receives information on link 15. In some examples, the destination device 14 includes a modem that demodulates the information. As with output interface 24, input interface 26 can include circuitry designed to receive data, including amplifiers, filters, and one or more antennas. In some cases, the output interface 24 and/or the input interface 26 may be incorporated within a single transceiver component that includes both receiving and transmitting circuitry. A modem can include multiple mixers, filters, amplifiers, or other components designed for signal demodulation. In some cases, a modem may include components to perform both modulation and demodulation.
[0057] Again, the video encoding process performed by video encoder 22 may implement one or more of the techniques described herein during interprediction encoding, which may include motion estimation and motion compensation and intraprediction encoding. The video decoding process performed by the video decoder 28 can also perform such techniques during a motion compensation stage for the decoding process.
[0058] The term "encoder" is used herein to refer to a specialized computer device or apparatus that performs video encoding or video decoding. The term “encoder” generally refers to any video encoder, video decoder or combined encoder/decoder (codec). The term “encoding” refers to encoding or decoding. The terms "encoded block," "encoded block unit," or "encoded unit" can refer to any independently decodable unit of a video frame such as an entire frame, a slice of a frame, a block of video data or another independently decodable unit defined in accordance with the encoding techniques used.
[0059] The display device 30 displays the decoded video data to a user and may comprise any of a variety of one or more display devices, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device. In some examples, the display device 30 corresponds to a device capable of three-dimensional playback. For example, the display device 30 may comprise a stereoscopic display, which is used in conjunction with eyewear worn by a viewer. The eyewear may comprise active glasses, in which case the display device 30 rapidly alternates between images of different views synchronously with alternate shuttering of the lenses of the active glasses. Alternatively, the eyewear may comprise passive glasses, in which case the display device 30 displays images of different views simultaneously, and the passive glasses may include polarized lenses that are generally polarized in orthogonal directions to filter between the different views.
[0060] In the example of Figure 1, the link 15 may comprise any wired or wireless communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media. The link 15 may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The link 15 generally represents any suitable communication medium, or collection of different communication media, for transmitting video data from the source device 12 to the destination device 14. The link 15 may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.
[0061] The video encoder 22 and video decoder 28 can operate in accordance with a video compression standard, such as the ITU-T H.264 standard, alternatively described as MPEG-4, Part 10, Advanced Video Coding (AVC). Additional video compression standards that are based on the ITU H.264/AVC standard and that can be used by video encoder 22 and video decoder 28 include the scalable video coding (SVC) standard, which is a scalable extension to the ITU H.264/AVC standard. Another standard in accordance with which video encoder 22 and video decoder 28 can operate includes the multiview video coding (MVC) standard, which is a multiview extension to the ITU H.264/AVC standard. The techniques of this disclosure, however, are not limited to any particular video coding standard.
[0062] In some aspects, video encoder 22 and video decoder 28 may each be integrated with an audio encoder and decoder and may include appropriate MUX-DEMUX or other hardware and software units to manage encoding both audio and video into a common data stream or separate data streams. If applicable, MUX-DEMUX units can conform to ITU H.223 multiplexer protocol or other protocols such as User Datagram Protocol (UDP).
[0063] Video encoder 22 and video decoder 28 can each be implemented as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When any or all of the techniques of this disclosure are implemented in software, an implementing device may additionally include hardware to store and/or execute instructions for the software, for example, a memory to store the instructions and one or more processing units to execute the instructions. Each of video encoder 22 and video decoder 28 can be included in one or more encoders or decoders, either of which can be integrated as part of a combined codec that provides encoding and decoding capabilities in a respective mobile device, subscriber device, broadcast device, server, or the like.
[0064] A video sequence typically includes a series of video frames, also referred to as video pictures. Video encoder 22 operates on video blocks within individual video frames in order to encode the video data. Video blocks can have fixed or varying sizes, and may differ in size according to a specified coding standard. Each video frame includes a series of one or more slices. In the ITU-T H.264 standard, for example, each slice includes a series of macroblocks, which can be arranged into sub-blocks. The H.264 standard supports intraprediction at various block sizes for two-dimensional (2D) video encoding, such as 16 by 16, 8 by 8, or 4 by 4 for luma components and 8 by 8 for chroma components, as well as interprediction at various block sizes, such as 16 by 16, 16 by 8, 8 by 16, 8 by 8, 8 by 4, 4 by 8, and 4 by 4 for luma components and corresponding scaled sizes for chroma components. Video blocks can comprise blocks of pixel data or blocks of transform coefficients, for example, following a transformation process such as a discrete cosine transform (DCT) or a conceptually similar transformation process. These techniques can be extended to 3D video.
[0065] Smaller video blocks can provide better resolution and can be used for locations on a video frame that include high levels of detail. In general, macroblocks and various subblocks can be considered to be video blocks. Additionally, a slice can be considered to be a series of video blocks, such as macroblocks and/or subblocks. Each slice can be an independently decodable unit of a video frame. Alternatively, the frames themselves can be decodable units or other portions of a frame can be defined as decodable units.
[0066] The 2D macroblocks of the ITU-T H.264 standard can be extended to 3D by encoding depth information from a depth map or parallax map together with the associated luma and chroma components (that is, texture components) for that video frame or slice. Parallax mapping (also referred to as virtual displacement mapping or offset mapping) shifts texture view components at a pixel location based on a function of a view angle and a height map at the pixel location. Video encoder 22 can encode the depth information as monochrome video.
[0067] To encode the video blocks, such as a coded block, the video encoder 22 performs intra or interprediction to generate one or more prediction blocks. Video encoder 22 subtracts the prediction blocks from the original video blocks to be encoded to generate residual blocks. In this way, residual blocks can represent pixel-by-pixel differences between the blocks that are coded and the prediction blocks. Video encoder 22 can perform a transform on the residual blocks to generate blocks of transform coefficients. Following intra- or inter-based predictive coding and transformation techniques, the video encoder 22 can quantize the transform coefficients. Quantization generally refers to a process in which coefficients are quantized to possibly reduce the amount of data used to represent the coefficients. Following quantization, entropy coding can be performed according to an entropy coding methodology such as context adaptive variable length coding (CAVLC) or context adaptive binary arithmetic coding (CABAC). Additional details of an encoding process performed by video encoder 22 are described below in relation to Figure 2.
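To make the scan and entropy coding step that follows quantization more concrete, the following Python sketch shows a conventional zig-zag scan of a quantized 4x4 block and the Exp-Golomb bit strings of the kind H.264/AVC uses for many syntax elements. It illustrates the general idea only; it is not the CAVLC or CABAC coefficient coding process itself.

import numpy as np

def zigzag_scan(block):
    # Order coefficients from low to high frequency along anti-diagonals,
    # alternating direction (the conventional zig-zag pattern).
    n = block.shape[0]
    order = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: (rc[0] + rc[1],
                                   rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return [int(block[r, c]) for r, c in order]

def ue(v):
    # Unsigned Exp-Golomb codeword: (len-1) leading zeros + binary of v+1.
    bits = bin(v + 1)[2:]
    return "0" * (len(bits) - 1) + bits

def se(v):
    # Signed Exp-Golomb mapping: v > 0 maps to codeNum 2v-1, v <= 0 to -2v.
    return ue(2 * v - 1 if v > 0 else -2 * v)

quantized = np.array([[ 6, -2,  1, 0],
                      [ 3,  1,  0, 0],
                      [ 1,  0,  0, 0],
                      [ 0,  0,  0, 0]])
scanned = zigzag_scan(quantized)
print(scanned)                               # low-frequency coefficients first
print([se(level) for level in scanned if level != 0])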
[0068] Efforts are currently underway to develop a new video coding standard, currently referred to as High Efficiency Video Coding (HEVC). The emerging standard is also referred to as H.265. The standardization efforts are based on a model of a video coding device referred to as the HEVC Test Model (HM). HM presumes several capabilities of video coding devices over devices according to, for example, ITU-T H.264/AVC. For example, while H.264 provides nine intraprediction encoding modes, HM provides up to thirty-three intraprediction encoding modes. HEVC can be extended to support the slice header information techniques described herein.
[0069] HM refers to a block of video data as a coding unit (CU). Syntax data within a bitstream can define a largest coding unit (LCU), which is a largest coding unit in terms of the number of pixels. In general, a CU serves a purpose similar to a macroblock of H.264, except that a CU does not have a size distinction. A coded block can be a CU according to the HM standard. Thus, a CU can be split into subCUs. In general, references in this disclosure to a CU may refer to a largest coding unit (LCU) of a picture or to a subCU of an LCU. An LCU can be split into subCUs, and each subCU can be split into subCUs. Syntax data for a bitstream can define a maximum number of times an LCU can be split, referred to as CU depth. Accordingly, a bitstream can also define a smallest coding unit (SCU). This disclosure also uses the term "block" to refer to any of a CU, a prediction unit (PU), or a transform unit (TU).
[0070] An LCU can be associated with a quadtree data structure. In general, a quadtree data structure includes one node per CU, where a root node corresponds to the LCU. If a CU is divided into four subCUs, the node corresponding to the CU includes four leaf nodes, each of which corresponds to one of the subCUs. Each node of the quadtree data structure can provide syntax data for the corresponding CU. For example, a node in the quadtree might include a split flag, indicating whether the CU corresponding to the node is split into subCUs. The syntax elements for a CU can be defined recursively and can depend on whether the CU is divided into subCUs.
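A minimal sketch of such a quadtree is shown below in Python. The fixed split rule and the class and field names are purely illustrative assumptions; a real encoder decides each split flag, for example through rate-distortion optimization, and signals it in the bitstream as described above.

class CUNode:
    # Minimal quadtree node: a CU either has split_flag set and four child
    # sub-CUs, or is a leaf that would carry prediction/transform data.
    def __init__(self, x, y, size, depth, max_depth):
        self.x, self.y, self.size, self.depth = x, y, size, depth
        self.split_flag = depth < max_depth   # illustrative fixed split rule
        self.children = []
        if self.split_flag:
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    self.children.append(
                        CUNode(x + dx, y + dy, half, depth + 1, max_depth))

def leaves(node):
    # Collect the leaf CUs of an LCU quadtree recursively.
    if not node.split_flag:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

# A 64x64 LCU split down to depth 2 yields sixteen 16x16 leaf CUs.
lcu = CUNode(x=0, y=0, size=64, depth=0, max_depth=2)
print(len(leaves(lcu)), [(cu.x, cu.y, cu.size) for cu in leaves(lcu)[:4]])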
[0071] A CU that is not split can include one or more prediction units (PUs). In general, a PU represents all or a portion of the corresponding CU and includes data to retrieve a reference sample for the PU. For example, when the PU is intramode encoded, the PU can include data describing an intraprediction mode for the PU. As another example, when the PU is intermode coded, the PU can include data that defines a motion vector for the PU. Data defining the motion vector can describe, for example, a horizontal motion vector component, a vertical motion vector component, a resolution for the motion vector (for example, one-quarter or one-eighth pixel precision), a reference frame to which the motion vector points, and/or a reference list (for example, list 0 or list 1) for the motion vector. The motion vector can also be treated as having different resolutions for texture view components and depth view components. The data for the CU that defines the PU(s) can also describe, for example, partitioning of the CU into one or more PUs. Partitioning modes can differ depending on whether the CU is skip or direct mode coded, intraprediction mode coded, or interprediction mode coded.
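For illustration only, the following C++ sketch groups the kind of data enumerated above that may define a motion vector for an inter-coded PU; the structure and field names are hypothetical.

```cpp
// Illustrative sketch only: data that may define a motion vector for a PU.
#include <cstdint>

enum class RefList : uint8_t { kList0, kList1 };
enum class MvResolution : uint8_t { kQuarterPel, kEighthPel };

struct MotionVector {
    int16_t horizontal;       // horizontal motion vector component
    int16_t vertical;         // vertical motion vector component
    MvResolution resolution;  // e.g., one-quarter or one-eighth pixel precision
    int refFrameIndex;        // reference frame the motion vector points to
    RefList refList;          // reference list (list 0 or list 1)
};
```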
[0072] A CU that has one or more PUs may also include one or more Transform Units (TUs). Following the prediction using a PU, the video encoder 22 can calculate a residual value for the portion of the CU corresponding to the PU. The residual value can be transformed, scanned and quantized. A TU is not necessarily limited to the size of a PU. In this way, the TUs can be larger or smaller than the corresponding PUs for the same CU. In some examples, the maximum size of a TU can match the size of the corresponding CU.
[0073] As noted above, intraprediction includes predicting a PU of a current CU of a visual representation from previously encoded CUs of the same visual representation. More specifically, video encoder 22 can intrapredict a current CU of a visual representation using a particular intrapredict mode. An HM encoder can be configured with up to thirty-three intraprediction modes. Therefore, to support a one-to-one mapping between directional intraprediction modes and directional transforms, the HM encoders and decoders would need to store 66 matrices for each supported transform size. Furthermore, the block sizes for which all thirty-three intraprediction modes are supported can be relatively large blocks, for example 32x32 pixels, 64x64 pixels or even larger.
[0074] At the destination device 14, the video decoder 28 receives encoded video data 8. The video decoder 28 entropy decodes the received encoded video data 8, such as a coded block, according to an entropy coding methodology, such as CAVLC or CABAC, to obtain the quantized coefficients. Video decoder 28 applies inverse quantization (dequantization) and inverse transform functions to reconstruct the residual block in the pixel domain. The video decoder 28 also generates a prediction block based on the control information or syntax information (e.g., coding mode, motion vectors, syntax defining filter coefficients and the like) included in the encoded video data. The video decoder 28 calculates a sum of the prediction block and the reconstructed residual block to produce a reconstructed video block for display. Further details of an example decoding process performed by video decoder 28 are described below in relation to Figure 5.
[0075] As described in this document, Y can represent luminance, Cb and Cr can represent two different chrominance values of a three-dimensional YCbCr color space (eg, blue and red hues) and D can represent depth information. In some examples, each pixel location might actually define three pixel values for a three-dimensional color space and one pixel value for the depth of the pixel location. In other examples, there may be different numbers of luma components per chroma component. For example, there may be four luma components per chroma component. Additionally, the depth and texture components can have different resolutions. In such an example, there may not be a one-to-one relationship between texture view components (eg luma components) and depth view components. The techniques of this disclosure, however, may refer to prediction with respect to a dimension for the sake of simplicity. To the extent that techniques are described in relation to pixel values in one dimension, similar techniques can be extended to other dimensions. In particular, in accordance with one aspect of this disclosure, video encoder 22 and/or video decoder 28 can obtain a pixel block, wherein the pixel block includes texture view components and depth view components.
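By way of illustration, the following C++ sketch shows one possible way to hold the three color-space values and the depth value defined for a pixel location, as described above; the layout and bit depths are assumptions of this sketch and, as noted, the texture and depth components may in practice have different resolutions.

```cpp
// Illustrative sketch only: per-pixel texture (Y, Cb, Cr) and depth (D) samples.
#include <cstdint>

struct PixelSample {
    uint8_t y;   // luminance (Y)
    uint8_t cb;  // chrominance (Cb)
    uint8_t cr;  // chrominance (Cr)
    uint8_t d;   // depth information for the pixel location
};
```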
[0076] In some examples, video encoder 22 and video decoder 28 may use one or more interpolation filtering techniques during motion compensation. That is, the video encoder 22 and/or the video decoder 28 can apply an interpolation filter to filter data that comprise groups of full integer pixel positions.
[0077] The video decoder 28 of the target device 14 receives one or more encoded blocks as part of an encoded video bitstream along with additional information including syntax elements related to texture view components. The video decoder 28 can render the video data for 3D playback based on coded block 8 and the syntax elements. In accordance with the techniques of this disclosure and as discussed in more detail below, the syntax elements signaled for texture view components 4 can be used to predict syntax elements for depth view components 6. The syntax elements can be signaled in a slice header for texture view components 4. The corresponding syntax elements for depth view components 6 can be determined from the related syntax elements for texture view components 4.
[0078] Some syntax elements for the depth view components 6 can be signaled in a slice header for the depth view components 6, such as a quantization parameter difference between the depth map component and one of the one or more texture components for a slice. The attribute can also be a slice-level flag that indicates whether the loop filter parameters used for the depth view component are the same as the loop filter parameters signaled for the texture view components. In other examples, syntax elements can be signaled at the sequence level (for example, in a sequence parameter set (SPS) data structure), the visual representation level (for example, in a visual representation parameter set (PPS) data structure or frame header) or the block level (for example, in a block header) in addition to the slice level (for example, in a slice header).
[0079] Figure 2 is a block diagram illustrating an example of the video encoder 22 of Figure 1 in greater detail. Video encoder 22 encodes block units that signal syntax elements for texture view components that can be used to predict syntax elements for depth view components, consistent with the techniques of this disclosure. Video encoder 22 is an example of a specialized video computer device or apparatus referred to herein as an "encoder". As shown in Figure 2, video encoder 22 corresponds to video encoder 22 of source device 12. However, in other examples, video encoder 22 may correspond to a different device. In further examples, other units (such as, for example, other encoders/decoders (CODECS)) can also perform similar techniques to those performed by the video encoder 22.
[0080] The video encoder 22 can perform at least one of intracoding and intercoding of blocks within video frames, although the intracoding components are not shown in Figure 2 for ease of illustration. Intracoding relies on spatial prediction to reduce or remove spatial redundancy of video within a given video frame. Intercoding relies on temporal prediction to reduce or remove temporal redundancy of video within adjacent frames of a video sequence. Intramode (I-mode) can refer to the spatially based compression mode. Intermodes, such as prediction (P-mode) or bidirectional (B-mode), can refer to time-based compression modes. The techniques of this disclosure apply during intercoding and intracoding. However, for simplicity and ease of illustration, intracoding units such as a spatial prediction unit are not illustrated in Figure 2.
[0081] As shown in Figure 2, the video encoder 22 receives a video block within a video frame to be encoded. In one example, video encoder 22 receives texture view components 4 and depth view components 6. In another example, the video encoder 22 receives view 2 from video source 20.
[0082] In the example of Figure 2, the video encoder 22 includes a prediction processing unit 32, a multiview video plus depth (MVD) unit 33, memory 34, a first adder 48, a transform processing unit 38, a quantization unit 40 and an entropy coding unit 46. For video block reconstruction, the video encoder 22 also includes an inverse quantization unit 42, an inverse transform processing unit 44, a second adder 51 and a deblocking unit 43. The deblocking unit 43 is a deblocking filter that filters block boundaries to remove blocking artifacts from the reconstructed video. If included in video encoder 22, the deblocking unit 43 would typically filter the output of second adder 51. The deblocking unit 43 can determine deblocking information for the one or more texture view components. The deblocking unit 43 can also determine deblocking information for the depth map component. In some examples, the deblocking information for the one or more texture components may be different from the deblocking information for the depth map component. In one example, as shown in Figure 2, the transform processing unit 38 represents a functional block, as opposed to a “TU” in terms of HEVC.
[0083] The multiview video plus depth (MVD) unit 33 receives one or more blocks of video (labeled "VIDEO BLOCK" in Figure 2) that comprise texture components and depth information, such as texture view components 4 and depth view components 6. The MVD unit 33 provides functionality to the video encoder 22 to encode depth components in a block unit. The MVD unit 33 provides the texture view components and depth view components, either combined or separate, to the prediction processing unit 32 in a format that allows the prediction processing unit 32 to process depth information. The MVD unit 33 can also signal to the transform processing unit 38 that depth view components are included with the video block. In other examples, each unit of video encoder 22, such as prediction processing unit 32, transform processing unit 38, quantization unit 40, entropy coding unit 46, etc., comprises the functionality to process depth information in addition to texture view components.
[0084] In general, video encoder 22 encodes depth information similarly to chrominance information, in that motion compensation unit 37 is configured to reuse motion vectors calculated for a luminance component of a block when calculating a predicted value for a depth component of the same block. Similarly, an intraprediction unit of video encoder 22 can be configured to use an intraprediction mode selected for the luminance component (that is, based on analysis of the luminance component) when encoding the depth view component using intraprediction.
[0085] The prediction processing unit 32 includes a motion estimation unit (ME) 35 and a motion compensation unit (MC) 37. The prediction processing unit 32 predicts depth information for both pixel locations and for texture components. One or more interpolation filters 39 (referred to herein as "filter 39") may be included in the prediction processing unit 32 and may be called by one or both of the ME unit 35 and the MC unit 37 to perform interpolation as part of motion estimation and/or motion compensation. The interpolation filter 39 may actually represent a plurality of different filters to facilitate a number of different types of interpolation and interpolation type filtering. Therefore, the prediction processing unit 32 may include a plurality of interpolation filters or interpolation-like filters.
[0086] During the encoding process, the video encoder 22 receives a block of video to be encoded (labeled “VIDEO BLOCK” in Figure 2) and the prediction processing unit 32 performs interprediction coding to generate a prediction block (labeled “PREDICTION BLOCK” in Figure 2). The prediction block includes both texture view components and depth view information. Specifically, the ME unit 35 can perform motion estimation to identify the prediction block in memory 34 and the MC unit 37 can perform motion compensation to generate the prediction block.
[0087] Motion estimation is normally considered the process of generating motion vectors that estimate motion for video blocks. A motion vector can, for example, indicate the offset of a prediction block within a prediction or reference frame (or other coded unit, for example, a slice) relative to the block to be coded within the current frame (or another coded unit). The motion vector can have full-integer or sub-integer pixel precision. For example, both a horizontal component and a vertical component of the motion vector can have respective full-integer components or sub-integer components. The reference frame (or portion of the frame) can be located temporally before or after the video frame (or portion of the video frame) to which the current video block belongs. Motion compensation is usually considered the process of fetching or generating the prediction block from memory 34, which may include interpolating or otherwise generating the prediction data based on the motion vector determined by motion estimation.
[0088] The ME unit 35 calculates at least one motion vector for the video block to be encoded by comparing the video block to reference blocks of one or more reference frames (for example, a previous and/or subsequent frame). Data for the reference frames can be stored in memory 34. The ME unit 35 can perform motion estimation with fractional pixel precision, sometimes referred to as fractional pixel, fractional pel, sub-integer or sub-pixel motion estimation. In fractional pixel motion estimation, the ME unit 35 calculates a motion vector that indicates displacement to a location other than an integer pixel location. Therefore, the motion vector can have fractional pixel precision, for example, half-pixel precision, quarter-pixel precision, one-eighth pixel precision, or other fractional pixel precision. In this way, fractional pixel motion estimation allows the prediction processing unit 32 to estimate motion with higher precision than integer pixel (or full pixel) locations, and therefore the prediction processing unit 32 can generate a more accurate prediction block. Fractional pixel motion estimation allows the prediction processing unit 32 to predict depth information at a first resolution and predict texture components at a second resolution. For example, the texture components may be predicted with full pixel precision while the depth information is predicted with half-pixel precision. In other examples, other motion vector resolutions can be used for depth information and texture components.
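For illustration only, the following C++ sketch fetches a prediction sample at a fractional-pixel position by bilinear interpolation of the four surrounding full-integer pixel positions, which is one simple way to support the fractional pixel motion estimation described above; the kernel and the absence of boundary handling are assumptions of this sketch, not a description of the actual interpolation filter 39.

```cpp
// Illustrative sketch only: bilinear interpolation at a fractional-pixel position.
#include <cstdint>
#include <vector>

// x and y may carry fractional parts, e.g. 0.5 for half-pixel precision.
double SampleAt(const std::vector<std::vector<uint8_t>>& frame, double x, double y) {
    const int x0 = static_cast<int>(x), y0 = static_cast<int>(y);
    const double fx = x - x0, fy = y - y0;  // fractional offsets
    // Weighted average of the four surrounding full-integer pixel positions.
    return (1 - fx) * (1 - fy) * frame[y0][x0]     + fx * (1 - fy) * frame[y0][x0 + 1]
         + (1 - fx) * fy       * frame[y0 + 1][x0] + fx * fy       * frame[y0 + 1][x0 + 1];
}
```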
[0089] The ME 35 unit can call one or more filters 39 for any necessary interpolations during the motion estimation process. In some examples, memory 34 can store interpolated values for sub-integer pixels that can be calculated by, for example, adder 51 using filters 39. For example, adder 51 can apply filters 39 to reconstructed blocks that are to be stored in memory 34 .
[0090] Once the prediction processing unit 32 generated the prediction block, the video encoder 22 forms a residual video block (labeled "RESID BLOCK." in Figure 2) by subtracting the prediction block from the block of original video being encoded. This subtraction can occur between texture components in the original video block and texture components in the prediction block, as well as for depth information in the original video block or depth map of depth information in the prediction block. The adder 48 represents the component or components that perform this subtraction operation.
[0091] The transform processing unit 38 applies a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform, to the residual block, which produces a video block comprising residual transform block coefficients. It should be understood that the transform processing unit 38 represents the component of the video encoder 22 which applies a transform to residual coefficients of a video data block, in contrast to a TU of a CU as defined by HEVC. The transform processing unit 38 can, for example, perform other transforms, such as those defined by the H.264 standard, that are conceptually similar to DCT. Such transforms include, for example, directional transforms (such as Karhunen-Loeve theorem transforms), wavelet transforms, integer transforms, subband transforms, or other types of transforms. In any case, the transform processing unit 38 applies the transform to the residual block, which produces a block of residual transform coefficients. The transform processing unit 38 can apply the same type of transform to both texture components and depth information in corresponding residual blocks. There will be separate residual blocks for each texture and depth component. The transform converts the residual information from a pixel domain to a frequency domain.
[0092] The quantization unit 40 quantizes the residual transform coefficients to further reduce the bit rate. The quantization process can reduce the bit depth associated with some or all of the coefficients. The quantization unit 40 can quantize a depth image encoding residue. After quantization, the entropy encoding unit 46 encodes the quantized transform coefficients by entropy. For example, the entropy encoding unit 46 may perform CAVLC, CABAC or other entropy encoding methodology.
[0093] The entropy encoding unit 46 may also encode one or more motion vectors and supporting information obtained from the prediction processing unit 32 or another component of video encoder 22, such as the quantization unit 40. The one or more prediction syntax elements may include an encoding mode, data for one or more motion vectors (for example, vertical and horizontal components, reference list identifiers, list indexes, and/or motion vector resolution signaling information), an indication of an interpolation technique used, a set of filter coefficients, an indication of the resolution of the depth image relative to the resolution of the luminance component, a quantization matrix for the depth image encoding residue, deblocking information for the depth image, or other information associated with the generation of the prediction block. These prediction syntax elements can be provided at the sequence level or at the visual representation level.
[0094] The one or more syntax elements may also include a quantization parameter difference (QP) between the luminance component and the depth component. The QP difference can be signaled at the slice level and can be included in a slice header for texture view components. Other syntax elements can also be signaled at a coded block unit level that includes a coded block pattern for the depth view component, a delta QP for the depth view component, a motion vector difference, or others information associated with the generation of the prediction block. The motion vector difference can be signaled as a delta value between a target motion vector and a motion vector of the texture components or as a delta value between the target motion vector (ie, the motion vector of the block being coded) and a predictor of motion vectors neighbors to the block (eg a PU of a CU). After entropy encoding by the entropy encoding unit 46, the encoded video and syntax elements can be transmitted to another device or archived (e.g., in memory 34) for later transmission or retrieval.
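Purely as an illustration of the delta signaling described above, the following C++ sketch computes a depth QP as a difference relative to the luminance QP and a motion vector difference relative to the texture motion vector (or a motion vector predictor); the names are hypothetical and the sketch does not reflect the actual bitstream syntax.

```cpp
// Illustrative sketch only: delta QP and motion vector difference computation.
struct DeltaSignaling {
    int qpDelta;        // depth QP minus luminance QP, signaled at the slice level
    int mvdHorizontal;  // target MV minus texture MV (or MV predictor), horizontal
    int mvdVertical;    // target MV minus texture MV (or MV predictor), vertical
};

DeltaSignaling MakeDeltas(int luminanceQp, int depthQp,
                          int textureMvX, int textureMvY,
                          int depthMvX, int depthMvY) {
    return {depthQp - luminanceQp, depthMvX - textureMvX, depthMvY - textureMvY};
}
```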
[0095] The inverse quantization unit 42 and the inverse transform processing unit 44 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, for example, for later use as a reference block. The reconstructed residual block (labeled “BLOCK RESID. RECON.” in Figure 2) may represent a reconstructed version of the residual block supplied to the transform processing unit 38. The reconstructed residual block may differ from the residual block generated by the adder 48 due to loss of detail caused by quantize and inverse quantize operations. The adder 51 adds the reconstructed residual block to the motion compensated prediction block produced by the prediction processing unit 32 to produce a reconstructed video block for storage in memory 34. The reconstructed video block can be used by the prediction processing unit 32 as a reference block that can be used to subsequently encode a block unit into a subsequent video frame or subsequent encoded unit.
[0096] Thus, video encoder 22 represents an example of a video encoder configured to receive a coded block unit comprising a view component indicative of a view of an image, wherein the view component comprises one or more texture view components and a depth view component, and to generate a texture slice header for the one or more texture view components that includes texture syntax elements, wherein the depth syntax elements for the depth view component can be determined from the texture syntax elements in the texture slice header.
[0097] In some cases, information related to the encoding of texture view components and depth view components is indicated as one or more syntax elements for inclusion in the encoded bitstream. In some examples, a depth slice header comprises syntax elements that include at least one of the starting macroblock location, the slice type, the visual representation parameter set (PPS) to be used, the delta QP between the initial QP of the slice and the QP signaled in the PPS, the order of the reference visual representations (represented as frame_num) and a display order of the current visual representation (POC). The depth slice header can also comprise at least one of a reference visual representation list construction and related syntax elements, a memory management control operation and related syntax elements, and a weighted prediction and related syntax elements.
[0098] Figure 3 is a diagram of an example of an MVC prediction structure for multi-view video encoding. MVC is an extension of H.264/AVC. The MVC prediction structure includes both inter-visual-representation prediction within each view and interview prediction. In Figure 3, predictions are indicated by arrows, where the pointed-to object uses the pointing-from object for prediction reference. The MVC prediction structure of Figure 3 can be used in conjunction with a time-first coding order arrangement. In a time-first decoding order, each access unit can be defined to contain coded visual representations of all views for one output time instance. The decoding order of access units may not be identical to the output or display order.
[0099] In MVC, interview prediction is supported by disparity motion compensation which uses the H.264/AVC motion compensation syntax, but allows a visual representation in a different view to be placed as a reference visual representation . Two-view encoding could also be supported by MVC. An MVC encoder can have more than two views as a 3D video input and an MVC decoder can decode a multi-view representation. A renderer with an MVC decoder can decode 3D video content with multiple views.
[00100] Visual representations in the same access unit (that is, with the same time instance) can be interview predicted in MVC. When encoding a visual representation in one of the non-base views, a visual representation can be added to a reference visual representation list if it is in a different view but has the same time instance. An interview prediction reference visual representation can be placed at any position in a reference visual representation list, like any interprediction reference visual representation.
[00101] In MVC, the interview prediction can be performed as if the view component in another view is an interprediction reference. Potential interview references can be flagged in the Sequence Parameter Set (SPS) MVC extension. Potential interview references can be modified by the visual reference representation list construction process that allows flexible ordering of the interprediction or interview prediction references.
[00102] In contrast, in HEVC, the slice header follows a design principle similar to that in H.264/AVC. Additionally, a HEVC slice header can contain an adaptive loop filter (ALF) syntax parameter in the current HEVC specification. In some examples, the depth slice header comprises one or more adaptive loop filter parameters.
[00103] In a 3DV codec, a view component of each view at a specific time occurrence can include a texture view component and a depth view component. A slice structure can be used for error resilience purposes, that is, to provide error resilience. However, a depth view component can only be meaningful when the corresponding texture view component is received correctly. By including all syntax elements for the Depth View component, a slice header for the NAL unit of a Depth View component can be relatively large. Depth slice header size can be reduced by predicting some syntax elements from syntax elements in texture slice header for texture view components.
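The following C++ sketch illustrates, under simplifying assumptions, the size reduction described above: the depth slice header inherits the fields it shares with the texture slice header of the corresponding texture view component, and only a few elements are carried explicitly; the SliceHeader fields shown are a hypothetical subset.

```cpp
// Illustrative sketch only: building a depth slice header from a texture slice header.
struct SliceHeader {
    int firstMbInSlice;
    int ppsId;
    int sliceQpDelta;
    int frameNum;
    int poc;
};

// Elements common with the texture slice header are inherited; only the elements
// signaled for the depth slice are overwritten here.
SliceHeader BuildDepthSliceHeader(const SliceHeader& textureHeader,
                                  int depthPpsId, int depthQpDelta) {
    SliceHeader depthHeader = textureHeader;  // predict common syntax elements
    depthHeader.ppsId = depthPpsId;           // explicitly signaled for depth
    depthHeader.sliceQpDelta = depthQpDelta;  // explicitly signaled for depth
    return depthHeader;
}
```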
[00104] A bitstream can be used to transfer multiview video plus depth block units and syntax elements between, for example, a source device 12 and a target device 14 of Figure 1. The bitstream can conform to the ITU H.264/AVC encoding standard, and in particular follows a multi-view video encoding (MVC) bitstream structure. That is, in some examples, the bitstream conforms to the MVC extension of H.264/AVC. In other examples, the bitstream conforms to a multiview extension of HEVC or a multiview extension of another standard. In still other examples, other coding standards are used.
[00105] A common MVC bitstream order arrangement (decoding order) is time-first coding. Each access unit is defined to contain the coded visual representations of all views for one output time instance. The decoding order of access units may or may not be identical to the output or display order. Typically, MVC prediction can include both inter-visual-representation prediction within each view and interview prediction. In MVC, interview prediction can be supported by disparity motion compensation, which uses the H.264/AVC motion compensation syntax but allows a visual representation in a different view to be used as a reference visual representation.
[00106] Two-view encoding is supported by MVC. One of the advantages of MVC is that an MVC encoder can have more than two views as a 3D video input and an MVC decoder can decode the two views as a multi-view representation. Therefore, a renderer with an MVC decoder can treat 3D video content as having multiple views. However, MVC, similar to H.264/AVC with Supplemental Enhancement Information (SEI) messages (stereo information or spatially interleaved visual representations), does not process depth map input.
[00107] In the H.264/AVC standard, Network Abstraction Layer (NAL) units are defined to provide a "network-friendly" video representation addressing applications such as video telephony, storage or video streaming. NAL units can be categorized into Video Coding Layer (VCL) NAL units and non-VCL NAL units. VCL NAL units can contain the core compression engine and comprise block, macroblock (MB) and slice levels. Other NAL units are non-VCL NAL units.
[00108] In an example of 2D video encoding, each NAL Unit contains a one-byte NAL Unit header and a payload of varying size. Five bits are used to specify the NAL unit type. Three bits are used for nal_ref_idc which indicate how important the NAL unit is in terms of being referenced by other visual representations (NAL units). For example, setting nal_ref_idc equal to 0 means that the NAL unit is not used for interprediction. As H.264/AVC is expanded to include 3D video encoding, such as the scalable video encoding (SVC) standard, the NAL header can be similar to that of the 2D scenario. For example, one or more bits in the NAL unit header are used to identify that the NAL unit is a four-component NAL unit.
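As an illustration of the one-byte NAL unit header just described, the following C++ sketch unpacks a nal_ref_idc field and the five-bit NAL unit type from a header byte; the exact bit positions used here are an assumption of this sketch rather than a restatement of the normative syntax.

```cpp
// Illustrative sketch only: unpacking a one-byte NAL unit header.
#include <cstdint>

struct NalUnitHeader {
    uint8_t nalRefIdc;    // importance for reference by other visual representations
    uint8_t nalUnitType;  // five-bit NAL unit type
};

NalUnitHeader ParseNalHeader(uint8_t headerByte) {
    NalUnitHeader h;
    h.nalRefIdc = (headerByte >> 5) & 0x03;  // assumed position of nal_ref_idc
    h.nalUnitType = headerByte & 0x1F;       // low five bits: nal_unit_type
    // A nalRefIdc equal to 0 means the NAL unit is not used for interprediction.
    return h;
}
```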
[00109] NAL unit headers can also be used for NAL MVC units. However, in MVC, the NAL unit header structure can be retained except for the prefix of the NAL units and MVC encoded slice NAL units. The MVC encoded slice NAL units may comprise a four byte header and the NAL unit payload which may include a block unit such as an encoded block 8 of Figure 1. The syntax elements in the NAL unit MVC header may include priority_id, temporal_id, anchor_pic_flag, view_id, non_idr_flag and inter_view_flag. In other examples, other syntax elements are included in a NAL unit MVC header.
[00110] The anchor_pic_flag syntax element can indicate whether a visual representation is an anchor visual representation or a non-anchor visual representation. An anchor visual representation and all visual representations that follow it in output order (that is, display order) can be decoded correctly without decoding previous visual representations in decoding order (that is, bitstream order) and therefore can be used as random access points. Anchor visual representations and non-anchor visual representations can have different dependencies, both of which can be signaled in the sequence parameter set.
[00111] The bitstream structure defined in MVC can be characterized by two syntax elements: view_id and temporal_id. The view_id syntax element can indicate the identifier of each view. This identifier in the NAL unit header allows for easy identification of NAL units in the decoder and quick access to the decoded views for display. The temporal_id syntax element can indicate the temporal scalability hierarchy or, indirectly, the frame rate. For example, an operating point that includes NAL units with a lower maximum temporal_id value may have a lower frame rate than an operating point with a higher maximum temporal_id value. Visual representations encoded with a larger temporal_id value typically depend on visual representations encoded with smaller temporal_id values within a view, but may not depend on any visual representation encoded with a larger temporal_id.
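For illustration only, the following C++ sketch extracts an operating point by keeping the NAL units whose temporal_id does not exceed a target value, in line with the temporal scalability use of temporal_id described above; the NalUnit container is hypothetical.

```cpp
// Illustrative sketch only: sub-bitstream extraction by maximum temporal_id.
#include <vector>

struct NalUnit {
    int viewId;      // view identifier from the NAL unit header
    int temporalId;  // temporal scalability level from the NAL unit header
    // payload omitted
};

std::vector<NalUnit> ExtractOperatingPoint(const std::vector<NalUnit>& bitstream,
                                           int maxTemporalId) {
    std::vector<NalUnit> out;
    for (const NalUnit& nal : bitstream)
        if (nal.temporalId <= maxTemporalId)  // lower maximum temporal_id => lower frame rate
            out.push_back(nal);
    return out;
}
```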
[00112] The view_id and temporal_id syntax elements in the NAL unit header can be used for both bitstream extraction and bitstream adaptation. The priority_id syntax element can be used primarily for the simple process of bitstream adaptation of a trajectory. The inter_view_flag syntax element can indicate whether this NAL unit will be used for interview prediction of another NAL unit in a different view.
[00113] MVC can also employ Sequence Parameter Sets (SPSs) and includes an SPS MVC extension. Parameter sets are used for signaling in H.264/AVC. Sequence parameter sets comprise sequence-level header information. Visual representation parameter sets (PPSs) comprise the infrequently changing visual-representation-level header information. With parameter sets, this infrequently changing information does not need to be repeated for every sequence or visual representation, and thus coding efficiency is improved. Furthermore, the use of parameter sets allows out-of-band transmission of header information, which avoids the need for redundant transmissions for error resilience. In some examples of out-of-band transmission, the parameter set NAL units are transmitted on a different channel than the other NAL units. In MVC, a view dependency can be signaled in the SPS MVC extension. All interview prediction can be done within the scope specified by the SPS MVC extension.
[00114] In some previous 3D video encoding techniques, content is encoded in such a way that the color components, for example in the YCbCr color space, are encoded in one or more NAL units while the depth image is encoded in one or more separate NAL units. However, when no single NAL unit contains the coded samples of both the texture and depth images of an access unit, several problems can occur. For example, in a 3D video decoder, it is expected that after decoding both the texture and the depth image of each frame, view rendering based on the depth map and texture is activated to generate the virtual views. If the depth image NAL unit and the texture NAL unit for an access unit are encoded sequentially, view rendering may not start until the entire access unit is decoded. This can lead to an increase in the time it takes for the 3D video to be rendered.
[00115] Furthermore, the texture image and the associated depth map image can share some information at various levels in the codec, for example, sequence level, visual representation level, slice level and block level. Encoding this information into two NAL units can create extra deployment difficulty when sharing or predicting the information. Therefore, the encoder may have to perform motion estimation for a frame twice, once for the texture and again for the depth map. Similarly, the decoder may need to perform motion compensation twice for one frame.
[00116] As described in this document, techniques are added to existing standards such as MVC to support 3D video. Multi-view video plus depth (MVD) can be added to MVC for 3D video processing. 3D video encoding techniques can provide more flexibility and extensibility to existing video standards, for example, to change the angle of view smoothly or adjust convergence or depth perception backwards or forwards which can be based on the specifications of the devices or user preferences, for example. Coding standards can also be expanded to use depth maps for generating virtual views in 3D video.
[00117] Figure 4 is a flowchart illustrating an exemplary operation of a video encoder according to techniques of the present disclosure. In some examples, the video encoder is a video encoder such as video encoder 22 shown in Figures 1 and 2. In other examples, the video encoder is a video decoder such as video decoder 28 shown in Figures 1 and 5. A video encoder receives a texture slice which comprises a texture slice header which comprises syntax elements representative of characteristics of the texture slice (102). For example, a video encoder receives a texture slice for a texture view component associated with one or more encoded blocks of video data representative of texture information, the texture slice comprising the one or more encoded blocks and a texture slice header comprising syntax elements representative of texture slice characteristics. The method further includes receiving a depth slice that comprises a depth slice header that comprises syntax elements representative of characteristics of the depth slice (104). For example, the video encoder receives a depth slice for a depth view component associated with one or more encoded blocks of depth information corresponding to the texture view component, where the depth slice comprises the one or more encoded blocks of depth information and a depth slice header comprising syntax elements representative of characteristics of the depth slice. In some examples, the depth view component and the texture view component both belong to a view and an access unit.
[00118] The method further comprises encoding a first slice, wherein the first slice comprises one of the texture slice and the depth slice, wherein the first slice has a slice header comprising syntax elements representative of features of the first slice (106). For example, video encoder 22 encodes a first slice, where the first slice comprises one of the texture slice and the depth slice, where the first slice has a slice header comprising syntax elements representative of characteristics of the first slice. In one example, the slice header comprises all syntax elements used to encode the associated slice. In another example, the video decoder 28 decodes a first slice, where the first slice comprises one of the texture slice and the depth slice, where the first slice has a slice header comprising syntax elements representative of features of the first slice.
[00119] The method further comprises determining common syntax elements for a second slice from the slice header of the first slice (108). Additionally, the method comprises encoding the second slice after encoding the first slice at least partially based on the common syntax elements determined, wherein the second slice has a slice header comprising syntax elements representative of characteristics of the second slice that exclude values for syntax elements that are common to the first slice (110). For example, video encoder 22 can encode the second slice after encoding the first slice at least partially based on the common syntax elements determined, wherein the second slice comprises one of the texture slice and the depth slice that is not the first slice, wherein the second slice has a slice header that comprises syntax elements representative of characteristics of the second slice that exclude values for syntax elements that are common to the first slice. Similarly, the video decoder 28 can decode the second slice after encoding the first slice at least partially based on the common syntax elements determined, wherein the second slice comprises one of the texture slice and the depth slice that does not is the first slice, where the second slice has a slice header that comprises syntax elements representative of characteristics of the second slice that exclude values for syntax elements that are common to the first slice.
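The following C++ sketch illustrates, under simplifying assumptions, step (110) above: the header actually carried for the second slice omits the syntax elements whose values are common with the first slice's header; representing a slice header as a map of named values is purely a device of this sketch.

```cpp
// Illustrative sketch only: omitting syntax elements common with the first slice.
#include <map>
#include <string>

using Header = std::map<std::string, int>;  // syntax element name -> value (hypothetical)

Header ReducedSecondHeader(const Header& firstHeader, const Header& fullSecondHeader) {
    Header reduced;
    for (const auto& [name, value] : fullSecondHeader) {
        auto it = firstHeader.find(name);
        if (it == firstHeader.end() || it->second != value)
            reduced[name] = value;  // not common with the first slice, so it is signaled
    }
    return reduced;
}
```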
[00120] In other examples, the method further comprises signaling an indication of which syntax elements are explicitly signaled in the slice header of the second slice in the sequence parameter set.
[00121] In other examples, at least one depth syntax element is determined and flagged in a depth view component slice header. The at least one depth syntax element may include a visual representation parameter set identifier, a quantization parameter difference between a slice quantization parameter and a flagged quantization parameter in a visual representation parameter set, a starting position of the coded block unit, an order of the reference visual representations, or a display order of the current visual representation of the depth view component. For example, the slice header of the second slice comprises at least one signed syntax element of an identification of a reference visual representation parameter set. In another example, the second slice slice header comprises at least one signaled syntax element of a quantization parameter difference between a second slice quantization parameter and a signaled quantization parameter in a set of visual representation parameters. In another example, the slice header of the second slice comprises at least one syntax element signaled from a start position of the coded block. Additionally, the slice header of the second slice may comprise at least one of a frame number and an order count of visual representation of the second slice. In another example, the slice header of the second slice comprises at least one of the syntax elements related to a reference visual representation list construction, a number of active reference frames for each list, modification syntax tables of a reference visual representation list and a prediction weight table.
[00122] A start position of the coded block unit can be determined to be zero when a start position of the coded block is not signaled in the texture slice header or the depth slice header. A loop filter parameter for the at least one texture view component can be signaled, and a flag can be set that indicates that a loop filter parameter used for the depth view component is the same as a loop filter parameter for the at least one texture view component. For example, the slice header of the second slice comprises at least one of the syntax elements related to deblocking filter parameters or adaptive loop filter parameters for the second slice.
[00123] In another example, the one or more video data blocks representing texture information are encoded using interview prediction while the depth values for a corresponding portion of the frame are encoded using intraview prediction. A video frame that has texture view components and depth view components can match a first view. Encoding one or more texture information representative video data blocks may include predicting at least a portion of at least one of the texture information representative video data blocks with respect to the data of a second view, wherein the second view is different from first view. Encoding depth information representative of depth values for the frame portion further comprises predicting at least a portion of the depth information representative of depth values with respect to data from the first view. The depth slice header may additionally flag syntax elements representative of a reference visual representation list construction for the view component depth map.
[00124] Table 1 shows a sequence parameter set (SPS) MVC extension. Interview references can be flagged in SPS and can be modified by the visual reference representation list construction process that allows flexible ordering of the interprediction or interview prediction references.
[Table 1]
[00125] A sequence-level indicator can specify how depth view components are predicted from corresponding texture view components in the same view. In a sequence parameter set for a depth map, the following syntax can be signaled: pred_slice_header_colocated_idc ue(v) or u(2)
[00126] In the examples where the one or more video data blocks representing texture information are encoded using interview prediction while the depth values for a corresponding portion of the frame are encoded using intraview prediction, num_ref_idx_active_override_flag and ref_pic_list_reordering can be flagged in the slice header for depth map view components.
[00127] Table 2 provides an example syntax table of a slice header for a depth slice. The pred_slice_header_colocated_idc syntax element specifies that syntax elements are reused between a slice header of a texture view component and a slice header of a depth view component in the following ways. Setting pred_slice_header_colocated_idc equal to 0 indicates that there is no prediction between any slice header of the texture view component and the corresponding depth view component thereof. Note that a corresponding texture view component of a depth map view component refers to the texture view component at the same time instance within the same view.
[00128] Setting pred_slice_header_colocated_idc equal to 3 indicates that the visual representation parameter set and the delta QP of a depth view component NAL unit are signaled in the slice header, while the other slice-level syntax elements of the depth view component NAL unit are the same as, or predictable from, the corresponding texture view component syntax elements.
[00129] Setting pred_slice_header_colocated_idc equal to 2 indicates that the visual representation parameter set and the delta QP, as well as the location of the first MB or CU of a depth view component NAL unit, are signaled in the depth slice header, while the other syntax elements are the same as, or predictable from, the corresponding syntax elements of the co-located texture view component of the same view.
[00130] Setting pred_slice_header_colocated_idc equal to 1 indicates that the visual representation parameter set and the delta QP, the location of the first MB or CU of a depth view component NAL unit, and the frame_num and POC values are flagged in the header from slice while other syntax elements are the same as or predictable from the corresponding syntax elements of the co-located texture view component of the same view. In one example, when pred_slice_header_colocated_idc is equal to 3, first_mb_in_slice is inferred to have a value of 0. On the other hand, when pred_slice_header_colocated_idc is less than 3, a value for first_mb_in_slice can be explicitly signaled as shown in Table 2.
[00131] It is also shown in Table 2 that when pred_slice_header_colocated_idc has a value that is less than one, an entropy slice indication and a slice type are signaled. The entropy slice indication has a value that indicates whether the corresponding slice is an entropy slice, that is, whether the slice is entropy encoded without reference to contexts of other slices. Context models can therefore be initialized or restarted at the beginning of each entropy slice. The slice type indicates a type for the slice, for example, I, P or B. Also, when pred_slice_header_colocated_idc has a value that is less than one, the slice header indicates whether the blocks of the slice are field encoded (for example, for field interleaving encoding).
[Table 2]
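As an illustrative reading of the pred_slice_header_colocated_idc semantics described above, the following C++ sketch shows how a decoder might decide which depth slice header elements to parse and which to inherit from the co-located texture slice header; the readUe() helper is a hypothetical placeholder for a bitstream read, and the case of a value equal to 0 (a complete, independently signaled header) is not shown.

```cpp
// Illustrative sketch only: acting on pred_slice_header_colocated_idc for a depth slice.
struct DepthSliceHeader {
    int ppsId = 0, sliceQpDelta = 0, firstMbInSlice = 0, frameNum = 0, poc = 0;
};

int readUe() { return 0; }  // hypothetical stand-in for an actual bitstream read

DepthSliceHeader ParseDepthSliceHeader(int idc, const DepthSliceHeader& colocatedTexture) {
    DepthSliceHeader h = colocatedTexture;  // inherit from the co-located texture slice header
    if (idc >= 1) {                         // values 1, 2 and 3: PPS id and delta QP are signaled
        h.ppsId = readUe();
        h.sliceQpDelta = readUe();
    }
    if (idc == 3) {
        h.firstMbInSlice = 0;               // inferred to be 0 when idc is equal to 3
    } else if (idc == 1 || idc == 2) {
        h.firstMbInSlice = readUe();        // first MB or CU location explicitly signaled
    }
    if (idc == 1) {                         // frame_num and POC are also signaled
        h.frameNum = readUe();
        h.poc = readUe();
    }
    return h;
}
```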
[00132] Table 3 provides an example design of a slice header for a depth view component based on HEVC. Note that, in this example, when pred_slice_header_colocated_idc is equal to 3, first_tb_in_slice is inferred to have a value of 0.
[Table 3]
[00133] Table 4 is an example syntax table of a slice header of a depth slice. Table 4 provides an example design of a depth slice header syntax to further indicate syntax reuse for a depth view component. In this example, a sequence-level indicator specifies how depth view components are predicted from corresponding texture view components in the same view.
[00134] In such a sequence parameter set for a depth map, the following syntax can be signaled: pred_slice_header_colocated_idc ue(v) or u(2)
[00135] The pred_slice_header_colocated_idc syntax element specifies a syntax element reuse between the slice header of a texture view component and the slice header of a depth view component. For example, setting pred_slice_header_colocated_idc equal to 0 indicates that there is no prediction between any slice header of a texture view component and the corresponding depth view component thereof. Setting pred_slice_header_colocated_idc equal to 3 indicates that the visual representation parameter set and the delta QP of a depth view component NAL unit are signaled in the slice header, while the other slice-level syntax elements of the depth view component NAL unit are the same as, or predicted from, the syntax elements of the corresponding texture view component.
[00136] Setting pred_slice_header_colocated_idc equal to 2 indicates that the visual representation parameter set and the delta QP as well as the location of the first MB or CU of a depth view component NAL unit are flagged in the slice header while other syntax elements are the same as or predicted from the corresponding syntax elements of the co-located texture view component of the same view. Setting pred_slice_header_colocated_idc equal to 1 indicates that the visual representation parameter set and the delta QP, the location of the first MB or CU of a depth view component NAL unit, and the frame_num and POC values are flagged in the slice header while that the other syntax elements are the same as or predicted from the corresponding syntax elements of the co-located texture view component of the same view.
[00137] A syntax indication, pred_default_syntax_flag, indicates whether the slice header syntax elements of a depth map view component are predicted from those of the co-located texture view component. In one example, pred_default_syntax_flag is inferred to be 0 if pred_slice_header_colocated_idc is equal to 0. When pred_slice_header_colocated_idc is equal to 3 and pred_default_syntax_flag is 1 in this example, first_mb_in_slice is equal to 0.
[Table 4]
[00138] Table 5 is an example syntax table of a slice header for a depth view component based on HEVC. In the example of Table 5, pred_default_syntax_flag indicates whether the slice header syntax elements of a depth map view component are predicted from those of the co-located texture view component. The pred_default_syntax_flag indication is inferred to be 0 if pred_slice_header_colocated_idc is equal to 0. When pred_slice_header_colocated_idc is equal to 3 and pred_default_syntax_flag is 1 in this example, first_tb_in_slice is equal to 0.
[Table 5]
[00139] Note that when slice header prediction is enabled, there is an implication that if slice A is based on slice B, given that one of slices A and B is a depth slice and the other is a texture slice and they belong to views of the same time instance, one of the following holds: all slices in the visual representation containing slice B have the same slice header; any MB in slice A has a co-located MB in slice B; or, if any MB in slice A has a co-located MB in slice C of the visual representation that contains slice B, slice C must have the same slice header as slice B.
[00140] Alternatively, a different implementation of the described techniques may be as follows for a depth view component. Table 6 provides an example of a slice header depth extension.
[Table 6]
[00141] In this example, the sameRefPicList syntax element is derived from or signaled at a PPS or SPS level. For example, a disable_depth_inter_view_flag, flagged in SPS, indicates whether interview prediction for depth is disabled.
[00142] For a texture view component, another implementation of the described techniques can be as shown in Table 7. In this example, the syntax elements of a texture slice header for the texture view components can be predicted from the correlated syntax elements of a depth slice header for the depth view components.
[Table 7]
[00143] Similarly, in this example, the sameRefPicList syntax element is derived from or signaled at a PPS or SPS level.
[00144] Alternatively, such a flag can be explicitly flagged in the slice header as shown in Table 8.
[Table 8]
[00145] A syntax element, slice_header_prediction_flag, indicates whether slice header prediction from texture to depth or from depth to texture is enabled. That is, at least one of the texture slice or the depth slice comprises a syntax element that indicates whether a slice header prediction is from the texture slice header to the depth slice header or from the depth slice header to the texture slice header.
[00146] Alternatively, slice level flags or other indicators specify the extent to which slice prediction applies. Examples of these indicators include if reference visual representation list construction syntax elements are predicted, if slice_qp_delta is predicted, and if weighted prediction syntax elements are predicted.
[00147] In some examples, it is also indicated whether the loop filter related syntax elements are predicted. If the loop filter related syntax elements are not predicted, an additional flag to indicate whether those syntax elements are present or not is included in the depth slice header.
[00148] Alternatively, another flag used to signal a deblocking filter, deblocking_pred_flag, can be used instead of pred_default_syntax_flag or pred_slice_header_colocated_idc for deblocking filter parameters. This flag is signaled in the same slice header as either PPS or SPS. Table 9 shows an example syntax table of a slice header for a HEVC-based depth view component. In the context of HEVC, the ALF parameters of a depth view component should not be the same as the ALF parameters of the corresponding texture view component, unless ALF is not used for both the texture view component and the view component with depth.
[Table 9]
[00149] Figure 5 is a block diagram illustrating an example of the video decoder 28 of Figure 1 in more detail, according to the techniques of the present disclosure. Video decoder 28 is an example of a specialized video computer apparatus or device referred to herein as a "coder". As shown in Figure 5, video decoder 28 corresponds to video decoder 28 of target device 14. However, in other examples, video decoder 28 corresponds to a different device. In further examples, other units (such as, for example, other encoders/decoders (CODECS)) can also perform techniques similar to those performed by the video decoder 28.
[00150] The video decoder 28 includes an entropy decoding unit 52 which entropy decodes the received bitstream to generate quantized coefficients and prediction syntax elements. The bitstream includes coded blocks that have texture components and a depth component for each pixel location in order to render a 3D video, as well as syntax elements. The prediction syntax elements include at least one of a coding mode, one or more motion vectors, information identifying an interpolation technique used, coefficients for use in interpolation filtering, and other information associated with generating the prediction block.
[00151] The prediction syntax elements, for example the coefficients, are forwarded to the prediction processing unit 55. The prediction processing unit 55 includes a depth syntax prediction module 66. If prediction is used to encode the coefficients with respect to the coefficients of a fixed filter, or with respect to each other, the prediction processing unit 55 decodes the syntax elements to define the actual coefficients. The depth syntax prediction module 66 predicts depth syntax elements for depth view components from texture syntax elements for texture view components.
[00152] If quantization is applied to any of the prediction syntax elements, the inverse quantization unit 56 removes such quantization. The inverse quantization unit 56 can handle the texture and depth components for each pixel location of the encoded blocks in the encoded bitstream differently. For example, when the depth component has been quantized differently from the texture components, the inverse quantization unit 56 processes the texture and depth components separately. Filter coefficients, for example, may be predictively encoded and quantized in accordance with this disclosure, in which case, inverse quantization unit 56 is used by video decoder 28 to predictively decode and dequantize such coefficients.
[00153] The prediction processing unit 55 generates prediction data based on the prediction syntax elements and one or more previously decoded blocks that are stored in memory 62, in much the same way as described in detail above in relation to the prediction processing unit 32 of the video encoder 22. In particular, the prediction processing unit 55 performs one or more of the multiview video plus depth techniques of this disclosure during motion compensation to generate a prediction block that incorporates the depth components as well as the texture components. The prediction block (as well as a coded block) can have different resolutions for the depth components versus the texture components. For example, the depth components may have quarter-pixel precision while the texture components have full-integer pixel precision. As such, one or more of the techniques of this disclosure are used by the video decoder 28 in generating a prediction block. In some examples, the prediction processing unit 55 may include a motion compensation unit comprising filters used for the interpolation and interpolation-like filtering techniques of this disclosure. The motion compensation component is not shown in Figure 5 for simplicity and ease of illustration.
[00154] The inverse quantization unit 56 inversely quantizes, that is, dequantizes, the quantized coefficients. The inverse quantization process is a defined process for H.264 decoding or any other decoding standard. The inverse transform processing unit 58 applies an inverse transform, e.g., a conceptually similar inverse transform process or inverse DCT, to the transform coefficients in order to produce residual blocks in the pixel domain. The adder 64 sums the residual block with the corresponding prediction block generated by the prediction processing unit 55 to form a reconstructed version of the original block encoded by the video encoder 22. If desired, a deblocking filter is also applied to filter the blocks decoded in order to remove blocking artifacts. The decoded video blocks are then stored in memory 62, which provides the reference blocks for subsequent motion compensation and also produces decoded video to drive the display device (such as device 28 of Figure 1).
[00155] Decoded video can be used to render video in 3D. 3D video can comprise a three-dimensional virtual view. Depth information is used to determine a horizontal offset (horizontal disparity) for each pixel in the block. Occlusion handling can also be performed to generate the virtual view. The syntax elements for the depth view components can be predicted from the syntax elements for the texture view components.
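For illustration only, the following C++ sketch derives a horizontal offset (horizontal disparity) from a per-pixel depth value when rendering a virtual view, as described above; the linear depth-to-disparity model and its parameters are assumptions of this sketch and not a normative view synthesis process.

```cpp
// Illustrative sketch only: mapping an 8-bit depth value to a horizontal disparity.
#include <cstdint>

double DepthToDisparity(uint8_t depth, double dispNear, double dispFar) {
    const double w = depth / 255.0;  // assume 255 = nearest, 0 = farthest
    return w * dispNear + (1.0 - w) * dispFar;
}
```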
[00156] Figure 6 is a flow diagram illustrating an exemplary operation of a video decoder, according to the techniques of the present disclosure. The process of Figure 6 can be considered the reciprocal decoding process to the encoding process of Figure 4. Figure 6 will be described from the perspective of the video decoder 28 of Figure 5, although other devices may perform similar techniques.
[00157] A video decoder, such as video decoder 28, receives a texture slice for a texture view component associated with one or more coded blocks of video data representative of texture information of at least a portion of a frame of video data, the texture slice comprising the one or more coded blocks and a texture slice header comprising syntax elements representative of characteristics of the texture slice (122). The video decoder receives a depth slice for a depth view component corresponding to the texture view component, the depth slice comprising the coded depth information and a depth slice header comprising at least one syntax element representative of characteristics of the depth slice, excluding the values for the syntax elements that are common to the depth slice and the texture slice (124). The video decoder predicts syntax elements for at least one of the depth slice or the texture slice from the values for the syntax elements that are common to the depth slice and the texture slice (126).
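To make the slice header prediction of step 126 concrete, the following hedged sketch shows a depth slice header that starts as a copy of the corresponding texture slice header and then overwrites only the elements explicitly signaled for the depth slice, so that every other element is inherited; the structure, field names, and parameters are illustrative stand-ins and are not the actual H.264/AVC or 3DVC syntax element names.

#include <cstdint>

// Illustrative subset of slice header syntax elements.
struct SliceHeader {
    uint32_t frameNum = 0;
    int      picOrderCnt = 0;
    int      sliceQpDelta = 0;
    uint32_t firstBlockInSlice = 0;
    bool     deblockingFilterEnabled = true;
    uint8_t  numRefIdxActive = 1;
};

// Builds the depth slice header from the texture slice header: elements not
// explicitly signaled for the depth slice are inherited (predicted) from the
// co-located texture slice, and an unsignaled start position defaults to 0.
static SliceHeader predictDepthSliceHeader(const SliceHeader& textureHeader,
                                           bool qpDeltaSignaled, int qpDelta,
                                           bool startSignaled, uint32_t start) {
    SliceHeader depthHeader = textureHeader;  // inherit the common elements
    depthHeader.sliceQpDelta = qpDeltaSignaled ? qpDelta
                                               : textureHeader.sliceQpDelta;
    depthHeader.firstBlockInSlice = startSignaled ? start : 0u;
    return depthHeader;
}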
[00158] In other examples, at least one depth syntax element is determined and signaled in a slice header of a depth view component. The at least one depth syntax element may include at least one of a visual representation parameter set identifier, a quantization parameter difference between a slice quantization parameter and a quantization parameter signaled in a visual representation parameter set, a starting position of the coded block unit, an order of the reference visual representations, and a display order of the current visual representation of the depth view component. A starting position of the coded block unit may be determined to be zero when a starting position of the coded block is not signaled in the texture slice header or the depth slice header. A loop filter parameter for the at least one texture view component may be signaled, and a flag may be set to indicate that a loop filter parameter used for the depth view component is the same as the loop filter parameter for the at least one texture view component.
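The reuse flag just described could behave roughly as in the sketch below, where a single boolean decides whether the depth slice reuses the texture slice's deblocking parameters or applies its own signaled values; the parameter structure and names are assumptions made only for illustration.

// Illustrative deblocking (loop filter) parameters and the reuse flag
// described above.
struct LoopFilterParams {
    bool enabled = true;
    int  alphaOffsetDiv2 = 0;
    int  betaOffsetDiv2 = 0;
};

// When the flag is set, the depth view component simply reuses the loop
// filter parameters of the corresponding texture view component; otherwise
// its own signaled parameters apply.
static LoopFilterParams selectDepthLoopFilter(bool reuseTextureParams,
                                              const LoopFilterParams& textureParams,
                                              const LoopFilterParams& signaledParams) {
    return reuseTextureParams ? textureParams : signaledParams;
}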
[00159] In another example, video decoder 28 predicts the texture view component using inter-view prediction techniques and predicts the depth view component using intra-view prediction techniques. Video decoder 28 receives the depth slice header, which further comprises syntax elements representative of a reference visual representation list construction for the depth view component. In an example where the texture view component and the depth view component correspond to a first view, decoding the texture view component includes predicting at least a portion of the texture view component with respect to data of a second view. The second view is different from the first view. In some examples, decoding the depth view component may include predicting at least a portion of the depth view component with respect to data of the first view.
[00160] In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media, which includes any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media, which is non-transient, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
[00161] By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
[00162] Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described in this document. In addition, in some aspects, the functionality described in this document may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
[00163] The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless telephone handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
[00164] Various examples of this disclosure have been described. These and other examples are within the scope of the following claims.
Claims (14)
[0001]
1. A method for decoding video data, the method characterized in that it comprises: receiving a texture slice for a texture view component associated with one or more encoded blocks of video data representative of texture information, the texture slice comprising the one or more coded blocks and a texture slice header comprising syntax elements representative of characteristics of the texture slice; receiving a depth slice for a depth view component associated with one or more encoded blocks of depth information that correspond to the texture view component, wherein the depth slice comprises the one or more encoded blocks of depth information and a depth slice header comprising syntax elements representative of characteristics of the depth slice, and wherein both the depth view component and the texture view component belong to a view and an access unit; decoding a first slice, wherein the first slice comprises the texture slice, wherein the first slice has a slice header comprising syntax elements representative of characteristics of the first slice; determining common syntax elements for a second slice from the slice header of the first slice; and decoding the second slice after decoding the first slice at least partially based on the determined common syntax elements, wherein the second slice comprises the depth slice, wherein the second slice has a slice header comprising syntax elements representative of characteristics of the second slice, excluding values for syntax elements that are common to the first slice.
[0002]
2. Method according to claim 1, characterized in that the slice header of the second slice comprises one or more of: a signaled syntax element of an identification of a reference visual representation parameter set; a signaled syntax element of a quantization parameter difference between a quantization parameter of the second slice and a quantization parameter signaled in a visual representation parameter set; a signaled syntax element of a starting position of one of the coded blocks; a frame number; a visual representation order count of the second slice; syntax elements related to a reference visual representation list construction; a number of active reference frames for each list; a reference visual representation list modification syntax table; a prediction weight table; and syntax elements related to deblocking filter parameters or adaptive loop filtering parameters for the second slice.
[0003]
3. Method according to any one of claims 1 to 2, characterized in that it further comprises: determining an initial position of the depth slice to be zero when an initial position of the depth view component is not signaled in the texture slice header or the depth slice header.
[0004]
4. Device for decoding video data, characterized in that it comprises: means for receiving a texture slice for a texture view component associated with one or more encoded blocks of video data representative of texture information, the texture slice comprising the one or more coded blocks and a texture slice header comprising syntax elements representative of characteristics of the texture slice; means for receiving a depth slice for a depth view component associated with one or more encoded blocks of depth information corresponding to the texture view component, wherein the depth slice comprises the one or more encoded blocks of depth information and a depth slice header comprising syntax elements representative of characteristics of the depth slice, and wherein both the depth view component and the texture view component belong to a view and an access unit; means for decoding a first slice, wherein the first slice comprises the texture slice, wherein the first slice has a slice header comprising syntax elements representative of characteristics of the first slice; means for determining common syntax elements for a second slice from the slice header of the first slice; and means for decoding the second slice after decoding the first slice at least partially based on the determined common syntax elements, wherein the second slice comprises the depth slice, wherein the second slice has a slice header comprising syntax elements representative of characteristics of the second slice, excluding values for syntax elements that are common to the first slice.
[0005]
5. Device according to claim 4, characterized in that the slice header of the second slice comprises one or more of: a signaled syntax element of an identification of a reference visual representation parameter set; a signaled syntax element of a quantization parameter difference between a quantization parameter of the second slice and a quantization parameter signaled in a visual representation parameter set; a signaled syntax element of a starting position of one of the coded blocks; a frame number; a visual representation order count of the second slice; syntax elements related to a reference visual representation list construction; a number of active reference frames for each list; a reference visual representation list modification syntax table; a prediction weight table; and syntax elements related to deblocking filter parameters or adaptive loop filtering parameters for the second slice.
[0006]
6. Method for encoding video data, the method characterized in that it comprises: receiving a texture slice for a texture view component associated with one or more blocks of video data representative of texture information, the texture slice comprising the one or more blocks and a texture slice header comprising syntax elements representative of characteristics of the texture slice; receiving a depth slice for a depth view component associated with one or more blocks of depth information that correspond to the texture view component, wherein the depth slice comprises the one or more blocks of depth information and a depth slice header comprising syntax elements representative of characteristics of the depth slice, and wherein both the depth view component and the texture view component belong to a view and an access unit; encoding a first slice, wherein the first slice comprises the texture slice, wherein the first slice has a slice header comprising syntax elements representative of characteristics of the first slice; determining common syntax elements for a second slice from the slice header of the first slice; and encoding the second slice after encoding the first slice at least partially based on the determined common syntax elements, wherein the second slice comprises the depth slice, wherein the second slice has a slice header comprising syntax elements representative of characteristics of the second slice, excluding values for syntax elements that are common to the first slice.
[0007]
7. Method according to claim 6, characterized in that the slice header of the second slice comprises one or more of: a signaled syntax element of an identification of a reference visual representation parameter set; a signaled syntax element of a quantization parameter difference between a quantization parameter of the second slice and a quantization parameter signaled in a visual representation parameter set; a signaled syntax element of a starting position of one of the coded blocks; a frame number; a visual representation order count of the second slice; syntax elements related to a reference visual representation list construction; a number of active reference frames for each list; a reference visual representation list modification syntax table; a prediction weight table; and syntax elements related to deblocking filter parameters or adaptive loop filtering parameters for the second slice.
[0008]
8. Method according to any one of claims 6 to 7, characterized in that it further comprises: determining an initial position of the depth slice to be zero when an initial position of the depth view component is not signaled in the texture slice header or the depth slice header.
[0009]
9. Method according to any one of claims 6 to 8, characterized in that it further comprises: signaling, in the sequence parameter set, an indication of which syntax elements are explicitly signaled in the slice header of the second slice.
[0010]
10. Device for encoding video data, characterized in that it comprises: means for receiving a texture slice for a texture view component associated with one or more blocks of video data representative of texture information, the texture slice comprising the one or more blocks and a texture slice header comprising syntax elements representative of characteristics of the texture slice; means for receiving a depth slice for a depth view component associated with one or more blocks of depth information that correspond to the texture view component, wherein the depth slice comprises the one or more blocks of depth information and a depth slice header comprising syntax elements representative of characteristics of the depth slice, and wherein both the depth view component and the texture view component belong to a view and an access unit; means for encoding a first slice, wherein the first slice comprises the texture slice, wherein the first slice has a slice header comprising syntax elements representative of characteristics of the first slice; means for determining common syntax elements for a second slice from the slice header of the first slice; and means for encoding the second slice after encoding the first slice at least partially based on the determined common syntax elements, wherein the second slice comprises the depth slice, wherein the second slice has a slice header comprising syntax elements representative of characteristics of the second slice, without repeating values for syntax elements that are common to the first slice.
[0011]
11. Device according to claim 10, characterized in that the slice header of the second slice comprises one or more of: a signaled syntax element of an identification of a reference visual representation parameter set; a signaled syntax element of a quantization parameter difference between a quantization parameter of the second slice and a quantization parameter signaled in a visual representation parameter set; a signaled syntax element of a starting position of one of the coded blocks; a frame number; a visual representation order count of the second slice; syntax elements related to a reference visual representation list construction; a number of active reference frames for each list; a reference visual representation list modification syntax table; a prediction weight table; and syntax elements related to deblocking filter parameters or adaptive loop filtering parameters for the second slice.
[0012]
12. Device according to any one of claims 10 to 11, characterized in that it further comprises: means for signaling, in the sequence parameter set, an indication of which syntax elements are explicitly signaled in the slice header of the second slice.
[0013]
13. Computer-readable memory, characterized in that it has recorded thereon the method as defined in any one of claims 1 to 3.
[0014]
14. Computer-readable memory, characterized in that it has recorded thereon the method as defined in any one of claims 6 to 9.