Patent abstract:
BI-PREDICTIVE MERGE MODE BASED ON UNI-PREDICTIVE NEIGHBORS IN VIDEO ENCODING. This description describes a bi-predictive merge mode in which a bi-predictive video block inherits motion information from two different neighboring blocks, where the two different neighboring blocks were each encoded in a uni-predictive mode. Bi-predictive encoding can improve the ability to achieve compression in video encoding. The described bi-predictive merge mode can increase the number of bi-predictive candidates that can be used in the merge mode encoding context by allowing two separate uni-predictive neighbors to be used to define bi-predictive motion information for a video block.
Publication number: BR112013024187B1
Application number: R112013024187-0
Filing date: 2012-02-29
Publication date: 2022-02-01
Inventors: Yunfei Zheng; Wei-Jung Chien; Marta Karczewicz
Applicant: Qualcomm Incorporated
Primary IPC:
Patent description:

[001] This application claims the benefit of U.S. Provisional Application No. 61/454,862, filed March 21, 2011, and U.S. Provisional Application No. 61/502,703, filed June 29, 2011, the entire contents of each of which are incorporated herein by reference. Field of the Invention
[002] This description relates to video encoding techniques used to compress video data, and more particularly to the video encoding modes used in video compression. Description of the Prior Art
[003] Digital video capabilities can be incorporated into a wide range of video devices, including digital televisions, digital direct broadcast systems, wireless communication devices such as cordless telephone sets, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, digital cameras, digital recording devices, video game devices, video game consoles, personal multimedia devices, and the like. Such video devices may implement video compression techniques, such as those described in MPEG-2, MPEG-4, or ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), in order to compress video data. Video compression techniques perform spatial and/or temporal prediction to reduce or remove the redundancy inherent in video sequences. New video standards, such as the High Efficiency Video Coding (HEVC) standard being developed by the "Joint Collaborative Team on Video Coding" (JCT-VC), which is a collaboration between MPEG and ITU-T, continue to emerge and evolve. The emerging HEVC standard is sometimes referred to as H.265.
[004] These and other video encoding standards and techniques utilize block-based video encoding. Block-based video encoding techniques divide the video data of a video frame (or part of it) into video blocks and then encode the video blocks using block-based predictive compression techniques. Video blocks can be further divided into video block partitions. Video blocks (or partitions thereof) may be referred to as coding units (CUs) and may be encoded using one or more video-specific encoding techniques as well as general data compression techniques. Different modes can be selected and used to encode video blocks.
[005] With the emerging HEVC standard, largest coding units (LCUs) can be split into smaller and smaller CUs according to a quadtree partitioning scheme. The CUs can be predicted based on so-called prediction units (PUs), which can have partition sizes corresponding to the size of the CUs or smaller than the size of the CUs, so that multiple PUs can be used to predict a given CU.
[006] Different modes can be used for encoding CUs. For example, different intracoding modes can be used to encode CUs based on prediction data within the same frame or slice in order to exploit spatial redundancy within a video frame. Alternatively, intercoding modes can be used to encode CUs based on prediction data from another frame or slice, in order to exploit temporal redundancy across the frames of a video sequence. After prediction coding is performed according to a selected mode, transform coding can then be performed, such as discrete cosine transforms (DCT), integer transforms, or the like. With HEVC, transform coding can occur with respect to transform units (TUs), which can also have variable transform sizes in the HEVC standard. Quantization of the transform coefficients, scanning of the quantized transform coefficients, and entropy coding can also be performed. Syntax information is signaled with the encoded video data, for example in a video slice header or video block header, in order to tell the decoder how to decode the video data. Among other things, the syntax information can identify the mode that was used in encoding the different video blocks.
[007] The merge mode is a specific intercoding mode used in video compression. With the merge mode, the motion vector of a neighboring video block is inherited by a current video block being encoded. In some cases, the merge mode causes a current video block to inherit the motion vector of a predefined neighbor, and in other cases, an index value can be used to identify the specific neighbor from which the current video block inherits its motion vector (e.g., top, top right, left, bottom left, or co-located from a temporally adjacent frame). Summary of the Invention
[008] This description describes a bi-predictive merge mode in which a video block encoded in the bi-predictive merge mode inherits its motion information from two different neighboring blocks, where the two different neighboring blocks were each encoded in a uni-predictive mode. Bi-predictive encoding can improve the ability to achieve compression and improve video quality at a given compression level. However, in some cases, there may not be any (or few) neighbors that are encoded in a bi-predictive mode, thus making bi-prediction unavailable (or limited) with respect to merge mode encoding. The described bi-predictive merge mode can increase the number of bi-predictive candidates that can be used in the context of merge mode encoding by allowing two separate uni-predictive neighbors to be used to define bi-predictive motion information for a video block.
[009] In one example, this description describes a method of decoding video data. The method comprises receiving one or more syntax elements for a current video block, where the current video block is encoded according to a bi-predictive merge mode, and based on the one or more syntax elements, identifying two different neighboring video blocks encoded in uni-predictive modes. The method also comprises using motion information from the two different neighboring video blocks to decode the current video block according to the bi-predictive merge mode.
[0010] In another example, this description describes a method of encoding video data. The method comprises selecting a bi-predictive merge mode for encoding a current video block, identifying two different neighboring video blocks encoded in uni-predictive modes, using motion information from two different neighboring video blocks to encode the current video block according to the bi-predictive merge mode, and generating one or more syntax elements for identifying the two different neighboring video blocks to a video decoder.
[0011] In another example, this description describes a video decoding device that decodes video data. The video decoding device comprises a video decoder configured to receive one or more syntax elements for a current video block, where the current video block is encoded according to a bi-predictive merge mode, and based on the one or more syntax elements, identify two different neighboring video blocks encoded in uni-predictive modes. The video decoder is configured to use motion information from the two different neighboring video blocks to decode the current video block according to the bi-predictive merge mode.
[0012] In another example, this description describes a video encoding device comprising a video encoder configured to select a bi-predictive merge mode for encoding a current video block, identify two different neighboring video blocks encoded in uni-predictive modes, use motion information from the two different neighboring video blocks to encode the current video block according to the bi-predictive merge mode, and generate one or more syntax elements identifying the two different neighboring video blocks to a video decoder.
[0013] In another example, this description describes a device for decoding video data, the device comprising mechanisms for receiving one or more syntax elements for a current video block, where the current video block is encoded according to a bi-predictive merge mode, mechanisms for identifying two different neighboring video blocks encoded in uni-predictive modes based on the one or more syntax elements, and mechanisms for using motion information from the two different neighboring video blocks to decode the current video block according to the bi-predictive merge mode.
[0014] In another example, this description describes a device for encoding video data, the device comprising mechanisms for selecting a bi-predictive merge mode for encoding a current video block, mechanisms for identifying two different neighboring video blocks encoded in uni-predictive modes, mechanisms for using motion information from the two different neighboring video blocks to encode the current video block according to the bi-predictive merge mode, and mechanisms for generating one or more syntax elements identifying the two different neighboring video blocks to a video decoder.
[0015] The techniques described in this description can be implemented in hardware, software, firmware, or any combination thereof. For example, various techniques may be implemented or executed by one or more processors. As used herein, a processor can refer to a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), or other equivalent integrated or discrete logic circuitry. Software may run on one or more processors. Software comprising instructions for performing the techniques may initially be stored on a computer-readable medium and loaded and executed by a processor.
[0016] Accordingly, this description also contemplates computer-readable storage media comprising instructions for causing a processor to perform any of the techniques described in this description. In some cases, the computer-readable storage medium may form part of a computer program storage product, which may be sold to manufacturers and/or used in a device. The computer program product may include the computer-readable medium, and in some cases, may also include packaging materials.
[0017] In particular, this description also describes a computer-readable medium comprising instructions that upon execution cause a processor to decode video data, where the instructions cause the processor, upon receipt of one or more syntax elements for a current video block, where the current video block is encoded according to a bi-predictive merge mode, to identify two different neighboring video blocks encoded in uni-predictive modes based on the one or more syntax elements, and to use the motion information of the two different neighboring video blocks to decode the current video block according to the bi-predictive merge mode.
[0018] In yet another example, this description describes a computer-readable medium comprising instructions that upon execution cause a processor to encode video data, where the instructions cause the processor to select a bi-predictive merge mode for encoding a current video block, identify two different neighboring video blocks encoded in uni-predictive modes, use motion information from the two different neighboring video blocks to encode the current video block according to the bi-predictive merge mode, and generate one or more syntax elements to identify the two different neighboring video blocks to a video decoder.
[0019] Details of one or more aspects of the description are presented in the attached figures and in the description below. Other features, objects and advantages of the techniques described in this specification will be apparent from the description and figures, and from the claims. Brief Description of Figures
[0020] Figure 1 - is a block diagram illustrating an exemplary video encoding and decoding system that can implement one or more of the techniques of that description;
[0021] Figure 2 - is a conceptual diagram illustrating quadtree partitioning of coding units (CUs) consistent with the techniques of this description;
[0022] Figure 3 - is a conceptual diagram illustrating some possible relationships between CUs, prediction units (PUs) and transform units (TUs) consistent with the techniques of this description;
[0023] Figure 4 - is a block diagram illustrating a video encoder that can implement the techniques of this description;
[0024] Figure 5 - is a block diagram illustrating an exemplary prediction unit of an encoder, consistent with one or more examples of that description;
[0025] Figure 6 - is a block diagram illustrating a video decoder that can implement the techniques of this description;
[0026] Figure 7 - is a conceptual diagram illustrating the location of different neighboring video blocks with respect to a current video block, such that the current video block can use motion information from one or more of the different neighboring video blocks in a bi-predictive merge mode, consistent with this description;
[0027] Figures 8 and 9 - are flowcharts illustrating techniques consistent with this description. Detailed Description of the Invention
[0028] In most video encoding systems, motion estimation and motion compensation are used to reduce temporal redundancy in a video sequence in order to achieve data compression. In this case, a motion vector can be generated in order to identify a prediction block of video data, for example from another video frame or slice, which can be used to predict the values of the current video block being encoded. The values of the prediction video block are subtracted from the values of the current video block to produce a block of residual data. The motion vector is communicated from the encoder to the decoder along with the residual data. The decoder can locate the same prediction block (based on the motion vector) and reconstruct the encoded video block by combining the residual data with the prediction block data. Many other compression techniques can also be used, such as transforms and entropy encoding, to further improve video compression.
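As an informal illustration of this residual relationship (not part of the patent text; the array types and function names are assumptions for the sketch), in Python:

```python
import numpy as np

def encode_residual(current_block: np.ndarray, prediction_block: np.ndarray) -> np.ndarray:
    # Encoder side: the prediction block located via the motion vector is
    # subtracted from the current block to produce the residual data block.
    return current_block.astype(np.int16) - prediction_block.astype(np.int16)

def reconstruct_block(prediction_block: np.ndarray, residual: np.ndarray) -> np.ndarray:
    # Decoder side: the same prediction block (found via the signaled motion
    # vector) is combined with the residual to reconstruct the video block.
    combined = prediction_block.astype(np.int16) + residual
    return np.clip(combined, 0, 255).astype(np.uint8)
```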
[0029] The motion estimation process is normally performed at the encoder. Motion information (such as motion vectors, motion vector indices, prediction directions, or other information) can be encoded and transmitted from the encoder to the decoder so that the decoder can identify the same prediction block that was used by the encoder to encode a given video block. Many different encoding modes can be used to allow different types of temporal prediction between two different frames or different types of spatial prediction within a given frame.
[0030] With the so-called merge mode, motion information from a neighboring video block is inherited by a current video block being encoded. In this case, the motion vector itself is not transmitted for a video block encoded in merge mode. Instead, an index value can be used to identify the neighbor from which the current video block inherits its motion vector (and possibly other motion information). For example, motion information can be inherited from an upper neighbor, an upper right neighbor, a left neighbor, a lower left neighbor, or a temporal neighbor co-located in a temporally adjacent frame.
[0031] With most merge modes, if the neighbor is encoded in a uni-predictive mode, then the current video block inherits one motion vector. If the neighbor is encoded in a bi-predictive mode, then the current video block inherits two motion vectors. In such examples, a block being encoded in merge mode is constrained by the motion information of its neighbors. Uni-prediction and bi-prediction are sometimes referred to as unidirectional prediction (P) and bidirectional prediction (B), but the term "directional" is something of a misnomer because, with modern video coding standards, bi-prediction is simply based on two different lists of prediction data and no particular direction is required. In other words, the data in the two different lists used for bi-prediction can come from previous or subsequent frames, and need not be bidirectional, that is, from both previous and subsequent frames. For that reason, this description uses the terms uni-prediction and bi-prediction instead of the terms unidirectional prediction and bidirectional prediction.
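To make the uni-/bi-prediction distinction concrete, here is a minimal, hypothetical Python representation of motion information; the field names are illustrative assumptions, not syntax from the patent or the HEVC specification:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MotionInfo:
    # A uni-predictive block carries one motion vector referencing one list
    # (List 0 or List 1); a bi-predictive block carries two, one per list.
    mv_l0: Optional[Tuple[int, int]] = None  # motion vector into List 0
    ref_idx_l0: Optional[int] = None         # picture index within List 0
    mv_l1: Optional[Tuple[int, int]] = None  # motion vector into List 1
    ref_idx_l1: Optional[int] = None         # picture index within List 1

    def is_uni_predictive(self) -> bool:
        # Exactly one of the two lists is referenced.
        return (self.mv_l0 is None) != (self.mv_l1 is None)

    def is_bi_predictive(self) -> bool:
        return self.mv_l0 is not None and self.mv_l1 is not None
```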
[0032] Bi-predictive encoding can improve the ability to achieve compression or improve video quality at a given compression level. However, in some cases, there may be no (or only a few) neighbors that were encoded in a bi-predictive mode, thus making bi-prediction unavailable (or limited) in merge mode encoding. For example, with the conventional merge mode, if none of the neighboring blocks was encoded in a bi-predictive mode, the current block may miss the opportunity to exploit the benefits that can arise from bi-prediction.
[0033] This description describes a bi-predictive merge mode as an extension or addition to merge mode techniques. More specifically, this description describes a bi-predictive merge mode that inherits motion information from two different neighboring blocks, where the two different neighboring blocks were each encoded in a uni-predictive mode. The described bi-predictive merge mode can increase the number of bi-predictive candidates that can be used in the merge mode encoding context.
[0034] Figure 1 is a block diagram illustrating an exemplary video encoding and decoding system 10 that can implement the techniques of this description. As illustrated in Figure 1, system 10 includes a source device 12 that transmits encoded video to a destination device 16 over a communication channel 15. The source device 12 and the destination device 16 may comprise any of a wide range of devices. In some cases, the source device 12 and the destination device 16 may comprise wireless communication devices, such as so-called cellular or satellite radio telephones. The techniques of this description, however, which generally apply to the encoding and decoding of video blocks in a bi-predictive merge mode, can be applied to non-wireless devices including video encoding and/or decoding capabilities. The source device 12 and the destination device 16 are merely examples of encoding devices that can support the techniques described herein.
[0035] In the example of Figure 1, the source device 12 may include a video source 20, a video encoder 22, a modulator/demodulator (modem) 23 and a transmitter 24. The destination device 16 may include a receiver 26, a modem 27, a video decoder 28, and a display device 30. According to this description, the video encoder 22 of the source device 12 can be configured to encode one or more video blocks according to a bi-predictive merge mode. With the bi-predictive merge mode, a video block inherits its motion information from two different neighboring blocks, where the two different neighboring blocks were each encoded in a uni-predictive mode. Syntax elements may be generated by the video encoder 22 in order to identify the two different neighboring video blocks encoded in uni-predictive modes. In this way, a video decoder can reconstruct a bi-predictive video block based on the motion information of the two different neighboring video blocks identified by the syntax elements.
[0036] More specifically, the video encoder 22 can select a bi-predictive merge mode for encoding a current video block, and identify two different neighboring video blocks encoded in uni-predictive modes. The video encoder 22 can use motion information from the two different neighboring video blocks to encode the current video block according to the bi-predictive merge mode, and generate one or more syntax elements identifying the two different neighboring video blocks to a video decoder.
[0037] The video source 20 may comprise a video capture device, such as a video camera, a video archive containing previously captured video, a video feed from a video content provider, or another video source. As a further alternative, the video source 20 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if the video source 20 is a video camera, the source device 12 and the destination device 16 can form so-called camera phones or video phones. In each case, the captured, pre-captured or computer-generated video can be encoded by the video encoder 22.
[0038] Once the video data is encoded by the video encoder 22, the encoded video information can then be modulated by the modem 23 in accordance with a communication standard, for example, such as code division multiple access (CDMA), orthogonal frequency division multiplexing (OFDM), or any other communication standard or technique. The encoded and modulated data can then be transmitted to the destination device 16 via the transmitter 24. The modem 23 can include various mixers, filters, amplifiers or other components designed for signal modulation. Transmitter 24 may include circuitry designed for transmitting data, including amplifiers, filters, and one or more antennas. Receiver 26 of destination device 16 receives information over channel 15, and modem 27 demodulates the information.
[0039] The video decoding process performed by the video decoder 28 may include techniques reciprocal to the encoding techniques performed by the video encoder 22. In particular, the video decoder 28 may receive one or more syntax elements for a current video block, where the current video block is encoded according to a bi-predictive merge mode, and based on the one or more syntax elements, identify two different neighboring video blocks encoded in uni-predictive modes. The video decoder can use motion information from the two different neighboring video blocks to decode the current video block according to the bi-predictive merge mode.
[0040] Communication channel 15 may comprise any wired or wireless communication medium, such as the radio frequency (RF) spectrum or one or more physical transmission lines, or any combination of wired and wireless media. Communication channel 15 may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. Communication channel 15 generally represents any suitable medium, or collection of different media, for transmitting video data from source device 12 to destination device 16. Again, Figure 1 is merely exemplary and the techniques in this description may apply to video encoding configurations (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding and decoding devices. In other examples, data may be retrieved from local memory, streamed over a network, or the like.
[0041] In some cases, the video encoder 22 and the video decoder 28 may operate substantially in accordance with a video compression standard such as the emerging HEVC standard. However, the techniques in this description can also be applied in the context of a variety of other video encoding standards, including some old standards, or emerging new standards. Although not illustrated in Figure 1, in some cases the video encoder 22 and the video decoder 28 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units or other hardware and software, to handle encoding both audio and video into a common data stream or separate data streams. If applicable, MUX-DEMUX units can conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP).
[0042] The video encoder 22 and the video decoder 28 can each be implemented as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or combinations thereof. Each of the video encoder 22 and video decoder 28 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective mobile device, subscriber device, broadcast device, server or similar. In this description, the term coder refers to an encoder, a decoder, or a CODEC, and the terms coder, encoder, decoder, and CODEC all refer to specific machines designed for the coding (encoding and/or decoding) of video data consistent with this description.
[0043] In some cases, the devices 12, 16 may operate in a substantially symmetrical manner. For example, each of the devices 12, 16 may include a video encoding and decoding component. In this way, system 10 can support one-way or two-way video transmission between video devices 12, 16, for example, for video streaming, video playback, video broadcasting, or video telephony.
[0044] During the encoding process, the video encoder 22 may perform a number of encoding techniques or operations. In general, the video encoder 22 operates on blocks of video data consistent with the HEVC standard. Consistent with HEVC, video blocks are referred to as coding units (CUs) and many CUs exist within individual video frames (or other independently defined units of video, such as slices). Frames, slices, parts of frames, groups of pictures, or other data structures can be defined as units of video information that include a plurality of CUs. The CUs can have varying sizes consistent with the HEVC standard, and the bitstream can define the largest coding units (LCUs) as the largest CU size. The bi-predictive merge mode can be used to encode LCUs, CUs, or possibly other types of video blocks. With the HEVC standard, LCUs can be split into smaller and smaller CUs according to a quadtree partitioning scheme, and the different CUs that are defined in the scheme can be further partitioned into so-called prediction units (PUs). The LCUs, CUs and PUs are all video blocks within the meaning of this description.
[0045] Video encoder 22 may perform predictive coding in which a video block being encoded (e.g., a PU of a CU within an LCU) is compared with one or more predictive candidates in order to identify a predictive block. This predictive encoding process can be intra (in which case the predictive data is generated based on neighboring intra data within the same video frame or slice) or inter (in which case the predictive data is generated based on video data in previous or subsequent frames or slices). Many different encoding modes can be supported, and the video encoder 22 can select a desirable video encoding mode. According to this description, at least some video blocks can be encoded using the bi-predictive merge mode described herein.
[0046] After generation of the predictive block, the differences between the current video block being encoded and the predictive block are encoded as a residual block, and prediction syntax (such as a motion vector in the case of intercoding, or a prediction mode in the case of intracoding) is used to identify the predictive block. Furthermore, with the bi-predictive merge mode described herein, the prediction syntax (e.g., syntax elements) can identify the two different neighboring video blocks to a video decoder. Accordingly, the decoder can identify the two different neighboring video blocks encoded in uni-predictive modes, based on the syntax elements, and use the motion information from the two different neighboring video blocks to decode a current video block according to the bi-predictive merge mode.
[0047] The residual block can be transformed and quantized. The transform techniques may comprise a DCT process or a conceptually similar process, integer transforms, wavelet transforms, or other types of transforms. In a DCT process, as an example, the transform process converts a set of pixel values (e.g., residual pixel values) into transform coefficients, which can represent the energy of the pixel values in the frequency domain. The HEVC standard allows transforms according to transform units (TUs), which may be different for different CUs. The TUs are typically sized based on the size of the PUs within a given CU defined for a partitioned LCU, although this may not always be the case. TUs are typically the same size as or smaller than PUs.
[0048] Quantization can be applied to the transform coefficients, and generally involves a process that limits the number of bits associated with any given transform coefficient. More specifically, quantization can be applied according to a quantization parameter (QP) defined at the LCU level. Accordingly, the same level of quantization can be applied to all transform coefficients in the TUs associated with different PUs of the CUs within an LCU. However, rather than signaling the QP itself, a change (i.e., a delta) in the QP can be signaled with the LCU to indicate the change in QP relative to a previous LCU.
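A simplified sketch of the delta-QP idea follows; the uniform-step quantizer is a deliberate toy simplification, and the QP-to-step mapping and function names are assumptions rather than the standard's actual arithmetic:

```python
def decode_lcu_qp(previous_lcu_qp: int, delta_qp: int) -> int:
    # Only the change relative to the previous LCU's QP is signaled.
    return previous_lcu_qp + delta_qp

def quantize(coeff: int, qp: int) -> int:
    # Toy uniform quantizer: a larger QP means a coarser step size and
    # therefore fewer bits for any given transform coefficient.
    step = 1 << (qp // 6)  # rough stand-in for a real QP-to-step mapping
    return int(coeff / step)
```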
[0049] After transform and quantization, entropy coding can be performed on the quantized and transformed residual video blocks. Syntax elements can also be included in the entropy-encoded bitstream. In general, entropy encoding comprises one or more processes that collectively compress a sequence of quantized transform coefficients and/or other syntax information. Scanning techniques can be performed on the quantized transform coefficients in order to define one or more serialized one-dimensional vectors of coefficients from the two-dimensional video blocks. The scanned coefficients are then entropy encoded along with any syntax information, for example via context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), or another entropy encoding process.
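As an illustration of the serialization step (a zigzag order is one common choice; the patent text does not mandate a specific scan, so this is an assumed example):

```python
import numpy as np

def zigzag_scan(block: np.ndarray) -> np.ndarray:
    # Serialize a square 2D block of quantized coefficients into a 1D
    # vector, walking anti-diagonals so that low-frequency coefficients
    # come first, as in the classic JPEG-style zigzag.
    n = block.shape[0]
    order = sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )
    return np.array([block[r, c] for r, c in order])
```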
[0050] As part of the encoding process, the encoded video blocks can be decoded in order to generate the video data that is used for subsequent prediction-based encoding of subsequent video blocks. This is often referred to as a decoding loop of the encoding process, and generally reproduces the decoding that is performed by a decoding device. In the decoding loop of an encoder or decoder, filtering techniques can be used to improve video quality, for example to smooth pixel boundaries and possibly remove artifacts from the decoded video. This filtering can be in-loop or post-loop. With in-loop filtering, the filtering of the reconstructed video data takes place in the encoding loop, which means that the filtered data is stored by an encoder or decoder for subsequent use in predicting subsequent image data. In contrast, with post-loop filtering, the filtering of the reconstructed video data takes place outside the encoding loop, which means that unfiltered versions of the data are stored by an encoder or decoder for subsequent use in predicting subsequent image data. Loop filtering often follows a separate deblocking filtering process, which typically applies filtering to pixels that are at or near the boundaries of adjacent video blocks in order to remove blocking artifacts that manifest at the block boundaries.
[0051] Relative to previous encoding standards, the emerging HEVC standard introduces new terms and block sizes for video blocks. In particular, HEVC refers to coding units (CUs), which can be partitioned according to a quadtree partitioning scheme. An "LCU" refers to the largest coding unit size (e.g., the "largest coding unit") supported in a given situation. The LCU size can itself be signaled as part of the bit stream, for example as sequence-level syntax. The LCU can be partitioned into smaller CUs. The CUs can be partitioned into prediction units (PUs) for prediction purposes. PUs can have square or rectangular shapes. Transforms are not fixed in the emerging HEVC standard, but are defined according to transform unit (TU) sizes, which can be the same size as a given CU or possibly smaller. Residual data for a given CU can be communicated in the TUs. Syntax elements can be defined at the LCU level, CU level, PU level or TU level.
[0052] To illustrate the video blocks according to the HEVC standard, Figure 2 conceptually shows an LCU of depth 64 by 64, which is then partitioned into smaller CUs according to a quadtree partitioning scheme. Elements called "split flags" can be included as CU-level syntax to indicate whether any given CU is subdivided into four additional CUs. In Figure 2, CU0 may comprise the LCU, and CU1 through CU4 may comprise sub-CUs of the LCU. The bi-predictive merge mode syntax elements, as described in this description, can be defined at the CU level (or possibly at the LCU level if the LCU is not split into smaller CUs). The bi-predictive merge mode may also be supported for PUs of CUs in some examples.
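The split-flag recursion can be sketched as follows; the bitstream-reading callback, the minimum CU size, and the traversal order are placeholder assumptions, not the actual HEVC parsing process:

```python
def parse_cus(read_split_flag, x: int, y: int, size: int, min_cu_size: int = 8):
    # Recursively consume split flags: a set flag subdivides the current
    # CU into four equally sized sub-CUs, quadtree style; otherwise the
    # current CU is a leaf of the quadtree.
    if size > min_cu_size and read_split_flag():
        half = size // 2
        for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
            yield from parse_cus(read_split_flag, x + dx, y + dy, half, min_cu_size)
    else:
        yield (x, y, size)  # leaf CU

# Example: one set flag at the 64x64 LCU, then four unsplit 32x32 CUs.
flags = iter([1, 0, 0, 0, 0])
assert list(parse_cus(lambda: next(flags), 0, 0, 64)) == [
    (0, 0, 32), (32, 0, 32), (0, 32, 32), (32, 32, 32)]
```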
[0053] Figure 3 further illustrates a possible relationship between CUs, PUs and TUs, which may be consistent with the emerging HEVC standard or other standards. However, other relationships are also possible, and Figure 3 is merely illustrated as a possible example. In this case, any given CU of an LCU can be partitioned into PUs, as illustrated in Figure 3. The PU type for a given CU can be flagged as CU-level syntax. As shown in Figure 3, symmetric-type PUs and asymmetric-type PUs can be defined for a given CU. Furthermore, two different TU structures can be defined for each of the four symmetric PU types and asymmetric PU types. In this way, a one-bit syntax element (TU size indicator) can be used to signal the TU size, which can also depend on the PU type (symmetric or asymmetric). Coded Block Patterns (CBPs) can be defined for an LCU to indicate whether any given CU includes non-zero transform coefficients (eg, whether any TUs are present).
[0054] In other examples, TUs can be defined differently than shown in Figure 3. For example, residual samples corresponding to a CU can be subdivided into smaller units using a quadtree structure known as "residual quadtree" (RQT). RQT leaf nodes can be referred to as transform units (TUs). In some cases, TUs may be defined for a CU according to a quadtree structure, but the TUs may not necessarily depend on the PUs defined for any given CU. The PUs used for prediction can be defined separately from the TUs of any CU. A number of different types of partitioning schemes for CUs, TUs and PUs are possible.
[0055] Figure 4 is a block diagram illustrating a video encoder 50 consistent with this description. Video encoder 50 may correspond to video encoder 22 of source device 12, or to a video encoder of a different device. As shown in Figure 4, the video encoder 50 includes a quadtree partition unit 31, a prediction encoding unit 32, adders 48 and 51, and a memory 34. The video encoder 50 also includes a transform unit 38 and a quantization unit 40, in addition to an inverse quantization unit 42 and an inverse transform unit 44. The video encoder 50 also includes an entropy encoding unit 46 and a filter unit 47, which may include deblocking filters and post-loop and/or in-loop filters. The encoded video data and the syntax information defining the manner of encoding can be communicated to the entropy encoding unit 46, which performs entropy encoding on the bit stream.
[0056] As illustrated in Fig. 4, the prediction encoding unit 32 can support a plurality of different encoding modes 35 for encoding video blocks. Modes 35 may include intercoding modes that define predictive data from different video frames (or slices). Intercoding modes can be bi-predictive, meaning that two different lists (e.g., List 0 and List 1) of prediction data (and typically two different motion vectors) are used to identify the prediction data. Intercoding modes can alternatively be uni-predictive, meaning that one list (e.g., List 0) of prediction data (and typically one motion vector) is used to identify the prediction data. Interpolations, offsets or other techniques can be performed in conjunction with generating predictive data. So-called SKIP modes and DIRECT modes can also be supported, which inherit the motion information associated with a co-located block of another frame (or slice). SKIP mode blocks do not include any residual information, while DIRECT mode blocks include residual information.
[0057] Additionally, modes 35 may include intracoding modes, which define prediction data based on data within the same video frame (or slice) as the one being encoded. Intracoding modes can include directional modes that define prediction data based on data in a particular direction within the same frame, as well as DC and/or planar modes that define prediction data based on the average or weighted average of neighboring data. The prediction encoding unit 32 can select the mode for a given block based on some criteria, such as a rate-distortion analysis or some characteristics of the block, such as block size, texture or other characteristics.
[0058] According to this description, the prediction encoding unit 32 supports a bi-predictive merge mode 35X. With bi-predictive merge mode 35X, a video block being encoded inherits motion information from two different neighboring blocks, where the two different neighboring blocks were each encoded in a uni-predictive mode. In this way, the video block is encoded with two different motion vectors that come from two different neighboring video blocks. In this case, the prediction encoding unit 32 outputs an indication that the bi-predictive merge mode was used for a given block, and outputs syntax elements that identify the two different uni-predictive neighbors that collectively define the motion information for the current bi-predictive block. The prediction blocks associated with the bi-predictive merge mode can be combined into a bi-predictive block (possibly using weighting factors), and the bi-predictive block can be subtracted from the block being encoded (via adder 48) to define the residual data associated with the block encoded in the bi-predictive merge mode.
[0059] The motion information may comprise two different uni-predictive motion vectors associated with the two different neighboring video blocks. These two different uni-predictive motion vectors can be used as the two bi-predictive motion vectors of the current video block. The motion information may further comprise two reference index values associated with the two different uni-predictive motion vectors, where the reference index values identify one or more lists of predictive data associated with the two different uni-predictive motion vectors. Again, the residual data can be generated as the difference between the block being encoded and the predictive data defined by the two different uni-predictive motion vectors that collectively define the bi-predictive merge block used in the prediction.
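Building on the MotionInfo sketch above, a hypothetical helper illustrates how two uni-predictive neighbors could jointly define bi-predictive motion information; mapping the two inherited vectors onto the L0/L1 slots in neighbor order is an assumption of this sketch, not the patent's normative procedure:

```python
def combine_uni_predictive_neighbors(a: MotionInfo, b: MotionInfo) -> MotionInfo:
    # Take each neighbor's single (motion vector, reference index) pair,
    # whichever list it references, and use the two pairs together as the
    # two bi-predictive motion vectors of the current block.
    def single(m: MotionInfo):
        assert m.is_uni_predictive()
        if m.mv_l0 is not None:
            return m.mv_l0, m.ref_idx_l0
        return m.mv_l1, m.ref_idx_l1

    (mv0, ref0), (mv1, ref1) = single(a), single(b)
    return MotionInfo(mv_l0=mv0, ref_idx_l0=ref0, mv_l1=mv1, ref_idx_l1=ref1)
```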
[0060] In the case of HEVC, the current video block being encoded may comprise a so-called CU defined relative to an LCU according to the quadtree partitioning scheme. In that case, the quadtree partition unit 31 can generate LCU syntax data that defines the quadtree partitioning scheme, and the prediction encoding unit 32 can generate mode information for the CU that defines the bi-predictive merge mode, where the one or more syntax elements (which identify the uni-predictive neighbors) are included in the mode information for the CU.
[0061] The described bi-predictive merge mode can increase the number of bi-predictive candidates that can be used in the merge mode encoding context. For example, if none of the neighbors is encoded in a bi-predictive mode, the described bi-predictive merge mode can allow bi-prediction to be exploited by combining the motion information of two neighbors in the prediction of the current video block. Furthermore, even if one or more neighbors are encoded in a bi-predictive mode, combining two uni-predictive neighbors according to the described bi-predictive merge mode can still provide encoding gains in some situations.
[0062] Generally, during the encoding process, video encoder 50 receives input video data. The prediction encoding unit 32 performs the predictive coding techniques on the video blocks (e.g., CUs and PUs). The quadtree partition unit 31 can break an LCU into smaller CUs and PUs according to the HEVC partitioning explained above with reference to Figures 2 and 3. For intercoding, the prediction encoding unit 32 compares CUs or PUs to multiple predictive candidates in one or more reference video frames or slices (for example, one or more "lists" of reference data) to define a predictive block. For intracoding, the prediction encoding unit 32 generates a prediction block based on neighboring data within the same video frame or slice. The prediction encoding unit 32 outputs the prediction block and the adder 48 subtracts the prediction block from the CU or PU being encoded in order to generate a residual block. Again, at least some video blocks can be encoded using the bi-predictive merge mode described herein.
[0063] Fig. 5 illustrates an example of the prediction encoding unit 32 of the video encoder 50 in greater detail. The prediction encoding unit 32 may include a mode selection unit 75 which selects the desired mode from modes 35, which include bi-predictive merge mode 35X as one possibility. For intercoding, the prediction encoding unit 32 may comprise a motion estimation unit (ME) 76 and a motion compensation unit (MC) 77 which identify one or more motion vectors pointing to the predictive data, and generate the prediction block based on the motion vector. Typically, motion estimation is considered the process of generating one or more motion vectors, which estimate motion. For example, the motion vector can indicate the displacement of a predictive block within a predictive frame with respect to the current block being encoded within the current frame. In the case of bi-predictive merge mode 35X, two uni-predictive motion vectors of two neighbors are combined to create a bi-prediction.
[0064] Motion compensation is typically considered the process of fetching or generating the predictive block (or blocks) based on the motion vector determined by the motion estimation. In some cases, motion compensation for intercoding may include interpolations to subpixel resolution, which allows the motion estimation process to estimate the motion of video blocks at such subpixel resolution. Weighted combinations of the two blocks (in the case of bi-prediction) can also be used.
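A minimal sketch of the bi-predictive combination step, assuming equal weights by default (real codecs use integer arithmetic with rounding, so the floating-point form here is purely illustrative):

```python
import numpy as np

def bi_predict(pred_l0: np.ndarray, pred_l1: np.ndarray,
               w0: float = 0.5, w1: float = 0.5) -> np.ndarray:
    # Combine the two prediction blocks fetched via the two motion
    # vectors into a single bi-predictive block, optionally weighted.
    combined = w0 * pred_l0 + w1 * pred_l1 + 0.5  # +0.5 rounds to nearest
    return np.clip(combined, 0, 255).astype(np.uint8)
```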
[0065] For intracoding, the prediction encoding unit 32 may comprise an intraprediction unit 78. In that case, the prediction data may be generated based on data within the same video frame as the current video block (e.g., data adjacent to the video block being encoded). Again, intracoding modes can include directional modes that define prediction data based on data in a particular direction within the same frame, as well as DC and/or planar modes that define predictive data based on the average or weighted average of the neighboring data.
[0066] The rate-distortion (R-D) unit 79 can compare the encoding results of video blocks (e.g., CUs or PUs) in different modes. In addition, the R-D unit 79 can allow other types of parameter adjustments, such as adjustments to interpolations, offsets, quantization parameters or other factors that may affect the encoding rate. The mode selection unit 75 can analyze the encoding results in terms of encoding rate (i.e., the number of bits needed to encode the block) and distortion (e.g., representing the video quality of the encoded block with respect to the original block) in order to make the mode selections for the video blocks. In this way, the R-D unit 79 provides analysis of the results of different modes to allow the mode selection unit 75 to select the desired mode for different video blocks. Consistent with this description, bi-predictive merge mode 35X can be selected when the R-D unit 79 identifies it as the desired mode for a given video block, for example, due to coding gains or coding efficiency.
[0067] Referring again to Fig. 4, after the prediction encoding unit 32 outputs the prediction block, and after the adder 48 subtracts the prediction block from the video block being encoded in order to generate a residual block of residual pixel values, the transform unit 38 applies a transform to the residual block. The transform may comprise a discrete cosine transform (DCT) or a conceptually similar transform as defined by the ITU-T H.264 standard or the HEVC standard. So-called "butterfly" structures can be defined to perform the transforms, or matrix-based multiplication can also be used. In some examples, consistent with the HEVC standard, the transform size may vary for different CUs, for example depending on the level of partitioning that occurs with respect to a given LCU. Transform units (TUs) can be set in order to configure the transform size applied by transform unit 38. Wavelet transforms, integer transforms, subband transforms, or other types of transforms can be used as well. In any case, the transform unit applies the transform to the residual block, producing a block of residual transform coefficients. The transform, in general, can convert the residual information from a pixel domain into a frequency domain.
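For illustration, a matrix-based 2D DCT-II of the kind alluded to above can be written compactly; this orthonormal floating-point form is a sketch, whereas actual codecs specify fixed-point integer approximations:

```python
import numpy as np

def dct2(block: np.ndarray) -> np.ndarray:
    # Build the orthonormal DCT-II matrix C, then apply it separably:
    # coefficients = C @ block @ C.T (matrix-based multiplication).
    n = block.shape[0]
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)  # DC basis row has a different scale factor
    return C @ block.astype(np.float64) @ C.T
```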
[0068] The quantization unit 40 then quantizes the residual transform coefficients to further reduce the bit rate. The quantization unit 40, for example, can limit the number of bits used to encode each of the coefficients. In particular, the quantization unit 40 may apply the delta QP defined for the LCU in order to define the level of quantization to be applied (such as by combining the delta QP with the QP of a previous LCU or some other known QP). After quantization is performed on the residual samples, the entropy encoding unit 46 can scan and entropy encode the data.
[0069] CAVLC is a type of entropy encoding technique supported by the ITU-T H.264 standard and the emerging HEVC standard, which can be applied on a vectorized basis by entropy encoding unit 46. CAVLC uses variable length coding (VLC) tables in a way that effectively compresses serialized "runs" of coefficients and/or syntax elements. CABAC is another type of entropy encoding technique supported by the ITU-T H.264 standard or the HEVC standard, which can be applied on a vectorized basis by entropy encoding unit 46. CABAC may involve several stages, including binarization, context model selection, and binary arithmetic encoding. In that case, the entropy encoding unit 46 encodes the coefficients and syntax elements according to CABAC. Many other types of entropy encoding techniques also exist, and new entropy encoding techniques will likely emerge in the future. This description is not limited to any specific entropy encoding technique.
[0070] Following entropy encoding by entropy encoding unit 46, the encoded video can be transmitted to another device or archived for later transmission or retrieval. The encoded video may comprise the entropy encoded vectors and various syntax information (including the syntax information that defines the two neighbors in the case of a bi-predictive merge mode). Such information can be used by the decoder to properly configure the decoding process. The inverse quantization unit 42 and the inverse transform unit 44 apply inverse quantization and the inverse transform, respectively, to reconstruct the residual block in the pixel domain. The adder 51 adds the reconstructed residual block to the prediction block produced by the prediction encoding unit 32 to produce a reconstructed video block for storage in memory 34. Prior to such storage, however, the filter unit 47 may apply filtering to the video block to improve the video quality. Filtering applied by filter unit 47 can reduce artifacts and smooth pixel boundaries. Furthermore, filtering can improve compression by generating predictive video blocks that closely match the video blocks being encoded.
[0071] According to this description, bi-predictive merge mode 35X, which inherits motion information from two different neighboring blocks, is supported, where the two different neighboring blocks were each encoded in a uni-predictive mode. The described bi-predictive merge mode 35X can increase the number of bi-predictive candidates that can be used in the merge mode encoding context. Accordingly, the R-D unit 79 (Figure 5) can identify bi-predictive merge mode 35X as the most desirable encoding mode due to the encoding gains achieved by that mode relative to other modes. In such cases, the mode selection unit 75 can select bi-predictive merge mode 35X for encoding one of the video blocks.
[0072] Fig. 6 is a block diagram illustrating an example of a video decoder 60, which decodes a video stream that is encoded in the manner described herein. The techniques of this description can be performed by the video decoder 60 in some examples. In particular, the video decoder 60 receives one or more syntax elements for a current video block, where the current video block is encoded according to a bi-predictive merge mode, and based on the one or more syntax elements, identifies two different neighboring video blocks encoded in uni-predictive modes. The video decoder 60 then uses motion information from the two different neighboring video blocks to decode the current video block according to the bi-predictive merge mode.
[0073] A video stream received at the video decoder 60 may comprise an encoded set of picture frames, a set of frame slices, a commonly coded group of pictures (GOP), or a wide variety of units of video information that include encoded LCUs (or other video blocks) and syntax information to define how to decode such LCUs. The process of decoding the LCUs may include decoding an indication of the encoding mode, which may be the bi-predictive merge mode described herein.
[0074] The video decoder 60 includes an entropy decoding unit 52, which performs the reciprocal decoding function of the encoding performed by the entropy encoding unit 46 of Fig. 4. In particular, the entropy decoding unit 52 can perform CAVLC or CABAC decoding, or any other type of entropy decoding used by video encoder 50. Video decoder 60 also includes a prediction decoding unit 54, an inverse quantization unit 56, an inverse transform unit 58, a memory 62, and an adder 64. In particular, like the video encoder 50, the video decoder 60 includes a prediction decoding unit 54 and a filter unit 57. The prediction decoding unit 54 of the video decoder 60 may include a motion compensation unit 86, which decodes the intercoded blocks and possibly includes one or more interpolation filters for subpixel interpolation in the motion compensation process. The prediction decoding unit 54 may also include an intraprediction unit for decoding intramodes. The prediction decoding unit 54 may support a plurality of modes 55, including bi-predictive merge mode 55X. Filter unit 57 may filter the output of adder 64, and may receive entropy-decoded filter information in order to define the filter coefficients applied in loop filtering.
[0075] After receiving the encoded video data, the entropy decoding unit 52 performs decoding reciprocal to the encoding performed by the entropy encoding unit 46 (of the encoder 50 in Fig. 4). At the decoder, the entropy decoding unit 52 analyzes the bit stream to determine the LCUs and the corresponding partitioning associated with the LCUs. In some examples, an LCU or the CUs of the LCU may define the encoding modes that were used, and these encoding modes may include the bi-predictive merge mode. Accordingly, the entropy decoding unit 52 may send syntax information to the prediction decoding unit 54 that identifies the bi-predictive merge mode. In that case, the syntax information may include one or more syntax elements that identify the two different neighboring video blocks encoded in uni-predictive modes. In that case, the MC unit 86 of the prediction decoding unit 54 can use the motion information of the two different neighboring video blocks to decode the current video block according to the bi-predictive merge mode. That is, the MC unit 86 can fetch the predictive data identified by the motion information of the two different neighboring video blocks, and use some combination of that predictive data in decoding the current video block in the bi-predictive merge mode.
[0076] Figure 7 is a conceptual illustration showing an example of five different neighbors that can be considered for purposes of the bi-predictive merge mode. In this example, the top neighbor (T), top right neighbor (TR), left neighbor (L), bottom left neighbor (BL) and co-located temporal neighbor (Temp) of another video frame can be considered for purposes of the bi-predictive merge mode. Of course, other neighbors (spatial or temporal) could also be used for any merge mode inheritance of motion information.
[0077] Again, with the merge mode, the current video block can inherit all motion information from a neighboring candidate block. This means that the current block will have the same motion vector, same reference frame, and same prediction mode (uni-prediction or bi-prediction) as the selected neighboring block. The selected neighbor block can be signaled as part of an encoded bit stream, but the motion information need not be signaled as the decoder can obtain the motion information from the selected neighbor block.
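In rough pseudocode terms (a sketch that reuses the hypothetical MotionInfo structure from earlier; the string indices are illustrative), conventional merge-mode decoding amounts to a table lookup:

```python
from typing import Dict

def decode_merge_mode(candidates: Dict[str, MotionInfo], signaled_index: str) -> MotionInfo:
    # Only the selected neighbor's index (e.g. "L", "T", "TR", "BL", or
    # "Temp" in Figure 7's terms) is signaled in the bit stream; the
    # decoder looks up that neighbor and inherits its motion vector(s),
    # reference frame(s), and prediction mode wholesale.
    return candidates[signaled_index]
```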
[0078] Consistent with this description, a bi-predictive merge mode is supported, which inherits motion information from two different neighboring blocks, where the two different neighboring blocks were each encoded in a uni-predictive mode. The described bi-predictive merge mode can increase the number of bi-predictive candidates that can be used in the merge mode context. Instead of signaling one neighbor, the bi-predictive merge mode can signal two different neighbors. The bi-predictive merge mode may be an extension of a conventional merge mode, simply augmenting the candidate neighboring blocks to include combinations thereof, or it may be an entirely separate mode from a conventional merge mode.
[0079] Assuming the candidate neighboring spatial and temporal blocks shown in Figure 7, the described bi-predictive merge mode can work in at least two scenarios. In a first scenario, all candidate neighboring blocks are encoded in uni-predictive modes. In this case, any two candidate blocks can be selected, and the motion information from the two selected candidates can be combined to achieve bi-prediction. For example, assume the neighboring blocks shown in Figure 7 are encoded according to the following information:
L: uni-pred, L0, refIdx = 0
T: uni-pred, L1, refIdx = 0
TR: uni-pred, L0, refIdx = 1
BL: uni-pred, L0, refIdx = 0
Temp: uni-pred, L1, refIdx = 1
[0080] In this case, there are 10 combinations of any two out of the five candidates. L0 refers to a first list of predictive data, and L1 refers to a second list of predictive data. RefIdx may comprise an index to a particular picture in the respective list. The video encoder can select the best combination (e.g., in terms of encoding rate and distortion) and can send syntax information that identifies the two selected neighboring blocks. The decoder can decode the syntax information and obtain the motion information of the selected neighboring blocks.
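The pair enumeration is straightforward to sketch with the structures above; the lists and reference indices below follow the example, while the motion vector values and the rate-distortion cost callback are invented placeholders:

```python
from itertools import combinations

candidates = {
    "L":    MotionInfo(mv_l0=(1, 0),  ref_idx_l0=0),
    "T":    MotionInfo(mv_l1=(0, 2),  ref_idx_l1=0),
    "TR":   MotionInfo(mv_l0=(3, 1),  ref_idx_l0=1),
    "BL":   MotionInfo(mv_l0=(0, -1), ref_idx_l0=0),
    "Temp": MotionInfo(mv_l1=(2, 2),  ref_idx_l1=1),
}

# 5 choose 2 = 10 candidate pairs for the bi-predictive merge mode.
pairs = list(combinations(candidates, 2))
assert len(pairs) == 10

def best_pair(rd_cost):
    # rd_cost(a, b) stands in for the encoder's rate-distortion measure of
    # encoding the current block with the combined motion of a and b.
    return min(pairs, key=lambda p: rd_cost(*p))
```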
[0081] In a second scenario, at least one of the candidate neighboring blocks can be encoded in a bi-predictive mode. In this case, any two of the uni-predictive neighboring blocks can be considered (combined) to obtain a candidate for the bi-predictive merge mode. However, any bi-predictive blocks can also be used alone for consideration as bi-predictive merge candidates. For example, assume the neighboring blocks shown in Figure 7 are encoded according to the following information:
L: bi-pred, L0, refIdx = 0, L1, refIdx = 0
T: uni-pred, L1, refIdx = 0
TR: uni-pred, L0, refIdx = 1
BL: uni-pred, L0, refIdx = 0
Temp: bi-pred, L0, refIdx = 0, L1, refIdx = 1
[0082] Again, L0 can comprise a value that refers to a first list of prediction data, L1 can comprise a value that refers to a second list of prediction data, and refIdx can comprise values that define the indices to a particular picture in the respective list. In this second example, two of the five candidates are already in the bi-predictive mode, so they can be considered alone for purposes of the bi-predictive merge mode. Additionally, the different combinations of the remaining three uni-predictive candidates can be considered. So, in this case, there will be 5 possibilities for the bi-predictive mode:
1. L
2. Temp
3. T + TR
4. T + BL
5. TR + BL
[0083] In this second example, the encoder can select the best neighbor (or combination of neighbors) from these five possibilities (e.g., in terms of encoding rate and distortion) and can send syntax information that identifies which individual neighbor or combination of neighbors was used in the merge mode. The decoder can decode the syntax information and obtain the motion information from the selected neighboring block or blocks.
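A sketch of the candidate-list construction for this second scenario, reusing the helpers above (illustrative, not the normative candidate derivation):

```python
from itertools import combinations
from typing import Dict, List, Tuple

def build_bi_predictive_candidates(candidates: Dict[str, MotionInfo]) -> List[Tuple[str, ...]]:
    # Bi-predictive neighbors qualify on their own; uni-predictive
    # neighbors qualify in pairs, each pair combined into one bi-prediction.
    bi = [(name,) for name, m in candidates.items() if m.is_bi_predictive()]
    uni = [name for name, m in candidates.items() if m.is_uni_predictive()]
    return bi + list(combinations(uni, 2))

# For the example above this yields (L,) and (Temp,) plus the
# 3 choose 2 = 3 pairs (T, TR), (T, BL), (TR, BL) -- five in total.
```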
[0084] In the example of Figure 7, five candidates are shown. However, additional candidates may also be considered, in the same areas as the candidates of Figure 7 or in other areas. In some cases, there may be multiple top candidates (T), multiple top-left candidates (TL), multiple left candidates (L), multiple bottom-left candidates (BL), and multiple temporal candidates (Temp). In some cases, the size of the current block may differ from the size of the candidates, in which case the top edge or left edge of the current block may be adjacent to multiple candidates. In other cases, candidates at even greater distances from the current video block may be considered for purposes of the bi-predictive merge mode described in this description. Many different scenarios using many different candidates are possible consistent with this description. Thus, Figure 7 is merely one example illustrating five neighboring candidates relative to the current video block.
[0085] Figure 8 is a flowchart illustrating a decoding technique consistent with this description. Figure 8 will be described from the perspective of the video decoder 60 of Figure 6, although other devices may perform similar techniques. As shown in Figure 8, the video decoder 60 receives an LCU, including syntax elements for the LCU and for the CUs within the LCU (801). In particular, the entropy decoding unit 52 can receive a bitstream that includes the LCU and parse the bitstream to identify the syntax elements, which can be sent to the prediction decoding unit 54. Accordingly, the prediction decoding unit 54 can determine the modes of the CUs based on the syntax elements. In other examples, the modes may be defined at the PU level instead of the CU level.
[0086] In determining the modes, the prediction decoding unit 54 identifies any CUs encoded in the bi-predictive merge mode (803). If a CU is not encoded in the bi-predictive merge mode ("no" branch of 803), then that CU is decoded according to its mode (804). For example, many different intra modes and many different inter modes can be supported. If a CU is encoded in the bi-predictive merge mode ("yes" branch of 803), then that CU is bi-predictive, but its motion vectors for bi-prediction come from two uni-predictive neighbors, as discussed herein. In that case, the MC unit 86 and the prediction decoding unit 54 identify two different uni-predictive neighbors of the CU based on the CU's syntax elements (805), and use the motion information of those uni-predictive neighbors to decode the bi-predictive CU.
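By way of illustration only, the following minimal sketch shows the decoder-side inheritance step: given syntax elements identifying two uni-predictive neighbors, the bi-predictive motion information of the current CU is assembled from their motion vectors. All structure and field names here are hypothetical assumptions, not the actual decoder interfaces:

    def decode_bi_pred_merge(syntax, neighbors):
        # Look up the two uni-predictive neighbors identified by the
        # (hypothetical) syntax elements.
        n0 = neighbors[syntax["merge_idx_0"]]
        n1 = neighbors[syntax["merge_idx_1"]]
        # The current CU inherits one motion vector and one reference
        # (list, refIdx) from each neighbor, yielding bi-predictive motion.
        return {
            "mv0": n0["mv"], "ref0": (n0["list"], n0["refIdx"]),
            "mv1": n1["mv"], "ref1": (n1["list"], n1["refIdx"]),
        }

    # Toy usage with made-up motion data:
    neighbors = {
        "T":  {"mv": (3, -1), "list": "L1", "refIdx": 0},
        "BL": {"mv": (2, 0),  "list": "L0", "refIdx": 0},
    }
    syntax = {"merge_idx_0": "T", "merge_idx_1": "BL"}  # signaled by encoder
    print(decode_bi_pred_merge(syntax, neighbors))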
[0087] Figure 9 is a flowchart illustrating an encoding technique consistent with this description. Figure 9 will be described from the perspective of the video encoder 50 of Figure 4, although other devices may perform similar techniques. As shown in Figure 9, the prediction encoding unit 32 selects a bi-predictive merge mode for a CU. For example, the prediction encoding unit 32 (see Figure 5) may include a mode selection unit 75 that selects the bi-predictive merge mode 35X for a CU from among a plurality of possible modes 35. The R-D unit 75 can identify the encoding rate and the level of quality or distortion associated with different modes, for example by analyzing the coding results of the intra-prediction unit 78 for intra modes. In this way, the mode selection unit 75 can identify the best mode in any given situation.
[0088] Once the prediction encoding unit 32 selects the bi-predictive merge mode 35X for a CU, the prediction unit identifies two different neighboring blocks encoded in uni-predictive modes (902). This process of identifying two different neighboring blocks encoded in uni-predictive modes can be performed by the ME unit 76 in a manner similar to that described above. For example, the ME unit 76 and the MC unit 77 can generate encoding results for different combinations of motion information from different uni-predictive neighbors, and these results can be analyzed by the R-D unit 75 to determine the encoding rate and the quality or distortion associated with such different combinations. Finally, the R-D unit 75 can determine which combination of uni-predictive neighbors yields the best encoding results for the bi-predictive merge mode.
[0089] Accordingly, the prediction encoding unit 32 uses the motion information of the best combination of uni-predictive neighbors to encode the CU as bi-predictive (903). Of course, any bi-predictive neighbors can also be considered, and possibly used for merge mode encoding, if their rate-distortion results are better than those of using two uni-predictive neighbors. The prediction encoding unit 32 (e.g., the ME unit 76 or the MC unit 77) generates one or more syntax elements for the CU to identify the two different uni-predictive neighbors used for the bi-predictive merge mode encoding of the CU. The syntax elements, for example, may comprise index values that identify two of the CU's neighbors, such as the left neighbor (L), the lower-left neighbor (BL), the upper neighbor (T), the upper-right neighbor (TR), or the co-located temporal neighbor (Temp), as conceptually illustrated in Figure 7. However, many other signaling schemes for the syntax elements can also be used.
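As a rough illustration of the encoder-side selection, the sketch below picks whichever candidate set minimizes a rate-distortion cost; the cost function and the index-based signaling are simplifying assumptions, not the actual R-D computation of unit 75:

    def select_merge_candidate(candidate_sets, rd_cost):
        # Pick the candidate set with the lowest rate-distortion cost; the
        # encoder would then signal syntax elements (e.g., index values)
        # identifying the winning neighbor or pair of neighbors.
        best = min(candidate_sets, key=rd_cost)
        return best, candidate_sets.index(best)

    # Toy usage with made-up R-D costs for the five possibilities above:
    sets = [("L",), ("Temp",), ("T", "TR"), ("T", "BL"), ("TR", "BL")]
    fake_costs = dict(zip(sets, [5.2, 4.8, 4.1, 4.6, 5.0]))
    best, idx = select_merge_candidate(sets, lambda s: fake_costs[s])
    print(f"selected candidate set #{idx}: {' + '.join(best)}")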
[0090] The techniques of this description can be performed in a wide variety of devices or apparatuses, including a wireless device, an integrated circuit (IC), or a set of ICs (i.e., a chip set). Various components, modules, or units are described to emphasize functional aspects, and they do not necessarily require realization by different hardware units.
[0091] Accordingly, the techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. Any features described as modules or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a computer-readable medium comprising instructions that, when executed, perform one or more of the methods described above. The computer-readable data storage medium may form part of a computer program product, which may include packaging materials.
[0092] The computer-readable medium may comprise a tangible computer-readable storage medium, such as random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media, and the like. The techniques additionally, or alternatively, may be realized at least in part by a computer-readable communication medium that carries or communicates code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer.
[0093] The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. The term "processor," as used herein, may refer to any of the foregoing structure or any other structure suitable for implementing the techniques described herein. Additionally, in some aspects, the functionality described herein may be provided within dedicated software modules or hardware modules configured for encoding and decoding, or incorporated into a combined video encoder-decoder (CODEC). Furthermore, the techniques could be fully implemented in one or more circuits or logic elements.
[0094] Various aspects of the description have been described. These and other aspects are within the scope of the following claims.
Claims (12)
1. A method for encoding video data, comprising: selecting a bi-predictive merge mode to encode a current video block; identifying a first set of two different neighboring video blocks encoded in uni-predictive modes from a set of candidate neighboring blocks, the set of candidate neighboring blocks comprising an upper neighbor, an upper-right neighbor, a left neighbor, a lower-left neighbor, and a co-located temporal neighbor of another video frame; using two different uni-predictive motion vectors, one associated with each of the two different neighboring video blocks of said first set, as a first candidate set of motion vectors to encode the current video block according to the bi-predictive merge mode; characterized by identifying a second set of two different neighboring video blocks encoded in uni-predictive modes from the set of candidate neighboring blocks, and using two different uni-predictive motion vectors, one associated with each of the two different neighboring video blocks of said second set, as a second candidate set of motion vectors to encode the current video block according to the bi-predictive merge mode; identifying at least one of the neighboring blocks from the set of candidate neighboring blocks that is encoded in bi-predictive mode, and using the two bi-predictive motion vectors of said at least one of the neighboring blocks as a third candidate set of motion vectors to encode the current video block according to the bi-predictive merge mode; selecting one of the candidate sets of motion vectors to encode the current video block according to the bi-predictive merge mode; and generating one or more syntax elements to identify the selected candidate set of motion vectors.
2. The method according to claim 1, characterized in that the motion information further comprises at least two values associated with the first set of two different uni-predictive motion vectors, wherein the values identify one or more lists of predictive data associated with the set of two different uni-predictive motion vectors.
3. The method according to claim 1, characterized in that the current video block comprises a coding unit (CU) defined in accordance with a high-efficiency video coding (HEVC) standard, the method further comprising: defining the CU relative to a largest coding unit (LCU) according to a quadtree partitioning scheme; generating LCU syntax data that defines the quadtree partitioning scheme; and generating mode information for the CU that defines the bi-predictive merge mode, wherein the one or more syntax elements are included in the mode information for the CU.
4. The method according to claim 1, characterized in that the current video block comprises a prediction unit (PU) of a coding unit (CU) that is defined in accordance with a high-efficiency video coding (HEVC) standard.
5. The method according to claim 1, characterized in that the selection of the candidate set of motion vectors is based on a rate-distortion determination.
6. A method for decoding video data, comprising: receiving one or more syntax elements for a current video block, wherein the current video block is encoded according to a bi-predictive merge mode; based on the one or more syntax elements, identifying either one of a plurality of sets of two different neighboring video blocks encoded in uni-predictive modes, or a neighboring block that is encoded in bi-predictive mode, wherein the neighboring video blocks are from a set of candidate neighboring blocks, the set of candidate neighboring blocks comprising an upper neighbor, an upper-right neighbor, a left neighbor, a lower-left neighbor, and a co-located temporal neighbor of another video frame; where the syntax elements identify one of the plurality of sets of two different neighboring video blocks encoded in uni-predictive modes, characterized by using the two different uni-predictive motion vectors, one associated with each of the two different neighboring video blocks, to decode the current video block according to the bi-predictive merge mode; and where the syntax elements identify a neighboring video block encoded in bi-predictive mode, using the two bi-predictive motion vectors associated with that neighboring video block to decode the current video block according to the bi-predictive merge mode.
7. The method according to claim 6, characterized in that the motion information further comprises at least two values associated with the two different uni-predictive motion vectors, wherein the values identify one or more lists of predictive data associated with the two different uni-predictive motion vectors.
8. The method according to claim 6, characterized in that the current video block comprises a coding unit (CU) defined in accordance with a high-efficiency video coding (HEVC) standard, wherein the CU is defined relative to a largest coding unit (LCU) according to a quadtree partitioning scheme, the method further comprising: receiving LCU syntax data defining the quadtree partitioning scheme; and receiving mode information for the CU that defines the bi-predictive merge mode, wherein the one or more syntax elements are included in the mode information for the CU.
9. The method according to claim 6, characterized in that the current video block comprises a prediction unit (PU) of a coding unit (CU) that is defined in accordance with a high-efficiency video coding (HEVC) standard.
10. A device for decoding video data, comprising: means for receiving one or more syntax elements for a current video block, wherein the current video block is encoded according to a bi-predictive merge mode; means for identifying, based on the one or more syntax elements, either one of a plurality of sets of two different neighboring video blocks encoded in uni-predictive modes, or a neighboring block that is encoded in bi-predictive mode, wherein the neighboring video blocks are from a set of candidate neighboring blocks, the set of candidate neighboring blocks comprising an upper neighbor, an upper-right neighbor, a left neighbor, a lower-left neighbor, and a co-located temporal neighbor of another video frame; and means for using two different uni-predictive motion vectors, one associated with each of the set of two different neighboring video blocks, to decode the current video block according to the bi-predictive merge mode, characterized in that the syntax elements identify one of the plurality of sets of two different neighboring video blocks encoded in uni-predictive modes, or for using the two bi-predictive motion vectors associated with the neighboring video block to decode the current video block according to the bi-predictive merge mode where the syntax elements identify a neighboring video block encoded in bi-predictive mode.
11. A device for encoding video data, comprising: means for selecting a bi-predictive merge mode to encode a current video block; means for identifying a first set of two different neighboring video blocks encoded in uni-predictive modes from a set of candidate neighboring blocks, the set of candidate neighboring blocks comprising an upper neighbor, an upper-right neighbor, a left neighbor, a lower-left neighbor, and a co-located temporal neighbor of another video frame; means for using two different uni-predictive motion vectors, one associated with each of the two different neighboring video blocks of said first set, as a first candidate set of motion vectors to encode the current video block according to the bi-predictive merge mode; means for identifying a second set of two different neighboring video blocks encoded in uni-predictive modes from the set of candidate neighboring blocks, characterized by using two different uni-predictive motion vectors, one associated with each of the two different neighboring video blocks of said second set, as a second candidate set of motion vectors to encode the current video block according to the bi-predictive merge mode; means for identifying at least one of the neighboring blocks from the set of candidate neighboring blocks that is encoded in bi-predictive mode, and for using the two bi-predictive motion vectors of said at least one of the neighboring blocks as a third candidate set of motion vectors to encode the current video block according to the bi-predictive merge mode; means for selecting one of the candidate sets of motion vectors to encode the current video block according to the bi-predictive merge mode; and means for generating one or more syntax elements to identify the selected candidate set of motion vectors.
12. A computer-readable medium characterized in that it comprises instructions that, when executed, cause a processor to encode video data, wherein the instructions cause the processor to perform the method as defined in any one of claims 1 to 9.