APPARATUS AND METHOD FOR ONLINE COMPRESSION AND DEDUPLICATION
Patent abstract:
An apparatus and method for online compression and deduplication. Embodiments of the present invention include a memory unit and a processor connected to the memory unit. The processor is configured to receive a subset of data from a data stream and select a reference data block corresponding to the subset of data, the reference data block being stored in a memory buffer resident in the memory unit. The processor is also configured to compare a first hash value computed for the subset of data with a second hash value calculated for the reference data block, the first hash value and the second hash value being stored in separate hash tables, and to generate a compressed representation of the subset of data by modifying the header data corresponding to the subset of data in response to a detected match between the first hash value and the second hash value in one of the separate hash tables.
Publication number: FR3037676A1
Application number: FR1655674
Filing date: 2016-06-17
Publication date: 2016-12-23
Inventors: Ashwin Narasimha; Ashish Singhai; Vijay Karamcheti
Applicant: HGST Netherlands BV
IPC main classification:
Patent description:
[0001] FIELD OF THE INVENTION The present invention relates generally to the field of data reduction technology. [0002] BACKGROUND OF THE INVENTION The memory subsystems of the non-volatile, high-performance category generally consist of relatively expensive components. It is therefore highly desirable to maximize the storage of data in such systems by employing data reduction techniques. Data reduction refers to data self-compression and data deduplication techniques that reduce the total amount of information written to or read from a primary storage system. Data reduction transforms the user's (input) data into a more compact representation that can be stored. The benefits of data reduction include improved storage utilization, increased service life (in the context of an all-flash storage system), and application acceleration, among other benefits. [0003] Data compression refers to the process of searching for redundancy within the same block of data and then coding these repeated sequences so as to reduce the overall size of the data. Data deduplication refers to the process of matching data sequences across multiple blocks in an effort to find matching sequences, even if an individual block contains incompressible data. Conventional systems, however, perform compression and deduplication as separate steps in the data reduction process. Because these conventional systems do not combine them into one step, they pay latency and bandwidth penalties. [0004] In addition, conventional data reduction solutions require many cycles and a lot of power to perform the compression functions. In any application data flow, there is a high probability that a particular set of data blocks does not exhibit self-compression properties. At the end of a compression step, conventional solutions typically perform a check to ensure that the result is not larger than the original block, which is rather late, since resources have already been spent in the attempt to compress the data. SUMMARY OF THE INVENTION [0005] Therefore, there is a need for a solution that creates a unified data path that performs both data compression and deduplication in a single pass. Embodiments of the present invention combine data compression technologies and extend them by integrating them with data deduplication methods. The one-pass nature of the embodiments of the present invention allows control of the latencies of the system and supports in-line compression and deduplication at higher throughput (e.g., at speeds matching PCIe Gen3 rates for a given FPGA, or other speed requirements or standards). [0006] Embodiments of the present invention employ smaller data subsets, such as 4 kilobyte data blocks, for compression and may extend the compression copy-encoding formats to differentiate a self-referenced copy from a copy of a reference block. It should be appreciated that the embodiments are not limited to 4 kilobyte data blocks and that any block size or block size range can be used (e.g., 4K, 8K, 10K, a block size range from 4 KB to 8 KB, etc.). Embodiments can create memory buffer structures that have multiple parallel input buffers to hold the reference data blocks. Also, embodiments may include a parallel hash table lookup scheme in which searches corresponding to the data stored in the reference data block buffers can be performed concurrently with the hash lookups performed for the data stored in the input data buffers.
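For illustration only, the following is a minimal sketch in Python of this single-pass idea: one scan of an input block consults a precomputed reference hash table (for deduplication against a reference block) and a compression hash table built on the fly (for self-compression). The function names, the 4-byte slice size, and the token format are assumptions made for this example, not part of the claimed apparatus.

```python
# Minimal sketch of a single-pass combined compression/deduplication scan.
# All names, the slice size, and the token format are illustrative assumptions.

def build_hash_table(data, slice_len=4):
    """Map the hash of each slice to the first offset where it occurs."""
    table = {}
    for i in range(len(data) - slice_len + 1):
        table.setdefault(hash(data[i:i + slice_len]), i)
    return table

def single_pass(block, reference, slice_len=4):
    ref_table = build_hash_table(reference, slice_len)  # precomputed while the reference buffer fills
    comp_table = {}                                     # self-references, built during the scan
    tokens = []
    for i in range(0, len(block) - slice_len + 1, slice_len):
        s = block[i:i + slice_len]
        h = hash(s)
        if h in ref_table and reference[ref_table[h]:ref_table[h] + slice_len] == s:
            tokens.append(("reference_copy", ref_table[h], slice_len))  # deduplication hit
        elif h in comp_table and block[comp_table[h]:comp_table[h] + slice_len] == s:
            tokens.append(("local_copy", comp_table[h], slice_len))     # self-compression hit
        else:
            tokens.append(("literal", s))
            comp_table[h] = i
    return tokens

print(single_pass(b"ABCDABCDEFGHEFGH", b"EFGHXXXXEFGH"))
```

In this sketch both lookups happen inside the same loop, which is the property the single-pass data path relies on; a hardware realization would perform the two lookups in parallel rather than sequentially.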
[0007] Additionally, the embodiments may use the refill time of the reference data buffers to compute and store the sliced hash function values of the reference data for the purpose of improving data reduction performance. Embodiments may also create a mutual interlock between the calculation of the reference hash table and the start of compression. Thus, when compression starts, searches can be performed in the reference hash table, in a compression hash table, or in both. Embodiments of the present invention may use heuristics to determine which sequence to use (if any) when a hash access is detected in one or more of the hash tables. In addition, the embodiments of the present invention may modify the interpretation of prior references so that they refer either to the input data stream or to the input reference buffer. [0008] In addition, the embodiments of the present invention can detect and predict the compressibility of blocks very early in order to minimize unnecessary effort and avoid a decline in overall system performance. The embodiments described herein can analyze compressibility characteristics in order to decide whether to perform data reduction procedures, such as compression, on a given data block. Low-impact, high-performance entropy detection operations can thus be performed in a manner that allows a high-performance data reduction system to save power and compression-unit cycles when incompressible data is presented. BRIEF DESCRIPTION OF THE DRAWINGS [0009] The accompanying drawings, which are included in and form part of this specification, and in which like numerals designate like elements, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the invention. [0010] Figure 1A is a block diagram showing an exemplary hardware configuration of an online compression and deduplication system capable of performing parallel compression and deduplication procedures for data reduction purposes in accordance with embodiments of the present invention. Figure 1B is a block diagram showing exemplary components provided in the memory for performing online compression and deduplication procedures in accordance with embodiments of the present invention. Figure 1C shows an example of a compressed format for framing the data generated in accordance with embodiments of the present invention. Figure 1D shows an example of a reference hash table and compression hash table lookup scheme in accordance with embodiments of the present invention. Figure 2A is a process diagram of a first portion of an exemplary process for one-pass entropy detection in accordance with embodiments of the present invention. Figure 2B is a process diagram of a second portion of an exemplary process for one-pass entropy detection in accordance with embodiments of the present invention. Figure 3A is a process diagram of an exemplary simultaneous data deduplication and compression process in accordance with embodiments of the present invention. Figure 3B is a process diagram of an exemplary process for performing hash table lookup procedures in accordance with embodiments of the present invention. DETAILED DESCRIPTION [0018] Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings.
Although the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover variants, modifications and equivalents which may be included within the spirit and scope of the invention as defined by the appended claims. In addition, in the following detailed description of the embodiments of the present invention, many specific details are set forth in order to provide a thorough understanding of the present invention. However, one skilled in the art will recognize that the present invention can be implemented without these specific details. In other cases, well-known methods, procedures, components and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present invention. Although a method can be described as a sequence of numbered steps for clarity, the numbering does not necessarily dictate the order of the steps. It should be understood that some of the steps may be skipped, executed in parallel or executed without it being absolutely necessary to maintain a strict sequence order. The drawings representing the embodiments of the invention are semi-schematic and are not to scale, and some dimensions are exaggerated in the drawing figures for clarity of presentation. Likewise, although the views in the drawings generally show similar orientations to facilitate the description, this representation in the figures is for the most part arbitrary. In general, the invention can operate in any orientation. NOTATION AND NOMENCLATURE: It should be borne in mind, however, that all of these terms as well as similar terms are to be associated with the appropriate physical quantities and are only convenient labels applied to these quantities. Unless specifically stated otherwise in the discussions below, it is understood that, throughout the present invention, discussions using terms such as "receiving", "selecting", "generating", "grouping", "monitoring" or the like refer to the actions and processes of a computer system or similar computing device that manipulates and transforms data represented as physical (e.g., electronic) quantities within the registers and memories of the computer system and other computer-readable media into other data similarly represented as physical quantities within the memories or registers of the computer system or other information storage, transmission or display devices. When a component appears in more than one embodiment, the use of the same reference number means that the component is the same component as that shown in the original embodiment. EXAMPLE ONLINE COMPRESSION AND DEDUPLICATION SYSTEM CONFIGURATION [0022] Figure 1A is a block diagram showing an example hardware configuration of an online compression and deduplication system (e.g., system 100) capable of performing combined compression and deduplication procedures for data reduction purposes in accordance with embodiments of the present invention. In this manner, the system 100 can perform data reduction procedures in a single pass such that operations related to data reduction, such as data compression and data deduplication, are combined into a single
process, a single processing path, or a single step, thereby reducing the overall latency and/or bandwidth penalties of the system. Although specific components are disclosed in Figure 1A, it should be appreciated that these components are exemplary. That is, the embodiments of the present invention are well suited to having various other hardware components or variants of the components mentioned in Figure 1A. It is appreciated that the hardware components in Figure 1A may operate with components other than those shown and that not all of the hardware components described in Figure 1A are necessary to achieve the objectives of the present invention. In accordance with some embodiments, the components shown in Figure 1A may be combined to achieve the objects of the present invention. The system 100 may be implemented in the form of an electronic device capable of communicating with other electronic devices through a data communications bus. The bus 106, for example, represents such a data communications bus. The exemplary system 100 upon which embodiments of the present invention can be implemented includes a general-purpose computer system environment. In its simplest configuration, the system 100 usually includes at least one processing unit 101 and a memory storage unit. The computer-readable storage medium 104, for example, represents such a memory storage unit. Depending on the exact configuration and type of device, the computer-readable storage medium 104 may be volatile (such as RAM), non-volatile (such as ROM or flash memory) or any combination of the two. Portions of the computer-readable storage medium 104, when executed, facilitate efficient execution of memory operations or queries for groups of elementary tasks. In one embodiment, the processor 101 may be a programmable circuit configured to perform the online compression and deduplication operations described herein. The processor 101 may, for example, be an FPGA controller or a flash memory device controller. Alternatively, in one embodiment, the processor 101 may be used to execute an online deduplication and compression program stored in the computer-readable storage medium 104 and configured to perform the functions described herein (see, for example, Figure 1B described later). The system 100 may also include an optional graphics system 105 for presenting information to the user of the computer, for example by displaying information on an optional display device 102. The system 100 also includes an optional alphanumeric input/output device 103. The input/output device 103 may include an optional cursor control or guidance device and one or more signal communication interfaces, such as a network interface card. In addition, the interface module 115 includes functionality to enable the system 100 to communicate with other computer systems through an electronic communications network (e.g., the Internet, wired communication networks, wireless communication networks, or similar networks). In addition, the system 100 may also have additional features and functionality. The system 100 may, for example, also include additional storage media (removable and/or non-removable), including, but not limited to, magnetic or optical disks or tapes. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any information storage method or technology, such as computer-readable instructions, data structures, program modules or other data.
Figure 1B is a block diagram showing exemplary components provided in the memory for performing online compression and deduplication procedures in accordance with embodiments of the present invention. Although specific components are disclosed in Figure 1B, it should be appreciated that these computer storage medium components are exemplary. That is, the embodiments of the present invention are well suited to having various other hardware components or variants of the computer storage medium components mentioned in Figure 1B. It is appreciated that the components in Figure 1B can operate with components other than those shown, and that not all of the computer storage medium components described in Figure 1B are necessary to achieve the objects of the present invention. In accordance with some embodiments, the components described in Figure 1B may be combined to achieve the objects of the present invention. In addition, it is appreciated that certain hardware components described in Figure 1A may operate in combination with certain components described in Figure 1B to achieve the objects of the present invention. As shown in Figure 1B, the computer-readable storage medium 104 includes an operating system 107. The operating system 107 loads into the processor 101 when the system 100 is initialized. Also, when executed by the processor 101, the operating system 107 may be configured to provide a programming interface to the system 100. The system 100 may also include wireless communication mechanisms. Through such devices, the system 100 may be communicatively connected to other computer systems through a communication network such as the Internet or an intranet, such as a local area network. In addition, as illustrated in Figure 1B, the computer-readable storage medium 104 comprises a computerized fingerprint calculation engine 110. The computerized fingerprint calculation engine 110 includes functionality for generating fingerprints using a sequence of bytes to perform authentication and/or lookup procedures. Upon detecting the receipt of a data stream, the buffer management controller 112 may communicate signals to the computerized fingerprint calculation engine 110 to process the data stored in the data input buffer 112-1 upon receipt. The fingerprints generated by the computerized fingerprint calculation engine 110 can be used to represent larger files while using a fraction of the storage space that would otherwise be needed to store such larger files. Larger files may include, for example, content pages or media files. The computerized fingerprint calculation engine 110 may employ conventional computer-implemented procedures, such as hash functions, to reduce the data streams into data bits in order to generate fingerprints, so that they can be processed by components of the system 100, such as the computerized signature calculation engine 113. Computerized hash calculations can be performed in a manner consistent with how other components of the system 100, such as the hash table module 111, calculate hash values, or in a different manner. In this manner, the computerized fingerprint calculation engine 110 may be configured to generate fingerprints for a subset of incoming data associated with a data stream as it is received by the system 100. The subset of data can be in the form of increments of 4 kilobytes, for example.
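As an illustration of this fingerprinting step (not the patented implementation), the short Python sketch below computes a fixed-size fingerprint for a roughly 4-kilobyte data subset; the choice of SHA-256 and of a 16-byte digest are assumptions made for this example.

```python
# Illustrative fingerprint for a data subset: a short digest stands in for
# the full subset during lookup. The hash function and digest length are
# assumptions for this sketch.
import hashlib

def fingerprint(subset: bytes) -> bytes:
    return hashlib.sha256(subset).digest()[:16]

chunk = b"example payload " * 256   # roughly 4 KB of input data
print(fingerprint(chunk).hex())
```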
In one embodiment, the computerized fingerprint calculation engine 110 can calculate the fingerprints for an incoming 4-kilobyte set associated with a data stream received by the system 100 and stored in the data input buffer 112-1 generated by the buffer management controller 112. The computerized signature calculation engine 113 includes signature calculation functionality for the data streams received by the system 100. The signatures may be computed by the computerized signature calculation engine 113 based on various conventional hash-based signature schemes, including Merkle, Spooky, CRC, MD5, SHA or similar schemes. The computerized signature calculation engine 113 may be configured to perform computerized signature calculations using computerized sub-block signature calculations, computerized similarity detection calculations based on Rabin signatures, and/or other similarity-based computerized signature calculations on the data streams received by the system 100. According to one embodiment, the computerized signature calculation engine 113 may use the fingerprint data generated by the computerized fingerprint calculation engine 110 to generate signatures. In one embodiment, upon receipt of a data stream, the buffer management controller 112 may be configured to communicate signals to the computerized signature calculation engine 113 to process the data stored in the data input buffer 112-1 upon receipt. The computerized signature calculation engine 113 may be configured to compute multiple signatures for subsets of data at a time for different portions of an input data stream. In this manner, the signatures computed by the computerized signature calculation engine 113 for the subsets can be communicated to other components of the system 100 for further processing, such as the reference block identification module 114. The signatures computed by the computerized signature calculation engine 113 may, for example, possess mathematical properties that make them similar or identical when they are calculated on blocks that are similar or identical to each other. As such, a reference block selected by components of the system 100, such as the reference block identification module 114, may be based on a computed signature that best represents a plurality of similar signature clusters stored in memory resident on the system 100. Therefore, the components of the system 100 can perform reference block identification procedures using the signatures calculated by the computerized signature calculation engine 113. The reference block identification module 114 may use, for example, sub-block signatures to perform the reference block identification procedures. [0033] The reference block identification module 114 comprises functionality for analyzing a plurality of different signature clusters generated by the computerized signature calculation engine 113 and selecting reference blocks that can be processed by components of the system 100, such as the hash table module 111. The reference block identification module 114 may be configured to compare the calculated signatures with the signature clusters currently stored by the system 100 and accordingly select a reference block that best represents the calculated signature.
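One plausible way to realize such signature-based selection is sketched below in Python, for illustration only: the stored candidate block that shares the most sub-block signatures with the input is chosen as the reference block. The sub-block size, the MD5 signature function, and the candidate data structure are assumptions made for this example.

```python
# Hedged sketch of sub-block-signature matching for reference block selection.
# Sub-block size, signature function, and candidate storage are assumptions.
import hashlib

def sub_block_signatures(block: bytes, sub_size: int = 512) -> set:
    return {hashlib.md5(block[i:i + sub_size]).digest()
            for i in range(0, len(block), sub_size)}

def select_reference_block(input_block: bytes, candidates: dict):
    """candidates maps a block identifier to the stored block bytes."""
    input_sigs = sub_block_signatures(input_block)
    best_id, best_score = None, 0
    for block_id, data in candidates.items():
        score = len(input_sigs & sub_block_signatures(data))
        if score > best_score:
            best_id, best_score = block_id, score
    return best_id   # None means no sufficiently similar candidate was found
```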
For example, the reference block identification module 114 may be configured to compare the calculated signatures with the signature clusters currently stored in a buffer generated by the buffer management controller 112 and accordingly select a reference block which best represents the calculated signature. The reference blocks selected by the reference block identification module 114 may be stored within the buffers generated by the buffer management controller 112, such as the reference block buffer 112-3, for further processing by components of the system 100. The reference blocks may be normal data blocks that have been found to be similar to the input data by various methods. The reference blocks may, for example, be normal data blocks that have been found to be similar to the input data using calculated sub-block signatures, similarity detection mechanisms, application index detection schemes or similar schemes. The reference blocks may also be purely synthetic blocks containing repeated data sequences that have been found to have higher repetition factors. According to one embodiment, the reference block identification module 114 may be configured to identify reference blocks using prior knowledge, content similarity matching, application indices, data pattern recognition or similar means. In addition, reference block information, such as a reference block stored within the reference block buffer 112-3 identified by the reference block identification module 114, may be stored in the header portion of a data stream. Referring to Figure 1C, for example, the reference block identifier for a reference block identified by the reference block identification module 114 may be stored in the header portion 116a of the stream 116. As illustrated in Figure 1C, header data 116a may be included within a set of data grains, such as data grains 116-1, 116-2 and 116-N, together with their respective compressed payload portions, such as the compressed payload 116b. In one embodiment, the header data 116a may store a reference identifier 117-1 in addition to the binary vector 117-2, the grain count 117-3 and/or the CRC data 117-4. Referring to Figure 1B, the hash table module 111 includes functionality for hash value calculation and dynamic hash table generation based on the data associated with the data streams received by the system 100. Upon receipt of a data stream, the buffer management controller 112 may communicate signals to the hash table module 111 to process the data stored in the data input buffer 112-1 and/or the reference block buffer 112-3 each time data is received by the buffer. The hash table module 111 includes hash value computation functionality for subsets of data, such as data bytes, associated with a data stream received by the system 100, which can be stored within a generated hash table. The hash table module 111 may, for example, calculate the hash values for the data bytes associated with a data stream received by the system 100. As such, the hash table module 111 can be used by the most common high-performance compression schemes in a manner that speeds up the search for repeated data sequences. The hash table module 111 may be used, for example, by the most common high-performance compression schemes, including Snappy, Lempel-Ziv (LZ), Gzip or similar schemes. The subsets of data may have a predetermined fixed size and may be used to represent larger files for performing deduplication procedures.
The hash table module 111 can thus calculate a hash value for each byte of data received by the system 100. In this way, the hash table module 111 can calculate the hash values for the subsets of data simultaneously with their reception and storage within a buffer generated by the buffer management controller 112. In addition, these computerized hash calculations can be performed in a manner consistent with the manner in which other components of the system 100, such as the computerized fingerprint calculation engine 110, calculate hash values, or in a different manner. [0038] In accordance with one embodiment, the hash table module 111 includes functionality for dynamically generating the reference hash table based on the reference data blocks identified by the reference block identification module 114. Once selected by the reference block identification module 114, the data blocks corresponding to the reference blocks can be stored in a reference block buffer, such as the reference block buffer 112-3. As the reference blocks are stored, the hash table module 111 can be configured to calculate the sliced hash values that correspond to the reference blocks. Thus, the hash table module 111 can generate pre-computed hash tables that can speed up the compression and deduplication procedures performed by the system 100. Referring to Figure 1B, for example, when a set of bytes is received by the system 100 and stored in the data input buffer 112-1 resident on the system 100, the hash table module 111 may calculate the hash values for the reference blocks determined and/or selected by the reference block identification module 114 as corresponding to the received set of bytes. The hash table module 111 calculates these hash values as the reference data blocks are stored in the reference data block buffer 112-3, which has been dynamically generated by the buffer management controller 112. In this manner, the buffer management controller 112 includes functionality for creating reference data block buffers that can operate in parallel with the data input buffers resident on the system 100, such as the data input buffer 112-1. These calculated reference block hash values can then be stored in the reference hash table 111-1 generated by the hash table module 111. The hash table module 111 also includes functionality for dynamically generating compression hash tables using a data stream received by the system 100 and/or stored in the data input buffers. In addition, the hash table module 111 includes functionality for modifying and/or generating coded data that can be used to decompress and/or reconstruct data streams previously processed by the system 100. In this manner, the hash table module 111 may be configured to modify and/or encode the header data when identifying similar data sequences during the compression operations. The hash table module 111 can thus generate coded data containing a reference identifier that corresponds to stored data previously identified by the hash table module 111. The hash table module 111 may, for example, generate and/or modify coded header data that contains the number of uncompressed data bytes identified by the hash table module 111, such as the number of literals identified, upon completion of the computerized hash calculation procedures.
In this manner, the encoded data generated by the hash table module 111 can provide instructions as to how the decompression module can decompress or decode a literal and/or copy elements that correspond to a set of bytes associated with a data stream undergoing decompression procedures. The copy elements can specify how many bytes to copy (the "length") and/or how far back the data to be copied lies (the "offset"). In one embodiment, for example, the header data generated and/or modified by the hash table module 111 may include a representation of the identified literals and a corresponding literal data sequence. As such, the decompression module 108 can read the encoded and/or modified header information that provides instructions on how the module can decompress the literal sequence. In addition, the decompression module 108 may be configured to perform the decompression procedures based on various compression schemes, such as Snappy, LZ, Gzip or similar schemes. According to one embodiment, provided that at least one reference block has been selected and designated to be stored in a reference block buffer, the hash table module 111 may send signals to the components of the system 100 to perform the hash table lookup and/or header modification procedures using the reference hash table and/or the compression hash table for further processing based on the calculated hash values. In this manner, the hash table module 111 can create a mutual interlock between the computerized calculations of the reference hash table and the start of the decompression procedures. In addition, the computerized hash calculation procedures performed by the hash table module 111 for the compression hash table and the reference hash table may be the same computer-implemented procedures or functions, or different computer-implemented procedures or functions. Table I contains an example set of header formats, or modifications of the prior-reference encoding format, that may be employed by embodiments of the present invention.

Compressed header | Meaning
00 | Literal, max. 60 bytes
01 | Local copy, 3-bit length, 11-bit offset
10 | Local copy, 6-bit length, 12-bit offset
11 | Reference copy, 12-bit length, 12-bit offset

Table I

[0045] The scan and match engine 109 includes functionality for performing hash table lookup procedures to carry out hash value comparisons. The scan and match engine 109 includes functionality for transmitting signals to and/or receiving signals from the hash table module 111 in order to perform computer-implemented lookup procedures that compare the hash values calculated for the subsets of data against the reference data blocks currently stored by the system 100. The scan and match engine 109 may use hash table lookup logic to locate the calculated hash values within the hash tables generated by the hash table module 111 and compare the data. The hash table module 111 may, for example, generate the reference hash table 111-1 and the compression hash table 111-2 and perform comparison operations. As such, the scan and match engine 109 may be configured to check the hash values calculated for a subset of bytes against the reference data blocks currently stored by the system 100 in the buffers generated by the buffer management controller 112, such as the reference block buffer 112-3.
In this manner, the scan and match engine 109 may conduct parallel or concurrent searches in both a reference hash table and a compression hash table created by the hash table module 111. When performing such lookup procedures, the scan and match engine 109 may also perform procedures for comparing a next set of bytes received by the system 100 with the stored reference data block and/or compression hash values which correspond to the data previously identified by the hash table module 111. Referring to Figure 1D, for example, when the reference block 118 is identified by the reference block identification module 114, the hash table module 111 stores in the reference hash table 111-1 a calculated hash value which corresponds to portions of the reference block 118 (for example, the values of the reference block data subsets 118-1, 118-2, 118-3, 118-4, etc.) as stored in a reference block buffer. In this manner, the system 100 can use the reference data buffer fill time to calculate and store the sliced hash function values of the reference data corresponding to the reference block 118, which improves the performance of the compression and deduplication procedures performed by the system 100. In addition, as illustrated in Figure 1D, the system 100 may also receive input data blocks 120 associated with an incoming data stream. As such, the scan and match engine 109 may use the hash table logic 109-3 to perform parallel lookup procedures using the reference hash table 111-1 and the compression hash table 111-2 to identify previously stored data sequences that are similar to the received data blocks 120. In this way, the scan and match engine 109 may perform byte-by-byte comparisons using smaller subsets of data (e.g., data subset 120-1 of data block 120) and reference data blocks. If the scan and match engine 109 detects a match between an entry in the reference hash table 111-1 and/or the compression hash table 111-2 and the hash value calculated for the block 120, the scan and match engine 109 may then send signals to the decompression module 108 to decompress the subset of data within the reference block buffer or data input buffer using modified compression header formats, such as the modified prior-reference encoding formats described herein. As a result, the decompressed output can then be stored in a buffer generated by the buffer management controller 112, such as the data output buffer 112-2. In one embodiment, when performing the decompression procedures, the decompression module 108 may be configured to select one of a plurality of different sequences when the scan and match engine 109 detects a match in the reference hash table 111-1 and/or in the compression hash table 111-2. Based on a predetermined heuristic, for example, the decompression module 108 may be configured to decompress the data in the form of literals, local copies, and/or reference copies. In this way, at decompression, the system 100 can create similar reference data input buffers so that a decompression implementation can be modified to interpret prior references as coming either from an input data stream or from a reference block buffer. As such, the decompression module 108 may be configured to operate the literal scan logic 109-1 and/or the local copy scan logic 109-2 used by the scan and match engine 109.
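For illustration only, the Python sketch below shows one way the element headers of Table I could be packed: a 2-bit tag selects a literal, a local copy, or a reference copy, and the length and offset occupy the remaining bits. Only the tags and field widths come from Table I; the packing order, the length biases, and the function names are assumptions made for this example.

```python
# Hedged sketch of the Table I element headers. The 2-bit tags and field
# widths follow Table I; everything else here is an illustrative assumption.

def encode_element(kind: str, length: int, offset: int = 0) -> bytes:
    if kind == "literal":        # tag 00: 6-bit (length - 1); the literal bytes would follow
        return bytes([((length - 1) << 2) | 0b00])
    if kind == "local_short":    # tag 01: 3-bit length, 11-bit offset
        word = 0b01 | (((length - 4) & 0x7) << 2) | ((offset & 0x7FF) << 5)
        return word.to_bytes(2, "little")
    if kind == "local_long":     # tag 10: 6-bit length, 12-bit offset
        word = 0b10 | (((length - 1) & 0x3F) << 2) | ((offset & 0xFFF) << 8)
        return word.to_bytes(3, "little")
    if kind == "reference":      # tag 11: 12-bit length, 12-bit offset
        word = 0b11 | (((length - 1) & 0xFFF) << 2) | ((offset & 0xFFF) << 14)
        return word.to_bytes(4, "little")
    raise ValueError(kind)

# e.g. ten bytes copied from offset 0x123 of the reference block:
print(encode_element("reference", 10, 0x123).hex())
```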
It should be appreciated that the embodiments of the present invention are not limited to the use of a single reference block. Embodiments may be expanded to include multiple reference blocks with simple modifications to existing data paths and frame structures. Embodiments may, for example, be extended to comparisons of multiple reference blocks made in parallel. In addition, the hash table module 111 may be configured to generate multiple reference hash tables, each corresponding to a respective reference block of a set of different reference blocks. Furthermore, multiple reference blocks may be stored within a single reference hash table generated by the hash table module 111. In addition, the system 100 may be configured for early detection and prediction of the compressibility of the blocks prior to performing a data reduction operation such as that described herein, to minimize unnecessary effort and avoid a decrease in the overall performance of the system. The decompression module 108 includes, for example, functionality for performing grouping procedures on the data received by the system 100. As such, the decompression module 108 may include data grouping logic 108-1 that allows the decompression module 108 to group the incoming data, received through the data input buffer 112-1, into subsets of data bytes, or "slices", that can be processed or operated on in a single instance. Thus, the hash table module 111 can calculate hash values on overlapping data slices selected by the decompression module 108 through the data grouping logic 108-1. In addition, the hash values calculated by the hash table module 111 for the overlapping slices can be used as memory address locations representing the locations where the slice offset values are stored within data structures such as the compression hash table 111-2 and/or memory resident on the system 100. In addition, the scan and match engine 109 may use the hash table module 111 to locate the calculated slice values and, in parallel, perform comparison operations on the data blocks as they are written into the data input buffer 112-1. By using the compression hash table 111-2, for example, the scan and match engine 109 can detect the occurrence of a "hash access" if it determines that a hash value calculated for a slice related to an incoming data set shares the same signature as a hash value stored in the compression hash table 111-2. In this manner, the scan and match engine 109 can detect the occurrence of a hash access when two slices have identical or similar signatures calculated by the computerized signature calculation engine 113. In addition, the scan and match engine 109 includes functionality for sending signals to the decompression module 108 to increment a compressibility counter, such as the hash access counter 111-3. In this manner, the hash access counter 111-3 can be incremented each time the scan and match engine 109 detects the occurrence of a hash access. The hash access counter 111-3 allows the system 100 to keep track of the hash values that occur frequently within an incoming data set received by the system 100. Therefore, at the end of a data transfer into the data input buffer 112-1, the system 100 can store a set of hashes calculated for a complete data set. In addition, the system 100 may be configured to store frequent hash value match thresholds, which allows it to better determine which data blocks would benefit most from being subjected to data reduction
procedures (e.g., data deduplication procedures, reference block identification procedures, data compression procedures, etc.). In this way, the system 100 may be configured in a manner that allows it to automatically interpret compressibility characteristics using predetermined threshold values and/or calculated compressibility counts. For example, before the system 100 performs any data reduction procedure, it can first refer to a predetermined threshold count and decide whether to perform, stop and/or suspend a data reduction operation. Thus, components of the system 100, such as the decompression module 108, may generate an instruction or set of instructions that direct the components of the system 100 to initiate the execution of a data reduction operation (for example, data deduplication procedures, reference block identification procedures, data compression procedures, etc.) when the threshold counter reaches or exceeds a frequent hash value match threshold. Conversely, the components of the system 100 may generate an instruction or set of instructions that direct the components of the system 100 to refrain from performing a data reduction operation when the threshold count fails to reach a frequent hash value match threshold. Such determinations by the system 100 not only save host CPU cycles, but also allow data to move through the system without interrupting other processors, such as host processors. In one embodiment, for example, if the value of the hash access counter 111-3 is less than a predetermined threshold value, the decompression module 108 may determine that the currently analyzed data blocks present low compressibility characteristics, thus demonstrating a high entropy level for at least a portion of the data stream. Therefore, in response to this determination, the decompression module 108 may be configured not to perform any decompression operation. In this manner, the decompression module 108 may be configured to send instructions that stop and/or suspend the execution of the decompression operations. However, if the value of the hash access counter 111-3 is equal to or greater than the predetermined threshold value, the decompression module 108 can determine that the data blocks have high compressibility characteristics, thus demonstrating a low level of entropy for at least a portion of the data stream. Therefore, in response to this determination, the decompression module 108 may be configured to send instructions that initiate the execution of a decompression operation. In this manner, the decompression module 108 uses the compressibility factors to determine whether to provide "compression" or "compression bypass" signals to the other components of the system 100 for a given set of bytes related to an incoming data set stored in the data input buffer 112-1. In this way, the system 100 can measure the entropy of the data sets stored in the data input buffer 112-1 based on the frequency of the similarities detected between the data blocks of a given data set. In accordance with one embodiment, the scan and match engine 109 may calculate the frequency of the hash accesses using representations of the data as a histogram. In addition, the hash access counter 111-3 can be implemented in hardware or software. In addition, the system 100 may also be configured to dynamically adjust the threshold values based on system load and/or user preferences.
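For illustration only, the following Python sketch shows a hash-hit pre-check of this kind: slice hashes of an incoming block are counted, and data reduction is attempted only when the hit count reaches a threshold. The slice size and the threshold value are assumptions made for this example, not figures taken from the description.

```python
# Hedged sketch of the entropy pre-check: count repeated slice hashes and
# compare against a threshold before spending effort on compression.
import os

def should_compress(block: bytes, slice_len: int = 4, hit_threshold: int = 32) -> bool:
    seen = set()
    hash_hits = 0
    for i in range(0, len(block) - slice_len + 1, slice_len):
        h = hash(block[i:i + slice_len])
        if h in seen:
            hash_hits += 1      # analogous to incrementing the hash access counter
        else:
            seen.add(h)
    # many hits -> low entropy -> compress; few hits -> high entropy -> bypass
    return hash_hits >= hit_threshold

print(should_compress(b"\x00" * 4096))      # highly repetitive data: True
print(should_compress(os.urandom(4096)))    # random data: almost certainly False
```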
By adjusting these thresholds, the compression threshold can be relaxed to increase the compression ratio at the expense of power and latency. Likewise, higher threshold values can be used to achieve lower average latencies. Figure 2A is a process diagram of a first portion of an exemplary one-pass entropy detection process in accordance with embodiments of the present invention. In step 205, an input data stream is received by the system and stored in a data input buffer. Upon receipt of the data stream, the decompression module uses the data grouping logic to group a plurality of subsets of data found within the input data stream. The size of the subsets may be predetermined and have a fixed value. In step 206, using the fingerprint data generated by the computerized fingerprint calculation engine for data stored in the data input buffer, the computerized signature calculation engine calculates a first signature for a first grouped subset of data within the data stream as stored during step 205. In step 207, the hash table module calculates a first hash value for the first grouped subset of data and compares the calculated hash value with a hash value stored in a hash table so as to detect a match. In step 208, the hash table module calculates a second hash value for a second grouped subset of data and compares the calculated hash value with a hash value stored in a hash table so as to detect a match. In step 209, the hash table module calculates a hash value for a further grouped subset of data and compares the calculated hash value with a hash value stored in a hash table so as to detect a match. In step 210, the decompression module monitors the matches detected by the hash table module and increments a counter accordingly for each detected match. Figure 2B is a process diagram of a second portion of an exemplary one-pass entropy detection process in accordance with embodiments of the present invention. The details of operation 210 (see Figure 2A) are depicted in Figure 2B. In step 211, the decompression module determines an entropy level for a portion of the input data stream based on a counter value relative to a predetermined frequent hash value match threshold. In step 212, the decompression module determines whether it detects that the hash value match threshold has been reached or exceeded. If the decompression module detects that the hash value match threshold has been reached or exceeded, the decompression module determines a low entropy level for a portion of the input data stream and thereby communicates signals to the system components for initiating the execution of the data reduction operations, as described in detail in step 213. If the decompression module detects that the hash value match threshold has not been reached, the decompression module determines a high entropy level for a portion of the input data stream and therefore communicates signals to the system components to stop the execution of the data reduction operations, as described in detail in step 214. In step 213, the decompression module detects that the hash value match threshold has been reached or exceeded and, therefore, the decompression module determines a low entropy level for a portion of the input data stream and thereby communicates signals to the system components to initiate the execution of the data reduction operations.
In step 214, the decompression module detects that the hash value match threshold has not been reached and, therefore, the decompression module determines a high entropy level for a portion of the input data stream and therefore communicates signals to the system components to stop the execution of the data reduction operations. Figure 3A is a process diagram of an exemplary simultaneous data deduplication and compression process in accordance with embodiments of the present invention. The details of operation 213 (see Figure 2B) are depicted in Figure 3A. In step 215, the reference block identification module compares a signature computed during step 206 with the signature clusters currently stored by the system and accordingly selects a reference block that best represents the calculated signature. The reference block selected by the reference block identification module is stored in the reference block buffer for further processing by the system. In step 216, as the reference block is stored during step 215, the hash table module calculates the sliced hash values corresponding to the reference block. In step 217, the hash values calculated during step 216 are stored in a reference hash table generated by the hash table module, provided that the hash values are not already stored in the reference hash table. In step 218, provided that at least one reference block is stored in the reference block buffer, the hash table module sends signals to the scan and match engine to perform hash table lookup and/or header modification procedures using the reference hash table and/or the compression hash table for further processing based on the hash values calculated during steps 207, 208 and/or 209. Figure 3B is a process diagram of an exemplary process for performing the hash table lookup procedures in accordance with embodiments of the present invention. The details of operation 218 (see Figure 3A) are depicted in Figure 3B. In step 219, the scan and match engine determines whether or not it has detected a match between a calculated hash value and an entry stored exclusively in the reference hash table. If the scan and match engine determines that a match has been detected, the scan and match engine then compares, byte by byte, the subset of data associated with the hash value against the reference block stored in the reference block buffer associated with the matching entry, as described in detail in step 220. If the scan and match engine determines that no match has been detected, the scan and match engine then determines whether or not it has detected a match between a calculated hash value and an entry stored exclusively in the compression hash table, as described in detail in step 221. In step 220, the scan and match engine has determined that a match has been detected and, therefore, the scan and match engine compares, byte by byte, the data subset associated with the hash value against the reference block stored in the reference block buffer associated with the matching entry and accordingly sends signals to the decompression module to decompress the subset of data within the reference block buffer using a modified compression header format for reference copies, such as "11". The decompressed output is stored in the data output buffer. In step 221, the scan and match engine has determined that no match has been detected and, therefore, the scan and match engine determines whether or not it has detected a match
between a calculated hash value and an entry stored exclusively in the compression hash table. If the scan and match engine determines that a match has been detected, the scan and match engine then compares, byte by byte, the subset of data associated with the hash value with the data currently stored in the data input buffer, as described in detail in step 222. If the scan and match engine determines that no match has been detected, the scan and match engine then determines whether or not it has detected a match between a calculated hash value and an entry stored in both the reference hash table and the compression hash table, as described in detail in step 223. In step 222, the scan and match engine has determined that a match has been detected and, therefore, the scan and match engine compares, byte by byte, the data subset associated with the hash value with the data currently stored in the data input buffer and accordingly sends signals to the decompression module to decompress the subset of data within the data input buffer using a modified compression header format for local copies, such as "01" or "10", based on the appropriate length and offset bit widths. The decompressed output is stored in the data output buffer. In step 223, the scan and match engine has determined that no match has been detected and, therefore, the scan and match engine determines whether or not it has detected a match between a calculated hash value and an entry stored in both the reference hash table and the compression hash table. If the scan and match engine determines that a match has been detected, the scan and match engine then compares, byte by byte, the subset of data associated with the hash value with the data currently stored in the data input buffer and sends signals to the decompression module to decompress the subset of data within the data input buffer based on the predetermined procedures, as described in detail in step 224. If the scan and match engine determines that no match has been detected, the calculated hash value is stored as described in detail in step 225. In step 224, the scan and match engine has determined that a match has been detected and, therefore, the scan and match engine compares, byte by byte, the subset of data associated with the hash value with the data currently stored in the data input buffer and accordingly sends signals to the decompression module to decompress the subset of data within the data input buffer based on predetermined procedures. In accordance with one embodiment, the predetermined procedures may include configuring the scan and match engine to bias its selection of decompression procedures toward local matches or reference matches, depending on the length of the copy and/or other knowledge of the data associated with the data stream. In step 225, the scan and match engine has determined that no match has been detected and, therefore, the calculated hash value is stored in the compression hash table generated by the hash table module. In step 226, the scan and match engine communicates signals to the decompression module to decompress the subset of data stored in the data input buffer using a modified compression header format for literal sequences, such as "00". The decompressed output is stored in the data output buffer. Although some preferred embodiments and methods have been disclosed herein, it will be apparent to those skilled in the art from the foregoing disclosure that variations and modifications of such embodiments and methods can be made without departing from the spirit and scope of the invention.
[0089] According to one embodiment, the techniques described here can be implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general-purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination thereof. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be database servers, storage devices, desktop computer systems, portable computer systems, handheld devices, networking devices, or any other device that incorporates hard-wired and/or programmable logic to implement the techniques. In the foregoing detailed description of embodiments of the present invention, numerous specific details have been presented to provide a thorough understanding of the present invention. However, it will be recognized by one of ordinary skill in the art that the present invention can be implemented without these specific details. In other circumstances, well-known methods, procedures, components and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present invention. Although a method may be described as a sequence of numbered steps for reasons of clarity, the numbering does not necessarily dictate the order of the steps. It should be understood that certain steps can be skipped, executed in parallel or executed without it being absolutely necessary to maintain a strict sequence order. The drawings representing the embodiments of the invention are semi-schematic and are not to scale, and some dimensions are exaggerated in the drawing figures for reasons of clarity of presentation. Likewise, although the views in the drawings generally represent similar orientations to facilitate the description, this representation in the figures is for the most part arbitrary.
Claims:
Claims (21) [0001] 1. An apparatus comprising: a memory unit for storing data streams; and a processor coupled to said memory unit, the processor configured to perform a one-pass compression operation and a deduplication operation, said processor being operable to use a subset of data from a data stream to generate a reference data block corresponding to said subset of data, to compare a first hash value calculated for said subset of data with a second hash value calculated for said reference data block, and to generate a deduplicated compressed representation of said subset of data at least by modifying the header data corresponding to said subset of data in response to a detected match between said first hash value and said second hash value in one of said hash tables, said first hash value and said second hash value being stored in separate hash tables. [0002] Apparatus according to claim 1, wherein said processor can be used to compare said first hash value and said second hash value in parallel using said separate hash tables. [0003] The apparatus of claim 1, wherein said separate hash tables include a reference hash table and a compression hash table. [0004] An apparatus according to claim 3, wherein said processor can be used to generate said compressed representation using said reference data block in response to a detection of a match between said first hash value and said second hash value, said second hash value being stored in said reference table. [0005] An apparatus according to claim 4, wherein said processor can be used to generate a mutual lock by initiating decompression procedures upon storage of said reference data block in said memory buffer. [0006] Apparatus according to claim 1, wherein said processor may be used to modify said header data using a heuristic-based prior reference encoding format. [0007] An apparatus according to claim 1, wherein said first hash value and said second hash value are calculated using the same function. [0008] A computer-implemented method for performing data reduction operations on an input data stream during a single pass, said method comprising: receiving a subset of data from a data stream; selecting a reference data block corresponding to said subset of data, said reference data block being stored in a memory buffer; comparing a first hash value calculated for said subset of data with a second hash value calculated for said reference data block, said first hash value and said second hash value being stored in separate hash tables; and generating a compressed representation of said subset of data at least by modifying the header data corresponding to said subset of data in response to a detected match between said first hash value and said second hash value in one of said separate hash tables. [0009] The method of claim 8, wherein said comparing further comprises comparing said first hash value and said second hash value in parallel using said separate hash tables. [0010] The method of claim 8, wherein said separate hash tables include a reference hash table and a compression hash table. [0011] The method of claim 10, wherein said generating further comprises generating said compressed representation using said reference data block in response to a detection of a match between said first hash value and said second hash value, said second hash value being stored in said reference table.
[0012] 12. The method of claim 8, further comprising: generating a mutual lock by initiating decompression procedures upon storing said reference data block in said memory buffer.

[0013] 13. The method of claim 8, wherein said modifying further comprises modifying said header data using a heuristic-based prior reference encoding format.

[0014] 14. The method of claim 8, wherein said first hash value and said second hash value are calculated using the same function.

[0015] 15. An apparatus comprising: a memory unit for storing memory buffers; and a processor coupled to said memory unit and configured to: receive a subset of data from a data stream; store said subset of data in a data input memory buffer; calculate a signature for said subset of data; select a reference data block using said calculated signature, said reference data block being stored in a memory buffer residing in said memory unit; compare a first hash value calculated for said subset of data with a second hash value calculated for said reference data block, said first hash value and said second hash value being stored in separate hash tables; and generate a compressed representation of said subset of data by modifying the header data corresponding to said subset of data in response to a detected match between said first hash value and said second hash value in one of said separate hash tables.

[0016] 16. The apparatus of claim 15, wherein said processor is operable to compare said first hash value and said second hash value in parallel using said separate hash tables.

[0017] 17. The apparatus of claim 15, wherein said separate hash tables include a reference hash table and a compression hash table.

[0018] 18. The apparatus of claim 17, wherein said processor is operable to generate said compressed representation using said reference data block in response to detecting a match between said first hash value and said second hash value, said second hash value being stored in said reference hash table.

[0019] 19. The apparatus of claim 18, wherein said processor is operable to generate a mutual lock by initiating decompression procedures upon storage of said reference data block in said memory buffer.

[0020] 20. The apparatus of claim 15, wherein said processor is operable to modify said header data using a heuristic-based prior reference encoding format.

[0021] 21. The apparatus of claim 15, wherein said first hash value and said second hash value are calculated using the same function.
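Illustrative sketch (not part of the claims): the following Python sketch is offered only to illustrate the single-pass flow recited in claim 8, in which a per-block compression hash table and a reference hash table are both probed for each input window and a match in either table is encoded as a copy, with the block header recording any reference-block match. The window size, hash function, token format, and all identifiers are assumptions made for this example; they do not describe the patented hardware implementation.

    # Simplified single-pass compression + deduplication over one input block,
    # using two separate hash tables as recited in claim 8. All parameters here
    # (window size, CRC32 hash, token encoding) are assumptions for illustration.
    import zlib

    WINDOW = 8  # bytes hashed per lookup (assumed)

    def window_hash(data: bytes) -> int:
        # Both tables use the same hash function (claims 7, 14, 21).
        return zlib.crc32(data)

    def build_reference_table(reference_block: bytes) -> dict:
        # Reference hash table: hashes of windows in the selected reference block.
        table = {}
        for i in range(0, len(reference_block) - WINDOW + 1):
            table.setdefault(window_hash(reference_block[i:i + WINDOW]), i)
        return table

    def compress_block(block: bytes, reference_table: dict):
        # Compression hash table: hashes of windows already seen in this block.
        compression_table = {}
        tokens = []
        uses_reference = False
        i = 0
        while i <= len(block) - WINDOW:
            h = window_hash(block[i:i + WINDOW])
            if h in compression_table:
                # Self-referenced copy (compression); a real implementation
                # would verify the bytes to guard against hash collisions.
                tokens.append(("copy", "self", compression_table[h]))
                i += WINDOW
            elif h in reference_table:
                # Copy from the reference block (deduplication).
                tokens.append(("copy", "ref", reference_table[h]))
                uses_reference = True
                i += WINDOW
            else:
                compression_table[h] = i
                tokens.append(("literal", block[i]))
                i += 1
        tokens.extend(("literal", b) for b in block[i:])
        # The block header is modified to record a reference-block match,
        # mirroring the "modifying the header data" step of claim 8.
        header = {"uses_reference_block": uses_reference}
        return header, tokens

In this sketch, build_reference_table(reference_block) would be built once per selected reference block, and each input block would then be scanned in a single pass; the parallel table lookups of claims 2, 9 and 16 are modeled here as sequential dictionary probes, and a real implementation would fall back to the original block if the token stream were not smaller.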
Family patents:
Publication No. | Publication Date
GB2540666A | 2017-01-25
KR20160150043A | 2016-12-28
CN106257403A | 2016-12-28
CN106257403B | 2020-11-03
GB2540666B | 2018-03-21
US10152389B2 | 2018-12-11
CA2933374A1 | 2016-12-19
JP6370838B2 | 2018-08-08
US20160371292A1 | 2016-12-22
KR102052789B1 | 2019-12-05
GB201610526D0 | 2016-08-03
AU2015215974B1 | 2016-02-25
AU2016203410A1 | 2016-06-16
DE102016007365A1 | 2016-12-22
JP2017010551A | 2017-01-12
Legal status:
2017-05-11 | PLFP | Fee payment | Year of fee payment: 2
2018-05-11 | PLFP | Fee payment | Year of fee payment: 3
2019-12-27 | PLSC | Search report ready | Effective date: 20191227
2020-03-13 | ST | Notification of lapse | Effective date: 20200206
Priority:
Application No. | Filing Date | Title
US14/744,444 | 2015-06-19 | Apparatus and method for inline compression and deduplication