Patent abstract:
A method of managing a flash storage system includes reading flash data units from a flash memory into a buffer, wherein each of the flash data units comprises host data units, and determining an identifier for each host data unit. The method includes selecting a set of unique identifiers from the determined identifiers based on a number of host data units sharing the respective unique identifier. For each unique identifier of the set of unique identifiers, the method includes designating one of the host data units as the master data unit, wherein the logical address of the designated host data unit is mapped with a physical address. The logical addresses of the other host data units sharing the unique identifier are remapped with the master physical address, and the physical addresses previously mapped with the remapped logical addresses are invalidated.
Publication number: FR3026512A1
Application number: FR1559103
Filing date: 2015-09-28
Publication date: 2016-04-01
Inventors: Gunter Knestele; Jeffrey L Furlong
Applicant: HGST Netherlands BV
IPC main class:
Patent description:

[0001] The present invention relates to flash storage devices and, more particularly, to offline deduplication processes for flash storage devices.

[0002] Solid state storage devices (SSDs) can use flash memory as a nonvolatile storage medium. A deduplication process allows a more efficient use of space. In a deduplication process, duplicate data entries are deleted. Rather than storing multiple copies of the same data at multiple physical addresses on the storage device, only one copy of the data is stored at a physical address, and references to that copy replace the other copies. Deduplication can be performed online, when a write command is received from a host. Before the data is written, it is compared with data already stored on the storage device. If a match is found, a reference to that match is used, rather than writing the data to a new physical address. However, this online deduplication can add latency to write operations.

SUMMARY

[0003] According to aspects of the subject technology, a method of managing a flash storage system is provided. The method comprises reading a plurality of flash data units from a flash memory into a buffer, wherein each of the plurality of flash data units comprises one or more host data units, and determining an identifier for each of the host data units read from the buffer. The method includes selecting a set of unique identifiers from the determined identifiers based on a number of host data units that share the respective unique identifiers. For each unique identifier of the set of unique identifiers, the method includes designating a first host data unit sharing the unique identifier as the master data unit, wherein a logical address of the first host data unit is mapped with a first physical address in the flash memory in a look-up table, remapping, in the look-up table, the respective logical addresses of one or more second host data units sharing the unique identifier from respective second physical addresses in the flash memory with the first physical address in the flash memory, and invalidating data stored at the respective second physical addresses in the flash memory.

[0004] In other aspects of the subject technology, a flash storage system is provided. The flash storage system includes a plurality of flash memory devices, a memory including a buffer, and a controller. The controller is configured to read a plurality of flash data units from the plurality of flash memory devices into the buffer, wherein each of the plurality of flash data units comprises one or more host data units, determine an identifier for each of the host data units read from the buffer, and select a set of unique identifiers from the determined identifiers based on a number of host data units that share the respective unique identifiers. For each unique identifier of the set of unique identifiers, the controller is configured to designate a first host data unit sharing the unique identifier as a master data unit, wherein a logical address of the first host data unit is mapped with a first physical address in the flash memory device in a look-up table, remap, in the look-up table, the respective logical addresses of one or more second host data units sharing the unique identifier from respective second physical addresses in the flash memory device with the first physical address in the flash memory device, and invalidate the data stored at the respective second physical addresses in the flash memory device.
[0005] According to other aspects of the subject technology, a non-transitory machine-readable medium includes instructions stored thereon which, when executed by a machine, cause the machine to perform operations. The operations include reading a plurality of flash data units from a flash memory into a buffer, wherein each of the plurality of flash data units includes one or more host data units, determining an identifier for each of the host data units read from the buffer, and selecting a set of unique identifiers from the determined identifiers based on a number of host data units that share the respective unique identifiers. For each unique identifier of the set of unique identifiers, the operations include designating a first host data unit sharing the unique identifier as the master data unit, wherein a logical address of the first host data unit is mapped with a first physical address in the flash memory in a look-up table, remapping, in the look-up table, the respective logical addresses of one or more second host data units sharing the unique identifier from respective second physical addresses in the flash memory with the first physical address in the flash memory, and invalidating the data stored at the respective second physical addresses in the flash memory. It is understood that other configurations of the subject technology will be readily apparent to those skilled in the art from the following detailed description, wherein different configurations of the subject technology are shown and described by way of illustration.
[0006] As will be understood, the subject technology is capable of other and different configurations, and its various details can be modified in various other respects, all without departing from the scope of the subject technology. As a result, the drawings and the detailed description should be considered illustrative and not restrictive.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] Fig. 1 is a block diagram illustrating components of a flash storage system according to aspects of the subject technology. Figure 2 is a block diagram illustrating flash memory devices according to aspects of the subject technology. [0009] Fig. 3 is a flowchart illustrating a method of managing a flash storage system according to aspects of the subject technology. Figure 4A is a diagram of a node of a data structure according to aspects of the subject technology. Figure 4B is a diagram of a data structure comprising nodes of Figure 4A according to aspects of the subject technology. Figure 5A is a diagram of a look-up table according to aspects of the subject technology. Figure 5B is a diagram of a backtrack table according to aspects of the subject technology. Figure 6A is a flowchart illustrating a method for updating deduplication entries according to aspects of the subject technology. Figure 6B is a flowchart illustrating another method for updating deduplication entries according to aspects of the subject technology. Figure 7A is a diagram of a modified look-up table according to aspects of the subject technology. Figure 7B is a diagram of a modified backtrack table according to aspects of the subject technology. Figure 7C is a diagram of another modified look-up table according to aspects of the subject technology. Fig. 7D is a diagram of another modified backtrack table according to aspects of the subject technology.

DETAILED DESCRIPTION

[0020] The detailed description presented below is intended as a description of the various configurations of the subject technology and is not intended to represent the only configurations in which the subject technology can be implemented. The accompanying drawings are incorporated herein and constitute a part of the detailed description. The detailed description includes specific details to provide a thorough understanding of the subject technology. However, the subject technology can be implemented without these specific details. In some cases, structures and components are presented in block diagram form to avoid obscuring the concepts of the subject technology.

An SSD may include one or more flash memory devices, each of which includes an array of flash memory cells. The flash memory cells can be organized into physical blocks, each physical block comprising a number of pages. Data is written to the flash memory in page write units, where each page has the capacity to store a predetermined number of host data units or sectors. Host data units may be written sequentially to the flash memory at the next available location. However, data is erased from the flash memory in physical block erase units. The SSD can perform maintenance operations, which can help manage the storage and usage of data and the lifetime of the flash memory devices. In a deduplication process, storage space is used more efficiently by eliminating duplicate data units.
In online deduplication, when a write command is received from the host, the host data unit to be written is compared to the host data units stored in the storage device. If a match is found, the target logical address of the write command is mapped to the physical address of the matching host data unit. If a match is not found, the host data unit is written to an available physical address, and the target logical address is mapped to the written physical address. However, the deduplication process may add a delay in completing the write command. Applying deduplication during maintenance operations (offline deduplication), such as garbage collection (GC), may prevent write latency during host write commands.

Figure 1 is a block diagram illustrating components of a flash storage system 110 according to aspects of the subject technology. As shown in FIG. 1, the flash storage system 110 includes an interface 115, a controller 120, flash memory devices 130, and a memory 125. The interface 115 facilitates the communication of data, commands, and/or control signals between the flash storage system 110 and a host 150. The controller 120 controls the operation of the flash storage system 110 to store and retrieve data in the flash memory devices 130 in accordance with instructions received from the host 150. The controller 120 may include a processor 122. The memory 125, which may be a random access memory (RAM), provides a temporary storage space, which may include a buffer 127, for the controller 120 to process commands and transfer data between the host 150 and the flash memory devices 130. The operation of each of these components is described in more detail below.

The interface 115 provides physical and electrical connections between the host 150 and the flash storage system 110. The interface 115 is configured to facilitate communication of data, commands, and/or control signals between the host 150 and the flash storage system 110 through the physical and electrical connections. The connection and communications with the interface 115 may be based on a standard interface such as Universal Serial Bus (USB), Small Computer System Interface (SCSI), Serial Advanced Technology Attachment (SATA), and so on. Alternatively, the connection and/or communications may be based on a proprietary interface. Those skilled in the art will recognize that the subject technology is not limited to a particular type of interface.

The controller 120 manages the flow of data between the host 150 and the flash memory devices 130. The controller 120 is configured to receive commands and data from the host 150 through the interface 115. For example, the controller 120 may receive data and a write command from the host 150 to write the data to the flash memory devices 130. The controller 120 is further configured to send data to the host 150 through the interface 115. For example, the controller 120 may read data from the flash memory devices 130 and send the data to the host 150 in response to a read command. The controller 120 is further configured to manage the data stored in the flash memory devices 130 and the memory 125 based on internal control algorithms or other types of commands that may be received from the host 150. For example, the controller 120 is configured to perform GC and other maintenance operations.
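To make the contrast with the offline approach described in this disclosure concrete, the following is a minimal sketch of the online write path outlined above, in Python. The table names, the use of SHA-256, and the in-memory dictionaries are illustrative assumptions, not part of the disclosed system; the point is only that the digest computation and match lookup sit on the write path, which is where the added latency comes from.

```python
import hashlib

# Hypothetical in-memory stand-ins for device state; the names are illustrative
# and not part of the disclosed system.
lookup_table = {}      # LBA -> PBA
digest_index = {}      # digest of stored data -> PBA holding that data
flash = {}             # PBA -> bytes (simulated flash pages)
next_free_pba = [0]    # next unused physical address

def online_dedup_write(lba: int, data: bytes) -> int:
    """Write path with inline deduplication: the digest computation and match
    lookup happen before the write command completes, which is where the extra
    latency comes from."""
    digest = hashlib.sha256(data).hexdigest()
    pba = digest_index.get(digest)
    if pba is None:                      # no match: write to a new physical address
        pba = next_free_pba[0]
        next_free_pba[0] += 1
        flash[pba] = data
        digest_index[digest] = pba
    lookup_table[lba] = pba              # map the target LBA to the matching or new PBA
    return pba
```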
Those skilled in the art will be familiar with other conventional operations performed by a controller in a flash storage device, which will not be described in detail here. The controller 120 may be implemented with a general purpose processor, a microcontroller, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed and configured to perform the operations and functions described herein. In some implementations, the controller 120 may include the processor 122, which may be a specialized processor for a specific operation, such as computing a secure hashing algorithm (SHA). The controller 120 may perform the operations and functions described herein by executing one or more instruction sequences stored on a machine/computer readable medium. The machine-readable or computer-readable medium may be the flash memory devices 130, the memory 125, or other types of media from which the controller 120 may read instructions or code. For example, the flash storage system 110 may include a read-only memory (ROM), such as an EPROM or EEPROM, encoded with firmware/software comprising one or more instruction sequences read and executed by the controller 120 during operation of the flash storage system 110.

The flash memory devices 130 may each be a single flash memory chip or may represent groups of multiple flash memory chips. The flash memory devices 130 may be organized among multiple channels through which data is read from and written to the flash memory devices 130 by the controller 120, or coupled to a single channel. The flash memory devices 130 may be implemented using NAND flash memory. The flash memory devices 130 comprise multiple memory cells divided into storage blocks. These storage blocks may be called data blocks or memory blocks and are addressable by the controller 120 using a physical block address. Each of the storage blocks is further divided into multiple data segments or pages addressable by the controller 120 using a physical page address or an offset relative to the physical block address of the storage block containing the referenced page. Pages can store sectors or other host data units. The storage blocks represent the data units that are erased in the flash memory devices 130 in a single erase operation. The physical pages represent the data units that are read from or written to the flash memory devices 130 in a single read or write operation.

FIG. 2 shows a block diagram of flash memory devices 230A-D, which may correspond to the flash memory devices 130. Flash memory devices 230A-D include storage blocks 240A-D, respectively, and blocks 241A-D, respectively. Block 240A is divided into pages 245A1 to A4. Block 240B is divided into pages 245B1 to B4. Block 240C is divided into pages 245C1 to C4. Block 240D is divided into pages 245D1 to D4. Although not shown in FIG. 2, the blocks 241A-D can also be divided into pages. The pages, such as pages 245A1, 245B1, 245C1, and 245D1, in the set of flash memory devices 230A-D having corresponding physical addresses are referred to as a superpage or stripe 250. Blocks, such as blocks 240A-D, across all flash memory devices 230A-D are referred to as a superblock or band 251.
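As a rough illustration of the page, block, stripe and band organization of FIG. 2, the sketch below enumerates the physical locations that make up a superpage/stripe and a superblock/band. The counts (four devices, four pages per block) mirror the simplified figure only; real devices use the larger geometries discussed in the next paragraph.

```python
from typing import List, Tuple

# Illustrative geometry mirroring the simplified FIG. 2 only.
NUM_DEVICES = 4          # flash memory devices 230A-230D, one per channel
PAGES_PER_BLOCK = 4      # pages 245x1-245x4 in each block

def stripe_pages(block_index: int, page_offset: int) -> List[Tuple[int, int, int]]:
    """Return the (device, block, page) locations that make up one superpage/stripe:
    the page at the same block index and page offset on every device."""
    assert 0 <= page_offset < PAGES_PER_BLOCK
    return [(device, block_index, page_offset) for device in range(NUM_DEVICES)]

def band_blocks(block_index: int) -> List[Tuple[int, int]]:
    """Return the (device, block) locations that make up one superblock/band."""
    return [(device, block_index) for device in range(NUM_DEVICES)]

# Example: stripe 250 corresponds to pages 245A1, 245B1, 245C1 and 245D1,
# and band 251 corresponds to blocks 240A-240D.
print(stripe_pages(block_index=0, page_offset=0))
print(band_blocks(block_index=0))
```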
Each of the flash memory devices 230A-D may be on a separate channel, allowing read/write operations in parallel across all channels. Those skilled in the art may recognize other terms conventionally used to refer to these data units in a flash storage device. The subject technology is not limited to a particular flash memory device capacity. For example, storage blocks may each comprise 32, 64, 128 or 512 pages. In addition, each page can be 512 bytes, 2 KB, 4 KB, or 32 KB. Sectors can each be 4 KB, or other sizes, so that sectors can be the same size as a page, or there can be multiple sectors per page.

Returning to Figure 1, the memory 125 represents a memory coupled to the controller 120 and used by it during operation of the flash storage system 110. The controller 120 can buffer commands and/or data in the memory 125. For example, the controller 120 may buffer data in the buffer 127. The buffer 127 may include transfer buffers and/or other buffers used by the controller 120. The controller 120 may also use the memory 125 to store address mapping tables or look-up tables used to convert logical data addresses used by the host 150 into virtual and/or physical addresses corresponding to portions of the flash memory devices 130. Other types of tables, data, status indicators, etc. used to manage flash storage devices may also be stored in the memory 125 by the controller 120. The memory 125 can be implemented using a dynamic random access memory (DRAM), a static random access memory (SRAM), or other types of volatile and nonvolatile random access memory known to those skilled in the art, without departing from the scope of the subject technology.

The host 150 may be a computing device, such as a computer/server, a smartphone, or any other electronic device that reads and writes data in the flash storage system 110. The host 150 may have an operating system or other software that transmits read and write commands to the flash storage system 110. The flash storage system 110 may be integrated with the host 150 or may be external to the host 150. The flash storage system 110 may be wirelessly connected to the host 150, or may be physically connected to the host 150.

The controller 120 is configured to perform maintenance operations on the flash memory devices 130. For example, the controller 120 may determine that a GC is to be performed, such as when the number of available blocks is less than a threshold. The controller 120 can keep track of the program/erase (P/E) cycles of each block for wear-leveling purposes. Once a GC is triggered, block 240A, for example, may be selected depending on the amount of invalid data units contained in block 240A, a number of errors associated with block 240A, or other parameters such as P/E cycles. For example, even though block 240B could be selected for GC (having a large amount of invalid data), block 240A may be selected instead based on the respective P/E cycles of blocks 240A and 240B. Once block 240A is selected for the GC, the valid data of block 240A is copied to an empty block, such as block 240D, and block 240A is erased. However, the controller 120 may be configured to perform deduplication before copying valid data from block 240A into block 240D. Figure 3 shows an offline deduplication flow chart 300, according to aspects of the subject technology.
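A minimal sketch of the kind of block selection just described: the garbage collector prefers blocks with the most invalid data but can pass over a block whose P/E cycle count is already high. The threshold value and the scoring rule are assumptions for illustration, not the actual policy of the controller 120.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class BlockInfo:
    """Per-block bookkeeping a controller might keep; the fields are illustrative."""
    block_id: str
    invalid_units: int   # invalidated host data units in the block
    pe_cycles: int       # program/erase cycles, tracked for wear leveling
    errors: int = 0

def select_gc_victim(blocks: List[BlockInfo],
                     max_pe_cycles: int = 3000) -> Optional[BlockInfo]:
    """Pick a block for garbage collection: prefer blocks holding the most
    invalid data, but skip heavily worn blocks so wear stays distributed."""
    candidates = [b for b in blocks if b.pe_cycles < max_pe_cycles]
    if not candidates:
        return None
    return max(candidates, key=lambda b: (b.invalid_units, -b.pe_cycles))

# Example mirroring the text: block 240B holds more invalid data, but block
# 240A is chosen because 240B is already heavily worn.
blocks = [BlockInfo("240A", invalid_units=10, pe_cycles=500),
          BlockInfo("240B", invalid_units=30, pe_cycles=3500)]
print(select_gc_victim(blocks).block_id)  # "240A"
```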
With a block, such as block 240A, selected for a maintenance operation, such as a GC, the deduplication process can begin by reading a page of the selected block, such as page 245A1. In step 310, a plurality of flash data units are read from the flash memory into a buffer, each of the plurality of flash data units including one or more host data units. For example, the data of page 245A1 can be read into the buffer 127. To improve deduplication performance, more data can be read into the buffer, as analyzing more data increases the likelihood of finding duplicate data. Therefore, the remaining pages of the band associated with the read page, for example the band 251, can also be read into the buffer. Since the other pages of the band are on different channels, the pages may be read in parallel, so that reading the entire band may not add significant read overhead. In some implementations, the maintenance operation may select a smaller number of flash data units, such as half a band or a stripe, so that not all flash memory devices are scanned.

In step 320, an identifier is determined for each of the host data units read into the buffer. For example, a hashing algorithm, such as an SHA, can be applied to each host data unit to create respective digests. Although an SHA is discussed here, other types of identifiers may be used in place of an SHA. The identifiers are determined so that the same data creates the same identifier, in order to identify duplicate data entries. The identifiers can be determined by the controller 120. In some implementations, a small hardware processor, such as the processor 122, may be used to calculate the SHA. The processor 122 may be specifically configured to calculate the SHA to increase performance and allow the controller 120 to perform other operations in parallel.

In step 330, a set of unique identifiers is selected from the determined identifiers based on a number of host data units that share the respective unique identifiers. The selection may be based on the analysis of a data structure, for example a B-tree, containing the identifiers determined in step 320. For example, the controller 120 may construct a B-tree in the memory 125 as each identifier is determined in step 320. Figure 4A illustrates a block diagram of a node 410 used in a B-tree 400 shown in Figure 4B. Each node 410 comprises a key 420. Each key 420 stores a digest 425, a count 430, and addresses 440. Digest 425 contains a unique identifier. Count 430 contains the number of host data units corresponding to the identifier of digest 425. As each host data unit is examined, the count 430 is incremented if the digest of the host data unit corresponds to the digest 425, and the logical address of the host data unit is added to the addresses 440. The addresses 440 contain a list of logical addresses whose data corresponds to the identifier of the digest 425. The count 430 corresponds to the number of addresses in the addresses 440. The addresses 440 may be implemented as a linked list, array, or other data structure for storing multiple items. Figure 4B shows the B-tree 400 containing the nodes 410A, 410B, 410C, and 410D. Node 410A contains keys 420A and 420B. Node 410B contains keys 420C and 420D. Node 410C contains keys 420E and 420F. Node 410D contains keys 420G and 420H. Each of the keys 420A, 420B, 420C, 420D, 420E, 420F, 420G and 420H contains a respective digest 425, count 430, and addresses 440, which are not shown in FIG. 4B.
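The following sketch illustrates steps 310-320: host data units sitting in the buffer are sliced out and hashed to produce the per-unit identifiers. SHA-256 from Python's hashlib stands in for the SHA mentioned above, and the 4 KB unit size and function names are assumptions for illustration.

```python
import hashlib
from typing import List, Tuple

HOST_UNIT_SIZE = 4096  # bytes per host data unit (sector); an illustrative size

def digest_host_units(buffer: bytes, lbas: List[int]) -> List[Tuple[int, str]]:
    """Steps 310-320: slice the buffered flash data (e.g., a band read into
    buffer 127) into host data units and compute an identifier for each one.
    SHA-256 stands in for the SHA mentioned in the text."""
    pairs = []
    for i, lba in enumerate(lbas):
        unit = buffer[i * HOST_UNIT_SIZE:(i + 1) * HOST_UNIT_SIZE]
        pairs.append((lba, hashlib.sha256(unit).hexdigest()))
    return pairs

# Example: three host data units holding identical data produce the same digest.
buf = b"\xaa" * HOST_UNIT_SIZE * 3
print(digest_host_units(buf, lbas=[12, 98, 105]))
```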
As a digest is generated for each host data unit examined in step 320, the controller looks for a corresponding digest in the B-tree. If a corresponding digest is found in a key, the corresponding count is incremented, and the logical address of the host data unit is added to the corresponding addresses. If a corresponding digest is not found, a new key is created. The new key is added to the B-tree based on the value of the digest. The digests may be compared, for example, by alphanumeric value, so that a first digest may be considered higher than, lower than, or equal to a second digest. Starting from the root node (node 410A in Fig. 4B) as the current node, the new digest is compared to the digests of the keys of the current node. If the new digest is less than the digests of the keys of the current node, the examination continues on the child node (node 410B in FIG. 4B) having digests lower than the digests of the current node. If the new digest is between the digests of the current node, the examination continues on the child node (node 410C in Figure 4B) having digests between the appropriate digests of the current node. If the new digest is greater than the digests of the current node, the examination continues on the child node (node 410D in Fig. 4B) having digests higher than the digests of the current node. Once a node without children is reached, the new digest is added as a new key, with the new digest, a new count of 1, and the logical address of the host data unit. Depending on the order of the B-tree, the new key may cause the node to exceed a maximum number of keys, in which case a new node may be created. Once the identifiers of all the host data units are entered into the B-tree, the B-tree is complete.

The set of unique identifiers can be selected based on the count values stored with each key. For example, the identifiers with the largest counts can be selected. A threshold number of identifiers, such as 16, may be selected, or a threshold percentage of the total number of host data units examined, for example 15%, may be used to select the identifiers whose counts in total reach the threshold percentage without exceeding it. Alternatively, only identifiers having counts greater than a lower threshold count may be selected, to prevent the selection of identifiers having a count that is too small to yield a deduplication benefit. For example, the 16 largest counts may include a count of two, which may not provide significant deduplication. Identifiers with counts greater than an upper threshold count can also be excluded. Although deduplication of a large count can provide increased storage efficiency, data management can become cumbersome. If the count is large enough, multiple read commands may require reading from the same physical address in a short period of time, which can create a bottleneck and cause a read delay. For example, if every host data unit contained the same data, then all read commands would require access to the same physical address.

For each unique identifier in the set of unique identifiers, steps 340, 350, and 360 are performed. In step 340, a first host data unit sharing the unique identifier is designated as the master data unit, wherein a logical address of the first host data unit is mapped to a first physical address in the flash memory in a look-up table. Fig. 5A shows a look-up table 500 storing logical block addresses (LBAs) 510 mapped to physical block addresses (PBAs) 520.
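A minimal sketch of step 330 follows. A plain Python dictionary keyed by digest stands in for the B-tree of FIGS. 4A-4B (each entry carrying the count and the logical addresses), and the selection applies the limits discussed above: a cap on the number of identifiers plus lower and upper count thresholds. All threshold values here are illustrative assumptions.

```python
from typing import Dict, List, Tuple

def build_digest_index(pairs: List[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Group logical addresses by digest; each entry plays the role of a B-tree
    key 420 holding a digest 425, a count 430, and addresses 440 (FIG. 4A)."""
    index: Dict[str, List[int]] = {}
    for lba, digest in pairs:
        index.setdefault(digest, []).append(lba)
    return index

def select_unique_identifiers(index: Dict[str, List[int]],
                              max_identifiers: int = 16,
                              lower_threshold: int = 2,
                              upper_threshold: int = 1000) -> List[str]:
    """Step 330: keep the identifiers with the largest counts, excluding counts
    too small to yield a deduplication benefit and counts so large that reads
    would concentrate on one physical address."""
    eligible = [(digest, len(lbas)) for digest, lbas in index.items()
                if lower_threshold < len(lbas) < upper_threshold]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [digest for digest, _ in eligible[:max_identifiers]]
```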
The PBA of the master data unit stores a copy (or master copy) of the host data unit. In addition, a backtrack table, such as the backtrack table 550 having the PBAs 560 and the LBAs 570 shown in FIG. 5B, is updated to maintain reverse mappings from the PBA to the LBAs. The backtrack table 550 may be implemented with linked lists or arrays to store the multiple LBAs per PBA.

In step 350, the respective logical addresses of one or more second host data units sharing the unique identifier are remapped from respective second physical addresses in the flash memory with the first physical address in the flash memory. In FIG. 5A, the LBA 12 corresponds to the master data unit, and has the same data as the LBAs 98 and 105. The LBA 98 is mapped to the PBA 16, and the LBA 105 is also mapped to the PBA 16. In some implementations, the deduplicated entries 12, 98, and 105 may be marked with a deduplication flag. In some implementations, the entry of the master data unit may be marked as a master data unit. The backtrack table is also updated. In FIG. 5B, the PBA 16 is mapped with the LBAs 12, 98 and 105. The backtrack table shows which LBAs are mapped to a given PBA, which can be used later when the data stored at a PBA is updated.

In step 360, the data stored at the respective second physical addresses in the flash memory is invalidated. Since the LBAs of the deduplicated entries point to the PBA of the master data unit, the PBAs to which they previously pointed may be released later. The maintenance operation, such as the GC operation for which the flash data unit was initially selected, can then continue. Since the deduplicated PBAs are now marked as invalid, there may be less valid data to be copied for the GC operation than there was before deduplication. In other words, more invalid host data units can be reclaimed by the GC. In addition, the B-tree can be deleted from the memory once the deduplication is complete. Since the deduplication operation is not online, the B-tree does not need to be maintained for new write commands, and the memory can be freed for other operations. If a deduplicated band is later selected for deduplication, the B-tree can be recreated based on the backtrack table and/or the marked deduplication entries in the look-up table.

When data in the flash storage system is modified, such as being updated or deleted, the deduplicated LBAs may require an update. Figs. 6A and 6B illustrate methods for updating deduplication entries of a target host data unit to be modified. For example, the host may issue a write command to update data for the LBA of the target host data unit. Alternatively, the host may issue an erase command to erase data for the LBA of the target host data unit. If the look-up table entry for the target host data unit includes a deduplication flag, the deduplication update process may be triggered. Alternatively, the PBA of the target host data unit can be found in the backtrack table to trigger the deduplication update process. FIG. 6A illustrates a flow chart 600 for deleting/overwriting a deduplication entry. For example, the host may issue an erase or overwrite command for an LBA. In step 610, a mapping of a logical address of a target host data unit to be modified is removed from the look-up table, the logical address of the target host data unit having been mapped to the physical address of the master data unit.
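The sketch below walks through steps 340-360 for one unique identifier, using the LBA/PBA values of FIGS. 5A-5B. Dictionaries stand in for the look-up table and the backtrack table, and a set of invalidated PBAs stands in for the validity bookkeeping; these stand-ins and the seeded pre-deduplication mappings are assumptions for illustration only.

```python
from typing import Dict, List, Set

# Dictionary stand-ins for the look-up table 500 and backtrack table 550, seeded
# with illustrative pre-deduplication mappings (LBA 12 already at PBA 16).
lookup_table: Dict[int, int] = {12: 16, 98: 20, 105: 27}   # LBA -> PBA
backtrack_table: Dict[int, List[int]] = {}                  # PBA -> [LBAs]
invalid_pbas: Set[int] = set()                              # stand-in for validity bookkeeping

def deduplicate_group(lbas: List[int]) -> None:
    """Steps 340-360 for one unique identifier: the first LBA's mapping becomes
    the master, the remaining LBAs are remapped to the master PBA, and their
    previous PBAs are invalidated."""
    master_lba = lbas[0]
    master_pba = lookup_table[master_lba]                    # step 340: designate master
    backtrack_table.setdefault(master_pba, []).append(master_lba)
    for lba in lbas[1:]:
        old_pba = lookup_table[lba]
        lookup_table[lba] = master_pba                       # step 350: remap in look-up table
        backtrack_table[master_pba].append(lba)
        invalid_pbas.add(old_pba)                            # step 360: invalidate old PBA

deduplicate_group([12, 98, 105])
print(lookup_table)      # {12: 16, 98: 16, 105: 16}, as in FIG. 5A
print(backtrack_table)   # {16: [12, 98, 105]}, as in FIG. 5B
```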
For example, Figure 7A shows a look-up table 700 having the LBAs 702 mapped to the PBAs 704, as in the look-up table 500 of Figure 5A. The target host data unit may correspond to the LBA 105. LBA 105 is no longer mapped to PBA 16 (compared with FIG. 5A). The LBA 105 may no longer be mapped to a PBA, or may be remapped to another PBA (not shown in Figure 7A). If the LBA 105 is no longer mapped to a master PBA (for example, it is no longer a deduplication entry), the deduplication flag for the LBA 105 may be removed. In step 620, an inverse mapping of the physical address of the master data unit to the logical address of the target host data unit is removed from the backtrack table. When the target host data unit is not a master data unit, the corresponding PBA is not affected, and the respective entries in the look-up table and the backtrack table are updated. For example, Fig. 7B shows a backtrack table 710 having the PBAs 712 mapped with the LBAs 714, as in the backtrack table 550 of Fig. 5B. PBA 16 is no longer mapped to LBA 105 (compared with FIG. 5B). In some implementations, a deduplication entry may be cleared using the method of Figure 6A when the master PBA does not need to be cleared. For example, if the LBA 12 were to be removed, the other LBAs, such as the LBA 98, would still keep the PBA 16 as the master PBA. By updating the look-up and backtrack tables without changing the master PBA, no data is copied or moved.

FIG. 6B illustrates a flow chart 602 for deleting/overwriting a master deduplication entry. A GC operation or other maintenance operation might require clearing a master PBA, such as erasing a block corresponding to the master PBA after copying the master PBA data to a new PBA. Unlike the method shown in Figure 6A, the master PBA will be changed. In step 612, the data of a physical address of a target host data unit is copied to a new physical address, wherein the target host data unit is to be modified and the target host data unit is designated as the master data unit. When the target host data unit is the master data unit, the data at the corresponding PBA must be copied to a new PBA to ensure that the other deduplicated host data units sharing the data can still reference the data. Fig. 7C shows a look-up table 720 having the LBAs 722 mapped to the PBAs 724, as in the look-up table 500 of Fig. 5A. Figure 7D shows a backtrack table 730 having the PBAs 732 mapped with the LBAs 734, as in the backtrack table 550 of Figure 5B. The master PBA can be 16, and the new physical address can be 40. In step 622, a host data unit of the new physical address is designated as the master data unit. For example, the PBA 40 may be designated as the new master data unit. In step 632, the respective logical addresses of the second host data units are remapped with the new physical address in the look-up table. For example, in Figure 7C, the LBAs 12, 98, and 105 are now mapped to the PBA 40 (compared with Figure 5A). In step 642, the inverse mappings of the physical address of the target host data unit are updated, in the backtrack table, with the inverse mappings of the new physical address. In the backtrack table, the old master PBA is updated with the new master PBA. For example, in Fig. 7D, PBA 16 is now replaced by PBA 40 (compared with Fig. 5B). In step 652, the data stored at the physical address of the target host data unit is invalidated. The old master PBA is marked as invalid to complete the modification.
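A minimal sketch of the two update paths of FIGS. 6A and 6B, continuing the dictionary stand-ins from the previous sketch. remove_dedup_entry follows flow chart 600 (drop a non-master deduplicated LBA without touching any data), and relocate_master follows flow chart 602 (the master's data has been copied to a new PBA, every referencing LBA is remapped, and the old PBA is invalidated). The data copy itself is not modeled, and the names and values are illustrative assumptions.

```python
from typing import Dict, List, Set

# Continues the dictionary stand-ins above, starting from the FIG. 5A/5B state.
lookup_table: Dict[int, int] = {12: 16, 98: 16, 105: 16}
backtrack_table: Dict[int, List[int]] = {16: [12, 98, 105]}
invalid_pbas: Set[int] = set()

def remove_dedup_entry(target_lba: int) -> None:
    """Flow chart 600: drop a non-master deduplicated LBA. Only the look-up and
    backtrack tables change; no data is copied or moved."""
    master_pba = lookup_table.pop(target_lba)       # step 610: remove LBA -> master PBA mapping
    backtrack_table[master_pba].remove(target_lba)  # step 620: remove the reverse mapping

def relocate_master(old_master_pba: int, new_pba: int) -> None:
    """Flow chart 602: the master PBA itself must be modified, so its data is
    copied to new_pba (copy not modeled here), every referencing LBA is
    remapped, and the old master PBA is invalidated."""
    lbas = backtrack_table.pop(old_master_pba)      # steps 612/622: new PBA becomes the master
    for lba in lbas:
        lookup_table[lba] = new_pba                 # step 632: remap logical addresses
    backtrack_table[new_pba] = lbas                 # step 642: update reverse mappings
    invalid_pbas.add(old_master_pba)                # step 652: invalidate the old master PBA

remove_dedup_entry(105)     # FIGS. 7A/7B: LBA 105 no longer references PBA 16
relocate_master(16, 40)     # cf. FIGS. 7C/7D (here only LBAs 12 and 98 remain to remap)
```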
The invalidated PBA can then be erased and released in a GC operation. In some implementations, the methods of flow charts 600 and 602 may be combined to modify a specific data unit. For example, if a particular deduplicated LBA and PBA needed to be modified, the processes of flow charts 600 and 602 could be used to update the look-up table and the backtrack table.

Deduplication for the flash storage system may be enabled by a user, for example, through a user interface accessible from the host. The user interface can be implemented as an application or as a web page accessible through a browser on the host. The user can also select certain options or settings for the deduplication of the flash storage system. For example, a deduplication capacity percentage can be selected such that up to a threshold percentage of the total available space of the flash storage system can be deduplicated. Once the threshold percentage has been deduplicated, the flash storage system no longer deduplicates data unless the percentage falls below the threshold percentage. In addition, some data patterns can be stored normally without deduplication. For example, the user can designate specific data patterns that will not be selected for deduplication. These patterns can include common patterns (such as all zeros), which could otherwise create latency or significant management overhead. Deduplication can also be disabled by the user, which may require formatting the flash storage system.

Deduplication may be prioritized for cold data bands. The flash storage system can keep track of hot data bands, for example based on the frequency of data changes for the LBAs. Performing deduplication on cold data minimizes bookkeeping, such as updating the look-up table and the backtrack table. Because cold data is updated less often, master data units are updated less often, which can reduce write amplification. For example, when the GC selects a band, deduplication may not be performed if the band is considered hot.

[0057] Deduplication may increase read amplification, since additional reads may be needed to read the data for deduplicated entries. However, since fewer writes are needed and less storage space is used, the frequency of GC operations or other maintenance operations can be reduced, and the overall lifetime of the flash storage system can be increased. In addition, since deduplication is not performed during host write operations, deduplication does not add latency to host write operations. Since copying valid data requires writing data, reducing the amount of written data can increase the life of the corresponding flash memory device. Offline deduplication may also be easier to interrupt than online deduplication. For example, with offline deduplication, the data is already stored on the flash storage device, and only the mappings are updated. If a new read command is received during offline deduplication, the data can be read, since the data is either still stored as previously written, or the mappings have already been updated by deduplication. If a new write command is received during offline deduplication, the data is written to a new physical address as usual, rather than examining physical addresses for deduplication. During offline deduplication, GC commands may have lower priority than host write commands, providing greater performance benefits to the host. In addition, deduplication can optimize storage utilization because more free space may be available.
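The user-selectable behavior described above can be pictured as a small policy check that runs before data is deduplicated; a sketch is below. The knob names, the 15% default, and the all-zero exclusion are assumptions chosen to mirror the examples in the text, not actual settings of the flash storage system 110.

```python
from typing import Set

# Illustrative policy knobs a user might set through the host-side interface.
DEDUP_CAPACITY_THRESHOLD = 0.15                      # fraction of space allowed to be deduplicated
EXCLUDED_PATTERNS: Set[bytes] = {b"\x00" * 4096}     # data patterns never deduplicated

def may_deduplicate(unit: bytes,
                    deduplicated_fraction: float,
                    band_is_hot: bool) -> bool:
    """Skip deduplication when the capacity threshold has been reached, the data
    matches an excluded pattern, or the band holding it is considered hot."""
    if deduplicated_fraction >= DEDUP_CAPACITY_THRESHOLD:
        return False
    if unit in EXCLUDED_PATTERNS:
        return False
    if band_is_hot:
        return False
    return True

print(may_deduplicate(b"\x00" * 4096, deduplicated_fraction=0.05, band_is_hot=False))  # False
print(may_deduplicate(b"\xaa" * 4096, deduplicated_fraction=0.05, band_is_hot=False))  # True
```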
Those skilled in the art will appreciate that the various illustrative blocks, modules, elements, components, methods and algorithms described herein may be implemented in the form of electronic hardware, computer software, or a combination of the two. To illustrate this interchangeability of hardware and software, various illustrative blocks, modules, elements, components, methods and algorithms have been described above generally in terms of their functionality. Whether this functionality is implemented in hardware or software depends on the particular application and the design constraints imposed on the overall system. Those skilled in the art will be able to implement the described functionality in various ways for each particular application. Various components and blocks may be arranged differently (for example, arranged in a different order, or distributed in another way), all without departing from the scope of the subject technology.

It is understood that the specific order or hierarchy of the steps in the methods described is an illustration of exemplary approaches. Based on design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged. Some steps can be performed simultaneously. The accompanying method claims present elements of the different steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.

The foregoing description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications of these aspects will be readily apparent to those skilled in the art, and the general principles defined herein can be applied to other aspects. Thus, the claims are not intended to be limited to the aspects presented here, but must be given the full scope consistent with the claim language, where a reference to an element in the singular is not intended to mean "one and only one", unless specifically so indicated, but rather "one or more". Unless otherwise indicated, the term "some" means one or more. Masculine pronouns (for example, his) include the feminine and neuter gender (for example, her and its) and vice versa. Titles and headings, if any, are used for convenience only and do not limit the invention.

An expression such as an "aspect" does not mean that such an aspect is essential to the subject technology or that such an aspect applies to all configurations of the subject technology. A disclosure relating to an aspect may apply to all configurations, or to one or more configurations. An expression such as an aspect may refer to one or more aspects, and vice versa. An expression such as a "configuration" does not mean that such a configuration is essential to the subject technology or that such a configuration applies to all configurations of the subject technology. A disclosure relating to a configuration may apply to all configurations, or to one or more configurations. An expression such as a configuration may refer to one or more configurations, and vice versa. The word "exemplary" is used here to mean "serving as an example or illustration." Any aspect or design described herein as "exemplary" is not necessarily to be interpreted as preferred or advantageous over other aspects or designs.
All structural and functional equivalents of the elements of the various aspects described throughout this specification which are known or will later become known to those skilled in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Furthermore, nothing described herein is intended to be dedicated to the public, regardless of whether such disclosure is explicitly mentioned in the claims. No claim element shall be construed under the provisions of 35 USC §112, sixth paragraph, unless the element is expressly recited using the expression "means for" or, in the case of a method claim, unless the element is recited using the expression "step for". In addition, to the extent that the terms "include", "have" or the like are used in the description or the claims, these terms are intended to be inclusive in a manner similar to the term "comprise", as "comprise" is interpreted when used as a transitional word in a claim.
Claims:
Claims (8)
[0001]
CLAIMS: 1. A method of managing a flash storage system, comprising: reading a plurality of flash data units from a flash memory into a buffer, wherein each of the plurality of flash data units comprises one or more host data units; determining an identifier for each of the host data units read from the buffer; selecting a set of unique identifiers from the determined identifiers based on a number of host data units that share the respective unique identifiers; and for each unique identifier of the set of unique identifiers: designating a first host data unit sharing the unique identifier as a master data unit, wherein a logical address of the first host data unit is mapped with a first physical address in the flash memory in a look-up table; remapping, in the look-up table, the respective logical addresses of one or more second host data units sharing the unique identifier from respective second physical addresses in the flash memory with the first physical address in the flash memory; and invalidating the data stored at the respective second physical addresses in the flash memory.
[0002]
2. The method of claim 1, wherein the plurality of flash data units are read from the flash memory into the buffer in response to a system maintenance operation, and wherein the method further comprises performing the system maintenance operation after the data stored at the respective second physical addresses has been invalidated.
[0003]
3. The method of claim 1, wherein the method further comprises: filling a data structure based on the determined identifiers of the host data units, wherein the data structure comprises an entry for each unique identifier of the determined identifiers, and wherein each entry includes a count of host data units sharing the respective unique identifier and a logical address of each host data unit sharing the respective unique identifier, wherein the set of unique identifiers is selected by traversing the entries of the data structure and selecting a predetermined number of unique identifiers having the largest host data unit counts.
[0004]
4. The method of claim 3, wherein the largest host data unit counts are greater than a lower threshold count.
[0005]
5. The method of claim 3, wherein the largest host data unit counts exclude host data unit counts greater than an upper threshold count.
[0006]
6. The method of claim 3, wherein the data structure comprises a B-tree, and wherein each entry comprises a respective key of the B-tree.
[0007]
7. The method of claim 1, further comprising, for each unique identifier in the set of unique identifiers: storing, in a backtrack table, respective inverse mappings of the first physical address with the logical address of the first host data unit and the respective logical addresses of said one or more second host data units.
[0008]
8. The method of claim 7, further comprising: removing, from the look-up table, a mapping of a logical address of a target host data unit to be modified, the logical address of the target host data unit being mapped to the physical address of the master data unit; and removing, from the backtrack table, a reverse mapping of the physical address of the master data unit to the logical address of the target host data unit.

9. The method of claim 7, further comprising: copying data from a physical address of a target host data unit to a new physical address, wherein the target host data unit is to be modified and the target host data unit is designated as the master data unit; designating a host data unit of the new physical address as the master data unit; remapping, in the look-up table, the respective logical addresses of the second host data units with the new physical address; updating, in the backtrack table, the inverse mappings of the physical address of the target host data unit with the inverse mappings of the new physical address; and invalidating the data stored at the physical address of the target host data unit.

10. The method of claim 1, further comprising: for each logical address associated with a unique identifier of the set of unique identifiers, setting a deduplication flag in an entry corresponding to the logical address in the look-up table, wherein the deduplication flag indicates that the logical address is mapped to a respective unique physical address.

11. The method of claim 1, wherein the plurality of flash data units comprises respective flash data units from a plurality of flash memory devices of the flash storage system, and wherein the respective flash data units of the plurality of flash memory devices are selected in response to a system maintenance operation.

12. A flash storage system comprising: a plurality of flash memory devices; a memory comprising a buffer; and a controller configured to: read a plurality of flash data units from the plurality of flash memory devices into the buffer, wherein each of the plurality of flash data units includes one or more host data units; determine an identifier for each of the host data units read from the buffer; select a set of unique identifiers from the determined identifiers based on a number of host data units that share the respective unique identifiers; and for each unique identifier of the set of unique identifiers: designate a first host data unit sharing the unique identifier as the master data unit, wherein a logical address of the first host data unit is mapped with a first physical address in the flash memory device in a look-up table; remap, in the look-up table, the respective logical addresses of one or more second host data units sharing the unique identifier from respective second physical addresses in the flash memory device with the first physical address in the flash memory device; and invalidate the data stored at the respective second physical addresses in the flash memory device.

13. The flash storage system of claim 12, wherein the controller is configured to read the plurality of flash data units from the plurality of flash memory devices into the buffer in response to a system maintenance operation, and wherein the controller is further configured to perform the system maintenance operation after the data stored at the respective second physical addresses has been invalidated.
14. The flash storage system of claim 12, wherein the controller is further configured to: fill a data structure in the memory based on the determined identifiers of the host data units, wherein the data structure comprises an entry for each unique identifier of the determined identifiers, and wherein each entry comprises a count of host data units sharing the respective unique identifier and a logical address of each host data unit sharing the respective unique identifier, wherein the set of unique identifiers is selected by traversing the entries of the data structure and selecting a predetermined number of unique identifiers having the largest host data unit counts.

15. The flash storage system of claim 14, wherein the largest host data unit counts are greater than a lower threshold count.

16. The flash storage system of claim 14, wherein the largest host data unit counts exclude host data unit counts greater than an upper threshold count.

17. The flash storage system of claim 14, wherein the data structure comprises a B-tree, and wherein each entry comprises a respective key of the B-tree.

18. The flash storage system of claim 12, wherein the controller is further configured to, for each unique identifier of the set of unique identifiers: store, in a backtrack table, the respective inverse mappings of the first physical address with the logical address of the first host data unit and the respective logical addresses of said one or more second host data units.

19. The flash storage system of claim 18, wherein the controller is further configured to: delete, from the look-up table, a mapping of a logical address of a target host data unit to be modified, the logical address of the target host data unit having been mapped to the physical address of the master data unit; and remove, from the backtrack table, a reverse mapping of the physical address of the master data unit to the logical address of the target host data unit.

20. The flash storage system of claim 18, wherein the controller is further configured to: copy data from a physical address of a target host data unit to a new physical address, wherein the target host data unit is to be modified and the target host data unit is designated as the master data unit; designate a host data unit of the new physical address as the master data unit; remap, in the look-up table, the respective logical addresses of the second host data units with the new physical address; update, in the backtrack table, the inverse mappings of the physical address of the target host data unit with the inverse mappings of the new physical address; and invalidate the data stored at the physical address of the target host data unit.

21. The flash storage system of claim 12, wherein the controller is further configured to: for each logical address associated with a unique identifier of the set of unique identifiers, set a deduplication flag in an entry corresponding to the logical address in the look-up table, wherein the deduplication flag indicates that the logical address is not mapped to a single respective physical address.

22. The flash storage system of claim 12, wherein the plurality of flash data units comprises respective flash data units of the plurality of flash memory devices, and wherein the respective flash data units of the plurality of flash memory devices are selected in response to a system maintenance operation.
23. A non-transitory machine-readable medium comprising instructions stored thereon which, when executed by a machine, cause the machine to perform operations comprising: reading a plurality of flash data units from a flash memory into a buffer, wherein each of the plurality of flash data units comprises one or more host data units; determining an identifier for each of the host data units read from the buffer; selecting a set of unique identifiers from the determined identifiers based on a number of host data units that share the respective unique identifiers; and for each unique identifier of the set of unique identifiers: designating a first host data unit sharing the unique identifier as the master data unit, wherein a logical address of the first host data unit is mapped with a first physical address in the flash memory in a look-up table; remapping, in the look-up table, the respective logical addresses of one or more second host data units sharing the unique identifier from respective second physical addresses in the flash memory with the first physical address in the flash memory; and invalidating the data stored at the respective second physical addresses in the flash memory.

24. The non-transitory machine-readable medium of claim 23, wherein the instructions further cause the machine to perform operations comprising: filling a data structure based on the determined identifiers of the host data units, wherein the data structure comprises an entry for each unique identifier of the determined identifiers, and wherein each entry comprises a count of host data units sharing the respective unique identifier and a logical address of each host data unit sharing the respective unique identifier, wherein the set of unique identifiers is selected by traversing the entries of the data structure and selecting a predetermined number of unique identifiers having the largest host data unit counts.

25. The non-transitory machine-readable medium of claim 23, wherein the plurality of flash data units are read from the flash memory into the buffer in response to a garbage collection operation.