Processor, method and register rename unit
Abstract:
ZERO CYCLE LOAD. The present invention relates to a system and method for reducing the latency of load operations. A register rename unit within a processor determines whether a decoded load instruction is eligible for conversion to a zero cycle load operation. If so, control logic assigns a physical register identifier associated with a source operand of an older dependent store instruction to the destination operand of the load instruction. In addition, the register rename unit marks the load instruction to prevent it from reading the data associated with the source operand of the store instruction from memory. Due to the duplicate renaming, this data can be forwarded from a physical register file to instructions that are younger than and dependent on the load instruction.
Publication number: BR102013014996B1
Application number: R102013014996-9
Filing date: 2013-06-14
Publication date: 2020-12-08
Inventors: Gerard R. Williams III; John H. Mylius; Conrado Blasco-Allue
Applicant: Apple Inc;
Main IPC class:
Description:
Background of the Invention
Field of the Invention
[0001] This invention relates to microprocessors and, more particularly, to efficiently reducing the latency and energy of load operations.
Description of the Relevant Art
[0002] Microprocessors typically include overlapping pipeline stages and out-of-order execution of instructions. Additionally, microprocessors can support simultaneous multithreading to increase throughput. These techniques take advantage of instruction-level parallelism (ILP) in the source code. During each clock cycle, a microprocessor ideally produces useful execution of a maximum number of N instructions per thread for each stage of a pipeline, where N is an integer greater than one. However, control dependencies and data dependencies reduce the maximum throughput of the microprocessor to below N instructions per cycle.
[0003] Speculative execution of instructions is used to perform parallel execution of instructions despite control dependencies in the source code. A data dependency occurs when an operand of an instruction depends on a result of an older instruction in program order. Data dependencies can appear either between operands of subsequent instructions in a straight-line code segment or between operands of instructions that belong to subsequent loop iterations. In straight-line code, read after write (RAW), write after read (WAR) or write after write (WAW) dependencies can be encountered. Register renaming is used to allow parallel execution of instructions despite WAR and WAW dependencies. However, the true dependency, or RAW dependency, remains intact. Therefore, architectural registers used repeatedly as a destination register and subsequently as a source register cause serialization of instruction execution for the associated source code segments.
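The three dependency types named in paragraph [0003] can be illustrated with a minimal Python sketch. The `Instr` record and the `hazards` helper are hypothetical names introduced here for illustration only; they are not part of the described hardware, and the register names are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, several sources."""
    dest: str      # destination architectural register
    srcs: tuple    # source architectural registers


def hazards(older: Instr, younger: Instr) -> set:
    """Return the dependency types the younger instruction has on the older one."""
    found = set()
    if older.dest in younger.srcs:
        found.add("RAW")   # younger reads what older writes (true dependency)
    if younger.dest in older.srcs:
        found.add("WAR")   # younger overwrites a register that older reads
    if younger.dest == older.dest:
        found.add("WAW")   # both write the same register
    return found


# Illustrative instructions (register choices are assumptions, not from the text).
add1 = Instr("r7", ("r3", "r5"))   # ADD r7, r3, r5
sub = Instr("r2", ("r6", "r7"))    # SUB r2, r6, r7   reads r7
mov = Instr("r3", ("r9",))         # MOV r3, r9       overwrites r3
add2 = Instr("r7", ("r1", "r4"))   # ADD r7, r1, r4   overwrites r7
```

Renaming the destination of `mov` or `add2` to a fresh physical register removes the WAR and WAW cases, whereas the RAW case between `add1` and `sub` remains, which is the situation the rest of the description addresses for loads.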
[0004] One example of a common RAW dependency on an architectural register is a load instruction, or read operation, that attempts to read a memory location that has been modified by an older (in program order) store instruction that has not yet committed its results to the memory location. This type of RAW dependency can occur frequently during program execution. Reading the memory location can include appreciable latency and reduce processor throughput.
[0005] In view of the above, efficient methods and mechanisms for reducing the latency of load operations are desired.
Summary of Embodiments
[0006] Systems and methods for efficiently reducing the latency of load operations are described. In one embodiment, a processor includes a register rename unit that receives decoded instructions and determines whether a given decoded instruction qualifies to be a zero cycle load operation. An example of a qualifier is a predicted memory dependence of a given load instruction on a given store instruction. In addition, a qualifier can include detecting that available support exists for maintaining a duplicate count of mappings for a given physical register number. If the determination is true, the register rename unit can assign a physical register number associated with a source operand of the given store instruction to the destination operand of the given load instruction.
[0007] Also, control logic in the register rename unit can mark the given load instruction to prevent it from reading data associated with the source operand of the store instruction from memory. Due to the duplicate renaming, this data can be forwarded from a physical register file to instructions that are younger (in program order) than and dependent on the given load instruction. At a later pipeline stage, the predicted memory dependence can be verified.
If the memory dependence is correct, then the given load instruction can be considered complete without reading operand data from memory (the data cache) or from the store buffer. If the memory dependence is incorrect, then the given load instruction and younger (in program order) instructions can be flushed from the pipeline and replayed.
[0008] These and other embodiments will be further appreciated upon reference to the following description and drawings.
Brief Description of the Drawings
[0009] Figure 1 is a generalized block diagram of one embodiment of a computer system.
[00010] Figure 2 is a generalized block diagram of one embodiment of a processor core that performs superscalar, out-of-order execution with zero cycle load operations.
[00011] Figure 3 is a generalized flowchart of one embodiment of a method for detecting zero cycle load operations.
[00012] Figure 4 is a generalized flowchart of one embodiment of a method for processing zero cycle load operations.
[00013] Figure 5 is a generalized flowchart of one embodiment of a method for committing instructions that include zero cycle load operations.
[00014] Although the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and are described in detail herein. It should be understood, however, that the drawings and the detailed description are not intended to limit the invention to the particular form disclosed; on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims. As used throughout this application, the word "can" is used in a permissive sense (that is, meaning having the potential to), rather than the mandatory sense (that is, meaning must).
Similarly, the words "include," "including," and "includes" mean including, but not limited to.
[00015] Various units, circuits or other components can be described as "configured to" perform a task or tasks. In such contexts, "configured to" is a broad recitation of structure that generally means "having circuitry that" performs the task or tasks during operation. As such, the unit/circuit/component can be configured to perform the task even when the unit/circuit/component is not currently powered on. In general, the circuitry that forms the structure corresponding to "configured to" may include hardware circuits. Similarly, various units/circuits/components can be described as performing a task or tasks, for convenience in the description. Such descriptions should be interpreted as including the phrase "configured to". Reciting a unit/circuit/component that is configured to perform one or more tasks is expressly intended not to invoke the interpretation of 35 U.S.C. § 112, paragraph six for that unit/circuit/component.
Detailed Description
[00016] In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. However, one having ordinary skill in the art should recognize that the invention can be practiced without these specific details. In some instances, well-known circuits, structures and techniques have not been shown in detail to avoid obscuring the present invention.
[00017] Referring to Figure 1, a generalized block diagram of one embodiment of a computer system 100 is shown. As shown, microprocessor 110 can be connected to one or more peripheral devices 150a to 150b and to external computer memory, such as disk memory 130 and dynamic random access memory (DRAM) 140. Disk memory 130 can store an operating system (OS) for computer system 100. Instructions of a software application can be loaded into cache memory subsystem 116 within microprocessor 110.
The software application may have been stored in one or more of disk memory 130, DRAM 140 and one of the peripheral devices 150a to 150b.
[00018] Although a single processor core is shown, microprocessor 110 may include multiple processor cores. Each processor core can be connected to an associated cache memory subsystem. In addition, each processor core can share another cache memory subsystem. For example, each of the multiple processor cores can use an associated level one (L1) cache and level two (L2) cache and additionally share a level three (L3) cache with the other processor cores. As shown, processor core 112 can load the software application instructions from cache memory subsystem 116 and process the instructions. Generally speaking, when software programmers write applications to perform work according to an algorithm or a method, the programmers use variables to reference temporary and result data. This data uses space allocated in computer memory. The operating system allocates regions of memory for the software application.
[00019] Processor core 112 can include multiple physical registers 114 within a physical register file. Physical registers 114 may include architecturally visible registers that a software programmer and/or a compiler can identify within the software application. In addition, physical registers 114 may include non-architectural (speculative) registers identified by renamed register identifiers. The architecturally visible registers are associated with a given instruction set architecture (ISA). During processing of the application, data can be loaded from the allocated regions of memory into cache memory subsystem 116. One or more of the physical registers 114 can be used to load and store the temporary and result data. The hardware in processor core 112 includes circuitry for processing instructions according to the given ISA.
The hardware circuitry includes at least an associated set of architectural registers, functional units, pipeline staging elements and control logic. The ARM instruction set architecture can be selected for the given ISA. Alternatively, Alpha, PowerPC, SPARC, MIPS, x86 or any other ISA can be selected.
[00020] The given ISA can be used to select a manner for declaring and allocating regions of memory. The given ISA can further determine a selected addressing mode used to transfer data between microprocessor 110, including physical registers 114, and memory locations in one or more of disk memory 130, DRAM 140 and peripheral devices 150a to 150b. Both store and load instructions are typically used to transfer data between memory and microprocessor 110 and between cache memory subsystem 116 and physical registers 114. The dashed lines shown in computer system 100 indicate examples of the data transfers performed by store and load operations. Appreciable latency can be associated with each of these data transfers.
[00021] In addition to out-of-order issue of instructions to execution units within a superscalar microarchitecture, processor core 112 can perform register renaming to increase throughput. Using hardware, processor core 112 dynamically renames an architectural register identifier used for a destination operand. Source operands with the same architectural register identifier as the destination operand can be renamed with the same renamed register identifier used for the destination operand.
[00022] In one embodiment, processor core 112 includes control logic that detects store instructions in an early pipeline stage and buffers at least the associated address operand identifiers. The early pipeline stages can process instructions in order, whereas instructions can be issued and executed out of order in later pipeline stages.
The address operands of a given store instruction are used in a later pipeline stage to generate the store address. The address operands can include an architectural register identifier (ID) used as a base register ID and an immediate value used as an offset.
[00023] In the early pipeline stage, control logic in processor core 112 can monitor subsequent instructions to determine whether one or more of these instructions modify one or more address operands of the given store instruction. For example, the address operand ID can match the destination operand ID of one or more subsequent instructions. This monitoring can occur in a pipeline stage prior to the out-of-order issue and execution pipeline stages.
[00024] The control logic can continue to monitor subsequent instructions for the given store instruction until a corresponding entry in the physical register file is deallocated. This entry can be deallocated in response to an associated renamed register identifier being removed from a mapping table and returned to a free list. For example, an instruction subsequent (in program order) to the store instruction may have a destination operand identifier (destination architectural register identifier) equal to the source operand identifier (source architectural register identifier) of the given store instruction. When the subsequent instruction commits, the renamed register identifier that was previously assigned to the destination operand of the subsequent instruction is placed on the free list for reuse by another instruction. This freed renamed register identifier is the same identifier used for the source operand of the given store instruction. In response to detecting the above condition, the monitoring of the given store instruction may end. Further details and an example are provided shortly.
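The monitoring of store address operands described in paragraphs [00022] to [00024] can be sketched in Python as a small table of buffered stores. This is only an illustration under stated assumptions: `StoreEntry`, `StoreTracker` and their methods are hypothetical names, the table is unbounded, and the end of monitoring on physical register deallocation is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    """Hypothetical buffered record for one store instruction."""
    base_reg: str       # architectural base register of the store address
    offset: int         # immediate offset of the store address
    src_reg: str        # architectural register holding the store data
    valid: bool = True  # cleared once an address operand is modified


class StoreTracker:
    """Assumed in-order, early-pipeline view of recent stores."""

    def __init__(self):
        self.entries = []   # program order, oldest first

    def record_store(self, base_reg, offset, src_reg):
        self.entries.append(StoreEntry(base_reg, offset, src_reg))

    def observe_write(self, dest_reg):
        """A later instruction writes dest_reg: any tracked store that uses
        it as a base register no longer has trustworthy address operands."""
        for entry in self.entries:
            if entry.base_reg == dest_reg:
                entry.valid = False

    def match_load(self, base_reg, offset) -> Optional[StoreEntry]:
        """Return the youngest store with the same address operands,
        or None if there is none or its operands were modified."""
        for entry in reversed(self.entries):
            if (entry.base_reg, entry.offset) == (base_reg, offset):
                return entry if entry.valid else None
        return None


tracker = StoreTracker()
tracker.record_store("r10", 4, "r7")    # STORE [r10 + 4], r7
tracker.observe_write("r12")            # MOV r12, r16 leaves r10 untouched
hit = tracker.match_load("r10", 4)      # LOAD r14, [r10 + 4]
```

Here `hit.src_reg` names the store's data register, which is the register whose renamed identifier a qualifying load would share.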
[00025] In one embodiment, during the monitoring, the control logic can determine that a subsequent load operation has address operands matching those of the given store instruction and that these address operands have not been modified by intervening instructions. In addition, the control logic can determine that no other store instruction with the same address operands is located between the given store instruction and the load instruction. In other words, the given store instruction is the youngest store instruction with those address operands. In response to that determination, an indication of a RAW dependency between the load instruction and the given store instruction can be set or asserted. In another embodiment, additional information, such as at least instruction tags and program counter values, can be compared or used to index prediction tables. The control logic can predict at this early pipeline stage that a RAW dependency exists between the load instruction and the given store instruction. A determination or prediction of this RAW dependency can occur in the same pipeline stage as register renaming. Alternatively, the determination or prediction can occur in a pipeline stage earlier than the stage used for register renaming. An example of this RAW dependency is provided here:
    ADD   r7, r3, r5
    STORE [r10 + 4], r7      // Address operands are r10 and 4.
    MOV   r12, r16
    LOAD  r14, [r10 + 4]     // Address operands are r10 and 4.
    SUB   r2, r6, r14        // For r14, use data forwarded from the
                             // source operand of the store operation, r7.
    ADD   r11, r14, r13      // For r14, use data forwarded from the
                             // source operand of the store operation, r7.
[00026] In this example, a destination operand is listed first after an instruction mnemonic, followed by one or more source operands. Registers use the general nomenclature of "r" followed by a register identifier. For example, register 7 is denoted by "r7".
The instructions in the example above are meant to be a pseudocode example and language agnostic. As can be seen above, the load instruction has the same address operands as the store instruction. No intervening instructions modify the address operand (r10). Thus, the control logic can determine that the load instruction has a RAW dependency on the store instruction. In other embodiments, prediction qualifiers can be used, such as comparing instruction tags, which are not shown for ease of illustration.
[00027] In response to determining or predicting the RAW dependency, the destination operand identifier (ID) (r14) of the load instruction can be renamed to the same renamed register identifier used for the source operand ID (r7) of the store instruction. For example, if the source operand ID (r7) of the store instruction is renamed to a renamed register identifier P44, then the destination operand ID (r14) of the load instruction can be renamed to the same identifier (P44). Similarly, the source operand ID r14 of each of the subtract instruction and the last add instruction can be renamed to the same renamed register identifier (P44).
[00028] Control logic within processor core 112 can issue the load instruction and subsequent instructions out of order. In this case, each of the subtract instruction and the last add instruction can be issued before, during or shortly after the load instruction, even though the load instruction has not completed. If the source operands for register identifiers r6 and r13 are available, the subtract instruction and the last add instruction can be issued before the load instruction is issued, let alone completed. These instructions can be issued with data forwarded from the source architectural register ID r7, which is the source operand ID of the store instruction.
With the use of register renaming, the data to be forwarded can be stored in the physical register identified by the renamed register identifier P44, which is associated with the source architectural register ID r7. Therefore, the load instruction becomes a zero cycle operation. The load instruction can complete without accessing memory, such as a multi-level on-chip cache hierarchy and off-chip memory.
[00029] If the above steps are performed and the load instruction is converted to a zero cycle operation, then instruction throughput for the pipeline can increase. Instruction throughput can increase because instructions that are younger (in program order) than and dependent on the load instruction do not wait for data to be retrieved for the load instruction from a data cache, a store buffer or off-chip memory. Rather, these younger, dependent instructions can receive the data from the physical register file. Before continuing with further details regarding the conversion of load instructions to zero cycle load operations, a further description of the components in computer system 100 is provided.
[00030] In addition to including one or more processor cores connected to corresponding cache memory subsystems, microprocessor 110 may also include interface logic 118 and a memory controller 120. Other logic and inter-block and intra-block communication are not shown for ease of illustration. The illustrated functionality of microprocessor 110 can be incorporated in a single integrated circuit. In another embodiment, the illustrated functionality is incorporated in a chipset on a computer motherboard. In some embodiments, microprocessor 110 can be included in a desktop computer or a server. In yet another embodiment, the illustrated functionality is incorporated in a semiconductor die with other processor dies on a system-on-a-chip (SOC).
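The duplicate renaming walked through in paragraphs [00027] to [00029] can be traced with a toy rename map. The sketch below is illustrative only: the `Renamer` class, its method names and the contents of the free list are assumptions chosen so that r7 happens to receive P44 as in the example; duplicate-count tracking and misprediction recovery are left out.

```python
from collections import deque


class Renamer:
    """Toy rename map: architectural register name -> physical register name."""

    def __init__(self, free_regs):
        self.free_list = deque(free_regs)   # hypothetical free list contents
        self.mapping = {}

    def rename_dest(self, arch_reg):
        """Ordinary renaming: give the destination a fresh physical register."""
        phys = self.free_list.popleft()
        self.mapping[arch_reg] = phys
        return phys

    def rename_src(self, arch_reg):
        """Sources read whatever physical register the map currently names."""
        return self.mapping[arch_reg]

    def rename_zero_cycle_load(self, load_dest, store_src):
        """Zero cycle load: reuse the physical register of the store's
        source operand instead of allocating a new one for the load."""
        phys = self.mapping[store_src]
        self.mapping[load_dest] = phys
        return phys


renamer = Renamer(["P44", "P45", "P46", "P47"])
renamer.rename_dest("r7")                                 # ADD   r7, r3, r5
store_data = renamer.rename_src("r7")                     # STORE [r10 + 4], r7
renamer.rename_dest("r12")                                # MOV   r12, r16
load_dest = renamer.rename_zero_cycle_load("r14", "r7")   # LOAD  r14, [r10 + 4]
sub_src = renamer.rename_src("r14")                       # SUB   r2, r6, r14
renamer.rename_dest("r2")
add_src = renamer.rename_src("r14")                       # ADD   r11, r14, r13
```

Because r14 and r7 both map to P44, the subtract and the last add read the store data straight from the physical register file, and the load itself consumes no new physical register.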
[00031] Processor core 112 may include circuitry for executing instructions according to a given ISA as described earlier. In one embodiment, processor core 112 may include a superscalar, multithreaded microarchitecture used for processing instructions of a given ISA. In some embodiments, the processor core is a general-purpose processor core. In various other embodiments, microprocessor 110 may include one or more special-purpose cores, such as a digital signal processor (DSP), a graphics processing unit (GPU), an application specific integrated circuit (ASIC), and so forth.
[00032] Cache memory subsystem 116 can reduce memory latencies for processor core 112. A reduced miss rate achieved by the additional memory provided by cache memory subsystem 116 helps hide the latency gap between processor core 112 and off-chip memory. Although the latency between processor core 112 and cache memory subsystem 116 is less than the latency to off-chip memory, this latency can be reduced further if a load instruction is converted to a zero cycle load operation as described earlier.
[00033] If a cache miss occurs, such as when a requested block is not found in cache memory subsystem 116, then a read request can be generated and transmitted to memory controller 120. Memory controller 120 can translate an address corresponding to the requested block and send a read request to the volatile off-chip DRAM 140 through memory bus 122. Memory controller 120 may include control circuitry for interfacing to the memory channels and following a corresponding protocol. In addition, memory controller 120 may include request queues for queuing memory requests. Off-chip DRAM 140 can be filled with data from off-chip disk memory 130. Off-chip disk memory 130 can provide non-volatile, random access secondary storage of data. In one embodiment, off-chip disk memory 130 may include one or more hard disk drives (HDDs).
In another embodiment, off-chip disk memory 130 uses a Solid-State Disk (SSD).
[00034] Although only two peripheral devices are shown in computer system 100 for illustrative purposes, another number of peripheral devices can be connected to microprocessor 110. One or more of the peripheral devices 150a to 150b can be a display, including a modern television, a computer monitor, a laptop or mobile device monitor, and so forth. A video graphics subsystem can be used between the display and microprocessor 110. One or more of the peripheral devices 150a to 150b can be a typically used input/output device such as a keyboard, mouse, printer, modem, and so forth.
[00035] Referring now to Figure 2, a generalized block diagram illustrating one embodiment of a processor core 200 that performs superscalar, out-of-order execution with zero cycle load operations is shown. Processor core 200 may use a multi-stage pipeline for processing instructions. Although functional and control blocks are shown in a particular order and in a particular pipeline stage, other combinations are possible and contemplated. In addition, the functional and control blocks can occupy more than one pipeline stage. In most cases, a single pipeline stage is shown for each functional block for ease of illustration.
[00036] An instruction cache (i-cache) 204 can store instructions for a software application. One or more instructions indicated by an address conveyed by address select logic 202 can be fetched from i-cache 204. Multiple instructions can be fetched from i-cache 204 per clock cycle if there are no i-cache misses. The address can be incremented by a next fetch predictor 206. A branch direction predictor 208 can be coupled to each of the next fetch predictor 206 and the control flow evaluation logic 212 in a later pipeline stage. Predictor 208 can predict information of instructions that change the flow of an instruction stream from executing a next sequential instruction.
[00037] Decode unit 210 decodes the opcodes of the multiple fetched instructions. Alternatively, the instructions can be divided into micro-operations. As used herein, the terms "instructions", "micro-operations" and "operations" are interchangeable, as the invention can be used with an architecture that uses either implementation. In one embodiment, control flow evaluation block 212 can alter the fetching of instructions in address selector 202. For example, an absolute address value associated with an unconditional branch opcode can be sent to address selector 202.
[00038] Rename intra-group dependency detection logic 214 can find dependencies among instructions decoded by decode unit 210. An intra-group of instructions can include instructions decoded in one or more clock cycles, or pipeline stages. Dependencies such as write after read (WAR), write after write (WAW) and read after write (RAW) can be detected. Dependency vectors that indicate dependencies between instructions can be generated.
[00039] The dependency detection logic can include a memory dependence (MD) detector 216. In some embodiments, MD detector 216 can determine store-to-load (STL) memory dependencies. In these embodiments, a table can be used to store a base register ID and an immediate value (offset value) used as address operands of a given store instruction. In addition, a source operand register ID of the store instruction can be stored. For younger (in program order) instructions, destination operand register IDs, address operand register IDs and immediate values can be compared to values stored in the table. MD detector 216 can indicate that an STL memory dependency exists between a younger load instruction and the given store instruction in response to determining that certain conditions are satisfied.
One condition may be that the younger load instruction has an address operand register ID and an address operand immediate value that match the address operand values of the store instruction. A second condition may be that no intervening instruction is determined to modify the values stored in the table for the given store instruction. A third condition may be that the store instruction is determined to be the youngest store instruction that is older than the load instruction and has the matching address operands. MD detector 216 can store an indication that a RAW dependency exists between the load instruction and the given store instruction.
[00040] Additionally, MD detector 216 can send an indication to register rename unit 220 to rename the destination operand register ID of the load instruction with the same renamed register identifier used for the source operand register ID of the given store instruction. In other embodiments, a compiler can analyze code and perform the steps described above. If the compiler determines that a RAW dependency exists between the younger load instruction and the given store instruction, then the compiler can insert an indication in the program code to be detected by at least MD detector 216. The information can include an asserted bit and the source operand register ID of the given store instruction. Alternatively, the information may include an asserted bit and a program counter (PC) offset used to identify the given store instruction. Other information can be used.
[00041] In yet other embodiments, MD detector 216 may include a predictor for STL memory dependencies. In such embodiments, MD detector 216 can index one or more tables with at least program counter (PC) values associated with store and load instructions. Partial address tags and other instruction identifying information can also be used to index the tables.
An output of a hash function can be used to index prediction tables, which store saturating counters or other prediction information. In some embodiments, MD detector 216 can determine whether these address operands have not been modified by intervening instructions. In other embodiments, MD detector 216 can allow correction logic, such as the saturating counters, to account for mispredictions. The information read from the tables can be used to identify speculative dependencies. In response to determining a speculative RAW memory dependency, MD detector 216 can store an indication that a RAW dependency exists between a given store instruction and a given subsequent load instruction. In addition, MD detector 216 can send an indication to register rename unit 220 to rename the destination operand register ID of the load instruction with the same renamed register identifier used for the source operand register ID of the given store instruction. In further embodiments, a combination of the above methods and mechanisms for finding STL memory dependencies can be used.
[00042] Mapper 218 can divide instructions among distributed hardware resources using factors such as available concurrency, criticality of dependence chains and communication penalties. When the hardware renames an architectural register identifier with a physical register identifier, the hardware stores the mapping in mapper 218, which can be a data structure, such as a mapping table. As used herein, an identifier for either an architectural register or a physical register can also be referred to as a number. Therefore, an architectural register identifier can be referred to as an architectural register number. Similarly, a physical register identifier can be referred to as a physical register number. The physical register number used to rename an architectural register number can also be referred to as a rename register number.
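The predictor variant of paragraph [00041] can be sketched as a table of saturating counters indexed by a hash of the load and store program counter values. Everything concrete here is an assumption made for illustration: the class name `StlPredictor`, the XOR-based hash, the table size, the 2-bit counter width and the prediction threshold are not specified by the description.

```python
class StlPredictor:
    """Hypothetical store-to-load dependence predictor using saturating counters."""

    def __init__(self, index_bits=6, counter_bits=2):
        self.size = 1 << index_bits
        self.max_count = (1 << counter_bits) - 1     # 3 for 2-bit counters
        self.threshold = (self.max_count + 1) // 2   # predict dependent at >= 2
        self.table = [0] * self.size

    def _index(self, load_pc, store_pc):
        # Assumed hash: XOR of the word-aligned PCs, folded into the table size.
        return ((load_pc >> 2) ^ (store_pc >> 2)) % self.size

    def predict(self, load_pc, store_pc):
        """True if the load is predicted to depend on the store."""
        return self.table[self._index(load_pc, store_pc)] >= self.threshold

    def update(self, load_pc, store_pc, was_dependent):
        """Train with the outcome verified in a later pipeline stage."""
        i = self._index(load_pc, store_pc)
        if was_dependent:
            self.table[i] = min(self.max_count, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


predictor = StlPredictor()
LOAD_PC, STORE_PC = 0x4010, 0x4008      # made-up addresses for the example
before = predictor.predict(LOAD_PC, STORE_PC)
for _ in range(2):
    predictor.update(LOAD_PC, STORE_PC, was_dependent=True)
after = predictor.predict(LOAD_PC, STORE_PC)
```

A saturating counter tolerates an occasional misprediction without immediately flipping the prediction, which is one way the correction logic mentioned above could account for mispredictions.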
[00043] Register rename unit 220 may include rename control logic and array 222 and a register duplicate array (RDA) 224. Register rename unit 220 can determine which physical register identifiers to use to rename the architectural register identifiers used in both destination and source operands within instructions. The register rename unit can select candidate physical register identifiers from the free list allocator 230 or a rename mapping table within rename control logic 222. In various embodiments, RDA 224 is configured to store an indication of duplicate mappings. The duplicate mappings can be used during conversion of a load operation to a zero cycle load operation.
[00044] Register rename unit 220 may receive an indication from MD detector 216 that a load instruction qualifies to be converted to a zero cycle load operation. Register rename unit 220 may assign the destination operand register ID of the load instruction to the same rename register identifier used for the source operand register ID of the store instruction on which the load operation depends. Mapper 218 can store the multiple mappings for the rename register identifier. In addition, RDA 224 can store a duplicate count for the rename register identifier. For example, in the earlier code example, the rename register identifier P44 can be used for both the source operand register ID (r7) of the store instruction and the destination operand register ID (r14) of the load instruction. This duplicate count can include the number of times any given architectural register identifier has been mapped to the same rename register identifier.
[00045] In various embodiments, the duplicate count may not be incremented for a mapping when a particular architectural register is already mapped to the rename register number at the time of the mapping. RDA 224 can store both the rename register number and the associated duplicate count.
In one embodiment, the RDA can be implemented as a relatively small, tagged, fully associative structure. RDA 224 can have any number of entries for storing a rename register number and an associated duplicate count. In one example, an implementation of an ISA can include 144 physical register numbers, so an 8-bit physical register index can be both stored in an entry of the RDA and used to access the RDA. In one embodiment, each duplicate count size is 5 bits. Therefore, the maximum number of duplications for a given physical register number is 31. However, another duplicate count size is possible and can be chosen.
[00046] RDA 224 can be updated before instruction dispatch in the processor pipeline. When MD detector 216 determines that a decoded load instruction is a zero cycle load operation, RDA 224 can be accessed to determine whether an entry already exists for the physical register number to be used to rename each of the source operand register ID of an associated store instruction and the destination operand register ID of the load instruction. If an entry exists, then the associated duplicate count can be incremented each time any given architectural register ID currently not mapped to the given rename register number is mapped to the given rename register number. If an entry does not already exist in the RDA, then an entry can be allocated and the associated duplicate count can be initialized to two.
[00047] RDA 224 can also be updated during a commit pipeline stage in the processor pipeline. The duplicate count can be decremented each time the physical register identifier is ready to return to the free list for any given architectural register during an instruction commit. The physical register identifier can also be referred to as the rename register identifier.
A physical register identifier may be a candidate to return to the free list in response to an entry in the mapping table associated with the physical register identifier being removed or invalidated due to instruction commit. In one embodiment, in response to the duplicate count decreasing to one, the duplicate count and the duplicate mappings may no longer be stored.
[00048] In one embodiment, in response to a given rename register identifier being a candidate to return to the free list during an associated instruction commit, and no associated duplicate information being stored in RDA 224, the rename register identifier is returned to the free list. In another embodiment, in response to a given rename register identifier being a candidate to return to the free list and the duplicate count stored in RDA 224 being decremented from one to zero, the rename register identifier is returned to the free list.
[00049] In one embodiment, in response to a given rename register identifier being a candidate to return to the free list and the stored duplicate count still being greater than one after being decremented, the rename register identifier is not returned to the free list. In this last case, the rename register identifier still has duplicate mappings to multiple architecture registers. In one embodiment, RDA 224 is checked for each potential zero cycle load candidate to ensure that there is a free entry for tracking a duplicate. If there is no free entry for allocation within RDA 224, then the corresponding load instruction is not converted into a zero cycle load operation. Similarly, if an allocated entry exists for the zero cycle load candidate, but the duplicate count is already saturated, then the load instruction is not converted to a zero cycle load operation.
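The duplicate-count bookkeeping described for RDA 224 can be sketched in a few lines of Python. This is a minimal illustration rather than the disclosed hardware: the class name `RDA`, its method names, the `num_entries` capacity of 4 and the use of a dictionary in place of a tagged associative structure are all assumptions made for the example. It follows the embodiment in which an entry is dropped once its count falls to one.

```python
from dataclasses import dataclass, field

COUNT_BITS = 5                      # duplicate count width given in the text
MAX_COUNT = (1 << COUNT_BITS) - 1   # saturates at 31


@dataclass
class RDA:
    """Toy register duplication array keyed by physical register number."""
    num_entries: int = 4                         # hypothetical capacity
    counts: dict = field(default_factory=dict)   # physical register -> count
    free_list: list = field(default_factory=list)

    def can_duplicate(self, preg: int) -> bool:
        """Check made before converting a load to a zero cycle load."""
        if preg in self.counts:
            return self.counts[preg] < MAX_COUNT    # entry exists, not saturated
        return len(self.counts) < self.num_entries  # a free entry is required

    def add_mapping(self, preg: int) -> None:
        """Rename stage: one more architecture register now maps to preg."""
        if not self.can_duplicate(preg):
            raise RuntimeError("no duplication resources; rename from free list")
        if preg in self.counts:
            self.counts[preg] += 1
        else:
            self.counts[preg] = 2   # store source plus load destination

    def release_mapping(self, preg: int) -> bool:
        """Commit stage: one mapping to preg was removed from the mapping table.

        Returns True when preg was returned to the free list.
        """
        if preg not in self.counts:       # no duplicate information stored
            self.free_list.append(preg)
            return True
        self.counts[preg] -= 1
        if self.counts[preg] <= 1:        # one mapping left: stop tracking
            del self.counts[preg]
        return False
```

With P44 shared by r7 and r14, `add_mapping(44)` allocates an entry with a count of two; the first `release_mapping(44)` drops the entry without freeing the register, and the second returns P44 to the free list.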
[00050] For a zero cycle load operation, the data contents can be forwarded from the physical register within the physical register file that stores the data for the source operand of the store instruction to the subsequent load instruction and to other newer, dependent instructions. The data may not be read from a data cache, a store buffer or off-chip memory. The newer, dependent instructions can issue without waiting for data to be read from the data cache, the store buffer or off-chip memory.
[00051] After instructions have been decoded and renamed, associated entries can be allocated in dispatch queue 240. The instructions and associated renamed identifiers, program counter (PC) values, dependency vectors, markings for completion and so on can be sent to dispatch queue 240 and later to scheduler 250. Various exceptions can be detected, such as by execution core 260. Examples include exceptions for memory accesses, a missing address translation and so on. Exceptions can cause a corresponding exception handling routine to be executed, such as by microcode 242.
[00052] Scheduler 250 can schedule instructions for execution in execution core 260. When operands are available and hardware resources are also available, an instruction can be issued out of order from scheduler 250 to one of the functional units within execution core 260. Scheduler 250 can read its source operands from the physical register file (not shown) after translating renamed identifiers with a mapping table, or from the operand bypass logic. The source operands can be supplied to execution core 260. Execution core 260 can resolve the addresses of load and store instructions. Additionally, execution core 260 can perform one or more of multiple integer, floating-point and Boolean operations.
[00053] Execution core 260 may include a load/store unit.
The load/store unit can be connected to a data cache (not shown) and to store buffer 272, either directly or through reorder buffer (ROB) 270. Processor 200 can include a translation lookaside buffer (TLB) for each of the i-cache 204 and the data cache to avoid the cost of performing a full memory translation when performing a cache access. Store buffer 272 can store addresses that correspond to store instructions. ROB 270 can receive results from execution core 260. Additionally, the results can be bypassed to earlier pipeline stages to forward data to dependent instructions already in the pipeline. ROB 270 can ensure in-order commit and retirement of instructions.
[00054] Referring now to Figure 3, a generalized flowchart of one embodiment of a method 300 for detecting a zero cycle load operation is shown. In block 302, program instructions are processed. The instructions can be compiled, fetched from memory, decoded and executed. After decoding, if a given instruction is detected to be a store instruction (conditional block 304), then in block 306, at least the address operand base register ID, the address operand immediate value and the source operand register ID of the store instruction are buffered. These values can be stored in a given table. An associated program counter (PC) and other information can also be buffered. In one embodiment, this information is buffered in a table within the memory dependency (MD) detector.
[00055] In block 308, information from subsequent instructions (in program order) is monitored for potential matches with the information buffered for the earlier store instructions (in program order).
The information for comparisons can include at least the destination operand register ID of subsequent instructions and the address operand base register ID and immediate value of subsequent load instructions. The control logic can detect a match between a register ID associated with a given store instruction and a register ID associated with a subsequent instruction. The register IDs can be the architecture register IDs used to identify operands.
[00056] A modification of the address operand base register of the store instruction can be an immediate-value-based update. Using the previous code example, an add instruction can follow (in program order) the store instruction, such as ADD r10, r10, #4. The "#" symbol can be used to indicate an immediate value data operand. The address operand base register r10 of the store instruction is modified. However, it is an immediate-value-based modification. If no modification other than an immediate-value-based modification is made to the address operand base register r10 by intervening instructions between the store instruction and the load instruction, then an adjustment can be made within the table to account for the immediate-value-based modification. If the address operand base register for a given store instruction is detected as being modified (conditional block 310) and the modification is based on an immediate value update (conditional block 311), then in block 313, particular stored values in the table for the given store instruction are updated. For example, the stored address operand immediate value can be updated. Using the example above, the stored immediate value of 4 within the table for the given store instruction can be decremented within the table by 4, which is the immediate value used by the ADD instruction.
In other examples, the address operand base register can be decremented, rather than incremented, and the immediate value stored within the table can be incremented accordingly. If the modification is not based on an immediate value update (conditional block 311), then in block 312, the table entry that stores the values corresponding to the given store instruction can be invalidated. Thereafter, the control flow of method 300 can return to block 302 through block A.
[00057] If the source operand of the given store instruction is detected as being modified (conditional block 314) and the store instruction has been retired (conditional block 316), then the control flow of method 300 moves to block 312. To illustrate this case, another code example of a memory dependency with a modification of the source operand of the store instruction and a race condition is provided here:
ADD r7, r3, r5
STORE [r10 + 4], r7 // The address operands are r10 and 4.
ADD r19, r24, r18
ADD r7, r20, r21 // The source operand of the store operation is modified.
LOAD r14, [r10 + 4] // The address operands are r10 and 4.
SUB r2, r6, r14 // For r14, use the data forwarded from the source operand of the store operation, r7.
ADD r7, r14, r13 // For r14, use the data forwarded from the source operand of the store operation, r7.
ADD r14, r22, r25 // The destination operand of the load operation is overwritten.
// During the commit pipeline stage, return the physical register number shared by r7 and r14 to the free list.
[00058] Similar to the previous code example, in the example above, the load instruction has a memory dependency on the store instruction. In this case, the third add instruction modifies the source operand (r7) of the store instruction.
There may be a race condition in the pipeline between the load instruction marking the rename register identifier associated with the source operand (r7) as a duplicate and the third add instruction committing and causing the same rename register identifier to be returned to the free list. By the time the load instruction marks this rename register identifier as a duplicate, the rename register identifier may already be in a history file on its way to the free list.
[00059] One option for handling the above case is to detect that an intervening instruction, such as the third add instruction, modifies the source operand of the store instruction and, in response, disqualify the load instruction from being converted into a zero cycle load operation. Another option for handling the above case is to detect that an intervening instruction modifies the source operand of the store instruction and, in response, determine whether the store instruction has retired. If the store instruction has not retired, then the intervening instruction has not retired either. Therefore, the intervening instruction has not yet caused the rename register identifier associated with the source operand of the store instruction to be returned to the free list. A duplicate count of that rename register identifier can now be maintained. Similarly, a duplicate count can be incremented for the destination operand (r14) of the load instruction due to the last add instruction in the code example above.
[00060] Returning to method 300, if the source operand of the given store instruction is detected as being modified (conditional block 314) and the store instruction has not been retired (conditional block 316), then it is determined whether a load instruction has a memory dependency on the given store instruction.
Similarly, if the source operand of the given store instruction is not detected as being modified (conditional block 314), then it is determined whether a load instruction has a memory dependency on the given store instruction. It is noted that, in some embodiments, each of the conditional blocks 310, 314 and 318 can be evaluated at the same time. For example, the control logic and tables can receive inputs associated with subsequent instructions at the same time.
[00061] In one embodiment, a memory dependency (MD) detector is accessed with information associated with the load instruction. As previously described, the MD detector can include a table that holds information for particular store instructions to be compared with subsequent instructions. Alternatively, the MD detector can include control logic for detecting hint information from the compiler. The MD detector can also include an STL predictor. In addition, the MD detector can include a combination of these alternative design choices. In response to accessing the MD detector, the control logic and the table(s) that store values for the given store instruction and other store instructions can generate a result indicating that a memory dependency exists between the load instruction and the given store instruction. For example, in one embodiment, the address operand base register ID and the immediate value of the load instruction and of the given store instruction may match. In addition, the store instruction can be determined to be the youngest store instruction that is older than the load instruction and has matching address operands. If a load instruction is determined to be dependent on the given store instruction (conditional block 318), then, in block 320, the load instruction can be processed as a zero cycle load instruction. Further details of processing a zero cycle load instruction are provided next.
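The table-based detection of method 300 can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the names `StoreEntry`, `MDDetector`, `on_store`, `on_write` and `on_load` are hypothetical, registers are plain strings, and an immediate-value-based update is modeled only for the form ADD rX, rX, #imm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    base: str              # address operand base register, e.g. "r10"
    imm: int               # address operand immediate value
    src: str               # source operand register holding the store data
    retired: bool = False  # set once the store instruction has retired
    valid: bool = True


class MDDetector:
    """Toy memory dependency detector table (hypothetical interface)."""

    def __init__(self) -> None:
        self.entries: List[StoreEntry] = []   # oldest store first

    def on_store(self, base: str, imm: int, src: str) -> StoreEntry:
        """Block 306: buffer the operands of a decoded store."""
        entry = StoreEntry(base, imm, src)
        self.entries.append(entry)
        return entry

    def on_write(self, dest: str, imm_delta: Optional[int] = None) -> None:
        """Blocks 310 to 316: a later instruction writes register dest.

        imm_delta is set only for the form ADD dest, dest, #imm_delta.
        """
        for e in self.entries:
            if not e.valid:
                continue
            if e.base == dest:
                if imm_delta is not None:
                    e.imm -= imm_delta     # block 313: adjust stored immediate
                else:
                    e.valid = False        # block 312: invalidate the entry
            if e.src == dest and e.retired:
                e.valid = False            # source modified after retirement

    def on_load(self, base: str, imm: int) -> Optional[str]:
        """Block 318: source register of the youngest matching valid store."""
        for e in reversed(self.entries):
            if e.valid and e.base == base and e.imm == imm:
                return e.src
        return None
```

For STORE [r10 + 4], r7 followed by ADD r10, r10, #4, the stored immediate becomes 0, so a later LOAD r14, [r10 + 0] matches and reports r7 as the register to share. The sketch compares register IDs and immediates only, which is why the text describes a later verification of the dependency (block 416).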
[00062] Referring now to Figure 4, a generalized flowchart of one embodiment of a method 400 for processing a zero cycle load operation is shown. In block 402, a given load instruction can be determined to be dependent on an older store instruction (in program order). The given load instruction may qualify to be converted to a zero cycle load instruction. As previously described, one condition may be that the younger load instruction has an address operand register ID and an address operand immediate value that match the address operand values of the store instruction. A second condition may be that no intervening instructions are determined to modify the values, such as the address operands and the source operand, stored in a table accessed for the given store instruction. A third condition can be that the store instruction is determined to be the youngest store instruction that is older than the load instruction and has matching address operands. A fourth condition can be an indication of available duplication resources. For example, the rename register number for the source operand register ID of the store instruction can be used to index into a data structure such as RDA 224. In other embodiments, an STL memory dependency predictor and/or compiler hint information can be used to indicate the RAW dependency between the load instruction and the given store instruction.
[00063] A hit in RDA 224 may indicate that the source rename register identifier is already duplicated. A miss may indicate that the source rename register identifier is not yet duplicated. If the source rename register identifier is not yet duplicated and RDA 224 is not already full, then an entry in RDA 224 can be allocated for the source rename register identifier. If the source rename register identifier is already duplicated, then a duplicate count for the source rename register identifier can be compared to a given threshold.
In one embodiment, the threshold can correspond to a particular count. If the associated duplicate count has not reached the threshold, then duplication resources are available. If the associated duplicate count has reached the threshold, then duplication resources are unavailable.
[00064] If duplication resources are unavailable (conditional block 404), then in block 406, the destination architecture register of the load instruction is renamed with a rename register identifier from the free list. The renamed identifier, an associated program counter (PC) value, dependency vectors and so on can be sent to a dispatch queue and later to a scheduler. In block 408, a next available instruction can be processed. The next available instruction can be processed in parallel with the steps above or in a subsequent clock cycle.
[00065] If duplication resources are available (conditional block 404), then, in block 410, the destination operand identifier (ID) of the load instruction is renamed with the rename register ID used for the source operand of the youngest older dependent store instruction. In block 412, a duplicate count for the physical register is updated. The duplicate count can be incremented each time a given architecture register identifier not currently mapped to the selected rename register identifier is mapped to that rename register identifier. In one embodiment, the duplicate count can be initialized to a value of two.
[00066] The renamed identifiers for the load instruction and one or more other instructions can be sent to a dispatch queue and later to a scheduler. The associated program counter (PC) values, dependency vectors and so on can also be sent. In one embodiment, the RAW dependency is considered to be accurate and the load instruction can be marked as complete.
For example, the table access in the MD detector as described in method 300 can be considered to have no uncertainty and, therefore, the load instruction is not processed further by later pipeline stages. In other embodiments, the RAW dependency between the load and store instructions is not considered to be free of uncertainty. Therefore, the load instruction is not marked as complete and is processed further by later pipeline stages. In block 414, one or more instructions, including the load instruction when it is not marked as complete, are issued in the pipeline. In block 416, the memory dependency between the store instruction and the load instruction can be verified. For example, an access of a store buffer with a given address and other instruction identification information can be performed.
[00067] If the memory dependency is found to be incorrect (conditional block 418), then in block 420 the load instruction and the instructions younger (in program order) than the load instruction can be flushed from the pipeline. The load instruction can then be replayed. If the predicted memory dependency is found to be correct (conditional block 418), then in block 422 the load instruction proceeds through the pipeline to the commit pipeline stage without reading the data associated with the source operand from the on-chip cache hierarchy, the store buffer or off-chip memory. The younger instructions dependent on the load instruction can continue with the forwarded data received from the physical register file associated with the source operand of the corresponding store instruction.
[00068] Referring now to Figure 5, a generalized flowchart of one embodiment of a method 500 for committing instructions that include a zero cycle load operation is shown.
For purposes of discussion, the steps in this embodiment and in the embodiments of methods 300 and 400 described above are shown in sequential order. However, in other embodiments, some steps may occur in a different order than shown, some steps may be performed concurrently, some steps may be combined with other steps and some steps may be omitted.
[00069] In block 502, program instructions are being committed. An in-order window of instructions within a data structure can be used to determine when to commit and retire instructions. For example, ROB 270 can be used as the data structure. When memory instructions are detected as ready to commit, a check can be performed to determine whether the associated rename register identifiers are duplicated. In one example, an associated duplicate flag or field that indicates a duplicate status for each of the destination and source physical registers can be stored with other associated information for the instruction. In another example, each of the destination and source rename register identifiers can be used to index into a data structure such as RDA 224. A hit may indicate that the corresponding rename register identifier is already duplicated. A miss may indicate that the corresponding rename register identifier is not duplicated.
[00070] If a given rename register identifier is not duplicated (conditional block 504), then, in block 506, the rename register identifier is returned to the free list. Otherwise, in block 508, a duplicate count for the corresponding rename register identifier can be decremented. Generally, a duplicate count is decremented each time an associated rename register identifier is ready to return to the free list for any given architecture register. A rename register identifier can be determined to be ready to return to the free list in response to a mapping being removed from the mapping table.
Typically, a rename register identifier is returned to the free list in response to a mapping being removed from the mapping table. However, with duplicate mappings in the mapping table due to zero cycle load operations, a data structure, such as RDA 224, can be inspected before any return to the free list.
[00071] After the duplicate count is decremented, if the rename register identifier is still duplicated (conditional block 510), then, in block 512, the rename register identifier can be marked as still duplicated and is not returned to the free list. For example, a valid entry in a data structure, such as RDA 224, may still be present with a duplicate count greater than one.
[00072] After the duplicate count is decremented, if the rename register identifier is no longer duplicated (conditional block 510), then, in block 514, the rename register identifier can be marked as mapped, but not duplicated. For example, an associated entry in a data structure, such as RDA 224, can be invalidated. Alternatively, a valid entry may still be present with a duplicate count of one. The rename register identifier is not returned to the free list.
[00073] Although the above embodiments have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. The following claims are intended to be interpreted to cover all such variations and modifications.
Claims:
Claims (20)
[0001] 1. Processor characterized by the fact that it comprises: a memory dependency detector configured to: detect a given store instruction; store an address operand base register ID corresponding to an address operand base register and an address operand immediate value of the given store instruction in a first table entry; in response to detecting an instruction that is configured to perform a modification of the address operand base register: adjust the address operand immediate value for the given store instruction in the first table entry in response to determining that the modification is an immediate-value-based modification; and invalidate the first table entry in response to determining that the modification is not an immediate-value-based modification; and determine a memory dependency of a given load instruction on the given store instruction, responsive to detecting that a valid table entry indicates that the given store instruction and the given load instruction have matching values for an address operand base register ID and an address operand immediate value; and a register renaming unit; where, in response to receiving an indication of the memory dependency, the register renaming unit is configured to: assign a rename register identifier (ID) associated with a source operand register ID of the given store instruction to a destination operand register ID of the given load instruction; store a duplicate count associated with the rename register ID, where the duplicate count indicates a number of architectural register IDs currently mapped to a given rename register ID; and prevent the given load instruction from reading data associated with the source operand from memory.
[0002] 2.
Processor according to claim 1, characterized by the fact that, to prevent the given load instruction from reading the data associated with the source operand from memory, the register renaming unit is configured to indicate that the given load instruction is to be completed after at least one of the following: the memory dependency is determined and the memory dependency is verified as correct.
[0003] 3. Processor according to claim 2, characterized by the fact that the register renaming unit is further configured to decrement the duplicate count each time an instruction uses an architectural register ID currently mapped to the rename register ID.
[0004] 4. Processor according to claim 3, characterized by the fact that, to determine the memory dependency, the memory dependency detector is further configured to determine that: an intervening instruction is found between the given store instruction and the given load instruction in program order that has a destination operand register ID equal to the source operand register ID of the given store instruction; and the given store instruction has not yet retired.
[0005] 5. Processor according to claim 3, characterized by the fact that the register renaming unit is further configured to: detect that an instruction with a destination operand register ID assigned to the rename register ID is ready to commit; and prevent the rename register ID from returning to a free list in response to determining that the rename register ID is duplicated.
[0006] 6. Processor according to claim 3, characterized by the fact that, to determine said memory dependency, the memory dependency detector is further configured to determine that no intervening instruction is found between the given store instruction and the given load instruction in program order that has a destination operand register ID equal to the source operand register ID of the given store instruction.
[0007] 7.
Processor according to claim 5, characterized by the fact that the register renaming unit is further configured to increment the duplicate count each time an architecture register ID not currently mapped to the rename register ID is mapped to the rename register ID.
[0008] 8. Processor according to claim 5, characterized by the fact that the processor further comprises a load/store unit configured to replay the given load instruction and the program instructions that are younger in program order than the given load instruction in response to determining that the memory dependency is incorrect.
[0009] 9. Processor according to claim 5, characterized by the fact that the processor further comprises a physical register file configured to forward data associated with the source operand of the given store instruction to instructions that are younger (in program order) than, and dependent on, the given load instruction.
[0010] 10. Method characterized by the fact that it comprises: detecting a given store instruction; storing an address operand base register ID corresponding to an address operand base register and an address operand immediate value of the given store instruction in a first table entry; in response to detecting an instruction that is configured to perform a modification of the address operand base register: adjusting the address operand immediate value for the given store instruction in the first table entry in response to determining that the modification is an immediate-value-based modification; and invalidating the first table entry in response to determining that the modification is not an immediate-value-based modification; and determining a memory dependency of a given load instruction on the given store instruction, responsive to detecting that a valid table entry indicates that the given store instruction and the given load instruction have matching values for an address operand base register ID and an address operand immediate value; and in response to receiving an indication of the memory dependency, the method further comprises: assigning a rename register identifier (ID) associated with a source operand register ID of the given store instruction to a destination operand register ID of the given load instruction; storing a duplicate count associated with the rename register ID, where the duplicate count indicates a number of architectural register IDs currently mapped to a given rename register ID; and preventing the given load instruction from reading data associated with the source operand from memory.
[0011] 11. Method according to claim 10, characterized by the fact that, to prevent the given load instruction from reading the data associated with the source operand from memory, the method further comprises indicating that the given load instruction is to be completed after at least one of the following: the memory dependency has been determined and the memory dependency has been verified as correct.
[0012] 12. Method according to claim 11, characterized by the fact that it further comprises decrementing the duplicate count each time an instruction uses an architectural register ID currently mapped to the rename register ID.
[0013] 13. Method according to claim 12, characterized by the fact that, to determine said memory dependency, the method further comprises determining that: an intervening instruction between the given store instruction and the given load instruction has a destination operand register ID equal to the source operand register ID of the given store instruction; and the given store instruction has not yet retired.
[0014] 14. Method according to claim 13, characterized by the fact that it further comprises issuing instructions that are younger (in program order) than, and dependent on, the given load instruction together with the given load instruction.
[0015] 15.
Method according to claim 10, characterized by the fact that it further comprises invalidating the stored address operand base register ID and the address operand immediate value of the given store instruction in response to determining that a type of modification other than an immediate-value-based modification is made to the address operand base register of the given store instruction by an intervening instruction between the given store instruction and a given load instruction in program order.
[0016] 16. Method according to claim 12, characterized by the fact that, to determine the memory dependency, the method further comprises determining that no intervening instruction is found between the given store instruction and the given load instruction in program order that has a destination operand register ID equal to the source operand register ID of the given store instruction.
[0017] 17. Method according to claim 13, characterized by the fact that it further comprises: detecting that an instruction is ready to commit, wherein a destination operand register ID of the instruction is assigned to the rename register ID; and preventing the rename register ID from returning to a free list in response to determining that the rename register ID is duplicated.
[0018] 18. Method according to claim 17, characterized by the fact that it further comprises incrementing the duplicate count each time any given architecture register ID not currently mapped to the rename register ID is mapped to the rename register ID.
[0019] 19.
Recorder renaming unit characterized by the fact that it comprises: a first interface configured to receive decoded instructions, decoded instructions including a loading instruction and a storage instruction, where the storage instruction comprises a source recorder identifier (ID) and an identification of a storage address, the loading instruction comprises an identification of a source address and a destination registrar ID; a second interface for a dispatch unit configured to dispatch instructions to a scheduler; zero cycle load logic; and a memory dependency detector configured to: detect a given storage instruction; storing a base register ID of the address operand that corresponds to a base register of the address operand and an immediate value of the address operand of the given storage instruction in a first entry in a table; in response to the detection of an instruction that is configured to perform a modification of the base register of the address operand: adjust the value of the immediate value of the address operand for the given storage instruction in the first entry of the table in response to the determination that the modification is an immediate modification based on value; and invalidate the first entry in the table in response to the determination that the change is not an immediate change based on value; and determining a memory dependency for a given load instruction on the given storage instruction sensitive to the detection of a valid entry in the table indicating that the given storage instruction and the given load instruction have values that correspond to a base register ID address operand and an immediate address operand value; a register duplication matrix (RDA) comprising a plurality of entries, each entry configured to store a duplicate count that includes multiple mappings for any architectural register identifier (ID) to a given renaming register (ID) identifier; where, in response to receiving an indication of a memory 
dependency from the loading instruction in the storage instruction, the logic is configured to: assign the renaming register ID associated with a source operand register ID of the given instruction from storage to the load register destination register ID; in response to said assignment of the renaming registrar ID, storing a duplicate count associated with the renaming registrar ID, where the duplicate count indicates several times that any architectural registrar ID has been mapped to the same renaming registrar ID; and preventing the load instruction from reading data associated with the original operand from memory. [0020] 20. Register renaming unit, according to claim 19, characterized by the fact that to prevent the given loading instruction from reading the data associated with the original operand from memory, the logic is configured to indicate that the given load instruction must be completed after at least one of the following options: memory dependency is determined and memory dependency is verified as correct.
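As an illustration of the mechanism recited in claims 15 to 19, the following is a minimal Python sketch, not the claimed hardware. It models a single-entry dependency table and a duplicate count per renaming register. All names (`StoreEntry`, `ZeroCycleLoadSketch`, `on_store`, `on_base_update`, `on_load`, `release`) are hypothetical, the table is reduced to one entry for brevity, and two behaviours are assumptions that the claims do not spell out: the direction of the immediate adjustment, and the decrement of the duplicate count when a duplicated register is released.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    base_reg: int       # architectural base register ID of the address operand
    imm: int            # immediate value of the address operand
    src_phys: int       # renaming register ID that holds the store's data
    valid: bool = True


class ZeroCycleLoadSketch:
    """Hypothetical model: one table entry plus duplicate counts (RDA)."""

    def __init__(self) -> None:
        self.entry: Optional[StoreEntry] = None
        self.rename_map: dict[int, int] = {}   # architectural ID -> renaming ID
        self.dup_count: dict[int, int] = {}    # renaming ID -> duplicate count

    def on_store(self, base_reg: int, imm: int, src_arch: int) -> None:
        # Record the address operand and the renaming register of the data.
        self.entry = StoreEntry(base_reg, imm, self.rename_map[src_arch])

    def on_base_update(self, reg: int, delta: Optional[int]) -> None:
        # delta is the immediate added to `reg`, or None for any other change.
        e = self.entry
        if e is None or not e.valid or e.base_reg != reg:
            return
        if delta is None:
            e.valid = False          # not immediate-based: invalidate the entry
        else:
            # Assumption: the base grew by delta, so the same address is now
            # reached with the offset imm - delta.
            e.imm -= delta

    def on_load(self, base_reg: int, imm: int, dest_arch: int) -> bool:
        # True means the load is converted and must not read data from memory.
        e = self.entry
        if e is None or not e.valid or (e.base_reg, e.imm) != (base_reg, imm):
            return False
        if self.rename_map.get(dest_arch) != e.src_phys:
            # Count only a new architectural mapping to this renaming register.
            self.dup_count[e.src_phys] = self.dup_count.get(e.src_phys, 0) + 1
        self.rename_map[dest_arch] = e.src_phys
        return True

    def release(self, phys: int) -> bool:
        # True means the renaming register may return to the free list.
        if self.dup_count.get(phys, 0) > 0:
            self.dup_count[phys] -= 1   # assumption: one duplicate retires
            return False
        return True
```

In this sketch, a store through base register 2 with offset 8, followed by an increment of register 2 by 4, matches a later load through register 2 with offset 4. The load's destination then shares the store's renaming register, and that register stays off the free list until the duplicate count drops back to zero.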
Patent family:
Publication number | Publication date
BR102013014996A2 | 2015-08-11
EP2674856B1 | 2019-08-21
CN103514009B | 2019-10-08
KR101497807B1 | 2015-03-02
EP2674856A2 | 2013-12-18
WO2013188120A3 | 2014-03-06
JP2014002735A | 2014-01-09
US9996348B2 | 2018-06-12
CN103514009A | 2014-01-15
TW201411485A | 2014-03-16
JP5894120B2 | 2016-03-23
KR20130140582A | 2013-12-24
US20130339671A1 | 2013-12-19
WO2013188120A2 | 2013-12-19
EP2674856A3 | 2014-07-23
TWI537824B | 2016-06-11
Legal status:
Date | Code | Description
2015-08-11 | B03A | Publication of a patent application or of a certificate of addition of invention [chapter 3.1 patent gazette]
2018-12-04 | B06F | Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]
2019-11-19 | B06U | Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]
2020-08-18 | B09A | Decision: intention to grant [chapter 9.1 patent gazette]
2020-12-08 | B16A | Patent or certificate of addition of invention granted [chapter 16.1 patent gazette] | Free format text: term of validity: 20 (twenty) years counted from 14/06/2013, subject to the legal conditions.
Priority:
Application number | Filing date | Patent title
US13/517,865 (US9996348B2) | 2012-06-14 | Zero cycle load