Brazilian Patent BR112014022725B1: METHOD FOR LOADING DATA UP TO A DYNAMICALLY DETERMINED MEMORY BOUNDARY


Abstract:
A method for loading data up to a dynamically determined memory boundary. A load to block boundary instruction is provided that loads a variable number of bytes of data into a register while ensuring that a specified memory boundary is not crossed. The boundary is dynamically determined based on a specified boundary type and one or more characteristics of the processor executing the instruction, such as the cache line size or page size used by the processor.
Publication number: BR112014022725B1
Application number: R112014022725-0
Filing date: 2013-03-07
Publication date: 2021-08-10
Inventors: Jonathan David Bradbury; Michael Karl Gschwind; Timothy Slegel; Eric Mark Schwarz; Christian Jacobi
Applicant: International Business Machines Corporation

Description:

Background of the Invention
[0001] One aspect of the invention relates, in general, to data processing and, in particular, to loading data into registers.
[0002] Data processing includes several types of processing, including loading data into registers. Loading data into a register includes, but is not limited to, loading character data, such as character data strings; integer data; or any other types of data. The data that is loaded can then be used and/or manipulated.
[0003] Current instructions for performing various types of processing, including loading data into registers, tend to be inefficient.
Summary of the Invention
[0004] The shortcomings of the prior art are overcome and advantages are provided through the provision of a computer program product for executing a machine instruction. The computer program product includes a computer readable storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method. The method includes, for example, obtaining, by a processor, a machine instruction for execution, the machine instruction being defined for computer execution according to a computer architecture, the machine instruction including: at least one opcode field to provide an opcode, the opcode identifying a load to block boundary operation; a register field to be used to designate a register, the register comprising a first operand; and at least one field for locating a second operand in main memory; and executing the machine instruction, the executing including: loading only those bytes of the first operand with corresponding bytes of the second operand that are within a block of main memory dynamically determined based on a specified type of block boundary and one or more characteristics of the processor.
[0005] Methods and systems relating to one or more aspects of the present invention are also described and claimed herein. Furthermore, services relating to one or more aspects of the present invention are also described and may be claimed herein.
[0006] Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered part of the claimed invention.
Brief Description of the Drawings
[0007] One or more aspects of the present invention are particularly pointed out and distinctly claimed as examples in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
[0008] Figure 1 shows an example of a computing environment to incorporate and use one or more aspects of the present invention;
[0009] Figure 2A shows another example of a computing environment to incorporate and use one or more aspects of the present invention;
[0010] Figure 2B shows additional details of the memory of Figure 2A, in accordance with an aspect of the present invention;
[0011] Figure 3 shows an embodiment of a format of a load vector to block boundary instruction, according to an aspect of the present invention;
[0012] Figure 4 shows an embodiment of logic associated with the load vector to block boundary instruction, according to an aspect of the present invention;
[0013] Figure 5 shows an example of data to be loaded into a vector register, according to an aspect of the present invention;
[0014] Figure 6 shows an example of a register file according to an aspect of the present invention;
[0015] Figure 7 shows an embodiment of a computer program product incorporating one or more aspects of the present invention;
[0016] Figure 8 shows an embodiment of a host computer system incorporating and using one or more aspects of the present invention;
[0017] Figure 9 shows a further example of a computer system incorporating and using one or more aspects of the present invention;
[0018] Figure 10 shows another example of a computer system comprising a computer network incorporating and using one or more aspects of the present invention;
[0019] Figure 11 shows an embodiment of various elements of a computer system incorporating and using one or more aspects of the present invention;
[0020] Figure 12A shows an embodiment of the execution unit of the computer system of Figure 11 incorporating and using one or more aspects of the present invention;
[0021] Figure 12B shows an embodiment of the branch unit of the computer system of Figure 11 incorporating and using one or more aspects of the present invention;
[0022] Figure 12C shows an embodiment of the load/store unit of the computer system of Figure 11 incorporating and using one or more aspects of the present invention; and
[0023] Figure 13 shows an embodiment of an emulated host computer system incorporating and using one or more aspects of the present invention.
Detailed Description of the Invention
[0024] In accordance with an aspect of the present invention, a capability is provided to facilitate the loading of data into a register. As examples, the data includes character data, integer data, and/or other types of data. Further, the register is a vector register or another type of register.
[0025] Character data includes, but is not limited to, alphabetic characters in any language; numeric digits; punctuation; and/or other symbols. The character data may or may not be strings of data. Associated with character data are standards, examples of which include, but are not limited to, ASCII (American Standard Code for Information Interchange) and Unicode, including but not limited to UTF-8 (Unicode Transformation Format), UTF-16, etc.
[0026] A vector register (also referred to as a vector) includes one or more elements, and each element is one, two, or four bytes in length, as examples. Furthermore, a vector operand is, for example, a SIMD (Single Instruction, Multiple Data) operand having a plurality of elements. In other embodiments, elements can be of other sizes; and a vector operand need not be SIMD, and/or may include a single element.
[0027] In one example, a load vector to block boundary instruction is provided that loads a variable number of bytes of data from memory into a vector register while ensuring that a specified boundary of the memory from which the data is being loaded is not crossed. The boundary can be explicitly specified by the instruction (e.g., a variable value in the instruction text, a fixed instruction text value encoded in the opcode, a register-based boundary specified in the instruction, etc.), or the boundary can be dynamically determined by the machine. For example, the instruction specifies that data is to be loaded up to a cache line or page boundary, and the machine determines the cache line or page size (for example, by querying a translation lookaside buffer (TLB) to determine the page size) and loads up to that point.
[0028] As an additional example, this instruction is also used to align data accesses to a selected boundary.
[0029] In one embodiment, the instruction loads bytes of the vector register (a first operand) only with those corresponding bytes of a second operand that are within a block of main memory (also known as main storage) dynamically determined by a block boundary type (for example, page or cache line) and one or more characteristics of the processor executing the instruction, such as its cache line or page size. As used herein, a block of main memory is any block of memory of a specified size. The specified size is also referred to as the block boundary, the boundary being the end of the block.
[0030] In a further embodiment, other types of registers are loaded. That is, the register being loaded is not a vector register but another type of register. In this context, the instruction is referred to as a load to block boundary instruction, which is used to load data into a register.
[0031] An embodiment of a computing environment for incorporating and using one or more aspects of the present invention is described with reference to Figure 1. A computing environment 100 includes, for example, a processor 102 (e.g., a central processing unit), a memory 104 (e.g., main memory), and one or more input/output (I/O) devices and/or interfaces 106 coupled to one another via, for example, one or more buses 108 and/or other connections.
[0032] In one example, processor 102 is based on the z/Architecture offered by International Business Machines Corporation, and is part of a server, such as a System z server, which is also offered by International Business Machines Corporation and implements the z/Architecture. One embodiment of the z/Architecture is described in an IBM® publication entitled “z/Architecture Principles of Operation,” IBM® Publication No. SA22-7832-08, Ninth Edition, August 2010. In one example, the processor executes an operating system, such as z/OS, also offered by International Business Machines Corporation. IBM®, Z/ARCHITECTURE® and Z/OS® are registered trademarks of International Business Machines Corporation, Armonk, New York, USA. Other names used herein may be registered trademarks, trademarks, or product names of International Business Machines Corporation or other companies.
[0033] In a further embodiment, processor 102 is based on the Power Architecture offered by the International Business Machines Corporation. An embodiment of the Power Architecture is described in “Power ISA™ Version 2.06 Revision B,” International Business Machines Corporation, July 23, 2010. POWER ARCHITECTURE® is a registered trademark of International Business Machines Corporation.
[0034] In yet a further embodiment, processor 102 is based on an Intel architecture offered by Intel Corporation. One embodiment of the Intel architecture is described in “Intel® 64 and IA-32 Architectures Developer's Manual: Vol. 2B, Instructions Set Reference, A-L,” Order Number 253666-041US, December 2011, and “Intel® 64 and IA-32 Architectures Developer's Manual: Vol. 2B, Instructions Set Reference, M-Z,” Order Number 253667-041US, December 2011. Intel® is a registered trademark of Intel Corporation, Santa Clara, California.
[0035] Another embodiment of a computing environment to incorporate and use one or more aspects of the present invention is described with reference to Figure 2A. In this example, a computing environment 200 includes, for example, a native central processing unit 202, a memory 204, and one or more input/output devices and/or interfaces 206 coupled to one another via, for example, one or more buses 208 and/or other connections. As examples, computing environment 200 may include a PowerPC processor, a pSeries server, or an xSeries server offered by International Business Machines Corporation, Armonk, New York; an HP Superdome with Intel Itanium II processors offered by Hewlett Packard Co., Palo Alto, California; and/or other machines based on architectures offered by International Business Machines Corporation, Hewlett Packard, Intel, Oracle, or others.
[0036] Native central processing unit 202 includes one or more native registers 210, such as one or more general purpose registers and/or one or more special purpose registers used during processing within the environment. These registers include information that represents the state of the environment at any particular point in time.
[0037] In addition, native central processing unit 202 executes instructions and code that are stored in memory 204. In one particular example, the central processing unit executes emulator code 212 stored in memory 204. This code enables a processing environment configured in one architecture to emulate another architecture. For example, emulator code 212 allows machines based on architectures other than the z/Architecture, such as PowerPC processors, pSeries servers, xSeries servers, HP Superdome servers, or others, to emulate the z/Architecture and to execute software and instructions developed based on the z/Architecture.
[0038] Further details relating to emulator code 212 are described with reference to Figure 2B. Guest instructions 250 comprise software instructions (e.g., machine instructions) that were developed to be executed in an architecture other than that of native CPU 202. For example, guest instructions 250 may have been designed to execute on a z/Architecture processor 102, but instead are being emulated on native CPU 202, which may be, for example, an Intel Itanium II processor. In one example, emulator code 212 includes an instruction fetching routine 252 to obtain one or more guest instructions 250 from memory 204, and to optionally provide local buffering for the instructions obtained. It also includes an instruction translation routine 254 to determine the type of guest instruction that has been obtained and to translate the guest instruction into one or more corresponding native instructions 256. This translation includes, for example, identifying the function to be performed by the guest instruction and choosing the native instruction(s) to perform that function.
[0039] Further, emulator 212 includes an emulation control routine 260 to cause the native instructions to be executed. Emulation control routine 260 may cause native CPU 202 to execute a routine of native instructions that emulate one or more previously obtained guest instructions and, at the conclusion of such execution, return control to the instruction fetching routine to emulate the obtaining of the next guest instruction or group of guest instructions. Execution of native instructions 256 may include loading data into a register from memory 204; storing data back to memory from a register; or performing some type of arithmetic or logical operation, as determined by the translation routine.
[0040] Each routine is, for example, implemented in software, which is stored in memory and executed by native central processing unit 202. In other examples, one or more of the routines or operations are implemented in firmware, hardware, software, or some combination thereof. The registers of the emulated processor may be emulated using registers 210 of the native CPU or by using locations in memory 204. In embodiments, guest instructions 250, native instructions 256, and emulator code 212 may reside in the same memory or may be distributed among different memory devices.
[0041] As used herein, firmware includes, for example, the microcode, millicode, and/or macrocode of the processor. It includes, for example, the hardware-level instructions and/or data structures used in the implementation of higher-level machine code. In one embodiment, it includes, for example, proprietary code that is typically delivered as microcode that includes trusted software or microcode specific to the underlying hardware and controls operating system access to the system hardware.
[0042] In one example, a guest instruction 250 that is obtained, translated, and executed is the instruction described herein. The instruction, which is of one architecture (e.g., the z/Architecture), is fetched from memory, translated, and represented as a sequence of native instructions 256 of another architecture (e.g., PowerPC, pSeries, xSeries, Intel, etc.). These native instructions are then executed.
[0043] In one embodiment, the instruction described herein is a vector instruction, which is part of a vector facility provided in accordance with an aspect of the present invention. The vector facility provides, for example, fixed-size vectors ranging from one to sixteen elements. Each vector includes data that is operated on by vector instructions defined in the facility. In one embodiment, if a vector is made up of multiple elements, then each element is processed in parallel with the other elements. Instruction completion does not occur until processing of all the elements is complete.
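The fetch, translate, and execute cycle of paragraphs [0038] and [0039] can be illustrated with a minimal interpreter sketch in C. The one-byte guest opcodes (`G_INC`, `G_DBL`, `G_HALT`), the `guest_state` type, and all function names are hypothetical and do not belong to any real guest architecture; a production emulator would typically also cache translations instead of re-translating every instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-byte guest ISA used only for illustration. */
enum { G_HALT = 0x00, G_INC = 0x01, G_DBL = 0x02 };

typedef struct {
    uint64_t acc;  /* emulated guest register, held in a native variable */
    size_t   pc;   /* guest instruction address */
} guest_state;

/* Native routines that carry out each guest function. */
static void native_inc(guest_state *s) { s->acc += 1; }
static void native_dbl(guest_state *s) { s->acc *= 2; }

typedef void (*native_routine)(guest_state *);

/* Translation step: map a fetched guest opcode to a native routine.
 * Returns NULL for G_HALT and for any opcode this sketch does not know. */
static native_routine translate(uint8_t opcode) {
    switch (opcode) {
    case G_INC: return native_inc;
    case G_DBL: return native_dbl;
    default:    return NULL;
    }
}

/* Emulation control loop: fetch, translate, execute, then return to
 * fetching the next guest instruction. */
uint64_t emulate(const uint8_t *guest_mem, size_t len) {
    guest_state s = { 0, 0 };
    while (s.pc < len) {
        uint8_t op = guest_mem[s.pc++];    /* instruction fetch */
        native_routine r = translate(op);  /* instruction translation */
        if (r == NULL)
            break;                         /* halt or unknown opcode */
        r(&s);                             /* execute native routine */
    }
    return s.acc;
}
```

For example, the guest program `{0x01, 0x01, 0x02, 0x00}` increments twice, doubles, and halts, leaving the accumulator at 4.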
[0044] As described here, vector instructions can be implemented as part of various architectures including but not limited to z/Architecture, Power, Intel, etc. While one embodiment described here is for the z/Architecture, vector instructions and one or more aspects of the present invention can be based on many other architectures. z/Architecture is just an example.
[0045] In an embodiment where the vector facility is implemented as part of the z/Architecture, to use the vector registers and instructions, a vector enablement control and a register control in a specified control register (for example, control register 0) are set to, for example, one. If the vector facility is installed and a vector instruction is executed without the enablement controls set, a data exception is recognized. If the vector facility is not installed and a vector instruction is executed, an operation exception is recognized.
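The two checks in paragraph [0045] have a natural precedence: a missing facility is reported before unset controls. A minimal sketch of that decision, assuming the two control bits have already been read from the control register; the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Outcome of attempting to execute a vector instruction. */
typedef enum {
    VEC_OK,                   /* instruction may proceed */
    VEC_OPERATION_EXCEPTION,  /* vector facility not installed */
    VEC_DATA_EXCEPTION        /* installed, but enablement controls not set */
} vec_check;

/* Hypothetical model of the enablement check; the controls would live in
 * a control register (e.g. control register 0) and are passed in here. */
vec_check check_vector_enabled(bool facility_installed,
                               bool vector_enable_ctl,
                               bool register_ctl) {
    if (!facility_installed)
        return VEC_OPERATION_EXCEPTION;
    if (!(vector_enable_ctl && register_ctl))
        return VEC_DATA_EXCEPTION;
    return VEC_OK;
}
```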
[0046] Vector data appears in storage, for example, in the same left-to-right sequence as other data formats. Bits of a data format that are numbered 0-7 constitute the byte in the leftmost (lowest numbered) byte location in storage, bits 8-15 form the byte in the next sequential location, and so on. In a further example, vector data may appear in storage in another sequence, such as right to left.
[0047] Many of the vector instructions provided with the vector facility have a specially named bit field. This field, referred to as the register extension bit or RXB, includes the most significant bit for each of the vector-register-designated operands. Bits for register designations not specified by the instruction are reserved and are to be set to zero.
[0048] In an example, the RXB field includes four bits (for example, bits 0-3) and the bits are defined as follows:
[0049] 0 - most significant bit for the first vector register designation of the instruction.
[0050] 1 - most significant bit for the second vector register designation of the instruction, if any.
[0051] 2 - most significant bit for the third vector register designation of the instruction, if any.
[0052] 3 - most significant bit for the fourth vector register designation of the instruction, if any.
[0053] Each bit is set to zero or one by, for example, the assembler, depending on the register number. For example, for registers 0-15, the bit is set to 0; for registers 16-31, the bit is set to 1.
[0054] In one embodiment, each RXB bit is an extension bit for a particular location in an instruction that includes one or more vector registers. For example, in one or more vector instructions, bit 0 of RXB is an extension bit for bit positions 8-11, which are assigned to, for example, V1; bit 1 of RXB is an extension bit for bit positions 12-15, which are assigned to, for example, V2; and so on.
[0055] In a further embodiment, the RXB field includes additional bits, and more than one bit is used as an extension for each vector or location.
[0056] One instruction provided in accordance with an aspect of the present invention that includes the RXB field is the load vector to block boundary instruction, an example of which is shown in Figure 3. In one example, the load vector to block boundary instruction 300 includes opcode fields 302a (e.g., bits 0-7) and 302b (e.g., bits 40-47) indicating a load vector to block boundary operation; a vector register field 304 (e.g., bits 8-11) used to designate a vector register (V1); an index field (X2) 306 (e.g., bits 12-15); a base field (B2) 308 (e.g., bits 16-19); a displacement field (D2) 310 (e.g., bits 20-31); a mask field (M3) 312 (e.g., bits 32-35); and an RXB field 316 (e.g., bits 36-39). Each of the fields 304-314, in one example, is separate and independent from the opcode field(s). Further, in one embodiment, they are separate and independent from one another; however, in other embodiments, more than one field may be combined. Further information on the use of these fields is described below.
[0057] In one example, selected bits (e.g., the first two bits) of the opcode designated by opcode field 302a specify the length and format of the instruction. In this particular example, the length is three halfwords, and the format is a vector register-and-index-storage operation with an extended opcode field. The vector (V) field, along with its corresponding extension bit specified by RXB, designates a vector register. In particular, for vector registers, the register containing the operand is specified using, for example, the four-bit register field with the register extension bit (RXB) added as the most significant bit. For example, if the four-bit field is 0110 and the extension bit is 0, then the five-bit field 00110 indicates register number 6.
[0058] The subscript number associated with a field of the instruction denotes the operand to which the field applies. For instance, the subscript number 1 associated with V1 denotes the first operand, and so forth. The register operand is one register in length, which is, for example, 128 bits.
[0059] In one example, in a vector register-and-index-storage operation instruction, the contents of the general registers designated by the X2 and B2 fields are added to the contents of the D2 field to form the second operand address. The displacement, D2, for the load vector to block boundary instruction is treated as a 12-bit unsigned integer, in one example.
[0060] The M3 field is used, in one or more embodiments, to determine the boundary (also referred to herein as the boundary size) in memory up to which to load. For example, in an embodiment where the boundary is specified by the instruction, the M3 field specifies a code that is used to signal the CPU as to the block boundary to load up to. If a reserved value is specified, a specification exception is recognized. Example codes and corresponding values are as follows:
Code  Boundary
0     64 bytes
1     128 bytes
2     256 bytes
3     512 bytes
4     1K bytes
5     2K bytes
6     4K bytes
[0061] However, in a further embodiment in which the boundary is dynamically determined by the processor executing the instruction, the M3 field includes an indication of the boundary type, such as a page boundary or a cache line boundary, as examples. The processor then determines the boundary size based on the type and one or more processor characteristics, such as the page size or cache line size used by the processor. The processor can use a fixed value for the size or it can determine the size dynamically. For example, if the M3 field indicates that the type is a page boundary, the processor can look up the start address in a table, for example, a translation lookaside buffer, to obtain the page size.
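The field layout of the load vector to block boundary instruction can be decoded with shifts and masks. The following C sketch assumes the six instruction bytes are held in storage order (leftmost byte first) and numbers bits from 0 at the left, as the text does; the `vlbb_fields` type and all function names are hypothetical, and no particular opcode values are implied.

```c
#include <stdint.h>

/* Fields at the bit positions given in the text (bit 0 is leftmost). */
typedef struct {
    uint8_t  op1;  /* bits 0-7: first opcode byte */
    uint8_t  v1;   /* bits 8-11: vector register field */
    uint8_t  x2;   /* bits 12-15: index register field */
    uint8_t  b2;   /* bits 16-19: base register field */
    uint16_t d2;   /* bits 20-31: 12-bit unsigned displacement */
    uint8_t  m3;   /* bits 32-35: mask field */
    uint8_t  rxb;  /* bits 36-39: register extension bits */
    uint8_t  op2;  /* bits 40-47: second opcode byte */
} vlbb_fields;

/* Extract `len` bits starting at bit `pos` (0 = leftmost) of a 48-bit value. */
static unsigned bits48(uint64_t w, unsigned pos, unsigned len) {
    return (unsigned)((w >> (48 - pos - len)) & ((1u << len) - 1));
}

vlbb_fields decode_vlbb(const uint8_t insn[6]) {
    uint64_t w = 0;
    for (int i = 0; i < 6; i++)   /* bytes appear left to right in storage */
        w = (w << 8) | insn[i];
    vlbb_fields f = {
        .op1 = (uint8_t)bits48(w, 0, 8),
        .v1  = (uint8_t)bits48(w, 8, 4),
        .x2  = (uint8_t)bits48(w, 12, 4),
        .b2  = (uint8_t)bits48(w, 16, 4),
        .d2  = (uint16_t)bits48(w, 20, 12),
        .m3  = (uint8_t)bits48(w, 32, 4),
        .rxb = (uint8_t)bits48(w, 36, 4),
        .op2 = (uint8_t)bits48(w, 40, 8),
    };
    return f;
}
```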
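A minimal sketch of how the RXB bit and the four-bit register field combine into a five-bit vector register number, together with the inverse step an assembler would perform. The function names are hypothetical; the bit assignment (RXB bit 0 for the first vector register designation, bit 1 for the second, and so on) follows paragraphs [0049] to [0054].

```c
/* Form a 5-bit vector register number from a 4-bit register field and
 * the RXB bit for that operand position (0 = first designation). RXB
 * bit 0 is the leftmost of its four bits. */
unsigned vreg_number(unsigned field4, unsigned rxb4, unsigned operand_pos) {
    unsigned ext = (rxb4 >> (3 - operand_pos)) & 1u;
    return (ext << 4) | (field4 & 0xFu);
}

/* Assembler side: split a register number 0-31 into the 4-bit field and
 * the extension bit (0 for registers 0-15, 1 for registers 16-31). */
void vreg_encode(unsigned reg, unsigned *field4, unsigned *ext) {
    *field4 = reg & 0xFu;
    *ext = (reg >> 4) & 1u;
}
```

With a register field of 0110 and an RXB of 0000, `vreg_number(0x6, 0x0, 0)` yields 6, matching the example in the text; with RXB 1000 the same field designates register 22.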
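Each example code in the table doubles the boundary size of the previous one, so the lookup reduces to a shift. A minimal C sketch, assuming a return value of 0 is used as a stand-in for recognizing a specification exception on a reserved code; the function name is hypothetical.

```c
#include <stdint.h>

/* Map the M3 boundary code to a size in bytes: codes 0-6 select 64 bytes
 * through 4K bytes. Any other code is treated as reserved, for which the
 * text says a specification exception is recognized; this sketch signals
 * that case by returning 0. */
uint32_t boundary_size_from_code(unsigned m3) {
    if (m3 > 6)
        return 0;                 /* reserved value */
    return UINT32_C(64) << m3;
}
```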
[0062] In a further example, field M3 is not provided and the type is indicated by another field in the instruction or a control outside the instruction.
[0063] In executing one embodiment of the load vector to block boundary (VLBB) instruction, proceeding in one embodiment from left to right, the first operand (specified in the register designated by the V1 field plus the extension bit) is loaded, starting at the zero-indexed byte element, with bytes from the second operand. The second operand is a memory location designated by the second operand address (also referred to as a start address). The load starts from that memory location and continues up to an end address computed by the instruction (or processor), as described below. If a boundary condition is encountered, how the rest of the first operand is treated is model dependent. Access exceptions are not recognized for bytes that are not loaded. In one example, bytes that are not loaded are unpredictable.
[0064] In the example instruction above, the start address is determined by the index register value (X2) + a base register value (B2) + a displacement (D2); however, in other embodiments, it is provided by a register value; an instruction address + an instruction-text-specified offset; a register value + a displacement; or a register value + an index register value, to name a few examples. Also, in one embodiment, the instruction does not include the RXB field. Instead, no extension is used, or the extension is provided in another way, such as from a control outside the instruction, or as part of another field of the instruction.
[0065] Further details of one embodiment of processing the load vector to block boundary instruction are described with reference to Figure 4. In one example, a processor of the computing environment is performing this logic.
[0066] In one embodiment, initially, a start address is computed, which indicates a memory location from which the load is to start, step 400. As examples, the start address 402 can be provided by a register value; an instruction address plus an instruction-text-specified offset; a register value plus a displacement; a register value plus an index register value; or a register value plus an index register value plus a displacement. In the instruction provided here, the start address is given by the X2, B2, and D2 fields. That is, the contents of the registers designated by X2 and B2 are added to the displacement indicated by D2 to provide the start address. The above ways of computing a start address are just examples; other examples are also possible.
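A minimal sketch of the X2 + B2 + D2 computation in step 400. It assumes 64-bit addressing and the z/Architecture convention that an X2 or B2 field of 0 designates no index or base (rather than general register 0); the text does not spell this convention out, and the function name and `gr` array are hypothetical.

```c
#include <stdint.h>

/* Compute the start (second-operand) address. gr[] stands in for the 16
 * general registers. A field value of 0 means "no index" or "no base".
 * 64-bit addressing is assumed, so the sum simply wraps modulo 2^64. */
uint64_t start_address(const uint64_t gr[16], unsigned x2, unsigned b2,
                       unsigned d2) {
    x2 &= 0xFu;                            /* 4-bit register fields */
    b2 &= 0xFu;
    uint64_t index = (x2 != 0) ? gr[x2] : 0;
    uint64_t base  = (b2 != 0) ? gr[b2] : 0;
    uint64_t disp  = d2 & 0xFFFu;          /* 12-bit unsigned displacement */
    return index + base + disp;
}
```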
[0067] Thereafter, a determination is made as to whether the boundary is to be determined dynamically, INQUIRY 404. If not, the value specified in the M3 field is used to determine the boundary size (BdySize). Otherwise, the processor dynamically determines the boundary size, step 406. For example, the M3 field specifies the boundary type, and based on the type and one or more processor characteristics (e.g., the cache line size of the processor, the page size of the processor, etc.), the processor determines the boundary. As examples, based on the type, the processor uses a fixed size for the boundary (e.g., a predefined, fixed cache line or page size for the processor), or, based on the type, the processor determines the boundary. For instance, if the type is a page boundary, the processor looks up the start address in the TLB and determines the page boundary from it. Other examples also exist.
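The decision at INQUIRY 404 and the dynamic determination at step 406 might be sketched as follows. Everything specific here is an assumption: the boundary-type codes are hypothetical, the cache line size is a placeholder constant, and the POSIX call `sysconf(_SC_PAGESIZE)` stands in for the TLB lookup the text describes, so it reports the operating system's base page size rather than the size of the page that holds the start address.

```c
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical boundary-type codes for the dynamic case. */
enum { BDY_TYPE_CACHE_LINE = 0, BDY_TYPE_PAGE = 1 };

/* Placeholder cache line size; a real processor supplies its own value. */
#define ASSUMED_CACHE_LINE 256u

/* Returns the boundary size in bytes, or 0 if it cannot be determined. */
uint64_t determine_boundary(bool dynamic, unsigned m3) {
    if (!dynamic)                          /* INQUIRY 404: "no" branch */
        return (m3 <= 6) ? (UINT64_C(64) << m3) : 0;
    switch (m3) {                          /* step 406 */
    case BDY_TYPE_CACHE_LINE:
        return ASSUMED_CACHE_LINE;         /* fixed size for this type */
    case BDY_TYPE_PAGE: {
        /* Stand-in for a TLB lookup: ask the OS for its base page size. */
        long ps = sysconf(_SC_PAGESIZE);
        return (ps > 0) ? (uint64_t)ps : 0;
    }
    default:
        return 0;
    }
}
```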
[0068] Subsequent to determining the boundary size, whether dynamically or as specified by the instruction, a boundary mask (BdyMask) is created, which is used to determine proximity to the specified boundary, step 410. To create the mask, in one example, the two's complement negation of the boundary size (BdySize) 408 is taken, creating boundary mask 412 (for example, BdyMask = 0 - BdySize).
[0069] Next, an end address is computed indicating where to stop loading, step 420. Inputs to this computation are, for example, boundary size 408, start address 402, vector size 414 (e.g., in bytes, such as 16), and boundary mask 412. In one example, the end address 422 is computed as follows:
[0070] EndAddress = min(StartAddress + (BdySize - (StartAddress & ~BdyMask)), StartAddress + vec_size), where StartAddress & ~BdyMask is the offset of the start address within its block, so the first argument of min is the address of the next block boundary.
[0071] Thereafter, the first operand (i.e., the designated vector register) is loaded, starting at byte index 0, from memory starting at the start address and ending at the end address, step 430. This allows a variable number of bytes to be loaded from memory into a vector without crossing a designated memory boundary. For example, if the memory boundary is at byte 64 and the start address is at byte 58, then bytes 58 through 63 are loaded into the vector register, stopping at the boundary.
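For a power-of-two boundary size, the mask BdyMask = 0 - BdySize has every bit at or above log2(BdySize) set. A short C sketch (function names hypothetical) showing how the mask yields both the start of the block, as in paragraph [0073], and the offset of an address within its block.

```c
#include <stdint.h>

/* Two's complement negation of the boundary size. For a power-of-two
 * size this keeps only the bits at or above log2(size). */
uint64_t boundary_mask(uint64_t bdy_size) { return 0 - bdy_size; }

/* Round an address down to the start of its block. */
uint64_t block_start(uint64_t addr, uint64_t bdy_size) {
    return addr & boundary_mask(bdy_size);
}

/* Offset of an address within its block. */
uint64_t block_offset(uint64_t addr, uint64_t bdy_size) {
    return addr & ~boundary_mask(bdy_size);
}
```

For example, with a 64-byte boundary the mask is 0xFFFFFFFFFFFFFFC0, and address 0x1234 lies at offset 0x34 in the block that starts at 0x1200.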
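The EndAddress formula can be checked directly in C. This sketch treats the end address as exclusive (the last byte loaded is at EndAddress - 1), which is an interpretation rather than something the text states; `end_address` is a hypothetical name.

```c
#include <stdint.h>

/* EndAddress = min(Start + (BdySize - (Start & ~BdyMask)), Start + vec_size)
 * with BdyMask = 0 - BdySize. The result is treated as exclusive. */
uint64_t end_address(uint64_t start, uint64_t bdy_size, uint64_t vec_size) {
    uint64_t bdy_mask    = 0 - bdy_size;
    uint64_t to_boundary = start + (bdy_size - (start & ~bdy_mask));
    uint64_t full_vector = start + vec_size;
    return (to_boundary < full_vector) ? to_boundary : full_vector;
}
```

With StartAddress = 58, BdySize = 64, and vec_size = 16, the first argument of min is 64 and the second is 74, so loading stops at address 64 after six bytes; an aligned start such as 0x1000 gets a full 16 bytes.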
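Putting steps 410 through 430 together, the following sketch simulates the load against a byte array standing in for main storage, with addresses treated as offsets into that array. The function name, the 16-byte `VEC_SIZE`, and the choice to leave unloaded register bytes untouched (the text only says they are unpredictable) are assumptions of this sketch; the caller must ensure `mem` covers every byte that can be loaded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_SIZE 16

/* Simulated load to block boundary. Returns the number of bytes loaded;
 * bytes of vr past that count are left as they were. */
size_t load_to_block_boundary(uint8_t vr[VEC_SIZE], const uint8_t *mem,
                              uint64_t start, uint64_t bdy_size) {
    uint64_t bdy_mask = 0 - bdy_size;                         /* step 410 */
    uint64_t end = start + (bdy_size - (start & ~bdy_mask));  /* step 420 */
    if (end > start + VEC_SIZE)
        end = start + VEC_SIZE;
    size_t n = (size_t)(end - start);
    memcpy(vr, mem + start, n);       /* step 430: fill from byte index 0 */
    return n;
}
```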
[0072] An example of data to be loaded into a vector register, in accordance with an aspect of the present invention, is shown in Figure 5. As indicated, no data is loaded beyond the boundary designated by the vertical dashed line. Locations beyond the boundary are not accessed, and no exceptions are recognized for them. In one particular embodiment, the vector is loaded from left to right. However, in another embodiment, it can be loaded from right to left. In one embodiment, the direction of the vectors, left to right or right to left, is provided at runtime. For example, the instruction accesses a register, status control, or other entity that indicates whether the processing direction is left to right or right to left, as examples. In one embodiment, this direction control is not encoded as part of the instruction, but is provided to the instruction at runtime.
[0073] As described here, the vector register is loaded with data bytes from within a main storage block. The block boundary is considered the end of the block. The start of the block can be computed using StartAddress and the boundary mask. For example, the start of the block is computed by StartAddress AND BdyMask.
[0074] One example of a load instruction is described above. When loading data, such as string data, it is often not known whether the string will end before a page boundary. Loading up to that boundary without crossing it typically requires first checking for the end of the string. Some implementations may also incur a penalty for crossing boundaries, and software may wish to avoid this. Thus, the ability to load up to various boundaries is useful. An instruction is provided that loads a variable number of bytes into a vector register while ensuring that data beyond a specified boundary is not loaded.
[0075] In one embodiment, there are 32 vector registers, and other types of registers can map to a quadrant of the vector registers. For example, as shown in Figure 6, if there is a register file 600 that includes 32 vector registers 602, each 128 bits long, then 16 floating point registers 604, each 64 bits long, can overlay the vector registers. Thus, as an example, when floating point register 2 is modified, vector register 2 is also modified. Other mappings for other types of registers are also possible.
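To illustrate the string use case, the following sketch computes a string length using only boundary-limited loads, so no load extends past the end of the block containing the current position. It is a portable simulation rather than real vector code: `mem` stands in for storage, the helper names are hypothetical, and the block size `bdy` is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_SIZE 16

/* Copy at most VEC_SIZE bytes starting at `start`, stopping at the next
 * `bdy` boundary (bdy must be a power of two). Returns the byte count. */
static size_t bounded_load(uint8_t v[VEC_SIZE], const uint8_t *mem,
                           uint64_t start, uint64_t bdy) {
    uint64_t room = bdy - (start & (bdy - 1));   /* bytes left in block */
    size_t n = (room < VEC_SIZE) ? (size_t)room : VEC_SIZE;
    memcpy(v, mem + start, n);
    return n;
}

/* String length using only boundary-limited loads. A load never reaches
 * into a block the string does not extend into. The caller must
 * guarantee that a terminating zero exists within mem. */
size_t strlen_bounded(const uint8_t *mem, uint64_t start, uint64_t bdy) {
    uint8_t v[VEC_SIZE];
    uint64_t pos = start;
    for (;;) {
        size_t n = bounded_load(v, mem, pos, bdy);
        for (size_t i = 0; i < n; i++)
            if (v[i] == 0)
                return (size_t)(pos + i - start);
        pos += n;
    }
}
```

Starting at offset 60 with a 64-byte block, the first load takes only four bytes so that every later load begins on a block boundary.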
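The overlay in Figure 6 can be modeled as a single array of 16-byte vector registers in which a write to a floating point register lands in part of the corresponding vector register. This sketch assumes floating point register n occupies the leftmost 64 bits (bytes 0-7) of vector register n, which is the z/Architecture arrangement; the text itself does not say which half is shared, and the type and function names are hypothetical.

```c
#include <stdint.h>

/* 32 vector registers of 128 bits (16 bytes) each. */
typedef struct {
    uint8_t vr[32][16];
} reg_file;

/* Write floating point register n (0-15) as bytes 0-7 of vector
 * register n, most significant byte first. */
void write_fpr(reg_file *rf, unsigned n, uint64_t bits) {
    for (int i = 0; i < 8; i++)
        rf->vr[n & 15][i] = (uint8_t)(bits >> (56 - 8 * i));
}

/* Read floating point register n back out of the shared storage. */
uint64_t read_fpr(const reg_file *rf, unsigned n) {
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | rf->vr[n & 15][i];
    return bits;
}
```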
[0076] Here, memory, main memory, storage and main storage are used interchangeably, unless otherwise noted explicitly or by context.
[0077] Additional details regarding the vector facility, including examples of other instructions, are provided as part of this detailed description further below.
[0078] As will be recognized by one of skill in the art, one or more aspects of the present invention may be incorporated as a system, method or computer program product. Therefore, one or more aspects of the present invention may take the form of an all-hardware embodiment, an all-software embodiment (including firmware, resident software, microcode, etc.) or an embodiment combining software and hardware aspects that can all be generically referred to here as a “circuit”, “module” or “system”. Furthermore, one or more aspects of the present invention may take the form of a computer program product incorporated in one or more computer readable medium(s) having computer readable program code incorporated therein.
[0079] Any combination of one or more computer readable media(s) may be used. The computer-readable medium may be a computer-readable storage medium. A computer readable storage medium may be, for example, but not limited to an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection having one or more wires, a portable computer floppy disk, a hard disk, a random access memory (RAM), a memory read-only (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disk (CD-ROM) read-only memory, an optical storage device, a storage device magnetic, or any suitable combination of the above. In the context of this document, a computer-readable storage medium can be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device.
[0080] Referring now to Figure 7, in one example, a computer program product 700 includes, for instance, one or more non-transitory computer readable storage media 702 to store computer readable program code means or logic 704 thereon to provide and facilitate one or more aspects of the present invention.
[0081] Program code embodied on a computer readable medium may be transmitted using an appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
[0082] Computer program code for carrying out operations for one or more aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language, such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language, assembler or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
[0083] One or more aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
[0084] These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
[0085] Computer program instructions can also be loaded into a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer-implemented process such that instructions executing on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block or blocks of the block diagram.
[0086] The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of one or more aspects of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
[0087] In addition to the above, one or more aspects of the present invention may be provided, offered, deployed, managed, serviced, etc., by a service provider that offers management of customer environments. For example, the service provider may create, maintain, support, etc., computer code and/or a computer infrastructure that performs one or more aspects of the present invention for one or more customers. In return, the service provider may receive payment from the customer pursuant to a subscription contract and/or fees, as examples. Additionally or alternatively, the service provider may receive payment from the sale of advertising content to one or more third parties.
[0088] In one aspect of the present invention, an application may be deployed for performing one or more aspects of the present invention. As one example, deploying an application comprises providing computer infrastructure operable to perform one or more aspects of the present invention.
[0089] As a further aspect of the present invention, a computing infrastructure can be deployed comprising integrating computer readable code into a computing system, in which the code in combination with the computing system is capable of executing one or more aspects of the present invention.
[0090] As a still further aspect of the present invention, a process for integrating computing infrastructure comprising integrating computer readable code into a computer system may be provided. The computer system comprises a computer readable medium, in which the computer readable medium comprises one or more aspects of the present invention. The code in combination with the computer system is capable of performing one or more aspects of the present invention.
[0091] Although several embodiments are described above, these are only examples. For example, computing environments of other architectures may incorporate and use one or more aspects of the present invention. In addition, registers of other sizes may be used, and changes to instructions may be made without departing from the spirit of the present invention.
[0092] Further, other types of computing environments can benefit from one or more aspects of the present invention. As an example, a data processing system suitable for storing and/or executing program code is usable that includes at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements include, for instance, local memory employed during actual execution of the program code, bulk storage, and cache memory which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
[0093] Input/output or I/O devices (including, but not limited to, keyboards, displays, pointing devices, DASD, tape, CDs, DVDs, thumb drives and other memory media, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the available types of network adapters.
[0094] Referring to Figure 8, representative components of a host computer system 5000 to implement one or more aspects of the present invention are depicted. The representative host computer 5000 comprises one or more CPUs 5001 in communication with computer memory (i.e., central storage) 5002, as well as I/O interfaces to storage media devices 5011 and networks 5010 for communicating with other computers or SANs and the like. The CPU 5001 is compliant with an architecture having an architected instruction set and architected functionality. The CPU 5001 may have dynamic address translation (DAT) 5003 for transforming program addresses (virtual addresses) into real addresses of memory. A DAT typically includes a translation lookaside buffer (TLB) 5007 for caching translations so that later accesses to the block of computer memory 5002 do not require the delay of address translation. Typically, a cache 5009 is employed between computer memory 5002 and the processor 5001. The cache 5009 may be hierarchical, having a large cache available to more than one CPU and smaller, faster (lower level) caches between the large cache and each CPU. In some implementations, the lower level caches are split to provide separate low level caches for instruction fetching and data accesses. In one embodiment, an instruction is fetched from memory 5002 by an instruction fetch unit 5004 via a cache 5009. The instruction is decoded in an instruction decode unit 5006 and dispatched (with other instructions in some embodiments) to instruction execution unit or units 5008. Typically, several execution units 5008 are employed, for example, an arithmetic execution unit, a floating point execution unit and a branch instruction execution unit. The instruction is executed by the execution unit, accessing operands from instruction-specified registers or memory as needed.
If an operand is to be accessed (loaded or stored) from memory 5002, a load/store unit 5005 typically handles the access under control of the instruction being executed. Instructions may be executed in hardware circuits or in internal microcode (firmware) or by a combination of both.
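The benefit of the TLB described above, that a cached translation lets later accesses skip the translation step, can be illustrated with a toy model. This is a minimal sketch under stated assumptions: 4 KB pages and a single-level page table held in a Python dict. Real z/Architecture DAT walks multi-level region, segment and page tables, and the names `TLB`, `translate`, `PAGE_SIZE` and `PAGE_SHIFT` are hypothetical.

```python
from collections import OrderedDict

PAGE_SHIFT = 12               # assumption: 4 KB pages
PAGE_SIZE = 1 << PAGE_SHIFT


class TLB:
    """Hypothetical toy TLB with least-recently-used replacement."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table   # virtual page number -> real frame number
        self.capacity = capacity
        self.entries = OrderedDict()   # cached translations, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        vpn = virtual_address >> PAGE_SHIFT
        offset = virtual_address & (PAGE_SIZE - 1)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)      # mark as most recently used
        else:
            self.misses += 1
            # Slow path: consult the translation table. An unmapped page raises
            # KeyError here; real hardware would report a translation exception.
            self.entries[vpn] = self.page_table[vpn]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)   # evict least recently used
        return (self.entries[vpn] << PAGE_SHIFT) | offset


if __name__ == "__main__":
    tlb = TLB({0: 7, 1: 3})
    print(hex(tlb.translate(0x0010)))   # first access to page 0: miss
    print(hex(tlb.translate(0x0FF0)))   # same page again: served from the TLB
    print(tlb.hits, tlb.misses)
```

The LRU policy is only one plausible replacement choice; the text above does not say how entries are replaced.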
[0095] As noted, a computer system includes information in local (or main) storage, as well as addressing, protection and reference and change recording. Some aspects of addressing include the format of addresses, the concept of address spaces, the various types of addresses, and the way in which one type of address is translated into another type of address. Part of main storage includes permanently assigned storage locations. Main storage provides the directly addressable, fast-access data storage system. Both data and programs must be loaded into main storage (from input devices) before they can be processed.
[0096] Main storage may include one or more smaller, faster-access buffer storages, sometimes called caches. A cache is typically physically associated with a CPU or an I/O processor. The effects, except on performance, of the physical construction and use of distinct storage media are generally not observable by the program.
[0097] Separate caches may be maintained for instructions and for data operands. Information within a cache is maintained in contiguous bytes on an integral boundary called a cache block or cache line (or line, for short). A model may provide an EXTRACT CACHE ATTRIBUTE instruction that returns the size of a cache line in bytes. A model may also provide PREFETCH DATA and PREFETCH DATA RELATIVE LONG instructions that effect the prefetching of storage into the data or instruction cache or the releasing of data from the cache.
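Because a cache line occupies contiguous bytes on an integral boundary, the line holding a given byte, and the number of bytes left before the next line boundary, follow from simple mask arithmetic. A minimal sketch, assuming the line size is a power of two (for example a value a model might report through EXTRACT CACHE ATTRIBUTE); the function names are hypothetical.

```python
def cache_line_start(address: int, line_size: int) -> int:
    """Hypothetical helper: address of the first byte of the line holding `address`."""
    if line_size <= 0 or line_size & (line_size - 1):
        raise ValueError("line_size must be a positive power of two")
    # Lines sit on an integral boundary, so clearing the low-order bits
    # rounds the address down to the start of its line.
    return address & ~(line_size - 1)


def bytes_to_line_end(address: int, line_size: int) -> int:
    """Hypothetical helper: bytes from `address` up to (not including) the next line boundary."""
    return line_size - (address - cache_line_start(address, line_size))
```

With a 256-byte line, address 0x1234 lies in the line starting at 0x1200, and 204 bytes remain before the next boundary.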
[0098] Storage is viewed as a long horizontal string of bits. For most operations, accesses to storage proceed in a left-to-right sequence. The string of bits is subdivided into units of eight bits. An eight-bit unit is called a byte, which is the basic building block of all information formats. Each byte location in storage is identified by a unique nonnegative integer, which is the address of that byte location or, simply, the byte address. Adjacent byte locations have consecutive addresses, starting with 0 on the left and proceeding in a left-to-right sequence. Addresses are unsigned binary integers and are 24, 31, or 64 bits.
[0099] Information is transmitted between storage and a CPU or a channel subsystem one byte, or a group of bytes, at a time. Unless otherwise specified, in z/Architecture for example, a group of bytes in storage is addressed by the leftmost byte of the group. The number of bytes in the group is either implied or explicitly specified by the operation to be performed. When used in a CPU operation, a group of bytes is called a field. Within each group of bytes, in z/Architecture for example, bits are numbered in a left-to-right sequence. In z/Architecture, the leftmost bits are sometimes referred to as the “high-order” bits and the rightmost bits as the “low-order” bits. Bit numbers are not storage addresses, however. Only bytes can be addressed. To operate on individual bits of a byte in storage, the entire byte is accessed. The bits in a byte are numbered 0 through 7, from left to right (in z/Architecture, for example). The bits in an address may be numbered 8-31 or 40-63 for 24-bit addresses, or 1-31 or 33-63 for 31-bit addresses; they are numbered 0-63 for 64-bit addresses. Within any other fixed-length format of multiple bytes, the bits making up the format are consecutively numbered starting from 0. For purposes of error detection, and preferably for correction, one or more check bits may be transmitted with each byte or with a group of bytes. Such check bits are generated automatically by the machine and cannot be directly controlled by the program. Storage capacities are expressed in number of bytes. When the length of a storage-operand field is implied by the operation code of an instruction, the field is said to have a fixed length, which can be one, two, four, eight, or sixteen bytes. Larger fields may be implied for some instructions. When the length of a storage-operand field is not implied but is stated explicitly, the field is said to have a variable length.
Variable-length operands can vary in length by increments of one byte (or, with some instructions, in multiples of two bytes or other multiples). When information is placed in storage, the contents of only those byte locations are replaced that are included in the designated field, even though the width of the physical path to storage may be greater than the length of the field being stored.
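The left-to-right bit numbering described above (bit 0 is the high-order bit) is easy to confuse with the right-to-left weighting used elsewhere. A minimal sketch of the conversion, assuming a 64-bit register; the helper names are hypothetical.

```python
def bit(value: int, n: int, width: int = 64) -> int:
    """Hypothetical helper: bit n of `value`, numbering bits 0..width-1 from left to right."""
    if not 0 <= n < width:
        raise ValueError("bit number out of range")
    # Bit 0 is the leftmost (high-order) bit, so shift right by width-1-n.
    return (value >> (width - 1 - n)) & 1


def address_field(register: int, address_bits: int) -> int:
    """Hypothetical helper: the 24-, 31- or 64-bit address held in a 64-bit register."""
    if address_bits not in (24, 31, 64):
        raise ValueError("unsupported addressing mode")
    # A 24-bit address occupies bits 40-63 and a 31-bit address bits 33-63,
    # i.e. the rightmost `address_bits` bits of the register.
    return register & ((1 << address_bits) - 1)
```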
[0100] Certain units of information must be on an integral boundary in storage. A boundary is called integral for a unit of information when its storage address is a multiple of the length of the unit in bytes. Special names are given to fields of 2, 4, 8, and 16 bytes on an integral boundary. A halfword is a group of two consecutive bytes on a two-byte boundary and is the basic building block of instructions. A word is a group of four consecutive bytes on a four-byte boundary. A doubleword is a group of eight consecutive bytes on an eight-byte boundary. A quadword is a group of 16 consecutive bytes on a 16-byte boundary. When storage addresses designate halfwords, words, doublewords, and quadwords, the binary representation of the address contains one, two, three, or four rightmost zero bits, respectively. Instructions must be on two-byte integral boundaries. The storage operands of most instructions do not have boundary-alignment requirements.
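The link between integral boundaries and rightmost zero bits can be checked directly. A minimal sketch; the helper names are hypothetical, and the search stops at the quadword, the largest unit named above.

```python
UNIT_NAMES = {2: "halfword", 4: "word", 8: "doubleword", 16: "quadword"}


def on_integral_boundary(address: int, length: int) -> bool:
    """Hypothetical helper: True if `address` is a multiple of the unit length in bytes."""
    return address % length == 0


def trailing_zero_bits(address: int) -> int:
    """Hypothetical helper: count the rightmost zero bits of a nonzero address."""
    # address & -address isolates the lowest set bit.
    return (address & -address).bit_length() - 1


def largest_unit(address: int) -> str:
    """Hypothetical helper: name the largest special unit that can start at `address`."""
    for length in (16, 8, 4, 2):
        if on_integral_boundary(address, length):
            return UNIT_NAMES[length]
    return "byte"
```

For example, 0x1008 has three rightmost zero bits, so it can begin a doubleword but not a quadword.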
[0101] On devices that implement separate caches for instructions and data operands, a significant delay may be experienced if the program stores into a cache line from which instructions are subsequently fetched, regardless of whether the store alters the instructions that are subsequently fetched.
[0102] In one embodiment, the invention may be practiced by software (sometimes referred to as licensed internal code, firmware, microcode, millicode, picocode and the like, any of which would be consistent with one or more aspects of the present invention). Referring to Figure 8, software program code which embodies one or more aspects of the present invention may be accessed by the processor 5001 of the host system 5000 from long-term storage media devices 5011, such as a CD-ROM drive, tape drive or hard drive. The software program code may be embodied on any of a variety of known media for use with a data processing system, such as a diskette, hard drive, or CD-ROM. The code may be distributed on such media, or may be distributed to users from computer memory 5002 or storage of one computer system over a network 5010 to other computer systems for use by users of such other systems.
[0103] The software program code includes an operating system which controls the function and interaction of the various computer components and one or more application programs. Program code is normally paged from the storage media device 5011 to the relatively higher-speed computer storage 5002, where it is available for processing by the processor 5001. The techniques and methods for embodying software program code in memory, on physical media, and/or distributing software code via networks are well known and will not be further discussed herein. Program code, when created and stored on a tangible medium (including but not limited to electronic memory modules (RAM), flash memory, compact discs (CDs), DVDs, magnetic tape and the like), is often referred to as a “computer program product”. The computer program product medium is typically readable by a processing circuit, preferably in a computer system, for execution by the processing circuit.
[0104] Figure 9 illustrates a representative workstation or server hardware system in which one or more aspects of the present invention may be practiced. The system 5020 of Figure 9 comprises a representative base computer system 5021, such as a personal computer, a workstation or a server, including optional peripheral devices. The base computer system 5021 includes one or more processors 5026 and a bus employed to connect and enable communication between the processor(s) 5026 and the other components of the system 5021 in accordance with known techniques. The bus connects the processor 5026 to memory 5025 and long-term storage 5027, which can include a hard drive (including any of magnetic media, CD, DVD and Flash memory, for example) or a tape drive, for example. The system 5021 may also include a user interface adapter, which connects the microprocessor 5026 via the bus to one or more interface devices, such as a keyboard 5024, a mouse 5023, a printer/scanner 5030 and/or other interface devices, which can be any user interface device, such as a touch-sensitive screen, digitized entry pad, etc. The bus also connects a display device 5022, such as an LCD screen or monitor, to the microprocessor 5026 via a display adapter.
[0105] The system 5021 may communicate with other computers or networks of computers by way of a network adapter 5028 capable of communicating with a network 5029. Example network adapters are communications channels, token ring, Ethernet or modems. Alternatively, the system 5021 may communicate using a wireless interface, such as a CDPD (cellular digital packet data) card. The system 5021 may be associated with such other computers in a local area network (LAN) or a wide area network (WAN), or the system 5021 may be a client in a client/server arrangement with another computer, etc. All of these configurations, as well as the appropriate communications hardware and software, are known in the art.
[0106] Figure 10 illustrates a data processing network 5040 in which one or more aspects of the present invention may be practiced. The data processing network 5040 may include a plurality of individual networks, such as a wireless network and a wired network, each of which may include a plurality of individual workstations 5041, 5042, 5043, 5044. Additionally, as those skilled in the art will appreciate, one or more LANs may be included, where a LAN may comprise a plurality of intelligent workstations coupled to a host processor.
[0107] Still referring to Figure 10, the networks may also include mainframe computers or servers, such as a gateway computer (client server 5046) or application server (remote server 5048, which may access a data repository and may also be accessed directly from a workstation 5045). A gateway computer 5046 serves as a point of entry into each individual network. A gateway is needed when connecting one networking protocol to another. The gateway 5046 may preferably be coupled to another network (the Internet 5047, for example) by means of a communications link. The gateway 5046 may also be directly coupled to one or more workstations 5041, 5042, 5043, 5044 using a communications link. The gateway computer may be implemented utilizing an IBM eServer™ System z server available from International Business Machines Corporation.
[0108] Referring concurrently to Figure 9 and Figure 10, software programming code which may embody one or more aspects of the present invention may be accessed by the processor 5026 of the system 5020 from long-term storage media 5027, such as a CD-ROM drive or hard drive. The software programming code may be embodied on any of a variety of known media for use with a data processing system, such as a diskette, hard drive, or CD-ROM. The code may be distributed on such media, or may be distributed to users 5050, 5051 from the memory or storage of one computer system over a network to other computer systems for use by users of such other systems.
[0109] Alternatively, the programming code may be embodied in the memory 5025 and accessed by the processor 5026 using the processor bus. Such programming code includes an operating system which controls the function and interaction of the various computer components and one or more application programs 5032. Program code is normally paged from storage media 5027 to high-speed memory 5025, where it is available for processing by the processor 5026. The techniques and methods for embodying software programming code in memory, on physical media, and/or distributing software code via networks are well known and will not be further discussed herein. Program code, when created and stored on a tangible medium (including but not limited to electronic memory modules (RAM), flash memory, compact discs (CDs), DVDs, magnetic tape and the like), is often referred to as a “computer program product”. The computer program product medium is typically readable by a processing circuit, preferably in a computer system, for execution by the processing circuit.
[0110] The cache that is most readily available to the processor (usually faster and smaller than other processor caches) is the lowest cache (L1 or level one) and main storage (main memory) is the highest level cache (L3 if there are 3 levels) . The lowest level cache is often divided into an instruction cache (I-cache) containing machine instructions to be executed and a data cache (D-cache) containing data operands.
[0111] Referring to Fig. 11, an exemplary processor embodiment is depicted for processor 5026. Typically, one or more levels of cache 5053 are employed to buffer memory blocks in order to improve processor performance. The cache 5053 is a high-speed buffer holding cache lines of memory data that are likely to be used. Typical cache lines are 64, 128 or 256 bytes of memory data. Separate caches are often employed for caching instructions and for caching data. Cache coherence (synchronization of copies of lines in memory and the caches) is often provided by various “snoop” algorithms well known in the art. The main memory storage 5025 of a processor system is often referred to as a cache. In a processor system having 4 levels of cache 5053, main storage 5025 is sometimes referred to as the level 5 (L5) cache, since it is typically faster and only holds a portion of the nonvolatile storage (DASD, tape, etc.) that is available to a computer system. Main storage 5025 “caches” pages of data paged in and out of main storage 5025 by the operating system.
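The point that a cache holds whole lines, so that any byte of a resident line hits, can be shown with the simplest organization, a direct-mapped cache. This is an assumption made for illustration only: real processor caches are typically set-associative and hierarchical, and the class name and its parameters are hypothetical.

```python
class DirectMappedCache:
    """Hypothetical toy direct-mapped cache that tracks which memory lines are resident."""

    def __init__(self, num_lines: int, line_size: int = 256):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines   # tag of the line held in each slot
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        line_number = address // self.line_size   # memory line holding the byte
        index = line_number % self.num_lines      # slot that line maps to
        tag = line_number // self.num_lines       # distinguishes lines sharing a slot
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag   # bring in the whole line, replacing the old one
        return False
```

With four 256-byte slots, addresses 0x000 and 0x0FF fall in the same line (miss, then hit), while 0x400 maps to the same slot as 0x000 and evicts it.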
[0112] A program counter (instruction counter) 5061 keeps track of the address of the current instruction to be executed. A program counter in a z/Architecture processor is 64 bits and can be truncated to 31 or 24 bits to support prior addressing limits. A program counter is typically embodied in a PSW (program status word) of a computer such that it persists during context switching. Thus, a program in progress, having a program counter value, may be interrupted by, for example, the operating system (a context switch from the program environment to the operating system environment). The PSW of the program maintains the program counter value while the program is not active, and the program counter (in the PSW) of the operating system is used while the operating system is executing. Typically, the program counter is incremented by an amount equal to the number of bytes of the current instruction. RISC (Reduced Instruction Set Computing) instructions are typically fixed length, while CISC (Complex Instruction Set Computing) instructions are typically variable length. Instructions of the IBM z/Architecture are CISC instructions having a length of 2, 4 or 6 bytes. The program counter 5061 is modified by either a context switch operation or a branch taken operation of a branch instruction, for example. In a context switch operation, the current program counter value is saved in the program status word along with other state information about the program being executed (such as condition codes), and a new program counter value is loaded pointing to an instruction of a new program module to be executed. A branch taken operation is performed in order to permit the program to make decisions or loop within the program by loading the result of the branch instruction into the program counter 5061.
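Incrementing the program counter by the current instruction's length works for variable-length CISC instructions because, in z/Architecture, the length is encoded in the two leftmost bits of the first opcode byte: 00 for 2 bytes, 01 or 10 for 4 bytes, and 11 for 6 bytes. A minimal sketch; the function names are hypothetical, and the wraparound mask is an assumed model of the 24-, 31- and 64-bit addressing limits mentioned above.

```python
def instruction_length(first_opcode_byte: int) -> int:
    """Hypothetical helper: instruction length in bytes from bits 0-1 of the first byte."""
    # 00 -> 2 bytes, 01 -> 4 bytes, 10 -> 4 bytes, 11 -> 6 bytes
    return (2, 4, 4, 6)[(first_opcode_byte >> 6) & 0b11]


def next_sequential(pc: int, first_opcode_byte: int, amode: int = 64) -> int:
    """Hypothetical helper: advance the program counter past the current instruction."""
    # Assumed: the address wraps around at the current addressing mode.
    return (pc + instruction_length(first_opcode_byte)) & ((1 << amode) - 1)
```

For example, an instruction whose first byte is 0xE3 (leftmost bits 11) is 6 bytes long, so a program counter of 0x1000 advances to 0x1006.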
[0113] Typically, an instruction fetch unit 5055 is employed to fetch instructions on behalf of the processor 5026. The fetch unit fetches either “next sequential instructions”, target instructions of branch taken instructions, or first instructions of a program following a context switch. Modern instruction fetch units often employ prefetch techniques to speculatively prefetch instructions based on the likelihood that the prefetched instructions might be used. For example, a fetch unit may fetch 16 bytes of instructions that include the next sequential instruction and additional bytes of further sequential instructions.
[0114] The fetched instructions are then executed by the processor 5026. In one embodiment, the fetched instruction(s) are passed to a dispatch unit 5056 of the fetch unit. The dispatch unit decodes the instruction(s) and forwards information about the decoded instruction(s) to the appropriate units 5057, 5058, 5060. An execution unit 5057 typically receives information about decoded arithmetic instructions from the instruction fetch unit 5055 and performs arithmetic operations on operands according to the opcode of the instruction. Operands are provided to the execution unit 5057 preferably either from memory 5025, architected registers 5059 or from an immediate field of the instruction being executed. Results of the execution, when stored, are stored either in memory 5025, registers 5059 or in other machine hardware (such as control registers, PSW registers and the like).
[0115] A processor 5026 typically has one or more units 5057, 5058, 5060 for executing the function of the instruction. Referring to Fig. 12A, an execution unit 5057 may communicate with architected general registers 5059, a decode/dispatch unit 5056, a load/store unit 5060, and other processor units 5065 by way of interfacing logic 5071. An execution unit 5057 may employ several register circuits 5067, 5068, 5069 to hold information that the arithmetic logic unit (ALU) 5066 will operate on. The ALU performs arithmetic operations such as add, subtract, multiply and divide, as well as logical functions such as AND, OR and exclusive-OR (XOR), rotate and shift. Preferably, the ALU supports specialized operations that are design dependent. Other circuits may provide other architected facilities 5072, including condition codes and recovery support logic, for example. Typically, the result of an ALU operation is held in an output register circuit 5070, which can forward the result to a variety of other processing functions. There are many arrangements of processor units; the present description is only intended to provide a representative understanding of one embodiment.
[0116] An ADD instruction, for example, would be executed in an execution unit 5057 having arithmetic and logical functionality, while a floating point instruction, for example, would be executed in a floating point execution unit having specialized floating point capability. Preferably, an execution unit operates on operands identified by an instruction by performing an opcode-defined function on the operands. For example, an ADD instruction may be executed by an execution unit 5057 on operands found in two registers 5059 identified by register fields of the instruction.
[0117] The execution unit 5057 performs the arithmetic addition on two operands and stores the result in a third operand, where the third operand may be a third register or one of the two source registers. The execution unit preferably utilizes an arithmetic logic unit (ALU) 5066 that is capable of performing a variety of logical functions, such as shift, rotate, AND, OR and XOR, as well as a variety of algebraic functions, including any of add, subtract, multiply and divide. Some ALUs 5066 are designed for scalar operations and some for floating point. Data may be Big Endian (where the least significant byte is at the highest byte address) or Little Endian (where the least significant byte is at the lowest byte address), depending on the architecture. The IBM z/Architecture is Big Endian. Signed fields may be sign and magnitude, 1's complement or 2's complement, depending on the architecture. A 2's complement number is advantageous in that the ALU does not need a separate subtraction capability, since subtracting a value, whether negative or positive, requires only adding its 2's complement in the ALU. Numbers are commonly described in shorthand; for example, a 12-bit field defines an address of a 4,096-byte block and is commonly described as a 4 Kbyte (kilobyte) block.
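The advantage of 2's complement, that subtraction needs only the adder, can be seen by computing a - b as a + (~b + 1). A minimal sketch with a 64-bit width, chosen to match the z/Architecture general registers; the function names are hypothetical. Python integers are unbounded, so masking to `WIDTH` bits models a fixed-width register.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1


def twos_complement(value: int) -> int:
    """Hypothetical helper: negate `value` in WIDTH-bit 2's complement (invert, then add one)."""
    return (~value + 1) & MASK


def subtract_via_add(a: int, b: int) -> int:
    """Hypothetical helper: compute a - b using only addition, as a 2's-complement ALU can."""
    return (a + twos_complement(b)) & MASK


def to_signed(value: int) -> int:
    """Hypothetical helper: interpret a WIDTH-bit pattern as a signed 2's-complement integer."""
    return value - (1 << WIDTH) if value >> (WIDTH - 1) else value
```

For example, `subtract_via_add(3, 10)` produces the bit pattern 2**64 - 7, which `to_signed` reads back as -7.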
[0118] Referring to Fig. 12B, branch instruction information for executing a branch instruction is typically sent to a branch unit 5058, which often employs a branch prediction algorithm, such as a branch history table 5082, to predict the outcome of the branch before other conditional operations are complete. The target of the current branch instruction is fetched and speculatively executed before the conditional operations are complete. When the conditional operations are completed, the speculatively executed branch instructions are either completed or discarded, based on the conditions of the conditional operation and the speculated outcome. A typical branch instruction may test condition codes and branch to a target address if the condition codes meet the branch requirement of the branch instruction; a target address may be calculated based on several numbers, including ones found in register fields or an immediate field of the instruction, for example. The branch unit 5058 may employ an ALU 5074 having a plurality of input register circuits 5075, 5076, 5077 and an output register circuit 5080. The branch unit 5058 may communicate with general registers 5059, the decode/dispatch unit 5056 or other circuits 5073, for example.
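The text does not specify how the branch history table 5082 records history. One widely used scheme, assumed here purely for illustration, keeps a 2-bit saturating counter per table entry, so that a single atypical outcome does not immediately reverse a well-established prediction. The class and method names and the table size are hypothetical.

```python
class BranchHistoryTable:
    """Hypothetical predictor: 2-bit saturating counters indexed by branch address."""

    def __init__(self, entries: int = 1024):
        self.entries = entries
        self.counters = [1] * entries   # 0-1 predict not taken, 2-3 predict taken

    def _index(self, branch_address: int) -> int:
        # Instructions are halfword aligned, so the rightmost address bit is always 0.
        return (branch_address >> 1) % self.entries

    def predict(self, branch_address: int) -> bool:
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address: int, taken: bool) -> None:
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

After two taken outcomes the counter saturates at 3, so one not-taken outcome still leaves the prediction at "taken"; a second one flips it.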
[0119] The execution of a group of instructions may be interrupted for a variety of reasons, including a context switch initiated by an operating system, a program exception or error causing a context switch, an I/O interruption signal causing a context switch, or multi-threading activity of a plurality of programs (in a multi-threaded environment), for example. Preferably, a context switch action saves state information about a currently executing program and then loads state information about another program being invoked. State information may be saved in hardware registers or in memory, for example. State information preferably comprises a program counter value pointing to a next instruction to be executed, condition codes, memory translation information and architected register content. A context switch activity may be exercised by hardware circuits, application programs, operating system programs or firmware code (microcode, pico-code or licensed internal code (LIC)), alone or in combination.
[0120] A processor accesses operands according to instruction-defined methods. The instruction may provide an immediate operand using the value of a portion of the instruction, or may provide one or more register fields explicitly pointing to either general purpose registers or special purpose registers (floating point registers, for example). The instruction may use implied registers, identified by an opcode field, as operands. The instruction may use memory locations for operands. The memory location of an operand may be provided by a register, an immediate field, or a combination of registers and an immediate field, as exemplified by the z/Architecture long displacement facility, in which the instruction defines a base register, an index register, and an immediate field (displacement field) that are added together to provide the address of the operand in memory, for example. Location here typically implies a location in main memory (main storage) unless otherwise indicated.
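The base + index + displacement calculation can be sketched as follows. This is a simplified model, assuming 64-bit addressing mode, a base or index field of 0 meaning that no register participates (the z/Architecture convention), a 12-bit displacement treated as unsigned, and a 20-bit long displacement (already assembled into one value) treated as signed. The function name and the list-of-registers representation are hypothetical.

```python
def effective_address(gprs: list[int], b: int, x: int, disp: int,
                      long_displacement: bool = False) -> int:
    """Return base + index + displacement, wrapped to 64 bits.

    gprs: the 16 general-register values (hypothetical representation).
    b, x: base and index register numbers; 0 means "no register".
    disp: the raw displacement field (12 bits, or 20 bits if long).
    """
    if long_displacement:
        if disp & (1 << 19):          # sign-extend the 20-bit field
            disp -= 1 << 20
    else:
        disp &= 0xFFF                 # 12-bit unsigned displacement
    base = gprs[b] if b != 0 else 0
    index = gprs[x] if x != 0 else 0
    return (base + index + disp) & 0xFFFF_FFFF_FFFF_FFFF


regs = [0] * 16
regs[5], regs[6] = 0x1000, 0x20
print(hex(effective_address(regs, 5, 6, 0x10)))  # 0x1030
```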
[0121] Referring to Fig. 12C, a processor accesses storage using a load/store unit 5060. Load/store unit 5060 may perform a load operation by obtaining the address of the target operand in memory 5053 and loading the operand into a register 5059 or another memory 5053 location, or may perform a store operation by obtaining the address of the target operand in memory 5053 and storing data obtained from a register 5059 or another memory 5053 location in the target operand location in memory 5053. Load/store unit 5060 may be speculative and may access memory in a sequence that is out of order relative to the instruction sequence; however, load/store unit 5060 must maintain the appearance to programs that instructions were executed in order. Load/store unit 5060 may communicate with general registers 5059, decode/dispatch unit 5056, cache/memory interface 5053, or other elements 5083, and comprises various register circuits, ALUs 5085, and control logic 5090 to calculate storage addresses and to provide pipeline sequencing to keep operations in order. Some operations may be out of order, but the load/store unit provides functionality to make the out-of-order operations appear to the program as having been performed in order, as is known in the art.
[0122] Preferably, addresses that an application program “sees” are referred to as virtual addresses. Virtual addresses are sometimes referred to as “logical addresses” and “effective addresses”. These virtual addresses are virtual in that they are redirected to a physical memory location by one of a variety of dynamic address translation (DAT) technologies including, but not limited to, simply prefixing a virtual address with an offset value, or translating the virtual address via one or more translation tables, the translation tables preferably comprising at least a segment table and a page table, alone or in combination, preferably with the segment table having an entry pointing to the page table. In z/Architecture, a hierarchy of translation is provided, including a region first table, a region second table, a region third table, a segment table, and an optional page table. The performance of address translation is often improved by utilizing a translation lookaside buffer (TLB), which comprises entries mapping a virtual address to an associated physical memory location. The entries are created when DAT translates a virtual address using the translation tables. Subsequent use of the virtual address can then use the entry of the fast TLB rather than the slow sequential translation table accesses. TLB content may be managed by a variety of replacement algorithms, including LRU (Least Recently Used).
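The TLB behavior described above, with LRU replacement, can be modeled as below. This is a minimal sketch assuming 4 KB pages and a fully associative TLB (real TLBs are usually set-associative); the `walk` callable is a hypothetical stand-in for the slow DAT table walk, and all names are illustrative.

```python
from collections import OrderedDict
from typing import Callable

PAGE_SHIFT = 12  # assumed 4 KB pages


class TLB:
    """Hypothetical fully associative TLB with least-recently-used replacement."""

    def __init__(self, capacity: int, walk: Callable[[int], int]) -> None:
        self.capacity = capacity
        self.walk = walk              # slow translation: virtual page -> frame
        self.entries = OrderedDict()  # virtual page -> frame, LRU entry first
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.entries:
            self.entries.move_to_end(vpn)          # hit: now most recently used
        else:
            self.misses += 1
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)   # evict least recently used
            self.entries[vpn] = self.walk(vpn)     # create entry from the tables
        return (self.entries[vpn] << PAGE_SHIFT) | offset


tlb = TLB(capacity=2, walk=lambda vpn: vpn + 0x100)
print(hex(tlb.translate(0x1234)))  # 0x101234
```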
[0123] In the case where the processor is a processor of a multiprocessor system, each processor has the responsibility to keep shared resources, such as I/O, caches, TLBs, and memory, interlocked for coherency. Typically, “snoop” technologies are used to maintain cache coherency. In a snoop environment, each cache line may be marked as being in any one of a shared state, an exclusive state, a changed state, an invalid state, and the like, to facilitate sharing.
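The text names the cache-line states but not the protocol that moves between them. The sketch below assumes the widely used MESI protocol, in which the “changed” state is called Modified, and shows only how a cache reacts to a snooped request from another processor. The function name and the string request kinds are hypothetical.

```python
from enum import Enum


class LineState(Enum):
    MODIFIED = "M"   # the "changed" state: dirty, no other cache holds a copy
    EXCLUSIVE = "E"  # clean, no other cache holds a copy
    SHARED = "S"     # clean, other caches may also hold copies
    INVALID = "I"    # the line holds no usable data


def snoop(state: LineState, request: str) -> tuple[LineState, bool]:
    """React to a snooped request from another processor.

    request: "read" (the other processor wants a shared copy) or
             "write" (the other processor wants ownership to modify the line).
    Returns (new state, whether this cache must supply or write back dirty data).
    """
    if request not in ("read", "write"):
        raise ValueError(f"unknown snooped request: {request!r}")
    if state is LineState.INVALID:
        return LineState.INVALID, False
    dirty = state is LineState.MODIFIED
    new_state = LineState.SHARED if request == "read" else LineState.INVALID
    return new_state, dirty


print(snoop(LineState.MODIFIED, "read"))  # (<LineState.SHARED: 'S'>, True)
```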
[0124] I/O units 5054 (Fig. 11) provide the processor with means for attaching to peripheral devices including tape, disk, printers, displays, and networks, for example. I/O units are often presented to the computer program by software drivers. In mainframes such as IBM® System z®, channel adapters and open system adapters are I/O units of the mainframe that provide communication between the operating system and peripheral devices.
[0125] Further, other types of computing environments can benefit from one or more aspects of the present invention. As an example, an environment may include an emulator (e.g., software or other emulation mechanisms), in which a particular architecture (including, for example, instruction execution, architected functions such as address translation, and architected registers) or a subset thereof is emulated (e.g., on a native computer system having a processor and memory). In such an environment, one or more emulation functions of the emulator can implement one or more aspects of the present invention, even though a computer executing the emulator may have an architecture different from the capabilities being emulated. As one example, in emulation mode, the specific instruction or operation being emulated is decoded, and an appropriate emulation function is built to implement the individual instruction or operation.
[0126] In an emulation environment, a host computer includes, for instance, a memory to store instructions and data; an instruction fetch unit to fetch instructions from memory and, optionally, to provide local buffering for the fetched instruction; an instruction decode unit to receive the fetched instructions and to determine the type of instructions that have been fetched; and an instruction execution unit to execute the instructions. Execution may include loading data into a register from memory; storing data back to memory from a register; or performing some type of arithmetic or logical operation, as determined by the decode unit. In one example, each unit is implemented in software. For instance, the operations being performed by the units are implemented as one or more subroutines within emulator software.
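The software-implemented fetch, decode, and execute units can be pictured as the minimal loop below. The toy instruction set (LOAD, STORE, ADD, HALT), the dictionary-based state, and all function names are hypothetical; they only mirror the three kinds of execution named in the text: loading from memory, storing to memory, and arithmetic.

```python
def fetch(state):
    """Instruction fetch unit: read the instruction at the emulated counter."""
    instr = state["memory"][state["pc"]]
    state["pc"] += 1
    return instr


def decode(instr):
    """Instruction decode unit: separate the opcode from its operands."""
    return instr[0], instr[1:]


def op_load(state, r, addr):
    state["regs"][r] = state["memory"][addr]    # memory -> register


def op_store(state, r, addr):
    state["memory"][addr] = state["regs"][r]    # register -> memory


def op_add(state, r1, r2):
    state["regs"][r1] += state["regs"][r2]      # arithmetic operation


# One emulation subroutine per instruction type.
EXECUTE = {"LOAD": op_load, "STORE": op_store, "ADD": op_add}


def run(state):
    """Instruction execution unit: dispatch each decoded instruction."""
    while True:
        opcode, operands = decode(fetch(state))
        if opcode == "HALT":
            return state
        EXECUTE[opcode](state, *operands)


program = {
    0: ("LOAD", 0, 100), 1: ("LOAD", 1, 101), 2: ("ADD", 0, 1),
    3: ("STORE", 0, 102), 4: ("HALT",),
    100: 2, 101: 3,                              # data words
}
result = run({"pc": 0, "regs": [0] * 4, "memory": program})
print(result["memory"][102])  # 5
```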
[0127] More particularly, in a mainframe, architected machine instructions are used by programmers, today usually “C” programmers, often by way of a compiler application. These instructions stored in the storage medium may be executed natively in a z/Architecture IBM® Server, or alternatively in machines executing other architectures. They can be emulated in existing and future IBM® mainframe servers and on other machines of IBM® (e.g., Power Systems servers and System x® Servers). They can be executed in machines running Linux on a wide variety of machines using hardware manufactured by IBM®, Intel®, AMD™, and others. Besides execution on that hardware under a z/Architecture, Linux can be used, as well as machines that use emulation by Hercules, UMX, or FSI (Fundamental Software, Inc.), where generally execution is in an emulation mode. In emulation mode, emulation software is executed by a native processor to emulate the architecture of an emulated processor.
[0128] The native processor typically executes emulation software comprising either firmware or a native operating system to perform emulation of the emulated processor. The emulation software is responsible for fetching and executing instructions of the emulated processor architecture. The emulation software maintains an emulated program counter to keep track of instruction boundaries. The emulation software may fetch one or more emulated machine instructions at a time and convert the one or more emulated machine instructions into a corresponding group of native machine instructions for execution by the native processor. These converted instructions may be cached such that a faster conversion can be accomplished. Notwithstanding, the emulation software is to maintain the architecture rules of the emulated processor architecture so as to ensure that operating systems and applications written for the emulated processor operate correctly. Furthermore, the emulation software is to provide resources identified by the emulated processor architecture including, but not limited to, control registers, general purpose registers, floating point registers, a dynamic address translation function including segment tables and page tables, for example, interrupt mechanisms, context switch mechanisms, Time of Day (TOD) clocks, and architected interfaces to I/O subsystems, such that an operating system or an application program designed to run on the emulated processor can be run on the native processor having the emulation software.
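Caching converted instructions can be illustrated with a small translate-once, run-many loop. This is a hypothetical sketch, not the emulation software the text describes: the guest instructions (ADD, LOOP, HALT), the accumulator and counter registers, and the use of Python callables as stand-ins for native instructions are all assumptions.

```python
from functools import partial


class TranslatingEmulator:
    """Hypothetical emulator that converts each guest block once and caches it."""

    def __init__(self, program, count):
        self.program = program        # guest code: list of (opcode, operand)
        self.acc = 0                  # emulated accumulator register
        self.cnt = count              # emulated loop-counter register
        self.cache = {}               # guest block address -> converted block
        self.translations = 0

    def _add(self, n):
        self.acc += n

    def _translate(self, pc):
        """Convert guest instructions up to the next LOOP or HALT."""
        self.translations += 1
        body = []
        while True:
            opcode, operand = self.program[pc]
            if opcode != "ADD":
                return body, opcode, operand, pc + 1
            body.append(partial(self._add, operand))  # stand-in "native" operation
            pc += 1

    def run(self, pc=0):
        while True:
            if pc not in self.cache:                  # convert only on a miss
                self.cache[pc] = self._translate(pc)
            body, opcode, target, fallthrough = self.cache[pc]
            for native_op in body:
                native_op()
            if opcode == "HALT":
                return self.acc
            # LOOP: decrement the counter and branch back while it is non-zero.
            self.cnt -= 1
            pc = target if self.cnt != 0 else fallthrough


emu = TranslatingEmulator([("ADD", 2), ("ADD", 3), ("LOOP", 0), ("HALT", None)], 4)
print(emu.run(), emu.translations)  # 20 2
```

The loop body is executed four times but converted only once, which is the saving the cached conversion provides.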
[0129] A specific instruction being emulated is decoded, and a subroutine is called to perform the function of the individual instruction. An emulation software function emulating a function of an emulated processor is implemented, for example, in a “C” subroutine or driver, or some other method of providing a driver for the specific hardware, as will be within the skill of those in the art after understanding the description of the preferred embodiment. Various software and hardware emulation patents including, but not limited to, U.S. Patent No. 5,551,013, entitled “Multiprocessor for Hardware Emulation,” by Beausoleil et al.; U.S. Patent No. 6,009,261, entitled “Preprocessing of Stored Target Routines for Emulating Incompatible Instructions on a Target Processor,” by Scalzi et al.; U.S. Patent No. 5,574,873, entitled “Decoding Guest Instruction to Directly Access Emulation Routines that Emulate the Guest Instructions,” by Davidian et al.; U.S. Patent No. 6,308,255, entitled “Symmetrical Multiprocessing Bus and Chipset Used for Coprocessor Support Allowing Non-Native Code to Run in a System,” by Gorishek et al.; U.S. Patent No. 6,463,582, entitled “Dynamic Optimizing Object Code Translator for Architecture Emulation and Dynamic Optimizing Object Code Translation Method,” by Lethin et al.; and U.S. Patent No. 5,790,825, entitled “Method for Emulating Guest Instructions on a Host Computer Through Dynamic Recompilation of Host Instructions,” by Eric Traut; and many others, illustrate a variety of known ways to achieve emulation of an instruction format architected for a different machine for a target machine available to those skilled in the art.
[0130] In Fig. 13, an example of an emulated host computer system 5092 is provided that emulates a host computer system 5000' of a host architecture. In emulated host computer system 5092, host processor (CPU) 5091 is an emulated host processor (or virtual host processor) and comprises an emulation processor 5093 having a different native instruction set architecture than that of processor 5091 of host computer 5000'. Emulated host computer system 5092 has memory 5094 accessible to emulation processor 5093. In the example embodiment, memory 5094 is partitioned into a host computer memory 5096 portion and an emulation routines 5097 portion. Host computer memory 5096 is available to programs of emulated host computer 5092 according to the host computer architecture. Emulation processor 5093 executes native instructions of an architected instruction set obtained from emulation routines memory 5097, and may access a host instruction for execution from a program in host computer memory 5096 by employing one or more instructions obtained in a sequence and access/decode routine, which may decode the host instruction(s) accessed to determine a native instruction execution routine for emulating the function of the host instruction accessed. Other facilities that are defined for the architecture of host computer system 5000' may be emulated by architected facilities routines, including such facilities as general purpose registers, control registers, dynamic address translation, I/O subsystem support, and processor cache, for example. The emulation routines may also take advantage of functions available in emulation processor 5093 (such as general registers and dynamic translation of virtual addresses) to improve the performance of the emulation routines. Special hardware and offload engines may also be provided to assist processor 5093 in emulating the function of host computer 5000'.
[0131] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising”, when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[0132] The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below, if any, are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of one or more aspects of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Vector String Instructions
[0133] Unless otherwise specified, all operands are vector register operands. A “V” in the assembler syntax designates a vector operand.
VECTOR FIND ANY EQUAL

[0134] Proceeding from left to right, every unsigned binary integer element of the second operand is compared for equality with each unsigned binary integer element of the third operand and, optionally, with zero if the Zero Search (ZS) flag is set in the M5 field.
[0135] If the Result Type (RT) flag in the M5 field is zero, then for each element in the second operand that matches any element in the third operand, or optionally zero, the bit positions of the corresponding element in the first operand are set to ones; otherwise, they are set to zeros.
[0136] If the Result Type (RT) indicator in field M5 is one, then the byte index of the leftmost element in the second operand that matches an element in the third operand or zero is stored in byte seven of the first operand.
[0137] Each instruction has an Extended Mnemonics section that describes recommended extended mnemonics and their corresponding machine assembler syntax.
[0138] Programming Note: For all instructions that optionally set the condition code, performance may be degraded if the condition code is set.
[0139] If the Result Type (RT) flag in the M5 field is one and no elements are found to be equal, or zero if the zero search flag is set, an index equal to the number of bytes in the vector is stored in byte seven of the first operand.
[0140] The M4 field specifies the element size control (ES). The ES control specifies the size of the elements in the vector register operands. If a reserved value is specified, a specification exception is recognized.
0 - byte
1 - halfword
2 - word
3-15 - reserved
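The element-wise behavior of this instruction, for both result types, can be modeled in a few lines. This is a behavioral sketch under stated assumptions: vectors are 16-byte `bytes` values with big-endian elements, the condition code and CC flag are not modeled, and a reserved ES value raises `ValueError` in place of a specification exception. The function names are hypothetical.

```python
VECTOR_BYTES = 16
ELEMENT_BYTES = {0: 1, 1: 2, 2: 4}  # ES code -> element size in bytes


def split_elements(vec: bytes, es: int) -> list[int]:
    """Split a 16-byte vector into unsigned big-endian elements of size ES."""
    if es not in ELEMENT_BYTES:
        raise ValueError("specification exception: reserved ES value")
    n = ELEMENT_BYTES[es]
    return [int.from_bytes(vec[i:i + n], "big") for i in range(0, VECTOR_BYTES, n)]


def vector_find_any_equal(v2: bytes, v3: bytes, es: int,
                          rt: bool = False, zs: bool = False) -> bytes:
    """Return the first operand computed from second operand v2 and third operand v3."""
    a = split_elements(v2, es)            # also validates ES
    wanted = set(split_elements(v3, es))
    n = ELEMENT_BYTES[es]
    hits = [x in wanted or (zs and x == 0) for x in a]
    if rt:
        # Byte index of the leftmost hit (16 if none) goes in byte seven;
        # all other bytes are zero.
        index = next((i * n for i, hit in enumerate(hits) if hit), VECTOR_BYTES)
        result = bytearray(VECTOR_BYTES)
        result[7] = index
        return bytes(result)
    # Mask form: every bit of an element is one on a hit, zero otherwise.
    return b"".join(b"\xff" * n if hit else bytes(n) for hit in hits)


text = b"hello, world!\x00\x00\x00"
punct = b"," * 15 + b"!"
print(vector_find_any_equal(text, punct, 0, rt=True)[7])  # 5
```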
[0141] Field M5 has the following format:

[0142] The bits of field M5 are defined as follows:
• Result type (RT): if zero, each resulting element is a mask of all comparisons on that element. If one, a byte index is stored in byte seven of the first operand and zeros are stored in all other elements.
• Zero search (ZS): if one, each element of the second operand is also compared with zero.
• Condition code set (CC): if zero, the condition code is not set and remains unchanged. If one, the condition code is set as specified in the following section.
Special conditions
[0143] A specification exception is recognized and no other action is taken if any of the following occurs:
1. Field M4 contains a value from 3 to 15.
2. Bit 0 of field M5 is not zero.
Resulting condition code:
[0144] If the CC flag is zero, the code remains unchanged.
[0145] If the CC flag is one, the code is set as follows:
0 If the ZS bit is set, a zero was matched in a lower-indexed element of the second operand than any other match.
1 Some elements of the second operand match at least one element in the third operand.
2 All elements of the second operand matched at least one element in the third operand.
3 No elements in the second operand match any elements in the third operand.
Program exceptions:
[0146] • Data with DXC FE, vector register
• Operation if the vector extension facility is not installed
• Specification (ES reserved value)
• Transaction constraint
VECTOR FIND ELEMENT EQUAL

[0147] Proceeding from left to right, the unsigned binary integer elements of the second operand are compared with the corresponding unsigned binary integer elements of the third operand. If two elements are equal, the byte index of the first byte of the leftmost equal element is placed in byte seven of the first operand, and zeros are stored in the remaining bytes of the first operand. If no elements are found to be equal, or zero if the zero search is set, then an index equal to the number of bytes in the vector is stored in byte seven of the first operand, and zeros are stored in the remaining bytes.
[0148] If the Zero Search (ZS) bit is set in field M5, then each element in the second operand is also compared for equality with zero. If a zero element is found in the second operand before any other elements of the second and third operands are found to be equal, the byte index of the first byte of the element found to be zero is stored in byte seven of the first operand, and zeros are stored in all other byte locations. If the Condition Code Set (CC) flag is one, the condition code is set to zero.
[0149] The M4 field specifies the element size control (ES). The ES control specifies the size of the elements in the vector register operands. If a reserved value is specified, a specification exception is recognized.
0 - byte
1 - halfword
2 - word
3-15 - reserved
[0150] Field M5 has the following format:

[0151] The bits of field M5 are defined as follows:
• Reserved: bits 0-1 are reserved and must be zero; otherwise, a specification exception is recognized.
• Zero search (ZS): if one, each element of the second operand is also compared with zero.
• Condition code set (CC): if zero, the condition code remains unchanged. If one, the condition code is set as specified in the following section.
[0152] A specification exception is recognized and no other action is taken if any of the following occurs:
1. Field M4 contains a value from 3 to 15.
2. Bits 0-1 of field M5 are not zero.
Resulting condition code:
[0153] If bit 3 of field M5 is set to one, the code is set as follows:
0 If the zero search bit is set, the comparison detected a zero element in the second operand in an element with a smaller index than any equal comparison.
1 The comparison detected a match between the second and third operands in some element. If the zero search bit is set, this match occurred in an element with an index less than or equal to that of the zero element.
2 -
3 No elements compared equal.
[0154] If bit 3 of field M5 is zero, the code remains unchanged.
Program exceptions:
• Data with DXC FE, vector register
• Operation if the vector extension facility is not installed
• Specification (ES reserved value)
• Transaction constraint
Extended mnemonics:
Programming notes:
[0155] 1. A byte index is always stored in the first operand, for any element size. For example, if the element size is set to halfword and the halfword with index 2 compares equal, a byte index of 4 is stored.
[0156] 2. The third operand must not contain elements with a value of zero. If the third operand contains a zero and matches a zero element in the second operand before any other equal comparisons, condition code one is set regardless of the zero comparison bit setting.
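The equal-search rules, including the condition-code precedence given in programming note 2, can be sketched as follows. Assumptions: 16-byte big-endian vectors, a `ValueError` in place of the specification exception, a condition code returned as if the CC flag were one, and hypothetical function names.

```python
VECTOR_BYTES = 16
ELEMENT_BYTES = {0: 1, 1: 2, 2: 4}  # ES code -> element size in bytes


def split_elements(vec, es):
    """Split a 16-byte vector into unsigned big-endian elements of size ES."""
    if es not in ELEMENT_BYTES:
        raise ValueError("specification exception: reserved ES value")
    n = ELEMENT_BYTES[es]
    return [int.from_bytes(vec[i:i + n], "big") for i in range(0, VECTOR_BYTES, n)]


def index_result(byte_index):
    """First operand: the byte index in byte seven, zeros elsewhere."""
    result = bytearray(VECTOR_BYTES)
    result[7] = byte_index
    return bytes(result)


def vector_find_element_equal(v2, v3, es, zs=False):
    """Return (first operand, condition code) as if the CC flag were one."""
    a, b = split_elements(v2, es), split_elements(v3, es)
    n = ELEMENT_BYTES[es]
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:                 # an equal pair wins, even if both are zero
            return index_result(i * n), 1
        if zs and x == 0:          # zero in the second operand before any match
            return index_result(i * n), 0
    return index_result(VECTOR_BYTES), 3


def halfwords(values):
    return b"".join(v.to_bytes(2, "big") for v in values)


r, cc = vector_find_element_equal(halfwords([1, 2, 3, 4, 5, 6, 7, 8]),
                                  halfwords([9, 9, 3, 9, 9, 9, 9, 9]), 1)
print(r[7], cc)  # 4 1: halfword 2 matched, reported as byte index 4
```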
VECTOR FIND ELEMENT NOT EQUAL

[0157] Proceeding from left to right, the unsigned binary integer elements of the second operand are compared with the corresponding unsigned binary integer elements of the third operand. If two elements are not equal, the byte index of the leftmost unequal element is placed in byte seven of the first operand, and zeros are stored in all other bytes. If the Condition Code Set (CC) bit in field M5 is set to one, the condition code is set to indicate which operand was greater. If all elements were equal, a byte index equal to the vector size is placed in byte seven of the first operand, and zeros are placed in all other byte locations. If the CC bit is one, condition code three is set.
[0158] If the Zero Search (ZS) bit is set in field M5, each element in the second operand is also compared for equality with zero. If a zero element is found in the second operand before any other element of the second operand is found to be unequal, the byte index of the first byte of the element found to be zero is stored in byte seven of the first operand, zeros are stored in all other bytes, and condition code 0 is set.
[0159] The M4 field specifies the element size control (ES). The ES control specifies the size of the elements in the vector register operands. If a reserved value is specified, a specification exception is recognized.
0 - byte
1 - halfword
2 - word
3-15 - reserved
[0160] Field M5 has the following format:

[0161] The bits of field M5 are defined as follows:
• Zero search (ZS): if one, each element of the second operand is also compared with zero.
• Condition code set (CC): if zero, the condition code is not set and remains unchanged. If one, the condition code is set as specified in the following section.
[0162] A specification exception is recognized and no other action is taken if any of the following occurs:
1. Field M4 contains a value from 3 to 15.
2. Bits 0-1 of field M5 are not zero.
Resulting condition code:
[0163] If bit 3 of field M5 is set to one, the code is set as follows:
[0164] 0 If the zero search bit is set, the comparison detected a zero element in both operands in a lower-indexed element than any unequal comparison.
[0165] 1 an element mismatch has been detected and the element in VR2 is smaller than the element in VR3.
[0166] 2 an element mismatch has been detected and the element in VR2 is larger than the element in VR3.
[0167] 3 all elements compared equal, and if the zero comparison bit is set, no zero elements were found in the second operand.
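With the condition codes above, this instruction behaves much like one step of a vectorized `strcmp`. The sketch below is a behavioral model under the same assumptions as a simple reference (16-byte big-endian vectors, `ValueError` for a reserved ES, CC flag assumed one, hypothetical names). Following the definition of condition code 0, it treats the zero search as finding a zero present in both operands ahead of any mismatch.

```python
VECTOR_BYTES = 16
ELEMENT_BYTES = {0: 1, 1: 2, 2: 4}  # ES code -> element size in bytes


def split_elements(vec, es):
    """Split a 16-byte vector into unsigned big-endian elements of size ES."""
    if es not in ELEMENT_BYTES:
        raise ValueError("specification exception: reserved ES value")
    n = ELEMENT_BYTES[es]
    return [int.from_bytes(vec[i:i + n], "big") for i in range(0, VECTOR_BYTES, n)]


def index_result(byte_index):
    """First operand: the byte index in byte seven, zeros elsewhere."""
    result = bytearray(VECTOR_BYTES)
    result[7] = byte_index
    return bytes(result)


def vector_find_element_not_equal(v2, v3, es, zs=False):
    """Return (first operand, condition code) as if the CC flag were one."""
    a, b = split_elements(v2, es), split_elements(v3, es)
    n = ELEMENT_BYTES[es]
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            # CC 1: the element in VR2 is smaller; CC 2: it is larger.
            return index_result(i * n), (1 if x < y else 2)
        if zs and x == 0:          # equal zero elements end both strings
            return index_result(i * n), 0
    return index_result(VECTOR_BYTES), 3


r, cc = vector_find_element_not_equal(b"abcd" + bytes(12), b"abzd" + bytes(12), 0)
print(r[7], cc)  # 2 1: "c" < "z" at byte 2
```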
[0168] If bit 3 of field M5 is zero, the code remains unchanged.
Program exceptions:
• Data with DXC FE, vector register
• Operation if the vector extension facility is not installed
• Specification (ES reserved value)
• Transaction constraint
Extended mnemonics:
VECTOR STRING RANGE COMPARE

[0169] Proceeding from left to right, the unsigned binary integer elements in the second operand are compared with ranges of values defined by even-odd pairs of elements in the third operand. The third operand, combined with the control values from the fourth operand, defines the range of comparisons to be performed. If an element matches any of the ranges specified by the third and fourth operands, it is considered to be a match.
[0170] If the Result Type (RT) flag in field M6 is zero, the bit positions of the element in the first operand corresponding to the element being compared in the second operand are set to ones if the element matches any of the ranges; otherwise, they are set to zeros.
[0171] If the Result Type (RT) flag in field M6 is set to one, the byte index of the first element in the second operand that matches any of the ranges specified by the third and fourth operands, or that matches zero if the ZS flag is set to one, is placed in byte seven of the first operand, and zeros are stored in the remaining bytes. If no elements match, an index equal to the number of bytes in a vector is placed in byte seven of the first operand, and zeros are stored in the remaining bytes.
[0172] The Zero Search (ZS) flag in field M6, if set to one, adds a comparison with zero of the elements of the second operand to the ranges provided by the third and fourth operands. If a zero comparison is true in a lower-indexed element than any other true comparison, the condition code is set to zero.
[0173] Operands contain elements of the size specified by the Element Size control in field M5.
[0174] The elements of the fourth operand have the following format:
[0175] If ES equals 0:

[0176] If ES equals 1:

[0177] If ES equals 2:

[0178] The bits in the elements of the fourth operand are defined as follows:
• Equal (EQ): when one, a comparison for equality is made.
• Greater than (GT): when one, a greater-than comparison is performed.
• Less than (LT): when one, a less-than comparison is performed.
• All other bits are reserved and must be zero to ensure future compatibility.
[0179] Control bits can be used in any combination. If none of the bits are set, the comparison will always produce a false result. If all bits are set, the comparison will always produce a true result.
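The control-bit rules can be checked with a small model of the per-element range test. This sketch rests on assumptions not spelled out in the text: each even-odd pair of third-operand elements forms one range, an element is in that range only if both of its comparisons are true, and each comparison is true if any enabled relation (EQ, GT, LT) holds, which agrees with the always-false and always-true cases just described. The IN, RT, ZS, and CC flags are not modeled, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    """Control bits of one fourth-operand element (hypothetical representation)."""
    eq: bool = False
    gt: bool = False
    lt: bool = False


def compare(x: int, bound: int, ctl: Control) -> bool:
    """One comparison: true if any enabled relation holds (no bits set -> False)."""
    return (ctl.eq and x == bound) or (ctl.gt and x > bound) or (ctl.lt and x < bound)


def range_compare(values, bounds, controls):
    """Per element of `values`, True if it lies in any even-odd range of `bounds`."""
    return [
        any(compare(x, bounds[i], controls[i]) and
            compare(x, bounds[i + 1], controls[i + 1])
            for i in range(0, len(bounds), 2))
        for x in values
    ]


GE = Control(eq=True, gt=True)   # x >= lower bound
LE = Control(eq=True, lt=True)   # x <= upper bound
# ASCII digits and uppercase letters as two ranges.
print(range_compare(b"A1z", [0x30, 0x39, 0x41, 0x5A], [GE, LE, GE, LE]))
# [True, True, False]
```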
[0180] The M5 field specifies the element size control (ES). The ES control specifies the size of the elements in the vector register operands. If a reserved value is specified, a specification exception is recognized.
0 - byte
1 - halfword
2 - word
3-15 - reserved
[0181] Field M6 has the following format:

[0182] The bits of field M6 are defined as follows:
• Invert result (IN): if zero, the comparison proceeds with the pair of values in the control vector. If one, the result of the pairs of comparisons in the ranges is inverted.
• Result type (RT): if zero, each resulting element is a mask of all range comparisons on that element. If one, an index is stored in byte seven of the first operand, and zeros are stored in the remaining bytes.
• Zero search (ZS): if one, each element of the second operand is also compared with zero.
• Condition code set (CC): if zero, the condition code is not set and remains unchanged. If one, the condition code is set as specified in the following section.
[0183] A specification exception is recognized and no other action is taken if any of the following occurs:
1. Field M5 contains a value from 3 to 15.
Resulting condition code:
[0184] 0 If ZS = 1, a zero was found in a lower-indexed element than any range match
[0185] 1 Match found
[0186] 2 -
[0187] 3 No match found
Program exceptions:
• Data with DXC FE, vector register
• Operation if the vector extension facility is not installed
• Specification (ES reserved value)
• Transaction constraint
Extended mnemonics:

[0188] Figure 23-1: example with ES = 1 and ZS = 0; VR1(a) shows the results with RT = 0 and VR1(b) the results with RT = 1.
LOAD COUNT TO BLOCK BOUNDARY

[0192] A 32-bit unsigned binary integer containing the number of bytes that can be loaded from the second-operand location without crossing a specified block boundary, capped at sixteen, is placed in the first operand.
[0193] The offset is treated as a 12-bit unsigned integer.
[0194] The second-operand address is not used to address data.
[0195] The M3 field specifies a code that is used to signal the CPU as to the block boundary size used to compute the number of bytes that can be loaded. If a reserved value is specified, a specification exception is recognized.
[0196] Code Boundary
0 64-byte
1 128-byte
2 256-byte
3 512-byte
4 1K-byte
5 2K-byte
6 4K-byte
7-15 reserved
[0197] Resulting condition code:
0 Operand one is sixteen
1 -
2 -
3 Operand one is less than sixteen
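The count this instruction produces reduces to simple arithmetic on the second-operand address: the distance to the next multiple of the block size, capped at sixteen. The sketch assumes the address has already been formed from base, index, and displacement, uses the boundary codes listed above, and raises `ValueError` in place of a specification exception; the function name is hypothetical.

```python
# M3 boundary code -> block size in bytes, per the code table.
BLOCK_SIZE = {0: 64, 1: 128, 2: 256, 3: 512, 4: 1024, 5: 2048, 6: 4096}


def load_count_to_block_boundary(address: int, m3: int) -> tuple[int, int]:
    """Return (first operand, condition code) for the given second-operand address."""
    if m3 not in BLOCK_SIZE:
        raise ValueError("specification exception: reserved boundary code")
    size = BLOCK_SIZE[m3]
    # Bytes from the address up to (not including) the next block boundary.
    to_boundary = size - (address % size)
    count = min(16, to_boundary)
    return count, (0 if count == 16 else 3)


print(load_count_to_block_boundary(0x1FF8, 6))  # (8, 3): 8 bytes left in the 4K block
```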
[0199] Program exceptions:
• Operation if the vector extension facility is not installed
• Specification
[0200] Programming note: LOAD COUNT TO BLOCK BOUNDARY is expected to be used in conjunction with VECTOR LOAD TO BLOCK BOUNDARY to determine the number of bytes that were loaded.
VECTOR LOAD GR FROM VR ELEMENT

[0201] The element of the third operand, of the size specified by the ES value in field M4 and indexed by the second-operand address, is placed in the first-operand location. The third operand is a vector register. The first operand is a general register. If the index specified by the second-operand address is greater than the highest-numbered element in the third operand for the specified element size, the data in the first operand is unpredictable.
[0202] If the vector register element is smaller than a doubleword, the element is right-aligned in the 64-bit general register, and zeros fill the remaining bits.
[0203] The second-operand address is not used to address data; instead, the rightmost 12 bits of the address are used to specify the index of an element in the third operand.
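Because the second-operand address only supplies an element index, this instruction amounts to an indexed extract with right alignment. A minimal sketch, assuming 16-byte big-endian vector contents, modeling the unpredictable out-of-range result as `None`, raising `ValueError` for a reserved ES, and using a hypothetical function name:

```python
ELEMENT_BYTES = {0: 1, 1: 2, 2: 4, 3: 8}  # ES code -> bytes (byte .. doubleword)


def load_gr_from_vr_element(v3: bytes, address: int, es: int):
    """Return the 64-bit general-register value, or None if unpredictable."""
    if es not in ELEMENT_BYTES:
        raise ValueError("specification exception: reserved ES value")
    n = ELEMENT_BYTES[es]
    index = address & 0xFFF            # rightmost 12 bits select the element
    if index >= 16 // n:               # beyond the highest-numbered element
        return None
    # int.from_bytes right-aligns the element; the high-order bits are zero.
    return int.from_bytes(v3[index * n:(index + 1) * n], "big")


print(hex(load_gr_from_vr_element(bytes(range(16)), 3, 1)))  # 0x607
```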
[0204] The M4 field specifies the element size control (ES) . The ES control specifies the size of elements in the vector register operands. If a reserved value is specified, a specification exception is recognized.
[0205] 0 - byte
[0206] 1 - half-word
[0207] 2 - word
[0208] 3 - double word
[0209] 4-15 - reserved
Resulting condition code: The code remains unchanged.
[0210] Program exceptions:
• Data with DXC FE, vector register
• Operation if the vector extension facility is not installed
• Specification (ES reserved value)
• Transaction constraint
Extended mnemonics:
VECTOR LOAD TO BLOCK BOUNDARY

[0211] The first operand is loaded, starting at the byte element with index zero, with bytes from the second operand. If a boundary condition is encountered, the rest of the first operand is unpredictable. Access exceptions are not recognized for bytes that are not loaded.
[0212] The offset for VLBB is treated as a 12-bit unsigned integer.
[0213] The M3 field specifies a code that is used to signal the CPU as to the block boundary size to load to. If a reserved value is specified, a specification exception is recognized.
Code Boundary
0 64-byte
1 128-byte
2 256-byte
3 512-byte
4 1K-byte
5 2K-byte
6 4K-byte
7-15 reserved
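Used together with the byte count to the boundary, this instruction lets a loop scan a zero-terminated string in 16-byte steps without touching storage past the block that holds the terminator. The sketch assumes a flat `bytes` memory and a default 4K boundary (code 6), models the unpredictable bytes as `None`, raises `IndexError` in place of an access exception, and uses a plain byte loop as a stand-in for a vector zero search; all names are hypothetical.

```python
BLOCK_SIZE = {0: 64, 1: 128, 2: 256, 3: 512, 4: 1024, 5: 2048, 6: 4096}


def bytes_to_boundary(address: int, m3: int) -> int:
    """Bytes loadable before the next block boundary, capped at 16."""
    if m3 not in BLOCK_SIZE:
        raise ValueError("specification exception: reserved boundary code")
    size = BLOCK_SIZE[m3]
    return min(16, size - address % size)


def vector_load_to_block_boundary(memory: bytes, address: int, m3: int) -> list:
    """Load up to 16 bytes, stopping at the block boundary."""
    count = bytes_to_boundary(address, m3)
    if address + count > len(memory):
        raise IndexError("access exception on a byte that would be loaded")
    # Bytes beyond the boundary are unpredictable; modeled here as None.
    return list(memory[address:address + count]) + [None] * (16 - count)


def string_length(memory: bytes, address: int, m3: int = 6) -> int:
    """Length of a zero-terminated string, one block-bounded vector at a time."""
    length = 0
    while True:
        vec = vector_load_to_block_boundary(memory, address + length, m3)
        count = bytes_to_boundary(address + length, m3)
        for i in range(count):           # stand-in for a vector zero search
            if vec[i] == 0:
                return length + i
        length += count


page = bytearray(4096)                    # only one 4K page is accessible
page[4090:4093] = b"abc"                  # terminator at 4093, near the page end
print(string_length(page, 4090))          # 3, without reading past byte 4095
```

A plain 16-byte load at address 4090 would reach 10 bytes into the next page; the block-bounded load stops at byte 4095, so no access exception can arise from bytes the program never needed.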
[0214] Resulting condition code: code remains unchanged.
[0215] Program exceptions:
• Access (fetch, operand 2)
• Data with DXC FE, vector register
• Operation if the vector extension facility is not installed
• Specification (reserved block boundary code)
Programming notes
[0216] In certain circumstances, data may be loaded past the block boundary. However, this only happens if there are no access exceptions on that data.
VECTOR STORE

[0217] The 128-bit value in the first operand is stored in the storage location specified by the second operand. The offset for VST is treated as a 12-bit unsigned integer.
[0218] Resulting condition code: the code remains unchanged.
[0219] Program exceptions:
• Access (store, operand 2)
• Data with DXC FE, vector register
• Operation if the vector extension facility is not installed
• Transaction constraint
VECTOR STORE WITH LENGTH

[0220] Proceeding from left to right, bytes from the first operand are stored in the second-operand location. The third operand, specified in a general register, contains a 32-bit unsigned integer whose value is the highest-indexed byte to store. If the third operand contains a value greater than or equal to the highest byte index of the vector, all bytes of the first operand are stored.
[0221] Access exceptions are recognized only for bytes that are stored.
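The length handling can be sketched as below: the third operand is a highest byte index, not a byte count, so the number of bytes stored is one more than that index, capped at sixteen. The sketch assumes a `bytearray` memory, raises `IndexError` in place of an access exception (and only for bytes that would actually be stored), and uses a hypothetical function name.

```python
def vector_store_with_length(v1: bytes, memory: bytearray, address: int,
                             highest_index: int) -> int:
    """Store bytes 0..highest_index of the 16-byte first operand; return the count."""
    highest_index &= 0xFFFF_FFFF              # third operand: 32-bit unsigned integer
    count = min(highest_index, 15) + 1        # index 15 or more stores all 16 bytes
    if address + count > len(memory):
        raise IndexError("access exception on a byte being stored")
    memory[address:address + count] = v1[:count]
    return count


mem = bytearray(32)
vector_store_with_length(bytes(range(1, 17)), mem, 4, 3)
print(mem[3:9].hex())  # 000102030400
```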
[0222] The offset for VECTOR STORE WITH LENGTH is treated as a 12-bit unsigned integer.
[0223] Resulting Condition Code: Condition code remains unchanged.
[0224] Program exceptions:
• Access (store, operand 2)
• Data with DXC FE, vector register
• Operation if the vector extension facility is not installed
Description of RXB
[0225] All vector instructions have a field in bits 36-39 of the instruction, labeled RXB. This field contains the most significant bits for all vector-register-designated operands. Bits for register designations not specified by the instruction are reserved and must be set to zero; otherwise, the program may not operate compatibly in the future. The most significant bit is concatenated to the left of the four-bit register designation to create the five-bit vector register designation.
[0226] Bits are defined as follows:
[0227] 0. Most significant bit for vector register designation in bits 8-11 of the instruction.
[0228] 1. Most significant bit for vector register designation in bits 12-15 of the instruction.
[0229] 2. Most significant bit for vector register designation in bits 16-19 of the instruction.
[0230] 3. Most significant bit for vector register designation in bits 32-35 of the instruction.
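Forming a five-bit vector register number from RXB can be sketched as below. The sketch assumes the z/Architecture convention that bits are numbered from the left, so RXB bit 0 is the leftmost (most significant) of its four bits, and it takes the already-extracted 4-bit RXB value and 4-bit register field as inputs. The function name and its `position` parameter (0 to 3, matching the list above) are hypothetical.

```python
def vector_register_number(rxb: int, position: int, reg_field: int) -> int:
    """Combine one RXB bit with a 4-bit register field into a 5-bit number.

    rxb: the 4-bit RXB field (RXB bit 0 is its leftmost bit)
    position: which RXB bit applies (0: bits 8-11, 1: bits 12-15,
              2: bits 16-19, 3: bits 32-35 of the instruction)
    reg_field: the 4-bit register designation from that instruction field
    """
    if not (0 <= rxb <= 0xF and 0 <= reg_field <= 0xF and 0 <= position <= 3):
        raise ValueError("field out of range")
    msb = (rxb >> (3 - position)) & 1   # RXB bit 0 is the most significant bit
    return (msb << 4) | reg_field       # concatenate on the left: result is 0..31
```

For instance, an RXB value of binary 1000 with a register field of 5 in bits 8-11 designates vector register 21, while the same RXB value leaves a register field in bits 12-15 unchanged.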
[0231] Vector registers and instructions can be used only if both the vector enablement control (bit 46) and the AFP-register control (bit 45) in control register 0 are set to one. If the vector facility is installed and a vector instruction is executed without these enablement bits set, a data exception with DXC FE hex is recognized. If the vector facility is not installed, an operation exception is recognized.
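The enablement rule above can be sketched as a small decision function. It assumes the z/Architecture convention that the 64 bits of control register 0 are numbered 0 (leftmost) to 63 (rightmost). The function names and the returned strings are hypothetical labels, not architected values.

```python
def cr0_bit(cr0: int, bit: int) -> bool:
    """Return bit `bit` of a 64-bit CR0 value, numbering bits 0 (left) to 63 (right)."""
    return bool((cr0 >> (63 - bit)) & 1)


def vector_instruction_outcome(cr0: int, facility_installed: bool) -> str:
    """Decide whether a vector instruction may execute under the rule above."""
    if not facility_installed:
        return "operation exception"
    afp_enabled = cr0_bit(cr0, 45)     # AFP-register control
    vector_enabled = cr0_bit(cr0, 46)  # vector enablement control
    if afp_enabled and vector_enabled:
        return "execute"
    return "data exception, DXC FE hex"
```

Because bit numbering runs from the left, bit 45 corresponds to the integer mask `1 << 18` and bit 46 to `1 << 17`; getting this conversion wrong is a common source of errors when emulating such controls.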

Claims (11)
[0001]
1. Method for executing a machine instruction in a central processing unit, characterized in that it comprises the steps of: obtaining, by a processor, a machine instruction for execution, the machine instruction being defined for computer execution according to a computer architecture, the machine instruction comprising: at least one opcode field for providing an opcode, the opcode identifying a load to block boundary operation; a register field to be used to designate a register, the register comprising a first operand; at least one field for locating a second operand in main memory; and a block boundary type indicator for indicating a specified type of block boundary of the second operand; and executing the machine instruction, the execution comprising: loading only bytes of the first operand with corresponding bytes of the second operand that are within a block of main memory, a boundary of the main memory block being dynamically determined based on the specified type of block boundary and one or more characteristics of the processor, wherein the loading comprises loading, from a block of the second operand, a variable amount of data into the first operand while ensuring that only data within the block of the second operand is loaded into the first operand, wherein the loading from the block of the second operand starts at a start address in the block of the second operand, the start address being provided by the machine instruction, and wherein the loading ends at or before a block boundary determined for the block of the second operand, wherein the variable amount of data loaded is based on the start address and the determined block boundary, the determined block boundary being dynamically determined based on the specified type of block boundary and the one or more characteristics of the processor, and wherein the variable amount of data is the minimum of the number of bytes in the first operand and the number of bytes that can be loaded up to the determined block boundary.
[0002]
2. Method according to claim 1, characterized in that the at least one field comprises a displacement field, a base field, and an index field, the base field and the index field being for locating general registers whose contents are to be added to the contents of the displacement field to form an address of the second operand.
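Second-operand address formation from the index, base, and displacement fields can be sketched as follows. The sketch relies on assumptions not stated in the claim: as in z/Architecture, a register number of 0 in the base or index field contributes zero rather than the contents of general register 0; the 12-bit displacement is unsigned, as the description states for VST and VSTL; and addresses wrap modulo 2^64 in 64-bit addressing mode. The function and parameter names are hypothetical.

```python
from typing import Sequence

ADDRESS_MASK_64 = (1 << 64) - 1  # 64-bit addressing mode (assumption)


def second_operand_address(gprs: Sequence[int], x2: int, b2: int, d2: int) -> int:
    """Form the second-operand address as (X2) + (B2) + D2.

    gprs: the 16 general registers
    x2, b2: index and base register numbers; 0 means "no register"
    d2: the 12-bit displacement, treated as an unsigned integer
    """
    if not 0 <= d2 <= 0xFFF:
        raise ValueError("displacement must fit in 12 unsigned bits")
    index = gprs[x2] if x2 != 0 else 0
    base = gprs[b2] if b2 != 0 else 0
    return (index + base + d2) & ADDRESS_MASK_64  # wrap around at 2**64
```

Note that general register 0 may hold a nonzero value yet still contribute nothing when named as base or index, which is why the sketch tests the register number rather than the register contents.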
[0003]
3. Method according to claim 1, characterized in that the machine instruction further comprises a mask field, the mask field comprising a block boundary type indicator.
[0004]
4. Method according to claim 1, characterized in that the one or more characteristics comprise one of a processor cache line size or a processor page size.
[0005]
5. Method according to claim 1, characterized in that the execution comprises determining a block boundary using an address of the second operand, wherein the address is used in a lookup in a data structure to determine the block boundary.
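One way the block boundary might be determined dynamically is sketched below: the boundary type selects a processor characteristic (cache line size or page size, as in claim 4), and for the page type the second-operand address is looked up in a data structure (as in claim 5). Everything specific here is a hypothetical illustration. The claims do not define the boundary-type encodings, the `ProcessorCharacteristics` fields, the example sizes, or the layout of the region table of page sizes.

```python
import bisect
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class BoundaryType(Enum):
    CACHE_LINE = "cache line"  # hypothetical boundary-type names
    PAGE = "page"


@dataclass
class ProcessorCharacteristics:
    cache_line_size: int = 256      # hypothetical values, in bytes
    default_page_size: int = 4096
    # Hypothetical lookup table of (region start, region end, page size),
    # sorted by region start, e.g. regions backed by 1 MiB large pages.
    page_regions: List[Tuple[int, int, int]] = field(default_factory=list)

    def page_size_for(self, address: int) -> int:
        """Search the region table for the page size that maps `address`."""
        starts = [start for start, _, _ in self.page_regions]
        i = bisect.bisect_right(starts, address) - 1
        if i >= 0:
            start, end, size = self.page_regions[i]
            if start <= address < end:
                return size
        return self.default_page_size


def block_boundary_size(kind: BoundaryType, address: int,
                        cpu: ProcessorCharacteristics) -> int:
    """Return the size of the block whose boundary the load must not cross."""
    if kind is BoundaryType.CACHE_LINE:
        return cpu.cache_line_size
    return cpu.page_size_for(address)
```

The point of the design is that the same instruction encoding yields different boundaries on processors with different cache line or page sizes, which is what distinguishes a dynamically determined boundary from one fixed in the instruction.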
[0006]
6. Method according to claim 1, characterized in that an address of the second operand is a start address in memory from which the data is to be loaded into the first operand, and in that the execution further comprises determining an end address at which loading is to stop, wherein loading stops at the end address.
[0007]
7. Method according to claim 6, characterized in that determining the end address comprises calculating the end address as follows:

$$\text{end address} = \min\bigl(\text{start address} + (\text{boundary size} - (\text{start address} \mathbin{\mathrm{AND}} \mathrm{NOT}\ \text{boundary mask})),\ \text{start address} + \text{register size}\bigr)$$

where boundary size is the block boundary size, boundary mask equals 0 - boundary size, and register size is a specified register size.
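The end-address calculation of claim 7 can be checked with a short Python sketch, which also performs the block-bounded load of claim 1 against a simulated memory. It assumes a 16-byte register (register size = 16), 64-bit address arithmetic, and a power-of-two boundary size, and it zero-fills the register bytes that are not loaded purely for simplicity. The function names are hypothetical, and the boundary size would in practice come from the dynamic determination described in claims 1 and 4.

```python
ADDR_MASK = (1 << 64) - 1  # 64-bit address arithmetic (assumption)
REGISTER_SIZE = 16         # bytes in the first operand (assumption)


def end_address(start: int, boundary_size: int,
                register_size: int = REGISTER_SIZE) -> int:
    """Claim 7: min(start + (boundary size - (start AND NOT boundary mask)),
    start + register size)."""
    if boundary_size <= 0 or boundary_size & (boundary_size - 1):
        raise ValueError("boundary size must be a power of two")
    boundary_mask = (0 - boundary_size) & ADDR_MASK        # 0 - boundary size
    offset_in_block = start & ~boundary_mask & ADDR_MASK   # start AND NOT mask
    return min(start + (boundary_size - offset_in_block),
               start + register_size)


def load_to_block_boundary(memory: bytes, start: int, boundary_size: int) -> bytes:
    """Load bytes [start, end) into a 16-byte register image (claim 1)."""
    end = end_address(start, boundary_size)
    loaded = memory[start:end]                           # never crosses the boundary
    return loaded + bytes(REGISTER_SIZE - len(loaded))   # zero-fill (assumption)
```

For example, a load that starts 5 bytes before a 4 KB boundary (start address 0xFFB) stops at 0x1000 and brings in only 5 bytes, so it cannot touch the next page; a load starting exactly on the boundary is limited by the register size instead and ends at 0x1010.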
[0008]
8. Method according to claim 1, characterized in that loading comprises one of the following: loading the first operand from left to right, or loading the first operand from right to left.
[0009]
9. Method according to claim 7, characterized in that a loading direction is provided at runtime.
[0010]
10. Method according to claim 1, characterized in that the machine instruction further comprises an extension field to be used in designating one or more registers, and wherein the register field is combined with at least a portion of the extension field to designate the register.
[0011]
11. System, characterized in that it comprises means adapted to carry out all the steps of the method as defined in any one of claims 1 to 10.

Similar technologies:

Publication number | Publication date | Title

BR112014022725B1|2021-08-10|METHOD FOR LOADING DATA TO A DYNAMICALLY DETERMINED BORDER OF MEMORY

US9946542B2|2018-04-17|Instruction to load data up to a specified memory boundary indicated by the instruction

EP2825954B1|2021-06-02|Vector string range compare

EP2758891B1|2015-10-07|Vector find element equal instruction

US9710267B2|2017-07-18|Instruction to compute the distance to a specified memory boundary

BR112014022638B1|2022-01-04|METHOD, PHYSICAL SUPPORT AND EQUIPMENT TO TRANSFORM INSTRUCTION SPECIFICATIONS IN A COMPUTATIONAL ENVIRONMENT

IL232813A|2017-07-31|Vector find element not equal instruction

BR112014022727B1|2021-10-13|INSTRUCTION FOR LOADING DATA TO A SPECIFIC MEMORY BORDER INDICATED BY THE INSTRUCTION

BR112014022726B1|2022-02-15|METHOD OF EXECUTING A MACHINE INSTRUCTION ON A CENTRAL PROCESSING UNIT, COMPUTER-READABLE MEDIUM AND COMPUTER SYSTEM

Patent family:

Publication number | Publication date

EP2758868A1|2014-07-30|

HK1201353A1|2015-08-28|

WO2013135556A1|2013-09-19|

EP2758868B1|2018-07-04|

US9471312B2|2016-10-18|

US20130246740A1|2013-09-19|

CN104185839B|2017-06-06|

JP6278906B2|2018-02-14|

JP2015516618A|2015-06-11|

WO2013135556A9|2013-12-27|

US9459868B2|2016-10-04|

BR112014022725A2|2017-06-20|

PL2758868T3|2018-10-31|

US20130246762A1|2013-09-19|

US20160266903A1|2016-09-15|

US20160266904A1|2016-09-15|

US9959118B2|2018-05-01|

US9952862B2|2018-04-24|

CN104185839A|2014-12-03|

Cited documents:

Publication number | Filing date | Publication date | Applicant | Title

JPH0470662B2|1985-07-31|1992-11-11|Nippon Electric Co|

US5073864A|1987-02-10|1991-12-17|Davin Computer Corporation|Parallel string processor and method for a minicomputer|

US5222225A|1988-10-07|1993-06-22|International Business Machines Corporation|Apparatus for processing character string moves in a data processing system|

US5465374A|1993-01-12|1995-11-07|International Business Machines Corporation|Processor for processing data string by byte-by-byte|

AU6629894A|1993-05-07|1994-12-12|Apple Computer, Inc.|Method for decoding guest instructions for a host computer|

WO1994029790A1|1993-06-14|1994-12-22|Apple Computer, Inc.|Method and apparatus for finding a termination character within a variable length character string or a processor|

US5509129A|1993-11-30|1996-04-16|Guttag; Karl M.|Long instruction word controlling plural independent processor operations|

US6185629B1|1994-03-08|2001-02-06|Texas Instruments Incorporated|Data transfer controller employing differing memory interface protocols dependent upon external input at predetermined time|

US5551013A|1994-06-03|1996-08-27|International Business Machines Corporation|Multiprocessor for hardware emulation|

WO1996010103A1|1994-09-27|1996-04-04|Nkk Corporation|Galvanized steel sheet and process for producing the same|

US5790825A|1995-11-08|1998-08-04|Apple Computer, Inc.|Method for emulating guest instructions on a host computer through dynamic recompilation of host instructions|

US5812147A|1996-09-20|1998-09-22|Silicon Graphics, Inc.|Instruction methods for performing data formatting while moving data between memory and a vector register file|

US5931940A|1997-01-23|1999-08-03|Unisys Corporation|Testing and string instructions for data stored on memory byte boundaries in a word oriented machine|

DE69804495T2|1997-11-24|2002-10-31|British Telecomm|INFORMATION MANAGEMENT AND RECOVERY OF KEY TERMS|

US6009261A|1997-12-16|1999-12-28|International Business Machines Corporation|Preprocessing of stored target routines for emulating incompatible instructions on a target processor|

US5881260A|1998-02-09|1999-03-09|Hewlett-Packard Company|Method and apparatus for sequencing and decoding variable length instructions with an instruction boundary marker within each instruction|

US6094695A|1998-03-11|2000-07-25|Texas Instruments Incorporated|Storage buffer that dynamically adjusts boundary between two storage areas when one area is full and the other has an empty data register|

US6334176B1|1998-04-17|2001-12-25|Motorola, Inc.|Method and apparatus for generating an alignment control vector|

US6308255B1|1998-05-26|2001-10-23|Advanced Micro Devices, Inc.|Symmetrical multiprocessing bus and chipset used for coprocessor support allowing non-native code to run in a system|

US20020147969A1|1998-10-21|2002-10-10|Richard A. Lethin|Dynamic optimizing object code translator for architecture emulation and dynamic optimizing object code translation method|

JP3564395B2|1998-11-27|2004-09-08|松下電器産業株式会社|Address generation device and motion vector detection device|

US6192466B1|1999-01-21|2001-02-20|International Business Machines Corporation|Pipeline control for high-frequency pipelined designs|

US8127121B2|1999-01-28|2012-02-28|Ati Technologies Ulc|Apparatus for executing programs for a first computer architechture on a computer of a second architechture|

US6189088B1|1999-02-03|2001-02-13|International Business Machines Corporation|Forwarding stored dara fetched for out-of-order load/read operation to over-taken operation read-accessing same memory location|

US6499116B1|1999-03-31|2002-12-24|International Business Machines Corp.|Performance of data stream touch events|

US6802056B1|1999-06-30|2004-10-05|Microsoft Corporation|Translation and transformation of heterogeneous programs|

US6381691B1|1999-08-13|2002-04-30|International Business Machines Corporation|Method and apparatus for reordering memory operations along multiple execution paths in a processor|

US6513107B1|1999-08-17|2003-01-28|Nec Electronics, Inc.|Vector transfer system generating address error exception when vector to be transferred does not start and end on same memory page|

US6513109B1|1999-08-31|2003-01-28|International Business Machines Corporation|Method and apparatus for implementing execution predicates in a computer processing system|

US6820195B1|1999-10-01|2004-11-16|Hitachi, Ltd.|Aligning load/store data with big/little endian determined rotation distance control|

US6449706B1|1999-12-22|2002-09-10|Intel Corporation|Method and apparatus for accessing unaligned data|

JP2001236249A|2000-02-24|2001-08-31|Nec Corp|Device and method for managing memory|

US6625724B1|2000-03-28|2003-09-23|Intel Corporation|Method and apparatus to support an expanded register set|

US6349361B1|2000-03-31|2002-02-19|International Business Machines Corporation|Methods and apparatus for reordering and renaming memory references in a multiprocessor computer system|

US6701424B1|2000-04-07|2004-03-02|Nintendo Co., Ltd.|Method and apparatus for efficient loading and storing of vectors|

US6408383B1|2000-05-04|2002-06-18|Sun Microsystems, Inc.|Array access boundary check by executing BNDCHK instruction with comparison specifiers|

JP3801987B2|2000-10-18|2006-07-26|コーニンクレッカフィリップスエレクトロニクスエヌヴィ|Digital signal processor|

US7487330B2|2001-05-02|2009-02-03|International Business Machines Corporations|Method and apparatus for transferring control in a computer system with dynamic compilation capability|

US7100026B2|2001-05-30|2006-08-29|The Massachusetts Institute Of Technology|System and method for performing efficient conditional vector operations for data parallel architectures involving both input and conditional vector values|

JP3900863B2|2001-06-28|2007-04-04|シャープ株式会社|Data transfer control device, semiconductor memory device and information equipment|

US6839828B2|2001-08-14|2005-01-04|International Business Machines Corporation|SIMD datapath coupled to scalar/vector/address/conditional data register file with selective subpath scalar processing mode|

US6907443B2|2001-09-19|2005-06-14|Broadcom Corporation|Magnitude comparator|

US6570511B1|2001-10-15|2003-05-27|Unisys Corporation|Data compression method and apparatus implemented with limited length character tables and compact string code utilization|

US20100274988A1|2002-02-04|2010-10-28|Mimar Tibet|Flexible vector modes of operation for SIMD processor|

US7089371B2|2002-02-12|2006-08-08|Ip-First, Llc|Microprocessor apparatus and method for prefetch, allocation, and initialization of a block of cache lines from memory|

US7441104B2|2002-03-30|2008-10-21|Hewlett-Packard Development Company, L.P.|Parallel subword instructions with distributed results|

US7373483B2|2002-04-02|2008-05-13|Ip-First, Llc|Mechanism for extending the number of registers in a microprocessor|

US7376812B1|2002-05-13|2008-05-20|Tensilica, Inc.|Vector co-processor for configurable and extensible processor architecture|

US6918010B1|2002-10-16|2005-07-12|Silicon Graphics, Inc.|Method and system for prefetching data|

US7103754B2|2003-03-28|2006-09-05|International Business Machines Corporation|Computer instructions for having extended signed displacement fields for finding instruction operands|

US20040215924A1|2003-04-28|2004-10-28|Collard Jean-Francois C.|Analyzing stored data|

US7035986B2|2003-05-12|2006-04-25|International Business Machines Corporation|System and method for simultaneous access of the same line in cache storage|

US20040250027A1|2003-06-04|2004-12-09|Heflinger Kenneth A.|Method and system for comparing multiple bytes of data to stored string segments|

US7539714B2|2003-06-30|2009-05-26|Intel Corporation|Method, apparatus, and instruction for performing a sign operation that multiplies|

US7610466B2|2003-09-05|2009-10-27|Freescale Semiconductor, Inc.|Data processing system using independent memory and register operand size specifiers and method thereof|

US7904905B2|2003-11-14|2011-03-08|Stmicroelectronics, Inc.|System and method for efficiently executing single program multiple data programs|

GB2411973B|2003-12-09|2006-09-27|Advanced Risc Mach Ltd|Constant generation in SMD processing|

US20060095713A1|2004-11-03|2006-05-04|Stexar Corporation|Clip-and-pack instruction for processor|

JP4837305B2|2005-05-10|2011-12-14|ルネサスエレクトロニクス株式会社|Microprocessor and control method of microprocessor|

US7421566B2|2005-08-12|2008-09-02|International Business Machines Corporation|Implementing instruction set architectures with non-contiguous register file specifiers|

US20070106883A1|2005-11-07|2007-05-10|Choquette Jack H|Efficient Streaming of Un-Aligned Load/Store Instructions that Save Unused Non-Aligned Data in a Scratch Register for the Next Instruction|

US9436468B2|2005-11-22|2016-09-06|Intel Corporation|Technique for setting a vector mask|

US8010953B2|2006-04-04|2011-08-30|International Business Machines Corporation|Method for compiling scalar code for a single instruction multiple data execution engine|

US7565514B2|2006-04-28|2009-07-21|Freescale Semiconductor, Inc.|Parallel condition code generation for SIMD operations|

CN101097488B|2006-06-30|2011-05-04|2012244安大略公司|Method for learning character fragments from received text and relevant hand-hold electronic equipments|

US9069547B2|2006-09-22|2015-06-30|Intel Corporation|Instruction and logic for processing text strings|

JP2008077590A|2006-09-25|2008-04-03|Toshiba Corp|Data transfer device|

US7536532B2|2006-09-27|2009-05-19|International Business Machines Corporation|Merge operations of data arrays based on SIMD instructions|

US7676659B2|2007-04-04|2010-03-09|Qualcomm Incorporated|System, method and software to preload instructions from a variable-length instruction set with proper pre-decoding|

US7991987B2|2007-05-10|2011-08-02|Intel Corporation|Comparing text strings|

CN101755265A|2007-05-21|2010-06-23|茵科瑞蒂梅尔有限公司|Interactive message editing system and method|

US20090063410A1|2007-08-29|2009-03-05|Nils Haustein|Method for Performing Parallel Data Indexing Within a Data Storage System|

US7895419B2|2008-01-11|2011-02-22|International Business Machines Corporation|Rotate then operate on selected bits facility and instructions therefore|

US7739434B2|2008-01-11|2010-06-15|International Business Machines Corporation|Performing a configuration virtual topology change and instruction therefore|

US7870339B2|2008-01-11|2011-01-11|International Business Machines Corporation|Extract cache attribute facility and instruction therefore|

US7877582B2|2008-01-31|2011-01-25|International Business Machines Corporation|Multi-addressable register file|

EP2245529A1|2008-02-18|2010-11-03|Sandbridge Technologies, Inc.|Method to accelerate null-terminated string operations|

DK176835B1|2008-03-07|2009-11-23|Jala Aps|Method of scanning, medium containing a program for carrying out the method and system for carrying out the method|

US20090282220A1|2008-05-08|2009-11-12|Mips Technologies, Inc.|Microprocessor with Compact Instruction Set Architecture|

US8386547B2|2008-10-31|2013-02-26|Intel Corporation|Instruction and logic for performing range detection|

US20120023308A1|2009-02-02|2012-01-26|Renesas Electronics Corporation|Parallel comparison/selection operation apparatus, processor, and parallel comparison/selection operation method|

JP5471082B2|2009-06-30|2014-04-16|富士通株式会社|Arithmetic processing device and control method of arithmetic processing device|

US8595471B2|2010-01-22|2013-11-26|Via Technologies, Inc.|Executing repeat load string instruction with guaranteed prefetch microcode to prefetch into cache for loading up to the last value in architectural register|

JP2011212043A|2010-03-31|2011-10-27|Fujifilm Corp|Medical image playback device and method, as well as program|

US20110314263A1|2010-06-22|2011-12-22|International Business Machines Corporation|Instructions for performing an operation on two operands and subsequently storing an original value of operand|

US8972698B2|2010-12-22|2015-03-03|Intel Corporation|Vector conflict instructions|

US9009447B2|2011-07-18|2015-04-14|Oracle International Corporation|Acceleration of string comparisons using vector instructions|

US9459864B2|2012-03-15|2016-10-04|International Business Machines Corporation|Vector string range compare|

US9268566B2|2012-03-15|2016-02-23|International Business Machines Corporation|Character data match determination by loading registers at most up to memory block boundary and comparing|

US9588762B2|2012-03-15|2017-03-07|International Business Machines Corporation|Vector find element not equal instruction|

US9454367B2|2012-03-15|2016-09-27|International Business Machines Corporation|Finding the length of a set of character data having a termination character|

US9454366B2|2012-03-15|2016-09-27|International Business Machines Corporation|Copying character data having a termination character from one memory location to another|

US9715383B2|2012-03-15|2017-07-25|International Business Machines Corporation|Vector find element equal instruction|

US9280347B2|2012-03-15|2016-03-08|International Business Machines Corporation|Transforming non-contiguous instruction specifiers to contiguous instruction specifiers|

US9459867B2|2012-03-15|2016-10-04|International Business Machines Corporation|Instruction to load data up to a specified memory boundary indicated by the instruction|

US9710266B2|2012-03-15|2017-07-18|International Business Machines Corporation|Instruction to compute the distance to a specified memory boundary|

US9459868B2|2012-03-15|2016-10-04|International Business Machines Corporation|Instruction to load data up to a dynamically determined memory boundary|