CorePy is designed to give developers direct access to the underlying processor from Python. CorePy provides four main components for interacting with the processor:
These components are used to develop synthetic programs. Synthetic programs are small computational kernels synthesized at run time and optimized for the data they will process. A single Python program can use many synthetic programs during its lifetime, much in the same way it can use many external libraries. Synthetic programs may be created using synthetic components. Synthetic components are reusable synthetic functions and classes.
Programming with CorePy is similar to programming in assembly language. Instead of using an assembler to generate object files, CorePy collects sequences of instructions, renders them to machine instructions, and makes them directly executable from a Python program. In fact, CorePy's pretty printing library can be used to output NASM formatted assembly code.
To get a feel for CorePy, let's start with a simple interactive session. First, make sure corepy is in your Python path and start the Python interpreter. Next, select the example for your architecture and enter it into the interpreter:
x86_64
    # Load the x86_64 instructions and environment
    >>> import corepy.arch.x86_64.isa as x86
    >>> import corepy.arch.x86_64.platform as x86_env
    Platform: linux.spre_linux_x86_64_64

    # Create a simple synthetic program
    >>> code = x86_env.InstructionStream()
    >>> code.add(x86.mov(code.gp_return, 12))

    # Execute the synthetic program
    >>> proc = x86_env.Processor()
    >>> result = proc.execute(code, mode='int')
    >>> print result
    12
PowerPC
    # Import the ppc instructions and run-time environment:
    >>> import corepy.arch.ppc.isa as ppc
    >>> import corepy.arch.ppc.platform as ppc_env
    Platform: linux.spre_linux_cell_64

    # Create a simple synthetic program
    >>> code = ppc_env.InstructionStream()
    >>> code.add(ppc.addi(code.gp_return, 0, 12))

    # Execute the synthetic program
    >>> proc = ppc_env.Processor()
    >>> result = proc.execute(code)
    >>> print result
    12
Cell SPU
    # Load the SPU instructions and environment
    >>> import corepy.arch.spu.isa as spu
    >>> import corepy.arch.spu.platform as spu_env
    Platform: linux.spre_linux_spu

    # Create a simple synthetic program
    >>> code = spu_env.InstructionStream()
    >>> code.add(spu.il(code.gp_return, 12))

    # Execute the synthetic program on an SPU
    >>> proc = spu_env.Processor()
    >>> result = proc.execute(code, mode='int')
    >>> print result
    12
NOTE: The above examples work correctly on CorePy 1.0, but no longer work on the trunk due to the New Code Composition Support. The following example is an equivalent x86_64 program that will work on the trunk:
x86_64
    # Load the x86_64 instructions and environment
    >>> import corepy.arch.x86_64.isa as x86
    >>> import corepy.arch.x86_64.platform as x86_env
    Platform: linux.spre_linux_x86_64_64

    # Create a simple synthetic program
    >>> prgm = x86_env.Program()
    >>> code = prgm.get_stream()
    >>> code.add(x86.mov(prgm.gp_return, 12))
    >>> prgm += code

    # Execute the synthetic program
    >>> proc = x86_env.Processor()
    >>> result = proc.execute(prgm, mode='int')
    >>> print result
    12
Congratulations! You have just created and executed your first synthetic program.
A few things to note in the above examples:
CorePy's programming model is processor agnostic. Of course, each processor has its own instruction set. But, if you learn the basics of CorePy for one processor platform, programming for another platform only requires learning the particulars of the new platform's ISA. All the basic features of ISA calling, code synthesis, and execution are the same on all platforms.
CorePy is designed to make programming at the processor level as easy as using a library. As with any library, there are a few basic data types that form the main programming model for CorePy. If you have programmed in assembly (or even machine) languages before, these will be familiar. If not, don't worry! The concepts are all fairly simple and easy to understand.
The next few sections describe the main components in the CorePy programming model, starting with the basics of working with registers and instructions, and progressing through program creation and execution.
Registers are the simplest data type in CorePy and form the foundation for all other data types. When a physical processor operates on data, it can generally only access data stored in its local registers. Load and store instructions move data between memory and registers. CorePy registers correspond directly to processor registers. Processors have a limited number of registers and some care must be taken when designing synthetic programs to ensure the registers are not used up. Compilers use complex register allocation algorithms to assign registers to instructions. CorePy leaves overall register management up to the developer, but does provide some features to simplify the task.
In CorePy, registers are 'owned' by instances of InstructionStream. Thus, each synthetic program has its own set of registers, much like each thread in a C program has a set of registers. Registers can be acquired and released through the InstructionStream. An exception is thrown if a register is requested when there are none remaining. Some platforms include multiple types of registers. These are requested using a type argument. The following examples show how to acquire different register types for a PPC InstructionStream:
    import corepy.arch.ppc.platform as env

    code = env.InstructionStream()

    # Acquire a general purpose (integer) register
    ra = code.acquire_register()

    # Acquire a floating point register
    fa = code.acquire_register('fp')

    # Acquire a VMX register
    va = code.acquire_register('vector')

    # Release the registers
    code.release_register(ra)
    code.release_register(fa)
    code.release_register(va)
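The acquire/release discipline described above can be pictured with a small stand-alone model. This is only an illustrative sketch in plain Python, not CorePy's implementation: the `RegisterPool` class, its method names, and the exception types are all hypothetical and chosen for this example.

```python
class RegisterPool(object):
    """Toy model of a per-stream register file (hypothetical, not CorePy code)."""

    def __init__(self, count):
        # Registers are identified by number; all of them start out free.
        self._free = list(range(count))
        self._in_use = set()

    def acquire(self):
        # Hand out the lowest-numbered free register, or fail loudly.
        if not self._free:
            raise RuntimeError("no registers remaining")
        reg = self._free.pop(0)
        self._in_use.add(reg)
        return reg

    def release(self, reg):
        # Only registers that are currently acquired may be released.
        if reg not in self._in_use:
            raise ValueError("register %d is not acquired" % reg)
        self._in_use.remove(reg)
        self._free.append(reg)
        self._free.sort()


pool = RegisterPool(2)      # a deliberately tiny register file
ra = pool.acquire()
rb = pool.acquire()

try:
    pool.acquire()          # nothing left, so this raises
    exhausted = False
except RuntimeError:
    exhausted = True

pool.release(ra)            # releasing makes the register available again
rc = pool.acquire()
```

Releasing registers as soon as a value is no longer needed keeps the pool from running dry in longer synthetic programs.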
Some InstructionStream classes contain special registers. The most common are return registers. These are the registers that values are placed in to return them from a function call. The gp_return and fp_return return registers on the PowerPC InstructionStream can be used for this purpose.
G4/G5 and Cell PowerPC processors have 32 general purpose, 32 floating point, and 32 AltiVec/VMX registers. Each SPU has 128 general purpose registers that can hold either integer or floating point vectors. x86 has 8 general purpose, 8 x87 floating point, and 8 SSE registers, while x86_64 extends the general purpose and SSE register counts to 16.
Instructions are the main interface to the processor. Instructions operate on two different types of data: data stored in registers, and data encoded directly into the instruction, called immediate operands. In CorePy, register operands must be acquired registers, while immediate operands can be any type convertible to int.
The operand order for the instructions matches the assembly order. As a general rule, the first operand is the destination, followed by the register operands and finally the immediate operands.
For instance, the SPU ai, or add immediate, instruction, adds the value in register A to a constant and stores the value in register D. The assembly and CorePy versions of the instruction are:
    ai D, A, 12          # Assembly
    spu.ai(D, A, 12)     # CorePy
One naming convention worth noting: some instructions are listed in the ISA manuals with an italic x at the end. These instructions appear in CorePy with the x. For example, the PowerPC add instruction is listed as addx. The corresponding CorePy instruction is ppc.addx(...). If enough people complain about this, I'll drop the x's.
Many instructions have bit flags that are commonly set to the same value every time the instruction is used. In assembly language, these bit flags are changed by using mnemonic forms of the instructions. Instead of using mnemonics, CorePy makes these flags optional keyword arguments. In all cases, the flags default to 0. For example, the full form of the addx instruction is
ppc.addx(D, A, B, oe = 0, rc = 0)
To call addx with the rc bit set to one, use
ppc.addx(D, A, B, rc = 1)
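To see why keyword flags can stand in for mnemonic forms, it helps to look at how the flag bits sit inside the instruction word. The sketch below encodes the PowerPC add instruction by hand, following the XO-form layout from the PowerPC architecture (primary opcode 31, extended opcode 266). The function name `encode_addx` is hypothetical, and this is not how CorePy renders instructions internally; it only illustrates the idea.

```python
def encode_addx(d, a, b, oe=0, rc=0):
    """Encode the PowerPC XO-form add instruction as a 32-bit word."""
    for reg in (d, a, b):
        if not 0 <= reg < 32:
            raise ValueError("register numbers must be in 0..31")
    word = 31 << 26           # primary opcode
    word |= d << 21           # destination register
    word |= a << 16           # first source register
    word |= b << 11           # second source register
    word |= (oe & 1) << 10    # OE flag: record overflow
    word |= 266 << 1          # extended opcode for add
    word |= rc & 1            # Rc flag: record result in the condition register
    return word


plain = encode_addx(3, 4, 5)             # assembly mnemonic: add  r3, r4, r5
record = encode_addx(3, 4, 5, rc=1)      # assembly mnemonic: add. r3, r4, r5
overflow = encode_addx(3, 4, 5, oe=1)    # assembly mnemonic: addo r3, r4, r5
```

Each mnemonic variant differs from the plain form by a single bit, which is why one function with defaulted keyword arguments can cover all of them.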
Instructions are collected into synthetic programs using InstructionStream instances. We have already seen how to use InstructionStreams to acquire registers. Instructions can be added to InstructionStreams in two ways: an explicit add and an implicit add mode.
The initial examples used the explicit add method:
code.add(ppc.addi(code.gp_return, 0, 42))
Of course, using code.add(...) adds some visual noise to the code and becomes cumbersome to type for longer synthetic programs. Active Code mode associates an InstructionStream with an ISA and automatically adds Instructions to the stream as they are created:
    code = env.InstructionStream()

    # Set the current instruction stream
    ppc.set_active_code(code)

    # The addi instruction is automatically added to the instruction stream
    ppc.addi(code.gp_return, 0, 42)
The get_active_code() method returns the current active instruction stream. This is useful for library development, when synthetic components may work on multiple instruction streams.
    old_active = spu.get_active_code()
    spu.set_active_code(code)

    spu.xor(x, x, x)
    spu.ai(x, x, 11)
    spu.ai(x, x, 31)

    spu.set_active_code(old_active)
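The save/set/restore pattern above is what lets a reusable component emit into a stream of its choosing without disturbing its caller. The following stand-alone model shows the idea in plain Python. The `Stream` and `ToyISA` classes and the `inst` and `emit_component` helpers are hypothetical stand-ins invented for this sketch, not CorePy classes; only the `set_active_code`/`get_active_code` names mirror the text.

```python
class Stream(object):
    """Toy instruction stream: just an ordered list of instructions."""

    def __init__(self):
        self.instructions = []

    def add(self, inst):
        self.instructions.append(inst)


class ToyISA(object):
    """Toy ISA namespace that remembers the currently active stream."""

    def __init__(self):
        self._active = None

    def set_active_code(self, code):
        self._active = code

    def get_active_code(self):
        return self._active

    def inst(self, name, *operands):
        # Build the instruction, then add it to the active stream, if any.
        built = (name,) + operands
        if self._active is not None:
            self._active.add(built)
        return built


isa = ToyISA()
outer, inner = Stream(), Stream()


def emit_component(code):
    # Save the caller's stream, emit into ours, and always restore.
    old_active = isa.get_active_code()
    isa.set_active_code(code)
    try:
        isa.inst("xor", "x", "x", "x")
        isa.inst("ai", "x", "x", 11)
    finally:
        isa.set_active_code(old_active)


isa.set_active_code(outer)
isa.inst("ai", "x", "x", 1)     # goes to outer
emit_component(inner)           # goes to inner, then restores outer
isa.inst("ai", "x", "x", 31)    # goes to outer again
```

Wrapping the restore in `try`/`finally` is a design choice for this sketch: it keeps the caller's active stream intact even if instruction creation raises.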
TODO - talk about iterating on IS's and ways to access instructions already added?
Once a synthetic program is built, it can be executed on a Processor instance. All versions of the PowerPC and SPU Processor objects support synchronous and asynchronous execution. Additionally, the PowerPC Processor object can return integer and floating point values stored in the gp/fp_return registers. Note that asynchronous execution is true multithreaded execution. If there are two processors available and two synthetic programs are executed, they will most likely execute on separate processors (the operating system makes the final decision).
    # Execution examples
    # ... create a synthetic program ...
    proc = env.Processor()

    # Synchronous (blocking) execution, return int
    result = proc.execute(code)

    # Synchronous (blocking) execution, return float
    result = proc.execute(code, mode = 'fp')

    # Asynchronous (non-blocking) execution, returns an id for the running program
    prog_id = proc.execute(code, async = True)

    # Stop/restart
    proc.suspend(prog_id)
    proc.resume(prog_id)

    # Wait for the prog to finish
    proc.join(prog_id)
TODO: Flesh this out, link to extended array
Key points:
Every CPU architecture supported by CorePy provides some sort of branching functionality. To facilitate easy use of branches, assembly-style labels are supported. Currently, these labels may only be used for branching purposes; expanded support for data referencing and other uses is planned.
Labels are Python objects generated by InstructionStream objects. A label is only valid for use with the InstructionStream that creates it; mixing labels across InstructionStream objects is invalid and behavior is undefined.
Labels are obtained via the InstructionStream get_label() method. Use it like the following:
lbl_loop = code.get_label("LOOP")
In the above example, a Label with the name LOOP is created and assigned to the lbl_loop variable. If get_label() is called again with LOOP as the label name, the exact same label object is returned. Note that this merely creates a Label object; nothing has been added to the synthesized code in the InstructionStream. To introduce the label into the instruction stream, do the following:
code.add(lbl_loop)
This causes the label to be added after the last instruction in the InstructionStream, and before any subsequent instructions or labels that may be added. Referencing the label in instructions is straightforward:
code.add(x86.jmp(lbl_loop))
It might seem a little strange that Labels are created and added to the InstructionStream separately. This is to allow for forward label references (the above example is a backwards label reference). A quick example of a forward reference:
    lbl_skip = code.get_label("SKIP")

    code.add(x86.jnz(lbl_skip))

    # more code generated here

    code.add(lbl_skip)
The exact branch instructions to be used depend on the architecture and desired behavior.
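Separating label creation from label placement works because branch targets only need to be known when the stream is rendered, not when the branch is added. The stand-alone model below sketches one way this can be done with two passes. It is an assumption-laden illustration, not CorePy's implementation: `Label`, `ToyStream`, and `render` are hypothetical names, instructions are plain tuples, and offsets are counted in instructions relative to the branch itself rather than in bytes.

```python
class Label(object):
    def __init__(self, name):
        self.name = name
        self.position = None    # unknown until the label is placed


class ToyStream(object):
    def __init__(self):
        self._labels = {}
        self._items = []

    def get_label(self, name):
        # The same name always yields the same Label object.
        if name not in self._labels:
            self._labels[name] = Label(name)
        return self._labels[name]

    def add(self, item):
        self._items.append(item)

    def render(self):
        # Pass 1: each label takes the index of the next instruction.
        index = 0
        for item in self._items:
            if isinstance(item, Label):
                item.position = index
            else:
                index += 1
        # Pass 2: replace label operands with relative offsets.
        rendered = []
        for item in self._items:
            if isinstance(item, Label):
                continue
            name, operand = item
            if isinstance(operand, Label):
                if operand.position is None:
                    raise ValueError("label %s was never added" % operand.name)
                operand = operand.position - len(rendered)
            rendered.append((name, operand))
        return rendered


code = ToyStream()
lbl_loop = code.get_label("LOOP")
lbl_skip = code.get_label("SKIP")

code.add(lbl_loop)
code.add(("jnz", lbl_skip))     # forward reference: SKIP is not placed yet
code.add(("nop", None))
code.add(("jmp", lbl_loop))     # backward reference
code.add(lbl_skip)

program = code.render()
```

A label that is referenced but never added is reported at render time, which is the earliest point at which the omission can be detected.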
Three labels are predefined within the InstructionStream class for convenience -- PROLOGUE, BODY, and EPILOGUE. These labels are placed at the beginning of their respective sections in the synthesized code. PROLOGUE is set right before any ABI instructions that may be needed. BODY is placed immediately after the prologue, just before any instructions that may be added to the stream. EPILOGUE is placed after the instructions forming the body of the InstructionStream, but before any ABI instructions needed for termination. The recommended method for returning early from a synthesized stream is to branch to the EPILOGUE label. These labels are accessible via the get_label method with their respective names, or are available via the lbl_prologue, lbl_body, and lbl_epilogue InstructionStream member variables.
The InstructionStream class has a built-in method for printing its contents:
code.print_code(hex = True, binary = True, pro = True, epi = True)
A number of keyword arguments are available for printing in different ways. The hex and binary keywords, if set to True, cause the machine code to be printed under each instruction in hexadecimal or binary, respectively. The pro and epi keywords may be used to enable printing of the prologue and epilogue code generated internally by the InstructionStream.
In addition, a generic printer module allows code to be printed using a plugin-like architecture to define different output syntaxes. The following example prints an InstructionStream using the 'default' syntax (similar output to the print_code() method above), but redirects it to a StringIO file descriptor:
    import StringIO
    import corepy.lib.printer as printer

    # Collect the printed code in an in-memory file object
    fd = StringIO.StringIO()
    printer.PrintInstructionStream(code, printer.Default(), fd = fd)
In the example above, the Default plugin was used. Although not given any arguments in this case, it supports the same keyword arguments as the print_code() method as shown above, but with different names: show_hex, show_binary, show_prologue, and show_epilogues. A few more keyword arguments are also supported: line_numbers, if set to True, causes a line number to be prefixed before each line of output. inst_prefix, which may be set to a string value, prefixes that string to every instruction.
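The plugin idea can be sketched independently of CorePy: a driver walks the stream and hands each instruction to a syntax object, which decides how the line looks. The model below is written for Python 3 and is purely illustrative. `ToyDefault`, `print_stream`, the tuple instruction format, and the line-number layout are hypothetical; only the `line_numbers` and `inst_prefix` option names are taken from the description above.

```python
import io


class ToyDefault(object):
    """Toy output syntax; option names mirror those described above."""

    def __init__(self, line_numbers=False, inst_prefix=""):
        self.line_numbers = line_numbers
        self.inst_prefix = inst_prefix

    def format(self, number, inst):
        # An instruction is a tuple: (name, operand, operand, ...).
        name, operands = inst[0], inst[1:]
        text = self.inst_prefix + name
        if operands:
            text += " " + ", ".join(str(op) for op in operands)
        if self.line_numbers:
            text = "%d: %s" % (number, text)
        return text


def print_stream(instructions, syntax, fd):
    # The driver only walks the stream; the plugin decides the syntax.
    for number, inst in enumerate(instructions, 1):
        fd.write(syntax.format(number, inst) + "\n")


stream = [("mov", "rax", 11), ("add", "rax", 31)]

fd = io.StringIO()
print_stream(stream, ToyDefault(line_numbers=True, inst_prefix="  "), fd)
output = fd.getvalue()
```

Supporting a new output syntax in this model means writing another class with a `format` method; the driver and the stream stay unchanged.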
Other output plugins, in varying degrees of completeness, are also available. The x86_64_Nasm (and x86_Nasm) plugin prints code in an NASM-compatible, Intel-style assembly syntax. In fact, this output may be compiled into an object file using NASM. The x86_64_Nasm plugin prints the prologue and epilogue by default. The function_name keyword argument, when set to a string value, causes a global directive and label to be emitted, with the string used as the label name. Example:
    import corepy.lib.printer as printer
    import corepy.arch.x86_64.platform as env
    import corepy.arch.x86_64.isa as x86
    from corepy.arch.x86_64.types.registers import *

    code = env.InstructionStream()
    x86.set_active_code(code)

    x86.mov(code.gp_return, 11)
    x86.add(code.gp_return, 31)

    printer.PrintInstructionStream(code, printer.x86_64_Nasm(function_name="foobar"))
The output looks like the following:
    BITS 64
    SECTION .text

    global foobar
    foobar:

    PROLOGUE:
        push rbp
        mov rbp, rsp
        push r15
        push r14
        push r13
        push r12
        push rbx

    BODY:
        mov rax, 11
        add rax, 31

    EPILOGUE:
        pop rbx
        pop r12
        pop r13
        pop r14
        pop r15
        leave
        ret
A similar SPU_Asm plugin is available, which outputs GAS-compatible SPU assembly syntax. A comment_chan keyword argument, if set to True, causes any wrch/rdch instructions in the stream to be commented. This is useful for feeding synthesized SPU code into the spu_timing tool (included in the Cell SDK) for performance tuning.