CorePy User Guide
CorePy Overview
CorePy is designed to give developers direct access to the underlying processor from Python. CorePy provides four main components for interacting with the processor:
- ISAs are libraries of instructions for a given instruction set architecture. A physical processor may support multiple ISAs.
- InstructionStreams are containers for sequences of instructions, called synthetic programs, and are responsible for managing operating-system-specific tasks for executing the instructions (e.g., ABI compliance).
- Processors execute synthetic programs, synchronously or asynchronously, and can pass parameters to and from synthetic programs.
- Memory Classes provide support for describing memory and moving data across memory boundaries (e.g., RAM to registers, main memory to SPU local store).
These components are used to develop synthetic programs. Synthetic programs are small computational kernels synthesized at run time and optimized for the data they will process. A single Python program can use many synthetic programs during its lifetime, much in the same way it can use many external libraries. Synthetic programs may be created using synthetic components. Synthetic components are reusable synthetic functions and classes.
CorePy provides a collection of synthetic components for common tasks. These include:
- Variables abstract basic register management and constant formation tasks.
- Expressions use Python objects and operator overloading to generate instruction sequences using a more natural expression syntax.
- Iterators use Python iterators to generate instructions for managing loops. The iterator classes support many different iteration semantics and allow for user-defined loop semantics (e.g., auto-simdization).
In addition, many examples demonstrate how to develop components for different domains, such as physics simulations and chemical informatics.
Hello, Processor!
To get a feel for CorePy, let's start with a simple interactive session. First, make sure corepy is in your Python path and start the Python interpreter. Import the ppc instructions and run-time environment:
% python
>>> import corepy.arch.ppc.isa as ppc
>>> import corepy.arch.ppc.platform as env
Platform: linux.spre_linux_cell_64
(By convention, the isa is aliased to the name of the architecture, in this case ppc, and the run-time library is aliased to env.)
Now, create a new InstructionStream and add an instruction to it:
>>> code = env.InstructionStream()
>>> code.add(ppc.addi(code.gp_return, 0, 42))
Finally, create a Processor and execute the instruction stream:
>>> proc = env.Processor()
>>> result = proc.execute(code)
>>> print result
42
Congratulations! You have just created and executed your first synthetic program.
Let's do the same thing on the SPU:
# Load the SPU instructions and environment
>>> import corepy.arch.spu.isa as spu
>>> import corepy.arch.spu.platform as spu_env
Platform: linux.spre_linux_spu
# Create a simple empty synthetic program
>>> code = spu_env.InstructionStream()
>>> code.add(spu.stop(0x200C))
# Execute the synthetic program on an SPU
>>> proc = spu_env.Processor()
>>> result = proc.execute(code)
>>> print result
12
Believe it or not, you just wrote and executed an SPU program interactively from Python!
CorePy Packages
CorePy is organized as a collection of packages, some shared and some platform-specific. The platform sub-hierarchies all live in the arch package and follow the same basic structure:
corepy/
  arch/
    [ppc,vmx,spu]/
      isa/
      lib/
      platform/
      types/
  examples/
  doc/
We've already seen the isa and platform packages in action. The isa package contains all the instructions for a particular ISA. The platform package contains the native code and Python code necessary to execute instruction streams on a given operating system. When imported, the platform package automatically selects the appropriate platform support code.
The types package contains support for variables and expressions. We'll introduce these in detail in a few sections.
The lib package contains additional libraries of synthetic components. Platforms with branch instructions will have an iterators module here. Most platforms also include a util module with other common operations.
Examples and Unit Tests
In addition to the files in the examples directory, almost every module in CorePy contains an extensive set of unit tests and examples at the end of each file. Feel free to explore the modules to learn more about how to use their features.
Documentation
Documentation is currently limited, but portions of the code are well commented and most files contain unit tests that serve as good usage examples.
CorePy Basics
CorePy is designed to make programming at the processor level as easy as using a library. As with any library, there are a few basic data types that form the main programming model for CorePy. If you have programmed in assembly (or even machine) languages before, these will be familiar. If not, don't worry! The concepts are all fairly simple and easy to understand.
Registers
Registers are the simplest data type in CorePy and form the foundation for all other data types. When a physical processor operates on data, it can generally only access data stored in its local registers. Load and store instructions move data between memory and registers. CorePy registers correspond directly to processor registers. Processors have a limited number of registers and some care must be taken when designing synthetic programs to ensure the registers are not used up. Compilers use complex register allocation algorithms to assign registers to instructions. CorePy leaves overall register management up to the developer, but does provide some features to simplify the task.
In CorePy, registers are 'owned' by instances of InstructionStream. Thus, each synthetic program has its own set of registers, much like each thread in a C program has a set of registers. Registers can be acquired and released through the InstructionStream. An exception is thrown if a register is requested when there are none remaining. Some platforms include multiple types of registers. These are requested using a type argument. The following examples show how to acquire different register types for a PPC InstructionStream:
import corepy.arch.ppc.platform as env
code = env.InstructionStream()
# Acquire a general purpose (integer) register
ra = code.acquire_register()
# Acquire a floating point register
fa = code.acquire_register('fp')
# Acquire a VMX register
va = code.acquire_register('vector')
# Release the registers
code.release_register(ra)
code.release_register(fa)
code.release_register(va)
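The acquire/release behaviour described here can be pictured with a small pure-Python model. This is only a sketch: RegisterPool and OutOfRegisters are hypothetical names made up for illustration, not CorePy classes, and the real InstructionStream may track its registers quite differently.

```python
class OutOfRegisters(Exception):
    """Raised when a register is requested and none remain (hypothetical)."""


class RegisterPool(object):
    """Toy model of per-stream register ownership; not CorePy's implementation."""

    def __init__(self, counts):
        # counts maps a register type name to how many registers of that
        # type exist, e.g. {'gp': 32, 'fp': 32, 'vector': 32}.
        self._free = dict((kind, list(range(n))) for kind, n in counts.items())

    def acquire_register(self, kind='gp'):
        if not self._free[kind]:
            raise OutOfRegisters('no free %s registers' % kind)
        return (kind, self._free[kind].pop())

    def release_register(self, reg):
        kind, number = reg
        self._free[kind].append(number)


pool = RegisterPool({'gp': 2, 'fp': 1})
ra = pool.acquire_register()
rb = pool.acquire_register()

exhausted = False
try:
    pool.acquire_register()      # a third request exceeds the two gp registers
except OutOfRegisters:
    exhausted = True

pool.release_register(ra)        # releasing makes the register available again
rc = pool.acquire_register()
```

Because each pool is an ordinary object, two pools (like two InstructionStreams) hand out registers independently of one another.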
Some InstructionStream classes contain special registers. The most common are return registers, which hold the values returned from a function call. The gp_return and fp_return registers on the PowerPC InstructionStream can be used for this purpose.
G4/G5 and Cell PowerPC processors have 32 general purpose, 32 floating point, and 32 AltiVec/VMX registers. Each SPU has 128 general purpose registers that can hold either integer or floating point vectors.
Instructions
Instructions are the main interface to the processor. Instructions operate on two different types of data: data stored in registers and data encoded directly into the instruction, called immediate operands. In CorePy, register operands must be acquired registers, while immediate operands can be any type convertible to int.
The operand order for the instructions matches the assembly order. As a general rule, the first operand is the destination, followed by the register operands and finally the immediate operands.
For instance, the SPU ai, or add immediate, instruction adds the value in register A to a constant and stores the result in register D. The assembly and CorePy versions of the instruction are:
ai D, A, 12 # Assembly
spu.ai(D, A, 12) # CorePy
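To make "encoded directly into the instruction" concrete, the sketch below packs the PowerPC addi instruction from the opening example into its 32-bit machine word by hand. The helper encode_addi is hypothetical and is not CorePy's encoder; the field layout (primary opcode 14, then the destination register, the source register, and a 16-bit signed immediate) follows the D-form described in the PowerPC ISA manual. Using register number 3 for the return value is an assumption based on the usual PowerPC ABIs.

```python
def encode_addi(rd, ra, simm):
    """Pack addi rD, rA, SIMM into a 32-bit PowerPC D-form word (illustrative)."""
    if not (0 <= rd < 32 and 0 <= ra < 32):
        raise ValueError('register numbers must be in 0..31')
    if not (-0x8000 <= simm <= 0x7FFF):
        raise ValueError('immediate must fit in 16 signed bits')
    opcode = 14                      # primary opcode for addi
    return (opcode << 26) | (rd << 21) | (ra << 16) | (simm & 0xFFFF)


# addi r3, 0, 42 (r3 is assumed to be the integer return register)
word = encode_addi(3, 0, 42)
print('0x%08X' % word)               # prints 0x3860002A
```

The register numbers and the immediate occupy separate bit fields of the same word, which is why immediates are limited to a fixed range while register operands are limited to valid register numbers.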
For each of the three supported architectures, all instructions in the 32-bit Instruction Set Architectures are included in CorePy. One naming convention worth noting: some instructions are listed in the ISA manuals with an italic x at the end. These instructions appear in CorePy with the x. For example, the PowerPC add instruction is listed as addx. The corresponding CorePy instruction is ppc.addx(...). If enough people complain about this, I'll drop the x's.
Many instructions have bit flags that are commonly set to the same value every time the instruction is used. In assembly language, these bit flags are changed by using mnemonic forms of the instructions. Instead of using mnemonics, CorePy makes these flags optional keyword arguments. In all cases, the flags default to 0. For example, the full form of the addx instruction is
ppc.addx(D, A, B, oe = 0, rc = 0)
To call addx with the rc bit set to one, use
ppc.addx(D, A, B, rc = 1)
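The flag bits sit at fixed positions in the instruction word, which is why they map naturally onto keyword arguments that default to 0. A minimal sketch, assuming the XO-form layout given in the PowerPC ISA manual for the instruction listed there as addx; encode_add is a hypothetical helper, not CorePy's encoder.

```python
def encode_add(rd, ra, rb, oe=0, rc=0):
    """Pack the PowerPC add instruction (XO-form) into a 32-bit word (illustrative)."""
    for reg in (rd, ra, rb):
        if not 0 <= reg < 32:
            raise ValueError('register numbers must be in 0..31')
    if oe not in (0, 1) or rc not in (0, 1):
        raise ValueError('flags are single bits')
    opcode, xo = 31, 266             # primary and extended opcodes for add
    return ((opcode << 26) | (rd << 21) | (ra << 16) | (rb << 11)
            | (oe << 10) | (xo << 1) | rc)


plain = encode_add(3, 4, 5)          # assembly mnemonic: add  r3, r4, r5
record = encode_add(3, 4, 5, rc=1)   # assembly mnemonic: add. r3, r4, r5
print('0x%08X 0x%08X' % (plain, record))   # prints 0x7C642A14 0x7C642A15
```

The two words differ only in the lowest bit, so the mnemonic forms used in assembly and the keyword form used here describe the same encoding.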
Building InstructionStreams
Instructions are collected into synthetic programs using InstructionStream instances. We have already seen how to use InstructionStreams to acquire registers. Instructions can be added to InstructionStreams in two ways: an explicit add method and an implicit add mode.
The initial examples used the explicit add method:
code.add(ppc.addi(code.gp_return, 0, 42))
Of course, this method adds some visual noise to the code and becomes cumbersome to type for longer synthetic programs. The implicit add mode associates an InstructionStream with an ISA and automatically adds instructions to the stream as they are created:
code = env.InstructionStream()
# Set the current instruction stream
ppc.set_active_code(code)
# The addi instruction is automatically added to the instruction stream
ppc.addi(code.gp_return, 0, 42)
Once complete, or at any time during construction, you can view the current contents of an InstructionStream using the print_code() method.
Execution
Once a synthetic program is built, it can be executed on a Processor instance. All versions of the PowerPC and SPU Processor objects support synchronous and asynchronous execution. Additionally, the PowerPC Processor object can return integer and floating point values stored in the gp_return and fp_return registers. Note that asynchronous execution is true multithreaded execution: if two processors are available and two synthetic programs are executed, they will most likely run on separate processors (the operating system makes the final decision).
# Execution examples
... create a synthetic program ...
proc = env.Processor()
# Synchronous (blocking) execution, return int
result = proc.execute(code)
# Synchronous (blocking) execution, return float
result = proc.execute(code, mode = 'fp')
# Asynchronous (non-blocking) execution
prog_id = proc.execute(code, mode = 'async')
# Stop/restart
proc.suspend(prog_id)
proc.resume(prog_id)
# Wait for the prog to finish
proc.join(prog_id)
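The blocking and non-blocking modes follow the familiar start-then-join pattern of threads. The sketch below models that pattern with Python's standard threading module. ToyProcessor and its kernel callables are hypothetical stand-ins: the real Processor runs native code rather than a Python function, and returning the result from join is a choice made for this toy only.

```python
import threading


class ToyProcessor(object):
    """Models synchronous vs. asynchronous execution (not CorePy's Processor)."""

    def __init__(self):
        self._results = {}

    def execute(self, kernel, mode='int'):
        if mode != 'async':
            return kernel()              # synchronous: run and return the value
        thread = threading.Thread(target=self._run, args=(kernel,))
        thread.start()                   # asynchronous: return a handle at once
        return thread

    def _run(self, kernel):
        # Runs inside the worker thread; store the result under its own handle.
        self._results[threading.current_thread()] = kernel()

    def join(self, handle):
        handle.join()                    # block until the kernel has finished
        return self._results.pop(handle)


proc = ToyProcessor()
sync_result = proc.execute(lambda: 42)
handle = proc.execute(lambda: 6 * 7, mode='async')
async_result = proc.join(handle)
```

The caller is free to do other work between the asynchronous execute and the join, which is the point of the non-blocking mode.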
Sharing Data with Synthetic Programs
Key points:
- Data is explicitly loaded into registers by synthetic programs using the load instructions.
- Data can be created in Python using any form of memory buffer, e.g., array, Numeric array.
- Memory addresses can be hard coded into the instruction sequence or passed as parameters.
- InstructionStream.add_storage(data) keeps a reference to arbitrary data objects to avoid having them garbage collected.
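For the key points above, Python's standard array module already exposes what a synthetic program needs: a contiguous buffer and its address. The sketch below uses only the standard library. How the address is then handed to a synthetic program (as a parameter or hard coded into an instruction) is not shown, and the storage list is just a hypothetical stand-in for what add_storage does.

```python
import array

# A contiguous buffer of four C unsigned ints.
data = array.array('I', [1, 2, 3, 4])

# buffer_info() returns (address, element count); the address is what a
# load instruction would ultimately use.
address, count = data.buffer_info()
size_in_bytes = count * data.itemsize

# Keep a reference for as long as generated code may touch the memory;
# otherwise Python could free the buffer while the address is still in use.
storage = []                         # hypothetical stand-in for add_storage()
storage.append(data)

print('%d bytes at 0x%x' % (size_in_bytes, address))
```

Appending to or otherwise resizing the array can move its buffer, so the address should be read only after the data has reached its final size.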
Variables and Expressions
Easy access to low-level machine instructions is a double-edged sword. On one side, you have direct access to the full power of the processor. On the other, expressing common operations can be tedious and make one long for high-level languages. CorePy's variable and expression libraries add support for stronger type checking and for building expressions for common operations.
Machine instructions, while technically only operating on the Register and Immediate types, imply a number of semantic types. For instance, add instructions may perform signed or unsigned addition. Selecting the wrong instruction can lead to obscure bugs. Of course, this is one of the value propositions for typed languages. Variables provide a solution to this for common types. Variables encapsulate a register and a collection of type-specific operations via overloaded operators. Typed variables can only be used with compatible variables. The type semantics are still evolving in CorePy, but they are similar to those found in C.
Types are found in arch/[ppc,vmx,spu]/types/[ppc,vmx,spu]_types.py. Each file has a series of test cases demonstrating the available operators. (Note: missing operators will be added over time; contributions are welcome!) For example, the SPU Bits type supports common logical operations and is the base type for the Halfword and Word types:
... setup active code ...
import corepy.arch.spu.types.spu_types as var
x = var.Bits(0)
y = var.Bits(0)
z = var.Bits(0)
z.v = (x | y) & (x ^ y)
proc.execute(code)
The example generates the instruction sequence suggested by the expression. Note the .v when the expression is assigned to z. Python's = operator cannot be overloaded directly and instead the special .v property triggers evaluation of the expression.
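The mechanism can be sketched in plain Python: overloaded operators build an expression tree, and a property setter named v walks that tree and emits instructions. Everything below (Expr, Var, the emitted list, the mnemonic strings, the temporary names) is a hypothetical toy written for this illustration, not CorePy's types module.

```python
emitted = []                          # stands in for the active instruction stream


class Expr(object):
    """A deferred operation; nothing is emitted until it is assigned via .v."""

    def __init__(self, op=None, left=None, right=None):
        self.op, self.left, self.right = op, left, right

    def __or__(self, other):
        return Expr('or', self, other)

    def __and__(self, other):
        return Expr('and', self, other)

    def __xor__(self, other):
        return Expr('xor', self, other)

    def eval(self, target=None):
        """Emit instructions for this expression; return the result register."""
        a, b = self.left.eval(), self.right.eval()
        dest = target or 't%d' % len(emitted)    # fresh temporary name
        emitted.append((self.op, dest, a, b))
        return dest


class Var(Expr):
    """A named register; evaluating it just yields its register name."""

    def __init__(self, reg):
        Expr.__init__(self)
        self.reg = reg

    def eval(self, target=None):
        return self.reg

    def _set_v(self, expr):
        expr.eval(target=self.reg)    # assignment to .v triggers code generation

    v = property(fset=_set_v)


x, y, z = Var('x'), Var('y'), Var('z')
z.v = (x | y) & (x ^ y)
for inst in emitted:
    print(inst)
```

In this toy, the single assignment emits three instructions: an or and an xor into temporaries, then an and that writes its result into z's register.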
Iterators
One of the most common tasks in implementing high-performance kernels is writing loops. Managing loops at the instruction level is a tedious and error-prone process and one of the best reasons to use compiled languages for high-performance code generation. CorePy Iterators are powerful Python iterators that allow you to use Python loop syntax to generate high-performance loops. For example, a nested sum can be implemented using a CorePy Iterator:
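... set active code ...
import corepy.arch.ppc.lib.iterators as iter
# util is assumed to be the util module from the ppc lib package
import corepy.arch.ppc.lib.util as util
import corepy.arch.ppc.types.ppc_types as var
a = var.UnsignedWord(0)
for i in iter.syn_iter(code, 5):
    for j in iter.syn_iter(code, 5):
        for k in iter.syn_iter(code, 5):
            a.v = a + i + j + k
# Copy the sum into the return register, then free a's register
util.return_var(a)
a.release_register()
# env is the platform package imported when the active code was set up
proc = env.Processor()
r = proc.execute(code)
# r == 750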
This example creates three nested loops. The induction variables on the loops are returned as CorePy variables and can be used in expressions.
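The expected result can be checked with ordinary Python loops. This sketch assumes that syn_iter(code, 5) counts from 0 to 4 like range(5), which is consistent with the value 750 quoted in the example but is not confirmed by this guide.

```python
# Plain-Python equivalent of the nested synthetic loops, assuming each
# induction variable runs over 0, 1, 2, 3, 4.
a = 0
for i in range(5):
    for j in range(5):
        for k in range(5):
            a = a + i + j + k

# Each index takes every value 0..4 exactly 25 times, so the total is
# 3 * 25 * (0 + 1 + 2 + 3 + 4) = 750.
print(a)                             # prints 750
```

The difference is that these loops run in the Python interpreter, while the synthetic version runs the Python loop bodies once at synthesis time to generate machine-level loops.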
Iterator libraries exist for PowerPC (ppc/lib/iterators.py) and SPU (spu/lib/iterators.py). Iterator types include iterators for scalar arrays, vector arrays (e.g., simple auto-simdization), Pythonic iteration (zip/range), thread/processor-parallel block decomposition (auto-natural parallelism), and stream buffer management for moving data to and from SPU local stores. Explore the iterator files for examples.
Other Features
This document provides an introduction to CorePy. CorePy has been used to develop many applications and many other features exist to aid in development. If there's something you think it should do, it may already do it. If it doesn't, let us know what you'd like and we'll see if we can roll it into CorePy.
CorePy replaces the original Synthetic Programming Environment for Python (SPE). The SPE contains a number of examples that have not yet been ported to CorePy, including the particle system demo and the chemical fingerprint application. In special cases, we can make the SPE codebase available.
References
Instruction Set Guides:
PowerPC PEM/ISA Manual
AltiVec/VMX PEM/ISA Manual
SPU ISA