CorePy User Guide

CorePy Overview

CorePy is designed to give developers direct access to the underlying processor from Python. CorePy provides four main components for interacting with the processor, and these components are used to develop synthetic programs. Synthetic programs are small computational kernels synthesized at run time and optimized for the data they will process. A single Python program can use many synthetic programs during its lifetime, much in the same way it can use many external libraries. Synthetic programs may be created using synthetic components, which are reusable synthetic functions and classes.

CorePy provides a collection of synthetic components for common tasks. In addition, many examples demonstrate how to develop components for different domains, such as physics simulations and chemical informatics.

Hello, Processor!

To get a feel for CorePy, let's start with a simple interactive session. First, make sure corepy is in your Python path and start the Python interpreter. Import the ppc instructions and run-time environment:
  % python
  >>> import corepy.arch.ppc.isa as ppc
  >>> import corepy.arch.ppc.platform as env
  Platform: linux.spre_linux_cell_64
(By convention, the isa is aliased to the name of the architecture, in this case ppc, and the run-time library is aliased to env.)

Now, create a new InstructionStream and add an instruction to it:
  >>> code = env.InstructionStream()
  >>> code.add(ppc.addi(code.gp_return, 0, 42))
Finally, create a Processor and execute the instruction stream:
  >>> proc = env.Processor()
  >>> result = proc.execute(code)
  >>> print result
  42
Congratulations! You have just created and executed your first synthetic program.

Let's do the same thing on the SPU:
  # Load the SPU instructions and environment
  >>> import corepy.arch.spu.isa as spu
  >>> import corepy.arch.spu.platform as spu_env
  Platform: linux.spre_linux_spu

  # Create a simple empty synthetic program
  >>> code = spu_env.InstructionStream()
  >>> code.add(spu.stop(0x200C))

  # Execute the synthetic program on an SPU
  >>> proc = spu_env.Processor()
  >>> result = proc.execute(code)
  >>> print result
  12
Believe it or not, you just wrote and executed an SPU program interactively from Python!
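
The 12 printed above is consistent with the low byte of the stop code passed to spu.stop (0x200C ends in 0x0C). The check below shows only that arithmetic. It assumes, and the guide does not state this, that the run-time reports the low 8 bits of the stop code as the result.

```python
# Assumption: the value returned by proc.execute() is the low byte of the
# stop code given to spu.stop(). Only the arithmetic is demonstrated here.
stop_code = 0x200C
exit_value = stop_code & 0xFF
print(exit_value)
```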

CorePy Packages

CorePy is organized as a collection of packages, some shared and some platform specific. The platform sub-hierarchies all live in the arch package and follow the same basic structure:
corepy/
  arch/
    [ppc,vmx,spu]/
      isa/
      lib/
      platform/
      types/
  examples/
  doc/
We've already seen the isa and platform packages in action. The isa package contains all the instructions for a particular ISA. The platform package contains the native code and Python code necessary to execute instruction streams on a given operating system. When imported, the platform package automatically selects the appropriate platform support code.

The types package contains support for variables and expressions. We'll introduce these in detail in a few sections.

The lib package contains additional libraries of synthetic components. Platforms with branch instructions will have an iterators module here. Most platforms also include a util module with other common operations.

Examples and Unit Tests

In addition to the files in the examples directory, almost every module in CorePy contains an extensive set of unit tests and examples at the end of each file. Feel free to explore the modules to learn more about how to use their features.

Documentation

Documentation is currently limited, but portions of the code are well commented and most files contain unit tests that serve as good usage examples.

CorePy Basics

CorePy is designed to make programming at the processor level as easy as using a library. As with any library, there are a few basic data types that form the main programming model for CorePy. If you have programmed in assembly (or even machine) languages before, these will be familiar. If not, don't worry! The concepts are all fairly simple and easy to understand.

Registers

Registers are the simplest data type in CorePy and form the foundation for all other data types. When a physical processor operates on data, it can generally only access data stored in its local registers. Load and store instructions move data between memory and registers. CorePy registers correspond directly to processor registers. Processors have a limited number of registers and some care must be taken when designing synthetic programs to ensure the registers are not used up. Compilers use complex register allocation algorithms to assign registers to instructions. CorePy leaves overall register management up to the developer, but does provide some features to simplify the task.

In CorePy, registers are 'owned' by instances of InstructionStream. Thus, each synthetic program has its own set of registers, much like each thread in a C program has a set of registers. Registers can be acquired and released through the InstructionStream. An exception is thrown if a register is requested when there are none remaining. Some platforms include multiple types of registers. These are requested using a type argument. The following examples show how to acquire different register types for a PPC InstructionStream:
  import corepy.arch.ppc.platform as env
  code = env.InstructionStream()
 
  # Acquire a general purpose (integer) register
  ra = code.acquire_register()

  # Acquire a floating point register
  fa = code.acquire_register('fp')

  # Acquire a VMX register
  va = code.acquire_register('vector')

  # Release the registers
  code.release_register(ra)
  code.release_register(fa)
  code.release_register(va)
Some InstructionStream classes contain special registers. The most common are return registers. These are the registers that values are placed in to return them from a function call. The gp_return and fp_return return registers on the PowerPC InstructionStream can be used for this purpose.

G4/G5 and Cell PowerPC processors have 32 general purpose, 32 floating point, and 32 AltiVec/VMX registers. Each SPU has 128 general purpose registers that can hold either integer or floating point vectors.
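
The acquire/release discipline described above can be pictured with a small stand-alone sketch. RegisterPool is a hypothetical class written for this guide, not CorePy's implementation. It only illustrates a fixed pool of register numbers, an exception when the pool is empty, and reuse after release.

```python
class RegisterPool(object):
    """Toy stand-in for the register bookkeeping an InstructionStream does.

    Hypothetical: not CorePy code, only an illustration of acquire/release
    with a fixed number of registers.
    """

    def __init__(self, count):
        # Registers are identified by number: 0 .. count-1
        self._free = list(range(count))
        self._in_use = set()

    def acquire(self):
        if not self._free:
            raise RuntimeError("no registers remaining")
        reg = self._free.pop()
        self._in_use.add(reg)
        return reg

    def release(self, reg):
        if reg not in self._in_use:
            raise ValueError("register %d was not acquired" % reg)
        self._in_use.remove(reg)
        self._free.append(reg)


pool = RegisterPool(2)      # a tiny pool so exhaustion is easy to show
ra = pool.acquire()
rb = pool.acquire()

try:
    pool.acquire()          # third request: nothing left
    exhausted = False
except RuntimeError:
    exhausted = True

pool.release(ra)            # releasing makes the register available again
rc = pool.acquire()
```

Releasing registers as soon as a value is no longer needed keeps long synthetic programs from running out.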

Instructions

Instructions are the main interface to the processor. Instructions operate on two different types of data: data stored in registers, and data encoded directly into the instruction, called immediate operands. In CorePy, register operands must be acquired registers, while immediate operands can be any type convertible to int.
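
To make "encoded directly into the instruction" concrete, the sketch below packs the fields of a PowerPC addi instruction into a 32-bit word, following the D-form layout from the PowerPC ISA (primary opcode 14). encode_addi is a hypothetical helper for illustration and is not part of CorePy. Registers are given as plain numbers, and r3 is used because it is the integer return register in the PowerPC ABI.

```python
def encode_addi(rt, ra, simm):
    """Pack a PowerPC D-form addi instruction into a 32-bit word.

    Hypothetical helper for illustration; CorePy does its own encoding
    when an instruction such as ppc.addi(...) is created.
    """
    simm = int(simm)  # immediates may be any type convertible to int
    if not (0 <= rt < 32 and 0 <= ra < 32):
        raise ValueError("register numbers must be in 0..31")
    if not (-0x8000 <= simm <= 0x7FFF):
        raise ValueError("immediate must fit in 16 signed bits")
    opcode = 14  # primary opcode for addi
    return (opcode << 26) | (rt << 21) | (ra << 16) | (simm & 0xFFFF)


# addi r3, 0, 42: the immediate 42 ends up in the low 16 bits of the word
word = encode_addi(3, 0, 42)
print("%08x" % word)
```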

The operand order for the instructions matches the assembly order. As a general rule, the first operand is the destination, followed by the register operands and finally the immediate operands.

For instance, the SPU ai (add immediate) instruction adds the value in register A to a constant and stores the result in register D. The assembly and CorePy versions of the instruction are:
  ai D, A, 12       # Assembly
  spu.ai(D, A, 12)  # CorePy
For each of the three supported architectures, all instructions in the 32-bit Instruction Set Architectures are included in CorePy. One naming convention worth noting: some instructions are listed in the ISA manuals with an italic x at the end. These instructions appear in CorePy with the x. For example, the PowerPC add instruction is listed as addx, and the corresponding CorePy instruction is ppc.addx(...). If enough people complain about this, I'll drop the x's.

Many instructions have bit flags that are commonly set to the same value every time the instruction is used. In assembly language, these bit flags are changed by using mnemonic forms of the instructions. Instead of using mnemonics, CorePy makes these flags optional keyword arguments. In all cases, the flags default to 0. For example, the full form of the addx instruction is
  ppc.addx(D, A, B, oe = 0, rc = 0)
To call addx with the rc bit set to one, use
  ppc.addx(D, A, B, rc = 1)
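
The relationship between the keyword flags and the assembly mnemonics can be seen in the encoding itself. The sketch below packs a PowerPC add instruction using the XO-form layout from the PowerPC ISA (primary opcode 31, extended opcode 266). encode_addx is a hypothetical helper, not a CorePy function, and registers are plain numbers.

```python
def encode_addx(rt, ra, rb, oe=0, rc=0):
    """Pack a PowerPC XO-form add instruction into a 32-bit word.

    Hypothetical helper for illustration only. The oe and rc keyword
    arguments mirror the optional flags described in the text.
    """
    for reg in (rt, ra, rb):
        if not 0 <= reg < 32:
            raise ValueError("register numbers must be in 0..31")
    if oe not in (0, 1) or rc not in (0, 1):
        raise ValueError("oe and rc are single-bit flags")
    opcode = 31   # primary opcode shared by XO-form integer arithmetic
    xo = 266      # extended opcode selecting add
    return ((opcode << 26) | (rt << 21) | (ra << 16) | (rb << 11)
            | (oe << 10) | (xo << 1) | rc)


plain = encode_addx(3, 4, 5)            # assembly mnemonic: add
record = encode_addx(3, 4, 5, rc=1)     # assembly mnemonic: add.
overflow = encode_addx(3, 4, 5, oe=1)   # assembly mnemonic: addo
```

Each mnemonic differs from the plain form by a single bit, which is why one function with optional flags can stand in for the whole family.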

Building InstructionStreams

Instructions are collected into synthetic programs using InstructionStream instances. We have already seen how to use InstructionStreams to acquire registers. Instructions can be added to InstructionStreams in two ways: an explicit add method and an implicit add mode.

The initial examples used the explicit add method:
  code.add(ppc.addi(code.gp_return, 0, 42))
Of course, this method adds some visual noise to the code and becomes cumbersome to type for longer synthetic programs. The implicit add mode associates an InstructionStream with an ISA and automatically adds Instructions to the stream as they are created:
  code = env.InstructionStream()

  # Set the current instruction stream
  ppc.set_active_code(code)
 
  # The addi instruction is automatically added to the instruction stream
  ppc.addi(code.gp_return, 0, 42)
Once complete, or at any time during construction, you can view the current contents of an InstructionStream using the print_code() method.
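
One way to picture the implicit add mode is a module-level "active stream" that each new instruction registers itself with. The sketch below is a toy model of that pattern. Stream, Instruction, and this set_active_code are hypothetical stand-ins written for illustration and say nothing about how CorePy implements the feature internally.

```python
class Stream(object):
    """Toy instruction stream: just an ordered list of instructions."""

    def __init__(self):
        self.instructions = []

    def add(self, inst):
        self.instructions.append(inst)
        return inst


_active = [None]  # module-level slot holding the active stream, if any


def set_active_code(stream):
    _active[0] = stream


class Instruction(object):
    """Toy instruction that adds itself to the active stream on creation."""

    def __init__(self, name, *operands):
        self.name = name
        self.operands = operands
        if _active[0] is not None:
            _active[0].add(self)   # implicit add


# Explicit add: no active stream is set, so add() must be called by hand
explicit = Stream()
explicit.add(Instruction("addi", 3, 0, 42))

# Implicit add: creating the instruction is enough
implicit = Stream()
set_active_code(implicit)
Instruction("addi", 3, 0, 42)
set_active_code(None)
```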

Execution

Once a synthetic program is built, it can be executed on a Processor instance. All versions of the PowerPC and SPU Processor objects support synchronous and asynchronous execution. Additionally, the PowerPC Processor object can return integer and floating point values stored in the gp_return and fp_return registers. Note that asynchronous execution is true multithreaded execution. If there are two processors available and two synthetic programs are executed, they will most likely execute on separate processors (the operating system makes the final decision).
  # Execution examples
  ... create a synthetic program ...
  proc = env.Processor()

  # Synchronous (blocking) execution, return int
  result = proc.execute(code)

  # Synchronous (blocking) execution, return float
  result = proc.execute(code, mode = 'fp')

  # Asynchronous (non-blocking) execution
  prog_id = proc.execute(code, mode = 'async')

  # Stop/restart
  proc.suspend(prog_id)
  proc.resume(prog_id)

  # Wait for the prog to finish
  proc.join(prog_id)
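
The difference between the blocking and asynchronous modes can be modeled in plain Python with threads. ToyProcessor below is a hypothetical stand-in that runs ordinary Python callables rather than synthetic programs. Having join return the result is a convenience of this sketch, not something the guide states about CorePy, and Python threads do not give the true parallelism described above.

```python
import threading


class ToyProcessor(object):
    """Hypothetical stand-in for a Processor; runs Python callables."""

    def __init__(self):
        self._threads = {}
        self._results = {}
        self._next_id = 0

    def execute(self, func, mode='int'):
        if mode != 'async':
            return func()              # synchronous: block, return the result

        prog_id = self._next_id
        self._next_id += 1

        def run():
            self._results[prog_id] = func()

        thread = threading.Thread(target=run)
        self._threads[prog_id] = thread
        thread.start()
        return prog_id                 # asynchronous: return a handle at once

    def join(self, prog_id):
        # Wait for the program to finish, then hand back its result
        self._threads.pop(prog_id).join()
        return self._results.pop(prog_id)


proc = ToyProcessor()
sync_result = proc.execute(lambda: 42)
prog_id = proc.execute(lambda: 12, mode='async')
async_result = proc.join(prog_id)
```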

Sharing Data with Synthetic Programs

Key points:

Variables and Expressions

Easy access to low-level machine instructions is a double-edged sword. On one side, you have direct access to the full power of the processor. On the other, expressing common operations can be tedious and make one long for high-level languages. CorePy's variable and expression libraries add support for stronger type checking and for building expressions for common operations.

Machine instructions, while technically only operating on the Register and Immediate types, imply a number of semantic types. For instance, add instructions may perform signed or unsigned addition. Selecting the wrong instruction can lead to obscure bugs. Of course, this is one of the value propositions for typed languages. Variables provide a solution to this for common types. Variables encapsulate a register and a collection of type-specific operations via overloaded operators. Typed variables can only be used with compatible variables. The type semantics are still evolving in CorePy, but they are similar to those found in C.

Types are found in arch/[ppc,vmx,spu]/types/[ppc,vmx,spu]_types.py. Each file has a series of test cases demonstrating the available operators. (Note: missing operators will be added over time; contributions are welcome!) For example, the SPU Bits type supports common logical operations and is the base type for the Halfword and Word types:
  ... setup active code ...
  import corepy.arch.spu.types.spu_types as var

  x = var.Bits(0)
  y = var.Bits(0)
  z = var.Bits(0)

  z.v = (x | y) & (x ^ y)

  proc.execute(code)
The example generates the instruction sequence suggested by the expression. Note the .v when the expression is assigned to z. Python's = operator cannot be overloaded directly and instead the special .v property triggers evaluation of the expression.
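
The mechanics behind the .v assignment can be sketched in plain Python: overloaded operators build an expression tree, and a write-only property walks the tree and emits one pseudo-instruction per operator. The Node, Expr, and Bits classes below are hypothetical illustrations. They emit tuples with symbolic names instead of real SPU instructions and registers, and they are not CorePy's implementation.

```python
import itertools


class Node(object):
    """Base class: logical operators build Expr nodes instead of computing."""

    def __or__(self, other):
        return Expr('or', self, other)

    def __and__(self, other):
        return Expr('and', self, other)

    def __xor__(self, other):
        return Expr('xor', self, other)


class Expr(Node):
    """Deferred binary operation."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def emit(self, code, temps, dest=None):
        a = self.left.emit(code, temps)
        b = self.right.emit(code, temps)
        if dest is None:
            dest = next(temps)         # intermediate result needs a temporary
        code.append((self.op, dest, a, b))
        return dest


class Bits(Node):
    """Toy variable: a name standing in for a register."""

    def __init__(self, name, code):
        self.name = name
        self.code = code

    def emit(self, code, temps, dest=None):
        return self.name               # a variable is already "in a register"

    def _assign(self, expr):
        temps = ('t%d' % n for n in itertools.count())
        expr.emit(self.code, temps, dest=self.name)

    # Assigning to .v triggers evaluation, since = itself cannot be overloaded
    v = property(fset=_assign)


code = []
x, y, z = Bits('x', code), Bits('y', code), Bits('z', code)
z.v = (x | y) & (x ^ y)
```

After the assignment, code holds three pseudo-instructions in evaluation order: the or, the xor, and finally the and that writes to z.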

Iterators

One of the most common tasks in implementing high-performance kernels is writing loops. Managing loops at the instruction level is a tedious and error-prone process and one of the best reasons to use compiled languages for high-performance code generation. CorePy Iterators are powerful Python iterators that allow you to use Python loop syntax to generate high-performance loops. For example, a nested sum can be implemented using a CorePy Iterator:
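  ... set active code ...
  import corepy.arch.ppc.platform as env
  import corepy.arch.ppc.lib.iterators as iter
  import corepy.arch.ppc.lib.util as util
  import corepy.arch.ppc.types.ppc_types as var

  a = var.UnsignedWord(0)

  # Three nested synthetic loops, each running 5 iterations
  for i in iter.syn_iter(code, 5):
    for j in iter.syn_iter(code, 5):
      for k in iter.syn_iter(code, 5):
        a.v = a + i + j + k

  # Copy a into the return register, then free its register
  util.return_var(a)
  a.release_register()

  proc = env.Processor()
  r = proc.execute(code)
  # r == 750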
This example creates three nested loops. The induction variables on the loops are returned as CorePy variables and can be used in expressions.
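
The expected result can be checked with an ordinary Python loop. This sketch assumes that syn_iter(code, 5) counts from 0 to 4 like range(5), which is consistent with the value of r shown in the example.

```python
# Plain-Python equivalent of the nested synthetic loops
total = 0
for i in range(5):
    for j in range(5):
        for k in range(5):
            total += i + j + k

# Closed form: each of the three indices contributes 25 * (0+1+2+3+4)
closed_form = 3 * 25 * sum(range(5))
print(total)
```

The difference is where the loop runs: the Python loops above execute 125 iterations in the interpreter, while the CorePy version runs each Python loop body once to generate machine code that performs the iterations on the processor.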

Iterator libraries exist for PowerPC (ppc/lib/iterators.py) and SPU (spu/lib/iterators.py). Iterator types include iterators for scalar arrays, vector arrays (e.g., simple auto-simdization), Pythonic iteration (zip/range), thread/processor-parallel block decomposition (auto-natural parallelism), and stream buffer management for moving data to and from SPU local stores. Explore the iterator files for examples.

Other Features

This document provides an introduction to CorePy. CorePy has been used to develop many applications and many other features exist to aid in development. If there's something you think it should do, it may already do it. If it doesn't, let us know what you'd like and we'll see if we can roll it into CorePy.

CorePy replaces the original Synthetic Programming Environment for Python (SPE). The SPE contains a number of examples that have not yet been ported to CorePy, including the particle system demo and the chemical fingerprint application. In special cases, we can make the SPE codebase available.

References

Instruction Set Guides:

PowerPC PEM/ISA Manual

AltiVec/VMX PEM/ISA Manual

SPU ISA