New Code Composition Support

New support for code composition (adding one instruction stream to another) has been merged into the trunk as of r789:

https://svn.osl.iu.edu/svn/corepy/trunk

A number of other fixes, cleanups, and features have been implemented as well. This page attempts to document the user-visible changes and improvements. Please send comments/questions to the Mailing Lists.


A paper on the new code composition and transformation functionality, "Enabling Code Transformation via Synthetic Composition," has been submitted for publication. A preprint is available and gives a good overview of the new functionality and its motivation.


Quick Guide to Updating Existing Code

The biggest user-visible change is the introduction of the Program object. Program objects now handle all resource management ({acquire,release}_register(), get_label(), add_storage(), etc.) instead of InstructionStream, and streams are created using the Program.get_stream() method. The following examples first show how a synthetic program was written on the old trunk, followed by the changes needed for the same code to work on the composition branch (now merged into the trunk):

# Old (trunk) way
# (env is the CorePy platform module for the target architecture,
#  and proc is an env.Processor() instance)
code = env.InstructionStream()
 
r_foo = code.acquire_register()
lbl_loop = code.get_unique_label("loop")
 
# Generate awesome code here
 
code.release_register(r_foo)
 
proc.execute(code)

# New (branch) way
prgm = env.Program()
code = prgm.get_stream()
 
r_foo = prgm.acquire_register()
lbl_loop = prgm.get_unique_label("loop")
 
# Generate awesome code here
 
prgm.release_register(r_foo)
 
prgm.add(code)
proc.execute(prgm)

InstructionStream objects can no longer be created on their own; they must be tied to a Program object, and the example above illustrates the recommended way of doing this. The other big change shown above is resource management: the Program object, rather than individual instruction streams, now manages registers and labels.

The old way of building synthetic components was to define functions that generated code like this:

# Old (trunk) way:
 
def generate_awesomeness(code):
  r_foo = code.acquire_register()
 
  # Generate awesomeness using r_foo
 
  return
 
code = env.InstructionStream()
generate_awesomeness(code)

Porting this directly to code composition would require passing a Program object into generate_awesomeness() so that it can acquire a register, and every call to generate_awesomeness() would then have to pass the Program in addition to the InstructionStream. A quicker way to update this code with minimal hassle is to use the InstructionStream's reference to its parent Program object (code.prgm):

# Quick new (branch) way
 
def generate_awesomeness(code):
  r_foo = code.prgm.acquire_register()
 
  # Generate awesomeness using r_foo
 
  return
 
prgm = env.Program()
code = prgm.get_stream()
 
generate_awesomeness(code)
prgm.add(code)

This is the approach that was taken for the existing utility functions, synthetic iterators, and synthetic expressions in the codebase. However, the long (and proper) way to port this code to use code composition would be to do this:

# Proper new (branch) way
 
def generate_awesomeness(prgm):
  code = prgm.get_stream()
  r_foo = prgm.acquire_register()
 
  # Generate awesomeness using r_foo
 
  return code
 
prgm = env.Program()
code = prgm.get_stream()
 
# Generate some code
 
# Generate awesomeness, and add it to the code
awesome_code = generate_awesomeness(prgm)
code.add(awesome_code)
 
prgm.add(code)

Using this approach creates a standalone InstructionStream containing the awesome code, which could be reused (by adding it multiple times) without actually regenerating the awesome code. Furthermore, this approach allows the application of code transformations to just the generated awesome code. The documentation below talks about this functionality in more detail.


Code Composition in Detail

Program objects

The new Program object was created to centralize code-related resources such as registers and labels. All of the resource management functionality was moved over (mostly) unchanged from InstructionStream. The following methods are now found on Program objects:

acquire_register()
acquire_registers()
release_register()
release_registers()
get_label()
get_unique_label()
add_storage()
remove_storage()
get_storage()
cache_code()

One change pertains to register acquisition and release. On all architectures except PPC, registers are now acquired in least-recently-used (LRU) order rather than the previous most-recently-used (MRU) order. This avoids register stalls (and limited optimization opportunities) caused by false write-after-read dependences, which arise when a register that was just read is immediately reused as a destination. PPC keeps the old behavior because minimizing the number of registers used, and thus how many must be saved and restored by the prologue and epilogue, was judged more important.
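The difference between the two orderings can be illustrated with a toy free-register pool in plain Python. This is a minimal sketch, not CorePy's allocator; the RegisterPool class and the register names are hypothetical. Under MRU ordering a register that has just been released is handed out again immediately, so its next write must wait behind earlier reads of it; under LRU ordering the released register goes to the back of the queue.

```python
from collections import deque


class RegisterPool:
    """Hypothetical free-register pool contrasting MRU and LRU reuse.

    A sketch for illustration only; this is not CorePy's allocator.
    """

    def __init__(self, names, policy="lru"):
        if policy not in ("lru", "mru"):
            raise ValueError("policy must be 'lru' or 'mru'")
        self.policy = policy
        # Free registers, ordered from least to most recently released.
        self.free = deque(names)

    def acquire(self):
        if not self.free:
            raise RuntimeError("no free registers")
        if self.policy == "lru":
            return self.free.popleft()  # free for the longest time
        return self.free.pop()          # most recently released

    def release(self, reg):
        self.free.append(reg)  # now the most recently released register


if __name__ == "__main__":
    for policy in ("mru", "lru"):
        pool = RegisterPool(["r1", "r2", "r3", "r4"], policy)
        first = pool.acquire()
        pool.release(first)
        second = pool.acquire()
        print(policy, first, second)
```

In this demo the MRU pool returns r4 for both acquisitions, touching fewer distinct registers (the property PPC keeps), while the LRU pool returns r1 and then r2, so the second value never shares a register with the first.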

Synthetic programs are now formed by first creating a Program object, then obtaining streams to which code may be added. Instruction streams must be explicitly added to the Program:

prgm = env.Program()
code = prgm.get_stream()
 
prgm.add(code)

Instructions added to a stream both before and after adding the stream to the Program will be included in the final synthetic program. In other words, the Program object maintains a reference to the stream. Instructions and labels currently may not be added directly to a Program, though this may be supported in the future.
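A rough model of this reference behavior, using hypothetical Stream and Program stand-ins rather than CorePy's classes, shows why instructions added after prgm.add(code) still appear in the output. The render() method here is an invented name that loosely plays the role of cache_code().

```python
class Stream:
    """Hypothetical stand-in for InstructionStream (not CorePy's class)."""

    def __init__(self, prgm):
        self.prgm = prgm   # back-reference to the owning Program
        self.objects = []  # instructions and labels, in order

    def add(self, obj):
        self.objects.append(obj)


class Program:
    """Hypothetical stand-in for Program that stores stream references."""

    def __init__(self):
        self.streams = []

    def get_stream(self):
        return Stream(self)

    def add(self, stream):
        # Store the stream itself (a reference, not a copy).
        self.streams.append(stream)

    def render(self):
        # Streams are read only now, so later additions are included.
        return [obj for s in self.streams for obj in s.objects]


prgm = Program()
code = prgm.get_stream()
code.add("insn_a")    # added before prgm.add(code)
prgm.add(code)
code.add("insn_b")    # added after prgm.add(code)
print(prgm.render())  # ['insn_a', 'insn_b']
```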

The code rendering process (the cache_code() method) has been moved to the Program object, though the process itself is unchanged. As a result, calls to InstructionStream.cache_code() need to be changed to Program.cache_code(). InstructionStream.print_code() remains but has been stripped of its optional functionality and no longer accepts any arguments; Program objects have a fully functional print_code() method.


InstructionStream objects

As documented above, a lot of functionality previously found on InstructionStream has been moved to the new Program object. However, one new feature has been added: streams may be added to one another using the existing InstructionStream.add() method. This enables code composition (and code transformation), which is the whole purpose of this work. Code composition looks like this:

prgm = env.Program()
code = prgm.get_stream()
 
# Generate some subcode component
subcode = generate_subcode(prgm)
 
# Compose the subcode into the main code
code.add(subcode)
 
# Add the main code to the program
prgm.add(code)

Currently, adding one stream to another adds new references to the sub-stream's instructions and labels to the parent stream. In the future this may change so that a reference to the sub-stream itself is added to the parent instead; later changes to the sub-stream would then appear in the parent stream, which would also be somewhat more Pythonic.
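The current copy-on-add behavior can be modeled with a hypothetical Stream stand-in (again, not CorePy's implementation). Because the parent receives the sub-stream's instruction references at the time of the add() call, a sub-stream can be added twice without regenerating it, but instructions appended to the sub-stream afterwards are not seen by the parent:

```python
class Stream:
    """Hypothetical stand-in for InstructionStream (not CorePy's class)."""

    def __init__(self):
        self.objects = []  # instructions and labels, in order

    def add(self, obj):
        if isinstance(obj, Stream):
            # Copy the sub-stream's current references into this stream.
            self.objects.extend(obj.objects)
        else:
            self.objects.append(obj)


sub = Stream()
sub.add("insn_1")

code = Stream()
code.add(sub)        # reuse the sub-stream ...
code.add(sub)        # ... twice, without regenerating it
sub.add("insn_2")    # this later change is not seen by code

print(code.objects)  # ['insn_1', 'insn_1']
```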


Other Features

As a side feature, the += operator has been overloaded on both Program and InstructionStream objects. The following two examples are equivalent:

# Using overloaded operators:
subcode += x86.mov(rax, 42)
code += subcode
prgm += code

# Using the existing add() method:
subcode.add(x86.mov(rax, 42))
code.add(subcode)
prgm.add(code)

InstructionStreams also have an overloaded + operator which can be used to create a new stream from two other streams:

code = subcode1 + subcode2
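One plausible way to layer these operators on top of add() is through Python's __iadd__ and __add__ special methods. The sketch below uses a hypothetical Stream stand-in and is not taken from the CorePy source; in particular, a real InstructionStream built by + would also have to be tied to a Program.

```python
class Stream:
    """Hypothetical stand-in for InstructionStream showing how + and +=
    could be built on add(); not CorePy's code."""

    def __init__(self, objects=None):
        self.objects = list(objects or [])

    def add(self, obj):
        if isinstance(obj, Stream):
            self.objects.extend(obj.objects)
        else:
            self.objects.append(obj)

    def __iadd__(self, obj):
        # "code += x" behaves like "code.add(x)"; returning self keeps
        # the name bound to the same stream object.
        self.add(obj)
        return self

    def __add__(self, other):
        # "a + b" builds a new stream and leaves both operands unchanged.
        result = Stream(self.objects)
        result.add(other)
        return result


s1 = Stream(["insn_a"])
s2 = Stream(["insn_b"])
combined = s1 + s2
s1 += "insn_c"
print(combined.objects, s1.objects)  # ['insn_a', 'insn_b'] ['insn_a', 'insn_c']
```

Copying the left operand's contents in __add__ is what keeps s1 and s2 intact, which matches the description of + as creating a new stream from two others.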


Sample Code

The code in the examples directory on the branch has been updated for code composition, with the exception of some older PPC examples. New examples demonstrating and testing code composition have been added for each architecture; they can be found in the *_comp.py files.


Code Transformation

The motivation for the code composition functionality is to allow synthetic programs to be constructed from a set of independent synthetic components, each rendered into its own stream. While useful on its own, this also makes it possible to apply code transformations to selected parts of a program. The most obvious application is automated code optimization. An instruction scheduling optimizer has been developed for the Cell SPU architecture: the scheduler takes one or more streams as input and rearranges the instructions to improve performance. However, the scheduler currently supports only straight-line code that contains no branches or labels. Code composition minimizes the impact of this limitation by allowing code to be broken apart into suitable segments.
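Because the scheduler only accepts straight-line code, a program containing labels and branches has to be cut into schedulable runs. With code composition the natural approach is to generate each straight-line region into its own stream; the hypothetical helper below instead shows the same decomposition applied after the fact to a flat list of ("insn" | "label" | "branch", text) items, a representation invented for this sketch and not used by CorePy.

```python
def split_straight_line(objects):
    """Split a flat list of ("insn" | "label" | "branch", text) items into
    schedulable straight-line runs and fixed control-flow items.

    Returns (schedulable, items) pairs in program order.
    Hypothetical helper; not part of CorePy.
    """
    pieces = []
    current = []
    for kind, text in objects:
        if kind == "insn":
            current.append((kind, text))
            continue
        # A label or branch ends the current straight-line run.
        if current:
            pieces.append((True, current))
            current = []
        # The control-flow item itself stays where it is.
        pieces.append((False, [(kind, text)]))
    if current:
        pieces.append((True, current))
    return pieces


code = [
    ("insn", "a"), ("insn", "b"),
    ("label", "loop"),
    ("insn", "c"), ("insn", "d"),
    ("branch", "brnz loop"),
    ("insn", "e"),
]
for schedulable, items in split_straight_line(code):
    print(schedulable, [text for _, text in items])
```

Only the runs flagged True would be candidates for the scheduler; the labels and branches stay in place between them.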

The scheduler has been included in the SPU library directory (corepy/arch/spu/lib/isched.py). The following example demonstrates its use:

# isched() is provided by corepy/arch/spu/lib/isched.py;
# env is the SPU platform module.
 
def generate_code(prgm):
  code = prgm.get_stream()
 
  # Generate straight-line optimizable code
 
  return code
 
 
prgm = env.Program()
 
# Generate the code segment
slowcode = generate_code(prgm)
 
# Optimize the slow code
fastcode = isched(slowcode)
 
# Add the fast code to the program
prgm += fastcode

The instruction scheduler can also be used to interleave the code from two independent operations in an optimal way. For example:

# Generate sine and cosine code independently
sincode = gen_sin(r_sin_x)
coscode = gen_cos(r_cos_x)
 
# Combine and optimize the sine and cosine code
code += isched(sincode + coscode)

Performance gains vary depending on the code being optimized; the scheduler attempts to make the best use of pipelining and instruction-level parallelism to reduce stalls and maximize the instruction issue rate. The code composition paper referenced above documents the instruction scheduler in more detail.
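The benefit of interleaving independent operations can be seen with a toy greedy list scheduler, a standard textbook technique. This is not a description of how isched works: the sketch models a single-issue pipeline, tracks only read-after-write dependences (so each register is assumed to be written once and read only after it is written), and uses made-up latencies. The chain(), in_order_cycles(), and list_schedule() helpers are hypothetical.

```python
def chain(prefix, src, length=3, latency=4):
    """Build a dependent chain: each instruction reads the previous result.

    Instructions are (name, dest, srcs, latency) tuples.
    """
    insns = []
    for k in range(1, length + 1):
        dest = f"{prefix}{k}"
        insns.append((dest, dest, [src], latency))
        src = dest
    return insns


def in_order_cycles(insns):
    """Cycles until the last result is ready when issuing in program order."""
    ready_at = {}
    issue = -1
    finish = 0
    for _, dest, srcs, lat in insns:
        # Issue one cycle after the previous instruction, or later if a
        # source value is not yet available (a stall).
        issue = max([issue + 1] + [ready_at.get(s, 0) for s in srcs])
        ready_at[dest] = issue + lat
        finish = max(finish, issue + lat)
    return finish


def list_schedule(insns):
    """Greedy single-issue list scheduler; returns (issue order, cycles)."""
    ready_at = {}
    pending = list(insns)
    order = []
    cycle = 0
    finish = 0
    while pending:
        pending_dests = {dest for _, dest, _, _ in pending}
        # Issue the first pending instruction whose sources are ready now.
        for i, (name, dest, srcs, lat) in enumerate(pending):
            if all(s not in pending_dests and ready_at.get(s, 0) <= cycle
                   for s in srcs):
                order.append(name)
                ready_at[dest] = cycle + lat
                finish = max(finish, cycle + lat)
                del pending[i]
                break  # at most one instruction per cycle
        cycle += 1
    return order, finish


if __name__ == "__main__":
    sin = chain("s", "x")
    cos = chain("c", "y")
    print("in order:", in_order_cycles(sin + cos), "cycles")
    order, cycles = list_schedule(sin + cos)
    print("scheduled:", order, cycles, "cycles")
```

For two independent three-instruction chains with a latency of 4 cycles, running the chains back to back in order takes 21 cycles until the last result is ready, while the list scheduler interleaves them (s1, c1, s2, c2, s3, c3) and finishes in 13 cycles, filling stall cycles of one chain with work from the other.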