When creating a Processor, a device number (at least 0 and less than platform.N_GPUS) may be specified; the default is 0.
The Processor.execute() method takes several additional parameters. The first is "threads", which may be a tuple of 2 or 5 values. If 2 values are specified, a two-dimensional grid of threads with those dimensions is run. If 5 values are specified, the first two give the block size and the last three give the dimensions of a three-dimensional grid of blocks to run. The second is "params", a list or tuple of parameter values to be passed to the PTX Program. The parameters must match, in number and type, the parameters declared with Program.add_parameter() (see below).
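The two accepted shapes of "threads" can be made concrete with a small pure-Python sketch. The helper launch_shape() below is hypothetical and is not part of CorePy; it only models the rule described above, on the assumption that the total number of threads is the product of all the sizes given.

```python
def launch_shape(threads):
    """Interpret a 'threads' tuple as described for Processor.execute().

    Hypothetical helper for illustration only; not a CorePy function.
    Returns (block, grid, total); block is None for the 2-value form.
    """
    values = tuple(threads)
    if len(values) == 2:
        # Two values: a single two-dimensional grid.
        x, y = values
        return None, (x, y), x * y
    if len(values) == 5:
        # Five values: a 2-D block size followed by a 3-D grid of blocks.
        block = values[:2]
        grid = values[2:]
        total = block[0] * block[1] * grid[0] * grid[1] * grid[2]
        return block, grid, total
    raise ValueError("threads must have 2 or 5 values, got %d" % len(values))
```

For example, launch_shape((16, 16, 2, 2, 1)) describes 16x16 blocks arranged in a 2x2x1 grid.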
PTX devices can effectively read system or "host" memory, but they can only write to device memory. Memory must be copied to and from the device. Furthermore, device memory must be allocated.
Host-side memory should be allocated using Processor.alloc_host(), and device memory should be allocated using Processor.alloc_device(). Both functions take a type as their first parameter, where the type is a string corresponding to one of the PTX types ('u32', 's32', 'f32', 'b32', 'u64', etc.). A second argument specifies the size of the memory in number of elements.
To copy memory from host to device or vice versa, use Processor.copy(dst, src).
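The order of operations (allocate on both sides, copy in, run, copy out) can be sketched with a toy stand-in. StandInProcessor below is not the CorePy Processor; it is a hypothetical pure-Python model that only mirrors the call shapes described above: a type string and an element count for allocation, and copy(dst, src) with the destination first. The buffer representation and the type and length check are assumptions made for the sketch.

```python
class StandInProcessor:
    """Toy model of the allocate/copy workflow; NOT the CorePy Processor."""

    def alloc_host(self, ptx_type, length):
        # Call shape taken from the text: (type string, number of elements).
        return {"where": "host", "type": ptx_type, "data": [0] * length}

    def alloc_device(self, ptx_type, length):
        return {"where": "device", "type": ptx_type, "data": [0] * length}

    def copy(self, dst, src):
        # Destination first, then source, as in Processor.copy(dst, src).
        # The compatibility check is an assumption of this sketch.
        if dst["type"] != src["type"] or len(dst["data"]) != len(src["data"]):
            raise ValueError("buffers must match in type and length")
        dst["data"][:] = src["data"]


proc = StandInProcessor()
host_in = proc.alloc_host("f32", 4)
dev_buf = proc.alloc_device("f32", 4)
host_out = proc.alloc_host("f32", 4)

host_in["data"][:] = [1.0, 2.0, 3.0, 4.0]
proc.copy(dev_buf, host_in)    # host -> device, before running a program
# ... a PTX program would run here and work on dev_buf ...
proc.copy(host_out, dev_buf)   # device -> host, to read the results back
```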
PTX instructions often have several suffixes, which may be used to set optional flags on the instruction and also determine the "type" of the instruction. Type suffixes are generated by CorePy, so specifying a type is not usually necessary. Flags may be set using keywords, e.g. arithmetic instructions that take a suffix specifying rounding have a keyword "rnd" that takes values "rn", "rz", etc.
Predicates for individual instructions may be set by setting the keyword "pred" to a predicate variable. Negated predicates can be set by using the "npred" keyword.
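To show what the flags and predicates correspond to in PTX assembly text, the sketch below builds one instruction line as a string. format_instruction() and its parameter names are hypothetical and are not CorePy code; the output follows standard PTX syntax, where a rounding flag sits between the opcode and the type suffix and a guard predicate is written as @p or @!p in front of the instruction.

```python
def format_instruction(opcode, ptx_type, operands, rnd=None, pred=None,
                       negate=False):
    """Build one line of PTX assembly text.

    Hypothetical helper for illustration only; not a CorePy function.
    """
    parts = [opcode]
    if rnd is not None:
        parts.append(rnd)       # rounding flag, e.g. "rn" or "rz"
    parts.append(ptx_type)      # type suffix, e.g. "f32"
    guard = ""
    if pred is not None:
        # "@p" runs the instruction when p is true, "@!p" when p is false.
        guard = "@%s%s " % ("!" if negate else "", pred)
    return "%s%s %s;" % (guard, ".".join(parts), ", ".join(operands))
```

For instance, format_instruction("add", "f32", ["%f3", "%f1", "%f2"], rnd="rn") produces the text add.rn.f32 %f3, %f1, %f2; and passing pred="%p1" with negate=True prefixes it with @!%p1.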
PTX variables have a type, written in the form 'u32', 's32', 'f32', or 'b32': that is, unsigned 32-bit, signed 32-bit, floating-point 32-bit, and untyped ("bit") 32-bit. These types are available in bit widths from 8 to 64 bits, although there are restrictions on which types can be used where. There is also a predicate variable type. Variables also have a "space", such as "global", "param", or "shared". Registers are specified just like other variables but have a space of "reg".
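The naming scheme for type strings can be illustrated with a short parser. parse_ptx_type() is a hypothetical helper, not part of CorePy. It accepts any of the four kinds at widths 8, 16, 32, and 64 and does not enforce the per-type restrictions mentioned above. The "pred" spelling is borrowed from PTX's .pred type and is an assumption about how the predicate type would be written as a string.

```python
PTX_KINDS = {
    "u": "unsigned integer",
    "s": "signed integer",
    "f": "floating point",
    "b": "untyped bits",
}


def parse_ptx_type(name):
    """Split a type string such as 'u32' into (kind, width in bits).

    Hypothetical helper for illustration only; not a CorePy function.
    """
    if name == "pred":
        return ("predicate", None)   # predicates carry no bit width here
    kind = PTX_KINDS.get(name[:1])
    width = name[1:]
    if kind is None or not width.isdigit() or int(width) not in (8, 16, 32, 64):
        raise ValueError("not a recognised PTX type string: %r" % (name,))
    return (kind, int(width))
```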
PTX variables can be created by calling Program.add_variable(), which takes a string for the space as the first argument, a type as the second argument, and an optional name as the third argument. Strictly speaking, registers can be acquired this way, but they should instead be obtained through the usual CorePy method, Program.acquire_register().
The read-only "special registers" such as %tid, %ntid, etc. are defined in corepy.arch.type.registers as tid, ntid, etc. Most of the special registers are defined as vectors in PTX, and the components are accessed by writing, for example, %tid.x; similarly, in CorePy, access these components by writing registers.tid.x.
Function parameters for PTX programs are just regular PTX variables in the "param" space. To add parameters to a PTX program, use the Program.add_parameter() method, which takes a type and an optional name. It returns a PTX variable that may be used as an operand for PTX instructions.
PTX allows addresses as operands. In PTX assembly, these are represented as one of the following enclosed in square brackets: a variable name, a register name, a register with an offset, or a constant address. In CorePy, such an address operand is created by constructing a registers.ptxAddress with a register or an immediate as the constructor argument.
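The four bracketed forms listed above can be shown as text with a small formatter. format_address() is a hypothetical helper that only renders PTX assembly strings; it is not the registers.ptxAddress class and makes no claim about that class's constructor beyond what is stated above.

```python
def format_address(base, offset=0):
    """Render a PTX address operand as assembly text.

    Hypothetical helper for illustration only; not a CorePy function.
    'base' is a variable or register name (str) or a constant address (int).
    """
    if isinstance(base, int):
        if offset:
            raise ValueError("a constant address takes no offset")
        return "[%d]" % base                 # constant address
    if offset:
        return "[%s+%d]" % (base, offset)    # register with an offset
    return "[%s]" % base                     # variable name or register name
```

Here format_address("%r1", 8) yields [%r1+8], while format_address("data") and format_address(4096) yield [data] and [4096].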
Synthetic expressions are not yet supported.