AOS and SOA

In computing, AoS and SoA refer to contrasting ways to arrange a sequence of records in memory, with regard to interleaving, and are of interest in SIMD programming.

Structure of arrays

Structure of arrays (or SoA) is a layout separating elements of a record (or 'struct' in the C programming language) into one parallel array per field.^[1] The motivation is easier manipulation with packed SIMD instructions in most instruction set architectures, since a single SIMD register can load homogeneous data, possibly transferred by a wide internal datapath (e.g. 128-bit). If only a specific part of the record is needed, only those parts need to be iterated over, allowing more data to fit onto a single cache line. The downside is requiring more cache ways when traversing data, and inefficient indexed addressing. (see also: planar image format)

Array of structures

Array of structures (or AoS) is the opposite (and more conventional) layout, in which data for different fields is interleaved. This is often more intuitive, and supported directly by most programming languages.

Example

Here is an example of the two layouts written in C++:

struct Point {
	int x;
	int y;
};

/* Array of structures approach */
//Each struct contains information about an individual ball.
struct Ball {
	Point position;
	Point velocity;
	std::string name;
};

std::vector<Ball> balls;

//Updating is as simple as looping over the array of balls.
void update_positions(std::vector<Ball>& bs) {
	for(auto& b : bs) {
		b.position.x += b.velocity.x;
		b.position.y += b.velocity.y;
	}
}

//If we only care about a particular attribute (in this case, the name) we
//have to load the entire ball into the function, which would be wasteful.
void print_names(const Balls& bs) {
	for(auto& b : bs) {
		std::cout << b.name << '\n';
	}
}

/* Structure of arrays approach */
//The struct consists of arrays of each particular attribute of the ball.
struct Balls {
	int n_balls;
	std::vector<Point> positions;
	std::vector<Point> velocities;
	std::vector<std::string> names;
};

//To iterate over the structure of arrays, we have to iterate over each of the 
//relevant arrays.
void update_positions(Balls& bs) {
	for(int i = 0; i < bs.n_balls; ++i) {
		//Retrieve the information of the ball at this index
		auto *c_position = &bs.positions[i];
		auto *c_velocity = &bs.velocities[i];
		//Do stuff with it
		c_position->x += c_velocity->x;
		c_position->y += c_velocity->y;
	}
};

//If we only care about a particular attribute, we only have to load the 
//relevant arrays, and nothing else.
void print_names(const Balls& bs) {
	for(int i = 0; i < bs.n_balls; ++i) {
		auto *c_name = bs.names[i];
		std::cout << *c_name << '\n';
	}
}

Alternatives

A hybrid approach is possible, e.g. de-interleaving a number of objects that correspond to the number of SIMD lanes.

It is also possible to split some subset of a structure (rather than each individual field) into a parallel array – this can actually improve locality of reference if different pieces of fields are used at different times in the program (see data oriented design).

Some SIMD architectures provide strided load/store instructions to load homogeneous data from SoA format. Yet another option used in some CELL libraries is to de-interleave data from AoS format when loading sources into registers, and interleave when writing out results (facilitated by superscalar issue of permutes). Some vector maths libraries align floating point 4D vectors with the SIMD register to leverage the associated data path and instructions, whilst still providing programmer convenience, although this does not scale to SIMD units wider than four lanes.

In the GPU stream processing model, SoA layout is generally not required, as they provide true vector addressing.

4D vectors

AOS vs SOA presents a choice when considering 3D or 4D vector data on machines with 4-lane SIMD hardware. SIMD ISAs are usually designed for homogeneous data, however some provide a dot product instruction^[2] and additional permutes making the AOS case easier to handle. As of 2016, most GPUs have moved away from 4D instructions to scalar SIMT pipelines,^[3] for better performance for compute kernels.

Software support

Most languages support the AOS format more naturally by combining records and various array abstract data types. The experimental JAI programming language is a recent attempt to provide language level SOA support.^[4] Julia language, supports multi-dimensional arrays, with AOS, or SOA (through a package).

References

http://www.cc.gatech.edu/~hyesoon/spr10/lec_cuda5.pdf

↑ "how to manipulate data structure to optimize memory use".
↑ "intel SSE4 dot product".
↑ "modern GPU arch (see scalar unified pipelines)" (PDF).
↑ "Data oriented demo:SOA,composition". Demonstration of data-oriented and SOA features in the JAI language, also explaining the motivation.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

[1] "how to manipulate data structure to optimize memory use".

[2] "intel SSE4 dot product".

[3] "modern GPU arch (see scalar unified pipelines)" (PDF).

[4] "Data oriented demo:SOA,composition". Demonstration of data-oriented and SOA features in the JAI language, also explaining the motivation.