RISC-V Vector Element Groups
The RISC-V Vector extension defines multiple ways to organize vector data over one or multiple vector registers. In this previous blog post we presented the concept of vector register groups defined by RVV 1.0: the capacity to group multiple vector registers into a larger meta vector register which most instructions can operate on as a single operand, thus extending the size of vector operands and results. Recently a new concept was introduced, vector element groups: multiple contiguous elements are considered as a single larger element, and instructions operate on the group as if it were a single element. The concept was suggested by Krste Asanovic in this email, and later specified in a standalone document of the vector spec: https://github.com/riscv/riscv-v-spec/blob/master/element_groups.adoc.
Definition
A vector element group is defined by an effective element width (EEW) and an element group size (EGS): it is a group of EGS elements, each EEW bits wide. The total group width in bits is called the Element Group Width (EGW), with EGW = EEW * EGS.
NOTE: the single element width parameter implies that all elements in an element group have the same width.
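As a minimal sketch of the definition above, the relation between EEW, EGS and EGW can be written down directly. The `ElementGroup` class is a hypothetical helper introduced here for illustration only; it is not part of any RISC-V specification or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementGroup:
    """Hypothetical helper type (illustration only, not from the spec)."""

    eew: int  # effective element width, in bits
    egs: int  # element group size, in elements

    @property
    def egw(self) -> int:
        # EGW = EEW * EGS, as defined in the text
        return self.eew * self.egs


# A 128-bit AES block seen as a group of four 32-bit elements
aes_block = ElementGroup(eew=32, egs=4)
print(aes_block.egw)  # 128
```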
The element group is useful to manipulate multiple data elements which make sense as a block (e.g. a 128-bit ciphertext for the AES cipher algorithm) without the need to define large element widths and implement their support in hardware.
An element group can be fully contained in one vector register or can span multiple registers. In the former case, a single vector register can contain multiple element groups.
An element group can also have an EGW larger than an implementation's VLEN; in this case a multi-register group is required to fit a single element group. The same constraints as for any vector register group apply: the register group is encoded by its first register, whose index must be a multiple of the group size.
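The relationship between EGW, VLEN and the register group can be sketched as follows. The function names are hypothetical; the sketch assumes the RVV 1.0 register group sizes of 1, 2, 4 or 8 registers and the 32 architectural vector registers.

```python
def groups_per_register(vlen: int, egw: int) -> int:
    """Number of whole element groups fitting in one VLEN-bit register.

    Returns 0 when EGW is larger than VLEN.
    """
    return vlen // egw


def min_register_group_size(vlen: int, egw: int) -> int:
    """Smallest register group (1, 2, 4 or 8 registers) holding one element group."""
    for size in (1, 2, 4, 8):
        if size * vlen >= egw:
            return size
    raise ValueError("element group does not fit in an 8-register group")


def is_valid_group_base(reg_index: int, group_size: int) -> bool:
    """The first register of a group must have an index multiple of the group size."""
    return 0 <= reg_index < 32 and reg_index % group_size == 0


print(groups_per_register(vlen=256, egw=128))      # 2
print(min_register_group_size(vlen=128, egw=256))  # 2
print(is_valid_group_base(3, 2))                   # False
```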
EEW can either be specific to the opcode or defined through SEW. For example, most vector crypto instructions define EEW from SEW (even if only a small subset of values is legal), so vtype must be set properly before executing a vector crypto instruction. This is particularly useful to reuse the same instruction for different algorithms: e.g. setting SEW=32 bits and vl=4 and executing a vsha2c will perform a SHA-256 message compression, while setting SEW=64 bits and vl=4 and executing a vsha2c will perform a SHA-512 message compression. We will provide more detail on the new vector crypto extension in a future post.
Contrary to EEW, EGS is always defined by the opcode: it is not a new vtype field. For example vsha2ms (SHA-2 message schedule) statically defines EGS as 4.
Constraints on vl and vstart
The vector length (vl) of an element-group operand is counted in elements, not in groups. The vector length must be a multiple of the element group size EGS, and the same constraint applies to vstart. Other cases are reserved. This means that operating on a single element group with EGS=8 requires setting vl to 8, operating on 3 element groups requires setting vl to 24, and so forth.
When vl (or vstart) is not a multiple of EGS, the case is reserved, and executing an instruction that operates on element groups may result in an illegal instruction exception being signaled.
Note: raising an illegal instruction exception on reserved cases is allowed although it is not required. Implementers may decide otherwise.
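The constraint above is simple enough to sketch as a pair of helpers. Both function names are hypothetical and only model the rule stated in the text; what an implementation actually does in the reserved case is left open, as noted above.

```python
def vl_for_groups(num_groups: int, egs: int) -> int:
    """vl is counted in elements, so N element groups need vl = N * EGS."""
    return num_groups * egs


def is_reserved(vl: int, vstart: int, egs: int) -> bool:
    """True when vl or vstart is not a multiple of EGS (reserved case)."""
    return vl % egs != 0 or vstart % egs != 0


print(vl_for_groups(1, egs=8))                # 8
print(vl_for_groups(3, egs=8))                # 24
print(is_reserved(vl=12, vstart=0, egs=8))    # True
print(is_reserved(vl=24, vstart=8, egs=8))    # False
```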
Masking and element groups
The specification of element groups leaves a lot of room when it comes to masking: masking support is defined on a per-operation basis. The concept allows for per-element masking and mask setting, or per-element-group masking (where a group is active if any, or all, of the mask bits corresponding to its elements are set). The concept does not seem to cover the case where a single mask bit corresponds to a full group regardless of the actual number of elements in the group: mask bit 0 would correspond to group 0, mask bit 1 to group 1, and so on. This case would behave similarly to masking with SEW=EGW.
In the only existing use case (the vector cryptography extension), none of the instructions defined as operating on element groups support masking, so the problem of element group masking implementation will be delayed to a future extension.
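Although no current instruction uses it, the per-element-group option described above can be illustrated with a small model. This is a sketch under assumptions: the `group_mask` function and its `policy` parameter are hypothetical names, and the "any"/"all" reduction is only one reading of how a future extension might derive a group-level mask from per-element mask bits.

```python
from typing import List


def group_mask(element_mask: List[bool], egs: int, policy: str = "any") -> List[bool]:
    """Reduce per-element mask bits to one active/inactive flag per element group.

    policy="any": a group is active if at least one of its element bits is set.
    policy="all": a group is active only if all of its element bits are set.
    """
    if len(element_mask) % egs != 0:
        raise ValueError("mask length must be a multiple of EGS")
    combine = {"any": any, "all": all}[policy]
    return [
        combine(element_mask[i:i + egs])
        for i in range(0, len(element_mask), egs)
    ]


mask = [True, False, False, False, True, True, True, True]
print(group_mask(mask, egs=4, policy="any"))  # [True, True]
print(group_mask(mask, egs=4, policy="all"))  # [False, True]
```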
Examples and use cases
The following diagram illustrates two examples of element groups. The top element group has an EGW half as wide as VLEN, so two element groups fit in a single vector register. The bottom example has an EGW twice as wide as VLEN, so a 2-register group is required to fit a single element group.
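The two layouts can also be worked out numerically. The sketch below assumes that elements are packed contiguously, filling each register of the group before moving to the next one, as in the standard RVV 1.0 layout for register groups; `element_location` is a hypothetical helper and the VLEN values are arbitrary examples.

```python
from typing import Tuple


def element_location(group: int, elem: int, eew: int, egs: int, vlen: int) -> Tuple[int, int]:
    """Locate one element of an element group.

    Returns (register offset within the register group, bit offset in that register).
    """
    bit = (group * egs + elem) * eew
    return bit // vlen, bit % vlen


# EGW = 128 is half of VLEN = 256: element group 1 starts in the middle of register 0
print(element_location(group=1, elem=0, eew=32, egs=4, vlen=256))  # (0, 128)

# EGW = 256 is twice VLEN = 128: element 4 of group 0 lands in the second register
print(element_location(group=0, elem=4, eew=32, egs=8, vlen=128))  # (1, 0)
```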
The element group concept was first used by the vector cryptography extension proposal (a draft under architectural review at the time of writing). Different element group configurations are used:
4x 32-bit elements for AES, SM4, GHASH and SHA-256
8x 32-bit elements for SM3
4x 64-bit elements for SHA-512
As mentioned earlier, the vector crypto extension requires SEW to be set to an element width supported by the instruction being executed, and considers all other cases as reserved.
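Combining the SEW requirement with the vl constraint, a configuration check could be sketched as below. The `CONFIGS` table only repeats a few of the (EEW, EGS) pairs listed above; the table, the `check_config` function and the returned strings are hypothetical and do not model any actual instruction encoding.

```python
# (EEW, EGS) pairs copied from the list above; the table itself is illustrative
CONFIGS = {
    "AES": (32, 4),
    "SM4": (32, 4),
    "SHA-256": (32, 4),
    "SM3": (32, 8),
    "SHA-512": (64, 4),
}


def check_config(algorithm: str, sew: int, vl: int) -> str:
    """Classify a (SEW, vl) setting for an element-group operation."""
    eew, egs = CONFIGS[algorithm]
    if sew != eew:
        return "reserved: unsupported SEW"
    if vl % egs != 0:
        return "reserved: vl not a multiple of EGS"
    return f"ok: {vl // egs} element group(s) of {eew * egs} bits"


print(check_config("SM3", sew=32, vl=8))      # ok: 1 element group(s) of 256 bits
print(check_config("SHA-512", sew=32, vl=4))  # reserved: unsupported SEW
print(check_config("AES", sew=32, vl=6))      # reserved: vl not a multiple of EGS
```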
Difference between element groups and larger SEW
One relevant question regarding element groups is: what is the difference between an element group with EGW=128 (e.g. EEW=32 and EGS=4) and a single element with SEW=128?
The first fact is that currently SEW (as defined by the vsew field of the vtype register) cannot exceed 64 bits (RVV 1.0 spec section). Although the vsew field is large enough to accommodate larger values, those encodings are currently reserved. So element groups bring support for larger data blocks without requiring new vsew encodings.
A second fact is that element groups can be supported even if EGW exceeds ELEN, the maximal element width supported by the architecture. In fact EGW can even exceed VLEN: this case is part of the element group concept and is supported by laying out an element group across a multi-register vector register group. This is not currently supported for a single element. It reflects the fact that operations using element groups do not need an actual datapath wider than ELEN: most operations are performed on data that is ELEN bits wide or less.
A third fact is that element groups can be supported mostly transparently from the micro-architecture point of view: there is not much difference between a single element group with EGS=4 and EEW=32 and a 4-element vector with EEW=32, so internal element masking and tail processing can reuse the same datapaths.
Conclusion
Vector element groups are an interesting new paradigm to extend the capabilities of vector processors, similar to the concept of vector register groups with an integer or fractional group length multiplier (reviewed in the RISC-V Vector Register Groups post). They reuse the existing SEW, vl and LMUL parameters, giving them a different meaning and different constraints. We will come back to vector element groups when we review the upcoming vector crypto extensions (stay tuned for this blog series).
References:
Section on vector element groups in the RISC-V vector crypto specification: https://github.com/riscv/riscv-crypto/blob/master/doc/vector/riscv-crypto-vector-element-groups.adoc
Section on vector element groups in the RISC-V Vector specification (not part of any released version yet): https://github.com/riscv/riscv-v-spec/blob/master/element_groups.adoc