Taxonomy of RISC-V Vector Extensions
One vector extension to rule them all (or not): survey of existing and future RISC-V Vector ISA extensions.
RISC-V Vector 1.0 (RVV) was ratified in November 2021. The main extension, dubbed simply the “v” extension (as in the letter V, not the roman numeral V), covers a lot of ground, with supported element sizes ranging from 8 to 64 bits, a plethora of formats (integer, floating-point, fixed-point, boolean mask, …) and countless operations. RVV 1.0 is actually not limited to the single v extension: it defines a few more extensions, each a subset of v, dedicated in part to embedded processors: the Zve* family of extensions. Since RVV 1.0, a few more vector extensions have been ratified, a few others are in the specification/ratification stages and a few more are still at the project stage. In this post we will survey those extensions.
RISC-V Vector 1.0
RVV extensions: ELEN, formats and operations
RVV defines ELEN as the widest supported element size. It corresponds to the widest supported integer format; the widest floating-point format can be narrower than ELEN, as we will see later, but it cannot be wider. For v, ELEN is 64 bits, which means that the v extension supports formats up to 64 bits wide. The v extension also supports both single (32-bit) and double (64-bit) floating-point precisions.
v actually supports all the instructions specified in RVV 1.0. There are other extensions which offer a reduced subset of instructions and/or formats.
The Figure below illustrates the main vector extensions and their relationship.
Extensions for Embedded Processors: Zve* family
RVV specification - Zve*: Vector Extensions for Embedded Processors
Zve32x and Zve32f limit ELEN to 32 bits: they offer no support for operations on 64-bit elements. This means that an effective element width (EEW) of 64 bits is not valid, which excludes any operation with the selected element width (SEW) set to 64 bits. No single-width operation on 64-bit formats is supported as part of Zve32*. Furthermore, neither narrowing operations from a 64-bit input nor widening operations from a 32-bit input (to a 64-bit output) are supported: it is simply not possible to have a valid 64-bit element as an operand or a result of an operation with just Zve32*. This translates into narrowing and widening operations being reserved in Zve32* when SEW is set to 32 bits (as SEW indicates the output format width for narrowing operations and the input format width for widening operations).
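The rule above can be sketched as a small check. This is only an illustrative model, not text from the specification: the helper name is_supported and the three operation classes are assumptions made for this example. It encodes the idea that an operation is usable only if its widest effective element width fits in ELEN.

```python
def is_supported(elen: int, sew: int, kind: str = "single") -> bool:
    """Return True if an operation class is usable at a given SEW.

    kind is "single" (every operand uses EEW == SEW), "widening"
    (the destination uses EEW == 2*SEW) or "narrowing" (the source
    uses EEW == 2*SEW). Hypothetical helper, for illustration only.
    """
    if sew not in (8, 16, 32, 64):
        raise ValueError(f"invalid SEW: {sew}")
    if kind not in ("single", "widening", "narrowing"):
        raise ValueError(f"unknown operation class: {kind}")
    # Widening and narrowing operations both involve a 2*SEW-wide element.
    widest_eew = sew if kind == "single" else 2 * sew
    return widest_eew <= elen
```

With ELEN=32, is_supported(32, 32, "widening") is False while is_supported(32, 16, "widening") is True, which matches the restriction described for Zve32*.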
Zve32f offers floating-point instructions while Zve32x does not, allowing further simplifications in the implementation.
Zve64x and Zve64f are similar: they extend ELEN to 64 bits and differ in their support for floating-point formats/instructions.
Note: the f suffix in the extension names Zve32f and Zve64f indicates that single precision (IEEE-754 binary32 format) is supported.
Zve64d builds on Zve64f and adds support for double precision. It still differs from the full v extension as it does not mandate support for the vmulh* and vsmul* instructions on 64-bit elements: those instructions would require the high part of a 64-bit integer multiplier, which is an extra cost some implementations want to avoid.
Note: Although a full 64-bit integer multiplier is not necessary to support Zve64d, the extension still requires double precision mantissa multiplication (53-bit multiplier with full output). It can be implemented with full throughput or through an iterative method.
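To see why the high-part multiplies are costly at 64 bits, here is a minimal scalar model of what one element of such an operation computes. The function names mirror the instruction mnemonics but are plain Python helpers written for this illustration; they rely on Python's arbitrary-precision integers to form the full 128-bit product that hardware would need.

```python
def vmulhu(a: int, b: int, sew: int = 64) -> int:
    """High SEW bits of the unsigned 2*SEW-bit product (one element)."""
    mask = (1 << sew) - 1
    return ((a & mask) * (b & mask)) >> sew


def vmulh(a: int, b: int, sew: int = 64) -> int:
    """High SEW bits of the signed product, returned as a signed integer."""
    def to_signed(x: int) -> int:
        x &= (1 << sew) - 1
        return x - (1 << sew) if x >> (sew - 1) else x

    # Python's >> is an arithmetic shift, so the sign is preserved.
    return (to_signed(a) * to_signed(b)) >> sew
```

For SEW=64 the intermediate product is 128 bits wide, whereas a low-part multiply only needs the bottom 64 bits.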
Constraint on VLEN: Zvl<n>b
RVV allows vector length agnostic (VLA) program development but the width of vector registers, VLEN, can still be important for some applications or some vector length specific (VLS) programs.
RVV defines a set of “extensions” to indicate support for a minimum VLEN: the Zvl*b extensions (e.g. Zvl32b). A Zvl<n>b extension indicates a lower bound on the implementation value of VLEN, that is, the implementation implements VLEN >= n bits.
For example, the RISC-V application processor profile, RVA, mandates Zvl128b, which means that any conformant implementation offers a VLEN of at least 128 bits.
Note: For any RVV 1.0 conformant vector implementation, VLEN must be at least 32-bit wide so all implementations support at least Zvl32b.
Note: there are no extensions to indicate a maximum VLEN value, and in particular there is no extension to indicate that a specific VLEN value is supported.
Compilers can still rely on machine-specific flags to perform VLS optimizations, or software can implement dynamic dispatch based on the value read from the vlenb register.
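The dispatch idea can be sketched as follows. This is a model only: the kernel names are hypothetical, and the value of vlenb is passed in as a plain integer since reading the CSR requires target-specific code. It relies on the fact that vlenb holds VLEN expressed in bytes.

```python
def vlen_from_vlenb(vlenb: int) -> int:
    """vlenb holds the vector register length in bytes, so VLEN = 8 * vlenb."""
    return 8 * vlenb


def satisfies_zvl(vlen: int, n: int) -> bool:
    """Zvl<n>b only states a lower bound: VLEN >= n bits."""
    return vlen >= n


def pick_kernel(vlenb: int) -> str:
    """Choose a code path from the runtime VLEN (kernel names are made up)."""
    vlen = vlen_from_vlenb(vlenb)
    if satisfies_zvl(vlen, 256):
        return "vls_256"
    if satisfies_zvl(vlen, 128):
        return "vls_128"
    return "vla_generic"
```

A machine reporting vlenb=16 (VLEN=128) would take the vls_128 path, while anything narrower falls back to the vector length agnostic version.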
Vector after RVV 1.0
The extensions described above were all part of RVV 1.0. Since its ratification, RVIA has kept busy and has been extending RVV with other vector extensions which we are going to survey now.
Support for smaller floating-point formats
For floating-point, RVV 1.0 specified support for single and double precision. Support for half precision (IEEE-754 binary16) and BFloat16 has been added since.
Half precision: Zvfh, Zvfhmin
Two extensions have been added to extend floating-point support to half-precision (IEEE-754 binary16).
Zvfh defines full-fledged support where all existing FP instructions are extended to half precision (encoded with vtype.vsew=1 for SEW=16-bit).
Zvfhmin defines a reduced version of it limited to some conversion operations; the goal being to convert from half to a wider format, perform any computation in the wider formats before eventually converting back to half precision and storing the result.
We reviewed those extensions in more detail in this section of this post.
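The convert, compute, convert back pattern that Zvfhmin targets can be modelled in a few lines. This is a rough sketch, not a description of the instructions themselves: the helper names are invented for this example, half-precision values are carried as raw 16-bit patterns, and intermediate results are rounded to single precision through the standard struct module (which only approximates a true binary32 datapath, and raises OverflowError for values too large for the target format).

```python
import struct


def f16_bits_to_float(bits: int) -> float:
    """Widening conversion: binary16 bit pattern to a wider value (exact)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def float_to_f16_bits(x: float) -> int:
    """Narrowing conversion: round a value to binary16, return its bits."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def round_to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def scale_add_f16(xs, ys, alpha: float):
    """Compute alpha * x + y on half-precision data using wider arithmetic."""
    out = []
    for xb, yb in zip(xs, ys):
        x = f16_bits_to_float(xb)            # half -> wider format
        y = f16_bits_to_float(yb)
        prod = round_to_f32(alpha * x)       # arithmetic in single precision
        res = round_to_f32(prod + y)
        out.append(float_to_f16_bits(res))   # convert back before storing
    return out
```

Only the two conversion helpers correspond to what Zvfhmin provides; the arithmetic in the middle would use the existing single-precision instructions.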
BFloat16: Zvfbfmin, Zvfbfwma
The current official RISC-V support for BFloat16 is more limited, compared to half precision.
Only conversions (extension Zvfbfmin) and a widening FMA to single precision (Zvfbfwma) are currently supported. The format is also encoded by vtype.vsew=1, and distinguished from half precision by the use of different opcodes.
More info in this post Vector BF16 support: Zvfbfmin and Zvfbfwma.
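As a rough illustration of these two kinds of operations, the sketch below models BFloat16 as the upper 16 bits of a binary32 value. All function names are made up for this example. The narrowing conversion uses round-to-nearest-even and deliberately ignores NaN inputs, and the widening FMA is approximated by computing in Python's double precision and rounding once to single precision, which is close to, but not guaranteed identical to, a true fused operation.

```python
import struct


def f32_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_to_f32(bits: int) -> float:
    """Widening conversion: BFloat16 is the top half of a binary32 value."""
    return bits_to_f32((bits & 0xFFFF) << 16)


def f32_to_bf16(x: float) -> int:
    """Narrowing conversion with round-to-nearest-even (NaN not handled)."""
    bits = f32_to_bits(x)
    lsb = (bits >> 16) & 1                 # parity of the retained mantissa
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF


def widening_fma(a_bf16: int, b_bf16: int, acc_f32: float) -> float:
    """Model of acc + a * b with BFloat16 inputs and a binary32 result."""
    a = bf16_to_f32(a_bf16)
    b = bf16_to_f32(b_bf16)
    # a * b is exact in double precision; the sum is rounded to binary32.
    return bits_to_f32(f32_to_bits(a * b + acc_f32))
```

For instance, widening_fma(0x3F80, 0x4000, 0.5) multiplies 1.0 by 2.0 and accumulates into 0.5 in single precision.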
RVIA is working on extending support for BFloat16 with the Zvfbfa extension (ongoing as of March 2025), which we present briefly later in this post.
Vector Bit Manipulations: Zvbb, Zvkb
RVV 1.0 was lacking some generic bit manipulation instructions (bit/byte reverse, rotations, …). Those were added as part of the vector crypto extension effort through the Zvkb and Zvbb extensions (Zvkb is a strict subset of Zvbb).
Zvbb was deemed important enough to be made mandatory in the RVA23U64 profile. We can thus expect to see it supported by modern RISC-V application processors.
Vector cryptography: AES, SHA2, SM3, SM4 and GCM
The vector cryptography effort at RVIA also produced the following specifications:
NIST cipher and hash functions (AES and SHA-2): Zvkned, Zvknha, Zvknhb
ShangMi cipher and hash functions (SM4 and SM3): Zvksed, Zvksh
Carry-less multiplication: Zvbc (64-bit only at the moment)
Galois Counter Mode Multiply/Accumulate-Multiply: Zvkg
The taxonomy of vector crypto extensions was covered in detail in this post:
RISC-V vector crypto spec freeze update
Note: the vector crypto extension defined a number of super extensions, grouping the extensions listed above into “thematic” sets. For example Zvkng (NIST+GCM) groups: Zvkg and Zvkn; in turn Zvkn (NIST) is made of Zvkned + Zvknhb + Zvkb + Zvkt.
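The grouping described in this note can be written down as a small lookup table. The sketch below only contains the two super extensions named above; the table name GROUPS and the helper expand are illustrative choices, and the real specification defines more groups than these.

```python
# Super extensions and their direct members, as listed in the note above.
GROUPS = {
    "Zvkng": ["Zvkn", "Zvkg"],
    "Zvkn": ["Zvkned", "Zvknhb", "Zvkb", "Zvkt"],
}


def expand(ext: str) -> set:
    """Recursively expand a super extension into its leaf extensions."""
    members = GROUPS.get(ext)
    if members is None:
        return {ext}  # not a group: already a leaf extension
    leaves = set()
    for member in members:
        leaves |= expand(member)
    return leaves
```

Expanding Zvkng yields Zvkned, Zvknhb, Zvkb, Zvkt and Zvkg, which is the set an implementation has to provide to claim the super extension.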
Vector extension dependencies
Most vector extensions rely on other extensions. The vector extensions dependency graph is illustrated by the figure below (including the main dependencies to scalar extensions):
The future of RISC-V Vector
RVIA is continuing its specification effort and numerous extension projects are under way. Let’s review a few of the most active ones.
Extended vector support for BFloat16: Zvfbfa
As we have seen, RISC-V currently offers limited support for BFloat16: only a handful of conversions and a single arithmetic operation (widening FMA) are supported in ratified extensions. This is about to change, as a proposal to extend support to many more operations has been made. This proposal was submitted in late 2024 (email) to the RVIA Vector SIG (Special Interest Group) by Andrew Waterman and consists of a new extension: Zvfbfa (draft proposal).
The Zvfbfa proposal defines almost complete support for BFloat16 operations: all but a few vector instructions supporting single precision are extended to support BFloat16. For format encoding, BFloat16 reuses vtype.vsew = 1 (16-bit, also used by FP16 half precision) and is differentiated from FP16 by the use of a new vtype bit: altfmt. altfmt=0 encodes half precision and altfmt=1 encodes BFloat16.
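A small decoder makes the encoding concrete. The SEW computation follows the RVV 1.0 encoding of vsew (SEW = 8 << vsew); the altfmt handling only reflects the draft proposal as summarized here and could change before ratification. The function names are invented for this sketch, and only the 16-bit case is modelled.

```python
def sew_from_vsew(vsew: int) -> int:
    """RVV 1.0 encoding: vsew 0, 1, 2, 3 selects SEW 8, 16, 32, 64 bits."""
    if not 0 <= vsew <= 3:
        raise ValueError(f"reserved vsew encoding: {vsew}")
    return 8 << vsew


def fp16_format(vsew: int, altfmt: int) -> str:
    """Which 16-bit FP format is selected (draft Zvfbfa behaviour)."""
    if sew_from_vsew(vsew) != 16:
        raise ValueError("only the SEW=16 case is modelled here")
    return "BFloat16" if altfmt else "FP16"
```

With this scheme the same opcode can operate on either 16-bit format, depending on the current vtype setting.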
At the time of writing (early March 2025) the following BFloat16 operations are excluded from the Zvfbfa project: “division, square root, reductions, and conversions to/from integers wider than 8 bits”. Their benefit is deemed too small relative to their cost to justify their integration; the proposal suggests converting to FP32 and using FP32 operations to perform those operations indirectly.
Note: the 7-bit vector estimate instructions, vfrec7.v and vfrsqrt7.v, provide a very accurate approximation of the reciprocal and reciprocal square root in BFloat16, since the format's mantissa is only 7 bits wide (plus an implicit digit).
RVV support for OCP formats: OFP8, OFP4
Another proposal was made in parallel with Zvfbfa: minimal support for the Open Compute 8-bit floating-point formats (OFP8 E5M2 and E4M3 formats).
This proposal, dubbed Zvfofp8min (initially named Zvfmx8min), offers conversions from both OCP 8-bit formats to BFloat16, and the conversions in the other direction. It also specifies conversions from FP32 to the 8-bit OFP8 formats.
The specification draft proposal can be found here.
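For readers unfamiliar with the two 8-bit formats, here is a minimal decoder sketch. It follows my understanding of the OCP OFP8 encodings (E5M2 with bias 15, infinities and NaNs; E4M3 with bias 7, no infinities and a single NaN mantissa pattern) and is not taken from the RISC-V draft; the function name decode_ofp8 is hypothetical.

```python
import math


def decode_ofp8(byte: int, fmt: str) -> float:
    """Decode one OFP8 byte ("E5M2" or "E4M3") into a Python float."""
    if fmt == "E5M2":
        exp_bits, man_bits, bias = 5, 2, 15
    elif fmt == "E4M3":
        exp_bits, man_bits, bias = 4, 3, 7
    else:
        raise ValueError(f"unknown format: {fmt}")

    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1

    if fmt == "E5M2" and exp == exp_max:
        # IEEE-like special values: infinity or NaN.
        return sign * math.inf if man == 0 else math.nan
    if fmt == "E4M3" and exp == exp_max and man == 0b111:
        # E4M3 has no infinities; only this pattern encodes NaN.
        return math.nan
    if exp == 0:
        # Zero and subnormal numbers.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)
```

Every finite value produced by this decoder fits exactly in BFloat16, which has both a wider exponent and a wider mantissa, so the OFP8 to BFloat16 direction involves no rounding.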
In the same email which announced Zvfmx8min / Zvfofp8min, two new extensions, Zvi4min and Zvfmx4min, were also suggested. They respectively introduce instructions to support conversions of 4-bit integer and 4-bit floating-point formats. Andrew Waterman’s email is overall an interesting read as it lays out his views on the general topic of RISC-V vector support for smaller formats.
Extensions for DSP
A fast track extension project, announced here, aims at specifying new instructions for the computation of sum of absolute differences. The proposal was initially sent to the vector SIG mailing list in December 2024.
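As a reminder of what the operation computes, here is a scalar reference for a sum of absolute differences. It illustrates the mathematical definition only; the element widths, accumulation width and exact semantics of the proposed instructions are not modelled, and the function name is an assumption.

```python
def sum_abs_diff(a, b) -> int:
    """Reference sum of absolute differences over two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))
```

This kind of kernel shows up, for example, in block matching for video encoding.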
Another proposal, also shared on the vector SIG mailing list in December 2024, suggests the introduction of packing and unpacking vector instructions (zip and unzip).
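The generic zip and unzip operations can be described with the reference functions below. They only show the element interleaving pattern that such operations usually denote; the actual operands and semantics of the proposed instructions may differ, and the function names are illustrative.

```python
def zip_vec(a, b):
    """Interleave two equal-length vectors: a0, b0, a1, b1, ..."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = []
    for x, y in zip(a, b):
        out += [x, y]
    return out


def unzip_vec(v):
    """Split a vector into its even-indexed and odd-indexed elements."""
    return v[0::2], v[1::2]
```

Applying unzip_vec to the result of zip_vec returns the two original vectors.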
A third, more ambitious (in terms of the number of instructions) proposal, also made in December 2024, aims at accelerating wireless applications. The project, dubbed Zvw, was initially announced on the vector SIG mailing list in this email. The proposal has a (non-RVIA) github repository: https://github.com/gaoshanlee193/riscv-vector4wireless-extension/tree/Zvw.
Extensions for Matrix multiplication
RVIA has set up two different task groups to work on support for matrix operations in RISC-V: the integrated matrix extension (IME) task group (IME mailing list) and the attached matrix extension (AME) task group (AME mailing list).
The IME aims at specifying a set of extensions to support matrix operations within the existing vector registers, while the AME aims at specifying a set of extensions to support matrix operations with dedicated accumulator state.
Those groups are very active, and multiple different proposals for extensions are being discussed. They would justify a full blog post (or even a series) just on that topic.
Vector cryptographic extensions
RVIA has spawned two task groups to extend cryptographic support.
The first task group, the high assurance cryptography TG (mailing list), is working on a proposal to define vector cryptographic primitives amenable to the implementation of side channel protection. The proposal uses vector registers to interface with a high assurance cryptography engine (send/receive data block, plaintext or ciphertext, exchange wrapped keys, …).
The second active cryptography task group, the post-quantum cryptography TG (mailing list), is working on defining new vector instructions to accelerate post-quantum cryptographic primitives. The TG work focuses on accelerating Kyber and Dilithium and is going in the direction of specifying vector instructions to accelerate the Keccak hash function (SHA-3) which appears to be the main performance bottleneck in the implementation of those algorithms on RISC-V Vector.
Conclusion
RISC-V Vector is still pretty new (at least when we consider the ratified version, RVV 1.0) and is gaining traction in the computing ecosystem, with more and more hardware becoming available. RVIA has been working on extending the original specification: a few extra dedicated extensions have already been ratified (support for cryptography or 16-bit floating-point formats) and numerous projects are under way. If you want to be involved, please check the mailing list references below.
Reference(s)
Most of the vector extensions are being discussed in RVIA Vector SIG (Special Interest Group). Its mailing list can be accessed here. The SIG meets every other week on Mondays (RVIA technical agenda is accessible here). Larger extensions, or extensions requiring longer term work will certainly spawn their own task groups.
Also, although it is not a proposal (yet), a detailed vector gap analysis has been published: https://gist.github.com/camel-cdr/99a41367d6529f390d25e36ca3e4b626