RISC-V vector crypto spec freeze update

Revisiting vector crypto specification on its way to ratification

Jun 25, 2023

Some time ago we presented the upcoming RISC-V vector crypto extension (a.k.a. vector crypto) in a 2-post series. The specification was not yet frozen and was being actively reviewed by the RISC-V Architecture Review Committee (ARC).

RISC-V Vector Cryptography Extensions (1/2), Fprox, February 12, 2023

RISC-V Vector Cryptography Extension (2/2), Fprox, February 20, 2023

Since then, the specification has been approved by the ARC and, after a vote, by the RVIA committee chairs: it has reached the “frozen” milestone. The next step on its way to ratification is a 30-day public review, which has just started (see the announcement) and will end on July 23rd, 2023.

In this post, we will cover what has changed since our initial posts in February 2023. The full specification v1.0.0-rc1 is available on GitHub: https://github.com/riscv/riscv-crypto/releases/tag/v20230620.

The following diagram presents a summary of the instructions and extensions in the RISC-V vector crypto specification (the new instructions and extensions are highlighted in bold font; Zvkt is not represented).

VECTOR CRYPTO INSTRUCTIONS AND EXTENSIONS

Revised Instructions

Aligning vandn on scalar andn

vandn.v* was revised to mirror the specification of the scalar andn from the Zbb extension: the roles of operands vs2 and vs1/rs1 were swapped, and the inverted operand is now vs1/rs1 (the result is vs2 & ~vs1). The variant vandn.vi was removed, since it can be implemented with vand.vi and a bitwise-inverted immediate.
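As a rough illustration of the revised operand roles, the element-wise behavior can be modeled in C as below. This is only a sketch and not the specification's pseudocode: the function names, the fixed 32-bit element width and the plain-array view of vector registers are assumptions made for the example, and masking and tail handling are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of vandn.vv for SEW=32 (illustration only).
 * The inverted operand is vs1, mirroring scalar andn (rd = rs1 & ~rs2). */
static void model_vandn_vv(uint32_t *vd, const uint32_t *vs2,
                           const uint32_t *vs1, size_t vl)
{
    for (size_t i = 0; i < vl; i++)
        vd[i] = vs2[i] & ~vs1[i];
}

/* What the removed vandn.vi would have computed, written as a plain AND
 * with the bitwise-inverted immediate (the vand.vi replacement). */
static uint32_t model_and_inverted_imm(uint32_t x, int32_t imm)
{
    return x & (uint32_t)~imm;
}
```

Since the bitwise inversion of a 5-bit signed immediate is still a 5-bit signed immediate, the second helper shows why no encoding range is lost by dropping the .vi variant.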

Swapped GHASH for better software integration

RISC-V vector crypto introduces an instruction to speed up the evaluation of GCM (Galois/Counter Mode), a mode of operation for block ciphers. The most compute-intensive part of GCM (apart from the cipher itself) is a function called GHASH, which is built on a multiply-accumulate in the GF(2^128) finite field, with 128-bit inputs and results interpreted as polynomials.

The early versions of RISC-V vector crypto specified vghmac.vv: an instruction which performs a multiply-xor in the GF(2^128) field (xor is the addition operation in such fields) corresponding to the MAC variant in the GCM diagram below (modified from Wikipedia). It was replaced by a new instruction: vghsh.vv, which performs a xor-multiply in the same field, corresponding to the ACCMUL variant below.

The xor-multiply variant is more in line with what software expects (see the discussion on this GitHub issue). In particular, OpenSSL, a widely used library implementing cryptography primitives and TLS for secure network connections, relies on the xor-multiply order, making it more natural for new ISAs to adopt this variant (OpenSSL source code):

        for (i = 0; i < len; i += 16) {
            memcpy(tmp, &inp[i], sizeof(tmp));
            Xi[0] ^= tmp[0];
            Xi[1] ^= tmp[1];
            funcs.gmult(Xi, Htable);
        }

And the same is true for the implementation of GHASH in the Linux kernel (Linux source code):

	while (srclen >= GHASH_BLOCK_SIZE) {
		crypto_xor(dst, src, GHASH_BLOCK_SIZE);
		gf128mul_4k_lle((be128 *)dst, ctx->gf128);
		src += GHASH_BLOCK_SIZE;
		srclen -= GHASH_BLOCK_SIZE;
	}

A new instruction, vgmul.vv, was introduced. It performs the multiply without any xor.

The two new instructions exist only in vector-vector form (.vv). Vector-scalar variants (.vs) were requested but, although the task group thought they would be helpful in some cases, the expected benefits were small overall and the request came too late in the process to be integrated without delaying ratification. They might be considered in a future extension (potentially a fast track extension once the main specification has been ratified).

Note: the GHASH algorithm interprets bit-in-bytes in a big-endian fashion (bit 0 being the most significant coefficient). vghsh.vv and vgmul.vv integrate this bit-in-byte endianness swap on inputs and output (as did the earlier vghmac.vv) which saves precious instructions.
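To make the difference between the two operation orders concrete, here is a small C model of the field arithmetic. It is a sketch under stated assumptions, not the instructions' reference code: the gf128 type and all function names are hypothetical, a 128-bit block is viewed as two big-endian 64-bit halves, and the bit ordering follows the usual GCM convention (the leftmost bit of the block is the coefficient of x^0). Register layout and the bit-in-byte swap performed by the instructions are not modeled.

```c
#include <stdint.h>

/* Hypothetical 128-bit block: hi holds bytes 0..7, lo holds bytes 8..15. */
typedef struct { uint64_t hi, lo; } gf128;

/* Bit-serial GF(2^128) multiply using the GCM polynomial
 * x^128 + x^7 + x^2 + x + 1 (shift-and-reduce, not constant-time). */
static gf128 gf128_mul(gf128 x, gf128 y)
{
    gf128 z = {0, 0}, v = y;
    for (int i = 0; i < 128; i++) {
        uint64_t xi = (i < 64) ? (x.hi >> (63 - i)) & 1
                               : (x.lo >> (127 - i)) & 1;
        if (xi) { z.hi ^= v.hi; z.lo ^= v.lo; }
        uint64_t carry = v.lo & 1;           /* coefficient of x^127 */
        v.lo = (v.lo >> 1) | (v.hi << 63);   /* multiply v by x */
        v.hi >>= 1;
        if (carry) v.hi ^= 0xE100000000000000ULL; /* reduce x^128 */
    }
    return z;
}

/* xor-multiply, the order described for vghsh.vv: (Y ^ X) * H */
static gf128 ghash_xor_mul(gf128 y, gf128 x, gf128 h)
{
    gf128 t = { y.hi ^ x.hi, y.lo ^ x.lo };
    return gf128_mul(t, h);
}

/* multiply-xor, the order described for the earlier vghmac.vv: Y ^ (X * H) */
static gf128 ghash_mul_xor(gf128 y, gf128 x, gf128 h)
{
    gf128 p = gf128_mul(x, h);
    gf128 r = { y.hi ^ p.hi, y.lo ^ p.lo };
    return r;
}
```

With a running digest Y, one call to ghash_xor_mul per input block reproduces the loop structure of the OpenSSL and Linux snippets above. With the multiply-xor order, the xor of the next block has to be carried separately by software.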

Body-builded bit manipulation

The original Zvkb extension has been split into two extensions: Zvbb and Zvbc.

Zvbc contains the two carry-less multiply instructions vclmul.v* and vclmulh.v*. Those instructions are only defined for an element size SEW of 64 bits. Although supporting other element widths was considered, it did not make it into the final specification.
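For readers less familiar with carry-less multiplication, the following C sketch models what the pair of instructions computes on a single 64-bit element: the low and high halves of the 128-bit polynomial product over GF(2). The function names are hypothetical and the bit-serial loops are meant for clarity, not speed.

```c
#include <stdint.h>

/* Low 64 bits of the carry-less product a * b (per-element view of vclmul). */
static uint64_t model_vclmul(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i++)
        if ((b >> i) & 1)
            r ^= a << i;          /* xor instead of add: no carries */
    return r;
}

/* High 64 bits of the same product (per-element view of vclmulh). */
static uint64_t model_vclmulh(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 1; i < 64; i++)  /* i = 0 contributes nothing above bit 63 */
        if ((b >> i) & 1)
            r ^= a >> (64 - i);
    return r;
}
```

For example, 3 times 3 gives 5 rather than 9, because (x + 1)^2 = x^2 + 1 when coefficients are added modulo 2.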

Zvbb contains the remaining instructions from Zvkb plus a few new ones:

vbrev.v: bit-reversal in elements (generalization of vbrev8.v)
vcpop.v: vector pop-count per-element (RVV 1.0 already specifies vcpop.m: vector population count for a full mask, this instruction adds a per-element pop-count)
vctz.v and vclz.v: vector count trailing-zeros and leading zeros
vwsll.v*: vector widening shift left logical, shifting SEW-wide input elements to (2*SEW)-wide output elements (after first zero extending them)

Note: vbrev8.v and vbrev.v can seem redundant since the former can be emulated with vbrev.v when SEW=8-bit, but this may require a couple of vsetvl* instructions to change and restore SEW (if it is not already equal to 8-bit). As reversing bits in bytes is pretty common in cryptography, it justifies keeping the instruction with a static element width (embedded in the opcode).
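The per-element behavior of these instructions can be summarized with a few scalar C helpers. This is a sketch assuming SEW=32 (and a 64-bit destination for the widening shift); the function names are hypothetical, and the conventions that counting zeros on an all-zero element returns SEW and that the widening shift uses the shift amount modulo 2*SEW reflect our reading of the specification.

```c
#include <stdint.h>

/* vbrev.v: reverse all bits of the element. */
static uint32_t model_vbrev(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++)
        r = (r << 1) | ((x >> i) & 1u);
    return r;
}

/* vbrev8.v: reverse bits inside each byte, byte positions unchanged. */
static uint32_t model_vbrev8(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        int dst = (i & ~7) | (7 - (i & 7));
        r |= ((x >> i) & 1u) << dst;
    }
    return r;
}

/* vcpop.v: number of set bits in the element. */
static uint32_t model_vcpop(uint32_t x)
{
    uint32_t n = 0;
    for (; x; x >>= 1)
        n += x & 1u;
    return n;
}

/* vclz.v: leading zeros (32 when x == 0). */
static uint32_t model_vclz(uint32_t x)
{
    uint32_t n = 0;
    for (uint32_t m = 0x80000000u; m && !(x & m); m >>= 1)
        n++;
    return n;
}

/* vctz.v: trailing zeros (32 when x == 0). */
static uint32_t model_vctz(uint32_t x)
{
    uint32_t n = 0;
    for (uint32_t m = 1u; m && !(x & m); m <<= 1)
        n++;
    return n;
}

/* vwsll.v*: zero-extend to 2*SEW, then shift left. */
static uint64_t model_vwsll(uint32_t x, uint32_t shamt)
{
    return (uint64_t)x << (shamt & 63u);
}
```

Note how model_vbrev8 produces the same result for any element width that is a multiple of 8, which is what allows the real instruction to be used without changing SEW.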

Zvkt: data independent timing vector extension

The new version of the vector crypto specification introduces the Zvkt extension. This extension does not introduce any new instruction but it mandates that a list of existing instructions be implemented with data independent timing (sometimes also called DIEL for data independent execution latency). This concept was first introduced to RISC-V in the scalar crypto specification with the Zkt extension.

Note: Software implementations of cryptography primitives which use instructions whose latency depends on the data being manipulated are vulnerable to timing attacks. Those side channel attacks use differences in timing to extract secrets (e.g. key values). Mandating that the subset of instructions used to implement critical parts of cryptographic software does not exhibit data-dependent timing represents a first step towards secure software.
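A classic software-level illustration of the problem is the comparison of a secret value (for example an authentication tag) against an attacker-supplied one. The sketch below contrasts an early-exit comparison with a constant-time style one; the function names are hypothetical and the example is not taken from the specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Leaky: returns at the first mismatch, so the run time reveals how many
 * leading bytes of the two buffers are equal. */
static int leaky_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}

/* Constant-time style: always touches every byte and has no branch that
 * depends on the data being compared. */
static int ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);
    return diff == 0;
}
```

The second version is only as good as the instructions it compiles to: if the xor or the or had a data-dependent latency, the leak would reappear below the software level. This is the gap that extensions such as Zkt and Zvkt are meant to close (the compiler must of course also preserve the branch-free structure).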

Zvkt lists:

all the instructions from Zvbb and Zvbc.
46 instructions from RVV 1.0 (the full list is available here).

Zvkt does not mandate timing independence with respect to the vector length vl or to the mask value (except for some specific operations where the mask constitutes the main operand, such as vmand.mm). For some operations, not all operands are covered by the data independent timing constraint. For example, the slide amount or indices (vs1/rs1) for vslide* and vrgather* are not covered by the DIEL constraint.

More meta-extensions

Some of the vector crypto extensions have been listed as options for the upcoming RVA23 profile (our article on the new profile and the specification). Those extensions are grouped in 4 options:

NIST suite + GCM: Zvkng = Zvkned + Zvknhb + Zvbb + Zvkg + Zvkt
Shang-Mi suite + GCM: Zvksg = Zvksed + Zvksh + Zvbb + Zvkg + Zvkt
Carry-Less Multiply: Zvbc
Bit Manipulation: Zvbb (redundant with either Zvkng or Zvksg)

To ensure an efficient GCM implementation, Zvkg is integrated in Zvkng and Zvksg. As a result, these profile options do not allow supporting a cipher/hash suite without efficient support for GCM, which the specification considers important.

Those sets of extensions correspond to the profile goal: pushing for coherent sets of extensions that should be implemented together. The combination of bitmanip and a cipher/hash suite brings a coherent set of instructions to implement many cryptography primitives for that suite and beyond.

Note: there was a late debate between the task group and the ARC about whether Zvbc should be incorporated into Zvkn(g) and Zvks(g) or should be removed. It was eventually decided not to include it. One of the main arguments for its removal seems to have been that it was no longer required to provide an efficient implementation of GHASH (since Zvkg would fulfill that role), and one of the main arguments for its integration was that generic carry-less multiply could be used for more than GHASH, in particular for fast CRC implementations.

For a future RVB profile, the following two extensions have been defined:

NIST + Carry-Less Multiply: Zvknc = Zvkned + Zvknhb + Zvbb + Zvbc + Zvkt
Shang-Mi + Carry-Less Multiply: Zvksc = Zvksed + Zvksh + Zvbb + Zvbc + Zvkt

Simplified illegal instruction conditions

The change affects instructions working on element groups. For an introduction to element groups you can read this post:

RISC-V Vector Element Groups, Fprox, January 18, 2023

In the earlier versions of the spec, it was mandated that executing a vector crypto instruction with the vector length vl not a multiple of EGS (element group size) should raise an illegal instruction exception. For example, executing a vaesz.vs with vl=7 (while vaes* instructions expect vl to be a multiple of 4 to cover full AES blocks) would trigger an illegal instruction exception.

This case is now reserved: the behavior is not defined by the specification, and a valid implementation is not required to raise an exception.

However, one case for the illegal instruction exception remains: VLMAX (defined as LMUL * VLEN / SEW) must be greater than or equal to EGS, otherwise an illegal instruction exception must be raised. This case overlaps with the previous one, except when vl=0: a zero vl is still considered a multiple of EGS (so it is not reserved), but if VLMAX is less than EGS then an illegal instruction exception is triggered. This last constraint covers the case where the maximum available size for any vector operand (regardless of the actual vector length) is not sufficient to fit a single element group.
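The two rules can be summarized by a small classification function. This is a sketch of the conditions as described in this post, not an excerpt from the specification or from a simulator: the enum, the function name and the representation of LMUL as a numerator/denominator pair (to allow fractional LMUL) are all assumptions, and other constraints on these instructions are not modeled.

```c
typedef enum { EG_OK, EG_RESERVED, EG_ILLEGAL } eg_status;

/* Classify an element-group instruction from the vector configuration.
 * vlen and sew are in bits; LMUL = lmul_num / lmul_den; egs is the
 * element group size (e.g. 4 for AES with SEW=32). */
static eg_status check_element_groups(unsigned vlen, unsigned sew,
                                      unsigned lmul_num, unsigned lmul_den,
                                      unsigned vl, unsigned egs)
{
    unsigned vlmax = (vlen * lmul_num) / (sew * lmul_den);

    if (vlmax < egs)
        return EG_ILLEGAL;   /* not even one element group fits */
    if (vl % egs != 0)
        return EG_RESERVED;  /* partial element group: reserved */
    return EG_OK;            /* includes vl == 0 */
}
```

With VLEN=128, SEW=32 and LMUL=2, a vl of 7 is classified as reserved, while with VLEN=64 and LMUL=1 the same instruction is illegal even for vl=0, because VLMAX is only 2.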

On-going work

Even though the specification is getting close to being ratified, there is still some activity going on around the new vector crypto extensions.

A pull request (https://github.com/riscv/riscv-crypto/pull/310) containing proof-of-concept sample code was recently merged into the riscv-crypto GitHub repository and can be used to validate an implementation (although another set of tests, the architectural compatibility tests, or ACTs, will be the formal way to do this).

The support for the new extensions in the spike simulator (riscv-isa-sim) was also merged recently (https://github.com/riscv-software-src/riscv-isa-sim/pull/1303).

The SAIL model (intended to be the RISC-V formal golden model) is being extended to support vector crypto (this extension depends on the on-going work to extend the SAIL model to support the standard vector extension).

Support for the vector crypto extensions is being brought to OpenSSL (OpenSSL PR 20149 on GitHub).

Conclusion

The RISC-V vector crypto extension should allow RISC-V to reach a new level of performance for workloads relying on standard cryptography. Compliant RISC-V implementations will be able to provide much better performance for some useful cryptography primitives without having to rely on external IPs or proprietary extensions for acceleration. Their integration as options to the future RISC-V RVA23U64 profile should facilitate their adoption in the software ecosystem and give incentives to CPU vendors to integrate them into their products.

At the time of writing, the specification has just entered public review (https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/DpjkaK_1zQs). Anyone can now share feedback and opinions on the specification, and every bit of it should be considered by the RISC-V cryptography task group.

After ratification, the cryptography task group may continue for a while and work on specifying a set of fast track extensions to cover a few remaining instructions which did not make the cut for the initial vector cryptography standard (e.g. 32-bit vector carry-less multiply instructions and vector-scalar variants of the Zvkg instructions). A new RISC-V task group dedicated to Post-Quantum Cryptography (PQC) is also being created.
