RISC-V Vector Cryptography Extensions (1/2)
accelerating cryptography with RISC-V Vector: Introduction and ciphers
June 24th 2023 Update: the vector cryptography specification has recently reached the public review stage and a few differences have been introduced since this post was written. A summary of those changes can be found in this post:
Introduction
RVIA is in the process of releasing a new extension: the vector cryptography extension. The extension is currently undergoing review by the RVIA architectural committee and should soon be submitted for public review. The vector cryptography extension is an extension to RVV (RISC-V Vector) and will define a set of 22 new vector instructions dedicated to cryptography.
The extension is divided into 7 sub-extensions, each sharing the Zvk prefix: Zvkned, Zvknha, Zvknhb, Zvksed, Zvksh, Zvkb and Zvkg.
Zvkned contains instructions to perform NIST Cipher AES encryption and decryption.
Zvknha and Zvknhb contain instructions to perform NIST hash function SHA-2 (SHA-256 for Zvknha and both SHA-256 and SHA-512 for Zvknhb)
Zvksed contains instructions to perform Shang-Mi 4 Cipher (SM4) encryption and decryption
Zvksh contains instructions to perform Shang-Mi 3 Hash function (SM3)
Zvkg contains instructions to perform primitive multiplication or accumulate-and-multiply for the GHASH algorithm (used in the GCM authenticated cipher mode)
Zvkb contains bit manipulation instructions (rotations, bit and byte reversal, carry-less multiplies, ...)
The standard also defines two supersets: Zvkn (Zvkned + Zvknhb + Zvkg) and Zvks (Zvksed + Zvksh). These supersets will be optional in the future RVA23 application profile.
Zvkned was known as Zvkns in the early versions of the specification drafts.
Origins: scalar crypto extension and Zkt
In September 2021, RVIA released a new set of scalar extensions dedicated to cryptography acceleration: https://riscv.org/blog/2021/09/risc-v-cryptography-extensions-task-group-announces-public-review-of-the-scalar-cryptography-extensions. The spec was later updated and is now in version 1.0.1 (spec pdf).
This set introduced numerous new instructions to accelerate cryptography operations. Contrary to Zvk, those instructions operate on the general purpose register file (XRF). The set also defined the Zkt extension, which lists existing instructions from the base ISA and earlier extensions and mandates that they be implemented with data-independent latency. This is critical for any cryptography software, as data-dependent instruction latency can leak information on data values, leading to side-channel (timing) attacks. This data-independent latency requirement has been extended to the instructions of the vector cryptography standard.
Vector element groups
The vector cryptography extensions perform operations on ciphertext / plaintext blocks and hash blocks which can be much larger than single elements (e.g. AES blocks and keys are 128-bit wide). To that end the vector crypto extension makes use of the vector element group paradigm reviewed in:
It defines EGS (Element Group Size) and lists the supported SEW (Selected Element Width) for each algorithm (e.g. EGS=4 and SEW=32-bit for AES, forming 128-bit wide element groups). Except for the instructions in Zvkb, all Zvk instructions operate on element groups.
If an instruction working with element groups is executed with vl not a multiple of its EGS or with VLMAX < EGS then it will raise an illegal instruction exception. The same applies if the instruction is executed with an SEW value which is not supported.
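The legality rule above can be sketched as a small Python model. The function name and the tables are illustrative assumptions: they only list the two ciphers covered in this post, with the EGS and SEW values quoted here, and are not taken from the specification text.

```python
# Hypothetical model of the element-group legality rule described above.
# Only the ciphers discussed in this post are listed (assumption).
SUPPORTED_SEW = {"aes": {32}, "sm4": {32}}
EGS = {"aes": 4, "sm4": 4}

def is_legal(algo, vl, vlmax, sew):
    """Return True if the configuration would not raise an illegal instruction exception."""
    egs = EGS[algo]
    if sew not in SUPPORTED_SEW[algo]:
        return False        # unsupported SEW
    if vlmax < egs:
        return False        # not even one full element group fits
    return vl % egs == 0    # vl must cover whole element groups only

if __name__ == "__main__":
    print(is_legal("aes", vl=8, vlmax=16, sew=32))  # two full element groups
    print(is_legal("aes", vl=6, vlmax=16, sew=32))  # vl is not a multiple of 4
    print(is_legal("aes", vl=8, vlmax=16, sew=64))  # SEW=64 is not listed for AES
```

In real code the check is of course performed by the hardware; a model like this is only useful to reason about which vsetvli configurations are safe before emitting a crypto instruction.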
Vector-scalar and element group
The standard extends the concept of vector-scalar operations (mnemonic with suffix .vs): the scalar operand can now be a full element group (rather than the single element 0 from a vector register as is the case for the vector reduction vredsum.vs). This operation variant can be leveraged to apply the same key to multiple element groups, saving an element group splat. Most cipher instructions introduced by the vector crypto extension have both .vv and .vs variants. The only instruction which has a single variant is vaesz.vs because the .vv variant can be emulated by a simple vxor.vv.
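As a rough illustration of the difference between the two variants, here is a toy Python model. It uses XOR as a stand-in for the actual round operation (which happens to be what vaesz and vxor compute), represents element groups as tuples of four integers, and all function names are hypothetical.

```python
# Toy model: a vector register group is a list of element groups,
# each element group being a tuple of 4 integers (EGS=4).
def xor_group(a, b):
    return tuple(x ^ y for x, y in zip(a, b))

def op_vv(vd, vs2):
    # .vv: element group i of vd is combined with element group i of vs2
    return [xor_group(d, s) for d, s in zip(vd, vs2)]

def op_vs(vd, vs2):
    # .vs: every element group of vd is combined with element group 0 of vs2
    return [xor_group(d, vs2[0]) for d in vd]

if __name__ == "__main__":
    blocks = [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12)]
    key = [(0xFF, 0xFF, 0xFF, 0xFF)]
    # The .vs form gives the same result as .vv on an explicitly splatted key.
    splatted = [key[0]] * len(blocks)
    assert op_vs(blocks, key) == op_vv(blocks, splatted)
```

The point of the .vs form is that the splatted copy never has to be materialized in a register group.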
Element group and smaller VLEN values
As allowed in the vector element group definition, the vector crypto extensions do not limit the value of VLEN to VLEN >= EGW (the element group width, EGS * SEW); they allow smaller implementations with smaller VLENs. In that case, software must make sure that LMUL is set such that LMUL * VLEN >= EGW; otherwise the configuration is reserved.
This impacts some of the vector-scalar variants of vector crypto instructions. Those instructions, defined with the .vs mnemonic suffix, perform an operation between an input vector register group read from vd and a single element group read from vs2. Since VLEN can be smaller than EGW, even if vs2 is called “scalar” (scalar element group), it may span multiple vector registers, and LMUL must be set accordingly.
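The constraint LMUL * VLEN >= EGW can be worked through with a short Python sketch. To keep it simple, it only considers the integer LMUL values (1, 2, 4, 8), which is an assumption of this example; the function names are hypothetical.

```python
# Integer LMUL values only (fractional LMULs are left out of this sketch).
LMULS = [1, 2, 4, 8]

def min_lmul(vlen, egw=128):
    """Smallest integer LMUL such that LMUL * VLEN >= EGW, or None if none exists."""
    for lmul in LMULS:
        if lmul * vlen >= egw:
            return lmul
    return None

def regs_per_group(vlen, egw=128):
    """Number of vector registers spanned by one element group (ceiling division)."""
    return -(-egw // vlen)

if __name__ == "__main__":
    for vlen in (32, 64, 128, 256):
        print(vlen, min_lmul(vlen), regs_per_group(vlen))
```

For instance, with VLEN=64 a 128-bit "scalar" element group spans two vector registers, so the instruction must be executed with LMUL of at least 2.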
Ciphers
The vector crypto extensions define a set of instructions to accelerate various ciphers (encryption/decryption). For all of them, the new instructions are primitives to expand a round of key schedule or to perform a round of encryption / decryption. This means that multiple copies of those instructions must be sequenced to execute the encryption / decryption of a single block (e.g. about 11 instructions to encrypt a 128-bit block using AES-128 encryption, assuming the key schedule has been fully expanded; add another 10 instructions to perform this key schedule expansion).
Full-Round / All-Round versions of the instructions were initially considered during the specification process. A single one of those instructions could perform the full encryption, covering all the required rounds and key expansions. They were split from the first version of the standard and should be addressed later by the RVIA crypto task group.
Let us now review the instructions which made it into the standard: the single-round instructions.
Zvkned: AES cipher
Zvkned (Zvkns in early drafts) defines instructions to accelerate the NIST Advanced Encryption Standard (AES). It includes three families of instructions:
vaeskf1 for AES-128 and vaeskf2 for AES-256: key schedule; each instruction produces four 32-bit words of the AES key schedule
vaesz, vaesem and vaesef: encryption instructions; each of them realizes one type of AES encryption round
vaesz (common with encryption), vaesdm and vaesdf: decryption instructions; each of them realizes one type of AES decryption round
The encryption/decryption instructions expect a round key produced from the original key by the key schedule instructions. The key schedule can be expanded once and the resulting round keys broadcast to many block encryptions/decryptions.
(Round) keys, ciphertext and plaintext blocks (each 128-bit wide in AES specification) are stored as 4-wide element groups with 32-bit as element size (EGS=4, EEW=32, EGW=128).
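To make the layout concrete, the following Python sketch splits a 16-byte block into a 4-element group of 32-bit words. It assumes a little-endian byte order within each element (byte 0 of the block in the least significant byte of element 0); that ordering is an assumption of this example and should be checked against the specification. The function name is hypothetical.

```python
import struct

EGS, EEW = 4, 32
EGW = EGS * EEW  # 128 bits per element group

def to_element_group(block: bytes):
    """Split a 128-bit block into EGS little-endian 32-bit elements (assumed ordering)."""
    if len(block) * 8 != EGW:
        raise ValueError("an AES block or round key must be 128-bit wide")
    return struct.unpack("<4I", block)

if __name__ == "__main__":
    group = to_element_group(bytes(range(16)))
    print([hex(e) for e in group])
```

Under this assumption a plain unit-stride vector load with SEW=32 brings consecutive 16-byte blocks into consecutive element groups without any byte shuffling.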
vaesem, vaesef, vaesdm and vaesdf admit both .vs and .vv variants; vaesz only exists as vaesz.vs.
The extension supports two key sizes, 128 and 256 bits; it does not support the less used AES-192.
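The instruction counts quoted earlier for AES-128 (about 11 instructions per block, plus 10 for the key schedule) follow directly from the AES round counts. The Python sketch below generates the corresponding instruction listings. The register allocation (state in v0, round keys in v1 onward) and the exact operand syntax printed here are illustrative assumptions, and the key schedule listing is only given for AES-128.

```python
# Number of AES rounds per supported key size (FIPS-197).
ROUNDS = {128: 10, 256: 14}

def encrypt_sequence(key_bits, state="v0", first_rk=1):
    """Single-round instructions to encrypt the element groups held in `state`.

    Round keys are assumed to be already expanded, one per vector register,
    starting at v<first_rk> (hypothetical register allocation).
    """
    rounds = ROUNDS[key_bits]
    seq = [f"vaesz.vs {state}, v{first_rk}"]                 # initial round key addition
    for r in range(1, rounds):
        seq.append(f"vaesem.vs {state}, v{first_rk + r}")    # middle rounds
    seq.append(f"vaesef.vs {state}, v{first_rk + rounds}")   # final round
    return seq

def key_schedule_sequence_128(first_rk=1):
    """AES-128 key schedule: one vaeskf1 per round, each deriving the next round key."""
    return [f"vaeskf1.vi v{first_rk + r}, v{first_rk + r - 1}, {r}"
            for r in range(1, ROUNDS[128] + 1)]

if __name__ == "__main__":
    print("\n".join(key_schedule_sequence_128()))
    print("\n".join(encrypt_sequence(128)))
```

With this allocation, AES-128 needs 12 vector registers (state plus 11 round keys) and AES-256 needs 16; a real implementation may prefer to reload round keys instead.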
NOTE: vaesz.vs is just a glorified vector-scalar element group version of vxor. It can be used to implement a fast element group splat:
// to splat vl / 4 element groups from vs2 to vd
// if vd and vs2 differ (else we need an intermediary register)
vsetivli x0, vl, e32, <desired vlmul> // vl = #EG * 4 (AES's EGS), SEW=32
vxor.vv vd, vd, vd // zero-ing vd
vaesz.vs vd, vs2 // splatting element group from vs2
NOTE: There is no vaesz.vv vd, vs2 instruction because it can be implemented directly by vxor.vv vd, vd, vs2. The only noticeable difference is that vxor will not raise an illegal instruction exception when executed with a vl which is not a multiple of 4 (or with VLMAX < EGS).
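The two notes above can be checked on a toy Python model operating on flat lists of 32-bit elements. The function names mirror the mnemonics but are only a sketch: tail elements are simply left unchanged, and the illegal instruction exception is modeled by a ValueError.

```python
EGS = 4  # AES element group size

def vxor_vv(vd, vs2, vl):
    # Plain element-wise XOR: any vl is accepted.
    return [d ^ s for d, s in zip(vd[:vl], vs2[:vl])] + vd[vl:]

def vaesz_vs(vd, vs2, vl):
    # XOR every element group of vd with element group 0 of vs2.
    if vl % EGS != 0:
        raise ValueError("illegal instruction: vl is not a multiple of EGS")
    group0 = vs2[:EGS]
    return [d ^ group0[i % EGS] for i, d in enumerate(vd[:vl])] + vd[vl:]

def splat_group(vs2, vl):
    zero = [0] * vl                 # vxor.vv vd, vd, vd
    return vaesz_vs(zero, vs2, vl)  # vaesz.vs vd, vs2

if __name__ == "__main__":
    print(splat_group([1, 2, 3, 4, 9, 9, 9, 9], vl=8))
```

XOR-ing a zeroed destination with element group 0 leaves a copy of that group in every element group of the destination, which is why the two-instruction sequence acts as a splat.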
Zvksed: SM4 cipher
Zvksed defines two instructions to accelerate the Shang-Mi 4 cipher standard (SM4):
vsm4k for the key expansion
vsm4r to perform an encryption/decryption round
They work in a similar fashion to the AES instructions, except that, given the symmetric definition of SM4, a single instruction is enough to handle both encryption and decryption. SM4 also shares the same element group parameters as AES: EGS=4, EEW=32 (EGW=128).
vsm4r admits a vector-scalar variant (vsm4r.vs, on top of the vector-vector vsm4r.vv).
Conclusion
The RISC-V vector crypto extension should be ratified this year (2023) and integrated as optional in the RVA23 application profile (at least Zvkn and Zvks). It constitutes a good step towards allowing high-performance implementations of cryptography libraries on RISC-V processors.
In the next post we will review the Zvknh[ab] (SHA-256 and SHA-512), Zvksh (SM3 hash), Zvkg (gmac) and Zvkb (bitmanip) extensions.
Thanks
Thank you to Aliaksei C. for pointing out a couple of typos in an earlier version of this post.
Thank you to Hong-Rong H. for pointing out a typo in an earlier version of this post “EGS (Element Group Width Size)”.
References:
github directory for vector crypto specification: https://github.com/riscv/riscv-crypto/tree/master/doc/vector
github directory for scalar crypto specification: https://github.com/riscv/riscv-crypto/blob/master/doc/scalar/riscv-crypto-spec-scalar.adoc