RISC-V Scalar Bit Manipulation Extensions

Zba, Zbb, Zbc, Zbs, Zbkb, Zbkc, Zbkx and B extensions

Apr 08, 2024

On March 25th 2024 (announcement) the public review period for RISC-V B extension ended and it should be ratified soon. We want to use this occasion to present RISC-V ISA offerings in term of scalar bit manipulation.

Bit manipulation in a Nutshell

Bit manipulation refers to many different operations (from rotation to single bit manipulation) which can require a long sequence of instructions if not supported natively by an ISA. For example, simply extracting the value of a single can require shifting and masking, and counting the number of bits set (a.k.a. popcount) can require many more instructions (see https://www.johndcook.com/blog/2020/02/21/popcount/ and for a comprehensive reference, the interested reader can refer to the well known book Hacker’s delight).

RISC-V extensions for bit manipulation (often referred to as bitmanip) were ratified in November 2021. bitmanip is often labelled as a collection of extensions since it groups 4 extensions: Zba, Zbb, Zbc and Zbs

diagram of the 4 RISC-V bit manipulation extensions and which instructions they contain — RISC-V Bit Manipulation Extensions

Zba: address generation

Zba is made of 8 instructions which can be used to speed-up memory address generation, performing the work of multiple basic instructions.

For RV32, Zba introduce 3 shift-add-add instructions: sh1add, sh2add, and sh3add. They shift one of their operand, rs1, by 1 (respectively 2 and 3) bit left before adding the result to the second operand rs2. The shift perform a multiplication by a power of two, assisting in the computation of a half-word (16-bit), word (32-bit) or double-word (64-bit) address.

Those instructions are also defined for RV64, in both case they are defined as operating on XLEN-wide values. In addition to those 3, Zba for RV64 introduces 3 unsigned word variants: sh<n>add.uw (n=1, 2, 3), which only considers the least significant 32-bit word of rs1, which is zero extended to XLEN before being shifted by n and added to rs2. Finally, Zba for RVA64 introduces unsigned-word versions of add and slli (shift-left-logically by a 5-bit immediate): add.uw and slli.uw.

Zbb: basic bit-manipulation

Zbb introduces the following instructions:

logic with negate: andn, orn, xnor
count of leading or trailing zeros: clz[w], ctz[w]
pop(ulation) count: cpop[w]
minimum / maximum for signed and unsigned integers: min[u], max[u]
sign and zero extensions: sext.(b,h), zext.h
rotations: rol[w], ror[i][w]
or-combine Byte: orc.b
byte reverse: rev8

All these instructions could be expanded into sequences of simpler instructions from the base ISA, their introduction provides sizable code size reduction.

The instructions with a w suffix are only defined for RV64. They work on the least 32-bit word on their inputs and sign extend their 32-bit result to XLEN bits. All other instructions work on XLEN-bit wide operands.

andn and orn invert one of their operand while xnor inverts the result of a xor operation.

orc.b set each byte of the result to 8b11111111 if any bit in the corresponding byte of the operand is set and 8b00000000 otherwise.

Note: We have presented the vector version of Zbb, Zvbb, in this post:
RISC-V Vector Cryptography Extension (2/2)
Fprox
·
February 20, 2023
Read full story
orn, xnor, and orc.b are not present in the vector extension Zvbb.

Zbc: carry-less multiplication

Zbc defines 3 new instructions: clmul, clmulh, clmulr.

Those instructions perform various forms of carry-less multiplication. A carry less multiplication is a binary multiplication where accumulation of similarly weighted partial products is performed by XOR operation rather than integer addition (hence the name carry-less since XOR does not generate any carry). It is equivalent to assuming each input is a polynomial with boolean coefficients (in GF(2)) and the carry-less multiplication corresponds to the result of the polynomial multiplication.

It it used in some hash primitives such as the Cyclic Redundancy Check (CRC) and some authentication code such as the Galois Counter Mode. It often comes with a modulo reduction step which can be implemented with well crafted multiplications (e.g CRC folding).

clmul and clmulh provide everything needed to computed the full 2*XLEN-1 wide result of a XLEN-bit by XLEN-bit carry-less multiplication.

Note: multiplying a n-bit value by a m-bit value in a carry-less fashion results in a (n+m-1)-bit result. This is easy to infer in the polynomial multiplication: multiplying a degree n-1 polynomial with boolean coefficients (n-bit boolean value) by a degree m-1 polynomial (m-bit boolean value) results in a degree m+n-2 polynomial which is a m+n-1 boolean value.

clmulr provides the bit reversed high XLEN bits of the result.

Note: Some algorithms relying on carry-less multiply interpret bits in a word as little-endian when it comes to associating them with coefficient (the Least Significant Bit, a.k.a. bit 0, is the constant coefficient and the Most Significant Bit, a.k.a. bit XLEN-1 is the coefficient for X^XLEN-1), whereas other algorithms interpret the bits in a swapped order: the MSB being the constant coefficient and the LSB being the highest coefficient value. This is why both reversed clmulr and non-reversed clmul instructions are useful: saving a costly bit reverse operation.

Zbs: single-bit manipulation

Zbs defines 8 new instructions:

single bit clear: bclr and bclri
single bit extract: bext and bexti
single bit invert: binv and binvi
single bit set: bset and bseti

Each of the instructions modifies or extracts a single bit in the input operand and copies the result to the destination register.

For every of the 4 groups, there are two variants: register index or immediate index (with i suffix); the index of the modified/extracted bit can either be stored in a register (rs2) or directly in the opcode as a 5-bit immediate (RV32). For RV64, the opcode dedicates an extra bit to immediate encoding, providing the 6-bit value required to encode any bit position in a 64-bit (XLEN) register.

The B extension

The B extension, not yet ratified as of April 5th 2024, does not introduce any new instructions. It is simply a grouping of 3 of the existing, and already ratified, ISA extensions specifying bit manipulation instructions: Zba, Zbb, and Zbs.

Zbc is not part of the B extension. Although it can be very useful for any application using carry-less arithmetic but it implies the implementation of a XLEN x XLEN carry-less multiplier whose large cost may be too high for some processors.

Note: the B extension also specifies that bit 1 of the misa CSR describes support for the B extension: if the bit is set the implementation supports the extension, if not then it may not support it.

The very minimal B extension document can be found here: pdf for version 1.0-rc2. It is a very quick read (a single chapter and a single section).

The B extension was designed to group in a single extension the most common scalar bit-manipulation extensions. Those 3 extensions are mandatory in the standard RVA23(U64) and RVB23(U24) profiles.

Bit manipulation and cryptography

Some instructions introduced in RISC-V scalar bit manipulation extensions are reused in RISC-V scalar cryptography extensions: Zbkb and Zbkc.

Bitmanip for cryptography: Zbkb extension

The extension Zbkb integrates some of Zbb instructions (andn, orn, xnor, rol*, ror*, and rev8)and add a few new ones:

XLEN/2 word, byte and half word packing: pack, packh, packw
bit in byte reversal: brev8
splitting odd/even bits: zip, and its reversal: unzip (RV32 only)

The figures below illustrates the instructions which are in Zbkb but not in Zbb.

Carry-less for cryptography: Zbkc extension

The extension Zbkc reuses clmul and clmulh from Zbc (but not clmulr).

Crossbar permutation instruction: Zbkx

Another set of cryptography bitmanip instructions not defined in any of the scalar bitmanip extensions is the set of permutation instructions introduced in Zbkx. There are two such instructions: xperm8 and xperm4. xperm8 performs a byte permutation and xperm4 performs a nibble (4-bit element) permutation. Both instructions read their data set from source rs1 and their index vector from source rs2.

For xperm4, a 4-bit index is enough to address every data nibble when XLEN=64; in such a case every index has a mapping in the data set rs1. For xperm8 8-bit indices or for xperm4 when XLEN=32, the index may be out-of-bound: in such case the result element is replaced by 0.

Zbkx’s **xperm4** nibble permutation instruction

Scalar bitmanip and constant execution time

The scalar cryptography extension adds Zkt: extension mandating constant time execution latency. Zkt covers all instructions from Zbkb, Zbkc, and Zbkx: when Zkt is supported all those instructions must be implemented with data-independent execution latency, as to not disclose any information on data values through data-dependent latencies. This is critical for any cryptography application. Zkt does not mandata data-independent execution latency for other scalar bitmanip instructions (In Zba, Zbb, Zbc, Zbs but not Zbkb or Zbkc) but it does cover other standard instructions.

Conclusion

Scalar bit manipulation extensions extend RISC-V capability to implement widespread operations on bits which would require large sequence of instructions otherwise. Implementing those extra instructions incur extra costs compared to the base ISA but this cost is easily justifiable for most applications in particular for the 3 (out of 4) extensions (Zba, Zbb, Zbs) which are grouped into the B extension. The B extension is mandatory in the main RISC-V profiles such as RVA23U64 (while Zbc remains an option in the profile). An extra set of scalar bit manipulation instructions has also been specified, those extensions cover more dedicated needs and can be mandated to be implemented with data independent execution latency to make them suitable for use in cryptographic applications.

References:

RISC-V Bit-Manipulation ISA-extensions (specification)
Repository for bitmanip specifications: https://github.com/riscv/riscv-bitmanip
Repository for B extension specification: https://github.com/riscv/riscv-b
Tracking RISC-V B extension ratification on RVIA Jira: RVS-2006 ticket

What are you optimizing for ? (fprox's substack)

RISC-V Vector Cryptography Extension (2/2)

Discussion about this post

Ready for more?