Zfa: even more floating-point operations

completing RISC-V floating-point capabilities

May 24, 2023

This post was updated June 23rd 2023 to mention the end of the public review period (more information can be found at the end of this post).

On May 3rd 2023, RVIA announced the start of the public review period for the Zfa extension. In this post we will review the instructions added by this new extension.

This extension aims to complete the F/D/Q/Zfh extensions with new floating-point instructions. It introduces new instructions in 6 categories: immediate value materialization, minimum/maximum, round-to-integer, modular conversion to integer, moves, and comparisons. Some instructions are variants of existing ones (e.g. minimum/maximum) while others are completely new (e.g. FP load immediate).

Floating-point load immediate

Zfa introduces an fli.s instruction which materializes one of 32 single precision floating-point constants, selected by a 5-bit immediate field encoded in the instruction. If the D extension (resp. Q, Zfh/Zvfh) is supported, then the corresponding double precision fli.d (respectively quad precision fli.q, half precision fli.h) instruction is also introduced.

The table (extracted from the standard) listing the possible fli.s values has been reproduced above. The constant values were selected among the values most frequently used in a set of libraries, restricted to values requiring few significand bits (at most the 2 most significant bits of the mantissa are non-zero), in order to simplify the table implementation.
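A minimal Python sketch can make the selection criterion concrete. It models fli.s as a lookup into the 32-entry table (values transcribed from the fli.s table of the Zfa specification) and returns the raw binary32 bit pattern that would land in the destination register. The helper names f32_bits, f32_from_bits and fli_s are hypothetical; the final assert checks that, apart from sign and exponent, only the two most significant mantissa bits are ever set.

```python
import math
import struct

def f32_from_bits(bits):
    """Decode a 32-bit IEEE-754 binary32 pattern into a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def f32_bits(x):
    """Encode a Python float as binary32 and return the raw 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

# Constants loaded by fli.s, indexed by the 5-bit immediate (0..31).
FLI_S = [
    -1.0,
    2.0 ** -126,                      # 1: minimum positive normal
    2.0 ** -16, 2.0 ** -15, 2.0 ** -8, 2.0 ** -7,
    0.0625, 0.125, 0.25, 0.3125, 0.375, 0.4375,
    0.5, 0.625, 0.75, 0.875,
    1.0, 1.25, 1.5, 1.75,
    2.0, 2.5, 3.0, 4.0, 8.0, 16.0,
    128.0, 256.0, 2.0 ** 15, 2.0 ** 16,
    math.inf,                         # 30: +infinity
    f32_from_bits(0x7FC00000),        # 31: canonical NaN
]

def fli_s(imm):
    """Hypothetical model of fli.s: binary32 pattern selected by a 5-bit immediate."""
    return f32_bits(FLI_S[imm & 0x1F])

# Below the two most significant mantissa bits (bits 22 and 21),
# every table entry has an all-zero mantissa.
LOW_MANTISSA_MASK = 0x001FFFFF
assert all(fli_s(i) & LOW_MANTISSA_MASK == 0 for i in range(32))
```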

Note: some RISC-V members, in particular from the floating-point Special Interest Group (SIG), have questioned the selection of the constants, and this has led to a discussion on the subject (here).

Minimum and Maximum instructions

The F extension already specifies fmin.s (resp. fmax.s), which computes the minimum (resp. maximum) of two floating-point numbers. Those instructions implement the operations specified as minimumNumber and maximumNumber by the IEEE-754 standard (IEEE standard for floating-point arithmetic). When only one of the two operands is a number (and the other is not-a-number, or NaN), the instruction returns the number; when both operands are NaNs, it returns the canonical NaN.

Zfa introduces fminm.s and fmaxm.s which perform exactly the same operation as fmin.s and fmax.s, except that when any operand is a NaN then the canonical NaN is returned. This means that if one operand is a number and the other one is a NaN, then a canonical NaN is returned. Those instructions implement the operations specified as minimum and maximum by the IEEE-754 standard.

The main difference between fmin.s/fmax.s and fminm.s/fmaxm.s is therefore that the latter propagate NaNs (albeit after canonicalization).
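The difference in NaN handling can be sketched with a small Python model of fmin.s and fminm.s (the max variants are symmetric). It is an illustration under simplifying assumptions, not a reference implementation: the function names are hypothetical, Python doubles stand in for single precision values, math.nan stands in for the canonical NaN, and exception flags (invalid operation on signaling NaN inputs) are not modeled. As in the F extension, -0.0 is treated as less than +0.0.

```python
import math

def _is_less(a, b):
    """a < b, treating -0.0 as strictly less than +0.0 (as fmin/fminm do)."""
    if a == 0.0 and b == 0.0:
        return math.copysign(1.0, a) < math.copysign(1.0, b)
    return a < b

def fmin_s(a, b):
    """fmin.s (IEEE 754 minimumNumber): a single NaN operand is ignored."""
    if math.isnan(a) and math.isnan(b):
        return math.nan            # stands in for the canonical NaN
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return b if _is_less(b, a) else a

def fminm_s(a, b):
    """fminm.s (IEEE 754 minimum): any NaN operand yields the canonical NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan            # stands in for the canonical NaN
    return b if _is_less(b, a) else a
```

For example, fmin_s(1.0, math.nan) returns 1.0 while fminm_s(1.0, math.nan) returns a NaN.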

If the D extension (resp. Q, Zfh) is supported and Zfa support is enabled, then the corresponding double precision fminm.d/fmaxm.d (respectively quad precision fminm.q/fmaxm.q, half precision fminm.h/fmaxm.h) instructions are introduced.

Round-to-Integer Instructions

The RISC-V F extension specifies conversion instructions from floating-point to integer (generically fcvt.x.s, e.g. fcvt.w.s). Due to the split between the integer and floating-point register files (more info in this post), these instructions store the converted result in a general purpose register. Sometimes it is useful to round a value to an integer while keeping it in a floating-point format. This amounts to removing the fractional part of the number: by truncation when rounding towards zero, or by rounding up or down according to the other rounding modes. In particular, this is useful when the result is later used in other operations with floating-point values, as it saves a conversion back from integer to floating-point (this happens often when implementing argument reduction for elementary functions).
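As a hypothetical illustration of the argument-reduction use case, the sketch below reduces the argument of exp(x) as x = k·ln2 + r. The integer k is produced as a floating-point value (Python's round() followed by float() stands in for a round-to-integer instruction using round-to-nearest-even) and is reused directly in the floating-point product k·ln2. This is a naive single-constant reduction meant only to show the data flow; the function name is an assumption, and real libraries typically split ln2 into several parts to keep r accurate.

```python
import math

LN2 = math.log(2.0)

def reduce_exp_argument(x):
    """Split x into k*ln2 + r with k integral and |r| roughly <= ln2/2.

    k stays a float, as if produced by a round-to-integer instruction,
    so it can feed the multiplication below without an int-to-float conversion.
    """
    k = float(round(x / LN2))   # stand-in for rounding to nearest-even in FP format
    r = x - k * LN2             # naive reduction: no extra-precision split of ln2
    return k, r
```

Scaling the final result by 2^k still needs k as an integer exponent, but the reduction step itself stays in floating-point arithmetic.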

To answer this need, Zfa introduces a new instruction, fround.s, which rounds a single precision floating-point value to an integer and returns the result in the single precision floating-point format. As usual, the rounding is performed using the rounding mode selected by the instruction's rm field (which may be dynamic, i.e. taken from the frm CSR). This instruction can raise only one exception: invalid operation, if the input value is a signaling NaN. Zfa also introduces froundnx.s, which behaves exactly like fround.s except that it also raises the inexact flag when the rounded result differs from the input value (for non-NaN inputs).
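The following Python sketch models the results of fround and froundnx for the five static rounding modes. The function names and the string encoding of the modes (using the assembler mnemonics rne, rtz, rdn, rup, rmm) are hypothetical; Python doubles are used throughout, the dynamic mode (frm) is not modeled, and among exception flags only froundnx's inexact flag is reported.

```python
import math

def fround(x, rm):
    """Round x to an integral value kept in floating-point format.

    rm is one of "rne", "rtz", "rdn", "rup", "rmm". The sign of a zero
    result follows the sign of x (e.g. -0.3 rounded towards zero is -0.0).
    """
    if math.isnan(x):
        return math.nan          # stands in for the canonical NaN
    if math.isinf(x):
        return x
    frac, whole = math.modf(x)   # exact split into fractional and integral parts
    if rm == "rtz":
        r = whole
    elif rm == "rdn":
        r = float(math.floor(x))
    elif rm == "rup":
        r = float(math.ceil(x))
    elif rm == "rne":
        r = float(round(x))      # Python's round() breaks ties to even
    elif rm == "rmm":
        # ties (|frac| == 0.5) round away from zero
        r = whole + math.copysign(1.0, x) if abs(frac) >= 0.5 else whole
    else:
        raise ValueError(f"unsupported rounding mode: {rm}")
    return math.copysign(r, x)

def froundnx(x, rm):
    """Like fround, but also return the inexact flag (result differs from a non-NaN input)."""
    r = fround(x, rm)
    inexact = not math.isnan(x) and r != x
    return r, inexact
```

For instance, fround(2.5, "rne") gives 2.0 whereas fround(2.5, "rmm") gives 3.0, and froundnx(1.5, "rtz") returns (1.0, True).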

If the D extension (resp. Q, Zfh) is supported and Zfa support is enabled, then the corresponding double precision fround.d/froundnx.d (respectively quad precision fround.q/froundnx.q, half precision fround.h/froundnx.h) instructions are introduced.

Modular Convert-to-Integer Instruction

If the D extension is implemented, Zfa adds the fcvtmod.w.d instruction. This instruction converts a double precision value, read from a floating-point register, to an unbounded signed integer, always rounding towards zero. It then sign-extends the lowest 32 bits of the result to XLEN bits before writing them to a general purpose register.
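A short Python model of fcvtmod.w.d shows the modular behavior. It assumes, as in JavaScript's ToInt32 conversion which this instruction targets, that infinities and NaNs yield 0; exception flags are not modeled, and the function name is hypothetical. The return value is the signed interpretation of the sign-extended register content.

```python
import math

def fcvtmod_w_d(x):
    """Model of fcvtmod.w.d: signed value left in the destination register.

    Infinities and NaNs produce 0; otherwise x is truncated towards zero to
    an unbounded integer and its low 32 bits are sign-extended.
    """
    if math.isnan(x) or math.isinf(x):
        return 0
    n = math.trunc(x)               # exact, unbounded integer (Python int)
    low32 = n & 0xFFFFFFFF          # two's-complement low 32 bits, also for n < 0
    return low32 - (1 << 32) if low32 & 0x80000000 else low32  # sign-extend bit 31
```

For instance, fcvtmod_w_d(2.0 ** 31) returns -2147483648, the same result JavaScript gives for 2 ** 31 | 0, whereas fcvt.w.d would saturate to 2147483647.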

Note: according to the specification, this instruction was added to accelerate JavaScript, which uses double precision to store numbers and often truncates them to 32-bit signed integers. More information on how JavaScript handles “Numbers” can be found on this page from Mozilla's JavaScript documentation.

Given this intent, the instruction is only added for double precision; there are no quad or half precision counterparts.

Move Instructions

Zfa introduces new move instructions between floating-point and general purpose registers.

For RV32, if the D extension is implemented, Zfa introduces fmvh.x.d, which moves the high 32 bits of an FP register into an integer register, and fmvp.d.x, which assembles two 32-bit values read from two independent integer registers into a 64-bit double precision value in a floating-point register.

Note: it is unclear whether allowing two independent input registers brings a real benefit, or whether the register pair could have been encoded as a single register specifier (even using 4 bits rather than 5, since only the even-indexed first register of the pair would need to be encoded). However, this would have required a new RISC-V instruction format.
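On RV32, fmv.x.d and fmv.d.x do not exist (they are RV64-only), so without Zfa the two halves of a double typically have to go through memory. The Python sketch below models only the bit movement of the new instructions, using the existing fmv.x.w to read the low half; the function names are hypothetical and registers are represented as Python ints and floats.

```python
import struct

def _f64_bits(x):
    """Raw 64-bit IEEE-754 encoding of a double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def fmv_x_w(x):
    """Existing fmv.x.w: low 32 bits of the FP register (here holding a double)."""
    return _f64_bits(x) & 0xFFFFFFFF

def fmvh_x_d(x):
    """Zfa fmvh.x.d (RV32 + D): high 32 bits of the double."""
    return _f64_bits(x) >> 32

def fmvp_d_x(rs1, rs2):
    """Zfa fmvp.d.x (RV32 + D): rs1 supplies bits 31:0, rs2 supplies bits 63:32."""
    bits = ((rs2 & 0xFFFFFFFF) << 32) | (rs1 & 0xFFFFFFFF)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

A round trip through integer registers is then fmvp_d_x(fmv_x_w(x), fmvh_x_d(x)), which returns x unchanged.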

For RV64, if the Q extension is implemented, Zfa introduces the equivalent fmvh.x.q and fmvp.q.x, defined between 64-bit integer registers and 128-bit quad precision floating-point registers.

These additions make programs running on architectures where XLEN < FLEN more efficient by removing a few instructions from sequences that move data to or from the floating-point register file without actual conversion (on RV32 without Zfa, such a move goes through memory, e.g. an fsd followed by two lw).

Comparison Instructions

Zfa extends the set of quiet comparisons with the following instructions: fleq.s/fltq.s, fleq.d/fltq.d (if the D extension is implemented), fleq.q/fltq.q (if the Q extension is implemented) and fleq.h/fltq.h (if Zfh is implemented). Those instructions are equivalent to fle.s/flt.s except that they are quiet comparisons: fle.s/flt.s set the invalid operation flag if either operand is a NaN, whereas fleq.s/fltq.s only set it if either operand is a signaling NaN. They complete the set of quiet comparisons offered by the RISC-V ISA, which previously only contained feq.s.
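The flag behavior can be sketched in Python by passing operands as raw binary32 bit patterns, which lets the model tell signaling NaNs (quiet bit, bit 22, clear) from quiet NaNs. Following the Zfa specification, the quiet fltq.s only raises invalid operation for signaling NaN operands, while flt.s raises it for any NaN operand. Function names are hypothetical, and each function returns a (result, invalid_flag) pair; fle.s/fleq.s would differ only in using <=.

```python
import struct

def _decode(bits):
    """Interpret a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def _is_nan(bits):
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0

def _is_snan(bits):
    return _is_nan(bits) and not (bits & 0x00400000)  # quiet bit clear

def flt_s(a, b):
    """flt.s: signaling comparison, invalid is raised for any NaN operand."""
    any_nan = _is_nan(a) or _is_nan(b)
    result = 0 if any_nan else int(_decode(a) < _decode(b))
    return result, any_nan

def fltq_s(a, b):
    """fltq.s (Zfa): quiet comparison, invalid is raised only for signaling NaNs."""
    any_nan = _is_nan(a) or _is_nan(b)
    result = 0 if any_nan else int(_decode(a) < _decode(b))
    return result, _is_snan(a) or _is_snan(b)

ONE, TWO = 0x3F800000, 0x40000000     # 1.0 and 2.0
QNAN, SNAN = 0x7FC00000, 0x7F800001   # a quiet and a signaling NaN
```

With these definitions, flt_s(QNAN, ONE) returns (0, True) while fltq_s(QNAN, ONE) returns (0, False).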

Conclusion

Zfa completes the F, D, Zfh/Zfhmin and Q extensions, extending the conversion to integers and the handling of flags and special values for comparisons and min/max. This should make RISC-V implementations that adopt Zfa more efficient for some rather standard floating-point sequences.

As of May 23rd 2023, the specification is still undergoing its open public review. Public comments are welcome until June 3rd 2023.

As of June 26th 2023, the public review period has ended and a discussion summary can be found in the isa-dev mailing list announcement thread.

Updated on August 15th 2023 to clarify the comparison description. Thank you to Ken P. for pointing out the unclear section and suggesting a fix.

References:

Zfa specification (PDF)
Zfa public review start announcement on RISC-V isa-dev mailing list
Zfa public review feedback summary on RISC-V isa-dev mailing list

What are you optimizing for ? (fprox's substack)
