Discussion about this post

Al Martin

Fprox, I was not actually trying to directly compare "classical" encryption algorithms with PQC. We are both aware of the Keccak hardware optimizations being proposed for accelerating PQC in RISC-V, and I was hoping to get a rough idea of how much faster this would be. I know you can't directly measure it until there's hardware (or, at least, cycle-accurate simulation/emulation), but there are similarities in how the calculations are done.

I agree, AES may not be the proper baseline to compare against, but RISC-V vector crypto does not directly accelerate the other algorithms, although those algorithms could use the Zvknh, Zvkg, Zvbc, and bit manipulation extensions as building blocks.

One other thing I forgot to comment on earlier: a major stumbling block for FALCON is its use of floating-point arithmetic. This is unique among cryptographic algorithms, and the idiosyncrasies of the IEEE 754 standard cause a lot of headaches for those who analyze this algorithm. I think the problems are twofold: 1) dealing with infinity, NaN, and subnormals and 2) no guarantees of Data Independent Execution Latency (DIEL), making it harder to create constant-time implementations.
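To make the special-value concern concrete, here is a minimal Python sketch that classifies an IEEE 754 double from its bit pattern. The function name `classify_double` is hypothetical and is not taken from any FALCON code; it only shows which encodings count as infinity, NaN, and subnormal.

```python
import struct

def classify_double(x: float) -> str:
    """Classify a double by inspecting its IEEE 754 binary64 fields."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exponent = (bits >> 52) & 0x7FF          # 11-bit biased exponent
    mantissa = bits & ((1 << 52) - 1)        # 52-bit fraction field

    if exponent == 0x7FF:                    # all-ones exponent: inf or NaN
        return "nan" if mantissa else "inf"
    if exponent == 0:                        # zero exponent: zero or subnormal
        return "subnormal" if mantissa else "zero"
    return "normal"

if __name__ == "__main__":
    for value in (1.0, 0.0, float("inf"), float("nan"), 5e-324):
        print(value, classify_double(value))
```

An implementation that can show its inputs and outputs always stay in the "normal" or "zero" classes can skip the special-case handling entirely.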

Correct me if I'm wrong, but I believe FALCON data neither consumes nor produces infinity, NaN, or subnormals. It also uses double-precision (64-bit) floating point, even on hardware that does not support it (e.g., some old 32-bit ARMs). There is some sample code out there that implements the floating-point operations in software, but it is horribly slow (on the order of 10-20x the number of cycles that a hardware instruction would take).

Even with a software implementation, FALCON is considered to be "fast enough" compared to other PQC. Not having to implement floating-point hardware + registers may be the right tradeoff in small processors.

My understanding is that only a small subset of floating-point instructions is needed by FALCON (fadd/fsub, fmul, maybe some of the conversions, but not fused mul-add, and not fdiv/fsqrt). The ones that are needed *can* be implemented with DIEL conformance, but this would have to be codified into the ISA before implementers would trust that.
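As a rough illustration of why a restricted fmul can follow a fixed sequence of steps, here is a sketch of a double-precision multiply built only from integer operations. It assumes both inputs and the result are normal numbers (no infinity, NaN, subnormal, or zero), uses round-to-nearest-even, and is not taken from any FALCON implementation; `soft_fmul` and its helpers are hypothetical names. Python integers are not constant-time, so this shows the data flow only, not a DIEL guarantee.

```python
import struct

MANT_MASK = (1 << 52) - 1   # 52-bit fraction field
BIAS = 1023                 # binary64 exponent bias

def to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(b: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def soft_fmul(a: float, b: float) -> float:
    """Multiply two normal doubles using integer arithmetic only."""
    xa, xb = to_bits(a), to_bits(b)
    sign = (xa ^ xb) >> 63
    ea, eb = (xa >> 52) & 0x7FF, (xb >> 52) & 0x7FF
    # Assumption: inputs are normal, so no special-case paths are needed.
    assert 0 < ea < 0x7FF and 0 < eb < 0x7FF

    # Restore the implicit leading 1 to get 53-bit significands.
    ma = (xa & MANT_MASK) | (1 << 52)
    mb = (xb & MANT_MASK) | (1 << 52)
    p = ma * mb                         # product lies in [2^104, 2^106)

    top = p >> 105                      # 1 if the product needs one extra shift
    shift = 52 + top
    q = p >> shift                      # 53-bit truncated significand
    rem = p & ((1 << shift) - 1)        # discarded low bits
    half = 1 << (shift - 1)

    # Round to nearest, ties to even, computed without data-dependent paths.
    round_up = int(rem > half) | (int(rem == half) & (q & 1))
    q += round_up
    carry = q >> 53                     # rounding may overflow to 2^53
    q >>= carry

    e = ea + eb - BIAS + top + carry
    # Assumption: the result is also a normal number.
    assert 0 < e < 0x7FF
    return from_bits((sign << 63) | (e << 52) | (q & MANT_MASK))

if __name__ == "__main__":
    print(soft_fmul(1.5, 2.5), 1.5 * 2.5)
```

Every input takes the same sequence of shifts, masks, and adds, which is the property a DIEL-conformant hardware fmul would need to guarantee in the ISA.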

Al Martin

Fprox, informative as usual. I know it isn't apples-to-apples, but I'm curious how AES latencies compare.
