Fprox, I was not actually trying to directly compare "classical" encryption algorithms with PQC. We are both aware of Keccak hardware optimization being proposed for accelerating PQC in RISC-V, and I was hoping to get a rough idea of how much faster this would be. I know you can't directly measure it until there's hardware (or, at least, cycle accurate simulation/emulation), but there are similarities in how the calculations are done.
I agree, AES may not be the proper baseline to compare against, but RISC-V vector crypto does not directly accelerate the other algorithms, although they could use the Zvknh, Zvkg, Zvbc, and bit manipulation extensions for building blocks.
One other thing I forgot to comment on earlier: a major stumbling block for FALCON is its use of floating-point arithmetic. This is unique among cryptographic algorithms, and the idiosyncrasies of the IEEE754 standard cause those who analyze this algorithm a lot of headaches. I think the problems are two-fold: 1) dealing with infinity, NaN, and subnormals and 2) no guarantees of Data Independent Execution Latency (DIEL), making it harder to create constant-time implementations.
Correct me if I'm wrong, but I believe FALCON data does not source or produce infinity, NaN, or subnormals. It is also using Double-Precision (64-bit) floating point even in hardware that does not support it (e.g., some old 32-bit ARMs). There is some sample code out there that implements the floating-point instructions in software, but these are horribly slow (on the order of 10-20x the number of cycles that a hardware instruction would take).
Even with software implementation, FALCON is considered to be "fast enough", compared to other PQC. Not having to implement floating-point hardware + registers may be the right tradeoff in small processors.
My understanding is that only a small subset of floating-point instructions are needed by FALCON (fadd/fsub, fmul, maybe some of the conversions, but not fused mul-add, and not fdiv/fsqrt). The ones that are needed *can* be implemented with DIEL conformance, but this would have to be codified into the ISA before implementers would trust that.
Hi Al, thank you for the compliment, appreciated as usual.
That is a great question. I would say that if Zvkned is implemented (vector crypto extension for NIST/AES cipher, I am sure you know that but other readers might not :-) ), I would suspect the latency to be in the low tens of cycles for the 10-14 rounds (could even be faster on some hardware). Software implementation would be in the low hundreds (using bitslicing or composed fields, or in register vrgather for the byte sub steps; assuming they are constant time as I am sure most implementation would want to avoid memory timing attacks).
So there is definitely a very big latency difference between PQC and AES.
But as you hinted, this would not be apples-to-apples, AES being a symmetric encryption algorithm, a fairer comparison might be against asymmetric algorithms such as Diffie-Hellman (key exchange), RSA, DSA or ECDSA (signature). I will need to look into it (I am sure `openssl speed` would rapidly provide some numbers).
Even if PQC algorithms end up slower, they have the unbeatable advantage that DH, RSA, (EC)DSA are breakable by a quantum computer.
Fprox, I was not actually trying to directly compare "classical" encryption algorithms with PQC. We are both aware of Keccak hardware optimization being proposed for accelerating PQC in RISC-V, and I was hoping to get a rough idea of how much faster this would be. I know you can't directly measure it until there's hardware (or, at least, cycle accurate simulation/emulation), but there are similarities in how the calculations are done.
I agree, AES may not be the proper baseline to compare against, but RISC-V vector crypto does not directly accelerate the other algorithms, although they could use the Zvknh, Zvkg, Zvbc, and bit manipulation extensions for building blocks.
One other thing I forgot to comment on earlier: a major stumbling block for FALCON is its use of floating-point arithmetic. This is unique among cryptographic algorithms, and the idiosyncrasies of the IEEE754 standard cause those who analyze this algorithm a lot of headaches. I think the problems are two-fold: 1) dealing with infinity, NaN, and subnormals and 2) no guarantees of Data Independent Execution Latency (DIEL), making it harder to create constant-time implementations.
Correct me if I'm wrong, but I believe FALCON data does not source or produce infinity, NaN, or subnormals. It is also using Double-Precision (64-bit) floating point even in hardware that does not support it (e.g., some old 32-bit ARMs). There is some sample code out there that implements the floating-point instructions in software, but these are horribly slow (on the order of 10-20x the number of cycles that a hardware instruction would take).
Even with software implementation, FALCON is considered to be "fast enough", compared to other PQC. Not having to implement floating-point hardware + registers may be the right tradeoff in small processors.
My understanding is that only a small subset of floating-point instructions are needed by FALCON (fadd/fsub, fmul, maybe some of the conversions, but not fused mul-add, and not fdiv/fsqrt). The ones that are needed *can* be implemented with DIEL conformance, but this would have to be codified into the ISA before implementers would trust that.
Fprox, informative as usual. I know it isn't apples-to-apples, but I'm curious how AES latencies compare.
Hi Al, thank you for the compliment, appreciated as usual.
That is a great question. I would say that if Zvkned is implemented (vector crypto extension for NIST/AES cipher, I am sure you know that but other readers might not :-) ), I would suspect the latency to be in the low tens of cycles for the 10-14 rounds (could even be faster on some hardware). Software implementation would be in the low hundreds (using bitslicing or composed fields, or in register vrgather for the byte sub steps; assuming they are constant time as I am sure most implementation would want to avoid memory timing attacks).
So there is definitely a very big latency difference between PQC and AES.
But as you hinted, this would not be apples-to-apples, AES being a symmetric encryption algorithm, a fairer comparison might be against asymmetric algorithms such as Diffie-Hellman (key exchange), RSA, DSA or ECDSA (signature). I will need to look into it (I am sure `openssl speed` would rapidly provide some numbers).
Even if PQC algorithms end up slower, they have the unbeatable advantage that DH, RSA, (EC)DSA are breakable by a quantum computer.