Faster NTT with RISC-V Vector

Jan 11

From 17'800 cycles down to ?

2 Comments

> Our code heavily relies on vsetvl* instructions setting vl to VLMAX when avl is greater than or equal to VLMAX. Although this seems to be true on our target, this is actually not mandated by RVV 1.0. This makes this code quite brittle and should be addressed.

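Fixing this takes a single additional min instruction: vsetvli(min(avl, vlmax)). Once AVL is clamped to at most VLMAX, the spec leaves no freedom and vl equals the requested AVL.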

Or you can just always use VLMAX; which of the two works better depends on the context.
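To make the clamping argument concrete, here is a minimal sketch in plain C (no RVV intrinsics, so it runs anywhere) that models rules 1 to 3 of the "Constraints on Setting vl" section of the spec. The names `vl_range`, `allowed_vl`, `clamp_avl` and `check_clamp_pins_vl` are made up for illustration; this is a model of the rules, not code from the article.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive range of vl values a conforming implementation may pick
 * for a given AVL and VLMAX (hypothetical helper type). */
typedef struct { size_t lo, hi; } vl_range;

static vl_range allowed_vl(size_t avl, size_t vlmax) {
    vl_range r;
    if (avl <= vlmax) {            /* rule 1: vl = AVL */
        r.lo = r.hi = avl;
    } else if (avl / 2 >= vlmax) { /* rule 3: AVL >= 2*VLMAX, so vl = VLMAX */
        r.lo = r.hi = vlmax;
    } else {                       /* rule 2: ceil(AVL/2) <= vl <= VLMAX */
        r.lo = avl / 2 + avl % 2;
        r.hi = vlmax;
    }
    return r;
}

/* The "single additional min": never request more than VLMAX. */
static size_t clamp_avl(size_t avl, size_t vlmax) {
    return avl < vlmax ? avl : vlmax;
}

/* After clamping, the allowed range collapses to a single value. */
static void check_clamp_pins_vl(size_t vlmax) {
    for (size_t avl = 0; avl <= 4 * vlmax; ++avl) {
        size_t req = clamp_avl(avl, vlmax);
        vl_range r = allowed_vl(req, vlmax);
        assert(r.lo == r.hi);
        assert(r.lo == req);
    }
}
```

With VLMAX = 16 and AVL = 24, the unclamped request may legally return anything from 12 to 16, while the clamped request can only return 16.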

> Using a VLEN specific approach is somehow cheating

I think the best approach is to have a general solution, with specialization where beneficial. Branching on VLEN should be extremely cheap, because it's always predicted.
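A rough sketch of what "general solution plus specialization" could look like, again in plain C. The enum and function names are hypothetical, and VLEN is passed in as a parameter; on real hardware it would presumably be derived from the `vlenb` CSR (VLEN in bits is 8 * vlenb). Only a VLEN=128 specialization is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers for the available NTT kernels. */
typedef enum {
    NTT_GENERIC,  /* VLEN-agnostic fallback */
    NTT_VLEN128   /* kernel tuned for VLEN = 128 */
} ntt_variant;

/* Pick a kernel once, based on the vector register width in bits.
 * The value is constant for a given hart, so the branch is taken
 * the same way every time. */
static ntt_variant select_ntt_variant(size_t vlen_bits) {
    /* RVV 1.0 requires VLEN to be a power of two. */
    assert(vlen_bits != 0 && (vlen_bits & (vlen_bits - 1)) == 0);

    switch (vlen_bits) {
    case 128:
        return NTT_VLEN128;
    default:
        return NTT_GENERIC;
    }
}
```

The selection could also be done once at startup and cached in a function pointer, which trades the branch for an indirect call; which is cheaper would need measuring.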


FPRox

Jan 11

Thanks for the suggestion @camel-cdr.

I actually want to set vl to VLMAX, but RVV 1.0 allows some leniency when setting vl:

https://github.com/riscvarchive/riscv-v-spec/blob/master/v-spec.adoc#63-constraints-on-setting-vl

Looking at it more closely, though, I think I can use clause 3, `vl = VLMAX if AVL ≥ (2 * VLMAX)`, to make sure I get the value I expect.
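A small sketch of that idea in plain C, with hypothetical helper names: a predicate for the AVL values where the rules leave no choice but VLMAX, and an AVL that satisfies clause 3 for any VLMAX. The all-ones AVL is the same value the spec uses for `vsetvli rd, x0, ...` (rd != x0), which is the usual way to request VLMAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when every conforming implementation must return vl == VLMAX:
 * either AVL == VLMAX (rule 1) or AVL >= 2*VLMAX (rule 3).
 * The division avoids overflow in 2*VLMAX. */
static bool vl_guaranteed_vlmax(size_t avl, size_t vlmax) {
    return avl == vlmax || avl / 2 >= vlmax;
}

/* An AVL that triggers rule 3 whatever VLMAX turns out to be. */
static size_t avl_for_full_vector(size_t vlmax) {
    size_t avl = SIZE_MAX; /* all ones, like the x0 form of vsetvli */
    assert(vl_guaranteed_vlmax(avl, vlmax));
    return avl;
}
```

Values strictly between VLMAX and 2*VLMAX are the only ones where the result is implementation-defined, so those are the requests to avoid.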

>> Using a VLEN specific approach is somehow cheating

> I think the best approach is to have a general solution, with specialization where beneficial. Branching on VLEN should be extremely cheap, because it's always predicted.

I agree, although I would need to evaluate the impact on code size (which I completely disregarded in this work). I might try it in order to optimize the code across several implementations (I only have a VLEN=128 implementation readily available at the moment).


What are you optimizing for ? (fprox's substack)
