I ran the float sum benchmark with 10000 elements and rdcycle on a C920; here are the results:
scalar: 27000 cycles
LMUL=1: 10792 cycles
LMUL=2: 9337 cycles
LMUL=4: 8702 cycles
LMUL=8: 10553 cycles
You can see how LMUL>1 basically acts as loop unrolling, since the C920 has DLEN<=VLEN. The reason LMUL=8 is slower than LMUL=4 is, presumably, that the core can issue one 512-bit load and one 512-bit store in parallel, but with LMUL=8 it can't (or rather doesn't) interleave the loads and stores. I expect future implementations not to suffer from this problem.
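For anyone who wants to reproduce this, a float sum kernel at LMUL=4 could look roughly like the sketch below, written with the v1.0 C intrinsics. This is not the exact benchmark code; the function name and the tail-undisturbed accumulation strategy are just illustrative.

```c
#include <riscv_vector.h>
#include <stddef.h>

// Sketch: sum of n floats using LMUL=4 (not the exact benchmark source).
float vec_sum_f32m4(const float *x, size_t n) {
    size_t vlmax = __riscv_vsetvlmax_e32m4();
    // Vector accumulator, initialized to zero across all VLMAX lanes.
    vfloat32m4_t acc = __riscv_vfmv_v_f_f32m4(0.0f, vlmax);

    for (size_t vl; n > 0; n -= vl, x += vl) {
        vl = __riscv_vsetvl_e32m4(n);
        vfloat32m4_t v = __riscv_vle32_v_f32m4(x, vl);
        // Tail-undisturbed add: lanes past vl keep their previous partial sums.
        acc = __riscv_vfadd_vv_f32m4_tu(acc, acc, v, vl);
    }

    // Reduce the vector accumulator to a single scalar.
    vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, 1);
    vfloat32m1_t red  = __riscv_vfredusum_vs_f32m4_f32m1(acc, zero, vlmax);
    return __riscv_vfmv_f_s_f32m1_f32(red);
}
```

Changing the `m4` suffixes (and the corresponding types) to `m1`, `m2`, or `m8` gives the other LMUL variants.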
> if avl is greater than VLMAX then VLMAX is returned
That'd be logical, right? Too bad the RVV spec likes to throw curveballs at unsuspecting developers.
https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc#constraints-on-setting-vl
You're right, this is only one of the allowed implementation behaviors; I need to correct my sentence.
Post updated on Jan 10th 2024 to fix this error.
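In practice this means a strip-mining loop should always use the vl that vsetvl actually returns, rather than assuming vl == min(avl, VLMAX): for VLMAX < avl < 2*VLMAX, the spec permits any vl with ceil(avl/2) <= vl <= VLMAX. A minimal sketch of the portable pattern (the function itself is just for illustration):

```c
#include <riscv_vector.h>
#include <stddef.h>

// Sketch: strip-mining that relies only on what the spec guarantees.
// Do not assume vl == VLMAX or vl == avl; use the returned value everywhere.
void add_one(float *x, size_t n) {
    for (size_t vl; n > 0; n -= vl, x += vl) {
        vl = __riscv_vsetvl_e32m1(n);               // returned vl drives...
        vfloat32m1_t v = __riscv_vle32_v_f32m1(x, vl);
        v = __riscv_vfadd_vf_f32m1(v, 1.0f, vl);
        __riscv_vse32_v_f32m1(x, v, vl);            // ...loads, ops, and stores
    }
}
```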
I recommend https://dzaima.github.io/intrinsics-viewer/ as a reference for the intrinsics.