RISC-V Vector Is Expanding

Jan 25

An illustrated look at some of the ongoing fast-track extension projects for RVV

4 Comments

vzip.vv could previously be done with `vwmaccu.vx(vwaddu.vv(a, b), -1U, b)`; vunzip[e,o].vv is vnsrl.wi; and vpaire/vpairo are masked vslide1up/vslide1down.
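
To spell out why the widening sequence interleaves its inputs, here is a minimal scalar sketch for SEW=8. The function names (`zip_lane_u8`, `zip_u8`, `zip_demo`) are hypothetical and only model the arithmetic; they are not RVV intrinsics. The idea is that vwaddu.vv produces a + b in a 2*SEW lane, and vwmaccu.vx with the scalar -1U (that is, 2^SEW - 1) then adds (2^SEW - 1) * b, so each wide lane ends up holding a + 2^SEW * b: `a` in the low half and `b` in the high half.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane at SEW=8 (not an RVV intrinsic). */
uint16_t zip_lane_u8(uint8_t a, uint8_t b)
{
    /* vwaddu.vv: widening unsigned add, result is a + b at 2*SEW. */
    uint16_t wide = (uint16_t)((uint16_t)a + b);
    /* vwmaccu.vx with -1U: wide += (2^8 - 1) * b, giving a + 2^8 * b. */
    wide = (uint16_t)(wide + 0xFFu * b);
    return wide;
}

/* Viewing the 16-bit lanes as bytes (little-endian) yields a[0], b[0], a[1], b[1], ... */
void zip_u8(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t lane = zip_lane_u8(a[i], b[i]);
        out[2 * i]     = (uint8_t)(lane & 0xFF); /* low half: element of a */
        out[2 * i + 1] = (uint8_t)(lane >> 8);   /* high half: element of b */
    }
}

/* Small usage check: zipping {1,2,3} with {10,20,30}. */
void zip_demo(void)
{
    const uint8_t a[3] = {1, 2, 3};
    const uint8_t b[3] = {10, 20, 30};
    const uint8_t expected[6] = {1, 10, 2, 20, 3, 30};
    uint8_t out[6];

    zip_u8(a, b, out, 3);
    for (size_t i = 0; i < 6; i++)
        assert(out[i] == expected[i]);
}
```

The sum never overflows the wide lane: the largest value is 255 + 256 * 255 = 65535, which is exactly the 16-bit maximum.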


FPRox

Jan 25

I am pretty sure everyone would be happier using vzip rather than the widening mac sequence :-) (and I am pretty sure it will be much more energy efficient: toggling a multiplier is getting expensive these days).

With respect to the masked slides, why vslide1up/vslide1down and not masked vslide.vi? (Here the benefit of the new instructions would be similar to implementing the unzip with a vcompress, I guess: fewer instructions, less need to materialize a mask, and hopefully a better-suited datapath when hardware is not an issue.)


camel-cdr

Jan 25

Yeah, that wasn't a critique of the instructions. I just wanted to point out how to do the equivalent thing today.

vzip.vv is a great improvement over the widening macc sequence. vunzip doesn't give much of an advantage over vnsrl.wi, but it does also work for the SEW=64 case.
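
For reference, a minimal scalar sketch of the narrowing-shift approach to unzipping at SEW=8. The name `unzip_u8` is hypothetical and only models the lane arithmetic; it is not an RVV intrinsic. Each pair of adjacent elements is read as one 2*SEW lane, shifted right by 0 (even elements) or by SEW (odd elements), and narrowed back to SEW. Because the source of a narrowing shift is 2*SEW wide, with ELEN=64 this only reaches destination elements up to 32 bits, which is why the SEW=64 case needs something else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a narrowing shift right at SEW=8.
   shift = 0 extracts the even elements, shift = 8 extracts the odd ones. */
void unzip_u8(const uint8_t *in, uint8_t *out, size_t n, unsigned shift)
{
    for (size_t i = 0; i < n; i++) {
        /* View two adjacent bytes as one little-endian 16-bit lane. */
        uint16_t lane = (uint16_t)(in[2 * i] | (in[2 * i + 1] << 8));
        /* Shift right, then keep the low 8 bits (the narrowing step). */
        out[i] = (uint8_t)(lane >> shift);
    }
}

/* Small usage check: unzipping {1,10,2,20,3,30}. */
void unzip_demo(void)
{
    const uint8_t in[6] = {1, 10, 2, 20, 3, 30};
    uint8_t even[3], odd[3];

    unzip_u8(in, even, 3, 0);
    unzip_u8(in, odd, 3, 8);
    assert(even[0] == 1 && even[1] == 2 && even[2] == 3);
    assert(odd[0] == 10 && odd[1] == 20 && odd[2] == 30);
}
```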

The vpair[e,o] instructions are also great because they are extremely cheap to implement and will likely have higher throughput and lower latency than the vslide instructions. You also don't need to move the masks around as much for the common transpose use case.

BTW, vdot4au.vv also sounds interesting for number parsing.


FPRox

Jan 26

> Yeah, that wasn't a critique of the instructions.

I figured :-). IIRC, you were the direct driver behind some of those new instructions:

https://gist.github.com/camel-cdr/99a41367d6529f390d25e36ca3e4b626
