Survey of basic techniques to transform matrix layouts using RVV
Great article, here are some rdcycle measurements from a C908:
scalar baseline (due to -Os): https://pastebin.com/Jejx6CQW
autovec baseline (due to -Ofast): https://pastebin.com/gQB76kgy
Edit: I experimented with larger LMUL, but that didn't seem to affect performance much.
Edit2: the output logs say ... instruction(s), but the numbers are actually cycles measured with rdcycle.
Thank you for running those @camel-cdr
https://github.com/nibrunie/rvv-examples/pull/4 should improve the labelling of performance metrics "cycle" vs "instruction"