Adding a new RISC-V instruction (to LLVM)
Extending the LLVM compiler to assemble new RISC-V instructions
In a previous post we presented how to add a new instruction to riscv-opcodes and to the main RISC-V instruction set simulator (a.k.a. riscv-isa-sim / spike). Since we did not extend any compiler or assembler, we had to rely on the .insn assembler directive, used from inline assembly, to integrate the new instruction into a test program.
In this article we review how to extend the LLVM compiler (llvm.org) to support the new experimental extension, Zvknf, and its new instruction, vaese128.vv (all the details of its specification can be found in the previous post). Here, support means being able to assemble the instruction from its proper mnemonic, so that it can be written in assembly like any other standard RISC-V instruction.
Note: the LLVM suite contains multiple tools besides a compiler. In this post we mostly extend the assembler (we do not declare a new builtin yet).
To test the generated binary, we rely on the spike simulator built in the first post.
Build environment
To build a RISC-V environment compatible with our modified version of LLVM we follow the guidelines listed here: https://github.com/sifive/riscv-llvm/blob/dev/README.md. We mostly build an LLVM capable of generating an object file for a RISC-V target, and rely on the GNU toolchain to link it with its library dependencies into a binary we can execute with spike.
Declaring a new extension and new instruction(s)
We use the pull request introducing the standard vector-crypto support in LLVM as a template for our modification: https://reviews.llvm.org/D141672.
The full commit required to implement Zvknf support and declare vaese128.v* is available here. We need to modify multiple source files, some in C++ (the main language of LLVM) and some in TableGen. All the source files we modify are contained in the llvm/lib sub-directory.
Note: TableGen is part of the LLVM project. Its “purpose is to help a human develop and maintain records of domain-specific information” (https://llvm.org/docs/TableGen/). It is used in particular for the target description files of LLVM backends (including instruction definitions, scheduling information, …).
First, the new extension needs to be declared (in Support/RISCVISAInfo.cpp and Target/RISCV/RISCVFeatures.td). We declare the new extension Zvknf as experimental with the version 0.1 (this is arbitrary, as our version of the extension does not strictly follow the draft specification). We also need to modify some TableGen target description (.td) files, in particular to declare a set of predicates indicating support for the new extension. These changes percolate into the LLVM RISC-V backend: they allow us to declare instructions associated with the new extension later, and to add the extension to a RISC-V ISA string, alongside a version number, when invoking the compiler. The git diff below lists those changes:
diff --git a/llvm/lib/Support/RISCVISAInfo.cpp b/llvm/lib/Support/RISCVISAInfo.cpp
index da1bd12fb2d5..faa16fcada6f 100644
--- a/llvm/lib/Support/RISCVISAInfo.cpp
+++ b/llvm/lib/Support/RISCVISAInfo.cpp
@@ -160,6 +160,8 @@ static const RISCVSupportedExtension SupportedExperimentalExtensions[] = {
{"zvksg", RISCVExtensionVersion{0, 5}},
{"zvksh", RISCVExtensionVersion{0, 5}},
{"zvkt", RISCVExtensionVersion{0, 5}},
+ // vector crypto all rounds
+ {"zvknf", RISCVExtensionVersion{0, 1}},
};
static bool stripExperimentalPrefix(StringRef &Ext) {
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index d01375ebe866..0626f29a00ab 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -569,6 +569,13 @@ def FeatureStdExtZvkt
: SubtargetFeature<"experimental-zvkt", "HasStdExtZvkt", "true",
"'Zvkt' (Vector Data-Independent Execution Latency)">;
+def FeatureStdExtZvknf
+ : SubtargetFeature<"experimental-zvknf", "HasStdExtZvknf", "true",
+ "'Zvknf' (Vector AES Encryption & Decryption (All Rounds))">;
+def HasStdExtZvknf : Predicate<"Subtarget->hasStdExtZvknf()">,
+ AssemblerPredicate<(all_of FeatureStdExtZvknf),
+ "'Zvknf' (Vector AES Encryption & Decryption (All Rounds))">;
+
def FeatureStdExtZicond
: SubtargetFeature<"experimental-zicond", "HasStdExtZicond", "true",
"'Zicond' (Integer Conditional Operations)">;
Then, we need to declare the new instruction. This is done in a RISC-V target description file, Target/RISCV/RISCVInstrInfoZvk.td. This step is fairly easy because the instruction we would like to add, vaese128.vv, fits nicely into an existing vector crypto instruction pattern, VAES_MV_V_S: it is very similar to the single-round AES vector instructions (e.g., vaesm.vv). The most noticeable difference is the value of the funct5 bitfield (the same bitfield as the one used to encode the vs1/rs1 operand). The value of this bitfield (0b01000) distinguishes the new instructions from the others with a similar pattern. We insert our new instruction alongside the other vector crypto instructions:
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td b/llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td
index 1e27e4306b84..e3d6543f8437 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoZvk.td
@@ -176,3 +176,7 @@ let Predicates = [HasStdExtZvksh], RVVConstraint = NoConstraint in {
def VSM3C_VI : PALUVINoVm<0b101011, "vsm3c.vi", uimm5>;
def VSM3ME_VV : PALUVVNoVm<0b100000, OPMVV, "vsm3me.vv">;
} // Predicates = [HasStdExtZvksh]
+
+let Predicates = [HasStdExtZvknf], RVVConstraint = NoConstraint in {
+ defm VAESE128 : VAES_MV_V_S<0b101000, 0b101001, 0b01000, OPMVV, "vaese128">;
+} // Predicates = [HasStdExtZvknf]
Note: this declaration adds both vaese128.vv and vaese128.vs. We will only be using the former.
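To make the role of each TableGen parameter more concrete, here is a minimal sketch in C of how the fields could be packed into a 32-bit instruction word. The helper names are hypothetical and this is not LLVM code. The sketch assumes the usual vector-arithmetic field layout (funct6, vm, vs2, vs1, funct3, vd, opcode), a funct3 value of 0b010 for OPMVV, and a major opcode of 0x77; the last two are not stated in this post and are inferred from the encoded bytes shown in the Testing section.

```c
#include <stdint.h>

/* Hypothetical helper (not part of LLVM): packs the fields of a
 * vector-arithmetic style instruction into a 32-bit word.
 * Assumed layout: funct6[31:26] vm[25] vs2[24:20] vs1[19:15]
 *                 funct3[14:12] vd[11:7] opcode[6:0]. */
uint32_t encode_vector_op(uint32_t funct6, uint32_t vm, uint32_t vs2,
                          uint32_t vs1, uint32_t funct3, uint32_t vd,
                          uint32_t opcode) {
    return ((funct6 & 0x3fu) << 26) | ((vm & 0x1u) << 25) |
           ((vs2 & 0x1fu) << 20) | ((vs1 & 0x1fu) << 15) |
           ((funct3 & 0x7u) << 12) | ((vd & 0x1fu) << 7) |
           (opcode & 0x7fu);
}

/* Encodes "vaese128.vv vd, vs2". The vs1 slot does not hold a register:
 * it carries the fixed funct5 value that selects the instruction. */
uint32_t encode_vaese128_vv(uint32_t vd, uint32_t vs2) {
    const uint32_t funct6_vv = 0x28; /* 0b101000, from the TableGen declaration */
    const uint32_t funct5    = 0x08; /* 0b01000, from the TableGen declaration */
    const uint32_t opmvv     = 0x2;  /* funct3 = 0b010 (assumption) */
    const uint32_t op_p      = 0x77; /* major opcode (assumption) */
    return encode_vector_op(funct6_vv, 1u /* unmasked */, vs2, funct5,
                            opmvv, vd, op_p);
}
```

Under these assumptions, encode_vaese128_vv(17, 23) yields 0xa37428f7, which matches the little-endian bytes f7 28 74 a3 that llvm-objdump reports for vaese128.vv v17, v23 later in this post.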
Building LLVM (and clang)
We need to build our version of LLVM with support for the RISC-V target (in fact we do not need any other target). We follow the guidelines in riscv-llvm-project README.md, install our RISC-V toolchain under llvm-project/../riscv-install (rather than _install), and build LLVM (and clang) in llvm-project/build.
# from llvm-project directory
ln -s ../../clang llvm/tools || true # required to include clang build
mkdir build
pushd build
cmake -G Ninja -DCMAKE_BUILD_TYPE="Release" \
-DBUILD_SHARED_LIBS=True -DLLVM_USE_SPLIT_DWARF=True \
-DCMAKE_INSTALL_PREFIX="../../riscv-install" \
-DLLVM_OPTIMIZED_TABLEGEN=True -DLLVM_BUILD_TESTS=False \
-DDEFAULT_SYSROOT="../../riscv-install/riscv64-unknown-elf" \
-DLLVM_DEFAULT_TARGET_TRIPLE="riscv64-unknown-elf" \
-DLLVM_TARGETS_TO_BUILD="RISCV" ../llvm
cmake --build . --target install
Testing
We are going to re-use the test from the previous post, but this time we can call the instruction through its mnemonic directly within the inline assembly of our C test program:
"vsetivli x0, 4, e32, m1, ta, ma\n" // setting vl=4
"vle32.v v17, (%[src])\n" // loading plaintext
"vle32.v v23, (%[key])\n" // loading key
// direct use of vaese128.vv, v23=key, v17=source/destination
// the .insn macro is no longer required
"vaese128.vv v17, v23\n"
"vse32.v v17, (%[dst])\n" // storing ciphertext
The full test becomes:
#include <stdio.h>
// Inputs and known answer taken from Section C.1
// of https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf
char plaintext[16] = { 0x0, 0x11, 0x22, 0x33, 0x44, 0x55,
0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb,
0xcc, 0xdd, 0xee, 0xff};
char key[16] = {0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6,
0x7, 0x8, 0x9, 0xa, 0xb, 0xc, 0xd,
0xe, 0xf};
const char expected[16] = {0x69, 0xc4, 0xe0, 0xd8, 0x6a, 0x7b,
0x04, 0x30, 0xd8, 0xcd, 0xb7, 0x80,
0x70, 0xb4, 0xc5, 0x5a};
char ciphertext[16] = {0};
void __attribute__((noinline))
encrypt(char* dst, char* src, char* key) {
    __asm__ volatile (
        "vsetivli x0, 4, e32, m1, ta, ma\n" // setting vl=4
        "vle32.v v17, (%[src])\n" // loading plaintext
        "vle32.v v23, (%[key])\n" // loading key
        // vaese128.vv
        "vaese128.vv v17, v23\n"
        "vse32.v v17, (%[dst])\n" // storing ciphertext
        :
        : [src]"r"(src), [key]"r"(key), [dst]"r"(dst)
        // the asm stores to *dst and overwrites v17 and v23
        : "memory", "v17", "v23"
    );
}
#define PRINT_EGU8x16(LABEL, X) do {\
    printf("%s: %02x %02x %02x %02x" " %02x %02x %02x %02x" \
           " %02x %02x %02x %02x" " %02x %02x %02x %02x\n",\
           (LABEL),\
           (X)[0], (X)[1], (X)[2], (X)[3], (X)[4], (X)[5], \
           (X)[6], (X)[7], (X)[8], (X)[9], (X)[10], (X)[11], \
           (X)[12], (X)[13], (X)[14], (X)[15]);\
} while (0)
int main(void) {
    encrypt(ciphertext, plaintext, key);
    PRINT_EGU8x16("plaintext ", plaintext);
    PRINT_EGU8x16("ciphertext", ciphertext);
    PRINT_EGU8x16("expected ", expected);
    return 0;
}
We use clang (the C frontend for LLVM) to build an object file, which will later be linked into an executable binary using the GNU ELF toolchain. The Zvknf extension was added as experimental, so we need to enable experimental extensions to build our test, and we need to add the extension to the ISA string so that the compiler considers the extension’s instruction(s) valid:
clang -O2 --target=riscv64 -c -o test.o vaese128-test.c \
-menable-experimental-extensions -march=rv64gcv_zvknf0p1
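The zvknf0p1 suffix in the -march string carries the version we declared earlier: 0p1 stands for major version 0 and minor version 1, that is RISCVExtensionVersion{0, 1}. As an illustration only, the sketch below splits such a token into its name and version. The function split_ext_version is a hypothetical helper written for this post, not LLVM's actual ISA string parser, and it only handles the name followed by major, 'p', minor form.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not LLVM's parser): splits a token such as
 * "zvknf0p1" into its name ("zvknf"), major (0) and minor (1) version.
 * Returns 0 on success, -1 if the token has no <major>p<minor> suffix
 * or if the name does not fit in the output buffer. */
int split_ext_version(const char *ext, char *name, size_t name_size,
                      unsigned *major, unsigned *minor) {
    size_t len = strlen(ext);
    size_t i = len;

    /* Walk backwards over the minor version digits. */
    while (i > 0 && isdigit((unsigned char)ext[i - 1]))
        i--;
    size_t minor_pos = i;
    if (minor_pos == len || i == 0 || ext[i - 1] != 'p')
        return -1;
    i--; /* skip the 'p' separator */
    size_t p_pos = i;

    /* Walk backwards over the major version digits. */
    while (i > 0 && isdigit((unsigned char)ext[i - 1]))
        i--;
    size_t major_pos = i;
    if (major_pos == p_pos || major_pos == 0 || major_pos >= name_size)
        return -1;

    memcpy(name, ext, major_pos);
    name[major_pos] = '\0';
    *major = (unsigned)strtoul(ext + major_pos, NULL, 10);
    *minor = (unsigned)strtoul(ext + minor_pos, NULL, 10);
    return 0;
}
```

Calling split_ext_version on "zvknf0p1" gives the name zvknf with major 0 and minor 1, while a bare "zvknf" is rejected by this sketch because it has no version suffix.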
Since the instruction was declared in a compiler suite, it is also supported by llvm-objdump: we can disassemble our program with llvm-objdump and see the new instruction:
$ llvm-objdump -D test.o
(...)
0000000000000002 <encrypt>:
2: 57 70 02 cd vsetivli zero, 4, e32, m1, ta, ma
6: 87 e8 05 02 vle32.v v17, (a1)
a: 87 6b 06 02 vle32.v v23, (a2)
e: f7 28 74 a3 vaese128.vv v17, v23
12: a7 68 05 02 vse32.v v17, (a0)
16: 82 80 ret
(...)
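The disassembly can be cross-checked by hand. The sketch below, in C, rebuilds the 32-bit word from the four bytes printed by llvm-objdump (lowest address first, so little-endian) and extracts the individual fields. The struct and function names are hypothetical, and the field positions are the usual vector-arithmetic layout, which is assumed here rather than taken from this post.

```c
#include <stdint.h>

/* Hypothetical record holding the fields of one vector instruction word. */
struct vector_fields {
    uint32_t funct6, vm, vs2, vs1, funct3, vd, opcode;
};

/* Rebuilds the instruction word from the bytes in the order
 * llvm-objdump prints them (lowest address first). */
uint32_t word_from_bytes(const uint8_t b[4]) {
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Extracts the fields, assuming the layout
 * funct6[31:26] vm[25] vs2[24:20] vs1[19:15] funct3[14:12] vd[11:7] opcode[6:0]. */
struct vector_fields decode_vector_fields(uint32_t w) {
    struct vector_fields f;
    f.funct6 = (w >> 26) & 0x3fu;
    f.vm     = (w >> 25) & 0x1u;
    f.vs2    = (w >> 20) & 0x1fu;
    f.vs1    = (w >> 15) & 0x1fu;
    f.funct3 = (w >> 12) & 0x7u;
    f.vd     = (w >> 7) & 0x1fu;
    f.opcode = w & 0x7fu;
    return f;
}
```

For the bytes f7 28 74 a3, this gives funct6 = 0b101000 and a vs1 slot equal to 0b01000, the two values passed to VAES_MV_V_S, together with vd = 17 and vs2 = 23, the registers used in the test.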
Our build of LLVM does not fully support linking our test binary, so, as mentioned, we use the GNU toolchain to link the final binary:
riscv64-unknown-elf-gcc -march=rv64gcv -mabi=lp64 test.o -o test-aes128-asm
And finally we can execute our test with the spike built following the previous post:
./spike --isa=rv64gcv_zvknf --varch=vlen:128,elen:64 \
riscv64-unknown-elf/bin/pk test-aes128-asm
And we obtain the following output (as expected):
bbl loader
plaintext : 00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff
ciphertext: 69 c4 e0 d8 6a 7b 04 30 d8 cd b7 80 70 b4 c5 5a
expected : 69 c4 e0 d8 6a 7b 04 30 d8 cd b7 80 70 b4 c5 5a
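Comparing the two lines by eye works for a single test, but the comparison could also be automated. A minimal sketch, assuming we want the program's exit status to reflect the result: check_known_answer is a hypothetical helper, not part of the original test, and the reference bytes are the FIPS-197 Section C.1 ciphertext already used above.

```c
#include <stddef.h>
#include <string.h>

/* Known-answer ciphertext from FIPS-197, Section C.1
 * (same bytes as the `expected` array of the test program). */
const unsigned char fips197_c1_ciphertext[16] = {
    0x69, 0xc4, 0xe0, 0xd8, 0x6a, 0x7b, 0x04, 0x30,
    0xd8, 0xcd, 0xb7, 0x80, 0x70, 0xb4, 0xc5, 0x5a};

/* Hypothetical helper: returns 0 when the n bytes match and 1 otherwise,
 * so the result can be used directly as a process exit status. */
int check_known_answer(const unsigned char *got, const unsigned char *want,
                       size_t n) {
    return memcmp(got, want, n) == 0 ? 0 : 1;
}
```

In the test program, main could then end with return check_known_answer((const unsigned char *)ciphertext, fips197_c1_ciphertext, 16); so that a wrong result makes the run fail instead of only printing different bytes.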
Conclusion
Adding support for a new extension and a new instruction to a compiler is an important step towards allowing programmers to make use of any new RISC-V extension (standard or custom). Extending LLVM was made fairly easy by two facts: target descriptions are self-contained and easy to extend through a domain-specific language (TableGen), and the extension and instruction we added build upon pre-existing extensions and instructions. We also kept the scope small: we did not add any tests, nor support for the instruction as an intrinsic.
References
How to build a RISC-V environment with LLVM https://github.com/sifive/riscv-llvm/blob/dev/README.md