by Mikhail R. Gadelha, Igalia S.L.
Over the past eight months, Igalia has been working through RISE on the LLVM compiler, focusing on its RISC-V target. The goal is to improve the performance of generated code for application-class RISC-V processors, especially where gaps remain between LLVM and GCC on RISC-V.
The result? A set of improvements that reduces execution time by up to 15% on our SPEC CPU® 2017-based benchmark harness, as measured on the SpacemiT-X60.
Our efforts have concentrated on several key areas of the LLVM compiler infrastructure that directly affect the efficiency of RISC-V code generation. These contributions span various stages of the compilation process, from instruction selection to instruction scheduling, but here we’ll focus on three major areas where substantial progress has been made:
Introducing a scheduling model for the hardware used for benchmarking (SpacemiT-X60). LLVM had no scheduling model for the SpacemiT-X60, leading to pessimistic and inefficient code generation. We added a model tailored to the X60’s pipeline, allowing LLVM to schedule instructions better and improve performance. Longer term, a more generic in-order model could be introduced in LLVM to help other RISC-V targets that currently lack scheduling information, similar to what is already done for other targets, e.g., AArch64. This contribution alone brings up to a 15.7% improvement in the execution time of SPEC benchmarks on the SpacemiT-X60.
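To make the scheduling point concrete, here is a minimal C sketch (the function and values are ours, not taken from the benchmarks) of the kind of loop where an in-order pipeline model matters: without known load latencies the scheduler may leave each load right next to its use, and an in-order core like the X60 stalls until the load completes; with a model, LLVM can move independent loads earlier and overlap their latency with other work.

```c
#include <stddef.h>

/* Illustrative only: on an in-order core, keeping a load immediately before
 * the add that consumes it stalls the pipeline for the full load latency.
 * A scheduling model tells LLVM that latency, so it can hoist independent
 * loads (a[i], b[i], and, after unrolling, the next iteration's loads)
 * further ahead of their uses and hide the stalls. */
long sum_pairs(const long *a, const long *b, size_t n) {
    long acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] + b[i];   /* loads feed the adds; scheduling decides the distance */
    return acc;
}
```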
Improved Vectorization Efficiency. LLVM’s SLP vectorizer used to skip over entire basic blocks when calculating spill costs, leading to inaccurate estimates and suboptimal vectorization when function calls were present in the skipped blocks. We addressed this by improving the backward traversal to consider all relevant blocks, ensuring spill costs were properly accounted for; the final solution, contributed by the SLP Vectorizer maintainer, fixes the issue without impacting compile times, unlocking better vectorization decisions and performance. This contribution brings up to a 9.1% improvement in the execution time of SPEC benchmarks on the SpacemiT-X60.
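As a hedged illustration of the spill-cost issue (the code below is a made-up example, not from SPEC), consider values that are vectorizable but stay live across a call sitting in its own basic block: if the SLP vectorizer packs them into a vector register, that register typically has to be spilled and reloaded around the call, and a cost walk that skips the block containing the call underestimates the price of vectorizing.

```c
/* Illustrative only: r0..r3 are live across the call to external_work(),
 * which sits in its own basic block because of the branch. If the SLP
 * vectorizer packs r0..r3 into a vector register, that register must be
 * spilled and reloaded around the call; a backward spill-cost walk that
 * skips the block containing the call misses that cost entirely. */
void external_work(void);   /* hypothetical opaque function */

void accumulate(float *dst, const float *a, const float *b, int flag) {
    float r0 = a[0] * b[0];
    float r1 = a[1] * b[1];
    float r2 = a[2] * b[2];
    float r3 = a[3] * b[3];

    if (flag)
        external_work();     /* call in a separate block between defs and uses */

    dst[0] += r0;
    dst[1] += r1;
    dst[2] += r2;
    dst[3] += r3;
}
```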
Register Allocation with IPRA Support. We enabled Inter-Procedural Register Allocation (IPRA) in the RISC-V backend. IPRA reduces save/restore overhead across function calls by tracking which registers each function actually uses. In the RISC-V backend, supporting IPRA required implementing a hook to report callee-saved registers and prevent miscompilation. This contribution brings up to a 3.3% improvement in the execution time of SPEC benchmarks on the SpacemiT-X60.
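Below is a hedged sketch of the pattern IPRA helps with (the function names are invented for illustration): once the compiler knows which registers a small callee actually touches, the caller no longer has to assume every caller-saved register is clobbered by the call, so its own live values can stay in registers instead of being saved and restored around each call.

```c
/* Illustrative only: tiny_leaf() needs just a couple of registers. Without
 * IPRA, the caller must assume every caller-saved register is clobbered by
 * the call and arrange saves/restores (or keep values in memory) around it.
 * With IPRA, the register usage recorded while compiling tiny_leaf() lets
 * the caller keep sum, i, n, and data live in registers across the call. */
__attribute__((noinline))        /* GCC/Clang extension: keep the call from being inlined */
static int tiny_leaf(int x) {
    return x * 2 + 1;
}

int sum_transformed(const int *data, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += tiny_leaf(data[i]);
    return sum;
}
```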
An in-depth discussion of these contributions can be found at:
The Future of RISC-V in LLVM
This project significantly improved the performance of the RISC-V backend in LLVM through a combination of targeted optimizations, infrastructure improvements, and upstream contributions. We tackled key issues in vectorization, register allocation, and scheduling, demonstrating that careful backend tuning can yield substantial real-world benefits, especially on in-order cores like the SpacemiT-X60.
Future Work:
- Vector latency modeling: The current scheduling model lacks accurate latencies for vector instructions.
- Further scheduling model fine-tuning: This would impact the largest number of users and would align RISC-V with other targets in LLVM.
- Improve vectorization: The similar performance between scalar and vectorized code suggests we are not fully exploiting vectorization opportunities. Deeper analysis might uncover missed cases or necessary model tuning.
- Improvements to DAGCombine: After the work on improving vectorization efficiency, Philip Reames created issue 132787 with follow-up ideas.