Design and Performance Comparison of 32-Bit RISC-V ALU Accelerators: From Combinational to Pipelined Architecture with Flag Support

13 May

Authors: Debika Rani Sahu, Tapas Kumar Patra, Debi Prasad Dash

Abstract: The demand for more efficient, higher-performing computing in embedded and edge systems has driven the development of application-specific hardware accelerators on FPGAs. This work presents an integrated two-stage pipelined Arithmetic Logic Unit (ALU) accelerator with an AXI4-Lite interface, designed for use in Zynq-based processing systems. Two design variants are explored: a baseline non-pipelined ALU and a pipelined ALU, both supporting Zero, Carry, Overflow, and Negative flags. The proposed accelerator is implemented on the Xilinx PYNQ-Z2 FPGA board, where the processing system communicates with the programmable logic through an AXI-based memory-mapped interface. A Python-based layer configures, drives, and verifies the hardware module, which facilitates prototype development and testing. By overlapping instruction execution stages, the pipelined architecture raises computational throughput while keeping design complexity manageable. Experimental evaluation shows that the pipelined design achieves an operating frequency of 150 MHz and a throughput of 150 MOPS, a 50% improvement over the non-pipelined version. The implementation incurs a modest resource penalty of 35% in LUT usage, and overall power consumption stays below 0.1 W. These results highlight the effectiveness of pipelining in enhancing ALU performance on FPGA platforms and confirm its suitability for high-performance embedded applications that need efficient hardware acceleration.
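
To illustrate what the Zero, Carry, Overflow, and Negative flags mean for a 32-bit ALU, the following is a minimal software reference model of the kind a Python-based verification layer could compare hardware results against. It is a sketch under stated assumptions, not the authors' implementation: the function name `alu_ref`, the operation names, the flag keys, and the choice to report Carry as "borrow" on subtraction are all hypothetical, since the abstract does not specify them.

```python
MASK32 = 0xFFFFFFFF  # 32-bit data width, as stated in the title


def alu_ref(op, a, b):
    """Hypothetical golden model: return (result, flags) for 32-bit operands.

    Assumed operation set: "add", "sub", "and", "or", "xor".
    Assumed convention: on "sub", Carry is set when a borrow occurs (a < b).
    """
    a &= MASK32
    b &= MASK32
    carry = False
    overflow = False

    if op == "add":
        full = a + b
        result = full & MASK32
        carry = full > MASK32  # carry out of bit 31
        # Signed overflow: operands share a sign and the result's sign differs.
        overflow = bool(((~(a ^ b) & (a ^ result)) >> 31) & 1)
    elif op == "sub":
        result = (a - b) & MASK32
        carry = a < b  # borrow (assumed convention)
        # Signed overflow: operands differ in sign and the result's sign
        # differs from the minuend's.
        overflow = bool((((a ^ b) & (a ^ result)) >> 31) & 1)
    elif op == "and":
        result = a & b
    elif op == "or":
        result = a | b
    elif op == "xor":
        result = a ^ b
    else:
        raise ValueError(f"unsupported operation: {op}")

    flags = {
        "Z": result == 0,             # Zero: every result bit is 0
        "C": carry,                   # Carry (or borrow on sub)
        "V": overflow,                # Overflow: signed result out of range
        "N": bool(result >> 31),      # Negative: bit 31 of the result
    }
    return result, flags
```

For example, `alu_ref("add", 0x7FFFFFFF, 1)` yields `0x80000000` with Overflow and Negative set and Carry clear, while `alu_ref("add", 0xFFFFFFFF, 1)` yields `0` with Zero and Carry set and Overflow clear.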

DOI: http://doi.org/10.5281/zenodo.20175302