Design and performance analysis of high throughput and low latency double precision floating point division on FPGA

Sushmitha V Gopal, Shivaputra


Modern processors exploit instruction-level parallelism through fully pipelined architectures, and existing processors use a variety of strategies to do so. The system proposed in this work implements a fully pipelined floating-point unit (FPU) on a field-programmable gate array (FPGA); compared with other strategies, the proposed architecture is optimized in terms of area, latency and throughput. Performing division in digital signal processing (DSP) poses numerous challenges. Because division is a complex operation and a main module in many larger computations, implementing it efficiently on FPGA is essential. In every clock cycle, the pipeline must be ready to accept the divider results and store them in memory. The double-precision floating-point (DPFP) divider is written in Verilog and implemented on a Virtex-5 FPGA, and it makes use of single-precision floating-point calculations. The proposed DPFP design uses a radix-4 multiplier in a lower-level module to generate partial products and perform additions. It has a latency of 23 clock cycles and an output rate of 21 clock cycles. Other implementations, such as highly pipelined designs, use a large number of adders/subtractors in some applications. The latency and output rate obtained are 25 and 9 clock cycles, respectively. The proposed design minimizes energy dissipation and offers several advantages, such as lower area usage and lower latency. In most floating-point applications, division is considered a low-frequency, high-latency operation. Nevertheless, processor designers are compelled to pay close attention to all aspects of floating-point computation because of the growing demand for high-quality graphics and the widespread use of performance benchmarks such as SPECmarks.
This work outlines the most widely known algorithms for floating-point division and the implementation options available for DSP applications. It also demonstrates how conventional floating-point applications can help the designer make decisions and weigh trade-offs, using a system-level analysis as a foundation.
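As a rough illustration of what a double-precision divider has to compute, the following Python sketch models the arithmetic in software: it splits each operand into sign, exponent and significand, divides the significands as integers, and rounds to nearest even. It is not the radix-4 architecture described above, and it says nothing about pipelining or cycle counts. The function names `unpack_double` and `fp_div` are hypothetical, and the sketch assumes normal, finite, nonzero operands with a normal result.

```python
import struct

FRAC_BITS = 52   # fraction field width of an IEEE 754 double
BIAS = 1023      # exponent bias of an IEEE 754 double


def unpack_double(x):
    """Split a double into (sign, biased exponent, fraction) bit fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exp = (bits >> FRAC_BITS) & 0x7FF
    frac = bits & ((1 << FRAC_BITS) - 1)
    return sign, exp, frac


def fp_div(a, b):
    """Software model of a / b for normal operands (illustrative assumption)."""
    sa, ea, fa = unpack_double(a)
    sb, eb, fb = unpack_double(b)
    if not (0 < ea < 0x7FF and 0 < eb < 0x7FF):
        raise ValueError("sketch handles normal, finite, nonzero operands only")

    # Restore the implicit leading 1 of each significand.
    ma = (1 << FRAC_BITS) | fa
    mb = (1 << FRAC_BITS) | fb

    sign = sa ^ sb
    exp = ea - eb + BIAS

    # Normalise so that the significand quotient lies in [1, 2).
    if ma < mb:
        ma <<= 1
        exp -= 1

    # 54-bit integer quotient: 53 significand bits plus one guard bit.
    q, rem = divmod(ma << (FRAC_BITS + 1), mb)
    mant, guard, sticky = q >> 1, q & 1, rem != 0

    # Round to nearest, ties to even.
    if guard and (sticky or (mant & 1)):
        mant += 1
        if mant == 1 << (FRAC_BITS + 1):  # rounding carried out of the significand
            mant >>= 1
            exp += 1

    if not 0 < exp < 0x7FF:
        raise ValueError("result is outside the normal range; not handled here")

    bits = (sign << 63) | (exp << FRAC_BITS) | (mant & ((1 << FRAC_BITS) - 1))
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

For the operands it accepts, the model can be checked against the host's own division, for example by comparing `fp_div(1.0, 3.0)` with `1.0 / 3.0`. A hardware divider would typically replace the single `divmod` step with an iterative or multiplier-based scheme spread over many clock cycles.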
