Domain-Specific Computing with FPGAs
The continued erosion of Moore’s Law means that performance gains can no longer be achieved with a simple “Scale Up”, i.e. using a CPU with a higher clock rate. Semiconductor devices already operate close to the physical limits of their materials. Designers can often circumvent this problem by running tasks in parallel on multiple CPUs, referred to as “Scale Out”. But even when a problem can be solved with more CPUs, this approach eventually runs into limits of its own as size, heat, power consumption and cost increase.
Using more of the same CPUs is not the only way to scale out, however. A growing number of hardware and software tools are being developed that run specific tasks far more efficiently. Cryptocurrency mining stands out as an example: large performance gains were made as hash computation moved from CPU to GPU, then to FPGA and finally to ASIC. Each step represented a higher degree of domain-specific computation, at the cost of less flexibility to change. Calculating hashes is now only economically viable with an ASIC designed to do one thing, a specific type of hash, very efficiently.
ASICs, however, despite their benefits, are time-consuming and costly to develop, and offer virtually no flexibility to make changes. The FPGA, being a programmable hardware circuit, is the nearest thing to an ASIC in terms of efficiency, but with the ability to change function at any time and at a much lower cost. Both kinds of device are described in the same domain-specific languages (VHDL or Verilog), which are used to produce the RTL circuit description. FPGAs can be programmed and reconfigured to provide a practically unlimited range of functions, limited only by the device's logic resources and the size of the RTL design.
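As a minimal illustration of RTL shared by both targets, the Verilog sketch below describes a simple clocked accumulator. The module and signal names are illustrative only; the same description could be synthesised onto an FPGA or taken through an ASIC standard-cell flow.

    // Minimal RTL sketch: a clocked accumulator.
    // The same description can target an FPGA or, with a different
    // back-end flow, an ASIC standard-cell library.
    module accumulator #(
        parameter WIDTH = 32
    ) (
        input  wire             clk,
        input  wire             rst,       // synchronous reset
        input  wire [WIDTH-1:0] din,       // value to add each cycle
        input  wire             din_valid,
        output reg  [WIDTH-1:0] acc        // running total
    );
        always @(posedge clk) begin
            if (rst)
                acc <= {WIDTH{1'b0}};
            else if (din_valid)
                acc <= acc + din;
        end
    endmodule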
Applications that benefit greatly from FPGAs are those that can exploit fixed-point optimized designs, bit-level operations and many small, distributed cache memories. The Xilinx UltraScale FPGA architecture provides a fabric that does exactly this, with DSP, LUT and BRAM blocks built on state-of-the-art, energy-efficient silicon from TSMC. According to Amazon Web Services (AWS EC2 F1), designers can in some cases expect up to 100x acceleration using an FPGA instead of a CPU.
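The kind of structure that maps well onto that fabric can be sketched in a few lines of Verilog. The hypothetical module below is a pipelined fixed-point multiply-accumulate; depending on the toolchain and constraints, synthesis will typically place the multiply-add in a DSP slice, with the wide registers in fabric flip-flops.

    // Sketch: pipelined fixed-point multiply-accumulate (Q1.15 inputs).
    // Synthesis tools will typically map the multiply-add onto a DSP slice.
    module fixed_point_mac (
        input  wire               clk,
        input  wire               clear,   // resets the accumulator
        input  wire               valid,   // sample strobe
        input  wire signed [15:0] a,       // Q1.15 coefficient
        input  wire signed [15:0] b,       // Q1.15 sample
        output reg  signed [39:0] acc      // wide accumulator, headroom for many terms
    );
        reg signed [31:0] product;         // Q2.30 product register (pipeline stage)
        reg               product_valid;

        always @(posedge clk) begin
            // Stage 1: register the product
            product       <= a * b;
            product_valid <= valid;

            // Stage 2: accumulate
            if (clear)
                acc <= 40'sd0;
            else if (product_valid)
                acc <= acc + product;
        end
    endmodule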
Some applications are particularly sensitive to latency, such as financial trading, video processing and network security, as well as applications where latency is multiplied in a feedback loop, such as real-time data analysis. FPGAs can offer great performance gains here because multiple steps of copying and moving data can be avoided, reducing latency, power and cost.
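One hedged sketch of this "process the data as it streams past" pattern is the hypothetical single-cycle filter stage below, which inspects and forwards each sample in the clock cycle it arrives rather than copying it to memory for a CPU to examine later. The threshold input and alert flag are invented for illustration.

    // Sketch: single-cycle streaming stage.
    // Each sample is inspected and forwarded in the same clock cycle it
    // arrives, so no buffer copies are made between arrival and decision.
    module stream_threshold #(
        parameter WIDTH = 16
    ) (
        input  wire             clk,
        input  wire             in_valid,
        input  wire [WIDTH-1:0] in_data,
        input  wire [WIDTH-1:0] threshold,  // e.g. a trading or alarm limit
        output reg              out_valid,
        output reg  [WIDTH-1:0] out_data,
        output reg              alert       // asserted when the limit is crossed
    );
        always @(posedge clk) begin
            out_valid <= in_valid;
            out_data  <= in_data;
            alert     <= in_valid && (in_data > threshold);
        end
    endmodule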
A domain-specific accelerator becomes very useful when it encapsulates behaviour and abstracts function in a fast, compact and energy-efficient way. Doing so with an FPGA makes it easier to offload entire protocols from software to an efficient hardware accelerator, while keeping the design flexible at the same time.
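As a hedged example of what protocol offload can look like at the RTL level, the sketch below parses a hypothetical byte-oriented frame format (start byte, length byte, payload) entirely in hardware, one byte per clock, with no software in the loop. The frame format and signal names are invented for illustration, not taken from any Chevin Technology IP.

    // Sketch: tiny protocol-offload state machine for a hypothetical frame
    // format: start byte 0xA5, then a length byte, then that many payload bytes.
    module frame_parser (
        input  wire       clk,
        input  wire       rst,
        input  wire       byte_valid,
        input  wire [7:0] byte_in,
        output reg        payload_valid,  // high while payload bytes are presented
        output reg  [7:0] payload_byte,
        output reg        frame_done      // pulses when a full frame has been received
    );
        localparam IDLE = 2'd0, LEN = 2'd1, PAYLOAD = 2'd2;
        reg [1:0] state;
        reg [7:0] remaining;

        always @(posedge clk) begin
            payload_valid <= 1'b0;
            frame_done    <= 1'b0;
            if (rst) begin
                state <= IDLE;
            end else if (byte_valid) begin
                case (state)
                    IDLE:    if (byte_in == 8'hA5) state <= LEN;
                    LEN:     begin
                                 remaining  <= byte_in;
                                 state      <= (byte_in == 8'd0) ? IDLE : PAYLOAD;
                                 frame_done <= (byte_in == 8'd0);
                             end
                    PAYLOAD: begin
                                 payload_byte  <= byte_in;
                                 payload_valid <= 1'b1;
                                 remaining     <= remaining - 1'b1;
                                 if (remaining == 8'd1) begin
                                     state      <= IDLE;
                                     frame_done <= 1'b1;
                                 end
                             end
                endcase
            end
        end
    endmodule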
Chevin Technology has developed accelerator IP that helps solve throughput, latency, power, size, heat and cost constraints in demanding science, engineering and business applications.