An Ultra-low-power Processor Architecture For High-performance Computing And Other Compute-intensive Applications
- Publisher: Hamad bin Khalifa University Press (HBKU Press)
- Source: Qatar Foundation Annual Research Conference Proceedings, Volume 2014, Issue 1, November 2014, ITPP0503
Abstract
The GRAPE-X processor is an experimental chip designed to achieve extremely high performance per watt. It was fabricated in TSMC's 28 nm technology and has achieved 30 Gflops/W. This figure is three times higher than that of the best GPGPU cards announced so far in the same 28 nm technology. Power consumption has become the main factor limiting performance improvement in HPC systems, because the so-called CMOS scaling law has broken down. Until the early 2000s, when silicon design rules were larger than 130 nm, shrinking the transistor size by a factor of two gave four times more transistors, twice the clock frequency, and half the supply voltage, all at the same power consumption. Thus, one could achieve an 8x performance improvement. With design rules below 130 nm, however, it has become difficult to reduce the supply voltage, so the same shrink yields only a factor-of-two performance improvement at the same power consumption. As a result, reducing the power consumption of the processor when it is fully in operation has become the most important issue. It has also become important to achieve high parallel efficiency on relatively small problems. Large parallel machines realize high peak performance, but in many cases that peak is of limited use, since it is achieved only for unrealistically large problems. For problems of practical interest, the efficiency of large-scale parallel machines is sometimes surprisingly low. To achieve high performance per watt and high parallel efficiency on small problems, we developed a highly streamlined processor architecture. To reduce communication overhead and improve parallel efficiency, we adopted a SIMD architecture. To reduce power consumption, we adopted a distributed-memory-on-chip architecture, in which each SIMD processor core has its own main memory.
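The 8x and 2x figures quoted for the scaling law can be checked with a short calculation. The sketch below is illustrative only and is not taken from the GRAPE-X design. It assumes the textbook dynamic-power model (switching energy proportional to C·V², with capacitance C scaling with the linear feature size) and ignores leakage. The function name `scaling_gain` and its parameters are hypothetical.

```python
def scaling_gain(shrink: float, voltage_scales: bool) -> float:
    """Performance gain at constant power after a linear shrink.

    Assumed model (not from the abstract): dynamic power only,
    energy per switching event ~ C * V^2, leakage ignored.
    """
    capacitance = 1.0 / shrink                        # C scales with feature size
    voltage = 1.0 / shrink if voltage_scales else 1.0
    energy_per_switch = capacitance * voltage ** 2
    # At fixed power, throughput (transistors x clock) scales as 1 / energy.
    return 1.0 / energy_per_switch


# Classic regime (design rules above 130 nm): supply voltage halves with the shrink.
classic = scaling_gain(2.0, voltage_scales=True)

# Regime below 130 nm: supply voltage stays roughly constant.
post = scaling_gain(2.0, voltage_scales=False)

# Cross-check of the classic case: 4x transistors, 2x clock, same total power.
transistors, clock = 4.0, 2.0
relative_power = transistors * clock * (0.5 * 0.5 ** 2)

print(classic, post, relative_power)
```

Under these assumptions the classic case reproduces the factor of eight (four times the transistors at twice the clock for the same power), and holding the voltage fixed leaves only the factor of two that comes from the reduced capacitance.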
Based on the GRAPE-X architecture, an exaflops (10^18 flops) system with a power consumption of less than 50 MW will be possible in the 2018-2019 time frame. For many real applications, including those in the cyber security area, that require 10 TB or less of memory, a parallel system based on the GRAPE-X architecture will provide both the highest parallel efficiency and the shortest time to solution.
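As a rough consistency check on the exaflops target, the power budget can be compared with the measured chip efficiency. This is a back-of-the-envelope sketch, not the authors' system design. It assumes the 30 Gflops/W chip figure holds at full system scale and ignores power for interconnect, cooling, and storage. The variable names are hypothetical.

```python
EXAFLOPS = 1e18               # target performance in flops (from the abstract)
POWER_BUDGET_W = 50e6         # 50 MW system budget (from the abstract)
GRAPE_X_FLOPS_PER_W = 30e9    # 30 Gflops/W measured on the chip (from the abstract)

# Efficiency the whole system must reach to stay within the budget.
required_flops_per_w = EXAFLOPS / POWER_BUDGET_W

# Power drawn by the processor chips alone at the measured efficiency
# (assumption: efficiency does not change when scaled to a full system).
chip_power_w = EXAFLOPS / GRAPE_X_FLOPS_PER_W

# Budget left over for everything other than the processor chips.
headroom_w = POWER_BUDGET_W - chip_power_w

print(f"required: {required_flops_per_w / 1e9:.0f} Gflops/W")
print(f"chip power: {chip_power_w / 1e6:.1f} MW")
print(f"headroom: {headroom_w / 1e6:.1f} MW")
```

Under these assumptions the system needs 20 Gflops/W overall. The processors alone would draw about 33 MW at the measured 30 Gflops/W, leaving roughly 17 MW for the rest of the machine.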