Multi core processor for QR decomposition based on FPGA


  • Safaa S. Omran Electrical Engineering Technical College, Baghdad/Iraq
  • Ahmed K. Abdul-abbas Electrical Engineering Technical College, Baghdad/Iraq





Keywords: QR Decomposition, Gram-Schmidt, Givens Rotation, Multicore Processor, CORDIC Square Root.


A multicore 32-bit processor is designed in hardware to achieve low-latency, high-throughput QR decomposition (QRD) using two algorithms: Gram-Schmidt (GS) and Givens rotation (GR). The first core computes the orthogonal matrices with the Gram-Schmidt algorithm, and the second core computes the upper triangular matrices with the Givens rotation algorithm. The design achieves a throughput of 50M QRD/s for 4 × 4 matrices at a clock frequency of 200 MHz.
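The split described in the abstract can be illustrated with a minimal software sketch. It is not the authors' hardware design: it uses floating-point arithmetic rather than the processor's 32-bit datapath or a CORDIC square root, and the function names (`gram_schmidt_q`, `givens_r`, `matmul`) and the test matrix are hypothetical. One function obtains Q by (modified) Gram-Schmidt, the other obtains R by Givens rotations. The final sign normalization of R is an assumption added here so that the two independently computed factors multiply back to the input.

```python
import math

def gram_schmidt_q(a):
    """Q factor of a square, full-rank matrix via modified Gram-Schmidt."""
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]  # columns of a
    q_cols = []
    for v in cols:
        v = v[:]
        for q in q_cols:
            # Remove the component of v along each earlier orthonormal vector.
            proj = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - proj * qi for qi, vi in zip(q, v)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        q_cols.append([vi / norm for vi in v])
    return [[q_cols[j][i] for j in range(n)] for i in range(n)]

def givens_r(a):
    """R factor of a square matrix via Givens rotations."""
    n = len(a)
    r = [row[:] for row in a]
    for j in range(n):
        for i in range(n - 1, j, -1):
            x, y = r[j][j], r[i][j]
            h = math.hypot(x, y)
            if h == 0.0:
                continue  # entry is already zero, no rotation needed
            c, s = x / h, y / h
            for k in range(n):
                top, bot = r[j][k], r[i][k]
                r[j][k] = c * top + s * bot   # rotated pivot row
                r[i][k] = -s * top + c * bot  # r[i][j] becomes 0
    # Assumption of this sketch: force a non-negative diagonal so that R
    # matches the sign convention of the Gram-Schmidt Q above.
    for i in range(n):
        if r[i][i] < 0:
            r[i] = [-x for x in r[i]]
    return r

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical 4 x 4 test matrix (strictly diagonally dominant, so full rank).
A = [[5.0, 1.0, 0.0, 1.0],
     [1.0, 6.0, 2.0, 0.0],
     [0.0, 2.0, 7.0, 1.0],
     [1.0, 0.0, 1.0, 8.0]]

Q = gram_schmidt_q(A)   # role of the first core in the abstract
R = givens_r(A)         # role of the second core in the abstract
QR = matmul(Q, R)

# Throughput arithmetic from the abstract: clock rate / QRD rate.
cycles_per_qrd = 200e6 / 50e6
```

Because the two factors do not depend on each other, they can be computed concurrently, which is the property the two-core arrangement relies on. The last line reproduces the arithmetic behind the stated figures: 50M QRD/s at 200 MHz corresponds to one completed 4 × 4 decomposition every 4 clock cycles on average, which describes throughput and says nothing about the latency of a single decomposition.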




