

**International Journal of Engineering & Technology** 

Website: www.sciencepubco.com/index.php/IJET

Research Paper



# Efficient high throughput decoding architecture for non-binary LDPC codes

C. Arul Murugan<sup>1\*</sup>, B. Banuselvasaraswathy<sup>2</sup>, K. Gayathree<sup>3</sup>, M. Ishwarya Niranjana<sup>4</sup>

<sup>1</sup>Assistant Professor, Department of ETE, Karpagam College of Engineering, Coimbatore, India <sup>2, 3</sup>Assistant Professor, Department of ECE, Sri Krishna College of Technology, Coimbatore, India <sup>4</sup>Assistant Professor, Department of ECE, Pollachi Institute of Engineering and Technology, Pollachi, India.

\*Corresponding author E-mail: <u>murugan.carul@gmail.com</u>

### Abstract

This article presents an efficient trellis-based decoding architecture for non-binary Low-Density Parity-Check (LDPC) codes. A bidirectional recursion is embedded in the decoder alongside layered scheduling: the layered scheduling reduces the number of iterations compared to existing techniques, while the bidirectional recursion shortens the decoding latency. Together these raise the throughput and thus the efficiency of the system. In addition, a message compression technique is implemented to reduce the memory requirements and the area. A trellis-based decoder is used to reinforce the check node processing. The proposed decoder yields higher throughput than similar decoders presented in preceding works. The architecture was implemented using Cadence Virtuoso software and provides a throughput of about 39.21 Mb/s at a clock frequency of 190 MHz.

# 1. Introduction

In today's modern era [3], communication has entered day-to-day life in various forms. Communication is the exchange of information, in the form of messages, between a sender and a receiver. Various technologies have therefore been developed to extend long-range communication and automatic data processing equipment. Such designs need high throughput and effective data transmission with a minimum error rate. Consequently, many error control techniques have been introduced for error detection and correction, and LDPC (Low-Density Parity-Check) coding is one of them. LDPC codes are used to control errors and are well suited to decoder implementations that make substantial use of parallelism. In mobile communications, the channel decoder should support different code rates and provide automatic error correction. To design an efficient multimode decoder, the similarities among the different modes are identified, analyzed and implemented as reusable hardware blocks to increase the flexibility of the entire architecture. These features are incorporated into the fully parallel architecture adopted in multimode LDPC decoder designs. Moreover, trellis modulation is used for efficient transmission of data over band-limited channels and is widely used in applications that need high throughput and better error control. Hence, a trellis-dependent decoding architecture for non-binary LDPC codes was designed.

# 2. Related Works

In this article, non-binary LDPC codes are preferred over binary LDPC codes because of their higher coding gain. Binary LDPC codes fail to achieve near-capacity performance at small or medium code lengths. Hence, the decoding architecture is designed for non-binary LDPC codes. Yoo and Park [1] presented a low-power low-density parity-check convolutional code (LDPC-CC) decoder. This design combines several memory banks into a single memory bank in order to diminish the power consumption. Wang et al. [2] discussed new parallel interleaving techniques for the turbo decoder, using the quadratic permutation polynomial (QPP) interleaver for cross MAP (XMAP) parallel decoding and symbol-based serial MAP (SMAP). Davey and MacKay [6] introduced the q-ary Sum-Product Algorithm (QSPA), an extension of the binary Sum-Product Algorithm (SPA) to non-binary LDPC codes that operates in the probability domain. In that form, QSPA is difficult to implement because it is easily affected by quantization effects and needs complex multiplication operations. Wymeersch et al. [7] implemented QSPA in the log domain and named it Log-QSPA. In this algorithm, multiplication is replaced by addition, which in turn eliminates the normalization factor. In Log-QSPA, every check node is processed by a brute-force technique, so check node processing remains a tedious task when the check node degree is high. In [8], the authors used the Fast Fourier Transform (FFT) in the QSPA implementation, hence the name FFT-QSPA. Although this algorithm reduces the complexity, it still needs multiplication operations in the probability domain.

Copyright © 2018 Authors. This is an open access article distributed under the <u>Creative Commons Attribution License</u>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Song and Cruz [9] proposed a mixed-domain implementation of FFT-QSPA. This approach uses look-up tables for the exponential and logarithmic values. Its major drawback is that it is superior only when the look-up table is very small. To reduce complexity, the Min-Sum algorithm [10] and the Min-Max algorithm [11] were proposed. These algorithms reduce complexity and memory requirements, but they do not provide high throughput and suffer a loss in error-rate performance. In [12], QSPA algorithms are used for three serial non-binary LDPC decoders. In [13], an EMS decoder was designed for non-binary LDPC codes using the Min-Sum algorithm, mainly to address the memory problem and to minimize the number of decoding iterations. Lin et al. [14] presented a parallel architecture for the variable and check node processing units. In [15], a VLSI architecture was designed for a non-binary LDPC decoder; it exploits the symmetric properties and intrinsic shifting of QC-LDPC codes to ease routing and to reduce the memory area. In [16], a path construction approach was introduced to maximize processing efficiency with minimum complexity. Ueng et al. [17] discussed the efficiency of a permutation network integrated into the decoding architecture. Zhang et al. [18] introduced the iterative hard-reliability-based majority-logic decoding (IHRB-MLGD) algorithm, which achieves a substantial coding gain with little overhead. Chen et al. [19] demonstrated the iterative soft-reliability-based (ISRB) algorithm for non-binary LDPC codes, which results in good error performance and fast convergence. Controller analysis for non-linear systems has been reported in [20-29].

This paper is organized as follows: Section 3 describes the design of the trellis decoding architecture for LDPC codes, Section 4 presents the results and discussion, and Section 5 gives the conclusion.

# 3. Trellis Decoding Architecture

In this article, a trellis-based decoding architecture is designed to overcome the drawbacks of the existing Max-Log-QSPA decoder [5]. The existing methods use complex multiplication and are easily affected by quantization effects. In the designed architecture, multiplication is replaced by addition through ADD-PPN modeling for NBC (Non-Binary Code) operation. The architecture also includes message compression and decompression for area reduction, and the check node processing is reformulated with forward and backward recursions. Together these features yield high throughput and flexibility at a lower operating frequency, with fewer iterations and reduced complexity.



Fig. 1: Block diagram of trellis based decoding architecture

### 3.1. ADD-PPN Modeling

This section presents the ADD-PPN (Add-Permutation Polynomial Network) model of the decoding architecture for LDPC codes. The proposed ADD-PPN architecture consists of a permutation network, multiplexers, adders and flip-flops for efficient execution of the recursion steps.



Fig. 2: Block diagram of ADD-PPN Architecture

### 3.2. Permutation Polynomial Network (PPN)

PPN modeling handles the finite-field additions that arise in check node operations and allows the layered decoder to be realized efficiently. To maximize the throughput, the permutation network and the minimum value filter are combined into a trellis-based decoding architecture. Because the inputs to the adders are processed simultaneously, a permutation polynomial network is needed to shuffle the adder outputs so that they are passed on in a uniform order.
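
As a minimal sketch of why a permutation is enough to handle finite-field addition, assume the field is GF(2^m) with elements represented as m-bit integers, so that addition is a bitwise XOR. Adding a fixed element `a` to the symbol then only reorders the entries of a length-q message vector. The function name `permute_by_addition` and the GF(4) example values are illustrative assumptions and are not taken from the circuit in this paper.

```python
def permute_by_addition(message, a):
    """Reorder a length-q message for the symbol shift x -> x + a in GF(2^m).

    Assumption: field elements are the integers 0..q-1 and field addition
    is bitwise XOR. Entry s of the result holds the metric that entry
    (s XOR a) held in the input.
    """
    q = len(message)
    if q == 0 or q & (q - 1) != 0:
        raise ValueError("message length q must be a power of two")
    if not 0 <= a < q:
        raise ValueError("a must be a field element in 0..q-1")
    return [message[s ^ a] for s in range(q)]


# Hypothetical GF(4) message (log-domain metrics), shifted by the element 3.
msg = [0.0, -1.5, -3.0, -0.5]
shifted = permute_by_addition(msg, 3)   # [-0.5, -3.0, -1.5, 0.0]
```

Because XOR is its own inverse, applying the same shift twice restores the original order, which is one reason such a network needs no multipliers.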



Fig. 3: Permutation Polynomial Network

### 3.3. Adder

The adders in the decoding architecture are designed to reduce power consumption while maintaining full voltage swing at a reduced supply voltage, which gives an improved power-delay product and better noise immunity. The inputs are forwarded to the adder and then to the permutation network for further processing.



Fig. 4: Adder circuit for the permutation network

### 3.4. Flip-flop

Flip-flops are used as storage elements. Each flip-flop stores the output value from the multiplexer, and the value is read back when it is needed for further processing. The power consumption of a flip-flop is reduced by deactivating its clock independently whenever its value does not have to change.



Fig. 5: Flip Flop circuit coupled in permutation network

### 3.5. Multiplexer

Multiplexing is used to minimize the number of electrical connections required for each component, which reduces the complexity and increases the flexibility of the decoder. The driver signals are activated over a group of rows and columns at a time, and the multiplexer finally switches them to a single output.



Fig. 6: Multiplexer for the permutation network

### 3.6. Bidirectional recursion

Bidirectional recursion and layered scheduling are incorporated into the decoding architecture for maximum throughput. Layered scheduling reduces the number of iterations, while bidirectional scheduling, which runs the forward and backward recursions in parallel, reduces the number of clock cycles. Bidirectional scheduling also decreases the latency compared to unidirectional scheduling.

**Trellis-based Max-Add algorithm [4]**

#### Step 1 (initialization)

At the initial condition, $\varepsilon = 0$ and $t = 0$, and the decoder checks for the starting node from which to travel further.

#### Step 2 (forward recursion)

The node travels in the forward direction when $\varepsilon > t$ and the path metric satisfies $(d + 1) \geq t_1$.

#### Step 3 (backward recursion)

The node travels in the reverse direction when $\varepsilon < t$ and the path metric satisfies $(d + 1) < t_1$.

#### Step 4 (end of the process)

After completing step 3, the decoder returns to the initial node, checks for the next node, and repeats the same process.
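
The forward and backward recursions of this section can be illustrated with a small software model of a max-add check node update. This is a sketch under stated assumptions rather than the hardware described in this paper: symbols of GF(2^m) are taken to be integers with XOR as addition, all parity-check coefficients are assumed to be 1 (that is, the messages have already been permuted), and metrics are log-domain values where larger means more likely. The names `trellis_step` and `check_node_forward_backward` are hypothetical. The model runs the two recursions one after the other, whereas the architecture runs them in parallel.

```python
NEG_INF = float("-inf")


def trellis_step(state, msg):
    """One trellis stage: new[s] = max over a of state[s XOR a] + msg[a]."""
    q = len(msg)
    return [max(state[s ^ a] + msg[a] for a in range(q)) for s in range(q)]


def check_node_forward_backward(inputs):
    """Max-add check node update using forward and backward recursions.

    inputs[i][a] is the metric that variable i equals symbol a.
    Returns extrinsic[i][a]: the best total metric of all other variables
    given that variable i equals a and the XOR of all symbols is zero.
    """
    n, q = len(inputs), len(inputs[0])
    start = [0.0] + [NEG_INF] * (q - 1)      # an empty partial sum equals 0

    forward = [start]                        # forward[i] covers x_0..x_{i-1}
    for msg in inputs:
        forward.append(trellis_step(forward[-1], msg))

    backward = [start]
    for msg in reversed(inputs):
        backward.append(trellis_step(backward[-1], msg))
    backward.reverse()                       # backward[i] covers x_i..x_{n-1}

    # Combine the prefix sum s and the suffix sum (s XOR a) around variable i.
    return [
        [max(forward[i][s] + backward[i + 1][s ^ a] for s in range(q))
         for a in range(q)]
        for i in range(n)
    ]


# Hypothetical GF(4) check node with three connected variables.
example_inputs = [
    [0.0, -1.0, -2.5, -0.5],
    [-0.5, 0.0, -3.0, -1.0],
    [-2.0, -0.5, 0.0, -1.5],
]
extrinsic = check_node_forward_backward(example_inputs)
```

Each recursion visits every variable once, so all extrinsic messages are obtained from two passes instead of one brute-force search per variable.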

### 3.7. Message Compression and Decompression Unit

The decoding architecture uses two register memory banks to store the a posteriori probability (APP) messages and the check-to-variable messages. Although these register banks are small in capacity, they occupy a large area. Message compression and decompression are therefore introduced to lower the area of the decoder. The technique also allows the memory banks to store the APP and check-to-variable messages along with their corresponding log-likelihood ratios (LLRs).
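
The paper does not spell out the compression scheme. One common approach in non-binary decoders, assumed here purely for illustration, is to keep only the most reliable (symbol, LLR) pairs of each length-q message and to fill the discarded entries with a default value on decompression. The names `compress`, `decompress` and `n_keep`, and the convention that a larger value means a more likely symbol, are assumptions of this sketch.

```python
def compress(message, n_keep):
    """Keep the n_keep largest entries as (symbol, value) pairs.

    Returns the kept pairs, a floor value standing in for every dropped
    entry (here the smallest kept value), and the original length q.
    """
    if not 1 <= n_keep <= len(message):
        raise ValueError("n_keep must be between 1 and len(message)")
    order = sorted(range(len(message)), key=lambda s: message[s], reverse=True)
    kept = [(s, message[s]) for s in order[:n_keep]]
    floor = kept[-1][1]
    return kept, floor, len(message)


def decompress(kept, floor, q):
    """Rebuild a length-q message, using floor for entries not stored."""
    out = [floor] * q
    for symbol, value in kept:
        out[symbol] = value
    return out


# Hypothetical GF(8) message: store 3 of 8 entries instead of all 8.
msg = [0.0, -4.0, -1.0, -6.0, -0.5, -7.0, -3.0, -9.0]
kept, floor, q = compress(msg, 3)
restored = decompress(kept, floor, q)
```

The round trip is lossy by design: the retained entries are exact, while the dropped ones are replaced by the floor value, trading some accuracy for fewer stored words.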



Fig. 7: Message compression and decompression unit

### 3.8. Trellis Based CNU Unit

Figure 8 shows the trellis-based check node unit (CNU). It consists of a forward unit, a backward unit and a pair of LLR units, named LLR unit 1 and LLR unit 2. The forward unit consists of multiplexers, filters, an ADD-PPN node and a forward memory unit. The multiplexer selects the initial input values and passes them to the permutation network. The NBC network reorders and shuffles the received messages and forwards them back to the multiplexer. The filter unit consists of an arbiter unit and a serial unit: the serial unit calculates the maximum threshold voltage, and the arbiter unit lists the values that exceed it. The results of the completed forward recursion are stored in the forward memory unit. The backward recursion follows the same procedure as the forward recursion. LLR unit 1 consists of a filter and an NBC unit. The estimated information from the forward memory unit is directed to LLR unit 1. Similarly, the backward computation information from the backward memory is forwarded to LLR unit 2 to obtain the check-to-variable message. The forward and backward computation results are transferred successively, which reduces the number of iterations and the number of clock cycles.



Fig. 8: Trellis check node unit decoding architecture with permutation network

# 4. Results and Simulation

The implementation result for the trellis decoder is shown in Figure 9. The architecture is implemented using the Cadence Virtuoso tool. It consists of a permutation network, a message compression and decompression unit, forward and backward memory units and a filter circuit.



Fig. 9: Trellis based decoding architecture



Fig. 10: Simulation Results of LDPC Code Generated



Fig. 11: Simulation Results of Bidirectional recursion

Figure 12 shows the transient response of the decoding architecture, which is analysed to determine how the system responds to a change from equilibrium.



Fig. 12: Transient response of trellis decoding architecture

Figure 13 depicts the DC response of the decoding architecture, which is used to determine the operation of the system. Here the input given ranges from 0 to 5 V and the output obtained is 2.375 V.



Fig. 13: DC response of trellis decoding architecture

The throughput of decoding architecture can be calculated as follows:

$$\text{Throughput} = \frac{N \times \log_2(q) \times f_{clk}}{N_{it} \times N_c}$$

where $N_c$ is the number of clock cycles per iteration, $f_{clk}$ is the clock frequency and $N_{it}$ is the number of iterations. The designed decoder achieves a throughput of 39.21 Mb/s.
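
The throughput expression can be evaluated with a few lines of code. In this sketch, $N$ is assumed to be the code length in symbols and $q$ the order of the Galois field, since the text does not define them. The parameter values in the example are hypothetical and are not the ones behind the reported 39.21 Mb/s, because the paper does not list $N$, $q$ or $N_c$ for the designed decoder.

```python
import math


def throughput_bps(n_symbols, q, f_clk_hz, n_iterations, n_cycles):
    """Throughput = N * log2(q) * f_clk / (N_it * N_c), in bits per second."""
    if min(n_symbols, f_clk_hz, n_iterations, n_cycles) <= 0 or q < 2:
        raise ValueError("all parameters must be positive and q must be >= 2")
    return n_symbols * math.log2(q) * f_clk_hz / (n_iterations * n_cycles)


# Hypothetical values: 100 symbols over GF(4), a 100 MHz clock,
# 2 iterations and 1000 clock cycles per iteration.
example = throughput_bps(100, 4, 100e6, 2, 1000)
print(f"{example / 1e6:.2f} Mb/s")   # prints "10.00 Mb/s"
```

The formula makes the trade-off explicit: halving either the iteration count or the cycles per iteration doubles the throughput at a fixed clock frequency.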

| Algorithm         | Frequency (MHz) | Iterations | Throughput (Mb/s) |
|-------------------|-----------------|------------|-------------------|
| Selective input   | 260             | 15         | 8.84              |
| Min-Max           | 260             | 15         | 8.84              |
| Max log QSPA      | 250             | 5          | 27.44             |
| Trellis based     | 100             | 2          | 20.21             |
| Max-add algorithm | 190             | 2          | 39.21             |

Table 1: Comparison of different algorithms

# 5. Conclusion

In this work, a trellis-based decoding architecture with efficient check node processing is presented. The decoder incorporates parallel forward and backward recursions for efficient processing. Layered scheduling reduces the number of iterations and bidirectional recursion reduces the number of clock cycles compared to the existing method, so the throughput and the efficiency of the system are improved. In addition, compression and decompression techniques are embedded into the architecture to shrink the area and memory size. The proposed trellis decoding architecture for LDPC codes achieves the highest throughput among the compared decoders and better error-rate performance than several existing decoders. In future work, decoding architectures with stronger error control techniques will be designed and implemented.

# References

- [1] Injae Yoo and In-Cheol Park, "Low-Power LDPC-CC Decoding Architecture Based on the Integration of Memory Banks," IEEE Transactions on Circuits and Systems II, Volume: 64, Issue: 9, pp 1057 – 1061, Sept. 2017.

- [2] Jian Wang, Kangli Zhang and Harald Kröll, "Design of QPP Interleavers for the Parallel Turbo Decoding Architecture," IEEE Transactions on Circuits and Systems I: Regular Papers, Volume: 63, Issue: 2, pp. 288-299, Feb. 2016.
- [3] B. Banuselvasaraswathy, "Trellis based decoding architecture for non-binary LDPC codes using modified Fano algorithm to achieve high throughput", International Journal of Advanced Information Science and Technology (IJAIST), Vol.23, No.23, pp 383-389, March 2014.
- [4] B. Banuselvasaraswathy, "A New Enhanced Trellis Based Decoding Architecture for Punctured Codes using Modified Max Product Algorithm", International Journal of Advanced Information Science and Technology (IJAIST), Vol.18, No.18, pp 36-43,October 2013.
- [5] Yeong-Luh Ueng, Kuo-Hsuan Liao, Hsueh-Chih Chou, and Chung-Jay Yang, "A High-Throughput Trellis-Based Layered Decoding Architecture for Non-Binary LDPC Codes Using Max-Log-QSPA", IEEE Transactions on Signal Processing, Vol.61, No.11, pp 2940 – 2950, June, 2013.
- [6] M. C. Davey and D. J. C. MacKay, "Low-density parity check codes over GF (q)," IEEE Commun. Lett, vol. 2, no. 6, pp. 165– 167, Jun. 1998.
- [7] H. Wymeersch, H. Steendam and M. Moeneclaey, "Log-domain decoding of LDPC codes over GF(q)," in Proc. IEEE Int. Conf. Commun. (ICC), Paris, France, Jun. 20–24, 2004, pp. 772–77
- [8] L. Barnault and D. Declercq, "Fast decoding algorithm for LDPC over GF(2<sup>n</sup>)," in Proc. IEEE Inf. Theory Workshop (ITW), Paris, France, Mar. 31–Apr. 4, 2003, pp. 70–73.
- [9] H. Song and J. R. Cruz, "Reduced complexity decoding of Q-ary LDPC codes for magnetic recording," IEEE Trans. Magn., vol. 39, no. 2, pp. 1081–1087, Mar. 2003.
- [10] D. Declercq and M. Fossorier, "Decoding algorithms for nonbinary LDPC codes over GF (q)," IEEE Trans. Commun., vol. 55, no. 4, pp. 633–643, Apr. 2007.
- [11] V. Savin, "Min-max decoding for non-binary LDPC codes," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Toronto, ON, Canada, Jul. 6–11, 2008, pp. 960–964.
- [12] A. C. Spagnol, E. Popovici, and W. Marnane, "Hardware implementation of LDPC decoders," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 12, pp. 2609–2620, Dec. 2009.
- [13] A. Voicila, D. Declercq, F. Verdier, M. Fossorier, and P. Urard, "Low complexity low-memory EMS algorithm for non-binary LDPC codes," in Proc. IEEE Int. Conf. Commun., Jun. 2007, pp. 671–676.
- [14] J. Lin, J. Sha, Z. Wang, and L. Li, "Efficient decoder design for nonbinary quasicyclic LDPC codes," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 5, pp. 1071–1082, May 2010.
- [15] C. Zhang and K. K. Parhi, "A network-efficient nonbinary QC-LDPC decoder architecture," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 6, pp. 1359–1371, Jun. 2012.
- [16] X. Zhang and F. Cai, "Reduced-complexity decoder architecture for non-binary LDPC codes," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 7, pp. 1229–1238, Jul. 2011.
- [17] Y.-L. Ueng, C.-Y. Leong, C.-J. Yang, C.-C. Cheng, K.-H. Liao, and S.-W. Chen, "An efficient layered decoding architecture for non-binary QC-LDPC codes," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, pp. 385–398, Feb. 2012.
- [18] X. Zhang, F. Cai, and S. Lin, "Low-complexity reliability-based message passing decoder architectures for non-binary LDPC codes," IEEE Trans. VLSI Syst., vol. 20, no. 11, pp. 1938–1950, Nov. 2012.
- [19] C.-Y. Chen, Q. Huang, C.-C. Chao and S. Lin, "Two low-complexity reliability-based message-passing algorithms for decoding non-binary LDPC codes," IEEE Trans. Commun., vol. 58, no. 11, pp. 3140–3147, Nov. 2010.
- [20] R. Kalaivani, K. Ramash Kumar, S. Jeevananthan, "Implementation of VSBSMC plus PDIC for Fundamental Positive Output Super Lift-Luo Converter," Journal of Electrical Engineering, Vol. 16, Edition: 4, 2016, pp. 243-258.
- [21] K. Ramash Kumar,"Implementation of Sliding Mode Controller plus Proportional Integral Controller for Negative Output Elementary Boost Converter," Alexandria Engineering Journal (Elsevier), 2016, Vol. 55, No. 2, pp. 1429-1445.
- [22] P. Sivakumar, V. Rajasekaran, K. Ramash Kumar, "Investigation of Intelligent Controllers for Variable Speed PFC Buck-Boost Rectifier Fed BLDC Motor Drive," Journal of Electrical Engineering (Romania), Vol.17, No.4, 2017, pp. 459-471.

- [23] K. Ramash Kumar, D.Kalyankumar, DR.V.Kirbakaran" An Hybrid Multi level Inverter Based DSTATCOM Control, Majlesi Journal of Electrical Engineering, Vol. 5. No. 2, pp. 17-22, June 2011, ISSN: 0000-0388.
- [24] K. Ramash Kumar, S. Jeevananthan, "A Sliding Mode Control for Positive Output Elementary Luo Converter," Journal of Electrical Engineering, Volume 10/4, December 2010, pp. 115-127.
- [25] K. Ramash Kumar, Dr.S. Jeevananthan," Design of a Hybrid Posicast Control for a DC-DC Boost Converter Operated in Continuous Conduction Mode" (IEEE-conference PROCEEDINGS OF ICETECT 2011), pp-240-248, 978-1-4244-7925-2/11.
- [26] K. Ramash Kumar, Dr. S. Jeevananthan," Design of Sliding Mode Control for Negative Output Elementary Super Lift Luo Converter Operated in Continuous Conduction Mode", (IEEE conference Proceeding of ICCCCT-2010), pp. 138-148, 978-1-4244-7768-5/10.
- [27] K. Ramash Kumar, S. Jeevananthan, S. Ramamurthy" Improved Performance of the Positive Output Elementary Split Inductor-Type Boost Converter using Sliding Mode Controller plus Fuzzy Logic Controller, WSEAS TRANSACTIONS on SYSTEMS and CONTROL, Volume 9, 2014, pp. 215-228.
- [28] N. Arunkumar, T.S. Sivakumaran, K. Ramash Kumar, S. Saranya, "Reduced Order Linear Quadratic Regulator plus Proportional Double Integral Based Controller for a Positive Output Elementary Super Lift Luo-Converter," JOURNAL OF THEORETICAL AND APPLIED INFORMATION TECHNOLOGY, July 2014. Vol. 65 No.3, pp. 890-901.
- [29] Arunkumar, T.S. Sivakumaran, K. Ramash Kumar, "Improved Performance of Linear Quadratic Regulator plus Fuzzy Logic Controller for Positive Output Super Lift Luo-Converter," Journal of Electrical Engineering, Vol. 16, Edition:3, 2016, pp. 397-408.
- [30] S.V.Manikanthan and K.Baskaran "Low Cost VLSI Design Implementation of Sorting Network for ACSFD in Wireless Sensor Network", CiiT International Journal of Programmable Device Circuits and Systems, Print: ISSN 0974 – 973X & Online: ISSN 0974 – 9624, Issue : November 2011, PDCS112011008.
- [31] S.V.Manikanthan and V.Rama"Optimal Performance Of Key Predistribution Protocol In Wireless Sensor Networks" International Innovative Research Journal of Engineering and Technology, ISSN NO: 2456-1983, Vol-2, Issue – Special – March 2017.
- [32] T. Padmapriya and V.Saminadan, "Handoff Decision for Multiuser Multiclass Traffic in MIMO-LTE-A Networks", 2nd International Conference on Intelligent Computing, Communication & Convergence (ICCC-2016) – Elsevier -PROCEDIA OF COMPUTER SCIENCE, vol. 92, pp: 410-417, August 2016.
- [33] P. SIVA SANKAR, "A Secure and Fast Authentication implementation between the Entities using Trust Aware Algorithm", International Innovative Research Journal of Engineering and Technology. September 2016 Volume 2 Issue No: 1. 34-40.
- [34] R. Kalaivani, K. Ramash Kumar, S. Jeevananthan, "Implementation of VSBSMC plus PDIC for Fundamental Positive Output Super Lift-Luo Converter," Journal of Electrical Engineering, Vol. 16, Edition: 4, 2016, pp. 243-258.