

**International Journal of Engineering & Technology** 

Website: www.sciencepubco.com/index.php/IJET doi: 10.14419/ijet.v7i1.8730 Research paper



# Low-power, low-latency transceiver design using d-TGMS flip-flop for on-chip interconnects

U. Saravanakumar<sup>1</sup>\*, P. Suresh<sup>2</sup>, S. P. Vimal<sup>3</sup>

<sup>1</sup> Department of ECE, Veltech Dr.RR & Dr.SR University, Chennai – 62, India
<sup>2</sup> Department of ECE, Sri Ramakrishna Engineering College, Coimbatore – 22, India

\*Corresponding author E-mail: saran.usk@gmail.com

### Abstract

Routers in a Network on Chip (NoC) transmit data among the Processing Elements (PEs) over the transmission links between them. Traditionally, data transmission between the PEs of a NoC is carried out over a parallel bus, which consumes more power, leads to complex routing strategies and occupies more area. Instead of a parallel bus, serialisers and deserialisers can be used for serial data transmission, which consumes much less power and area than the traditional method. To implement a serialiser-deserialiser in the router transceiver for on-chip communication, a three-level encoding technique is used in this design. It eliminates the power-hungry blocks of earlier works, such as Phase Locked Loops, Feed Forward Equalizers, Decision Feedback Equalizers and the repeaters along the transmission line. In this paper, a low-power transceiver is proposed using modified C<sup>2</sup>MOS flip-flop and Dynamic TGMS flip-flop circuits in order to minimize the delay. The proposed transceiver achieves a power reduction of 35.683% and a delay reduction of 44.71% compared with a transceiver based on NAND-gate D flip-flops.

Keywords: Network on Chip; Serialiser-Deserialiser; On-Chip Interconnects; D-TGMS; Low Power.

## 1. Introduction

Technology scaling has now reached the nanometer regime, which makes it possible to integrate many cores on a single chip, called a System on Chip (SoC). Most current SoCs use a traditional system bus to connect their functional units. Later, a new communication paradigm, the NoC, was introduced, and it has been the subject of much research. Communication in a NoC is done by passing data from one router to another over a conventional parallel bus across long on-chip distances. However, this conventional bus is no longer suitable for multiprocessor SoCs (MPSoCs) in terms of power, area and reliability. Technology scaling reduces gate delay, which leads to faster functional units, but it does not improve the interconnects, which puts a lot of pressure on wiring complexity and occupied area. Timing errors due to jitter and skew on the parallel bus make receiver synchronization very hard and limit the bandwidth. Other factors such as crosstalk, noise and coupling from adjacent lines also limit the bandwidth [1], [2], [3].

One way to overcome these disadvantages is to replace the parallel on-chip bus with serial transmission links between the PEs through the routers. At low frequencies, the transmission links between the routers behave like RC interconnects; their performance can be improved by inserting repeaters, but at the cost of larger area and higher power. To reduce the area and power, serialization has been introduced in NoCs with the Serializer-Deserializer (SerDes) transceiver. Serialization allows data to be transmitted among the PEs at high frequencies, where the link benefits from the characteristics of a transmission line [3], [4].

Several coding techniques can be used on the serial links between the transceivers at the PEs of a NoC to improve the data rate and to eliminate power-hungry blocks and area. In [17], Manchester coding along with a resistive termination scheme was implemented to reduce the voltage swing on the transmission line and to improve the data rate. In [18], PAM-4 and Non-Return to Zero (NRZ) coding schemes are implemented at the transmitter and receiver sides of the transmission links for a higher data rate. The PAM-4 transmitter adopts a feed-forward equalizer (FFE) with a pre-distortion driver, and the PAM-4 receiver employs linear and decision-feedback equalization together with a purely linear CDR. The NRZ transmitter uses a phase aligner before the last multiplexing stage to dynamically align the data and clock, and the NRZ design uses its own method to extract the clock signal and data [18]. In [19], a 3-level coding technique was used for the same purpose, but the circuit consumed more power due to the current paths in the voltage divider segment.

In this paper, the data and the clock signal are multiplexed using the three-level coding technique shown in Figure 1 [1], which helps to reduce the power and the routing resources of SerDes transmission links. The main advantage of this coding technique is that the clock signal can be extracted from the received data at the receiver side with simple circuitry. Additionally, this coding technique maintains the signal DC level at half of the power supply, independent of the data pattern.
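The data-independent DC level can be illustrated with a small behavioural model. This is only a sketch under assumptions that are not taken from the paper: each bit is assumed to occupy four equal time slots, a '1' is assumed to be an excursion to the high level followed by an excursion to the low level (and a '0' the reverse), and the names `encode_bit`, `encode` and `dc_level` are hypothetical. It is not the circuit of Figure 1.

```python
# Behavioural sketch of a three-level code (assumed symbol shapes, not the paper's circuit).
HIGH, MID, LOW = 1.0, 0.5, 0.0  # line levels as a fraction of the supply voltage


def encode_bit(bit):
    """Assumed mapping: '1' -> high pulse then low pulse, '0' -> low pulse then high pulse."""
    first, second = (HIGH, LOW) if bit else (LOW, HIGH)
    # Four equal slots per bit; the return to MID separates the two pulses.
    return [first, MID, second, MID]


def encode(bits):
    """Concatenate the per-bit symbols into one line waveform."""
    wave = []
    for bit in bits:
        wave.extend(encode_bit(bit))
    return wave


def dc_level(wave):
    """Time average of the waveform (all slots have the same duration)."""
    return sum(wave) / len(wave)


# Every symbol contains one high and one low excursion, so the average
# stays at half the supply whatever the data pattern is.
print(dc_level(encode([0] * 8)))                    # 0.5
print(dc_level(encode([1] * 8)))                    # 0.5
print(dc_level(encode([1, 0, 1, 1, 0, 0, 1, 0])))   # 0.5
```

Because each assumed symbol carries both excursions, the receiver also sees a transition pattern in every bit period, which is what makes clock extraction from the data possible.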

This approach also removes the need for equalization circuits and for sending the clock signal over an extra wire or recovering it with a Clock Data Recovery (CDR) circuit or a Phase Locked Loop (PLL) at the receiver side. As a result, this coding technique reduces area, power consumption and wire usage compared to conventional SerDes designs [5], [6], [7]. The three-level encoding technique is also tolerant of jitter accumulated during signal transmission because the clock signal is extracted from the data: any timing error in the received signal is reflected in the extracted clock signal, so the data is still sampled correctly [8]. With these advantages in mind, this paper presents several implementations of transceivers for on-chip communication and analyses them. Recently, designers have shown interest in optical interconnects instead of metal wires for on-chip data transmission because of their efficacy, but here we consider only metal interconnects [9], [10], [11].



The rest of this paper is organized as follows. The SerDes architecture, including the transmitter and receiver, is presented in Section 2. The transmission line and signaling technique are presented in Section 3. The simulation results are discussed in Section 4, and Section 5 presents the conclusions.



Fig. 2: Block Diagram of SerDes Transceiver.

## 2. SerDes transceiver

The block diagram of the SerDes transceiver is shown in Figure 2. The transceiver for on-chip communication consists of three modules: a transmitter, a receiver and a metal wire between them, which serves as the transmission medium for exchanging information among the PEs in a chip. The transmitter serializes 8-bit and 16-bit parallel input data and generates the corresponding differential three-level code to drive the transmission line between the PEs. In this work, the frequency of the input signal to the transceiver is 1.5 GHz.
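As a rough illustration of the 8-bit case, the sketch below models serialisation as a tree of 2:1 multiplexers. It assumes that each mux uses both phases of its clock, so every stage doubles the per-lane bit rate, and it uses the stage clocks of 1.5 GHz, 3 GHz and 6 GHz given in the Transmitter subsection. The function names and the bit ordering (bit 0 first) are assumptions made for this example, not details from the paper.

```python
# Behavioural sketch of an 8:1 serialiser built from three 2:1 mux stages (assumed structure).
WORD_RATE_GHZ = 1.5  # parallel input rate stated in the text
WORD_BITS = 8


def interleave(a, b):
    """2:1 mux using both clock phases: emit a[i], then b[i], in each clock period."""
    out = []
    for x, y in zip(a, b):
        out += [x, y]
    return out


def serialise(lanes):
    """Merge lanes pairwise, stage by stage, until a single serial lane remains."""
    if len(lanes) == 1:
        return lanes[0]
    # Splitting into even and odd lanes keeps the output in bit order 0, 1, 2, ...
    return interleave(serialise(lanes[0::2]), serialise(lanes[1::2]))


def serialise_words(words):
    """Turn a list of 8-bit words (lists of 0/1) into one serial bit stream."""
    lanes = [[word[k] for word in words] for k in range(WORD_BITS)]
    return serialise(lanes)


def stage_rates(word_rate_ghz=WORD_RATE_GHZ, stages=3):
    """(stage clock in GHz, per-lane output rate in Gb/s) for each mux stage."""
    return [(word_rate_ghz * 2 ** s, word_rate_ghz * 2 ** (s + 1)) for s in range(stages)]


print(serialise_words([[1, 0, 1, 1, 0, 0, 1, 0]]))  # [1, 0, 1, 1, 0, 0, 1, 0]
print(stage_rates())  # [(1.5, 3.0), (3.0, 6.0), (6.0, 12.0)]
```

Under these assumptions the last stage delivers 8 × 1.5 GHz = 12 Gb/s on a single lane, which matches the 12 GHz clock recovered at the receiver.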



Fig. 3: A) Serialiser; B) Three-Level Encoder.

On the other side, the receiver receives the transmitted coded signal over the on-chip communication wires, and the phase detector extracts the clock signal and the serialized data. The clock and the extracted data are then fed to the deserialiser, which recovers the transmitted 8-bit and 16-bit parallel data.

#### a) Transmitter

The transmitter consists of a serialiser and a three-level encoder, with a 24 GHz clock produced by a ring oscillator. The serialiser can be implemented in many ways; in this work it is built from Double Edge Triggered Flip-Flops (DETFFs). The architecture of the serialiser is shown in Figure 3 (a). The DETFF block is realized by combining a flip-flop, a latch and a 2:1 mux. The stages of the serialiser are clocked at three different frequencies: 1.5 GHz, 3 GHz and 6 GHz. The serialised data is then passed to the three-level encoder, whose architecture is shown in Figure 3 (b). The driver produces the signal data XOR clock, which is sent to the receiver. The encoder needs a 12 GHz clock and a 24 GHz clock. The D flip-flop can be designed using NAND gates or using C<sup>2</sup>MOS [13]. To reduce the power consumption of the transceiver, a modified C<sup>2</sup>MOS (mC<sup>2</sup>MOS) flip-flop is used, and to reduce the power further a Dynamic TGMS (D-TGMS) flip-flop is used.

#### b) Receiver

The receiver consists of a skewed inverter, a phase detector and a deserialiser. The phase detector retrieves the data and the clock from the received signal; it is shown in Figure 4 (a). The phase detector works as a three-state machine, starting with both QA and QB set to '1', which represents the reset state. When a '0' pulse is received on A, the phase detector resets QA and then waits for a pulse on B to set it back to '1'. The same is done with QB when the phase detector receives a pulse on B followed by a pulse on A. An SR latch sets the output using QA and resets it using QB. An OR gate is used to extract the 12 GHz output clock. The deserialiser is simply a shift register that converts the serial data stream to 8-bit parallel data at a clock frequency of 1.5 GHz.
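The three-state behaviour and the shift-register deserialiser can be sketched in a few lines. The model below is a simplification built on assumptions that are not in the paper: the received waveform is given as symbolic levels `"H"`, `"M"` and `"L"`, a high excursion is treated as a pulse on A and a low excursion as a pulse on B, and a pulse on A followed by a pulse on B is decoded as '1'. The names `phase_detect` and `deserialise` are hypothetical.

```python
# Behavioural sketch of the phase detector and deserialiser (assumed pulse-to-bit mapping).

def phase_detect(wave):
    """Three-state machine: IDLE (QA = QB = 1), SAW_A (QA reset), SAW_B (QB reset)."""
    state, bits = "IDLE", []
    for level in wave:
        if level == "M":
            continue  # mid level: no pulse on A or B
        pulse = "A" if level == "H" else "B"
        if state == "IDLE":
            state = "SAW_" + pulse
        elif state == "SAW_A" and pulse == "B":
            bits.append(1)  # A then B: output latch set
            state = "IDLE"
        elif state == "SAW_B" and pulse == "A":
            bits.append(0)  # B then A: output latch reset
            state = "IDLE"
        # A repeated pulse of the same kind is simply ignored in this sketch.
    return bits


def deserialise(bits, width=8):
    """Shift register: collect recovered bits and emit one word every `width` bits."""
    words, register = [], []
    for bit in bits:
        register.append(bit)
        if len(register) == width:
            words.append(register)
            register = []
    return words


# Build a test waveform with the same assumed symbol shapes, then decode it.
wave = []
for bit in [1, 0, 1, 1, 0, 0, 1, 0]:
    wave += ["H", "M", "L", "M"] if bit else ["L", "M", "H", "M"]
print(deserialise(phase_detect(wave)))  # [[1, 0, 1, 1, 0, 0, 1, 0]]
```

Each decoded bit corresponds to one return to the reset state, so one recovered clock event is produced per bit; with a 12 Gb/s stream, a word of eight bits is completed at 1.5 GHz.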



Fig. 4: A) Phase Detector; B) mC<sup>2</sup>MOS Flip Flop; C) D-TGMS.

#### c) mC<sup>2</sup>MOS flip flop

In this paper, the mC<sup>2</sup>MOS flip-flop is used in the serialiser, the three-level encoder and the deserialiser to achieve low power consumption. The mC<sup>2</sup>MOS flip-flop is implemented by cascading two complementary latches. This master-slave implementation results in a robust flip-flop with good hold-time behavior. The mC<sup>2</sup>MOS flip-flop uses clocked inverters for better performance. It also has the advantage of being insensitive to overlap of the clock signals, which helps to reduce power consumption. The mC<sup>2</sup>MOS flip-flop is shown in Figure 4 (b).

#### d) D-TGMS flip flop

The D-TGMS flip-flop is used in this work to achieve high speed and low power, but it is sensitive to overlap of the clocks: it malfunctions if the clocks overlap for too long. The D-TGMS flip-flop is built from transmission gates and MOS transistors. A performance and comparative analysis is presented by Oskuii and Alvandpour in [14], and its delay is much lower than that of other flip-flop topologies. Additionally, the energy per transition and the clock energy of D-TGMS are much lower than those of the NAND, mC<sup>2</sup>MOS and C<sup>2</sup>MOS flip-flops. The detailed operation of D-TGMS is described in [15], [16]. The D-TGMS structure is illustrated in Figure 4 (c).

## 3. Signaling

At low frequencies, on-chip interconnects behave as RC interconnects, which have a low-pass effect on the signal. At higher frequencies, the inductance becomes significant and the interconnect behaves as a transmission line, so high-frequency signals propagate as waves at the speed of light in the medium. These high-frequency signals still suffer from attenuation along the line due to the resistance of the interconnect. However, this attenuation is constant across the different frequency components, so the signal keeps its eye open [5].
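The frequency-independent attenuation can be made explicit with the standard low-loss transmission-line approximation. As a sketch, assume per-unit-length resistance $R$, inductance $L$ and capacitance $C$ (symbols introduced here, not in the original), negligible dielectric loss and no skin effect. For angular frequencies $\omega$ with $\omega L \gg R$:

$$
Z_0 \approx \sqrt{\frac{L}{C}}, \qquad v \approx \frac{1}{\sqrt{LC}}, \qquad \alpha \approx \frac{R}{2 Z_0}
$$

Under these assumptions neither the attenuation constant $\alpha$ nor the propagation velocity $v$ depends on $\omega$, so all high-frequency components of the signal are attenuated and delayed equally.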

Random data streams cover the entire frequency spectrum, and their low-frequency components introduce large distortion into the signal. Suppressing this distortion requires equalization, using either a Feed-Forward Equalizer at the transmitter or a Decision Feedback Equalizer at the receiver [2], [12]. These equalizers are complex to design and consume a lot of power, so the three-level coding technique is used in this work to avoid them. Another issue that must be considered, especially with high-frequency signals, is signal reflection, which requires matching the transmission line to avoid signal distortion. In this design, source matching is used instead of matching at the receiver side. This makes use of the transmission gate resistance, and the reflection at the receiver side doubles the amplitude of the received signal.
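The amplitude doubling can be checked with the usual reflection-coefficient relations. This is a sketch under assumptions that are not stated in the original: a lossless line of characteristic impedance $Z_0$, a driver with source resistance $R_s = Z_0$ and open-circuit swing $V_s$, and a receiver whose input impedance $Z_L$ is much larger than $Z_0$.

$$
V_{\text{inc}} = V_s \frac{Z_0}{R_s + Z_0} = \frac{V_s}{2}, \qquad
\Gamma_L = \frac{Z_L - Z_0}{Z_L + Z_0} \approx 1, \qquad
V_{\text{rx}} = (1 + \Gamma_L)\, V_{\text{inc}} \approx V_s
$$

The reflected wave travels back to the driver, where $\Gamma_s = (R_s - Z_0)/(R_s + Z_0) = 0$, so it is absorbed rather than reflected again.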

## 4. Simulation results and discussion

The SerDes transceiver for on-chip communication is designed, simulated and synthesized using Cadence Virtuoso in a 180 nm technology. Four versions of the transceiver are designed, using NAND-gate flip-flops, C<sup>2</sup>MOS flip-flops, mC<sup>2</sup>MOS flip-flops and D-TGMS flip-flops. The complete integrated block diagram of the transceiver using D-TGMS in Cadence Virtuoso is presented in Figure 5.



Fig. 5: Complete Transceiver Implementation on Cadence.

The output waveform of the proposed D-TGMS transceiver is shown in Figure 6. According to the results obtained, the input data given to the transceiver of the source PE is transmitted correctly, and the same data is received and decoded by the transceiver of the destination PE.

Table 1 shows the power consumed by the transmitter and receiver modules of the transceiver designed using D-TGMS flip-flops. The sum of the module powers gives the total power consumed by the transceiver. The power consumption and the delay of the transceiver using the various flip-flops are listed in Table 2. The comparison shows that the transceiver with NAND-based flip-flops has the highest power consumption and delay of all the transceivers. The existing transceiver that uses C<sup>2</sup>MOS flip-flops consumes more power than the proposed transceivers that use mC<sup>2</sup>MOS and D-TGMS flip-flops.



Fig. 6: Complete Output Waveform of the Transceiver.

Table 1: Power Consumption

| Module | Component           | Power (mW) |
|--------|---------------------|------------|
| Tx     | Serialiser          | 7.29       |
| Tx     | Three-level encoder | 4.735      |
| Rx     | Phase detector      | 0.59       |
| Rx     | Deserialiser        | 0.54       |

Table 2: Design Comparisons

| S. No | Transceiver (flip-flop used) | Power (mW) | Delay (ps) |
|-------|------------------------------|------------|------------|
| 1.    | NAND D                       | 20.155     | 68.53      |
| 2.    | C<sup>2</sup>MOS             | 16.656     | 49.68      |
| 3.    | mC<sup>2</sup>MOS            | 14.474     | 47.76      |
| 4.    | D-TGMS                       | 12.963     | 37.89      |
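The reductions quoted in the abstract follow directly from Table 2. The short script below recomputes them relative to the NAND D baseline; the data are copied from the table, while the dictionary layout and the helper name `reduction_pct` are only illustrative.

```python
# Recompute the power and delay reductions of Table 2 relative to the NAND D baseline.
designs = {
    "NAND D": (20.155, 68.53),   # (power in mW, delay in ps)
    "C2MOS": (16.656, 49.68),
    "mC2MOS": (14.474, 47.76),
    "D-TGMS": (12.963, 37.89),
}


def reduction_pct(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


base_power, base_delay = designs["NAND D"]
for name, (power, delay) in designs.items():
    print(f"{name:7s} power reduction {reduction_pct(base_power, power):6.3f} %, "
          f"delay reduction {reduction_pct(base_delay, delay):5.2f} %")
```

For the D-TGMS row this gives (20.155 − 12.963) / 20.155 ≈ 35.683% for power and (68.53 − 37.89) / 68.53 ≈ 44.71% for delay, in agreement with the abstract.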

## 5. Conclusion

In this paper, a three-level signaling technique was used to design a transceiver that consumes very little power with minimum delay. The proposed transceiver uses mC<sup>2</sup>MOS and D-TGMS flip-flops in the serialiser, three-level encoder and deserialiser circuits. The three-level encoding technique makes it possible to recover the clock from the transmitted data at the destination side, which eliminates the need for sending the clock over an extra wire or using complex, power-hungry blocks such as PLLs and CDRs. The complete transceiver is implemented using various flip-flop topologies. The proposed transceiver using D-TGMS flip-flops consumes less power and has a lower delay than the other circuit topologies.

## References

- [1] S. Safwat, Ezz El-Din Hussein, Maged Ghoneima, and Yehea Ismail (2011), A 12Gbps All Digital Low Power SerDes Transceiver for On-Chip Networking, *Proceedings of the IEEE International Symposium on Circuits and Systems*, Rio de Janeiro, pp. 1419-1422. https://doi.org/10.1109/ISCAS.2011.5937839.
- [2] T. Geurts, W. Rens, J. Crols, S. Kashiwakura, and Y. Segawa (2004), A 2.5 Gbps - 3.125 Gbps multi-core serial-link transceiver in 0.13 μm CMOS, *Proceedings of the 30th European Solid-State Circuits Conference*, pp. 487-490. https://doi.org/10.1109/ESSCIR.2004.1356725.
- [3] Ofer Markish, Oded Katz, Benny Sheinman, Dan Corcos, and Danny Elad (2015), On-Chip Millimeter Wave Antennas and Transceivers, *Proceedings of the 9th International Symposium on Networks-on-Chip (NOCS '15)*, ACM, Article 11, 7 pages. https://doi.org/10.1145/2786572.2789983.
- [4] M. Harwood, N. Warke (2007), A 12.5Gb/s SerDes in 65nm CMOS using a baud-rate ADC with digital receiver equalization and clock recovery, *IEEE International Solid-State Circuits Conference, Dig. Tech. Papers*, San Francisco, CA, pp. 611-613.
- [5] E. Hussein and Y. I. Ismail (2010), A novel variation insensitive clock distribution methodology, *Proceedings of IEEE International Symposium on Circuits and Systems*, Paris, pp. 1743-1746. https://doi.org/10.1109/ISCAS.2010.5537550.
- [6] S. A. Mirbozorgi, H. Bahrami, M. Sawan, L. A. Rusch and B. Gosselin (2016), A Single-Chip Full-Duplex High Speed Transceiver for Multi-Site Stimulating and Recording Neural Implants, *IEEE Transactions on Biomedical Circuits and Systems*, vol. 10, no. 3, pp. 643-653. https://doi.org/10.1109/TBCAS.2015.2466592.
- [7] Hong-Yi Huang, Ruei-Iun Pu (2011), Differential bidirectional transceiver for on-chip long wires, *Microelectronics Journal*, vol. 42, no. 11, pp. 1208-1215. https://doi.org/10.1016/j.mejo.2011.08.001.
- [8] J. Young, J. Kang, S. Park, and M. Flynn (2009), A 9Gbit/s serial Transceiver for on-chip global signaling over lossy transmission lines, *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 56, no. 8, pp. 1807-1817. https://doi.org/10.1109/TCSI.2009.2027634.
- [9] Kim G, Park H, Joo J (2015), Single-chip photonic transceiver based on bulk-silicon, as a chip-level photonic I/O platform for optical interconnects, *Scientific Reports*, vol. 5:11329, pp. 1-11. https://doi.org/10.1038/srep11329.
- [10] Chong Zhang, Shangjian Zhang, Jon D. Peters, and John E. Bowers (2016), 8 × 8 × 40 Gbps fully integrated silicon photonic network on chip, *Optica*, vol. 3, no. 7, pp. 785-786. https://doi.org/10.1364/OPTICA.3.000785.
- [11] V. Catania, A. Mineo, S. Monteleone, M. Palesi and D. Patti (2016), Energy efficient transceiver in wireless Network on Chip architectures, *Proceedings of Design, Automation & Test in Europe Conference & Exhibition (DATE)*, Dresden, pp. 1321-1326.
- [12] R. A. Philpott, J. S. Humble, R. A. Kertis, K. E. Fritz, B. K. Gilbert and E. S. Daniel (2008), A 20Gb/s SerDes transmitter with adjustable source Impedance and 4-tap feed-forward equalization in 65nm bulk CMOS, *Proceedings of IEEE Custom Integrated Circuits Conference (CICC)*, pp. 623-626. https://doi.org/10.1109/CICC.2008.4672163.
- [13] Y. Suzuki, K. Odagawa, and T. Abe (1973), Clocked CMOS calculator circuitry, *IEEE Journal of Solid State Circuits*, vol. SC-8, pp. 462-469. https://doi.org/10.1109/JSSC.1973.1050440.
- [14] S. Tahmasbi Oskuii and A. Alvandpour (2004), Comparative study on low-power high-performance standard-cell flip-flops, *Proceedings of the SPIE*, vol. 5274, pp. 390-398. https://doi.org/10.1117/12.530225.
- [15] Weste N. H. E., Eshraghian K. (1994), *Principles of CMOS VLSI design, a systems perspective*, second edition, Addison-Wesley.
- [16] Rabaey J. M., Chandrakasan A., Nikolic B. (2016), *Digital integrated circuits, a design perspective*, second edition, Prentice Hall.
- [17] A. H. Elsayed, R. N. Tadros, M. Ghoneima and Y. Ismail (2014), Low-power all-digital manchester-encoding-based high-speed serdes transceiver for on-chip networks, *IEEE International Symposium on Circuits and Systems (ISCAS)*, Melbourne VIC, pp. 2752-2755. https://doi.org/10.1109/ISCAS.2014.6865743.
- [18] J. Lee, P. C. Chiang, P. J. Peng, L. Y. Chen and C. C. Weng (2015), Design of 56 Gb/s NRZ and PAM4 SerDes Transceivers in CMOS Technologies, *IEEE Journal of Solid-State Circuits*, vol. 50, no. 9, pp. 2061-2073. https://doi.org/10.1109/JSSC.2015.2433269.
- [19] R. N. Tadros, A. H. Ahmed, M. Ghoneima and Y. Ismail (2015), A 24 Gbps SerDes transceiver for on-chip networks using a new half-data-rate self-timed 3-level signaling scheme, *Proceedings of the 5th International Conference on Energy Aware Computing Systems & Applications*, pp. 1-4. https://doi.org/10.1109/ICEAC.2015.7352168.