

**International Journal of Engineering & Technology** 

Website: www.sciencepubco.com/index.php/IJET doi: 10.14419/ijet.v7i4.13477 Research paper



# AOCR: adaptive on chip router algorithm for multicore domain controller platform

T. Shanmuganathan<sup>1</sup>\*, U. Ramachandraiah<sup>2</sup>

<sup>1</sup> Department of Electronics & Communication Engineering, Hindustan Institute of Technology & Science, India

<sup>2</sup> Department of Electronics & Instrumentation Engineering, Hindustan Institute of Technology & Science, India

\*Corresponding author E-mail: thangatamizh@gmail.com

#### Abstract

In recent years, the rapid development of semiconductor technologies and the growing demand for more effective Multi-Core Domain Controller Unit (MDCU) platforms have created a clear need for routing algorithms that route packets efficiently between these platforms while improving on-chip network latency and throughput. This paper proposes an Adaptive on-Chip Router (AoCR) algorithm, a simple adaptive routing algorithm based on runtime weighted arbitration and a resource allocation methodology, in which routing decisions are minimized for application-specific MDCU platforms. The proposed scheme is evaluated by simulation, and its performance in terms of latency, area, power consumption and cost reduction per vehicle is presented. The results show a 24.5% reduction in latency, a 62.25% optimization of area utilization and a 63.76% improvement in energy efficiency compared with existing methods.

Keywords: Multi-Core Domain Controller Unit; Network on Chip Interconnect Architecture; Routing Algorithm; VLSI Implementation

# 1. Introduction

Network-on-Chip (NoC) has become the most significant communication interconnect for many-core chip multiprocessors (CMPs), and the routing algorithms used in these networks play a vital role in determining processor performance. As the number of cores integrated on a chip increases, delivering high bandwidth and good throughput over the on-chip interconnect becomes more challenging. Traditionally, a shared bus architecture was used for communication between the cores, with simple arbitration and a crossbar used to exchange the data. However, as the number of cores increases, the complexity of the communication system also increases, and the number of cores that can be connected to the bus is limited [2].

Hence, the Network-on-Chip (NoC) design approach is adopted for communication between the cores, which communicate by sending packets to one another over the network. Instead of being connected by dedicated wires, the cores are connected to the network, and each IP core communicates with all other cores, not just its neighbors, by sending packets through the network [3]. The network logic occupies a small amount of area in each core (at most 2% for this design). The number of neighbors of an IP core is determined by the topology of the network. The topology and its router path selection algorithm affect the performance of the NoC architecture in terms of latency and throughput. An irregular topology maximizes the flexibility of the on-chip architecture and provides a high degree of configurability, so that network congestion can be reduced and high throughput achieved.

Runtime configurability can reduce the impact of traffic congestion by bypassing a few intermediate pipeline stages of the router, which minimizes the decision delay [4]. The runtime configuration of the routing algorithm and its resource allocation method can become critical when optimizing NoC performance [5]. The overall performance of an NoC architecture is significantly affected by the path selection time, the bandwidth and the network traffic pattern. Most researchers recommend a two-dimensional mesh topology with unidirectional links as the interconnect infrastructure for connecting processing elements, because this planar structure is well suited to implementation in silicon [6].

In the mesh topology, a remotely located node may suffer high latency because of non-minimal paths. Shared bus interconnects communicate faster than pipeline-based NoC interconnects, but the shared bus structure cannot support higher scalability and higher bandwidth requirements. Conversely, an NoC can provide higher bandwidth than a shared bus structure, but it incurs larger area and delay because of router components such as arbiter circuits, routing tables, switches and buffers [7]. An NoC interconnect can reach higher bandwidth and reduce area utilization when application-specific architectures with heterogeneous cores are adopted. Communication interconnects designed for general-purpose computing cannot be utilized for application-specific functions. Interconnects for general-purpose computing systems are regular NoC structures that distribute the load evenly among all connected processing elements.

In this type of computing network, the design constraints are predicted at design time, which may lead to under- or over-utilization of resources [8]. To address these challenges, runtime network conditions must be considered for resource allocation rather than being predicted at design time. Runtime reconfiguration of the on-chip network based on application demands improves the performance of the on-chip interconnect in terms of area minimization and delay reduction. The reconfiguration may be applied to the buffer length, the number of ports, the arbitration mechanism and the flow control, depending on the load [9]. Generally, flow control in on-chip interconnect infrastructure is classified as circuit switching or packet switching, which defines how data flows from the source core to the destination core. For application-specific multicore platforms with minimal load, circuit switching routes data between the cores with less complexity than packet switching. The customized methodology can reduce the complexity and improve the network performance [20].

Copyright © 2018 T. Shanmuganathan, U. Ramachandraiah. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



In this paper, we propose the Adaptive on-Chip Router (AoCR) algorithm to reduce routing decision delay and area utilization with circuit-switching flow control. Section 2 reviews existing techniques for adaptive routing algorithms, and Section 3 explains the proposed methodology. Section 4 discusses the simulation results. Finally, Section 5 concludes the proposed work and outlines future enhancements.

# 2. Literature review

An adaptive routing algorithm is more robust because it routes data according to the current condition of network parameters such as workload and traffic level. These parameters are measured from the buffer occupancy, the virtual channel utilization and the number of crossbar requests. The parameter values are stored in registers as a lookup table and compared to choose a suitable routing path from one node to another. Adaptive algorithms are classified as local or global according to how much node information is shared among the nodes for the arbitration process [1], [4], [6], [12]. The authors of [10] proposed the neighbors-on-path (NoP) selection algorithm for unidirectional NoC architectures, in which the routing path is selected according to the condition of the adjacent nodes, measured one hop ahead. The authors of [11] enlarged the condition measurement region for greater adaptability when routing data based on local information. To improve the load balancing of the NoC, the regional congestion awareness (RCA) scheme was proposed, in which the adaptive algorithm uses both local and non-local information. The RCA scheme offers better load balancing but adds logic complexity. It also suffers from congestion information propagation delay, because the status of a larger number of nodes has to be considered for path selection.

To address this problem, the RCA scheme was improved by the DBAR adaptive routing algorithm, in which congestion is calculated from information collected from either the row nodes or the column nodes. DBAR minimizes redundant congestion information and reduces the complexity of the congestion calculation [12]. Similarly, to reduce the complexity of congestion estimation, the destination-based adaptive routing (DAR) algorithm was introduced, in which every node in the network calculates its routing delay and shares the estimated delay with the other nodes within a particular radius. The algorithm routes data to the destination node based on this estimated delay, and the calculation and updating are done on a cyclic basis [13]. The same authors further enhanced this work by using buffer depth as the congestion measurement index to improve performance and throughput [14]. This adaptive router algorithm consumes less power than their previous work but uses more resources to do so. The methods described above use registers to store congestion and network status information, which increases the area needed by the router architecture. The updating and sharing of information resembles routing table techniques, and the complexity grows as the monitoring coverage of the nodes increases. The congestion weight calculation and propagation process also adds to the path selection delay.

To minimize these overheads, source routing can be used. It needs no logic circuit for the routing decision, which reduces the area of the router, but it requires a routing table at the network adapter of each node [15]. As the number of nodes increases, the area of the routing table also increases. Dimension-order routing (DOR) is a simple and widely used oblivious path selection algorithm that exchanges packets between source and destination along the axes in a predefined priority order. The path follows either the XY or the YX direction, which gives little path diversity [21].
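To make the limited path diversity of DOR concrete, the following minimal Python sketch enumerates the single path produced by XY and by YX ordering on a 2D mesh. The function names and the `(x, y)` tuple addressing are assumptions made for illustration; they do not come from the cited works.

```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst, correcting X first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                      # travel along the X axis first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                      # then travel along the Y axis
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def yx_route(src, dst):
    """Same idea with the axis priority swapped: correct Y first, then X."""
    swapped = xy_route((src[1], src[0]), (dst[1], dst[0]))
    return [(x, y) for (y, x) in swapped]


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(yx_route((0, 0), (2, 1)))
```

For a given source and destination, each ordering yields exactly one path, so a congested link on that path cannot be avoided. This is the limitation that adaptive schemes try to relax.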

# 3. AOCR algorithm

The efficiency of an adaptive routing algorithm is affected by livelock and deadlock problems that arise during adaptive path selection based on runtime network conditions. Livelock arises from closed priority data transfer, in which data never reaches its destination. Livelock can be minimized by a minimal adaptive path selection process. The AoCR algorithm is based on a minimal adaptive methodology, and data is routed according to the weights assigned to the X and Y coordinates of the nodes. The path is selected by comparing the weighted addresses and choosing the route through the node with the lowest weight. In a deadlock, similarly, data waits to be delivered while resources remain blocked.

To minimize deadlock, a weighted round-robin scheme is adopted in the AoCR algorithm. This scheme follows the round-robin method according to the assigned weights, which are based on the upstream and downstream congestion levels. The congestion level is calculated from the port utilization and buffer utilization, which serve as the congestion parameters.
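As a rough illustration of how a congestion level could be derived from port and buffer utilization, consider the sketch below. The paper does not give the exact formula, so the equal weighting of the two terms, the normalization to the range 0 to 1, and the names `NodeStatus` and `congestion_level` are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    ports_busy: int     # ports currently in use
    ports_total: int    # ports available on the router node
    buffer_used: int    # occupied buffer slots
    buffer_depth: int   # total buffer slots


def congestion_level(status: NodeStatus) -> float:
    """Combine port and buffer utilization into one value in [0, 1].

    The 50/50 weighting is an assumption, not the authors' formula.
    """
    port_util = status.ports_busy / status.ports_total
    buffer_util = status.buffer_used / status.buffer_depth
    return 0.5 * port_util + 0.5 * buffer_util


if __name__ == "__main__":
    print(congestion_level(NodeStatus(2, 4, 3, 4)))
```

A hardware implementation would more likely use integer counters than floating-point ratios; the normalized form is used here only to keep the comparison between nodes easy to read.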

The weight assignment scheme is similar to priority assignment: the arbiter decides which request to grant by comparing the congestion values of the requests. Access is granted to the input port whose upstream node is most congested, which is assigned a higher priority than the other ports. This may lead to livelock, which should be minimized. To do so, we added logic that assigns priority based not only on the upstream node congestion value but also on the number of recent grants given to that port. If a port has recently received two consecutive grants, it is not assigned the highest priority even if it has the highest congestion value. Instead, the grant goes to the port with the next-highest upstream congestion value that has been granted access less recently. This follows a round-robin pattern to avoid deadlock and livelock. The process is implemented with additional counter and comparator logic in the arbiter unit. The counter counts the number of times a port has been granted, and the count is compared to decide the port selection.
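The grant rule described above can be sketched in software as follows. This is a behavioural model under stated assumptions, not the authors' VHDL arbiter: the class name `WeightedArbiter`, the port labels, the alphabetical tie-break and the single streak counter are hypothetical choices made for illustration.

```python
class WeightedArbiter:
    """Grant the most congested requester, capped at two consecutive grants."""

    MAX_CONSECUTIVE = 2  # limit taken from the text: two consecutive grants

    def __init__(self):
        self.last_port = None   # port that received the previous grant
        self.streak = 0         # counter: consecutive grants to last_port

    def grant(self, requests):
        """requests maps port name -> upstream congestion value."""
        if not requests:
            return None
        # Comparator: rank ports by congestion, highest first (ties by name).
        ranked = sorted(requests, key=lambda p: (-requests[p], p))
        winner = ranked[0]
        # If the top port has used up its consecutive grants, pass the grant
        # to the next most congested requester, when there is one.
        if (winner == self.last_port
                and self.streak >= self.MAX_CONSECUTIVE
                and len(ranked) > 1):
            winner = ranked[1]
        if winner == self.last_port:
            self.streak += 1
        else:
            self.last_port, self.streak = winner, 1
        return winner


if __name__ == "__main__":
    arbiter = WeightedArbiter()
    load = {"N": 0.9, "E": 0.5, "W": 0.2}
    print([arbiter.grant(load) for _ in range(6)])
```

With a constant load in which port `N` is always the most congested, the model grants `N` twice and then yields once to the next requester, so a heavily loaded port cannot hold the output indefinitely.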

This algorithm has been developed for a 3x3 2D mesh topology, which has low interconnect complexity because of its regular structure. Implemented on a silicon die, this simple topology provides low latency and high throughput. In the 2D mesh architecture, the processing elements are arranged in X and Y coordinates and connected through routing nodes, which are interconnected by either unidirectional or bidirectional links. The address of the node connected to each processing element is assigned according to its X and Y coordinates. The AoCR algorithm uses a dual arbitration selection method for the input port selection and output port selection of the routing nodes. The algorithm uses logic to inject data into the node for the arbitration process and to eject data from the node afterwards. The injection unit injects the received routing data into the router node, and the injected data is encoded using a single-error-correction Hamming code to protect the registers used within the router.

Fig. 2: Proposed System Functional Flow. (Blocks: Source; Routing Input Buffer; Injection Function; ECC Encoder; Ejection Function; ECC Decoder; Shortest Path Calculation; Congestion Calculations; Weighted Round Robin Function; Cross Bar Switch Control; Next Hop.)
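The single-error-correction Hamming code mentioned above can be illustrated with the classic Hamming(7,4) code. The paper does not state the code length used in the router, so the (7,4) parameters and the function names below are assumptions chosen to keep the sketch short.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Bit positions 1..7 hold p1, p2, d1, p4, d2, d3, d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code):
    """Correct up to one flipped bit; return (data bits, syndrome).

    A syndrome of 0 means no error was detected; otherwise it is the
    1-based position of the bit that was corrected.
    """
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1    # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1                      # inject a single-bit upset
    print(hamming74_decode(word))
```

In this model the encoder corresponds to the injection side and the decoder to the ejection side of the router, so a single bit upset in a protected register is corrected before the data leaves the node.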

The ejection of data takes place after encoding, and the output path is selected by weighted round-robin arbitration. The arbiter grants access based on the congestion status of the neighbor nodes and the shortest distance between the source and destination nodes in the 2D mesh architecture. These arbitration parameters are calculated and compared, and the minimal value is chosen to route the information. This minimal value is the weight used by the arbitration algorithm to decide the next routing node among the interconnected nodes. The arbitration for output path selection is based on the congestion values received from the downstream nodes. The arbiter grants the path that routes the packet to the next node located towards the destination.

The path is selected according to the shortest path, which is calculated from the X and Y coordinate values. The X coordinate of the current node is compared with the X coordinate of the destination; if the two values are the same, the next node located in the X coordinate is selected. If they are not the same, the Y coordinate of the current node is compared with the Y coordinate of the destination, and if they match, the next node located on the Y axis is selected. These selections also compare the minimal value to choose between the two directions. Once the path is selected, the connection is established through the crossbar switch, which is implemented with a simple multiplexer. The overall functional flow of the AoCR algorithm is shown in Fig. 2.
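One possible reading of this output path selection is a minimal adaptive rule: consider only the neighbors that bring the packet closer to the destination, and among them pick the one reporting the lowest congestion. The sketch below follows that interpretation. The function names, the `(x, y)` addressing and the dictionary of congestion values are assumptions, and the authors' exact comparison order may differ.

```python
def productive_hops(cur, dst):
    """Neighbors of cur that reduce the Manhattan distance to dst."""
    (cx, cy), (dx, dy) = cur, dst
    hops = []
    if dx != cx:
        hops.append((cx + (1 if dx > cx else -1), cy))   # step along X
    if dy != cy:
        hops.append((cx, cy + (1 if dy > cy else -1)))   # step along Y
    return hops


def next_hop(cur, dst, congestion):
    """Pick the least congested productive neighbor.

    congestion maps node -> value reported by that downstream node;
    nodes missing from the map are treated as idle (0.0).
    Returns None when cur is the destination (the packet is ejected).
    """
    hops = productive_hops(cur, dst)
    if not hops:
        return None
    return min(hops, key=lambda node: congestion.get(node, 0.0))


def route(src, dst, congestion):
    """Follow next_hop until the destination is reached."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst, congestion))
    return path


if __name__ == "__main__":
    print(route((0, 0), (2, 2), {(1, 0): 0.8, (0, 1): 0.1}))
```

Because every hop shortens the remaining distance, the path length always equals the Manhattan distance between source and destination, which is why a minimal rule of this kind cannot livelock.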

# 4. Experimental results

The proposed AoCR algorithm was developed in VHDL and simulated in the Xilinx ISE 15.1 environment. To evaluate its performance, it was synthesized and implemented on a Xilinx Spartan-6 FPGA board. Fig. 3 shows the synthesized model of the AoCR algorithm, and Table 1 summarizes the experimental results.



Fig. 3: Schematic Structure of AOCR.

We compared the results of the proposed AoCR algorithm with an existing buffer-size-based routing algorithm. The results are compared on the performance evaluation parameters of on-chip network architectures: clock frequency, area utilized (slices) and estimated power. The results show that the path delay is reduced by up to 23% and the clock frequency is increased by up to 24.5% compared with the existing method. In addition, the power consumption is reduced by 63% and the area utilized by the AoCR algorithm is reduced by 73%. The proposed application-specific AoCR algorithm for vehicle control systems has been developed to fit two processing elements (IP cores) connected to a single router node. It offers a better clock frequency and lower area overhead than the existing methods.

Table 1: Comparison & Result Analysis

| Parameters/Methods       | Existing Work | Proposed Work | Percentage of Minimization |
|--------------------------|---------------|---------------|----------------------------|
| Slice Register           | 4298          | 795           | 81                         |
| Slice LUT                | 4846          | 1290          | 73                         |
| LUT-Flip Flop            | 2180          | 1466          | 32.75                      |
| Bonded IOB               | 166           | 164           | 1.2                        |
| Minimum Period Time (ns) | 2.905         | 2.192         | 24.54                      |
| Clock Frequency (MHz)    | 344.270       | 456.204       | 24.55                      |
| Power Consumption (mW)   | 356           | 129           | 63.76                      |
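The percentages in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes the "Percentage of Minimization" column is computed as (existing - proposed) / existing, which reproduces the area, period and power rows. The helper names are hypothetical.

```python
def reduction_pct(existing, proposed):
    """Percentage by which `proposed` is smaller than `existing`."""
    return 100.0 * (existing - proposed) / existing


def increase_pct(existing, proposed):
    """Percentage by which `proposed` is larger than `existing`."""
    return 100.0 * (proposed - existing) / existing


# (existing, proposed) pairs copied from Table 1
table1 = {
    "Slice Register": (4298, 795),
    "Slice LUT": (4846, 1290),
    "LUT-Flip Flop": (2180, 1466),
    "Minimum Period Time (ns)": (2.905, 2.192),
    "Power Consumption (mW)": (356, 129),
}

if __name__ == "__main__":
    for name, (old, new) in table1.items():
        print(f"{name}: {reduction_pct(old, new):.2f}% reduction")
    print(f"Clock Frequency: {increase_pct(344.270, 456.204):.2f}% increase")
```

Under this convention, the clock frequency gain relative to the existing design works out to roughly 32.5%. The 24.55 entry in the table appears to express the same change relative to the proposed frequency, which mirrors the reduction in the minimum period.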

The evaluation shows that the proposed algorithm performs better than the existing method in terms of path delay reduction and area minimization. The design also ensures better throughput by adopting error correction methods. The proposed AoCR algorithm may lack adaptability for high-radix router nodes, and arbitration is based only on local congestion information when selecting the routing path. However, these parameters depend heavily on the chosen application, and we have developed the algorithm for vehicle electronic control system networks with a low-radix processing element architecture.

# 5. Conclusion

The proposed AoCR algorithm, based on congestion-based adaptive routing decisions for shortest path selection in a 2D mesh topology, reduces the complexity of existing routing decision methods. The algorithm also uses an optimized resource allocation method to balance the workload of the Multi-Core Domain Controller network. It includes a single-error detection algorithm for the encoding and decoding process to achieve high reliability and energy efficiency. Deadlock and livelock problems are minimized by using the shortest-distance routing method for forward hop selection together with a congestion-based priority arbitration method.

We evaluated our design in terms of delay, clock frequency, area utilization and power consumption, compared with buffer-size-based adaptive routing algorithms. The results show that our design achieves a higher clock frequency with less area for on-chip communication in application-specific Multi-Core Domain Controller platforms. In future work, the AoCR algorithm will be equipped with an efficient congestion index estimation method based on arbitration demand and implemented for a D-Mesh architecture with high-radix router nodes to improve the performance of the Multi-Core Domain Controller platform in vehicle control system networks.

## References

- [1] R. Ramanujam and B. Lin, "Destination-based congestion awareness for adaptive routing in 2D mesh networks", ACM Transactions on Design Automation of Electronic Systems, vol. 18, no. 4, pp. 1-27, 2013. https://doi.org/10.1145/2505055.
- [2] Dally, W. J., & Towles, B. (2001). Route packets, not wires: on-chip interconnection networks. In Design Automation Conference, 2001. Proceedings (pp. 684-689). IEEE. https://doi.org/10.1109/DAC.2001.935594.
- [3] Pande, P. P., Grecu, C., Jones, M., Ivanov, A., & Saleh, R. (2005). Performance evaluation and design trade-offs for network-on-chip interconnect architectures. Computers, IEEE Transactions on, 54(8), 1025-1040. https://doi.org/10.1109/TC.2005.134.
- [4] G. Ascia, V. Catania, M. Palesi, D. Patti, "Neighbors-on-Path: A New Selection Strategy for
- [5] On-Chip Networks," Embedded Systems for Real Time Multimedia (ESTMedia), pp. 79-84, 2006.
- [6] P. Bogdan, R. Marculescu, "Non-Stationary Traffic Analysis and Its Implications on Multicore Platform Design," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol.30, no.4, 2011.
- [7] P. Gratz, B. Grot, S.W. Keckler, "Regional congestion awareness for load balance in networks-on-chip," 14th Intl. Symp. On High Performance Computer Architecture (HPCA), pp. 203-214, 2008.
- [8] Stensgaard, M. and Sparsø, J., "ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology". In the second ACM/IEEE International Symposium on Networks-on-Chip, 2008, pp. 55-64. https://doi.org/10.1109/NOCS.2008.4492725.
- [9] Faruque, M., Ebi, T. and Henkel, J., "Run-time Adaptive on-chip Communication Scheme". In Proceedings of ICCAD, 2007, pp. 26-31.
- [10] R. Manevich, I. Cidon, A. Kolodny, and I. Walter, "Centralized Adaptive Routing for NoCs", IEEE Computer Architecture Letters, vol. 9, no. 2, July-December 2010, pp. 57-60.
- [11] G. Ascia, V. Catania, M. Palesi, D. Patti, "Neighbors-on-Path: A New Selection Strategy for On-Chip Networks," Embedded Systems for Real Time Multimedia (ESTMedia), pp. 79-84, 2006.
- [12] P. Gratz, B. Grot, S.W. Keckler, "Regional congestion awareness for load balance in networks-on-chip," 14th Intl. Symp. On High Performance Computer Architecture (HPCA), pp. 203-214, 2008. https://doi.org/10.1109/HPCA.2008.4658640.
- [13] S. Ma, N.E. Jerger, Z. Wang, "DBAR: an efficient routing algorithm to support multiple concurrent applications in networks-on-chip," 38th Intl. Symp. On Computer Architecture (ISCA), 2011. https://doi.org/10.1145/2000064.2000113.
- [14] R.S. Ramanujam, B. Lin, "Destination-based adaptive routing on 2D mesh networks," Architectures for Networking and Comm. Systems (ANCS), 2010.
- [15] Matos, D., Concatto, C., Kologeski, A., Carro, L., Kastensmidt, F., Susin, A., and Kreutz, M.," Adaptive router architecture based on traffic behavior observability". In Proceedings of the second international Workshop on Network on Chip Architectures, 2009.

- [16] Hatem, F. O., & Kumar, T. N. (2013, April). A low-area asynchronous router for clock-less network-on-chip on a FPGA. In Computers & Informatics (ISCI), 2013 IEEE Symposium on (pp. 152-158). IEEE. https://doi.org/10.1109/ISCI.2013.6612394.
- [17] Z. Lu and A. Jantsch, "Flit ejection in on-chip wormhole-switched Networks with virtual channels," in NORCHIP '04: Proceedings of the 2004 IEEE/ACM International Conference on Norchip, Nov. 2004, pp. 273–276.
- [18] J. Hu, U. Y. Ogras, and R. Marculescu, "System-level buffer allocation for application-specific networks-on-chip router design," IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 25, no. 12, pp. 2919–2933, Jan. 2006. https://doi.org/10.1109/TCAD.2006.882474.
- [19] H. Wang, L.-S. Peh, and S. Malik, "Power-driven design of router." Design, Automation & Test in Europe Conf. & Exhibition (DATE), 2012.
- [20] C. A. Nicopoulos, D. Park, J. Kim, N. Vijaykrishnan, M. S. Yousif, and C. R. Das, "ViChaR: A dynamic virtual channel regulator for network-on-chip routers," in MICRO'39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, Dec. 2006, pp. 333–346. https://doi.org/10.1109/MICRO.2006.50.
- [21] Vincenzo Rana, "A Reconfigurable Network-on-Chip Architecture for Optimal Multi-Processor SoC Communication," VLSI-SoC 2008, IFIP AICT 313, pp. 232–250, 2010.
- [22] Intel Corporation. A Touchstone DELTA System Description. Technical Report, 1991.