# Tiling Based Concurrent Supervision of Power And Fault Tolerance In Heterogeneous Multicore Embedded Systems

<sup>1</sup>S.Murugalakshmi and <sup>2</sup>Dr. P.Ranjith Kumar

Final year PG student<sup>1</sup>, AP<sup>2</sup> P.S.R.Engineering College, Sivakasi

> Received Date: 03 March 2021 Revised Date: 06 April 2021 Accepted Date: 18 April 2021

Abstract: Increasing power densities have led to the dark silicon era, for which heterogeneous multicores with diverse power and performance characteristics are promising architectures. This work presents peak-power-aware reliability management (PPARM), which accounts for power density and reliability constraints. To balance the reliability requirement, we also employ a tiling-based method to further reduce power consumption. The resulting system achieves lower peak power on average and lower area consumption.

Keywords: PPARM, Tiling, reliability

### I. INTRODUCTION

With the advent of very deep submicron technologies at 45 nanometers, soon to reach 32 and even 22 nanometers, integrated circuit (IC) designers face two major challenges. First, they must account for a substantial increase in complexity due to the number of components, including multi-core processors ("More Moore"), and to the significant growth in heterogeneous technologies ("More than Moore"). Second, the marked reduction in component reliability must be taken into account, in particular the behavior of switches that are more sensitive to process variations, temperature effects, and environmental conditions.

When designing a system-on-chip (SoC), a vendor may combine cores designed by external designers with cores from in-house libraries. Cores are pre-designed implementations of critical functions, also termed Intellectual Property Blocks (IP Blocks), Virtual Components (VC), or simply macros. Because an SoC comprises cores from different sources and vendors, it is inherently heterogeneous, and this is one of the key obstacles that complicates its design process.

#### **II. RELATED WORK**

Ejlali et al. (2012) discussed OmpSs, a directive-based programming model that uses OpenMP-like directives to execute annotated tasks both on SMP cores and as FPGA kernels on contemporary SoC processors such as the Xilinx Zynq platform.

Ansari et al. (2019) addressed this concern with a mixture of software and hardware hardening modes while accounting for power, performance, and overhead constraints. They presented an efficient algorithm that minimizes the total energy consumption while satisfying the timing constraints of all tasks.

Hashimi et al. (2015) proposed a dynamic logic gate that reduced switching power by 50% using LC resonance. The energy stored on the load capacitance was recovered rather than wasted. Implementation in a standard 90 nm CMOS process demonstrated feasibility with realistic on-chip inductors.

Pagani et al. (2015) provided an assessment of the jitter induced by power and ground (P/G) voltage fluctuations. The assessment was based on an extended input/output buffer information specification used as a model for capturing the effect of P/G signal variations under simultaneously switching output buffers. The resulting large-signal equivalent-circuit model was validated under numerous test conditions with a wide range of P/G voltage variations in order to predict the output signal distortions.

Chen introduced Logic-Based Distributed Routing for 3D NoCs (LBDR3D), an extensible, reconfigurable, and fault-tolerant mechanism that uses only two virtual channels to execute any deadlock-free turn-model routing algorithm in partially vertically connected 3D NoCs.

Pagani et al. (2014) described a CMOS bandgap reference fabricated in 0.18 µm TSMC CMOS technology, with very low power consumption, a high Power Supply Rejection Ratio (PSRR), and low temperature drift over a wide temperature range. This is achieved with a straightforward 3-bit trimming circuit that includes high-ohmic polysilicon unit resistors, which save area, and a digitally controlled 8-to-1 multiplexer that switches between the 8 outputs.

#### **III. PROPOSED SYSTEM**

This section explains PPARM, which distributes the power budget across the whole circuit. It determines the redundancy needed to maintain the reliability of the circuit, so that the power constraints, deadlines, and task-level reliability targets are met. The PPARM algorithm is given below.

Algorithm 1 PPARM

**INPUT:** ready tasks with their execution times and deadlines; set of free cores $\Phi = \{HPI: \{C_1, \dots, C_{N_{HPI}}\},\ LPI: \{C_1, \dots, C_{N_{LPI}}\}\}$; code versions for each task; available frequency levels for each core; core power constraint $P_{TSP,Core}$ and chip power constraint $P_{TDP,Chip}$.

**OUTPUT:** Mapping and scheduling of tasks

```
BEGIN
  h = LCM(the periods of all tasks);        // Total # of time slots in the frame
  PDA[1...h] = {0};                          // Initialize the total power consumption array
  S_HPI_i = {Null, 1 <= i <= N_HPI};         // Initialize S with an empty schedule
  S_LPI_i = {Null, 1 <= i <= N_LPI};         // Initialize S with an empty schedule
  while (task queue is not empty) do
    Ti = queue.remove();                     // Select a task chosen from LPD Codes
    φ = find_island();                       // Find the best island
    C = φ.min_utilization;                   // Find a core of the island with lowest utilization
    C.add(Ti);
    while (R_T < R_req) do
      Ti.insert();                           // Insert a replica task chosen from HR Codes to the queue
      Update_reliability(R_T);               // Update the task reliability
      φ = find_island();                     // Find the island
      C = φ.min_utilization;                 // Find a core of the island with lowest utilization
      C.add(Ti);                             // Insert a replica task chosen from HR Codes to the queue
    end while
  end while
  while (job queue is not empty) do
    Jij = queue.remove();                    // Select the job which has highest priority
    Jijl = {Jijl, 1 <= l <= wc_i};           // Partition the selected job into parts
    k = release_time(Jij);
    for each part Jijl starting from the first part do
      for each free slot t = k ... j*P_i in S_C do
        if PDA[t] + power(Jijl) <= P_TDP,Chip then
          if power(Jijl) <= P_TSP,Core then
            S_C.add(t, Jijl);
            PDA[t] = PDA[t] + power(Jijl);
            k = t + 1;
            break;
          end if;
        end if;
      end for;
    end for;
  end while
  if not all the jobs are scheduled then
    return infeasible;
  end if;
END
```
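The scheduling phase of Algorithm 1 (the second `while` loop) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Job` record, the `hyperperiod` and `schedule` helpers, the pre-assigned `core` field, and the use of earliest deadline as the priority rule are all assumptions made for illustration, and the power values are invented.

```python
from dataclasses import dataclass
from functools import reduce
from math import gcd


@dataclass
class Job:
    name: str
    core: int       # core picked during the mapping phase (assumed given)
    release: int    # first slot in which the job may run
    deadline: int   # the job must finish before this slot
    parts: int      # execution time in slots (one part per slot)
    power: float    # power drawn while one part runs


def hyperperiod(periods):
    """Least common multiple of all task periods (h in Algorithm 1)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)


def schedule(jobs, periods, n_cores, p_core, p_chip):
    """Return (per-core schedules, PDA), or None when infeasible."""
    h = hyperperiod(periods)
    pda = [0.0] * h                                # chip power committed per slot
    slots = [[None] * h for _ in range(n_cores)]   # one schedule per core
    # Assumption: earliest deadline first stands in for "highest priority".
    for job in sorted(jobs, key=lambda j: j.deadline):
        if job.power > p_core:                     # violates the per-core cap
            return None
        k = job.release
        for _ in range(job.parts):
            for t in range(k, min(job.deadline, h)):
                free = slots[job.core][t] is None
                if free and pda[t] + job.power <= p_chip:
                    slots[job.core][t] = job.name
                    pda[t] += job.power
                    k = t + 1                      # next part starts after this slot
                    break
            else:
                return None                        # no admissible slot: infeasible
    return slots, pda


# Two jobs on different cores; the chip cap forces them into different slots.
jobs = [Job("A", 0, 0, 4, 2, 3.0), Job("B", 1, 0, 4, 2, 3.0)]
result = schedule(jobs, periods=[4, 4], n_cores=2, p_core=3.0, p_chip=4.0)
```

In this example both jobs could run in parallel on their own cores, but their combined power would exceed the chip cap, so the sketch defers job B to later slots. This is the peak-shaving behavior that the `PDA` array enforces.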

At design time, we select appropriate code versions for each task from among the various compiled programs, and Algorithm 1 uses them at run time. First, for each task, the code version with the lowest power density is selected. Then, among the other code versions, we choose those whose reliability is higher than that of the code version with the lowest execution time.
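One possible reading of this selection rule is sketched below in Python. The `CodeVersion` record, the `select_versions` helper, and all numeric values are hypothetical; the paper does not specify how versions are represented, so this is only an illustration of the rule as stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeVersion:
    name: str
    power_density: float   # illustrative units
    reliability: float     # probability of fault-free execution
    exec_time: float       # illustrative units


def select_versions(versions):
    """Pick the low-power-density (LPD) version and the high-reliability (HR) set."""
    lpd = min(versions, key=lambda v: v.power_density)
    fastest = min(versions, key=lambda v: v.exec_time)
    # HR candidates: every other version that is more reliable than the fastest one.
    hr = [v for v in versions
          if v is not lpd and v.reliability > fastest.reliability]
    hr.sort(key=lambda v: v.reliability, reverse=True)
    return lpd, hr


# Invented example data for a single task.
versions = [
    CodeVersion("O0", 0.8, 0.990, 10.0),
    CodeVersion("O2", 1.2, 0.995, 6.0),
    CodeVersion("O3", 1.5, 0.993, 5.0),
    CodeVersion("hardened", 1.0, 0.999, 9.0),
]
lpd, hr = select_versions(versions)
```

Sorting the HR set by reliability is an added convenience, so that a scheduler can try the most reliable replica first; the text itself does not prescribe an order.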

#### **Tiling technique**

Figure 1 shows the proposed system. It includes a controller, circuit components, a tiling zone, an FPGA controller, and a memory unit.



Figure 1 Block diagram of Tiling System

This system supports fault detection and correction with the help of the tiling zone. Dual modular redundancy (DMR) and triple modular redundancy (TMR) are implemented with the tiling system. A fault detector identifies faults in the circuit gates, so the system can detect a fault and correct the error using the tiling zone from memory.
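The difference between DMR and TMR can be illustrated with a short Python sketch of bitwise comparison and majority voting. The function names are hypothetical, and the sketch does not model the tiling zone or the memory unit; it shows only the general redundancy principle, not the proposed hardware.

```python
def dmr_mismatch(a: int, b: int) -> int:
    """DMR: XOR of two copies. Non-zero bits mark positions that disagree.

    Two copies can detect a fault but cannot tell which copy is wrong.
    """
    return a ^ b


def tmr_vote(a: int, b: int, c: int) -> int:
    """TMR: bitwise majority of three copies, masking one faulty copy per bit."""
    return (a & b) | (b & c) | (a & c)


def tmr_faulty_bits(a: int, b: int, c: int) -> int:
    """Bits where at least one copy disagrees with the voted value."""
    voted = tmr_vote(a, b, c)
    return (a ^ voted) | (b ^ voted) | (c ^ voted)


golden = 0b1011
upset = 0b0011                              # bit 3 flipped in one copy
detected = dmr_mismatch(golden, upset)      # DMR flags the mismatch
corrected = tmr_vote(golden, golden, upset) # TMR recovers the value
```

DMR needs an additional recovery step after detection, whereas TMR corrects a single faulty copy directly at the cost of a third replica.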

### **IV. RESULTS & DISCUSSION**

The design was simulated and synthesized using Xilinx tools. This section presents the simulation results, the synthesis results, and a comparison chart.

[Simulation waveform; legible signals include the clock, reset, program counter, `mem_to_reg`, `branch`, memory read, and memory write.]

Figure 2 Output for Peak power management-1



Figure 3 Output for Peak power management-2

Device Utilization Summary (estimated values)

| Logic Utilization          | Used | Available |
|----------------------------|------|-----------|
| Number of Slices           | 368  | 960       |
| Number of Slice Flip Flops | 110  | 1920      |
| Number of 4 input LUTs     | 719  | 1920      |
| Number of bonded IOBs      | 19   | 66        |
| Number of GCLKs            | 1    | 24        |

Figure 4 Proposed synthesis report

| Parameter | Existing | Proposed |
|-----------|----------|----------|
| Slice     | 741      | 368      |
| LUT       | 719      | 192      |
| Delay     | 18.102   | 13.420   |
| Time      | 5.186    | 2.612    |

**Table 1: COMPARISON TABLE** 



Figure 5 comparison chart

X-axis - Components usage

Y-axis - Number of Components used

Table 1 shows that the proposed system is faster than the existing one: the delay falls from 18.102 to 13.420 and the execution time from 5.186 to 2.612. The number of slices used drops from 741 to 368 and the LUT count from 719 to 192, which lowers the power requirement. These reductions lead to improved power management and reliability. Overall, the proposed system outperforms the existing system.
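The relative improvements implied by Table 1 can be worked out with a few lines of Python. The values are copied from the table; the `reduction_percent` helper is a name introduced here for illustration, and the table does not state units for delay and time.

```python
# (existing, proposed) pairs taken from Table 1.
table1 = {
    "Slice": (741, 368),
    "LUT": (719, 192),
    "Delay": (18.102, 13.420),
    "Time": (5.186, 2.612),
}


def reduction_percent(existing, proposed):
    """Relative reduction of the proposed value with respect to the existing one."""
    return 100.0 * (existing - proposed) / existing


reductions = {name: reduction_percent(e, p) for name, (e, p) in table1.items()}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower than the existing system")
```

By this calculation the slice count and execution time are roughly halved, the delay falls by about a quarter, and the LUT count falls by nearly three quarters.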

#### **V. CONCLUSION**

The proposed method for heterogeneous multicore systems focuses on power consumption and reliability. The tiling technique, supported by PPARM, is used to optimize system power and reliability, and it provides fault detection and correction for the overall circuit. The system is therefore both effective and reliable.

#### REFERENCES

- [1] Ejlali, A., Al-Hashimi, B. M., & Eles, P., Low-energy standby-sparing for hard real-time systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 31(3)(2012) 329-342.
- [2] Ansari, M., Safari, S., Yeganeh-Khaksar, A., Salehi, M., & Ejlali, A., Peak power management to meet thermal design power in fault-tolerant embedded systems. IEEE Transactions on Parallel and Distributed Systems, 30(1)(2018) 161-173.
- [3] Salehi, M., Ejlali, A., & Al-Hashimi, B. M., Two-phase low-energy N-modular redundancy for hard real-time multi-core systems. IEEE Transactions on Parallel and Distributed Systems, 27(5)(2015) 1497-1510.
- [4] Pagani, S., Chen, J. J., & Henkel, J., Energy and peak power efficiency analysis for the single voltage approximation (SVA) scheme. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 34(9)(2015) 1415-1428.
- [5] Munawar, W., Khdr, H., Pagani, S., Shafique, M., Chen, J. J., & Henkel, J., Peak power management for real-time scheduling tasks on heterogeneous many-core systems. In 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS) (2014) 200-209. IEEE.