# An Innovative Method for Designing a Multiprocessor Using Multi-Threading Techniques

Youmin Zhang and Joaquim Blesa, Professors, Department of VLSI Design, University of Michigan, Ann Arbor

#### Abstract

A quad-core processor is a chip with four independent processing units, called cores, each of which reads and executes instructions like a central processing unit. Quad-core processors are an emerging trend in many scientific and industrial applications. Field-programmable gate arrays (FPGAs), whose performance and gate capacity keep improving, allow such a design to be implemented on a single programmable device. The main issue in embedded multiprocessors is thread safety. It arises from shared memory: when thread safety is violated, two processors may access the same value at the same time. Processor performance has traditionally been improved in two main ways: clock-frequency scaling and microarchitectural improvements. To address these problems, a quad-core architecture has been developed for system-on-chip (SoC) applications. The system is designed in VHDL and supports the simultaneous use of both parallel and distributed networks. Arithmetic, logical, shift, and bit-manipulation operations are performed using the full quad-core architecture. The proposed quad-core processor comprises standard RISC processors with pipelined processing units, a multi-bus organization, and I/O ports, along with other features needed to build embedded SoC solutions. The performance of the implemented quad core is discussed in terms of speed, area, and power dissipation.

**Keywords:** *quad core processor, thread safety, parallel & distributed networks, RISC processors.* 

#### I. INTRODUCTION

The quad-core processor consists of four interconnected processors that communicate with one another. A multiprocessor can be built from several processors on a single chip, or from several chips connected by various types of bus. A third option is a multiprocessor system with more than one chip connected in a computer, in which each chip can contain more than one processor and each computer can contain more than one chip; modern supercomputers are built this way. In such a system, the tasks that run concurrently are called threads. It is essential to scale the capacity of the whole processor while keeping the change in execution time as small as possible. To achieve this, the workload and capacity must be balanced across the processors. It is also important to decide whether to use general-purpose processors or special-purpose IP cores. To keep a system with N processors busy, it must run N or more threads so that every processor always has work to do. In addition, the processors must be able to communicate with each other, usually through a shared memory in which each processor can access storage that the other processors also use. This introduces a new problem: thread safety. Whenever thread safety is violated, two processors (running threads) may access the same value at the same time.
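To make the thread-safety problem concrete, the following Python sketch (an illustration only, not the paper's VHDL design) lets four threads, standing in for the four cores, increment one shared counter. Without synchronization, the read-modify-write is split into separate load and store steps, so updates may be lost; a lock makes the critical section atomic. The function name `run` and its parameters are hypothetical.

```python
import threading


def run(use_lock: bool, n_threads: int = 4, iters: int = 10_000) -> int:
    """Let n_threads threads (standing in for cores) increment one shared counter."""
    shared = {"value": 0}        # the shared memory location
    lock = threading.Lock()      # mutual-exclusion lock guarding that location

    def worker() -> None:
        for _ in range(iters):
            if use_lock:
                with lock:                 # critical section: load, add, store as one unit
                    shared["value"] += 1
            else:
                tmp = shared["value"]      # load
                tmp += 1                   # modify
                shared["value"] = tmp      # store; another thread may have written in between

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"]


if __name__ == "__main__":
    # With the lock the result is always n_threads * iters; without it, updates may be lost.
    print("with lock:   ", run(use_lock=True))
    print("without lock:", run(use_lock=False))
```

Whether lost updates actually appear in the unlocked case depends on how the interpreter schedules threads, which is exactly why the unprotected version cannot be relied on.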

The future of thread- and process-level parallelism depends on two factors: the nature of the applications and the nature of the operating system. Researchers have proposed two different microarchitectures that exploit multiple threads of control: simultaneous multithreading (SMT) and chip multiprocessors (CMP). Chip multiprocessors use relatively simple single-thread processor cores that exploit only moderate amounts of parallelism within any one thread, while executing multiple threads in parallel across multiple processor cores. Wide-issue superscalar processors exploit instruction-level parallelism (ILP) by executing multiple instructions from a single program in a single cycle. Multiprocessors (MP) exploit thread-level parallelism (TLP) by executing different threads in parallel on different processors. As fabrication technology advances and individual gates shrink, the physical limits of semiconductor-based microelectronics have become a major design concern. These physical limits can cause significant heat dissipation and data synchronization problems. ILP techniques such as superscalar pipelining suit many applications but are inefficient for others that contain hard-to-predict code. Many applications are better suited to TLP techniques, and multiple independent CPUs are commonly used to increase a system's overall TLP.
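The thread-level parallelism described above, with at least one thread per processor, can be sketched by splitting a data-parallel workload into one chunk per core and combining the partial results. This Python sketch uses `concurrent.futures.ThreadPoolExecutor` with four workers as a stand-in for four cores; it illustrates how work is partitioned rather than hardware speedup, since CPython threads do not execute Python bytecode in parallel. The helpers `partition` and `parallel_sum_of_squares` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data: list[int], n: int) -> list[list[int]]:
    """Split data into n contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(data), n)
    chunks, start = [], 0
    for i in range(n):
        size = q + (1 if i < r else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def parallel_sum_of_squares(data: list[int], n_cores: int = 4) -> int:
    """Each worker (one per 'core') computes a partial result; the partials are then combined."""
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        partials = pool.map(lambda chunk: sum(x * x for x in chunk), partition(data, n_cores))
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1, 11))
    print(partition(data, 4))             # [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10]]
    print(parallel_sum_of_squares(data))  # 385
```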

A combination of increased available die space and the demand for greater TLP led to the development of multi-core CPUs. Several commercial incentives have also driven the development of multi-core architectures. For years, CPU performance could be improved by shrinking the feature size of the integrated circuit (IC), which reduced the cost per device on the IC. Alternatively, for the same circuit area, more transistors could be used in the design, which increased functionality, especially for complex instruction set computing (CISC) architectures. Clock rates also increased by orders of magnitude over the decades, from several megahertz to several gigahertz.

#### II. ARCHITECTURE OF THE QUAD CORE PROCESSOR

The quad-core architecture is designed specifically for an embedded concurrent processor. A concurrent processor focuses on the relationships among tasks: the precise coordination of communication among tasks and the synchronization of their access to resources such as input and output devices are the central concerns in designing a concurrent computing system. In addition, concurrent modules such as a SIMD array, mapping logic, and identical memories are added to each processor.
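Synchronizing the cores' access to a shared resource such as a bus or an I/O port is commonly handled by an arbiter. The Python sketch below models a round-robin arbiter for four requesters at the cycle level; the round-robin policy and the class `RoundRobinArbiter` are assumptions made for illustration, since the paper does not specify its arbitration scheme. Each cycle, the grant goes to the first requesting core after the one granted most recently, so no core is starved.

```python
from typing import Optional


class RoundRobinArbiter:
    """Cycle-level model of a round-robin arbiter for n requesters (hypothetical)."""

    def __init__(self, n: int = 4) -> None:
        self.n = n
        self.last = n - 1  # so that core 0 has the highest priority on the first cycle

    def grant(self, requests: list[bool]) -> Optional[int]:
        """Return the index of the granted core, or None if no core is requesting."""
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx  # rotate priority past the winner
                return idx
        return None


if __name__ == "__main__":
    arb = RoundRobinArbiter(4)
    all_requesting = [True, True, True, True]
    print([arb.grant(all_requesting) for _ in range(6)])  # [0, 1, 2, 3, 0, 1]
```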



Fig.1: Architecture of the Quad Core Processor

#### III. DESIGN

The proposed quad core executes multiple tasks concurrently. These tasks may be scheduled using the tree-height evaluation method, which requires restructuring the order of instruction execution. Each processor requires a clock to complete its work and executes one instruction per clock cycle. The quad core is responsible for several concurrent computing processes and consists of three key sections: the core processor, the quad RISC processor, and the I/O ports. Basic RISC processors are known as scalar RISC processors because they are designed to issue one instruction per cycle, like the base scalar processor. Four representative RISC-based processors are the Sun SPARC, Intel i860, Motorola M88100, and AMD 29000. Each of these processors uses a 32-bit instruction length, and their instruction sets contain 50 to 125 basic instructions. These four processors can therefore be considered general scalar RISC processors, each issuing strictly one instruction per cycle. Of the four, the Sun SPARC and i860 architectures are examined here. The Sun SPARC is derived from the original Berkeley RISC design. The parallel and distributed computing processor core essentially contains a SIMD array, mapping logic, and identical memories. The key considerations in designing the quad processor are the constraints and data flow of the parallel processors. The identical memories are important: they make the system flexible, give high throughput for both parallel and distributed approaches, and are well suited to parallel operation.
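Assuming the "tree-height evaluation method" refers to the standard compiler technique of tree-height reduction, the idea is to rebalance an associative expression so that independent subexpressions can execute in parallel on different cores. The Python sketch below (all function names hypothetical) compares the critical-path depth of a sequential, left-leaning sum with that of a balanced tree over the same operands.

```python
from typing import Union

Expr = Union[str, tuple]  # a leaf name, or ("+", left, right)


def chain(operands: list[str]) -> Expr:
    """Left-leaning tree ((a + b) + c) + ..., as written sequentially."""
    expr: Expr = operands[0]
    for op in operands[1:]:
        expr = ("+", expr, op)
    return expr


def balanced(operands: list[str]) -> Expr:
    """Balanced tree: split the operands in half recursively (valid because + is associative)."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return ("+", balanced(operands[:mid]), balanced(operands[mid:]))


def height(expr: Expr) -> int:
    """Number of dependent additions on the critical path."""
    if isinstance(expr, str):
        return 0
    return 1 + max(height(expr[1]), height(expr[2]))


if __name__ == "__main__":
    ops = ["a", "b", "c", "d", "e", "f", "g", "h"]
    print(height(chain(ops)), height(balanced(ops)))  # 7 3
    print(balanced(["a", "b", "c", "d"]))  # ('+', ('+', 'a', 'b'), ('+', 'c', 'd'))
```

Both trees perform seven additions, but in the balanced form the four additions at the lowest level are independent of each other, so four cores could execute them in the same cycle.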

The SIMD array holds multidimensional arrays of data and allows several processors to be used simultaneously to solve a single task. In the proposed system, an 8-bit Reduced Instruction Set Computer (RISC) processor design is used. It was designed with implementation efficiency and simplicity in mind. It has a complete instruction set, program memory and data memory, general-purpose registers, and a simple Arithmetic Logic Unit (ALU) for simple tasks. In this design, most instructions have a fixed length and a similar format. Arithmetic operations are restricted to CPU registers. The instruction cycle consists of three stages: fetch, decode, and execute. RISC processors generally offer the highest performance per unit area for parallel code. A larger number of RISC processor cores allows fine-grained control of dynamic voltage scaling and power-down. A RISC processor core with a simple architecture is easy to design and verify. Such a processor is an economical unit that is easy to shut down in the face of catastrophic defects and easier to reconfigure in the face of large parametric variation.
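The three-stage instruction cycle and register-only arithmetic described above can be modeled with a small interpreter. The instruction set below (`LDI`, `ADD`, `SUB`, `AND`, `OR`, `SHL`, `HALT`), the eight-register file, and the tuple encoding are hypothetical choices for illustration, not the paper's actual ISA; every result is masked to 8 bits to mirror the 8-bit datapath.

```python
MASK = 0xFF  # 8-bit datapath


def alu(op: str, a: int, b: int) -> int:
    """Simple ALU: arithmetic, logical and shift operations, truncated to 8 bits."""
    if op == "ADD":
        return (a + b) & MASK
    if op == "SUB":
        return (a - b) & MASK
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    if op == "SHL":
        return (a << b) & MASK
    raise ValueError(f"unknown ALU op {op}")


def execute_program(program: list[tuple], n_regs: int = 8) -> list[int]:
    """Run one instruction per loop iteration (fetch, decode, execute) until HALT."""
    regs = [0] * n_regs
    pc = 0
    while True:
        instr = program[pc]      # fetch
        op, *args = instr        # decode
        pc += 1
        # execute
        if op == "HALT":
            return regs
        if op == "LDI":          # load immediate: LDI rd, imm
            rd, imm = args
            regs[rd] = imm & MASK
        else:                    # register-register: OP rd, rs1, rs2
            rd, rs1, rs2 = args
            regs[rd] = alu(op, regs[rs1], regs[rs2])


if __name__ == "__main__":
    prog = [
        ("LDI", 0, 200),
        ("LDI", 1, 100),
        ("ADD", 2, 0, 1),  # 300 wraps to 44 in 8 bits
        ("SUB", 3, 1, 0),  # 100 - 200 wraps to 156
        ("HALT",),
    ]
    print(execute_program(prog)[:4])  # [200, 100, 44, 156]
```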

#### IV. FPGA IMPLEMENTATION

Modern FPGAs are large enough to implement quad-processor systems-on-chip, and commercial FPGA vendors also provide system design tools. To guide the implementation decisions, we developed a design procedure adapted to the particular tasks at hand. This approach to developing a digital-logic implementation of the required functionality comprises three stages, each of which involves a set of choices. The criteria used to evaluate the final set of choices are complexity and scalability. The complexity of an entity is characterized in two ways: its maximum operating frequency and its resource usage. Lower complexity is preferred; that is, the entity should have a higher frequency and use as few resources as possible. By definition, scalability is the ability of an entity to grow along a selected dimension with a negligible increase in complexity.
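The complexity criterion above, in which a higher maximum frequency and lower resource usage are both better, defines a partial order between candidate implementations. A minimal Python sketch, assuming each candidate is summarized by its post-place-and-route f_max and a single resource count (for example LUTs); the `Candidate` type and the figures in the usage example are hypothetical, not measured results for this design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    fmax_mhz: float  # maximum operating frequency after place and route
    resources: int   # e.g. number of LUTs used


def less_complex(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both criteria and strictly better on at least one."""
    no_worse = a.fmax_mhz >= b.fmax_mhz and a.resources <= b.resources
    better = a.fmax_mhz > b.fmax_mhz or a.resources < b.resources
    return no_worse and better


def least_complex(cands: list[Candidate]) -> list[Candidate]:
    """Candidates not dominated by any other candidate (the Pareto front)."""
    return [c for c in cands if not any(less_complex(o, c) for o in cands)]


if __name__ == "__main__":
    cands = [
        Candidate("A", 150.0, 1200),
        Candidate("B", 120.0, 1500),  # dominated by A
        Candidate("C", 180.0, 2000),  # faster but larger: a trade-off, not dominated
    ]
    print([c.name for c in least_complex(cands)])  # ['A', 'C']
```

When two candidates are not comparable, as A and C are here, the choice depends on the trade-off between throughput and operating frequency discussed in the design procedure.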

To illustrate complexity and scalability, consider the FPGA mapping of the design. First, select a specific architectural approach that provides a particular partitioning of the required functionality.

Second, consider the physical design choices for each architectural approach in terms of the temporal and spatial parallelism required to meet the data-rate and latency requirements. Finally, take into account the logic-implementation choices determined by the constraints of the target FPGA chip. Note that the choices in all three steps are interdependent. For instance, an initially attractive architectural approach may require logic design choices that lead to an excessively costly implementation; in that case, the choice of architectural approach must be revised. If feasible implementations of the selected architectural approach exist, the required performance can be achieved by balancing the trade-off between throughput and operating frequency. Higher throughput requires greater design complexity, whereas a high operating frequency requires that the design complexity be kept low. The resulting maximum frequency is strongly influenced by the CAD tools that automatically place and route the circuit.
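The throughput/frequency trade-off can be quantified with the usual relation: throughput = (results per clock cycle) × f_max. The Python sketch below is illustrative only; the parallelism factors and frequencies are hypothetical inputs, chosen to show that a more parallel (and therefore more complex) datapath pays off only if its f_max stays above a break-even value.

```python
def throughput_mops(results_per_cycle: int, fmax_mhz: float) -> float:
    """Throughput in millions of operations per second = results per cycle * f_max in MHz."""
    return results_per_cycle * fmax_mhz


def break_even_fmax(p_narrow: int, f_narrow: float, p_wide: int) -> float:
    """Lowest f_max at which a design producing p_wide results per cycle matches the narrow one."""
    return throughput_mops(p_narrow, f_narrow) / p_wide


def worth_widening(p_narrow: int, f_narrow: float, p_wide: int, f_wide: float) -> bool:
    """A wider design wins only if its throughput exceeds the narrow design's."""
    return throughput_mops(p_wide, f_wide) > throughput_mops(p_narrow, f_narrow)


if __name__ == "__main__":
    # Hypothetical: one result/cycle at 200 MHz vs four results/cycle at a lower f_max.
    print(throughput_mops(1, 200.0), throughput_mops(4, 120.0))  # 200.0 480.0
    print(break_even_fmax(1, 200.0, 4))                          # 50.0
    print(worth_widening(1, 200.0, 4, 40.0))                     # False: 160 < 200
```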



Fig.2: Flow Chart for Implementation of Quad Core Architecture



Fig.3: Simulation Results of ALU in Addition Operation



Fig.3: Simulation Results of ALU in Subtraction Operation



Fig.4: Simulation Results of Quad Core Processor Main Module

The figures above show the results of the proposed quad-core architecture simulated in the ModelSim simulator. They validate the parallel and distributed computation and demonstrate that the designed processor produces concurrent results across the quad-processor design.
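Simulation results such as the ALU addition and subtraction waveforms are typically checked against a reference (golden) model. A minimal Python sketch of such a model for an 8-bit adder/subtractor follows, assuming a carry-out flag for addition and a borrow flag for subtraction; the paper does not state which flags its ALU produces, so these flag definitions and the function names are assumptions. Expected values generated this way could be compared with the simulator output.

```python
def add8(a: int, b: int) -> tuple[int, int]:
    """8-bit addition: returns (sum, carry_out)."""
    total = a + b
    return total & 0xFF, total >> 8


def sub8(a: int, b: int) -> tuple[int, int]:
    """8-bit subtraction a - b: returns (difference, borrow), where borrow = 1 when b > a."""
    return (a - b) & 0xFF, int(b > a)


def check_exhaustive() -> int:
    """Check every 8-bit operand pair against wide integer arithmetic; return the pair count."""
    count = 0
    for a in range(256):
        for b in range(256):
            s, c = add8(a, b)
            assert s + (c << 8) == a + b
            d, br = sub8(a, b)
            assert d - (br << 8) == a - b
            count += 1
    return count


if __name__ == "__main__":
    print(add8(0xF0, 0x20))    # (16, 1)
    print(sub8(0x10, 0x20))    # (240, 1)
    print(check_exhaustive())  # 65536
```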



Fig.5: Floor Planning Diagram

#### V. CONCLUSION

The proposed quad processor core is an efficient and powerful core architecture, evaluated using Xilinx software and simulated using the ModelSim simulator. Although the quad processor core builds on well-proven superscalar architectural techniques, including parallel execution and register renaming, these techniques were originally developed for fine-grained parallelism between instructions. Implementing the quad core on an FPGA suits the architecture well, because the parallel processing and pipelining techniques it relies on are easily realized in an FPGA. The concurrent operation of the quad processor provides high speed while consuming less power.
