# Area Efficient Architecture for TCAM using Hybrid Partitioned SRAM

Sreelekshmi S.<sup>1</sup>, Pooja S. Mohan<sup>2</sup>

<sup>1</sup>PG scholar, Electronics and Communication Department, Kerala University, Sree Buddha College of Engineering, Pattoor, Kerala, India

<sup>2</sup>Assistant Professor, Electronics and Communication Department, Kerala University, Sree Buddha College of Engineering, Pattoor, Kerala, India

**Abstract**— Ternary content addressable memories (TCAMs) are a special type of memory that provides very high-speed search operations. Compared with traditional static random access memory (SRAM), however, TCAM suffers from drawbacks such as low bit density, high cost, low scalability and a limited choice of available variants. This paper presents a novel area efficient architecture for TCAM based on hybrid partitioned SRAM. The design is equipped with a gated-clocking scheme so that efficient power management is achieved. The architecture was verified in VHDL.

**Keywords**— Field programmable gate array (FPGA), linear priority encoder, hybrid partitioning, gated clocking, ternary content addressable memory (TCAM)

## I. INTRODUCTION

Content addressable memories (CAMs) are a type of memory used for high-speed search operations. A CAM is also known as associative storage or an associative array. Binary CAM is the simplest type of CAM; it searches for an input data word consisting only of 1s and 0s. Ternary CAM (TCAM) is a special type of CAM that allows a third state, "x", known as don't care or wildcard. This allows a TCAM to perform broader search operations. Because a TCAM operates in parallel, it is used in many applications such as computer networking devices, database engines, artificial neural networks and intrusion prevention systems.

A single TCAM cell consists of two static random access memory cells and a comparison circuit. The comparison circuit makes the typical TCAM architecture more complex than SRAM. TCAM devices are also far less widely available than SRAM, and the access time of TCAM is 3.4 times longer than that of SRAM. These factors limit conventional TCAM in comparison with SRAM. In this work, the TCAM table is therefore hybrid partitioned so that TCAM functionality is achieved with SRAM.

In this work, economical memory mapping is used to make the TCAM architecture more area efficient. This reduces the area of the architecture without affecting the processing time and delay of the TCAM. In addition, gated clocking is introduced in the TCAM design to achieve effective power management.

## II. TCAM ARCHITECTURE

The traditional TCAM architecture is tested here and compared with the modified TCAM architecture.

#### A. Hybrid Partitioning of TCAM Table

The overall architecture of TCAM is built by hybrid partitioning the conventional TCAM table. Hybrid partitioning is the combination of vertical and horizontal partitioning. Vertical partitioning means column-wise partitioning and horizontal partitioning means row-wise partitioning of TCAM Table.

The hybrid partitioned TCAM table is shown in Table I. The layers are formed by horizontal partitioning, and vertical partitioning is done inside these layers.

Here the TCAM table is partitioned into two layers, layer 1 and layer 2, by horizontal partitioning. The hybrid partitions $HP_{11}$, $HP_{12}$, $HP_{21}$ and $HP_{22}$ are formed by dividing each of these layers column-wise into sub-words. With the help of vertical partitioning the size of the memory can be effectively reduced.

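TABLE I HYBRID PARTITIONED TCAM TABLE

| Address | Ternary sub-word 1 | Ternary sub-word 2 | Layer |
|---------|--------------------|--------------------|-------|
| 0       | 10 ($HP_{11}$)     | 10 ($HP_{12}$)     | 1     |
| 1       | 01 ($HP_{11}$)     | 01 ($HP_{12}$)     | 1     |
| 2       | 0x ($HP_{21}$)     | 11 ($HP_{22}$)     | 2     |
| 3       | 11 ($HP_{21}$)     | 1x ($HP_{22}$)     | 2     |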
#### B. Classical TCAM Architecture

Fig 1 shows the overall architecture of TCAM. It consists of L layers and a CAM priority encoder (CPE). The output of each layer is a probable match address (PMA). These PMAs are fed into the CPE, which selects the match address (MA) among them.
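A minimal sketch of the CPE stage follows. The priority rule is an assumption, since the text does not state one: here the lowest address wins, and a layer without a match reports `None`. The function name `cam_priority_encoder` is hypothetical.

```python
def cam_priority_encoder(pmas):
    """Select the match address (MA) from the per-layer PMAs.

    pmas holds one entry per layer; None means that layer found no match.
    Assumption: the lowest address has the highest priority.
    """
    valid = [pma for pma in pmas if pma is not None]
    return min(valid) if valid else None


print(cam_priority_encoder([None, 2]))     # 2
print(cam_priority_encoder([1, 3]))        # 1
print(cam_priority_encoder([None, None]))  # None
```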



Fig 1. Overall architecture of TCAM

Fig 2 shows the classical TCAM layer architecture. It consists of a validation memory (VM), a 1-bit AND operation, an original address table address memory (OATAM), a k-bit AND operation and a layer priority encoder.



Fig 2. Single layer architecture of TCAM

The validation memory (VM) is of size $2^{w} \times 1$, where $w$ is the number of bits in each sub-word; that is, it contains $2^{w}$ rows of one bit each. The sub-words are first applied to the VM, where each sub-word acts as an address. If the location addressed by the sub-word holds a logic high, the input sub-word is present; otherwise it is absent. Thus the VM checks for the presence or absence of a particular sub-word. Table II shows the mapping of the sub-words 00, 01 and 11. The outputs of the VMs are fed into the 1-bit AND operation module, and the search continues based on the output of this module: it is sustained only if the 1-bit AND operation outputs a logic high. Otherwise, the search stops.

The classical TCAM contains a memory module named the original address table address memory (OATAM). The enable signal from the 1-bit AND operation, along with the input sub-word, is fed into the OATAM. It generates the corresponding original address table address (OATA), which is fed as input to the original address table (OAT), where the search word is actually stored. The outputs of the OATs are k-bit words, which are fed into the k-bit AND operation module, from which the PMA is generated.

TABLE II TCAM DATA MAPPING

| Address | $VM_{21}$ | $VM_{22}$ | $OATAM_{21}$ | $OATAM_{22}$ | $OAT_{21}$ (original addresses 2 3) | $OAT_{22}$ (original addresses 2 3) |
|---------|-----------|-----------|--------------|--------------|-------------------------------------|-------------------------------------|
| 0       | 1         | 0         | 0            | -            | 1 0                                 | 0 1                                 |
| 1       | 1         | 0         | 1            | -            | 1 0                                 | 1 1                                 |
| 2       | 0         | 1         | -            | 0            | 0 1                                 | 0 0                                 |
| 3       | 1         | 1         | 2            | 1            | 0 0                                 | 0 0                                 |
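The data mapping of Table II can be reproduced with the sketch below. It is an illustration, not the authors' implementation: the helper names `expand` and `map_partition` are hypothetical, and it assumes that valid sub-words receive consecutive OAT rows in increasing sub-word order, which is what the OATAM columns of Table II suggest. Unused entries (shown as "-" in the table) are represented by `None`.

```python
from itertools import product


def expand(subword):
    """Expand a ternary sub-word into every binary sub-word it covers."""
    choices = ["01" if bit == "x" else bit for bit in subword]
    return ["".join(bits) for bits in product(*choices)]


def map_partition(entries, subword_bits):
    """Build VM, OATAM and OAT for one hybrid partition.

    entries: list of (address, ternary sub-word) in address order.
    """
    rows = 2 ** subword_bits
    present = {}  # binary sub-word value -> one bit per stored address
    for column, (_, subword) in enumerate(entries):
        for binary in expand(subword):
            bits = present.setdefault(int(binary, 2), [0] * len(entries))
            bits[column] = 1
    vm = [1 if value in present else 0 for value in range(rows)]
    oatam = [None] * rows
    oat = []
    for value in range(rows):  # assumed: consecutive OAT rows for valid sub-words
        if vm[value]:
            oatam[value] = len(oat)
            oat.append(present[value])
    return vm, oatam, oat


vm21, oatam21, oat21 = map_partition([(2, "0x"), (3, "11")], 2)
vm22, oatam22, oat22 = map_partition([(2, "11"), (3, "1x")], 2)
print(vm21, oatam21, oat21)  # [1, 1, 0, 1] [0, 1, None, 2] [[1, 0], [1, 0], [0, 1]]
print(vm22, oatam22, oat22)  # [0, 0, 1, 1] [None, None, 0, 1] [[0, 1], [1, 1]]
```

The OAT rows that Table II shows as all zeros are simply never allocated in this sketch, because no valid sub-word points at them.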

#### C. Modified TCAM Architecture

The general architecture of TCAM is modified so that it becomes more area efficient. This is done by removing the OATAM memory block in each layer from the general architecture. Fig 3 shows the resulting architecture.



Fig 3. Area efficient architecture of TCAM

In contrast to the traditional TCAM architecture, the sub-word sw is passed directly into the OATs, and the output of the 1-bit AND operation block is also given to the OATs. With this modification the architecture becomes more area efficient, as the synthesis results show.
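One way to picture a layer without the OATAM is sketched below. The memory layout is an assumption, because the text does not specify it: each OAT is given one row per possible sub-word value, so the sub-word itself serves as the OAT address. The contents are derived from the layer 2 mapping in Table II, and all names are hypothetical.

```python
# Assumed layout: OAT rows are indexed directly by sub-word value 00, 01, 10, 11.
VM21 = [1, 1, 0, 1]
VM22 = [0, 0, 1, 1]
OAT21 = [[1, 0], [1, 0], [0, 0], [0, 1]]
OAT22 = [[0, 0], [0, 0], [0, 1], [1, 1]]
LAYER_ADDRESSES = [2, 3]  # original addresses held by layer 2


def search_modified_layer(subwords):
    """Return the PMA of layer 2 for a pair of binary sub-words, or None."""
    vms, oats = (VM21, VM22), (OAT21, OAT22)
    indexes = [int(sw, 2) for sw in subwords]
    # 1-bit AND of the VM outputs; it enables the OAT read.
    if not all(vm[i] for vm, i in zip(vms, indexes)):
        return None
    match = [1] * len(LAYER_ADDRESSES)
    for oat, i in zip(oats, indexes):
        match = [m & bit for m, bit in zip(match, oat[i])]  # k-bit AND
    for address, bit in zip(LAYER_ADDRESSES, match):  # layer priority encoder
        if bit:
            return address
    return None


print(search_modified_layer(["00", "11"]))  # 2
print(search_modified_layer(["11", "10"]))  # 3
print(search_modified_layer(["10", "11"]))  # None
```

Under this assumption the address translation step disappears, at the price of OAT rows that stay all zero for sub-words that are not stored.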

In the modified design, clock gating is applied at the coding level so that the architecture becomes more power efficient than the classical TCAM.
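The idea behind clock gating can be shown with a small behavioural model. This is a generic sketch rather than the gating logic of the paper's VHDL design: the class `GatedRegister` is hypothetical, and it only counts how many clock edges actually reach a register when the clock is ANDed with an enable signal.

```python
class GatedRegister:
    """Behavioural model of a register whose clock is gated by an enable."""

    def __init__(self):
        self.value = 0
        self.clocked_edges = 0  # edges that pass the gate and reach the register

    def rising_edge(self, enable, data):
        # Gated clock = clk AND enable: with enable low the register sees no edge.
        if enable:
            self.clocked_edges += 1
            self.value = data
        return self.value


reg = GatedRegister()
for enable, data in [(1, 5), (0, 6), (0, 7), (1, 8)]:
    reg.rising_edge(enable, data)
print(reg.value, reg.clocked_edges)  # 8 2
```

Of the four clock cycles only two toggle the register, which is the kind of switching activity that clock gating is meant to avoid.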

## III. SEARCH OPERATION

In the modified architecture one memory block, the original address table address memory, is removed from each layer so that the area is efficiently utilized. Here the data mapping done in the classical TCAM and in the modified TCAM are compared and evaluated. In the classical TCAM, the TCAM table is logically partitioned into hybrid partitions. Each hybrid partition is then expanded into a binary version. The steps of the search operation are listed in Table III.

TABLE III SEARCH OPERATION IN LAYER TWO

| Step | Activity                                                      |
|------|---------------------------------------------------------------|
| 1    | Sub-word1 = 00<br>Sub-word2 = 11                              |
| 2    | Sub-word1 is applied to VM21<br>Sub-word2 is applied to VM22  |
| 3    | Read out bit from VM21 = 1<br>Read out bit from VM22 = 1      |
| 4    | Have all VMs validated their sub-words?                       |
| 5    | If yes, sustain search operation                              |
| 6    | Read out data from OAT21 = 10<br>Read out data from OAT22 = 11 |
| 7    | Result of k-bit AND operation = 10                            |
| 8    | PMA = 2                                                       |
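The steps of Table III can be traced with the sketch below, which hard-codes the layer 2 memories from Table II. The function name `search_layer_two` and the dictionary layout are illustrative assumptions; "-" entries of the OATAM are written as `None`.

```python
# Layer 2 memories copied from Table II.
VM = {"21": [1, 1, 0, 1], "22": [0, 0, 1, 1]}
OATAM = {"21": [0, 1, None, 2], "22": [None, None, 0, 1]}
OAT = {"21": [[1, 0], [1, 0], [0, 1]], "22": [[0, 1], [1, 1]]}
ADDRESSES = [2, 3]  # original addresses covered by the OAT columns


def search_layer_two(subword1, subword2):
    """Trace the classical search in layer two and return the intermediate values."""
    trace = {}
    i1, i2 = int(subword1, 2), int(subword2, 2)    # steps 1-2: sub-words address the VMs
    trace["vm"] = (VM["21"][i1], VM["22"][i2])     # step 3: read the validation bits
    if not all(trace["vm"]):                       # steps 4-5: 1-bit AND decides
        trace["pma"] = None
        return trace
    row1 = OAT["21"][OATAM["21"][i1]]              # step 6: OATAM gives the OAT row
    row2 = OAT["22"][OATAM["22"][i2]]
    trace["oat"] = (row1, row2)
    trace["and"] = [a & b for a, b in zip(row1, row2)]  # step 7: k-bit AND
    trace["pma"] = next(                           # step 8: first set bit gives the PMA
        (addr for addr, bit in zip(ADDRESSES, trace["and"]) if bit), None
    )
    return trace


result = search_layer_two("00", "11")
print(result["vm"], result["oat"], result["and"], result["pma"])
# (1, 1) ([1, 0], [1, 1]) [1, 0] 2
```

The input word 0011 matches the stored word 0x11 at address 2, which agrees with the PMA in step 8.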

## IV. RESULTS

#### A. Simulation Results

The area efficient hybrid partitioned TCAM architecture is analyzed, coded in VHDL and simulated in Xilinx design suite 14.2 using the ISim simulator. The performance of the classical TCAM is verified in the same way, as shown in Fig 4.

[ISim waveform screenshot: signals clk, sel, r_wb, c[3:0] = 1111, the layer data signals l1_d1_1 to l2_d2_2, ma[1:0] = 11 and clk_period = 20000 ps, displayed from 1,000 ns to 1,400 ns.]

Fig 4. Simulation result of classical TCAM

The major benefits offered by the modified architecture are its simpler structure and easy scalability. The proposed TCAM follows the trend of general SRAM in availability and simplicity. It is much faster than traditional CAM architectures. The clock gating technique employed in the modified design, which switches off unused parts by means of enable and disable signals, also helps in power management.



Fig 5. Simulation result of area efficient TCAM

#### B. Synthesis Results

Device utilization summary (selected device: 3s250etq144-5):

| Resource                   | Used | Available | Utilization |
|----------------------------|------|-----------|-------------|
| Number of Slices           | 139  | 2448      | 5%          |
| Number of Slice Flip Flops | 104  | 4896      | 2%          |
| Number of 4 input LUTs     | 226  | 4896      | 4%          |
| Number of IOs              | 41   |           |             |
| Number of bonded IOBs      | 41   | 108       | 37%         |
| IOB Flip Flops             | 2    |           |             |
| Number of GCLKs            | 3    | 24        | 12%         |

Fig 6. Device utilization of classical TCAM

Device utilization summary (selected device: 3s250etq144-5):

| Resource                   | Used | Available | Utilization |
|----------------------------|------|-----------|-------------|
| Number of Slices           | 105  | 2448      | 4%          |
| Number of Slice Flip Flops | 76   | 4896      | 1%          |
| Number of 4 input LUTs     | 174  | 4896      | 3%          |
| Number of IOs              | 41   |           |             |
| Number of bonded IOBs      | 41   | 108       | 37%         |
| IOB Flip Flops             | 2    |           |             |
| Number of GCLKs            | 3    | 24        | 12%         |

Fig 7. Device utilization of area efficient TCAM

#### C. Comparison of Results

Table IV shows the comparison between the classical TCAM and the modified TCAM. It is clear that the modified version makes the architecture more area efficient without affecting the performance: the slice count falls from 139 to 105 while the delay stays at 4.040.

TABLE IV COMPARISON OF CLASSICAL AND AREA EFFICIENT TCAM

| Architecture                          | Area (slices out of 2448) | Delay |
|---------------------------------------|---------------------------|-------|
| TCAM based on hybrid partitioned SRAM | 139                       | 4.040 |
| Area efficient TCAM                   | 105                       | 4.040 |