# PERFORMANCE OPTIMIZATION OF MULTI-CORE PROCESSORS USING CORE HOPPING - THERMAL AND STRUCTURAL

by

# SUNIL LINGAMPALLI

Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial Fulfillment of the Requirements for the Degree of

MASTER OF SCIENCE IN MECHANICAL ENGINEERING

THE UNIVERSITY OF TEXAS AT ARLINGTON

AUGUST 2011

Copyright © by Sunil Lingampalli 2011

All Rights Reserved

# ACKNOWLEDGEMENTS

It gives me immense pleasure to acknowledge the support and help of my advisor, Prof. Dereje Agonafer, for his guidance, mentoring and encouragement throughout my research. I am very grateful for his endless support, trust and belief in me, which helped my research and built my confidence.

Besides my advisor, I also extend my thanks to my committee members, Prof. Hajji Sheikh and Prof. Seiichi Nomura, for serving on my committee. It has been my pleasure and honor to work with every member of EMNSPC. A special mention goes to Fahad Mirza, Avinash Raghu, Poornima Myanmpati and Bharathkrishnan Muralidharan, who helped and supported me. Special thanks to Ms. Sally Thompson for her help throughout my study.

I would also like to thank all my friends and my roommates here for their support. They always supported and encouraged me with their best wishes. Thanks to one and all for standing by me during my tough times.

Last but not least, I thank my family, without whom this would not have been possible. I owe my love and gratitude to God almighty, my parents and my sister, who mean everything to me. Thank you for being a great and loving family.

July 18, 2011


# ABSTRACT

# PERFORMANCE OPTIMIZATION OF MULTI-CORE PROCESSORS USING CORE HOPPING - THERMAL AND STRUCTURAL

Sunil Lingampalli, M.S.

The University of Texas at Arlington, 2011

Supervising Professor: Dereje Agonafer

As the workload on a single-core processor increases, its processing speed increases, resulting in higher power densities and die temperatures. Higher die temperatures decrease performance and reliability and increase leakage currents and cooling costs. Multi-core processors have been implemented to reduce the workload and the cooling cost of a single core.

Multicore processors, also known as Chip Multiprocessors (CMPs), contain two or more independent cores on a single chip. In a CMP, if one core reaches its critical temperature, its workload is transferred to another core; this is termed core hopping. In addition, power is not distributed uniformly across the die, which results in hot spots. Core hopping distributes the workload more uniformly among the cores, which improves performance and reliability.

The demand for greater performance in computing-intensive applications has led to many cores being placed on a single chip, and each succeeding processor generation is predicted to hold twice as many cores as the previous one. In this study, core hopping for CMPs is analyzed and a thermal analysis of the chip is performed using ANSYS. Furthermore, the hop sequence is optimized as a function of the chip temperature distribution, and a thermo-mechanical analysis of the package is carried out to estimate its structural integrity.

# TABLE OF CONTENTS

| Contents                                                 |
|----------------------------------------------------------|
| ACKNOWLEDGEMENTS                                         |
| ABSTRACT                                                 |
| LIST OF ILLUSTRATIONS                                    |
| LIST OF TABLES                                           |
| 1. INTRODUCTION                                          |
| 1.1 Levels in Electronic Packaging                       |
| 1.2 Different Types of IC Packages                       |
| 1.3 Flip Chip                                            |
| 2. MULTI CORE PROCESSOR AND THERMAL CHALLENGES           |
| 2.1 Thermal Challenges in CMPs                           |
| 3. DYNAMIC THERMAL MANAGEMENT (DTM)                      |
| 3.1 Dynamic Thermal Management Triggering Mechanisms     |
| 3.1.1 Reducing the amount of energy dissipated           |
| 3.1.2 Distributing processor activity over the chip area |
| 3.2 Core Hopping                                         |
| 4. LITERATURE REVIEW                                     |
| 4.1 Introduction                                         |
| 4.2 Multi core and many core                             |
| 4.3 Dynamic Thermal Management                           |
| 4.4 Thread Migration                                     |
| 5. DESIGN AND MODELLING                                  |
| 5.1 Package description                                  |
| 5.2 Modeling and Methodology                             |
| 6. RESULTS AND DISCUSSIONS                               |
| 6.1 Introduction                                         |
| 6.2 Results                                              |
| 6.3 Stress Analysis                                      |
| 7. CONCLUSION AND FUTURE WORK                            |
| 7.1 Conclusion                                           |
| 7.2 Future work                                          |
| REFERENCES                                               |
| BIOGRAPHICAL INFORMATION                                 |

# LIST OF ILLUSTRATIONS

| Figure                                                                                                                                |
|---------------------------------------------------------------------------------------------------------------------------------------|
| 1.1 Industries that use microsystems                                                                                                  |
| 1.2 Level 1 or IC level packaging                                                                                                     |
| 1.3 Level 2 or System level packaging                                                                                                 |
| 1.4 Level 3 packaging hierarchy                                                                                                       |
| 1.5 Through hole packages                                                                                                             |
| 1.6 Surface mount packages                                                                                                            |
| 1.7 Schematic of flip chip interconnect system                                                                                        |
| 2.1 Moore's law projection                                                                                                            |
| 2.2 Core temperature vs. Power Consumption                                                                                            |
| 2.3 Graph showing Intel's increase in Power consumption                                                                               |
| 2.4 CPU Power Consumption vs. Voltage                                                                                                 |
| 2.5 Cooling costs with increasing Thermal Dissipation                                                                                 |
| 3.1 Graphical Representation of DTM techniques                                                                                        |
| 3.2 Mechanism of DTM                                                                                                                  |
| 4.1 Transistor Integration Capacity                                                                                                   |
| 4.2 Graphical representation of Pollack's rule                                                                                        |
| 5.1 Set up of the model Used                                                                                                          |
| 5.2 Isometric view of the Flip Chip Package                                                                                           |
| 5.3 (a) Side view of the package (b) Zoomed in side view of the package                                                               |
| 6.1 Model that has been meshed in Icepak and imported to Fluent                                                                       |
| 6.2 Arrangements and Numbering of Cores                                                                                               |
| 6.3 Core hopping (a) Before Core hopping and (b) After Core hopping                                                                   |
| 6.4 Import procedure from Fluent to Transient Structural in ANSYS Workbench                                                           |
| 6.5 Temperature along the package                                                                                                     |
| 6.6 (a) Stress results of the entire package and (b) Graph representing stress variation in the entire package over a period of 5 s |
| 6.7 (a) Heat sink Top view and (b) Bottom view                                                                                        |
| 6.8 (a) TIM-2 Top view and (b) Bottom view                                                                                            |
| 6.9 (a) Heat Spreader Top view and (b) Bottom view                                                                                    |
| 6.10 (a) TIM-1 Top view and (b) Bottom view                                                                                           |
| 6.11 (a) Die Top view and (b) Bottom view                                                                                             |
| 6.12 (a) C4 Underfill Top view and (b) Bottom view                                                                                    |
| 6.13 (a) Substrate Top view and (b) Bottom view                                                                                       |
| 6.14 (a) Cupad Top view and (b) Bottom view                                                                                           |
| 6.15 (a) Solder Joint Top view and (b) Bottom view                                                                                    |
| 6.16 (a) Cupad bottom Top view and (b) Bottom view                                                                                    |
| 6.17 (a) PCB Top view and (b) Bottom view                                                                                             |

# LIST OF TABLES

| Table                                                        |
|--------------------------------------------------------------|
| 5.1 Dimensions and properties of the Components in the model |
| 6.2 Sequence in which the Cores Start Hopping                |
| 6.3 Material Properties of the Components                    |

# CHAPTER 1

# INTRODUCTION

Electronic packaging design is the process by which interconnects are laid out geometrically on circuit cards and system-level boards, defining the electrical signal and power paths through the package in a way that meets the overall system requirements [1].

An electronic package provides four major functions:

- i. Interconnection of electrical signals
- ii. Mechanical protection of circuits
- iii. Distribution of electrical energy for circuit function
- iv. Dissipation of heat generated by circuit function [2].

Electronic packaging is a technology used in almost all industries, such as computers, automobiles, mobile phones, aerospace and robotics, and it draws on engineering disciplines such as physics, chemistry, materials, electrical, mechanical and thermal engineering [3].



Figure 1.1 Industries that use microsystems [1]

## 1.1 Levels in Electronic Packaging

Electronic packaging can be classified into three levels [1]:

i. Level 1: Also called device-level or IC-level packaging, this level involves interconnecting, powering, cooling and protecting ICs [1].

![](_page_11_Figure_3.jpeg)

Figure 1.2 Level 1 or IC level packaging [1]

ii. Level 2: Also called system-level packaging, this level involves assembling all the components onto the motherboard, which interconnects them to form one system [1].


Figure 1.3 Level 2 or System level packaging [1]

iii. Level 3: At this level, several boards are connected by cables and connectors to form an entire system, such as the systems corporations use to store and process large amounts of data [1].

![](_page_12_Picture_2.jpeg)

Figure 1.4 Level 3 packaging hierarchy [1]

## 1.2 Different Types of IC Packages

IC packages can be divided into two categories, (i) through-hole and (ii) surface mount, based on how the packages are assembled onto Printed Circuit Boards (PCBs).

i) Through hole: A through-hole package has pins that are inserted into holes in the PCB. Figure 1.5 shows different types of through-hole packages [1].

![](_page_13_Figure_0.jpeg)


Figure 1.5 Through hole packages [1]

ii) Surface mount: These packages are mounted directly onto the surface of the PCB. Their main advantage is that both sides of the board can be used, so a higher packaging density can be achieved [1]. Figure 1.6 shows different kinds of surface mount packages.

![](_page_13_Figure_4.jpeg)


Figure 1.6 Surface mount packages [1]

## 1.3 Flip Chip

Flip chip is an advanced form of surface mount technology in which the active face of a bare semiconductor chip is turned upside down (flipped) and bonded directly to the PCB, hence the name. Bonding is generally done with solder interconnects, which make the mechanical and electrical connections between the chip and the PCB [1].

![](_page_14_Figure_2.jpeg)

Figure 1.7 Schematic of flip chip interconnect system [1].

# CHAPTER 2

# MULTI CORE PROCESSOR AND THERMAL CHALLENGES

According to Moore's law, the number of transistors on a chip doubles every 18 months [1]. Multicore processors, also known as Chip Multiprocessors (CMPs), contain two or more independent cores on a single chip. This architecture entered the mainstream with processors such as the Intel Core Duo and has since been adopted by AMD, NVIDIA and others. As the transistor count keeps increasing with continued technology scaling, CMPs are the only effective way of integrating billions of transistors on a chip [4].

![](_page_15_Figure_3.jpeg)

Figure 2.1 Moore's law projection [1]

As the workload on a single-core processor increases, its processing speed increases, resulting in higher power densities and die temperatures. Higher die temperatures decrease performance and reliability and increase leakage currents and cooling costs. To reduce the workload and cooling cost of a single core while increasing performance, multi-core processors have been implemented.

## 2.1 Thermal Challenges in CMPs

As CMP technology scales, i.e., as chip geometry shrinks into the sub-100 nm realm, transistor density increases and so does leakage current, leading to excessive power consumption and heat generation; this is a major challenge for future CMPs [4]. Figure 2.2 shows that, for two different AMD processors, core temperature rises as power consumption increases [7].

![](_page_16_Figure_3.jpeg)

Figure 2.2 Core temperature vs. Power Consumption [7].

With the shrinking of chip geometry and the move towards billion-transistor microprocessors, the power budget of CMPs must be addressed at the design level, because the power dissipation of current and future processors increases with clock frequency and transistor count [5, 6]. Figure 2.3 shows the power consumption of Intel processors from 1970 to 2005 [8].

![](_page_17_Figure_0.jpeg)

Figure 2.3 Graph showing Intel's increase in Power consumption [8]

![](_page_17_Figure_3.jpeg)

Figure 2.4 CPU power consumption vs. voltage (a) in idle mode and (b) when in use [9]

Figure 2.4 (a) and (b) show the power consumption of present-day Intel processors in idle mode and when in use [9].

Huang et al. [10] derived expressions for the power consumed by a core of fixed architecture, and for its power density, across technology generations:

Power consumed:

$$P_{n+1} = \left(\frac{V_{dd,n+1}}{V_{dd,n}}\right)^2 P_n \tag{1}$$

Power density:

$$PD_{n+1} = \left(\frac{1}{s}\right)^2 \left(\frac{V_{dd,n+1}}{V_{dd,n}}\right)^2 PD_n \tag{2}$$

where $V_{dd}$ is the supply voltage, the subscripts $n$ and $n+1$ denote successive technology generations, and $s$ in equation (2) is the scaling factor.
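As a worked illustration of equations (1) and (2), the short Python sketch below evaluates both for one generation step. The helper `next_generation` and all numerical values (a 10 W core at 50 W/cm², Vdd falling from 1.1 V to 1.0 V, s = 0.7) are hypothetical, and s is assumed to be a linear shrink factor below 1, so that the area of a fixed-architecture core scales by s².

```python
def next_generation(p_n, pd_n, vdd_n, vdd_next, s):
    """Apply equations (1) and (2) for one technology generation.

    p_n, pd_n : power (W) and power density of the core in generation n
    vdd_n     : supply voltage in generation n (V)
    vdd_next  : supply voltage in generation n+1 (V)
    s         : scaling factor (assumed < 1, i.e. a linear shrink)
    """
    v_ratio_sq = (vdd_next / vdd_n) ** 2
    p_next = v_ratio_sq * p_n                      # equation (1)
    pd_next = (1.0 / s) ** 2 * v_ratio_sq * pd_n   # equation (2)
    return p_next, pd_next


# Hypothetical example: 10 W core at 50 W/cm^2, Vdd 1.1 V -> 1.0 V, s = 0.7
p, pd = next_generation(10.0, 50.0, 1.1, 1.0, 0.7)
print(f"power: {p:.2f} W, power density: {pd:.1f} W/cm^2")
```

With these assumed values the core's power falls by about 17 percent while its power density rises by about 69 percent, because the same core now occupies roughly half the area. This is the trend behind the hot-spot concerns discussed next.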

Many-core designs pose a particular thermal problem: primary cores consume more power than the simpler cores, which results in localized hot spots [10]. Cho et al. [11] make the point that at any given time not all cores in a CMP are active; different cores at different locations are active at different times, resulting in non-uniform power consumption. Scaling down silicon technology also means that there is significant thermal coupling between neighboring cores [4].

![](_page_18_Figure_6.jpeg)

Figure 2.5 Cooling costs with increasing Thermal Dissipation [6]

# CHAPTER 3

# DYNAMIC THERMAL MANAGEMENT (DTM)

Technology scaling, the continuous increase in the number of cores and the exponential increase in power density together pose a thermal challenge. Moving towards billion-transistor microprocessors means that thermal management must be addressed from the chip design cycle itself. In [16] it is stated that beyond 40 W, any additional power dissipation increases the cooling cost by more than \$1 per W [15, 17].

DTM controls the chip's operating temperature by enabling various hardware and software techniques at runtime [16]. Current microprocessors slow down or power down when a predetermined temperature is reached, which decreases performance, whereas DTM provides low-cost thermal management that reliably reduces power and keeps the chip below its safe operating temperature with very little effect on processor performance [18, 16].

![](_page_19_Figure_4.jpeg)

Figure 3.1 Graphical Representation of DTM techniques [5]

Figure 3.1 gives an overview of the DTM technique. The solid curve represents the chip temperature without DTM, and the dotted curve the temperature with DTM. Both curves are identical until DTM is initiated; after that, the chip reaches a higher temperature without DTM than with it. When DTM is initiated, power dissipation is reduced, and consequently the chip temperature never exceeds the temperature the cooling solution is designed for. Thus, although DTM keeps the chip temperature under control, it can cause some degradation in performance [16].

![](_page_20_Figure_1.jpeg)

Figure 3.2 Mechanism of DTM [16]

Figure 3.2 breaks down the DTM mechanism. A triggering event, signaled by thermal sensors or other gauges, initiates DTM when needed. After DTM is triggered there is an initiation delay, caused by interpreting the triggering event. Once the DTM response has begun there is a response delay, which depends on the type of response chosen. While the response is active, the temperature is checked; once it falls below the threshold, the system waits for the number of cycles it was designed for before turning off the DTM mechanism, and this delay is called the policy delay. Once DTM has decided to turn the response off, there is a shutoff delay, which may be due to readjusting the voltage or frequency [16]. All of these delays degrade performance.
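The sequence of delays described above can be illustrated with a toy discrete-time loop. The Python sketch below is an illustration under stated assumptions, not the scheme evaluated in [16]: the function `simulate_dtm`, the first-order thermal model, the parameter values and the step-counted delays are all hypothetical, and only the initiation and policy delays are modeled (the response and shutoff delays are taken as zero).

```python
def simulate_dtm(steps, p_full, p_throttled, threshold,
                 init_delay=2, policy_delay=3,
                 t_amb=45.0, r_th=0.5, alpha=0.2):
    """Toy DTM loop with an initiation delay and a policy delay.

    Each step first updates the DTM state, then moves the chip temperature
    a fraction alpha of the way toward t_amb + P * r_th (first-order model).
    Returns a list of (temperature, response_active) tuples.
    """
    temp = t_amb
    state, countdown = "off", 0
    trace = []
    for _ in range(steps):
        if state == "off":
            if temp > threshold:                 # trigger fires
                state, countdown = "initiating", init_delay
        elif state == "initiating":
            countdown -= 1                       # initiation delay
            if countdown <= 0:
                state = "on"                     # response takes effect
        elif state == "on":
            if temp < threshold:                 # cooled: start policy delay
                state, countdown = "policy", policy_delay
        elif state == "policy":
            if temp >= threshold:
                state = "on"                     # re-heated: keep responding
            else:
                countdown -= 1
                if countdown <= 0:
                    state = "off"                # response switched off
        active = state in ("on", "policy")
        power = p_throttled if active else p_full
        temp += alpha * (t_amb + power * r_th - temp)
        trace.append((temp, active))
    return trace


with_dtm = simulate_dtm(200, p_full=100.0, p_throttled=40.0, threshold=80.0)
no_dtm = simulate_dtm(200, p_full=100.0, p_throttled=40.0, threshold=1e9)
print(max(t for t, _ in with_dtm), max(t for t, _ in no_dtm))
```

With these assumed values the uncontrolled chip settles near 95 °C, whereas the DTM loop holds the peak below 90 °C by running throttled for part of the time; the overshoot above the 80 °C threshold comes from the initiation delay.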

## 3.1 Dynamic Thermal Management Triggering Mechanisms

A triggering mechanism acts much like an on-chip temperature sensor. The different types of triggering mechanisms used are:

- i. Temperature Sensors for Thermal Feedback: The temperature sensor provides thermal feedback; if the temperature exceeds the desired value, the operating system invokes a response mechanism. This is the basic triggering mechanism [12], [16].
- ii. On Chip Activity Counters: Another way to estimate chip temperature is through on-chip activity counters, which monitor the activity of various structures in the processor and thus provide a gauge of the workload and, correspondingly, of the thermal state of the machine [13], [16].
- iii. Dynamic Profile analysis: Sometimes the operating system has no work to perform and runs a waiting (idle) process, and some interactive user applications set certain acceptable performance levels; when the specified rate is exceeded, these applications allow dynamic thermal management to occur [14, 15, 16].
- iv. Compile-time trigger requirements: Just as static analysis at compile time can be used to estimate the performance of applications, the compiler can identify high-power code segments and insert instructions specifying that DTM triggers should occur [16].

When comparing the viability of the various triggering mechanisms with future research in mind, depending completely on temperature sensors is not encouraged, because such sensors have drawbacks: they only approximate the average chip temperature, and their readings may differ from the actual temperature. Rather than relying purely on hardware, a combination of hardware and software gives more effective results than either technique taken alone [16]. Some of the DTM response mechanisms are clock frequency scaling, voltage and frequency scaling, and decode throttling [16]. In [16] a DTM scheme is evaluated based on:

- i. Number of cycles where the thermal emergency threshold is exceeded.
- ii. Loss incurred in overall performance.

DTM is therefore a tradeoff between cutting power and preserving performance: DTM techniques inevitably cause some performance degradation.

Microarchitectural temperature control techniques are primarily classified into two categories:

- i. Reducing the amount of energy dissipated.
- ii. Distributing the processor activity over the chip area.

### 3.1.1 Reducing the amount of energy dissipated

In [16] DTM is used to regulate the temperature. When a core reaches a critical temperature it is stopped, which is achieved by clock gating that particular core. This involves freezing all operations and turning off all signals while the processor state is maintained [17]. A more aggressive way is to cut off the supply voltage to the core, which requires saving the architectural state before the supply voltage is removed [18].

Dynamic power dissipated is given by [20]

$$P_d = CV^2 f$$

where $C$ is the capacitance switched per clock cycle, $V$ is the supply voltage and $f$ is the switching frequency.

From the above equation, dynamic power varies with the square of V (halving V reduces it to one quarter) and linearly with f. [16] also mentions that the PowerPC G3 microprocessor uses a DTM technique based on decode throttling, which restricts the flow of instructions to the core; it relies mainly on clock gating to reduce power dissipation, disabling the instruction fetch unit and using the instruction fetch queue to feed the pipeline.
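The scaling behavior of $P_d = CV^2 f$ can be checked numerically. In the Python sketch below, the operating point (1 nF switched per cycle, 1.2 V, 2 GHz) and the helper `dynamic_power` are hypothetical; the last case assumes that frequency is lowered in proportion to voltage, as voltage and frequency scaling does.

```python
def dynamic_power(c, v, f):
    """Dynamic power P_d = C * V^2 * f (C in farads, V in volts, f in hertz)."""
    return c * v ** 2 * f


# Hypothetical operating point: 1 nF switched per cycle, 1.2 V, 2 GHz
base = dynamic_power(1e-9, 1.2, 2e9)

half_v = dynamic_power(1e-9, 0.6, 2e9) / base     # voltage halved
half_f = dynamic_power(1e-9, 1.2, 1e9) / base     # frequency halved
half_both = dynamic_power(1e-9, 0.6, 1e9) / base  # both halved (DVFS-style)
print(f"base {base:.2f} W, ratios: {half_v:.3f}, {half_f:.3f}, {half_both:.3f}")
```

Halving V alone cuts dynamic power to one quarter, halving f alone cuts it in half, and halving both cuts it to one eighth, which is why scaling voltage together with frequency is the more effective lever.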

### 3.1.2 Distributing processor activity over the chip area

Processor activity can be distributed by constantly relocating processing from core to core, so that the temperature of each part of the chip is maintained below the threshold temperature.

The most popular power-management technique of this kind is Thread Migration (TM), or core hopping. In CMPs, TM enables rapid movement of threads according to the varying computing needs of different applications, running on simple homogeneous cores with heterogeneous power-performance capabilities [19].

By exploiting variations arising from microarchitectural events, TM provides the benefits of DVFS (Dynamic Voltage and Frequency Scaling) at nanosecond time scales. In [19] the benefits of TM are explained: it enables fine-grained tracking of program behavior that conventional DVFS cannot, and it can provide effective intermediate voltages, thus cutting down system cost. Its limitation is that it relies on rapid movement of applications between cores, which restricts it to systems featuring simple homogeneous cores with a small amount of architected state [19]. TM is very effective on CMPs with small cores whose temperature needs to be controlled, since it acts on a per-core basis [18].


## 3.2 Core Hopping

Core hopping is an application of thread migration at a coarser time scale. It was proposed by Intel in 2002 as a means of reducing chip temperature by making threads jump from one core to another, thereby distributing the heat. This keeps the key transistors cooler, and the more uniform heat distribution improves overall performance [21, 22].

Core hopping was made possible by the introduction of Intel's 80-core Tera-scale Teraflop prototype chip [23]. In [24], core hopping was carried out on a dual-core POWER5 processor, and it was determined that core hopping does not incur expensive L2 cache misses and that the performance cost of warming up the L1 cache is insignificant. The performance degradation was measured to be less than 3 percent.
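The core hopping idea described in this section can be sketched with a toy model of one thread on a 16-core chip. In the Python sketch below, the function `simulate_core_hopping`, the first-order per-core thermal model (cores are assumed thermally independent, with no lateral coupling), the parameter values and the rule of hopping to the currently coolest core are all assumptions for illustration; they are not the hop sequence studied in this thesis.

```python
def simulate_core_hopping(hop, n_cores=16, steps=300, threshold=75.0,
                          p_active=60.0, t_amb=45.0, r_th=0.8, alpha=0.1):
    """Toy model of one thread running on an n-core chip.

    Each core moves a fraction alpha of the way toward t_amb + P * r_th per
    step. With hop=True, when the active core reaches the threshold the
    thread jumps to the currently coolest core.
    Returns (peak temperature seen on any core, number of hops).
    """
    temps = [t_amb] * n_cores
    active, hops, peak = 0, 0, t_amb
    for _ in range(steps):
        if hop and temps[active] >= threshold:
            coolest = min(range(n_cores), key=lambda c: temps[c])
            if coolest != active:
                active = coolest      # migrate the thread
                hops += 1
        for c in range(n_cores):
            power = p_active if c == active else 0.0
            temps[c] += alpha * (t_amb + power * r_th - temps[c])
        peak = max(peak, max(temps))
    return peak, hops


peak_hop, n_hops = simulate_core_hopping(hop=True)
peak_fixed, _ = simulate_core_hopping(hop=False)
print(f"with hopping: {peak_hop:.1f} C after {n_hops} hops; "
      f"without: {peak_fixed:.1f} C")
```

Without hopping, the single busy core heads toward about 93 °C under these assumptions; with hopping, each core is vacated shortly after it reaches 75 °C, so the peak stays just above the threshold while the heat is spread over the die.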

# CHAPTER 4

# LITERATURE REVIEW

## 4.1 Introduction

As the workload on a single-core processor increases, its processing speed increases, resulting in higher power densities and die temperatures. Higher die temperatures decrease performance and reliability and increase leakage currents and cooling costs. Multi-core processors have been implemented to reduce the workload and the cooling cost of a single core.

A detailed literature search was performed to gain a broader perspective on many-core processor technologies, the challenges involved, and the solutions examined and implemented.

## 4.2 Multi core and many core

In [25], Shekhar Borkar discusses fine-grain power management and system design for many-core systems. Referring to Figure 4.1, in 2001 a 300 mm<sup>2</sup> die in 130 nm technology had the capacity to integrate one billion transistors; if Moore's law continues to apply, by 2015 a 300 mm<sup>2</sup> die will hold 100 billion transistors, with almost 1.5 billion transistors for logic.

![](_page_26_Figure_0.jpeg)

Figure 4.1 Transistor Integration Capacity [5]

The performance increase obtainable from microarchitecture alone is governed by Pollack's rule, which states that performance increases roughly in proportion to the square root of the increase in complexity, as illustrated in Figure 4.2.

![](_page_26_Figure_3.jpeg)

Figure 4.2 Graphical representation of Pollack's rule [5]

[25] also states that two smaller processor cores, in place of one large monolithic core, can deliver 70 to 80 percent more performance, compared with only 40 percent from the large monolithic core. Borkar [25] points out that multiprocessors have several benefits: individual cores can be turned on or off to save power, a lower die temperature is maintained, which improves reliability, and tasks can be distributed among the cores so that a lower overall temperature is achieved.
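Pollack's rule and the comparison quoted from [25] can be checked with a few lines of arithmetic. The Python sketch below assumes that the large core has twice the complexity (logic area) of one small core and that two small cores have ideal, perfectly parallel throughput; the helper `pollack_performance` is hypothetical.

```python
import math


def pollack_performance(relative_complexity):
    """Pollack's rule: performance grows roughly as sqrt(complexity)."""
    return math.sqrt(relative_complexity)


one_small = pollack_performance(1.0)
big_core = pollack_performance(2.0)   # one core with twice the logic
two_small = 2 * one_small             # ideal throughput of two small cores

print(f"big core gain: {big_core / one_small - 1:.0%}")
print(f"two small cores (ideal): {two_small / one_small - 1:.0%}")
```

The single large core gains about 41 percent, matching the roughly 40 percent quoted above. The ideal 100 percent for two small cores is an upper bound; the 70 to 80 percent quoted in [25] lies below it.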

Huang et al. [26] state that an asymmetric many-core architecture creates a serious thermal problem, in which the primary, more complex cores create localized hot spots due to their higher power consumption. They also state that, because of a thermo-spatial low-pass filtering effect, the equivalent thermal resistance of smaller cores is lower, i.e., for the same power density, small cores run cooler than large cores. Huang et al. [26] also mention that some techniques used by Intel to improve performance, such as Intel "turbo mode", which boosts processing speed by increasing the supply voltage and frequency of the active cores, aggravate hot spots.

## 4.3 Dynamic Thermal Management

In [5], the authors illustrate the basic steps involved in implementing effective DTM techniques. The main aim of DTM is to provide inexpensive hardware or software responses that reliably reduce power with as little impact on performance as possible. DTM comprises initiation, triggering and response mechanisms, together with a policy for judiciously turning the mechanism on and off.

Some of the thermal triggering mechanisms used are temperature sensors, on-chip activity counters and dynamic profiling. A thermal triggering mechanism is used to cool down the processor when it reaches the threshold temperature. The response mechanisms examined in [5] are clock frequency scaling, voltage and frequency scaling, decode throttling, speculation control and I-cache toggling, the last three of which control the flow of instructions to the processor cores. DTM is evaluated by the number of cycles in which the thermal emergency threshold is exceeded, the overall performance and the execution time. The authors concluded that choosing the correct triggering mechanism and deciding when to activate it are significant for effective DTM [5].

The authors of [27] studied different DTM techniques for multicore processors. They created a detailed model of a 16-core multiprocessor to evaluate the performance of thermal management schemes under two conditions: the processor running multiple independent applications, and the processor handling two parallel applications, one consisting of cold threads and the other of hot threads. The thermal management schemes evaluated are based on thread migration (TM), dynamic voltage and frequency scaling (DVFS), and DVFS combined with TM. The TM policies evaluated are classified as: rotation, in which threads are migrated sequentially; temperature-based, in which thread assignments are based on temperature; power-based, in which threads are ranked by power and assigned to cores based on temperature; and counter-based, in which thread assignment is based on temperature difference. DVFS techniques are classified as local or global, depending on whether they are applied to individual cores or to the whole chip [27].
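Two of these TM policies can be sketched as simple assignment functions. The Python sketch below is a minimal illustration, assuming threads are identified by name, cores by index, and that each policy is invoked once per migration interval; the function names are hypothetical, and the policies in [27] may differ in detail.

```python
def rotation_policy(assignment, n_cores):
    """Rotation: every thread moves to the next core in sequence."""
    return {thread: (core + 1) % n_cores for thread, core in assignment.items()}


def power_based_policy(thread_power, core_temps):
    """Power-based: the highest-power thread goes to the coolest core,
    the next-highest to the next-coolest core, and so on."""
    threads = sorted(thread_power, key=thread_power.get, reverse=True)
    cores = sorted(range(len(core_temps)), key=lambda c: core_temps[c])
    return dict(zip(threads, cores))


print(rotation_policy({"t0": 0, "t1": 3}, n_cores=4))
# {'t0': 1, 't1': 0}
print(power_based_policy({"t0": 5.0, "t1": 12.0}, [70.0, 60.0, 80.0]))
# {'t1': 1, 't0': 0}
```

Rotation ignores the thermal state entirely and relies on regular movement to even out temperatures, while the power-based policy uses measured power and temperature to pair the hottest work with the coolest core.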

## 4.4 Thread Migration

Choi et al. [28] used hot spot mitigation and temporal schemes, implemented in the operating system (OS) level scheduler, on a POWER5 processor. In [28], threads were first programmed to migrate between the two cores after a fixed time interval; the results showed a temperature decrease of 5.5 degrees Celsius, and because the dual-core POWER5 has a shared L2 cache, the performance degradation was measured to be less than 3 percent. It was thus concluded that core hopping is one of the most viable thermal management techniques.

Rangan et al. [19] used a Sun Microsystems 16-core processor, organized as core clusters of 4 cores each, to study thread migration. They observed that by applying TM they could achieve multiple voltage-frequency (VF) domains with just two voltage levels, and that performance increased by up to 20% compared with a static OS-driven DVFS scheme.

Shayesteh et al. [29] studied a thermally triggered core swapping technique on a dual micro-core architecture. Swapping is done with the help of a helper engine, which reduces the swapping overhead by buffering the core state during the swap. The authors concluded that core swapping maintains the temperature below the threshold temperature.

Cho et al. [11] used a proactive spatiotemporal power multiplexing method to achieve a lower peak temperature and a more uniform thermal field on the chip. Spatiotemporal power multiplexing changes the location of power dissipation after a fixed interval of time while maintaining throughput during the redistribution. The study used a 16 x 16 array processor with 256 cores, and the power of the processor was kept constant. The time-varying spatial power maps of the chip were coupled to the HotSpot thermal simulator [35], and transient thermal simulations were performed to study the spatiotemporal variations in the thermal field. It was found that, for a given number of active cores, a smaller time slice resulted in a lower maximum temperature and reduced spatiotemporal non-uniformity.
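The effect of the time slice can be reproduced qualitatively with a small simulation that rotates a fixed number of active cores round-robin across the chip. This Python sketch is a much simplified stand-in for the HotSpot-based study in [11]: the function `peak_temperature`, the first-order per-core thermal model, the absence of lateral heat flow between cores, and all parameter values are assumptions.

```python
def peak_temperature(time_slice, n_cores=4, n_active=1, steps=4000,
                     p_active=40.0, t_amb=45.0, r_th=1.0, alpha=0.1):
    """Round-robin power multiplexing on thermally independent cores.

    Every `time_slice` steps the set of active cores shifts by n_active,
    so the number of active cores (and total power) stays constant.
    Returns the peak temperature over the second half of the run,
    after the start-up transient has died away.
    """
    temps = [t_amb] * n_cores
    peak = t_amb
    for step in range(steps):
        first = (step // time_slice) * n_active
        active = {(first + i) % n_cores for i in range(n_active)}
        for c in range(n_cores):
            power = p_active if c in active else 0.0
            temps[c] += alpha * (t_amb + power * r_th - temps[c])
        if step >= steps // 2:
            peak = max(peak, max(temps))
    return peak


for ts in (1, 5, 10):
    print(f"time slice {ts:2d}: peak {peak_temperature(ts):.1f} C")
```

Under these assumptions the peak temperature grows with the time slice: with short slices each core has little time to heat up before its power is moved elsewhere, so it stays closer to the time-averaged temperature.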

# CHAPTER 5

# DESIGN AND MODELLING

## 5.1 Package description

The typical packaging architecture of a flip chip microprocessor is shown in Figure 5.1. The test package consists of a heat sink, Thermal Interface Material-2 (TIM-2), heat spreader, TIM-1, die, C4 underfill, substrate, copper pads, solder balls and a Printed Circuit Board (PCB).

![](_page_30_Figure_4.jpeg)

Figure 5.1 Setup of the model used [30]

The thermal analysis represents natural convection cooling: a heat transfer coefficient of 10 W/m<sup>2</sup>K was applied at the PCB and 1200 W/m<sup>2</sup>K at the top surface of the heat sink. The dimensions and properties of the components used in the study are listed in Table 5.1. The geometry is based on the Salets Intel Pentium processor used in [33].

| Component     | Dimensions (mm) | Thickness (mm) | Thermal conductivity (W/mK) |
|---------------|-----------------|----------------|-----------------------------|
| Heat sink     | 64 x 64         | 6.35           | 247                         |
| TIM-2         | 31 x 31         | 0.075          | 6                           |
| Heat Spreader | 31 x 31         | 18             | 390                         |
| TIM-1         | 12 x 12         | 0.025          | 50                          |
| Die           | 12 x 12         | 0.75           | 140                         |
| C4/Underfill  | 12 x 12         | 0.1            | 1.6                         |
| Substrate     | 20 x 12         | 1              | 3                           |
| Copper pads   | 0.3 x 0.3       | 0.03           | 390                         |
| Solder Balls  | 0.4 x 0.8       | 0.28           | 57                          |
| PCB           | 76 x 76         | 1              | 13                          |

Table 5.1 Dimensions and properties of the Components in the model
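As a rough cross-check of Table 5.1, the conduction resistance of each planar layer in the main heat path can be estimated as $R = t/(kA)$ and the heat sink top surface as $R = 1/(hA)$. The Python sketch below sums these in series from the die up to the heat sink. It is a hypothetical one-dimensional estimate, not the Icepak model: it uses each layer's full footprint area (so it ignores spreading resistance) and neglects the parallel heat path down through the substrate and PCB.

```python
# Hypothetical 1D series-resistance estimate built from Table 5.1.
# Each layer: (name, length x in mm, length y in mm, thickness in mm, k in W/mK)
LAYERS = [
    ("Die", 12, 12, 0.75, 140),
    ("TIM-1", 12, 12, 0.025, 50),
    ("Heat Spreader", 31, 31, 18, 390),
    ("TIM-2", 31, 31, 0.075, 6),
    ("Heat sink", 64, 64, 6.35, 247),
]
H_SINK_TOP = 1200.0  # W/m^2K applied on the heat sink top surface


def conduction_resistance(lx_mm, ly_mm, t_mm, k):
    """R = t / (k * A) for a slab, in K/W (all lengths converted to metres)."""
    area_m2 = (lx_mm * 1e-3) * (ly_mm * 1e-3)
    return (t_mm * 1e-3) / (k * area_m2)


def stack_resistance():
    """Return per-layer conduction resistances and the top-surface convection one."""
    r_layers = {name: conduction_resistance(lx, ly, t, k)
                for name, lx, ly, t, k in LAYERS}
    _, lx, ly, _, _ = LAYERS[-1]  # convection acts on the heat sink footprint
    r_conv = 1.0 / (H_SINK_TOP * (lx * 1e-3) * (ly * 1e-3))
    return r_layers, r_conv


if __name__ == "__main__":
    r_layers, r_conv = stack_resistance()
    for name, r in r_layers.items():
        print(f"{name:14s} {r:.4f} K/W")
    print(f"{'Convection':14s} {r_conv:.4f} K/W")
    print(f"{'Total':14s} {sum(r_layers.values()) + r_conv:.4f} K/W")
```

In this estimate the convective resistance at the heat sink top (about 0.20 K/W of a total near 0.31 K/W) dominates, which is consistent with the choice of an enhanced heat transfer coefficient to stand in for the fins.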

#### 5.2 Modeling and Methodology

The package shown in Figure 5.1 was modeled in Icepak 13.0.2 as simple blocks, using the dimensions and thermal properties listed in Table 5.1. The heat sink was modeled as a plain block without fins; to compensate for the missing fins, a higher heat transfer coefficient was used. The surface of the die was divided into 16 equal areas, each modeled as a heat source representing one core. The copper pads on the top and bottom of the solder balls and the solder balls themselves were modeled as three separate blocks in order to decrease the computation time. The isometric view of the model created in Icepak is shown in Figure 5.2.

![](_page_32_Figure_0.jpeg)

Figure 5.2 Isometric view of the Flip Chip Package

In Figure 5.2, the green blocks are the cores of the microprocessor. Depending on the architecture, typically about 30% of the chip is cache and the remaining 70% is core. For simplicity, the core is assumed to occupy 100% of the die, and non-uniform power distribution is ignored.

![](_page_33_Figure_0.jpeg)

![](_page_33_Figure_2.jpeg)


Figure 5.3 (a) side view of the package (b) Zoomed in side view of the package

A heat load of 1 W/mm<sup>2</sup> was applied to each core, and radiation was turned off to make it a conduction-only problem. After the components were created, assigned their material properties and meshed in Icepak, a Fluent case was written and imported into Fluent 13.0.
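Since 1 mm<sup>2</sup> = 10<sup>-6</sup> m<sup>2</sup>, a heat load of 1 W/mm<sup>2</sup> is the same as the 10<sup>6</sup> W/m<sup>2</sup> heat flux used in Chapter 6. The Python sketch below converts the heat flux into power per core, assuming (hypothetically) that the 12 mm x 12 mm die is split into a 4 x 4 grid of equal square cores, which gives 3 mm x 3 mm cores.

```python
# Hypothetical per-core heat load check; assumes a 4 x 4 grid of square cores.
DIE_SIDE_MM = 12.0   # die dimension from Table 5.1
CORES_PER_SIDE = 4   # assumed 4 x 4 arrangement of the 16 equal areas
ACTIVE_CORES = 4     # four of sixteen cores active at any time


def core_area_m2() -> float:
    """Area of one core in m^2."""
    side_m = (DIE_SIDE_MM / CORES_PER_SIDE) * 1e-3  # mm -> m
    return side_m ** 2


def core_power_w(heat_flux_w_per_m2: float) -> float:
    """Heat generated by one active core for a uniform surface heat flux."""
    return heat_flux_w_per_m2 * core_area_m2()


if __name__ == "__main__":
    for flux in (1e6, 6e6):  # 1 W/mm^2 = 1e6 W/m^2
        p = core_power_w(flux)
        print(f"{flux:.0e} W/m^2: {p:.1f} W per core, "
              f"{ACTIVE_CORES * p:.1f} W with {ACTIVE_CORES} cores active")
```

Under these assumptions, 10<sup>6</sup> W/m<sup>2</sup> corresponds to 9 W per core (36 W with four cores active), and 6 x 10<sup>6</sup> W/m<sup>2</sup> to 54 W per core.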

#### CHAPTER 6

# RESULTS AND DISCUSSIONS

#### 6.1 Introduction

Here we consider temperature-based core hopping, in which activity hops from an active core to an inactive core once the active core reaches a threshold temperature. Figure 6.1 shows the model that was created and meshed in Icepak and imported into Fluent 13.0.

![](_page_34_Picture_2.jpeg)

Figure 6.1 Model that has been meshed in Icepak and imported to Fluent

#### 6.2 Results

#### 6.2.1 Core Hopping Based on Temperature

After the model was meshed in Icepak 13.0.2, a Fluent case file was written using the Icepak solver and imported into Fluent 13.0. A user-defined function (UDF) was written in Fluent to assign the boundary conditions, i.e., the maximum and minimum temperatures at which the cores start hopping, and these conditions were hooked to the respective zones in the model.

![](_page_35_Figure_3.jpeg)

Figure 6.2 Arrangements and Numbering of Cores

The UDF was written so that at any given time four of the sixteen cores are active and generating heat. The sequence in which the cores hop is given in Table 6.2, where $T_{min}$ is the minimum threshold temperature, i.e., a core can be activated once it has cooled to $T_{min}$, and $T_{max}$ is the maximum threshold temperature, i.e., the temperature at which a core must be deactivated.

| Core | ON condition                                                          | OFF condition   |
|------|-----------------------------------------------------------------------|-----------------|
| 1    | Tcore1 <= Tmin                                                        | Tcore1 >= Tmax  |
| 2    | Tcore1 >= Tmax and Tcore6 >= Tmax and Tcore5 >= Tmax and Tcore2 <= Tmin | Tcore2 >= Tmax  |
| 3    | Tcore4 >= Tmax and Tcore7 >= Tmax and Tcore8 >= Tmax and Tcore3 <= Tmin | Tcore3 >= Tmax  |
| 4    | Tcore4 <= Tmin                                                        | Tcore4 >= Tmax  |
| 5    | Tcore1 >= Tmax and Tcore6 >= Tmax and Tcore5 <= Tmin                  | Tcore5 >= Tmax  |
| 6    | Tcore1 >= Tmax and Tcore6 <= Tmin                                     | Tcore6 >= Tmax  |
| 7    | Tcore4 >= Tmax and Tcore7 <= Tmin                                     | Tcore7 >= Tmax  |
| 8    | Tcore4 >= Tmax and Tcore7 >= Tmax and Tcore8 <= Tmin                  | Tcore8 >= Tmax  |
| 9    | Tcore3 >= Tmax and Tcore10 >= Tmax and Tcore9 <= Tmin                 | Tcore9 >= Tmax  |
| 10   | Tcore13 >= Tmax and Tcore10 <= Tmin                                   | Tcore10 >= Tmax |
| 11   | Tcore16 >= Tmax and Tcore11 <= Tmin                                   | Tcore11 >= Tmax |
| 12   | Tcore16 >= Tmax and Tcore11 >= Tmax and Tcore12 <= Tmin               | Tcore12 >= Tmax |
| 13   | Tcore13 <= Tmin                                                       | Tcore13 >= Tmax |
| 14   | Tcore13 >= Tmax and Tcore10 >= Tmax and Tcore9 >= Tmax and Tcore14 <= Tmin | Tcore14 >= Tmax |
| 15   | Tcore16 >= Tmax and Tcore11 >= Tmax and Tcore12 >= Tmax and Tcore15 <= Tmin | Tcore15 >= Tmax |
| 16   | Tcore16 <= Tmin                                                       | Tcore16 >= Tmax |

Table 6.2 Sequence in which the cores start hopping
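The ON/OFF rules in Table 6.2 can be read as a small state machine evaluated on a temperature snapshot at each time step. The Python sketch below encodes the table as data and applies one update. It is a hypothetical illustration of the logic, not the Fluent UDF used here (a real UDF is written in C against Fluent's own macros, which are not reproduced). The threshold values `T_MIN` and `T_MAX` are assumed, since they are not stated in this section, and evaluating every rule against the same snapshot, so that a core switching OFF and its successor switching ON happen in the same step, is also an assumption about the original UDF.

```python
from typing import Dict, List, Set

# Table 6.2 encoded as data (hypothetical sketch, not the thesis UDF).
# For each core: the cores that must be at or above Tmax before this core
# may switch ON, in addition to the core itself having cooled to Tmin.
ON_PREREQUISITES: Dict[int, List[int]] = {
    1: [], 2: [1, 6, 5], 3: [4, 7, 8], 4: [],
    5: [1, 6], 6: [1], 7: [4], 8: [4, 7],
    9: [3, 10], 10: [13], 11: [16], 12: [16, 11],
    13: [], 14: [13, 10, 9], 15: [16, 11, 12], 16: [],
}


def update_active(temps: Dict[int, float], active: Set[int],
                  t_min: float, t_max: float) -> Set[int]:
    """Apply the ON/OFF rules of Table 6.2 once to a temperature snapshot."""
    new_active: Set[int] = set()
    for core, prereqs in ON_PREREQUISITES.items():
        t = temps[core]
        if core in active:
            # OFF rule: an active core stays on until it reaches Tmax.
            if t < t_max:
                new_active.add(core)
        elif t <= t_min and all(temps[p] >= t_max for p in prereqs):
            # ON rule: the core is cool and all listed predecessors are hot.
            new_active.add(core)
    return new_active


if __name__ == "__main__":
    T_MIN, T_MAX = 45.0, 85.0  # hypothetical thresholds in deg C
    temps = {core: 40.0 for core in range(1, 17)}
    active = update_active(temps, set(), T_MIN, T_MAX)
    print(sorted(active))  # [1, 4, 13, 16]: the outer cores start active
    temps[1] = 86.0  # core 1 reaches Tmax
    active = update_active(temps, active, T_MIN, T_MAX)
    print(sorted(active))  # [4, 6, 13, 16]: core 1 hands over to core 6
```

Keeping the table as a dictionary rather than sixteen hand-written conditions means the other core groupings studied can be tried by editing data only.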

![](_page_37_Figure_0.jpeg)

![](_page_37_Figure_2.jpeg)


Figure 6.3 Core hopping. (a) Before Core hopping and (b) After Core hopping.

Several cases were studied for temperature-based core hopping. Each case consists of four sets of four cores each, and hopping occurs among the four cores of a set according to the temperature conditions in the UDF. Among all the cases studied, the case shown in Table 6.2 gave the lowest and most uniform temperature distribution.

In this case, a heat flux of $1 \times 10^{6}$ W/m<sup>2</sup> was applied to each core and the solution was run for 10 time steps with a step size of 0.5 s (5 s in total). Although 16 cores are present, hopping takes place only within four pairs of cores: 1 and 6, 4 and 7, 13 and 10, and 16 and 11. This is because, at this low heat flux, the outer cores 1, 4, 13 and 16 cool down sufficiently, as can be seen in Figure 6.3 (b), so that the activity hops back to them. However, when the heat flux is increased from $1 \times 10^{6}$ W/m<sup>2</sup> to $6 \times 10^{6}$ W/m<sup>2</sup>, hopping occurs among all 16 cores, because the activation condition of each core is satisfied at some point in time.

#### 6.3 Stress Analysis

Thermo-mechanical analysis was carried out using ANSYS Workbench 13.0. The package geometry, written out from ANSYS Icepak in .IGES format, was imported into Workbench and used in a transient structural analysis. Transient structural analysis is a time-based analysis in which the loads are applied as functions of time. In this study the loads are thermal loads, imported from Fluent into Workbench. Figure 6.4 shows the transfer of thermal loads from Fluent to the Workbench transient structural analysis.


![](_page_39_Figure_0.jpeg)

Figure 6.4 Import procedure from Fluent to Transient Structural in ANSYS Workbench

![](_page_39_Figure_2.jpeg)

Figure 6.5 Temperature along the package.

For the thermo-mechanical analysis, in addition to the thermal properties, the Young's modulus, coefficient of thermal expansion (CTE) and Poisson's ratio of each material must be specified. These properties were created in Engineering Data in Workbench and assigned to the respective parts of the model.

| Material      | E (GPa)                  | CTE (1/K)                  | Poisson's Ratio (ν) |
|---------------|--------------------------|----------------------------|---------------------|
| PCB           | 21.9 (X or Z); 9.99 (Y)  | 17e-6 (X or Z); 70e-6 (Y)  | 0.28                |
| Solder Bump   | 38                       | 2.21e-05                   | 0.36                |
| Substrate     | 25.99 (X or Z); 11 (Y)   | 17e-6 (X or Z); 52e-6 (Y)  | 0.39 & 0.11         |
| Copper Pad    | 82.7                     | 1.27e-05                   | 0.34                |
| C4/Underfill  | 14.5                     | 2e-05                      | 0.28                |
| Die           | 150                      | 3e-06                      | 0.3                 |
| TIM 1         | 4e-04                    | 1.75e-04                   | 0.28                |
| Heat Spreader | 121                      | 1.73e-05                   | 0.3                 |
| TIM 2         | 4e-04                    | 1.75e-04                   | 0.28                |
| Heat Sink     | 68                       | 2.4e-05                    | 0.3                 |

Table 6.3 Material Properties of the Components
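The CTE values in Table 6.3 show why the chip region is expected to carry the highest stress: the die has by far the lowest CTE in the stack. As a rough, hypothetical first-order check (not the Workbench model), the biaxial stress in a thin layer whose in-plane expansion is fully constrained by its neighbour is $\sigma = E\,\Delta\alpha\,\Delta T/(1-\nu)$, with $\Delta\alpha = \alpha_{constraint} - \alpha_{layer}$. The sketch evaluates this for the die (E = 150 GPa, ν = 0.3, α = 3e-6 1/K) constrained to the substrate's in-plane CTE (17e-6 1/K). The temperature change of 10 K is an assumed value, and full constraint ignores the compliance of the underfill, solder and TIM layers, so the result tends to overestimate the stress.

```python
def constrained_biaxial_stress_mpa(e_gpa: float, nu: float,
                                   alpha_layer: float, alpha_constraint: float,
                                   delta_t_k: float) -> float:
    """Biaxial thermal stress (MPa) in a thin layer fully constrained in-plane.

    Positive values are tensile: when heated (delta_t_k > 0), the constraint
    expands more than the free layer would and stretches it.
    """
    e_pa = e_gpa * 1e9
    mismatch_strain = (alpha_constraint - alpha_layer) * delta_t_k
    return e_pa * mismatch_strain / (1.0 - nu) / 1e6


if __name__ == "__main__":
    # Die properties and substrate in-plane CTE from Table 6.3; delta T is assumed.
    sigma = constrained_biaxial_stress_mpa(
        e_gpa=150.0, nu=0.3, alpha_layer=3e-6, alpha_constraint=17e-6,
        delta_t_k=10.0)
    print(f"Estimated die stress: {sigma:.1f} MPa")
```

The estimate scales linearly with both the CTE mismatch and the temperature change, which is why the hot, low-CTE die is the natural place to look for peak stress during hopping.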

In the transient structural analysis, the material properties are assigned first, and the number of time steps and the step size are then set in the analysis settings to match those used in Fluent. The imported thermal loads are applied to their respective components, and the stress analysis is carried out. The stress results obtained for the entire package over the 5 s period are 48.331 MPa and 29.115 MPa, the maximum being 48.331 MPa. The stress results for the entire package are shown in Figure 6.6 (a).

![](_page_41_Figure_0.jpeg)

![](_page_41_Figure_2.jpeg)


Figure 6.6 (a) Stress results of the entire package and (b) Graph representing stress variation in the entire package over a period of 5 s.

Figure 6.6 (b) shows the stress variation of the package over the 5 s period; the green line represents the maximum stress and the red line the minimum stress.

![](_page_42_Figure_0.jpeg)

![](_page_42_Figure_2.jpeg)


Figure 6.7 (a) Heat sink Top view and (b) Bottom view

![](_page_43_Figure_0.jpeg)

![](_page_43_Figure_2.jpeg)


Figure 6.8 (a) TIM-2 Top view and (b) Bottom view

Figures 6.7 and 6.8 show the stress in the heat sink and TIM-2. These pictures were taken at 1.52 s, when the stress on the components is at its maximum.

![](_page_44_Figure_0.jpeg)

![](_page_44_Figure_2.jpeg)

Figure 6.9 (a) Heat Spreader Top view and (b) Bottom View

![](_page_45_Figure_0.jpeg)

![](_page_45_Figure_1.jpeg)


Figure 6.10 (a) TIM-1 Top view (b) Bottom view

Figures 6.9 and 6.10 show the stress in the heat spreader and TIM-1. These pictures were taken at 1.52 s, when the stress on the components is at its maximum. The stresses within the same component differ between the top and bottom views because of the different material properties at the contact surfaces.

![](_page_46_Figure_0.jpeg)

![](_page_46_Picture_2.jpeg)

Figure 6.11 (a) Die Top view and (b) Bottom view

![](_page_47_Picture_0.jpeg)

![](_page_47_Figure_1.jpeg)

![](_page_47_Figure_2.jpeg)


Figure 6.12 (a) C4 Underfill Top view and (b) Bottom view

Figures 6.11 and 6.12 show the stress in the die and the C4 underfill. These pictures were taken at 1.52 s, when the stress on the components is at its maximum. The stresses within the same component differ between the top and bottom views because of the different material properties at the contact surfaces. Among all the components, the maximum stress occurs in the die, at 48 MPa.

![](_page_48_Figure_0.jpeg)

![](_page_48_Figure_2.jpeg)


Figure 6.13 (a) Substrate Top view and (b) Bottom view

![](_page_49_Figure_0.jpeg)

![](_page_49_Figure_2.jpeg)


![](_page_49_Figure_4.jpeg)

Figures 6.13 and 6.14 show the stress in the substrate and the top copper pad. These pictures were taken at 1.52 s, when the stress on the components is at its maximum. Because of the different material properties at the contact surfaces, the stress varies from the top to the bottom of each component.

![](_page_50_Figure_0.jpeg)

![](_page_50_Figure_2.jpeg)


Figure 6.15 (a) Solder Joint Top view and (b) Bottom view

![](_page_51_Picture_0.jpeg)

![](_page_51_Figure_2.jpeg)

Figure 6.16 (a) Bottom copper pad Top view and (b) Bottom view

Figures 6.15 and 6.16 show the stress in the solder joint and the bottom copper pad. These pictures were taken at 1.52 s, when the stress on the components is at its maximum. Because of the different material properties at the contact surfaces, the stress varies from the top to the bottom of each component.

![](_page_52_Figure_0.jpeg)

![](_page_52_Figure_2.jpeg)


Figure 6.17 (a) PCB Top view and (b) Bottom View

Figure 6.17 shows the stress in the PCB. These pictures were taken at 1.52 s, when the stress on the components is at its maximum. Because of the different material properties at the contact surfaces, the stress varies from the top to the bottom of the PCB. On the PCB, the maximum stress occurs at the edges, which are fixed.

#### CHAPTER 7

#### CONCLUSION AND FUTURE WORK

#### 7.1 Conclusion

Core hopping based on temperature was studied. For temperature-based hopping, several cases were examined, each with four sets of four cores. Among the cases studied, the best case is the one shown in Table 6.2, which gives a low and uniform temperature distribution. Time-based core hopping is not recommended because it activates a core for a fixed period of time, during which the core may exceed the safe temperature it is designed for, and this may lead to chip failure. Since chip failure is not acceptable in the IT industry, temperature-based hopping is preferable to time-based hopping.

For the stress analysis, the temperature-based hopping case was imported into Workbench and a transient structural analysis was performed. As expected, the maximum stress occurs in the chip region (die and C4), while the other components show very little stress throughout the hop sequence. The maximum stress in the chip region is around 50 MPa, well below the fracture strength of silicon. This indicates that in core hopping the key parameter is temperature management, and that mechanical stresses do not play a significant role in causing failures.

#### 7.2 Future work

In future work, core hopping is proposed to be analyzed in 3D stack packaging, an emerging packaging technology. In 3D stack packaging, dies are stacked vertically on top of each other and connected using through-silicon vias (TSVs), another emerging technology.

#### REFERENCES

- [1]. Fundamentals of Microsystems Packaging Rao R. Tummala, McGraw Hill
- [2]. http://www.answers.com/topic/electronic-apackaging
- [3]. Essentials of Electronic Packaging: A Multidisciplinary Approach, Puligandla Viswanadham, Dereje Agonafer (Editor in Chief)
- [4]. Hot Spots and Core-to-Core Thermal Coupling in Future Multi-Core Architectures, M. Janicki, J. H. Collet, A. Louri and A. Napieralski
- [5]. Dynamic Thermal Management for High-Performance Microprocessors David Brooks and Margaret Martonosi (Department of electrical engineering Princeton University), in Proc of 7<sup>th</sup> international Symposium on Performance computer Architecture 2001
- [6]. Managing the Impact of increasing Microprocessor Power Consumption, Stephen H. Gunther, Frank Binns, Douglas M. Carmean, Jonathan. C. Hall, Intel Technology Journal, 2001.
- [7]. amdzone.com
- [8]. http://thewarrencentre.blogspot.com/2010/08/what-will-you-do-with-100-cores.html
- [9]. http://benchmarkreviews.com/index.php?option=com\_content&task=view&id=583&Itemid=38&limit=1&limitstart=7
- [10]. Interaction of Scaling Trends in Processor Architecture and Cooling, Wei Huang, Mircea R. Stan, Sudhanva Gurumurthi, Robert J. Ribando and Kevin Skadron, 26th IEEE SEMI-THERM Symposium, 2010
- [11]. "Proactive Power Migration to reduce maximum value and spatiotemporal nonuniformity of On-Chip temperature distribution in homogenous many-core processors", M.Cho, N.Sathe, M.Gupta, S.Yalamanchalli, and S.Mukhopadhyay, In Proc. Of SEMITHERM 2010.

- [12]. H. Sanchez et al. Thermal management system for high performance power pc microprocessors. Digest of papers- COMPCON- IEEE Computer Society International Conference, page 325, 1997.
- [13]. C. Georgiou, S. Kirkpatrick and T. Larson. Variable Chip-Clocking Mechanism. US Patent 5,189,314, 1993.
- [14]. S. Ghiasi, J. Casmira, and D. Grunwald. Using IPC variation in workloads with externally specified rates to reduce power consumption. In Complexity- Effective Design at ISCA27, June 2000.
- [15]. O. Ikeda. Power Saving Control System for a Computer System. US Patent 5,504,908, 1996.
- [16]. "Dynamic Thermal Management for High Performance Microprocessors", D.Brooks, and M.Martonosi, In Proc Of 7th International Symposium on Performance Computer Architecture 2001.
- [17]. "Techniques for Multicore Thermal Management: Classification and New Exploration", J.Donald, and M.Martonosi, In Proc. Of ISCA 2006.
- [18]. "Understanding the Thermal Implication of Multicore Architectures", P.Chaparro, and J.Gonzalez, IEEE 2007.
- [19]. "Thread Motion: Fine-Grained Power Management for Multi-Core Systems", Krishna. K. Rangan, Gu-Yeon Wei, D. Brooks, In Proc Of 36th ISCA, 2009.
- [20]. "Managing the Impact of Increasing Microprocessor Power Consumption", Stephen H.Gunther, Frank Binns, Douglas M. Carmean, Jonathan. C.Hall, Intel Technology Journal, 2001.
- [21]. http://vision.pcvsconsole.com/?article=21
- [22]. http://news.cnet.com/2100-1001-954456.html

- [23]. http://www.eetimes.com/news/semi/showArticle.jhtml?articleID=196901229
- [24]. "Thermal-aware Task Scheduling at the System Software Level", Jeonghwan Choi, Chen-Yong Cher, Hubertus Franke, Hendrik Hamann, Alan Weger, and Pradip Bose, In Proc. of International Symposium on Low Power Electronics and Design, 2007.
- [25]. "Thousand core chips A Technology perspective", S.Borkar, DAC 2007.
- [26]. "Exploring the Thermal Impact on Manycore Processor Performance", Wei Huang, Kevin Skadron, Sudhanva Gurumurthi, Robert J. Ribando and Mircea R. Stan, 26<sup>th</sup> IEEE SEMI-THERM Symposium, 2010.
- [27]. "Understanding the Thermal Implication of Multicore Architectures", P.Chaparro, and J.Gonzalez, IEEE 2007.
- [28]. "Thermal-aware Task Scheduling at the System Software Level", Jeonghwan Choi, Chen-Yong Cher, Hubertus Franke, Hendrik Hamann, Alan Weger, and Pradip Bose, In Proc. of International Symposium on Low Power Electronics and Design, 2007.
- [29]. Reducing the Latency and Area Cost of Core Swapping Through Shared Helper Engines, A. Shayesteh, E. Kursun, T. Sherwood, S. Sair and G. Reinman, In Proc. of International Conference on Computer Design, 2005.
- [30]. Characterization of Microprocessor Chip Stress Distributions During Component Packaging and Thermal Cycling, Jordan Roberts, Safina Hussain, M. Kaysar Rahim, Mohammad Motalab, Jeffrey C. Suhling, Richard C. Jaeger, Pradeep Lall, 2010 IEEE Electronic Components and Technology Conference, 2010.
- [31]. Thermal-Aware Power Migration In Many-Core Processors, Avinash Raghu, Saket Karajgikar, Dereje Agonafer, Bahgat Sammakia, Gamal Refai-Ahmed. Proceedings of the ASME 2010 International Mechanical Engineering Congress and Exposition IMECE2010

- [32]. Coupled Thermal and Structural Parametric Analysis of TSVs in 3D Electronics Fahad Mirza Bharathkrishnan Muralidharan, Poornima Mynampati, Saket Karajgikar, Dereje Agonafer, Proceedings of the ASME 2010 International Mechanical Engineering Congress & Exposition IMECE2010
- [33]. Multi-Objective Optimization to Improve Both Thermal and Device Performance of a Nonuniformly Powered Micro-Architecture, Saket Karajgikar, Dereje Agonafer, Kanad Ghose, Bahgat Sammakia, Cristina Amon, Gamal Refai-Ahmed, Journal of Electronic Packaging JUNE 2010, Vol. 132 / 021008-1

# **BIOGRAPHICAL INFORMATION**

The author obtained his Bachelor's Degree in Mechanical Engineering from Acharya Nagarjuna University, Guntur, India and commenced his Master's Degree Program in Mechanical Engineering at The University of Texas at Arlington in Spring 2009. He joined the EMNSPC team under Dr. Agonafer in Spring 2010. He has worked in the areas of data centers and stack packaging, specifically through-silicon vias (TSVs), in addition to his primary research area, thermo-mechanical analysis of multi-core processors.