Effective Thermal Management Of Electronic Devices From System To Die Level
In any electronic device, the resistance to current flow through transistors, interconnects, and other elements generates significant heat. If this heat is not removed by an appropriate cooling solution, the likelihood of thermal runaway is very high: the temperature continues to rise until the device ceases to operate or loses its physical integrity. In the presence of a cooling mechanism, the temperature rise is moderated and asymptotically approaches an acceptable steady-state value. Before any device is introduced to the market, it undergoes several levels of packaging. The primary functions of the package are to provide mechanical support, protect the electronics from external contaminants, and provide a path for heat removal. In its most basic form, packaging can be divided into three levels. The first level addresses issues at the package level, while the second and third levels address issues at the board level and the system level (enclosures), respectively. Providing a path for heat removal, a key function of the package at all three levels, is termed "thermal management" of electronic devices. The method adopted for thermal management should be practical, efficient, reliable, and cost effective. In addition, thermal management at any level is a multi-physics problem, and optimal thermal management requires multi-objective optimization over many design variables. In this work, such an analysis is performed at the facility, package, and die levels. At the facility level, data centers are considered. At the package level, a 3D package architecture is considered. At the die level, the effect of non-uniform power distribution is considered for a Pentium IV microprocessor.
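The asymptotic approach to a steady-state temperature described above can be illustrated with a single-node lumped thermal model. This is only a minimal sketch and not the analysis used in this work: it assumes the device behaves as one thermal mass with capacitance `c_th` connected to ambient through one thermal resistance `r_th`, and all names and numerical values are hypothetical.

```python
import math


def lumped_temperature(t, power_w, r_th, c_th, t_ambient):
    """Temperature (degC) at time t (s) for a single thermal mass.

    Solves C * dT/dt = P - (T - T_ambient) / R with T(0) = T_ambient.
    power_w: dissipated power (W), r_th: thermal resistance (K/W),
    c_th: thermal capacitance (J/K). All inputs are assumed values.
    """
    tau = r_th * c_th  # thermal time constant (s)
    return t_ambient + power_w * r_th * (1.0 - math.exp(-t / tau))


def steady_state_temperature(power_w, r_th, t_ambient):
    """Limit of lumped_temperature as t grows: T_ambient + P * R."""
    return t_ambient + power_w * r_th


# Hypothetical values, chosen only to show the shape of the curve.
P, R, C, T_AMB = 50.0, 0.5, 20.0, 25.0
for t in (0.0, 10.0, 30.0, 100.0):
    print(t, round(lumped_temperature(t, P, R, C, T_AMB), 2))
print("steady state:", steady_state_temperature(P, R, T_AMB))
```

In this simplified picture, a better cooling solution corresponds to a smaller `r_th`, which lowers the steady-state value. As `r_th` grows very large (no effective heat removal path), the temperature keeps rising at roughly `power_w / c_th` per second.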
With regard to data centers, the power demand of server systems continues to grow, making thermal management of data centers a very challenging task. Although various configurations exist, the raised-floor plenum with Computer Room Air Conditioners (CRACs) supplying cold air is a popular operating strategy. Rising heat loads in data center facilities have raised concerns over energy usage. The Environmental Protection Agency has reported that the energy used by the data center industry in 2006 was 1.5% of the nation's total energy usage. Experts agree that by 2010 this usage will approach 2% of annual nationwide energy use. This has been the driving force behind new solutions and technologies such as free cooling. Recent studies show that outside air can be drawn in to cool IT equipment without undue electronic component failure due to contaminants. In this study, different cases employing an air-side economizer are discussed, and a computational technique is proposed for estimating the energy consumed by the cooling infrastructure of a data center.
For thermal management at the package level, a 3D package architecture is considered. 3D packaging has been a strong technology for low density interconnect (LDI) applications, where it serves the demand for continuous miniaturization in consumer electronics, memories, processors, etc. However, for 3D packaging to move from LDI to high density interconnect (HDI) applications, it must overcome critical challenges such as thermal management. Effective thermal management requires thermal characterization, through which the heat dissipation of a typical stacked package structure can be studied. Recently, there has been growing interest in the package-on-package (PoP) stacking architecture, because reliability is reduced to that of the individual package rather than the entire PoP.
In this work, thermal analysis of a PoP is performed for different power combinations of the logic and memory dice. Based on the results, thermal design guidelines are provided. The results also show that the only path through which heat from the packages is conducted to the PCB is through the C4, C5 and BGA solder interconnects. Moving forward into stacking HDIs, it is important to determine how to take advantage of all the surfaces of the PoP to significantly improve thermal performance. Solder bumps serve as electrical as well as thermal interconnects. Typically, solder interconnects are considered a weak link in a BGA structure. At elevated temperatures and under higher operating currents, interconnects are subject to a catastrophic failure mechanism known as electromigration. When electromigration occurs, metal atoms are physically displaced and voids form. These voids increase the thermal resistance and the current density, which further accelerates electromigration and eventually results in device failure. As we move forward on the roadmap, electromigration will be another challenge that needs to be addressed. Hence, to increase the reliability of the interconnects, design guidelines to reduce bump electromigration are developed.
For processors after the Pentium II, it became critical to include the non-uniformity of the power distribution on the die; a model ignoring the non-uniformity would significantly under-predict the thermal performance of the die. In this comprehensive study, a multi-objective optimization is performed to study the effect of relocating the functional units of a non-uniformly powered microprocessor on architectural and device performance. Integration of different functional components such as level two (L2) cache memory, high-speed I/O interfaces, memory controller, etc. has enhanced microprocessor performance.
In this architecture, certain functional units on the microprocessor dissipate a significant fraction of the total power, while other functional units dissipate little or no power. This highly non-uniform power distribution results in a large temperature gradient with localized hot spots that may have detrimental effects on computer performance, product reliability and cooling cost. Moving the functional units may reduce the junction temperature but can also degrade performance by as much as 30%. In this study, a multi-objective optimization is performed to minimize the junction temperature without significantly altering the computer performance. The analysis was performed for the 90 nm node Pentium IV Northwood architecture at a 3 GHz clock speed.
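The trade-off between junction temperature and performance described above can be illustrated with a simple Pareto-dominance filter. This is only a sketch of one common way to compare candidates under two objectives; it is not claimed to be the optimization method used in this work. The floorplan labels and all numerical values below are hypothetical placeholders, not results from the study.

```python
from typing import List, Tuple

# (label, junction temperature in degC, performance loss in %); both minimized.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and better in one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical floorplans; the numbers are placeholders for illustration.
floorplans = [
    ("baseline", 95.0, 0.0),
    ("layout_a", 88.0, 4.0),
    ("layout_b", 90.0, 6.0),   # hotter and slower than layout_a
    ("layout_c", 84.0, 30.0),
]
front = pareto_front(floorplans)
print([name for name, _, _ in front])
```

Candidates that survive the filter represent distinct compromises between temperature and performance. Choosing among them, for example by capping the acceptable performance loss, is a separate design decision.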