A Theoretical Framework For Design Space Exploration Of Manycore Processors
As the design space and workload space in the multicore era continue to expand, it is a challenge to identify optimal design points quickly during the early stages of multicore processor design or during the programming phase. To meet this challenge, a thread-level modeling methodology is developed in this dissertation. The idea is to model multicore processors at the thread level, abstracting away instruction-level and microarchitectural details. Since thread-level modeling is much coarser than instruction-level modeling, analysis at this level is significantly faster than analysis at the instruction level. This makes the methodology particularly amenable to fast performance evaluation of manycore systems over a large design space. Based on this methodology, we developed a thread-level simulation tool for quick evaluation of any given design point, and a theoretical framework that captures the general performance properties of a class of multicore processors of interest over a large design space and workload space, free of scalability issues. In the theoretical framework, queuing network models that represent multicore processors at the thread level are developed, and the scalability issues in these queuing networks are solved with an iterative algorithm over a large design space and workload space. This framework scales to virtually unlimited numbers of cores and threads.
For the simulation tool, case studies based on a large number of code samples available in the IXP1200/2400 workbenches show that the maximum throughput estimated using our tool is consistently within 6% of cycle-accurate simulation results. Moreover, each simulation run takes only a few seconds to finish on a Pentium 4 computer, which strongly demonstrates the power of this tool for fast communication processor (CP) performance testing.
For the theoretical framework, the test results demonstrate that the throughput of manycore processors with 1000 cores can be evaluated within a few seconds on an Intel Pentium 4 computer, and that the results are within 5% of the simulation data obtained from the thread-level simulation tool.
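
The abstract does not spell out the queuing network models or the iterative algorithm, so the following is only an illustrative sketch of the general idea of solving a thread-level queuing model iteratively. It uses textbook exact Mean Value Analysis (MVA) for a closed, single-class network, which is a stand-in technique and not the dissertation's algorithm. The function name `mva_throughput`, the two-resource example (a core and a shared memory), and all service demands are hypothetical values chosen for illustration.

```python
from typing import List, Tuple


def mva_throughput(demands: List[float], n_threads: int,
                   think_time: float = 0.0) -> Tuple[float, List[float]]:
    """Exact MVA for a closed single-class queuing network.

    demands    -- service demand of one thread at each queuing resource
                  (hypothetical values, e.g. time units per request)
    n_threads  -- number of threads circulating in the network
    think_time -- delay a thread spends outside all queues per cycle
    Returns (system throughput, mean queue length per resource).
    """
    if n_threads < 1 or not demands:
        raise ValueError("need at least one thread and one resource")

    queue = [0.0] * len(demands)  # mean queue lengths with 0 threads
    throughput = 0.0
    # Build the solution for n threads from the solution for n - 1 threads.
    for n in range(1, n_threads + 1):
        # An arriving thread waits behind the mean queue seen with n - 1 threads.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        throughput = n / (think_time + sum(residence))
        # Little's law applied per resource.
        queue = [throughput * r for r in residence]
    return throughput, queue


if __name__ == "__main__":
    # Hypothetical model: one core (demand 1.0) and one shared memory (demand 0.5).
    for threads in (1, 4, 16, 1000):
        x, q = mva_throughput([1.0, 0.5], threads)
        print(f"threads={threads:5d} throughput={x:.4f} queues={q}")
```

The cost of this recursion grows only linearly with the number of threads and resources, which illustrates why an analytical thread-level model can be evaluated far faster than instruction-level simulation. As the thread count grows, the computed throughput approaches the bottleneck limit of one over the largest service demand.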