- Restrictions on Access:
- Open Access.
- During the past decade, the slowdown of transistor scaling has brought chip design into the "post-Moore" era, in which integrating more transistors into a single-core system no longer yields performance gains because of the power wall and the utilization wall. As a revolutionary success, manycore systems have rapidly penetrated markets ranging from desktops, laptops, and servers to mobile and IoT devices. The resources in these manycore systems continue to scale out, forming parallel platforms such as Graphics Processing Units (GPUs), manycore CPUs, and heterogeneous datacenters. These platforms provide enormous computing capability and have become the default choice for communities such as scientific computing, large-scale data analytics, entertainment, and deep learning, where high performance, accuracy, and quality of service are central concerns. However, the delivered performance rarely keeps up with the growing amount of resources, for two major reasons. First, the intrinsic irregularity of many applications prevents them from utilizing the resources effectively and efficiently. Second, current systems cannot dynamically and automatically adapt to application characteristics.
Targeting these challenges, this dissertation systematically investigates opportunities across the software-hardware stack (i.e., compiler, runtime system, and architecture) to improve performance and energy efficiency for applications, especially those with irregular computation and data-access patterns. Specifically, the dissertation consists of four parts. First, focusing on irregular applications running on GPUs, it proposes controlled computation spawning to dynamically improve compute-resource utilization and balance computation across parallel computing engines. Second, targeting the poor cache performance of irregular applications, it proposes a dynamic runtime approach that exploits data reuse and improves cache locality. Third, focusing on data-access parallelism, it proposes a compiler-directed approach to improve memory bank-level parallelism. Finally, in addition to memory bank-level parallelism, it proposes co-optimization strategies that maximize cache-level parallelism while keeping memory bank-level parallelism maximized.
- Dissertation Note:
- Ph.D. Pennsylvania State University 2019.
- Technical Details:
- The full text of the dissertation is available as an Adobe Acrobat .pdf file; Adobe Acrobat Reader is required to view the file.
catkey: 27984235