Overview


This website describes the research and educational activities of the project "Lifetime reliability of systems-on-chip: unified modeling and dynamic reliability management", which is funded by the National Science Foundation (NSF).

Research
Future integrated circuits will contain tens, hundreds, or even thousands of cores per chip. However, the technology downscaling that makes this possible may also make the underlying hardware less reliable because of an increasing number of defects and wear-out mechanisms. Reliability is therefore one of the major problems facing the design of multiprocessor systems-on-chip. Because either the cores or the network-on-chip (used for communication between the cores) can become the reliability bottleneck of these systems, reliability must be addressed in a unified manner. To address this challenge, this research develops a novel unified theoretical framework for lifetime reliability modeling. The framework uses efficient Monte Carlo methods and treats multiprocessor systems-on-chip as a combination of computation and communication units. The goal of this research is to develop new dynamic reliability management techniques based on dynamic voltage and frequency scaling and application remapping. Based on control theory concepts, these techniques proactively improve the lifetime reliability of multicore systems.
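The Monte Carlo idea can be illustrated with a minimal Python sketch. It is not the project's framework: it assumes that every core and router has a Weibull-distributed time to failure and that the chip fails as soon as its first component fails (a series system). The function name estimate_mttf and all parameter values are hypothetical and chosen only for illustration.

```python
import random

def estimate_mttf(components, num_samples=20000, seed=0):
    """Monte Carlo estimate of the mean time to failure of a series system.

    components: list of (scale, shape) Weibull parameters, one pair per
    core or router. The values are hypothetical; a real flow would derive
    them from temperature, voltage, and the failure mechanism.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Assumption: the system fails when its first component fails,
        # so one sample of the system lifetime is the minimum over all
        # sampled component lifetimes.
        total += min(rng.weibullvariate(scale, shape)
                     for scale, shape in components)
    return total / num_samples

# Hypothetical 4-core chip with 4 routers (lifetimes in years).
cores = [(12.0, 2.0)] * 4
routers = [(15.0, 2.0)] * 4
mttf_years = estimate_mttf(cores + routers)
print(f"Estimated system MTTF: {mttf_years:.2f} years")
```

Because the cores and the routers are sampled together, the estimate reflects whichever group is the reliability bottleneck, which is the motivation for modeling computation and communication units in a unified way.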

Software releases
1) REST: Reliability ESTimation for chip multiprocessors (CMPs)
This is a "push-button" tool for estimating the lifetime reliability of network-on-chip based chip multiprocessors. The tool integrates the gem5 full-system simulator, the McPAT power calculator, the HotSpot thermal simulator, a Monte Carlo algorithm for mean-time-to-failure (MTTF) computation, and the required scripts (gem5 to McPAT, McPAT to HotSpot). It currently supports two aging failure mechanisms: time-dependent dielectric breakdown (TDDB) and negative bias temperature instability (NBTI).
2) Dynamic reliability management (DRM) using thread migration
This is described in publication [5] (see Publications).
3) Dynamic reliability management (DRM) using dynamic voltage and frequency scaling (DVFS)
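
As a rough illustration of the feedback idea behind DVFS-based DRM, and not the released implementation, the following Python sketch applies a simple proportional controller. It lowers the clock frequency when the estimated MTTF falls below a target and raises it when there is slack. The function name next_frequency, the gain, and the frequency limits are all hypothetical.

```python
def next_frequency(freq_ghz, mttf_estimate, mttf_target,
                   gain=0.5, f_min=1.0, f_max=3.0):
    """Return the next clock frequency (GHz) chosen by a proportional controller.

    All parameters are illustrative assumptions; a real DRM policy would
    also select a matching supply voltage and respect performance limits.
    """
    # Relative error: negative when the chip is aging faster than desired.
    error = (mttf_estimate - mttf_target) / mttf_target
    new_freq = freq_ghz * (1.0 + gain * error)
    # Clamp to the supported frequency range.
    return max(f_min, min(f_max, new_freq))

# Estimated lifetime (5 years) is below the 10-year target, so slow down.
print(next_frequency(2.0, mttf_estimate=5.0, mttf_target=10.0))
```

In a closed loop, the MTTF estimate would be refreshed after each frequency change, so the controller keeps adjusting until the estimate settles near the target.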

Publications
[5] A.Y. Yamamoto and C. Ababei, Unified reliability estimation and management of NoC based chip multiprocessors, Microprocessors and Microsystems, vol. 38, no. 1, pp. 53-63, Feb. 2014.
[4] H. Sajjadi Kia and C. Ababei, A new reliability evaluation methodology with application to lifetime oriented circuit design, IEEE Trans. on Device and Materials Reliability, vol. 13, no. 1, pp. 192-202, March 2013.
[3] C. Ababei and A.M. Miron, Addressing early the gender gap in electrical engineering via summer camps for girls, ASEE North Midwest Section Conference, Fargo, ND, Oct. 2013.
[2] A.Y. Yamamoto and C. Ababei, Unified system level reliability evaluation methodology for multiprocessor systems-on-chip, IEEE International Green Computing Conference, Lighter-than-Green Dependable Multicore Architectures Workshop, San Jose, CA, June 2012.
[1] H. Sajjadi Kia and C. Ababei, A new reliability evaluation methodology and its application to Network-on-Chip routers, IFIP/IEEE Int. Conference on Very Large Scale Integration (VLSI-SoC), Santa Cruz, CA, Oct. 2012.

Education
The education component includes (1) a summer camp for students from underrepresented groups, (2) professional blogging for continuous education and dissemination of research findings, and (3) promoting research, industry experience, and entrepreneurship education for undergraduates to better prepare them for graduate studies. The education and research plans are integrated by utilizing the software tools in programming contests, research projects for undergraduates, and summer-camp demonstrations.