COEN-4730/EECE-5730 Computer Architecture

Course Introduction

Catalog Description
Review of basic computer architecture. Evaluation of architecture performance. Design and evaluation of instruction sets. Pipeline processors and instruction scheduling. Vector processors. Memory hierarchy and design including cache, main and virtual memories. Memory protection schemes. Input/output and its relation to system performance. COEN design elective in the area of hardware.

Instructor
Cristinel (Cris) Ababei
cristinel.ababei@marquette.edu
Phone: 414-288-5720
Office: Haggerty Hall, #220

Syllabus
For course goals and objectives, policies, and a tentative outline, please see the syllabus on D2L.

Textbook

[1] John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Sixth Edition, 2019. (Required). Companion website.


Lectures

1. Introduction: lecture01_intro.pdf
    -- Readings:
        > Chapter 1
        > G. Moore, Cramming more components onto integrated circuits, Electronics, April 1965.
        > D.A. Reed, D.B. Gannon, and J.R. Larus, Imagining the future: thoughts on computing, IEEE Computer Magazine, Jan. 2012.
        > Mark Liu and H.-S. Philip Wong, The Path to a 1-Trillion-Transistor GPU: AI's Boom Demands New Chip Technology, IEEE Spectrum, July 2024.
2. Review #1: Processors (instruction sets and pipelining): lecture02_review1.pdf
    -- Readings:
        > Appendices A and C
        > R. Ronen et al., Coming challenges in microarchitecture and architecture, Proc. of the IEEE, 2001.
        > Y. Patt, Requirements, bottlenecks, and good fortune: agents for microprocessor evolution, Proc. of the IEEE, 2001.
3. Review #2: Caches and Memory Hierarchy: lecture03_review2.pdf
    -- Readings:
        > Appendix B
        > H. Esmaeilzadeh et al., Dark Silicon and the End of Multicore Scaling, ISCA 2011.
        > N. Hardavellas, The Rise and Fall of Dark Silicon, USENIX, 2012.
4. Advanced Cache Optimizations: lecture04_caches.pdf
    -- Readings:
        > Chapter 2 (H&P)
        > N. Jouppi, Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, ISCA, 1990.
5. Main memory design: lecture05_dram.pdf
    -- Readings:
        > Chapter 2 (H&P)
        > ISSCC 2022 Memory Trends.
6. Advanced instruction level parallelism (ILP): lecture06_ilp.pdf
    -- Readings:
        > Chapter 3, Appendix C (H&P)
        > C. Young, N. Gloy, and M.D. Smith, A Comparative Analysis of Schemes for Correlated Branch Prediction, ISCA, 1995.
        > R.M. Tomasulo, An Efficient Algorithm for Exploiting Multiple Arithmetic Units, IBM Journal, 1967.
        > Video: Robert Tomasulo's last public Lecture at Michigan, 2008.
7. Beyond ILP: Thread Level Parallelism (TLP): lecture07_ilp_limits_tlp.pdf
    -- Readings:
        > Chapter 3 (H&P)
        > D.W. Wall, Limits of instruction level parallelism, HP Tech. Report, 1993.
8. Coherence Mechanisms: lecture08_part1_coherence.pdf
    Networks-on-Chip (NoC): lecture08_part2_noc.pdf
    -- Readings:
        > GEM5 introduction: lecture08_supplemental_gem5.pdf
        > Chapter 5 (H&P)
        > M.M.K. Martin et al., Why On-Chip Cache Coherence is Here to Stay, Comms. of the ACM, 2012.
9. Graphics Processing Units (GPUs): lecture09_gpu.pdf
    -- Readings:
        > Chapter 4 (H&P)
        > S. Williams et al., Roofline: An Insightful Visual Performance Model for Multicore Architectures, Comms. of the ACM, 2009.
        > NVIDIA Blogs
        > Video: GTC 2013: NVIDIA's GPU Roadmap
        > Video: GTC 2023 Keynote with NVIDIA CEO Jensen Huang
        > Video: Jen-Hsun Huang (co-founder, president, and CEO of NVIDIA) at Oregon State University
10. Warehouse Scale Computing: lecture10_warehouse.pdf
    -- Readings:
        > Chapter 6
        > L.A. Barroso and U. Hölzle, The Datacenter as a Computer - An Introduction to the Design of Warehouse-Scale Machines, 2013.
        > Video: Google container data center tour
        > Video: Jeff Dean of Google, talk given at Stanford in 2010
11. Servers, Reliability, Power: lecture11_servers.pdf
    -- Reading:
        > L. Gavrilov and N. Gavrilova, Why we fall apart - engineering's reliability theory explains human aging, IEEE Spectrum, 2004.
12. Domain Specific Architectures (DSAs): lecture12_DSA.pdf
    -- Readings:
        > Chapter 7
        > Video: John Hennessy and David Patterson, Turing Lecture, ISCA 2018
13. Testing and design for testability: lecture13_testing.pdf
14. Quantum Computing: lecture14_quantum.pdf


HW

HW #1: Problems from Ch. 1 of textbook
HW #2: Dinero IV cache simulator. Files needed: hw2_files.zip
HW #3: SimpleScalar single-core processor simulator - Design Space Exploration (DSE) to find the best processor design. Files needed: hw3_files.zip
HW #4: Setting up the GEM5 + McPAT multicore processor full-system (FS) simulation framework. Files needed: hw4_files.zip. See also: lecture08_supplemental_gem5.pdf
HW #5: Use the GEM5 + McPAT full-system simulation framework to conduct DSE for multicore processors.
HW #6: Parallelization using OpenMP to speed up a benchmark on a 4-core processor. Simulations with the GEM5 + McPAT full-system simulation framework. Files needed: hw6_files.zip
HW #7: Use Network-on-Chip (NoC) simulators to generate plots of average network latency vs. packet injection rate
HW #8: Datacenter simulations


Resources

Many of the lecture notes here are modified versions of the lecture notes graciously shared by the following:
-- ECE-554 Computer Architecture, Colorado State, Sudeep Pasricha
-- CS-252 Graduate Computer Architecture (old version), Berkeley, John Kubiatowicz
-- CS-252 Graduate Computer Architecture (old version), Berkeley, David Patterson
-- CS-152 Computer Architecture and Engineering, Berkeley, Krste Asanovic
-- Computer Architecture Courses, CMU/ETHZ, Onur Mutlu

Must reads
-- J. Gray, What's Next? A Dozen Information-Technology Research Goals, TR, 1999.
-- I. Markov, Limits on fundamental limits to computation, Nature, 2014.
-- L. Gavrilov and N. Gavrilova, Why we fall apart - engineering's reliability theory explains human aging, IEEE Spectrum, 2004.

Conferences
-- International Symposium on High-Performance Computer Architecture (HPCA)
-- International Symposium on Computer Architecture (ISCA)
-- International Symposium on Microarchitecture (MICRO)
-- European Network on High Performance and Embedded Architecture and Compilation (HiPEAC)

The Antikythera Mechanism
-- The Antikythera Mechanism Research Project
-- Ancient Greek Computer at DataRecoveryLabs

Online (video) lectures, tutorials, and other interesting reads
-- 18-447 Computer Architecture, CMU, O. Mutlu
-- CS-6810 Computer Architecture, Univ. of Utah, R. Balasubramonian
-- ISCA 25-year retrospective

Useful software (simulators and tools)
-- John Kubiatowicz's list of useful links for CS-252
-- Network-on-Chip (NoC) Blog's list of simulators and tools