ExCL User Docs

faraday

Last updated 19 days ago

The MI300A system (host name faraday) is available to ExCL users. As with other ExCL systems, you must first log in through the login node.

Once on faraday, load the ROCm module to set up the required environment:

module load rocmmod
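As a quick sanity check after loading the module, the sketch below reports whether a few standard ROCm command-line tools are on your PATH. The helper name check_tool is hypothetical, and the assumption that rocmmod exposes rocminfo, rocm-smi, and hipcc is ours rather than something this page states.

```shell
# Report whether a command is on PATH; returns 0 if found, 1 otherwise.
# check_tool is a hypothetical helper, not part of ExCL or ROCm.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found"
        return 1
    fi
}

# rocminfo, rocm-smi, and hipcc ship with ROCm; that rocmmod puts them
# on PATH is an assumption.
for tool in rocminfo rocm-smi hipcc; do
    check_tool "$tool" || echo "  try: module load rocmmod"
done
```

If any tool is reported as not found, reloading the module in a fresh shell is a reasonable first step before contacting support.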

A very light test program is available via git at https://github.com/jungwonkim/amd-toy. Running it is a good way to confirm that your environment is set up correctly.

All tests should return err[0]. If they do not, you most likely lack render group permissions.

To check, run the groups command on faraday and see whether render is listed.

If it is not, contact excl-help@ornl.gov and we will add you.
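The group check described above can be scripted. This is a minimal sketch: has_group is a hypothetical helper, and it uses id -nG, which prints the same group names as the groups command.

```shell
# Return 0 if the space-separated group list in $1 contains the group $2.
# has_group is a hypothetical helper; matching is on whole names only.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# `id -nG` lists the current user's group names, like `groups`.
if has_group "$(id -nG)" render; then
    echo "render group: OK"
else
    echo "render group: missing; contact ExCL support"
fi
```

Padding the list with spaces before matching keeps a similarly named group (for example, one that merely starts with "render") from producing a false positive.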

System Information

  • Supermicro AS -4145GH-TNMR

    • No configuration options for memory or other add-ons.

  • 4 APUs (Accelerated Processing Units), each combining CPU, GPU, and HBM3 memory

    • 912 CDNA 3 GPU compute units (228 per APU)

    • 96 Zen 4 cores (24 per APU)

    • 512 GB unified HBM3 (128 GB per APU)

  • Supermicro-designed and -built system (ours is the 4U air-cooled version; a 2U liquid-cooled version is also available)

    • Instead of standard PCIe 5.0 slots, the system uses riser cards that plug into specialized backplane connectors (the links are still PCIe 5.0).

      • To add hardware, we will need to purchase riser cards and will need plenty of advance notice.

  • Ubuntu 24.04 LTS; ROCm 6.4.0

Documentation

Available models: https://www.supermicro.com/en/accelerators/amd

Datasheet on Faraday: https://www.supermicro.com/datasheet/datasheet_H13_QuadAPU.pdf

Hardware documentation: https://www.supermicro.com/manuals/superserver/4U/MNL-2754.pdf

Test program: https://github.com/jungwonkim/amd-toy

ExCL support: excl-help@ornl.gov

Images

[Image: Faraday]