Getting Started with the ORNL ACSR Experimental Computing Laboratory
This is the user documentation repository for the Experimental Computing Laboratory (ExCL) at Oak Ridge National Laboratory.
This site is undergoing development; systems and processes will be documented here as the documentation is created.
See the index on the left of this page for further detail.
Please acknowledge in your publications the role the Experimental Computing Laboratory (ExCL) facility played in your research. Alerting us when a paper is accepted is also appreciated. See Acknowledgment for details.
See Requesting access for information on how to request access to the system.
See Access to ExCL for more details.
Shell login: ssh login.excl.ornl.gov
ThinLinc Session: https://login.excl.ornl.gov:300
Please send an email request to excl-help@ornl.gov for assistance. This initiates a service ticket and dispatches it to ExCL staff.
Please acknowledge in your publications the role the Experimental Computing Laboratory (ExCL) facility played in your research. Alerting us when a paper is accepted is also appreciated.
Sample acknowledgment:
This research used resources of the Experimental Computing Laboratory (ExCL) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725
You may use any variation on this theme, calling out specific simulations or portions of the research that used ExCL resources, or citing specific resources used.
However, the crucial elements to include are:
The spelled out center name (it's okay to include the acronym, too): Experimental Computing Laboratory (ExCL)
Office of Science and U.S. Department of Energy
Contract No. DE-AC05-00OR22725
Additionally, when you add the paper to Resolution, please add “Experimental Computing Laboratory” to Research Centers and Institutes under Funding and Facilities as shown in this image.
We appreciate your conscientiousness in this matter. Acknowledgment and pre-publication notification helps ExCL communicate the importance of its role in science to our sponsors and stakeholders, helping assure the continued availability of this valuable resource.
Two Nvidia H100s are now available on hudson.ftpn.ornl.gov. From Nvidia documentation:
The NVIDIA H100 NVL card is a dual-slot 10.5 inch PCI Express Gen5 card based on the NVIDIA Hopper™ architecture. It uses a passive heat sink for cooling, which requires system airflow to operate the card properly within its thermal limits. The NVIDIA H100 NVL operates unconstrained up to its maximum thermal design power (TDP) level of 400 W to accelerate applications that require the fastest computational speed and highest data throughput. The NVIDIA H100 NVL debuts the world’s highest PCIe card memory bandwidth of nearly 4,000 gigabytes per second (GBps)
Basic validation has been done by running the NVIDIA samples nbody program on both devices:
The GPUs are available to the same UIDs as are using the A100s on milan0. If nvidia-smi does not work for you, you don't have the proper group memberships -- please send email to excl-help@ornl.gov and we will fix it. nvhpc is installed as a module, as it is on other systems.
The EMU-Chick system is composed of 8 nodes connected via a RapidIO interconnect.
Each node has:
8x nodelets, array of DRAMs
A stationary core (SC)
Migration engine, PCI-Express interfaces, and an SSD.
64 GB of DRAM on a 64-byte channel, divided into eight 8-byte narrow-channel DRAMs (NC-DRAMs)
Each nodelet has:
2x Gossamer cores (GC)
64 concurrent in-order, single-issue hardware threads
The path to access each individual EMU node is: login.excl.ornl.gov ⇒ emu-gw ⇒ emu ⇒ {n0-n7}
emu-gw is an x86-based gateway node.
emu is the system board controller (sbc); individual nodes are accessed only via this host.
Connections to emu from emu-gw are via preset ssh keys that are created during account creation. If you can't log in, your user account/project does not have access to the EMU systems.
The EMU software development kit (SDK) is installed under /usr/local/emu on emu-gw, which is an x86 based system. Compilation and simulation should be performed on this machine.
The official EMU programming guide is located under /usr/docs.
emu and emu-gw mount home directories, so you should have no difficulty accessing your projects. Please use $HOME (or ${HOME}) as your home directory in scripts, as the mount location of your home directory may change.
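For example, a sketch of a script that builds paths from ${HOME} rather than hard-coding a mount point (the my_project directory name is hypothetical):

```shell
#!/bin/sh
# Build paths from ${HOME} rather than hard-coding the current mount point,
# since the mount location of home directories may change.
PROJECT_DIR="${HOME}/my_project"   # hypothetical project directory
mkdir -p "${PROJECT_DIR}"
echo "project directory: ${PROJECT_DIR}"
```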
This document will be updated with additional documentation references and user information as it becomes available.
Please send assistance requests to excl-help@ornl.gov.
This system is generally identical to the nodes (AC922 model 8335_GTW) in the ORNL OLCF Summit system. This system consists of
2 POWER9 (2.2 pvr 004e 1202) CPUs, each with 22 cores and 4 threads per core.
6 Tesla V100-SXM2-16GB GPUs
606GiB memory
automounted home directory (on group NFS server)
excl-help@ornl.gov
As currently configured this system is usable using conventional ssh logins (from login.excl.ornl.gov), with automounted home directories. GPU access is currently cooperative; a scheduling mechanism and scheduled access is in design.
The software is as delivered by the vendor, and may not be satisfactory in all respects as of this writing. The intent is to provision a system that is as similar to Summit as possible, but some progress is required to get there. This is to be considered an early access machine.
Please send assistance requests to excl-help@ornl.gov.
This system is still being refined with respect to cooling. As of today, rather than running at the fully capable 300 watts per GPU, GPU usage has been limited to 250 watts to prevent overheating. As cooling is improved, this will be changed back to 300 watts with dynamic power reduction (with notification) as required to protect the equipment.
It is worth noting that this system had to be pushed quite hard (six independent nbody problems, plus CPU stressors on all but 8 threads) to trigger high temperature conditions. These limits may not be encountered in actual use.
GPU performance information can be viewed at
Request access by emailing excl-help@ornl.gov.
Currently has a U250 installed with a custom application deployed which requires an older linux kernel.
Lewis is configured with kernel 5.15.0.
Hold set with:
To remove hold:
Please see
IBM 8335-GTW documentation:
This system is intended for PCIe-based device support.
This system is a generic development server purchased with the intent of housing various development boards as needed.
The system is
Atipa
Tyan Motherboard S7119GMR-06
192 GB memory
Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2x16 cores, no hyperthreading
Centos
This system is used for heterogeneous accelerator exploration and FPGA Alveo/Vitis-based development.
Spike
Main VM with GPUs and FPGAs passed to it. This VM uses Ubuntu 22.04 and software is deployed via modules.
Intrepid
Legacy Vitis development system. Also has docker deployed for Vitis AI work.
Aries
There are currently no special access permissions; the system is available to ExCL users. This may change as needed.
Please send assistance requests to excl-help@ornl.gov.
Has specialized Vivado install for Ettus RFSoC development. See and for the applied patches.
High performance build and compute servers
These 2U servers are highly capable large-memory servers, though they have limited PCIe4 slots for expansion.
HPE ProLiant DL385 Gen10 Plus chassis
2 AMD EPYC 7742 64-Core Processors
configured with two threads per core, so presents as 256 cores
this can be altered per request
1 TB physical memory
16 DDR4 Synchronous Registered (Buffered) 3200 MHz 64 GiB DIMMs
2 HP EG001200JWJNQ 1.2 TB SAS 10500 RPM Disks
one is system disk, one available for research use
4 MO003200KWZQQ 3.2 TB NVME storage
available as needed
These servers are generally used for customized VM environments, which are often scheduled via SLURM, and for networking/DPU research.
Justify
All off
Ubuntu 22.04
Operational
Pharaoh
All off
Ubuntu 22.04
Operational
Affirmed
All off
Ubuntu 22.04
Operational
Secretariat
All off
Ubuntu 22.04
Operational
Affirmed is one of our triple crown servers (named after Triple Crown winners). These are highly capable large-memory servers.
It currently runs Ubuntu 22.04.
BlueField-2 DPU connected to 100Gb Infiniband Network
Can also be connected to 10Gb ethernet network
used to investigate properties and usage of the NVidia BlueField-2 card (ConnectX-6 VPI with DPU).
These servers are generally used for customized VM environments, which are often scheduled via SLURM.
Justify is one of our triple crown servers (named after Triple Crown winners). These are highly capable large-memory servers.
It currently runs Centos 7.9.
These servers are generally used for customized VM environments, which are often scheduled via SLURM.
Pharaoh is one of our triple crown servers (named after Triple Crown winners). These are highly capable large-memory servers.
It currently runs Centos 7.9.
These servers are generally used for customized VM environments, which are often scheduled via SLURM.
Secretariat is one of our triple crown servers (named after Triple Crown winners). These are highly capable large-memory servers.
It currently runs Ubuntu 22.04.
BlueField-2 DPU connected to 100Gb Infiniband Network
Can also be connected to 10Gb ethernet network
used to investigate properties and usage of the NVidia BlueField-2 card (ConnectX-6 VPI with DPU).
These servers are generally used for customized VM environments, which are often scheduled via SLURM.
The Experimental Computing Laboratory is an Advanced Computing Systems Research project directed by Jeffrey Vetter. Support staff include:
Steve Moulton - systems engineer
Aaron Young - software engineer
Contact excl-help@ornl.gov for assistance.
To become authorized to access ExCL facilities, please apply at https://www.excl.ornl.gov/accessing-excl/. You have the option of using your ORNL (ucams) account if you have one, or creating an xcams (external user) account if you wish.
Once you have access you have a couple of options.
login.excl.ornl.gov runs an SSH server, and you can connect to the login node with ssh login.excl.ornl.gov.
There is a limited number of ThinLinc licenses available. Thinlinc (Xfce Desktop) can be accessed at https://login.excl.ornl.gov:300 for HTML5 services, and ThinLinc clients can use login.excl.ornl.gov as their destination. ThinLinc clients can be downloaded without cost from https://www.cendio.com/thinlinc/download. ThinLinc provides much better performance than tunneling X over SSH. A common strategy is to access login.excl.ornl.gov via ThinLinc and then use X11 forwarding to access GUIs running on other nodes.
Notes:
Using an SSH key instead of a password to connect to ExCL is highly recommended. See How to get started with SSH keys. SSH keys are more secure than passwords, and you are less likely to accidentally get banned from multiple incorrect login attempts when using SSH keys to authenticate. If you get blocked, you can send a help ticket to excl-help@ornl.gov with your IP address to get removed from the block list.
If you use a passphrase with your SSH key (recommended for security), you should also set up an SSH Agent to load the SSH key. An SSH Agent allows you to enter your passphrase once for the session without needing to enter your passphrase many times. The VS Code documentation is well written for setting up this SSH Agent on a variety of platforms; see Visual Studio Code Remote Development Troubleshooting Tips and Tricks.
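A minimal sketch of starting an agent for the current shell session (the key path is an assumption; substitute your actual key file):

```shell
# Start an ssh-agent for this shell session; the passphrase is then entered
# once via ssh-add instead of on every connection.
eval "$(ssh-agent -s)" > /dev/null
echo "agent socket: ${SSH_AUTH_SOCK}"
# ssh-add ~/.ssh/id_ed25519   # load your key (path assumed); prompts once
```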
You can manually copy the key if you are already on ExCL, or you can use ssh-copy-id to copy your local system's key to ExCL.
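A sketch of the manual method, run while logged in to ExCL (the public-key filename is an assumption):

```shell
# Ensure ~/.ssh and authorized_keys exist with the permissions sshd requires,
# then append the public key you copied over.
mkdir -p "${HOME}/.ssh" && chmod 700 "${HOME}/.ssh"
touch "${HOME}/.ssh/authorized_keys" && chmod 600 "${HOME}/.ssh/authorized_keys"
# cat ~/id_ed25519.pub >> "${HOME}/.ssh/authorized_keys"   # key filename assumed
echo "authorized_keys ready"
```

From your local machine, `ssh-copy-id -i ~/.ssh/id_ed25519.pub login.excl.ornl.gov` achieves the same result in one step.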
ExCL
Experimental Computing Lab
CPU
Central Processing Unit
GPU
Graphics Processing Unit
FPGA
Field-programmable Gate Array
DSP
Digital Signal Processor
eMMC
Embedded MultiMediaCard
DRAM
Dynamic Random-Access Memory
HBM
High-Bandwidth Memory
SSH
Secure Shell
ExCL reserves the first Tuesday of every month for systems maintenance. This may result in complete inaccessibility during business hours. Every effort will be made to minimize the scope, duration, and effect of maintenance activities.
If an outage will affect urgent projects (i.e., with impending deadlines) please email excl-help@ornl.gov as soon as possible.
Overview of ExCL Systems
Triple Crown AMD EPYC 7742 (Rome) 2x64-core 1 TB
Ubuntu 22.04
Bluefield 2
NIC/DPUs
Desktop embedded system development
Ubuntu 20.04
Snapdragon 855 (desktop retiring)
ApachePass memory system
Centos 7.9
375 GB Apachepass memory
Desktop embedded system development
Ubuntu 22.04
Intel A770 Accelerator
AMD EPYC 7272 (Rome) 2x12-core 256 GB
Ubuntu 22.04
2 AMD MI100 32 GB GPUs
Intel 20 Core Server 96 GB
Ubuntu 20.04
Docker development environment
DGX Workstation Intel Xeon E5-2698 v4 (Broadwell) 20-core 256 GB
Ubuntu 22.04
4 Tesla V100-DGXS 32 GB GPUs
AMD EPYC 7702 (Rome) 2x64-core 512 GB
Ubuntu 22.04
2 AMD MI60 32 GB GPUs
AMD EPYC 9454 (Genoa) 2x48-core 1.5 TB
Ubuntu 22.04
2 Nvidia H100s
Triple Crown AMD EPYC 7742 (Rome) 2x64-core 1 TB
Centos 7.9
Summit server POWER9 42 Cores
Centos 8.4
6 Tesla V100 16 GB GPUs
Desktop embedded system development
Ubuntu 22.04
Desktop embedded system development
Ubuntu 20.04
Snapdragon 855 & PolarFire SoC (retiring)
AMD EPYC 7513 (Milan) 2x32-core 1 TB
Ubuntu 22.04
2 * Nvidia A100
AMD EPYC 7513 (Milan) 2x32-core 1 TB
Ubuntu 22.04 or other
2 Groq AI accelerators
AMD EPYC 7513 (Milan) 2x32-core 1 TB
Ubuntu 22.04 or other
8 Nvidia Tesla V100-PCIE-32GB GPUs
AMD EPYC 7513 (Milan) 2x32-core 1 TB
Ubuntu 22.04 or other
General Use
Apple M1 Desktop
OSX
Oswald head node
Ubuntu 22.04
Intel Xeon E5-2683 v4 (Haswell) 2x16-core 256 GB
Centos 7.9
Tesla P100 & Nallatech FPGA
Intel Xeon E5-2683 v4 (Haswell) 2x16-core 256 GB
Centos 7.9
Tesla P100 & Nallatech FPGA
Intel Xeon E5-2683 v4 (Haswell) 2x16-core 256 GB
Centos 7.9
Tesla P100 & Nallatech FPGA
Intel Xeon Gold 6130 CPU (Skylake) 32-core 192 GB
Ubuntu 22.04
Xilinx U250, Nallatech Stratix 10, Tesla P100, Groq Card
Triple Crown AMD EPYC 7742 (Rome) 2x64-core 1 TB
Centos 7.9
Intel 4 Core 64 GB
Ubuntu 22.04
AMD Vega20 Radeon VII GPU
Triple Crown AMD EPYC 7742 (Rome) 2x64-core 1 TB
Ubuntu 22.04
Bluefield 2 NIC/DPU
ARM Cavium ThunderX2 Server 128 GB
Centos Stream 8
Nvidia Jetson AGX
Ubuntu
Volta GPU
Nvidia Jetson AGX Orin
Ubuntu
Ampere GPU (not deployed)
AMD Ryzen Threadripper 3970X (Castle Peak) 32-core 132 GB
Ubuntu 22.04
Nvidia GTX 3090 AMD Radeon RX 6800
2 Snapdragon HDK & Display
Intel ARC GPU
Achronix FPGA
AGX Orin Developer Kits
Xilinx U280
AMD Radeon VII GPU
radeon
AMD MI60 GPU
explorer
AMD MI100 GPU
cousteau
milan1
Nvidia A100 GPU
milan0
Nvidia P100 GPU
pcie
Nvidia V100 GPU
equinox, leconte, milan2
Nvidia H100 GPU
hudson
Nvidia Jetson
xavier
amundsen, mcmurdo
Intel Stratix 10 FPGA
pcie
Xilinx Zynq ZCU 102
n/a
Xilinx Zynq ZCU 106
n/a
Xilinx Alveo U250
pcie
2 Ettus x410 SDRs
marconi
Intel Optane DC Persistent Memory
apachepass
Emu Technology CPU
Cavium CPU
thunderx
RTP164 High Performance Oscilloscope
Login is the node used to access ExCL and to proxy into and out of the worker nodes. It is not to be used for computation, but for accessing the compute nodes. The login node does have ThinLinc installed and can also be used for graphical access and more performant X11 forwarding from an internal node. See ThinLinc Quickstart.
login
4 core 16 Gi vm
-
login node - not for computation, TL
These nodes can be accessed with ssh, and are available for general interactive use.
oswald
16 Core 64 Gb
-
Usable, pending rebuild to Ubuntu
oswald00
32 core 256 Gi
NVIDIA P100, FPGA @
oswald02
32 core 256 Gi
NVIDIA P100, FPGA @
Not available - rebuilding
oswald03
32 core 256 Gi
NVIDIA P100, FPGA @
Not available - rebuilding
milan0
128 Core 1 Ti
NVIDIA A100 (2)
Slurm
milan1
128 Core 1 Ti
Groq AI Accelerator (2)
Slurm
milan2
128 Core 1 Ti
NVIDIA V100 (8)
milan3
128 Core 1 Ti
-
Slurm
excl-us00
32 Core 192 Gi
-
Rocky 9
excl-us01
32 Core 192 Gi
-
Not available pending rebuild
excl-us03
32 Core 192 Gi
-
CentOS 7 pending rebuild
secretariat
256 Core 1 Ti
-
Slurm
affirmed
256 Core 1 Ti
-
Slurm
pharaoh
256 Core 1 Ti
-
Slurm
justify
256 Core 1 Ti
-
Slurm
hudson
192 Core 1.5 Ti
NVIDIA H100 (2)
docker
20 Core 96 Gi
-
Configured for Docker general use with enhanced image storage
pcie
32 Core 196 Gi
NVIDIA P100, FPGA @
TL, No hyperthreading, passthrough hypervisor for accelerators
lewis
20 Core 48 Gi
NVIDIA T1000, U250
TL
clark
20 Core 48 Gi
NVIDIA T1000
TL
zenith
64 core 128 Gi
NVIDIA GeForce RTX 3090 @
TL
radeon
8 Core 64 Gi
AMD Radeon VII
equinox
DGX Workstation
NVIDIA V100 * 4
rebuilding after ssd failure
explorer
256 Core 512 Gi
AMD MI60 (2)
cousteau
48 Core 256 Gi
AMD MI100 (2)
leconte
168 Core 602 Gi
NVIDIA V100 * 6
PowerPC (Summit)
Zenith
32 Core 132 Gi
Nvidia GTX 3090 AMD Radeon RX 6800
TL
Zenith2
32 Core 256 Gi
Embedded FPGAs
TL
Notes:
All of the general compute resources have hyperthreading enabled unless otherwise stated. This can be changed on a per-request basis.
TL: ThinLinc enabled. Need to use login as a jump host for resources other than login. See ThinLinc Quickstart.
Slurm: Node is added to a slurm partition and will likely be used for running slurm jobs. Try to make sure your interactive use does not conflict with any active Slurm jobs.
Most of the general compute resources are Slurm-enabled, to allow queuing of larger-scale workloads. Contact excl-help@ornl.gov for specialized assistance. Only the systems that are heavily used for running Slurm jobs are marked “Slurm” above.
login
— not for heavy computation
zenith
zenith2
clark
lewis
pcie
intrepid
spike
Triple Crown — Dedicated Slurm runners.
affirmed
justify
secretariat
pharaoh
Milan — Additional Slurm Resources with other shared use.
milan0
milan1
milan3
Others — Shared slurm runners with interactive use.
milan[0-3]
cousteau
excl-us03
explorer
oswald
oswald[00, 02-03]
slurm-gitlab-runner
— Gitlab Runner for launching slurm jobs.
docker
— for docker runner jobs.
devdoc
— for internal development documentation building and hosting.
Note: any node can be used as a CI runner on request. See GitLab Runner Quickstart and GitHub Runner Quickstart. The above systems have a dedicated or specialized use with CI.
docker
— Node with docker installed.
dragon (vm)
Siemens EDA Tools
task-reserved
devdocs (vm)
Internal development documentation building and hosting
task-reserved
spike (vm)
pcie
vm with FPGA and GPU passthrough access
task-reserved
lewis
U250
RISC-V Emulation using U250
slurm-gitlab-runner
slurm integration with gitlab-runner
task-reserved
docker
slurm-integration with gitlab runner for containers
reserved for container use
Notes:
task-reserved: reserved for specialized tasks, not for general project use
excl-us01 (hypervisor)
Intel 16 Core Utility Server 196 GB
This document describes how to access Snapdragon 855 HDK boards through the mcmurdo and amundsen ExCL machines. The Snapdragon 855 HDK board is connected to Ubuntu Linux machines through ADB.
The Qualcomm® Snapdragon™ 855 Mobile Hardware Development Kit (HDK) is a highly integrated and optimized Android development platform.
Accessing this system:
The Qualcomm board is connected to an HP Z820 workstation (McMurdo) or to an HP Z4 workstation (Clark) through USB
Development Environment: Android SDK/NDK
Login to mcmurdo or clark
$ ssh -Y mcmurdo
Setup Android platform tools and development environment
$ source /home/nqx/setup_android.source
Make sure you have a functioning environment
adb kill-server
adb start-server
adb root (restart adbd as root)
adb devices (to make sure there is a snapdragon responding)
adb shell (to test connecting to the device)
Run Hello-world on ARM cores
$ make compile push run
Run OpenCL example on GPU
Run Sobel edge detection
$ make compile push run fetch
Login to Qualcomm development board shell
$ adb shell
$ cd /data/local/tmp
The snapdragon SDK uses python 2.7; you may need to explicitly specify python2 in your environment.
Access will be granted per request (as this cannot be used as a shared resource).
This system is a generic development server purchased with the intent of housing various development boards as needed.
The system is
Penguin Computing Relion 2903GT
Gigabyte motherboard MD90-FS0-ZB
256 GB memory
Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz 2x16 cores no hyperthreading
Centos
There are currently no special access permissions. The system is available to ExCL users. This may change as needed.
Please send assistance requests to excl-help@ornl.gov.
Oswald01 has been decommissioned due to a hardware failure.
This system is a generic development server purchased with the intent of housing various development boards as needed.
The system is
Penguin Computing Relion 2903GT
Gigabyte motherboard MD90-FS0-ZB
256 GB memory
Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz 2x16 cores no hyperthreading
Centos
Micron 9100 NVM 2.4TB MTFDHAX214MCF
There are currently no special access permissions. The system is available to ExCL users. This may change as needed.
Please send assistance requests to excl-help@ornl.gov.
This system is a generic development server purchased with the intent of housing various development boards as needed.
The system is
Penguin Computing Relion 2903GT
Gigabyte motherboard MD90-FS0-ZB
256 GB memory
Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz 2x16 cores no hyperthreading
Centos
There are currently no special access permissions. The system is available to ExCL users. This may change as needed.
Please send assistance requests to excl-help@ornl.gov.
This system is a generic development server purchased with the intent of housing various development boards as needed.
The system is
Penguin Computing Relion 2903GT
Gigabyte motherboard MD90-FS0-ZB
256 GB memory
Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz 2x16 cores no hyperthreading
Centos
There are currently no special access permissions. The system is available to ExCL users. This may change as needed.
Please send assistance requests to excl-help@ornl.gov.
(quad03)
To become authorized to access ExCL facilities, please apply at . You have the option of using your ORNL (ucams) account if you have one, or creating an xcams (external user) account if you wish.
$ git clone
$ git clone
Android Studio:
Qualcomm HDK:
Qualcomm Neural Processor SDK:
Apptainer/Singularity is the most widely used container system for HPC. It is designed to execute applications at bare-metal performance while being secure, portable, and 100% reproducible. Apptainer is an open-source project with a friendly community of developers and users. The user base continues to expand, with Apptainer/Singularity now used across industry and academia in many areas of work.
Apptainer is a container platform. It allows you to create and run containers that package up pieces of software in a way that is portable and reproducible. You can build a container using Apptainer on your laptop, and then run it on many of the largest HPC clusters in the world, local university or company clusters, a single server, in the cloud, or on a workstation down the hall. Your container is a single file, and you don’t have to worry about how to install all the software you need on each different operating system.
Apptainer allows for more secure containers than Docker without the need for root access.
From Why you should use Apptainer vs Docker | Medium.
Apptainer allows you to:
Build on a personal computer with root or on a shared system with fakeroot.
Move images between systems easily.
Execute on a shared system without root.
Apptainer is designed for HPC:
Defaults to running as the current user
Defaults to mounting the home directory in /home/$USER
Defaults to running as a program (not background process)
Apptainer also has great support with Docker images.
docker
thunderx
zenith
Other systems can have Apptainer installed by request.
Apptainer mounts $HOME, /sys:/sys, /proc:/proc, /tmp:/tmp, /var/tmp:/var/tmp, /etc/resolv.conf:/etc/resolv.conf, /etc/passwd:/etc/passwd, and $PWD by default, and runs in ~ by default. This means you can change files in your home directory by running with Apptainer. This is different from Docker, which creates a container (overlay in Apptainer) by default for the application to run in. See Bind Paths and Mounts.
To mount another location when running Apptainer, use the --bind option. For example, to mount /noback use --bind /noback:/noback. See Bind Paths and Mounts.
Admins can specify default bind points in /etc/apptainer/apptainer.conf. See Apptainer Configuration Files.
When creating a definition file, pay attention to the rules for each section. See Definition Files. For example:
%setup is a scriptlet which runs outside the container and can modify the host. Use ${APPTAINER_ROOTFS} to access the files in the Apptainer image.
Environment variables defined in %environment are available only after the build, so if you need access to them for the build, define them in the %post section.
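As a sketch (base image and variable name are illustrative, not from the ExCL docs), a definition file that needs the same variable at build time and at runtime sets it in both %post and %environment:

```
Bootstrap: docker
From: ubuntu:22.04

%post
    # Visible during the build; %environment values are NOT available here.
    export DATA_DIR=/opt/data
    mkdir -p "$DATA_DIR"

%environment
    # Visible only when the built container runs.
    export DATA_DIR=/opt/data
```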
To use --fakeroot you must first have fakeroot configured for that user. This can be done with the command sudo apptainer config fakeroot --add <user>. See User Namespaces & Fakeroot.
To use X11 applications in Apptainer over ThinLinc, you need to bind /var/opt/thinlinc with --bind /var/opt/thinlinc since that is where the user’s XAuthority file is stored.
The sandbox image build mode along with fakeroot can help if one needs to apt-get install or yum install packages within a Singularity/Apptainer container and persist the mutable image out on disk: Build a Container — Apptainer User Guide main documentation.
From https://apptainer.org/docs/admin/main/installation.html#nfs.
NFS filesystems support overlay mounts as a lowerdir
only, and do not support user-namespace (sub)uid/gid mapping.
Containers run from SIF files located on an NFS filesystem do not have restrictions.
In setuid mode, you cannot use --overlay mynfsdir/ to overlay a directory onto a container when the overlay (upperdir) directory is on an NFS filesystem. In non-setuid mode with fuse-overlayfs it is allowed, but will be read-only.
When using --fakeroot and /etc/subuid mappings to build or run a container, your TMPDIR / APPTAINER_TMPDIR should not be set to an NFS location.
You should not run a sandbox container with --fakeroot and /etc/subuid mappings from an NFS location.
See registry (ornl.gov) for general information for how to use the ORNL Container Repositories. These sites https://camden.ornl.gov and https://savannah.ornl.gov are the internal and external container repositories running Harbor.
These container registries also work with Apptainer images. Follow the regular instructions to set up Harbor, then see the commands below for an Apptainer-specific reference.
Create a robot account in Harbor using the regular method.
Then use the CI environment variables APPTAINER_DOCKER_USERNAME and APPTAINER_DOCKER_PASSWORD to specify the robot username and token. Make sure to deselect “Expand variable reference” since the username has a ‘$’ in it.
It is helpful to add commonly needed bind paths to /etc/apptainer/apptainer.conf. I have added the following bind commands to Zenith:
ORNL users can also look at this ornl-containers / singularity page for more details on using containers at ORNL.
There are two likely sources of this problem:
The most frequent cause is having your visitor (non-ORNL internal) password wrong, or having had it expire. See https://xcams.ornl.gov to address this. If you are ORNL staff, a frequent cause is a failure to keep your internal ORNL systems password (UCAMS) up to date, or having missed required training. ExCL makes the same check that any ORNL system makes as to whether a password is valid or an account exists (you will not be able to differentiate the two errors based on the login failure). This will look like
ExCL limits logins to five consecutive failures within a short period of time. After that limit is exceeded, login attempts from your IP address will be blocked. This might look like
To have this addressed, report your IP address to excl-help@ornl.gov. If you are on an ORNL network, you can use the usual native tools on your system to find your IP address. If you are at home and on a network using NAT (as most home networks do), use What Is My IP? Best Way To Check Your Public IP Address to determine your public IPv4 address when external to the lab. Note that this will not report the correct address if you are on an ORNL (workstations or visitor) network.
The recommended approach for accessing git repositories in ExCL is to use the SSH protocol instead of the HTTPS protocol for private repositories, and either protocol for public repositories. However, both approaches will work with the proper proxies, keys, application passwords, and password managers in place.
To use the SSH protocol you must first setup SSH keys to the git website (i.e. GitLab, GitHub, and Bitbucket). See Git - Setup Git access to code.ornl.gov | ExCL User Docs (ornl.gov) for details for how to do this for code.ornl.gov. The other Git Clouds have similar methods to add SSH keys to your profile.
Since the worker nodes are behind a proxy, you must set up an SSH jump host in your .ssh/config to access Git SSH servers. See Git - Git SSH Access | ExCL User Docs (ornl.gov) to verify that you have set up the proper lines in your SSH config.
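One common way to do this (the host list is illustrative; check the linked page for the exact ExCL recommendation) is a ProxyJump entry in ~/.ssh/config:

```
# Route Git SSH traffic through the ExCL login node (hosts shown are examples)
Host github.com code.ornl.gov bitbucket.org
    ProxyJump login.excl.ornl.gov
```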
See Python | ExCL User Docs for instructions on how to setup a Python virtual environment with the latest version of pip.
See Python | ExCL User Docs for instructions on how to use UV to setup a Python virtual environment with a specific python version.
Documentation published to ExCL users is available in our GitHub repo. Users are encouraged to contribute by improving the material or providing user-created tutorials to share with the community.
Would you like to make things better? There are a few ways you can contribute to improving our documentation and adding user-created tutorials or content.
Email your suggestions to the team excl-help@ornl.gov
Want to change things? Feeling adventurous? Comfortable with git? See instructions for our Git workflow to branch our documentation repository and hack away. You got this.
Getting started with ExCL Remote Development.
If you are new to remote development on ExCL, here is a roadmap to follow to configure important settings and get familiar with remote Linux development.
Setup SSH: SSH Keys for Authentication | ExCL User Docs
Bonus: SSH-Agent and SSH Forwarding
Setup VS Code Remote Explorer: Visual Studio Code Remote Explorer | ExCL User Docs
Important: Make sure to check the setting Remote.SSH: Lockfiles in Tmp.
Setup FoxyProxy. This enables access to ThinLinc as well as any other web services running on ExCL systems.
Now you are ready to follow any of the other Quick-Start Guides.
Launch a dynamic SOCKS proxy to the login node using SSH dynamic forwarding.
On Linux or macOS, use the SSH flag -D, or add the DynamicForward option in the ssh config.
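For example, a minimal ~/.ssh/config entry (the host alias is illustrative; port 9090 matches the FoxyProxy configuration used in this guide):

```
Host excl-proxy
    HostName login.excl.ornl.gov
    DynamicForward 9090
```

With this entry, `ssh excl-proxy` is equivalent to `ssh -D 9090 login.excl.ornl.gov`.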
On Windows, use MobaSSHTunnel to set up dynamic forwarding. See Jupyter Quickstart for more information on port forwarding on Windows.
Set up FoxyProxy: install the FoxyProxy Chrome extension or Firefox extension.
Configure FoxyProxy by adding a new proxy for localhost on port 9090, then add the regular expression URL pattern .*\.ftpn\.ornl\.gov to forward ThinLinc traffic to ExCL.
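As a quick local sanity check (hudson.ftpn.ornl.gov is just one example internal hostname), you can confirm the URL pattern matches with grep:

```shell
# The FoxyProxy URL pattern, checked against an example internal ThinLinc URL.
pattern='.*\.ftpn\.ornl\.gov'
url='https://hudson.ftpn.ornl.gov:300'
if printf '%s\n' "$url" | grep -Eq "$pattern"; then
    echo "pattern matches"
fi
```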
Created using PC Part Picker. The build is available at https://pcpartpicker.com/list/xPkRwc.
| Part | Price |
| --- | --- |
| CPU | $2300.98 @ Amazon |
| CPU Cooler | - |
| Motherboard | $1988.99 @ Amazon |
| Memory | $249.99 @ Amazon |
| Storage | $125.65 @ Amazon |
| Video Card | $1499.99 @ Amazon |
| Video Card | $1720.23 @ Amazon |
| Case | - |
| Power Supply | $304.99 @ Newegg |
| Case Fan | $24.75 @ Amazon |
| Case Fan | $24.75 @ Amazon |
| Monitor | $289.00 @ Amazon |

Prices include shipping, taxes, rebates, and discounts.
Total: $8529.32
To have access to the GPUs, request to be added to the `video` and `render` groups if you are not already in these groups.
Created using PC Part Picker. The build is available at https://pcpartpicker.com/list/vjXBPF.
| Part | Price |
| --- | --- |
| CPU | $1605.00 @ Amazon |
| CPU Cooler | $250.00 @ Amazon |
| Motherboard | - |
| Memory | $649.99 @ Amazon |
| Storage | $169.99 @ B&H |
| Video Card | $159.99 @ Amazon |
| Case | $89.99 @ Amazon |
| Power Supply | $456.21 @ Amazon |
| Case Fan | $26.95 @ Amazon |
| Case Fan | $26.95 @ Amazon |
| Case Fan | $26.95 @ Amazon |

Prices include shipping, taxes, rebates, and discounts.
Total: $3462.02
While our file server, backup file server, and ORNL-provided tape backup are quite robust, ExCL does not have formally supported backups. Please store important files in source control, for example using git with gitlab or github. Important data (if any) should be duplicated elsewhere; contact excl-help@ornl.gov for assistance.
Snapshots take space for files that have changed or been deleted. They are automatically deleted as they age, so that hourlies are kept for 48 hours, one hourly from each day is kept for 30 days, and one hourly for each 30 day period is kept for 180 days. This policy can be modified on request. Snapshots are read only; you can copy files from them back into your home directory tree to restore them.
There is currently no file purge policy. Given that ExCL researchers take care of cleaning up files that are no longer in use, no change to this policy is foreseen. Files for inactive users are archived in a non-snapshot file system. While it is our intent to continue maintaining storage for inactive users, this policy may change in the future.
`/scratch/` is not shared between nodes, not stored in RAID, and not backed up in any way. However, this storage does not have any automatic purging policy (unlike `/tmp/`), so files should persist as long as the storage doesn't fill up and the drives don't fail.
Shared storage space for collaborative projects is available upon request. Each project is assigned a dedicated subvolume within the ZFS filesystem, which is accessible via an automounted NFS share. The mount point for each project is located at:
Access to the project directories is restricted for security and organization. Only execute permissions are set on the `/auto/projects/` directory, meaning you must know the specific project name to `cd` into it; you will not be able to use `ls` to list all available project directories.
Access Control Lists (ACLs) are used to manage permissions for project directories, allowing for flexible access configurations. By default, all members associated with a project will have read, write, and execute permissions for the files within their assigned project directory.
Getting Started with Julia in ExCL with best practice recommendations.
Use `module load julia` to load the Julia tooling on an ExCL system.
This can be done by setting `julia.executablePath` to point to the Julia executable the extension should use, which in this case is the one loaded by the `module load` command for the version of Julia you want to use. Once set, the extension will always use that version of Julia.
To edit your configuration settings, execute the Preferences: Open User Settings command (also accessible via the menu File → Preferences → Settings), and then make sure your user settings include the `julia.executablePath` setting.
The format of the string should follow your platform-specific conventions. Be aware that the backslash `\` is the escape character in JSON, so you need to use `\\` as the path separator character on Windows.
To find the proper path to Julia, you can use `which julia` after the module load command.
At the time of writing, the default version of Julia installed on ExCL is 1.10.4 and `julia.executablePath` should be set as shown below.
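As an illustration, the user settings entry might look like the following, where the path placeholder should be replaced by whatever `which julia` reports on your system:

```json
{
    "julia.executablePath": "/path/printed/by/which/julia"
}
```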
Within ExCL, the first step is to load the Julia tooling with `module load julia`.
The third step is to install IJulia from the Julia REPL. Launch the REPL with `julia`, press `]` to open the package manager, then run `add IJulia`.
The recommended way to install Conda and Spack.
Getting started with Jupyter Notebook.
Create a Python virtual environment and activate it. Then install `ipykernel` and install the kernel for use in Jupyter.
Use `jupyter kernelspec list` to view all the installed Jupyter kernels.
To uninstall a Jupyter kernel, use `jupyter kernelspec uninstall <kernel-name>`.
A Jupyter notebook server running on ExCL can be accessed via a local web browser by port forwarding the notebook's port. By default, this is port 8888 (or the next available port). This port might be in use if someone else is running a notebook; you can specify the port with the `--port` flag when launching the notebook. To use a different port, replace 8888 with the desired port number. To port forward from an internal node, you have to forward twice: once from your machine to login.excl.ornl.gov, and again from the login node to the internal node (i.e. pcie).
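With a reasonably recent OpenSSH client, both hops can be expressed in one command using a jump host (username, node, and ports here are placeholders; substitute your own):

```
ssh -L 8888:localhost:8888 -J yourusername@login.excl.ornl.gov yourusername@pcie
```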
These instructions go over how to access a Jupyter notebook running on the pcie node in the ExCL Cluster. If you want to access a different system, then replace pcie
with the system you intend to access.
Specify the ports that you intend to use. Choose a different number from the default so that you don't conflict with other users.
From your local machine connect to pcie using login.excl.ornl.gov as a proxy and local forward the jupyter port.
(Optional) Load the anaconda module if you don't have jupyter notebook installed locally.
Launch the Jupyter server on pcie
Connect to the Jupyter notebook using a web browser on your local machine. Use the token shown in the output from running the Jupyter server. Url: http://localhost:<local_port>/?token=<token>
. You can also configure jupyter to use a password with jupyter notebook password
if you don't want to use the access tokens.
If your SSH client is too old for ProxyJump to work, you can always break the process into another step.
From your local machine connect to login.excl.ornl.gov and local port forward port 8888.
From the login node connect to pcie and local port forward port 8888
Launch the Jupyter server on pcie
Connect to the Jupyter notebook using a web browser on your local machine. Use the token shown in the output from running the Jupyter server. Url: http://localhost:8888/?token=<token>
These instructions go over how to access a Jupyter notebook running on the pcie node in the ExCL Cluster.
From your local machine connect to login.excl.ornl.gov using MobaXterm.
Go to Tools and click on MobaSSHTunnel. Use MobaSSHTunnel to local forward port 8888.
Click on MobaSSHTunnel
Click on New SSH Tunnel
Local port forward 8888
Click the play button to start port forwarding
From the login node connect to pcie and local port forward port 8888
Launch the Jupyter server on pcie
Connect to the Jupyter notebook using a web browser on your local machine. Use the token shown in the output from running the Jupyter server. URL: http://localhost:8888/?token=<token>
These instructions go over how to access a Jupyter notebook running on the quad00 node in the ExCL Cluster using Visual Studio Code to handle port forwarding.
Open Visual Studio Code
Make sure you have the Remote - SSH extension installed.
Setup .ssh
Navigate to the remote explorer settings.
Choose the user .ssh config.
Add the remote systems to connect to with the proxy command to connect through the login node.
Connect to the remote system and open the Jupyter folder.
Open Folder
Run the Jupyter notebook using the built-in terminal.
Open the automatically forwarded port.
Getting started with Gitlab CI runners in code.ornl.gov running on ExCL systems.
Runners can be registered either as a group runner or for a single repository (also known as a project runner). Group runners are made available to all the repositories in a group.
URL
Registration Token
Executor (choose shell or docker with image)
Project Name (This can be group name or repo name)
ExCL System
Tag List
After the runner is added, you can edit the runner to change the tags and description.
Any system can be requested as a runner. These systems are already being used as a runner. (Updated October 2023)
docker.ftpn.ornl.gov
explorer.ftpn.ornl.gov
intrepid.ftpn.ornl.gov
justify.ftpn.ornl.gov
leconte.ftpn.ornl.gov
lewis.ftpn.ornl.gov
milan2.ftpn.ornl.gov
milan3.ftpn.ornl.gov
oswald00.ftpn.ornl.gov
oswald02.ftpn.ornl.gov
oswald03.ftpn.ornl.gov
pcie.ftpn.ornl.gov
zenith.ftpn.ornl.gov
The system slurm-gitlab-runner is set up specifically to run CI jobs that then run the execution using Slurm with `sbatch --wait`.
This template includes two helper scripts, `runner_watcher.sh` and `slurm-tee.py`. `runner_watcher.sh` watches the CI job and cancels the Slurm job if the CI job is canceled or times out. `slurm-tee.py` watches the `slurm-out.txt` and `slurm-err.txt` files and prints their content to stdout so that the build log can be watched from the GitLab web interface. Unlike a regular `less --follow`, `slurm-tee` watches multiple files for changes and also exits once the Slurm job completes.
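A minimal CI job using these pieces might be sketched as follows; the runner tag, job name, and batch script name are assumptions, and the actual template in the repository is authoritative:

```yaml
slurm-job:
  tags: [slurm-gitlab-runner]     # assumed tag for the Slurm-enabled runner
  script:
    - ./runner_watcher.sh &       # cancel the Slurm job if the CI job dies
    - sbatch --wait job.sbatch &  # submit and block until the job completes
    - ./slurm-tee.py slurm-out.txt slurm-err.txt  # stream job output to the CI log
```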
Getting started with Groq.
Start by logging into ExCL's login node.
From the login node, you can then login to a node with a Groq card, for example
Here is a table of the Groq cards available:
The recommended way to access a Groq card is to reserve it through the Slurm resource manager. Groq cards are available on machines in the groq partition. To reserve a node with a Groq card for interactive use, use the command below.
Where:
- `-J`, `--job-name=<jobname>` specifies the job name.
- `-p`, `--partition=<partition names>` specifies the partition name.
- `--exclusive` requests exclusive access to the node.
- `--gres="groq:card:1"` requests one Groq card.
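Putting the flags together, an interactive reservation might look like this (the job name is illustrative):

```
srun -J groq-test -p groq --exclusive --gres="groq:card:1" --pty bash
```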
Non-interactive batch jobs can similarly be launched.
Where:
- `-J`, `--job-name=<jobname>` specifies the job name.
- `-p`, `--partition=<partition names>` specifies the partition name.
- `--exclusive` requests exclusive access to the node.
- `--gres="groq:card:1"` requests one Groq card.
or specified in the script:
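A batch script sketch with the options embedded as `#SBATCH` directives (the job name and script body are illustrative):

```
#!/bin/bash
#SBATCH -J groq-test
#SBATCH -p groq
#SBATCH --exclusive
#SBATCH --gres="groq:card:1"

# your Groq workload here
./run_model.sh
```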
In order to use the Groq API, you need to make sure you are using Python 3.8 and that you add the Groq Python libraries to your path. For Python 3.8 you can either use the installed system python3.8 or use Conda to install it.
You need to fully qualify the path to Python, since Ubuntu 22.04 defaults to python3.10. This means you need to use
Then to install Jupyter Notebook in your home directory, you would need to do
Run regression tests to verify card functionality: /opt/groq/runtime/site-packages/bin/tsp-regression run
Get Groq device status: /opt/groq/runtime/site-packages/bin/tsp-ctl status
Monitor temperature and power: /opt/groq/runtime/site-packages/bin/tsp-ctl monitor
User files (home directories) are stored on a ZFS-based NFS server and are generally available on all ExCL systems (there are exceptions for operational and security reasons; if you trip over something, please let us know). The `/noback/<user>` facility is no longer supported and is not being created for new user accounts. Files already in the `/noback` hierarchy will not be affected; if you would like assistance moving these files to your home directory, please let us know. Space available to /noback is limited.
ExCL uses ZFS with snapshots. Zrepl handles both automated snapshot generation and file system replication. Snapshots are taken hourly, and ExCL file systems are replicated to the backup (old FS00) fileserver.
The snapshot directory name format is `~/.zfs/snapshots/zrepl_yyyymmdd_hhmmss_000` (where the hour is in UTC, not Eastern Daylight/Standard Time). The use of UTC in the snapshot name is a zrepl property to enable global replication consistency and is not modifiable. If you deleted or made a destructive modification to, say, `~/.bashrc` on June 11, 2024 at 3 PM, it should be available in `~/.zfs/snapshots/zrepl_20240611_185313_000/.bashrc` and in earlier snapshots.
Refquotas are applied to the ZFS filesystems to avoid runaway storage usage. A refquota limit applies only to your files, not to snapshot storage. ZFS stores data in a (very fast) compressed format, so disk usage may appear to be less than you expect. Home and project subvolumes start with a refquota of 512G. Users can request higher quotas via excl-help@ornl.gov. We can also help diagnose the cause of large storage use by providing a breakdown of file usage and helping clean up unneeded large files and snapshots.
In addition to shared network storage, each system has a local `/scratch` directory. The size varies from system to system, and some systems may have `/scratch2` in addition. A working space can be created with `mkdir /scratch/$USER` if one is not already present.
This storage location is good for caching files on local host storage,
for speeding up tasks which are storage IO bound, and performing tasks
which fail on NFS storage (for example, Apptainer and embedded Linux builds).
If you require more scratch storage than is available, contact us, as on newer systems there is often additional storage available that has not been allocated. Similarly, contact us if there is no /scratch or /scratch2 directory.
Since there is (currently) no purging policy, please clean up after you
no longer need your scratch space.
This guide goes over hosting ORNL-internal documentation using ExCL's devdocs VM.
If you would like to host your project's internal documentation on ExCL, please email excl-help@ornl.gov with the following information, and we can help you get started with a DevDocs subdirectory and the DevDocs GitLab Runner.
See to learn more about Julia.
Since Julia is installed and loaded as a module, the Julia VS Code extension has trouble finding the Julia executable it needs to run properly. Therefore, to use the extension on ExCL worker nodes via Remote SSH, you must explicitly set the Julia executable location to the correct path.
The second step is to install Jupyter.
Finally, the last step is to launch a Jupyter notebook and select the Julia kernel to use.
This guide goes over the recommended way to install Conda and Spack in ExCL. If you are already familiar with the Conda and Spack installation process, then these tools can be installed to their default locations. One recommendation is to store the `environment.yml` and `spack.yaml` files in your git repositories to make it easy to recreate the Conda and Spack environments required for that project. The remainder of this page goes over the installation in more detail.
With recent changes to the Conda license, we are unable to use the default Conda channel without a paid license. You are still able to use conda/miniconda with the `conda-forge` repository, but you must change it from using the `default` repository. See and for some additional information. The recommended approach is now to use , , or for managing Python environments; these work better and avoid the license issues. See also for more information on how to get started with Python.
See the for the latest installation instructions. I install Miniconda instead of Anaconda since I do not require the 3GB of included packages that come with Anaconda and I will be installing my own packages anyways.
To improve the performance of the Conda environment solver, you can use the `conda-libmamba-solver` plugin, which allows you to use `libmamba`, the same `libsolv`-powered solver used by mamba and micromamba, directly in `conda`.
See and for more information.
Since there are many ways to install Jupyter using various Python management tools, I will not reproduce the documentation here. The official documentation for installing Jupyter can be found at . However, I will highlight the methods of using , running , and the alternative to Jupyter notebooks, . These are the methods I typically use when working with Python notebooks.
See the UV documentation, . This documentation is well written and covers:
See . Although , the following steps are still a good way to manually create and use a kernel from Jupyter.
Send the following information to excl-help@ornl.gov and we will register the runner as a system runner.
The method for obtaining this information differs depending on whether you want to register a group runner or a single-repository runner. See the sections below.
Navigate to the group page. Click on Build → Runners. Then select New group runner and proceed until you have created the runner and are provided with a command to run in the command line to register it. Since we use system runners instead of user runners, you will need to send this information to excl-help@ornl.gov to get the runner registered.
Navigate to the repo page. Click on Settings → CI/CD → Runners. Then select New project runner and proceed until you have created the runner and are provided with a command to run in the command line to register it. Since we use system runners instead of user runners, you will need to send this information to excl-help@ornl.gov to get the runner registered.
For a complete example and template for how to use the Slurm with GitLab in ExCL see and .
First install miniconda by following . Then create a groq environment with
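The environment creation mentioned above might look like the following; the environment name is illustrative, and the Python version matches the 3.8 requirement from this section:

```
conda create -n groq python=3.8
conda activate groq
```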
See the for more details for setting up the Conda environment.
See for more information on setting up Jupyter Notebooks within ExCL.
| System | Groq cards |
| --- | --- |
| milan1 | 1 |
| milan2 | 1 |
Getting Started with Siemens EDA Tools.
The EDA tools are installed on the system `dragon`. `dragon` can be accessed via SSH from the `login` node, via X11 forwarding from the login node's ThinLinc, or directly via ThinLinc with FoxyProxy. See ThinLinc Quickstart to get started with ThinLinc setup. See Accessing ExCL for more details on logging in.
SSH access:
ThinLinc access to login:
https://login.excl.ornl.gov:300
ThinLinc access to dragon (Requires reverse proxy to be setup):
https://dragon.ftpn.ornl.gov:300
All of the tools are installed to `/opt/Siemens` and the tools can be set up with
Also, please join the `siemens-eda` Slack channel in the ORNL CCSD Slack.
Compilers are, in general, maintained from a central NFS repository, and made accessible via the module command (from Lmod). For example
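For instance, using the GNU suite and versions mentioned below (module names depend on what is installed; `module avail` shows the authoritative list):

```
module load gnu/10.2.0   # request a specific version instead of the default
gcc --version            # confirm which compiler is now active
```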
If you do not load a module, you will get the default compiler as delivered by the operating system vendor (4.8.5 on some systems). If you `module load gnu` you will currently get 12.1.0, as it is the default. If you need, say, 10.2.0, you need to `module load gnu/10.2.0`. Note that documentation details with respect to compiler availability and versions will not necessarily be kept up to date; the system itself is authoritative.
Some compilers (notably xlc and the nvhpc tool chain) cannot be installed on NFS, so if they are available they will show up in a different module directory. The same module commands are used.
Additional compilers can be installed on request to excl-help@ornl.gov. Maintaining multiple Gnu suites is straightforward, less so for other tool suites.
Additional compilers and tools can also be installed using Spack.
Getting started with Open WebUI.
Link: Open WebUI (running on Zenith)
Website: Open WebUI
Documentation: 🏡 Home | Open WebUI
GitHub: open-webui/open-webui: User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
There is an Open WebUI server running on ExCL for developing and testing LLM models created with Ollama. In order to use the website you must first set up FoxyProxy; then the above link will work. When you first access the page, you will be prompted to create a new account. This account is unique to this instance of Open WebUI and is not tied to anything else. After creating an account, send a message to Aaron Young or excl-help@ornl.gov to request that your account be upgraded to an admin account.
Getting started with Ollama.
Ollama is deployed in ExCL as a module. To use Ollama, load the module, and then you have access to the `ollama` CLI.
Load the Ollama module with:
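Presumably the load step looks like the following (the module name may differ; check `module avail`):

```
module load ollama
ollama --help    # the ollama CLI should now be on your PATH
```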
Ollama has a server component which stores files in its home directory. This server component should be launched under a service account by ExCL admins, since it provides Ollama for the entire system. Ollama is already running on some of the workers in ExCL; see the output from the module load for an up-to-date list. Contact excl-help@ornl.gov if you would like Ollama to be available on a specific system.
When interacting with the Ollama server via the REST API in ExCL, you need to unset the `http_proxy` and `https_proxy` environment variables, since you are connecting to an internal HTTP server instead of a remote one.
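For example, querying a local server with curl might look like this; Ollama's default port is 11434, and the host and model name are assumptions to adjust for your setup:

```
unset http_proxy https_proxy
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Hello"
}'
```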
Examples of using the Ollama API can be found at ollama-python/examples/chat.py.
Getting Started with Python in ExCL with best practice recommendations.
This page covers a few recommendations and tips for getting started with Python in ExCL following best practices for packaging python projects and using virtual environments. There are many different ways to structure and package python projects and various tools that work with python, so this page is not meant to be comprehensive but to provide a few recommendations for getting started.
Using virtual environments is the recommended way to isolate Python dependencies and ensure compatibility across different projects. Virtual environments prevent conflicts between packages required by different projects and simplify dependency management. The goal with isolated, project specific python environments is to avoid the situation found in https://xkcd.com/1987/.
If you are using the fish shell, the simple function shown below is a wrapper around venv that activates a Python virtual environment if one already exists in `.venv` in the current directory, or creates a new virtual environment and activates it if one does not.
This `pvenv` function is already configured system-wide for fish on ExCL systems.
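A sketch of such a wrapper in fish (this is a reconstruction of the described behavior, not necessarily the exact system-wide definition):

```fish
function pvenv
    # Activate an existing .venv, or create one first and then activate it
    if not test -e .venv/bin/activate.fish
        python3 -m venv .venv
    end
    source .venv/bin/activate.fish
end
```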
To create the virtual environment without using the wrapper function is also easy.
In bash:
In fish:
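For example, in bash (the fish equivalent sources `.venv/bin/activate.fish` instead):

```shell
# Create a virtual environment in .venv and activate it
python3 -m venv .venv
source .venv/bin/activate
python -m pip --version   # pip now runs from inside .venv
```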
Here is the usage of venv which explains what the various flags do. From venv — Creation of virtual environments — Python 3.13.1 documentation.
The virtual environment can be exited with deactivate
.
Python Project Template provides a template for creating a python project using the hatch build system with CI support using ORNL's GitLab instance, complete with development documentation, linting, commit hooks, and editor configuration.
Steps to use the template:
1. Run `setup_template.sh` to set up the template for the new project.
2. Remove `setup_template.sh`.
See Python Project Template Documentation for details on the template.
When a specific version of python is required, uv can be used to create a virtual environment with the specific version of python.
For example:
Use the command below to see the available python versions.
See astral-sh/uv - python management and uv docs - installing a specific version for details.
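As a sketch of uv's CLI (the version number is illustrative):

```
uv venv --python 3.12     # create .venv pinned to CPython 3.12
uv python list            # show the Python versions uv knows about
```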
Getting started with ThinLinc.
The login node has ThinLinc installed and can be accessed at https://login.excl.ornl.gov:300. Since this node is public facing, it is the easiest to access with ThinLinc.
In addition to the login node, multiple systems, including the virtual systems, have ThinLinc installed, which makes it easier to run graphical applications. To access ThinLinc you need to use a SOCKS proxy to forward traffic to the ExCL network, or port forward port 22 to use the ThinLinc client.
For better keyboard shortcut support and to prevent the browser from triggering the shortcuts, I recommend installing Open-as-Popup.
Setup FoxyProxy and make sure to have the SOCKS dynamic proxy running.
Connect to the ThinLinc server using the links above.
This approach is recommended if you need better keyboard forwarding support for keyboard shortcuts that are not working with the Web client. The web client approach is easier to use and enables connecting to multiple systems at a time.
If the system is directly accessible (for example login.excl.ornl.gov), then you can specify the system and connect directly.
If the system is an internal node, then local port forwarding must be used. The steps to setting this up are as follows.
Forward port 22 from the remote system to your local system through login. On Linux or macOS
On windows use ssh via powershell, MobaSSHTunnel, Visual Studio Code, or putty to forward port 22. See Jupyter Quickstart for more information on port forwarding in windows.
Add an alias in the hosts file for the remote node. This is needed because of how ThinLinc establishes the remote connection. On Linux this host file is `/etc/hosts`. On Windows the file is `C:\Windows\System32\drivers\etc\hosts`.
Host file:
Launch the ThinLinc Client.
In the options, specify the SSH port to be <localport>
.
Specify the Server, Username, and credentials.
Connect to the server with "Connect".
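The forwarding and hosts-file steps above can be sketched as follows; the node name, username, and local port are placeholders:

```
# 1. Forward the internal node's SSH port to a local port through login:
ssh -L 2222:<node>.ftpn.ornl.gov:22 yourusername@login.excl.ornl.gov

# 2. Alias the node name to localhost in the hosts file
#    (/etc/hosts on Linux, C:\Windows\System32\drivers\etc\hosts on Windows):
127.0.0.1   <node>.ftpn.ornl.gov

# 3. In the ThinLinc client, set the server to <node>.ftpn.ornl.gov and the
#    SSH port to 2222.
```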
If you use Gnome and do not have access to the module command when you start a terminal session over ThinLinc web, then your terminal session may not be configured as a login session. To resolve this:
Right click on the terminal icon on the left side of your screen
In Preferences -> Unnamed, make sure Run command as a login shell is checked.
You will then get login processing (including sourcing the /etc/profiles.d files) and so the module command will now be present.
Getting started with Marimo.
Thank you Chen Zhang for the presentation materials to learn about and get started with Marimo. Marimo works well in ExCL and can be set up to work with the Ollama instance running in ExCL to enable the AI features.
Download Marimo Quick-start Presentation
Getting started with self-hosted runners for GitHub CI on ExCL systems.
If you do want to register the runner as a service, the easiest way is to use systemd user services. To set this up follow the steps below.
Notes:
If you are setting up a second runner, the `ln` command will fail if the link already exists. Ensure that the link is a valid link pointing to scratch before continuing with these instructions.
~/github-runners/<node>-<repo>
Once you create this directory and enter it, you will then download and configure the runner. The steps are reproduced below, but you should follow the instructions from the "add new self-hosted runner" page after clicking on "New self-hosted runner".
Apply this patch to modify the directory to use user systemd modules.
Use this command to enable linger for your user.
This allows your user-level systemd services to run when you are not logged into the system and auto-start when the system is rebooted.
Note: Use `loginctl disable-linger` to remove linger and `ls /var/lib/systemd/linger` to view the users with linger set.
Use the runner's `svc.sh` script to install and manage the runner service:
1. Install the service.
2. Start the service and check its status.
Note: The above install adds the service to auto start on reboot. If you want to disable or enable this auto starting of the service run.
or
To stop the service run
To uninstall the service run
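The service management commands above, using the `svc.sh` script shipped with the GitHub Actions runner, look like:

```
./svc.sh install    # register the runner service (auto-starts on reboot)
./svc.sh start      # start the runner
./svc.sh status     # check that it is running
./svc.sh stop       # stop the runner
./svc.sh uninstall  # remove the service
```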
Trigger on issue_comment: this is the event that triggers the CI pipeline. The types: [created]
ensures that the pipeline is triggered only when a new comment is made and not when an existing comment is edited.
NOTE: in GitHub Actions, PRs are issues, so the `issue_comment` event is used to trigger the pipeline when a PR comment is made.
Verify Actor: an "actor" is any user writing a comment on the PR. This step verifies that the actor is authorized to trigger the CI pipeline. The following is an example of how to verify the actor in the workflow YAML file. `ACTOR_TOKEN` puts the current "actor" within the delimiter and checks whether it is in the list of authorized users. If it is, the pipeline is triggered; if not, all subsequent steps are skipped.
Create PR status: this step creates a status check on the PR, extracting information from the JSON generated in the previous step. This allows for seamless integration with the typical checks interface for a PR along with other CI workflows. The status check is created as a "pending" status, and its URL is linked to the current pipeline run before the actual tests run.
Run tests: the following steps continue the pipeline tests and they are specific to each workflow reusing these steps.
Report PR status: this step reports the status of the pipeline to the PR. The status is updated to "success" if the tests pass and "failure" if the tests fail. The URL is linked to the current pipeline run to update the PR status created in step 4.
Getting Started with Modules.
ExCL uses Modules to manage software environments efficiently. Modules allow users to load, unload, and switch between different software versions without modifying system paths manually. Please let us know if there is a software package you would like us to make available via a module.
To load a specific software module:
Example:
This makes Python 3.9 available for use.
You can also leave off the version number to load the default version.
Example:
To see all available modules:
To view currently loaded modules:
To remove a specific module:
Example:
To switch from one module version to another:
Example:
To clear all loaded modules and reset to the default environment:
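The commands described above can be summarized as follows; the package names and versions are examples, and `module avail` is the authoritative list:

```
module load python/3.9               # load a specific version
module load python                   # load the default version
module avail                         # list available modules
module list                          # show currently loaded modules
module unload python                 # remove a specific module
module swap python/3.9 python/3.10   # switch versions
module purge                         # clear all loaded modules
```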
Git (code revision management system) is installed on all ExCL systems on which it makes sense. Git operates as expected, except for external access.
If you require access to external git resources, you need to do a little more.
For HTTP or HTTPS access, make sure you have the following environment variables (they should be set by default, but may not be if you have altered your environment)
The proxy server has access to the full Oak Ridge network (open research only).
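The variables in question are `http_proxy` and `https_proxy`; the actual proxy host and port are site-specific, so the values below are placeholders (check your default environment with `env | grep -i proxy`):

```
export http_proxy=http://<excl-proxy-host>:<port>
export https_proxy=http://<excl-proxy-host>:<port>
```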
ssh can be used to clone repositories on the login node. In order to clone repositories on the internal nodes, the ssh config needs to be changed to use the login node as a proxy jump. Here is an example ssh config with jump proxies to code.ornl.gov, bitbucket.org, and github.com.
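A sketch of such a config, for use on an internal node (the username is a placeholder):

```
# ~/.ssh/config on an ExCL internal node
Host github.com bitbucket.org code.ornl.gov
    User git
    ProxyJump yourusername@login.excl.ornl.gov
```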
To configure git to always use ssh for code.ornl.gov repositories, use the config command below.
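This is git's standard `url.<base>.insteadOf` rewrite mechanism; a sketch of the command:

```shell
# Rewrite HTTPS URLs for code.ornl.gov to their SSH form
git config --global url."git@code.ornl.gov:".insteadOf "https://code.ornl.gov/"
```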
The recommended approach to access code.ornl.gov is to use SSH. To do this, you need to generate an SSH key and add it to your GitLab account. The following steps will guide you through the process.
Generate an SSH key.
Add the SSH key to your GitLab account.
Using SSH keys is the preferred way to authenticate your user and to authenticate with private Git repositories. For security, it is recommended to use an SSH key encrypted with a passphrase.
ExCL will block your account after 3 failed attempts. Automatic login tools, e.g. VS Code, can easily exceed this limit using a cached password and auto-reconnect. For git repos with two-factor authentication, an application token/password must be created, and this password must be stored externally and is more cumbersome to use.
Set up a key pair:
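A typical key-generation command; the key type and comment are suggestions, and you should enter a passphrase when prompted:

```
ssh-keygen -t ed25519 -C "yourid@ornl.gov"
```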
SSH Path and Permissions: For SSH keys to be loadable and usable, they must have permissions which do not allow groups or others to read them (i.e. permission bits set to 600). Additionally, there cannot be any `-` characters in the path for filenames.
SSH-Agents cache SSH keys with passphrases, allowing them to be reused during the session. This is not needed with keys without a passphrase, since they can be used without decrypting.
SSH Forwarding: SSH agents can forward SSH keys to a remote system, making the keys available there as well.
Add a key to the agent with `ssh-add`, or `ssh-add [file]` for non-default filenames.
Note: If you're running a Mac and want to add an SSH key that's not one of the standard names (`~/.ssh/id_rsa`, `~/.ssh/id_ecdsa`, `~/.ssh/id_ecdsa_sk`, `~/.ssh/id_ed25519`, `~/.ssh/id_ed25519_sk`, and `~/.ssh/id_dsa`), use `ssh-add --apple-use-keychain [file]`.
Check loaded keys with ssh-add -l.
Setup SSH forwarding in SSH config.
Log in and verify key is still available.
Warning: Do not launch an SSH-agent on the remote system when using SSH Forwarding, as the new agent will hide the forwarded keys.
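Putting the forwarding steps together, the local-side config and check might look like this (a sketch; the host entry is an example):

```
# ~/.ssh/config on your local machine
Host login.excl.ornl.gov
    ForwardAgent yes

# Then verify after logging in:
#   ssh login.excl.ornl.gov ssh-add -l
```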
Git - a version control system that records changes to a file or files, which allows you to return to a previous version
When we talk about Git, we say that a repository stores files. This term means that you have a folder that is currently being tracked by Git. It is common, although optional, to use one of the Git repository (repo) services (GitHub, GitLab, BitBucket, etc.). You could easily set up Git tracking on your local machine only, but one of the perks to using Git is that you can share your files with others and a team can edit files collaboratively. The ability to collaborate is one of the many reasons why hosted Git repos are so popular.
Repository - the Git data structure which contains files and folders, as well as how the files/folders have changed over time
Choose the Blank project tab, create a name for the project, and select the "Visibility Level" that you prefer. Then click Create project.
Notice that GitLab has provided instructions to perform Git setup and initialization of your repository. We will follow those instructions.
(Optional) Prior to cloning the repository, consider adding your SSH key to your GitLab profile so you are not prompted for credentials after every commit. To add your public SSH key to GitLab:
Click on your user image in the top-right of the GitLab window.
Select Settings.
On the left, click SSH keys.
Paste your public SSH key in the box, provide a title, and save by clicking Add key.
First, use the command line to see if Git is installed. (Windows users may check their list of currently installed programs.)
To install or update Git using your package manager:
CentOS, RedHat:
Debian, Ubuntu:
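The install commands for the two distribution families above are presumably the standard package-manager ones:

```shell
# CentOS / RedHat
sudo yum install git
# Debian / Ubuntu
sudo apt-get install git
```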
Setup Git with your access credentials to GitLab with the following commands (use your ORNL email):
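These are presumably the standard identity settings (replace the name and email with your own):

```shell
git config --global user.name "Your Name"
git config --global user.email "yourid@ornl.gov"
```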
You can review the information that you entered during set-up: git config --global --list
Now, navigate to the location where you'd like to place your repository. For example:
Clone the repository. A new folder is created, and Git starts tracking. Consult the repository information from the GitLab new repository window.
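A sketch with a hypothetical project path (copy the real URL from the GitLab new-repository page):

```shell
cd ~/projects   # example location
git clone git@code.ornl.gov:<username>/<project>.git
```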
Clone - the equivalent of making a local copy of a repository on your computer
GitLab also recommends the creation of a README.md file to describe the repository. (We will edit the contents of the README.md file later.)
The next three steps consist of adding, committing, and pushing from your local machine to GitLab.
Add - includes the added files in the content that you want to save
Commit - creates a "snapshot" of the repository at that moment, using the changes from the "added" files
Push - moves/uploads the local changes (or snapshot) to the remote GitLab repository
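The three steps typically look like this (the file name and commit message are examples):

```shell
git add README.md
git commit -m "Add README"
git push
```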
(Optional) If you like, you can refresh your browser page, and you can see that the README.md file is now in your repository.
Branches are created as a way to separate content that is still under development. One way to think about a branch is as a copy of the content of a repository at a point in time. You make your changes on the copy before integrating them back into the original. For example, if you were using your GitLab repo to host a website, you probably would not want incomplete content shown to those who visit your site. Instead, you can create a branch, make edits to the files there, then merge your development branch back into the master branch, which is the default branch. Additionally, branches are commonly used when multiple individuals work out of a single repository.
Branch - a version of the repository that splits from the primary version
Merge - using the changes from one branch and adding them to another
A branch checkout enables you to make changes to files without changing the content of the master branch. To create and checkout a branch called "adding-readme":
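The create-and-checkout command is presumably:

```shell
git checkout -b adding-readme
```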
Checkout - Git command to change branches
Now we edit the README.md file to add a description of the repository. The file needs to be opened with a text editor (nano, vim, emacs, etc.).
To type in vi, press i for insert. Now you can add content.
To save your changes and exit vi, press <esc> to leave editing, then type :wq, which writes (saves) and quits.
As before, we need to add, commit, and push the changes to the GitLab repository.
In future pushes, you can simplify the last command by typing only git push. However, the first time you push to a new branch, you have to tell GitLab that you have created a new branch on your computer and that the changes you are pushing should go to a new remote branch called adding-readme.
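That first push with an upstream is presumably:

```shell
git push -u origin adding-readme
```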
After completing the previous section, we have two branches: adding-readme and master. We are ready to move the adding-readme content to the master branch.
You can create a merge request using the GitLab GUI.
From the left menu panel in GitLab (when viewing the repository), select Merge Request, then the green New merge request button.
Select your branch on the "Source Branch" side (adding-readme).
Target branch is master.
Click Compare branches and continue.
You can add as much information to the next screen as you like, but the only thing needed is:
Assign to: < Project Owner, etc. >
In our case, we are the project owner, so we may assign the merge request to ourselves.
Click Submit merge request.
On the next page, click the green Merge button.
From the left menu panel in GitLab, select Overview to see the new README.md content.
Sometimes Git repository sites use different terminology, e.g., merge request vs. pull request. Consult each site's glossary for its specific usage.
If you run into a "ThinLinc login failed. (No agent server was available)" error, then log in to the node with ssh. This will mount your home directory and resolve the ThinLinc error.
If you don’t want to run the runner as a service, then you can follow the steps in GitHub's self-hosted runner documentation to create a self-hosted runner in ExCL.
If you are trying this on a system which doesn’t already have a /scratch folder, the command will fail. Please send an email to excl-help@ornl.gov to request a folder for local storage.
The steps are similar to those in GitHub's self-hosted runner documentation, with some changes. You will need to create one folder per machine and per repo, so I recommend the following structure.
After this patch is applied, the svc.sh script works as documented in GitHub's self-hosted runner documentation. However, you don’t need to specify a username, since it now defaults to the current user. The commands are reproduced below.
GitHub Actions discourages the use of self-hosted runners for public repos. However, if you want to use an ExCL self-hosted runner for a public repo, you can use the following steps to create a secure CI pipeline that is triggered by an authorized user in a PR comment. This will prevent unauthorized users from automatically running arbitrary code (e.g., attacks) on ExCL systems from any PR.
We follow the resulting workflow yaml file in the JACC.jl repository as an example that can be reused across repos.
Select authorized users who can trigger the pipeline, and store them in a secret in your repo using the following format: CI_GPU_ACTORS=;user1;user2;user3; and store another secret TOKENIZER=; to be used as a delimiter (it can be any character). Users should have a strong password and 2FA enabled.
Request PR info: since the event triggering the pipeline is an issue_comment, the pipeline needs to retrieve information for the current PR. We use the official octokit/request-action to get the PR information using the GITHUB_TOKEN available automatically from the repo. This is stored in a json format and available for future steps.
NOTE: in GitHub Actions, statuses are different from checks; see the GitHub documentation for a better explanation. The statuses generated by this pipeline get reported and stored in the Actions tab, not in the PR checks tab. The important part is that the status from this workflow gets reported to the PR, so users can see the status of the pipeline, and admins can make these statuses mandatory or optional before merging.
Copy the output of the command used to print your public key (for example, cat ~/.ssh/id_rsa.pub) and paste it into the SSH key section of your GitLab account settings.
If you are on an ExCL system and you have not already done so, configure your SSH client to use the login node as a jump proxy, as described earlier on this page.
If you use a passphrase with your SSH key (recommended for security), then you should also set up an SSH agent to load the SSH key. This allows you to enter your passphrase once per session instead of potentially many times, once for each git command. The VS Code documentation is well written for setting up an SSH agent on a variety of platforms.
Your ExCL account has an automatically generated SSH key pair created for you on account creation. This key pair allows you to connect to internal nodes from the login node without having to type a password. (If you are having to type a password, then this key pair has been messed up.) So one easy option is to copy this private key from ExCL to your local system and then use it to log in to ExCL. If your local system does not already have a key pair, then you can copy login.excl.ornl.gov:~/.ssh/id_rsa and login.excl.ornl.gov:~/.ssh/id_rsa.pub to your local ~/.ssh folder. (If you already have a key pair, this will override your previous version, so make sure to check before copying.) Make sure you chmod 600 these files so that the private key has sufficient permission protection to allow openssh to use the keys. You can also upload your public key to Git websites like code.ornl.gov to push and pull git repositories.
Add the key to all Git hosting websites that you want to use.
Git, like other version control (VC) systems, tracks changes to a file system over time. It is typically used in software development, but it can be used to monitor changes in any file.
📝 Note: This tutorial uses only the command line. After you have learned the basics of Git, you can explore different Git workflows and other common usage patterns.
ORNL provides two GitLab servers, one of which is accessible only inside of ORNL. Project owners control access to GitLab repositories. You may log in, create your projects and repositories, and share them with others.
In your browser, navigate to code.ornl.gov and log in using your UCAMS credentials. Click on the green New project button at the top of the window.
MacOS: use a package manager such as Homebrew:
Windows: download Git for Windows and install it. Also, this tutorial utilizes a Bash command-line interface; therefore, you should use Git Bash, which is a part of the Git installation package for Windows.
Add your description. README.md is a markdown file. If you do not know how to use markdown, don't worry; basic text works, too. However, if you would like to learn markdown, it is simple.
See the Quick-Start guides to get going:
Deprecated: See documentation at https://code.ornl.gov/excl-devops/documentation/-/tree/master/devops.
These cards are currently installed on Secretariat and Affirmed, but will eventually be moved to take advantage of GPUs installed elsewhere.
All DOCA, embedded and BSP software was updated in September 2023, using the following:
doca-host-repo-ubuntu2204_2.2.0-0.0.3.2.2.0080.1.23.07.0.5.0.0_amd64.deb
doca-dpu-repo-ubuntu2204-local_2.2.0080-1.23.07.0.5.0.0.bf.4.2.0.12855.2.23.prod_arm64.deb
DOCA_2.2.0_BSP_4.2.0_Ubuntu_22.04-2.23-07.prod.bfb
Reference: https://docs.nvidia.com/doca/sdk/installation-guide-for-linux/index.html#manual-bluefield-image-installation
Devices are available and connected to each other via 100Gb IB across an IB switch.
Getting started with Vitis FPGA development.
U250 - attached to spike in Alveo mode.
U55C - attached to spike in Alveo mode.
U280 - in Aaron’s office.
This page covers how to access the Vitis development tools available in ExCL. The available FPGAs are listed in the FPGAs section. All Ubuntu 22.04 systems can load the Vitis/Vivado development tools as a module. See Quickstart to get started. The virtual systems have ThinLinc installed, which makes it easier to run graphical applications. See section Accessing ThinLinc to get started.
Vitis is now primarily deployed as a module for Ubuntu 22.04 systems. You can view available modules and versions with module avail and load the most recent version with module load Vitis. These modules should work on any Ubuntu 22.04 system in ExCL.
Spike - U250
Spike - U55C
U280 - Aaron’s office
Suggested machines for Vitis development are also set up with Slurm. Slurm is used as a resource manager to allocate compute resources as well as hardware resources. The use of Slurm is required to allocate FPGA hardware and reserve build resources on Triple Crown. It is also recommended to reserve resources when running test builds on Zenith. The best practice is to launch builds on fpgabuild with Slurm, then launch bitfile tests with Slurm. The use of Slurm is required to effectively share the FPGAs, and to share build resources with automated CI runs and other automated build and test scripts. As part of Slurm interactive use or a batch script, use modules to load the desired version of the tools. The rest of this section details how to use Slurm. See the Cheat Sheet for commonly used Slurm commands. See the Slurm Quick Start User Guide to learn the basics of using Slurm.
Allocate a build instance for one Vitis Build. Each Vitis build uses 8 threads by default. If you plan to use more threads, please adjust -c accordingly.
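Based on the quickstart command later on this page, the allocation presumably looks like:

```shell
srun -J interactive_build -p fpgabuild -c 8 --pty bash
```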
Where:
-J, --job-name=<jobname>
-p, --partition=<partition names>
-c, --cpus-per-task=<ncpus>
Allocate the U250 FPGA to run hardware jobs. Please release the FPGA when you are done so that other jobs can use the FPGA.
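Mirroring the sfpgarun-u250 shortcut described below, the allocation presumably looks like:

```shell
srun -J fpgarun-u250 -p fpgarun --gres="fpga:U250:1" --pty bash
```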
Where:
-J, --job-name=<jobname>
-p, --partition=<partition names>
--gres="fpga:U250:1" specifies that you want to use 1 U250 FPGA.
Where:
-J, --job-name=<jobname>
-p, --partition=<partition names>
-c, --cpus-per-task=<ncpus>
build.sh is a script to launch the build.
Where:
-J, --job-name=<jobname>
-p, --partition=<partition names>
--gres="fpga:U250:1" specifies that you want to use 1 U250 FPGA.
run.sh is a script to launch the run.
From the login node, run srun -J interactive_build -p fpgabuild -c 8 --pty bash to start a bash shell.
Use module load vitis to load the latest version of the Vitis toolchain.
Use source /opt/xilinx/xrt/setup.sh to load the Xilinx Runtime (XRT).
Follow the quickstart to set up the Vitis Environment.
Go through the Vitis Getting Started Tutorials.
Go through the Vitis Hardware Accelerators Tutorials.
Go through the Vitis Accel Examples.
Use platforminfo to query additional information about an FPGA platform. See the example command below.
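A sketch (the platform name is an example; list the installed platforms first):

```shell
platforminfo --list
platforminfo --platform xilinx_u250_gen3x16_xdma_4_1_202210_1
```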
See ThinLinc Quickstart.
Fish is installed system-wide with a default configuration based on Aaron's fish configuration, which includes helpful functions to launch the Xilinx development tools. The next sections go over the functions that this fish config provides.
sfpgabuild is a shortcut for srun -J interactive_build -p fpgabuild -c 8 --mem 8G --mail-type=END,FAIL --mail-user $user_email --pty $argv. Essentially, it sets up an FPGA build environment using Slurm with reasonable defaults. Each of the defaults can be overridden by specifying the new parameter when calling sfpgabuild. sfpgabuild also modifies the prompt to remind you that you are in the FPGA build environment.
sfpgarun-u250 is a shortcut for srun -J fpgarun-u250 -p fpgarun -c 8 --mem 8G --mail-type=END,FAIL --mail-user $user_email --gres="fpga:U250:1" --pty $argv. sfpgarun-u250 sets up an FPGA run environment, complete with requesting the FPGA resource.
sfpgarun-u55c is a shortcut for srun -J fpgarun-u55c -p fpgarun -c 8 --mem 8G --mail-type=END,FAIL --mail-user $user_email --gres="fpga:U55C:1" --pty $argv. sfpgarun-u55c sets up an FPGA run environment, complete with requesting the FPGA resource.
sfpgarun-hw-emu is a shortcut for XCL_EMULATION_MODE=hw_emu srun -J fpgarun -p fpgarun -c 8 --mem 8G --mail-type=END,FAIL --mail-user $user_email --pty $argv. sfpgarun-hw-emu sets up an FPGA run environment, complete with specifying XCL_EMULATION_MODE.
sfpgarun-sw-emu is a shortcut for XCL_EMULATION_MODE=sw_emu srun -J fpgarun -p fpgarun -c 8 --mem 8G --mail-type=END,FAIL --mail-user $user_email --pty $argv. sfpgarun-sw-emu sets up an FPGA run environment, complete with specifying XCL_EMULATION_MODE.
After running bass module load vitis, sfpgabuild, or sfpgarun, viv can be used to launch Vivado in the background; it is a shortcut for vivado -nolog -nojournal.
In order to manually set up the Xilinx license, set the environment variable XILINXD_LICENSE_FILE to 2100@license.ftpn.ornl.gov.
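In bash, for example:

```shell
export XILINXD_LICENSE_FILE=2100@license.ftpn.ornl.gov
```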
The FlexLM server uses ports 2100 and 2101.
Xilinx FPGA projects can be built using the Vitis Compiler, the Vitis GUI, Vitis HLS, or Vivado.
In general, I recommend using the Vitis compiler via the command line and scripts, because that workflow is easy to document, store in git, and run with GitLab CI. I recommend using Vitis HLS when trying to optimize a kernel, since it provides many profiling tools. See the Vitis HLS Tutorial.
Tutorials are available to learn how to use Vitis. In particular, this Getting started with Vitis Tutorial goes over the building and running of an example application.
See the Vitis Documentation for more details on building and running FPGA applications.
The Vitis environment and tools are set up via module files. To load the latest version of the Vitis environment, use module load vitis in bash, or bass module load vitis in fish.
To see available versions, use module avail. A specific version can then be loaded by specifying the version, for example module load vitis/2020.2.
See the Vitis Documentation for more details on setting up the Vitis Environment.
There are three build targets available when building an FPGA kernel with Vitis tools.
See the Vitis Documentation for more information.
Software emulation: the host application runs with a C/C++ or OpenCL™ model of the kernels. Used to confirm functional correctness of the system. Fastest build time; supports quick design iterations.
Hardware emulation: the host application runs with a simulated RTL model of the kernels. Used to test the host/kernel integration and get performance estimates. Best debug capabilities; moderate compilation time with increased visibility of the kernels.
Hardware: the host application runs with the actual hardware implementation of the kernels. Used to confirm that the system runs correctly and with the desired performance. Final FPGA implementation; long build time with accurate (actual) performance results.
The desired build target is specified with the -t flag of v++.
The host program can be written using either the native XRT API or OpenCL API calls, and it is compiled using the GNU C++ compiler (g++). Each source file is compiled to an object file (.o) and linked with the Xilinx Runtime (XRT) shared library to create the executable which runs on the host CPU.
See the Vitis Documentation for more information.
Important: Set up the command shell or window as described in Setting Up the Vitis Environment prior to running the tools.
Each source file of the host application is compiled into an object file (.o) using the g++ compiler.
The generated object files (.o) are linked with the Xilinx Runtime (XRT) shared library to create the executable host program. Linking is performed using the -l option.
Compiling and linking for x86 follows the standard g++ flow. The only requirement is to include the XRT header files and link the XRT shared libraries.
When compiling the source code, the following g++ options are required:
-I$XILINX_XRT/include/ : XRT include directory.
-I$XILINX_VIVADO/include : Vivado tools include directory.
-std=c++11 : Define the C++ language standard.
When linking the executable, the following g++ options are required:
-L$XILINX_XRT/lib/ : Look in the XRT library directory.
-lOpenCL : Search the named library during linking.
-lpthread : Search the named library during linking.
-lrt : Search the named library during linking.
-lstdc++ : Search the named library during linking.
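Putting the options together, a compile-and-link sketch (the file names host.cpp and app are hypothetical):

```shell
g++ -c -std=c++11 -I$XILINX_XRT/include/ -I$XILINX_VIVADO/include -o host.o host.cpp
g++ -L$XILINX_XRT/lib/ -o app host.o -lOpenCL -lpthread -lrt -lstdc++
```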
The kernel code is written in C, C++, OpenCL™ C, or RTL, and is built by compiling the kernel code into a Xilinx® object (XO) file, and linking the XO files into a device binary (XCLBIN) file, as shown in the following figure.
The process, as outlined above, has two steps:
Build the Xilinx object files from the kernel source code.
For C, C++, or OpenCL kernels, the v++ -c command compiles the source code into Xilinx object (XO) files. Multiple kernels are compiled into separate XO files.
For RTL kernels, the package_xo command produces the XO file to be used for linking. Refer to RTL Kernels for more information.
You can also create kernel object (XO) files working directly in the Vitis™ HLS tool. Refer to Compiling Kernels with the Vitis HLS for more information.
After compilation, the v++ -l command links one or multiple kernel objects (XO), together with the hardware platform XSA file, to produce the device binary XCLBIN file.
See the Vitis Documentation for more information.
Important: Set up the command shell or window as described in Setting Up the Vitis Environment prior to running the tools.
The first stage in building the xclbin file is to compile the kernel code using the Xilinx Vitis compiler. There are multiple v++ options that need to be used to correctly compile your kernel. The following is an example command line to compile the vadd kernel:
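A hedged sketch of such a command (the platform and file names are examples, not the only valid choices):

```shell
v++ -c -t sw_emu --platform xilinx_u250_gen3x16_xdma_4_1_202210_1 \
    -k vadd -o vadd.xo vadd.cpp
```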
The various arguments used are described below. Note that some of the arguments are required.
-t <arg> : Specifies the build target, as discussed in Build Targets. Software emulation (sw_emu) is used as an example. Optional; the default is hw.
--platform <arg> : Specifies the accelerator platform for the build. This is required because runtime features and the target platform are linked as part of the FPGA binary. To compile a kernel for an embedded processor application, specify an embedded processor platform: --platform $PLATFORM_REPO_PATHS/zcu102_base/zcu102_base.xpfm.
-c : Compile the kernel. Required. The kernel must be compiled (-c) and linked (-l) in two separate steps.
-k <arg> : Name of the kernel associated with the source files.
-o '<output>.xo' : Specify the shared object file output by the compiler. Optional.
<source_file> : Specify source files for the kernel. Multiple source files can be specified. Required.
The above list is a sample of the extensive options available. Refer to Vitis Compiler Command for details of the various command-line options. Refer to Output Directories of the v++ Command to get an understanding of the location of various output files.
Important: Set up the command shell or window as described in Setting Up the Vitis Environment prior to running the tools.
The kernel compilation process results in a Xilinx object (XO) file whether the kernel is written in C/C++, OpenCL C, or RTL. During the linking stage, XO files from different kernels are linked with the platform to create the FPGA binary container file (.xclbin) used by the host program.
Similar to compiling, linking requires several options. The following is an example command line to link the vadd kernel binary:
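A hedged sketch (the platform and file names are examples):

```shell
v++ --link -t sw_emu --platform xilinx_u250_gen3x16_xdma_4_1_202210_1 \
    --config ./connectivity.cfg -o vadd.xclbin vadd.xo
```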
This command contains the following arguments:
-t <arg> : Specifies the build target. Software emulation (sw_emu) is used as an example. When linking, you must use the same -t and --platform arguments as specified when the input (XO) file was compiled.
--platform <arg> : Specifies the platform to link the kernels with. To link the kernels for an embedded processor application, you simply specify an embedded processor platform: --platform $PLATFORM_REPO_PATHS/zcu102_base/zcu102_base.xpfm
--link : Link the kernels and platform into an FPGA binary file (xclbin).
<input>.xo : Input object file. Multiple object files can be specified to build into the .xclbin.
-o '<output>.xclbin' : Specify the output file name. The output file in the link stage will be an .xclbin file. The default output name is a.xclbin.
--config ./connectivity.cfg : Specify a configuration file that is used to provide v++ command options for a variety of uses. Refer to Vitis Compiler Command for more information on the --config option.
Beyond simply linking the Xilinx object (XO) files, the linking process is also where important architectural details are determined. In particular, this is where the number of compute units (CUs) to instantiate into hardware is specified, connections from kernel ports to global memory are assigned, and CUs are assigned to SLRs. The following sections discuss some of these build options.
The Vitis™ analyzer is a graphical utility that allows you to view and analyze the reports generated while building and running the application. It is intended to let you review reports generated by both the Vitis compiler when the application is built, and the Xilinx® Runtime (XRT) library when the application is run. The Vitis analyzer can be used to view reports from both the v++ command line flow and the Vitis integrated design environment (IDE). You can launch the tool using the vitis_analyzer command (see Setting Up the Vitis Environment).
See the Vitis Documentation for more information.
TLDR: Create an emconfig.json file using emconfigutil and set XCL_EMULATION_MODE to sw_emu or hw_emu before executing the host program. The device binary also has to be built for the corresponding target.
See the Vitis Documentation for more information.
Important: Set up the command shell or window as described in Setting Up the Vitis Environment prior to running the tools.
Set the desired runtime settings in the xrt.ini file. This step is optional.
As described in xrt.ini File, the file specifies various parameters to control debugging, profiling, and message logging in XRT when running the host application and kernel execution. This enables the runtime to capture debugging and profile data as the application is running. The Emulation group in the xrt.ini provides features that affect your emulation run.
TIP: Be sure to use the v++ -g option when compiling your kernel code for emulation mode.
Create an emconfig.json file from the target platform as described in emconfigutil Utility. This is required for running hardware or software emulation.
The emulation configuration file, emconfig.json, is generated from the specified platform using the emconfigutil command, and provides information used by the XRT library during emulation. The following example creates the emconfig.json file for the specified target platform:
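For example (the platform name is an example):

```shell
emconfigutil --platform xilinx_u250_gen3x16_xdma_4_1_202210_1 --od .
```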
In emulation mode, the runtime looks for the emconfig.json file in the same directory as the host executable, and reads in the target configuration for the emulation runs.
TIP: It is mandatory to have an up-to-date JSON file for running emulation on your target platform.
Set the XCL_EMULATION_MODE environment variable to sw_emu (software emulation) or hw_emu (hardware emulation) as appropriate. This changes the application execution to emulation mode.
Use the following syntax to set the environment variable for C shell (csh):
Bash shell:
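For bash (in csh the equivalent is setenv XCL_EMULATION_MODE sw_emu):

```shell
export XCL_EMULATION_MODE=sw_emu
```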
IMPORTANT: The emulation targets will not run if the XCL_EMULATION_MODE environment variable is not properly set.
Run the application.
With the runtime initialization file (xrt.ini), emulation configuration file (emconfig.json), and the XCL_EMULATION_MODE environment variable set, run the host executable with the desired command-line arguments.
IMPORTANT: The INI and JSON files must be in the same directory as the executable.
For example:
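A sketch with hypothetical names (host binary host, device binary vadd.xclbin):

```shell
./host vadd.xclbin
```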
TIP: This command line assumes that the host program is written to take the name of the xclbin file as an argument, as most Vitis examples and tutorials do. However, your application may have the name of the xclbin file hard-coded into the host program, or may require a different approach to running the application.
TLDR: Make sure XCL_EMULATION_MODE is unset. Use a node with the FPGA hardware attached.
See the Vitis Documentation for more information.
Edit the xrt.ini file as described in xrt.ini File.
This is optional, but recommended when running on hardware for evaluation purposes. You can configure XRT with the xrt.ini file to capture debugging and profile data as the application is running. To capture event trace data when running the hardware, refer to Enabling Profiling in Your Application. To debug the running hardware, refer to Debugging During Hardware Execution.
TIP: Ensure you use the v++ -g option when compiling your kernel code for debugging.
Unset the XCL_EMULATION_MODE environment variable.
IMPORTANT: The hardware build will not run if the XCL_EMULATION_MODE environment variable is set to an emulation target.
For embedded platforms, boot from the SD card.
TIP: This step is only required for platforms using Xilinx embedded devices such as Versal ACAP or Zynq UltraScale+ MPSoC.
For an embedded processor platform, copy the contents of the ./sd_card folder produced by the v++ --package command to an SD card as the boot device for your system. Boot your system from the SD card.
Run your application.
The specific command line to run the application will depend on your host code. A common implementation used in Xilinx tutorials and examples is as follows:
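A sketch with hypothetical names (host binary host, device binary vadd.xclbin):

```shell
./host vadd.xclbin
```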
A simple example Vitis project is available at https://code.ornl.gov/7ry/add_test. This project can be used to test the Vitis compile chain and Vitis HLS.
The makefile used by this project is an example of how to create a makefile to build an FPGA accelerated application.
Vitis and Vivado use 8 threads by default on Linux. Many of the Vivado tools can only utilize 8 threads for a given task. See the Multithreading in the Vivado Tools section of the Vivado Design Suite User Guide: Implementation (UG904). I found from experimenting that the block-level synthesis task can leverage more than 8 threads, but it will not do so unless you set the vivado.synth.jobs and vivado.impl.jobs flags.
Here is an example snippet from the Xilinx Bottom-Up RTL Tutorial which shows one way to query and set the number of CPUs to use.
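The snippet itself did not survive extraction; a hedged reconstruction of the idea (the flag names come from the paragraph above; the platform and file names are examples):

```shell
JOBS=$(nproc)   # query the number of available CPUs
v++ --link -t hw --platform xilinx_u250_gen3x16_xdma_4_1_202210_1 \
    --vivado.synth.jobs "$JOBS" --vivado.impl.jobs "$JOBS" -o vadd.xclbin vadd.xo
```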
Getting started with using VSCode and ExCL.
Visual Studio Code or VSCode is a lightweight but powerful source code editor which runs on your desktop and is available for Windows, macOS, and Linux. The editor has IntelliSense, debugger support, built-in git, and many extensions to add additional support to the editor. VSCode supports WSL and development on remote servers via ssh. Plugins add language support, linters, and compilers for many languages including Python, C/C++, CMake, and markdown.
The Remote - SSH and Remote - WSL extensions are both extremely useful for editing code remotely on ExCL, or locally in WSL if on a Windows machine. Remote - SSH pulls the ssh targets from the user's .ssh/config file. On Linux or MacOS, this process is straightforward, and you likely already have an ssh config file set up. On Windows, you have to specify the proxy command to use to proxy into the internal ExCL nodes. Here is an example file for Windows:
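A sketch (quad00 and the key file are examples; replace <Username> with your username):

```
Host excl
    HostName login.excl.ornl.gov
    User <Username>
    IdentityFile ~/.ssh/id_rsa

Host quad00
    HostName quad00
    User <Username>
    IdentityFile ~/.ssh/id_rsa
    ProxyCommand C:\Windows\System32\OpenSSH\ssh.exe -W %h:%p excl
```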
Here is the same file for Linux or MacOS:
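The Linux/macOS equivalent uses ProxyJump (again a sketch):

```
Host excl
    HostName login.excl.ornl.gov
    User <Username>
    IdentityFile ~/.ssh/id_rsa

Host quad00
    HostName quad00
    User <Username>
    IdentityFile ~/.ssh/id_rsa
    ProxyJump excl
```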
The main difference between the files is that the Windows config uses ProxyCommand with the Windows ssh.exe, while the Linux and macOS config uses ProxyJump; both set up the login node as a relay to the internal node.
Replace <Username> with your username. Other internal systems can be added by copying the quad00 entry and modifying the name of the config and the HostName. It is highly recommended to use a passphrase-protected ssh key as the login method. If you used a different name for the ssh key file, then replace ~/.ssh/id_rsa with your private key file. On Windows, this config file is located at %USERPROFILE%\.ssh\config. On Linux and MacOS, it is located at ~/.ssh/config. The config file doesn’t have an extension, but it is a text file that can be edited with VSCode.
To avoid typing your ssh passphrase multiple times per login, use an SSH Agent to store the ssh credentials. See Setting up the SSH Agent for details. On Windows, to enable SSH Agent automatically, start a local Administrator PowerShell and run the following commands:
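The PowerShell commands are presumably the standard Windows OpenSSH agent setup:

```powershell
# Enable the ssh-agent service and start it
Get-Service ssh-agent | Set-Service -StartupType Automatic
Start-Service ssh-agent
```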
On the ExCL side, you can add this code snippet to ~/.bashrc to start the ssh-agent on login:
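A minimal sketch of such a snippet; it starts an agent only when one is not already available:

```shell
# Start an ssh-agent for this login if one is not already running
if [ -z "$SSH_AUTH_SOCK" ]; then
    eval "$(ssh-agent -s)" > /dev/null
fi
```

You will still need to run `ssh-add` once per login to load your key into the agent.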
Important: Since VSCode installs its configuration to your home directory by default and the home directories are stored on NFS, the Remote.SSH: Lockfiles in Tmp setting needs to be checked. This setting is easiest to find with the settings search box.
The remote SSH explorer provides the same experience editing code remotely as you get when editing locally. Files that are opened are edited locally and saved to the remote server, which helps when a slow connection makes editing via vim over ssh too unresponsive. You can also access a remote terminal with Ctrl+`. The debuggers also run remotely. One gotcha is that extensions might need to be installed remotely for them to work properly. However, this is easy to do by clicking on the extension tab and choosing to install local extensions on the remote.
The ssh explorer also makes it easy to forward remote ports to the local machine. This is especially helpful when launching an http server or a jupyter notebook. See Jupyter Documentation for details.
Edit launch.json to define launch configurations according to the launch configuration documentation. After generating a configuration from a template, the main attributes I add or change are "cwd" and "args". "args" has to be specified as an array, which is a pain. One workaround from GitHub issue 1210 suggests replacing " " with "," to avoid space-separated arguments. For arguments with a value, "=" will need to be added between the argument and its value, without spaces. When specifying "program" and "cwd", it is helpful to use the built-in variables to reference the current file or workspace folder. See the Variables Reference documentation.
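For example, a sketch of a launch.json for the Python debugger, using the built-in variables (the program, cwd, and args values are placeholders):

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Debug current file",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "cwd": "${workspaceFolder}",
            "args": ["--input=data.csv", "--verbose"]
        }
    ]
}
```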
GrapeCity.gc-excelviewer
Preview CSV files.
Gruntfuggly.todo-tree
View TODOs in a project.
ms-vsliveshare.vsliveshare
Real-time Collaboration.
ms-vsliveshare.vsliveshare-audio
mushan.vscode-paste-image
Paste images into markdown files.
vscodevim.vim
Use Vim Keybindings in VSCode.
ms-vscode-remote.remote-containers
ms-vscode-remote.remote-ssh
ms-vscode-remote.remote-ssh-edit
ms-vscode-remote.remote-wsl
DavidAnson.vscode-markdownlint
Lint markdown files.
lextudio.restructuredtext
ms-python.python
ms-python.vscode-pylance
ms-toolsai.jupyter
ms-toolsai.jupyter-keymap
ms-toolsai.jupyter-renderers
ms-vscode.cmake-tools
ms-vscode.cpptools
ms-vscode.cpptools-extension-pack
ms-vscode.cpptools-themes
mshr-h.veriloghdl
puorc.awesome-vhdl
slevesque.vscode-autohotkey
twxs.cmake
yzhang.markdown-all-in-one
Supports markdown preview in addition to language support.
donjayamanne.githistory
eamodio.gitlens
foam.foam-vscode
See Julia Quickstart.
Perhaps you've got some how-to documents tucked away in folders that you'd like to share with the community. Or maybe you've discovered a way of doing things that would benefit other users.
We've assembled here the fundamental authoring guidelines for ExCL user documentation.
Define the first instance of every acronym in each document. Ensure that the long-form is not repeated after it is defined.
Buttons and links that the user should "click" should go in code. For example, "Next, click the Manage Rules button."
For headings: only use title case for the first three heading levels (#, ##, and ###). The remaining heading levels should be sentence case.
Screenshots and images cannot be resized using markdown. Therefore, we embed .html
that is rendered when we publish the tutorial to the documentation site.
Images and screenshots should be stored in a folder named assets. Images and screenshots added from the GitBook interface are stored in .gitbook/assets, but issues seem to occur if this folder is modified outside of GitBook.
Files should be named descriptively. For example, use names such as adding-IP-address.png
instead of image03.png
.
To remain consistent with other images in tutorials, please use the following .html
code to resize, add a border, and open in a new browser tab when clicked. Note that you'll need to change the file name twice in the following code.
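A sketch of the kind of snippet used for this; the file name adding-IP-address.png is a placeholder, and note it appears twice, matching the note above:

```html
<a href="assets/adding-IP-address.png" target="_blank">
    <img src="assets/adding-IP-address.png" width="600"
         style="border: 1px solid #cccccc;" alt="Adding an IP address">
</a>
```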
Have you redacted sensitive information from text and images?
Have you removed information that is protected by copyright?
Are you using a specific version of your software, and have you included it in the documentation?
Build and run MPI (Message Passing Interface) enabled codes on ExCL
Load the Nvidia HPC SDK environment module
Verify the compiler path
Build the program
Run the program with MPI
-np 4
specifies that 4 processes will be created, each running a copy of the mpi_hello_world program
-mca coll_hcoll_enable 0
disables HCOLL
ExCL systems typically do not have InfiniBand set up (although if this is required, it can be added as needed). HCOLL (HPC-X Collective Communication Library) requires an InfiniBand adapter, and since it is enabled by default, you may see HCOLL warnings/errors stating that no HCA device can be found. You can disable HCOLL and get rid of these warnings/errors with the -mca coll_hcoll_enable 0 flag, for example: mpirun -np 4 -mca coll_hcoll_enable 0 ./a.out.
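Putting the steps above together, a session might look like the following. The module name nvhpc and the source file name are assumptions; check `module avail` on your node for the exact module:

```shell
module load nvhpc                                  # load the NVIDIA HPC SDK
which mpicc                                        # verify the compiler comes from the SDK
mpicc mpi_hello_world.c -o mpi_hello_world         # build the MPI program
mpirun -np 4 -mca coll_hcoll_enable 0 ./mpi_hello_world   # run 4 ranks, HCOLL disabled
```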
There are many reasons one would prefer to work from the command line. Regardless of your reasons, here is how to contribute to the ExCL documentation using only command line tools.
Jump to a Section:
First, use the command line to see if Git is installed.
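For example:

```shell
git --version    # prints the installed version, or an error if Git is missing
```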
To install or update Git using your package manager:
CentOS, RedHat:
Debian, Ubuntu:
Setup Git with your access credentials to GitHub with the following commands:
You can review the information that you entered during set-up: git config --global --list
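The set-up commands look like the following; the name and email here are placeholders, so use the ones tied to your GitHub account:

```shell
git config --global user.name "Jane Doe"
git config --global user.email "jane@example.com"
git config --global --list    # review what you entered
```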
(Optional) Consider adding your SSH key to your GitHub profile so you are not prompted for credentials after every commit. To add your public SSH key to GitHub:
Click on your user image in the top-right of the GitHub window.
Select Settings
.
On the left, click ssh keys
.
Paste your public ssh key in the box, provide a title, and save by clicking Add key
.
Clone an existing repository. In GitHub, this information is found on the "Overview" page of the repository.
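A self-contained sketch of cloning, using a throwaway local repository in place of the real URL from the repository page:

```shell
demo=$(mktemp -d)                        # stand-in location for the remote
git init --bare "$demo/excl-docs.git"    # a "remote" repository to clone from
# With a real repository you would use its URL instead, e.g.
#   git clone git@github.com:<org>/<repo>.git
git clone "$demo/excl-docs.git" "$demo/excl-docs"
```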
If you have already cloned the repository but are returning to your local version after a while, you'll want to make sure your local files are up to date with the branch. You can pull updates from master or branch_name.
You need to create a new branch or checkout an existing branch that can later be merged into the master branch. When naming branches, try to choose something descriptive.
To create a branch: git checkout -b branch_name
To list existing branches: git branch -r
To checkout an existing branch: git checkout --track origin/branch_name
or git checkout branch_name
Note: You may only have one branch checked out at a time.
Make edits to the files with your favorite text editor. Save your changes.
Git places "added" files in a staging area while it waits for you to finalize your changes.
When you have added (or staged) all of your changes, committing them prepares them for the push to the remote branch and creates a snapshot of the repository at that moment in time.
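The add-then-commit cycle can be sketched in a throwaway repository (file names, identity, and messages are placeholders):

```shell
repo=$(mktemp -d) && cd "$repo"
git init
git config user.name "Jane Doe"            # placeholder identity for the demo
git config user.email "jane@example.com"
echo "Hello ExCL" > notes.md               # edit a file
git add notes.md                           # stage the change
git commit -m "Add notes on ExCL setup"    # snapshot the staged changes
git log --oneline                          # the new commit appears here
```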
After committing the edits, push the changes to GitHub. If the following produces an error, see below the code snippet for common solutions. The structure of this command is git push <remote> <branch>
.
Upstream error: git push --set-upstream origin branch_name
or git push -u origin branch_name
At this time, GitHub does not natively support submitting merge requests (which GitHub calls "pull requests") via the command line. You can send a merge request using the GitHub GUI.
From the left menu panel in GitHub (when viewing the repository), select Merge Request
then the green New merge request
button.
Select your branch on the "Source Branch" side.
Target branch is master.
Click compare branches
.
On the next screen the only thing needed is:
Assign to: < Project Owner, etc. >
Click Submit merge request
.
This document includes common Git scenarios and how to deal with them.
If you have been working on a development branch for a while you might like to update it with the most recent changes from the master branch. There is a simple way to include the updates to the master
branch into your development
branch without causing much chaos.
First, checkout your development branch. Then, perform a merge from master
but add the "no fast forward" tag. This will ensure that HEAD
stays with your development
branch.
Resolve any conflicts and push your changes.
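The whole scenario can be sketched in a throwaway repository; branch and file names are placeholders, and $main captures whatever your Git version calls the default branch:

```shell
repo=$(mktemp -d) && cd "$repo"
git init
git config user.name "Jane Doe" && git config user.email "jane@example.com"
echo base > file.txt && git add file.txt && git commit -m "Initial commit"
main=$(git symbolic-ref --short HEAD)      # "master" or "main"
git checkout -b development                # the long-running work branch
echo feature > feature.txt && git add feature.txt && git commit -m "Add feature"
git checkout "$main"
echo update >> file.txt && git add file.txt && git commit -m "Mainline update"
git checkout development
git merge --no-ff "$main" -m "Merge $main into development"   # HEAD stays on development
```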
When you set up Git with the git config --global ...
commands, you are telling your local machine that this is the set of credentials that should be used across your directories. If you have multiple projects for which you need unique credentials, you can set a particular project folder with different Git credentials by changing global
to local
. For example, you may contribute to projects in GitHub and GitLab. You can navigate to the local repository and set local configuration parameters. See below:
Now, the machine will use global configurations everywhere except for the /project/GitHub/
repository.
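For example, inside the repository that needs different credentials (the directory and addresses here are placeholders):

```shell
cd "$(mktemp -d)" && git init              # stand-in for the /project/GitHub/ repository
git config --local user.name "Work Account"
git config --local user.email "work@company.example"
git config --local --list                  # these settings apply to this repository only
```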
Changes since your last commit
You have previously committed some files and now you've edited a file and saved your changes. However, you now decide you do not want keep the changes that you've made. How can you revert it back to the way it was at your last commit?
The git status
command output provides a method for discarding changes since your last commit.
📝 Note: Before using the above commands to reverse your changes, be sure you do not want to keep them. After the commands are run, the file(s) will be overwritten and any uncommitted changes will not be recoverable.
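A sketch of the discard operation in a throwaway repository:

```shell
repo=$(mktemp -d) && cd "$repo" && git init
git config user.name "Jane Doe" && git config user.email "jane@example.com"
echo "original text" > draft.txt
git add draft.txt && git commit -m "Add draft"
echo "changes I regret" > draft.txt        # an edit we decide to throw away
git checkout -- draft.txt                  # restore the file to the last commit
cat draft.txt                              # back to "original text"
```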
Reverting to a previous commit
If you are working on a new feature and after a commit you realize that you have introduced a catastrophic bug, you can use git reset ac6bc6a2
(each commit has a unique identification number). This command will change where the HEAD
pointer is located. For example, if you are on the master
branch and have submitted three new commits, the HEAD
points to your most recent commit. Using the git reset command will keep the information from the recent commits, but HEAD will be moved to the specified commit.
To find the unique identification number of the commits in your branch, type git log --pretty=format:"%h %s" --graph
to provide a list of recent commits as well as a visual graph of changes.
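A sketch in a throwaway repository; note that git reset defaults to --mixed, so the later edits stay in your working tree:

```shell
repo=$(mktemp -d) && cd "$repo" && git init
git config user.name "Jane Doe" && git config user.email "jane@example.com"
echo one > f.txt && git add f.txt && git commit -m "First commit"
echo two >> f.txt && git add f.txt && git commit -m "Buggy commit"
git log --pretty=format:"%h %s" --graph    # find the id to reset to
first=$(git rev-list --max-parents=0 HEAD) # id of "First commit", looked up for the demo
git reset "$first"                         # move HEAD back; files keep their edits
```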
Amending a commit
Let's say that you have just completed several changes, staged (added), and committed them. As you look at one file, you see a typo. You could simply fix the typo, add, and commit again, or you could use the --amend
tag so that the new changes (your typo fix) can be included in your previous commit. Using this can keep your commit history uncluttered by removing commit messages such as "forgot to add a file" or "fixed a typo." Here is an example of a forgotten file amended commit:
A commit message prompt appears and you can either keep the original commit message or modify it.
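The forgotten-file scenario can be sketched like this (--no-edit keeps the original message; omit it to be prompted):

```shell
repo=$(mktemp -d) && cd "$repo" && git init
git config user.name "Jane Doe" && git config user.email "jane@example.com"
echo code > main.c && git add main.c && git commit -m "Add feature"
echo notes > README.md                     # the file we forgot
git add README.md
git commit --amend --no-edit               # fold it into the previous commit
git log --oneline                          # still a single, tidy commit
```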
Undoing a merge
Perhaps you thought you had checked out your development branch but you were, in fact, on the master
branch. Then you merged a topic
branch into master
by mistake. How do you undo the merge?
If you just want to take a step back to before you entered the merge
command, you can use git merge --abort
. This is usually a safe command as long as you do not have any uncommitted changes.
If you need something a little more robust, you can use git reset --hard HEAD
. This command is used to perform a "start over" in your repository. It will reset your repository to the last commit.
Commit messages
When multiple people are working in the same repository, the number of commits can range from a few to several thousand, depending on the size of your development team. Using clear, descriptive commit messages helps "integration managers" merge content and, perhaps more importantly, search for and find commits that have introduced a bug.
Another recommendation by the author of "Pro Git" says, "try to make your changes digestible — don’t code for a whole weekend on five different issues and then submit them all as one massive commit on Monday."
If there are files/folders in your repository that you do not want Git to track, you can add them to a .gitignore
file. Here is an example .gitignore
:
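A sketch of a typical .gitignore; the entries are illustrative, so list whatever your own project generates:

```
# Build artifacts
build/
*.o
*.so

# Editor and OS clutter
.vscode/
.DS_Store

# Python caches
__pycache__/
*.pyc
```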
Chacon, Scott, and Ben Straub. Pro Git: Everything You Need to Know About Git. Apress, 2nd Edition (2014).
You can submit your user guides for publication within the ! See the page for instructions.
Documents should be created using the Markdown syntax.
Oak Ridge National Laboratory (ORNL) uses the Chicago Manual of Style (CMOS) as a basic style guide.
Using a for creating user content.
This guide is adapted from .
It is assumed that users of this guide understand basic Git/version control principles. To learn more about Git basics with our basic Git tutorial, visit .
MacOS, use :
Windows: download and install it.