In our environment, Artificial Intelligence (AI) clusters serve workflows where development flexibility and isolation are paramount and tasks are not tightly coupled. These problem sets (highly concurrent vector operations on independent data subsets) typically perform well on GPU platforms, which are optimized for SIMD execution.
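As an illustration of the workload pattern described above (not cluster-specific code), the following sketch uses NumPy: each output element depends only on its own input element, which is exactly the independent, data-parallel structure that maps well onto GPU SIMD lanes. GPU array libraries such as CuPy accept essentially the same expression.

```python
import numpy as np

# Each output element depends only on the matching input element,
# so the work splits across many parallel lanes with no coordination.
x = np.linspace(0.0, 1.0, 1_000_000)
y = np.sqrt(x) * 2.0 + 1.0  # one element-wise pass over independent data

print(y.shape)  # (1000000,)
```

On a CPU this runs as vectorized loops; on a GPU the same element-wise expression is dispatched across thousands of hardware threads.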
On these clusters we may run Kubernetes as an orchestrator over an underlying Docker platform, and/or SLURM, a combined resource manager and job scheduler. SLURM tracks jobs and compute resources, and schedules work over time according to availability and resource characteristics such as memory, CPU cores, and GPU cores.
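As a hedged sketch of how such resource requests look in practice, a SLURM batch script declares the memory, CPU, and GPU requirements described above via `#SBATCH` directives. The partition name `gpu` and the job details here are hypothetical placeholders; actual values depend on the cluster's configuration.

```bash
#!/bin/bash
#SBATCH --job-name=example-train   # hypothetical job name
#SBATCH --partition=gpu            # hypothetical partition; site-specific
#SBATCH --gres=gpu:1               # request one GPU
#SBATCH --cpus-per-task=4          # request four CPU cores
#SBATCH --mem=16G                  # request 16 GB of memory
#SBATCH --time=01:00:00            # one-hour wall-clock limit

# The scheduler places this job when the requested resources are free.
srun python train.py
```

Submitted with `sbatch script.sh`, the job waits in the queue until SLURM finds a node matching every requested characteristic.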
Docker decouples the application from the host OS (we currently support Rocky Linux, a variant of Red Hat Enterprise Linux). Users can run different library versions and package loadouts, or even an entirely different Linux guest OS if the need arises, and pin that entire software environment to specific versions validated to work with their application, yielding consistent, repeatable results.
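A minimal sketch of that version-pinning idea, assuming a hypothetical Python application: the Dockerfile fixes the base image and dependency versions so the environment is reproducible regardless of what is installed on the Rocky Linux host. Image tags and package names here are illustrative, not a supported configuration.

```dockerfile
# Pin the guest OS image by tag so rebuilds use the same base.
FROM ubuntu:22.04

# Install a pinned interpreter; versions here are illustrative.
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Pin application dependencies to exact, validated versions.
COPY requirements.txt /app/requirements.txt
RUN pip3 install --no-cache-dir -r /app/requirements.txt

COPY . /app
WORKDIR /app
CMD ["python3", "main.py"]
```

Because every layer is pinned, rebuilding the image on a different host reproduces the same software environment.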
SEAS AI Clusters
Watch this space to learn more…