Eliminate Variance, Keep Your SLAs: Domain-Specific Networks for Machine Learning
Presenter
Event Type
Exhibitor Forum
Tags
Correctness
Machine Learning and Artificial Intelligence
Parallel Programming Languages and Models
Registration Categories
TP
XO / EX
Time
Tuesday, 16 November 2021, 4pm - 4:30pm CST
Location
263
Description
Computation- and communication-intensive workloads such as machine learning (ML) and high-performance computing (HPC) require strict adherence to customer service level agreements (SLAs). SLAs are confounded by run-to-run performance variability, loosely characterized as 99th-percentile “tail latency,” so optimizing these workloads requires eliminating sources of latency and performance variance. Groq’s novel tensor streaming processor (TSP) architecture and its RealScale™ synchronous network allow robust SLA delivery without execution-time variability, supporting batch-1 inference of giga-scale ML workloads.
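
To make the "99th percentile tail latency" idea concrete, here is a minimal Python sketch that computes a nearest-rank percentile over a set of run-to-run latencies and checks it against an SLA bound. The latency samples, the 20 ms bound, and the function name `percentile_nearest_rank` are all hypothetical and chosen only for illustration; nothing here comes from Groq's tooling or measurements.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical run-to-run latencies in milliseconds: 98 fast runs, 2 slow outliers.
latencies_ms = [10.0] * 98 + [50.0, 80.0]

p50 = percentile_nearest_rank(latencies_ms, 50)  # typical run
p99 = percentile_nearest_rank(latencies_ms, 99)  # tail latency

sla_ms = 20.0  # hypothetical SLA bound
meets_sla = p99 <= sla_ms
```

In this made-up data the median run is well inside the bound while the 99th percentile is not, which is the gap between typical and tail behavior that removing run-to-run variance is meant to close.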

This talk will give a guided tour of networking, both on-chip and off-chip, for ML on Groq’s TSP system architecture. Data movement enables the fine-grained communication between processing elements needed to reshape tensors in ML workloads. We’ll discuss the interconnection network in terms of topology, routing, and flow control, focusing on the GroqChip™ processor’s unique on-chip and off-chip network. The on-chip network makes use of hardware support for tensor data types, which are lowered to rank-2 tensors so that they map efficiently onto the underlying hardware. It provides over 60 terabytes/sec of on-chip stream bandwidth to stream tensors to the functional units that consume them, alongside 3.6 terabytes/sec of off-chip bisection bandwidth interconnecting a rack of 72 GroqChips. Further, we will discuss instruction set architecture (ISA) support and the software stack for tensor reshapes, which rearrange tensor elements and efficiently parallelize the workload. The resulting tensor streaming multiprocessor allows modern giga-scale ML workloads to operate efficiently at scale, exploiting both model and data parallelism.
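
As a rough illustration of what lowering a higher-rank tensor to rank 2 can look like, the Python sketch below collapses all leading dimensions into rows and keeps the innermost dimension as columns, using row-major order. This particular scheme is an assumption made for the example; the abstract states only that tensors are lowered to rank 2, not how the Groq compiler does it. The function names are hypothetical.

```python
from math import prod


def lower_shape_to_rank2(shape):
    """Collapse all leading dimensions into rows; keep the last as columns.

    Assumed scheme for illustration only, not Groq's documented lowering.
    """
    if len(shape) == 0:
        raise ValueError("shape must have at least one dimension")
    *outer, inner = shape
    return (prod(outer), inner)  # prod of an empty list is 1


def lower_index_to_rank2(index, shape):
    """Map an N-dimensional index to (row, col) in the lowered rank-2 tensor."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same rank")
    row = 0
    # Row-major flattening of every dimension except the innermost one.
    for i, dim in zip(index[:-1], shape[:-1]):
        row = row * dim + i
    return (row, index[-1])


shape = (2, 3, 4)                                  # hypothetical rank-3 tensor
lowered = lower_shape_to_rank2(shape)              # 2 * 3 rows, 4 columns
position = lower_index_to_rank2((1, 2, 3), shape)  # last element of the tensor
```

Because the element count is unchanged, a lowering of this kind only changes how elements are addressed, not the data itself.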