BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20211207T054801Z
LOCATION:230-231-232
DTSTART;TZID=America/Chicago:20211116T153000
DTEND;TZID=America/Chicago:20211116T160000
UID:submissions.supercomputing.org_SC21_sess175_pap201@linklings.com
SUMMARY:APNN-TC: Accelerating Arbitrary Precision Neural Networks on Amper
 e GPU Tensor Cores
DESCRIPTION:Paper\n\nAPNN-TC: Accelerating Arbitrary Precision Neural Netw
 orks on Ampere GPU Tensor Cores\n\nFeng, Wang, Geng, Li, Ding\n\nOver the 
 years, accelerating neural networks with quantization has been widely stud
 ied. Unfortunately, prior efforts with diverse precisions (e.g., 1-bit wei
 ghts and 2-bit activations) are usually restricted by limited precision su
 pport on GPUs (e.g., int1 and int4). To break such restrictions, we introd
 uce the first Arbitrary Precision Neural Network framework (APNN-TC) to fu
 lly exploit quantization benefits on Ampere GPU tensor cores. Specifically
 , APNN-TC first incorporates a novel emulation algorithm to support arbitr
 ary short bit-width computation with int1 compute primitives and XOR/AND B
 oolean operations. Second, APNN-TC integrates arbitrary precision layer de
 signs to efficiently map our emulation algorithm to tensor cores with nove
 l batching strategies and specialized memory organization. Third, APNN-TC 
 embodies a novel arbitrary precision NN design to minimize memory access a
 cross layers and further improve performance. Extensive evaluations show t
 hat APNN-TC achieves significant speedup over CUTLASS kernels and on vario
 us NN models, such as ResNet and VGG.\n\nTag: Reproducibility Badge, Data 
 Analytics, Machine Learning and Artificial Intelligence\n\nRegistration Ca
 tegory: Tech Program Reg Pass\n\nReproducibility Badges: Artifact Availabl
 e, Artifact Functional, Results Reproduced
END:VEVENT
END:VCALENDAR
