BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20211207T055413Z
LOCATION:242
DTSTART;TZID=America/Chicago:20211115T113000
DTEND;TZID=America/Chicago:20211115T120000
UID:submissions.supercomputing.org_SC21_sess340_ws_espm107@linklings.com
SUMMARY:Parallel SIMD - A Policy Based Solution for Free Speed-Up Using C+
 + Data-Parallel Types
DESCRIPTION:Workshop\n\nParallel SIMD - A Policy Based Solution for Free S
 peed-Up Using C++ Data-Parallel Types\n\nYadav, Gupta, Reverdell, Kaiser\n
 \nRecent additions to the C++ standard and ongoing standardization efforts
  aim to add data-parallel types to the C++ standard library. This enables 
 the use of vectorization techniques in existing C++ codes without having t
 o rely on the C++ compiler's abilities to auto-vectorize the code's execut
 ion. The integration of the existing parallel algorithms with these new da
 ta-parallel types opens up a new way of speeding up existing codes with mi
 nimal effort. Today, very little implementation experience exists for
  potential data-parallel execution of the standard parallel algorithms. In
  this paper, we report on experiences and performance analysis results for
  our implementation of two new data-parallel execution policies usable wit
 h HPX's parallel algorithms module: simd and par_simd. We utilize the new 
 experimental implementation of data-parallel types provided by recent vers
 ions of the GNU GCC and Clang C++ standard libraries. The benchmark result
 s collected from artificial tests and real-world codes presented in this p
 aper are very promising. Compared to sequenced execution, we report on spe
 ed-ups of more than three orders of magnitude when executed using the newl
 y implemented data-parallel execution policy par_simd with HPX's parallel 
 algorithms. We also report that our implementation is performance portable
  across different compute architectures (x64 -- Intel and AMD, and Arm), u
 sing different vectorization technologies (AVX2, AVX512, NEON64, and NEON1
 28).\n\nTag: Architectures, Big Data, Cloud and Distributed Computing, Ext
 reme Scale Computing, Heterogeneous Systems, Parallel Programming Language
 s and Models, Parallel Programming Systems, Quantum Computing, Scientific 
 Computing, System Software and Runtime Systems\n\nRegistration Category: W
 orkshop Reg Pass
END:VEVENT
END:VCALENDAR