Why GPU Clusters Don't Need to Go Brrr: Leveraging Compound Sparsity to Achieve the Fastest Inference Performance on CPUs

Date: Thursday, Oct 6, 2022

Time: 2:35 pm

Track:

Summary:

In this session, the power of compound sparsity for model compression and inference speedup will be demonstrated for NLP (Hugging Face BERT) and CV (YOLOv5) applications. The open source library SparseML will be used to apply compound sparsity to dense models, combining techniques including structured and unstructured pruning (to 90%+ sparsity), quantization, and knowledge distillation. After sparsification, these models will be run on the DeepSparse engine, which is optimized for executing sparse graphs on CPU hardware at GPU-class speeds. Participants will learn how to apply compound sparsity so they can run inference an order of magnitude faster than with the original dense models, without a noticeable drop in accuracy.
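For a sense of what the inference side of this workflow looks like, below is a minimal sketch of running an already-sparsified ONNX model with DeepSparse. The model filename, vocabulary size, sequence length, and input layout are placeholders for illustration; in practice the ONNX file would come from a SparseML export of a pruned and quantized model, and the inputs must match that exported graph.

```python
import numpy as np
from deepsparse import compile_model

# Placeholder path: assumes a BERT-style model already pruned and quantized
# with SparseML and exported to ONNX (hypothetical filename).
onnx_path = "bert-pruned-quantized.onnx"
batch_size = 1
seq_len = 128

# Compile the sparse graph into a CPU-optimized inference engine.
engine = compile_model(onnx_path, batch_size=batch_size)

# Illustrative BERT-style inputs (token ids and attention mask); the actual
# number, order, and dtypes of inputs depend on the exported graph.
inputs = [
    np.random.randint(0, 30000, size=(batch_size, seq_len), dtype=np.int64),
    np.ones((batch_size, seq_len), dtype=np.int64),
]

outputs = engine.run(inputs)
print([o.shape for o in outputs])
```

The same compiled engine can then be benchmarked against a dense baseline of the model to measure the speedup that pruning and quantization provide on the target CPU.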
