Full Publications/Events (80)
==========

## 2024 (1)

* Blog by Intel: [Accelerate Meta* Llama 3 with Intel AI Solutions](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-meta-llama3-with-intel-ai-solutions.html) (Apr 2024)

## 2023 (25)

* Blog by Intel: [Effective Weight-Only Quantization for Large Language Models with Intel® Neural Compressor](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Effective-Weight-Only-Quantization-for-Large-Language-Models/post/1529552) (Oct 2023)
* arXiv: [Efficient Post-training Quantization with FP8 Formats](https://arxiv.org/abs/2309.14592) (Sep 2023)
* EMNLP'2023 (Under Review): [TEQ: Trainable Equivalent Transformation for Quantization of LLMs](https://openreview.net/forum?id=iaI8xEINAf&referrer=%5BAuthor%20Console%5D) (Sep 2023)
* arXiv: [Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs](https://arxiv.org/abs/2309.05516) (Sep 2023)
* Blog on Medium: [Quantization Accuracy Loss Diagnosis with Neural Insights](https://medium.com/@NeuralCompressor/quantization-accuracy-loss-diagnosis-with-neural-insights-5d73f4ca2601) (Aug 2023)
* Blog on Medium: [Faster Stable Diffusion Inference with Intel Extension for Transformers](https://medium.com/intel-analytics-software/faster-stable-diffusion-inference-with-intel-extension-for-transformers-on-intel-platforms-7e0f563186b0) (July 2023)
* Post on Social Media: [ONNXCommunityMeetup2023: INT8 Quantization for Large Language Models with Intel Neural Compressor](https://www.youtube.com/watch?v=luYBWA1Q5pQ) (July 2023)
* Blog by Intel: [Accelerate Llama 2 with Intel AI Hardware and Software Optimizations](https://www.intel.com/content/www/us/en/developer/articles/news/llama2.html) (July 2023)
* Blog on Medium: [Model quantization diagnosis with Neural Insights](https://medium.com/@NeuralCompressor/model-quantization-diagnosis-with-neural-insights-8117033fba43) (July 2023)
* Blog on Medium: [Simplify Your Custom Chatbot Deployment](https://medium.com/intel-analytics-software/simplify-your-custom-chatbot-deployment-on-intel-platforms-c8a911d906cf) (June 2023)
* Blog by MSFT: [Olive: A user-friendly toolchain for hardware-aware model optimization](https://cloudblogs.microsoft.com/opensource/2023/06/26/olive-a-user-friendly-toolchain-for-hardware-aware-model-optimization/) (June 2023)
* Blog by MSFT: [Automate optimization techniques for transformer models](https://cloudblogs.microsoft.com/opensource/2023/06/26/automate-optimization-techniques-for-transformer-models/) (June 2023)
* Post on Social Media: [Get Started Post-Training Dynamic Quantization | AI Model Optimization with Intel® Neural Compressor](https://www.youtube.com/watch?v=5xHKe4wWLes&list=PLg-UKERBljNxC8dmjx7jJA2YADWOFuj_p&index=4) (June 2023)
* Post on Social Media: [How to Choose AI Model Quantization Techniques | AI Model Optimization with Intel® Neural Compressor](https://www.youtube.com/watch?v=ie3w_j0Ntsk) (June 2023)
* Post on Social Media: [What is AI Model Optimization | AI Model Optimization with Intel® Neural Compressor | Intel Software](https://www.youtube.com/watch?v=m2LokuUdeVg&list=PLg-UKERBljNxC8dmjx7jJA2YADWOFuj_p&index=2) (June 2023)
* Blog on Medium: [Streamlining Model Optimization as a Service with Intel Neural Compressor](https://medium.com/intel-analytics-software/streamlining-model-optimization-as-a-service-with-intel-neural-compressor-fd970fdb2928) (June 2023)
* Blog on Medium: [Intel Optimization at Netflix](https://medium.com/@amerather_9719/intel-optimization-at-netflix-79ef0efb9d2) (May 2023)
* Blog on Medium: [Effective Post-training Quantization for Large Language Models with Enhanced SmoothQuant Approach](https://medium.com/@NeuralCompressor/effective-post-training-quantization-for-large-language-models-with-enhanced-smoothquant-approach-93e9d104fb98) (Apr 2023)
* Blog by Intel: [Intel® Xeon® Processors Are Still the Only CPU With MLPerf Results, Raising the Bar By 5x](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Intel-Xeon-Processors-Are-Still-the-Only-CPU-With-MLPerf-Results/post/1472750) (Apr 2023)
* Blog by Intel: [With Intel® Neural Compressor Integrated, Tencent Cloud TACO Kit Brings Efficient Heterogeneous Acceleration Services to AI Applications](https://www.intel.cn/content/www/cn/zh/customer-spotlight/cases/neural-compressor-tencent-cloud-taco-kit-ai.html) (Mar 2023)
* Post on Social Media: [Adopt with Tencent TACO: Heterogeneous optimization is also key to improving AI computing power](https://mp.weixin.qq.com/s/I-FQqOuW7HTnwXegLGNAtw) (Mar 2023)
* Blog on Medium: [Structured Pruning for Transformer-Based Models](https://medium.com/intel-analytics-software/structured-pruning-for-transformer-based-models-116e949ef12c) (Jan 2023)
* Post on Social Media: [Training and Inference for Stable Diffusion | Intel Business](https://www.youtube.com/watch?v=emCgSTlJaAg) (Jan 2023)
* Blog by Intel: [Intel® AMX Enhances AI Inference Performance](https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/advanced-matrix-extensions/alibaba-solution-brief.html) (Jan 2023)
* Blog by TensorFlow: [Optimizing TensorFlow for 4th Gen Intel Xeon Processors](https://blog.tensorflow.org/2023/01/optimizing-tensorflow-for-4th-gen-intel-xeon-processors.html) (Jan 2023)

## 2022 (35)

* Blog on Medium: [From Innovation to Ecosystem: A Journey of Intel Neural Compressor](https://medium.com/@NeuralCompressor/from-innovation-to-ecosystem-a-journey-of-intel-neural-compressor-aa61530a9098) (Dec 2022)
* Blog on Medium: [MLefficiency — Optimizing transformer models for efficiency](https://medium.com/@kawapanion/mlefficiency-optimizing-transformer-models-for-efficiency-a9e230cff051) (Dec 2022)
* Blog on Medium: [One-Click Acceleration of Hugging Face Transformers with Intel’s Neural Coder](https://medium.com/intel-analytics-software/one-click-acceleration-of-huggingface-transformers-with-optimum-intel-by-neural-coder-f35ca3b1a82f) (Dec 2022)
* Blog on Medium: [One-Click Quantization of Deep Learning Models with the Neural Coder Extension](https://medium.com/intel-analytics-software/one-click-quantize-your-deep-learning-code-in-visual-studio-code-with-neural-coder-extension-8be1a0022c29) (Dec 2022)
* Blog on Medium: [Accelerate Stable Diffusion with Intel Neural Compressor](https://medium.com/intel-analytics-software/accelerating-stable-diffusion-inference-through-8-bit-post-training-quantization-with-intel-neural-e28f3615f77c) (Dec 2022)
* Blog on WeChat: [Intel together with Tencent deepens the cooperation to build a cloud foundation for digital and intelligent industry](https://mp.weixin.qq.com/s/CPz9-5Nsh-5N9Q8-UmK--w) (Dec 2022)
* Blog on VMware: [Intel Neural Compressor for TF Virtual Appliance packaged by Bitnami](https://marketplace.cloud.vmware.com/services/details/e9c3d891-ca51-4f07-a5aa-3fe6394f15ae) (Nov 2022)
* Blog on Tencent Cloud: [Neural Compressor: an open-source Python library for network compression](https://cloud.tencent.com/developer/article/2165895) (Nov 2022)
* Blog on Medium: [Running Fast Transformers on CPUs: Intel Approach Achieves Significant Speed Ups and SOTA Performance](https://medium.com/syncedreview/running-fast-transformers-on-cpus-intel-approach-achieves-significant-speed-ups-and-sota-448521704c5e) (Nov 2022)
* Blog on Medium: [Personalized Stable Diffusion with Few-Shot Fine-Tuning](https://medium.com/intel-analytics-software/personalized-stable-diffusion-with-few-shot-fine-tuning-on-a-single-cpu-f01a3316b13) (Nov 2022)
* NeurIPS'2022: [Fast DistilBERT on CPUs](https://arxiv.org/abs/2211.07715) (Oct 2022)
* NeurIPS'2022: [QuaLA-MiniLM: a Quantized Length Adaptive MiniLM](https://arxiv.org/abs/2210.17114) (Oct 2022)
* Blog by Intel: [Meet the Innovation of Intel AI Software: Intel® Extension for TensorFlow*](https://www.intel.com/content/www/us/en/developer/articles/technical/innovation-of-ai-software-extension-tensorflow.html) (Oct 2022)
* Blog by Intel: [PyTorch* Inference Acceleration with Intel® Neural Compressor](https://www.intel.com/content/www/us/en/developer/articles/technical/pytorch-inference-with-intel-neural-compressor.html#gs.gnq0cj) (Oct 2022)
* Post on Social Media: Neural Coder, a new plug-in for Intel Neural Compressor, was covered by [Twitter](https://twitter.com/IntelDevTools/status/1583629213697212416), [LinkedIn](https://www.linkedin.com/posts/intel-software_oneapi-ai-deeplearning-activity-6989377309917007872-Dbzg?utm_source=share&utm_medium=member_desktop), and [Intel Developer Zone](https://mp.weixin.qq.com/s/LL-4eD-R0YagFgODM23oQA) from Intel, and [Twitter](https://twitter.com/IntelDevTools/status/1583629213697212416/retweets) and [LinkedIn](https://www.linkedin.com/feed/update/urn:li:share:6990377841435574272/) from Hugging Face. (Oct 2022)
* Marketplace Distribution: Intel Neural Compressor successfully landed on the [GCP](https://console.cloud.google.com/marketplace/product/bitnami-launchpad/inc-tensorflow-intel?project=verdant-sensor-286207), [AWS](https://aws.amazon.com/marketplace/pp/prodview-yjyh2xmggbmga#pdp-support), and [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/bitnami.inc-tensorflow-intel) marketplaces. (Oct 2022)
* Post on Social Media: [Neural Coder (Intel Neural Compressor Plug-in): One-Click, No-Code Solution (Pat's Keynote IntelON 2022)](https://twitter.com/i/status/1574909338203967497) (Sep 2022)
* Blog on Medium: [Alibaba Cloud and Intel Neural Compressor Deliver Better Productivity for PyTorch Users](https://medium.com/intel-analytics-software/alibaba-cloud-collaborates-with-intel-neural-compressor-for-better-productivity-and-performance-83cdb6500420) [[Chinese version](https://mp.weixin.qq.com/s/LL-4eD-R0YagFgODM23oQA)] (Sep 2022)
* Blog on Medium: [Efficient Text Classification with Intel Neural Compressor](https://medium.com/intel-analytics-software/efficient-text-classification-with-intel-neural-compressor-4853296deeac) (Sep 2022)
* Blog on Medium: [Dynamic Neural Architecture Search with Intel Neural Compressor](https://medium.com/intel-analytics-software/dynamic-neural-architecture-search-with-intel-neural-compressor-7b05eaf325f3) (Sep 2022)
* Blog on Medium: [Easy Quantization in PyTorch Using Fine-Grained FX](https://medium.com/intel-analytics-software/easy-quantization-in-pytorch-using-fine-grained-fx-80be2c4bc2d6) (Sep 2022)
* Blog on Medium: [One-Click Enabling of Intel Neural Compressor Features in PyTorch Scripts](https://medium.com/intel-analytics-software/one-click-enable-intel-neural-compressor-features-in-pytorch-scripts-5d4e31f5a22b) (Aug 2022)
* Blog by Alibaba: [Deep learning inference optimization for Address Purification](https://zhuanlan.zhihu.com/p/552484413?utm_source=ZHShareTargetIDMore&utm_medium=social&utm_oi=667097517833981952) (Aug 2022)
* Blog by Intel: [Accelerate AI Inference without Sacrificing Accuracy](https://www.intel.com/content/www/us/en/developer/videos/accelerate-inference-without-sacrificing-accuracy.html#gs.9yottx) (Jun 2022)
* Blog by Meta: [PyTorch Inference Acceleration with Intel® Neural Compressor](https://medium.com/pytorch/pytorch-inference-acceleration-with-intel-neural-compressor-842ef4210d7d) (Jun 2022)
* Blog by Hugging Face: [Intel and Hugging Face Partner to Democratize Machine Learning Hardware Acceleration](https://huggingface.co/blog/intel) (Jun 2022)
* Blog by Intel: [Intel® Neural Compressor oneAPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html) (Jun 2022)
* Blog by Intel: [Intel® Deep Learning Boost - Boost Network Security AI Inference Performance in Google Cloud Platform (GCP)](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-boost-network-security-ai-inference-performance-in-google-cloud-platform-gcp-technology-guide) (Apr 2022)
* PyTorch Ecosystem: [INC as PT ecosystem project](https://pytorch.org/ecosystem/) (Apr 2022)
* Blog by Intel: [New instructions in the Intel® Xeon® Scalable processors combined with optimized software frameworks enable real-time AI within network workloads](https://builders.intel.com/docs/networkbuilders/ai-technologies-unleash-ai-innovation-in-network-applications-solution-brief-1637303210.pdf) (Feb 2022)
* Joint blog with MSFT: [Quantizing ONNX Models using Intel® Neural Compressor](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Quantizing-ONNX-Models-using-Intel-Neural-Compressor/post/1355237) (Feb 2022)
* Blog by Intel: [Quantize AI Model by Intel® oneAPI AI Analytics Toolkit on Alibaba Cloud](https://www.intel.com/content/www/us/en/developer/articles/technical/quantize-ai-by-oneapi-analytics-on-alibaba-cloud.html) (Feb 2022)
* Blog by SigOpt: [Intel Neural Compressor Quantization with SigOpt](https://sigopt.com/blog/intel-neural-compressor-quantization-with-sigopt/) (Jan 2022)
* Post on Social Media: [AI Performance and Productivity with Intel® Neural Compressor](https://twitter.com/IntelAI/status/1469079414562557952) (Jan 2022)
* PyTorch Ecosystem: [Ease-of-use quantization for PyTorch with Intel® Neural Compressor](https://pytorch.org/tutorials/recipes/intel_neural_compressor_for_pytorch.html) (Jan 2022)

## 2021 (15)

* Tutorial on BiliBili: [Intel Neural Compressor Tutorial on BiliBili](https://space.bilibili.com/1840724569?from=search&seid=8673550305007703901&spm_id_from=333.337.0.0) (Dec 2021)
* Blog on GESTALT IT: [Faster AI/ML Results With Intel Neural Compressor](https://gestaltit.com/tech-talks/intel/intel-2021/jpwarren/faster-ai-ml-results-with-intel-neural-compressor) (Dec 2021)
* AI Submit’21: [Dynamic Quantization with Intel Neural Compressor and Transformers](https://www.youtube.com/watch?v=-_2ha2CNWXA) (Nov 2021)
* NeurIPS’21: [Prune Once for All: Sparse Pre-Trained Language Models](https://nips.cc/Conferences/2021/Schedule?showEvent=21839) (Nov 2021)
* Blog by Intel: [Faster, Easier Optimization with Intel® Neural Compressor](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/optimization-with-intel-neural-compressor.html) (Nov 2021)
* Blog by Intel: [Accelerate Deep Learning with Intel® Extension for TensorFlow*](https://www.intel.com/content/www/us/en/developer/videos/accelerate-deep-learning-with-intel-tensorflow.html#gs.9yrw90) (Oct 2021)
* Post on Social Media: Intel® Neural Compressor: A Scalable Quantization Tool for ONNX Models, posted on [YouTube](https://youtu.be/Irk9UIcsCng) and [Twitter](https://twitter.com/onnxai/status/1465376442066227205?s=20) (Oct 2021)
* Blog by Intel: [A "Double Play" for MLPerf™ Inference Performance Gains with 3rd Generation Intel® Xeon® Scalable Processors](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/intel-mlperf-inference-performance.html) (Sep 2021)
* Blog by Intel: [Optimize TensorFlow Pre-trained Model for Inference](https://software.intel.com/content/www/us/en/develop/articles/optimize-tensorflow-pre-trained-model-inference.html) (Jun 2021)
* Blog by Intel: [3D Digital Face Reconstruction Solution enabled by 3rd Gen Intel® Xeon® Scalable Processors](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/tencent-3d-digital-face-reconstruction.html) (Apr 2021)
* Blog by Intel: [Accelerating Alibaba Transformer model performance with 3rd Gen Intel® Xeon® Scalable Processors (Ice Lake) and Intel® Deep Learning Boost](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/alibaba-lpot.html) (Apr 2021)
* Blog by Intel: [MLPerf™ Performance Gains Abound with latest 3rd Generation Intel® Xeon® Scalable Processors](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/3rd-gen-xeon-mlperf-performance-gains.html) (Apr 2021)
* Blog by Intel: [Using Low-Precision Optimizations for High-Performance DL Inference Applications](https://techdecoded.intel.io/essentials/using-low-precision-optimizations-for-high-performance-dl-inference-applications/#gs.z20k91) (Apr 2021)
* ONNX Ecosystem: [Quantization support for ONNX using LPOT (Low precision optimization tool)](https://wiki.lfaidata.foundation/pages/viewpage.action?pageId=35160391) (Mar 2021)
* Blog on NextPlatform: [DL Boost Quantization with CERN's 3D-GANs model](https://www.nextplatform.com/2021/02/01/cern-uses-dlboost-oneapi-to-juice-inference-without-accuracy-loss/) (Feb 2021)

## 2018 - 2020 (4)

* Joint presentation with CERN: [Reduced Precision Strategies for Deep Learning: 3DGAN Use Case](https://indico.cern.ch/event/852553/contributions/4059283/attachments/2126838/3581708/Rehm_Florian-IML-Reduced_Precision.pdf) - [presentation](https://indico.cern.ch/event/852553/contributions/4059283/attachments/2126838/3588271/IML2020_wedam_rehm.mp4) at the [4th IML Machine Learning Workshop](https://indico.cern.ch/event/852553/contributions/4059283/) (Oct 2020)
* Blog by Intel: [Intel Neural Compressor](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/intel-low-precision-optimization-tool.html) (Sep 2020)
* Blog by Intel: [Lower Numerical Precision Deep Learning Inference and Training](https://www.intel.com/content/www/us/en/developer/articles/technical/lower-numerical-precision-deep-learning-inference-and-training.html) (May 2018)
* ASPLOS’18: [Highly Efficient 8-bit Low Precision Inference of Convolutional Neural Networks with IntelCaffe](https://arxiv.org/abs/1805.08691) (May 2018)