At the Microsoft 2023 Build conference, Panos Panay announced ONNX Runtime as the gateway to Windows AI. Using ONNX Runtime gives third party developers the same tools we use internally to run AI models on any Windows or other devices across CPU, GPU, NPU, or hybrid with Azure. We are also introducing Olive, a toolchain we created to ease the burden on developers to optimize models for varied Windows and other devices. Both ONNX Runtime and Olive contribute to the velocity of getting your AI models deployed into apps.
What is ONNX Runtime?
ONNX Runtime is a cross-platform, cross-IHV inferencing engine that allows developers to inference across platforms (e.g. iOS, Windows, Android, Linux), across client and cloud devices (CPU/GPU/NPU) or hybrid inferencing between client and Cloud (Azure EP). EPs are Execution Providers which allow targeting of specific hardware devices on specific IHV hardware (i.e. Qualcomm QNN EP for the Windows on Arm Surface Pro 9 5G) or using DML EP on Windows devices to abstract away which IHV or device is being used.
ONNX Runtime makes it easier for you to create amazing AI experiences on Windows and other platforms with less engineering effort and better performance. Olive simplifies the optimization process and eliminates the need for deep hardware knowledge. ONNX Runtime is the future of AI development on Windows.
What is Olive?
Olive is an extensible hardware-aware model optimization tool that composes cutting-edge techniques across model compression, optimization, and compilation. Olive takes a PyTorch or ONNX model and a configuration file and runs optimization tools to create the most optimal model for a specified hardware device. Olive then produces a package that contains the optimized models, ONNX Runtime, the right ONNX Runtime EP, configuration files, and sample code for that EP. This package can be easily integrated into your app.
Hybrid inferencing and lighting up NPUs with new EPs in ONNX Runtime
ONNX Runtime now supports the same API for running models on the device or in the cloud, enabling hybrid inferencing scenarios where your app can use local resources when possible and switch to the cloud when needed. With the new Azure EP preview, you can connect to models deployed in AzureML or even to the Azure OpenAI service, starting with the Whisper model. You just need a few lines of code to specify the cloud endpoint and add your own criteria for when to use the cloud. This gives you more control over costs and user experience, as Azure EP gives you the flexibility to choose between using the larger model in the cloud or the local model at runtime.
Neural Processing Units (NPUs) are the latest silicon that were built from the ground up to support AI workloads and are incredibly energy efficient while offloading workloads from the CPU. We also support inferencing in browsers with ORT Web and are working with partners on incorporating WebNN.
We hope you are excited about these new capabilities in ONNX Runtime and Olive. We encourage you to try them out and share your feedback with us. We are always looking for ways to improve our tools and support more scenarios for our users.
If you want to learn more about ONNX Runtime and Olive and how they can help you build amazing Windows AI applications with ONNX Runtime, visit https://onnxruntime.ai for more information and documentation.