
Announcing preview support for Llama 2 in DirectML

Published November 15, 2023

At Inspire this year, we talked about how developers will be able to run Llama 2 on Windows with DirectML and the ONNX Runtime, and we’ve been hard at work making this a reality.

We now have a sample showing our progress with Llama 2 7B!

See https://github.com/microsoft/Olive/tree/main/examples/directml/llama_v2

This sample first runs an optimization pass on the model with Olive, an optimization tool for ONNX models. Olive combines graph fusion optimizations from ONNX Runtime with a model architecture optimized for DirectML to speed up inference by up to 10X!

After this optimization pass, Llama 2 7B runs fast enough that you can have a conversation in real time on multiple vendors’ hardware!
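The sample linked above has its own scripts, and the sketch below is not taken from it. As a rough illustration of the runtime side only, here is a minimal Python sketch of opening an Olive-optimized ONNX model with ONNX Runtime while preferring the DirectML execution provider. It assumes the `onnxruntime-directml` package is installed on Windows; the script name, the model path, and the helper names `choose_providers` and `create_session` are hypothetical.

```python
# Minimal sketch (not from the Olive sample): open an ONNX model with
# ONNX Runtime, preferring the DirectML execution provider.
import sys
from typing import List, Sequence

# Provider names as reported by onnxruntime.get_available_providers().
DML_PROVIDER = "DmlExecutionProvider"
CPU_PROVIDER = "CPUExecutionProvider"


def choose_providers(available: Sequence[str]) -> List[str]:
    """Order providers: DirectML first when present, CPU always last."""
    providers = [DML_PROVIDER] if DML_PROVIDER in available else []
    providers.append(CPU_PROVIDER)  # fallback so a session can still be created
    return providers


def create_session(model_path: str):
    """Create an InferenceSession; needs onnxruntime(-directml) at call time."""
    import onnxruntime as ort  # lazy import keeps choose_providers dependency-free

    providers = choose_providers(ort.get_available_providers())
    options = ort.SessionOptions()
    if DML_PROVIDER in providers:
        # The DirectML EP documentation asks for memory patterns and
        # parallel execution to be disabled when using this provider.
        options.enable_mem_pattern = False
        options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
    return ort.InferenceSession(model_path, sess_options=options, providers=providers)


if __name__ == "__main__" and len(sys.argv) == 2:
    session = create_session(sys.argv[1])  # hypothetical path to an optimized model
    print("Active providers:", session.get_providers())
    for node in session.get_inputs():
        print(f"input {node.name}: shape={node.shape} type={node.type}")
```

For example, `python run_dml.py path/to/optimized_model.onnx` (both names hypothetical) would print the providers the session actually uses and the model's input names. If DirectML is not available, the sketch falls back to the CPU provider instead of failing, which makes it easy to tell whether the GPU path is active.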

We’ve also built a little UI to make it easy to see the optimized model in action.

Thank you to our hardware partners who helped make this happen. For more on how Llama 2 lights up on our partners’ hardware with DirectML, see:

We’re excited about this milestone, but this is only a first peek – stay tuned for future enhancements to support even larger models, fine-tuning and lower-precision data types.

Getting started

Requesting Llama 2 access

To run the Olive optimization pass in our sample, first request access to the Llama 2 weights from Meta.

Drivers

We recommend upgrading to the latest drivers for the best performance.

  • AMD has released optimized graphics drivers supporting AMD RDNA™ 3 devices including AMD Radeon™ RX 7900 Series graphics cards. Download Adrenalin Edition™ 23.11.1 or newer (https://www.amd.com/en/support).
  • Intel has released optimized graphics drivers supporting Intel Arc A-Series graphics cards. Download the latest drivers.
  • NVIDIA: Users of NVIDIA GeForce RTX 20, 30 and 40 Series GPUs can see these improvements firsthand in GeForce Game Ready Driver 546.01.