This article introduces the Intel Arc A770 GPU as a competitive option for AI-intensive workloads, especially for those working in the Windows ecosystem. This segment has traditionally been dominated by NVIDIA and its CUDA platform, but Intel’s latest offering provides a solid alternative. This article shows how to work with the Arc A770 GPU directly in Windows, without the need for the Windows Subsystem for Linux (WSL).
With practical steps and detailed information, we will cover how to configure and optimize the Arc A770 GPU for various AI models, including Llama2, Llama3, and Phi3. The article also includes performance metrics and memory usage statistics to give you a full understanding of the GPU’s capabilities. Whether you are a developer or a researcher, this article will give you the knowledge you need to use the Intel GPU effectively and efficiently in your AI projects.
Intel recently gave me the opportunity to test its Arc A770 GPU for AI workloads. While detailed specs can be found here, one feature that immediately stands out is the 16GB of VRAM. That’s 4GB more than its natural competitor, the NVIDIA RTX 3060, making it an attractive option for AI computing at a similar price.
Since we work primarily with Microsoft technologies at Plain Concepts, I decided to explore the GPU’s capabilities on Windows. Given my regular work with PyTorch, I started with the Intel Extension for PyTorch to see if I could run models like Llama2, Llama3, and Phi3 and evaluate their performance.
I initially considered using the Windows Subsystem for Linux (WSL), based on suggestions in several blog posts and videos that native Windows support might not be quite ready. However, I decided to experiment with my own Windows setup first, and after a few tweaks and adjustments, I was pleased to find that everything worked just fine.
In this article, I share my experience and the steps I took to run Llama2, Llama3, and Phi3 models on the Intel Arc A770 GPU directly in Windows. I also present performance metrics, including execution time and memory usage for each model. The goal is to provide a comprehensive overview of how to effectively use the Intel Arc A770 GPU for AI-intensive tasks in Windows.
Intel provides a complete guide on how to install the Intel Extension for PyTorch for Arc GPUs.
However, setting up the Arc A770 GPU in Windows required some initial configuration. Here is a quick rundown; for detailed instructions, see the corresponding repository.
As stated in its GitHub repository: “Intel® Extension for PyTorch* extends PyTorch* with up-to-date features optimizations for an extra performance boost on Intel hardware.” Specifically, it provides easy GPU acceleration for discrete Intel GPUs through the PyTorch `xpu` device. This means that with this extension you can take advantage of the Intel Arc A770 GPU for AI tasks without relying on CUDA/NVIDIA, and get even more performance gains by using one of the optimized models.
Luckily, the extension uses the same API as PyTorch, so you typically only need to make a few code changes to get it working on an Intel GPU. Here’s a quick rundown of the changes required:
Add the Intel extension import to your PyTorch script and check that the GPU is detected correctly.
This change is not strictly necessary, but it is recommended to check if the GPU is detected correctly before running the model.
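A minimal sketch of this check (assuming the `intel_extension_for_pytorch` package is installed; the snippet falls back gracefully when it is not):

```python
try:
    import torch
    # Importing the extension registers the "xpu" device with PyTorch.
    import intel_extension_for_pytorch as ipex  # noqa: F401

    xpu_available = torch.xpu.is_available()
except ImportError:
    xpu_available = False  # PyTorch or the extension is missing on this machine

print("Intel XPU available:", xpu_available)
```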
Once the model is loaded, move it to the GPU.
Finally, when using the model, make sure the input data is also on the GPU.
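Put together, the last two steps look roughly like this (a sketch using a tiny stand-in module rather than a real LLM; on a machine without the extension the device simply falls back to CPU):

```python
import torch

# Pick the Intel "xpu" device when available, otherwise fall back to CPU.
try:
    import intel_extension_for_pytorch as ipex  # noqa: F401
    device = "xpu" if torch.xpu.is_available() else "cpu"
except ImportError:
    device = "cpu"

# Once the model is loaded, move it to the GPU. With a Hugging Face model this
# would be: model = AutoModelForCausalLM.from_pretrained(...).to(device)
model = torch.nn.Linear(8, 2)  # stand-in for the real model
model = model.to(device)

# Input tensors must live on the same device as the model.
inputs = torch.randn(1, 8, device=device)
outputs = model(inputs)
print(outputs.shape)
```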
To accurately measure performance, I also added some extra code to get the total inference time and maximum memory allocation. This basically consists of warming up each model before inference, and some extra code to wait for the model to run and print the results in a readable form. Visit the examples repository to learn more and reproduce the results on your machine.
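The measurement logic can be sketched like this (a simplified version, assuming a `model.generate`-style API and the `torch.xpu` namespace provided by the extension; `torch.cuda` exposes the equivalent calls on NVIDIA hardware):

```python
import time

import torch

def timed_generate(model, inputs, max_new_tokens=128):
    """Warm up, then measure wall time and peak GPU memory for one generation."""
    model.generate(**inputs, max_new_tokens=8)  # warm-up pass
    torch.xpu.synchronize()                     # wait for queued GPU work
    torch.xpu.reset_peak_memory_stats()

    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.xpu.synchronize()                     # ensure generation has finished
    elapsed = time.perf_counter() - start

    peak_gb = torch.xpu.max_memory_allocated() / 1e9
    return output, elapsed, peak_gb
```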
Llama2 is the second version of Meta’s popular open-source Llama LLM. After setting up the environment and making the changes described in the previous section to the official Llama2 samples, I was able to run the Llama2 model on the Intel Arc A770 GPU for both plain text completion and chat tasks.
The Llama2 7B model takes up about 14GB of memory with float16 precision. Since there are 16GB available on the GPU, we can run it without any problems. Below you can see the results of an example output using a maximum of 128 tokens in the output.
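The ~14GB figure follows directly from the parameter count: 7 billion weights at two bytes each in float16.

```python
params = 7e9          # Llama2 7B parameter count
bytes_per_param = 2   # float16 stores each weight in 2 bytes
print(params * bytes_per_param / 1e9, "GB")  # → 14.0 GB, just under the card's 16GB
```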
Likewise, the Llama2 7B chat results were impressive, as the model generated human-like responses in a conversational tone. The chat sample ran smoothly on the Intel Arc A770 GPU, demonstrating its capabilities for chat applications. In this case, the sample is run with 512 tokens in the output to further stress the hardware.
Llama3 is the latest version of Meta’s Llama LLM model, released a couple of months ago. Fortunately, the Intel team was quick to include optimizations to the model in the extension so that the full power of the Intel Arc A770 GPU could be taken advantage of. The process was very similar to that used for Llama2, using the same environment and official samples.
The Llama3 8B model takes up just over 15GB of memory with float16 precision. Since there are 16GB available on the GPU, we can run it without any issues. Below you can see the results of an example output using a maximum of 64 tokens in the output.
Continuing with the Llama2 examples, I also tested the chat capabilities of the Llama3 8B model by increasing the number of output tokens to 256.
Phi3 is Microsoft’s latest model, released on April 24. It’s a smaller model than Llama2 and Llama3 (3.8B parameters for the smallest version), but it’s still quite powerful. It’s instruction-tuned, providing detailed and informative responses.
Although Phi3 optimizations for Intel hardware are not yet included in the Intel Extension for PyTorch, we can use the third-party library ipex-llm to optimize the model. Since Phi3 is fairly new, I had to install a pre-release version of the library, which already implements optimizations for all of Phi3’s core operations. Note that ipex-llm is a community-driven project rather than an official Intel library, so it is not officially supported by Intel.
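Loading a model through ipex-llm looks roughly like this (a sketch based on the library’s `transformers`-style API; treat the exact arguments as assumptions to verify against the ipex-llm documentation):

```python
try:
    # ipex-llm mirrors the Hugging Face transformers API.
    from ipex_llm.transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-3-mini-4k-instruct",
        load_in_4bit=True,       # quantize weights to 4 bits while loading
        trust_remote_code=True,
    )
    model = model.to("xpu")      # move the quantized model to the Arc GPU
    loaded = True
except ImportError:
    loaded = False               # ipex-llm is not installed on this machine

print("Model loaded via ipex-llm:", loaded)
```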
After optimizing the model, the rest of the code changes remained the same as for Llama2 and Llama3, so I was able to run the Phi3 model on the Intel Arc A770 GPU without any problems.
The Phi3 mini 4K model takes up about 2.5GB of memory with 4-bit precision. Since it has far fewer parameters than the Llama models, it runs much faster. Below you can see the results of an example inference using a maximum of 512 tokens in the output.
To give a comprehensive assessment of the Intel Arc A770 GPU’s performance, I compared the runtime and memory usage of each model on both the Intel Arc A770 and the NVIDIA RTX 3080 Ti. The metrics were obtained using identical code samples and environment configurations on both GPUs, ensuring a fair and accurate comparison.
**Intel Arc A770**

| Model | Output tokens | Completion time | Max memory used |
| --- | --- | --- | --- |
| meta-llama/Llama-2-7b-hf | 128 | ~7.7 sec | ~12.8 GB |
| meta-llama/Llama-2-7b-chat-hf | 512 | ~22.1 sec | ~13.3 GB |
| meta-llama/Meta-Llama-3-8B | 64 | ~11.5 sec | ~15.1 GB |
| meta-llama/Meta-Llama-3-8B-Instruct | 256 | ~30.7 sec | ~15.2 GB |
| microsoft/Phi-3-mini-4k-instruct | 512 | ~5.9 sec | ~2.6 GB |
**NVIDIA RTX 3080 Ti**

| Model | Output tokens | Completion time | Max memory used |
| --- | --- | --- | --- |
| meta-llama/Llama-2-7b-hf | 128 | ~15.5 sec | ~12.8 GB |
| meta-llama/Llama-2-7b-chat-hf | 512 | ~51.5 sec | ~13.3 GB |
| meta-llama/Meta-Llama-3-8B | 64 | ~16.9 sec | ~15.1 GB |
| meta-llama/Meta-Llama-3-8B-Instruct | 256 | ~66.7 sec | ~15.2 GB |
| microsoft/Phi-3-mini-4k-instruct | 512 | ~16.7 sec | ~2.6 GB |
The following graph shows the normalized per-token execution time for each model on the Intel Arc A770 and NVIDIA RTX 3080 Ti GPUs.
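The normalization behind the graph is simple arithmetic: divide each completion time by the number of generated tokens. For example, for the two Llama2 7B base runs (128 output tokens each) from the tables above:

```python
# Per-token latency = total completion time / number of generated tokens.
runs = {"run A": 7.7, "run B": 15.5}  # seconds for 128 output tokens
for name, seconds in runs.items():
    print(f"{name}: {seconds / 128 * 1000:.1f} ms/token")
# run A: 60.2 ms/token, run B: 121.1 ms/token — roughly a 2x gap
```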
As you can see, the Intel Arc A770 performed remarkably well across all models, delivering competitive execution times; in most cases it completed generation two or more times faster than the NVIDIA RTX 3080 Ti.
The Intel Arc A770 GPU has proven to be a great option for running AI on a local Windows machine, offering an alternative to the CUDA/NVIDIA ecosystem. The GPU’s ability to efficiently run models such as Llama2, Llama3, and Phi3 demonstrates its potential and high performance. Despite initial setup issues, the process was relatively straightforward and the results were impressive.
At its core, the Intel Arc A770 GPU is a powerful tool for AI applications on Windows. With some initial tweaks and code changes, it handled both plain inference and chat workloads efficiently. This opens up new possibilities for developers and researchers who prefer or need to work in a Windows environment without relying on NVIDIA GPUs and CUDA. As Intel continues to improve its GPU offerings and software support, the Arc A770 and future models could become major players in the AI community.
Code examples used in this article can be found in the IntelArcA770 GitHub repository.
Additionally, below are some resources that I find essential for learning more about Intel’s hardware and library ecosystem for AI workloads.