Intel introduced its own lightweight performance analyzer, PresentMon, a few years ago; it was recently updated with a revamped interface and several very interesting and convenient features.
PresentMon provides frame rates, frame times, and a variety of other metrics in real time, or records them over a period, for almost any game. It supports DirectX 9 to 12, OpenGL and Vulkan and is available for Windows 10 and 11.
The menu is divided into four sections:
Processes and overlay:
Here we select the running process we want to monitor; PresentMon can also pick one automatically. We can also set a shortcut to enable or disable the overlay that displays all the relevant information.
The overlay offers three predefined layouts to choose from, but we can also customize it with metrics like CPU usage, GPU temperature, GPU render times, etc.
If we want to record the values over a certain period of time, a dedicated (configurable) shortcut is provided.
This section also gives us access to the folder where recordings and custom configurations are saved.
In the Data processing section, you can change how often PresentMon polls the graphics API for information. The default is every 100 milliseconds (10 Hz), which is adequate for most cases. You can also adjust the polling frequency for power and temperature monitoring.
In this section it is also possible to change the duration over which the average values are determined. The default value is 1000 milliseconds (which means the average is calculated every second).
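To illustrate how such an averaging window behaves, here is a minimal sliding-window average over frame-time samples. This is a hypothetical sketch of the concept, not how PresentMon itself is implemented:

```python
from collections import deque

def rolling_average(frame_times_ms, window_ms=1000.0):
    """Sliding-window average of frame times, mirroring the idea of
    PresentMon's default 1000 ms averaging window. Illustrative only."""
    window = deque()          # (timestamp_ms, frame_time_ms) pairs
    now = 0.0
    averages = []
    for ft in frame_times_ms:
        now += ft             # each frame advances the clock by its duration
        window.append((now, ft))
        # drop samples older than the averaging window
        while now - window[0][0] > window_ms:
            window.popleft()
        averages.append(sum(t for _, t in window) / len(window))
    return averages
```

With a shorter window the readout reacts faster to spikes but jitters more; 1000 ms is a reasonable middle ground for an on-screen display.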
The capture section is about creating a summary file of any captures and a setting for PresentMon to ignore the most frequently running processes on Windows, so that it doesn’t accidentally try to capture them instead of the game.
Let’s now examine what is displayed in the overlay when you select one of the three presets:
- Basic: Shows only three metrics, all related to each other: the average frame rate in bold, followed by the 99th percentile frame time; the latter corresponds to the low 1% figures we use when evaluating CPUs and GPUs.
- GPU Focus: Shows the same metrics as Basic, but adds GPU temperature and memory usage.
- Power/Temperature: Shows GPU Focus metrics, but also adds GPU power and VRM temperature.
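The relationship between the 99th percentile frame time and the "low 1%" frame rate can be shown with a short Python sketch. The nearest-rank percentile used here is an assumption; PresentMon's exact method may differ:

```python
def one_percent_low_fps(frame_times_ms):
    """Low 1% FPS derived from the 99th percentile frame time
    (nearest-rank percentile; a simplification for illustration)."""
    ordered = sorted(frame_times_ms)
    n = len(ordered)
    p99_ms = ordered[min(n - 1, (n * 99) // 100)]
    return 1000.0 / p99_ms

# 99 smooth 10 ms frames plus one 25 ms spike: the average FPS is
# close to 100, but the low 1% figure drops to 40
```

This is why the low 1% number is a better stutter indicator than the average: a single slow frame per hundred barely moves the mean but dominates the percentile.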
Which preset to use depends on your needs. If you are only interested in the average frame rate, Basic is sufficient. If you also want to monitor GPU temperature and usage, use GPU Focus. And if you want GPU power and VRM temperature on top of that, use Power/Temp.
In general, we recommend using the simplest preset that meets your needs; this keeps the overlay uncluttered and minimizes its performance impact.
The GPU Focus preset focuses on the graphics card and displays a lot of useful information. It includes everything from the Basic preset, but adds something particularly useful: frame time.
Frame time is the average time in milliseconds between each “Present” instruction issued by the graphics API, which tells the system to start displaying the frame on the monitor. Each component of the PC and the game itself can influence this time period, but the biggest contributors are usually the CPU and the system as a whole.
High frame times can indicate performance issues, such as stuttering. A brief upward spike indicates a temporary slowdown, which may be due to an issue that is causing a short delay.
In general, a lower frame time is better, as it means the PC can display frames more quickly. However, it is important to note that even high frame times can be acceptable, depending on the game and graphics settings.
For example, a frame time of 100 milliseconds may be acceptable for a turn-based strategy game, but may be too high for a first-person shooter.
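The conversion behind that example is simple arithmetic: frame time in milliseconds and frames per second are reciprocals of each other, scaled by 1000:

```python
def fps_to_frame_time_ms(fps):
    """Average frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps

def frame_time_ms_to_fps(frame_time_ms):
    """Frame rate implied by an average frame time."""
    return 1000.0 / frame_time_ms

# 100 ms per frame is only 10 fps: fine for turn-based strategy,
# far too slow for a shooter (60 fps requires roughly 16.7 ms per frame)
```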
GPU Busy time is Intel’s new feature in PresentMon: it measures how long the graphics processor takes to render the frame. The timer runs from the moment the GPU receives the frame from the queue until the moment it replaces the entire frame buffer in VRAM with a new one.
If the frame time is much longer than the GPU busy time, game performance is limited by other factors, such as CPU speed. Naturally, frame time can never be shorter than GPU busy time, but the two can be almost identical, and ideally that is what you want in a game.
If we look at our Half-Life 2 example, the average GPU busy time is 1.42 ms, while the average frame time is 3.46 ms.
This means the game is CPU limited: the total frame time is much longer than the time the GPU spends rendering. To improve performance, you would need a faster CPU or a reduced CPU load.
The graphics card only contributes about 40% of the total time needed to display a frame on the monitor; the rest is affected by the CPU, RAM, operating system and the game itself. This is confirmed by the fact that the reported average GPU usage is around the same percentage.
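The CPU-versus-GPU-bound reasoning above can be sketched in a few lines of Python. This is a hypothetical helper, not part of PresentMon, and the 90% cutoff is an arbitrary illustrative threshold:

```python
def classify_frame(frame_time_ms, gpu_busy_ms, gpu_bound_threshold=0.9):
    """Estimate whether a frame was GPU- or CPU/system-limited from
    PresentMon-style metrics. The 0.9 cutoff is an arbitrary example."""
    gpu_share = gpu_busy_ms / frame_time_ms
    if gpu_share >= gpu_bound_threshold:
        return gpu_share, "GPU-limited"
    return gpu_share, "CPU/system-limited"

# Half-Life 2 figures from the article: 1.42 ms GPU busy, 3.46 ms frame time
share, verdict = classify_frame(3.46, 1.42)
# share comes out around 0.41, i.e. the GPU accounts for ~41% of the frame
```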
Information such as GPU clock speed, graphics card fan RPM, voltages, power, etc. is also provided; however, the fan speed is not reported correctly.
The “GPU Memory Used” value does not represent actual usage. Instead, it measures what’s called “local budget” in Microsoft PIX, a developer-level analytics tool. This is basically similar to VRAM allocation. So while it provides some insights, it’s not the most accurate metric for determining how much of your card’s memory is actually in use.
The Power/Temperature preset isn’t amazing, but it’s perhaps more useful than the Basic preset because it provides additional detail without being too busy.
Continuing the tests, it turned out that several values are displayed incorrectly or with their on-screen fields completely empty; a sign that the app is still in beta and does not collect data the same way as other programs, such as GPU-Z.
We’ve collected a lot of data and the above set of information gives a clear view of what happens to the hardware when running Half-Life 2.
We immediately notice that GPU usage is not particularly high, that the CPU shows a very low load percentage, and that there is a considerable number of dropped frames.
Everything becomes clearer once we take into account that, by default, the Half-Life 2 engine caps the frame rate at 300 fps. That limit can be exceeded via a console command, and this is exactly why so many frames are being dropped: the engine appears to generate the extra frames but then either never sends them to the GPU, or buffers them and removes them from the render queue.
Render latency is the number of milliseconds a frame spends in the queue, plus the time it takes to actually render. The GPU busy time shows that most of this latency comes from the queue itself. Of course, there’s always the chance that PresentMon isn’t reporting these values correctly, considering Half-Life 2 is an older DirectX 9 game.
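Under the decomposition just described (render latency = queue wait + GPU render time), the queue's share can be isolated with a trivial helper. This is an illustrative sketch, not a PresentMon function:

```python
def queue_wait_ms(render_latency_ms, gpu_busy_ms):
    """Time a frame spent waiting in the queue, assuming
    render latency = queue wait + GPU render time."""
    return max(0.0, render_latency_ms - gpu_busy_ms)
```

With Half-Life 2's 1.42 ms of GPU busy time, almost all of any multi-millisecond render latency would be queue wait.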
How to analyze in-game performance with PresentMon
Next, let’s analyze in-game performance while running The Last of Us Part 1 (TLOU), using the GPU Focus profile.
Let’s start by looking at the main screen of The Last of Us Part 1 (TLOU). Like many recent games, TLOU compiles all of its shaders in the main menu, especially if we’ve recently updated the drivers or the game itself. This process can be interrupted, but doing so causes noticeable frame skips during gameplay.
The Last of Us actively renders the menu as a 3D scene, which is why there is a good amount of GPU activity. However, at the time of taking this snapshot, the engine was still compiling all shaders.
After compilation, we noticed a big difference. Since the CPU has very little to do, the “performance” of the game menu is controlled entirely by the GPU. This is evident from the near-identical GPU busy and frame times, along with the GPU usage graph.
Does this mean that PresentMon can always show us when a game is CPU or GPU limited (aka “bottlenecked”)? Let’s look at some examples.
Below is a snapshot of Hearts of Iron 4, a real-time strategy game set during World War II. These titles are known to demand a great deal from the CPU and very little from the graphics card.
The game engine limits the frame rate based on the game speed. However, even at maximum speed, the GPU is only busy for 4.3 ms, with the engine consuming the remaining 11 ms of frame time. In this scenario a faster GPU would bring little benefit; a more powerful CPU is what’s needed, and during the later stages of a match, Hearts of Iron 4 will take all the CPU power it can get.
In testing, the capture system worked as expected, collecting values for every metric regardless of which overlay preset you are using. The overlay itself is disabled during capture, since running both at the same time would significantly affect performance. The figures are then saved to a CSV file, whose size can grow quickly depending on sample rates and recording time.
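As a sketch of post-processing such a capture, the headline figures can be recomputed from the CSV with a short Python script. The column name `MsBetweenPresents` is an assumption; check the header of your own capture file, as it varies between PresentMon versions:

```python
import csv
import statistics

def summarize_capture(path, frame_col="MsBetweenPresents"):
    """Summarize a PresentMon capture CSV into headline figures.
    The frame-time column name is an assumption; inspect your own
    file's header, as it varies between PresentMon versions."""
    with open(path, newline="") as f:
        frame_times = sorted(float(row[frame_col]) for row in csv.DictReader(f))
    n = len(frame_times)
    return {
        "avg_fps": 1000.0 / statistics.fmean(frame_times),
        # low 1% / low 5% via the nearest-rank percentile frame time
        "low_1pct_fps": 1000.0 / frame_times[min(n - 1, (n * 99) // 100)],
        "low_5pct_fps": 1000.0 / frame_times[min(n - 1, (n * 95) // 100)],
    }
```

Because the file stores every sample, any statistic can be derived offline this way, not just the ones an overlay chooses to display.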
Many other programs can do this as well, and apps like Afterburner don’t seem to have as many problems reading metrics as PresentMon. AMD GPU Profiler, Microsoft PIX, and Nvidia Nsight tend to provide more information for serious developers and analysts.
The PresentMon overlay is highly customizable and visually appealing, and the Captures summary report provides quick access to average, min/max, low 1%, and low 5% frame rates; the frame and GPU busy times are especially useful.
Ultimately, PresentMon is worth trying: it’s free and there’s nothing to lose. It will be interesting to see how Intel continues to improve this new version of its performance monitoring tool, and if there’s anything new, we’ll let you know.