It's that time of year again. A time I always dread, when familiar hardware and disk images are replaced with the frightening introduction of a completely new beast. New motherboards, new CPUs, new hard drives, and new and scary drivers. I never know what's going to happen. Earlier this year we butted heads with Skulltrail and eventually lost, going back to what we were using before. Hopefully this time around will be a bit different.
In any case, while I'm in the middle of the changeover, I figured I would write a little something about our graphics test beds, what we look for in one, and how we set them up. It's always controversial and debated in many of our articles, so maybe it will make for some good discussion (or flame wars) here.
First off, in doing graphics tests for the purpose of comparing graphics hardware, we always use the highest end desktop system we can build. By using the fastest processors and memory, we eliminate bottlenecks in the rest of the system and reveal the maximum potential of any given video card. Looking at relative performance in this light will always provide us with better and more reliable information on which card is capable of higher performance. Adding in artificial performance limiters like lower end CPUs and RAM compresses our data and makes it more difficult to see what graphics solution is more desirable.
Even if the CPU in my home system is something low end, I'm still going to want to install the best graphics card I can afford: when choosing between brands and manufacturers, I want the performance leader in the games I like at my target price. There are a lot of reasons for this, but a couple stand out to me. With higher graphics performance I should see less choppiness and higher minimum framerates even if my CPU limits the average frame rate. I also get more headroom for higher visual quality settings, and the higher performance part (even when CPU limited today) should be more capable of handling near-term future games that end up graphics limited rather than CPU limited, even on a lower end CPU.
This is absolutely not to say that CPU and RAM aren't important considerations. There is definitely a place for tests that look at the performance of games on certain combinations of CPU and GPU hardware. But that is not something for a graphics hardware review.
Currently, what we do with independent CPU and GPU testing allows people to see where the limits cross. Imagine I test a bunch of CPUs on the absolute highest end graphics card and see a range between 40 and 60 frames per second for MadeUpTestGame. Then, imagine I take a bunch of GPUs and test them on the absolute highest end CPU and see a performance range between 20 and 60 frames per second with the same MadeUpTestGame benchmark. If I know what CPU and GPU I have, I can estimate my absolute maximum potential framerate: it's the lower of the two scores, my CPU tested with a high end GPU and my GPU tested with a high end CPU.
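To make the arithmetic concrete, here's a minimal sketch in Python. The numbers follow the made-up MadeUpTestGame ranges above, and the part names are purely hypothetical:

```python
# Made-up MadeUpTestGame results, in frames per second.
# CPUs were tested with the highest end GPU; GPUs with the highest end CPU.
cpu_limited_fps = {"budget_cpu": 40, "midrange_cpu": 50, "highend_cpu": 60}
gpu_limited_fps = {"budget_gpu": 20, "midrange_gpu": 45, "highend_gpu": 60}

def expected_ceiling(cpu: str, gpu: str) -> int:
    """Maximum potential framerate: whichever part hits its limit first sets it."""
    return min(cpu_limited_fps[cpu], gpu_limited_fps[gpu])

# A midrange CPU paired with a budget GPU should top out around 20 fps:
# the GPU is the bottleneck long before the CPU's 50 fps ceiling matters.
print(expected_ceiling("midrange_cpu", "budget_gpu"))  # 20
```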
Now, I might be able to get more accurate information if I actually tested every combination of CPU and GPU, but that's a little outside the scope of a simple GPU launch article. If I only test with a lower end CPU, a lot of the performance numbers get compressed and it becomes harder to extract information that is useful for comparison purposes. If I test with a high-end CPU, someone with a lower end CPU can look up performance data for that CPU and decide whether a given graphics card will be overkill or a good fit. But that's a different issue than assessing the relative performance of graphics hardware.
So there's that. But what about building the test bed?
Switching hardware and software platforms can often lead to dealing with a lot of new problems. With the old hardware I've been testing on, I know what to expect: which problems point to a system issue and which are probably a product issue. Even if my system isn't as reliable as I would like it to be, knowing its quirks really helps when something goes wrong during testing. So the first problem I run into is that I don't know what can and will go wrong. This makes troubleshooting take a bit longer than it should, but it's got to be done eventually.
Choosing components is simple: find the fastest thing we've got and shove it in a system. In this case, that means I'm changing over to an as-yet unreleased motherboard and CPU, which makes the potential for problems even larger. The RAM and hard drive we will be using for graphics going forward are things we've already tested, though: high performance OCZ DDR3 and an Intel SSD. Yes, the limited size of the Intel SSD will make it tough to get a lot of games on there, but the increase in boot speed and responsiveness of the system goes a long way toward making testing easier and better, and it should also minimize the impact of random hits to the disk while benchmarking.
As for setting up the system, after we install the 64-bit version of Vista (I really wish there were some other platform on which to game), we set about disabling all sorts of things to get the computer to a state that allows for consistent testing. Turning features off isn't really so much about gaining performance as it is about ensuring consistency. With the number of things Vista does in the background, we see more fluctuation in benchmark performance from run to run. To get a fair comparison without having to run everything 10 times and average the results, we perform the following steps.
First off, we turn off and disable the Sidebar. Next, we open the Security Center, where automatic updating and Security Center alerts are disabled. Then we disable User Account Control.
After a quick reboot (and disabling the welcome screen), we head to advanced system settings and disable system protection (System Restore) and remote assistance. While there, we adjust performance settings (in the advanced tab) to best performance, and we set the virtual memory page file to a fixed size (custom size with initial == maximum) of 1.5x the amount of RAM in the system (though this time, with the limited size of the SSD and the vast amount of RAM in the system, our page file is set to RAM + 512MB).
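The sizing rule reduces to a couple of lines. Here's a sketch of the arithmetic described above, with hypothetical RAM amounts, and with the SSD case treated as our current workaround rather than a general recommendation:

```python
def page_file_mb(ram_mb: int, small_ssd: bool = False) -> int:
    """Fixed page file size in MB (initial == maximum)."""
    if small_ssd:
        # Limited disk space plus lots of RAM: RAM + a 512MB cushion.
        return ram_mb + 512
    # The usual rule of thumb: 1.5x physical RAM.
    return int(ram_mb * 1.5)

print(page_file_mb(4096))                   # 6144 on a conventional drive
print(page_file_mb(12288, small_ssd=True))  # 12800 on an SSD-backed test bed
```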
Once done with that, we reboot and disable search indexing (by deselecting the folders that are indexed) and the screen saver, moving on from there to power settings. We select the High Performance plan and further adjust it so the hard drive doesn't turn off for 40 minutes and the display turns off after 2 hours. I also like my start menu power button to turn the computer off rather than put it to sleep, but that's personal preference.
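Those power tweaks can also be scripted rather than clicked through. The sketch below shells out to powercfg with what I believe are the right Vista-era switches and the stock High Performance scheme GUID; treat the flag names as assumptions and verify them against powercfg /? on your own install:

```python
import subprocess

# Assumed Vista powercfg switches and the stock High Performance scheme GUID.
subprocess.run(["powercfg", "-setactive",
                "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c"], check=True)
# Hard drive off after 40 minutes, display off after 120 minutes (on AC power).
subprocess.run(["powercfg", "-change", "-disk-timeout-ac", "40"], check=True)
subprocess.run(["powercfg", "-change", "-monitor-timeout-ac", "120"], check=True)
```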
At this point, any service packs are installed, then chipset drivers, then graphics drivers, then any other system drivers that are needed. After the billion reboots there and removing any backup files left from the service pack install (if we aren't using a slipstreamed disc), we get back to the process at hand: un-Vistaing Vista.
In no particular order: moving files to the Recycle Bin on delete is disabled, scheduled defragmentation is disabled, the desktop resolution is set to the max, and folder options are changed to show all hidden files. We even prevent the notification area from hiding unused icons and disable the start menu highlighting of new programs. Then it's on to a couple of services we disable as well. SuperFetch and ReadyBoost are both turned off: SuperFetch because app launch times don't matter (we use multiple runs to get tests loaded into memory), and ReadyBoost because we are using an SSD and don't need it.
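Both of those can be flipped from the command line rather than the Services snap-in. A sketch, assuming Vista's internal service names (SysMain for SuperFetch, EMDMgmt for ReadyBoost; verify with sc query before trusting them):

```python
import subprocess

# Assumed Vista service names: SuperFetch = "SysMain", ReadyBoost = "EMDMgmt".
for service in ("SysMain", "EMDMgmt"):
    # Note: sc requires a space after "start=", hence the separate argument.
    subprocess.run(["sc", "config", service, "start=", "disabled"], check=True)
    subprocess.run(["sc", "stop", service], check=False)  # may not be running
```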
We used to also disable audio, but there are some games that don't run without audio support. Enabling and disabling audio is more trouble than it's worth. In games that have the ability to disable sound during testing, we do so, but if there is no option we do nothing.
Our desktop features shortcuts to batch files that delete the contents of the prefetch directory and run ProcessIdleTasks. However, with an SSD it isn't really necessary or desirable to run ProcessIdleTasks, because one of the idle tasks is defragmentation (which you don't want to run on an SSD anyway).
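Those two batch files amount to only a couple of commands. Here's the equivalent sketched in Python (the prefetch path assumes a default Windows install, and rundll32 advapi32.dll,ProcessIdleTasks is the standard way to force idle tasks to run immediately; on the SSD test bed only the first half applies):

```python
import glob, os, subprocess

# Empty the prefetch directory (assumes the default Windows location).
for path in glob.glob(r"C:\Windows\Prefetch\*"):
    try:
        os.remove(path)
    except OSError:
        pass  # skip anything locked or otherwise not removable

# Force Windows to run its idle-time tasks now instead of mid-benchmark.
subprocess.run(["rundll32.exe", "advapi32.dll,ProcessIdleTasks"], check=True)
```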
So that's about it as far as system setup goes. Well, after installing games and all that good stuff anyway. Right now we are also looking at updating our game suite. On the short list are: Far Cry 2, Crysis Warhead, Fallout 3, S.T.A.L.K.E.R. Clear Sky, Call of Duty: World at War, and Brothers in Arms: Hell's Highway. While I'm not sure we will actually be able to incorporate all these games into our next round of graphics card testing, the first games we drop will be the ones these new titles supersede: Fallout 3 will replace Oblivion and Crysis Warhead will replace Crysis.
I'd love to be able to test 20 games for every graphics hardware review, but it's just not possible to do that kind of testing under normal circumstances. We will do our best to evaluate games and pick the ones that make the most sense going forward.
Oh, and I can't wait until I can talk more about what is actually in this new graphics test bed. It's pretty freaking sweet :-)
Comments
Gary Key - Tuesday, October 28, 2008
We have an excellent Flight Simulator X benchmark coming in the next mobo roundup. ;) Also, we used to run 3DMark and considered running 3DMark Vantage (at least on the mobo/memory side); the problem is that the graphics card manufacturers have a bad habit of doing specific driver optimizations for these programs. Usually these optimizations have no bearing on actually improving game play in general; they're just there to ensure the benchmark results are improved.

DerekWilson - Tuesday, October 28, 2008
3dmark is not objective -- it is what futuremark thinks (subjectively) will be important to the future of gaming. it is also fully synthetic and doesn't give a good report of what exactly a piece of hardware is good at, meaning that it isn't useful for anything practical and the information it provides is not high quality.
we would be MUCH more likely to adopt a fully task specific synthetic benchmark like GPUBench than something like 3dmark ... for performance analysis anyway.
for max load power tests, i always use 3dmark -- it can fully load the graphics hardware without loading the CPU giving you a good GPU level power comparison.
...
flight sims might be nice though ...
lyeoh - Wednesday, October 29, 2008
I personally regard 3D Mark as a meaningless test and a waste of time, except to overclockers who mainly use their computers to run 3D Mark, superpi, etc. The time it takes to run a 3D Mark test might as well be used to run a benchmark of a real application/game.
Something like a flight simulator benchmark would definitely be more meaningful than 3D Mark. I believe there have only been a very few flight sim games released in the past few years; that could be a plus or a minus depending on how you view it. I personally don't care :).
In fact 2D performance tests might be more useful to me - some 3D cards don't have as good 2D performance as others.
What I find annoying with some other benchmark sites is they only test resolutions like 2560 x 1600. Yes that's useful to test the really high end, but many people are still using 1280x1024 and 1680x1050. That's one of the reasons why I prefer Anandtech :).
Buying a bigger display costs a lot of money - the display itself costs more, and you need to spend tons more on graphics cards just to drive it at a decent frame rate.
Regarding minimum frame rates - if frame rate graphs are not possible, posting minimum and maximum frame rates would be good (averaged over X seconds minimum).
Then there's SLI. I've heard that for some SLI stuff, the interframe delay going from card #1 to card #2 could be different from card #2 to card #1. Say the average frame rate is 60 fps. So on average there's 16ms between frames. However in theory card #2 could be producing a frame 2 milliseconds after card #1, and then nothing happens for 30ms. So the actual perceived display is not quite as smooth as the numbers might have you believe - it might appear closer to 30fps, or "jittery".
Last but not least, if possible try to measure _latency_ as well. e.g. measure the time it takes for mouse button down and/or key down to the action being displayed on the screen. A video card or video driver that produces higher frame rates but adds a lag of 50 milliseconds will be bad for most games where frames per second count. Testing latency should make for an interesting article. If you find that in general the latency is insignificant - say lower than 10ms, you can leave it out of the standard benchmarks and only do latency comparison tests for things like some fancy new tech wireless mouse.
whatthehey - Tuesday, October 28, 2008
FPS, FPS, FPS, FPS, FPS... oh, and another FPS! Hopefully you can get in at least a few other genres, like simulation (GRID seems to be a fine inclusion), RPG (other than Fallout 3), and RTS.

From your comments, it looks like it will be an X58 Nehalem platform; does that mean no need for an NVIDIA chipset to run SLI? God I hope so!
jnmfox - Tuesday, October 28, 2008
+1

Anandtech needs to add more non-FPS games. I would like to see Company of Heroes or World in Conflict.
Also post minimum frame rates in games, not just the average.
HYPhoenix - Wednesday, October 29, 2008
It would be nice if you guys showed a scatter plot at one resolution to see where the framerate spends most of its time.

Mr Perfect - Tuesday, October 28, 2008
Yes, please post more information than just the average framerate. The problem with averages is that a card pumping out a solid, steady framerate can end up with the same average as a card that is fluctuating wildly between highs and lows. As long as the math worked out, you couldn't tell which card actually offered the better play experience.

Of course, this could also be solved with a line graph showing framerates over the course of the test, instead of a simple average framerate bar graph. For some reason just about every review site under the sun uses bar graphs though... Well, except for one, but I don't want to mention names in case it starts some sort of review-site-fanboy war. -_-
strikeback03 - Wednesday, October 29, 2008
Of course then they would be back to several charts for each game, as you would need individual charts for each resolution. And as there would be many thousands of frames in a test, there would still be some averaging to compress that data down into a 500-pixel-or-so-wide graph.

Mr Perfect - Wednesday, October 29, 2008
True, it does make the review more complex. But that's a good thing, as readers will get a lot more information out of it. A line showing what a card was doing over the course of a test is far more useful than a bar labeled "35FPS".