It's that time of year again. A time I always dread, when familiar hardware and disk images are replaced with the frightening introduction of a completely new beast. New motherboards, new CPUs, new hard drives, and new and scary drivers. I never know what's going to happen. Earlier this year we butted heads with Skulltrail and eventually lost, going back to what we were using before. Hopefully this time around will be a bit different.
In any case, while I'm in the middle of the changeover, I figured I would write a little something about our graphics test beds, what we look for in one, and how we set them up. It's always controversial and debated in many of our articles, so maybe it will make for some good discussion (or flame wars) here.
First off, in doing graphics tests for the purpose of comparing graphics hardware, we always use the highest end desktop system we can build. By using the fastest processors and memory, we eliminate bottlenecks in the rest of the system and reveal the maximum potential of any given video card. Looking at relative performance in this light will always provide us with better and more reliable information on which card is capable of higher performance. Adding in artificial performance limiters like lower end CPUs and RAM compresses our data and makes it more difficult to see what graphics solution is more desirable.
Even if the CPU in my home system is something low end, I'm still going to want to install the best graphics card I can afford: the performance leader in the games I like at my target price when choosing between brands / manufacturers. There are a lot of reasons for this, but a couple stand out to me. With higher graphics performance I should see less choppiness and higher minimums even if my CPU limits average frame rate. I would also have more headroom for higher visual quality settings, and the higher performance part (even when CPU limited today) should be more capable of playing near-term future games that turn out to be more graphics limited than CPU limited, even on a lower end CPU.
This is absolutely not to say that CPU and RAM aren't important considerations. There is definitely a place for tests that look at the performance of games on certain combinations of CPU and GPU hardware. But that is not something for a graphics hardware review.
Currently, what we do with independent CPU and GPU testing allows people to see where the limits would cross. Imagine I test a bunch of CPUs on the absolute highest end graphics card and see a range between 40 and 60 frames per second for MadeUpTestGame. Then, imagine I take a bunch of GPUs and test them on the absolute highest end CPU and see a performance range between 20 and 60 frames per second with the same MadeUpTestGame benchmark. If I know what CPU and GPU I have, I can tell what framerate I should expect to represent my absolute maximum potential performance: the lower of two scores, the one my CPU posted with a high end GPU and the one my GPU posted with a high end CPU.
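To make that lookup concrete, here is a minimal sketch in Python. The part names and scores are invented for illustration (only the 20 to 60 and 40 to 60 ranges come from the MadeUpTestGame example above), and the function name is hypothetical. It assumes the slower of the two components sets the ceiling, which is an upper bound rather than a prediction.

```python
# Hypothetical MadeUpTestGame results, in frames per second.
# CPUs were tested with the fastest GPU, GPUs with the fastest CPU.
cpu_scores = {"low_end_cpu": 40.0, "mid_range_cpu": 52.0, "high_end_cpu": 60.0}
gpu_scores = {"low_end_gpu": 20.0, "mid_range_gpu": 45.0, "high_end_gpu": 60.0}


def max_potential_fps(cpu: str, gpu: str) -> float:
    """Upper bound on frame rate for a CPU/GPU pairing.

    Whichever component is the bottleneck sets the ceiling, so the
    estimate is the lower of the two independently measured scores.
    """
    return min(cpu_scores[cpu], gpu_scores[gpu])


for cpu in cpu_scores:
    for gpu in gpu_scores:
        print(f"{cpu} + {gpu}: at most {max_potential_fps(cpu, gpu):.0f} fps")
```

A real pairing can land below this number, since the two limits interact, which is why testing every combination would be more accurate.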
Now, I might be able to get more accurate information if I actually tested every combination of CPU and GPU, but that's a little out of the scope of a simple GPU launch article. If I only test with a lower end CPU, I will see a lot of the performance numbers get compressed and I will have a harder time extracting information that is useful for comparison purposes. If I test with a high-end CPU, someone with a lower end CPU can find performance information for that CPU and decide if the graphics cards will be overkill or will be a good fit. But that's a different issue than assessing the relative performance of graphics hardware.
So there's that. But what about building the test bed?
Switching hardware and software platforms can often lead to dealing with a lot of new problems. With the old hardware I've been testing on, I know what to expect: which problems are system issues and which are probably product issues. Even if my system isn't as reliable as I would like it to be, knowing what the issues are really helps in dealing with testing problems. So the first problem I run into is that I don't know what can and will go wrong. This makes troubleshooting take a bit longer than it should, but it's got to be done eventually.
Choosing components is simple: find the fastest thing we've got and shove it in a system. In this current case, that means I'm changing over to an as-of-yet unreleased motherboard and CPU, which makes the potential for problems even larger. The RAM and hard drive we will be using for graphics going forward are things we've already tested though: high performance OCZ DDR3 and an Intel SSD. Yes, the limited size of the Intel SSD will make it tough to get a lot of games on there, but the increase in boot speed and responsiveness of the system go a long way to making testing easier and better, and it should also minimize the impact of random hits to the disk while benchmarking.
As for setting up the system, after we install the 64-bit version of Vista (I really wish there were some other platform on which to game), we set about disabling all sorts of things to get the computer to a state that will allow for consistent testing. Turning features off isn't really so much about gaining performance as it is about ensuring consistency. With the number of things happening in the background with Vista, we see more fluctuations in benchmark performance from run to run. To get a fair comparison without having to run everything 10 times and average performance, we perform the following steps.
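One way to put a number on run to run fluctuation is the coefficient of variation (standard deviation divided by the mean) across repeated runs. This is only a sketch and not a description of our actual tooling; the function name is hypothetical and the frame rates are made up.

```python
from statistics import mean, stdev


def run_to_run_variation(fps_runs):
    """Return (average fps, coefficient of variation) for repeated runs."""
    if len(fps_runs) < 2:
        raise ValueError("need at least two runs to measure variation")
    avg = mean(fps_runs)
    # Sample standard deviation relative to the mean, as a fraction.
    return avg, stdev(fps_runs) / avg


# Made-up results from three runs of the same benchmark.
avg, cv = run_to_run_variation([58.0, 60.0, 62.0])
print(f"average: {avg:.1f} fps, run-to-run variation: {cv:.1%}")
```

The lower that variation is on a freshly configured system, the fewer repeat runs are needed before the average can be trusted.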
First off, we turn off and disable the side bar. Next, we open the security center where automatic updating and security center alerts are disabled. Then we disable user account control.
After a quick reboot (and disabling the welcome screen), we head to advanced system settings and disable system protection (system restore) and remote assistance. While there, we adjust performance settings (in the advanced tab) to best performance and we set the virtual memory page file to a fixed size (custom size with initial == maximum) of 1.5x the amount of RAM in the system (though this time, with the limited size of the SSD and the vast amount of RAM in the system, our page file is set to RAM + 512MB).
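The page file arithmetic is simple enough to write down. The sketch below uses a hypothetical function name and a hypothetical 4096MB of RAM purely as an example; the only rules taken from the text are 1.5x RAM normally and RAM + 512MB when disk space is tight.

```python
def page_file_size_mb(ram_mb: int, limited_disk: bool = False) -> int:
    """Fixed page file size in MB (initial == maximum)."""
    if limited_disk:
        # Small SSD with lots of RAM: RAM + 512MB.
        return ram_mb + 512
    # Usual rule: 1.5x the installed RAM.
    return ram_mb * 3 // 2


ram_mb = 4096  # example value only, not the actual test bed configuration
print(page_file_size_mb(ram_mb))                     # 6144
print(page_file_size_mb(ram_mb, limited_disk=True))  # 4608
```

The same value goes into both the initial and maximum fields so the file never resizes during a benchmark run.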
Once done with that, we reboot and begin disabling the search indexing (by deselecting the folders that are indexed) and the screen saver, moving on from there to power settings. We select High Performance mode and further adjust these to not turn off the hard drive for 40 minutes and to turn off the display after 2 hours. I also like my start menu power button to turn the computer off rather than make it sleep, but that's personal preference.
At this point, any service packs are installed, then chipset drivers, then graphics drivers, then any other system drivers that are needed. After the billion reboots there and removing any backup files left from the service pack install (if we aren't using a slipstreamed disc), we get back to the process at hand: un-Vistaing Vista.
In no particular order, moving files to the recycle bin on delete is disabled, scheduled defragmentation is disabled, the desktop resolution is set to the max, and folder options are changed to show all hidden files. We even prevent the notification area from hiding unused icons and disable the start menu highlighting of new programs. Then it's on to a couple services we disable as well. SuperFetch and ReadyBoost are both disabled, SuperFetch because app launch times don't matter and we use multiple runs to get tests loaded into memory, and ReadyBoost because we are using an SSD and don't need it.
We used to also disable audio, but there are some games that don't run without audio support. Enabling and disabling audio is more trouble than it's worth. In games that have the ability to disable sound during testing, we do so, but if there is no option we do nothing.
Our desktop features shortcuts to batch files that delete the contents of the prefetch directory and run ProcessIdleTasks. However, with an SSD it isn't really necessary or desirable to run ProcessIdleTasks, because one of the idle tasks is defrag (which you don't want to run on an SSD anyway).
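Our shortcuts point at plain batch files, but the same two steps can be sketched in Python. Treat this as an illustration under assumptions rather than our actual scripts: the function names are hypothetical, the prefetch location is assumed to be the Prefetch folder under the Windows directory, and the idle task call is the commonly documented rundll32 invocation. Nothing runs on import, and the delete step needs an elevated prompt on a real system.

```python
import os
import subprocess
from pathlib import Path

# Commonly documented way to ask Windows to run its pending idle tasks.
IDLE_TASKS_COMMAND = ["rundll32.exe", "advapi32.dll,ProcessIdleTasks"]


def default_prefetch_dir() -> Path:
    """Assumed prefetch location: the Prefetch folder under %SystemRoot%."""
    return Path(os.environ.get("SystemRoot", r"C:\Windows")) / "Prefetch"


def clear_directory(directory: Path) -> int:
    """Delete the files directly inside a directory; return how many were removed."""
    removed = 0
    for entry in directory.iterdir():
        if entry.is_file():
            entry.unlink()
            removed += 1
    return removed


def run_idle_tasks() -> None:
    """Run the idle tasks (this includes defrag, so skip it on an SSD)."""
    subprocess.run(IDLE_TASKS_COMMAND, check=True)
```

On an SSD system only the first step would be used, for example `clear_directory(default_prefetch_dir())`.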
So that's about it as far as system setup goes. Well, after installing games and all that good stuff anyway. Right now we are also looking at updating our game suite. On the short list are: Far Cry 2, Crysis Warhead, Fallout 3, S.T.A.L.K.E.R. Clear Sky, Call of Duty World at War, and Brothers in Arms Hell's Highway. While I'm not sure if we will actually be able to incorporate all these games into our next round of graphics card testing, the first games we drop will be ones that are superseded by these new ones: Fallout 3 will replace Oblivion and Crysis Warhead will replace Crysis.
I'd love to be able to test 20 games for every graphics hardware review, but it's just not possible to do that kind of testing under normal circumstances. We will do our best to evaluate games and pick the ones that make the most sense going forward.
Oh, and I can't wait until I can talk more about what is actually in this new graphics test bed. It's pretty freaking sweet :-)
33 Comments
strikeback03 - Wednesday, October 29, 2008 - link
Wouldn't have to be a separate edition, just a control panel option to switch to bare minimum. If the graphics drivers could run in safe mode, that would probably be about perfect.
And why would you need a search function?
Mr Perfect - Tuesday, October 28, 2008 - link
Actually, most of those tweaks can be done on XP too. Performance mode, hidden tray icons, highlighted programs, automatic updates, security center alerts, disabling the welcome screen, system restore, etc, etc...
I actually thought it was kind of funny when he called it de-vistaing. This is usually what I call de-XPing, since you almost end up with Windows 2000 with all that stuff turned off.
strikeback03 - Wednesday, October 29, 2008 - link
There is a welcome screen in XP?
Mr Perfect - Wednesday, October 29, 2008 - link
Yes, that goofy "Click your picture icon to login" screen. If there is only one user account on the PC, I think it skips right over the welcome screen and logs you in automatically. If it does show up, you can turn it off in the control panel to get a proper CTRL+ALT+DEL login box.
4wardtristan - Wednesday, October 29, 2008 - link
u can also press ctrl+alt+delete twice at the welcome screen to get the oldschool login screen :)
Concillian - Tuesday, October 28, 2008 - link
Yeah, for the next week or so I'm still using Windows 2000 on my main gaming machine and have been since 2000. Making the move to Vista and I basically come to the realization that I'm going to end up with something that will be using almost no new features outside of a new sound and window manager theme.
DerekWilson - Tuesday, October 28, 2008 - link
xp 64 is worse than vista by a huge margin. and we're using more than 4GB of RAM in our future test bed.
also, even if there are people out there who don't care about dx10 at all, it is still important to look at dx10 performance to get an understanding of graphics capability and the future of the industry.
Myrandex - Wednesday, October 29, 2008 - link
I <3 my XP64. I don't know what is terrible about it. At least it lets you use h/w sound acceleration :P
I will eventually move to Vista x64 for my gaming PC, but for now I'm thoroughly pleased with XP64 (Phenom 2.6GHz., 4GB DDR2-1066, 4850 1GB, X-Fi, 500GB Raid-0, etc.). I have Vista x64 on my laptop and I use that just for getting stuff done, while I play on the XP64 machine.
Jason
chizow - Tuesday, October 28, 2008 - link
Heh, you really have a hard time hiding your contempt for Vista 64 Derek, but is it justified? You had a chance to put the issue to rest when you promised a Vista 64 vs. XP comparo nearly a year ago, before you guys made the switch to 4GB and Vista 64 earlier this year. But you never got around to it for whatever reason so I guess we're stuck with not-so-subtle jabs at Vista until 7 (or Mojave).
As for benchmarking and testbed methodology, I'd like to see some changes as other sites have done. I know you've said in the past that you will never do frames vs. time graphs, but only a few per review would be invaluable in drawing conclusions that simple FPS averages would not show. Not only would it put to rest any questions about min FPS, it'd also show time spent at various frame rates.
I'd also like to see some CPU/GPU speed scaling differences for the featured part in a review. Again, this would not be feasible for all parts in a review, but if done only for the featured part and one or two games, that would give readers a good indication of how that part scales with slower/faster CPUs and also how it scales with clockspeed. Over time, one would be able to cross-reference and compare featured parts as long as the test bed remained the same.
For example, if you were reviewing GTX 280 SLI, you'd run your EE Nehalem at 2GHz/3GHz/4GHz and then re-run those tests at 550/600/650MHz GPU clock. The information you might glean from such a comparison after comparing to an earlier GTX 280 review might be that a single GTX 280 with a 2GHz CPU isn't much different than GTX 280 SLI, but very different with a 4GHz CPU. Or that a 600MHz GTX 280 isn't much different in performance relative to an OC'd GTX 260 at 650MHz etc.
Lastly, I'd like to see drivers updated more frequently, or at least periodic driver comparisons for a single part from each vendor. I understand you guys need to use archived results to save time on a short deadline, but using launch drivers months after release seems a bit antiquated in current reviews. At least a comparison would show any performance difference between driver versions.
Hrel - Tuesday, October 28, 2008 - link
I'd love to see flight simulator included in your average testing, it's a HUGE niche. As well as 3D Mark scores, how you guys have still not decided to include 3D mark in every test you run is beyond me... but please start using it! Please! It's objective testing, games are subjective, only when using data from both types of information can you create a complete picture of the performance of any tested part, RAM, Motherboard, GPU or CPU it all needs to be looked at subjectively and objectively.