Monday, August 07, 2006

The Dishonesty of Overclocking

Mainstream review sites on the web are very much what magazines used to be. They are places where people go to find things out, although they are far more immediately available than any magazine ever was. Like magazines, they often carry an aura of expertise and credibility. We have had both good and bad magazines in the past, and review sites are similar: the fact that they exist does not guarantee balance, thoroughness, or professionalism. It is easy to find mainstream review sites like Anandtech, XbitLabs, and Tom's Hardware Guide that claim to provide balanced reviews of current processors, a market that has become a two-horse race between the near-monopoly Intel and the underdog AMD.

These reviews show a stark contrast when compared to typical mainstream reviews of things like cars. Reviewers of cars try to show what is offered in terms of convenience, reliability, performance, and cost, just as these vehicles are when they come from the factory. Mainstream processor review sites, on the other hand, commonly show overclocked configurations in their comparisons. This practice is very common, yet it is highly dishonest and unprofessional. When reviewing a car it is never the practice to change timing chips or add nitrous oxide injectors to increase performance. Reviewers never add rooftop luggage carriers or hitch up trailers to increase cargo volume. Reviewers don't make modifications to the transmission or add on new equipment to include with their review. They don't do this for a very simple reason: most buyers don't modify their vehicles and wouldn't be interested. There are people who enjoy tinkering with cars, adding new equipment and making modifications to increase performance. This is the pro-sport area of cars and this is where people go to discuss these topics. Pro-sport is never mixed in with mainstream car reviews.

However, on mainstream computer hardware sites, mixing these together is common. Reviewers will often modify the operation of the processors by increasing clock and front side bus speeds, changing processor voltages and timing settings, and even running processors with oversized cooling rigs. It is very strange that this is so common because it is also highly unethical. If you really are a computer enthusiast there are many sites that cater to overclocking and pushing performance, just as there are for cars. However, people on these sites understand the inherent risks, just as people do when they increase the horsepower on a stock engine. They understand that these chips may become unstable and unreliable. They understand that these chips can burn up just as engines can blow when pushed.

It is possible to push a processor to such extreme clock speeds that it is too unstable to run even a simple benchmark like SuperPi. It is possible for the configuration to be stable enough to run SuperPi at 1M while 32M causes a crash. It is possible to run SuperPi at 32M stably while Prime95 causes a crash. It is possible to run Prime95, yet running an instance on each core of a dual core processor causes a crash. It is possible for a processor to run stably for five minutes but crash after fifteen. It should be remembered that when reviewers claim an overclocked system is stable, it is actually running very close to an unstable configuration. And the only proof of stability is that it hasn't crashed yet. This is not the realm of mainstream computing.
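The escalating checks above can be pictured as a ladder of stress stages. The following is only a minimal sketch: the stage names mirror the examples in the text, but the pass/fail values are hypothetical stand-ins, not measurements, and `first_failing_stage` and `make_stage` are illustrative names rather than part of any real tool.

```python
from typing import Callable, Optional, Sequence, Tuple

# A stage is a label plus a callable that returns True if the run completed.
Stage = Tuple[str, Callable[[], bool]]


def first_failing_stage(stages: Sequence[Stage]) -> Optional[str]:
    """Run the stages in order and return the name of the first one that fails.

    A return value of None only means that no crash was observed in the
    stages that were run. It does not prove the configuration is stable.
    """
    for name, run in stages:
        if not run():
            return name
    return None


def make_stage(name: str, passes: bool) -> Stage:
    # Hypothetical stand-in for launching a real stress test.
    return (name, lambda: passes)


ladder = [
    make_stage("SuperPi 1M", True),
    make_stage("SuperPi 32M", True),
    make_stage("Prime95, one core", True),
    make_stage("Prime95, one instance per core", False),  # assumed failure
    make_stage("Prime95, extended run", True),
]

print(first_failing_stage(ladder))       # the first stage that fails
print(first_failing_stage(ladder[:3]))   # None: no crash seen so far
```

A reviewer who stops after the third stage would report this configuration as stable, which is exactly the problem: the result depends on how far up the ladder the testing went.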

It is very puzzling that reviewers spend time tinkering with processor settings to try to increase performance. Sometimes this is justified by suggesting that they are doing it for computing enthusiasts. However, this doesn't really fit, as genuine computing enthusiasts don't get overclocking tips from mainstream sites; they get them from sites dedicated to this, like Overclockers and XtremeSystems. The vast majority of people who buy computers buy them to use, not to spend days tinkering with them. And the number of people who actually make modifications is tiny, less than 0.1%. It doesn't seem logical that the mainstream sites would be putting in so much effort for such a tiny group that will get its information elsewhere anyway.

Nor does this explain why these sites would engage in something that is so unethical. It is both dishonest and unprofessional on many levels. For example, in 2002 and 2003 the memory controller on the northbridge chips used by Intel was much faster than what was available for AMD's Athlon. However, when AMD introduced the Athlon 64 with an integrated memory controller this changed. AMD was very briefly faster than the Pentium II with the K6 and then faster than the PIII with the K7 Athlon. Intel regained the lead when it introduced the P4. However, even after being upgraded to the Prescott core in early 2004, the P4 still trailed the K8 Athlon 64, which AMD had brought up to 2.2GHz in late 2003. It has only been very recently that Intel has become competitive again with the new Core 2 Duo (Conroe), which is derived from the Core Duo (Yonah), which in turn is derived from the Pentium M (the processor of the Centrino platform).

However, although Conroe is a very good design, its memory speed is still lower than the older K8's. One of the things that is done in overclocking is increasing the front side bus (FSB) speed to make Conroe more competitive here. This is very unethical because once you start modifying the processor beyond its factory settings you are no longer reviewing the chip. It wouldn't be the factory's responsibility if you had problems with memory synchronization or timing, because the chip isn't running within the factory specifications. If the FSB is too slow then this should show up in the review and it should be the factory's job to produce a faster one. However, overclocking gives the illusion that the problem is easily fixed while relieving the factory of any responsibility if your system is unstable, gets memory errors, or even burns up the processor. This is completely the opposite of what review sites should be doing and shows that their first priority is not the site's readers.

If review sites commonly add overclocked data to their reviews knowing that this information is both misleading and even harmful to the site's readers, one certainly has to wonder why they do it. This seems to be part of the general shift in influence away from regular magazines and has to do with advertisers and free products for review. Consider that Intel's marketing budget is around $4 billion per year and AMD's is nearly nonexistent. This means that it is common for Intel hardware to end up in the hands of reviewers. This also means that Intel is expecting some sort of favor in return in the form of a favorable review. If Intel doesn't get a favorable review it can always take its free review hardware elsewhere. This fact has skewed online reviews for many years. At times, when Intel's hardware just could not measure up, the reviewers have used various tactics to make the hardware look better. Sometimes the conclusions saying that the Intel processors were better were in complete contrast to the reviewers' own data.

These shady practices center around benchmarking. Processors are typically compared by using benchmark programs that generate some type of score. However, unlike typical tests for cars, which include things like 0-60 acceleration and stopping distance, these processor benchmarks can vary greatly in quality. Anyone can see and understand stopping distance, but the great majority of people are not familiar with computer programming. As a professional computer programmer I am very much aware that what benchmarks test, how they are written, and how they are compiled all make a big difference in quality. It is also possible to skew results by using different testing procedures and by the choice of benchmarks. Yet review sites often pretend that their practices are straightforward even as they knowingly do things to skew the results.

For example, there are many so-called toy benchmarks like SuperPi and Prime95. These programs are tiny and can fit entirely inside the L2 cache on some processors. This can give these programs a tremendous boost in speed because memory is much slower than cache and, if they fit entirely inside the L2 cache, they don't have to access memory very often. These toy benchmarks don't actually measure real processor performance, because real applications are much larger and would never fit inside the L2 cache. Even so, it is not unusual to see scores from benchmarks that are heavily influenced by cache size. Since Intel typically uses caches that are twice the size of the caches used by AMD, these scores from toy benchmarks only benefit Intel in the comparisons. The real skill of benchmark skewing is to choose benchmark programs that use more cache than AMD has but not more than Intel has. This type of artificial behavior can give Intel processors a large boost in scores.
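The "more cache than AMD has but not more than Intel has" window can be written down as a simple check. This is a deliberately crude sketch: it treats a benchmark as a single working-set size and ignores associativity, prefetching, and everything else a real cache does. The 1 MB and 2 MB figures are assumptions chosen only to illustrate the "twice the size" relationship, and the function names are my own.

```python
def fits_in_cache(working_set_kb: int, cache_kb: int) -> bool:
    """True if the whole working set can sit in the cache (crude model)."""
    return working_set_kb <= cache_kb


def favors_larger_cache(working_set_kb: int,
                        small_cache_kb: int,
                        large_cache_kb: int) -> bool:
    """True when the benchmark spills out of the smaller cache
    but still fits in the larger one."""
    return (not fits_in_cache(working_set_kb, small_cache_kb)
            and fits_in_cache(working_set_kb, large_cache_kb))


# Assumed sizes for illustration only: one cache twice the size of the other.
SMALL_KB, LARGE_KB = 1024, 2048

for working_set_kb in (512, 1536, 4096):
    skewed = favors_larger_cache(working_set_kb, SMALL_KB, LARGE_KB)
    print(f"{working_set_kb:>5} KB working set -> favors larger cache: {skewed}")
```

In this model a benchmark that fits in both caches, or in neither, treats the two processors the same way as far as cache is concerned. Only the middle case rewards cache size alone, and that is the case a skewed benchmark selection would aim for.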

Other benchmarks may need more memory speed. If Intel's processors are slower in memory speed, a reviewer can avoid a bad score either by not including these benchmarks in the review or by using FSB overclocking to make the Intel processors more competitive. Often reviewers justify their overclocked tests by suggesting that they are a preview of a faster processor that Intel will release at some point. This, however, is not a valid argument, since it would be difficult to buy a processor that isn't released yet. In reality the point of including overclocked scores is a sort of advertising. By adding an OC'ed (overclocked) sample they can make sure that an Intel processor is the fastest in all or the majority of the tests. Even if they have gone to extreme measures to push this processor to a faster speed, the fact that Intel gets the top rating is still a good association. Imagine how this would be if car reviewers let the factory give them a test car that had been modified so that it was substantially better than the real car that people could buy. This would essentially be defrauding the public, yet this is what mainstream review sites do every day.

Dual core processors are becoming more common now. AMD had designed the Athlon 64 for dual core from the beginning. Intel's first attempts at dual core, made by simply sticking two chips inside the same package, gave very poor results. Intel is doing much better now with the Core 2 Duo chips. However, there are some differences in architecture and these differences are used to give Intel an advantage during testing. Conroe has a 4MB L2 cache that is shared between the two cores whereas K8 has a separate 1MB L2 cache for each core. One of the latest common ways to skew results is to run only a single benchmark so that the Intel core can use the entire L2 cache. Since AMD's cores have separate caches this only gives an advantage to Intel. This is very dishonest. The whole point of having dual core is to use both. The proper way to test would be to run a program on the second core to load it while the first core is tested. Running the same benchmark on both cores is not a proper test, as this allows the Intel processor to share the benchmark code between the cores and save some space in the cache. In the real world, it is incredibly unlikely that anyone would have the same code in use by both cores at the same time. The program that loads the second core should be different from what is used on the first core to have a proper test. Yet this proper way of testing is still being avoided by review sites, because the Core 2 Duo chips would not only have to share the oversized cache between the two cores but would also have to share memory accesses over the slower FSB. This kind of proper testing could end up erasing all of Conroe's apparent advantage and no review site has risked this publicly. If they have done this kind of testing privately they have not published the results.
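The three test setups above differ in how much L2 the core under test effectively gets on a shared-cache design. Here is a rough sketch of that arithmetic. The 4MB shared and 1MB per-core figures come from the paragraph above; the even split between busy cores and the 256 KB of common code are simplifying assumptions of mine, and a real cache will not divide itself this neatly.

```python
def shared_cache_per_core_kb(total_kb: float,
                             busy_cores: int,
                             common_code_kb: float = 0.0) -> float:
    """Rough L2 available to the core under test on a shared cache.

    Assumes busy cores split the cache evenly, and that code common to
    all busy cores is stored once and is usable by each of them.
    """
    if busy_cores < 1:
        raise ValueError("at least one core must be busy")
    private_share = (total_kb - common_code_kb) / busy_cores
    return common_code_kb + private_share


SHARED_L2_KB = 4096     # shared between both cores (figure from the text)
SEPARATE_L2_KB = 1024   # per core, unaffected by the other core's load
COMMON_CODE_KB = 256    # hypothetical size of code shared by both cores

scenarios = {
    "single benchmark, second core idle":
        shared_cache_per_core_kb(SHARED_L2_KB, busy_cores=1),
    "same benchmark on both cores":
        shared_cache_per_core_kb(SHARED_L2_KB, 2, COMMON_CODE_KB),
    "different programs on each core":
        shared_cache_per_core_kb(SHARED_L2_KB, busy_cores=2),
}

for name, kb in scenarios.items():
    print(f"{name}: shared design ~{kb:.0f} KB, "
          f"separate design {SEPARATE_L2_KB} KB")
```

Under these assumptions the shared design's share per core is largest with the second core idle and smallest when the second core runs a different program, while the separate design's share never changes. The sketch says nothing about FSB contention, which would have to be measured.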

The test skewing is quite common. For example, I saw a recent review where a dual processor Intel Woodcrest system was compared with a dual processor AMD Opteron system. The AMD system, however, only had memory for one of the processors, which severely crippled its performance. Yet the reviewers treated this obvious imbalance as though it were completely normal and fair. This would be like comparing two cars after you had pulled half the spark plug wires off one car's engine. While some low end dual Opteron systems may only have memory for one processor, proper systems are readily available. In fact, the motherboard used was not even certified by AMD. I've seen other tests where the motherboards were not equivalent, or even where Intel prototype boards that were not available, using Intel processors that were also not available, were tested against off the shelf AMD systems. This is of course unfair because they are testing future hardware from Intel against current hardware from AMD. These tests can include specially modified I/O drivers and other tweaks to make the Intel hardware faster. Hardware review sites do this because they want to make money and there is more money available in promoting Intel than AMD. Intel does this because it doesn't want to lose its near monopoly position. Intel has slipped from about 7X AMD's sales to just 5X. Intel's once mighty cash reserves have been cut in half in just the past year. Intel has been losing ground to a much smaller competitor and Intel doesn't like that. Furthermore, a company with an $8 billion marketing budget has the means to try to do something about it by putting a more positive spin on its hardware.

The processor market will only be good for consumers as long as there is competition. The current cost of building fabs to produce chips ($3 billion) is too high for new competitors. It is also expensive to try to put together a good design team. Consequently the number of competing processor companies has declined drastically. There really is no other competitor for Intel beyond AMD, and AMD is only a fraction of Intel's size. As long as AMD is able to compete honestly against Intel we will have good processors from both companies. If, however, Intel is able to use dishonest means to regain the market share that it lost to AMD, it will move back into a monopoly position and have much less reason to deliver quality products. By giving overclocked reviews and using other techniques to skew comparisons in Intel's favor, these review sites do a disservice to their readers both now and in the future.

Processors should be reviewed by mainstream sites at stock speeds with no overclocking. These sites should stop using toy benchmarks which don't really test the processor's abilities. When testing dual core processors the second core should be running a different application so that it is loaded for proper testing. Comparisons between processors should also include the cost of the motherboard, since this can vary between Intel and AMD. Any tests of power consumption should include the entire system rather than just the processor. It remains to be seen whether mainstream review sites will do any of these things or if they will continue following the easy money.
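To see why the motherboard belongs in the comparison, here is a small sketch of a score-per-dollar calculation. Every number in it is a made-up placeholder for two unnamed systems, not a price or benchmark result for any real product, and `score_per_dollar` is just an illustrative helper.

```python
def score_per_dollar(score: float, cpu_price: float,
                     motherboard_price: float = 0.0) -> float:
    """Benchmark score divided by the cost of the parts being counted."""
    return score / (cpu_price + motherboard_price)


# Hypothetical placeholder figures for two unnamed systems.
system_a = {"score": 100.0, "cpu_price": 300.0, "motherboard_price": 200.0}
system_b = {"score": 90.0, "cpu_price": 300.0, "motherboard_price": 100.0}

for name, s in (("A", system_a), ("B", system_b)):
    cpu_only = score_per_dollar(s["score"], s["cpu_price"])
    platform = score_per_dollar(s["score"], s["cpu_price"],
                                s["motherboard_price"])
    print(f"System {name}: CPU only {cpu_only:.3f}, "
          f"CPU + motherboard {platform:.3f}")
```

With these placeholder figures System A looks better when only the processor price is counted, and System B looks better once the motherboard is included. The same reasoning applies to power draw measured at the wall rather than for the processor alone.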