Monday, August 07, 2006

The Dishonesty of Overclocking

Mainstream review sites on the web are very much what magazines used to be. They are places where people go to find things out, but they are much more immediately available than any magazine ever was. Like magazines, they often carry an aura of expertise and credibility. However, we've had both good and bad magazines in the past, and review sites are similar: the fact that they exist does not guarantee balance, thoroughness, or professionalism. It is easy to find mainstream review sites like Anandtech, XbitLabs, and Tom's Hardware Guide that claim to provide balanced reviews of current processors. The processor market has become a two-horse race between the near-monopoly, Intel, and the underdog, AMD.

These reviews stand in stark contrast to typical mainstream reviews of things like cars. Reviewers of cars try to show what is offered in terms of convenience, reliability, performance, and cost, just as these vehicles are when they come from the factory. Mainstream processor review sites, however, commonly show overclocked configurations in their comparisons. This practice is very common, yet it is highly dishonest and unprofessional. When reviewing a car it is never the practice to change timing chips or add nitrous oxide injectors to increase performance. Reviewers never add rooftop luggage carriers or hitch up trailers to increase cargo volume. Reviewers don't make modifications to the transmission or add on new equipment to include with their review. They don't do this for a very simple reason: most buyers don't modify their vehicles and wouldn't be interested. There are people who enjoy tinkering with cars, adding new equipment and making modifications to increase performance. This is the pro-sport area of cars and this is where people go to discuss these topics. Pro-sport is never mixed in with mainstream car reviews.

However, on mainstream computer hardware sites, mixing these together is common. Reviewers will often modify the operation of the processors by increasing clock and front side bus speeds, changing processor voltages and timing settings, and even running processors with oversized cooling rigs. It is very strange that this is so common because it is also highly unethical. If you really are a computer enthusiast there are many sites that cater to overclocking and pushing performance, just as people do with cars. However, people on these sites understand the inherent risks, just as people do when they increase the horsepower on a stock engine. They understand that these chips may become unstable and unreliable. They understand that these chips can burn up just as engines can blow when pushed.

It is possible to push a processor to extreme clock speeds where it is too unstable to run even a simple benchmark like SuperPi. It is possible for the configuration to be stable enough to run SuperPi at 1M while 32M causes a crash. It is possible to run SuperPi at 32M stably while Prime95 causes a crash. It is possible to run Prime95 but have a crash when running an instance on each core of a dual core processor. It is possible for a processor to run stably for five minutes but crash after fifteen. It should be remembered that when reviewers claim that an overclocked system is stable, it is actually running very close to an unstable configuration. And the only proof of stability is that it hasn't crashed yet. This is not the realm of mainstream computing.

It is very puzzling that reviewers spend time tinkering with processor settings to try to increase performance. Sometimes this is justified by suggesting that they are doing it for computing enthusiasts. However, this doesn't really fit, as genuine computing enthusiasts don't get overclocking tips from mainstream sites; they get them from sites dedicated to this, like Overclockers and XtremeSystems. The vast majority of people who buy computers buy them to use, not to spend days tinkering with them. And the number of people who actually make modifications is tiny, less than 0.1%. It doesn't seem logical that the mainstream sites would be putting in so much effort for such a tiny group that will get its information elsewhere anyway.

Nor does this explain why these sites would engage in something that is so unethical. It is both dishonest and unprofessional on many levels. For example, in 2002 and 2003 the memory controller on the northbridge chips used by Intel was much faster than what was available for AMD's Athlon. However, when AMD introduced the Athlon 64 with an integrated memory controller this changed. AMD was very briefly faster than Pentium II with the K6 and then faster than PIII with the K7 Athlon. Intel regained the lead when it introduced the P4. However, even after being upgraded to the Prescott architecture in 2003 the P4 still fell behind once AMD got the K8, Athlon 64, up to 2.2GHz in late 2003. It has only been very recently that Intel has become competitive again with the new Core 2 Duo (Conroe), which is derived from the Core Duo (Yonah), which is in turn derived from the Pentium M (the processor in the Centrino platform).

However, although Conroe is a very good design, its memory speed is still less than the older K8's. One of the things that is done in overclocking is increasing the front side bus (FSB) speed to make it more competitive. This is very unethical because once you start modifying the processor beyond its factory settings you are no longer reviewing the chip. It wouldn't be the factory's responsibility if you have problems with memory synchronization or timing because the chip isn't running within the factory specifications. If the FSB is too slow then this should show up in the review and it should be the factory's job to produce a faster one. However, overclocking gives the illusion that the problem is easily fixed while relieving the factory of any responsibility if your system is unstable, gets memory errors, or even burns up the processor. This is completely the opposite of what review sites should be doing and shows that their first priority is not the site's readers.

If review sites commonly add overclocked data to their reviews knowing that this information is both misleading and even harmful to the site's readers, one certainly has to wonder why they do it. This seems to be part of the general shift in influence away from regular magazines and has to do with advertisers and free products for review. Consider that Intel's marketing budget is around $4 billion per year and AMD's is nearly nonexistent. This means that it is common for Intel hardware to end up in the hands of reviewers. This also means that Intel is expecting some sort of favor in return in the form of a favorable review. If Intel doesn't get a favorable review it can always take its free review hardware elsewhere. This fact has skewed online reviews for many years. At times, when Intel's hardware just could not measure up, the reviewers have used various tactics to make the hardware look better. Sometimes the conclusions saying that the Intel processors were better were in complete contrast to their own review data.

These shady practices center around benchmarking. Processors are typically compared by using benchmark programs that generate some type of score. However, unlike typical tests for cars that include things like 0-60 acceleration and stopping distance, these processor benchmarks can vary greatly in quality. Anyone can see and understand stopping distance, but the great majority of people are not familiar with computer programming. As a professional computer programmer I am very much aware that what benchmarks test, how they are written, and how they are compiled all make a big difference in quality. It is also possible to skew results by using different testing procedures and by the choice of benchmarks. Yet review sites often pretend that their practices are straightforward even as they knowingly do things to skew the results.

For example, there are many so-called toy benchmarks like SuperPi and Prime95. These programs are tiny and can fit entirely inside the L2 cache on some processors. This can give these programs a tremendous boost in speed because memory is much slower and, if they fit entirely inside the L2 cache, they don't have to access memory very often. These toy benchmarks don't actually measure real processor performance because real applications are much larger and would never fit inside the L2 cache. Even so, it is not unusual to see scores from benchmarks that are heavily influenced by cache size. Since Intel typically uses caches that are twice the size of caches used by AMD, these scores from toy benchmarks only benefit Intel in the comparisons. The real skill of benchmark skewing is to choose benchmark programs that use more cache than AMD has but not more than Intel has. This type of artificial behavior can give Intel processors a large boost in scores.
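The "window" described above can be sketched with a toy model in Python. This is only an illustration under stated assumptions: the two cache sizes are hypothetical (only the 2:1 ratio comes from the paragraph above), the function names are made up for this example, and the model ignores L1 caches, associativity, prefetching, and everything else a real processor does.

```python
# Toy model: does a benchmark's working set fit in a processor's L2 cache?
# The sizes below are illustrative assumptions, not measurements.
SMALL_L2_KB = 1024   # hypothetical smaller cache
LARGE_L2_KB = 2048   # hypothetical cache twice that size


def fits_in_cache(working_set_kb, cache_kb):
    """True if the whole working set could stay resident in the cache."""
    return working_set_kb <= cache_kb


def favors_larger_cache(working_set_kb, small_kb=SMALL_L2_KB, large_kb=LARGE_L2_KB):
    """True if the working set fits the larger cache but spills out of the smaller one."""
    return fits_in_cache(working_set_kb, large_kb) and not fits_in_cache(working_set_kb, small_kb)


if __name__ == "__main__":
    for size_kb in (512, 1536, 4096):
        print(size_kb, "KB working set favors the larger cache:",
              favors_larger_cache(size_kb))
```

In this model a 512 KB benchmark fits in both caches and a 4096 KB one fits in neither, so only the middle case separates the two processors. That middle range is the one a skewed benchmark selection would aim for.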

Other benchmarks may need more memory speed. If Intel's processors are slower in memory speed a reviewer can avoid a bad score by either not including these benchmarks in their review or by using FSB overclocking to make the Intel processors more competitive. Often they can justify their overclocked tests by suggesting that they are a preview of a faster processor that Intel will release at some point. This, however, is not a valid argument since it would be difficult to buy a processor that isn't released yet. In reality the point of including overclocked scores is a sort of advertising. By adding an OC'ed (overclocked) sample they can make sure that an Intel processor is the fastest in all or the majority of the tests. Even if they have gone to extreme measures to push this processor to a faster speed, the fact that Intel gets the top rating is still a good association. Imagine how this would be if car reviewers let the factory give them a test car that had been modified so that it was substantially better than the real car that people could buy. This would essentially be defrauding the public, yet this is what mainstream review sites do every day.

Dual core processors are becoming more common now. AMD had designed the Athlon 64 for dual core from the beginning. Intel's first attempts at dual core, made by simply sticking two chips inside the same package, gave very poor results. Intel is doing much better now with the Core 2 Duo chips. However, there are some differences in architecture and these differences are used to give Intel an advantage during testing. Conroe has a 4MB L2 cache that is shared between the two cores whereas K8 has a separate 1MB L2 cache for each core. One of the latest common ways to skew results is to only run a single benchmark so that the Intel core can use the entire L2 cache. Since AMD's cores have separate caches this only gives an advantage to Intel. This is very dishonest. The whole point of having dual core is to use both. The proper way to test would be to run a program on the second core to load it while the first core is tested. Running the same benchmark on both cores is not a proper test as this allows the Intel processor to share the benchmark code between the cores and save some space in the cache. In the real world, it is incredibly unlikely that anyone would have the same code in use by both cores at the same time. The program that loads the second core should be different from what is used on the first core to have a proper test. Yet this proper way of testing is still being avoided by review sites because the Core 2 Duo chips would not only have to share the oversized cache between the two cores but would also have to share memory accesses using the slower FSB. This kind of proper testing could end up erasing all of Conroe's apparent advantage and no review site has risked this publicly. If they have done this kind of testing privately they have not published the results.
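The testing procedure proposed above can be sketched roughly in Python. This is a minimal sketch and not what any review site actually runs: the benchmark and the background load are both hypothetical stand-ins invented for this example, the script does not pin anything to a particular core (the operating system's scheduler decides placement), and interpreted Python says little about cache or FSB behavior. It only shows the shape of the test: time a workload once alone, then again while a different program keeps another core busy.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for "a different application" on the second core:
# a separate Python process spinning on integer arithmetic.
LOAD_SNIPPET = "x = 0\nwhile True:\n    x = (x * 31 + 7) % 1000003\n"


def benchmark(n=200_000):
    """Hypothetical workload under test: time a sum of square roots."""
    start = time.perf_counter()
    total = 0.0
    for i in range(n):
        total += i ** 0.5
    return time.perf_counter() - start


def timed_with_background_load(n=200_000):
    """Time the benchmark while a different program runs in another process."""
    load = subprocess.Popen([sys.executable, "-c", LOAD_SNIPPET])
    try:
        time.sleep(0.2)  # give the load process time to start spinning
        return benchmark(n)
    finally:
        load.kill()      # always stop the background load
        load.wait()


if __name__ == "__main__":
    idle = benchmark()
    loaded = timed_with_background_load()
    print(f"second core idle:   {idle:.4f} s")
    print(f"second core loaded: {loaded:.4f} s")
```

A real version of this test would use a full application as the background load and would pin each program to its own core, which is platform specific and left out here.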

Test skewing is quite common. For example, I saw a recent review where a dual processor Intel Woodcrest system was compared with a dual processor AMD Opteron system. The AMD system, however, only had memory for one of the processors, which severely crippled its performance. Yet the reviewers treated this obvious imbalance as though it were completely normal and fair. This would be like comparing two cars after you had pulled half the spark plug wires off one car's engine. While some low end dual Opteron systems may only have memory for one processor, proper systems are readily available. In fact, the motherboard used was not even certified by AMD. I've seen other tests where the motherboards were not equivalent, or even where they tested Intel prototype boards that were not available, using Intel processors that were also not available, against off the shelf AMD systems. This is of course unfair because they are testing future hardware from Intel against current hardware from AMD. These tests can include specially modified I/O drivers and other tweaks to make the Intel hardware faster. Hardware review sites do this because they want to make money and there is more money available in promoting Intel than AMD. Intel does this because it doesn't want to lose its near monopoly position. Intel has slipped from about 7X AMD's sales to just 5X. Intel's once mighty cash reserves have been cut in half in just the past year. Intel has been losing ground to a much smaller competitor and Intel doesn't like that. Furthermore, a company with an $8 billion marketing budget has the means to try to do something about it by putting a more positive spin on its hardware.

The processor market will only be good for consumers as long as there is competition. The current cost of building fabs to produce chips ($3 billion) is too high for new competitors. It is also expensive to try to put together a good design team. Consequently the number of competing processor companies has declined drastically. There really is no other competitor for Intel beyond AMD. And AMD is only a fraction of Intel's size. As long as AMD is able to compete honestly against Intel we will have good processors from both companies. If, however, Intel is able to use dishonest means to regain the market share that it lost to AMD, it will move back into a monopoly position and have much less reason to deliver quality products. By giving overclocked reviews and using other techniques to skew comparisons in Intel's favor, these review sites do a disservice to their readers both now and in the future.

Processors should be reviewed by mainstream sites at stock speeds with no overclocking. These sites should stop using toy benchmarks which don't really test the processor's abilities. When testing dual core processors the second core should be running a different application so that it is loaded for proper testing. Processor comparisons should also include the cost of the motherboard since this can vary between Intel and AMD. Any tests of power consumption should include the entire system rather than just the processor. It remains to be seen whether mainstream review sites will do any of these things or if they will continue following the easy money.

12 comments:

Pop Catalin Sever said...

This is absolutely true. When I saw that TomsHardware, in their first review, only overclocked the Core 2 Duos, my entire faith in a system which I thought unbreakable was shattered. Almost all of the benchmarks on the net are obviously biased to an eye familiar with hardware capabilities.

One test which I suppose Intel wouldn't want to see its Conroes in is a heavily threaded environment (more than two or three hundred active threads) where all threads work at full load and there is a lot of thread synchronization and data exchange. That might tear Core 2 Duo apart, because it's highly dependent on prefetch and cache hits.

Anonymous said...

I've always enjoyed your posts, man, even if I don't post a comment.
Same goes for AMDzone comments.

Keep up the writing, as people will, in time, find out about this blog and visit it in large numbers.

Regards, Ivan

Ajay S. said...

Hi

I came across a review two days back that checks performance of processors when loaded with two totally different processes,

and as you expected, Athlon64 beats core2 hands down. maybe u'll find the review interesting

http://www.ppcnux.com/modules.php?name=News&file=article&sid=6552

Anonymous said...

Bravo!!! Very well done.

People need to make a stand against some of the things you have pointed out. We need to have a way to regulate benchmarking. The question would be how? How do we spread the message to the common Joe?

Anonymous said...

I love the dark arts, but they should not be talked about except on an extension site for real enthusiasts. I mean it should be on sites like AMDZone and other places where serious enthusiasts roam. I got fried overclocking an X2 3800 939: I got it to 2.5 GHz but I way overvolted it. Overclocking is a lot like tuning car engines: you can raise the redline, but you can pay with a blown engine.

Anonymous said...

Remember BAPCO? Remember the IntelCC scam? The question is not "are they lying?" but "are they capable of honesty?"!!!!!!

Anonymous said...

Good post. Pleasant to read. 100% with you.

I have seen 0% good Woodcrest reviews.
I still don’t know how the dual FSB works, or whether it is already working with Intel's current chipset offerings. My initial thought is that it is very similar to the K7 MP (AMD760MP), but I don’t have data or diagrams that confirm that.
There is a lot of confusion in all the reviews; I can’t get a clear idea from them, and the hardware used is very different.

No one says that, for example, the K8 is still a superior processor design vs Core 2. It seems only performance is important, especially in SuperPi, ... software that doesn’t do a thing and can’t even be used...

Very biased reviews. The Toms Turion X2 review, for example, is totally biased:
http://www.tomshardware.com/2006/08/22/amd_dual_core_laptops_have_arrived/

The review is full of flaws:

1. Lacks at least a game or 3Dmark2001 or any other program that could test the IGP. Not 3Dmark 2006 or Quake 4 where one would do 4 fps and the other 5 fps.

2. The battery tests are completely wrong. One has 80Wh and the other 54Wh; those 26Wh are enough to feed the system for more time and keep it warmer. Also they used a 1.83GHz processor in the battery tests instead of the 2.0GHz processor they were reviewing.

3. The multitasking benchmark scenarios have to be taken with a grain of salt because HDD performance plays a very important role here. If the drives aren’t the same they can’t be compared. Even the Pentium M 780 (2.26GHz) wins in the multitasking scenario; how can that be?

4. The conclusion is not very correct “Therefore, under equal conditions, it can only be regarded as the second choice - if it is worth getting at all..” In some of the tests the AMD notebook wins so I don’t know what the author of the article wants from an AMD mobile processor.

5. Intel mobile still lacks 64 bit, so Turion 64 is a great choice. I didn’t see any mention of the fact that Intel doesn’t have it and AMD does. I bet if the AMD processor lacked SSE3 or even SSE2 we would hear something like this: “The processor showed good performance but the lack of SSE2 and SSE3 makes it difficult to recommend because newer applications will make use of the new instructions making the processor already obsolete.”

Scientia from AMDZone said...

and as you expected, Athlon64 beats core2 hands down. maybe u'll find the review interesting

Sorry, Ajay. The http://www.ppcnux.com/modules.php?name=News&file=article&sid=6552 article is not for Core 2 Duo (Conroe). This is for Core Duo which is Yonah. The benchmarks show that X2 beats Yonah. However, Conroe is ahead of Yonah.

The confusion is understandable since the author uses the term "core duo" and sometimes "core 2".

Anonymous said...

The best way to fairly benchmark multi core chips is to use multithreaded applications designed to operate on SMP/multicore platforms. Running real applications is a lot less ambiguous than running synthetic benchmarks.

For example, running a Winrar 3.60 (multithreaded) benchmark gives far better indication of which processor is better for lossless compression than any number of cache-tweaked synthetic tests. The same principle applies to other applications: running a few million polys through mental ray is a far better measure of rendering performance than any number of SuperPI tests.

Unfortunately, not everyone chooses their processor on the basis of wanting to optimize performance for specific compute-intensive tasks. While benchmarking WinRAR or mental ray is easy, benchmarking performance for general usage under a mixed load is far harder.

Anonymous said...

AFAIK, the MacPro, tested in this review, is a Xeon Woodcrest based computer, hence a Core 2 Duo.

Scientia from AMDZone said...

Okay, I looked at the review again more closely and I would have to say now that my earlier comment was incorrect and that there is indeed a Core 2 Duo in the review.

This is the closest they get to using the term Core 2 Duo:

I found almost no drop in performance (1-5%) for the CoreDuo and the CoreDuo2 for the mid sized data sets (2MB)

However, it is clear that they are trying to draw a distinction from Yonah (Core Duo). It appears to me now that the iMac CoreDuo (2.0 Ghz) is Yonah and the MacPro (2.0 GHz) is Woodcrest. The integer comparison shows the difference:

MacPro, 2.0 GHz, IA32/AMD64, 4020 - 5560
iMac CoreDuo, 2.0 GHz, IA32, 2890 - 3200

Yes, I had overlooked that CoreDuo is listed as IA32, which would be consistent with Yonah (which is 32 bit only), and that the MacPro is listed as IA32/AMD64, which could not be Yonah.

It takes a bit of thinking to see how they set up the chart because Pentium 4 is listed as IA32 even though it is a Prescott core (which can do 64 bits). It appears that they skipped 64 bit mode because it wasn't in the existing database.

Then they list Athlon 64 as both IA32 and AMD64. But, apparently this just shows Athlon 64 in both 32 bit and 64 bit modes. However, this point is clarified lower down where there is an explicit 32 and 64 bit comparison between Core Duo2 and Athlon 64. Obviously, this could not be Yonah because it isn't 64 bit capable. This does show that Woodcrest only gets a 10% boost in integer performance with 64 bits versus a 30% boost for Athlon 64.

Finally, the point that was made earlier was that the Athlon 64 scores are higher than the Core 2 Duo scores and this does indeed appear to be the case. Core2 would then refer to Woodcrest when they say:

Looking at the results the Core2 can narrow the gap to the Athlons, but the AThlons stay fastest.

Unknown said...

Oh it's very true alright. Think about it. All the sites kept on comparing the Phenom II chips to the i7s. The Phenom II's were never intended to compete with i7s. They were intended to compete with Core 2 Quads and the review sites were VERY careful not to do that. They had this BS reasoning that "Oh it's best vs best" and I told them they were morons.