Friday, September 22, 2006

Tom's Hardware Guide Sells Its Soul

The professionalism and objectivity of a website is reflected in its general tone. If you find that a website tends to treat various manufacturers equally then it is reasonable to assume that their testing is equal as well. However, if a website always seems to have a positive outlook for one company regardless of their actual products then that website is probably also putting the same positive spin on testing.

It is always a surprise to think that a once respected source has changed. I used to think very highly of Compute! magazine. However, Compute! was bought by ABC publishing and their reviews changed to reflect their desire for advertising revenues. This was made very clear when Compute! reviewed a new word processing and layout application called "Outrageous Pages". Their review was positive and everything seemed to be fine. However, Info magazine reviewed the same software and had a completely different view. They said the software was too slow and hard to work with and they couldn't recommend it. The truth of which magazine was being honest was revealed when the software release was canceled. Somehow, Compute! had stopped being a good, objective source of information and had lost its credibility. Unfortunately, today, this is also true of Tom's Hardware Guide.

The differences are not hard to see. There was no sign of bias in Tom's review of the Athlon in 1999. New Athlon Processor

There is no sign of bias in Tom's comments about the Intel 1.13 GHz PIII:

The very worst thing in terms of prestige damage happened back in Spring 2000, when AMD was the first x86-processor maker to introduce a CPU that runs at 1 GHz = 1000 MHz clock speed. Big Chipzilla countered with the release of the Pentium III at 1 GHz two days later, but this CPU was so unavailable that not even the press was equipped with any samples. Today, some four months later, the Giga-Pentium III is still hardly available anywhere

While the normal users out there might not know about this, people in the hardware reviewing scene are well aware of the fact that AMD has shipped their 1.1 GHz Thunderbird samples to publications already weeks ago, while Intel was just able to get the rare samples of the Pentium III 1.133 GHz to the reviewers in the second half of last week. AMD is planning to launch their Thunderbird-Athlon 1.1 GHz in late August, giving us the chance to review the sample with ample time. Intel however shipped out their samples in the last minute, which proves who of the two companies is really able to actually produce 'Beyond-Giga-Processors' right now.

Apparently, Tom was so fed up with "releases" from Intel with no chips available that he called the review, Intel's Next Paper Release: The Pentium III at 1133 MHz

The contrast today is obvious. For the last three years Intel has been doing paper releases while AMD's chips have been available on the day of release. Tom's criticism of Intel's paper releases, however, has vanished.

Another THG flaw is positional marketing. The concept of association is well known in marketing and surely this concept is known to the people at THG. It is interesting that they always seem to make sure that an Intel chip is on top regardless of what is being compared. For example, consider this review of a Celeron in 2002: The New Generation

The presence of the Athlon XP 1600, 1800, and 1900 is reasonable since by this time AMD had dropped the Duron line and these were comparable in price to the Celerons. To try to make Intel look better, however, THG first cheats by overclocking the Celeron from 2.0 GHz to 3.0 GHz. When this still isn't enough to move the Celeron to the top of the chart, THG cheats again by including the Pentium 4 2.26 GHz. The presence of the P4 in this chart makes no sense because it is much more expensive than the other chips. The only reason it was put into the chart seems to be to prevent AMD from having the top spot. By using both the severely overclocked Celeron and the Pentium 4, THG ensured that Intel had the top spot in most of the tests. This distracts from the real purpose of comparing a standard Celeron at 1.7, 1.8, and 2.0 GHz with the comparable Athlon XP 1600, 1800, and 1900 from AMD. The ethical way of handling overclocking is to do a separate article. But the bottom line is that if you have chips from AMD and Intel, either both have to be overclocked or both have to run at stock clocks so that you get a genuine comparison. Having only one overclocked creates a false association.

We know that THG is perfectly capable of following these ethical guidelines when it wants to as it does in these two examples:

This is a proper overclocking comparison: only Intel's chips are shown.
Pentium D

This is a proper competitive comparison: no chips are overclocked.
Athlon 64 FX

However, THG cannot seem to stay on an ethical track. Here an overclocked P4EE 955 is compared with AMD chips, December 2005. Extreme Edition

The question at this point is whether or not THG shows a general bias and lack of professional reviews after December 2005.

Here we have a good review of the AMD X2 in May 2005. X2 is compared with the AMD 4000+ and Intel 840 and 660 processors. All are stock speeds. AMD X2

Then in January 2006 we have the FX-60 review with another confusing mass of 28 different processors including OC's against FX-60. For example, in the DivX test the OC's do manage to steal the top position from FX-60. FX 60

The Pentium D 900 review contains no OC's but is another jumble of 22 different processors, January 2006. Pentium D 900

This review of Core Duo is much better. It compares with a Pentium M and a Turion, January 2006. Core Duo

The AM2 review is straightforward, comparing AM2 with 939. Socket AM2

The P4 EE 3.73 review is proper because it overclocks both; however, it is another confusing jumble of 27 processors. Extreme Edition 965

Then we get to the Core 2 Duo review and THG's ethics plunge again. So, we have overclocked Core 2 Duos up against stock AMD chips. This isn't as badly cluttered as the previous articles. However, the article would make a lot more sense if we dropped the OC's and eliminated all of the lower clocked comparison processors. We don't really need the 4800+, 840, and FX-60 because there are a 5000+, 960, and FX-62. Core 2 Duo

It is clear that THG is cheating and knows that it is cheating because it never puts OC'ed AMD chips up against stock Intel chips in its reviews but frequently puts OC'ed Intel chips up against stock AMD chips. There are some troubling lapses in technical knowledge such as:

There is the issue of memory coherency, but e.g. the Opteron is smart enough to deal with it at up to four processors.

It really seems that nearly three years after Opteron's release Mr Schmid should know that Opterons can handle 8-way.

In spite of THG's problems with technical aspects and its obvious bias toward Intel, there are many people who will suggest that the testing done by THG can still be relied on. So, let's look at the actual tests. Let's look again at the recent overclocking comparison between the X6800 and FX-62: Overclocked X6800 and FX-62. In this review, for a change, THG puts the overclocks in a separate article as it should. This suggests that THG will give an honest and fair comparison. However, let's look in detail.

I'm not so sure that the top clock for AMD's FX-62 chip is fair. THG claimed they could only reach 3048 MHz with an HTT of 254 MHz whereas Neoseeker FX-62 says: "I was quite pleased by reaching 3.1GHz air cooled; and the 345MHz HT speed was very impressive as well."
Given THG's bias, this certainly raises the question of whether they put as much effort into overclocking the AMD chip.

The memory speed is also not so clear. The Intel memory is clocked to 555 MHz whereas the AMD memory is only clocked to 508 MHz. It is not clear why THG didn't use a divider of 11 instead of 12 and clock the AMD memory to 554 MHz (nearly identical to Intel's). Even with the handicap of slower DIMMs, AMD still manages to outdo Intel:
This increases memory throughput from 9.2 GB/s to 10.7 GB/s - a noticeable improvement over what Intel can deliver. This is where the built-in memory controller really pays big dividends for AMD.
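The divider arithmetic above can be checked with a short sketch. It assumes that the memory clock is the CPU clock divided by an integer divider and that the quoted DDR2 figure is twice that clock. The 12x multiplier is inferred from 3048 / 254, and the function name is made up for illustration.

```python
def ddr2_rate_mhz(cpu_clock_mhz: float, divider: int) -> float:
    """Effective DDR2 rate, assuming memory clock = CPU clock / divider."""
    memory_clock_mhz = cpu_clock_mhz / divider
    return 2 * memory_clock_mhz  # DDR transfers data twice per clock


# 254 MHz HTT with an inferred 12x multiplier gives THG's 3048 MHz.
cpu_clock_mhz = 254 * 12

for divider in (12, 11):
    rate = ddr2_rate_mhz(cpu_clock_mhz, divider)
    print(f"divider {divider}: about {rate:.0f} MHz")
```

Under these assumptions a divider of 12 gives the 508 MHz that THG used, and a divider of 11 gives roughly 554 MHz.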

In the temperature and load tests it is not stated what THG did to load the systems. I've found that the term "under high load" can vary quite a bit from tester to tester. Since tests have been done that reduced the power draw of higher drawing systems, we need to know the actual procedure to give it any credibility.

Now, we'll look at the benchmarks themselves. Benchmarks are only useful if they show a significant spread among processors of varying speeds, and if faster processors are always faster than slower processors of the same model. Benchmarks can also be affected by large cache but this is not always easy to detect.
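The second criterion, that a faster processor of the same model should never score lower than a slower one, can be sketched as a simple check. The chip names and scores below are hypothetical placeholders, not THG's data, and the function assumes that a higher score is better.

```python
from itertools import combinations


def clock_inversions(results):
    """Return (slower, faster) name pairs where the slower chip scores higher.

    results: list of (name, clock_ghz, score) for chips of the same design.
    Assumes a higher score is better.
    """
    by_clock = sorted(results, key=lambda r: r[1])
    inversions = []
    for slow, fast in combinations(by_clock, 2):
        if slow[1] < fast[1] and slow[2] > fast[2]:
            inversions.append((slow[0], fast[0]))
    return inversions


# Hypothetical scores for three otherwise identical chips.
sample = [
    ("chip A", 2.4, 61.0),
    ("chip B", 2.6, 59.5),
    ("chip C", 2.8, 66.0),
]
print(clock_inversions(sample))
```

For benchmarks that report a time, where lower is better, the score comparison would have to be reversed.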

Call of Duty 2 - We see an anomaly where a 4800+ beats an FX-60. Both are dual core, both are socket 939, and both have 2 x 1MB L2 cache. However, the 4800+ is clocked at 2.4 GHz while the FX-60 is clocked at 2.6 GHz. The article doesn't say what cores these two chips are. However, just one month earlier in another review both were listed as Toledo cores, so presumably they are the same here. Since an identical but lower clocked chip cannot truly be faster, we have to conclude that this is a symptom of a faulty benchmark, improper testing, or sloppy test records. Any of these would invalidate the tests; however, if the rest of the data is good we can assume that it is not the benchmark itself. We can also see that the score spread is not proportionate for either AMD or Intel. This makes the benchmark itself faulty regardless of procedure.

Quake 4 – same anomaly. Disproportionate spread for AMD and Intel. This benchmark is faulty.

Unreal Tournament 2004 – same anomaly. The scores also show other anomalies on the midrange AMD scores. The top and bottom scores are good and the scores for Intel are good. Therefore, this is indicative of sloppy testing but the benchmark is good.

Serious Sam – same anomaly. Disproportionate spread for AMD and Intel. This benchmark is faulty.
Fear min. – same anomaly. Disproportionate spread for AMD and Intel. This benchmark is faulty.
Fear av. - same anomaly. Disproportionate spread for AMD and Intel. This benchmark is faulty.

Xvid 1.1.0 – same anomaly.
Divx 6.22 – the data looks good.
Main Concept H.264 Encoder – the data looks good.
Windows Media Encoder – the data looks good.
Pinnacle Studio DV to Mpeg2 – Disproportionate spread for AMD and Intel. This benchmark is faulty.

Premiere Pro 2.0 – same anomaly
Clone DVD 2.8 - the data looks good.
Lame 3.97 - the data looks good.
Ogg Vorbis - the data looks good.
Windows Media Encoder 9 - the data looks good.
iTunes - the data looks good.
WinRar 3.60 – same anomaly
Photoshop CS 2 rendering 5 pictures – the middle AMD scores are anomalous. However, the highest and lowest scores appear to be good so the anomalies are probably due to sloppy testing.

Photoshop CS 2 converting 150 pictures - the data looks good.
3D Studio Max - the data looks good.
MS Word 2003 pdf – same anomaly
MS PowerPoint pdf – same anomaly
AVG anti Virus - the data looks good.
Multitasking 1 - same anomaly. However, the use of AVG is also improper as this software is severely I/O bandwidth restricted and will not properly load the second core. This benchmark was poorly designed and is not useful.

Multitasking 2 – Given that there are anomalies for both Intel and AMD processors this benchmark appears to have been poorly designed and executed. This benchmark is not useful.

Sandra arithmetic ALU – the data looks good.
Sandra arithmetic MLOPs – the Pentium EE 965 score has a 17% anomalous increase. This could be due to the additional memory bandwidth. The rest of the scores seem good.
Sandra Multimedia Integer – the C2D scores are amazingly high.
Sandra Multimedia FP – the data looks good.

It is not clear why the Multimedia Integer scores are so high. It can't be due to the 4 instruction issue or it would have appeared in the ALU test. It can't be due to the faster SSE because it's an Integer test. This really only leaves the faster cache bus speed as an explanation. Unfortunately, if this benchmark is faster because of the cache bus then the benchmark is useless. Also, real benchmarks show clustering of scores for similar benchmarks. The other benchmarks for MP3, MPEG, and DVD conversion all show similar patterns for C2D. The Multimedia Integer benchmark, however, has to be considered faulty.

We'll skip the PC and 3D Mark tests.

The conclusions are not accurate. If we take the test data at face value then an increase of 16.8% for the X6800 would actually be very poor compared to the FX's 7.2%. If the data were correct then the FX would be showing 100% scaling while the X6800 would only be showing 67%. Even if the data were correct, the analysis is atrocious. For example, XviD is listed as a 20.3% increase when in reality the increase is nearly 25%. However, some of the benchmark scores for things like Call of Duty are faulty and should not be used. If you drop the bad benchmarks then the X6800 will be at least 99% scaling.
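The scaling percentages can be read as performance gain divided by clock gain. That definition is an assumption about how the figures were derived, and the function names are illustrative. Under that reading, the quoted numbers imply the following clock increases:

```python
def scaling(perf_gain_pct: float, clock_gain_pct: float) -> float:
    """Fraction of the clock increase that shows up as performance."""
    return perf_gain_pct / clock_gain_pct


def implied_clock_gain(perf_gain_pct: float, scaling_fraction: float) -> float:
    """Clock increase (in percent) implied by a gain and a scaling figure."""
    return perf_gain_pct / scaling_fraction


# Figures quoted in the text: 7.2% at 100% scaling, 16.8% at 67% scaling.
for chip, gain, frac in (("FX-62", 7.2, 1.00), ("X6800", 16.8, 0.67)):
    clock_gain = implied_clock_gain(gain, frac)
    print(f"{chip}: implied clock increase of about {clock_gain:.1f}%")
```

On this reading the FX-62 figure corresponds to a clock increase of about 7.2% and the X6800 figure to one of about 25%.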

Tom's Hardware Guide used to be a website with quality and integrity. Today, the testing is sloppy, the conclusions may not even match their own data, and they use benchmarks that are clearly faulty. Not all benchmarks are good and the benchmarks themselves would have to be tested to determine their quality. However, when THG is willing to use clearly faulty benchmarks we can have no confidence that they've done any testing on the benchmarks to sort which are good and which are not. Finally, THG's clear bias toward Intel greatly reduces the credibility of the website. Tom's Hardware Guide today is merely a shadow of the ethical, high-quality website it used to be.


Anonymous said...

usual intel My blog

Anonymous said...

Great post Scientia.

I do agree with you on this one. Tom's hardware is becoming intel's marketing machine these days and I really try to avoid reading their articles/reviews since they tend to mislead people.

Keep up the good work.


Pop Catalin Sever said...

I've always known or felt this about TH but never had the resources to do factual research to prove my point, since I do something else on a daily basis.

It's a shame to look at what is happening now with TH, and even worse is the fact that they will get away with it without being punished by the readers. (Or will they ...)

All I want to say is: Nice going TH, from being praised for fairness to being criticized for unfairness. It seems they swapped their master, which used to be the consumer, for the provider of 'material'.