Monday, July 14, 2008

Reviews And Fairness Or How To Make Intel Look Good

I've had people complain that I've been too tough on Anand but in all honesty Anandtech is not the only website playing fast and loose with reviews.

Anand has made a lot of mistakes lately that he has had to correct. But, aside from mistakes Anand clearly favors Intel. This is not hard to see. Go to the Anandtech home page and there on the left just below the word "Galleries" is a hot button to the Intel Resource Center. Go to IT Computing and the button is still there. Click on the CPU/Chipset tab at the top and not only is the button still there on the left but a new quick tab to the Intel Resource Center has been added under the section title right next to All CPU & Chipset Articles. Click the Motherboards tab and the button disappears but there is the quick tab under the section title. There are no buttons or quick tabs for AMD. In fact there are no quick tabs to any company except Intel. Clearly Intel enjoys a favored status at Anandtech.

What we've seen since C2D was released is a general shift in benchmarks that favor Intel. In other words instead of shooting at the same target we have reviewers looking to see where Intel's arrows strike and then painting a bullseye that includes as many of them as possible. For example, encryption used to be used as a benchmark but AMD did too well on this so it was dropped and replaced with something more favorable to Intel. There has been a similar process for several other benchmarks. Of course now it isn't just processors. Reviewers have carefully avoided comparing the performance of high end video cards on AMD and Intel processors. Reviews are typically done only on high end Intel quad cores. The claim is that this is for fairness but it also avoids showing any advantage that AMD might have due to HyperTransport. It is a subtle but definite difference where review sites avoid testing power draw and graphic performance with just Integrated Graphics where AMD would have an advantage. They then test performance with only a mid range graphic card to avoid any FSB congestion which again might give AMD an advantage. Then high end graphics cards are tested on Intel platforms only which avoids showing any problems that might be particular to Intel. We are also now hearing about speed tests being done with AMD's Cool and Quiet turned on which by itself is good for a 5% hit. I suppose reviewers could try to argue that this is a stock configuration but these are the same reviewers who tout overclocking performance. So, by shifting the configuration with each test they carefully avoid showing any of Intel's weaknesses. This is actually quite clever in terms of deception.

As you can imagine the most fervent supporters of this system are those like Anand Lal Shimpi who strongly prefer Intel over AMD. I had one such Intel fan insist that Intel will show the world just how great it is when Nehalem is released. However, I have a counter prediction. I'm going to predict that we will see another round of benchmark shuffling when Nehalem is released. And, I believe we will see a concerted effort to not only make Nehalem look good versus AMD's Shanghai but also to make Nehalem look good compared to Intel's current Penryn processor. It would be a disaster for reviewers to compare Nehalem and conclude that no one should buy it because Penryn is still faster . . . so that isn't going to happen.

An example is that since AMD uses separate memory areas for each processor it needs an OS and applications that work with NUMA. In the past reviewers have run OS's and benchmarks alike oblivious to whether they worked with NUMA or not. If anything seems to be overly slow they just chalk it up to AMD's smaller size, lack of money, fewer engineers, etc. Nehalem however also has separate memory areas and needs NUMA as well. I predict that these reviewers will suddenly become very sensitive to whether or not a given benchmark is NUMA compatible and will be quick to dismiss any benchmark that isn't. This may extend so far as to purposefully run NUMA on Penryn to reduce its performance. This would easily be explained away as a necessary shift while ignoring that it wasn't done for K8 or Barcelona. That would be explained away as well by saying that the market wasn't ready for it yet when K8 was launched. That was what happened with 64 bit code which was mostly ignored. However, if Intel had made the shift to 64 bits reviewers would have fallen all over themselves to do 64 bit reviews and proclaim AMD as out of date just as they did every time Intel launched a new version of SSE.

We see this today with single threaded code. C2D and Penryn work great with single threaded code but have less of an advantage with multi-threaded code and no actual advantage with mixed code. It is a quirk of Intel's architecture that sharing is much more efficient when the same code is run multiple times. If you compared multi-tasking by running a different benchmark on each core Intel would lose its sharing advantage and have to deal with more L2 cache thrashing. Even though mixed code tests would be closer to what people actually do with processors reviewers avoid this type of testing like the plague. The last thing they want to do is have AMD match Intel in performance under heavy load or worse still actually have AMD beat a higher clocked Penryn. But Nehalem uses HyperThreading to get its performance so I predict that reviewers will suddenly decide that single threaded code (as they prefer today) is old fashioned and out of date and not so important after all. They will decide that the market is now ready for multi-threaded code (because Intel needs it of course).

Cache tuning is another issue. P4EE had a large cache as did Conroe. C2D doubled the amount of cache that Yonah used and Penryns have even more. However, reviewers carefully avoid the question of whether or not processors benefit from cache. This is because benchmark improvements due to cache tend to be paper improvements that don't show up on real application code. So, it is best to avoid comparing processors of different cache sizes to see whether benchmarks are getting artificial boosts from cache. I did have one person try to defend this by claiming that programmers would of course write code to match the cache size. That might sound good to the average person but I've been a programmer for more than 25 years. Try and guess what would happen on a real system if you ran several applications that were all tuned to use the whole cache. Disastrous is the word that comes to mind. But you can avoid this on paper by never doing mixed testing. A more realistic test for a quad core processor is to run something like Folding@Home on one core and graphic encoding on another while using the remaining two to run the operating system and perhaps a game. Since the tests have to be repeatable you can't run Folding@Home as a benchmark but that isn't a problem since it is the type of processing that needs to be simulated rather than the specific code. For example you could probably run two different levels of Prime95 tests in the background while running a game benchmark on the other two cores to have repeatable results. And, if you do run a game benchmark on all four cores then for heaven's sake use a high end graphics card like a 9800X2 instead of an outdated 8800.
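To make the "tuned to use the whole cache" problem concrete, here is a minimal sketch. It assumes a toy fully associative cache with LRU replacement, which is not the policy of any particular Intel or AMD processor, and the names (`lru_hit_rate`, `sweep`, `interleave`) and sizes are made up for illustration. One application sweeping a working set exactly as large as the cache does very well alone; interleave a second application tuned the same way and the hit rate collapses.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, cache_lines):
    """Return the hit rate of a toy fully associative LRU cache."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for addr in accesses:
        total += 1
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / total if total else 0.0


def sweep(app_id, working_set, passes):
    """Yield (app_id, line) addresses for repeated sequential sweeps."""
    for _ in range(passes):
        for line in range(working_set):
            yield (app_id, line)


def interleave(*streams):
    """Alternate accesses between applications, one access each in turn."""
    for group in zip(*streams):
        yield from group


CACHE_LINES = 64   # illustrative size, not a real cache geometry
PASSES = 10

# One application whose working set exactly fills the cache.
alone = lru_hit_rate(sweep("A", CACHE_LINES, PASSES), CACHE_LINES)

# Two applications, each tuned to fill the whole cache, running together.
shared = lru_hit_rate(
    interleave(sweep("A", CACHE_LINES, PASSES), sweep("B", CACHE_LINES, PASSES)),
    CACHE_LINES,
)

print(f"alone:  {alone:.2f}")
print(f"shared: {shared:.2f}")
```

Real caches are set associative and real applications do not sweep memory this regularly, so the actual penalty would be less extreme. The point is only that a benchmark run alone cannot show this effect at all.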

Cache will be an issue for Nehalem because it not only has less than Penryn but it has less than Shanghai as well. It also loses most of its fast L2 in favor of much slower L3. My guess is that if any benchmarks are faster on Penryn due to unrealistic cache tuning they will be quickly dropped. That reviews shift with the Intel winds is not hard to see. Tom's Hardware Guide went out of its way to "prove" that AMD's higher memory bandwidth wasn't an advantage and that Kentsfield's four cores were not bottlenecked by memory. But now that Nehalem has 3 memory channels the advantage of more memory bandwidth is mentioned in every preview. We'll get the same thing when Intel's Quick Path is compared with AMD's HyperTransport. Reviewers will be quick to point to raw bandwidth and claim that Intel has much more. They of course will never mention that the standard was derated from the last generation of PCI-e and that in practice you won't get more bandwidth than you would with HyperTransport 3.0.

I could be wrong; maybe review sites won't shift benchmarks when Nehalem appears. Maybe they will stop giving Intel an advantage. I won't hold my breath though.

Addition:

We can see this where Ars Technica discovers a PCMark 2005 error. Strangely the memory score is faster when PCMark thinks the processor is an Intel than when it thinks it is an AMD. Clearly the bias is entirely within the software since the processor is the same in all three tests.



I've had Intel fans claim that it doesn't matter if Anandtech cheats in Intel's favor because X-BitLabs cheats in AMD's favor. Yet, here is the X-Bit New Wolfdale Processor Stepping: Core 2 Duo E8600 Review from July 28, 2008. In the overclocking section it says:

At this voltage setting our CPU worked stably at 4.57GHz frequency. It passed a one-hour OCCT stability test as well as Prime95 in Small FFTs mode. During this stress-testing maximum processor temperature didn’t exceed 80ºC according to the readings from built-in thermal diodes.

This sounds good but the problem is that to properly test you have to run two separate copies of Prime95 with core affinity set so that one copy runs on each core. This article doesn't really say that they did that. There is a second problem as well dealing with both stability and power draw testing:

We measured the system power consumption in three states. Besides our standard measurements at CPU’s default speeds in idle mode and with maximum CPU utilization created by Prime95

This is actually wrong; the Prime95 test they performed was not maximum power draw; it was Prime95 in Small FFTs mode. That doesn't agree with Prime95 itself, which clearly states in the Torture Test options:

In-place large FFTs (maximum heat, power consumption, some RAM test)

So, the Intel processors were not tested properly. Using the small FFTs does not test either maximum power draw or temperature and therefore doesn't really test stability. If any cheating is taking place at X-Bit, it seems to be in Intel's favor.
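For anyone who wants to try the one-copy-per-core approach described above, here is a rough sketch of how the pinned runs could be set up on Linux. It assumes the `taskset` utility from util-linux is installed. The stress command is a placeholder (`./stress_tool --torture` is hypothetical, not Prime95's real command line), so substitute whatever invocation your stress tool documents. The function name `build_pinned_commands` is also just for illustration.

```python
def build_pinned_commands(cores, stress_cmd):
    """Build one command line per core, each pinned with `taskset -c`.

    cores      -- iterable of core numbers, e.g. [0, 1] for a dual core
    stress_cmd -- the stress tool invocation as a list of arguments
                  (placeholder here; use your tool's documented command)
    """
    return [["taskset", "-c", str(core), *stress_cmd] for core in cores]


# Hypothetical stress command; NOT the real Prime95 CLI.
commands = build_pinned_commands([0, 1], ["./stress_tool", "--torture"])

for cmd in commands:
    print(" ".join(cmd))
```

Each resulting list can be handed to `subprocess.Popen` so that both copies run at the same time, one per core, instead of relying on the scheduler to spread a single copy around.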

22 comments:

Scientia from AMDZone said...

Amdposer

"Xbitlabs has a nice clickable POWERED BY AMD OPTERON poster on their website."

True and you could make this argument if they only used AMD equipment. But notice what platform they use to test graphics cards.

Testbed

Intel Core 2 Extreme X6800 processor (3.0GHz, FSB 1333MHz x 9)

Scientia from AMDZone said...

Décío Luiz Gazzoni Filho

"ever heard of advertising? You seriously think Anandtech links to an `Intel Resource Center' out of the kindness of their hearts?"

You are actually making a case against Anandtech. If they depend on advertising from Intel then they aren't likely to want to upset Intel with a bad review.

Lem said...

Thanks for the articles Sci, always a good read. It's nice to see Intel's and AMD's CPU architectures converging. It will make it infinitely harder to perform benchmarketing.

Pedantic correction.. It's Intel QuickPath, not QuickPort (towards the end of your article). :)

Décío Luiz Gazzoni Filho said...

That's why any serious ad-supported business imposes a strict separation between the advertising and the journalistic departments. Their readership makes or breaks them (that's what the advertisers are buying, eyes to look at their ads), and if you lie to your readers, you'll soon find yourself in the gutter.

Now if you're in the hardware business, you'll attract advertisers in the hardware business -- Intel, Nvidia, AMD, etc. If Anandtech refused their money because it might conflict with their (Anandtech's) interests, then they'd starve, because who do you expect will want to advertise with them? Ford? Walgreens?

Or would you rather have everyone resort to `punch the monkey' and `download 1 million new smileys' generic ads?

Christian M. Howell said...

Of course the benchmarks will change. I always hated it when only one CPU was used to test a GPU. Anand uses a 3.2GHz Penryn which probably .5% of users actually have, so no users can see how new GPUs will behave with their CPUs.

I recently came across WPrime, which uses both cores to calculate Pi. AMD is MUCH closer there.

Another thing I see is that AMD chips are tested on the slowest mobos. Most people use the low cost MSI K9A2, while a review from Sharky uses an ASUS MVP which consistently scores higher.

With GPUs I want to see scaling with clockspeed since I never buy the highest clocked chip.

There was a review linked from The Inq where someone tested with SLI and the highest clocked AMD and Intel chips. The difference in platform price was around $1200 but the Intel system was only about 10-15% faster in high res games.

Also, OEMs are just as guilty as they are not demanding the ZM-86 chips for Puma, which at 2.4GHz will be a nice desktop replacement and generate up to $2000 with XFire.

But then Sony sells laptops for over $1500 with Intel IGPs and now MS is embroiled in a class action suit over non-Aero supporting graphics. Someone should be suing the companies who attached the stickers, not MS. But then it's like Intel is like some god or something. Sun, IBM and AMD have more capable chips. nVidia has the graphics, Asus has the mobos.


Intel is just a bully that got lucky. People who follow them are sheep.

Scientia from AMDZone said...

Décío Luiz Gazzoni Filho

"That's why any serious ad-supported business imposes a strict separation between the advertising and the journalistic departments."

I remember when they banned smoking in the student union at ISU. One student wrote an editorial where he said that there was no reason to ban smoking because the smoking areas were "well separated" from the non-smoking areas. The student union was one large room that held about 1,100 people. The separation consisted of having tables with ashtrays right next to tables without ashtrays. I guess I look at your claims of separation the same way.

"and if you lie to your readers, you'll soon find yourself in the gutter."

So, are you saying that people stopped buying Intel processors when they cheated on the Bapco benchmarks?

Scientia from AMDZone said...

enumae

"And prior to the release of Conroe reviews were done on AMD's best platform... Go look at Anandtech and Sharky Extreme, as did many others."

That is quite true but it is also quite irrelevant. The focus today has shifted to platforms rather than just CPU's. Back when ATI was a separate company there was a more diverse mix of systems and the only distinct platform was Intel's Centrino which had no effect on desktop testing.

"if you can't find reviews testing how you want test to be run, do them yourself and be proactive."

I may very well have to.

enumae said...

When you take out the "Reviews are typically done only on high end Intel quad cores", the context of my post is lost.

Scientia from AMDZone said...

Morly

Yes, after your correction I took out the mention of die size in the article. Places like TechReport were saying that Nehalem's die would be 20-30% larger. And, I had forgotten about Hans De Vries' die photos with more accurate die size estimates which indicate they are the same size.

george said...

Dirk is new AMD's CEO. AT LAST!!!

Scientia from AMDZone said...

orly

"wont post this but you'll still read it at the very least."

No, I probably won't quote it or respond to it.

"I find it rather hilarious that you write an article (http://scientiasblog.blogspot.com/2008/03/45nm-interesting-convergence-in-design.html about talking about the die sizes and caches of nehalem and shanghai"

Yes, that was from March; I've written several since then.

"and then talk about how shanghai has some huge advantage in terms of die size. Funnier still you correct it and link to a techport news article which links to fudzilla!"

TechReport was an example but there are others. Unfortunately if you do a Google search the incorrect ones are listed at the top while the correct one (that I also linked to) is listed much further down.

"Well I'll give you some credit for manning up and actually correcting this."

Why wouldn't I correct an error?

"BTW I'm still curious if you can show this change in direction of benchmarks since you know, you're making the claims here and umm, not backing them up with anything."

Yes, you mentioned this earlier and I've been trying to decide if I need to do another article. It seems like a lot to just put in comments. And, much like your opening statement I find the shift hilarious.

Scientia from AMDZone said...

ho ho

PCIe 1.0 and 2.0 use 8/10 encoding. This makes the actual data transmission bit speed 20% less than the transfer speed.

PCIe 1.0 - 2.5 GT/sec; 2 Gb/sec data
PCIe 2.0 - 5 GT/sec; 4 Gb/sec data

In order to get 8 Gb/sec data with PCIe 3.0 using the existing scheme Intel would have needed 10GT/sec. However, they couldn't get that much speed so they removed the 8/10 encoding.

PCIe 3.0 - 8GT/sec; less than 8Gb/sec data

The only way that you could argue that PCIe 3.0 really is 8 Gb/sec is if you are under the delusion that the 8/10 encoding was completely unnecessary on the two previous versions. In reality, the faster PCIe 3.0 needs the encoding more.
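The arithmetic above is easy to check. This is just a sketch of the per-lane calculation, with function names (`data_rate_gbps`, `required_transfer_rate`) that I made up for illustration. The payload and line bit counts are parameters, so any other encoding ratio can be plugged in the same way.

```python
from fractions import Fraction


def data_rate_gbps(transfer_gt_s, payload_bits=8, line_bits=10):
    """Usable data rate per lane (Gb/sec) for a given transfer rate (GT/sec).

    With 8/10 encoding every 10 bits on the wire carry 8 bits of data.
    """
    return Fraction(transfer_gt_s) * payload_bits / line_bits


def required_transfer_rate(data_gbps, payload_bits=8, line_bits=10):
    """Transfer rate (GT/sec) needed to deliver a given data rate (Gb/sec)."""
    return Fraction(data_gbps) * line_bits / payload_bits


print("PCIe 1.0:", float(data_rate_gbps("2.5")), "Gb/sec data")
print("PCIe 2.0:", float(data_rate_gbps(5)), "Gb/sec data")
print("8 Gb/sec with 8/10 encoding needs",
      float(required_transfer_rate(8)), "GT/sec")
```

Exact fractions are used only to avoid floating point rounding in the comparison; the numbers are the same ones listed above.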

Pop Catalin Sever said...

Well it seems H.R. is leaving AMD. I think he rightfully should. The new CEO will be Dirk Meyer, a technology oriented CEO, not Marketing/Brand oriented like Hector Ruiz was. I always felt that AMD is and should be a technology company, and not a manufacturing company.

The recent history showed that in IT brands are the least important of almost all markets, and the technology in itself is the one that matters the most. In IT brand recognition doesn't extend itself over whole unrelated/new product lines of the same company, and only matter as much as the technology behind it is worth.

I hope the new AMD CEO will put a greater emphasis on technology from now on, and will bring AMD to the level of innovation it once had. Intel must be challenged by AMD, and AMD must score some winning points if it wants to survive the continuous pressure and reactivity of Intel.

AMD can win its own supporters regardless of the existence of some more or less biased reviews on the net. All it needs to do is deliver.

nebojsa said...

This story should be interesting to follow:

http://www.anandtech.com/weblog/showpost.aspx?i=471

A little off topic, but it's about the new SB700 south bridge on AMD fx790.

Scientia from AMDZone said...

ho ho

Instead of quoting a snippet from ExtremeTech why didn't you go to the original source? PCI-SIG PCIe3.0 FAQ:

These studies showed, for instance, that power increased by a quadratic factor at 10GT/s.

This easily refutes your attempts to claim that 10GT with 8/10 encoding wasn't desirable. As indicated above it was impossible. So, PCI-SIG removed 8/10 encoding to try to fit the same bandwidth into 8GT. 3.0 still uses scrambling but it was already in use in 1.0 and 2.0. And, scrambling makes the signal worse.

scrambling introduces more DC wander than an encoding scheme such as 8b/10b; therefore, the Rx circuit must either tolerate the DC wander as margin degradation or implement a DC wander correction capability.

And causes other problems:

At the protocol layer, an encoding scheme such as 8b/10b provides out-of-band control characters that are used to identify the start and end of packets. Without an encoding scheme (i.e. scrambling only) no such characters exist, so an alternative means of delineating the start and end of packets is required.

From PCI-SIG's own documentation it is clear that PCIe 3.0 was created because they were backed into a corner, not because it was a good standard. If they can make the circuitry sophisticated enough to overcome the problems caused by dropping 8/10 encoding then it is possible that 3.0 might get close to its rated bandwidth. However, it is much more likely that implementations will be degraded from 8Gb/sec.

Christian M. Howell said...

You might be interested in the tests that Ars Technica did with Nano, Atom and Turion. They showed that FutureMark seemingly has three code paths for memory tests (at least). One runs when an Intel CPUID is detected, another runs when an AMD CPUID is detected and a third runs when it's neither (in the case of Nano). When they switch the Nano CPUID for AMD (Via doesn't lock it) it improves 10%. It improves 50% when the Intel CPUID is used.

This kind of thing really makes you wonder if Intel's main reason for "innovating" with SIMD is just to give them an optimization advantage. This could actually mean that AMD is ALWAYS hamstrung in SW, so that Intel up those dollars for whatever.

A shame.
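To show how this kind of result can come about, here is a purely illustrative sketch of vendor-string dispatch versus feature-based dispatch. It is not FutureMark's code, and the path names and function names are invented. The only real values are the CPUID vendor strings themselves: "GenuineIntel", "AuthenticAMD" and "CentaurHauls" (what VIA chips report).

```python
# Hypothetical code path names keyed on the CPUID vendor string.
VENDOR_PATHS = {
    "GenuineIntel": "intel_path",
    "AuthenticAMD": "amd_path",
}


def select_by_vendor(vendor_id):
    """Pick a code path from the vendor string alone (the biased approach)."""
    return VENDOR_PATHS.get(vendor_id, "generic_path")


def select_by_features(features):
    """Pick a code path from what the chip reports it can actually do."""
    if "sse2" in features:
        return "sse2_path"
    return "generic_path"


# The same physical chip with the same capabilities, reporting three
# different vendor strings (as in the Nano test where the CPUID was changed).
chip_features = {"sse", "sse2"}
vendor_strings = ["CentaurHauls", "AuthenticAMD", "GenuineIntel"]

by_vendor = {v: select_by_vendor(v) for v in vendor_strings}
by_features = {v: select_by_features(chip_features) for v in vendor_strings}

print(by_vendor)
print(by_features)
```

With vendor dispatch the same hardware ends up on three different paths, so any difference in score comes from the software. With feature dispatch the vendor string makes no difference.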

nebojsa said...

http://www.anandtech.com/weblog/showpost.aspx?i=471
About this: still no benchmarks, and I was looking forward to reading about HT and CPU speeds above 2.5GHz. I was under the impression that Phenom is very slow compared to Core 2 Duo or Core 2 Quad, but looking at the comparison on the Tom's Hardware pages I must say that for the things I work with (3D Studio, Maya) it is a formidable opponent for the price and technology.

nebojsa said...

Something to read about on a hot summer:
http://www.tomshardware.co.uk/forum/252504-10-deneb-previewed

Scientia from AMDZone said...

ho ho

You and other Intel fans keep claiming that I'm some type of petty tyrant. However, I had nothing to do with your permanent ban on AMDZone; that was Ghost, the site Administrator.

Juliez and MatrixBaron were clearly trolling and their bans were an improvement but I'm not sure you are in the same category. Do you want me to see if I can get you unbanned?

Scientia from AMDZone said...

orly

"So, how about that review trend article?"

I didn't write a second article because I added several items on to this article.

And, stop trying to hijack an unrelated thread.

Ho Ho said...

"Do you want me to see if I can get you unbanned?"

What good would it do? Whatever I say there I get labeled as troll no matter what I do. Remember a few months ago when I was talking about multi-socket scaling and when I said in nearly every post that AMD clearly scales better I still got labeled as troll and got banned soon after?

Amdzone is nothing but amd fanboy camp. Unfortunately there are very few meaningful discussions there, most of what I see are overly optimistic stuff with nothing backing it up. My main reason why I visit it is to get my daily dose of laughter. Some of the claims done there are just so far out of this world.

Scientia from AMDZone said...

orly

I understand that you don't feel that I've proven that Anandtech is biased.

I will probably revisit this topic later when I have more information and can say whether the bias is in Anandtech or in the benchmarks or whether the comparisons are valid. There are certainly indications of bias both in testing methods and the benchmarks themselves but perhaps not enough to skew the comparisons overall.

The only problem is that I can't do any testing on my own without getting an AMD quad core system. So, if I buy an Intel quad the question of bias will be moot.