Friday, June 20, 2008

The Value Of Benchmarks

Chico Marx had a great line in Duck Soup where he asked, "Who are you going to believe, me or your own eyes?" Apparently Intel is now asking the same question.

Intel has on its website some graphs purporting to show "breakthrough performance and energy efficiency" for the Intel Xeon 7300 series in virtualization. These are the vConsolidate benchmarks, one of which uses VMware. Intel's bar at 2.01 towers over the AMD bar at just 1.08. The problem with this comparison is that reality tends to get in the way. First, Intel is comparing four Quad-Core Intel Xeon X7350 2.93GHz processors against four AMD Dual-Core Opteron 8222SE 3.0GHz processors. Perhaps the fact that Intel is using twice as many cores (16 versus 8) explains why its score is nearly twice as high. Secondly, where did this vConsolidate benchmark come from? According to Intel:

vConsolidate is a benchmark developed by Intel Corporation to measure Server Consolidation performance.
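The core-count objection raised above can be checked with quick arithmetic. The sketch below divides each vConsolidate score by the number of cores in the system, using only the figures quoted from Intel's graph. It assumes, purely for illustration, that the score scales linearly with core count, which neither Intel nor this post claims, so treat it as a rough sanity check rather than a real normalization. The names `systems` and `score_per_core` are made up for this example.

```python
# Scores and configurations as quoted above from Intel's vConsolidate graph.
systems = {
    "4x Xeon X7350 (quad-core)": {"score": 2.01, "sockets": 4, "cores_per_socket": 4},
    "4x Opteron 8222SE (dual-core)": {"score": 1.08, "sockets": 4, "cores_per_socket": 2},
}


def score_per_core(entry):
    """Divide the benchmark score by the total core count.

    Assumption: linear scaling with cores, which is only a crude approximation.
    """
    total_cores = entry["sockets"] * entry["cores_per_socket"]
    return entry["score"] / total_cores


per_core = {name: score_per_core(entry) for name, entry in systems.items()}

for name, value in per_core.items():
    print(f"{name}: {value:.4f} per core")
```

Under that crude assumption the 16-core Xeon system's lead disappears on a per-core basis, which is consistent with the point that the graph mostly reflects the difference in core count.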

So we are supposed to trust that Intel didn't massage its own benchmark a bit to favor its own processors? Right. Oddly enough, there is already a benchmark, VMmark, from the same people who make VMware, the very software Intel claims to be testing. The problem for Intel is that:

VMmark software is agnostic towards individual hardware platforms and virtualization software systems so that users can get an objective measurement of virtualization performance.

The last thing Intel wants is an objective measurement of virtualization performance when the VMmark results show:

Dell 4x Quad-Core AMD Opteron 2.5GHz 8360 SE R905 - 14.17
Dell 4x Quad-Core Intel Xeon 2.93GHz X7350 R900 - 12.23

With 15% less clock speed the AMD system scores 16% higher.


There are also the SPEC results listed by Heise Online which show:

Dell 4x Quad Opteron 2.5GHz 8360 SE R905 - SPECint_rate2006: 167
Bull 4x Quad Xeon 2.93GHz X7350 R480E1 - SPECint_rate2006: 177

AMD is 6% slower in SPECint_rate with a 15% slower clock.


Dell 4x Quad Opteron 2.5GHz 8360 SE R905 - SPECfp_rate2006: 152
Bull 4x Quad Xeon 2.93GHz X7350 R480E1 - SPECfp_rate2006: 108

AMD is 41% faster in SPECfp_rate with a 15% slower clock.


Not all of the server benchmarks are bad for Intel though. In SAP SD, Intel and AMD are much closer:

HP ProLiant BL685c G5, 4 CPUs/16 cores/16 threads, Quad-Core AMD Opteron 8356, 2.3 GHz: 3,524 SD, SAPS: 17,650

HP ProLiant BL680c G5, 4 CPUs/16 cores/16 threads, Quad-Core Intel Xeon E7340, 2.4 GHz: 3,500 SD, SAPS: 17,550

HP ProLiant DL580 G5, 4 CPUs/16 cores/16 threads, Quad-Core Intel Xeon X7350, 2.93 GHz: 3,705 SD, SAPS: 18,530

With a 4% higher clock speed Intel essentially ties AMD, and with a 27% higher clock speed it is 5% faster.


In SPECjbb2005, meanwhile, Intel wins on the strength of its higher clock speed:

HP ProLiant DL585 G5, 4 Opteron 2.3 GHz 8356s 4 × 4: 368,543

Sun Fire X4450, 4 Xeon 2.93 GHz X7350s 4 × 4: 464,355

With a 27% higher clock speed, Intel is 26% faster.
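For anyone who wants to check the percentages quoted for VMmark, SPEC, SAP SD, and SPECjbb2005, here is a minimal sketch that recomputes them from the scores and clock speeds listed above. Nothing here comes from the benchmark tools themselves; `pct_diff`, `comparisons`, and `results` are hypothetical names chosen for this example, and each row simply compares system "a" against system "b".

```python
def pct_diff(a, b):
    """Percent by which a exceeds b (negative when a is lower)."""
    return (a / b - 1.0) * 100.0


# (benchmark, a_label, a_clock_GHz, a_score, b_label, b_clock_GHz, b_score)
# All figures are copied from the results quoted in the post.
comparisons = [
    ("VMmark", "Opteron 8360 SE", 2.5, 14.17, "Xeon X7350", 2.93, 12.23),
    ("SPECint_rate2006", "Opteron 8360 SE", 2.5, 167, "Xeon X7350", 2.93, 177),
    ("SPECfp_rate2006", "Opteron 8360 SE", 2.5, 152, "Xeon X7350", 2.93, 108),
    ("SAP SD (E7340)", "Xeon E7340", 2.4, 3524 - 24, "Opteron 8356", 2.3, 3524),
    ("SAP SD (X7350)", "Xeon X7350", 2.93, 3705, "Opteron 8356", 2.3, 3524),
    ("SPECjbb2005", "Xeon X7350", 2.93, 464355, "Opteron 8356", 2.3, 368543),
]

# Map each benchmark to (clock difference %, score difference %), rounded.
results = {}
for name, a_label, a_clk, a_score, b_label, b_clk, b_score in comparisons:
    clock = round(pct_diff(a_clk, b_clk))
    score = round(pct_diff(a_score, b_score))
    results[name] = (clock, score)
    print(f"{name}: {a_label} vs {b_label}: clock {clock:+d}%, score {score:+d}%")
```

The SAP SD row for the E7340 writes the Intel score as `3524 - 24` only to make the 24-user gap to the Opteron visible; it is the same 3,500 SD figure listed above.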


So, if Intel had more integrity, they would show the benchmarks where they legitimately win, like SPECint_rate, SPECjbb, and SAP SD, instead of creating their own skewed benchmarks. I'm sure Intel enthusiasts will leap in to say that the only reason Barcelona does so well is that each of the four processors has its own IMC, while Tigerton uses a quad FSB northbridge and has to share the same memory. Interestingly, when I brought up this same point 20 months ago, in the October 2006 article "Tigerton or Kittenton?", many Intel enthusiasts said I didn't know what I was talking about and that memory bandwidth would not be an issue because the quad FSB Caneland chipset would fix everything. I guess I can't be wrong all the time.

Intel proponents are correct to point out that Nehalem will solve this problem and finally deliver real 4-way performance to Intel. The problem is that this won't happen anytime soon. Today, Intel is stuck with Tigerton, and later this year they will introduce the hex core Dunnington, which will just make the memory bottlenecks worse. We won't see a 4-way version of Nehalem for more than a year, not until late 2009.

And, although Nehalem's robust triple channel memory controller has been touted many times, the truth is that it isn't needed yet. I've already seen people suggesting that Nehalem's triple channel IMC will increase your gaming performance. Don't hold your breath. The truth is that dropping the FSB and external northbridge does greatly reduce latency. However, in terms of actual bandwidth, DDR3 should be fine with just two channels up to hex core. It really isn't until you move up to octal core that triple channel memory begins to shine. Intel already has this with Nehalem, so they are ready for late 2009/early 2010, whereas AMD is going to have to finally get the much anticipated G3MX technology out the door to avoid its own bandwidth issues when it goes above hex core in the same time frame.

7 comments:

Polonium210 said...

I think Intel should give up on benchmarking and instead rely on an endorsement from Robert Mugabe. Their credibility would improve!

enumae said...

Scientia
The last thing Intel wants is an objective measurement of virtualization performance...


Well you should have done more digging, Intel didn't run the test.

Principled Technologies


There are also the SPEC tests done by third party Heise Online which show:

Heise Online did not conduct any SPEC test. Hence the disclaimer at the bottom of the scores...

"All values from www.spec.org, various operating systems and compilers"

Scientia from AMDZone said...

enumae

"Well you should have done more digging, Intel didn't run the test."

I can't see why it would be important who ran the tests. As I mentioned in the article: the vConsolidate benchmark was created by Intel so it will undoubtedly favor Intel hardware. Intel is avoiding the VMmark benchmark created by the maker of VMware, presumably because it is not skewed in Intel's favor.

BTW, I would have the same criticism for AMD if they created their own benchmark and it gave results counter to 3rd parties.

"Heise Online did not conduct any SPEC test."

My mistake, corrected to "spec results listed by Heise". The hardware doesn't seem close enough for a TPC-C comparison but I think I'll add the SAP-SD scores.

Scientia from AMDZone said...

I added SPECjbb as well.

Scientia from AMDZone said...

13ringinheat

"I must have missed the article by you about AMD skewing benchmarks on their site showing the X2 series beating all core 2 duos by a huge margin then...."

I don't recall a series of benchmarks where AMD claimed that K8 X2 beat all Core 2's by a huge margin. Is this something you can link to or do we just have to take your word for it? The only benchmark I recall that would be anything like you describe was a projected SPEC benchmark for Barcelona.

And, if you'll recall I brought up the fact that the benchmark was simulated 3 months before George Ou and Kubicki started railing about it. Oddly enough though it turned out that AMD's estimate was pretty close for the case they were describing which was multi-socket systems. People may have assumed that the results applied to single socket as well but that was never a claim made by AMD.

"accept who you are rooting for. Its obvious to everyone but you."

I'm sure the "everyone" that you refer to is actually just yourself and the small handful of posters in your clique. I'll tell you what; why don't you try common sense for a change instead of bandwagon reasoning?

If I were truly biased in AMD's favor then why would I have pointed out that AMD canceled the Technology Analyst Day that they typically have in June? It is obvious to me that AMD is retreating into a more defensive posture or perhaps just stalling for time. It is clear to me that AMD would have done the Analyst Day in June (just as they have for the past 3 years) if they had something really good to show.

Secondly, if I am biased in AMD's favor then why did I include the SPECint_rate scores where Intel is in the lead? Why did I add both the SAP SD benchmark scores where Intel is ahead and the SPECjbb scores where Intel is in the lead? Presumably if your assumption had any validity I would have left these out and only given the scores where AMD was in the lead.

In case that point was too subtle for you I'll say it again. If Intel wants to brag about honest server scores they can proudly promote the SPECint_rate where they are in the lead, the SAP SD where they are in the lead, and the SPECjbb where they are also in the lead.

But instead of being honest Intel made up their own benchmark and even ignored benchmarks by the maker of the software that they claim to be testing. That is sad indeed and sadder still that you are defending them.

There are areas where things look positive for Intel but there isn't enough for a whole article. The ES samples of Nehalem look promising for example and Intel picked up share in HPC. On the other hand, I haven't seen enough proper testing of Nehalem and the vast majority of Intel's HPC gains were small systems, on average about 1/3rd the size of AMD's. Also, AMD has similar positive items that I haven't written about like discrete graphics that are steadily improving and rumors of a special clocking function on the newest southbridge. All of these items true or false should be resolved in another six months.

Finally, how did you miss the fact that I mentioned that Intel presumably already has 3 channel memory access with Nehalem whereas AMD has yet to demonstrate anything with G3MX?

Apparently if I don't state things explicitly you and yours just invent things and attribute them to me. AMD is behind in a number of areas and having to catch up. This includes high end desktop, quad core single and 2-way servers, mobile, and discrete graphics versus nVidia. In terms of clocks AMD's current top end is only 2.5GHz versus Intel's 3.2GHz (2.93GHz for 4-way).

Pop Catalin Sever said...

AMD is about to become the most powerful platform company in the desktop space, all they need is new more powerful and competitive CPUs (dual and quad core). The RV7-- series of cards are by far the best mainstream cards at the moment and I expect that AMD will keep the situation for some time to come.

AMD dual cores are reasonably OK in the lower segment and lower mainstream segment, but the quad cores really need an improvement to compete with Intel's quad cores, and I hope the improvements will come with 45nm and with tighter integration with ATI R&D efforts.

What ATI has managed to do for the RV7-- series of chips is simply amazing, and I mean the optimization of performance per square mm of die, among other things. I'm hoping that AMD's CPU division will benefit from ATI's achievements also, so my hopes are high for future CPUs.

Woof Woof said...

Guess enumae didn't dig too deeply either...

From the pdf:
"Intel commissioned Principled Technologies to measure performance WITH INTEL'S vCONSOLIDATE workload"

and later:
"Intel defined, implemented and supplied the vConsolidate workload."

I guess Principled just had to push the button.