Thursday, June 05, 2008

Nehalem Appears But Anandtech Chokes While Tooting Intel's Horn

Hopefully Anand is telling the truth when he says that the Nehalem preview was done without Intel's knowledge, because Anandtech fumbled badly with this introduction. Clearly Anand is not someone to trust to carry your fine crystal.

June 5, 2008

Taken at face value the above scores show an impressive 21% increase in speed for Nehalem.

April 23, 2008

However, note that while the Q6600 score goes up 98 points from March to April the score for Phenom 9750 goes down 54 points.

March 27, 2008

And even more strange is that the score for Q9450 was 980 points higher back in February. If this number is accurate then Nehalem's increase is reduced to 10%.

February 4, 2008

However, even a 10% cumulative gain would be very nice on top of the gains we've seen for C2D and Penryn. Unfortunately, the single threaded scores are not quite as impressive.

June 5, 2008

This would be an increase of only 3% for Nehalem.

February 4, 2008

However, the score from February was 366 points higher. If this score is correct then Nehalem would be 9% slower.

Anand is now claiming that the reduction in speed is due to a Vista update. We can check this easily enough and see if the drop in speed is consistent.

3Dsmax 9 - 21% slower
Cinebench XCPU - 9% slower
Cinebench 1CPU - 11% slower
Pov-Ray 3.7 - 13% faster
DivX - 8% faster [version change from 5.13 to 5.03]

Well, this isn't exactly consistent. 3Dsmax shows twice the slowdown of Cinebench while Pov-Ray gets faster. DivX gets faster too, although this is less definitive since Anand shifts to a slightly older version.
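As a rough check of the consistency argument, the five changes listed above can be summarized in a few lines of Python. This is only a sketch: the percentages are the rounded figures from the list, and the helper names `spread` and `same_direction` are made up for illustration.

```python
from statistics import mean

# Percent change in the Q9450 score between the older and newer reviews,
# copied from the list above (negative = slower, positive = faster).
changes = {
    "3Dsmax 9": -21,
    "Cinebench XCPU": -9,
    "Cinebench 1CPU": -11,
    "Pov-Ray 3.7": 13,
    "DivX": 8,
}

def spread(values):
    """Gap between the largest and smallest change, in percentage points."""
    return max(values) - min(values)

def same_direction(values):
    """True only if every benchmark moved the same way (all slower or all faster)."""
    return all(v < 0 for v in values) or all(v > 0 for v in values)

values = list(changes.values())
print("mean change (%):", mean(values))
print("spread (points):", spread(values))
print("all moved the same way:", same_direction(values))
```

If a single OS update were the whole story, one might expect the scores to at least move in the same direction. Here they move both ways and span 34 percentage points.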

Conclusions (or perhaps Confusions):

Nehalem is 21% faster when using all threads, or . . . it is only 10% faster.

Nehalem is 3% faster with single threads, or . . . it is 9% slower.


I am looking forward to seeing some proper testing of Nehalem. I'm sure most are anxious to see Nehalem compared with AMD's 45nm Shanghai. I have no doubt that Nehalem is much stronger in multi-threading than C2D. I believe that Intel did this with the goal of stopping AMD from taking server share. However, this is a double edged sword since gains for Nehalem will surely be losses for Itanium.

It remains to be seen if Nehalem fares better in benchmarking than Penryn. Penryn benefited from code that used lots of cache and was predictable enough for prefetching to work. Nehalem, however, has far less cache but is also much less sensitive to prediction because of the reduced memory latency. Nehalem is also much more like Barcelona in that L3 has much higher latency than L2 making the bulk of Nehalem's cache much slower than it was with Penryn. One would imagine that Nehalem's preference in program structure would be much closer to Barcelona's preference than Penryn's has been. Comparisons later this year should be interesting.

14 comments:

Khorgano said...

Anand hasn't explicitly stated it, but it's possible that on these new Q9450 tests that are "slower" he retested the system in a single channel memory configuration since that is what the Nehalem system was limited to in testing due to the immaturity of the platform. Or if not, we'll likely see a bit of a boost in the Nehalem side when fixed.

Also, I agree that we should wait and see, but there could be any number of reasons for the discrepancy: different OSes, motherboards, RAM capacity/frequency/timings, etc.

I wouldn't accuse Anand of being an Intel paid-pumper, although he can get a little over-zealous in his reporting and acted the same way during the Athlon 64 heyday.

Ho Ho said...

scientia
"However, note that while the Q6600 score goes up 98 points from March to April the score for Phenom 9750 goes down 54 points."

Do you happen to know the PC configurations of these tests? If not then there are no meaningful conclusions that can be made. You as a knowledgeable man should know this.


"We can check this easily enough and see if the drop in speed is consistent."

Why would it have to be consistent? You do know that different programs have different bottlenecks, don't you?


"Nehalem is also much more like Barcelona in that L3 has much higher latency than L2 making the bulk of Nehalem's cache much slower than it was with Penryn."

I would say Nehalem's L3 is a pretty good thing in reducing average latency. It has far, far lower latency than K10 (where L3 is only a bit faster than regular memory read) and it doesn't have to waste BW/latency to synchronize with data in the other die cache in mcm configuration.

Ho Ho said...

Forgot to add ...

"Unfortunately, the single threaded scores are not quite as impressive."

Quoting yourself, "why should anyone test single-threaded performance on quadcore processor?".

Jeach! said...

Very thorough and interesting analysis Scientia!

In regards to:

However, this is a double edged sword since gains for Nehalem will surely be losses for Itanium.

Do we have volume, revenue or earnings numbers directly related to Itanium?

If Nehalem cuts into Itanium's market, would the sales be significantly reduced to the point of affecting their outlook?

Also, we know that Intel had massive amounts of cache. How much smaller will the Nehalem dies be without all this cache? If they are smaller, this should increase Intel's margins.

Does Intel plan on making native Nehalem quads? If so, we could expect lower yields (and a hit on their margins) and a 3 core processor!

Mark S. said...

Thanks, Sci! I was getting tired of reading all of the tech blogs wetting themselves over the numbers in Anand's "unauthorized" preview.

Seems more likely that it's an Intel sponsored astroturfing campaign than an independent snatch-n-grab preview.

And just the thought of Anand fudging benchmark numbers over time? Shocking!

Cheers!

Scientia from AMDZone said...

Khorgano

"I wouldn't accuse Anand of being an Intel paid-pumper,"

I never said he was.

" although he can get a little over-zealous in his reporting and acted the same way during the Athlon 64 heyday."

This is not true at all. In late 2003 Anand blasted AMD for being late with Athlon 64. In his first review he also said that it was overpriced. In fact, he was so negative that he contradicted himself several times. Even in 2005 Anand was still finding things to nit-pick about AMD even when X2 was clearly ahead.

Scientia from AMDZone said...

ho ho

"Do you happen to know the PC configurations of these tests? If not then there are no meaningful conclusions that can be made. You as a knowledgeable man should know this."

If the results go up and down like that then what possible conclusions can you draw?

When the Wright brothers realized that Lilienthal's lift/drag data tables were wrong they built their own wind tunnel and did their own testing. They worked very hard until they were able to come up with repeatable results. Apparently Anand's standards are somewhat lower.

"Why would it have to be consistent? You do know that different programs have different bottlenecks, do you?"

Are you even listening to yourself? If you can have a 34% variance in score just by doing an upgrade on the OS then what is your margin of error, +/- 50%? What meaningful conclusions can you draw from this amount of variance?

"I would say Nehalem's L3 is a pretty good thing in reducing average latency."

You are making up things again. L3 is of course lower latency than main memory. Another strawman of yours.

"It has far, far lower latency than K10"

39 cycles versus 43 cycles? If you are using the 2.0GHz figure then you need to remember that you are comparing Intel's unreleased speed to AMD's current speed. I'm sure the 2.0GHz NB speed will come up with Shanghai. If we did (for some odd reason) use this unfair comparison, Nehalem would have 32% less latency.

" (where L3 is only a bit faster than regular memory read)"

Let's see if this comment makes any more sense than the last. According to Anandtech Phenom's L3 latency is 21.5ns versus 46.9ns for Nehalem's memory latency. So, according to you:

32% less for Intel is "far, far lower"

BUT

54% less for AMD is "only a bit"

Does this make any sense at all?
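The two percentages can be reproduced with a short Python sketch. Two assumptions are not stated in this thread and are labeled in the code: that the 43 cycle figure is counted at the 2.0GHz clock mentioned above (which gives the quoted 21.5ns), and that the previewed Nehalem ran at 2.66GHz (which is what makes 39 cycles come out roughly 32% lower). The function names are illustrative only.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds at the given clock."""
    return cycles / clock_ghz

def percent_less(value, reference):
    """How much smaller `value` is than `reference`, as a percentage."""
    return (1 - value / reference) * 100

# ASSUMPTION: clock of the previewed Nehalem part; not given in this thread.
NEHALEM_CLOCK_GHZ = 2.66
# ASSUMPTION: the 43 cycles are counted at the 2.0GHz figure discussed above.
K10_CLOCK_GHZ = 2.0

nehalem_l3_ns = cycles_to_ns(39, NEHALEM_CLOCK_GHZ)
k10_l3_ns = cycles_to_ns(43, K10_CLOCK_GHZ)      # 21.5ns, as quoted
nehalem_memory_ns = 46.9                         # figure quoted above

# Nehalem L3 versus K10 L3
print(round(percent_less(nehalem_l3_ns, k10_l3_ns)))      # 32
# K10 L3 versus the quoted memory latency
print(round(percent_less(k10_l3_ns, nehalem_memory_ns)))  # 54
```

Note that the second figure compares Phenom's L3 against Nehalem's memory latency, as in the comment above, so it mixes two platforms.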

"Quoting yourself, "why should anyone test single-threaded performance on quadcore processor?".

My feelings have not changed: I still say that it is nonsense to judge a quad core processor by single threaded performance. However, you are making my point quite well: the very same Intel enthusiasts who vehemently insisted that single threaded benchmarks were more important (when they scored higher with C2D) will now flip-flop and insist that multi-threaded is far more important (when they score higher with Nehalem).

Scientia from AMDZone said...

Jeach!

"Do we have volume, revenue or earnings numbers directly related to Itanium?

If Nehalem cuts into Itanium's market, would the sales be significantly reduced to the point of affecting their outlook?"


Intel has consistently lagged behind with 4-way processors: Tulsa was released after Conroe. Tigerton was released about the same time as Penryn. And, it looks like Dunnington will be released about the same time as Nehalem with the 4-way version of Nehalem lagging to late 2009.

The two theories of why Intel lags with 4-way are:

1.) 4-way is more difficult than 2-way so this is the best Intel can do.

2.) Intel lags on 4-way x86 on purpose to put less pressure on Itanium.

Consider that AMD's 4-way lag is only about 3 months, that Intel was actually leading with 4-way with Pentium Pro, and that Itanium has been late at every release. Which of these two seems more likely? It actually goes further than this: Intel resisted adding popcnt() to x86 because it gave Itanium an advantage. Intel similarly resisted adding virtualization, RAS, 64 bit instructions and greater than 36 bit addressing.

Soon Itanium's only advantage will be an artificial difference in RAS unless AMD forces Intel to make Xeon RAS stronger by making its own Opteron processors more robust in RAS.

"Also, we know that Intel had massive amounts of cache. How much smaller will the Nehalem dies be without all this cache? If they are smaller, this should increase Intel's margins."

None. These are monolithic quad dies, not MCM. Worse yield, not better.

"Does Intel plan on making native Nehalem quads? If so, we could expect lower yields (and a hit on their margins) and a 3 core processor!"

Nehalems are normally quad core. However, a tri-core is unlikely because Intel already dismissed the tri-core idea in their sales ads.

Scientia from AMDZone said...

Mark S.

"Thanks, Sci! I was getting tired of reading all of the tech blogs wetting themselves over the numbers in Anand's "unauthorized" preview."

Well, I'm not saying that Nehalem isn't a lot faster than Penryn; it could very well be. I would just like some at least semi-professional testing.

"Seems more likely that it's an Intel sponsored astroturfing campaign than an independent snatch-n-grab preview."

I know, it is something you always have to consider. If Anand really did preview the chips on the sly then wouldn't Anandtech get blackballed by Intel? Intel has a long and well established history of retaliation. A lot of this article is tongue in cheek because, as I said, if Intel really were involved with this then they could have picked someone who actually knew more about testing. Perhaps Francois Piednoel wasn't available to give Anand a hand with his "unauthorized" preview.

Mark S. said...

Sci, in regards to your reply to Jeach!:

"Does Intel plan on making native Nehalem quads? If so, we could expect lower yields (and a hit on their margins) and a 3 core processor!"

Nehalems are normally quad core. However, a tri-core is unlikely because Intel already dismissed the tri-core idea in their sales ads.


I would like to point out that Intel has routinely dismissed all of AMD's innovations, until such time as they can copy the tech and implement it in their own products, e.g. AMD64/EM64T, Hypertransport/Quickpath, on-board memory controller, Fusion/Larrabee, etc.

As soon as Intel can profit from selling a quadcore with a defective core as a tricore, you can rest assured that they will do so, AND that the majority of tech bloggers will heap praise upon them for being so innovative and beneficent!

Cheers!

Scientia from AMDZone said...

cesarin

"you test a processor in a single threaded application to see how much it increases the performance PER CORE. comparing to another single core of the competition."

Well, thanks for explaining something completely obvious. Is there anything else which everyone already knows that you would like to explain?

"that should show the efficiency of the cpu."

Not really. You obviously have no clue what efficiency means.

Efficiency is how much work you can get done for a given amount of energy consumption or how much work you can get done for a given cost.

Cost tends to be the universal measure but energy consumption is also important in mobile as well as large scale server and HPC systems.

One of the common claims by people who buy quad core processors is that they can run other applications in the background. Unfortunately, there is almost a complete absence of testing to measure this. Some testing pays lip service to this by running multiple copies of the same application but people seldom run four copies of the same thing.

Single threaded benchmarks are not useful by themselves; without testing performance by loading additional cores you learn nothing. Single threaded tests are best used to establish a base score to show scaling performance as you load additional cores.

This should be common sense but apparently it isn't to everyone. The best processor should show nearly linear increases in work as additional cores are loaded or, conversely, little dropoff for the initial application as additional cores are loaded. A processor that scores high with one core loaded but falls off rapidly as additional cores are loaded is probably not the best processor to buy.
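The scaling idea can be sketched in Python as follows. The scores are invented purely for illustration and are not measurements of any real processor; `scaling_efficiency` is a hypothetical helper that divides the score with n cores loaded by n times the single core score, so 1.0 means perfectly linear scaling.

```python
def scaling_efficiency(scores):
    """scores[i] is the total score with i + 1 cores loaded.

    Returns one ratio per core count: actual score divided by the
    score that perfectly linear scaling from one core would give.
    """
    base = scores[0]
    return [s / (base * n) for n, s in enumerate(scores, start=1)]

# HYPOTHETICAL scores with 1, 2, 3 and 4 cores loaded (not real data).
cpu_a = [3000, 5400, 7200, 8400]   # strong single core, falls off quickly
cpu_b = [2600, 5100, 7500, 9800]   # weaker single core, scales almost linearly

for name, scores in (("cpu_a", cpu_a), ("cpu_b", cpu_b)):
    ratios = [round(r, 2) for r in scaling_efficiency(scores)]
    print(name, ratios)
```

In this made-up case cpu_a wins the single threaded test but cpu_b finishes ahead once all four cores are loaded, which is the situation a single threaded score alone cannot reveal.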

nemrod said...

Do you know that although Cinebench scales nearly perfectly with core count and frequency, the score isn't the same on Vista 64 and Vista 32? I believe you've really missed this little point...

http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3326&p=2
vista 32bit...

http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3216&p=9
vista 64 bit...


And for
"However, even a 10% cumulative gain would be very nice on top of the gains we've seen for C2D and Penryn. Unfortunately, the single threaded scores are not quite as impressive."

That means only that Nehalem is a better quad than C2D. Look at the case of Phenom, for example: the ratio (4 core score / 1 core score) is far better than for C2D. But it is hidden by the fact that the single core performance of Phenom is worse than for C2D.

Scientia from AMDZone said...

nemrod

Yes, I believe that Nehalem is a better quad core design than Penryn. However, there are two questions:

1.) Can Nehalem achieve a good IPC on all four cores without using SMT?

2.) Is the single thread performance significantly better than Penryn?

This is what we need proper testing to answer.

Scientia from AMDZone said...

I Am The Fruth

"Remember when Core 2 was being demonstrated by Intel against overclocked AMD dual cores?"

Not specifically. However, there is no doubt that Intel improved quite a bit with C2D.

"Triple core Phenom's are almost a match for Intel dual cores."

No, Anand specifically avoids comparing tri-core on the basis of cost. It wouldn't be a good comparison for an Intel fan.

"AMD's best quad core gets trashed by Intel's slowest quad core."

True but then you will pay for that speed.

"AMD has now posted losses every quarter since Core 2 was unleashed."

No. AMD's first losing quarter was Q4 2006. Did you forget when C2D was released?

"But I can promise you this; that when Nehalem is released, you will be eating your words."

I see. Well let's go over some of the great predictions by Intel fans.

AMD will go bankrupt.
AMD will be bought out.
AMD will sell its FABs.
Barcelona's SSE is no faster than K8's.
GT 200 will maintain nVidia's lead over AMD.