Wednesday, May 02, 2007

K10 -- What Hasn't Been Said

AMD has released a few hints about K10 (Barcelona). This has left people like Ed Stoligo and George Ou wailing and waving their arms trying to figure out what the numbers mean. The shrieking from Mr. Ou has been particularly shrill as he desperately tries to rationalize his belief that Intel will stay in the lead. Unfortunately, what George doesn't seem to realize is that these numbers are correct but not based on K10.

AMD has estimated benchmarks on its website. These include two graphs; let's look at SPECfp_rate2006.


What is most interesting are the quad core Opteron numbers. One would think that these would refer to K10 (since it is quad core) but they don't. The QC Opteron score is obtained by taking the Opteron 2222 SE score, doubling it (twice as many cores), and then adjusting it from 3.0Ghz to 2.6Ghz. Xeon 5355 (Clovertown) scales very poorly at only 75%. This is why the theoretical QC Opteron has such a lead. The lead is in fact a 50% higher SPECfp_rate2006 score at the same clock. Yes, this is the same 50% number that George desperately describes as "inflated" and which makes Ed so uncomfortable that he cuts it in half. Sorry, but this number is real; you just have to know where it came from. Back in January, AMD knew that the initial clock for K10 would be 2.5Ghz and they figured that Intel would have a 2.66Ghz Xeon. So, if QC Opteron is 50% faster at the same clock then this drops to 40% for a 2.5Ghz QC Opteron versus a 2.66Ghz Xeon (1.5 * 2.5 / 2.66 = 1.41). The SPECint_rate2006 lead is 20% at the same clock. These are the numbers that AMD executives have mentioned in interviews. None of these are K10.

This is not complicated at all if you just look at it with a little common sense, but this seems to be what George and Ed have so much trouble with. Back in January, the estimated clock for Xeon Clovertown was 2.66Ghz but recently Intel announced 3.0Ghz. So, AMD changed their statement from 40% faster than the fastest to 50% faster at the same clock. This is neither inflation nor reduction; it is exactly the same number. The numbers for fp_rate are pretty good because they were all done with SUSE Linux. Intel's scores are even using its own compiler. The int_rate numbers are not as good because the Intel scores use Windows while AMD's still use Linux. We'll assume for the moment that these are accurate but the Intel numbers could actually be higher. Let's adjust this based on what we know now.

fp
1.5X faster at the same clock, K10 at 2.5Ghz and Xeon at 3.0Ghz.
1.5X * 2.5Ghz /3.0Ghz = 1.25 (25% faster)

integer
1.2X faster at the same clock, K10 at 2.5Ghz and Xeon at 3.0Ghz.
1.2X * 2.5Ghz /3.0Ghz = 1 (0% faster)
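
The adjustment above is a single line of arithmetic, so here is a minimal Python sketch for rerunning it with other clocks. It assumes performance scales linearly with clock speed, which is a simplification, and the function name relative_speed is only illustrative. The first case is the January comparison against a 2.66Ghz Xeon.

```python
def relative_speed(same_clock_ratio, clock_a_ghz, clock_b_ghz):
    """Speed of chip A relative to chip B.

    same_clock_ratio is A's lead at equal clocks (1.5 means 50% faster).
    Assumes performance is proportional to clock speed.
    """
    return same_clock_ratio * clock_a_ghz / clock_b_ghz


cases = {
    "fp,  2.5Ghz vs 2.66Ghz": relative_speed(1.5, 2.5, 2.66),
    "fp,  2.5Ghz vs 3.0Ghz": relative_speed(1.5, 2.5, 3.0),
    "int, 2.5Ghz vs 3.0Ghz": relative_speed(1.2, 2.5, 3.0),
}

for label, ratio in cases.items():
    print(f"{label}: {ratio:.2f}x")
```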

This sounds very much like what Ed was saying and we might be tempted to give him some credit except that he never realized that the scores are for projected QC K8 rather than K10. I also don't feel like giving George any credit either even though he does make some correct calculations in his article. It is particularly annoying that George makes sure to get in the phrases "Barcelona hype" and "outrageous promises" at the beginning of his article and only gets around to saying "AMD's Barcelona might end up with a slight lead" down toward the bottom. I notice too that George makes sure to pump up Intel's numbers for higher clock and FSB speed but he too fails miserably to notice that AMD's numbers are for K8 and not K10.

So, let's continue where Ed and George got lost. K10 should be faster than K8. There are a lot of changes but the tricky part is to figure out what would allow four cores to not bog down and what would actually increase IPC. We should assume that the split memory bus, enhanced bus scheduling, increased cache, shared cache, and changes to load/store logic simply allow four cores to not get bogged down. After all, we are trying to stuff two more cores on the same bus. However, we know that K8 is not currently using all of its bandwidth and these things should give it a bit more room. Unfortunately, Clovertown cannot use enhanced scheduling because the two dies are separate. We also know that even the Revision F K8's can use as high as DDR2-800 and (from what AMD has said) this will probably be bumped to DDR2-1033 with K10. In other words, while Clovertown will be at 1333Mhz, K8 is already at the equivalent of 1600Mhz and should go to 2066Mhz when Penryn is at 1600Mhz. However, K10 has more improvements than this. We can probably assume that cache bus doubling, prefetch doubling, increased fast path instructions, improved branch prediction, and dedicated stack hardware will actually increase the speed per core. We also know based on what Intel has said that Penryn is 10% faster in integer at the same clock.

fp
1.5X faster at the same clock, K10 at 2.5Ghz and Xeon at 3.0Ghz. We allow 10% for Xeon with higher FSB speed and 10% for K10. The two 10% cancel so no change.
1.5X * 2.5Ghz *1.1 /(1.1 * 3.0Ghz) = 1.25 (25% faster)

1.5X faster at the same clock, K10 at 2.6Ghz and Penryn at 3.2Ghz. We allow 30% for Penryn with SSE4 and 10% for K10.
1.5X * 2.6Ghz * 1.1 / (1.3 * 3.2Ghz) = 1.03 (3% faster)

integer
1.2X faster at the same clock, K10 at 2.5Ghz and Xeon at 3.0Ghz. We allow 5% for Xeon with higher FSB speed and 10% for K10.
1.2X * 2.5Ghz * 1.1 / (1.05 * 3.0Ghz) = 1.048 (5% faster)

1.2X faster at the same clock, K10 at 2.6Ghz and Penryn at 3.2Ghz. We allow 10% for Penryn and 10% for K10.
1.2X * 2.6Ghz * 1.1 / (1.1 * 3.2Ghz) = 0.975 (3% slower)
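
All four cases above use the same formula, so they can be tabulated in one place. This is only a sketch: the uplift factors are the guesses stated in the text rather than measurements, and the names projected_ratio and scenarios are hypothetical.

```python
def projected_ratio(same_clock_ratio, k10_ghz, intel_ghz, k10_uplift, intel_uplift):
    """K10 speed relative to the Intel part after per-core uplift guesses."""
    return (same_clock_ratio * k10_ghz * k10_uplift) / (intel_ghz * intel_uplift)


# (label, same-clock lead, K10 Ghz, Intel Ghz, K10 uplift, Intel uplift)
scenarios = [
    ("fp  vs Xeon 3.0Ghz", 1.5, 2.5, 3.0, 1.10, 1.10),
    ("fp  vs Penryn 3.2Ghz", 1.5, 2.6, 3.2, 1.10, 1.30),
    ("int vs Xeon 3.0Ghz", 1.2, 2.5, 3.0, 1.10, 1.05),
    ("int vs Penryn 3.2Ghz", 1.2, 2.6, 3.2, 1.10, 1.10),
]

ratios = [projected_ratio(*params) for _, *params in scenarios]

for (label, *_), ratio in zip(scenarios, ratios):
    print(f"{label}: {ratio:.3f}x")
```

Changing any single guess (for example the 30% allowed for Penryn with SSE4) shows how sensitive the conclusion is to that assumption.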

If K10 gets the same 10% boost in speed then we come up with some interesting numbers. Basically, in integer K10 at 2.5Ghz would be perhaps 5% faster than Xeon at 3.0Ghz and K10 at 2.6Ghz would be 3% slower than Penryn at 3.2Ghz. In fp, K10 at 2.5Ghz would beat Xeon at 3.0Ghz by a pretty fair margin. However, at 2.6Ghz K10 is still a tiny 3% faster than Penryn at 3.2Ghz even if Penryn uses SSE4. So, taking us all the way up to about mid 2008, it looks like we have mostly a tie although it looks like Opteron may have a temporary lead in fp. These numbers reflect the reduced scaling of quad core Clovertown; Intel's dual core would fare better. On the other hand, K10 will clock to 2.9Ghz with dual core. This should put K10 dual core a little further ahead but not by much. I assume Intel will bump clock speeds again around mid 2008 but then AMD also moves to 45nm. This is probably about the best educated guess of speeds we can come up with before AMD releases some real benchmarks.

68 comments:

enumae said...

Scientia

Before we get started, please explain... is AMD's Barcelona the same thing as K10?

Based on your opening paragraph it would be, and as would most articles on the internet, but then you make this statement...

None of these are K10.

I question your statement because AMD has said...

"The new Barcelona projections are based on the latest SPECfp_rate2006 and SPECint_rate2006 benchmarks and show that AMD expects to have up to a 50 percent estimated advantage in floating point performance and 20 percent in integer performance over the competition’s highest-performing quad-core processor at the same frequency.", that is here.

So is AMD wrong, or have I misunderstood something in your statement?

Scientia from AMDZone said...

Okay, let's walk through this. Let's start with the actual link to AMD's Announcement

"AMD also disclosed updated performance projections for its upcoming native Quad-Core AMD Opteron™ processors, code-named ‘Barcelona.’ The new Barcelona projections are based on the latest SPECfp_rate2006 and SPECint_rate2006 benchmarks"

Immediately following this remark is a link to a page with Estimated AMD Quad Core Performance benchmarks. Interestingly, these are exactly the same SPECfp_rate2006 and SPECint_rate2006 benchmarks I referenced.

Now, if you read the fine print on the AMD benchmarks it says:

"Dual CPU Quad-Core AMD Opteron processor estimates based on internal AMD simulations at 2.6Ghz"

And, if you examine the benchmarks you discover that the projected Quad-Core Opteron benchmarks are exactly double the X2 Opteron benchmarks.

So, again, my conclusion is that AMD's statements were based on these benchmarks and these benchmarks were simply estimated by doubling the X2 Opteron benchmarks and have nothing to do with an actual K10 processor.

Scientia from AMDZone said...

I deleted some comments to get my response at the top rather than having the same questions asked over and over.

I'm reposting some of the deleted comments but have left out those that weren't relevant or had insulting language.


Ko

If K10 can only tie or marginally beat Penryn, then AMD gets killed.

Don't forget that Intel will also release 8 core (glued) Penryns to take care of the top end.


lex

AMD lost 600 million, is in debt way over their head
Has too little capacity to manufacture enough of Barcebalogna to feed Dell and every other customer that may want them.


Axel

Doesn't Barcelona = K10? A couple months ago Randy Allen said in an interview:

"We expect across a wide variety of workloads for Barcelona to outperform Clovertown by 40 percent."

That's straight from the horse's mouth. I'm pretty sure all the 40%-50% figures refer to K10, not quad core K8.


not penix

Why is it that today someone has to have a math degree to understand the benchmarks or performance of CPUs? It's total bullshit; years ago this never happened. It's stupid whether AMD or Intel does this kind of benchmark merry-go-round.


Azmount Aryl

ko said...
If K10 can only tie or marginally beat Penryn, then AMD gets killed.


K10 quad core has way smaller die size than two penryn dies. Think before you say things.


ko said...
Don't forget that Intel will also release 8 core (glued) Penryns to take care of the top end.


And so will AMD, that's why they have a split memory controller on K10.


Heat

Kinda interesting how Barcelona is only a few months away from "launch" and all we have are slide shows with "estimated" numbers. Don't you think AMD should know what to expect, given that the processor should already be tested in house?


Axel

Ok, an hour ago there were about six relevant comments (including mine), none of which were offensive or deserved deletion. Suddenly all but one of these are gone?

Yeah, I'm aware that this comment will stay up for an hour max before it gets deleted. I don't mind, since this is my final comment on this blog.

Scientia from AMDZone said...

TheKhalif

Good analysis. I actually see AMD being much faster than Clovertown. Penryn will make up a lot of ground but again AMD's scalability is their biggest advantage.

On the desktop, Intel will be closer as usual but dual 1600FSB won't be cheap.

I really see AMD owning quad core until and maybe beyond CSI.

As you say, K8 still has bandwidth and all of the superscalar improvements will definitely love HT3.

I would say that it is more of a future proofing though, since they have opted to stay on HT1.1 with Opteron.

Sampling is set to commence this month so someone should have some prelim numbers by the end of the month. I thought they would come in Mar/Apr but I guess they wanted to wait and keep Intel in the dark.


They have closed their $2.2B notes so they should be in good shape to really ramp Fab36.

I am just interested as to what is going on with Fab 30. I heard that they would do 65nm there and use IBM for some of the first 45nm chips.


Aguia

Scientia,

Don't forget that Intel quad core has 8MB of L2. And AMD's solution will have 4MB.

On the Dual core AMD may have 4MB of L2, or 3MB, don't know if it's 1MB+1MB+2MB or 512KB+512KB+2MB?
Which leaves more performance for the dual core version than for the quad core, I think.

One question which is the second core on the Intel quad core configuration (double dual core)? The one that is with the first die or with the second die of the processor?

1st die | 2nd die

1 | 2
3 | 4

OR

1 | 3
2 | 4

?

Scientia from AMDZone said...

Ko

You could be right. A tie in performance could be more beneficial to Intel.

And, yes, AMD also has plans for an MCM Octo-Core.

lex

True, AMD can't just keep losing money. Their revenues will have to improve in the next couple of quarters.

I'm not sure your capacity argument is valid though. The rumors are that AMD is only taking minimum orders from Chartered because the demand is down. So, AMD has at least a little slack.

Axel

The 40% and 50% numbers are in the benchmarks.


not penix

Yes, I would certainly be in favor of some type of benchmarking standards.

azmount aryl

I agree. Using half the split bus for each die would seem likely for Octal-Core.

heat

There is no doubt that AMD has been extremely secretive about K10. So, either they don't have any K10's to benchmark or perhaps they are hoping for a maximum splash at launch to try to compete with Intel's massive advertising budget.

It is somewhat possible that AMD doesn't have a proper B0 stepping yet to do this. I doubt they had one back in January when the original comment was made. Obviously, the proof won't be available until K10 is.

Scientia from AMDZone said...

thekhalif

"I actually see AMD being much faster than Clovertown."

Maybe but I was trying to make a reasonable guess. It is reasonable to assume that later Clovertowns will be more efficient and Penryn more efficient still.

"Samplng is set to commence this month so someone should have some prelim numbers by the end of the month. I thought they would come in Mar/Apr"

Sampling of what? Oh, you must mean testing sample chips for review. I would say it would have to be soon if AMD is going to make an end of June launch.

"I am just interested as to what is going on with Fab 30. I heard that they would do 65nm there and use IBM for some of the first 45nm chips."

No. FAB 30 wouldn't be used for 45nm. FAB 30 doesn't become operational with 300mm wafers until the start of 2008 and this wouldn't allow enough time.

IBM has already completed the 45nm process and AMD has already made 45nm SRAM test wafers. IBM has no further involvement and would be working on 32nm.

45nm production testing has been taking place at FAB 36 for some time and the actual die testing will start as soon as the 45nm design tapes out.

Scientia from AMDZone said...

Okay, that should get things back in order. I wanted the original comments to be posted ahead of the later comments. It would be a lot easier if I could edit the order of comments without deleting.

Unknown said...

Great post Scientia.
The trend I am seeing here is that as K10 clocks higher, it will outperform any intel offering thanks to perfect scaling.

Also, ¿when would AMD include SSE4 instructions in the core?
Could it be with a core revision or do we have to wait till 45nm offerings?

Scientia from AMDZone said...

amd fan

"as K10 clocks higher, it will outperform any intel offering thanks to perfect scaling."

I wouldn't agree with this; Nehalem is going to be a challenge for AMD.

"Also, ¿when would AMD include SSE4 instructions in the core?
Could it be with a core revision or do we have to wait till 45nm offerings?"


I'm thinking 45nm would be the earliest for SSE4.

Unknown said...

OK, it seems that now I know how to post properly on the blog. I have to be logged in to my gmail account to do so. ;-)

Unknown said...

"as K10 clocks higher, it will outperform any intel offering thanks to perfect scaling."

Scientia wrote:I wouldn't agree with this; Nehalem is going to be a challenge for AMD.


Sorry about that; I wasn't specific enough. When I was talking about any intel offering I was referring to Clovertown, Woodcrest and Penryn.

By the time Nehalem is out, it will have to face Montreal (which is an 8-core MCMd Shanghai). My best guess is that AMD's true answer to Nehalem will be Fusion (I could be wrong here) /-:

Scientia from AMDZone said...

I use my Google account. I noticed a few months ago that my original blogger account stopped working so I started using just the Google account.

Unknown said...

Sorry about posting as AMD fan.
I created this account when I had issues posting here with my current account due to the changes made to the blog.

anonymous said...

So the real question in my mind is can you truly scale 100%?

My inclination is that moving the interconnect on-chip may reduce scaling issues, but does not entirely eliminate them. In other words, simply doubling the 2 core result is spurious. Real results will show us what is correct.

Looking at Spec CPU2006 fp_rate #s, we can compare 2 2222 SEs with 4 8222 SEs (IBM X3455 and X3755). OS is the same, memory is the same speed (NOT the case for the AMD entries), but the 2222 SEs have half (16GB) the memory of the 8222 SEs. Results?

Base: 47.3-->89.8 (89.8% gain)
Peak: 49.6-->95.7 (92.9% gain)

So today, for fpu, we see a 7-10% scaling loss. Integer results show 10-11% scaling loss.

I don't believe that it entirely disappears with K10. Erlindo, perfect scaling isn't being claimed, even by AMD. I read it as an estimate- we'll simply have to wait.
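
For reference, there are two ways to read the scores quoted in this comment, and a short Python sketch shows both. "Gain" is the percentage increase from doubling the sockets and "efficiency" is the 4P score divided by twice the 2P score. The helper name socket_scaling is made up for illustration.

```python
def socket_scaling(score_2p, score_4p, factor=2):
    """Return (gain, efficiency) when going from 2P to 4P.

    gain       = fractional increase in score (1.0 would be a perfect doubling)
    efficiency = actual score / ideal score (1.0 would be perfect scaling)
    """
    gain = score_4p / score_2p - 1
    efficiency = score_4p / (factor * score_2p)
    return gain, efficiency


base_gain, base_eff = socket_scaling(47.3, 89.8)   # SPECfp_rate2006 base
peak_gain, peak_eff = socket_scaling(49.6, 95.7)   # SPECfp_rate2006 peak

for name, gain, eff in (("base", base_gain, base_eff), ("peak", peak_gain, peak_eff)):
    print(f"{name}: gain {gain * 100:.1f}%, efficiency {eff * 100:.1f}%")
```

Read this way, the 7-10% "scaling loss" is the shortfall of the gain from 100%, while the per-core efficiency works out to roughly 95-96%.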

Scientia from AMDZone said...

dr. yield

I appreciate your effort but here is the actual data:

410.bwaves - 86.6%
416.gamess - 100.4%
433.milc - 86.4%
434.zeusmp - 98.0%
435.gromacs - 99.8%
436.cactusADM - 99.2%
437.leslie3d - 98.2%
444.namd - 100.2%
447.dealII - 93.3%
450.soplex - 90.1%
453.povray - 99.6%
454.calculix - 99.1%
459.GemsFDTD - 101.9%
465.tonto - 97.2%
470.lbm - 108.5%
481.wrf - 95.6%
482.sphinx3 - 88.6%

You can see that the scaling is close to 100% and that the numbers are pulled down by 5 odd numbers and pulled up by 1 odd number. Even with these odd scores included the actual scaling per core is 96.5%.
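
As a rough check on the list above, here is a small Python sketch that averages the per-benchmark figures. SPEC composite scores are geometric means of the individual benchmark ratios, so the sketch reports a geometric mean alongside the plain average; whether the 96.5% quoted above was derived this way is an assumption. The 95% and 105% cutoffs used to flag the "odd" results are arbitrary choices for illustration.

```python
import math

# Per-benchmark scaling (percent) as listed in the comment above.
scaling_pct = {
    "410.bwaves": 86.6, "416.gamess": 100.4, "433.milc": 86.4,
    "434.zeusmp": 98.0, "435.gromacs": 99.8, "436.cactusADM": 99.2,
    "437.leslie3d": 98.2, "444.namd": 100.2, "447.dealII": 93.3,
    "450.soplex": 90.1, "453.povray": 99.6, "454.calculix": 99.1,
    "459.GemsFDTD": 101.9, "465.tonto": 97.2, "470.lbm": 108.5,
    "481.wrf": 95.6, "482.sphinx3": 88.6,
}

values = list(scaling_pct.values())
arithmetic_mean = sum(values) / len(values)
geometric_mean = math.exp(sum(math.log(v) for v in values) / len(values))

# Arbitrary cutoffs for results that sit well away from 100%.
low = [name for name, v in scaling_pct.items() if v < 95.0]
high = [name for name, v in scaling_pct.items() if v > 105.0]

print(f"arithmetic mean: {arithmetic_mean:.1f}%")
print(f"geometric mean:  {geometric_mean:.1f}%")
print("low outliers: ", low)
print("high outliers:", high)
```

With these cutoffs the sketch flags five low results and one high one, and both averages land within a couple of tenths of the 96.5% figure.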

abinstein said...

"So the real question in my mind is can you truly scale 100%?"

100% scaling is certainly reachable. Sometimes even super-linear scaling is possible, because not only ALUs but also caches and IO/memory bandwidths are increased. SPEC "rate" benchmarks run multiple workloads in parallel and are thus very scalable.

The real, real question is not whether performance can scale up 100% with twice the number of cores, but whether it can scale down linearly from 3.0GHz to 2.5GHz. IMO, it can't. Thus scientia might just have found a coincidence where 1.67x the SPECfp_rate of an Opteron 2222 SE happens to be that of the (estimated) K10 @2.5Ghz.

abinstein said...

To make the point that Opteron does not have perfect scaling (of SPECfp performance) with respect to clock rate, look at the following two systems:

Fujitsu Siemens Computers, CELSIUS V840, AMD Opteron 2218

Fujitsu Siemens Computers, CELSIUS V840, AMD Opteron 2220

The only difference between these two systems is the CPU clock rate, 2.6GHz vs. 2.8GHz, where the latter is 7.7% higher than the former. Yet the latter's performance is only 4.8% higher than the former (12.5 vs. 13.1). Thus performance scales about 2/3 as much as clock rate does.

abinstein said...

Scientia, I realized that your claim was based on SPECfp_rate rather than SPECfp, so here's another comparison made on the rate benchmarks:

Dell Inc. PowerEdge 2970 (AMD Opteron 2216, 2.40GHz)

Dell Inc. PowerEdge 2970 (AMD Opteron 2220, 2.80GHz)

Here the latter's CPU clock is 16.7% faster than that of the former, but the performance is only 10.8% better. Again performance scales about 2/3 as much as the clock rate does.

I'd like to note that the performance scaling w.r.t. clock rate is not uniform across all programs. For example, for 470.lbm the score scales only 22% as much as the CPU frequency, whereas for 454.calculix it is 96%. The average of performance scaling w.r.t. CPU clock rate over all SPECfp_rate2006 programs is about 64%.
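
The "about 2/3" figure in these two comments is the performance gain divided by the clock gain. A minimal Python sketch, using only the numbers quoted above (the function name is hypothetical):

```python
def clock_scaling_fraction(clock_lo_ghz, clock_hi_ghz, perf_gain):
    """Fraction of a clock increase that shows up as performance.

    perf_gain is the fractional performance increase (0.048 means 4.8%).
    A result of 1.0 would mean performance tracks clock speed exactly.
    """
    clock_gain = clock_hi_ghz / clock_lo_ghz - 1
    return perf_gain / clock_gain


# SPECfp: Opteron 2218 (2.6GHz, 12.5) vs Opteron 2220 (2.8GHz, 13.1)
fp_speed = clock_scaling_fraction(2.6, 2.8, 13.1 / 12.5 - 1)

# SPECfp_rate: Opteron 2216 (2.4GHz) vs Opteron 2220 (2.8GHz), 10.8% better
fp_rate = clock_scaling_fraction(2.4, 2.8, 0.108)

print(f"SPECfp:      {fp_speed:.2f}")
print(f"SPECfp_rate: {fp_rate:.2f}")
```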

Unknown said...

I think that even if we assume 100% linear scaling for adding multiple cores, we're kinda still leaving out the improvements AMD is adding to the k10 architecture. These will probably add some extra performance on top of our measly 4-10% error based on scaling.

Really, AMD is being very conservative, yet again. Abinstein brings up a very good point about clock frequency/performance scaling, and this works in barcelona's favor, initially. As they ramp up the clocks, their performance/hz will die off somewhat, but not as bad as core's does. All in all, another good article, though it doesn't address many of your critics concerns over an article about market share (though I realize you have no control over what you can ethically report).

enumae said...

Scientia
Okay, let's walk through this.

I guess my problem is that you are discarding what AMD has said.

Regardless of numbers and conclusions, it could very well be a coincidence that the simulated tests are equal to doubling the 2222SE and then adjusting for 2.6GHz.

You claim to have more faith in AMD than Intel in regards to honesty, so why doubt or question them now?

What if K10 doesn't have the performance you are hoping for, but does have...

1. The ability to put 8 cores in a 2P system, while maintaining the same thermals.

2. To have the performance of a 4P system for a 2P price.

3. The ability to be 20% faster at equal clocks in Integer performance than Intel.

4. The ability to be 50% faster at equal clocks in Floating Point performance than Intel.

These are great achievements in and of themselves.

We know in time clock speeds will increase and so should the performance (especially when factoring in HTT vs FSB).

Again, for you to go against or not believe AMD's press release, well it seems foolish, but hey, I'm just a guy reading your blog :)

Scientia from AMDZone said...

enumae

"I guess my problem is that you are discarding what AMD has said."

I'm going by the official AMD announcement on AMD's own website in the context that AMD put it in. What am I discarding?

Aguia said...

When will K10 be released?

The dailytech article says:

AMD has big changes with K10, including a new brand

Over the next two weeks AMD will slowly begin to introduce its new brand name for high end desktop CPUs.


Does anyone know anything? They (AMD) say second half. But they may paper launch it earlier?

enumae said...

Scientia

Here is what you say in your post..."my conclusion is that AMD's statements were based on these benchmarks and these benchmarks were simply estimated by doubling the X2 Opteron benchmarks and have nothing to do with an actual K10 processor."

You are making your own conclusions contrary to AMD's statements, and clearly claiming the performance numbers are not based on K10, that would be discarding AMD's statements and slides.

Again, "Regardless of numbers and conclusions, it could very well be a coincidence that the simulated tests are equal to doubling the 2222SE and then adjusting for 2.6GHz."

Aguia said...

Scientia do you have some thoughts about this, based on your own calculations?

AMD believes it has that base covered; claiming Barcelona will deliver a 70% performance boost over its duo-core Opteron, while consuming the same amount of power.

Information Week

Scientia from AMDZone said...

enumae

"You are making your own conclusions contrary to AMD's statements, and clearly claiming the performance numbers are not based on K10, that would be discarding AMD's statements and slides."

No. AMD's information includes the words: projection, estimate, and simulation.

We have one set of SPEC scores. These scores say simulation for the Quad-Core numbers and the only actual reference score is X2 Opteron. How do you get to real hardware from this?

Christian H. said...

I'm thinking 45nm would be the earliest for SSE4.


There is a lot of confusion about this. SSE4A is listed as supported by K10, but SSE3 was supposed to be the next but Intel changed it.

As far as Fab 36 I hadn't heard about them doing test wafers there at 45nm. It's good news either way, though.

enumae said...

Scientia

Ok, let's make this simple...

1. Did AMD say that the 50% and 20% improvements were for Barcelona/K10?

Yes

2. Are the slides they are showing there to show the simulated performance of Barcelona/K10?

Yes

3. Do their slides say anything about "doubling the 2222SE and then adjusting for 2.6GHz"?

No

4. Is there any statement on the two AMD webpages that claims what you are saying?

No

Again, it could be a coincidence that the simulated tests are equal to doubling the 2222SE and then adjusting for 2.6GHz.

There is nothing definitive that says this, it is your interpretation of the information presented, and you are therefore disregarding AMD's own statements by drawing your own conclusion.

Aguia said...

enumae and Scientia,
the article that I have posted says 3 May 2007!!! And I think there is a lot of information there from AMD own employees.
What you guys are doing is just speculation and lot of perhaps.

enumae said...

Aguia
What you guys are doing is just speculation and lot of perhaps.

I am not speculating anything, AMD's comments/projections/simulations are about Barcelona/K10, not K8 as Scientia is claiming.

Unknown said...

I think that, if anything, scientia is closer to the truth. I don't think that Barcelona could only produce as much as a 100% gain in performance when they're updating the architecture substantially while also adding twice as many cores. I don't know exactly why AMD would do this, but they have done things like this in the past.

enumae said...

Xeon 5355 Integer = 84.8

84.8 * 1.2 = 101.8 projected performance of AMD's K10 2.6GHz

Xeon 5355 Floating Point = 60.2

60.2 * 1.5 = 90.3 projected performance of AMD's K10 2.6GHz

*Numbers taken from links in AMD's press release*
-------------------------------

AMD scores 98.3 SPECint®_rate2006 at 2.8GHz.

And scores 91.3 SPECfp®_rate2006 at 2.8GHz.

*Numbers found on SPEC.org*
-------------------------------

AMD's 2.8GHz 8220SE 4P is 3% slower than the projected performance of K10 at 2.6GHz in SPECint®_rate2006, and 16% faster than the Xeon 5355 system.

And AMD's 2.8GHz 8220SE 4P is 1% faster than the projected performance of K10 at 2.6GHz in SPECfp®_rate2006, and 52% faster than the Xeon 5355 system.

-------------------------------

Since I can't find scores for AMD's 8218 in SPECfp®_rate2006, or SPECint®_rate2006, we will scale down to 2.6GHz.

AMD 8218 Integer = 91.4 projected performance

AMD 8218 Floating Point = 84.9 projected performance

-------------------------------

AMD's projected 2.6GHz 8218 4P is 11% slower than the projected performance of K10 at 2.6GHz in SPECint®_rate2006, and 7% faster than the Xeon 5355 system.

And AMD's projected 2.6GHz 8218 4P is 6% slower than the projected performance of K10 at 2.6GHz in SPECfp®_rate2006, and 41% faster than the Xeon 5355 system.

So you're looking at about 10-15% improvement in SPECint®_rate2006, and about 5-10% in SPECfp®_rate2006.

-------------------------------

From what I have shown, and using AMD's information, K10 is not jaw dropping when it comes to performance.

The main focus of all of this should be how well AMD will have managed the thermal properties of K10, and its ability to drop 4P performance in a 2P system.

The other focus should be on 4P systems and the performance improvement they will receive.

New processors, new bios and you get double the processing power, I don't care who you are, that's cool.



*Now I am sure some here will dispute this, and I hope you go over everything with a fine toothed comb, it is not my intent to mislead anyone or post faulty information.*
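
For anyone who wants to go over the comparison above with that comb, here is a small Python sketch of the same arithmetic. It uses only the scores quoted in the comment and assumes linear scaling when moving the 4P Opteron from 2.8GHz to 2.6GHz; the variable and function names are illustrative.

```python
# Scores quoted in the comment above (SPEC rate2006, higher is better).
xeon_5355 = {"int": 84.8, "fp": 60.2}        # Xeon 5355 system
same_clock_lead = {"int": 1.2, "fp": 1.5}    # AMD's projected K10 advantage
opteron_4p_2_8 = {"int": 98.3, "fp": 91.3}   # 4P Opteron at 2.8GHz


def percent_diff(a, b):
    """How much faster (+) or slower (-) a is than b, in percent."""
    return (a / b - 1) * 100


results = {}
for test in ("int", "fp"):
    k10 = xeon_5355[test] * same_clock_lead[test]   # projected K10 at 2.6GHz
    scaled = opteron_4p_2_8[test] * 2.6 / 2.8       # assumes linear clock scaling
    results[test] = {"k10": k10, "opteron_2_6": scaled}
    print(
        f"{test}: K10 projection {k10:.1f}, 4P Opteron at 2.6GHz {scaled:.1f}, "
        f"{percent_diff(scaled, k10):+.1f}% vs K10, "
        f"{percent_diff(scaled, xeon_5355[test]):+.1f}% vs Xeon"
    )
```

Straight linear scaling gives 2.6GHz figures within about 0.2 of the 91.4 and 84.9 quoted in the comment, so the percentages shift only slightly.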

Unknown said...

Wow enumae, I'm hard pressed to find where you've actually "shown" anything. As we've said before, the numbers from AMD could very well be very conservative.

enumae said...

John

Wow John, I am hard pressed to see why you made a comment, nice of you to criticize though... very helpful.

--------------------------

While AMD could have more performance, that was not the point of the post.

My post was to put AMD's performance claims in perspective against other known processors with real numbers and I think it shows that.

Scientia from AMDZone said...

TheKhalif

"There is a lot of confusion about this. SSE4A is listed as suported by K10, but SSE3 was suposed to be the next but Intel changed it."

There is no confusion. AMD added a few instructions beyond SSE3. Intel added a lot of instructions for SSE4. K10 (Barcelona) has about a 10% subset of SSE4.

Scientia from AMDZone said...

enumae

I've referenced the only available SPEC data (which is a projection of 2222 SE) and given AMD's own documentation. If you want to ignore this and assume that it is a coincidence that the numbers match then that is your choice.

Scientia from AMDZone said...

enumae

" AMD's comments/projections/simulations are about Barcelona/K10, not K8 as Scientia is claiming. "

No. I'm saying that the figures that AMD has given so far are a conservative floor level of performance and are based on a simple doubling of X2 Opteron. I'm saying you have not yet seen the real performance numbers for K10.

Scientia from AMDZone said...

enumae

I already addressed the Opteron 8222 scores in answering dr. yield. You've now made three additional posts on the same information. As I've already said, most of the individual SPEC tests do show perfect scaling.

Scientia from AMDZone said...

aguia

There isn't really much information in the Business Week article that we haven't seen before. A 70% performance boost would only be 85% scaling. This is probably a pretty good estimate when all four cores are loaded. However, it doesn't tell the IPC for single threaded benchmarks and some increase (10-25%) is expected. It is also expected (according to AMD's own numbers) that the SSE performance increase will be much higher than 70%: 360%.

Overall, the Business Week article is saying that a lead in FP performance will not be enough for AMD to get ahead, that AMD needs higher Integer performance. This matches previous estimates that K10 will catch C2D in Integer but not get ahead by much (perhaps 5%). Strong FP performance would help in HPC which we already know. AMD has completely blown Intel out of the water for HPC wins. However, this is a small market. AMD needs a lot of 1S and 2S server sales too.

enumae said...

Scientia

I am now starting to see your view.

You think AMD's K10 has more performance than what has been stated.

But to say the numbers shown are not based on K10 is assuming and is disregarding AMD's own statements.

It would have been more clear had you stated that the numbers shown are the low end of K10's performance and left out the claims about the scores being based on K8.

I hope that clears this up.

enumae said...

Scientia
As I've already said, most of the individual SPEC tests do show perfect scaling.

I am not trying to show perfect scaling, or even discuss it.

Again..."My post was to put AMD's performance claims in perspective against other known processors with real numbers and I think it shows that.", which is something you had not done.

We now have real numbers (projections) that we can relate to existing processors, and not 1.5X * 2.5Ghz /3.0Ghz = 1.25 (25% faster).

Did you see a problem with my numbers?

-----------------------

It sounds as if you're tired of my comments, is that the case?

lex said...

I want to start a new debate..

Which fullsize truck is best?

Ford F150, Silverado, Tundra..

Which has the best hauling power, king-cap, toughest frame, blah blah blah.

In the end a few % in a benchmark or a hauling power difference of a few hundred pounds is interesting to debate.

But what is more important long-term for the purchaser is: is this company going to be around, will my system scale, will I get continued support for this platform and the hardware it's built around?

That question is easy to answer. AMD is a one trick pony that blew their execution between 2004-2006. They had a once a in company opportunity to capitalize while their big competitor was totally screwed up with netbust.

Benchmarks of Barcebalogna are of academic interest for a short while, but soon we'll be studying Penryn and Nehalem.

Any amount of benchmark talk about Barcebalogna, absent a discussion of how it can and would solve AMD's cash flow problem or long-term viability, isn't very interesting.

AMD is finished, it ain't if, but only when and I think 2008 when full 45nm conversion at INTEL is done and 45nm is struggling for IBM will be the time.

Scientia from AMDZone said...

Okay, I keep seeing the 20, 40, and 50% numbers talked about as though they represent some amazing new information about K10. They don't.

The same SPECfp_rate2006 charts show that Opteron 2222SE beats Xeon 5160 by 16%. If you double Opteron's score and compare this to Xeon 5355's then you get that amazing 50% greater number.

AMD has not released or demonstrated one single benchmark attributed to Barcelona although that seems to be the common misperception. AMD has only made general statements of performance which match the Opteron 2222 projections. So, I suppose you have to ask why.

Is it fair to simply double the 2222SE scores? Well, there are twice as many cores; however, we would expect some loss of performance. On the other hand, K10 includes a lot of tricks to keep this drop from occurring. K10 also includes some additional tricks that should actually boost its performance higher than K8's.

My best guess is that AMD is keeping the actual numbers a secret because the scores are better than a theoretical double 2222 Opteron. I think they are doing this specifically because they are getting clobbered and want to make as big a splash as possible at launch without having to spend millions of dollars on advertising. If you think about it, every rumor about delay or less than expected performance makes the launch that much bigger if K10 exceeds expectations.

However, I think they have to keep the general perception of K10 from completely tanking and that is why they put out the unremarkable SPEC scores which say nothing except that K10 won't be worse than K8.

You can't compare Opteron 8222 scores because K10 includes memory access sorting and optimization, split controller, and modified load/store rules.

And, the biggest change of all is that with 4 cores on 1 chip any disadvantage caused by NUMA is greatly reduced. All of these things add up to a quad core that is much more efficient than either Clovertown or Yorkfield. This may be enough to allow AMD to offset higher Intel quad core clocks.

Scientia from AMDZone said...

lex

While your comments are not particularly accurate, they do show your obvious passion for Intel hardware. I'm sure that Intel appreciates having customers with such strong loyalty. Good for you.

lex said...

WTF

"I think they are doing this specifically because they are getting clobbered and want to make as big a splash as possible at launch without having to spend millions of dollars on advertising. If you think about it, every rumor about delay or less than expected performance makes the launch that much bigger if K10 exceeds expectations."

Actually I would think otherwise. Nothing could help AMD better than to come clean and show the K10 WOW. What could build anticipation and chatter better than releasing numbers like INTEL did with C2D and getting a healthy debate about unbelievable #s? Then everyone would wait till they got the independent numbers and run around saying it is true!

Get free publicity twice... we told you.. Nah its a lie.. Then it is true! We told you so. Get twice the press for free. Now there is so much FUD...

Buyers don't really know whether to wait or to buy into the INTEL marketing machine. INTEL, with 45nm and Penryn, Nehalem, Westmere and Sandy, looks like it has a pretty compelling story, and AMD's platform and company viability are shaky at best.

Scientia from AMDZone said...

No, this doesn't make any sense. Releasing a lot of K10 information now would tend to reduce current sales and reduce the splash later. Osborne tried exactly what you are suggesting and it didn't work so well.

Scientia from AMDZone said...

I think it is a fairly good bet that AMD will be competitive with Yorkfield unless Intel manages to really ramp the clock (3.5GHz or higher). Nehalem is a much bigger question. If Nehalem is a substantial improvement then AMD can only hope that their plans for K11 are on track.

Scientia from AMDZone said...

I still haven't come across any firm volume share information. The only reference I have seen so far, from Mercury Research, suggests that AMD lost 2% of volume share in Q1 07 and dropped back down to the Q3 06 level of 23% volume share.

enumae said...

Scientia

This simply can not be debated.

You are making claims contrary to AMD, and I can not understand how someone who has made several posts about being unbiased can do this.

You are a person who provides evidence to substantiate your claims, but in this regard all there is is a similarity, no evidence, and no comments/statements from AMD to back you up.

Understand I am not trying to be negative, but it seems that you are just not happy with AMD's projected performance of K10.

The ability to have greater than 4P system performance from a 2P system should be more than enough.

Further evidence should be Intel's ability to only get up to 70% scaling, compared to AMD's 100%+.

If 2P performance is any hint of what 4P could be, Intel will be in trouble when comparing 4P systems.

I hope you can understand my position. I have no intent to be rude, but rather to tell you my opinion on this article.

Unknown said...

Enumae, scientia has shown specific evidence stating that AMD is being conservative and why. I don't know why this makes him biased.

enumae said...

Greg
"Enumae, scientia has shown specific evidence stating that AMD is being conservative and why."

And what evidence is that considering that he himself has stated... "AMD has not released or demonstrated one single benchmark attributed to Barcelona although that seems to be the common misperception."

I agree with him about that statement, there have been no benchmark information released, only simulations and projections.

At the same time AMD has said up to 50% FP, and 20% Int, and that is all we have to go by.

"I don't know why this makes him biased."

I am not saying he is biased.

I do believe him to be honest and fair, but in this case, no information provided by him or by AMD's press release supports his claim.

Scientia from AMDZone said...

enumae

I had to laugh at that one. I'm saying that none of the performance numbers given so far involve actual K10 hardware but only estimates based on existing K8.

Now, if I'm understanding what you just said, you are saying that a lack of evidence does not support my claim of no K10 information.

enumae said...

Scientia
"I had to laugh at that one."

Was that called for?

I am trying to be civil with you, and I could have said something similar about the E4300, or AMD's margins, but debate is why I am here, not petty personal insults.

------------------------

"...but only estimates based on existing K8."

Can you show me where AMD says that the projections or simulations they have released relating to K10 are based on existing K8 hardware?

If you can not, then there is no debate, as you have drawn your own conclusion based on a similarity or coincidence.

------------------------

"Now, if I'm understanding what you just said..."

Read it again.

"...there have (should have said has) been no benchmark information released, only simulations and projections."

Has benchmark information been released (memory, OS, etc...)?

Fujiyama said...

Guys, have you seen this article? I don't know what kind of credibility Fuad has but these are numbers from a test, not projections/slides etc.

If it's true, dual-core 2.9GHz should blow away anything from Intel, because quad-cores are rather useless...

Scientia from AMDZone said...

enumae

I don't know why you would take offense at that. Personally, I think it is rather amusing to be having a discussion about what we don't know rather than what we do. I think we need more information.

Scientia from AMDZone said...

fujiyama

I've said before that Intel's scores on Sandra were phoney and that they were exploiting limitations in the benchmark to artificially pump up their scores. Then I said that interestingly K10 also matches the characteristics that would exploit Sandra and that this would come back to bite Intel. Even so, 60% better Integer IPC and 100% better FP IPC would seem rather amazing.

enumae said...

Scientia
"Personally, I think it is rather amusing to be having a discussion about what we don't know rather than what we do."

That's exactly my point, we know what AMD has said pertaining to K10, yet you continue to state your conclusion as fact.

--------------------------

You said it yourself, Integer worked out to only 16% (scaling the 2222SE to 2.66GHz) of the 20% claimed, so how can it not be a similarity?

If it was truly doubling K8 hardware the Integer score should have been reflected as well.

Please explain why we do not see your claim reflected when looking at Integer.

--------------------------

Again, I am not trying to be rude, but there is nothing but a similarity to substantiate your claim.

Aguia said...

AMD quad core presentation:

AMD quad

In the presentation it says it will have 64KB L1 cache. Now is this 64KB+64KB or 32KB+32KB, or a new 64KB L1 cache like the Cyrix processors with shared Data and Instruction cache?

Scientia from AMDZone said...

enumae

"You said it yourself, Integer worked out to only 16% (scaling the 2222SE to 2.66GHz) of the 20% claimed, so how can it not be a similarity?"

No. Scaling 2222SE works out to 19% faster than Xeon 5355 in Integer.

Scientia from AMDZone said...

aguia

"In the presentation it says it will have 64KB L1 cache. Now is this 64KB+64KB or 32KB+32KB, or a new 64KB L1 cache like the Cyrix processors with shared Data and Instruction cache?"

64KB L1 Data and 64KB L1 Instruction cache. 128KB L1 total.

enumae said...

Scientia

AMD 2222SE SPECint®_rate2006 = 56.6

Intel 5355 SPECint®_rate2006 = 84.8

3 / 2.66 = 1.127 or 0.872

56.6 * 2 = 113.2

113.2 * 0.872 = 98.71

98.71 / 84.8 = 1.164 or 16%

If I have made a mistake, please point it out.

Anonymous said...

What would you expect from a babbling idiot like George Ou!!!
He reminds me of Tom's Hardware in the beginning & AnandTech. Intel paid fanbois...

Scientia from AMDZone said...

enumae

It looks like I rounded up too aggressively, 18.4% rather than 19%. Your calculations are right, you are just rounding down too much.

3 / 2.66 = 1.1278 or 0.8867

56.6 * 2 = 113.2

113.2 * 0.8867 = 100.37

100.37 / 84.8 = 1.184 or 18.4%

enumae said...

Scientia

Please understand that I am not trying to be difficult, please.

3 / 2.66 = 1.1278195488

2 - 1.1278195488 = 0.8721804512

Where are you getting 0.8867?

Scientia from AMDZone said...

enumae

This is not a subtraction problem, just a ratio.

56.6 * 2 = 113.2

113.2 * 2.66 / 3.0 = 100.37
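
The ratio can be checked with a short Python sketch. The scores are the ones quoted in this thread (56.6 for the Opteron 2222SE and 84.8 for the Xeon 5355). The function name and the assumption of perfect linear scaling with core count and clock are illustrative only and are not anything AMD has stated.

```python
def projected_score(base_score, core_factor, base_clock_ghz, target_clock_ghz):
    """Scale a score linearly by core count and clock (an idealized assumption)."""
    return base_score * core_factor * target_clock_ghz / base_clock_ghz

opteron_2222se_int = 56.6  # SPECint_rate2006, as quoted in the thread
xeon_5355_int = 84.8       # SPECint_rate2006, as quoted in the thread

# Double the cores, then scale from 3.0GHz down to 2.66GHz
quad = projected_score(opteron_2222se_int, 2, 3.0, 2.66)
lead = quad / xeon_5355_int - 1

print(f"projected quad core score: {quad:.2f}")
print(f"lead over Xeon 5355: {lead:.1%}")

# The clock adjustment is the ratio 2.66/3.0, not the subtraction 2 - 3.0/2.66
ratio_factor = 2.66 / 3.0
subtraction_factor = 2 - 3.0 / 2.66
print(f"ratio: {ratio_factor:.4f}, subtraction: {subtraction_factor:.4f}")
```

With these inputs the projected score comes to about 100.37 and the lead to about 18.4%, while the subtraction shortcut gives a factor of about 0.8722 instead of 0.8867, which accounts for the 16% result.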

enumae said...

Scientia

In light of my recent D- math skills, you have a very valid point concerning K10 projections being based on K8 hardware.

I would like to apologize for having to beat this topic to death before I was able to see what was right in front of me the whole time.

Thank you for being patient.

Scientia from AMDZone said...

The notion that the statements made by AMD are based on K8 projections and not K10 is just my best guess. You are hardly the only one to disagree with this and obviously AMD has never stated this directly.

Unknown said...

Fud has been surprisingly accurate to date. It'll be interesting to see if this sort of advantage translates to actual applications.

DaSickNinja said...

K10... what hasn't been seen.