Friday, October 19, 2007

Q3 2007 Earnings, AMD & Intel

With AMD's Q3 Earnings information added to Intel's we can see that Intel is doing quite well while AMD is still struggling. Nevertheless the announcements have dispelled many rumors that have followed the two over the course of this year.

Intel's revenues and margins are up this quarter and it is obviously doing well. In contrast AMD lost $400 Million and has much lower margins. One could be forgiven for confusing today with early 2003 when Intel was doing well and AMD was struggling to turn a profit as K8 was released. However, today is quite different from 2003 since AMD is now firmly established in servers, mobile, and graphics. In early 2003 AMD had barely touched the server market with Athlon MP, had only a single chipset for Opteron and had yet to create mobile Turion. It is interesting though to review the common rumors that have surrounded Intel and AMD this year to see where perception differs from reality.

A commonly repeated rumor was that AMD was teetering on the edge of bankruptcy and would likely have to file after another bad quarter or two. However, it is clear from the reduction in stockholder equity this quarter that AMD could survive at least another year with similar losses each quarter. If AMD improves in Q4 then bankruptcy seems unlikely. Also along these lines was the persistent rumor that AMD would sell FAB 30. AMD's announcement of pilot conversion to 300mm/45nm (as I indicated months ago) shows that AMD will not be forced back to a single FAB.

A similar myth was that AMD would need to be purchased by another company to be bailed out. The biggest problem with this idea is that there really isn't anyone to buy AMD. Companies like Samsung and IBM would be competing with their own customers with such a purchase and companies like Texas Instruments and Motorola have been out of the frontline microprocessor business for many years.

It has often been suggested that Intel could simply flood the market with bargain priced chips to deprive AMD of revenue. However it is clear from Intel's statements that they were unable to meet demand during the third quarter without drawing down inventories. Since the fourth quarter is typically the highest volume of the year it is unlikely that Intel will be able to cover all of the demand. Leaving Q4 with lower inventories would further prevent Intel from capturing additional demand in Q1. This means that Intel will likely be unable to really squeeze AMD until the second quarter of 2008 when 45nm production is up to speed and demand falls off from Q1 08.

Lately, it has been fashionable to suggest that AMD was abandoning the channel yet AMD posted record sales to the channel in Q3. It was also common to see bashing of ATI's offerings versus nVidia. Yet the increased sales from chipsets and graphics indicate that ATI is gaining ground after its loss of direct business with Intel. The ATI assets should continue to add to profits in Q4. Also, as AMD and nVidia strive to gain advantage there is no doubt that Intel is being left behind. With new chipset offerings from AMD in Q4 and Q1, Intel's chipsets will once again be second rate in terms of graphics. The company most likely to be hurt is VIA as the move by Intel and AMD into the small form factor mini-ITX size commoditizes the market and removes most of the previous profits. One has to wonder how much longer VIA can stay in this market as it gets squeezed both in terms of chipsets and low end CPUs. One has to wonder as well if Transmeta might soon become an AMD acquisition as it similarly struggles at the low end.

Intel was often described as moving its 45nm timetable forward one quarter while AMD was often described as falling further and further behind. However, the demand shortages would be consistent with rumors that Intel was having trouble moving 45nm production beyond D1D before Q1 08. Presumably the Penryn chips that will be available in Q4 07 will be from D1D. AMD's statement countered rumors of a slipping 45nm schedule by repeating its timetable of first half of 2008. Intel's supposed one quarter pull-in for 45nm was clearly fictitious since Intel's own Tick Tock schedule would be Q4 2007 or exactly two years since 65nm in Q4 2005. So, Intel is on schedule rather than moving the schedule forward. AMD's statement was a bit of a surprise since the previous timeline had been "midyear" for 45nm. This could easily have been Q3 but AMD's wording of first half of 2008 means late Q2 is more likely. If AMD really is on this track then half of Intel's previous process lead will be gone in another two quarters. This would also mean that AMD will have moved to 45nm in just 18 months rather than 24 months as Intel has done.
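
The 18 versus 24 month comparison can be checked with a little quarter arithmetic. This is only a rough sketch: the Intel dates come from the paragraph above, while the AMD 65nm start of Q4 2006 is an assumption that is not stated in this post, and the helper name quarters_between is made up for illustration.

```python
def quarters_between(start, end):
    """Count quarters from a (year, quarter) start to a (year, quarter) end."""
    (start_year, start_q), (end_year, end_q) = start, end
    return (end_year - start_year) * 4 + (end_q - start_q)

# Intel: 65nm in Q4 2005, 45nm in Q4 2007 (both from the text above).
intel_months = 3 * quarters_between((2005, 4), (2007, 4))

# AMD: 65nm in Q4 2006 is an ASSUMPTION; 45nm in late Q2 2008 is the
# reading of "first half of 2008" suggested above.
amd_months = 3 * quarters_between((2006, 4), (2008, 2))

print(f"Intel 65nm to 45nm: {intel_months} months")
print(f"AMD 65nm to 45nm: {amd_months} months")
```

If AMD's 45nm slipped one quarter to Q3 2008, the same calculation would give 21 months, so the 18 month figure depends entirely on the late Q2 reading.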

A current area of confusion is whether demoed chips are at all representative of actual production. It has become apparent that Intel's initial production from D1D is of very high quality. However, quality seems to fall somewhat as production is moved to additional FABs. In the overclocking community, it has been suggested that Intel's bulk production did not catch up to the quality of the early D1D chips until the recent release of the G0 stepping. This suggests that Intel's bulk production quality lags its initial production quality by a full year. This would seem to explain having to destroy the initial 45nm chips. Clearly, Intel's demos are not indicative of actual production as seen by the lack of chips clocked above 3.0GHz. This leaves the question of whether AMD's demo of a 3.0GHz K10 was similarly cherry picked. A late Q1 or early Q2 08 release of a 3.0GHz FX Phenom would be consistent; however, a late Q2 release would mean that AMD was equally guilty.

A lot of faith was put into Intel's mobile Centrino position. However, this quarter it is clear that Intel lost some mobile ground. Intel faces a much greater challenge as Griffin is released with an all new mobile chipset in Q1 08. All early indications are that mobile Penryn is unable to match Griffin's power draw. This will likely mean continued erosion of Intel's mobile position.

The server market is another area that tends to run counter to rumors. After Intel's very strong showing with Woodcrest and Clovertown it has taken about as much 2-way share from AMD as it can. The suggestion has been that Intel could hold its 2-way share while taking 4-way share from AMD with Tigerton. This scenario is almost certainly incorrect. Barcelona should provide strong competition in 2-way servers from Q4 07 forward so Intel is likely to lose some of its server share. And, Tigerton is only a slightly modified 65nm Clovertown which will have great difficulty overcoming the high power draw of its quad FSB, FBDIMM chipset. This makes it unlikely for Tigerton to do more than hold the line as a replacement for Tulsa. It appears that Intel's true 4-way champion, Nehalem, won't arrive until Q4 (the normal Tick Tock schedule) by which time AMD's 45nm Shanghai will already be out. I have no doubt that Nehalem could show very well against Shanghai since Nehalem will have more memory bandwidth. However, a big question is whether this will move down to the desktop. There still seems to be some confusion too as to whether or not Intel is really willing to discard its lucrative chipset sales and FSB licensing. The strength of Nehalem is also a two edged sword as Intel will simultaneously lose its die size (and cost) advantages since Nehalem is a true monolithic quad core die. Power draw is also unclear until it is known whether Nehalem has a direct IMC or some type of intermediate bus like AMD's G3MX. For example if Nehalem relies on FBDIMM as the current C2D generation does then Nehalem may have a tough time being competitive in performance/watt.

It now appears that AMD has indeed moved up its own timetable for delivering 2.6GHz chips. These seem scheduled for release in Q4 whereas the previous roadmap had them appearing in Q2 08. This would likely mean 2.8GHz chips in Q1 from AMD; however, Intel could maintain a comfortable lead with 3.1GHz or faster chips of its own. The number of partners added to the IBM/AMD research consortium also vanquished persistent rumors that AMD would toss SOI on 32nm.

However, the most surprising thing that has come up is AMD's view of the market. The popular wisdom was that AMD would be forced to back down and give up share to Intel. Yet in spite of Intel's good showing in profits, AMD is determined to increase revenues to $2 Billion a quarter. This would mean not only holding onto current share but taking more share from Intel. The $2 Billion mark is unlikely soon but it does seem that a near term loss of, say, $100-200 Million would be considerably better than the current $400 Million loss. The $2 Billion goal also seems to fit with AMD's previously stated goal of 30% share by end of 2008. I don't have AMD's volume share numbers yet for Q3 but I recall 23% from Q2. If AMD is at, say, 24% now then a 25% increase would be 30% and a similar 25% increase with the $1.6 Billion gross revenue would be $2 Billion. Assuming that AMD can hit its 45nm Shanghai target, I can't see any reason why this wouldn't be possible. It would mean ramping FAB 36 by the normal schedule and then partially ramping FAB 38 in Q1 and Q2. This would allow production to increase in Q2 and Q3 which would keep delivered chip volume increasing in Q3 and Q4 when FAB36 will already be topped out. Since Intel's 45nm ramp is slower than AMD's it is possible for AMD to blunt Intel's price advantage as long as Shanghai stays on track.
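
The share and revenue arithmetic above can be written out as a quick check. This is a minimal sketch using only the figures from the paragraph; the 24% share is the guess made above rather than a reported number, and the variable names are purely illustrative.

```python
# Figures from the paragraph above.
current_share = 0.24       # guessed Q3 2007 volume share, not a reported figure
target_share = 0.30        # AMD's stated goal for end of 2008
current_revenue = 1.6e9    # approximate quarterly gross revenue in dollars

# Relative increase needed to go from 24% to 30% share.
growth = target_share / current_share - 1

# Apply the same relative increase to revenue, assuming prices hold steady.
projected_revenue = current_revenue * (1 + growth)

print(f"required growth: {growth:.0%}")
print(f"projected revenue: ${projected_revenue / 1e9:.1f} Billion")
```

The revenue step quietly assumes that average selling prices stay flat, so that revenue scales in proportion to share.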

In spite of the rumors that have been dispelled we are left with a number of unanswered questions. Is there a bug that is holding back K10's performance? What speeds will Intel and AMD have in Q1? Is Intel really willing to toss its chipset revenue with Nehalem while losing its current cost advantages? Will the additional partners that have joined IBM and AMD in SOI research allow AMD to match or even exceed Intel's process tech? Can AMD really increase its share by 25%? Is AMD's faster adoption of Immersion a technical advantage or a technical curse? Will Shanghai be able to catch Penryn or hold AMD's position against Nehalem? The sad part is that it will take another year to know all of the answers for certain.

213 comments:

Aguia said...

Is there a bug that is holding back K10's performance?

Bug on the design or on the manufacturing?

Khorgano said...

Intel's admission that its early 45nm Penryn production was not competitive with its own mature 65nm products and had to be disposed of. This statement was not much of a surprise to me given Intel's release of the excellent G0 65nm stepping shortly before.

You're kidding right? Do you even understand accounting? Those chips were NOT disposed of. They were simply written off for Q3 because Intel needs to build inventory for launch. Those chips are stockpiling for Q4, but they don't want to show them as existing inventory which would mis-represent their holdings so they wrote them off for the quarter because they still have to account the costs associated for manufacturing them. So, Intel takes the hit on them for Q3, but does not list them as assets for Q4 which means they'll be effectively 100% profit for the quarter.

You're right that all the chips are from D1D. Since D1D is a low volume fab, it has taken them months to build up enough product for launch. F32 in Arizona will have product out the door for Q4, but will likely only be a trickle for December. Hence, the reason for the write-off in Q3.

Mo said...

Where is the link that shows Griffin and how you "concluded" the early indicators.

Your article lacks backup links as usual.

GutterRat said...

Intel's admission that its early 45nm Penryn production was not competitive with its own mature 65nm products and had to be disposed of. This statement was not much of a surprise to me given Intel's release of the excellent G0 65nm stepping shortly before.

Come on Brent. Provide a link that supports that Intel admitted that its 45nm Penryn production was not competitive with its own 65nm products.

khorgano is right on with his comment. You are, however, not right with your conclusion.

Bradley said...

amd might have an advantage in going to 45nm, i figure they'll have a whole empty shut down fab (fab38) to practice in. its gotta be harder if youre juggling switching to a new process and perfecting the old and producing in the old in the same fab.

havent heard anything about DTX in a while?

no questions about puma. worried by what mooly eden said about it, that they arent worried because they thought it would be based on barcelona.

"Is Intel really willing to toss its chipset revenue with Nehalem while losing its current cost advantages?" HUH?

its really amazing how bad the questions these analysts ask are!

abinstein said...

Hi scientia... nice overall analysis. Two things only: first, the notion of Intel writing off 45nm Penryn and that they are not competitive is a bit far-fetched. Second, Q4 is seasonally down for GPUs and thus it's not very likely for ATI to turn a profit.

abinstein said...

khorgano -
"Those chips were NOT disposed of. They were simply written off for Q3 because Intel needs to build inventory for launch."

I don't think you are right. It is plainly not what Intel said. Those 45nm chips are not too less in quantity, but too low in quality. In other words, what scientia's said seem more likely.

Here is what Andy Bryant said:

Offsetting this were the inventory write-offs we took as we ramped a new 45 nanometer process and built products that did not qualify for sale in the third quarter and therefore could not be classified as inventory.

And a while later he said again:

we saw a little bit of bad news offsetting that with the write-offs on the 45-nanometer products that aren't yet qualified for shipment.

Had it been because of stockpiling he would've said so, and I don't think it's necessary to write off assets to account for stockpiling while inventory is the right mechanism for the precise purpose.

IMO that doesn't necessarily mean that 45nm chips don't reach as good performance as 65nm chips, but simply some of them don't reach the performance for their designated price points. This could happen when Intel tries to produce 45nm chips with (slightly) more superior microarchitecture - it doesn't make sense from a marketing point of view to sell a chip with better design for lower price.

Roborat, Ph.D said...

Abistein said: I don't think you are right. It is plainly not what Intel said. Those 45nm chips are not too less in quantity, but too low in quality. In other words, what scientia's said seem more likely.

I'm sorry, you're completely mistaken. When Andy Bryant said "aren't yet qualified for shipment", he meant factory certification, not product quality. I hope you know the difference.

Mo said...

"did not qualify for sale in the third quarter"

This does not mean they were trashed. It just means they were not qualified to be sold in third quarter.....

Erlindo said...

Hi Scientia:

As always, great writing. ;)

I've found this little piece (in japanese):

http://pc.watch.impress.co.jp/docs/2007/1018/kaigai394.htm

It clearly shows that Nehalem would be using FB-DIMM and this might prove your point.

Here is the translated version of the same site: ENGLISH

Aguia said...

Yes abinstein is correct.

and built products that did not qualify for sale in the third quarter and therefore could not be classified as inventory.

Can’t be inventory because they are broken not finished like you said Roborat.
Unless you can link to some previous product from Intel where the same happened.

Aguia said...

Erlindo nice link.

Intel six (6) core CPU?

I'm already seeing Sharikou calling it Intel sick CPU.

There is some wrong information there, it says Intel Netburst based Xeon is a native 2 core CPU while it’s 2 x 1.


Also where we already saw something similar?

Funny how Intel is years behind AMD in CPU design with much more resources.
Intel still has to come up with something similar to the first April 2003 Opteron, in 2008 six years have already passed!

Scientia from AMDZone said...

Axel said...

"If AMD had anticipated substantial continuing market share gains into 2008, they would not shut down Fab 30 entirely but instead would upgrade piecemeal while continuing 90-nm output in order to try to take more market share."

No, that doesn't follow. There is little left that FAB 30 can make. Any small quantity that AMD might get from FAB 30 could be obtained from Chartered. It is probably easier for AMD to convert FAB 30 without it operating and AMD can probably save some organizational complexity by dropping 200mm production. Also, AMD gets the money right away by selling the outdated 200mm tooling.

"- There isn't enough market demand for K8 to warrant the high fixed costs of operating Fab 30."

You are obviously overlooking the large volume of Brisbane chips that AMD will continue to produce. Remember, FAB 30 cannot produce 65nm chips.

"- AMD do not anticipate enough market demand for K10 in 2008 to warrant upgrading Fab 30 starting today."

This is one of the most ridiculous things I've ever read. FAB 36 will continue to increase in capacity until mid year 2008. FAB 38 only needs to begin production in Q3 2008 to continue expansion. This gives plenty of time to install new tooling.

"- This returns AMD to one fab, with not much more die output capacity than they had in 2005 before the Fab 36 ramp, due to the rapidly increasing quadcore mix we will see in 2008."

Not at all. If AMD were really down to one FAB it would take two years to build another one. With FAB 30 upgraded, extra capacity can be available in just a few months. Your estimate of FAB capacity is also wrong. In 2005, AMD produced a total of 45 Million chips. However, the total for 2008 should be close to 100 Million chips.

"In other words, AMD cannot afford to operate two fabs anymore"

Again, not true at all. AMD will only invest in new FAB tooling if the capacity is needed.

" and are doing this purely to save cost"

No. It only costs AMD money if the tooling sits idle because the capacity is not needed.

" and wait and buy time for Bulldozer because K10 is not good enough to bring in the desired revenues."

Again, not true. AMD is moving forward with both 65nm K10 and 45nm K10. Bulldozer is nearly two years away.

" Yes they will slowly upgrade the tooling, the pace of which will mostly be dependent on the success of K10."

Again, not true. They will continue with the expansion of FAB 36 which doesn't reach full capacity until mid 2008. The purchase of tooling for extra capacity from FAB 38 is dependent on projected demand.

" I believe they are overly optimistic about the pricing they can command for K10"

No. This question was asked during the conference and AMD indicated that they didn't foresee much change from K8.

" and in reality will not be able to bring Fab 38 on-line in 2008."

Completely false. FAB 38 will come online in early 2008.

" I believe that AMD will be limited to a single fab throughout 2008, and will become severely capacity constrained as Intel drives demand for quadcore into the mainstream and AMD's product mix shifts towards 283 mm^2 die production."

This is wrong for several reasons. First, AMD also gets capacity from Chartered. Secondly, you grossly exaggerate Intel's quad mix which is currently only running at 3%. Third, you ignore 45nm which begins production in Q2. Fourth, you ignore that Intel is also currently capacity constrained. And, finally, you ignore that Nehalem is also a large die.

Scientia from AMDZone said...

khorgano

Correction noted. I edited out the part about disposing of chips.

Bradley

"amd might have an advantage in going to 45nm, i figure they'll have a whole empty shut down fab (fab38) to practice in. its gotta be harder if youre juggling switching to a new process and perfecting the old and producing in the old in the same fab."

45nm testing has been under way for awhile now at FAB 36. AMD has no reason to start in FAB 38 since by the time FAB 36 is ramped to its full tool set, 45nm production will already have begun.

"no questions about puma. worried by what mooly eden said about it, that they arent worried because they thought it would be based on barcelona."

Griffin is not based on Barcelona. What question is there about it?

abinstein

Q4 may be down for graphics but should be up for chipsets. So, I still expect the ATI section to contribute to more profit in Q4.

abinstein said...

Roborat -
"When Andy Bryant said "aren't yet qualified for shipment", he meant factory certification, not product quality. I hope you know the difference."

Actually I don't. What is factory certification and how come it doesn't happen with Yonah or Netburst 65nm last year? Why was Andy so cryptic on it and didn't just say [the chips are] "not factory certified to sell yet"?

mo -
"This does not mean they were trashed. It just means they were not qualified to be sold in third quarter....."

Such an argument doesn't make sense. If the chips are not qualified for sale but not trashed, then they must still be in inventory to turn a profit some time later. What if Andy and Stacey decided to put those profits into their own pockets? Those chips are not on the inventory chart; their production costs have been accounted as a loss to the investors anyway. Wouldn't it be some terrible mismanagement if something like this happened?

erlindo -
"It clearly shows that Nehalem would be using FB-DIMM and this might prove your point."

As aguia said, nice link.

But... 2H 2009 for an MP platform with IMC??? Isn't that a bit too late? The six-core-sharing-one-FSB also doesn't make sense since a 1600-FSB can hardly supply 4 cores adequately. 後藤弘茂 is known to have made some inaccurate predictions before, so I'd reserve judgment on that.

scientia -
"Q4 may be down for graphics but should be up for chipsets."

But you said "graphics section" in the article.

Scientia from AMDZone said...

khorgano, gutterat, roborat, and mo

Actually I am still confused why the 45nm chips wouldn't have been added to inventory. Isn't that the way it is normally done? My understanding is that raw materials, chips in process, and unsold chips count as an asset until they are used, sold or disposed of. For example, in Q2, AMD charged off socket 939 chips that couldn't be sold.

I'm also confused as to why Intel didn't mention the 45nm chips when they talked about inventory being low. For example, they didn't say that inventory was low but that the 45nm chips would be added back later. I need some clarification on this.

abinstein

I forgot that AMD is now including chipsets as part of the Computing Solutions section and only has discrete graphic products in the Graphic section. I tweaked the wording in my article.

Roborat, Ph.D said...

Scientia said: Actually I am still confused why the 45nm chips wouldn't have been added to inventory.

The 45nm units are classified as "risk build", meaning Intel hasn't achieved full factory (D1D) or technology (45nm) certification to financially account them as having any value. Intel cannot declare them as inventory and include them in its report because there is a probability that the certification fails, which would render all the stockpile worthless. Hence the term "risk build". But in reality, Intel already knows the risks are minute. It's more of a general accounting practice never to include risk build because it's a speculated asset in between value and waste.

As to abinstein and aguia's question of why Intel has never done this in the past: that's because Intel hasn't been caught in between quarterly reports with a massive volume of pre-launch inventory that affected margins significantly and needed to be explained. The key in Bryant's reply is the word "yet" as in "aren't yet qualified...". There wasn't anything cryptic with what Andy said. I thought it was pretty straightforward.

Khorgano said...

Actually I am still confused why the 45nm chips wouldn't have been added to inventory. Isn't that the way it is normally done? My understanding is that raw materials, chips in process, and unsold chips count as an asset until they are used, sold or disposed of. For example, in Q2, AMD charged off socket 939 chips that couldn't be sold.

I agree that it seems confusing at first glance, but it's a legitimate accounting practice. The real reason Intel did this is all due to timing. If they could produce and launch all the chips in the same quarter, this would be a moot point. However, since they are building up for launch, the timeframe is longer and spans multiple quarters.

Now, they are only required to list inventory for product that is actually sellable. (your comment about chips in production I'm not sure about, I'll have to look into it, also, raw materials are not included inventory, they are commodities and simply cap-ex and operating costs). Since they are not actually selling the chips, they write them off the inventory. From an accounting perspective they are no different than test wafers.

In the ideal situation, the chips would be produced and sold in the same quarter, but since that didn't/can't happen Intel still has to account for the costs of manufacturing them, so they still take the hit on operating costs to their margins for their production, but no longer account for them as inventory.

As to why they didn't clarify this in the conference call and say they would be added back into inventory at launch, I don't know, I guess you could say that to them it was rather obvious this would be the case so it goes without saying.

Khorgano said...

Ah yes, thank you Roborat, your explanation of "risk build" was far more succinct and clear than mine.

GutterRat said...

Have you watched Andy Bryant's Intel Spring Analyst Meeting presentation? It contains useful insight covering relative unit costs, factory ramps, inventory and operations controls.

Bradley said...

oops, i meant griffin not puma. i was hoping we'd get a hint of when it'll launch and how its progressing. and i was a bit worried about it after mooly eden dissed it by saying that it won't even be a contender, since they were worrying about defeating a barcelona based chip. but he would say that.

Scientia from AMDZone said...

bradley

I knew what you meant with Puma which is the entire platform for Griffin. Griffin is based on K8, not K10. K10 includes optimizations for multi-socket and multi-threading that wouldn't help much on a notebook. Griffin does include lots of power saving optimizations which Turion does not currently have. This, plus the new chipset makes Puma AMD's first true notebook platform and fully competitive with Centrino.

To compete with Puma, Intel does have Penryn however the greater SSE performance of Penryn can only be used with much higher power draw. Some have suggested that Intel could market this as both a notebook and desktop replacement. The problem with this idea is that desktop replacement units (like I use) have brighter displays, larger speakers, larger physical size, and greater weight. You really can't get a desktop replacement without making the unit less portable. In other words, you can't make a desktop replacement by just putting in a beefier processor. Secondly, if we are talking about desktop replacement models then AMD could probably use a low power version of K10. So, as far as I can tell, Intel has far less advantage than it would appear.

enumae said...

Scientia
...This, plus the new chipset makes Puma AMD's first true notebook platform and fully competitive with Centrino...


Power consumption, performance or both power/performance?

Aguia said...

Power consumption, performance or both power/performance?

platform

Scientia from AMDZone said...

enumae & aguia

In terms of true notebooks Puma should be competitive in terms of performance, battery life, and performance/battery hour. In other words, in all aspects.

In terms of desktop replacement AMD should be competitive in the same way with a low power version of K10.

Scientia from AMDZone said...

khorgano & roborat

Yes, that makes sense now about why the chips were not put into inventory.

Erlindo said...

Nehalem needs new socket

quote:
It doesn’t really surprise us but Intel will change the socket of the Nehalem at the end of 2008. The new 45 nanometer Nehalem generation processor won’t work in the LGA 775 or server LGA771 socket. It will need a new LGA 1366 socket.

This is simply because Nehalem has memory controller integrated in a CPU, and this CPU will support DDR3 memory only. It will also need a new chipset, so going for Nehalem will be one hell of a change.

Nehalem won't enter the market before Q4 2008, at least that is the desktop plan.


Ouch.. That's gonna suck.
Let's see what the intel fanbois have to say about this one!

Erlindo said...

AMD: 45 nano manufacturing on schedule
Quote:
Advanced Micro Devices is not delaying 45-nanometer manufacturing, according to the company, which is trying to correct an erroneous report on a blog.

"We are still on track to produce the first (45-nanometer) products by mid-2008," said Gary Silcott, an AMD spokesman. The company will have "pretty good volumes" of 45-nanometer chips by the end of 2008, he added...



AMD 45nm ramp revisited
Quote:
News from last week’s AMD conference call that told us the company would start production of 45nm microprocessors in the first half of 2008 shouldn’t be looked at from the point of view of ‘volume production.’ The message that should be taken from the news is that at some time in the first six months of next year, AMD may well have a small amount of devices entering the supply-chain.
AMD had said that we should see 45nm chips ‘mid-year,’ which could easily have applied to the time period from June through September based on Barcelona schedules!...

enumae said...

Scientia
In terms of true notebooks Puma should be competitive in terms of performance, battery life, and performance/battery hour. In other words, in all aspects.


In looking for a relatively new comparison - review of AMD's 65nm Mobile part and Intel's Santa Rosa platform I was able to find where Anandtech had done one this month.

Here is the link, and here is a quick summary of the results.

I am not quite sure why you believe that AMD will make great strides and Intel will sit back and not make improvements, but I feel you are wrong in concluding that AMD will be equal to Intel in laptops with the release of Puma.

core2dude said...


erlindo


Ouch.. That's gonna suck.
Let's see what the intel fanbois have to say about this one!

And what is it exactly that is going to suck?

New processors from Intel have almost always required a new motherboard. So this is not very different from how it was done in the past. AMD typically has had problems with changing the sockets because they did not manufacture chipsets/motherboards. Intel should not have any such problems.

Nehalem not happening on desktop sooner is not unexpected either. Intel simply does not need Nehalem on desktop to keep AMD in check. On desktop workloads, Barcelona already lags Kentsfield by a good 15% to 20% per clock--and Penryn will increase both the IPC and the clock speed.

core2dude said...


erlindo

Intel probably does not need Nehalem to keep AMD in check on server side either. But with 4 CSI links, Intel will finally be able to claim the bragging rights on memory tests such as specfp_rate. AMD won't have anything to show at that point.

Scientia from AMDZone said...

enumae

Yes, looking at the Anandtech review of current Turion and C2D it does look like Griffin will be competitive.

core2dude

Your confidence in Intel is admirable. I'm not so certain that we've seen K10's actual performance yet nor am I as confident as you that Shanghai won't be a factor. However, I am certain that Nehalem will cost considerably more to manufacture than Penryn. This reason alone will probably keep it from penetrating the mainstream desktop for awhile. Intel's margins will definitely take a hit with Nehalem in volume. I'm also not certain how anxious Intel is to give up those FSB licensing fees and put itself on a more level playing field with other chipset makers.

core2dude said...


Scientia


I'm not so certain that we've seen K10's actual performance yet nor am I as confident as you that Shanghai won't be a factor.

We know Barcelona's performance alright! The sooner you accept it, the better. It is nothing but K8 with little bit modifications that do not manifest themselves in performance--unless you have a severely bandwidth-limited app. Essentially it is a one-trick pony, the trick being, memory bandwidth.

Shanghai won't be a player till 2009--AMD's 45nm, for all practical purposes won't happen until then. And looking at AMD's recent track record, Shanghai will be an even dumber shrink of the already dumb Barcelona.

If Intel can charge licensing fees for FSB, it can definitely do so for CSI.

Nehalem will probably the US team's answer to the Israeli team, rather than Intel's answer to AMD. AMD processors don't matter anymore.

GutterRat said...

scientia wrote,

To compete with Puma, Intel does have Penryn however the greater SSE performance of Penryn can only be used with much higher power draw.

What you wrote above does not make sense.

Are you claiming that Penryn's SSE 'needs' a higher power envelope to be competitive?

Erlindo said...

Scientia:
I've found this lovely piece thanks to TechReport: Nehalem coming in the last quarter of 2008?

Quote:
At the Fall Intel Developer Forum in September, we learned that Intel plans to release processors based on its upcoming "Nehalem" core in the second half of next year. Rumors quoting a more precise launch time are now beginning to appear, and they suggest we may see the new chips later rather than sooner. Both Fudzilla and TechConnect Magazine claim Nehalem is scheduled for a launch in the fourth quarter of 2008. Fudzilla talks only of desktop Nehalem variants, which it says are dubbed Bloomfield, while TC Magazine suggests the Q4 time frame applies to all Nehalem chips.

As we've learned over the past few months, Nehalem will bring a brand new, 45nm-based architecture packing features like an integrated memory controller, point-to-point interconnects à la HyperTransport, a built-in graphics core, simultaneous multi-threading, and more. According to Fudzilla, Intel will introduce a new LGA1366 socket, Tylersburg-DT chipset, and ICH10 south bridge along with desktop Nehalem processors. Meanwhile, TC Magazine says Nehalem CPUs due in 2008 will be dual- and quad-core only with no integrated graphics controllers. Models with eight cores and integrated graphics are said to have slipped into 2009.


Seems that Scientias was right on this one: Nehalem is going to give intel a pain in the ... thanks to its giant core. ;)

For the fanboys: "The sooner they accept it, the better". :D

enumae said...

ars technica has an article concerning DTX.

core2dude said...


erlindo


Seems that Scientias was right on this one: Nehalem is going to give intel a pain in the ... thanks to its giant core. ;)

Nehalem has 734 million transistors. Penryn at 800 million transistors has a die size of 280 mm^2. This puts Nehalem's die size at roughly 257 mm^2 (if you scale the area linearly with number of transistors). That is about 10% smaller than Barcelona's 285 mm^2. So Intel's pain with Nehalem will certainly be much smaller (about 20% smaller?) than Barcelona.
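The linear-scaling estimate in the comment above can be written out as a short sketch. The function name is hypothetical, the figures are the ones quoted in the comment (the 280 mm^2 reference figure is disputed in a later reply), and the assumption that area scales linearly with transistor count is the commenter's own simplification, not a measured result.

```python
def scale_die_area(ref_area_mm2, ref_transistors_m, target_transistors_m):
    """Estimate die area, assuming area scales linearly with transistor count."""
    return ref_area_mm2 * target_transistors_m / ref_transistors_m

# Figures as quoted in the comment (transistor counts in millions).
nehalem_mm2 = scale_die_area(280.0, 800.0, 734.0)
barcelona_mm2 = 285.0

# How much smaller the estimate is than Barcelona, in percent.
percent_smaller = 100.0 * (1.0 - nehalem_mm2 / barcelona_mm2)

print(round(nehalem_mm2, 1))      # 256.9
print(round(percent_smaller, 1))  # 9.9
```

Cache cells pack far more densely than logic, so an estimate like this is only as good as the assumption that both chips have a similar mix of cache and logic.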

Giant said...

Apple sold over 2 million Macs, earning over $900m in profit! Every one of those Macs was Intel based.

I hear that Hector Ruiz is going green with envy at how much cash Intel and Apple are raking in.

http://www.apple.com/pr/library/2007/10/22results.html

Giant said...

Penryn at 800 million transistors has a die size of 280 mm^2.

A Penryn quad core is composed of two 107 mm^2 dies.

Scientia from AMDZone said...

core2dude

"It is nothing but K8 with little bit modifications that do not manifest themselves in performance--unless you have a severely bandwidth-limited app."

You apparently know very little about processor architecture.

"Shanghai won't be a player till 2009--AMD's 45nm, for all practical purposes won't happen until then."

Not at all. AMD should have a good volume of 45nm in Q4 08.

"Nehalem will probably the US team's answer to the Israeli team"

I'm not sure what you are talking about here. The last thing the Israeli team designed was Yonah. Surely you must know that C2D was not designed in Israel.

"AMD processors don't matter anymore."

To Intel fans.

Scientia from AMDZone said...

gutterat

Execution units draw lots of power. Doing heavy SSE computation will draw more power than would be compatible with long battery life.


core2dude

If you run through your calculations you might understand that Nehalem should be about the same size as Shanghai which will be out before Nehalem. According to the tick tock schedule, Nehalem should be out in Q4.

Scientia from AMDZone said...

enumae

There are plans for five DTX and five mini-DTX motherboards. We'll have to see how it goes after that.

Eduardo said...

Core2Dude said:


"We know Barcelona's performance alright! The sooner you accept it, the better. It is nothing but K8 with little bit modifications that do not manifest themselves in performance--unless you have a severely bandwidth-limited app. Essentially it is a one-trick pony, the trick being, memory bandwidth."


And C2D is a one-trick-pony also, the trick being, cache:
http://blogs.zdnet.com/Ou/images/4p-intel-vs-amd-server.png

Look how the 1.86ghz beats the crap out of the 2.13ghz. Now take out the "tricks" and AMD would put intel to shame.

And yes we already know K10 performance:
http://anandtech.com/IT/showdoc.aspx?i=3091&p=13

It's faster and uses less power.


"Barcelona already lags Kentsfield by good 15% to 20% per clock"


You should seek psychiatric help right away because your reality it's not real

Khorgano said...


Look how the 1.86ghz beats the crap out of the 2.13ghz. Now take out the "tricks" and AMD would put intel to shame.


What this says is that the Core 2 architecture is much more powerful than the memory architecture allows. When the "tricks" are used to feed the core with more data, it can easily scale to new heights. So, while the increased performance may be perceived as artificial due to increased cache size, the fact remains it is a juggernaut of a micro-arch and will only get more powerful when coupled with CSI in Nehalem next year.

Giant said...

Surely you must know that C2D was not designed in Israel.

Yes it was. There have been posts by Intel employees confirming this. See this for instance:

She's a heart-stopper, that's for sure. The american boys are out to prove they can design a chip that screams. P4 was an american project, Core2 was Isreali.

Source: http://www.xcpus.com/forums/news/2792-intel-sample-nehalem-cpus-october-2.html

Furthermore, at IDF Paul Otellini confirmed that Nehalem and Westmere are American designs while Sandy Bridge is the next Israeli design.

Giant said...

Shanghai which will be out before Nehalem.

We've heard nothing from AMD regarding this Shanghai CPU. No tapeout announcement, no demos, nothing.

Intel has already demoed a DP Nehalem system at IDF with a total of sixteen threads (running more than task manager!).

Based on these facts I feel it's only fair to assume that Nehalem will arrive before Shanghai.


You should seek psychiatric help right away because your reality it's not real


The results are conclusive. http://www.techreport.com/articles.x/13176

Now take out the "tricks" and AMD would put intel to shame.

How is having a large L2 cache a 'trick'? Please do enlighten us! Using your logic, it's clear that AMD's integrated memory controller is just a cheap trick.

Benchmark the CPUs as they come and Clovertown is faster than Barcelona in most workloads. That's right. AMD's new 'most advanced x86 CPU' is slower than Intel's 11 month old Clovertown CPUs. Go figure.

abinstein said...

Giant -
"Benchmark the CPUs as they come and Clovertown is faster than Barcelona in most workloads."

I'm sure that's true when Clovertown sports a 50% faster clock and uses 80% more electricity than the Barcelona it is compared against.

OTOH, in the real reality that people running datacenters actually care about, Barcelona outperforms Clovertown by quite a healthy margin under the same power requirement.

But unfortunately that's a reality that you won't understand.

abinstein said...

scientia: "Nehalem should be about the same size as Shanghai..."

We've already known that quad-core Nehalem is about 270mm^2. This is without any L3 cache. Shanghai OTOH should be roughly 240mm^2 with 12MB L3 cache.

core2dude said...


abinstein


We've already known that quad-core Nehalem is about 270mm^2. This is without any L3 cache. Shanghai OTOH should be roughly 240mm^2 with 12MB L3 cache.

It is amusing how you just "know" some things. Got any links for that, or is it the usual "we already know"?

AFAIK, NHM has 734 million transistors, including the L3 cache.

Otherwise, NHM will have a humongous core, and per your logic, will frag Shanghai by a factor of 4.

core2dude said...


abinstein

Take a look at NHM die here and tell me that the array on bottom is not shared L3. And that die has 731 million transistors (I agree, I was wrong--NHM is smaller than I thought).

So my new estimate of NHM die is 731/800*280 = 256 mm^2.

Scientia from AMDZone said...

Khorgano

"will only get more powerful when coupled with CSI in Nehalem next year."

I think you are confused. CSI will only carry I/O and interprocessor communications. I assume you got this confused with the increase in memory bandwidth which Nehalem also has.

Khorgano said...

CSI will only carry I/O and interprocessor communications. I assume you got this confused with the increase in memory bandwidth which Nehalem also has.

You're right, I meant to say due to the IMC, CSI is an interconnect technology like HT. =p

Scientia from AMDZone said...

giant

"We've heard nothing from AMD regarding this Shanghai CPU."

Right, nothing if you ignore their statements about building 45nm cpu's today. And, nothing if you ignore the additional clarification statements by AMD about 45nm. Simply put, if you ignore everything AMD has said then you would be correct.

"Based on these facts I feel it's only fair to assume that Nehalem will arrive before Shanghai."

In other words, you want to ignore both AMD's statements and Intel's tick tock schedule and make up your own timelines.

enumae said...

Scientia

Considering that AMD is...

1. Not using FAB30 currently, and will ramp as needed in 2008.

2. Still ramping FAB36 to full volume.

3. The majority of their mix is still dual-core and (most likely) will be at least through 2008 and 2009.

What are the chances that the first products we see from AMD's 45nm will be dual-cores?

It would allow AMD to offset any increase in demand, allow continued growth through 2008 and let AMD focus on Barcelona production and speeds on 65nm.

This is unlikely, but I feel is a good what if question to discuss, or maybe it's not :)

Thanks

core2dude said...


scientia


I think you are confused. CSI will only carry I/O and interprocessor communications.

Remote memory access also goes over CSI. So does the snoop. So CSI matters when it comes to memory bandwidth.

abinstein said...

core2dude
"Take a look at NHM die here and tell me that the array on bottom is not shared L3."

No, the array at the bottom is two sets of L2 shared by two sets of dual cores - that is if Nehalem keeps the shared L2 from Core at all.

OTOH, there is no need to use number of transistors to inaccurately estimate Nehalem die size. The die size of Nehalem is aptly calculated here. It is roughly 270mm^2 for quad-core, until Intel modifies it later.

enumae said...

Abinstein

Have you (or any one else) seen this Nehalem Die?

abinstein said...

core2dude -
"You should start calling yourself Einstein. That would describe yourself better, though in a paradoxical way."

Or maybe you should start focusing at facts, not names.

The fact is stated quite clearly on my blog. Let me know your valid disagreement, or you are making no contribution by some petty estimates (from # of transistors).

abinstein said...

enumae -
"Have you (or any one else) seen this Nehalem Die?"

No, I haven't, and thanks for the link, it's certainly a good way to interpret the die photo.

The original die photo was in such low resolution that enlarging it 16x also makes any aliasing error 16x greater, so I'm not very sure this interpretation is correct - but it could be.

There are a few things that cast doubt on it, though. First, where are the L3 tags? Second, in this configuration, the L2 access time will be as slow as the L3. Third, this means Intel totally gives up its successful shared L2 architecture and goes for a very AMD-like L2/L3 architecture.

But maybe this is exactly what Intel is doing...

Ho Ho said...

Seems as lots of people haven't seen this yet or they simply ignore it to make their claims sound correct.


"Second, in this configuration, the L2 access time will be as slow as the L3."

riiight

abinstein said...

Ho Ho, why don't you link this picture instead and tell me which one is the quad-core Nehalem?

BTW, do you know that the article is something called speculation? Not to mention it totally contradicts what's said on the Chip Architect page.

Just because you don't understand Japanese shouldn't allow you to believe whatever you see with full confidence. Learn to READ first...

Scientia from AMDZone said...

enumae

Considering that K10 is designed for 45nm, I would say that the first 45nm chips will be quad core. The size benefit of moving to 45nm is the same with quad core as dual core but you get more yield benefit with the larger die. It makes more sense to start 45nm with quad core.
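The yield argument above can be illustrated with the textbook Poisson yield model, Y = exp(-A * D). This is only a sketch: the defect density, the area scaling factor and the dual-core die size are hypothetical placeholders, not AMD figures, and the model assumes the same defect density on both process nodes, which is a simplification. Only the 285 mm^2 quad-core figure comes from this thread.

```python
import math

def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of good dies under the simple Poisson yield model."""
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)

def yield_gain_from_shrink(area_mm2, area_scale, defects_per_cm2):
    """Ratio of yield after a shrink to yield before it."""
    before = poisson_yield(area_mm2, defects_per_cm2)
    after = poisson_yield(area_mm2 * area_scale, defects_per_cm2)
    return after / before

DEFECTS_PER_CM2 = 0.5  # hypothetical defect density
AREA_SCALE = 0.5       # hypothetical ideal area scaling for the shrink
DUAL_CORE_MM2 = 140.0  # hypothetical dual-core die size
QUAD_CORE_MM2 = 285.0  # Barcelona figure quoted earlier in the thread

dual_gain = yield_gain_from_shrink(DUAL_CORE_MM2, AREA_SCALE, DEFECTS_PER_CM2)
quad_gain = yield_gain_from_shrink(QUAD_CORE_MM2, AREA_SCALE, DEFECTS_PER_CM2)

print(f"dual-core yield gain: {dual_gain:.2f}x")
print(f"quad-core yield gain: {quad_gain:.2f}x")
```

Under this model the relative yield gain grows with the starting die area, so the larger quad-core die benefits more from the shrink even though both dies shrink by the same factor.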

Scientia from AMDZone said...

core2dude

You are obviously confused as well. The discussion was about low cache versions of C2D. And, these would indeed be improved by additional memory bandwidth.

On the other hand, the items you mentioned like bus snooping and remote memory access don't pertain to single socket systems at all. The improvement to Nehalem over current dual socket C2D systems would be tiny since the dual FSB chipsets with message filtering work pretty well. You'll only see improvements for remote memory and bus snooping with quad socket and higher.

Scientia from AMDZone said...

I'm more inclined to believe Hans De Vries' analysis than PC Watch. Also some of the PC Watch diagrams are obviously wrong so why assume the rest are correct?

enumae said...

Scientia, Abinstein and HoHo

What are your thoughts about Lights Mark?

Can someone explain/elaborate what this is actually testing?

Scientia from AMDZone said...

core2dude

I would say that I stand corrected about C2D's being designed in Israel. The references seem consistent with two 4 year design teams, presumably one in Oregon and the other in Israel.

"If Intel can charge licensing fees for FSB, it can definitely do so for CSI."

No, you aren't seeing this at all. First of all, you don't need CSI to connect the north or southbridge to the cpu. You only need PCI-e and this is already licensed. You only need CSI to allow cache coherency between cpu's so this only applies to makers of multi-socket chipsets. Secondly, CSI is a subset of HyperTransport so anyone currently making an HT based chipset can port back a CSI based one with very little effort. This takes away both licensing fees and lead time for Intel.

core2dude said...


scientia


No, you aren't seeing this at all. First of all, you don't need CSI to connect the north or southbridge to the cpu.

Oh I see. You are talking about UP platform. Yes, Intel loses FSB licensing fees there. Even worse, if PCIe comes directly out of CPU, they lose chipset revenues as well.

On DP platforms, the chipset connects via CSI. INQ had an article on this.

My guess would be, Intel will push Nehalem (at least initially) only in V8 platforms. The UP will still be based on Penryn. No one needs a Nehalem on UP.

abinstein said...

enumae -

Lightmark adds global lighting to realtime rendering.

Please also take a look at this. It's an interesting read.

scientia -

CSI is definitely not a subset of HT.. they use different physical & link signaling. They even have different routing & transport protocols.

Also, I don't think PCIe will come directly from UP Nehalem.

core2dude -

Nehalem follows K8 and Barcelona in almost every step. In terms of architecture K8/K10 do matter a lot to Intel because they are what Intel is copying from.

In terms of performance, you must be in deep denial to say Barcelona "just tweaked K8." Well, it's just a tweaked K8 per core, but it also has greatly superior multi-core architecture & scalability.

GutterRat said...

scientia wrote,

Execution units draw lots of power. Doing heavy SSE computation will draw more power than would be compatible with long battery life.

Are you assuming that nothing was done in Penryn to reduce general power draw?

Use of some of the newer instructions could actually reduce power instead of increase it.

core2dude said...


abinstein


Nehalem follows K8 and Barcelona in almost every step. In terms of architecture K8/K10 do matter a lot to Intel because they are what Intel is copying from.

If your definition of architecture is integrated memory controller, and point-to-point interconnect, then yes, Nehalem copies K10. But if it is even slightly wider than that (which for any sane person should be), then you are completely wrong (surprise!).

Before reading the rest of the comment, I would strongly recommend that you read David Kanter's article on CSI.

Besides being a point-to-point interconnect for CPUs, CSI is nothing like HT.

HT is a 16-bit link, CSI is 20. HT uses MOESI. CSI uses MESIF. CSI has vastly better RAS features. But most importantly, HT is a kludgy point solution. CSI is a layered architecture, with multiple profiles--a profile only implements what is needed. CSI is almost like the OSI networking stack. It is a vastly superior architecture that will serve Intel well for the next decade or so. Per Kanter, CSI is so nicely layered that Intel could remove the copper wires from the physical layer and replace them with fiber without changing the upper layers of the protocol. AMD hacks will take a decade to come up with an elegant solution like that.

It is a very well established fact that other than (mostly artificial) memory tests Barcelona loses to Clovertown by about 15% per clock. Nehalem will take away Barcelona's memory bandwidth advantage with an interconnect that AMD could only dream of.

Scientia from AMDZone said...

abinstein

CSI is not a direct subset of HT; CSI is a functional subset of HT. PCI-e is a direct subset of CSI. However, I've written dozens of format converters. It would not be difficult to change CSI to HT. This is in fact far easier than the current Intel based motherboards that use HT.

gutterat

You'll have to wait until maybe Q2 08. Penryn will be stronger but draw more power than Griffin.

core2dude

I'm just very doubtful that Intel can push a separate server processor with Nehalem. Intel tried that with Pentium Pro and it didn't work. Nor has it worked since. The other problem is that whatever Intel does now to gain advantage will tend to make things worse once Nehalem is released. For example, if Intel pushes quad core for servers now to exploit its lower cost for MCM then this will hurt when native quad comes with Nehalem.

core2dude said...


scientia


Secondly, CSI is a subset of HyperTransport so anyone currently making an HT based chipset can port back a CSI based one with very little effort. This takes away both licensing fees and lead time for Intel.

Do you even know what CSI is? Do you know the various layers? Do you know the protocol details? Do you know the failover mechanisms? CSI is nothing like HT. It is vastly superior, five-layer architecture designed from grounds up. Read Kanter's article before making such statements.

core2dude said...


scientia


PCI-e is a direct subset of CSI. However, I've written dozens of format converters. It would not be difficult to change CSI to HT. This is in fact far easier than the current Intel based motherboards that use HT.

PCIe is definitely not a subset of CSI. PCIe is a lane-based architecture. However, per Kanter, CSI supports only full width, half width, and quarter width. CSI uses clock forwarding, as opposed to clock encoding on PCIe (though PCIe 3.0 will use clock forwarding). I don't think CSI will use packet headers as large as PCIe. That does not make sense when your unit of transfer is going to be 64 bytes most of the time. In short, CSI and PCIe will be vastly different.

Yes, it is possible to implement a protocol converter. But such a converter has to implement CSI on one end, and for that purpose, it will need a license. Otherwise, how would such a converter negotiate link width?

abinstein said...

core2dude:
"CSI is nothing like HT. It is vastly superior, five-layer architecture designed from grounds up."

I know how CSI is described but you don't know what HT is. Scientia has a valid point. Although there are a few functionalities of CSI that don't exist in ccHT, for example, the "MESIF" cache coherence protocol, there are a lot more that exist in HT3 but not in CSI. Furthermore, MESIF is not actually superior to MOESI, but more like a way to improve MESI around MOESI.

The two-hop protocol is a nice idea from CSI but it is nothing new. It would be possible to implement this protocol in HT (or in fact any network), and AMD just chose the 3-hop protocol in ccHT for its simplicity.

In terms of architecture, HT also defines physical wires, data links, routing, transport, and even some session functionalities, and I don't see CSI being "any superior" to even something as old as HT 1.0.

Ho Ho said...

enumae
"Can someone explain/elaborate what this is actually testing?"

Mostly your GPU shader processing, CPU does really little work there if you don't count the preprocessing needed to make it actually work.

Also that is a rather limited algorithm that doesn't work well with increased scene complexity, more lights and dynamic objects. Actually it doesn't seem to support dynamic objects casting proper shadows at all; they are only updated when lights move. Also, for static objects everything is precalculated, so say bye-bye to destructible terrain.

It makes for a nice techdemo but is pretty much unusable for real things.


abinstein
"Please also take a look at this. It's an interesting read."

It was much more interesting before the server crashed and they lost almost ten days worth of data.

Aguia said...

core2dude,

I also don’t see CSI as being superior. In fact, after a delay of 6 years already, I was expecting something much better from Intel.

The only thing that will probably make Intel superior is the triple channel IMC instead of the dual. It’s funny that Intel doesn’t like the magnificent triple core CPU idea but seems to like a triple channel memory interface. I already see Intel saying, who said 3 was not a good number, and then later that 6 will be an excellent idea for those who don’t like 4 but don’t need 8.

I also see Intel charging a lot for the CSI interconnect (someone has to pay for the chipset sales losses). If Intel charged for front side bus speed bumps, what can we expect from a brand new interface?
I already see Intel making SIS and VIA go out of business. Ati will also leave Intel chipset development. And Nvidia will trade SLI for CSI or pay the price premium Intel wants.


scientia,
Current Intel mobile systems already use about the same power as AMD and also being 20% faster. Penryn will only extend that. In fact if Intel wanted even today with current generation they could lower power consuming by lowering the clock speed and still keep the superiority, why they don’t do it I think has to with the CPU binning, they can’t get enough parts that go that lower or don’t want to waste time seeing which parts can do that.
But with penryn they no longer need that because of the 45nm process. The only AMD edge will be the platforms that will be vastly superior at least at the IGP level and price.

Scientia from AMDZone said...

core2dude

"It is vastly superior, five-layer architecture designed from grounds up."

It isn't vastly superior. HT and CSI are in reality very similar. Both rely primarily on CRC and retry for error handling. CSI uses embedded clocks while HT uses separate clocks. This does make CSI more flexible at the cost of latency.

It is true that clock embedding would be necessary for a purely serial link. However, I can't think of any situation that would use a purely serial link.

Scientia from AMDZone said...

core2dude

"PCIe is definitely not a subset of CSI"

HT has no trouble carrying PCI, PCI-X and PCI-e packets. CSI is able to do this as well.

Since PCI-e is not able to carry HT or CSI packets I think we can see which is a functional subset and which is a superset.

Scientia from AMDZone said...

aguia

"I already see Intel making SIS and VIA go out of business. Ati will also leave Intel chipset development. And Nvidia will trade SLI for CSI or pay the price premium Intel wants."

Yes, I wonder too if it could end up as just Intel and nVidia on the Intel side and every other chipset maker on the AMD side. There is no license fee for HT.

"Current Intel mobile systems already use about the same power as AMD and also being 20% faster. Penryn will only extend that."

The problem is that you are comparing Penryn with AMD's current Turion processor which has almost nothing for power savings. Griffin is very different; it includes all of the power saving features that Intel already uses. Penryn is just a small improvement but Griffin and the new chipset are great improvements. It will make a difference.

"In fact if Intel wanted even today with current generation they could lower power consuming by lowering the clock speed and still keep the superiority"

Quite true when compared with Turion and the current chipsets, neither of which are true mobile designs. This is not the case with Puma.

abinstein said...

scientia: "CSI uses embedded clocks while HT uses separate clocks. This does make CSI more flexible at the cost of latency."

You can do this with HT 3.0 AC operation.

Ho Ho: "It was much more interesting before the server crashed and they lost almost ten days worth of data."

I don't know what are you trying to accomplish. That article pretty much disproves all the ranting you've been saying lately - unless you have something new to say now?

Ho Ho said...

abinstein
"I don't know what are you trying to accomplish."

I was just pointing out that quite a few of the posts containing pro-RT information got lost. Also that article has a lot of old information about RT, I'd say it is on about the same level as the first Intel one where it was compared against pre-GPU era rasterizing.

abinstein said...

Ho Ho:
"Also that article has a lot of old information about RT"

You still don't accomplish anything... Why don't you educate us (with a blog article or something) about the "new information" about ray-tracing?

I promise you if you do so I'll update you guys with an article on quantum computing!

Ho Ho said...

There are some of the posts I managed to salvage on the discussion board on B3D. I suggest you read the ones from lycium and shootmymonkey. The first is a scientist working with RT, the other is a game developer working on most major platforms.


Also quantum computing is vastly overrated. There are only very few problems that it can handle.

abinstein said...

"There are some of the posts I managed to salvage on the discussion board on B3D."

Link?

"Also quantum computing is vastly overrated. There are only very few problems that it can handle."

Actually I named quantum computing for an analog to real-time ray tracing - they are both impractical at the moment.

Besides, you are wrong; anything that can be computed classically can be performed by quantum computers.

Ho Ho said...

abinstein
"Link?"

Haven't you read the discussion on their forum then? If so then there is no wonder why you take the article as being 100% truth.

The link is in the last post I made in the resurrection thread and shouldn't be too hard to find.


"they are both impractical at the moment."

Difference between the two is that RT actually can replace or at least vastly improve current rendering technologies. Quantum computing can't do nearly anything too complex.

Ho Ho said...

abinstein
"anything that can be computed classically can be performed by quantum computers."

It goes the other way also, only question is if it is practical.

With quantum computing you can't really do things like run complex simulations that well but you can do some stuff that regular computers can't do that well like cracking encryptions. Technically both of these can be done on both machines but problem is that where on one it scales linearly it scales exponentially on the other and vice versa.

Scientia from AMDZone said...

Quantum computing seems a bit off track for this thread. However, I need to make a correction. The only area where quantum computing is ahead of other forms is in quantum mechanical simulation. The other areas that are claimed to be faster with quantum computing are only faster compared to standard digital systems. We have hybrid analog/digital systems that can process these problems with similar speed and scaling. However, these analog systems are several exponential orders of magnitude more practical.

Giant said...

FSB Scaling: http://www.nordichardware.com/Guides/?page=1&skrivelse=517

Only a small performance increase for a faster bus.

Axel said...

Scientia

It is probably easier for AMD to convert FAB 30 without it operating and AMD can probably save some organizational complexity by dropping 200mm production.

No, AMD's original plan was to upgrade Fab 30 piecemeal, as per their Q2 2006 earnings CC:

BOB RIVET
Think of it this way, Joe -- we will probably never go below 50% utilization in the facility. As we continue to flip out tools, what we are actually doing is building a separate building to actually augment the capacity to be able to flip the tools through the system. We will never actually go below about 50% utilization of the facility in the worst quarter of time.


Clearly, due to the current financial situation along with lower than anticipated demand for K8 due to Core 2, AMD could not justify sticking to their original plan of upgrading piecemeal. They will instead completely shutter Fab 30 by year end in order to save on the high costs of operating the cleanroom, among other costs. They have provided vague guidance on the pace of the upgrade, including some smoke and mirrors about a race car idling in the pit. The reality is, even if demand for K10 in 2008 turns out to be higher than Fab 36 alone can supply, AMD are unlikely to bring Fab 38 on-line in 2008 because K10 is too weak a product to raise ASPs high enough to operate two fabs. I believe Fab 30 will remain shuttered throughout 2008.

"- There isn't enough market demand for K8 to warrant the high fixed costs of operating Fab 30."

You are obviously overlooking the large volume of Brisbane chips that AMD will continue to produce. Remember, FAB 30 cannot produce 65nm chips.


No, my statement focused on the cost of operating Fab 30 and made no mention of Fab 36. The fact is, AMD had hoped by now to have the capacity from both fabs to supply the demand for K8, but due to Core 2 this demand is now less than initially anticipated. So AMD are shuttering Fab 30 to save cost. Re-opening date is TBD and will be based primarily on what K10 revenue and ASP look like in 2008.

Again, not true. AMD is moving forward with both 65nm K10 and 45nm K10. Bulldozer is nearly two years away.

Yes, moving forward with a mediocre product that cannot support the company. In fact I doubt that AMD will turn a profit in 2008 even with Fab 30 shuttered. K10 is simply too slow, both in IPC and clock.

Completely false. FAB 38 will come online in early 2008.

Sorry but I believe that this statement is destined to go the way of most of your predictions over the last eighteen months.

Scientia from AMDZone said...

axel

"No, AMD's original plan was to upgrade Fab 30 piecemeal, as per their Q2 2006 earnings CC"

I'm still baffled why you pretend to disagree with me when you actually agree. Of course AMD's original plan was to convert FAB 30 while still under operation. We were not discussing whether this was the original plan but why it changed. Hey, but thanks for providing quotes to "prove" what everyone already knew.

The rest of your comments are not really worth bothering to reply to in detail. FAB 30 will not be shuttered. Since the building is already paid for, AMD can only save money by firing workers, which is not planned. The cost savings of electricity is tiny compared to the cost of raw materials. Raw materials are of course not needed until new tooling is purchased. The truth is that new tooling purchase is the primary constraint.

Now, the biggest problem with your argument is the scanner. It is impossible to leave FAB 38 "idling" and ready to ramp in, say, 3 months if it doesn't have a scanner because the lead time for ordering a scanner is about 9 months. This means that AMD has to secure a scanner for FAB 38 to make it ready to ramp. However, it would be ridiculous to think that AMD would buy a $30 million scanner and then leave it unused. AMD may have plans to move a scanner from FAB 36 as they buy new 45nm immersion scanners or they may obtain a new one for FAB 38 but either way, they have to have one. And, once they have a scanner FAB 38 will be under low volume production. Again, the ramp schedule is uncertain but FAB 38's coming online is not.

And, your assertion that AMD's K8 demand is low is simply at odds with reality. The truth is that AMD's volume share is only slightly below its all time high and is slightly above the 2006 average; this is in spite of the fact that market volume was up in Q3.

The fact is that AMD will not be able to provide all the chips it needs just from Chartered and FAB 36; FAB 38 will have to provide some.

If you really think that Barcelona is so poor then you obviously have not looked at either the 2-way or 4-way SPECfp_rate scores. Intel's best Tigerton and Clovertown scores are squarely thumped by much slower Barcelona scores.

Scientia from AMDZone said...

axel

"I believe that this statement is destined to go the way of most of your predictions over the last eighteen months."

Which way would that be?

Let's see:

C2D performed better than I had expected. I was off on that.

I said AMD was unlikely to buy ATI because it would compete with nVidia. That seems to be my biggest error.

AMD's earnings for Q1, Q2, and Q3 were lower than I expected. I can't claim much of a track record there.

I had expected Intel to meet their original roadmap and release 3.2Ghz Conroes in 2006. I was definitely off on that.

K10 arrived slower and a little later than I had expected; AMD's original roadmap had indicated 2.3Ghz at launch and I was expecting late Q2 instead of late Q3.

I had expected Intel to release dual/dual 4-way versions of Clovertown and Woodcrest but apparently the performance was worse than Tulsa.

I was not expecting Intel to run short of capacity in Q4 07 and Q1 08.

These seem strange because some here still claim that I favor AMD. For this to be true I should only be overestimating AMD and underestimating Intel. Yet, at times I underestimated AMD and overestimated Intel. Odd indeed.

Now let's look at other things I said:

Yonah would not have 64 bits and would not clock high. Check.

Sossaman would not be very impressive. This was true; it saw very little use.

AMD would convert FAB 36 to 65nm by mid 2008. They were ahead of schedule on this.

Extremely unlikely for Intel to begin 45nm production in Q3. Check.

45nm production solely from D1D in 2007. Check.

AMD had 2.4Ghz chips working in the lab. Check.

I refuted the notion that K10 wouldn't arrive until 2008. Check.

People kept claiming that AMD would begin 45nm production in FAB 38. I disagreed with this and said that it would be in FAB 36 first because AMD had created more room by building the bump and test addition. This was correct.

AMD's purchase of ATI would be beneficial long before Fusion. This seems to be true today and likely to improve even more soon with the new chipsets.

AMD will deliver some K10 desktop chips in Q4. This seems to be on track but don't know for sure until they are released.

I said that I believed that AMD would move up the 2.6Ghz K10 speeds from their release date of Q3 on the original roadmap. This definitely seems to be true.

AMD should be able to deliver 3.0Ghz K10s in Q1. Still not sure about this one. I've seen it suggested that AMD will have 2.8Ghz FX chips in Q4. Still have to see though.

I said some time ago that Tigerton wouldn't be that impressive. By the SPEC scores, it clearly is not. And we still need to see the performance/watt scores which will still get dented by the use of FBDIMM.

Some that we won't know for a while:

We now have some claiming that AMD's 45nm won't arrive until 2009. I definitely disagree. 45nm should be Q3 2008.

I'll also say again that AMD's Puma should be competitive for true notebooks. However, you'll probably need a low power Phenom for desktop replacement applications.

I said that Intel's costs for 45nm will be up a bit because of double patterning. This one may be hard to find out but can probably be inferred by margins once 45nm volume is up.

I've said that Nehalem will have more bandwidth than Shanghai but not lower production costs because the die won't be smaller. This is looking more and more to be true but we still need a Shanghai die to know. I also said that Nehalem will have a hard time with power draw because of FBDIMM.

Feel free to post any that I overlooked.

Scientia from AMDZone said...

axel

BTW, did you even bother to read your own link? It doesn't support your claim that FAB 30 will be indefinitely shuttered. It says:

"It is on track to shutter Fab 30, one of its large manufacturing lines at the Dresden, Germany facility, to assist in its transition from a 200mm to a 300mm wafer manufacturing process,"

Axel said...

Scientia

The cost savings of electricity is tiny compared to the cost of raw materials. Raw materials are of course not needed until new tooling is purchased. The truth is that new tooling purchase is the primary constraint.

Not quite correct. I was referring to OPEX and new tooling purchase is CAPEX. From Figure 5 in that link you can see that raw materials (supplies) comprise only some 20% of 300-mm fabrication operating cost. Electricity and cleanroom upkeep would presumably be in the 10% 'Direct-Labor' category, indeed not a major factor but certainly not tiny. I cannot speak for the veracity of that link but it is consistent with what I've read before, that equipment depreciation and yield losses are the main killers. Once the equipment has been purchased, it needs to be put to work immediately and fully. This is precisely why AMD are stalling on the Fab 30 upgrade. They do not believe the demand will be there to justify these fixed costs. According to AMD themselves, little if any actual production activity is now expected from Fab 38 in 2008. When AMD say 'modest activity', it means little or nothing.

However, it would be ridiculous to think that AMD would buy a $30 million scanner and then leave it unused. AMD may have plans to move a scanner from FAB 36 as they buy new 45nm immersion scanners or they may obtain a new one for FAB 38 but either way, they have to have one.

Any of the tooling is subject to rapid depreciation and it would indeed be foolish of AMD to leave such expended CAPEX unutilized. Who knows how AMD are managing this? The bottom line is that according to Hector himself, Fab 38 will see little, if any, action in 2008.

The truth is that AMD's volume share is only slightly below its all time high and is slightly above the 2006 average, this is in spite of the fact that market volume was up in Q3.

You are forgetting what's happened to their fab capacity in the meantime. The truth is that AMD have much more fab capacity today than they did a year ago (considering both the Fab 36 ramp and the 65-nm shrink), but they are serving less of the market than they were a year ago and they do not anticipate significant continuing gains in share. Hence the reason for shuttering Fab 30.

If you really think that Barcelona is so poor then you obviously have not looked at either the 2-way or 4-way SPECfp_rate scores.

SPECfp_rate is irrelevant for this fiscal-centered discussion. K10 will be poor from a revenue standpoint, because in the markets that matter for revenue (hint: not server) K10 significantly underperforms Yorkfield per clock, and does so with a significantly larger die and much lower yield. There are already quite a few benchmarks out there to support this position. The sooner you come down to reality from the clouds of marketing materials like the K10 Software Optimization Guide, the better for your logical thinking.

Feel free to post any that I overlooked.

I don't have time at the moment for this endeavour but several folks have taken good stabs at this over at Roborrat's blog over the last few months.

BTW, did you even bother to read your own link? It doesn't support your claim that FAB 30 will be indefinitely shuttered.

I never claimed that Fab 30 would be shuttered indefinitely, but expressed my personal belief that it will not produce CPUs through at least 2008.

Scientia from AMDZone said...

axel

You certainly take the prize for not reading your own links. The link you gave does not support what you said. You claimed that it said:

"according to Hector himself, Fab 38 will see little, if any, action in 2008."

Not quite. What it actually said was:

remarked Hector Ruiz, CEO of AMD. “I mean, we will be prepared to ramp that quickly, should we need extra capacity in that space; which frankly, we hope we do. But at this point in time, we're planning to have Fab 38 at modest activity in 2008.

Notice that the link actually supports what I said which is that FAB 38 will come online at low volume production. It disproves your contention that it will be shuttered throughout 2008. The statement clearly says that FAB 38 will be at a modest level even if they do not see extra demand (which they hope they do). As I've already explained, you can't ramp FAB 38 quickly unless you have a scanner and once you have a scanner you can't let it sit idle. It is possible that to minimize the cost AMD will move a scanner from FAB 36 since we know AMD will need to replace at least one with a TwinScan 1900i.

"The truth is that AMD have much more fab capacity today than they did a year ago (considering both the Fab 36 ramp and the 65-nm shrink), but they are serving less of the market than they were a year ago"

Really? Let's try the actual truth instead of your version. A year ago FAB 30 was at full capacity of 30K wspm and FAB 36 was not quite at 10K wspm. We double FAB 36's capacity because they are 300mm wafers. So, we get the equivalent of:

50K 200mm wafers/month

Right now FAB 30 is shut down so it is zero. FAB 36 is not quite 20K wspm. We double FAB 36 again because of the 300mm wafers. However, we also divide by 0.70 to allow for the 65nm shrink. So we get the equivalent of:

57K 200mm wafers/month

This would be 14% more capacity. I don't consider 14% to be "much more fab capacity" as you stated.
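
The arithmetic above can be re-checked with a short sketch. The function name is hypothetical, and the inputs round "not quite 10K" and "not quite 20K" up to 10K and 20K wspm as the comment does; the 2x factor for 300mm wafers and the 0.70 shrink factor are the ones stated above, not official AMD figures.

```python
def equiv_200mm_kwspm(fab30_kwspm, fab36_kwspm, shrink_factor=1.0):
    """Capacity in thousands of 200mm-equivalent wafer starts per month.

    FAB 30 runs 200mm wafers, so it counts as-is.
    FAB 36 runs 300mm wafers, counted here as two 200mm wafers each.
    Dividing by shrink_factor credits the extra dies from a process shrink.
    """
    return fab30_kwspm + (2 * fab36_kwspm) / shrink_factor


# A year ago: FAB 30 at 30K wspm, FAB 36 at roughly 10K wspm, no shrink credit.
year_ago = equiv_200mm_kwspm(30, 10)

# Now: FAB 30 shut down, FAB 36 at roughly 20K wspm on 65nm (0.70 factor).
today = equiv_200mm_kwspm(0, 20, shrink_factor=0.70)

gain_pct = (today / year_ago - 1) * 100
print(f"year ago: {year_ago:.0f}K, today: {today:.0f}K, gain: {gain_pct:.0f}%")
```

Under these assumptions the sketch gives the same 50K, 57K and 14% figures quoted above.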

However, in 2006 the small amount of 65nm production added to capacity (because of the shrink) whereas today the Barcelona production takes away (because of the larger die size). Factoring in this with the increase in total market volume gives AMD about half that much extra capacity (about 7%) and that will be sucked up very quickly as quad core production increases (AMD expects half of all server chips to be quad core within six months of Barcelona's release). The same thing happened in 2004 when ramping of the larger K8 die (twice the size of K7) caused AMD's volume to shrink compared to 2003. The only thing that offsets this today is that FAB 36 will continue to ramp in total volume to about 24K by mid 2008.

"but expressed my personal belief that it will not produce CPUs through at least 2008."

Which was disproved by both of the links you gave. Do you really not understand the difference between "modest level of activity" and "not produce CPUs at least through 2008"?

Scientia from AMDZone said...

axel

I feel like I'm beating a dead horse but your first link is not for costs associated with making chips; it is actually for costs associated with making the wafers themselves.

Also, I could point out that when you include facilities cost with 'Direct Labor' that would include the cost of personnel and I had already excluded laying off workers.

Greg said...

I think I'll also chime in and point out that at AMD's peak in demand in 2006 they were using Chartered to help meet their excess demand. If they can relieve supply constraints while minimizing outsourcing, they should increase margins, which putting Fab 38 online in 2008 would accomplish (though obviously not immediately).

GutterRat said...

scientia responded to axel with a recap of the hit/miss rate of scientia's predictions over the past 18 months.

I read some of Brent's replies and some correction/clarifications are warranted.

45nm production solely from D1D in 2007. Check.

Wrong. With Fab 32 45nm production already operational in 2007 you can't make this a 'check'.

I said that I believed that AMD would move up the 2.6Ghz K10 speeds from their release date of Q3 on the original roadmap. This definitely seems to be true.

If by 'move up' you mean 'push out' then agreed.

I said some time ago that Tigerton wouldn't be that impressive. By the SPEC scores, it clearly is not. And we still need to see the performance/watt scores which will still get dented by the use of FBDIMM.

Claiming "Tigerton not that impressive" as measured on SPEC (assume SPEC2006 FP rate) is disingenuous. Look at the Tigerton benchmark numbers and tell us again with a straight face... make sure to go through all the tabs.

Axel said...

Scientia

Right now FAB 30 is shut down so it is zero. FAB 36 is not quite 20K wspm.

Clearly I was referring to the recent point in time before Fab 30 was taken down, as my argument was pointing towards the reason for Fab 30 being taken down. You need to add 50% of Fab 30's capacity in there (as per Rivet's comments) to follow my argument. Using your figures, this gives a total of 57K + 15K = 72K wspm. Far more than AMD require.

Need to run to catch a plane.

Scientia from AMDZone said...

gutterat

I'm going to ask you the same question I asked Axel. Do you read your own links?

The TPC-C benchmark is classic. Intel compares a 4-way 2.93Ghz quad core Tigerton to a 4-way 2.8Ghz dual core Opteron. Surprise, surprise; the Tigerton with twice as many cores and a higher clock speed is twice as fast. The performance/watt numbers are similar. Since AMD's quad cores fit into the same TDP's as dual core they roughly double the performance/watt.

Looking at Intel's numbers I would say that Tigerton will be close but still behind Barcelona on both TPC-C and performance/watt compared at the same clock. Since Tigerton is currently faster, Intel should still have higher TPC-C until maybe Q1.

I'm frankly baffled by your attempt to claim that FAB 32 is producing chips. Even D1D will deliver almost no 45nm chips in 2007. In fact, even by January 2008 the 45nm volume will still be low. There is no doubt that Intel slipped its 45nm schedule.

enumae said...

Scientia
There is no doubt that Intel slipped its 45nm schedule.


If Intel's 45nm launch is equal to AMD's Barcelona launch, regarding availability, will you still say that Intel's 45nm schedule has slipped or is still low volume?

Scientia from AMDZone said...

axel

Your argument about FAB 30 would only be a reason for maintaining it at, say, 25% instead of 50%. There is nothing in your argument that explains why it would make sense to take it all the way down.

Scientia from AMDZone said...

enumae

I'm not going to play with semantics. This is very simple. Intel intended to have production available from both D1D and FAB32 in 2007. Intel fans claimed that 45nm would be out in Q3 (which didn't happen). Then Intel fans tried to claim that Intel would flood the market with 45nm in Q4 with production from two FABs. This also is not going to happen. Intel fans also tried to claim that Intel pushed the 45nm schedule forward from Q1 even though Tick Tock clearly says Q4. The bottom line is that Intel has slipped its Q4 schedule both with FAB32 and D1D.

I said before that Intel would not have enough 45nm chips to challenge AMD in Q4 except in servers. That seems to be correct.

Now, in terms of comparison between AMD and Intel there is no doubt (as I have already said) that AMD's K10 schedule slipped. However, I expect AMD to maintain its headstart and to ramp K10 faster than Intel ramps 45nm. My expectations will be wrong if Intel ramps faster. If Intel really is at 25% 45nm in March and AMD is not higher in K10 then I would say that AMD is falling behind more than Intel.

enumae said...

Scientia

A lot of what you say concerning FAB 32 is debatable, there is an article written by Mark Hachman that quotes...

"The first 45-nm production wafer rolled off the line several weeks ago."

John Pemberton, the Fab 32 plant manager during a presentation officially opening the new facility.

-----------------------------------

I said before that Intel would not have enough 45nm chips to challenge AMD in Q4 except in servers.

Maybe it is just me, but isn't it AMD that needs to be doing the challenging?

To be clear, Intel already has an advantage in Desktop and Mobile so why hurry or use 45nm for the segments that it already has a clear advantage in?

...If Intel really is at 25% 45nm in March...

When did Intel say they would be at 25% 45nm in March?

According to Intel's 45nm graph that you posted, they don't plan on being at 25% 45nm until about the end of May.

...and AMD is not higher in K10 then I would say that AMD is falling behind more than Intel.

When you say "not higher in K10", do you mean Dual-core, Quad-core or both Dual and Quad (K10 based) being greater than 25% of production?

Giant said...



Maybe it is just me, but isn't it AMD that needs to be doing the challenging?


This is true. Scientia seems to be ignoring the fact that Intel already has plentiful supplies of 65nm quad core CPUs. These CPUs stack up fine against Barcelona. The current Core 2 Quads will stack up fine against a Phenom quad core.

Since AMD has extremely limited supplies of quad core CPUs, it is they who need to challenge Intel, not the other way around. AMD, by their own admission, shipped "10s of 1000s" of quad core CPUs. This quarter it will be "100s of 1000s". That's still nowhere near enough to meet demand.

BTW: You might find these Phenom performance numbers interesting:
http://news.expreview.com/2007-10-29/1193590532d6599.html

But this is a game and performance is more based on the GPU. We will need to wait for a more detailed performance analysis later on.

AndyW35 said...

I woke up this morning to find Intel has moved forward its only desktop processor NDA expiry to today rather than mid November because nvidia are releasing their 8800GT. Is this to steal some of the limelight or is it Intel showing that they are getting more "pally" with nvidia and you do not need to await AMD's Spider?

Interesting whatever. The speed gains are the expected 5% on average (neglecting SSE4) but the power reduction is very nice. In the past you would still have to have paired it up with a gpu that requires a lot of power but the new 8800GT from nvidia seems to have lower power as well. Things are certainly heading in the right direction as AMD's gpu/cpu pair will repeat this feat I am pretty sure.

I would imagine a dual core 45nm Intel cpu and 8800GT would provide a pretty impressive power save over an equivalent E6600 with 8800GTS and be faster as well.

I think Intel have slipped slightly with their schedule; I'd have expected them to hope they could have got the XE out in late October and the rest of the range in December. I'm pretty sure their Q1 release will be closer to January 1st than March 31st so it is not too bad. Whether that has impact on Nehalem I am not sure, I still think that will slip as well ( although I hope Savantu is not reading that ).

Scientia from AMDZone said...

giant

Okay, it looks like the Phenom is getting hit twice in this test. If we allow for the underclocked memory (375Mhz instead of 400Mhz) then Phenom seems about equal to Intel's quads. However, Phenom only gets 5-5-7 timing instead of 5-5-5 like Intel's. Finally, the poor 2T command rate hurts Phenom a lot more. Roughly speaking, Phenom should be getting another 3 frames with proper memory. So, from this test it looks like Phenom is slightly faster at the same clock.

Axel said...

Scientia

Roughly speaking, Phenom should be getting another 3 frames with proper memory.

Complete speculation, and likely grossly inaccurate. We can see that despite 50% larger L2 cache, Yorkfield doesn't outperform Kentsfield per clock in this benchmark. Hence we can discard memory performance as a likely bottleneck. The most likely bottlenecks are GPU and then CPU, in that order.

Instead of gaining 3 fps in this bench from faster memory, Phenom is more likely to gain closer to a tenth of that, or 0.3 fps.

Axel said...

Furthermore, thanks to abinstein's recent insightful post on AMDZone, we can look at the detailed test results in the command prompt boxes for minimum FPS:

Kentsfield 3.0 - 29.69
Yorkfield 3.0 - 28.86
Phenom X4 3.0 - 8.28

Clearly, there is a major CPU bottleneck on the Phenom system for this game. This further supports my prediction that Yorkfield/Wolfdale will simply trounce Phenom X4/X2 on the desktop next year and will force AMD to price Phenom significantly lower at the same clock. Look for another year in the red.

Also it's strange that Yorkfield underperforms Kentsfield here. I would normally attribute that to normal benchmark accuracy skew but in this case it is duplicated four times. So there's something not quite right on the Yorkfield platform here, since we know from the slew of QX9650 benchmarks that came out today that Yorkfield NEVER underperforms Kentsfield per clock and typically comes out 5-10% ahead.

sharikouisallwaysright said...

This Crysis test was done with XP and all default settings, including the demo.

Vista is out nearly a year and should be the primary OS for all testing now.
This default-thing means what?
I am confused how one has to understand this...

So far Intel has not yet made a very great jump compared to its older quad cores and it's up to AMD to close the gap.

One will see...

Giant said...



Vista is out nearly a year and should be the primary OS for all testing now.


Many people still run XP. Games do run faster on XP. Perhaps it makes sense to test on both XP and Vista.


If we allow for the underclocked memory (375Mhz instead of 400Mhz)


This does seem to be common with AMD CPUs, due to the odd CPU multipliers and memory dividers with Socket AM2.

TomsHardware has a nice little page on this:

http://www.tomshardware.com/2006/05/23/amd_reinvents_itself/page9.html

Running at 3Ghz there's no way to avoid this odd memory clock. You'd need clockspeeds of 2.8Ghz or 3.2Ghz for a full DDR2-800.
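
A minimal sketch of the divider arithmetic behind that claim, assuming (as the linked page describes) that the Socket AM2 memory clock is the CPU clock divided by a whole-number divider, rounded up so the memory never runs above its rated speed. The function name is hypothetical and the model ignores any chipset or BIOS specifics.

```python
import math


def am2_memory_clock_mhz(cpu_mhz, target_mem_mhz=400):
    """Memory clock reached on an integer-divider memory controller.

    target_mem_mhz=400 corresponds to DDR2-800 (400MHz clock, double data rate).
    The divider is rounded up, so the result is at or below the target.
    """
    divider = math.ceil(cpu_mhz / target_mem_mhz)
    return cpu_mhz / divider


for cpu_mhz in (2800, 3000, 3200):
    print(cpu_mhz, am2_memory_clock_mhz(cpu_mhz))
```

Under this assumption 3000MHz needs a divider of 8 and lands on 375MHz, while 2800MHz and 3200MHz divide evenly to a full 400MHz.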

However, Phenom only gets 5-5-7 timing instead of 5-5-5 like Intel's. Finally, the poor 2T command rate hurts Phenom a lot more.

I do agree with you here. The AMD system should be running with correct 5-5-5-15 timings. Since the ram wasn't even running at full speed with a 3Ghz clockspeed, perhaps even 4-4-4-12 timings would be possible.

We'll see the full performance results next month.

Erlindo said...

I'm so baffled to see how hypocritical axel is!

He claims that something is wrong with the Yorkfield platform, but won't say the same for the AMD platform. :D

...And talking about a tenth of fps, please take a look here before saying such nonsense:

Memory Latencies Do Matter

Seems that with a fully working processor (non engineering sample) and the appropriate platform, Phenom is equal/better clock per clock than Penryn (on games). We'll have to see how it fares on other apps.

Ho Ho said...

"Memory Latencies Do Matter"

From 3-3-3-9 1T to 5-5-5-15 2T you lose 6% performance. Performance difference in Crysis between QX9650 and OC'd X4 was ~7.5%.

Also remember that Intel also benefits from better RAM timings. Not as much as AMD but enough to make a difference. Another thing that can give Intel more performance is to run RAM and FSB at "nice" dividers. E.g. they used 5:6, which isn't exactly too good.

Erlindo said...

ho ho wrote:...Performance difference in Crysis between QX9650 and OC'd X4 was ~7.5%.

...and this is what I've said:
Seems that with a fully working processor (non engineering sample) and the appropriate platform (and by this I mean good memory timings AND a working BIOS for RD790), Phenom is equal/better clock per clock than Penryn (on games).
:)

ho ho, sorry to ask you this but I have the following question for you:
¿Why do you love intel so much?
¿Why do you always defend them so much?

If you do love intel the way you do it, why not post on an intel blog?

Thanks in advance.

Ho Ho said...

Erlindo
"non engineering sample"

What do you estimate the speed difference could be? How big a per-clock speed difference has there been between ES chips and retail ones when benching <1 month before launch?

Phenom is about to launch and if a few weeks before it there are only ES chips then things don't look all that good, especially considering how long ago Barcelona was launched.

Perhaps you also believe that there is some mystical bug in K10 that makes it this slow. I highly doubt that, as the architecture simply cannot be much faster than it currently is. All that talk about "40%+ better performance in variety of workloads" was pure marketing talk that defined "wide variety of workloads" as stuff that purely depends on memory bandwidth, where AMD already had (and still has) a 40%+ performance lead over Intel.


"appropriate platform"

Well, K10 was supposed to be a "simple drop-in replacement with a BIOS patch". Are you saying things are not as simple and the patch takes months to become good enough? How big performance differences have there been in CPU speeds on AMD platform? I haven't really researched it but considering that CPU+RAM are relatively closed system I doubt differences could be big.

Or perhaps you are saying that current AM2 isn't really good and AM2+ should be used. Well, that certainly makes the point about longer socket life moot.


"equal/better clock per clock than penryn (on games)."

Well, it could but nothing said it will. Also it will first have to actually reach similar speeds, preferably without overclocking. Also it will be really tough to beat Penryn in thermals as their 3GHz quad seems to take way less power than its TDP shows. Of course in servers FBDIMM will offset most of it.


Another point worth considering is that Abinstein and Scientia have said that FSB is a bottleneck with >=2.4/2.6GHz quads. That means with lower clock speeds Intel will be even more competitive than at 3GHz.



"¿Why do you love intel so much?"

I love performance and currently it is Intel that offers more of it for given amount of money, I couldn't care less about what logo is on the IHS. It is not my fault that AMD has failed to deliver.


"¿Why do you always defend them so much?"

Is it bad that I talk about facts? This blog is rather biased and few others bother doing it when it comes to Intel. On the AMD side there are lots more people who write mostly accurate stuff and I don't feel that my input would matter as much.

Also I don't like when people try to spin stuff to make things look better/worse than they actually are. This talk about how memory is crippled on AMD is a perfect example of this.


"If you do love intel the way you do it, why not post on an intel blog?"

First, I don't love any company. I just use their products. Secondly why should I post anything on any other site if I'm trying to comment on things talked about in this one?

Hornet331 said...

http://my.ocworkbench.com/bbs/showthread.php?t=68503

3dmark score of 2ghz Phenom vs 2ghz QX9650

the 9650 wins by approx 1000 points.

Axel said...

erlindo

Seems that with a fully working processor (non engineering sample) and the appropriate platform, Phenom is equal/better clock per clock than Penryn (on games).

Ho Ho

From 3-3-3-9 1T to 5-5-5-15 2T you lose 6% performance. Performance difference in Crysis between QX9650 and OC'd X4 was ~7.5%.

No, I'm sorry but you gents aren't thinking this through. The Crysis scores revealed in the command prompts for four timedemos were:

- minimum fps
- average fps
- peak fps

Now, if you look at the scores, Yorkfield 3.0 beats Phenom X4 3.0 by:

- 250% in minimum fps
- 7.5% in average fps
- 4.6% in peak fps
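
The minimum-fps figure can be reproduced from the numbers quoted earlier in the thread (Yorkfield 28.86, Kentsfield 29.69, Phenom X4 8.28). This is only a sketch of the percentage arithmetic; the helper name is hypothetical, and the average and peak scores are not quoted in the thread, so they are not recomputed here.

```python
def percent_lead(faster_fps, slower_fps):
    """How much higher faster_fps is than slower_fps, in percent."""
    return (faster_fps / slower_fps - 1) * 100


# Minimum fps values as quoted from the command prompt screenshots.
yorkfield_min = 28.86
kentsfield_min = 29.69
phenom_min = 8.28

print(f"Yorkfield over Phenom:  {percent_lead(yorkfield_min, phenom_min):.1f}%")
print(f"Kentsfield over Phenom: {percent_lead(kentsfield_min, phenom_min):.1f}%")
```

The Yorkfield result comes out just under 250%, which matches the rounded figure in the list above.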

A couple conclusions can be drawn from these numbers:

1. The fact that the Yorkfield and Kentsfield scores are about identical for all three categories indicates that the game is not memory bottlenecked at all. If it were, Yorkfield's 50% larger L2 would help significantly.

2. The fact that all four CPUs score within 5% of each other for peak fps indicates that the peak framerate is largely bottlenecked by the GPU, not the CPU.

3. Since the difference in average framerate (7.5%) is much closer to the difference in peak (4.6%) than to the difference in minimum (250%), this indicates that the demo is GPU bottlenecked for most of its duration.

4. The huge 250% difference in minimum fps indicates that the game does occasionally see a CPU bottleneck on Phenom, but not too often.

We can logically conclude that the Crysis timedemo as run in that test was GPU bottlenecked over the great majority of the duration, but for those few moments where the CPU is the bottleneck, Phenom X4 is completely destroyed by the Intel processors. It is because those moments are few and far between in the demo that the average fps scores are fairly close between the four CPUs.

I believe that K8 3.0 GHz would score nearly the same as Phenom in this demo. It would certainly score about the same peak fps, as that score is clearly GPU bottlenecked. The minimum fps would be perhaps about 15% lower than Phenom, but as I described, those CPU bottlenecked moments are rare. The average fps would hence likely only be some 3-5% lower for K8 than Phenom, say around 44.5 fps.

Greg said...

Axel, your link isn't working (as in, the page loads, but there's no content other than a border and some buttons).

Also, I'm sorry, but such a drastic drop in framerate can't have anything to do with the processor, especially by your analysis. I realize that it's most likely the only factor that's being changed (I can't tell though, since I can't see the page), but a 250% difference in the ability of one processor to handle a heavy load that is not sse4 (because I don't know how they'd be able to implement that in any part of crysis) can basically only show error in the testing or poorly written code. Not even a k8 could show that much difference (at 3 ghz), from the testing I've done, and it has 2 fewer cores, on an app that is decently threaded.

Being that I'll assume you analyzed the testing well, axel, and found that they didn't test poorly, I'd have to assume that this is yet another result of the poor polish generally found in crytek products, in terms of consistency in optimization. They are fairly well known for this, and most testers have commented on the fact that the code in crysis is horribly optimized, and they're pretty much all using kentsfields with 8800s, so it's not like they're straining the bounds of what hardware crytek will support well.

I'd also have to assume there are much much much better benchmarks out there that have been around long enough, like supreme commander. In fact, I'm very confused as to why this wasn't used as a benchmark, and why new, technically unfinished software from a company known for poorly optimized code (though not for lack of trying, as their products are obviously amazing) was used instead.

Giant said...

greg obviously wants to see different benchmarks.

Here's SuperPi for you:

Phenom 3Ghz: http://www.expreview.com/img/news/071030/k10_3g_superpi.png

Here is Q6600, overclocked to 3Ghz with 1333 FSB (so same settings as a QX6850 by default): http://img.photobucket.com/albums/v30/johnli0615/SuperPiOC30.jpg

I too have a Q6600 @ 3Ghz. I get 17.4s as my time.

BTW: With regards to the Crysis demo, it is indeed a very demanding game. The graphics are amazing though. With my Q6600 and an 8800 GTS video card I can run the game at 1920x1200 with most of the ingame settings at high, with a few (Shadows, shaders etc.) at medium.

Never before have I seen a Geforce 8800 video card butchered in this manner!

Ho Ho said...

greg
"Axel, your link isn't working (as in, the page loads, but there's no content other than a border and some buttons)."

See if there is a scrollbar on the bottom of your browser.


"... or poorly written code"

CPUs must run all kinds of code, and that includes poorly written code.


"and it has 2 fewer cores, on an app that is decently threaded"

Considering that dualcores are on par with quadcores in Crysis I wouldn't say it is too well threaded.


"poor polish generally found in crytek products. They are fairly well known for this, and most testers have commented on the fact that the code in crysis is horribly optimized"

Is that common knowledge? Where does that information come from?

Axel said...

Greg

Axel, your link isn't working (as in, the page loads, but there's no content other than a border and some buttons).

Using Firefox? I read somewhere that the page doesn't load properly in Firefox, only in IE.


They are fairly well known for this, and most testers have commented on the fact that the code in crysis is horribly optimized...

Whatever the reason, the bottom line is that when the CPU is the bottleneck in Crysis, Phenom X4 is much slower per clock than any of the Intel processors tested.

Based on this, the SuperPi results, and the various desktop-related Barcelona benches already revealed by Anandtech & Tech Report, I am confident that when the clock-for-clock desktop & gaming comparisons come out next month with the Phenom launch, Phenom X4 will be embarrassed by Kentsfield and completely buried by Yorkfield.

I'm not sure why so many folks expect Phenom to be significantly faster per clock than Barcelona, just wishful thinking I guess. The bottom line is they're practically identical and a lot of people are going to be eating heaping platefuls of crow and humble pie next month.

Scientia from AMDZone said...

axel

"This further supports my prediction that Yorkfield/Wolfdale will simply trounce Phenom X4/X2 on the desktop next year. . .

Also it's strange that Yorkfield underperforms Kentsfield here."


Naturally you assume the Yorkfield scores are wrong but insist that the Phenom scores are correct in the same test. Double standards?

Axel said...

Scientia

Naturally you assume the Yorkfield scores are wrong but insist that the Phenom scores are correct in the same test. Double standards?

Note that after mentioning the anomaly, I took both the Yorkfield and Phenom numbers at face value in my calculations. And I'd already addressed the Phenom memory performance penalty by estimating a ~0.3 fps improvement in average framerate from using faster memory. So there are no double standards here.

I make no claim for the veracity of this individual benchmark, but I note that this is another piece in the quickly mounting evidence to support my prediction that in desktop performance on average, Phenom X4 (with fast DDR2 and HT3) will underperform Kentsfield by 10-15% per clock and Yorkfield by 15-25% per clock in CPU-bound scenarios.

Greg said...

Sorry axel, can't run IE (cause I don't feel like setting up wine for a product I'll only use this once).

Hoho, your point about what code cpus must run is valid, but my point is not that crysis shouldn't ever be used for performance estimates, but that we should wait until the product has been out for some time.

Also, anyone who played Far Cry would know about the multitude of issues it experienced on random video cards. It was eventually fixed and became a standard for benchmarks, but it wasn't initially for a reason.

Axel, you're right that the cpu is the bottleneck. My point was not so much that the cpu wasn't, but that it shouldn't be. As for threading, the game is decently threaded, as the developers state. I realize I've pointed out I don't trust the quality of their coding, but their threading scheme (last time I checked) was a good one, and should balance the load across the cores fairly evenly up until 4 cores.

Also, what phenom benchmarks? Tech Report and Anand both used Barcelona 2P procs, which, while having the same core, ran at a lower clock speed and had much slower memory. A single phenom processor will take much more advantage of higher clocks and faster memory than an Intel processor would.

Also, hoho, isn't amd no longer supporting 3D Now? Regardless, my 3800+ @ 2.8 ghz had a better score than that phenom, so I highly doubt that's an accurate score.

Axel, while I don't believe phenom will be significantly, if any faster than yorkfield, I'm very confused as to how you could extrapolate 25% lower performance per clock from the data currently available.

Erlindo said...

Scientia wrote:
Naturally you assume the Yorkfield scores are wrong but insist that the Phenom scores are correct in the same test. Double standards?


Yeah, he's a hypocrite.

ho ho wrote:
What do you estimate the speed difference could be? How big per-clock speed difference has there been between ES chips and retail ones when benching <1 months before launch?

Phenom is about to launch and if a few weeks before it there are only ES chips then things don't look all that good, especially considering how long ago Barcelona was launched.

Considering the possibility that this engineering sample could be the same stepping with which Barcelona debuted a few weeks ago (BA-B0), then this test shouldn't be taken seriously. Even the CPU-Z screen shot doesn't recognize the processor appropriately.

...Perhaps you are also believing that there is some mystical bug in K10 that makes it as slow. I have high doubts in that as the architecture simply cannot be much faster than it currently is
Scientia was one of the few that noticed such anomaly and all the evidence we've seen so far shows that the processor isn't performing the way it should. I guess with more revisions/steppings and compiler optimizations, things will look a lot better. My personal thought about this issue is that AMD rushed K10 just for sake of not being hammered by the financial community.

All that talk about "40%+ better performance in variety of workloads" was pure marketing talk that defined "wide variety of workloads" as stuff that purely depend on memory bandwidth where AMD already had (and still have) 40%+ performance lead over Intel.
I guess we can also apply the same logic to intel about All that talk about 40+% better gaming when in fact Penryn only offers 0-6% increase in "overall" gaming.
Deceitful intel
Please, don't become like Axel. ;)

Well, K10 was supposed to be a "simple drop-in replacement with a BIOS patch". Are you saying things are not as simple and the patch takes months to become good enough?
Let's be sincere here: People who're going to buy quad Phenom are also gonna buy a new RD-790 motherboard to take full advantage of the processor's features.
Current AM2 mobos are ok, but not enough.

Well, it could but nothing said it will. Also first it will have to actually reach similar speeds, preferably without overclocking. Also it will be really tough to beat Penryn in thermals as their 3GHz quad seems to take way less power than its TDP shows.
At last I agree with you here, but as I said before, the processor will get better with more steppings.
I really don't care about having the fastest processor (MHz talking), for me, performance per clock is all. The rest will come in addition. ;)

I love performance and currently it is Intel that offers more of it for given amount of money, I couldn't care less about what logo is on the IHS. It is not my fault that AMD has failed to deliver.

Even if AMD would offer better performance, you'll always favor intel. Your posting patterns don't back you up.
It's like a homo trying to convinve his self that he's a man and he likes women, but the hard fact is that he's gay.(let's get things straight here: I love women) :D
I have no problem admitting that I favor AMD over intel a million times. ;)

First, I don't love any company. I just use their products. Secondly why should I post anything on any other site if I'm trying to comment on things talked about in this one?

Judging by your posting patterns, you can always join Toms Hardware or GutterRat's Blog. Believe me, you guys will be a happy family... ;)

Axel said...

Greg

Axel, while I don't believe phenom will be significantly, if any faster than yorkfield, I'm very confused as to how you could extrapolate 25% lower performance per clock from the data currently available.

In a previous blog entry I had summarized the results from the Tech Report review. The numbers below represent how much higher/lower K10's IPC is compared with Clovertown on the 2.0 GHz test systems.

SPECjbb: 2.1% higher
Valve VRAD: 13.1% lower
Cinebench sngl: 16.3% lower
Cinebench mult: 10.7% lower
POVRay chess: 3.8% higher
POVRay bench: equal
MyriMatch 1 th: 9.1% lower
MyriMatch 8 th: 1.8% higher
STARS 1 th: 29.2% lower
STARS 8 th: 13.3% lower
Folding avg: 3.0% higher
Panorama Fact: 12.9% lower
picCOLOR: 20.0% lower
WME encoding: 6.3% lower
Sandra Mult Int: 47.8% lower
Sandra Mult FP: 11.6% lower

So this shows that K10 is on average some 10%-15% lower in IPC than Clovertown.
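As a rough check on that summary, the sixteen figures listed above can simply be averaged. This is only a sketch: it assumes every benchmark gets equal weight and that "equal" counts as 0.0, neither of which is stated in the comment, and the variable names are made up for illustration.

```python
from statistics import mean, median

# K10 vs Clovertown IPC difference in percent, copied from the list above.
# Negative = K10 lower. "equal" is entered as 0.0 (an assumption).
ipc_delta_pct = {
    "SPECjbb": 2.1,
    "Valve VRAD": -13.1,
    "Cinebench sngl": -16.3,
    "Cinebench mult": -10.7,
    "POVRay chess": 3.8,
    "POVRay bench": 0.0,
    "MyriMatch 1 th": -9.1,
    "MyriMatch 8 th": 1.8,
    "STARS 1 th": -29.2,
    "STARS 8 th": -13.3,
    "Folding avg": 3.0,
    "Panorama Fact": -12.9,
    "picCOLOR": -20.0,
    "WME encoding": -6.3,
    "Sandra Mult Int": -47.8,
    "Sandra Mult FP": -11.6,
}

# Equal-weight mean and median over all sixteen results.
mean_delta = mean(ipc_delta_pct.values())
median_delta = median(ipc_delta_pct.values())

print(f"mean:   {mean_delta:.1f}%")
print(f"median: {median_delta:.1f}%")
```

Both the mean and the median land inside the 10%-15% range quoted above, so the summary does not hinge on the single Sandra Mult Int outlier.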

And then Anandtech compared Barcelona to a pair of dual-core K8s, in the same motherboard using the same memory. You can see that Barcelona on average was about 15% faster than K8 per clock. Note that the two K8 processors had to communicate between two sockets, introducing some latency. So in reality K10's per core per clock advantage is slightly less than that 15%. But let's assume 15%.

Now we know that on the desktop, a 3.0 GHz K8 is on average roughly as fast as a 2.33 GHz C2D on the 1333 FSB. This gives the C2D a 28% clock-for-clock advantage on the desktop: If we assign the 2.33 GHz C2D a score of 100, a 2.33 GHz K8 would score 100*2.33/3.0 = 78. 100/78 = 1.28 or 28% IPC advantage.

Since K10 has about a 15% IPC gain over K8, 2.33 GHz K10 would score 90 in that same test. So C2D would be 100/90 or about 11% faster than K10 per clock.

Axel said...

To continue: Yorkfield has a 5-10% gain over Kentsfield in IPC, so it would score 105-110 or let's say 107 on average. 107/90 = 1.19. So we can say that on the desktop, Yorkfield should have about a 19% IPC lead over K10.
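The chain of estimates in the last two comments can be written out in a few lines. This is a sketch of the same arithmetic, not a benchmark: it assumes performance scales linearly with clock speed, uses the 15% K10-over-K8 figure and a 7% Yorkfield-over-Kentsfield midpoint as given above, and the names are illustrative only.

```python
# Inputs taken from the comments above.
C2D_CLOCK_GHZ = 2.33        # C2D on 1333 FSB, assigned a score of 100
K8_CLOCK_GHZ = 3.0          # K8 clock that roughly matches that C2D
K10_OVER_K8 = 1.15          # assumed K10 per-clock gain over K8
YORKFIELD_OVER_C2D = 1.07   # midpoint of the 5-10% gain over Kentsfield

c2d_score = 100.0
# Scale K8 down to 2.33 GHz, assuming score is proportional to clock.
k8_score = c2d_score * C2D_CLOCK_GHZ / K8_CLOCK_GHZ
k10_score = k8_score * K10_OVER_K8
yorkfield_score = c2d_score * YORKFIELD_OVER_C2D


def lead_pct(faster, slower):
    """Per-clock lead of `faster` over `slower`, in percent."""
    return (faster / slower - 1.0) * 100.0


c2d_over_k8 = lead_pct(c2d_score, k8_score)
c2d_over_k10 = lead_pct(c2d_score, k10_score)
yorkfield_over_k10 = lead_pct(yorkfield_score, k10_score)

print(f"C2D over K8:        {c2d_over_k8:.1f}%")
print(f"C2D over K10:       {c2d_over_k10:.1f}%")
print(f"Yorkfield over K10: {yorkfield_over_k10:.1f}%")
```

Without rounding the intermediate scores to 78 and 90, each lead comes out up to about a point higher than the 28%, 11% and 19% quoted, which is well within the uncertainty of the inputs.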

Peter said...

Scientia, what is your view on the latest QX9650 using less than 65W under full load?

Weren't you saying Intel was having trouble scaling past 3GHz @ 45nm because of the supposed 130W TDP?

It looks like they have a lot of clockspeed headroom to spare, and are in fact deliberately limiting the clockspeed because there is simply NOTHING from AMD that even remotely comes close in performance.

Ho Ho said...

erlindo
"Considering the possibility that this engineering sample could be the same stepping with which Barcelona debuted a few weeks ago (BA-B0), then this test shouldn't be taken seriously."

Considering the high clock speed of the Phenoms it is quite certain these are not the same revision as currently sold Barcelonas. If they are then where are all the real ones? It is two weeks before launch and there are still none on the move?


"Even the CPU-Z screen shot doesn't recognize the processor appropriately."

That only proves my point. ES chips have been out for a long time and CPUz should know them.


"Scientia was one of the few that noticed such anomaly and all the evidence we've seen so far shows that the processor isn't performing the way it should."

And how should it perform? Better? Sure but how much better and what would be the bottleneck holding it back? K10 cache hierarchy had quite a big hit on its memory latency, there is not too much that can help with that.


"I guess with more revisions/steppings and compiler optimizations, things will look a lot better"

... for both of them. Intel can benefit a lot when developers start using SSE4. There is no such "magic bullet" for K10. Pretty much everything that gets optimized for K10 will also get faster on Core2/Penryn too.


"My personal thought about this issue is that AMD rushed K10 just for sake of not being hammered by the financial community."

That's also what I think but I doubt they can do much else but to gradually increase the clock speed to make things a bit better than they are today. That won't help them too much as Intel has massive clock speed headroom.


"I guess we can also apply the same logic to intel about All that talk about 40+% better gaming when in fact Penryn only offers 0-6% increase in "overall" gaming."

It has been a known fact that Penryn needs special codepath to reach that much better performance and I've never said anything else. Also there is a difference between "up to X% better" and "over X% better in wide variety of workloads". To me it seems that their "wide variety" isn't wide at all.


"Let's be sincere here: People who're going to buy quad Phenom are also gonna buy a new RD-790 motherboard to take full advantage of the processor's features."

I agree with that and it only shows that the backwards compatibility is not as important as many people are trying to say.


"I really don't care about having the fastest processor (MHz talking), for me, performance per clock is all."

For me it is actually performance per buck. Intel with its wide variety of different performing CPUs can offer much more pricepoints than AMD currently can. Hopefully Phenom brings a small pricewar at <=2.5GHz quads soon, I doubt it can influence higher clocked CPUs much. Of course as I also OC my CPUs AMD doesn't offer too much in that regard, unfortunately.


"Even if AMD would offer better performance, you'll always favor intel. Your posting patterns don't back you up."

What exactly makes you say that? The fact I preferred P4D to X2? Have you really forgotten the story behind that?


"I have no problem admitting that I favor AMD over intel a million times. ;)"

Well, I certainly have no such prejudice


"Judging by your posting patterns, you can always join Toms Hardware or GutterRat's Blog."

I couldn't care less about THG and GutterRat's blog is simply something I read to amuse myself. Kind of like this one here. If I really want to talk about some technical stuff I go to either B3D or to one Estonian forum, at least there are very few diehard fanboys.



I wonder if I should look up the predictions about K10 I made months ago here. I said it will likely be a little bit ahead in FP intensive workloads, behind on integer loads and still ahead on memory bandwidth intensive loads. It seems my predictions were much closer than what Scientia and abinstein claimed back then.

Scientia from AMDZone said...

giant

It only took me a few minutes to rule out superpi as a reliable test. From what I've seen it is only useful for comparing processors within the same family. In other words, you can't use it to compare K8 with K10 or P4 to C2D.

Scientia from AMDZone said...

axel

"I'm not sure why so many folks expect Phenom to be significantly faster per clock than Barcelona"

Who is expecting this?

Scientia from AMDZone said...

Peter

"Scientia, what is your view on the latest QX9650 using less than 65W under full load?"

Looks normal to me. Is there some reason it shouldn't?

"Weren't you saying Intel was having trouble scaling past 3GHz @ 45nm because of the supposed 130W TDP?"

No, I never said that. In fact, I specifically said that TDP wasn't the problem. Testing at Anandtech showed that Kentsfield at 2.93Ghz was marginal on temperature with a stock HSF. Marginal in this case means that it exceeded Intel's own specs. In other words, a Kentsfield with stock HSF ran too hot at 2.93Ghz. This problem was probably fixed however with the G0 stepping.

"It looks like they have a lot of clockspeed headroom to spare, and are in fact deliberately limiting the clockspeed because there is simply NOTHING from AMD that even remotely comes close in performance."

Yes, I know that this is the fantasy of Intel enthusiasts who were rationalizing Intel's inability to deliver a 65nm 3.2Ghz chip in 2006 (or even 2007). The testing done by OC'ers however supports the fact that Intel didn't catch up to its D1D quality samples until G0. Secondly, Intel's slipping 45nm schedule indicates that Intel is having some rough spots there too.

However, I'm pretty certain you will find ways to disregard what I've just said and keep marching to the tune, "Intel, Intel ueber alles".

Aguia said...

Who is expecting this?

Me for example.
In some applications I expect some 10% performance boost, thanks to the faster memory, HT3 and faster L3.
(I said in some!)

Scientia from AMDZone said...

The testing we've seen with Barcelona so far has been very poor. Although there was a great deal of whining that AMD sent out samples too late there was nothing to stop these same review sites from doing a proper job of testing in October but none bothered. It is clear that they are now waiting until Penryn and Phenom are available. I tend to wonder though if the samples that Intel sends out will be unavailable until early Q1. We also saw that AMD had sent out some early samples of its own so it is possible that AMD could do the same thing. So, when we do finally get proper testing it could easily be for Q1 chips from both companies.

On the subject of a bug in K10. K10 has specific changes in architecture that should make it much faster than K8 in SSE and somewhat faster in Integer. If these speed increases are not being seen then there must be a problem. As I've said before there are three possibilities:

1.) K10 has a design flaw that is so severe that it will not be corrected which means nothing faster until Bulldozer.

2.) K10 has a design flaw that won't be fixed until Shanghai.

3.) K10 has a bug that will be fixed before Shanghai.

AMD would have no reason to talk about such a bug nor would it show up on any errata. The only way we would ever know is if a future version of K10 is suddenly faster per clock.

Axel said...

Scientia

As I've said before there are three possibilities:

1.) K10 has a design flaw that is so severe that it will not be corrected which means nothing faster until Bulldozer.

2.) K10 has a design flaw that won't be fixed until Shanghai.

3.) K10 has a bug that will be fixed before Shanghai.


You forgot the fourth possibility you had previously mentioned and that I believe is the most likely one. The changes to the K10 architecture require optimizations in software compilation different from the optimizations for C2D. Unfortunately, most applications (and benchmarks) will be compiled to optimize for C2D rather than K10.

Greg said...

Axel, you still ignore Barcelona's scaling as clock speeds increase and also ignore the fact that the memory will be substantially different, which will drastically affect the scores (combined, that is).

Also, deriving performance numbers indirectly is an inherently flawed form of reasoning. Direct comparison is the only thing that's really worthwhile.

Ho Ho said...

"Also, deriving performance numbers indirectly is an inherently flawed form of reasoning."

So what is all that talk you wrote before that sentence?

Also how many links have you got about Barcelona scaling? Just today I saw something on XS forums where from 2.0->2.5GHz Barcelona scaling was in no way different to K8 scaling.

Aguia said...

What I think amazing is that AMD with just 4MB total cache (2.5MB for one core max) VS Intel 12MB (6MB for one core max) and without SSE4 which Intel is already using in some of its demos/tests still performs so great.

Being "just" 4% slower than Intel at the same clock in games is simply amazing.

According to the axel links it’s 12% but since there are a lot of stupid tests there, when removing those it’s "just" 6% slower in worst case scenarios.

Aguia said...

Does any one have any explanation for the huge power consuming differences in the Intel QX9650 reviews from toms and x-bit?

Toms
idle 3.79 Watts
Full load 73 Watts

Tomshardware

X-BIT
Idle 15.2 Watts
Full load 89.8 Watts

X-Bitlabs


Does any one have another link that corroborates any of the results?

Mo said...

"What I think amazing is that AMD with just 4MB total cache (2.5MB for one core max) VS Intel 12MB (6MB for one core max) and without SSE4 which Intel is already using in some of its demos/tests still performs so great."


What I think is amazing is that Intel, still using the FSB and still using the glued method, VS AMD with its IMC and HT and monolithic quad, is 6% FASTER.

I guess the FSB and MCM are not that bad after all.

Greg said...

Hoho, I don't see the conflict. My first comment referred to estimating whether or not we should expect any differences between the comparison between barcelona and clovertown and the comparison between phenom and kentsfield/penryn. My second comment referred to trying to extrapolate specific performance percentage difference based on another piece of hardware's performance compared to another piece of hardware's performance etc... The method is inevitably inaccurate.

Greg said...

Also, mo, the difference is amazing considering the vast difference in Intel's R&D budget and AMD's and also the differences in their overall revenue/product line.

A company that's been allowed to engulf as much of the industry as Intel has is not likely to ever be overthrown. Somewhat similar to how you statistically can't out-earn someone by investing if they have more money than you.

Ho Ho said...

aguia
"Does any one have any explanation for the huge power consuming differences in the Intel QX9650 reviews from toms and x-bit?"

Isn't it obvious they are using completely different measurement methods?

abinstein said...

Mo... FSB & MCM are not bad for desktop apps.

Aguia said...

Isn't it obvious they are using completely different measurement methods?

If that’s the case then ANY reported power consuming numbers from ANY site can’t be trusted.

Ho Ho said...

abinstein
"Mo... FSB & MCM are not bad for desktop apps"

FSB does increase memory latency and in theory MCM has worse latency between dies. In reality though it seems as die to die latency is pretty much equal to latency between any barcelona cores. Of course with higher clock speeds things will change a bit. Then again FSB 1600 will also help.



aguia
"If that’s the case then ANY reported power consuming numbers from ANY site can’t be trusted."

I'd say that measurements from wall socket are quite comparable for as long as the test systems are identical (except for the measured parts). Of course comparing different reviews is pretty much impossible that way.

Still the fact remains that so far every single review of Penryn has shown a massive drop in idle and load power usage compared to older models.

Scientia from AMDZone said...

axel

"You forgot the fourth possibility you had previously mentioned and that I believe is the most likely one. The changes to the K10 architecture require optimizations in software compilation different from the optimizations for C2D. Unfortunately, most applications (and benchmarks) will be compiled to optimize for C2D rather than K10."

Good point. However, I've seen benchmarks which used PGI 7.1 without seeing the big FP improvements. So, I'm still wondering.

Scientia from AMDZone said...

mo

"I guess the FSB and MCM are not that bad after all."

The large cache and prefetching hide the latency. Also, the dual and quad FSB northbridges work well with a message filter. However, this works best with benchmark code that fits into cache. It would be nice to see more loaded testing to see which architecture has a flatter performance curve.

Scientia from AMDZone said...

aguia

The difference is because Xbit measured the power behind (including) the voltage regulator while Tom's measured it in front of (excluding) the voltage regulator. Xbit specifically says that they used identical boards (with the same voltage regulator) to make sure that this component of the power draw was the same.
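For a sense of scale, the gap between the two sets of figures quoted earlier in the thread can be computed directly. Treat this as a loose sketch: it assumes the voltage regulator is the only difference between the two measurements, which is unlikely to hold exactly since the reviews used different test systems and loads. The dictionary names are illustrative.

```python
# CPU power figures in watts, as quoted earlier in this thread.
toms_w = {"idle": 3.79, "load": 73.0}   # Tom's: excluding voltage regulator
xbit_w = {"idle": 15.2, "load": 89.8}   # X-bit: including voltage regulator

gap_w = {}      # extra watts seen when the regulator is included
ratio = {}      # Tom's figure as a fraction of X-bit's figure

for state in ("idle", "load"):
    gap_w[state] = xbit_w[state] - toms_w[state]
    ratio[state] = toms_w[state] / xbit_w[state]
    print(f"{state}: gap {gap_w[state]:.2f} W, ratio {ratio[state]:.2f}")
```

The absolute gap is of a similar order at idle and at load, while the ratio is very different. That pattern is at least consistent with a largely fixed overhead being included in one measurement and not the other, though two reviews on different hardware cannot establish it.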

abinstein said...

Ho Ho "FSB does increase memory latency and in theory MCM has worse latency between dies."

Which is not much a problem for desktop applications if the processor's cache is large enough. That's my point.


"In reality though it seems as die to die latency is pretty much equal to latency between any barcelona cores."

That's when there's no contention on the FSB, i.e., in a tightly controlled benchmarking condition.

Ho Ho said...

abinstein
"Which is not much a problem for desktop applications if the processor's cache is large enough. That's my point."

So basically for the majority of the market it is better to have CPU with large caches that could run their code best?


"That's when there's no contention on the FSB, i.e., in a tightly controlled benchmarking condition."

As in on most desktop workloads?

Kind of seems like AMD designed their CPUs to work well on servers but they aren't as well optimized for the majority of the market, desktops. If I'd produce something I'd first target the biggest audience and later see what can I improve to sell to minorities. That way I could earn maximum profits.

abinstein said...

"Kind of seems like AMD designed their CPUs to work well on servers but they aren't as well optimized for the majority of the market, desktops."

I don't think you get the logic straight. The FSB is not optimized for anything; it's not a big problem if cache is large enough. Surely you understand the difference between "not a big problem" and "optimized"?

HT3 and direct connect architecture are (almost) always better than FSB; Intel had to compensate for its FSB deficiency with a 50% larger core and 8x larger L2 cache to get a 20% better score in SPEC benchmarks.

OTOH, AMD optimized their processor design for both server and desktop very well. I've seen K8-based system out-performs Core 2 on network intensive (IPsec) tests. Not just performance per watt, but absolute performance.


"If I'd produce something I'd first target the biggest audience and later see what can I improve to sell to minorities."

Which has nothing to do with the discussion here. Where do you get the impression that HT/IMC is not optimized for desktop, when K8 with smaller core size, much smaller cache, and manufactured on a less advanced process, shows just slightly lower performance but better performance per watt than C2?

FSB is ancient, period. It limits the scaling of processor performance, unless it is itself scaled faster than Moore's law. It makes motherboards more difficult and more expensive to make; it makes processing efficiency lower under heavy load. Yes it doesn't matter if you just run single-threaded programs or check e-mail - but then you don't need multi-core high-end processors either.

Greg said...

Ho ho said:

If I'd produce something I'd first target the biggest audience and later see what can I improve to sell to minorities. That way I could earn maximum profits.


No offense, but that's why your company would inevitably fail. Most highly successful companies started by finding a small market that they could control and expanding from there. Nokia started by selling paper in a town in Finland. Wal-Mart started with a little general store in the backwoods town of Bentonville, Arkansas.

I wholeheartedly agree with abinstein's analysis, but would go on to say that the reason the FSB is a bad thing is that implementing the crutches that fix its performance deficits wastes die space and development time that could be used on making an innovative product that actually moves the x86 industry forward.

Khorgano said...

No offense, but that's why your company would inevitably fail. Most highly successful companies started by finding a small market that they could control and expanding from there. Nokia started by selling paper in a town in Finland. Wal-Mart started with a little general store in the backwoods town of Bentonville, Arkansas.

No offense, but this is where your company would also fail. You can't compare a startup to an already established company in a large market. Surely, no one would expect a small business to supply an international market from startup, that's common sense, but when you have a company like Intel and AMD that should have the production capabilities to supply the very large markets, you get MUCH better economies of scale when supplying the majority as opposed to a niche.

Sure initial volumes may be light when a launch occurs (i.e. Barcelona and Penryn), but if it is fully intended to ramp to serve the overall market and you have the capability and roadmap to achieve it, you better damn well target the majority market first, and follow up with smaller markets later.

Greg said...

Khorgano, you're right that, to be a major player that's very stable, AMD has to target the main body of demand in their market, but there are many aspects of AMD that are similar to a start-up, because of how the x86 market differs from others.

In this sense, AMD can find enormous security in serving comparatively "niche" (used comparatively because they're still very large) markets that Intel can't entirely serve without creating an entirely separate product line. However, as I can see you realize, focusing too much on this smaller, more secure market, generally ends up with your company being pushed out of the market altogether.

From what I can tell, this is what AMD sees in the market, and the basis to how they approach it. It seems to me that Barcelona is supposed to be a bit of a compromise between a highly capable HPC processor and something versatile enough to be used in desktops.

I take scientia's viewpoint that something is currently not quite right with Barcelona, as the design of the core is obviously capable of producing performance improvements over k8 that, for some reason, are not present. This makes barcelona less of that compromise that AMD wanted to have.

Regardless, this conversation is digressing from more pertinent topics.

Dr. Yield, PhD, MBA said...

Greg wrote:
Most highly successful companies started by finding a small market that they could control and expanding from there.

In a general sense you are correct. But in the semi space, if you want to manufacture, the game is all about scale. A leading edge fab costs 2-4 billion USD, depending on size. The shell alone is close to 1 billion USD. There is no way to build a half a fab on the cheap, and "build out" as you need it. In order to generate sufficient ROI on that level of capital expenditure, you have to sell volume and ASP, not just one. An analogous industry would be oil refining- lots of capital to build, and you don't ever plan on operating much below capacity.

That said, if you want to go fabless, you can be MUCH smaller. You give up some flexibility, you give up some profit margin over an integrated manufacturer, and you give up some time to technology node, as foundries don't usually operate at the very bleeding edge- it's not where the profit sweet spot is for them. Hence Jerry Sanders' old quote "real men have fabs".

core2dude said...


I've seen K8-based system out-performs Core 2 on network intensive (IPsec) tests.

And who uses IPSec? SSL is far more relevant, which is dominated by RSA. Core 2 beats K8 there handily.

core2dude said...


And who uses IPSec?

BTW, I know that many of the VPN implementations today are IPSec. But a typical company has only a few VPN gateways, vs. companies like Google that have thousands of SSL (HTTPS) servers.

Also, recently there has been a trend away from IPSec VPNs towards SSL VPNs.

Peter said...

abinstein wrote:
"Which has nothing to do with the discussion here. Where do you get the impression that HT/IMC not optimized for desktop, when K8 with smaller core size, much smaller cache, and manufactured on a less advanced process, shows just slightly lower performance but better performance per watt than C2?"

K8 is 25 - 30% slower clock for clock and has MUCH worse performance/watt than C2D. What have you been smoking man?

A 65W E6750 @ 2.66GHz beats a 125W X2 6400+ 3.2GHz in the majority of benchmarks.

abinstein said...

Jack... like it or not, I'm just stating a fact. I have no preference for IPsec or SSL or anything. Please learn to grow up when you read.

Peter... it depends on what you look at. If it's SPEC then yes 25-30%; if it's SPEC_rate then it's 20-25%. You like SPEC better, that's fine; but IMO it doesn't make sense to benchmark multi-core with single-threaded applications.

abinstein said...

khorgano...
"... but when you have a company like Intel and AMD that should have the production capabilities to supply the very large markets, you get MUCH better economies of scale when supplying the majority as opposed to a niche."

What you said is correct, but -

1) Intel and AMD do not enjoy the same economy of scale. The former has a 3x advantage over the latter.

2) HT/IMC vs. FSB is not a matter of niche vs. general market. It's a matter of next generation vs. previous generation. Put it this way, a Core 2 with HT and IMC will be better than a Core 2 with FSB, on desktop, server, or mobile.

Khorgano said...

Abinstein said:

2) HT/IMC vs. FSB is not a matter of niche vs. general market. It's a matter of next generation vs. previous generation. Put it this way, a Core 2 with HT and IMC will be better than a Core 2 with FSB, on desktop, server, or mobile.


I don't disagree with any of your points, but my argument is about business and economic theory. Whether or not HT/IMC and FSB are niche market or mass market or how they'll help/hinder either architecture are beside the point and irrelevant to the discussion.

Furthermore, and more to your first point, it is important to note that both companies' technological advances and product offerings are driven by business in very different ways. Since Intel has much larger capacity and cash reserves, they take a far more conservative approach and introduce technologies when they can achieve the greatest economies of scale. Hence the reason why they chose an MCM design on FSB as opposed to a monolithic die. It's not that they can't do one, but margins would suffer more had they gone that route. AMD on the other hand is playing behind the curve and is therefore far more aggressive in risk taking by advancing technological and architectural designs to stay in the competition.

Either way, both approaches can allow them to serve the mass market which is what they both intend to do in order to generate the greatest revenue and ROI on their investments.

Scientia from AMDZone said...

I'm surprised that no one posted this:

Inquirer

We were expecting Phenom models 9500, 9600 and 9700 at 2.2, 2.4 and 2.6GHz . . . but they have been castrated to mean a mere 2.2, 2.3 and 2.4GHz.

If AMD is only able to reach 2.4Ghz this year then most likely FX won't get above 2.8Ghz in Q1. This is a major disappointment if true.

Oh, wait a minute. According to some who post here, I only post good things about AMD. That's odd.

sharikouisallwaysright said...

100 or 200 MHz less than expected is not worth mentioning, as even 2,6 GHz is below the clock range that would motivate me to buy a new CPU.
A Quad above 3,2 GHz at a modest price (lets say 400$) will make me think about it.

Scientia from AMDZone said...

This discussion about FSB versus IMC has been ridiculous.

We can go back, all the way back to P4 Xeon and see that nothing has really changed. Intel was always able to do fairly well with two loads on one bus. The dropoff with Xeon DP was not that severe. On the other hand, 4-way on one bus was always poor.

MCM is two loads on the bus. This is why Intel had to immediately go to a dual FSB chipset to avoid four loads. By preventing additional bus loads with its dual FSB chipset, Intel was able to make MCM work. We see the same thing with Tigerton. The package with MCM again begins with two loads, therefore it has none to spare. To get to 4-way, Intel had to create a quad FSB chipset. So, again we see that with two loads on the bus, Tigerton works well.

The reason this conversation is so ridiculous is threefold: First of all, MCM is only two loads on a bus and two loads has always worked well. So, MCM and FSB are not a factor unless the motherboard has at least two sockets. The fact that Intel has avoided four loads on a bus is clear proof that Intel knows that four bus loads is not competitive with AMD's chips.

Secondly, all of the changes were to the chipsets. The only difference between a Kentsfield and a Tigerton is what chipset it is paired with. The processor package itself is still only two loads on the bus. AMD has two similar limitations. AMD has an absolute mapping limit of 8 objects and has an efficiency limit of 4-way. It is no secret that Opteron drops off sharply with 8-way (because of the extra hop) just as Xeon does with four loads (due to bus throttling). The absolute mapping limit has been sidestepped with third party hardware by remapping external nodes into one virtual node. The 4-way limit has to some extent been softened by using a faster external multi-dimensional network. Cray for example has done this. So, we see that both Intel and AMD have had limitations overcome by using additional external hardware.

Finally, the reason this conversation has been silly is because Intel is also moving to an IMC/P2P architecture with Nehalem. If Intel did not see real advantages with IMC, Intel wouldn't bother and would instead put more effort into something else.

So, FSB works well as long as you keep the loads down to two. The interprocessor communication argument is mostly a red herring. FSB is absolutely not faster for interprocessor communication. There are however two problems in trying to compare these. The previous generation K8 was only dual core so four cores required a second socket. This meant that internal communication went via the XBAR while external communication went via HT. In contrast, dual core C2D used shared (inclusive) cache for this which cut the time for internal communication. External communication was also pretty good when comparing Xeon with a reasonably fast FSB against HT 1.0. This comparison will fall apart when comparing MCM quad against AMD native quad and when comparing a 1333Mhz bus against HT 3.0. Intel has been talking about a 1600Mhz FSB even though the memory is not fast enough for this. However, the extra FSB speed could handle the load from I/O and interprocessor communication without slowing memory access.

So, what does this mean? This means that AMD native quad will be faster for internal communication than Intel MCM quad. AMD external communication will also be faster when using HT 3.0 however the extra bandwidth of a 1600Mhz FSB means that Intel can do external communication without slowing down even if it is not as fast as AMD. The bottom line is that once you have "fast enough" more speed has less effect. This means that AMD's advantage may shift upwards to 8-way.
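
For a rough sense of the bus speeds mentioned above, peak FSB bandwidth can be estimated as transfer rate times bus width. The sketch below assumes a 64-bit data path and treats the 1333 and 1600 figures as effective transfer rates in MT/s; the function name is illustrative only. The result is a theoretical ceiling shared by memory, I/O and interprocessor traffic, not a measured figure.

```python
# Peak theoretical front-side bus bandwidth.
# Assumption: the FSB data path is 64 bits (8 bytes) wide.
BUS_WIDTH_BYTES = 8


def fsb_peak_gb_per_s(mega_transfers_per_s, width_bytes=BUS_WIDTH_BYTES):
    """Return peak bandwidth in GB/s (10^9 bytes per second)."""
    return mega_transfers_per_s * 1e6 * width_bytes / 1e9


if __name__ == "__main__":
    # Bus speeds named in the comment above, read as MT/s.
    for rate in (1333, 1600):
        print(f"{rate} MT/s FSB: {fsb_peak_gb_per_s(rate):.2f} GB/s peak")
```

Under these assumptions the step from 1333 to 1600 adds roughly a fifth more peak bandwidth, which is the headroom the comment suggests could absorb I/O and interprocessor traffic.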

Scientia from AMDZone said...

sharikouisallwaysright

"A Quad above 3,2 GHz at a modest price (lets say 400$) will make me think about it."

You may be waiting awhile. I doubt Intel has any plans for a 3.2Ghz quad for $400 anytime soon. AMD may not hit 3.2Ghz on 65nm until Q4 after Shanghai has already been released on 45nm. Intel is also unlikely to release Nehalem at speeds faster than 3.2Ghz in 2008.

Peter said...

"Intel is also unlikely to release Nehalem at speeds faster than 3.2Ghz in 2008."

And you know this how, exactly?

Scientia from AMDZone said...

peter

I don't know if you understand about die size and yields. Nehalem will be larger than Shanghai. Nehalem is native quad so it loses die pairing. Nehalem also has a higher ratio of logic to cache circuitry than Penryn. Part of this depends on pipeline length though. If Nehalem has a longer pipeline it may indeed have a faster clock but this would still mean a slower processor. I'm simply not expecting the initial Nehalems to be faster than Penryn (which should be pretty fast by that time).

I'm also certain that we will see a big shift in benchmarking. Reviewers will suddenly discover the advantages of more memory bandwidth (because Nehalem has more). The benchmarks will also likely shift away from cache tuning (because Nehalem has less). I'm certain hyperthreading will be hyped again as it was for P4.

Ho Ho said...

scientia
"AMD may not hit 3.2Ghz on 65nm until Q4 after Shanghai has already been released on 45nm."

You claimed that AMD releases 3GHz quadcore Barcelonas in Q4 this year or definitely at Q1 next year. After all they have already shown working 3GHz quads months ago and we all know these were not cherrypicked chips as AMD never does something like this. They are always playing nice and never do anything bad as Intel constantly does.

Now you claim that it takes 9-12 months to get additional 200MHz speedbump? Where is the logic in that? Whatever you are smoking, I want that!


Btw, remember I said if they are lucky they may reach 3GHz at late Q2 but most likely in H2 next year and probably not early. That time you didn't believe me. Have you changed your mind since then?


Btw, one guy on another blog asked an interesting question. What exactly has die size got to do with maximum frequency reachable?

Sure, a bigger die with more logic will likely take more power, but as we know Intel's current 45nm takes roughly half the power of its 65nm CPUs and it could likely clock the CPUs at 4GHz today if it had to without going over the thermal limits. During the year they have until the launch of Nehalem there will likely be at least one G0-like revision that lowers power usage even more.

Also Nehalem seems to have about the same die size as MCM 65nm quad. As you've said MCM is not as power efficient as native quad so I have high doubts that at 45nm it would be difficult to clock Nehalem at 4GHz+. Of course if AMD doesn't offer good enough competition (3.2GHz at Q4?) then Intel has little reason to bump Nehalem clockspeeds high. It will likely be better to keep them high enough to beat AMD by a bit but consume significantly less power.

Scientia from AMDZone said...

Ho Ho

This is a good example of what I see here all too often. You've misstated many things that I've said. I don't know if you misstate me because your memory is bad or if you are trying to invent a strawman argument to refute. If you are going to talk about something I've said then try to get it right.

"You claimed that AMD releases 3GHz quadcore Barcelonas in Q4 this year or definitely at Q1 next year."

This is the first misquote. In reality, when AMD demoed its 3.0Ghz chip I said that it takes about six months for production to catch up to a cherry picked chip. I've never given any timeframe shorter than this. Six months would be Q1 so I have no idea where you got Q4 from. The roadmaps before the actual 2.0Ghz release only showed 2.5Ghz in Q4.

"After all they have already shown working 3GHz quads months ago and we all know these were not cherrypicked chips as AMD never does something like this."

You don't seem to have enough integrity to argue genuine points so instead you imply something completely opposite of what I really said. I made the statement about cherry picked chips months ago.

"Now you claim that it takes 9-12 months to get additional 200MHz speedbump? Where is the logic in that?"

Yes, I suppose this would sound odd when you've butchered my statements as badly as you have. Let's try reality for a change and I think it will make a lot more sense. Six months to 3.0Ghz would be Q1. This would suggest a 2.8Ghz FX chip in Q4. Recent indications however suggest that AMD may only have 2.8Ghz in Q1. If this is true and AMD doesn't release 3.0Ghz until Q2 08 then I think another six months to reach 3.2Ghz is a good estimate.

" Whatever you are smoking, I want that!"

Clearly whatever you are using now is already doing enough damage.

"Btw, remember I said if they are lucky they may reach 3GHz at late Q2 but most likely in H2 next year and probably not early."

Right, so we'll see if it really takes AMD until mid 2H 08 to release 3.0Ghz. To your credit you haven't said anything sufficiently stupid for me to bother writing it down in the list of absurd predictions. For example, Giant, Lex, and Axel all predicted that AMD would go bankrupt in 2008.

"That time you didn't believe me. Have you changed your mind since then?"

It might very well take until Q2 but we still wouldn't be in agreement since I would not consider this timeframe early as you do. I'm still thinking that AMD will have 45nm in Q3 and that Nehalem won't be released until Q4.

"What exactly has die size got to do with maximum frequency reachable?"

I already explained that. A larger die will have more defects and therefore worse binning. Also because it is monolithic Intel can't do die pairing as they do with MCM quads.

"it could likely clock the CPUs at 4GHz today if it had to without going over the thermal limits."

The tests that I have seen have all been over the probable thermal limits at less than 4.0Ghz.

"During the year they have until the launch of Nehalem there will likely be at least one G0-like revision that lowers power usage even more."

That might be pushing it. It took a full 12 months for Intel to release G0 on 65nm. If

"Also Nehalem seems to have about the same die size as MCM 65nm quad."

True but you can do die pairing with MCM.

" As you've said MCM is not as power efficient as native quad so I have high doubts that at 45nm it would be difficult to clock Nehalem at 4GHz+."

So, is this your official prediction?

Ho Ho claims:

Q4 2008 - 3.8Ghz Penryn
Q4 2008 - 4.0Ghz Nehalem


If that is your prediction then I'll add it to the list.

" Of course if AMD doesn't offer good enough competition (3.2GHz at Q4?) then Intel has little reason to bump Nehalem clockspeeds high."

Yes, I know. It is now common to claim that when Intel is unable to do something that they simply chose not to. There were similar false claims made about 3.2Ghz 65nm chips, 45nm in Q3, Intel's use of immersion and Nehalem's being released in Q3. So, are you now going to predict 4.0Ghz Nehalems in Q4 and then claim that Intel chose not to if they don't appear?

"It will likely be better to keep them high enough to beat AMD by a bit but consume significantly less power."

I have doubts about this. We haven't yet seen AMD's 45nm to know what type of power saving it might have. Intel's 45nm could be a lot better or only slightly better. I do expect Intel's to be at least somewhat better. If Intel's is a lot better then AMD may decide to go with high K on 45nm after all in 2009.

Ho Ho said...

scientia
"Recent indications however suggest that AMD may only have 2.8Ghz in Q1"

Has there been anything besides the rumours of not having >2.4GHz this year? Sure, earlier they said they would have 2.8GHz FX in Q1 but then they also said there would be 2.6GHz this year.


"A larger die will have more defects and therefore worse binning."

It has a chance of having more defects, that is not saying there are no good chips that can work at higher frequencies. It just means there will be fewer of them.
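
The disagreement over die size can be made concrete with the simple Poisson yield model, in which the fraction of defect-free dies is exp(-area × defect density). This is only a sketch: the die areas and defect density below are hypothetical values chosen for illustration, not figures for Nehalem, Shanghai or any real process, and the model covers killer defects only. It says nothing by itself about speed binning, which is the part actually in dispute here.

```python
import math


def poisson_yield(die_area_cm2, defects_per_cm2):
    """Fraction of dies expected to have zero killer defects (Poisson model)."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    DEFECT_DENSITY = 0.5  # hypothetical defects per cm^2, for illustration only
    # Hypothetical die areas in cm^2: a smaller die versus one twice its size.
    for label, area in (("small die", 1.0), ("large die", 2.0)):
        y = poisson_yield(area, DEFECT_DENSITY)
        print(f"{label} ({area:.1f} cm^2): {y:.1%} defect-free")
```

In this model a larger die always has a lower defect-free fraction, but that fraction never reaches zero, which is consistent with the point that good large dies still exist, just fewer of them.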


"The tests that I have seen have all been over the probable thermal limits at less than 4.0Ghz."

Over by how much? At what clock speed they seem to be at their rated TDP (130W on Intel scale)? I've seen Penryn at 4GHz taking less power than QX6850 at default clocks.


"It took a full 12 months for Intel to release G0 on 65nm."

There is roughly 12 months until Nehalem gets launched too.


"True but you can do die pairing with MCM."

That won't affect maximum achievable speed.


"So, is this your official prediction?"

Why are you misstating what I said? I've never said at what speeds they will launch, I just think those speeds are easily achievable by then. If you really want me to predict something then I'll say that Intel will have at least as big a lead in Q4 08 as it has in Q1 08.


"It is now common to claim that when Intel is unable to do something that they simply chose not to."

AMD started it saying clients don't really need faster chips :)
Difference with Intel 65nm is that Penryn seems to have a lot of headroom not only in terms of OC speed but also power usage. 65nm only had OC headroom.


Btw, why are you still talking about immersion lithography as if it is something great? It is just another way of getting things done. The only thing is that pretty much nobody uses it yet, it is much more expensive and there will be bugs needing to be ironed out. AMD and IBM will start with it ahead of Intel so it will be a bit cheaper for Intel to use it on 32nm, assuming that AMD 45nm won't be delayed for too long.


"So, are you now going to predict 4.0Ghz Nehalems in Q4 and then claim that Intel chose not to if they don't appear?"

I say that if Intel has good enough lead with lower clocked Nehalems it very well might not release that high clocked CPUs.


"We haven't yet seen AMD's 45nm to know what type of power saving it might have"

In terms of maximum performance AMD will likely still be using 65nm CPUs at the time Nehalem is released. That would put power hungry Barcelonas against just released Nehalems that are likely not using a lot of power.


"Intel's 45nm could be a lot better or only slightly better"

High-k seems to help them quite a bit and their 65nm also takes less power than AMDs. AMD won't have high-k and it seems to have lots of problems with 65nm. Let's just say I have doubts in AMD capabilities as they haven't proved themselves during last year.

Aguia said...

ho ho,

It has a chance of having more defects, that is not saying there are no good chips that can work at higher frequencies. It just means there will be fewer of them.

Do you mean that all the volume Intel line (mainstream) which by the way is clocked much lower than AMD means Intel has very few chips that can clock high?
(Celeron, Pentium 2xxx, Core 2 Duo E4xxx amazing highest clock speed of… 2.0Ghz)


Over by how much? At what clock speed they seem to be at their rated TDP (130W on Intel scale)? I've seen Penryn at 4GHz taking less power than QX6850 at default clocks.

127W VS 165W I wouldn’t call it less.
Unless you have found one working at standard voltage?


There is roughly 12 months until Nehalem gets launched too.

It takes time to do one native quad core CPU.


That won't affect maximum achievable speed.

Really hoho? I think it’s just the limiting factor that doesn’t allow to have quad core at 4.0Ghz but its also the factor that could allow to have duals.


In terms of maximum performance AMD will likely still be using 65nm CPUs at the time Nehalem is released. That would put power hungry Barcelonas against just released Nehalems that are likely not using a lot of power.

But Nehalem will have IMC and CSI which consume power, unless the Intel version doesn’t consume any power. Maybe Intel will surprise everyone with zero power high performing chips.


High-k seems to help them quite a bit and their 65nm also takes less power than AMDs.

I’m not sure about that, put the Pentium4 and you have completely different numbers.
Comparing a ground up made mobile architecture VS a ground up made server architecture is completely flawed.

Giant said...

Do you mean that all the volume Intel line (mainstream) which by the way is clocked much lower than AMD means Intel has very few chips that can clock high?
(Celeron, Pentium 2xxx, Core 2 Duo E4xxx amazing highest clock speed of… 2.0Ghz)


Low clock speeds with a 20% IPC advantage. The Core 2 Duo 4000 sequence maxes out at 2.4Ghz, not 2Ghz. Besides, why would Intel need more than 2Ghz with low end CPUs like the Celeron and Pentium E? If they clocked them any higher they'd start to compete with the low end Core 2 Duos.

Besides, why are you complaining about low clock speeds vs. AMD? Did you complain back in the K8 vs. Netburst days that AMD 'only' had a 2Ghz CPU vs. Intel's 3Ghz CPU?


Comparing a ground up made mobile architecture VS a ground up made server architecture is completely flawed.


So what you're saying is that we shouldn't conduct ANY tests then, because it's just not a fair comparison? All the review sites compare the best from AMD and Intel. What else do you expect them to do?

Axel said...

I rarely link to Fudzilla but recently their news updates have been hitting the mark. If they are to be believed, Phenom will cost less than Penryn clock-for clock as I'd predicted a couple months ago. If true, this naturally also implies that Phenom will generally be slower than Penryn at the same clock.

It's going to be another rough year for AMD. However, they might just survive 2008 if they keep Fab 30 shuttered the entire year and resolve the speed issues with K10. They will not regain the speed crown (probably not even close) until possibly Bulldozer in 2009.

Giant said...

However, they might just survive 2008 if they keep Fab 30 shuttered the entire year...

If they were going to do that they may as well just sell it off totally.

The Phenom pricing is interesting. The top 2.4Ghz Phenom will probably be fairly equal in performance to the Q6600 but will consume more power (125W vs. 95W), from the looks of things the price is going to be similar as well. It will be interesting to see how well these CPUs overclock.

Ho Ho said...

aguia
"127W VS 165W I wouldn’t call it less."

Is that full system power usage? If not then I'd like to see that review that said those numbers as they are way higher than I've seen.

If it was for full system then the difference is around 38W and you can bet that QX9650 does not take over 90W under full load, not to mention that all the rest besides the CPU cannot get by with only ~35W. I've seen full system load with prime95 running on all cores being 118W so that 127W does seem to be for the whole system.

Should I thank you for proving my point?


"It takes time to do one native quad core CPU."

You do know that Nehalem will launch being a native quadcored, do you?


"I think it’s just the limiting factor that doesn’t allow to have quad core at 4.0Ghz but its also the factor that could allow to have duals."

You think wrong. Having more defects will bring down your yield but it won't affect how high you can clock your CPUs. Barcelona is not at that low clock speed because it is big but because AMD hasn't quite figured out yet how to actually manufacture the CPU they designed. I suggest you to read roborat's blog for detailed information, Scientia seems to lack the knowledge to explain it as good as people there have done it.


"But Nehalem will have IMC and CSI which consume power"

... and MCM quads have two FSB interfaces that will be gone with Nehalem.


"I’m not sure about that, put the Pentium4 and you have completely different numbers."

You seem not to remember/know how much did Netburst power usage drop with 65nm. I suggest you to search up some P4D 9xx vs 8xx reviews.

Aguia said...

Giant,

Low clock speeds with a 20% IPC advantage. The Core 2 Duo 4000 sequence maxes out at 2.4Ghz, not 2Ghz.

Well my point was to reply to this ho ho phrase:
“It has a chance of having more defects, that is not saying there are no good chips that can work at higher frequencies. It just means there will be fewer of them.”
Also where are E4xxx at 2.4Ghz I never saw one.


If they clocked them any higher they'd start to compete with the low end Core 2 Duos.

I thought the lower clock FSB, the lower cache size, lacking features like VT were also differentiating factors.


Besides, why are you complaining about low clock speeds vs. AMD?

Well I was using the same hoho theory on Intel instead of AMD, didn’t it sound ok on Intel?


So what you're saying is that we shouldn't conduct ANY tests then, because it's just not a fair comparison?

My point is the Core 2 Duo consumes less power not because of a superb Intel manufacturing process but because it’s a mobile CPU architecture.
Unless you want to compare AMD cpus VS IBM Power 4/5/6?



Hoho,
Is that full system power usage?

No it’s just the CPU. Don’t tell me you didn’t know one QX6850 consumes 127W it’s all over the web. Go to Xbit labs I already provided the link previously.


and you can bet that QX9650 does not take over 90W under full load

I never said it would. I said one 4.0Ghz would consume more than 130W.


Should I thank you for proving my point?

No. You should thank me to provide the truth. Unless you want to make everyone here to believe that the CPU (QX9650), HDD, DVD, Motherboard, RAM, GPU, … consumes 118W …


You do know that Nehalem will launch being a native quadcored, do you?

Why do you think I said it takes time to design and build one native quad core CPU? ;)
And we do agree after all:
“You think wrong. Having more defects will bring down your yield but it won't affect how high you can clock your CPUs. Barcelona is not at that low clock speed because it is big but because AMD hasn't quite figured out yet how to actually manufacture the CPU they designed.”


I suggest you to read roborat's blog for detailed information, Scientia seems to lack the knowledge to explain it as good as people there have done it.

I may take a look if I have time.


... and MCM quads have two FSB interfaces that will be gone with Nehalem.

Which are very power hungry? I don’t think so, Intel mobile CPU have low FSB clock because of heat and low memory clock speeds. I don’t think a 1600Mhz FSB consumes that much (the same of IMC+CSI). I think it’s more a heat/stability issue.


You seem not to remember/know how much did Netburst power usage drop with 65nm.

Well Pentium XE840 at full load did 179Watts the XE955 156Watts.
One is working at 3.2Ghz and the other at 3.46Ghz, I wouldn't call it an amazing reduction. However look at the 90nm X2 4800+ (and with the IMC) its doing just 96W. If I were using these tests to compare AMD manufacturing process VS Intel manufacturing process, I would say AMD completely frags Intel at CPU manufacturing.
So hoho and Giant my phrase doesn’t look so dumb by your own standards, does it:
Comparing a ground up made mobile architecture VS a ground up made server architecture is completely flawed.
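
The figures quoted above can be normalized by clock speed to see how much of the difference is frequency and how much is power. The sketch below uses the wattages and clocks given in the comment; the 2.4 GHz clock for the X2 4800+ is not stated in the thread and is an assumption here, and the helper names are illustrative. Watts per GHz ignores per-clock performance, so it is not a performance-per-watt measure and does not settle the process-versus-architecture question.

```python
# Full-load power (W) and clock (GHz) as quoted in the comment above.
# Assumption: the X2 4800+ runs at 2.4 GHz; the thread does not state its clock.
CHIPS = {
    "Pentium XE 840 (90nm)": (179.0, 3.2),
    "Pentium XE 955 (65nm)": (156.0, 3.46),
    "Athlon 64 X2 4800+ (90nm)": (96.0, 2.4),
}


def watts_per_ghz(watts, ghz):
    """Power normalized by clock speed."""
    return watts / ghz


def percent_drop(old, new):
    """Relative reduction from old to new, in percent."""
    return (old - new) / old * 100.0


if __name__ == "__main__":
    for name, (watts, ghz) in CHIPS.items():
        print(f"{name}: {watts_per_ghz(watts, ghz):.1f} W/GHz")

    old_w, old_ghz = CHIPS["Pentium XE 840 (90nm)"]
    new_w, new_ghz = CHIPS["Pentium XE 955 (65nm)"]
    print(f"Absolute power drop, XE 840 to XE 955: {percent_drop(old_w, new_w):.1f}%")
    per_ghz_drop = percent_drop(watts_per_ghz(old_w, old_ghz), watts_per_ghz(new_w, new_ghz))
    print(f"Per-GHz power drop, XE 840 to XE 955: {per_ghz_drop:.1f}%")
```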

Hornet331 said...

@Aguia
E4600 2,4ghz E4xxx part.

http://geizhals.at/a285003.html

i dont know where you get your 127W from

@D840
180W is quite far fetched its more like 140...

Ho Ho said...

aguia
"I thought the lower clock FSB, the lower cache size, lacking features like VT were also differentiating factors."

They are but probably Intel doesn't want to follow AMD and flood the market with CPUs that have very little performance difference. I wouldn't say that having <4.5% difference between CPUs is a good enough reason for a new model.


"Unless you want to compare AMD cpus VS IBM Power 4/5/6?"

I know Power6 has 180W TDP, not sure about the others.


"Don’t tell me you didn’t know one QX6850 consumes 127W it’s all over the web."

Yes, QX6850 takes that much but we are talking about QX9650. There is a big difference between the two.


"I never said it would."

Yes, you were talking about QX6850 but never mentioned it. I assumed we were talking about QX9650 and that's why I said that.


"I said one 4.0Ghz would consume more than 130W."

How much more? So far only review I've found that measured OC power usage has been THG (I know, I know). I think I saw something similar from someone else too but I'm not sure where.


"Unless you want to make everyone here to believe that the CPU (QX9650), HDD, DVD, Motherboard, RAM, GPU, … consumes 118W …"

That is exactly what I want to do. The GPU was a low-end one and at idle. A picture of the system itself is here. Power usage is shown on the white device on the right. Also note that that 118W includes the inefficiency of PSU as usage is measured from the wall socket.
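
Since the 118W reading is taken at the wall socket, the power actually delivered to the components is lower by the PSU's conversion loss. The sketch below shows that arithmetic; the efficiency values are hypothetical, as the thread does not say which PSU was used or how efficient it was, and the function name is illustrative.

```python
def dc_power(wall_watts, psu_efficiency):
    """Power delivered to components, given wall draw and PSU efficiency (0 to 1)."""
    if not 0.0 < psu_efficiency <= 1.0:
        raise ValueError("psu_efficiency must be in (0, 1]")
    return wall_watts * psu_efficiency


if __name__ == "__main__":
    WALL_WATTS = 118.0  # full-system wall reading quoted in the comment above
    # Hypothetical efficiencies; the actual PSU efficiency is not given in the thread.
    for eff in (0.70, 0.80, 0.85):
        print(f"PSU at {eff:.0%}: about {dc_power(WALL_WATTS, eff):.0f} W reaches the components")
```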


"I don’t think so, Intel mobile CPU have low FSB clock because of heat and low memory clock speeds."

Again, you are wrong. FSB takes quite a bit of power, certainly not much less than other interconnections. That is also the reason why Intel can't simply increase FSB to much higher than 1.6GHz and even at that speed NB gets quite hot.


"However look at the 90nm X2 4800+ (and with the IMC) its doing just 96W"

Why not compare AMD's highest performing CPU against Intel's? How high clocked were AMD dualcores at the time XE955 was released and how much power did it use? Also wouldn't you agree that back then Intel had just moved to 65nm and AMD had been living on 90nm for quite some time already? Intel's first 65nm Netburst revision wasn't that good but what followed was actually quite decent for a Netburst. Compared to XE840 it is a big drop paired with a relatively big MHz increase.

Axel said...

More evidence that in games, Phenom X4 will be significantly slower than Kentsfield per clock, even at 3.0 GHz. Rahul Sood's claim from a couple months ago that Phenom 3.0 GHz would "kick the living crap" out of any processor then on the market will soon be proven to have been utter unfounded fallacy.

We can see that Crysis gains significantly in performance going from K8 dual core to K10 quad core, yet even the Intel dual core E6850 beats Phenom X4 clock-for-clock.

Aguia said...

Hornet331,

i dont know where you get your 127W from

From the Xbitlabs review. Are they wrong? Go tell them why.


180W is quite far fetched its more like 140...

If that result is wrong, then the 96W from the X2 4800+ is also wrong, it is more like 75W based on your standards.



Ho ho,
It seems you agree with me, I just can’t understand why you keep saying I’m wrong:

Again, you are wrong. FSB takes quite a bit of power, certainly not much less than other interconnections. That is also the reason why Intel can't simply increase FSB to much higher than 1.6GHz and even at that speed NB gets quite hot.

“I don’t think a 1600Mhz FSB consumes that much (the same of IMC+CSI). I think it’s more a heat/stability issue.”

Ho Ho said...

Memory and HT scaling doesn't look all that good either.

Ho Ho said...

aguia
"It seems you agree with me, I just can’t understand why you keep saying I’m wrong:"

Well, first you said that IMC and QP(aka CSI) will increase Nehalem power usage and I mentioned that getting rid of two FSB links will give some back. With that my main point was that I doubt IMC and QP would increase power usage all that much, if any.


Btw, have you any other comments on the power usages of QX9650 at various clock speeds and XE965?

Aguia said...

Btw, have you any other comments on the power usages of QX9650 at various clock speeds and XE965?

Well your link about the XE965 really show a power drop and it took Intel just 3 months to do it (I'm basing this on Xbitlabs article dates I have no idea when B1 and C1 was released).

And I forgot to post the links that’s why all the confusion. Sorry about that.

Presler Review

The QX9650 power usage is very good using Tom's readings. But that’s why I posted the question of who is making the right reading, Tom's or Xbit?

Besides on the xs forums there are guys doing great OC on the new Intel 45nm chips, I'm just curious on one thing. Isn’t the volts used too much for one 45nm manufactured process processor? I mean they are basically using the same voltage they use on the 65nm parts.

Greg said...

Axel, by "more evidence" you mean a repeat of the "same evidence" you showed before? I mean, it's still the same program, and I still have my same objections, so we're exactly where we were before you posted that link, which makes me wonder why you posted it.

Axel said...

Greg

I mean, it's still the same program, and I still have my same objections...

No, I do not believe in conspiracy theories of Crysis not being a valid benchmark due to Crytek deliberately optimizing for Intel processors. That is unfortunately your wishful thinking and when Phenom is launched with a more complete suite of benches from various sites, I believe you will see that these Crysis numbers are quite indicative of Phenom's performance relative to Kentsfield & Yorkfield. What expectations do you have anyway? Do you actually expect Phenom to outperform Kentsfield per clock? Might as well lower your expectations now so that you're not disappointed in a couple weeks. The numbers are coming out and they don't lie.

These more recent Crysis benchmarks are more CPU-bound and clearly show that the game does benefit from quad-core vs. dual core (Kentsfield gains 13.7% IPC over Conroe). But even with four cores, Phenom X4 is embarrassed in this game by dual core Conroe.

Aguia said...

axel,

These more recent Crysis benchmarks are more CPU-bound and clearly show that the game does benefit from quad-core vs. dual core.

Well, if that’s the case, then what you say here is completely foolish, axel. Shouldn’t the AMD quad be faster than an Intel dual? It seems to be another single-core CPU game, since it benefits more from the extra 6MB cache for just one of the cores. I bet if most reviews tested dual-core CPUs with one of the cores disabled, most of the results would be the same as with the two cores enabled.

Ho Ho said...

If Phenom cannot run a mostly single/dual-threaded game at competitive speeds, I have strong doubts it can do much better with truly multithreaded games. Also, as you can see, the difference between HT speeds of 1.33 and 2.1GHz is negligible; having RAM at 511 instead of 375 makes a much bigger difference. In fact, HT at 1.33GHz and RAM at 511MHz was faster than HT at 2.1 and RAM at 508MHz.

Axel said...

Aguia

Shouldn’t the AMD quad be faster than an Intel dual? It seems to be another single-core CPU game, since it benefits more from the extra 6MB cache for just one of the cores.

You're forgetting that the game is still GPU bound overall, so quad core can only benefit by so much here. It's really quite simple. From this table, the following all at the same clocks:
K8 dual - 54.6 fps
K10 quad - 64.3 fps
Conroe dual - 70.8 fps
Kentsfield quad - 80.4 fps
Yorkfield quad - 81.6 fps

Since Kentsfield outperforms Conroe by 13.6%, we can say that K10 quad should gain over K10 dual by about the same, or perhaps a bit more due to better scaling than Kentsfield. Let's say 14.0% or a bit more. Now, K10 quad gains 17.8% over K8 dual. So the ~4% difference is due to specific changes to the K10 architecture. K10 is clearly faster than K8 per clock, but unfortunately not by much. Hence, Conroe dual core can still beat K10 quad core in this benchmark.
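
The percentages in this comment can be reproduced with a few lines of arithmetic. The following is only a minimal sketch: the frame rates are the ones quoted above, while the helper name `gain_pct` and the variable names are made up for illustration, and the 14% figure is the commenter's own assumption about dual-to-quad scaling, not a measured value.

```python
# Frame rates quoted in the comment above (all parts at the same clock).
fps = {
    "K8 dual": 54.6,
    "K10 quad": 64.3,
    "Conroe dual": 70.8,
    "Kentsfield quad": 80.4,
    "Yorkfield quad": 81.6,
}


def gain_pct(new, old):
    """Percentage gain of `new` over `old` (hypothetical helper)."""
    return (new / old - 1.0) * 100.0


# Quad-core Kentsfield vs. dual-core Conroe (same architecture, more cores).
kentsfield_over_conroe = gain_pct(fps["Kentsfield quad"], fps["Conroe dual"])

# Quad-core K10 vs. dual-core K8 (more cores AND a newer architecture).
k10_quad_over_k8_dual = gain_pct(fps["K10 quad"], fps["K8 dual"])

# Assumption taken from the comment: dual-to-quad scaling is worth about 14%.
assumed_core_gain = 14.0

# What is left over is attributed to the K10 architecture changes.
architectural_gain = k10_quad_over_k8_dual - assumed_core_gain

print(f"Kentsfield over Conroe: {kentsfield_over_conroe:.1f}%")
print(f"K10 quad over K8 dual:  {k10_quad_over_k8_dual:.1f}%")
print(f"Left for architecture:  {architectural_gain:.1f}%")
```

Subtracting the two percentages is the same simplification the comment makes; dividing the ratios instead would give a slightly smaller leftover, so the "~4%" should be read as a rough estimate.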

Giant said...

, I would say AMD completely frags Intel at CPU manufacturing.

I would say you're delusional if you believe that for one second. AMD is heavily reliant on IBM for process technology. Intel has developed the 45nm HK/MG process by itself. This is currently in full scale production at FAB32 in Arizona and production is also taking place at D1D in Oregon (though on a somewhat smaller scale). Intel has already shown off 32nm wafers with working S-RAM logic containing more than 1.9bn transistors.

Intel will introduce the first CPUs based on this revolutionary new process technology next week. According to AMD's own Randy Allen, AMD's 45nm Shanghai CPU will not ship until 'The second half of next year'. (source: http://youtube.com/watch?v=oeEqNMD0aKE - Let the video load then skip to around the 6 minute mark if you want to avoid Randy Allen's speech on why Intel is bad for using MCM/FSB.) When the Shanghai CPU is introduced, in the second half of 2008 assuming AMD keeps to that timeframe, it will be a 45nm CPU. But it will not feature the HK/MG technology. For that you'll need to wait longer. In all likelihood that will be 32nm for AMD. It makes no sense for AMD to introduce a second generation 45nm process with HK/MG (a costly prospect, certainly) when they would be quite close to the 32nm technology which would absolutely feature the HK/MG technology.

Furthermore, your argument is flawed in that you are comparing outdated 90nm technology as opposed to the CURRENT 65nm and very close to being introduced 45nm technology. You also decided only to use Netburst power ratings on 90nm (it's no secret that Prescott suffered SEVERE leakage issues when the CPU was clocked above 3GHz) and conveniently ignored the Pentium M, which was a power efficient CPU on the same 90nm process and didn't suffer from leakage issues anywhere near as severe as Prescott did.


Comparing a ground up made mobile architecture VS a ground up made server architecture is completely flawed.


So, by your opinion, we shouldn't be comparing AMD and Intel at all because AMD's CPUs are derivatives of the Opteron (a server CPU) and Intel's CPUs are a derivative of the Pentium M (a mobile CPU). Is this or is this not correct?

Scientia from AMDZone said...

Well, I suppose we could have an RDR, CTI, SOI, process discussion. There have been some points brought up about this in various places. I'll have to write a new article then to start a new thread.

Christian M. Howell said...

Memory and HT scaling doesn't look all that good either.

What are you talking about? It gets 15% more fps with higher HT and RAM.

Aguia said...

Giant,

You distort a lot of what I said, but OK.

I would say you're delusional if you believe that for one second.

I do not; that’s my whole point. Comparing K8/K10 on AMD's manufacturing process vs. Core 2 Duo on Intel's manufacturing process doesn’t tell you anything about who has the best process, or does it?


Furthermore, your argument is flawed in that you are comparing outdated 90nm technology as opposed to the CURRENT 65nm and very close to being introduced 45nm technology.

But AMD only had 90nm at the time; what do you want me to do? If I compare the new AMD K8 65nm vs. Intel P4 65nm, it will only extend the results to AMD's benefit.


You also decided only to use Netburst power ratings on 90nm

Nope, see the link. I don’t think any of the Intel PD9xx were manufactured on 90nm; all are 65nm.


and conveniently ignored the Pentium M

Not really. I was waiting for you, ho ho, or someone else to bring up this question, since it gives me every reason to say that I’m 100% correct.


So, by your opinion, we shouldn't be comparing AMD and Intel at all because AMD's CPUs are derivatives of the Opteron (a server CPU) and Intel's CPUs are a derivative of the Pentium M (a mobile CPU). Is this or is this not correct?

It is correct if you add the manufacturing process too. You said yourself that Intel's own 90nm behaved differently with different designs (P4/PM). So you and ho ho are only agreeing with me, but it's funny you both say you do not.

Ho Ho said...

christian
"What are you talking about? It gets 15% more fps with higher HT and RAM."

Can you see that the fastest speed was achieved with 1.3GHz HT and 511MHz RAM? Kind of odd that 2.1GHz HT and 508MHz RAM was much slower.


aguia
"I don’t think any of the Intel PD9xx was manufactured on 90nm all are 65nm."

Every single P4D 9xx is 65nm.

Axel said...

Howell

What are you talking about? It gets 15% more fps with higher HT and RAM.

How about showing your calculations. I see faster HT showing zilch benefit in Crysis:

3048 MHz CPU
508 MHz memory
2120 MHz HT
= 71.9 fps

3071 MHz CPU
512 MHz memory
1335 MHz HT
= 73.0 fps

So with a 0.8% faster core, 0.8% faster memory, and 37% SLOWER HT link, for the four timedemos Crysis is 2.6%, 0.7%, 0.7%, and 2.1% FASTER. The real maximum benefit should be 0.8%, so the 2.x% figures are simply normal benchmark skew. Anyway, this basically shows that HT speed has absolutely no bearing on game performance, even with all four cores and GPU in use.
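
As a rough check on the figures in this comment, here is a minimal sketch that computes the relative differences between the two configurations. The clock speeds and frame rates are the ones listed above; the dictionary layout and the helper name `pct_change` are only illustrative assumptions.

```python
# The two Phenom configurations quoted in the comment above.
config_fast_ht = {"cpu_mhz": 3048, "mem_mhz": 508, "ht_mhz": 2120, "fps": 71.9}
config_slow_ht = {"cpu_mhz": 3071, "mem_mhz": 512, "ht_mhz": 1335, "fps": 73.0}


def pct_change(new, old):
    """Percentage change going from `old` to `new` (hypothetical helper)."""
    return (new / old - 1.0) * 100.0


# Change in each quantity when moving from the fast-HT to the slow-HT setup.
deltas = {
    key: pct_change(config_slow_ht[key], config_fast_ht[key])
    for key in config_fast_ht
}

for key, value in deltas.items():
    print(f"{key:8s} {value:+.1f}%")
```

The overall frame rate moves by roughly the same small amount as the core and memory clocks, while the HT link drops by more than a third, which is the basis for the conclusion that HT speed has little effect in this test.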

Greg said...

Wow, axel, that wasn't childish in the least. Let's make sure we try to degrade Greg's argument by lying about what he said...

Look, I never said that Crysis was designed with a bias for Intel or to gimp AMD processors. Your attempt to paint the argument that way is just pathetic.

Crytek's last product produced benchmarking results that were never exactly representative of the performance of the cards being tested when it was first released. This largely died off after the first couple of patches, because Crytek obviously cares, and didn't want a flawed product.

I don't think Crysis is a flawed test, either, nor did I ever say that before. I did say that I think we should wait until a patch or two is released before giving the results weight, though.

As to what I expect? I don't know exactly what to expect between AMD's and Intel's products, and I won't bother speculating, but I do expect there to be an actual improvement in performance/core between k10 and k8, which your speculation seems to completely contradict. If we don't see that, then I think the only logical conclusion we can reach is scientia's, which is that there is some bug in k10.

Axel said...

Greg

Look, I never said that Crysis was designed with a bias for Intel or to gimp AMD processors.

My apologies. When you stated earlier that "they're pretty much all using kentsfields with 8800s", I thought you meant the developers and not the testers.

So if Crysis code is poorly optimized (and people might be inclined to believe you if you provide a link to back up that claim), then what this might mean is that Core 2 does better with unoptimized code than K10. Personally, however, I believe that the Crysis results are fairly representative of the general gaming results that will be revealed in two weeks or so. I base this prediction on Barcelona's performance in the Tech Reports and Anandtech workstation & desktop testing.

Crytek's last product produced benchmarking results that were never exactly representative of the performance of the cards being tested when it was first released. This largely died off after the first couple of patches, because Crytek obviously cares, and didn't want a flawed product.

Please back this statement up with a link. Without a link you might as well be like Rahul Sood spouting off some nonsense about Phenom performance at 3.0 GHz.

I did say that I think we should wait until a patch or two is released before giving the results weight, though.

What are the chances that a patch would help Phenom performance without helping Core 2 about equally as well? Remote. But if you want to wait a few months before giving Crysis benchmarks any weight, be my guest.

...but I do expect there to be an actual improvement in performance/core between k10 and k8, which your speculation seems to completely contradict.

No, my words were "K10 is clearly faster than K8 per clock, but unfortunately not by much." The bottom line is that the improvements in K10 are not paying dividends as expected. This reminds me of when information on Prescott's many improvements over Northwood came out and people were excited over the almost certain clock-for-clock improvements. What Intel didn't reveal until later was that these improvements simply mitigated the massive performance penalties associated with the greatly lengthened pipeline and the ALUs reverting to baseline clock instead of double-pumped as before.

The fact is that K10 is not a major overhaul of K8 but simply an extensive tweak. There is probably a fundamental bottleneck in both K8 and K10 that cannot be removed without essentially starting from scratch.

Giant said...



I do not; that’s my whole point. Comparing K8/K10 on AMD's manufacturing process vs. Core 2 Duo on Intel's manufacturing process doesn’t tell you anything about who has the best process, or does it?


Of course it does. Netburst is history now. All of Intel's production is Core 2 now. Why should we compare an antiquated architecture that is no longer produced to AMD's current products?


But AMD only had 90nm at the time; what do you want me to do? If I compare the new AMD K8 65nm vs. Intel P4 65nm, it will only extend the results to AMD's benefit.


Then why don't you compare AMD 65nm dual core K8 and quad core K10 to Intel 65nm dual core Core 2 and quad core Core 2? These are the CURRENT PRODUCTS from AMD and Intel.

It is correct if you add the manufacturing process too. You said yourself that Intel's own 90nm behaved differently with different designs (P4/PM). So you and ho ho are only agreeing with me, but it's funny you both say you do not.

I'm not agreeing with you at all. I cited a prior example. Back then the Pentium M was a mobile architecture, while Netburst powered servers and desktops. But now, all of AMD's CPUs are based on Opteron, a server design. All of Intel's x86 CPUs are based on the Pentium M, a mobile design.

You failed to answer my question earlier: Are you suggesting that any comparison between AMD and Intel CPUs is totally flawed, because AMD's CPUs are based on a server design while Intel's CPUs are based on a mobile design? Yes or no?

Ho Ho said...

greg
"If we don't see that, then I think the only logical conclusion we can reach is scientia's, which is that there is some bug in k10."

There has been at least one revision since the launched Barcelonas and they still haven't managed to fix the bug? Not to mention that this bug should have been known for ages.

My guess is the "bug" is that Barcelona really doesn't have that great IPC and all that talk about 40%+ performance was just for bandwidth sensitive workloads. AMD had to say something back then as Core2 was gaining marketshare from Opterons and they bent the truth a bit to make it look better than it was. Unfortunately many believed that Barcelona should be significantly faster than Core2 in other workloads too.

One thing that theoretically could have improved K10 performance was losing the ability to clock cores separately. That would make it possible to have direct links from cores to L3 and extra latency of intermediate buffers would not have been there. I doubt AMD would do anything like that though.

Axel said...

Ho Ho

AMD had to say something back then as Core2 was gaining marketshare from Opterons and they bent the truth a bit to make it look better than it was.

No, they pretty much outright lied. Randy Allen's exact words were "We expect across a wide variety of workloads for Barcelona to outperform Clovertown by 40 percent." So he implied a general performance advantage of 40%.

Even in the server space, Barcelona doesn't generally outperform Clovertown by 40% by any measure: IPC, per watt, per watt per mm2. Maybe only in idle power consumption but this would be a ridiculous measure to refer to in the context of "a wide variety of workloads". No, AMD lied, plain and simple. The execs should be hung out to dry.
