Thursday, August 23, 2007

2008 And Beyond

2007 is far from over, but it seems that lately people prefer to talk about 2008. Perhaps this is because AMD is unlikely to get above 2.5Ghz with K10, and Penryn will only have a low volume of about 3%. I suppose this is not a lot to get excited about. So, we are encouraged to cast our gaze forward, but what we see is not what we might expect.

AMD's server chip volume has dropped considerably since last year. However, there is little doubt that this trend will reverse in Q3 and Q4 of 2007 with Barcelona, because even at lower clock speeds Barcelona packs considerably more punch than K8 Opteron at a similar power draw. The 2.0Ghz Q3 chips should replace around half of AMD's current Opterons, with faster 2.5Ghz chips replacing even the fastest 3.0Ghz K8 Opterons in Q4. This should leave Intel with two faster server chip speeds in Q4, most likely falling to a single speed in Q1 08. However, Intel may be able to pull farther ahead in Q2 08.

I'm sure this will be confusing to those who compare the Penryn launch with Woodcrest last year and assume that the highest speed grades will be released right away. The problem with this view is that Penryn is leading 45nm in Q4 of this year, whereas Woodcrest did not lead 65nm in 2006. Instead, Woodcrest was six months behind Presler, which went into 65nm production in October 2005 and launched in December 2005. This explains why Woodcrest was able to hit the ground running and launch at 3.0Ghz: June 2006 was six months after 65nm Presler. Taking this as the pattern for 45nm would mean top initial speeds won't be available until Q2 2008, which seems plausible since Intel has been pretty quiet about Q1 08 release speeds.

If the market expands in early 2008, Intel should get a boost as AMD feels the pinch in volume capacity caused by the scale-down at FAB 30 and the increased die size of quad core K10. Combined with Intel's cost savings from ramping 45nm, this puts Intel at its greatest advantage. However, by the end of 2008 this advantage will be gone, and Intel won't see any new advantage until 2010 at the earliest.

To understand why Intel's window of advantage is so small you need to be aware of the differences in process introduction timelines, ramping speeds, base architecture speed, and changing die size advantages. A naive assumption would be that: 1.) Intel's timeline maintains a process launch advantage over AMD, 2.) Intel transitions processes faster, 3.) Penryn is considerably faster than Conroe and Nehalem is considerably faster than Penryn, and 4.) Nehalem maintains Penryn's die size advantage. However, each of these assumptions would be incorrect.

1.) Timeline

Q2 06 - Woodcrest
Q3 07 - Barcelona (trailing by 5 quarters)

Q4 07 - Penryn
Q3 08 - Shanghai (trailing by 3 quarters)

Q4 08 - Nehalem
Q2 09 - Bulldozer (trailing by 2 quarters)

Q4 09 - Westmere
Q1 10 - 32nm Bulldozer (trailing by 1 quarter)

Intel's Tick Tock timeline is excellent, but AMD's timeline steadily shortens Intel's lead over the next two and a half years. This essentially means that the dominance that C2D enjoyed for more than a year will not be repeated. I suppose it is possible that 45nm will be late, but AMD continues to say that it is on track. The main reason I am inclined to believe them is die size. When AMD moved to 90nm they only had a small shrink in die size at first and then later a second shrink. On 65nm, AMD only reduced Brisbane's die size to 70%, and nine months later a second shrink should presumably be possible. But there isn't one: Barcelona shows the same 70% reduction as Brisbane. This suggests to me that AMD has skipped a second die shrink and is concentrating on the 45nm launch. I'm pretty certain that if 45nm were going to be late we would be seeing another shrink of 65nm as a stopgap.


2.) Process Transition

Most people who talk about Intel's process development only know that Intel launches a process sooner than AMD. However, the amount of time it takes Intel to actually field a new process is also important. Let's look at Intel's 65nm history starting with an Intel Presentation concerning process technology. Page 2:

Announced shipping 65nm for revenue in October 2005

CPU shipment cross-over from 90nm to 65nm projected for Q3/06


And, from Intel's website, 65-Nanometer Technology:

Intel has been delivering 65nm processors in volume for over one year and in June 2006 reached the 90-65nm manufacturing "cross-over," meaning that Intel produced more than half of total mobile, desktop and server microprocessors using industry-leading 65nm process technology.

So, we can see that Intel did quite well and even beat its own projection by reaching crossover in late Q2 instead of Q3. October 2005 to June 2006 would be eight months to 50% conversion. For AMD, the INQ had a rumor of 65nm shipping in October 2006, and we know it officially launched December 5th, 2006. Let's assume the October date is true, since it mirrors Intel's pattern of October revenue shipments followed by a December launch in 2005. The AMD Q1 2007 earnings transcript from April 19th, 2007 says:

100% of our fab 36 wafer starts are on 65 nanometer technology today

October 2006 to April 2007 would be six months. So, this would mean that AMD made a 100% transition in two months less than it took Intel to reach 50%. Intel's projection for 45nm is very similar, with crossover not occurring until Q3 08. What this means is that even though Intel launches 45nm with a head start in Q4 07, AMD, launching 45nm in Q3 08 but converting roughly twice as fast, should be completely caught up by Q1 09.


3.) Base Architecture Speed

Intel made grand claims of a 25% increase in gaming performance (40% faster for 3.33Ghz Penryn versus 2.93Ghz Kentsfield; the clock difference alone is about 14%, so the clock-for-clock gain claimed works out to roughly 23%). However, according to Anandtech's Wolfdale vs. Conroe Performance review, Penryn is 4.81% faster, while HKEPC gets 5.53% faster. A 5% speed increase is similar to what AMD got when it moved from 130nm to 90nm.

The problem that I see is not Intel's exaggeration but that Nehalem seems to use the same core. In fact, other than HyperThreading there seem to be no major changes to the core between Penryn and Nehalem. The main improvements with Nehalem seem to be external to the core: an Integrated Memory Controller, point to point communications, L3 cache, and enhanced power management. The real speed increases seem to come primarily from GPU processing and the Application Targeted Accelerator (ATA) instructions, but like HyperThreading these are not going to make for significant increases in general processing speed. And, since Westmere is the same core on 32nm, this means no large general speed increases (aside from clock increases) for Intel processors until 2010 at the earliest.

I suppose this then leaves the question of whether AMD will get a larger general speed increase with Bulldozer. Presumably, if AMD can manage it, they could pull ahead of Nehalem. Both Intel and AMD are going to use GPUs on the die and both are going to move to more cores. Nehalem might get ahead of Shanghai since, while both can do 8 cores, Nehalem can also do HyperThreading. But Bulldozer moves back ahead again by allowing 16 actual cores. At the moment it is difficult to imagine a desktop application that could effectively use 8 cores, much less 16, but who knows how it will be in two years.


4.) Die Size

For AMD the goal is to get through the first half of 2008, because the game looks quite different toward the end of 2008. By the time Nehalem is released Intel will already have gotten most of the benefit of 45nm while AMD will only be starting. Intel will also lose its small-die MCM advantage because Nehalem is a monolithic quad core die like Barcelona. Intel only got a modest 25% shrink on 45nm and so far has only gotten a 10% reduction in power draw, so AMD can certainly stay in the game. It is also a certainty that Nehalem will have a larger die size than quad core Penryn, because Nehalem has to add both an Integrated Memory Controller and the point to point CSI interface, plus L3 cache. It would not be surprising if the Nehalem die were larger than AMD's Shanghai die. The one positive for Intel is that although yields will be worse with a monolithic die, their 45nm process should be mature by then. However, AMD has shown considerably faster process maturity, so yields should be good on Shanghai in Q1 09 as well.
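To make the monolithic-versus-MCM yield trade-off concrete, here is a minimal sketch of the back-of-the-envelope comparison involved, using a simple Poisson yield model. The defect density is an illustrative guess, and the die areas are the roughly 107mm^2 dual core Penryn die versus a hypothetical monolithic die of twice that area:

```c
#include <math.h>
#include <stdio.h>

/* Poisson yield model: Y = exp(-A * D0), with die area A in cm^2
   and defect density D0 in defects per cm^2. */
static double die_yield(double area_mm2, double defects_per_cm2)
{
    return exp(-(area_mm2 / 100.0) * defects_per_cm2);
}

int main(void)
{
    const double d0 = 0.5;           /* assumed defects per cm^2 (a guess) */
    const double small_die = 107.0;  /* dual core Penryn, ~107 mm^2 */
    const double mono_die = 214.0;   /* hypothetical monolithic quad */

    printf("small die yield:       %.1f%%\n", 100.0 * die_yield(small_die, d0));
    printf("monolithic quad yield: %.1f%%\n", 100.0 * die_yield(mono_die, d0));

    /* An MCM quad pairs two individually tested dies, so its effective
       yield stays near the small-die figure, while one killer defect
       costs the monolithic design the whole quad. */
    return 0;
}
```

With these made-up numbers the small die yields about 59% and the monolithic die about 34%, which is the qualitative point; a maturing process with a lower defect density narrows the gap, which is exactly the argument above.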

An Aside: AMD's True Importance

Finally, I have to say that AMD is far more important than many give them credit for. I recall a half-baked editorial by Ed Stroligo, A World Without AMD, where he claimed that nothing much would change if AMD were gone. This notion shows a staggering ignorance of Intel's history. The driving force behind Intel's advance from 8086 to Pentium was Motorola, whose 68000 line was initially ahead. It had been Intel's intention all along to replace x86, and Intel first tried this back in 1981 with the iAPX 432. Its segmented 16MB addressing looked pretty good compared to 8086's 1MB segmented addressing. However, it looked a lot worse than the 68000's flat 16MB addressing, which had been released the year before. The very next year the iAPX 432 became the Gemini Project, which then became the BiiN company. The iAPX 432 continued in development with the goal of replacing x86 until 1989, but the project could not keep up with the rapid pace of x86, which was itself racing each generation of 68000. When BiiN finally folded, a stripped down version of the iAPX 432 was released as the embedded i960 RISC processor. Interestingly, as the RISC effort ran into trouble Intel began working on VLIW, and when BiiN folded in 1989 Intel released its first VLIW processor, the i860. HP began work on EPIC the same year, and five years later Intel was committed to EPIC VLIW as an x86 replacement.

In 1995 Intel introduced Pentium Pro to take on the established RISC processors and grab more share of the server market. The important point though is that there is no indication that Intel ever intended Pentium Pro to be used on the desktop. We can infer this for a couple of reasons. First, Itanium had been in development for a year when Pentium Pro was introduced, and an Itanium release was expected in 1998. Second, with Motorola out of the way (68000 development ended with the 68060 in 1994), Intel was not expecting any real competition on the desktop. AMD and Cyrix were still making copies of the 80486, so Intel had only planned some modest upgrades to Pentium until Itanium was released. However, AMD released K5, which thoroughly stunned Intel. Although K5 was not that fast, it did have a RISC core (courtesy of AMD's 29050 RISC processor), which put K5 in the same class as Pentium Pro and a generation ahead of Pentium. Somehow AMD had managed the impossible and skipped the Pentium generation. So, Intel went to an emergency plan and two years later released a cost reduced version of Pentium Pro for the desktop, Pentium II. The two year timeline indicates that Intel was not working on a desktop version prior to K5's release. Clearly, we owe Pentium II to K5.

However, AMD purchased NexGen and released the powerful K6 (which also had a RISC core) just two years later, meaning that it arrived at the same time as PII. Once again Intel was forced to scramble and release PIII two years later. We owe PIII to K6. But AMD had been hard at work on a K5 successor, and with the added technology from K6 and some Alpha tech it released K7. Intel was even more shocked this time because K7 was a generation ahead of Pentium Pro. Intel was out of options, so it was forced to release the experimental Willamette processor and then follow up with the improved Northwood two years later. We owe P4 to K7. That P4 was experimental and never expected to be released is quite clear from the pipeline length. The Pentium Pro design had a 14 stage pipeline, which was reduced to 10 stages in PII and PIII. Interestingly, Itanium also used a 10 stage pipeline. However, P4's pipeline was even longer than the original Pentium Pro's at 20 stages. Itanium 2 has an even shorter pipeline at 8 stages, so it is clear that Intel does not prefer long pipelines. We can then see that P4 was an aberration caused by necessity, and Prescott at 31 stages was a similar design of desperation. Without K8 there would be no Core 2 Duo today, and without K10 there would be no Nehalem.

There is no doubt whatsoever that just as 8086's rapid advance against competition from the Motorola 68000 stopped the iAPX 432 and shut down BiiN, Intel's need to advance Pentium Pro rapidly on the desktop stopped Itanium. Intel already had experience with VLIW from the i860 and would have delivered Merced on schedule in 1998. Given Itanium's per-clock speed it could have been viable at as little as 150Mhz. However, Pentium II was already at 450Mhz in 1998, with faster K7 and PIII speeds due the next year. The pace continued rapidly, going from Pentium Pro's 150Mhz to PIII's 1.4Ghz. Itanium development simply could not keep up, and the grand plans of 1997 for Itanium to become the dominant processor fell apart. The pace has been no less relentless since PIII, and Itanium has been kept to a niche server market.

AMD is the sole reason why Itanium is not the primary processor architecture today. To suggest that nothing would change if AMD were gone is an extraordinary amount of self delusion. Intel would happily stop developing x86 and would put its efforts back into Itanium instead. The x86 line is also without any serious desktop replacement. Alpha, MIPS, and ARM stopped being contenders long ago. Power was the last real competitor, but it fell out of the running when its desktop chips couldn't keep up and were dropped by Apple. This means that without AMD, Intel's sole competition for desktop processors is VIA. And just how far behind is VIA? No AMD would mean higher prices, slower development, and the eventual phase-out of x86. Of course, I guess people can always hope that Intel has given up its quarter-century-old goal of dropping the x86 line and moving the desktop to a completely proprietary platform.

350 comments:

Ho Ho said...

scientia
"If increasing the FSB fixed the problem then Intel wouldn't need to pump up the cache size on Penryn"

Most likely it does it just because it can. A 107mm^2 chip is cheap to produce, and the cache can add to performance in some places.


"I'm sure Intel will support FBDIMM-800 initially and FBDIMM-1066 as soon as it is available."

I'm sure it will, but you made it sound as if FBDIMM won't get much (any) faster than that.


"Stalls were a big problem with the superpipeline of P4; why would they be such a problem for Nehalem?"

Why do you think they would be? Could it simply be that, as IPC is generally <1 on average code even though the CPU's theoretical peak is much higher, you can put some additional instructions from another thread into the pipeline?


"And, shouldn't the IMC reduce Nehalem's memory latency?"

Exactly my point, HT isn't there to help with memory latency either.


"For x87 based code, K10 is about the same speed as K8. But, no one is still developing for this."

Except that there exists no half-decent autovectorizing compiler that can turn regular C/C++ (or JIT-compiled Java) into SIMD code. If you want to use SIMD you have to write it yourself, either in pure ASM, with builtins, or with intrinsics.
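For illustration, a minimal sketch of the kind of hand-written SIMD being described, using SSE intrinsics; the scaling kernel and all names in it are invented for the example:

```c
#include <stdio.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Scalar version: what a compiler of the day would typically emit. */
static void scale_scalar(float *dst, const float *src, float k, int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}

/* Hand-vectorized: four floats per iteration. Assumes dst and src are
   16-byte aligned and n is a multiple of 4. */
static void scale_sse(float *dst, const float *src, float k, int n)
{
    __m128 vk = _mm_set1_ps(k);               /* broadcast k to 4 lanes */
    for (int i = 0; i < n; i += 4) {
        __m128 v = _mm_load_ps(src + i);      /* load 4 packed floats  */
        _mm_store_ps(dst + i, _mm_mul_ps(v, vk));
    }
}

int main(void)
{
    float a[8] __attribute__((aligned(16))) = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] __attribute__((aligned(16)));

    scale_scalar(b, a, 2.0f, 8);
    scale_sse(b, a, 2.0f, 8);
    printf("%.1f %.1f\n", b[0], b[7]);  /* 2.0 16.0 */
    return 0;
}
```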


"Interesting. So, you are suggesting that Nehalem will have two separate L2 caches."

Two? I meant one per core.


"L2 on K10 is a victim cache instead of a direct cache as it is on C2D. L3 on K10 is only a secondary victim cache. K10 has no speculative loads to either L2 or L3."

Remember that Nehalem is a whole new microarchitecture; everything is possible. Just because Intel hasn't yet said how it will look doesn't necessarily mean it is simply a Core2 with an added IMC and native quad core.


Btw, how complex is the circuitry that talks to the FSB? When adding an IMC, Intel can remove that part to save some transistors.


"Well, K8's run ok with 512K per core while C2D's take a noticeable hit. K8's run great with 1MB per core while C2D's don't show top performance until they have 2MB's per core"

This is from not having IMC and low memory latency, nothing else.


"So, at a minimum Nehalem should need an Allendale level of cache. So, let's say 4MB's L2 plus 2MB's L3"

Wow, such flawed logic. Please try to remember that with low memory latency you don't need nearly as much cache, and a smaller amount of it won't have a big impact on performance. The only reason Intel keeps adding cache is to fight memory latency: by having more cache it takes fewer cache misses, leading to better performance.


As for SSE5, the only important thing I see missing is a dot product. I sure hope AMD won't skip SSE4 and that one would add it. Another thing: I hope they increase the number of SIMD registers by at least twice, if not four times. Now that would really make things hard for the alternative architectures.

Ho Ho said...

scientia
"Intel got JEDEC to skip faster DDR even though DDR-500 was clearly possible. Intel did this because it knew that it benefited more from DDR2."


So Intel could get JEDEC to skip DDR1 500 but can't get it to standardize faster FBDIMMs? Is there any technical reason why one can't produce faster FBDIMMs?

Aguia said...

Three operand instructions is huge. This greatly leverages the existing SSE register set and can remove about 25% of the coding volume. Let me state this another way. You might recall the Altivec instruction set that Mac programmers greatly bemoaned losing when Apple switched to Intel? This is as powerful as Altivec. Having said that I'm now wondering if maybe IBM influenced this.

Yes, it's very amazing, that's for sure; even the Anandtech guys say it's the biggest and most important instruction set since Intel's SSE.

And even the Light-Weight Profiling Spec looks very interesting; AMD will give more details about it in the coming weeks.

AMD has a solid strategy on a solid foundation; let's see if they can execute it.
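For readers wondering what the three-operand instructions quoted above buy in practice, here is a minimal sketch. The multiply-add kernel is invented for illustration; the commented assembly shows the extra register copy that destructive two-operand SSE forces, which a non-destructive three-operand encoding (as SSE5 proposes) avoids:

```c
#include <stdio.h>
#include <xmmintrin.h>

/* r = a*b + c with today's destructive two-operand SSE. */
static __m128 madd(__m128 a, __m128 b, __m128 c)
{
    __m128 t = _mm_mul_ps(a, b); /* two-operand form must copy a first:    */
                                 /*   movaps xmm3, xmm0 ; mulps xmm3, xmm1 */
    return _mm_add_ps(t, c);     /*   addps  xmm3, xmm2                    */
}

int main(void)
{
    float out[4];
    __m128 r = madd(_mm_set1_ps(2.0f), _mm_set1_ps(3.0f), _mm_set1_ps(1.0f));
    _mm_storeu_ps(out, r);
    printf("%.1f\n", out[0]); /* 2*3 + 1 = 7.0 */
    return 0;
}

/* With a three-operand encoding the movaps disappears, and SSE5's proposed
   fused multiply-add would collapse the remaining mulps+addps pair into a
   single instruction (something like fmaddps xmm3, xmm0, xmm1, xmm2).
   Fewer copies and shorter sequences are where the quoted ~25% reduction
   in coding volume comes from. */
```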

Aguia said...

ho ho
JEDEC is not that important this time, I think.

Do you want me to list here all the DDR2 1066MHZ KITs selling in my country?

Just this:
There are more DDR2 1066Mhz kits than DDR3 1066Mhz and 1333Mhz kits combined.

In other words, with JEDEC or without JEDEC, AMD already has 1066Mhz DDR2 “officially”.

Ho Ho said...

aguia
"And even the Light-Weight Profiling Spec looks very interesting; AMD will give more details about it in the coming weeks."

Can you explain what makes it so special when we have had similar or basically identical functionality in all x86 CPUs since original Pentium? The only difference I see is that AMD proposes to standardize the model specific registers and instructions that are currently needed to access the profiling data.


"Do you want me to list here all the DDR2 1066MHZ KITs selling in my country?"

I wasn't talking about desktop RAM but server RAM. Find some of those and then we'll talk. Did you know that DDR1 600 existed? Also, I know there are desktop DDR2 kits at >1333MHz.


"In other words, with JEDEC or without JEDEC, AMD already has 1066Mhz DDR2 “officially”."

Are they suitable for server? Guess not.

Pop Catalin Sever said...

Ho ho said:
"Are they suitable for server? Guess not. "

If AMD directly certifies the 1066 MHz DDR2 memory made by producers, then the AMD certified 1066 memory will be suitable for servers without JEDEC being involved.

abinstein said...

Ho Ho -
"Can you explain what makes it so special when we have had similar or basically identical functionality in all x86 CPUs since original Pentium?"

No, you did not understand it correctly. LWP lets the processor hardware do the profiling without software interfacing (i.e. interference) such as interrupts or locking. You can profile cache hits and branch prediction with software, but due to the heavy weight you can hardly dynamically optimize your running code, under profiling, according to the profiling results.

Ho Ho said...

pop catalin
"If AMD directly certifies the 1066 MHz DDR2 memory made by producers, then the AMD certified 1066 memory will be suitable for servers without JEDEC being involved."

My point was that those are not registered dimms.


abinstein
"You can profile cache hits and branch prediction with software, but due to the heavy weight you can hardly dynamically optimize your running code, under profiling, according to the profiling results."

As I said, I can count around nine separate profiling events in parallel having at most 1% performance hit. Is that too "heavy weight"?

A small overview of the events I can profile in real-time is listed here. I could list the whole near 600 events I have on my Core2 if you like to.

abinstein said...

"As I said, I can count around nine separate profiling events in parallel having at most 1% performance hit. Is that too "heavy weight"?"

Try to profile every user space process your system is running and see what the performance hit is.

Your profiling is no good to any process if multiple processes are context-switching with each other. Furthermore, retrieving all of this data from the CPU and arranging it for all processes will definitely make your system slow.

Ho Ho said...

abinstein
"Try to profile every user space process your system is running and see what the performance hit is"

It would be exactly the same. Believe me, I've done it. Don't believe me? Install PAPI yourself and prove me wrong.

abinstein said...

"It would be exactly the same. Believe me, I've done it. Don't believe me? Install PAPI yourself and prove me wrong."

I don't need to prove you wrong; it's you who needs to prove yourself right. All I need is to read the FAQ on the PAPI site:

Many systems have only a few hardware performance counter registers thus you can only measure a few metrics at once. Some platforms may support counter multiplexing, which gives the user the illusion of a larger number of registers by time sharing the performance registers. On the R10K series, the IRIX kernel supports multiplexing, allowing up to 32 events to be counted at once. Don't take fine grained measurements when multiplexing, unless you know what you're doing.
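For concreteness, this is roughly what the counter-based measurement being argued about looks like with PAPI's high-level counting API; the event choice is illustrative, and the presets must actually exist on the machine in question:

```c
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

int main(void)
{
    /* Two preset events: total cycles and L1 data cache misses. Hardware
       counter registers are scarce, so requesting too many events either
       fails or gets time-multiplexed, as the FAQ above notes. */
    int events[2] = { PAPI_TOT_CYC, PAPI_L1_DCM };
    long long values[2];

    if (PAPI_start_counters(events, 2) != PAPI_OK) {
        fprintf(stderr, "could not start counters\n");
        return EXIT_FAILURE;
    }

    /* ... run the code being measured here ... */

    if (PAPI_stop_counters(values, 2) != PAPI_OK) {
        fprintf(stderr, "could not stop counters\n");
        return EXIT_FAILURE;
    }

    printf("cycles: %lld, L1D misses: %lld\n", values[0], values[1]);
    return 0;
}
```

Note that this counts events for the calling process only; extending it to every process on a system at once is exactly the multiplexing and bookkeeping problem being debated.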

InTheKnow said...

Scientia, your original statement was...
I also predict that when Nehalem is released the strong justification in favor of using MCM dies for better yields, higher speeds, and lower cost will suddenly evaporate.

How does AMD's decision to follow this route negate the fact that Intel will use this technique for octo-core? MCM will still give better yields, lower costs, and higher speed (compared to 4 cores). Hence the talk about MCM will not die out. It is still a viable approach to the upper end multi-core chips at any technology node.

Again, we are comparing 45nm monolithic Nehalem to 45nm MCM Penryn; clearly, Nehalem is worse. Or we could compare with 45nm monolithic Shanghai and again Intel loses its current MCM advantage.

You are assuming that Nehalem has the same basic core size as Penryn. I have read over and over how Intel needs the cache to compensate for the lack of an IMC and HT. When they add these with Nehalem, why would they need to continue using up so much space on cache?

So if cache is half of my die and I reduce it by 50%, each core is now 25% smaller than an equivalent Penryn core. Total die size for 4 cores on Nehalem is then equivalent to 1.5 Penryn dies, or ~160 mm^2. How does that compare to the current Core 2 size on 65nm? Not too much worse, and Core 2 yields are good, so Nehalem should have acceptable yields.

So while Nehalem may have somewhat lower yields than Penryn, they should still be better for a given defect density than Barcelona's. If AMD can get good yields on Barcelona, I'm confident that Intel can get good yields on Nehalem. Remember, Intel said that the economics were favorable on 45nm. To see if they were right you have to compare the yield hit at 45nm to the yield hit at 65nm to see the economic impact.

There are free yield simulators out there. Run the numbers and see for yourself whether Nehalem yields will be any worse than Barcelona's. And if you want to compare to Shanghai, don't forget to include an estimate for the increased cache size on Shanghai when you estimate die size.

In reference to Silverthorne you have said that Intel has an uphill battle...
Because Intel is just now trying to enter into this market where ATI is already well established. Intel has admitted to this.

First, ATI is not the competitor. Intel is talking about the full PC experience on small form factors, specifically the UMPC/MID as the target device for Silverthorne. This is not the same CE market segment that ATI is already in. Their competitors are AMD with Geode (and eventually Bobcat) and VIA.

You also seem to be under the mistaken impression that Intel is going to break into this area with Silverthorne. The fact is, they are already the leader in design wins for UMPC/MID devices. Out of 30 devices listed here 17 are Intel, 9 are VIA and the remaining 4 are AMD Geode. What Intel is offering with Silverthorne is a huge decrease in size and power consumption for their customers to upgrade existing product offerings. Note that you can buy these products now, not at some nebulous date in the future.

abinstein said...

"So if cache is half of my die and I reduce it by 50%, each core is now 25% smaller than an equivalent Penryn core. Total die size for 4 cores on Nehalem is then equivalent to 1.5 Penryn dies, or ~160 mm^2."

Your estimate is just wrong.

First, Nehalem's cache per core might be less than Penryn's, but its total cache size will be larger and its cache structure will be more complicated (higher associativity, e.g.). It's not going to have better yield just because it's smaller per core.

Second, 50% of Penryn's L2 cache is not 25% of Penryn's die area. Instead it's more like 17% (6*3/107).

Third, Nehalem needs to spend die area on the memory controller, northbridge, and internal multiplexers. For the K8 X2 these account for roughly 1/6 to 1/8 of total die area; for K10 it's probably even more. This increase will most likely cancel out all the area saved by the smaller cache.

Fourth, Nehalem will include some core improvements, making the core larger. The SMT will also increase die area and complexity, both of which decrease yield.

InTheKnow said...

Abinstein, have you ever heard of a back-of-the-envelope calculation? That is all I was doing. The intent was not to perform a rigorous analysis of an undisclosed design where all we have is pure speculation to begin with.

You also missed the words "defect density", as in: So while Nehalem may have somewhat lower yields than Penryn, it should still be better for a given defect density than Barcelona.

When using the simple die area calculators I was referring to, the type of structure isn't considered. The calculation just looks at probability of getting a killer defect on a given die.

As to the memory controller, etc., I haven't seen any diagrams of the CSI connections or Intel's IMC design, so that just adds more speculation. It could be bigger or smaller than AMD's equivalent; who knows? I'm also sure that the removal of the components used to communicate with the FSB will give back some area, though not a lot.

So the bottom line is the best we can do is a bit of hand waving to get a feel for what we might see. Any attempt at a rigorous analysis is just self delusion.

The bottom line of my post was that Nehalem on 45nm will be more economical than Barcelona on 65nm.

Ho Ho said...

abinstein
"I don't need to prove you wrong, it's you who need to prove yourself right. All I need is to read the FAQ on PAPI site:"

Did you notice it had nothing to do with profiling the entire system, which is what you were originally talking about?


Reading several different types of events, and reading from several processes at once, are completely different things.

For comparison, how many events can that AMD scheme measure in parallel? In their PDF they talk about "maximum countable events", but nowhere do they say how big the maximum is. I'm sure it is pretty much exactly the same as it is now: a few counters (~2-15) that are multiplexed with interrupts if you need to measure more.

So once again, you seem not to understand what is available now and what AMD proposes. I'll repeat it once more: they are proposing to standardize the long existing profiling interface in x86 CPUs. There is nothing revolutionary about it, and even suggesting that it would bring speculative threading/reverse HT into reality is pure nonsense.

I'm not saying that standardizing is not good; it is. It would make things like perfctr much easier, and perhaps Microsoft would finally have half-decent support for it as well. It seems to me that nobody even knew about the existence of lightweight profiling support in their CPUs, and now that AMD talks a bit about it, many think it is something new and revolutionary that will make lots of stuff possible. Well, it won't, not without massive further modification that would bear little resemblance to the original.


"First, Nehalem's cache per core might be less than Penryn's, but its total cache size will be larger and its cache structure will be more complicated (higher associativity, e.g.)."


I can understand the associativity but why on earth would the cache need to be bigger?

abinstein said...

"Did you notice it had nothing to do with profiling the entire system, which is what you were originally talking about?

Reading several different types of events, and reading from several processes at once, are completely different things."


I was from the beginning talking about profiling "every user space process" (just search for this phrase on this page). That means not just more events per process, but profiling every running process.

Unless you can profile every process efficiently at the same time, you can't really do dynamic performance adjustment in hardware. Your other replies, informative or not, are unfortunately missing the whole point. I consider this matter closed.

abinstein said...

"Abinstein, have you ever heard of a back of the envelope calculation? That is all I was doing."

Yes, I know back-of-the-envelope, but I've never heard of such a calculation that accounts only for reductions and not for any increases.

When you take the trouble to guesstimate the die area reduction from a smaller cache, but omit the obvious die size increases from the IMC and SMT, you are not just doing a back-of-envelope calculation but a biased one. I'm merely correcting your bias. :)

"You also missed the words "defect density", as in: So while Nehalem may have somewhat lower yields than Penryn, it should still be better for a given defect density than Barcelona."

Let me ask you two questions:

1. Do you believe Nehalem will have smaller core and less cache size than Barcelona at the same process technology?

2. Are you comparing Nehalem @45nm process to Barcelona @65nm process?

Before you can answer both of the above, your "claims" are as good as nothing.

abinstein said...

"First, Nehalem's cache per core might be less than Penryn's, but its total cache size will be larger and its cache structure will be more complicated (higher associativity, e.g.)."

I can understand the associativity but why on earth would the cache need to be bigger?


Ho Ho, you're not paying me and I'm tired of answering every silly question of yours. But just for the sake of Labor Day, let me be nicer to you.

The total cache size of a 4-core Nehalem die will be larger than that of a Penryn die. This (total) cache size is what would affect yield.

Ho Ho said...

abinstein
"Unless you can profile every process efficiently at the same time"

Erm, that is exactly what PAPI and LWP do, and from the current information I've read (AMD's PDF) they are basically identical. Can you list a few differences I've missed?


"... you can't really do dynamic performance adjustment in hardware"

Has AMD ever said it might do something like that with the information gathered? From what I've read, this is just a product of the imagination of a few enthusiastic people who don't really have much idea what LWP is.


"The total cache size of a 4-core Nehalem die will be larger than that of a Penryn die. This (total) cache size is what would affect yield."

I've asked this of Scientia, who couldn't give a logical answer, and now I'm asking you:

What stops Intel from "copying" Barcelona and doing e.g 512k L2 + 6M L3?

Aguia said...

What stops Intel from "copying" Barcelona and doing e.g 512k L2 + 6M L3?

What stops them? Nothing; they are doing it. The only questions are whether the L2 is shared or dedicated per core, and the sizes. Intel is also talking about L4 in some possible future design.

In fact this is one of the few details we already know about Nehalem.

Ho Ho said...

aguia
"Intel is also talking about L4 in some possible future design."

My first guess for that would be a massive amount of DRAM stacked on a 3D die, as is talked about in their Terascale architecture. Of course that would push it several years into the future, after Nehalem.

My second guess would have been a special cache in the northbridge on MP boards, though the northbridge as the central place for memory access should disappear with Nehalem, so it probably isn't that.

A third way to do it would be similar to the Xbox 360 and its embedded DRAM on a daughter die. That would probably look quite similar to Power4-6 with their massive L3.

abinstein said...

"What stops Intel from "copying" Barcelona and doing e.g 512k L2 + 6M L3?"

Inclusive vs. victim/exclusive design and the array of AMD patents on the latter.

"My first guess for that would be a massive amount of DRAM stacked on a 3D die, as is talked about in their Terascale architecture. Of course that would push it several years into the future, after Nehalem."

Stacked die will use SRAM, not DRAM.

Aguia said...

A lot of the early information we had was that Barcelona would launch in the 2.2~2.4 range and then scale quickly, with a potential to 4GHz in the end.

AnandTech Forum

Scientia,

AMD seems to be having problems with the design/clock speeds, since they thought of releasing the CPU at 400Mhz/600Mhz higher than what it will be released at.

Do you think it is possible for them to work out the problems fast like they did with the first K7/K8, or will they only "polish" the design with the 45nm version?

Do you have any background history on what AMD did in the past with the K7/K8 designs and manufacturing processes?

abinstein said...

aguia -
"AMD seems to be having problems with the design/clock speeds, since they thought of releasing the CPU at 400Mhz/600Mhz higher than what it will be released at."

As I said in this AMDZone post, Gary doesn't seem to know what he is talking about, or he is apparently hiding something, probably the fact that Barcelona gains more advantage against Core 2 above 2.4GHz.

That said, I'm sure AMD wanted Barcelona to be much faster: 2.6GHz at the least, which is where its initial estimated benchmarks placed it.

Aguia said...

But my questions, abinstein, are:
- did that already happen with the K7 and K8 designs?
- did AMD solve the problem quickly and on the same process?
- or did the problem only get solved when AMD transitioned to a new process?

What happened in the past that could shed some light on what might happen in the future?

Aguia said...

abinstein,

I have read your analysis of what he said, and I agree there are a lot of confusing statements from him (maybe some intentional); I just don't think it's FUD like you say.

And the RD790 will be released in September xbitlabs.com

But that fact isn't even important, since Phenom will not be released so soon; at least we know the motherboards will be available sooner.

One important thing he was talking about is the need for faster HT as CPU clock speed goes up. Was he trying to say current motherboards limit CPU performance?
I really didn't understand that, because aside from IGP the bandwidth is more than enough to feed the GPU and the rest…
Unless, because the NB runs at a different clock speed on HT3/AM2+ motherboards, increased CPU clock speed raises the NB speed to a level where the performance improvement starts to be noticeable on the new platforms.
Example:
AM2 motherboard + K10 2.4Ghz = NB 2.4Ghz
AM2+ motherboard + K10 2.4Ghz = NB 3.6Ghz
Since the NB has the L3 cache integrated, the L3 will lose some of its efficiency on AM2 motherboards.

InTheKnow said...

Abinstein, I didn't know I'd made a "claim" to anything. I've simply repeated what Intel has said publicly, that being that quad core is not economically viable until you reach 45nm.

But to answer your questions:

Let me ask you two questions:

1. Do you believe Nehalem will have smaller core and less cache size than Barcelona at the same process technology?


That is the crux of my statement. Barcelona is 65nm and Nehalem is 45nm. They will never be at the same process technology. If you want an apples to apples comparison, you have to compare Nehalem to Shanghai. And Shanghai is supposed to have quite a bit more cache than Barcelona. Our knowledge about Shanghai is even more speculative than our knowledge of Nehalem.

2. Are you comparing Nehalem @45nm process to Barcelona @65nm process?

Yes, that is precisely what I'm doing, because they will be competing products.

abinstein said...

Aguia -

The reason that I think Gary was making FUD is that he seems to suggest Barcelona is somewhat "impaired" below 2.4GHz, that it will not work efficiently unless running above that clock rate. This can only be pure FUD and nothing else.

Suppose you have a 3.0GHz Phenom, with a 200MHz clock generator and a 15x multiplier. Now reduce the clock generator to 133MHz and the core runs at 2.0GHz. Will you observe any "feature" being deactivated? No! The core circuits don't even know or care which clock rate they are running at; all they observe is that main memory accesses now take fewer cycles. Will they run less efficiently because of lower relative memory access latency?

"- did AMD solve the problem quickly and on the same process?
- or did the problem only get solved when AMD transitioned to a new process?"


I know K8 was delayed again and again for a few months. K8 was a good design at 130nm; dual-core K8 was a good one at 90nm. I see no reason Barcelona's problem can't be fixed at 65nm.


intheknow -
"Barcelona is 65nm and Nehalem is 45nm. They will never be at the same process technology."

I meant to say Shanghai, which Nehalem will be competing with. It doesn't make sense to compare 45nm Nehalem with 65nm Barcelona, just as it doesn't make sense to compare 65nm Brisbane with 90nm Yonah.

"And Shanghai is supposed to have quite a bit more cache than Barcelona. Our knowledge about Shanghai is even more speculative than our knowledge of Nehalem."

Shanghai has more cache because it can afford it. :)

Shanghai will not be much different from Barcelona except the physical transistors they use. OTOH we know almost no detail about Nehalem.

Unknown said...

Look at the size of the fan AMD is using on Phenom CPU!
http://www.fudzilla.com/index.php?option=com_content&task=view&id=2585&Itemid=51

Aguia said...

Yes Giant you are right, I have never seen such a small cooler on demonstration products; just compare it to this one normally used in Intel demos: xbitlabs.com

The AMD cooler looks tiny...

Ho Ho said...

It kind of makes sense to use a big cooler if you do OC testing as well, don't you think?

Unknown said...

Yes Giant you are right, I have never seen such a small cooler on demonstration products; just compare it to this one normally used in Intel demos

That system I linked to was configured by AMD and used by AMD, not a third party. Before, they were using the normal heatpipe fan they ship with the 6000+ and FX CPUs. Does anyone know why they switched CPU fans for the Phenom demonstrations?

The link you provided is to an Xbitlabs test. It was not conducted by Intel. In the Penryn demonstrations Intel has just used the standard LGA775 fan. It's pretty wimpy really, and looks tiny compared with the Thermaltake Big Typhoon VX I have to keep my overclocked Q6600 cool.

Aguia said...

The link you provided is to an Xbitlabs test. It was not conducted by Intel. In the Penryn demonstrations Intel has just used the standard LGA775 fan.

Do you have any link to what was inside the Intel box in the 3.5Ghz Conroe demo or the 3.33Ghz Kentsfield demo?

hyc said...

All this is well and good, but re: the closing lines of the blog post - I personally would love to see x86 fade off into the sunset, if it were replaced by an open, efficient design.

E.g., none of this "16 bit protected mode" / "32 bit mode" / "64 bit mode" crap. M68K had native 8/16/32 bit data types from day one, there's no reason a contemporary machine should have all those kludges. Just make it 64 bit, with native support for smaller operands. If you want 128 bit data types too, cool, no problem. Just make them fit naturally in the instruction set.

I think it's high time for another microprocessor architecture to escape from a university into the industry.

Unknown said...



Do you have any link to what was inside the Intel box in the 3.5Ghz Conroe demo or the 3.33Ghz Kentsfield demo?


Penryn 3.33Ghz, see here:

http://detail.zol.com.cn/picture_index_112/index1111529.shtml

I checked the spare Intel fan I have here that came with my Q6600. It does look a bit different to the fan Intel used. But it still uses the same push-pin system as the current LGA775 fans.

This looks like a slightly revised version. Maybe this is the fan Intel will ship with all Penryn CPUs?


I think it's high time for another microprocessor architecture to escape from a university into the industry.


Ironically, if it weren't for AMD, we'd all likely be using computers powered by Itanium.

Whether this would be faster than the x86 CPUs we use today is really anyone's guess.

One thing is certain: we do owe Core 2, Nehalem etc. to AMD's success with K8. K8 was a good architecture; I owned a 4200+ Socket 939 system and it was great. But AMD has not been performing well as of late. Hopefully K10 will get AMD back into shape.

I hope for AMD's sake the K10 results we've seen unofficially thus far are totally wrong. If K10 flops, AMD could be in serious trouble.

abinstein said...

"Ironically, if it weren't for AMD, we'd all likely be using computers powered by Itanium."

Rather, most people would probably just switch to PowerPC and use Macs instead.

4.7GHz Power6 on your workstation?

Aguia said...

Ironically, if it weren't for AMD,

-Intel would be a real monopoly and would have the EU and US on their back.

-Microsoft would be even more of a monopoly than it already is.

PS: I bet Microsoft would be forced to make their OS for other platforms like the IBM PowerPC. The Xbox 360 is already running games; how hard would it be to convert the rest?

Unknown said...

You know, now that Apple no longer uses Power, I could definitely see M$ making a version of Windows Server that is PPC compliant. It would be an entirely new market that IBM would certainly be grateful for, and one that M$ wouldn't have to do too much, I imagine, to capitalize on.

Pop Catalin Sever said...

IBM is a direct competitor with MS in many markets. Databases: IBM DB2 vs SQL Server; middleware and portals: WebSphere vs BizTalk Server/SharePoint/PerformancePoint Server; server OS: Linux vs Windows Server; tools: Eclipse vs Visual Studio; and many more...

I don't think MS is too keen to develop for IBM right now, because that could only strengthen IBM's market position. Besides, IBM is a giant too, with a market cap of 160 billion; at 270 billion MS isn't much bigger, and frankly at those values the difference doesn't matter any more. IBM could easily be the biggest threat MS faces...

So even if, from an enthusiast point of view, Windows on the PowerPC architecture would be something interesting to watch, I don't think it will happen in the current market landscape.

Azmount Aryl said...

You can order PowerPC 5 based servers with Windows OS right at IBM's home site.

enumae said...

What is everyone's interpretation of the preliminary prices for the Barcelona processors?

Does the price indicate performance?

Daily Tech

Ho Ho said...

abinstein
"4.7GHz Power6 on your workstation?"

... that is as fast as Itanium with nearly 2.5x lower clock speed, according to Spec benchmarks :)


enumae
"Does the price indicate performance?"

If it does, then in 2P the 2GHz Barcelona seems not to be as good as the 2.33GHz Intel CPU they compared against.

In 4P/8P I can understand the high price; Intel simply can't scale that well with so many sockets.

Aguia said...

I expect lower clock speed = lower performance = lower price.

Also, one important factor for K10/Opteron/Phenom is that it will perform faster with Socket 1207+ and AM2+.

abinstein said...

Ho Ho -
"... that is as fast as Itanium with nearly 2.5x lower clock speed, according to Spec benchmarks :)"

What are you talking about. Will you please not spread false information on things that you don't know?

Is there any dual-core Itanium system that can reach this SPECint_rate or this SPECfp_rate?

Even on SPECint and SPECfp, Power6 (in the range of 22-24) is considerably faster than Itanium (in the range of 15-17), too.

Aguia said...

More good news from Ati:

ATI R500 Linux Performance

Ati is catching up with Nvidia in Linux.

AMD FIREGL V5600 Review

Good performance for the workstation cards too.

Where are the anti R6xx series guys?

GutterRat said...

abinstein,

Time to get ready to eat your words. What condiments would you like and where should I send them to?

Press release:

http://www.intel.com/pressroom/archive/releases/20070905comp.htm?iid=pr1_releasepri_20070905m

Performance:

http://www.intel.com/performance/server/xeon_mp/summary.htm

LOOOL

Unknown said...

Ati is catching up with Nvidia in Linux.

There are still no drivers from AMD for the R6xx in Linux. That's just pathetic.

R600 was seven months late and competes with Nvidia's third fastest GPU. They came out with the lame excuse that no one wants a card as fast as an 8800 Ultra. What kind of crap is that?! I know plenty of people that don't want to mess around with multi-GPU setups and just want a single high end GPU.

In many cases the HD 2900 XT falls behind the 8800 GTS 320MB model when Anti Aliasing is enabled! Tell me now, who buys a high end card and doesn't run AA? I run AA in every single game I play. http://www.anandtech.com/video/showdoc.aspx?i=3023&p=1

If all that weren't enough, Nvidia is preparing to double graphics performance in just over two months.

Earlier this year Michael Hara, vice president of investor relations at Nvidia Corp., said that the company’s forthcoming flagship product would have peak computing power close to 1TFLOPs, about two times more compared to the current code-named G80 chips, which is used on the GeForce 8800 GTS, GTX and Ultra products.

Nvidia have really spoiled gamers like me. G80 doubled performance over the 7900 GTX, and G90/G92 (whatever the high end part is, I couldn't care less about the codename) will double performance yet again. I will absolutely buy one on launch day.

What does AMD have in response? Nothing. Zero. Zilch. Nada. Well I suppose you could wait another seven or eight months for AMD to get their next generation part out the door, it should compete nicely with Nvidia's third fastest GPU!

abinstein said...

giant -

You are obviously ignorant. The main problem today's datacenters face is not 20% faster clock rate, but better performance per watt and scalability. Intel's quad-core offered an interesting solution where you get 1.5x performance from a few bucks more expensive quad-core. However in terms of price-performance or performance-watt, no Xeon quad-core is a match to Barcelona.

Even if you buy the 2.66GHz Xeon quad-core, the 1.5x scaling means it'll run less efficiently than a 2.0GHz Barcelona under high load. Under low load, the Xeon uses much more power and is less power efficient. Overall, Xeons with an aging FSB and MCM quad-core are just a poor system design and a stupid investment.

enumae said...

Abinstein
However in terms of price-performance or performance-watt, no Xeon quad-core is a match to Barcelona.


Isn't that a little premature considering we haven't seen any SPECint benchmarks?

Also note that the video you and I discussed, which is on Vnunet (AMD 2.0GHz vs Intel 2.33GHz SPECfp), showed power usage; while they show a favorable screen capture, the power usage was surprisingly similar if you take the time to watch it closely.

amw said...

Interesting article at Arstechnica

http://arstechnica.com/news.ars/post/20070905-intel-launches-tigerton-quad-core-xeons-new-caneland-server-platform.html

Highlights

"It's also the case that Tigerton/Caneland's integer performance is rock solid and stands up well to the competition. (This is why Intel touted the specint numbers in its forthcoming press release.)


"I think we'll see that in the near-term, four-socket Barcelona systems will dominate four-socket Xeon systems in specfp_rate and in floating-point performance/watt. CPU-bound workloads will tell a different story, however, which is why I expect to see fanboys in both camps stinking up the Internet with specfp_rate versus spec_fp debates following the September 10 launch. "

How true a statement, apart from the fact that it has already started with giant! :)

Ho Ho said...

abinstein
"What are you talking about. Will you please not spread false information on things that you don't know?"

Seems my memory failed me a bit. I blame it on having had around 20h of sleep since Saturday morning :)

Though still, Itanium delivers some impressive numbers with nearly 3x lower clock speed. Too bad I couldn't find any *_rate scores for the high-end Itaniums, so I have to rely on what are basically single-core performance numbers.

SpecInt2006:
Power6: 21.6 (4.6 points per GHz)
Itanium 2: 15.7 (9.8 points per GHz)

SpecFP2006:
Power6: 22.3 (4.7 points per GHz)
Itanium 2: 18.1 (11.3 points per GHz)

So Itanium has nearly twice the IPC of Power6, and even more with FP loads.

Comparing those to Core2's scores of 21.0 in Int and 17.7 in FP gives it per-GHz scores of 7.0 for Int and 5.9 for FP. Core2 seems to sit between Itanium and Power in terms of IPC.


Just for fun, here are the numbers for the highest scoring Netburst systems:
int 11.7
fp 12.7

Per-clock scores: 3.1 for int and 4.7 for FP.
Gee, old Netburst has as high an IPC in single threaded FP as the latest and greatest from IBM :)


Of course, if one compared rate scores (none were available for the Intel CPUs I looked for), things would look quite a bit different. Those show overall system performance; when talking about CPU capabilities I'd say the non-rate scores are a somewhat better metric, though perhaps a bit too academic, and you can't see the scaling performance with them.


"Intel's quad-core offered an interesting solution where you get 1.5x performance from a few bucks more expensive quad-core."

How much more performance would one get going from a >=2.8GHz dual-core Opteron to a quad-core Barcelona? Heck, those quads eat even more juice than the newer, much higher clocking quads!


As for the new Intel MP scores themselves, you can see one of them here. Four 2.93GHz quad-cores deliver an INT_rate of 214. The highest performing Opteron system I could find was four 3GHz dual-cores delivering 108 points. Basically, the Intel system is twice as fast with double the cores. Anyone want to calculate how well it scales from 1P->2P->4P? I don't currently have enough time to do it.

Aguia said...

There are still no drivers from AMD for the R6xx in Linux. That's just pathetic.

You didn’t read the article?


R600 was seven months late and competes with Nvidia's third fastest GPU.

-R600 was the only one late (2900XT); the 8600 was released on 17 April 2007 and the 2600 on 29 June 2007 (only 2.5 months behind).
-512 bit memory controller.
-Better and higher quality video accelerator.
-VC1 video decoder.
-HDMI implementation.
-65nm process cards.
-Better DX10 performance.
-Advanced AA and AAA engine (nobody seems to test it).

See this test:
behardware.com
Ati no AA: 88.6 FPS; NVIDIA no AA: 141.8 FPS. Impressive NVIDIA results!
Ati 4x AA: 60.2 FPS; NVIDIA 4x AA: 48.1 FPS. Impressive NVIDIA results! Wait... Ati!
Ati 8x AA: 47.7 FPS; NVIDIA 8x AA: 22.2 FPS. Nvidia 2X slower, what?!


They came out with the lame excuse that no one wants a card as fast as an 8800 Ultra. What kind of crap is that?!

No one? That's exaggerated, I agree with you on that one! But it's close to 0.0000001% of computer buyers, that's for sure.


I know plenty of people that don't want to mess around with multi-GPU setups and just want a single high end GPU.

Well, the guys who bought an Intel processor with an Intel chipset motherboard really don't get to mess around with Nvidia multi-GPU setups, because it doesn't work there.


In many cases the HD 2900 XT falls behind the 8800 GTS 320MB model when Anti Aliasing is enabled! Tell me now, who buys a high end card and doesn't run AA? I run AA in every single game I play.

The guys who don't know what that feature is or where to enable it? And believe me, there are a lot of those people. Besides, the performance of the Nvidia drops when transparency AA is enabled, while Ati's doesn't. But maybe no one knows what transparency AA is and doesn't enable the option, right Giant?


If all that weren't enough, Nvidia is preparing to double graphics performance in just over two months.

I'm not sure about that; I'm expecting a "refresh" like they did from the 6xxx to the 7xxx series: a new process and lower power consumption cards, and that's about it. A slight increase in clock speed, and that's the expected performance improvement (20%).


about two times more compared to the current code-named G80 chips, which is used on the GeForce 8800 GTS, GTX and Ultra products.

Want to bet that it's a 7950GX2-like card and not really a new high performance GPU?

Unknown said...

Giant, as close as your logic is to almost being sound, you have to lie to get it there.

Abinstein never said he knew someone. He, like all of us "AMD fanboys", refers to articles written by people under NDA who have said that Phenom and Barcelona perform extremely well, and some have even gone on to give general performance expectations compared to Intel. One such person is Rahul Sood, who, while possibly having a flawed view of where the industry is going, has never shown an ounce of bias toward AMD.

Giant, there's a huge difference between what Tigerton "can be" and what it is. It may be 144% faster than the slowest Tulsa proc ever released, but it's not going to be that much faster than the specific product each of the new Tigerton processors is meant to replace. That is obviously the only metric Intel could have used that would actually be useful, but instead they had to inflate expectations.

Ho Ho said...

aguia
"-R600 was the only one late (2900XT); the 8600 was released on 17 April 2007 and the 2600 on 29 June 2007 (only 2.5 months behind)."

When did AMD/ATI say they were going to release the lower-end chips? Was it after the many delays of R600 or before?


"-512 bit memory controller."


... but still competing with GPU with nearly half the memory bandwidth.


"-Better and higher quality video accelerator."

Define "better". Quality lead is debatable.


"-HDMI implementation."

Here you go


"-65nm process cards."

Only on low-end and even there it doesn't seem to help all that much. Their 80nm is most certainly way worse than NVidia's 90nm, and their 65nm doesn't seem to be too much better than NVidia's 90nm on the low-end chips. Just imagine what happens if NVidia can do a decent 65nm GPU; I'd say >2.5GHz for the SPUs is achievable.


"-Better DX10 performance."

In what DX10 games exactly? The few that had DX10 added at the last minute or with a patch? Did you know that not even DX9 is fully used in most games? E.g. Crysis has way more DX9 effects than any other game ever made. Also, the tests I've seen so far are not showing a pretty picture for R600.


"-Advanced AA and AAA engine (nobody seems to test it)."

What good is advanced AA if the performance sucks and all those tent filters give you blurred images?


"See this test:"

Can you make the same comparison with more games? R600's AA is rather flawed, as it needs a lot of memory bandwidth and computing power, so generally it pretty much sucks. SS2 is one of the few where it actually works.

Also see other game benchmarks on your link. In HL2:LC, FEAR and Tomb Raider the scaling is rather bad. R600 is only a bit faster than R580 with AA in the latter. So much about all that advanced stuff.

In Rainbow six it doesn't work with AA at all. Even after fixing the bugs it is still slower than GTS in Stalker. Also the performance index with AA is still slower than that of the GTS.

Thanks for the link; it nicely shows that with nearly half the memory bandwidth and a lot less peak FP performance, the GTS still beats the R600 when AA is used. Not to mention price/performance or performance/power.


"Nvidia 2X slower, what?!"

If you knew a bit about how AA works on G80 you would understand why 4x AA is almost always faster than 2x. It is not that 2x is slower than it should be, just that 4x is more efficient than 2x can be. Just read the B3D articles a bit.



"Want to bet that its one 7950GX2 like card and it’s not really a new high performance GPU?"

Sure, why not. My bet would be on either a GTS replacement or a new single-GPU highest end.

Aguia said...

ho ho,

When did AMD/ATI say they were going to release the lower-end chips? Was it after the many delays of R600 or before?

Hum?!?!


... but still competing with a GPU that has nearly half the memory bandwidth.

So?!


Define "better". Quality lead is debatable.

More efficient. Lower resource usage. Higher-quality output. Compatible with more decoders. One cable does it all. Cheaper. You know, the usual stuff that you don't remember, tend to forget, or don't want to know.


Here you go

So you want to compare one manufacturer's specific model against all the Ati cards that can already do that?
Besides, do you see any wire carrying the sound on the Ati cards?


Only on low-end and even there it doesn't seem to help all that much.

Like in the Intel 45nm vs AMD 65nm and the Intel 65nm vs AMD 90nm case, right? Process size isn't important when it doesn't suit your argument in this particular case, right? Because for you the Ati delays had nothing to do with the fact that they were at 65nm and Nvidia at 90nm, right? So for ho ho, "manufacturing process" now joins "not native quad" on the list of things that don't matter.


Just imagine what happens if NVidia can do a decent 65nm GPU

I bet I could use the same imagination with K10, and abinstein could too, but you wouldn't like it, right?


In what DX10 games exactly?

The ones that have been demonstrated? Do you know that in Bioshock, for example, the X1xxx series is 40% faster than the equivalent Nvidia cards (equivalent meaning price)?
But you aren't interested that older Ati hardware performs faster in today's games than the equivalent older Nvidia hardware, right ho ho?


What good is advanced AA if the performance sucks and all those tent filters give you blurred images?

Well, the Ati card was playable (Nvidia 22.2FPS and Ati 47.7FPS); it was the Nvidia card that wasn't. And see the comments from the author: he says the Ati showed much better picture quality; maybe the blurred images come from the Nvidia cards?


Can you make the same comparison with more games?

No, that's my point: they don't do high-quality AA tests. The 6x AA was much better than any Nvidia filter and very playable on the 9xxx, X8xx and X1xxx series cards, but no one is interested in testing it. Even 2x AA seems to no longer exist.


SS2 is one of the few games where it actually works.

It's one of the few they actually tested; that's the problem.


Also see the other game benchmarks at your link.

There are no more high-quality tests in the link; that's the problem. Just SS2.


If you knew a bit how AA works on G80 you would understand why 4x AA is almost always faster than 2x.

Hum?! I was saying that Ati was 2x faster than Nvidia with 8x AA. Nvidia 22.2FPS, Ati 47.7FPS; the Nvidia card is unplayable.

And it's you who doesn't seem to know how the AA works. Nvidia got 88.2 FPS with 2x AA and 48.1 FPS with 4x AA. Also, if you know how Ati AA works, you should know that most Ati cards, including the very old 9700, are sometimes faster with 2x AA than with AA disabled. How about that?


Next time, try not to answer other people's questions when you may not understand them; I think that was the case here.

Ho Ho said...

aguia
"Hum?!?!"

R600 was initially said to be released early this year. When was the lower-end release date first mentioned, and was it before or after R600's several delays? Remember, the first official release date for R600 was March and the second was April; neither happened.


"So?!"

So it is awfully inefficient. Something like Netburst: high clock speed but lacking performance. Still, it is nice to paint big numbers on the box, and the general public will buy because of them. It worked for Intel, why not for AMD/ATI?


As for the numbers, here are some very crude ones:

R600 (1GB model) vs GTS vs GTX
Memory bandwidth (GiB/s): 128 / 64 / 86.4
Peak FP (Ginstr/s): 475 / 230 / 345

Compared to the GTS, the R600 has roughly twice the memory bandwidth and peak instruction rate, both higher than even the GTX's, yet it fails to beat the latter in most things and is mostly on par with the GTS. It is somewhat similar to how K8 beats Netburst by simply having a more efficient core.
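
If you want those ratios spelled out, here is a quick back-of-envelope sketch (nothing but the crude numbers above divided out; Python purely for illustration):

def ratio_to_gts(bw, fp):
    # GTS baseline from the crude numbers above: 64 GiB/s, 230 Ginstr/s.
    return bw / 64.0, fp / 230.0

for name, bw, fp in (("R600", 128.0, 475.0), ("GTX", 86.4, 345.0)):
    bw_x, fp_x = ratio_to_gts(bw, fp)
    print("%s: %.2fx GTS bandwidth, %.2fx GTS peak FP" % (name, bw_x, fp_x))

# Prints ~2.00x/2.07x for R600 and ~1.35x/1.50x for GTX: the R600 brings
# roughly double the GTS's raw resources to get GTS-level results.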



"Besides do you see any wire connecting the sound on the Ati cards?"

Thank you very much but I'd like to use my separate high-quality (and expensive) sound card instead of the integrated thingie on R6x0 chips.


"Like in the Intel 45nm VS AMD 65nm and the Intel 65nm VS AMD 90nm case right?"

No, my point was that Intel (and NV) actually seem to gain benefits from a lower nm, but AMD/ATI doesn't seem to, at least not nearly as fast. Of course, things haven't always been nice for Intel and NV either; just remember how bad it was when they rushed out the NV30 series on a lower nm. The FX5800 Ultra, aka the dustbuster, should be known to most people. Luckily for them, NV learnt their lesson from that. ATI didn't with R520/580, as they seem to be repeating the same mistakes with R600.


"I bet I can use the same imagination with K10 and abinstein can do it too but you would not like it, right?"

Well, NVidia is competing with 65/80nm using 90nm. Their 90nm is extremely good considering the sheer size of the G80 chip. We don't yet have any certain signs of what 65nm could give NV, but my guess is that if history repeats itself things will only get much better.

If we do the same "history repeats itself" with AMD and also see what they currently have on 65nm then things are not looking all that nice, wouldn't you agree?


"Well the Ati was playable (Nvidia 22.2FPS and Ati 47.7FPS), it was the nvidia card that it wasn’t playable."

So SS2 is important but not some of the other games?



"There are no more High quality tests from the link, that’s the problem. Just for SS2."

So the fact that performance drops heavily even without using maximum quality in the other games is OK by you?



"Hum?! I was saying that Ati was 2x faster than Nvidia with 8x AA."

I thought you were wondering why 4x AA is faster than 2x AA on G80.



"And its you don’t seem to know how the AA works."

I do know it quite well; I've written a few filters for my ray tracer, e.g. n-queens. I learnt the theory for them from rasterizing algorithms.

Christian H. said...

They don't have to sell the building, but they could sell just the contents. There are already rumors out there of a Russian buyer having bought up much of the 200-mm tooling. So that would mean reduced capacity out of Fab 30 going forward, until the supposed Fab 38 conversion. I won't link to that Fudzilla rumor because they're not a credible source, but we'll find out if it's true.

News from FabTech is that AMD has 300mm equipment in Fab 30 now. And yes, the Russians want all of AMD's 90nm 200mm tools.

The ramp is slower but it's progressing smoothly. They also say that Fab 36 is making so many chips that Chartered hasn't had any wafer starts for a while.

I can only hope that they move Turion and Brisbane to Chartered for the most part and leave Fab 36 for 10h.

Plus, Saxony is footing the bill for the retooling of Fab 30. AMD just has to get the old equipment out.

Aguia said...

Remember, the first official release date for R600 was March and the second was April; neither happened.

The 2600/2400 were to be released in April with the 2900XT, but an error in the UVD engine delayed them two months.
I don't think it was a major delay, since Nvidia's money-making line (the 8600s) was also delayed. The 2900XT delay was a different problem: according to a Catalyst developer on a forum, the software was about six months behind the hardware; he said "we (the Catalyst team) weren't ready." Take the November/December target, add six months, and you get April, so the drivers caused the delay. Add up the time the 8800 line shipped with driver problems and you get about the same period.


It worked for Intel, why not for AMD/ATI?

Right.


Compared to the GTS, the R600 has roughly twice the memory bandwidth and peak instruction rate, both higher than even the GTX's, yet it fails to beat the latter in most things and is mostly on par with the GTS. It is somewhat similar to how K8 beats Netburst by simply having a more efficient core.

Well, look at the Ati 9800 vs the Nvidia 5900. Nvidia had much faster memory, a much faster core, and a better manufacturing process, yet was easily beaten in all tests by the lower-spec Ati cards.


Thank you very much but I'd like to use my separate high-quality (and expensive) sound card instead of the integrated thingie on R6x0 chips.

Well, that's you, ho ho. How many people buy a sound card with their PC nowadays, when motherboards have 6/8-channel audio?
By the way, which card is it? Creative?


No, my point was that Intel (and NV) actually seem to gain benefits from a lower nm, but AMD/ATI doesn't seem to, at least not nearly as fast. Of course, things haven't always been nice for Intel and NV either; just remember how bad it was when they rushed out the NV30 series on a lower nm. The FX5800 Ultra, aka the dustbuster, should be known to most people. Luckily for them, NV learnt their lesson from that. ATI didn't with R520/580, as they seem to be repeating the same mistakes with R600.

Good point. But AMD's strategy of keeping an old process longer seems to work, just as Intel's strategy of bringing in a new process sooner seems to work too. Maybe both strategies are good?





Well, NVidia is competing with 65/80nm using 90nm. Their 90nm is extremely good considering the sheer size of the G80 chip. We don't yet have any certain signs of what 65nm could give NV, but my guess is that if history repeats itself things will only get much better.

Well I’m not sure about the “extremely good”, because prices didn’t change even with the Ati 2xxx line. And the fully functional parts without disabled units (GTX/Ultra) must be really insignificant in sales. Is there any utility that can enable the GTS's disabled units, so we can see whether good parts are being disabled? (Like the Ati 9500@9700.)



If we do the same "history repeats itself" with AMD and also see what they currently have on 65nm then things are not looking all that nice, wouldn't you agree?

For our sake (as buyers) I hope not. A guy who works at AMD said on a forum that there will be three more revisions before the end of the year. That could mean major improvements with the new revisions, who knows; he seemed happy about it.


So SS2 is important but not some of the other games?

Ho ho, you still don't understand?
That game was the only one tested with superior AA quality: Transparent AA, higher-quality filter AA, …
The other games were using “regular” AA settings. The point is that the Ati parts with “extreme” AA settings didn't slow down as much as the Nvidia parts; read the page I linked in that review. That was the only game tested with those settings; that's why I was stressing its importance.


So the fact that performance drops heavily even without using maximum quality in the other games is OK by you?

The point is that, slow for slow, you can enable a higher AA setting on the Ati cards without suffering a heavy drop. But no one mentions this in reviews. See if you understand: if you are going to take a hit by enabling AA on the Ati cards, at least choose the right hit, the one that offers much better image quality.


I do know it quite well; I've written a few filters for my ray tracer, e.g. n-queens. I learnt the theory for them from rasterizing algorithms.

Then you also know that it's now in the Catalyst driver developers' hands to increase AA performance in games, not the hardware's.
So you do programming? Graphics programming? Game programmer?

Axel said...

Mr. Howell

They also say that Fab 36 is making so many chips that Chartered hasn't had any wafer starts for a while.

Yes, between the Fab 36 ramp and Intel's prodigious 65-nm output there has been a glut of chips on the market. Unfortunately, demand for AMD's chips fell so much that AMD had to price their products into the floor to clear their inventory. Their new pricing structure seems to have achieved that purpose, but at the expense of revenue and profitability.

AMD have high fab capacity (incurring tremendous fixed costs) but their current crop of chips isn't good enough for people to be willing to pay a lot for them. Along with the 45-nm conversion costs, this puts them between a rock and a hard place. Basically, AMD are screwed unless Barcy can bring the ASPs up greatly. This, unfortunately, looks to be improbable as Barcy is roughly even per clock with Clovertown:

Gary Key (Anandtech reviewer) quote from this morning:

"Depending upon the application, Barcelona clock for clock is equal to Clovertown or just a tad faster in some areas and in others it is 20%+ (heavy emphasis on "plus" until Monday) faster. I have to say that final silicon is looking really nice at this point, especially in memory sensitive applications where this processor shines, but core speeds need to come up in a hurry to compete with Tigerton in general server applications."

Basically, K10 looks like it's just K8 in stilettos and lip gloss. K8 already shone in bandwidth intensive tasks and K10 has simply widened the gap. But the core has apparently not been improved enough to really pose much of a threat to Intel's current clock speed and performance dominance, particularly with Penryn & SSE4 around the corner. It really is too little, too late.

Aguia said...

Axel,

I don't understand your post.
He clearly says it’s faster. In some applications 20% faster (or more) at the same clock.
How can that be bad?!
The only bad thing I can see right now is the 2.0Ghz launch speed, and we don't even know whether that's due to a clock speed limitation, temperature limitation, design limitation or TDP limitation, because AMD has already shown a working 3.0Ghz quad core processor.
Even then we don't know much, except lots of good marks in blogs and forums from people who have already tested the final part.

What we do know is that Penryn will be up to 5% faster, and up to 100% faster with a beta DivX codec using SSE4.
Since Penryn has many more updates than everyone predicted, I was expecting much more out of it, especially after learning about the low-latency L2 and its 50% larger size.

Axel said...

Aguia

He clearly says it’s faster.

Read his quote again. It's clear that the 20% advantage is only in memory-intensive apps "where this processor shines"; otherwise it is about even with Clovertown per clock.

If you recall, K8 is also significantly superior to Woodcrest in memory bandwidth. Gary insinuates that non-memory-bound tasks, the majority in the server space and the vast majority everywhere else, are about equal per clock between K10 and Clovertown. This means that K10 has essentially "caught up" with Clovertown per clock in most tasks but is not really beating it by much except in certain specialized situations that tax memory.

I wonder if Barcy has caught up on SSE. I don't think it'll keep up with Penryn.

The bottom line is that Barcy will not raise ASPs sufficiently for AMD to continue plodding on with their broken business model. Here's why:
- Clovertown has a tremendous clock advantage.
- Harpertown (Penryn server core) launches in about two months with better IPC, lower power use, and SSE4 capability.
- Intel can use their clock advantage to keep Barcy prices low, denying profits to AMD.

Now if AMD are able to ramp clocks very quickly, the situation will ease up a bit for them. I personally don't believe that clocks will ramp as quickly as some claim. I think 3.0 GHz in Q4 is utterly out of the question.

enumae said...

Aguia
How can that be bad?


When do you believe that AMD will release a 3GHz or 3GHz+ Quad-core?

Do you feel that although AMD will probably not have the performance advantage (due to clock speed) they will be able to gain back market share in the highly profitable areas of the market?

If what has been said about performance is true, AMD's roadmaps are accurate (regarding the 2.4GHz clock speeds), and they are unable to maintain or gain market share, then AMD is in for a rough first half of 2008.

amw said...

Aguia said...

"Do you know that in Bioshock, for example, the X1xxx series is 40% faster than the equivalent Nvidia cards (equivalent meaning price)? But you aren't interested that older Ati hardware performs faster in today's games than the equivalent older Nvidia hardware, right ho ho?"

How do the older X8xx and X6xx series play it? Better than the Nvidia 6 series? You are cherry-picking here...


-R600 was the only one late (2900XT)


Dave Orton specifically said they were releasing top to bottom in one go, but they didn't, so by definition the lower R6xx series was late.

Unknown said...


Well, look at the Ati 9800 vs the Nvidia 5900. Nvidia had much faster memory, a much faster core, and a better manufacturing process, yet was easily beaten in all tests by the lower-spec Ati cards.



Yes. This is an excellent comparison. Now here's another one for you:

2900xt memory interface: 512bit vs. 8800 GTS 320bit

2900xt stream processors: 320 vs. 8800 GTS 96

2900xt clock speeds: 740/825 vs. 8800 GTS 500/800

Now given that they both have similar performance, which do you think looks more efficient?

Ho Ho said...

aguia
"Well look it has Ati 9800 VS Nvidia 5900. Nvidia had: Much faster memory, much faster core and better manufacturing process, however got easily beat in all tests by the worst specs Ati cards."

Exactly. The only difference this time is that the positions have changed.


"By the way which cards is it? Creative?"

Not Creative; it is not good enough. It distorts sound and has quite a few compatibility problems. I have an Auzen X-Plosion.


"Good point. But AMD strategy of keeping old process longer seams to work, as Intel strategy of bringing new process sooner seams to work too. Maybe both strategies are good?"

Yes, things work for AMD and Intel but for some reason not for ATI. Things haven't changed yet, as their GPUs are not produced in AMD's fabs, and it was far too late to do anything better with R600 at that point.


"Well I’m not sure about the “extremely good”, because prices didn’t change even with the Ati 2xxx line"

Prices drop because of competition, and it seems the 2xxx series doesn't offer that. Also, do you remember that NV was talking about having problems producing enough chips for their GPUs? They said the fabs are maxed out and they sell everything they make, but sometimes they are still short. They also warned it could mean that Q3 won't see as big an increase as Q2 did.


"And the fully functional parts without disabled units (GTX/Ultra) must be really insignificant in sales"

Check the Steam surveys: from June to the middle of August, 8800 series GPUs show up in ~4.5% of all users' machines. No other high-end GPU has ever been that high in that survey. Also, as you can see, the 2xxx series is nowhere to be seen. In short, the 8800 series has been a great seller.


"For our safe (buyers) I hope not."

So do I but hope alone doesn't make things better. We'll just have to wait and see how things go.


"The point is the Ati parts with “extreme” AA settings didn’t slow down as much as Nvidia parts, read the page that I linked in that review."

And my point was that SS2 was one of the games where it didn't slow down as much with regular AA either. In other games it did slow down, and I bet the slowdown would have been even worse with the "extreme" AA.


"Then you also know that now it’s on the Catalyst drivers developers hands to increase AA performance in games not in the hardware."

Yes, I do know that the R600 series doesn't use dedicated AA hardware as all the other GPUs do; it runs AA in the shaders. That is also the reason for the slowdowns in many games.


"So you do programming?"

~10 years as a hobbyist, ~4 as a professional.


"Graphics programming?"

Not much professionally, but some as a hobby. I've done numerous small toy projects as tests for a few things. I've worked a bit with OpenGL and ray tracing, and I know quite a bit about the latter. I've read a lot about the theory and could talk about almost anything to do with how graphics engines are written and how GPUs run them.


"Game programmer?"

No, but I've done a couple of small things. Nothing major, though; they have been 2D projects without too many features.



abinstein
"IPC alone is as meaningless as clock frequency. Netburst doesn't go above 4GHz, whereas Power starts from 3.5GHz."

Yes, that's also the reason why K10 won't be much of a match for high-clocked Core CPUs as long as its clock speed doesn't increase significantly, at least not in situations where memory bandwidth is not the bottleneck.

Btw, do you happen to know the exact TDP for Power6? I believe it was somewhere around 200W.

abinstein said...

giant -
"Now given that they both have similar performance, which do you think looks more efficient?"

Wow... giant, way to show off your ignorance of computer architecture. I have never heard of "efficiency" with respect to memory width or number of stream processors.

Saying so is to ignore all the tradeoffs that have gone into the design and that's called nothing but ignorance.

That said, nVidia has a great team and designs great graphics processors. IMHO better than the former ATi, which had a superior product during the X1900 era but couldn't scale up the microarchitecture.

It still remains to be seen how NV and AMD+ATI compete with each other. However, one thing is for sure: they're both better than Intel when it comes to graphics engines.

Axel said...

Abinstein

The fact is memory is the bottleneck for today's workloads, at least most of the meaningful ones.

So I guess AMD should adopt your reasoning, ignore the desktop and mobile cash cows, and continue to lose $2 billion a year? The fact is that your definition of meaningful is different from that of 99.9% of the computing population. With K10, AMD might in fact serve your niche very well but don't be surprised if it's also directly responsible for AMD filing Chapter 11 next year or completely restructuring their business to quit the desktop / mobile markets.

abinstein said...

"If you recall, K8 is also significantly superior to Woodcrest in memory bandwidth."

This is due to NUMA; Woodcrest's single FSB limits its memory bandwidth. Barcelona will have the same advantage over Clovertown, though less over Tigerton, which can use multiple FSBs.

"Gary insinuates that non memory bound tasks, the majority in the server space and the vast majority everywhere else, are about equal per clock between K10 and Clovertown."

This is pure BS. Had you actually run any real datacenter you would have known memory is the critical part of performance. Even in day-to-day work on a heavy-duty workstation, memory access latency and bandwidth are critical, especially the latency.

The "non-memory" critical workloads are mostly single process games and consumer toys. There are fools who'd spend $1000 to buy a processor to encode 4 $10 DVDs instead of 3, but I doubt there are too many of them. :p

abinstein said...

"So I guess AMD should adopt your reasoning, ignore the desktop and mobile cash cows, and continue to lose $2 billion a year? The fact is that your definition of meaningful is different from that of 99.9% of the computing population."

I reckon there are people who are foolish enough to buy a 3GHz Core 2 when a 2.5GHz Phenom will do the same work with less cost, less power, and less upgrade hassle. But I assure you those people are not 99.9% of the computing population. Not in your nuttiest dreams.

abinstein said...

"With K10, AMD might in fact serve your niche very well but don't be surprised if it's also directly responsible for AMD filing Chapter 11 next year or completely restructuring their business to quit the desktop / mobile markets."

So you, Axel, believe AMD will either file Chapter 11 or quit the desktop/mobile market?

Given how poor your understanding of computing is, I have serious doubts about the accuracy of your prediction. You remind me of the various "end-of-the-world" scenario predictions before Y2K.

Maybe you can ask gutterat for some (by then rotten) condiments when Shanghai is released.

enumae said...

Abinstein

No comment on my question to you?

Unknown said...

I reckon there are people who are foolish enough to buy a 3GHz Core 2 when a 2.5GHz Phenom will do the same work with less cost, less power, and less upgrade hassle.

The key difference being? You can buy a 3Ghz Core 2 right now. Dual core $266. Quad core $999.

Unknown said...

Axel, if you don't know whether Barcelona has caught up on SSE then you have obviously ignored all discussion of the design that has gone on in this blog, and really don't seem too interested in giving valuable commentary.

Also, that article again goes against the grain of every other (though admittedly vaguer) article written on Barcelona, and it has the exact same tone as many Anandtech articles: "well, I hope Barcelona does well, because I have to say so to sound almost objective, but I'm basically assuming it won't." I really don't know why anyone would write like that, but seeing how obviously unobjective sources (on both sides) continue to do so and face no consequences, I assume there's a reason.

Ho Ho said...

abinstein
"The fact is memory is the bottleneck for today's workloads, at least most of the meaningful ones"

Meaningful to whom?


"The poor memory subsystem makes Core 2 a poor choice for about half the critical workloads."

Of the overall x86 CPU market, how many users does that half of critical workloads make up? Less than 10%? Less than 2%?


"At this point I don't think someone would care how fast the "core" is, if it takes him 30% more time to run a scientific program due to inefficient memory."

I can also say that I don't care how fast K8 runs some scientific application because it would take it significantly longer to do video processing and would be slower in gaming when using high-end GPUs (so the bottleneck would be on CPU).


"Besides, Barcelona simply has much superior scalability than current Core 2."

Yes, it will likely have better scalability; that's probably the reason for the high prices of the 8000-series CPUs. Though I would think the new MP platform from Intel is quite decent as well, certainly a huge leap from what they have now. We'll have to see how the two match up once both are out and benchmarked.


"I have never heard of "efficiency" with respect to memory width or number of stream processors."

You didn't understand that when he was talking about bus width he was thinking of memory bandwidth. The fact is that the HD 2900 with 105/128GiB/s of memory bandwidth is fighting the 8800 GTS with 64GiB/s. This is simply an example of how NV achieves pretty much the same result as ATI while using far fewer resources (i.e. memory bandwidth and peak FP power).


"Saying so is to ignore all the tradeoffs that have gone into the design and that's called nothing but ignorance."

I admit that creating a 512-bit memory bus is a great feat. The only problem is that it doesn't seem to do much good for the architecture as a whole, otherwise R600 would be competing in the highest end.


"However, one thing is for sure, that they're both better than Intel when it comes to graphics engines."

Intel's IGPs were never anything great, simply adequate. Though I do think that Larrabee could be something spectacular. As usual, time will tell.


"I reckon there are people who are foolish enough to buy 3GHz Core 2 when 2.5GHz Phenom will do the same work with less cost, less power, and less upgrade hassle"

OK, so when could one actually buy that 2.5GHz Phenom? Want to bet it won't be happening this year? Also, could you share your information about Phenom performance? You seem to have quite different data from what has been floating around recently.

Less upgrade hassle? How many people upgrade their CPUs separately from the rest of the system? I bet there are far fewer of them than there are people running those critical workloads you were talking about.

Unknown said...

However, one thing is for sure: they're both better than Intel when it comes to graphics engines.

Yes indeed. Nvidia just keeps raising the bar higher and higher. Did AMD do that? Not at all. I bought a Geforce 8800 GTS on launch day last November and have been very impressed by it. AMD made a whole bunch of Ati fans wait seven months for a card that did not raise the bar one bit.

Oh, and when was the last time Intel claimed to develop anything other than extremely low end IGPs that are designed to use very little power for use in low end PCs?

Does everyone remember AMD's stupid 'Multicore for dummies' stunt? Perhaps Intel should send AMD a book 'Quad core CPUs for dummies' or 'Earning profit for dummies'.

AMD is on its last gasp of air; they'll lose another $600m this quarter, aside from the money they make from their little garage sale. The problem? AMD only has so much to sell.

AMD is finished - BK by Q2'08.

Allepisodes said...

Giant said...
AMD made a whole bunch of Ati fans wait seven months for a card that did not raise the bar one bit.

Correction: Ati made a whole bunch of fans wait. Wasn't R600 being worked on before the merger of the two companies? I believe R600 taped out around June 2006 and the two companies merged sometime around the end of October 2006. So don't go blaming AMD for something it had nothing to do with.

Unknown said...

So don't go blaming AMD for something it had nothing to do with.

You're right. R600 was being developed before the merger. But it was also being developed after the merger. AMD bought Ati, so they get any applause that might arise from ATI's products. They also get the criticism in this case.

AMD's own track record in this area is also poor. Ruiz admitted that Barcelona was about six months late.

Compare that to the Clovertown launch. It was scheduled for February this year but Intel pulled it in by three months to November '06. That's the kind of execution AMD needs if they want to gain ground against Intel.

Let's be really cynical. Given AMD's poor track record in getting its gear out on time, who's to say that R700 and Shanghai won't be six months late?

Allepisodes said...

Giant said...
You're right. R600 was being developed before the merger. But it was also being developed after the merger. AMD bought Ati, so they get any applause that might arise from ATI's products. They also get the criticism in this case.

I don't see how you come to that conclusion. You can't blame AMD for ATI projects that were being worked on before the merger. Now, when Fusion comes out, if by some small chance it happens to flop, then you can say it's AMD's fault all the way.

From what you're saying, I guess I can blame Ruiz for all of Jerry Sanders' mistakes too, and blame Paul Otellini for Craig Barrett's mistakes (Pentium 4)?

Compare that to the Clovertown launch. It was scheduled for February this year but Intel pulled it in by three months to November '06. That's the kind of execution AMD needs if they want to gain ground against Intel.


Intel is like 16X the size of AMD, with more fabs, engineers and money, so it's a lot easier for them to do that. Don't make it sound so hard for them: if Intel starts to fall behind they can just throw more engineers at the problem and catch up. AMD doesn't have that luxury yet. The key thing to remember here is that AMD is 16X smaller than Intel.

Let's be really cynical. Given AMD's poor track record in getting its gear out on time, who's to say that R700 and Shanghai won't be six months late?

Im gonna Wait & see AMD has a plan in place lets see if it works if Intel can get its act together i don't see why AMD cant.

Ho Ho said...

"if Intel can get its act together i don't see why AMD cant."

The difference is that Intel had money to burn but AMD doesn't. It will be really hard for them to make things significantly better for themselves.

Scientia from AMDZone said...

aguia

Six months for AMD to hit 3.0Ghz in volume production. Unless AMD's 3.0Ghz demo was an elaborate hoax they should have this in Q1 08.

Allepisodes said...

Ho Ho

The difference is that Intel had money to burn but AMD doesn't. It will be really hard for them to make things significantly better for themselves.

Seems to me AMD never had any real money to burn, but they made it this far. Definitely hard, but not impossible.

Scientia from AMDZone said...

giant

Just eyeballing the fan; it doesn't look that impressive to me. The outside seems to be just a shroud rather than stacked cooling fins. However, I haven't been able to find anyplace that sells it or has more information about it.

Unknown said...

Giant, assuming that because ATI has misstepped for two generations of card releases they are going to die and pull AMD down with them, or that the company is worthless, is inane and naive. Anyone who's watched this industry has seen companies pull through worse. And considering the fast product refresh rate of the graphics industry, it is especially inane.

It is very interesting, and a very poor sign, that the R600 core is so incapable of using such vast amounts of memory bandwidth and clock cycles. But then again, being able to deliver those resources this early means they have one less thing to develop for the next generation of cards.

Ho ho, your arguments about general usage can be fairly quickly flushed down the drain when you look at one thing. AMD can target a specific market, and do so extremely well (that of scientific workloads). It also (per Nealson's highly praised and officially backed reports) gives users a very efficient and dense product on pretty much any application, which is more important to any server department than any other performance metric. And while AMD can target these things, and most likely more than adequately supply these markets, they can also charge more because of this superiority. Thus the entire ASP problem is fixed (if they capitalize on it).

Of course, AMD would have to capitalize on all of this, but anyone who's taken the classes Ruiz has would know exactly how to take advantage of the market through design (I can't say the same for Otellini, though that may be out of ignorance on my part).

abinstein said...

enumae -
"No comment on my question to you?"

I must have missed it, or maybe I felt it wasn't worthy of a reply. I can't remember which.

A better way for you to ask is to... well, ask again. Or just keep waiting for it to be ignored. ;)

Well, if you pay me then it's another question, and it depends on how much you'd pay. :)

Scientia from AMDZone said...

Axel

"Unfortunately, demand for AMD's chips fell so much that AMD had to price their products into the floor to clear their inventory. Their new pricing structure seems to have achieved that purpose, but at the expense of revenue and profitability."

No. AMD's drop in demand was related to their shortages in Q4 06. Or are you agreeing with roborat's silly theory that AMD stuffed the channel?

"This, unfortunately, looks to be improbable as Barcy is roughly even per clock with Clovertown:

Gary Key (Anandtech reviewer) quote from this morning:"


Well, Gary has been wrong about everything he has said about K10 so far, but I can see that you like playing the long shots, so why not?

"Basically, K10 looks like it's just K8 in stilettos and lip gloss."

... mumbled someone on a bad acid trip. "And, C2D is like a Yonah, but with whipped cream and chocolate sprinkles."

"But the core has apparently not been improved enough to really pose much of a threat to Intel's current clock speed and performance dominance, particularly with Penryn & SSE4 around the corner."

... said the Mad Hatter to the Mad March Hare. He thought for a moment, "Or at least that is true here in Wunderland. Perhaps not in the real world."

Scientia from AMDZone said...

Aguia

"The only bad thing I can see right now is the 2.0Ghz that we don’t even know if its because of clock speed limitation, temperature limitation, design limitation or TDP limitation;"

Actually, we do know; check the TDP rating of the 2.0Ghz chip.

" because AMD already have shown a 3.0Ghz working quad core processor."

That's with a newer stepping. Volume in Q1 08.

Scientia from AMDZone said...

Axel

"The bottom line is that Barcy will not raise ASPs sufficiently"

It doesn't need to. Barcelona will only raise the ASP in servers. But more importantly, Barcelona will reverse AMD's losses in server volume, which are substantial. This will not be much fun for Intel.

"- Clovertown has a tremendous clock advantage."

It does in Q3, about 1.5X. However, over Q4 and Q1 this drops to about 1.3X and then 1.1X.

"- Harpertown (Penryn server core) launches in about two months with better IPC, lower power use, and SSE4 capability."

Yes, but unfortunately a 5% increase in IPC doesn't match a 25% increase in clock. K10 also gets better power consumption in Q4 with a better stepping and then better again in Q1.
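
To put rough numbers on that claim, here is a minimal sketch of the arithmetic (my own illustration; it assumes the thread's premise that the two cores are otherwise about even per clock and uses the 5% and 25% figures above):

def relative_perf(ipc_factor, clock_factor):
    # For a fixed workload, throughput scales roughly as IPC x clock.
    return ipc_factor * clock_factor

penryn = relative_perf(1.05, 1.00)  # ~5% IPC gain, no clock gain
k10 = relative_perf(1.00, 1.25)     # flat IPC, ~25% clock ramp
print("%.2f" % (penryn / k10))      # ~0.84: the IPC edge alone falls short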

"- Intel can use their clock advantage to keep Barcy prices low, denying profits to AMD."

They could if AMD didn't release higher clocks, but they will in Q4, and higher again in Q1.

"Now if AMD are able to ramp clocks very quickly, the situation will ease up a bit for them."

You clearly do not understand. In the same time it takes Intel to get a 10% increase in clock, AMD will get a 50% increase in clock. The high point for Intel is not Q4 or Q1 but actually Q2. Here's why:

1. Intel should reach its maximum clock. This could be 3.6Ghz. Even if AMD increases K10's clock to a likely 3.2Ghz, the gap will still widen compared to Q1.

2. Intel's 45nm volume should be high, while AMD will already have reached its lowest cost point until its own 45nm arrives. This means that Intel could squeeze AMD a bit on prices.

3. Intel may reach its peak of 1600Mhz on the FSB.

" I personally don't believe that clocks will ramp as quickly as some claim. I think 3.0 GHz in Q4 is utterly out of the question."

Who is talking about Q4? Q1 for 3.0Ghz. But your other ideas are wrong. Intel won't have enough volume of Penryn in Q4 to squeeze AMD. There is another factor as well. If you look at the benchmarks it is clear that dual core C2D hits a memory bandwidth wall at 2.66Ghz and quad core hits the same wall earlier, at 2.4Ghz. Now, let's assume that Penryn's larger cache plus the 1600Mhz FSB can bump the quad core barrier from 2.4 to 3.0 or even 3.2Ghz. That still means Penryn quads will scale badly above 3.2Ghz, which may prevent Intel from taking back a decisive lead in anything except specialized SSE.
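
For a back-of-envelope sense of where that wall comes from, here is a small sketch (my own illustration; it treats the quoted FSB speeds as effective transfer rates on the standard 64-bit bus and ignores what the caches and prefetchers recover in practice):

def fsb_gb_per_s(effective_mhz):
    # 64-bit front-side bus = 8 bytes per transfer.
    return effective_mhz * 8 / 1000.0

for mhz in (1066, 1333, 1600):
    total = fsb_gb_per_s(mhz)
    print("%dMhz FSB: %.1f GB/s total, %.1f GB/s per core on a quad"
          % (mhz, total, total / 4.0))

# The shared-bus budget per core only grows from ~2.1 to ~3.2 GB/s, which
# is why a bigger cache and a 1600Mhz FSB can push the barrier up but not
# remove it.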

Scientia from AMDZone said...

Axel

"So I guess AMD should adopt your reasoning, ignore the desktop and mobile cash cows, and continue to lose $2 billion a year?"

What could you possibly be talking about? You know that AMD will release both desktop and mobile processors in volume in 2008.

"With K10, AMD might in fact serve your niche very well but don't be surprised if it's also directly responsible for AMD filing Chapter 11 next year or completely restructuring their business to quit the desktop / mobile markets."

This won't happen. AMD should have its losses under control by Q4. Also, (just in case you've forgotten already) AMD is releasing the new Griffin mobile processor in 2008. AMD is also going to release desktop versions of K10, like Phenom, which replaces the Athlon 64 brand name.

Scientia from AMDZone said...

giant

"AMD is on it's last gasp of air, they'll lose another $600m this quarter, aside from the money they make from their little garage sale. The problem? AMD only has so much to sell.

AMD is finished - BK by Q2'08."


Giant, relax. You've already earned your place on the List of Absurd Predictions; no need to drive it into the ground. However, Axel now has a place beside you with his prediction that AMD will either declare Chapter 11 next year or drop out of both desktop and mobile. So: Roborat, Lex, Giant, and Axel. I'd probably have to hunt up who claimed that K10 was no faster than K8, but I'm guessing it was someone in this bunch as well.

Scientia from AMDZone said...

enumae

"Is there any reason that Intel can not incorporate this [SSE5] into their products by 2009?"

Yes, there is.

Scientia from AMDZone said...

abinstein said...

enumae -
"No comment on my question to you?"

I must have missed it, or maybe I felt it wasn't worthy of a reply. I can't remember which.

A better way for you to ask is to... well, ask again. Or just keep waiting for it to be ignored.


I believe his question to you was about the SPECfp scores in the video clip.

InTheKnow said...

Yes, but unfortunately a 5% increase in IPC doesn't match a 25% increase in clock. K10 also gets better power consumption in Q4 with a better stepping and then better again in Q1.

I think you are focusing too much on the small performance increase. The real improvement that Penryn brings to the table is power reduction. From Ars Technica:

Penryn runs significantly cooler (about 10 degrees Celsius and 10 watts less) than its predecessor under both idle and maximum power conditions. Or, to put it differently, Penryn uses less power under maximum load than Conroe does at idle. That's a pretty major boost to power efficiency.

So you get a 5% performance boost and a big power drop. Barcelona/Phenom will have to compete with the total package. Penryn may bring enough to the table and it may not, but I think the focus on just performance doesn't take the whole package into account.

InTheKnow said...

Also, (just in case you've forgotten already) AMD is releasing the new Griffin mobile processor in 2008.

And it will be competing with this...

The first chip is a dual core processor which will consume just 25W of power - similar to the older Pentium-Ms. All the present-gen Core Duo and Core 2 Duo mobile processors have TDP ratings near 35W.

Griffin may be good, but the competition isn't standing still either.

Scientia from AMDZone said...

giant

"Compare that to the Clovertown launch. It was scheduled for February this year but Intel pulled it in by three months to November '06."

According to Tick Tock, Intel releases a new process two years after the old one. 65nm was released in Q4 05, so we would expect 45nm in Q4 07. Are you trying to give Intel extra credit for not slipping its own schedule?

Allepisodes

"Intel is like 16X the size of AMD"

In terms of processors Intel is about 3.5X the size of AMD.

"if Intel starts to fall behind they can just throw more Engineers at the problem and catch up. AMD doesn't have that Luxury Yet. Keythings to remember here is AMD= 16X smaller than intel"

Your basic assumptions are wrong. AMD was running a single design team, so 16X would mean that Intel had 16 design teams? Not even close. Intel had four: Itanium, Whitefield, Nehalem, and Banias. So, 4X the number of teams.

Intel disbanded the Whitefield team, Itanium is the same, and the former Dothan team is now working on C2D. AMD is now running two teams: one for K10 and one for Griffin (which includes the old Geode team). So, Intel's design team ratio has dropped from 4X to 1.5X.

Scientia from AMDZone said...

InTheKnow

"The first chip is a dual core processor which will consumer just 25W of power - similar to the older Pentium-Ms. All the present-gen Core Duo and Core 2 Duo mobile processors have TDP ratings near 35W."

That is pretty good. That should put Penryn at as little as 3 watts more than Griffin. If Intel's chipset is as good as AMD's then it will come down to a little better battery life for AMD versus a little better performance for Penryn.

Allepisodes said...

Scientia from AMDZone

In terms of processors Intel is about 3.5X the size of AMD.

I meant as an overall company, not in terms of processors, and I'm not sure of the exact factor by which Intel is larger than AMD; that's why I said "like 16X the size of AMD". Intel is just flat-out a larger company with more resources, so Giant shouldn't make it sound as if it's that difficult for a company that large to pull in a scheduled chip. If that's in fact what Intel did.

Ho Ho said...

scientia
"Six months for AMD to hit 3.0Ghz in volume production. Unless AMD's 3.0Ghz demo was an elaborate hoax they should have this in Q1 08."

I'll repeat myself once more: not before H2.


greg
"Ho ho, your arguments about general usage can be fairly quickly flushed down the drain when you look at one thing. AMD can target a specific marget, and do so extremely well (that of scientific workloads)."

Yes, it can. But that part of the market is rather small, so they can't get too much revenue from it. As has been said before by numerous people, servers make up around 10% of both companies' total sales. It is kind of difficult to have that 10% as a primary source of income.



scientia
"Intel won't have enough volume of Penryn in Q4 to squeeze AMD."

And AMD will have enough K10 to squeeze Penryn and Clovertown/Tigerton?


"You know that AMD will release both desktop and mobile processors in volume in 2008."

AMD still has to sell its old, low-ASP K8s for a long time before it can make a decent profit from the new ones.


"This won't happen. AMD should have its losses under control by Q4"

Yes, they likely won't go bankrupt, but I'm quite sure they will still report losses in Q3 and Q4. I expect quite big ones in Q3 and less in Q4: something like $400-500M, then $200-300M in Q4. Though I wouldn't be surprised to see bigger losses.


"Yes, there is. [a reason Intel can't use SSE5]"

What would it be? Though I doubt Intel would copy SSE5 exactly anyway; Larrabee will be quite a different beast and will have a whole new SIMD instruction set.

Scientia from AMDZone said...

InTheKnow

"Penryn runs significantly cooler (about 10 degrees Celsius and 10 watts less) than its predecessor under both idle and maximum power conditions."

I'm sorry, but this is not very impressive for a dual core. You do realize that AMD should be hitting this same spec with quad core K10 on its B2 stepping, and with dual core K10 on its B1 stepping. If this is truly the best Intel can do at 45nm, it is sad indeed that Intel can't hit this spec with its most sophisticated stepping of 65nm, G0.

"So you get a 5% performance boost and a big power drop."

And a power rating at 45nm that AMD can hit at 65nm.

"Barcellona/Phenom will have to compete with the total package."

That is true. And when we add in the fact that AMD's rating includes the IMC, where does this leave Penryn on the total package?

"Penryn may bring enough to the table and it may not, but I think the focus on just performance doesn't take the whole package into account."

True; it looks worse for Penryn when you add in the power draw.

Ho Ho said...

scientia
"I'm sorry but this is not very impressive for dual core."

It is when you compare their die sizes. A smaller die with the same thermals will always be hotter than a bigger one. With its massive size, K10 will be relatively cool compared to Intel's CPUs.


"You do realize that AMD should be hitting this same spec with quad core K10 on its B2 stepping"

On what do you base that theory?


"And a power rating at 45nm that AMD can hit at 65nm."

Well, 45nm will be what Barcelona has to fight against.

Also, wasn't the 3.33Ghz dual core at 65W and the 3GHz quad core at 80W? Those are the first 45nm CPUs. I wouldn't be surprised to see much higher speed ones in the future.

When will AMD hit that kind of thermals? As I said before, I'm quite sure not before H2 next year.

Aguia said...

Ho ho

Well, 45nm will be what Barcelona has to fight against.

Well do you already know in which market Intel will release the 45nm CPUs first?


Also, wasn't the 3.33Ghz dual core at 65W and the 3GHz quad core at 80W? Those are the first 45nm CPUs.

Where is that info? When will they be released? What market segment?


I wouldn't be surprised to see much higher speed ones in the future.

But when, ho ho? I could also say the same things about K10.


Also, do you think that all 45nm Intel processors will run at 3.0Ghz or more? Is that it? I don't think Intel can offer many models from 2% production.
At least AMD is going to do the right thing: have just one or two models at the beginning and add more models as production ramps. The right choice; otherwise they couldn't meet market demand.

Ho Ho said...

aguia
"Well do you already know in which market Intel will release the 45nm CPUs first?"

Servers. Desktop will come in Q1 and by the end of Q2 half the mobile chips will be on 45nm.


"Where is that info? When will they be released? What market segment?"

It is all written here. Besides the 65W 3.33GHz dual core, the rest will be released in Q4 as Xeons. That 65W part will be released in Q1 together with the other dual cores.


"But when ho ho?"

I would expect to see small speed bumps in Q2 and Q3; if Nehalem is delayed, then perhaps in Q4 too.


"I could also say the same things about K10."

Yes, you could, and I'm sure K10's clock speed will climb slowly and steadily over the next months. Intel will release most clock speeds on day one and have a long pause between introducing new models.


"Also you think that all 45nm Intel processor will work at least at 3.0Ghz?"

Of course not, what made you think that?


"I don’t think Intel can have too much models/processors from 2% production. "

You do know that they will ramp fast and have most CPUs made on 45nm before a year has passed?


"At least AMD is going to do the right thing, having just one or two models at the begging and when it ramps the production will add more models to it."

Have you got any idea what models AMD will release in a couple of days? They'll have five 2P quad cores from 1.7-2GHz and four 4P+ parts ranging from 1.8-2GHz. That is a total of nine different CPUs with four different clock speeds and two different TDP ratings. I'd call that far from "one or two models".


"Perfect choice. Otherwise couldn’t meet market demands."

I just hope you wouldn't call not selling any higher-clocked K10s "meeting market demand".

GutterRat said...

abinstein,

Please get me your list of condiments and your address ASAP

http://blogs.zdnet.com/Ou/?p=735

Mmm...yummy words.

Scientia from AMDZone said...

Ho Ho

"I'll repeat myself once more: not before H2."

Okay, my estimate is based on the past historical ramping patterns of both Intel and AMD. Everything suggests six months. Where do you get H2 08 from? Also, AMD is saying mid-2008 for 45nm, so you are claiming that a 3.0Ghz 65nm K10 will arrive at the same time as Shanghai?

"And AMD will have enough K10 to squeze Penryn and Clovertown/Tigertown?"

AMD doesn't need to squeeze Intel. AMD only needs to improve their own lineup.

"AMD still has to sell its old and low ASP K8 for a long time before it can make decent profit from the new ones."

Not really. AMD can convert to K10 much faster than Intel can convert to 45nm.

"Yes, it likely won't bancrupt but I'm quite sure they will still report losses in Q3 and Q4. I expect quite big ones in Q3 and less in Q4, something like 400-500M and 200-300 in Q4."

I would probably be about $100 million more optimistic each quarter.

"What would it be? Though I doubt Intel would copy SSE5 exactly anyway, Larrabee will be quite a different beast and will have a whole new SIMD instructionset anyway. "

SSE5 is a big problem for Intel but I'm not going to discuss it yet. I can probably just add that onto the benchmark results Monday.

Scientia from AMDZone said...

Axel

Yes, I've been working on a reply. You had a number of links that needed to be checked.

"Mercury Research. "McCarron explained that the Sunnyvale, Calif., company appears to have overestimated demand in the fourth quarter of 2006 and shipped more processors to OEMs and channel partners than were needed."

Yes, I agree the article sounds good. The problem is that when you run the numbers, at least two-thirds of the drop in volume was due to DDR. And this would match what AMD said about having the wrong product mix. In other words, rather than AMD selling fewer processors because they sold too many in Q4, it seems that AMD would have sold more processors in Q1 if they hadn't been stuck with DDR chips that no one wanted. That is not the same as stuffing the channel.

"Not much it won't. You would know this if you had seen the price list."

The price list looks fine. I don't see anything in it that would reduce server ASP. AMD's server ASP is still down from what it was at the beginning of 2006 so there is room to come up.

"No, AMD will continue to lose server share through the remainder of 2007 due to low clocks."

I'm sorry but this statement is absurd. Even with the expected clocks in Q3 and Q4 there are still big benefits with K10. Server sales should increase in Q3 and Q4.

"2008 will be a pitched battle between Harpertown and faster Barcelonas."

Not really. The server market doesn't work that way. There is no big rush to buy the top performance chip with its accompanying top bin price and top TDP rating. The real struggle will be with the 90 and 65 watt ranges. Intel's problem is that the ramp for 45nm is slow because it doesn't have a six month lead with another architecture this time. And, it isn't just a volume ramp; it is a speed ramp as well.

" AMD's volume share is unlikely to return to 2006 levels because Harpertown will be far more competitive than Netburst was."

No, this is false. AMD's volume in 2006 was 23% which is what it is now. AMD's volume should be higher by the end of 2008. AMD may not hit their target of 30% but I could see 27 or 28%. I guess you could have been referring to AMD's peak Q4 06 volume of 26%. AMD may not match this again in Q4 but should sometime during 2008.

"And unfortunately your 25% claim has a shaky foundation unless you can link to an official roadmap."

Considering that there are no official roadmaps for Intel, your objection is nonsense.

"In this review at Anandtech, compare the benchmarks of QX6850 (3000/1333) to QX6800 (2933/1066). You will see that the performance increase is mostly attributable to the 2.3% higher clock. This shows that the FSB & memory subsystem are not the bottlenecks for most applications, even for high clocked Kentsfield."

If you compare the SYSMark General Usage numbers they appear to scale evenly. Yet, you can plainly see that C2D is faster with a 1333Mhz FSB than it is at 1066Mhz even though it appeared to scale normally at 1066Mhz. For the rest we can ignore the benchmarks where the 1066Mhz version is keeping up with the 1333Mhz because obviously these are not stressing the memory bandwidth. We do see some stalling with Photoshop.

"I don't how you can make these categorical statements without providing any evidence"

You are trying to wring a pound of results out of an ounce of benchmarks. Benchmarks that are not bandwidth sensitive may respond to more cache. However, K10 has more cache as well as better prefetching. I'm certain we will see a difference when AMD does get a 3.0Ghz chip out the door, probably in Q1.

"Yes, desktop based on K10 and mobile based on power-optimized K8. Unfortunately both are "too little, too late" for those areas. K10 is designed for servers and memory-bound tasks, it will not compete well with Penryn on the desktop especially with Penryn's die size and MCM advantage."

Considering that the desktop K10s are better than the desktop K8s, I'm not quite sure how you arrive at the conclusion that they are too late. AMD should have good volume of desktop K10 in Q1 while Intel won't have good desktop volume of Penryn until Q2. You are correct that Intel will have lower costs with MCM and that these will help offset the initial lower yields. As I've already said, this works to Intel's advantage in Q2. But Shanghai is then released in Q3. I also have to wonder if Intel's sales will drop a bit as people anticipate Nehalem in Q4. See, it isn't as straightforward as you try to make it sound.

" And K8 is just ancient history now, it's laughable to expect Griffin to compete with 45-nm Penryn mobile, which is returning to a 25W power envelope with far higher performance than K8-based Griffin."

However, battery life is still what people want from notebooks, more than performance. And Intel's 25W Penryn will probably draw about 3 watts more than AMD's 35-watt Griffin.

I assume from the benchmarks that you are trying to claim that only server benchmarks are bandwidth intensive enough to reduce C2D's quad core speed, and that therefore K10 is only competitive for servers. If this idea were in fact true then Intel would still be at a disadvantage with SSE, since heavy SSE tends to be truly bandwidth intensive. I suspect this won't be the only area where we see a difference, but if this area alone were nullified, that would be enough to offset Intel's advantage.

Griffin will be competitive in mobile because battery life is still more important than premium performance. This basically means that Penryn will tend to swap places and give ground on pure mobile while taking more desktop replacement share.

Scientia from AMDZone said...

Okay, first let's give this a proper link: Leaked - AMD Barcelona versus Intel Clovertown and Tigerton.

BTW, links are not difficult, just use the < a href="url" > label < /a > tags.

GutterRat

The Integer scores look pretty good with K10 being 14% faster than Clovertown. That is enough speed to stay ahead of Penryn in servers.

The FP scores however would be staggering. As I recall, AMD only claimed 50% faster at the same clock but this would show 84% faster. That doesn't sound right; K10 shouldn't be that much faster than Clovertown.

So, now I wonder how these (if they are correct) would translate to desktop scores.

abinstein said...
This comment has been removed by a blog administrator.
abinstein said...
This comment has been removed by the author.
abinstein said...

Gutterrat
"Please get me your list of condiments and your address ASAP"

How about some specimen of your brain? I'm really interested to see what kind of deficiency it possesses to make you so poor in reading.

I'll even send them back to you if you attach a return envelope - apparently you need those, even heavily deficient ones.

scientia -

If you wanna delete posts, you better delete all related ones. I return stupidity where stupidity awaits. You should know where the root of stupidity is.

Aguia said...

Scientia,
If those results are correct then your assumption was correct: AMD was using a 2.6Ghz quad core K8 to simulate the results.

I read some of the posts from GeorgeOu. Who is he?
Does he work for ZDNet, or does he just run a blog there?
The guy seems to have some problem with the word AMD; does anyone know why?

Scientia from AMDZone said...

Ho Ho

"On what do you base that theory?"

I was using the referenced article from HKEPC which is where Ars Technica got their information. However, let's use the more recent information:

X5460, 3.16 Ghz, 120W
E5450, 3.00 Ghz, 80W
E5440, 2.83 Ghz, 80W
L5430, 2.66 Ghz, 50W

For B2 stepping AMD has demonstrated a 3.0Ghz 130 watt part, so we would expect:

3.0 Ghz 130 watts
2.8 Ghz 90 watts
2.6 Ghz 68 watts
2.4 Ghz 45 watts

So, it looks like the power draw will be pretty even in spite of Intel's 45nm.

"Servers. Desktop will come in Q1 and by the end of Q2 half the mobile chips will be on 45nm."

Right, this is a quarter behind K10.

"Yes, you could and I'm sure that K10 clock speed wil climb slow and steadily for the next months. Intel will release most clock speeds at day one and have long pause between introducing new models."

No. You are confusing this launch with the one in 2006, when Intel had a six month lead on 65nm. This time Intel won't have the top clocks up front. The best we are likely to see in Q1 is 3.32Ghz for quad core, which is only a slightly higher clock than AMD's 3.0Ghz quad core. Intel may hit 3.48Ghz in Q2 while AMD bumps to 3.2Ghz. If Intel does well they may even bump to 3.64Ghz, but that is six months after Penryn's launch.

"You do know that they will ramp fast and have most CPUs made on 45nm before year has passed?"

Yes, by the end of 2008. But AMD will convert to K10 more rapidly than that. Also, AMD will convert to 45nm more rapidly than Intel does. By early 2009 Intel will have lost nearly every advantage, including process and die size. Their one remaining advantage will be memory bandwidth, which AMD will surpass in Q2 09. This means that by mid 2009 AMD and Intel should be nearly equal in processor offerings, with AMD having a small advantage in memory bandwidth and memory cost.

Aguia said...

Axel,

There are no roadmaps from Intel or from AMD; however, you can see on Anandtech's site one slide from AMD that says:

HE Energy Efficient:
Up to 1.9Ghz at launch; Higher in Q4/Q1

Standard Performance:
Up to 2.0Ghz at launch; Higher in Q4

SE High Performance:
2.3Ghz and above; Q4 delivery

amd promises speeds


Scientia,

This goes in line with what you already replied to me: AMD is having problems scaling Ghz because of the TDP. I bet dual core K10 will get much faster clock speeds. AMD's “big” problem is to make sure the new processors will work in the old systems whose dual cores they will replace. And that is not as easy as it may seem.

Amdzoner said...

Scientia, you seem very certain that AMD WILL have a 130W 3Ghz quad-core part in Q4. I will bet money that this will not see daylight, at least not in Q4 (of course with good yields).

Up for a bet?

I think AMD lost the battle with Barcelona vs. Penryn. Pretty sure I'm right. 1 day until graduation. In 4P segments, it doesn't matter if AMD costs less. These guys spit out 4x the money for 2x the performance. Tigerton at 2.93Ghz should be the only choice for those who want ultimate 16-core performance and much RAM.

Ho Ho said...

scientia
"For B2 stepping AMD has demonstrated a 3.0Ghz 130 watt part"

And you know they were 130W because ... ?

Considering the size of the coolers I'm quite sure the ones that were demonstrated were far from 130W.

Aguia said...

In 4P segments, it doesn't matter if AMD costs less. These guys spit out 4x the money for 2x the performance. Tigerton at 2.93Ghz should be the only choice for those who want ultimate 16-core performance and much RAM.

What's ultimate performance? 1% faster? 10% faster?
Much RAM? Do you mean even more bandwidth-limited RAM in 4-way systems? With 2-way it's already bad.

And 192GB of FBDIMMs consumes as much power as 1920W (96 x 20W). The AMD system with 128GB of DDR2 DIMMs consumes 256W (64 x 4W).
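To make the arithmetic explicit, here is a minimal sketch, assuming 2GB sticks and taking the 20W and 4W per-DIMM figures above at face value (Ho Ho disputes both numbers below):

```python
# A sketch of the arithmetic above. The per-DIMM draws (20W FBDIMM, 4W DDR2)
# and the 2GB stick size are this comment's assumptions, not vendor specs.
def memory_power(total_gb, gb_per_dimm, watts_per_dimm):
    dimms = total_gb // gb_per_dimm
    return dimms, dimms * watts_per_dimm

fb = memory_power(192, 2, 20)  # Intel system: 192GB of FBDIMMs
dd = memory_power(128, 2, 4)   # AMD system: 128GB of DDR2

print("FBDIMM: %d DIMMs, %dW" % fb)  # 96 DIMMs, 1920W
print("DDR2:   %d DIMMs, %dW" % dd)  # 64 DIMMs, 256W
```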

Unknown said...

Considering the size of the coolers I'm quite sure the ones that were demonstrated were far from 130W.

Indeed, look at this fan:

http://www.fudzilla.com/index.php?option=com_content&task=view&id=2585&Itemid=51

AMD failed to show how much heat the CPU was putting out. It must have been overheating with the FX fan they used before. Why else would AMD switch to that large aftermarket cooler?

3GHz Phenom will have a minimum 150W TDP.

Unknown said...

In 4P segments, it doesn't matter if AMD costs less. These guys spit out 4x the money for 2x the performance. Tigerton at 2.93Ghz should be the only choice for those who want ultimate 16-core performance and much RAM.

Those who want the ultimate performance, regardless of cost, will buy servers with the IBM Hurricane X4 chipset. That's up to 32P, 128 processing cores at 2.93Ghz, with up to 1TB of DDR2 memory.

Has AMD anything that can match this kind of performance? No.

Ho Ho said...

aguia
"And 192GB FBDIMM consume as much power as (96x20W=1920W)."

You forgot the low-power FBDIMMs that don't use much more power than DDR2. I once even showed a PDF about those, but I currently don't have time to look it up for you.

Also, DDR2 uses around 5-10W per DIMM, depending on speed. 4W is definitely way too little for them.

Also if one would compare watts per GB then it would be around

Aguia said...

3GHz Phenom will have a minimum 150W TDP.

I expect that value to be even higher, since 2.0Ghz already has a 120W TDP.
But one thing is for sure: it will be a great overclocker if that's the case. Overclockers don't care about power consumption, as was shown in the Pentium 4/D days when Intel CPUs consumed 2x/3x/4x more than AMD CPUs.

Has AMD anything that can match this kind of performance? No.

Well, do you have benchmarks from that IBM system?

Do you have a link to an Intel system that can match that performance? No?
Then AMD isn't the only one that can't do that.

Unknown said...

Do you have a link to an Intel system that can match that performance? No?

The Hurricane X4 chipset is a chipset for Intel CPUs. The highest of high end. It supports up to 32P operation and uses DDR2 vs. FB-DIMMs on Intel's own chipsets.

Ho Ho said...

sorry, pressed the wrong key on KB :)

Anyway, 128GB of normal 20W FBDIMMs would be 64 x 20W = 1280W, or with low-power models around half that.

Of course, the low-power ones would also be significantly more expensive.

Unknown said...

Overclockers don't care about power consumption

You're right. As long as the fan is quiet and the system is 100% stable, most people won't mind a high power consumption.

Aguia said...

OK, Ho Ho and Giant, I accept your last two replies since they are more accurate.

Aguia said...

Giant,

Do you remember Bioshock?

2600XT beats the 8600 GTS in Bioshock?

Ho Ho said...

aguia
"2600XT beats the 8600 GTS in Bioshock?"

Wasn't that benchmark on GameSpot with DX9? On the same page there is a comparison between DX9 and DX10, and the HD series has a huge speed drop compared to the NV stuff.

Ho Ho said...

I found one document about FBDIMM power usage here. They have a 4GiB 533MHz stick, and the peak power usage I saw was around 12.5W. Most other numbers were below 10W.

For 128GiB that would make around 32 x 12.5W = 400W peak. A lot, but not too much I'd say. As 8GiB sticks are coming later this year, I expect the power usage per GiB to drop even lower.
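Putting the thread's per-DIMM figures side by side, here is a minimal sketch; the wattages are the numbers quoted in these comments, not vendor specs:

```python
# Power for 128GiB of memory under each per-DIMM figure quoted in this thread.
TOTAL_GIB = 128
scenarios = [
    ("standard FBDIMM, 2GiB @ 20W",   2, 20.0),  # 64 DIMMs -> 1280W
    ("low-power FBDIMM, 2GiB @ ~10W", 2, 10.0),  # 64 DIMMs -> 640W
    ("4GiB FBDIMM @ ~12.5W peak",     4, 12.5),  # 32 DIMMs -> 400W
]
for name, gib_per_dimm, watts in scenarios:
    dimms = TOTAL_GIB // gib_per_dimm
    print("%s: %d DIMMs -> %.0fW" % (name, dimms, dimms * watts))
```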

Ho Ho said...

Some interesting news, coming from ... Tomorrow!

It has SPEC scores for a 1.9GHz Barcelona:
SPECint2006 11.3
SPECint_rate2006 83.2
SPECint_rate_base2006 72.8
SPECfp2006 11.2
SPECfp_rate2006 73.0
SPECfp_rate_base2006 68.5

I'll try to find matching scores/speeds from Intel.

Can anyone guess why the following is there?
(1) Planned availability for the x3455 model using the AMD Opteron Model 2347 processor
(1.9GHz, 512KB L2 cache per core) is November 16, 2007.

By mid-November I had hoped for much higher speeds.

GutterRat said...

abistein wrote,

How about some specimen of your brain? I'm really interested to see what kind of deficiency it possesses to make you so poor in reading.

There is no deficiency. Rev 10h is the core from which both the desktop and server designs are derived.

Face it: you have been evasive.

Sometimes it's best to admit being driven by emotions vs logic/facts.

I am driven by the latter.

Why can't you just admit that Intel has got AMD beat this round?

GutterRat said...
This comment has been removed by the author.
Axel said...

Occam's Razor rings true: the simplest explanation is usually the correct one. Why was AMD so quiet all year? As suspected, K10 appears to be too little too late. It's a server chip that will compete well with Intel in that space when the clocks come up sometime next year. But in the desktop space, K10 is on average only 10-15% faster than K8. It will be no match for Penryn. Based on those benchmarks, Penryn will generally beat K10 in the enterprise space as well. And in the mobile space, K8-based Griffin has no chance against Penryn in terms of performance per watt.

Many people are set to eat a healthy portion of humble pie today. AMD, in their current incarnation with their current business model, are finished. They will restructure dramatically in order to survive 2008, it is inevitable.

Anonymous said...

K10 desktop benchmarks flowing in.

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=3092&p=5

K10 gives about a 15% IPC improvement over K8 clock for clock, per core.

What we know so far:
Conroe has about a 25% performance advantage clock for clock over K8. K10's IPC improvement is still not enough to take down Intel's 65nm Kentsfield.

Since Penryn has about a 5% IPC improvement clock for clock over Kentsfield, AMD is going to have a hard time. Pricing is key for AMD.

Anonymous said...
This comment has been removed by the author.
Anonymous said...

So abistein, what happened to the "K10 will be ~20% faster then Kentsfield clock for clock" prediction?

amw said...

Assuming the Anand benches are representative, I think K10 looks very promising for the server space indeed, as it scales much better, is cheaper (currently), and uses less power than the equivalent Intel offering.

However, on the desktop things do not look too rosy. The 15% improvement over K8 must be a minimum, due to the hacked server they used with slow memory etc. With faster memory and a tweaked BIOS it should be more like 20-25%; however, that will only put it on par with Yorkfield, and it will have lower clocks. No doubt Intel will be aggressive with 45nm prices as well, so this is not looking as rosy.

Still early days yet though.

Unknown said...

So abistein, what happened to the "K10 will be ~20% faster then Kentsfield clock for clock" prediction?

Anandtech are paid Intel pumpers! They sabotaged the server so that AMD could not get decent results! Honest!

Barcelona is too little too late, just as Pat Gelsinger stated.

AMD as we know it is finished. If they continue as-is, BK is inevitable.

Expect AMD to sell off one of its fabs or business units by the end of the year.

Unknown said...

AMD is lying and deceiving people with this "ACP" BS.

Intel's definition is straightforward:

ftp://download.intel.com/design/processor/designex/31559405.pdf

Thermal Design Power: A power dissipation target based on worst-case applications. Thermal solutions should be designed to dissipate the thermal design power.

There is nothing in there that says 'average'. WORST-CASE applications, generating the MOST HEAT... this is just BS. It is disappointing that AMD resorts to this kind of deception.

Intel is consistently under TDP with Core 2: http://www.xbitlabs.com/images/cpu/core2duo-shootout/power-2.png

AMD goes over TDP. FX-62 with 125W TDP uses 130W of power. All Core 2 are under 65W. Core 2 Extreme X6800 is well under 75W.

AMD is a deceptive unethical company.

Unknown said...

Here is a full suite of tests run by TechReport, an unbiased neutral third party.

http://techreport.com/articles.x/13176/1

Remember when looking at the results, as pezal states, the 8360SE won't be available until the end of the year.

For one, it is not going to capture the overall performance lead from Intel soon, not even in "Q4," which is when higher-clocked parts like the Opteron 2360 SE are expected to arrive. Given what we've seen, AMD will probably have to achieve something close to clock speed parity with Intel in order to compete for the performance crown. On top of that, Intel is preparing new 45nm "Harpertown" Xeons for launch some time soon, complete with a 6MB L2 cache, 1.6GHz front-side bus, clock speeds over 3GHz, and expected improvements in per-clock performance and power efficiency.

Gutterat, perhaps you should be getting those condiments ready for abistein. ;-)

Pop Catalin Sever said...

"AMD is a deceptive unethical company."

Intel is a monopolistic unethical company too...

Intel's TDP is based on "worst case applications", which is lower than the TDP computed by AMD, which is:

"Thermal Design Power (TDP) is measured under the conditions of Tcase Max and VDD=VID_VDD, and include
all power dissipated on-die from VDD, VDDIO, VLDT, VTT, and VDDA. Contact your Field Application
Engineer for more information on TDP specifications"
AMD Opteron™ Processor
Power and Thermal Data Sheet


AMD's TDP is a physical max, while Intel's is an application worst-case max, which is less than the physical max. It's not wrong to measure TDP like Intel does, but the measurement system is chosen so that Intel looks better than AMD, period.

That test from XBit doesn't include newer AMD steppings and models; it's too old and also doesn't include AMD's 65nm CPUs. Try a more conclusive power test done by TH:

Energy Index: AMD Unbeatable

Unknown said...

That test from XBit doesn't include newer AMD steppings and models

It doesn't include the latest Intel steppings either, and neither does your link: the same stepping that reduced the TDP of the 2.66Ghz quad from 130W (QX6700) to 95W (Q6700).

Back to Barcelona performance. The TechReport results are the most thorough review I could find. Does anyone know of any other benchmarks?

Hornet331 said...

http://www.tecchannel.de/server/prozessoren/1729224/

German, but the graphs are universal. :)

Aguia said...

I don't know why you all bash abistein.

The Anandtech review clearly shows that the 2.0Ghz Opteron is as fast as or faster than the 2.33Ghz Xeon.

So if you know math, you'd know that ~20% faster would put it around 2.4Ghz.
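For the record, here is the arithmetic behind that as a minimal sketch, using only the clocks named in this thread:

```python
# Per-clock advantage implied by the Anandtech result above.
opteron, xeon = 2.0, 2.33  # GHz, the two parts compared in the review

implied = xeon / opteron - 1
print("implied per-clock advantage: %.1f%%" % (implied * 100))  # ~16.5%

# A full 20% per-clock advantage would instead mean matching a faster Xeon:
print("20%% faster per clock matches a %.1fGHz Xeon" % (opteron * 1.2))  # 2.4GHz
```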

Axel said...

Aguia

I dont know why are you all bash abistein

He was spreading false second-hand information and learned his lesson just like The Ghost. He's eating a huge helping of crow today and will be pretty quiet for a few days. His multiple comments referred specifically to Phenom desktop performance vs. Penryn:

"Other than mpeg4 encoding, Phenom X4 performs better than Penryn Q6xxx at even 15% slower clock, period. That means a 2.5GHz Phenom will be comparable to 3.0Ghz Penryn."

"Those who tested both Phenom and Penryn do confirm to me that Phenom has better IPC and draws less power. AMD's new chipset helps there too."

"I only tell you what I heard from someone who actually seen both Penryn and Phenom running. The point is those Phenom doubters, who claim the chip doesn't offer better IPC, have offered not but FUDs."

"No, I am not making up stories, but those who actually tested both Phenom and Penryn told me such "stories," if you like. The story is short and simple: Phenom has better IPC than Penryn."

"What I heard, however, is that Phenom X4 has better IPC than Yorkfield, but the latter has clockrate advantage. I have said this so many times, but to my amazement, even when some of you got tired of hearing it, you still don't get it."

"If Penryn tops at 3.33GHz while Phenom at merely 2.8GHz, then Intel should have no fear of losing the performance title. With Phenom at 3.0GHz, however, ..."

-----------------

As an aside, Rahul Sood's claim that "Phenom at 3GHz...kicks the living cr_p out of any current AMD or Intel processor — it is a stone cold killer" will also turn out to be laughably wrong. It's clear that, as usual, he gets overexcited when he gets his hands on new hardware and tends to make erroneous claims. I lose more respect for him every day.

This claim led Scientia to later make the following erroneous prediction: "Sood is more easily impressed so I'm thinking suppose K10 is 15% faster than Kentsfield at the same clock. That would be enough for 3.0Ghz Phenom to match 3.33Ghz Penryn."

Unfortunately, Phenom won't even match Penryn per clock, let alone be 15% faster.

Scientia from AMDZone said...

Sal

"Scientia, you seem very certain that AMD WILL have a 130W 3Ghz Quad-Core part in Q4."

Sal, this is a classic example of the way incorrect things get attributed to me all the time. But frankly, I don't understand where you got this incorrect idea.

AMD will have 130 watt 3.0Ghz Quad core parts in Q1 08.

" I will bet money that this will not see daylight, atleast not in Q4. (Of course with good yields.)"

Then you agree with me.

"I think AMD lost the battle with Barcelona vs. Penryn."

Okay, now you've stopped agreeing with me.

"In 4P segments, it doesn't matter if AMD costs less. These guys spits out 4x the money for 2x the performance. Tigerton 2,93Ghz should be the only choice for those who wants ultimate 16-Core performance and much RAM."

Did you get this silly idea from Ou? He said something very similar (but incorrect) in his blog. The big boost in performance is when compared with current Intel systems, not AMD systems.

By Q1 08, you should be able to get 3.0Ghz 16 core/4 socket systems with Barcelona as well. The only advantage Intel might have is the amount of memory but that comes with a steep performance penalty.

Scientia from AMDZone said...

Ho Ho

"Considering the size of the coolers I'm quite sure the ones that were demonstrated were far from 130W."

Yes, I'm sure you are. However, I'm reminded of the equally profound certainty that I was wrong about AMD having 2.4Ghz chips. And then AMD demoed 3.0Ghz.

Whether you believe it was 130 watts or not is unimportant since you believe that AMD won't have 3.0Ghz out until Q3 08 whereas I'm expecting them in Q1. We'll see who is right.

Scientia from AMDZone said...

Giant

"For those who want the ultimate performance, regardless of cost, will buy servers with the IBM Hurricane X4 chipset. That's up-to 32P, 128 processing cores at 2.93Ghz with up-to 1TB of DDR2 memory."

Yes, AMD's systems should be the same once DC 2.0 is released.

"Has AMD anything that can match this kind of performance? No."

No, not until DC 2.0.

Scientia from AMDZone said...

Axel

"As suspected, K10 appears to be too little too late."

Incorrect.

"It's a server chip that will compete well with Intel in that space when the clocks come up sometime next year."

Incorrect. It should be at 2.5Ghz in Q4.

"But in the desktop space, K10 is on average only 10-15% faster than K8."

Incorrect again. However, since you don't understand the benchmarks I don't think I'll bother trying to explain.

" It will be no match for Penryn."

Today, 2.0Ghz K10 is a match for 2.33Ghz Clovertown. 2.5Ghz K10 in Q4 should match 2.83Ghz Penryn. 3.0Ghz K10 in Q1 should match 3.33Ghz Penryn.
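A quick consistency check of what those three pairings imply per clock; a sketch only, since the pairings are predictions rather than benchmark results:

```python
# Per-clock advantage implied by each claimed K10/Intel matchup above.
pairings = [(2.0, 2.33), (2.5, 2.83), (3.0, 3.33)]  # (K10 GHz, Intel GHz)
for k10, intel in pairings:
    print("%.1fGHz K10 ~ %.2fGHz Intel -> %.1f%% per clock"
          % (k10, intel, (intel / k10 - 1) * 100))
# Prints ~16.5%, ~13.2%, ~11.0%: the implied K10 advantage narrows against
# Penryn, consistent with Penryn's ~5% IPC gain over Kentsfield noted above.
```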

" Based on those benchmarks, Penryn will generally beat K10 in the enterprise space as well."

Not really. Things should be pretty close by Q1.

"And in the mobile space, K8-based Griffin has no chance against Penryn in terms of performance per watt."

This is an absurd statement. The extra power of Penryn won't do any good with half the memory bandwidth that mobile uses.

"Many people are set to eat a healthy portion of humble pie today."

For what?

" AMD, in their current incarnation with their current business model, are finished."

Yes, they are finished fooling around with Intel. AMD should be much more competitive through 2008 and 2009.

" They will restructure dramatically in order to survive 2008, it is inevitable. "

I don't see why. Oh, but is Intel still going to fire 10,000 employees in 2008 as they planned in 2006?

Scientia from AMDZone said...

Poke

"So abistein, what happened to the "K10 will be ~20% faster then Kentsfield clock for clock" prediction? "

It seems to be 17%.

Scientia from AMDZone said...

Axel

"His multiple comments referred specifically to Phenom desktop performance vs. Penryn:"

"Other than mpeg4 encoding, Phenom X4 performs better than Penryn Q6xxx at even 15% slower clock, period. That means a 2.5GHz Phenom will be comparable to 3.0Ghz Penryn."

Well, it now looks like a 2.5Ghz K10 will be comparable to a 2.83Ghz Penryn. So, a bit slower.

"Those who tested both Phenom and Penryn do confirm to me that Phenom has better IPC and draws less power. AMD's new chipset helps there too."

Can't really tell on this one until the faster clocks come out but it looks like roughly an even match to me with Intel perhaps slightly ahead.

"I only tell you what I heard from someone who actually seen both Penryn and Phenom running. The point is those Phenom doubters, who claim the chip doesn't offer better IPC, have offered not but FUDs."

Well, clearly the IPC on K10 improved substantially.

"No, I am not making up stories, but those who actually tested both Phenom and Penryn told me such "stories," if you like. The story is short and simple: Phenom has better IPC than Penryn."

And, it obviously does.

"What I heard, however, is that Phenom X4 has better IPC than Yorkfield, but the latter has clockrate advantage. I have said this so many times, but to my amazement, even when some of you got tired of hearing it, you still don't get it."

This still seems true.

"If Penryn tops at 3.33GHz while Phenom at merely 2.8GHz, then Intel should have no fear of losing the performance title. With Phenom at 3.0GHz, however, ..."

I agree. 3.0 should match Penryn at 3.33Ghz.

"As an aside, Rahul Sood's claim that "Phenom at 3GHz...kicks the living cr_p out of any current AMD or Intel processor — it is a stone cold killer" will also turn out to be laughably wrong."

Not really. This seems to have more than a little truth to it.

"It's clear that, as usual, he gets overexcited when he gets his hands on new hardware and tends to make erroneous claims."

Some exaggeration, yes. But I'm sure Intel is glad that it has Penryn on the way. And, he did say "current".

"This claim led Scientia to later make the following erroneous prediction: "Sood is more easily impressed so I'm thinking suppose K10 is 15% faster than Kentsfield at the same clock. That would be enough for 3.0Ghz Phenom to match 3.33Ghz Penryn."

I know; I'm amazed because my 15% was so close to the true number of 17%. Not bad, if I do say so myself.

abinstein said...

gutterrat:
"Face it: you have been evasive."

You face it: you have been stupid.

Why can't you just admit that Barcelona is clock-for-clock superior to Clovertown, and even matches Penryn (exactly as I've heard from my source)?

OTOH, your source spoke nothing but FUD and lies, and some people over there are stupid enough to believe it.

Unknown said...

Why can't you just admit that Barcelona is clock-for-clock superior to Clovertown, and even matches Penryn (exactly as I've heard from my source)

This is totally 100% wrong. As axel's newest post on the entry above this one shows, K10 is well behind Clovertown on a wide variety of workloads.

abinstein said...

Giant -

Don't make yourself look as stupid as gutterrat. Look at this comparison between a 1.9GHz Barcelona and a 1.8GHz Xeon. The only thing a quad-core Xeon can claim is that it runs single-threaded integer workloads slightly faster.

People don't buy a quad-core to run just one single-threaded program at a time. People buy quad-cores to run multiple processes under heavy load, where Barcelona is 20%+ faster than Clovertown at the same clock rate.

Both you and gutterrat were simply in deep denial before, and are being seriously silly now by insisting the opposite.
