Tuesday, June 12, 2007

The Eye Of The Storm

At the same time that people on the web are insisting that AMD is spinning its statements to cover up failures with K10, I've seen many of these same people frantically spinning the news in the opposite direction by interpreting any snippet of information in the worst possible way against AMD. If there are indeed cyclonic and anti-cyclonic forces at work then perhaps somewhere in the center we can find something that makes sense.

The only way to reach any conclusions about AMD is to start with what we know. We know that AMD first demonstrated Barcelona on November 30, 2006. This was presumably a stable Alpha chip. The earliest Conroes were also Alpha chips, and we know that it took Intel about six months to go through two revisions to get to the B1 release version. However, we also know that Intel's B1 had a minor bug that was fixed in the B2 revision, so B2 is also a reasonable assumption for production. We know too that the fastest AMD could turn around in-line test chips would be about 10 weeks per revision. So, if we project this from the November 2006 demo we get:

November 30, 2006 – Stable Alpha silicon demo
February 8, 2007 – B0
April 19, 2007 – B1
June 28, 2007 – B2 (originally intended launch?)

Since AMD gave another demonstration in May we can assume that this followed the B1 revision. They could not have been waiting on a B0 revision since, even with delays, this would have been ready no later than March. It is reasonable to assume that they managed to squeeze in a B0 and B1 and then gave a demo shortly after B1 was ready. However, the fact that AMD did not give another demo when B0 was ready suggests that B0 was probably not much better than the A1 revision. Notice too that we have good correlation between the B2 revision and AMD's stated time frame of mid-year, since the B2 revision should have been ready by mid-July even with some delay. So, we are probably on the right track. But, what if the B2 chip isn't the actual volume production chip and AMD needs another revision? Projecting further we would get:

September 6, 2007 – B3
September 23, 2007 – Calendar End of Summer.

With two and a half weeks of margin, this suggests that a B3 revision could probably be released before the calendar end of summer. So, again we have good correlation with AMD's latest statement about release by the end of summer. Typically, you don't commit large amounts of production capacity to a new design until you are sure that the die is production ready. If AMD's B1 revision had been ready then the company would have committed to B2 and the launch would be July. However, if B1 still had a nasty bug or two then it would take the B2 revision to fix it and production would not be committed until B3. In other words, AMD could have good production samples at the end of June or early July with B2, but only in test-run volumes, probably too few for a genuine launch.
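For anyone who wants to check the revision arithmetic, here is a minimal Python sketch (using the same assumed best-case 10-week turnaround per revision) that reproduces these dates:

from datetime import date, timedelta

# Assumed best-case cycle: one new revision every 10 weeks, starting
# from the stable Alpha (A1) demo on November 30, 2006.
a1_demo = date(2006, 11, 30)
cycle = timedelta(weeks=10)

for step, revision in enumerate(["B0", "B1", "B2", "B3"], start=1):
    print(revision, a1_demo + step * cycle)

# B0 2007-02-08
# B1 2007-04-19
# B2 2007-06-28
# B3 2007-09-06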

If you launch a product with nothing but ES samples then that is a paper launch. This is pretty much what Intel did when it previewed Conroe six months before availability and what it has repeated with Penryn. If you launch with final-version samples but volume won't be up for two or three months then that is a soft launch. To get a genuine launch your volume run must be available within a month. Since it takes two and a half months for wafers to run through production, this means that you would have to wait about two months after you had a good sample batch to give the volume run time to be ready. If AMD only has a small sample B2 run available in late June or early July then this could be a soft launch unless they wait until late August to launch. There is some small possibility that AMD could gamble and start a small production run early, knowing that they might have to scrap it if the samples don't measure up. This is probably the only way that AMD could do a firm launch in July.
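As a rough sketch of that launch-window logic (all assumptions, not known AMD dates: good B2 samples at the end of June, about two weeks to validate them before committing a volume run, roughly 2.5 months of wafer throughput, and a "firm launch" meaning volume within a month):

from datetime import date, timedelta

good_b2_samples = date(2007, 6, 28)      # projected B2 sample availability
validation = timedelta(weeks=2)          # assumed time to verify the samples
wafer_throughput = timedelta(weeks=11)   # ~2.5 months for the volume run

volume_ready = good_b2_samples + validation + wafer_throughput
earliest_firm_launch = volume_ready - timedelta(weeks=4)   # volume within a month

print("Volume run ready:    ", volume_ready)          # 2007-09-27
print("Earliest firm launch:", earliest_firm_launch)  # 2007-08-30

Which lands the earliest firm launch right at the end of August, consistent with the reasoning above.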

We can also reasonably guess that once a production run is started for Barcelona, AMD will begin a Phenom batch about a month later. This probably means a lot more delay than with Barcelona. For example, AMD might be willing to gamble a small production run for server chips since these are low volume anyway, but probably would not be willing to gamble with the much larger batch of desktop chips that would be needed. So, even if AMD had enough Barcelona chips for a reasonable launch in July it is still almost a certainty that the desktop chips would be delayed until October. Again, these speculative dates seem to match well with what we've been hearing lately.

I was a bit surprised when some suggested that the lack of an elaborate showing by AMD at Computex meant something. However, if you look at the 2006 Computex, AMD showed almost nothing beyond AM2 systems and motherboards, and AM2 had been officially released a month earlier. The real information was given at AMD's June 2006 Technology Analyst meeting. Since nothing was released right before this year's Computex, I'm not surprised that it seemed so barren. It looks like AMD is pushing the meeting back a month and a half and "will hold its 2007 Technology Analyst Day on Thursday July 26th, 2007". The push to a later date probably has as much to do with DTX and mini-DTX motherboards being due in August as with the time frame for K10. July also makes sense because FAB 30 should be into the 300mm conversion and FAB 36 should be working on 45nm, and AMD will surely want to mention them. AMD will probably have some word about the repaired R650 GPU series by then, along with a time frame for R700. Most importantly, though, AMD would know by then whether the June B2 batch is ready for production.

Our assumption so far has been that B2 will probably be a good batch with volume coming with the B3 batch. This has led some to speculate that a bad B2 batch would knock K10 out for the rest of the year, since presumably they would have to wait until the B3 test run to commit to B4 and this would arrive too late for Q4 sales. However, this scenario is not likely to happen. If B2 turns out bad then AMD will have little choice but to start full production with B3 and gamble that it will be good enough to sell. In other words, if AMD is that late with K10 then they will have to perform without a net, knowing that a bad B3 batch will cost them millions in scrapped silicon.

The next issue is performance. Back in January, Randy Allen at AMD stated that K10 would be 40% faster than Intel's best Clovertown, presumably with the 2.5GHz K10. At the time Allen said this, Clovertown was only projected to clock to 2.66GHz. K10 would have needed a boost of about 25% over K8 to hit this performance level. This was possible, but Penryn changes things considerably. So, to figure out how K10 might compare with Penryn: 1.4 x 2.66GHz = 3.724GHz Clovertown-equivalent. However, Intel has stated that Penryn is 9% faster than Clovertown. Therefore, 3.724GHz / 1.09 = 3.42GHz Penryn-equivalent. Interestingly, this is only 2.5% faster than Intel's expected 3.33GHz speed for Penryn. Since 3% is the margin of error in testing, a 2.5% difference is not significant. So, we seem to have correlation between Penryn's expected performance and launch speed and AMD's earlier relative performance estimate. By the look of it, Intel intended that a 3.33GHz Penryn would match whatever AMD could release. Personally, I think AMD's January estimate was a bit too high, and rather than 2.5GHz it will take a K10 closer to 2.8GHz to match a 3.33GHz Penryn. 2.8GHz for quad core K10 is looking less and less likely in 2007 while a 3.33GHz Penryn seems much more likely. Roughly speaking, this could allow Intel to hold onto a 10% lead on all single and dual socket systems.
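The same arithmetic as a quick Python sketch (the 40% claim, the 2.66GHz Clovertown clock, and Intel's stated ~9% Penryn gain are the figures discussed above):

# Clock a Clovertown would need to match the claimed K10 performance,
# then the equivalent Penryn clock given Intel's ~9% clock-for-clock gain.
clovertown_clock = 2.66      # GHz
k10_claim = 1.40             # "40% faster than Clovertown"
penryn_gain = 1.09           # Penryn vs Clovertown at the same clock

clovertown_equivalent = clovertown_clock * k10_claim        # ~3.72 GHz
penryn_equivalent = clovertown_equivalent / penryn_gain     # ~3.42 GHz

print(round(clovertown_equivalent, 2), round(penryn_equivalent, 2))
# 3.72 3.42 -- only a couple of percent above the expected 3.33GHz Penryn bin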

My estimate is based in part on the architectural changes. Of course, I've also seen lots of people trying to pull some kind of performance estimate out of the two benchmarks that have been done. The problem is that A1 silicon is usually bogged down by patches and won't really show performance. I can already hear Intel enthusiasts insisting that this can't be true because Intel demoed A1 silicon with Conroe. Well, not exactly. Intel's initial silicon was so slow that they had to overclock it with water cooling. This is why they wouldn't let anyone look inside the case. Secondly, Intel very, very carefully controlled which benchmarks were used. They left out most benchmarks and only used what wasn't affected by the BIOS patches. Then, when B0 was ready and needed fewer patches, they loosened up a bit and allowed more benchmarks. They were clearly confident by the time B1 was released, so we can be certain that it required no major BIOS patches. We know that B1 still contained a bug but this was apparently not so severe that it hurt the benchmark scores. However, this does not seem to be the case at all with K10. Since AMD did not demo a B0 revision, it must have still been heavily patched. And, the very limited demo in May with the B1 revision makes it likely that the chip was still using heavy enough BIOS patches to knock the performance down. Remember, though, that no vendor or partner has anything newer than B1 since the B2 chips won't be ready until the end of the month.

I have heard some insist though that the B1 revision is “final silicon” and therefore whatever benchmark scores have been seen must indicate true performance. However, this is obviously not the case since if B1 were final then K10 would be launched in volume with B2 and there would be no question of when it would be available. In other words, we can't claim that K10 is delayed because it is buggy and then claim that the silicon is still good enough for valid benchmark scores.

Nevertheless, the negative speculation about AMD's actions is unending so let's see if we can cover each point:

The first possibility is that K10 has a major architectural flaw that is causing the low benchmark scores. We saw this with Intel with both Willamette and Itanium. However, this would not explain a delay since major flaws take at least a year to fix. AMD would have to launch K10 as it is in spite of the low performance, just as Intel did with Willamette and Itanium.

The second possibility is that K10 is complete but is running hot like Prescott did. Prescott was delayed for months until the BTX standard fixed both the cooling problems and the high socket power draw. This argument doesn't work though, because Prescott was actually released right away as the lower clocked Celeron D. Likewise, Intel released Smithfield at lower clocks when heat and power draw were a problem. AMD would simply follow suit and release lower clock speeds of K10 which would easily replace the 3800+ and at least compete with the C2D E6300 and below. AMD would probably also release a Quad FX version since these platforms could handle more heat and power draw. With its enhanced SSE performance there is no doubt that even a lower clocked K10 would add value to the current offerings. There is also no doubt that AMD desperately needs a quad core offering in whatever form it can produce. Even at low clock speeds a quad core would still be launched, just as Smithfield was for dual core.

A similar argument is that K10 is in finished form (and therefore the scores are valid) but that AMD is having process and yield problems and this is why the chip is delayed. However, we've already seen that if speeds were a problem then AMD would release anyway. Also, when K8 was first released in 2003, the yields were not only bad but were the worst that AMD has ever had. Yet, that didn't stop them from launching with 1.4 - 1.8GHz chips that were no faster than the 2.2GHz Barton. And, with the large drops in volume in Q1 AMD would have enough idled capacity even if the yields are poor. So, if yields or speeds were a problem then K10 would be launched at low speeds and then bumped up as AMD was able. Finally, even with low yields, AMD could produce enough K10s to cover the server segment where volumes are low and margins are much better than on the desktop, as they did with K8.

The most logical assumption remains that the delay is caused by bugs in the die that require BIOS patches to run. This then degrades the performance and causes the low benchmark scores. A patch that did not degrade performance would not cause a delay. A serious design flaw would likewise not cause a delay. Process or yield problems would only cause the initial clock speeds to be lower rather than causing a launch delay.

I've seen commenters on my blog and web authors (who should know better) claim that silence by AMD on these matters will erode confidence and hurt AMD's sales. These people tend to forget that the enthusiasts who really care about top performance make up only a tiny fraction (less than 0.1%) of sales. Also, this crowd tends to be very fickle and will quickly swap systems if any advantage becomes apparent. There are people who read information on the internet who aren't looking for the highest possible performance, but this group tends to have brand loyalties anyway. The fact is that probably 95% of all people who may buy a computer in the next year have never heard of Barcelona or Agena and will not notice or care if it is delayed. The only people who really matter are the ones who make purchasing decisions for Dell, HP, and Gateway since the chips have to actually be in systems before they will be purchased by most consumers. And, not surprisingly, buyers for major vendors get their information from private demonstrations rather than from reading reviews on the internet. AMD is not likely to give out much information before the 2007 Technology Analyst Day and that is probably when they will announce firm release dates for K10. If some find this frustrating then perhaps they should consider the lyrics to the Go-Go's song, Our Lips Are Sealed.

168 comments:

Axel said...

Good analysis and I think you're onto something with the deduction that buggy silicon is a reason for the delays, and possibly the main one.

But I think there's also a good argument for poor yields being an important reason for the delay, which I think you missed. If 65nm K10 yields are currently abysmal, it makes little economic sense to sacrifice a large number of mid-range Brisbanes yielding well at 2.2 to 2.6 GHz in order to produce a relatively small number of low speed, large die K10s. K8s are currently AMD's bread and butter, and without that revenue from the OEMs and server companies, their earnings would fall off a cliff and they would also lose market share (unit & revenue). I don't think AMD can afford to fab K10 at volume until yields are good enough to replace K8 revenue.

I believe Brisbanes are currently yielding very well (judging by anecdotal reports of overclocks on low-end Brisbanes). The reason AMD fabs all of their higher speed CPUs at 90nm is because Fab 30 has to be used for something. Might as well use it for the lowest volume (highest speed) parts so that the smaller Brisbane dies are reserved for the true volume play (lower speed with the OEMs).

bk said...

Another great article. Keep up the good work. A couple of questions if I may.

First, isn't it possible that at the time of the November 30th demo the next batch of B0 chips were already several weeks along?

Second, can you explain if/how software simulation is used to test the designs? It seems like most bugs could be caught this way. Maybe they are but a few slip through.

Scientia from AMDZone said...

axel

True, but with a 30% drop in volume in Q1 they should have enough spare capacity to cover the server and FX markets at least. And, FAB 36's capacity is still ramping. And, as I mentioned, the K8 yields were the worst that AMD has ever had (about half of K7 at the time).

Another reason why the 90nm parts have been made at FAB 30 is because the 90nm process is more mature. As the 65nm process catches up this difference will disappear. You'll see the same thing happen when AMD begins making 45nm; their 65nm chips will still clock higher at first.

Scientia from AMDZone said...

bk

"First, isn't it possible that at the time of the November 30th demo the next batch of B0 chips where already several weeks along?"

It's possible.

"Second, can you explain if/how software simulation is used to test the designs?"

As far as I know, they use the same system that is used for design to do simulation. You just run off some sample circuits and carefully test them for speed and power draw and then put this measured data back into the logical simulation.

"It seems like most bugs could be caught this way. Maybe they are but a few slip through."

I think the software primarily helps by using canned circuit templates. Then I think it is primarily a laborious task of trying to hit every circuit with every possible condition to check for proper operation. However, I think you can also get errors when you create the masks.

enumae said...

Scientia
Interestingly, this is only 2.5% faster than Intel's expected 3.33GHz speed for Penryn.

If you have seen an Intel road map with clock speeds, would you please post a link to it?

If not, could you please explain how you (maybe it's just me) have come to believe that 3.33GHz is Penryn's expected speed, or as I read your post, Intel's top speed?

The only thing I have seen was on VR-Zone, and this was in October of last year, yet they have about the same clock speeds for AMD as you have stated.

The only statement from Intel was greater than 3GHz.

Is it possible that Intel will hit the speeds listed as well?

Is there something that would explain why Intel would not be able to hit those speeds?

Thanks

bk said...

Scientia,

I recently read about a Pentium chip being emulated with a FPGA. Searching Google I found this link:

http://portal.acm.org/citation.cfm?id=1216927

but you need a user account to view. I did manage to find a link here to the PDF:

http://www.eecg.toronto.edu/~yiannac/docs/fpga07.pdf

Do the newer processors have too many transistors to emulate? With my vague understanding, it would appear that using FPGAs would greatly speed up the testing cycle, assuming it would be possible to emulate the leading edge processors.

Fujiyama said...

This explanation looks very reasonable, but it doesn't fix AMD's problem with a poor product mix today.
Despite seeing Conroe benchmarks more than a year ago, AMD didn't ship any product to compete with Intel. The key question is: what was the plan for Q3 2006-Q3 2007? Survive?

Ahmar Abbasi said...

In other news.......

http://www.tomshardware.com/2007/06/12/vigor_force_recon_qx4/index.html

AMD's 4x4 with dual FX-74 space heaters gets fragged in and out by a Dell XPS720 equipped with a QX6800.

Games,
http://tinyurl.com/2myrs2

http://tinyurl.com/3xaszn

Fragged by 67%

Audio,
http://tinyurl.com/2r88pc

Fragged by average 57%

Video,
http://tinyurl.com/2jukud

Fragged by average 57%

Their conclusion,
"Vigor Gaming's Force Recon QX4 won a single benchmark, PC Mark 2005's hard drive test."

Hardly a relevant benchmark and it was a very close call. On average the quadfather gets a whooping of 54% in all the benchmarks collectively.....

Which excuse will it be today.....

A) improper benchmarks.....
B) NUMA-aware OS (scientia's favorite)......
C) improper system config......
D) old arch versus new arch.......
E) intel paid pumper site.....

or

F) just delete the post and pretend like this never happened.....

savantu said...

You say Conroe A1 had bugs and couldn't clock.

I suggest you visit XtremeSystems.org and see how many members still have the original A1s which clocked like mad to 3.5-4.5GHz before Conroe was released.

If it had any bugs, they weren't apparent. Nobody to this day has said they experienced problems with it.

So, your story with water cooling is pulled out of your bottom.

abinstein said...

"Do the newer processors have too many transistors to emulate?"

Yes. Not only too many transistors, but also too much complexity. The original Pentium is kid's play compared to a Pentium 4 or Core 2 Duo.


"it would appear that using FPGA's would greatly speed up the testing cycle."

Yes. However, it would account only for the RTL design, not the physical placement (which seems fully automated in their case). A microprocessor's clock rate is largely determined by the physical placement of the parts, which in turn is determined by trade-offs made in the RTL design.

abinstein said...

13ringinheat -
"F) just delete the post and pretend like this never happened....."

Happened or not, your comment is totally off topic and doesn't even belong in this article in the first place. Go spread your news somewhere else, because you are literally spamming by posting it here.

Aussie FX said...

Oh no! Savantu has found this site.
How long until Brentpresley tags along behind him?
The Intel trolls are at it again.

Back to business...
I heard that some people had early samples of Barcelona a few months ago and they were performing 10-20% faster clock for clock than Yorkfield at the time. The user stressed that it was an early ES and that there was more (performance) to come. To my way of thinking, this would indicate that AMD are well and truly on track.

Ho Ho said...

bk
"Do the newer processors have too many transistors to emulate?"

Yes, one dual-core Penryn die has around 127x more transistors than that Pentium. No FPGA has nearly as many transistors, and they have vastly different resources available. Mostly that means more logic and fewer caches.

If my memory serves me well, FPGAs need several times more transistors to make one programmable. I don't remember the exact numbers, unfortunately, but it would still put a theoretical FPGA capable of simulating Core 2/Penryn in the same transistor count range as a dual-core Itanium 2 with 16M L2, or close to 2 billion transistors.
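A rough back-of-envelope version of that estimate in Python (all of these figures are assumptions: ~3.2 million transistors for the original Pentium, ~410 million for a dual-core Penryn die, and a guessed ~5x FPGA overhead per emulated transistor):

pentium_transistors = 3.2e6          # original Pentium (assumed)
penryn_dualcore_transistors = 410e6  # dual-core Penryn die (assumed)
fpga_overhead = 5                    # assumed programmability overhead factor

ratio = penryn_dualcore_transistors / pentium_transistors
fpga_budget = penryn_dualcore_transistors * fpga_overhead

print(f"Penryn/Pentium transistor ratio: ~{ratio:.0f}x")         # ~128x
print(f"FPGA transistor budget: ~{fpga_budget / 1e9:.1f} billion")  # ~2.0 billion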

Aussie FX said...

I quoted above that a person has an early Barcelona ES and that it was around 10-20% faster than Yorkfield. I mistakenly said Yorkfield, it was actually a Kentsfield he was comparing it to.

Unknown said...


Back to business...
I heard that some people had early samples of Barcelona a few months ago and they were performing 10-20% faster clock for clock than Yorkfield at the time. The user stressed that it was an early ES and that there was more (performance) to come. To my way of thinking, this would indicate that AMD are well and truly on track.


Do you have even a modicum of evidence to back these claims up?

AndyW35 said...

The analysis of the performance match to Penryn is pretty close, I would guess. The problem AMD has is that a lot of folks will still remember that 40% claim and think that AMD has underachieved (with the HD 2900XT being fresh in their memory as well). I am not sure how AMD is going to correct this; people still think AMD will be late but will blow Intel out of the water when it does arrive.

Aussie FX said...

hoho and giant.
Actually I do have proof.
S7 from xtremesystems has publicly stated that he has one.
He is also an XIP member, so that should answer for his credentials.

Aguia said...

The reason AMD fabs all of their higher speed CPUs at 90nm is because Fab 30 has to be used for something.

I think all Opterons are still 90nm parts. There are no 65nm 2MB L2 cache parts.

AMD's 65nm parts have a smaller die size even compared to Intel's 65nm parts, so they cost less; it's perfectly normal that the 3600+ and 3800+ parts are all 65nm.


And Scientia, if I remember correctly, when AMD released K8 it was:
- The biggest processor die size to date.
- The largest L2 cache to date.
- A new 130nm manufacturing process from IBM (K7 used AMD's own 130nm).

Aussie FX said...

xtremesystems.org, K8L and beyond thread in the AMD section. I'm not going to get it for you, you can find it and post it here if you like.
Oh yes it is very funny that you were 100% correct in your assumption. Where else would you report you had a chip? Ebay maybe!!

Your psychic powers are astounding; it makes me wonder why you don't use them to better effect.

Scientia from AMDZone said...

13ringinheat

I delete your posts because you don't seem to be able to comment without being rude.

As to your current post, it is a complete waste of time. Why are you so delusional that you believe someone here thinks the current dual core FX system is the best gaming machine?

The Quad FX system runs hot and wouldn't outperform C2D hardware. In fact, if it weren't for the possibility of a good K10 quad upgrade then I would say QFX would be a complete waste of time. Phenom FX should finally make QFX what it should have been in the first place.

I'm trying to figure out if you are a troll or if you just have some crazy ideas about what I think.

Scientia from AMDZone said...

savantu

"the original A1s which clocked like mad to 3.5-4.5GHz before Conroe was released."

Unless this was done on air, you don't have a point.

Scientia from AMDZone said...

giant

"Do you have even a modicum of evidence to back these claims up?"

Don't go starting a flamewar over something trivial. You need to understand that outperforming Kentsfield by 10% is nothing. Outperforming Conroe by 10% would be something else entirely.

I know what he is referring to but there wasn't enough testing done to really show how fast K10 is. It's inconclusive just like the Pov-ray and Cinebench scores.

enumae said...

Scientia

I know this isn't servers, but it does show Intel gained Unit Market Share in workstations.

Could this have happened in Servers as well?

Scientia from AMDZone said...

enumae

Yes, AMD could have lost server unit share in Q1 as well. AMD could be losing volume share right now as well.

I don't honestly know how AMD can hold up in Q2 since almost everything good like R650, DTX, and K10 won't be out until Q3. And, AMD won't have a really top line mobile platform until sometime in 2008.

anonymous said...

Scientia, I have a question for you about DTX boards that I'm hoping you can answer.

You continually expound on the new form-factor as a great differentiator for AMD in driving lower-end sales.

But my understanding is that it is simply that: a new MB form factor. There is nothing proprietary about it. If it turns out the new form factor is popular, and is driving board sales, what would prevent board manufacturers from spinning DTX boards for Intel CPUs, with or without Intel's blessing? Is there anything that prevents them from doing so? Again, it is in their best interest to increase revenues, and chances are most board companies are making most of their money from Intel board sales - so why wouldn't they do this? And at that point, how would AMD have any sort of form-factor related cost advantage with systems builders?

Thanks.

Eddie said...

Hello.

I am a fellow blogger, I wanted to tell you that although your approach is interesting, I think it is flawed.

There is a fundamental difference between "Core"'s (Core 2 Duo) and K10's evolution: Core was a set of radical optimizations to the mature P6 architecture; the main challenge there was to iron out the bugs. But in K10's case, the critical issue is the 65nm transition. And the rhythms of process development and architecture debugging are not comparable.

Now, for Barcelona to succeed, AMD must not just succeed with the new K10 core, but also have a first-rate 65nm process; otherwise there just wouldn't be the yields or bin splits to make the product interesting or cost-effective.

Christian H. said...

Yes, AMD could have lost server unit share in Q1 as well. AMD could be losing volume share right now as well.

I don't honestly know how AMD can hold up in Q2 since almost everything good like R650, DTX, and K10 won't be out until Q3. And, AMD won't have a really top line mobile platform until sometime in 2008.


Yes, they remarked that they may lose some server share, but with the WalMart and Toshiba deals they still may have an overall gain.

Since HP is pushing Live Home Server which should release soon, that may account for more home sales.

Also, AlienWare has thrown down the gauntlet for the living room space with the Hangar 18.

And it is said that AMD also has 60% of OEM sales for the HD2400/2600. Also, nVidia hasn't given Intel a license for SLI yet, according to the Inq (grain of salt added).

I think they will do better than most people expect simply because consumers don't know the difference between CPU brands.

They will just buy the cheaper HP, Dell or Toshiba.

I just think they should stop screwing around and drop Agena/Kuma first as that's where the real volume will be.

Especially since there is no HT3 server chipset. That means that no current server boards will support the split planes.

I think that's why some OEMs are miffed about the launch. I honestly expected Broadcom to drop an HT3 server chipset by now.

They can use 790G as a "roadmap."

Scientia from AMDZone said...

dr yield

True, DTX is not proprietary. I guess the question is who would make Intel DTX boards. Are Dell, Gateway, and HP going to start making Intel DTX motherboards? Probably not after they've put so much effort into creating Intel BTX boards. Would they really go to DTX without support from Intel?

I suppose the standard motherboard makers like ASUS could but this would knock them out of any commercial sales unless Intel creates a stable image. Still, it is possible.

Correct me if I'm wrong but I was rather under the impression that the motherboard companies wouldn't build new motherboards unless they had a ready market for orders. If we leave out Dell, HP, and Gateway this would have to be a smaller integrator that was still using mini-ATX cases. Interest from someone like Tiger Direct or Lenovo would probably be sufficient. I don't know if eMachines is still independent from Gateway's supply chain.

Scientia from AMDZone said...

dr yield

BTW, have you compared Intel's mini-ITX board with an AMD mini-DTX board? There is no comparison at all. For a similar board cost you get considerably more with DTX. You get twice the memory bandwidth, twice the memory capacity, PCI-e, plus dual core.

This compares to a single core only Celeron Intel board with only a single memory slot and no PCI-e.

With similar board and case cost the main differences would obviously be two memory sticks instead of one and a little higher cost for the dual core AMD chip. But look at the benefit. The mini-DTX board is adequate as a low end desktop unit rather than just a budget Celeron class box. However, you could put a single core Sempron chip in this board and one memory stick and sell it as a budget box too. Tell me that's not a benefit.

anonymous said...

scientia wrote:
True, DTX is not proprietary. I guess the question is who would make Intel DTX boards. Are Dell, Gateway, and HP going to start making Intel DTX motherboards? Probably not after they've put so much effort into creating Intel BTX boards. Would they really go to DTX without support from Intel?


Rephrase the question: If Dell, Gateway, and HP can generate better margins on AMD systems that utilize a new form factor, wouldn't they want the same on their volume-leading (Intel) platforms?

Remember, the large OEMs aren't making motherboards- they are sourcing them. Most Intel PCs from the big OEMs use Intel-designed boards. But if they can extract more margin, they will likely go elsewhere (Asus/Epox/Gigabyte/or any of the other Taiwanese companies that make boards) for boards that enable them to do so. Why wouldn't they?

I suppose the standard motherboard makers like ASUS could but this would knock them out of any commercial sales unless Intel creates a stable image. Still, it is possible.

Why does Intel have to create a stable image? Board layout and bios tweaks come from the board manufacturer.

Correct me if I'm wrong but I was rather under the impression that the motherboard companies wouldn't build new motherboards unless they had a ready market for orders.

And again, if DTX is in the least bit as successful as you think it will be, why wouldn't all the big PC vendors demand the same for Intel? Margin is margin, and customer pull could create demand in a very short time - if the benefits are there. And if they aren't, is AMD really benefiting much from the form factor, or are they further segmenting their own market, reducing economies of scale for their platforms?

Serious questions- I don't see this as a core differentiator yet-but you can still convince me. At best, it is a stunning success, and Intel DTX boards are available within 6 months due to OEM pull. At worst, board manufacturers have to deal with lower volumes on multiple AMD platforms, and pass the costs along.

enumae said...

Scientia
For a similar board cost you get considerably more with DTX.

Aside from more memory bandwidth, what does mini-DTX have that this mini-ITX board does not?

Scientia from AMDZone said...

eddie

"There is a fundamental difference between "Core"'s (Core 2 Duo) and K10's evolution: Core was a set of radical optimizations to the mature P6 architecture; the main challenge there was to iron out the bugs;"

I'm not sure how this would be a difference. There are as many differences between K7 and K8 as between PIII and Banias. There are as many differences between K8 and K10 as between Banias and Conroe. The scale of change is the same.

"but in K10's case, the critical issue is the 65nm transition."

Again, I'm not sure what the difference would be. Intel first used 65nm with Yonah and the die shrink of Smithfield. AMD has already produced Brisbane with 65nm. There is some suggestion that AMD is using a newer transistor with K10. I suppose this could make a difference.

"Now, for Barcelona to succeed, AMD should not just succeed at the new K10 core, but also to have the greatest 65nm process, otherwise there just wouldn't be the yields or binsplits to make the product interesting or cost-effective"

AMD's 65nm yields are fine. And, the highest bin should have no trouble hitting 2.3Ghz at launch. Admittedly, there is some question of whether AMD can keep up with 65nm when Intel begins producing 45nm.

BTW. You say:

"it can not be said that the market was lacking in support of AMD technologies or initiatives"

This is clearly false. In 2008, AMD will have (for the very first time) a chipset which was designed from the ground up for mobile. This is something that AMD would not have if it hadn't bought ATI and it is something that AMD desperately needs to compete with Centrino.

"there only remained the issue of capacity expansion itself that AMD was attending to by securing an excellent offer by the State of New York to build a fab there which included more than 1 giga dollars in bonuses."

I don't see the benefit of skipping the ATI purchase and pursuing the NY FAB. Specifically, how would this have prevented the Q4 and Q1 losses? The NY FAB wouldn't be any benefit until 2010 and AMD is going to need a good mobile chipset to gain any marketshare in that segment. AMD will get benefit from ATI in late 2007 and increasing benefit all through 2008 and 2009. This benefit should be sufficient to offset the loss of cash from the ATI purchase.

Scientia from AMDZone said...

enumae

That is a good find. The board isn't quite comparable since it is embedded but I see your point. If this board has two DIMM sockets plus PCI-e X 16 then why couldn't you put these on a proper mini-ITX desktop board? Yes, I would have to agree that it looks like a competitive mini-ITX board could be made.

This board uses an Intel chipset so I'm wondering why Intel chose a third party chipset for its latest mini-ITX (and why it couldn't fit two DIMM sockets or PCI-e). Does the 945 chipset have poor graphics?

Now, I have to wonder too why the trades seemed so confident that mini-DTX would become the dominant SFF standard.

enumae said...

Scientia
This board uses an Intel chipset so I'm wondering why Intel chose a third party chipset for its latest mini-ITX (and why it couldn't fit two DIMM sockets or PCI-e).

Emerging markets. Most likely they will be very cheap with limited functionality, running basic Vista or XP.

Does the 945 chipset have poor graphics?

Well for what they are trying to make available at a certain price point it could be overkill.

SIS got the order from Intel. I have no idea how this chipset would compare to Intel's; if you could elaborate it would be appreciated.

Thanks.

Anonymous said...

I quoted above that a person has an early Barcelona ES and that it was around 10-20% faster than Yorkfield.

Excuse me, he said 10% at most over Kentsfield. Nowhere did he put 20%. Stop twisting words around?

http://www.xtremesystems.org/forums/showthread.php?t=128474&page=16

ES AM2 quads are showing approximately a 10% performance advantage over Intel's offerings at the same clockspeed.

But I have a hard time believing this, even if it's from XS.

lex said...

Lots of words, but some serious errors that, if unreconciled, lead to an erroneous conclusion.

Your assumption of 10 weeks to run a 65nm lot from start to finish is horrendous. From what I understand, TSMC and others can run a shuttle far faster. Thus any conclusion predicated on a 10-week TPT is busted.

a) If it is 10 weeks, then AMD has some serious fab and assembly processing issues. I find it hard to believe that AMD takes 10 weeks to run what is its most important product. If it does take 10 weeks, that sums up the manufacturing capability and it's no wonder that AMD's 65nm and manufacturing are broken.

b) Or you are in error and AMD can match TSMC's processing. If that is the case then your assumption about the number of steppings is seriously flawed, or AMD's ability to debug the part is taking far longer than anyone could assume. If debug and re-design fixes are where the gap is, we can deduce that Barcelona has some serious fundamental architecture issues or that the complexity of BIOS and other debug was unanticipated. This would make me suspect a longer delay and no accurate forecast, as a top-end quad-core has more rigorous quality needs than the consumer space. Or, if the debug rate is also fast, then fast TPT plus more debug than expected would lead to the conclusion of many more all-layer and metal-layer steppings. That in itself leads to the conclusion that they are taking far more respins and that the basic health and speed conclusion you come to is also not accurate. Somebody somewhere completely missed here: design, fab, or debug/test.

I agree low process yield, in the sense of limiting a launch, isn't an issue. We all know how these go. AMD and INTEL have done paper launches in the past. Others have speculated that AMD is very healthy, keeping everything quiet, and will unleash a surprise to minimize the response time from INTEL. That is also silly, as any real response from AMD or INTEL in this business takes quarters if not years; look at Opteron/64-bit, dual-core, and Core 2 as examples. A hasty and contrived response is easily seen in this environment.

Thus, the only conclusion is that we have logic/function bugs, speed, test, and/or stability issues that are far above and beyond expectation. Every large chip design has a best-case and an expected schedule. I'm sure AMD's public announcements would target a best-case schedule. My conclusion is that that time has come and gone, and the total lack of any evidence of good things or of an imminent launch means there are serious surprises that AMD didn't expect. They could fix it tomorrow, but most likely it will take much longer. In their current business situation they simply can't afford to fill their fab with material that may end up being garbage. They don't have the money or the free capacity to fill their fab with a speculative fix. INTEL on the other hand, with their new-found execution, seems very happy to continue to show their hand and put pressure on AMD. AMD isn't helping itself by hiding while we see Penryn everywhere, and I'm sure Nehalem demos will come well before production too. If you've got it, flaunt it, I say.

Maybe AMD will get lucky on their next stepping. In this multi-billion dollar business it's nice to be lucky, but it's better to be good and not need luck. INTEL has the luxury of being able to be unlucky and survive. AMD doesn't have any such luxury. They were lucky, so to speak, for about 4 years as INTEL had both strategic and execution lapses. Those days are gone. In the big-time CPU business the odds are long and stacked against anyone competing with INTEL! AMD needs perfect execution, brilliant design, a perfect silicon-design marriage, and aligned infrastructure. Sorry, that much alignment happens once in a millennium and it has passed them by.

My hypothesis is that AMD has run far more steppings and is not as close to a fix as you believe. Sure, the fix could come on their next C stepping, but again it could be elusive or require many units and many cycles.

Oh, and what is this about only enthusiasts caring about performance, and AMD's silence on performance not hurting confidence?

Today the perception is that INTEL owns performance. It's true for almost but not ALL segments, yet perception swings a lot of things.

Today AMD is again perceived as offering "value" and they reinforce that image with their hefty price cuts and talk of value. INTEL ads, on the other hand, say something else. Sure, the Dells, HPs and Lenovos of the world buy a lot more commodity chips than high end, but they too know the impact of perception. Buying a C2 leverages all of that performance perception for free. If a consumer sees an AMD chip at the same price, what is he thinking... will he think about the value proposition of the AMD chip or the performance-at-the-same-price angle of the INTEL chip? Do you think that drives the OEMs to make sure they get the most pricing out of AMD... that is a NO brainer. Not having the performance chip derates your pricing advantage across the whole damn stack. Not having Barcelona now is not that bad; NOT having Barcelona AND NOT having any performance perception is downright horrendous. Nobody would put themselves in this position by choice. They are there simply because things are in a horrible situation.

Ah, the Go-Go's... that has a far different memory for me. When I was in college we called some of those freshman girls Go-Go's because their lips were sealed. LOL, fond memories of some uptight girls!

core2dude said...

If I am not mistaken, the turnaround time for full-run wafers is more like 4 weeks rather than 10 weeks (however, it has been a very long time since I last looked at a process, and hence it could be more). But a full run means a full stepping--A, B, C, etc. B0, B1, B2 are revisions of the same stepping, which typically indicate only a metal-layer change. In that case, they only have to rework the metal layers of processed silicon wafers per the new masks. Typically they have quite a few wafers stopped at every metal layer. So depending on which metal layers are buggy, they choose the appropriate wafers and flow them using the new ML masks.

So, the real problem is typically not the fab run, but it is the time it takes to figure out what is wrong with the silicon. Especially between the full runs, they have to ensure that they have worked out most of the bugs.

AMD's Barcelona situation is closer to Intel's Prescott than to Core 2. Core 2 appeared on a very mature process (Intel does not tweak a mature process much). However, Barcelona is coming on a (relatively) new process--something that Intel did with Prescott. Prescott was released on the C1 stepping in low volume, and on C3 in high volume.

If Barcelona is not clocking high, that could also mean that AMD needs to debug some more speed paths. Speed paths many times require a silicon change (not just an ML change). In that case, to achieve stable 2.3+ GHz speeds, AMD may have to go to a C or D step. That is going to be expensive and time consuming....

abinstein said...

"Today AMD is again percieved as offering "value" and they re-enforce that image with their hefty price cuts"

I don't think you understand the dynamics of the market. AMD has always been seen as offering the value systems, even when K8 was outperforming P4 in enthusiasts' eyes. The end users are most affected by marketing and nothing else. Intel owns marketing.

If you look at history, since the days of K6 AMD and Intel have been exchanging performance titles. It is only lately (since K8) that AMD has had a foothold in mobile and servers. However, the exchange of titles between the two companies continues. The question is not whether AMD has it now, but whether AMD can get it back 3 or 6 months from now. Some believe it can, some (apparently including you) don't. I'd just wait and see.

However, one thing is clear to me that AMD still owns floating-point performance at 2P or up.

abinstein said...

"So, the real problem is typically not the fab run, but it is the time it takes to figure out what is wrong with the silicon."

To figure out the problem, you tape out a revision and wait for it to come back, test it and find any bugs, modify and tape in, simulate and tape out the next revision. I'd be really surprised to find that it takes less than 2 months to go through this process. A project that I worked on had about a 2-month turn-around time and we were considered best-in-class in the industry. Needless to say, that chip was simpler than Barcelona.

Pop Catalin Sever said...

"AMD loses market share in x86 workstation segment faster than expected"

http://www.tgdaily.com/content/view/32460/118/

It seems the market dynamics (that non-AMD supporters don't seem to understand) value the products that are on the market more than promises of upgradeability and the like.

This in my opinion is terrible news, taking into consideration the price slashing AMD has done to hold onto its market share. It seems AMD doesn't have the trust of corporate clients that it can deliver on its promises (smart guys).

I think the next two quarters will be awful for AMD until it starts shipping K10 in volume.

I really do think that K10 can save AMD or at least stop the free fall, but until K10 arrives, and I don't mean only Barcelona but also Agena and Kuma, AMD will continue to lose market share and money.

I really think Jerry Sanders should have remained CEO of AMD. I think his decisions were more performance-focused and that is the reason AMD managed to arrive where it did in 2005 in the server market.

abinstein said...

"It seems the market dynamics )that non AMD supporters don't seem to understand)"

The "dynamics" is clear, that biggest gain of Intel's workstation share happens in Q2 this year, some 6 months after Core 2 release, a time when most buyers just started to be affected by Intel's marketing.


"I really think Jerry Sanders should have remained CEO of AMD, i think his decisions were more performance focused

The old AMD has no chance to compete with Intel. It had a mindset of military contracts, completely not suitable for making consumer products. AMD played the role of Intel resistance for some 10+ years before 2003 and it couldn't even gain 10% market share from Intel.

provoking said...

IMO AMD became the victim of a misinterpretation: that the success of K8 made them a recognized CPU manufacturer.
It did not. In the public perception, Intel is and remains the prototypical CPU manufacturer. Everybody knows a PC needs a CPU and everybody knows Intel makes CPUs -> for a PC, you need Intel.
Obviously wrong, but people don't know or care. AMD needs a _massive_ edge over Intel technology-wise or they make losses, as Intel can easily afford a price war.
The good results of MCW even made P4 sales rise because "Intel were good again". There is no room for further differentiation apart from Intel <-> AMD. Because of this prototypical status Intel does not even need to be in the lead to sell more. For them it is enough not to trail too far behind. Logically, for AMD the opposite is true. This seems to have slipped the minds of AMD's leaders.

Scientia from AMDZone said...

I'm sorry but these characterizations of AMD are completely wrong. Similar surprises have happened over and over to both AMD and Intel.

AMD's first surprise was when the totally in-house designed K5 performed worse than they expected in spite of its leading edge RISC core. This wasn't really fixed until AMD acquired K6 with the Nexgen purchase.

AMD's next surprise was when Intel left socket 7 and moved to the proprietary Slot 1. Pushing socket 7 higher with K6 was only an interim measure. AMD didn't get a true fix until the Alpha-derived Slot A with K7.

Intel's surprise was 3DNow! Intel responded with SSE.

Intel's surprise was K7's ability to clock, partly due to AMD's more advanced copper process. Intel's attempts to clock the PIII higher and release Willamette were not really effective. The fix was Northwood.

Intel was surprised at the low competitiveness of RDRAM which they believed would be the next memory standard. The fix was a DDR2 chipset for P4.

AMD's surprise was Northwood on the brand new Intel 130nm process; however, AMD's effective response with K8 was slowed because their second surprise was Intel's doubling of the FSB speed on Northwood from 400MHz to 800MHz, which left socket 754 outclassed. It took AMD until mid-2004 to respond on the desktop with socket 939.

Intel's surprise was the power of IMC and HT for Opteron. Intel first responded by pumping up the FSB speed on the lagging Xeon but continued to fall behind until 2006.

Intel's surprise was the heat and power draw of Prescott on 90nm when previously they had always reduced power draw on a smaller process. This fact completely derailed the plans for a 5GHz Tejas and a 7GHz Nehalem. However, they responded by replacing the weak Celeron (which was getting trounced by K7) with the Prescott-cored (lower clock) Celeron D and then releasing BTX for the P4D.

Intel's surprise was dual core. Intel responded weakly with Smithfield and then the marginally improved Presler. However, Intel's real fix for X2 was Tulsa and C2D.

AMD's surprise was the much more powerful SSE performance in C2D along with the improved integer IPC.

AMD's surprise was the MCM quad core Kentsfield and Clovertown, which seemed to work much better than the MCM dual-core Smithfield and Presler.

AMD's surprise was when Intel copied their 2002-vintage dual FSB design to remove the memory bottleneck on dual socket Clovertown.

Intel's only real surprise with C2D was the lower performance of FBDIMM, which had been expected to quickly become the server standard.

I've tried to think of any way that AMD could have bumped the SSE performance on K8 with Revision F. I haven't yet thought of any way to do this without overhauling the cache buses, and that might have been impossible in Rev F. AMD probably could have bumped the integer performance a little by Fast Pathing a few more instructions.

I've also tried to figure out if AMD could have released its own MCM quad core with Rev G. The dual memory controller was definitely a problem. Adding a second die without overhauling the memory controller would only have gained about 30% with 4 threads. However, this would have nearly doubled power draw.

I've thought about this several times but it took AMD 4 years to get from K5 to K7 and 4 years from K7 to K8. K10 is 4 years after K8. The only way I can see that AMD could have developed K10 any faster would have been if they had come up with the idea of a modular core sooner.

lex said...

Surprise surprise

When you are behind by a technology node it's hard to surprise a competitor that is awake and working hard. You have 1/2 the density and your transistors are 20-30 percent slower.

You are very unlikely to surprise anyone but your own board with a crappy year.

Pop Catalin Sever said...

For the first time in history AMD will manage to surprise Intel by having no surprises (or should I say no good surprises from AMD).

Fujiyama said...

Revision F should have had a larger L2 or an L3 cache for both cores to reduce the distance between C2D and X2.
That would have been easy to redesign, test and produce. An Athlon with a bigger cache would have helped with higher ASPs and a much more competitive position in 2007.

I upgraded my computer with an S939 X2 yesterday. I'm very happy, but it took 3 weeks to import this CPU from the USA.

Aussie FX said...

Scientia, I am a helicopter pilot by trade, so I have no knowledge of CPU architecture. However I do like to read about it and am trying to learn the basics, albeit slowly. As you know I have posted about S7's K10. Your reply is that 10% over kentsfield is nothing and if it had been 10% over conroe that would be another matter.
Can you explain the difference for me? I thought kentsfield was glued together conroes and therefore should be faster/more powerful. Obviously not, but why?

BTW I love these informative blogs, even if a lot of it is over my head. Thanks.

Ho Ho said...

aussie fx
"I thought kentsfield was glued together conroes and therefore should be faster/more powerful Obviously not, but why?"

Conroe: dualcore
Kentsfield: two dualcores in one socket, total of 4 cores
Penryn: an upgraded Conroe, still dualcore
Yorkfield: two penryns in one socket, total of 4 cores


Penryn is actually a mobile chip codename, but it is accepted as a unifying name for the to-be-released upgraded 45nm Core 2 architecture.

enumae said...

AussieFX
Your reply is that 10% over kentsfield is nothing and if it had been 10% over conroe that would be another matter.

I think a way to explain what he is saying is to look at the scaling of Core 2 Duo, vs the scaling of K8.

K8 scales nearly 100%, Core 2 duo scales about 70% (applications may vary).

If K10 is only 10% faster than a processor design that only scales about 70%, while it scales at about 100% any potential advantage at the dual core is gone.

But, if your dual core is 10% faster, you could potentially be about 30% (100% - 70% = 30%) faster with quad cores due to better scaling and an additional 10% to 20% for processor design.

Real world results could/would differ, but the potential performance could be there.
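A slightly more careful way to combine those same illustrative numbers (these are the assumed figures from this comment, not measurements), sketched in Python:

k10_quad_over_dual = 2.00   # assumed ~100% dual-to-quad scaling for K10
c2_quad_over_dual = 1.70    # assumed ~70% dual-to-quad scaling for Core 2

for per_core_advantage in (1.10, 1.20):
    quad_gap = per_core_advantage * k10_quad_over_dual / c2_quad_over_dual
    print(f"per-core +{(per_core_advantage - 1) * 100:.0f}% "
          f"-> quad-core +{(quad_gap - 1) * 100:.0f}%")

# per-core +10% -> quad-core +29%
# per-core +20% -> quad-core +41%

In other words, roughly a 30-40% potential advantage at the quad-core level under those assumptions.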

-------------------------------

Scientia, if what I posted was wrong please delete it.

Thanks

abinstein said...

"If K10 is only 10% faster than a processor design that only scales about 70%, while it scales at about 100% any potential advantage at the dual core is gone."

Exactly. This is why I believe AMD only talks about performance advantage on quad-core. With respect to integer performance, K10 on dual-core is likely to be slower (albeit slightly) than Core 2 Duo.

It's another story for floating-point, though.

Scientia from AMDZone said...

fujiyama

You seem to be confusing AMD with Intel. A larger L2 cache wouldn't have done any good on Rev F (or G). You need to understand that Intel doubled its cache bus bandwidth with C2D. There really was no way for AMD to match this without overhauling its own cache buses (which it did with K10).

Additional cache might still benefit Penryn if memory bandwidth gets tight with quad core, but remember that MCM memory access is not as optimal as it is with native quad core. I don't see cache size as being an issue for AMD unless it does release an MCM octal core. With octal core it would be in the same boat as Intel and more cache could make a big difference.

aussie fx

Abinstein has a very good scaling comparison on his blog. Basically, Clovertown is not as efficient as Woodcrest or Conroe and runs slower. What a lot of the disgruntled Intel enthusiasts have forgotten is that AMD has never claimed that K10 is 50% faster than Woodcrest or Conroe at the same clock, only Clovertown. This could translate into a speed of only 13% faster than Conroe. See the difference? 10% faster than Clovertown could be 17% slower than Conroe. And, BTW, 13% faster than Conroe might be only 4% faster than Penryn. That's why you have to pay attention to what is being compared.

Anyway, the numbers on Xtreme are for an early ES chip and these have to run BIOS patches to keep from crashing. The real production chips should be faster.

The big problem is that you never know just what is being patched and just how much performance it is stealing.

We should have more of this inconclusive ES testing soon at Anandtech.

Scientia from AMDZone said...

lex

"When you are behind by a technology node its hard to suprise a competitor that is awake and working hard. You have 1/2 the density and are 20-30 percent slower for transistor."

Are you claiming that AMD's 65nm chips are 20-30% slower and half the density of Intel's 65nm chips? You can't be talking about 45nm since these aren't in production yet and won't be until Q4.

If you are looking into the future and talking about 45nm then what about early 2009? With both AMD and Intel on 300mm wafers and 45nm processes and Intel moving to native quad Nehalem won't Intel lose their wafer, process, and MCM cost advantages?

Scientia from AMDZone said...

pop catalin

"For the first time in history AMD will manage to surprise Intel by having no surprises. (or should I say no good surprises form AMD)"

I'm not sure what you are trying to say. Obviously K5 was a bad surprise for AMD and Prescott was a bad surprise for Intel.

You really aren't thinking about this. Intel felt that it had made a mistake by making PII dual socket capable because this took sales away from Pentium Pro. Since that time Intel has had strict separation between single socket desktop and dual socket server chips.

However, AMD has now tossed in the dual-socket QFX platform and Intel seems to be at a loss as to how to respond. So far, Intel seems to be adhering to its policy of Xeon server only for dual-socket chips while trying to push dual and quad core on the desktop. It will be interesting to see if this continues. It hasn't been necessary so far since Intel has Kentsfield, but once Phenom FX is released, who knows? Intel might consider QFX to be too small of an influence to have to counter, or it might have to create its own dual-socket offering.

I would say Nehalem will be the last potential surprise since AMD's and Intel's architectures seem to be converging again. However, even Nehalem will be less of a factor. With both AMD and Intel moving to shorter upgrade cycles, any surprise can be countered more quickly than in the past, making long periods of leading the competition impossible.

Ho Ho said...

scientia
"However, AMD has now tossed in the dual socket QFX platform and Intel seems to be at a loss as to how to respond"

I take it you haven't heard about V8? So far it seems to lack a dual-GPU solution, but I bet that by Christmas things will be different.

Scientia from AMDZone said...

ho ho

Yes, V8 is a good question. AMD differentiates between Opteron and FX and supports this with a new motherboard standard. As far as I can tell, Intel is just using two Xeons. I'm not sure if they are making any effort to differentiate this from their server line.

Aussie FX said...

Thank you, enumae, abinstein and scientia for your explanations/input. Now I understand the difference.

It is nice not being treated like a twit for asking questions, unlike at a lot of other sites out there.

goober said...

This question may be slightly off topic, but does impact the current race.

Is the capability to upgrade processors without board replacement important in the server world? Are IT managers currently replacing Intel dual cores with quad cores? Will they do the same for AMD when they can? It would seem this would save a lot of money. It also appears, from what I hear about power consumption, that AMD and Intel had this as a goal.

If this (upgrade) actually occurs in significant quantities, does it annoy the MB manufacturers? Does that impact business relationships with the processor manufacturers?

This type of upgrade is important for me as a desktop user, but I doubt it is of any significance in the overall desktop market (in spite of the point of view that many enthusiasts have).

enumae said...

Scientia
AMD differentiates between Opteron and FX and they support this with a new motherboard standard. As far as I can tell, Intel is just using two Xeon's.

Well considering that the QFX platform can use Opteron chips, how much have they really separated them?

To me it would seem that the FX series is just relabeled Opterons and that there is no real separation except SLI and DDR2, both of which were originally done by Nvidia, and could also be done for Intel by Nvidia, right?

abinstein said...

"Is the cabability to upgrade processors without board replacement important in the server world?"

For servers, yes, especially for datacenters and blades. For workstations, no; the two or three engineering offices I worked in didn't bother to upgrade just the processors. Also, 95% or more of desktop users don't know/care about processor upgrades.

Upgrading Core 2 Duo to Core 2 Quad doesn't make much sense unless one also upgrades the FSB & chipset (e.g. from 800MHz to 1333MHz). In terms of processing throughput (I'm talking about servers here), Core 2 Duo up to 2GHz will be happy with an 800MHz FSB; up to 2.67GHz with a 1066MHz FSB. Core 2 Quad of 2.33GHz and up can only be satisfied by 1333MHz, and even so it only gives a 60-70% performance benefit.
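
For reference, a quick sketch of the raw FSB bandwidth behind those pairings (the clock-to-FSB matchings above are abinstein's rule of thumb; the numbers below are just the bus arithmetic):

    # Intel FSB peak bandwidth: 64-bit (8-byte) wide bus, quoted at its effective transfer rate.
    for fsb_mt_s in (800, 1066, 1333):
        print(fsb_mt_s, "MT/s ->", fsb_mt_s * 8 / 1000, "GB/s")   # 6.4, 8.5, 10.7 GB/s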

Aguia said...

To me it would seem that the FX series is just relabeled Opterons and that there is no real separation except SLI and DDR2, both of which were originally done by Nvidia, and could also be done for Intel by Nvidia, right?

Not SLI but QUAD SLI.
It can't be done for Intel unless Intel processors have two HT links.

By the way, where is the review of the AMD Quad FX platform with 4 graphics cards? Isn't this one of its primary differentiating factors, not just the two-processor capability?

Ho Ho said...

aguia
"Not SLI but QUAD SLI.
Can’t be done for Intel, unless Intel processors have two HT links."


(Quad)SLI doesn't need any HT links, it only uses regular PCIe ones.


"By the way where is the AMD Quad FX platform reviewed with 4 graphics cards?"

No, because Quad SLI doesn't really work and neither does Quad CrossFire. There were a few benchmarks with those 7950GX2's some time ago but they showed awful scaling. At best you had 1.25x more performance than with two 7900GTXes, and mostly they were considerably slower.

Aguia said...

(Quad)SLI doesn't need any HT links, it only uses regular PCIe ones.

Then what’s that connected to the two north bridges:

Quad FX

No because QuadSLI doesn't really work and neither does Quad-Crossfire.

So I can’t use four 7900gs connected together?
Then it’s really a failure.

I don't understand these new technologies these days; they all seem like complete failures: Turbo Memory, ATI/Nvidia/Ageia physics, ... Even with the ones in current use I have doubts about the need: the PCIe/SATA performance boost, the dual channel memory performance boost, faster memory speeds like DDR2 800, PCIe2, SATA2, ...

Only SATA seems OK because the cable is smaller, but the power cable is bigger and weird compared to the old one; I really don't understand that step.

Ho Ho said...

aguia
"Then what's that connected to the two north bridges"

These are the HT links between each NB and the CPU. On the Intel platform this is replaced by the FSB. From the image it seems there is no direct connection between the two NBs. Also, what exactly forbids using PCIe lanes to connect the two? That is what is used to connect the NB and SB on several chipsets.

Also, using two NBs just shows what kind of a hack job this really is. Not to mention they don't really have full x16 speeds for all of the cards.


"So I can't use four 7900gs connected together?"

From what I know, no, you can't. Though I admit I haven't really researched it too much. I know there have been a few benchmarks of the 7950GX2, which has two PCBs with a GPU on each glued together. Put two of those together and you'll have four GPUs. Performance scaling was awful with those, though, and if a game didn't support SLI well you lost huge amounts of performance.

You can still use four GPUs, you simply can't use them to drive a single application. Then again, having eight big monitors attached to your PC is also kind of nice :)


"I don't understand this new technologies this days, all seem a complete failure

I wouldn't say so. Most things do have a reason to exist.


"Turbo memory"

Why is that a bad thing? You know MS says that ReadyBoost can speed up application loads and disk access several times just by using regular USB sticks with 2.5-3MiB/s bandwidth. Turbo Memory has much more bandwidth than that, and besides speeding up disk access it can also reduce power usage since you can stop the drive and only read from the cache.


"ati/nvidia/aegia physics"

Give it time. Ageia seems to have little impact as it costs way too much and therefore has relatively little support from gamers and developers. Also, since there is not much support, developers can't use it for much more than effect physics that doesn't affect gameplay. The same goes for GPU physics, but I personally would like to have my GPUs drawing nice pictures and not waste their performance on physics, as they have awful efficiency at that. Of course some kind of automatic balancing would be nice, so that when FPS drops to 20 some particles are removed to give more resources to regular rendering.


"PCIe/SATA performance boost"

PCIe is needed to have more balanced memory bandwidth and to allow the CPU and GPU to work together more efficiently. SATA has exactly zero performance boost for as long as we don't have 500GiB HDD platter sizes, as old PATA was fast enough for speeds up to around 133MiB/s. Of course I love having loads of disks in my box without all those huge cables everywhere.


"Dual channel memory performance boost"

It didn't use to make much difference; at most 4-10% in most games, for example. The reason is that most applications don't need much memory bandwidth and are more affected by latency. Though with quad cores and more CPU power in one socket, combined with a higher FSB on Intel, this might be a bit different in the future. Why a faster FSB? Because currently it doesn't make much sense to combine fast dual-channel RAM with a 1066MHz FSB, as the latter will be the bottleneck in throughput. On AMD this bottleneck doesn't exist.


"faster memory speeds like DDR2 800"

Same as before, plus a faster RAM clock also lowers effective latency. DDR2-800 at CAS 3 has much lower latency than DDR1 at CAS 2.
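
A quick sanity check of that latency point, assuming DDR-400 as the DDR1 part (the comment doesn't name a speed grade):

    # CAS latency in nanoseconds = CAS cycles / memory clock (the "800"/"400" ratings are the doubled data rate).
    ddr2_800_cl3 = 3 / 400e6 * 1e9   # DDR2-800 runs a 400MHz clock -> 7.5 ns
    ddr1_400_cl2 = 2 / 200e6 * 1e9   # DDR-400 runs a 200MHz clock -> 10.0 ns
    print(ddr2_800_cl3, ddr1_400_cl2)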

"PCIe2"

This is mostly for eliminating all those extra power cables. It's not that fun to have two 6-pin cables on your video card, or double that for dual-GPU setups.


"SATA2"

Speed has almost nothing to do with that. SATA II brings a whole lot more. The most obvious was having more than a single active request, with request reordering aka native command queuing (not very useful on the desktop but good on servers). There is SATA 3 in the works that should bring 6Gb/s bandwidth. This becomes useful with port multipliers and SSDs. Platter-based HDDs probably won't reach that throughput before becoming obsolete.

As for the power cable I prefer the new one to the old one. Sometimes it was almost impossible to get it out of the drive as it got stuck really easily. I even broke one HDD trying to get the cable out of it. New ones attach nicely and are also easy to remove.



Sorry for the long OT part, just explaining why some things really are useful.

Aguia said...

Also what exactly forbids to use PCIe lanes to connect the two?

Speed. Even ULi connected its AGP northbridge to its PCIe northbridge with HT. I don't know how many pins PCIe has versus HT, but using PCIe must complicate the whole design, and in the end the Intel FSB was going to bottleneck everything since memory access has to go through there too.

You know MS sais that ReadyBoost can speed up application loads and disk access several times just by using regular USB sticks with 2.5-3MiB/s bandwidth.

If just 3MB/s of bandwidth on one USB drive can work marvels, why not use an external HDD, a second HDD, or even system RAM to do that? I can't imagine the speed boost from using one of those.

As for the power cable I prefer the new one to the old one.

I have problems when the power cable is too close to the SATA cable; I have to bend them both because some of the power cables are too big. The cable could be much better; if USB carries data and power, SATA could do that too if they wanted.

Sorry for the long OT part, just explaining why some things really are useful.

Not a problem, it is always a pleasure talking to you, Ho Ho.

And about the dual-channel performance, don't you think it's a little strange that “all” sites never test Intel or AMD processors with just one memory module?
I only found nForce2 and old VIA results.
It would be nice to see one of the new high-end platforms tested with just one channel, just out of curiosity.

Ho Ho said...

aguia
"Speed. Even Uli connected AGP Northbridge to PCIe Northbridge with HT."

I doubt speed has anything to do with that particular case. IIRC you only need 8 PCIe lanes to match the speed of AGP 8x. How fast an HT link was used there?

"I don’t know how many pins PCIe has over HT but using PCIe must complex the all design"

If the talk about CSI, PCIe and HT in the earlier replies is correct, then I see no problem with using PCIe lanes to connect the two, as they should use the same number of lanes, just with some of them clocked at higher frequencies.

There are chipsets with nearly 50 PCIe lanes. That is enough for a similar 2x16 + 2x8 lane setup, so there is no need for two separate NBs. I still wonder why AMD needed to use two; it only complicates things.


"and in the end Intel FSB was going to bottle neck everything since memory access has to go through there too."

So far I haven't seen the FSB become a bottleneck in SLI/CF PCs, even with the slow 1066MHz FSB. If it were with quad SLI, then I think a 1600MHz FSB is doable for V8, perhaps even 2GHz, as there are quite a few chipsets out there capable of such speeds.


"If just 3MB bandwidth on one USB drive can do marvelous why not use an external HDD, second HDD or even the system RAM to do that, I can’t imagine the speed boost by using one of those."

One word: latency. In the time it takes to move the read head to the correct place you could have read over a hundred KiB of data. With solid-state memory like a flash USB stick or Turbo Memory you don't have to seek at all and can read data at a constant rate without the CPU waiting tens of milliseconds for another batch of data.

Modern HDDs have theoretical data rates well over 80MiB/s, but in the real world it is awfully difficult to get over 30MiB/s, even with excellent filesystems like Reiser4 or XFS. When you have to seek after every few KiB, that measly 3MiB/s of constant data rate suddenly becomes very fast.
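
A rough illustration of why seeks dominate (the seek time, request size, and sequential rate below are assumed ballpark figures, not measurements):

    # Effective throughput when every small read requires a fresh seek.
    seek_ms, read_kib, seq_mib_s = 10.0, 64, 80      # assumed: 10 ms average seek, 64 KiB per request, 80 MiB/s sequential
    transfer_ms = read_kib / 1024 / seq_mib_s * 1000 # ~0.8 ms to actually move the data
    effective = (read_kib / 1024) / ((seek_ms + transfer_ms) / 1000)
    print(effective)                                 # ~5.8 MiB/s -- so a seek-free 3 MiB/s flash cache isn't far behind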

Also, current solid-state drives already have constant speeds of 50MiB/s without any seek times. Of course they also cost >10x more than regular drives. Those drives can make your database server work tens to hundreds of times faster than with normal platter-based HDDs.


"I have problems when the power cable is too close of the sata cable; I have to bend them both because some of the power cables are too big."

You must have some weird cables then. I have five SATA drives in my relatively small case and have no such problems. All use the official SATA power and data cables.


"And about the dual channel performance don’t you think it’s a little strange that “all” sites never tested Intel or AMD processors with just one memory module ?"

I've wondered the same thing. I remember one review from several years ago that showed the results I mentioned, and I haven't seen anything else since then. I don't think the situation has changed a lot. It seems that for most desktop usage a 1066MHz FSB is more than enough, and DDR2-800 gives almost enough bandwidth to fill it. I don't think dual channel is really needed before a much faster FSB. The only reason I use dual-channel memory is that two 1GB sticks were cheaper than one 2GB stick.

Of course in the server space things are a whole lot different. I remember some Sun mainframe from around 2000-2001 that had 16 memory channels in total, all accessible by all of the 16 CPUs in UMA fashion. Those kinds of machines never have enough bandwidth.

Aguia said...

How fast an HT link was used there?
A HyperTransport link: 2000MHz, 16 bits, bidirectional.
ULi

I still wonder why AMD needed to use two, it only complicates things.

How much bandwidth does each PCIe slot use, around 4GB/s? Would the 4 slots saturate one link? Or maybe they needed to do it because it's two northbridges and not one.

without having the CPU waiting for tens of milliseconds for another batch of data.

So it acts like a prefetch, is that it? Because it must be very small amounts of data.

You must have some weird cables then.

And most of them come directly from the PSU without using converters. Or maybe it's the HD that has both connectors too close together. I normally buy Seagate. I can't confirm whether I would have the same problem with other brands, or are the distances between connectors standard?

It seems that for most desktop usage 1066MHz FSB is more than enough and DDR2 800 gives almost enough bandwidth to fill it. I don't think dualchannel is really needed before much faster FSB. Only reason I use dualchannel memory is that two 1G sticks were cheaper than one 2G stick.

With only one module it's possible to push the latency down a little. And it's true 2x1GB is cheaper than 1x2GB, but the same doesn't happen with 512MB modules.

Unknown said...

http://www.fudzilla.com/index.php?option=com_content&task=view&id=1507&Itemid=1
Looks like K10 is delayed after all...

Pop Catalin Sever said...

Fudzilla is the most credible news site out there, and companies would give this kind of information to Fudzilla before all others... or is it that Mister Fuad 008 is the greatest spy in the industry???

Ho Ho said...

aguia
"HyperTransport link 2000MHz 16bits bidirecional."

If I understand correctly then that link has around 16GiB/s bandwidth. This is almost eight times more than AGP's theoretical throughput. Why waste so much bandwidth there? It doesn't make any sense to me.

"How much bandwith each PCIe slot uses, around 4GB/s? "

Each lane has around 2.5Gbit/s of bandwidth, so a x16 link has 4GiB/s. In 4x4 each NB has 16+8 PCIe lanes for graphics; that means around 6GiB/s for each NB, or 12GiB/s combined. Of course this is the theoretical maximum. I'd be surprised to see >50% of that in the real world. Anyone with Windows could probably test it using some of the Nvidia profiling tools.
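
The lane arithmetic behind those numbers, per direction, using the PCIe 1.x rate and its 8b/10b encoding overhead:

    # PCIe 1.x: 2.5 Gbit/s raw per lane, 8b/10b encoded -> 2 Gbit/s = 0.25 GB/s of payload per lane, per direction.
    lane_gb_s = 2.5 * 8 / 10 / 8
    print(16 * lane_gb_s)              # x16 link: 4 GB/s
    print((16 + 8) * lane_gb_s)        # one 4x4 NB (x16 + x8): 6 GB/s
    print(2 * (16 + 8) * lane_gb_s)    # both NBs combined: 12 GB/s theoretical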


"Or maybe they needed to do it because it’s 2 Northbridge’s and not one."

My original question was why they couldn't use a single northbridge with the same number of PCIe lanes that those two have when put together. You are just restating my question.


"So it acts like a prefetch is that it? Because it must have to be very small amounts of data."

Yes, something like a prefetch or cache. Not just small, but small pieces of data from random locations. HDDs are good at sequential reads but they suck at random seeking. This is also the main reason why it takes so long to open a directory with tens of files in it. The HDD has to do many seeks for those files and it takes a whole lot of time. Once the directory is cached, the same operation is done in the blink of an eye. At least this happens on Linux; I don't know about Windows.
Also, when searching through Gentoo portage with cold caches I rarely see disk throughput over 0.5MiB/s. There are over 100k tiny files there and it takes a long time to search. I once put the same directory onto a RAM disk, and the operation that used to take around half a minute took no more than two seconds after that.


"And most of them come directly from the PSU not using converters. Or maybe it’s the HD that as both connections too near."

Same here but that has never been a problem for me. The two connectors are almost touching each other but there is plenty of room and no problems fitting the cables. I have three Seagates and two Hitachi drives.


"And it its true 2x1GB is cheaper then 1x2GB, but the same doesn’t happen with 512MB modules."

It happened a few months ago. Today RAM prices are about a third of what they were 4-5 months ago. I added 2x1GB of RAM for ~€140; today I could get the same RAM for roughly €50. Though from what I know, prices will start creeping up again very soon.


As for the Inquirer/Fudzilla, they report whatever rumours they can get their hands on. Often they are wrong, but sometimes not. I personally wouldn't be surprised to see Barcelona delayed. After all, AMD never planned mass production this year, so there is not much point in wasting time and money trying to rush the products to market.

Pop Catalin Sever said...

Ho Ho said...
"personally wouldn't be surprised to see Barcelona delayed. After all AMD never planned mass production this year so there is not much point in wasting time and money trying to rush the products to market."

So AMD loses money by not having a new product, and if they rush to introduce a new product they will still lose money.
Are you saying AMD will lose money no matter what until they can mass-produce Barcelona? Well, in that case they should rush mass production and make a more aggressive ramp, not the other way around.

Ho Ho said...

Pop Catalin Seve
"Are you saying AMD will loose money no mater what until they can mass produce Barcelona?"

I didn't say this but I think this is true.


"Well in that case they should rush mass production and make a more agresive ramp not the other way around."

Barcelona was supposed to be released this summer in small quantities to get some design wins. In other words, show some benchmarks and promise good availability in the future. It wasn't supposed to generate noticeable revenue this year.

Sure, they could probably rush, throw a lot of money at it and get something usable done a bit sooner, but that probably won't help too much as it still takes time for yields to increase, and I doubt they could earn that money back so soon. To me it seems that financially it would be better to try to improve the yields before starting mass production.


There is one bad thing about it though. When AMD doesn't hurry, Intel won't have much reason to lower its prices or bring in new CPUs either. Then again, it is better to wait a few more months than to have AMD commit suicide.

Scientia from AMDZone said...

June 18th
Fudzilla: Barcelona slips to Q4

June 19th
TG Daily: Barcelona still on track for Q3

This is what happens when you cherry pick news items based on nothing but prejudice. It's only 3 months to the end of Q3. If AMD were really going to delay Barcelona to Q4 they should have announced it by now.

This is the same as the way a lot of people recently jumped on the silly idea that AMD is outsourcing everything and getting rid of its FABs. But this was based on nothing but one analyst's rather shaky speculation about AMD's actions. I have no doubt that AMD will still be in possession of two fully operational 300mm FABs in 2008.

enumae said...

Scientia

Hey Scientia, I have revised the projected numbers for AMD's Barcelona (K10) based on the reported launch clock speeds, while comparing them to the projected performance of Penryn, whose clock speeds I have also lowered.

Please take a look, here is the link.

If you see an error in the numbers please point it out; if not, it doesn't look good for AMD's revenue going into 2008.

Thanks

Ho Ho said...

scientia
"If AMD were really going to delay Barcelona to Q4 they should have announced it by now."

How long did it take for them to tell us about all the R600 delays, and to go from "we don't do paper launches" to saying that only the HD2900XT would be available at release? From what I remember it was much less than three months.


"This is the same as the way a lot of people recently jumped on the silly idea that AMD is outsourcing everything and getting rid of its FABs."

Of course it is an extremely radical thing to do, but I have to admit it would be a rather good source of income in the short term. They did say they'll have to change their way of doing business to start earning profits again. This could be one way, though it would be a rather bad thing to do in the long term. Then again, the long term is not important if you have really bad problems that need to be fixed ASAP.

ck said...

Ho Ho

"There are chipsets with nearly 50 PCie lanes. That is enough for similar 2x16+2x8 lane thingie, no need for two separate NBs. I still wonder why AMD needed to use two, it only complicates things."

Remember, PCIe lanes nowadays are provided by both the NB and the SB. For the nForce 680i there are 18 lanes in the NB and 24 in the SB, with 2 used for something else (see the 680i diagram); for the 590 SLI, 16 in the NB, 20 in the SB and 6 for others (see the 590 SLI diagram). It's hard to implement too many PCIe lanes in one single chip. The top of the line, nForce 590 SLI/nForce 680i, only includes 48 lanes (32 for graphics, 14 for others), but you need a total of 48 lanes on the nForce 680a for graphics alone (16 lanes for two cards and 8 lanes for two more), and then there are the other peripherals that use PCIe and the extra PCIe slots; NVIDIA provides a total of 56 lanes. So you see that NVIDIA also does not put too many PCIe lanes into one chip. The NB for a one-chip solution would be far more complicated than the two-NB solution! Note that the competing RD580 chipset has only 40 lanes, all in the NB (http://en.wikipedia.org/wiki/RD580#Common_Features), which is still not enough for the x16-x8-x16-x8 multi-graphics setup (OT: the next-gen RD790 will still be x8-x8-x8-x8 CrossFire, a total of 32 lanes plus 4 for ALink II and 4 for others, ouch).

So the nForce 680a consists of two NBs but no SB, with one having 24 PCIe lanes and the other 32 (16+8+8). This is way easier than integrating 56 PCIe lanes into one chip, and it reduces R&D time. At that time AMD had nothing to counter Intel's Core 2 Quads except two dual-core FXes, so why spend more time on R&D? They needed to get something out quick! The processors are basically Opteron 2200s with a changed memory controller (which should not be a problem given the "modular design" of processors AMD touted back in 2006), plus normal desktop mobos and RAM and "higher power" PSUs. Nothing more than that; with a simple mobo solution, Quad FX came out in Nov 2006, though it's not good enough. :\

BTW, there is also Skulltrail, announced at Spring IDF 2007, for enthusiasts wanting multi-graphics, following V8 for workstations.

Ho Ho said...

ck
"It's hard to implement too many PCI-E lanes into one single chip, the top of the line, nForce 590 SLI/nForce 680i only includes 48 lanes (32 for graphics, 14 for others), but you need a total of 48 for nForce 680a for graphics only"

Of course it takes less time and effort to slap two chips together, but I'm still thinking that in the long run it would be better (cheaper) to have one chip with lots of PCIe lanes than two separate chips.

Another thing: when your four GPUs have either x16 or x8 PCIe connections, why not make all of them x8? Or perhaps, like on some motherboards, have a full x16 with one GPU and 2x x8 with two. To me it doesn't make sense to "waste" PCIe lanes like that. If one GPU sits in an x16 slot and another in an x8, then the latter becomes the bottleneck that prevents using the full x16 speed anyway. Or is there something else I missed?

abinstein said...

"How ling did it take for them to tell us about all the R600 delays and from "we don't do paperlaunces" until they said that only HD2900XT will be availiable on release?"

AMD said a full range of R600 products would be released in Q2 2007. A full range of R600 products is being shipped to OEMs at least 2 weeks before the end of the quarter.

Get your logic right. The fact that HD2900XT was released a bit earlier than the others doesn't negate/contradict AMD's own claims.

I know you, like many Intel fanbois, will start to argue "but HD2900XT was supposed to be released in Mar'07" (or Dec'06?)!! So in early March we knew of the delay to June. Likewise, if AMD were going to delay Barcelona to September, we'd know of it officially by now. That is scientia's logic, which apparently you don't get.

Scientia from AMDZone said...

ho ho

"How ling did it take for them to tell us about all the R600 delays and from "we don't do paperlaunces" until they said that only HD2900XT will be availiable on release? From what I remember it was much less than three months."

I think you have two factors with R600 that are unrelated to K10. First, I doubt AMD was fully integrated into what was going on at ATI that early in 2007. Secondly, it appears that the driver was a big factor.

Eric Demers

"However, given a new operating system, a new DX, a new OpenGL driver and a whole new architecture, we got the perfect storm! I think that the drivers are very stable and give users a good experience. But when it comes to performance, there’s still a lot to optimize for. And it will take time to exploit the full performance potential of the hardware. They have done what was required, which is give the best $399 part, but there’s so much more to do… "

Eric Demers

"I can’t help but be a little disappointed that we did not have enough time to get more optimizations into our drivers in time for the launch. I still cringe when I see poor performance (especially if it’s compared, to, say, our previous generation products), and send daily emails to the performance and driver team, begging for this or that.

Also, on the feature side, we weren’t able to expose the adaptive edge detect CFAA modes to the public, or even to the press, until very late in the review process. This means that most reviewers did not have a chance to look at the amazing quality this mode offers – There is nothing comparable to it out there.

We also had some last minute performance possibilities, which is always incompatible with stability, and we did not have enough time to get those tested and integrated in time for launch. We see in the latest driver, some 2x to 3x improvement in adaptive AA performance, for example, which is great but came later than I would have liked. But, I’ll admit, there’s so much to do still, that I haven’t really spent that much time on reviews and such. The reality is that I expect things to continue to improve and be much better in a few months."


So, we see that the drivers should be fairly mature by the time they move to R650 on 65nm. This will also fix the 80HS transistor:

"When we selected 80HS, we selected it based on faster transistors, better density and also the fact that 65nm would not be available for production for R600.

The transistors are faster and leakier, and we were aware of this. 742 MHz was in the range of our expectations, though we thought we would be a little higher (~800 MHz) initially."


BTW, this interview confirms that R600 was only designed to compete against the 8800 GTS and not the 8800 GTX. Some had speculated that the real top-end part was the 2900 XTX and that the 2900 XT was a lower bin due to die errors. This, however, is incorrect:

"2900 XT is a full featured part"

This disproves the rampant speculation that either die errors prevented releasing the non-existent 2900 XTX or that R600 was much lower performing than ATI expected. The reality is:

"Right now, our plan is to target the high end performance level [8800GTS] with the R600, and the ultra high end [8800GTX] will be covered by the crossfire configurations. Today, the crossfire R600 beats the high end ultra-super-expensive competition, by a significant margin, while being cheaper."

Axel said...

abinstein

AMD said a full range of R600 products will be released in Q2 2007.

Link please? Even Scientia has started posting links to back up his claims, yet you persist in expecting people to believe anything you say without a link.

Now then, AMD ALSO said in early April (six weeks before R600 "launch") that they would do a one time launch of the entire family of DX10 enabled products:
http://www.xbitlabs.com/news/video/display/20070403235139.html

And then two weeks later came Richard's infamous claim quoted all over the Web now that they don't do soft launches.
http://seekingalpha.com/article/32901

So one way or the other, AMD misled the public. Ho Ho is absolutely correct. You're trying to spin it.

abinstein said...

"Link please? Even Scientia has started posting links to back up his claims, ..."

What's the problem with you fanbois? You come to argue about technology-related stuff but have no capability of using Google or Yahoo? What makes you so incapable of finding AMD's Q2 launch announcement? Just enter "r600 2nd quarter" in your favorite search engine and you get a bunch of those links.

No, unfortunately I see no reason to provide links to an imbecile; if you don't believe a fact just because I didn't spoon-feed it to you, so be it.


"Now then, AMD ALSO said in early April (six weeks before R600 "launch") that they would do a one time launch of the entire family of DX10 enabled products:"

If the "time" means quarter, month or even bi-week, then R600 is a one-time launch of the entire family.

Randy Allen said...

First Henri Richard said that R600 "would launch in Q1'07."

Later David Orton stated that they had delayed R600 so they could launch a whole family of DX10 products.

Later Henri Richard stated "We don't do soft launches."

Then come mid-May they launched the HD2900XT and all the other cards. Only the HD2900XT was available. It was 100% a paper launch for the 2400 and 2600 cards.

Today, in Australia, one still cannot buy a 2400 or 2600 card.

They totally went against what they had previously said.

Nvidia seems to be the best at launching products on time with immediate availability. That's what they did with the 8800 GTS/GTX and the 8500/8600.

The 8800 Ultra was an oddball. They announced the product and it was in stock a week later. I have no idea why they didn't wait for the cards to be ready, it's not like they desperately needed the Ultra.

Axel said...

abinstein

What makes you so incapable of seeing AMD's Q2 launch announcement?

Thanks for the link. And I agree that if we see the mid-range cards on the shelves within the next two weeks, AMD is consistent with what they said in February per your link. However, that doesn't change the fact that they announced to the public OVER A MONTH LATER, in April, that the R600 launch would be a one-time event of all ten cards, and would not be a paper launch. There is no way you can spin this. As much as you wish to believe that AMD can do no wrong, everyone knows they screwed this one up.

If the "time" means quarter, month or even bi-week, then R600 is a one-time launch of the entire family.

No, we're not talking about geologic time here. "One-time launch" means the same day, and that's exactly what AMD did on May 14: they launched all the cards at once, though it was mostly a soft launch.

tech4life said...

Scientia,

What is your opinion of the Inq article stating that production quantities of Barcelona will be available in mid-August? Fudzilla seems to think it will be late September. Also, assuming that the chip they showed at Computex was buggy and needed a new revision, is it possible to have that revision done with "production quantities" available by mid-August? If the article is true then it seems they have committed to the next revision in a do-or-die scenario like you described in your blog post. What is your take on this?

Scientia from AMDZone said...

tech4life

Mid-August is possible for the initial Barcelona chips, probably at 2.4GHz. They should be able to get in one more revision beyond what was shown. Although if they did multiple tests they could technically do several revisions, each individual test wafer doesn't really matter since the results from several tests would be combined into a single generation for production.

So, it is possible that AMD could have a bug-free Barcelona in August, but they are still talking December for Phenom.

ck said...

HoHo

"Also another thing is that when your four GPUs have either 16 or 8x PCIe connections then why not make all of them only 8x?"

Actually, the performance of graphics cards at x16 or x8 is not very different. Some online/printed sources (I forget which ones...) have shown that for a graphics card at x8 or x16 bandwidth, the performance is at about the same level without trade-offs.

But I think the use of x16 in theory allows more data to pass into/out of the device, therefore x16 is in theory needed for more performance, i.e. more data can be passed between the graphics card and the CPU. Why that doesn't show up in practice, I don't know; maybe under normal daily use x8 is enough and x16 doesn't help a lot. But hey, PCIe 2.0 has double the bandwidth, so maybe x4 is enough in the future, who knows? :P

Maybe after the Fusion initiative matures, everything goes into an SoC, and there will be no need for distinctions between NB/SB/GPU/etc. :/

ck said...

Randy Allen

It's very sad that Australia is slow to get technology products, in my experience. So be patient, and wait for the OEMs (Dell, HP etc.).

And, "First Henri Richard said that R600 "would launch in Q1'07'". Okay, the R600 was first promised (by ATI, not AMD) to be on it way in Q4 2006, and then so what!? A delayed product is a delayed product. Nothing to whine about, whatever the reasonle behind the delays.

Ho Ho said...

abinstein
"Just enter "r600 2nd quarter" in your favorite search engine and you get a bunch of those links."

I did it a bit differently. I replaced "2nd" with "1st" and found this:

In answer to a question about the timing of their next generation R600 GPU, AMD's Henri Richard, Executive Vice President of Worldwide Marketing and Sales, replied: "We're still planning to bring to market the R600 product in the 1st quarter".

Any comments?


ck
"Actually, the performance of graphics cards at x16 or x8 is not very distant."

I know that. It was the same with AGP 2x/4x/8x; only a few of the later AGP GPUs saw some performance drop compared to the PCIe versions.


"But I think the use of x16 in theory is for more data to pass into/out of the device, therefore x16 is in theory needed for more performance, i.e. more data can be passed among of the Gfx card and the CPU."

Yes, in theory that would all make sense, but in reality PCIe x16 has roughly the same bandwidth as ordinary dual-channel DDR2. It is kind of difficult to send enough data to fill the bus, as there wouldn't be any bandwidth left for other things. You can't really send the data straight from the CPU to the GPU either, as drivers do a lot of caching and the data is all held in main RAM. It is not like the Xbox 360, or perhaps the PS3, where the GPU can read data straight from the CPU's L2 cache.


"But hey, PCIe 2.0 has doubled bandwidth, so x4 is also enough in the future, who knows? :P"

For the next few years the bandwidth increase of PCIe 2.0 is not too important for regular GPUs in PCs. It will probably benefit GPGPU applications, though. Also, the main thing with PCIe 2.0 is the added power: with PCIe 1.x you could get up to 75W from the slot, but with PCIe 2.0 it is twice that.


"Maybe after the ripening of the Fusion initiative, everything goes into SoC, and no need for distinctions between NB/SB/GPU/etc. :/"

It might happen in the low end, but it won't ever happen in the high end, no matter what technology advances are made.



"It's very sad that Australia is slow at technology products, which is my experience. So be patient, and wait for OEMs (Dell, HP etc.)"

There are a few x2900s available here in Estonia, but at the moment it seems that neither ACME nor GNT has any in stock, and those are international resellers.

There are tens of 8800GTSes and nearly as many GTXes available, though.

I'm not sure why there are so few R600s in stock. Is it bad availability, or do few people want them? I know some people who have bought them, so there is some availability, but nothing comparable to Nvidia's offerings.

ck said...

The R600 is in short supply, so let's look at the official specs (yet again).

*700 million transistors on 80HS process
*14-layer PCB
*vapour chamber cooler by a Japanese company
*Digital PWM (7-phase)
*1.0 ns GDDR3

So 3 out of 5 are very rare on this market.

And I guess the yield is not good enough either, as 700 million transistors made the core VERY COMPLICATED. Plus, which one has more launch partners, the NVIDIA GeForce 8800 GTX or ATI's Radeon HD 2900 XT? Almost 10 vs. only 3 at their respective launches. So what do you expect? Few partners (which means the 2900 XT has VERY low margins at its $399 launch price tag), delayed products (again AND again, from the first promised Q3 2006 to Q2 2007, 9 months, yes!), and possibly bad yields, so? Low-volume shipments! That's all.
--
Yes, Fusion will be in the low-end and maybe mainstream segments, but certainly not the high-end and enthusiast markets (consider that a 2.3GHz CPU (Athlon X2 BE-2350) + 0.8GHz GPU (R600) equals 45W+200W, and that's hot!!), so it does not meet the "Pareto principle" or the 80-20 rule, or the law of the vital few or the principle of factor sparsity or whatever you name it.

"In business, dramatic improvements can often be achieved by identifying the 20% of customers, activities, products or processes that account for the 80% of contribution to profit and maximizing the attention applied to them." This is currently and obviously not AMD's first priority. 0.o

So is AMD at risk with this move? Maybe. Besides professional/ultra-high-end products, is pervasive computing for the masses (through cost-reduced SoCs/platforms) the other way? I don't know, maybe Hector knows! :D

What I wanted to say is a quote from Hong Kong Secretary for Education and Manpower, Arthur Li, "I'll remember this, you will pay."

Remember, Hector, you're risking your business and other people's jobs. So call Jerry Sanders to manage AMD again, like what Michael Dell did for Dell Computer!
--
Okay, rant off. Fusion is a good concept, but I seriously doubt the revenue and buzz generated from it. You know, one can never achieve perfection, especially in the world of semiconductors. Just look at the thermals; you know this is not gonna work in a high-end PC (though this thing would have 0.55 TFLOPS and the same power as Terascale, but with x86-64 support).

ck said...

P.S. Just some food for thought: the Terascale 80-core processor has 2 TFLOPS at 265W but can only do specialized FP math.

With the R600 merged into a quad-core CPU, it's 320+4 = 324 cores (2 types of cores) at 295W (maximum) thermals with full x86-64 support and DX10 graphics. But I would expect greater latencies.

Hm...
--
Okay now, just ignore this cr@p and move on.

Ho Ho said...

ck
"*700 million transistors on 80HS process

G80 has around 681M of them, and most of them are clocked at 1.35GHz.


"*vapour chamber cooler by a Japan company"

Just a fancy name for a heatpipe cooler.


*1.0 ns GDDR3"

Wasn't it the same for 8800GTX? Also I think 8800 Ultra has 0.8ns chips on it.


"And I guess the yield is not very good enough also as 700 million transistors made the core VERY COMPLICATED. "

I didn't do any googling, but I think G80 is actually a bigger die than R600. It is 90nm vs 80nm, after all.


"P.S. Just some food for thoughts, the terascale 80-core processor has 2 TFLOPs at 265W but can only do special FP maths."

You're a bit wrong there. It is ~157W at 6.26GHz/2TFLOPS, ~24W at 3.13GHz/1TFLOPS, and only 3.3W when idle with only 4 cores working.

Also, that Terascale thing is only a research project; real products will use simplified x86 CPUs. The first will probably be Larrabee, with 24 to 32 cores coming sometime in 2008/2009 and up to 48 cores following soon after.

"With the R600 merged into a quad-core CPU, it's 320+4=324 cores (2 types of cores) at 295W (Maximum) thermals with full x86-64 support and DX10 graphics. But I would expect greater latencies."

324 cores with only 4 of them x86 ones. Also there will still be the problem of providing enough memory bandwidth to the beast. There are not many CPU sockets with bandwidth comparable to high-end GPUs.

Pop Catalin Sever said...

AMD Phenom Launch Schedule & Roadmap

I don't know if this is correct or not, but the frequencies don't look good for Phenom X2, especially compared to Penryn's announced ones. I was really hoping for 2.8GHz for Phenom X2 at launch. 2.4GHz seems a little too low, especially when Intel did better at launch (2.93GHz) on bulk silicon.

I always thought that AMD's APM would help them start a new process with mature yields, already tweaked for performance, as they have said; it seems that's not the case.

Jerry Sanders should definitely return to AMD as CEO.

Ho Ho said...

2.4GHz on quad and dual cores in Q4 and 2.6GHz in Q1'08. I wonder what the Opteron release clock speeds will be if they really do get released a quarter before the desktop models.

abinstein said...

Ho Ho -
"Any comments?"

My comment is that you can't follow logic. HD 2900 was delayed from March to May; nobody contests that. AMD announced the new plan in early February, more than a month before the previously planned release.

The point here is not whether AMD delays products or not, but whether AMD lied about top-to-bottom availability in Q2, with the stress on Q2. Or maybe you will link us to how AMD delayed its K5/K7/K8 introductions? Get some logic and get a clue first, please.

abinstein said...

Ho Ho -
"You're a bit wrong there. It is ~157W at 6.26GHz/2TFLOPS, ~24W at 3.13GHz/1TFLOPS and oly 3,3W being idle with only 4 cores working."

Are you sure you have the accurate numbers? Read the bottom of this page and tell me who is right or wrong.

While you are right about Terascale being a "research thing," you are wrong about its implications for the platform environment. Just because it uses simplified x86 cores doesn't make it a bit more compatible with today's computers. First, the instruction set is going to be modified - some instructions removed, some added. Second, the programming model will change to something nobody (outside of Intel's labs) knows anything about. Last, the earliest incarnation of this research (Larrabee) is already a reality in Sun's Niagara and Rock (although with fewer cores, mostly due to inferior process technology).

Before Intel comes up with a credible and efficient way to program this 24-core or 80-core lump of processors, the Terascale is really nothing but a grouping of a lot of cores and transistors.

enumae said...

Abinstein

His numbers are accurate, here is a link to revised performance numbers.

enumae said...

Abinstein

I also found the video of it.

Link to video

enumae said...

Abinstein

Correction, after watching the video again: it is about 40W at 3.13GHz or 1 teraflop, which would make the Inq. article incorrect.

But the other numbers are correct...

160W at 6.26GHz or 2 teraflops, and 4W at 3.13GHz (4 cores).

Wise lnvestor said...

Abinstein

Take it easy. I see that you've been chased by INTC supporters about the delays of the mid-range and low-end R6xx, although in real life I doubt it really concerns them.

They remind me of those kids selling newspapers in the streets in the 1920s, yelling: Read (moan) all about it! lol

As far as we know, the mid-range has been delayed a bit because of the 3D engine and something to do with HDCP. It's important to fix those chips, otherwise this could happen.

ck said...

Well, actually the R600 cooler has liquid sealed into a copper container at the base; it's not marketing hype. I don't remember where I got that (remember: Google and search engines are your best friends!), but it's true. And don't tell me it's not working well and so on and so on, because I'm not the one who chose to use the cooler in the original card design. So get over it.

And well, it's obvious that 1.0ns GDDR3 is far rarer than 1.2ns GDDR3 modules, am I right? It's just a matter of perspective; the same goes for 0.8ns being rarer than 1.0ns GDDR3. How about 0.6ns GDDR5 (not a typo, thank you)?

BTW, G80 has two different clocks. One is below 650MHz (the core frequency for the GTX, I think; don't flame me about that, I can't recall so many clock frequency values), and the shaders (or stream processors, as you like) run at about double the core clock frequency. So no, not all of them run at 1.35GHz.

And about that 80-core project thing: yes, the TDP can always be changed. Why? 1) core improvements, 2) layout improvements, and 3) fabrication process improvements, like AMD pushing the 89W (90nm) Athlon 64 X2, then the 65W (65nm) EE Athlon 64 X2, and then the 45W (65nm) Athlon X2 BE (note: those did not have items 1 and 2). BTW, that **** webpage doesn't seem to have been updated lately. And one thing, I don't give a **** about the number of TFLOPS it can achieve, but about the need for it to be implemented. This 80-core monster can show great architectural potential (QoS, routers, point-to-point interconnects, great calculations etc.), but what exactly can it do? Calculate special math (predefined by Intel!!) but nothing for everyday/general usage? No multimedia, which is used daily, no x86/x64/RISC/VLIW/IBM compatibility of any sort, and somebody must redo the programs for it!? No way, this is not the way it's meant to be. New processors are meant to bring faster operation over time, not to make programmers spend more time building compilers and recompiling programs for "special" instruction sets that are absolutely NOT compatible with anything else. So, should I 1) go for a proven, if slower, architecture with popular instruction sets so I can push a new product out fast, or 2) get faster hardware but spend more time recompiling programs line by line and lag behind in software development? Put yourself in the shoes of a corporate CIO and choose one. Don't tell me the Apple Intel transition (from IBM PowerPC RISC) is one of them, because they are not the same.
--
And the Fusion thing I talked about, that 324-core thing, was just me poking fun at my own thoughts. In real situations this can only be in the low end, where very limited graphics are needed (but not as little as the integrated 2D graphics in PDAs of the past; now PDAs get separate chips for 3D acceleration, GoForce/Imageon, you name it).

And Dave Orton told FUDzilla at Computex that Fusion will end up with 10% more pins than an average CPU (I don't know which one is an "average CPU"; taking Socket F as an example, that becomes 1207+120, about 1328 pins, which would be the rationale for switching from PGA to LGA).
--
BTW, in other news, AMD has touted "Bobcat" as the first step in Fusion. Bobcat is a CPU below 10W TDP, without a GPU, for CE markets (UMPC/DTV/convergence devices), so it will be interesting to see other components integrated later (Xilleon/tuners etc.). Go read about it here (Japanese website, use Google Translate if you don't know Japanese) if you like; it looks really impressive on paper (yes, on paper, and who knows how Hector will turn that into complete cr@p, just who knows?).
--
P.S. It should be rationale but not reasonale, I apologize for that in my previous post.

Randy Allen said...



It's very sad that Australia is slow at technology products, which is my experience. So be patient, and wait for OEMs (Dell, HP etc.)


Why should people wait? When the Geforce 8800 launched Nvidia had the 8800 GTS and GTX in stock here. Supplies were limited for a while (as they were everywhere), but that was to be expected.

G80 has around 681M of them and most of them clocked at 1.35GHz.

This is true. The G80 GPU was so large on the 90nm process that they developed a second chip called NVIO, which contains the RAMDAC and powers the DVI ports etc. So on any GeForce 8800 card you've actually got two chips on the PCB.

Scientia from AMDZone said...

pop catalin

"I always thought that AMD's APM would help them start a new process with mature yields and already tweaked for performance as they have said, it seems that's not the case."

First of all, 2.4GHz is bumped up from the 2.3GHz that AMD initially said. Secondly, the yields on K10 are mature; that isn't a problem. And you are confusing yield with performance.

This is not a bin-split issue as you seem to be assuming. The performance is related to the transistor, which AMD continuously improves. You are assuming that the transistor is already at peak performance and that the issue is the process, but this is not the case.

Just from AMD's announced process specs I would assume that 2.8GHz is the maximum for quad core at 90 watts and 3.0GHz would be the maximum at 120 watts. If AMD gets 2.6GHz out in Q4 then it is already near the top. I assume they are counting on 45nm for higher clocks, just as Intel is with Penryn.

enumae said...

Scientia
Secondly the yields on K10 are mature; that isn't a problem.

How do you know this? Do you have a link?

Pop Catalin Sever said...

"And, you are confusing yield with performance."

No, I don't. Yield and performance are very closely related, and it's a known fact that a fabrication process can be tweaked for better yields or better performance. In time you'll have more of both, but you always have this tradeoff and the decision to make: performance vs. yields.

Amdzoner said...

Scientia, you keep on spinning that Intel can't get past 3GHz with stock coolers.

What do you say when HotHardware.com got its new E6750 to 3.9GHz with the stock cooler, completely benchmark stable?

Also, if AMD releases a quad core above 2.6GHz before Q1 2008 I will eat my shorts.

Scientia from AMDZone said...

enumae

I guess you must be assuming that AMD is going backwards.

Yield ramps on page 19

EE Times

The 90-nm transition was very successfully put into Fab30 at close to mature yields.

2006

We begin the first revenue shipments of processors manufactured at Fab 36 in Dresden, Germany. The new facility ramped in record time, hitting every major milestone on schedule and beginning production at mature yields.

AMD begins its first revenue shipments of processors manufactured at Chartered Semiconductor Manufacturing in Singapore. AMD and Chartered ramped 300mm production at Fab 7 in record time, hitting all major milestones and starting production at mature yields.


geek.com

The smaller silicon titan says 65 nm technology will arrive in the fourth quarter of this year at “mature yields.”

INQ

Also, we are instructed that the introduction at mature yields talk they give means that the number of good chips they get off a 65 nanometre wafer exceeds that of a 90 nanometre wafer.

AMD's 65nm process is yielding fine. K10 uses a slightly tweaked transistor; however, AMD has never dropped yield when updating a transistor, so why would you think they've started now?

pop catalin

" it's a known fact that a fabrication process can be tweaked for better yields or better performance."

True. However, the speed in question is 2.4GHz and that is working fine. The process is mature, and this is indeed due to APM. However, the transistors will continue to improve.

In other words, some people have suggested that AMD is currently getting lots of 1.6 and 1.8GHz K10s with hardly any 2.0 or 2.2GHz K10s and that is why they haven't launched. This is not true at all. AMD is getting plenty of 2.4GHz chips. Yield is not an issue. AMD doesn't need 2.6GHz until Q4. APM has helped and does help reach a mature process faster; APM does not automatically give you all transistor updates at the start of production.

Pop Catalin Sever said...

"This is not true at all. AMD is getting plenty of 2.4Ghz chips. Yield is not an issue"

If that's the case, then there is only one thing left that keeps AMD from releasing higher-clocked CPUs from a top bin this year, and that is TDP. Unlike yields or performance, it seems power consumption takes a little longer to tune, as AMD has shown with their previous nodes, and my guess is that they don't want to exceed the promised TDP for their CPUs.

Scientia from AMDZone said...

sal

"Scientia, you keep on spinning that Intel can't get past 3Ghz with the stock coolers."

That isn't a question; that is an accusation. Why are you trolling here?

"What do you say when HotHardware.com got its new E6750 to 3,9Ghz with the stock cooler, completety benchmark stable?"

I would say that you are woefully ignorant about thermal testing.

Here is the Core 2 Duo Temperature Guide, which you can look through to confirm what I say.

50°C is the maximum normal temperature. The review you linked to says "the CPU barely hit 48°C". This would just barely be safe if they had tested properly. Unfortunately they didn't, and even they admitted that: "so we're not certain these temperature readings are accurate".

Their first mistake was trying to measure temperature while running standard benchmarks. This is a complete joke when some of these finish very quickly: "the CPU completed a Cinebench rendering pass in just 18 seconds". Even if they didn't complete quickly they still wouldn't be properly stressing the chip:

Intel has 2 distinct C2D thermal specifications, and provides a test program, Thermal Analysis Tool (TAT), to simulate 100% Load. Some users may not be aware that Prime95, Orthos, Everest and assorted others, may simulate loads which are intermittent, or less than TAT. These are ideal for stress testing CPU, memory and system stability over time, but aren't designed for testing the limits of CPU cooling efficiency.

After TAT, "Orthos Priority 9 Small FFT’s simulates 88% of TAT ~ 5c lower".

So, we can assume that the weak testing by HotHardware reads 15-20°C low and that a real thermal test would be busting the thermal limits by about 15-20°C at 3.9GHz.
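
The arithmetic behind that estimate, using the 48°C reading and the ~50°C limit from the guide (the 15-20°C under-read is the assumption being made above):

    measured_c = 48                 # HotHardware's reading under light benchmark load
    limit_c = 50                    # maximum normal temperature per the C2D temperature guide
    for underread_c in (15, 20):    # assumed shortfall of benchmark load vs. a TAT-style 100% load
        print(measured_c + underread_c - limit_c)   # 13-18 C over the limit, i.e. roughly the 15-20 C estimated above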

I fully agree that with premium cooling even a quad-core Kentsfield can hit 3.4GHz and stay within thermal limits, but apparently Intel has not had any incentive to release a limited edition chip for premium cooling only. And, according to tests at Anandtech, a stock 3.0GHz Kentsfield already exceeds thermal limits with a stock HSF.

Ho Ho said...

abinstein
"First, the instruction set is going to be modified - some instructions removed, some more added."

No, nothing will be removed that would make it incompatible. If anything, then perhaps only the 32-bit legacy, but even that is purely my own speculation. If any instructions were removed you couldn't call it x86 now, could you?

Adding instructions doesn't make it incompatible either. Just remember how many different SIMD instruction sets x86 CPUs have and how all of them run programs that don't use those instructions just fine. The only difference will be that, just as the P4's x87 was awfully slow compared to its SSE, this chip will have its best performance and efficiency when using its very wide SIMD instructions instead of the regular 128-bit SSE ones.


"Second, the programming model will change to something nobody (outside of Intel's lab) knows what it is."

It [Larrabee] is a whole lot of x86 cores with some new SIMD functions and a few hardwired graphics functions, nothing else. What exactly can change there to make it so much more difficult to program for compared to any other multicore CPU?


"Last, the earliest incarnation of this research (Larrabee) is already a reality in Sun's Niagara and Rock (although with less # of cores, mostly due to inferior processing technology)."

Well, Niagara and Rock are in a bit of a different league than Larrabee. For starters, their cores are quite a bit simpler than Larrabee's. Also, they both should have SMT capabilities. I think Larrabee was supposed to run up to 4 threads per core. That would put a 24-48 physical core Larrabee, with more complex cores, on a much higher level than the Sun CPUs.


"Before Intel comes up with a credible and efficient way to program this 24-core or 80-core lump of processors, the Terascale is really nothing but a grouping of a lot of cores and transistors."

Some time in 2006 they said they would show the first results of their real time ray tracing research around the time Larrabee will be released; that would certainly be one place where one can use every bit of computation power this CPU has. Of course all the things currently being run on GPUs can run on Larrabee also, only much more efficiently. G80 has 128 physical "cores" and programmers don't have too much trouble using them in their work. NVidia even produced a special board for those high-end number crunching machines.

As for general desktop usage of course not many programs will use all of the cores that CPU has. The ones not being used will be simply switched off to save power and will be brought back online as needed. I see no problems with having such a massive amount of (virtual) cores in one box.


Your text kind of sounded as if Intel is doing something meaningless with their terascale research because it will not be as good as having fewer cores with higher individual performance. If so then I don't agree. Clock speed and single threaded performance increases will not be easy to come by in the future, and pretty much the only way to continue to have exponential performance increases is to go massively parallel. It will surely take time for software to catch up but in the end it will be much better than fewer cores with higher per-core performance.



ck
"And well, it's obviously that -1.0ns GDDR3 is far more rare than -1.2ns GDDR3 modules, am I right?"

They might be rarer but not too rare, as 0.8ns has been out for quite some time. Also, I still don't get why AMD used 1ns chips when 1.1 or even 1.2ns would have been more than enough. After all, the RAM is working at 1.21ns.
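For what it's worth, the speed grades and the actual clock line up like this, taking the -Xns grade as the rated clock period (which is how GDDR3 chips are usually marked):

# Quick check of the GDDR3 speed-grade arithmetic above, treating the -Xns
# grade as the rated clock period. The 1.21ns figure is the one quoted above.
def period_to_mhz(period_ns):
    return 1000.0 / period_ns

print(f"-1.0ns grade : rated for ~{period_to_mhz(1.0):.0f} MHz")
print(f"-1.2ns grade : rated for ~{period_to_mhz(1.2):.0f} MHz")
print(f"1.21ns actual: running at ~{period_to_mhz(1.21):.0f} MHz")

So even a -1.2ns part would cover the ~826MHz the memory is actually running at, which is the point being made.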


"How about 0.6ns GDDR5 (not a typo, thank you)?"

Is it out of labs already?


"BTW, G80 has two different clocks, one is below 650 MHz, and for shaders, they worked about double the core clock frequency, and see, not all of them ran at 1.35 GHz."

Yes, so it is. It is 575/1350 for the GTX and 612/1500 for the Ultra, to be accurate. I also didn't say that the entire chip runs at >1Ghz, just that the majority of it does, and from what I've seen the shaders take up most of the space on the chip.


"This 80-core monster can show great architectural potential (QoS, router, Point-to-Point Interconnects and great calculations etc.), but exactly what can it do?"

The current research chip can't do much and it is not supposed to. As I've said many times already, it is only made to see how well the things you listed can be made to work on a real CPU.


"No multimedia which is used daily, no x86/x64/RISC/VLIW/IBM computer compatibility sort of thing, and somebody must redone the programs for that!?"

Wait for Larrabee and we shall see. Also keep in mind that in order to get any meaningful performance increase from CPUs you have to update your programs from time to time. Without that we wouldn't be able to use any HW floating point math, not to mention any SIMD instructions.


randy allen
"The G80 GPU was so large on the 90nm process they developed a second chip called NV/IO which is the RAMDAC and powers the DVI ports etc."

Yes, so it is. That 681M number is without the NV/IO chip. That means NVidia is able to produce a much more complicated chip with superior performance at a lower cost than AMD (ATI).

scientia
"This is not true at all. AMD is getting plenty of 2.4Ghz chips. Yield is not an issue. "

Is this based on the links you provided in your last post? If so then I wouldn't be stating this as a fact but as speculation.



As for exceeding CPU thermal limits, who really cares? Unless you intend to use your CPU for over 5 years it doesn't matter. I've been running my Core2's at around 60% OC computing Seti 24/7 for almost a year now without problems and I know lots of people who do the same. Sure, CPU lifetime will probably be a bit shorter but I don't intend to keep my CPUs for over ten years anyway.

Scientia from AMDZone said...

pop catalin

"If that's the case, then there is only one thing left that keeps AMD from releasing higher clocked CPU's from a top bin this year and that is TDP."

I would agree. If you check the releases you'll see that each quarter AMD is able to hit the same clock at reduced TDP. I would assume that 2.6Ghz might initially be a 120 watt part and 2.8Ghz might be 120 watts in Q1 08. AMD may hit 3.0Ghz on 65nm before they transition to 45nm after mid 2008.

However, also keep in mind that the total production volume of K10 is going to be very low in Q3 and only a bit better in Q4. It is also unclear what AMD's priority is in terms of quad versus dual core and Opteron versus FX versus Phenom. There is some suggestion that FX chips are the same as Opteron 2xxx chips so there may not be any difference in speeds here. The Phenom chips may be the same as Opteron 1xxx and these only need 1 working HT link.

There are also some suggestions that the dual core chips are partly disabled quad chips. If that is the case then it would help with yield. Theoretically it might also help with binning, assuming that some cores have better TDP than others. However, I would tend to assume that the variance wouldn't be that high among cores on the same die.

At this point it isn't clear if AMD can actually hit 2.9Ghz for dual core in Q4. It also isn't clear if AMD has any plans to release crippled versions of K10 for Sempron such as single core or chips without working L3 cache.

All of the suggestions that I've seen are that the Athlon range will be completely dual core and Sempron will absorb the current single core Athlons.

Scientia from AMDZone said...

ho ho

"As for exceeding CPU thermal limits, who really cares? "

I realize that overclockers wouldn't care since they often bust the thermal limits on chips. However, the question was not about what OC'ers are willing to do but about what Intel is willing to do. Apparently, Intel cares.

Scientia from AMDZone said...

enumae

There won't be any official word from AMD on K10 yields until the meeting in July.

I can see why you might question die size since the K10 die is larger than X2. I can also see why you would assume that you can give Intel a pass since they are using MCM. However, you need to consider that the ratio of die size to wafer size is actually smaller than what AMD has done in the past with K8 on FAB 30. Since these didn't cause any large drop in yield I don't see why K10 would.

enumae said...
This comment has been removed by the author.
Scientia from AMDZone said...

enumae

My apologies. I keep forgetting that you don't synthesize information like I do.

Okay, Anandtech estimated that MCM gave Intel a 20% yield advantage which meant a 12% cost advantage. This wouldn't be a problem for K10 unless Kentsfield is only yielding 60% (which would be impossible for a 12% cost advantage). So, Kentsfield's yields are at least 80%. This would give K10 a yield of at least 60%, which still falls within a "good" range.
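A minimal sketch of that back-of-the-envelope reasoning, treating Anandtech's "20% yield advantage" as an absolute gap between Kentsfield's yield and what a native quad would get on the same process. That reading, and the Kentsfield yields tried below, are my assumptions:

# Back-of-envelope version of the argument above. The MCM yield advantage is
# read as an absolute 20-point gap; the Kentsfield yields are assumptions.
mcm_yield_advantage = 0.20

for kentsfield_yield in (0.60, 0.80, 0.90):
    native_quad_yield = kentsfield_yield - mcm_yield_advantage
    verdict = "good" if native_quad_yield >= 0.60 else "poor"
    print(f"Kentsfield at {kentsfield_yield:.0%} -> native quad around "
          f"{native_quad_yield:.0%} ({verdict})")

The point is simply that unless Kentsfield's own yield were down near 60%, a native quad on the same process should still land in a workable range.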

Also, I've already seen the particle defect rate for AMD's 45nm immersion process and it is doing well. So, unless AMD's mature 65nm process has suddenly developed a defect density higher than the new 45nm process it is safe to say that yield on K10 is fine. Speeds are another story. It isn't yield that is limiting the speed but only the quality of transistors. K10 uses slightly different transistors than Brisbane. These will take time to improve.

enumae said...

Scientia

Thank you for explaining your reasons behind why K10 has or will have good yields.

As I am sure you know or I have said before, I am not in this field by any stretch of the imagination and as such my understanding is very limited.

I am very interested in understanding more about these topics, and I hope you will understand that any future lack of understanding on my end carries no ill intent.

Again, thank you for taking the time to explain.

Unknown said...

Hi Scientia:

Can you tell us more about these new transistors that AMD will be using on K10?

Also, how is it possible that a 3-ipc core like K10 can beat a pseudo-4 ipc core like C2D?

Scientia from AMDZone said...

erlindo

"Can you tell us more about these new transistors that AMD will be using on K10?"

It's my understanding that AMD needed to tweak the transistors a bit to keep the TDP for quad core inside the same range as dual core. It can get confusing because AMD also transfers advances backwards to the last process. We can tell that things were not exactly up to speed with Brisbane because the die shrink was not that great on 65nm.

If AMD follows its usual pattern they will shrink the die a second time, maybe in Q4 07. However, there is also the possibility that AMD will skip the second shrink because of the faster move to 45nm. K10 is already optimized for 45nm.

"Also, how is it possible that a 3-ipc core like K10 can beat a pseudo-4 ipc core like C2D?"

Well, not all instructions are equal. K8's instructions take mostly 1, 2, and 3 clocks to decode. Three decoders can do either 1 or 2 clock instructions, while nano-code handles a single instruction at a time. To hit 3 IPC you have to be decoding single clock (Direct/Fast Path Single) instructions only. C2D's instructions take either 1/2, 1, or 3. Three decoders can do 1 clock instructions while only one decoder can do the doubles that issue 2 instructions per clock, and nano-code is the same with only a single instruction at a time. If all the instructions are simple then C2D is the same speed as K8 since both can only decode 3 simple instructions per clock.

Nano-code instructions take at least 3 clock cycles to decode and K8 and C2D are pretty even with 1 nano-code instruction every 3 cycles. If the instructions were only nano-code then they would again be even.

However, C2D currently pulls ahead of K8 when it decodes doubles, since these allow up to a 60% boost, with up to 5 instructions decoded per clock when a double is decoded alongside 3 simple instructions.

However, with K10, almost all SSE instructions have been changed to decode in a single clock instead of two clocks like K8. This means that these SSE instructions have far less effect on the decoding bandwidth. Also, stack instructions are taken out of the pipeline with K10 as they are on C2D, so this again helps. This allows AMD to gain an advantage for Direct Path Doubles, which take two clock cycles to decode, since C2D doesn't have these. Any C2D instruction that can't be decoded in 1 clock has to go to nano-code, which is much slower.
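To make the decode argument concrete, here is a toy encoding of the simplified rules described above for the AMD side only (3 decode slots per clock, doubles costing two slots, nano-code stalling for about 3 clocks). The instruction mix is hypothetical and this is only meant to illustrate the bandwidth effect, not to model the real decoders:

# Toy model of the simplified AMD decode rules described above.
def decode_cycles(singles, doubles, microcoded, slots_per_clock=3):
    # Fast-path singles use one decode slot, doubles use two; nano-code
    # instructions are assumed to stall the decoders for ~3 clocks each.
    return (singles + 2 * doubles) / slots_per_clock + 3 * microcoded

# Hypothetical mix of 100 instructions: 70 simple ops, 25 SSE ops, 5 nano-code.
sse = 25
k8_cycles  = decode_cycles(singles=70,       doubles=sse, microcoded=5)  # SSE = doubles on K8
k10_cycles = decode_cycles(singles=70 + sse, doubles=0,   microcoded=5)  # SSE = singles on K10

print(f"K8 : ~{100 / k8_cycles:.2f} instructions decoded per clock")
print(f"K10: ~{100 / k10_cycles:.2f} instructions decoded per clock")

Under those toy rules the SSE change alone moves the decode rate noticeably, which is the point about K10's decode bandwidth.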

core2dude said...


ho ho


First will probably be Larrabee with 24 and 32 cores coming in sometime 2008/2009 and up to 48 cores following soon.

48-core Larrabee would kinda suck, won't you agree? Imagine 48 cores on a ring interconnect. Even if you make it a bi-directional link, you could have as many as 24 hops to the farthest core. And if you need to send the data around the ring twice (due to congestion), you are looking at an additional 48 hops. If each hop is 1 cycle (too optimistic?), you are looking at latency of 24 cycles just on the ring.

I thought 32 cores itself was pushing it. 48 cores just sounds insane.

Now the 80-core chip is a totally different thing. It has a mesh interconnect, putting any two cores at maximum 10 hops away.
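The hop counts are easy to sanity-check. On a bidirectional ring of n stops the farthest stop is n/2 hops away; on a w-by-h mesh it is (w-1)+(h-1). The 8x10 layout for the 80-core chip is my assumption, and under it the corner-to-corner case actually works out a bit higher than 10 hops, though the average distance is much lower:

# Worst-case hop counts for the interconnects being discussed. The 8x10 mesh
# layout for the 80-core research chip is an assumption on my part.
def ring_max_hops(n):
    return n // 2             # bidirectional ring: go the short way around

def mesh_max_hops(w, h):
    return (w - 1) + (h - 1)  # corner-to-corner Manhattan distance

print("48-core ring:", ring_max_hops(48), "hops")
print("32-core ring:", ring_max_hops(32), "hops")
print("80-core mesh (8x10):", mesh_max_hops(8, 10), "hops corner to corner")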

Ho Ho said...

scientia
"Apparently, Intel cares."

Yes, but that doesn't stop system builders offering pre-overclocked systems. Just some time ago there was talk here about some 3.3GHz QX6700, IIRC.



core2dude
"48-core Larrabee would kinda suck, won't you agree?"

No, I don't agree. In fact I'm sure that even using only a ring bus connection Larrabee would scale nicely up to a few hundred cores for many tasks without losing too much performance. After all, it is targeted at those embarrassingly parallel tasks, and any decent programmer knows that synchronizing too often leads to bad scaling. Just see how badly Cinebench scales to get some idea. Any schoolkid with a little experience can code a ray tracer that scales linearly to well over 4 cores. I can't understand how professional coders can't do it.


"Imagine 48 cores on a ring interconnect. Even if you make it a bi-directional link, you could have as many as 24 hops to the farthest core."

IIRC it is a bi-directional 512-bit bus, giving it enormous throughput, 4x more than the EIB in Cell. Hop count and latency are not too important, as every core will be running several threads in parallel. If one of those has to wait for data, another one is started.


"If each hop is 1 cycle (too optimistic?), you are looking at latency of 24 cycles just on the ring."

I don't think it would be too optimistic. Also, that 24 cycles is extremely fast compared to how long it takes current Intel quad cores to send data to each other, and even that is good enough for the vast majority of tasks.

Btw, has anyone got any idea how AMD NUMA systems synchronize their data? Is it sent directly from CPU to CPU or will it go through RAM first? E.g. is it something like CPU0 writes to its memory bank at address X and CPU1 asks for data in the CPU0 memory from the address X.


"I though 32 cores itself was pushing it. 48 cores just sounds insane."

Considering that it is mostly a "better GPGPU" it is much better than any GPU we have and will have during the next couple of years.

As for general purpose single threaded tasks, I don't think there are many things that this CPU couldn't handle. I expect every single core to be around the performance level of a 1.5-2GHz Core2 on average. This is good enough for office applications, but I'm sure even these programs will be parallelized by then, at least the parts that need the performance increase (spell checking, print preview and other rendering?).

"Now the 80-core chip is a totally different thing."

Yes, but Larrabee should have some of the stuff from the 80-core chip. I'm not sure about the mesh interconnect, but I bet there will be a huge shared L3 cache stacked onto the CPU, at least for the 48-core model. That 48-core part should be out in 2010; in three years lots of things can happen.

Pop Catalin Sever said...

Well, a little off topic but not entirely: there are these two articles describing EXOCHI, an Intel technology meant to unlock the power of massive parallel computing for desktop PCs

Analysis: Has Intel found the key to unlock supercomputing powers on the desktop?

and

Follow-up: Has Intel found the key to unlock supercomputing powers on the desktop?

I would like to hear some opinions from you guys. The comments and user feedback are interesting.

Unknown said...

Thanks Scientia for taking time to answer my questions.
Let's all hope that these new transistors on K10 will help improve the processor thermals and clock scaling.

Scientia from AMDZone said...

ho ho

"Yes, but that doesn't stop system builders offering pre-overclocked systems. Just some time ago here was a talk about some 3.3GHz QX6700 IIRC.

Which means nothing. Quite some time ago Sun offered a system with overclocked Opterons. This hardly represents a new trend. However, once again, the actions of overclockers are not the issue. The issue concerns Intel and once again Intel is not rushing to release > 3.0Ghz processors. The reason for this remains the same: the current 65nm process is not able to handle the cooling requirements with a stock HSF.

Scientia from AMDZone said...

pop catalin

Well, I suppose I would have a few comments about EXOCHI.

The first is that this seems to be a very strong implementation. It is clear to me from the technical description that Intel intends to both keep its platform competitive and manufacture the accelerator hardware.

Secondly, as an external device this should be a nearly perfect arrangement that avoids a lot of the overhead of using regular hardware drivers.

Third, this arrangement seems geared towards Geneseo rather than gpu hardware on the cpu die itself. This could suggest that Intel is trying to keep its options open for MCM. And, since Intel is moving to Nehalem in 2009 along with Geneseo we have to assume that it intends to keep the northbridge since Geneseo has no provision for direct cpu connection. However, this also suggests that CSI is not nearly as powerful as has been suggested.

Finally, this would fundamentally be a brute force approach to acceleration. For example you could easily add four accelerator cards to the northbridge PCI-e ports while leaving all of the southbridge PCI-e ports open for heavy duty server networking. This would be a good overall approach to get around the limitations of either the FSB or CSI. Intel may have this in mind as a contender to get back into HPC.

Now, having said that, I guess we can cover what it isn't. It isn't Fusion and it isn't Torrenza. It looks like Intel is trying hard to jumpstart acceleration before AMD can get up to speed. This makes sense because AMD has suggested a staged approach and also because the full blown implementation from AMD would bury this.

Why? Because AMD has talked about extending the actual X86 instruction set to handle accelerators. And, with an ISA extension it really isn't that important if the hardware is on-die or not since AMD has low latency, cache coherent communication with ccHT via either another processor socket or via HTX.

The bottom line is that Intel needs to get this rolling before AMD gets CTM, Close To Metal, moving. If Intel can sway developers away from CTM then they have a chance otherwise AMD is looking to have the advantage by 2009.

Scientia from AMDZone said...

I don't know how many people actually check the links like the one I gave for CTM.

Intel's announcement is damage control. We'll have to see how successful it is. Keep in mind that they currently have nothing but a technical paper and a software model demo to go with the EXOCHI software environment. Intel needs to get some real momentum going for EXOCHI.

AMD in contrast has had both software and Stream hardware along with Torrenza and a development plan since November, 7 months ago:

Today more than 60 companies and research institutions are taking part in CTM trial programs.

“Using CTM today, AMD is working with a number of companies to deliver the tools ecosystem for stream computing,” said Marty Seyer, senior vice president, Computational Product Group, AMD.

“As part of our Torrenza initiative and CTM, AMD is enabling companies to work with best-of-breed vendors that understand how to optimize their software across all processor architectures, whether in stream processors or high-performance CPUs."

"For these organizations, the development of highly capable, and efficient software is their business, not a sideline. Allowing open innovation to flourish will ultimately enable better software, with more features to come to market faster than any proprietary approach.”

CTM is available to developers to license today at no cost.

Wise lnvestor said...

Scientia from AMDZone said...

I don't know how many people actually check the links like the one I gave for CTM.

I did, and it looks mighty impressive. I guess that's why nvidia chose to launch Tesla; they probably sensed which way future HPC will compute and changed their strategy accordingly.

Pop Catalin Sever said...

SoundBlaster or 100% Compatible

all over again. These technologies that employ tight coupling and a fixed instruction set plus vendor tie-ins don't lead to any kind of innovation, only stagnation.

The driver model employed by GPUs, and an abstract virtual machine that allows each vendor to implement the set of features it desires, has led to much more innovation in the GPU area.

For this reason I think developers will skip EXOCHI or CTM and use CUDA or even DirectX to access the acceleration capabilities of GPUs or other stream processors in the future.

Ho Ho said...

scientia
"The issue concerns Intel and once again Intel is not rushing to release > 3.0Ghz processors"

It doesn't release them because there is no need to. If it really wanted to, it could raise the TDP to around 170W and sell CPUs with high clocks. After all, its old Xeons have roughly the same TDP and are being sold to whoever wants them.


"The bottom line is that Intel needs to get this rolling before AMD gets CTM, Close To Metal, moving."

Compared to Cuda, CTM is a rather bad thing. One is direct assembly access to the GPU, the other is a C library with lots of nice functionality for using G80 GPUs for computation. From that I'd also say that NVidia is a much greater competitor to AMD than Intel is in that area.


"CTM is available to developers to license today at no cost."

So is Cuda

Ho Ho said...

If these numbers are correct then I wonder at what frequencies the low power quad cores that AMD was benchmarking some time ago run. Something around 1.6GHz, perhaps? At least the pricing seems good enough to compete against Clovertowns

bk said...

Can someone explain how CUDA works? I assume that one would need to compile the c code for the nVidia GPU. If this is the case, I cannot see how it would be anything more than a niche.

Ho Ho said...

scientia
"I don't know how many people actually check the links like the one I gave for CTM."

What exactly is CTM? It is just low-level access to the GPU that bypasses the graphics API and executes the regular shader assembly code on the GPU. Nothing too fancy, I'd say. On the other hand Cuda is a full blown C library with regular C functions and operations instead of the ASM you have to use with CTM.

To me it seems as if CTM was actually damage control against Cuda. NVidia is far ahead of AMD in the GPGPU area, and in order to not lose any more marketshare to it they quickly released a competing product. Then again I don't really understand why they bother. NVidia has been the supreme ruler of high-end workstation graphics for a long time. IIRC last year NV had more revenue on its Quadro cards than the whole of ATI put together.

Intel just does as it has almost always done: it does its thing quietly in the corner and doesn't let others disturb it. Once they have something they release it, and just by having a massive share of the CPU market there usually is little problem with getting enough marketshare. It might think Cuda is a competitor because of the great HW and SW available for it and NVidia's big marketshare in high-end graphics and workstations. CTM has pretty much nothing compared to Cuda, except for the press releases on the AMD homepage talking about tens of companies using it.

BTW, Tesla is simply specially targeted HW for Cuda. It has more RAM, and thanks to having no video outputs it can be cooled better and clocked higher. You can use Cuda on any other G8x series GPU if you like.


"Keep in that they currently have nothing but a technical paper and a software model demo to go with the EXOCHI software environment"

From the paper I read that they ran some tests with the GMA X3000 as the coprocessor and achieved up to 12x speedups. That means they must have more than simply a paper available. Also, it seems to be a rather interesting concept. Certainly it would be much easier to use and program than Cuda or CTM.

Btw, what do you think about Fusion being pushed back to 2009 and desktop versions of K10 being available around March-April next year?


bk
"Can someone explain how CUDA works?"

It is just a regular C library with lots of functions that will run on GPU.


"If this is the case, I cannot see how it would be anything more than a niche."

Yes, it is a niche. The same niche that GPGPU programmers have been filling for the last few years with regular GPUs, using them through graphics APIs. Cuda and CTM just provide them with a much better interface to the GPU's computational resources.

Ho Ho said...

Also from what I know, in order to submit a paper to Siggraph you have to submit it almost a year before the event. That means that EXOCHI paper was started long before Cuda/CTM were released.

Scientia from AMDZone said...

ho ho

"Also from what I know, in order to submit a paper to Siggraph you have to submit it almost a year before the event. That means that EXOCHI paper was started long before Cuda/CTM were released."

You are definitely confused. AMD's equivalent of CUDA was ATI's DPVM which was at SIGGRAPH (with its own paper) in 2006. DPVM is also a high level interface rather than assembler.

I have no idea how you came to the conclusion that AMD needed to do damage control for something that was ready a year ago.

Scientia from AMDZone said...

ho ho

"It doesn't release them because there is no need to. If it really would like to it could upgrade TDP to around 170W and sell CPUs with high clocks. After all its old Xeons have roughy the same TDP and are being sold to whoever wants them."

You are still confused on this as well. The problem is not TDP. As far as I can tell, Intel would be able to deliver a 3.2Ghz quad core and still fit within 130 watts. The problem as I've already explained is that a 2.93Ghz Kentsfield is already over Intel's specified temperature limits with a stock HSF. This is not a huge gamble for Intel since it is extremely unlikely that anyone would run something as thermally stressful as TAT for any length of time. However, with 3.2Ghz, Intel would go from being just slightly over its own thermal guidelines to significantly over.

Now, as I've already said: Intel could release a 3.2Ghz chip IF they required premium cooling. This would be fine for Voodoo PC or Alienware but apparently Intel doesn't see enough of a market to bother with it. This means that any vendor who did this would have to warrant the chips on their own.

ck said...

I recall from somewhere (Wiki?) that CTM opens up the GPU architecture (down to the assembly level) for developing high-level programming tools (for other programming languages also), while CUDA is just the C language with extensions (for making the GPU calculate). So are they the same!? I don't know. Oh, forgive my ignorance. :(

BTW, I would like Scientia to explain how DPVM (I have seen slides here that call it "CTM" in the implementation section) works, please...

Ho Ho said...

scientia
"AMD's equivalent of CUDA was ATI's DPVM which was at SIGGRAPH (with its own paper) in 2006"

So AMD actually had nothing, it just bought it from ATI? Sure, that really makes AMD the leader in the area and Intel the one who is trying to catch up, even when they are doing rather different things.


"DPVM is also a high level interface rather than assembler."

Then why did they dump the high-level thing and release the ASM-level stuff under a new name?


"I have no idea how you came to the conclusion that AMD needed to do damage control for something that was ready a year ago."

It was ready a year ago? I thought you showed us a link to an official announcement made in February this year.


The next thing I'll probably see is someone saying that Intel is releasing CSI as a reaction to AMD releasing HT, or that Penryn having the same SSE4 functions that are in K10 is just to keep up with AMD. Same with native quad core.
[/sarcasm]

Doesn't anyone think that this might actually be the way things are moving naturally and that different groups are researching the subjects independently?


"The problem as I've already explained is that a 2.93Ghz Kentsfield is already over Intel's specified temperature limits with a stock HSF"

Are you sure you are not mixing up Tjunction and Tcase temperatures?

ck said...

Hoho,

Yes, ATI was the FIRST to develop something that NVIDIA didn't have in mind back then in 2006, if I read the dates correctly.

And, if ATI did not have something like that, what do you think AMD purchased ATI for, graphics and chipsets? Or just Fusion only?

ATI leads in handheld chips, "a billion devices", and what else? Servers? Absolutely not: for AMD servers they call on Broadcom or NV chipsets, and Intel has its own chipsets, so ATI can only provide the 2D graphics (Rage or Mach, I don't know...). Desktops? Nope either; even opening up CrossFire on Intel chipsets doesn't help a lot, since SLI has the upper hand in popularity and performance (especially in the era of the Y-dongle and software CrossFire). Desktop chipsets? No again; just look at the southbridge development after NV acquired ULi. The RAID and USB problems are still being criticized even with the SB600 out (please show us some promise with the SB700!!), and CrossFire does not even run at full x16 bandwidth (RD790 only runs x8-x8-x8-x8, while nF680a has dual x16-x8, though through a different combination of chips as explained before). Desktop graphics? Not again; heavily beaten by NV's G7x. Why? Delays, the same as with the R600. (Nope, even AMD can't help ATI with these problems through the acquisition; they have their own problems, namely K10, got it?)

Professional graphics, FireGL/FireMV? No... Same as the desktop graphics...

That leaves HPC, and thus GPGPU/stream processing, which ATI did research on some time ago.

At least at this very moment, large support can be seen for the AMD Stream Processor (the X1800 GPU) and CTM, including PeakStream (acquired by Google; Google with stream processing, ouch!), and earlier support for Folding@Home in the past year, much earlier than the GF7 series.

And yes, CTM does not have its own language, but so what; it is already one year old and supported by many parties. It is what you would call a leader, and that pretty much explains the word itself.

And the Intel integrated memory controller and point-to-point stuff: yes, it comes from research, but only after AMD released K8 with these. Point-to-point interconnects provide more bandwidth, even in multi-core environments (remember the bottlenecks of dual dual-core processors; Intel "demo'ed" that the 1066FSB has not reached its "max. threshold", but the bottlenecks exist), and are also good for accelerators and co-processors, as seen with HyperTransport. An integrated memory controller hides latency, which is good for a processor but bad for overclocking! Why? Implementing p2p interconnect links and a memory controller on die makes the die more complicated, and thus bad for OC.

On top of that, Intel does not use the native design because 1. it is easier to deploy (two dual-core Conroe dies with a 1066/1333FSB make a Kentsfield, which comes cheaper) and 2. it reduces defective chips, since more cores mean more complexity, and if the one die is defective the whole processor is *boom*, gone (a similar situation to Cell BE, with 8 cores in total and 1 not in use; though Cell BE is not x86, users may end up with a 7-core Cell without affecting performance). K10 is far more complicated (as can be told from the 12 layers of metal interconnect), so maybe more faults (just can't trust AMD fabs, go outsource to IBM fabs!), and I think Intel is actually making the better move (though I think Intel may eventually come up with a "native" quad-core design for upcoming octal-core processors codenamed "whatever", or go the other way round and integrate 4 dies onto one package, which is simply better than native quad-core dies, and to heck with the "native" processor claims).
--
BTW, I thought AMD announced SSE4a (a subset of SSE4) first, before Intel did at IDF, so it is not a "copy" as you said; instead Intel and AMD made an agreement that AMD can use SSE "whatever" in their processors without legal trouble, right?
--
So what are the differences between Tjunction and Tcase?

So a Core 2 Extreme QX6800 has a >120W TDP (as labelled, not typical or max.); maybe a QX6900 would be 150W, which is way too far from being "okay" in the guidelines?

Anyway, one thing I want to say is: "go get the process improved and make the TDP figures prettier before thinking of speed bumps".

Unknown said...

Well, it is official:

K10 (aka K8L) = 2.0GHz and launch in August (my assumption is late August)....

With the speculation on what AMD is launching over, we can now increase our certainty that AMD will be out of business in 2008...

Scientia from AMDZone said...

ho ho

Yes, I believe I was mistaken about ATI having a high level interface. However, there is at least one third party high level interface based on CTM.

Yes, I'm quite certain about tCase. Kentsfield exceeds its tCase temperature at 2.93Ghz with stock HSF. Even though Intel is exceeding its own guidelines, this isn't a big gamble because it is very unlikely that anyone would run anything as thermally stressful as TAT for any length of time. However, 3.2Ghz would be quite different. Apparently, Intel won't take that big of a risk with stock HSF and apparently they don't see enough of a market to package 3.2Ghz with a premium cooling requirement.

real

"With the speculation on what AMD is launching over, we can not increase our certainty that AMD will be out of business in 2008... "

Don't bet your house on that. For different reasons, AMD will not go bankrupt.

enumae said...

Scientia

“AMD has prioritized production of our low power and standard power products because our customers and ecosystem demand it...

The source was the AMD news release.

Do you believe the AMD press release, or is AMD trying to spin this and put it into a positive light?

Ahmar Abbasi said...

Well, so much for scientia sitting in AMD's office looking at 2.5Ghz Barcelona chips running well.......


The forthcoming microprocessor -- a badly needed weapon to counter recent gains by rival Intel Corp. -- will go on sale to computer makers in August at an initial clock speed of up to two gigahertz, said Randy Allen, corporate vice president in AMD's server and workstation division.

http://online.wsj.com/article/SB118310081919852862.html?mod=googlenews_wsj

Apparently AMD is hiding their true performance until Nehalem comes out......:P

Anonymous said...

AMD to Launch "Barcelona" Slow this August

AMD announced today what analysts have been dreading for months: the company will launch its next-generation architecture this August, at top-out frequencies of 2.0 GHz

So much for 2.6GHz at launch huh Scientia, like you predicted?

“AMD has prioritized production of our low power and standard power products because our customers and ecosystem demand it, and we firmly believe that the introduction of our native Quad-Core AMD Opteron processor will deliver on the promise of the highest levels of performance-per-watt the industry has ever seen,” added Randy Allen, corporate vice president of AMD's Server and Workstation division.

In other words, AMD can't scale K10 for shit and needs to release half baked CPUs so that they don't lose more marketshare.

http://www.theinquirer.net/default.aspx?article=40606

120W only @ 2.4GHz? I can't imagine what it will be like @ 3GHz.

Penryn @ 3.33GHz will easily frag Barcelona at a measly 1.9-2GHz. By the time AMD rolls out a 2.4GHz K10, Nehalem will be out!

gdp77 said...

After that :

http://theinquirer.net/default.aspx?article=40680

It is clear that Hector made AMD's name a total joke. The only thing they can do now is sell chips in third world countries. (if they don't BK in Q1-Q2 08, that is)

CPU innovation and evolution will be slower now as Intel will have no opponent. Also high-end cpu prices will hit the roof as there will be only one choice. This is the end of an era.

Thank u AMD for what u gave us all these years. Shame on you Hector for what u have done.

RIP AMD

gdp77 said...

"Don't bet your house on that. For different reasons, AMD will not go bankrupt."

Please do tell us about it. At least according to my logic AMD can't escape BK now...

Ahmar Abbasi said...

2.4ghz barcelona MAY rear its head later on

"Sources close to AMD's plans in Taiwan tell the INQ that AMD hopes to get 2.5GHz bins out of Barcelona, although dates for such beasts are not yet available."

2.4Ghz Barcelona at 120 watts. Let's start the barbecue......and the BS about performance per watt........

http://www.theinquirer.net/default.aspx?article=40606

Roborat, Ph.D said...

AMD in its latest Barcelona announcement said in a subtle way that the new processor will not be about performance but rather about energy efficiency and performance-per-watt.

Remember this post back in April which was highly ridiculed here?
AMD is DOOMED
or how about this in May which was dismissed as nonsense?
Barcelona DOA

Normally at this point I feel I am entitled to my last laugh, but since I'm not the salt-rubbing type, I decided instead to give another market prediction, which I am sure by this time some of you have wised up enough to pay attention to.

My prediction is about DTX and the mini-DTX that's been thrown around here like some sort of Christ. DTX will not make any impact whatsoever. Industry support is negligible if not token since everyone knows that most new form factors these days tend to be OEM proprietary. Look at Dell, Sony VAIO etc. And on top of that, the market for DTX is completely overlapped by something more attractive and similarly priced - the laptop.

Scientia from AMDZone said...

enumae

I'm writing that article now. I'll see if I can get it posted in a few hours.

ring

I never mentioned 2.5Ghz.

poke

I never mentioned 2.6Ghz.

Why do you have so much trouble quoting me? What I said was that AMD has 2.4Ghz chips. So, it was my assumption that they would bump up the release speed from 2.3 to 2.4. Now it is apparent that they don't have enough 2.4 (nor even enough 2.2) for launch. So, they are having more problems than I thought.

Scientia from AMDZone said...

roborat

"Remember this post back in April which was highly ridiculed here?
AMD is DOOMED"


Yes, your premise was incorrect. As I tried to explain to you before the spec numbers are simply projections of K8 and have nothing to do with K10. This is what my analysis showed but it also tells this in the fine print at the bottom of the chart.

"or how about this in May which was dismissed as nonesense?
Barcelona DOA"


Yes, that laughable article was based on some unofficial Pov-ray scores. Why do you expect to be taken seriously when you cherry pick anti-AMD items?

Have you yet realized that you pounced on the bogus story that AMD was selling all of its FABs in 2008? Why were you so quick to assume that that story was true? It couldn't be because you have a bit of bias against AMD could it?

"Normally at this point I feel i am entitled to my last laugh"

What would you be laughing about? I guess you could laugh about your estimate that Intel could make quad cores for 1/3rd of the price that AMD could. I still find that one quite amusing.

"DTX will not make any impact whatsoever. Industry support is negligible if not token"

DTX will help on the desktop low end. Why are you trying to pretend that DTX is more important than that? Is this another one of your strawman arguments?

"DTX is completely overlapped by something more attractive and similarly priced - the laptop."

That is probably one of the more ridiculous things you've said. Neither DTX nor mini-DTX is replaced by mobile.

The path for DTX and mini-DTX was pretty much set in stone when ordinary CRT monitors began to be replaced by much lighter LCD displays. I've hefted 17" monitors plenty of times and they are pretty heavy. The much lighter LCD displays don't even weigh as much as a 12" monitor.

This trend also follows the move away from the original parallel port cable for printers and scanners to USB and the move away from dedicated keyboard and mouse ports to USB, not to mention the stampede away from ethernet cables to wireless LAN. This all reduces clutter and makes a trend toward a simpler, quieter, cheaper desktop.

BTW, I'm typing this on my notebook, which is sitting on an expansion base. My notebook is not a true laptop but actually a desktop replacement. This is as far as you can go in pushing a notebook towards doing the job of a desktop computer, and I can say without any doubt at all that it doesn't fill the role.

Notebooks are inherently more expensive than the equivalent desktop unit. Notebooks also tend to have smaller displays unless you have a desktop replacement in which case you have a lot less battery life. Mine is 2 hours max. My notebook is also quite large and heavy; it barely fits on my lap.

No, it is common sense that for the equivalent value of system (other than portability) you can get a desktop PC for half the price of something mobile. There is no way that the low end desktop market is going away anytime soon.

Scientia from AMDZone said...

gdp77

"Please do tell us about it."

Not in comments; too long. I might put it in an article.

"At least according to my logic AMD can't escape BK now... "

I would imagine your logic is fine but you aren't seeing the big picture. AMD won't go bankrupt.

Roborat, Ph.D said...

“AMD is DOOMED"
Scientia said: “Yes, your premise was incorrect. As I tried to explain to you before the spec numbers are simply projections of K8 and have nothing to do with K10.

You can try and dance around the truth all you want but at the end of the day what I said about Barcelona was indeed accurate. Your K8 projection theory on K10 is pointless, doesn’t really add any value and doesn’t really change the outcome. Who cares where the numbers come from? But when I hear AMD make statements with qualifiers such as “at the same frequency”, it doesn’t take a genius to conclude Barcelona is broken. Your insistence on the contrary is your mistake.

Barcelona DOA

Scientia said: “Yes, that laughable article was based on some unofficial POV-ray scores. Why do you expect to be taken seriously when you cherry pick anti-AMD items?”

The POV-ray scores weren’t the only focal point of that story; rather, it was the circumstances around them that made it very significant. Did you even ask yourself the question that if the POV-ray score wasn’t a fair representation of Barcelona’s capability, why didn’t AMD just run another benchmark? The fact that AMD couldn’t counter with anything is what made the whole world skeptical about K10. In the end everyone’s skepticism was justified. AMD ships Barcelona at 2GHz at best. Looking back, can you seriously tell me now those POV-ray scores weren’t an accurate picture of Barcelona’s current state? Once again, your insistence on the contrary is your mistake.

Scientia asked: “What would you be laughing about?”
Now that is a very tempting question because I can easily make a very long list.

Scientia from AMDZone said...

roborat

"what I said about Barcelona was indeed accurate."

You believe that even when the basis of your argument is completely wrong, you are still accurate? Strange indeed.

" Your K8 projection theory on K10"

Curious isn't it that my "theory" agrees with what AMD says in the fine print. You never bothered to check the numbers, did you?

"Who cares where the numbers come from?"

Oh, I see. In your ridiculous article you boldly state that AMD must have made up the numbers. Yet, when I run the numbers they match perfectly with what AMD says in the fine print. Now that you know that you are wrong you've suddenly decided that the numbers aren't so important after all.

"it doesn’t take a genius to conclude Barcelona is broken."

You are correct. It takes someone fairly foolish to conclude that. I'm still baffled what twisted version of logic you are using to conclude that if K10 has 20% greater IPC than K8 that it is broken. At 2.5Ghz K10 should be able to match K8 at 3.0Ghz and a 2.7Ghz dual core should be faster than the current fastest K8. I guess you must have thought K8 was broken too when it was released.

"The POV-ray scores wasn’t the only focal point of that story"

Your only basis of comparison was the pov-ray numbers but you now claim that it wasn't the focal point? That is a curious point of view because without the pov-ray numbers you have nothing in your article.

"if the POV-ray score wasn’t a fair representation of Barcelona’s capability, why didn’t AMD just run another benchmark?"

Why would they? The only point of the pov-ray scores was to show scaling from two cores to four. They were not trying to set a pov-ray record.

Presumably you've never bothered to read the K10 Optimization Guide. Why don't you try reading it? Then see if you can explain how it would be possible for SSE instructions to be twice as fast in K10 as K8 but not see any speed increase on something supposedly using SSE.

Are you going to fall back on one of your "AMD must be lying" arguments? They made up the numbers in the guide or K10 must be so broken that it only runs as fast as K8? How about a simpler and more rational conclusion like that the pov-ray code they ran wasn't using SSE?

"In the end everyone’s skepticism was justified."

Justified by what exactly? You've still never seen a set of K10 benchmarks. Or are you actually trying to suggest that your incorrect assessment of K10's performance per clock is somehow proven by the 2.0Ghz launch speed? I'm sorry but these two things are not related.

"AMD ships Barcelona at 2GHz at best."

Yes, 2Ghz is a bit slow but that would have no effect on your argument which is based on performance at a given clock rather than absolute performance. Even with a slow 2Ghz launch speed you would still be wrong.

" Looking back, can you seriously tell me now those POV-ray scores wasn’t an accurate picture of Barcelona’s current state?"

Yes, I can. We wouldn't see any big change in speed for scalar FP code but any code running SSE on K10 would be much faster than K8 at the same clock.

Scientia from AMDZone said...

roborat

You still don't get it, do you? You spend a lot of time on your blog writing counter articles to mine while I spend no time at all writing counter articles to yours.

Have you produced a single article on your blog that wasn't a counter article or a cherry picked news item?

My real arguments are stated incorrectly on your blog all the time. However, I've stopped making corrections on your blog because you only seem interested in bolstering your support by letting all the gutter rats post there. It takes no personal integrity to throw up a comment like, "You're an idiot. AMD sux. Intel roolz. AMD got pwned. AMD is going bankrupt. Grunt, grunt, slobber, slobber." signed anonymous.

If someone is so embarrassed by what they write that they have to hide behind an anonymous posting, then how likely is it that they have anything important to say? How important of a document would the Declaration of Independence have been if every signature had been anonymous?

And, it certainly seems like a contradiction to me that someone would claim that my articles are full of errors and yet, rather than coming here and laying out these supposed errors in a simple and straightforward fashion, they instead hide on another blog where they can take potshots from a safe distance.

The only reason you posted here is because you thought that an unfavorable news item about AMD gave you some kind of aura of righteousness in your crusade against AMD and you thought you could come here and gloat. Are you really that blind and foolish? Look at Tracking AMD sometime. It isn't some AMD fansite cherry picking news (as you do). Even though it has the word "AMD" in the title they post all news about AMD, both good and bad. Look at the home page at AMDZone sometime. Again, even though it is about AMD, Chris Tom (you know, the guy you claimed was delusional) links to both positive and negative news about AMD.

Why would you ever think you could ride the coattails of a negative announcement and come here and post a bunch of nonsense while tooting your own horn as loudly as you could? The news is just news and you still have to stand on your own two feet. If you don't have anything substantial to say then go back to your own blog. I'm sure the gutter rats there have been whooping and hollering over this news and they'll be happy to cheer you up.

jsrivo said...

scientia

I applaud how you try to reason with someone who so obviously has nothing but contempt for AMD. But the truth is, your voice of reason falls on deaf ears. Roborat will never see anything positive about AMD. He rejoices in every bit of bad news about AMD, be it unfounded or factual. It seems to me that this guy lives to bash AMD.

I sincerely hope for AMD's successful comeback, if for nothing else than to shut up this troll. I know this may not happen in the near future, but if AMD did it once when the circumstances were much worse than they are now, I'm sure AMD can do it again.

Anonymous said...

"You are correct. It takes someone fairly foolish to conclude that. I'm still baffled what twisted version of logic you are using to conclude that if K10 has 20% greater IPC than K8 that it is broken. At 2.5Ghz K10 should be able to match K8 at 3.0Ghz and a 2.7Ghz dual core should be faster than the current fastest K8. I guess you must have thought K8 was broken too when it was released."

If K10 has only 20% greater IPC than K8, it would still have a tough time against Conroe/Kentsfield at high clocks and a really hard time against Penryn.

But Randy is claiming that a quad-core Barcelona at 2.0GHz will beat dual-core Opty at 3.0GHz by 40-50 percent. I lost the link but I'll find it for you if you want.

So, let's give AMD the benefit of the doubt and say that a quad core Barcelona is 50% faster than current K8 dual core Opterons.

2 * 3GHz dual core K8 Opteron = 6
If the K10 quad core is 50% faster, then

1.50 * 6 = 9
Spread that 9 across 4 K10 cores running at 2GHz: 9 / 4 = 2.25 per core, and 2.25 / 2GHz = 1.125
versus 6 / 2 = 3 per K8 core and 3 / 3GHz = 1.0, so roughly 12% faster per core per clock.

Of course there are other ways to interpret what Randy is saying so I'm not going to claim this as fact... but so far K10 doesn't seem to be looking good. You don't agree Scientia? You know, Randy did say K10 will smash Intel's best quad core offerings by 40%+ across all types of apps but it only seems to be true for specfp_rate and seriously, it doesn't mean crap. Quad FX-74 crushes QX6800s in specfp_rate but in the real world, QX6800 makes Quad FX-74 look like a complete joke.

Scientia from AMDZone said...

poke

You've made a good attempt but you are getting the numbers all jumbled.

First of all, AMD has not yet given any comparison with Opteron that I'm aware of.

The original statement was that Barcelona would be 40% faster than Clovertown. However, this was when Clovertown was at 2.66Ghz. When Intel bumped the speed up to 3.0Ghz they changed to 50% faster at the same clock. This was for SSE. They gave a speed of 20% faster at the same clock for Integer.

Now, what you need to understand is that neither of those figures is based on K10. Those numbers are based on simple projections of K8 to quad core and are based solely on the spec benchmarks.

Kentsfield/Clovertown bogs down quite a bit from Conroe's speed (they have poor scaling) so you can't compare these numbers with Conroe. Also, from early tests plus Intel's statements we know that Penryn is about 13% faster at the same clock as Clovertown. So, even if we take these numbers at face value we would only get about 6% faster than Penryn in Integer at the same clock. But it looks like it is going to be a while before K10 is at higher speeds.
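The arithmetic behind that "about 6%" figure, taking the quoted numbers at face value:

# K10 claimed ~20% faster than Clovertown at the same clock on integer;
# Penryn roughly 13% faster than Clovertown at the same clock.
k10_vs_clovertown    = 1.20
penryn_vs_clovertown = 1.13

k10_vs_penryn = k10_vs_clovertown / penryn_vs_clovertown
print(f"K10 vs Penryn at the same clock: ~{(k10_vs_penryn - 1) * 100:.0f}% faster")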

enumae said...

Scientia

What's your take on this statement...

"With planned availability at launch in a range of frequencies up to 2.0 Ghz, AMD expects its native quad-core processors to scale to higher frequencies in Q407 in both standard and SE (Special Edition) versions. Designed to operate within the same thermal envelopes as current generation AMD Opteron processors, AMD estimates that the new processors can provide a performance increase up to 70 percent on certain database applications and up to 40 percent on certain floating point applications, with subsequent higher frequency processors expected to significantly add to this performance advantage."

If I read this right, going from Dual to Quad will add a 70% increase on Integer based applications, and 40% on Floating Point applications while staying within the same thermal envelope (or similar clock speed)?

Things don't look to scale all that well, or am I missing something here?
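One way to read those numbers (this is just my formalization of the interpretation above, not AMD's methodology): going from 2 cores to 4 at a similar clock, perfect scaling would be +100%, so the quoted gains translate into rough scaling efficiencies.

# Rough scaling efficiencies implied by the quoted gains when going from
# a dual core to a quad core at a similar clock (interpretation, not AMD data).
claims = {"database (integer)": 0.70, "floating point": 0.40}

for workload, gain in claims.items():
    efficiency = (1 + gain) / 2.0   # quad delivering (1 + gain) x the dual
    print(f"{workload}: +{gain:.0%} quoted -> {efficiency:.0%} of ideal 2x scaling")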

Anonymous said...

Scientia,

"First of all, AMD has not yet given any comparison with Opteron that I'm aware of."

It's from the Wall Street Journal; you need to be a member, but here is the link,

http://online.wsj.com/article/SB118310081919852862.html?mod=rss_whats_news_us

SAN FRANCISCO -- A long-awaited chip dubbed Barcelona from Advanced Micro Devices Inc. will initially be a bit slower, and arrive a bit later, than some people expected.

The forthcoming microprocessor -- a badly needed weapon to counter recent gains by rival Intel Corp. -- will go on sale to computer makers in August at an initial clock speed of up to two gigahertz, said Randy Allen, corporate vice president in AMD's server and workstation division.

AMD had not previously spelled out the expected clock speed of Barcelona, a code name for an addition to AMD's Opteron line that packs the equivalent of four electronic brains on one piece of silicon. But Mr. Allen acknowledged that some industry watchers had speculated AMD would initially deliver clock speeds of 2.7 gigahertz to 2.8 gigahertz.

Clock speed, or operating frequency -- a measure of internal timing pulses in a chip that is sometimes compared to revolutions per minute in a car engine -- is only one contributor to computing performance. Mr. Allen estimated that the first versions of Barcelona still will be 40% to 50% faster than existing Opteron chips, which have two processors.

But any initial performance advantage over Intel will be less clear-cut. On some jobs, Barcelona will beat Intel's fastest chips, and on some jobs Intel may report better results, Mr. Allen said. In February, by contrast, AMD had asserted that Barcelona would have a 40% performance advantage over Intel's products.

Anyway, any comments Scientia?

enumae said...

Here is a link outside the WSJ, and it says things a little differently.

""We will be seeing a performance boost of 40-50 per cent above our highest frequency dual-core products that are available today," AMD VP Randy Allen told us.

Link.

Unknown said...

roborat, intel fanatic wrote:
The POV-ray scores weren’t the only focal point of that story; rather, it was the circumstances around them that made it very significant. Did you even ask yourself the question that if the POV-ray score wasn’t a fair representation of Barcelona’s capability, why didn’t AMD just run another benchmark? The fact that AMD couldn’t counter with anything is what made the whole world skeptical about K10. In the end everyone’s skepticism was justified. AMD ships Barcelona at 2GHz at best. Looking back, can you seriously tell me now those POV-ray scores weren’t an accurate picture of Barcelona’s current state? Once again, your insistence on the contrary is your mistake.


Please, take a look at this:
Quote: To add insult to injury, when DailyTech benchmarked the pre-production 1.6 GHz Barcelona, the CPU did not match Intel's 65nm quad-core offering clock-for-clock. AMD engineers stress to DailyTech that this benchmark was premature, and that final silicon and software will allow for SSE optimizations and better performance.

And this one comes from DailyTech.

abinstein said...

"We will be seeing a performance boost of 40-50 per cent above our highest frequency dual-core products that are available today,"

That is just what I estimated in my blog comment. Just search for 44% on the linked page. It's (almost) a no-brainer. They probably made the comparison based on SpecInt_rate 2006.
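For what it's worth, here is one way such a figure could be reconstructed; this is purely an illustrative sketch, not abinstein's actual math. A throughput benchmark like SPECint_rate scales roughly with cores times clock times per-core IPC, so an idealized estimate looks like:

# Illustrative only: idealized rate-benchmark estimate, not abinstein's numbers.
def rate_gain(new_cores, new_ghz, old_cores, old_ghz, ipc_ratio=1.0):
    # Percent gain assuming throughput ~ cores * clock * per-core IPC.
    return (new_cores * new_ghz * ipc_ratio) / (old_cores * old_ghz) - 1

# Example inputs only; the dual-core clock and IPC ratio are assumptions.
print(f"{rate_gain(4, 2.0, 2, 2.8):.0%}")  # ~43%, inside the quoted 40-50% range

Different assumed dual-core clocks or a per-core IPC advantage for K10 move the result around, which is probably why a 40-50% range gets quoted rather than a single number.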

Anonymous said...

enumae,

"Here is a link outside the WSJ, and it says things a little differently."

""We will be seeing a performance boost of 40-50 per cent above our highest frequency dual-core products that are available today," AMD VP Randy Allen told us.

From the Wall Street Journal,

"40% to 50% faster than existing Opteron chips, which have two processors."

No offense, Enumae, but that's basically the same thing. If you can't see that, I don't know what to tell you. Randy is saying that K10 @ 2GHz will be 40%-50% faster than the fastest dual-core Opteron, which isn't amazing because Clovertown/Kentsfield can already do that, if not better.

There is already a 50W Clovertown coming, so I don't see how anyone can go with a 1.9-2GHz 95W K10 unless it outperforms Clovertown at the same frequency by 40% (which won't happen).

abinstein said...

"There is already a 50W Clovertown coming, so I don't see how anyone can go with a 1.9-2GHz 95W K10 "

The 50W Clovertown is going to be only 1.6GHz. It will be noticeably slower than a 1.9GHz HE K10.

If what AMD claims is true, then K10 will get even with Clovertown, clock-for-clock, on integer throughput and surpass it on floating point. The 2.33GHz and 2.66GHz Clovertowns will still give a 2.0GHz Barcelona a run, but these C2Q chips require a 1333MHz FSB to perform well, not to mention the 2.66GHz Clovertown carries a 120W Intel TDP.

Periander said...

Say what? 50W Clovertowns have been available at 1.6 and 1.86GHz since April. A 2.0GHz 50W part is scheduled to be introduced during Q3.

enumae said...

Poke
"No offense, Enumae, but that's basically the same thing."

See, this I can't stand... I posted a link that says something slightly different, but you could actually get to it!

"If you can't see that, I don't know what to tell you."

Read above.

"Randy is saying that K10 @ 2GHz will be 40%-50% faster than the fastest dual-core Opteron..."

Now, aside from all of that, regarding the WSJ article you referred to: I believe I was able to find a portion of it, or possibly the whole thing, and in what I found all it says is...

"Mr. Allen estimated that the first versions of Barcelona still will be 40% to 50% faster than existing Opteron chips, which have two processors."

There is no mention of AMD's fastest dual cores, which is why I said, "Here is a link outside the WSJ, and it says things a little differently."

gdp77 said...

If this is true, I don't know where AMD will find the cash to stay alive until Phenom's release date.

Haven't you said, Scientia, that AMD can still lose $1.5B and not go BK?

Ho Ho said...

gdp77
"If this is true, I don't know where AMD will find the cash to stay alive, until the phenom's release date."

That doesn't really matter much, as there won't be enough K10 CPUs to provide good enough revenue anyway. Also, they still have a lot of K8s already produced that they need to sell.

Pop Catalin Sever said...

If AMD shows up on the market with an inferior, underperforming product when Barcelona finally launches, I really don't see anything that can stop it from going BK except an equity buyout.

So far Barcelona's performance is a real mystery, open to all sorts of speculation. No real numbers can be found anywhere, just ambiguous performance promises from AMD. This is sad because all of the negative press AMD is getting is very damaging, and it's nobody's fault but AMD's. There are lots of people who are already starting to feel deceived by AMD.

It's only the fact that they have been able to deliver K8 with a certain level of quality and performance in the past that makes me hope for a possible redemption for AMD. But the past doesn't give any guarantees for the future, does it?

Unknown said...

More FUD from the Inq???
...But according to DigiTimes, AMD is still on track for a second-half release.

quote: However, AMD has responded to this report by stating that the company's official launch schedule for Phenom remains the second half of 2007 and this schedule has not changed. The company added that it has not contacted motherboard makers concerning a delay to the schedule.

Anonymous said...

"The 50W Clovertown is going to be only 1.6GHz. It will be noticeably slower than a 1.9GHz HE K10."

I don't think so, fanboi. As another poster mentioned, 50W Clovertowns have been available at 1.6 and 1.86GHz, and a 2.0GHz 50W Clovertown is coming soon. Just what you would expect from abinstein. FUD.

Scientia from AMDZone said...

Let's continue any discussion of K10 at the new K10 launch article.