Thursday, August 23, 2007

2008 And Beyond

2007 is far from over but it seems that lately people prefer to talk about 2008. Perhaps this is because AMD is unlikely to get above 2.5Ghz with K10 and Penryn will only have a low volume of about 3%. I suppose this is not a lot to get excited about. So, we are encouraged to cast our gaze forward but what we see is not what we might expect.

AMD's server chip volume has dropped considerably since last year. However, there is little doubt that this trend will reverse in Q3 and Q4 of 2007 with Barcelona. This is true because even at lower clock speeds, Barcelona packs considerably more punch than K8 Opteron at similar power draw. The 2.0Ghz Q3 chips should replace around half of AMD's current Opterons, with faster 2.5Ghz chips replacing even the fastest 3.0Ghz K8 Opterons in Q4. This should leave Intel with two faster server chip speeds in Q4, most likely falling to a single speed in Q1 08. However, Intel may be able to pull farther ahead in Q2 08.

I'm sure this will be confusing to those who are comparing the Penryn launch with Woodcrest last year and assuming that the highest speed grades will be released right away. The problem with this view is that Penryn is leading 45nm in Q4 of this year whereas Woodcrest did not lead 65nm in 2006. Instead, Woodcrest was six months behind Presler, which went into 65nm production in October 2005 and launched in December 2005. This explains why Woodcrest was able to hit the ground running and launch at 3.0Ghz: its June 2006 launch came six months after 65nm Presler in December 2005. Taking this as the pattern for 45nm would mean top initial speeds wouldn't be available until Q2 2008. This seems plausible since Intel has been pretty quiet about Q1 08 release speeds.

If the market expands in early 2008, Intel should get a boost as AMD feels the pinch in volume capacity caused by the scale down at FAB 30 and the increased die size of quad core K10. This combines with Intel's cost savings due to ramping 45nm to put Intel at its greatest advantage. However, by the end of 2008, this advantage will be gone and Intel won't see any new advantage until 2010 at the earliest.

To understand why Intel's window of advantage is so small you need to be aware of the differences in process introduction timelines, ramping speeds, base architecture speed, and changing die size advantages. A naive assumption would be that: 1.) Intel's timeline maintains a process launch advantage over AMD, 2.) that Intel transitions processes faster, 3.) that Penryn is considerably faster than Conroe and that Nehalem is considerably faster than Penryn, and 4.) that Nehalem maintains Penryn's die size advantage. However, each of these assumptions would be incorrect.

1.) Timeline

Q2 06 - Woodcrest
Q3 07 – Barcelona Trailing by 5 quarters.

Q4 07 - Penryn
Q3 08 – Shanghai Trailing by 3 quarters.

Q4 08 - Nehalem
Q2 09 – Bulldozer Trailing by 2 quarters.

Q4 09 - Westmere
Q1 10 - 32nm Bulldozer Trailing by 1 quarter.
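
The trailing figures in the timeline above are simple quarter arithmetic. As a minimal sketch (the launch quarters are the projections listed above, not confirmed dates, and the helper names are made up for illustration), the gaps can be checked like this:

```python
def quarter_index(year: int, quarter: int) -> int:
    """Map a (year, quarter) pair to a running count of quarters."""
    return year * 4 + (quarter - 1)


def quarters_between(lead: tuple, follow: tuple) -> int:
    """Number of quarters by which `follow` trails `lead`."""
    return quarter_index(*follow) - quarter_index(*lead)


# (Intel part, Intel launch, AMD part, AMD launch), taken from the timeline above.
LAUNCHES = [
    ("Woodcrest", (2006, 2), "Barcelona", (2007, 3)),
    ("Penryn", (2007, 4), "Shanghai", (2008, 3)),
    ("Nehalem", (2008, 4), "Bulldozer", (2009, 2)),
    ("Westmere", (2009, 4), "32nm Bulldozer", (2010, 1)),
]

# Gap in quarters, keyed by the AMD part that trails.
gaps = {
    amd: quarters_between(intel_date, amd_date)
    for _intel, intel_date, amd, amd_date in LAUNCHES
}

for _intel, _intel_date, amd, _amd_date in LAUNCHES:
    print(f"{amd} trails by {gaps[amd]} quarter(s)")
```

If any of the projected quarters slips, changing the tuple in LAUNCHES is enough to see how the gap moves.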

Intel's Tick Tock timeline is excellent but AMD's timeline steadily shortens Intel's lead over the next two and a half years. This essentially means that the dominance that C2D enjoyed for more than a year will not be repeated. I suppose it is possible that 45nm will be late but AMD continues to say that it is on track. The main reason I am inclined to believe them is the die size. When AMD moved to 90nm they only had a small shrink in die size at first and then they later had a second shrink. AMD only reduced Brisbane's die size to 70% and nine months later AMD could presumably do a second shrink. But they aren't; Barcelona shows the same 70% reduction as Brisbane. This suggests to me that AMD has skipped a second die shrink and is concentrating on the 45nm launch. I'm pretty certain that if 45nm were going to be late that we would be seeing another shrink of 65nm as a stopgap.


2.) Process Transition

Most people who talk about Intel's process development only know that Intel launches a process sooner than AMD. However, the amount of time it takes Intel to actually field a new process is also important. Let's look at Intel's 65nm history starting with an Intel Presentation concerning process technology. Page 2:

Announced shipping 65nm for revenue in October 2005

CPU shipment cross-over from 90nm to 65nm projected for Q3/06


And, from Intel's website, 65-Nanometer Technology:

Intel has been delivering 65nm processors in volume for over one year and in June 2006 reached the 90-65nm manufacturing "cross-over," meaning that Intel produced more than half of total mobile, desktop and server microprocessors using industry-leading 65nm process technology.

So, we can see that Intel did quite well and even beat its own projection by reaching crossover in late Q2 instead of Q3. October 2005 to June 2006 would be eight months to 50% conversion. For AMD, the INQ had a rumor of 65nm shipping in October 2006 and we know it officially launched December 5th 2006. Let's assume that this is true since it matches Intel's pattern of an October revenue shipping date followed by a December release in 2005. The AMD Q1 2007 Earnings Transcript from April 19th 2007 says:

100% of our fab 36 wafer starts are on 65 nanometer technology today

October 2006 to April 2007 would be 6 months. So, this would mean that AMD made a 100% transition in two months less than it took Intel to reach 50%. Intel's projection for 45nm is very similar, with crossover not occurring until Q3 08. What this means is that even though Intel launches 45nm with a head start in Q4 07, AMD should be completely caught up by Q1 09.
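
The month counts used here can be reproduced with a few lines of code. This is only a sketch: the function name is made up for illustration, and AMD's October 2006 shipping date is the rumored date assumed above rather than an official one.

```python
def months_between(start: tuple, end: tuple) -> int:
    """Whole months from a (year, month) start to a (year, month) end."""
    (start_year, start_month), (end_year, end_month) = start, end
    return (end_year - start_year) * 12 + (end_month - start_month)


# Intel: 65nm revenue shipments (October 2005) to 50% crossover (June 2006).
intel_to_crossover = months_between((2005, 10), (2006, 6))

# AMD: assumed first 65nm shipments (October 2006, a rumor) to 100% of
# FAB 36 wafer starts on 65nm (April 2007 earnings call).
amd_to_full = months_between((2006, 10), (2007, 4))

print(f"Intel to 50% conversion: {intel_to_crossover} months")
print(f"AMD to 100% of FAB 36 wafer starts: {amd_to_full} months")
```

Note that the two figures measure different milestones (50% of all CPU output for Intel versus 100% of wafer starts at one FAB for AMD), so the comparison is only as good as that assumption.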


3.) Base Architecture Speed

Intel made grand claims of a 25% increase in gaming performance (40% faster for 3.33Ghz Penryn versus 2.93Ghz Kentsfield). However, according to Anandtech's Wolfdale vs. Conroe Performance review, Penryn is 4.81% faster while HKEPC gets 5.53% faster. A 5% speed increase is similar to what AMD got when it moved from 130nm to 90nm. The problem that I see is not with Intel's exaggeration but that Nehalem seems to use the same core. In fact, other than HyperThreading there seem to be no major changes to the core between Penryn and Nehalem. The main improvements with Nehalem seem to be external to the core, like an Integrated Memory Controller, point to point communications, L3 cache, and enhanced power management. The real speed increases seem to come primarily from GPU processing and ATA instructions; however, like HyperThreading, these are not going to make for significant increases in general processing speed. And, since Westmere is the same core on 32nm, this means no large general speed increases (aside from clock increases) for Intel processors until 2010 at the earliest. I suppose this then leaves the question of whether AMD will get a larger general speed increase with Bulldozer. Presumably if AMD can manage it they could then pull ahead of Nehalem. Both Intel and AMD are going to use GPUs on the die and both are going to go to more cores. Nehalem might get ahead of Shanghai since, while both can do 8 cores, Nehalem can also do HyperThreading. But Bulldozer moves back ahead again by allowing 16 actual cores. At the moment it is difficult to imagine a desktop application that could effectively use 8 cores, much less 16, but who knows how it will be in two years.
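
Going back to the Penryn gaming claim at the start of this section, the clock-for-clock adjustment can be sketched as follows. This assumes performance scales linearly with clock speed, which is a simplification, and the function name is made up for illustration. Under that assumption the 40% claim works out to roughly 23% at equal clocks, in the same range as the 25% figure quoted above and still far above the roughly 5% seen in the reviews.

```python
def clock_normalized_gain(claimed_speedup: float,
                          new_clock_ghz: float,
                          old_clock_ghz: float) -> float:
    """Fractional gain left after dividing out the clock speed difference.

    Assumes performance scales linearly with clock speed.
    """
    clock_ratio = new_clock_ghz / old_clock_ghz
    return claimed_speedup / clock_ratio - 1.0


# Claimed: 3.33Ghz Penryn is 40% faster than 2.93Ghz Kentsfield.
gain = clock_normalized_gain(1.40, 3.33, 2.93)
print(f"Clock-normalized gain: {gain:.1%}")

# Measured gains quoted above, for comparison.
measured = {"Anandtech": 0.0481, "HKEPC": 0.0553}
for source, value in measured.items():
    print(f"{source}: {value:.2%}")
```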


4.) Die Size

For AMD the goal is to get through the first half of 2008 because the game looks quite different toward the end of 2008. By the time Nehalem is released Intel will already have gotten most of the benefit of 45nm while AMD will only be starting. Intel will lose its small die size MCM advantage because Nehalem is a monolithic quad die like Barcelona. Intel only got a modest shrink of 25% on 45nm and so far has only gotten a 10% reduction in power draw so AMD can certainly stay in the game. It is also a certainty that Nehalem will have a larger die size than quad Penryn. This will be true because Nehalem will have to have both an Integrated Memory Controller and the point to point CSI interface. Nehalem will also add L3 cache. It would not be surprising if the Nehalem die is larger than AMD's Shanghai die. The one positive for Intel is that although yields will be worse with a monolithic die, their 45nm process should be mature by then. However, AMD has shown considerably faster process maturity so yields should be good on Shanghai in Q1 09 as well.

An Aside: AMD's True Importance

Finally, I have to say that AMD is far more important than many give them credit for. I recall a half-baked editorial by Ed Stroligo, A World Without AMD, where he claimed that nothing much would change if AMD were gone. This notion shows a staggering ignorance of Intel's history. The driving force behind Intel's advance from 8086 to Pentium was Motorola, whose 68000 line was initially ahead. It had been Intel's intention all along to replace x86 and Intel first tried this back in 1981 with iAPX 432. Its segmented 16MB addressing looked pretty good compared to 8086's 1MB segmented addressing. However, it looked a lot worse than 68000's flat 16MB addressing which had been released the year before. The very next year iAPX 432 became the Gemini Project which then became the BiiN company. iAPX 432 continued in development with the goal of replacing x86 until 1989. However, this project could not keep up with the rapid pace of x86 as it struggled to keep up with each generation of 68000. When BiiN finally folded, a stripped down version of iAPX 432 was released as the embedded i960 RISC processor. Interestingly, as the RISC effort ran into trouble Intel began working on VLIW and when BiiN folded in 1989 Intel released its first VLIW processor, i860. HP began work on EPIC the same year and five years later, Intel was committed to EPIC VLIW as an x86 replacement.

In 1995 Intel introduced Pentium Pro to take on the established RISC processors and grab more share of the server market. The important point though is that there is no indication that Intel ever intended Pentium Pro to be used on the desktop. We can infer this for a couple of reasons. First, Itanium had been in development for a year when Pentium Pro was introduced and an Itanium release was expected in 1998. Second, with Motorola out of the way (68000 development ended with 68060 in 1994), Intel was not expecting any real competition on the desktop. AMD and Cyrix were still making copies of 80486 so Intel had only planned some modest upgrades to Pentium until Itanium was released. However, AMD released K5 which thoroughly stunned Intel. Although K5 was not that fast it did have a RISC core (courtesy of AMD's 29050 RISC processor) which put K5 in the same class as Pentium Pro and a generation ahead of Pentium. Somehow AMD had managed the impossible and had skipped the Pentium generation. So, Intel went to an emergency plan and two years later released a cost reduced version of Pentium Pro for the desktop, Pentium II. The two year timeline indicates that Intel was not working on a desktop version previous to K5's release. Clearly, we owe Pentium II to K5.

However, AMD purchased NexGen and released the powerful K6 (which also had a RISC core) just two years later, meaning that it arrived at the same time as PII. Once again Intel was forced to scramble and release PIII two years later. We owe PIII to K6. But, AMD had been hard at work on a K5 successor and with the added technology from K6 and some Alpha tech it released K7. Intel was even more shocked this time because K7 was a generation ahead of Pentium Pro. Intel was out of options so it was forced to release the experimental Willamette processor and then follow up with the improved Northwood two years later. We owe P4 to K7. That P4 was experimental and never expected to be released is quite clear from the pipeline length. The Pentium Pro design had a 14 stage pipeline which was reduced to 10 stages in PII and PIII. Interestingly Itanium also used a 10 stage pipeline. However, P4's pipeline was even bigger than the original Pentium Pro's at 20 stages. Itanium II has an even shorter pipeline at 8 stages so it is clear that Intel does not prefer long pipelines. We can then see that P4 was an aberration caused by necessity and Prescott at 31 stages was a similar design of desperation. Without K8 there would be no Core 2 Duo today and without K10 there would be no Nehalem.

There is no doubt whatsoever that just as 8086's rapid advance against competition from Motorola 68000 stopped the iAPX 432 and shut down BiiN, Intel's necessity of advancing Pentium Pro rapidly on the desktop stopped Itanium. Intel already had experience with VLIW from i860 and would have delivered Merced on schedule in 1998. Given Itanium's speed it could have been viable at as little as 150Mhz. However, Pentium II was already at 450Mhz in 1998 with faster K7 and PIII speeds due the next year. The pace continued rapidly going from Pentium Pro's 150Mhz to PIII's 1.4Ghz. Itanium development simply could not keep up and the grand plans of 1997 for Itanium to become the dominant processor fell apart. The pace has been no less relentless since PIII and Itanium has been kept in a niche server market.

AMD is the sole reason why today Itanium is not the primary processor architecture. To suggest that nothing would change if AMD were gone is an extraordinary amount of self delusion. Intel would happily stop developing x86 and would put its efforts back into Itanium instead. The x86 line is also without any serious desktop replacement. Alpha, MIPS, and ARM stopped being contenders long ago. Power was the last real competitor but it fell out of the running when its desktop chips couldn't keep up and were dropped by Apple. This means that without AMD, Intel's sole competition for desktop processors is VIA. And, just how far behind is VIA? No AMD would mean higher prices and slower development and the eventual phase out of x86. Of course, I guess people can always hope that Intel has given up its goal of more than a quarter century of dropping the x86 line and moving the desktop to a completely proprietary platform.

350 comments:

Scientia from AMDZone said...

There are additional technical factors I didn't mention in the article such as DDR2-1066 in the near term and micro-buffer memory in the long term. I was surprised to see how similar AMD's G3MX design is to my HTDIMM speculation a year ago. I didn't figure on putting the controller chip on the motherboard though but that does make sense with a tree structure with the DIMMs being the leaves.

Secondly, I'm certain AMD will get a boost in ASP in Q4 and Q1 but it is possible that Intel could cut costs by enough in early 2008 to maintain price pressure. Specifically, if they can get rid of the $300 Million/quarter flash losses that would be a big help.

Scientia from AMDZone said...

I'm not going to start this discussion by getting into a semantic argument over what "modest" means. Basically, Intel has had better shrinks than AMD, particularly at 65nm. However, it appears that Intel's 45nm shrink was not as impressive. I figure their shrink at about 65% versus AMD's 65nm shrink of 70%, which many editorials described as being too small and a sign of problems. If Intel's shrink was actually smaller than 65% feel free to correct this.

Secondly, if you are going to claim to correct inaccuracies in my article then make doubly certain that you are not talking through your hat with your "correction".

The 20% number was generous. Gelsinger's actual claim at IDF was "40% faster for gaming". Allowing for the difference in clock speeds of 3.33Ghz versus 2.93Ghz makes for a 25% increase in gaming speed. Again, this is quite an exaggeration.

Scientia from AMDZone said...

wallachian

I would be most interested in your comment that the transistors are different between K10 and Brisbane. Do you have a source for this?

Scientia from AMDZone said...

enumae

The total volume is irrelevant. The fact that Intel has more FABs and about 3X AMD's volume explains why it takes them twice as long to make a node transition. However, this does not change the fact that AMD should be able to catch up by Q1 09 in spite of Intel's lead.

Scientia from AMDZone said...

giant

Current processor prices are irrelevant to this discussion.

abinstein

If the Chip Architect information is accurate, then the Penryn core is 30% smaller than the Merom core.

The core shrink is moderate. See above. The L2 cache shrink is good, however, even better than 50%.

This seems like a good place to start the shrink discussion. So, you are saying that the core shrink was actually about the same as AMD's shrink from 90 to 65nm? Yes, I would call the same 70% modest.

Scientia from AMDZone said...

ho ho said ...

I wonder if it is good that Itanium failed as desktop CPU instruction set. Sure, x86 has a long history and lots of applications for it but there are lots of things that could be better with it. I'm not saying that EPIC would be the best replacement, just that x86 is not the best thing out there. E.g for SIMD stuff VMX beats SSE hands down.

Btw, where did you get information on Nehalem? I've found nothing about the changes except IMC, CSI and SMT.

Also you forgot that Intel is making Larrabee that seems to be absolute monster in FP throughput.

Another thing that might interest you is that they released a ton of information about their research just a little while ago. I still haven't had time to read all that but the little I have seems rather interesting.


"Intel only got a modest shrink of 25% on 45nm"

Intel has put around 80% more transistors per mm^2 on 45nm. I'd say it is quite good for the first product on that node.

abinstein
"So what you're saying is when they actually take advantage of the new process and new transistors, it's more difficult for the chip to clock faster. Does that scan for you?"

It probably takes more effort to put those new transistors to work correctly. After all they have had years of practice with the older ones and they are easier to work with.

Scientia from AMDZone said...

AndyW35 said...

I'm interested where the Q4 release date for Nehalem comes from in your post, I was under the impression that it was more late Q2 or early Q3 ie mid year.

I quite like your analysis, it shows well the aimed timescales for the two companies. The one issue I would raise though is that of course these are hoped for timeframes currently and we do not know how optimistic the players are in this regard. I think the tick tock approach is a clever way of doing it and spreading the risk, however I do wonder whether the calendar can be kept to.

Intel have an advantage here as we now know they are on time with 45nm and also, from the limited results so far, it looks like it has good headroom and is ok power wise. In this respect it is very similar to 65nm Core.

AMD have yet to go to 45nm so they have an extra step to get into their tick tock swing. Can we estimate whether they will be as successful as Intel appears to have been with the move to 45nm? I am not so sure. The move to 65nm has proven to be difficult for them with similar low headroom on the K8 products and admitted problems with the K10 product / initial lack of headroom. You have to think that the move to 45nm will pose at least as many problems as the move to 65nm, if not more.

I can easily see the 45nm move being 4Q behind Intel rather than 2Q.

I hope they do come through, as you mention, two horse races certainly favour the consumer, as ATi and nvidia have shown.

Scientia from AMDZone said...

Giant said...

I'm interested where the Q4 release date for Nehalem comes from in your post, I was under the impression that it was more late Q2 or early Q3 ie mid year.

Pat Gelsinger said the "second half of 2008". That's either Q3 or Q4, depending on how well things go.

Scientia from AMDZone said...

Christian M. Howell

"That's interesting as I was wondering why no one put any emphasis on allowing external memory boards with HTX."

There would be no advantage. HTX would be slower than using the memory controller.

"Connecting G3MX to it would allow sufficient caching to make latency less of a problem and allowing those large banks of 32 DIMM sockets on an external board similar to what IWILL uses for its 8P config."

The fanout is 4 DIMMs with 4 ports for 16 DIMMs total.

"Also, I read on a comment thread on Anand that there may be more than Barcelona happening on the 10th.

I think they will try to get Phenom FX out at up to 2.3GHz since it's still a niche and will help to raise ASPs if it is indeed faster than Kentsfield."


Yes, anything faster than 2.0Ghz would be an improvement.

"If they really didnt cherry pick that 3GHz part, it seems reasonable that either PhenomFX OR Budapest may come out with Barcelona."

If they did then you should have 3.0Ghz in Q1 08. I'm not expecting anything sooner than that. Anything over 2.5Ghz in Q4 would be an improvement.

"Budapest would be the better choice as lots of SC companies like Sun and Cray are waiting for them and just one SC can take several thousand to 100s of 1000s of them for one system."

Budapest has HT 3.0 which does indeed increase connectivity.

Scientia from AMDZone said...

abinstein

I'm not saying that no one else thought the same things about P4 and FBDIMM, just that I hadn't heard anyone say these things before I did.

The comments should be back on track now.

For anyone who missed the main points of my article, I'm not trying to split endless hairs over technical detail. The main point is that by the end of 2008 Intel has no real advantage. This includes:

1. No advantage in die size -- Nehalem will most likely be larger than Shanghai.

2. No advantage in 45nm volume (and therefore cost) because AMD will ramp faster.

3. No MCM advantage because Nehalem is monolithic quad core.

4. No offloading power from the die because the die will have to carry memory controller loads as well as point to point interface loads.

5. No expectation of greater routine speed for Penryn, Nehalem, or Westmere.

Scientia from AMDZone said...

Okay, to answer the question about Nehalem. Even Intel admits that the core is the same as Penryn. Secondly, everything that Intel has mentioned so far has been external except for HyperThreading.

This suggests to me that Intel has not done much to the core. Also the use of HyperThreading on a quad core die suggests a bit of desperation to increase core speed. Maybe this version is better than the old one but considering that you can do MCM and get 8 real cores it's difficult to imagine when it would be useful.

There is also the matter of what else Intel could actually add to the Penryn core. The SSE width is 128 bits now and that won't get any wider. The buses have been widened and the decoder has been widened. Other than the fetch line width I can't see anywhere that it could be improved. It may be that it will take time for the hardware to catch up to SSE4. If this is the case perhaps we'll see modest improvement with Westmere.

However, the point remains that I don't see any evidence that Nehalem will show the same kind of 20% boost in IPC that C2D did.

Aguia said...

Another great article Scientia,
I have nothing to point out except the fact that Intel will have a “hard time” completely moving to new platforms and who knows maybe introduce GPUs.

Scientia from AMDZone said...

aguia

I'm not sure I agree. Intel is talking about GPU's on the Nehalem die and these would presumably process ATA instructions. AMD isn't really talking about GPU's until Bulldozer. So, it looks like Intel will get there first.

Scientia from AMDZone said...

So far no one has tried to refute the notion that it will take Intel until Q2 08 to reach the initial top speeds. Wallachian obviously understood what I was saying but perhaps it wasn't as clear as it could be.

Basically, Woodcrest did not pioneer the 65nm process. It followed about 6 months after Presler. However, this time on 45nm there is no Presler. Penryn is the pioneer. As such, it will probably take about two quarters to get the clock speeds up.

giant

Sure, Intel said 2H 08. Generally 2H sounds better than Q4 while still being technically true. However, just remember the basic premise of Tick Tock is a year cycle. Penryn in Q4 07 so Nehalem in Q4 08 would be right on time.

BTW, I could also mention that another reason why I feel that Nehalem will not have core improvements is the difficulty of just getting Nehalem to work at all. Monolithic quad die is not an easy thing to do (as AMD found out). Secondly, Intel has no previous experience with CSI. AMD did have previous experience with Athlon MP before K8 and K8 only started at HT 1.0. Intel is trying something far more ambitious for its first point to point interface. I'll also be curious to find out if the L3 was just bolted on like it was for Xeon. AMD's L3 is capable of transferring directly between L3 and L1 and is also capable of doing some inclusive sharing. Intel probably doesn't need an L3 this sophisticated as it already has the very large shared L2.

Scientia from AMDZone said...

For anyone wanting to comment about the Aside, AMD's True Importance this is not a moral or ethical discussion. The point is that Intel advanced x86 because of competition from Motorola and then after Pentium because of competition from AMD. That Motorola and AMD made advances because they were competing with Intel is a given. I have no doubt that AMD would have higher prices and slower development if Intel were gone.

However, Intel is in no danger of leaving anytime soon. The final point (and I don't think anyone will be able to challenge this) is that Intel would prefer to drop x86 entirely.

enumae said...

Scientia
The fact that Intel has more FABs and about 3X AMD's volume explains why it takes them twice as long to make a node transition.


Can you show me where 8 months vs 6 months (without confirmation from AMD that crossover has occurred) is twice as long?

And why claim that AMD achieved crossover to 65nm on the basis of FAB 36, which started as both 65nm and 90nm, making a 100% transition in wafer starts?

AMD does not claim crossover in either the Q1 or Q2 conference calls, so how can you?

However, this does not change the fact that AMD should be able to catch up by Q1 09 in spite of Intel's lead.

Well that is up for debate.

I feel they can if they outsource 45nm, but not on their own.

abinstein said...

enumae -
"I feel they can if they outsource 45nm, but not on their own."

This is nonsense. Which company do you suppose can do 45nm SOI mass production better than AMD? Who can AMD outsource to?

AMD is not outsourcing 45nm processor manufacturing. No freaking way. OTOH, AMD/ATi has already been outsourcing GPU production at 90nm and 65nm.

scientia -
"BTW, I could also mention that another reason why I feel that Nehalem will not have core improvements is the difficulty of just getting Nehalem to work at all."

You do know that this is not persuasive to its intended readers at all. People who need you to persuade them on this will also believe Intel capable of solving any technical problem.

Besides, Nehalem to Penryn will be good enough if it's similar to K8 to K7, or Power6 to Power5. :) IMO Nehalem seems the first truly worthy processor from Intel in a long time.

enumae said...

Abinstein

I have two questions that I hope you can answer...

1. Can you show me information about which FAB will get the 45nm equipment (an AMD press release if you can and not test runs)?

2. And how and when AMD will be able to pay for enough 45nm tools for a cross over in Q1 2009?

AMD is not outsourcing 45nm processor manufacturing.

Ok, we will have to wait and see.

Ho Ho said...

scientia
"Even Intel admits that the core is the same as Penryn"

Where has it said that? Even the "simple" SMT addon is complicated enough to need quite a bit of rework, not to mention crossbar (if it has that) and IMC.


"Secondly, everything that Intel has mentioned so far has been external except for HyperThreading."

AMD wasn't talking much about K10 last year either, it doesn't mean it'll have only small differences compared to K8.


"However, the point remains that I don't see any evidence that Nehalem will show the same kind of 20% boost in IPC that C2D did."

I'm quite sure that per-core IPC will be at least 20% higher, though thanks to SMT and not necessarily thanks to core improvements. It will surely be difficult to get that big a performance increase in single-threaded applications.

Giant said...

Which company do you suppose can do 45nm SOI mass production better than AMD

IBM.

Andy said...


Q2 06 - Woodcrest
Q3 07 – Barcelona Trailing by 5 quarters.

Q4 07 - Penryn
Q3 08 – Shanghai Trailing by 3 quarters.

Q4 08 - Nehalem
Q2 09 – Bulldozer Trailing by 2 quarters.

Q4 09 - Westmere
Q1 10 - 32nm Bulldozer Trailing by 1 quarter.


Don't you think that's slightly unrealistic? I don't think AMD have enough fabs and resources to progress that fast. It's what AMD needs to do, I doubt they can do it though. They can't even produce top bin products on 65nm yet due to their late start. Unless they can move to 45nm and by magic have no problems then I don't think your roadmap is possible. If AMD can just jump to 45nm I would continue to 32nm while my luck was in.

Also Nehalem does not use the Penryn core and I too was under the impression it will come early-mid H2.
Also Nehalem is apparently going to start at speeds of 4Ghz according to DailyTech. So with a 0% IPC increase there will be a 33% performance increase. A 10% IPC increase will yield a 46% increase in speed over C2D.

Intel have experience with high clocks and now a bit with IPC increase from C2D. We can only assume they remember what Netburst was like and aren't going down that route again.

Scientia from AMDZone said...

enumae

You seem very confused about the numbers.

AMD 6 months to 100% conversion
versus
Intel 8 months to 50% conversion

You mentioned crossover but this does not apply to FAB 30 since it won't be converted. Assuming that FAB 30 gets 300mm tooling during 2008 (which it should) then it too should be producing 45nm in Q1 09.

BTW, if you are really stuck on the idea of 50% crossover then AMD reached that point in approximately 5 months. However, this is not actually indicative of a dual FAB facility since had FAB 30 been 300mm it too would have been in the process of conversion. AMD would still have reached 100% conversion within 9 months. This is only because AMD would most likely stagger the transition. However, if pressed, the second FAB could also begin producing the next node in 6 months as well.

Giant said...

Nehalem is a massive change from Penryn, no doubt at all. We'll see a lot more of this at IDF next month.

Scientia from AMDZone said...

enumae

BTW, your outsourcing 45nm comment made me laugh. There are only two companies in existence that would have the process expertise to produce Shanghai: Intel and IBM.

Neither of these is possible so I can't imagine where you think this could occur.

Scientia from AMDZone said...

enumae

"1. Can you show me information about which FAB will get the 45nm equipment "

FAB 36 already has the 45nm equipment. FAB 30 doesn't even have 300mm tooling yet. Where did you get the idea that FAB 30 would be converted to 45nm first?

"2. And how and when AMD will be able to pay for enough 45nm tools for a cross over in Q1 2009?"

AMD has already budgeted the tooling for FAB 36 for 2007. They didn't have to cut anything because they got a grant (bailout) from the German government. The only tooling I can't say for certain is the expanded capacity which takes FAB 36 from 20K wafers per month to 24K. However, AMD has mentioned that FAB 36 will reach 24K so I assume they have this budgeted too. I can't say anything yet about the FAB 30 tooling budget.

Scientia from AMDZone said...

abinstein

I don't think Intel will get 20% out of Nehalem. I would say this will be more like PII to PIII.

Mo said...

Christian:

Same logic can also apply to AMD. When AMD was in lead, it also stopped the innovations, prices were HIGH (entry-level X2 3800+ was around $300).
In the years, AMD ramped up clock speed like a turtle. Of course they had the ability to do it, they chose NOT to do it (this is proven by them in under a year ramping to 3.2Ghz).

I hate it when people bring up the whole without AMD We'd be blah blah blah. AMD is no better than intel at this, without Intel, AMD would also be a dog when it comes to innovation.

If AMD had been "innovating", we wouldn't have such a gap (4Q's) in between Intel innovation launch and AMD innovation launch.

It's all crock when you blame Intel for stopping innovation, if anything AMD is equal if not worse when it comes to this.

Scientia from AMDZone said...

mo

"When AMD was in lead, it also stopped the innovations"

Completely false. AMD's R&D was running at capacity.

"prices were HIGH (entry leve X2 3800+ was around $300)."

Yet AMD's overall ASP was still lower than Intel's.

"In the years, AMD ramped up clock speed like a turtle."

This is true. AMD went from 1.8Ghz in Q2 03 to 3.0Ghz three years later. That is about 100Mhz every 3 months.

Intel went from 3.2Ghz to 3.8Ghz in the same period. That is half of turtle speed.

"Ofcourse they had the ability to do it, they chose NOT to do it"

You must have quite a collection of tinfoil hats.

"(this is proven by them in under a year ramping to 3.2Ghz)."

You should blow some of the dust and cobwebs off your math skills. A 200Mhz increase in 6 months is the same pace as the previous 3 years.
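The ramp rates quoted above can be checked with a few lines of arithmetic. This is only a minimal sketch: the helper name is made up for illustration, and "three years" is assumed to mean 12 quarters.

```python
# Hypothetical helper: average clock-speed gain per quarter, in MHz.
def mhz_per_quarter(start_mhz, end_mhz, quarters):
    return (end_mhz - start_mhz) / quarters

# Assumption: "three years later" is treated as 12 quarters.
amd_k8 = mhz_per_quarter(1800, 3000, 12)     # 1.8Ghz -> 3.0Ghz
intel_p4 = mhz_per_quarter(3200, 3800, 12)   # 3.2Ghz -> 3.8Ghz
amd_recent = mhz_per_quarter(3000, 3200, 2)  # 200Mhz in 6 months

print(amd_k8, intel_p4, amd_recent)
```

Under those assumptions the K8 figure and the recent 6 month figure come out the same, and the Intel figure is half of it.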

"I hate it when people bring up the whole without AMD We'd be blah blah"

I know. It's just like those people who claim that the only reason England grew to become such a naval power during the Napoleonic Wars was because of France.

"AMD is no better than intel at this, without Intel, AMD would also be a dog when it comes to innovation."

ROFL. AMD only made Intel clones up to 80486 so presumably without Intel AMD would never have made x86 processors. Of course, Intel did invent the microprocessor so maybe they wouldn't be making microprocessors at all. Of course while we are giving credit let's not forget Alexander Graham Bell.

"If AMD had been "innovating", we wouldn't have such a Gap(4Q's)inbetween Intel innovation launch and AMD innovation launch."

The gap is actually 5 quarters from Woodcrest to Barcelona. However, the timeline is consistent: 2 years from K8 to X2 and two years from X2 to X4.

"It's all crock when you blame Intel of stopping innovation, if anything AMD is equal if not worst when it comes to this. "

Intel has been innovating just as AMD has. That is the whole point. We can't count on VIA or IBM for desktop products; there is only AMD and Intel.

Aguia said...

Also Nehalem is apparently going to start at speeds of 4ghz according to dailytech.

Yes and according to this road map from vr-zone we will have 3.7Ghz processors in Q4/2007 from Intel with 1066Mhz bus and 54W TDP. Yeah right...

Scientia from AMDZone said...

Giant

You say IBM can do 45nm better than AMD?

Well, you might be right because if IBM runs into a problem they can always ask one of AMD's 77 process engineers at East Fishkill for help.

Giant said...

It makes sense to make FAB30 the 45nm fab first. Then AMD can have one 65nm FAB and one 45nm FAB instead of a 45nm and 90nm FAB.

Scientia from AMDZone said...

Andy

"Don't you think thats slightly unrealistic?"

I don't know. It's AMD's timeline.

"They can't even produce top bin products on 65nm yet due to their late start."

I doubt AMD can make a 3.2Ghz 65nm today. However, I'm certain they can make a 2.8Ghz 65nm chip. 3.0Ghz dual core should also be doable.

"Unless they can move to 45nm and by magic have no problems then i don't think your roadmap is possible."

AMD is running 45nm wafers now; that gives about a year. However, it isn't magic; AMD has been working on immersion for the past several years.

"Also Nehalem does not use the penryn core"

What makes you think this?

"and I too was under the impression it will come early-mid H2."

Perhaps it will.

"Also Nehalem is apparently going start at speeds of 4ghz according to dailytech."

Let's take a few steps back and go over that again. First of all the source is not Dailytech; Gruener says that the source is actually Digitimes. Now, what does Digitimes really say:

With Intel aiming to eventually scale the 45nm range up to a maximum core frequency of 4.0GHz, the sources estimate that at least four more CPUs will appear at a later time with frequencies higher than the initial 3.16GHz.

Notice that it says "maximum core frequency of 4.0 Ghz"? So, where did you get the idea that Nehalem would start at 4.0Ghz?

Scientia from AMDZone said...

giant

"Nehalem is a massive change from Penryn, no doubt at all."

Based on what?

"We'll see a lot more of this at IDF next month."

Perhaps we will.

Scientia from AMDZone said...

Giant

"It makes sense to make FAB30 the 45nm fab first."

Without 300mm tooling? How?

"Then AMD can have one 65nm FAB and one 45nm FAB instead of a 45nm and 90nm FAB."

You don't understand. AMD is already running 45nm wafers but they just started taking down FAB 30; FAB 30 is not the initial 45nm FAB.

Secondly, AMD will not have one 45nm FAB and one 90nm FAB. By the time 45nm production starts at FAB 36, FAB 30 will no longer be producing 200mm wafers. The initial 300mm tooling at FAB 30 will be capable of 45nm.

So, either AMD will have just a single FAB transitioning to 45nm or it will have two.

Ho Ho said...

scientia
"AMD's R&D was running at capacity."

Same with Intel.


"Yet AMD's overall ASP was still lower than Intel's."

No wonder, if they were selling their single-core chips en masse.


"Intel went from 3.2Ghz to 3.8Ghz in the same period. That is half of turtle speed."

Is that for single or dualcore? Also you have to consider that Netburst was banging the wall with thermal problems for a long time already, AMD wasn't.



"AMD is running 45nm wafers now; that gives about a year."

Since when has Intel been running 45nm wafers? 1 or 2 years?


"What makes you think this?"

I've asked several times, what makes you think differently? Only thing you've said is "because it is difficult". By that logic AMD shouldn't be making huge changes to K10 either.

Scientia from AMDZone said...

randy

You are right; Intel could indeed announce a lot of new architectural changes for Nehalem next month as Giant says. I guess we'll see.

Andy

I have no idea if you are familiar with AMD's ramp of 65nm. It went like this:

Q1 06: 65nm tooling shakedown at 90nm.

Q2 06: 65nm tooling shakedown at 65nm.

Q3 06: 65nm production started.

Q4 06: First 65nm chips shipped.

If we assume that 45nm is similar then wouldn't this be:

Q3 07: 45nm tooling shakedown at 65nm.

Q4 07: 45nm tooling shakedown at 45nm.

Q1 08: 45nm production started.

Q2 08: First 45nm chips shipped.

My estimate is Q3 08 so why would this be unrealistic?
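The mapping from the 65nm ramp to a projected 45nm ramp is just a fixed shift in quarters. A minimal sketch, assuming the 45nm steps trail the 65nm steps by six quarters (the offset implied by the two lists above); the helper name and the step wording are illustrative only:

```python
# Hypothetical helper: shift a label such as "Q1 06" by a number of quarters.
def shift_quarter(label, quarters):
    quarter, year = label.split()
    index = int(year) * 4 + (int(quarter[1]) - 1) + quarters
    return f"Q{index % 4 + 1} {index // 4:02d}"

# The 65nm ramp steps as listed above.
RAMP_65NM = [
    ("Q1 06", "new tooling shakedown at the old node"),
    ("Q2 06", "new tooling shakedown at the new node"),
    ("Q3 06", "production started"),
    ("Q4 06", "first chips shipped"),
]

# Assumption: the 45nm ramp follows the same pattern six quarters later.
OFFSET_QUARTERS = 6

ramp_45nm = [(shift_quarter(q, OFFSET_QUARTERS), step) for q, step in RAMP_65NM]
for quarter, step in ramp_45nm:
    print(quarter, step)
```

With that offset the last step lands in Q2 08, one quarter ahead of the Q3 08 estimate.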

Scientia from AMDZone said...

Ho Ho

"Since when has Intel been running 45nm wafers? 1 or 2 years?"

January 2006 - Intel 45nm SRAM & logic
May 2006 - AMD 45nm SRAM & logic

Late November 2006 - Penryn tapeout

Shanghai needs to tapeout soon to keep on track.

Late January 2007 - Intel bootable Penryn

AMD would need bootable Shanghai in September/October to match.

Douglas Grose in February claimed 45nm production in Q2 08 with full production in second half. This is again one quarter ahead of my estimate. So, I would honestly say that my estimate is conservative rather than optimistic.

enumae said...

Scientia
You seem very confused about the numbers...You mentioned crossover but this does not apply to FAB 30 since it won't be converted.


Your article is very misleading... You talk about Intel crossover in 8 months, but AMD conversion in 6 months and it sounds like you are trying to compare the two... are you?

BTW, your outsourcing 45nm comment made me laugh.

BTW, since we are being honest, I thought you were ED from overclockers after reading your article, except that it was AMD in the positive light...

Neither of these is possible

Well you didn't seem to think that Doug Freedman was right about AMD needing money this year, but...

FAB 36 already has the 45nm equipment.

Why not quote the whole question, not an opinion an AMD statement/press release.

The rest of your post were mainly just opinions, and we both have them.

Have a good one.

InTheKnow said...

Ah, where to start?

Scientia said...

AMD 6 months to 100% conversion
versus
Intel 8 months to 50% conversion

You mentioned crossover but this does not apply to FAB 30 since it won't be converted. Assuming that FAB 30 gets 300mm tooling during 2008 (which it should) then it too should be producing 45nm in Q1 09.


So we are going to count Intel's fabs that won't convert (D1C, F24, F12) and use that output vs 45 nm output to determine crossover for Intel. But for AMD since FAB 30 won't be converted we won't count it. By that yardstick all you have to look at is how long it took D1D to convert to 45 nm. I believe you will find that that has been done for a long time.

Also, if you look at WSPW I think you will find they aren't there yet (though they may well be on a die count basis). Why? Because you can start a lot more 8 inch wafers in the same space than you can 12 inch wafers. Just look at the physical difference in the tool footprints, and that becomes obvious. The fab doesn't get bigger, so you have fewer tools, but they produce more die area. That is part of why fabs don't see the full 2.4x increase in cost savings moving from 8 to 12 inch.

Since the crossover metric is traditionally based on wafer starts, I suspect that is why AMD hasn't proclaimed crossover yet.

Feel free to provide an AMD statement or official document of any kind that shows they have achieved cross over and I'll gladly retract my objections.

InTheKnow said...

Scientia said ...

Even Intel admits that the core is the same as Penryn.

Below is the entire excerpt from Intel's press release on Nehalem. I see nothing that tells me it is the same core.

NEHALEM MICROARCHITECTURE

After Penryn and the 45nm Hi-k silicon technology introduction comes Intel's next-generation microarchitecture (Nehalem) slated for initial production in 2008. By continuing to innovate at this rapid cadence, Intel will deliver enormous performance and energy efficiency gains in years to come, adding more performance features and capabilities for new and improved applications. Here are some new initial disclosures around our Nehalem microarchitecture:

* Dynamically scalable for leadership performance on demand with energy efficiency
  o Dynamically managed cores, threads, cache, interfaces and power
  o Leverages leading 4 instruction issue Intel® Core microarchitecture technology
  o Simultaneous multi-threading (similar to Intel Hyper-Threading Technology) returns to enhance performance and energy efficiency
  o Innovative new Intel® SSE4 and ATA instruction set architecture additions
  o Superior multi-level shared cache leverages Intel® Smart Cache technology
  o Leadership system and memory bandwidth
  o Performance enhanced dynamic power management
* Design scalable for optimal price/performance/energy efficiency in each market segment
  o New system architecture for next-generation Intel processors and platforms
  o Scalable performance: 1 to 16+ threads, 1 to 8+ cores, scalable cache sizes
  o Scalable and configurable system interconnects and integrated memory controllers
  o High performance integrated graphics engine for client


Link is
http://www.intel.com/pressroom/archive/releases/20070328fact.htm

Since this is all Intel has released officially, anything else is conjecture and speculation, not fact.

abinstein said...

"Below is the entire excerpt from Intel's press release on Nehalem. I see nothing that tells me it is the same core. ..."

It seems scientia was right, that Nehalem is basically Penryn plus -

1. IMC/CSI
2. hyperthreading
3. 8-core capability
4. better power management

Unlike Core 2, which was advertised with loads of core improvements from say Yonah. Nehalem doesn't even have those advertisement. Maybe Intel somehow learned to be modest?

InTheKnow said...

In addition to the obvious question of whether or not AMD can meet their stated goal to catch up to Intel on process technology with the cadence put forth in the 'pipe' plan, I think you have missed a key issue.

Cost

Tooling, design, process development, ramp, all of these take not only time, but money. By shortening the amount of time between each cycle of this process, you effectively increase the cost since you will see less return on your investment.

So more than anything else, what I question is not whether AMD can carry out such an aggressive plan (though I have questions about that too), but can they afford to carry it out.

I don't think anyone here is going to claim AMD has near the resources that Intel can claim. So the big question is whether or not they can sustain the implied cost levels for the next 3 years.

InTheKnow said...

Abinstein,

It may well be the same core, my only point is that unlike Scientia's claim, Intel hasn't told us that.

Perhaps it is just my perception, but our host and others seem to get worked up when posters put words into the mouth of AMD executives. I'm simply trying to see the same yardstick applied to both sides of the argument.

Aguia said...

InTheKnow,
Let’s analyse that info:

Dynamically scalable for leadership performance on demand with energy efficiency

Intel already said that with Conroe.

Dynamically managed cores, threads, cache, interfaces and power

Penryn already has that.

Leverages leading 4 instruction issue Intel® Core microarchitecture technology

Where did I already see that? Right, Conroe.

Simultaneous multi-threading (similar to Intel Hyper-Threading Technology) returns to enhance performance and energy efficiency

This is a difficult one, let’s see, P4 Northwood core.

Innovative new Intel® SSE4 and ATA instruction set architecture additions

Penryn.

Superior multi-level shared cache leverages Intel® Smart Cache technology

Core Duo 32 bit.

Leadership system and memory bandwidth

Let’s wait for that, but that one already exists in AMD 2003 Opteron.

Performance enhanced dynamic power management

Not new.

Design scalable for optimal price/performance/energy efficiency in each market segment

Conroe?

New system architecture for next-generation Intel processors and platforms

Really?

Scalable performance: 1 to 16+ threads, 1 to 8+ cores, scalable cache sizes

Getting boring.

Scalable and configurable system interconnects and integrated memory controllers

2003 AMD Opteron.

High performance integrated graphics engine for client

This one is very interesting. But I don’t know; the Tom’s Hardware forum is full of folks that say it’s impossible to have one GPU on CPU (Fusion) because there is not enough memory bandwidth for it. But since this is Intel it will certainly work for sure.

InTheKnow said...

Scientia said...

FAB 36 already has the 45nm equipment.

Not installed they don't if they are running 65 nm. Tools take space and few things are more expensive than wasted clean room space.

I'll grant they may have a pilot line either in place or being installed, but this is hardly what you seem to be implying. Not to mention I thought that they were using IBM's East Fishkill plant for development.

Incidentally, I recall a post asking whether or not you saved anything by running a fab below capacity. The answer is no because of the fixed costs of maintaining the clean room environment. Facilities costs for a fab are enormous. And they are the same whether you are running 1 wafer or 10000. So you want to keep the factory loaded.

InTheKnow said...

aguia, I guess that means that having achieved these milestones, there is no room for improvement and both AMD and Intel have hit the end of the road? ;)

Sure, it is marketing speak but you missed the point. That being that Intel has told us nothing about the core in their official statements. I don't really think we know enough about Nehalem to make a judgement on how good or bad it is going to be yet.

As for what the release does tell us, I think Abinstein summarized it nicely.

InTheKnow said...

Abinstein said...

Maybe Intel somehow learned to be modest?

Lol. Hardly.

But I think that unlike the situation with C2D they are in a better position. When C2D came out, AMD was the undisputed leader in just about everything. I think they may have felt they needed to lay all their cards on the table to stop the hemorrhaging.

Now the general perception is (rightly or not) that Intel is the leader and they may not feel they have to give everything away up front. It is classic marketing to keep feeding new little bits of info to the media just to keep the interest up.

This is just speculation on my part to explain why Intel would not mention core improvements, of course, but it seems plausible at first blush.

I'll be the first to admit I don't know if it is a new core or not. I'm just looking at the possibilities here.

Ho Ho said...

scientia
"Late January 2007 - Intel bootable Penryn"

Wasn't it not only bootable but running at high clock speed?



aguia
"This is a difficult one, let’s see, P4 Northwood core."

If the CPU is anything like Conroe then direct copy of Netburst HT won't work.


"Let’s wait for that, but that one already exists in AMD 2003 Opteron."

What makes you think AMD still has the lead in 2-3 years? Someone hinted at 3-channel DDR3 for Nehalem. At 2GHz effective rate it would provide bandwidth around 48GB/s, more than full-speed 16bit HT3 and nearly as much as high-end previous generation GPUs. AMD has nothing like that, at least not on public roadmaps. Of course it is not on Intels either, these are only rumours.


"This one is very interesting. But I don’t know the Toms hardware forum is full of folks that say it’s impossible to have one GPU on CPU (Fusion) because there is not enough memory bandwidth for it"

Ray tracing doesn't need nearly as much bandwidth as rasterizing and Intel seems to take it very seriously. Though I'm not sure if their real RT monster, Larrabee, will be integrated with CPUs that soon. They might have something different in between. Perhaps something more like traditional GPU, just a bit more programmable.


Now someone should make a similar list for all the advertised K10 features. Will there be more than 2-3 features that were not done by someone else before? Can anyone say that K10 is not a good CPU because of that?



Btw, did you know that Prescott was so much different from Northwood that people were puzzled why wasn't it called Pentium5? From the first look they are rather similar but there were major changes between the two.

Aguia said...

Wasn't it not only bootable but running at high clock speed?

Any picture of the box that was running that?


If the CPU is anything like Conroe then direct copy of Netburst HT won't work.

Really, you can’t copy technology? That’s new… so you can’t learn anything from the past. By your standards, AMD’s 386 and 486 weren’t Intel clones for sure. AMD never copied anything from Intel; in fact they invented the x86 instructions that strangely were equal to Intel’s…


Someone hinted at 3-channel DDR3 for Nehalem.

So you change the memory type and add a channel and it’s already a new thing, a huge development. Maybe AM2 was not that bad after all since it supported DDR2…


AMD has nothing like that

Amazing ho ho, just amazing…


Though I'm not sure if their real RT monster, Larrabee, will be integrated with CPUs that soon.

Didn’t Intel already demo that with Penryn…
Real time Raytracing


Will there be more than 2-3 features that were not done by someone else before? Can anyone say that K10 is not a good CPU because of that?

What was being discussed wasn’t if it’s good or not, I’m sure K10 and Nehalem will be good. The discussion was how much different they are from their predecessors.


Btw, did you know that Prescott was so much different from Northwood that people were puzzled why wasn't it called Pentium5? From the first look they are rather similar but there were major changes between the two.

Because Intel stepped back, they didn’t deliver 64 bits from the beginning, in fact they used the same strategy AMD used with Sempron 754, disabled the 64 bits instructions! Maybe because they didn’t want the sales to go down with the P4 (Northwood), or didn’t want Pentium 5 associated as a “bad” processor, or didn’t want to promote 64 bits, perhaps just the fact that Pentium already meant 5 (five) :)

Ho Ho said...

aguia
"Any picture of the box that was running that?"

Earliest I could find was mid-April this year, quadcore Penryn at 3.33GHz seemingly with inbox cooler. Intel even allowed to run some benchmarks and showed clock speeds, though in controlled environment. Still much better than pretty much any single K10 demo so far.

That was around 6 months before the CPU will get released. By Scientia's timetable (AMD selling 45nm in Q2 08) we should be seeing >>3GHz 45nm K10 derivatives any day now. Anyone else but me have any doubts about AMD's ability to keep 45nm production on track?


"Really, you can’t copy technology?"

Yes you can but it isn't always that simple. SMT needs to adapt to architecture, it is not simple copy-paste. One example would be how AMD memory controller massively lost effectiveness by going from DDR1 to DDR2.


"So you change the memory type and add a channel and it’s already a new thing, a huge development."

Going from around 10GB/s effective on K8 to >40GB/s on Nehalem is quite a big thing, don't you think? Without adding a channel it would be impossible to get nearly as big improvement. Though I do wonder how they manage to add so many pins to the CPU to connect all the things. Perhaps CSI needs less pins than FSB and/or they will cut down some of the powering pins? After all at least half the pins are used for power on todays CPUs.

Btw, can anyone say why don't CPUs use some bigger connectors for delivering power to the CPUs? Adding a molex-like thing shouldn't be so difficult I would imagine. Even if it isn't as powerful as 8-pin PCIe power connectors they should still cut down a few feet currently used on CPUs.


"Maybe AM2 was not that bad after all since it supported DDR2"

AMD had extremely good DDR1 controllers with nearly perfect efficiency. Current DDR2 K8 don't have twice the bandwidth of DDR1 ones with half the clock speed. DDR2 certainly wasn't bad but not that revolutionary either.


"Amazing ho ho, just amazing… "

Do you have any information on what does AMD plan to have in future? I've seen a few things about special "memory socket" where you put a chip in one cHT connected socket and it would act as an external memory controller similar to what Intel currently uses. Though that is mostly for providing more capacity, not bandwidth.


"Didn’t Intel already demod that with Penryn…
Real time Raytracing"


Yes, I know that Penryn is simply the best CPU for ray tracing, at least so far and SSE4 is one reason why. Though Larrabee will be even better with being massively threaded and having things like texture sampling in special HW. Current CPUs (and Penryn) are mostly bottlenecked by shading, not by tracing rays through scene and intersecting geometry. I wouldn't be surprised to see initial Larrabees beating 3GHz Penryns by an order of magnitude or more in ray tracing speed. When their 48-core 196 thread Larrabee hits the streets I'll be in heaven :)

Btw, Intel kind of cheated on that benchmark by using completely different scenes for comparison. Their old demo used around 0.5M different triangles instanced to around 1 billion triangle scene. Second one was apparently a quake3 level of around 50-100k triangles for static stuff and probably ~25k for dynamic things. Of course algorithms have come long way during last couple of years and had they used todays algorithms on that old demo they would have got at least 2-5x better performance on the same HW, if not more.

Still, tracing around 50M primary rays per second is a huge feat. Fastest tracers I know run on Core2 and deliver around half the primary rays with simple shading.


"The discussion was how much different they are from their predecessors."

Yes, I know. My point was that when someone would make a similar list for K10 it would also seem as it isn't anything special.

abinstein said...

"Earliest I could find was mid-April this year, quadcore Penryn at 3.33GHz seemingly with inbox cooler."

This cooler really doesn't look very "inbox" to me. Does it look so to you?


"Intel even allowed to run some benchmarks and showed clock speeds, though in controlled environment. Still much better than pretty much any single K10 demo so far."

Let me see... talking about high-end computer, is there any Intel demo that can do what Phenom has done last month?


"That was around 6 months before the CPU will get released. By Scientias timetable (AMD selling 45nm in Q2 08) we should be seeing >>3GHz 45nm K10-derivates any day now."

What's the matter with your arithmetic? 6 months before Q2'08 is Q4'07. Right now we are just in the middle of Q3.

That said, I personally don't think AMD's going to have 45nm in production before Q3'08. 6 months before that is Q1'08, which is when someone might get Shanghai sample somewhere.


"One exaple would be how AMD memory controller massively lost effectiveness by going from DDR1 to DDR2."

"Massively lost effectiveness?" Do you have any idea what you're talking about? The problem is not AMD's memory controller, but the slower access time of the memory modules. DDR2-400 is 2x slower than DDR-400, and DDR2-667 will only be faster than DDR-333 on Athlon.


"Going from around 10GB/s effective on K8 to >40GB/s on Nehalem is quite a big thing, don't you think? Without adding a channel it would be impossible to get nearly as big improvement."

You sir are making up numbers. 3 channels of DDR3-1600 would provide approximately 38.4GB/s memory bandwidth. Where did you get that ">40GB/s"? OTOH, current K8 already support 12.8GB/s memory bandwidth, or 28% more than your "around 10GB/s".

Nehalem's higher bandwidth is needed because it has 4x number of cores plus integrated graphics. 3x memory bandwidth really just barely keeps up.
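The peak figures in this exchange follow from channels times transfer rate times bus width. A minimal sketch, assuming standard 64-bit (8-byte) channels and that the 12.8GB/s K8 figure refers to dual-channel DDR2-800; the function name is illustrative, and these are theoretical peaks, not measured throughput:

```python
# Peak theoretical bandwidth of 64-bit (8-byte) DDR memory channels, in GB/s.
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000

nehalem_ddr3_1600 = peak_bandwidth_gbs(3, 1600)  # rumoured 3 channels at DDR3-1600
nehalem_ddr3_2000 = peak_bandwidth_gbs(3, 2000)  # the "2GHz effective" rumour
k8_ddr2_800 = peak_bandwidth_gbs(2, 800)         # assumed dual-channel DDR2-800

print(nehalem_ddr3_1600, nehalem_ddr3_2000, k8_ddr2_800)
print(round((k8_ddr2_800 / 10 - 1) * 100))        # percent above "around 10GB/s"
print(round(nehalem_ddr3_1600 / k8_ddr2_800, 2))  # Nehalem peak relative to K8 peak
```

On these assumptions DDR3-1600 gives 38.4GB/s and the 2GHz rumour gives 48GB/s, so the two posters are quoting different memory speeds rather than different arithmetic.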


"Perhaps CSI needs less pins than FSB and/or they will cut down some of the powering pins? After all at least half the pins are used for power on todays CPUs."

Remember Intel says Nehalem brings the new system architecture for next-generation Intel processors and platforms. They'll just change to new socket with more pins.

abinstein said...

"Btw, can anyone say why don't CPUs use some bigger connectors for delivering power to the CPUs? Adding a molex-like thing shouldn't be so difficult I would imagine."

It's not just difficult but impossible. How do you spread the huge current from one point of the die to the rest of it?


"AMD had extremely good DDR1 controllers with nearly perfect efficiency. Current DDR2 K8 don't have twice the bandwidth of DDR1 ones with half the clock speed."

This is because memory transfers happen in bursts of 64 bytes. There are constant delays between bursts, which is not reduced with clock frequency. Amdahl's law will carry you through the rest.


"Though that is mostly for providing more capacity, not bandwidth."

A single 32-bit bi-directional HT3 offers (truly) >40GB/s, higher than 3 DDR3 channels of a Nehalem.
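For reference, the ">40GB/s" figure can be reproduced as follows. This is a sketch under stated assumptions: a 2.6GHz HT3 link clock with double data rate, a link that is 32 bits wide in each direction, and both directions counted together. The function name is made up for illustration.

```python
# Assumed HT3 parameters: 2.6GHz link clock, double data rate,
# link width given in bits per direction.
def ht_bandwidth_gbs(clock_mhz, width_bits, directions=2):
    transfers_per_s = clock_mhz * 2  # double data rate, in MT/s
    return transfers_per_s * (width_bits // 8) * directions / 1000

ht3_32bit = ht_bandwidth_gbs(2600, 32)  # both directions combined
ht3_16bit = ht_bandwidth_gbs(2600, 16)  # both directions combined
ddr3_1600_x3 = 3 * 1600 * 8 / 1000      # three 64-bit DDR3-1600 channels

print(ht3_32bit, ht3_16bit, ddr3_1600_x3)
```

Note that the figure above 40GB/s is the aggregate of both directions; a single direction of the same link is half of that.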


"Still, tracng around 50M primary rays per second is a huge feat."

Just add a CTM engine and get over with it.

Ho Ho said...

abinstein
"This cooler really doesn't look very "inbox" to me. Does it look so to you?"

Looks pretty much the same as K10 cooler, perhaps a bit smaller.


"Let me see... talking about high-end computer, is there any Intel demo that can do what Phenom has done last month?"

There have been more benchmarks done with Penryn. For example this one is more than what people have done with K10 and K10 should launch around couple of months before Penryn.


"What's matter with your arithmetics?"

Well, it is currently four months past that demo and around three months to release. The demo was made around 7 months before release. Scientia said his timetable was conservative and "production in Q2" doesn't mean it must be the end of the quarter. So combining all those things together we should see something soon.


"The problem is not AMD's memory controller, but the slower access time of the memory modules"

Access time doesn't affect throughput, especially throughput efficiency.


"DDR2-400 is 2x slower than DDR-400, and DDR2-667 will only be faster than DDR-333 on Athlon."

Yes but I was comparing against DDR2 800.


"You sir are making up numbers. 3 channels of DDR3-1600 would provide approximately 38.4GB/s memory bandwidth"

You missed the part where I said "2GHz effective".


"OTOH, current K8 already support 12.8GB/s memory bandwidth, or 28% more than your "around 10GB/s"."


Too bad it show. DDR1 reached bandwidth of around 5.3GiB/s, much higher efficiency compared to DDR2.


"They'll just change to new socket with more pins."

Yes, they will but adding pins is not simple. It took ATI quite a bit of work to put enough pins on R600 for power and 16x32bit memory and this is a soldered chip, much simpler than a socketed one.


"3x memory bandwidth really just barely keeps up."

Intel currently fights against AMD having around 30% less bandwidth, ~5.3GB/s on 1066MHz and 6.5 on 1333MHz FSB. Compared to that it is a massive increase. When before has there been such a leap in CPU bandwidth in couple of years?


"It's not just difficult but impossible. How do you spread the huge current from one point of the die to the rest of it?"

You know there exist things called "wires". A few mm^2 of copper wire can move hundreds of watts of power, just see how big are the wires at your household. I can't see a reason why couldn't it be done, perhaps I'm missing something obvious.


" There are constant delays between bursts, which is not reduced with clock frequency"

Then why do the benchmarks I linked to show otherwise?


"A single 32-bit bi-directional HT3 offers (truly) >40GB/s, higher than 3 DDR3 channels of a Nehalem."

Ok, how do you get all that bandwidth to the socket? Will it have more than two channels?

Also I hope you know there won't be 32bit links between each CPU sockets on DC2 platforms in foreseeable future. Perhaps for connecting different boards but not between sockets. You should also know that AMD will make the buses half the width for >4P connections.

Also it'll still have the same latency problems that Intel has with external memory controller, not to mention it is basically NUMA for even single CPU that has the memory extension, not particularly a good thing.


"Just add a CTM engine and get over with it."

You have a lot to learn about ray tracing. CTM and GPUs in general are one of the most inefficient HW for ray tracing. 8800GTX barely reaches the performance of regular Core2 quadcores with simple scenes. Add a few secondary rays and see how performance drops extremely fast. With couple of pointlights and reflective surfaces you probably halve the performance on a CPU and by an order of magnitude on GPU.

Even the G80 tracer only reached over a half of theoretical FP throughput on G80. Sure, that is a lot of GFLOPS, perhaps almost an order of magnitude more than Core2 quad has, combined with order of magnitude more memory bandwidth it does make it kind of pitiful if all they do is match the performance of CPUs in simple scenes and massively lag behind in complex ones.


Unless GPUs change into a Larrabee-like general architecture they won't be any good for ray tracing because they are plain too inefficient. Ray tracing needs a special kind of HW to perform efficiently and currently Cell SPUs and general CPUs are the best mass-produced things we have. There is of course some prototype HW for ray tracing but it is still far from coming to market. Still, a single 90MHz FPGA matched the speed of a 2.6GHz P4 while having only a fraction of the transistors and memory bandwidth. Just imagine what could happen with ~1.5-2.5GHz and ~50 "pipes" in one piece of ray tracing HW. It is nearly enough to run instant global illumination in real-time and perhaps some simpler photon mapping.

When it comes to ray tracing I believe I know a few things. I've been closely following the research since early 2003 and have read pretty much every paper released on the subject. Since before 2000 the efficiency of the algorithms has increased by at least a couple of orders of magnitude, whereas rasterizing algorithms have pretty much stood still, only waiting for more brute force from faster GPUs. Also there is still a lot yet to be researched about RT that can make it even more efficient; I wouldn't be too sure about rasterizing.

Btw, there will be the second RT07 Symposium on Interactive Ray Tracing coming in just a couple of weeks where some new and interesting papers will be revealed. IIRC Intel has a few papers there also.

abinstein said...

Ho Ho -
"Looks pretty much the same as K10 cooler, perhaps a bit smaller."

Seriously, you have a bad habit of spreading false information. Look again at the Penryn demo cooler: the heatsink and fan are almost as large as the 120mm system fan, and you claim it's a bit smaller than the 70mm one used in Phenom?

"There have been more benchmarks done with Penryn. For example this one is more than what people have done with K10"

The demo of Phenom was playing a game while capturing the video output, encoding it to H.264, and streaming it over the network. Is the Penryn test anywhere close to it?

"You missed the part where I said "2GHz effective"."

If you're willing to get "2GHz effective" DDR3, you should consider DDR2-1066, too, which would offer 17GB/s bandwidth, or 70% more than what you've said.

"Access time doesn't affect throughput, especially throughput efficiency."

You are plain wrong. Just change the memory latency settings in your BIOS and watch your memory read throughput drop.

"Too bad it show. DDR1 reached bandwidth of around 5.3GiB/s, much higher efficiency compared to DDR2."

I have told you why. Apparently you refuse to learn the details of SDRAM operation. Just know that both DDR1 and DDR2 will have higher efficiency than DDR3, and Nehalem will get a lower percentage of usable bandwidth out of its 38.4GB/s peak.

"You know there exist things called "wires". A few mm^2 of copper wire can move hundreds of watts of power; just see how big the wires in your household are. I can't see a reason why it couldn't be done; perhaps I'm missing something obvious."

How do you make "a few mm^2 of copper wire" on-die?

"Then why do the benchmarks I linked to show otherwise?"

You don't understand what your links show. The techreport page shows little if anything relevant to the discussion here. The digi-life page shows that memory bandwidth is limited not just by memory bus clock, but more by memory controller (FX-62 reaches much higher bandwidth than 4000+ with the same memory).

"CTM and GPUs in general are among the most inefficient HW for ray tracing. The 8800GTX barely reaches the performance of regular Core2 quad-cores with simple scenes."

Am I correctly seeing you try to apply results of 8800GTX to CTM? Do you know the two are very different in terms of their microarchitecture?

abinstein said...

Ho Ho -

Some of your comments deserve special treatment because they are specially wrong.

"OK, how do you get all that bandwidth to the socket? Will it have more than two channels?"

Socket-940 already has three 16-bit bi-directional HT links, and each link takes 76 pins. OTOH, the DRAM interface takes more than 220 pins, enough for 3 more such HT links. If each of these HT links is HT3.0, then the total bandwidth of 3x 16-bit HT3.0 will be more than 60GB/s. Note that this is just Socket-940.
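
A back-of-the-envelope check of these HT figures, as a minimal sketch. It assumes HT3.0 at its top-rated 2.6GHz link clock with two transfers per clock, and it counts both directions of a link. The function name and defaults are illustrative, not taken from any AMD document.

```python
def ht_link_bandwidth_gbs(width_bits, clock_ghz=2.6, bidirectional=True):
    """Peak HyperTransport link bandwidth in GB/s (decimal units)."""
    transfers_per_ns = clock_ghz * 2             # DDR signaling: 2 transfers per clock
    one_way = width_bits / 8 * transfers_per_ns  # bytes per transfer * GT/s
    return one_way * 2 if bidirectional else one_way


single_32bit = ht_link_bandwidth_gbs(32)       # one 32-bit link, both directions
three_16bit = 3 * ht_link_bandwidth_gbs(16)    # three 16-bit links, both directions

print(f"1x 32-bit HT3.0: {single_32bit:.1f} GB/s")
print(f"3x 16-bit HT3.0: {three_16bit:.1f} GB/s")
```

Under those assumptions the totals land just above the ">40GB/s" and ">60GB/s" figures quoted in this exchange. Note that they are aggregate numbers for both directions; a read-heavy workload would mostly see the one-way half.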


"Also I hope you know there won't be 32-bit links between CPU sockets on DC2 platforms in the foreseeable future."

It doesn't matter. You can always aggregate two 16-bit HT links.

"You should also know that AMD will make the buses half the width for >4P connections."

No, this is only if you want a fully connected mesh. Basically it is a bandwidth vs. latency tradeoff on the system level.

"Also it'll still have the same latency problems that Intel has with external memory controller"

There will be some latency penalty but nowhere near the inefficiency of the FSB. Besides, the HT hub is much simpler and faster than a memory controller.

"not to mention it is basically NUMA for even single CPU that has the memory extension, not particularly a good thing."

You do not understand NUMA correctly. Two channels (or HT links) of memory connections do not make NUMA, because the latency to either is the same as to the other.

Ho Ho said...

abinstein
"If you're willing to get "2GHz effective" DDR3, you should consider DDR2-1066,"

It is late 2008 we are talking about, 2GHz DDR3 is rather conservative. Still my main point was that adding a memory channel will boost bandwidth a lot. You can see the same thing comparing G80 vs R600. One uses 2.16GHz GDDR3 to reach the same level of bandwidth as the other one using only 1.6GHz. Of course R600 is kind of flawed and cannot make use of all that bandwidth, at least not in the majority of games released so far. Just see the first benchmarks of Bioshock. Seems like AMD needs to patch the drivers for nearly every single new game.


"which would offer 17GB/s bandwidth, or 70% more than what you've said."


What I said was about effective bandwidth. The best that AM2 has shown is around 68% effective at 8.78GiB/s for DDR2 800, which has a theoretical peak of 12.8GiB/s. Yes, my predictions for Nehalem weren't all that conservative. To achieve around 40GiB/s effective from 48GiB/s theoretical peak would take around 83% efficiency but it is possible to get that high. AMD with DDR1 reached around the same level. With a really good memory controller that efficiency should be repeatable, especially if Intel makes each channel independent as in K10.
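
The efficiency figures above are just measured bandwidth divided by theoretical peak. A small sketch using only the numbers quoted in this comment (the function name is made up for illustration):

```python
def bandwidth_efficiency(measured, peak):
    """Fraction of theoretical peak bandwidth actually achieved."""
    return measured / peak


am2_ddr2_800 = bandwidth_efficiency(8.78, 12.8)    # best AM2 result quoted above
nehalem_target = bandwidth_efficiency(40.0, 48.0)  # the prediction for 3-channel DDR3

# What the same 48 peak would yield at the AM2-level efficiency instead.
nehalem_at_am2_efficiency = 48.0 * am2_ddr2_800

print(f"AM2 DDR2-800:     {am2_ddr2_800:.1%}")
print(f"Nehalem target:   {nehalem_target:.1%}")
print(f"48 peak at AM2 efficiency: {nehalem_at_am2_efficiency:.1f}")
```

The gap between the two efficiencies is the whole disagreement: at AM2-like efficiency the same peak gives roughly 33 rather than 40.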


"Just change the memory latency settings in your BIOS and watch your memory read throughput drop."

Problem why we haven't seen 2x increase in bandwidth with DDR2 800 is that Intel uses FSB that limits it and AMD has inferior memory controller than it used to. AMD itself has claimed K10 will have a lot better memory controller that will make a lot more bandwidth available for it. If what you say [it is near impossible to improve DDR2 bandwidth] were true then AMD must be lying.


"Nehalem will get a lower percentage of usable bandwidth out of its 38.4GB/s peak."

As I said it is 48 (3*64*2/8), not 38.4.
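
The 48 versus 38.4 disagreement comes down to which DDR3 transfer rate is assumed. A minimal sketch of the peak formula (channels x bus width x transfer rate / 8), with an illustrative function name; the 2.0 GT/s and 1.6 GT/s rates are the two assumptions being argued over in this thread:

```python
def peak_bandwidth_gbs(channels, bus_width_bits, transfer_rate_gts):
    """Theoretical peak memory bandwidth in GB/s."""
    return channels * bus_width_bits * transfer_rate_gts / 8


three_ch_ddr3_2000 = peak_bandwidth_gbs(3, 64, 2.0)    # "2GHz effective" DDR3
three_ch_ddr3_1600 = peak_bandwidth_gbs(3, 64, 1.6)    # DDR3-1600
two_ch_ddr2_1066 = peak_bandwidth_gbs(2, 64, 1.066)    # dual-channel DDR2-1066
two_ch_ddr2_800 = peak_bandwidth_gbs(2, 64, 0.8)       # dual-channel DDR2-800

for label, value in [
    ("3ch DDR3 @ 2.0 GT/s", three_ch_ddr3_2000),
    ("3ch DDR3 @ 1.6 GT/s", three_ch_ddr3_1600),
    ("2ch DDR2-1066", two_ch_ddr2_1066),
    ("2ch DDR2-800", two_ch_ddr2_800),
]:
    print(f"{label}: {value:.1f} GB/s")
```

So both numbers are arithmetically right for their own assumption; the argument is really about which DDR3 speed grade will be available.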


"How do you make "a few mm^2 of copper wire" on-die?"

You don't. The wires I have around the house are 4mm^2 total in cross-section and can deliver well over 2kW of power. You can cut the cross-section down to at most 0.4mm^2 and still be able to give the CPU around 200W without too much problem. Yes, it might be more difficult than with lots of small pins but it will be a tradeoff between having enough power and enough external bandwidth.


"The digi-life page shows that memory bandwidth is limited not just by memory bus clock, but more by memory controller"

Yes, that's why I say that AM2 has a lousy one that can be improved a lot. Also the article shows that on AMD core clock speed also affects memory bandwidth as memory controller speed depends on it. What surprised me in that review was this:

Besides, 80% of the DDR2-667 potential, though looking quite impressive, is still worse than the results demonstrated long ago by the Intel platform with a 266 MHz FSB, easily reaching the real memory bandwidth, practically identical to the theoretical FSB bandwidth (8.53 GB/s).



"Am I correctly seeing you try to apply results of 8800GTX to CTM? Do you know the two are very different in terms of their microarchitecture?"

Yes, CTM (R580/R600) is massively inferior when it comes to small patches and fine grain control, not to mention huge register space and shared cache for synchronization. It is just a fact that no current GPU is any good for ray tracing. Yes, they can do it but with awful efficiency compared to CPUs. Today it is much better to do the tracing on the CPU and post-processing effects on the GPU.


"If each of these HT links is HT3.0, then the total bandwidth of 3x 16-bit HT3.0 will be more than 60GB/s."

So are you suggesting that AMD will have multiple RAM sockets for every CPU or special memory sockets where some HT links are replaced by dram?


"Note that this is just Socket-940."

It won't be much different with other sockets as you already used HT3.0 with near maximum achievable performance. Only way to increase that would be to add more HT links.


"It doesn't matter. You can always aggregate two 16-bit HT links."

By losing in total socket count and/or performance, yes.


"There will be some latency penalty but nowhere near the inefficiency of the FSB."

I'm quite sure it will be better but there will still be two memory controllers with both adding to the latency.


"You do not understand NUMA correctly. Two channels (or HT links) of memory connections do not make NUMA, because the latency to either is the same as to the other."

You misunderstood me. NUMA, aka non-uniform memory access, means that some parts of memory are more expensive to use than others. With some RAM directly attached to the socket and some over HT link the latter is more expensive to use. Of course if AMD intends to attach all the RAM over HT links it won't be NUMA.

Scientia from AMDZone said...

Is it possible that Intel is understating Nehalem's capabilities while talking about major additions like hyperthreading? Yes. Is it likely? No.


The criticism about adding Intel FABs that are only being used for chipsets to the crossover rate is valid. However, I then stated that AMD would be caught up by Q1 09 whether FAB 30 was counted or not. Posts since that point continue to ignore the real point about Q1 09 and instead pointlessly hash and rehash the crossover comparison.


aguia

Your characterization isn't quite right. Core Duo does not have multi-level shared cache. Nehalem will have shared L3. Also, the core clocking is not the same as Penryn. Nehalem includes the ability to up-clock for additional performance rather than just down-clock to save power. Nehalem also includes support for asymmetric cores. This is a major change; Penryn does not include this. This could mean Nehalem includes some type of on-die port similar to K8's XBar but it could mean an in-package link such as PCI-e. Either method would be new.

intheknow

You have completely the wrong idea about cleanroom space at AMD. You are under the mistaken notion that FAB 36 was full and therefore AMD had to remove equipment to make room for 45nm tooling. This is completely false.

Using the original schedule, FAB 36 would have been full at the end of 2007. However, space was freed up in FAB 36 by building a very large cleanroom addition and then moving the test facilities from FAB 36 to this addition. It was easy to add 45nm tooling to FAB 36 because it wasn't full anyway (unlike FAB 30). Moving the test equipment ensures that FAB 36 won't run out of space before the 45nm tooling is able to produce chips. In another quarter, there should be free space in FAB 30 (as its test equipment is also moved) that could be used if necessary. Space is also freed up in FAB 30 by removing 200mm tooling as it is sold; however, the 45nm tooling was installed while FAB 30 was still full.

Yes, AMD does do process research at East Fishkill; however, 45nm is beyond that point. AMD has a TwinScan operating right now at Dresden.

abinstein said...

"What I said was about effective bandwidth. ... To achieve around 40GiB/s effective from 48GiB/s theoretical peak would take around 83% efficiency but it is possible to get that high."

Either you can't understand or you don't admit you are wrong. DDR3 has zero chance to get to the "efficiency" of DDR1, no matter how good the memory controller is. It seems to me you'd rather worship Intel than learn a better computer architecture.

"Problem why we haven't seen 2x increase in bandwidth with DDR2 800 is that Intel uses FSB that limits it and AMD has inferior memory controller than it used to."

Do you know what memory controller does on simple stream of memory read? It doesn't do shit, and I don't know how "inferior" you can go from there. It's like you say "this GbE switch must be inferior because I can push only 400Mbps top through the jack." Yeah right... you won't ever get close to 1Gbps if you keep sending small packets.
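
To illustrate the small-packet point, here is a rough sketch of the wire-level ceiling for TCP payload over gigabit Ethernet. It assumes standard framing overhead (8-byte preamble, 14-byte MAC header, 4-byte FCS, 12-byte inter-frame gap) and 20-byte IPv4 and TCP headers without options. It ignores host and software overhead entirely, so real results will be lower; the function name is illustrative.

```python
ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, MAC header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20         # IPv4 + TCP headers, no options
MIN_L2_PAYLOAD = 46              # shorter frames are padded up to this size


def tcp_goodput_mbps(payload_bytes, line_rate_mbps=1000):
    """Upper bound on TCP payload throughput for a given payload size per packet."""
    l2_payload = max(payload_bytes + IP_TCP_HEADERS, MIN_L2_PAYLOAD)
    bytes_on_wire = l2_payload + ETH_OVERHEAD
    return line_rate_mbps * payload_bytes / bytes_on_wire


for size in (16, 64, 256, 1460):
    print(f"{size:5d}-byte payload: {tcp_goodput_mbps(size):6.1f} Mbps at best")
```

The ceiling falls off quickly as the payload per packet shrinks, which is the effect being described. Whether a memory read stream behaves the same way is the analogy under dispute, not something this sketch settles.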

"AMD itself has claimed K10 will have a lot better memory controller that will make a lot more bandwidth available for it."

The memory controller in K10 has better prefetch, larger buffer, and two independent channels, all of which have nothing to do with plain simple single-threaded memory read bandwidth.

"If what you say [it is near impossible to improve DDR2 bandwidth] would be true then AMD must be lying."

Those things in K10 improve memory latency, not bandwidth. Take a few graduate-level courses before you make another claim, will you? Or better, just Read my blog!

"You don't. The wires I have around the house are 4mm^2 total in cross-section and can deliver well over 2kW of power. You can cut the cross-section down to at most 0.4mm^2 and still be able to give the CPU around 200W without too much problem."

It seems you lack basic electrical knowledge. The wires in your house carry a voltage of 110V and carry 20A. Reducing the voltage to 1.2V, it must carry >160A to deliver your 200W. No freaking way.
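
The arithmetic behind this objection, as a minimal sketch: current is power divided by voltage, and for a conductor of fixed resistance the resistive loss grows with the square of the current. The values are the ones quoted in this exchange; the function names are illustrative.

```python
def current_amps(power_w, voltage_v):
    """Current needed to deliver a given power at a given voltage (I = P / V)."""
    return power_w / voltage_v


def relative_i2r_loss(current_a, reference_current_a):
    """How much more heat the same wire dissipates at a higher current (I^2 * R)."""
    return (current_a / reference_current_a) ** 2


household_power = 110 * 20                 # watts carried by the 110V / 20A circuit
cpu_current = current_amps(200, 1.2)       # amps needed for 200W at 1.2V
loss_ratio = relative_i2r_loss(cpu_current, 20)

print(f"Household circuit: {household_power} W")
print(f"CPU at 1.2V, 200W: {cpu_current:.0f} A")
print(f"Resistive loss vs. 20A in the same wire: {loss_ratio:.0f}x")
```

So the same conductor that comfortably carries 20A would have to carry more than eight times the current, with roughly seventy times the resistive heating, to deliver 200W at core voltage.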

"Yes, that's why I say that AM2 has a lousy one that can be improved a lot."

Do me a favor. Get the best gigabit ethernet switch you can buy, and try to obtain >800Mbps from it through any standard socket programming. I guarantee you can't. You probably can't even get past 600Mbps.

You must think all switch manufacturers on the planet are doing lousy jobs like AMD's "AM2" (which BTW is a socket standard, not memory controller), don't you? It seems to me, Ho Ho, that you know nothing but two: 1) worshiping Intel, 2) FUDing AMD.

"Yes, CTM (R580/R600) is massively inferior when it comes to small patches and fine grain control, not to mention huge register space and shared cache for synchronization."

Your way of saying "inferior" is laughable, and I'm sure AMD's CTM is as "inferior" as its AM2! Have you ever used one, or does this again come out of your ignorance and old habit of FUDing?

"So are you suggesting that AMD will have multiple RAM sockets for every CPU or special memory sockets where some HT links are replaced by dram?"

Don't know what you are talking about. It just requires one CPU socket with 6 16-bit HT links. I just showed you the pin count is not the problem. Socket F will have even more pins available.

"By losing in total socket count and/or performance, yes."

So you think using more HT links will actually decrease performance? What's your logic? Or what's your problem?

"I'm quite sure it will be better but there will still be two memory controllers with both adding to the latency."

No, the extra latency is just one HT round-trip. The memory controller by itself incurs very little latency unless heavy queuing occurs at the internal buffer. The HT latency is roughly 96 clock cycles. Note this clock can run independently from CPU's and be much faster.

"With some RAM directly attached to the socket and some over HT link the latter is more expensive to use."

No, there is no NUMA on single processor. Even if the RAM is connected in two different ways as you describe (which is stupid BTW), it will just be treated as a level of exclusive cache managed by the page table.

The main problem of NUMA is when making a chunk of memory local to one processor, it becomes remote to another. With just single processor, this can never happen, and all you have is cache management.

Scientia from AMDZone said...

Ho Ho

"Wasn't it not only bootable but running at high clock speed?"

No, it was a barely stable A0 stepping. Intel has demoed newer steppings since then.

"If the CPU is anything like Conroe then direct copy of Netburst HT won't work."

To follow the P4 model, Nehalem would need a trace cache but we don't have enough detail to know the implementation. However, Intel has said that HyperThreading on Nehalem has been enhanced from P4 so we have to assume that there is some difference. I agree, it wouldn't be just a copy of P4's.

"Someone hinted at 3-channel DDR3 for Nehalem."

Is that all? Bulldozer will have 4 separate channels. Barcelona will try to get by with split channels and up to 1/3rd more bandwidth from DDR2 1066 over 800. However, Bulldozer is native 8 core so it doubles the channels from 2 to 4. Bulldozer can also get more speed from faster DIMMs as DDR3 gets above 1066.

Intel is stuck at the moment because it doesn't have enough FSB speed to take advantage of DDR2-800, much less 1066. K8 in contrast has been able to make use of DDR2-800 since Rev F in mid 2006.

However, X2 K8 isn't fast enough internally to use that much bandwidth. Barcelona in contrast can fully utilize the bandwidth of 1066. This has Intel a bit concerned because it can't match this until Nehalem.

"AMD has nothing like that, at least not on public roadmaps. Of course it is not on Intels either, these are only rumours."

As I've just shown, AMD's memory bandwidth with Barcelona far exceeds that of Penryn. Intel badly needs a speedup with Nehalem.

Scientia from AMDZone said...

ho ho

"Yes, I know. My point was that when someone would make a similar list for K10 it would also seem as it isn't anything special."

This is false. There are as many differences between K10 and K8 as there are between Penryn and Yonah.

To have an equivalent comparison to Nehalem and Penryn you would need to compare Bulldozer to Shanghai.

Giant said...



No, it was a barely stable A0 stepping. Intel has demoed newer steppings since then.


The first 45nm CPUs were shown running at speeds in excess of 2Ghz encoding videos, playing games etc.:

http://www.anandtech.com/printarticle.aspx?i=2915

This is in stark contrast to AMD who first demonstrated Barcelona running..... Task Manager.

Scientia from AMDZone said...

ho ho

"Scientia said his timetable was conservative and "production in Q2" doesn't mean it must be the end of the quarter."

Why do you have so much trouble remembering what I actually say? Grose gave the Q2 08 timeframe; my conservative timeframe is Q3 08. Matching the 65nm ramping schedule would also indicate Q2 08 so, again, my Q3 08 timeframe is conservative.

"Intel currently fights against AMD having around 30% less bandwidth, ~5.3GB/s on 1066MHz and 6.5 on 1333MHz FSB."

Your figures are way off. Rev F has the equivalent of a 1600Mhz FSB versus Intel's current maximum 1333Mhz. This makes Intel 17% less rather than 30% less. However, dual core K8 doesn't have enough internal speed to actually use the full 1600Mhz bandwidth so it doesn't really help AMD.

K10 has the equivalent of a 2133Mhz FSB. Compared to Intel's 1333Mhz FSB Intel has 38% less. When Intel releases a 1600Mhz FSB they will be 25% behind. Now, 25% is one entire core on a quad core die so this is a lot. Intel will definitely be behind in this area until Nehalem. But, as I've already mentioned, Bulldozer has four channels and will easily pull ahead of Nehalem.
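
The percentages in the last two paragraphs follow from comparing the equivalent transfer rates directly. A minimal sketch, using only the MHz figures given above (the function name is illustrative):

```python
def deficit_pct(slower_mhz, faster_mhz):
    """How far the slower bus falls short of the faster one, in percent."""
    return (1 - slower_mhz / faster_mhz) * 100


intel_1333_vs_rev_f = deficit_pct(1333, 1600)   # current Intel FSB vs Rev F
intel_1333_vs_k10 = deficit_pct(1333, 2133)     # current Intel FSB vs K10
intel_1600_vs_k10 = deficit_pct(1600, 2133)     # future 1600MHz FSB vs K10

print(f"1333 vs 1600: {intel_1333_vs_rev_f:.1f}% less")
print(f"1333 vs 2133: {intel_1333_vs_k10:.1f}% less")
print(f"1600 vs 2133: {intel_1600_vs_k10:.1f}% less")
```

These are ratios of theoretical peak rates only; they say nothing about how much of that bandwidth either platform actually achieves.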

"When before has there been such a leap in CPU bandwidth in couple of years?"

Intel and AMD both doubled memory bandwidth from early P4 and K7 to later P4 and K8 from 400Mhz to 800Mhz. AMD doubled this again with Rev F to 1600Mhz. Intel hasn't quite doubled their bandwidth yet but should soon. Barcelona increases Rev F's bandwidth by another 33%.

Scientia from AMDZone said...

ho ho

"It is late 2008 we are talking about, 2GHz DDR3 is rather conservative."

No. Xbitlabs: JEDEC DDR3.

The DDR3 standard is intended to operate over a performance range from 800MHz to 1600MHz

"Problem why we haven't seen 2x increase in bandwidth with DDR2 800 is that Intel uses FSB that limits it and AMD has inferior memory controller than it used to."

False. Rev F's memory controller is fine. The chip itself is not able to use the bandwidth.

"AMD itself has claimed K10 will have a lot better memory controller that will make a lot more bandwidth available for it."

Not exactly. K10 sorts the memory requests so that it can have the most effective access pattern. It also includes its own prefetch. A big change is that the core prefetcher now goes directly to L1 instead of L2. However, the biggest change is that four cores will be able to take all of the bandwidth.

"So are you suggesting that AMD will have multiple RAM sockets for every CPU or special memory sockets where some HT links are replaced by dram?"

No. Bulldozer uses four dedicated HT links to communicate with four memory controllers that each control 4 DIMMs. This is G3MX.

"It won't be much different with other sockets as you already used HT3.0 with near maximum achievable performance. Only way to increase that would be to add more HT links."

With Bulldozer you get 8 HT links altogether. 4 are dedicated to memory control.

"I'm quite sure it will be better but there will still be two memory controllers with both adding to the latency."

Not two memory controllers. K10 uses split access but this is within one controller. K10 uses some prefetch techniques to hide latency which is what C2D did so successfully.

"With some RAM directly attached to the socket and some over HT link the latter is more expensive to use."

There is no such configuration.

"Of course if AMD intends to attach all the RAM over HT links it won't be NUMA."

Yes, with Bulldozer.

Ho Ho said...

abinstein
"Either you can't understand or you don't admit you are wrong. DDR3 has zero chance to get to the "efficiency" of DDR1, no matter how good the memory controller is."

Only time will tell.


"Do you know what memory controller does on simple stream of memory read? It doesn't do shit, and I don't know how "inferior" you can go from there."

Then explain why FX62 was so much faster than 4000+. Also you seem to have forgotten how AMD increased its memory controller efficiency with DDR1. It wasn't as effective from day one, you know.


"just Read my blog!"

Right, "it is so because I said it is so".


"The wires in your house carry a voltage of 110V and carry 20A"

Actually they carry 230V at 10A. IIRC in the US 20A is only for heavy-duty stuff, the rest runs on 10A. For really heavy-duty things we have 380V three-phase wires that can deliver tens of amperes. E.g. our house has a main switch rated for 60A. Though that must be a bit extreme, it's just that my dad used to use lots of very power hungry equipment (think >>10kW).


"Reducing the voltage to 1.2V it must carry >160A to deliver your 200W. No freaking way."

OK, how big is the total cross-section of the CPU power pins used today? I hope you know how many amperes go through the pins on an OC'd 90nm P4D without causing problems.


" I'm sure AMD's CTM is as "inferior" as its AM2!"

Well, AM2 K8's run ray tracing around half the speed of similarly clocked Core2's thanks to half the SSE width. So technically, yes, you could say that AM2 K8 are inferior.


"Have you ever used one, or does this again come out of your ignorance and old habit of FUDing?"

Have you used any GPU or other piece of HW to do ray tracing? Have you ever heard of anyone using them? If yes then what were their results? By suggesting a regular GPU would do ray tracing better than CPUs you prove you know nothing of the workload ray tracing has. Plain massive FP performance and memory bandwidth won't help you if the architecture doesn't work for what is needed.


"So you think using more HT links will actually decrease performance? What's your logic? Or what's your problem?"

If they link those HT links together in order to provide enough memory bandwidth to the socket they can't directly connect as many sockets with real CPUs as they otherwise could. I thought it was obvious.


"No, the extra latency is just one HT round-trip"

Are you sure there won't be an additional memory controller in the memory socket?


"The HT latency is roughly 96 clock cycles"

At 3GHz this is around 30ns. Current AM2 CPUs have total RAM latency of around 45-55ns. That puts the total latency in almost the same area as was P4.
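
The conversion used here, as a small sketch: cycles divided by the clock in GHz gives nanoseconds. The 96 cycles, the 3GHz clock and the 45-55ns AM2 latency are the figures from this exchange; treating the extra hop as a simple addition to the existing latency is a simplifying assumption, and the function name is illustrative.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles / clock_ghz


ht_hop_ns = cycles_to_ns(96, 3.0)                   # one HT round trip at 3GHz
total_low, total_high = 45 + ht_hop_ns, 55 + ht_hop_ns

print(f"HT round trip: {ht_hop_ns:.0f} ns")
print(f"Total with current AM2 latency: {total_low:.0f}-{total_high:.0f} ns")
```

If the HT clock runs faster than the 3GHz assumed here, as suggested earlier in the thread, the hop cost shrinks proportionally.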


"No, there is no NUMA on single processor."

Who said the memory has to be attached to a different real CPU? Why can't it simply be, well, a special purpose chip specifically meant for forwarding memory requests?


"Even if the RAM is connected in two different ways as you describe (which is stupid BTW), it will just be treated as a level of exclusive cache managed by the page table."

Stupid or not but isn't that what you are basically suggesting?


"With just single processor, this can never happen, and all you have is cache management."

So you are suggesting that the directly attached RAM will become something like a cache for the rest of the RAM connected over HT links? Poor OS developers.


scientia
"Is that all? Bulldozer will have 4 separate channels."

I don't know, those are the only rumours I've heard. Perhaps Intel will use some of the die-stacking technologies with it to massively increase cache size, who knows. Btw, was that four-channel thing talked on AMD analyst day? I must have missed that.


"Your figures are way off. Rev F has the equivalent of a 1600Mhz FSB versus Intel's current maximum 1333Mhz."

I was talking about real-world achievable performance.


"Intel and AMD both doubled memory bandwidth from early P4 and K7 to later P4 and K8 from 400Mhz to 800Mhz. AMD doubled this again with Rev F to 1600Mhz."

Do you suggest that going from 1600MHz FSB (if lucky) to 3 (or more?) 64bit DDR3 channels directly attached to the CPU will at most double the bandwidth?

Scientia from AMDZone said...

Giant

"The first 45nm CPUs were shown running at speeds in excess of 2Ghz encoding videos, playing games etc.

This is in stark contrast to AMD who first demonstrated Barcelona running..... Task Manager."


Not really. Intel gave no benchmarks which suggests that performance was not that good. To show genuine stability you would need to have seen these ES chips run on motherboards without BIOS updates. And, we didn't see that.

Later, with more stable steppings Intel became more open about performance.

Ho Ho said...

scientia
"To show genuine stability you would need to have seen these ES chips run on motherboards without BIOS updates. And, we didn't see that."

When did we see it from AMD?

InTheKnow said...

Scientia said...

However, I then stated that AMD would be caught up by Q1 09 whether FAB 30 was counted or not. Posts since that point continue to ignore the real point about Q1 09 and instead pointlessly hash and rehash the crossover comparison.

You made the crossover comparison to demonstrate that AMD was already closing the gap and extrapolated their ability to continue the pace from there. Since that is the supporting foundation of your argument that AMD will be able to close the existing gap, I fail to see how it is irrelevant.

In addition to pointing out an inconsistency in your basic premise, I responded to your main point. I questioned whether or not AMD could afford the accelerated pace they have set out from an economic point of view. I did not ignore your point, but you ignored my response.

Keep in mind that whatever their future plans might be, AMD is not currently a financially healthy company and this plan will further strain their financial wellbeing going forward. I know you believe the worst is behind them, but execution of this plan will continue to put financial pressure on them that you have failed to account for in your previous analysis.

You have completely the wrong idea about cleanroom space at AMD. You are under the mistaken notion that FAB 36 was full and therefore AMD had to remove equipment to make room for 45nm tooling. This is completely false.

The process you are describing sounds like poor planning if it was actually done in the way you seem to be implying.

What I read was: AMD built a fab. Started filling it up, said oh, wait we need more room, built an addition and then moved some of the already installed equipment into the new facility.

That might not be what they did, but that is how it sounds from your post. If so, their investors should hang them.

I should also point out that final test doesn't use the same level of cleanliness that the production fab uses since it is after C4 and the wafers are about to be sent to sort and packaging. Metal-1 e-test needs to be in the same fab since you have a lot more processing to do before the wafers are finished, so you can't be talking about moving that.

You also have not provided anything from Intel stating that Penryn and Nehalem use the same core. I'm assuming that means that your statement that Intel has told us they use the same core is incorrect. Though I will concede that it might be the same core, they haven't publicly said as much.

abinstein said...

"Then explain why FX62 was so much faster than 4000+. Also you seem to have forgotten how AMD increased its memory controller efficiency with DDR1. It wasn't as effective from day one, you know."

Memory controller doesn't do complicated things, but it does introduces delay, which is directly reduced by the faster clock.

"Right, "it is so because I said it is so"."

No, it's because I proved it. I have scientific evidence there. Your argument has nothing but Intel-worshiping.

"Ok, how big is the total cross area of the CPU power pins used today?"

The problem is not on the pin. A pin connects to one edge point on die, and where does your 10A go from there on?

"I hope you know how many amperes go through the pins on an OC'd 90nm P4D without causing problems."

Yes, a lot of amperes going through many pins. The "many" is the point.

"Who said the memory has to be attached to a different real CPU? Why can't it simply be, well, a special purpose chip specifically meant for forwarding memory requests?"

Does that special purpose chip run programs? If it doesn't, then you have single processor, and no NUMA consideration. If it does (for example, CTM with a GPU), then you have multiple asymmetric processors, and NUMA.

"Stupid or not but isn't that what you are basically suggesting?"

No, you are not understanding.

"So you are suggesting that the directly attached RAM will become something like a cache for the rest of the RAM connected over HT links? Poor OS developers."

I suggest what you said is stupid.

Scientia from AMDZone said...

InTheKnow

If AMD can deliver Shanghai in Q3 08 then they will have reduced some of Intel's process lead. Secondly, AMD's 45nm production should be in good shape by the time Bulldozer is released in Q2 09.

"I questioned whether or not AMD could afford the accelerated pace they have set out from an economic point of view."

I don't know about 32nm. That could be dependent on AMD's financial shape coming out of 2008.

"but execution of this plan will continue to put financial pressure on them that you have failed to account for in your previous analysis. "

It won't affect 45nm.

"What I read was: AMD built a fab. Started filling it up, said oh, wait we need more room, built an addition and then moved some of the already installed equipment into the new facility."

AMD said that bump and test was previously done in the main FAB cleanroom space. So, they built new cleanroom space and moved the bump and test equipment out of the FAB and into the new space. This frees up space in the FAB which will be used for production tooling. I do not know what level of cleanroom is required for bump and test.

Ho Ho said...

abinstein
"The problem is not on the pin. A pin connects to one edge point on die, and where does your 10A go from there on?"

Why not connect the power wires under the packaging to the pins?


"Yes, a lot of amperes going through many pins. The "many" is the point."

775 is not all that much. The amperes that are there should be well over 200 when under heavy OC and the pins have no problems delivering it.


"Does that special purpose chip run programs? If it doesn't, then you have single processor, and no NUMA consideration."

Now you are arguing over semantics. Fact is that with having some RAM connected over HT link that one would be more expensive to access. That kind of architecture is definitely not UMA.

Have you got anything to back that up? Wikipedia seemed to say what I said. Of course there haven't been too many setups similar to the "memory socket" design, so finding anything could be problematic.


scientia
"It won't affect 45nm."

Yes but what about anything that comes after that? AMD needs to invest in the research now but it seems to have "little" problems with money. Also it has over $1B to pay back every year for around four years. Definitely not a simple task.


"If AMD can deliver Shanghai in Q3 08 then they will have reduced some of Intel's process lead."

Wasn't it you yourself who said that comparing crossover dates was much more accurate than comparing release dates? If so then release date doesn't show too much if the process is not mature and it takes a whole lot of time to finally reach a crossover.

Ho Ho said...

I just read some stuff on Wikipedia and it seems Nehalem will also have up to 4 memory channels. Though there seem to be no sources listed for that information:

http://en.wikipedia.org/wiki/Nehalem_(CPU_architecture)#Details_on_the_processor

At 2GHz effective four channels would deliver around 64GiB/s theoretical peak. I would guess >50GB/s (>78% efficiency) is (easily) reachable in bandwidth benchmarks.
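
As a rough check of that arithmetic, here is a minimal Python sketch. It assumes each channel is a standard 64-bit (8-byte) DDR data path, which the comment does not state, and the function name is made up for illustration. Under that assumption the 64 figure comes out in decimal GB/s (roughly 59.6 GiB/s), and 50GB/s is about 78% of it.

```python
def peak_bandwidth_gb_s(channels, transfers_per_s, bus_bytes=8):
    """Theoretical peak in decimal GB/s (1 GB = 1e9 bytes).

    bus_bytes=8 is an assumption: a 64-bit data path per channel.
    """
    return channels * transfers_per_s * bus_bytes / 1e9


# Four channels at 2 GHz effective (2e9 transfers per second each).
peak = peak_bandwidth_gb_s(channels=4, transfers_per_s=2e9)
peak_gib = peak * 1e9 / 2**30   # same number expressed in binary GiB/s
efficiency = 50 / peak          # fraction of peak needed to reach 50 GB/s

print(f"peak: {peak:.1f} GB/s ({peak_gib:.1f} GiB/s)")
print(f"50 GB/s is {efficiency:.1%} of peak")
```

The same function can be reused for other configurations by changing the channel count or transfer rate.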

Ho Ho said...

I seem to have missed one of the previous posts

scientia
"The DDR3 standard is intended to operate over a performance range from 800MHz to 1600MHz"

When did JEDEC say that 1066MHz DDR2 will be standardized? Certainly not a couple of months after the first modules were introduced.

In short, yet again time will tell what we'll really have. So far fact is that companies have already demonstrated RAM running at well over 1600MHz and JEDEC is thinking about adding DDR2 1066 to the list of previously defined speeds. How long did it take for DDR2 to reach 1066MHz?


"False. Rev F's memory controller is fine. The chip itself is not able to use the bandwidth."

Not even in synthetic benchmarks?


"With Bulldozer you get 8 HT links altogether. 4 are dedicated to memory control."

All are 16bit?


"No. Bulldozer uses four dedicated HT links to communicate with four memory controllers that each control 4 DIMMs. This is G3MX."

Now I see why there was such confusion. When talking about attaching RAM over HT I was always talking about adding additional bandwidth by connecting sockets with G3MX to a real CPU.


"Not two memory controllers."

I was talking about attaching more RAM to a socket with G3MX actually. That G3MX seems to be a special purpose chip that does little besides acting as a memory controller. Even so, for memory access it'll act just like any other CPU in a NUMA machine.


"There is no such configuration."

Isn't current K8 using directly connected RAM?

____________________

One OT question.

On amdzone some people make a big fuss over the Lightweight Profiling Proposal. Is the only difference between that and the system available since the original Pentium that AMD proposes to standardize the instructions to access performance monitoring counters?

If so then I can't see anything too big about it. There are existing libraries that abstract away the CPU specific instructions and provide a stable API for using those counters and events. Sure, a standardized ISA would make it much simpler but it wouldn't be as revolutionary as some seem to think. Also it would make it kind of difficult to provide access to CPU specific things such as trace cache misses.

They bring out one difference between existing systems and their proposal and that is that theirs doesn't use interrupts. Well, from what I can tell from personal experience, using PAPI to track around 8 simultaneous events had around 1-2% impact on performance. PAPI uses perfctr on Linux and that one does use interrupts for certain things, mostly to notify the application when a counter overflows. From what I have seen I wouldn't call losing interrupts to improve performance that big a deal.

Another thing is that this is only a proposal, there exists no real HW that would implement it.

InTheKnow said...

Scientia said...

It won't affect 45nm.

Actually, it will, because they are moving to Shanghai more quickly than the standard pace would dictate. This means they will not realize the full benefit of the money they poured into Barcelona. They again shorten the life cycle on Shanghai to move to Bulldozer. At each of these transitions, they are leaving money on the table that they could have made if they weren't playing catch up.

So while orders for tooling may have been placed and many of the costs to physically transition to the new process technology paid, there is still a tangible cost to what they are trying to do. There are all the costs of design, process development (yes, a new design requires some development), ramp of the new product and validation. None of that is free.

I don't remember the required clean room class offhand for bump and test, but I do know you don't even need a bunny suit. If AMD was physically doing that in the main fab, they were paying quite a premium. I suspect that what they actually had was a less clean section of the facility, separate from the main fab space, where these operations were carried out.

Scientia from AMDZone said...

ho ho

"At 2GHz effective four channels would deliver around 64GiB/s theoretical peak."

Your information is wrong. It is actually four channels of FBDIMM 1066. So, about half of your estimate.

Scientia from AMDZone said...

ho ho

First of all, Nehalem isn't using DDR3; it is using FBDIMM. FBDIMM is only going to give you 1066, not 1600 or 2000. However, it appears that Intel is going to use AMD's G3MX idea and put controller chips on the motherboard for DDR3. It is not currently known if Intel will stick with the FBDIMM controller or if they will replace it with CSI. A CSI version wouldn't be out until probably Westmere. However, I can't see any reason why Intel couldn't use the FBDIMM controller through secondary supporting chips to allow the use of ordinary ECC DDR3.

"Another thing is that this is only a proposal, there exists no real HW that would implement it."

You have this backwards. This would be hardware monitoring that would be accessed with new x86 instructions. If released it would be implemented by the processor itself and does not require external hardware. Your understanding of the interrupts is incorrect. You can still use interrupts to access the information however you do not need interrupts to collect the information since it is collected by the processor itself.

This information could be used for both code profiling during development and also realtime monitoring and load balancing.

Finally, your understanding of NUMA is incorrect. A 4 core K10 is only NUMA if you have more than one socket. However, an 8 core Shanghai would be NUMA even on a single socket. Bulldozer however is not NUMA on one socket even with 16 cores and 4 memory channels.

abinstein said...

"Finally, your understanding of NUMA is incorrect. A 4 core K10 is only NUMA if you have more than one socket. However, an 8 core Shanghai would be NUMA even on a single socket."

Yup, that is correct, suppose AMD makes 8-core Shanghai with MCM and with a single or two separate memory interfaces from the two dies.

abinstein said...

"Why not connect the power wires under the packaging to the pins?"

Then it is no different from connecting the wires to several pins. However with the latter you get more uniform packaging, lower cost, and better electrical properties.


"775 is not all that much. The amperes that are there should be well over 200 when under heavy OC and the pins have no problems delivering it."

You do understand that OC invalidates your warranty. That said, 200A over 100 pins or so is much better than 200A over one single pin.
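
To put rough numbers on the pin argument, here is a small Python sketch. It assumes the current is shared evenly and that every pin has the same resistance; both are simplifications, and the function names are made up for illustration. Resistive loss is taken as I^2 * R in arbitrary units.

```python
def amps_per_pin(total_amps, pins):
    """Current carried by each pin if the load is shared evenly."""
    return total_amps / pins


def relative_heating(total_amps, pins, pin_resistance=1.0):
    """Total I^2 * R loss summed over all pins, in arbitrary units."""
    per_pin = amps_per_pin(total_amps, pins)
    return pins * per_pin ** 2 * pin_resistance


many = amps_per_pin(200, 100)   # amperes per pin with 100 power pins
single = amps_per_pin(200, 1)   # amperes through one single pin
ratio = relative_heating(200, 1) / relative_heating(200, 100)

print(many, single, ratio)
```

With identical pins the single-pin case dissipates 100 times as much in total. A single conductor with a much larger cross-section would have a lower `pin_resistance`, which is the caveat about cross-sectional area raised elsewhere in this thread.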


"Now you are arguing over semantics. Fact is that with having some RAM connected over HT link that one would be more expensive to access. That kind of architecture is definitely not UMA."

Wrong, as usual. You have completely false understanding about NUMA. I've already told you, the different access time alone does not make NUMA, unless multiple processors (or "nodes") are in question.

Ho Ho said...

scientia
"It is actually four channels of FBDIMM 1066"

Where did you get your information? That low memory speed is definitely not going to happen.


"First of all, Nehalem isn't using DDR3; it is using FBDIMM. FBDIMM is only going to give you 1066, not 1600 or 2000."

You do know that FBDIMM can use DDR3 chips too, do you? One of the main points about FBDIMM was you can use different memory chips without needing to change motherboard or any other part of the system.


"However, it appears that Intel is going to use AMD's G3MX idea and put controller chips on the motherboard for DDR3"

Where did you get that idea?


"You have this backwards. This would be hardware monitoring that would be accessed with new x86 instructions. If released it would be implemented by the processor itself and does not require external hardware."

What I meant was that there is no HW implementing this yet. K10 will definitely not have it. Also the current system doesn't need any external HW either.


"You can still use interrupts to access the information however you do not need interrupts to collect the information since it is collected by the processor itself."

It has been like that since the original Pentium when performance monitoring counters were introduced.


"This information could be used for both code profiling during development and also realtime monitoring and load balancing."

Again, this has been so from day one, nothing new about that.


"Finally, your understanding of NUMA is incorrect. "

Give me a definition of NUMA that would cover G3MX like things.


abinstein
"You do understand that OC invalidates your warranty."

Yes but that wasn't the point I was trying to make. I was just saying that it doesn't take very wide wires to move an awful lot of power.


"That said, 200A over 100 pins or so is much better than 200A over one single pin."

That depends on how big a cross-sectional area the single pin has.


" I've already told you, the different access time alone does not make NUMA, unless multiple processors (or "nodes") are in question."

So G3MX doesn't count as a separate processor?

Aguia said...

It seems that Ati is lowering the power consumption of the 2900XT cards.

Diamond Viper Radeon HD 2900XT 1GB

One of the Ati 2900 line problems is already gone.

Aguia said...

It seems the "bad" Ati is improving:

DirectX 10 Games

Ho Ho said...

aguia
"It seems the "bad" Ati is improving:"

It is? Lost Planet doesn't support AA on 2900, and as the drivers do not have application specific fixes it is dead slow. The other two seem to have those specific optimizations so they are running fine.

Bioshock seems to be another such game where R600 is not competitive without application specific tweaking. How many months have AMD/ATI tweaked the drivers and they are still not working correctly?

Aguia said...

Yes ho ho, Bioshock has a problem; a patch is going to be released to solve it.

Lost Planet is a ported game whose code Ati never saw before the game was released.

And the cards haven’t been out for as many months as you are implying. You are a little biased hoho: when Ati is late it is very late, yet now you say that Ati had many months to optimize the drivers. Which one is it, hoho?
By your talk looks like Ati is the one that released the DX10 cards in November/2006.
Ati released the 2600 line in July/2007. That was last month, in case you didn’t know which month we are in; it’s August…
Did you already forget how many months Nvidia took to release working drivers for Vista? Or how badly SLI was working, or not working?

Ho Ho said...

aguia
"By your talk looks like Ati is the one that released the DX10 cards in November/2006. "

Are you saying that AMD/ATI driver guys were idling while the chip had all the delays? After all it was supposed to be released much earlier than it really was; the first rumours were talking about a September release. During all the delays the drivers were still being worked on.

Scientia from AMDZone said...

ho ho

"Where did you get your information? That low memory speed is definitely not going to happen.

You do know that FBDIMM can use DDR3 chips too, do you?"


Perhaps you should look at the JEDEC spec. Page 18.

The highest current specification for FBDIMM for 2008 is DDR3-800 which is obviously slower than the current DDR2-1066. It has been suggested that this will be increased to DDR3-1066.

It doesn't matter what FBDIMM is theoretically capable of. What really matters is what official specs are created and what DIMM modules are actually manufactured. The truth is that FBDIMM is currently not very popular among memory manufacturers because its volume is only a fraction of what Intel promised. It should be noted that registered memory is also falling behind for the same reason: low volume.

However, AMD is cleverly sidestepping both the FBDIMM and registered DIMM problem by doing registration and serial communication with a separate HT compatible chip on the motherboard. This allows ordinary ECC DIMMs to be used but to get the same benefits of both registered memory and FBDIMM serial links. But without using either FBDIMM or registered memory. This is G3MX.

As I've been trying to explain to you, Intel has very little choice at the moment but to stumble ahead with the FBDIMM controller but at a great disadvantage because FBDIMM is slower, has higher latency, and draws more power than plain DDR2 or DDR3.

So, it has been suggested that Intel will copy AMD's approach and create a hybrid FBDIMM solution. This would consist of a single chip on the motherboard with a fanout of 4 instead of one chip per DIMM as is the case with AMB. This chip would include the same FBDIMM serial link that current AMB chips have. In other words, to the CPU it would look just like one large FBDIMM on each channel but in reality each one would be four DIMMs. Again, this is what has been suggested but I suppose Intel could just stick with true FBDIMM and live with the problems.

Scientia from AMDZone said...

ho ho

NUMA is not that difficult to understand. If all the memory access goes through one memory controller then it isn't NUMA. However, if you use two or more chips each with their own distributed memory then it is.

So, a single K10 or Nehalem would not be NUMA. However dual socket or higher would be. Penryn is not NUMA because all chips share the same memory. Note that this follows the above definition because Penryn and Clovertown chips only have a single memory controller on the Northbridge.
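
The rule of thumb above can be written down as a tiny Python sketch. It encodes only the rule as stated in this comment (count the memory controllers that each own a separate pool of memory); it is not a formal definition of NUMA, the class and function names are made up for illustration, and it does not settle how a G3MX chip should be counted.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    # Number of controllers that each own a separate pool of memory.
    memory_controllers: int


def is_numa(system):
    """Rule from the comment: two or more distributed memory pools means NUMA."""
    return system.memory_controllers >= 2


examples = [
    System("single-socket K10", 1),
    System("dual-socket K10", 2),
    System("dual-socket Penryn (one controller on the northbridge)", 1),
]

for s in examples:
    print(f"{s.name}: {'NUMA' if is_numa(s) else 'not NUMA'}")
```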

Scientia from AMDZone said...

giant

Yes, I've noted your AMD bankrupt in Q2 08 comment in my Absurd Predictions list along with Lex's prediction that Phenom and Barcelona prices are going to nosedive in Q1 08 and that K10 will be in the Celeron price range when Nehalem appears and, of course, Roborat's prediction that AMD will completely outsource in 2008.

Does anyone else have an absurd prediction while I'm at it?

Greg said...

AMD will start selling unicorns instead of processors?

Ho Ho said...

scientia
"Perhaps you should look at the JEDEC spec. Page 18."

When was that paper released? I couldn't find a date on the paper but from the graphs I'd say it is at least one year old, if not older. When I looked at the specifications released at similar times for DDR2 then 800MHz was the absolute maximum I saw.

Also that one wasn't the real spec, it was only some presentation. Real specs are on jedec site.


"The highest current specification for FBDIMM for 2008 is DDR3-800 which is obviously slower than the current DDR2-1066."

I found no mention of 1066MHz DDR2 in that paper, does that mean it doesn't exist? If faster FBDIMMs aren't listed in some old presentation and from that you conclude they won't exist, then why make an exception for DDR2?


Do you suggest that JEDEC will make 1066MHz DDR2 as a standard but not higher speed DDR3 that wasn't listed on the (old) paper you linked to?

Also can you link to a real paper on JEDEC that would list 1066MHz DDR2 as official standard? My understanding is that it is still a proposal and it hasn't yet been made as standard.


I'm not trying to say that DDR2 1066 won't happen. I'm just saying that there is no reason why faster DDR3 and FBDIMM couldn't happen.


Also you "forgot" to tell me where you got your information that Intel will only support FBDIMM with Nehalem.



"If all the memory access goes through one memory controller then it isn't NUMA. However, if you use two or more chips each with their own distributed memory then it is."

Does G3MX contain a memory controller or not? If it does then what makes it different from a regular 2P AMD box with two regular CPUs?



"Does anyone else have an absurd prediction while I'm at it?"

How about AMD having (huge) money problems for several next years because of having troubles with paying back all that money they borrowed? My very crude calculations say it has to pay at least 20% of total revenue.


Also, do you have any further comments on the profiling things? Perhaps point out some of the mistakes I might have made.

Aguia said...

During all the delays the drivers were still being worked on.

Maybe they could, but it’s never the same thing. Do you think drivers guys can optimize a product that isn’t used by any one? That maybe not even they have good working cards to do some heavy tests?

If things work like you said Vista would have been released with all its problems solved.

Ho Ho said...

aguia
"Do you think drivers guys can optimize a product that isn’t used by any one?"

Yes, they can. How big were the performance problems the G80 series had right after release or three months after it? Yes, I know that Vista had some problems but under DX9/XP everything was fine. Not so with R600.


"If things work like you said Vista would have been released with all its problems solved."

Developing software and software for hardware are two different things.

Aguia said...

Does anyone else have an absurd prediction while I'm at it?

Yes I have one.

Lex, Roborat and Giant will all BK because they wasted too much money in upgrading their systems with high price range components and outsourcing the oldest ones for free.

Axel said...

Scientia

Yes, I've noted your AMD bankrupt in Q2 08 comment in my Absurd Predictions list along with Lex's prediction that Phenom and Barcelona prices are going to nosedive in Q1 08 and that K10 will be in the Celeron price range when Nehalem appears and, of course, Roborat's prediction that AMD will completely outsource in 2008.

What's absurd is believing that AMD currently have a viable business model. So over the last 12 months while being at full or near-full fab utilization and selling everything they manufactured, AMD still managed to lose $2 billion.

Hence, another good candidate for the Absurd Predictions list is the prediction that AMD can recover their margins and profitability without changing the essential nature of how they do business.

Ho Ho said...

There are some quite interesting points made on Roborat's blog about Scientia's analysis. I wonder if I should bring some of it here to make things a bit more interesting.

abinstein said...

"What's absurd is believing that AMD currently have a viable business model. So over the last 12 months while being at full or near-full fab utilization and selling everything they manufactured, AMD still managed to lose $2 billion."

Axel, you are not understanding. This "problem" is not business model, but business size. AMD has had exactly the same business model as Intel over the last 10-15 years, however, due to its smaller size and the high-fixed cost nature of microprocessor manufacturing it is not making much money.

Lately AMD has been doing two things to break this chicken-and-egg problem (small size, no money; no money, small size). First, it is trying to take market share and partners; it has been quite successful in doing so, and anyone can see today's AMD is supported by many more partners than a mere 5 years ago.

Second is it bought ATi, and that's a point to start breaking away from the Intel-like business model. It can talk about fusion; it can bring both graphics and processor to mobile devices. Its wings now span from the highest-performing supercomputer to most consumer-centric gaming console.

There is no denying that AMD faces tough competition from Intel, which regularly plays dirty, copies ideas, and uses brute force. AMD's bottom line is still not as strong as Intel's, or even Apple's, but it's obvious that AMD's top line has improved dramatically since Opteron was first introduced.

If you look at the patent filings, AMD's portfolio today is several times stronger than it was just a few years ago. Those patents are cited and followed by all companies, Intel included. There is nothing in the world that can stop AMD from making better processors.

This is a fact, and it is what Intel knows. The chip giant, in its usual way of playing dirty, is thus doing anything it can to spread FUDs against AMD. We will see how Intel does its nasty works, and how AMD foils them.

Giant said...



Lex, Roborat and Giant will all BK because they wasted too much money in upgrading their systems with high price range components and outsourcing the oldest ones for free.


Hardly. I keep two or three PCs around here anyway, so by the time I get rid of the old stuff (either by giving to family or selling it off) it is usually quite old. I have a Q6600 system now with 2GB memory and an 8800 GTS 640MB video card. (The third fastest video card that one can buy) My other system is an E6600 system with 2GB ram and a Geforce 7600 GT. I fully intend on upgrading that with a G92 video card upon their launch in November.

See these Bioshock benchmarks:

http://www.firingsquad.com/hardware/bioshock_directx10_performance/page5.asp

From my own experience, I can run the game at 1920x1200 perfectly with all the details at maximum, 16X AF enabled and 2X AA enabled.

For anyone into games, Bioshock is well worth the purchase for either Xbox 360 or Windows. (I had played the demo on the 360 prior to the game being released).

The 20% number was generous. Gelsinger's actual claim at IDF was "40% faster for gaming"

40% is quite possible on the 'Extreme Edition' CPU with the full 12mb of cache. But this would be running at 800x600 or something silly like that getting 300fps in Quake 4 or whatever. Running a modern game at 1920x1200 or higher with all the details at maximum and there would be maybe, at best, a 5% boost.

Don't forget, AMD's own Randy Allen boasted a 40% boost over what Clovertown can deliver today:

http://youtube.com/watch?v=G_n3wvsfq4Y&v2

I would be very interested to see benchmarks of Barcelona, the launch is not far away.

AMD's BK by Q2'08 is quickly becoming more and more likely. They have $1.5bn in cash at the end of the last quarter.

But then they took out another loan and used that loan, in addition to existing cash on hand, to pay back the Morgan Stanley loan. If AMD paid back $500m of the MS loan (which was $2.5bn) earlier in the year with the first debt offering, then used the second $1.5bn offering to pay them back, then they had to use $500m of existing cash to finish that loan off. That leaves $1bn BEFORE the Q3'07 loss. If we are conservative and say AMD will lose $400m (down a bit from $600m last quarter), they have just $600m in cash left at the end of the quarter.
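
The chain of subtractions in that estimate can be laid out as a short Python sketch. Every figure is taken from the comment itself, in millions of dollars, and none has been checked against AMD's filings; the $400m Q3 loss is the commenter's own assumption and the variable names are made up for illustration.

```python
# All amounts in $ millions, as stated in the comment (unverified).
cash_start = 1500        # cash at the end of last quarter
loan_total = 2500        # size of the loan being repaid
repaid_earlier = 500     # repaid earlier in the year from the first debt offering
second_offering = 1500   # second debt offering applied to the loan

# Whatever the two offerings do not cover has to come out of cash on hand.
from_cash = loan_total - repaid_earlier - second_offering
cash_after_repayment = cash_start - from_cash

assumed_q3_loss = 400    # the commenter's assumption, not a reported figure
cash_end_q3 = cash_after_repayment - assumed_q3_loss

print(from_cash, cash_after_repayment, cash_end_q3)
```

Changing `assumed_q3_loss` shows how sensitive the end-of-quarter figure is to that one guess.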

The only way for AMD to avoid BK by Q2'08 is to sell off substantial assets. This would include either business units (Consumer electronics?) or FABs. Don't be surprised if AMD sells off FAB30 or the consumer electronics division in addition to all the old tools from FAB30, and unneeded land etc.

p4nee said...

abinstein said...
There is no denying that AMD faces tough competition from Intel, which regularly plays dirty, copies ideas, and uses brute force. AMD's bottom line is still not as strong as Intel's, or even Apple's, but it's obvious that AMD's top line has improved dramatically since Opteron was first introduced.
...
This is a fact, and it is what Intel knows. The chip giant, in its usual way of playing dirty, is thus doing anything it can to spread FUDs against AMD. We will see how Intel does its nasty works, and how AMD foils them.


Clearly someone is spreading FUD here on his beloved company then. If Intel was spreading FUD about AMD, what has it said?

On the other hand, AMD has been doing a lot of talking (while Intel is working on some real project), publishing misleading ads, including misleading information on its official website, etc. By any means, AMD is far more suspicious and capable of FUD, don't you think so?

Axel said...

Abinstein

This "problem" is not business model, but business size.

Whatever the reason for their current problems (e.g. size), the bottom line is that AMD's current way of doing business no longer works and it must change its business model to survive. You meet great demand for your products, sell everything you make, and still hemorrhage $2 billion in a year. Obviously size is a major factor in this: low margins in a high fixed cost environment means that high volumes are needed to remain viable. AMD don't have the high volumes, the fixed costs are increasing every year, and the price war has dropped their margins into the gutter. Obviously something's got to give. If you think K10 will save the day, think again, as Hector knows it won't transform the competitive landscape. And I wouldn't pin my hopes on DTX (lol) either.

Hector's been talking up asset light for some time now and still hasn't revealed what it specifically means due to "competitive reasons". The two or three slides about asset light that they presented during the recent Analyst Day just contained some meaningless BS to appease investors, a few words about synergizing the outsourcing knowledge that ATI brought to the company, blah blah.

Rumors have been circling that September 10 will bring a big surprise in addition to the Barcelona launch. Perhaps an announcement on something related to asset light? If not, Hector has to announce something by the Q3 CC since the financial performance almost certainly continues to be dire.

Greg said...

It seems none of the FUDders here ever watch how a business expands into a new market. First, it takes on a lot of debt, then it goes into more debt. While doing all of this, it gains customers, gets their trust, and loses money selling to them. Finally, after all that is established, it finally makes money and slowly recovers yet remains stable for many years, or it never gains the trust of those it sells to because it SUBSTANTIALLY misrepresents itself and its product and it goes down the tubes.

No, AMD is not a new company, but yes, it has a market it is finally being allowed to reach, and it's getting its products there. Yes, it misrepresents itself like any other company, but compared to what the competition has done in the past, what AMD does is pretty mundane.

So obviously its business model is technically not sound. Nor is the business model of any start-up or any newly founded company. But that makes sense, because they have to take risks to get into the market, and that's what AMD is doing.

abinstein said...

p4nee: "On the other hand, AMD has been doing a lot of talking (while Intel is working on some real project), publishing misleading ads, including misleading information on its official website, etc."

AMD didn't publish 1/10th as many "misleading" ads as Intel. What AMD published was scientifically correct at the time of publication; on the contrary, Intel used invalid & different system settings to make false claims most of the time.

"AMD is far more suspicious and capable of FUD, don't you think so?"

Intel has been caught modifying its compilers to cripple non-Intel processor performance; not to mention all the money Intel sent to the "on-line journalists" to make favorable statements regarding the company unconditionally.

enumae said...

Abinstein
...not to mention all the money Intel sent to the "on-line journalists" to make favorable statements regarding the company unconditionally.


Can you show a link where this was proven?

If not, wouldn't that be considered FUD?

LG said...

abenstein wrote:

AMD didn't publish 1/10th as many "misleading" ads as Intel. What AMD published was scientifically correct at the time of publication; on the contrary, Intel used invalid & different system settings to make false claims most of the time.

Speaking of Intel's effort to mislead the public, read what the experts have to say about their claim of Yorkfield doing realtime raytracing @ 90 fps.
Here's a random quote from the thread at beyond3d:

"!! They're spouting the same uninformed nonsense about raytracing vs rasterization again, and quite honestly this is inexcusable coming from Intel. If they keep this up, and continue to refuse to answer our quite-reasonable questions (B3D had a great interview list) I'm gonna lose a lot of respect for them...

Sure they make good chips, but seriously give me some confidence that you have the intelligence to make Larrabee good...

[Edit] Plus they just lie... their claims of 10x faster than G80/Cell for raytracing are just wrong. Clearly they missed SIGGRAPH this year, and last year, and several other occasions on which real-time raytracers doing much more complex stuff have been demoed running on GPU's/Cell. Sigh. I should stop reading anything from Intel related to raytracing"


http://forum.beyond3d.com/showthread.php?p=1053252#post1053252

abinstein said...

"[Edit] Plus they just lie... their claims of 10x faster than G80/Cell for raytracing are just wrong. Clearly they missed SIGGRAPH this year, and last year, and several other occasions on which real-time raytracers doing much more complex stuff have been demoed running on GPU's/Cell. Sigh. I should stop reading anything from Intel related to raytracing""

Thanks LG. Very informative, and very "consistent" with Intel image that I have. Raytracing is not my field, but I wonder what Ho Ho has to say about this? Did he miss the several SIGGRAPH together with Intel?

GutterRat said...

abistein

You keep making unfounded claims and hyperbole.

enumae called your BS:

Abinstein
...not to mention all the money Intel sent to the "on-line journalists" to make favorable statements regarding the company unconditionally.

Can you show a link where this was proven?

If not, wouldn't that be considered FUD?


You did not answer enumae, because you can't.

Face it: you are better suited to working in North Korea or Cuba than you are to blogging.

Greg said...

Umm, poke... I'm pretty sure no one here, including abinstein, thinks sharikou has the brainpower of a small rat, much less that of a small child. So why don't you try contributing to the argument in a thoughtful way. Because last I checked, even if what abinstein was backing his arguments with were only technicalities (which they aren't as I'm sure most of his opponents will admit) he's still backing them with evidence and coming to a conclusion that can at least seem logical.

As for Intel supporting bogus numbers (and I know this is beating a dead horse, but it's just too freaking hilarious to pass up) Intel has its own SPONSORED section on Anandtech. Now, I realize sites like that are sponsored through the hardware they receive, and through the ads they run, but last I checked this page on their site pretty much violates any standard of integrity their results could have had (though they didn't to begin with, so that's not much of a change).

abinstein said...

gutterrat -
"You did not answer enumae, because you can't."

Ask scientia, or anyone with a clear head, whether those "on-line journalists" obviously distorted facts to make Intel look good and AMD look bad.

You will get only one answer: "Yes."

Then ask those "journalists" whether they take money directly from Intel marketing. You will again get only one answer: "Yes." (They don't even hide the facts on their webpages.)

I didn't answer enumae because the links are all there, and if he/you can't see, then nothing I or anyone say will be useful.

Ho Ho said...

lg
"Here's a random quote from the thread at beyond3d: "

I've read those comments and all I can say is that people think that ray tracing is what PovRay does, e.g. ~2k pixels per second. Things have changed and even PovRay will have an (experimental) real-time ray tracer in the next version.

Just run the demo in this thread and you'd see how fast it goes. Seeing 90FPS+ on 3GHz Yorkfield quad is not difficult, especially considering that this demo is not the fastest tracer in the world, though I doubt there are many that would be faster.

If that demo is not enough for you then compile the source yourself and test that.


Also please show me where G80 or R600 get nearly as fast results on similarly complex scenes. They simply cannot get anywhere near those tracers since they cannot do such basic optimizations as beam tracing, not to mention their inability to do animated scenes. If anyone says otherwise then please show me what performance GPUs have with such scenes.

The paper people are saying Intel has "missed" is probably this. All they do there is trace some static scenes at not too high FPS. Here they trace the same Conference and Soda hall on a single core 2.6GHz Opteron, getting more than twice the speed of G80. As I've said, Core2 (and Penryn) have twice the speed of K8 thanks to twice the width of SSE. So a 3GHz Core2 quad should give more than 4.5x the speed of that K8. Here is another paper that includes some of the older Intel tracer results. They run their tracer on a 3.2GHz P4 with HT. From personal experience I can tell you P4 is more than twice as slow as Core2. As it uses HT I'll simply scale the number by 3.75 to get my estimate for a 3GHz quadcore Core2. Penryn should be significantly faster thanks to some vector instructions that came with SSE4 but unfortunately I have no idea how much.

So, let's look at the numbers. All scenes are rendered at 1024x1024 with simple shading.

Soda hall (~2M triangles)
G80 5.7FPS
2.6GHz Opteron 12.6FPS
3.2GHz P4 24FPS
my estimate for 3GHz Core2 quad with MLRTA: 90FPS

Conference (~200k triangles)
G80 6.1FPS
2.6GHz Opteron 10.5FPS
3.2GHz P4 15.6FPS
my estimate for 3GHz Core2 quad with MLRTA: 58.5FPS

Compared to Core2 the difference surely isn't 10x but "only" ~5x. With Penryn things will be significantly different. Also those MLRTA results are rather old as Intel hasn't made its results public for quite some time. I'm sure they have further optimized their code and I wouldn't be surprised to see more than twice the speed on that old P4 today.
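The arithmetic behind the two Core2 estimates above can be reproduced with a quick sketch. The FPS figures and the 3.75x factor are the ones quoted in this comment; the names in the code are made up for illustration, and this is only the back-of-envelope scaling, not a benchmark.

```python
# Measured figures quoted above (1024x1024, simple shading), in frames per second.
MEASURED_FPS = {
    "Soda hall": {"G80": 5.7, "2.6GHz Opteron": 12.6, "3.2GHz P4": 24.0},
    "Conference": {"G80": 6.1, "2.6GHz Opteron": 10.5, "3.2GHz P4": 15.6},
}

# Factor used in the comment to go from a 3.2GHz P4 with HT to a
# 3GHz quad-core Core2. It is an estimate, not a measurement.
P4_TO_CORE2_QUAD = 3.75


def estimate_core2_quad_fps(p4_fps, factor=P4_TO_CORE2_QUAD):
    """Scale a measured P4 frame rate by the assumed speedup factor."""
    return p4_fps * factor


if __name__ == "__main__":
    for scene, fps in MEASURED_FPS.items():
        estimate = estimate_core2_quad_fps(fps["3.2GHz P4"])
        print(f"{scene}: estimated 3GHz Core2 quad = {estimate:.1f} FPS")
```

Changing the factor shows how sensitive the estimate is to the assumed P4-to-Core2 speedup.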

Feel free to bring updated results from other papers that would show different results.

Another interesting thing to compare would be tracing dynamic scenes. Too bad no GPU tracer has done it before.

Ho Ho said...

abinstein
"Did he miss the several SIGGRAPH together with Intel?"

Did you miss it? I know I couldn't go there and I haven't read all the papers. If you know anything significant I didn't talk about in that RT post, please tell me what I (and Intel) missed.

Ho Ho said...

Here is an interesting comparison of experimental ray tracer vs PRman vs Razor:

"We have proposed an approach to ray tracing subdivision surfaces using on-the-fly tessellation. Whereas other systems like Razor or PRMan amortize the cost for patch subdivisions by caching geometry, we instead use large packets of rays coupled with an efficient traversal algorithm. Our approach amortizes the cost of subdivision at least as well as geometry caching, and in addition allows for all the other advantages of packet techniques. Consequently, we not only need less memory for caches, but are also faster than both Razor and PRMan.

For scenes with varying depth complexity, we have proposed an adaptive subdivision method. Though crack fixing adds complexity to the system, for the Disney scene it provides additional speedups of up to 2.1×. Adaptive subdivision also becomes particularly interesting when considering packets of less coherent secondary rays, as these can use a coarser scene representation.

Performance-wise, our system outperforms both Razor and PRMan by up to 5.2× and 5.6×, and a single-ray implementation of the same algorithm by 16.6×. Compared to pre-tessellated models with pre-built acceleration structures we achieve a roughly competitive performance, but require only a fraction of the memory. Furthermore, we can render even hugely complex scenes which would not fit into memory."


Of course this is only an experiment with quite a few features and optimizations still missing. Still I'd say that the results are promising. Perhaps Pixar really is kind of dumb for not researching ray tracing deeply enough. Yes, I know that PRman already has ray tracing and global illumination support built in but that doesn't mean their implementation is anywhere near optimal. My guess is theirs is at least one or two orders of magnitude slower than current leading ray tracers.



Btw, abinstein, would you like to comment on what I said about profiling built into CPUs? On amdzone you made it sound as if it were the best thing since sliced bread. I would write my opinion there too but either I'm banned or the forums are plain broken, as I get signed out as soon as I try to post something.

enumae said...

Abinstein
...I didn't answer enumae because the links are all there, and if he/you can't see, then nothing I or anyone say will be useful.


Could you please show me a link in which it is clearly stated that ...the money Intel sent to the "on-line journalists" to make favorable statements regarding the company unconditionally.

I have no problem admitting to being wrong, but you have made a claim and now that you are being asked to back it up you avoid it and say it is there for me to find...

Isn't that FUD?

Greg said...

Enumae, I realize abinstein is very stubborn and is reluctant to take the time to show you something that demonstrates Intel's meddling, but is my reply not satisfactory?

Does Intel having its own sponsored section and being the main provider of anandtech's ads not show that at least one very prominent review site has a conflict of interest and a willingness to "adjust" public opinion in its favor to maintain unearned dominance (a.k.a. the p4 years)?

Ho Ho said...

Are there no AMD commercials on anandtech and tom's? If there are, would that mean that (some of) the reviews are biased towards AMD?

enumae said...

Greg
Does Intel having its own sponsored section and being the main provider of anandtech's ads not show that at least one very prominent review site has a conflict of interest and a willingness to "adjust" public opinion in its favor to maintain unearned dominance (a.k.a. the p4 years)?


The problem is that you and Abinstein are trying to compare Intel advertising to a willingness to "adjust" public opinion in its favor.

That is not proof, it is speculation and conspiracy theory.

Can you or Abinstein show proof that Anandtech or any other review site has a willingness to "adjust" public opinion in its favor?

Abinstein continues to avoid producing actual links to sources that have spoken out against Intel's actions, yet he continues to claim the information is out there, so it shouldn't be too much trouble for him to bring it here.

Have a good one.

PS: If someone were to imply the same thing in regards to AMD, Abinstein would be the first one here screaming for links and calling them Fudders.

abinstein said...

"PS: If someone were to imply the same thing in regards to AMD, Abinstein would be the first one here screaming for links and calling them Fudders."

When did you see me do that? I've never given a damn about the "links", unless they are to peer-reviewed, objective sources.

Don't put me on the same level as yourself, sir.

Mo said...

Abinstein since you are at a MUCH higher level than the rest of us.

Why don't you provide the asked links to prove your claim and put us back down to our levels?

Is that asking for too much?

Ho Ho said...

abinstein, no comments about ray tracing?

enumae said...

Abinstein
Don't put me on the same level as yourself, sir.


Did I strike a nerve... Your emotions for AMD have you Fudding; acknowledge that it is your opinion or show links.

This shouldn't be hard to comprehend, considering that when emotion is not involved you are very intelligent?

enumae said...

Should have been a period after "intelligent", sorry.

GutterRat said...

abinstein

You are hiding. You made a claim. You are being asked by multiple people to substantiate it and you, well, don't answer directly.

You believe Intel has journalists on its payroll? This should be a slam-dunk Pulitzer.

Woodward and Bernstein are shaking in their boots...NYET!

By the way, it's pretty easy for me to tell who you are on ZDNet.

bk said...

enumae
"The problem is that you and Abinstein are trying to compare Intel advertising to a willingness to "adjust" public opinion in its favor."

Isn't it the point of advertising to adjust public opinion?

bk said...

I think the real problem here is with anandtech, tom's hardware and others that bias their testing in favor of a paying advertiser. I know there are differing opinions about whether or not they actually bias their testing, but I think there have been a few cases brought up by scientia and others that are indefensible.

Using Intel's compiler to test AMD chips when an independent compiler (PGI) is available in my opinion is indefensible.

abinstein said...

"You are hiding. You made a claim. You are being asked by multiple people to substantiate it and you, well, don't answer directly."

Those sites receive favoritism from Intel for favorable reports, and some of their reports are so biased as to be ridiculous. They are inconsistent with each other; their numbers float around with Intel's and AMD's pricing structure; their benchmark choices fluctuate with no reason. I have nothing to hide, but they probably do.


"You believe Intel has journalists on its payroll? This should be a slam-dunk Pulitzer."

Technically, paying to get a favorable report does not equate to "having on its payroll." But it'd be fine, IMO, if you want to look at it that way.

"By the way, it's pretty easy for me to tell who you are on ZDNet."

I must be hitting on something really substantial that people like you want to jump out and point to me personally. I guess that speaks for your level very well.

abinstein said...

"Using Intel's compiler to test AMD chips when an independent compiler (PGI) is available in my opinion is indefensible."

Exactly one of the many ways that these sites are biased. Scientia has pointed out quite a few others, such as only showing OC'd results from Intel, ignoring motherboard pricing when Core 2 performance depends more on chipset quality, focusing on benchmarks that favor large cache, etc.

Ho Ho said...

bk
"Using Intel's compiler to test AMD chips when an independent compiler (PGI) is available in my opinion is indefensible."

Do you know the price difference between the two? Also, can you list a few such benchmarks? In the last one I saw, ICC optimized code for K8 way better than the Microsoft compiler did.


abinstein, still no comment on ray tracing and profiling? I'm disappointed. First you claim all sorts of weird things and then avoid defending them.

abinstein said...

Ho Ho -

It's a waste of time to talk to you, who never learns better. You've been wrong several counts in memory bandwidth, memory architecture, instruction decode, microarchitecture and compiler designs, and benchmarks. Previously I thought at least you know raytracing well, but now it seems you even don't.

It is no use of you to do nagging until you try to learn the basics better before you talk.

enumae said...

Abinstein

Let's make this really simple.

Can you post at least one link where someone other than yourself, Scientia, AMDZone, AMDZone Forum or a Forum of AMD users criticizes the reviews done by Tom's Hardware, Anandtech or another major review site?

I am only looking for one link from a reputable news source, otherwise give it up because all you have is nothing more than a conspiracy theory.

It is just advertising.

In regards to the compilers chosen, or benchmarks used, please be more specific.

In regards to Cache dependent benchmarks, well that would be almost any single threaded program wouldn't it?

Erlindo said...

Ho Ho, enumae, Gutterrat and the rest of the intel crew:

It's an obvious fact that Intel does indeed pay review sites and make them use their compilers to make them look good on benchmarks.

Scientia:
You have to do something quick before this nice blog becomes another Roborat parody thanks to these guys.

Ho Ho said...

abinstein
" You've been wrong several counts"

Remember that K10 inbox cooler discussion we had where you were 100% sure that Intel inbox was massively superior? This is not the only thing you've had wrong.


Point is we have both made mistakes before; there is no point in digging up old stuff unless you have something new to say about it. The difference between the two of us is that I try to prove what I say, while you simply start insulting when you can't disprove what other people say, just as you are doing right now.



"Previously I thought at least you know raytracing well, but now it seems you even don't."

Please tell me what exactly was wrong with the comparisons I made (or what Intel said). Also show me where G80 or any other GPU (massively) outperforms regular CPUs.

If you know anything about ray tracing, and you should because otherwise you couldn't be certain to say I'm wrong, it should be simple to bring enough data to prove what you are saying.


"It is no use of you to do nagging until you try to learn the basics better before you talk."

So you know the basics of ray tracing? Good! Now start showing some data instead of pointlessly babbling.

Erlindo said...

AMD's 3GHz K10 to break 30,000 3DMark06

Looking good. ;)

Scientia from AMDZone said...

ho ho

I don't know why you are having so much trouble understanding about the JEDEC specs. But let me see if I can make the point more clearly.

The top speed of DDR2 or DDR3 is for common desktop DIMMs. However, both FBDIMM and registered DIMMs lag behind this. You are completely misunderstanding when you think I am saying that DDR3 will stop at 800 or 1066. DDR3 will reach 1600 in fact. However, you won't see either FBDIMMs or registered DIMMs at that speed. This is why AMD is moving to G3MX in 2009.

bk said...

enumae
"In regards to Cache dependent benchmarks, well that would be almost any single threaded program wouldn't it?"

No, this is not true. It depends on how much cache the benchmark code needs to run.

A single thread can be written to run tightly within the cache or it can be written to utilize lots of memory and cache. It all depends on what is coded.

bk said...

"AMD's 3GHz K10 to break 30,000 3DMark06"

Can someone compare how this compares to K8 and Intel's scores.

The Inquirer makes it sound like this is substantially above anything previous.

Scientia from AMDZone said...

giant

"The only way for AMD to avoid BK by Q2'08 is to sell off substantial assets. This would include either business units (Consumer electronics?) or FABs. Don't be suprised if AMD sells off FAB30 or the consumer electronics division in addition to all the old tools from FAB30, and unneeded land etc."

This definitely will not happen. In fact, this is one of the most absurd things you have said. There is no doubt that AMD has financial problems however it cannot sell off FAB 30.

I think the problems may indeed prevent AMD from building the third FAB at NY but AMD has to keep FAB 30. Without FAB 30, AMD will be back to the previous failing model with a single FAB. So, unless you believe roborat's nonsense about outsourcing, this simply is not possible. At the end of 2008 FAB 36 will be producing 45nm chips and FAB 30 will be producing at least some 300mm.

I also doubt very seriously that AMD will sell off pieces of ATI since these seem to be selling the best right now. There is some possibility for property though. For example, it looks like AMD is closing up the ATI headquarters in Canada. It is also possible that AMD could consolidate some corporate offices since it no longer uses the research facility in Sunnyvale.

Scientia from AMDZone said...

p4nee

"On the other hand, AMD has been doing a lot of talkings (while Intel is working on some real project), publish misleading ads, contain misleading information in offical website, etc. by any mean, AMD is far more suspicious and capable of FUD, don't you think so? "

You need to stop getting your information from sources like George Ou and Kubicki. AMD's information was accurate when it was created but it got out of date. There was no conspiracy, just old information. Intel still has outdated (and therefore incorrect) information about AMD on its website. However, you will never see George Ou or Kubicki throw a fit about Intel's outdated information the way they did about AMD's. Ou and Kubicki are clearly biased in favor of Intel.

The simple truth is that neither AMD nor Intel try to spread misinformation about the other. What usually happens is that benchmarks will be highly selective or sometimes executives who aren't really up to date will make incorrect statements. And, then sometimes information gets out of date. However, most of the misinformation comes from web journalists rather than from the factories.

You might recall when AMD began making K8 that this caused a stir with both Apple and Itanium fans. Apple fans tried to claim that K8 was not truly 64 bit and most of the information was wrong. Itanium fans wanted to claim that Itanium was more compatible with 32 bit x86 code than K8. So, a lot of bias and little knowledge about K8 caused a lot of false rumors.

At least one executive then repeated the misinformation that K8 could only run in 64 bit mode or 32 bit mode and that it took a system reset to change modes.

Scientia from AMDZone said...

axel

"Hector's been talking up asset light for some time now and still hasn't revealed what it specifically means due to "competitive reasons". The two or three slides about asset light that they presented during the recent Analyst Day just contained some meaningless BS to appease investors"

Axel. I used to think that the people on Forumz were outrageously biased against AMD. Then later I finally realized that the level of knowledge about AMD is not that high on Forumz so misinformation goes uncorrected a lot. I at first thought that your above statement was so ridiculous that you must be trolling, however, I'm now thinking it is possible that you just lack enough knowledge about AMD to clarify what you saw in the slides. I'll see if I can bring you up to date.

1.) Research: AMD used to do its research at its SDC (Submicron Development Center) at Sunnyvale. This was equivalent to Intel's RP1 facility in Hillsboro. However, AMD began shifting to research at East Fishkill back in 2002. SDC was divested along with FAB 25 with flash memory as part of Spansion. Today, AMD has 77 scientists and engineers working at East Fishkill. Since AMD no longer has its own research facility like Intel has this is Asset Light.

2.) Assembly: AMD used to own assembly factories in the Far East but it now farms this out to other companies. Since AMD does not own the assembly factories this is Asset Light.

3.) Chipsets and Graphics: Intel makes all of its chipsets in its own FABs. AMD decided to continue with the foundry based production that ATI had in place when it was purchased. So, since AMD is not going to maintain FABs for chipsets and graphics this is Asset Light.

4.) CPUs: Intel makes all of its CPUs in its own FABs. However, AMD has an agreement in place to outsource work to Chartered. This is a very complex 3-way agreement between IBM, AMD, and Chartered and is not something that could be duplicated. IBM already had an agreement with Chartered to allow Chartered to use its proprietary processes to produce overflow work for IBM. Chartered is not allowed to use these processes for its own foundry work, only as a second source to IBM. However, when AMD became IBM's process partner the agreement allowed Chartered to produce AMD processors. This is notable because this is the only non-IBM second sourcing that Chartered is allowed with IBM's process technology. AMD's APM system is used at all three facilities and Chartered's process is certified by AMD engineers. Since AMD does have a second source (unlike Intel) this is Asset Light.

5.) Development: Intel does initial development at D1D. AMD's current model uses 10% of the capacity of FAB 36 to run test wafers inline with regular production. Since AMD doesn't have a separate development FAB this is Asset Light.

Additional: Intel has its own compiler which is coded in-house whereas AMD chooses to work closely with Portland Group to make sure that a compiler is available that fully utilizes its CPU hardware.

Scientia from AMDZone said...

On the subject of Intel's money specifically biasing reviews I'm only directly aware of one. Intel gave several machines to a Linux website and then that website published a very favorable review of Intel processors even though their own test data didn't support this. This is nothing new; scientists have known about this unconscious fudging of results for a long time which is why we have double blind testing for things that are subjective.

enumae said...

bk
Isn't it the point of advertising to adjust public opinion?


Sorry I had missed this one, short answer is no.

The point of advertising is brand recognition. If Intel is sponsoring Anandtech you won't see mention of AMD in advertisements, just look at other sites.

bk
No, this is not true.

Thank you, I am not a programmer so your explanation was helpful.

Erlindo
It's an obvious fact that Intel does indeed pay review sites...


Please provide a link from a reliable source that can substantiate your and Abinstein's claims.

Scientia from AMDZone said...

greg

There is nothing wrong with Sharikou's intelligence. His problem is that he wants AMD to succeed so badly that he loses objectivity. You can see similar situations where common sense takes a backseat to wish fulfillment by watching Ghost Hunters.

enumae said...

bk
Can someone compare how this compares to K8 and Intel's scores.

The Inquirer makes it sound like this is substantially above anything previous.


Looking at the new world record of 27,039 (Intel QX6850 at 5.1GHz and ASUS EAX2900XT in Crossfire @ 1175/950MHz) the Inq claims seem to be very impressive if true.
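To put the two scores side by side, here is a minimal sketch of the comparison. The 27,039 record is the figure quoted above and 30,000 is the score claimed in the Inquirer headline; the function name is only illustrative.

```python
def percent_above(score, baseline):
    """Return how far score sits above baseline, as a percentage of baseline."""
    return (score - baseline) / baseline * 100.0


if __name__ == "__main__":
    record = 27039    # world record quoted above (overclocked QX6850 + Crossfire)
    claimed = 30000   # score claimed in the Inquirer headline
    print(f"Claimed score is {percent_above(claimed, record):.1f}% above the record")
```

Both results involve overclocking, so this compares headline numbers only.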

InTheKnow said...

Development: Intel does initial development at D1D. AMD's current model uses 10% of the capacity of FAB 36 to run test wafers inline with regular production. Since AMD doesn't have a separate development FAB this is Asset Light.

This is not accurate. D1D is not strictly a development facility. It is currently ramping 45nm while also working on the next process node. It will be Intel's first HVM facility on 45nm and will remain so until 32nm begins ramping there.

As far as I know the only pure development fab that Intel owns is D2 in California which is used for chipset development. Since this facility is 8", I suspect its lifespan is limited at this point.

InTheKnow said...

And this from tech report on CSI:

Kanter's article spares no technical details, and parts of it might be a tad difficult to follow if you're not an electrical engineer. Still, many interesting tidbits about CSI stand out. For instance, Kanter mentions that CSI could be extended from a copper-based parallel implementation to a serial, optical one. There's talk of data speeds, too: initial implementations of CSI are expected to provide 12-16GB/s of bandwidth in each direction, or 24-32GB/s of bandwidth per link. For reference, next-gen HyperTransport 3.0 links top out at 20.8GB/s of full-duplex bandwidth, and Intel's 1333MHz front-side bus peaks at 10.7GB/s.

The full report is at the link below.

http://www.realworldtech.com/page.cfm?ArticleID=RWT082807020032&p=1

InTheKnow said...

"AMD's 3GHz K10 to break 30,000 3DMark06"

Unfortunately this result was achieved by over-clocking. Since almost nobody over-clocks the results aren't relevant. ;P

As enumae said, impressive if true.

Scientia from AMDZone said...

Some of the things that have been hashed and rehashed on here are not that important. We know for a fact that Intel specifically rearranged the BAPCO benchmarks to favor its processors. We also know that people routinely compared AMD and Intel processors using ICC and routinely claimed that K7 and K8 had poor SSE until it was proven that ICC didn't produce SSE code for AMD processors. This occurred when someone compiled code on a Xeon system and then moved the executable to Opteron and saw a big jump in speed versus code compiled directly on Opteron. Intel is not the only company to do this. Some here might recall when Microsoft put spoof code into Windows so that if it detected DRDOS from rival Digital Research it would generate a false error.
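One way a gap like the ICC one described above could arise is if the fast code path is chosen by vendor string rather than by the features the CPU reports. The toy sketch below is purely hypothetical and does not describe how ICC actually works; the function names and feature sets are invented for illustration, and only the two vendor strings are the real CPUID values.

```python
def pick_path_by_vendor(vendor, features):
    """Hypothetical dispatcher: use the SSE2 path only for one vendor string."""
    if vendor == "GenuineIntel" and "sse2" in features:
        return "sse2"
    return "generic"


def pick_path_by_features(vendor, features):
    """Hypothetical dispatcher: look only at the reported feature flags."""
    return "sse2" if "sse2" in features else "generic"


# Illustrative CPUs: both report SSE2, only the vendor string differs.
xeon = ("GenuineIntel", {"sse", "sse2"})
opteron = ("AuthenticAMD", {"sse", "sse2"})

if __name__ == "__main__":
    for name, cpu in (("Xeon", xeon), ("Opteron", opteron)):
        print(name, pick_path_by_vendor(*cpu), pick_path_by_features(*cpu))
```

With the vendor check the Opteron falls back to the generic path even though it reports SSE2; with the feature check both CPUs take the same path.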

Does money from Intel bias reviews? Almost certainly it does. Have there been bad and biased reviews? Yes. Probably one of the worst was done by Anandtech where they compared a dual socket Intel system against a dual socket Opteron system. However, the Opteron system only had memory for one of the two processors so that the second processor had to use memory via HyperTransport from the first CPU. I don't honestly know if the reviewers were just so incompetent that they didn't notice this or whether they did it on purpose.

However, other people have talked about the difference in writing style from Anand Lal Shimpi when he talks about Intel versus AMD. Anand tends to be relaxed, expansive, excited and optimistic when he talks about Intel. Yet, when he talks about AMD he tends to be tense, brief, reserved, and pessimistic. Other people have suggested that he is concerned that being too positive about AMD will jeopardize some relationship he has with Intel. However, I don't know. With a difference like this from Lal Shimpi himself, biases from Kubicki are not a surprise. But, I've seen no general bias at ZDNet so Ou seems an exception.

However, in other aspects there is no doubt that Intel is losing benchmark advantage. Most of C2D's advantage of having a large L2 is erased with K10's L3. Likewise, Intel loses prefetch advantage, stack operation advantage, faster SSE advantage, and some of its load/store advantage. Instead of the large area of general advantage that C2D enjoyed this pretty well limits Intel's benchmark advantage to some specialized SSE intensive applications and not much else.

There is no doubt though that this is going to take a lot of wind out of Intel's sails. However, I think what Intel is dreading the most is having its MCM quads put up against monolithic quads. I anticipate that the phrase, "wait until Nehalem is released" will be used many times.

I also predict that when Nehalem is released that the strong justification in favor of using MCM dies for better yields, higher speeds, and lower cost will suddenly evaporate. In other words, I doubt very seriously that the same people who now claim that a FSB and MCM are better will criticize Intel for switching to monolithic and IMC. My guess is that the argument will suddenly switch from flexibility and cost to having a faster design.

So, I'll ask again. Do those who are now claiming that AMD's business model is broken not understand that Intel loses almost all of its cost advantages with Nehalem? It is very likely that the Nehalem die will be larger than Shanghai. Now, if you do understand this then why would you argue that AMD is going to go bankrupt? By early 2009 AMD's stance should be quite good versus Intel's.

Finally, those in the roborat pack can try to trivialize things all they want but it isn't just one thing or two things that will improve AMD's finances. It is reduced cost from 65nm, it is stronger ATI sales, it is a better mid to low end stance with DTX, and it is offering stronger products with K10. Obviously, AMD can't get back on its feet with volume alone but all of these things can make a difference.

Anyone not so biased can also understand that HTX is going to be well established before Torrenza ever gets off the ground. Intel will likewise have an uphill battle with Silverthorne. It also still appears that Intel will use FBDIMM for Nehalem; however, there are persistent rumors that the desktop will be different. It still is not clear how Intel will get around the complexity of different DIMMs without using several different IMCs.

Scientia from AMDZone said...

InTheKnow

"This is not accurate. D1D is not strictly a development facility. It is currently ramping 45nm while also working on the next process node. It will be Intel's first HVM facility on 45nm and will remain so until 32nm begins ramping there."

You are arguing semantics. FAB 36 is AMD's main production FAB whereas D1D is not Intel's. AMD has mentioned D1D several times so it is clear that they see it as part of Asset whether D1D is both development and production or not.

Axel said...

Scientia

I'm now thinking it is possible that you just lack enough knowledge about AMD to clarify what you saw in the slides. I'll see if I can bring you up to date.

All of your points address models / changes already in place that occurred over the last couple years. But as is clear from Doug Grose's Analyst Day slide deck, slides 7-12, your points only constitute what Grose calls "asset light today". This is clearly not asset light going forward, because of what Hector described during the Q1 2007 CC.

Hector refers to asset light as a "tremendous opportunity for us to really do something different going forward, but it’s unique for us." Read his own words. He introduces asset light with the following statement: "The level of restructuring that I am envisioning is very significant, and so it would be difficult to outline it on a phone call. But let me give it a try as to what we intend to discuss later on in the year."

Now perhaps you'll see how none of your points can possibly constitute Hector's interpretation of "asset light":

- Research at East Fishkill: Already in place, not "something different going forward".

- Divest factories in Far East. Already done years ago, not "something different going forward".

- ATI foundry based manufacture, been doing it for ages. Clearly not "something different going forward".

- Outsource CPUs to Chartered, already in place for a couple years now. Not "something different going forward".

- Fab 36 used for testing since the beginning, not "something different going forward".

No. Hector is referring to a big new strategy going forward that he doesn't feel comfortable revealing yet. Slide 30 of Grose's slide deck that I linked to above specifically states that "For competitive reasons, we are not sharing details of progress at this point, we will so do as soon as its appropriate."

So keep guessing what Hector means by asset light. It appears that you're just as much in the dark as myself and everyone else.

abinstein said...

"initial implementations of CSI are expected to provide 12-16GB/s of bandwidth in each direction, or 24-32GB/s of bandwidth per link."

A 20-lane CSI link offers 16GB/s of raw bandwidth. However, due to the 8b10b encoding required by the differential signaling, the actual available bandwidth is 12.8GB/s in each direction.

"For reference, next-gen HyperTransport 3.0 links top out at 20.8GB/s of full-duplex bandwidth"

This is only for a 32-bit link. For a 16-bit HT3 link, which requires a pin count comparable to the 20-lane CSI, the bandwidth tops out at just 10.4GB/s.

Giant said...

I also predict that when Nehalem is released, the strong justification in favor of using MCM dies for better yields, higher speeds, and lower cost will suddenly evaporate.

The Nehalem H2'08 time-frame is the perfect time for Intel to introduce a monolithic quad core. As Pat Gelsinger himself stated, using an MCM at 65nm makes good sense: it keeps die size under control and makes manufacturing much easier, as you're using a proven dual core die with good yields.

These same points could be applied to the 45nm process at first. By using a small die size (A dual core Penryn die with 6MB cache is only 107mm squared) they can get the yields worked out and when the process is running smoothly, with all the kinks worked out they can produce Nehalem.

Let's say for a moment that Intel skipped Penryn, and they were going to introduce Nehalem straight away instead. They'd be using a larger die size (as you have pointed out) on an unproven process with less than optimal yields. That is why the Penryn based quad cores are still an MCM; this just makes good sense for now.

Look at Clovertown now: the yields are good and the speed it offers is currently unmatched. Intel might lose 10 to 15% (just an estimate) versus using a monolithic solution, but we've seen the trouble AMD is in because of their insistence on using a monolithic quad core. Barcelona is roughly six months late. AMD is also stuck with launch frequencies of 2GHz, and its large die size is costly to manufacture.

abinstein said...

enumae -
"Can you post at least one link where someone other than yourself, Scientia, AMDZone, AMDZone Forum or a Forum of AMD users criticizes the reviews done by Tom's Hardware, Anandtech or another major review site?"

IMO, there are two types of people: those who are willing to criticize Intel and those who aren't. You want me to post a link that criticizes Intel but not from the first type of people?

Scientia made good observations (myself too but in much less clarity/degree) on how some on-line journalists obviously biased their opinions toward Intel. Do you or do you not agree with what he said?

"I am only looking for one link from a reputable news source, otherwise give it up because all you have is nothing more than a conspiracy theory."

For example, AnandTech is probably a reputable news source for you, or maybe an IDF presentation, when these sources are precisely the kinds that are in question?

Ho Ho -
"Please tell me what exactly was wrong with the comparisons I made (or what Intel said). Also show me where does G80 or any other GPU (massively) outperforms regular CPUs."

The problem is you seem to be comparing performance of CPU and GPU on CPU-optimized algorithms. This is the wrong way to benchmark different microarchitectures. The right way is to compare the 'best' algorithm for the same task.

There might have been advances in ray tracing algorithms that are suited to CPU. However if that algorithm doesn't utilize the high parallelism of GPU well, then people probably need to devise some that do.


"If you know anything about ray tracing, and you should because otherwise you couldn't be certain to say I'm wrong, it should be simple to bring enough data to prove what you are saying."

Unfortunately I don't need to know anything about ray tracing to notice your wrong comparison method. ;)

abinstein said...

Axel -
"No. Hector is referring to a big new strategy going forward that he doesn't feel comfortable revealing yet."

How do you know? Just because Hector has a few excited words on the strategy of his company? So you believe what he says literally now? Then why don't you believe what he claimed about K10 performance? Why don't you believe what he has to say about Fusion and ATi merging?

Your double standard shows your true colors.

InTheKnow said...

I also predict that when Nehalem is released, the strong justification in favor of using MCM dies for better yields, higher speeds, and lower cost will suddenly evaporate.

Don't hold your breath. They will be using MCM to make octo-core processors.

Intel already claimed the cost picture for "native" quad core was favorable on 45nm. Someone should be able to calculate die loss based on defect density to prove or disprove this. There is nothing like showing hard numbers to back up an argument.
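The calculation asked for here can be roughed out with a simple Poisson yield model. This is only a sketch under stated assumptions: the 107 mm² dual-core die size comes from the thread, but the native quad die area (taken as exactly two dual-core dies) and the defect densities are hypothetical placeholders, and the model ignores edge loss, packaging cost, and speed binning.

```python
import math

def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of defect-free dies under a simple Poisson model: exp(-A * D)."""
    return math.exp(-area_cm2 * defects_per_cm2)

def silicon_per_good_quad(die_area_cm2, dies_per_package, defects_per_cm2):
    """Wafer area (cm^2) spent per working quad-core package."""
    good_fraction = poisson_yield(die_area_cm2, defects_per_cm2)
    return dies_per_package * die_area_cm2 / good_fraction

DUAL_CORE_DIE_CM2 = 1.07                     # 107 mm^2 Penryn die quoted in the thread
NATIVE_QUAD_DIE_CM2 = 2 * DUAL_CORE_DIE_CM2  # ASSUMPTION: native quad = two dual-core dies

def native_penalty(defects_per_cm2):
    """Silicon per good native quad divided by silicon per good MCM quad."""
    mcm = silicon_per_good_quad(DUAL_CORE_DIE_CM2, 2, defects_per_cm2)
    native = silicon_per_good_quad(NATIVE_QUAD_DIE_CM2, 1, defects_per_cm2)
    return native / mcm

if __name__ == "__main__":
    for d0 in (0.1, 0.3, 0.5):  # HYPOTHETICAL defect densities, defects per cm^2
        print(f"D0={d0:.1f}/cm^2: native quad needs {native_penalty(d0):.2f}x the silicon")
```

In this model the penalty works out to exp(A·D) for a dual-core die of area A, so it shrinks toward 1 as the process matures and defect density falls. Real numbers would need the actual die sizes and defect densities, which neither company publishes.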

It is very likely that the Nehalem die will be larger than Shanghai.

Source, please?

Intel will likewise have an uphill battle with Silverthorne.

Why?

So, I'll ask again. Do those who are now claiming that AMD's business model is broken not understand that Intel loses almost all of its cost advantages with Nehalem?

I don't think I'm claiming AMD's business model is broken (nor do I think I belong to Roborat's "pack"), but so far asset lite hasn't exactly wowed me with the financial returns. It sounds a lot to me like more of the same. And nothing you have said negates Intel's biggest advantage.

Size.

Size brings economies of scale that AMD can't realize. Especially if they are unable to build the fab in NY.

I'm also curious about F30. Is it capable of accepting 12" equipment? Fabs that were designed for 8" equipment don't have high enough ceilings for 12" tools. Raising the roof on a fab is a lot cheaper than building a whole new building, but it still ain't cheap. And you can't exactly maintain a clean environment when you rip the roof off the building so production shuts down.

AMD has mentioned D1D several times....

You'll forgive me if I don't give any more credence to what AMD says about Intel than I do about what Intel says regarding AMD. Your competitor is not a credible source of information about your operations or activities.

InTheKnow said...

This is only for 32-bit link. For a 16-bit HT3 link, which requires comparable pin count to the 20-lane CSI, the bandwidth tops out at just 10.4GB/s.

Abinstein, can you direct me to where this information comes from? This isn't my area of expertise, but I can generally follow a technical discussion if I have the whole document to dig through. Because if what you are claiming is true, then HT 3.0 is slower than the 1333MHz FSB at 10.7GB/s. At least the way tech reports figures it.

Giant said...

So keep guessing what Hector means by asset light.

It's easy. Hector Ruiz says "we need money so we're selling stuff!".

enumae said...

Abinstein
You want me to post a link that criticizes Intel but not from the first type of people?


I am looking for a third party with (if possible) no connection to Technology reviews.

The difference being that if Intel was indeed doing what you have said there would be information on almost any major news site, and they would be showing/reporting that Intel does not perform as well as has been believed due to false testing and paying review sites for falsifying/distorting testing results.

Do you or do you not agree with what he said?

Sure, journalists can be biased, but we are not looking at a person's opinions; we are looking at actual test results.

I don't care if someone likes Intel better than AMD or AMD better than Intel. When I read a review I have to believe that it is as accurate as it can be; otherwise I (and everyone else viewing a website with Intel advertising) am going to have to buy a lot of hardware and be sure of what I am buying by doing my own tests.

If it is truly, like you said, Intel buying favorable reviews, I would hope someone would make it a point and put it in front of everyone's eyes so all can see, but as of now it's not there.

For example, AnandTech is probably a reputable news source for you, or maybe an IDF presentation, when these sources are precisely the kinds that are in question?

What about IDF is/was misleading?

I hope my above comments clarify my view on this matter. As we have seen and been through in the past, I don't have the best communication skills... :)

Giant said...

For example, AnandTech is probably a reputable news source for you, or maybe an IDF presentation, when these sources are precisely the kinds that are in question?

Compared to what? AMDZONE? Or perhaps AMD themselves?

Erlindo said...

This little piece backs up what Scientia has been saying all this time:

Intel's compiler: is crippling the competition acceptable?

;)

Ho Ho said...

scientia
"I don't know why you are having so much trouble understanding about the JEDEC specs"

Only trouble I have with the specs is there is no mention of 1066MHz DDR2 anywhere.


"However, both FBDIMM and registered DIMMs lag behind this"

So Barcelona can't use 1066MHz DDR2 even if it were available, because the speeds of server RAM will lag behind?


"You are completely misunderstanding when you think I am saying that DDR3 will stop at 800 or 1066"

I didn't say that, I said that there is no reason to think FBDIMMs will stay at as low speeds as you said they will.

If you can finally dig up that 1066MHz DDR2 from the specs it would prove that new speeds can be introduced later in time and it would make DDR3 >1.6GHz a possibility. If you can't (won't) then you won't have anything to back your claim that 1066MHz DDR2 will ever be produced (for servers). So which one of your claims is wrong?


bk
"Can someone compare how this compares to K8 and Intel's scores."

It is the usual Inquirer BS. The only way this could be real is if those R600s deliver 50% higher performance than others at the same clock speeds and K10 is 2x faster than a 4GHz Core 2 Quad. Without the GPUs running faster, even an infinitely fast CPU couldn't raise the score to 30k.

Now what is more believable: K10 makes GPUs run considerably faster and is itself ~250% faster than Core 2, or the Inquirer just made up that story or got something terribly wrong?


scientia
"This occurred when someone compiled code on a Xeon system and then moved the executable to Opteron and saw a big jump in speed versus code compiled directly on Opteron."

ICC generates code that has the brandname check built in. That means if the compiled program can't find "intel" in the brandname it will run the non-optimized functions instead of the optimized ones. So it doesn't matter where you compile it; only the CPU you use to run it matters.

At least this is how it used to be with 8 and 9 series, I have no solid information about other versions. Can you link to the test you described? Perhaps they did something weird that gave them such results.
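The run-time dispatch described above can be illustrated with a small sketch. This is not ICC's actual code: the function names are hypothetical, and the vendor string is passed in as a plain argument rather than read from CPUID. "GenuineIntel" and "AuthenticAMD" are the real CPUID vendor strings; everything else is an assumption made for illustration.

```python
def dot_optimized(a, b):
    """Stand-in for a vectorized (e.g. SSE) code path."""
    return sum(x * y for x, y in zip(a, b))

def dot_generic(a, b):
    """Stand-in for the plain fallback path: same result, slower on real hardware."""
    total = 0
    for x, y in zip(a, b):
        total += x * y
    return total

def select_dot(vendor_string):
    """Pick a code path at run time from the CPU vendor string alone.

    Note that the check looks at the brand name, not at whether the CPU
    actually supports the instructions the optimized path needs.
    """
    if "intel" in vendor_string.lower():
        return dot_optimized
    return dot_generic

if __name__ == "__main__":
    for vendor in ("GenuineIntel", "AuthenticAMD"):
        chosen = select_dot(vendor)
        print(vendor, "->", chosen.__name__, chosen([1, 2, 3], [4, 5, 6]))
```

Because the choice happens when the program runs, the same binary takes different paths on different machines, which is why the machine used for compiling should not matter.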


"Anand tends to be relaxed, expansive, excited and optimistic when he talks about Intel. Yet, when he talks about AMD he tends to be tense, brief, reserved, and pessimistic."

I guess one could say that I'm like that also. The reason for me is that AMD hasn't been doing especially well lately and I have lost some of the trust I used to have. Perhaps it is similar with him?


"I also predict that when Nehalem is released, the strong justification in favor of using MCM dies for better yields, higher speeds, and lower cost will suddenly evaporate."

Nehalem can do MCM also. Unless it will be quadcore-only I wouldn't be surprised to see lower-end 2x dualcore MCM quads from Intel. Of course I could be wrong also, it will depend on if it needs to sell cheap quadcores or are the native ones cheap enough.


"In other words, I doubt very seriously that the same people who now claim that a FSB and MCM are better will criticize Intel for switching to monolithic and IMC."

I say that MCM is good enough for now. Monolithic would be better in some places but I doubt the difference would be too great. Also the tick-tock model talked about how Intel is going to add cores to CPUs. They said they start with N cores on a new tech node and the refresh would be with 2N cores.


"So, I'll ask again. Do those who are now claiming that AMD's business model is broken not understand that Intel loses almost all of its cost advantages with Nehalem?"

All but technology advantage. Also with K8 vs Netburst Intel had very little advantages, if any, but still AMD couldn't do much. We shall see if it is different this time.


"It is very likely that the Nehalem die will be larger than Shanghai."

Unless Intel cuts down the amount of cache it has on die. With IMC it is not needed as much any more anyway.


"Now, if you do understand this then why would you argue that AMD is going to go bankrupt?"

How much money does AMD have to pay back before 2009? My crude calculations say around 1B a year, probably a bit more.


abinstein
"The problem is you seem to be comparing performance of CPU and GPU on CPU-optimized algorithms."

We were comparing ray tracing. You said that GPUs are good at it. Now you are saying that I used wrong algorithms to compare the two?


"This is the wrong way to benchmark different microarchitectures. The right way is to compare the 'best' algorithm for the same task."

So are you suggesting we should compare software ray tracing running on CPU vs hardware accelerated rasterizing on GPU? Have you got any idea how different those two are in terms of final result? It is somewhat similar to comparing CPU power by running a molecular simulation on one architecture and just rearranging the molecules randomly on the other. Can you make any kind of meaningful comparison with two such different algorithms?


"However if that algorithm doesn't utilize the high parallelism of GPU well, then people probably need to devise some that do."

Problem why ray tracing is so slow and inefficient on GPUs is that GPUs are not suitable for running code with that complexity. It is not that the algorithms aren't parallel enough, they are. You could run every single pixel on different CPU without ever talking to the others. Just that GPUs have awful granularity when it comes to branching.
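The branching-granularity point can be made concrete with a toy model of lock-step execution. This is only a sketch: the batch width of 32 and the per-branch cycle costs are hypothetical, not figures for G80 or R600. The assumption is that a batch of lanes must step through every branch path that any lane in it takes, with lanes idling while the other paths run.

```python
def simd_utilization(branch_per_lane, cost_per_branch):
    """Useful work divided by total lane-cycles for one lock-step batch.

    branch_per_lane: the branch label each lane (ray) wants to take.
    cost_per_branch: hypothetical cycle cost of each branch path.
    """
    paths_taken = set(branch_per_lane)
    # The whole batch runs each distinct path once, one after another.
    batch_cycles = sum(cost_per_branch[b] for b in paths_taken)
    # Each lane only does useful work while its own path is running.
    useful_cycles = sum(cost_per_branch[b] for b in branch_per_lane)
    return useful_cycles / (len(branch_per_lane) * batch_cycles)

COSTS = {"hit": 10, "miss": 2}  # HYPOTHETICAL cycle counts

if __name__ == "__main__":
    coherent = ["hit"] * 32                   # all rays take the same branch
    divergent = ["hit"] * 16 + ["miss"] * 16  # half the rays diverge
    print("coherent batch: ", simd_utilization(coherent, COSTS))
    print("divergent batch:", simd_utilization(divergent, COSTS))
```

With these made-up costs the coherent batch keeps every lane busy, while the half-and-half batch wastes half of its lane-cycles. A CPU core handling one ray at a time pays no such penalty, which is the granularity difference being argued here.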


"Unfortunately I don't need to know anything about ray tracing to notice your wrong comparison method. ;)"

So what exactly would be the right comparison? Was Intel wrong when saying their CPUs can do ray tracing 10x faster than GPUs? After all this was the claim that started all this discussion and so far I still haven't got any proof that Intel was wrong, even though you and lg claimed otherwise.

abinstein said...

"We were comparing ray tracing. You said that GPUs are good at it. Now you are saying that I used wrong algorithms to compare the two?"

Ray tracing is not an algorithm; it is an application. For example, if you compare SHA authentication on CPU and GPU then there's no doubt CPU will work faster. That doesn't mean there is no better & faster authentication algorithms available for GPU.

"So are you suggesting we should compare software ray tracing running on CPU vs hardware accelerated rasterizing on GPU?"

Where did I suggest that? You dreaming?

"So what exactly would be the right comparison? Was Intel wrong when saying their CPUs can do ray tracing 10x faster than GPUs?"

Yes, Intel is wrong saying that. All Intel proved was that by using a newer algorithm it performs 10x better than the old algorithm on CPU. It doesn't compare it with an algorithm optimized for GPU.

abinstein said...

intheknow -
"Abinstein, can you direct me to where this information comes from?"

I'll have to direct you to my recent blog article. It doesn't talk specifically on HyperTransport, but you can find the relevant information (and its original source - the spec) on that page. :)

"Because if what you are claiming is true, then HT 3.0 is slower than the 1333MHz FSB at 10.7GB/s. At least the way tech reports figures it."

You missed the "per direction" part.

A 16-bit HT 3.0 at 2.6GHz has 10.4GB/s per direction, or 20.8GB/s aggregated. It requires 40 lines, out of which 32 are CADs (16 per each direction), 4 are CTLs and 4 are CLKs.

OTOH, a 64-bit 1333-FSB has 10.7GB/s aggregated bandwidth, and it requires at least 64 lines.
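The figures above follow from link width times transfer rate. A minimal sketch of that arithmetic, assuming 2.6GHz double-pumped HT 3.0 means 5.2GT/s and treating the 1333 FSB as 1.333GT/s on a 64-bit bus shared by both directions (the helper name is hypothetical):

```python
def gb_per_s(width_bits, gigatransfers_per_s):
    """Peak bandwidth in GB/s for a parallel link: bits per transfer * GT/s / 8."""
    return width_bits * gigatransfers_per_s / 8

# 16-bit HT 3.0 at 2.6GHz DDR = 5.2 GT/s, one such link in each direction.
HT3_PER_DIRECTION = gb_per_s(16, 5.2)
HT3_AGGREGATE = 2 * HT3_PER_DIRECTION

# 64-bit FSB at 1333 MT/s; the bus is shared, so this is already the aggregate.
FSB_AGGREGATE = gb_per_s(64, 1.333)

if __name__ == "__main__":
    print(f"HT3 16-bit: {HT3_PER_DIRECTION:.1f} GB/s per direction, "
          f"{HT3_AGGREGATE:.1f} GB/s aggregate")
    print(f"FSB 1333:   {FSB_AGGREGATE:.1f} GB/s aggregate")
```

The like-for-like comparison is therefore aggregate against aggregate, not HT3's per-direction figure against the FSB's total.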

Ho Ho said...

abinstein
"That doesn't mean there is no better & faster authentication algorithms available for GPU."

Problem is Intel chose the fastest GPU raytracer available to compare against. If there are any faster tracers then they haven't publicised any information about it. That means there are currently no known faster algorithms available for GPU ray tracing.


"Yes, Intel is wrong saying that."

Only way Intel could be proved wrong would be to show a GPU ray tracer that is significantly faster than the one I linked to.


"It doesn't compare it with an algorithm optimized for GPU."

That GPU ray tracer is the best optimized GPU ray tracer that currently exists; at least I don't know of any faster one. As you seem to be so certain there are faster ones, please link to them.

abinstein said...

"Problem is Intel chose the fastest GPU raytracer available to compare against. If there are any faster tracers then they haven't publicised any information about it."

If this is the case then I'll agree with you. From your previous description it seemed to me that they compared the same new raytracer on CPU and GPU.

Ho Ho said...

Kind of OT but didn't AMD promise to start shipping CPUs in the middle of August instead of the last days of the month?

http://www.theinquirer.net/?article=41980


Yeah, Inquirer, I know. They lie or can't count GPUs but I assume they got that one right as there have been rumours about the delay. Kind of odd considering how AMD demos high-clocked Barcelonas but can't deliver even the slowest ones.

Ho Ho said...

abinstein
"From your previous description it seemed to me that they compared the same new raytracer on CPU and GPU."

I'm curious, what part of my description made you think that when I brought out the research papers with numbers pasted from them? I'd just like to know so I could avoid such mistakes in the future.

Christian M. Howell said...

Kind of OT but didn't AMD promise to start shipping CPUs in the middle of August instead of the last days of the month?

http://www.theinquirer.net/?article=41980


Yeah, Inquirer, I know. They lie or can't count GPUs but I assume they got that one right as there have been rumours about the delay. Kind of odd considering how AMD demos high-clocked Barcelonas but can't deliver even the slowest ones.



It's not unusual to demo higher clocked parts as they are just to show the headroom.

AMD said shipping to OEMs in July, systems shipping in August. It turned out to be shipments in August and systems in Sept.

Wow, a whole month. Take them out back and beat the crap out of em. I mean even Intel had to do a last minute respin of Core 2 last year.

Because they have more fabs, it's not as bad.


ALL HAIL THE DUOPOLY!!!

abinstein said...

"I'm curious, what part of my description made you think that when I brought out the research papers with numbers pasted from them? I'd just like to know so I could avoid such mistakes in the future."

Maybe if you spoke something like this explicitly:

* The best existing raytracer optimized for CPU: aaa fps
* The best existing raytracer optimized for GPU: bbb fps

BTW -

1. Just because Intel spent a lot of effort devising a good raytracer for CPU doesn't mean nobody else, given the same resources, can do the same for GPU.

2. since nVidia CUDA and AMD CTM have different access model and microarchitecture, you can't really imply one's performance from another's.

Ho Ho said...

christian
"Wow, a whole month. Take them out back and beat the crap out of em"

When was it that they promised to start shipping in August? Wasn't it around two months ago? I wonder how they couldn't see that delay coming if they were so close to shipping the things.



abinstein
"Maybe if you spoke something like this explicitly"

I was thinking that linking to the papers about a few of the most efficient ray tracers known for CPUs and GPUs would be good enough to understand what is going on. I even specifically said that the GPU ray tracer that was talked about at Siggraph, and that LG was hinting Intel had, was the one I was linking to. Well, silly me. I kind of hoped people wouldn't need to be spoonfed information and would take a bit of time to do their own research.



"Just because Intel spent a lot of effort devising a good raytracer for CPU doesn't mean nobody else, given the same resources, can do the same for GPU."

1) Ray tracing is purely an algorithm. You can't speed up people thinking out new algorithms with money.

2) Yes, I'm sure one can make considerably faster GPU ray tracer. Though I doubt it would be as flexible as the one I linked to. Also I doubt that any current GPU would be able to run tracing much faster, their architecture is simply not well suitable for the task.


"2. since nVidia CUDA and AMD CTM have different access model and microarchitecture, you can't really imply one's performance from another's."

Have you seen any comparison of the two anywhere? In real world games Radeon seems to lag behind NVidia even though it has massive memory bandwidth and theoretical FP performance lead. I wouldn't be surprised to see similar thing with Cuda vs CTM.

Also Cuda has somewhat better granularity and runs scalar code faster than R600. Thanks to its humongous register file it can fill almost every cycle that would otherwise be spent waiting for IO. Shared cache memory can also help quite a bit with thread synchronization. I'm not sure how big the R600 register file is or whether it has shared cache on the GPU; I don't remember seeing either in the technical reviews I've read.

In short my guess is that G80 would outperform CTM in GPGPU.


Btw, I just noticed a "small" mistake with the FPS numbers I calculated in my original RT post. All the estimated Penryn numbers should have been twice as big, I forgot to take the double width SSE into account. That would make that estimated performance on those two scenes at minimum ~20x faster than GPU, in the other more complex scene CPU would have been ~32x faster. I seriously doubt any kind of algorithm could make ray tracing on GPU >20x faster than in that paper, at least not on current HW.

I wonder where lg went. I'd like to see whether he made the same mistake as you did or whether he really does know about some phenomenal ray tracer running on a GPU.

abinstein said...

"In real world games Radeon seems to lag behind NVidia even though it has massive memory bandwidth and theoretical FP performance lead. I wouldn't be surprised to see similar thing with Cuda vs CTM."

Wrong. You are inferring performance from one application to another, and from one programming model to another. This is one of the most common fallacies outsiders have when evaluating products without real evidence.

Ho Ho said...

abinstein

"You are inferring performance from one application to another, and from one programming model to another"


Aren't games the things GPUs should run at best speed and with maximum efficiency? If one can't run even those that well then how could it run non-optimal stuff (GPGPU) better?

"This is one of the most common fallacies outsiders have when evaluating products without real evidence."

What do you mean by "outsider"?



Also I said "I wouldn't be surprised". You can have a different opinion, of course, but without some evidence we can't prove our claims or disprove others.

abinstein said...

"Aren't games the things GPUs should run at best speed and with maximum efficiency? If one can't run even those that good then how could it run non-optimal stuff (GPGPU) better?"

I told you, it is wrong to infer performance from one application to another. Let's look at number-crunching workloads that are suited to CPUs, like fluid dynamics or quantum mechanics. Tell me, why is Core 2 much faster in some while K8 is faster in others? Conclusion: you cannot infer performance from one application to another.


"What do you mean by "outsider"?"

Someone who doesn't know how to correctly evaluate microarchitectures, as what you suggested above shows.

InTheKnow said...

You missed the "per direction" part.

You are quite right. Thanks for the clarification. So if both CSI and HT 3.0 hit their targets CSI should be some 20% faster, correct?

Mo said...

Early Cinebench and wPrime results are NOT looking good for Barcy.

Giant said...



Early Cinebench and wPrime results are NOT looking good for Barcy.


Here's the link:
http://www.xtremesystems.org/forums/showthread.php?t=157136

abinstein said...

"You are quite right. Thanks for the clarification. So if both CSI and HT 3.0 hit their targets CSI should be some 20% faster, correct?"

Correct in terms of absolute bandwidth. With about the same number of lines, CSI @6.4GT/s will have 23% higher bandwidth than HT3 @5.2GT/s.
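The 23% figure can be checked with a few lines of arithmetic. This sketch takes the thread's numbers at face value (20 CSI lanes at 6.4GT/s with 8b10b encoding, against a 16-bit HT3 link at 5.2GT/s with no encoding overhead); the helper name is hypothetical and the CSI parameters are the commenter's claims, not confirmed specifications.

```python
def effective_gb_per_s(lanes, gigatransfers_per_s, payload_bits=1, line_bits=1):
    """Per-direction GB/s after encoding overhead (payload_bits out of line_bits)."""
    raw = lanes * gigatransfers_per_s / 8
    return raw * payload_bits / line_bits

# CSI as described in the thread: 16 GB/s raw, 8 payload bits per 10 line bits.
CSI = effective_gb_per_s(20, 6.4, payload_bits=8, line_bits=10)

# 16-bit HT3 at 5.2 GT/s, no line encoding assumed.
HT3 = effective_gb_per_s(16, 5.2)

CSI_ADVANTAGE = CSI / HT3 - 1

if __name__ == "__main__":
    print(f"CSI: {CSI:.1f} GB/s, HT3: {HT3:.1f} GB/s, "
          f"CSI ahead by {CSI_ADVANTAGE:.0%}")
```

This covers peak bandwidth per direction only and says nothing about latency.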

However the actual effective bandwidth will not only depend on absolute bandwidth but also access latency, where CSI will be slower (due to extra 8b10b encoding required). So it's a trade-off, and I don't know which one will end up stronger than another.

But HT3 is available today. :)

Axel said...

Mo

Early Cinebench and wPrime results are NOT looking good for Barcy.

Though I find it difficult to believe that K10 has barely higher IPC than K8 (particularly in the SSE heavy Cinebench), these results have more credibility than the ridiculous 30K 3DMark06 score and also stack up with the Cinebench & POVray indications from a couple months ago.

If these are legit then AMD are out of the running, finished, at least in the way they currently do business.

Greg said...

Mo and Giant, if you'd actually bother to read through that forum post you'd realize those benches weren't even posted by coolaler but a member of his forums.

That those benchmark numbers go against absolutely everything members of the community (like Rahul Sood) have been saying, and at one point even underperform K8, casts serious doubt on the credibility of those results.

Ho Ho said...

These numbers do seem to confirm the older Cinebench numbers AMD showed some time ago

Giant said...

Indeed. They also match the Dailytech results that show Barcelona 5% behind Kentsfield for IPC.

The AMD fanboys could say "new drivers will improve R600 performance!" and use that as an excuse, and we know how that turned out (8800 GTS 320MB fragging 2900 XT in BioShock). With Barcelona's poor performance, no driver can save AMD.

Not only is AMD behind significantly in clockspeed, but the IPC is still behind Core for most benchmarks. Expect AMD to tout scores for SpecFP heavily.

It's about the one benchmark they'll be able to claim superiority over Intel. (well, there is sciencemark too!)

Scientia from AMDZone said...

axel

I have to say that I was encouraged by your response on Asset Light because it shows that you have given it some thought. However, the link you gave only includes snippets (taken out of context) from the actual transcript. Reading the complete transcript shows something different.

The context that you need to keep in mind is that this is from Q1 when AMD had a very embarrassing quarter both in terms of revenue and volume. So, most of the comments are geared toward showing plans to make things better.

We know that Hector is not talking about getting rid of FAB 30 because just before the part you referenced he says:

Today, what you have heard Dirk and Bob describe is immediate and somewhat tactical in nature, but make no mistake -- it is also the beginning of a major restructuring

And, what Dirk said was:

We need to successfully convert fab 30 to a 300-millimeter toolset.

The part you are obviously concentrating on is where Hector says:

and reduce our capital intensity by exploring deeply more asset light business models in order to fully execute our plan.

However, you ignore the part after where he says:

We plan to share more details at our upcoming analyst conference this summer.

This clearly contradicts your assertion that the slides do not show what Hector is referring to. Secondly, and most importantly, your assertion that Hector is only referring to Asset Light is clearly wrong. Hector refers to a better business model of which Asset Light is only one piece:

it is also the beginning of a major restructuring of how we intend to run our company going forward, one that would reflect the natural growth and stratification of the processing solutions customer base, accommodate the business model distinctions between good enough entry level markets and performance-hungry mature market solutions, and reduce our capital intensity by exploring deeply more asset light business models in order to fully execute our plan.

Again we can see that Hector is not concentrating on Asset Light as he says:

As I mentioned earlier, it will be the transformation needed that allows us to fully complete the transformation of our industry, to put the customer back in charge, to put competition back in our industry, and to put sustainable value creation back in the hands of real innovators.

However, from the reference to the task force we know that whatever the changes might be they will occur by Q2 08:

While I expect this task force to be temporary in nature, lasting no more than a year, I expect this transformation to be bigger and more dramatic in impact than the one we undertook in 2002.

Again, note the reference. In 2002 AMD began partnering with IBM on SOI which led to K8 in 2003.

It covers a whole gamut of opportunities, from everything like I mentioned earlier, we already have a joint development agreement with IBM which we view that as an asset light strategy from the point of view that we do not have to build an R&D facility

There are indeed clues to what Hector is talking about if you look at the entire transcript instead of just the snippets:

But now as Dirk outlined, when we increase the complexity of going from a largely channel company to now one with a very heavy weighting towards OEMs and a complexity of products, we have to restructure how we address them. The way we manage the accounts is going to be different. Over the next few weeks, we will be internally announcing those changes methodically, all aimed at managing much more effectively this much more complex world that we live in.

Several times Hector distinguishes between the business model required for workstations and that required for emerging markets. This is also stated with regard to Asset Light and the reference to Texas Instruments.

For example, an entry level segment in an emerging market is a very large segment. All on its own, it actually requires its own separate business model in how to address that segment. Compare that to a workstation or super computer segment, which all on its own requires a completely different business model.

Question: does that suggest that you could be developing microprocessor technology on both SOI and bulk CMOS?

As I used as an example, which is a good one to use, is perhaps an emerging market entry level product. It is a segment that needs to be managed with a very low level of overhead and an N-minus-one or N-minus two technology, perhaps even consider a bulk solution and, on top of that, perhaps the whole business can be run out of parts other than the United States.

And, this does indeed show up in the slides. As far as I can tell Hector is referring to having Bobcat manufactured in a foundry. In fact, there are specific references that show that Bobcat will be manufactured at TSMC so presumably it is bulk silicon as Hector referred to earlier. So, an Asset Light model for emerging markets but a FAB model for the regular products.

I assume when you look at this now you can understand what Hector is talking about. The low end market needs low overhead but it is also high volume. AMD cannot fit this within the capacity and cost structure of its FABs. However, it is also a lower technology level so it doesn't need leading edge manufacturing techniques.

This should also be a good reply to those claiming that AMD's business model is broken. Here we have a completely different business model.

Scientia from AMDZone said...

InTheKnow

"Don't hold your breath. They will be using MCM to make octo-core processors."

Actually, both Intel and AMD will use MCM for octal cores; this is not a difference.

"Intel already claimed the cost picture for "native" quad core was favorable on 45nm."

Again, we are comparing 45nm monolithic Nehalem to 45nm MCM Penryn; clearly, Nehalem is worse. Or we could compare with 45nm monolithic Shanghai and again Intel loses its current MCM advantage.

"Source, please? "

Source for a larger die? How about common sense? Nehalem will not be smaller than two Penryns; it will be at least that size. Now we have to add both CSI links and an IMC. If you look at a K8 die you can see that these are quite large. The Nehalem die should be at least 25% larger than Penryn.

"Why?"

Because Intel is just now trying to enter into this market where ATI is already well established. Intel has admitted to this.

"I'm also curious about F30. Is it capable of accepting 12" equipment?"

Yes.

"Fabs that were designed for 8" equipment don't have high enough ceilings for 12" tools."

Good point. They may have to do some changes like this but apparently it is not a major issue. AMD's costs seem to be almost entirely for tooling.

"You'll forgive me if I don't give any more credence to what AMD says about Intel"

You are misunderstanding. Apparently AMD sees a distinction between what they do with FAB 36 and what Intel does with D1D. Apparently they feel that their solution is cheaper than Intel's. This makes sense to me since the cost of D1D would be spread among at least four other production FABs whereas it would not be for AMD. This is a good example where economy of scale reduces what would otherwise be an unacceptably high cost.

I recall now that one of the things I forgot to mention in AMD's restructuring is the merger of its current chipset design center at Dresden with ATI's. This is one of the things the Task Force is overseeing.

As far as bump and test goes I don't know the details. I know that the addition is about half the size of the main cleanroom in one of the two FABs. I know that AMD specifically said that this would free up space and allow more production tooling. Perhaps this entails converting some lower level cleanroom space to higher level.

Ho Ho said...

scientia
"Nehalem will not be smaller than two Penryns; it will be at least that size. Now we have to add both CSI links and an IMC. If you look at a K8 die you can see that these are quite large. The Nehalem die should be at least 25% larger than Penryn."

Are you comparing quad-core Nehalem vs dual-core Penryn? If not, how big a chance do you think there is that Intel loses some cache once it gets an IMC? Today about half the die is made up of it, losing a couple of MB shouldn't hurt performance but would tremendously help with cost savings.

Aguia said...

Here, Giant, you swapped the names.

BioShock Video Card Tests

Correct your post. Even the GTX is fragged by your standards.

Scientia from AMDZone said...

ho ho

I don't know if you have trouble with English or you just think about things backwards. I'll make this easy for you: If you can show that FBDIMM or registered memory is currently available as DDR2-1066 then you have a point.

No, the static compile check was before version 7.0. When people started getting around that then Intel added the runtime check which is in version 7.0. The change that they put in with 7.1 was extremely unprofessional. Apparently though they must have gotten enough bad publicity to make a partial correction in version 8.0. Is version 9.0 still limited to SSE only instead of SSE3?

"I guess one could say that I'm like that also. Reason for me is that AMD hasn't been doing especially good lately and I have lost some of the trust I used to have. Perhaps it is similar with him?"

No. Anand Lal Shimpi has been like that since 2002. Have you?

"Nehalem can do MCM also. Unless it will be quadcore-only I wouldn't be surprised to see lower-end 2x dualcore MCM quads from Intel. Of course I could be wrong also,"

You are definitely wrong. Intel won't do MCM's made from dual core dies with Nehalem. The MCM's will be made from quad core dies. If Intel makes any MCM's with dual cores dies they will be Penryn which will be cheaper than Nehalem.

"I say that MCM is good enough for now."

Yes. It is good enough as long as AMD does not have a competitive quad core of its own. This will change though by Q1 08 and will change sooner in servers.

"Monolithic would be better in some places but I doubt the difference would be too great."

This one really made me laugh. Every time I make an assertion you immediately ask for a link. Yet here you try to make an obviously wrong statement as though it were common knowledge. Clovertown takes a hit of nearly 20% because of being MCM. Penryn only fixes about 5% of this.

"Unless Intel cuts down the amount of cache it has on die. With IMC it is not needed as much any more anyway."

Again, this is obviously untrue. Intel will definitely need the Penryn level of cache to handle octal core. This is why Intel put HyperThreading back on the die.

Scientia from AMDZone said...

Ho Ho

"Today about half the die is made up of it, losing a couple of MB shouldn't hurt performance but would tremendously help with cost savings. "

Are you thinking straight? Cutting 2MB's from a 12MB total will make very little difference in terms of cost savings. I'm sorry but reducing die area by about 10% will not help "tremendously".
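As a rough check on that figure, here is a back-of-the-envelope sketch. It assumes, as stated above, that cache is about half of the die, and it further assumes that die area scales linearly with cache capacity, which is a simplification. The function name is made up for illustration.

```python
def die_area_saved(cache_fraction_of_die, cache_total_mb, cache_removed_mb):
    """Fraction of total die area saved by removing some cache.

    Assumes area is proportional to cache capacity (a simplification).
    """
    return cache_fraction_of_die * (cache_removed_mb / cache_total_mb)

# Cache ~50% of the die, 12MB total, 2MB removed (figures from the thread).
saving = die_area_saved(0.5, 12, 2)
print(f"Die area saved: {saving:.1%}")
```

Under those assumptions the saving comes out a little under the "about 10%" quoted, so it is the same ballpark either way.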

Giant said...

Correct your post. Even the GTX is fragged by your standards.

Not at all. They ran the tests in XP, not in Vista. Without the DX10 content in all but the last of those tests. Take a look at the last graph in the page you linked to. The 2900 takes a massive hit when moved into the DX10 code.

Ati, fragged every single time:
http://www.firingsquad.com/hardware/bioshock_directx10_performance/page5.asp

Giant said...


Again, this is obviously untrue. Intel will definitely need the Penryn level of cache to handle octal core. This is why Intel put HyperThreading back on the die.


Yes, but in this case you will have two Nehalem dies, not one. If a Penryn quad core altogether has 12MB of L2 cache, do you not think it possible that a single Nehalem quad core die would have ~6MB of L2/L3 (I have no idea on the ratio they would use) cache? With an IMC it's obvious that the need for a massive shared cache is not as great.

For a small comparison, a Barcelona CPU will have 2MB L2 and 2MB L3 for 4MB total. A Shanghai 45nm quad core will have the same 2MB L2 but 6MB of L3 for a total of 8MB. (Shanghai cache number comes from AMD's own analyst day presentations)

Erlindo said...

Ho Ho wrote:
These numbers do seem to confirm the older Cinebench numbers AMD showed some time ago


and...

Giant wrote:
Indeed. They also match the Dailytech results that show Barcelona 5% behind Kentsfield for IPC.

The AMD fanboys could say "new drivers will improve R600 performance!" and used that as an excuse but we know how that turned out (8800 GTS 320MB fragging 2900 XT in BioShock) but with Barcelona's poor performance no driver can save AMD.

Not only is AMD behind significantly in clockspeed, but the IPC is still behind Core for most benchmarks. Expect AMD to tout scores for SpecFP heavily.

It's about the one benchmark they'll be able to claim superiority over Intel. (well, there is sciencemark too!)


Sorry to say this about you Giant, you behave like a moron.

Please, both of you check this out:

VR-Zone has captured a CPUZ screenshot of the quad core Agena (Phenom X4) processor running on ATi RD790 board. Agena is 65nm, AM2+ based with 4x512KB L2 and 2MB shared L3 cache. This Agena processor is clocked at 1.8GHz and is B-0 revision which is an EVT sample for SVID testing only. However, we can expect DVT samples for performance and compatibilities testing around end October timeframe with clock speed above 2GHz. AMD roadmap revealed that there will be two Phenom X4 models at launch later this year; GP-7100 and GP-7200 with clock speeds between 2-2.2 and 2.2-2.4Ghz respectively.

...and Gary Key had something to say about this:

There appears to be some problems in those reports...

1. It reports as an Opty 2332, but also as a Phenom/Agena (the 2332 is a socket 1207, the Agena is a socket AM2+)
2. It's only a stepping 1 chip...

The latest Barcelona chips are B02 steppings with one more to go. Believe me, the reason we did not post any numbers at Computex or since then is the simple fact that the CPU/boards/BIOS have undergone dramatic changes over the course of the summer. If you have an earlier stepping there is a very good chance that HT and the secondary cache is disabled, this will affect the benchmarks dramatically. We expect to see final stepping chips and board revisions early next week, until then, it is all speculation for the most part.

The one caveat that I will add, this chip really does not get into a groove until you get over 2.4GHz and then it scales incredibly well. Also, the first RD790 boards we have will undergo another spin so any Phenom results with those boards are subject to interpretation depending on whether you like AMD or not.

Axel said...

Scientia

This clearly contradicts your assertion that the slides do not show what Hector is referring to.

No, evidently Hector changed his mind and chose not to reveal the details of "Asset Light going forward" during Analyst Day, as Grose's slide 30 clearly states:

"For competitive reasons, we are not sharing details of progress at this point, we will so do as soon as its appropriate."

Again, since the Q1 CC Hector has not shared anything new. What he shared during the CC was sufficiently nebulous to make drawing firm conclusions impossible. You can be sure he will elaborate soon, as AMD are heading for at least a $2.0 billion loss for 2008, their worst in history.

Also, AMD may have stated plans to convert Fab 30 back during the Q1 CC, but plans can change dramatically after continuing massive losses. I bet that AMD's CAPEX plans have changed materially since the Q1 CC, including those related to Fab 30.

Erlindo said...

Update:

Originally posted by: SickBeast
I wonder why the chip would scale so well beyond 2.4ghz...it must have something to do with the memory controller running at a higher frequency.

So perhaps Gary Key is right and these benchmarks are from crippled Barcelonas.

Also, the first RD790 boards we have will undergo another spin so any Phenom results with those boards are subject to interpretation depending on whether you like AMD or not.


From that I take it that the chip did not perform to expectations, however it's very possible that it's due to a problem with the platform. Why not just run the chip on a current AM2/AM2+ board?


Response:

1. Around 2.4GHz and higher you will want to run CAS4 1066 and at 3GHz+, we expect/estimate that 1333 CAS5 will come in handy. AMD is working very closely with the memory suppliers at this time to get low latency DDR2-1066 ready quickly and to start looking at DDR2-1333 next year before they worry about the switch to DDR3. Memory latencies are going to be a key with this CPU and the performance oriented consumer chipsets.

2. The current AM2+ boards are still immature from a driver/chipset viewpoint, at least to the point of not providing benchmarks yet, once they get closer, expect some numbers.

3. The lower speed Barcelonas on the server chipsets are not going to shine that well in a lot of consumer applications (against higher clocked Yorkfields, but that is not the target market right now), so AMD desperately needs to get the speeds up for this chip to show its true potential. Right now, it's doing a lot better than what we saw at Computex and we understand the latest silicon is a marked improvement (several of the board guys were extremely pleased with the last samples) over the last spin we tested. The numbers will be out in a couple of weeks, some will be very happy, some will not, but at least the damn thing will finally be shipping.

p.s. Not trying to be vague, just until the final CPU samples are in and the green light is given by the board guys, no real point in guesstimating.


Linky
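To see why those particular speed and CAS pairings line up, here is a small sketch that converts CAS cycles into nanoseconds. It uses the standard relationship that a DDR memory clock is half the data rate. The DDR2-800 CAS5 baseline is my own assumption for comparison and is not from the quoted post; the function name is illustrative only.

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """Absolute CAS latency in nanoseconds.

    DDR memory transfers twice per clock, so clock (MHz) = data rate / 2.
    """
    clock_mhz = data_rate_mts / 2.0
    return cas_cycles / clock_mhz * 1000.0

ddr2_800_cl5 = cas_latency_ns(800, 5)     # assumed baseline, not from the post
ddr2_1066_cl4 = cas_latency_ns(1066, 4)   # "CAS4 1066" from the post
ddr2_1333_cl5 = cas_latency_ns(1333, 5)   # "1333 CAS5" from the post

for label, ns in [("DDR2-800 CAS5", ddr2_800_cl5),
                  ("DDR2-1066 CAS4", ddr2_1066_cl4),
                  ("DDR2-1333 CAS5", ddr2_1333_cl5)]:
    print(f"{label}: {ns:.2f} ns")
```

Both pairings named in the post work out to roughly the same absolute latency, and both are noticeably lower than the assumed DDR2-800 CAS5 baseline.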

Erlindo said...

AMD speaks on Barcelona and Phenom

Quote: John says the Quad-Core Opteron systems had already been submitted to the benchmark organizations for testing. The results are all under NDA and will only be released on launch date, September 10, 2007. Because they are under NDA, he cannot reveal the actual results, but he gave us some interesting indications of how the Barcelona will eventually fare against an equivalent Intel processor.

* 20-30 % better performance overall
* 170 % better performance in some benchmarks


More importantly, he says, the Quad-Core Opteron (Barcelona) will offer 45-50% better performance than current dual-core Opteron processors at the same power consumption and thermal dissipation. Intel quad-core processors, on the other hand, only offer 30-35% better performance (11% in floating point) than their dual-core processors with a 23-50% increase in power consumption and thermal dissipation.
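For what it's worth, the quoted percentages can be turned into a performance-per-watt ratio for quad core relative to dual core. This is only a sketch using the ranges claimed in the quote above, not measured data, and the function name is made up for illustration.

```python
def perf_per_watt_range(perf_gain, power_gain):
    """Return (worst, best) perf/watt of quad core relative to dual core.

    perf_gain and power_gain are (low, high) fractional increases.
    Worst case pairs the lowest performance gain with the highest power gain.
    """
    worst = (1 + perf_gain[0]) / (1 + power_gain[1])
    best = (1 + perf_gain[1]) / (1 + power_gain[0])
    return worst, best

# Claimed figures from the quote: AMD at the same power, Intel at 23-50% more.
amd = perf_per_watt_range((0.45, 0.50), (0.0, 0.0))
intel = perf_per_watt_range((0.30, 0.35), (0.23, 0.50))

print(f"AMD quad vs dual perf/watt:   {amd[0]:.2f}x to {amd[1]:.2f}x")
print(f"Intel quad vs dual perf/watt: {intel[0]:.2f}x to {intel[1]:.2f}x")
```

If the claimed numbers hold, Intel's quad core lands somewhere around break-even on performance per watt against its own dual core, while AMD's claim implies a clear gain. Everything here depends on the quoted figures being accurate.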


I guess this could be one of those "few reasons" why a 3.0GHz Barcy obtained a 30K score in 3DMark 06.

Scientia from AMDZone said...

giant

If it is comforting to you then by all means hold onto the Cinebench and POV-Ray scores. You still have a little while to dream. However, I can guarantee that SSE is roughly twice as fast on K10 as K8.

If as you say the 2900 takes a "massive hit when moved into the DX10 code" then doesn't it seem like common sense that the issue is probably the DX10 driver rather than the hardware?

Nehalem includes L3 as well as L2. Do you honestly think that Intel would bother adding an additional layer of cache if they had decided to reduce what they already have with Penryn?

Ho Ho said...

scientia
"I don't know if you have trouble with English or you just think about things backwards. I'll make this easy for you: If you can show that FBDIMM or registered memory is currently available as DDR2-1066 then you have a point."

Why do I suddenly have to prove something? It was you who claimed that DDR2 1066 is standardized even though it was nowhere to be found on JEDEC site. I only said that if JEDEC introduces new speeds after the initial ones there is a chance that there could be faster versions of FBDIMM also.

In any case, limiting DDR3 FBDIMM to a mere 800MHz when there are 1.6GHz and likely much higher-clocking chips is ridiculous.


"No. Anand Lal Shimpi has been like that since 2002. Have you?"

No. I lost my belief in AMD around half a year ago or so when the problems started and they didn't seem to handle it too well.

Btw, on one Estonian tech forum I was recommending that people get K8 right up until I saw Core2 performance. If you could understand Estonian it would be simple to find out I'm no fanboy of Intel, just a fanboy of performance :)


"Clovertown takes a hit of nearly 20% because of being MCM"

Is it because of MCM or lack of memory bandwidth?


"Penryn only fixes about 5% of this."

Increasing FSB by 20% should help a bit with memory bandwidth related problems also. Add the two together and ...


"Intel will definitely need the Penryn level of cache to handle octal core."


They have an IMC and relatively big memory bandwidth (I still don't believe the FBDIMM-800 thing), so I seriously doubt that. Though as neither of us can prove our points, that will probably also be for time to answer for us.


"This is why Intel put HyperThreading back on the die."

Say what? HT can help with hiding memory latency related IPC drops or when code simply has lots of stalls.


"Cutting 2MB's from a 12MB total will make very little difference in terms of cost savings"

I was thinking more like 4M or more. Also native quadcore would not need twice as much L2 as MCM one anyway.



erlindo, if cache is disabled then it would show up on CPUz. Also one can't "disable HT".


"* 20-30 % better performance overall"

Funny that AMD itself has stated 15% average IPC lead over K8 according to some earlier presentations. I wonder what kind of benchmarks are they talking about now.


"I guess this could be one of those "few reasons" why a 3.0GHz Barcy obtained a 30K score in 3DMark 06."


Without a third GPU and 3GHz K10 being twice as fast as 4GHz Core2Quad 30k is impossible to achieve. Also can anyone explain how could a mere 500MHz OC on a CPU give them ~7k performance increase in 3dmark?


scientia
"However, I can guarantee that SSE is roughly twice as fast on K10 as K8."

Yes, SSE will be but cinebench is not using SSE all that much.


"Do you honestly think that Intel would bother adding an additional layer of cache if they had decided to reduce what they already have with Penryn?"

Perhaps they are "copying K10" and do small L2 with medium sized L3? L3 is mostly for sharing data between threads, L2 for thread-specific data. With dualcore you won't need to have L3 to share data efficiently but with 4+ cores it will be quite a performance hit to have L2 shared by all the cores, just look at how high latency K10's L3 has.

Scientia from AMDZone said...

Axel

"No, evidently Hector changed his mind and chose not to reveal the details of "Asset Light going forward" during Analyst Day, as Grose's slide 30 clearly states:

"For competitive reasons, we are not sharing details of progress at this point, we will so do as soon as its appropriate."


No. That only states that they are not giving details of how far along they are; it does not say that they are not giving details of where they are going.

"What he shared during the CC was sufficiently nebulous to make drawing firm conclusions impossible."

Interesting. I've given several examples to show that it was not nebulous.

"Also, AMD may have stated plans to convert Fab 30 back during the Q1 CC, but plans can change dramatically after continuing massive losses."

So, let me see if I get this straight. First, you argue that AMD is making big changes based on the Q1 transcript. Then you argue that the Q1 transcript is so vague that you don't know what it means. And, now you are arguing that the Q1 transcript is irrelevant anyway because Hector will probably change his mind.

Let me inject some logic into your floundering reasoning. There have been no substantial changes since the Analyst Day presentation. So, the idea that things have changed since these slides were shown is without foundation.

The same slide series that you referenced clearly shows the ratio of FAB to foundry production on page 12 including 2008. I've already shown that whatever changes AMD is making will occur by Q2 2008. There is no big change in FAB versus foundry production. The only addition is the small amount of MPU foundry work from ATI products. These are MIPS based currently but will be x86 Bobcat based in 2009.

Finally, your argument that AMD might indeed sell FAB 30 is still just as ludicrous as when roborat first said it. AMD's capacity will be topped out by early 2009. AMD has to have FAB 30 to keep increasing. There is also no handy way for AMD to sell off FAB 30 since it is attached to the rest of the complex including a direct attachment to FAB 36. I will readily admit that AMD may not build a FAB in NY but it is not possible that AMD would drop back down to a single FAB. I assume your argument is based on either the notion that AMD will not grow (which is not possible) or that AMD will outsource CPU's to a foundry (which is not possible). Do you have any foundation for this besides the speculation of a single stock analyst who was trying to pump up Intel stock?

Aguia said...

Take a look at the last graph in the page you linked to. The 2900 takes a massive hit when moved into the DX10 code.

Then you missed the part where they say a patch is needed for ATI to run well under DX10/Vista.

Aguia said...

Ho ho,

Since you seem to be an expert in SSE instructions, what are your thoughts about AMD's SSE5?

Scientia from AMDZone said...

Ho Ho said...

"It was you who claimed that DDR2 1066 is standardized even though it was nowhere to be found on JEDEC site."

Intel opposes the DDR2-1066 standard.

AMD

In the design of our upcoming native quad-core client processors, which we expect will be available in the second half of 2007, AMD is planning for DDR2-1066 memory support in our integrated memory controller with the expectation that it will be compatible with any future JEDEC standard that may be adopted.

AMD supports it.

" I only said that if JEDEC introduces new speeds after the initial ones there is a chance that there could be faster versions of FBDIMM also."

No. You really don't understand. Intel got JEDEC to skip faster DDR even though DDR-500 was clearly possible. Intel did this because it knew that it benefited more from DDR2.

Today, Intel is once again trying to get JEDEC to stop developing DDR2 (because that will help AMD) and move to DDR3 (because that will help Intel). So, AMD has announced official support for DDR2-1066 which it did not do for speeds greater than DDR-400 even though the K8 memory controller supported DDR-500.

AMD's ultimate goal is to go to G3MX. So, JEDEC is now wavering because they mostly got shafted by Intel's FBDIMM strategy but really like G3MX because it would save them a lot of development cost on registered memory. On the other hand, they are still afraid of opposing Intel. If JEDEC continues to waver then we'll see AMD simply bypass them altogether and certify directly to the manufacturer. Again, AMD cannot force manufacturers to provide registered DIMMs but they can take advantage of stock ECC DIMMs with G3MX. So, if AMD had G3MX today they would already have a DDR2-1066 server solution.

"In any case, limiting DDR3 FBDIMM to a mere 800MHz when there are 1.6GHz and likely much higher-clocking chips is ridiculous."

Yes it is ridiculous; it is also not what I said.

"No. I lost my belief in AMD around half a year ago or so when the problems started and they didn't seem to handle it too well."

Well, maybe you'll get it back in Q4 or Q1.

"If you could understand Estonian it would be simple to find out I'm no fanboy of Intel, just a fanboy of performance :)"

Yes, and I've actually bought two Intel based systems within the last two years, yet that doesn't stop the Dymo crowd.

"Is it because of MCM or lack of memory bandwidth?"

Well, MCM holds back bandwidth because it presents two loads on the FSB. Inter-die coherency also travels over the FSB.

"Increasing FSB by 20% should help a bit with memory bandwidth related problems also. Add the two together and ..."

You get nothing. If increasing the FSB fixed the problem then Intel wouldn't need to pump up the cache size on Penryn.

"They have an IMC and relatively big memory bandwidth (I still don't believe the FBDIMM-800 thing)"

I don't see why you don't believe it. I'm sure Intel will support FBDIMM-800 initially and FBDIMM-1066 as soon as it is available.

"Say what? HT can help with hiding memory latency related IPC drops or when code simply has lots of stalls."

Stalls were a big problem with the superpipeline of P4; why would they be such a problem for Nehalem? And, shouldn't the IMC reduce Nehalem's memory latency?

"erlindo, if cache is disabled then it would show up on CPUz."

No. It might or might not show up. CPUz is not that robust.

"Yes, SSE will be but cinebench is not using SSE all that much."

For x87 based code, K10 is about the same speed as K8. But, no one is still developing for this.

"Perhaps they are "copying K10" and do small L2 with medium sized L3?"

Amusing, but unlikely.

"L3 is mostly for sharing data between threads, L2 for thread-specific data. With dualcore you won't need to have L3 to share data efficiently but with 4+ cores it will be quite a performance hit to have L2 shared by all the cores"

Interesting. So, you are suggesting that Nehalem will have two separate L2 caches.

"just look at how high latency K10's L3 has."

This isn't the same. L2 on K10 is a victim cache instead of a direct cache as it is on C2D. L3 on K10 is only a secondary victim cache. K10 has no speculative loads to either L2 or L3.

Axel said...

Scientia

No. That only states that they are not giving details of how far along they are; it does not say that they are not giving details of where they are going.

Yes it does. You still can't see it because you don't understand the urgency of AMD's financial situation, while it's pretty obvious to me and will be to you as well when Hector makes the big announcement in the near future. I'll break it down for you:

1. Slide 8 - Asset Light is at work today, but is evolving. Asset Light will evolve to "build increased flexibility into production" and to "further insulate against demand fluctuations". Hint hint... changes coming that have not already been announced nor yet taken place.

2. Slides 9-11 - Asset Light today. Not what it will evolve into. Hence, the new Asset Light yet to be announced does not involve co-development with IBM, partnering with Chartered, or partnering with TSMC.

3. Slide 12 - Shows increasing "MPU Owned" capacity into 2008, implying that Fab 36 continues to ramp with Fab 30 still at current output with 200-mm/90-nm.

4. Slide 28 - SOI production from Fab 36 & Chartered for the next two years. No mention of Fab 30. Hmmm....

5. Slide 30 - Regarding "Advancing our Asset Light Strategy", AMD are not sharing details of progress. This means progress in terms of the planning, not the execution. How can they execute before they have revealed the plan to the shareholders? Since they haven't shared anything about where Asset Light is going beyond nebulous words like "build increased flexibility into production" and "further insulate against demand fluctuations", Grose is clearly saying that the details of the plan will come later.

Secondly, and most importantly, your assertion that Hector is only referring to Asset Light is clearly wrong. Hector refers to a better business model of which Asset Light is only one piece

No. If you read Hector's words more closely, Asset Light is at the heart of each element of his high-level restructuring plan:

one that would reflect the natural growth and stratification of the processing solutions customer base

This is the flexible fab part of Asset Light. Outsource where needed, fab in-house where needed. Either that, or it refers to spinning off a division.

accommodate the business model distinctions between good enough entry level markets and performance-hungry mature market solutions

Again, this is the flexible fab part of Asset Light. Outsource where needed, fab in-house where needed. Either that, or it refers to spinning off a division.

reduce our capital intensity by exploring deeply more asset light business models in order to fully execute our plan

This refers specifically to selling off some fab capacity to reduce capital exposure.


Interesting. I've given several examples to show that it was not nebulous.

Maybe you thought so but then again you weren't able to come to any firm conclusions either:

Several times Hector distinguishes between the business model required for workstations and that required for emerging markets. This is also stated with regard to Asset Light and the reference to Texas Instruments.

So can you explain how AMD will restructure in this specific context? No, because AMD have not shared the details yet.

Question: does that suggest that you could be developing microprocessor technology on both SOI and bulk CMOS?

So can you explain how AMD will restructure in this specific context? No, because AMD have not shared the details yet.

As far as I can tell Hector is referring to having Bobcat manufactured in a foundry. In fact, there are specific references that show that Bobcat will be manufactured at TSMC so presumably it is bulk silicon as Hector referred to earlier. So, an Asset Light model for emerging markets but a FAB model for the regular products.

You're speculating. Again, can you explain how AMD will restructure in this specific context? No, because AMD have not shared the details yet.

The low end market needs low overhead but it is also high volume. AMD cannot fit this within the capacity and cost structure of its FABs. However, it is also a lower technology level so it doesn't need leading edge manufacturing techniques.

Fine. But once again, can you explain how AMD will restructure in this specific context? No, because AMD have not shared the details yet.

You should realize, your own words quoted above show that Hector has not yet shared anything concrete, he has merely given some vague feel good statements to appease the investors and to bide time until he reveals the details of the plan. Apparently you bought it, but if you re-read your statements above carefully you'll find that you never gave anything beyond your own speculation for specifically how AMD will restructure their business going forward.

As Hector said, "make no mistake -- it is also the beginning of a major restructuring." But he hasn't yet told us what that involves, and you're fooling yourself if you think this information was shared during Analyst Day. It wasn't. Perhaps Hector initially did intend to reveal the plans during Analyst Day, but it's clear that he stalled for some reason or other. My guess is that the changes he wanted to make were too drastic and the board didn't approve, so he had to go back to the drawing board.

I've already shown that whatever changes AMD is making will occur by Q2 2008.

You are naively assuming that a "temporary" task force implies completion by Q2 2008. Where again did AMD say that? I must have missed it. Please link or quote to a statement by an AMD officer to that effect. Good luck finding that information, as AMD have not yet shared the details of their Asset Light plan going forward. They will in the near future.

There is also no handy way for AMD to sell off FAB 30 since it is attached to the rest of the complex including a direct attachment to FAB 36.

They don't have to sell the building, but they could sell just the contents. There are already rumors out there of a Russian buyer having bought up much of the 200-mm tooling. So that would mean reduced capacity out of Fab 30 going forward, until the supposed Fab 38 conversion. I won't link to that Fudzilla rumor because they're not a credible source, but we'll find out if it's true.

Don't worry, Hector will announce the details of Asset Light in the near future. He has to.

Scientia from AMDZone said...

Giant

"Yes, but in this case you will have two Nehalem dies, not one. If a Penryn quad core altogether has 12MB of L2 cache, do you not think it possible that a single Nehalem quad core die would have ~6MB of L2/L3"

Hmmmm. Well, K8's run ok with 512K per core while C2D's take a noticeable hit. K8's run great with 1MB per core while C2D's don't show top performance until they have 2MB's per core. K10 provides 4MB total which is the same as 512K per core plus another 512K per core L3. With twice the cache bus bandwidth this should be close to K8's 1MB/core.

So, at a minimum Nehalem should need an Allendale level of cache. So, let's say 4MB's L2 plus 2MB's L3. That is possible however we can't claim that this is as good as current C2D since C2D already has the faster cache bus. A 4MB L3 would also work better since it could hold the entire L2. So, I would say that 8MB's total for Nehalem is more likely than 6MB,s.

"For a small comparison, a Barcelona CPU will have 2MB L2 and 2MB L3 for 4MB total. A Shanghai 45nm quad core will have the same 2MB L2 but 6MB of L3 for a total of 8MB. (Shanghai cache number comes from AMD's own analyst day presentations)"

Right, the extra L3 is to offset the memory pressure from octal core. To get the same benefit Nehalem would need 4MB of L2 + 8MB of L3. And yes, this would match Penryn. Still, this isn't bad because we are talking twice as many cores with the same amount of cache per core because of the IMC. In contrast, Shanghai, which already has an IMC, is doubling the cache size with twice as many cores.
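
For anyone who wants to check the per-core arithmetic, here is a minimal Python sketch. The cache totals are only the figures quoted in this thread (Barcelona 2MB L2 + 2MB L3, Shanghai 2MB L2 + 6MB L3, Penryn quad core 12MB L2); the function name and table layout are my own, for illustration.

```python
def cache_per_core_kb(cores, l2_total_mb, l3_total_mb=0):
    """Return (L2, L3, total) cache per core in KB for a chip's total cache sizes."""
    l2 = l2_total_mb * 1024 / cores
    l3 = l3_total_mb * 1024 / cores
    return l2, l3, l2 + l3


# (cores, total L2 in MB, total L3 in MB), as quoted in the comments above.
chips = {
    "Barcelona quad": (4, 2, 2),
    "Shanghai quad": (4, 2, 6),
    "Penryn quad": (4, 12, 0),
}

for name, (cores, l2_mb, l3_mb) in chips.items():
    l2, l3, total = cache_per_core_kb(cores, l2_mb, l3_mb)
    print(f"{name}: L2 {l2:.0f}K + L3 {l3:.0f}K = {total:.0f}K per core")
```

This only divides totals evenly across cores. It says nothing about cache bus bandwidth or about shared versus private caches, which is where the real argument is.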

Scientia from AMDZone said...

Axel

"Yes it does... I'll break it down for you:"

Well, broke down as in unable to drive anywhere, yes. Broke down as in a coherent explanation, no.

I'll try one more time.

Slide 3:
Agile foundry capabilities
Low cost, low power
(mentioned by Hector in Q1)

Slide 4:
Bulk and SOI performance and value segments
(mentioned by Hector in Q1)

Slide 5:
Deliver leading-edge processing solutions through a unique hybrid manufacturing model that maximizes flexibility, agility and responsiveness.
(mentioned by Hector in Q1)

Slide 8:
FlexFab Production: Chartered, TSMC, UMC
Building increased flexibility in AMD production.
(Both of these are referenced later.)

Slide 10:
High-yield, foundry production of AMD microprocessors
"Flex" production of 90nm MPU products

Slide 12: shows "Flex" capacity
The MPU Foundry refers to production by Chartered which has increased from 2006. This plus good yields at FAB 36 is what allows AMD to push back FAB 38.

Slide 13: One of the headings is specifically Fab Versus Foundry.

Slide 16:
FAB 38 transformation
Adds flexible capacity to FAB 36 during 2008 and 1H of 2009.
(Okay, here we have the reference from Slide 8. And, this explains why it is not included in the graphs on slide 12. Also, to be flexible capacity to FAB 36 it has to be 65/45nm production. This clearly disproves the 90nm theory.)

Slide 28:
This slide clearly shows both FAB 36 and FAB 38 in 2009. Not only that, it also shows FAB 38 equal to FAB 36 and both producing Bulldozer. This slide also shows the new flexibility because Bobcat can be either FAB or Foundry produced. Notice how the 2009 and beyond placard is different from the 2007-2009 placard, which says Traditional Mix.

(Again, the value entry level market segment and different business model was specifically mentioned more than once by Hector in Q1)

"This is the flexible fab part of Asset Light. Outsource where needed, fab in-house where needed."

Yes, this again matches with Bobcat.

"accommodate the business model distinctions between good enough entry level markets and performance-hungry mature market solutions"

Yes, I've already mentioned this.

"reduce our capital intensity by exploring deeply more asset light business models in order to fully execute our plan

This refers specifically to selling off some fab capacity to reduce capital exposure."


Not at all. This refers to using Foundry manufacturing to tackle the high volume, low profit value entry level segment.

"You're speculating."

My "speculation" is backed up both by what Hector said in Q1 and by all of the relevant slides (including the ones you skipped and took out of context).

"you're fooling yourself if you think this information was shared during Analyst Day. It wasn't. Perhaps Hector initially did intend to reveal the plans during Analyst Day, but it's clear that he stalled for some reason or other. My guess is that the changes he wanted to make were too drastic and the board didn't approve, so he had to go back to the drawing board."

And, what pray tell is this based on besides your own imagination?

"You are naively assuming that a "temporary" task force implies completion by Q2 2008. Where again did AMD say that? I must have missed it."

You obviously did:

"In the meantime, in order to provide intense oversight to this transformational process, and also to ensure strong support across the company for the various initiatives, Dirk and I are forming an executive task force...

While I expect this task force to be temporary in nature, lasting no more than a year, I expect this transformation to be bigger and more dramatic in impact than the one we undertook in 2002."

The transformation process is intensely overseen by the task force. Notice that it does not say that only the initial phases of the process are overseen. Notice too that the second paragraph again says nothing about the process continuing beyond the task force.

"They don't have to sell the building, but they could sell just the contents. There are already rumors out there of a Russian buyer having bought up much of the 200-mm tooling."

I am almost too stunned to type. THIS IS YOUR SMOKING GUN???? You are basing your entire theory on the sale of 200mm tooling which has been planned for the past year? I have news for you; this was in the Q1 transcript:

Our plan continues to include: one, selling our 200 millimeter fab equipment

And the slides as well:

Slide 14:
AMD Dresden: FAB 30
200mm production ramping down in 2H 2007


"So that would mean reduced capacity out of Fab 30 going forward, until the supposed Fab 38 conversion."

Yes, as FAB 36 increases capacity the more expensive 90nm production is not needed. I also assume you are referring to the conversion mentioned specifically by Dirk in the same Q1 transcript and mentioned on slide 16?

"Don't worry, Hector will announce the details of Asset Light in the near future. He has to. "

They already did.

Scientia from AMDZone said...

abinstein

Ho ho can speak for himself.

enumae

Abinstein was trying to be funny but I see no reason to start a fight over this. Mentioning the AMD press release on SSE5 was a good idea.

aguia

If you want the importance of each offering, on a scale of 1 to 10 they would basically be:

3DNow! - 2
SSE - 5
SSE2 - 4
AMD64 - 6
SSE3 - 2
AMD's SSE3a - 1
SSE4 - 3
SSE5 - 6

Three-operand instructions are huge. They greatly leverage the existing SSE register set and can remove about 25% of the coding volume. Let me state this another way. You might recall the Altivec instruction set that Mac programmers greatly bemoaned losing when Apple switched to Intel? This is as powerful as Altivec. Having said that, I'm now wondering if maybe IBM influenced this.
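
To show why the extra operand matters, here is a toy Python sketch of computing d = a*b + c while keeping a, b and c intact. It is only a model: the opcode names (mov, mul, add, fmadd) are made up for illustration and are not the actual SSE or SSE5 mnemonics or encodings, and it works on single numbers rather than vectors.

```python
def run_two_operand(a, b, c):
    """d = a*b + c using destructive two-operand ops (dst = dst op src)."""
    regs = {"a": a, "b": b, "c": c}
    program = [
        ("mov", "d", "a"),  # extra copy: mul would otherwise overwrite a
        ("mul", "d", "b"),  # d = d * b
        ("add", "d", "c"),  # d = d + c
    ]
    for op, dst, src in program:
        if op == "mov":
            regs[dst] = regs[src]
        elif op == "mul":
            regs[dst] = regs[dst] * regs[src]
        elif op == "add":
            regs[dst] = regs[dst] + regs[src]
    return regs, len(program)


def run_multi_operand(a, b, c):
    """d = a*b + c using one hypothetical fused op with a separate destination."""
    regs = {"a": a, "b": b, "c": c}
    program = [("fmadd", "d", "a", "b", "c")]  # d = a * b + c, sources untouched
    for _op, dst, s1, s2, s3 in program:
        regs[dst] = regs[s1] * regs[s2] + regs[s3]
    return regs, len(program)


two_regs, two_count = run_two_operand(2.0, 3.0, 4.0)
multi_regs, multi_count = run_multi_operand(2.0, 3.0, 4.0)
print(two_regs["d"], two_count)      # same result, three instructions
print(multi_regs["d"], multi_count)  # same result, one instruction
```

Both versions leave the source registers alone and give the same answer. The two-operand form just needs the extra copy plus separate multiply and add steps. How much real code shrinks depends on the code, so treat the 25% above as my estimate rather than something this sketch demonstrates.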

Scientia from AMDZone said...

For quick reference, Wikipedia: AltiVec.

This is the actual 128-Bit SSE5 Instruction Set reference.

most AltiVec instructions take three register operands compared to only two register/register or register/memory operands on IA-32.

AltiVec is also unique in its support for a flexible vector permute instruction


SSE5
New instructions include:

• Fused multiply accumulate (FMACxx) instructions
• Integer multiply accumulate (IMAC, IMADC) instructions
• Permutation and conditional move instructions
• Vector compare and test instructions
• Precision control, rounding, and conversion instructions

Support for these instructions is provided by a new instruction encoding, which adds a third opcode byte (Opcode3). For the three- and four-operand instructions, a new DREX byte defines the destination register and provides the register extension information normally contained in a REX prefix. The REX prefix is not allowed with those instructions.


In terms of importance this is right up there with AMD64. This could very well mean the end of other processors like Sparc and it will definitely take a big bite out of Power and Itanium. My guess is that IBM saw the handwriting on the wall.

enumae said...

Scientia

Is there any reason that Intel can not incorporate this into their products by 2009?

Thanks

Giant said...


Is there any reason that Intel can not incorporate this into their products by 2009?


No. But there's no reason to say that they will, either. With the rare exception of 64-bit, none of AMD's extensions have really taken off. Intel never implemented any 3DNow! support, for instance.
