Monday, November 05, 2007

Has Intel's Process Tech Put Them Leagues Ahead?

There has been a lot of talk lately suggesting that Intel is way ahead of AMD because of superior R&D procedures. Some of the ideas involved are rather intriguing, so it's worth taking a closer look.

You have to take things with a grain of salt. For example, there are people who insist that it wouldn't matter if AMD went bankrupt because Intel would do its very best to provide fast and inexpensive chips even without competition. Yet these same people will, in the next breath, also insist that the reason Intel didn't release 3.2GHz chips in 2006 (or 2007) was that "they didn't have to". I don't think I need to go any further into what an odd contradiction of logic that is.

At any rate, the theory being thrown around these days is that Intel saw the error of its ways when it ran into trouble with Prescott. It then scrambled and made sweeping changes to its R&D department, purportedly mandating Restricted Design Rules (RDR) so that the design staff stayed within the limitations of the process staff. The theory is that this has allowed Intel to be more consistent with design and to leap ahead of AMD, which presumably has not instituted RDR. The theory also holds that AMD's Continuous Transistor Improvement has changed from a benefit to a drawback: rather than continuous changes allowing AMD to advance, these changes only produce chaos as each change spins off unexpected tool interactions that take months to fix.

The best analogy for RDR that I can think of is Group Code Recording (GCR) and Run Length Limited (RLL) recording. Let's look at magnetic media like tape or the surface of a floppy disk. Typically a '1' bit is recorded as a change in magnetic polarity while a '0' is no change. The problem is that this medium can only handle a certain density: if we try to pack too many transitions too closely together they will blend and the polarity change may not be strong enough to detect. Now, let's say that a given magnetic medium is able to handle 1,000 flux transitions per inch. If we record data directly then we can do 1,000 bits per inch. However, Modified Frequency Modulation (MFM) puts an encoding bit between data bits to ensure that we don't get two 1 bits in a row. This means that we can actually put 2,000 encoded bits per inch, of which 1,000 bits are actual data. We can see that although MFM expanded the bits by a factor of 2, there was no actual change in data density. However, by using more complex encoding we can actually increase density. By using (1,7) RLL we can record the same 2,000 encoded bits per inch but we get 1,333 data bits. And by using (2,7) RLL we space out the 1 bits even further, allowing 3,000 encoded bits per inch; at its 1/2 code rate this increases our data bits by 50% to 1,500. GCR is similar in that it maps a group of bits into a larger group, which allows bad bit patterns to be eliminated. You can see a detailed description of MFM, GCR, and RLL at Wikipedia. The important point is that although these encoding schemes initially make the encoded data larger, they actually allow greater recording densities.
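For anyone who wants to check the arithmetic, here is a minimal sketch in Python (the code rates used, 1/2 for MFM and (2,7) RLL and 2/3 for (1,7) RLL, are the standard published values; the d constraint is the minimum number of 0 bits between 1 bits):

# Density arithmetic for a few recording codes. With a minimum of d zeros
# between ones, flux transitions are at least d + 1 encoded cells apart,
# so encoded cells can be packed (d + 1) times more densely than the raw
# flux-transition limit of the medium.

FLUX_TRANSITIONS_PER_INCH = 1_000   # what the medium can resolve

# (name, d constraint, code rate = data bits per encoded bit)
codes = [
    ("direct",    0, 1.0),      # one data bit per flux cell
    ("MFM (1,3)", 1, 1 / 2),    # clock bits forbid adjacent 1s
    ("RLL (1,7)", 1, 2 / 3),
    ("RLL (2,7)", 2, 1 / 2),
]

for name, d, rate in codes:
    encoded = FLUX_TRANSITIONS_PER_INCH * (d + 1)
    data = encoded * rate
    print(f"{name:10s}: {encoded:5.0f} encoded bits/in, {data:5.0f} data bits/in")

Running it reproduces the figures above, 1,000 / 1,333 / 1,500 data bits per inch, which is the whole point: the codes that look "wasteful" in encoded bits end up carrying more data per inch.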

RDR would be similar to these encoding schemes in that while it would initially make the design larger, it would eliminate problem areas, which would ultimately allow the design to be made smaller. RDR would also, in theory, greatly reduce delays. When we see that Intel's gate length and cache memory cell size are both smaller than AMD's, and we see the smooth transition to C2D and now Penryn, we would be inclined to give credit to RDR much as EDN editor Ron Wilson did. You'll need to know that OPC is Optical Proximity Correction and that DFM is Design For Manufacturability. One example of OPC is that you can't actually have square corners on a die mask, so this is corrected by rounding the corners to a minimum radius. DFM just means that Intel tries very hard not to design something that it can't make. Now, DFM is a good idea since there are many historical examples of designs, from Da Vinci's attempt to cast a large bronze horse to the Soviet N1 lunar rocket, that failed because manufacturing was not up to design requirements. There are also numerous examples, from the first attempts to lay the Transatlantic Telegraph Cable (nine year delay) to the Sydney Opera House (eight year delay), that foundered at high cost until manufacturing caught up to design.

I've read what both armchair and true experts have to say about IC manufacturing today and, to be honest, I still haven't been able to reach a conclusion about the Intel/RDR Leagues Ahead theory. The problems with comparing manufacturing at AMD and Intel are numerous. For example, we have no idea how much is being spent on each side. We could set upper limits, but there is no way to tell exactly how much, and this does make a difference. If the IBM/AMD process consortium is spending twice as much as Intel on process R&D then I would say that Intel is doing great. However, if Intel is spending twice as much then I'm not so sure. We also know that Intel has more design engineers and more R&D money than AMD does for the CPU design itself. This could account for the smaller gate size just as much as RDR could. It is possible that differences between SOI and bulk silicon are factors as well. On the other hand, the fact that AMD has only one location (and currently just one FAB) to worry about surely gives it at least some advantage in process conversion and ramping. I don't really have an opinion as to whether AMD's use of SOI is a good idea or a big mistake. However, I do think that the recent creation of the SOI Consortium with 19 members means that neither IBM nor AMD is likely to stop using SOI any sooner than 16nm, which is beyond any current roadmap. I suppose it is possible that they see benefits (from Fully Depleted SOI, perhaps) that are not general knowledge yet.

There is at least some suggestion in Yawei Jin's doctoral dissertation that SOI could have continuing benefits. The dissertation is rather technical, but the important point is that planar SOI begins having problems at smaller scales.

"we found that even after optimization, the saturation drive current planar fully depleted SOI still can’t meet 2016 ITRS requirement. It is only 2/3 of ITRS requirement. The total gate capacitance is also more than twice of ITRS requirement. The intrinsic delay is more than triple of ITRS roadmap requirement. It means that ultra-thin body planar single-gate MOSFET is not a promising candidate for sub-10nm technology."

The results for planar double gates are similar: "we don’t think ultra-thin body single-gate structure or double-gate structure a good choice for sub-10nm logic device."

However, it appears that "non-planar double gate and non-planar triple-gate . . . are very promising to be the candidates of digital devices at small gate length." But, "in spite of the advantages, when the physical gate length scales down to be 9nm, these structures still can’t meet the ITRS requirements."

So, even though AMD and IBM have been working on non-planar, double gate FinFET technology, this does not appear sufficient by itself; apparently it would have to be combined with novel materials such as GaN in order to meet the requirements. It does appear possible, then, for AMD and IBM to continue using SOI down to scales smaller than 22nm. So it isn't clear that Intel has any long-term advantage by avoiding SOI-based design.

However, even if AMD is competitive in the long run, that would not prevent it from being seriously behind today. Certainly, when we see reports that AMD will not get above 2.6GHz in Q4, that sounds anything but competitive. When we combine these limitations with glowing reports from reviewers who proclaim that Intel could do 4.0GHz by the end of 2008, the disparity seems insurmountable. The only problem is that the same source that says the 2.6GHz Phenom will be out in December or January also says the fastest Intel for 2008 is a 3.2GHz quad core.

"Intel struggles to keep its Thermal Design Power (TDP) to 130W and its 3.2GHz QX9770 will be just a bit off that magical number. The planned TDP for QX9770 quad core with 12MB cache and FSB 1600 is 136W, and this is already considered high. According to the current Intel roadmap it doesn’t look like Intel plans to release anything faster than 3.2GHz for the remainder of the year. This means that 3.2 GHZ, FSB 1600 Yorkfield might be the fastest Intel for almost three quarters."

But this is not definitive: "Intel is known for changing its roadmap on a monthly basis, and if AMD gives them something to worry about we are sure that Intel has enough space for a 3.4GHz part."

So, in the end we are still left guessing. AMD may or may not be able to keep up with SOI versus Intel's bulk silicon. Intel may or may not be stuck at 3.2GHz even at 45nm. AMD may or may not be able to hit 2.6GHz in Q4. However, one would imagine that even if AMD can hit 2.6GHz in December, only 2.8GHz would be likely in Q1 versus Intel's 3.2GHz. Nor does this look any better in Q2 if AMD is only reaching 3.0GHz while Intel manages to squeeze out 3.3 or perhaps even 3.4GHz. If AMD truly is the victim of an unmanageable design process then it surely realized this by Q2 2006; but even assuming that AMD rushed to make changes, I wouldn't expect any benefits sooner than 45nm. The fact that AMD was able to push 90nm to 3.2GHz is also inconclusive. That AMD got better speed out of 90nm than Intel got out of 65nm could suggest more skill on AMD's part, or it could suggest that AMD had to concentrate on 90nm because of greater difficulty with transistors at 65nm's smaller scale. AMD was also delayed at 65nm because of FAB36, while Intel needs a fixed process for distributed FAB processing. Too often we end up comparing apples to oranges when we try to compare Intel with AMD. Finally, if Intel is doing so well against AMD on power draw, we have to wonder why Supermicro just announced the World's Densest Blade Server with Quad-Core AMD Opteron Processors.

To be honest, I haven't even been able to determine yet whether the K10 design is actually meeting its design parameters. There is a slim possibility that K10s could show up in the November Top 500 supercomputer list. This would be definitive because HPC code is highly tuned for best performance and there are plenty of K8 results for comparison. Something substantially less than twice as fast per core would indicate a design problem. Time will tell.

218 comments:

Ho Ho said...

No, I based it on other reports. How could I base it on your linked slide? Wrong page?

sharikouisallwaysright said...

Fudzilla is writing that the R700 is 45nm.
There is at least this sector where AMDATI has the lead over Nvidia and Intel.
As the importance of GPUs rises, if AMD can stay alive while the future depression of the world economy lasts, it will benefit greatly from this.

Ho Ho said...

"There ist at least this sector AMDATI has the lead over Nvidia and Intel."

AMD won't use its own fabs to make GPUs; it outsources them elsewhere.

Also, as we remember from R600 vs G80, a smaller process doesn't equal better thermals.

Ho Ho said...

Also, now at 55nm, the HD series vs NV's 65nm 8800GT: both use roughly equal power but the GT still leads in performance.

The_Wolf_Who_Cried_Boy said...

The advantages get less and less as junction areas scale, and it gets difficult to scale the thin silicon region with enough control to leverage the 'I' (insulator) of SOI.

How is process variation a concern with regard to SOI's actual insulation thickness, that being on the order of tens of atoms, while gate insulation thickness can be maintained with seemingly good consistency down to ~5 atoms' thickness? I assume all the material layers making up a chip are deposited and etched in the same manner?

Aguia said...

ho ho,

"How could I base it on your linked slide? Wrong page?"

The link is fine.
It's not the last slide; like I said, it's the third slide.

Ho Ho said...

You mean the "max power delivery" thingie? Well, going by Ohm's law, 110A is good enough for a 140W CPU. Of course you can bet that it will be able to deliver considerably more. It is not like P4 motherboards were built with those highly OC'd 90nm P4Ds or >7GHz single-cores in mind; they still work well enough, though.

Also take note of the AM2 slide: it says only 95A for CPU and NB combined, and 120W CPUs work just fine there. Later sockets list 110A for the CPU alone.
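A quick sketch of that arithmetic, for reference (strictly this is P = V × I rather than Ohm's law, and the ~1.3V core voltage is an assumed typical value, not something from the slide):

# Rough check of how 110A maps to CPU power (assumed ~1.3V core voltage;
# actual VID varies by part, so treat this as ballpark only).
current_limit_a = 110.0   # CPU current from the socket spec
core_voltage_v = 1.3      # assumed typical Vcore
print(f"~{current_limit_a * core_voltage_v:.0f} W available")   # ~143 W, above a 140W TDP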

Pop Catalin Sever said...

140W would be too much even if the performance were double what it is now.

140W creates serious noise and cooling problems. What PC cases are sufficiently ventilated to prevent the whole system from overheating? Only some enthusiast cases are in that category. And for what? Performance close to a Q6600?

AMD managed to screw themselves good this time by not correctly evaluating and choosing a more advanced manufacturing process for 65nm and quad-cores. I think they knew very well what their process was capable of long before its introduction, but they still decided to go with it (a bad management decision once more).

Aguia said...

"Also take note of the AM2 slide: it says only 95A for CPU and NB combined, and 120W CPUs work just fine there. Later sockets list 110A for the CPU alone."

There was a "nice" 25% increase in the power, so we might even see higher than 140W.

Does Intel have anything higher than the 136W on the desktop (the QX9770)?
On servers, the Intel 3.2GHz Xeon is 150W.

Ho Ho said...

aguia
"Does Intel has anything higher than the 136W on the desktop (the QX9770)?"

QX6800 was at 150W for a while when it was launched in April. It got replaced by QX6850 at 130W soon after.


chuckula
"If K10 truly be pared down to a dual core as easily as AMD claims, why didn't they instead push to get dual cores out first where there would be a much better chance of getting a profit."

My guess is that K10 dual-cores are massively more expensive to make than K8 dual-cores, and they aren't much faster. The main reason is that they have an L3 cache they do not really need all that much. That makes their die size rather big compared to the K8's, I'd say nearly twice as big. Combine that with broken process technology and you have a huge money eater.


""Blow away Clovertown in every dimension" What a crock."

Perhaps he was talking about alternative realities. In those dimensions K10 works well and is probably faster than in the one we live in :)

Aguia said...

VR Zone

Has anyone seen more detailed reviews with image quality comparisons?

It doesn't have to be recent; anything up to a year old is OK.

Thanks.

Hornet331 said...

Ho Ho said...
aguia
"Does Intel has anything higher than the 136W on the desktop (the QX9770)?"

QX6800 was at 150W for a while when it was launched in April. It got replaced by QX6850 at 130W soon after.



Hmm, I think you're wrong; the QX6800 was always 130W, both B3 and G0:

http://processorfinder.intel.com/details.aspx?sSpec=SL9UK

http://processorfinder.intel.com/details.aspx?sSpec=SLACP

Chuckula said...

I think there may have been a 150W TDP part, but it was a Xeon part as opposed to a desktop part.

Unknown said...

"I think there may have been a 150W TDP part, but it was a Xeon part as opposed to a desktop part."

That's right. The original Clovertown 3GHz (Apple got most of these) had a 150W TDP. With the G0 stepping that was reduced to 120W and made available to everyone who wanted it.

Ho Ho said...

giant
"That's right. The original Clovertown 3Ghz (Apple got most of these) had a 150W TDP"

Hm, that was probably the one I meant when I talked about a 150W part.

Aguia said...

Does anyone have a theory on how a card that normally does 4000 to 5000 in 3DMark06 can score almost 10000 just because of a mobo change?

sysopt


In case you guys are wondering, the 1950Pro normally can't do more than 5000 with any board/processor.

ocworkbench


Or is the guy fooling us and running in CrossFire?

trustedreviews

Scientia from AMDZone said...

It should be pretty obvious why moderation is turned on. There haven't been any worthwhile comments lately, so none have been posted.

I'm currently writing a new article. I had written some of it before but didn't have enough for a full article. With recent developments, though, there seems to be enough to talk about. I'll try to have it up this evening.

AMDZone still seems to be limping along. I've had trouble logging in there and still get errors on pages with new posts.

Scientia from AMDZone said...

lex

"Or maybe sucker some Arab to drop a few billion to keep you afloat."

This is a good example of FUD. Lex's statement shows neither context nor any actual understanding of the transaction. To Lex, it must have been just some rich Arab who was suckered into giving AMD money. The reality is quite different.

BusinessWeek

An ADIA-related institution, al Mubadala, recently bought 7.5% of Carlyle Group, the big private equity outfit, for $1.35 billion and took an 8.1% share of chipmaker Advanced Micro Devices (AMD) for $600 million.

Or we could look at CNN Money

Citigroup's newfound $7.5 billion cash infusion from Abu Dhabi's state investment fund may not cure all that ails the embattled bank, but it heralds the growing influence of sovereign wealth funds.

Located both in the oil-rich Middle East, as well as other nations such as Russia and Singapore, the funds' combined assets under management are expected in the next three years to quadruple in size to $7.9 trillion from $1.9 trillion, according to Merrill Lynch.

While government debt like U.S. Treasuries have long been their investment vehicle of choice, the funds' appetites have grown more complex as they have searched for greater returns, said Jay Bryson, global economist at Wachovia Corp.

And earlier this month, Mubadala Development Co., a separate investment arm of the government of Abu Dhabi, acquired a $622 million stake in the chipmaker AMD


So, these were not crazy investments by some rich Arab; these were carefully considered investments by global, very savvy investment agencies. The investment in AMD was also in line with the stock purchase ratios of their investments in larger companies.
