Scientia's Blog: July 2007

Friday, July 27, 2007

AMD Technology Analysts Day 2007 – quite a surprise.

Since Intel began talking about Penryn we've had a lot of rumors and speculation about K10. With live demonstrations from Intel and no information from AMD, I suppose it is only reasonable to expect rumors to pop up like so many mushrooms. Now that AMD has finally spoken though we can assign most of these to the trashcan. This Technology Analysts Day was different from the one last year. This one was less about initiatives like Torrenza, Trinity, and Raiden and more about actual hardware. It appears we finally have the answer to whether AMD is still going to be around in 2008 or 2009. The answer is clearly, yes.

There are a lot of rumors to take care of but I'll make a couple of points first. For example, lately I've seen people pushing the idea that AMD stuffed the channel in Q4 06 and this then led to the sharp drops in Q1 07. I can't say that I entirely blame these people for such speculation because I've been puzzled about this myself. Every time I tried to understand how AMD got from chip shortages in Q1 to full conversion of FAB 36 to 65nm before the end of Q2, it made no sense. The only explanation was that AMD had pulled off a near miracle and gotten 65nm ramped at FAB 36 in record time. Considering that this would have been faster than Intel converted to 65nm, it was a lot simpler to just assume that AMD was lying. I was a bit surprised therefore to have such speculation actually confirmed by AMD. They stated that they did indeed convert FAB 36 to 65nm in record time and they seemed to give plenty of credit for this to APM.

Another unexpected item was that AMD has pushed Direct Connect 2.0 and Fusion back to 2009. There won't be any processors in 2008 with four HT links or GPU's. And, with only three HT links AMD may still be limited to a best range of 4-way without using glue chips. On the other hand, I had only expected a minor core update in 2008 with DC 2.0. Instead, DC 2.0 will arrive on the brand new Bulldozer core. There isn't much known about it except it has longer pipelines and more powerful SSE performance with additional instructions and better IOMMU for virtualization. From this it appears that AMD is serious about holding onto the HPC top end. Now to the rumors:

Rumor: AMD is selling its FABs and outsourcing CPU production to TSMC.

This piece of FUD started with Covello's note with its wild speculation as a justification for buying more Intel stock. Yet, a number of people on the web have repeated and expanded on these rumors so much that AMD even felt compelled to address it directly:

AMD also explained exactly what "Asset Light" included:

I'm certain that these rumors will now try to morph into the argument that although AMD is not outsourcing CPU's today it probably will in the future. However, AMD was quite clear that FAB 30 would indeed be upgraded to FAB 38 and continue past 32nm:

They were also clear that the only CPU's that will be outsourced are embedded versions of Bobcat which can provide an x86 replacement for embedded MIPS in Xilleon and Imageon.

Rumor: Silverthorne is Intel's silver bullet.

Silverthorne is matched by AMD's Bobcat core in 2009 which draws as little as 1 watt. This now finally explains why AMD unloaded the Geode line so quickly.

Secondly, although Silverthorne should be a good competitor, AMD has twice the Intellectual Property plus relationships already in place with ATI customers for similar products. So Intel will have to work hard to catch up. Of course, Intel's existing relationship with Apple could give it the iPhone market. However iPhone is not likely to become a commodity item according to EETimes.

Rumor: The ATI merger has been a disaster with ATI employees leaving before they get laid off and ATI losing share since the merger.

This rumor was specifically addressed by AMD. It was stated that the attrition rate at ATI has always been low and that it has not increased since the merger. It was further stated that additional engineering staff had to be hired to handle AMD's new chipset and GPU related projects. Finally, the 790 55nm chipset has been pushed forward to Q4 so apparently things are on track.

Rumor: R600 proves that ATI is no longer competitive.

When R600 was reviewed it was on an 80nm process with leaky 80hs transistors. According to AMD, it is now being produced on a 65nm process and this version should be out in Q4. Secondly, few people seem to have noticed the success of the 690G chipset which is being used on 35 different motherboards.

Rumor: AMD's 65nm process is broken and that is why Barcelona can't clock.

At Technology Analysts Day 2007, AMD showed a 3.0Ghz quad core Phenom demo system. The door was open showing that the processor was running with stock air cooling.

Rumor: The SuperPi and Cinebench tests proved that K10 is really no faster than K8.

In the VMware benchmark, after normalizing for twice as many cores and a slower clock, we see a 34% increase in IPC. This would seem a bit strange because if we combined this with AMD's 3.0Ghz demo we would end up with a system that is not only faster than anything Intel has today but is just as fast as a 3.33Ghz Penryn.

I had to remove the SPECfp graph because AMD has changed this in their official pdf. In the presentation it said that it was an actual test but in the pdf it now says that it is simulated. I had wondered about that when I saw it because it looked similar to the simulated benchmark. However, I just saw an actual comparison between a Barcelona 2.0Ghz system and a Xeon E5345 2.33Ghz system. Barcelona gets 78 on SPECfp while Xeon gets 60. This is 51% greater SSE IPC for K10 than Clovertown.
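The 51% figure can be checked with a few lines of arithmetic. This is only a sketch: it treats benchmark score divided by clock speed as a rough stand-in for IPC, and the function name is made up for illustration.

```python
def per_clock_advantage(score_a, ghz_a, score_b, ghz_b):
    """Return how much higher A's score-per-GHz is than B's, as a fraction."""
    return (score_a / ghz_a) / (score_b / ghz_b) - 1.0

# Figures quoted above: Barcelona at 2.0Ghz scores 78, Xeon E5345 at 2.33Ghz scores 60.
advantage = per_clock_advantage(78, 2.0, 60, 2.33)
print(f"{advantage:.0%}")  # prints 51%
```

The same function works for any pair of scores and clocks, but the result only means something when both systems ran the same benchmark.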

Rumor: Even if K10 is as fast as Penryn it won't matter because Nehalem's greater speed and massive connectivity will easily beat K10.

Maybe not. K10 works best with 4-way but Shanghai will have 8 cores making for a simple 4-way/32 core system. It also remains to be seen if HT 3.0's greater bandwidth allows reasonable 8-way systems. This would potentially be 64 cores on one system without using any glue chips. Intel loses its MCM advantage since Shanghai also uses MCM. Shanghai only needs half as much cache as Penryn so Intel gains no die size advantage (unless Nehalem has much less cache). Shanghai will also gain in IPC much as Penryn does. There is also some indication that Intel will cancel Tigerton and simply fill in with Tulsa until Nehalem is ready. This seems odd because Caneland will effectively be obsolete (along with all Woodcrest, Clovertown, and Penryn based systems) as soon as Nehalem is released. And, the later Nehalem appears in 2008 the closer it is to Bulldozer.

Rumor: Intel has two teams working on C2D. AMD, with fewer engineers will never be able to keep up with Intel's Tick Tock.

AMD has now announced Pipe which is an identical upgrade cycle to Intel's Tick Tock. It includes the same major core upgrade every two years with process upgrades in the alternating years.

Rumor: AMD doesn't even mention DTX anymore so it must be a failure.

AMD is already counting DTX as a success for 2007:

So, there must be considerable support for DTX. I have to wonder too if DTX (especially mini-DTX) is what AMD has in mind for Fusion. Fusion will allow OEM's to build systems without a northbridge, which would seem to be a good match for a small form factor.

So, what does all this mean? Apparently AMD's success with the yields at 65nm has allowed them to push back the upgrade schedule for FAB 30. This certainly eases cost pressure on AMD while still allowing them to gain income from 200mm tooling sales. AMD's chipset and CPU lineups also seem competitive. Today is truly nothing like 2003 when AMD introduced Opteron with only its own supporting chipset. K10 is a drop-in replacement for K8 on socket AM2 or socket F and has nearly universal support among OEMs. This should allow K10 to gain traction far more rapidly than the slow pace of 2003 when it took months to even have one desktop chipset announced. It took a full year from its introduction for K8 to surpass K7 but today K10 should surpass K8 in about half that time.

AMD's financial problems are not over, of course, but it should be able to steadily improve its losses over the next three quarters. Although many assume that Intel's price cuts have made it impossible for AMD to make any profit with chip sales, this is not really the case. Intel's prices for C2D have remained relatively high so pricing pressure was more effective when AMD still had the bulk of its production on 90nm. For example, Intel has shown great reluctance to move Conroe prices down into the Celeron range. So, today, Intel is trying to cover the Celeron range with the single core Conroe L 420, 430, and 440 models which are not much of a bargain when matched up with AMD's similarly priced dual core X2 3600, 3800, and 4000 models.

The lowest priced real Conroe is the 1.6Ghz dual core Allendale E2140 for $81. This model is easily surpassed in everything except SSE performance by the 2.2Ghz X2 4200 at $80. Prices for AMD processors remain better than Conroe up to the 2.66Ghz E6700 where its higher speed surpasses AMD's fastest model. Essentially, AMD's X2's are all favorably priced up to and including the 3.0Ghz X2 6000 model at only $170. At current prices, the single core Conroe L models are not competitive and the dual core models are not as favorably priced as AMD's X2's. Conroes are a bargain though if your application needs the greater SSE performance of C2D. If you need greater speed then even the fastest 3.0Ghz dual core E6850 at $330 and the lowest priced quad core 2.4Ghz Q6600 at $320 are reasonably priced. Obviously, the prices of the faster quad cores will drop when they face quad core competition from AMD.

I'm certain the 3.0Ghz quad demo left many people wondering when such chips would actually be available. Anandtech's take seems particularly negative, suggesting as late as Q2 08. But then Anand hasn't exactly been objective about AMD in the past five years so perhaps we should consider that the upper bound. Realistically, the 3.0Ghz chip could have been cherry picked. And, it generally takes about six months for production to catch up to a cherry picked chip. So, I can't imagine that 3.0Ghz would arrive later than Q1 08. Before that happens though we'll have to find out just how well Barcelona really stacks up to C2D. If the VMware ratio is genuine then Barcelona would launch with a 2.0Ghz speed equal to a 2.4Ghz Clovertown or a 2.8Ghz Opteron. This would mean a 2.0Ghz dual core Phenom would probably match a slower 2.3Ghz Conroe. These IPC ratios are very important because it will be easier for AMD to produce lower clocked K10's, and if AMD can match C2D at a lower clock then this is good for both AMD's volume and pricing. This will be the central factor in AMD's financial recovery. If K10 needs to match C2D 1:1 in clock to be competitive then AMD will have a very tough time over the next 3 quarters. However, the higher K10's IPC and therefore the less clock speed that K10 needs to match C2D the faster AMD will recover. And, this ratio should be known for certain perhaps in less than one month.

Thursday, July 19, 2007

AMD's Q2 07 Earnings

Looking at the earnings, I'm reminded of the movie "Regarding Henry" where Henry gets shot in the head and is in a coma. The doctor then tells Henry's wife that if you have to get shot in the head that the place where Henry got hit is the best place. Similarly with AMD, I suppose that if they had to lose $600 Million in one quarter then this was the best way to do it. Although the amount of the loss this quarter was the same as last quarter, in other important ways, this quarter was a bit better.

Since there is not much good to say about losing over half a Billion dollars for the third quarter in a row, let's leave that for the moment and look at microprocessor revenue share. AMD's microprocessor revenue share really tanked in Q3 2002 when it hit a low of 5.5%. However, this number is not really comparable to today since at that time AMD had only one FAB capable of producing microprocessors, so AMD didn't need as much of the market. With two FABs today, AMD's costs are higher and therefore AMD needs more share. To compare today's numbers we need to look at the period since AMD started putting serious resources into a second leading edge FAB. Ground breaking on FAB 36 started November, 2003 so we will only compare the current quarter with AMD's history back to Q1 2004.

AMD's microprocessor revenue share climbed from around 9% in early 2004 to about 15% in Q3 2005. AMD's revenue share stayed above this until it fell drastically in Q1 2007 when it hit 13.3%. However, this quarter it bounced up to 15.8% which is just a little higher than AMD had in Q4 2005. AMD's processor unit volume has probably also recovered to about what it was before Q1 which would be about 25%. In trying to evaluate these numbers most people stumble over the concept of Average Selling Price or ASP. While it might seem immediately intuitive that higher income is related to higher ASP this is not actually the case. The number we actually need is Average Selling Profit. For example, it would be better to sell a processor for $50 that costs us $20 to manufacture than to sell a processor for $100 that costs us $80 to make. Unfortunately, average profit for processors is never mentioned. Logically, the number of units sold x average profit = total profit. It means very little to us if AMD's average price increases because this merely increases the total revenue but may not in fact increase profits. However, the numbers show that the cost of producing microprocessors at AMD has dropped by 8.5%. This boosts the profit by 18%. This is seen in the increase of gross margin percent from 28.1% last quarter to 33.5% this quarter. So, AMD's actual average asking price may be lower this quarter than last but it doesn't matter because on average AMD is making more money off of each processor than it did last quarter. Increased unit volume and better profit per unit are both positive changes for AMD this quarter.
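To make the point about average profit concrete, here is a small sketch of the $50 versus $100 processor example. The unit count of 1,000 is purely hypothetical, chosen only so the two cases can be compared at the same volume.

```python
def total_profit(units, price, unit_cost):
    """Units sold x average profit per unit = total profit."""
    return units * (price - unit_cost)

UNITS = 1000  # hypothetical volume, identical in both cases

cheap_chip = total_profit(UNITS, price=50, unit_cost=20)    # $30 profit each
pricey_chip = total_profit(UNITS, price=100, unit_cost=80)  # $20 profit each

print(cheap_chip, pricey_chip)  # prints 30000 20000
```

The cheaper processor has half the selling price yet brings in more profit, which is why ASP alone says so little.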

I had estimated previously that AMD could lose another $1.5 Billion this year without danger of bankruptcy. Having just lost $600 Million this margin is now down to $900 Million. AMD was also in danger of having its R&D and capital purchases budget choked by its reduced income. For example, AMD had originally planned to spend $2 Billion in 2007 for capital purchases but then had to reduce this estimate to $1.5 Billion. Recently though, the German government pledged $360 Million in aid and AMD's capital purchases estimate climbed again to $1.8 Billion. Also, AMD believes that it can break even in Q4. If this is realistic and Q3 splits the difference losing $300 Million then AMD would pull out of its nosedive with $600 Million to spare. However, trying to estimate Q3 revenue is not easy. AMD's recent price cuts will affect Q3. However, as AMD now takes down FAB 30 they will get increasing benefits of a greater 65nm to 90nm mix. AMD could also see income of $100 - $200 Million in Q3 from sales of 200mm tooling from FAB 30. And, AMD gets at least some benefit from K10 Opteron sales in Q3. To me, this seems like enough to cut the losses by $100 Million or so but not $300 Million unless AMD sees increases in unit volume. This could happen perhaps if DTX is more popular than current solutions. Q4 looks much better. The volume of K10 server chips should be reasonable in Q4 and AMD should begin delivering HT 3.0 capable Budapest versions. Assuming that AMD can bump the clocks up from a paltry 2.0Ghz to a more reasonable 2.3 - 2.5Ghz AMD should gain a little on Intel. On Intel's part, the decision to push back 45nm production at the Chandler FAB to Q1 2008 means that only D1D will be producing 45nm chips in Q4 2007. This volume wouldn't go very far on the desktop but would have some effect on servers. Assuming Intel's 45nm server chips are available in 3.2Ghz speeds, AMD will need at least 2.4Ghz quad cores.
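The cushion arithmetic in this paragraph can be laid out as a short sketch. All amounts are in millions of dollars and come from the text; the function name is made up, and the "cautious" case assumes Q4 still breaks even, which is an assumption on top of the numbers given.

```python
LOSS_BUDGET = 1500  # earlier estimate of what AMD could lose this year
Q2_LOSS = 600       # loss just reported
Q4_LOSS = 0         # AMD's stated break-even goal for Q4

def remaining_cushion(budget, losses):
    """Subtract each quarterly loss from the budget and return what is left."""
    return budget - sum(losses)

after_q2 = remaining_cushion(LOSS_BUDGET, [Q2_LOSS])
q3_split = (Q2_LOSS + Q4_LOSS) // 2  # Q3 "splits the difference"
after_q4 = remaining_cushion(LOSS_BUDGET, [Q2_LOSS, q3_split, Q4_LOSS])

# Cautious case: Q3 losses shrink by only about $100 Million.
cautious = remaining_cushion(LOSS_BUDGET, [Q2_LOSS, Q2_LOSS - 100, Q4_LOSS])

print(after_q2, q3_split, after_q4, cautious)  # prints 900 300 600 400
```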

To some extent this is like walking up a down escalator. AMD will certainly increase clock speeds in Q4 but it appears that Intel will as well. Essentially, AMD will use its K10 chips to maximum benefit as Opterons because current Opterons are still running on the old 90nm process. So, Opteron gets higher IPC, greatly enhanced power saving features, and power savings due to 65nm. AMD's greatest advantage will be in 2-way or 4-way where Intel's use of FBDIMM makes Intel systems consume more power and run slower. However, Intel too gets power savings from 45nm and its larger cache will help the most on 2-way and up quad core systems. So, AMD's advances will get mostly countered by similar advances from Intel. AMD will come out only slightly ahead in performance/watt on 2-way and up systems due to faster HT, enhanced power saving, and inherent advantages of native versus MCM quad. On the other hand, Intel retains highest overall performance due to its much greater clock speed. In Q1 2008 the race continues as Intel ramps 45nm desktop chips while AMD ramps K10 desktop chips. Then in Q2 Intel ramps 45nm mobile while AMD delivers its most aggressive mobile offering with a new mobile cpu and all new mobile chipset. The situation continues in the second half of 2008 as Intel gears up for Nehalem with Integrated Memory Controller and CSI (similar to AMD's HyperTransport) while AMD gears up for its own 45nm chips with Direct Connect Architecture 2.0 (including a possible MCM octal core). Production in 2008 should be similar as Intel ramps two new 45nm FABs while AMD converts 200mm FAB 30 to 300mm FAB 38 and finally enjoys the benefits of having two similar 300mm FABs.

I suppose I'm somewhat reminded of the mythical hydra when thinking of Intel's current FSB based chipset solutions. With each head that was cut off, the hydra would grow two more. In a similar fashion Intel has attempted to overcome the problems of a shared bus by having first two FSB's and then four FSB's later this year. I've seen some people naively describe this as two of Intel's current 5000 series northbridge chips. However, quad FSB chipset complexity is not twice that of a dual FSB chipset, but actually four times greater if the same speed is maintained or half the speed with similar complexity. If we simply linked two northbridges together we would incur a large latency penalty between the northbridges every time we transferred from memory across chips. So, we put all the functions in one chip. This sounds great but look at the numbers. For two FSB's and four FBDIMM channels we have 2x4 = 8 connections. To maintain the same bandwidth for four FSB's we would need eight FBDIMM channels for 4x8 = 32 connections. Four times the connections means four times the circuitry and four times the power draw. This is a reasonable interim solution for Intel but we can also understand why Intel will drop this architecture with Nehalem.
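The connection counts can be sketched in a few lines. This simply counts one path for every FSB-to-memory-channel pairing, as the 2x4 and 4x8 products above do; it is a simplified model, not a description of Intel's actual chipset layout.

```python
def crossbar_connections(fsb_count, memory_channels):
    """Every FSB needs a path to every memory channel."""
    return fsb_count * memory_channels

dual_fsb = crossbar_connections(2, 4)  # dual FSB chipset
quad_fsb = crossbar_connections(4, 8)  # quad FSB with the same bandwidth per FSB

print(dual_fsb, quad_fsb, quad_fsb // dual_fsb)  # prints 8 32 4
```

Doubling both the buses and the channels quadruples the pairings, which is where the four times figure comes from.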

Some have tried to compare the release of K10 to K8 in 2003. I assume this analogy comes to mind because of AMD's less competitive position with K7 and because of both K8's delays and low initial clock speeds. However, they are actually quite different. The K8 die was twice the size of the K7 die and yields on the new 130nm SOI process were terrible at about half of K7's yields. Also, the only available chipset at K8's release was AMD's own 8000 series. It took months for desktop motherboards to begin showing up. In contrast, K10's 65nm yields are good and it already has full support with existing AM2 and socket F motherboards. Another similarity is that in 2003, Xeon's FSB was woefully underpowered on 4-way systems which is why Xeon relied heavily on L3 cache. Clearly, Intel is still relying on large cache as shown by Yorkfield's 50% increase but Intel today is much more competitive in FSB speed. In 2003, 4-way Xeon used DDR-266 compared to Opteron's DDR-400. Today, AMD's memory controller is equivalent to a FSB speed of 1600 Mhz. The same ratio as 2003 would put Intel at 1066 Mhz so we can see that Intel is already ahead of this at 1333 Mhz. Further, since AMD already uses the fastest DDR2-800 memory, greater controller speeds won't help unless DDR2-1066 is released. In fact, if DDR2-1066 is not released, AMD's lead in memory speed is likely to vanish as AMD won't switch to DDR3 until late 2008. This is essentially what happened with DDR where Intel influenced JEDEC to stop at DDR-400 and switch to DDR2 while AMD could have used DDR-500. It is possible that after the lackluster acceptance of Intel's FBDIMM that JEDEC might be more open. However, it remains to be seen if JEDEC will continue to follow Intel or whether they will be more supportive of AMD.
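The "same ratio as 2003" figure is simple proportion, shown here as a sketch. The function name is made up, and 1066 Mhz is just the standard FSB speed nearest to the computed value.

```python
def same_ratio_fsb(amd_equivalent_mhz, xeon_ddr_2003, opteron_ddr_2003):
    """Scale AMD's current equivalent speed by the 2003 Xeon:Opteron memory ratio."""
    return amd_equivalent_mhz * xeon_ddr_2003 / opteron_ddr_2003

expected_intel = same_ratio_fsb(1600, 266, 400)
print(expected_intel)         # prints 1064.0, roughly a 1066 Mhz FSB
print(1333 > expected_intel)  # prints True: Intel's 1333 Mhz FSB is ahead of that ratio
```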

The final thing that I've been curious about is the change in processor families. When AMD had K7, all of their market segments were covered with this one architecture. So, AMD had the dual socket MP version of K7, the mobile version of K7, and the low end Duron version. AMD continued this same approach with K8 with Opteron, Turion, and Sempron. Intel in contrast has often had very different architectures for different segments. For example, Intel continued to sell Pentium Pro for servers even after PII was launched. Further, Intel's Celeron version with on-die cache instead of off-die cache was also quite different in architecture. Intel continued this with P4 Xeon by using L3 cache and PAE addressing which was not used in the desktop version. We see the same thing today where Tulsa with its hybrid FSB design is also substantially modified from the desktop version. Intel changed this even further with Pentium M whose architecture was very different from P4's. Today though Intel has reversed this situation by using one architecture for everything. Intel has Conroe, Woodcrest, and Merom versions of C2D much as AMD had with K8. Apparently Intel will continue this with 45nm by having Wolfdale, Yorkfield, and Penryn versions. AMD however is going the other way and will release a new mobile architecture called Puma in 2008 based not on K10 but a development of K8. The main difference that I can see is that Puma is not designed around massive throughput SSE as the desktop and server chips are. This may be a reasonable architecture split since I suspect few people do video or audio bulk conversion or other SSE intensive tasks on their notebooks. If this really is an advantage then it is certainly possible that Intel will go back to a split architecture when it introduces Nehalem. In fact there have already been suggestions that only the multi-socket versions will use an IMC while Intel will continue to use a FSB with the single socket chips. If true this would indeed put Intel back into a split architecture strategy.

Sunday, July 15, 2007

Intel's 3.0Ghz Barrier.

More than a year ago everyone was talking about Core 2 Duo's from Intel that would clock to 3.2 and even 3.33Ghz in 2006. However, these speeds have never been released. And, rather than asking why, not a single review site has mentioned it. The common pretense today seems to be that Intel never claimed that it would top 3.0Ghz in 2006 which, of course, it did.

It isn't hard at all to find evidence of Intel's intention to have Conroe's clocked as high as 3.2 or 3.33Ghz in 2006.

February 12, 2006, Conroe Extreme Edition

In the Q4 2006 the maker will also add the model E6800 that works at 2.93GHz . . . The Extreme Edition of the Conroe processor will operate at 3.33GHz

May 31, 2006, Intel Confirms Two Upcoming Core 2 Extreme CPUs

Intel representatives just contacted DailyTech with the following information:

The Core 2 Extreme processor (Conroe based) will ship at 2.93GHz at Core 2 Duo launch. We will also have a 3.2GHz version by end of the year.

At Bad Hardware, March 31, 2006, Conroe Roadmap And Prices we see:

Core Duo E8000 4MB 3.33GHz 1333MHz Q4 $1199
Core Duo E6900 4MB 3.20GHz 1066MHz FSB Q4 $969
Core Duo EE edition 3.33GHz(L2 4M) 1333MHz Q3 $999

Mike's Hardware originally had this listed as well but it was later changed. However, we can still find Mike's original roadmap from forum comments made about it at the time. For example, at VR-Zone Forum on October 4, 2006, we find a reference to Mike's:

Intel Conroe E6900 (3.2GHz) is expected to be released in Q4.

Clearly, everyone (including Intel) expected Core 2 Duo's faster than 3.0Ghz to be released in 2006 yet these were never released. The common excuse given by Intel proponents is that Intel didn't release faster chips as originally planned because, “It didn't have to.” However, there is good evidence that the reason had more to do with temperature and overheating than a warm and fuzzy feeling of being safely in the lead.

It is very rare to get proper numbers for thermal testing of Intel cpu's. Typically, overclocking is done with premium cooling and testing with the stock HSF never thermally stresses the CPU. For example, when Anandtech originally reviewed X6800, they used a massive Tuniq Tower. This is, of course, nothing at all like what is shipped in the vast majority of computer systems. A term that often gets tossed around is "on air". However the truth is that high end air coolers today are as good as liquid coolers used to be four or five years ago. Thus the term "on air" now means very little. Cooling your CPU "on air" with something the size of a transmission cooler is not much of an accomplishment. But, if you could overclock "on air" with the stock HSF, that would be an accomplishment indeed. Unfortunately, review sites seem to want to do thermal testing with a stock HSF about as much as they would like to feed a tank full of hungry piranhas by hand.

It appears that we only got some halfway informative numbers from Anandtech by accident when they were reviewing a cooler instead of the processor itself. I've talked about this information in an earlier article but I'm going to revisit it along with other information since people still seem confused about Core 2 Duo's thermal limitations.

In Anandtech's Case cooling in the second chart: CPU Temperature Under Load, we have some data. Notice that at the stock clock speed of 2.93Ghz with the stock HSF, X6800 is reading 56 C. Now, the article says, "The stress test simulates running a demanding contemporary game. The Far Cry River demo is looped for 30 minutes and the CPU temperature is captured at 4 second intervals". Unfortunately, Anandtech's assumption is a bit off since Far Cry does not really thermally stress the core. According to the Core 2 Duo Temperature Guide:

Intel provides a test program, Thermal Analysis Tool (TAT), to simulate 100% Load. Some users may not be aware that Prime95, Orthos, Everest and assorted others, may simulate loads which are intermittent, or less than TAT. These are ideal for stress testing CPU, memory and system stability over time, but aren't designed for testing the limits of CPU cooling efficiency.

Orthos Priority 9 Small FFT’s simulates 88% of TAT ~ 5c lower.

So, if FFT is 5C lower then Far Cry would be less than that. The Guide also clearly states Tcase Load should not exceed ~ 60c with TAT, and 55c with Orthos.

But, at 56 C we have already exceeded 55 C. And, that would be if we were running FFT. Since we are only running Far Cry the real temperature is probably closer to 60 C at full load. Now, someone will probably claim that that is okay because you would never hit full thermal load under normal circumstances. Unfortunately, that is already figured in. As stated in the guide: 50c is safe. 50c Tcase is a safe and sustainable temperature. 55 or 60 C are not safe or sustainable temperatures. Even though the maximum spec is 60 C, 60c is hot.

I have seen others claim that 60 C was fine because it didn't exceed the maximum rating. However, this is not the way the rating works. 60 C is the maximum for TAT only because no other program will ever reach this stress level. So, it is clear that X6800 with stock HSF will indeed exceed the factory cooling specs at stock speed. This is why Intel has not released a 3.2Ghz quad core.

But, if Core 2 Duo is running hot at 2.93Ghz this still leaves the question of whether anyone would notice. As I've already mentioned, these types of tests seem to be avoided like the plague by regular review sites. So, we need something obscure. However, as luck would have it, we do have something out of the ordinary at Digit Life where X6800 (2.93 Ghz) dual core is reviewed. These conditions are very unusual because he tests in his un-air conditioned apartment. In Moscow apartments, unconditioned for several days, the standard daytime temperature was within +25—+30°C. In this case the environment temperature was +28°C. 28 C is 82 F. This would not be unusual for Indiana in the Summer either. But, he uses the stock HSF. And, let's see what happens:

it didn't even occur to us that new Intel Core 2 processors could spring such a surprise with throttling...) Yes! It was throttling!

The chart is excellent because it compares the stock HSF to a better cooling solution. We can see that none of the common benchmarks really thermally stress the cpu. The one that stressed it the most was the SolidWorks CAD & CAE benchmark. The chart only shows an 11% drop because it is a total score. However, the results of the overheated processor are very low in two applications out of three: SolidWorks 2005 and Pro/ENGINEER Wildfire 2.0. This is proof positive that even with a regular application that dual core C2D is busting the thermal limits when running at 82 F ambient. The author says that there was no thermal throttling at 72 F ambient.

Now, let's look at that chart again. Notice that with the 3D Shooter Games (F.E.A.R, Half Life 2, Quake 4, Unreal Tournament 2004) the drop is only 3% versus the 11% we saw with CAD. This again casts doubt that Anandtech was doing anything thermally stressful with Far Cry.

So, Intel did not release a 3.2Ghz processor in 2006 because they couldn't. Such a processor would have exceeded the factory's own limits when running routine applications at ambient temperatures common in the Summer. However, we also know that Intel has steadily improved the thermal properties of its C2D chips with each revision. It is possible that one of the newer revisions of C2D would be capable of going over 3.0Ghz without busting the factory's thermal limits. After all, one would assume that simply lowering the TDP would help and we know that TDP has come down. However, I'm not entirely sure that 45nm is actually an improvement. For example, if a given 45nm chip has the same TDP as a given 65nm chip then logically the 45nm chip would concentrate the heat into about half the die area. This would seem to be more likely to create hot spots rather than less.
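The "about half the die area" estimate follows from an idealized shrink, sketched below. It assumes die area scales with the square of the process feature size, which real chips only approximate since not everything on the die shrinks equally, and the function name is made up for illustration.

```python
def ideal_area_ratio(new_nm, old_nm):
    """Die area of the shrunk chip relative to the original, for an ideal linear shrink."""
    return (new_nm / old_nm) ** 2

area = ideal_area_ratio(45, 65)
# With the same TDP spread over less area, heat per unit area rises by the inverse.
density_increase = 1 / area

print(f"{area:.2f} {density_increase:.2f}")  # prints 0.48 2.09
```

Under that assumption, the same TDP at 45nm means roughly twice the heat per unit of die area, which is the hot spot concern.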

I'm quite certain that Intel will indeed get chips out that exceed 3.0Ghz. And, I'm pretty certain that if Intel doesn't do it in Q4 of this year that Q1 08 should be reasonable. It might happen with 45nm as many expect but I can't see any reason why it couldn't happen with 65nm in another revision or two if 45nm wasn't up to it at first. In other words, Intel would move to 45nm anyway to reduce die size even if it didn't make any major strides in power draw at first. However, the talk has been that 45nm at Intel is much better with power draw than 65nm. I guess we need to take that with a grain of salt since it was claimed in January 2006 that Intel had Conroe samples hitting 3.33Ghz. It would be nice to have some up to date comparison information with newer C2D's to see if the thermal limits have been improving over time (which they likely have). But, since the information we do have about stock cooling is nearly accidental that seems very unlikely. We may just have to wait until Q4 and see if any 3.2 or 3.33Ghz 45nm chips appear.

Thursday, July 05, 2007

On The Quality Of Things I've Said

Another blogger has started making a list of incorrect, unclear or just plain goofy things I've said. The only problem I can see is that I doubt he has allowed enough space.

I suppose it is a compliment to think that anyone would care enough about things I've said to bother compiling a list of mistakes. I recall when Dr. Asimov said that he had made so many logical errors in his Foundation series that he had to stop adding to it. These were more than just spelling or grammatical errors but errors in timeline, characters, plot, and events. Apparently, he hadn't been able to keep track of everything and these errors accumulated as he added each book. He was informed of the errors by his readers who apparently came up with several hundred.

I started posting online about computers in 2002 and I'm sure there were times that I posted when I was falling asleep at 3am, busy, distracted, or just in a bad mood. However, one particular distraction that I had came in September 2001 when my wife was diagnosed with Congestive Heart Failure. I can't really forget that date because she was in the hospital when the attack came on the World Trade Center. They gave her two years to live so I have no doubt that this had some effect on the quality of my posts over the next few years. There were also times when I was on vacation trying to get in some good times with her or when I was busy working on my house. In hindsight I'd say the house was perhaps overcompensation because I more than doubled the floor area. I watched her get noticeably worse toward the end of Summer in 2004 so I'm sure I was distracted then. Finally, in the middle of that Winter she was hospitalized. When she did come home after a few weeks I had to have hospice care set up for her with a hospital bed and oxygen generator. I took care of her 24 hours a day, 7 days a week for the next eight months. I can remember sitting next to her bed with my laptop while she slept. She often slept 16-22 hours a day. So, when I wasn't checking her blood pressure, counting out medication, changing the bedding, bathing her, getting her something to eat or drink, or just holding her hand, I did make posts. However, because of the stress I would imagine my error rate went up a bit. She died in October 2005 and I'm sure my error rate remained high for quite a while afterwards.

I don't know exactly how many posts I've made on AMDZone since I began because the count has been reset twice. Let's say it is at least 10,000 posts. I figure if I'm lucky then maybe 5% of my posts contained mistakes. That would be about 500. Hopefully, my error rate wasn't any higher than 20% which would be a whopping 2,000 posts with errors. I'm sure that there have also been things that have changed over that time or that I got more information about later. I'm sure there were a number of things I said years ago that will seem goofy now and I'll wonder what I was thinking. But getting back to my industrious mistake counters: if the real number is somewhere in the middle, or about 1,000, then I doubt a single link will do. They may end up having to create a larger list of links to my mistakes. Maybe they can divide them up by topic or six-month interval to keep each list to a more reasonable size. I think they currently have fewer than 50 in their list so I'm certain they have a lot more work to do. I wish them luck in their endeavour.

This month was interesting in that visits from www.intel.com topped all others. More visits originated from Intel Corporation than originated from any of the major internet providers such as Verizon, Bellsouth, Road Runner, or Comcast. To give an idea of how many this is, I got more than twice as many visits from Intel as from Sony, Microsoft, Boeing, Cisco, IBM, Sun, Motorola, Hewlett Packard, Pratt & Whitney, AMD, Lockheed, and Texaco combined. These are actually in reverse order so the fewest visits were from Sony while Texaco had the most. Also, we can see that AMD's interest was in between Pratt & Whitney's and Lockheed's so not at all unusual. I have no idea where the interest at Intel comes from. I don't know if anyone there actually finds it interesting or whether it is simply an amusing distraction. The last time I looked at the site distribution the visits seemed to come from a lot of different places in the far-flung Intel corporation so I doubt they are able to chuckle over my articles at the water cooler. Perhaps they send emails.

Of course, I imagine that Intel is probably a more pleasant place to work these days. It is possible that AMD's recent performance has reduced the pressure considerably. AMD is currently down in almost every measurement we can make. They've had two half-billion-dollar losing quarters in a row with a probable third coming up in Q2 07. AMD is down in volume share. Intel is way ahead in quad core and with initial Barcelona clocks at 2.0Ghz it looks like Intel will maintain the clock lead into 2008. Intel is only matched in dual core clock speed but with C2D's higher IPC and much greater FP performance it isn't that much of a contest. True, dual core K10 should have a higher clock than K10 quad core but by the time they are out in any number Intel can surely bump the clock with 45nm. In fact, if they keep pulling the TDP down they may be able to bump the clock to 3.2Ghz on 65nm by Q1 08 which may be the earliest AMD gets a K10 near 3.0Ghz. Intel still has the bulk of mobile and has no real competition from AMD until 2008 (assuming the new mobile cpu and chipsets stay on schedule).

There is also no doubt that Intel has gained tremendously in servers. In fact, in HPC, while AMD had modest gains, Intel's gains could only be described as ballistic. This is the sharpest gain by Intel in HPC on the Top 500 list since November 2003 when a flood of 32 bit Xeon systems began replacing the RISC systems like Alpha, PA-RISC, MIPS, and SPARC. In just one year, Xeon HPC system power doubled and became the dominant source of computing power in June 2004. Today, Intel has duplicated that feat by surpassing IBM's Power with 64 bit Xeon. As we speak, there is more HPC computing power coming from 64 bit Xeon (primarily Woodcrest) systems than any other. Of all the Top 500 HPC systems, 64 bit Xeon systems provide about 20% more computing power than IBM's Power and about double that of Opteron. AMD does have some large HPC systems lined up for later this year but even 30,000 processors for these systems won't put much of a dent in Intel's lead. AMD would need about 70,000 Barcelonas in HPC today to catch Intel.

There is no doubt that AMD is digging in on the low end with DTX and the BE chips. It appears that K10 will inherit this legacy next year with the GE series. Although Intel currently has nothing really competitive in this region I'm not sure that it needs to. Trying to compete with mini-DTX for example would require not only new motherboards in mini-ITX size but large cuts in dual core processor prices. The current Celerons won't quite compete because they are single core. Intel could compete with DTX with its pico-BTX motherboard in terms of size but maybe not in cost. It is doubtful that there will be any wholesale shift to smaller form factors in 2007 but perhaps enough to give AMD some much needed breathing room. And, Intel can probably just wait until early in 2008 to respond. This is very much reminiscent of 2002 when AMD moved its K7 line down to compete against Celeron. It worked then (although AMD took quite a beating) so perhaps it will work in 2007.

Monday, July 02, 2007

AMD Quietly Announces K10 Launch Time Frame

AMD officially announced when K10 would be released and gave a small interview to IDG. I would have to say that the news is mixed. Let's look at what this announcement means.

The details directly from the AMD Corporate Virtual Pressroom are pretty straightforward:

AMD expects that the processors will begin shipping for revenue in August 2007, with systems from AMD platform partners beginning to ship in September 2007.

With planned availability at launch in a range of frequencies up to 2.0 Ghz, AMD expects its native quad-core processors to scale to higher frequencies in Q407 in both standard and SE (Special Edition) versions.

This does contradict the never-ending stream of rumors that K10 would be delayed until September, October, November, December 2007, or even January 2008. However, it also contradicts the unofficial roadmap that showed K10 at 2.3Ghz at launch. There is little doubt that AMD is not happy with the 2.0Ghz launch speed. It seems common sense that AMD could have made this same announcement at Computex earlier in June. This is further supported by Anandtech at Computex who said:

We understand from the motherboard partners that AMD should hit 2.0GHz by September for the Barcelona launch

So, most likely AMD's plan was to give this news quietly after Computex to allow a few weeks for it to settle and expectations to tumble accordingly. Then they could start the July 2007 Analyst Day on a more upbeat note. Presumably AMD waited until they knew that 2.0Ghz was the best they could hope for. But, I'm sure AMD is aware that even a slow 2.0Ghz in August is still better than nothing until November.

Now we are left with the burden of rating this announcement. It is obvious that AMD's self-effacing statement was as much a mea culpa as if they had taken out an ad during the Super Bowl. So, AMD's rating is that they are not at the top of their game. Maybe they're trying but they aren't there yet. Still, there are other ratings besides AMD's. When Opteron was first released back in 2003 it only hit 1.8Ghz compared to Barton's 2.2Ghz. If we apply this same ratio to K8's current 3.0Ghz then we come up with 2.45Ghz. Of course, Opteron and Barton were both single core. Curiously, bumping the clock speed down a grade to allow for the shift from dual to quad core we get 2.25Ghz which is about what the unofficial roadmaps showed at 2.3Ghz. So, it looks like AMD is down about 1.5 speed grades.
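For anyone who wants to follow the speed-grade arithmetic, here is a minimal sketch. It assumes one speed grade is 200Mhz, which the numbers above imply but which is not stated outright, and the variable names are only for illustration.

```python
# Sketch of the speed-grade arithmetic.
# Assumption: one speed grade is 0.2 GHz (200 MHz).
GRADE_GHZ = 0.2

opteron_launch = 1.8   # Opteron launch clock in 2003
barton_top = 2.2       # top Barton clock at that time
k8_top = 3.0           # current top K8 clock
k10_launch = 2.0       # announced K10 launch clock
roadmap = 2.3          # launch clock on the unofficial roadmap

# Apply the Opteron/Barton ratio to today's fastest K8.
scaled = k8_top * opteron_launch / barton_top

# Drop one grade to allow for the shift from dual to quad core.
quad_adjusted = scaled - GRADE_GHZ

# How many grades the announced launch clock sits below the roadmap.
shortfall = (roadmap - k10_launch) / GRADE_GHZ

print(f"scaled from K8: {scaled:.2f} GHz")
print(f"adjusted for quad core: {quad_adjusted:.2f} GHz")
print(f"shortfall versus roadmap: {shortfall:.1f} grades")
```

With these inputs the scaled clock comes out near 2.45Ghz, the quad core adjustment near 2.25Ghz, and the shortfall against the roadmap at about 1.5 grades.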

Certainly we have to consider if AMD has done a Prescott with K10. Although Prescott was theoretically launched in 2003, it took Intel until months into 2004 to get Prescott above Celeron speed. However, there is no indication that K10 is running hot. There is one clue from AMD's announcement:

This would be the first time AMD has made both standard and low power parts immediately available as part of a new processor launch.

Low power and for the first time? It sounds like AMD in its attempt to fit quad core K10 into the same power envelope as dual core K8 may have been a bit too aggressive. It sounds like the process has been tweaked a bit too much in favor of low power and that is why there aren't enough higher clocking parts. Of course, it could also be that AMD's process is just not quite there yet. We saw with Intel's C2D that it came up one speed grade with each revision. I'm sure AMD will fix this but the big question is when. They did say higher frequencies in Q4 but this could mean just a single bump from 2.0Ghz to 2.2Ghz. It could also be 2.3Ghz or 2.4Ghz.

Anandtech said:

but according to AMD the partners will see at least 2.3GHz by the end of summer.

Here, end of summer could mean the beginning of October which would still be Q4.

However, this wouldn't quite be enough. AMD would need 2.5Ghz for Phenom dual core in Q3 to match the already available K8 3.0Ghz speeds. This would also require a bump to 2.7Ghz in Q4 for dual core K10 to pull ahead of K8. The unofficial roadmaps showed 2.9Ghz for dual core in Q4 but 2.7Ghz would at least be enough to surpass the 3.0Ghz K8 speeds available today. If we just consider QFX, for example, AMD needs at least 2.5Ghz to match the performance of the current older 90nm 3.0Ghz chips. I suppose 2.5Ghz would still be worthwhile since you would get twice as many cores for the same power envelope. This would mean 8 cores total but I wonder if there are any games at this point that can use that many cores.
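The clock comparisons above imply a rough per-clock advantage for K10 over K8. The sketch below works that out under a loud assumption: performance scales linearly with clock and the per-clock advantage is the same at every speed, which real benchmarks will only approximate. The names are hypothetical and the result is an estimate, not a measurement.

```python
# Sketch: K8-equivalent clock for a given K10 clock.
# Assumption: performance scales linearly with clock speed and the
# per-clock advantage is constant, which real workloads only approximate.

K8_TOP_GHZ = 3.0      # fastest K8 available today
K10_MATCH_GHZ = 2.5   # K10 clock said to match it

# Implied K10 advantage per clock.
per_clock_gain = K8_TOP_GHZ / K10_MATCH_GHZ

def k8_equivalent(k10_ghz: float) -> float:
    """K8 clock that a K10 running at k10_ghz would roughly match."""
    return k10_ghz * per_clock_gain

for clock in (2.0, 2.5, 2.7, 2.9):
    print(f"K10 {clock:.1f} GHz ~ K8 {k8_equivalent(clock):.2f} GHz")
```

On these assumptions the implied advantage is about 20% per clock, so a 2.0Ghz K10 would still trail a 3.0Ghz K8 while 2.7Ghz would be enough to pull ahead.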

It is difficult to imagine just how frantic things are at AMD right now. AMD has DTX and mini-DTX motherboards coming out in Q3. The 65nm version of R600 (R650) is also due in Q3 along with the finished drivers. We know that Cray is waiting for Budapest (Barcelona with HT 3.0) in Q4. AMD also has to be planning for the R700 chipset for 2008, plus the new mobile chipset, plus the new mobile processor. And, all of this would be while working on the 45nm process with the new immersion scanners and getting DC 2.0 ready, continuing to ramp FAB 36, getting the bump and test facility up and running, and beginning the process of taking down FAB 30. We know that K10 already contains four HT controllers so it isn't clear just how much work is necessary to do DC 2.0. Presumably there must be something more to do or else AMD would release DC 2.0 with Barcelona instead of a year later. I would also imagine that AMD must be hard at work trying to duplicate the new Intel SSE4 instructions. Since AMD has not yet announced tapeout of Shanghai I would imagine they can fit in at least some of the new SSE4 instructions.

It is likely that AMD does have at least some 2.2Ghz chips. We know that AMD is not saving these for Cray since Cray is waiting on Budapest. However, Sun might be waiting on faster Barcelona chips for Ranger, which requires about 16,000. Or AMD may simply be stockpiling faster chips for Q4 release. It is my guess that AMD will give the expected Q4 and Q1 K10 speeds when it gives the Q2 Earnings report later this month. There may be some information about future plans but most of this will probably be given at the Analyst Day at the end of the month. I don't know how likely it is that AMD can get the speed up 2 ½ grades from 2.0Ghz to 2.5Ghz. That is what would be required to equal the current top K8 speeds. This speed could also roughly match Clovertown. However, 2.5Ghz for dual core K10 would lag behind 2.93Ghz Conroe. Also, 2.5Ghz is only likely to match a 2.66Ghz Penryn quad core. A full set of benchmarks should show this more accurately when K10 is released in August but right now it does not appear that AMD will catch Intel's top speed anytime soon.
