Wednesday, May 21, 2008

Approaching Midyear

Intel's lead remains clear, while the steps AMD might take to become profitable again remain a mystery. The layoffs might save $25 million a quarter, and reducing capital expenditure might be good for another $100 million, but this is a long way from the $500 million that AMD talked about.
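To put those figures side by side, here is a quick back-of-the-envelope tally in Python. The numbers are just the rough estimates quoted above; treating all three as covering the same period is my assumption, and the variable names are only for illustration.

```python
# Rough estimates quoted in the post, in millions of US dollars.
# Assumption: all three figures cover the same period.
layoff_savings = 25
capex_savings = 100
target_savings = 500

identified = layoff_savings + capex_savings   # savings accounted for so far
shortfall = target_savings - identified       # what is still unexplained
share = identified / target_savings           # fraction of the target covered

print(f"Identified savings: ${identified}M")
print(f"Shortfall vs. target: ${shortfall}M")
print(f"Share of target covered: {share:.0%}")
```

On these numbers the two known measures cover only a quarter of the stated goal, leaving $375 million unaccounted for.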

Lately, I've been comparing Intel's and AMD's processors, and a couple of things are noticeable. SOI is supposed to confer heat tolerance, and AMD's processors now have a very respectable temperature limit of 70 C. Yet Intel's processors, running on bulk silicon, now have a top end of 71.5 C. The voltage spread is also noteworthy. AMD's spread ranges from 1.05 to 1.25 volts. Intel's is much larger, ranging from 0.85 to 1.3625 volts. Intel's processors seem to run just fine across this much wider voltage range, again on bulk silicon. I also came across a comment in one of IBM's journal articles:

Chips with shorter channels typically run faster but use considerably more power because of higher leakage. In previous-generation processors, these parts would have been discarded because of excessive power dissipation but now are usable by operating at lowered voltages. In addition, chips with longer channels typically run slower, so some of these parts also would not have been used in earlier generation processors because of their low operating frequency, but now they also are made usable by increasing their operating voltages.

This piece was from an article about IBM's Power6 development. However, since it concerns IBM's 65nm process, which presumably is nearly identical to AMD's, I have to wonder how much of it applies to AMD as well. It reminded me a lot of AMD's initial Barcelona launch. I do wonder if the 2.3GHz 130 watt chips were ones that would have been scrapped in previous generations. To be honest, the current B3 stepping looks a lot more like what AMD had planned as its initial offering. And with this delay it becomes more doubtful that AMD will hit its target of 2.8GHz at 95 watts on 65nm, which presumably would also have allowed 3.0GHz at 130 watts. It now looks like AMD won't get there until 45nm. At least we assume so; as yet no 45nm ES chips have been reviewed. Hopefully, this won't be like the Barcelona release, where AMD snuck the chips out on a Friday to delay reviews over the weekend. Hopefully 45nm will be something that AMD can finally be proud of.
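The voltage binning idea in the IBM quote above can be sketched in a few lines of Python. This is only a toy model: the `Chip` fields, the voltage steps, the power and clock limits, and the scaling rule are all made-up values for illustration, not anything taken from IBM's or AMD's actual process.

```python
from dataclasses import dataclass

# Hypothetical limits for illustration only; not real process data.
NOMINAL_VOLTS = 1.15
LOW_VOLTS = 1.05     # tried for fast, leaky (short-channel) parts
HIGH_VOLTS = 1.25    # tried for slow (long-channel) parts
MAX_WATTS = 95.0     # power ceiling for a sellable part
MIN_GHZ = 2.3        # lowest clock worth selling


@dataclass
class Chip:
    ghz_at_nominal: float    # highest stable clock at nominal voltage
    watts_at_nominal: float  # power draw at nominal voltage


def scale(chip, volts):
    """Crude assumption: clock scales with V, power with V^3 (V^2 times f)."""
    ratio = volts / NOMINAL_VOLTS
    return chip.ghz_at_nominal * ratio, chip.watts_at_nominal * ratio ** 3


def bin_chip(chip):
    """Try nominal voltage first, then a lowered and a raised voltage."""
    options = (
        ("nominal", NOMINAL_VOLTS),
        ("undervolted", LOW_VOLTS),
        ("overvolted", HIGH_VOLTS),
    )
    for label, volts in options:
        ghz, watts = scale(chip, volts)
        if ghz >= MIN_GHZ and watts <= MAX_WATTS:
            return label
    return "scrap"


print(bin_chip(Chip(2.5, 90.0)))    # nominal
print(bin_chip(Chip(2.8, 110.0)))   # undervolted: fast but too hot at nominal
print(bin_chip(Chip(2.2, 70.0)))    # overvolted: cool but too slow at nominal
print(bin_chip(Chip(2.0, 100.0)))   # scrap
```

In this toy model the second and third chips would both have been discarded if only the nominal voltage were allowed, which is the point the IBM authors are making.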

I know some time ago Ed at Overclockers had suggested that Intel might have had an advantage due to restricted design rules (RDR). I doubted this at the time but have since changed my mind. I realized that Intel had considered its own production marginal without design for manufacturability (DFM), and that AMD has now apparently become serious about using DFM at 45nm and below. It is possible that Intel's current process advantage at 65nm comes down to a slight edge gained by making the chip layout easier to produce. If AMD really has made use of DFM at 45nm, then the addition of immersion scanning should remove any limits it had at 65nm. This would be fine were it not for the fact that Intel is using high-K at 45nm, which is looking more and more like the right choice. It has yet to be shown whether AMD can be competitive with low-K.

In fact, it is difficult to imagine how things could be better for Intel than they are right now. Fuad has been suggesting that Intel is having trouble with 45nm production. The only evidence I've seen of this is the lack of 45nm quads at NewEgg. You can find 45nm dual cores from 2.5 to 3.2GHz, but quads are still scarce. Of course, even if this is true, Intel still has the excellent G0 65nm stepping, which is still comfortably ahead of AMD. Considering that AMD won't produce any significant volume of 45nm until Q4, this gives Intel a good six months to tweak 45nm production. Even then, AMD is unlikely to challenge Intel's current top speeds before 2009.

Intel's chips seem to have a fair amount of headroom as well. The dual core chips seem to be able to hit 4.0GHz without much trouble. The quad cores might also be able to hit this, but it looks like they are limited by socket wattage. If socket 775 were rated for 175 watts, Intel could hit 3.5GHz with its quads with room to spare. Of course, Intel could always follow AMD and introduce triple cores. This would allow a 175 watt quad to fit within the current socket limit. The problem with this idea is that Intel's marketing department already got up on its soapbox and proclaimed AMD's triple cores to be crippled chips. Intel would look a bit foolish going back on this now.

Right now, of course, this isn't a problem because AMD's B3 chips don't have much headroom and the current triple cores are not being released at higher clocks. But if triple cores were pushing faster than 3.33GHz in, say, Q1 2009, what would Intel do? It could up the clock on its dual cores or eat crow and release triple cores of its own. Some suggest that Intel is secretly overrating the TDP limits of its 45nm chips. This presumably would mean that Intel would only have to change the labels to fix the problem. I'm not sure this has any truth to it because, while the chips will indeed clock higher, the quad TDP ratings seem to match the dual core TDP ratings. So, there is no evidence that Intel is derating its chips. Or perhaps I should clarify that by saying that there is no evidence that Intel is derating the quad chips; the duals do indeed seem capable of clocking higher while staying within the socket limits.

Of course, AMD may not be much competition in the near term. AMD might need high-K to actually pose any threat, and by that time Intel will be close to producing 32nm or already doing so. Then again, perhaps Nehalem is much better than has been suggested. I'm a bit divided about the direction that Intel has gone with Nehalem. The IMC is a great idea, as is QuickPath. And putting the loop buffer after the decoding stage does seem like an improvement and probably a lot better than the trace cache on P4. I had been wondering how Intel was going to make SMT work without the old trace cache, and this seems to be the answer. But I still have doubts. First, the IMC is good, but in terms of the benchmarks it seems like a lot of the gain will be traded off against less use of prefetching with the smaller cache. In terms of real performance this is an improvement, but I wonder how it will fare on the artificial benchmarks that Intel has been so fond of. Secondly, SMT seems to be one of those ideas that looks great on paper but not in practice. Most likely it was added to give Intel some greater hitting power versus Sun's strongly threaded Niagara. And it probably will.

The problem is that as a programmer you can't really program for SMT. If you only have one application running then this can work. However, as soon as you run other applications the processor state changes, and you can find yourself running slower with SMT instead of faster because the threads share buffers, cache, and branch prediction. And these days, with virtualization on the rise, it becomes more and more likely that servers will be running not just multiple applications but multiple OSes as well. This simply cannot be optimized at compile time. Without some method of changing the SMT profile at runtime I can't see how this would work. It seems more likely that SMT will end up being an option that most server managers leave turned off. Curiously, AMD seems to be the one moving towards dynamic load profiling, but it has never given any indication that it is interested in SMT.

Intel has a tough path to follow, trying to make Xeon competitive with Opteron but not so good that it kills Itanium. Once Xeon has 256-bit SSE, Itanium will be holding on by only an artificial RAS differentiation. But RAS will undoubtedly become more and more robust in x86 server processors. As bad as Nehalem might be for Itanium, it is most certainly what Intel needs for Xeon. I'm eager to see the benchmarks to find out if it is good for the desktop as well.