After I read the details of this new architecture, I was very disappointed. In fact, I was more disappointed by it than by Intel's P4.
And let me make it clear to AMD: you have failed.
If you would like to read more about it, you can go to:
http://www.realworldtech.com/page.cfm?ArticleID=RWT082610181333
http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested
What should be done:
- Focus on improving single-thread performance.
- 4 integer execution units are too few, especially given that single-thread performance is pathetic. You have to increase them to 6 and seriously improve the IPC of each one.
- Hand-craft everything. Guys, you can't rely on automation tools, because they make the chip 20% bigger and 20% slower, while you need every mm for the profit margin and every bit of performance you can get. A CPU is not a GPU, so you can't use the same tools. Plus, the main reason the DEC Alpha was faster than anything else in existence is that the team hand-crafted every transistor to get the best out of it.
- The number of decoders is a joke; it can't even keep up with Thuban or Sandy Bridge. You have to make the decoder 5-wide and add an instruction cache (like Sandy Bridge's), which will end up better than a 20-wide decoder while stalling less than the 6-core Sandy Bridge.
- Branch prediction has to be improved a lot. It's one of the most important things for IPC, and because the pipeline is so deep you have to focus on it.
- Improve the gate delay even more, and reduce the pipeline depth by 2 stages.
- Single-thread performance should reach at least 1.3 in the Cinebench benchmark.
- Redesign the cache system and memory controller to achieve latencies like: 4, 12, 25, 150
- Increase the L1 cache to Thuban's size
- The die area is big, so improving single-thread performance will improve overall performance, which will allow AMD to sell its 8-core parts at a higher price ($400), 6-core ($250), 2-core ($150 > $200)
- The next version of Bulldozer (the 2012 version) should move to 28nm to reduce the gap between Intel and AMD and, more importantly, to be able to maintain the same frequency but with shorter stages, and to reduce power consumption
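To give a feel for why the latency targets in the list above matter, here is a rough sketch of an average load latency calculation. It assumes the four numbers are load-to-use latencies in cycles for L1, L2, L3 and main memory, which is my reading of the list and not something the reviews state. The hit rates are made-up placeholders, not measurements of Bulldozer or any other chip, and the function name is just for illustration.

```python
def average_latency(latencies, hit_rates):
    """Expected load latency for a cache hierarchy.

    latencies: total latency when a load is served by each level,
               with the last entry being main memory.
    hit_rates: hit rate of each cache level (one fewer than latencies).
    """
    if len(hit_rates) != len(latencies) - 1:
        raise ValueError("need exactly one hit rate per cache level")

    total = 0.0
    reach = 1.0  # fraction of loads that get as far as this level
    for latency, hit in zip(latencies[:-1], hit_rates):
        total += reach * hit * latency  # loads served by this level
        reach *= 1.0 - hit              # the rest fall through to the next
    # Whatever misses every cache is served by main memory.
    return total + reach * latencies[-1]


# Latency targets from the list above (assumed: cycles for L1, L2, L3, memory).
TARGET_LATENCIES = [4, 12, 25, 150]
# Hypothetical hit rates, chosen only to make the example concrete.
HYPOTHETICAL_HIT_RATES = [0.95, 0.80, 0.50]

if __name__ == "__main__":
    cycles = average_latency(TARGET_LATENCIES, HYPOTHETICAL_HIT_RATES)
    print(f"average load latency: {cycles:.3f} cycles")
```

Plugging in different latencies with the same hit rates shows how much each level contributes. With hit rates like these, the L1 term dominates, but the memory term grows quickly as soon as the caches miss more often.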
Strategic moves for AMD:
- Move to 3D transistors ASAP
- Integrate AMD Stream to replace the FP unit
- Move to something like Z-RAM for the L3 cache to reduce die area and cost