
Since 7576 DaysHas nothing to do with this thread.
Since 7520 Daysbut you cant help but think..
PSN:ManThatYouFear
GT: ManThatYouFear
Real Life: ThatTwat
Since 6463 DaysPrepare To Drop!!
Since 6353 Days
The worrying factor is that people like buying a $200 console, and you can't expect a high end console for $200-$300.
Since 7125 Days
Since 6463 DaysPrepare To Drop!!
Since 7422 Days
Since 6902 DaysChanticleer Hegemony
Since 7576 Days
Since 7547 Days
http://forum.beyond3d.com/showthread.php?t=31379&p...
Will a 28nm GPU with a modest last gen GPU footprint (230mm^2) have AMD HD 6870 (Barts) like performance?
Humor me for a moment with this punchy punch-line: Next Gen (Xbox 3, PS4) graphics can be bought, right now, for about $155 (CPU not included). Yes, 2010 graphics chips may match 2013 consoles. And by the time the consoles launch in 2013 contemporary PC GPUs will be twice as fast and cost the same as (or less than!) a new console. Punchy, right? Sadly the facts are trending in this direction.
At face value it would seem a 28nm GPU, the guesstimated target for next gen chips, could exceed 3 TFLOPs (maybe even close in on 4) and 70GT/s texturing by simply moving a 40nm Barts AMD HD 6850 (a 6870 with disabled units) down to 28nm but keeping the 255mm^2 footprint. Such a design would not be a top of the line 2013 GPU but it would be quite competitive. 28nm should double density (right?), offer more frequency (right??), and bring a big reduction in power draw (right???) … but reality isn’t as sweet. This is one reason I am not super excited about a 28nm console. I think console makers are looking at slightly reducing their silicon footprints from last gen. Add the chip manufacturing issues (with an eye toward future reductions) and the dirty details of what a node reduction really delivers, and my math below spells out a 2 TFLOPs, 50GT/s GPU on 28nm: roughly AMD HD 6870 performance, a far cry from the 3+ TFLOPs and 70GT/s the simple projection above would indicate.
Persuade me: Give me intelligent reasons why I should keep my hopes up for a 3+ TFLOPs monster console GPU at 28nm.
Until then, let me convince you, and depress you, that a 28nm GPU in 2013 in a console is going to be no 2x 6850 but instead a single 6870-like chip.
Let’s start with budget. Last gen, console GPUs were in the 230-260mm^2 range at launch. We should consider this the upper bound for silicon next gen, as processes haven’t reduced chip costs significantly and the advent of motion controls and the importance of storage media will be pressing on silicon budgets.
Let’s be conservative and see what a similar budget on 28nm would look like. Some basic information:
28nm is half the size for the finest geometries (e.g. SRAM) compared to 40nm. Logic is not as dense.
40nm is mature so 28nm won't be as robust, will be more expensive, and have lower yields. 80% scaling is optimistic IMO.
Architectural and efficiency differences aside (Xenos > RSX), last gen consoles look something like this (depending on how you count; don’t shoot me, as I know the numbers below are wrong since I did this from memory on a lunch break, but I wanted some context):
Xenos 500 230 250? 218? 16 8
RSX 550 255 300? 230? 24 8
Barts: 255mm^2, 1700M transistors, VLIW5
Model  MHz  Shaders  TMUs  ROPs  GFLOPs  TDP (W)
6790   840  800      40    16    1344    150
6850   775  960      48    32    1488    127
6870   900  1120     56    32    2016    151
6950   800  1408     88    32    2253    200
6970   880  1536     96    32    2703    250
* 6870 to 6850: 14% drop in Shaders, TMUs, and Frequency; 26% drop in GFLOPs
* 6970 to 6950: 9% drop in Shaders, TMUs, and Frequency; 17% drop in GFLOPs
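The GFLOPs column and the percentage drops above can be re-derived from the shader counts and clocks. This is only a rough sketch: it assumes peak throughput of 2 FLOPs (one multiply-add) per shader per clock, and the names `CARDS`, `gflops`, and `pct_drop` are made up for illustration.

```python
# Peak GFLOPs for the cards in the table, assuming each shader retires
# one multiply-add (2 FLOPs) per clock. Figures are the post's own.
CARDS = {
    # model: (clock in MHz, shader count)
    "6790": (840, 800),
    "6850": (775, 960),
    "6870": (900, 1120),
    "6950": (800, 1408),
    "6970": (880, 1536),
}


def gflops(mhz, shaders):
    """Peak single-precision GFLOPs: shaders * 2 FLOPs * clock in GHz."""
    return shaders * 2 * mhz / 1000.0


def pct_drop(new, old):
    """Percentage by which `new` is lower than `old`."""
    return 100.0 * (1.0 - new / old)


if __name__ == "__main__":
    for model, (mhz, shaders) in CARDS.items():
        print(model, round(gflops(mhz, shaders)))
    # Salvage-part penalty: 6870 -> 6850 and 6970 -> 6950.
    for full, cut in (("6870", "6850"), ("6970", "6950")):
        drop = pct_drop(gflops(*CARDS[cut]), gflops(*CARDS[full]))
        print(f"{full} -> {cut}: {drop:.1f}% fewer GFLOPs")
```

Because units and clock both fall by roughly the same fraction, the GFLOPs drop is close to double the per-factor drop, which is why a 14% cut in shaders and frequency costs about a quarter of the throughput.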
Let’s acknowledge the following: the TDP scaling of PC GPUs doesn’t fit a console’s metrics; the silicon footprint of PC GPUs is far above the cost tolerances for consoles; 28nm won’t provide a 100% real-world increase in transistor density; 28nm is going to be more expensive (yields, demand, general cost of progress, competition) in 2013 than 90nm was in 2005; the success of the Wii in the $250 price bracket has made the console manufacturers more price sensitive (a complicated issue); the cost of large standard storage and Kinect/Move like devices needs to be compensated for in other aspects of the design; new technologies (stacked memory, Silicon Interposers, etc.) are not free; and the RRoD/YLD has made everyone mindful of decreasing TDP, improving cooling through better coolers on original units*, and increasing volume. Put all together, a 300mm^2 GPU doesn’t seem to be a target console makers will be reaching for.
Assuming an AMD chip, a major wildcard will be the transition to GCN / DX11.x+ architectures, which will have additional feature costs and overhead not currently represented in the Barts/Cayman models. There will also be a transition from VLIW to SIMD (+ Scalar) with Southern Islands (GCN; http://www.anandtech.com/show/4455/amds-graphics-c...).
Four major numbers to keep in mind: (1) a 10% reduction in area from 255mm^2 (Barts) to a more conservative 230mm^2; (2) the 15% redundancy seen in Barts models (6870 and 6850; note that Xenos already had redundancies like this, so this will need to be factored into the smaller die size; 15% is aggressive as we see 9% in Cayman); (3) a 22% drop in frequency (900MHz to 700MHz), again aggressive, but TDP is a major issue: note the TDP drop from a 6790 at 840MHz to a 6850 at 775MHz even though the 6850 is the faster part; and (4) 80% scaling from 40nm to 28nm.
Applying (1) and (2) to a 40nm GPU results in about a 25% reduction in functional units from a 6870, and with (3) we are looking at a total performance drop of nearly 40% from a 6870 (2016GFLOPs) and 19% from a 6850 (1488GFLOPs), down to about 1200GFLOPs for this hypothetical GPU on the 40nm process. Scaling first upward to 28nm (80% increase in density; 1.8 * 2016 = 3628GFLOPs) and then reducing for redundancy and the smaller die (about 25%; 3628 * .75 = 2721) arrives at about 2700GFLOPs. We are looking at a net functional unit scaling of about 35% above a 6870. Reducing the frequency to a more reliable and power efficient 700MHz (22% drop) arrives at about 2100GFLOPs, which is between today’s Barts 6870 and Cayman 6950.
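For anyone who wants to poke at the assumptions, the chain of factors above can be written out as a minimal sketch. Every constant is one of the post's rounded guesses (not measured data), and the constant and function names are made up for illustration. With these inputs it comes out to roughly 1176GFLOPs at 40nm, 2117GFLOPs at 28nm, and 2.75B transistors.

```python
# The post's rounded assumptions, applied to a Radeon HD 6870 baseline.
HD6870_GFLOPS = 2016.0       # 1120 shaders * 2 FLOPs * 0.9 GHz
BARTS_TRANSISTORS_B = 1.7    # billions of transistors in 255mm^2 at 40nm
DENSITY_GAIN = 1.8           # assumed 80% density scaling, 40nm -> 28nm
AREA_FACTOR = 0.9            # 255mm^2 -> 230mm^2, rounded to a 10% cut
UNIT_FACTOR = 0.75           # smaller die plus ~15% redundancy, "about 25%"
CLOCK_FACTOR = 700 / 900     # 900MHz -> 700MHz, roughly a 22% drop


def gflops_at_40nm():
    """Hypothetical console part if it stayed on 40nm."""
    return HD6870_GFLOPS * UNIT_FACTOR * CLOCK_FACTOR


def gflops_at_28nm():
    """Same part after the assumed density gain from the node shrink."""
    return HD6870_GFLOPS * DENSITY_GAIN * UNIT_FACTOR * CLOCK_FACTOR


def unit_scaling_vs_6870():
    """Net change in functional units relative to a 6870."""
    return DENSITY_GAIN * UNIT_FACTOR


def transistors_at_28nm_b():
    """Transistor budget (billions) for a 230mm^2 Barts-style die at 28nm."""
    return BARTS_TRANSISTORS_B * AREA_FACTOR * DENSITY_GAIN


if __name__ == "__main__":
    print(f"40nm: {gflops_at_40nm():.0f} GFLOPs")
    print(f"28nm: {gflops_at_28nm():.0f} GFLOPs")
    print(f"units vs 6870: {unit_scaling_vs_6870():.2f}x")
    print(f"transistors: {transistors_at_28nm_b():.2f}B")
```

Swapping the rounded 0.75 for the exact 230/255 * 0.85 nudges the 28nm result up by a couple of percent, so the 2100GFLOPs figure is best read as a ballpark.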
Looking at some of these factors and expectations:
* 230mm^2 = smaller side of last gen GPU footprint which ranged from 230-260mm^2; 10% less area than Barts (6870, 255mm^2)
* 2.75B transistors = 230mm^2 Barts style GPU with 80% scaling from 40nm to 28nm
* 700MHz = Less than a Barts 6850 (775MHz, 127W TDP), 6870 (900MHz, 151W), Cayman 6950 (800MHz, 200W), and 6970 (880MHz, 250W). 28nm should bring a solid reduction in power but the increase in transistors is going to scale up power draw. A 6850 is a reasonable 127W considering the 128GB/s of memory bandwidth, but a console will need to accommodate the optical drive, HDD, CPU and system memory, etc. With costs (yields/binning) and the RRoD (and YLD) firmly in memory, conservative clocks are likely, although the “turbo” features in current GPUs indicate that 700MHz is on the very low end of what should be expected. Comparing a 6790 @ 840MHz 150W max TDP versus a 6850 @ 775MHz 127W max TDP indicates that a chip with more functional units and more net performance can use less power than a chip with fewer units at a higher frequency.
* 2100GFLOPs = 80% scaling from 40nm to 28nm minus ~ 10% for space reduction (Barts 255mm^2 to our 230mm^2), ~ 15% for redundancy, and ~ 22% reduction in frequency. GFLOPs may also be hit by the new SIMD+Scalar GCN architecture and DX11.1 overhead as well as additional raster pipelines; on the other hand this may end up higher, as many units do not need to scale and shaders are often easier to pack in. E.g. it is unlikely ROPs will scale from 32 to 64, and there may even be a reduction to 24 or 16 ROPs on a console GPU, so this space may be utilized for more Shader units.
* 76 TMUs, 52.5GT/s = What is a TMU? I am basing this on Barts style TMUs: the 6870’s 56 TMUs plus the 35% increase in units. For comparison Cayman has 96. A 6870 (56@900MHz) is 50.4GT/s; 6970 (96@880MHz) is 84.5GT/s.
* 16 ROPs, 11.2GP/s = Or 24. A 6870 is 28.8GP/s (32 ROPs @ 900MHz). When Xenos and RSX shipped they had 8 ROPs when competing PC GPUs had 16. Consoles will have some limiting factors like targeting at most 2MPixel resolutions (1080p, and possibly 2 x 1080p with 3D / 2 player “split” screen) but most games will be 720p 30Hz scaled up to 1080p; there will also be a limiting factor of memory bandwidth. Consoles are about maximizing resources and 32 (or 64) ROPs doesn’t seem like an investment console designers will make when that area could be spent on more shaders.
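The texturing and fillrate figures in the last two bullets are just unit count times clock. A quick sketch, with hypothetical helper names and one texel per TMU (and one pixel per ROP) per clock assumed. Note that 76 TMUs at 700MHz works out to 53.2GT/s; the 52.5GT/s quoted above corresponds to 75 TMUs.

```python
def gtexels_per_s(tmus, mhz):
    """Peak texture rate in GT/s, assuming one texel per TMU per clock."""
    return tmus * mhz / 1000.0


def gpixels_per_s(rops, mhz):
    """Peak pixel fillrate in GP/s, assuming one pixel per ROP per clock."""
    return rops * mhz / 1000.0


if __name__ == "__main__":
    # Hypothetical 28nm console GPU from the bullets above.
    print("console GT/s:", gtexels_per_s(76, 700))
    print("console GP/s:", gpixels_per_s(16, 700))
    # Radeon HD 6870 for comparison.
    print("6870 GT/s:", gtexels_per_s(56, 900))
    print("6870 GP/s:", gpixels_per_s(32, 900))
    # Fraction of the 6870's fillrate the console part would have.
    ratio = gpixels_per_s(16, 700) / gpixels_per_s(32, 900)
    print(f"fillrate vs 6870: {ratio:.2f}x")
```

With 16 ROPs the console part lands at well under half of a 6870's fillrate; the 24 ROP option mentioned above would lift that to 16.8GP/s.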
I think this is on the conservative side. The CPU will be smaller than Xenon IMO (and most certainly CELL) and even this conservative GPU considers the fact more processing will be sent to the GPU.
I would like to think the above is wrong (or a reason not to have a new console on 28nm in 2013!). Putting this into perspective, this theoretical GPU is a hair faster than a 6870 in GFLOPs and GT/s but not even half as fast in fillrate.
Model  MHz  Shaders  TMUs  GT/s  ROPs  GP/s  GFLOPs  TDP (W)
-      700  -        76    52.5  16    11.2  2100    -
6790   840  800      40    33.6  16    13.4  1344    150
6850   775  960      48    37.2  32    24.8  1488    127
6870   900  1120     56    50.4  32    28.8  2016    151
6950   800  1408     88    70.4  32    25.6  2253    200
6970   880  1536     96    84.5  32    28.2  2703    250
As for cost, I think the above leaves a lot of budget for a competitively priced console. One could argue to reduce things even further, but there is always a base cost and things only get so small. Looking at the retail costs of these models (6790 1GB $149, 6850 1GB $179, 6870 1GB $239, 6950 1GB $259, 2GB $299, 6970 2GB $299), the fact that AMD, their distributors, and the retailers all take a cut indicates that even at the high end for a 1GB 255mm^2 chip ($239 retail) the actual cost is far below this, which makes it a viable console part. As of today I see 6950’s 1GB at $220, 2GB $239, and 6870’s 1GB at $155 at NewEgg (11/29/2011).
There you have it folks: next gen graphics can be bought, right now, for about $155 (CPU not included).
If there is a glimmer of hope, it is that if AMD/Distributors/Retailers can all make their cut on a $155 product now, then after a node reduction and with a mild “loss leader” model you would think and hope that a $299-$399 console could pack in a lot more punch. But I don’t think the console makers are thinking along these lines.
And I haven’t even touched memory.
I will throw out this wild card: I think the smaller design above fits well with the cost considerations of stacked memory (higher performance, low power) and a Silicon Interposer (SI). The bigger your GPU the more expensive the SI as it has to fit the GPU and Memory.
We may see a setup with 1GB of very fast memory for the GPU (with an additional 2GB for system memory) and a GPU following the above, but with some concessions to get the GFLOPs up a bit as I think, at least for MS, a lot of processing will be moved to the GPU.
GPU Stats: http://en.wikipedia.org/wiki/Comparison_of_AMD_gra...
* As new models get new processes the cost of cooling will also go down. So an extra $5-$10 on better cooling (compared to a 360) on a launch unit is easier to justify as this cost will be reduced on new models where more aggressive cooling is not necessary.
Time to go back and compare my 2006 predictions!
Btw, this theoretical GPU, with 2GB UMA, would be about 10x faster in raw metrics (GFLOPs, Texturing, etc) with a 4x increase in memory compared to current consoles. Factor in the 2.25x cost for 1080p (and double again for full 3D) and quite frankly: This is not really impressive. 28nm may not be dense enough to deliver a true *traditional* next gen experience at the budgets console makers will likely be looking at. 20nm with FinFETs and the hopeful emergence of relatively affordable memory stacking may offer a huge jump over 28nm. The issue is TSMC probably won’t have solid product until 2015… if they don’t choose to chuck the roadmap. Again.
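To put a number on "not really impressive", here is a back-of-the-envelope sketch of what is left per pixel once the resolution bump is paid for. It takes the "about 10x" raw speedup at face value and assumes cost scales linearly with pixel count, which is a simplification; the names are made up for illustration.

```python
RAW_SPEEDUP = 10.0  # the post's "about 10x" over current consoles


def pixel_ratio(w_new, h_new, w_old, h_old):
    """How many times more pixels the new resolution has to shade."""
    return (w_new * h_new) / (w_old * h_old)


def per_pixel_gain(raw_speedup, res_ratio, stereo_3d=False):
    """Speedup left per pixel, assuming cost is linear in pixel count."""
    views = 2 if stereo_3d else 1
    return raw_speedup / (res_ratio * views)


if __name__ == "__main__":
    ratio = pixel_ratio(1920, 1080, 1280, 720)
    print(f"1080p vs 720p pixels: {ratio:.2f}x")
    print(f"per-pixel gain at 1080p: {per_pixel_gain(RAW_SPEEDUP, ratio):.1f}x")
    print(f"per-pixel gain at 1080p 3D: "
          f"{per_pixel_gain(RAW_SPEEDUP, ratio, stereo_3d=True):.1f}x")
```

Under those assumptions the 10x shrinks to a bit over 4x per pixel at 1080p and a bit over 2x with full 3D.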
Ps- Sony/MS please, one of you, prove me wrong.
Pwn'd by Phaethon360.
Since 7547 DaysPwn'd by Phaethon360.
Since 6902 DaysChanticleer Hegemony
Since 5933 Days
Nobody knows, but the PS3 at $599 was not a good idea; their launch was a disaster. I do not think anyone will take such a risk again. The Wii U might be $300, so if the 720 is released within a year of it, it will be $400 at most.
Since 6463 DaysPrepare To Drop!!
Since 7422 Dayshttp://forum.beyond3d.com/showthread.php?t=31379&p...
A 28nm GPU with a modest last gen GPU footprint (230mm^2) will be have AMD HD 6870 (Barts) like performance?
Humor me for a moment with this punchy punch-line: Next Gen (Xbox 3, PS4) graphics can be bought, right now, for about $155 (CPU not included). Yes, 2010 graphics chips may match 2013 consoles. And by the time the consoles launch in 2013 contemporary PC GPUs will be twice as fast and cost the same (or less!) than a new console. Punchy, right? Sadly the facts are trending this direction.
At face value it would seem a 28nm GPU, the guestimated target for next gen chips, could exceed 3GFLOPs (maybe even close in on 4) and 70GT/s texturing by simply moving a 40nm Barts AMD HD6850 (6870 with disabled units) down to 28nm but keeping the 255mm^2 footprint. Such a design would not be a top of the line 2013 GPU but it would be quite competitive. 28nm should double density (right?), offer more frequency (right??), and a big reduction in power draw (right???) … but reality isn’t as sweet. This is one reason I am not super excited about a 28nm console. I think console makers are looking at slightly reducing their silicon footprints from last gen and with additional chip manufacturing issues (and eye toward future reduction) and the dirty details about what a node reduction spell out, in my below math, a 2GFLOPs and 50GT/s GPU on 28nm (roughly AMD HD6870 performance—a far cry from the 3+ GFLOPs 70GT the above simple projection would indicate).
Persuade me: Give me intelligent reasons why I should keep my hopes up for a 3+ GFLOPs monster console GPU at 28nm.
Until then, let me convince you, and depress you, that a 28nm GPU in 2013 in a console is going to be no 2x 6850 but instead a single 6870-like chip.
Let’s start with budget. Last gen was about 230-260 mm^2 range at launch for GPUs in a console. We should consider this the upper bounds for silicon next gen as processes haven’t reduced chip costs significantly and the advent of motion controls and importance of storage media will be pressing on silicon budgets.
Let’s be conservative and see how a similar budget on 28nm would look like. Some basic information:
28nm is half the size for the finest geometries (e.g. SRAM) compared to 40nm. Logic is not as dense.
40nm is mature so 28nm won't be as robust, will be more expensive, and have lower yields. 80% scaling is optimistic IMO.
Architectural and efficiency differences aside (Xenos > RSX) last gen consoles look something like this (from memory, and depending no how you count, so don’t shoot me as I know numbers below are wrong as I did this from memory on a lunch break but I wanted some context):
Xenos 500 230 250? 218? 16 8
RSX 550 255 300? 230? 24 8
Barts: 255mm^2, 1700 transistors, VLIW5
6790 840 800 40 16 1344 150
6850 775 960 48 32 1488 127
6870 900 1120 56 32 2016 151
6950 800 1408 88 32 2253 200
6970 880 1536 96 32 2703 250
* 6870 to 6850: 14% drop in Shaders, TMUs, and Frequency; 27% drop in GFLOPs
* 6970 to 6950: 9% drop in Shaders, TMUs, and Frequency; 17% drop in GFLOPs
Let’s acknowledge the following: the TDP scaling on PC GPUs doesn’t fit with a consoles metrics, silicon footprint of PC GPUs is far above the cost tolerances for consoles, 28nm won’t provide 100% real-word increase in transistor density, 28nm is going to be more expensive (yields, demand, general cost of progress, competition) in 2013 than 90nm was in 2005, the success of the Wii in the $250 price bracket has the console manufacturers more price sensitive (complicated issue), the cost of large standard storage and Kinect/Move like devices need to be compensated for in other aspects of the design, new technologies (stacked memory, Silicone Interposers, etc are not free), the RRoD/YLD and the mindfulness to decrease TDP/increase cooling through better coolers on original units* and increase volume, etc. Put all together a 300mm^2 GPU doesn’t seem to be a target console makers will be reaching for.
Assuming an AMD chip a major wildcard will be the transition to GCN / DX11.x+ architectures which will have additional feature costs and overhead not currently represented in the Barts/Cayman models. There will also be a transitioning from VLIW to SIMD (+ Scalar) with South Island (GCN; http://www.anandtech.com/show/4455/amds-graphics-c...).
Four major numbers to keep in mind: (1) a 10% reduction in area from 255mm^2 (Barts) to a more conservative 230mm^2; (2) the 15% redundancy seen in Barts models (6870 and 6850; note that Xenos already had redundancies like this so this will need to be factored into the smaller die size; 15% is aggressive as we see 9% in Cayman); (3) 22% drop (900MHz to 700MHz) in frequency, again aggressive but TDP is a major issue and seeing the TDP drop between a 6790 840MHz and 6850 775MHz even though a 6850 is a faster part; and (4) 80% scaling from 40nm to 28nm.
Applying these to a 40nm GPU 1 & 2 will result in about 25% reduction in functional units from a 6870 and with 3 we are looking at a total performance drop (about 1200GFLOPs) of nearly 40% from a 6870 (2016GFLOPs) and 19% from a 6850 (1488GFLOPs) to this hypothetical GPU on the 40nm process. Scaling first upward to 28nm (80% increase in density; 1.8 * 2016 = 3628GFLOPs) and then reducing for redundancy and the smaller die (about 25%; 3628 * .75 = 2721) arrives at about 2700GFLOPs. We are looking at a net functional unit scaling of a about 35% above a 6870. Reducing the frequency to a more reliable and power efficient 700MHz (22% drop) arrives at about 2100GFLOPs which is between today’s Barts 6870 and Cayman 6950.
Looking at some of these factors and expectations:
* 230mm^2 = smaller side of last gen GPU footprint which ranged from 230-260mm^2; 10% less area than Barts (6870, 255mm^2)
* 2.75B transistors = 230mm^2 Barts style GPU with 80% scaling from 40nm to 28nm
* 700MHz = Less than a Barts 6850 (775MHz, 127W TDP), 6870 (900MHz, 151W), Cayman 6950 (800MHz, 200W), and 6970 (880MHz, 250W). 28nm should bring a solid reduction in power but the increase in transistors is going to scale up power draw. A 6850 is a reasonable 127W considering the 128GB/s of memory bandwidth but a console will need to accommodate the optical drive, HDD, CPU and system memory, etc. With costs (yields/binning) and the RRoD (and YLOD) firmly in memory, conservative clocks will be likely, although the “turbo” features in current GPUs indicate that 700MHz is on the very low end of what should be expected. Comparing a 6790 @ 840MHz 150W max TDP versus a 6850 @ 775MHz 127W max TDP indicates a chip with more functional units and more net performance can use less power than a chip with fewer units at a higher frequency.
* 2100GFLOPs = 80% scaling from 40nm to 28nm minus ~10% for space reduction (Barts 255mm^2 to our 230mm^2), ~15% for redundancy, and ~22% reduction in frequency. GFLOPs may also be hit by the new SIMD+Scalar GCN architecture and DX11.1 overhead as well as additional raster pipelines; on the other hand the figure may be higher, as many units do not need to scale and shaders are often easier to pack in. e.g. It is unlikely ROPs will scale from 32 to 64, there may even be a reduction to 24 or 16 ROPs on a console GPU, so this space may be utilized for more Shader units.
* 76 TMUs, 52.5GT/s = What is a TMU? I am basing this on Barts style TMUs with 56 TMUs at 35% increase of units. For comparison Cayman has 96. A 6870 (56@900MHz) is 50.4GT/s; 6970 (96@880MHz) is 84.5GT/s.
* 16 ROPs, 11.2GP/s = Or 24. A 6870 is 28.8GP/s (32 ROPs @ 900MHz). When Xenos and RSX shipped they had 8 ROPs when competing PC GPUs had 16. Consoles will have some limiting factors, like targeting at most ~2MPixel resolutions (1080p, and possibly 2 x 1080p with 3D / 2 player “split” screen), but most games will be 720p 30Hz scaled up to 1080p; there will also be a limiting factor of memory bandwidth. Consoles are about maximizing resources and 32 (or 64) ROPs doesn’t seem like an investment console designers will make when that area could be spent on more shaders.
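The bullets above boil down to a few multiplications, sketched below in Python. The names are mine and only illustrative. One figure is an assumption that is not stated in the post: I take Barts at roughly 1.7 billion transistors. The 1.35 unit scaling is the rounded "about 35%" figure from earlier.

```python
# Assumption (not in the post): Barts is roughly 1.7 billion transistors.
BARTS_TRANSISTORS_BILLION = 1.7
BARTS_AREA_MM2 = 255
TARGET_AREA_MM2 = 230
DENSITY_SCALING = 1.8        # 40nm -> 28nm, as quoted in the post
UNIT_SCALING = 1.35          # the post's rounded net unit gain over a 6870
BARTS_TMUS = 56
TARGET_CLOCK_MHZ = 700


def fill_rate_gpix(rops, clock_mhz):
    """Peak pixel fill rate in GP/s: one pixel per ROP per clock."""
    return rops * clock_mhz / 1000


# Transistor budget: shrink the die area, then apply the density gain.
transistors_billion = (BARTS_TRANSISTORS_BILLION
                       * TARGET_AREA_MM2 / BARTS_AREA_MM2
                       * DENSITY_SCALING)

# Texture units: Barts count scaled by the net unit gain, rounded.
tmus = round(BARTS_TMUS * UNIT_SCALING)

fill_16 = fill_rate_gpix(16, TARGET_CLOCK_MHZ)
fill_24 = fill_rate_gpix(24, TARGET_CLOCK_MHZ)
fill_6870 = fill_rate_gpix(32, 900)

print(f"Transistors: ~{transistors_billion:.2f}B")
print(f"TMUs: {tmus}")
print(f"16 ROPs @ 700MHz: {fill_16:.1f} GP/s")
print(f"24 ROPs @ 700MHz: {fill_24:.1f} GP/s")
print(f"6870 (32 ROPs @ 900MHz): {fill_6870:.1f} GP/s")
print(f"16-ROP part vs 6870 fill rate: {fill_16 / fill_6870:.0%}")
```

Even the 24 ROP option stays well under a 6870 on raw fill rate, which is the trade being argued for: spend the area on shaders instead.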
I think this is on the conservative side. The CPU will be smaller than Xenon IMO (and most certainly CELL) and even this conservative GPU considers the fact more processing will be sent to the GPU.
I would like to think the above is wrong (or a reason not to have a new console on 28nm in 2013!) Putting this into perspective this theoretical GPU is a hair faster than a 6870 in GFLOPs and GT/s but not even half as fast in fillrate.
Model        Clock(MHz)  Shaders  TMUs  GT/s  ROPs  GP/s  GFLOPs  TDP(W)
Theoretical  700         -        76    52.5  16    11.2  2100    -
6790         840         800      40    33.6  16    13.4  1344    150
6850         775         960      48    37.2  32    24.8  1488    127
6870         900         1120     56    50.4  32    28.8  2016    151
6950         800         1408     88    70.4  32    25.6  2253    200
6970         880         1536     96    84.5  32    28.2  2703    250
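The rate columns in the table are all derived from the unit counts and the clock, so they can be recomputed with a few lines of Python. This is only a sketch with my own names; it assumes 2 FLOPs per shader per clock (one multiply-add) and one texel or pixel per TMU or ROP per clock. The GFLOPs per watt column is an extra I added to show the 6790 versus 6850 point about power.

```python
# (clock MHz, shaders, TMUs, ROPs, TDP W) as listed in the table.
CARDS = {
    "6790": (840, 800, 40, 16, 150),
    "6850": (775, 960, 48, 32, 127),
    "6870": (900, 1120, 56, 32, 151),
    "6950": (800, 1408, 88, 32, 200),
    "6970": (880, 1536, 96, 32, 250),
}


def derived(clock_mhz, shaders, tmus, rops, tdp_w):
    """Return (GFLOPs, GT/s, GP/s, GFLOPs per watt) for one card."""
    gflops = shaders * 2 * clock_mhz / 1000   # multiply-add = 2 FLOPs per clock
    gtexels = tmus * clock_mhz / 1000         # one texel per TMU per clock
    gpixels = rops * clock_mhz / 1000         # one pixel per ROP per clock
    return gflops, gtexels, gpixels, gflops / tdp_w


results = {name: derived(*spec) for name, spec in CARDS.items()}

for name, (gflops, gtexels, gpixels, per_watt) in results.items():
    print(f"{name}: {gflops:7.1f} GFLOPs  {gtexels:5.1f} GT/s  "
          f"{gpixels:5.1f} GP/s  {per_watt:5.2f} GFLOPs/W")
```

On these numbers the 6850 delivers more GFLOPs per watt than the 6790 despite the lower clock, which is the reasoning behind picking a wide chip at a modest frequency.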
As for cost I think the above leaves a lot of budget for a competitively priced console. One could argue to reduce things even further, but there is always a base cost and things only get so small. Looking at the retail costs of these models (6790 1GB $149, 6850 1GB $179, 6870 1GB $239, 6950 1GB $259, 2GB $299, 6970 2GB $299) and considering the fact that AMD, their distributors, and the retailers all take a cut indicates that even at the high end for a 1GB 255mm^2 chip ($239 retail) the actual cost is far below this, which makes it a viable console part. As of today I see 6950’s 1GB at $220, 2GB $239, and 6870’s 1GB at $155 at NewEgg (11/29/2011).
There you have it folks: next gen graphics can be bought, right now, for about $155 (CPU not included).
If there is a glimmer of hope it is that if AMD/Distributors/Retailers can all make their cut on a $155 product now, then after a node reduction and with a mild “loss leader” model you would think and hope that a $299-$399 console could pack in a lot more punch. But I don’t think the console makers are thinking along these lines.
And I haven’t even touched memory.
I will throw out this wild card: I think the smaller design above fits well with the cost considerations of stacked memory (higher performance, low power) and a Silicon Interposer (SI). The bigger your GPU the more expensive the SI as it has to fit the GPU and Memory.
We may see a setup with 1GB of very fast memory for the GPU (with an additional 2GB for system memory) and a GPU, following the above, but some concessions to get the GFLOPs up a bit as I think, at least for MS, a lot of processing will be moved to the GPU.
GPU Stats: http://en.wikipedia.org/wiki/Comparison_of_AMD_gra...
* As new models get new processes the cost of cooling will also go down. So an extra $5-$10 on better cooling (compared to a 360) on a launch unit is easier to justify as this cost will be reduced on new models where more aggressive cooling is not necessary.
Time to go back and compare my 2006 predictions!
Btw, this theoretical GPU, with 2GB UMA, would be about 10x faster in raw metrics (GFLOPs, Texturing, etc) with a 4x increase in memory compared to current consoles. Factor in the 2.25x cost for 1080p (and double again for full 3D) and quite frankly: This is not really impressive. 28nm may not be dense enough to deliver a true *traditional* next gen experience at the budgets console makers will likely be looking at. 20nm with FinFETs and the hopeful emergence of relatively affordable memory stacking may offer a huge jump over 28nm. The issue is TSMC probably won’t have solid product until 2015… if they don’t choose to chuck the roadmap. Again.
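To put numbers on the 2.25x remark, here is a small Python sketch. The names are mine, and dividing the raw gain by the pixel count is a deliberately crude assumption (not every cost scales with resolution), so read the last two figures as a rough illustration only.

```python
RAW_GAIN = 10.0   # the "about 10x" raw-metric gain quoted above


def pixels(width, height):
    """Pixels per frame at a given resolution."""
    return width * height


p720 = pixels(1280, 720)
p1080 = pixels(1920, 1080)

cost_1080p = p1080 / p720          # pixel cost of 1080p relative to 720p
cost_1080p_3d = cost_1080p * 2     # two views per frame for full 3D

# Crude assumption: the raw gain is spread evenly over the extra pixels.
gain_per_pixel_1080p = RAW_GAIN / cost_1080p
gain_per_pixel_3d = RAW_GAIN / cost_1080p_3d

print(f"720p: {p720} pixels, 1080p: {p1080} pixels")
print(f"1080p costs {cost_1080p:.2f}x, 1080p 3D costs {cost_1080p_3d:.2f}x")
print(f"Per-pixel gain at 1080p: ~{gain_per_pixel_1080p:.1f}x")
print(f"Per-pixel gain at 1080p 3D: ~{gain_per_pixel_3d:.1f}x")
```

Seen that way, a 10x part at 1080p has only a bit over 4x to spend on each pixel, and about 2x with full 3D.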
Ps- Sony/MS please, one of you, prove me wrong.
Since 7576 Days
Since 7547 Days
Who knows. Nothing is set in stone. Take that GPU AMD showed off with stacked memory on a Silicon Interposer (SI).
Stacked memory is 3D memory (Hypercube, etc) and allows for very wide I/O (i.e. a ton of bandwidth at low power and a small footprint onboard). Think eDRAM bandwidth to the entire VRAM.
SI is basically making a chip based PCB. Know how some laptops (and RSX in PS3) have a small discrete PCB with the memory and GPU? Now think of that PCB being swapped out for a chip made on an older, cheap process (like 65nm). This allows for a HUGE amount of interconnect at low power costs. Again, think massive bandwidth.
Neither of these will come cheap but together it is like turning your chip memory into eDRAM. A GPU with a ton of compute power, paired with this kind of high bandwidth, low latency memory configuration, would be a viable way to offload a lot of tasks to the GPU.
So from a design perspective the GPU mentioned above on a board like this would offer a nice closed box solution... but I cannot help pining for another 30mm^2 for a big boost in FLOPs, or for waiting until 2014 when 20nm will be in volume production.
http://www.electronicsweekly.com/Articles/04/11/20...
http://www.electronicsweekly.com/blogs/david-manne...
Pwn'd by Phaethon360.
Since 7570 Days
Since 7448 Days
I DO solidly agree with you on the negatives of launching a new console at 28nm in 2012; it would be both expensive and sluggish on the manufacturing side. But I don't think MS or Sony are going to delay until 2014 for 20nm. Hell, even a little patience (about 6 months' worth into 2013) would provide more efficient and cost effective fabrication. I think THIS is what a lot of people (especially the ones demanding new consoles at this moment) are missing: launching now essentially means spending more... for less.
That being said, I think your predictions hinge mostly upon the belief that Microsoft and Sony will be reaching for the lowest hanging fruit.
I don't know if I would call that depressing, I don't just game on consoles. And while it's a well laid out argument (impressive honestly) with all of its points fine tuned... I hope you aren't offended if I accuse you of being a little guilty of playing it SAFE with your predictions, and then issuing a challenge by saying "come up with something better". I noticed no one really wanted the challenge in the B3D forum either. Surely you can recognize that a bolder prediction this early would sound too much like a political promise and make that individual sound like a fool defending it.
Sure we could mull over the possibilities, but most gamers wouldn't fully grasp or appreciate the discussion to its fullest. Console gamers need things broken down for them in much simpler terms. Their questions are a lot simpler too. They ask things like, "Will the next gen consoles really have Avatar like graphics?" It's not that I want to disappoint them or step on their enthusiasm, but some of them just don't get it. Avatar was rendered by 4,000 servers, and NOT even in real-time. Why are gamers expecting that kind of performance out of a gaming console? "Will these rumored specs be capable of running Epic's Samaritan demo?" And then I inform them that the Samaritan demo was running on 3 GTX 580s in SLI. Thankfully Sweeney provided a response when comparing to the current consoles-
There are other issues that will prevent the next consoles from being delayed until 2014. For one, Microsoft WANTS to launch before Sony the way they did last time; it gave them an entire year, exclusives (some timed) and a clear strategic advantage. This is going to force things with Sony, who is not likely to allow Microsoft to have an entire year to itself all over again. They may launch later (meaning I don't see them launching their consoles at the same time), but not by much. Sony approached the current generation arrogantly believing that Blu-ray and CELL made them superior. Despite the Wii's low graphical profile, and Microsoft's non-HD storage and nightmarish engineering woes, Sony STILL came in dead last in EVERYTHING. From hardware to software to add-on/peripheral sales, they were beaten, and all of the predictions about the PS3 overtaking Nintendo and Microsoft later in the generation ALL FAILED. That's not to say that Microsoft launching first came without disadvantages of its own. There were almost immediate supply difficulties, last generation media storage, forgettable launch titles (anyone remember Perfect Dark was supposed to be what Halo CE was to the Xbox 1?) and almost two full years before developers were ready with key franchises.
Eh, I'm sorry I started this, but I have to get ready for work. I'll come back and finish this up later :)
Since 7422 Days
"That" = That.
Since 6648 Days
Tbh, the biggest issue we should be worried about is this rumour about Kinect 2, its inclusion with the console and the different SKUs. If Kinect 2 comes as standard with the console we will end up getting far less actual hardware for our money. They will have a budget based on what they can sell it for, and the camera will eat up a certain portion of that budget, coupled with the rest of the standard peripherals that are essential and how much of a hit MS are willing to take on it.
Then the rumour of multiple SKUs is somewhat scary. MS had a long fight over the course of the 360 lifespan over how to differentiate the SKUs and they learnt that storage is required by everyone, and surely for them to sell digital goods to as many people as possible it makes sense not to divide storage into haves and have nots. Then how do you differentiate? Amount of storage? Will you really charge £100 more for double the storage or whatever, and will your hardcore audience buy that?
Of course there's also the question of what optical media format you use. With Blu-ray you're paying your competitor, DVD is dead and has been for years, USB sticks maybe but that would eat into your bottom line too. Either way you need to be able to hold at LEAST 50GB (remember Sony aren't gonna just put another basic Blu-ray player in the PS4).
I don't expect MS to match what Epic want in their spec improvements. I'd say the budget is $399 or thereabouts; $499 and above is known to be unworkable even in better times so there is no way they will aim for that.
Since 6463 Days
Prepare To Drop!!
Since 7547 Days
That being said, in 2013, I can see a really elegant console using the specs I laid out for TDP (<130W for the GPU/Memory as system TDP really won't crack above 200W) and mm^2 (230-260mm^2 for the GPU):
(a) Shader units aren't really big, so certain architectural gains from moving to a new process (a new node often means the same design can be more compact above and beyond density gains), reduction in non-console features (sideport), reduction in performance features that don't mesh with more targeted platforms (e.g. 32 ROPs for a console is probably overkill), etc should allow for a lot more shaders. Even moving up from 230mm^2 to 240mm^2 could allow for a large bump in throughput.
(b) Stacked memory, which probably (?) would be available to closed designs in 2013, will allow a big drop in footprint, power draw, and PCB complexity. The benefit will be that it supports much higher bandwidth and lower latencies. Essentially you get the benefit of eDRAM for your entire graphics buffer.
(c) Silicon Interposer (SI) to allow extremely fast communication between the GPU and memory with minimal PCB complexity, TDP, etc.
Now I must admit AMD themselves are the ones hinting at this direction so these aren't my ideas. But that is why they are semi-exciting and they make a lot of sense on a closed box. Basically with the above scenario we could see a 3TFLOPs GPU with crazy fast memory. It may not be the fastest product on paper but it is essentially another "Xenos" where a lot of common bottlenecks are alleviated. If we learned anything this gen from Xenos vs. RSX, it is that peak performance isn't all it is cracked up to be. With unified shaders (far better vertex performance), decoupled texture units (not tying up pixel shaders, and allowing solid performance out of fewer units), the SM3.x shader model (some things can be done with fewer instructions), and the eDRAM (peak fillrate), what Xenos did was make a lot of common bottlenecks more manageable. There are areas where RSX is faster (some Pixel Shaders), has some extra features (certain FP blending modes, PCF filters), is easier to manage (no tiling), or has better quality (e.g. many games have better AF), but, taking RSX as a stand-in for the traditional PC part, these things don't matter as much as limiting bottlenecks, speeding up the most expensive parts of rendering, being flexible (read: but not inane lists of check-point items as NV often does), and IMPORTANTLY fitting into your console budgets with the best bang for buck.
So I do see a scenario where even with these modest budgets but "high end" for consoles (but they are the same budgets the current consoles had at 90nm) a really great console could be tossed out at reasonable "costs" in terms of chip and TDP -- all the while leaving enough spending cash for Kinect 2 or whatever.
What I don't see happening is what Sweeney wants, at least not at the "gross" level.
e.g.
8-12x CPU based on the raw flops (e.g. about 80 for Xenon so 800GFLOPs for a Xenon II) is totally not going to happen on the CPU (unless Sony resurrects Cell from the dead).
8-12x the CPU based on the cores, about 24-36 Xenon cores, also is not going to happen. There are the design challenges, but also the question: Do you want that many Xenon cores? And let's say you tell me, "Look Ace, you CAN fit that many cores on 28nm in the same mm^2 as 90nm Xenon." I would counter, "Ok, but the TDP is going to be MUCH higher. MUCH. While there are big leaps in possible density, the trade-off is that, more often than not, the same area with more transistors sees a huge leap in power leakage." So while they may get 10x the gates at 28nm they may only get 5x the gates at the same TDP as 90nm. *THAT is a problem*.
Now I can see a lot of scenarios where the following IS 8-12x faster: Move up to 6 logical cores with 12/24 threads (IBM has this tech), go OOOe and clean up the horrible stalls in Xenon (huge issue), go with an upgraded/wider next gen VMX, and you have a chip a lot faster than Xenon and within the budgets of the Xenon TDP at 90nm (and smaller as well). And if I am MS I am saying, "And if you really need a boatload more performance you need to start using GPU compute" and shift resources over to the shader array ... or plug an array onto the CPU. Personally, in my opinion a company like MS is better off offering a handful of nice CPUs and a lot of threads and then a huge GPU/Stream array rather than trying to fit in the middle with an SPE design. Anyhow, you could get that desired CPU performance from a design like this in some cases.
Btw, 20x the compute over Xenos would be over 4TFLOPs. Maybe he would count "effective flops" and you could say current designs are more flexible, efficient, and robust. The 10x triangle and raster is dependent on what he wants... we are now looking at designs with 1-2 vertices per clock, and in the 750MHz range that is a 50-300% faster setup rate. With tessellation and the huge unified shader arrays they are already at 10x triangle performance.
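For reference, the multipliers being thrown around work out as in the Python sketch below. The names are mine. The 80GFLOPs Xenon figure and the 2100GFLOPs theoretical GPU are from this thread; the 3 Xenon cores and the roughly 240GFLOPs for Xenos are my assumptions, not figures quoted here.

```python
XENON_GFLOPS = 80          # rough figure used in the post
XENON_CORES = 3            # assumption: Xenon core count
XENOS_GFLOPS = 240         # assumption: Xenos peak shader throughput
THEORETICAL_GPU_GFLOPS = 2100   # the conservative 28nm GPU from earlier

CPU_MULTIPLIERS = (8, 12)  # the 8-12x CPU target
GPU_MULTIPLIER = 20        # the 20x compute target

# CPU target expressed as raw GFLOPs and as an equivalent Xenon core count.
cpu_target_gflops = tuple(m * XENON_GFLOPS for m in CPU_MULTIPLIERS)
cpu_target_cores = tuple(m * XENON_CORES for m in CPU_MULTIPLIERS)

# GPU target in TFLOPs, and how far the theoretical GPU gets toward it.
gpu_target_tflops = GPU_MULTIPLIER * XENOS_GFLOPS / 1000
theoretical_gain = THEORETICAL_GPU_GFLOPS / XENOS_GFLOPS

print(f"CPU target: {cpu_target_gflops[0]}-{cpu_target_gflops[1]} GFLOPs, "
      f"or {cpu_target_cores[0]}-{cpu_target_cores[1]} Xenon cores")
print(f"GPU target: {gpu_target_tflops:.1f} TFLOPs")
print(f"Theoretical GPU vs Xenos: {theoretical_gain:.2f}x")
```

Under those assumptions the conservative GPU gets a bit under 9x Xenos, so it covers less than half of the 20x compute target.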
I am behind Sweeney, I think someone can cram that into a box if they want without breaking the bank, but does either company have the stomach for it? If we are looking at 2013 and one company hit those marks while the other hit 90% of them, the company that aimed higher would be in a world of hurt in terms of costs. I think there is a little check going on here. Of course if one aims too low (think a 30% performance difference in real applications) there could be some trouble for them as every game would run/look visibly poorer.
Should be an exciting 24 months ahead!
Pwn'd by Phaethon360.
Since 7459 Days
Phhfft like I care, Apple gonna win next gen anyway .. /sarcasm
Rumours suggest Wii-U could be using stacked memory, so it's fair to assume, if that's true, that MS will go this route also.
As for the rest of it, well, yeah, anyone can sit and crunch numbers based on old tech, but MS and Sony will still customise the parts and change the architecture, just like Xenos had eDRAM for example.
Let's not forget that features like tessellation will be much more useful on a closed system when developing games also.
Marumaro for the WIN !!
Since 6648 Days
Can't wait to see Forza tbh, menu car quality should definitely be possible ingame along with another bump in world geometry quality.
Wii-U I'm kinda writing off anyway; it's just Nintendo coming to this generation with another novelty controller, and running the extra displays that the controller requires will have to have an impact on overall performance, not to mention that they have no experience with online. MS announcing next year, especially if they have games running on the hardware, could kill the Wii-U before it even launches.
MS' big problem is that early adopters were burnt last time. Are they gonna offer a decent warranty out of the gate or are they hoping people have forgotten the first 2 years of the 360?