Programmer Timothy Lottes, who created Fast Approximate Anti-Aliasing (FXAA) at NVIDIA, first used in The Elder Scrolls V: Skyrim, weighs in on the PS4 and what it may mean for gaming from a technical standpoint.
Over on his blog, Lottes speculates at a highly technical level, but since the information we have has not been confirmed by Sony, his conclusions could turn out to be wrong.
With that said, remember that everything he discusses below assumes that the PS4 specs currently circulating in the open are accurate.
According to Lottes, the “real” reason to get excited about the PS4 is what Sony is doing with the console’s operating system (OS) and system libraries as a platform. If the PS4 has a real-time OS with “libGCM”-style low-level access to the GPU, then PS4 first-party titles would be “years ahead” of the PC, because such access opens up what’s possible with the GPU. He notes that this won’t happen right at launch, but once developers tool up for the platform, that will be the case.
Assuming a 7970M in Orbis (the PS4), for which AMD has already released the hardware ISA docs publicly, Lottes comes up with a logical hypothesis about what programmers may get access to on the PS4.
Below, Lottes takes a look at features that aren’t exposed on PC but are found in AMD’s GCN ISA documents.
Dual Asynchronous Compute Engines (ACE) :: Specifically “parallel operation with graphics and fast switching between task submissions” and “support of OCL 1.2 device partitioning”. It seems that, at a minimum, a developer can statically partition the device so that graphics and compute can run in parallel. On PC, static partitioning would be horrible because of the different GPU configurations to support, but for a dedicated console it is all you need. This opens up a far easier way to hide small compute jobs in a sea of GPU-filling graphics work like post-processing or shading.
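To make the static-partitioning idea concrete, here is a minimal conceptual sketch (not real GPU code; the compute-unit counts and job functions are illustrative assumptions) in which a fixed pool of “compute units” is split once between a graphics queue and a compute queue, and both drain in parallel, in the spirit of OpenCL 1.2 device partitioning:

```python
# Conceptual sketch: statically partition a pool of "compute units" so that
# graphics and compute work run side by side. On a fixed console the split
# can be chosen once, since the hardware never varies.
from concurrent.futures import ThreadPoolExecutor

TOTAL_CUS = 8                # hypothetical compute-unit count on the chip
GFX_CUS, COMPUTE_CUS = 6, 2  # static split: 6 CUs for graphics, 2 for compute

def graphics_job(frame):
    # stands in for heavy, GPU-filling graphics work
    return f"shaded frame {frame}"

def compute_job(x):
    # stands in for a small compute kernel hidden alongside graphics
    return x * x

# Each partition gets its own queue; both are serviced concurrently.
with ThreadPoolExecutor(max_workers=GFX_CUS) as gfx, \
     ThreadPoolExecutor(max_workers=COMPUTE_CUS) as comp:
    frames = [gfx.submit(graphics_job, f) for f in range(3)]
    tasks = [comp.submit(compute_job, t) for t in range(3)]
    gfx_results = [f.result() for f in frames]
    comp_results = [t.result() for t in tasks]
```

The point of the sketch is only the scheduling shape: because the split is static, the small compute jobs never contend with graphics for the same units.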
Dual High Performance DMA Engines :: Developers would get access to async CPU->GPU or GPU->CPU memory transfers without stalling the graphics pipeline, and specifically the ability to manipulate semaphores in the push buffer(s) to ensure no stalls and low-latency scheduling. This is something the PC APIs get horribly wrong, as all memory copies are implicit, without really giving control to the developer. On a console, this translates to significantly better resource streaming.
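The streaming pattern described above can be sketched conceptually: a dedicated “DMA engine” thread performs the copies while the render loop keeps going, with a semaphore standing in for the push-buffer semaphore the pipeline would wait on. All names here are illustrative, not a real console API:

```python
# Conceptual sketch: async resource streaming with explicit semaphores,
# so the render loop only waits when a resource it needs is not yet resident.
import threading

uploaded = {}                   # stands in for GPU-visible memory
ready = threading.Semaphore(0)  # signaled once per completed transfer

def dma_engine(assets):
    for name, data in assets:
        uploaded[name] = bytes(data)  # the asynchronous copy
        ready.release()               # semaphore: this resource is resident

assets = [("texture0", b"\x01\x02"), ("mesh0", b"\x03\x04")]
engine = threading.Thread(target=dma_engine, args=(assets,))
engine.start()

frames_rendered = 0
for _ in assets:
    ready.acquire()       # explicit wait, under the developer's control
    frames_rendered += 1  # rendering proceeds; no implicit pipeline stall
engine.join()
```

The contrast with PC APIs is that the wait is explicit and placed by the developer, rather than an implicit copy the driver inserts wherever it likes.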
Support for up to 6 Audio Streams :: HDMI carries audio, so the GPU actually outputs audio, but no PC driver will give you access. The GPU shader is actually the perfect tool for audio processing, but on PC you have to deal with the GPU->CPU latency wall (which can be worked around with pinned memory), and to add insult to injury, the PC driver then simply copies that data back to the GPU for output, adding more latency. In theory, on something like a PS4, one could mix audio on the GPU directly into the buffer being sent out over HDMI.
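The mixing step itself is the kind of embarrassingly parallel per-sample sum a GPU shader handles well. A minimal sketch, assuming illustrative signed 16-bit samples (this is the arithmetic, not any real console audio API):

```python
# Conceptual sketch: mix several audio streams into one output buffer by
# summing per-sample and clamping to the signed 16-bit range. On a console
# the result could land directly in the HDMI audio buffer with no CPU trip.
def mix_streams(streams, lo=-32768, hi=32767):
    out = []
    for samples in zip(*streams):        # one output sample per position
        s = sum(samples)                 # each position is independent,
        out.append(max(lo, min(hi, s)))  # hence a natural fit for a shader
    return out

mixed = mix_streams([[100, -200, 32000], [50, -100, 32000]])
```

Because every output sample depends only on the samples at the same position, the whole mix maps onto one GPU thread per sample with no synchronization.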
Global Data Store :: AMD has no way of exposing this in DX, and in OpenGL they only expose it in the ultra-limited form of atomic counters which can only increment or decrement by one. The chip has 64KB of this memory, with effectively the same access as shared memory (atomics and everything) and lower latency than global atomics. This GDS unit can be used for all sorts of things, like workgroup-to-workgroup communication, global locks, or doing an append or consume to an array of arrays where each thread can choose a different array, etc.
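The “append to an array of arrays” pattern can be sketched conceptually: each worker atomically claims a slot in the array it chose by bumping a small shared counter, which is the role a GDS-resident counter would play. A lock stands in for the hardware atomic here; the sizes and names are illustrative assumptions:

```python
# Conceptual sketch: per-array append counters in a small shared store.
# Each worker picks an array, atomically claims the next slot in it, and
# writes its value there, with no coordination beyond the counter bump.
import threading

NUM_ARRAYS, CAPACITY = 2, 8
counters = [0] * NUM_ARRAYS  # stands in for GDS-resident append counters
arrays = [[None] * CAPACITY for _ in range(NUM_ARRAYS)]
atomic = threading.Lock()    # stands in for the hardware atomic increment

def append(array_id, value):
    with atomic:
        slot = counters[array_id]  # atomically claim the next free slot
        counters[array_id] += 1
    arrays[array_id][slot] = value # write into the claimed slot

threads = [threading.Thread(target=append, args=(i % NUM_ARRAYS, i))
           for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Slot order within each array depends on thread scheduling, but every value lands in exactly one slot, which is all the append/consume pattern requires.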