Percentage Closer Soft Shadows
One way to reduce the aliasing of traditional shadow maps is to soften the shadow edges. Softening makes the aliasing of depth shadow maps less noticeable and brings the resulting image closer to real-world shadows. It also allows lower-resolution textures to be used without noticeable visual artifacts.
So, for better results, instead of using uniformly soft shadows (VSM, ESM, simple PCF), we implemented the Percentage Closer Soft Shadows method as described by R. Fernando in the whitepaper “Percentage-Closer Soft Shadows”. A slightly modified version of the same algorithm is presented by K. Myers et al. in “Integrating Realistic Soft Shadows into Your Game Engine”.
The technique is cool, since the shadows harden on contact and become softer as the distance from the object increases, but it needs a lot of samples to look visually correct, and that causes performance problems.
Let me explain.
The method as described in the original implementation by R. Fernando uses 6×6 samples (= 36 taps) to find the blockers and 8×8 samples (= 64 taps) to filter the shadow map, with an adaptive step size that is proportional to the calculated penumbra. With kernels of this size the shadows appear soft, but the sampling pattern is visible in the final image. K. Myers et al. suggest using Poisson disks with fewer samples, in order to increase performance and hide the sampling artifacts. While it’s true that using a 16-tap Poisson disk as the PCF kernel increases performance, using the same kernel for every pixel results in recognisable patterns in the penumbra. To achieve better visual quality, both methods require a higher number of samples and/or smaller light sizes.
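To make the three steps concrete (blocker search, penumbra estimate, PCF with an adaptive step), here is a minimal CPU-side sketch in Python. It is not our shader code: the function names, the tiny tap counts, the bias value and the convention that the light size is measured in texels are all assumptions made for illustration. The penumbra estimate follows the similar-triangles formula from Fernando's whitepaper, (d_receiver - d_blocker) * w_light / d_blocker.

```python
def penumbra_width(z_receiver, z_blocker, light_size):
    """Similar-triangles penumbra estimate from the PCSS whitepaper."""
    return (z_receiver - z_blocker) * light_size / z_blocker


def pcss_lit_fraction(depth_map, u, v, z_receiver, light_size,
                      search_half_taps=2, pcf_half_taps=2, bias=1e-3):
    """Return the lit fraction in [0, 1] for a receiver at texel (u, v).

    depth_map is a list of rows of light-space depths. All parameter
    names and defaults are hypothetical, chosen for this sketch only.
    """
    height, width = len(depth_map), len(depth_map[0])

    def sample(x, y):
        # Clamp-to-edge addressing.
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        return depth_map[y][x]

    def grid(half_taps, step):
        # Regular (2*half_taps+1)^2 grid with the given step in texels.
        taps = range(-half_taps, half_taps + 1)
        return [sample(u + round(dx * step), v + round(dy * step))
                for dy in taps for dx in taps]

    # Step 1: blocker search, average the depths closer to the light.
    blockers = [z for z in grid(search_half_taps, 1.0) if z < z_receiver - bias]
    if not blockers:
        return 1.0  # nothing between the light and the receiver
    z_blocker = sum(blockers) / len(blockers)

    # Step 2: penumbra estimate.
    penumbra = penumbra_width(z_receiver, z_blocker, light_size)

    # Step 3: PCF with a fixed tap count and a step proportional
    # to the penumbra, so a wider penumbra spreads the same taps further.
    step = max(1.0, penumbra / (2 * pcf_half_taps))
    taps = grid(pcf_half_taps, step)
    lit = sum(1 for z in taps if z >= z_receiver - bias)
    return lit / len(taps)
```

Because the tap count stays fixed while the step grows, a wide penumbra is covered by the same few samples spaced further apart, which is exactly where the visible sampling pattern comes from.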
To eliminate as many artifacts as possible with the same number of taps, we used a rotation texture in which the red channel stores sin(x) and the green channel stores cos(x) of a rotation angle x. By using such a texture (repeated over the entire screen) we rotate the original Poisson disk, and as a result we have different kernels for every pixel. This technique works quite well because the error is “unstructured” and thus less noticeable (the method was originally proposed by J. R. Isidoro in “Shadow Mapping: GPU-based Tips and Techniques”). The same method seems to be used by Crytek in their shadow algorithm (see “Finding Next Gen: CryEngine 2”).
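The per-pixel rotation can be sketched as follows. This is an illustrative Python version, not the shader we used: the 4-tap `KERNEL` is a made-up stand-in for the 16-tap Poisson disk, and the texture size, seed and function names are assumptions.

```python
import math
import random

# Hypothetical 4-tap kernel standing in for the 16-tap Poisson disk.
KERNEL = [(-0.94, -0.40), (0.95, -0.77), (-0.09, -0.93), (0.34, 0.29)]


def make_rotation_texture(size, seed=0):
    """Build a size x size texture of (red, green) = (sin(a), cos(a))."""
    rng = random.Random(seed)
    texture = []
    for _ in range(size):
        row = []
        for _ in range(size):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            row.append((math.sin(angle), math.cos(angle)))
        texture.append(row)
    return texture


def rotated_kernel(kernel, rotation_texture, px, py):
    """Rotate every kernel offset by the angle stored for pixel (px, py)."""
    size = len(rotation_texture)
    # The modulo tiles the texture over the whole screen.
    s, c = rotation_texture[py % size][px % size]
    # Standard 2D rotation matrix applied to each offset.
    return [(c * x - s * y, s * x + c * y) for x, y in kernel]
```

Storing sin and cos directly means the shader only needs two multiply-adds per tap, with no trigonometric calls at sampling time.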
Depth bias is another problem when using PCF. K. Myers et al. suggest using a slope-based depth bias instead of a constant bias for all the samples. Such an approach requires the calculation of the inverse transpose texture Jacobian, as proposed by J. R. Isidoro in “Shadow Mapping: GPU-based Tips and Techniques”, in order to find the local depth gradient (dz/du, dz/dv). The new pixel z offset is then calculated by taking the dot product between the local depth gradient and the offset vector from the PCF kernel. In our implementation though, this approach didn’t work well, especially in areas where the PCF kernel covered more than one plane. Such problematic cases could be fixed by adding a constant depth offset to all samples, which can be smaller than the offset required in the case of a purely constant depth bias. In the end, we used a constant offset for all samples instead of the above algorithm, in order to increase performance.
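For reference, the slope-based offset boils down to one dot product per tap. The Python sketch below assumes the depth gradient has already been computed (the Jacobian step is left out), and the function names and the sign convention (a sample is lit when the stored depth is not closer than the reference depth) are assumptions for illustration.

```python
def reference_depth(z_receiver, depth_gradient, offset, constant_bias=0.0):
    """Depth to compare against the shadow map for one PCF tap.

    depth_gradient is (dz/du, dz/dv) and offset is the tap's (du, dv)
    from the kernel, both in shadow-map texture space.
    """
    dz_du, dz_dv = depth_gradient
    du, dv = offset
    slope_offset = dz_du * du + dz_dv * dv  # dot(gradient, offset)
    return z_receiver + slope_offset - constant_bias


def is_lit(shadow_map_depth, z_reference):
    """A tap is lit when nothing closer to the light was stored there."""
    return shadow_map_depth >= z_reference
```

On a single tilted plane the slope term follows the surface exactly, so only a tiny constant bias is needed to absorb rounding. When the kernel straddles two planes the gradient of one plane is extrapolated onto the other, which is where we saw the approach break down.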
In order to make things a little faster, instead of attaching an R32F color texture to the frame buffer, we tried an R16F and an RGBA8 (unsigned byte) texture. Performance-wise, both floating point textures gave the same speed (16 FPS on a GeForce 8600M GT). So nothing improved here, and the lack of 32-bit accuracy in the R16F texture introduced new artifacts in the final image. The RGBA8 color buffer is even slower (10 FPS on the same card), because we have to pack and then unpack the depth value in order to maintain 32-bit accuracy. The unpacking is required every time we sample the shadow map (blocker search + PCF), hence the performance penalty.
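To show what the pack/unpack step involves, here is one possible encoding written in Python. It is an assumption for illustration only: it splits a 32-bit fixed-point depth into four bytes, whereas a shader would typically do the equivalent with floating point multiplies and fractions, and the function names are made up.

```python
def pack_depth(depth):
    """Encode a depth in [0, 1) as four 8-bit channels (r, g, b, a)."""
    fixed = int(depth * 2**32)                # 32-bit fixed point
    fixed = min(max(fixed, 0), 2**32 - 1)     # clamp out-of-range input
    return ((fixed >> 24) & 0xFF,             # most significant byte in red
            (fixed >> 16) & 0xFF,
            (fixed >> 8) & 0xFF,
            fixed & 0xFF)                     # least significant byte in alpha


def unpack_depth(rgba):
    """Rebuild the depth from the four channels written by pack_depth."""
    r, g, b, a = rgba
    return ((r << 24) | (g << 16) | (b << 8) | a) / 2**32
```

The packing happens once per texel when the shadow map is rendered, but the unpacking runs for every tap of both the blocker search and the PCF pass, so its cost is multiplied by the total tap count per pixel.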
That’s all for now about PCSS.
Any comments, thoughts or suggestions are more than welcome.
Thanks for reading.
AD
From left to right, top row: RGBA8 with 16×16 grid of samples (= 256 taps), R32F with 16-tap Poisson disk, R32F with 16×16 grid of samples (= 256 taps). Bottom row: irregular filtering by rotation of the Poisson disks with 16 taps.