This post is about my new radiosity-based global illumination renderer.
Global Line Radiosity
First, there are a few different names in use for more or less the same thing:
- Global Line Radiosity
(this is the name used in the book: Advanced Global Illumination, 2nd edition)
- Stochastic Radiosity
(this actually covers a bit more, but fits here too)
- Global Line (Quasi) Monte Carlo Radiosity
- Global Raybundle Tracing
- Light Field Propagation, Intersection Field …
The classic Radiosity Algorithm in (very) short is:
Divide the scene into small patches, compute the form factors between all pairs of patches, and solve a big matrix.
The form factors can simply be seen as probabilities: the form factor F(A→B) is the probability that a ray emitted from patch A intersects patch B (or, equivalently, the fraction of energy leaving A that B receives). So when emitting a number of cosine-distributed rays over the hemisphere of A, the fraction of rays that hit B approximates the form factor. The same works, though less accurately, with rasterization: render the scene from A and count the rastered B-pixels.
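The probability interpretation is easy to check numerically. The following toy sketch (my own illustration, not code from the renderer) estimates the form factor from a differential patch to a unit square hovering at distance 1 above it by shooting cosine-distributed rays and counting hits:

```java
import java.util.Random;

public class FormFactorEstimate {
    // Toy illustration: the form factor F(A->B) from a differential patch A
    // (at the origin, normal +Z) to a unit square B at height z = 1 equals
    // the probability that a cosine-distributed ray from A hits B.
    static double estimate(int numRays, long seed) {
        Random rng = new Random(seed);
        int hits = 0;
        for (int i = 0; i < numRays; i++) {
            // cosine-weighted hemisphere sample (Malley's method)
            double u = rng.nextDouble(), v = rng.nextDouble();
            double r = Math.sqrt(u), phi = 2.0 * Math.PI * v;
            double x = r * Math.cos(phi), y = r * Math.sin(phi);
            double z = Math.sqrt(1.0 - u);
            // intersect the ray (origin, direction (x,y,z)) with the plane z = 1
            double t = 1.0 / z;
            if (Math.abs(x * t) <= 0.5 && Math.abs(y * t) <= 0.5) hits++;
        }
        return (double) hits / numRays; // hit fraction ~= form factor
    }

    public static void main(String[] args) {
        System.out.println("F(A->B) ~ " + estimate(1_000_000, 42L));
    }
}
```

For this configuration the analytic value is roughly 0.239, and the hit fraction converges to it as the ray count grows.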
In Global Line Radiosity, rays are shot uniformly over the scene’s bounding sphere, and the intersections along each ray are used to propagate energy.
The cool thing here is that these global lines, and the intersections along them, can simply be created with the hardware rasterizer and order-independent transparency (OIT). A few years ago this was done using the A- or K-buffer or depth peeling, but more recent OpenGL versions make it possible to use techniques such as Per-Pixel Linked Lists (PPLL) or the Dynamic Fragment Buffer (DFB). There is a nice chapter about OIT in “OpenGL Insights” (chapter 20) and, even better, the source from Pyarelal Knowles on github. His work was very helpful for me to get started with OIT.
Being able to use the hardware rasterizer to compute GI was the main reason I went for radiosity this time. Back when I coded my pathtracer I needed to build a spatial data structure (BVH and octree) to speed up ray-scene intersections. Creating the BVH (using SAH) took a few seconds, depending on the scene, on my old computer, which is now about 7 years old. I guess there are smart solutions to speed up BVH creation, but you also have to care about updating the structure in case the scene changes, and so on.
Another point for radiosity was that GI gets computed for the whole scene at once (object space), not only for the current view direction (screen space). But this is also radiosity’s biggest weakness: view-dependent effects have to be added separately, and materials more complex than perfectly diffuse ones are not for free either.
As shown in the video at the end of the post, the renderer starts immediately after the scene is loaded. While the display renderer is just responsible for displaying the radiosity, the GI renderer works in the background and can be turned off after a few seconds.
Other Radiosity features:
- no shadow-rays are needed
- lights can be of any shape: any face can emit light.
- the number of light sources doesn’t affect performance.
Raytracing-based renderers usually slow down when the number of lights increases. There are solutions for this, like creating separate data structures for lights (light clusters), e.g. Light Cuts.
- etc …
There are a few papers out there (I found very little about it in books) that cover Global Line Radiosity in its different variations and were a source of inspiration for me:
- Approximate Radiosity Using Stochastic Depth Buffering
- Global Illumination using Parallel Global Ray-Bundles
- Light Field Propagation and Rendering on the GPU
- Fast Global Illumination Baking via Ray-Bundles
- … and a few more
I coded everything in Java using OpenGL 4.3 and GLSL 4.3.
Here is how it basically works (again, very briefly): render the scene into the OIT buffer to generate the list of intersections per fragment, map the lightmap texels into the OIT buffer, and update them.
That’s actually the whole algorithm, but as usual the devil is in the details. The following is about removing the devil from the details a bit.
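To make the update step a bit more concrete, here is a small CPU sketch of the energy transport along one global line (all names here are my own illustration; in the renderer this happens in GLSL on the sorted OIT fragment list). Two consecutive intersections exchange energy exactly when their front sides face each other along the line:

```java
import java.util.Comparator;
import java.util.List;

public class LinePropagation {
    // One intersection of a global line with the scene, i.e. one OIT fragment.
    static class Frag {
        double depth;      // position along the line
        boolean facesRay;  // true if the surface front points along the line direction
        double radiosity;  // exitant energy from the previous pass
        double incident;   // energy gathered in this pass
        Frag(double depth, boolean facesRay, double radiosity) {
            this.depth = depth; this.facesRay = facesRay; this.radiosity = radiosity;
        }
    }

    // Single walk over the depth-sorted list: fragment a (closer) and b (farther)
    // see each other iff a's front faces forward and b's front faces backward.
    static void propagate(List<Frag> frags) {
        frags.sort(Comparator.comparingDouble(f -> f.depth));
        for (int i = 0; i + 1 < frags.size(); i++) {
            Frag a = frags.get(i), b = frags.get(i + 1);
            if (a.facesRay && !b.facesRay) {
                b.incident += a.radiosity;
                a.incident += b.radiosity;
            }
        }
    }
}
```

Averaging the gathered `incident` values over many line directions, and combining them with albedo and emission, gives the next radiosity estimate that gets written back into the lightmap.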
Here is an overview of the most important parts I had to take care of:
- order independent transparency
- shadow mapping
- light maps, texture atlas
- HDR lighting
- program-architecture (model, gui, …)
The program parses OBJ files and sets up an internal indexed-face-set data structure. After loading the file, the whole scene is wrapped in VBOs and textures and stored on the GPU. Although the OBJ format is not optimal in terms of storage and loading time (ASCII), it is supported by all CAD/DCC programs I know, and is therefore quite convenient for generating test scenes.
Order Independent Transparency
I started by analysing the code from Pyarelal Knowles to understand what it takes to implement OIT in OpenGL.
To get a list of intersections per fragment, the scene is rendered from a random direction using orthographic projection, and for each fragment a per-pixel linked list is created. The random directions are created by uniformly sampling the scene’s bounding sphere using Halton sequences (bases 2, 3, 5). For computing the bounding sphere there are several algorithms available; I’m currently using the bouncing-bubble algorithm.
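For reference, the radical inverse driving a Halton sequence is only a few lines. The sketch below (my own, not renderer code) maps the first two components to a uniform direction on the unit sphere; the third base (5) could drive something else, e.g. a rotation:

```java
public class Halton {
    // Radical inverse of index i in the given base: the standard
    // Halton low-discrepancy sequence component.
    static double radicalInverse(int i, int base) {
        double inv = 1.0 / base, f = inv, result = 0.0;
        while (i > 0) {
            result += (i % base) * f;
            i /= base;
            f *= inv;
        }
        return result;
    }

    // Map a 2D Halton point (bases 2 and 3) to a uniform direction
    // on the unit sphere.
    static double[] uniformSphereDir(int i) {
        double u = radicalInverse(i, 2);   // -> z = cos(theta) in [-1,1]
        double v = radicalInverse(i, 3);   // -> phi in [0, 2*pi)
        double z = 1.0 - 2.0 * u;
        double r = Math.sqrt(Math.max(0.0, 1.0 - z * z));
        double phi = 2.0 * Math.PI * v;
        return new double[]{ r * Math.cos(phi), r * Math.sin(phi), z };
    }
}
```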
One obvious problem of OIT is the unpredictable amount of memory usage. For each fragment some additional information needs to be stored for later use, which currently costs me 16 bytes of fragment data plus 4 bytes for the next-pointer per list node, and 4 bytes for each head pointer. So e.g. for a resolution of 1024×1024 and an average of 100 intersections per fragment this makes:
((16+4)*100 + 4) * 1024*1024 bytes = 2004 MB ≈ 2 GB !!!
Since my graphics card, a GeForce GTX 550 Ti, only has 1 GB, this is clearly a problem. To handle it, I reduce the OIT resolution for most scenes to somewhere between 256×256 and 512×512. Also, not all fragments produce a list: the whole scene is inside the bounding sphere, so there is a lot of empty space. I’m also using a tiled renderer which subdivides the view frustum as soon as an out-of-memory error is thrown by OpenGL and re-creates the OIT buffers for each tile. This is not really well developed yet and definitely needs more thinking.
A second problem is that the list of fragments needs to be sorted, both to find successive texels later and, eventually, for transparency in radiosity. The sorting (as done in Pyarelal Knowles’ code) can be done by locally converting the linked list to an array and then sorting the array. However, this requires a constant array size, which was not an option for me, because I need all fragments no matter what the depth complexity turns out to be. Besides that, huge local arrays are a real fps-killer. So now I’m sorting the per-pixel linked list in place. I tried merge sort, insertion sort and selection sort; performance-wise merge sort is a little faster than insertion sort. But in general the in-place sorting was quite a lot faster than the copy-PPLL-to-array-and-sort method.
As a final note, I have the feeling that there is a lot of room for optimization here. I will probably dismiss the per-pixel linked list completely someday and go for the dynamic fragment buffer, or some other method.
To add the contribution of the sun to the GI solution I’m using the simplest form of shadow mapping. No PCF, VSM, CSM, PSM, … or whatever their names are.
The scene is simply rendered into a shadow map from the sun’s jittered direction using orthographic projection. The jittering and the temporal averaging then take care of a proper penumbra.
The extra shadow-mapping pass can be avoided when the scene is lit by an HDRI, or by directly using the OIT direction. But the solution converges a lot faster when the sun is added to the GI iteration step at each pass.
For all other lights in the scene (including skylight/HDR) the OIT buffer is used for sampling.
In the following image only the direct light from the sun is enabled (no GI, no sky, …). The yellow dots on the background are the sun samples (directions!) projected onto the scene’s bounding sphere; the bigger point is the current sample. The size of the sampling area can be controlled in the program. Increasing the sampling area gives results similar to a uniform skydome; minimizing it gives hard shadows without any penumbra. Here the shadow-map resolution is 1024×1024. Shadow mapping is really fast, so the resolution is not that important. More important is the quality of the shadows, especially for the mapping at edges and corners.
As usual in radiosity, the energy is stored in the scene. Often the scene itself gets subdivided into smaller patches. A similar approach is to raster the original triangles of the scene into a texture, so that the resulting texels represent the patches. This has some advantages:
- rendering the scene is still fast
- texture filtering / lookup is simply done by UV-coords
- storage matters
So how to generate the lightmap UV-set? Software packages like 3ds Max have tools for UVW unwrapping; the resulting UV-set is tightly packed with a minimum of open cracks and can immediately be used for radiosity. However, this would create a dependency on another piece of software, and the scenes I tested took quite long to unwrap (sometimes minutes). Another point is that OBJ files can store only one UV-set, so the primary UVs for texturing would be lost. In the end I’m now creating my own lightmaps. As shown in the linked post, the triangle texture atlas caused some severe problems at edges and vertices. In the meantime a lot of things have changed.
The padding between triangles to avoid color bleeding, and the resulting empty space which is then filled by a morphological dilation, is more or less a big waste of space AND time. When there are a few hundred thousand triangles to pack, the packing rate drops dramatically, because the padding texels take up more space than the really important rastered texels, which are supposed to store the scene’s illumination in the end. Also, before doing the packing, it is hard to predict how many pixels are really used for padding, so texel bleeding could still occur. This was something that kept me awake for days/weeks.
The big change now is that I focus more on doing proper sampling than on preparing a proper texture. I still try to keep a padding of 1 pixel, but only to avoid overlaps during rasterization. This requires a new metric for pre-estimating the packing in X and Y. Next I needed to modify the rasterization so that triangle vertices are guaranteed to be rastered (… vertices are covered by pixel centers); I don’t care about proper edge rasterization any longer. All in all, the result is that only the really important texels dominate the texture atlas, which also gives a very good packing efficiency, and I can omit the dilation pass, at the cost of a different way of fetching the right texels. I won’t go further into the details, because this would take a lot longer.
Stefan Reinalter has done a nice posting about “baking signals into textures” where he mentions a few other strategies: http://molecularmusings.wordpress.com/2011/12/30/baking-signals-into-textures/
However, I get quite good results now. The creation of the lightmaps takes only a few milliseconds, even the smallest triangles produce fragments, and the problem of false texel lookups is gone.
So far, so good, but the next issue is already there: storage size. Again, too much to write, so in short: by default I’m using 2K×2K lightmaps, which store the irradiance (HDR) for the front- and back-sides of each triangle. This requires a custom float encoding (mantissa, exponent) that offers enough precision and allows storing both (front- and back-side irradiance) in only one texel. I tried several formats (RGBE, LogLuv, half float, etc.), but all of them introduced errors sooner or later.
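To illustrate the idea (this is a simplified sketch with a made-up bit layout, not my actual encoding): an 11-bit mantissa plus a 5-bit biased exponent per value lets front- and back-side irradiance share one 32-bit channel, at roughly 0.1% relative precision:

```java
public class IrradiancePack {
    // Hypothetical 16-bit layout: 5-bit biased exponent, 11-bit mantissa.
    static final int MANT_BITS = 11, EXP_BITS = 5, EXP_BIAS = 15;

    static int encode16(float v) {
        if (v <= 0f) return 0;
        // smallest e with v <= 2^e, clamped to the representable range
        int e = (int) Math.ceil(Math.log(v) / Math.log(2));
        e = Math.max(-EXP_BIAS, Math.min(e, (1 << EXP_BITS) - 1 - EXP_BIAS));
        int m = (int) (v / Math.pow(2, e) * ((1 << MANT_BITS) - 1) + 0.5);
        m = Math.min(m, (1 << MANT_BITS) - 1);
        return (e + EXP_BIAS) << MANT_BITS | m;
    }

    static float decode16(int bits) {
        int e = (bits >>> MANT_BITS) - EXP_BIAS;
        int m = bits & ((1 << MANT_BITS) - 1);
        return (float) (m / (double) ((1 << MANT_BITS) - 1) * Math.pow(2, e));
    }

    // Two encoded values share one 32-bit texel.
    static int pack(float front, float back) {
        return encode16(front) << 16 | encode16(back);
    }
    static float unpackFront(int t) { return decode16(t >>> 16); }
    static float unpackBack(int t)  { return decode16(t & 0xFFFF); }
}
```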
For nice, realistic environment lighting, HDR light probes are used. The HDRI, at the moment either a horizontal or vertical cross, is stored in a cubemap which gets sampled in the GI shader using the current OIT direction as the lookup vector (either negative or positive, depending on the face normal). At the beginning I had some trouble matching the face IDs to the right HDR location and orientation. For debugging, to identify the cubemap faces, I painted a labeled cubemap HDR and used a sphere’s normal vectors as lookup vectors. The HDR light probe can be used to replace the sun pass (shadow mapping) completely, although it takes a bit longer to converge, because the probability that the OIT direction hits the sun depends on the sun’s size (projected onto the unit bounding sphere). Anyway, it’s a lot of fun to play with: just paint an image in Photoshop (32-bit mode), export it as *.hdr, and use it in the renderer.
Java code for reading HDR files (worked out of the box): https://kenai.com/nonav/projects/jogl/sources/jogl-demos-git/content/src/demos/hdr/RGBE.java
Paul Debevec, HDR light probes: http://www.pauldebevec.com/Probes/
Just a few notes. The main controls here are the resolution of the texture atlas and the resolution of the OIT buffers. Higher resolutions mean more quality, but at the cost of lower performance. For most scenes the following settings are sufficient: oit=512, texatl=2048. Additionally, the display settings matter; the chosen anti-aliasing mode in particular makes a huge difference.
Similar to the MVC pattern (Model-View-Controller), my program is separated into three main parts: the model, the GUI, and some kind of controller. The model has no connection to the GUI, and the GUI is not aware of the model either, but both know the controller. The controller, however, is not responsible for taking/forwarding/(re)directing actions as in MVC. It only defines some controls (SharedControl<T>(…) in my implementation), which can store states, values, objects or whatever. Additionally there is a Controllable interface that user-defined objects (GUI components, the model, whatever) must implement to be able to register with a SharedControl. So when a SharedControl gets changed (keyboard, GUI, model, etc.), all registered Controllables get updated (except the trigger) and decide for themselves what to do with that change. I found this somehow simpler to use than MVC, because the internal GUI logic can be implemented directly in the GUI code, and the same goes for the internal model logic. Also, comparing the amount of required code, my approach was easier to manage in the end. During development I felt it was somehow hard to keep track of where, why and who triggers, who gets triggered, who should trigger, and so on. In the end it’s enough to completely trust the main principle: each attached Controllable is responsible for doing its own thing when an update happens. At least until now I haven’t been trapped in any update loop or similar. But who knows; I’m not that experienced in GUI and application design, so I may find myself changing it as soon as a better solution strikes me.
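Boiled down to its essence, the pattern might look roughly like this (a stripped-down sketch of my own description, not the actual implementation, which carries more state than a single value):

```java
import java.util.ArrayList;
import java.util.List;

public class Controls {
    // A Controllable registers with a SharedControl and decides for
    // itself what to do when the control changes.
    interface Controllable<T> {
        void onControlChanged(T newValue);
    }

    static class SharedControl<T> {
        private final List<Controllable<T>> listeners = new ArrayList<>();
        private T value;

        SharedControl(T initial) { value = initial; }
        T get() { return value; }
        void register(Controllable<T> c) { listeners.add(c); }

        // Everyone except the trigger gets notified of the change.
        void set(T newValue, Controllable<T> trigger) {
            value = newValue;
            for (Controllable<T> c : listeners)
                if (c != trigger) c.onControlChanged(newValue);
        }
    }
}
```

Registering, say, a GUI slider and the model with the same SharedControl keeps them in sync without either knowing about the other.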
One main aspect is how to visualize the radiosity result. I implemented different display modes: albedo, irradiance, exitant radiance, luminance, pseudocolors, etc. Important for the visual quality was handling aliasing. Multisample antialiasing (MSAA) gave the best results. By default I’m using 16 samples, which drops the framerate a lot and uses a lot of memory, but due to my texture-atlas sampling there are disturbing cracks with fewer samples. An interesting side effect of MSAA is that points are actually drawn round; usually they appear as squares on the screen. The second AA mode is FXAA (Fast Approximate Antialiasing), which is a postprocessing effect. It works too, at least better than no AA at all, and has hardly any effect on performance.
Currently my focus is on implementing transparent/translucent materials. As shown in the screenshots, some kind of translucency already works (visible on the leaves of the trees and flowers); it didn’t even require bigger changes to the existing code, nor does it affect performance. Transparency is a bit more complicated, because it needs bigger adjustments in the viewport renderer and the sun pass. Since the OIT fragment list is already sorted, the traversal overhead isn’t that big, but the amount of texture lookups noticeably slows down the framerate. So this is definitely more work. Of course, other effects such as specular reflection, refraction, etc. must be considered too.
However, at the moment I’m busy testing different scenes. It’s just too much fun to model something, open it in the renderer, change lighting conditions, materials, etc., and watch the results.
For each rendering, the important statistics (number of triangles, render time, passes, memory, fps, etc.) are displayed in the HUD on the right. Current settings (light, materials, display) are displayed on the side panel on the left.
System specs for testing:
- CPU: Q6600 2.4Ghz
- RAM: 2048 MB DDR2, 800 MHz
- GPU: GeForce GTX 550 Ti, 1GB
- Java 7
- OpenGL 4.3
- GLSL 4.3
Cornell Box – Lighting
The following shows different lighting situations and different displays of the results.
The scene has 1648 triangles. The surrounding box is closed on all 6 sides. Lighting is done at the lowest quality, so there are quite a lot of artifacts at corners and edges, but in return it’s quite fast: 0.9 ms per pass. The yellow box is translucent; all other faces are opaque. The antialiasing mode is MSAA×16, gamma correction is 2.2. No contrast/brightness editing or any other postprocessing.