Graphical breakthroughs in the Retroquad engine
Luminance-Ordered Quadrivectorial Four-Way Quantization. AKA FWQ (Four-Way Quantization).
It's the key to the 8-bit indexed Quadricolor dithering used in the Retroquad engine. This article assumes that you are already familiar with dithering algorithms such as Floyd-Steinberg and with the dithering filter used in the UT99 software renderer.
Retroquad is a game engine with a 3D software renderer that employs many color algorithms I created independently to solve the limitations of 8-bit indexed color systems, pushing their quality beyond the current state of the art. It is intended to be an engine with all the advantages of 8-bit indexed color 3D software rendering, but without the drawbacks of classic 8-bit color renderers.
Among its innovations is a proprietary quantization algorithm that compiles images into a special format that can be dithered in realtime in 3D space, eliminating the texture pixelization that happens in all quantization algorithms that perform dithering in 2D space. This creates 8-bit color textures that look far more faithful to their truecolor counterparts and smoother at any distance, eliminating color restrictions for texture artists.
Other improvements already implemented include a smooth multi-step dithered and color-corrected lighting system that is combined with textures in realtime in a per-pixel fashion, a smooth dithered and color-corrected blending system used for multitextured glowmaps and translucent polygons, dithered interpolation of texture frame animations, dithered trilinear texturemapping, anisotropic texturemapping, a molten texturemapping technique that allows liquid textures to melt through all sides of any 3D object in a seamless fashion across all of its edges, alphamasked textures with semitransparency, scrolling textures, multiplanar scrolling skies with horizon fading, dithered "soft depth" (for smooth shorelines, smooth scenery edges and "soft particle" style sprites), and procedurally generated particles with precomputed dithering.
Planned features for the renderer are dithered colored lighting using a fast 8-bit indexed color pipeline, edge antialiasing using an extremely fast preprocessing technique, dithered fog, texture crossfading (for terrain "blending" and triplanar texturemapping), Unreal-style hybrid skyboxes with alphamasked backgrounds & dynamic skies, perspective correction on character models, and HD texture support on all other parts of the renderer (GUI, HUD, character models and 2D sprites).
Also, its rasterizer doesn't distinguish between my new 8-bit texture format and classic 8-bit textures; both kinds of textures are rendered through the same 8-bit data pipeline, making the code easier to maintain.
The whole engine is being coded in pure C without any processor-specific extensions. Hardware requirements are exactly the same as in classic software-rendered WinQuake, except for more RAM. The absence of processor-specific extensions is deliberate, to ensure that this engine remains highly portable and as hardware-independent as possible. Another goal was to remove and replace as much of the old Quake code as possible, to make its code leaner, more polished and easier to maintain, hopefully turning it into a completely original engine.
However, multi-core processing was also planned, to improve 1080p performance and hopefully allow for 4K rendering without sacrificing the core goals of the project.
When fully polished, Retroquad should be robust enough to become a viable platform for creating new commercial quality independent 3D games that can be very quickly ported to other operating systems, consoles, mobile, and all kinds of obscure hardware and independent hobbyist hardware projects where it would otherwise be technically unfeasible or financially expensive to port them (such as the multitude of Chinese handhelds whose usefulness is limited to emulating old gaming consoles and playing classic software-rendered retro games such as Duke Nukem 3D and Quake).
Another reason for this engine to remain as hardware-independent as possible is to ensure future-proofing for software preservation. Games created using this engine will be able to be easily emulated for many decades in the future, making sure that people will always be able to experience them properly, even if their source is inaccessible or lost.
And finally, the last main reason why Retroquad exists. Since the 90s, I've seen many game engines, graphics chips and software platforms come and go, every one of them becoming obsolete and incompatible over time. I was never able to afford keeping up with them, and my life was always so screwed up that any ambitious project such as a full commercial game would take many years to be completed. Software rendering is the only tech I could use that will surely not suffer from hardware & drivers deprecation, the only tech I can count on to keep developing something and take my time to battle with the depression, poverty, health issues and other roadblocks in life, knowing that when I come back to it, it will still be working. Retroquad was intended to be an engine for people struggling in life, an engine that even if life slows you down and you take decades to create a game, it will still work as intended.
(Dithering comparison)
In FWQ, each vector is unidimensional, containing the quantization of one direction (forward or backward) along a single axis (horizontal or vertical). Thus, four way.
The "four way" aspect means each way is unrelated to each other (unlike, let's say, a diagonal vector, which is bidimensionally related to both axes of a horizontal plane). And because of this, each way can be handled independently. The left way is the only one that diffuses error to the left, the up way is the only one that diffuses error upwards, and so on.
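To make the independence of the four ways concrete, here is a minimal sketch in C. It is not the engine's quantizer: it assumes a grayscale source and a hypothetical 4-entry gray palette, and the names `fwq_palette`, `fwq_nearest`, `fwq_diffuse_way` and `fwq_quantize` are invented for illustration. Each way runs a plain one-dimensional error diffusion that only carries its error to the next sample in its own direction.

```c
#include <stdlib.h>

/* Hypothetical 4-entry grayscale palette (an assumption for illustration). */
static const int fwq_palette[4] = { 0, 85, 170, 255 };

/* Return the index of the palette entry nearest to the value v. */
static int fwq_nearest(int v)
{
    int best = 0, i;
    for (i = 1; i < 4; i++)
        if (abs(fwq_palette[i] - v) < abs(fwq_palette[best] - v))
            best = i;
    return best;
}

/* Quantize one line of `count` samples along a single way.
   `src` points to the first sample visited, `step` is the distance between
   consecutive samples (negative to walk backwards), and `out` receives one
   palette index per sample, addressed the same way as `src`. */
static void fwq_diffuse_way(const unsigned char *src, unsigned char *out,
                            int count, int step)
{
    int error = 0, i;
    for (i = 0; i < count; i++) {
        int wanted = src[i * step] + error;
        int index = fwq_nearest(wanted);
        out[i * step] = (unsigned char)index;
        error = wanted - fwq_palette[index]; /* carried along this way only */
    }
}

/* Produce 4 indexes per texel of a w*h grayscale image:
   out[0] = right way, out[1] = left way, out[2] = down way, out[3] = up way. */
static void fwq_quantize(const unsigned char *src, int w, int h,
                         unsigned char *out[4])
{
    int x, y;
    for (y = 0; y < h; y++) {
        fwq_diffuse_way(src + y * w, out[0] + y * w, w, 1);
        fwq_diffuse_way(src + y * w + (w - 1), out[1] + y * w + (w - 1), w, -1);
    }
    for (x = 0; x < w; x++) {
        fwq_diffuse_way(src + x, out[2] + x, h, w);
        fwq_diffuse_way(src + (h - 1) * w + x, out[3] + (h - 1) * w + x, h, -w);
    }
}
```

Because no way reads or writes the error of another, the four passes could run in any order, and each texel ends up with four indexes that were produced independently.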
(FWQ cubemap skybox)
The unidimensional aspect of each quantization vector also allows the error diffusion to spread seamlessly across cubemaps. This is a unique feature that can't be achieved with any other error diffusion technique because the quantization errors of a cubemap must be spread across 3 axes, instead of only 2. A cubemap has 3 axes and each cubemap axis has 2 quantization axes, but each cubemap axis shares only 1 quantization axis with each other cubemap axis. The final result is that the error diffusion in each side of the cubemap is an intertwined mix of the error diffusion of its four adjacent sides. For example, the texture of the back side of the cubemap will spread its horizontal error diffusion to the textures of the left side and of the right side, while its vertical error diffusion will be spread to the textures of the bottom side and of the top side, essentially "giftwrapping" the error diffusion across the whole cubemap.
While all this works well for generating image data that can be dithered per-pixel in a tridimensional projection, there's another dimension where it didn't look good: Time.
During animations (camera movement, animated texture frames, texture scrolling), the resulting image would look too noisy. This is because in error diffusion there's no way to control which texel will look brighter and which texel will look darker, unlike in ordered dithering. However, since four-way quantization outputs 4 color indexes for each texel, we can reorder the indexes of each texel by luminance, which will smooth out the brightness variance per pixel and eliminate most of the noise.
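The reordering step can be sketched as a small sort over the four indexes of a texel. This is only an illustration: the `rgb_t` type, the Rec. 601 luma weights and the darkest-to-brightest order are my assumptions here, since the text above only states that the indexes are reordered by luminance.

```c
/* Hypothetical RGB palette entry. */
typedef struct { unsigned char r, g, b; } rgb_t;

/* Integer approximation of Rec. 601 luma, scaled by 1000. */
static int luma(rgb_t c)
{
    return 299 * c.r + 587 * c.g + 114 * c.b;
}

/* Sort the 4 subcolor indexes of one texel from darkest to brightest
   (insertion sort; 4 elements are too few to justify anything fancier). */
static void order_by_luminance(unsigned char idx[4], const rgb_t *palette)
{
    int i, j;
    for (i = 1; i < 4; i++) {
        unsigned char key = idx[i];
        int key_luma = luma(palette[key]);
        for (j = i - 1; j >= 0 && luma(palette[idx[j]]) > key_luma; j--)
            idx[j + 1] = idx[j];
        idx[j + 1] = key;
    }
}
```

After this step, a given slot of every texel always holds the darkest (or the brightest) candidate, so the brightness of a screen pixel depends on its position in the pattern rather than on where the diffused error happened to land.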
All that was described so far works well for textures that are close to the camera. It works very well when each texel is rendered to several pixels. However, it doesn't look so good for textures that are far away from the camera (with some texels being skipped from a pixel to another) because the error diffusion in them will also be skipped, resulting in an image that's not smooth enough.
Due to this, I've implemented an extra "error splitting" step in the error diffusion, which dilutes the error to both the next neighbor and the farthest next diagonal neighbor on each way. This makes the four-way quantization spread the error correction evenly across all neighboring texels, effectively making the four-way quantization use an eight-way error diffusion. This looks good on submips, but in some cases it doesn't look so good on mipmap zero, because it reduces the strength of the error diffusion. Also, because this error splitting technique violates the independence of the four ways, it can't be used on cubemaps.
All of this four-way quantization data is stored in a special 8-bit indexed color format called "Quadricolor". In the Quadricolor format, the color of each texel is composed of 4 different subcolors, which means that each texel has 4 different subtexels.
This is fundamentally different from direct-color formats such as 24-bit RGB, because while 24-bit RGB colors are divided into 3 different channel values (one for the red spectrum, one for the green spectrum and one for the blue spectrum), quadricolors are divided into 4 different subcolor indexes, with each subcolor index addressing a predefined color value (which in turn can be composed of 3 channel values, for a total of 12 channel values per Quadricolor).
While smooth colors can be achieved in direct-color formats by simply modifying the value of each channel directly, the way to smooth out colors in the Quadricolor format is to find 4 subcolors whose indexes point to palette entries with channel values that are balanced against the channel values of the palette entries of each other subcolor's index, in a way that our brain can combine all of them to interpret the intended color.
(Floyd-Steinberg dithered quantization with positional dithering on the left, FWQ with positional dithering on the right)
The nature of the Quadricolor format also means that to faithfully display a single quadricolor texel, at least 4 screen pixels are needed, one for each subcolor. This "one subcolor per pixel" aspect of the Quadricolor format allows the rendering engine to dither its subcolors into tileable screenspace patterns that allow each texel to be infinitely expanded without losing its intended color definition (unlike texelspace dithering techniques, whose intended colors lose definition by becoming too far apart when the texels are expanded).
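As a rough illustration of "one subcolor per pixel", the sketch below picks a subcolor from the screen position using a 2x2 tile. The exact pattern used by the renderer is not described here, so the tile layout and the names `subcolor_slot` and `quadricolor_pixel` are assumptions.

```c
/* Pick which of the 4 subcolors a screen pixel shows, using a 2x2 tile
   (assumed layout: slot 0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right). */
static int subcolor_slot(int screen_x, int screen_y)
{
    return (screen_x & 1) | ((screen_y & 1) << 1);
}

/* Return the palette index to write for one screen pixel covered by `texel`. */
static unsigned char quadricolor_pixel(const unsigned char texel[4],
                                       int screen_x, int screen_y)
{
    return texel[subcolor_slot(screen_x, screen_y)];
}
```

Since the slot depends only on the screen coordinates, a texel magnified to any size is still filled with the same repeating 2x2 mix of its four subcolors.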
(Decontrast filter iterations)
However, when a Quadricolor texture is made of neighboring texels with quadricolors whose intended color spectrum is similar, only 1 pixel is needed for displaying each texel, with the on-screen Quadricolor being a cluster of 4 pixels from 4 different texels, with 1 different subcolor from each texel's quadricolor. This means that a Quadricolor texture can be displayed in native resolution with no significant loss of fidelity.
To improve the balance between the subcolors of the quadricolors of neighboring texels and ensure better color smoothness, a special "decontrast" filter was created in the texture compiler to proportionally reduce the contrast between the neighboring colors of the 24-bit RGB source image.
And finally, the Quadricolor format being composed of 4 color indexes also means that the rendering pipeline made for it is fully compatible with regular 8-bit indexed color textures, by simply reading all subcolor indexes from the same offset within the texel.
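A minimal sketch of how one fetch routine could serve both kinds of textures, assuming an interleaved layout of 4 bytes per Quadricolor texel. The `texture_t` structure and its fields are hypothetical and are not taken from the engine's source.

```c
/* Hypothetical texture descriptor shared by both texture kinds. */
typedef struct {
    const unsigned char *data;
    int texel_size;      /* bytes per texel: 4 for Quadricolor, 1 for classic */
    int slot_offset[4];  /* byte offset of each subcolor inside a texel */
} texture_t;

/* Fetch the palette index of subcolor `slot` (0..3) of texel `texel_index`.
   A classic texture sets every slot offset to 0, so all four slots read
   the same byte and the caller never needs a separate code path. */
static unsigned char fetch_subcolor(const texture_t *tex, int texel_index, int slot)
{
    return tex->data[texel_index * tex->texel_size + tex->slot_offset[slot]];
}
```

With offsets {0, 1, 2, 3} and a texel size of 4, the routine reads Quadricolor data; with offsets {0, 0, 0, 0} and a texel size of 1, the same routine reads a regular 8-bit indexed texture.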
(FWQ 8-bit Quadricolor texture with semitransparent alphamasking; positional dithering disabled on the left side, and enabled on the right side)
Another advantage of the Quadricolor format is that it allows for creating semitransparent colormasked colors without an alpha channel. Due to being fragmented into 4 subcolor indexes, each Quadricolor can be fully opaque, fully transparent, or have 3 different stippled semitransparency levels (25%, 50%, 75%) depending on how many of its subcolors are transparent (0, 1, 2, 3 or 4).
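The relation between transparent subcolors and opacity can be written down directly. In this sketch, the transparent palette index (255, as in classic Quake colormasked graphics) and the function names are assumptions.

```c
/* Assumption: palette index 255 marks a transparent subcolor. */
#define TRANSPARENT_INDEX 255

/* Count how many of the 4 subcolors of a texel are opaque. */
static int opaque_subcolors(const unsigned char texel[4])
{
    int i, count = 0;
    for (i = 0; i < 4; i++)
        if (texel[i] != TRANSPARENT_INDEX)
            count++;
    return count;
}

/* Stippled opacity of the texel: 0, 25, 50, 75 or 100 percent. */
static int opacity_percent(const unsigned char texel[4])
{
    return opaque_subcolors(texel) * 25;
}
```

No blending is involved: a texel with two transparent subcolors simply skips two of every four screen pixels, which reads as 50% opacity.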
(Semitransparent Quadricolor alphamasking)
(Filtered hardware alphamasking comparison)
This gives smoother borders to colormasked textures, which is not possible in other color formats without an alpha channel.
Textures need bidimensional RGB error diffusion between at least 4 neighboring colors, because they need all of their colors to be rendered at once (parallel output). But in color maps (for shading, blending, etc.), only a single intensity level is displayed at once (serial output), so they don't need error diffusion and can perform the error correction on the current color instead.
Performing color correction on the current color is a technique that I call "error mirroring". While error diffusion takes the error value of the currently resulting color output and applies it to the next desired color input, error mirroring takes the error value of the currently resulting color output and applies it back to the currently desired color input, resulting in two subcolors of the same color with their error mirrored between them.
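Error mirroring can be sketched on a single channel as follows. This is an illustration under stated assumptions: a hypothetical 4-entry gray palette and invented function names, whereas the real color maps work on RGB palette entries.

```c
#include <stdlib.h>

/* Hypothetical 4-entry grayscale palette (an assumption for illustration). */
static const int gray_palette[4] = { 0, 85, 170, 255 };

/* Return the index of the palette entry nearest to the value v. */
static int nearest_gray(int v)
{
    int best = 0, i;
    for (i = 1; i < 4; i++)
        if (abs(gray_palette[i] - v) < abs(gray_palette[best] - v))
            best = i;
    return best;
}

/* Produce two subcolors for one desired value: the nearest match, and the
   nearest match of the same desired value with the first error added back. */
static void mirror_error(int desired, unsigned char out[2])
{
    int first = nearest_gray(desired);
    int error = desired - gray_palette[first];
    int second = nearest_gray(desired + error);
    out[0] = (unsigned char)first;
    out[1] = (unsigned char)second;
}
```

For a desired value of 128, the first subcolor is 170 (error -42) and the mirrored one is 85, so the pair averages to 127.5, far closer to 128 than either palette entry alone.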
(Old blending)
(New blending)
(Multi-layered blending)
For semitransparencies (including additive glowmaps), the Quadricolor format is used to display not just color correction, but also alpha correction. Color maps are a combination of only two RGB colors across a single serial alpha axis, so they only need two subcolors for error-mirrored RGB color correction, leaving the other two subcolors for alpha correction.
(Lighting comparison)
Lighting in Retroquad uses lightmaps, but it doesn't combine the lightmaps with the textures into surface image caches. Surface caches in Retroquad contain only lighting data, which is combined with the textures per pixel in realtime, and because of this they're called "surface lighting caches". This allows the surface cache UV coordinates to be dithered using different offsets, which spreads each lixel (lighting pixel) across multiple texels (texture pixels), smoothing out the final image.
(HD texturing comparison)
It also allows the engine to support several more features such as HD textures and dynamically mapped texturing effects (scrolling textures, turbulence mapping, melt mapping, etc).
(Lighting comparison)
In the lighting, the Quadricolor format stores indexes of subcolors computed with color correction and brightness correction. However, these are only used in the color shading map. The surface lighting cache stores only the lighting level of each lixel, with 8-bit precision.
(Lighting comparison)
The lighting is dithered in three steps: surface lighting cache, lixel coordinates and color shading map. The lighting levels are dithered with 16-bit precision during the conversion from the lightmap to the 8-bit surface lighting cache, the per-pixel coordinates of the lixels in the surface lighting cache are dithered during the on-screen rasterization, and finally the lixel values are dithered with 2-bit precision to address the 6-bit levels of the color shading map.
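The last of the three steps can be illustrated as follows: the upper 6 bits of an 8-bit lixel select a shading level, and the lower 2 bits decide on how many pixels of a small tile the level is bumped by one. The 2x2 threshold table and the function name are assumptions; the text above only states that the lixel values are dithered with 2-bit precision.

```c
/* Assumed 2x2 ordered thresholds for the 2 fractional bits. */
static const int lixel_threshold[2][2] = { { 0, 2 }, { 3, 1 } };

/* Convert an 8-bit lixel into one of 64 shading levels for screen pixel (x, y). */
static int shading_level(unsigned char lixel, int x, int y)
{
    int level = lixel >> 2;     /* upper 6 bits: 0..63 */
    int fraction = lixel & 3;   /* lower 2 bits: 0..3 */
    if (fraction > lixel_threshold[y & 1][x & 1] && level < 63)
        level++;                /* bumped on `fraction` pixels out of 4 */
    return level;
}
```

For any lixel below the top level, the four levels of a 2x2 tile add up to the lixel value itself, so the average brightness of the tile keeps the full 8-bit precision even though the shading map only has 64 rows.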
(Tinted light shading map experiment)
Colored lighting can be implemented through an additional 8-bit surface tinting cache map containing tinting information defined by 4 bits for 16 hues and 4 bits for 16 saturation levels, applied before the surface lighting cache. Lixel coordinates would be the same for both the surface lighting cache and for the surface tinting cache, so there would be no extra texturemapping cost for colored lighting. The main challenges would be to keep the banding between the 4-bit levels to a minimum, and to compute color correction, hue correction *and* saturation correction with an acceptable level of quality using only 4 subcolors for each quadricolor.
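Since this feature is only planned, the following is purely a sketch of how one byte of such a tinting cache could be packed and unpacked. Placing the hue in the high nibble and the saturation in the low nibble is an assumption, as are the function names.

```c
/* Pack a hue (0..15) and a saturation level (0..15) into one tinting byte.
   Assumed layout: hue in the high nibble, saturation in the low nibble. */
static unsigned char tint_pack(int hue, int saturation)
{
    return (unsigned char)(((hue & 15) << 4) | (saturation & 15));
}

/* Extract the hue (0..15) from a tinting byte. */
static int tint_hue(unsigned char tint)
{
    return tint >> 4;
}

/* Extract the saturation level (0..15) from a tinting byte. */
static int tint_saturation(unsigned char tint)
{
    return tint & 15;
}
```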
Color swapping and palette animations are possible to implement in Retroquad by applying a filtered palette during the quantization to restrict the source colors to the scope of the desired color range, and then using a color shading map with those color indexes swapped to the desired ones.
(Particles)
In Retroquad, a "particle" is a single colored point of a 3D visual effect image (smoke, stars, pellets, etc), just like a pixel is a single point of a 2D image. A single point doesn't have an individual meaning because each particle is not an image in itself, which is why particles should not be textured. Their visual meaning is given by their behavior and group structure, also known as a "particle effect".
Although particles are not textured, each particle is a point in 3D space, and therefore needs to represent depth by being drawn bigger when closer to the camera. The most appropriate shape for that is a circle, which is only affected by 3D camera position, not by 3D camera angles. Also, because each particle is just a single point in 3D space, and to keep its round shape consistent in any situation, its depth check is performed only once at its center pixel, which means that either the whole particle is visible, or it's completely occluded.
To be consistent with the smoothness of the dithering used in models and sprites, the particles must also be dithered. Also, one of my goals was to make sure the dithering didn't affect the perfectly symmetrical shape of the particles.
Since particles are not textured and have only a single color each, a different dithering optimization was implemented.
In a 4x4 opacity dithering matrix, each line has 4 possible levels of opacity (100%, 75%, 50% and 25% opaque), according to how many pixels are covered. Despite each line having 4 units of boolean values, the total number of combinations is only 4 (instead of 16), because in the 50% opacity level the opaque pixels should always be apart, and all other possible variations of the same level of opacity can be represented by offsetting its index. By alternating between 2 different dithering opacity lines, 4x4 dithering patterns for 8 opacity levels can be achieved.
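The line patterns can be sketched as 4-bit masks with a rotation offset. The mask values, the function names, and the way two lines are paired into a tile are my assumptions for illustration; the engine's precomputed tables may be organized differently.

```c
/* Base 4-pixel line masks, bit x set = pixel x opaque.
   Index 0..3 = 25%, 50%, 75% and 100% opaque. */
static const unsigned line_mask[4] = { 0x1, 0x5, 0x7, 0xF };

/* Is pixel x of a line opaque, for a given level (0..3) and offset (0..3)?
   The offset rotates the base mask, which covers every other variation
   of the same opacity level. */
static int line_pixel(int level, int offset, int x)
{
    return (line_mask[level] >> ((x + offset) & 3)) & 1;
}

/* Count the opaque pixels of a 4x4 tile built by alternating two lines:
   even rows use (level_a, offset_a), odd rows use (level_b, offset_b). */
static int tile_coverage(int level_a, int offset_a, int level_b, int offset_b)
{
    int x, y, count = 0;
    for (y = 0; y < 4; y++)
        for (x = 0; x < 4; x++)
            count += (y & 1) ? line_pixel(level_b, offset_b, x)
                             : line_pixel(level_a, offset_a, x);
    return count;
}
```

For example, alternating a 50% line with a 25% line covers 6 of the 16 pixels of the tile (37.5%), an opacity level that neither line provides on its own.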
To define the area of each opacity level within the screenspace area of the particle, the shape of the particles was segmented into several halos, and each halo has an opacity level assigned to it. Afterwards, each line of the whole particle is scanned, and all line segments with identical line patterns between different halos are grouped into a maximum of 2 different line segments per pattern. This inter-halo pattern aggregation heavily reduces the number of iterations needed to draw the whole particle.
And finally, all line segments of each line pattern are rasterized by individual functions featuring Duff's device loops with some lines skipped to create holes according to the desired pattern. This makes dithered particles extremely fast to draw, faster than non-dithered particles, because instead of performing per-pixel verifications, it just executes less code. This, combined with the single-pixel depth check and all of the procedural steps being precomputed, allows the engine to render thousands of particles with no significant impact on performance.
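As an illustration of a pattern-specific Duff's device loop, the sketch below draws a 50% opaque line segment by writing every second pixel. The function name and the 4-way unrolling are assumptions; the engine has one such function per pattern, and those are not reproduced here.

```c
/* Draw a 50% opaque segment: write `count` pixels, skipping one pixel
   after each write. The switch jumps into the middle of the unrolled loop
   to handle the remainder, so no per-pixel pattern test is ever executed. */
static void draw_line_50(unsigned char *dest, int count, unsigned char color)
{
    int n;
    if (count <= 0)
        return;
    n = (count + 3) / 4;    /* number of passes through the unrolled body */
    switch (count & 3) {
    case 0: do { *dest = color; dest += 2;
    case 3:      *dest = color; dest += 2;
    case 2:      *dest = color; dest += 2;
    case 1:      *dest = color; dest += 2;
            } while (--n > 0);
    }
}
```

The holes cost nothing: they are just a larger pointer increment, which is why a dithered line can execute fewer instructions than a solid one of the same width.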
Final words, and engine release
There are other graphical advancements in Retroquad such as soft depth, melt mapping and multiplanar scrolling skydome with horizon fading. Also, there are many more in-depth details that could be written about what's been revealed in this article so far. However, I'm not in a good condition to write about everything right now.
Retroquad is still far from what I envisioned. Originally, the plan was to release it once at least the BSP renderer had been completely replaced and the bugs were ironed out. It's saddening to know that my vision for it won't be fulfilled, but I don't want it to be completely lost either, so I'm releasing Retroquad as is.
Since last year I lost almost everything in my life. And almost lost my family. We are going to lose our house because I need to sell it to put food on the table, and the water will be cut soon because I've been wrongly blamed in a lawsuit. The future is grim and dedicating spare time to continue doing graphics research & development is not possible anymore.
The Retroquad 0.1.0 release is here: download.
I pride myself on having developed my tech without external help. But that doesn't matter anymore. I need to help my family, and have no means to do so.
If you're grateful for this release, or simply willing to help, I have created a donation campaign here.