(50G) [HPGCC3] DRAW3DMATRIX replacement with grayscale surfaces, proof of concept
HP Forums (http://www.hpmuseum.org/forum)
(50G) [HPGCC3] DRAW3DMATRIX replacement with grayscale surfaces, proof of concept - 3298 - 09-11-2015 05:03 AM

It's time to release something I've been working on for ... longer than I'd want to admit. It is a 3D renderer written using HPGCC3 with grayscale graphics and proper polygon sorting, originally intended to be used for a 50G version of a Casio AFX series game I enjoyed (a flight simulator with wireframe graphics called F-16 Falcon 2.0).

At some point, my renderer was wireframe too, but when I started to use grayscale graphics, I noticed that without proper depth sorting some pixels could have the wrong color. To be precise, it happened when the 2D projections of two lines intersected; the point of intersection would get the color of the line that was defined earlier than the other one, which wasn't always the line that was closer to the viewer.

When reading up on ways to sort the lines, I decided to allow polygons as well. Modern graphics cards like to cut everything into triangles, but with CPU rendering I expected the renderer to be more efficient if I allow more corners per polygon - the renderer simply assumes all corners beyond the third one are in the plane defined by the first three. (If that's not the case, graphical glitches may happen.)

This didn't solve the sorting problem yet, but the BSP (binary space partitioning) implementation I subsequently wrote did. I went with BSP because the only alternative I seriously considered was the Z-buffer, which has its own shortcomings, for example problems with transparent surfaces and a large memory footprint. (With 131x80 64-bit depth values, we have a Z-buffer weighing about 82 KiB. That's a lot for our poor 50G.)
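To illustrate why a BSP tree solves the sorting problem, here is a minimal sketch of a back-to-front traversal. It is not the renderer's actual code (the source has not been released); the names `Vec3`, `BspNode`, `plane_side` and `bsp_back_to_front` are hypothetical, and the sketch assumes the tree has already been built and that each node holds exactly one polygon lying in its splitting plane.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; this is not the renderer's actual code. */

typedef struct { double x, y, z; } Vec3;

typedef struct BspNode {
    Vec3 normal;            /* normal of the splitting plane */
    double offset;          /* plane equation: dot(normal, p) == offset */
    int polygon_id;         /* polygon lying in the splitting plane */
    struct BspNode *front;  /* subtree on the side the normal points to */
    struct BspNode *back;   /* subtree on the opposite side */
} BspNode;

/* Positive if p is on the front side of the node's plane, negative if behind. */
static double plane_side(const BspNode *node, Vec3 p)
{
    return node->normal.x * p.x + node->normal.y * p.y
         + node->normal.z * p.z - node->offset;
}

/* Writes polygon ids into out[] in back-to-front order as seen from eye.
   count is the number of entries already in out[], capacity is its size.
   Returns the new number of entries. */
static size_t bsp_back_to_front(const BspNode *node, Vec3 eye,
                                int *out, size_t count, size_t capacity)
{
    if (node == NULL)
        return count;

    /* Whatever lies on the far side of the plane is emitted first. */
    int eye_in_front = plane_side(node, eye) >= 0.0;
    const BspNode *far_side  = eye_in_front ? node->back  : node->front;
    const BspNode *near_side = eye_in_front ? node->front : node->back;

    count = bsp_back_to_front(far_side, eye, out, count, capacity);
    assert(count < capacity);
    out[count++] = node->polygon_id;
    return bsp_back_to_front(near_side, eye, out, count, capacity);
}
```

For three parallel walls at z = -1, z = 0 and z = 1 with the viewer at z = 5, this yields the wall at z = -1 first and the wall at z = 1 last; moving the viewer to z = -5 reverses the order without rebuilding or re-sorting anything. Drawing in that order means nearer polygons always overwrite (or blend over) farther ones.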
At some point my "render a single frame" and "render an endlessly rotating cube" test programs didn't cut it anymore, so I wrote something that allowed for multiple tests without constantly recompiling: a program that interprets a matrix as a heightmap and renders it, just like the built-in command DRAW3DMATRIX. What I'm releasing here is essentially this test program; since it does pretty much the same thing as a built-in 50G command, it might actually be useful as a replacement.

The performance was worse than I expected, so I tweaked some things: I replaced almost all floating point calculations with fixed point procedures. That helped a little bit, but not as much as increasing the clock speed from HPGCC3's default 12MHz to the OS's default 75MHz. Still, when feeding the output of {6. 6.} RANM 10. / \->NUM into the program, it takes between 300 and 400 milliseconds per frame. DRAW3DMATRIX performs better, and it can handle larger matrices than that just fine, but my renderer's performance drops off quickly due to the BSP calculations.

The renderer requires a matrix of real numbers on the stack (TYPE must report 3). Other types are not accepted; they only result in an Invalid Argument Type error. If you want to test it with RANM (which generates a symbolic matrix containing integers), use \->NUM to convert it. You might also want to scale it down yourself because unlike DRAW3DMATRIX my renderer does not take scaling parameters. Note: scaling by a real number alone is not enough, because even though every number (except 0 apparently) will become a real one, the wrapping type will stay a symbolic matrix. The difference between a symbolic matrix and a numeric one is not visible directly, but TYPE reports 29 for the former and 3 for the latter.

When starting the renderer, you'll see a few letters followed by numbers at the bottom of the screen.
They show how many polygons got past the backface culling step (i), how many were actually drawn (o), how many were added by cutting a polygon into two pieces (c) (cutting polygons is a part of how BSP works), and finally the time spent rendering the frame, in milliseconds (t). A 2x2 matrix usually takes 3 or 4 ms, a 4x4 one slightly less than 50 ms, and a 6x6 one between 300 and 400 ms, as I explained above.

As a demonstration of the renderer's capabilities, the surfaces are rendered with 75% alpha, that is, everything behind them is faintly shining through. The surfaces are dark or bright, depending on which side of them you see; the darker one is the underside. Removing this in favor of solid colors has only minimal impact on performance.

The keyboard controls are also different from DRAW3DMATRIX. Instead of rotating around the object's axes, the cursor keys rotate around the global X and Y axes. Rotation around the Z axis is done with leftshift and rightshift. The number keys 2, 4, 6 and 8 can be used for panning (which DRAW3DMATRIX cannot do at all). + and - move the object along the Z axis; this can be used as a substitute for zooming. 5 will reset the object to its original position and orientation, and ON will get you out of the renderer.

Known glitches:
- Sometimes a few pixels of the black outlines are drawn below the surfaces instead of above them. I put in some hacks to keep them on top, but apparently they are not working quite right.
- The surfaces tend to flicker slightly, especially with larger matrices as the input. This shouldn't happen because I'm rendering to a hidden buffer which is then copied to the visible one with a single memcpy, but it's still there.
- When a surface covers large amounts of screen space, some slightly darker vertical lines appear with a horizontal distance of 4 pixels. I do not know what causes this.
- Rotating and moving the object can be slow or fast, depending on how fast the object is rendered.
  It's like that because the object is rotated/moved by a constant angle/distance every frame while the corresponding key is held down.
- On x49gp, only the leftshift, rightshift and ON controls are registered for some unknown reason. The remaining controls are apparently ignored, at least for me. The real calculator does not show this behavior.

A big THANK YOU to Claudio L. for HPGCC3 and some personal support when I had trouble. He showed me how to debug it with objdump, x49gp and gdb, and together we sorted out some issues which were mostly caused by me using a current version of arm-none-eabi-gcc instead of the one suggested on the HPGCC3 website (which is ridiculously hard to obtain, by the way; the current one was simply installed from my system's package manager). Only one issue was a real HPGCC3 bug, which I was able to work around (the romlib has some variables outside the .romglobals section, which makes them overlap with user variables; workaround: define an otherwise unused array covering the area these variables are put into).

I haven't decided on a license yet, so I'll hold on to the source for the moment. If I go for an open source one, it'll also need some cleanup and documentation.

As I mentioned, this program was designed as a test for the renderer code, so it should be considered a proof of concept. In that sense, my conclusion is that it works reasonably well, but it could use some performance improvement in the polygon sorting/cutting area. I do have some ideas, but they include major rewrites, so that will take a while.

Any questions, suggestions, bug reports?

RE: [50G][HPGCC3] DRAW3DMATRIX replacement with grayscale surfaces, proof of concept - 3298 - 10-30-2015 03:04 PM

I have made significant improvements. In the first version I was concerned about memory and had the program recalculate the BSP tree during each frame.
This allowed me to draw parts of the tree before it was fully built, so I could free the drawn polygons and keep the memory usage down. That was a major CPU hog (for obvious reasons), so the second version builds the tree once on startup and keeps it in memory. Together with quite a few other improvements (some of which affect the structure of the BSP tree to keep the polygon count down) it renders a 10x10 random matrix in about 100 milliseconds. 15x15 matrices take about 330 milliseconds. Unfortunately my tests with 16x16 matrices or larger failed with an "Out of memory" message.

Building the tree on startup takes several seconds, though, so be patient. When exiting, all the data structures are released again (I could skip this because they will be released on exit anyway, but while debugging with valgrind the messages about lost memory blocks got in my way), so that will take a moment as well.

Other changes:
- The cursor keys and RS/LS now rotate around the object's axes instead of the global axes. This is a byproduct of the changes, and it didn't bother me enough to change it back.
- Rotation and movement now scale to the amount of time spent rendering a frame, resulting in more predictable behavior.
- The information on the bottom of the screen now shows the polygon count in the BSP tree (p=...) and the frame time in milliseconds (t=...).

There are more changes behind the scenes, mostly in preparation for other uses; for example, the BSP merging functions are still under construction. They are commented out for this version because they are simply not needed - there is only a single object, nothing to merge with.

If you are paying attention to the number of polygons, you might notice that it has been reduced quite a bit. There are two reasons for that: The first one is a change to the way the black outlines are drawn. They were just ordinary polygons with two corners before, but now they are drawn as a special effect after filling one of the gray surfaces.
This required keeping track of which edges were produced by cutting polygons so you don't get black lines all over the place. The other change was to add invisible polygons in certain places. This sounds counterproductive, but by 'containing' each square with these invisible polygons, I can keep the BSP construction algorithm from clipping planes from different squares against each other, which was previously happening a lot, and each time this happened, the polygon count increased.

There is still a lot to be done: The BSP merging algorithm works, but it takes a while to finish - while reading up on it, I came across an approach using a linear programming feasibility test to detect useless branches; this needs to be implemented. I also want portal rendering, which basically just needs support for 2D clipping against non-rectangular regions; the rest of it can already be done through the same mechanism as the outlines.

RE: (50G) [HPGCC3] DRAW3DMATRIX replacement with grayscale surfaces, proof of concept - 3298 - 12-12-2017 02:58 PM

It's pretty safe to put it on the calculator, I've had it on mine for what, five years now? (Can't believe how much time has passed already - the earliest timestamps I can find in my various HPGCC3-related files are from July 2012. That's about one year after I got the 50G itself, and a bit more than a year before pier4r's benchmark project...)

Anyway, HPGCC3 doesn't interfere with the normal operation at all, and if something goes wrong while flashing or you somehow want to get rid of it, you can simply retry or flash the original ROM again. Literally the only downside is that it reduces the size of port 2 by 256 KiB (or was it just 128? I don't remember).
I installed it and never looked back - I only flashed the calculator once more after that (again HPGCC3, this time built with a newer compiler; I hoped to squeeze a tiny bit of additional performance out of the libraries via the enhancements applied to the compiler's code optimization strategies). As a bonus, building HPGCC3 sets you up with a development environment for HPGCC3, so you can start writing your own stuff in C or C++ right away (other languages might need some effort to interface with the HPGCC3 libraries, but it should be possible as long as the language can be compiled for ARMv4T).

Emu48 won't run HPGCC3 because HPGCC3 runs directly on the ARM hardware (that's its entire point) whereas Emu48 only emulates the Saturn layer. There's x49gp which does run HPGCC3 (I'm using that as my debugging platform), but it is a "build it for yourself on Linux" deal as well. Not a problem for me at all because I run Linux anyway, and I sometimes tinker with the software (i.e. building something myself happens all the time - for example, in the last 2-3 weeks I messed with the touchpad driver part of libinput, adding some features I wanted), but for Windows users who are not familiar with Linux or compilers that might be a bit uncomfortable. (Though to be fair, compilers shouldn't pose a problem to most of the members around here, considering the topic of this community.)