- Posts: 137
- Joined: Tue Jun 30, 2020 3:47 pm
My First C64/X16 Programs!
The 6502 (and 65C02) only has 8-bit add and subtract. The carry flag has to be used to perform operations on any larger values, and multiplication and division must be done with loops and bit shifts.
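For anyone curious what that looks like in practice, here is a rough C sketch (not actual 6502 code, and not any particular ROM routine) of what a typical software multiply boils down to: chain the carry through byte-wide additions, and build the multiply out of shifts and conditional adds.

```c
#include <stdint.h>

/* 16-bit add done one byte at a time, the way chained ADC works on the 6502:
   add the low bytes, then feed the carry into the high-byte addition. */
static uint16_t add16(uint16_t a, uint16_t b)
{
    uint8_t lo    = (uint8_t)a + (uint8_t)b;
    uint8_t carry = (lo < (uint8_t)a) ? 1 : 0;          /* carry out of the low byte */
    uint8_t hi    = (uint8_t)(a >> 8) + (uint8_t)(b >> 8) + carry;
    return ((uint16_t)hi << 8) | lo;
}

/* 8x8 -> 16-bit multiply by shift-and-add, the usual 6502 software approach:
   for every set bit of the multiplier, add the shifted multiplicand. */
static uint16_t mul8x8(uint8_t a, uint8_t b)
{
    uint16_t product = 0;
    uint16_t addend  = a;
    while (b) {
        if (b & 1)
            product = add16(product, addend);   /* conditional add */
        addend <<= 1;                           /* shift multiplicand left */
        b >>= 1;                                /* next multiplier bit */
    }
    return product;   /* e.g. mul8x8(200, 100) == 20000 */
}
```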
- desertfish
- Posts: 1094
- Joined: Tue Aug 25, 2020 8:27 pm
- Location: Netherlands
My First C64/X16 Programs!
@geek504 Discussing Prog8 is probably best done in the topic I made in the "X16 General Chat" subforum? Can you perhaps ask your question again there?
Also, the C64's (and CX16's) math functions operate on 5-byte floats, i.e. 40 bits. Internally they work with 6 bytes, I think, for intermediate rounding precision, but the float values stored in memory occupy 5 bytes. Of course, all floating point operations are implemented in software in the ROM.
My First C64/X16 Programs!
On 8/31/2020 at 4:47 PM, desertfish said:
Also, the C64's (and CX16's) math functions operate on 5-byte floats, i.e. 40 bits. Internally they work with 6 bytes, I think, for intermediate rounding precision, but the float values stored in memory occupy 5 bytes. Of course, all floating point operations are implemented in software in the ROM.
Four bytes of that are the number part, the "mantissa", with the fifth byte the exponent ... the "x10^[__]" part in scientific notation, but since it is binary floating point, it is really "x2^[___]".
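If it helps to see that layout concretely, here is a small C sketch that unpacks such a 5-byte float into a double. It assumes the usual Microsoft/CBM arrangement (exponent byte in excess-128 form with 0 meaning zero, four big-endian mantissa bytes representing a fraction in [0.5, 1), and the sign stored in the top bit of the first mantissa byte in place of the implied leading 1); worth double-checking against the ROM listing before relying on the details.

```c
#include <stdint.h>
#include <math.h>

/* Unpack a CBM/Microsoft 5-byte float into a C double.
   Assumed layout (double-check against the ROM before relying on it):
     f[0]       exponent, excess-128; 0 means the whole value is zero
     f[1]..f[4] mantissa, big-endian, a fraction in [0.5, 1); the top bit
                of f[1] is the sign, standing in for the implied leading bit */
static double cbm_float_to_double(const uint8_t f[5])
{
    uint32_t m;
    int sign;

    if (f[0] == 0)
        return 0.0;                      /* zero is flagged by a zero exponent */

    sign = (f[1] & 0x80) ? -1 : 1;

    /* put the implied leading 1 back into the mantissa */
    m = ((uint32_t)(f[1] | 0x80) << 24) |
        ((uint32_t)f[2] << 16) |
        ((uint32_t)f[3] << 8)  |
         (uint32_t)f[4];

    /* m is the mantissa scaled by 2^32; undo that and apply the exponent */
    return sign * ldexp((double)m, (int)f[0] - 128 - 32);
}

/* e.g. the bytes $81 $00 $00 $00 $00 decode to 1.0 */
```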
This is more precision than a "standard" 32-bit float, and for most floating point applications that the CX16 can handle, it makes long (64-bit) floats pretty much redundant. A standard 32-bit float stores a 23-bit mantissa, but thanks to a trick it represents a 24-bit ... three byte ... numeric part, because floating point slides the binary number until the leading bit in front of the "binary" point (not "decimal" point) is a "1", and if you know what it is, you don't have to store it. IOW, if the result of an operation is 0.0011011101...x2^12, that is normalized to 1.1011101...x2^9, and only the bits after the binary point are stored. That is an unsigned value, with the sign of the mantissa in the high bit of the floating point number and bits 23-30 as an unsigned value that represents (exponent+127), so 2^0 is binary 127 ($7F).
So standard floating point numbers can PRECISELY represent integers up to +/-16,777,216 ... about +/-16.7 million. Outside of that range, they can only precisely represent integers that have an appropriate power of 2 as a factor.
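A quick C illustration of those fields and of that 2^24 limit (this is just standard IEEE 754 single precision, nothing X16-specific):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    float x = 13.5f;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);                     /* reinterpret the 32 bits */

    unsigned sign = bits >> 31;                         /* bit 31: sign of the mantissa */
    int      exp  = (int)((bits >> 23) & 0xFF) - 127;   /* bits 23-30 store exponent+127 */
    unsigned frac = bits & 0x7FFFFF;                    /* bits 0-22: mantissa minus the implied leading 1 */

    printf("sign=%u exponent=%d fraction=0x%06X\n", sign, exp, frac);

    /* 2^24 is where consecutive integers stop being exactly representable:
       adding 1 gives back the same float. */
    float big = 16777216.0f;                            /* 2^24 */
    printf("%.1f + 1.0 == %.1f\n", big, big + 1.0f);
    return 0;
}
```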
By contrast, the Microsoft 6502 "extended" floating points (done at Commodore's insistence) can precisely represent integers up to +/-4,294,967,296 ... about +/-4.2 billion. The reason for Commodore's insistence is that if you do exact accounting, you actually represent dollar values as an integer number of CENTS, so standard 32-bit floats can only precisely represent +/-$167,772.16, and to Commodore's way of thinking, that wasn't big enough. A simple eight digit calculator can do better (using signed-magnitude Binary Coded Decimal arithmetic) ... +/-$999,999.99 ... and they weren't going to have an expensive computer system beaten by an eight digit calculator!!!
This is actually twice the range of xForth's "double cell" integers, because the floating point format is sign+magnitude, while Forth has native signed integers that run from -2,147,483,648 to +2,147,483,647. So while floating point generally has LESS precision than scaled fixed point of the same size ... C64 floating point is actually roughly twice as precise as scaled signed 32-bit fixed point (it's the same precision as 32-bit unsigned, because if the data is unsigned then the mantissa sign flag being clear doesn't give any extra information).
Still, +/-2 billion tends to be enough for lots of purposes when you have numbers that don't fit into the signed +/-32,000-ish or unsigned 64,000-ish range of 16-bit integers.
My First C64/X16 Programs!
@BruceMcF very refreshing history! I am happy for Commodore's insistence, and glad we have a useful extended floating point representation for the X16.
I guess having an external co-processor would make those ROM routines useless, and that doesn't seem to be in the spirit of 8-bit computing... then again, having an external expansion card that functions as a co-processor or as look-up tables would very much be in the spirit of 8-bit hacking!
Just food for thought...
My First C64/X16 Programs!
An external math co-processor could maybe be done on an I/O extension. But most likely it would not even do any floating-point math... a good mul/div for integers longer than 16 bits (e.g. a 32x32=64-bit multiply) would already be great. But to be honest, that is not really required for any games... 16-bit math would be good enough. A math chip would "only" speed up the calculation. Especially if you are talking about wireframe graphics like ELITE, the limited CPU is really slowing it down. However... all the 8-bit computers had to live with no multiply available.
The 6502 series even has to live with the fact that for any addition it needs to go to memory.
This is now deviating from the original topic a lot. I have no clue how the math co-processors (the 8087, for instance) actually worked, but for the 6502 there is no such thing available. A math coprocessor would be something that maps, for example, two data registers and a command register into the I/O address space. The two data registers can then be multiplied, divided, squared, etc. The biggest issue would be the handover: waiting for the coprocessor to finish the calculation and then continuing. The 6502 does not know any such integration. Maybe a BRK could work, but any interrupt (not just from the coprocessor) would upset it. Maybe a quick loop polling a flag register would do. However, the whole thing would be asynchronous and therefore difficult to implement.
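To make that pattern concrete, here is a C sketch of the "write operands, write a command, poll a flag" sequence. Everything here is hypothetical: no such chip exists for the X16, and the base address, register layout, and command code are made up purely for illustration.

```c
#include <stdint.h>

/* Entirely hypothetical memory-mapped math coprocessor, just to illustrate
   the write-operands / write-command / poll-status pattern. None of these
   addresses or registers exist on the real X16. */
#define COPRO_BASE   0x9F60u                                    /* made-up I/O window */
#define COPRO_OPA    (*(volatile uint16_t *)(COPRO_BASE + 0))   /* operand A */
#define COPRO_OPB    (*(volatile uint16_t *)(COPRO_BASE + 2))   /* operand B */
#define COPRO_CMD    (*(volatile uint8_t  *)(COPRO_BASE + 4))   /* command register */
#define COPRO_STATUS (*(volatile uint8_t  *)(COPRO_BASE + 5))   /* bit 0 = busy */
#define COPRO_RESULT (*(volatile uint32_t *)(COPRO_BASE + 6))   /* 32-bit result */

#define CMD_MUL 0x01   /* made-up command code */

static uint32_t copro_mul16(uint16_t a, uint16_t b)
{
    COPRO_OPA = a;
    COPRO_OPB = b;
    COPRO_CMD = CMD_MUL;             /* kick off the calculation */
    while (COPRO_STATUS & 0x01)      /* busy-wait on the flag register */
        ;                            /* the CPU sits idle here - the cost of polling */
    return COPRO_RESULT;
}
```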
Interesting read: https://retrocomputing.stackexchange.com/questions/9173/how-did-the-8086-interface-with-the-8087-fpu-coprocessor
- Posts: 913
- Joined: Tue Apr 28, 2020 2:45 am
My First C64/X16 Programs!
22 minutes ago, SerErris said:
However... all the 8-bit computers had to live with no multiply available.
This was a surprising discovery for me, since the only 8-bit assembly I did before learning 65C02 for the X16 was Motorola/Freescale 68HC11, which did have 8x8 multiplication, thanks to the ability to combine the two 8-bit accumulators (A and B) into a single 16-bit accumulator (D). I had assumed all these years that this was standard for a lot of 8-bit CPUs, but it was actually a very specialized feature to support the 68HC11's signal processing capabilities, the 68HC11 being primarily an embedded microcontroller variant of the 6800, which did not have built-in multiplication despite having the same register structure.
My First C64/X16 Programs!
1 hour ago, SerErris said:
But to be honest, that is not really required for any games... 16-bit math would be good enough. A math chip would "only" speed up the calculation. Especially if you are talking about wireframe graphics like ELITE, the limited CPU is really slowing it down.
I was thinking exactly of 3D games (e.g. Wolfenstein 3D). Given the asynchronous aspect you brought up, I believe the best option would then be 16-bit look-up tables. It would only be a matter of fetching the pre-computed answer from ROM, akin to how VERA works. Look at this link: http://wilsonminesco.com/16bitMathTables/
It provides 6502 code implementations for accessing the look-up tables via BUS, SERIAL, PARALLEL, and MEMORY-MAPPED I/O. (A short sketch of how the squares table gets used for multiplication follows the table below.)
file name    | table size | comments
SQUARE.HEX   | 256KB      | partly for multiplication; 32-bit output
INVERT.HEX   | 256KB      | partly for division, to multiply by the inverse; 32-bit output
SIN.HEX      | 128KB      | sines, also for cosines and tangents
ASIN.HEX     | 128KB      | arcsines, also for arccosines
ATAN.HEX     | 64KB       | ends at 1st cell of LOG2.HEX (next)
LOG2.HEX     | 128KB      | also for logarithms in other bases
ALOG2.HEX    | 128KB      | also for antilogs in other bases
LOG2-A.HEX   | 128KB      | logs of 1 to 1+65535/65536 (i.e., 1.9999847), first range for LOG2(X+1) where X starts at 0
ALOG2-A.HEX  | 128KB      | antilogs of 0 to 65535/65536 (i.e., .9999847), the first range for 2^x - 1
LOG2-B.HEX   | 128KB      | logs of 1 to 1+65535/1,048,576 (i.e., 1.06249905), a 16x zoom-in range for LOG2(X+1)
ALOG2-B.HEX  | 128KB      | antilogs of 0 to 65535/1,048,576 (i.e., .06249905), a 16x zoom-in range for 2^x - 1
SQRT1.HEX    | 64KB       | square roots, 8-bit truncated output
SQRT2.HEX    | 64KB       | square roots, 8-bit rounded output
SQRT3.HEX    | 128KB      | square roots, 16-bit rounded output
BITREV.HEX   | 128KB      | set of bit-reversing tables, up to 14-bit, particularly useful for FFTs
BITREV15.HEX | 128KB      | 15-bit bit-reversing table (not included in EPROM)
MULT.HEX     | 128KB      | multiplication table like you had in 3rd grade, but up to 255x255
MathTbls.zip |            | all the tables, zipped, including BITREV15.HEX which is not in the supplied EPROMs
ROM0.HEX     |            | a single Intel Hex file for ROM0 as I plan to supply it (also available zipped)
ROM1.HEX     |            | a single Intel Hex file for ROM1 as I plan to supply it (also available zipped)
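For anyone wondering how a table of squares becomes a multiplier: the quarter-square identity a*b = ((a+b)^2 - (a-b)^2)/4 turns a multiply into two lookups and a subtraction. Here is a scaled-down C sketch of the principle (the real SQUARE.HEX on that page stores 32-bit squares of 16-bit inputs; this toy version builds an 8-bit quarter-square table in RAM instead):

```c
#include <stdint.h>

/* Quarter-square multiply: a*b = ((a+b)^2 - (a-b)^2) / 4.
   qsq[n] holds floor(n*n/4); for 8-bit operands the index never exceeds 510,
   and because (a+b) and (a-b) have the same parity, the floors cancel exactly. */
static uint16_t qsq[511];

static void init_qsq(void)
{
    uint32_t n;
    for (n = 0; n < 511; n++)
        qsq[n] = (uint16_t)((n * n) / 4);
}

/* Two table lookups and a subtraction instead of a multiply. */
static uint16_t mul8x8_qsq(uint8_t a, uint8_t b)
{
    uint16_t sum  = (uint16_t)a + b;
    uint16_t diff = (a >= b) ? (uint16_t)(a - b) : (uint16_t)(b - a);
    return qsq[sum] - qsq[diff];   /* e.g. 255*255 = 65025 */
}
```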
My First C64/X16 Programs!
That is by far too much memory... we only have 2MB max. Yes, that would be very fast, but no, it is not workable, because you could not do anything anymore besides calculations.
There are other efficient solutions that still rely partially on lookup tables, but that is too much.
Regardless of the size of the tables and how fast they make the 3D calculations, I cannot see something like Wolfenstein 3D or any filled graphics working on the X16. The area fills need to be done by the CPU, and filling an odd-shaped part of the screen in VRAM is not fast. It is even slower than filling ordinary memory, because you need to load the VERA ADDR registers frequently. Each address change is a 3-byte write to the VERA registers for a single byte of data - IF the access pattern cannot use the automatic increment. And that is most likely the case with any vector graphics.
Eventually you would end up copying the whole screen from a memory buffer to VERA every time, and that alone will be relatively slow: an estimated 10 cycles per pixel in 256 color mode; in 16 color mode you can do 2 pixels in 10 cycles. That is still 384,000 cycles per frame (320x240), which is roughly 1/20 of a second with the 65C02 running at 8 MHz. At that resolution you could get a 20 fps copy process. However, you still need to do all the heavy lifting (e.g. calculating the next image and so on) and write all of that to normal RAM first. That will be another 10 fps at best (just the loop and the writes)... which already halves your frame rate to 10 fps, and depending on the difficulty of the calculation, there is really not much headroom left.
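To put numbers on the copy step, here is a cc65-style C sketch of blasting a 16-color framebuffer from RAM into VRAM with VERA's auto-increment turned on, so only the data port is touched inside the loop. The register addresses follow the published VERA documentation; the target VRAM address is an arbitrary example, and the cycle figures in the comments are the rough estimates from this post, not measurements.

```c
#include <stdint.h>

/* VERA external address/data ports (per the published VERA register map). */
#define VERA_ADDR_L (*(volatile uint8_t *)0x9F20)
#define VERA_ADDR_M (*(volatile uint8_t *)0x9F21)
#define VERA_ADDR_H (*(volatile uint8_t *)0x9F22)   /* high nibble selects the auto-increment step */
#define VERA_DATA0  (*(volatile uint8_t *)0x9F23)

/* Copy one 320x240 4bpp (16-color) frame from a RAM buffer into VRAM.
   With the increment set to +1, only DATA0 is written inside the loop and
   VERA advances its internal address after every access. */
static void blit_frame(const uint8_t *src)
{
    uint16_t i;

    VERA_ADDR_L = 0x00;          /* example target: VRAM $00000 */
    VERA_ADDR_M = 0x00;
    VERA_ADDR_H = 0x10;          /* address bit 16 = 0, auto-increment = +1 */

    /* 320*240 pixels at 2 pixels per byte = 38,400 bytes.
       At a rough 10 cycles per byte that is ~384,000 cycles,
       i.e. about 1/20 of a second at the X16's 8 MHz: a ~20 fps copy
       before any of the actual rendering work is counted. */
    for (i = 0; i < 38400u; i++)
        VERA_DATA0 = src[i];
}
```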
So you would need to reduce the viewport massively. Full Screen DOOM or Wolfenstein is not realistic. (I know you did not mention Full Screen).
- Posts: 913
- Joined: Tue Apr 28, 2020 2:45 am
My First C64/X16 Programs!
14 minutes ago, SerErris said:
That is by far too much memory... we only have 2MB max. Yes, that would be very fast, but no, it is not workable, because you could not do anything anymore besides calculations.
There are other efficient solutions that still rely partially on lookup tables, but that is too much.
I don't think any one program would need all of these, unless it was a scientific calculator or really advanced spreadsheet. If you need to do one or two of these calculations very often, it would be totally worthwhile to take up the space in banked RAM.
My First C64/X16 Programs!
19 minutes ago, SlithyMatt said:
I don't think any one program would need all of these, unless it was a scientific calculator or really advanced spreadsheet. If you need to do one or two of these calculations very often, it would be totally worthwhile to take up the space in banked RAM.
I was thinking of using an external card with memory-mapped I/O, akin to the VERA chip with its own memory banks, i.e. the external 2MB of RAM accessible through the 32-byte I/O window. The X16's RAM would be preserved.
I guess we can forget about shaded 3D but maybe a fast game using wireframe instead? Last resort... 200MHz 65c02!