On 8/31/2020 at 4:47 PM, desertfish said:
Also the C64 (and CX16's) math functions operate on 5-byte floats i.e. 40 bits. Internally they work with 6 bytes I think for intermediate rounding precision, but the float values stored in memory occupy 5 bytes. Of course all floating point operations are implemented in software in the ROM
Four bytes of that are the number part, the "mantissa", with the fifth byte the exponent ... the "x10^[__]" part in scientific notation, but since it is a binary floating point it is really "x2^[___]".
This is more precision than a "standard" 32bit float, and for most floating point applications that the CX16 can handle it makes long (64bit) floats pretty much redundant. A standard 32bit float stores 23 bits of mantissa, but thanks to a trick it represents a 24 bit ... three byte ... numeric part. Floating point slides the binary number until the leading bit in front of the "binary" point (not "decimal" point) is a "1", and since you know that bit is always a "1", you don't have to store it. IOW, if the result of an operation is 0.0011011101...x2^12, that is converted to 1.1011101...x2^9, and only the bits after the binary point are stored. That stored part is an unsigned value: the sign of the mantissa goes in the high bit (bit 31) of the floating point number, and bits 23-30 hold an unsigned value that represents (exponent+127), so 2^0 is stored as 127 ($7F).
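To make that layout concrete, here is a small Python sketch that pulls a standard 32bit float apart into the three fields described above. The helper name `split_float32` is just something made up for this post, and it leans on Python's `struct` module to produce the IEEE 754 single-precision bit pattern.

```python
import struct

def split_float32(x):
    """Return (sign, biased_exponent, fraction) of x stored as a 32bit float."""
    # Pack as a big-endian single, then reread the same 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # bit 31: sign of the mantissa
    biased_exp = (bits >> 23) & 0xFF   # bits 23-30: exponent + 127
    fraction = bits & 0x7FFFFF         # bits 0-22: bits after the binary point
    return sign, biased_exp, fraction

# The example from the text: 0.0011011101 (binary) x 2^12, which is 884.
value = 0b0011011101 / 2**10 * 2**12
sign, biased_exp, fraction = split_float32(value)
print(sign, biased_exp - 127, format(fraction, "023b"))
# prints: 0 9 10111010000000000000000
```

The exponent comes out as 9 and the stored bits start with 1011101, with the leading "1." nowhere in the stored pattern.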
So standard floating point numbers can PRECISELY represent every integer up to +/-16,777,216 (2^24) ... about +/-16.7 million. Outside of that range, they can only precisely represent integers that have an appropriate power of 2 as a factor.
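That limit is easy to check. The sketch below round-trips integers through a 32bit float using Python's `struct` module; `to_float32` is a made-up helper name, not anything from the C64 or CX16 ROM.

```python
import struct

def to_float32(x):
    """Round x to the nearest 32bit float and return it as a Python float."""
    return struct.unpack(">f", struct.pack(">f", float(x)))[0]

limit = 2**24  # 16,777,216

print(to_float32(limit - 1) == limit - 1)  # True
print(to_float32(limit) == limit)          # True
print(to_float32(limit + 1) == limit + 1)  # False: odd, would need 25 bits
print(to_float32(limit + 2) == limit + 2)  # True: has 2 as a factor
```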
By contrast, the Microsoft 6502 "extended" floating points (at Commodore's insistence) can precisely represent every integer up to +/-4,294,967,296 (2^32) ... about +/-4.3 billion. The reason for Commodore's insistence is that if you do exact accounting, you actually represent dollar values as an integer number of CENTS, so standard 32bit floats can only precisely represent up to +/-$167,772.16, and to Commodore's way of thinking, that wasn't big enough. A simple eight digit calculator can do better (using signed-magnitude Binary Coded Decimal arithmetic) ... +/-$999,999.99 ... and they weren't going to have an expensive computer system beaten by an eight digit calculator!!!
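Here is a rough Python sketch of the 5-byte format as stored in memory, to show where the extra range comes from. It assumes the layout as I understand it: one exponent byte holding (exponent+128) for a mantissa read as 0.1xxx..., an exponent byte of zero meaning the value 0, and four mantissa bytes whose top bit is replaced by the sign because the leading "1" is implied. The names `mbf5_encode` and `mbf5_decode` are invented for this post, and the encoder simply truncates extra bits, which is not necessarily how the ROM rounds.

```python
import math

def mbf5_encode(x):
    """Pack x into 5 bytes: exponent byte, then sign bit + 31 mantissa bits."""
    if x == 0:
        return bytes(5)                  # exponent byte 0 means zero
    m, e = math.frexp(abs(x))            # abs(x) == m * 2**e, 0.5 <= m < 1
    mantissa = int(m * 2**32)            # 32 bits, top bit always set
    exp_byte = e + 128                   # assumed bias for the 0.1xxx form
    if not 1 <= exp_byte <= 255:
        raise OverflowError("exponent does not fit in one byte")
    mantissa &= 0x7FFFFFFF               # drop the implied leading 1
    if x < 0:
        mantissa |= 0x80000000           # sign takes over that bit
    return bytes([exp_byte]) + mantissa.to_bytes(4, "big")

def mbf5_decode(b):
    """Unpack 5 bytes produced by mbf5_encode back into a Python float."""
    if b[0] == 0:
        return 0.0
    raw = int.from_bytes(b[1:5], "big")
    sign = -1.0 if raw & 0x80000000 else 1.0
    mantissa = raw | 0x80000000          # restore the implied leading 1
    return sign * math.ldexp(mantissa, b[0] - 128 - 32)

print(mbf5_encode(1.0).hex(" "))         # 81 00 00 00 00
cents = 4_294_967_295                    # $42,949,672.95 held as cents
print(mbf5_decode(mbf5_encode(cents)) == cents)  # True
```

Under those assumptions, every whole number of cents up to about $42.9 million survives the round trip, against $167,772.16 for a 32bit float.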
This is actually twice the range of xForth's "double cell" integers, because the floating point values are sign+magnitude, while Forth has native two's complement signed integers that run from -2,147,483,648 to +2,147,483,647. So while floating point generally has LESS "precision" than scaled fixed point of the same size ... C64 floating point actually has one more bit of precision than scaled signed 32bit fixed point, so roughly twice the range (it's the same precision as 32bit unsigned, because if the data is unsigned then the mantissa sign flag being clear doesn't give any extra information).
Still, +/- 2 billion tends to be enough for lots of purposes when you have numbers that don't fit into the signed +/-32,000 ish or unsigned 64 thousand ish of 16bit integers.