My thought is that the programming interface could expose the following registers:
+-----+-------------------------------------------+
| Reg | Description                               |
+-----+-------------------------------------------+
| $00 | Source address (low byte)                 |
| $01 | Source address (high byte)                |
| $02 | Source address increment (low byte)       |
| $03 | Source address increment (high byte)      |
| $04 | Destination address (low byte)            |
| $05 | Destination address (high byte)           |
| $06 | Destination address increment (low byte)  |
| $07 | Destination address increment (high byte) |
| $08 | Byte count (low byte)                     |
| $09 | Byte count (high byte)                    |
+-----+-------------------------------------------+

The idea, then, is that a programmer populates the first eight registers, and then non-zero writes to registers $08 and $09 trigger an immediate copying of bytes -- by the time the CPU executes its next instruction, the transfer has already finished and the byte count register has decremented its way back to zero, however many clock cycles later.
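To make the programming sequence concrete, here is a rough Python sketch of the register writes a program would issue for one transfer. The helper name `transfer_writes`, the default increments of 1, and writing the count bytes in table order ($08 then $09) are all my own placeholders, not settled parts of the design.

```python
# Register offsets from the table above (relative to wherever the chip is mapped).
SRC_LO, SRC_HI = 0x00, 0x01
SRC_INC_LO, SRC_INC_HI = 0x02, 0x03
DST_LO, DST_HI = 0x04, 0x05
DST_INC_LO, DST_INC_HI = 0x06, 0x07
COUNT_LO, COUNT_HI = 0x08, 0x09


def transfer_writes(src, dst, count, src_inc=1, dst_inc=1):
    """Return the (register, byte) writes for one transfer, in order.

    Hypothetical helper: the eight setup registers come first and the
    byte count comes last, since writing the count starts the copy.
    """
    def split(value):
        # Break a 16-bit value into (low byte, high byte).
        value &= 0xFFFF
        return value & 0xFF, value >> 8

    writes = []
    for (lo_reg, hi_reg), value in (
        ((SRC_LO, SRC_HI), src),
        ((SRC_INC_LO, SRC_INC_HI), src_inc),
        ((DST_LO, DST_HI), dst),
        ((DST_INC_LO, DST_INC_HI), dst_inc),
    ):
        lo, hi = split(value)
        writes += [(lo_reg, lo), (hi_reg, hi)]

    # Assumption: count bytes are written in table order ($08, then $09).
    count_lo, count_hi = split(count)
    writes += [(COUNT_LO, count_lo), (COUNT_HI, count_hi)]
    return writes
```

For instance, `transfer_writes(0x1234, 0xABCD, 0x0010, dst_inc=0)` produces ten writes, with a destination increment of zero so that every byte lands on the same address.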
I think that this will need 29 pins:
- 16 address pins
- 8 data pins
- VCC
- GND
- RDY
- PHI2
- RWB
My approach so far (and what I'm trying to code in Verilog) is that the internal byte count register is actually 17 bits wide: the external byte count registers map to bits [16:1], and bit 0 is set on any non-zero write to either external byte count register. The chip would assert RDY (that is, keep the CPU halted) for as long as any bit in the internal byte count register is non-zero. While RDY is asserted, bit 0 of the internal count register controls whether the chip is in the "read phase" or the "write phase" of a memory transfer:
- On a 1, the chip reads from the source address into an internal 8-bit buffer register, then increments the "source address" by the value in "source address increment", and decrements the internal count register by 1.
- On a 0, the chip writes the contents of the internal buffer register to the destination address, then increments the "destination address" by the value in "destination address increment". If any non-zero bits remain in the internal count register, it decrements the internal count register by 1. If all bits in the internal count register are zero, RDY is de-asserted, so the memory copy stops and the CPU can resume execution.
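To sanity-check the phase logic before committing it to Verilog, here is a rough behavioral model in Python. It is only a sketch under some assumptions: the function name `run_transfer` is made up, the whole 16-bit count is loaded at once rather than one register write at a time, and each loop iteration stands in for one bus cycle with no attempt at real timing.

```python
def run_transfer(memory, src, dst, count, src_inc=1, dst_inc=1):
    """Model the read/write phases described above; return cycles used.

    `memory` is a mutable sequence of 65536 bytes. Hypothetical helper,
    not timing-accurate: one phase per loop iteration.
    """
    if count & 0xFFFF == 0:
        return 0  # only non-zero count writes start a transfer

    # 17-bit internal counter: external count in bits [16:1], phase in bit 0.
    counter = ((count & 0xFFFF) << 1) | 1
    buffer = 0
    cycles = 0
    while True:
        cycles += 1
        if counter & 1:
            # Read phase: latch a byte, step the source, decrement.
            buffer = memory[src]
            src = (src + src_inc) & 0xFFFF
            counter -= 1
        else:
            # Write phase: store the latched byte, step the destination.
            memory[dst] = buffer
            dst = (dst + dst_inc) & 0xFFFF
            if counter == 0:
                break  # counter exhausted: RDY is released, CPU resumes
            counter -= 1
    return cycles
```

If I've transcribed the rules faithfully, `run_transfer(mem, 0x1000, 0x2000, 3)` takes 8 cycles and moves 4 bytes, so a count of N copies N+1 bytes. That could be treated as a "count minus one" convention or as an off-by-one to fix in the phase rules.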
But I would definitely appreciate advice on whether I'm missing some consideration, or pointers to resources that would let me simulate Verilog without copy-pasting code into a web form. In particular, I'm not sure how much work should be done on positive clock edges versus negative ones, and I haven't yet considered whether I'd have to wait some delay after an edge before trusting the state of various wires/pins.
Or, if someone is inspired and decides to just roll with it and beats me to the punch with functional Verilog -- I mean, hey, I have Box16 to divide my free time with as well, I won't be remotely offended if someone posts with a GitHub repo of Verilog implementing this chip and/or other hardware necessities to support it. Heck, I'd even be willing to help test it on my own official dev board once I can get the parts. For now, this is on "slow burn" mode for me because I've been curious about getting into Verilog, and I have pretty finite quantities of free time available that aren't otherwise allocated at the moment.