Interested in developing a simple DMA controller

StephenHorn · Post by **StephenHorn** » Tue Mar 07, 2023 7:37 pm

I have an idea for a very simple DMA controller. It seems like it shouldn't be too hard, but I'm struggling a bit with the Verilog because I'm completely new to the language, and it looks an awful lot like the practical options for simulating Verilog are all commercial software or web portals. I've been making do with web portals as I explore.

My thought is that the programming interface could expose the following registers:

+-----+-------------------------------------------+
| Reg | Description                               |
+-----+-------------------------------------------+
| $00 | Source address (low byte)                 |
| $01 | Source address (high byte)                |
| $02 | Source address increment (low byte)       |
| $03 | Source address increment (high byte)      |
| $04 | Destination address (low byte)            |
| $05 | Destination address (high byte)           |
| $06 | Destination address increment (low byte)  |
| $07 | Destination address increment (high byte) |
| $08 | Byte count (low byte)                     |
| $09 | Byte count (high byte)                    |
+-----+-------------------------------------------+

The idea, then, is that a programmer populates the first eight registers, and then a non-zero write to register $08 or $09 triggers an immediate copying of bytes -- by the time the CPU executes its next instruction, however many clock cycles later, the transfer has already finished and the count register has decremented its way back to zero.

I think that this will need 29 pins:

16 address pins
8 data pins
VCC
GND
RDY
PHI2
RWB

I'm uncertain if I need a 30th pin for CE ("chip enable"), or if I can safely infer that. I think I might need it and rely on external logic to trigger it, because I don't necessarily want the design to be "stuck" on an exact set of I/O addresses (though I figure the module would take up one of the I/O expansion slots' address ranges).

My approach so far (and what I'm trying to code in Verilog) is that the byte count registers are actually 17 bits total, with the external registers mapped to [16:1] and the 0th bit being set on any non-zero writes to external byte count registers. The chip would assert RDY so long as any non-zero bits exist in the internal byte count register. And as long as RDY is asserted, the 0th bit of the internal count register then controls whether the chip is on the "read phase" of a memory transfer or the "write phase":

On a 1, the chip attempts to read from the source address into an internal buffer register of 8 bits, then increments the "source address" by the value in "source address increment", and decrements the internal count register by 1.
On a 0, the chip attempts to write to the destination address the data in the internal buffer register, then increments the "destination address" by the value in "destination address increment", and if any non-zero bits exist in the internal count register then it decrements the internal count register by 1. If all bits in the internal count register are zero, then RDY is de-asserted so the memory copy process stops and the CPU can resume execution.
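To make the read/write alternation concrete, here is a small behavioral model in Python (not Verilog, and not cycle-accurate). It follows the two rules above as literally as I can read them. The function name, the 64 KiB `bytearray` standing in for the bus, and the 16-bit address wraparound are all assumptions made for the sketch.

```python
def dma_copy_literal(mem, src, src_inc, dst, dst_inc, count):
    """Model the two-phase copy on `mem` (a 65536-byte bytearray).

    `count` is the 16-bit value the programmer wrote to the byte count
    registers. Returns the number of bytes written to the destination.
    """
    # 17-bit internal register: external count in [16:1], and bit 0 set
    # by any non-zero write.
    internal = ((count & 0xFFFF) << 1) | (1 if count else 0)
    buf = 0                    # 8-bit internal buffer register
    written = 0
    busy = internal != 0       # stands in for the CPU being held via RDY
    while busy:
        if internal & 1:
            # Read phase: fetch a byte, advance the source, decrement.
            buf = mem[src]
            src = (src + src_inc) & 0xFFFF
            internal -= 1
        else:
            # Write phase: store the byte and advance the destination.
            mem[dst] = buf
            dst = (dst + dst_inc) & 0xFFFF
            written += 1
            if internal != 0:
                internal -= 1  # bit 0 becomes 1 again: next step is a read
            else:
                busy = False   # all bits zero: release the CPU
    return written
```

Taken literally, the internal register starts at 2N+1 for a count of N, so this model moves N+1 bytes (a count of 3 returns 4). Either this reading of the rules is off, or the write phase needs to stop one step earlier; it seems worth checking in simulation before committing to the 17-bit trick.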

My thinking behind 16-bit wide increment registers was to keep the Verilog simple -- folks can initialize the high bytes to zero and forget about them if they aren't decrementing or otherwise don't need them. It also means I don't potentially have to map values the way VERA does. I more-or-less expect the majority use case to be address increment values of 0 or 1 -- depending on whether we're interacting with something like VERA on the source or destination side, where all the data is coming from, or going to, one bus address. They could certainly be reduced to 8-bit fields using two's complement signed numbers, as the sign extension itself should be trivial; it just limits the maximum "stride" of striped data, such as columns of graphics data or something.
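For the 8-bit alternative, the sign extension would look roughly like this. It is a Python sketch of the arithmetic only; `sign_extend8` and `step_address` are hypothetical helper names, and the 16-bit wraparound is an assumption about how the address adder would behave.

```python
def sign_extend8(value):
    """Widen an 8-bit two's complement value to 16 bits."""
    value &= 0xFF
    # If bit 7 is set the value is negative, so fill the high byte with ones.
    return (value | 0xFF00) if (value & 0x80) else value


def step_address(addr, inc8):
    """Advance a 16-bit address by a signed 8-bit increment, wrapping at 64 KiB."""
    return (addr + sign_extend8(inc8)) & 0xFFFF
```

With this encoding an increment of $FF steps the address back by one and $80 steps it back by 128, so the usable stride is limited to -128..+127.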

But I would definitely appreciate some advice on whether I'm potentially missing some consideration, or for resources that would allow me to simulate Verilog without copy-pasting code into a web form. In particular, I'm not sure how much work should be done on positive clock edges versus negatives, and I haven't begun considering the possibility that I'd have to wait some delay after an edge before trusting the state of various wires/pins.

Or, if someone is inspired and decides to just roll with it and beats me to the punch with functional Verilog -- I mean, hey, I have Box16 to divide my free time with as well, I won't be remotely offended if someone posts with a GitHub repo of Verilog implementing this chip and/or other hardware necessities to support it. Heck, I'd even be willing to help test it on my own official dev board once I can get the parts. For now, this is on "slow burn" mode for me because I've been curious about getting into Verilog, and I have pretty finite quantities of free time available that aren't otherwise allocated at the moment.

Dacobi · Post by **Dacobi** » Tue Mar 07, 2023 8:31 pm

Is this meant to be an expansion card?

The reason I ask is that I've been messing around with one of the cheaper Arty FPGA boards and had the idea to make an expansion card for the x16 that would work as an adapter for an Arty board, with VGA and audio outputs.
I'm not sure if I'd be able to pull it off, but I do have a simple framebuffer with VGA output working on my Arty S50 written in Verilog.

I haven't done much simulation, but Vivado will simulate Verilog when it's set up for specific hardware.

StephenHorn · Post by **StephenHorn** » Tue Mar 07, 2023 9:54 pm

Dacobi wrote: Tue Mar 07, 2023 8:31 pm

Is this meant to be an expansion card?

The reason I ask is that I've been messing around with one of the cheaper Arty FPGA boards and had the idea to make an expansion card for the x16 that would work as an adapter for an Arty board, with VGA and audio outputs.
I'm not sure if I'd be able to pull it off, but I do have a simple framebuffer with VGA output working on my Arty S50 written in Verilog.

I haven't done much simulation, but Vivado will simulate Verilog when it's set up for specific hardware.

I'm hoping for something that's a bit more modular -- it could be an expansion card unto itself, but if I had the experience and time then my end goal would be a component that could be welded with Kevin's basic cartridge design. Although that would almost certainly make the cartridge significantly more expensive, it could also significantly expand the game-oriented capabilities of the cartridge: more audio could be pushed through the VERA's PCM buffer, and more graphics could be exchanged with VRAM, for starters. And as a cartridge component, a game developer can opt in to the component without having to question the adoption rate of an expansion card.

This is also why I wasn't certain if I would need to program in a chip-enable pin, to allow the FPGA to live on any base address and then only consider the bottom 4 bits of its address lines when the CPU signals a write to it.
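The split between external decode logic and the chip itself could be modeled like this. It is a Python sketch only: `IO_BASE` is a hypothetical base address picked from the $9F60-$9FFF expansion range purely for illustration, and both function names are made up for the example.

```python
IO_BASE = 0x9F60  # hypothetical base address, chosen only for illustration


def chip_enable(addr, base=IO_BASE):
    """External decode logic: select the chip when the upper 12 address bits match."""
    return (addr & 0xFFF0) == (base & 0xFFF0)


def register_index(addr):
    """Inside the chip: only the bottom 4 address bits pick a register."""
    return addr & 0x000F
```

Under these assumptions a CPU write to $9F68 selects the chip and lands on register $08, while a write to $9F70 leaves CE inactive, so the same design could sit at any 16-byte-aligned base.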

Dacobi · Post by **Dacobi** » Tue Mar 07, 2023 11:39 pm

I hadn't seen the cartridge design yet, but can definitely see how a DMA controller would be an advantage.

I'm mostly a high level software guy even though I originally studied EE.

There are many things I don't know about the x16 hardware layout.
How is the expansion slot IO mapped?
There's a wiki saying that each slot has 32 bytes of memory mapped IO.
How does writing to VERA work from a DMA controller? Is there a clock signal that the controller would need to access?

You said that a transfer would start when a non-zero value is written to registers 8 and 9. I guess this means you write to 9 first and then 8?

grml · Post by **grml** » Tue Mar 07, 2023 11:56 pm

How to produce a "retro" computer in 2023:

You take a modern piece of hardware, in a time where memory and bandwidth are very cheap. You then proceed to intentionally cripple the hardware by removing the parts that are "too modern" for you, because you want to emulate that "retro" feeling.

Then, because the crippled hardware is slow and cumbersome, you come up with ways to improve the crippled hardware by adding more crippled hardware that is only needed because you've intentionally crippled that original piece of hardware.

I do not understand any of this. It looks like a game for masochists.

StephenHorn · Post by **StephenHorn** » Wed Mar 08, 2023 12:17 am

Dacobi wrote: Tue Mar 07, 2023 11:39 pm

I hadn't seen the cartridge design yet, but can definitely see how a DMA controller would be an advantage.

I'm mostly a high level software guy even though I originally studied EE.

There are many things I don't know about the x16 hardware layout.
How is the expansion slot IO mapped?
There's a wiki saying that each slot has 32 bytes of memory mapped IO.
How does writing to VERA work from a DMA controller? Is there a clock signal that the controller would need to access?

You said that a transfer would start when a non-zero value is written to registers 8 and 9. I guess this means you write to 9 first and then 8?

Now that we have development boards, and presumably the expansion ports are final (or very near final), the expansion ports have been documented on GitHub (link). As far as the expansion I/O addressing itself, the only documentation that I'm immediately aware of says that addresses $9F60-$9FFF are available for external devices, but there doesn't appear to be any additional specification, nor any real barrier preventing expansion cards from taking up that space at will.

As far as the VERA working with DMA... I don't know if it can handle a write every other clock cycle. That would possibly require testing. When I've casually discussed the possibility of DMA before, however, the impression was that the greatest risk was to VERA's display performance if this was done outside of V-blank and/or H-blank. I also don't know how this would interact with the speculative changes being worked on by JeffreyH of the Discord server.

As for the copy timing -- my thought was that the copy would start when a non-zero value is written to either register 8 or 9. The order in which the bytes are written doesn't matter, and it doesn't seem to add or remove any clock cycles from the copy task, since the first byte read in a copy occurs on the clock cycle immediately following the write, and the CPU resumes execution on the clock cycle immediately following the last byte write. Effectively, writing non-zero to the low byte copies some quantity of individual bytes, writing non-zero to the high byte copies some multiple of 256 bytes, and if you need to write both bytes then go ahead and write them in any order. The only risk is if your destination range clobbers the memory source you're reading the byte quantities from -- in which case it would be prudent to read both bytes into CPU registers before writing either byte to the DMA controller.

That also means that reads from either register 8 or 9 are always expected to return zero, because any write to either register will immediately assert RDY and kick off the copy process until the registers are decremented back to zero.
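Here is a rough Python model of how the register interface would look from the programmer's side. The class and method names are hypothetical. It assumes that a count of N moves exactly N bytes and that the source and destination address registers keep their advanced values after a transfer, which is what makes the "any order" claim work.

```python
class DmaModel:
    """Programmer-visible model of registers $00-$09 (not cycle-accurate)."""

    def __init__(self, mem):
        self.mem = mem          # 65536-byte bytearray standing in for the bus
        self.regs = [0] * 8     # $00-$07: addresses and increments

    def _word(self, lo):
        return self.regs[lo] | (self.regs[lo + 1] << 8)

    def _set_word(self, lo, value):
        self.regs[lo] = value & 0xFF
        self.regs[lo + 1] = (value >> 8) & 0xFF

    def write(self, reg, value):
        value &= 0xFF
        if reg < 8:
            self.regs[reg] = value
        elif reg == 8:
            self._copy(value)         # low byte: 0..255 bytes
        elif reg == 9:
            self._copy(value << 8)    # high byte: a multiple of 256 bytes

    def read(self, reg):
        # The count has always run back down to zero by the time the CPU
        # can read it, so $08 and $09 read as zero.
        return self.regs[reg] if reg < 8 else 0

    def _copy(self, count):
        src, src_inc = self._word(0), self._word(2)
        dst, dst_inc = self._word(4), self._word(6)
        for _ in range(count):
            self.mem[dst] = self.mem[src]
            src = (src + src_inc) & 0xFFFF
            dst = (dst + dst_inc) & 0xFFFF
        # Assumption: the address registers retain their advanced values.
        self._set_word(0, src)
        self._set_word(4, dst)
```

In this model, copying $0134 bytes by writing $01 to register 9 and then $34 to register 8 gives the same result as writing them in the opposite order, because the second transfer simply continues from where the first one left off.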

Dacobi · Post by **Dacobi** » Wed Mar 08, 2023 1:14 am

One thing I still don't understand. How does VERA know when a new byte is written to data0/1?

StephenHorn · Post by **StephenHorn** » Wed Mar 08, 2023 1:22 am

It's on the same bus, right? I suppose I've been making the assumption that if I have access to the appropriate wires of the bus, nothing stops me from setting the address lines and RWB appropriately while RDY is asserted and sending data from this module to the VERA just like the CPU would. I mean, a diode would certainly stop me, if there is one, but I don't know whether there is or not and have been assuming not.

It's perfectly fair to say that I'm still making assumptions about the board and the capabilities that expansion cards will have.

Dacobi · Post by **Dacobi** » Wed Mar 08, 2023 1:27 am

Sorry I didn't express that correctly. What I meant is that since I haven't seen the schematic I still don't understand it in general.
When writing to data0/1 just using the CPU you can, e.g., write zero many times. What tells VERA that a new byte is written when it's the same as the last byte?

StephenHorn · Post by **StephenHorn** » Wed Mar 08, 2023 3:31 am

I'm not sure of the answer to that. About the only thing that's clear is that PHI2 is not directly provided over the VERA header. I have some idea of who to ask, though.
