Commander X16 audio capabilities

m00dawg · Post by **m00dawg** » Mon Feb 01, 2021 7:11 pm

6 minutes ago, rje said:

I'm not an expert on sound AT ALL, but I do remember that the SID envelopes made things like gunshots and explosions very easy.

Yes. This is true for the NES as well, and it's why I tend to think "less is more" when it comes to the number of VeraSound channels. That is, if we have to sacrifice channels to get some hardware nice-to-haves, I think it would be worth the trade-off. Of course I'll be content with whatever we end up with, but things like buzz noise, hardware envelopes (even if rudimentary), and a loop trigger on the DPCM (for being able to use "drawable" chiptune sounds) would make the VeraSound a spiritual successor to the sound chips found in the NES, GB, and TG16 while still being appropriately chippy.

Kliepatsch has a good point about the aliasing too. I wonder if a simple analog low-pass filter on the audio output stage is within reason. For fixed frequency, a simple RC filter would probably do.
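For reference, the corner frequency of a first-order RC low-pass is f_c = 1 / (2πRC). A minimal sketch of that arithmetic follows; the component values in the usage note are picked purely for illustration and do not come from any X16 schematic.

```python
import math

def rc_cutoff_hz(r_ohms, c_farads):
    """-3 dB corner frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def rc_attenuation_db(f_hz, r_ohms, c_farads):
    """Gain in dB (negative means attenuation) of the same filter at f_hz."""
    fc = rc_cutoff_hz(r_ohms, c_farads)
    return -10.0 * math.log10(1.0 + (f_hz / fc) ** 2)
```

With a hypothetical 1 kΩ resistor and 10 nF capacitor, `rc_cutoff_hz(1000, 10e-9)` comes out at roughly 15.9 kHz, and the roll-off beyond that is only about 6 dB per octave.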

kliepatsch · Post by **kliepatsch** » Mon Feb 01, 2021 7:45 pm

33 minutes ago, m00dawg said:

I wonder if a simple analog low-pass filter on the audio output stage is within reason. For fixed frequency, a simple RC filter would probably do.

Unfortunately, a simple analog LP filter wouldn't do the trick: the audible frequency range is contaminated right from the start, so no filter will help there. Only buffers with clean waveforms for different frequencies, or oversampling, would help. But I don't think we'll see any of that happening. Let's face what we've got and try to make the best of it. Triangle waves still sound decent at high pitch, and by layering two or three of them (like a short overtone series) and modulating their amplitudes we will be able to create nice-sounding high-pitched sounds. (Or even less than that might suffice!)
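To illustrate why a filter after the DAC cannot undo this, here is a rough sketch of where the odd harmonics of a naively generated square wave land once they fold back around the Nyquist frequency. The 48 kHz sample rate and 5 kHz note are round numbers assumed for the example, not VERA's actual figures.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a component at f_hz appears after sampling at fs_hz."""
    folded = f_hz % fs_hz
    return fs_hz - folded if folded > fs_hz / 2 else folded

def square_harmonics(f0_hz, count):
    """First `count` odd harmonics of an ideal square wave at f0_hz."""
    return [f0_hz * (2 * k + 1) for k in range(count)]

# Assumed example values: 5 kHz square wave, 48 kHz sample rate.
aliased = [alias_frequency(h, 48000) for h in square_harmonics(5000, 6)]
```

Here `aliased` is `[5000, 15000, 23000, 13000, 3000, 7000]`. The components at 23, 13, 3 and 7 kHz are not multiples of 5 kHz, and several of them sit well inside the audible band, below any cutoff a low-pass filter could sensibly use.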

44 minutes ago, rje said:

I'm not an expert on sound AT ALL, but I do remember that the SID envelopes made things like gunshots and explosions very easy.

True. If the SID had an overall volume parameter in addition to the attack, decay, sustain and release parameters, it would be truly superior to the VERA, because then the volume control of the SID would sound better, require less CPU, and be easier to operate while still providing enough flexibility. But because the SID lacks a volume control per voice, it seems to be rather hard to "mix" tunes. This is where the direct volume control of the PSG comes in handy.

BruceMcF · Post by **BruceMcF** » Mon Feb 01, 2021 7:48 pm

Certainly I would trade off some channels for ADSR, but I only know the VERY first thing about FPGA design (substantially less than the bit I know about the SID), and I certainly don't know whether 8 ADSR channels would be available for the slices and gates used by 16 simple tone generator channels.

And of course the OPM has ADSR, and the PCM is well suited to things like gunshots.

Presuming the PSG design is locked in place, one thing you do when you have an abundance of one resource and are short of another is look at how to use more of the first to economize on the second. So, for instance, I wouldn't be surprised to see approaches that use each channel for a distinctive sound, with all of them turned on at zero volume, and that play entirely by manipulating the volume registers. You can do that fairly quickly for a sequence of PSG channels by selecting the first volume address and setting the VERA increment to four bytes.
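The increment trick can be sketched with a toy model of a VERA data port. The register layout used here (PSG voices starting at VRAM $1F9C0, four bytes per voice, volume in the low six bits of byte 2 with the left/right enables in the top two bits) is an assumption for this example and should be checked against the current VERA documentation; the class and function names are hypothetical.

```python
PSG_BASE = 0x1F9C0       # assumed VRAM address of PSG voice 0
BYTES_PER_VOICE = 4      # assumed register stride per voice
VOLUME_OFFSET = 2        # assumed: byte 2 = L/R enable (bits 7-6) + volume (bits 5-0)

class VeraPortModel:
    """Toy model of one VERA data port with address auto-increment."""
    def __init__(self, vram_size=0x20000):
        self.vram = bytearray(vram_size)
        self.addr = 0
        self.step = 0

    def set_address(self, addr, step):
        self.addr = addr
        self.step = step

    def write(self, value):
        # Store the byte, then advance by the configured increment.
        self.vram[self.addr] = value & 0xFF
        self.addr += self.step

def set_volumes(port, volumes, first_voice=0, pan=0xC0):
    """Write consecutive voice volumes using an increment of four bytes."""
    start = PSG_BASE + first_voice * BYTES_PER_VOICE + VOLUME_OFFSET
    port.set_address(start, BYTES_PER_VOICE)
    for v in volumes:
        port.write(pan | (v & 0x3F))

port = VeraPortModel()
set_volumes(port, [63, 0, 32])   # voices 0..2: full, silent, about half
```

The point of the design is that the address is set once and every later write is a single store to the data port, so updating many voices costs one write per voice.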

And with that many channels, following @kliepatsch, you may be able to gang noise channels together at slightly different frequencies for a fuller electric snare.

And, also following @kliepatsch, the Achilles heel for this kind of sound generator is low frequencies, where the PCM can benefit from the fact that low frequencies can rest in the middle of the soundscape and you may be able to get away with a lower frequency playback.

m00dawg · Post by **m00dawg** » Mon Feb 01, 2021 8:01 pm

All good points. Kliepatsch, curious, how do you think the VERA compares to, say, a real NES (if you've heard one)? The NES DPCM _definitely_ has some aliasing. I found the squarewaves to be pretty decent, though ironically the triangle of the NES is where I tend to hear tons of aliasing (I think this makes sense given how the NES generated the triangle compared to how the Vera does).

Also good points Bruce! Clever idea for considering volume here. The one thing I would say is part of what makes interesting sounds isn't a volume gate but how parameters change over time. So if we want to make a bassdrum using a square (common trick for chiptune music), we would want to manipulate the pitch and/or PWM. Both require changing things fairly rapidly so turning the volume off or on won't be all that is required. In fact even simple things like quick stabs or plucks tend to have automation.

At its core, really the Vera is more like 16 oscillators. So in conventional synthesizer terms, it provides the VCOs, but we have to implement the VCAs, LFOs, and envelopes all in software.

I think some of the concepts in Kliepatsch's Concerto could be applied here. I hadn't thought of using multi-timbral sounds in my tracker development originally but find Concerto's approach quite interesting. It could be one way to help lighten the load on the CPU. For instance, instead of thinking about VeraSound as 16 voices, we could instead think of it as 8 voices that each have two oscillators. This means at least some automation can be shared across two voices. That cuts the number of envelopes by up to half. Though each real voice still needs to be modulated and that still means lots of calls to the Vera - those don't go away, just the amount of time spent evaluating envelopes does.
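A minimal sketch of that idea: one envelope is evaluated once per tick and the result is applied to both oscillators of a voice. The linear decay shape, the 0-63 volume range and all names are assumptions for illustration, not code from Concerto or the tracker.

```python
def decay_envelope(tick, length):
    """Linear decay from 1.0 down to 0.0 over `length` ticks."""
    if tick >= length:
        return 0.0
    return 1.0 - tick / length

def voice_volumes(tick, length, osc_levels):
    """Evaluate the shared envelope once, then scale each oscillator by it."""
    env = decay_envelope(tick, length)          # one evaluation per voice
    return [int(63 * env * level) for level in osc_levels]

# Two oscillators in one voice, the second at half level.
start = voice_volumes(0, 8, (1.0, 0.5))
middle = voice_volumes(4, 8, (1.0, 0.5))
```

Each oscillator still gets its own register write, as noted above; only the envelope evaluation is shared.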

kliepatsch · Post by **kliepatsch** » Mon Feb 01, 2021 8:21 pm

23 minutes ago, m00dawg said:

All good points. Kliepatsch, curious, how do you think the VERA compares to, say, a real NES (if you've heard one)? The NES DPCM _definitely_ has some aliasing. I found the squarewaves to be pretty decent, though ironically the triangle of the NES is where I tend to hear tons of aliasing (I think this makes sense given how the NES generated the triangle compared to how the Vera does).

I can't say too much about it. I have listened to a few chiptune pieces made with the NES on YouTube that also display the waveforms, and I have also noticed the "aliasing" of the triangle waves. I think, technically, it's not aliasing. It's bit reduction. The NES triangle waveform looks like a stairstep pattern in the oscilloscope views. This seems to be more noticeable when the triangle plays at low frequencies, when the stairstep pattern (which is similar to another triangle wave on top of the actual one) shifts into the audible frequency range... Because it's just low bit depth and not aliasing, the unwanted overtones are at least multiples of the base frequency, so they don't get on one's nerves as easily ... sorry, that is probably a bit too technical for this thread.
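The stairstep can be sketched in a few lines, assuming the commonly documented 32-step, 4-bit triangle sequence for the NES (that detail is not from this thread). Because the staircase repeats with exactly the same period as the triangle itself, whatever it adds to the spectrum stays on integer multiples of the fundamental.

```python
def nes_triangle_period():
    """One period of the assumed 4-bit, 32-step NES triangle sequence."""
    falling = list(range(15, -1, -1))   # 15, 14, ..., 0
    rising = list(range(0, 16))         # 0, 1, ..., 15
    return falling + rising

def repeat(period, cycles):
    """Concatenate `cycles` copies of one period."""
    return period * cycles

steps = nes_triangle_period()
```

`steps` has 32 entries taking only 16 distinct levels, which is the coarse staircase visible in the oscilloscope views.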

Anyway, I would also like to comment on the CPU usage by the VERA. I estimated the CPU usage of the Concerto sound engine at its current update rate, which is fairly high. I always considered the worst case: 16 individual voices, each playing 3 envelopes and 1 LFO, plus what in fact uses the most CPU (!): the modulation routing. An 8 MHz 65C02 will handle that, and even leave a bit of headroom. As @m00dawg pointed out, most of the time you won't be using all that modulation stuff at the same time. That's why I am not too worried about CPU usage (yet! More complexity to come!).

The part that takes up the most CPU in Concerto is the modulation routing, because modulation sources (such as envelopes) are multiplied by the modulation depth in each tick for each routing. The fewer of these routings are assigned, the less CPU power is used. This multiplication could be avoided altogether with a different modulation architecture ... possibly removing some of the flexibility, but giving much lower CPU usage.
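A rough sketch of why the cost scales with the number of routings: each assigned routing costs one multiply per tick. The data layout and names here are hypothetical and are not taken from Concerto's source.

```python
def apply_routings(sources, routings, base_params):
    """Apply (source, depth, destination) routings for one tick.

    Returns the modulated parameters and the number of multiplies performed.
    """
    params = dict(base_params)
    multiplies = 0
    for source, depth, dest in routings:
        params[dest] += sources[source] * depth   # one multiply per routing
        multiplies += 1
    return params, multiplies

# Hypothetical tick: current envelope and LFO outputs, three routings.
sources = {"env1": 0.5, "lfo1": -1.0}
routings = [("env1", 12.0, "pitch"), ("lfo1", 2.0, "pitch"), ("env1", 40.0, "volume")]
params, cost = apply_routings(sources, routings, {"pitch": 60.0, "volume": 0.0})
```

Here `cost` is 3; on a 65C02, which has no hardware multiply instruction, each of those multiplies is a software routine, so unassigned routings are where the savings come from.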

kliepatsch · Post by **kliepatsch** » Mon Feb 01, 2021 8:30 pm

1 hour ago, xanthrou said:

While I am not an expert in hardware, nor do I have the resources for all that, I made a rough draft/sketch of what our expansion card would look like. Some day I will show you what it would look like.

So, we have a GUS-like setup for playing back samples, sound effects and voice samples, a speech synthesizer chip, a Z80, dedicated sound RAM and a SID (with a middleman chip to talk to the 65C02).

On non-sound-related features, I might add CP/M compatibility (which is what Commodore did with the 128, so several KayPro and Osborne files could be read and written from the X16) and some BASIC enhancements, mainly commands for graphics (a la C128).

How does it sound?

Also I'm sorry we've buried this suggestion in the thread. I'm afraid I can't say much about it. But the others might?

m00dawg · Post by **m00dawg** » Mon Feb 01, 2021 8:51 pm

Indeed, as I recall the NES actually generates the "triangle" with a squarewave that uses a fast modulation. Or something like that. Interestingly, a real NES, I find anyway, can sound a bit different from, say, Famitracker. It's close enough to enjoy, but in terms of audio scrutiny, Famitracker runs at a higher sampling rate and I think also does oversampling. Of note, though, the triangle sounds quite similar in terms of all its, uhm, uniqueness.

Fair point about the bury though - I had something written up about the GUS before I got distracted hehe. My advice, xanthrou, is to take it a byte at a time (by the way your "How does that sound?" pun was not lost on me! I see what you did there!). The SID would be the most problematic I think due to the managing of two different bus speeds. Of note, as Lorin pointed out in a few threads, even interfacing a microcontroller to the X16 bus isn't so trivial. Though I expect (I hope?) there might be perhaps a reusable solution to that which many cards can implement. At any rate, you have 5 hardware RAM slots at 32 bytes each. Currently, a card can use 1 or more, though 32 bytes in this case could be plenty.

Focusing on a GUS solution, there's a ton of ways to go about it depending on how much the card offloads from the X16 and what sort of modulation and effects you want to present. But thinking basic, something as simple as a Raspberry Pi (optionally with something like a HiFiBerry for better audio than the onboard) could work IF there is a bus management solution (noting the problem with just attaching GPIO pins to the X16's bus, as Lorin has pointed out in his threads on the subject). While this is not nearly as "cool" as an FPGA, it could work, and the bus protocol could perhaps be used for other FPGA solutions. And in fact, such a protocol might also work for a MIDI interface wherein the protocol is the same but the output side is where things differ (playing samples more directly or sending MIDI messages to an external synth).

The Pi could then run something, probably even as simple as a Python script, to read the GPIOs and handle things accordingly. Latency might be a bit of an issue compared to an FPGA or to writing the software player in a compiled language like C.

I find the Pi tends to be the go-to solution for maybe a tad too many things, but here it could fit just because it already has plenty of RAM - far FAR more than the GUS ever could handle - and could enable prototyping of the interface and functions. And really, with a HiFiBerry, the Pi can sound rather good. The problem there is fitting all that on a single-width card.

An FPGA would be quite suitable here because it could directly attach to the bus and would be a far better integrated solution than a Pi, but I'm just thinking about how you might get things off the ground.

I guess the point was that going from zero to a card that can "have a GUS-like setup for playing back samples, sound effects and voice samples, a speech synthesizer chip, a Z80, dedicated sound RAM and a SID (with a middleman chip to talk to the 65C02)" is a lot. So breaking it apart can certainly reduce the scope.

This is exactly what I'm doing with my tracker. Having it make noise was the first step, then playing a handmade pattern I wrote in hex, then seeing that pattern scroll on a display, then some basic UI and being able to see the order list, next being able to manipulate the order list, etc. It's one thing I've been enjoying about assembly: things are very compartmentalized. There's tons of compartments, but I found it easy to break problems up, more so than in some high-level languages even. The same approach could be taken here.

Kalvan · Post by **Kalvan** » Mon Feb 01, 2021 10:33 pm

I have a question that needs answering:

Is there a specific, dedicated audio buffer, and if so, is it part of VERA's VideoRAM, or is it part of System High RAM?

SlithyMatt · Post by **SlithyMatt** » Mon Feb 01, 2021 10:40 pm

2 minutes ago, Kalvan said:

I have a question that needs answering:

Is there a specific, dedicated audio buffer, and if so, is it part of VERA's VideoRAM, or is it part of System High RAM?

The only dedicated buffer is the PCM FIFO, which is 4kB and generates an interrupt when it's about to run out of data so you can top it off if you have more audio to stream. Otherwise, you have to use regular RAM and stage data there while it is being transferred. These could be YM2151 instructions, PSG settings, or a larger cache of PCM data. However, all of them could be handled programmatically.
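The top-off pattern can be sketched with a toy queue model. The 4096-byte capacity comes from the description above; the low-water mark of 1024 bytes and every name in the code are assumptions made for the example, and the model simply treats the FIFO as an ordinary queue in which played bytes are consumed.

```python
from collections import deque

class PcmFifoModel:
    """Toy model of a PCM FIFO with a low-water 'refill me' signal."""
    def __init__(self, capacity=4096, low_mark=1024):
        self.capacity = capacity
        self.low_mark = low_mark      # assumed threshold, not a documented value
        self.buf = deque()

    def write(self, data):
        """Queue as many bytes as fit; return how many were accepted."""
        room = self.capacity - len(self.buf)
        accepted = data[:room]
        self.buf.extend(accepted)
        return len(accepted)

    def play(self, n):
        """Consume up to n bytes; return True when a refill is needed."""
        for _ in range(min(n, len(self.buf))):
            self.buf.popleft()
        return len(self.buf) < self.low_mark

fifo = PcmFifoModel()
accepted = fifo.write(bytes(5000))    # only 4096 of the 5000 bytes fit
needs_refill_early = fifo.play(3000)  # 1096 bytes left, above the mark
needs_refill_late = fifo.play(100)    # 996 bytes left, below the mark
```

In this model the remaining 904 bytes of the 5000 would have to be staged in regular RAM and written when the refill signal fires.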

m00dawg · Post by **m00dawg** » Tue Feb 02, 2021 12:04 am

1 hour ago, SlithyMatt said:

The only dedicated buffer is the PCM FIFO, which is 4kB and generates an interrupt when it's about to run out of data so you can top it off if you have more audio to stream. Otherwise, you have to use regular RAM and stage data there while it is being transferred. These could be YM2151 instructions, PSG settings, or a larger cache of PCM data. However, all of them could be handled programmatically.

I've been wondering about this - and perhaps I should try to test it on my own - but since it's a buffer, bytes effectively "go away", right? As in, I can't just load the FIFO with 4kB one time and keep triggering it (to play the same data)?