Text editor

BruceMcF · Post by **BruceMcF** » Wed Sep 02, 2020 11:52 pm

16 hours ago, Stefan said:

That's an interesting subject.

The editor might not work as you think. It is essentially a model-view design. ...

When designing this, I calculated the memory usage. 16 bits are needed to represent the bank of the previous and the next memory page. ...

Aha, this is "prior window" and "next window" into the text buffer, with each window containing its own linking and text length information. Yes, the only relatively full featured editor I've ever written was a line buffer editor, so I fell into habitual assumptions.

So to do an insert, you can check whether the next window has enough space to hold the text after the insert point. If it does, slide its text up, copy the text after the insert point into the space, and continue with the insert. If it doesn't, grab a new "empty" window, link it in, and copy the text after the insert point into the new window.
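To check my own understanding of that insert, here is a rough Python model rather than 65C02 code. It is only a sketch: the names `Window`, `insert_byte` and `chain_text` are made up for illustration, and the 251 byte capacity assumes 5 bytes of link and length overhead in a 256 byte page.

```python
PAGE_CAPACITY = 251  # assumption: 256-byte page minus 5 bytes of link and length overhead


class Window:
    """One page-sized window: a run of text plus links to its neighbours."""

    def __init__(self):
        self.text = bytearray()
        self.prev = None
        self.next = None


def insert_byte(win, pos, ch, capacity=PAGE_CAPACITY):
    """Insert byte ch at offset pos in win, spilling the tail if the page is full."""
    if len(win.text) < capacity:
        win.text.insert(pos, ch)
        return
    # Page is full: lift out everything after the insert point.
    tail = win.text[pos:]
    del win.text[pos:]
    if len(win.text) < capacity:
        win.text.append(ch)
        spill = tail
    else:
        spill = bytearray([ch])  # inserting at the very end of a full page
    nxt = win.next
    if nxt is not None and len(nxt.text) + len(spill) <= capacity:
        nxt.text[0:0] = spill  # slide the next window's text up, copy the spill in front
    else:
        new = Window()  # stands in for taking a window off the "empty" list
        new.text = spill
        new.prev, new.next = win, nxt
        if nxt is not None:
            nxt.prev = new
        win.next = new


def chain_text(win):
    """Concatenate the text of win and every window after it."""
    out = bytearray()
    while win is not None:
        out += win.text
        win = win.next
    return bytes(out)
```

The one case that is easy to miss is inserting at the very end of a full page, where there is no tail to move and the new byte itself has to go to the next window.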

You can do a small local memory collect at regular intervals. For example, when moving multiple windows up the chain, go back to the starting point and see how many windows are required to hold the text in the first four buffers; if it's fewer than four, compact them, releasing buffers from the "used" linked list and putting them in the "empty" linked list.

Anytime on the 6502 that you go beyond a page, both the complexity and the size of the code expand. With a page sized window, keep the address of the window we are at, CURRENT, in zero page and everything can be done with (zp),Y operations. For windows into a buffer stuffed with text, with each window containing two way linked list pointers and the number of text characters, page size windows are ideal: they are the largest size buffer that allows all copies to be simple loops like:

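LDA SBNK : CMP DBNK : BNE +      ; different banks? then use the bank-switching loop

- LDA (SRC),Y : STA (DEST),Y : DEY : BNE - : RTS      ; same bank: plain copy, then exit so we don't fall into the next loop

-- LDA SBNK :+ STA BANK : LDA (SRC),Y : LDX DBNK : STX BANK : STA (DEST),Y : DEY : BNE -- : RTS      ; different banks: switch BANK around every byte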

One idea that reduces overhead by two bytes per page is to have 64K buffers, in 8 consecutive High RAM segments. If you keep a byte somewhere that holds the bank of the first segment, you can use the high three bits of the LINK address as the offset from that base. Then the link to page / bank converter, passing the logical address in A and returning bank in A and page in X, is:

LNK2PG: PHA : AND #%00011111 : ORA #$A0 : TAX : PLA : LSR : LSR : LSR : LSR : LSR : CLC : ADC BUFF0 : RTS
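As a cross-check on the bit layout, here is the same decode modelled in Python. It is a sketch under my reading of the scheme: the low five bits pick one of the 32 pages in $A0-$BF, and the high three bits are the bank offset from BUFF0, so bringing them down to bits 0-2 is a shift by five. The function name `lnk2pg` and the `buff0` parameter just mirror the labels above.

```python
def lnk2pg(link, buff0):
    """Split a one-byte logical link into (bank, page high byte)."""
    page = (link & 0x1F) | 0xA0          # low 5 bits: page within $A0-$BF
    bank = ((link >> 5) + buff0) & 0xFF  # high 3 bits: bank offset from the base bank
    return bank, page


# Example: link $25 with the buffer based at bank 1 is page $A5 in bank 2.
print(lnk2pg(0x25, 1))
```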

There is an advantage to using an offset from a base bank even if you DON'T limit yourself to 64K files, because then you can have multiple BUFFERS, and the same operations can be directed to distinct buffers by just putting a different byte into BUFF0. But if you don't compact it, then "Link To Page" with the logical bank in A and logical page in X is just:

LNK2PG: PHA : TXA : AND #$1F : ORA #$A0 : TAX : PLA : CLC : ADC BUFF0 : RTS

Even if you don't have an allocated use for the three redundant high bits in the logical page address, there's no harm in building in the ability to use them for some purpose later. For example, you might use them to indicate what KIND of text ... PETSCII, ASCII Latin-1, UTF-8 encoding of Unicode, etc. Having that encoded in logical address of each window makes it less likely that type of text information will be lost, and it makes sense to keep different types of text in different pages, even if there is free space available to merge them.

_____________________________________________________________

Another way to reduce to three bytes overhead per page-window, while retaining the two byte logical address, is with an XOR double-linked list.

In an XOR linked list, the "LINK ADDRESS" is the XOR of the LOGICAL ADDRESS OF "NEXT" and "PRIOR". I am going to assume for simplicity that the logical address is just the high byte of the page address with an assumed low byte of 0. If the above is desired, it still can be, with JSR LNK2PG at strategic intervals, but I'll set that aside.

We store the ACTUAL ADDRESSES of PRIOR, CURRENT, and NEXT in zero page vectors, since this lets us do "LDA (PRIOR),Y : STA (CURRENT),Y" type operations with less setup. We also have PRBANK, CRBANK, NXBANK.

To move UP the chain, we copy CURRENT to PRIOR and NEXT to CURRENT. Then, for the address and the bank number of the new NEXT, we XOR the LINK stored in the new CURRENT window with the address and bank of the new PRIOR, which are already in zero page. Note that you only access data in the CURRENT window, so whether the prior or next link is in the current bank doesn't matter.

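NEXTWIN: LDA CRBANK : STA PRBANK : LDA NXBANK : STA CRBANK : STA BANK

L1: LDA CURRENT+1 : STA PRIOR+1 : LDA NEXT+1 : STA CURRENT+1 : STZ PRIOR : STZ CURRENT      ; move the pointers first, so (CURRENT) reads the new window

L2: LDA PRBANK : EOR (CURRENT) : STA NXBANK      ; byte 0 of the window holds the XOR of the two banks

L3: LDY #1 : LDA PRIOR+1 : EOR (CURRENT),Y : STA NEXT+1 : STZ NEXT : RTS      ; byte 1 holds the XOR of the two page addresses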

To move down the chain, we do the equivalent, swapping the role of PRIOR and NEXT.
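For anyone who hasn't met XOR links before, here is the idea in a few lines of Python, leaving the banks out. It is only a model: `build_links`, `step` and `walk` are made-up names, the dictionary stands in for the link byte stored in each window, and I am assuming 0 can serve as the "no window" address because no text window lives in page $00.

```python
def build_links(addrs):
    """Return {addr: link}, where link = prior XOR next and 0 means 'no window'."""
    links = {}
    for i, addr in enumerate(addrs):
        prior = addrs[i - 1] if i > 0 else 0
        nxt = addrs[i + 1] if i + 1 < len(addrs) else 0
        links[addr] = prior ^ nxt
    return links


def step(prior, current, links):
    """Move one window along: the window we came from cancels out of the link."""
    return current, links[current] ^ prior


def walk(start, links):
    """Visit every window from one end of the chain to the other."""
    prior, current = 0, start
    seen = []
    while current != 0:
        seen.append(current)
        prior, current = step(prior, current, links)
    return seen


pages = [0xA3, 0xB0, 0xA7, 0xBF]
links = build_links(pages)
print([hex(p) for p in walk(0xA3, links)])  # head to tail
print([hex(p) for p in walk(0xBF, links)])  # same routine, tail to head
```

The same `step` works in both directions; which way you travel depends only on which neighbour you pass in as `prior`.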

But to do local access to the current window, we don't need to worry about this. And to do local operations TO the next and prior windows, without moving our current spot in the chain, you still don't

_____________________________________________________________

One reason I was writing a line oriented editor is that a line oriented system is very convenient for FORTH. The base address and number of characters can be handed directly to the interpreter, without any need to copy the lines.

The idea your approach inspires in me is to do something similar with 80 character line buffers, with the lower 5 bits of the logical page address being the page offset into the $A000 High RAM window, and the high three bits of the page address split into two bits to say which buffer INSIDE the page, 1, 2 or 3, and another bit that is free for something else ... perhaps a dirty bit that is set when the line has been modified since the last save.

BANK is 1 byte and the logical Window address is 1 byte; with XOR linking only the two of them are needed. The buffer plus the number of characters in the buffer is 81 bytes, so that is 83 bytes per window. AND ... 256/3 = 85.33, so three 83 byte windows fit in a page with 7 bytes to spare. For saving the file, the bank might be converted into the sequence number of the bank among those saved so far, so when loaded the banks are all relative to the start of the buffer, and it is easy to walk through the banks and convert them to their actual bank locations.
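To make the arithmetic concrete, here is a small Python sketch of how such a logical address might decode. The exact bit positions are my assumption, since I only said "two bits" and "another bit" above: here bits 5-6 hold the slot number and bit 7 is the dirty flag, and the three windows are packed back to back from the start of the page. `decode_line_addr` is a made-up name.

```python
WINDOW_SIZE = 1 + 1 + 1 + 80  # bank link, address link, character count, 80 characters
WINDOWS_PER_PAGE = 256 // WINDOW_SIZE


def decode_line_addr(logical):
    """Return (address of the line window, dirty flag) for a one-byte logical address."""
    page = (logical & 0x1F) | 0xA0     # low 5 bits: page within $A0-$BF
    slot = (logical >> 5) & 0x03       # assumed bits 5-6: buffer 1, 2 or 3 inside the page
    dirty = bool(logical & 0x80)       # assumed bit 7: modified since the last save
    if slot == 0:
        raise ValueError("slot 0 is not used in this sketch")
    return (page << 8) + (slot - 1) * WINDOW_SIZE, dirty


print(WINDOW_SIZE, WINDOWS_PER_PAGE, 256 - WINDOWS_PER_PAGE * WINDOW_SIZE)
```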

Stefan · Post by **Stefan** » Thu Sep 03, 2020 7:49 pm

@BruceMcF

I don't remember ever hearing about XOR linked lists.

You couldn't easily come up with such a concept. I guess it's true that there's creativity in limitation.

The downside is, of course, more complex code which might be a bit harder to debug.

BruceMcF · Post by **BruceMcF** » Fri Sep 04, 2020 12:13 am

I don't know who came up with them, I heard about them when following the comp.lang.forth newsgroup in the late 90s, early 2000's.

There's no particular urgent reason to do either change for the "raw text buffer" approach. For 3 80 character line buffers per page, 5 bytes overhead is two bytes too many, but for one raw text buffer, there's no particular amount of space it has to have, and it would only be 1.2% of the space saved even if both space savings are applied.

Actually, as a rule of thumb, don't rewrite software to reduce an overhead by 1.2% unless you need overhead 1% lower to make it work at all! So it's probably not a change I would make.

I would be strongly tempted to make the bank references relative to a stored base bank but I would probably leave them as bytes so any given buffer is free to be as big as desired.

___________________________________________________

So the rest of this is just kibitzing.

Where the XOR linked list is really handy is when you have things set up for a single linked list, but then you want to add the ability to slide up and down in both directions without changing all of the code that assumes the link is a single address in size.

Transitioning to using them is probably a two step process ... finding code that slides up or down the linked lists and turning all of them into subroutine calls to the same two subroutines, and only then switching the window page creation code over. That's also an approach I picked up from the same Usenet News newsgroup, in Forth it's called factoring.

Stefan · Post by **Stefan** » Sat Sep 05, 2020 7:15 am

I just published a new version of X16 Edit to the downloads page.

What's new?

A lot of updates to the UI

Some bug fixes

It might soon even be usable.

BruceMcF · Post by **BruceMcF** » Sat Sep 05, 2020 9:15 am

Fingers crossed, xForth is not going to be limited to BLOCK files, it WILL be able to INCLUDE text files.

When I was editing gforth script files on the train on the way home from work with my floppy disk Linux back in the late 90's, nano was my go to file editor. My IDE was literally nano and gforth and the command line.

Stefan · Post by **Stefan** » Sat Sep 05, 2020 12:18 pm

If you're disciplined, no project is too large or complex for command line + Nano.

I must look into Forth. Have never tried it.

BruceMcF · Post by **BruceMcF** » Sun Sep 06, 2020 8:23 pm

Are you going to have word wrap?

I always liked the VDE word wrap: soft returns were "<space><return>", hard returns were any other character followed by a return. And in word wrap mode, it wrapped as you went, but it didn't bother with any other lines unless you highlighted a set of lines and then it word wrapped that set.

I later found out that according to some the hard and soft returns were the other way around, but if you enter a return you most often enter it right after the character at the end of the line, so the VDE approach always worked more naturally for me.
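The nice thing about that convention is how little code it takes to tell the two kinds of return apart. A tiny Python sketch of the rule as I described it (not VDE's actual code, and `unwrap` is a made-up name):

```python
def unwrap(text):
    """Join lines at soft returns (space + return), leaving hard returns alone."""
    return text.replace(" \n", " ")


wrapped = "the quick \nbrown fox.\nNext paragraph"
print(unwrap(wrapped))
```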

Stefan · Post by **Stefan** » Mon Sep 07, 2020 2:49 pm

I do not plan to have word wrap, at least not now.

When I use text editors, I never have word wrap enabled. If you're editing source code or config files, word wrap is of limited value.

This is actually my second attempt at creating a text editor for X16. My first try in 2019 had word wrap. The code became ugly and had a lot of hard to find bugs. I dropped word wrap to make things easier, but I might revisit it in the future.

V. 0.0.3 is quite stable, and I have no known bugs (they are probably there anyway). I think I have a solid base to continue from.

These are the things I'm working on now and in the near future:

Memory defragmentation routine

Search and replace

Cut and paste

BruceMcF · Post by **BruceMcF** » Wed Sep 09, 2020 3:38 am

When (if?) I get xForth fully working, I have a very simple literate programming word set which encourages explanatory block comments that can be pulled out with tags embedded in the comments for automatically generated glossary/help files.

When I was doing my "only visible characters" Forth text editor in the early 2000's, I found that a simplifying design spec for word wrap was to wrap when a character is typed into the last column, so there is no "hidden" white space between the last column and the first column of the next line. And update a "last space" column position with every space, so there's no scanning back, just insert return after the last space in the line, space out the tail, go to the head of the next line and print to screen whatever is after the return.

Then in non-line wrap mode it can just stall at the set right margin, and it's up to the user whether to hit return there or go back and insert a return further back.
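Roughly, the wrap-as-you-type rule looks like this in Python. It is a sketch of the idea, not the Forth code I had back then: `type_text` is a made-up name, the screen line is a plain list, and breaking at the margin when a line has no space at all is my assumption for a case I didn't spell out above.

```python
def type_text(text, width):
    """Feed characters one at a time and return the resulting screen lines."""
    lines = []
    line = []        # characters on the current screen line
    last_space = -1  # column of the most recent space on this line, -1 if none
    for ch in text:
        if ch == "\n":  # a return typed by the user
            lines.append("".join(line))
            line, last_space = [], -1
            continue
        line.append(ch)
        if ch == " ":
            last_space = len(line) - 1
        if len(line) == width:  # a character just landed in the last column
            # Return goes right after the last space; no scanning back needed.
            cut = last_space + 1 if last_space >= 0 else width
            lines.append("".join(line[:cut]))
            line = line[cut:]  # the tail moves to the head of the next line
            last_space = -1    # the tail cannot contain a space
    lines.append("".join(line))
    return lines


print(type_text("hello world again", 8))
```

The spaces stay at the end of each wrapped line, so nothing is hidden between the last column and the start of the next line.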

Stefan · Post by **Stefan** » Wed Sep 09, 2020 7:01 pm

Sounds like a good idea.

If I understand, you mean automatic word wrap that inserts an actual line feed marker in the buffer. That might actually be possible to do.

In my first attempt I had word wrap that was calculated from the top of the paragraph without inserting any line feed markers. That became complicated.

An alternative is to use some other control char to mark automatic word wrap. If I remember correctly, WordStar used such an internal marker. The advantage is that you can strip the word wrap marker when saving to file.
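Something like the following is what I have in mind, written as a Python sketch rather than 65C02. The marker value $1F is picked arbitrarily for illustration (it is not what WordStar used and not something X16 Edit does today), and `strip_soft_wraps` is a made-up name.

```python
SOFT_WRAP = 0x1F  # hypothetical control byte marking an automatic line break


def strip_soft_wraps(buf):
    """Return the buffer contents as they would be written to file."""
    return bytes(b for b in buf if b != SOFT_WRAP)


in_memory = b"the quick \x1fbrown fox\rnext line"
print(strip_soft_wraps(in_memory))
```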

Another thing. In my last post I said I had no known bugs. Naturally I found a couple right after saying that. On my bug severity scale, irritating - serious - deadly, I was closing in on deadly. When you inserted text above text that was already typed in, a few letters every now and then would be duplicated, effectively a memory corruption problem. Usually, you see right away what's causing such a problem. But in this case the corruption seemed random. I spent a good 5 hours isolating when the problem occurred. Finally I found an off by one error in the routine deleting a char from memory. It's a good feeling fixing such a bug ...