Recovered Blog Posts
by Trenton Henry
11/16/21
When I use complex tools for a while I slowly get better at using them more effectively. Sometimes I learn to use more and more advanced features as I become proficient with using the tool. Sometimes I learn to use fewer and fewer features as I find myself wishing I could ... use a different tool.
I have been using EverWeb to create this site, and it is one of the tools in the second category. I find myself using fewer and fewer of its features as I become more proficient with it. Mainly because I want to get things done and I don't want to waste time trying to force the tool to do what I want done. But this isn't a rant on EverWeb. It's just that I have abandoned its blog 'feature', so I am collecting the entries here so they won't be lost. But I cannot abide the image copy/paste non-functionality of this tool, so I am not recovering the images. And just editing this page and trying to make it look tolerable is making me want to Hulk Smash. I guess EverWeb is made by Rage Software for a reason...
06/28/21 Switching the Dev Log to Blog Style
I decided to try using an actual blog for the development log, to see how well it works. The older log is here.
My goal is to build the boards I want to have, with the firmware I want them to have. I plan to build batches of a hundred or so to try to keep costs lower, then use what I want and try to sell the rest on Tindie etc to recover some of the cost. Mileage may vary; it's a process. Yes there are many similar boards already available, and I have a good selection of them. But now I want to make my own toys the way I want them and not have to try to adapt existing boards into my gadgets. I realize that this may make things more complicated than is strictly necessary.
In order to try to keep track of boards vs CAD files vs Bantam projects, vs firmware BSP, etc I have just started numbering them. So boards have names like brd0023, and firmware has names like test0018. Boards that I actually intend to batch and (attempt to) sell have 'proper' names like ZEN001. I realize that these names aren't particularly innovative or clever, but I make a lot of boards and write a lot of test code etc, so it just helps to preserve sanity.
06/28/21 Do I Want a USB Loader or Not?
When I started working on the ZDK I had a prototype of a loader already in the works, so it seemed natural to adopt it. You write some code in C, compile and link it, and then just load it over USB without needing any debug pod or special hardware. It is a nice idea, and it works well. Of course, you also need the ability to at least print some debug, even if you cannot do actual source level debugging. So I set a 2KB limit for the loader, added some simple functions to do essentially getc/putc over USB, and the ability to call those from an arbitrary application built to use the loader. That all works, but there are some rough edges.
In the simplest scenario, the loader is pre-installed into the flash, and you just write your programs and load them and run them. There is no interaction between the loader and the applications. To use the USB as a simple getc/putc conduit you can build a small USB stack into the app and use that. But the USB stack is already baked into the loader, so why duplicate it in the applications, wasting their code space? By creating a jump table in RAM that the applications can access to call through to the loader's USB putc/getc I saved code space at the expense of a little added complexity, and ended up with a simple though fairly limited i/o capability for the applications.
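The jump-table idea can be sketched in C. This is a hypothetical illustration, not the actual loader code: the table layout, the fixed address, and every name here (loader_install_table, app_getc, etc) are invented for the example.

```c
#include <stddef.h>

/* A sketch of sharing the loader's USB getc/putc with applications via
   a small table of function pointers at a known RAM location.  On real
   hardware the table would sit at a fixed address agreed on by both the
   loader's and the app's linker scripts, e.g. (io_jump_table *)0x20000000. */

typedef struct {
    int  (*usb_getc)(void);   /* read one byte over the USB conduit  */
    void (*usb_putc)(int c);  /* write one byte over the USB conduit */
} io_jump_table;

static io_jump_table jt;      /* stand-in for the fixed RAM location */

/* --- loader side: install its USB i/o into the table -------------- */
static int  loader_getc(void)  { return 'z'; }   /* stand-in for real USB read  */
static void loader_putc(int c) { (void)c; }      /* stand-in for real USB write */

void loader_install_table(void) {
    jt.usb_getc = loader_getc;
    jt.usb_putc = loader_putc;
}

/* --- application side: thin wrappers that call through the table -- */
int  app_getc(void)  { return jt.usb_getc(); }
void app_putc(int c) { jt.usb_putc(c); }
```

The app links only the two wrappers, so the USB stack lives once, in the loader's 2KB.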
Perhaps shortsightedly, the initial implementation of iordtib/iowrtob is blocking, so when I load and run an app that prints, it blocks until the i/o is serviced over USB. The catch is that if I am not running my host side application which reads/writes over USB to provide the other end of the communications link then the firmware app blocks forever. That means that it won't run without its umbilical to the host, which for development is workable, but for 'deploying to the field' is tedious. So the obvious improvement is to only block on i/o if the USB is attached. That adds instructions, and I am already pretty tight on codespace for fitting the loader into the 2K limit that I set. Probably I can fit it.
All of that, of course, is assuming that I'm using USB. But I won 't always be. Most of my gadgets want to run freestanding on a battery (geek watch, robots, ...) without any tether to a host. That is why I built the infrared communications link. Only one CPU has USB, and serves as the USB to I/R bridge. The rest of the CPUs just talk to it over I/R on their UARTs. So they might benefit from a UART loader, with the USB loader becoming a USB<-->I/R pass through.
In this model every board has its own battery and possibly voltage regulator. And that is fine if they are all separate systems. But, because these CPUs have so little memory and so few pins, I envision using several together to build more complex systems. Thus they would likely want to share a power supply, and communicate over a more reliable wired UART interface. So they would likely benefit from a UART loader also, but with the host side as a USB<-->UART (sans I/R) loader. So it looks like the majority of the devices I want to make would benefit more from a UART loader than a USB loader. And not having USB buys back 2 pins per MCU.
That leaves the USB<-->UART(I/R) bridge as the only thing that can really benefit from the USB loader. I'm not sure if it makes more sense to just reprogram it with a debugger pod and not worry about the USB loader. Probably.
So, assuming I don't convince myself otherwise ...
* Convert the USB loader into a UART loader. It will almost certainly be much smaller without the USB code in there.
* Either have dedicated UART i/o code for each of the loader and applications, OR create a shared UART i/o subsystem like I did for the USB.
* Design a new board that is the USB<-->UART or USB<-->I/R dongle.
* Design a simple to make cable with tx, rx, 3.3v, gnd that I can use to interconnect multiple local boards with a shared power supply.
* All the other things ... (@_@)
06/29/21 More on loaders (moron loaders?) hmmm...
Just some additional thoughts on using my 2K USB loader...
The zdk/zamples/ folder is where I created several examples that use the zdk and are linked so as to be loaded by the ociter/loader. Aside from the previous discussion about i/o blocking without the host application running, there is also the issue that these apps are difficult to debug in Ozone. When they are loaded in the debugger the 2K occupied by the loader is all zeros, and so the bootstrap remapping of the VTOR and USB ISR, as well as the i/o functions targeted by the jump table, are missing. I was able to script Ozone to load the VTOR and reset vectors for me, but since the USB stack is in the missing 2K (making it missing also) there is no USB connectivity. I can see a way wherein I might merge the loader image into the app images, but it is starting to feel like I am pushing string at this point.
Experimentally, I am replicating the test apps in zdk/ztests/ as full up builds that do not rely on the loader at all just so I can experiment with getting my drivers working before I try to integrate them into z4th. That way I can at least debug them more easily. Of course, when doing so my simple i/o over USB is unavailable, so I am using Segger's bulky, fugly, but utilitarian RTT to send prints to the terminal panel in Ozone. It isn't great, but it does work, so long as you have flash and RAM to spare for it (which I generally don't). All of this is "necessary" because the M0+ does not implement the ITM, so you have to invent something new. Once I have the I/R UART business sorted I should be able to use that and move away from all of these other overly complex debugging schemes. Caveat implementor...
06/30/21 Bold moves hopefully won't be regretted...
Ok, decision time. I am moving away from the USB loader. I'm not deleting it, obviously; I may come back to it. But for the moment it is more or less in my way; at the very least it is a distraction. The important thing at present is to finish the drivers and abstractions and integrate them into z4th and get some gadgets prototyped.
While modifying things to work sans loader I made sure that yopcc still works, and that I can run the tui and single step yop code with a breakpoint and a ram view etc. That all still seems to work. The Yop virtual machine is a stack machine. It is a 4 instruction VM; push immediate, run a builtin, and ovi and udi, which are basically nip and tuck. So, more or less, it is like cross-compiling the "high level" Yop sources onto a very limited Forth machine. I built a similar tui, zentui, for visualizing the runtime operation of z4th, but when I moved from 32 bit pointers to 16 bit indices etc I omitted the debug code that feeds the tui display. I may resurrect it, but for now I'm in simplifying mode. Anyhow, all of these things are connected, at least vaguely, and someday will hopefully come together.
06/30/21 It's All in the Timing
I spent a little time last night consolidating my interfaces and cleaning up code, with a focus on timing. I was thinking through the servo driver, which I need for some ancient robots I am resurrecting (their servos are modified to spin forever). It occurred to me that the examples I have seen on the interwebs all make the assumption that all servos run in parallel. I.e., their pulses start at the same time, and then end one by one as their timers fire. They tend to employ multiple timers, or complex timer wheels/pools and I just felt like they were all overly complex. So I spent several days off and on thinking about how to drive servos in the simplest way with the fewest timers etc. That is what led me to clean up the timing interfaces in the zdk.
I guess the only thing I might call an insight is the realization that servos don't need to pulse in parallel. Instead, they can pulse serially one after the other. Because the nominal range of a hobby servo is ~500 to ~2500 usec with a 20 msec period, it seems simplest to run 8 (maybe 9) servos by dividing the 20 msec period into 8 x 2500 usec slots, and running one servo in each slot. Slots with no servo run normally but no pins are affected. So only one timer is needed, and it only needs to be able to time 500 to 2500 usec. (I cannot make this stupid blog thing let me paste fixed width text with sane line spacing, so this is the best you get for now...)
• |---------------------------------------------------------------------------------------| 20 msec
• slot 0 slot 1 slot 2 slot 3 slot 4 slot 5 slot 6 slot 7
• |----------|----------|----------|----------|----------|----------|----------|----------| 2500 usec slots
• | ----____ ----____ ----____ ----____ ----____ ----____ ----____ ----____ on time / off time
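The slot scheme above can be sketched in C. This is an illustration with invented names, not the zdk driver: the pin control is mocked out, and on hardware slot_start()'s return value would arm the single hardware one-shot that later calls pulse_done().

```c
#include <stdint.h>

/* One timer, eight 2500 usec slots in a 20 msec frame, one servo
   pulsed per slot, serially rather than in parallel. */

#define NUM_SLOTS 8
#define SLOT_USEC 2500u

/* Desired pulse width per slot, 500..2500 usec; 0 means slot unused. */
static uint16_t pulse_usec[NUM_SLOTS];

static uint8_t cur_slot;

/* Stand-ins for real GPIO writes. */
static uint8_t pin_state[NUM_SLOTS];
static void pin_high(uint8_t s) { pin_state[s] = 1; }
static void pin_low(uint8_t s)  { pin_state[s] = 0; }

/* Called at the start of each 2500 usec slot.  Returns how many usec
   until pulse_done() should fire for this slot (0: nothing to time). */
uint16_t slot_start(void) {
    uint16_t w = pulse_usec[cur_slot];
    if (w) pin_high(cur_slot);
    return w;
}

/* Called when the one-shot for this slot's pulse width fires. */
void pulse_done(void) {
    pin_low(cur_slot);
}

/* Called at the end of each slot to advance around the 20 msec frame. */
void slot_end(void) {
    cur_slot = (uint8_t)((cur_slot + 1) % NUM_SLOTS);
}
```

Since only one pulse is ever in flight, the timer only has to span 500 to 2500 usec, exactly as described above.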
I already had an abstraction for a millisecond timer that runs off the SysTick, just counting milliseconds. So I added the ability to say "after this many milliseconds call this function":
mafter(msec, fn);
You get one interval and one function. If you call mafter() again before it fires you overwrite it and lose the first interval. Also, it is not a callback (in the sense of a zdk callback), it is just a void (*fn)(void). That was a deliberate decision because I want it to be trivial to use without needing to declare callback types etc. (If I need that I can use the TC driver it's built on, because it works that way already.) And the interval is limited to 16 bits worth of msecs, which is 65.5 seconds. That is plenty, as there are other timers for longer intervals (RTC), and some of the simplest timers (on the cores I'm working with) are only 16 bits wide (SAMD TCs, I'm looking at you).
So, because I need sub-millisecond intervals for servos (and other things), I duplicated that millisecond timer interface for a microsecond timer.
uafter(usec, fn);
That is built on top of the SAMD TC1 counter. (Why do all the peripherals except the TC start at 0, and it starts at 1? What gives?) I built out the TCs to work pretty much like I built out the SERCOMs. I am still trying to maintain a level of abstraction between the zdk interface and the actual implementations. Because that is what I trained myself to do over my career when building big, complex firmware for a living; it made things more portable. And portable is good. But here I'm really up against the stops on memory, so this may break down at some point. For now, however, onwards through the fog.
So, anyhow, basically for timing abstractions I have:
z_msecs - a monotonically increasing U32 that increments every millisecond, and can be read at any time from any context.
mafter(msec, fn) - a oneshot that calls a function after the given number of milliseconds.
uafter(usec, fn) - a oneshot that calls a function after the given number of microseconds.
And some enable/disable/configure-fu surrounding them.
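A minimal sketch of how mafter() might hang off the SysTick counter, assuming the semantics described above (one interval, one bare function, re-arming clobbers the pending one-shot). The handler name and internals are invented for the example, not the zdk's actual code.

```c
#include <stddef.h>
#include <stdint.h>

static volatile uint32_t z_msecs;       /* free-running msec counter  */
static uint16_t          after_left;    /* msecs until fn fires       */
static void            (*after_fn)(void);

/* Arm (or re-arm, clobbering any pending one-shot) the msec one-shot. */
void mafter(uint16_t msec, void (*fn)(void)) {
    after_fn   = fn;
    after_left = msec;
}

/* Runs once per millisecond; on hardware this is the SysTick ISR. */
void systick_tick(void) {
    z_msecs++;
    if (after_left && --after_left == 0 && after_fn) {
        void (*fn)(void) = after_fn;
        after_fn = NULL;                /* one-shot: disarm before calling */
        fn();
    }
}

/* Tiny demo hook used below. */
static int fired;
static void mark(void) { fired = 1; }
```

uafter() would look the same, just driven by the TC-based microsecond tick instead of SysTick.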
06/30/21 What's In a Name?
Normally I strive to create meaningful/understandable/self-documenting variable and function names. After all, at least at work, I am writing code for others to read and enhance. But now that I have been playing with Forth I have picked up a tendency to write short, obscure, function names. For example, enpin() instead of enable_pin(), or _inpu instead of input_with_pullup, etc.
The reason is that in the current implementation of z4th the dictionary is in RAM and each word's name consumes bytes in RAM. As of z4th11-0.3.0, each word occupies 6 bytes, plus some length bits in the flen byte. It is a trade off between the simplicity of a fixed width dictionary entry header and the complexity of a variable length name field with padding. Fixed width headers make it possible to store the headers for the builtin primitive words in flash without a link field, saving some code space.
Anyhow, if you have two words with the same prefix, like enable-this and enable-that they both are stored as 11/enable and are indistinguishable. If they are different lengths, like enable_this and enable_other, they are stored as 11/enable and 12/enable, respectively, and so are recognized as different words. That works, up to a maximum of 16 characters (because the length field is 4 bits wide).
The point is that because the names reside in RAM they need to be compact but meaningful, and similar prefixes need to be avoided. So things like @pin and !pin instead of pin! and pin@ (though I do have sp! and sp@ and similar for compatibility with actual Forths) seem to be in vogue. And things like enpin, dispin instead of enable_pin, disable_pin, etc seem somehow rational.
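The prefix collision can be illustrated in C. This is a hypothetical layout, not the real z4th header (which also packs a link and other bits): a 4-bit length plus the first 6 characters, so same-length words sharing a 6-character prefix become indistinguishable.

```c
#include <stdint.h>
#include <string.h>

#define NAME_BYTES 6

typedef struct {
    uint8_t flen;              /* low 4 bits: name length (max 15)  */
    char    name[NAME_BYTES];  /* first 6 chars; the rest discarded */
} dict_header;

/* Pack a word name into the fixed-width header. */
void pack_name(dict_header *h, const char *s) {
    size_t n = strlen(s);
    h->flen = (uint8_t)(n & 0x0F);
    memset(h->name, 0, NAME_BYTES);
    memcpy(h->name, s, n < NAME_BYTES ? n : NAME_BYTES);
}

/* Two names "match" when their stored headers are identical. */
int same_word(const dict_header *a, const dict_header *b) {
    return a->flen == b->flen &&
           memcmp(a->name, b->name, NAME_BYTES) == 0;
}
```

With this scheme enable_this and enable_that both pack to 11/enable and collide, while enable_this and enable_other differ by length and stay distinct.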
Obviously if I end up turning this into a cross compiler and just loading compiled code into a dictionary-less VM then the name lengths are irrelevant. And I may do that eventually. But for now I am staying with a dictionary in RAM, with primitives and associated dictionary in flash. The possibility to cross compile to flash and download that as a "firmware upgrade" exists, but I am slowly souring on the whole "update the firmware without a debug pod" idea. Mainly because it is more complex than I need (since I already have a debug pod) and uses precious code space. The main drawback of the debug pod approach is not the cost (the Segger minis are only $20), rather it is the pins. I have to bring out RST, SWD, SWC, V3P3, GND to a 5 pin header. Yes, I can repurpose SWC and SWD when not using the debugger, but it's hard to debug the use of those pins when the debugger is attached.
Also, having been inspired by Silas Warner's Robot War virtual machine when I was a wee lad, leading to the creation of several VMs inspired by it, the latest of them being the YopVM (though that is no longer a single accumulator architecture, it is a stack machine...), the idea of teeny tiny processors, instruction sets, and languages appeals to me. I also studied QForth, a Forth for a 4 bit microcontroller. And the ATTiny AVR parts. And other small pin count parts. Doing more with less is a fun challenge that is liberatingly different from the necessities of the day job.
So in a corner of my mind the idea of an 8 bit Yop-like has been growing. A CPU with a total of 256 bytes of memory. Instructions are 8 bits. Addresses are 8 bits. Numbers are 8 bits (sure maybe there's double register operations for 16 bits eventually). And there is a way to combine many of these into a larger system, so that they all cooperate to perform tasks that none can perform individually.
So, anyhow, even though this post is all over the map and way off the original topic... the next version of z4th is going to keep a RAM dictionary, and the ability to copy all of RAM to flash, and restore from flash to RAM. Words like ... save ... and ... restore ... maybe? That is probably the minimum useful tiny Forth like bit banger concept. I can worry about cross compiling to flash etc in a later release. The important thing for now is to get everything working so that I can connect multiple wee MCUs together and build up a multi-processor system that is more than the sum of its parts.
07/02/21 Mockumentation
Ok, I thought "why don't I start documenting some of this zdk stuff, if only to help get it sorted properly in my head" and so I started to do so and promptly failed, as you can see on the initial ZDK001 page, which ... quite honestly ... looks like crap. I'm using EverWeb, a whizzy whizbang WYSIWYG thing that is just ... confusing. I mean, even this blog you type into a little window and then it renders in another one. To make a set of buttons with a label and description etc you make a text group thing and then, in a tiny box in a side panel, you type everything in and then it renders on the preview of your page. I mean, it's 2021 for crap sake. But ... I digress.
Anyhow, my first documentation is rubbish. Partly this is my own fault. I despise documentation markup like Doxygen because it looks like a diseased spastic chicken vomited all over your code. And the end result is almost useless anyway. A few projects have managed to make a go of it and gotten arguably usable documentation out of it, but I'm betting they have a full time engineer on retainer just for that. So, no Toxygen markup for me, thank you.
Years ago I made a tool called docgen that harvested minimalist docs from sources and spat out rtf files, and that was more or less ok. In fairness, it *did* require repeating things in the docs, like the domain, operation, range, etc, and I -still- had to keep the docs in sync with the code, which is intolerable really, because what is the point of a documentation harvester that cannot ... actually harvest documentation? I mean, if you must spoon feed it by retyping everything, then why not just type the documentation into a document in the first place and be done? I mean, seriously. Except, because you want it in the code and in a document. So you need two copies.
Later on, when I was generating the code and the documentation from a model using tbhmeta things were much better. I still had to type in the model, including much of the documentation, but it could generate documentation into the code un-chicken-vomited and to rtf, html, plain text, etc docs as well. Win. But I've moved on from tbhmeta at this point. It is quite nice for generating vast amounts of boilerplate code for massive ontologies of objects and such, but it is extreme overkill for tiny things like the zdk and z4th. Models should be smaller than the code they generate, otherwise, what's the point?
Years back, after tbhmeta, I had done a lighter weight mkontology tool for generating generic functions in C, which later spawned mkoop for generating oop code in yet a different way. That gave way to an abortive variant called mkstub which I didn't follow through on because I was working on the Yop parser and decided that modharv was the way to go by just parsing the headers and their docs and generating everything without any markup whatsoever. But I didn't finish that, either, since all manner of other things kept getting not quite done instead. Is there a pattern evident here...?
So now I have a new one, mkdocs, which I am certain will be the one that really sticks. It will be an exemplar of algoristic sublimity, generating the documentation of the gods, with code so elegant and sophisticated that only those who possess true genius can hope to appreciate its glory.
As. If.
But, I'm going to try it yet again. My requirements are:
• Absolutely minimal markup in the source files so that there is no line noise or chicken hurl.
• Absolutely minimal repetition of things already in the code inside the documentation comments.
• Spits out good enough rtf that I can stitch up a whole document and render it as a pdf or html.
• Simple so I can do it quickly and get back to the zdk and z4th.
That's about it. Wish me luck.
07/05/21 LED There Be Light
I set up two boards, one with an RGB LED, and another with a selection of hobby servos, with the intent to ring out my software pwm and get some nice colored lights and some servinating servos. The RGB code was fairly easy to get working, in a hard coded sort of way, but the LEDs I used had red about 4X brighter than blue, and green about 2X brighter than blue, so it took some experimenting to get things set for roughly equal brightness. I suppose that normally you measure the forward voltage of each color and use an appropriate resistor. But I am not using resistors. Instead I am using four pins, one for the common anode, and one each for each color. This lets me do two things: (1) not need a resistor, and (2) use a oneshot or an ADC to read the brightness of the light falling on the LEDs by reverse biasing them. (Did it on an 8051 long ago, need to repeat it.)
Anyhow, I got the RGB working reasonably well, but I did not yet fully parameterize and zdk-ify it. There are some annoyances. Currently the period is 1 msec, and there are 255 brightness levels, which results in about 4 usec per step, which keeps my utimer very busy, and my interrupt load quite high. It works well at 64 usec per step and a 16 msec period. Still, I need to devise a way to reduce the interrupt load.
< ...thinking and typing and experimenting happened here... >
It turns out that I cannot see 256 different brightness levels / color shades on an LED. In fact, 16 shades is pretty good. So with a 1 msec step and a 16 msec period I can get acceptable RGB colors without overburdening my CPU with a crazy heavy interrupt load. There is still the topic of the individual colors having varying relative brightness. And I still need to resurrect the light sensing code. But I am reasonably happy with this solution mainly because it is really simple and runs off a single timer in 'relaxed' fashion with an IRQ once every msec, which is fine I think.
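The relaxed 16-level scheme can be sketched in C. This is an illustration with mocked pin control, not the zdk driver: one tick per msec, a 16 msec period, and each channel on for its level's worth of steps.

```c
#include <stdint.h>

#define CHANNELS 3           /* R, G, B */
#define STEPS    16          /* 16 brightness levels, 16 msec period */

static uint8_t level[CHANNELS];   /* desired brightness, 0..15        */
static uint8_t step;              /* position within the 16 msec period */
static uint8_t pin_on[CHANNELS];  /* stand-in for GPIO state          */

/* Called once per millisecond from the timer IRQ: each channel's pin
   is on for the first `level` steps of the 16-step period. */
void pwm_tick(void) {
    for (int c = 0; c < CHANNELS; c++)
        pin_on[c] = (step < level[c]);
    step = (uint8_t)((step + 1) % STEPS);
}
```

One interrupt per millisecond, regardless of how many channels, which is the whole appeal.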
07/07/21 More Light
I did an experiment wherein I dedicated the CPU to modulating two RGB LEDs simultaneously. There are 256 brightness levels for each color, with a period of "as fast as it can loop". The loop checks bits in variables to decide whether it should be emitting (on steady), glowing (pulsing on/off), or sensing (reading incoming light levels). It is plain old C using the zdk.
There are several options for the host interface: USB, UART, or LED blinky comms. I plan to try all of them, and possibly other variants. I realize that not everyone has much use for a dual RGB blinky ... but it is a good starting point for a board that uses most of the pins and does something interesting, so I'm not necessarily 'wasting' an entire MCU, even though I'm actually using very few of its peripherals.
Once I have a few i/o boards that do various things (servo board, you're next) then I can build out the multi MCU system by 'networking' them. At that point I will have an arguably modular system and I can start building my robot army. Watch this spot for updates.
07/22/21 git Some Docs
I now have a prototype of the new mkdocs tool working. It needs a little polish, but it is just about ready to generate documentation from the ZDK sources. Maybe I should be using asciidoc ... I might consider switching. I also created a repository for the ZDK on github, and I chose the MIT license for the SDK. I intend to get things ready and put it out there in the near term.
08/14/21 mcu4th Begets z4th Begets u4th
So, in order to try to simplify things and actually maybe make some progress I made some (temporary?) simplifications. I have always been intrigued by how much you can do with simple systems. I read a lot of vintage computer books because people were doing amazing things with computers less powerful than what's in a wrist watch these days. Onset Computer's little Basic boards, which are precursors of the venerable Basic Stamp, and the Pic-Axe, not to mention Silas Warner's Robot War virtual machine with Basic compiler. And even more modern things like Txtzyme and its derivative Simpl. These sorts of things have inspired some of my work over the years, like the WeePU virtual machine with Basic like language, which later became the basis of the first Yop virtual machine back when it was still accumulator based. That later became the stack based YopVM with compiler, assembler, and source level debugger with breakpoints etc. (Ok, it has one breakpoint, but I can single step, or run to breakpoint, and stuff, which is like ... a thing ...)
Anyway during development of Yop, yopcc, the yoptui, etc it was obvious to me that stack machines (which I had dabbled in years before when I was hoping to license the F21 from Jeff Fox for SMSC way back when, and later switched to trying to license the SHBOOM from Patriot Scientific, etc) were easier compilation targets than accumulator based machines. I pursued the accumulator architecture mainly because it was an homage to Warner, but once I tried to add complex expressions, function calls with parameters and returns, etc, it became obvious that both a stack and an auxiliary register are necessary. And since I was already using a little forth-ish expression evaluator for evaluating compile time constant expressions, and using a shunting yard with a stack to compile expressions, it just ... became ... time to switch to a stack architecture. The YopVM is not a Forth, though its instruction set is very Forth like. (It's a simplification of the Helium instruction set but that's even less important than the rest of this...) But it is fully compiled and then executed; it is not interactive.
And sometime during all of this I re-read Hoyte's Let Over Lambda Chapter 8 for the third or fourth time and finally it clicked. I mean, it made sense all along, but at some point there was an Aha! moment. And so, having put Helium (Lisp) on hold for Yop, I put Yop on hold for Forth. The first try, mcu4th, was a learning exercise. I got things working on the Mac but I was not terribly satisfied with it. Then MacOS became 64 bit and mcu4th was basically stillborn before it ever even ran on an MCU. Eventually I resurrected PEL as DDPE (two-d pixel engine), and some games and sprite editor stuff, and somehow that led to IBTIL (64 bit itty bitty threaded interpretive language). Ibtil went through several implementation experiments and even had ibtui as a sort of crude text mode debugger. It was going to marry with DDPE into a Forth based fantasy console, but that hasn't happened yet. Anyhow, somehow during that period I got back into CNC PCB making and building, and then moved up to using board houses to make PCBs with solder mask etc. And I needed a Forth to run on that. Or maybe a Yop. But I decided on a Forth.
So the ZDK and z4th came into existence. Really the ZDK is a distillation of a bunch of similar previous efforts like Trentonium, Platformium, Hydrogen, some bits from Helium, etc, with a lot of experimental things stripped out and a few new things. But it is an attempt at identifying the essential and eliminating the rest. It really wants to be a Zen SDK for MCUs, and also for my desktop stuff. Anyhow, z4th was born and it actually ran on a SAMD11 with USB as its host interface. But it was a 32 bit threaded interpreter on a machine with 4KB of RAM. So every CELL wasted > 16 bits because there is no memory to point to. So the next incarnation of z4th became a 16 bit forth with some trickery to run 16 bits on a 32 bit machine. That works and runs on the SAMD11. Oh, and, it used my bootloader with builtin USB i/o. I.e., the loader takes up 2KB of flash and downloads apps above there into flash and runs them. But I didn't want the apps to have to duplicate the USB stack to get terminal i/o, so I arranged for them to be able to share the entire USB stack from the loader in order to do terminal i/o. But that became somewhat cumbersome and debugging the apps was not convenient; I blogged about all of that a while ago. Ok, so z4th is unfinished but it's still a thing.
At some point during all of this I revisited Txtzyme, and Simpl, and Stable, and Sectorforth, and a few other very small virtual machine sorts of things. And it occurred to me that it would be cool to be able to run z4th on tiny machines like ATTiny's, eventually, maybe, assuming I could just cross compile. And of course, yopcc -is- a cross compiler, so I had the technology already; mostly; ok at least the knowhow; I think. So I thought to myself "hmmm...". And I thought "self, what if there are only 256 bytes of memory, and all pointers are 8 bits, and all numbers are 8 bits, and that's it." Recall that Robot War gives you 256 instructions and 10 bit signed integers... Other things are similarly memory challenged.
And so u4th (micro forth) was born! Do a really tiny thing and scale it up from there later if needed. There are two major changes to make u4th work. First off the dictionary headers are separated from the word bodies. You have to do that to cross compile anyway, so why not go ahead and do it on chip? Second, u4th is token threaded. I.e., primitives are represented by an 8 bit index into a jump table (i.e., as XTs), and user defined words use an 8 bit CFA, which is an address into the 256 byte virtual machine memory.
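Token threading in that spirit can be sketched in C. This is a hypothetical toy, not u4th itself: in particular, the way the 8-bit token space is split (tokens below PRIM_MAX index the primitive jump table, tokens at or above it are treated as a CFA into the 256-byte memory) is an assumption made for the example.

```c
#include <stdint.h>

#define MEM_SIZE 256
#define PRIM_MAX 64            /* assumed split of the 8-bit token space */

static uint8_t mem[MEM_SIZE];  /* the entire 256-byte VM memory */
static uint8_t ip;             /* 8-bit instruction pointer     */
static int16_t stack[16];      /* data stack                    */
static uint8_t sp;
static uint8_t rstack[8];      /* return stack of 8-bit addresses */
static uint8_t rsp;
static int     running;

static void    push(int16_t v) { stack[sp++] = v; }
static int16_t pop(void)       { return stack[--sp]; }

/* A few example primitives. */
static void p_lit(void)  { push(mem[ip++]); }               /* inline literal */
static void p_add(void)  { int16_t b = pop(); push(pop() + b); }
static void p_exit(void) { if (rsp) ip = rstack[--rsp]; else running = 0; }

static void (*prims[PRIM_MAX])(void) = { p_lit, p_add, p_exit };

enum { LIT = 0, ADD = 1, EXIT = 2 };   /* token values of the primitives */

/* The token-threaded inner interpreter. */
void vm_run(uint8_t entry) {
    ip = entry;
    running = 1;
    while (running) {
        uint8_t t = mem[ip++];
        if (t < PRIM_MAX) prims[t]();                  /* primitive XT  */
        else { rstack[rsp++] = ip; ip = t; }           /* call CFA at t */
    }
}
```

Every threaded reference costs one byte instead of a full cell, which is the whole point on a 256-byte machine.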
So, even with all of that TL;DR nonsense, there is an awful lot I'm leaving out. And not all of it really happened in that order necessarily; it's something of a blur as to what really happened when, but mostly it's close enough given that no one will ever read this. But anyhow, u4th over USB was working fine, but because the ZENMCUs are intended to be combined into multi-CPU systems, with a UART link, and an infrared link, I needed to get my UART game going. I previously prototyped the infrared and got that working but I didn't build a protocol on it. (I had to build some hardware; brd0033 is out to fab and should be back from Oshpark any day now.) I also worked on the timer abstractions, and implemented prototypes, etc. and I got all that more or less working and now I am combining it into u4th etc. I also decided that I will use SparkFun's Qwiic connectors, but rename them QUUIC, because they aren't I2C they are UART to UART or maybe UUC. But they are small and convenient and I think they will work well. So, like, there has been a lot of design work and prototyping going on, but only late at night mostly on weekends, so the going is slow.
So now I am working on the Spanda 'expansion bus' protocol. One protocol over UART, UART+I/R, and even USB (which does abuse USB a bit but not so as to violate the specification) with the idea of the same code in all situations so as to avoid bloat. It takes inspiration from Proteus, and RFC, and MytBus, and NiRDA and all of my other protocols and experiments. (But it is most like Proteus at this point.) And doing that made me think about the potential for nonblocking i/o, which got me to go back and review knl (my tiny cooperative rtos) with a view towards maybe using it in u4th, but I haven't decided yet. Anyway, that's a lot of stuff, but doesn't really cover it all, but I'm tired of writing this for now. So ... yeah.
08/18/21 Loaders, Reloaded...
Previously I mostly decided to retire my USB bootloader. However, it has been gnawing at me. So I decided to modify it so that it no longer offers a jump table of functions for applications to call into (such as tiny USB i/o routines). Instead, it should just load the app and get out of the way. There is, however, a problem with this scheme. Once the application is running there is no 'universal' way to force it to run the loader again. Previously, when the USB stack and the USB ISR were plumbed and active, a USB request to run the loader was all that was necessary. I.e., even with the app running, the loader's USB stack was still operational.
So I have some choices:
1. Implement some sort of reset button double tap scheme; but this requires a reset button...
2. Go ahead and leave the USB ISR and stack operational (it runs on interrupt context anyhow) to accept the force load command.
I don't want to require a silly reset button, so option 1 is unattractive.
Option 2 seems harmless, so long as the app does not implement its own USB stack and replace the USB IRQ in the vector table. If it does so then it has no way to communicate to the loader that it wants to run the loader (not the app) even though the app is valid and ready to run.
So I think that I will just say that the loader, now called zload, only exists to load non-USB apps. I.e., if you want to make USB devices with the ZDK then you are unlikely to want to waste 2KB of flash on the loader anyhow, so the problem pushes up a level. That seems like a reasonable compromise for now; I don't want to push this string much farther.
In any case, this change to the loader is part of an effort to create a few utilities:
zload - The new USB bootloader.
zterm - This is a console based application for talking to the Forth interpreters of multiple remotes using the Spanda protocol over multidrop UART or Infrared. (Think "robot army"... or maybe "swarm of robots" ...)
irshark - This is a USB device (no, it doesn't use zload, ya wiseacre) that sends everything it receives on I/R up the USB to the irshark console application, which decodes and displays all of the packet traffic in 'real time'. So I can have some hope of figuring out if things are working right. It can also sniff wired UART multidrop traffic, since it is the same Spanda protocol.
Parts are arriving from Digikey, boards from Oshpark are expected soon, etc. I am experiencing supply chain issues though! I cannot get my LDOs at the moment from Digikey. I have enough for some prototype devices but I'm gonna need more soon.
08/19/21 Zapps
Late last night I got the evolution of zenesis --> ociter --> zload working. I might rename it back to zenesis, I dunno. But anyhow, the ociter interoperated with the ocit11 bootloader with the simple USB terminal i/o stuff accessible from the client app. The i/o stuff is now removed; it made the whole thing unnecessarily cumbersome. The example apps lived in zamples/. Now that ociter is zload, the zamples are zapps and live in, you guessed it, the zapps/ folder.
The first zapp is my dual RGB LED blinky, because I had recently been working on it as part of my SWPWM effort, and because it is, well, an obligatory blinky. And I can see if it works or not because ... it blinks. Anyhow, I need to convert the other zamples to zapps and then park that while I get irshark and zterm running.
Anyhow, here is some preliminary documentation for zload, created using mkdocs: zload.pdf
08/22/21 Interociters
I have constructed three interociter boards. I ordered 6 PCBs from OshPark and have built up 3 so far. Two will communicate over I/R while the third listens to all traffic and reports it.
Two instances of zterm talking through these interociter boards to help me ring out the Nirda / Spanda protocol, with the third listening and reporting through irshark.
08/25/21 Time for Timers
If I were programming in Helium (which I can't, because I have not ported it to 64 bit yet) and I wanted to use a timer, I would just do something like:
(after ticks (lambda (...) ...))
(every ticks (lambda (...) ...))
etc.
But I'm programming in C on an MCU with 4KB of RAM. So the timer subsystem needs to have a minimal code and RAM footprint. I started with something like:
after(ticks, callback, context);
every(ticks, callback, context);
But passing a context to the callback means storing the context until time to invoke the callback. And, using every() for a periodic timer means storing the reload value so it can be reloaded.
So we have something like this:
typedef struct TIMER {
    struct TIMER* next;
    TICKS remaining;
    VOID (*fn)(VOID);
    TICKS reload;
    VOID* ctx;
} TIMER;
It's freakin' 20 bytes per timer. The obvious first things to go are every(), reload, and ctx. Now it's 12 bytes per timer, which is still ... like massive ..., but better than 20. So we have:
after(ticks, callback);
typedef struct TIMER {
    struct TIMER* next;
    TICKS remaining;
    VOID (*fn)(VOID);
} TIMER;
Now to get a periodic timer one needs to call after() again in the callback. But ... there is no context parameter, so you can't pass it the timer that fired. So you can't ask it for its reload value ... which you needed in order to implement every() ... so you just have to know what ticks value to use. So it's a trade-off. I'm not certain I'm happy about it yet. But I often need lots of concurrent timers running, so maybe it's OK to keep it simple for now and save precious memory.
I do very much like the idea of every(), however, even though it is slightly more bloaty. But there is another nuance to every() that makes it ickier; you need the ability to cancel one. Like, after a while you may want to stop an every() timer: say you do stuff every(100), but after(2000) you are done. So every() has to return a handle of some type so you can say cancel(). So then the caller has to track that somehow. Which just makes for an even less laminar interface. So again I feel that this is added complexity that I should avoid worrying over for now.
So, can I shave off another 3 bytes by converting the 'next' pointer into an index? After all, under the hood these are stored in an array of TIMERs linked into two lists: the live timers list, and the free timers list. When you call after() it grabs one from the free list and inserts it into the live list. When it fires it is removed from the live list and placed back onto the free list. Simple. But I could use an array index instead. The only problem with that is that in order to reclaim those 3 bytes I need to pack the structs in the array, which results in the fields of timers[1]..[n] being un-aligned(4), which on the CM0+ results in a fault. So I see no clear way to win there, really. I could make the next link 16 bits and the remaining ticks 16 bits, but then you can only time 65536 ticks, which is not very long at 48 MHz. So, yeah. It's all about making tradeoffs and trying to live with them.
For now, I will just keep it simple and minimal. Farewell every(), I wanted to love you, but I couldn't cope with your baggage. Perhaps someday we will meet under different circumstances. Until then, I am left with bittersweet memories of what was and what might have been...
Update 10/24/21
After lives on, but is simplified. Every is resurrected, but as a macro, and with a different philosophical underpinning. I'm pretty happy with this for now, but it is likely to continue to evolve as it is used in anger.
We have the following sorts of things now (an excerpt from zdk.pdf):
Facilities for Chronal Displacement Monitoring
I use the SysTick as a milliseconds and cycles counter. It ticks once per cycle, and I set it to interrupt once per millisecond. That is enough information to implement msecs() and cycles(). When I am writing application code and I want something done after X amount of time, or every so many milliseconds, etc., I couldn't care less about timers and clocks and ticks and prescalers. I just want to say something like: after(time, do_this); or every(interval, do_this); I don't want to deal with timer hardware and interrupts or whatever.
To implement after() I use TC1 on the SAMD. It works in ticks, and you can use s2t(), m2t(), etc. to convert seconds or milliseconds to ticks. I set the timer clock to run at 1MHz so 1 usec is 1 tick, for convenience, as it simplifies some of the math. After() works by keeping a linked list of TIMER structs sorted in order of least ticks remaining. So when the timer interrupts, subtract the time that just expired from all timers in the list. Remove any that hit zero, and invoke their callbacks (on interrupt context). Now the shortest timer is the first in the list, so start the hardware timer for that amount. Lather, rinse, repeat.
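That scheme can be sketched roughly like this. This is a simplified illustration under my own naming, not the real ZDK code: the hardware timer is stubbed out, and it ignores the complication of re-arming while a countdown is partially elapsed:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t TICKS;

typedef struct TIMER {
    struct TIMER *next;
    TICKS         remaining;
    void        (*fn)(void);
} TIMER;

#define N_TIMERS 8
static TIMER  pool[N_TIMERS];                  /* internal array of timers */
static TIMER *live, *freelist;                 /* the two lists */
static TICKS  hw_last;                         /* stub: last value armed */
static void hw_start(TICKS t) { hw_last = t; } /* would program TC1 here */

static int  fires;                             /* demo callback for the example */
static void count_fire(void) { fires++; }

void timers_init(void) {
    for (int i = 0; i < N_TIMERS - 1; i++) pool[i].next = &pool[i + 1];
    pool[N_TIMERS - 1].next = NULL;
    freelist = pool; live = NULL;
}

void after(TICKS ticks, void (*fn)(void)) {
    TIMER *t = freelist;                       /* grab one from the free list */
    freelist = t->next;
    t->remaining = ticks; t->fn = fn;
    TIMER **p = &live;                         /* insert sorted by remaining */
    while (*p && (*p)->remaining < ticks) p = &(*p)->next;
    t->next = *p; *p = t;
    hw_start(live->remaining);                 /* arm for the nearest deadline */
}

void timer_tick(TICKS elapsed) {               /* called from the timer ISR */
    for (TIMER *t = live; t; t = t->next) t->remaining -= elapsed;
    while (live && live->remaining == 0) {     /* fire whatever expired */
        TIMER *t = live; live = t->next;
        t->next = freelist; freelist = t;      /* recycle before the callback */
        t->fn();                               /* (on interrupt context) */
    }
    if (live) hw_start(live->remaining);       /* re-arm for the new head */
}
```

Recycling the TIMER before invoking the callback is what lets a callback call after() again to get periodic behavior without growing the pool.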
I -could- let the client allocate the timers. But that complicates things. Generally TIMERs cannot be allocated as auto variables on the stack. They technically *can* be, if the function defining them never returns while the TIMERs are live. Or they can be static. But then they have to be passed around, making the interface much less laminar. So in the interests of simplicity there is an internal array of them.
Example:
after(m2t(50), led_on); // after 50 msecs call led_on().
The implementation of every() is a little different. It could just set up a repeating interrupt callback. But you'd need a way to stop it, like returning a handle when you start it so you can later call cancel(handle) or whatever. That makes the interface bulky and clunky and puts a burden on the client code to track the handles. I don't really like that at all, so I did something different.
Instead, every() is a macro built on top of STOPWATCHes. (It used to be built on ONESHOTs, which were built on STOPWATCHes, but I eliminated those as they were also clunky, and I think this is a better solution.) Like the stopwatch, you have to poll it periodically; usually you put it in your top level while loop.
Example:
// Every 20 msec turn the servo on, and schedule a callback to turn it off.
while (1) {
    every(20, servo_on(); after(u2t(servo_pos), servo_off); );
}
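One plausible way to write such an every() macro, for the curious. This is my sketch of the idea, not the real ZDK code: each textual use gets its own static deadline, compared against a free-running millisecond counter, and msecs() is stubbed here with a fake clock so the example stands alone:

```c
#include <stdint.h>

static uint32_t fake_now;                     /* stand-in for the SysTick count */
static uint32_t msecs(void) { return fake_now; }

/* Each textual use of every() has its own static deadline, so each use is an
   independent periodic timer. It only runs when polled, which is why it lives
   in the top-level while loop. The signed subtraction makes the comparison
   safe across counter wraparound. */
#define every(ms, body) do {                      \
        static uint32_t _next;                    \
        uint32_t _now = msecs();                  \
        if ((int32_t)(_now - _next) >= 0) {       \
            _next = _now + (ms);                  \
            body;                                 \
        }                                         \
    } while (0)
```

Note that with this particular sketch the first poll fires immediately (the deadline starts at zero); a STOPWATCH-based version could instead prime the deadline on first use.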
08/25/21 Zapp'd
Remember how zload can load zapps over USB and run them, like any old bootloader does? And that if you don't jack with the USB in your zapp then the zload USB stack keeps running (on interrupt context) so you can still send it a request to restart in loader mode? So hooray, you don't need a reset button and a double tap scheme to force the loader to run.
Except ... (#_#)
If you zload a zapp that happens to hardfault ... (>_<)
And then, well, you are kind of screwed. Because at power on zload checks to see if there is a zapp loaded into flash. It does some trivial checks on it to make sure it is theoretically runnable, and if it is, it jumps to it. So if your zapp hardfaults or similar then you end up with a semi-brick'd device because you can't send it a request on USB to go into loader mode. I know this because I was zloading a zapp to test my timers and (surprise) that test floated belly up. And then ... Doh!
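For reference, the 'trivial checks' in a loader like this usually amount to sanity-checking the app's vector table before jumping to it. Here is a hedged sketch; the addresses are typical for a SAMD11 with a 2KB loader, but they are my assumptions, not the actual zload values:

```c
#include <stdint.h>

/* Sanity-check an app's vector table before jumping to it: the first entry
   must be a plausible initial stack pointer in RAM, and the second a
   thumb-mode reset vector inside the app's flash region. The layout below
   is an assumed example, not the real zload memory map. */

#define RAM_BASE   0x20000000u
#define RAM_SIZE   0x1000u            /* 4KB of RAM */
#define APP_BASE   0x00000800u        /* flash just past a 2KB loader */
#define FLASH_END  0x00004000u        /* 16KB part */

static int app_looks_runnable(const uint32_t *vectors) {
    uint32_t sp    = vectors[0];      /* initial stack pointer */
    uint32_t reset = vectors[1];      /* reset handler address */
    if (sp <= RAM_BASE || sp > RAM_BASE + RAM_SIZE) return 0;
    if ((reset & 1u) == 0) return 0;  /* thumb bit must be set */
    reset &= ~1u;
    if (reset < APP_BASE || reset >= FLASH_END) return 0;
    return 1;
}
```

And that is exactly why a hardfaulting zapp semi-bricks the device: it passes checks like these just fine, so the loader happily jumps to it on every power-up.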
You have to use SWD and a debugger to erase it and put it back to rights. And if you are gonna do that when you mess up then you might as well just use a freakin' debugger all the time and actually debug with it. I mean ... Hello, McFly?
So, yeah, I've written about the loader several times. Bottom line is that in order for it to be un-brickable it needs a reliable way to force it into loader mode. Like maybe a reset button with double tap ... or ... something ... (-_-)
08/28/21 Interociters Again
Got a little time today to play with stuff and did some initial simple tests on the interociter boards. I built 3, and the LDO, USB, and SWD work fine on all of them. I set one up to transmit a repeating pattern, and set a second up to print everything it receives over the Segger RTT. That worked fine. The beams are fairly directional, as expected, but I managed to get at least 6 feet of range, which exceeded my expectations.
The next step is to get a minimal viable irshark working. Everything received on the i/r UART is sent to the laptop over USB and printed in a terminal window. Then get two other devices talking to each other using the Spanda (or am I renaming it back to Nirda?) protocol. Then have the irshark workstation app decode the packets and do analysis etc.
So, baby steps, but progress none the less. Much more work yet to be done.
09/18/21 More Stories About UARTs and Cheese
Ok, this isn't about cheese. I am not sure where that came from. But anyhow, it's been a while since we've had a chance to chat. That's because work got really busy, like it does periodically when a big release is looming on the horizon. Sorry if you've missed me.
This is probably going to be more about finding the ultimate development setup, I guess. Let me back up. As you know I've been working on the ZENMCU boards and the ZDK and all of that. Part of the whole thing is the ability to connect up multiple boards, each doing its own special thing, all cooperating and talking to a host/workstation. They can talk to each other using UARTs with diodes on the TX lines to allow multidrop communications, or they can use UARTs in IrDA mode over IrDA transceivers.
It is difficult to debug two devices talking to each other. You need two instances of the debugger, and two debugger pods, and you have to keep track of which is which, and you cannot easily cross trigger breakpoints from one to the other, etc. So it takes patience. I have improved the debug experience a good deal by reducing the size of the dangly bits needed to hook everything up. Using the Segger JLink Mini saves tons of space. Using USB-C cables lets me plug in the Zen board and the debugger, but not power. So I have to use a USB-C hub, which bulks everything up again. And using 2 boards and 2 debuggers at the same time on one laptop is just tediously bulky. But it at least works. I will continue to improve on my game here.
This is the new general purpose I/O board. It has a "SparkFun QWIIC" compatible I2C connector on one end so it can talk to QWIIC boards, and it has a UART on the other end, using an identical connector, but I call it QUUIC because it's UART to UART. And it has my standard ZENBUS on the bottom edge. (Due to an unfortunate oversight on my part, in order to use the SPI interface on the ZENBUS you have to desolder the I2C pullups. Oops.)
And this is the hub board. It has 4 QUUIC UART ports to talk to up to 4 of the I/O boards, and a USB port to talk to the host. And it also has my standard ZENBUS on the bottom edge. I somehow failed to connect the USB connector VBUS pin to VBUS, so I had to solder a jumper. And I fed V3P3 to the QUUIC UART ports instead of VBUS. I got in a hurry and made dumb mistakes. Lesson learned.
Anyhow, my hope is that having boards with a "standard" bus will make it easier to make expansion gadgets to attach to them. And using the QWIIC format for I2C should make it easier to experiment with off the shelf gadgetry. Repurposing those connectors as my UART connectors is a mixed bag; they might be confused as I2C connectors (though silkscreen clearly says "UART"), but they are small and convenient 4 pin connectors. So hopefully that will work out.
There is also another board which is UART on one end and IrDA transceiver on the other. Plugging it into the QUUIC UART plug on an I/O board and moving a jumper allows the I/O board to become wireless via infrared. Then it can talk to the I/R Interociters etc.
Anyhow, I think that's all for now. I have some soldering to do.
09/18/21 Still Not About Cheese
Did some soldering on the USB to UART hub. Soldered the MCU, LDO, USB, and one UART socket. If that much works I will solder the other UART sockets. The silk screen is so tiny I need a magnifier. There are minor errors on the silk screen, and a missing trace from the USB VBUS to the LDO, so I added a crappy jumper wire. Rev B of this board is in the works to correct these mistakes. Also, some vias are so close to pins that not shorting them was very tricky. Soldering those USB connectors is the hardest because the pins are kind of under the shell.
09/22/21 GridEye Reprised
I hooked up the little I/O board's I2C to a GridEye 8x8 thermal image sensor. Just to check out the board connections etc I hacked up a quick I2C driver for it in C and ran that, capturing the image data into a buffer. Then I examined the buffer using the debugger. It's hard to tell much from that but the values do change with temperature (hand, ice, hairdryer) so I think the I2C is working.
Previously (in an earlier incarnation a few years back) I was pulling the data out over USB and displaying an image on the Mac. I need to get something similar here. To do that I need to implement the u4th words for I2C, and some way to 'dump' the data over the UART to the HUB to the laptop where I can display the image. I.e., I want to do this in u4th, not C, to prove that I can.
10/12/21 Release Me!
Yes, I am still here. Lots going on. I am working on actually getting a release up on github. I've been cleaning up the documentation, making sure things work right, and improving the layout. I'm hoping to do something to simplify the makefiles. Some things just build with a script; others have makefiles. The annoying thing is that for a full build the script is faster than the makefile. For partial builds, though, the makefile is faster. It's just kind of a shame to need something as complex as a makefile for building such trivial applications.
Also, my 2019 MBAir keyboard is like the worst keyboard ever made. It doesn't type, it double types, and keycaps fall off. It just sucks horribly. I've been using a BT keyboard, but it is annoying to carry that around with the laptop, and there isn't room for both on my lap. So I'm looking at getting a new MacBook, but holding off as long as I can. Anyhow, the keyboard is making things take longer because I have to retype every other word.
In any case, I haven't forgotten about you, it's just that I've been working on this and all manner of other stuff. Watch this spot for updates.
10/30/21 Notes On u4th
This is hackish and incomplete but there was an ask on Discord for more information so I cobbled this together.
U4TH - An experimental 'micro Forth'
u4th is an experiment. In order for some of this to make sense it might be useful (or not) to read some of the context here: http://www.trentonhenry.com/ZENMCU-Dev-Log/Entries/2021/8/mcu4th-begets-z4th-begets-u4th.html
Briefly ...
The ZDK (aka Zen Developer's Kit) is a minimalist hardware abstraction on top of some resource limited ARM Cortex-M0+ MCUs. The SAMD10 and SAMD11 have 16KB of flash (i.e., roughly 8K Thumb instructions) and 4KB of RAM; they are identical except that the SAMD11 has USB. Everything has to be carefully thought through, and if it isn't critical, it gets discarded, because memory is precious and scarce. In any case, one of the many things the ZDK is intended for is making Forths. You can also use the ZDK to build (some) embedded targets (like u4th) on the workstation and run them there to debug more easily. That is what the NIX_PFM stuff is about; the *nix-y platform. The embedded stuff is CMX_PFM (Cortex-Mx) and SAMD_PFM.
Because these MCUs are so resource constrained my intent is to use several of them in a small network to build larger systems. This might be a retro style motherboard with slots to plug in several processor boards, or it might be independent robots. So the communications interface is a multidrop UART scheme that can also run over infrared using an IrDA transceiver. Essentially, one of these boards by itself can only do a few simple tasks. Combining multiple boards allows me to build larger systems.
[PC]<--USB-->[BASE]<--UART-->[REMOTE]
                         -->[REMOTE]
                         -->[REMOTE]
                         ...
I implemented a decent portion of z4th, an ITC Forth on these chips where each CELL is a 32 bit integer, or pointer. However, with such limited memory capacity I was wasting over half of my RAM storing high order zeros in all of my addresses. So I decided to switch to a 16 bit variant using array indices instead of pointers in order to reclaim some RAM. I over complicated the NVM vs RAM stuff but I got it mostly working. But in doing that experiment I gained insights into better ways to do some of it.
I went back and re-read tons of stuff. I re-read about txtzyme http://txtzyme.com/view/welcome-visitors/view/txtzyme, simpl https://sustburbia.blogspot.com/2013/05/simpl-simple-programming-language-based.html, stable https://w3group.de/stable.html, and of course Silas Warner's Robot War https://corewar.co.uk/robotwar/robotwar.txt (a paragon of simplicity). I reviewed the BASIC Stamp, PICAXE, Q4th (for the 4 bit MARC4 MCU), and more. Basically I refreshed my memory of every tiny language I had ever played with. Then I went to my thoughtful spot and thought "hmmm..." for a while.
And so I came up with u4th, which isn't likely to be terribly useful as is. But its simplifications keep things small and bounded so that I can focus on making things work. Later I will scale the memories and CELLs up as needed. (09/01/21 I bumped up to 16 bit CELLs).
OBSOLETE A u4th CELL is 1 byte. Numbers are signed 8 bit integers. I.e., -128..127. So, this isn't terribly useful in and of itself. But, baby steps. It has a total RAM space of 256 bytes, including stacks, pad, tib, etc. I.e., almost uselessly small. It will get bigger once I get further along.
CURRENT A u4th CELL is 2 bytes. Numbers are signed 16 bit integers. I.e., -32768..32767. It has a total RAM space of 1024 CELLs, including stacks, pad, tib, etc. I.e., still terribly small.
My initial vision for u4th is as a programmable front end to a collection of handy primitives written in C, much like the BASIC Stamp or the PICAXE. No one writes drivers for hardware in PBASIC; they just call the pre-existing primitives. Remember that PBASIC gives you ~32 variables and ~2000 instructions. I.e., it looks feeble on paper, and yet I could never have conceived of how useful it could be.
OBSOLETE Anyhow, I know 8 bits is silly and I plan to beef it up, but for now it is enough to build the infrastructure.
Mostly you are going to do things like read the ADCs, or flip the pins, or set up PWM, etc. Nothing fancy. The data goes up to the PC and the real processing happens there. More or less txtzyme style, but in a tiny network of u4th enabled MCUs.
Because economy of memory is paramount, token threading is the choice I made, even though that means performance metrics are pointless; it will be terribly slow by design. That's ok for now, I can optimize later.
There are currently two dictionaries, one in NVM, and one in RAM. The dict headers are fixed width so there is no need to store a LINK. The primitives are baked into the NVM dict at compile time. Words defined at runtime are compiled into the RAM dict. Eventually I will implement cross compilation so that I can store user defined words in flash/NVM and free up RAM. For now the plan is to implement SAVE and RESTORE which copy RAM to NVM, and back, so that a power cycle doesn't destroy the active image.
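A fixed-width header might look something like this. This is an illustrative layout under my own assumptions; the field names and sizes are made up, not the real u4th ones. The point is that because every entry is the same size, the 'link' to the next header is implicit:

```c
#include <stdint.h>
#include <string.h>

#define NAME_MAX 7                /* names stored truncated / NUL-padded */

typedef struct {
    char     name[NAME_MAX];      /* word name */
    uint8_t  flags_len;           /* e.g. immediate flag + name length */
    uint16_t token;               /* xt or cfa for this word */
} DICT_ENTRY;                     /* fixed width: next header is just +1 */

/* Find a word by scanning backwards, so the newest definition wins.
   Returns the index of the entry, or -1 if not found. */
static int dict_find(const DICT_ENTRY *dict, int n, const char *name) {
    for (int i = n - 1; i >= 0; i--)
        if (strncmp(dict[i].name, name, NAME_MAX) == 0) return i;
    return -1;
}
```

The same lookup works for both dictionaries; the RAM dict is simply searched before the NVM dict so redefinitions shadow the builtins.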
The code is obviously a mess. There is a lot of junk about #ifdef this or that as I was trying to organize for an eventual switch to cross compiling. I will clean most of that up as I am unlikely to implement all of that nonsense in reality.
OBSOLETE I broke all of the vectored execution primitives when I switched to tokens. Because when you ' a word you either get an XT (an index into the dispatch table for primitives) or a CFA (the address of a word) and you don't know which. When I change CELLs up to 16 bits I will be able to address the full RAM's worth of CELLs in 11 bits, so I will have spare bits to indicate XT vs CFA etc and I won't need icky things like JSR any more. So the fix for that stuff is on hold until I enlarge CELLs (which is coming soon). (09/01/21 CELLs are now 16 bits, but vectored execution words are still broken).
CURRENT Cells are 16 bits. Numbers, characters, etc are all CELLs; there are no 'bytes'. There are three dictionaries. All instructions contain bits identifying the dictionary the instruction was compiled from.
_xtdict describes the builtin/primitive operations coded in C.
_nvmdict describes cross compiled colon definitions stored in nvm.
_randict describes on target compiled colon definitions stored in ram.
Instructions are tokens encoded as follows:
These two bits are the opcode, and the rest are the operand:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |x|x| | | | | | | | | | | | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
00 - bits 9..0 are the cfa of a colon definition in ram, other bits ignored
01 - bits 9..0 are the cfa of a colon definition in nvm, other bits ignored
10 - bits 9..0 are the xt of a primitive, other bits ignored
11 - push literal in bits 15,12..0 onto the data stack (basically many lit words for wee literals)
where these shortened literals are encoded in signed magnitude form.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|s| | |m|m|m|m|m|m|m|m|m|m|m|m|m|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
They are 14 bits including the sign, allowing immediate values in the range -8191..8191 inclusive
s - sign, 0 is positive, 1 is negative
m - magnitude
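Decoding that layout is just a couple of shifts and masks. A sketch, based on my reading of the encoding above (the function names are made up for illustration):

```c
#include <stdint.h>

enum { OP_RAM_CFA, OP_NVM_CFA, OP_XT, OP_LIT };  /* opcode values 00..11 */

static int tok_opcode(uint16_t tok) {
    return (tok >> 13) & 0x3;                    /* bits 14..13 */
}

static uint16_t tok_operand(uint16_t tok) {
    return tok & 0x03FF;                         /* bits 9..0: cfa or xt */
}

static int16_t tok_literal(uint16_t tok) {       /* signed-magnitude literal */
    int16_t mag = tok & 0x1FFF;                  /* bits 12..0: magnitude */
    return (tok & 0x8000) ? (int16_t)-mag : mag; /* bit 15 is the sign */
}
```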
OPTIS
- jmp, jro, jroz to exit could be just exit
- steal another bit and make relative jumps like wee lits and compress
//  HOST    NVM         RAM
//  ----    ----        ----
//  dict    code        data               develop on host, download to device   CROSS COMPILED
//          dict        dict, code, data   develop on device, compile into ram   SELF HOSTED
//          dict        code, data         develop on device, cannot define new words   <-- WHY? NO DO
//          dict, code  dict, data         develop on device, compile into nvm   <-- NO DO, PREFER CROSS COMPILE
That's all the more I have at the moment.
End