Saturday, May 31, 2025

Acuitas Diary #84 (May 2025)

A couple months ago I described my plans to implement trial-and-error learning so Acuitas can play a hidden information game. This month I've taken the first steps. I'm moving slowly, because I've also had a lot of code cleanup and fixing of old bugs to do - but I at least got the process of "rule formation" sketched out.

A photo of High Trestle Trail Bridge in Madrid, Iowa. The bridge has a railing on either side and square support frames wrapping around it and arching over it at intervals. The top corner of each frame is tilted progressively farther to the right, creating a spiral effect. The view was taken at night using the lighting of the bridge itself, and is very blue-tinted and eerie or futuristic-looking. Photo by Tony Webster, posted as public domain on Wikimedia Commons.

Before any rules can be learned, Acuitas needs a way of collecting data. If you read the intro article, you might recall that he begins the game by selecting an affordance (obvious possible action) and an object (something the action can be done upon) at random. In the particular game I'm working on, all affordances are of the form "Put [one zoombini out of 16 available] on the [left, right] bridge," i.e. there are 32 possible moves. Once Acuitas has randomly tried one of these, he gets some feedback: the game program will tell him whether the selected zoombini makes it across the selected bridge, or not. Then what?
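To make the combinatorics concrete, here's a minimal Python sketch of that move space and the blind opening pick. The names are placeholders of mine, not Acuitas's actual internals:

```python
import random

# Hypothetical sketch of the Allergic Cliffs move space: every move pairs one
# of the 16 zoombinis with one of the two bridges, for 32 candidates in all.
ZOOMBINIS = ["zoombini_%d" % i for i in range(16)]   # placeholder identifiers
BRIDGES = ["left", "right"]

def all_moves():
    """Enumerate every (action, object, target) affordance."""
    return [("put_on_bridge", z, b) for z in ZOOMBINIS for b in BRIDGES]

def pick_opening_move(untried):
    """With no evidence yet, choose blindly among the untried moves."""
    return random.choice(untried)

moves = all_moves()
print(len(moves))               # 32
print(pick_opening_move(moves))
```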

After Acuitas has results from even one attempted action, he stops choosing moves entirely at random. Instead, he'll try to inform his next move with the results of the previous move. Here is the basic principle used: if the previous move succeeded, either repeat the move* or do something similar; if the previous move failed, ensure the next move is different. Success and failure are defined by how the Narrative scratchboard updates goal progress when the feedback from the game is fed into it; actions whose results advance at least one issue are successes, while actions that hinder goals or have no effect on goals at all are failures. Similarity and difference are measured across all the parameters that define a move, including the action being taken, the action's object, and the features of that object (if any).

*Successful moves cannot be repeated in the Allergic Cliffs game. Once a zoombini crosses the chasm, they cannot be picked up anymore and must remain on the destination side. But one can imagine other scenarios in which repeating a good choice makes sense.
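Put into rough Python, the heuristic might look something like this. It's my own simplification (a bare shared-feature count); Acuitas actually judges success and failure through the Narrative scratchboard and its goals:

```python
def move_features(move, traits):
    """Flatten a move into comparable features: the action, the bridge, and
    the chosen zoombini's own traits (hair, eyes, nose, feet, ...)."""
    action, zoombini, bridge = move
    feats = {("action", action), ("object", zoombini), ("bridge", bridge)}
    feats |= {("trait", t) for t in traits.get(zoombini, set())}
    return feats

def similarity(move_a, move_b, traits):
    """Number of features two moves share."""
    return len(move_features(move_a, traits) & move_features(move_b, traits))

def choose_next_move(untried, last_move, last_succeeded, traits):
    """Lean toward moves similar to a success, dissimilar to a failure."""
    score = lambda m: similarity(m, last_move, traits)
    return max(untried, key=score) if last_succeeded else min(untried, key=score)
```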

Following this behavior pattern, Acuitas should at least be able to avoid putting the same zoombini on a bridge they already failed to cross. But it's probably not enough to deliver a win, by itself. For that, he'll need to start creating and testing cause-and-effect pairs. These are propositions, or what I've been calling "rules." Acuitas compares each new successful action to all his previous successes and determines what they share in common. Any common feature or combination of features is used to construct a candidate rule: "If I do <action> with <features>, I will succeed." Commonalities between failures can also be used to construct candidate rules.
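Under the hood, that comparison could be an intersection of feature sets. A minimal sketch, again using my own placeholder structures rather than Acuitas's real ones:

```python
def candidate_rules(success_feature_sets):
    """Hypothesize 'a move with these features will succeed' rules from
    whatever all successes so far have in common. Each candidate is a
    frozenset of features; this sketch keeps the single-feature candidates
    plus the full shared conjunction."""
    if not success_feature_sets:
        return set()
    sets = [set(s) for s in success_feature_sets]
    common = frozenset(set.intersection(*sets))
    singles = {frozenset([f]) for f in common}
    return singles | ({common} if len(common) > 1 else set())
```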

The current collection of rule candidates is updated each time Acuitas tries a new move. If the results of the move violate any of the candidate rules, those rules are discarded. (I'm not contemplating probability-based approaches that consider the preponderance of evidence yet. Rules are binary true/false, and any example that violates a rule is sufficient to declare it false.)
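With rules represented that way, the pruning step is a short filter over the candidate set (same hypothetical structures as above):

```python
def prune_rules(candidates, move_feats, succeeded):
    """Throw out any candidate success-rule the latest move contradicts.
    A rule applies to a move if all of the rule's features are present;
    an applicable rule whose move still failed is simply false."""
    return {rule for rule in candidates
            if not (rule <= move_feats and not succeeded)}
```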

Unfortunately, though I did code all of that up this month, I haven't had the chance to fully test it yet. So there's still a lot of work to do. Once I confirm that rule formation is working, the next steps will include the ability to design experiments that test rules, and the ability to preferentially follow rules known with high confidence.

Until the next cycle,
Jenny

Sunday, May 11, 2025

Further Thoughts on Motion Tracking

Atronach's Eye may be operating on my wall (going strong after several months!), but I'm still excited to consider upgrades. So when a new motion detection algorithm came to my attention, I decided to implement it and see how it compared to my previous attempts.

MoViD, with the FFT length set to 32, highlighting and detecting a hand I'm waving in front of the camera. The remarkable thing is that I ran this test after dusk, with all the lights in the room turned off. The camera feed is very noisy under these conditions, but the algorithm successfully ignores all that and picks up the real motion.

I learned about the algorithm from a paper presented at this year's GOMACTech conference: "MoViD: Physics-inspired motion detection for satellite image analytics and communication," by MacPhee and Jalali. (I got access to the paper through work. It isn't available online, so far as I can tell, but it is marked for public release, distribution unlimited.) The paper proposes MoViD as a way to compress satellite imagery by picking out changing regions, but it works just as well on normal video. It's also a fairly simple algorithm (if you have any digital signal processing background; otherwise, feel free to gloss over the arcane math coming up). Here's the gist:

1. Convert frames from the camera to grayscale. Build a time series of intensity values for each pixel.
2. Take the FFT of each time series, converting it to a frequency spectrum.
3. Multiply by a temporal dispersion operator, H(ω). The purpose is to induce a phase shift that varies with frequency.
4. Take the inverse FFT to convert back to the time domain.
5. You now have a time series of complex numbers at each pixel. Grab the latest frame from this series to analyze and display.
6. Compute the phase of each complex number - now you have a phase value at each pixel. (The paper calls these "phixels." Cute.)
7. Rescale the phase values to match your pixel intensity range.

The result is an output image which paints moving objects in whites and light grays against a dark static background. I can easily take data like this and apply my existing method for locating a "center of motion" (which amounts to calculating the centroid of all highlighted pixels above some intensity threshold).
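That "center of motion" step is simple enough to show directly; roughly this, with an illustrative threshold value:

```python
import numpy as np

def center_of_motion(motion_img, threshold=128):
    """Centroid of every pixel brighter than the threshold, or None if the
    scene is static. motion_img is the 8-bit output of the sketch above."""
    ys, xs = np.nonzero(motion_img > threshold)
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())   # (x, y) in pixel coordinates
```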

My main complaint with the paper is its shortage of details about H(ω). It's an exponential of φ(ω), the "spectral phase kernel" ... but the paper never defines an example of the function φ, and "spectral phase kernel" doesn't appear to be a common term that a little googling will explain. After some struggles, I decided to just make something up. How about the simplest function ever, a linear function? Let φ(ω) = kω, with k > 1 so that higher frequencies make φ larger. Done! Amazingly, it worked.
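In code, that guess amounts to the following (continuing the earlier sketch; to be clear, this kernel is my assumption, not the paper's definition):

```python
import numpy as np

def linear_dispersion_operator(fft_len, k=2.0):
    """My guessed-at H(omega) = exp(i * phi(omega)) with phi(omega) = k * omega.
    omega is each FFT bin's (signed) angular frequency; k is a tunable knob."""
    omega = np.fft.fftfreq(fft_len) * 2.0 * np.pi
    return np.exp(1j * k * omega)

# Plugging it into the pipeline sketch above, e.g. with a 32-frame buffer:
# H = linear_dispersion_operator(32)
# out = movid_frame(frame_buffer, H)
```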

Okay, math over. Let me see if I can give a more conceptual explanation of why this algorithm detects motion. You could say frequency is "how fast something goes up and down over time." When an object moves in a camera's field of view, it makes the brightness of pixels in the camera's output go up and down over time. The faster the object moves, the greater the frequency of change for those pixels will be. The MoViD algorithm is basically an efficient way of calculating the overall quickness of all the patterns of change taking place at each pixel, and highlighting the pixels accordingly.

It may be hard to tell, but this is me, gently tilting my head back and forth for the camera.

My version also ended up behaving a bit like an edge detector (but only for moving edges). See how it outlines the letters and designs on my shirt? That's because change happens more abruptly at visual edges. As I sway from side to side, pixels on the letters' borders abruptly jump between the bright fabric of the shirt and the dark ink of the letters, and back again.

The wonderful thing about this algorithm is that it can be very, very good at rejecting noise. A naive algorithm that only compares the current and previous camera frames, and picks out the pixels that are different, will see "motion" everywhere; there's always a little bit of dancing "snow" overlaid on the image. By compiling data from many frames into the FFT input and looking for periodic changes, MoViD can filter out the brief, random flickers of noise. I ran one test in which I set the camera next to me and held very still ... MoViD showed a quiet black screen, but was still sensitive enough to highlight some wrinkles in my shirt that were rising and falling with my breathing. Incredible.
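For contrast, the naive frame-differencing baseline I'm describing is roughly this (illustrative threshold again):

```python
import cv2
import numpy as np

def naive_motion(prev_gray, curr_gray, threshold=25):
    """Absolute frame difference plus a threshold. Sensor noise alone pushes
    scattered pixels over the threshold, which is where the 'snow' comes from."""
    diff = cv2.absdiff(prev_gray, curr_gray)
    return (diff > threshold).astype(np.uint8) * 255
```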

Now for the big downside: FFTs and iFFTs are computationally expensive, and you have to compute them at every pixel in your image. Atronach's Eye currently runs OpenCV in Python on a Raspberry Pi. Even with the best FFT libraries for Python that I could find, MoViD is slow. To get it to run without lagging the camera input, I had to reduce the FFT length to about 6 ... which negates a lot of the noise rejection benefits.

But there are better ways to do an FFT than with Python. If I were using this on satellite imagery at work, I would be implementing it on an FPGA. An FPGA's huge potential for parallel computing is great for operations that have to be done at every pixel in an image, as well as for FFTs. And most modern FPGAs come with fast multiply-and-add cells that lend themselves to this sort of math. In the right hardware, MoViD could perform very well.

So this is the first time I've ever toyed with the idea of buying an FPGA for a hobby project. There are some fairly inexpensive FPGA boards out there now, but I'd have to run the numbers on whether this much image processing would even fit in one of the cheap little guys - and they still can't beat the price of the eyeball's current brain, a Raspberry Pi 3A. The other option is just porting the code to some faster language (probably C).

Until the next cycle,
Jenny