Sunday, October 13, 2024

Atronach's Eye 2024

It's the return of the mechanical eyeball! I worked out a lot of problems with the eyeball's hardware last year, which left me free to concentrate on improving the motion tracking software. Ever since I added motion tracking, I've been using the OpenCV libraries running locally on the eye's controller, a Raspberry Pi 3 A+. OpenCV is possibly the best-known and most popular open-source image processing library. But it's not a complete pre-made solution for things like motion tracking; it's more of a toolbox.

My original attempt at motion tracking used MOG2 background subtraction to detect regions that had changed between the current camera frame and the previous one. This process outputs a "mask" image in which altered pixels are white and the static "background" is black. I did some additional noise-reducing processing on the mask, then used OpenCV's "moments" function to compute the centroid of all the white pixels. (This would be equivalent to the center of mass, if each pixel were a particle of matter.) The motion tracking program would mark this "center of motion" on the video feed and send commands to the motors to rotate the camera toward it.

This ... kinda sorta worked. But the background subtraction method was very vulnerable to noise. It also got messed up by the camera's own rotation. Obviously, while the camera moves, *every* pixel in the scene is changing, and background subtraction ceases to be useful; it can't distinguish an object moving contrary to the camera from the shifting background. I had to put in lots of thresholding and averaging to keep the camera from "chasing ghosts," and impose a delay after each move to rebuild the image pipeline from the new stationary scene. So the only way I could get decent accuracy was to make the tracking very unresponsive. I was sure more sophisticated algorithms were available, and I wanted to see if I could do better.

I started by trying out alternative OpenCV tools using a webcam connected to my PC - just finding out how good they were at detecting motion in a feed from a stationary camera. One of the candidates was "optical flow". Unlike background subtraction, which only produces the binary "changing or not" mask, optical flow provides vector information about the direction and speed in which a pixel or feature is moving. It breaks down further into "dense optical flow" methods (which compute a motion vector for each pixel in the visual field) and "sparse optical flow" methods (which try to identify moving features and track them through a series of camera frames, producing motion traces). OpenCV has at least one example of each. I also tried object tracking. You give the tracking algorithm a region of the image that contains something interesting (such as a human), and it will then attempt to locate that object in subsequent frames.


The winner among these options was the Farneback algorithm, a dense optical flow method. OpenCV has a whole list of object tracking algorithms, and I tried them all, but none of them could stay locked on my fingers as I moved them around in front of the camera. I imagined the results would be even worse if the object tracker were trying to follow my whole body, which can change its basic appearance a great deal as I walk, turn, bend over, etc. The "motion tracks" I got out of Lucas-Kanade (a sparse optical flow method) were seemingly random squiggles. Dense optical flow both worked and produced results that were fairly easy to insert into my existing code. I could find the center of motion by taking the centroid of the pixels with the largest flow vector magnitudes. I did have to use a threshold operation to pick only flow vectors above a certain magnitude before this would work well.

Once I had Farneback optical flow working well on a static camera, I merged it into the eyeball control code. Since I now had working limit switches, I also upgraded the code with better motion control. I added an on-startup calibration routine that finds the limit of motion in all four cardinal directions and centers the camera. I got rid of the old discrete movement that segregated the visual field into "nonets" (like quadrants, except there were nine of them) and allowed the tracking algorithm to command an arbitrary number of motor steps.
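The calibration logic itself is simple once the limit switches work. Here's the idea for one axis, with hypothetical `step_toward` and limit-switch callbacks standing in for the real motor/GPIO code (the eye does this for both axes):

```python
def calibrate_axis(step_toward, at_limit_neg, at_limit_pos, max_steps=10000):
    """Find both limit switches on one axis, center the camera, and return
    the total travel in steps. step_toward(direction) moves one motor step
    (+1 or -1); the at_limit_* callbacks read the switches. All three are
    hypothetical stand-ins for the real hardware interface."""
    # Drive to the negative limit.
    for _ in range(max_steps):
        if at_limit_neg():
            break
        step_toward(-1)
    # Count the full travel over to the positive limit.
    travel = 0
    for _ in range(max_steps):
        if at_limit_pos():
            break
        step_toward(+1)
        travel += 1
    # Back off to the midpoint.
    for _ in range(travel // 2):
        step_toward(-1)
    return travel
```

Knowing the travel in steps is also what lets the tracking code command an arbitrary number of steps without slamming into the hard stops.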

And for some reason, it worked horribly. The algorithm still seemed to be finding the center of motion well enough. I had the Pi send the video feed to my main computer over Wi-Fi so I could still view it. The Farneback algorithm plus centroid calculation generally had no problem putting the tracking circle right on top of me as I moved around my living room. With the right parameter tuning, it was decent at not detecting motion where there wasn't any. But whenever I got it to move, it would *repeat* the move. The eyeball would turn to look at me, then turn again in the same direction, and keep going until it ran right over to its limit on that side.

After turning off all my running averages and making sure to restart the optical flow algorithm from scratch after a move, I finally discovered that OpenCV's camera read function is actually accessing a FIFO of past camera frames, *not* grabbing a real-time snapshot of whatever the camera is seeing right now. And Farneback takes long enough to run that my frame processing rate was slower than the camera's frame rate. Frames were piling up in the buffer and, after any rotation, image processing was getting stale frames from before the camera moved ... making it think the moving object was still at the edge of the view, and another move needed to be performed. Once I corrected this (by setting the frame buffer depth to 1), I got some decent tracking behavior.

At the edge of the left-right motion range, staring at me while I sit at my desk.

I was hoping I could use the dense optical flow to correct for the motion of the camera and keep tracking even while the eye was rotating. That dream remains unrealized. It might be theoretically possible, but Farneback is so slow that running it between motor steps will make the motion stutter. The eye's responsiveness using this new method is still pretty low; it takes a few seconds to "notice" me moving. But accuracy seems improved over the background subtraction method (which I did try with the updated motor control routines). So I'm keeping it.

Until the next cycle,
Jenny
