Atronach's Eye may be operating on my wall (going strong after several months!), but I'm still excited to consider upgrades. So when a new motion detection algorithm came to my attention, I decided to implement it and see how it compared to my previous attempts.
I learned about the algorithm from a paper presented at this year's GOMACTech conference: "MoViD: Physics-inspired motion detection for satellite image analytics and communication," by MacPhee and Jalai. (I got access to the paper through work. It isn't available online, so far as I can tell, but it is marked for public release, distribution unlimited.) The paper proposes MoViD as a way to compress satellite imagery by picking out changing regions, but it works just as well on normal video. It's also a fairly simple algorithm (if you have any digital signal processing background, otherwise feel free to gloss over the arcane math coming up). Here's the gist:
1. Convert frames from the camera to grayscale. Build a time series of intensity values for each pixel.
2. Take the FFT of each time series, converting it to a frequency spectrum.
3. Multiply by a temporal dispersion operator, H(ω). The purpose is to induce a phase shift that varies with frequency.
4. Take the inverse FFT to convert back to the time domain.
5. You now have a time series of complex numbers at each pixel. Grab the latest frame from this series to analyze and display.
6. Compute the phase of each complex number - now you have a phase value at each pixel. (The paper calls these "phixels." Cute.)
7. Rescale the phase values to match your pixel intensity range.
The result is an output image which paints moving objects in whites and light grays against a dark static background. I can easily take data like this and apply my existing method for locating a "center of motion" (which amounts to calculating the centroid of all highlighted pixels above some intensity threshold).
My main complaint with the paper is its shortage of details about H(ω). It's an exponential of φ(ω), the "spectral phase kernel" ... but the paper never defines an example of the function φ, and "spectral phase kernel" doesn't appear to be a common term that a little googling will explain. After some struggles, I decided to just make something up. How about the simplest function ever, a linear function? Let φ(ω) = kω, with k > 1 so that higher frequencies make φ larger. Done! Amazingly, it worked.
Okay, math over. Let me see if I can give a more conceptual explanation of why this algorithm detects motion. You could say frequency is "how fast something goes up and down over time." When an object moves in a camera's field of view, it makes the brightness of pixels in the camera's output go up and down over time. The faster the object moves, the greater the frequency of change for those pixels will be. The MoViD algorithm is basically an efficient way of calculating the overall quickness of all the patterns of change taking place at each pixel, and highlighting the pixels accordingly.
It may be hard to tell, but this is me, gently tilting my head back and forth for the camera. |
My version also ended up behaving a bit like an edge detector (but only for moving edges). See how it outlines the letters and designs on my shirt? That's because change happens more abruptly at visual edges. As I sway from side to side, pixels on the letters' borders abruptly jump between the bright fabric of the shirt and the dark ink of the letters, and back again.
The wonderful thing about this algorithm is that it can be very, very good at rejecting noise. A naive algorithm that only compares the current and previous camera frames, and picks out the pixels that are different, will see "motion" everywhere; there's always a little bit of dancing "snow" overlaid on the image. By compiling data from many frames into the FFT input and looking for periodic changes, MoViD can filter out the brief, random flickers of noise. I ran one test in which I set the camera next to me and held very still ... MoViD showed a quiet black screen, but was still sensitive enough to highlight some wrinkles in my shirt that were rising and falling with my breathing. Incredible.
Now for the big downside: FFTs and iFFTs are computationally expensive, and you have to compute them at every pixel in your image. Atronach's Eye currently runs OpenCV in Python on a Raspberry Pi. Even with the best FFT libraries for Python that I could find, MoViD is slow. To get it to run without lagging the camera input, I had to reduce the FFT length to about 6 ... which negates a lot of the noise rejection benefits.
But there are better ways to do an FFT than with Python. If I were using this on satellite imagery at work, I would be implementing it on an FPGA. An FPGA's huge potential for parallel computing is great for operations that have to be done at every pixel in an image, as well as for FFTs. And most modern FPGAs come with fast multiply-and-add cells that lend themselves to this sort of math. In the right hardware, MoViD could perform very well.
So this is the first time I've ever toyed with the idea of buying an FPGA for a hobby project. There are some fairly inexpensive FPGA boards out there now, but I'd have to run the numbers on whether this much image processing would even fit in one of the cheap little guys - and they still can't beat the price of the eyeball's current brain, a Raspberry Pi 3A . The other option is just porting the code to some faster language (probably C).
Until the next cycle,
Jenny