At Edge Impulse, both sound and acceleration are time-based inputs lasting a second or more, while computer vision works on static frames. Is there any chance of future support for video input at a really small resolution, say 32 x 32, and a short duration, say 100 milliseconds?
That would be just enough to detect the direction of motion or short-duration complex motions.
Video input would be a very impressive next step.
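
To illustrate the kind of input I mean, here is a rough Python sketch. It is not Edge Impulse code, and every name in it (`make_frame`, `centroid`, `motion_direction`) is made up for the example. It assumes a clip of three 32 x 32 grayscale frames, which is roughly 100 milliseconds at about 30 fps, and uses a simple brightness-centroid shift as a stand-in for whatever a trained model would actually learn.

```python
from typing import List, Tuple

SIZE = 32  # assumed frame width and height in pixels
Frame = List[List[float]]  # grayscale frame, values in [0, 1]


def make_frame(cx: int, cy: int, radius: int = 2) -> Frame:
    """Synthetic test frame: a bright square blob centered at (cx, cy) on black."""
    return [
        [1.0 if abs(x - cx) <= radius and abs(y - cy) <= radius else 0.0
         for x in range(SIZE)]
        for y in range(SIZE)
    ]


def centroid(frame: Frame) -> Tuple[float, float]:
    """Brightness-weighted centroid (x, y); frame center if the frame is all black."""
    total = sum(sum(row) for row in frame)
    if total == 0:
        return (SIZE - 1) / 2, (SIZE - 1) / 2
    cx = sum(x * v for row in frame for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(frame) for v in row) / total
    return cx, cy


def motion_direction(clip: List[Frame], min_shift: float = 1.0) -> str:
    """Compare the centroid of the first and last frame of a short clip.

    Returns "left", "right", "up", "down", or "none" when the shift is
    smaller than min_shift pixels. Image y grows downward.
    """
    x0, y0 = centroid(clip[0])
    x1, y1 = centroid(clip[-1])
    dx, dy = x1 - x0, y1 - y0
    if max(abs(dx), abs(dy)) < min_shift:
        return "none"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


# Three frames with the blob moving along x: about 100 ms of "video".
clip = [make_frame(cx, 16) for cx in (10, 14, 18)]
print(motion_direction(clip))
```

A real impulse would presumably feed the stacked frames to a small network instead of a hand-written rule, but the input shape (a few tiny frames over a short window) is the point here.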