Proposal for a Tactile–Audio–Visual AI System using ESP32-S3 and MPR121

I am developing an experimental tactile–audio–visual system that integrates embedded AI, textile interaction, and environmental sensing. The objective is to create an autonomous, low-power interface that operates without external software (e.g., Pure Data or Max) and produces sound directly through amplified speakers.

The system combines an ESP32-S3 board (with integrated camera) and two MPR121 capacitive sensors providing 24 touch channels, each corresponding to a conductive textile zone. These zones can be activated by human or non-human agents, such as birds resting on the fabric.
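Each MPR121 reports its 12 electrodes as bits 0–11 of a 16-bit touch-status word (registers 0x00/0x01 per the MPR121 datasheet). A minimal sketch of merging the status words from the two sensors into a single 24-channel state vector; the function names and the example byte values are illustrative, not from the project:

```python
def decode_mpr121_status(low_byte: int, high_byte: int) -> list[bool]:
    """Decode the two MPR121 touch-status bytes (ELE0-ELE11) into 12 booleans."""
    status = (high_byte << 8) | low_byte
    return [bool(status & (1 << ch)) for ch in range(12)]

def combined_touch_state(sensor_a: tuple[int, int],
                         sensor_b: tuple[int, int]) -> list[bool]:
    """Merge two sensors' status bytes into one 24-channel vector.

    Sensor A maps to channels 0-11, sensor B to channels 12-23.
    """
    return decode_mpr121_status(*sensor_a) + decode_mpr121_status(*sensor_b)

# Example: sensor A reports ELE0 and ELE11 touched, sensor B reports ELE2.
state = combined_touch_state((0b00000001, 0b00001000), (0b00000100, 0b00000000))
```

On the ESP32-S3 the two status bytes would come from an I2C read of each MPR121 (the two boards at different addresses, e.g. 0x5A and 0x5B via the ADDR pin); the decoding logic itself is the same.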

Using Edge Impulse, I plan to train an embedded model capable of recognizing and classifying tactile gestures (taps, long touches, swipes, and varying pressure) through temporal and spatial patterns. The system will then generate sound in real time based on the tactile input, while the ESP32-S3 camera interprets ambient light variations to produce synchronized visual outputs.
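Before committing to a trained model, it can help to verify that simple temporal features already separate these gesture classes. A rough sketch, assuming a 100 Hz boolean touch stream per channel; the thresholds and labels here are placeholders, not part of the proposal:

```python
def touch_events(samples: list[bool], sample_rate_hz: float = 100.0) -> list[float]:
    """Return the duration in seconds of each contiguous touch on one channel."""
    durations, run = [], 0
    for touched in samples:
        if touched:
            run += 1
        elif run:
            durations.append(run / sample_rate_hz)
            run = 0
    if run:  # stream ended while still touched
        durations.append(run / sample_rate_hz)
    return durations

def label_gesture(samples: list[bool], long_touch_s: float = 0.3) -> str:
    """Crude heuristic: one short event = tap, one long event = hold, several = multi-tap."""
    events = touch_events(samples)
    if not events:
        return "none"
    if len(events) > 1:
        return "multi-tap"
    return "hold" if events[0] >= long_touch_s else "tap"

# 100 Hz stream: touched for 10 samples (~0.1 s), then released.
print(label_gesture([True] * 10 + [False] * 20))  # "tap"
```

Swipes and pressure variation would need the spatial dimension (which of the 24 channels fire, in what order) and the MPR121's filtered analog counts rather than just the touch bits, which is where a learned model earns its keep over hand-written rules like these.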

The project envisions a live audiovisual installation in which touch, light, and sound converge through an embodied and ecological AI —a system that learns from and responds to its material environment.

I would appreciate your feedback or any suggestions regarding the feasibility or optimization of this configuration using Edge Impulse with the ESP32-S3 and MPR121 sensors.

Best regards,

Hi, @lelex76
Sounds interesting.
For Edge Impulse's role here, the first thing you need to be clear about is: what are your inputs and what are your outputs? Then it is easier to understand whether it can be done on our platform or not.
e.g.
inputs: 24 analog values read with frequency 100 Hz
outputs: each 500 ms window labeled as a class
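That input/output framing can be sketched concretely: a continuous stream of 24-value samples at 100 Hz, sliced into 500 ms windows (50 samples each), where each window is the unit the classifier labels. The helper name and constants below just restate the example numbers above:

```python
SAMPLE_RATE_HZ = 100   # 24 values read 100 times per second
WINDOW_MS = 500        # each labeled window covers 500 ms
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_MS // 1000  # 50 samples per window

def slice_windows(stream: list[list[float]]) -> list[list[list[float]]]:
    """Split a stream of 24-value samples into non-overlapping 500 ms windows."""
    n = len(stream) // WINDOW_SAMPLES
    return [stream[i * WINDOW_SAMPLES:(i + 1) * WINDOW_SAMPLES] for i in range(n)]

# Two seconds of dummy data -> four windows of 50 samples x 24 channels each.
stream = [[0.0] * 24 for _ in range(200)]
windows = slice_windows(stream)
```

Each such window, flattened to 50 × 24 values, is the shape of example that would be uploaded for training and fed to the deployed model at inference time.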

What would this be for you?
