I am developing an experimental tactile–audio–visual system that integrates embedded AI, textile interaction, and environmental sensing. The objective is to create an autonomous, low-power interface that operates without external software (e.g., Pure Data or Max) and produces sound directly through amplified speakers.
The system combines an ESP32-S3 board (with integrated camera) and two MPR121 capacitive touch controllers (12 electrodes each, 24 touch channels in total), each channel corresponding to a conductive textile zone. These zones can be activated by human or non-human agents, such as birds resting on the fabric.
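For reference, here is a minimal sketch of how I intend to read both controllers on the ESP32-S3, assuming the Adafruit_MPR121 Arduino library and the default I2C addresses 0x5A and 0x5B (the second board's ADDR pin tied to 3.3V):

```cpp
// Minimal sketch (assumption: Adafruit_MPR121 library, default wiring):
// read 24 textile zones from two MPR121 boards on the same I2C bus.
#include <Arduino.h>
#include <Wire.h>
#include <Adafruit_MPR121.h>

Adafruit_MPR121 capA;  // ADDR to GND  -> 0x5A, zones 0-11
Adafruit_MPR121 capB;  // ADDR to 3.3V -> 0x5B, zones 12-23

void setup() {
  Serial.begin(115200);
  Wire.begin();  // default I2C pins; adjust for the specific ESP32-S3 board
  if (!capA.begin(0x5A) || !capB.begin(0x5B)) {
    Serial.println("MPR121 not found, check wiring/addresses");
    while (true) delay(10);
  }
}

void loop() {
  // One bit per zone: 24-bit touch state for the whole textile surface
  uint32_t zones = ((uint32_t)(capB.touched() & 0x0FFF) << 12)
                 | (capA.touched() & 0x0FFF);

  // Filtered capacitance per electrode gives a continuous signal that can
  // approximate contact area / "pressure" for the gesture model
  for (uint8_t i = 0; i < 12; i++) {
    Serial.printf("%u,%u,", capA.filteredData(i), capB.filteredData(i));
  }
  Serial.printf("touched=%06lX\n", (unsigned long)zones);

  delay(20);  // ~50 Hz sampling, to match the Edge Impulse window settings
}
```

My plan is to feed the filtered values, rather than only the on/off touch bits, into Edge Impulse as a 24-axis time series.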
Using Edge Impulse, I plan to train an embedded model that recognizes and classifies tactile gestures (taps, long touches, swipes, and varying pressure) from their temporal and spatial patterns across the 24 channels. The system will then generate sound in real time from the classified gestures, while the ESP32-S3 camera interprets ambient light variations to produce synchronized visual outputs.
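On the inference side, this is roughly what I imagine running on the ESP32-S3, assuming an Arduino library exported from my Edge Impulse project (the header name and trigger_sound() below are hypothetical placeholders), with a sliding window of the 24 filtered values as input:

```cpp
// Rough sketch of the on-device classification step (assumption: the
// exported Edge Impulse Arduino library for my project is installed).
#include <Arduino.h>
#include <textile_gestures_inferencing.h>  // hypothetical project header

// Flattened sliding window: timesteps x 24 channels, sized by the model
static float features[EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE];
static size_t feature_ix = 0;

// Call this at the sampling rate configured in the Studio (e.g. 50 Hz),
// with one filtered capacitance value per textile zone.
void push_sample(const float zone_values[24]) {
  for (int i = 0; i < 24 && feature_ix < EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE; i++) {
    features[feature_ix++] = zone_values[i];
  }
  if (feature_ix < EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE) return;  // window not full
  feature_ix = 0;

  // Wrap the raw buffer in a signal and run the impulse
  signal_t signal;
  numpy::signal_from_buffer(features, EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE, &signal);
  ei_impulse_result_t result = { 0 };
  if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) return;

  // Find the most likely gesture and hand it to the sound engine
  size_t best = 0;
  for (size_t ix = 1; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
    if (result.classification[ix].value > result.classification[best].value) best = ix;
  }
  Serial.printf("gesture: %s (%.2f)\n",
                result.classification[best].label,
                result.classification[best].value);
  // trigger_sound(result.classification[best].label);  // placeholder hook to
  // whatever on-board synthesis I end up using for the amplified speakers
}
```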
The project envisions a live audiovisual installation in which touch, light, and sound converge through an embodied and ecological AI: a system that learns from and responds to its material environment.
I would appreciate your feedback or any suggestions regarding the feasibility or optimization of this configuration using Edge Impulse with the ESP32-S3 and MPR121 sensors.
Best regards,