Totally new, don't know anything, too many questions

Question/Issue: Awesome technology that makes it possible for someone like me, who does not know anything, to get started making something. I went through the process of building my first project and deployed it. It didn't work very well, but at least I understood something of the process (I had 99% confidence in the confusion matrix, though), using 6 hours of good-quality audio data. I have to get some understanding of MLOps to follow the logic of the steps and what these inputs are. I signed up for the book but didn't get the link (no corporate email ID?). The process/workflow/web interface is made as simple as possible from your side, but there is still some confusion (questions, unsure next steps) for someone who does not understand what is happening. It cleared up with trial and error, but I do have some suggestions. Since this is an ocean, I don't want to bother you with all the questions that appeared on the first day.

Project ID:

Context/Use case:

Massive improvement. Now working pretty well.
Project ID 427802
Can only improve from what it is now. :slight_smile:

When we see it running in the browser, it would be good to have a timestamp on the left instead of serial numbers. That way, we can keep it running, create sounds on and off (for detection), and revisit the log to see the results, and whether they matched in time.

Hence, a means to Pause, Select All, and Copy the log, to analyze it offline.

Why is the normalisation window in the MFCC block set to a default value of 101? What is the logic behind that choice?

Is the entire dataset shuffled before every epoch of training?

I have not seen any limit on the size of the training dataset for the Community Plan. Is there one? Can I put in 20 hours of audio training data, which increases training time, memory required, etc.?

What is the leanest audio I can upload: mono, minimum sampling rate, unsigned 8-bit PCM?
What format is it converted to for processing? Any pointers to where this is discussed?

Hi @Abhijit8086,

If you are new to edge AI, I highly recommend checking out our free embedded machine learning course here: Coursera will ask you to pay for a certificate, but you can just ignore it.

To answer some of your questions:

The MFCC block defaults came from our DSP experts with the assumption that you want to classify vocal sounds (e.g. keyword spotting). Most of these are sensible defaults based on previous similar examples and code around performing keyword spotting.
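On the normalization-window question above: a common approach (and my assumption about what that parameter controls, not a statement of Edge Impulse's exact implementation) is sliding-window cepstral mean normalization, where each frame has the mean of the surrounding window of frames subtracted from it. An odd window size like 101 keeps the window symmetric around the current frame. A minimal sketch, with `normalize_mfcc` as a hypothetical helper name:

```python
import numpy as np

def normalize_mfcc(mfcc, win=101):
    """Sliding-window mean normalization: subtract, from each frame,
    the mean of the (up to) `win` frames centered on it.
    `mfcc` has shape (n_frames, n_coeffs)."""
    half = win // 2
    out = np.empty_like(mfcc, dtype=float)
    for i in range(len(mfcc)):
        lo, hi = max(0, i - half), min(len(mfcc), i + half + 1)
        out[i] = mfcc[i] - mfcc[lo:hi].mean(axis=0)
    return out

# Toy example: a constant offset per coefficient is removed entirely.
feats = np.ones((200, 13)) * 5.0
normed = normalize_mfcc(feats)
```

The effect is to remove slowly varying bias (e.g. channel or microphone coloration) while keeping short-term variation that carries the phonetic information.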

The dataset is shuffled once before training, splitting off some of the samples to create a “validation” set.
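A minimal sketch of that shuffle-once-then-split behavior (illustrative only; the fraction and seed here are my assumptions, not Edge Impulse's actual values):

```python
import random

def train_val_split(samples, val_fraction=0.2, seed=42):
    """Shuffle once, then split off a validation set."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

data = list(range(100))
train, val = train_val_split(data)  # 80 training samples, 20 validation
```

Because the split happens once up front, the validation set stays fixed across all epochs, which is what makes per-epoch validation scores comparable.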

The Community Plan is limited by processing time (either from the processing block or training time) and RAM. While you can upload a lot of data, you’ll be limited by the length of training/processing time as well as how much of that data can fit in memory during processing/training. When you start feature processing or training, we compute the estimated RAM and training time. If your job exceeds the plan’s limit, you’ll see an error message in the output.

The leanest audio you can upload depends on the quality/accuracy you find acceptable. Almost all .wav formats should be accepted; just make sure that the sampling rate and bit depth are the same across your samples prior to uploading. For very basic voice, I find that 8 kHz sampling and 8-bit PCM is fine, but you'll likely see better accuracy with 16 kHz and 16-bit PCM.
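Since the advice above is to keep sampling rate and bit depth consistent across samples, here is a small pre-upload sanity check using only Python's standard-library `wave` module (this is my own sketch, not an Edge Impulse tool; it also writes two tiny 8 kHz, 8-bit mono files just to demonstrate):

```python
import os, tempfile, wave

def write_silence(path, rate=8000, width=1, n=800):
    # Write `n` frames of silence as mono PCM with the given sample width.
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(width)   # 1 byte = 8-bit PCM (unsigned), 2 bytes = 16-bit
        w.setframerate(rate)
        silence = (b"\x80" if width == 1 else b"\x00\x00") * n
        w.writeframes(silence)

def wav_params(path):
    """Return (channels, sample_width_bytes, sample_rate) of a .wav file."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels(), w.getsampwidth(), w.getframerate())

tmp = tempfile.mkdtemp()
a, b = os.path.join(tmp, "a.wav"), os.path.join(tmp, "b.wav")
write_silence(a)
write_silence(b)
# Consistent if all files collapse to a single parameter tuple:
consistent = len({wav_params(p) for p in (a, b)}) == 1
```

Running `wav_params` over your whole dataset before uploading catches mixed-rate or mixed-depth samples early.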

Thank you Shawn, for your kind answers.

Along the way, I also found ChatGPT to be a good place to ask one's basic questions; with its infinite patience, it teaches very well. I also found that ChatGPT is quite aware of Edge Impulse and its mechanisms: not super exactly, but very insightfully. (Perhaps the product has also changed since its training.)

  1. Why does Live classification take quite long (a minute plus) when it's supposed to be done in milliseconds, which the SIMD actually achieves? Perhaps I am missing something fundamental here.

  2. Also, the performance calibration step just keeps running, showing "Loading samples", but does not display anything for a very long time: no config graph, no results for the selected config. On another occasion, though, it did load for me quite quickly.

  3. In one case ChatGPT advised that one should stop training when accuracy is no longer changing much, because it may next go into overfitting. I didn't see that option. Perhaps it can be a feature in your product (an early stopping criterion). Although I quite understand that, the field being the ocean that it is, even more dials and levers complicate things further for the lay user. There is no end to this. What is essential is there. :slight_smile:

  4. I am assuming that one .wav file of one hour is better than 3,600 .wav files of its consecutive one-second slices, because then the stride works better, as does augmentation like shifting. Isn't that so? So large single files are better in general. Is that right? Any comments?

  5. The EON Tuner informs us that almost the entire processing load is the DSP and hardly anything is the NN. Amazing. Is this impression correct? What is the full form of EON?

This also means that, to stay under the 20-minute threshold, one can experiment with more complicated neural nets by reducing some feature extraction parameters.

  6. Is that also your website?

I will do the Coursera course. But I have picked up quite a bit about the fundamentals of what is happening in the last few days.

Thank you so much for making all of this computing available to us for free, and for the simplicity of jumping straight into the action.
But I will say that it is the developers/enthusiasts who, after obtaining the promised results, will take it to the potential consumers of this technology.

Hi @Abhijit8086,

  1. Almost everything in Edge Impulse runs in a Docker container, which has some overhead when instantiating/loading. This includes live classification: a sample must be taken from the connected board, transmitted across the internet, and a Docker container is loaded to perform inference. This process can easily take dozens of seconds. If you want to perform internet-based inference (cloud prediction serving), you’ll want to use a different service. Live Classification is just for testing. The real speed comes when you deploy the model to the end device.
  2. I’ve seen some issues with Perf Cal recently, and we’re working on an update. If you provide your project ID, we can try to replicate the issue.
  3. Edge Impulse uses the model parameters from the epoch with the highest validation score after all training is done. So, if you train for 30 epochs and epoch 27 had the best validation accuracy, your model will be the one from epoch 27.
  4. It depends. If you rely on Edge Impulse to do windowing (with strides), then a 1 hour sample might be better, but you lose out on the ability to assign multiple labels (i.e. that one sample must have 1 label for all sound clips obtained from it). Also, for things like keyword spotting, you’d have to make sure that the boundaries of the window don’t fall in the middle of the key word/phrase. I prefer to create multiple samples and do windowing myself prior to uploading to an Edge Impulse project when doing projects. If you’re curious, I have an example of how I do windowing and augmentation for keyword samples here: ei-keyword-spotting/ei-audio-dataset-curation.ipynb at master · ShawnHymel/ei-keyword-spotting · GitHub
  5. Depends on the task. In my experience, MFE and MFCC calculations for audio are incredibly computationally intensive and will usually take longer than the simple NN that accompanies them for classification. On the other hand, things like image classification usually require very little preprocessing (simple cropping and scaling), but the convolutional NN computations are very intensive.
  6. That site belongs to Arduino, which has been around since 2008 (much longer than Edge Impulse). We partnered with Arduino to create Login - Edge Impulse. The backend of that site is Edge Impulse, but it has been given an Arduino makeover (colors, logo, support for different boards, etc.). You can read more about the partnership here: Arduino Machine Learning Tools | Edge Impulse Documentation
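Point 3 above (keeping the parameters from the epoch with the highest validation score) amounts to an argmax over the per-epoch validation history. A minimal sketch, with `best_epoch` as a hypothetical helper name:

```python
def best_epoch(val_accuracies):
    """0-based index of the epoch with the highest validation accuracy;
    ties go to the earliest such epoch."""
    return max(range(len(val_accuracies)), key=lambda i: val_accuracies[i])

# Toy history over 5 epochs: accuracy peaks at epoch 2, then dips.
history = [0.60, 0.72, 0.81, 0.79, 0.80]
chosen = best_epoch(history)
```

In practice this means you don't need a manual early-stopping button to avoid shipping an overfit model: later, worse epochs are simply never selected, though training still spends time on them.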

Hope that helps!
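On point 4 (one long file vs. many one-second clips): the gain from a single long recording is that a stride shorter than the window yields overlapping windows, so you extract far more training examples from the same audio. A minimal sketch of that windowing arithmetic (my own illustration, not Edge Impulse's implementation):

```python
def window_indices(n_samples, win, stride):
    """Start offsets of every full window of length `win`, advancing by `stride`."""
    return list(range(0, n_samples - win + 1, stride))

# One hour of audio at 8 kHz, 1-second windows, 0.5-second stride:
rate = 8000
starts = window_indices(3600 * rate, win=1 * rate, stride=rate // 2)
```

With a 0.5-second stride this yields roughly twice as many windows as there are seconds of audio, versus exactly 3,600 windows if the hour is pre-cut into non-overlapping one-second files; the caveats about single labels and window boundaries from the reply above still apply.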

Very nice, thank you.

Yes, my project was audio.

Project ID 427802
My deployed Baby Cry detection is producing more accurate detections on real-life audio than the other public Baby Cry projects.

Great idea, perfect.

Joined the course. Very good course. Will come back with more questions. Thank you.
