Slow JSON Upload of Raw Data

I’m working on a raw JSON uploader for the Fashion-MNIST dataset (I know I could upload the dataset in PNG format, but this is a learning exercise in how to use raw, normalized floating-point data in Edge Impulse).

Here is my Colab: https://colab.research.google.com/drive/1icDMxig047B6uPryhZGxJqp0HPhA0D9c?usp=sharing

Right now, it seems to upload about 2 samples per second, which means it would take something like 10 hours to upload the whole Fashion-MNIST dataset. Is there any way to speed up the connection, open multiple/parallel connections, or upload several samples with one JSON payload?

You can up the concurrency, e.g. via: --concurrency 50

edit: Ah, you’re not using the uploader. In that case I’d:

  • Make a list of all the items you want to upload
  • Write a Python thread function that, in a while loop, splices the list to grab the top item and uploads it
  • Spin up 20 of these threads
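
A minimal sketch of the steps above, in case it helps. The names `upload_all` and `upload_fn` are made up for illustration: `upload_fn` stands in for whatever function sends one sample to the ingestion endpoint (it is not an Edge Impulse API), and a lock guards the shared list so two threads never grab the same item.

```python
import threading
from collections import deque


def upload_all(samples, upload_fn, num_threads=20):
    """Upload every sample using a pool of worker threads.

    `upload_fn` is a hypothetical callable that uploads one sample.
    Returns a list of (sample, exception) pairs for uploads that failed.
    """
    pending = deque(samples)   # shared work list
    lock = threading.Lock()    # guards `pending` and `failed`
    failed = []

    def worker():
        while True:
            with lock:
                if not pending:
                    return                  # nothing left, thread exits
                sample = pending.popleft()  # grab the top item
            try:
                upload_fn(sample)           # slow network call, outside the lock
            except Exception as exc:
                with lock:
                    failed.append((sample, exc))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                            # wait until every worker is done
    return failed
```

The upload call happens outside the lock on purpose; holding the lock during the request would serialize the uploads again.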

Worked like a boss :sunglasses: Thank you!

For anyone else who stumbles across this: definitely use queues to store the list of samples. It makes life much easier, and queue operations are thread-safe.
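
For reference, here is roughly what the queue-based approach can look like with Python's standard `queue.Queue`. This is only a sketch under assumptions: `upload_with_queue` and `upload_fn` are hypothetical names, and `upload_fn` is whatever function uploads a single sample. The queue does its own locking, so the workers need no explicit lock.

```python
import queue
import threading


def upload_with_queue(samples, upload_fn, num_threads=20):
    """Upload samples from a thread-safe queue using worker threads.

    `upload_fn` is a hypothetical callable that uploads one sample.
    Returns a list of (sample, exception) pairs for uploads that failed.
    """
    work = queue.Queue()
    for sample in samples:
        work.put(sample)         # fill the queue before any worker starts
    errors = queue.Queue()       # failures are collected in a queue as well

    def worker():
        while True:
            try:
                sample = work.get_nowait()
            except queue.Empty:
                return           # queue drained, thread exits
            try:
                upload_fn(sample)
            except Exception as exc:
                errors.put((sample, exc))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    failed = []
    while not errors.empty():    # safe here: all workers have finished
        failed.append(errors.get())
    return failed
```

Anything returned in `failed` can simply be passed back in for a second attempt.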
