Memory requirement is larger for architectures using scratch buffers

VikramDattu · March 11, 2022, 11:34am

With the EON-compiled C++ SDK built for CMSIS-NN, scratch buffers are allocated inefficiently: every node gets its own separate scratch allocation.
The original TFLite Micro, in contrast, reuses regions of the arena to satisfy the scratch-buffer requests of all nodes.
Are there any plans or workarounds for this?

janjongboom · March 12, 2022, 12:21pm

Hi @VikramDattu, what do you mean exactly? We calculate the arena size to include all CMSIS-NN scratch buffers, for both EON and TFLite operations. There’s indeed some code in the EON-compiled model to handle overflows, since we cannot always know the required arena size (e.g. when using ARC kernels), but for CMSIS-NN the sizes should be correct.

VikramDattu · March 15, 2022, 5:14am

Hi @janjongboom, this is the function in the compiled model that allocates and assigns scratch memory:

static TfLiteStatus RequestScratchBufferInArena(struct TfLiteContext* ctx, size_t bytes,
                                                int* buffer_idx) {
  scratch_buffer_t b;
  b.bytes = bytes;

  b.ptr = AllocatePersistentBuffer(ctx, b.bytes);
  if (!b.ptr) {
    return kTfLiteError;
  }

  scratch_buffers.push_back(b);

  *buffer_idx = scratch_buffers.size() - 1;

  return kTfLiteOk;
}

It allocates each scratch buffer separately and persistently, whereas the original TFLite Micro reuses the same scratch-buffer area across different nodes.
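To put numbers on the difference, here is a rough sketch comparing the two strategies. It assumes a simplified model in which a node's scratch buffers are only needed while that node runs, so a shared region only has to fit the largest node. `NodeScratch`, `SeparateFootprint`, `SharedFootprint` and the 16-byte alignment are made up for this illustration; they are not part of the EON output or TFLite Micro, whose real memory planner is more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Scratch requests (in bytes) made by one node during prepare().
// Hypothetical representation, for illustration only.
using NodeScratch = std::vector<std::size_t>;

// Rounds n up to a multiple of align (assumed to be a power of two).
std::size_t AlignUp(std::size_t n, std::size_t align = 16) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (n + align - 1) & ~(align - 1);
}

// Bytes one node needs for all of its scratch requests, after alignment.
std::size_t NodeTotal(const NodeScratch& node) {
  std::size_t total = 0;
  for (std::size_t bytes : node) total += AlignUp(bytes);
  return total;
}

// Every request gets its own persistent allocation: the costs add up.
std::size_t SeparateFootprint(const std::vector<NodeScratch>& nodes) {
  std::size_t total = 0;
  for (const auto& node : nodes) total += NodeTotal(node);
  return total;
}

// One region is reused node after node: only the largest node matters.
std::size_t SharedFootprint(const std::vector<NodeScratch>& nodes) {
  std::size_t peak = 0;
  for (const auto& node : nodes) peak = std::max(peak, NodeTotal(node));
  return peak;
}
```

For a model where most convolution nodes request a scratch buffer, the first number grows with the node count while the second stays at the size of the single most demanding node.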

This is what I did to work around it:

I allocated the scratch buffer as a separate region and hand out memory from its end:

static std::vector<scratch_buffer_t> scratch_buffers;

static TfLiteStatus RequestScratchBufferInArena(struct TfLiteContext* ctx, size_t bytes,
                                                int* buffer_idx) {
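  // Bytes still free between the lower boundary and the current location.
  size_t available = (size_t) (scratch_location - scratch_boundary);
  if (bytes > available) {
    printf("scratch allocation failed. Requested %u but %u available\n",
           (unsigned) bytes, (unsigned) available);
    return kTfLiteError;
  }

  // Padding that aligns the buffer start down to a 16-byte boundary.
  size_t padding = ((uintptr_t) (scratch_location - bytes)) & 0xf;
  if (padding > available - bytes) {
    return kTfLiteError; // aligned buffer would cross scratch_boundary
  }

  scratch_buffer_t b;
  b.bytes = bytes;
  scratch_location -= bytes + padding;

  b.ptr = scratch_location;
  memset(b.ptr, 0, bytes);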

  scratch_buffers.push_back(b);

  *buffer_idx = scratch_buffers.size() - 1;

  return kTfLiteOk;
}

// and I reset scratch_location before each prepare call

  for(size_t i = 0; i < 31; ++i) {
    if (registrations[nodeData[i].used_op_index].prepare) {
      scratch_location = scratch_buffer + scratchBufferSize; // reset scratch location to end
      TfLiteStatus status = registrations[nodeData[i].used_op_index].prepare(&ctx, &tflNodes[i]);
      if (status != kTfLiteOk) {
        return status;
      }
    }
  }
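
For anyone who wants to try the same idea outside the generated model, the workaround can be sketched as a small standalone allocator. `ScratchRegion` and its methods are hypothetical names of mine, not part of the EON output or TFLite Micro, and the 16-byte default alignment is an assumption. The sketch checks that the request still fits after alignment, so a buffer can never start below the region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: hands out memory from the end of a fixed region,
// growing downwards, and can be reset so the next node reuses the space.
class ScratchRegion {
 public:
  ScratchRegion(std::uint8_t* base, std::size_t size)
      : base_(base), end_(base + size), top_(base + size) {}

  // Call before each node's prepare(): the whole region is free again.
  void Reset() { top_ = end_; }

  // Bytes still free between the start of the region and the current top.
  std::size_t Available() const {
    return static_cast<std::size_t>(top_ - base_);
  }

  // Carves `bytes` from the top, aligned down to `align` (a power of two).
  // Returns nullptr when the aligned request does not fit.
  void* Allocate(std::size_t bytes, std::size_t align = 16) {
    assert(align != 0 && (align & (align - 1)) == 0);
    const std::size_t avail = Available();
    if (bytes > avail) return nullptr;
    const std::size_t pad =
        reinterpret_cast<std::uintptr_t>(top_ - bytes) & (align - 1);
    if (pad > avail - bytes) return nullptr;  // would cross base_
    top_ -= bytes + pad;
    return top_;
  }

 private:
  std::uint8_t* base_;  // lowest usable address
  std::uint8_t* end_;   // one past the highest usable address
  std::uint8_t* top_;   // start of the most recent allocation
};
```

The region would be sized to the largest per-node requirement, with `Reset()` called before each node's prepare as in the loop above. This only works if no kernel expects the contents of its scratch buffer to survive while other nodes run, which I believe is the case for scratch buffers but have not checked for every kernel.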