How do I correctly format input and reshape output data while using a TensorRT engine?


This content is from Stack Overflow. Question asked by huanidz.

I’m trying to deploy a deep learning model with the TensorRT runtime. The model conversion step went fine, and I’m fairly confident about it.

There are two parts I’m currently struggling with: copying data from host to device (e.g. from an OpenCV Mat into TensorRT device memory) and getting the output in the right shape so I can read the data correctly. So my questions are:

  • How does the shape of the input dims relate to the memory buffer? What is the difference when the model’s input dims are NCHW versus NHWC? When I read an image with OpenCV it is HWC (interleaved), and the model input is also NHWC, so do I have to rearrange the buffer data? If yes, what is the exact consecutive memory layout I need? Or, put simply, what format or sequence of data is the engine expecting?

  • About the output (assuming the input is correctly buffered): how do I get the result in the right shape for each task (detection, classification, etc.)?
    E.g. an array or something similar to what I’d get when working in Python.

I’ve read the NVIDIA docs, and they’re not beginner-friendly at all.

// Let's say the model has a dynamic input dim in NHWC format.
auto input_dims = nvinfer1::Dims4{1, 386, 342, 3};  // using fixed H, W for testing
context->setBindingDimensions(input_idx, input_dims);
auto input_size = getMemorySize(input_dims, sizeof(float));
// How do I format an OpenCV Mat for these dims, and how do I adapt
// if I encounter a new input dim format?

And the expected output dims are something like (1, 32, 53, 8), for example. The output ends up in a raw buffer pointer, and I don’t know the ordering of the data, so I can’t reconstruct the expected array shape.

// Run TensorRT inference
void* bindings[] = {input_mem, output_mem};
bool status = context->enqueueV2(bindings, stream, nullptr);
if (!status)
{
    std::cout << "[ERROR] TensorRT inference failed" << std::endl;
    return false;
}

// output_size is in bytes; the model's output elements are float
auto output_buffer = std::unique_ptr<float[]>{new float[output_size / sizeof(float)]};
if (cudaMemcpyAsync(output_buffer.get(), output_mem, output_size, cudaMemcpyDeviceToHost, stream) != cudaSuccess)
{
    std::cout << "ERROR: CUDA memory copy of output failed, size = " << output_size << " bytes" << std::endl;
    return false;
}
cudaStreamSynchronize(stream);  // the copy is async; wait before reading on the host

// How do I use this output_buffer to form the right output shape, (1, 32, 53, 8) in this case?



