Friday, October 23, 2015

CUDNN Benchmark Tools

I was trying to evaluate whether my embedded device, an NVIDIA Shield TV, was capable of running a particular CNN learning algorithm in real time. Rather than implementing the entire learning algorithm up front and risking falling short of the performance target, I decided to study feasibility first by writing a benchmark tool. It runs forward convolutions with configurable parameters (number of feature maps, batch size, filter size) and algorithms.

For those not familiar with the cuDNN library: it supports several algorithms for running a forward convolution. Some minimize GPU memory footprint, while others focus on performance without worrying about memory footprint. Presumably the idea is to support GPUs with different specifications (number of CUDA cores, memory size, etc.), as well as different use cases.
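To make the memory/speed trade-off concrete, here is a toy 1-D sketch (not cuDNN code, just an illustration): a direct convolution that needs no extra workspace, versus an im2col-style approach that first materializes a large intermediate buffer before reducing it — roughly the kind of trade-off cuDNN's algorithm variants make.

```cpp
#include <cstddef>
#include <vector>

// Direct 1-D valid convolution: no workspace beyond the output itself.
std::vector<float> conv_direct(const std::vector<float>& x,
                               const std::vector<float>& w) {
    std::vector<float> y(x.size() - w.size() + 1, 0.0f);
    for (size_t i = 0; i < y.size(); ++i)
        for (size_t j = 0; j < w.size(); ++j)
            y[i] += x[i + j] * w[j];
    return y;
}

// im2col-style: same result, but it first builds an
// (output_len x filter_len) workspace buffer so the reduction
// becomes a regular matrix-style loop -- extra memory for regularity.
std::vector<float> conv_im2col(const std::vector<float>& x,
                               const std::vector<float>& w) {
    const size_t out = x.size() - w.size() + 1;
    const size_t k = w.size();
    std::vector<float> cols(out * k);  // the extra workspace
    for (size_t i = 0; i < out; ++i)
        for (size_t j = 0; j < k; ++j)
            cols[i * k + j] = x[i + j];
    std::vector<float> y(out, 0.0f);
    for (size_t i = 0; i < out; ++i)
        for (size_t j = 0; j < k; ++j)
            y[i] += cols[i * k + j] * w[j];
    return y;
}
```

Both functions compute the same output; the second just pays with memory proportional to output length times filter size, the way a GEMM-based convolution does.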

Sample output:
n, w, h, c, k, filter_dim, avg_time(us), max_time(us)
1, 3840, 2160, 1, 1, 3, 74383, 75142
1, 3840, 2160, 1, 1, 5, 88465, 88819
1, 3840, 2160, 1, 1, 9, 159752, 160324
Total time taken=322600 us.
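The measurement loop behind numbers like these can be sketched as follows. The function names here are hypothetical, and a CPU stand-in replaces the actual cudnnConvolutionForward() call so the sketch compiles without a GPU; the point is just how the avg_time/max_time columns are produced.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Hypothetical stand-in for the real workload. In the actual tool this
// would launch cudnnConvolutionForward() on device memory and
// synchronize before stopping the clock.
void convolve_once(std::vector<float>& img, int filter_dim) {
    for (int i = 0; i < filter_dim; ++i)
        for (size_t j = 0; j + 1 < img.size(); ++j)
            img[j] += 0.25f * img[j + 1];
}

struct Timing {
    double avg_us;  // average latency over all runs
    double max_us;  // worst-case latency
};

// Time the workload `runs` times and report average and worst-case
// latency, mirroring the avg_time(us)/max_time(us) columns above.
Timing benchmark(int w, int h, int filter_dim, int runs) {
    std::vector<float> img(static_cast<size_t>(w) * h, 1.0f);
    double total = 0.0, worst = 0.0;
    for (int r = 0; r < runs; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        convolve_once(img, filter_dim);
        auto t1 = std::chrono::steady_clock::now();
        double us =
            std::chrono::duration<double, std::micro>(t1 - t0).count();
        total += us;
        worst = std::max(worst, us);
    }
    return Timing{total / runs, worst};
}
```

Something like benchmark(3840, 2160, 3, 10) would then yield one row of the table, with the average and maximum taken over ten runs.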

Since the output is comma-separated, it can easily be imported into a spreadsheet application.

Check the code out here:
https://github.com/blacksoil/CUDNN-BenchmarkUtility
