High performance implementation of 2-D convolution using AVX2

Abstract – Convolution is one of the most important and fundamental operations in multimedia processing. 2-D convolution is used for filtering operations such as sharpening, smoothing, and edge detection. It performs many arithmetic operations on every pixel of an image, which makes it a compute-intensive kernel. In this paper, we use the Intrinsic Programming Model (IPM) and AVX2 technology to vectorize this kernel explicitly. On a single core, we compare our implementations with compilers' automatic vectorizations (CAVs), the OpenCV library, and the OpenMP API, using the ICC, GCC, and LLVM compilers. For multi-threading, OpenMP is used to run the IPM and CAVs implementations on multiple cores. Our experimental results show that the performance of our implementations is much higher than that of the other approaches. In addition, OpenMP significantly improves the performance of our explicit vectorizations with the ICC and GCC compilers.
In IEEE Xplore
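
To illustrate what explicit vectorization with AVX2 intrinsics can look like for this kernel, here is a minimal sketch in C. It is not the authors' implementation: the function names `conv2d_scalar` and `conv2d_avx2`, the 16-bit pixel type, the "valid" output size, and the correlation form (kernel not flipped) are all assumptions made for the example. The vector path computes 16 output pixels per iteration and is compiled only when the compiler defines `__AVX2__` (for example with `-mavx2`); otherwise the function falls back to the scalar loop.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Scalar reference. Assumption: "valid" output of size
   (h - kh + 1) x (w - kw + 1), kernel applied without flipping. */
static void conv2d_scalar(const int16_t *in, int w, int h,
                          const int16_t *k, int kw, int kh, int16_t *out)
{
    assert(kw > 0 && kh > 0 && w >= kw && h >= kh);
    int ow = w - kw + 1, oh = h - kh + 1;
    for (int y = 0; y < oh; ++y)
        for (int x = 0; x < ow; ++x) {
            int acc = 0;
            for (int ky = 0; ky < kh; ++ky)
                for (int kx = 0; kx < kw; ++kx)
                    acc += in[(y + ky) * w + (x + kx)] * k[ky * kw + kx];
            out[y * ow + x] = (int16_t)acc; /* caller must avoid overflow */
        }
}

/* Hypothetical explicit vectorization: 16 int16 outputs per iteration. */
static void conv2d_avx2(const int16_t *in, int w, int h,
                        const int16_t *k, int kw, int kh, int16_t *out)
{
    assert(kw > 0 && kh > 0 && w >= kw && h >= kh);
    int ow = w - kw + 1, oh = h - kh + 1;
    for (int y = 0; y < oh; ++y) {
        int x = 0;
#if defined(__AVX2__)
        for (; x + 16 <= ow; x += 16) {
            __m256i acc = _mm256_setzero_si256();
            for (int ky = 0; ky < kh; ++ky)
                for (int kx = 0; kx < kw; ++kx) {
                    /* 16 neighbouring input pixels, unaligned load */
                    __m256i px = _mm256_loadu_si256(
                        (const __m256i *)&in[(y + ky) * w + (x + kx)]);
                    /* broadcast one kernel coefficient to all lanes */
                    __m256i c = _mm256_set1_epi16(k[ky * kw + kx]);
                    acc = _mm256_add_epi16(acc, _mm256_mullo_epi16(px, c));
                }
            _mm256_storeu_si256((__m256i *)&out[y * ow + x], acc);
        }
#endif
        /* Scalar tail for the remaining columns (or the whole row
           when AVX2 is not enabled at compile time). */
        for (; x < ow; ++x) {
            int acc = 0;
            for (int ky = 0; ky < kh; ++ky)
                for (int kx = 0; kx < kw; ++kx)
                    acc += in[(y + ky) * w + (x + kx)] * k[ky * kw + kx];
            out[y * ow + x] = (int16_t)acc;
        }
    }
}
```

Both functions take a row-major image of `w` by `h` pixels and a `kw` by `kh` kernel, and write the result to a caller-allocated buffer. The sketch keeps 16-bit lanes throughout, so it is exact only while every accumulated sum fits in an `int16_t`; a production kernel would typically widen to 32-bit lanes or use floating point.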
