COde enHancemENts for VEctorizing Compilers by virtual performance evaluation

How well applications exploit advanced hardware features to run faster depends partly on the methods and techniques a compiler uses to generate machine code for central processing units (CPUs), which play a crucial role in computer systems. The performance of current CPUs has increased considerably, yet having enterprise programmers focus on the hardware and the lowest layers of software raises development costs. Currently, a large gap between applications and hardware wastes a lot of money: processing power goes unused, and extra programming effort is needed to use hardware capabilities properly. To reduce these costs, the compiler must generate code that exploits the available hardware features without requiring extra programming approaches in application development. Furthermore, in applications such as multimedia applications, which are limited by the speed of computation on multimedia data, it is very important to use high-performance hardware and software. In addition, modern CPUs are equipped with multimedia extensions that perform computations using vector processing units. Because algorithms must be mapped to vector-like operations to use these units, vectorization of multimedia source code has become an important field of research in theoretical and practical computer science.
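To make the idea of mapping an algorithm to vector-like operations concrete, the following is a minimal sketch of a typical multimedia kernel, a saturating addition of two 8-bit pixel buffers. It is not taken from the project; the function name add_pixels_saturate and its parameters are hypothetical and chosen only for illustration. The loop is written so that each iteration is independent, which is the property a vectorizing compiler needs in order to process several pixels per instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the project): saturating add of two
 * 8-bit pixel buffers, a common multimedia operation.
 *
 * The restrict qualifiers promise that the buffers do not overlap, so
 * every iteration is independent of the others. Under that assumption a
 * vectorizing compiler may replace the scalar loop with vector
 * instructions that handle many pixels at once; whether it does so
 * depends on the compiler, its flags, and the target CPU. */
void add_pixels_saturate(uint8_t *restrict dst,
                         const uint8_t *restrict a,
                         const uint8_t *restrict b,
                         size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* Widen before adding so the sum cannot wrap around. */
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        /* Clamp to the largest 8-bit value instead of overflowing. */
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Written this way, the programmer states only the per-pixel computation; choosing the vector width and the instructions is left to the compiler, which is the division of labour the paragraph above argues for.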

Our goal in this project is therefore to design and implement a new vectorization strategy based on the features and requirements of multimedia applications. On the one hand, there are many optimization techniques for generating efficient code. On the other hand, compilers must know the processor micro-architecture to make better use of its processing power.
Accomplishing these tasks involves many compilation steps, each aimed at gaining more performance. Because we consider the current development path of general-purpose processors (GPPs) unsuitable, a new x86-compatible processor will be designed. Additionally, special attention will be given to exploiting the large amount of data-level parallelism (DLP) and thread-level parallelism (TLP) in applications. In multimedia applications, many computational results can be reused in the next step of an algorithm; reusing these results in functional units and registers requires the design of new instructions to obtain more performance.