I developed a software implementation of the Message Passing Interface (MPI) for TRIPS, covering about 20 commonly used MPI routines. I am currently working on the performance analysis.
In this project, carried out jointly with Suriya, we developed a framework for generating custom hardware to execute streaming applications. We accept streaming applications written in StreamIt and generate synthesizable Verilog code. This Verilog code is synthesized using the Synopsys Verilog compiler to produce RTL, which is equivalent to hardware that executes the application directly.
High performance implementation of Cholesky factorization on SMP architectures - Project report
This project explores multiple ways of partitioning, parallelizing, and scheduling Cholesky factorization within the OPENFLAME framework.
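As a rough illustration of what partitioning means here, the sketch below shows a textbook right-looking blocked Cholesky factorization in plain Python. It is not the code from the project report and does not use the FLAME API; the function names (`potrf`, `trsm`, `syrk`, `blocked_cholesky`) and the `block` parameter are chosen for this example only. Each loop iteration factors a diagonal block, solves for the panel below it, and updates the trailing matrix. The panel solve and the trailing update work row by row, so those are the steps a scheduler could split across processors.

```python
import math


def potrf(a, lo, hi):
    """Factor the diagonal block a[lo:hi][lo:hi] in place (lower triangle)."""
    for j in range(lo, hi):
        a[j][j] = math.sqrt(a[j][j])
        for i in range(j + 1, hi):
            a[i][j] /= a[j][j]
        # Update the rest of the block with the column just computed.
        for k in range(j + 1, hi):
            for i in range(k, hi):
                a[i][k] -= a[i][j] * a[k][j]


def trsm(a, lo, hi, n):
    """Solve X * L11^T = A21 for the panel below the diagonal block, in place."""
    for i in range(hi, n):  # rows are independent of each other
        for j in range(lo, hi):
            s = a[i][j]
            for k in range(lo, j):
                s -= a[i][k] * a[j][k]
            a[i][j] = s / a[j][j]


def syrk(a, lo, hi, n):
    """Trailing update A22 -= L21 * L21^T, lower triangle only."""
    for i in range(hi, n):  # rows are independent of each other
        for j in range(hi, i + 1):
            s = 0.0
            for k in range(lo, hi):
                s += a[i][k] * a[j][k]
            a[i][j] -= s


def blocked_cholesky(a, block=2):
    """Return lower-triangular L with L * L^T == a (a symmetric positive definite)."""
    n = len(a)
    l = [list(map(float, row)) for row in a]
    for lo in range(0, n, block):
        hi = min(lo + block, n)
        potrf(l, lo, hi)
        trsm(l, lo, hi, n)
        syrk(l, lo, hi, n)
    # The strict upper triangle was never touched; clear it for a clean result.
    for i in range(n):
        for j in range(i + 1, n):
            l[i][j] = 0.0
    return l
```

Changing `block` changes the granularity of the three steps, which is the kind of trade-off a partitioning and scheduling study would vary.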
This project explored the various approaches for the design and implementation of the Pattern Matching
kernel on TRIPS SVM system to exploit the available parallelism and hide communication latency
efficiently. Because Pattern Matching is an embarrassingly parallel application, the TRIPS SVM
system achieved a speedup of nearly 4x with 4 processors over the uniprocessor version.
Projects prior to grad school
Basic Block Architecture for Power Saving
“Basic Block Architecture for Power Saving (B2APS)” introduces a new approach towards power saving
with no compromise in performance. The principal aim of this approach is to handle the entire architecture
in terms of basic blocks with compiler support and restructure the underlying architecture to handle blocks
and implement deterministic clock gating techniques. The proposed areas of focus include caches, register
files, pipeline units, and functional units.
Thread selection for SMT processor based on Function Unit usage in hardware.
The performance of a simultaneous multithreading processor depends on the ability of the processor to
exploit the choice available during instruction fetch, and hence a good thread selection mechanism should
be in place. This project implemented such a thread selection mechanism by predicting functional unit (FU)
usage in hardware, using the SMT SimpleScalar simulator. It was observed that the proposed
mechanism was effective with benchmarks that involve extensive floating point and integer computations.
In most other benchmarks, thread selection based on ICount gave better performance in terms of IPC.
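The difference between the two fetch policies can be sketched in a few lines of Python. This is a toy model, not the simulator code from the project: ICOUNT is shown in its usual form (fetch from the thread with the fewest in-flight instructions), while the FU-aware policy, the `predicted_fu` field, and the `free_units` table are assumptions invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    tid: int
    in_flight: int     # instructions currently in the front end and issue queues
    predicted_fu: str  # hypothetical: FU class the next fetch group is predicted to need


def select_icount(threads):
    """ICOUNT: fetch from the thread with the fewest in-flight instructions."""
    return min(threads, key=lambda t: (t.in_flight, t.tid)).tid


def select_fu_aware(threads, free_units):
    """Assumed FU-aware policy: prefer the thread whose predicted FU class
    has the most free units, breaking ties with ICOUNT."""
    return min(
        threads,
        key=lambda t: (-free_units.get(t.predicted_fu, 0), t.in_flight, t.tid),
    ).tid


threads = [
    ThreadState(tid=0, in_flight=10, predicted_fu="fp"),
    ThreadState(tid=1, in_flight=4, predicted_fu="int"),
    ThreadState(tid=2, in_flight=7, predicted_fu="fp"),
]
free_units = {"int": 0, "fp": 2}  # all integer units busy, two FP units idle

icount_choice = select_icount(threads)                   # thread 1
fu_aware_choice = select_fu_aware(threads, free_units)   # thread 2
```

In this example ICOUNT picks thread 1 even though every integer unit is busy, while the FU-aware policy picks thread 2, whose predicted floating-point work can issue immediately.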
Implementation of Pthreads and OpenMP in an MPI Environment - Project report
The aim of the project is to obtain optimal performance from both processors of every node, using MPI for
inter-node communication and Pthreads/OpenMP for parallelizing computation within a node (intra-node).
Using Chaos theory for Steganography
I developed a steganography method using a chaotic random number generator.
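One common way to combine the two ideas is sketched below, purely as an illustration: the chaotic generator is assumed to be the logistic map, and its output decides which cover bytes carry the hidden bits in their least significant bit. Neither choice is confirmed by the project description, and the names (`chaotic_positions`, `embed`, `extract`) and the constants `R` and `BURN_IN` are hypothetical.

```python
R = 3.99       # logistic map parameter in the chaotic regime (assumed)
BURN_IN = 100  # iterations discarded before values are used (assumed)


def chaotic_positions(n_slots, count, key):
    """Return `count` distinct positions in range(n_slots), ordered by a
    logistic-map sequence seeded with `key` (a float strictly between 0 and 1)."""
    if count > n_slots:
        raise ValueError("message does not fit in the cover")
    x = key
    for _ in range(BURN_IN):
        x = R * x * (1.0 - x)
    values = []
    for _ in range(n_slots):
        x = R * x * (1.0 - x)
        values.append(x)
    # Sorting indices by their chaotic value yields a key-dependent permutation.
    return sorted(range(n_slots), key=values.__getitem__)[:count]


def embed(cover, message, key):
    """Hide `message` (bytes) in the least significant bits of `cover` (bytes)."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    stego = bytearray(cover)
    for pos, bit in zip(chaotic_positions(len(stego), len(bits), key), bits):
        stego[pos] = (stego[pos] & 0xFE) | bit
    return bytes(stego)


def extract(stego, n_bytes, key):
    """Recover `n_bytes` of hidden data using the same key."""
    positions = chaotic_positions(len(stego), n_bytes * 8, key)
    out = bytearray()
    for i in range(0, len(positions), 8):
        byte = 0
        for pos in positions[i:i + 8]:
            byte = (byte << 1) | (stego[pos] & 1)
        out.append(byte)
    return bytes(out)
```

The receiver needs only the key and the message length to regenerate the same positions, for example `extract(embed(cover, b"hi", 0.3141), 2, 0.3141)`.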