Current Research Projects

Previous Research Projects

Efficient Task-Based Parallel Runtimes

Task-based parallel runtimes underpin the parallelization of frameworks for machine learning, graph analytics, and other domains. State-of-the-art graph analytics frameworks like GraphIt and Ligra are designed on top of these runtimes to enable efficient task distribution using dynamic work-stealing algorithms. My work has improved the performance and energy efficiency of these runtimes with a cross-stack approach that exposes runtime-level information to the hardware to control architecture- and VLSI-level mechanisms (ISCA 2016). However, walls of abstraction often make it challenging to pass information through layers of the computing stack. I worked on a systematic approach to convey the abstraction of a "task" from the runtime directly to the underlying hardware (MICRO 2017). I designed and fabricated BRGTC2, a 6.7M-transistor chip in TSMC 28nm, to collect performance, area, and energy numbers in an advanced technology node to support future research projects based on hardware acceleration for task-based parallel runtimes (RISCV 2018).

Integrated Voltage Regulation

Voltage regulators are responsible for efficiently converting one voltage level into another (e.g., board-level to chip-level). Recent technology trends are making it feasible to replace discrete voltage regulators with integrated voltage regulators, which can significantly reduce system cost by eliminating expensive board-level components. These enabling trends include energy storage elements with better energy densities as well as faster on-chip switches with lower parasitic losses. However, integrated voltage regulators are very large (e.g., a regulator can occupy roughly the same area as the core it supplies). Together with my colleagues in the circuits field, I applied an architecture-circuit co-design approach to develop a novel technique that dynamically shares capacitance across multiple loads for a 40% reduction in regulator area while still enabling fine-grain DVFS (MICRO 2014). I also contributed to the fabrication of a switched-capacitor-based prototype in 65nm CMOS, which resulted in a journal publication in a top-tier circuits venue (IEEE TCAS I 2018).

Celerity SoC and Rapid ASIC Design

Celerity (2017)

Rising SoC design costs have created a formidable barrier to hardware design when using traditional design tools and methodologies. It is exceedingly difficult for small teams to build meaningfully complex chips. I have been involved in a range of efforts to reduce the costs and challenges of ASIC design for small teams based on productive toolflows and open-source hardware. I was the Cornell University student lead on the Celerity Open-Source 511-Core RISC-V Tiered Accelerator Fabric, which resulted in top-tier publications in chip-design venues (Hot Chips 2017, VLSIC 2019), architecture venues (IEEE Micro 2018), and various workshops. I was also the project lead for BRGTC1 and BRGTC2, silicon prototypes in IBM 130nm and TSMC 28nm, respectively, that were designed and implemented using PyMTL, a new open-source Python-based hardware modeling framework developed by my research group. Finally, I contributed to an effort at NVIDIA Research on a modular digital VLSI flow for high-productivity SoC design based on high-level synthesis tools (DAC 2018).