Hi, I'm Stefanos Baziotis


PhD Candidate

University of Illinois at Urbana-Champaign
Department of Computer Science

Email: sb54@illinois.edu
Room 4111, Siebel Center
Research Interests
  • Compilers
  • Programming Languages
  • Databases/Data Management Systems
  • Performance Engineering

I am a fourth-year PhD candidate in the Department of Computer Science (CS) at the University of Illinois Urbana-Champaign (UIUC). My advisor is Prof. Charith Mendis. I also work closely with Daniel Kang.

I'm doing research at the intersection of compilers and data management systems. More broadly, my interests extend to performance engineering, programming languages, and distributed systems.

Before coming to UIUC, I worked as a compiler researcher at NEC and was also involved with the Liberty Research Group at Princeton. I obtained my B.Sc. from the Department of Informatics, University of Athens, where I did my thesis with Prof. Yannis Smaragdakis.

News
Nov 16, 2024 New article: Compiler Optimization in a Language you Can Understand.
Nov 04, 2024 Upon request, I recorded the intro to program synthesis I gave at the Compiler Meetup@UIUC. I tried to make it self-contained and to cover some breadth while still going deep.
May 20, 2024 I started my internship at Microsoft Research, working with Christian König.
Nov 20, 2023 Dias was accepted at SIGMOD 2024. The code is public, and you can also try Dias using our Colab notebook.
Nov 11, 2023 New article: Inverting the Inverted: Revisiting Dismissed Ideas in Research.

Publications
  • Dias: Dynamic Rewriting of Pandas Code
    Stefanos Baziotis, Daniel Kang, Charith Mendis
    SIGMOD 2024
    @article{dias,
      author = {Baziotis, Stefanos and Kang, Daniel and Mendis, Charith},
      title = {Dias: Dynamic Rewriting of Pandas Code},
      year = {2024},
      issue_date = {February 2024},
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      volume = {2},
      number = {1},
      url = {https://doi.org/10.1145/3639313},
      doi = {10.1145/3639313},
      abstract = {In recent years, dataframe libraries, such as pandas have exploded in popularity. Due to their flexibility, they are increasingly used in ad-hoc exploratory data analysis (EDA) workloads. These workloads are diverse, including custom functions which can span libraries or be written in pure Python. The majority of systems available to accelerate EDA workloads focus on bulk-parallel workloads, which contain vastly different computational patterns, typically within a single library. As a result, they can introduce excessive overheads for ad-hoc EDA workloads due to their expensive optimization techniques. Instead, we identify source-to-source, external program rewriting as a lightweight technique which can optimize across representations, and offer substantial speedups while also avoiding slowdowns. We implemented Dias, which rewrites notebook cells to be more efficient for ad-hoc EDA workloads. We develop techniques for efficient rewrites in Dias, including checking the preconditions under which rewrites are correct, dynamically, at fine-grained program points. We show that Dias can rewrite individual cells to be 57\texttimes{} faster compared to pandas and 1909\texttimes{} faster compared to optimized systems such as modin. Furthermore, Dias can accelerate whole notebooks by up to 3.6\texttimes{} compared to pandas and 27.1\texttimes{} compared to modin.},
      journal = {Proc. ACM Manag. Data},
      month = {mar},
      articleno = {58},
      numpages = {27},
      keywords = {cross-representation, dynamic, pandas, rewriting}
      }

  • Hydride: A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures
    Akash Kothari, Abdul Rafae Noor, Muchen Xu, Hassam Uddin, Dhruv Baronia, Stefanos Baziotis, Vikram Adve, Charith Mendis, Sudipta Sengupta
    ASPLOS 2024
    @inproceedings{hydride,
      author = {Kothari, Akash and Noor, Abdul Rafae and Xu, Muchen and Uddin, Hassam and Baronia, Dhruv and Baziotis, Stefanos and Adve, Vikram and Mendis, Charith and Sengupta, Sudipta},
      title = {Hydride: A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures},
      year = {2024},
      isbn = {9798400703850},
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      url = {https://doi.org/10.1145/3620665.3640385},
      doi = {10.1145/3620665.3640385},
      abstract = {As modern hardware architectures evolve to support increasingly diverse, complex instruction sets for meeting the performance demands of modern workloads in image processing, deep learning, etc., it has become ever more crucial for compilers to provide robust support for evolution of their internal abstractions and retargetable code generation support to keep pace with emerging instruction sets. We propose Hydride, a novel approach to compiling for complex, emerging hardware architectures. Hydride uses vendor-defined pseudocode specifications of multiple hardware ISAs to automatically design retargetable instructions for AutoLLVM IR, an extensible compiler IR which consists of (formally defined) language-independent and target-independent LLVM IR instructions to compile to those ISAs, and automatically generated instruction selection passes to lower AutoLLVM IR to each of the specified hardware ISAs. Hydride also includes a code synthesizer that automatically generates code generation support for schedule-based languages, such as Halide, to optimally generate AutoLLVM IR. Our results show that Hydride is able to represent 3,557 instructions combined in x86, Hexagon, ARM architectures using only 397 AutoLLVM IR instructions, including (Intel) SSE2, SSE4, AVX, AVX2, AVX512, (Qualcomm) Hexagon HVX, and (ARM) NEON vector ISAs. We created a new Halide compiler with Hydride using only a formal semantics of Halide IR, leveraging the auto-generated AutoLLVM IR and back-ends for the three hardware architectures. Across kernels from deep learning and image processing, this compiler is able to perform just as well as the mature, production Halide compiler on Hexagon, and outperform on x86 by 8\% and ARM by 3\%. Hydride also outperforms the production Halide's LLVM back end by 12\% on x86, 100\% on HVX, and 26\% on ARM across the same kernels.},
      booktitle = {Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2},
      pages = {514--529},
      numpages = {16},
      location = {La Jolla, CA, USA},
      series = {ASPLOS '24}
      }

Blog

Talks

  • Latest Advancements in Automatic Vectorization Research
    Stefanos Baziotis
    LLVM-CGO 2021
       Slides
  • Introduction to (Unconventional) Vectorization
    Stefanos Baziotis
    LLVM Social Bangalore, December 2020
       Slides
  • Introduction to Google Summer of Code (Updated)
    Stefanos Baziotis
    University of Athens, Department of Informatics, Operating Systems Course 2020
       Slides
  • The Present and Future of Interprocedural Optimization in LLVM
    Luofan Chen, Kuter Dinel, Shinji Okumura, Hideto Ueno, Johannes Doerfert, Stefanos Baziotis
    LLVM Developers' Meeting 2020
       Slides
  • A Deep Dive into the Interprocedural Optimization Infrastructure
    Luofan Chen, Kuter Dinel, Shinji Okumura, Hideto Ueno, Johannes Doerfert, Stefanos Baziotis
    LLVM Developers' Meeting 2020
       Slides
  • Finding Your Way Around the LLVM Dependence Analysis Zoo
    Stefanos Baziotis, Simon Moll
    LLVM Developers' Meeting 2020
       Slides
  • Introduction to Google Summer of Code
    Stefanos Baziotis
    ACM UoA Student Chapter, February 2020
       Slides
  • Introduction to Open Source Software
    Stefanos Baziotis
    ACM UoA Student Chapter, November 2019
       Slides