Stefanos Baziotis

Hi, I'm Stefanos Baziotis

PhD Candidate

University of Illinois at Urbana-Champaign
Department of Computer Science

Email: sb54@illinois.edu or stef@sbaziotis.com
Room 4111, Siebel Center

Research Interests

Compilers
Databases/Data Management Systems
Programming Languages
Performance Engineering

I am a fourth-year PhD candidate in the Department of Computer Science (CS) at the University of Illinois Urbana-Champaign (UIUC). My advisor is Prof. Charith Mendis. I also work closely with Daniel Kang.

I'm doing research at the intersection of compilers and data management systems. More broadly, my interests extend to performance engineering and programming languages.

Before coming to UIUC, I worked as a compiler researcher at NEC, and I was also involved with the Liberty Research Group at Princeton. I obtained my B.Sc. from the Department of Informatics, University of Athens, where I did my thesis with Prof. Yannis Smaragdakis.

News

Don't want to miss any updates? You can follow this RSS feed or sign up for my newsletter:

Aug 02, 2025	New article: Les Mots Français en Grec – Partie 2.
Jul 19, 2025	New article: metap: A Meta-Programming Layer for Python.
Jul 16, 2025	New short essay: Πῶς οἱ Χατζηφραγκέτα μᾶς Ἔκαναν Διανοουμένους.
Jul 08, 2025	New article in French! Les Mots Français en Grec.
Jun 30, 2025	Dias on the Web: A Godbolt-like interface to try Dias online.
Jun 23, 2025	New article: PandasBench: The first benchmark for the Pandas API. Read it on our group's new website!
May 17, 2025	New article: The Curse of Microlearning.
May 07, 2025	New article: What Happens If We Inline Everything? Surprisingly, not everything will crash and burn.
May 03, 2025	New short essay: On Hypocritical Writing, a unique way to write non-fiction.
Apr 28, 2025	The discussions I've been having on my YouTube channel are now available on Spotify, as part of a new podcast I created: In the Midst of Philosophy.
Apr 25, 2025	New video: Dias: Dynamic Rewriting of Pandas Code, to celebrate the recent Honorable Mention for the Best Artifact Award at SIGMOD 2024.
Apr 04, 2025	New article: Listening to Your Own Music - Short Essay.
Apr 02, 2025	New discussion: A Discussion with William J. Rapaport.
Mar 22, 2025	New short essay: Ἡ παγκοσμίας κλάσης μετάφρασι τοῦ Στίγκλερ τῆς κ. Σινοπούλου.
Mar 18, 2025	New article: Γιατὶ χρησιμοποιῶ τὸ πολυτονικό.
Mar 13, 2025	New article: Getting Started with Compilers.
Mar 02, 2025	New article: Tempi: Translating Greece's Indignation.
Feb 22, 2025	New article: Γλωσσάρι Μ. Καραγάτση.
Feb 15, 2025	New article: A Beginner's Guide to Vectorization By Hand: Part 4 - Convolution.
Feb 12, 2025	New article: Short Essays - February 2025.
Dec 17, 2024	New video: Compiler Applications to Query Processing.
Dec 09, 2024	New article: Common Misconceptions about Compilers.
Dec 02, 2024	New article: Defining All Undefined Behavior and Leveraging Compiler Transformation APIs.
Nov 16, 2024	New article: Compiler Optimization in a Language you Can Understand.
Nov 04, 2024	Upon request, I recorded the intro to program synthesis I gave at the Compiler Meetup@UIUC. I tried to make it self-contained and to cover some breadth while still going deep.
May 20, 2024	I started my internship at Microsoft Research, working with Christian König.
Nov 20, 2023	Dias got accepted at SIGMOD 2024. The code is public and you can also try Dias using our Colab notebook.
Nov 11, 2023	New article: Inverting the Inverted: Revisiting Dismissed Ideas in Research.


Publications

PandasBench: A Benchmark for the Pandas API
Alex Broihier*, Stefanos Baziotis*, Daniel Kang, Charith Mendis
arXiv preprint, 2025

@misc{broihier2025pandasbenchbenchmarkpandasapi,
  title={PandasBench: A Benchmark for the Pandas API}, 
  author={Alex Broihier and Stefanos Baziotis and Daniel Kang and Charith Mendis},
  year={2025},
  eprint={2506.02345},
  archivePrefix={arXiv},
  primaryClass={cs.DB},
  url={https://arxiv.org/abs/2506.02345}, 
}


PilotDB: Database-Agnostic Online Approximate Query Processing with A Priori Error Guarantees
Yuxuan Zhu*, Tengjun Jin*, Stefanos Baziotis, Chengsong Zhang, Charith Mendis, Daniel Kang
SIGMOD 2025

@article{10.1145/3725335,
  author = {Zhu, Yuxuan and Jin, Tengjun and Baziotis, Stefanos and Zhang, Chengsong and Mendis, Charith and Kang, Daniel},
  title = {PilotDB: Database-Agnostic Online Approximate Query Processing with A Priori Error Guarantees},
  year = {2025},
  issue_date = {June 2025},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {3},
  number = {3},
  url = {https://doi.org/10.1145/3725335},
  doi = {10.1145/3725335},
  abstract = {After decades of research in approximate query processing (AQP), its adoption in the industry remains limited. Existing methods struggle to simultaneously provide user-specified error guarantees, eliminate maintenance overheads, and avoid modifications to database management systems. To address these challenges, we introduce two novel techniques, TAQA and BSAP. TAQA is a two-stage online AQP algorithm that achieves all three properties for arbitrary queries. However, it can be slower than exact queries if we use standard row-level sampling. BSAP resolves this by enabling block-level sampling with statistical guarantees in TAQA. We implement TAQA and BSAP in a prototype middleware system, PilotDB, that is compatible with all DBMSs supporting efficient block-level sampling. We evaluate PilotDB on PostgreSQL, SQL Server, and DuckDB over real-world benchmarks, demonstrating up to 126X speedups when running with a 5\% guaranteed error.},
  journal = {Proc. ACM Manag. Data},
  month = jun,
  articleno = {198},
  numpages = {28},
  keywords = {approximate query processing, sampling}
  }


Dias: Dynamic Rewriting of Pandas Code
Stefanos Baziotis, Daniel Kang, Charith Mendis
SIGMOD 2024, Honorable Mention for Best Artifact Award

@article{dias,
  author = {Baziotis, Stefanos and Kang, Daniel and Mendis, Charith},
  title = {Dias: Dynamic Rewriting of Pandas Code},
  year = {2024},
  issue_date = {February 2024},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {2},
  number = {1},
  url = {https://doi.org/10.1145/3639313},
  doi = {10.1145/3639313},
  abstract = {In recent years, dataframe libraries, such as pandas have exploded in popularity. Due to their flexibility, they are increasingly used in ad-hoc exploratory data analysis (EDA) workloads. These workloads are diverse, including custom functions which can span libraries or be written in pure Python. The majority of systems available to accelerate EDA workloads focus on bulk-parallel workloads, which contain vastly different computational patterns, typically within a single library. As a result, they can introduce excessive overheads for ad-hoc EDA workloads due to their expensive optimization techniques. Instead, we identify source-to-source, external program rewriting as a lightweight technique which can optimize across representations, and offer substantial speedups while also avoiding slowdowns. We implemented Dias, which rewrites notebook cells to be more efficient for ad-hoc EDA workloads. We develop techniques for efficient rewrites in Dias, including checking the preconditions under which rewrites are correct, dynamically, at fine-grained program points. We show that Dias can rewrite individual cells to be 57\texttimes{} faster compared to pandas and 1909\texttimes{} faster compared to optimized systems such as modin. Furthermore, Dias can accelerate whole notebooks by up to 3.6\texttimes{} compared to pandas and 27.1\texttimes{} compared to modin.},
  journal = {Proc. ACM Manag. Data},
  month = {mar},
  articleno = {58},
  numpages = {27},
  keywords = {cross-representation, dynamic, pandas, rewriting}
  }


Hydride: A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures
Akash Kothari, Abdul Rafae Noor, Muchen Xu, Hassam Uddin, Dhruv Baronia, Stefanos Baziotis, Vikram Adve, Charith Mendis, Sudipta Sengupta
ASPLOS 2024

@inproceedings{hydride,
  author = {Kothari, Akash and Noor, Abdul Rafae and Xu, Muchen and Uddin, Hassam and Baronia, Dhruv and Baziotis, Stefanos and Adve, Vikram and Mendis, Charith and Sengupta, Sudipta},
  title = {Hydride: A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures},
  year = {2024},
  isbn = {9798400703850},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3620665.3640385},
  doi = {10.1145/3620665.3640385},
  abstract = {As modern hardware architectures evolve to support increasingly diverse, complex instruction sets for meeting the performance demands of modern workloads in image processing, deep learning, etc., it has become ever more crucial for compilers to provide robust support for evolution of their internal abstractions and retargetable code generation support to keep pace with emerging instruction sets. We propose Hydride, a novel approach to compiling for complex, emerging hardware architectures. Hydride uses vendor-defined pseudocode specifications of multiple hardware ISAs to automatically design retargetable instructions for AutoLLVM IR, an extensible compiler IR which consists of (formally defined) language-independent and target-independent LLVM IR instructions to compile to those ISAs, and automatically generated instruction selection passes to lower AutoLLVM IR to each of the specified hardware ISAs. Hydride also includes a code synthesizer that automatically generates code generation support for schedule-based languages, such as Halide, to optimally generate AutoLLVM IR. Our results show that Hydride is able to represent 3,557 instructions combined in x86, Hexagon, ARM architectures using only 397 AutoLLVM IR instructions, including (Intel) SSE2, SSE4, AVX, AVX2, AVX512, (Qualcomm) Hexagon HVX, and (ARM) NEON vector ISAs. We created a new Halide compiler with Hydride using only a formal semantics of Halide IR, leveraging the auto-generated AutoLLVM IR and back-ends for the three hardware architectures. Across kernels from deep learning and image processing, this compiler is able to perform just as well as the mature, production Halide compiler on Hexagon, and outperform on x86 by 8\% and ARM by 3\%. Hydride also outperforms the production Halide's LLVM back end by 12\% on x86, 100\% on HVX, and 26\% on ARM across the same kernels.},
  booktitle = {Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2},
  pages = {514--529},
  numpages = {16},
  location = {La Jolla, CA, USA},
  series = {ASPLOS '24}
  }


Designing Decoupled Compiler Transformation APIs
B.Sc. Thesis
National and Kapodistrian University of Athens, 2021

@bachelorsthesis{baziotis2021bsc,
  author = {Stefanos Baziotis},
  title = {{D}esigning {D}ecoupled {C}ompiler {T}ransformation {API}s},
  school = {National and Kapodistrian University of Athens, Department of Informatics},
  year = {2021},
  month = jul,
  webpdf = {https://pergamos.lib.uoa.gr/uoa/dl/frontend/file/lib/default/data/2958380/theFile},
}



Podcast

I'm the host of the In the Midst of Philosophy podcast (also on YouTube), whose goal is to bring important figures from the fields of Philosophy, Technology, and Science into the foreground of public awareness.

Episodes
- /* Apr 02, 2025 */ #1 - William J. Rapaport: Philosophy of Computer Science
- /* Jun 16, 2025 */ #2 - Federica Frabetti: The Deconstruction of Software Engineering
- /* Upcoming... */ #3 - George Contogeorgis

Blog

Compilers & Programming Languages
- /* Jul 19, 2025 */ metap: A Meta-Programming Layer for Python
- /* May 07, 2025 */ What Happens If We Inline Everything?
- /* Mar 13, 2025 */ Getting Started with Compilers
- /* Dec 09, 2024 */ Common Misconceptions about Compilers
- /* Dec 02, 2024 */ Defining All Undefined Behavior and Leveraging Compiler Transformation APIs
- /* Nov 16, 2024 */ Compiler Optimization in a Language you Can Understand
- /* Mar 28, 2023 */ Dias: Automatically Rewriting Pandas for 1000x Speedups
- /* Mar 09, 2023 */ How Target-Independent is Your IR?
- /* Apr 24, 2022 */ The Weird Type System of Golang
- /* Jan 30, 2022 */ The Cryptic Proof of Semidominators in the Lengauer-Tarjan Algorithm
- /* Nov 05, 2020 */ Juggling Knives or Adding Definedness to Combat Undefined Behavior
- /* Aug 13, 2020 */ Tell The Compiler What You Know
- /* May 21, 2020 */ Introduction to Scalar Evolution with Iteration-Based Intuition
- /* May 08, 2020 */ Visualizing Dominators
- /* May 03, 2020 */ Loop To Constant Computation

Performance
- /* Feb 15, 2025 */ A Beginner's Guide to Vectorization By Hand: Part 4 - Convolution
- /* Aug 28, 2021 */ A Beginner's Guide to Vectorization By Hand: Part 3
- /* Sep 13, 2021 */ A Beginner's Guide to Vectorization By Hand: Part 2
- /* Aug 31, 2020 */ A Beginner's Guide to Vectorization By Hand: Part 1

Uncategorized
- /* Nov 11, 2023 */ Inverting the Inverted: Revisiting Dismissed Ideas in Research
- /* Dec 01, 2020 */ The New Free Ride in Computing
- /* Aug 09, 2020 */ How 99% of C Tutorials Get it Wrong
- /* May 23, 2020 */ GJK Algorithm: A Visual Derivation

Non-Computer Science
- /* Aug 02, 2025 */ Les Mots Français en Grec – Partie 2
- /* Jul 16, 2025 */ Πῶς οἱ Χατζηφραγκέτα μᾶς Ἔκαναν Διανοουμένους
- /* Jul 08, 2025 */ Les Mots Français en Grec
- /* May 17, 2025 */ The Curse of Microlearning
- /* May 03, 2025 */ On Hypocritical Writing
- /* Apr 04, 2025 */ Listening to Your Own Music - Short Essay
- /* Mar 22, 2025 */ Σύντομος Ἔκθεσι: Ἡ παγκοσμίας κλάσης μετάφρασι τοῦ Στίγκλερ τῆς κ. Σινοπούλου
- /* Mar 18, 2025 */ Γιατὶ χρησιμοποιῶ τὸ πολυτονικό
- /* Mar 02, 2025 */ Tempi: Translating Greece's Indignation
- /* Feb 22, 2025 */ Γλωσσάρι Μ. Καραγάτση
- /* Feb 12, 2025 */ Short Essays - February 2025
- /* Jul 16, 2023 */ Remembering and the Impact of Books

Talks

Latest Advancements in Automatic Vectorization Research
Stefanos Baziotis
LLVM-CGO 2021
Slides

Introduction to (Unconventional) Vectorization
Stefanos Baziotis
LLVM Social Bangalore, December 2020
Slides

Εισαγωγή στο Google Summer of Code (Updated)
Stefanos Baziotis
University of Athens, Department of Informatics, Operating Systems Course 2020
Slides

The Present and Future of Interprocedural Optimization in LLVM
Luofan Chen, Kuter Dinel, Shinji Okumura, Hideto Ueno, Johannes Doerfert, Stefanos Baziotis
LLVM Developers' Meeting 2020
Slides

A Deep Dive into the Interprocedural Optimization Infrastructure
Luofan Chen, Kuter Dinel, Shinji Okumura, Hideto Ueno, Johannes Doerfert, Stefanos Baziotis
LLVM Developers' Meeting 2020
Slides

Finding Your Way Around the LLVM Dependence Analysis Zoo
Stefanos Baziotis, Simon Moll
LLVM Developers' Meeting 2020
Slides

Εισαγωγή στο Google Summer of Code
Stefanos Baziotis
ACM UoA Student Chapter, February 2020
Slides

Εισαγωγή στο Open Source Software
Stefanos Baziotis
ACM UoA Student Chapter, November 2019
Slides