I am a third-year PhD candidate in the Department of Computer Science at the University of Illinois Urbana-Champaign (UIUC). My advisor is Prof. Charith Mendis, and I also work closely with Daniel Kang.
I'm doing research at the intersection of compilers and data management systems. More broadly, my interests extend to performance engineering, programming languages, and distributed systems.
Before coming to UIUC, I worked as a compiler researcher at NEC and was also involved with the Liberty Research Group at Princeton. I obtained my B.Sc. from the Department of Informatics, University of Athens, where I did my thesis with Prof. Yannis Smaragdakis.
May 20, 2024 | I started my internship at Microsoft Research, working with Tarique Siddiqui and Christian König. |
Nov 20, 2023 | Dias got accepted at SIGMOD 2024. The code is public and you can also try Dias using our Colab notebook. |
Nov 11, 2023 | New article: Inverting the Inverted: Revisiting Dismissed Ideas in Research. |
@article{dias,
  author = {Baziotis, Stefanos and Kang, Daniel and Mendis, Charith},
  title = {Dias: Dynamic Rewriting of Pandas Code},
  year = {2024},
  issue_date = {February 2024},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {2},
  number = {1},
  url = {https://doi.org/10.1145/3639313},
  doi = {10.1145/3639313},
  abstract = {In recent years, dataframe libraries, such as pandas have exploded in popularity. Due to their flexibility, they are increasingly used in ad-hoc exploratory data analysis (EDA) workloads. These workloads are diverse, including custom functions which can span libraries or be written in pure Python. The majority of systems available to accelerate EDA workloads focus on bulk-parallel workloads, which contain vastly different computational patterns, typically within a single library. As a result, they can introduce excessive overheads for ad-hoc EDA workloads due to their expensive optimization techniques. Instead, we identify source-to-source, external program rewriting as a lightweight technique which can optimize across representations, and offer substantial speedups while also avoiding slowdowns. We implemented Dias, which rewrites notebook cells to be more efficient for ad-hoc EDA workloads. We develop techniques for efficient rewrites in Dias, including checking the preconditions under which rewrites are correct, dynamically, at fine-grained program points. We show that Dias can rewrite individual cells to be 57\texttimes{} faster compared to pandas and 1909\texttimes{} faster compared to optimized systems such as modin. Furthermore, Dias can accelerate whole notebooks by up to 3.6\texttimes{} compared to pandas and 27.1\texttimes{} compared to modin.},
  journal = {Proc. ACM Manag. Data},
  month = {mar},
  articleno = {58},
  numpages = {27},
  keywords = {cross-representation, dynamic, pandas, rewriting}
}