Inverting the Inverted: Revisiting Dismissed Ideas in Research


Nov 11, 2023


Want to join the discussion? Go to the HackerNews post.

Revisiting past ideas in research can create opportunities for progress, but it can also lead to controversies. In this article, I will use the Bit-Flip method, a pretty simple yet effective method for approaching research projects, as a framework for analyzing some interesting shifts in assumptions in research works. In particular, we will focus on shifts that brought back dismissed ideas.

What is the Bit-Flip Method?

As far as I know, the method originates from the CS197 course at Stanford (although I learned it from Daniel Kang). In the Bit-Flip method, we take an assumption of prior work (the "bit"), and we invert/challenge it, hopefully reaping some benefits along the way. Here is an example:

Bit: We need complicated instruction sets to accommodate powerful computer processors.
Flip: Simple instruction sets are better since they let you compare performance, optimize, and prevent errors.
Project: RISC architecture

The main goal of the Bit-Flip method is to create clearly novel projects. If the bit has not been inverted before, then this is a clear intellectual shift and a new way of looking at things.

But the Bit-Flip method does not only help in creating (and presenting) a project. We can also use it for a post-hoc analysis of a project to articulate what its novelty and main contributions are exactly. In other words, if you can fit a project into the Bit-Flip framework, then the bits kind of tell you the most important aspects of the project.

Fitting projects into the Bit-Flip framework turns out to uncover interesting patterns. In particular, when I started doing it, I found that there have been projects that flipped a bit that had already been flipped before: what I call a "double flip".

Double Flips: Are we going backwards or forwards?

On the face of it, a double flip seems pointless because the original assumption has been tried, and then we supposedly improved upon it by flipping it. So, why is there any reason to flip it back?

One reason is that the context changes. For example, compilers originally read the whole source code into memory. But then source code grew too big to fit into the memory available at the time. So folks had to resort to all sorts of tricks: they loaded the source code in chunks and did as much work as possible in each pass so that the compiler's peak memory usage stayed low. This is a big reason why we ended up with single-pass compilers that were impossible to understand, because lexing, parsing, and code generation were all tangled up into an interdependent mess.

But things changed. Memory is not that much of an issue anymore and so compilers not only read the whole source code at once, but they also separate the different steps into separate passes that can even be pipelined. This type of double flip signifies, I think, progress. Things change and we roll with them instead of being fixated on past assumptions.
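To make the contrast concrete, here is a minimal sketch of the multi-pass structure (my own toy example in Python, not taken from any real compiler): lexing, parsing, and code generation are separate functions that each consume the full output of the previous one, instead of being interleaved in a single memory-frugal pass.

```python
# A toy multi-pass structure: each stage consumes the full output of the
# previous one, which is only practical when the whole program fits in memory.

def lex(source: str) -> list[str]:
    """Turn the whole source text into a list of tokens."""
    return source.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens: list[str]) -> list:
    """Build a (nested-list) syntax tree from the full token stream."""
    def parse_expr(pos):
        if tokens[pos] == "(":
            node, pos = [], pos + 1
            while tokens[pos] != ")":
                child, pos = parse_expr(pos)
                node.append(child)
            return node, pos + 1
        return tokens[pos], pos + 1
    tree, _ = parse_expr(0)
    return tree

def codegen(tree) -> list[str]:
    """Emit a flat list of pseudo-instructions from the tree."""
    if isinstance(tree, str):
        return [f"PUSH {tree}"]
    op, *args = tree
    code = [instr for arg in args for instr in codegen(arg)]
    return code + [f"CALL {op}"]

# Each pass is independent and testable, and the passes could even be
# pipelined; a single-pass compiler would interleave all three in one loop.
program = "(add 1 (mul 2 3))"
print(codegen(parse(lex(program))))
```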

However, not every double flip is so rosy. An example is MapReduce. Let me start with a disclaimer: This is a controversial topic and I'm not here to pick a side. In fact, I chose MapReduce exactly because it is so controversial and it's unclear whether there is a "right" answer, which forces us to think instead of accepting things at face value. In any case, I think it is interesting to observe how a double flip simultaneously created an insane amount of success (for whatever metric of success you want: profit, tenures, future work, paparazzi following you – ok, maybe not that) and at the same time became disliked by significant figures in the community.

The story starts with MapReduce, which was published in 2004 by the now famous duo of Jeffrey Dean and Sanjay Ghemawat, after having been used extensively inside Google. The system created quite a bit of hype, which eventually inspired the Hadoop open-source project. But it didn't stop there. MapReduce was the main inspiration behind Spark, which was published in 2010. Spark became so successful that it pushed MapReduce and Hadoop out of business, and pushed Matei Zaharia and Ion Stoica into business: in big part because of Spark, they created the now quite successful Databricks startup.

However, while all this was going on, a couple of major figures in databases were not all that excited about MapReduce (or Spark). One of them was Michael Stonebraker – you know, the Turing award winner – who was one of the authors of the popular and scathing article: MapReduce: A major step backwards.

The main reason the authors were unhappy is exactly that they thought MapReduce double-flipped a bit (or actually three, as we'll see). Take a look at this quote from the article:

The database community has learned the following three lessons from the 40 years that have unfolded since IBM first released IMS in 1968.
- Schemas are good.
- Separation of the schema from the application is good.
- High-level access languages are good.

And then the authors proceed to calmly (?) explain that MapReduce does not respect any of the three.

So, MapReduce double-flipped not one, nor two, but three bits. Bold. And it's not just that. I would actually argue that all three of these flips are part of the benefits of MapReduce. So, the flips were not some accidental, possibly unintentional side-effects, but rather the very focus of the contribution.

For example, MapReduce was created to analyze insane amounts of unstructured data (e.g., scraped from the web), and a big benefit of MapReduce is exactly that you don't have to specify a schema. Similarly, a MapReduce job can be a quick-and-dirty, one-off analysis that you want to spin up quickly, so you just write a few lines of code to analyze the data. In this case, the "schema" is part of the code because you don't even care to declare a schema that is visible to others, or one that you will be able to revisit later. And finally, SQL is good, but the problem is: (a) MapReduce is not just about accessing data but also about processing it, and (b) the type of processing, while constrained to the MapReduce conceptual framework, is otherwise too unconstrained to be expressed in a high-level language.
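To make the "schema lives in the code" point concrete, here is a hypothetical, framework-free sketch of a MapReduce-style word count in plain Python (names like map_fn and reduce_fn are mine, not Hadoop's or Spark's API): the map and reduce functions embed every assumption about the data, and nothing is declared anywhere else.

```python
# A hypothetical, framework-free sketch of a MapReduce-style word count.
# The "schema" (lines of text, words as keys, counts as values) lives
# entirely in these two functions; nothing is declared externally.
from collections import defaultdict

def map_fn(line: str):
    """Emit (word, 1) pairs for every word in a line of raw text."""
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word: str, counts: list[int]) -> tuple[str, int]:
    """Sum the partial counts for one word."""
    return word, sum(counts)

def run_job(lines):
    """Toy single-machine driver: group map output by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return [reduce_fn(key, values) for key, values in groups.items()]

print(run_job(["the quick brown fox", "the lazy dog"]))
# [('the', 2), ('quick', 1), ('brown', 1), ('fox', 1), ('lazy', 1), ('dog', 1)]
```

A real MapReduce or Spark deployment distributes the grouping and the reduce step across machines, but the point stands: everything you need to know about the data lives in those few lines of code.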

"What do I make out of this?"

For starters, I would say that this may be yet another indication that we, as a research community, do not really know what is going on (and that Schumpeter's "creative destruction" may be less classy than we may want to think). Second, double flips are not necessarily bad (no matter whether you agree with MapReduce or the anti-MapReduce manifesto). So, maybe the argument "we tried that back in the 70s and it sucked" is not enough evidence to abandon your project.

Finally, we may not be able to escape philosophy, because these disagreements probably come down to values. I know this sounds like post-modern crap, but this is what is going on when Michael Stonebraker says that, e.g., MapReduce does not have taste. In my opinion, taste is important even if you cannot put a number on it.



Want to join the discussion? Go to the HackerNews post.