metap: A Meta-Programming Layer for Python


Jul 19, 2025

Co-authored with Charith Mendis.




Summary


In short, we released a new package: metap, an easy-to-use, yet powerful meta-programming layer for Python. The release is long overdue, as it is by far the tool I use the most! We will start by presenting the main ideas behind metap, followed by a tutorial. Then, we will provide a delineated view of meta-programming so that we can clearly understand what it is, what it isn't, what it can achieve, and how it differs from other related concepts.


Why is Meta-Programming Useful?

A meta-programming layer (MPL) provides a meta-programming interface through which programs—called meta-programs—manipulate other programs—called object-programs (Check this for a nice overview). Meta-programming is useful because it can automate coding patterns, or transformations over coding patterns (many of which differ from programmer to programmer and from project to project). We highlight three particularly useful use cases of meta-programming, which we name: (a) program augmentation, (b) code generation, and (c) structural introspection. We describe each in turn.

Program Augmentation

The idea in program augmentation is to enhance the program automatically in a predictable manner. As a concrete example, let us try to dynamically check type annotations.

Python accepts type annotations in, e.g., variable declarations, function parameters, and return types. These annotations are not checked statically or dynamically by default. It would be useful if we could automatically augment the program with code that performs the checks, i.e., code which checks whether the dynamic values agree with the annotations. For example, an MPL could augment this program:

def foo(s: str):
  pass

into the following program:

def foo(s: str):
  if not isinstance(s, str):
    print(s)
    print(type(s))
    assert False
  pass

This is what we dub program augmentation. The original program is a valid Python program on its own; an MPL just augments it.

Code Generation

A common pattern is to have functions with many returns, particularly when a function checks multiple conditions (e.g., in compiler pattern matchers, like LLVM's InstCombine). In particular, these functions check whether the input has some characteristics X, Y, Z, etc. So, a lot of code ends up looking like:1

if not X:
  return None
if not Y:
  return None
if not Z:
  return None
...

There are two reasons to write such code: (1) it's readable, and (2) it's debuggable. Consider the two main alternatives. The first is nested ifs:

if X:
  if Y:
    if Z:
      ...
return None

However, this is very unreadable. The other alternative is to use Python's match statement. But with match it is not possible to know which part of the pattern failed, and so it's not very debuggable.
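To make the debuggability point concrete, here is a small, hypothetical pattern-matcher (illustrative only; the classes are made up and not taken from metap or LLVM). When the pattern fails, we fall straight to the wildcard case with no indication of which part of the pattern was the culprit:

from dataclasses import dataclass

@dataclass
class Const:
  val: int

@dataclass
class BinOp:
  op: str
  lhs: object
  rhs: object

def simplify(inst):
  match inst:
    case BinOp(op='add', lhs=Const(val=0), rhs=rhs):
      return rhs  # 0 + x ==> x
    case _:
      return None  # Which part of the pattern failed? We can't tell.

print(simplify(BinOp('add', Const(0), Const(5))))  # Const(val=5)
print(simplify(BinOp('mul', Const(0), Const(5))))  # None -- but why?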

It would be really ergonomic if we didn't have to write this all the time and could instead write something like: _retn_ifn(X). This would essentially be a code-generation capability (e.g., a macro) that generates the code above. In this case, the meta-program uses a superset of Python (i.e., it's not a valid Python program) defined by the MPL. Such features are generally more powerful than program augmentation. The latter may complement an existing program, but it doesn't make writing the program in the first place any easier. This is where an MPL's code generation shines, as it lets us automatically generate code that makes programs both easier to write and easier to read. It is even better if such code-generation capabilities are extensible; for example, _retn_ifn can be user-defined. This is what we call user-defined code generation.

Structural Introspection

Finally, meta-programming becomes more powerful with introspection, which occurs when a program inspects itself. In modern programming languages introspection is focused on types/semantics (e.g., if constexpr in C++);2 for example, when programs inspect generic types. While this is definitely useful, introspection is not limited to types and/or semantics. We argue that what we call structural introspection is both possible and important. This is when a program introspects structural elements. For example, it is sometimes useful to assert that a loop doesn't contain a continue. This matters when the logic of a loop (usually a while loop) would become incorrect if a continue were added, and we want to prevent that without relying on users manually inspecting the loop body. Such an assertion needs to introspect the structure of the program.


Unfortunately, there is no general-purpose meta-programming layer (MPL) or library for Python which provides these capabilities: program augmentation, user-defined code generation, and structural introspection. The built-in meta-programming capabilities of Python don't provide these features either. Structural introspection seems to be completely unthought of. Decorators provide some code-generation capabilities, but they don't allow us to, e.g., define custom statements like _retn_ifn(X). Finally, there is very limited support for program augmentation, and especially for the most important kind: logging.

In this article we present metap, an easy-to-use meta-programming layer for Python, which provides all the aforementioned features. metap provides user-defined code generation through a simple macro system. It also provides a rich program-augmentation API which allows users (among other things) to enable dynamic type-checking, expand asserts, and log all kinds of structures, such as ifs, returns, breaks, continues, function entries, and calls. Finally, metap provides structural-introspection directives which allow users to check properties of the structure of the code (e.g., that a loop does not contain a continue).

A Tutorial Introduction to metap

metap works with two Python scripts: (a) A client, and (b) a meta-program. The meta-program is just a Python program, except it may have metap-specific features. The client processes the meta-program to generate a valid Python program.


Program Augmentation

We start with a simple logging example, which falls under program augmentation. Let's say we have the following meta-program, in a file named test_mp.py:

# test_mp.py
def foo():
  return 2

def bar():
  a = 2
  if a == 2:
    return 4

foo()
bar()

In this example, the meta-program has nothing metap-specific. We can just run it with Python as it is. But, we can still tell a client to transform it in various useful ways. For example, let's say we want to log all the returns. We can write a simple client as follows:

# client.py
from metap import MetaP

mp = MetaP(filename="test_mp.py")
mp.log_returns()
mp.dump(filename="test.py")

This communicates to metap (only) what is necessary and sufficient for this task: which file to load (test_mp.py), what to do (log the returns), and where to output the resulting (object) program (test.py). Now, we can first run:

> python client.py

to produce test.py and then run it:

> python test.py

which outputs:

metap::Return(ln=3)
metap::Return(ln=9)

In general, metap allows the user to log all kinds of things, optionally supporting indentation and only logging within ranges. Indicatively, metap can log: returns, breaks, continues, call-sites, function entries, and conditionals (i.e., ifs).

In the introduction we mentioned dynamically checking type annotations. metap provides that through the simple client API call dyn_typecheck() (similar to log_returns()). We note that metap supports pretty complex annotations like:

Optional[Tuple[List[str], List[int]]]
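As a sketch (following the same client structure as the log_returns() example above; any options dyn_typecheck() may accept are not shown here), the client would look something like:

# client.py
from metap import MetaP

mp = MetaP(filename="test_mp.py")
mp.dyn_typecheck()
mp.dump(filename="test.py")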

Code Generation

The full potential of metap is reached when the meta-program starts using the metap-superset of Python.

This example is taken straight from real-world code I (Stefanos) have written for the Markdown-to-HTML compiler I use to generate articles like the one you're reading. The purpose of this snippet is to parse a line and check if it's a Markdown heading (i.e., if it starts with “#”). But, we also want to identify whether it's a level-1 heading (i.e., a single leading “#”) or a level-2 heading (i.e., two leading “#”) because the compiler generates different code for each case. metap allows us to write the following (meta-program):

# md_to_html_mp.py
line = "# test"
if (_cvar(line.startswith('# '), hlvl, 1) or
    _cvar(line.startswith('## '), hlvl, 2)):
  # ... Common code which applies to both
  #     level-1 and level-2
  body += self.parse_heading(line, hlvl)

_cvar() is a metap-specific feature whose name stands for “conditional variable”. It allows us to assign a value to a variable while testing a condition. The first argument is a condition, the second a variable name, and the third any value. If the condition is True, then the third argument is assigned to the second (otherwise, nothing happens). This is akin to C++'s assignment-inside-a-condition idiom (which, by the way, is also possible in pure Python through the walrus operator, but is much less common there):

if (a = foo()) {
  // ...
}

in which a gets the value of foo() (unconditionally), and if that value is non-zero, we enter the if-body. There is a two-argument version of _cvar() that does exactly this, but the three-argument version we showed above is more powerful because it lets us specify what value the variable (hlvl above) gets if the condition is satisfied. Furthermore, it's important to clarify that we don't want to do the following:

# md_to_html_mp.py
line = "# test"
if line.startswith('# '):
  # ... Common code which applies to both
  #     level-1 and level-2
  body += self.parse_heading(line, 1)
else:
  # ... The _same_ common code which applies to
  #     both level-1 and level-2
  body += self.parse_heading(line, 2)

because, as is evident from the code, this introduces huge code duplication. There is some common code for both cases, and it's better to write it once for maintainability, readability, and consistency. A hand-written pure-Python version of the _cvar snippet is sketched below (assuming _cvar behaves exactly as described above; metap's actual output may differ):
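# A sketch only: one possible pure-Python equivalent of the _cvar version.
line = "# test"
cond = line.startswith('# ')
if cond:
  hlvl = 1
else:
  cond = line.startswith('## ')
  if cond:
    hlvl = 2
if cond:
  # ... Common code which applies to both
  #     level-1 and level-2
  body += self.parse_heading(line, hlvl)

Note how the common code appears only once, but we now need an explicit cond flag and extra plumbing; with _cvar, metap generates that plumbing for us.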


Besides built-in code-generation capabilities, metap also allows the user to define their own macros. For example, we can define the _retn_ifn(X) macro we saw earlier as follows:

def _retn_ifn(x):
  stmt : NODE = {
_tmp = <x>
if _tmp is None:
  return None
}
  return stmt

A macro is basically a function which, instead of returning a “normal” value, returns a code entity. A NODE variable denotes that its contents are to be treated as code. Everything in it is treated as code verbatim, except for <x>, through which we can refer to other code variables. In that way, we can compose code entities.
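For instance, if a meta-program contained a (hypothetical) call like _retn_ifn(foo()) inside some function, we would expect metap, following the macro body above, to expand it into something like:

_tmp = foo()
if _tmp is None:
  return None

Crucially, the expansion is predictable, so program-augmentation features like log_returns() can still see (and log) the generated return.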

Structural Introspection

Structural introspection allows a program to inspect its own structure. In metap we provide structural introspection mainly to check properties that aid maintainability and correctness. A classic example is a loop that must not have a continue. This comes up often with while loops, because in them the update statement that gets us to the next element is written by the user (e.g., i += 1), whereas in for loops it happens automatically (e.g., the i += 1 happens behind the scenes when we do for i in range(...)). As an example, consider the following (simplified) loop used in Huffman compression to construct a histogram:

# s: string
hist_len = 0
for i in range(len(s)):
  hist_pos = char_map[s[i]]
  if hist_pos == -1:
    char_map[s[i]] = hist_pos = hist_len
    hist[hist_pos].sym = s[i]
    hist[hist_pos].freq = 0
    hist_len += 1
  curr_freq = hist[hist_pos].freq
  hist[hist_pos].freq = curr_freq + 1

Here, there is some code that we want to run in all cases (the last two lines), and there's some code that we want to run only when we find a new symbol (the code inside the if). To avoid the nesting introduced by the if—which can get deeper and deeper as the loop gets complicated—one may try to write the loop as follows (a pattern which is used amply inside the LLVM source code to avoid extensive nesting):

...
  if hist_pos != -1:
    continue
  char_map[s[i]] = hist_pos = hist_len
...

But of course this is wrong, because this won’t execute the last two lines if we get inside the if-body. To avoid such mistakes, the programmer can add a @no_continue annotation at the beginning of the loop:

...
for i in range(len):
  @no_continue
...

Now, if someone tries to add a continue—for example, by writing the second version—they will get the following error message from metap when they run the client (Attention: Not at runtime, but at compile-time):

metap: Error: @no_continue directive used at line 3,
but there's a `continue` at line: 7.

It is worth noting that what the original programmer wants is a defer statement (like the one provided by Go), but for loop bodies instead of the (usual) function bodies. We do not know of any language that supports such defer statements, and we plan to add them to metap as future work.

What is Meta-Programming Really?

In this section, we aim to provide a view of meta-programming that allows us to: (1) distinguish it from similar terms (e.g., compilation), and (2) specify what meta-programming can (and cannot) solve. This develops a rationale for why we implemented metap as a meta-programming layer and not, for example, as a compiler plugin or a DSL.

In general, a meta-program is a program that specifies an object-program (or target program), and a meta-programming layer (MPL) is a tool that takes in a meta-program and produces the object-program. A trivial meta-program is any program (as a program trivially specifies itself). A trivial MPL is a pass-through, i.e., it spits out whatever it takes in. However, this definition is too broad to let us pin down what sets meta-programming apart from other, similar technologies.

Compilers and Abstract Semantics

One may think that a conventional C compiler is an MPL, which takes as input a C program—which in this case we can think of as a meta-program that describes an assembly object-program—and produces an assembly program. But, we argue that this is an inaccurate view of compilers (or meta-programming; or both). This is because a C program is an inaccurate specification of an assembly program, which becomes even worse given how optimizing compilers treat C. More specifically, it's hard to predict what assembly the compiler will generate.

This, in turn, is because C, and basically every programming language, defines an abstract semantics for programs. In short, the C standard defines (abstractly) what a program does, but not how; the latter is left to the implementation. So, for example, if we assign 2 to a and then immediately read from a (assuming a is not volatile), then we should get 2. This abstract behavior can be implemented in many different concrete ways. For example, a's value could be stored in a register, on the stack, or on the heap. Leaving the “how” unspecified gives the compiler the freedom to generate whatever assembly it deems profitable, as long as that assembly exhibits the defined abstract behavior.

However, abstract behavior is not always useful. In general, if we want a new feature—e.g., return x if x is not None—we may do it abstractly, or concretely. In the former case, we define the abstract behavior, which is then implemented somehow by a compiler. This essentially creates a domain-specific language (DSL). In the latter case, we generate code predictably, which essentially constitutes an MPL. Next, we consider the benefits and drawbacks of DSLs and MPLs.

Meta-Programming Layers (MPLs) vs Domain-Specific Languages (DSLs)

One reason we may want a DSL is that we have different operators that interact with one another. A DSL compiler that understands the semantics of the operators can perform optimizations like, e.g., operator fusion. In general, as in C, abstract semantics allows the compiler to optimize the code aggressively. Furthermore, a DSL is useful when we want to introduce a new programming model. More specifically, a DSL's value increases drastically in proportion to the difficulty of implementing a new programming model over an existing language. When that difficulty becomes unmanageable, it is hard for users to reason about “how” something happens, and a DSL allows them to focus only on the “what”. Popular examples of DSLs that have all the features above are TensorFlow and Halide.

Furthermore, a DSL gives us some form of portability. In particular, the behavior of a DSL program is platform-independent; again, because it is abstract (and assuming it is not underspecified because of e.g., undefined behavior).

But, DSLs have some disadvantages too. Their main drawback is the same as their main advantage: abstract behavior. In particular, the user generally doesn't know how DSL features are implemented. Also, a user may not want to learn a whole new programming model and new semantics that a DSL introduces. This is not only time-consuming, but can also lead to bugs (until one really understands what the semantics is). That's where a meta-programming layer shines.

A meta-programming layer is a great solution when we don't want to: (1) have opaque operators that can interact (potentially through compiler optimizations), (2) learn a new programming model and semantics, or (3) translate the program into multiple languages. Basically, it fits when we want to program in a language, model, and semantics we already know. But what do we gain?

The main benefit is that we get predictable generated code, which in turn implies many other benefits. First, the meta-language is as well-defined as the object-language because a meta-program is just syntactic sugar over object-programs. The MPL generates e.g., Python code predictably, so there's no undefinedness originating from the meta-program or the MPL. This is not the case for languages (including DSLs) with abstract behaviors. For example, C has undefined behavior while x86 assembly doesn't.

Second, we don't have to teach the tools we use about our special features. The goal of an MPL is just to help us either analyze the program code or generate new code. In the former case, we don't need any external tool because the analysis is done by the MPL. In the latter case, we can use whatever tools we already use with the object language.

Third, we don't need to re-implement features over a new language. For example, with metap we just generate a Python program predictably. Suppose now we execute this Python program and we get an exception. Because we generated the code predictably, we can easily figure out where this exception came from in the meta-program. In other words, we don't need to re-implement exceptions for the meta-language.

Fourth, an MPL can easily interact with program-augmentation features. For example, suppose _retn_ifn(x) became a new Python statement that somehow returns None when x is None. If we don't know that this will generate a return statement, we can't compose it, e.g., with metap's log_returns(). In other words, we sacrifice transparency.

Finally, since an MPL just analyzes or generates code, it is extremely lightweight. An MPL is not a runtime, it's not a compiler, and it's not a standard library. Consequently, to run a meta-program, we don't need to link it against a heavy library or runtime, or use any special tooling.

In short, generating code instead of implementing abstract behaviors results in predictable, transparent, and debuggable code. In fact, exactly because of that, it is easy to create “towers of meta-programming layers” (to paraphrase an inspiring paper). These were top priorities for metap's use cases, while none of the DSL benefits were part of our desiderata. This is why metap is an MPL and not a DSL.


Meta-Programming in Pop (Programming) Culture

Over the last couple of years, more and more people seem to appreciate the benefits of meta-programming. For example, C++ has had templates for a long time, and Go recently added generics. The problem is that most of these implementations are, at least for me, not very useful.

C++ for some reason decided to introduce templates—a new language—on top of C++, which—to everyone's surprise (or not, if you knew anything about programming)—turned out to produce abhorrent error messages and, when coupled with C++'s compilation model, to have unimaginable compilation times.3 The cherry on top is that you get all these drawbacks without many of the basic features one would want from templates (e.g., C++ only recently got some introspection capabilities).

This has led many people to believe that meta-programming, and even template meta-programming specifically, inherently has all these drawbacks. No. If you don't believe me, try compiling D's standard library phobos. It contains hundreds of thousands of lines of code and is full of templates, yet compiles in 20 seconds or so. And generally, D actually got templates right, as it essentially doesn't have any of the drawbacks of the C++ implementation.

As for Go, I've written a whole article about its weird generics.4

In short, even though modern languages have some meta-programming features, these features are usually weak, or poorly implemented (or both). For example, they don't provide out-of-the-box utilities that allow you to e.g., log all returns. Furthermore, because modern languages tend to be massive and complicated, it's very hard to implement a meta-programming layer yourself. Imagine writing an AST transformer for C++. Even using Clang's utilities, we're talking about a massive undertaking considering all the craziness you have to deal with. And good luck hooking that up with Clang's pipeline.


Montana: Those who got it (into a terrible product)

Montana was a compiler infrastructure that allowed users to easily write their own plugins, which could meta-program the source program, and hook them into Montana's pipeline. Montana was clearly our biggest inspiration for building metap (I learned about it while binge-watching Handmade Hero on YouTube). The Montana folks really understood, and partly realized, the potential of meta-programming and the role a compiler's assistance plays. Sadly, it is much less known than it deserves to be. As Casey Muratori said, Montana was part of the IBM VisualAge toolchain, which was absolutely terrible, and so the whole thing failed. Over the years I've heard people talking about VisualAge and most seem to agree. In the years following Montana there was some academic work on similar technologies, like Xoc by Russ Cox,5 but they rarely cited Montana.


Epilogue: File Watcher

One problematic aspect of a meta-programming layer is that it introduces another thing that has to be executed before you can run your program. For C-like pipelines that's not a problem because you can usually just add it as part of your build script. But in Python I've found it a bit annoying. Conveniently, I discovered File Watcher, which I've set up to run metap any time I save a meta-program.




Want to join the discussion? Head over to Reddit.


Footnotes

  1. Here's a real excerpt from Dias.
  2. Which C++ took straight from Dlang. Walter Bright and Andrei Alexandrescu introduced static if in Dlang early on, and around 2012 they co-authored a static if proposal for C++. Bjarne Stroustrup, in his infinite wisdom, co-authored a rebuttal that rejected it, only to realize, long overdue, that it's an incredibly useful feature and introduce it in C++17.
  3. Wait, it doesn't end there: to be Turing complete—which conveniently led to non-terminating compilations—to be inconsistent with any formalization of polymorphism, to be incredibly ugly, to produce flat-out wrong code during the first couple of years, and to increase C++'s complexity exponentially.
  4. That said, at least Go provides an infrastructure that allows you to create code-transformation tools easily.
  5. Who is (used to be?) one of the leading figures of Golang.