Simulating Unix v7 on PDP-11 with In-Depth Explanations


Nov 24, 2025






A big thanks to Diomidis Spinellis who read both parts in this series, caught *many* mistakes, and provided valuable feedback!


Introduction

The goal for this article is simple: Simulate Unix v7 on PDP-11 with OpenSIMH. The real contribution is that on the way I will explain in depth many things that are probably puzzling for many people (and they were for me too). There is already a tutorial by OpenSIMH on how to do the task, which is largely based on a tutorial by Haley and Ritchie (because really the OpenSIMH components are mostly irrelevant, which is how you know you have a good simulator). The problem is that neither of these tutorials explains e.g., why certain partitions are used, why we use both a tape and a disk, etc. The goal of this article is to demystify all that.

This is the first part of a 2-part series. In the second part we will use our Unix v7 on PDP-11 to explore how C was back in the day. On that note, this tutorial will not cover everything the OpenSIMH tutorial covers. The goal here is to cover enough to have a machine on which we can compile stuff (more specifically, I will cover only Chapter 1 of the tutorial). Anyway this part is the most esoteric, and already there will be a lot of ground to cover. If you understand all that, I think it will be easy to move to the rest. Finally, note that there are easier ways to simulate a Unix on PDP-11, like these systems here. But I think it is cool to try recreating the experience back then.


On PDP-11

Before going into the simulation, I think it is prudent to learn a bit about what we are dealing with—PDP-11 and Unix v7—starting with PDP-11.

The area is huge with many manuals, most of them written by the manufacturer, DEC. The goal is to get a rough idea of what we are dealing with, especially because today you will hear (phrased in one way or another) that “we pretend we still run on a PDP-11” (e.g., 11:26 here and this).

First, the acronym “PDP” is a bit of a puzzle. For DEC (the manufacturer), it stands for “Programmed Data Processor” (see p.2 here). Other sources, though, also expanded it to “Programmable Data Processor” (example).

Moving to PDP-11 specifically, the first thing to learn is that there was not only one PDP-11. PDP-11 is a whole family of computers. Because of that, probably the most interesting manuals (and two of the longest) are the PDP-11 Peripherals Handbook and the PDP-11 Architecture Handbook. This is partly because these apply to every member of the family, and also because these are the most informative regarding how you interacted with the machine. That’s because when it comes to executing programs, you’d program in C which abstracted many details of the processor (especially a particular processor). The following quote (Chapter 1, page 3) from the Architecture Handbook summarizes the PDP-11 family well:

The PDP-11 family of computers shares a common architecture. They are all based on a 16-bit word length, a common instruction set, and the same addressing techniques. They also share the same data management utilities, the same input/output (I/O) systems, and the same programming languages. If you have learned to program one computer in the PDP-11 family, you can easily program another member of the PDP-11 family.

I would like to highlight two things. The first is that PDP-11 supported a whole bunch of peripherals. You could plug all sorts of devices into it, including terminals (the most important), disk drives, tape drives, displays, etc. The other important thing is the UNIBUS, and the PDP-11 family was the first in the general PDP family to have it. A conventional memory bus mediates the communication between the CPU and main memory. In PDP-11, though, the main bus, called UNIBUS (or Unibus), was a uniform bus on which the CPU, main memory, and peripherals could communicate. Originally I thought that the fact that Unix treated devices as files (one of its main novelties) owed a lot to the UNIBUS because it allowed a uniform communication channel. However, that Unix feature predates the UNIBUS, and has been there since the PDP-7 Unix (and PDP-7 did not have a UNIBUS). So, if anything, Unix influenced UNIBUS (although I have no evidence for that).

In this tutorial we will focus on PDP-11/45. As far as I can tell, the only thing that is particularly special about the /45 is that it introduced hardware support for floating-point operations. This is irrelevant to us. The real reason we are using it is because the original Unix v7 setup guide required either PDP-11/45 or PDP-11/70. So, we just pick one. For the purposes of our goals (i.e., boot up Unix v7 and experiment with K&R C), the differences are not important.


On Unix v7

Again, there are thousands of pages and hours of talks devoted to Unix v7, released in 1979. My favorite source is the two-volume collection of the UNIX Programmer’s Manual (Vol. 1 and Vol. 2). Here I will simply list some of the features that v7 introduced which you probably use every day, and the programs you run definitely do:


Bootstrapping

Before we get to business, you should download the files in 1.1 of the tutorial.1 Just keep in mind that my SHA-1 sum was different than the tutorial’s but still everything worked fine. One thing that the tutorial does not mention is how to download OpenSIMH and compile the PDP-11 simulator. You can download the code from the website (e.g., click “Download zip”) and then issue:

make pdp11

This should create a binary in ./BIN/pdp11.

Now we get to the tape.ini file that the tutorial tells us to create. This is a configuration for OpenSIMH which tells it what to simulate.

set cpu 11/45
set cpu idle
set rp0 rp06
att rp0 rp06-0.disk
set rp1 rp06
att rp1 rp06-1.disk
att tm0 v7.tap
boot tm0

Unlike the tutorial, we will spend quite a bit of time explaining every line here. The first line tells OpenSIMH which PDP-11 in the family to simulate. The second line tells the CPU to remain idle instead of having a busy loop. The third line is where things get interesting. We have to take a step back.


Disks

As we said earlier, PDP-11/45 had a main bus called UNIBUS which allowed communication with peripheral devices. But you couldn’t plug just any sort of device into the UNIBUS. You could plug in e.g., a terminal, but you could not plug in a high-speed disk drive such as RP06 (we will talk more about what that is in a bit). One problem was that such a drive was too fast for UNIBUS to handle, and it required the CPU to control many things that it does not know how to control, such as the rotation speed, timing, etc. Furthermore, these drives operated with a different bus technology, MASSBUS, which was specialized for high-speed disk drives and tape drives. Now some devices have their own device controller, which communicates with the OS and the CPU. But RP0* drives (again, we’ll talk about them later) did not.

For this reason, PDP-11/45 needed a disk controller. This acts as a middleman between the CPU and the disk drives. It translated CPU instructions to low-level disk operations (e.g., rotate head 4 by X amount, etc). It was also asynchronous and so you could have Direct Memory Access (DMA). So, the disk controller started copying stuff while the CPU was doing other work, and when it was done it sent an interrupt.

It is fair to assume the disk controller in PDP-11/45 was RH11 (or RH70, but for simplicity we’ll talk about one) because it could handle the RP0* drives, like RP06, and it could handle many disks simultaneously, up to 8 (source). In the OpenSIMH configuration we never mention RH11 because it seems the simulator doesn’t specify which controller is used exactly (see this). But as far as I know, it is assumed that we have some controller. In any case, in OpenSIMH we refer to disks by numbers. This brings us to the next directive in the initialization: set rp0 rp06. This says “set disk number 0 to be an RP06”.

But what is this RP06 disk and why do we simulate one? The latter is easy to answer: because it was one of the options required to set up Unix v7 (source). The tutorial uses RP06 probably because it was the largest of the options. Let’s now tackle the first half of the question.

A good place to start is its technology. Drives back then had similar technology to HDDs today: a magnetic platter with a movable head. However, the drives were massive. Consider a small drive named RL02, which had only a single platter. It had the size of a small filing cabinet. See around 2:06 here. A multi-platter disk, like RP06, was larger than even today’s washing machines. Let’s take a concrete example. I looked up “Large” washing machines in Home Depot and I randomly clicked on this. Its dimensions are (rounded): 42 in. H, 27 in. W, and 26 in. D. It weighs 117 pounds. RP06 has these dimensions (source): 47 in. H, 33 in. W, and 32 in. D. So, RP06 was larger in each of the three dimensions. More surprisingly, its weight was 600 pounds. All this is motivation for the name: “RP”.

RP stands for “removable pack.” This is because the pack of platters was removable. See this video, for example, which loads and unloads a pack on a different (and smaller) multi-platter, removable-pack drive. RP06 used twelve-platter packs (according to this source). A removable pack drive was useful because if e.g., you wanted to physically take the data with you (like you’d do with a USB today), you only had to carry the platter pack, not the whole drive. Similarly, if you wanted more storage, you didn’t have to buy a whole new drive, you only had to buy a new pack.

You may be wondering “why was the drive so huge?”, especially considering that today like 90% of an HDD’s size is taken up by the platters. As far as I can tell, the reason is that the platters were huge, and many of them, so it needed a large and sturdy build to rotate them at 3600rpm (see p.3-4 here), and dissipate heat appropriately.

Ok, we know why we’re using an RP06 and what it is, but you may wonder: “why set up two of them”? This is confusing because in the tutorial by Haley-Ritchie, they seem to be setting up only one disk. That is true, but they do it for simplicity; they do not recommend it (e.g., in the “Disk Layout” section later, they tell us pretty clearly “GET A SECOND PACK!!!!”).

More generally, back then it was a convention to set up Unix with at least two disks. One disk had the root filesystem which included:

  • The kernel
  • /etc, /bin
  • Basic Unix utilities (e.g., make)

The second disk had the (or better, “a”) user filesystem, meaning:

  • /usr (which is where directories for individual users lived)
  • Extra experiments, programs, or storage

That is the setup the OpenSIMH tutorial uses and that is what we will use too.

Before we move to the next section, let’s be sure about where we are. We have introduced 2 RP06 disks, in rp0 and rp1, i.e., in the first and second slot of the disk controller. The .disk files will be created by the simulator and will be initially empty.


The Tape

The next directive is att tm0 v7.tap.

If we look at the PDP-11 features in OpenSIMH, we see that TM maps to TM11/TU10. But what is that? TM11 is a tape drive controller. Similar to disk controllers, we need a tape drive controller to communicate with tape drives. TU10 is the tape drive that is simulated (to be plugged into TM11). As you can see in the picture, this is also quite big (about half a meter high). Why use tapes then?

Tapes were used to distribute software back then, and the reason is that they were small, unlike even single-platter drives. See 2:10 here. They were also much cheaper!

The directive instructs OpenSIMH to load v7.tap (the file we created in 1.1 of the tutorial) to the first device (0-indexed) in tm which is assumed to be TM11 and to have only one device plugged in, TU10.

It is interesting to learn a bit about the technology of tapes. The principle is the same as in magnetic platters, but they have parallel tracks. So, for example, a character in a 9-track tape would use the 8 tracks for the 8 bits (which can be read in parallel) and the 9th track for parity. Tapes are much slower to read from than a hard drive because they are not random access. You basically have to read linearly because rewinding takes a lot of time. But, they have so much more surface area. So, in a tape that is about the size of an HDD, we can store today like 185 TB. See Why Tape Storage is Making a Sneaky Comeback.
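To make the 8-data-bits-plus-parity idea concrete, here is a minimal sketch of how one byte could be laid out across the 9 tracks. Two things here are my assumptions and not something the text above establishes: that the parity is odd (configurable through the `odd_parity` flag), and that the bits map to tracks in simple most-significant-first order. The function names are made up for illustration.

```python
def tape_frame(byte, odd_parity=True):
    """Return the 9 bits written side by side for one byte: 8 data bits + 1 parity bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value that fits in 8 bits")
    # Simplification: most significant bit first; real track assignment may differ.
    data_bits = [(byte >> i) & 1 for i in range(7, -1, -1)]
    ones = sum(data_bits)
    # Odd parity (assumed): the parity bit makes the total number of 1s odd.
    parity = (ones + 1) % 2 if odd_parity else ones % 2
    return data_bits + [parity]


def frame_is_valid(frame, odd_parity=True):
    """Check that the total number of 1s matches the chosen parity."""
    expected = 1 if odd_parity else 0
    return sum(frame) % 2 == expected


frame = tape_frame(ord("A"))        # 'A' is 0x41, which has two 1 bits
print(frame, frame_is_valid(frame))

frame[3] ^= 1                       # simulate a single-bit read error
print(frame, frame_is_valid(frame))
```

A single flipped bit changes the count of 1s from odd to even, so the check fails. Parity like this can detect such an error but cannot tell which track was wrong.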


The Bad-Blocks Table

This is all we need to know about the OpenSIMH configuration, so we are ready to boot. The tutorial says that we see this:

RP0: creating new file
Overwrite last track? [N]y
RP1: creating new file
Overwrite last track? [N]y

I did not see that, but it is interesting to explain what it means. Old disks kept a bad-blocks table, usually on the last track. It is a small section of the disk that keeps track of disk sectors or blocks that are physically damaged or unreliable, so the operating system won’t try to use them. The simulator is asking to overwrite the last track and we say “yes”. That adds an empty bad-blocks table, since everything is simulated and there are obviously no physical defects on the disks. That may seem pointless, but it is necessary because Unix expects to find such a table there!


Creating a File System

After we boot, “no operating system is present”, as the tutorial says. “However a tape is loaded and a standalone program called tm is also loaded and available. If we run tm, it will run a program directly from a tape, indexed by the tape controller (i.e., which of the tapes, if we had multiple) and file on the tape. It is a zero-based index. We want to run the 4th file on the tape, which is a standalone version of mkfs in order to create a filesystem on the RP06 disk pack.” So we run tm(0,3). Note here that since there’s only one tape controller, the first index must be 0.

Then we need to enter the filesystem size. There’s no detailed explanation for this, but the Haley-Ritchie tutorial says that “the filesystem size required is about 5000 blocks”. In the next sentence we discover that a block takes up 512 bytes, which means the whole Unix v7 root filesystem required only about 2.5 MB!

The next one is tricky. It uses hp(0,0) to refer to the first RP06 disk. Where did hp come from? This is a convention of Unix; it has nothing to do with the simulator. It’s there even in the original tutorial. For some reason if your disk was an RP03, you had to use rp(), but for RP04/05/06 you needed to use hp(). I think the name comes from Hewlett-Packard, because they were known for developing such disks. The indexes refer to “disk” and “partition”. We want to put our root filesystem on disk 0 (our first RP06), in the first partition. The concept of a disk is obvious, but the concept of a partition is slippery. We will postpone that discussion for later when we make device files. One final thing is that we can also do hp(1,0) if we want to put it on the second disk (but it is not recommended if you want to follow along).

The isize refers to the number of inodes that get allocated. Each inode in Unix v7 is 64 bytes, which means we’re using 1600×64 bytes = ~100 KB, or about 4% of the 5000-block filesystem, for inodes. The rest of the space is used for the actual data. The number of inodes is decided by mkfs.
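The arithmetic for the filesystem size and the inode area can be checked with a few lines. This is just a sketch that uses the numbers mentioned above (5000 blocks, 512-byte blocks, 1600 inodes, 64-byte inodes); the variable names are mine.

```python
BLOCK_SIZE = 512     # bytes per block, per the Haley-Ritchie tutorial
FS_BLOCKS = 5000     # filesystem size typed into mkfs
INODES = 1600        # the isize value
INODE_SIZE = 64      # bytes per on-disk inode in Unix v7

fs_bytes = FS_BLOCKS * BLOCK_SIZE            # total size of the root filesystem
inode_bytes = INODES * INODE_SIZE            # space taken by the inode area
inode_blocks = inode_bytes // BLOCK_SIZE     # the same, expressed in blocks
inode_share = inode_bytes / fs_bytes         # fraction of the filesystem

print(f"filesystem: {fs_bytes} bytes ({fs_bytes / 1_000_000:.2f} MB)")
print(f"inode area: {inode_bytes} bytes = {inode_blocks} blocks")
print(f"inode share of the filesystem: {inode_share:.1%}")
```

Note that the share is relative to the 5000-block filesystem we asked for, not to the whole RP06 pack, which is far larger.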

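m/n does not show progress (blocks copied vs blocks that need to be copied), as one might guess. As far as I can tell from the v7 mkfs source, these are the two interleave parameters mkfs uses when it builds the free list, and 3 and 500 are its defaults. That is why I only ever saw 3 500 and the values never updated.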


Copying the filesystem into the disk

Now we need to copy the root filesystem, which is a dump of Keith Bostic’s rp0, into our rp0. We do:

: tm(0,4)
Tape? tm(0,5)
Disk? hp(0,0)

This follows the same conventions as before. We execute the 5th file on tape 0, which is restor, and it basically just copies data to a disk. It asks us which file to copy, and we say the 6th file from tape 0 into hp(0,0) (i.e., rp0 for us). Why the 6th file? Because that’s the dump of the root filesystem, which contains the Unix v7 kernel!


Booting!

To boot, we need to execute the kernel (which is a file). We do that with hp(0,0)hptmunix.

This is a concatenated string composed of: (a) the disk and partition (we will talk about partitions when we talk about making device files) that contains the file, and (b) the file name of the kernel file, which for our setup is hptmunix. Unix v7 provided a bunch of kernels depending on your setup. Because we are using an RP06 (hp) and TU10 (tm), we use hptmunix. As you’ll see later in the tutorial, we delete kernels for other configurations.
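To see how such a string breaks apart, here is a small sketch that splits specifiers like tm(0,3) and hp(0,0)hptmunix into their pieces. This is not how the real boot program parses its input; the regular expression and the field names (`device`, `unit`, `index`, `file`) are mine, chosen only for illustration.

```python
import re

# device name, "(", first number, ",", second number, ")", optional file name
SPEC = re.compile(r"^([a-z]+)\((\d+),(\d+)\)(\S*)$")


def parse_spec(spec):
    """Split a standalone-boot style specifier into its components."""
    match = SPEC.match(spec)
    if match is None:
        raise ValueError(f"not a device specifier: {spec!r}")
    device, first, second, filename = match.groups()
    return {
        "device": device,          # "tm" for the tape, "hp" for RP04/05/06 disks
        "unit": int(first),        # which tape or disk
        "index": int(second),      # file on the tape, or partition on the disk
        "file": filename or None,  # only present when naming a file on a disk
    }


print(parse_spec("tm(0,3)"))
print(parse_spec("hp(0,0)hptmunix"))
```

The same shape covers both uses we have seen so far: for tm the second number picks a file on the tape, while for hp it picks a partition and the trailing name picks a file inside it.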

The mem message gives the memory available to user programs in bytes. For SIMH it says 177344, i.e., about 177kb.

As the tutorial says, this boots in single-user mode, which is why there is no login. The # prompts us to issue commands. Everything is currently uppercase, and I did not know why. Diomidis Spinellis was kind enough to explain it in personal communication: “Because many teletypewriters used as terminals only supported uppercase letters. These were mapped to lowercase by the terminal driver. You entered uppercase by preceding a letter with \. So A→a and \A→A.” As uppercase can be annoying, we would like to allow lowercase. The rest of the paragraph in the tutorial has a good explanation on how to do that.
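The input half of that mapping is easy to model. The following is a toy sketch of the rule from the quote (A→a and \A→A), not the actual terminal driver code; the function name is hypothetical, and I am assuming that a backslash before a non-letter is passed through unchanged.

```python
def map_uppercase_input(typed):
    """Toy model: uppercase letters become lowercase, backslash + letter stays uppercase."""
    out = []
    i = 0
    while i < len(typed):
        ch = typed[i]
        if ch == "\\" and i + 1 < len(typed) and typed[i + 1].isalpha():
            out.append(typed[i + 1].upper())   # escaped letter: keep it uppercase
            i += 2
        else:
            out.append(ch.lower())             # everything else is folded to lowercase
            i += 1
    return "".join(out)


print(map_uppercase_input("LS /USR"))
print(map_uppercase_input("ECHO \\HELLO"))
```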


Disk Device Files

At this point the tutorial gets very confusing. The tutorial tells us NOT TO do make rp06 because we need to use different values than the ones in the makefile. Then it attempts to explain how to derive the correct values, but I found it hard to understand. So, let me attempt to explain it by starting with one of the makefile entries, rp06.

rp06:
      /etc/mknod rp0 b 6 0
      /etc/mknod swap b 6 1
      /etc/mknod rp3 b 6 7
      /etc/mknod rrp0 c 14 0
      /etc/mknod rrp3 c 14 7
      chmod go-w rp0 swap rp3 rrp0 rrp3

The original tutorial says “this recipe creates fixed device names. These names will be used below, and some of them are built into various programs, so they are most convenient.” This means that the names like rp0, rp3 have specific “semantics” and we should not change them, no matter what disk they refer to. For example, the original tutorial says rp3 “will be used for the name of the filesystem on which the user filesystem is put even though it might be on an RP06 and is not logical device 3.” Our job is to figure out what values we should put when making rp3 so that it maps to the file system on which the user files will be put. So now we first need to figure out which devices we need to create (which is the same as figuring out what each name means).

To do that, we need to understand the difference between buffered and raw access, the two modes in which Unix v7 can access devices. Buffered access is, well, buffered, but also it does block-sized I/O and it is slower. Raw access is unbuffered, it uses byte-sized I/O and it is faster. Unix v7 wants e.g., both buffered and raw access to the root file system, and this should happen through different devices. Now we can understand what the different names mean. In fact the original tutorial tells us: “The file rp0 refers to the root filesystem; swap to the swap-space filesystem; rp3 to the user filesystem. The devices rrp0 and rrp3 are the raw versions of the disks”, e.g., rrp0 gives Unix raw access to the root filesystem and rrp3 gives Unix raw access to the user filesystem. In mknod we denote that the device should have buffered access with b and raw access with c (which comes from “character” because it’s character-by-character I/O). So now we have partially filled out the mknod arguments:

/etc/mknod rp0 b ...
/etc/mknod swap b ...
/etc/mknod rp3 b ...
/etc/mknod rrp0 c ...
/etc/mknod rrp3 c ...

There are two more arguments to figure out. The first is the disk driver. Don’t confuse it with the disk controller; it’s completely separate. The disk driver is a Unix program that communicates with disks. In particular, Unix v7 has different drivers depending on which kind of disk we are accessing (e.g., one driver is used for RP04/05/06, but a separate driver for RP03) and also depending on whether it is buffered or raw access. We refer to these drivers by number, which is what goes into the first of the two remaining arguments of mknod (the major device number). To find the relevant numbers, we can look into /usr/sys/conf/c.c. There we find two arrays:

struct  bdevsw  bdevsw[] = ...
...
struct  cdevsw  cdevsw[] = ...

These stand for “block device switch” and “character device switch”. The “switch” part basically means “dispatch”, as if it were a switch in C. These arrays initialize function pointers according to the device driver that should run. The comments in these arrays tell us the numbers we need for mknod. So, /* hp = 6 */ tells us that we should use 6 as the first of the two remaining arguments to mknod to make a device that calls the device driver for RP04/05/06 (there’s the same naming convention as earlier, “hp”). When we use the device, Unix will use number 6 as an index into this array and call the resulting function pointer. Accordingly, /* hp = 14 */ tells us we should use 14 in the same position for raw access. Great, so now we have this:
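The dispatch idea is easier to see in a few lines. This is a rough Python model, not the kernel's code: the real tables are C arrays of structs with several function pointers each, while here each entry is a single made-up function. Only the two indices mentioned above (6 and 14) are filled in, and the function names are hypothetical.

```python
def hp_block_io(minor, block):
    """Stand-in for the buffered (block) RP04/05/06 driver entry point."""
    return f"hp block driver: minor={minor} block={block}"


def hp_raw_io(minor, block):
    """Stand-in for the raw (character) RP04/05/06 driver entry point."""
    return f"hp raw driver: minor={minor} block={block}"


# Only the entries discussed in the text; the real arrays hold other drivers too.
BDEVSW = {6: hp_block_io}    # /* hp = 6 */
CDEVSW = {14: hp_raw_io}     # /* hp = 14 */


def device_io(kind, major, minor, block):
    """Pick the table by device kind ('b' or 'c'), then index it by major number."""
    table = BDEVSW if kind == "b" else CDEVSW
    if major not in table:
        raise LookupError(f"no driver configured for {kind} major {major}")
    return table[major](minor, block)


print(device_io("b", 6, 0, 10))
print(device_io("c", 14, 15, 0))
```

The point of the indirection is that the code issuing the I/O never names a driver. It only carries the numbers stored in the device file, and the table decides which function runs.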

/etc/mknod rp0 b 6 ...
/etc/mknod swap b 6 ...
/etc/mknod rp3 b 6 ...
/etc/mknod rrp0 c 14 ...
/etc/mknod rrp3 c 14 ...

The last argument should be expected. We need to tell mknod which drive we’re talking about, and which partition on this drive. Both of these are packed into a single number. In particular, it looks like UUU PPP. That is, the low 3 bits refer to the partition, and the next 3 bits refer to the drive (also called “unit”, hence U). So, to refer to our first RP06 drive and the first partition (which is the partition of the root filesystem), we write 0. To refer to our second RP06 drive and partition 7 (the last of the eight) we write 001 111 = 15. So, for sure we have this, with binary for clarity:
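The UUU PPP packing described above is a shift and a mask. Here is a minimal sketch; the helper names `make_minor` and `split_minor` are mine.

```python
def make_minor(unit, partition):
    """Pack a drive (unit) and a partition into one number laid out as UUU PPP."""
    if not (0 <= unit <= 7 and 0 <= partition <= 7):
        raise ValueError("unit and partition must each fit in 3 bits")
    return (unit << 3) | partition


def split_minor(minor):
    """Recover (unit, partition) from a packed number."""
    return (minor >> 3) & 0b111, minor & 0b111


print(make_minor(0, 0))             # first drive, partition 0
print(make_minor(1, 7))             # second drive, partition 7
print(format(make_minor(1, 7), "06b"))
print(split_minor(15))
```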

/etc/mknod rp0  b 6  000 000
/etc/mknod swap b 6  ??? ???
/etc/mknod rp3  b 6  ??? ???
/etc/mknod rrp0 c 14 000 000
/etc/mknod rrp3 c 14 ??? ???

But we need to figure out the ??? entries. That is, we need to figure out: (a) where the swap space is stored, and (b) where the user (/usr) filesystem goes (remember that the third and fifth (last) lines should have the same last argument because they both refer to the user filesystem). To figure that out, we need to understand what a partition is. The most important thing to understand is that a partition in Unix v7 is not like today’s partitions. It is not resizable and not even something the user can construct. For example, we have been referring to partition 0 of the first disk, but we never created that. It was always there. There is also a fixed number of partitions. How do we know all that?

We can take a look at /usr/sys/dev/hp.c which has the code for the driver for RP04/05/06 drives. The key part is this array:

struct  size
{
       daddr_t nblocks;
       int     cyloff;
} hp_sizes[8] =
{
       9614,   0,              /* cyl 0 thru 22 */
       8778,   23,             /* cyl 23 thru 43 */
       0,      0,
       0,      0,
       161348, 44,             /* cyl 44 thru 429 */
       160930, 430,            /* cyl 430 thru 814 */
       153406, 44,             /* cyl 44 thru 410 (rp04, rp05) */
       322278, 44,             /* cyl 44 thru 814 (rp06) */
};

These are the 8 predefined partitions (0-7), which is how Unix v7 is configured by default. Obviously you could configure them differently and re-compile, but you get a fixed set by default. Each element in the array corresponds to a (fixed) partition, and it includes the size (in 512-byte blocks) and the starting cylinder. It is also assumed that partition 0 stores the root file system. When we set the size of our filesystem earlier (5000), that was assumed to be less than 9614. Partition 1 is assumed to store the swap space. So, we already know the mknod parameters for swap, assuming we want to store it on the first disk.

The rest of the partitions are as follows. Partitions 2 and 3 have a size of 0 and they are not supposed to be used. The other partitions store the user file system (they are the big partitions). They are overlapping, and are basically different configurations the user can choose from. They are not supposed to be used all 4 together. But you can use e.g., partition 4 and 5, or you can use only partition 7. And that’s what we’ll do! Partition 7 is for RP06 disks that have a lot of space. To set the mknod parameters, there is one final thing we should clarify to follow the tutorial. The original tutorial places both the root filesystem and the user filesystem on the same disk. The OpenSIMH tutorial, on the other hand, places the user filesystem on the second disk. That is what creates the differences between their tutorial and the original Haley-Ritchie tutorial. So, we finally know all we need to set the parameters:
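The cylinder comments in the table and the overlap between the big partitions can be checked mechanically. In this sketch the table is copied from hp.c above; the figure of 418 blocks per cylinder (22 sectors times 19 tracks) is not stated in this article, so treat it as my assumption, although every size in the table divides evenly by it. The helper names are mine.

```python
BLOCKS_PER_CYLINDER = 22 * 19   # assumption: 22 sectors/track, 19 tracks/cylinder

# (nblocks, cyloff) pairs copied from hp_sizes in hp.c
HP_SIZES = [
    (9614, 0), (8778, 23), (0, 0), (0, 0),
    (161348, 44), (160930, 430), (153406, 44), (322278, 44),
]


def cylinder_range(partition):
    """Return (first, last) cylinder of a partition, or None if it is empty."""
    nblocks, cyloff = HP_SIZES[partition]
    if nblocks == 0:
        return None
    cylinders = nblocks // BLOCKS_PER_CYLINDER
    return cyloff, cyloff + cylinders - 1


def overlaps(a, b):
    """True if two partitions share at least one cylinder."""
    ra, rb = cylinder_range(a), cylinder_range(b)
    if ra is None or rb is None:
        return False
    return ra[0] <= rb[1] and rb[0] <= ra[1]


for p in range(8):
    print(p, cylinder_range(p))
print("4 and 5 overlap:", overlaps(4, 5))
print("4 and 7 overlap:", overlaps(4, 7))
```

Partitions 4 and 5 sit back to back, which is why they can be used together, while partition 7 covers the same cylinders as both of them and must be used on its own.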

/etc/mknod rp0  b 6  000 000
/etc/mknod swap b 6  000 001
/etc/mknod rp3  b 6  001 111
/etc/mknod rrp0 c 14 000 000
/etc/mknod rrp3 c 14 001 111

In decimal it’s the same as in the tutorial.
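If you want to double-check the conversion from the binary notation to what you actually type, a short sketch can generate the five commands. The device list mirrors the table above; the function name is mine, and the packing is the UUU PPP layout described earlier.

```python
# (name, kind, driver number, unit, partition), as derived in this section
DEVICES = [
    ("rp0", "b", 6, 0, 0),
    ("swap", "b", 6, 0, 1),
    ("rp3", "b", 6, 1, 7),
    ("rrp0", "c", 14, 0, 0),
    ("rrp3", "c", 14, 1, 7),
]


def mknod_command(name, kind, major, unit, partition):
    """Build the mknod line with the last argument packed as UUU PPP, in decimal."""
    minor = (unit << 3) | partition
    return f"/etc/mknod {name} {kind} {major} {minor}"


commands = [mknod_command(*device) for device in DEVICES]
print("\n".join(commands))
```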


Create the /usr filesystem in the second disk and copy data from the tape

Here we just do what the tutorial does:

/etc/mkfs /dev/rp3 322278

What is important is that this number did not come out of nothing. Of course it’s in the original tutorial, but that’s because if you see the array from hp.c above, the size of partition 7 is this number.

Now that we created the filesystem, we can copy the user-filesystem data from the tape. But that is cooler than a simple copy. That’s because we need to set the tape to the correct point, and particularly after the 6th file. This is done with:

dd if=/dev/nrmt0 of=/dev/null bs=20b files=6
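Why does reading six files into /dev/null help? Because a tape is sequential, and nrmt0 is the tape device that does not rewind when it is closed, so whatever reads next starts where dd stopped. Here is a toy model of that behavior. The `Tape` class, its methods, and the placeholder file contents are all made up for illustration; a real tape separates files with tape marks rather than list boundaries.

```python
class Tape:
    """Toy tape: an ordered list of files plus the position of the head."""

    def __init__(self, files):
        self.files = list(files)
        self.position = 0              # index of the next file under the head

    def read_file(self):
        """Read the file under the head and leave the head just past it."""
        if self.position >= len(self.files):
            raise EOFError("end of tape")
        data = self.files[self.position]
        self.position += 1
        return data

    def rewind(self):
        self.position = 0


def skip_files(tape, count):
    """Read and discard `count` files, like dd with of=/dev/null and files=count."""
    for _ in range(count):
        tape.read_file()


tape = Tape([f"tape file {i}" for i in range(7)])   # placeholder contents
skip_files(tape, 6)
next_file = tape.read_file()    # what a following read without rewind would see
print(next_file)
```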

We can now finally copy the files with restor:

restor rf /dev/rmt0 /dev/rp3

The comment in the original tutorial is interesting: “The restor takes about 20-30 minutes”.

Finally, we mount the /usr filesystem to /usr:

/etc/mount /dev/rp3 /usr

Copying the Boot Block

It turns out that when we copied the data to the root filesystem earlier, we did not copy the boot block. This is a single block that should be written to block 0 and it is under /usr, so we can do it only now that we mounted it:

dd if=/usr/mdec/hpuboot of=/dev/rp0 count=1
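What this dd invocation does is copy exactly one block to the very start of the target and leave everything after it alone. The sketch below models that with in-memory buffers instead of real devices. The 512-byte block size is dd's default, and the buffer contents are placeholders I invented.

```python
import io


def copy_blocks(src, dst, count, block_size=512):
    """Copy up to `count` blocks from src to dst, starting at their current positions."""
    copied = 0
    for _ in range(count):
        block = src.read(block_size)
        if not block:
            break                  # source ran out before `count` blocks
        dst.write(block)
        copied += 1
    return copied


boot_image = io.BytesIO(b"B" * 512 + b"anything after the first block")  # placeholder
disk = io.BytesIO(b"\0" * 2048)                                          # placeholder disk

copied = copy_blocks(boot_image, disk, count=1)
contents = disk.getvalue()
print(copied, contents[:4], contents[512:516])
```

Only block 0 of the stand-in disk changes, which matches the intent: the root filesystem we restored earlier stays in place, and just the boot block is filled in.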

You should now be able to follow the rest of the instructions in Chapter 1 of the tutorial to boot the system normally.





Footnotes

  1. You do not need to create all these directories.