Spontaneous Data and Sexually-Explicit AI Content: A Legal and Philosophical Perspective

The base image comes from Boom Photography

Jun 29, 2026

Introduction

Humans produce data at an unprecedented rate. The world is expected to generate 221 zettabytes of data in 2026 alone, and 70% of that data is human-generated (DemandSage, 2026). For comparison, if you watched one zettabyte’s worth of 4K video, the stream would last for about 2 million years.
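The “2 million years” comparison depends heavily on the video bitrate, which the comparison does not state. The sketch below shows the arithmetic under two assumptions: a decimal zettabyte (10^21 bytes) and a constant bitrate. The two bitrates in it, 128 Mbit/s and 25 Mbit/s, are chosen purely for illustration.

```python
ZETTABYTE_BYTES = 10**21                # decimal (SI) zettabyte
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # Julian year

def viewing_years(total_bytes: float, bitrate_mbps: float) -> float:
    """Years needed to play back total_bytes of video at a constant bitrate (Mbit/s)."""
    total_bits = total_bytes * 8
    seconds = total_bits / (bitrate_mbps * 1_000_000)
    return seconds / SECONDS_PER_YEAR

# Assumed bitrates, for illustration only.
for mbps in (128, 25):
    print(f"{mbps} Mbit/s -> {viewing_years(ZETTABYTE_BYTES, mbps):,.0f} years")
```

At an assumed 128 Mbit/s this gives roughly 2 million years. At 25 Mbit/s it gives roughly 10 million years, so the figure in the text corresponds to a fairly high-bitrate stream.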

This article considers the use of human-generated data by Artificial Intelligence (AI). More specifically, I am concerned with what I call “spontaneous data,” a category that includes Instagram photos and TikTok reels. In recent years, such data have fueled a booming area of research: training AI to produce sexually-explicit content. This article takes up the pressing legal and philosophical questions surrounding such use.

Disclaimer: This article discusses themes related to sexually-explicit material, pornography, and sexual abuse. The language and data referenced herein are analyzed strictly for philosophical and research purposes. The article contains no explicit imagery, only text. Nevertheless, reader discretion is advised.

Background

Before we dive in, I shall provide some background that I consider necessary for any discussion of the political and legal dimensions of software in general and AI in particular.

What is Artificial Intelligence?

In this article, I write AI with uppercase letters. This matters because it signals that “Artificial Intelligence” is simply a proper name given to this type of software. Using lowercase would imply that what I refer to as “Artificial Intelligence” is a form of intelligence (Rapaport, 2023, p.44, “Terminological Digression”). In this article I refrain from making this epistemological claim and focus instead on a descriptive treatment of AI. In other words, for our purposes what matters is what happens when one “does AI” (the modus operandi, so to speak), not what AI is.

Nevertheless, an overview of what people mean by AI is valuable. Fortunately, William J. Rapaport has written one with pristine clarity, so I will not repeat it here (Rapaport, 2023, p. 18.2; Rapaport, 2025). I do, however, want to make a small contribution to Rapaport’s discussion. He defines AI as follows (emphasis in the original):

AI is the branch of Computer Science [...] that tries to answer the question how much of cognition is computable? The working assumption of [this field] is that all of cognition is computable.

He calls this field “computational cognition.” As he explains, he prefers “cognition” to “intelligence” because the point is not to create software with a high IQ. The point is to create software that can perform cognitive tasks in general, such as speaking or hearing. He prefers “computational” to “artificial” because the latter connotes something “not real.” In that respect, “synthetic” would already be better. But “computational” is better still, because AI is not concerned with just any kind of synthetic cognition but specifically with what computers can produce.

My reservation is that I am not sure that the working hypothesis of AI is that all of cognition is computable, and I do not think it matters. Indeed, recent advancements have already empirically confirmed that much of it is, and we do not seem to have reached the limit. Whether every last bit of cognition is computable seems to be irrelevant to the main actors in this game. Even answering “how much” does not seem to be their main interest.1 Nevertheless, for anyone interested in engaging with these debates, I think there are a few important points to touch upon when we discuss the “real thing,” which in this case I take to be human intelligence and/or cognition.

First of all, I offer an argument by analogy for why, in general, the “real thing” is not always the focal point, or even desirable. Consider the task of generating truly random numbers.2 It is provably impossible to do this computationally. A deterministic algorithm cannot produce true randomness, so any true source of randomness must come from the physical world. Yet computer scientists have created pseudo-random number generators (PRNGs), which are good enough for many tasks and even better than true random generators for many others. For example, most modern processors (CPUs) can provide a truly random number by measuring a physical quantity on the device. However, these generators are slow and biased. So, for any application that needs random numbers that are unbiased, fast, or both, PRNGs are not just “as good” as true random numbers but better.
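The sketch below makes the distinction concrete with one of the simplest PRNGs, a linear congruential generator (LCG). It uses the constants popularized by Numerical Recipes (a = 1664525, c = 1013904223, m = 2^32). It is for illustration only: modern libraries use other algorithms, and an LCG is unsuitable for cryptography. The key property is determinism. The same seed always yields the same sequence, which is exactly why the output is not truly random. For contrast, Python’s standard `secrets` module draws on the operating system’s randomness source.

```python
import secrets

class LCG:
    """Linear congruential generator: x_{n+1} = (a * x_n + c) mod m."""

    def __init__(self, seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32):
        self.state = seed % m
        self.a, self.c, self.m = a, c, m

    def next_int(self) -> int:
        # Purely arithmetic update: nothing here comes from the physical world.
        self.state = (self.a * self.state + self.c) % self.m
        return self.state

    def next_float(self) -> float:
        """A uniform-looking float in [0, 1)."""
        return self.next_int() / self.m

# Same seed, same "random" sequence: fast and reproducible, but deterministic.
g1, g2 = LCG(seed=42), LCG(seed=42)
seq1 = [g1.next_int() for _ in range(5)]
seq2 = [g2.next_int() for _ in range(5)]
print(seq1 == seq2)  # True

# By contrast, randomness drawn from the operating system's source.
print(secrets.randbits(32))
```

Reproducibility is often a feature rather than a bug. For example, it lets a simulation or a training run be repeated exactly, which is one reason pseudo-randomness can be preferable to the “real thing.”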

The equivalent has already happened in AI for specific tasks. For example, AI can already play chess, and even Go, better than any human (as we know humans today) ever will. But even for general cognition, which is not yet “solved” (whatever that means), I am not sure that researchers will remain interested in creating “true” cognition. It is not hard to imagine a future where pseudo-AI can perform most cognitive tasks better than humans (from a technocratic perspective), at a lower cost and with fewer disagreements. In that case, these pseudo-AIs will probably be preferred for many positions currently occupied by humans, no matter how “pseudo” or unlike humans they may be.

What does it mean to “really” think?

Going one level deeper, I want to tackle the semantics of statements such as “AI does not (really) think.” My claim is that this statement, in and of itself, is ambiguous. The goal of this section is to clarify what one could mean by it.3

To have a meaningful discussion about such statements, I believe it is essential to distinguish (at least) between human-level cognition and metaphysically-aligned cognition. A system with human-level cognition can perform a task as well as a human can. This can be measured both quantitatively and qualitatively. Quantitative measures include its scores on popular benchmarks or even on the SAT. Qualitative measures include asking evaluators to rate the answers it gives to medical questions (Ayers, 2023).

Metaphysically-aligned cognition is independent of human-level cognition. For the purposes of this article, it means that AI experiences cognition the same way a human does (i.e., its experience is aligned with that of a human). To understand what this means, it helps to turn to a simpler example: experiencing colors.

The experience of color (e.g., when we proclaim “this is red”) is a standard case of a metaphysical phenomenon. It is metaphysical because it goes beyond physics and the physical world.4 To a scientist this probably sounds preposterous, so let me explain. A scientist would object that we know very precisely what color is and how the experience is caused. When we talk about “red,” we know it involves very specific wavelengths hitting the eye in a very particular way. These cause very particular phenomena, which eventually trigger the brain in a very particular way. What is more, we can observe all that using brain scans and similar tools. The problem is that this explains the experience of color only partly, and here is why. Suppose a highly intelligent alien came to Earth, one who, unlike us, does not experience color at all. You could explain to her the physics of color: wavelengths, eyes, and the like. She could understand it in such detail that, perhaps with the help of some equipment, she could accurately predict when humans have the experience of color. For example, she could measure the surrounding wavelengths and observe our eyes: whether they are closed, whether something obstructs them, and so on. Now, suppose you are sitting with that alien in front of a red wall. She could state, possibly with higher-than-human accuracy, that you are experiencing color. In fact, she could do that so well that, if you did not know she was an alien, you could never tell that she does not experience color herself.

Thus, color is a metaphysical phenomenon in the sense that you either experience it or you don’t. No amount of physics, logic, or observation can make you experience it. With the aforementioned (cognitive) abilities, you could understand it, potentially even to a much larger degree than we do today. For example, you could figure out the exact chemical reactions inside the brain that cause it or correlate with it (the distinction is subtle, and I consider it outside the scope of this article). Still, you will either experience it or you will not.5

In fact, even between two humans A and B, there is no way for A to know whether B experiences colors the way A does. We simply assume that others have the same experience of color as we do, because their behavior (their “output,” and thus our experience of them) correlates extremely well with our own experience. For example, if your best friend has never said “this is black” when you experienced “this is white,” you will generally not suspect that your friend may be color blind. (You may, however, want to ponder why in the U.S. they call it a yellow light whereas in Greece we call it orange.) So this “knowledge,” viz., that others experience something the way we do, is actually not verifiable. Either we take it as an axiom (just as some philosophers today take it as an axiom that AI cannot be conscious the way humans are), or we observe the world and infer that it probably holds. But we can never verify it, and in fact we usually do not even question it. This realization generalizes to all other kinds of experience. For example, when an AI says “this is a cat,” we have no way of knowing whether it experiences the same thing we do.

This explains what metaphysically-aligned cognition is. But, as with my earlier arguments, I do not think the big players on the AI battlefield care one bit about metaphysically-aligned cognition. You cannot verify it anyway, but more importantly, it provides no utility at all. If you want an agent that identifies colors, that agent only needs to understand the physical aspects of color. It does not need to understand, let alone share, the metaphysical experience a human has, which is completely irrelevant to the task at hand. Similarly, when you want your AI to identify cats or write “good” emails, it is basically irrelevant whether it is metaphysically aligned with humans. One could of course argue that metaphysical alignment may help with the task, e.g., by making the AI better at it. That might be true, but metaphysical alignment would still not be a goal, only a means. And if another means worked better, folks would use that instead.

As a consequence, in my view, basically only philosophers care about the metaphysical aspect of AI. If I had to guess, when most engineers or managers at Anthropic, OpenAI, etc. (rarely) hear the word “metaphysics,” they probably think of clairvoyants or Paulo Coelho. But even those who have, e.g., heard of Aristotle probably do not care about the metaphysics of AI. Why should they? It is not part of their job. They care about making AI identify cats or write correct code. Whether AI experiences cats and code the way we do is irrelevant to the task at hand, because it is also irrelevant to whoever is using and paying for the AI.

Nevertheless, even though everything we just said may be completely useless to AI engineers, it is extremely important to anyone who wants to have grounded discussions around these topics.

The AI Hype in Numbers

Software systems based on AI have become ubiquitous in a mind-numbingly short period. Consider the landmark AI paper Attention Is All You Need (Vaswani et al., 2017). Although it was published only in 2017, it has about 243,000 citations at the time of writing. To put this into perspective, all the works of A. Einstein combined have about 195,000 citations.6 Another way to quantify the rise and the hype is through financial investment. According to a projection by the research and advisory firm Gartner, “worldwide AI spending will total $2.5 trillion in 2026,” which is a 44% increase from last year (Gartner, Inc., 2026).7

To give these systems their capabilities, computer scientists have to train them on astronomical amounts of data. Consider GPT-3 (Brown et al., 2020), a precursor to the first ChatGPT and already obsolete. The web text gathered for it amounted to about 45 terabytes before filtering, roughly as much text as the books of the Library of Congress contain.8,9 Surely far more data is used today to train the AI systems we use (ChatGPT, Gemini, etc.), but since this software is mostly closed-source, we do not know how much.

Traditional Software vs Software 2.0

In “traditional” programming, i.e., what programmers have been exclusively doing up until around 2022, a programmer creates a program directly. In other words, it is the programmer who writes the code that is eventually executed.

In modern AI,10 however, the programmer does not create the final program directly. Rather, she creates a program—we will call it the training program—which trains another program—the output or inference program (also referred to as “model”)—to do something.

The training program is generally well established and largely invariant across different tasks. However, the training program alone cannot train the inference program. It needs an important ingredient, namely data, and a lot of it (and the data, of course, changes from one task to another). For example, a popular dataset used to train image-recognition models contains about 1.28 million images. Most of the rapid development in AI does not come from radical changes in the training program, or more generally from algorithmic improvements. Rather, it usually comes from smarter ways of applying the training program and from new and better data. Above all, it comes from more data, as with the improvements between successive versions of ChatGPT.
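The split between a training program and the inference program it produces can be sketched in a few lines. This is a hypothetical illustration that uses the simplest possible “model,” a straight line fitted by ordinary least squares. The names `train` and `model` are my own and do not refer to any real library. The function `train` stays fixed, and only the data determines what the returned inference program does. Real systems typically replace least squares with neural networks trained by gradient descent, but the division of labor is the same.

```python
from typing import Callable, Sequence

def train(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Training program: fits y = w*x + b by ordinary least squares and
    returns the inference program (the "model") as a function.
    Assumes at least two distinct x values (otherwise var would be zero)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x

    def model(x: float) -> float:
        # Inference program: it only uses the learned parameters w and b.
        return w * x + b

    return model

# Same training program, different data -> different inference programs.
doubler = train([0, 1, 2, 3], [0, 2, 4, 6])
shifter = train([0, 1, 2, 3], [5, 6, 7, 8])
print(doubler(10), shifter(10))  # 20.0 15.0
```

Nothing in `train` mentions doubling or shifting. Those behaviors come entirely from the data.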

Whatever the details of its creation, a crucial question arises: since the inference program is not created by us, do we understand how it works? The consensus is a flat no! This may sound strange, since we do understand the training program. One may argue that since we understand X, which creates Y, we must therefore also understand Y. Unfortunately, this is a major fallacy, as a thought experiment (which could easily be carried out in practice) shows.

Suppose a programmer wants to create a program that sorts a list of numbers. Instead of programming it directly, she proceeds as follows. She creates a program that generates random sequences of instructions (or lines of code). The function we want to end up with (sorting a list of numbers) is computable: mathematics proves it, and we have also already written such programs. It is therefore mathematically certain that in this way the programmer will eventually11 end up with a program that sorts a list of numbers correctly for all inputs.

But how do we know when to stop? We need an evaluation strategy, which could be a suite of tests. So, we run our random program generator until its output program passes all the tests. You may object that passing all the tests does not mean the program will work for all possible inputs (e.g., inputs that are not in the tests). You would be right, since concluding otherwise would be a fallacious inductive argument. But in practice this is exactly the standard we apply to the big programs we use today (like Microsoft Word). That is, we do not verify that they work for all possible inputs. We simply test them on some (carefully selected) inputs that we hope cover most or all of the input space. In other words, we hope that any input not in the test suite makes the program behave like one of the inputs in the suite.
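The thought experiment can be run at toy scale. This sketch rests on simplifying assumptions of my own. The “instructions” are compare-and-swap operations on a three-element list, the test suite is every ordering of three distinct values, and all names are hypothetical. The generator is trivial to understand, but whatever program it returns was found rather than designed.

```python
import itertools
import random

# Hypothetical instruction set: each instruction compares positions i < j
# of a three-element list and swaps them if they are out of order.
INSTRUCTIONS = [(0, 1), (0, 2), (1, 2)]

def run(program, values):
    """Execute a program (a list of compare-and-swap instructions) on a copy of values."""
    out = list(values)
    for i, j in program:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out

def random_program(rng, max_len=6):
    """The program generator: a random sequence of instructions."""
    length = rng.randint(1, max_len)
    return [rng.choice(INSTRUCTIONS) for _ in range(length)]

# The evaluation strategy: a test suite of inputs, each checked against sorted().
TESTS = [list(p) for p in itertools.permutations([3, 1, 2])]

def passes_all_tests(program):
    return all(run(program, t) == sorted(t) for t in TESTS)

def search(seed=0, max_attempts=100_000):
    """Generate random programs until one passes every test (or give up)."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        candidate = random_program(rng)
        if passes_all_tests(candidate):
            return candidate, attempt
    return None, max_attempts

program, attempts = search()
print(program, attempts)
```

Running `search()` returns a short sequence of compare-and-swap pairs and the number of attempts it took. The exact result depends on the seed. In this toy case, passing the six tests happens to guarantee correctness for any three distinct numbers, because such programs depend only on the relative order of their inputs. Realistic programs offer no such shortcut.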

The point is that the programmer needs to understand how the program generator works, but that is trivial, since a freshman in Computer Science could write it. On the other hand, she does not need to understand how the output program works. This is what happens with AI. Of course, the training program is much more complex than random generation and, as one would hope, more sophisticated too. But we still do not understand how its outputs, the inference programs, work. This is the infamous problem of interpretability (Zhang et al., 2021).

So, since the training program remains relatively fixed and the output program is largely opaque, the focal point is the data. Andrej Karpathy, one of the main actors of the AI Revolution, coined a term for this way of creating software: “Software 2.0.” Its central thesis is that in Software 2.0 we focus not on code but on data, gathering more and better of it. This is why questions about data, like the ones raised in this article, are the most pressing.

Last but not least, the levels of indirection in creating programs keep increasing. Large Language Models (LLMs), which are but one type of inference program, can now write code, so they can create programs. We therefore already have programs (training programs) that create programs (LLMs) that create programs. This is already a reality. Boris Cherny, head and creator of Claude Code, one of the major AI coding systems, says that for months now 100% of “his” contributions to Claude Code have been written and reviewed by... Claude Code.
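The basic mechanism of a program that creates a program can be illustrated without any AI at all. In this hypothetical sketch, one Python function writes the source code of another function as text, and the interpreter turns that text into an executable program. The names `write_program` and `load_program` are my own. An LLM that writes code plays the role of `write_program` here, only with a vastly more complex (and far less understood) generator.

```python
def write_program(name: str, operator: str) -> str:
    """A program that writes a program: returns Python source code as text."""
    return (
        f"def {name}(a, b):\n"
        f"    return a {operator} b\n"
    )

def load_program(source: str, name: str):
    """Compile the generated source and return the function it defines.
    Note: exec runs arbitrary code, so it should never be fed untrusted text."""
    namespace: dict = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace[name]

source = write_program("add", "+")
print(source)
add = load_program(source, "add")
print(add(2, 3))  # 5
```

Here a human can still read the generated source at a glance. The concern raised in the text is what happens when that last level also becomes too large or too strange to read.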

So far, humans still understand the last level (the output of LLMs), but this too seems to be slipping away. The day may not be far off when LLMs will create programs that we do not understand and merely use (as with LLMs themselves). We are also not far from LLMs helping with moral decision-making in healthcare (Earp et al., 2024). We do not really know how they do it, but hopefully we will know why.

Data, AI, and Explicit Content

In this article, I am interested in data created spontaneously by humans. I refer to such data as “spontaneous data,” and I define it by negation: human-created data that does not require labor. Data created through labor provides a much cleaner context and, as such, has been studied extensively (Moore, 1997; Grimmelmann, 2016; Epstein et al., 2023; Henderson et al., 2023). In contrast, spontaneous data constitutes a recent and largely unexplored phenomenon.

To ground the discussion, let me provide some examples. I am concerned with data such as spontaneous photos, stories, and recordings uploaded on social media. On the other hand, I am not concerned with professionally-made movies, studio albums, or software/source code.

Granted, it is hard to draw an unambiguous line around what constitutes spontaneous data, and unfortunately I do not offer a solution to this problem here. But we can still make progress without that line, because a lot of data falls clearly and entirely on the spontaneous side. For example, it is perhaps hard to decide whether “spontaneous” photos (at least, so their creators claim) uploaded by an Instagram influencer require labor. Many of these influencers declare that this is their job, and unless they are literally making money for nothing, some labor must go into these photos. But I think most would agree that a mother who randomly takes a picture of her kid and uploads it to TikTok does not exert any substantial labor. That is, the labor is not substantial enough to make the picture a good or service that could be called a product.12 It is such data that most urgently calls for a legal framework, because (a) as we said, it is largely unexplored, and (b) it can be heavily abused.

Given this scope, let us first establish why such data cannot be treated, and is not treated by law, like labor-produced data. The latter is covered by intellectual property (IP). Modern IP rights evolved into their present form primarily on the basis of two theories: the Labor Theory and the Incentive (or Utilitarian) Theory.

The Labor Theory is of course inspired by Locke. It essentially argues that whoever exerts “labor” (mental or physical) to create something has a natural right to own the fruits of that labor. We need, however, to stretch Locke’s theory quite a bit to cover IP, for the simple reason that IP cannot be treated like other kinds of property, such as land. For starters, one can use IP without denying others the use of it (as with software). This is different from, e.g., a house or even a piece of land, which others may use, but usually only if they are employed by the owner. This absence of exclusivity is why it is hard to justify IP as a natural right.

Thus, a more modern IP theory rests on a Utilitarian framework. In that light, IP rights are justifiable not because they are natural, but because they benefit society as a whole. They allow creators to make a profit, which lets them recoup their costs, which in turn incentivizes the development of new ideas. In contrast, without IP rights, anyone could freely copy, redistribute, and reuse material, which would not sustain long-term development. (Of course, open-source software has disproved that in practice by moving away from IP rights, thereby becoming a pioneer in the legal issues of immaterial creations.)

The point is that, no matter which theory we pick, it justifies IP rights because someone exerted labor. The Labor Theory frames this as a natural right (i.e., it is your right because you exerted labor). The Incentive Theory, on the other hand, argues that if people exert labor and receive no benefit, no one would have any incentive to create anything. But since no labor goes into spontaneous data, we cannot apply this theory either. Besides, it is empirically invalid in this case. Spontaneous creators have not been receiving anything (that IP would entitle them to) in return, yet they keep creating. In conclusion, Locke or Utilitarianism can go a long way in defending IP, but for the kind of data we are interested in, we need another framework.

To develop this framework, I will focus on a particular kind of data and a particular kind of use: using visual data in AI, either for training or for inference, to produce sexually-explicit content. This case study keeps the discussion grounded, and in principle everything we will discuss applies to other kinds of spontaneous data too.13 That being said, the choice is not random, as this is a booming area of both research and use (Downing, 2025; Yousaf et al., 2026; Han, Mohamed, and Li, 2024; Morris, 2026).14

Let us now take a moment to consider how visual data can be used by AI. The most illuminating dimension is the distinction between training and inference (see Traditional Software vs Software 2.0). If an image is used in training, the subject of the image is usually not a target. The image is just one among thousands used to train the AI to produce similar images, and the outputs (during inference) will not exactly match any of the training inputs.15 On the other hand, if the image is used during inference (which is what happens when an average person uses AI to nudify an image), then the subject usually is the target: the user uses AI precisely to target and nudify this particular subject.

We should also clarify the term deepfake. Here is the definition given by the Merriam-Webster dictionary:16

[A]n image or recording that has been convincingly altered and manipulated to misrepresent someone as doing or saying something that was not actually done or said.

Etymologically, the term combines “deep,” because it is based on the technology of deep learning, and “fake,” because the output is fake. In terms of our discussion, a deepfake is relevant only in inference, when a subject becomes the target and a fake representation of her is created.17 But the pornographic output of an AI model during inference does not have to be a deepfake. The user could, e.g., ask the model to use the person’s body and generate a new face, such that the output does not “convincingly misrepresent” the original subject.

In general, deepfakes are the kind of content that can cause the most harm (Citron and Franks, 2014; Bell, 2026; Tenbarge, 2023),18 precisely because a particular person is misrepresented. Regardless of whether the harm is intended (e.g., revenge porn) or not (e.g., deepfakes for profit), the consequences can be catastrophic for the subject, including public humiliation and character assassination.19 Nevertheless, this does not mean that we should be concerned only about deepfakes; later I discuss the use of subjects for AI training.

From the perspective of the law, the obvious fix is to require the subject to give consent (and that is what the German public requested in the case of Collien Fernandes (Bell, 2026)). The problem is that in practice consent is not enough. One reason is that consent is often coerced (Dembrow, 2022). Another is that there is an incongruence between what companies advertise they do and what they actually do; a recent Federal Trade Commission (FTC) report is illuminating (Federal Trade Commission, 2024):

The Companies’ practices with respect to deletion [...] varied [...] A user would likely assume that deletion means that a Company would permanently erase their data. In fact, this understanding is not in line with several Companies’ reported practices. For example, instead of permanently deleting data, some Companies instead reported deidentifying such data. These Companies claim that de-identification anonymized the data and removed any personally identifiable information. Even the Companies that reported permanently erasing user data nevertheless conceded that they did not delete all data submitted by a user, such as user-generated content that is public.

A third reason why relying on consent is problematic is captured by the title of an article that could hardly be clearer: Americans Don’t Understand What Companies Can Do With Their Personal Data (Reissman, 2023).20 More specifically, people do not understand what an image-generation model can do with their data (even experts barely understand it, let alone the average user).

A fourth reason is subtler. In Software 2.0 the software creator is not the data creator (or the subject in the data), which creates information asymmetry (van de Waerdt, 2020). Let me elucidate this by way of contrast. In the case of traditional software, the software creator is also the creator of the software’s central ingredient: the code. For this reason, she is relatively aware of the ways in which her code could be used if modified and redistributed. This is less likely to be the case if the creator of the ingredient is not the creator of the entity. For instance, a random parent who posts photos of their children on Facebook has no idea how software engineers at OpenAI could use them.

Finally, a recent panel discussion at FOSDEM 2025 (Ferraioli et al., 2025) is insightful. All the panelists seemed to agree that for an AI system to be free, its data needs to be available. But this seems to be a preconceived notion carried over from our experience with software as it has existed so far. One panelist raised a practical concern: it may be impossible in practice to make the data available, especially if we also want to allow users to modify and redistribute it. Consider that in traditional software, the cynosure (code) is created by the software developer, so she normally gets to choose what others can do with it. In other words, the software creator is also the code creator. But, as we have seen, this is not true in Software 2.0.

However, the problem is not just practical, but primarily ethical! Data is not like code, because in all the ways discussed above its use can ultimately harm the subject. As such, it should not be given away with a blank check, even if that would allow “open-source” models (in the sense that the data is available). Thus, I argue, consent is a necessary condition but not a sufficient one. Consent should be given only on a case-by-case basis, for each individual use of data, and only after the creator has been adequately briefed on the results and the consequences.

The Devil’s Advocate

Now we shall engage three important counterarguments. The first concerns a specific kind of use: private use. Concretely, should it be illegal for someone to download an image from Instagram, nudify it, and use it privately, without ever sharing it? One could argue this should be permissible because in this case the consequences we mentioned (public humiliation, character assassination, etc.) disappear.21 But somehow it still feels wrong, and I think the reason is that the image is used in a way that was not intended. Here we should draw a sharp distinction. How is this case different from a hotel that has hidden cameras, recording couples in their private moments, which the hotel employees then use privately? In both cases the content is used in a way the subjects did not intend. The big difference is that in the hotel example the couples do not know they are providing the content.

So, as long as the raw materials are published by the creator/protagonist, the only remaining problem is that they are used in a way that was not intended (still assuming only private use). But the problem with this argument is that Instagram photos can be used privately in unintended ways anyway, in their original form and without any transformation. It would sound preposterous to make that illegal (even though it does seem ethically questionable). The only distinction between the two cases we can cling to is, of course, that in our original formulation the image is transformed before being used. Perhaps, then, the problem is not the private use, but the transformation.

But the transformation alone does not sound like a plausible problem. It seems illogical to object to a transformation that is never used (except perhaps that someone is wasting a lot of energy). Thus, the only logical conclusion seems to be that the real and only problem is the use of the transformed content. In other words, private use of the original content is fine, but private use of the transformed content is not. But why would that be the case? I think the only possible argument is that the protagonist is used as a means in a completely dehumanizing and alienating way. Of course, she is also used this way if the original content is used. But it is one thing to objectify an existing human, of whom the photo is but a representation. It is another to create a previously non-existent object (quite literally, albeit abstractly) in the form of a set of pixels. In some twisted way, the latter is still her, since the protagonist would recognize herself in the result. At the same time, it is far from her too. Note that this argument implies that any transformation is illegal if it is used in that improper way. This, then, makes it illegal for someone to transform the photo by dressing the subject more rather than less, which is not a purely hypothetical scenario.

There is another, subtler problem. If we allow private use, the transformed image can spread without anyone actually distributing it, simply because AI is so easy to use. Consider the following scenario. First, a high-school girl named Maude uploads a photo to her Instagram profile. Then, another student named Alvin uses an AI tool named MyImageGen to nudify her photo. The next day, Alvin goes to school and tells Rick that he generated a great-looking photo of Maude using MyImageGen. Here is the key moment. Because MyImageGen is so easy to use, Rick goes home and generates a nude image of Maude too.22 Within a matter of days, the whole school may have a nude photo of Maude (again, because it is so easy to do) without anyone having distributed the photo, since everyone is using it privately. This points to an important problem with AI. AI did not invent the nudifying of people, because image editors have provided the necessary tooling for decades. Moreover, these tools could find their way into the hands (or onto the screens) of high-schoolers. Even if the product is expensive, there are pirated versions. However, it was not at all easy to do! One would have to be an expert, and more than a bit artistically inclined, to edit an image into a realistic-looking nude photo. And it would take even an expert hours to edit a single image. So, the disruptive leap is that now anyone can generate high-quality photos in a matter of minutes with no expert knowledge.

—

Let us now move to the second counterargument. Interestingly, this one is at odds with the previous one: perhaps it is fine to use such Instagram content for any purpose (private, public, commercial, etc.) as long as it is transformed, but only on the condition that it is transformed to the point that it no longer resembles the protagonist of the original (i.e., the protagonist would not recognize themselves). Before we delve into the philosophical aspects, we should establish that in practice this rule matters little for isolated cases (i.e., transforming a single image). Rather, this rule would allow one to train image-generation models on thousands of Instagram images, as we said earlier. There are many reasons why one would want to do that; one of them is that one can then create a different protagonist every day.

The pressing question is: should such uses be legal if the output does not resemble the training material?23 I find it illuminating to take a small detour by recasting this problem in a more conventional formulation that does not involve AI at all: is there anything wrong with humans drawing explicit content, for any use, if it cannot reasonably be claimed to represent an existing individual? Here I assume that humans are not born knowing what a human body of the opposite sex looks like (an empirical observation). Therefore, they need to learn (i.e., they need to be trained) what it looks like before they can draw one. This is why this formulation is relevant to the AI case.24

Historically, this has not been considered a problem. For example, many Renaissance paintings depict naked women (and most of these paintings do not depict historical persons). So, what is the problem if AI does it? One way to argue is that the problem is not with AI but with the result, which is pornographic content, whereas Renaissance paintings are not (or so the argument runs). The issue becomes more complicated because some people consider at least some pornographic content to be art. In either case, it is extremely hard to define the problematic content; let me illustrate. In one conception, pornographic content is not art, and it is categorically considered bad. In the other conception, some pornographic content is art (or is in any case acceptable), while some is not (or is in any case unacceptable). Either way, there is some bad pornographic content: in the first case it is all pornographic content, whereas in the second it is only some. In both cases, though, it is extremely hard to define what falls within the category of bad pornographic content.

To the best of my knowledge, there are essentially three definitions of porn:

D1. Christensen (Christensen, 1990) defines pornography as any material (visual or written) that is intended to be sexually arousing.
D2. This is a definition with a feminist twist, and it has its roots in the works of Andrea Dworkin (Dworkin, 1985) and others. Essentially it defines porn as sexually-explicit content which promotes the subordination of women.
D3. This definition comes from Alan Soble’s work (Soble, 2011), and it attempts to define porn purely through its subject matter, i.e., explicit sexual acts (and it is in many ways a counterargument to the definitions stemming from Dworkin’s works).

In practice, and mainly on social-media platforms, a combination of D1 and D3 is used. For example, YouTube generally applies D3, with “sexually explicit” taken to mean showing genitals or breasts (i.e., nudity). Yet some exceptions that show clear and indisputable nudity are allowed (ARTE, 2023). This is where D1 seems to kick in: such content is allowed because it is not intended to be sexually arousing. This discussion shows exactly the problem with these definitions. I do not think any of them alone, or any combination, can help the lawgiver. D3 is the most easily applicable, but it covers too much to be usable (e.g., many Renaissance paintings would count as porn). D1 and D2 make sense, but how can one define an unambiguous criterion for when something was intended to be sexually arousing (D1) or when something subordinates women (D2)? Take D1, for example. A lot of content on Instagram is sexually arousing, and many surveys using representative samples could support that. Proponents of D1 would probably argue that it was not intended to be, but how can we prove that?

Nevertheless, I think that D1 is nearly practicable in the case of AI, and nicely enough this provides an answer to both the current counterargument and the previous one. Whether or not the content resembles the original, and whether or not the use is private, if we presume that creating pornographic content without consent is bad, then every use we discussed should be illegal, because the outputs of these models are clearly porn. It is hard to know whether an Instagrammer creates content in order to arouse, but the intent behind models built to generate sexually explicit content is much clearer.

—

The third and final counterargument follows a utilitarian perspective (I do not think any other school of thought could support what follows). I think most would agree that the porn industry is abhorrent in general, but the main victims are perhaps the stars themselves, especially when they are coerced or are minors (both common). So, what if we could produce porn (because it is probably not going away anytime soon) in an industry that does not hurt the protagonists?25 AI could be the solution to that (and technologically the transition is well under way). The hardest case to make, on so many levels, is for child pornography, so let me take this one up.

Before we begin our normative discussion, it is worth looking into the state of U.S. law on this issue. Briefly, in 1996 Congress passed the Child Pornography Prevention Act (CPPA), which criminalized computer-generated child pornography. In 2002, however, the Supreme Court struck down those provisions in Ashcroft v. Free Speech Coalition on First Amendment grounds, reasoning that since such images do not involve the exploitation of an actual child during production, they could not be banned as a categorical exception to free speech (Marcy, 2002; McLean, 2007; Marzen, 2024). To consider the strongest counterargument possible, I will argue not only that such AI content is not necessarily harmful, but that on the whole it may be beneficial.

Let us start constructing the utilitarian framework by first putting ourselves in the shoes of a producer. Obviously, the production of explicit content with actual children is illegal and among the most detested and looked-down-upon content in most societies (and the latter matters because prison experiences vary wildly based on how your cellmates view your crime). If the producer can create such content using a machine, why would they risk the incredibly unattractive consequences of getting caught? One may object that such content will never be as desirable as the real thing, but this is not what research shows. In a 2026 paper titled Subjective Responses of Gynephilic Men and Women to Real versus Artificial Female Nudes, Ellen Zakreski et al. found that:

These data suggest that AI-generated erotic material is superior to even real photographs in generating aesthetic appeal, positive valence, and ratings of sexual attractiveness [...] despite the fact that, in this study, the AI-generated images were perceived as less real.

So, if AI-generated content sells just as well, why would the producer not embrace it? A potential answer is that they do it for the “love of the game,” or, even worse, out of principle rather than for the outcome. I cannot even begin to wrap my head around how either could be instantiated in practice, but I do not see any reason why we should consider them a priori contradictory. Nevertheless, I think the vast majority falls into neither category. To put it plainly, I think they are in it for the money, and so if the money can be obtained more easily, they will follow the path of least resistance (or least risk).

For the producer, then, the benefits outweigh the drawbacks, and this benefits society as a whole too, because fewer children get abused. One counterargument, however, is that this may not hold in the long run. If AI child porn gets active development not at the margins but in the mainstream, and if it becomes legal, then it will promote pedophilia.26 Let us make a leap here: if the production of such content is legal, the only rational path is to make its consumption legal too. Now, in the minds of most people, if something is legal then it is not wrong (which is different from encouraging it, though). If it is not wrong, then more people will start doing it.

But on a purely utilitarian basis, that alone cannot be a problem, for the reasons we discussed in the first counterargument. More importantly, though, I do not think it is realistic, because I do not think that people can choose to be attracted to a certain type of person. To a large extent, this is a scientific question, but I am not aware of any literature that has studied it. The hypothesis I laid down is based on the case of homosexuality, which I chose due to the breadth and availability of research.

Concretely, there is no scientific reason to believe that humans can voluntarily become homosexual (Balthazart, 2011; Balthazart, 2018; Blanchard and Klassen, 1997; Bogaert and Skorska, 2011; Sabuncuoglu, 2015; Bailey et al., 2016; Sanders et al., 2015; Whitam, Diamond, and J. Martin, 1993). Here some clarifications are necessary, especially given the many citations.27 Current research has concluded that biological factors have a strong influence on homosexuality. For example, according to Balthazart (2018):

[T]here is strong evidence [...] of a genetic component to the control of sexual orientation, even if attempts to identify the specific genes involved have met so far with little success.

Of course, this does not mean that there are only genetic factors, but all the studies conclude that the factors are primarily biological. The pressing question for our purposes is whether the social environment has anything to do with it. For males, the answer is a definitive “no.” For females, no statistically significant result has been found. But we should note that even if the environment does influence female homosexuality, this does not mean a female can choose to become homosexual. Thus, based on current research, we have no reason to believe one can choose to become homosexual.

In a similar line of reasoning, I find no reason to believe that one can choose to become a pedophile. I should clarify that I do not suggest that we can arbitrarily extrapolate the research on homosexuality to pedophilia. All I am saying is that this is the closest research I could find. Based on it, i.e., the best evidence we currently have, I cannot come up with any grounded argument that deeming pedophilia legal will create more pedophiles. The real problem is the act. Of course, the act would still be illegal, but one could argue that if pedophilia is considered acceptable, more people will attempt to act on it. But watching gore thrillers and playing Call of Duty is legal, and that does not mean (contrary to what many parents think) that more and more people go out and kill because they play Call of Duty or watch gore thrillers. In conclusion, then, I cannot see how one could object to such usage from a utilitarian perspective (although there are, of course, many other reasons to object to it).


Bibliography

ARTE. (2023). Artem & Eva | L’intimité d’un couple de hardeurs romantiques. YouTube.
Ayers, J. W., Poliak, A., Dredze, M., Leas, E. C., Zhu, Z., Kelley, J. B., ... & Smith, D. M. (2023). Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Internal Medicine, 183(6), 589-596.
Bailey, J. M., Vasey, P. L., Diamond, L. M., Breedlove, S. M., Vilain, E., & Epprecht, M. (2016). Sexual orientation, controversy, and science. Psychological Science in the Public Interest, 17(2), 45–101.
Balthazart, J. (2011). Minireview: Hormones and human sexual orientation. Endocrinology, 152(8), 2937–2947.
Balthazart, J. (2018). Fraternal birth order effect on sexual orientation explained. Proceedings of the National Academy of Sciences, 115(2), 234–236.
Bell, B. (2026). German outcry over deep fake porn targeting actress prompts bid to change law. In BBC News. BBC article.
Blanchard, R., & Klassen, P. (1997). HY antigen and homosexuality in men. Journal of Theoretical Biology, 185(3), 373–378.
Bogaert, A. F., & Skorska, M. (2011). Sexual orientation, fraternal birth order, and the maternal immune hypothesis: a review. Frontiers in Neuroendocrinology, 32(2), 247–254. DOI.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., ... Amodei, D. (2020). Language Models are Few-Shot Learners. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, & H. Lin (Eds.), Advances in Neural Information Processing Systems (Vol. 33, pp. 1877–1901). Curran Associates, Inc.
Cate, F. H., & Mayer-Schönberger, V. (2013). Notice and consent in a world of Big Data. International Data Privacy Law, 3(2), 67–73. DOI: 10.1093/idpl/ipt005, eprint.
Christensen, F. M. (1990). Pornography: The Other Side. Praeger.
Citron, D. K., & Franks, M. A. (2014). Criminalizing Revenge Porn. Wake Forest Law Review, 49, 345. Online Entry.
DemandSage. (2026). Big Data Statistics 2026 (Growth, Trends & Market Size). Online Entry.
Dembrow, B. (2022). Investing in Human Futures: How Big Tech and Social Media Giants Abuse Privacy and Manipulate Consumerism. University of Miami Business Law Review, 30(3), 324–351. Online Entry.
Döring, N., Le, T. D., Vowels, L. M., Vowels, M. J., & Marcantonio, T. L. (2024). The Impact of Artificial Intelligence on Human Sexuality: A Five-Year Literature Review 2020–2024. Current Sexual Health Reports, 17(1), 4. DOI.
Downing, S. (2025). The rise of nudifying tools and their threats to children. CameraForensics Blog. Online.
Dworkin, A. (1985). Against the male flood: Censorship, pornography, and equality. Harv. Women’s LJ, 8, 1.
Earp, B. D., Mann, S. P., Allen, J., Salloch, S., Suren, V., Jongsma, K., Braun, M., Wilkinson, D., Sinnott-Armstrong, W., Rid, A., Wendler, D., & Savulescu, J. (2024). A Personalized Patient Preference Predictor for Substituted Judgments in Healthcare: Technically Feasible and Ethically Desirable. The American Journal of Bioethics, 24(7), 13–26. DOI.
Eelmaa, S. (2022). Sexualization of children in deepfakes and hentai. Trames. Journal of the Humanities and Social Sciences, 26(2), 229–248. DOI.
Epstein, Z., Hertzmann, A., the Investigators of Human Creativity, Akten, M., Farid, H., Fjeld, J., Frank, M. R., Groh, M., Herman, L., Leach, N., Mahari, R., Pentland, A. “Sandy,” Russakovsky, O., Schroeder, H., & Smith, A. (2023). Art and the science of generative AI. Science, 380(6650), 1110–1111. DOI.
Federal Trade Commission. (2024). A Look Behind the Screens: Examining the Data Practices of Social Media and Video Streaming Services [Techreport]. Federal Trade Commission. https://www.ftc.gov/reports/look-behind-screens-examining-data-practices-social-media-video-streaming-services
Ferraioli, J., O’Riordan, C., Black, A., Fontana, R., & Kooyman, Z. (2025). Panel: When is an AI system free/open? FOSDEM 2025, Legal and Policy Track. FOSDEM Archive.
Gartner, Inc. (2026). Gartner Says Worldwide AI Spending Will Total $2.5 Trillion in 2026. Press Release. Online Entry.
Grimmelmann, J. (2016). Copyright for Literate Robots. Iowa Law Review, 101, 657–681. SSRN.
Han, D., Mohamed, S., & Li, Y. (2024). ShieldDiff: Suppressing Sexual Content Generation from Diffusion Models through Reinforcement Learning. arXiv.
Henderson, P., Li, X., Jurafsky, D., Hashimoto, T., Lemley, M. A., & Liang, P. (2023). Foundation Models and Fair Use. Journal of Machine Learning Research, 24(400), 1–79. Online Entry.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. In F. Pereira, C. J. Burges, L. Bottou, & K. Weinberger (Eds.), Advances in Neural Information Processing Systems (Vol. 25). Curran Associates, Inc. Paper.pdf.
Lapointe, V. A., Dubé, S., Rukhlyadyev, S., Kessai, T., & Lafortune, D. (2025). The Present and Future of Adult Entertainment: A Content Analysis of AI-Generated Pornography Websites. Archives of Sexual Behavior. DOI.
Marcy, S. C. (2002). Banning Virtual Child Pornography: Is There Any Way Around Ashcroft v. Free Speech Coalition. HeinOnline.
Marzen, A. (2024). Crafting New Boundaries: Model Legislation to Address the New-Real Threat of Virtual Child Pornography without Running Afoul of Ashcroft v. Free Speech Coalition. St. Thomas Law Review, 37(1), 20–56. Online Entry.
McLean, C. (2007). The uncertain fate of virtual child pornography legislation. Cornell JL & Pub. Pol’y, 17, 221.
Moore, A. D. (1997). A Lockean theory of intellectual property. The Ohio State University.
Morris, J. (2026, April 27). Unstable Diffusion AI: NSFW Uncensored Stable Diffusion Fork. Plisio. Online Entry.
Negreiro Achiaga, M. D. M. (2025). Children and deepfakes (Briefing EPRS_BRI(2025)775855). European Parliamentary Research Service. Online Entry.
Pala, M. (2026). EU moves to ban AI-generated non-consensual sexual deepfakes. Anadolu Agency. Online Entry.
Rapaport, W. J. (2023). Philosophy of Computer Science: An Introduction to the Issues and the Literature. John Wiley & Sons.
Rapaport, W. J. (2025). Will AI Succeed? The “Yes” Position. aidebate.pdf
Rare Historical Photos. (2025). The Story of Mihailo Tolotos: The Greek Monk Who Lived a Lifetime Without Seeing a Woman. Online Entry.
Reid, R. (2026). Riley Reid on Clona AI. Clona.ai.
Reissman, H. (2023). Americans Don’t Understand What Companies Can Do With Their Personal Data — and That’s a Problem [Techreport]. Annenberg School for Communication, University of Pennsylvania. Online Entry.
Sabuncuoglu, O. (2015). Maternal thyroid dysfunction during pregnancy may lead to same-sex attraction/gender nonconformity in the offspring: Proposal of prenatal thyroid model. European Psychiatry, 30, 374.
Sanders, A. R., Martin, E. R., Beecham, G. W., Guo, S., Dawood, K., Rieger, G., Badner, J. A., Gershon, E. S., Krishnappa, R. S., Kolundzija, A. B., & others. (2015). Genome-wide scan demonstrates significant linkage for male sexual orientation. Psychological Medicine, 45(7), 1379–1388.
Security Hero. (2023). 2023 State of Deepfakes: Realities, Threats, and Impact. Online Entry.
Soble, A. (2011). Pornography, sex, and feminism. Prometheus Books.
Tenbarge, K. (2023). A face-swap app used Emma Watson’s face in sexually suggestive ads on Facebook and Instagram. NBC News. Online Entry.
U.S. Copyright Office. (2024). Copyright and Artificial Intelligence. Library of Congress. https://www.copyright.gov/ai/
van de Waerdt, P. J. (2020). Information asymmetries: recognizing the limits of the GDPR on the data-driven market. Computer Law & Security Review, 38, 105436. DOI.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
Whitam, F. L., Diamond, M., & Martin, J. (1993). Homosexual orientation in twins: A report on 61 pairs and three triplet sets. Archives of Sexual Behavior, 22(3), 187–206.
Yousaf, A., Fioresi, J., Beetham, J., Bedi, A. S., & Shah, M. (2026). SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge. Proceedings of the AAAI Conference on Artificial Intelligence, 40(42), 36012–36020. DOI.
Zhang, Y., Tiňo, P., Leonardis, A., & Tang, K. (2021). A survey on neural network interpretability. IEEE Transactions on Emerging Topics in Computational Intelligence, 5(5), 726–742.

Footnotes

Similar to how they are not interested so much in how models perform the tasks they perform—and as we discuss below nobody really knows—but rather that they perform them well.↩
Defining what “truly random” means is much trickier than it may seem.↩
I will not, however, consider all the possibilities; I have constrained the discussion to those that I believe are the most pertinent today.↩
The term “Metaphysics” originates in Aristotle’s work “Μετὰ τὰ Φυσικά.” “Φυσικά” means “physics,” thus “Μετὰ τὰ Φυσικά” translates to “Meta the Physics” → “Metaphysics.” The word “μετά” has many meanings in Ancient Greek. Its simplest sense is “after.” At this point, it is important to note that it was not Aristotle who gave the title “Metaphysics” to his work, but some later scholar or compiler. Since one of Aristotle’s works had been named “Physics” (by someone), it is possible that a scholar named what he thought to be the next work “After Physics,” sort of like “the sequel to Physics.” Another meaning of “μετά” is the meaning that “meta” has in English today. Today, “meta” describes a self-referential entity one level of abstraction higher. For example, “metadata” is data about data (e.g., the dimensions of an image), a “metalanguage” is a language used to describe other languages, and a “metaprogram” is a program that writes or manipulates programs (possibly itself). To the best of my knowledge, the West interpreted “Metaphysics” using this meaning of “μετά.” We can interpret “Physics” as “those (writings) which pertain to nature” (“φυσικός” → “natural”). “Metaphysics,” then, was interpreted as the “nature of nature” or, in a way that makes more sense, “the nature of reality.” For example, Ontology (deriving from “ὄν” → “being, existence”) asks what kinds of beings exist. This might sound no different from biology, but ontology, being a branch of metaphysics, goes beyond or deeper (“meta”) than the physical world and asks, e.g., questions such as “do numbers exist?” Generally speaking, such questions cannot be answered by studying the physical world. However, there are of course schools of thought that reject metaphysics completely. One of them is Logical Empiricism, whose core tenet is the Verification Principle, which states that a statement is meaningful only if it is either empirically verifiable (i.e., by observing the physical world) or true by definition. 
This might sound like a gnoseological claim (from “γνῶσις” → “knowledge”), that is, one that tells us what we may know, but it is in fact deeper: it tells us that it is complete gibberish to talk about anything that does not fall into one of the two previous categories.↩
The same is true for love or a crush. Neuroscientists may one day decrypt the exact mechanisms in the brain that correspond to what we call “love.” This could potentially help an “emotionally dead” person understand it, but will not make them experience it.↩
If we combine Attention Is All You Need with one other landmark paper, BERT, the citation count rises to about 412,000, surpassing by a wide margin all the works of Albert Einstein, Alan Turing, and Terence Tao combined.↩
Compare this with the worldwide market size of data and analytics software, a field that has been “hot” in the last couple of years: it rose from $153.9 billion to $175.1 billion in 2024, an increase of roughly 14%.↩
If you read a book a day, it would take you more than 100 million years to read all of them.↩
Furthermore, today the models are “multi-modal,” which means that they can handle multiple types of data.↩
In the history of AI, the approach we use today was for a long time largely unsuccessful and at the margins of the scientific community. Instead, people used to believe more in what is now called “good old-fashioned AI,” which is interpretable because it is closer to traditional programming. The great success of modern AI came around 2012 with AlexNet (Krizhevsky, Sutskever, and Hinton, 2012).↩
Although with such a naive method, it will take an intractable amount of time.↩
Spontaneous data reach beyond audio-visual content and include, e.g., our browsing history. Everything that follows applies to such data too, but I wanted to avoid broadening the scope too much (which explains why later I narrow the scope further).↩
The reader may find it illuminating to examine the points that follow using another case study: voice recordings. One could use voice recordings of a person to train speech-synthesis models to mimic that person. Then, a con artist could, for example, call the person’s family and scam them.↩
In addition, according to a 2023 study, 98% of all deepfake content (see the next two paragraphs for an explanation of the term) was pornographic (Security Hero, 2023). The European Parliament Think Tank estimates that between 2023 and 2025 the number of deepfakes increased 16-fold, from 500,000 to 8,000,000 (Negreiro Achiaga, 2025).↩
Lapointe et al. provide a nice overview of what functionality online websites provide (Lapointe et al., 2025).↩
I prefer Merriam-Webster’s definition to that of the Oxford English Dictionary (OED) because I believe it captures the usage of the word more accurately and more simply, especially in our context. Here is the OED definition: “Any of various media, esp. a video, that has been digitally manipulated to replace one person’s likeness convincingly with that of another, often used maliciously to show someone doing something that he or she did not do.”↩
In addition, it is relevant only when a historical person is involved. In that sense, the term “deepfake” can be misleading. For example, AI hentai pornography is both “deep” (in the sense that its creation uses deep learning) and fake, but it is not a deepfake.↩
One study deserves the spotlight: an incredible five-year literature review by Döring et al. titled The Impact of Artificial Intelligence on Human Sexuality: A Five-Year Literature Review 2020–2024 (Döring et al., 2024). As expected, they found that “[t]his sort of content predominantly victimizes women and girls whose faces are swapped into pornographic material and circulated without their consent.”↩
Nonetheless, there is a lot of deepfake content created not just with the consent of the subjects, but with their entrepreneurial wisdom. For instance, Riley Reid—an adult-content creator—has launched her own erotic chatbot (based on Meta’s open-source Llama model) (Reid, 2026).↩
This is the infamous “notice-and-consent” problem (Cate and Mayer-Schönberger, 2013).↩
In most European Union states it is permissible, but this is changing (Pala, 2026). As we saw earlier in the case of Collien Fernandes, in Germany it is even legal to share it. In France, deepfake and real explicit content are treated the same way: sharing such content without the consent of the protagonist is punishable by up to 2 years in prison and a €60,000 fine (the penalty is harsher if the content is published online).↩
It is basically irrelevant whether the image Rick gets is the exact same as the one Alvin got.↩
In its general form, and when it comes to copyrighted material, the issue has been studied extensively with divergent opinions (U.S. Copyright Office, 2024).↩
In theory, a human could learn how to draw human bodies without ever using any real-life representations of historical people as learning material. For example, women $A, B, C$ draw themselves (without being coerced), and then some man $D$ uses these three representations as a training set to learn how to draw the body of a woman. But such scenarios are unrealistic. In reality, humans learn how to draw human bodies of the opposite sex at least in part by observing historical people of the opposite sex. There is one exceptional case, though: the monk Mihailo Tolotos, who never saw a woman in real life, not even his mother. But it is unclear whether he ever saw a representation of a historical woman. See: (Rare Historical Photos, 2025).↩
This assumes that the final product does not resemble any real person.↩
An argument used by many Reddit users (Eelmaa, 2022).↩
Thanks to Greekonomics, who pointed me to this literature.↩