Humans produce data at an unprecedented rate. The world is expected to generate 221 zettabytes of data in 2026 alone, and 70% of that data is human-generated (DemandSage, 2026). For comparison, if you watched one zettabyte’s worth of 4K video, the stream would last for about 2 million years.
This article considers the use of human-generated data by Artificial Intelligence (AI). More specifically, I am concerned with what I call “spontaneous data,” under which Instagram photos and TikTok reels fall. In recent years, such data have fueled a booming area of research: training AI to produce sexually-explicit content. This article takes up the pressing legal and philosophical questions around such use.
Disclaimer: This article discusses themes related to sexually explicit material, pornography, and sexual abuse. The language and data referenced herein are analyzed strictly for philosophical and research purposes. No explicit imagery is present, only text. Nevertheless, reader discretion is advised.
Before we dive deep, I shall provide some background which I consider necessary for any discussion around the political and legal dimensions of software in general and AI in particular.
In this article, I refer to AI using uppercase letters. This is important because it signifies that it is simply a proper name given to this type of software. Using lowercase would imply the claim that what I refer to as “Artificial Intelligence” is a form of intelligence (Rapaport, 2023, p.44, “Terminological Digression”). In this paper I refrain from this epistemological claim, and instead I focus on a descriptive treatment of AI. In other words, for our purposes it matters more what happens—the modus operandi, so to speak—when one “does AI” and not what AI is.
Nevertheless, there is a value in an overview of what people mean by AI. Thankfully we can find one in the works of pristine clarity by William J. Rapaport, so I will not repeat it here (Rapaport, 2023, p. 18.2; Rapaport, 2025). I do, however, want to make a small contribution to Rapaport’s discussion. He defines AI as follows (emphasis in the original):
AI is the branch of Computer Science [...] that tries to answer the question how much of cognition is computable? The working assumption of [this field] is that all of cognition is computable.
He calls this field “computational cognition.” As he explains, he prefers “cognition” to “intelligence” because the point is not to create software with high IQ, but to create software that in general can perform cognitive tasks (such as speaking or hearing). He prefers “computational” to “artificial” because the latter has the connotation that it is “not real” and so “synthetic” is better. But “computational” is even better because AI is not concerned with any type of synthetic cognition, but specifically with what computers can produce.
My reservation is that I am not sure that the working hypothesis of AI is that all of cognition is computable, and I do not think it matters. Indeed, recent advancements have already empirically confirmed that much of it is, and we do not seem to have reached the limit. Nevertheless, whether every last bit of cognition is computable seems to be irrelevant for the main actors in this game. Even answering “how much” does not seem to be their main interest.1 Still, for anyone interested in engaging in these debates, I think that there are a few important points we need to touch upon when we discuss the “real thing”, which in this case I take to be human intelligence and/or cognition.
First of all, I shall provide an argument by way of analogy for why, in general, the “real thing” is not always the focal point or even desirable. Consider the task of generating truly random numbers.2 This is provably impossible to do computationally; any true source of randomness must come from the physical world and cannot be produced algorithmically. Yet, computer scientists have created pseudo-random number generators (PRNGs) which for many tasks are good enough, and for many other tasks are better than true random generators. For example, most modern processors (CPUs) can provide a truly random number by measuring a quantity on the physical device. However, such hardware generators tend to be slow, and their raw output can be biased. So, for any application that needs unbiased random numbers, fast random numbers, or both, PRNGs are not just “as good” but better than true random numbers.
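To make the analogy concrete, here is a minimal sketch of a textbook pseudo-random generator (a linear congruential generator). It is an illustration only, not a generator anyone should use in practice. The class name `LCG`, the seed, and the constants (the commonly cited Numerical Recipes parameters) are assumptions of this sketch. `os.urandom` is used as a stand-in for an operating-system entropy source; on most systems it is itself a cryptographic PRNG seeded from physical events, so it only approximates the "true" source discussed above.

```python
import os


class LCG:
    """Textbook linear congruential generator: x <- (a * x + c) mod m."""

    A, C, M = 1664525, 1013904223, 2**32  # illustrative constants

    def __init__(self, seed):
        self.state = seed % self.M

    def next_float(self):
        # Advance the internal state and scale it into [0, 1).
        self.state = (self.A * self.state + self.C) % self.M
        return self.state / self.M


# Two generators with the same seed produce the same "random" sequence:
# the output is fully determined by the algorithm and the seed.
first = LCG(seed=42)
second = LCG(seed=42)
run_a = [first.next_float() for _ in range(5)]
run_b = [second.next_float() for _ in range(5)]

# Bytes requested from the operating system cannot be replayed from a seed.
os_bytes_a = os.urandom(8)
os_bytes_b = os.urandom(8)
```

The reproducibility of `run_a` and `run_b` is exactly what makes the generator "pseudo", and it is also what makes it useful: it is fast, and a simulation seeded this way can be rerun identically.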
The equivalent has already happened in AI for specific tasks. For example, AI can already play chess or even Go better than any human (as we know them today) will ever do. But even for general cognition, which is not yet “solved” (whatever that means), I am not sure that researchers will keep being interested in creating “true” cognition. I do not think it is hard to imagine a future where pseudo-AI can perform most cognitive tasks better (from a technocratic perspective) than humans at a lower cost and with fewer disagreements. Then, probably for many positions that are currently occupied by humans, these pseudo-AIs will be preferred, no matter how much “pseudo” or inhuman-like they may be.
Going one level deeper, I want to tackle the semantics of statements such as “AI does not (really) think.” My claim is that this statement, in and of itself, is ambiguous. The goal of this section is to clarify a bit what one could mean when using this statement.3
To be able to have a meaningful discussion over such statements, I believe it is essential to distinguish (at least) between human-level cognition and metaphysically-aligned cognition. Human-level cognition can perform a task at the level a human can do it. This can be measured both quantitatively—e.g., by measuring their scores on popular benchmarks or even SAT scores—and qualitatively—e.g., by asking users to rate the answers they receive to medical questions (Ayers, 2023).
Metaphysically-aligned cognition is independent of human-level cognition and, for the purposes of this article, it means that AI experiences cognition the same way a human does (i.e., its experience is aligned with that of a human). To understand that, I think it is beneficial to turn to a simpler example: experiencing colors.
The experience of color—e.g., when we proclaim “this is red”—is a standard case of a metaphysical phenomenon. It is metaphysical because it goes beyond physics and the physical world.4 To a scientist, this probably sounds preposterous, so I need to explain more. A scientist would complain that we know very precisely what color is and how the experience is caused. When we talk about “red,” we know it involves these very specific wavelengths hitting the eye in this very particular way causing these very particular phenomena which eventually trigger the brain in a very particular way, etc. What is more, we can observe all that using brain scans, etc. However, the problem is that this explains the experience of color only partly, and here is how. Suppose an alien came to earth, and she was quite intelligent. You could indeed explain to her the physics of colors: wavelengths, eyes, and the like. She could in fact understand it to such a detail that, perhaps with the help of some equipment, she could accurately deduce and predict when humans have the experience of color. For example, she could measure the wavelengths that are flowing around and also observe our eyes: whether they are closed, whether something obstructs them, etc. Now, suppose you are sitting with that alien in front of a red wall. She could state, possibly with higher-than-human accuracy, that you are experiencing color. In fact, she could do that so well that if you did not know she was an alien, you could never tell that she does not experience color.
Thus, color is a metaphysical phenomenon in the sense that you either experience it or you don’t. No amount of physics, or logic, or observation can make you experience it. With the aforementioned (cognitive) abilities, you could potentially understand it, potentially even to a much larger degree than we do today. For example, you could figure out the exact chemical reactions inside the brain that cause it/correlate with it (the distinction is subtle and I consider it outside the scope of this article). Nonetheless, you still will either experience it or you will not.5
In fact, even among two humans A and B, there is no way for A to know whether B experiences colors the way A does. We basically simply assume others have the same experience of color as we do just because their behavior—their “output”, and thus our experience of them—correlates extremely well with our experience. For example, if there has never been a situation with your best friend where he said “this is black” when you experienced “this is white,” you will generally not be suspicious that your friend may be color blind (although you may want to ponder why in the U.S. they call it a yellow light whereas in Greece we call it orange). So, this “knowledge”—viz. others experience something the way we do—is actually not verifiable. Either we take it as an axiom that this is the case (and similarly some philosophers today take it as an axiom that AI cannot be conscious the way humans are), or we observe the world and deduce that it probably is. But we can never verify it, and in fact we usually do not even question it. This realization generalizes to all other kinds of experiences. For example, when the AI says “this is a cat”, we have no way to know whether the AI experiences the same thing we do.
This now explains what metaphysically-aligned cognition is. But similar to my earlier arguments, I do not think the big players in the AI battleground care a tiny bit about metaphysically-aligned cognition. You cannot verify it anyway, but more importantly, it provides no utility at all. If you want an agent that identifies colors, that agent only needs to understand the physical aspects of color. They don’t need to understand or experience the metaphysical experience a human has. It is completely irrelevant to the task at hand. Similarly, when you want your AI to identify cats, or write “good” emails, it is basically irrelevant whether it is metaphysically aligned with humans. One of course could argue that being metaphysically aligned may help with the task, e.g., with getting better at it. That might be true, but metaphysical alignment would still not be a goal. It would be just a means. And if another means is better, then folks would use the other means.
As a consequence, in my view, basically only philosophers care about the metaphysical aspect of AI. I think that Anthropic, OpenAI, and the like do not care a tiny bit about it. If I had to guess, most of the engineers or managers working there, when they (rarely) hear the word “metaphysics”, they probably think of clairvoyants or Paulo Coelho. But even those who e.g., have heard of Aristotle, probably do not care about the metaphysics of AI. Why should they? This is not part of their job. They care about making AI identify cats or write correct code. Whether AI experiences cats and code the way we do is irrelevant to the task at hand, because it is also irrelevant to whoever is using and paying for the AI.
Nevertheless, even though everything we just said may be completely useless to AI engineers, it is extremely important to anyone who wants to have grounded discussions around these topics.
Software systems based on AI have become ubiquitous in a mind-numbingly short period. A single paper in AI—the landmark paper Attention is All You Need (Vaswani et al., 2017), which was published just in 2017—has about 243,000 citations at the time of writing. To put it into perspective, all the works of A. Einstein combined have about 195,000 citations.6 Another way to quantify the rise and hype is through the financial investment. According to a projection of a popular think tank, “worldwide AI spending will total $2.5 trillion in 2026,” which is a 44% increase from last year (Gartner, Inc., 2026).7
To create the capabilities of these systems, computer scientists have to train them on astronomical amounts of data. For example, the training set of GPT-3 (Brown et al., 2020) (which is a precursor to the first ChatGPT and already obsolete) amounted to about 45 terabytes of text. This is about how much text we can find in the books of the Library of Congress.8,9 Surely today a lot more data is used in the training of the AI systems we use (ChatGPT, Gemini, etc.), but since this software is mostly closed-source, we do not know how much data they are trained on.
In “traditional” programming, i.e., what programmers have been exclusively doing up until around 2022, a programmer creates a program directly. In other words, it is the programmer who writes the code that is eventually executed.
In modern AI,10 however, the programmer does not create the final program directly. Rather, she creates a program—we will call it the training program—which trains another program—the output or inference program (also referred to as “model”)—to do something.
The training program(s) is generally well established and largely invariable across different tasks. However, the training program alone cannot train the inference program; it needs an important ingredient, and that is data—a lot of it (which of course changes from one task to another). For example, a popular dataset which is used to train image-recognition models contains about 1.28 million images. More generally, most of the rapid development in AI does not happen due to radical changes in the training program, or more generally due to algorithmic improvements. Rather, it happens usually because of smarter ways to apply the training program, because of new data, better data, but most of all, because of more data (as is the case, for example, in the improvements among the different versions of ChatGPT).
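The division of labor between the training program and the inference program can be sketched in a few lines. This is a deliberately tiny illustration, assuming a linear model fitted by plain gradient descent; real training programs are vastly larger, and the names `train`, `doubling_plus_one`, and `negation` are hypothetical. The point it tries to show is that the same, unchanged training program yields different inference programs when it is fed different data.

```python
def train(examples, steps=2000, learning_rate=0.05):
    """Training program: fits y = w * x + b to (x, y) pairs by gradient descent."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

    def model(x):
        """Inference program: nobody wrote w and b by hand; they were learned."""
        return w * x + b

    return model


# Same training program, two datasets, two different inference programs.
doubling_plus_one = train([(0, 1), (1, 3), (2, 5), (3, 7)])   # data follows y = 2x + 1
negation = train([(0, 0), (1, -1), (2, -2), (3, -3)])         # data follows y = -x
```

Calling `doubling_plus_one(10)` gives a value very close to 21, and `negation(10)` a value very close to -10, even though no line of the code above states either rule; the rules live only in the data.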
No matter what the details of the creation are, a crucial question appears: since the inference program is not created by us, do we understand how it works? The universal consensus is a straight no! This may sound strange since we understand the training program. One may argue, then, that since we understand X which creates Y, we must therefore understand Y. Unfortunately, this is a major fallacy, which we can confirm using a thought experiment (which can easily be applied in practice).
Suppose a programmer wants to create a program that sorts a set of numbers. Instead of programming it directly, she proceeds as follows. She creates a program that generates random sequences of instructions (or lines of code). Since the function we want to end up with (sorting a set of numbers) is computable—not only because math proves it, but we have also created such programs already—it is mathematically certain that in this way the programmer will eventually11 end up with a program that sorts a set of numbers correctly for all inputs.
But how do we know when to stop? Well, we need an evaluation strategy, which could be a suite of tests. So, we run our random program generator until our output program passes all the tests. Now you may argue that passing all the tests does not mean the program will work for all possible inputs (e.g., inputs that are not in the tests). But in practice this is the exact same standard we apply to the big programs we use today (like Microsoft Word).
The point is that the programmer needs to understand how the program generator works, but this is anyway trivial; a freshman in Computer Science could create it. On the other hand, she does not need to understand how the output program works. This is what happens with AI. Of course the training program is much more complex than random generation, and as one would hope, more sophisticated too. But still, we do not understand how its outputs—the inference programs—work. This is the infamous problem of interpretability (Zhang et al., 2021).
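The thought experiment can indeed be run in practice, at least at toy scale. The sketch below makes several simplifying assumptions that are not in the text: the output program only sorts lists of exactly three numbers, its instruction set consists solely of "compare positions i and j and swap if out of order", and the test suite is every ordering of three distinct values. All names (`run`, `passes_tests`, `generate_until_passing`) are hypothetical.

```python
import itertools
import random

N = 3  # assumption: the output program sorts lists of exactly N numbers
# The whole instruction set: "compare positions i and j, swap if out of order".
INSTRUCTIONS = [(i, j) for i in range(N) for j in range(i + 1, N)]


def run(program, values):
    """Execute a program (a list of compare-and-swap instructions) on a list."""
    data = list(values)
    for i, j in program:
        if data[i] > data[j]:
            data[i], data[j] = data[j], data[i]
    return data


def passes_tests(program, tests):
    """Evaluation strategy: the program must sort every test input."""
    return all(run(program, t) == sorted(t) for t in tests)


def generate_until_passing(tests, length=3, seed=0):
    """Program generator: emit random instruction sequences until one passes."""
    rng = random.Random(seed)
    attempts = 0
    while True:
        attempts += 1
        program = [rng.choice(INSTRUCTIONS) for _ in range(length)]
        if passes_tests(program, tests):
            return program, attempts


TESTS = list(itertools.permutations(range(N)))
program, attempts = generate_until_passing(TESTS)
```

The generator is a handful of lines that anyone can read, yet whoever runs it learns nothing about why the returned `program` sorts; they only know that it passed the tests. At this scale the output is still small enough to inspect by hand, which is precisely what stops being true for the inference programs of modern AI.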
So, since the training program remains relatively fixed, and the output program is largely unreachable, the focal point is the data. Andrej Karpathy, one of the main actors of the AI Revolution, has coined a term for this way of creating software: “Software 2.0.” The central thesis is that in Software 2.0 we do not focus on code, but rather on data; gathering better and more of it. This is, then, why questions around data—like the ones raised in this article—are the most pressing.
Last but not least, the levels of indirection in creating programs keep increasing. Since now Large Language Models (LLMs), which are but a type of inference program, can write code, they can create programs. So, we already have programs (training) which create programs (LLMs) which create programs. This is already a reality: Boris Cherny, head and creator of Claude Code, one of the major AI coding systems, says that for months now 100% of “his” contributions to Claude Code are written and reviewed by... Claude Code.
Up to now, humans still understand the last level (the output of LLMs), but this also seems to be going away. The day may not be far when LLMs will create programs which we do not understand and we only use (similar to LLMs themselves). We are also not far from LLMs helping in moral decision-making in healthcare (Earp et al., 2024); we do not really know how they do it, but hopefully we will know why.
In this article, I am interested in data created spontaneously by humans. I refer to such data as “spontaneous data,” and it is defined by negation: it is human-created data that does not need labor. Data created through labor provides a much cleaner context and as such, it has been studied extensively (Moore, 1997; Grimmelmann, 2016; Epstein et al., 2023; Henderson et al., 2023). In contrast, spontaneous data constitutes a recent and largely unexplored phenomenon.
To ground the discussion, let me provide some examples. I am concerned with data such as spontaneous photos, stories, and recordings uploaded on social media. On the other hand, I am not concerned with professionally-made movies, studio albums, or software/source code.
Granted, it is hard to unambiguously define the demarcation line of what constitutes a piece of spontaneous data, and unfortunately I will not offer a solution to this problem here. But we can still make progress without that line because there is a lot of data which is clearly and entirely on the spontaneous side. For example, it is perhaps hard to decide whether “spontaneous” photos (at least such is the claim by their creator) uploaded by an Instagram influencer require labor or not. This is because many of these influencers declare that this is their job, and so unless they are literally making money for free, there must be some labor going into these photos. But I think most would agree that a mother who randomly takes a picture of her kid and uploads it on TikTok does not exert any substantial labor (i.e., substantial enough to consider this a good or service that requires enough labor to be called a product).12 It is such data that creates the most dire need to figure out the legal framework as soon as possible because: (a) as we said, it is largely unexplored, and (b) it can be abused heavily.
Given this scope, let us first establish why data in this scope cannot be treated, and is not treated by law, like labor-produced data. The latter is covered by intellectual property (IP). Modern IP rights evolved to what they are today primarily based on two theories: The Labor Theory, and the Incentive (or Utilitarian) Theory.
The Labor Theory is of course inspired by Locke, and essentially argues that whoever exerts “labor” (mental or physical) to create something has a natural right to own the fruits of that labor. We need, however, to stretch Locke’s theory quite a bit to cover IP, for the simple reason that IP cannot be treated as other kinds of property like land. For starters, one can use IP without denying others the use of it (as in software), which is different from e.g., a house or even a piece of land (which others may use, but usually only if they are employed by the owner). The absence of exclusivity is why it is hard to justify that IP is a natural right.
Thus, a more modern IP theory rests on a Utilitarian framework. Under that light, IP rights are justifiable not because they are natural, but because they benefit society as a whole. They allow creators to make a profit, which allows them to recoup their costs, which then incentivizes the development of new ideas. In contrast, without IP rights, one would freely copy, redistribute, reuse, etc. material which would not sustain a long-term development (of course open-source software has disproved that in practice, moving away from IP rights, and thus becoming a pioneer in the legal issues of immaterial creations).
The point is that no matter which theory we pick, the reason it justifies IP rights is that someone exerted labor. The Labor Theory tries to justify this as a natural right. The Incentive Theory, on the other hand, argues that if people exert labor and do not receive any benefit, no one would have any incentive to create anything. But since there is no labor going into spontaneous data, we cannot apply this theory either. Besides, it is empirically invalid in this case: these spontaneous creators have not been receiving anything (that IP would entitle them to) in return, but they keep creating. In conclusion, Locke or Utilitarianism can go a long way in defending IP, but for the kind of data we are interested in we need another framework.
To consolidate this framework, I will focus on a particular kind of data and a particular kind of use: using visual data in AI—either for training or inference—to produce sexually-explicit content. This is a case study to have a grounded discussion; in principle, everything we will discuss applies to other kinds of spontaneous data too.13 That being said, this is not a random choice; this is a booming area of both research and use (Downing, 2025; Yousaf et al., 2026; Han, Mohamed, and Li, 2024; Morris, 2026).14
Let us now take a moment to consider how visual data can be used by AI. The most illuminating dimension is the distinction between training and inference (see Traditional Software vs Software 2.0). If an image is used in training, usually the subject of the image is not a target. The image is just one among thousands that are used to train the AI to produce similar images, but the outputs (during inference) will not match exactly any of the training inputs.15 On the other hand, if the image is used during inference (which is what happens when an average person uses AI to nudify an image), then the subject is usually the target: the user uses AI exactly to target and nudify this particular subject.
We should also clarify the term deepfake. Here is the definition given by the Merriam-Webster dictionary:16
[A]n image or recording that has been convincingly altered and manipulated to misrepresent someone as doing or saying something that was not actually done or said.
Etymologically, it comes from “deep”—because it is based on the technology of deep learning—and “fake” because the output is fake. To connect it to our discussion, a deepfake is relevant only in inference, when a subject becomes the target and a fake representation of them is created.17 But the pornographic output of an AI model during inference does not have to be a deepfake; the user could e.g., ask the model to use the person’s body and generate a new face, such that the output does not “convincingly misrepresent” the original subject.
In general, deepfakes are the kind of content that can cause the most harm (Citron and Franks, 2014; Bell, 2026; Tenbarge, 2023),18 exactly because a particular person is misrepresented. Regardless of whether the harm is intended (e.g., revenge porn) or not (e.g., deepfakes for profit), the consequences can be catastrophic to the subject, including public humiliation and assassination of character.19 Nevertheless, this does not mean that we should only be concerned about deepfakes; later I discuss the use of subjects for AI training.
From the perspective of the law, the obvious fix is to require the subject to give consent (and that is what the German public requested in the case of Collien Fernandes (Bell, 2026)). The problem is that in practice consent is not enough. One reason is that consent is often coerced (Dembrow, 2022). Another is that there is an incongruence between what companies advertise they do and what they actually do; a recent Federal Trade Commission (FTC) report is illuminating (Federal Trade Commission, 2024):
The Companies’ practices with respect to deletion [...] varied [...] A user would likely assume that deletion means that a Company would permanently erase their data. In fact, this understanding is not in line with several Companies’ reported practices. For example, instead of permanently deleting data, some Companies instead reported deidentifying such data. These Companies claim that de-identification anonymized the data and removed any personally identifiable information. Even the Companies that reported permanently erasing user data nevertheless conceded that they did not delete all data submitted by a user, such as user-generated content that is public.
A third reason that leaning on consent is problematic is that, as an unambiguously-titled article claims, Americans Don’t Understand What Companies Can Do With Their Personal Data (Reissman, 2023).20 More generally, people do not understand what an image-generation model can do with their data (even experts barely understand, let alone the average user).
A fourth reason is subtler, driven by the fact that in Software 2.0 the software creator is not the data creator (or the subject in the data), which creates information asymmetry (van de Waerdt, 2020). Let me elucidate this by way of contrast. In the case of traditional software, the software creator is also the creator of the central ingredient of this software, the code. For this reason, she is relatively aware of the ways in which her piece of code could be used if modified and redistributed. This is less likely to be the case if the creator of the ingredient is not the creator of the entity. For instance, a random parent who posts photos of their children on Facebook has no idea how software engineers at OpenAI could use them.
Finally, a recent discussion panel in FOSDEM 2025 (Ferraioli et al., 2025) is insightful. All the panelists seemed to agree that for an AI system to be free, its data needs to be available. But this seems to be a preconceived notion coming from our experiences with software as it was up to now. One panelist raised a practical concern: in practice, it may be impossible to make the data available, especially if we want to allow users to also modify and redistribute the data. Consider that in traditional software, the cynosure—code—is created by the software developer, and so she normally gets to choose what others can do with that. In other words, the software creator is also the code creator. But as we know this is not true in Software 2.0.
However, the problem is not just practical, but primarily ethical! Data is not like code because of all the ways we discussed above, which ultimately can harm the subject. As such, it should not be given away with a blank check, even if that would allow “open-source” models (in the sense that the data is available). Thus, I argue, consent is obviously a necessary condition but not a sufficient one. Consent should come only on a case-by-case basis, for each individual use of data, and only after the creator has been briefed adequately regarding the results and the consequences.
Now we shall engage three important counterarguments. The first is for a specific kind of use: private use. Concretely, should it be illegal for someone to download an image from Instagram, nudify it, and use it privately, without ever sharing it? One could argue this should be permissible because in this case the consequences we mentioned (public humiliation, assassination of character, etc.) disappear.21 But in some way it still feels wrong, and I think the reason it feels wrong is that the image is used in a way that was not intended. Here we should make a sharp distinction: how is this case different from a hotel that has hidden cameras, recording couples in their private moments, which the hotel employees then use privately? The two are similar in that in both cases the content is used in a way that was not intended by the subjects, but the big difference in the hotel example is that the couples do not know they are providing the content.
So, as long as the raw materials are published by the creator/protagonist, the only problem that remains is that they are used in a way that was not intended (still assuming only private use). But the problem with this argument is that Instagram photos can anyway be used privately—in their original form, without any transformation—in a way that was not intended, and it would sound preposterous to make that illegal (even though it does seem ethically questionable). The only distinction between the two we can cling to is of course that in our original formulation the image is transformed before being used. Perhaps, then, the problem is not the private use, but the transformation.
But the transformation alone does not sound like a plausible problem; it sounds illogical to have a problem with the transformation if it is never used (except perhaps that someone is wasting a lot of energy). Thus, it seems the only logical conclusion is that the real and only problem is the use of the transformed content. In other words, private use of the original content is fine, but private use of the transformed content is not. But why would that be the case? I think the only possible argument is that the protagonist is used as a means in a completely dehumanizing and alienating way. Of course, so is she if the original content is used. But it is one thing to objectify a human that exists, the photo of whom is but a representation of her, and it is another to create a previously non-existing object (quite literally, albeit abstract) in the form of a set of pixels. In some twisted way, the latter is still her—if the protagonist saw the result, she would recognize herself in there—but at the same time it is far from her too. Note that this argument implies that any transformation is illegal if it is used in that certain improper way. This, then, makes it illegal for someone to transform the photo by dressing the subject more rather than less, which is not a purely hypothetical scenario.
There is another subtler problem. If we allow private use, the transformed image can be circulated without anyone actually distributing it, simply because AI is so easy to use. Consider the following scenario. First, a high-school girl named Maude uploads a photo on her Instagram profile. Then, another student named Alvin uses an AI tool named MyImageGen to nudify her photo. The next day, Alvin goes to school and tells Rick that he generated a great-looking photo of Maude using MyImageGen. Here is now the key moment. Because MyImageGen is so easy to use, Rick goes home and generates a nude image of Maude too.22 Within a matter of days, the whole school may have a nude photo of Maude (again, because it is so easy to do it) without anyone having distributed the photo: everyone is using it privately. This points to an important problem with AI. Nudifying people is not something AI brought. Image editors have provided the necessary tooling for decades. Moreover, these tools could find their way into the hands (or screen) of a high-schooler (even if the product is expensive, there are pirated versions). However, it was not easy at all to do it! One would have to be an expert and more than a bit artistically inclined to edit an image into a realistic-looking nude photo. And it would take hours even for an expert to edit a single image. So, the disruptive leap is that now anyone can generate high-quality photos in a matter of minutes with no expert knowledge.
—
Let us now move to the second counterargument. Interestingly, this one is at odds with the previous one: perhaps it is fine to use such Instagram content for any use (private, public, commercial, etc.) as long as it is transformed, but only on the condition that it is transformed to the point that it does not resemble the protagonist of the original (e.g., he would not recognize himself). Before we delve into the philosophical aspects, we should establish that in practice this rule does not make sense in isolated cases (i.e., transforming a single image). Rather, this rule would allow one to train image-generation models using thousands of Instagram images, as we said earlier. There are many reasons why one would want to do that; one of them is that one can then create a different protagonist every day.
The pressing question is: should such uses be legal if the output does not resemble the training material?23 I find it illuminating to take a small detour by recasting this problem into a more conventional formulation that does not involve AI at all: is there anything wrong with humans drawing explicit content, for any use, if it cannot reasonably be claimed that it represents an existing individual? Here I make the assumption that humans are not born knowing what a human body of the opposite sex looks like (it is an empirical observation). Therefore, they need to learn (i.e., they need to get trained) what it looks like so that they can then draw one. This is why this formulation is relevant to the AI case.24
Historically, this has not been considered a problem. For example, a lot of Renaissance paintings depict naked women (and most of these paintings do not depict historical persons). So, what is the problem if AI does it? One way to argue is that the problem is not with AI, but with the result, which is pornographic content. In contrast, Renaissance paintings are not. The issue becomes more complicated because some people consider that at least some pornographic content is art. In either case, it is extremely hard to define the problematic content; let me illustrate. In one conception, pornographic content is not art, and it is categorically considered bad. In the other conception, some pornographic content is art (or is anyway acceptable), while the rest is not (or is anyway unacceptable). In either case, there is some bad pornographic content; in the first case it is all pornographic content, whereas in the second it is only some. In both cases, though, it is extremely hard to define what falls within the category of bad pornographic content.
To the best of my knowledge, there are essentially three definitions of porn:
First, in practice, and mainly on social-media platforms, a combination of D1 and D3 is used. For example, YouTube generally applies D3, with “sexually explicit” taken to mean showing genitals or breasts (i.e., nudity). Yet, some content that shows clear and undebatable nudity is allowed as an exception (ARTE, 2023). This is where D1 seems to kick in, i.e., such content is allowed because it is not intended to be sexually arousing. This discussion shows exactly the problem with these definitions. I do not think any of these definitions alone, or any combination, can help the lawgiver. D3 is the most easily applicable, but it includes too much to be used (e.g., many Renaissance paintings would count as porn). D1 and D2 make sense, but how can one define an unambiguous criterion for when something was intended to be sexually arousing (D1) or when something subjugates women (D2)? Take D1 for example. A lot of content on Instagram is sexually arousing, and many surveys using representative samples could support that. Proponents of D1 would probably argue that it was not intended to be, but how can we prove that?
Nevertheless, I think that D1 is nearly practical for the case of AI, and nicely enough this provides an answer to both the current counterargument and the previous one. Whether or not the content resembles the original, and whether or not the use is private, if we presume that creating pornographic content without consent is bad, then any use we discussed should be illegal because the creations of these models are clearly porn. It is hard to know whether an Instagrammer creates his content to cause sexual arousal, but the case for AI models is much clearer.
—
The third and final counterargument follows a utilitarian perspective (I do not think any other school of thought could support what follows). I think most would agree that the porn industry is abhorrent in general, but the main victims are perhaps the stars themselves, especially when they are coerced or minors (both common). So, what if we could create porn (because that is probably not going away anytime soon), but in an industry that does not hurt the protagonists?25 AI could be the solution to that (and technologically the transition is well under way). The hardest case to make is for child pornography—on so many levels—so let me take this one up.
Before we begin our normative discussion, it is worth looking into the state of law in the U.S. regarding this issue. Briefly, in 1996 Congress passed the Child Pornography Prevention Act (CPPA), which criminalized computer-generated child pornography. In 2002, however, the Supreme Court struck down the CPPA's provisions covering such virtual images in Ashcroft v. Free Speech Coalition on First Amendment grounds, holding that since such images do not involve the exploitation of an actual child during production, they could not be banned as a categorical exception to free speech (Marcy, 2002; McLean, 2007; Marzen, 2024). To consider the strongest counterargument possible, I will argue not only that such content is not necessarily harmful, but that on the whole it may be beneficial.
Let us start constructing the utilitarian framework by first putting ourselves in the shoes of a producer. Obviously, the production of explicit content with actual children is illegal, and such content is among the most detested and looked-down-upon in most societies (the latter is important because prison experiences vary wildly based on how your cellmates view your crime). If the producer can create such content using a machine, why would they risk the incredibly unattractive consequences of getting caught? One may object that such content will never be as desirable as the real thing, but this is not what research shows. In a 2026 paper titled Subjective Responses of Gynephilic Men and Women to Real versus Artificial Female Nudes, Ellen Zakreski et al. found that:
These data suggest that AI-generated erotic material is superior to even real photographs in generating aesthetic appeal, positive valence, and ratings of sexual attractiveness [...] despite the fact that, in this study, the AI-generated images were perceived as less real.
So, if AI-generated content also sells as much, why would the producer not embrace it? A potential answer is that they do it for the “love of the game,” or even worse, that they do it based on principles, not outcomes. I cannot even begin to wrap my head around how either could be instantiated in practice, but I do not see any reason why we should consider them a priori contradictory. Nevertheless, I think the vast majority does not fall in either category. To put it plainly, I think they are in it for the money, and so if the money can be obtained more easily, they will follow the path of least resistance (or least risk).
For the producer, then, the benefits outweigh the negatives, and this benefits society as a whole too, because fewer children get abused. One counterargument, however, is that this may not be the case in the long run. If AI child porn is actively developed not in the margins but in the mainstream, and if it becomes legal, then it will promote pedophilia.26 Let us articulate a leap here: if the production of such content is legal, the only rational path is to make the consumption legal too. Now, in the minds of most people, if something is legal then it is not wrong (which is different from encouraging it, though). If it is not wrong, then more people will start doing it.
But on a purely utilitarian basis, that alone cannot be a problem, for the reasons we discussed in the first counterargument. More importantly, though, I do not think it is realistic, because I do not think that people can choose to be attracted to a certain type of person. To a large extent, this is a scientific question, but I am not aware of any literature that has studied it. The hypothesis I laid down is based on the case of homosexuality, which is a case I chose due to the breadth and availability of research.
Concretely, there is no scientific reason to believe that humans can voluntarily become homosexual (Balthazart, 2011; Balthazart, 2018; Blanchard and Klassen, 1997; Bogaert and Skorska, 2011; Sabuncuoglu, 2015; Bailey et al., 2016; Sanders et al., 2015; Whitam, Diamond, and J. Martin, 1993). Here some clarifications are necessary, especially given the many citations.27 The current research has concluded that there is a strong influence of biological factors in homosexuality. For example, according to Balthazart (2018):
[T]here is strong evidence [...] of a genetic component to the control of sexual orientation, even if attempts to identify the specific genes involved have met so far with little success.
Of course this does not mean that there are only genetic factors, but all the studies conclude that there are primarily biological factors. The pressing question for our purposes is whether the social environment has anything to do with it. For males, the answer is a definitive “no.” For females, there has been no statistically significant result. But we should note that even if the environment does influence female homosexuality, this does not mean a female can choose to become homosexual. Thus, based on the current research, we have no reason to believe one can choose to become homosexual.
In a similar line of reasoning, I find no reason to believe that one can choose to become a pedophile. I should clarify that I do not suggest that we can arbitrarily extrapolate the research on homosexuality to pedophilia. All I am saying is that this is the closest research I could find. Based on it, i.e., the best evidence we currently have, I cannot come up with any grounded argument that deeming pedophilia legal will create more pedophiles. The real problem is the act. Of course the act would still be illegal, but one could argue that if pedophilia is considered acceptable, more people will attempt to act on it. But watching gore thrillers and playing Call of Duty is legal; that does not mean (contrary to what many parents think) that more and more people go out there to kill because they play Call of Duty or watch gore thrillers. In conclusion, then, I cannot see how one could object to such usage from a utilitarian perspective (although there are of course many other reasons to object to it).