Hello Friends,
Buckle up. This post is a bit longer than my typical pieces. It’s the result of a couple of months of noodling on this topic. I hope you enjoy.
Reading time: 12 minutes. 15 if you savor it.
For my Mom’s birthday last year, I was determined to make her a breakfast that Gordon Ramsay would declare “absolutely stunning.”
I’m no chef. The first time I made my then girlfriend dinner, I made a Cacio e Pepe so heavy on the pepper that the pasta looked like whole wheat. “That’s a lot of pepper,” my girlfriend warned. “It’s fine,” I replied, before our first bite and the consequent coughs changed our dinner plans.
So I enlisted ChatGPT to make pancakes with ricotta whipped cream and blueberry compote. I felt like a chef at Alinea. Pillowy ricotta kissed with lemon zest, buttery pancakes glazed by sweet blueberries. I aced my homework, and ChatGPT whispered the answers.
Buoyed by my culinary success, I began asking ChatGPT more things: How to draw, how to renovate my kitchen, the principles of color theory, etc.
Eventually, I began using ChatGPT in my writing—correcting grammatical errors, getting feedback on clarity, fact-checking paragraphs, etc.
I had found a genie, and with far less trouble than Aladdin went through to get his.
Though the more I used ChatGPT and similar models, the more uneasy I began to feel. The ease with which it could answer complex questions made me feel as if I were cheating on a test. ChatGPT’s effortlessness was a direct affront to the idea that hard things require hard work.
I also began feeling self-conscious about the quality of my own writing and threatened by what these models would mean for me, someone who had made the radical decision to leave his six-figure salary to make his living through writing. It felt like I had jumped out of a plane expecting to land on a grassy field, and suddenly the ground turned into goopy volcano vomit.
Over the past few months, I’ve shared my angst with friends. Some confess similar fears. Some go a step further and think we need to burn AI to the ground.1 But others look at me confused, as if I’m worried about the Loch Ness monster swallowing me during my next swim in Lake Washington.
Am I worrying too much or not enough? Is it naïveté or ignorance? I realized I knew very little about these models and how they are trained. My ignorance wasn’t due to a lack of curiosity. I was avoiding truths I might not be comfortable with. It’s easier to eat meat when you don’t think about how the sausage is made.
The only way to end my rumination was to dive deeper and understand more about OpenAI’s models.
OpenAI’s Sausage
My first question was where OpenAI got the data to train their models. Was it a gift from the Gods? A carefully designed data acquisition strategy worthy of a Harvard Business Review case study?
Turns out that even OpenAI’s executives have a hard time answering this question.
In a recent interview with the Wall Street Journal, Mira Murati, OpenAI’s Chief Technology Officer, was asked about their latest revolutionary text-to-video model, Sora. Ms. Murati talked with ease about the model’s function, how it works, and its current weaknesses. Wall Street Journal journalist Joanna Stern then asked what data was used to train Sora. In response, Ms. Murati said Sora used “publicly available data and licensed data.”
Ms. Stern pressed on:
Journalist: So, videos on YouTube?
Mira Murati: I’m actually not sure about that.
After a couple more questions on data sources, Ms. Murati replied:
I’m just not going to go into the details of the data that was used but it was publicly available or licensed data.
The CTO of OpenAI—one of the smartest people in the world—is “not sure” about the data sources for their models. It’s as if you asked Larry Page how Google Search worked and he answered, “I’m actually not sure about that.”
It’s far more likely that Ms. Murati does know where the data used to train their models comes from. She just can’t tell us.
She knows that publicly available might also mean copyrighted data. Data which was likely used without permission.
Yet, OpenAI claims that even if they, hypothetically, used copyrighted information, they did so under fair use. Fair use essentially means that you can use copyrighted material under certain circumstances. Keep in mind that fair use is a legal defense, not a permission slip.2
For instance, if I use 10 seconds of a song for educational purposes, I can probably argue fair use in court and convince a judge. But if I were to reprint the book Dune, change its title to Dun-Dun-Dune, and sell it as if I wrote it, I wouldn’t win that case.3
Fair use is part of the legal defense that OpenAI is arguing in their current lawsuit with the New York Times. The New York Times alleges that OpenAI used millions of its articles without permission to train chatbots to provide information to users. But OpenAI says it’s all fair use, baby.4
It could be that, technically, OpenAI is not doing anything illegal, that other leaders in the AI space like Google, Anthropic, and Facebook are doing similar things, and that there have been more egregious abuses of data in the Internet age. But this defense can be summarized as: “Everyone is doing it, and it’s not the worst behavior we’ve seen from a tech company, so look the other way for the sake of humanity.”
While OpenAI continues to be opaque about its data sources, we do have a better understanding of how these models work. Yet as I came to understand how they work, the moral dilemmas only became murkier.
It’s Kind of Like a Brain, But Not Quite
To better understand how models like OpenAI’s work, I recommend reading “AI for the Rest of Us,” written by a friend of mine.5 In the meantime, I’ll do my best Bill Nye impression and try to explain:

OpenAI’s GPT-4 (ChatGPT is just the interface) is part of a family of AI models called Large Language Models (LLMs). These models are trained on a ridiculous amount of data.
That training happens inside a neural network, a structure loosely inspired by the way neurons in the brain identify patterns. In an artificial neural network, each neuron receives inputs from other neurons, applies some fancy math to them, and passes its output along to other neurons. So on and so forth. Eventually, the network learns to recognize and generate the patterns in the data it was trained on. It’s like a game of telephone, but all the neurons are really good at it.
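If you’re curious what that “fancy math” actually looks like, here’s a minimal sketch of a single artificial neuron in Python. Every number below is made up for illustration, and the sigmoid “squashing” function is just one common choice; models like GPT-4 chain billions of these little units together.

```python
import numpy as np

# One artificial "neuron": weigh the inputs, add a bias, squash the result.
# The weights and bias are invented here; in a real network they are
# learned during training.
def neuron(inputs, weights, bias):
    weighted_sum = np.dot(inputs, weights) + bias  # the "fancy math"
    return 1 / (1 + np.exp(-weighted_sum))         # sigmoid squashes to 0..1

inputs = np.array([0.5, -1.2, 0.8])    # signals arriving from other neurons
weights = np.array([0.9, 0.3, -0.5])   # how much this neuron trusts each signal
bias = 0.1

print(neuron(inputs, weights, bias))   # the output, passed on to the next neurons
```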
For instance, a neural network trained on images of French Bulldogs can learn to recognize the breed by characteristic features like small muscular frames, short snouts, and wrinkly faces. It does so through a process of learning and generalization, not by memorizing specific images from the training dataset.6 Generalization means the model can identify French Bulldogs in images it’s never seen before by applying the patterns and features it learned in training.
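To make “generalization” concrete, here’s a toy sketch using scikit-learn. The features (snout length and weight) are my own invention and nothing like what an image model actually learns, but the core idea is the same: the model classifies a dog it has never seen by applying patterns from training, not by looking up a stored example.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features: [snout_length_cm, weight_kg]
X_train = [[3, 11], [4, 12], [15, 30], [14, 28]]
y_train = ["french_bulldog", "french_bulldog", "labrador", "labrador"]

# Learn a rule from the training examples (short snout + light = Frenchie)
model = DecisionTreeClassifier().fit(X_train, y_train)

# A dog the model never saw during training
print(model.predict([[3.5, 10.5]]))  # -> ['french_bulldog']
```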
Here is where it gets tricky: a common criticism of these AI models is that if your work was part of the data the model was trained on, then anyone can copy you by prompting the model to replicate you. But as we just saw, it’s not that straightforward.
I cannot ask DALL-E (OpenAI’s image generation model) to reproduce an exact replica of Damien Hirst’s Flumequine, not only because the model does not memorize specific images, but also because of safeguards OpenAI has implemented to prevent the replication of copyrighted content. Here’s my attempt:
However, as people become more skilled at crafting prompts, they may be able to guide the model to generate images that closely resemble specific artistic styles or works, without directly replicating them. The legal liability of companies like OpenAI will become even muddier. They will claim that it is the user who abuses their tool. Therefore, the user should be held liable, not them.
So the models use training data as inspiration, not as memorized outputs (similar to humans). If you ask for exact replicas, you won’t get any, due to the model’s limitations and safeguards. But if someone wants to create replicas, they probably could—provided they become very proficient at prompting the model and getting around the safeguards.
Digital Manifest Destiny
It’s unclear whether OpenAI used copyrighted data without permission (probably though), whether they will be held liable for it, and whether we’ll see any sort of accountability via court rulings or regulation.
It also seems like any liability for copying or replicating copyrighted works through AI will fall on the users—not the owners of these models.
After learning more about these models, I realized that any moral unease I feel about OpenAI and similar companies doesn’t really matter in the grand scheme. The genie is out of the bottle, and we’ve entered the era of Digital Manifest Destiny.7
The AI narrative will follow a broader pattern across human history, where great leaps in technology are judged favorably, with the abuses and the abused left at the margins of the tales of progress. The ends justify the means.8
Yes, AI is exciting. It is helpful to many and has unlocked new frontiers of creativity. I love that many non-native English speakers now use AI to help them write business emails with native fluency. Or how a budding filmmaker friend of mine can use AI-generated video as an element in his short movie. Or how someone without a camera can create a compelling short film like shy kids’ Airhead.

But at the risk of sounding like a heretic to Techno-Optimists9, this whole “progress first, ask questions later” approach tends to come back and bite us in the ass.
I don’t know about you, but giving people easy access to credit, giving everyone a supercomputer in their pocket, and making pain management easier has made us indebted, lonely, and addicted. So maybe it’s not a bad idea to spend more time thinking about the shadow side of this Digital Manifest Destiny and to use our human ingenuity to mitigate the side effects. I don’t think we’ll solve every problem, but we have to solve as many as possible.
As for me, I’ve gone from fear to acceptance. I now assume that most tasks in creative jobs will be successfully completed using AI within the next 10 years. However, I don’t think creative jobs will become extinct; they will just look radically different than they do now—like if I dyed my hair red.10
The key for me in any creative work will be to use AI to augment my work, not replace it. If I write a novel, I won’t ask ChatGPT to write it for me. Instead, ChatGPT will be a research assistant, editor, and idea generator. And eventually, I will still hire professionals who will help me with these things and who use AI themselves to improve their craft.
The tensions I’ve experienced haven’t fully resolved. But now I feel like my thoughts on AI reflect my own thinking, and I’m not just parroting thought leaders on Twitter. Some of the outputs these models produce require far less creativity than others; perhaps those should be judged differently. And in the same way that it’s hard to forecast the shadow side of AI, I’m aware that I may not be imagining the best-case scenario either.
Perhaps the genie that’s out of the bottle will be used to accomplish even bolder endeavors—theatrical film releases created by small groups of outsiders telling stories that Hollywood tends to pass on, or a burst of poets and writers who use AI to experience the benefits of writing for exploration and expression.
The answer to a lot of these questions, and to most questions in life, is that conflicting answers will coexist. The best- and worst-case scenarios may both happen at the same time. Dualities are a constant in the natural world. The range of feelings we experience as we navigate these dualities is something we can still claim as exclusively human.
For now.
Before you go…
🙏 Huge thank you to the friends who offered their edits and feedback.

🗣️ I’d love to know what I got wrong. What are other arguments I should have considered? How do you feel about models like ChatGPT? Let me know in the comments.
📢 Share this post if it resonated or if you want to roast me in a more public forum.
These individuals may also be pyromaniacs.
When ruling on copyright cases, courts apply what is called a “four-factor test” to render a decision. The four factors are: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the work’s market value. (h/t to David Kieferbaum for explaining this to me).
Dun-Dun-Dune should be the title of a Dune musical on Broadway.
OpenAI’s new slogan: All is fair in love and AI.
You can also check out his Substack here.
The goal of these models is not to memorize but to learn patterns. However, if the network is complex enough and the data is insufficient, the model will end up memorizing. This is called overfitting.
Manifest Destiny was a belief or ideology prevalent in the 19th century that proclaimed the inherent right and inevitability of the United States expanding its territory westward. It expressed the conviction that it was America’s destiny to encompass the entire continent, leading to the acquisition of land and conflicts with indigenous populations (source: ChatGPT).
Fun Fact: Niccolò Machiavelli never said that. And the quote has been taken out of context by people like me for hundreds of years.
“Techno-optimist” is a PR win for the Marc Andreessen types in Silicon Valley. It turns them into heroes, and any question about the downsides of AI can be dismissed as cynical. Brilliant.
A Scottish Camilo of sorts.