As generative AI image generators such as DALL-E 2, Midjourney, and Stable Diffusion take the internet by storm, the way people create art is changing.
OpenAI’s AI image generator DALL-E 2 alone welcomes over 1.5 million users and generates over two million images every day. Meanwhile, Midjourney, the most popular art generator, boasts a Discord community of over 3 million users.
But as AI image generators soar in popularity, debate over how to introduce appropriate copyright laws for generative AI systems remains rife.
Currently, all images created by AI – including art generated from a text prompt written by a human – are not protected by global copyright laws.
That means that any work created by an AI can be copied freely without permission from its owner, since there are no laws that protect content created by non-humans, including artificial intelligence.
Even if AI art could be copyrighted, it could still be repurposed under Fair Use, which permits the re-purposing of copyrighted content under certain conditions without needing permission from its owner.
So can AI art truly be copyrighted? This is just one fragment of the legal minefield that is copyright and generative AI.
To understand the dilemma, we must first delve deep into the role of online data in generating AI art, and how this blurs the lines of copyright when it comes to protecting people’s images.
The role of data in AI art generation
Like most machine learning models, AI art generators are trained on a combination of text and image data to generate content.
They work by identifying and replicating patterns in this data. This means that for them to be able to generate an output like a sentence or painting, they must first learn from the real work of actual human artists.
OpenAI’s DALL-E 2, for instance, is trained on hundreds of millions of captioned images taken from the internet, which feed the large language model (LLM) that DALL-E 2 uses to create images.
The images DALL-E 2 creates therefore depend entirely on the text and image data it has taken from the web, which can introduce biases, such as a greater likelihood of generating images of men than of women.
But it is this dependency on online data that has created the legal dilemma surrounding AI-generated content. Since AI art generators have partly trained on data taken from artists’ copyrighted work, patterns from their work can sometimes re-emerge in the AI’s product image.
It is this that led image repository Getty Images to sue Stable Diffusion creator Stability AI for allegedly copying and processing millions of copyrighted images to train the AI art generator.
Several images generated by Stable Diffusion appeared to show distorted versions of Getty Images’ copyright watermark, which Getty believed demonstrated that the company had “unlawfully copied and processed” images from its website for its own commercial use.
It’s not just AI art generators that have faced legal action because of traces of training data re-surfacing in AI-generated content.
The AI-copyright paradox
In both of these cases, it remains unclear if these companies scraped data from the web illegally. But, for now, at least, there is no law to stop AI companies from scraping data from the web – copyrighted or not.
That is because AI image generators and chatbots cannot be considered the author of the material they produce.
Their outputs are simply a culmination of human work, much of which has been scraped from the internet and is already copyright protected.
And while AI companies may be behind the output that AIs produce, they are not the creators themselves. With no author, there is no copyright infringement.
AI art and fair use
Even if AI art generators were found to have illegally scraped data from artists’ copyrighted original work, the production of AI art based on this work would likely still be permitted under fair use.
Fair Use allows copyright-protected work to be repurposed under several conditions, one of which is that the end product is transformative rather than “substantially similar” to the original copyrighted image.
Since AI art is a collage of data taken from images rather than fragments of the images themselves, it has an inherently transformative purpose relative to the original images the generator was trained on.
“For conventional images, currently, there’s legal precedence that derivative works are protected by the copyright of the original creator, but transformative works are not,” image generator DeepAI’s founder Kevin Baragona said in an interview with EM360.
“Collages are somewhere in the middle and depend on how the collage is used. Where will AI-generated content fall? My guess is it will be a new, yet-to-be-defined category.”
Of course, copyright laws are constantly evolving, and lawmakers in the EU are already investigating new ways to regulate generative AI given the wave of lawsuits against AI companies and continued complaints from the creative industry.
Research into privacy concerns suggests it is unlikely that a diffusion-based model will produce outputs that closely resemble one of its inputs, but it is still possible.
The chances of an image in the training data set being duplicated in an output, even from a prompt specifically designed to do just that, remain extremely low.
Still, there are, at most, a handful of rightsholders globally who might have a copyright claim. So far, no existing lawsuit suggests that its plaintiffs are in that category.