If AI sucks up all the information in the world and spits out an answer to a question in a randomized manner from a sliced-and-diced mish-mash of copyrighted material, does that constitute some sort of actionable copyright violation? The answer is a resounding NO.
This debate is a big deal, since every creator, myself included, sees their creations as sacrosanct, despite the fact that we're all ripped off at some point.
But there is a philosophical question at the heart of this discussion. For some reason, people see the entire AI fuss as a scam (which it may still prove to be). To them, it is, at its core, a parlor trick whereby a large cache of information, referred to in the business as an LLM (large language model), is rejiggered by a computer to spit out answers to questions based on probability models.
The probability models can place words in a certain order to construct sentences that make sense. This is promoted as “predictive” language construction. The same idea is engineered to work with art and animation.
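To picture that parlor-trick view, here's a toy sketch of my own devising (a simple bigram word counter; nothing remotely like a production system): it tallies which word tends to follow which, then “writes” by rolling weighted dice.

```python
from collections import Counter, defaultdict
import random

# Toy training text; a real system ingests billions of documents.
corpus = "the cat sat on the mat and the cat ate and the dog sat".split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Pick a next word in proportion to how often it followed `word`."""
    words, weights = zip(*follows[word].items())
    return random.choices(words, weights=weights)[0]

# "Write" a sentence one predicted word at a time.
word = "the"
sentence = [word]
for _ in range(7):
    if not follows[word]:  # dead end: the word was never seen mid-text
        break
    word = predict_next(word)
    sentence.append(word)
print(" ".join(sentence))
```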
In fact, none of this is exactly true, except on a very superficial level. Rather, under the hood of an LLM are complex neural networks that mimic the brain. These subsystems manage the “predictive” writing in a manner that imitates the way humans think.
If one can accept that notion (neural networks have been under development for over 40 years, by my count), then all creative human endeavor has to be a copyright violation. We all use neural networks (our brains) to take in as much information as possible (most of it copyrighted). We reorganize it and answer questions the same way AI does.
So what's the fuss? And what is the difference?
The fuss is the fact that AI can suck up the entire Library of Congress and “know” more than any human, giving it an edge in answering esoteric queries.
The difference is that AI can never have any sensory or emotional feelings, which creates a dimensionality to knowledge that a machine cannot duplicate. It doesn't burn its fingers on a hot stove or fall in love or get mad about a favorite sports team losing a game because of a bonehead mistake.
AI, in the end, is an information retrieval and analysis system that can outperform people.
This is not much different from the first important use of the computer: an amazing calculating machine that can outperform humans at computational tasks.
In the case of AI, the ability to “outperform” stems from being able to absorb far more raw information, reconcile the discrepancies, and spit back a re-imagined truth.
This is exactly what humans do on a smaller scale with less input. But, as mentioned above, the majority of the input taken in by humans is also copyrighted material. So what is the difference? Are we violating copyright when we rejigger information about Herbert Hoover and produce an analysis? Few people use original materials, such as the private diaries of Hoover, or look for relatives to interview. We absorb maybe 10 sources, run them through a neural network called the human brain, and produce an analysis.
The AI machine does the same thing using an artificial neural network and a lot more source text.
With both systems, human and machine, you can violate copyright. With a human, if a new text consists of the exact same sentences as an older text, it's called plagiarism; longer stolen passages would be an actual copyright violation. Shouldn't this be the exact same standard for AI-generated text, art, and analysis?
In other words, protections are already in place.
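That standard is even testable. Here's a minimal sketch (my own illustration, not an established tool, and the eight-word threshold is an arbitrary choice, not a legal test) that flags verbatim word runs shared between a source text and a machine's output, the same way plagiarism checkers flag student papers:

```python
import re

def verbatim_runs(source: str, generated: str, min_words: int = 8):
    """Return every run of at least `min_words` words that appears
    verbatim in both texts: a crude plagiarism check."""
    # Tokenize to lowercase words, ignoring punctuation.
    src = re.findall(r"[a-z']+", source.lower())
    gen = re.findall(r"[a-z']+", generated.lower())
    # Collect every word run of the given length from the source.
    src_ngrams = {tuple(src[i:i + min_words])
                  for i in range(len(src) - min_words + 1)}
    # Report each run in the generated text that matches one.
    return [" ".join(gen[i:i + min_words])
            for i in range(len(gen) - min_words + 1)
            if tuple(gen[i:i + min_words]) in src_ngrams]

book = ("It was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the age of foolishness")
output = "As the old line goes, it was the best of times, it was the worst of times."
print(verbatim_runs(book, output))  # prints the overlapping runs
```

Real checkers are far more sophisticated, but the point stands: verbatim copying is detectable after the fact, whoever, or whatever, did the copying.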
While you see lots of bitter complaining about “the machine read my book and stored it in the LLM, that's a copyright violation,” very few examples of actual plagiarism exist thus far. So the complaining is about the mechanism of absorbing the material, the way a person does? That's it?
Well, it's different because the machine can store more and it retains every single word, which must be a copyright violation because a perfect copy is in memory, somehow, somewhere.
But what if I bought a copy of the book and kept it on my bookshelf for instant reference—how is that different?
OK, then all of the resources in the LLM that are not in the public domain need to be paid for, the way you pay for a book, and that same fee must be paid again for every copy of the LLM.
This is the only argument that should make sense to the copyright owners. Licensing of magazine articles would also be in play. While public-domain material could populate an LLM, questions about current events or technologies must be answered from new, copyrighted material.
How does this get solved? The easiest way is for the AI companies to buy large publishing houses that have massive caches of published copyrighted material that they can license to themselves without a problem. As print publishing of all sorts begins to fade, now is the time to make deals.
Other piecemeal deals could be, and need to be, made now. Then this copyright debate will be over for good and we can see where this goes.
And while AI writing can never capture the creativity of human expression, few writers can manage the chore, either. The machine is already doing a better job of rewriting and rethinking than people do. And, no, none of this column was written by AI.
AI art and animation seem further along and more problematic, for reasons other than copyright. I'll leave that for another column.
Until then, I'd recommend you use AI as a great search tool, and double-check the accuracy. But for writing in particular, this is not going away anytime soon. Find a use for it. — JCD