Make illegally trained LLMs public domain as punishment

🃏Joker@sh.itjust.works · 3 days ago

ClamDrinker@lemmy.world · edited 2 days ago

Plagiarism is not the same as copyright infringement. Why you think people probably plagiarize is doubly irrelevant then.

I never claimed it was. But as I said before, it is irrelevant, because copyright infringement differs from place to place depending on local laws, while plagiarism is usually the concept that guides the ethical position from which those laws are produced, which is why, yes, it’s relevant.

Show me literally any example of the defendant’s use of “analysis” having any impact whatsoever in a copyright infringement case or a law that explicitly talks about it, or just stop repeating that it is in any way relevant to copyright.

This is an unreasonable request, and you know it. Again, we don’t share the same laws, and different jurisdictions provide different exceptions like fair use, fair dealing, or just straight up exclusion from copyright for their use. But it is wholly beside my argument. You can look at any piece of modern media that exists in the same space and see ideas the two share, while not sharing the same expression of that idea. How some characters fulfill the same purpose, dress the same way, or have similar personalities. You are free to make a book with a plumber, a mustached man, someone wearing a red hat with the letter M on it, and someone that goes to save a princess from a castle, but if they’re not the same person they are most likely not considered to be the protected expression of Mario. Same ideas that make up Mario, one infringing, the other not.

Nobody goes to court over this because EVERYONE takes each other’s ideas: “Good artists copy, great artists steal”. It’s only when you step on the specific expression of an idea that it becomes realistically actionable, and at that point transformativeness is definitely discussed almost every single time, because it is critical to determining whether the copyright was actually infringed.

Wrong. The “all together” and “without adding new patterns” are not legal requirements. You are constantly trying to push the definition of copyright infringement to be more extreme to make it easier for you to argue.

I’m sorry, but are you really being this dishonest? I mentioned EXPLICITLY in my last comment that I wasn’t giving a definition of copyright infringement, because it’s beside the point and not what I’m claiming. Yet here you are saying I am “trying to push” a definition. We are not lawyers or law scholars speaking to each other; I am having a discussion with you as another anonymous person on a message board.

Unfortunately, an AI has no concept of ideas, and it simply encodes patterns, whatever they might happen to be.

You are just arguing semantics and linguistics; it’s meaningless. We are not talking technical specifics, not even a specific model, nor a specific technique that would specify exactly how the information is encoded. It’s a rough concept of “ideas” / “data” / “patterns”: information. And AI definitely has that.

Again, you’re morphing the discussion to make an argument.

You mean, I’m making an argument. Because yes. I am. I don’t see why this negative framing is necessary nor why this is noteworthy enough to bring up, unless you really just want to make me look bad for no apparent reason.

Mario’s likeness has to be encoded into the model in some way. Otherwise, this would not have been the image generated for “draw an italian plumber from a video game”. There is absolutely nothing in the prompt to push GPT-4 to combine those elements. There are also no “new” patterns, as you put it. That’s exactly the point of the article. As they put it:

Yes, there is some idea/pattern of “Mario-ness” in the model; I said that. I was not trying to say that no material of Mario was used in training, but that it’s not like someone pasted direct images of Mario in there. AI models make logical connections between concepts, even for things we cannot put a good name to, and will allow you to prompt for them, but that does not mean you should.

Clearly, these models did not just learn abstract facts about plumbers—for example, that they wear overalls and carry wrenches. They learned facts about a specific fictional Italian plumber who wears white gloves, blue overalls with yellow buttons, and a red hat with an “M” on the front.

These are not facts about the world that lie beyond the reach of copyright. Rather, the creative choices that define Mario are likely covered by copyrights held by Nintendo.

I sort of already explained this without mentioning this specific example, but I’ll make it extra clear.

In the article they prompted the AI for a “video game Italian plumber”. What person, if you asked them to think of an “Italian video game plumber”, would not think of Mario? Maybe Luigi? I’ll tell you: almost nobody, because there are very damn few famous Italian video game plumbers. The prompt is already locked in on Mario, and even humans make the logical connection to Mario. The model might have had billions of images and texts to use, but any time a relation to an “Italian video game plumber” showed up, there’s Mario.

So this whole point the article makes about it not learning abstract facts about plumbers is completely moot, because they completely biased the outputs towards receiving what they wanted to receive. If you ask for just a plumber, for which it does have many, many results, it will make more generalizations and become less specific, because there are more than 2 examples of plumbers in other types of situations. Humans do this exact same thing in the same task, yet somehow the AI must be immune to this despite being an artificial version of the biological thing. And that is why analysis is protected: humans simply cannot stop doing it, and everyone is tainted by their knowledge of Mario, even though for whatever reason we might need to use one of the ideas Mario is built upon. And this is why AIs use this same defense. I can say this regardless of the jurisdiction because, unless you live in some kind of dictatorship, this is generally true.

Sadly, this kind of deceptive framing of AI output is common, particularly among those that are biased against AI. Sometimes it’s unintentional, but frequently specific parameters are used that will just generate specific bad results, ignoring that this may not even represent 0.001% of what the model can generate in normal situations.

This is contradictory to how you present it as “taking ideas”.

It is not. You can use the idea of Mario; you cannot use the totality of Mario. For the AI to be able to use the idea of Mario, it will also ‘learn’ the totality of Mario in the process, as Mario is a collection of ideas that are extracted. But those ideas are stored separately so they can be individually prompted for. You can prompt it to make Mario because, like almost every person in society, it knows what ideas make up Mario better than I can put into words here. If I hire a human artist to make me a “video game Italian plumber”, their first question to me would be “Oh, something like Mario?” and their second response will be “Oh I can’t do that, and you should not want to, because you don’t own Mario.” Humans use AI, so they need to be the ones to give that second response.

Just because a kitchen knife can be used to stab someone doesn’t mean we produce kitchen knives for stabbing people. Likewise, just because an AI can be used to infringe does not mean that it is produced to infringe. This is evidenced by the vast majority of other ways it can be used that don’t infringe, which becomes self-evident after just tinkering around with it for a little while.

You’re mixing up different things. I’m saying that the image contains infringing material, which is hopefully not something you have to be convinced about. The production of an obviously infringing image, without the infringing elements having been provided in the prompt, is used to show how this information is encoded inside the model in some form. Whether this copyright-protected material exists in some form inside the model is not an equivalent question to whether this is copyright infringement. You are right that the courts have not decided on the latter, but we have been talking about the former. I repeat your position which I was directly responding to before:

If it’s anything like the examples before, then the AI has definitely been prompted by the user to make infringing elements.

But anyways, to the question, you just don’t seem to grasp that collections of ideas can communicate copyright infringing material without being infringing on their own. It’s like arguing that if Paint or Photoshop knows about the color red, this is copyright infringing because it’s the same red that Mario uses. None of the ideas that make up Mario are infringing, and none of them can be copyrighted. They are what the AI is designed to extract, not Mario as a totality.

You can definitely use AI to make an infringement machine by making it less likely to make leaps in ideas and only combine the ideas it’s been trained on, which we as humans can do as well in the form of plagiarism and forgery. But if you’re going to be unethical, why use an AI when you might as well just take the easy route directly with print screen or a photo? Those are two other technologies we didn’t ban for having this ability to capture copyrighted material, even though they copy the material far more blatantly.

This is where good AI usage deviates, because it instead tries to MAXIMIZE the number of leaps and connections the AI makes, to leave as little possibility as it can of making something infringing. Even honest people trying to make new creative works sometimes have to change things because they might be too close to being infringing.

patatahooligan@lemmy.world · 2 days ago

You mean, I’m making an argument. Because yes. I am. I don’t see why this negative framing is necessary nor why this is noteworthy enough to bring up, unless you really just want to make me look bad for no apparent reason.

I don’t understand how you expect me to not point out that you are using inequivalent concepts interchangeably and reaching conclusions different to what you initially stated.

No, seriously, this is the only part of the comment that is relevant:

They are what the AI is designed to extract, not Mario as a totality.

And it is stated as fact, in the face of evidence to the contrary.

Here, I’ll make it simple. Do you disagree with any of the statements below?

1. There is a combination of elements that is protected by copyright regardless of whether any completely individual element would be protected. This “Mario-ness” or “totality of Mario” or whatever you want to call it.
2. The Mario picture contains the “Mario-ness”.
3. The prompt does not include most of those elements and very clearly does not contain the “Mario-ness”.

If any of the above seem false to you, explain why. Otherwise explain where this Mario-ness in the image came from. Explain how your answer relates to the initial statement that models detect empirical, factual observations about the material they are shown, which cannot be copyrighted.

That is the only thing that would be on topic. Everything else is just rambling. If you don’t argue in favor of your position I reacted to, or if you don’t understand the counter-point and respond clearly to it, then why are you replying to me at all?