A malicious move by users would be to employ an LLM instructed to write plausible-sounding but very wrong answers to historical and current questions, then have an army of users upvote the known-wrong answers while downvoting accurate ones. I would think this would poison the data.
All use of generative AI (e.g., ChatGPT and other LLMs) is banned when posting content on Stack Overflow.
This includes "asking" the question to an AI generator then copy-pasting its output as well as using an AI generator to "reword" your answers.
Interestingly, I see nothing in that policy that would disallow machine-generated downvotes on proper answers and machine-generated upvotes on incorrect ones. So even if LLMs are banned from posting questions or comments, it looks like Stack Overflow is perfectly fine with bots voting.
For years, the site had a standing policy that prevented the use of generative AI in writing or rewording any questions or answers posted. Moderators were allowed and encouraged to use AI-detection software when reviewing posts. Beginning last week, however, the company began a rapid about-face in its public policy towards AI.
I listened to an episode of The Daily on AI, and the stuff they fed into the engines included the entire Internet. They literally ran out of things to feed it. That's why YouTube created their auto-generated subtitles: literally, so that they would have more material to feed into their LLMs. I fully expect Reddit to be bought out or merged within the next six months or so. They are desperate for more material to feed the machine. Everything is going to end up going to an LLM somewhere.
I think auto-generated subtitles were to fulfill an FCC requirement for content subtitling, some years ago. It has, however, turned out super useful for LLM feeding.
Frankly, the solution here isn’t vandalism, it’s setting up a competing site and copying the content over. Stack Overflow’s license makes that explicitly legal. Anything else is just playing around and hoping that a company acts against its own interests, which has rarely ever worked before.
I’m not saying vandalism is illegal. I’m saying that it borders on immoral, and that there is a better, more radical (and thus effective) alternative that one might expect to be illegal but in fact isn’t.
Angry users claim they are entitled to delete their own content from the site through the "right to be forgotten," a common name for a legal right most effectively codified into law through the EU's General Data Protection Regulation (GDPR). Among other things, the act protects the ability of consumers to delete their own data from a website, and to have data about them removed upon request. However, Stack Overflow's Terms of Service contains a clause carving out Stack Overflow's irrevocable ownership of all content subscribers provide to the site.
It really irritates me when a ToS simply states that it will go against the law.
It's not quite that simple, though. GDPR is only concerned with personally identifiable information. Answers and comments on SO rarely contain that kind of information as long as you delete the username on them, so it's not technically against GDPR if you keep the contents.
Frankly I don’t see any way whatsoever that this would fly, and that’s a good thing!
Imagine what it would mean for software development if one angry dev could request the deletion of all their contributions at a moment's notice by pointing to a right to be forgotten. Documentation is really not meaningfully different from that.
It’s true that it’s mostly a symbolic act, but the rebellion matters, especially from old accounts. It’s also a nice way to mark the time after which I never participated in SO again. After my ban expires, I’ll deface my questions again. And again. Until they permaban me.
There’s also the possibility of adding to the wonderful irony of making the AI more useful than the original by having content that’s no longer accessible through the original. It doesn’t get more enshittified than that, even if Prashanth Chandrasekar is too out of touch to ever regret his decision.
I think you're 100% correct in assuming they've already fed it data scraped from SO. I've previously gotten code samples from ChatGPT that were clearly from SO, down to the comments in the code. I even reverse-searched some of the code and found the question it was from.
They seem to only be watching the questions right now. You’re automatically prevented from deleting an accepted answer, but if you answered your own question (maybe because SO was useless for certain niche questions a decade ago so you kept digging and found your own solution), you can unaccept your answer first and then delete it.
I got a 30 day ban for “defacing” a few of my 10+ year old questions after moderators promptly reverted the edits. But they seem to have missed where I unaccepted and deleted my answers, even as they hang out in an undeletable state (showing up red for me and hidden for others).
And comments, which are a key part to properly understanding a lot of almost-correct answers, don’t seem to be afforded revision history or to have deletes noticed by moderators.
So it seems like you can still delete a bunch of your content, just not the questions. Do with that what you will.
At the end of the day, this is just yet another example of how capitalism is an extractive system. Unprotected resources are used not for the benefit of all but to increase and entrench the imbalance of assets. This is why they are so keen on DRM and copyright and why they destroy the environment and social cohesion. The thing is, people want to help each other; not for profit but because we have a natural and healthy imperative to do the most good.
There is a difference between giving someone a present and then them giving it to another person, and giving someone a present and then them selling it. One is kind and helpful and the other is disgusting and produces inequality.
If you're gonna use something for free then make the product of it free too.
An idea for the fediverse and beyond: maybe we should be setting up instances with copyleft licences for all content posted to them. I actually don't mind if you wanna use my comments to make an LLM. It could be useful. But give me (and all the other people who contributed to it) the LLM for free, like we gave it to you. And let us use it for our benefit, not just yours.
This seems like a very fair and reasonable way to deal with the issue.
Agreed on that last part, making that the default would be a great solution. I could also use a signature in comments, like that guy who always puts the "Commercial AI thingy" but automatically.
Copyleft licenses are anti-copyright copyright licenses. They guarantee any random person the right to use and (usually) modify and (usually) distribute the work (art, program, etc.), with some noteworthy terms and conditions. Open access is where they provide a good or service for free but are not legally required to do so.
I bitch about it not being open-sourced like Llama 2.
I think you still have to have an account (last time I used it anyway), but you're right, there is a tier you don't have to pay any money for. It's just an email address but whatever. You can use it via their website but afaik they haven't released a free model based on the data they've scraped off us, so you can't host it on your own hardware and properly do what you want with it. I have heard though that commercial websites were/are using ChatGPT bots for customer service and you can easily use the customer service chatbots on their website to do other random stuff like writing bash scripts or making yo mama jokes.
Maybe a better act of rebellion would be to scrape the data on Stack Overflow, self-host it, and move to an open-source platform. Easy for me to say, though, when I've only ever coded Hello World.
Why does OpenAI want 10-year-old answers about using jQuery whenever anyone posts a JavaScript question, followed by aggressive policing of what is and isn't acceptable to re-ask as technology moves on?
It was basically helping people deal with ancient browsers (particularly IE6) and a javascript runtime bereft of convenience features, at a cost of some syntactic awkwardness and performance.
If you are targeting ES2020 and above, as is widely considered a reasonable requirement, you pretty much have the stuff that jQuery brings to the table, but built in, without an additional download and without an abstraction that costs some cycles.
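To make that concrete, here's a minimal sketch (my own examples, not from the thread) of a few jQuery-era helpers next to their built-in ES2020+ equivalents; the jQuery calls are shown only in comments since they require the library:

```javascript
// $.extend({}, defaults, options)  →  object spread
const options = { ...{ retries: 3, timeout: 500 }, ...{ timeout: 1000 } };

// $.inArray(2, arr) !== -1  →  Array.prototype.includes
const hasTwo = [1, 2, 3].includes(2);

// $.trim(s)  →  String.prototype.trim
const clean = "  hello  ".trim();

// hand-written guards for deep property access
// →  optional chaining + nullish coalescing (both ES2020)
const config = { server: { port: 8080 } };
const port = config.server?.port ?? 3000;

console.log(options.timeout, hasTwo, clean, port); // 1000 true hello 8080
```

The DOM-manipulation side of jQuery has similar built-in counterparts (`querySelector`, `classList`, `fetch`), which is why dropping the dependency is usually painless for modern targets.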
Instead of solely deleting content, what if authors had instead moved their content and answers to something self-owned? Can SO even legally claim ownership of the content on their site? Seems iffy, in my own ignorant take.
Well, I suppose in that case, protesting via removal is fine, IMO. I think the constructive next step would be to create a site where you, the user, own what you post. Does Reddit claim ownership over posts? I wonder what Lemmy's "policies" are, and whether this would be good grounds (here) to start building something better than what SO was doing.
A SO alternative cannot exist if a user who posted an answer owns it. That defeats the purpose of sharing your knowledge and answering questions as it would mean the person asking the question cannot use your answer.
Couldn't these owners dictate how their creations are used? If you don't own it, you don't even get a say.
That's the point of platforms like SO - you give away your knowledge, for free, for everyone, for any use case. If a user can restrict the use of their answers, then it makes no sense for SO to exist. It's like donating food to a food bank and saying that your food should only go to white people and not black people.
I'm not sure I agree with your example - it's more like giving the owners of the donation the ability to choose WHO they are donating to. That means choosing not to donate to companies that might take your food donation and sell it as damaged goods, for example. I wouldn't want my donation to be used that way. That's how I see it, anyway.
So does that mean anyone is allowed to use said content for whatever purposes they'd like? That'd include AI stuff too, I think? Interesting twist there; I hadn't thought about it like this yet. Essentially, posters would be agreeing to share that data/info publicly. No different than someone learning how to code by looking at examples made by their professors or someone else doing the teaching/talking, I suppose. Hmm.
CC (not sure about MIT) virtually always requires attribution, but as GitHub Copilot showed, right now open-"media" authors have basically no way of enforcing their rights.
In most jurisdictions you can't give away copyright - that's why CC0 exists. And again, most open-source and CC licences require attribution; if you use those licences, you have a right to be attributed.
For super permissive licenses like MIT, it's probably fine. Maybe folks would need to list the training data and all the licenses, since a common requirement of even the most permissive licenses is to include a copy of the license.
As far as I know, a court hasn't ruled on whether clauses like "share-alike" or "copyleft" (think CC BY-SA or GPL) would require anything special of models, or disallow them. Anyone saying otherwise is just making a best guess. My best guess is (pessimistically) that it won't do any good, because things produced by a machine cannot be copyrighted. But I haven't done much of a deep dive. I got really interested in the differences between many software licenses a few years back and did some reading, but I'm far from an expert.
Regardless of the license (apart perhaps from public domain), it is legally still your copyright, since you produced the content. Pretty sure in the EU they cannot prevent you from deleting your content.
But those two licenses give everyone an irrevocable right to do certain things with your content forever and displaying it on a website is one of those things (assuming they follow the other requirements of the license).
You can when it comes to copyright. That's EU law, and anything else would be such a horrible idea that no country would ever set up a law saying otherwise.
If you could simply revoke copyright licenses, you would completely kill any practicality of selling your copyrighted works, and it would fully undermine any purpose copyright served in the first place.