AI is making me paranoid about contributions

lemmydividebyzero@reddthat.com · 1 day ago

ryannathans@aussie.zone · 1 day ago

Open source maintainers aren’t here to teach contributors how to write better code; we’re here for maintenance of the project. The review prevents shit getting merged. Humans write shit too. This is what reviews are for.

Azzu@leminal.space · edit-2 1 day ago

That is just mostly wrong. Around 90% of the time, when you do a review, just fixing the issue that you found is much faster than explaining the issue and saying what needs to be done instead.

Reviews plainly are for educating the contributor on what constitutes “non-shit” (using your terminology) code on the repo. If that wasn’t the case, you could just not do a review and just change the code, without any interaction at all. Why would you communicate the change that needs to be done otherwise?

Rarely of course, something is so complicated that it actually takes more time to come up with the right code than do a review. But that is only a rare thing.

brucethemoose@lemmy.world · edit-2 11 hours ago

> Rarely of course, something is so complicated that it actually takes more time to come up with the right code than do a review. But that is only a rare thing.

This is definitely a thing though.

On this very topic, many llama.cpp PRs are good examples. A model trainer may present a PR with poor understanding of the (very complicated, highly specialized, sparsely documented) project. Then a maintainer comes to fix it, but has absolutely no knowledge of certain things the model trainer would know (“Oh, the whole thing NaNs if this one value on layer 23 isn’t FP32!”)

There has to be a back-and-forth. A whole lot of it.

That is an exception, yeah.

But I’m not sure I’d call it “rare.” There are definitely situations where fixing without explaining is ultimately a whole lot of work.

ryannathans@aussie.zone · edit-2 23 hours ago

I don’t need to explain the issue, that’s what the issue report does

I’m sure every project is a little different. The one I maintain has well over 1000 merged PRs now (2000 if you count the old repo), and I’d be dead if I did even 1/4 of the work contributors do

Plus, even maintainers must have a code review and functional testing on their PRs, so doing the work yourself doesn’t relieve the human workload that must be done. It actually increases total maintainer effort to do the work yourself

Azzu@leminal.space · 19 hours ago

I’m not talking about the work contributors do, obviously that is invaluable.

But if you do a review, and you see that a function should be extracted at one point to avoid code duplication, is it really faster to tell the contributor that a function needs to be extracted there, compared to just extracting it yourself as you see it?

The value of a review is collaborative truth finding and learning. If there is an LLM on the other end, that’s just not happening.

ryannathans@aussie.zone · 10 hours ago

If I do it myself, I might be missing the reason they did it that way. The dialogue is useful, even with a machine capable of reasoning.

If I do it myself, now yet another person has to get involved because a person who has written code cannot be an impartial reviewer.

MangoCats@feddit.it · 13 hours ago

The value of any given contribution is the same, regardless of whether the code was written by a seasoned developer, a neophyte as a first project, an LLM, a team of high school students learning the language, or space aliens - the code is the code, it helps or hurts exactly the same when merged with zero connection to who or what wrote it.

Caring about who or what wrote the code is applying prejudice. Prejudice works well in a lot of cases, but it’s no guarantee.

If you are accepting submissions from anonymous, or insecurely identified (same thing, really), contributors, they should all be treated with zero prejudice. You might think you know who or what wrote the code based on the name in the linked e-mail address, the way comments are (or aren’t) written, or a million other “tells” in the code that aren’t about the function of the code - that’s really irrelevant. What’s relevant is: what does the code actually do after it’s merged.

If you’re trusting code because you think its “tells” track with seasoned developers, be prepared - very very soon - for maliciously crafted code full of “seasoned developer” tells to slip in backdoors and other malware, because bad actors are already using AI to mimic the things you want to see in a submission in order to gain your trust and lower your guard against them slipping in the things they want in your code base.

WhyJiffie@sh.itjust.works · 11 hours ago

> The value of any given contribution is the same, regardless of whether the code was written by a seasoned developer, a neophyte as a first project, an LLM, a team of high school students learning the language, or space aliens - the code is the code, it helps or hurts exactly the same when merged with zero connection to who or what wrote it.
>
> Caring about who or what wrote the code is applying prejudice. Prejudice works well in a lot of cases, but it’s no guarantee.

the blog post is not about who actually wrote the code, but whether it’s worth the effort to do a thorough review. if an actual person made it, then yes, because they can learn from it and the world becomes a slightly better place. if it was a vibecoder just using an LLM, then explaining what needs to be done and why does not add much to the world, but it possibly helps to make the LLM company richer

Deebster@infosec.pub · 1 day ago

Part of being a maintainer is helping to onboard new contributors; this is why many projects have a tag for “good first issue”. Teaching people how to use the library/tool is part of that.