• 0 Posts
  • 155 Comments
Joined 2 years ago
Cake day: July 14th, 2023

  • I think the best way to handle this would be to just encode everything and upload all files. If I wanted some amount of history, I’d use some file system with automatic snapshots, like ZFS.

    If I wanted to do what you’ve outlined, I would probably use rclone with filtering for the extension types or something along those lines.

    If I wanted to do this with Git specifically, though, this is what I would try first:

    First, add lossless extensions (*.flac, *.wav) to my repo’s .gitignore

    Second, schedule a job on my local machine that:

    1. Watches for changes to the local file system (e.g., with inotifywait or fswatch)
    2. For any new lossless file, if there isn’t already an accompanying lossy file (i.e., one identified by being colocated, having the exact same filename, sans extension, with an accepted extension, e.g., .mp3 or .ogg - possibly also with a confirmation that the codec is up to my standards via a call to ffprobe, avprobe, mediainfo, exiftool, or something similar), it encodes the file to my preferred lossy format.
    3. Use git status --porcelain to check whether there have been any changes.
    4. If so, run git add --all && git commit --message "Automatic commit" && git push
    5. Optionally, automatically craft a better commit message by checking which files have been changed, generating text like Added album: "Satin Panthers - EP" by Hudson Mohawke or Removed album: "Brat" by Charli XCX; Added album "Brat and it's the same but there's three more songs so it's not" by Charli XCX
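
    A minimal sketch of that local job (steps 1-4), assuming the library lives in ~/music, inotifywait (from inotify-tools) for watching, ffmpeg encoding to Ogg Vorbis, and a repo that’s already initialized - paths, extensions, and codec settings are placeholders to adjust:

    #!/usr/bin/env bash
    # Sketch only: watch the library, encode new lossless files, then commit and push.
    LIBRARY="$HOME/music"   # hypothetical path

    inotifywait --monitor --recursive -e close_write -e moved_to --format '%w%f' "$LIBRARY" |
    while read -r file; do
        case "$file" in
            *.flac|*.wav)
                lossy="${file%.*}.ogg"
                # Only encode if there isn't already an accompanying lossy file
                if [ ! -f "$lossy" ]; then
                    ffmpeg -nostdin -i "$file" -c:a libvorbis -q:a 6 "$lossy"
                fi
                ;;
        esac
        # Commit and push only if something actually changed
        if [ -n "$(git -C "$LIBRARY" status --porcelain)" ]; then
            git -C "$LIBRARY" add --all &&
            git -C "$LIBRARY" commit --message "Automatic commit" &&
            git -C "$LIBRARY" push
        fi
    done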

    Third, schedule a job on my remote server that runs git pull at regular intervals.
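
    On the server side, a plain cron entry is enough for that; the path and 15-minute interval below are just placeholders:

    # crontab -e on the server
    */15 * * * * cd /srv/music && git pull --ff-only --quiet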

    One issue with this approach is that if you delete a file (as opposed to moving it), the space is not recovered on your local or your server. If space on your server is a concern, you could work around that by running something like the answer here (adjusting the depth to an appropriate amount for your use case):

    git fetch --depth=1                               # shallow-fetch, keeping only the most recent commit
    git reflog expire --expire-unreachable=now --all  # expire reflog entries that keep old objects reachable
    git gc --aggressive --prune=all                   # repack and prune the now-unreachable objects
    

    Another potential issue is that what I described above involves having an intermediary Git remote to push to and pull from, e.g., running on a hosted Git forge, like GitHub, Codeberg, etc… This could result in getting copyright complaints or something along those lines, though.

    Alternatively, you could use your server as the Git server (or check out Forgejo if you want a Git forge as well), but then you can’t use the above trick to prune file history and save space from deleted files (on the server, at least - you could on your local, I think). If you then check out your working copy on the server in a way that lets Git use hard links, you should at least be able to avoid needing to store two copies on your server.
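
    If you do go the self-hosted route, a bare repository over SSH is all Git needs; the paths, hostname, and branch name below are made up:

    # On the server: create a bare repository to push to
    git init --bare /srv/git/music.git

    # On the local machine: point the repo at it and push
    git remote add origin ssh://user@your-server/srv/git/music.git
    git push --set-upstream origin main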

    The other thing to check out, if you take this approach, is git lfs. EDIT: Actually, I take that back - you probably don’t want to use Git LFS.



  • You can run a NAS with any Linux distro - your limiting factor is having enough drive storage. You might want to consider something that’s great at running virtual machines (e.g., Proxmox) if you don’t like Docker, but I have almost everything I want running in Docker and haven’t needed to spin up a single virtual machine.


  • You don’t have to finish the file to share it though, that’s a major part of bittorrent. Each peer shares parts of the files that they’ve partially downloaded already. So Meta didn’t need to finish and share the whole file to have technically shared some parts of copyrighted works. Unless they just had uploading completely disabled,

    The argument was not that it didn’t matter if a user didn’t download the entirety of a work from Meta, but that it didn’t matter whether a user downloaded anything from Meta, regardless of whether Meta was a peer or seed at the time.

    Theoretically, Meta could have disabled uploading but not blocked their client from signaling that they could upload. This would, according to that argument, still count as reproducing the works, under the logic that signaling that it was available is the same as “making it available.”

    but they still “reproduced” those works by vectorizing them into an LLM. If Gemini can reproduce a copyrighted work “from memory” then that still counts.

    That’s irrelevant to the plaintiff’s argument. And beyond that, it would need to be proven on its own merits. This argument about torrenting wouldn’t be relevant if Llama were obviously a derivative creation that wasn’t subject to fair use protections.

    It’s also irrelevant if Gemini can reproduce a work, as Meta did not create Gemini.

    Does any Llama model reproduce the entirety of The Bedwetter by Sarah Silverman if you provide the first paragraph? Does it even get the first chapter? I highly doubt it.

    By the same logic, almost any computer on the internet is guilty of copyright infringement. Proxy servers, VPNs, basically any compute that routed those packets temporarily had (or still has for caches, logs, etc) copies of that protected data.

    There have been lawsuits against both ISPs and VPNs in recent years for being complicit in copyright infringement, but that’s a bit different. Generally speaking, there are laws, like the DMCA, that specifically limit the liability of network providers and network services, so long as they respect things like takedown notices.


  • Why should we know this?

    Not watching that video for a number of reasons, namely that ten seconds in they hadn’t said anything of substance, their first claim was incorrect (Amazon does not prohibit the use of gen AI in books, nor does it require that its use be disclosed to the public, no matter how much you might wish it did), and there was nothing of substance in the description, which in instances like this generally means the video itself will be largely devoid of substance.

    What books is the Math Sorcerer selling? Are they the ones on Amazon linked from their page? Are they selling all of those or just promoting most of them?

    Why do we think they were generated with AI?

    When you say “generated with AI,” what do you mean?

    • Generated entirely with AI, without even editing? Then why do they have so many 5 star reviews?
    • Generated with AI and then heavily edited?
    • Written partly by hand with some pieces written by unedited GenAI?
    • Written partly by hand with some pieces written by edited GenAI?
    • AI was used for ideation?
    • AI was used during editing? E.g., Grammarly?
    • GenAI was used during editing? E.g., “ChatGPT, review this chapter and give me any feedback. If sections need to be rewritten, go ahead and take a first pass.”
    • AI might have been used, but we don’t know for sure, and the issue is that some passages just “read like AI?”

    And what’s the result? Are the books misleading in some way? That’s the most legitimate actual concern I can think of (I’m sure the people screaming that AI isn’t fair use would disagree, but if that’s the concern, settle it in court).


  • Further, “Whether another user actually downloaded the content that Meta made available” through torrenting “is irrelevant,” the authors alleged. “Meta ‘reproduced’ the works as soon as it made them available to other peers.”

    Is there existing case law for what making something “available” means? If I say “Alright, I’ll send you this book if you want, just ask,” have I made it available? What if, when someone asks, I don’t actually send them anything?

    I’m thinking outside of contexts of piracy and torrenting, to be clear - like if a software license requires you to make any changed versions available to anyone who uses the software. Can you say it’s available if your distribution platform is configured to prevent downloads?

    If not, then why would it be any different when torrenting?

    Meta ‘reproduced’ the works as soon as it made them available to other peers.

    The argument that a copyrighted work has been reproduced as soon as it’s “made available,” when “made available” has such a low bar, is also perplexing. If I post an ad on Craigslist for the sale of the Mona Lisa, have I reproduced it?

    What if it was for a car?

    I’m selling a brand new 2026 Alfa Romeo 4E, DM me your offers. I’ve now “reproduced” a car - come at me, MPAA.



  • Getting physical access to users’ devices is more difficult than compromising their passwords, so in that sense, transitioning that one factor is a net improvement in terms of reducing the number of compromises for a given service.

    Except for e2ee accounts, which I suspect Passkeys don’t support in the first place (at least, not without caching the password on your device), law enforcement can access your account’s data without ever needing your password. If you’re concerned about law enforcement breaking into your device and you’re not using a unique 16+ character passcode with it set to wipe the device after a certain number of attempts, that’s on you.

    I’m not sure about the state of affairs on Android, but the most popular and powerful tool used by law enforcement to extract data from iOS devices only recently gained support for iOS 17 and it doesn’t have the ability to bypass passwords on a device that isn’t accepting FaceID; it just has the ability to brute force them. A password with sufficient entropy mitigates this attack. (It’s unclear if it’s able to bypass auth when FaceID is enabled, but I could see it going either way.)

    You said a couple of things that I specifically want to address:

    But it doesn’t solve anything that existing TOTP over text messages didn’t solve, other than some complexity, and it eliminated the password (something you know) factor at the server.

    and

    outside of MitM attacks that TOTP mitigates

    Text-message based TOTP - or SMS 2FA - is incredibly vulnerable. In many cases, it can be compromised without the user even realizing it. A user with a 4 digit PIN (even if that PIN is 1234) and a Passkey on their device is much less vulnerable than a user using SMS 2FA with a password used across multiple services.

    If a user cares deeply about security, they likely already have a set of security keys (like the YubiKey 5C) that support U2F / WebAuthn, and they’ll add passkeys for their most sensitive services to those devices, protected by unique, high entropy PINs. This approach is more secure than using an equally high entropy password alongside U2F / WebAuthn when the latter isn’t secured with a PIN. The keys themselves are extremely secure and wipe their contents after 8 failed PIN attempts, but a password is transmitted to the server, which receives it in plaintext and stores it hashed, generally outside of a secure enclave. That leaves the password vulnerable - e.g., to being grabbed from server memory, or to a brute force attack on the hash if the server is breached (which could go undetected and only requires read access to the db server) - meaning a simple theft of the security key would be all that was needed to compromise the account (vs needing the PIN, which is never transmitted anywhere).

    And app-based TOTP doesn’t mitigate MITM at all. The only thing it does is add a timing component requirement, which current MITM phishing attacks have incorporated. To mitigate such an attack you need Passkeys, WebAuthn, or U2F as an authentication factor. To bypass those, attackers need to compromise the service itself or a certificate authority, which is a much taller task.
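
    To make that concrete: a TOTP code is derived from nothing but a shared secret and the current time, so a phishing proxy that relays the code within its ~30-second window gets in just like the real user would. For example, with oathtool (the base32 secret here is a placeholder):

    # Prints the current 6-digit code for the given (placeholder) base32 secret;
    # anyone holding the secret, or relaying the code in time, can do the same.
    oathtool --totp --base32 JBSWY3DPEHPK3PXP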

    The other thing is that we know most users reuse passwords and we know that sites will be compromised, so:

    • best case scenario, salted password hashes will be leaked
    • likely scenario, password hashes will be leaked,
    • and worst case scenario, plain text passwords will be leaked

    and as a result, that user’s credentials for a different site will be exposed. For those users, Passkeys are a vast improvement over 1FA, because that vulnerability doesn’t exist.

    Another factor is that the increased visibility of Passkeys is resulting in more sites supporting them - U2F / WebAuthn didn’t have great adoption. And getting these into the hands of more users, without requiring them to buy dedicated security keys, is a huge boost.

    For the vast majority of users, passkeys are an improvement in security. For the few for whom they aren’t, those users likely know that, and they still benefit from increased adoption of a MITM immune authentication method, which they can choose on a site-by-site basis. And even they can benefit from increased security by storing passkeys on a security key.


  • For logging in, Bitwarden supports TOTP, email, and FIDO2 WebAuthn on the free plan. It only adds YubiKey OTP and Duo support at the paid tier, and WebAuthn is superior to both of those methods. This is an improvement that they made fairly recently - back in September 2023.

    The other features that the free plan lacks are:

    • the 1 GB of integrated, encrypted file storage. This is a convenience that is nice to have, but not essential to a password manager.
    • the integrated TOTP generator. This is a convenience that many argue is actually a security downgrade (under the “putting all your eggs in one basket” argument).
    • Upgraded vault health reports - free users get username data breach reports but not weak / reused password reports. This is the main area where your criticism is valid, but as far as I know free competitors don’t offer this feature, either. I looked at KeePassXC and didn’t see this mentioned.
    • Emergency access (basically a trusted contact who can access your vault under some circumstances). This isn’t essential, either, and the mechanisms they add to ensure security of it cost money to provide.
    • Priority support - free users get 24/7 support by email, which should be good enough


  • None of what you’re saying has anything to do with whether an authentication flow is effectively implementing two-factor authentication.

    The server doesn’t need to know details about which two factors you used. If you auth with a passkey and it knows that passkeys themselves require an additional factor to be used, then it knows that you’re using 2FA.

    If they use a 4 digit pin or, worse, the 4 point pattern unlock, it’s easy enough to brute force on most devices.

    This is true, but that doesn’t mean it doesn’t qualify as an authentication factor. Nobody should use a 4-6 digit PIN for their phone, but this is a matter of individual security preferences and risk tolerance. In a corporate setting, the corporation can set the minimum standard here in accordance with their own risk tolerance.

    My password could be “password123” and it would still be one factor.


  • For an authentication flow to qualify as two factor authentication, a user must verify at least two factors - and each must be from the following list:

    • something they know, like a password
    • something they have, like a phone or security key
    • something they are - fingerprints, facial recognition (like FaceID), iris scans, etc…

    Passkeys require you to verify a password or authenticate with biometrics. That’s one factor. The second factor is having the passkey itself, as well as the device it’s on.

    If you login to your password manager on your phone and use your fingerprint to auth, that’s two factors right there.



  • It sounds to me like you have a bad drive. I have several m.2 drives and they all run hot to the touch, but none of them regularly (if ever) lose data.

    What drive are you using? Have you run any diagnostics on it to see if it’s failing? Are only new files disappearing?
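
    If you haven’t run diagnostics yet, smartmontools is a quick first check - the device path below is a guess, so substitute your own:

    # SMART health, temperature, and error-log summary for an NVMe drive
    sudo smartctl -a /dev/nvme0
    # Run the drive's built-in short self-test (if supported), then check results with -a again
    sudo smartctl -t short /dev/nvme0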

    If you just want to address the heat issue, have you tried a heat sink? If you already have one, make sure you’ve taken off the plastic wrap or anything else that could be reducing contact with the drive.



  • I can’t think of any reason why it would be inadvisable to cut off billing to an MMO subscription that was connected to a privacy.com card. Is there any basis for your concern? Do you know of someone who had “prevented Blizzard from billing them for WoW” on their credit report?

    I can’t say I’ve ever had an MMO subscription - or any prepaid account, for that matter - show up on my credit report. Or that I know anyone who has. Even prepaid credit cards don’t show up on your credit report.

    If a game, site, or app subscription fails to bill, the recourse the provider has is to cease providing the service. Standard industry practice is to suspend service and send out a notice, attempt re-billing a couple of times, and then consider the subscription canceled.

    A debt can show up on your credit report, even if it’s not associated with a loan or line of credit… But with a prepaid account, like an MMO subscription, you’re never extended credit and you never incur a debt. The exception would be if you signed a contract for your prepaid account stating that you’d maintain it for a certain amount of time (common with phone plans, internet plans, leases, some shady gym memberships, etc.) or you caused damages to the provider. Without such an agreement, there are no damages from just causing them to be unable to continue billing your credit card. If you were paying by check or disputed an already posted payment, that would be different - but neither of those are relevant here.



  • Many edible mushrooms have poisonous look-alikes, so your approach would be likely to misidentify those poisonous look-alikes - a potentially deadly mistake.

    For example - from https://www.gardeningknowhow.com/edible/vegetables/types-of-edible-mushrooms-their-poisonous-look-alikes

    Poisonous Morel Mushroom Look-alikes:

    • A common fungus, the false morel is almost the spitting image of its edible cousin except it is not hollow inside and contains cottony material.
    • Big red is similar except it has reddish tones and the cap is more brain-like.
    • Wrinkled thimble cap truly looks like a morel except its wrinkled cap hangs over the stem.
    • Bell morel is smaller and the cap, although similar, is much less textured and it has a cottony interior.

    It would be easy to train an ML model to confidently identify any of those as morels if you only trained on morels.

    The idea is to train on both so it’s less likely to mistake a poisonous mushroom for an edible one, and to then “hedge” your bet anyway, by always presenting the poisonous look-alikes first.

    The most dangerous scenario with this app is also the most useful one: a user who has some training in mushroom identification uses the app as a quick way to look up a mushroom they believe is a particular edible species, confirms that species appears in the list, then reviews the list of poisonous look-alikes and applies their training to rule them out. Finally, they confirm that they cannot rule out the edible mushroom.

    The risks here are that

    1. the user’s training is lacking and that they ruled out a poisonous mushroom that the app suggested, or
    2. the app didn’t include the particular poisonous mushroom in the first place and the user was thus unable to consider it.

  • Identifying mushrooms with an ML-based algorithm is a fine idea if you properly design the application to leverage that. As a hedgehog, this is what I would do:

    1. Train my model on a variety of mushrooms, particularly poisonous ones.
    2. When testing the model, test as many mushrooms as possible and take note of what’s frequently mis-identified.
    3. When testing the model, make sure to get a variety of different kinds of lighting.
    4. In addition to the mis-identifications noted while testing the app, maintain a list of commonly misidentified mushrooms - like the hedgehog mushroom and its counterparts - particularly the ones a forager should be most concerned with (meaning the most poisonous ones).
    5. When identifying a mushroom to the user, err on the side of calling it a poisonous mushroom. Consider providing a list of possible matches, with the worst case scenario ones up top.
    6. Include pictures and other information about the mushrooms, as well as regional mushroom lookups for mushrooms that weren’t included.
    7. Don’t include text like “99% confident that this is a hedgehog mushroom” when the 99% figure is an output from your ML model. I know we said earlier to make sure to do a ton of testing and I’m sure you think you did, but you didn’t do enough to be able to say that. At best, reduce your certainty by 25%, then divide that number between the identified mushroom and the lookalikes, making sure to give extra weight to the most poisonous ones. So that 99% certainty becomes at most a more realistic 38% chance that it’s the poisonous lookalike and 37% chance that it’s whatever was identified in the first place.

    You might say that this app would be useless for determining if a mushroom is safe to eat, and I agree, but it’s also a better approach than any of the existing apps out there. If you need to use an app to determine if a wild mushroom is safe to eat then the answer is simple: it isn’t. C’mon, I’m a hedgehog and even I know that.


  • 100%. I got the Brother HL-2370DW and it served me well, but it’s a black-and-white laser printer and sometimes I needed to print in color. I got fed up dealing with inkjet printers so I got the Brother HL-3270CDW. It’s great at printing off props and visual aids for my weekly tabletop game and so on.

    It’s not technically “perfect” - sometimes a Mac will fail to print to them wirelessly while reporting that the job printed fine (and that seems to be an issue on the Mac side), and I think we average maybe one paper jam per year - but it’s as close to perfect as I’ve ever gotten with a printer.