- 0 Posts
- 18 Comments
kongstrong@lemmy.world to datahoarder@lemmy.ml • Epstein Files Jan 30, 2026 Release - Archived from Justice.gov
1·18 days ago
we’re working on some more complex solutions in an Element group. Not really sure where we stand at the moment, but it seems we can stitch a lot together from the large torrent files and from what we scraped off the DOJ’s website with a little bit of force.
kongstrong@lemmy.world to datahoarder@lemmy.ml • Epstein Files Jan 30, 2026 Release - Archived from Justice.gov
4·18 days ago
yea, for me it fails anywhere between 200MB and 10-15GB into the download. Every time.
kongstrong@lemmy.world to datahoarder@lemmy.ml • Epstein Files Jan 30, 2026 Release - Archived from Justice.gov
4·18 days ago
PSA: the paging bug has been fixed on the DOJ’s website. The website caps out at around page 9600 for ~197k files, way less than the 520k in the less-complete dataset 9 torrent. Scraping the website now to find out which files they took offline.
Correction: 9600 pages * 50 files per page is in the 470k ballpark. Much more than 197k, but still a lot less than the torrent’s 530k, let alone the expected 600k+ files that were supposed to be in there.
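The arithmetic behind the correction, as a quick sketch. Both inputs are the approximate figures from the comment (page cap "around 9600", 50 files per page), so the result is an upper bound rather than an exact count; the variable names are only for illustration.

```python
# Approximate figures quoted in the comment above.
last_page = 9600        # page number where the listing now caps out (approximate)
files_per_page = 50     # listing entries per page
torrent_files = 530_000 # file count quoted for the largest torrent

# Upper bound on listed files: assumes every page is full and has no duplicates.
listed_upper_bound = last_page * files_per_page
print(listed_upper_bound)                  # 480000

# Files in the torrent that cannot be accounted for by the listing.
print(torrent_files - listed_upper_bound)  # 50000
```

With a cap slightly below 9600, or with duplicate pages, the real number lands a bit lower, which fits the "470k ballpark" estimate.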
kongstrong@lemmy.world to datahoarder@lemmy.ml • Epstein Files Jan 30, 2026 Release - Archived from Justice.gov
1·18 days ago
we’ve been delving further into it on Element. I can invite you (and anyone else wondering) to the channel if you PM me your Matrix ID
kongstrong@lemmy.world to datahoarder@lemmy.ml • Epstein Files Jan 30, 2026 Release - Archived from Justice.gov
3·18 days ago
if what you’re saying is that CSAM seems like a very good excuse to redact a lot more of those files than they previously intended, then yes, I agree.
kongstrong@lemmy.world to datahoarder@lemmy.ml • Epstein Files Jan 30, 2026 Release - Archived from Justice.gov
2·18 days ago
YSK the page limit has been fixed; it caps out around page 9600 for a total of ~197k file entries. Way less than the largest torrent’s 530k. Scraping now to get a list of the files they kept on the DOJ site so we can determine which files they don’t want out there. Would be a good lead for further investigating the torrent.
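A minimal sketch of the comparison described above: take the file names in the torrent and subtract the ones that still show up in the scraped listing. The function names, the `example.org` URLs and the `doc-00x.pdf` names are all placeholders, and it assumes files can be matched by the last path segment of the URL, which may not hold for the real data.

```python
from posixpath import basename
from urllib.parse import urlsplit, unquote

def name_from_url(url: str) -> str:
    """Return the file name at the end of a URL path, ignoring any query string."""
    return unquote(basename(urlsplit(url).path))

def only_in_torrent(torrent_names, site_urls):
    """File names present in the torrent but absent from the scraped listing."""
    # Compare case-insensitively (an assumption about how names should match).
    on_site = {name_from_url(u).lower() for u in site_urls}
    return sorted(n for n in set(torrent_names) if n.lower() not in on_site)

# Placeholder data, not real file names.
torrent_names = ["doc-001.pdf", "doc-002.pdf", "doc-003.pdf"]
site_urls = [
    "https://example.org/files/doc-001.pdf",
    "https://example.org/files/doc-003.pdf?download=1",
]
print(only_in_torrent(torrent_names, site_urls))  # ['doc-002.pdf']
```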
kongstrong@lemmy.world to datahoarder@lemmy.ml • Epstein Files Jan 30, 2026 Release - Archived from Justice.gov
1·18 days ago
you’re not getting your connection cut off by the place you’re downloading from? That’s huge, could you let me know if you succeed?
kongstrong@lemmy.world to datahoarder@lemmy.ml • Epstein Files Jan 30, 2026 Release - Archived from Justice.gov
4·20 days ago
we’re on Element
kongstrong@lemmy.world to datahoarder@lemmy.ml • Epstein Files Jan 30, 2026 Release - Archived from Justice.gov
8·20 days ago
DM me your Matrix account, we’re looking to get more people to uncover what’s missing from dataset 9, see https://lemmy.world/post/42440468/21884671
kongstrong@lemmy.world to datahoarder@lemmy.ml • Epstein Files Jan 30, 2026 Release - Archived from Justice.gov
2·20 days ago
not familiar with it, but sure, I can set something up. Will DM the 3 of you a link in a minute
kongstrong@lemmy.world to datahoarder@lemmy.ml • Epstein Files Jan 30, 2026 Release - Archived from Justice.gov
1·20 days ago
I can also take up some of these. Do you happen to have more of those gaps?
Also, are you guys using some chat channel for this? Might be a little more accessible
E: other users who run into this thread, DM me and I can add you to an Element group where we coordinate this. We’re looking for more people
kongstrong@lemmy.world to datahoarder@lemmy.ml • Epstein Files Jan 30, 2026 Release - Archived from Justice.gov
2·20 days ago
nice. Kinda feeling like we can never be sure whether our URL lists are exhaustive, or whether the DOJ might just let a large part of the dataset go dark
kongstrong@lemmy.world to datahoarder@lemmy.ml • Epstein Files Jan 30, 2026 Release - Archived from Justice.gov
2·20 days ago
Awesome, I don’t really understand what’s happening but I’m also running it (also doing it for what is presumably the exact same 48GB torrent, but I’m supposed to do that, right?)
kongstrong@lemmy.world to datahoarder@lemmy.ml • Epstein Files Jan 30, 2026 Release - Archived from Justice.gov
1·20 days ago
I’ve been checking your URLs, but it seems a lot of them don’t have a downloadable document attached?
kongstrong@lemmy.world to datahoarder@lemmy.ml • Epstein Files Jan 30, 2026 Release - Archived from Justice.gov
4·20 days ago
Would still love to help from my PC, on dataset 9 specifically. Any way we can exchange progress so I don’t start downloading files you’ve already downloaded?
E: just started scraping from page 18330 (as you mentioned you ended around 18333), hoping I can fill in the remaining 4000-ish pages
Update 2 (17:15 UTC): just finished scraping up to the page 20500 limit you set in the code. There are 0 new files in the 18330-20500 range compared to the ones you already found. So unless I did something wrong, either your list is complete or the DOJ has been scrambling their shit (considering the large number of duplicate pages, I’m going with the second explanation).
Either way, I’m gonna extract the 48GB and 100GB torrent directories now and try to mark down which of the files already exist within those torrents, so we can make an (intermediate) list of which files are still missing from them
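One way the marking step could look, as a rough sketch: index every file name under the extracted torrent directories, then split the scraped URL list into "already in a torrent" and "still missing". The function names are made up for illustration, and matching purely on lower-cased file name is an assumption; it says nothing about whether the contents are identical.

```python
from pathlib import Path
from posixpath import basename
from urllib.parse import urlsplit, unquote

def index_names(*roots):
    """Collect lower-cased names of all regular files under the given directories."""
    names = set()
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file():
                names.add(path.name.lower())
    return names

def split_by_presence(urls, have):
    """Split URLs into (already in a torrent, still missing), matched by file name."""
    present, missing = [], []
    for url in urls:
        name = unquote(basename(urlsplit(url).path)).lower()
        (present if name in have else missing).append(url)
    return present, missing
```

Usage would be something like `have = index_names(dir_48gb, dir_100gb)` followed by `present, missing = split_by_presence(scraped_urls, have)`, where the two directory paths and `scraped_urls` are whatever you have locally. Writing `missing` to a text file gives the intermediate list.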
group on Element, I could send you an invite if you pass me your Matrix ID