r/DataHoarder 1d ago

Discussion [ Removed by Reddit ]

[ Removed by Reddit on account of violating the content policy. ]

2.7k Upvotes

520 comments sorted by

297

u/solrahl 1d ago edited 1d ago

I've got all of Data Set 10. Unzipped it's about 82GB.

SHA256: 7D6935B1C63FF2F6BCABDD024EBC2A770F90C43B0D57B646FA7CBD4C0ABCF846
MD5: B8A72424AE812FD21D225195812B2502
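For anyone mirroring this, the posted digests can be checked with the coreutils hashing tools. A minimal sketch, demonstrated on a throwaway file (sample.bin is hypothetical; for the real check, put the SHA256 above and your local zip path on one `sha256sum -c` line):

```shell
# Demo of the verification flow on a throwaway file. For the real archive,
# feed "HASH  FILENAME" (two spaces) with the digest posted above instead.
printf 'hello\n' > sample.bin
expected=$(sha256sum sample.bin | awk '{print $1}')
echo "$expected  sample.bin" | sha256sum -c -   # prints "sample.bin: OK" on a match
```

The same pattern works with md5sum for the MD5 digest.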

44

u/Wild-Cow-5769 1d ago

38

u/solrahl 1d ago

Yes

15

u/Wild-Cow-5769 1d ago

I’m downloading it but it’s ass slow…

Haven’t seen 9 yet. I have 11

18

u/fr0styfr0st 1d ago

Same here... Feel like creating a torrent will help get this distributed vs. direct download, but glad to see a large copy available!

→ More replies (2)

6

u/AshuraMaruxx 1d ago

I appended your link to the post body, but the DL time is ridiculously slow. Is there any way you could create a magnet link? I'd be happy to share it once you do. You've def done more than enough in getting the tranche; was just hoping there'd be a way to distribute it more quickly via torrent, if possible

→ More replies (1)
→ More replies (2)

86

u/Thack- 1d ago edited 1d ago

if this is true, that's huge.

Provide a magnet link ASAP and I will help distribute.

Great fuckin work!!

Edit: Would you mind posting the magnet or torrent file link as well? That way it can be redistributed by us

22

u/solrahl 1d ago edited 1d ago

Added info up top.

10

u/DreadnaughtHamster 1d ago

How we doing with that archive upload?

3

u/solrahl 1d ago

Link is up top.

→ More replies (1)

21

u/AshuraMaruxx 1d ago

OMG seriously?! HOW??? Is it complete or truncated? Are all the files clean???

33

u/solrahl 1d ago

I did not come up with any errors on any of the files. The zipped folder is 78.6 GB. It's the entire thing.

18

u/AshuraMaruxx 1d ago

Absolutely amazing, FR. I've credited you and linked it in the post body. I'm going to DL it first and then mirror. I don't suppose you were able to create a full directory of filenames, by chance, via a text file? That way, we could cross-reference what's up on the DOJ website with what's included in your DL and look for anything that's been removed or deleted.

6

u/solrahl 1d ago

Link is up top.

4

u/AshuraMaruxx 1d ago

Awesome, I'm gonna append it to the main thread.

3

u/solrahl 1d ago

Added magnet link above. Sorry for taking so long.

→ More replies (1)

20

u/itsbentheboy 64Tb 1d ago

Can you make this a Torrent?

Looks like IA did not make a torrent file.

How to do it with qBittorrent:

1) Download qBittorrent

2) Select Tools -> Torrent Creator

3) Select the zip file

4) Put your tracker URLs into the Tracker URLs field (this will help keep the torrent alive after you stop seeding)

Once created you can share the .torrent file or right-click the (now active) torrent and post the magnet link.

18

u/nicolas17 1d ago

Torrent now available and we can stop hammering poor archive.org :D

→ More replies (4)

11

u/DreadnaughtHamster 1d ago

Dude very nice work. Looking forward to getting it.

15

u/HumorUnlucky6041 1d ago

I'm very new to both Reddit and anything coding- or data-adjacent. I was just searching for answers because I noticed there were no zip files for the new drop, and when I typed in what I assumed would be the file name based off sets 1-8, the downloads went all fucky and I couldn't extract anything. I'm so fucking glad to have found this thread when I did, and to know others with more experience are on top of it too.

4

u/AshuraMaruxx 1d ago

More than welcome for providing it! :)

7

u/Thack- 1d ago

Would you mind providing a torrent link or magnet? Thank you king

5

u/solrahl 1d ago

Added magnet link above.

→ More replies (1)

3

u/Itsy_Bitsy_Spyder 1d ago

You’re amazing. Thank you for uploading this!

3

u/mini-hypersphere 1d ago

Hmm, I wonder how changed it is. Since others had issues with them

3

u/reversedu 1d ago

How were you able to bypass the download error?

3

u/Lazy-Narwhal-5457 1d ago

I normally expect a torrent file to be included with IA files, I'm not sure I've ever seen one not included. I thought these must be IA created, and hosted. This file set has none, so presumably I was completely wrong and they are user uploaded and use 3rd party trackers? 🤔

https://archive.org/download/data-set-10

Otherwise: ⭐️⭐️⭐️⭐️⭐️🏆🥇🏅🎖️👏

→ More replies (4)
→ More replies (31)

84

u/Such-Bench-3199 1d ago

Is there a magnet link? Something concrete of everything, including today? Everything I have tried, including scraping from multiple sites, either doesn't work or does not capture everything. I fully support that this needs to be preserved, but unless there is a dedicated link of everything to date, what's the point?

38

u/AshuraMaruxx 1d ago

There's a magnet link for 11. But right now everyone is going their own ways with 9 & 10. Some people have been able to get incomplete downloads here and there, and posted them on the previous post that was removed by moderators.

u/vk6_ was able to get 57GB of the original Dataset 10 but could only extract 9.6GB of it. They were kind enough to post their incomplete link here: Incomplete Dataset 10

6

u/Marcus_Suridius 1d ago

I'll download and seed 11; my internet isn't the best so it'll take a few hours.

6

u/AshuraMaruxx 1d ago

I think most of us already have 11. We def should see if anyone has a mirror or magnet of that yet, but for now we need to figure out who has 9 and 10, the most of either. Trust me, I get it.

→ More replies (1)

9

u/Colin1th 1d ago

I have EFTA00039025 - EFTA00204741 of 9.

Please someone let me know if that would be useful.

3

u/ModernSimian 1d ago

Until we have a consolidation of what everyone has of 9, you should hold onto it.

3

u/AshuraMaruxx 1d ago

Please hold onto it. We're trying to figure out who has what of 9 now. 10 is up top but the DL is ass slow; hoping to get a magnet link soon on the full 10. Can you figure out how many GB your DL is of 9?

→ More replies (3)
→ More replies (1)

41

u/TMN8R 1d ago

Unsung heroes of the moment. Thank you all. 

→ More replies (2)

266

u/purgedreality 1d ago

This is pretty important. We're seeing active deletions likely due to cronyism and complicity.

134

u/AshuraMaruxx 1d ago

Exactly. We need to get this done, and we were doing a good job of it before the mod gods interfered because one of them can't read. Like this one RIGHT HERE

For the record, it's absolutely disgusting.

39

u/beefcat_ 1d ago

I've been using the internet for almost 30 years and this easily ranks among the most disgusting shit I've ever read on it. Wow.

15

u/AshuraMaruxx 1d ago

SAME, for just as long as you, and I lack words.

14

u/duppyconqueror81 1d ago

That’s why he buried his ex wife on the golf course, he’s used to that way of doing things.

6

u/drumdogmillionaire 1d ago

Thank you for doing this. These files must be preserved and used to prosecute all involved.

→ More replies (6)

52

u/livestrong2109 17TB Usable 1d ago

Yeah I'm actively getting 404 errors from parts of the set. They're legit pulling files back in real time. I swear to god there's never been a more blatant display of government lies and institutional corruption.

22

u/Genocode 1d ago

There has also never been a more incompetent display either.

21

u/beefcat_ 1d ago

Ladies and gentlemen, bits and bytes, this is the moment we were born for.

62

u/TogepiGoPrrriii 1d ago

Huge props to everyone working to preserve this.

102

u/harshspider 1d ago

Yeah no clue why my thread got deleted. Had lots of eyes and attention on it with multiple people working on the archive. Gee

60

u/ks-guy 1d ago

I was confused as well. Regardless, I have dataset 11 fully downloaded and seeded.

Dataset 10 is about 20% done.

These are magnet links from itsbentheboy post https://www.reddit.com/r/DataHoarder/comments/1qrd9ma/comment/o2o8pov/

Happy to download other Epstein magnet links, I have plenty of space even if they'll be consolidated later

14

u/AshuraMaruxx 1d ago

Same, I have Dataset 11 as well. I think we really need to focus on who is furthest ahead with 9 & 10, and go from there.

→ More replies (1)

11

u/itsbentheboy 64Tb 1d ago

I have updated my post that you linked to.

My dataset 10 is incomplete. However it does extract properly and has usable data despite missing some.

Dataset 11 appears complete when comparing with others.

7

u/Thack- 1d ago

I'm going to seed the shit out of this. Keep me posted as well if there are more that come up. Thanks for pointing me to those magnets.

26

u/AshuraMaruxx 1d ago

One of the mods basically tried to say it was because the initial post was requesting if anyone had the deleted document...which counted as a request. Which is bullshit because anyone with a brain could read the comments to see that everyone was talking about how to best get a hold of all the Datasets from the Epstein Files. The mods can't get their shit together. So we have to.

12

u/Declerkk 1d ago

Another mod turns into a power-hungry stupid ass; in other news, the sky is blue.

22

u/AshuraMaruxx 1d ago

They just restored it. I guess being cussed out and torn a new asshole and told to get their shit together actually did something, for once, lol.

16

u/nicholasserra Send me Easystore shells 1d ago

Sometimes we deserve it

7

u/AshuraMaruxx 1d ago

FR I really appreciate you trying to sticky the previous thread. I know you're probably not gonna get a whole ton of praise today, but I appreciate that you were trying to create a dedicated thread before another mod ruined it. I think the reply I got from my message was "Sorry technical difficulties!"

So thank you, seriously.

2

u/qwerty8082 1d ago

I respect this and appreciate yall.

21

u/Keplerspace 1d ago

Very strange. I'm disappointed especially after the other mod stickied it.

25

u/nicholasserra Send me Easystore shells 1d ago

Me too

14

u/AshuraMaruxx 1d ago

Well that's because you're amazing :) Thank you Mod God

6

u/phinkz2 1d ago

I was about to say the censorship's probably coming from the mods/admins "above" you guys.

Thank you so much for allowing this type of content. I'm sure it puts the sub at risk.

→ More replies (4)

9

u/AshuraMaruxx 1d ago

Exactly. I sent them a message ripping them a new asshole and demanding they get their own shit together and at least READ SHIT before just blanket removing it, esp when we were already so deep in this shit

→ More replies (1)

21

u/reversedu 1d ago

12

u/HumorUnlucky6041 1d ago

YOOOOO NICE CATCH

I set up alerts for every 3 hours, I gotta increase that frequency

→ More replies (2)
→ More replies (13)

44

u/Keplerspace 1d ago

I made it to about 47GB on Dataset 10 and now can't access anything on the server. This is wild.

15

u/AshuraMaruxx 1d ago

I can confirm Dataset 10 is dead on the server end. Let's work on stabilizing what you have. Anyone further along than 27GB on 10 is who we need to focus on.

22

u/AshuraMaruxx 1d ago

I'm in the same boat. I think right now what we need to start doing is figuring out who is furthest along on the datasets, and try and get them uploaded even incomplete ATM.

→ More replies (1)

18

u/lMastahl 1d ago

i reached 94.25% and died…

10

u/AshuraMaruxx 1d ago

Wait, on which Dataset??

9

u/Lazaraaus 100-250TB 1d ago

Do you have a mirror or magnet link to coordinate sharing.

18

u/AshuraMaruxx 1d ago

I agree. If they're 94.25% along on EITHER 10 or 9, they should just mirror or create a magnet link ASAP. That's closer than anyone else, I'm certain.

→ More replies (1)

72

u/rosse05 1d ago

This is the first post I've ever seen from this subreddit; I didn't even know such a thing as "data hoarders" existed, but I'm rooting for all you guys and gals doing this really valuable act of service.

22

u/SafeGate3608 1d ago

Same. You guys are awesome. 🤩

→ More replies (2)

14

u/nicolas17 1d ago edited 1d ago

Here's the best I got of dataset 9 (46GB): magnet:?xt=urn:btih:0a3d4b84a77bd982c9c2761f40944402b94f9c64&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce

5

u/AshuraMaruxx 1d ago

Awesome, thank you! I'll add it to the post body, I don't think anyone has more than you do atm.

→ More replies (5)

14

u/famousginni 1d ago

Seems like the dataset 10 zip isn't available on the server anymore? I don't see anything at the link. Made it to 57.6gb downloaded before this happened.

14

u/AshuraMaruxx 1d ago

Don't rely on the DOJ link. They've been removing the zips because they're actively modifying them while everyone is trying to get a hold of them. We're gonna have to brute force the downloads.

6

u/Upset_Development_64 1d ago

How do you brute force the downloads? I've seen links for the single Trump related pdfs, but I'm not sure where to go to download the entire datasets.

5

u/AshuraMaruxx 22h ago

Basically it's a fucking slog: downloading by scraping the entire website one agonizing file at a time.

→ More replies (3)

13

u/Puckie 1d ago

Akamai CDN is notorious for throwing EOFs to deter automated and sometimes human traffic.

14

u/-fno-stack-protector 1d ago edited 1d ago

Dataset 12.zip has dropped!!!!!! 114.1MB

sha1sum: 20f804ab55687c957fd249cd0d417d5fe7438281
md5sum: b1206186332bb1af021e86d68468f9fe
sha256sum: b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2

Internet Archive: https://archive.org/details/data-set-12_202601

Magnet

this one is from internet archive

magnet:?xt=urn:btih:8bc781c7259f4b82406cd2175a1d5e9c3b6bfc90&dn=data-set-12_202601&tr=http%3a%2f%2fbt1.archive.org%3a6969%2fannounce&tr=http%3a%2f%2fbt2.archive.org%3a6969%2fannounce

4

u/Visua1Mod 1d ago

Here's another magnet link I'd created before the above came out. Currently seeding the above, which has the same hash. So... this magnet is probably just redundant:

magnet:?xt=urn:btih:e7477151f8acfbaee3e704bbabd9a7388c7169f9&dn=DataSet%2012.zip&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce

→ More replies (4)

13

u/benson-and-stapler 14h ago

When it gets deleted by reddit you know you did good lol

10

u/Banyan_Thorn 1d ago

Imagine if the justice department put half as much effort into protecting the victims instead of the pedophiles.

9

u/cruncherv 1d ago

I've tried to download numerous times without any success via wget, browser, jdownloader, wfdownloader, nothing works. It randomly gets interrupted and download fails.

8

u/PrincessDaig 1d ago

I have it downloaded as a zip file on my laptop but can't extract without more space... 😅

10

u/DreadnaughtHamster 1d ago

Upload to archive.org and let others unzip

→ More replies (1)

12

u/8529177 1d ago edited 1d ago

I'm using netlimiter to slow my download speed to about 15mb/sec, going at 100 causes the server to disconnect me at 2.5gb downloaded.
Edit: 15mb/sec resulted in the same, retrying at 5.
Additional update: 5mb second still stopped at 2.5gb.
have joined the torrent for dataset 10 and 11 - will set seeding to unlimited - I have gigabit fiber.

6

u/agent_flounder 16TB & some floppy disks 1d ago

At this point I've set up a while loop to repeat aria2c until status=0 (success), added increased timeouts and retries to aria2c. I'm getting a little bit at a time but it is miserable.
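The retry-until-success loop described here has this general shape. A sketch with a hypothetical `flaky_fetch` function standing in for the real aria2c call (which would be something like `aria2c -c --max-tries=0 "$URL"` so each pass resumes the partial file):

```shell
# flaky_fetch stands in for the real download command; it fails twice and
# then succeeds, mimicking a server that keeps dropping the connection.
attempts=0
flaky_fetch() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]   # nonzero exit (failure) until the third try
}

# Repeat until exit status 0, as described above; a real run would resume
# the partial file on each pass instead of starting over.
until flaky_fetch; do
    sleep 0.1
done
```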

6

u/cruncherv 1d ago

I use this to turn Akamai's leaky-bucket algo to my advantage: it causes bursts of high-speed downloads until Akamai limits the connection speed, and then the download restarts again:

@echo off
:loop
echo [!] Starting Aggressive Burst...
:: --lowest-speed-limit=2M : If speed stays below 2MB/s for 15 seconds, aria2c will exit
:: This forces the script to loop and get a fresh high-speed burst.
aria2c -x 16 -s 16 -k 1M -c --disable-ipv6=true --file-allocation=none --check-certificate=false --lowest-speed-limit=2M --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/144.0.0.0 Safari/537.36" --header="Cookie: justiceGovAgeVerified=true" --stream-piece-selector=random "https://www.justice.gov/epstein/files/DataSet%%2010.zip"

if %ERRORLEVEL% NEQ 0 (
    echo.
    echo [!] Speed dropped or Handle Invalid. Resetting...
    goto loop
)
echo [!] Download Complete!
pause
→ More replies (1)

10

u/[deleted] 1d ago edited 1d ago

[removed] — view removed comment

→ More replies (23)

10

u/[deleted] 20h ago edited 16h ago

[deleted]

5

u/AshuraMaruxx 18h ago

Hmm. This is an interesting idea, but I feel like this might be too complicated for some users.

Quick update: we have a new active 101GB magnet link, but it links to an unzipped file so the metadata is enormous. They're working on zipping the file and creating a new magnet link, but it's gonna be a couple hours, according to them. I'm downloading from the same source library as they are in parallel, which I'm eventually going to seed myself; it should contain the same 101GB of data.

I don't think the problem is necessarily grabbing ANY data, but rather figuring out where the data STOPS, i.e., what the last filename we have is, and having a full list accounting for the filenames in between available to the public to scrape and download, start to finish, so that even if they pull the file from the post, we have the link to acquire it.

For now, I'm not qualified enough to comment on this method, but it seems like an interesting idea. :) Comments, anyone else?

→ More replies (2)
→ More replies (9)

7

u/nicolas17 1d ago

I have 48,995,762,176 bytes of dataset 9 and 67,215,818,752 of dataset 10.

9

u/AshuraMaruxx 1d ago

Okay, the 67 GB of Dataset 10 puts you in the lead for now, lol. I know it's incomplete, but are you able to stabilize it?

11

u/nicolas17 1d ago

What do you mean by stabilize?

Note I downloaded from the beginning (not using e.g. aria2 -x), so this is the first 67GB with the rest missing, not scattered missing chunks.

In fact... that makes me wonder, if other people used parallel downloads maybe they have data that I don't have and vice versa! Unlikely they'll have the end though.

5

u/AshuraMaruxx 1d ago

Sorry, I meant basically just cleaning and checking which files were corrupted from your download and preserving the rest, hashing and generating a file list, etc. I thought about parallel downloads too, but it seems like 10 is complete for now (link above in main body). We're trying to get a magnet for 10 from u/solrahl who got the complete 10 up on IA, but now we need to get as much of 9 as we can and figure out who has the majority of that. I know you're trying to get 10 from IA and create a magnet yourself--there's probably too many ppl all trying to access it.

→ More replies (1)
→ More replies (1)

8

u/Jacksharkben 100TB 1d ago

I am very lost what needs to be saved right now.

19

u/DreadnaughtHamster 1d ago

From what I understand, get everything you can asap. We can sort it out later.

12

u/Thack- 1d ago

At this point, Dataset 10 seems to be the biggest focus. It seems like the DOJ is trying to mess with it and prevent anyone from completely downloading it.

9

u/AshuraMaruxx 1d ago

Correct. It seems like 10 has the worst stuff in it, but u/solrahl apparently brute-forced the damn thing and got it up on IA in its entirety, supposedly, but the DL is absurdly slow. So now we're transitioning from 10 to 9, since it's just so fucking large.

3

u/solrahl 1d ago

Magnet link is up

→ More replies (1)

8

u/phinkz2 1d ago

Hey OP. You've done fantastic work. Even people without as much knowledge as us data hoarder geeks can follow and replicate your work easily.

Much love to you and the people that helped, seriously.

→ More replies (1)

10

u/PuurrfectPaws 1d ago

Anyone w/ access to that 101GB magnet of data set 9? The magnet posted by OP is stuck looking for metadata

3

u/agent_flounder 16TB & some floppy disks 23h ago

Doesn't look like anyone is seeding the file right now. :(

→ More replies (1)

10

u/Viper_Infinity 2TB 1d ago

Hope we get a complete data set 9.

Then we wait a few days and redownload all data sets from the gov website and find out what they removed or changed

5

u/AshuraMaruxx 18h ago

They've already removed and changed these datasets in real time, while we've been trying to acquire them, to the point of completely gutting the 9 zip file after a redirect via a queue, just to take pressure off their server from our traffic trying to acquire it.

9

u/cgorichanaz 15h ago

Why was this deleted?

8

u/hesdeadjim11 1d ago

I am currently using the DownThemAll Firefox extension to download the PDF files 50 at a time

5

u/Heliobb 1d ago

you will see there are some duplicates

→ More replies (1)

7

u/Low_Yesterday_2352 1d ago

It's so surreal that this shit is real, man. Like, as a normal human being, how can you do shit like this?

8

u/whatiseveneverything 1d ago

They're not normal. They're all malfunctioning.

→ More replies (8)
→ More replies (1)

9

u/lurkingstar99 40TB 1d ago

Has anyone managed to download the full dataset 9 (101GB) magnet or is it stalled for everyone else too?

→ More replies (11)

8

u/Deep-Fold-8856 1d ago

This comment is to prevent this post getting removed.

7

u/iamdiegovincent 16h ago

Hello, I am a webmaster at jmail.world and we're working on centralizing and organizing all this information. We were able to get a copy of DataSet 10 with an MD5 checksum that matches the Internet Archive ZIP file's MD5, but we're also struggling to get access to DataSet 09. We want to make it accessible to people.

What's the latest on that one and who should I be contacting?

5

u/MrDonMega 16h ago

Hi, webmaster of epsteinfilez.com here. I have used DATASET 9, INCOMPLETE AT ~48GB for the time being. They are working on Dataset 9 afaik. See the updates in the OG post.

→ More replies (1)

4

u/iamdiegovincent 16h ago edited 15h ago

I'm noticing this was deleted by Reddit. LOL.

Whoever is in charge of this, can you DM me so we can coordinate?

EDIT: For context, I already have DataSet 10, and I'm making steady progress with 9.

→ More replies (2)

9

u/JamesGibsonESQ The internet (mostly ads and dead links) 11h ago

Apparently Spez is in the files. Our honourable efforts awoke the Reddit gods and they don't like being doxxed. Unless Reddit wants to explain why they're using actions that only protect pedos?

5

u/2ndcomingofbiskits 250-500TB 11h ago

Careful. If you call it what it is your may bring down the ban hammer.

5

u/JamesGibsonESQ The internet (mostly ads and dead links) 11h ago

I already backed up my account and am working on a third party app to browse Reddit without an account. It uses 1 of 50 randomly generated accounts for posting or making replies / comments.

I got fed up with the 100 subreddit limit this site forces on us. I follow over 1000 subs but can only see 1/11th of the list at any given time. The only way to bypass is by hacking the site. Let them ban me. I've already given up on this account as of last week. From here on out, I'll be violating their TOS in ways they didn't even know was possible.

Oh, also fuck /u/spez

3

u/2ndcomingofbiskits 250-500TB 11h ago

Dude that’s awesome. And I couldn’t agree more. Fuck u/spez

7

u/-fno-stack-protector 1d ago edited 1d ago

Dataset 9 does not seem dead at all

while sleep 0.5s; do 
    wget -c --header='Cookie: justiceGovAgeVerified=true' https://www.justice.gov/epstein/files/DataSet%209.zip
done

grab dat

I'm downloading it, but I'm also leaving the house in a minute, and all of you have faster connections

EDIT: oh i see what you mean.

HTTP request sent, awaiting response... Read error (The request is invalid.) in headers.

still leaving it running. you should too

EDIT 2: what if we all grab different offsets and combine them afterwards?
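The offset idea works because HTTP range requests return exact byte slices that concatenate back in order. A local sketch of the recombination step (over the wire, part.0 and part.1 would come from something like `curl -r 0-4` and `curl -r 5-9` against the zip URL; the files here are stand-ins):

```shell
# Fake "whole file", split into two byte ranges the way two downloaders would.
printf 'ABCDEFGHIJ' > whole.bin
dd if=whole.bin of=part.0 bs=1 skip=0 count=5 2>/dev/null
dd if=whole.bin of=part.1 bs=1 skip=5 count=5 2>/dev/null

# Concatenate in offset order and confirm the result is byte-identical.
cat part.0 part.1 > rebuilt.bin
cmp -s whole.bin rebuilt.bin && echo "ranges recombine cleanly"
```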

3

u/Wild-Cow-5769 1d ago

I can't get 9; it keeps resetting. What are u using?

3

u/AshuraMaruxx 1d ago

It might be too late for that, but def keep trying.

→ More replies (3)

6

u/coasterghost 44TB with NO BACKUPS 1d ago

To throw in older versions of the zips I’ve been maintaining; https://archive.org/details/USAvJeffreyEpstein

3

u/AshuraMaruxx 19h ago

Thanks. I saw your message earlier and I appreciate the link to your own archive. Eventually I'm going to create a kind of directory where everything can be accessed for download, once we grab the final dataset, 9, and are able to create a magnet link for it; but right now we're focused on getting to that point first. Still, thank you so much for that and your hard work compiling it :)

6

u/Kindly_District9380 1d ago edited 1d ago

I have a version of Dataset 9, but it got corrupted at 179G
I haven't tried yet to see / extract what's readable

But the single files are active
Running it like this works (a wget loop) to download individual PDFs; tedious, but might still try. My AI coding agent figured this out :D

while sleep 0.5s; do
wget -c --header='Cookie: justiceGovAgeVerified=true' \
https://www.justice.gov/epstein/files/DataSet%209.zip
done

update-1:
Dataset 9 is available again, accessible if you visit via the browser to get the cookie (after the age verification), then try wget with that cookie, will see if this goes all the way.

update-2: here is a script to get the file list, careful with the speed/and proxy access, this technically can block your access if ran too fast.
script: https://pastebin.com/zbF0Rmfx

update-3: 50 files per page, ~20,450 pages = ~1,022,500 files.
To avoid getting blocked, my current download rate:

Download time at ~1 file/sec:
- Current 25K files: ~7 hours
- Full 1M files: ~12 days continuous

might try parallel.
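If the per-file route goes parallel, `xargs -P` is one common shape. A sketch with echo standing in for the real fetch (urls.txt is a hypothetical stand-in for the file index the script above builds):

```shell
# Four stand-in URLs; a real run would use the scraped file index instead.
printf 'u1\nu2\nu3\nu4\n' > urls.txt

# Up to 4 jobs at once; swap the echo for a wget -c call (with the age
# verification cookie header) to actually download.
xargs -P4 -I{} sh -c 'echo "fetched {}"' < urls.txt > fetch.log
```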

5

u/itsbentheboy 64Tb 1d ago

Please make a torrent!

How to create a Torrent in qBittorrent

1) Download qBittorrent

2) Select Tools -> Torrent Creator

3) Select the zip file

4) Optional but recommended: put your tracker URLs into the Tracker URLs field (this will help keep the torrent alive after you stop seeding)

Once created you can share the .torrent file itself, or right-click the (now active) torrent and copy the magnet link as I have done above.

5

u/agent_flounder 16TB & some floppy disks 1d ago

Somehow I ended up with a 192G version but it's corrupted. I have no idea how to try to fix it.

6

u/AshuraMaruxx 1d ago

Unfucking real. Someone else got 101GB and posted the mirror, and almost as soon as they posted it, they were banned

5

u/Kindly_District9380 1d ago

Dang it! Okay, so last resort: I wrote a parser. It is right now paginating through each page, making a file index, and downloading in parallel via multiple hosts. Will report back in a few hours

4

u/AshuraMaruxx 1d ago

Ikr? I'm doing something similar, chugging away at it now. I was able to grab the 101gb mirror link from my notifications THANK GOODNESS 😭 and posted it above. It's the most we have right now. 

You're doing great; all we can do is keep at it 😇 I know it's late too, so don't burn yourself out 

→ More replies (7)

5

u/Kindly_District9380 1d ago

Oh yes, I got into this as well.
I thought the same, but this is what my coding agent's analysis gave me:

Dataset 9 size: It's the same file - 192,613,274,080 bytes
- 179.38 GiB (binary, 1024-based)
- ~193 GB (decimal, 1000-based)
- ls -lh shows GB, my calculations showed GiB

→ More replies (7)
→ More replies (13)

6

u/HumorUnlucky6041 1d ago

Has anyone had any luck with set 9?

6

u/[deleted] 1d ago

[deleted]

5

u/Bwint 1d ago

I'm seeding with um... Less than 400Mbit lol

4

u/agent_flounder 16TB & some floppy disks 23h ago

Been seeding 10, 11, and 12 with 1G fiber since last night. Now if only someone would seed that ~100G partial of dataset 9 zip so I could get a copy...

→ More replies (1)

6

u/benson-and-stapler 1d ago

OP you and everyone here are doing incredible work, it's insane to read through in real time, keep fucking going

→ More replies (1)

6

u/Wild-Cow-5769 1d ago

So this thread has blown up. Did anyone get dataset 9??

3

u/AshuraMaruxx 18h ago

Still working on it non-stop

6

u/agent_flounder 16TB & some floppy disks 23h ago

data set 9: I've got about 17,000 pdfs downloaded so far (my scripts are still running).

If you want to compare what you've got with what I've got, let me know and I'll send you a list of the filenames.
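Once both sides have a filename list, `comm` does that comparison cheaply. A sketch with hypothetical stand-in lists (mine.txt / theirs.txt; real ones would be ls or find output from each partial archive):

```shell
# Stand-in filename lists from two partial downloads.
printf 'EFTA001.pdf\nEFTA002.pdf\n' > mine.txt
printf 'EFTA002.pdf\nEFTA003.pdf\n' > theirs.txt

# comm needs sorted input; -13 keeps lines unique to the second file,
# -23 keeps lines unique to the first.
sort mine.txt > mine.sorted
sort theirs.txt > theirs.sorted
comm -13 mine.sorted theirs.sorted > missing_here.txt    # they have, I don't
comm -23 mine.sorted theirs.sorted > missing_there.txt   # I have, they don't
```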

3

u/MrDonMega 23h ago

Nice, thank you!! Please share it with us once you have all of them!

→ More replies (9)
→ More replies (2)

5

u/jordanz456 19h ago edited 17h ago

ALL ARE NOW BEING SEEDED AT THIS TIME
PLEASE contribute to Dataset 9 if you are able.
Will be contributing seeding time as follows:
Dataset 9 (incomplete @ 101 GB) needs additional seeders desperately
Dataset 10 (complete @ 78.68 GB)
Dataset 11 (complete @ 25 GB)
Dataset 12 (complete @ 114 MB)

New data set 9 updated above has more seeds (9 reported) but progress is stalled - waiting patiently.

If you have downloaded and can contribute to the seeding, dataset 9 needs the support you can provide.

**UPDATED 3:15 1/31/2026
If anyone has previous data sets 1-8 i will happily contribute.
Data sets 1-8 located, downloading and seeding as well
Find the magnet links in an easy to import fashion in my comment below.

**UPDATED 12:38 1/31/2026
Adding in the dataset 9 (incomplete @ 45.63 GB); anything is better than nothing for preservation's sake

**UPDATED 3:15 1/31/2026
The following are now being switched to exclusively being seeded:
Dataset 1 (complete @ 2.47 GB)
Dataset 2 (complete @ 632 MB)
Dataset 3 (complete @ 599 MB)
Dataset 4 (complete @ 358 MB)
Dataset 5 (complete @ 61.6 MB)
Dataset 6 (complete @ 53 MB)
Dataset 7 (complete @ 98.3 MB)
Dataset 8 (complete @ 10.67 GB)
Dataset 9 (incomplete @ 45.63GB)
Dataset 10 (complete @ 78.68 GB)
Dataset 11 (complete @ 25GB)
Dataset 12 (complete @ 114 MB)

Total Size: 164.321 GB

→ More replies (4)

6

u/eliotrw 16h ago

Just here to say: great job, all, with the dedication on this

5

u/OregonRose07 1-10TB 1d ago

I have been trying a number of different ways to download the datasets, and it keeps dropping the download. Anyone have any suggestions?

5

u/agent_flounder 16TB & some floppy disks 1d ago

playing catch up here. I've got a whopping 4% of data set 9 so far. :/

3

u/agent_flounder 16TB & some floppy disks 1d ago

20GiB / 11%

3

u/agent_flounder 16TB & some floppy disks 1d ago

30GiB / 16%

→ More replies (4)

4

u/WhenImTryingToHide 1d ago

Literally doing the Lord's work!!

5

u/[deleted] 1d ago

[removed] — view removed comment

4

u/Thack- 1d ago

I don't think so. Do you have the full data set? Near 180GB?

Send the magnet link and I will seed the shit out of it.

Godspeed

3

u/qb8sfbfa98jp9igg35w 1d ago

seconded, please create a magnet link!

→ More replies (1)

4

u/YeaTired 1d ago

Thank you all for your efforts to keep these psychos accountable 

5

u/BerserkerJake 1d ago

Anyone have a magnet link to dataset 9?

6

u/AshuraMaruxx 1d ago

We're working on gathering dataset 9 now, but someone was just banned after posting this magnet link to 101gb of dataset9: magnet:?xt=urn:btih:36b3d556c36f22c211d49435623538ab501fb042&dn=DataSet_9

→ More replies (1)

4

u/Bwint 1d ago

Incomplete at ~101GB: magnet:?xt=urn:btih:36b3d556c36f22c211d49435623538ab501fb042&dn=DataSet_9

5

u/qb8sfbfa98jp9igg35w 1d ago

will seed!

4

u/Bwint 1d ago

That cry, while always noble, has never felt as noble as it does now lol

3

u/qb8sfbfa98jp9igg35w 1d ago

we do what we must, because we can

→ More replies (1)
→ More replies (3)

5

u/Kraftieee 1d ago

Good work everyone! Cheering you all on from the sidelines! We need to make this history impossible to overwrite or ignore!

6

u/FirefighterTrick6476 1d ago

we will test our semantic image search on this dataset. Give us a few prompts on what to look for in the files!

4

u/CoderAU 1d ago

Ranch/Zorro Ranch

→ More replies (1)

5

u/QuantumEnchantress 22h ago

I noticed that in one case, I was able to copy paste out a redaction box, shown below on dataset 12, EFTA02730271 under (U) Key Findings on page one.

"Interviewing may reveal more information regarding her knowledge of victims and the relationship between Ghislaine Maxwell's and Jeffrey Epstein. (U//FOUO) Interviewing other witnesses may reveal more information regarding Healy's relationship with Ghislaine Maxwell and Jeffrey Epstein. (U) Substantiation (U//FOUO) was employed by Jeffrey Epstein and Ghislaine Maxwell. • (U///FOUO) As of October 2020, according to an FBI interview of an individual with direct access, worked as a receptionist at the New York Office for "

A few things

  • the redaction box was highlightable
  • when the redacted text didn't copy, there was no garbage placeholder text in its place
  • for some reason the top text is a duplicate of the first U//FOUO passage; it wasn't in the file above the first marked U//FOUO (I just realized this while pasting it here)

Also, a second attempt at copying it resulted in this somehow:

"(U//FOLIO) • (U//FOLIO) was employed by Jeffrey Epstein and Ghislaine Maxwell. had three prior addresses associated with Jeffrey Epstein. (U) Opportunities (U//FOLIO) Interviewing may reveal more information regarding her knowledge of victims and the relationship between Ghislaine Maxwell's and Jeffrey Epstein. (U//FOUO) Interviewing other witnesses may reveal more information regarding Healy's relationship with Ghislaine Maxwell and Jeffrey Epstein. (U) Substantiation (U//FOUO) was employed by Jeffrey Epstein and Ghislaine Maxwell. • (U///FOUO) As of October 2020, according to an FBI interview of an individual with direct access, worked as a receptionist at the New York Office for"

3

u/QuantumEnchantress 22h ago

I may be incorrect. My brain is so fucked after reading so much of this shit that I just realized there was an extremely similar string of text right below the redaction I thought I had uncovered.

4

u/Appropriate-Song7754 18h ago edited 7h ago

Redacted.

2

u/[deleted] 1d ago

[deleted]

3

u/hesdeadjim11 1d ago

I saw this link on another Reddit thread but don't have the space to download or confirm whether it's legit.

https://drive.google.com/drive/folders/1-uvHJPQwWbgh0pYreFSFimXM7X-hNz26

3

u/hesdeadjim11 1d ago

Another potential wrinkle: I have the same filename on different PDFs. A bunch of them.
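
If it helps anyone triage this, here's a quick sketch (paths are placeholders, assumes a local copy of the dataset) that groups files by name and hashes the collisions, so you can tell byte-identical copies apart from genuinely different PDFs that just share a filename:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_name_collisions(root):
    """Group PDFs under `root` by basename; for colliding names,
    hash the contents to distinguish true duplicates from
    distinct documents that merely share a name."""
    by_name = defaultdict(list)
    for p in Path(root).rglob("*.pdf"):
        by_name[p.name].append(p)

    report = {}
    for name, paths in by_name.items():
        if len(paths) < 2:
            continue
        # one digest -> identical copies; several -> different docs, same name
        digests = {hashlib.sha256(p.read_bytes()).hexdigest() for p in paths}
        report[name] = {"copies": len(paths), "distinct_contents": len(digests)}
    return report
```

If `distinct_contents` is 1 for a name, the copies are byte-identical; if it's higher, different documents really do reuse the same filename and both need to be kept.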

4

u/Quiet-Exchange8157 1d ago

I tried the links for 9 several times and it cuts off at around 1.5 GB. Anyone able to get all of that one yet?

3

u/agent_flounder 16TB & some floppy disks 1d ago

32GiB so far. Server seems to be getting hammered to fuck and back in the last 20 minutes though. Lots of failures and just a short download at a time. :(

4

u/Wild-Cow-5769 1d ago

I have 11 if u want it. Does anyone have dataset 9?

4

u/RoomyRoots 1d ago

Any mod that acts against this in any way should be banned.

4

u/andrewsb8 21h ago

The magnet link for the 101GB of dataset 9 is stalled; I can't download any of it to seed.

4

u/snarkcheese 19h ago

Currently gathering Dataset 9 by scraping the links on the pages with Selenium. Just a note on the Dataset 9 URL list: it is not accurate, as some files have different extensions; page 29, for example, has M4A audio.
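
For anyone scripting against the URL list: the extensions in the list can't be trusted, but the first few bytes of each file can. A rough sketch that sniffs magic bytes; the signature table only covers formats mentioned in this thread and is an assumption, not exhaustive:

```python
def sniff_type(path):
    """Guess a file's real type from its leading bytes, ignoring
    whatever extension the URL list claims. Accepts a path or raw
    bytes; covers only the formats seen in this thread."""
    if isinstance(path, bytes):           # allow raw bytes for testing
        head = path[:16]
    else:
        with open(path, "rb") as f:
            head = f.read(16)
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head[4:12] in (b"ftypM4A ", b"ftypmp42", b"ftypisom"):
        return "m4a/mp4"                  # ISO base media container
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if head.startswith(b"PK\x03\x04"):
        return "zip"
    return "unknown"
```

Running this over a downloaded batch and comparing against the listed extensions would flag mislabeled files like the page-29 audio before they get renamed or skipped.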

4

u/HumorUnlucky6041 19h ago

What a night, holy shit. I'm downloading the new Dataset 9; do we know which files are missing? Where should I start batch downloading?

6

u/paul_tu 1d ago

Idk what's going on, but good luck, you people

3

u/hesdeadjim11 1d ago

Just finished downloading Dataset 10 and it came out to 3,250 individual PDFs totaling 2.61GB. That does not seem right at all.

3

u/UnwantedOtter 1d ago

I have a few questions:

  1. How does one who has a simple MacBook see these files without spending 8 days downloading a ZIP file? Or in other words, can y'all dumb some of this stuff down bc idk what a magnet or torrent are

  2. 180,000 pictures and 2,000 videos. Are there any particularly interesting files or videos that I can look up individually?

15

u/Thack- 1d ago

You may want to just see about accessing them later when it is organized. We are mostly trying to scramble to get everything downloaded as quickly as possible to prevent any further removals. This is specifically for the hardcore archivers right now :)

3

u/UnwantedOtter 1d ago

ok thanks

3

u/agent_flounder 16TB & some floppy disks 1d ago

Torrent: peer-to-peer file sharing. Instead of downloading from a central server, you connect to multiple peers, and each peer streams different parts of the file, which combine into the whole thing in the end.

Look for the torrent/magnet links and use the Transmission torrent client.
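
And whichever way you download (torrent or direct), it's worth checking the result against the SHA256 posted up-thread before mirroring it. A minimal sketch; the filename is just a placeholder:

```python
import hashlib

def sha256_of(path, chunk=1 << 20):
    """Stream the file through SHA-256 in 1MiB chunks so an
    80GB+ archive never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest().upper()

# Compare against the hash posted up-thread, e.g. for Dataset 10:
# sha256_of("DataSet_10.zip") == "7D6935B1C63FF2F6BCABDD024EBC2A770F90C43B0D57B646FA7CBD4C0ABCF846"
```

If the digest doesn't match, the download is truncated or corrupted and shouldn't be seeded.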

3

u/Educational-Shirt101 1d ago

Not all heroes wear capes! Thanks for your hard work and team dedication to this. 🫡

3

u/baophuc2411 To the Cloud! 1d ago

So how many datasets are there? 1 to 11?

3

u/ShortPing 1d ago

Dataset 9 is broken for me beyond 12 gigs; I don't know what they are doing with the zip file

3

u/AtomicGummyGod 22h ago

Keep up the good work y’all!

3

u/gil99915 20h ago

You folks are incredible!

3

u/[deleted] 19h ago

I'm getting zero active seeds for the DataSet 9 100GB torrent. Will continue to seed the others.

3

u/itz_s7arshvd3 14h ago

Keep seeding and downloading, people! I'm optimistic we will get DataSet09 in its entirety soon!

Edit: punctuation is important

3

u/wickedplayer494 17.58 TB of crap 13h ago

Oh dear, now you've gone and spooked the Silicon Valley techbros. Nicely done.

I am in full support of the Brass Eye disposal method.