Dark Web Data Leaks

This episode of The Dark Dive takes a listener’s question as a jumping off point to talk about the topic of data leaked on the dark web.

Guests Luke Donovan and Adam Wilson discuss a series of noteworthy cases of historic dark web data leaks over the years – impacting organizations such as 23andMe, Ashley Madison, Yahoo! – and bring things right up to the present day (June 2024) by looking at the data leaks on BreachForums impacting Ticketmaster and Santander customers.

Speakers

Aidan Murphy

Host

Luke Donovan

Head of Threat Intelligence

Adam Wilson

Product Manager

This episode of The Dark Dive covers:

How data is stolen in the first place

How does sensitive information such as credit cards, addresses, passwords, usernames, and even biometric data get into the hands of hackers?

How data is packaged and sold on the dark web

Including how data is presented in different ways depending on where exactly on the dark web it ends up, covering everything from hacking forums to paste-bins.

The implications of highly sensitive data being leaked

Biometric and genetic data appearing on the dark web opens new concerns for data misuse.

Transcript

Aidan Murphy: Hello, and welcome to another episode of The Dark Dive, the podcast that delves into the depths of the dark web. My name is Aidan Murphy, and I’m your host as each month we take a look at a different aspect of the dark web. This month, we’re looking at the topic of dark web data leaks, inspired by a question from our listener Tom Houpt...

We’ll come to that specific story later in the show, but this topic couldn’t be more timely, given that one of the biggest news stories of the month was a series of data leaks on the dark website BreachForums, impacting Ticketmaster and Santander. Here to discuss the deluge of data on the dark web with me are two dark web data-dwelling aficionados, Adam Wilson, Product Manager at Searchlight Cyber. Hello, Adam.

(TC: 00:00:54)

Adam Wilson: Hi, Aidan.

(TC: 00:00:55)

Aidan Murphy: And Luke Donovan, Head of Threat Intelligence. Hello, Luke.

(TC: 00:00:58)

Luke Donovan: Hi, Aidan. Hi, Adam.

(TC: 00:01:00)

Aidan Murphy: Both of you are first-time guests, so before we jump in, I’m just going to ask you to introduce yourself to the listeners. Adam, maybe we could start with you?

(TC: 00:01:08)

Adam Wilson: Hi, Aidan. Hi, everyone listening. My name is Adam Wilson, I am the Product Manager for DarkIQ here at Searchlight Cyber, that’s our cyber threat monitoring platform. I’ve been at Searchlight now for a little over three years, I think it is. It’s definitely flown. Prior to starting here at Searchlight, I had my own design business. I was heavily into product design, web design, with a, kind of, specialism in UX design as well. That’s just a little bit about me.

(TC: 00:01:37)

Aidan Murphy: Brilliant. Thanks, Adam. Luke?

(TC: 00:01:39)

Luke Donovan: Hello, everybody. My name is Luke Donovan. I am Head of Threat Intelligence at Searchlight Cyber. I’ve been here for about three months now, so a relative newcomer to the organization. Before here, I worked in a digital risk protection company for eight years. I handled all sorts of different sources or services around the dark web and open-source information. Before that, I was in military intelligence for a while.

(TC: 00:02:04)

Aidan Murphy: Awesome, thanks Luke. Alright, so this is a really big topic, and I want to kind of start at the highest level and work our way down. Then we can bring in some examples as well, and obviously get to the listener question that we’ve been asked too.

Adam, I’m going to start with you again. My overall impression of the dark web is basically that you just can’t move for stolen data, that if you want stolen data, whether it’s financial data, personal data, passwords, that the dark web is the place to go. Is that correct?

(TC: 00:02:33)

Adam Wilson: I would definitely say so. In the short time that I’ve been here, I’ve definitely learned this to be true. There are a whole host of different mechanisms and ways in which someone who’s motivated enough to try and move data on can do that on the dark web. There’s a whole different, sort of, variety of options that they’ve got, from, sort of, forums, generic forums, paste sites, dark web markets, autoshops, so there are a whole host of options for anyone that’s looking to move that data on, that stolen data or that accessed data.

(TC: 00:03:06)

Aidan Murphy: Amazing. We monitor the dark web, including for leaked data sets, and presumably this all has to be categorized. Luke, what types of data do we see on the dark web?

(TC: 00:03:16)

Luke Donovan: There’s a massive variation in data sets which we can identify on the dark web. We can identify anything from breaches of credentials, these could be combo lists whereby it’s just a database full of variations of credentials, to individual breaches of systems whereby, again, we could have credentials in there, we could have system information that’s off something called an infostealer, to looking at whole databases associated to organizations. A threat actor could have gained access to your system, all that content could have been exported and posted online. Then you could see information to do with your code, information to do with your intellectual property, so a whole plethora of different bits of information which is out there.

(TC: 00:04:01)

Aidan Murphy: Okay, so a real range. Just for the listener, when we’re talking about credentials, we’re talking about, like, email addresses and passwords or usernames and passwords, that kind of thing, the stuff someone could use to break into an account?

(TC: 00:04:14)

Luke Donovan: Yes, that’s right, Aidan. By credentials, we mean a username or an email address with a password with it.

(TC: 00:04:22)

Aidan Murphy: Do you have a sense of what the most common types of data sold on the dark web are? I know this is probably very hard to quantify, I don’t know if anyone’s done the research, but typically are we talking about financial data, are we talking about personal data, or something else?

(TC: 00:04:38)

Luke Donovan: From my previous experience looking at data breaches, predominantly it’s going to be breaches of credentials which are out there. On top of that, you will see a lot of information to do with PII of individuals. Some of these breaches will contain addresses of where people live, so it’s going to be physical addresses. It’s going to be IP addresses as well, but typically it’s going to be those credentials which we identify more often than not.

(TC: 00:05:03)

Aidan Murphy: I guess the reason behind that is because then people can use those credentials to break into accounts, and from there, you know, there’s a way to monetize it, right? Adam, you mentioned that as well as there being all types of data, there are all types of places that this data appears on the dark web, so the dark web isn’t just, you know, a standard site and there’s one way the data is leaked, you mentioned paste sites, dark web markets, autoshops. Maybe if you could run through a couple of examples of how these sites look and how they differ, that would be quite helpful, I think, for people listening in.

(TC: 00:05:40)

Adam Wilson: I mean, there’s a whole, sort of, playground for these threat actors out there on the dark web. If you look at forums, just as you would imagine any sort of forum on the clearnet, they work in very much the same way, although the categories and topics of discussion might slightly differ. Essentially, it’s like a mixed bag, it could be anyone that goes on there. That’s talking about things like initial access, there could be database vendors on there, you could have malware devs, there could be spammers, programmers. That could be anything, not just necessarily trying to move on data but they could be talking about how to breach something, they could be sharing advice on that kind of level. Then you’ve got dark web markets, so things like Russian Market and Abacus, for example. That is a place generally where anything can be sold, so that could be physical and digital products. There will be associated payments and then, sort of, deliveries and some sort of level of infrastructure around that.

Autoshops are slightly different, so they specialize exclusively in, like, digital goods. That is where you will be seeing things like access being sold specifically, there are no physical products being sold. Generally what they tend to do is they will take large data from, like, a larger breach set and then they will break that down into smaller, individual accounts and things which are then sold through the auto shop, basically. Then paste sites are slightly different again. They’re, basically, like an online area, like a website, for example, a dark website, where users can go on and share text-based information. They’re quite popular because they can be done anonymously, you don’t generally have to set up an account, there’s no, sort of, authentication process. Admins are quite lax, as well, so they won’t tend to be doing any moderation and taking content down. On there, you can share anything. You can share, like, large volumes of data in text format very, very easily and very, very quickly.

(TC: 00:07:33)

Aidan Murphy: That’s interesting. Just to summarize, so markets and forums you might be getting these kinds of massive data sets, we’re going to talk about some examples soon, that are, you know, often into the hundreds of thousands, sometimes even millions of people impacted, so you might be getting those kinds of big data sets being sold for a larger price, I guess, and then auto shops, you’re saying, is more actionable data, so someone’s done some of the work on it already, and then maybe you’re paying for the productized version of the data, I guess. Is that me understanding correctly?

(TC: 00:08:03)

Adam Wilson: I think that’s fair to say, yes. I think there’s a lot of work that’s gone into, obviously, breaking down that larger set, and what they’re trying to do is provide you with something that you can conveniently take and then action, like you say, is actionable. It’s a lot more convenient than having to, sort of, delve through that larger breach set yourself.

(TC: 00:08:19)

Aidan Murphy: Then, on paste sites, is the data there being sold, or is that just people sharing data for others to use?

(TC: 00:08:26)

Adam Wilson: Yes, not being sold, so that’s literally just a mass dump. It’s usually an already stolen data set. It may already have been monetized at that point and then just dumped anyway.

(TC: 00:08:37)

Aidan Murphy: Interesting, okay. Luke, do you want to come in?

(TC: 00:08:41)

Luke Donovan: Yes. I think you’ve raised a good point there about that processed data, because you’ll see this across a lot of different forums and lots of different data sources, that some of the content which is breached will go out in its raw format, so the way the threat actor has obtained that content. In other ways, the information has been processed, so it has been split out. That’s where you’re going to get your different file formats and a different amount of processing needed by the threat actor. When that information has been posted, when they want to exploit that information, it might mean that they need to process the information much more than that which has already been processed for themselves.

(TC: 00:09:17)

Aidan Murphy: Yes, I think it’s a really important point, because, again, it comes to the dark web ecosystem, so there are people who will take, I guess, maybe these raw data sets and do the work on it to process it, and then sell it on again, so we get this supply chain building up to maybe a potential attack. That’s very interesting. Luke, I guess we’ve touched on it a little bit, but how are people getting hands on these data sets in the first place to be able to sell them? Presumably through theft, but is there anything more we can add on to that?

(TC: 00:09:51)

Luke Donovan: Absolutely. A lot of it is through theft, but how the threat actor gains initial access to the systems is the main point. It could be through valid accounts, so there have already been breaches of credentials out there, so the URLs, the email addresses, the passwords, which could then be exploited by a threat actor to gain access to the system. Then, when they’ve gained access to the system, they could potentially extract the content of that system which they’re in and put it onto the dark web to sell or potentially give it out there for free. There are different ways you can deal with that content when you’ve got hold of it.

(TC: 00:10:30)

Aidan Murphy: In that case, it’s almost like one data breach leading to another data breach, so some credentials have got out there somewhere, and I think at this point everyone who’s listening will be aware of cases where their credentials have been leaked or they’ve seen lots of people’s credentials being leaked, and then you’re saying those credentials are taken and then used again on another company and then that could be a potential data breach again? It’s almost like a wildfire kind of situation, in my mind, one leading on to the other.

(TC: 00:10:58)

Luke Donovan: Absolutely, Aidan, absolutely. The other way people do gain access is through the exploitation of CVEs. Let’s say there’s a vulnerability on your system, a threat actor is going to go off, they’re going to do some research into your organization, understand your infrastructure and the systems which you’re operating, identify vulnerabilities associated to them, exploit them, and then gain access to your system again, conduct their activities on your system, and then again extract that content and decide what to do with it. Now, what they do with it is going to depend on their motivation, whether they’re financially motivated, whether they’re ideologically motivated, whether it’s ego. If they’re financially motivated, we’re going to see that content going up for sale. If it’s for ego, you may just see them just releasing all that content for free, so then that’s a good indicator that, you know, we could find out some more information about that threat actor, or dig deeper into getting the whole breach. If it’s ideologically motivated, again, we can find out information about the threat actor themselves and that whole data set is probably going to be leaked for free as well.

(TC: 00:12:04)

Aidan Murphy: Interesting. That kind of comes on to something I wanted to ask about, because there’s also this concept of dox sites. Adam, maybe as you’re explaining where all this stuff is going, dox sites, typically that’s where we see data going up for free as well, usually targeting an individual. Could you maybe explain a little bit to the listener what a dox site is? I think it’s probably an interesting topic for a whole other podcast on its own, but I do think it’s relevant here.

(TC: 00:12:30)

Adam Wilson: Dox sites are another common area that we see information getting leaked onto. I think this is more specifically around PII, so we’re not talking about credentials per se here or we’re not talking about access to a system or being able to necessarily leverage anything to do with that user, it’s more about someone who is motivated in some way, like Luke says, whether it’s purely for fun, you know, whether it could be things like script kiddies, or whether there is some sort of ideological motivation behind it that means that they want to specifically post data, very personal information about a specific individual or individuals. That’ll usually be things that are highly personal, so home addresses, workplace information, private correspondence, photographs, videos, anything that you could find that if you were personally being doxxed you’d be absolutely mortified that this thing that is quintessentially just yours and very private is then, kind of, released online.

(TC: 00:13:30)

Luke Donovan: A point to consider when it comes to the dox sites and doxxing in general is quite often it’s the younger generation who are getting doxxed through gaming activities. Therefore, when it comes to threat intelligence and monitoring for threats towards you, your organization and, sort of, your wider operating environment, quite often it’s beneficial not only to monitor for your high-value individuals within an organization, but to look for their family members. It could be their sons, their daughters, other people in their family who have been doxxed, which that information could then be used by a threat actor to then target that company, that individual, moving forwards.

(TC: 00:14:09)

Aidan Murphy: Absolutely. The picture that you’re both painting for me, it hasn’t changed my preconception, the dark web is full of all this data, but I think the variety of this data is super-important. A lot of it, obviously, is, kind of, financial data, credentials, things that can be used for breaches. There’s also a lot of personal data out there as well, and we’re going to come on to some examples of probably the most personal data you could possibly think of.

Before we get there, I think just to take, I guess, some more typical use cases and go back to these ones that I mentioned at the top of the show, these have happened this month, Ticketmaster and Santander are two large companies that have been targeted by a specific actor on a notorious dark web forum this month. Adam, could you give us a little bit of an overview of that story? What happened there and what does it tell us about dark web data breaches?

(TC: 00:15:02)

Adam Wilson: Sure. There’s no shortage of data breaches, really, at any, kind of, given point in time. I think, unfortunately, there have been some very high-profile ones in this last month, obviously, like you say, the Santander and Ticketmaster breaches. Both have been attributed to ShinyHunters. We first, kind of, got news of them around 2020. They’re just another black-hat criminal hacker group. I think, from what I understand, the name originates from something to do with Pokémon. I’m not too invested in the wider Pokémon universe, as it were, but-

(TC: 00:15:36)

Aidan Murphy: That’s a great fact, I didn’t know that. Like shiny cards? Okay.

(TC: 00:15:39)

Adam Wilson: Yes, there’s some sort of shiny Pokémon mechanic, and I think they’ve got it as their profile on X.

(TC: 00:15:46)

Aidan Murphy: I love that you’re pretending that you didn’t collect Pokémon cards.

(TC: 00:15:49)

Adam Wilson: I definitely didn’t, but, yes, these hacks have been attributed to them. We know that they first came on the scene around 2020, and then you’ll probably remember in 2021 was one of their more high-profile breaches, which was AT&T. Again, we’re talking about quite large volumes of customer data here. I think with Santander we’re looking at, like, 30 million records, Ticketmaster 560 million, thereabouts. We see it on BreachForums. I think it originated from Russian hacking forum Exploit as well. Again, it’s everything you’d expect, so we’re talking about PII, financial info, we’ve got hashed credit card details and purchase histories and things like that. Santander have maintained that none of the online banking was impacted, but still we’ve got this huge wealth of information that’s now out of their control.

(TC: 00:16:41)

Aidan Murphy: This is what I think of as a typical data breach. By that I mean we’re talking names, addresses, phone numbers, partial credit card details, what I would consider to be quite typical things that you see when we see these data breaches. What strikes me, though, is the size of these data sets, so 560 million customers impacted in the Ticketmaster breach alone. You know, that’s the size of a very, very large nation. Is this typical, to see, kind of, data breaches of this size? Luke, you’re nodding.

(TC: 00:17:12)

Luke Donovan: The Ticketmaster breach was 1.3 terabytes in data. That is huge, absolutely huge. Earlier this year, in February, there was a breach called ‘The Mother of All Breaches’. That was massive, I’d never seen anything like that. It’s rare that we see breaches within the terabyte realm, so this is huge, but what’s really interesting is the way the threat actor gained access to Ticketmaster and to Santander. This was through breaches of credentials, again, so it’s really important that we monitor for any credentials out there associated to us and our organizations. A threat actor gained some credentials associated to a contractor. They were able to gain access into a platform, and then the threat actors have been able to get access to this content, extract it, and then leak it to the world. We see it often, this sort of activity, in terms of initial access via a valid account, gain access, exploit that organization, and then pivot off there and hit other organizations associated with them.
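
Editor's note: Luke's point about monitoring for credentials associated with your organization can be illustrated with a small sketch. It assumes a leaked combo list arrives as plain text with one `email:password` pair per line, which is a common but not universal layout. The function name `find_exposed_credentials`, the sample lines, and the domain `example.com` are all hypothetical and chosen only for illustration; this is not a description of how any particular monitoring product works.

```python
from typing import Iterable, Iterator, Tuple


def find_exposed_credentials(
    lines: Iterable[str], domain: str
) -> Iterator[Tuple[str, str]]:
    """Yield (email, password) pairs whose email belongs to `domain`.

    Assumes each line looks like "email:password" (an assumption about
    the dump format, not a guarantee).
    """
    suffix = "@" + domain.lower()
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and anything without a separator.
        if not line or ":" not in line:
            continue
        # Split on the first colon only, since passwords may contain colons.
        identifier, password = line.split(":", 1)
        identifier = identifier.strip().lower()
        if identifier.endswith(suffix):
            yield identifier, password


# Hypothetical sample data standing in for a leaked combo list.
sample_dump = [
    "alice@example.com:Summer2024!",
    "bob@other.org:hunter2",
    "CAROL@EXAMPLE.COM:pa:ss:word",
    "not a credential line",
]

exposed = list(find_exposed_credentials(sample_dump, "example.com"))
for email, _password in exposed:
    # Report the account only; avoid echoing the leaked password itself.
    print(f"Exposed account found: {email}")
```

In practice a match like this would typically trigger a password reset for the affected account rather than any further handling of the leaked password.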

(TC: 00:18:19)

Aidan Murphy: Just a couple more points to pull out, BreachForums is a forum that has been around for a long time. Before it was BreachForums it was in another iteration called RaidForums. It’s notorious for this type of activity, and like you said, Adam, ShinyHunters, they’re a known hacking collective that are associated with the forum, so they specialize in this kind of activity. One thing that’s worth calling out, so this is a case, as you described earlier, of a very, very large data set being sold for a large amount of money, so maybe the data set hasn’t been processed yet, but, for Ticketmaster, they were asking in the region of $500,000 in payment for that data, which gives, I think, listeners a sense of why these people go to the effort of going about it. Luke, you said earlier, ‘What’s the motivation behind it?’ For ShinyHunters, it’s a hacking collective, they are financially motivated, right, that’s what these guys do?

(TC: 00:19:20)

Luke Donovan: Absolutely. ShinyHunters, they’ve been around for a while, not only on BreachForums but on other forums as well. I believe they’re also an administrator of one of the forums. What this means is that they could grab hold of this breached content, they could try and sell this content on, but what they can also do is with the content they could put it onto the forum and allow users of the forum to buy credits within that forum to then download that breach for a lot cheaper. This is later on down the line once they’ve sold the raw data when it’s new, but later on sell it on the forum for eight credits, for example, then everybody has got access to the content.

(TC: 00:20:06)

Aidan Murphy: Again, this is probably not going to be one sale and done, we’ll see this data again, probably, in some other form in some other place?

(TC: 00:20:14)

Luke Donovan: That’s a key point there, Aidan, in terms of the proliferation of data and breaches. Once a breach is out there, it’s going to be extremely hard to do anything about it, because it will go around the different forums. It won’t just stay on the dark web either, it’ll go onto the surface web, so you’ll see it being posted onto Telegram channels, onto clear web forums as well. You’ll be constantly chasing yourself trying to take down the content, but once you have got that content or once all that content is being proliferated, it will be broken down, like Adam’s explained already, and little bits of it will be sold on separately, or you’ll get individuals in forums asking questions like, ‘Does anybody have a data set for, say, the UK or the EU around usernames, email addresses and bank details?’ Then people will break apart all these breaches, put together a new breach, and disseminate that off, call it something different. We need to think about the de-duplication of breaches when they’re out there as well in the wild.
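
Editor's note: the de-duplication Luke mentions can be sketched roughly as follows. The idea is to reduce each record to a normalised fingerprint so that the same credential pair is recognised even when it reappears in a repackaged, renamed dump. The record layout (email plus password), the dump names, and the helper names `fingerprint` and `deduplicate` are assumptions made for this illustration, not a description of any specific tool.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (email, password), an assumed record layout


def fingerprint(email: str, password: str) -> str:
    """Hash a normalised record so repackaged copies compare equal."""
    # Emails are case-insensitive in practice; passwords are not.
    normalised = email.strip().lower() + "\x00" + password
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def deduplicate(dumps: Dict[str, Iterable[Record]]) -> Dict[str, List[Record]]:
    """For each dump, keep only records not seen in an earlier dump.

    Dumps are processed in insertion order, so put the oldest first.
    """
    seen = set()
    new_by_dump: Dict[str, List[Record]] = {}
    for name, records in dumps.items():
        fresh: List[Record] = []
        for email, password in records:
            fp = fingerprint(email, password)
            if fp not in seen:
                seen.add(fp)
                fresh.append((email, password))
        new_by_dump[name] = fresh
    return new_by_dump


# Hypothetical data: the second "breach" mostly recycles the first.
dumps = {
    "original_breach": [
        ("alice@example.com", "Summer2024!"),
        ("bob@example.com", "hunter2"),
    ],
    "repackaged_uk_list": [
        ("Alice@Example.com ", "Summer2024!"),
        ("dave@example.com", "letmein"),
    ],
}

result = deduplicate(dumps)
for name, fresh in result.items():
    print(f"{name}: {len(fresh)} genuinely new record(s)")
```

Storing only the hash of each record means the comparison set does not itself need to hold the leaked credentials in clear text.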

(TC: 00:21:18)

Adam Wilson: It’s a really good point Luke raises. I know we haven’t touched on it yet, but specifically the Ashley Madison breach back in 2015. When we talk about breaking down that data set into, sort of, specific groups or sub-groups, I know that there was an approach that someone took because they wanted to break that data set down because there were specifically 1,200, I think, user accounts that related to people in Saudi Arabia, for example, and obviously if you know what Ashley Madison was for, it was for, like, extra-marital affairs. Adultery in their culture is punishable by death, so you can see how someone is very, very financially motivated to try and run some sort of extortion campaign specifically against those users. That’s why these data sets can be so much more valuable going forward because they can be broken down and then used in so many, sort of, disparate and different ways.

(TC: 00:22:09)

Aidan Murphy: I mean, it’s a really good point, and I think, again, quite a terrifying point, in that once the data is out there, it does just seem to be out there. Ashley Madison, like you say, Adam, is a great example. I asked the threat intelligence team to look into it because of the Netflix documentary, and that database seems to just continuously circulate. That’s out there now, and it can’t be taken back, even though that was almost a decade ago. Quite a terrifying point. Breaking down the data I’m going to come back to in a minute, because that actually comes to one of my next examples.

Just before we get there, so going back to these examples of what I called traditional data leaks but on a massive scale, Luke, when you look at this from a criminal’s perspective, names, addresses, phone numbers, partial credit card details, account numbers, balances, how could this data potentially be used by criminals? You know, if I’m someone who is looking to buy this for $500,000, you know, it’s not a small amount of money, what am I looking to get out of it? How am I going to use it?

(TC: 00:23:13)

Luke Donovan: It’s going to depend on what the data set is which you’re purchasing, because there are so many different data sets out there with different pieces of information, but typically for a threat actor it’s going to help them with their reconnaissance phase, so understanding who their target is by a reconnaissance phase, building up that persona in terms of who are they attacking or who could they attack. This could be a case of utilizing PII for an individual in order to impersonate an individual. You could use it to utilize the credentials to gain access to different organizations to breach them. You could use it for financial gain as well. A lot of credit card information is out there, you can then start purchasing goods. Gaining access to systems, another one. It could be corporate espionage as well. Quite often a lot of your breaches are going to be relating to an organization. If that information hasn’t been processed by that threat actor, you could end up with sensitive documents which are being leaked as well, which could then be exploited. It could be plans of what that organization wants to do in the future. Therefore, if that falls into the wrong hands, those plans would now be known.

(TC: 00:24:26)

Aidan Murphy: I mean, there are a lot of different ways, basically, this could be used and misused, and, like you say, it almost comes down to, at that point, who’s willing to spend the $500,000 or bid for more. Then, like you say, that probably doesn’t mean that’s the end of it, and then someone could use it for a completely different purpose. You’re talking about traditional fraud, corporate espionage, further cybercrime. There are a lot of different ways it can be used and reused and misused.

Okay, well, I think it’s time we should turn to our listener’s question. Tom Houpt has sent in a question specifically about a data breach that took place last year, 23andMe. If people don’t know, before I get into the question, 23andMe is a genetic testing company where you can submit your DNA and they come back and tell you, kind of, the breakdown of your ancestry. His question is, ‘After a breach, there’s always talk about your data being sold online, and generally that’s stuff like your passwords, email, SSN, but with genetic testing becoming the norm with companies like 23andMe, are there implications for your genetic data being sold on the dark web? What are the implications now and in the future?’

To start off, Adam, do you mind covering a little bit about the breach, just to set the scene of exactly what happened and what was taken?

(TC: 00:25:44)

Adam Wilson: The threat actor managed to gain persistence in their system. I think they were there for about five months without detection. There were about 14,000 customers breached in total from, like, emails and passwords that were publicly leaked from another breach, so this was basically like a credential-stuffing attack. That threat actor has said, ‘We’ve got all these credentials and we’ve got all these passwords,’ and they’re just trying to gain access to another system using them. Some will be successful, some won’t. Interestingly, though, 2FA, so that second layer of authentication that can help safeguard an account, wasn’t mandatory, which is interesting, again, when you think of, like, the nature of the information that’s being stored within that site. I’d kind of expect it to be mandatory and enforced. Again, it just very, very quickly and easily eliminates that kind of risk.

(TC: 00:26:38)

Aidan Murphy: Just for the listeners, sorry, I’m just going to jump in, 2FA is two-factor authentication. This is when your bank sends you a one-time passcode, or it might be, you know, you have to give your mother’s maiden name or something like that. As Adam said, an extra kind of authentication method so you can’t just log in, usually, with your username, email address and password.
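
Editor's note: for readers curious how a one-time passcode is produced, here is a minimal sketch of the time-based one-time password (TOTP) scheme standardised in RFC 6238, built on HOTP from RFC 4226. Authenticator apps commonly use this scheme; whether any company discussed in this episode uses it is not stated in the conversation. The shared secret below is the published RFC test value, not a real key.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    # The counter is encoded as an 8-byte big-endian integer.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick an offset.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(
    secret: bytes,
    at: Optional[float] = None,
    step: int = 30,
    digits: int = 6,
) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


# Test secret from the RFCs; a real secret is random and kept private.
shared_secret = b"12345678901234567890"
print("Current code:", totp(shared_secret))
```

Because the code depends on a secret held by both the user's device and the service, a stolen username and password alone would not be enough to log in, which is why enforcing a second factor blunts credential-stuffing attacks.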

(TC: 00:26:59)

Adam Wilson: That wasn’t mandatory, so that made getting access a lot easier. Then, from there, what started as 14,000 accounts initially then allowed the hackers to spider that out to, I think, about 6.9 million accounts by way of a feature that was called ‘DNA Relatives’. That was basically genetic ancestry data, PII, all of that basically was stolen, you know, like I said, this has been one of the first times we’ve seen this kind of level of genetic data being breached. This then leads us onto that, kind of, terrifying notion of how can this data be leveraged? I think whilst we don’t yet, we don’t have a crystal ball per se for the future, I think there’s probably some largely hypothetical ways that we could leverage genetic data. We still don’t know how businesses will leverage it in future, law enforcement, you know, etc, things like that. But, I think some of the things that are quite scary at the moment is that you can make groupings from those accounts based on things like ethnicity. So, anyone who is, like, politically or ideologically motivated can find, you know, a set of data that they can then group by a particular ethnicity. So, I think in this example it was predominantly Jewish and Chinese accounts, so any kind of extremist or terrorist group with that level of motivation could access, like, whole swathes of data about people, in their target zone, which in and of itself is terrifying.

(TC: 00:28:33)

Aidan Murphy: I think it’s worth saying there, that it’s not just that could be done, that was what was done, so the person who was selling this data, on a hacking forum, grouped the data by Jewish ancestry and Chinese heritage, as two, kind of, example data sets. So, again, like you say Adam, we don’t have a crystal ball for how that could be used and exploited, but it is certainly very concerning that it was even being sold, or was suggested to be sold in that fashion, and it certainly suggests, I mean, extremely malicious use cases, if you’re grouping data by that type of ancestry. I think it’s worth saying as well, so the types of data that were in there were family trees, birth years, geographic locations, links to hundreds of potential relatives, phenotype information. So, yes, this is the example I mentioned of really, kind of, the most sensitive data you could possibly think of, when you talk about personally-identifiable information, PII, but this is not just addresses and phone numbers, although that is obviously terrifying in itself, this is very, very personal information. Luke, before we go on, is there anything else you wanted to add in terms of how this attack was done?

(TC: 00:29:48)

Luke Donovan: Not necessarily how the attack was done, but I think it’s interesting that this content is associated around health. A lot of health organizations have been hit, lots of plastic surgery organizations have been hit, and that content, when it’s relating to health and when it’s related to really personal information, can be exploited by a threat actor to target those individuals who have been involved in the breaches. They can then go to them and say, ‘I’ve got hold of this data, about you.’ They can provide snippets, and they can use that to extort those individuals, to pay some form of ransom. We do see that, especially when it comes to imagery, you see a lot of plastic surgery organizations who have been breached by ransomware. That information has then been exploited to target those individuals.

(TC: 00:30:37)

Aidan Murphy: Another quite terrifying way this kind of data could be used. Adam, was there anything else you wanted to say on the potential implications of this?

(TC: 00:30:45)

Adam Wilson: Just moving on from what Luke was saying there, not just that potential for extortion in that, ‘We’ve got this information and pay us or we’ll release it.’ When you’ve got that depth of personal information about people, the amount that you can do with it in terms of more sophisticated phishing campaigns, you can target these people because you can get direct information about their life that is highly leverage-able and can make a really, really convincing campaign to try and then fish for more, basically, and continue the assault on people’s privacy.

(TC: 00:31:20)

Aidan Murphy: It is quite terrifying, but I think even just bringing it back to points we were making earlier in the podcast and applying it to this type of information, the problem is that this data is out there, so initially this was being sold on the forum for as little as $1-10 per profile, which is quite terrifying that that could be accessed for so little money. But now that data is out there, there are different ways that different people could use it, which I think is a really alarming point. Another case that I wanted to bring in as well, maybe slightly less extreme, but still I think has some very serious implications: in May there was a data leak related to El Salvadorian citizens. The entire citizenship of El Salvador was included in a data leak on BreachForums, five million individuals, which included biometric data. Specifically, their names and their details were associated with images, which has quite a lot of implications as well. Luke, maybe what’s the worst case scenario with this and do you have any, kind of, insight into this particular breach?

(TC: 00:32:28)

Luke Donovan: Yes, absolutely. So, the breach, again it was a fairly hefty breach, 144 gigabytes in size. The content which was released, which was breached, on its own, wouldn’t have enabled a threat actor to gain access to any other systems, there were no credentials, there was nothing there to gain access to anything else. So, what you’re looking at here in terms of the threat is that identity fraud, because it’s a lot of PII information associated to those individuals. But a really, really interesting aspect here is who’s behind this attack? It’s really important that we look at who’s behind attacks to work out who might they attack in the future? So, in this example, it was Cyber Intelligensia. Now if you look at who they are as an organization, or as a threat actor group, you can then start working out, who have they targeted in the past? Now this isn’t the first time they’ve targeted El Salvador and its citizens, they’ve also targeted the El Salvadorian cryptocurrency services, and breached a load of content associated to them. So, this enables us to understand that this threat actor group, they are potentially motivated to target the government, or the populace of El Salvador.

(TC: 00:33:46)

Aidan Murphy: So, that’s interesting, and again, it’s hard to say for certain, but what you’re saying is that it’s possible that it is not a financially-motivated group, this might be more politically motivated?

(TC: 00:33:59)

Luke Donovan: Absolutely, so again by looking at the threat actor group themselves, they state in their messages that the content isn’t necessarily always going to be for sale, that they will leak the content for free to anybody, like the cryptocurrency which they obtained, they were just giving that away for free.

(TC: 00:34:19)

Aidan Murphy: I think again, just drawing a point that we made earlier, the problem is, and people highlighted this at the time, is that with biometric data, so with these images for example, the problem is that that is out there forever, and you can change your password and you can change your email address, but you can’t change how you look, and I think this was just images, but things like fingerprints, DNA, if those things are leaked, it is very hard to un-leak them. Before we get into, just I guess a little bit of summing up on what you can do, are there any data breaches that either of you wanted to call out? So, Adam, you’ve already called out Ashley Madison, but are there any other, kind of, quintessential data breaches that you guys think of, when you think of these massive data leaks?

(TC: 00:35:06)

Luke Donovan: Although there are a lot of breaches out there associated to individual organizations, and individual assets, quite often what we are going to see on the dark web are compilations of breaches, so the likes of Collection 1, Collection 2-5, Breach Compilation. These breaches, in the past, caused me massive headaches because as soon as they leak, everybody gets involved in them, thinking, ‘Okay, is my content in those breaches?’ A lot of that content is typically old information, but from a threat actor point of view, it’s really, really valuable, because what you can see within that data set is password reuse, or the changes in password, how do people utilize their passwords? With that information from a threat actor point of view, they can then start planning, how can they attack that organization, or that individual, based off of how that password has been changed over time. So, those bigger, bulk breaches, like we saw in January, February this year with the Mother of All Breaches.
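The point about password reuse and password changes over time can be turned around for defensive use. The following is a minimal sketch, not a description of any tool mentioned in the episode: it assumes a defender already holds their own users' exposed credentials as hypothetical `(email, breach_name, password)` tuples, and it uses a simple string-similarity threshold (an arbitrary 0.8) to flag accounts where the same or a lightly modified password appears across breaches.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(passwords, threshold=0.8):
    """Return pairs of distinct passwords whose similarity meets the threshold.

    The threshold is an illustrative assumption, not an established cut-off.
    """
    pairs = []
    for a, b in combinations(sorted(set(passwords)), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


def reuse_report(records):
    """Summarise exact and near-duplicate password reuse per email address.

    records: iterable of (email, breach_name, password) tuples (hypothetical shape).
    """
    by_email = {}
    for email, _breach, password in records:
        by_email.setdefault(email.lower(), []).append(password)

    report = {}
    for email, passwords in by_email.items():
        report[email] = {
            # The identical password showing up in more than one breach.
            "exact_reuse": len(passwords) != len(set(passwords)),
            # Different passwords that look like small edits of each other.
            "near_duplicates": similar_pairs(passwords),
        }
    return report


if __name__ == "__main__":
    sample = [
        ("a@example.com", "breach-1", "Summer2021!"),
        ("a@example.com", "breach-2", "Summer2022!"),
        ("a@example.com", "breach-3", "Summer2021!"),
        ("b@example.com", "breach-1", "xK9#pQ"),
    ]
    for email, summary in reuse_report(sample).items():
        print(email, summary)
```

An account flagged here is one where a forced password change alone may not help much, because the next password is likely to follow the same pattern.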

(TC: 00:36:07)

Aidan Murphy: So, that was a compilation, that wasn’t just one organization, was that a mashup?

(TC: 00:36:12)

Luke Donovan: So, it was a mashup of breaches, however this was an organization which didn’t secure their servers correctly. So, an individual, a threat intelligence researcher, was able to gain access to a server which contained the whole of this organization’s, I won’t name the organization, the whole of the organization’s database, which allowed individuals to search for their credentials, to see if they were involved in any breaches. Again, that’s not the only organization where this has happened, there was another organization providing threat intelligence services last year, who had all their data set leaked as well. So, these compilations are really, really interesting to me as a threat analyst anyway.

(TC: 00:36:54)

Aidan Murphy: Luke, just while I’m thinking of it, so you have a military background, is there any, kind of, transferable knowledge from that, to this, from an enterprise perspective? In the military is there also this, kind of, concern of, I guess, leaked intelligence or data getting out there that you wouldn’t want out there. Is that something you had any experience of?

(TC: 00:37:17)

Luke Donovan: Yes, absolutely, I was in military intelligence, and there are two aspects to military intelligence, you’ve got the operational military intelligence aspect, and then you’ve got the security aspect. So, the operational military intelligence aspect, that’s what we typically refer to now as threat intelligence, going out, identifying the threats towards your organization. But then, on the other hand you’ve got the security aspect, which is putting in your protective security measures. So, in the military that might be a case of putting barbed wire fences, thinking about what keys do you need to gain access to buildings, it’s exactly the same within the corporate environment. A lot of the methodology we use within the corporate environment is the same, it’s been adapted from the military and now it’s being incorporated within the corporate environment, so there’s a lot of overlap between what I used to do and now.

(TC: 00:38:09)

Aidan Murphy: So, Adam, what’s the data breach that you think of, what’s the one that comes to mind?

(TC: 00:38:13)

Adam Wilson: So, the one that always springs to mind for me, it’s nowhere near as interesting a story as Luke’s, and also, Luke’s being super coy saying, ‘Oh organization one.’ Trying to stay out of muddy legal waters there Luke, can’t say the names?

(TC: 00:38:27)

Aidan Murphy: I tried to get you to name nation states we should be looking for, so-,

(TC: 00:38:31)

Adam Wilson: So, you know, I’m going to get pulled straight into that, no problem. But no, the reason I have to say the name of mine is, it’s the reason it’s so memorable. It was back in 2013, so Yahoo!, their initial breach was in 2013 and they had a series of, like, subsequent breaches. But the main reasons I remember that, not because it was, like, the biggest of all time, it was like three billion accounts that were leaked, it was mainly because when it happened, it was a friend of mine that alerted me to it, and he said, ‘Have you seen, Yahoo! have been breached?’ I was like, ‘Hang on, Yahoo!, do you mean Yahoo!’ And he meant, ‘No, yes, Yahoo!’ That’s why it sticks in my mind, because never in my life have I ever heard it pronounced Yahoo!, I’ve always said Yahoo!, but then as the years have gone by I’ve started to actually question that, and I’m now concerned that maybe I’m wrong. So, yes, so it’s stuck in my mind predominantly for that reason, but also it was an absolute massive one. Then the other one, which I think if you want to start talking about mitigation and what individuals and organizations can do to try and prevent these kinds of breaches, then probably Facebook would be a good example of ones in the not too distant past.

So, I think initially, in 2021, they were breached, with 530 million people impacted. But prior to that, and this is why it sticks in my mind is because they’d had quite a torrid time of it, starting around March 2019, there were reports that internal employees had access to about 600 million users’ data, so that’s across Facebook and Instagram, where they could see their, sort of, passwords in clear text. This is, like, all internal employees, which that in and of itself is slightly alarming, there should be protocols in place and access controls and things like that, that mean that that kind of data isn’t just visible internally, or necessarily that easily. Then I think following on from that in April 2019, there was another disclosure of 540 million unsecured data records in a public AWS cloud server, where again, this was relating to a third party. So, a third-party app developer just failed to protect that data set. So, like Luke mentioned earlier, you need these controls for everything, internal users, access controls, good identity management. When you’re looking at, you know, sort of, third-party suppliers and things like that, there have got to be more stringent controls in place, just to protect, from very, very simple points of failure like that. Then I think following on from that, there was like, a few months later there were more records exposed on a foreign server on the dark web. That sums up a pretty, sort of, torrid few years there, I wouldn’t have probably wanted to be working there during that time, I think it would have been a bit stressful.

(TC: 00:41:30)

Luke Donovan: I would like to give another example, because it really showcases the value of processing breaches, so obtaining breaches and understanding the content of those breaches. So, a few years back, Go Nitro, so Go Nitro is a PDF organization, you upload a PDF onto the website, and then you’re able to edit that PDF. Go Nitro were hit by an attack and their whole database was leaked online. Now typically the data which was leaked, it seemed fairly interesting, there were some credentials in there, there was some information about who was breached, the accounts which were utilizing the service. But when you get hold of all the files and you start processing that data, what you can then start doing is taking those email addresses, and understanding what documents they uploaded onto Go Nitro, and what documents they were editing. Now that’s really valuable because a lot of organizations were using this to edit their PDFs, rather than using some form of service internally, they just thought, ‘Oh I’ll take a shortcut, I’ll just go and upload it onto Go Nitro and edit the PDF.’ They then got breached, and then the people who breached Go Nitro, got hold of the whole database, in the database it held the raw files, which were uploaded as well. Those raw files were then, I can’t remember whether they were sold on, or they were auctioned off, because it all then happened under a really closed environment. Then all that information was leaked. So, what I’m trying to get at here, is from an organizational point of view, you can look for breaches to do with your domain, your high-value individuals, your company name, etc, but also consider the files which may be out there as well, and what’s happened to those files, what’s happened to that content, how valuable are those files to you and your company?

(TC: 00:43:27)

Aidan Murphy: That is a brilliant example, I’d actually forgotten about that, but I remember that was a really big deal at the time, because again the type of information that people include in PDFs, and the fact that they were all uploading them, that was a very, very serious issue. So, you mentioned monitoring for whether your organization is impacted, maybe if you could just expand on that a little, so if the data’s out there, what’s the point of monitoring I guess, what can you learn by monitoring, just to see if your organization is impacted?

(TC: 00:43:57)

Luke Donovan: First of all, you want to see, is there any content of you, of your organization out there, but not only your organization, but those individuals, or those organizations associated with you as well. So, this could be your third parties, your supply chain. You want to be able to monitor this to understand what, if there has been a breach, what impact is that going to have on your organization, what reputational damage could this cause to you, what sensitive information is out there which could be exploited by the threat actor? Again, that’s going to vary depending on the information out there. But it’s just going to give you options, you’re going to have that knowledge then in terms of what actions you should be taking. Now the third party aspect is really interesting to monitor for, and to search for. Now threat actors, when they post breached information, quite often they’re going to make it very clear what the breached information is associated to, or who it’s associated to, because they want to get money out of this, so they’re going to say, ‘This is breached content associated to X, Y and Z.’ That’s going to then allow those threat actors to decide, ‘Yeah, I’m interested in that because it’s to do with the banking organizations which I want to target because I’m financially motivated.’ Or it might be a grocery provider, which I want to target, I might have some ideological reasons why I want to target them. So, they see that name, they want to go off and purchase that content, or download it.

But, by reviewing or trying to identify information to do with your third parties, your supply chains, what you can then do is work out, is there anything to do with me which is being leaked, associated to these organizations? Last year, I used to provide intelligence as a service to organizations, whereby they would task me to do lots of different, random sort of stuff, identifying breaches of information. With one of my customers, their supply chain was hit by ransomware, and all the content was put on a dark web extortion site. The organization which was put onto the dark web extortion site sent out an email to all of their customers and suppliers stating their information was not leaked, there was nothing to do with them on the dark web. I then got tasked to then go off, to the dark web, look at their supplier, get hold of the data, and review the content, to identify whether there was anything or wasn’t anything in there. When I had done that, I identified that there was content out there associated to my client. So, the supplier, had sent out an email saying nothing was there, and actually there was content there. So, then, legal action could then be taken. It was a bit messy, but it’s understanding what data is out there and how that impacts you, not only for yourself, but also for your, sort of, supply chain.
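The idea of checking leaked data against both your own organization and your third parties can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how any particular monitoring service works: the `LeakRecord` shape, the watchlist labels, and the example domains are all hypothetical, and real leak data is far messier than a list of email addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeakRecord:
    """Hypothetical minimal shape for one row extracted from a leak."""
    source: str  # where the leak was found, e.g. a forum post title
    email: str


def domain_of(email):
    """Return the lower-cased domain part of an email address."""
    return email.rsplit("@", 1)[-1].strip().lower()


def match_watchlist(records, watchlist):
    """Group leak records by the watched domain they belong to.

    watchlist maps a domain to a label such as "own" or "supplier".
    Subdomains count as matches (mail.example.com matches example.com),
    but lookalikes such as notexample.com do not.
    """
    hits = {}
    for rec in records:
        dom = domain_of(rec.email)
        for watched, label in watchlist.items():
            if dom == watched or dom.endswith("." + watched):
                hits.setdefault((watched, label), []).append(rec)
    return hits


if __name__ == "__main__":
    watchlist = {"example.com": "own", "supplier.example": "supplier"}
    records = [
        LeakRecord("forum-post-1", "alice@example.com"),
        LeakRecord("forum-post-1", "bob@mail.supplier.example"),
        LeakRecord("forum-post-2", "carol@notexample.com"),
    ]
    for (domain, label), found in match_watchlist(records, watchlist).items():
        print(f"{label}: {domain} -> {len(found)} record(s)")
```

Keeping suppliers in the same watchlist as your own domains is what makes the scenario Luke describes detectable: the hit arrives under the supplier's name even though the exposed content concerns you.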

(TC: 00:46:46)

Aidan Murphy: It’s really interesting, because I guess as much as we’ve been talking about, you know, once the data is out there, it often exists out there, there are still actions you can take to limit how damaging that data is, we talked about some of them already, like you said at the beginning Luke, credentials are probably the most common thing we see, but that might be at the point where you say, ‘Okay, we do need to have two-factor authentication, we do need to enforce password changes across our customer base, or partner base, or staff base, or whatever kind of data has been impacted.’ Adam, obviously you create our products here, and a lot of that is about dark web monitoring and we do it on, kind of, an attribute basis. Can you just maybe explain to listeners how that works, because again I think some people will be listening to this and thinking, ‘You’re telling me this data exists out there on the dark web, it’s just floating around, you mentioned all these terrifying places it can end up, paste bins, dark web markets, autoshops.’ How can practically, the people listening to this find out if their company’s data is out there?

(TC: 00:47:49)

Adam Wilson: You can obviously employ things like manual searches, but probably the easiest way is that you want to leverage some kind of platform or service that’s going to go out and do all that heavy lifting for you. So, obviously in our instance, we, like you say, we monitor specific assets that are getting defined by a company, so that could be domains, network attributes, emails, files, things like this. Then we’ve got a whole, sort of, host of processes that run behind the scenes, just like magic, where we will go out and then look for more of their public-facing infrastructure, from the assets that they’ve given us, so we understand more about that front-facing, public-facing attack surface, the whole external attack surface side of things. Then what we do is all of that data gets run through again, a whole other load of processes and against our data lake, which is, I believe is one of the most extensive dark web data lakes that exists. We have a whole host of processes that take place, where we’re looking across not just dark web sources and dark web data, we have other, there’s other clear net aspects to it as well.

But essentially, what we’re doing is, we’re looking, you know, on a very regular basis for any signs of compromise, so whether that’s credentials, whether it’s any, sort of, aspect of your software that’s implicated with a CVE that might be being currently exploited. As Luke said earlier, it’s all about really being forewarned is forearmed, you know, you have that data, you can then take some form of action, whether that’s some sort of defensive or just a means to mitigate any further potential compromise, so in the event of credentials, you can start doing that proactive piece of getting people to change passwords, and you can start looking at internal systems. We managed to get traffic data as well, so you can look at that dark web connection that’s coming into your public-facing infrastructure, you can marry things like that up with logs, internal logs and things like that, and you can start piecing together a picture of what’s going on and what might have happened. Then again from there, you can start to understand whether there is the potential that a breach has occurred. You can start to contain that, you can start to track that down in your systems, and again you’re just then preventing that initial threat from escalating, because you’ve got sight of it at the earliest possible time.
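Marrying dark web traffic up with internal logs can be illustrated with a small sketch. It rests on several assumptions that are not from the episode: that "dark web connection" can be approximated by a list of known Tor exit relay addresses obtained elsewhere, that logs use a simple `<timestamp> <client IP> <path>` layout, and that the addresses below (taken from reserved documentation ranges) stand in for real ones.

```python
import ipaddress


def parse_log_line(line):
    """Split one log line in the assumed '<timestamp> <client IP> <path>' format."""
    timestamp, ip, path = line.split(maxsplit=2)
    return timestamp, ipaddress.ip_address(ip), path


def flag_dark_web_traffic(log_lines, exit_nodes):
    """Return log entries whose client IP appears in the exit-node list.

    exit_nodes is a hypothetical, externally supplied list of IP strings.
    """
    exits = {ipaddress.ip_address(ip) for ip in exit_nodes}
    flagged = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        timestamp, ip, path = parse_log_line(line)
        if ip in exits:
            flagged.append((timestamp, str(ip), path))
    return flagged


if __name__ == "__main__":
    exit_nodes = ["198.51.100.7", "203.0.113.42"]  # placeholder addresses
    logs = [
        "2024-06-01T10:00:00Z 192.0.2.10 /index.html",
        "2024-06-01T10:00:05Z 203.0.113.42 /admin/login",
        "2024-06-01T10:00:09Z 192.0.2.11 /pricing",
    ]
    for entry in flag_dark_web_traffic(logs, exit_nodes):
        print(entry)
```

A match on its own proves nothing, but lined up against the timing of a credential leak it is the kind of signal that helps piece together what might have happened.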

(TC: 00:50:12)

Aidan Murphy: Yeah, well, I mean, so there is something that can be done, and you can go out there and see if your data’s been affected, and like you said, both Luke and Adam, then at least you know, and then you can take some action. In this episode, it feels a little bit doom and gloom, you know, we’ve talked about some very, very serious cases, and once the data is out there, it being very difficult to take it back. Adam, is there anything that you would recommend to companies to make sure, I guess, that they don’t even get into that position, preventative measures they can take that means that they are less likely to be impacted by these kinds of data leaks that find their way to the dark web?

(TC: 00:50:51)

Adam Wilson: Yes, so I mean, to be honest, there’s quite a lot of things that companies can do. Internally, it’s good to have really strong processes, also just other, sort of, platforms for detecting things like endpoint detection, processes around remote working and remote working protocols, things like that, good identity management practices. There’s a whole host of things that are opsec and process related that any good CSO should be all over. Then it’s that, kind of, training and user education piece for users, that’s probably one of, arguably, the most important things you can do, because unfortunately, like as a human being, we are pretty much the weakest link in the cyber kill chain, and we are so vulnerable to, kind of, psychological manipulation, and we are so habitual in that password recycling happens probably far more than you would hope that it does, but it does. So, I think that’s when things like that user education piece and regular training in, kind of, a format that’s a bit more engaging than just like, ‘Oh here’s a process document you’ve got to read, or whatever.’ So, internally here, we have quite regular workshops which are a bit more interactive and things like that can be really helpful, because actually I think showing people how they can be compromised and showing them things like, you know, really sophisticated phishing emails and things like that. So, going over and above just your, sort of, password cyber hygiene, best practices, can be really helpful. So, just making sure that people are aware of those, sort of, threat vectors and being aware of the simplest ways that they can overcome the majority of them, and things like 2FA as well.

(TC: 00:52:34)

Aidan Murphy: I think that’s a really important point, because as we’ve covered in the podcast, it’s shocking how many of these big data set leaks are due to even just credentials, to legitimate password user combinations, and that, like you say, we’re not talking about sophisticated hacks, we are talking about almost, kind of, the basics, the basic hygiene, but like you said, it’s a problem that is persistent and needs to be addressed. Luke, is there anything you would add onto what Adam has said?

(TC: 00:53:00)

Luke Donovan: A lot of what Adam’s stated there is absolutely correct, however, quite often you need threat intelligence to feed in to those points, so what should you be educating your employees on, what policies do you need in place? A lot of the information you’re going to get from your threat intelligence, about what is currently happening outside of your operating environment, or within your operating environment, that’s going to feed where your education needs to be focused.

(TC: 00:53:26)

Aidan Murphy: That seems like a good note to draw a line under this episode of The Dark Dive. A big thank you to Luke and Adam for joining me, and thanks again to Tom for sending in the question that inspired the episode. If you would like us to cover a specific topic on the show, please get in touch using our email address or the social media handles in the show notes. If you can’t wait to find out more, remember you can follow us for free on Apple Podcasts, Spotify, YouTube, or whatever podcast app that you use, and get all of the episodes of The Dark Dive as soon as they’re released. Until next time, stay safe.
