Unless you’ve been living under a rock, or aren’t into IT security, you will have heard about the Cit0day breach database dump. Various online news outlets rushed to get their scare tactic reports out, while others took some time to pause and analyze what was found in the breach before commenting. As ever, industry insiders such as Troy Hunt did a great job of tearing down the breach.

Similar to Troy’s site HIBP, we at Authlogics also maintain a database of breached password information to power our software solutions. However, we process the breach data differently because we use it differently.

Data dumping

Most breach dumps are very poorly and randomly formatted. This is either due to a lack of OCD in the hacker community or because they aren’t paid enough – after all, there is no ISO standard format for a breached credential. This means that you never know what you are going to get: missing delimiters, foreign character sets, duplicate entries, etc.
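To make the formatting problem concrete, here is a minimal Python sketch of a tolerant line parser. It is an illustration only, not our production pipeline: the separator set, the function names `parse_line` and `parse_dump`, and the "email first, secret second" layout are all assumptions, and real dumps need per-source handling.

```python
import re

# Hypothetical separator set; real dumps vary from source to source.
SEPARATORS = re.compile(r"[:;|\t]")

def parse_line(line):
    """Return (email, secret), or None if the line cannot be split safely."""
    line = line.strip()
    if not line:
        return None
    # Split on the first separator only, so secrets may contain separators.
    parts = SEPARATORS.split(line, maxsplit=1)
    if len(parts) != 2:
        return None  # missing delimiter
    email, secret = parts[0].strip().lower(), parts[1]
    if "@" not in email or not secret:
        return None
    return email, secret

def parse_dump(lines):
    """Drop malformed rows and exact duplicates, keeping first-seen order."""
    seen = set()
    records = []
    for line in lines:
        record = parse_line(line)
        if record is None or record in seen:
            continue
        seen.add(record)
        records.append(record)
    return records
```

Lower-casing the email address before de-duplicating means that `Alice@Example.com` and `alice@example.com` collapse into one entry, while the secret is left untouched because passwords are case-sensitive.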

It is important to realise that even though there are a lot of passwords in a breach dump, many of them are only hashes and not in a clear text format. In the case of Cit0day, we found about 610 million unique passwords but only 163 million (27%) of them were cleartext. The rest were hashes and, because they came from various sources, they use different algorithms and may have been salted. Of the 457 million hashes, we identified that about 288 million (63%) are “actionable” and could be reversed into clear text.

Authlogics vs HIBP

It must be said that Troy does a great job with HIBP and provides a fantastic service. While there are similarities, there are some key differences between the purposes and designs of the HIBP and Authlogics databases. The main difference is that, in simple terms, HIBP maintains two separate databases: the first contains the breached email addresses and the breach they were found in, and the second is a list of breached password hashes. HIBP has been explicitly designed not to maintain a link between the email address and password data and, as a result, can upload new data fairly quickly.

In contrast, the Authlogics Cloud Breach Database DOES maintain a link between the email address and the password. We also only add data from a breach where the credentials have a cleartext password. During the processing, we calculate various hash derivatives of the clear text password (e.g. SHA1, MD4 etc) and store all of them, including the clear text, for greater matching options later.
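As a rough illustration of what "hash derivatives" means, the Python sketch below computes several digests for one cleartext password using the standard `hashlib` module. The function name `hash_derivatives`, the choice of algorithms, and the UTF-8 encoding are assumptions for the example rather than a description of our actual schema. MD4 is attempted separately because many modern OpenSSL builds no longer provide it.

```python
import hashlib

def hash_derivatives(password):
    """Return a dict of digests derived from one cleartext password."""
    data = password.encode("utf-8")  # assumed encoding
    derivatives = {
        "cleartext": password,
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    # MD4 availability depends on the local OpenSSL build, so it is optional.
    try:
        derivatives["md4"] = hashlib.new("md4", data).hexdigest()
    except ValueError:
        pass
    return derivatives
```

Storing every derivative alongside the clear text means a later lookup can match on whichever form a caller, or a future breach dump, happens to supply.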

What about the breached hashes?

Although we only process credentials with clear text passwords, we still retain the “actionable” hash data for additional processing later on. Firstly, we pass the data through our database to see if we already have the hash, and thus the clear text password, and we can then add the credential to the database. Secondly, we sift through various hash rainbow tables to see if they have already been reversed. Presently we do not attempt to brute force hashes due to the volume of data, processing power, and time limitations; however, this is something we are considering offering in 2021 for data we find relating to our customers – so watch this space!
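The first step, matching a breached hash against passwords that are already known in clear text, can be sketched in Python as a simple reverse index. This is a simplified illustration: the names `build_reverse_index` and `resolve_hashes` are hypothetical, the index lives in memory rather than in a database, and it only works for unsalted hashes.

```python
import hashlib

def build_reverse_index(known_passwords, algorithms=("md5", "sha1", "sha256")):
    """Map each unsalted digest of every known password back to its cleartext."""
    index = {}
    for password in known_passwords:
        data = password.encode("utf-8")  # assumed encoding
        for algorithm in algorithms:
            index[hashlib.new(algorithm, data).hexdigest()] = password
    return index

def resolve_hashes(hashed_credentials, index):
    """Split (email, digest) pairs into resolved and still-pending lists."""
    resolved, pending = [], []
    for email, digest in hashed_credentials:
        password = index.get(digest.strip().lower())
        if password is None:
            pending.append((email, digest))  # keep for later processing
        else:
            resolved.append((email, password))
    return resolved, pending
```

Anything left in the pending list stays "actionable": it can be retried whenever new clear text passwords are added to the index.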

Customer safety

Uploading large data dumps is a data mining nightmare and is extremely time-consuming, especially given the detailed way we process them. However, it does reveal a goldmine of useful information for our customers about the behaviour and risks of their users.

Adding large dumps like Cit0day to our database takes significantly longer than HIBP due to the data complexity. However, to help keep our customers safe we always prioritise the data found in the domains that match our customers, and that data gets added to our database first and their alerts fire off ASAP – in this case about 860 thousand priority entries within 24 hours. Secondly, password change requests that are filtered by our software are checked in parallel against the Authlogics database and HIBP as Troy may have uploaded the data sooner due to his simpler storage model and update process (maintaining HIBP is still not a trivial process, don’t try this at home).
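The prioritisation idea can be sketched in a few lines of Python: split the parsed credentials into those whose email domain belongs to a customer and everything else, then process the first list before the second. The function names and the in-memory lists are assumptions made for illustration; they are not how our pipeline is actually built.

```python
def domain_of(email):
    """Return the lower-cased domain part of an email address."""
    return email.rsplit("@", 1)[-1].strip().lower()

def prioritise(credentials, customer_domains):
    """Split (email, secret) pairs into customer-priority and backlog lists."""
    domains = {d.lower() for d in customer_domains}
    priority, backlog = [], []
    for credential in credentials:
        if domain_of(credential[0]) in domains:
            priority.append(credential)
        else:
            backlog.append(credential)
    return priority, backlog
```

Because the split is a single pass over the data, the priority entries can be loaded and alerted on while the much larger backlog is still being processed.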

An interesting support call

The purpose of a password breach database became clear after we received a support call from a customer (who won’t be named). They had received an alert from us about an email address in their domain that was found in the Cit0day breach. As a result of this, and being a good concerned cyber citizen, they then also did a lookup on HIBP to validate the results. HIBP revealed that there were two email addresses at their domain in the Cit0day breach, but we had only alerted them about one of them. They wanted to know about the other one, and whether we could add it to our database.

We immediately analyzed the Cit0day dump for these two email addresses to see what was going on. Had we missed something? After all, 163 million is a lot of entries, and mistakes can happen… We confirmed that both of the customer email addresses in question were in the breach. One had a clear-text password, which resulted in the customer alert. The other did not have a cleartext password; it had an “actionable” hash which had not yet been reversed and thus had not been added to our main database. If the hash is ever reversed we will certainly add it.

The next question from the customer was how the two entries could be in different formats if they were in the same breach. This comes back to the random nature of the data in the dumps and the various web sites the data was sourced/stolen/hacked from. This question highlights the common confusion about where breach data comes from, what’s in it, and what can be done with it. In this case, we were able to let the customer know which web site each of the two credentials was breached from.

The real-world threats

Knowing that an email address is in a breach should be a little concerning, but it doesn’t put your company in any immediate danger as email addresses are given away all the time anyway. Knowing a password hash is in a breach is very concerning; however, if the hash hasn’t been reversed it is of little risk – for now. If the hash was created with a salt then you at least know it won’t be easily reversed and won’t end up in a rainbow table either. However, if a hacker knows that the account is of high value, say after an email address lookup on LinkedIn, then there may be value in them putting effort into brute-forcing the hash to get the password.
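To see why a salt defeats precomputed tables, consider the short Python sketch below. It hashes the same password with two random salts and compares the results with the unsalted digest. The helper `salted_sha256` and the "salt prepended to password" layout are assumptions chosen for illustration; web sites combine salts and passwords in many different ways.

```python
import hashlib
import os

def salted_sha256(password, salt):
    """Hash salt + password; the prepend layout is an illustrative assumption."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

# The digest a precomputed table would hold for this password.
unsalted = hashlib.sha256(b"password").hexdigest()

# Two accounts using the same password but different random salts.
salt_a, salt_b = os.urandom(16), os.urandom(16)
digest_a = salted_sha256("password", salt_a)
digest_b = salted_sha256("password", salt_b)

# Neither salted digest matches the table entry, nor each other.
print(digest_a == unsalted, digest_b == unsalted, digest_a == digest_b)
```

A table built from unsalted digests would miss both accounts, so an attacker has to attack each salted hash individually. Note that a salt only blocks precomputation; a fast hash such as SHA-256 can still be brute-forced for a single high-value account, which is why deliberately slow functions such as `hashlib.pbkdf2_hmac` or `hashlib.scrypt` are preferred for storing passwords.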

The Authlogics Password Security Portal is a web-based window into our database which our customers get access to. The portal can collate data on a per-user level to show all the breaches we have clear text passwords for, and also which other email addresses have been breached using the same password and a similar email address. This results in a risk score for the user and helps customers see which staff members have poor online security hygiene and need addressing as a priority. This information is far more useful in the real world than a basic count of the number of email addresses in a breach.

Authlogics Cit0day Vital statistics

We found 610 million credentials in the breach with 163 million unique cleartext credentials (email + password) and 288 million “actionable” password hashes.

We processed 860 thousand credentials for our priority customers within 24 hours of us receiving the raw breach data to give them priority protection.

Of the 163 million credentials we added to our database:

  • Existing credentials in the database: 59 million
  • Existing emails with new passwords: 27 million
  • New credentials: 54 million
  • New passwords not seen before: 40 million (same results as HIBP)

More data from this breach will be processed over time as the “actionable” hashes get reversed.

Talk to our team to find out more about how our technologies can protect your organization.

+44 1344 568900  |  [email protected]