
Dealing with large BloodHound datasets

Solving challenges with importing and querying

At Deloitte we regularly perform security analyses of Active Directory environments. A variety of tools is used for such assessments, one of which is BloodHound. In this technical article we discuss the challenges we faced when importing huge datasets, and go over the various ways to query the database to quickly identify potential escalation paths and visualize them in an understandable way.

By Arris Huijgen

The Cyber Risk Services team at Deloitte performs a variety of security tests on a daily basis, ranging from testing the security of web applications and IT infrastructures to hacking transport systems and banks. One of the tests we regularly perform is an Active Directory analysis, in which the configuration of our clients’ Microsoft-based forests and domains is evaluated for security vulnerabilities and misconfigurations.

A variety of tools is used to perform such an assessment. One such tool is BloodHound, which represents various objects in Active Directory as nodes (e.g., users, computers, GPOs) and the relations between those objects as edges (e.g., MemberOf, Owns, CanRDP). This allows analysts to quickly identify potential escalation paths and visualize them in an understandable way.
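To give an idea of how such a graph is queried, below is a minimal Cypher sketch that asks for the shortest escalation path from a user to a high-privilege group. The account and domain names are hypothetical, and the list of edge types is abridged for readability.

    // Shortest path from a (hypothetical) user to the Domain Admins group,
    // traversing only a few example edge types
    MATCH p = shortestPath(
        (u:User {name:'ALICE@CONTOSO.LOCAL'})-[:MemberOf|Owns|CanRDP*1..]->(g:Group {name:'DOMAIN ADMINS@CONTOSO.LOCAL'})
    )
    RETURN p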

Because of the large size of the Active Directory environments we analyze, we also run into challenges when importing and querying large datasets in BloodHound. When importing big (>4 GB) JSON files containing non-ASCII characters, BloodHound sometimes has a hard time ingesting the data. Moreover, once the data has been imported, it can be a challenge to query the huge dataset efficiently. Our objective is to extract the relevant information and translate the results into actionable tasks, so our clients can systematically work through our observations to resolve unintended escalation paths and security vulnerabilities that might be present.
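As an example of turning graph data into an actionable result, the following Cypher sketch lists every computer that a broad group can reach over RDP, a finding that translates directly into a concrete remediation item. The group name is hypothetical.

    // Computers reachable over RDP by a (hypothetical) broad group
    MATCH (g:Group {name:'DOMAIN USERS@CONTOSO.LOCAL'})-[:CanRDP]->(c:Computer)
    RETURN c.name AS Computer
    ORDER BY Computer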

The technical blog post available at blog.bitsadmin.com discusses the insights we gained while importing and analyzing these large datasets, combining the power of Neo4j’s Cypher language with PowerShell’s object-oriented architecture.
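To illustrate that combination, here is a minimal PowerShell sketch that submits a Cypher query to Neo4j’s HTTP transactional endpoint and converts the resulting rows into PowerShell objects, which can then be piped to cmdlets such as Sort-Object or Export-Csv. The URL and credentials are assumptions; the endpoint path shown applies to Neo4j 3.x and differs in later Neo4j versions.

    # Cypher query to run; the property and limit are arbitrary examples
    $body = @{
        statements = @(
            @{ statement = 'MATCH (u:User) RETURN u.name AS Name LIMIT 10' }
        )
    } | ConvertTo-Json -Depth 4

    # Neo4j credentials (prompts interactively)
    $cred = Get-Credential

    # Submit the query to the transactional endpoint (Neo4j 3.x path assumed)
    $response = Invoke-RestMethod -Uri 'http://localhost:7474/db/data/transaction/commit' `
        -Method Post -Credential $cred -ContentType 'application/json' -Body $body

    # Turn each result row into a PowerShell object for further processing
    $response.results[0].data | ForEach-Object {
        [PSCustomObject]@{ Name = $_.row[0] }
    }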
