Updated: Dec 2, 2019
Oftentimes when chasing down an alert I find myself asking the same questions:
I know that a process is malicious; did it spawn any children?
The process spawned children; what did they do?
What spawned that first process?
What created the binary file for the malicious process?
Did these processes interact with the file system?
Did they interact with the network?
Answering these questions can be costly. It often involves manually running linear searches over logs. This can get very convoluted: if process A has children B and C, I now need separate searches to understand B and C further. Each branch adds significant cognitive overhead, and searches over logs can be very slow.
In some cases I even want to model my alerts this way, alerting not only on discrete events, such as a process spawning, but on combined events, such as a process with specific attributes spawning a child with specific attributes.
Some examples of signatures that require more than a single event:
Word executing child processes, indicating that a malicious macro has executed
A process X spawning a child Y where we’ve never seen X have a relationship with Y before. (Why is my Java service executing /bin/bash?)
A process writes to a sensitive file, and it is not one of the many processes known to do so.
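The “never seen X spawn Y” signature needs memory across events, which is exactly what single-event alerting lacks. The sketch below assumes each process-creation event has already been reduced to a (parent image, child image) pair and that a baseline of known pairs exists; `novel_spawns` is a hypothetical helper for illustration, not part of Grapl.

```python
from collections import defaultdict
from typing import Iterable


def novel_spawns(
    baseline: Iterable[tuple[str, str]],
    events: Iterable[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return (parent, child) pairs from `events` that never appear in `baseline`.

    Hypothetical helper: each event is assumed to be pre-reduced to a pair of
    process image names, e.g. ("java", "/bin/bash").
    """
    # Index the known relationships: parent image -> set of child images.
    known: dict[str, set[str]] = defaultdict(set)
    for parent, child in baseline:
        known[parent].add(child)

    alerts: list[tuple[str, str]] = []
    for parent, child in events:
        if child not in known[parent]:
            alerts.append((parent, child))
            # Remember the pair so the same new relationship alerts only once.
            known[parent].add(child)
    return alerts
```

For example, with a baseline of `[("java", "java")]` and events `[("java", "java"), ("java", "/bin/bash")]`, only `("java", "/bin/bash")` is returned. Over raw logs, building the pairs themselves is where the join and pid-collision work comes in.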
Given most log sources, where individual logs represent discrete events, writing any of these alerts requires complex join logic, handling of pid collisions, and a lot of time and compute, with the compute growing exponentially with the depth of my searches.
What I want is a way to answer all of those questions in a single operation. I want to take an alert and be able to immediately see everything interesting about the components of that alert. I want to write signatures that work across multiple events and I want that to be elegant and efficient.
I’m building Grapl to optimize for these use cases.
How it works
Grapl works by ingesting logs, such as a process creation event, and producing a graph representation of each log. These graphs are later merged into the master graph.
Issues like pid collisions are handled automatically.
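A pid alone is not a stable identity because operating systems reuse pids. One common way to resolve this, shown here as a minimal sketch and not as Grapl’s actual algorithm, is to identify a process by its asset, pid, and creation time, and to attribute any later event carrying that pid to the most recent process created with it. `ProcessIdentityTable` and its key format are hypothetical.

```python
import bisect
from typing import Optional


class ProcessIdentityTable:
    """Hypothetical sketch: resolve (asset_id, pid, time) to a stable process key.

    A later event with a given pid is attributed to the most recent process
    created with that pid on that asset at or before the event's timestamp.
    """

    def __init__(self) -> None:
        # (asset_id, pid) -> sorted creation timestamps and parallel node keys
        self._times: dict[tuple[str, int], list[int]] = {}
        self._keys: dict[tuple[str, int], list[str]] = {}

    def on_create(self, asset_id: str, pid: int, created_at: int) -> str:
        """Register a process creation event and return its node key."""
        slot = (asset_id, pid)
        times = self._times.setdefault(slot, [])
        keys = self._keys.setdefault(slot, [])
        key = f"{asset_id}:{pid}:{created_at}"
        # Keep timestamps sorted so out-of-order logs still resolve correctly.
        i = bisect.bisect_right(times, created_at)
        times.insert(i, created_at)
        keys.insert(i, key)
        return key

    def resolve(self, asset_id: str, pid: int, seen_at: int) -> Optional[str]:
        """Return the key of the process that held `pid` at time `seen_at`."""
        slot = (asset_id, pid)
        times = self._times.get(slot, [])
        i = bisect.bisect_right(times, seen_at)
        return self._keys[slot][i - 1] if i else None
```

This sketch ignores process termination, so an event arriving after a process exits (but before its pid is reused) still resolves to that process; a fuller version would also track termination events.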
As an example, a log like this:
will create a subgraph with a newly created process node, an edge to a pre-existing process node with pid ‘4’, and an edge to a pre-existing file node with path “/home/downloads/payload.exe”.
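As a rough illustration of that parsing step, here is a minimal sketch that turns one process-creation log into the three-node subgraph described. The log’s field names (`asset_id`, `pid`, `ppid`, `image_path`, `timestamp`) and the edge names (`children`, `bin_file`) are hypothetical, since the original log is not reproduced here; this is not Grapl’s parser.

```python
from dataclasses import dataclass, field


@dataclass
class Subgraph:
    # node key -> properties
    nodes: dict[str, dict] = field(default_factory=dict)
    # (source key, edge name, destination key)
    edges: list[tuple[str, str, str]] = field(default_factory=list)


def process_start_to_subgraph(log: dict) -> Subgraph:
    """Turn one process-creation log into a three-node subgraph.

    Field and edge names are hypothetical. Keys here are naive; a real system
    would resolve them to identities that survive pid reuse.
    """
    asset = log["asset_id"]
    child = f"process:{asset}:{log['pid']}"
    parent = f"process:{asset}:{log['ppid']}"
    binary = f"file:{asset}:{log['image_path']}"

    graph = Subgraph()
    # The newly created process, with the properties the log actually carries.
    graph.nodes[child] = {"asset_id": asset, "pid": log["pid"], "created_at": log["timestamp"]}
    # The parent process and binary file are only referenced; merging later
    # attaches them to the matching existing nodes in the master graph.
    graph.nodes[parent] = {"asset_id": asset, "pid": log["ppid"]}
    graph.nodes[binary] = {"asset_id": asset, "path": log["image_path"]}
    graph.edges.append((parent, "children", child))
    graph.edges.append((child, "bin_file", binary))
    return graph
```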
These subgraphs are added to the master graph, creating the new process node and connecting it to its parent process’s node and to its binary’s node.
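The merge can be pictured as an upsert: create nodes that don’t exist yet, reuse the ones that do, and add the new edges. The minimal in-memory sketch below assumes string node keys and (source, edge name, destination) edge triples; `merge_into_master` is hypothetical, and the real master graph is the one queried through Dgraph, not a Python dict.

```python
def merge_into_master(
    master_nodes: dict[str, dict],
    master_edges: set[tuple[str, str, str]],
    nodes: dict[str, dict],
    edges: list[tuple[str, str, str]],
) -> None:
    """Merge one subgraph into the master graph in place (hypothetical helper)."""
    for key, props in nodes.items():
        # Create the node if it is new; otherwise reuse the existing one.
        existing = master_nodes.setdefault(key, {})
        for name, value in props.items():
            # Design choice: keep values already in the master graph and only
            # fill in properties it does not have yet.
            existing.setdefault(name, value)
    # Storing edges in a set makes re-merging the same subgraph idempotent.
    master_edges.update(edges)
```

Idempotency matters in a pipeline like this one, where a log may be delivered or processed more than once.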
Expanding your investigation from a single node is trivial. If I want to see everything a process did, it’s as simple as:
(This uses Dgraph’s query language, GraphQL+-.)
This finds any nodes with pid=5 on the asset with id ‘asset_zzd’ and recursively expands their edges.
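To make “recursively expand its edges” concrete, here is a minimal in-memory equivalent: select root nodes by property, then walk outgoing edges breadth-first up to a depth limit, roughly what Dgraph’s `@recurse` directive does for a query. The graph representation and the `expand` function are assumptions for illustration, not Grapl code.

```python
from collections import deque


def expand(
    nodes: dict[str, dict],
    edges: list[tuple[str, str, str]],
    root_filter: dict,
    max_depth: int = 10,
) -> set[str]:
    """Return every node reachable from nodes matching `root_filter`.

    Follows outgoing edges only, up to `max_depth` hops (hypothetical helper).
    """
    # Build an adjacency list from (source, edge name, destination) triples.
    adjacency: dict[str, list[str]] = {}
    for src, _name, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    # Roots are nodes whose properties match every key/value in the filter.
    roots = [k for k, p in nodes.items() if all(p.get(f) == v for f, v in root_filter.items())]
    seen = set(roots)
    queue = deque((root, 0) for root in roots)
    while queue:
        key, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in adjacency.get(key, []):
            if nxt not in seen:  # the seen-set also protects against cycles
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen
```

Usage mirrors the query: `expand(nodes, edges, {"pid": 5, "asset_id": "asset_zzd"})`.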
Given only a single event, we can go from this:
to this (unfortunately, filenames are not listed on the nodes):
This is a single, simple operation that executed in milliseconds. I now have much of the context I would want when investigating a suspicious process.
With a single query I can now understand quite a lot about the event:
chrome.exe created a file
word.exe read the file created by chrome.exe
word.exe created a file, payload.exe
payload.exe was executed by word.exe
If payload.exe had spawned other children or read other files, I’d see it. (Note that the logs used to generate these graphs are fabricated.)
Setting up Grapl is mostly automated, with a few manual pain points.
Once Grapl is deployed, you can upload JSON-encoded logs to your raw-log S3 bucket. The rest should just work.
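Uploading a batch of logs could look roughly like the sketch below. Newline-delimited JSON and the object key scheme are assumptions, so check what your deployment’s parsers expect; `s3_client` is meant to be a boto3 S3 client, whose `put_object(Bucket=..., Key=..., Body=...)` call is used here.

```python
import json
import time
import uuid
from typing import Any, Iterable


def encode_logs(logs: Iterable[dict[str, Any]]) -> bytes:
    """Serialize logs as newline-delimited JSON (an assumed format)."""
    return "\n".join(json.dumps(log) for log in logs).encode("utf-8")


def upload_logs(s3_client: Any, bucket: str, logs: Iterable[dict[str, Any]]) -> str:
    """Upload one batch of logs as a single S3 object and return its key.

    `s3_client` is expected to be a boto3 S3 client; the key naming scheme
    (epoch seconds plus a random UUID) is hypothetical.
    """
    key = f"{int(time.time())}-{uuid.uuid4()}"
    s3_client.put_object(Bucket=bucket, Key=key, Body=encode_logs(logs))
    return key
```

Usage would be along the lines of `upload_logs(boto3.client("s3"), "<your raw-log bucket>", logs)`. Passing the client in, rather than creating it inside the function, keeps the helper easy to test without AWS credentials.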
Grapl is a very young project. Currently the best-supported features are:
Parsing logs into graphs
Creating ‘identities’ for nodes (to handle pid collisions)
Merging generated graphs into the master graph
Grapl is in an ‘alpha’ release state. There may be major architectural changes and rewrites. Data that goes through Grapl may not be valid for future versions.
There’s a lot more I want to build, some of which is already decently far along.
Networking, Users, and Assets
Grapl only supports files and processes right now, and asset IDs are opaque identifiers.
I want to be able to answer more questions:
Did a process SSH to another system? What subsequent processes were executed?
What IPs has a process connected to?
What domains has a process resolved?
Which processes executed on a given asset, under a given user?
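One way these questions could map onto the graph is as new node types (IPs, domains, users, assets) joined to processes by edges, so each question becomes an edge lookup. The edge names and key formats below are hypothetical and do not describe Grapl’s in-progress schema; the linear scans stand in for what a graph database would answer from an index.

```python
from typing import Iterable

Edge = tuple[str, str, str]  # (source key, edge name, destination key)


def out(edges: Iterable[Edge], src: str, name: str) -> set[str]:
    """Destinations reachable from `src` over edges called `name`."""
    return {d for s, n, d in edges if s == src and n == name}


def into(edges: Iterable[Edge], dst: str, name: str) -> set[str]:
    """Sources pointing at `dst` over edges called `name`."""
    return {s for s, n, d in edges if d == dst and n == name}


# Hypothetical example data.
edges: list[Edge] = [
    ("proc:ssh", "connected_to", "ip:10.0.0.7"),
    ("proc:ssh", "resolved", "domain:build.internal"),
    ("proc:ssh", "ran_on", "asset:asset_zzd"),
    ("proc:ssh", "ran_as", "user:alice"),
    ("proc:curl", "ran_on", "asset:asset_zzd"),
    ("proc:curl", "ran_as", "user:bob"),
]
```

“What IPs has a process connected to?” becomes `out(edges, "proc:ssh", "connected_to")`, and “which processes executed on a given asset under a given user” is the intersection `into(edges, "asset:asset_zzd", "ran_on") & into(edges, "user:alice", "ran_as")`.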
I have ongoing work to model the data necessary to answer these questions. When I’m done it will be as easy to answer these questions as it was to answer the others - one simple operation.
So far Grapl mostly supports manual investigation; it has no system for writing alerts. As I mentioned above, some attacker signatures are best described by a combination of events, not by singular events.
To support this, I intend to let analysts store signatures that are then executed each time the master graph is updated. This should provide real-time alerting with graph queries.
The current state of this is not ideal. There is a single analyzer in the Grapl repository that scans for malicious Word macros, but creating analyzers is overly painful, requiring a separate lambda for each signature.
The next steps here are to:
Provide a friendly DSL for writing signatures
Remove the need for separate lambdas by providing one lambda that pulls down the signatures and executes them against the graph
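The single-lambda design amounts to a registry of signatures plus one dispatcher that runs all of them on every graph update. In this minimal sketch, signatures are plain Python callables over a hypothetical in-memory graph (`{"nodes": {key: props}, "edges": [(src, name, dst)]}`); in a deployment they would be fetched from storage first, which is not shown. `word_spawns_child` is an illustrative version of the Word-macro idea, not the analyzer in the Grapl repository.

```python
from typing import Callable

# A signature takes the graph and returns the keys of matching nodes.
Signature = Callable[[dict], list[str]]

# The real Word binary is WINWORD.EXE; the fabricated example in this post uses word.exe.
WORD_IMAGES = {"winword.exe", "word.exe"}


def word_spawns_child(graph: dict) -> list[str]:
    """Match any process whose parent's image is Microsoft Word."""
    nodes = graph["nodes"]
    return [
        dst
        for src, name, dst in graph["edges"]
        if name == "children" and nodes.get(src, {}).get("image", "").lower() in WORD_IMAGES
    ]


# Hypothetical registry; adding a signature means adding an entry, not a lambda.
SIGNATURES: dict[str, Signature] = {"word_macro": word_spawns_child}


def run_signatures(graph: dict, signatures: dict[str, Signature]) -> list[tuple[str, str]]:
    """Run every registered signature; return (signature name, matched node) pairs."""
    matches: list[tuple[str, str]] = []
    for name, signature in signatures.items():
        for key in signature(graph):
            matches.append((name, key))
    return matches
```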
As demonstrated, Grapl can trivially expand graphs. Grapl doesn’t need to understand the details of a signature to provide basic context; it only needs to expand the graph.
Once analyzers are built, each signature match they output will be picked up by the engagement-creation-service, which will automatically expand the graph around the match and create an engagement: a separate graph with a unique key that you can interact with to add and remove nodes.
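Conceptually, creating an engagement means seeding a new, uniquely keyed graph with the neighborhood of a match. The minimal sketch below assumes the engagement stores only node keys, expands a fixed number of hops in both edge directions, and uses a UUID as its key; `Engagement` and `create_engagement` are hypothetical and are not the engagement-creation-service or the planned Python SDK.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Engagement:
    """Hypothetical sketch: a separate graph of node keys with its own unique key."""

    key: str = field(default_factory=lambda: str(uuid.uuid4()))
    nodes: set[str] = field(default_factory=set)

    def add_node(self, node_key: str) -> None:
        self.nodes.add(node_key)

    def remove_node(self, node_key: str) -> None:
        self.nodes.discard(node_key)


def create_engagement(edges: list[tuple[str, str, str]], match: str, hops: int = 2) -> Engagement:
    """Seed an engagement with every node within `hops` of the match, in either direction."""
    # Treat edges as undirected so context flows both to parents and to children.
    neighbours: dict[str, set[str]] = {}
    for src, _name, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)

    engagement = Engagement()
    engagement.add_node(match)
    frontier = {match}
    for _ in range(hops):
        # Only newly discovered nodes form the next frontier.
        frontier = {n for f in frontier for n in neighbours.get(f, ())} - engagement.nodes
        engagement.nodes |= frontier
    return engagement
```

Expanding in both directions is a deliberate choice here: for a match on payload.exe, the interesting context includes its parent (word.exe) as well as anything it spawned.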
Engagements will be the main way to interact with Grapl, with future plans to provide a Python SDK so you can script your engagements further.
The vast majority of Grapl is written in the Rust programming language, with the Analyzers and Engagement SDK being written in Python.
If you don’t know Rust or Python, don’t worry. I’d be happy to help anyone get ramped up with either language.
If you’re interested in contributing, or if you have feedback or questions, please feel free to open an issue or start working on an existing one.