Using Google Workspace Data for Security Observability
By Darin Fisher / Apr 18, 2022 / Community, Developer, Security
This article was originally published in The New Stack.
Keeping your systems secure is a never-ending challenge. Not only do you need to monitor and secure your own tech stack, but each new service a company uses creates another potential avenue for bad actors to exploit.
Time series data is security data
Fortunately, there is one type of data that provides critical insight into the way that people interact with any system or service: time series data.
Every event occurs in the context of time. For example, a login attempt happens at a specific time. The data for that event gets timestamped. This timestamped data tells you who attempted that login, where the attempt occurred geographically and more. When you think about the fact that all this critical data has a timestamp, it becomes clear that time series data is security data.
Maintaining time as a constant context for security provides a deeper understanding of your security situation by expanding the scope of what security means.
System monitoring can reveal security threats in real time. However, unlike in a courtroom drama, there’s rarely an “a-ha!” moment when it comes to security threats. That’s why security flaws can go undetected for long periods of time. Yes, a single event can be important. That’s why anomaly detection exists. But those events tend to happen rapidly, which makes them easy to miss. Placing events in the context of other events and patterns creates more thorough security profiles.
Building security solutions with time series data
Using time and history allows you to identify activity patterns. You can then compare new activity against these patterns to spot anomalies when they occur. At InfluxData, our security team is developing a solution that uses InfluxDB to collect and process time series data to build security profiles.
Compromised credentials are a leading attack vector, so it made sense to start with authentication data. This was also convenient because authentication data covers every member of our team, and everyone generates a lot of it. As a result, you can start to see patterns quickly with authentication data, because people tend to have consistent habits. InfluxDB enables us to track these patterns for everyone in detail. For example, if someone usually works from home but goes to a coffee shop to work one afternoon, that new location creates a new series of data and changes the cardinality of that person’s authentication activity.
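To make the cardinality point concrete, here is a minimal Python sketch. It assumes each authentication event is a dict with `email_address` and `source_address` keys, and it treats each distinct address per user as one series. That is a simplification: InfluxDB derives series from the measurement, tag set and field key. The function name and the sample events are hypothetical.

```python
from collections import defaultdict


def series_per_user(events):
    """Count the distinct source addresses seen for each user."""
    seen = defaultdict(set)
    for event in events:
        seen[event["email_address"]].add(event["source_address"])
    return {user: len(addresses) for user, addresses in seen.items()}


# Hypothetical events: two logins from home, then one from a coffee shop.
events = [
    {"email_address": "pat@example.com", "source_address": "203.0.113.10"},
    {"email_address": "pat@example.com", "source_address": "203.0.113.10"},
    {"email_address": "pat@example.com", "source_address": "198.51.100.7"},
]

cardinality = series_per_user(events)
print(cardinality)  # {'pat@example.com': 2}
```

The third login adds a second address for the same user, so the count rises from one to two. A jump like that is the kind of change worth comparing against the person's usual habits.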
Sometimes the lack of a pattern can also be a pattern. People who travel a lot for work may have a lot of geographically diverse authentication activity. But knowing that those individuals travel frequently mitigates the urgency of anomalies when it comes to IP address location, for example.
Tracking authentication activity can also help us eliminate false positives. If the same team member has two or three failed login attempts on Monday mornings, we can flag it as a potential problem, and then once we’ve identified it as a unique pattern, set that anomaly aside. (Of course, having coffee before trying to log in may help!)
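One way to recognize a recurring pattern like the Monday-morning failures is to bucket failed logins by user, weekday and hour, then count how many distinct weeks each bucket appears in. The sketch below is an illustration, not our production logic. The function name, the `min_weeks` threshold and the sample data are all assumptions.

```python
from collections import defaultdict
from datetime import datetime


def recurring_failure_slots(failures, min_weeks=3):
    """Return (user, weekday, hour) slots that saw failed logins
    in at least `min_weeks` distinct ISO weeks."""
    weeks_by_slot = defaultdict(set)
    for user, ts in failures:
        iso_year, iso_week, _ = ts.isocalendar()
        weeks_by_slot[(user, ts.weekday(), ts.hour)].add((iso_year, iso_week))
    return {slot for slot, weeks in weeks_by_slot.items() if len(weeks) >= min_weeks}


# Hypothetical failed logins: three consecutive Monday mornings,
# plus a one-off failure on a Wednesday afternoon.
failures = [
    ("pat@example.com", datetime(2022, 4, 4, 9, 5)),
    ("pat@example.com", datetime(2022, 4, 11, 9, 12)),
    ("pat@example.com", datetime(2022, 4, 18, 9, 3)),
    ("pat@example.com", datetime(2022, 4, 13, 15, 40)),
]

known_patterns = recurring_failure_slots(failures)
print(known_patterns)  # {('pat@example.com', 0, 9)}
```

Here weekday `0` is Monday, so only the 9 a.m. Monday slot qualifies as a known habit. The Wednesday failure stays outside the set and would still be treated as an anomaly.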
Strength in numbers
Our company uses dozens of SaaS providers, and each one gives us an opportunity to add context and granularity to employee security profiles and patterns.
Gaining access to authentication data for an individual SaaS provider can be challenging. A critical solution we found was to track Google Workspace (GW) authentications. When a team member logs into any of the services that we use with their Gmail-linked email address, we can use the data generated by that interaction to track usage and authentication that would otherwise be unavailable.
Google authentications are useful, especially when thinking in terms of time series data, because each transaction generates several data points, such as request, response and the actual authorization, among others. Some transactions can generate hundreds of data points, depending on what the user is trying to accomplish.
The basic information we want to capture includes:
- Authentication timestamp
- Company account ID
- User ID
- User domain
- Authentication type
- Authentication result
For our purposes, we map each key to either a static value or a Google Workspace event field.
- time: GWs.id.time
- service_source: "G Suite"
- service_domain: "influxdata.com"
- source_address: GWs.ipAddress
- email_address: GWs.actor.email
- saas_account_id: GWs.actor.profileId
- customer_id: GWs.id.customerId
- application: GWs.id.applicationName
- auth_results: GWs.events[X].name
- login_type: GWs.events[X].parameters[Y].value
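The mapping above can be sketched in Python as a function that flattens one activity record into those keys. This is a minimal illustration under stated assumptions: the record is already parsed into a dict shaped like the field paths listed, the event index defaults to the first event, and the login type is read from a parameter named `login_type`. The function name and the sample record are hypothetical, with invented placeholder values.

```python
def flatten_login_activity(activity, event_index=0):
    """Map one Google Workspace activity record to the flat keys listed above."""
    event = activity["events"][event_index]
    # Collect the event's parameters into a name -> value lookup.
    params = {p["name"]: p.get("value") for p in event.get("parameters", [])}
    return {
        "time": activity["id"]["time"],
        "service_source": "G Suite",         # static value
        "service_domain": "influxdata.com",  # static value
        "source_address": activity.get("ipAddress"),
        "email_address": activity["actor"]["email"],
        "saas_account_id": activity["actor"]["profileId"],
        "customer_id": activity["id"]["customerId"],
        "application": activity["id"]["applicationName"],
        "auth_results": event["name"],
        # Assumption: the relevant parameter is the one named "login_type".
        "login_type": params.get("login_type"),
    }


# Hypothetical record with placeholder values, for illustration only.
sample = {
    "id": {
        "time": "2022-04-18T09:05:00.000Z",
        "applicationName": "login",
        "customerId": "C0000000",
    },
    "actor": {"email": "pat@example.com", "profileId": "100000000000000000000"},
    "ipAddress": "203.0.113.10",
    "events": [
        {
            "name": "login_success",
            "parameters": [{"name": "login_type", "value": "google_password"}],
        }
    ],
}

row = flatten_login_activity(sample)
print(row["auth_results"], row["login_type"])  # login_success google_password
```

Looking parameters up by name rather than by position avoids depending on the order in which they appear in a record.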
Capturing this data has always been possible, but using it for security profiling didn’t scale well. Using InfluxDB allows us to easily ingest and process all the data produced by these transactions because it’s designed to handle this kind of timestamped data.
As we bring new team members on board, we start to identify their patterns and build their security profile right away. This helps to determine a baseline usage profile that we then compare to usage patterns for established employees and other new hires as they join the company.
We visualize this data right in InfluxDB and use Flux to generate the values for each individual element.
At present, we’re exploring how to use Bollinger bands with this data. This involves using the standard deviation around a simple moving average to establish a normal range. These thresholds also provide a more granular understanding of this data as we track what normal means week over week, day over day, hour over hour, etc.
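As a rough illustration of the idea, the Python sketch below computes a simple moving average and bands at a fixed number of standard deviations around it, then checks whether a new value falls outside the most recent band. The window size, the band width of two standard deviations, the use of population standard deviation and the sample login counts are all assumptions chosen for the example, not values from our deployment.

```python
from statistics import mean, pstdev


def bollinger_bands(values, window=20, num_std=2.0):
    """Return (sma, lower, upper) for each full trailing window of `values`."""
    bands = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        sma = mean(chunk)
        spread = num_std * pstdev(chunk)
        bands.append((sma, sma - spread, sma + spread))
    return bands


# Hypothetical hourly login counts for one user.
counts = [4, 5, 6, 5, 4, 5, 6, 5]

# Use the band from the most recent window as the "normal" range.
sma, lower, upper = bollinger_bands(counts, window=4)[-1]

new_count = 15
is_anomalous = not (lower <= new_count <= upper)
print(is_anomalous)  # True
```

The same function can be run over hourly, daily or weekly aggregates to see how the normal range differs at each granularity.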
The potential for this type of security monitoring is vast. As we continue to develop and build out this solution, we will be able to monitor our entire supply chain across the company. For example, including transaction data from GitHub means we will be able to track activities like GitHub repo cloning. If someone initiates a clone request from a suspicious location, that quickly becomes visible, and we can take appropriate countermeasures.
Longitudinal tracking and alerting remain in development, as their functionality depends on the tolerance levels established by the Bollinger bands. But InfluxDB’s native alerting capabilities can handle these types of events.
Ultimately, the goal here is to develop a holistic and scalable system for monitoring security threats. Because everything happens through the course of time, all security data is time series data. We’re leveraging time series data and InfluxDB to maintain that context.