InfluxData Blog - Darin Fisher

Using Google Workspace Data for Security Observability

Darin Fisher (InfluxData) — Mon, 18 Apr 2022 04:00:32 -0700

This article was originally published in The New Stack.

Keeping your systems secure is a never-ending challenge. Not only is it necessary to monitor and secure your own tech stack, but each new service a company uses creates another potential avenue for bad actors to try to exploit for their own ends.

Time series data is security data

Fortunately, one type of data provides critical information about the way people interact with any system or service: time series data.

Every event occurs in the context of time. For example, a login attempt happens at a specific time. The data for that event gets timestamped. This timestamped data tells you who attempted that login, where the attempt occurred geographically and more. When you think about the fact that all this critical data has a timestamp, it becomes clear that time series data is security data.

Maintaining time as a constant context for security provides a deeper understanding of your security situation by expanding the scope of what security means.

System monitoring can reveal security threats in real time. However, unlike in a courtroom drama, there’s rarely an “a-ha!” moment when it comes to security threats, which is why security flaws can go undetected for long periods of time. Yes, a single event can be important; that’s why anomaly detection exists. But such events tend to happen quickly, which makes them easy to miss. Placing events in the context of other events and patterns creates more thorough security profiles.

Building security solutions with time series data

Using time and history allows you to identify activity patterns. You can then use these patterns to test against anomalies when they occur. At InfluxData, our security team is developing a solution that uses InfluxDB to collect and process time series data to build security profiles.

Compromised credentials are a leading attack vector, so it made sense to start with authentication data. This was also convenient because authentication data covers every member of our team, and everyone generates a lot of it. Because people tend to have consistent habits, patterns in authentication data emerge quickly. InfluxDB enables us to track these patterns for everyone in detail. For example, if someone usually works from home but goes to a coffee shop to work one afternoon, that new location creates a new series of data and changes the cardinality of that person’s authentication activity.
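
As a rough illustration of the "new location, new series" idea, here is a minimal Python sketch that flags the first appearance of a source address for each account. It assumes events arrive as (email_address, source_address) pairs in time order and uses the IP address as a stand-in for location. The function name `flag_new_sources` is hypothetical and not part of any InfluxDB API.

```python
from collections import defaultdict
from typing import Iterable


def flag_new_sources(
    history: Iterable[tuple[str, str]],
    recent: Iterable[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return recent (email, source) pairs not seen for that account before.

    `history` builds each account's baseline set of source addresses.
    Each flagged pair corresponds to a new series, i.e. a rise in that
    account's address cardinality.
    """
    seen: dict[str, set[str]] = defaultdict(set)
    for email, source in history:
        seen[email].add(source)

    flagged = []
    for email, source in recent:
        if source not in seen[email]:
            flagged.append((email, source))
            seen[email].add(source)  # flag each new pair only once
    return flagged


history = [("ana@example.com", "203.0.113.10")] * 3
recent = [
    ("ana@example.com", "203.0.113.10"),
    ("ana@example.com", "198.51.100.7"),  # e.g. the coffee shop
    ("ana@example.com", "198.51.100.7"),
]
print(flag_new_sources(history, recent))  # [('ana@example.com', '198.51.100.7')]
```

A flagged pair is a prompt to look for context, such as travel or a new home network. It is not proof of compromise.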

Sometimes the lack of a pattern can also be a pattern. People who travel a lot for work may have a lot of geographically diverse authentication activity. But knowing that those individuals travel frequently mitigates the urgency of anomalies when it comes to IP address location, for example.

Tracking authentication activity can also help us eliminate false positives. If the same team member has two or three failed login attempts every Monday morning, we can first flag it as a potential problem. Once we’ve identified it as a recurring pattern for that person, we can set that anomaly aside. (Of course, having coffee before trying to log in may help!)
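
One way to decide that a failure belongs to a known pattern is sketched below in Python. This is not the team's actual rule set. The bucketing by weekday and hour, the three-week threshold, and the function names are all illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime

Bucket = tuple[int, int]  # (weekday, hour); Monday == 0


def recurring_failure_buckets(failures: list[datetime], min_weeks: int = 3) -> set[Bucket]:
    """Return (weekday, hour) buckets with failures in >= min_weeks distinct ISO weeks."""
    weeks: dict[Bucket, set[tuple[int, int]]] = defaultdict(set)
    for ts in failures:
        iso = ts.isocalendar()
        weeks[(ts.weekday(), ts.hour)].add((iso[0], iso[1]))  # (ISO year, ISO week)
    return {bucket for bucket, seen in weeks.items() if len(seen) >= min_weeks}


def is_known_pattern(ts: datetime, buckets: set[Bucket]) -> bool:
    """True if a new failure falls in one of the user's recurring buckets."""
    return (ts.weekday(), ts.hour) in buckets


past_failures = [
    datetime(2022, 4, 4, 8, 5),    # Mondays around 8 a.m.
    datetime(2022, 4, 4, 8, 7),
    datetime(2022, 4, 11, 8, 2),
    datetime(2022, 4, 18, 8, 10),
    datetime(2022, 4, 13, 14, 0),  # a one-off Wednesday failure
]
buckets = recurring_failure_buckets(past_failures)
print(is_known_pattern(datetime(2022, 4, 25, 8, 3), buckets))   # True
print(is_known_pattern(datetime(2022, 4, 27, 14, 0), buckets))  # False
```

Counting distinct weeks rather than raw failures keeps a single bad morning with many retries from becoming a "pattern."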

Strength in numbers

We use dozens of SaaS providers, and each one gives us an opportunity to add context and granularity to employee security profiles and patterns.

Gaining access to authentication data from an individual SaaS provider can be challenging. A critical solution we found was to track Google Workspace (GWs) authentications. When a team member logs into any of the services we use with their Gmail-linked email address, we can use the data generated by that interaction to track usage and authentication that would otherwise be unavailable.

Google authentications are useful, especially when thinking in terms of time series data, because each transaction generates several data points, such as request, response and the actual authorization, among others. Some transactions can generate hundreds of data points, depending on what the user is trying to accomplish.

The basic information we want to capture includes:

Authentication timestamp
Company account ID
Username
User ID
User domain
Authentication type
Authentication result

For our purposes, we map keys to static values or Google Workspace (GWs) event fields.

time: GWs.id.time
service_source: "G Suite"
service_domain: "influxdata.com"
source_address: GWs.ipAddress
email_address: GWs.actor.email
saas_account_id: GWs.actor.profileId
customer_id: GWs.id.customerId
application: GWs.id.applicationName
auth_results: GWs.events[X].name
login_type: GWs.events[X].parameters[Y].value
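
A minimal Python sketch of this mapping follows. It assumes the input is a single activity record shaped like the Admin SDK Reports API output for the login application, with `id`, `actor`, `ipAddress`, and an `events` list whose items carry `name` and `parameters`. The mapping leaves the indexes X and Y open. The sketch therefore emits one record per event and, as an assumption, selects the parameter named `login_type`. The function name is hypothetical.

```python
from typing import Any


def normalize_gws_login(activity: dict[str, Any]) -> list[dict[str, str]]:
    """Map one Google Workspace login activity to normalized auth records.

    Static values and key names follow the mapping above; one record is
    produced for each entry in activity["events"] (the X index).
    """
    records = []
    for event in activity.get("events", []):
        # Assumption: the Y index refers to the parameter named "login_type".
        login_type = next(
            (p.get("value", "") for p in event.get("parameters", [])
             if p.get("name") == "login_type"),
            "",
        )
        records.append({
            "time": activity["id"]["time"],
            "service_source": "G Suite",
            "service_domain": "influxdata.com",
            "source_address": activity.get("ipAddress", ""),
            "email_address": activity["actor"].get("email", ""),
            "saas_account_id": activity["actor"].get("profileId", ""),
            "customer_id": activity["id"].get("customerId", ""),
            "application": activity["id"].get("applicationName", ""),
            "auth_results": event.get("name", ""),
            "login_type": login_type,
        })
    return records


sample = {
    "id": {"time": "2022-04-18T11:00:00.000Z", "customerId": "C0123",
           "applicationName": "login"},
    "actor": {"email": "user@example.com", "profileId": "1234"},
    "ipAddress": "203.0.113.10",
    "events": [{"type": "login", "name": "login_success",
                "parameters": [{"name": "login_type", "value": "google_password"}]}],
}
print(normalize_gws_login(sample))
```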

Capturing this data has always been possible, but using it for security profiling didn’t scale well. Using InfluxDB allows us to easily ingest and process all the data produced by these transactions because it’s designed to handle this kind of timestamped data.

As we bring new team members on board, we start to identify their patterns and build their security profile right away. This helps to determine a baseline usage profile that we then compare to usage patterns for established employees and other new hires as they join the company.

We visualize this data right in InfluxDB and use Flux to generate the values for each individual element.

Continued development

At present, we’re exploring how to use Bollinger bands with this data. Bollinger bands place an upper and a lower band a set number of standard deviations (conventionally two) above and below a simple moving average, which establishes a normal range. These thresholds also give us a more granular understanding of this data as we track what normal means week over week, day over day, hour over hour, and so on.
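
The band logic can be sketched in a few lines of Python. This is not the team's implementation. It assumes the input is a list of per-interval login counts, for example hourly counts from a Flux `aggregateWindow()` query. It uses the conventional 20-period window and two standard deviations. Each point is compared with the band computed from the window before it, so a spike does not widen its own band. The function name is hypothetical.

```python
from statistics import fmean, pstdev


def bollinger_outliers(counts: list[float], window: int = 20, k: float = 2.0) -> list[int]:
    """Return indices whose count falls outside SMA +/- k * stddev.

    The simple moving average and (population) standard deviation are
    computed over the `window` values preceding each point.
    """
    outliers = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        sma = fmean(history)
        sd = pstdev(history)
        if abs(counts[i] - sma) > k * sd:
            outliers.append(i)
    return outliers


counts = [10, 12] * 10 + [11, 40]  # a steady pattern, then a burst
print(bollinger_outliers(counts))  # [21]
```

Raising `k` reduces false positives at the cost of missing smaller deviations.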

The potential for this type of security monitoring is vast. As we continue to develop and build out this solution, we will be able to monitor our entire supply chain across the company. For example, including transaction data from GitHub means we will be able to track activities like GitHub repo cloning. If someone initiates a clone request from a suspicious location, that quickly becomes visible, and we can take appropriate countermeasures.

Longitudinal tracking and alerting remain in development, as their functionality depends on the tolerance levels established by the Bollinger bands. But InfluxDB’s native alerting capabilities can handle these types of events.

Ultimately, the goal here is to develop a holistic and scalable system for monitoring security threats. Because everything happens through the course of time, all security data is time series data. We’re leveraging time series data and InfluxDB to maintain that context.

How We Use InfluxDB for Security Monitoring

Darin Fisher (InfluxData) — Fri, 29 Jan 2021 04:00:13 -0700

At InfluxData, we believe it makes sense to use a time series database for security monitoring. In summary, it’s because security investigations are inevitably time-oriented – you want to monitor and alert on who accessed what, from where, at which time – and time series databases like InfluxDB are very efficient at querying the data necessary to do this.

In this post, we’d like to show you the beginnings of how we’re using InfluxDB for security monitoring so that you can apply these patterns to your own organization.

Our inventory of security events at InfluxData

The first question: where to begin our security event monitoring? Since most security breaches are related to compromised accounts, we decided to focus there.

In order to verify geographically appropriate access to our services, we need to collect data for over 100 cloud services. But one of the first hurdles we hit was the availability of access data (who’s logged in) and activity data (what they did once logged in). Of those 100+ services, a few dozen use single sign-on (SSO) through Google Workspace (formerly called G Suite). Since we were able to acquire access data for those services, we decided to start there.

Patterns we're looking for

Security monitoring is all about anomaly detection – what the deviations from normal are. We decided to look at the following:

Total number of unique accounts
Total number of authentication attempts
Total number of successful authentication attempts
Total number of unsuccessful auth attempts
Average number of IP addresses per account
Average number of accounts per IP address
List of authentication events, with time, username, app, IP address, and whether successful or not

With the event data stored in InfluxDB, it’s easy to view the above information across any time period we like: the past hour, day, week, or month; a particular set of hours or days; certain hours of the day (business hours vs. non-working hours); or certain days of the week (weekdays vs. weekends).
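
Outside of Flux, splitting events into these periods comes down to a small classification step. In this Python sketch, the 09:00 to 17:00 working day, the Monday to Friday week, and the function names are our assumptions. Timestamps are assumed to already be in the team's local time zone.

```python
from collections import Counter
from datetime import datetime


def time_bucket(ts: datetime, start_hour: int = 9, end_hour: int = 17) -> str:
    """Classify a local timestamp as weekend, business_hours, or after_hours."""
    if ts.weekday() >= 5:  # Saturday == 5, Sunday == 6
        return "weekend"
    if start_hour <= ts.hour < end_hour:
        return "business_hours"
    return "after_hours"


def count_by_bucket(events: list[tuple[datetime, str]]) -> Counter:
    """Count (bucket, auth_result) pairs, e.g. ("after_hours", "login_failure")."""
    return Counter((time_bucket(ts), result) for ts, result in events)


events = [
    (datetime(2022, 4, 18, 10, 15), "login_success"),  # Monday morning
    (datetime(2022, 4, 18, 22, 40), "login_failure"),  # Monday night
    (datetime(2022, 4, 16, 11, 0), "login_success"),   # Saturday
]
print(count_by_bucket(events))
```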

We want to keep the required data points simple. After all, we have many services to track. Our list of cloud services used (and thus our attack surface) is continually changing. And usage patterns change over time, whether it’s the holiday slowdown or the return to a post-pandemic world with more travel.

Authentication events

We are collecting authentication events from the Login audit log of the Google Workspace (GWs) audit logging service. The GCP Cloud Logging documentation describes methods for automated collection. Later, we’ll shift to collecting these events using the Telegraf PubSub plugin, since it’s an easier, cleaner integration. (Please learn from my mistakes!)

Data collection

Each service can require a separate collection process. The methods and data models can be similar, though each will be unique, and collection will be accomplished in many ways and with a variety of tools. (Details are beyond our scope today.) The collection service for Google Workspace currently runs a Node.js polling program every 5 minutes. Again, this is migrating to a simpler Telegraf PubSub listener.

Data storage

We store the metrics in the InfluxDB Cloud service.

Data model

Each service delivers its data in a different format, so we need to normalize each one for our use.

The basic information we will require:

Authentication timestamp
Company account ID
Username
User ID
User domain
Authentication type
Authentication result

The company ID is used to manage separate corporate accounts. The username is usually the email address but could match the user ID.

Our keys are mapped to static values or GWs event fields.

GWs == Google Workspace Event Record

time: GWs.id.time
service_source: "G Suite"
service_domain: "influxdata.com"
source_address: GWs.ipAddress
email_address: GWs.actor.email
saas_account_id: GWs.actor.profileId
customer_id: GWs.id.customerId
application: GWs.id.applicationName
auth_results: GWs.events[X].name
login_type: GWs.events[X].parameters[Y].value
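
The following minimal Python sketch shows where these keys end up by serializing one normalized record as InfluxDB line protocol. The schema is an assumption inferred from the Flux queries in this post: measurement `auth_activity`, a string field `auth_result`, and the remaining keys stored as tags. The sketch uses `auth_result`, the name the queries filter on, rather than the `auth_results` key in the mapping. It also converts the RFC 3339 `time` value to nanoseconds. Function and constant names are hypothetical.

```python
from datetime import datetime, timezone

# Assumed schema, inferred from the Flux queries: everything except
# time and auth_result is written as a tag.
TAG_KEYS = (
    "service_source", "service_domain", "source_address", "email_address",
    "saas_account_id", "customer_id", "application", "login_type",
)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def escape_tag(value: str) -> str:
    """Escape commas, equals signs, and spaces in a tag value."""
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line_protocol(record: dict[str, str]) -> str:
    """Serialize one normalized auth record as a line protocol point."""
    tags = ",".join(
        f"{key}={escape_tag(record[key])}"
        for key in sorted(TAG_KEYS)  # sorting tags by key is recommended
        if record.get(key)  # line protocol does not allow empty tag values
    )
    series = "auth_activity" + ("," + tags if tags else "")
    # String field values need backslashes and double quotes escaped.
    result = record["auth_result"].replace("\\", "\\\\").replace('"', '\\"')
    # fromisoformat() before Python 3.11 does not accept a trailing "Z".
    ts = datetime.fromisoformat(record["time"].replace("Z", "+00:00"))
    delta = ts - EPOCH
    nanos = (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000
    return f'{series} auth_result="{result}" {nanos}'


record = {
    "time": "2022-04-18T11:00:00.000Z", "service_source": "G Suite",
    "service_domain": "influxdata.com", "source_address": "203.0.113.10",
    "email_address": "user@example.com", "saas_account_id": "1234",
    "customer_id": "C0123", "application": "login",
    "login_type": "google_password", "auth_result": "login_success",
}
print(to_line_protocol(record))
```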

Visualization

This initial dashboard visualization consists of general usage metrics, success vs. failure counts, account and address cardinality, results over time chart, and a list of authentication event details. Creating visualizations and dashboards is covered very well in the InfluxDB documentation.

Dashboard elements

The Flux queries used for each cell of the dashboard are as follows:

Unique accounts

This builds the list of unique accounts attempting authentication for the given period of time.

from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) =>
    r._measurement == "auth_activity"
    and r._field == "auth_result"
  )
  |> keep(columns:["email_address"])
  |> group()
  |> unique(column: "email_address")
  |> count(column: "email_address")

Authentication attempts

The total number of authentication attempts during the selected time range.

from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) =>
    r._measurement == "auth_activity"
    and r._field == "auth_result"
    and (r._value == "login_success" or r._value == "login_failure")
  )
  |> keep(columns:["_time","email_address"])
  |> group()
  |> count(column: "email_address")

Success

The number of successful authentication attempts during this time period.

from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) =>
    r._measurement == "auth_activity"
    and r._field == "auth_result"
    and r._value == "login_success"
  )
  |> group()
  |> count()

Failures

The number of failed authentication attempts during this time period.

from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) =>
    r._measurement == "auth_activity"
    and r._field == "auth_result"
    and r._value == "login_failure"
  )
  |> group()
  |> count()

Average address cardinality per account

For the given time period, the average number of internet addresses used by each account (email address).

from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) =>
    r._measurement == "auth_activity"
    and r._field == "auth_result"
  )
  |> keep(columns:["email_address","source_address"])
  |> group(columns: ["email_address"])
  |> unique(column: "source_address")
  |> count(column: "source_address")
  |> group()
  |> mean(column: "source_address")

Total account cardinality per address

How many accounts were used per internet address during the same time period.

addresses = from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) =>
    r._measurement == "auth_activity"
    and r._field == "auth_result"
  )
  |> keep(columns:["source_address"])
  |> map(fn: (r) => ({ r with field: "x1" }))
  |> group(columns:["field"])
  |> rename(columns: {source_address: "_value"})
  |> unique()
  |> count()

accounts = from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) =>
    r._measurement == "auth_activity"
    and r._field == "auth_result"
  )
  |> keep(columns:["email_address"])
  |> map(fn: (r) => ({ r with field: "x1" }))
  |> group(columns:["field"])
  |> rename(columns: {email_address: "_value"})
  |> unique()
  |> count()

join(tables: { d1: addresses, d2: accounts }, on: ["field"])
  // accounts per address = unique accounts (d2) / unique addresses (d1)
  |> map(fn: (r) => ({
    r with _value: float(v: r._value_d2) / float(v: r._value_d1)
  }))
  |> keep(columns:["_value"])

Authentication results

Summary of authentication attempt successes and failures that occurred during the entire time period.

from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) =>
    r._measurement == "auth_activity"
    and (r._field == "auth_result")
  )
  |> keep(columns: ["_start","_stop","_time","_value"])
  |> map(fn: (r) => ({ r with res: r._value }))
  |> group(columns: ["res"])
  |> aggregateWindow(every: v.windowPeriod, fn: count )

Latest authentication events

A full list of the authentication events and details for the time period.

from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter( fn: (r) =>
    r._measurement == "auth_activity"
    and r._field == "auth_result"
  )
  |> duplicate(column: "_value", as: "auth_result")
  |> drop(columns:[
    "_start","_stop","_field","_measurement","application",
    "customer_id", "service_source","saas_account_id","_value",
    "service_domain"
  ])
  |> group()
  |> sort(columns:["_time"], desc: true)

A request to our fellow cloud software vendors

Let me step onto my soapbox for a moment.

One thing we’ve noticed is that many cloud services and SaaS applications don’t provide access to security events such as logins. And for those that do, many charge extra for them. For example, here’s the pricing of AWS CloudTrail, which lets you log and monitor your AWS account activity.

As an industry, we cloud and SaaS vendors are doing ourselves a disservice, because these practices make it less likely that our customers will find security breaches. The more we can make security events widely available via APIs, and make those APIs free, the more we can build trust in the products we all offer.

Think of the car industry – they don’t charge extra for fancier seatbelts, anti-lock brakes, or airbags. These come standard, because car vendors know that the safer they make cars, the more people will trust them as a mode of transportation. We need to start thinking like that.

So if you’re a cloud software developer, please make your security events, especially authentication events, available via a free API. Specifically, provide programmatic access, either via pull (REST API calls) or push (WebSockets, MQTT, AMQP, etc.), to the following information:

Access: Who logged in (or attempted to), at what time, and from where, in the form of an IP address or fully qualified domain name (FQDN). Even better: determine the latitude and longitude of a login. This way a customer can compute the distance between login sessions to see whether they indicate an account compromise.
Usage: How long someone's session lasted.
Activity: This is domain-specific and should allow tracking of at least add, change, and delete operations in an application or cloud service.
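
The distance check suggested in the list is commonly implemented as an "impossible travel" test. The following is a minimal Python sketch. It assumes each login has already been geolocated to latitude and longitude, and IP geolocation is only approximate. The 900 km/h threshold (roughly airliner cruising speed), the 100 km jitter allowance, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(login_a, login_b, max_kmh: float = 900.0, min_km: float = 100.0) -> bool:
    """Each login is (epoch_seconds, latitude, longitude)."""
    km = haversine_km(login_a[1], login_a[2], login_b[1], login_b[2])
    if km < min_km:  # ignore geolocation jitter between nearby points
        return False
    hours = abs(login_b[0] - login_a[0]) / 3600
    return hours == 0 or km / hours > max_kmh


sf = (0, 37.7749, -122.4194)
ny_1h = (3600, 40.7128, -74.0060)
ny_8h = (8 * 3600, 40.7128, -74.0060)
print(is_impossible_travel(sf, ny_1h))  # True: ~4,130 km in one hour
print(is_impossible_travel(sf, ny_8h))  # False: ~516 km/h
```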

InfluxDB OSS and Enterprise products produce detailed authorization and activity logs. They are available through the AWS, Azure, and Google Cloud marketplaces to streamline installation and purchasing.

Now, to be completely transparent: for InfluxDB Cloud services, this information is currently available only by request to the support team. These capabilities are on our product roadmap, and we are working to rectify this.

Conclusion

As we migrate our operations and services from self-hosted to cloud, security-related events become more difficult to collect and correlate, yet even more important to watch. Our tools and methods must evolve to keep up with a continually changing attack surface, and the InfluxData platform is well suited to this.

We will continue to build and demonstrate various methods for improving our security posture, so stay tuned for more!

If you’re using a time series database for security monitoring, we’d love to hear from you. Let us know on our Slack or on our community website. And if you want to try InfluxDB for yourself, get it here.

Thank you, Al Sargent and Peter Albert, for your assistance and contributions to this article.

InfluxDB Endpoint Security State Template

Darin Fisher (InfluxData) — Tue, 21 Jul 2020 07:00:08 -0700

Our team recently discovered an exposed endpoint without authentication enabled, even though we knew authentication had previously been required. The root cause was a configuration setting that went missing during an upgrade a few weeks earlier, and the fix was simply to set the configuration parameter correctly again.

We needed a way to catch this type of issue quickly going forward, for this and for other public endpoints, which should be secure by default. Here is how we solved it.

A complete monitoring system should continuously verify that security controls are functioning as expected and have not changed. Here we show how to track the state of our web service authentication and SSL certificates with the InfluxDB Endpoint Security State Template.

This will tell us:

Is the service available?
Is the SSL certificate still valid?
If authentication is required, is it turned on and functioning?

Service availability is one of the many metrics used for both security and operations. One example where availability is a security concern is a DoS (or DDoS) attack. The purpose of such an attack is to render a service unavailable.

The SSL certificate validation can also be used by operators to assist in certificate renewal automation. For security purposes, we need to regularly validate that our identity and service security precautions are functioning properly.

When authentication is required, we should continually verify that it is active and functioning properly. A bug, a misconfiguration, or an attack could cause authentication to fail, possibly letting everyone have access.

We recommend utilizing a dedicated service account for authentication validation. The account should have read-only access to the minimum level of services required to function.

Once the InfluxDB CLI environment is set up, run the following from the command line:

influx apply --template-url https://raw.githubusercontent.com/influxdata/community-templates/master/endpoint-security-state/endpoint-security-state.yml

The Telegraf configuration will need to be updated with the URLs and credentials for the endpoint(s) you wish to monitor.

To download the Telegraf configuration, select Data > Telegraf from the navigation menu in the InfluxDB UI. From there, select the “Endpoint Security State” configuration and download the file to your system.

Let’s take a deeper look at the Telegraf configuration. It uses multiple input plugins and the Regex processor to normalize the data a bit.

The x509 Cert plugin collects various metrics from the site’s certificate. This only supports the HTTPS scheme.

The HTTP Response plugin is utilized for connection and authorization attempts.

For authorization testing, make sure to configure each endpoint both with and without credentials. The request without credentials needs to fail to confirm that authentication is enforced.

If authentication is enabled, verify that the URL is not redirected to another site for authentication.

The Flux query uses the response status code to determine the various states. Any successful status or authentication failure response indicates the service is available. An authentication failure response indicates authentication services are available. Successful responses indicate authentication is functioning properly.

The resulting table will display each endpoint and an icon to indicate the state for each metric:

OK: The service or function is as expected.
Warning: The site's certificate will expire in less than 30 days.
Failed: The service or function is in a failed or unexpected state.
Unknown: We were not able to derive the state.
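
The template's Flux query is not reproduced here, but the rules described earlier can be restated as a small Python sketch. The status-code choices (2xx for success, 401/403 for an authentication failure), treating an expired certificate as failed, the state names, and the function name are our assumptions. The 30-day certificate threshold comes from the template's state list. A redirect or any other unexpected status leaves the authentication state unknown.

```python
from typing import Optional

OK, WARNING, FAILED, UNKNOWN = "ok", "warning", "failed", "unknown"
AUTH_FAILURE = {401, 403}  # assumed "authentication failed" responses


def is_success(code: Optional[int]) -> bool:
    return code is not None and 200 <= code < 300


def endpoint_state(
    no_creds: Optional[int],    # status of the request without credentials
    with_creds: Optional[int],  # status of the request with credentials
    cert_days_left: Optional[float],
) -> dict[str, str]:
    codes = [c for c in (no_creds, with_creds) if c is not None]
    if not codes:
        available = UNKNOWN
    elif any(is_success(c) or c in AUTH_FAILURE for c in codes):
        available = OK  # success or auth failure both mean the service answered
    else:
        available = FAILED

    if no_creds in AUTH_FAILURE:
        auth_enforced = OK
    elif is_success(no_creds):
        auth_enforced = FAILED  # anyone can get in without credentials
    else:
        auth_enforced = UNKNOWN  # no response, redirect, or other status

    if with_creds is None:
        auth_working = UNKNOWN
    else:
        auth_working = OK if is_success(with_creds) else FAILED

    if cert_days_left is None:
        certificate = UNKNOWN
    elif cert_days_left <= 0:
        certificate = FAILED
    elif cert_days_left < 30:
        certificate = WARNING
    else:
        certificate = OK

    return {"available": available, "auth_enforced": auth_enforced,
            "auth_working": auth_working, "certificate": certificate}


print(endpoint_state(401, 200, 90))  # healthy: every state is "ok"
print(endpoint_state(200, 200, 90))  # auth_enforced is "failed"
```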

This could be extended with alerts for failed authentication services or expired certificates.

We could also collect Apache or NGINX performance metrics and add an endpoint variable along with a couple of histograms to give the user a more complete picture of the services.

As for me, I really enjoy the ability to quickly build solutions on the InfluxDB platform. Not so long ago, I would cobble together a solution, start evaluating third-party solutions, or move the problem to the “get this done when I have extra time” queue.

So go ahead and play with the InfluxDB Endpoint Security State Template, and let us know if you have any questions, on our InfluxDB community site or Slack.