Webinar Highlights: Improving Clinical Data Accuracy - How to Streamline a Data Pipeline Using Node.js, AWS and InfluxDB
By Bria Jones / Aug 26, 2022 / InfluxDB, Community
Given the global health crises the world has faced over the last few years, the need for expeditious but accurate medical trials has never been greater. The faster clinical trial data is validated, the faster medicines get approved and treatments become available. Pinnacle 21’s customers are driving forces behind creating life-saving treatments. In this webinar, Josh Gitlin, Director of DevOps, shares how Pinnacle 21 is using the purpose-built time series platform, InfluxDB, to help streamline data pipelines for faster and more accurate clinical trial data. If you missed the live session, we have shared the recording and the slides for you to review at your leisure.
Pinnacle 21 overview
Pinnacle 21 by Certara is a software company specializing in life sciences solutions. Their flagship product, Pinnacle 21 Enterprise (P21E for short), is used by major life science and pharmaceutical companies to validate clinical trial data. When developing medical treatments, medicines, and devices, companies must meet specific data standards established by the CDISC (Clinical Data Interchange Standards Consortium) before their submissions can be approved. Pinnacle 21 customers use P21E to ensure their data will look correct when sent for approval. Josh provided the best layman’s explanation he has heard: “It’s like spell-check for your clinical trial data.”
The need for a solution
When Josh joined Pinnacle 21, Datadog had been selected as the product of choice to monitor their servers. He felt it lacked critical features, such as the ability to label the Y axis, and offered limited visualization options. It was also expensive and not well suited to Pinnacle 21’s use case. Having previous experience with Grafana and InfluxDB, Josh proposed replacing Datadog. Because Josh had already been writing automation software in CINC, the open source version of Chef, it would be fairly easy to swap Datadog for another monitoring solution. Any replacement had to meet the following requirements:
- Easy to implement
- Ability to capture logs and metrics
- Must be an externally-hosted tool
- Collect and monitor APM metrics
They needed their data to be hosted elsewhere to be secure and protected from tampering but did not want to manage the infrastructure of another monitoring solution. This requirement would help them adhere to the audit and compliance requirements established by the CDISC.
Why Grafana, InfluxDB, and Telegraf?
Josh initially considered InfluxDB and Grafana as a potential replacement for their DevOps monitoring needs because of the cost efficiency: even when paired with another vendor for their log analytics, the combination was still significantly cheaper than their previous solution. With InfluxDB’s usage-based plan in place, they evaluated each metric they chose to send through Telegraf to decide what was worth paying for. Another key component for Pinnacle 21 was Telegraf itself. For Josh and his team, Telegraf was an impressive and powerful data ingestion tool with more plugins available, out of the box, than their existing agent.
Each of Pinnacle 21’s customers is hosted in AWS within its own EC2 security group and has two instances: a web server and an application server. Telegraf runs on both of these instances, publishing metrics to an internal InfluxDB instance as well as to a GCP-hosted InfluxDB Cloud instance. The Pinnacle team created a policy file-based workflow for each type of server that defines which pieces of automation to run and what to monitor through Telegraf.
Installation was easy for Pinnacle 21 since Telegraf has a directory that allowed them to write individual plugin configurations. This was useful when automating a Telegraf installation via Chef as they have different policies and roles based on their servers. Josh and team chose to have each plugin write a configuration in the Telegraf directory then establish a node attribute listing which plugins Telegraf should monitor. This enabled them to customize the configuration for each Telegraf input for each server individually.
Josh’s tip: Structure the node attributes as hashes and not arrays. Arrays are deep merged, so an entry added at one level is not easy to turn off at another. Structuring node attributes as hashes allows better control of what is being monitored at a very granular level.
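To illustrate why hashes are easier to switch off than arrays, here is a minimal Python sketch of a layered merge. It is a simplified model written for this recap, not Chef's actual merge implementation, and the attribute names (telegraf, inputs, cpu, disk, nginx) are hypothetical.

```python
def deep_merge(base, override):
    """Simplified layered merge: dicts merge key by key, lists are combined."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = deep_merge(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(override, list):
        # Lists only accumulate, so a later layer cannot remove an earlier entry.
        return base + [item for item in override if item not in base]
    # Scalars (and mismatched types): the later layer wins.
    return override


# Array style: the app-server layer tries to leave out "nginx" and fails.
base_array = {"telegraf": {"inputs": ["cpu", "disk", "nginx"]}}
app_array = {"telegraf": {"inputs": ["cpu", "disk"]}}
merged_array = deep_merge(base_array, app_array)

# Hash style: the app-server layer flips a single key to False.
base_hash = {"telegraf": {"inputs": {"cpu": True, "disk": True, "nginx": True}}}
app_hash = {"telegraf": {"inputs": {"nginx": False}}}
merged_hash = deep_merge(base_hash, app_hash)
enabled = sorted(name for name, on in merged_hash["telegraf"]["inputs"].items() if on)

print(merged_array["telegraf"]["inputs"])  # nginx is still listed
print(enabled)                             # nginx has been turned off
```

In the hash form, each server type only has to state the one key it wants to change, which is the granular control the tip describes.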
Josh then decided on an initial base monitoring set in Chef and found that some of the inputs were valuable but not worth the expense. The solution? Telegraf has a filtering option in each output plugin called “Tag Pass” (tagpass). The team configured the output plugins with specific destination tags and excluded certain tags from being published to a specific server. By putting tags on each Telegraf input, they could select which InfluxDB servers to send those metrics to. This also enabled the team to collect higher granularity metrics for a particular input and to fine-tune what they were collecting, how often they were collecting it, and where they were sending it, all to better manage cost.
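The routing idea can be sketched in a few lines of Python. This is only a model of the tagpass rule as Telegraf documents it (a metric passes if it carries one of the listed tag keys with a value matching one of the glob patterns); it is not Telegraf code, and the tag name "destination", its value "cloud", and the output names are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def passes_tagpass(tags, tagpass):
    """Return True if a metric with these tags would pass the filter."""
    if not tagpass:
        # No filter configured: every metric is emitted.
        return True
    return any(
        key in tags and any(fnmatchcase(tags[key], pattern) for pattern in patterns)
        for key, patterns in tagpass.items()
    )


# Hypothetical outputs: the internal server takes everything, while the
# usage-billed cloud instance only takes metrics explicitly tagged for it.
OUTPUTS = {
    "internal": {},
    "cloud": {"destination": ["cloud"]},
}


def route(metric_tags):
    """List the outputs a metric would be published to."""
    return sorted(name for name, flt in OUTPUTS.items() if passes_tagpass(metric_tags, flt))


print(route({"host": "web1", "destination": "cloud"}))
print(route({"host": "web1"}))
```

Under these assumptions, an input tagged for the cloud is published to both servers, while an untagged input stays on the internal instance and adds nothing to the usage-based bill.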
KPIs and APM
The next action item was to look at key performance indicators for their servers. Because they had previously used Datadog, they were already writing NGINX logs in JSON format, so Josh recommended parsing those logs. By parsing them, Pinnacle 21’s team is able to convert the logs into metrics and store them in InfluxDB.
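As a rough illustration of turning a JSON log entry into a metric, the Python sketch below converts one line into InfluxDB line protocol. The webinar recap does not show the team's log format or parser configuration, so the JSON keys (method, status, request_time, bytes_sent) and the measurement name nginx_access are hypothetical; in a real deployment Telegraf's own parsers would likely do this work.

```python
import json


def escape_tag(value):
    """Escape the characters that are special in line protocol tag values."""
    return str(value).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def nginx_json_to_line(raw_line, timestamp_ns):
    """Convert one JSON access-log line into a line protocol string."""
    entry = json.loads(raw_line)
    # Low-cardinality values become tags so they can be grouped and filtered.
    tags = {"method": entry["method"], "status": entry["status"]}
    tag_part = ",".join(f"{key}={escape_tag(value)}" for key, value in sorted(tags.items()))
    # Numeric measurements become fields; integers carry the "i" suffix.
    field_part = (
        f"request_time={float(entry['request_time'])},"
        f"bytes_sent={int(entry['bytes_sent'])}i"
    )
    return f"nginx_access,{tag_part} {field_part} {timestamp_ns}"


sample = '{"method": "GET", "status": 200, "request_time": 0.25, "bytes_sent": 512}'
print(nginx_json_to_line(sample, 1661500000000000000))
```

Keeping the status code and method as tags, and timings and sizes as fields, is one reasonable split for this kind of data; the right choice depends on which values the dashboards need to group by.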
Pinnacle 21’s customers use IBM Aspera to upload large data sets for validation. The team has configured Aspera to write logs to a particular path, where Telegraf picks them up using the Tail input plugin. This enables the team to generate graphs showing specific KPIs, such as the average customer data set size, the speed at which customer data is uploading, and how many uploads are in progress, which helped the extended Pinnacle 21 team understand the impact of the data.
Because P21E is a Java application, they needed to use a Java agent for their APM metrics. The team chose inspectIT Ocelot to collect JVM metrics, and they publish those metrics natively using the Telegraf Listener on the machines. This enabled their engineers to write code to capture individual events, such as function runtimes, and other metrics to monitor the performance of the application and optimize the software.
This is incredibly important for Pinnacle 21 as some of the data sets can take hours, sometimes days, to validate. Consider COVID-19 data sets, for example: the longer it takes to validate that data, the longer it takes to get new treatments approved. Giving the engineers the ability to quickly assess and optimize the performance of the application can ultimately speed up the validation process.
HTTP monitoring replacement
Initially the team considered Telegraf as a possible replacement for their HTTP monitoring system, and while Telegraf could do the job, they ultimately went with an AWS Lambda function running a Node.js application. Node was the right solution for them because of its event-driven architecture. Josh uses the Node.js client to publish metrics directly to InfluxDB and uses AWS CloudWatch events to trigger the function every minute from multiple regions. They used Grafana for alerts, to create heat maps, and to visualize data such as response times, outages, and failure codes from across the globe.
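The shape of such a check can be sketched as follows. The team's implementation is a Node.js Lambda using the InfluxDB Node.js client; this Python version is only an illustration of the same idea, and the function name, measurement name, tag names, and field names are all hypothetical. Writing the resulting point to InfluxDB and scheduling the function are left out.

```python
import time
import urllib.error
import urllib.request


def check_endpoint(url, region, opener=urllib.request.urlopen, clock=time.perf_counter):
    """Time one HTTP request and describe the result as a metric point."""
    started = clock()
    try:
        with opener(url, timeout=10) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        status = err.code
    except OSError:
        # DNS failure, refused connection, timeout: record as status 0.
        status = 0
    elapsed_ms = (clock() - started) * 1000.0
    return {
        "measurement": "http_check",
        "tags": {"url": url, "region": region},
        "fields": {"status_code": status, "response_ms": elapsed_ms},
    }
```

A scheduled handler in each region could call check_endpoint once per target URL and publish the returned point, tagging it with the region so that response times can be compared across the globe. The opener and clock parameters exist only so the function can be exercised without a network.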
Tips and tricks
Josh provided a number of helpful tips and tricks for anyone starting their monitoring journey with InfluxDB. First, he suggested starting out by evaluating needs and determining the overall objective. Because Telegraf is so powerful, it’s easy to get lost in all of the plugins available. Josh noted that sending data to multiple InfluxDB instances is a powerful option for redundancy and a good backup in case one instance has an issue. Next, he stressed the importance of using the usage dashboard to measure your usage and manage cost. He suggested adding metrics slowly to evaluate their impact on your usage, especially if there is a large fleet of servers. Finally, he recommended using Flux, as it is a powerful, fully functional language.
Josh mentioned he would be checking out InfluxDB University to sharpen his Flux skills. If you’re interested in learning more about Flux or other InfluxDB topics, InfluxDB University is a good place to start.
Question: Is there an initiative at your organization to be cloud-first?
Answer: Within the Pinnacle 21 department, it was very much cloud first, and that was largely the startup mentality, I think, where the founders needed to host this software and the last thing that the CTO needed to do was manage more infrastructure just to keep the business running. I do really like cloud first, and so I’m sticking with that approach for a significant number of the things like InfluxDB that we rolled out, like our log monitoring solution. But I’m also not afraid of running things internally. Sometimes I think there’s benefits one way or the other. So it depends on team size and resources. We are hiring at the moment, so as we get more DevOps engineers, we’ll have the greater ability to support internally hosted things as well.
Question: You’ve mentioned a few things that you’re hoping to do next — what’s at the top of that punch list regarding metrics that you’re collecting using InfluxDB?
Answer: I would love to start replacing some of the InfluxQL queries we have with Flux queries and improve some of the dashboards that we have. I think we could benefit from some comparative information where we look at comparing one customer or performance week over week. I think there’s definitely some power to be had there. I like to improve some of the things in the HTTP monitoring side. It’s good right now, but I think it could be better, and less on the InfluxDB side but more on the Grafana side. I’ve been playing with Grafana OnCall, really happy with that. We’re not using it in production yet, but we are using it in development, and I would like to switch everything over to that. It allows you to acknowledge alerts right from within Slack. It allows you to configure downtime better. That’s one of the challenges we’ve had with the existing solution is if we know we’re doing maintenance on a particular instance, we get a whole bunch of alerts for that instance because it’s difficult to silence those alerts or schedule downtime. So Grafana OnCall makes that easier and still works with InfluxQL or Flux.
Question: Our community loves Grafana, but have you looked at the visualization tooling in InfluxDB?
Answer: Yes. When I’m building new queries, I will use the InfluxDB Cloud UI and I will write my queries in there, visualize them, make sure that I have them the way I want them. I use the InfluxDB Cloud UI when I’m looking at the usage dashboard. I think some of the visualizations in there are not quite as powerful yet just because Grafana has such a head start on InfluxDB Cloud. But it’s a great system. It’s already more powerful than some of the things I’ve seen with Datadog. For example, you have different kinds of visualizations than just line charts. So yes, we do have some dashboards within the InfluxDB Cloud UI itself.
To learn more about how Pinnacle 21 is using InfluxDB, click here.