Using Time Series for Application Performance Monitoring to Win at the Flight E-ticket Sales Game
Chris Churilo
Jul 03, 2019
One of the best ways to demonstrate the value of application performance monitoring (APM) and its relation to time series data is through a real-world success story. Here we introduce the application performance monitoring solution used by Web Shop Fly, a startup that has grown into a major online travel agency for airline tickets, operational in more than five European countries and Russia. The solution is built on top of InfluxDB, an open-source time series database, which collects and stores user interactions, application metrics and events, and sends automated notifications when errors occur.
Web Shop Fly is designed, developed and operated exclusively by Bonitoo.io (a software engineering company that provides end-to-end R&D services). On one side, the portal is integrated with route and ticket providers. On the other side, it presents search results in a unique layout that makes planning trips easy for end users and offers options tailored to them. The application is highly optimized for the Central European market but is also entering other European markets as it grows.
<figcaption> Web Shop Fly: One System with 5 Independent Portals in 6 Countries</figcaption>
Web Shop Fly uses standard metrics from Google Analytics, Hotjar and AWS CloudWatch, and augments them with metrics collected by instrumenting its application, infrastructure, and business. Together these give an understanding of user adoption, the health of bookings, and infrastructure and network health and performance. This comprehensive visibility helps them make UX improvements, respond quickly to anomalies and ensure the quality of the flight data.
The challenges facing Web Shop Fly
Web Shop Fly wanted to become the top seller of online flight tickets but faced steep challenges:
- Crowded market and global competition
- Available frameworks don't allow for differentiation
- Environment prone to price errors
- Customers easily disengaged
Market opportunity in real-time monitoring
Determined to overcome industry and market challenges, Web Shop Fly identified a market opportunity to gain customers by improving its flight search and purchase experience. This required real-time visibility into user activity to determine user preferences, which only real-time monitoring could provide. Web Shop Fly set itself the following objectives:
- Fastest search response: from over a minute to within 12 seconds
- Price accuracy: > 95% to win customer engagement and trust
- Optimum sorting: availability & time range vs. cost & scoring
- Shift from passive to proactive UI: by monitoring website visitor activity in real time, they could instantly suggest attractive options that better match customers' searches and preferences
- Real-time integration of online purchasing and support workflow: all customer activity is monitored, collected, analyzed and presented in reports. If website users face issues or need extra support, the support team can step in, ready to help.
To achieve these objectives, Web Shop Fly deployed real-time data processing, monitoring, integration and visualization. Leveraging the visibility gained, they set out to provide better customer service response and more accurate flight and ticket information.
Competing through clear strategy and the right toolset
They realized that real-time processing was necessary because customers are impatient, and that it would enable them to meet the following SLAs as differentiators:
- Search flights in 7 seconds (instead of 30+ seconds)
- Deliver best prices (high competition)
- Deliver fresh prices (search price vs. final order price)
These differentiators would have to be achieved in the context of the flight ticket e-shops’ ecosystem, which encompasses many integrations: 700+ airlines, content providers and providers that aggregate APIs, online travel agencies (OTAs), and metasearch engine APIs (e.g. Google Flights), all using a myriad of technologies (SOAP, REST, FTP, etc.). A major challenge for Web Shop Fly was therefore balancing price accuracy, search time and cost, given that 500M flight combinations change every 10 minutes and that the capacity to refresh flight prices was limited (since API calls are not free).
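To make "price accuracy" concrete, here is a minimal Python sketch that treats it as the share of orders whose final price matches the price shown at search time. The article does not define the metric, so this definition, the `Order` type, the `price_accuracy` helper, the tolerance, and the sample prices are all assumptions for illustration; only the 95% target comes from the objectives above.

```python
from dataclasses import dataclass

TARGET = 0.95  # price accuracy objective quoted in the article


@dataclass
class Order:
    search_price: float  # price shown in the search result (hypothetical field)
    final_price: float   # price confirmed at order time (hypothetical field)


def price_accuracy(orders, tolerance=0.01):
    """Return the fraction of orders whose final price matches the search price.

    Assumed definition: a price "matches" when the two values differ by at
    most `tolerance` in the order's currency.
    """
    if not orders:
        raise ValueError("no orders to evaluate")
    matching = sum(
        1 for o in orders if abs(o.final_price - o.search_price) <= tolerance
    )
    return matching / len(orders)


# Made-up sample data: one of the four orders changed price before checkout.
orders = [
    Order(120.0, 120.0),
    Order(89.5, 94.0),
    Order(310.0, 310.0),
    Order(45.0, 45.0),
]
accuracy = price_accuracy(orders)
print(f"price accuracy: {accuracy:.0%}, target met: {accuracy > TARGET}")
# prints "price accuracy: 75%, target met: False"
```

Refreshing prices more often would raise this number, but each refresh costs API calls, which is the trade-off described above.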
Web Shop Fly built its solution on real-time data insights, seamless connectivity, and cost efficiency by using:
- Business monitoring (focus on the right flights)
- Smart integrations (optimize API calls cost)
- Application performance monitoring
Air tickets e-shop service architecture
Here is a high-level diagram of Web Shop Fly's architecture, which consists of containerized services and a MongoDB cluster on AWS, as well as the InfluxDB stack.
<figcaption> Web Shop Fly architecture using the InfluxDB stack</figcaption>
Using time series for application performance monitoring
For real-time time series data ingestion and processing, Web Shop Fly chose InfluxDB because it is purpose-built to handle large volumes of time series data, perform the needed operations, present the data in the needed format, and alert in real time.
The InfluxDB time series platform helped Web Shop Fly create a real-time monitoring system quickly and easily. They use the platform to monitor infrastructure metrics, application metrics and business-level metrics: InfluxDB for ingestion and storage, the Telegraf agent for metric collection, Kapacitor for alerting, and Chronograf for data visualization. All these components of the stack come ready to use, which helps them instrument their unique environment.
Monitoring components within Web Shop Fly’s architecture include:
- Telegraf out-of-the-box (OOTB) plugins - for infrastructure metrics
- Telegraf Exec plugin with custom shell script - for some custom metrics
- InfluxDB JS and Java clients - to capture information from their application
- Kapacitor anomaly detection and alerting
- Chronograf dashboards
- InfluxDB to store all the metrics
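The Exec plugin in the list above runs a command at each collection interval and parses whatever the command prints. As a rough sketch of that contract, the script below prints one record in InfluxDB line protocol. It is written in Python rather than shell (the article says Web Shop Fly uses a shell script), and the metric itself, a count of files waiting in a directory, along with the names `spool_queue`, `pending`, `count_files`, and `format_line`, is hypothetical. It assumes the plugin is configured to parse the output as line protocol.

```python
#!/usr/bin/env python3
"""Hypothetical custom-metric script meant to be run by Telegraf's exec input."""
import os
import sys


def count_files(directory):
    """Count regular files directly inside `directory`."""
    return sum(1 for entry in os.scandir(directory) if entry.is_file())


def format_line(count):
    """Format the count as one line-protocol record.

    Integer field values carry an "i" suffix. The timestamp is omitted,
    so it is assigned when the record is collected.
    """
    return f"spool_queue pending={count}i"


if __name__ == "__main__":
    # The directory argument is illustrative; it defaults to the current one.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(format_line(count_files(target)))
```

Keeping the measurement logic in a small script like this means any value that can be computed on the host can become a time series without changing the application.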
What are they able to achieve with InfluxDB?
Let’s take a look at some of the specific metrics they collect to understand how the service is performing. In this instance, they want to determine how well search requests perform for end users. In the graph below, the blue lines represent the number of requests for a particular search, which includes the flight details (airports, airline, time/date) and the price. The yellow lines are the corresponding error codes returned by the airline or OTA systems. The graph below shows the response time of the searches, broken down into max, mean, and min. These metrics help them meet their 7-second SLA for search by showing which search result was problematic and why.
<figcaption> Monitor pricing results</figcaption>
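The max, mean, and min breakdown described above can be approximated in a few lines of Python. This is a sketch under assumptions, not the query or Kapacitor task Web Shop Fly runs: the one-minute window, the `summarize` helper, and the sample data are invented, and only the 7-second threshold comes from the article.

```python
from collections import defaultdict
from statistics import mean

SLA_SECONDS = 7.0  # search SLA quoted in the article


def summarize(samples, window_seconds=60):
    """Group (unix_timestamp, response_seconds) samples into fixed windows.

    Returns a dict keyed by window start, holding the min, mean, and max
    response time and whether the slowest request breached the SLA.
    """
    buckets = defaultdict(list)
    for ts, duration in samples:
        buckets[ts - ts % window_seconds].append(duration)
    return {
        start: {
            "min": min(values),
            "mean": mean(values),
            "max": max(values),
            "breach": max(values) > SLA_SECONDS,
        }
        for start, values in sorted(buckets.items())
    }


# Made-up samples: three searches in the first minute, two in the second.
samples = [(0, 2.0), (20, 4.0), (50, 3.0), (65, 6.0), (90, 9.0)]
for start, stats in summarize(samples).items():
    print(start, stats)
```

With this data the second window is flagged, because its slowest search took 9 seconds.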
They are also able to gather metrics that help them understand user adoption of the service and how well bookings/orders are progressing. For adoption, they have a feature that emails users the “best” prices, so they monitor the number of registrations per day and overall adoption over time, as well as the number of emails sent. For bookings performance, they look at the number of bookings (orders) coming in, how many orders failed, and the error codes generated, to determine which system is at fault (airline, OTA system, etc.).
When they combine this view of their business with the typical infrastructure monitoring they also do, they have a good sense of where problems occur and how to address them quickly to keep the bookings coming.
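The bookings breakdown described above, failed orders grouped by error code to find the system at fault, can be sketched as follows. The record layout (`status`, `source`, `error_code`), the error codes, and the `failure_breakdown` helper are hypothetical; the article does not show the real event schema.

```python
from collections import Counter


def failure_breakdown(orders):
    """Return (failure_rate, Counter keyed by (source, error_code))."""
    failed = [o for o in orders if o["status"] == "failed"]
    rate = len(failed) / len(orders) if orders else 0.0
    by_cause = Counter((o["source"], o["error_code"]) for o in failed)
    return rate, by_cause


# Made-up order events; field names and error codes are illustrative only.
orders = [
    {"status": "ok"},
    {"status": "failed", "source": "airline", "error_code": "SOLD_OUT"},
    {"status": "failed", "source": "ota", "error_code": "TIMEOUT"},
    {"status": "ok"},
    {"status": "failed", "source": "ota", "error_code": "TIMEOUT"},
]

rate, by_cause = failure_breakdown(orders)
print(f"failed orders: {rate:.0%}")
for (source, code), count in by_cause.most_common():
    print(f"{source:8} {code:10} {count}")
```

Sorting by count puts the most frequent source and error pair first, which is the one to investigate first.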
How do they do this?
As mentioned, Web Shop Fly uses Telegraf plugins to capture metrics about the performance of their infrastructure. They also use the InfluxDB JS and Java clients to capture information from their application, which tells them how the application is performing and also provides business metrics. With these clients, they can easily add metadata (or tags) to the events they capture and insert into InfluxDB.
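As a minimal sketch of what a tagged event looks like when it is written to InfluxDB, the Python below builds a line protocol record by hand. It is not the JS or Java client code Web Shop Fly uses. The helper `to_line` and the measurement, tag, and field names (`search_request`, `portal`, `provider`, `status`, `duration_ms`, `results`) are hypothetical.

```python
def _escape(value, chars):
    """Backslash-escape each character in `chars` wherever it occurs in `value`."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def _format_field(value):
    """Render a field value using line-protocol type conventions."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"       # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'        # strings are double-quoted


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Build one record: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key, val in sorted(tags.items()):  # tags written in sorted key order
        head += f",{_escape(key, ',= ')}={_escape(val, ',= ')}"
    body = ",".join(
        f"{_escape(k, ',= ')}={_format_field(v)}" for k, v in fields.items()
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


line = to_line(
    "search_request",
    tags={"portal": "cz", "provider": "ota_a", "status": "ok"},
    fields={"duration_ms": 3120, "results": 48},
    timestamp_ns=1562112000000000000,
)
print(line)
# prints "search_request,portal=cz,provider=ota_a,status=ok duration_ms=3120i,results=48i 1562112000000000000"
```

Tags are indexed and fields are not, so values used to filter or group by (portal, provider, status) fit naturally as tags, while the measured values go in fields.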
Since all metrics and events are stored in InfluxDB, it is easy to link the various metrics and provide a single view across application, business, and infrastructure metrics. Even though their application has some unique functionality to support their use case, they were able to get the solution up and running quickly.
Web Shop Fly’s use case provides a realistic example of how the decision to become data-driven and implement real-time monitoring can transform market positioning and competitive advantage, and turn challenges into untapped opportunities. To learn more about this use case, watch the webinar.