Case studies

Showtime Analytics - an Irish-based data analytics products and services company

Discover how Showtime Analytics developed a cloud solution that kept costs at a viable, competitive level and increased the reliability and efficiency of their services.

The Showtime Analytics data platform collects information from multiple data sources (including cinema point-of-sale information) and then collates and transforms this data so that it can provide end customers with meaningful insights to support business and marketing decisions. The end customers range from exhibitors and studios to distributors and advertisers.

As Showtime’s product offering has evolved over the last four years, so too has the data platform that services this product suite. Prospective new customers had large numbers of cinemas and would require a significant expansion of the data platform to store and process their data. The evolution of the products to date (while based on sound technical considerations at the time) resulted in a number of technical challenges that had to be overcome if the business was to grow, fulfil its potential globally, and run at a cost that did not erode the margin of providing the service.

The existing solution used Apache Spark running against a Cloudera Hadoop cluster to compute the insights at both daily and 15-minute intervals, but at a cost: the Hadoop platform required a number of large EC2 instances with large EBS volumes attached.

Running Hadoop as a permanent cluster was expensive and was eating into their margins. Taking on multiple large customers was going to result in significant additional costs and further increase the drain on their team’s time to manage and tune the underlying infrastructure.

To meet the growing business demands, Showtime needed a way to keep the costs of their solution to a viable competitive level and increase the reliability and efficiency of their services. They also could see the potential for delivering near real-time analytics and using artificial intelligence as part of their product offerings, but could not achieve these with the current solution.

Working with their AWS partner (Deloitte), Showtime defined a solution based on Amazon Simple Storage Service (S3) and Amazon Elastic MapReduce (EMR) that allowed Showtime to transition their existing Spark code to a new platform with minimal changes.

One of the key acceptance criteria was to maintain complete separation of customer data and to ensure that a process could only access data belonging to a single customer. This requirement had been supported by Kerberos on the Hadoop platform, and the new solution had to meet or exceed its capabilities.

To meet the solution requirements, S3 was used as the data store, with data encrypted at rest using AWS Key Management Service (KMS) and customer-specific encryption keys. All access to the data is governed by AWS Identity & Access Management (IAM) roles and policies scoped to provide per-customer access. This ensures that all processing of data is limited to a single customer and that this can be enforced by the AWS services rather than relying on the application team.
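The per-customer scoping described above can be sketched as an IAM policy document that confines a processing role to one customer's S3 prefix and that customer's KMS key. The bucket name, prefix layout, and key ARN below are hypothetical placeholders, not Showtime's actual configuration:

```python
import json

def customer_policy(bucket: str, customer_id: str, kms_key_arn: str) -> dict:
    """Build an IAM policy document restricting a role to a single
    customer's S3 prefix and KMS key. Names are illustrative only."""
    prefix = f"customers/{customer_id}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read/write only under this customer's prefix
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {   # Listing is limited to the same prefix
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}*"}},
            },
            {   # Encrypt/decrypt only with this customer's KMS key
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
                "Resource": kms_key_arn,
            },
        ],
    }

policy = customer_policy("showtime-data", "acme-cinemas",
                         "arn:aws:kms:eu-west-1:111122223333:key/example")
print(json.dumps(policy, indent=2))
```

Because the restriction lives in the policy rather than the application code, a misbehaving job simply receives an access-denied error if it reaches outside its customer's prefix.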

Because the new solution no longer provides a persistent compute layer (a per-customer cluster is provisioned on demand), a new way to schedule and orchestrate the processing was required. To address this, the AWS Data Pipeline service was used to schedule and manage processing on defined schedules.
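A Data Pipeline definition is a list of JSON objects; the sketch below shows one that provisions a transient EMR cluster and runs a Spark step every 15 minutes. The instance types, EMR release, and step command are assumptions for illustration, not Showtime's real settings; the definition would then be registered and started with the `datapipeline` API (`put_pipeline_definition`, `activate_pipeline`):

```python
def field(key, value, ref=False):
    """One Data Pipeline field: plain string value or a reference."""
    return {"key": key, ("refValue" if ref else "stringValue"): value}

def emr_pipeline_objects(step_cmd: str) -> list:
    """Pipeline objects for a transient EMR cluster on a 15-minute schedule."""
    return [
        {"id": "Every15Min", "name": "Every15Min", "fields": [
            field("type", "Schedule"),
            field("period", "15 minutes"),
            field("startAt", "FIRST_ACTIVATION_DATE_TIME"),
        ]},
        {"id": "TransientEmr", "name": "TransientEmr", "fields": [
            field("type", "EmrCluster"),  # created per run, terminated after
            field("releaseLabel", "emr-5.20.0"),
            field("masterInstanceType", "m5.xlarge"),
            field("coreInstanceType", "m5.xlarge"),
            field("coreInstanceCount", "2"),
            field("schedule", "Every15Min", ref=True),
        ]},
        {"id": "SparkStep", "name": "SparkStep", "fields": [
            field("type", "EmrActivity"),
            field("runsOn", "TransientEmr", ref=True),  # runs on the cluster above
            field("step", step_cmd),
            field("schedule", "Every15Min", ref=True),
        ]},
    ]
```

The key property is that the cluster object is owned by the pipeline: compute exists only while a scheduled run is in progress, so idle cost drops to the S3 storage alone.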

The new solution no longer requires a persistent set of EC2 instances to provide the storage layer. All data is persisted on S3 and protected by AWS KMS keys applied per customer. In addition, storage costs are now substantially lower using S3 instead of EBS volumes, and the data is more resilient and durable because the S3 service automatically maintains multiple copies spread across multiple AWS Availability Zones.
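Writing an object with a per-customer key is a matter of setting two parameters on the S3 upload. This sketch keeps the settings in a pure helper so they can be inspected without AWS credentials; the bucket, object key, and KMS key ARN are placeholders:

```python
def sse_kms_put_args(bucket: str, key: str, kms_key_id: str, body: bytes) -> dict:
    """Arguments for an S3 put_object call encrypting with a given KMS key."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "aws:kms",  # S3 encrypts server-side with KMS
        "SSEKMSKeyId": kms_key_id,          # this customer's key, not the default
    }

if __name__ == "__main__":
    import boto3  # actually uploading requires AWS credentials
    s3 = boto3.client("s3")
    s3.put_object(**sse_kms_put_args(
        "showtime-data",
        "customers/acme-cinemas/sales/2019-01-01.parquet",
        "arn:aws:kms:eu-west-1:111122223333:key/example",
        b"..."))
```

Reads then require `kms:Decrypt` on that specific key, which is exactly what the per-customer IAM policy grants, so encryption and access control reinforce each other.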

The Showtime Spark code was migrated to EMR jobs with relative ease, and data pipelines were configured to provision the EMR compute layer on demand using AWS Data Pipeline; using EMRFS, data can be processed directly from S3.
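With EMRFS, Spark addresses S3 objects through ordinary `s3://` URIs, so existing Spark code largely only needs its paths changed. The sketch below assumes a hypothetical prefix layout and column names (not Showtime's actual schema); the Spark session is only created under the main guard because it needs an EMR cluster to run on:

```python
def customer_input_path(bucket: str, customer_id: str, day: str) -> str:
    """S3 URI for one customer's daily input data (layout is assumed)."""
    return f"s3://{bucket}/customers/{customer_id}/sales/{day}/"

if __name__ == "__main__":
    from pyspark.sql import SparkSession  # available on the EMR cluster
    spark = SparkSession.builder.appName("daily-insights").getOrCreate()

    # Read this customer's sales directly from S3 via EMRFS
    sales = spark.read.parquet(customer_input_path(
        "showtime-data", "acme-cinemas", "2019-01-01"))

    # Illustrative aggregation: revenue per cinema for the day
    daily = sales.groupBy("cinema_id").sum("ticket_revenue")
    daily.write.mode("overwrite").parquet(
        "s3://showtime-data/customers/acme-cinemas/insights/2019-01-01/")
```

Because the job runs under a role scoped to one customer's prefix, the same code can be launched per customer without any application-level isolation logic.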

New IAM roles and KMS encryption keys are created for each new customer, and once in place, new customer processing can be scheduled with confidence that it cannot touch other customers' data.

This solution adopts a key AWS principle: separating compute from storage. The old Hadoop solution required a permanent cluster of EC2 instances and associated EBS volumes to store the data and process the queries. The new solution takes advantage of the characteristics of S3 to store the data and allows compute capacity to be provisioned on demand.

With the new solution, Showtime have the ability to take on new customers and increase the volume of data they process whilst still reducing costs by 35%. “Using AWS Managed Services has eliminated the responsibilities for managing the data and compute clusters.”

Now that the data is on S3, Showtime have the basis to start exploiting it to deliver new value using other AWS services such as Glue, Athena and SageMaker, allowing their data scientists to discover new insights and to train and deploy machine learning models.
