Cornerstone|Professional services
ETL Runtime Savings
Our Managed Service team helped mobile infrastructure leader Cornerstone significantly reduce their ETL runtime from 5.5 hours to 2.5 hours, achieve greater system stability, and cut annual Azure costs by over £20k.
Cornerstone, the leading UK mobile infrastructure services company, has been installing mobile infrastructure for telecom operators since 2012. They’ve grown to manage over 15,500 sites while maintaining their core values of delivering excellence, sharper solutions and stronger connections.
Coeo’s engagement with Cornerstone began in 2018, starting with a data platform project within Managed Services. This expanded to include an analytics project, which also transitioned to Managed Services in April 2024.
The Challenge
During the initial project phase, we identified several areas for improvement, with the duration of the ETL (Extract, Transform, Load) process being a primary concern. At go-live, the average ETL runtime across three environments was 5.5 hours, with two particularly large datasets accounting for the majority of this time. Additionally, during support we noticed frequent load failures, which were caused by the Photon acceleration feature that had been enabled on the Databricks clusters to reduce runtime.
The Solution
We proposed and implemented the following solutions to remediate the issues faced:
- Improved the ETL efficiency for the two largest loads by adjusting the code to merge entire datasets instead of processing file by file. This change enabled Databricks distributed compute to do the heavy lifting, rather than restricting the cluster to small subsets of the datasets.
- Disabled Photon acceleration to improve stability and assess its impact on overall runtimes.
- Enabled the VACUUM and OPTIMIZE commands for the two largest datasets to assess potential runtime improvements.
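The first change can be illustrated with a minimal plain-Python model. This is a sketch, not Cornerstone's actual code: it assumes rows are keyed by a hypothetical `id` field and that later files win on conflicts, and all function names and sample data are illustrative. On Databricks the equivalent would be a single distributed merge over the combined dataset rather than one merge per file.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_file_by_file(target: Dict[int, Row], files: List[List[Row]]) -> int:
    """Upsert each file into the target separately; return the number of merge passes."""
    passes = 0
    for rows in files:
        for row in rows:
            target[row["id"]] = row  # insert or overwrite by key
        passes += 1  # one pass against the target per file
    return passes


def merge_whole_dataset(target: Dict[int, Row], files: List[List[Row]]) -> int:
    """Combine all files first, keep the latest row per key, then upsert once."""
    latest: Dict[int, Row] = {}
    for rows in files:
        for row in rows:
            latest[row["id"]] = row  # later files overwrite earlier ones
    target.update(latest)  # a single pass against the target
    return 1


# Hypothetical sample data: two incoming files with an overlapping key.
files = [
    [{"id": 1, "status": "planned"}, {"id": 2, "status": "live"}],
    [{"id": 1, "status": "live"}, {"id": 3, "status": "planned"}],
]

per_file_target: Dict[int, Row] = {}
whole_target: Dict[int, Row] = {}
per_file_passes = merge_file_by_file(per_file_target, files)
whole_passes = merge_whole_dataset(whole_target, files)
```

Both strategies leave the target with the same contents, but the second touches the target only once. That is the property that lets a distributed engine spread the work across the whole dataset instead of being limited to one file at a time.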
The Outcome
Our solutions yielded significant improvements:
- Optimising the code reduced the ETL runtime from 5.5 hours to 2.5 hours on average. The most substantial reduction was seen in one process, decreasing from 3.5 hours to just 31 minutes.
- Disabling Photon acceleration improved stability without negatively affecting runtime performance.
- While the VACUUM and OPTIMIZE commands had no impact on runtime, they contributed to overall system health.
The combined effect of these improvements resulted in an annual Azure cost reduction of over £20,000.
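For readers who prefer relative figures, the runtimes quoted above can be converted into percentage reductions. The short calculation below uses only the numbers stated in this case study; the helper name `percent_reduction` is an illustrative assumption.

```python
def percent_reduction(before_minutes: float, after_minutes: float) -> float:
    """Return the reduction from before to after as a percentage of before."""
    return (before_minutes - after_minutes) / before_minutes * 100


# Average ETL runtime: 5.5 hours down to 2.5 hours.
overall = percent_reduction(5.5 * 60, 2.5 * 60)

# Largest single process: 3.5 hours down to 31 minutes.
largest = percent_reduction(3.5 * 60, 31)

# Average time saved per ETL run, in hours.
hours_saved_per_run = 5.5 - 2.5

print(f"Overall ETL runtime reduction: {overall:.1f}%")       # 54.5%
print(f"Largest single-process reduction: {largest:.1f}%")    # 85.2%
print(f"Hours saved per run: {hours_saved_per_run:.1f}")      # 3.0
```

In other words, the average run is roughly 55% shorter, and the most improved process is roughly 85% shorter.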
Future Plans
The Coeo Managed Service team is committed to continuous improvement and plans to implement further enhancements as they become available in Databricks. These include Liquid Clustering, Databricks runtime upgrades and other performance-boosting features.