Guest post by Ankit Maheshwari, VP of Engineering and founding team member at Innovaccer
Despite the unprecedented growth in the global healthcare market, the healthcare sector is still largely dependent on outdated and inefficient methods of data aggregation and analysis. To tackle this inefficiency, Abhinav Shashank, Kanav Hasija, and Sandeep Gupta founded the SaaS startup Innovaccer Inc. in 2014. Working with top researchers at MIT, the University of Cambridge, and other leading international universities, Innovaccer aims to solve data problems at a global enterprise level. It leverages a proprietary Big Data platform, which we’ve dubbed the Data Activation Platform, to enable healthcare organizations to make powerful, data-driven decisions.
Since its launch over four years ago, Innovaccer has grown to serve 10,000 providers across more than 500 locations, including Mercy ACO, StratiFi Health, and UniNet Healthcare Network, and has streamlined over 300 million data points. After just one year, we were onboarding two to three new customers every quarter, and every customer had a single-tenant architecture: a dedicated VPC with 25 to 30 servers. We were handling terabytes of data every day. To handle this enormous growth, Innovaccer switched cloud providers in 2015, moving its development load from Google Cloud to AWS.
Innovaccer used Amazon Elastic Compute Cloud (Amazon EC2) servers to host its web applications and databases, given that our primary operations are Big Data-based. The move to AWS, our first deployment on it as an organization, was seamless and gave our developers greater control, more reliability, better availability, and enhanced performance.
In 2016, Innovaccer decided to fine-tune its platform specifically for the healthcare sector to help the sector become data-driven. As we built our Data Activation Platform (DAP) for healthcare, we had to re-evaluate our architecture. We needed a data warehouse that met the following criteria:
· High performance on petabyte-scale data
· Low maintenance overhead
· Easily scalable
After evaluating a few data warehouses, we decided to go with Amazon Redshift. With Amazon Redshift, Innovaccer was able to achieve the best-rated time-to-value in transforming data to analytics and delivering ROI. By letting users build and run ETL pipelines with drag-and-drop modules, Innovaccer’s platform could integrate data across multiple sources in about half the time compared to industry standards and at 70% less cost.
Healthcare is a dynamic field, and a lot of data needs to be accommodated and processed in real time. To enable this, Innovaccer shifted to a NoSQL data warehouse, HBase, to accommodate live time-series data from the Hadoop Distributed File System (HDFS), which was used to store huge files assembled from different sources. Additionally, the platform began to use Spark on YARN to process hundreds of gigabytes of incoming data from multiple sources in batch jobs that finish in less than an hour.
We have different impact numbers for different customers, the foremost being a savings of $412 million for U.S. healthcare. Using our tools, customers have reported decreased emergency department (ED) utilization and readmission rates. Significantly, annual wellness rates and primary care visits have increased. These results not only encourage us toward our goal but also mark our presence in the healthcare space.
Initially, with fewer clients, Innovaccer ran its own Hadoop cluster on Amazon EC2, managed in-house by our infrastructure engineering team. However, as the company's operations scaled and the size of our infrastructure team expanded linearly with the number of customers, a significant amount of time went into troubleshooting and managing the cluster. To ensure seamless operations, we turned to AWS-managed services to manage the cluster and help us scale our operations up and down dynamically (and with ease).
Another challenge that the infrastructure team faced with hosting and managing an in-house Hadoop cluster was the high infrastructure cost. We were using EC2 servers for the entire Hadoop cluster, which usually ran 15-20 r4.4xlarge instances for 15-20 hours daily for each customer, so the cost began to rise significantly.
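To give a rough sense of that footprint, the short Python sketch below turns the quoted ranges into instance-hours per customer per day. It is only back-of-the-envelope arithmetic: the function names are made up for illustration, and the hourly rate is left as a parameter you would supply yourself, since no pricing is given here.

```python
def daily_instance_hours(instances: int, hours_per_day: int) -> int:
    """Instance-hours one customer's cluster consumes in a day."""
    return instances * hours_per_day


def daily_cost(instance_hours: int, hourly_rate: float) -> float:
    """Daily cost for a given hourly rate (hypothetical input, not a quoted price)."""
    return instance_hours * hourly_rate


# Lower and upper bounds taken from the ranges quoted above:
# 15-20 instances running 15-20 hours a day, per customer.
low = daily_instance_hours(15, 15)
high = daily_instance_hours(20, 20)

print(f"{low} to {high} instance-hours per customer per day")
```

Because every customer had a dedicated cluster, this figure is multiplied by the number of customers, which is why the cost grew along with the customer base.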
With its platform deployed on AWS, Innovaccer could focus its attention on improving the customer experience instead of worrying about infrastructure overhead and data security concerns. With the Data Activation Platform, our customers can perform data analytics on huge data sets in less than a couple of hours. Most of our analytical needs are easily fulfilled by Redshift, and we use Spark jobs to power complex analytical models and predictive analytics, which run on Spot Instances in Amazon EMR (Elastic MapReduce). Innovaccer also offers easy scaling of EMR for on-demand analytics, helping our customers get results within a few hours rather than a few days, at half the previous infrastructure cost.
Today, Innovaccer is helping healthcare organizations activate their healthcare data and develop tools to make a powerful difference in the way care is delivered. Innovaccer is continuously innovating with new technologies and keeps its technology stack up to date with the latest trends. We are currently assessing serverless architectures and have moved some of our non-production services to AWS Lambda.
AWS Solutions Architects have been of great help with their detailed Well-Architected reviews and insightful recommendations, which helped us improve our product performance and efficiency.
We are addressing the long-standing problems around data feeding, visualization, merging, and engagement, and our partnership with AWS is significant as we take a different approach to data and activate it to drive insightful and informed decisions.