Modern Data Stack Boot Camp Lesson 2: Estimate TCO

As noted in the previous lesson, implementing a modern data stack (MDS) promises substantial savings of time, money and labor, along with key competitive advantages, but it does require investment up front. To make the business case for an MDS, you’ll need to estimate total cost of ownership (TCO) for each of the core technologies: data connectors, data warehouse and business intelligence tool. That means comparing your current workflow with available MDS technologies and considering a range of factors, both quantitative and qualitative. Here’s how to estimate TCO for each technology.

Estimating data integration TCO

Calculating the cost of your current data pipeline might require a careful audit of prior spending on data integration activities. You’ll need to consider the sticker price, the costs of configuration and maintenance, and any opportunity costs incurred by failures, stoppages and downtime. On the other side of the ledger, you will want to evaluate the benefits of the potential replacement. Some are hard to measure (e.g., improvements in the morale of analysts), but others, such as time and money gains, can be readily quantified. Below, we compare the TCO considerations for DIY data integration and automated data integration.

Estimating TCO for DIY

To estimate the cost of a DIY data pipeline, start with two figures:

  1. Average yearly salary of your data engineers, or of whoever performs data integration (most likely analysts or data scientists)
  2. Number of data sources of various types, such as:
    • Applications - Especially SaaS products such as Salesforce, Marketo, Zendesk, NetSuite, etc.
    • Databases - Operational and transactional systems such as MySQL, PostgreSQL, SQL Server, etc.
    • Event tracking - Software that monitors behavior on apps and websites, such as Segment, Snowplow, and webhooks

With these figures, you can estimate the time and money spent on engineering. Keep in mind that the following calculations are on the optimistic end.

  1. First, apply a multiplier of 1.4 to the salary to account for benefits and other overhead costs. The result is the cost of labor.

    If you lowball the salary of a data engineer at $120,000, then the total cost of labor is $120,000 * 1.4 = $168,000.

  2. Assume, optimistically, that it takes about 7 weeks to build a connector and about 2 weeks per year to update and maintain it. So, in the year you build it, each connector takes about 9 weeks of work.

    Let’s say you have five connectors. 5 * (7 + 2) = 45 weeks of work in that year.

  3. Convert the weeks of work into a fraction of a working year, then multiply that fraction by the cost of labor to arrive at the total monetary cost. Assume that the work year lasts 48 weeks once you account for vacations, paid leave, and other downtime.

    If the cost of labor is $168,000, five connectors take 45 weeks of work, and there are 48 working weeks in a year, then $168,000 * (45 / 48) = $157,500.

Based on our experiences at Fivetran, these figures should give you a realistic starting point for understanding just how costly a DIY data integration solution can be.
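The three steps above can be sketched as a small Python function. This is only an illustration of the arithmetic in this lesson: the function and parameter names are made up for the example, and the defaults are the optimistic assumptions stated above (a 1.4 overhead multiplier, 7 weeks to build, 2 weeks to maintain, 48 working weeks).

```python
def diy_integration_tco(
    salary: float,
    connectors: int,
    overhead_multiplier: float = 1.4,  # benefits and other overhead (step 1)
    build_weeks: float = 7,            # weeks to build one connector (step 2)
    maintain_weeks: float = 2,         # weeks per year to maintain one connector
    working_weeks: float = 48,         # working weeks in a year (step 3)
) -> float:
    """Estimate the yearly labor cost of building and maintaining connectors in-house."""
    labor_cost = salary * overhead_multiplier
    weeks_of_work = connectors * (build_weeks + maintain_weeks)
    return labor_cost * weeks_of_work / working_weeks


# The worked example from the text: a $120,000 salary and five connectors.
estimate = diy_integration_tco(salary=120_000, connectors=5)
print(f"${estimate:,.0f}")  # prints $157,500
```

Swapping in your own salary figures and connector count gives a first-pass estimate. Because the defaults are optimistic, treat the result as a lower bound.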

Probable costs of an automated solution

With an automated solution, the following costs apply:

  1. The cost of subscription for one year, which may be flat or (more likely) based on consumption. There are many different kinds of pricing available, but monthly active rows (MAR) is one standard.
  2. The labor cost of writing transformations. Although some tools offer transformations out-of-the-box, you may still have to write your own, and the effort depends on how complex your reports and dashboards need to be.

Labor costs for an automated solution should be very low, and measured in minutes or hours per year rather than weeks. Your analysts, engineers, and data scientists should be free to spend more time analyzing data or productionizing predictive models.
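For a like-for-like comparison with the DIY estimate, the two costs above can be combined in the same way. The sketch below is an assumption-laden illustration, not a pricing formula from any vendor: the function name, the 40-hour work week, and the example subscription price and hours are all hypothetical placeholders you would replace with your own quote and estimates.

```python
def automated_integration_tco(
    annual_subscription: float,        # taken from your vendor quote (hypothetical input)
    transformation_hours: float,       # hours per year spent writing transformations
    salary: float,
    overhead_multiplier: float = 1.4,  # same labor multiplier as the DIY estimate
    working_weeks: float = 48,         # same working year as the DIY estimate
    hours_per_week: float = 40,        # assumption: not stated in this lesson
) -> float:
    """Estimate yearly cost: subscription plus the labor spent on transformations."""
    hourly_labor_cost = salary * overhead_multiplier / (working_weeks * hours_per_week)
    return annual_subscription + transformation_hours * hourly_labor_cost


# Hypothetical inputs: a $30,000 subscription and 40 hours of transformation work.
estimate = automated_integration_tco(
    annual_subscription=30_000, transformation_hours=40, salary=120_000
)
print(f"${estimate:,.0f}")  # prints $33,500
```

For consumption-based plans such as MAR pricing, the subscription figure itself is a forecast, so it is worth running the estimate with low and high values.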

Estimating data warehouse TCO

Traditionally, data warehouses were built from scratch and installed on premises in data centers, incurring substantial hardware, software, labor, and expertise costs in the process. Modern cloud data warehouses range from architectures that resemble on-premises data centers, but online, to purely “serverless” architectures that can instantly scale compute and storage resources as needed.

Pricing can be highly variable. For more traditional cloud data warehouses, you may have to forecast your computation and storage needs and carefully design your architecture, much as you would for an on-premise setup. In general, you will most likely need to consult the pricing schedules of individual vendors, and may have to run tests to determine exactly how they calculate compute costs. For some full-service data warehouses, you might be able to find a flat, monthly fee.

Estimating BI tool TCO

Business intelligence tools typically bill on a monthly basis. Pricing is often adjusted by the number of seats/users your organization reserves, typically with smaller per-person costs as your subscription grows.
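Seat-based pricing with volume discounts is easy to model once you have a vendor’s price schedule. The sketch below assumes a simple scheme in which every seat is billed at the rate of the highest tier your seat count reaches. The tier thresholds and prices are invented for illustration and do not reflect any real vendor, and actual pricing schemes may work differently.

```python
def bi_annual_cost(seats: int, tiers: list[tuple[int, float]]) -> float:
    """Estimate yearly BI cost from (minimum seats, monthly price per seat) tiers."""
    price = None
    for min_seats, monthly_price in sorted(tiers):
        if seats >= min_seats:
            price = monthly_price  # keep the rate of the highest tier reached
    if price is None:
        raise ValueError("seat count is below the smallest tier")
    return seats * price * 12  # billed monthly, so multiply by 12 months


# Hypothetical price schedule: cheaper per seat as the subscription grows.
example_tiers = [(1, 50.0), (25, 40.0), (100, 30.0)]
print(f"${bi_annual_cost(30, example_tiers):,.0f}")  # prints $14,400
```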

Outside of explicit monetary costs, however, the most important considerations for a business intelligence tool are its features, performance, and whether your team has the manpower and expertise to fully leverage it. A poor choice of BI tool can easily produce delays or additional work for your team if you are not careful.

If possible, try out a number of tools and see how quickly and comprehensively your team can produce needed reports and dashboards.

Next up: Choosing your MDS tools

Now that you understand what an MDS can offer, and how to estimate TCO for the core MDS technologies, you’re ready to choose a specific data integration solution, cloud data warehouse and business intelligence tool. In the final lesson, which we’ll email you a link to in 24 hours, we’ll take a look at the key technical criteria you should apply when selecting them.