Choose Your Tools
As we mentioned previously, a modern data stack has three core technologies: data connectors, cloud data warehouse and business intelligence tool. You will need to compare and contrast solutions for each of these core technologies. Before considering data integration tools, let’s start a little downstream and think about what features you will need in a cloud data warehouse and business intelligence tool.
Choose your cloud data warehouse
Your data warehouse will be the repository of record for your organization’s structured data, and you should choose one carefully. Different data warehouses offer different features and tradeoffs. Here are the nine criteria you should focus on:
- Centralized vs. decentralized data storage. Does the data warehouse store all of its data on one machine, or is it distributed across multiple machines, trading redundancy for performance?
- Elasticity. Can the data warehouse scale compute and storage resources up and down quickly? Are compute and storage independent from each other or coupled together?
- Concurrency. How well does the data warehouse accommodate multiple simultaneous queries?
- Load and query performance. How quickly can you complete typical loads and queries?
- Data governance and metadata management. How does the data warehouse handle permissions and regulatory compliance?
- SQL dialect. Which dialect of SQL does the data warehouse use? Does it support the kinds of queries you want to make? Will your analysts have to adjust the syntax they currently use?
- Backup and recovery support. If your data warehouse somehow gets corrupted or breaks, can you easily revert to a previous state?
- Resilience and availability. What about preventing database failures in the first place?
- Security. Does the data warehouse follow current security best practices?
Also, be sure to take a look at a few cloud data warehouse benchmarks before making your choice. Here are some we’d recommend:
- The Fivetran cloud data warehouse benchmark
- Gigaom’s cloud data warehouse performance benchmark
- Amazon’s Redshift vs. BigQuery benchmark
- Periscope’s Redshift vs. Snowflake vs. BigQuery benchmark
- Mark Litwintschik’s 1.1 Billion Taxi Rides benchmarks
Choose your business intelligence tool
- Seamless integration with cloud data warehouses. Is it easy to connect this BI tool to your cloud data warehouse of choice?
- Ease of use and drag-and-drop interfaces. Ease of use is especially important to popularizing data-driven decisions across your organization
- Automated reporting and notifications. Writing reports by hand can get tedious. Does the BI tool allow you to schedule reports to publish automatically? What about alerting users when the data changes?
- Ability to conduct ad hoc calculations and reports by ingesting and exporting data files. Your analysts and data scientists might sometimes want to explore data without the overhead of having to go through a data warehouse first.
- Speed, performance and responsiveness. Basic quality-of-life considerations are important, like dashboards and visualizations loading in a timely manner.
- Modeling layer with version control and development mode. Does the BI tool allow your analysts to work collaboratively by sharing data models and code?
- Extensive library of visualizations. Pie charts, column charts, trendlines, and other basic visualizations can only take you so far. Does the BI tool feature more specialized visualizations like heat maps or radar charts? Does it allow you to build your own custom visualizations?
Choose your data integration tool
There are many data integration tools in the market, and their technical approaches and feature sets vary significantly. Here are the foremost factors to consider when you’re choosing a data integration tool:
- Data connector quality. Take these factors into account when evaluating connector quality:
- Open-source vs. proprietary. There are more open-source connectors for a wider range of data sources, but proprietary connectors tend to be of higher quality and integrate more seamlessly with other elements of a data stack.
- Standardized schemas and normalization. Data from API feeds is not usually provided in normalized form, but normalization fosters data integrity and lends itself to schema standardization.
- Incremental vs. full updates. Incremental updates using logs or other forms of change detection allow for more frequent updates that do not interfere with business operations.
- Support for sources and destinations. Does your prospective tool support your sources and destinations? If not, does the provider offer a way for customers to suggest new sources and destinations? Do they routinely add new ones?
- Configuration vs. zero-touch. Zero-touch, fully managed tools are extremely accessible, with connectors that are standardized, stress-tested and maintenance-free.
- Automation. Integration tools should remove as much manual intervention and effort as possible. Consider whether a tool offers features like automated schema migration, automatic adjustment to API changes, and continuous sync scheduling.
- Transforming within vs. before the data warehouse. Avoid tools that use a traditional ETL (extract-transform-load) approach. Under a more modern approach known as ELT (extract-load-transform), transformations are performed in an elastic, cloud-based warehouse, allowing limitless resource scaling.
- Recovery from failure. Bugs and errors will inevitably crop up over the course of data integration; you don’t want to permanently lose data as a result. Find out whether your prospective tools are idempotent and perform net-additive integration.
- Security and compliance. These are key areas, both in terms of data protection and public perception. Specifically, learn whether prospective tools offer:
- Regulatory compliance
- Limited data retention
- Role-based access
- Column blocking and hashing
Many publications offer aggregate reviews and ratings of data integration tools, as they do for data warehouses and business intelligence tools. Be sure to comparison-shop. Make sure the data integration tools you are considering are compatible with the data warehouses and BI tools you have or are considering.
Set up a proof of concept
Once you’ve narrowed your search to a few specific tools in each category, test the products in a low-stakes manner. Most will offer free trials for a few weeks at a time. Set up data connectors between your data sources and data warehouses, and measure how much time and effort it takes to sync your data. Perform some basic transformations. Add a business intelligence tool and begin creating dashboards and visualizations.
During this process, be sure to set aside dedicated trial time for your team, and encourage them to stress-test the new systems in every way imaginable. Compare the results of your trials against your technical criteria and standards for success. Don’t hesitate to contact support teams with questions, concerns or requests for validation. There’s no need to wonder whether a modern data stack will help you achieve your analytics goals — you should be able to experience success firsthand.
Boot Camp recap
We hope this Boot Camp has been helpful — and that you’re well on your way to improving your analytics practice with a modern data stack! Here’s a quick recap of the three steps we covered:
If you have more questions about automated data integration or the modern data stack in general, please join our weekly demo held every Tuesday at 10am PDT.