This article takes you through the basics of hybrid tables, how they support different query workloads and how they reduce the cost of building real-time applications.
What Are Hybrid Tables?
The tabular format, which puts information in rows and columns, is often used for making sense of data because it's highly structured, easily readable and easy to process with SQL. Traditionally, there have been two types of data workloads, transactional and analytical, and each performs best when the table data is laid out on disk in a particular way. We'll dig into the reasons further in this article, but at a high level, row-based storage makes more sense for transactional workloads, while column-based storage is better suited for analytical workloads.
With several developments in database technology and an increase in compute power over recent years, a demand emerged for a storage structure that could meet the requirements of both analytical and transactional workloads. Enter hybrid tables. A hybrid table can support both types of workloads, sometimes called translytical or HTAP (hybrid transactional/analytical processing) workloads.
The sections below cover the basics of hybrid tables, how they support different query workloads, and how they reduce the cost of building real-time applications and of supporting various business functions with data warehouses and analytics.
What Is a Hybrid Table?
Traditionally, the split between transactional systems and analytical systems translated to ACID-based relational databases, such as MySQL and PostgreSQL, and denormalized data warehouses, such as Redshift and Vertica. Both these database technologies support diverse business requirements by handling transactions and allowing you to analyze, crunch and make sense of all the data the different applications capture.
Software applications that deal with transactions typically need to access one or more rows, in full, from a given table. To enable fast reads and writes of complete rows, you need whole rows to be stored contiguously.
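In contrast, analytical workloads are mostly aggregate based, meaning they need the value of a column from a massive number of rows. Storing each column's values contiguously lets such queries read only the columns they need instead of every full row.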
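To make the difference concrete, here is a minimal Python sketch of the same small table laid out row by row and column by column. The table, its column names and the helper functions `fetch_order` and `total_amount` are invented for illustration; real storage engines work with pages and files on disk, not Python lists.

```python
# Toy illustration: the same three-column table laid out two ways.
# The table, column names and values are invented for this example.
orders = [
    (1, "alice", 30.0),
    (2, "bob", 12.5),
    (3, "alice", 7.5),
]

# Row-oriented layout: the fields of each record sit next to each other.
row_store = list(orders)

# Column-oriented layout: the values of each column sit next to each other.
column_store = {
    "order_id": [r[0] for r in orders],
    "customer": [r[1] for r in orders],
    "amount": [r[2] for r in orders],
}


def fetch_order(order_id):
    """Transactional-style access: return one complete row."""
    for row in row_store:
        if row[0] == order_id:
            return row
    return None


def total_amount():
    """Analytical-style access: aggregate one column and ignore the rest."""
    return sum(column_store["amount"])


print(fetch_order(2))   # (2, 'bob', 12.5)
print(total_amount())   # 50.0
```

The point lookup touches a single contiguous record, while the aggregate touches a single contiguous column, which is the access pattern each layout is meant to make cheap.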
A hybrid table is a type of table that can serve both types of requests by deploying a flexible storage infrastructure under the hood.
You can think of a hybrid table as a rowstore and a columnstore merged into one. SingleStoreDB offers a solution by maintaining an in-memory rowstore alongside a columnstore that lives on flash, SSD, disk or object storage. As with any other database, frequently used files are kept warm in the in-memory cache. If a query doesn't find the file it needs there, SingleStoreDB pulls the data from the object store. This enables blazing-fast in-memory reads and immediate writes.
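The idea of a rowstore and a columnstore working together can be sketched in a few lines of Python. This is a toy model under stated assumptions, not SingleStoreDB's implementation: the class `ToyHybridTable`, its `flush_threshold` parameter and the rule that a full row buffer is converted into a column segment are all hypothetical choices made for this example.

```python
class ToyHybridTable:
    """Toy model: writes land in a row buffer, full buffers become column segments."""

    def __init__(self, columns, flush_threshold=3):
        self.columns = list(columns)
        self.flush_threshold = flush_threshold  # hypothetical tuning knob
        self.row_buffer = []   # stand-in for an in-memory rowstore
        self.segments = []     # stand-in for columnstore segments on disk

    def insert(self, row):
        """Accept a write immediately, then flush once the buffer is full."""
        self.row_buffer.append(tuple(row))
        if len(self.row_buffer) >= self.flush_threshold:
            self._flush()

    def _flush(self):
        """Pivot the buffered rows into one column-oriented segment."""
        segment = {
            name: [row[i] for row in self.row_buffer]
            for i, name in enumerate(self.columns)
        }
        self.segments.append(segment)
        self.row_buffer = []

    def get(self, key):
        """Point lookup on the first column: recent rows first, then segments."""
        for row in self.row_buffer:
            if row[0] == key:
                return row
        key_column = self.columns[0]
        for segment in self.segments:
            if key in segment[key_column]:
                i = segment[key_column].index(key)
                return tuple(segment[name][i] for name in self.columns)
        return None

    def column_sum(self, name):
        """Aggregate over both flushed segments and rows still in the buffer."""
        idx = self.columns.index(name)
        flushed = sum(sum(segment[name]) for segment in self.segments)
        buffered = sum(row[idx] for row in self.row_buffer)
        return flushed + buffered


table = ToyHybridTable(["id", "amount"])
for row in [(1, 10), (2, 20), (3, 30), (4, 40)]:
    table.insert(row)

print(table.get(4))                # (4, 40), still in the row buffer
print(table.get(2))                # (2, 20), read back from a column segment
print(table.column_sum("amount"))  # 100
```

Both kinds of query see all the data regardless of which structure currently holds it, which is the property that lets one table serve transactional and analytical requests.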
Now, let’s take a look at why hybrid tables might prove useful for your organization — especially if you're working with modern data engineering architectural patterns like data lakes, lakehouses, data mesh, streaming, etc., in addition to more traditional relational database setups.
Why You Should Consider Using Hybrid Tables
About a decade ago, it was common for applications to be either transactional or analytical in nature. With the tremendous increase in data volume and the ease of scaling in the cloud, applications are now moving to a design that supports many different functionalities.
This means moving from many separate databases and data warehouses to a multi-model database with a flexible storage engine to support various read and write patterns.
Hybrid tables represent a move towards such a design, where businesses no longer have to wait for slow, once-a-day batch-based ETL to get a unified view of data. This is especially helpful in industries where customers expect fast turnaround times on their queries, including retail and eCommerce, fintech, marketing and advertising.
To understand more, let's look at some benefits of using a database that supports hybrid tables.
Better Performance at Lower Costs
Hybrid workloads demand more from a system than a database serving only one type of workload. This is why the database needs to sit atop a very robust, flexible and universal storage infrastructure — in other words, a storage engine that enables you to consolidate multiple databases and reduce data migration efforts.
Even though the job of a database administrator has been extensively automated using DevOps principles, many areas are still left to data engineers to configure, tune and optimize, including database reliability, security and infrastructure management. With hybrid tables, you can reduce the cost of running, maintaining and developing against multiple databases, and of moving data from one database to another when required.
Although there's merit to specialized and purpose-built databases for many use cases, such as high-frequency trading, a multi-model database that supports a wide range of workloads is very often of great value for developers.
With a multi-model database, developers don't need to worry about dealing with different database connections, query languages and APIs. This holds true not only for application developers but also for any business team that needs to work with data, such as marketing, finance and product.
In addition to giving you the power to consolidate multiple databases into one, hybrid tables also help you get the most out of high-speed storage and memory by moving data between disk and memory, without you having to worry about network costs. Thanks to the in-memory capabilities of a hybrid table, you can apply it to a range of use cases, including fast real-time analytics, time-series data processing and more.
Reduction in ETL Workloads
Even with the advent of fantastic tools like dbt, Airbyte and Airflow, data integration and ETL remain one of the great pain points for businesses. Processing frameworks like Spark need a lot of compute power to process and move data around. You need ETL to get data from both internal and external systems.
Although you cannot control which external systems you have to deal with, you can certainly optimize your internal systems by using hybrid tables backed by a universal storage engine that saves you the hassle of moving vast amounts of data around. The main reason data gets moved between systems is that each system is built to support a different workload, so the data has to be copied into whichever system serves the workload at hand.
With a scalable hybrid table offering, you don't need to do that; instead, you have a single solution to serve all your data workloads. This is highly beneficial when building a centralized data store (like a data warehouse).
A reduction in ETL workloads not only saves compute costs but also time, which is ideal for businesses with time-sensitive operations like trading platforms, logistics services, targeted marketing and newsgathering.
These businesses don't have to wait for the hourly, half-day or end-of-the-day aggregations of data from transactional and analytical systems. Hybrid tables enable real-time data analytics without requiring a separate analytics data store, saving you a lot of time and money.
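As a rough illustration of what serving both workloads from one store looks like to an application, the sketch below runs a write, a point lookup and an aggregate against the same table with no copy step in between. It uses Python's built-in `sqlite3` module purely as a stand-in: SQLite is an embedded row-oriented database, not a hybrid table engine, and the `orders` table and the `revenue_by_customer` helper are invented for this example.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a single data store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 30.0), (2, "bob", 12.5)],
)


def revenue_by_customer():
    """Analytical-style query: aggregate across all rows."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    return dict(rows)


before = revenue_by_customer()

# Transactional-style write: a new order arrives.
conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (3, "alice", 7.5))

# The aggregate reflects the new row immediately; no batch job copies data
# from a transactional system into a separate analytics store.
after = revenue_by_customer()

# Transactional-style read: fetch one complete row by key.
order = conn.execute(
    "SELECT order_id, customer, amount FROM orders WHERE order_id = ?", (3,)
).fetchone()

print(before)  # {'alice': 30.0, 'bob': 12.5}
print(after)   # {'alice': 37.5, 'bob': 12.5}
print(order)   # (3, 'alice', 7.5)
```

With separate transactional and analytical systems, the second aggregate would only change after the next ETL run.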
Getting Started with Hybrid Tables
If you spotted your business use case in our discussion in the previous section, you might be considering how you can improve efficiency by implementing hybrid tables into your own data workflow.
To get started, identify the different types of data workloads and access patterns your business needs to support. With SingleStoreDB's Universal Storage built upon the principle of separating storage and compute, you can capitalize on various table storage types, such as in-memory, flash, SSD and disk.
For instance, if you're looking to process time-series data really fast, you can use the in-memory rowstore and archive that data on disk. For transactional loads, you can again use the in-memory rowstore as it provides ACID support and random IO, two of the main features of transactional relational databases.
Hybrid tables, like those offered in SingleStoreDB, are tremendously effective in carrying out your analytical workloads as the columnstore is supported by a fast storage backend. With this columnar storage model, you can perform highly complex aggregates, table scans, decision support queries and more.
SingleStoreDB's columnstore comes with all the benefits of a columnar storage engine, such as advanced compression, which saves you money while also improving database performance. All in all, that makes it an excellent choice for teams who want to reduce their stress levels when it comes to scalability, flexibility, cost and performance.
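One reason columnar storage compresses well is that values within a column tend to repeat. The Python sketch below shows run-length encoding, a common columnar compression technique, on an invented `country_column`. It is offered only as an example of the general idea; the article does not say which compression schemes SingleStoreDB uses, and `rle_encode` and `rle_decode` are hypothetical helpers.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original list."""
    decoded = []
    for value, count in pairs:
        decoded.extend([value] * count)
    return decoded


# Invented column with long runs of repeated values, as is typical when
# a column has few distinct values and the data is sorted or clustered.
country_column = ["DE", "DE", "DE", "US", "US", "FR", "FR", "FR", "FR"]

encoded = rle_encode(country_column)
print(encoded)  # [('DE', 3), ('US', 2), ('FR', 4)]
print(rle_decode(encoded) == country_column)  # True
```

Nine stored values shrink to three pairs here. In a row-oriented layout the same values would be interleaved with other fields, so the runs would not be adjacent and this kind of encoding would gain much less.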
This article explored the basics of hybrid tables and how they can help different business teams solve a wide range of problems by supporting hybrid workloads.
SingleStoreDB is a real-time, distributed SQL database that supports hybrid workloads and high scalability for real-time applications and analytics. Its multi-model nature combined with Universal Storage makes for an outstanding database for building modern applications.
Hybrid workloads are relatively new, and only a few databases, such as SingleStoreDB, have managed to support them at scale. SingleStoreDB has you covered whether you want to deploy it on-premises or in the cloud. If you're interested in building applications on top of a database that can serve many data workloads, try SingleStoreDB today.