Time series databases are specifically designed to store data as time-value pairs over a given time interval. They are widely used for generating trends, curves and time-based reports. Time series databases were initially built to store clickstreams or financial data such as stock prices over a period of time, but more recently they have become prevalent in newer applications such as the Internet of Things (IoT).
Telemetry data received from IoT devices is time dependent, and it is usually ingested at a very high frequency. Connected cars, for instance, send data to the IoT platform every few seconds, sometimes at sub-second intervals. Persisting, analyzing and visualizing such high-volume, high-velocity data requires an efficient storage mechanism, and a time series database is an apt choice in such situations.
Time series databases differ from other data stores in significant ways. Some of their defining characteristics are:
Continuous streams of data are handled very efficiently
Every data point that is ingested and stored is time-stamped
The insertion order of the data is maintained
Data is stored and retrieved quickly
Many platforms integrate them with stream analytics
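To make the time-stamping, ordering and fast-retrieval properties concrete, here is a minimal Python sketch of an in-memory series, assuming points are (timestamp, value) pairs. The TimeSeries class and its methods are hypothetical and not the API of any product. It keeps points sorted by timestamp, which matches insertion order when data arrives in time order, and uses binary search so that a time-range read does not scan the whole series.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class TimeSeries:
    """Hypothetical in-memory series of (timestamp, value) pairs kept in time order."""
    timestamps: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def append(self, ts: float, value: float) -> None:
        # Every point carries a timestamp; insert it at its sorted position so
        # time order is preserved even if a point arrives slightly late.
        i = bisect.bisect_right(self.timestamps, ts)
        self.timestamps.insert(i, ts)
        self.values.insert(i, value)

    def range(self, start: float, end: float) -> list:
        # Binary search finds the half-open window [start, end) without a full scan.
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_left(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))
```

Appending points with timestamps 1.0, 3.0 and 2.0 and then calling range(1.0, 3.0) returns the points at 1.0 and 2.0, in time order.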
Here are some of the advantages of time series databases over traditional databases:
Purpose Built: Time series databases are designed to store high-volume time-value data under high concurrency and high throughput. They therefore come equipped with built-in functions for time-range queries, aggregation, downsampling and analysis.
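Downsampling, one of the built-in functions mentioned above, reduces raw points to one summary value per fixed time bucket. The sketch below uses a hypothetical downsample helper, assuming integer epoch-second timestamps and the average as the summary statistic; real databases expose this through their own query languages.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds):
    """Average raw (epoch_seconds, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of the bucket it falls into.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    # Emit one (bucket_start, average) pair per bucket, in time order.
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]
```

For example, downsampling readings at 0, 20, 60 and 90 seconds into 60-second buckets yields two points, one for each minute, each holding the average of that minute's readings.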
Scalability: Time series databases scale better than relational databases for large data sets. Time series data is more useful as a large collection of data points than as a single reading of a measured parameter. To derive accurate trends or curves, an application may need to analyze millions of data points, which a relational database may not be able to store and read quickly enough, because its performance suffers when managing such large data sets.
Cost: At high data volumes, time series databases can cost significantly less than relational databases, because they optimize data storage to reduce storage costs. For instance, AWS has positioned Amazon Timestream at about 1/10th the cost of relational databases for storing the same amount of data.
Analytical Capability: Query support for time-based aggregation is the biggest benefit of a time series database. These data stores are well suited to forecasting, since forecasts draw on a large number of data points collected over a period of time. Data can be analyzed over an hour, month, quarter, year or any other period. Aggregating time series data is much simpler with these databases because every record is already time-stamped.
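Calendar-based aggregation differs from fixed-width buckets because months and quarters vary in length. The following sketch, assuming timezone-aware datetime values and a hypothetical aggregate_by_period helper, groups readings by hour, month, quarter or year and reports the minimum, maximum, average and count for each period.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_by_period(points, period):
    """Group (datetime, value) points by calendar period and compute summary stats.

    period is one of "hour", "month", "quarter" or "year" (names chosen for this sketch).
    """
    def key(dt):
        if period == "hour":
            return (dt.year, dt.month, dt.day, dt.hour)
        if period == "month":
            return (dt.year, dt.month)
        if period == "quarter":
            return (dt.year, (dt.month - 1) // 3 + 1)
        if period == "year":
            return (dt.year,)
        raise ValueError(f"unsupported period: {period}")

    groups = defaultdict(list)
    for dt, value in points:
        groups[key(dt)].append(value)
    # One summary per period, ordered chronologically by its key.
    return {
        k: {"min": min(v), "max": max(v), "avg": sum(v) / len(v), "count": len(v)}
        for k, v in sorted(groups.items())
    }


# Example usage with three hypothetical readings spanning two quarters.
readings = [
    (datetime(2024, 1, 15, tzinfo=timezone.utc), 10.0),
    (datetime(2024, 2, 20, tzinfo=timezone.utc), 20.0),
    (datetime(2024, 4, 1, tzinfo=timezone.utc), 30.0),
]
quarterly = aggregate_by_period(readings, "quarter")
```

Because every reading is already time-stamped, switching from quarterly to monthly or yearly summaries only changes the grouping key.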
Many time series databases are available today, ranging from open-source to proprietary products and from SaaS to PaaS services on the cloud. Some of them are introduced below:
Amazon Timestream was recently added to the AWS portfolio of data stores. Timestream is a fully managed time series database for use cases such as IoT and operational applications. It is serverless and scales dynamically based on the required performance and capacity. Its adaptive query processing engine helps in querying across different time intervals, and it comes with built-in analytic functions such as smoothing, approximation and interpolation.
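To show what an interpolation function computes, here is a sketch of linear interpolation over sparse samples. It is not Timestream's query syntax; the fill_gaps_linear helper is hypothetical, assumes samples sorted by timestamp, and returns None for targets outside the sampled range instead of extrapolating.

```python
import bisect


def fill_gaps_linear(samples, at):
    """Estimate values at the requested timestamps from sorted (ts, value) samples."""
    times = [ts for ts, _ in samples]
    result = []
    for t in at:
        i = bisect.bisect_left(times, t)
        if i < len(times) and times[i] == t:
            result.append((t, samples[i][1]))  # an exact sample exists
        elif i == 0 or i == len(times):
            result.append((t, None))  # outside the sampled range: no extrapolation
        else:
            (t0, v0), (t1, v1) = samples[i - 1], samples[i]
            # Point on the straight line between the two neighboring samples.
            result.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
    return result
```

With samples at 0, 10 and 20 seconds, a request for 15 seconds returns the midpoint between the readings at 10 and 20 seconds, while a request for 25 seconds returns None.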
Azure Time Series Insights is an open and scalable end-to-end IoT analytics platform used to collect, process, store, query, monitor, analyze and visualize IoT data. It covers the full range from data ingestion to analysis. Native integrations with Azure IoT Hub and Azure Event Hubs ingest IoT data into Time Series Insights, which makes building IoT solutions easier.
InfluxDB is an open-source time series database developed by InfluxData. It provides high-speed reads and writes, so data can be written, read and operated on in real time. InfluxDB processes, analyzes and acts on time series data in real time, which helps in implementing use cases such as forecasting and anomaly detection.
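Anomaly detection can start from something as simple as a z-score rule. The sketch below is a generic technique, not InfluxDB's own implementation: the hypothetical flag_anomalies function compares each value with the mean and standard deviation of the preceding window and flags it when it lies more than threshold standard deviations away. The default window size and threshold are illustrative assumptions.

```python
from statistics import mean, pstdev


def flag_anomalies(values, window=5, threshold=3.0):
    """Return the indexes of values that deviate from the trailing-window mean
    by more than `threshold` population standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        recent = values[i - window:i]
        mu, sigma = mean(recent), pstdev(recent)
        if sigma == 0:
            # Perfectly flat history: any change at all counts as a deviation.
            if values[i] != mu:
                anomalies.append(i)
        elif abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

In a series that hovers around 10 to 12 and then jumps to 50, only the jump is flagged; the spike then widens the next window's standard deviation, so the following normal reading is not flagged.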
TimescaleDB is another open-source time series database, and it supports SQL natively. It achieves high ingestion rates through automated partitioning across time and space, and users can write millions of data points per second. Its advantages are that it is open source and that it is also offered as a fully managed analytics stack. It is available on leading cloud platforms such as AWS, Microsoft Azure and Google Cloud Platform.
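Time-and-space partitioning can be pictured as routing each point to a chunk keyed by a time range and a hash of its series identifier. The sketch below only illustrates that idea; TimescaleDB sizes chunks and hashes partitions internally in ways not reproduced here, and CHUNK_INTERVAL, SPACE_PARTITIONS, chunk_for and partition are hypothetical names.

```python
import zlib
from collections import defaultdict

CHUNK_INTERVAL = 24 * 3600  # hypothetical: each chunk covers one day of data
SPACE_PARTITIONS = 4        # hypothetical: number of hash partitions by device


def chunk_for(ts: int, device_id: str) -> tuple:
    """Map a point to its (time range start, space partition) chunk key."""
    time_slot = ts - ts % CHUNK_INTERVAL
    # crc32 is stable across runs, unlike Python's salted built-in hash() for strings.
    space_slot = zlib.crc32(device_id.encode("utf-8")) % SPACE_PARTITIONS
    return (time_slot, space_slot)


def partition(points):
    """Route (ts, device_id, value) points into per-chunk lists."""
    chunks = defaultdict(list)
    for ts, device_id, value in points:
        chunks[chunk_for(ts, device_id)].append((ts, device_id, value))
    return chunks
```

Keeping each chunk small and bounded in time means recent writes touch only a few chunks, and a query over a narrow time range only needs to read the chunks that overlap it.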
OpenTSDB is a scalable time series database that stores and serves massive amounts of time series data without losing granularity. It runs on Hadoop and HBase and scales to millions of writes per second. It is not serverless, so nodes must be added as the required capacity grows.
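Part of how OpenTSDB keeps full granularity on HBase is its row-key design: as its documentation describes, a row holds one series for one hour, keyed by the metric, an hour-aligned base timestamp and the tags, and each point goes into a column identified by its offset within that hour. The sketch below loosely mimics that layout with Python tuples and dicts; real OpenTSDB keys are compact byte-encoded identifiers, and row_key, write and new_table are hypothetical names.

```python
from collections import defaultdict

ROW_SPAN = 3600  # one row per series per hour


def new_table():
    """An in-memory stand-in for an HBase table: row key -> {column: value}."""
    return defaultdict(dict)


def row_key(metric: str, tags: dict, ts: int) -> tuple:
    base = ts - ts % ROW_SPAN  # hour-aligned base timestamp
    # Sorting the tags makes the key independent of the order they were supplied in.
    return (metric, base, tuple(sorted(tags.items())))


def write(table, metric: str, tags: dict, ts: int, value: float) -> None:
    key = row_key(metric, tags, ts)
    # The column is the offset in seconds from the row's base hour, so every
    # raw point is kept at full resolution instead of being rolled up.
    table[key][ts - key[1]] = value
```

Two readings of the same series taken 55 seconds apart within one hour land in the same row, as two columns.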
Graphite is another open-source time series database that stores numeric time series data and renders graphs of that data on demand. It runs equally well on inexpensive hardware or cloud infrastructure, and it is used to track the performance of websites, applications, network services and business services.
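Graphite's Carbon daemon accepts data over a simple plaintext protocol in which each line holds a dotted metric path, a value and a Unix timestamp, separated by spaces and terminated by a newline. The sketch below formats and sends such lines. The metric name, host and helper names are assumptions, and port 2003 is only the customary default for the plaintext listener, which a given deployment may change.

```python
import socket
import time
from typing import Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one metric as a newline-terminated plaintext-protocol line."""
    if timestamp is None:
        timestamp = int(time.time())  # default to the current Unix time
    return f"{path} {value} {timestamp}\n"


def send_to_graphite(lines, host: str = "localhost", port: int = 2003) -> None:
    """Send formatted lines to a Carbon plaintext listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("utf-8"))
```

A call such as send_to_graphite([graphite_line("web.checkout.latency_ms", 123.4)]) would record one latency sample, which Graphite can then graph on demand.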
Today, Amazon Timestream and Azure Time Series Insights are leading choices for building cloud-based solutions. They offer fully managed services at lower cost and with high scalability, along with all the advantages of a cloud platform, including smooth integration with other managed services. Among the open-source options, InfluxDB handles real-time reads and writes very efficiently. Every database has its pros and cons, however, so a thorough performance benchmark against the use cases at hand is needed before selecting one.
More and more organizations employing IoT solutions are leaning towards time series databases, and the advantages weigh heavily in their favor over RDBMS or NoSQL alternatives. An RDBMS works best when there are complex relationships between entities, but it is not suited to use cases demanding very high read and write throughput. NoSQL databases can be made to work like time series databases by storing time-stamped records with suitable indexes, but they are better suited to persisting inconsistently structured data from a wide variety of sources, which calls for the schemaless data model that NoSQL databases offer.
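The NoSQL approach described above can be sketched as a store keyed by a composite (device, timestamp) key, so each device's readings form a contiguous, time-ordered range while the stored documents stay schemaless. TimeKeyedDocumentStore is a hypothetical in-memory stand-in for a real NoSQL database and its indexes, not any product's API.

```python
import bisect


class TimeKeyedDocumentStore:
    """Hypothetical NoSQL-style store: schemaless documents under (device_id, ts) keys."""

    def __init__(self):
        self._keys = []  # sorted composite keys, acting as the index
        self._docs = {}

    def put(self, device_id: str, ts: int, doc: dict) -> None:
        key = (device_id, ts)
        if key not in self._docs:
            bisect.insort(self._keys, key)
        self._docs[key] = doc  # documents may carry different fields

    def query(self, device_id: str, start: int, end: int) -> list:
        # Keys sort by device first and then by time, so one device's
        # readings in [start, end) form a contiguous slice of the index.
        lo = bisect.bisect_left(self._keys, (device_id, start))
        hi = bisect.bisect_left(self._keys, (device_id, end))
        return [(k[1], self._docs[k]) for k in self._keys[lo:hi]]
```

This works for time-range reads per device, but the time-series behavior has to be designed by hand through the key layout, whereas a time series database provides it, along with aggregation functions, out of the box.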
In conclusion, IoT use cases that ingest time series data warrant the use of a time series database. Given the benefits described above, ingesting high-volume, high-velocity, time-dependent data requires the high throughput and performance that time series databases offer.