Data collected over a period of time is called time-series data. The data points are usually recorded at regular intervals, and past values are often correlated with the values that follow them. Many datasets, such as sales records or stock prices, contain date, month or day columns that are important for making predictions. The challenge is how to use such time-series data and convert it into a format that a machine learning model can understand. The Python library Darts makes this process a lot simpler.
In this article, we will learn about Darts and apply it to a time-series dataset.
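One classic way to turn a series into something a model can learn from is the sliding-window ("lag") approach: each training example holds a few consecutive past values, and the target is the value that comes next. The minimal plain-Python sketch below illustrates the idea; it is not how Darts works internally, and the function name `make_lagged_pairs` and the toy values are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_lagged_pairs(values: Sequence[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Return (X, y): each row of X holds n_lags consecutive past values,
    and the matching entry of y is the value that immediately follows them."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    X, y = [], []
    for t in range(n_lags, len(values)):
        X.append(list(values[t - n_lags:t]))  # the window of past values
        y.append(values[t])                   # the next value is the target
    return X, y


# Toy numbers for illustration only, not from any real dataset
toy_series = [10, 12, 13, 15, 14, 16]
X, y = make_lagged_pairs(toy_series, n_lags=3)
for features, target in zip(X, y):
    print(features, "->", target)
```

With three lags, the six toy values yield three (features, target) pairs; a series shorter than the window yields none.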
Introduction to Darts
For many datasets, forecasting the time-series columns plays an important role in decision making. Unit8.co developed Darts to make time-series forecasting easy, with the goal of making it as simple to use for time series as sklearn is for general machine learning. Darts aims to smooth out the overall process of using time series in machine learning.
The basic principles of darts are:
- There are two types of models in Darts:
Regression models: these predict an output based on a set of input time series.
Forecasting models: these predict future values based on past values.
- Darts has a class called TimeSeries, which is immutable, like Python strings.
- A TimeSeries can be either univariate (one-dimensional) or multivariate (multi-dimensional). Some models, such as neural networks, can handle multiple dimensions, while simpler models work with just one.
- Methods like fit() and predict() are unified across all models from neural networks to ARIMA.
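To make the shared fit()/predict() pattern concrete, here is a minimal plain-Python sketch of a forecasting model that follows it. The class `SeasonalNaiveForecaster` is hypothetical and not part of Darts (Darts ships its own `NaiveSeasonal` model); it simply repeats the last observed season, and it stores its history as an immutable tuple to echo the immutability of `TimeSeries`.

```python
from typing import List, Sequence, Tuple


class SeasonalNaiveForecaster:
    """Hypothetical forecaster with a fit()/predict(n) interface.
    The forecast for each future step is the value observed one full season earlier."""

    def __init__(self, season_length: int = 12):
        if season_length < 1:
            raise ValueError("season_length must be positive")
        self.season_length = season_length
        self._history: Tuple[float, ...] = ()

    def fit(self, series: Sequence[float]) -> "SeasonalNaiveForecaster":
        if len(series) < self.season_length:
            raise ValueError("need at least one full season of data")
        self._history = tuple(series)  # immutable copy of the training data
        return self

    def predict(self, n: int) -> List[float]:
        if not self._history:
            raise RuntimeError("call fit() before predict()")
        last_season = self._history[-self.season_length:]
        # Step h reuses the value from the same position in the last season
        return [last_season[h % self.season_length] for h in range(n)]


model = SeasonalNaiveForecaster(season_length=4).fit([1, 2, 3, 4, 5, 6, 7, 8])
print(model.predict(6))
```

Because every model exposes the same two methods, swapping one model for another only changes the constructor line, which is the convenience Darts offers across its models.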
Implementing Darts on time-series data
Darts is open source and can be installed with pip:
pip install u8darts
Dataset
Next, choose any time-series dataset you like. I have selected the dataset of monthly beer production in Australia. Let us now load the dataset and import the libraries we need.
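# Mount Google Drive (only needed when running in Google Colab)
from google.colab import drive
drive.mount('/content/gdrive/')
import pandas as pd
from darts import TimeSeries
# Load the CSV file and preview the first rows
beer_data = pd.read_csv('/content/gdrive/My Drive/beer.csv')
beer_data.head()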
The dataset contains two columns: the month (including the year) and the beer production in that month.
Train-test split
Let us now use the TimeSeries class and split the data into train and test sets. We build the series with the from_dataframe method, passing the names of the time column and the value column. Then we split the data at a point in time. The dataset has around 477 rows, so I chose the 275th time period (1978-10) as the split point.
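# Build a TimeSeries indexed by the 'Month' column
get_data = TimeSeries.from_dataframe(beer_data, time_col='Month', value_cols='Monthly beer production')
# Points before 1978-10 go to traindata; 1978-10 and later go to testdata
traindata, testdata = get_data.split_before(pd.Timestamp('1978-10'))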
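The split semantics are worth spelling out: the first part holds every point strictly before the split point, and the second part starts at the split point itself. The plain-Python sketch below reproduces that behaviour on (date, value) pairs; the helper names `parse_month` and `split_before` are hypothetical stand-ins, and the toy rows only mimic the shape of the beer CSV, their values are made up.

```python
from datetime import date
from typing import List, Tuple

Point = Tuple[date, float]


def parse_month(text: str) -> date:
    """Turn a 'YYYY-MM' string into the first day of that month."""
    year, month = text.split("-")
    return date(int(year), int(month), 1)


def split_before(points: List[Point], split_point: date) -> Tuple[List[Point], List[Point]]:
    """First part: points strictly before split_point. Second part: split_point onwards."""
    before = [(d, v) for d, v in points if d < split_point]
    after = [(d, v) for d, v in points if d >= split_point]
    return before, after


# Toy rows shaped like the beer CSV (values are illustrative, not real data)
rows = [("1978-08", 100.0), ("1978-09", 110.0), ("1978-10", 120.0), ("1978-11", 130.0)]
points = [(parse_month(m), v) for m, v in rows]
train, test = split_before(points, parse_month("1978-10"))
print(len(train), len(test))
```

Splitting on a date rather than on a random sample keeps every test point later than every training point, which is what a forecasting evaluation needs.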
Modelling
Training a model is very simple with Darts. Here, an exponential smoothing model is fitted to the data. As in sklearn, the fit() method is used to fit the model to the dataset.
from darts.models import ExponentialSmoothing
beer_model = ExponentialSmoothing()
beer_model.fit(traindata)
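The fit() call hides the smoothing itself. To our understanding, Darts' ExponentialSmoothing wraps the statsmodels Holt-Winters model and by default includes additive trend and seasonal components. The sketch below shows only the simplest, level-only form of exponential smoothing, level_t = alpha * y_t + (1 - alpha) * level_{t-1}, so it is a simplified illustration rather than what Darts computes; the function names and the choice of starting level are assumptions.

```python
from typing import List, Sequence


def simple_exponential_smoothing(values: Sequence[float], alpha: float) -> float:
    """Level-only exponential smoothing, starting from level_0 = y_0.
    Returns the final smoothed level."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not values:
        raise ValueError("values must be non-empty")
    level = values[0]
    for y in values[1:]:
        # Recent observations get weight alpha, older history decays by (1 - alpha)
        level = alpha * y + (1 - alpha) * level
    return level


def ses_forecast(values: Sequence[float], alpha: float, n: int) -> List[float]:
    """Level-only smoothing produces a flat forecast: the final level repeated n times."""
    return [simple_exponential_smoothing(values, alpha)] * n


print(ses_forecast([10, 20, 30], alpha=0.5, n=3))
```

A flat forecast like this cannot reproduce the seasonal swings in monthly beer production, which is why the trend and seasonal terms in the Holt-Winters variant matter for this dataset.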
This completes the training. Let us now make predictions and plot them.
# Forecast as many months as there are in the test set
prediction = beer_model.predict(len(testdata))
print("predicted", prediction[:5])
print("actual", testdata[:5])
import matplotlib.pyplot as plt
get_data.plot(label='actual')
prediction.plot(label='predict', lw=3)
plt.legend()
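Here, the exponential smoothing model forecasts the monthly values from October 1978 onwards. The plot overlays the forecast on the actual series, which gives a visual sense of how accurate the predictions are; an error metric computed over the test period makes the comparison precise.
Darts also provides neural network models and supports multivariate time series.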
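A common way to put a number on forecast accuracy is the mean absolute percentage error (MAPE), 100 / n * sum(|actual - predicted| / |actual|). Darts includes a `mape` function in `darts.metrics` that takes two TimeSeries; the plain-Python sketch below only shows what the metric computes, on made-up numbers, and is not the Darts implementation.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.
    Undefined when any actual value is zero, so that case raises an error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Illustrative values: each forecast is off by 10% of the actual value
print(mape([100, 200], [110, 180]))
```

Because it is a percentage, MAPE can be compared across series with different scales, although it becomes unstable when actual values are close to zero.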
Conclusion
In this article, we saw how to use the Darts library to forecast a time series with just a few lines of code. Compared with building the same pipeline by hand in pandas, the library saves time. It also provides tools for backtesting, regression models and even automatic model selection. It is a great way to handle time-series datasets.