Tree-based methods for clustering time series using domain-relevant attributes

Jun 1, 2019 · Mahsa Ashouri, Galit Shmueli, Chor-Yiu Sin
Abstract
This research proposes two new methods for clustering time series that capture temporal information (trend, seasonality, and autocorrelation) and domain-relevant cross-sectional attributes. The methods are based on model-based partitioning (MOB) trees and can be used as an automated yet transparent tool for clustering a large collection of time series. The single-step method uses a single linear regression model to cluster series by trend, seasonality, time series lags, and domain-relevant cross-sectional attributes. The two-step method first clusters by trend, seasonality, and domain-relevant cross-sectional attributes, then further clusters the residual series by autocorrelation and the domain-relevant cross-sectional attributes. Both methods produce clusters that are interpretable by domain experts. We illustrate the usefulness of the proposed clustering approach through one-step-ahead forecasting. Using a large set of Wikipedia article pageview time series, we compare our approach empirically with forecasting each series using an Autoregressive Integrated Moving Average (ARIMA) model. Our results show that the tree-based approach produces forecasts practically on par with those of ARIMA models, yet is significantly faster and more efficient, and is therefore suitable for scaling to large collections of time series. Moreover, our method produces simple parametric forecasting models for interpretable clusters of time series, whereas ARIMA cannot provide such interpretability.
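To make the kind of regression named in the abstract concrete, the following is a minimal sketch of a single linear model with trend, seasonality, and lagged values, used for a one-step-ahead forecast. It is not the authors' implementation and does not include the MOB tree partitioning or the cross-sectional attributes. The function names, the weekly period of 7, and the single lag are illustrative assumptions.

```python
import numpy as np

def design_matrix(y, period=7, n_lags=1):
    """Build regressors: intercept, linear trend, seasonal dummies, lagged values.

    `period` and `n_lags` are illustrative assumptions, not values from the paper.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Drop the first n_lags time points, which lack a full lag history.
    rows = np.arange(n_lags, n)
    cols = [np.ones(len(rows)), rows.astype(float)]
    # Seasonal dummies, with season 0 as the baseline.
    for s in range(1, period):
        cols.append((rows % period == s).astype(float))
    # Lag-k column: the value observed k steps before each row.
    for k in range(1, n_lags + 1):
        cols.append(y[n_lags - k : n - k])
    return np.column_stack(cols), y[n_lags:]

def fit_and_forecast(y, period=7, n_lags=1):
    """Fit the regression by least squares and forecast the next time point."""
    y = np.asarray(y, dtype=float)
    X, target = design_matrix(y, period, n_lags)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    n = len(y)
    # Regressor row for the first unobserved time point, t = n.
    x_next = [1.0, float(n)]
    x_next += [1.0 if n % period == s else 0.0 for s in range(1, period)]
    x_next += [float(y[n - k]) for k in range(1, n_lags + 1)]
    return beta, float(np.dot(x_next, beta))
```

In a tree-based setting, a model of this form would be fitted within each cluster of series rather than to a single series. Here it is applied to one series only, to show the structure of the regressors.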
Type
Publication
Journal of Business Analytics