Article-Journal

An interactive clustering-based visualization tool for air quality data analysis

Examining the seasonal patterns of PM2.5 (atmospheric particulate matter with a maximum diameter of 2.5 micrometers) is an important research area for environmental scientists. An improved understanding of these patterns can help environmental protection agencies (EPAs) make decisions and develop complex models for controlling the concentration of PM2.5 in different regions. This work proposes a web-based interactive R Shiny app, the “model-based time series clustering” (MTSC) tool, for clustering PM2.5 time series using spatial and population variables and temporal features such as seasonality. The tool allows stakeholders to visualize important characteristics of PM2.5 time series, including temporal patterns and missing values, and to cluster series by attribute groupings. We apply the MTSC tool to cluster Taiwan’s PM2.5 time series based on air quality zones and types of monitoring stations. The tool groups the series into four clusters that reveal several phenomena: an improvement in Taiwan's air quality since 2017 in all regions, although at varying rates; an increase in PM2.5 concentration when moving from northern towards southern regions; winter/summer seasonal patterns that are more pronounced in certain types of areas (e.g., industrial); and unusual behavior in the southernmost region. The tool provides cluster-specific quantitative figures, such as seasonal variations in PM2.5 concentration in different air quality zones of Taiwan, and identifies, for example, an annual peak in early January and February (maximum value around 120). Our analysis identifies a region in southernmost Taiwan as different from the other zones that the Taiwan EPA (TEPA) currently groups with it, and a northern region that behaves differently from its TEPA grouping.
These cluster-based insights help EPA experts implement short-term, zone-specific air quality policies (e.g., fireworks and traffic regulations, school closures) and support longer-term decision-making (e.g., transport control stations, fuel permits, old vehicle replacement, fuel type).

Jan 7, 2023

Interactive tool for clustering and forecasting patterns of Taiwan COVID-19 spread

In this research, we aimed to demonstrate the importance of management information systems (MISs) in education planning, where they collect data and deliver forecast results to stakeholders. A critical question is whether the data collected by a system is adequate for producing the analytics necessary for decision-making. We describe the case of a new education MIS in Taiwan, where the population of preschool children in different school districts is constantly changing. These changes challenge school resource planning, especially in terms of teacher hiring. The bureaus of education in charge of resource allocation require accurate school-level one-to-five-year-ahead forecasts of the number of incoming first-grade classrooms. Therefore, the Ministry of Education launched a K-9 student data management system (k9sdms) that allows schools to update data on existing and prospective students directly. We evaluate whether using this system supports the goal of generating one-to-five-year-ahead forecasts, thereby assessing the value of the MIS for its intended usage. Using data until 2014, we developed a forecasting model for the number of first-grade classrooms at each school in Taiwan for 2015-2019. The quality of the forecasts shows that k9sdms can produce valuable results, thereby achieving its purpose. Because the time series for each school was very short (six years for the number of first-grade students and five years for the number of five-year-old children), we did not fit separate models for each school. Instead, we attempted to capture the change in school-level population sizes by using information from three consecutive years. We used a linear regression model, which can use suitable predictors (input variables) to capture trend, seasonality, and other patterns. The model, estimated on the training period, produces forecasts for future periods by inserting the relevant predictor values into the estimated regression equation.
We explored different linear regression configurations. The output variable was always the number of first-grade students in year t; the predictors were the number of first-grade students in prior years and the five-year-old population in previous years.
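
As a rough illustration of this kind of regression, the sketch below pools all schools into one model and regresses the current first-grade count on the prior-year first-grade count and the prior-year five-year-old population. The function names, the single-lag configuration, and the synthetic data are assumptions made for illustration; they are not the configuration or data used in the study.

```python
import numpy as np

def fit_linear_forecaster(first_grade_prev, five_year_olds_prev, first_grade_now):
    """Pooled ordinary least squares: intercept plus two lagged predictors."""
    X = np.column_stack([
        np.ones(len(first_grade_prev)),
        first_grade_prev,
        five_year_olds_prev,
    ])
    coef, *_ = np.linalg.lstsq(X, first_grade_now, rcond=None)
    return coef

def predict(coef, first_grade_prev, five_year_olds_prev):
    """Insert predictor values into the estimated regression equation."""
    X = np.column_stack([
        np.ones(len(first_grade_prev)),
        first_grade_prev,
        five_year_olds_prev,
    ])
    return X @ coef

# Synthetic, made-up data for 200 hypothetical schools (not from k9sdms).
rng = np.random.default_rng(0)
prev = rng.integers(50, 300, size=200).astype(float)   # first graders, year t-1
kids = 0.9 * prev + rng.normal(0.0, 10.0, size=200)    # five-year-olds, year t-1
now = 5.0 + 0.3 * prev + 0.7 * kids                    # first graders, year t

coef = fit_linear_forecaster(prev, kids, now)
forecast = predict(coef, prev[:3], kids[:3])
```

Pooling across schools is one way to work around very short per-school series, since every school contributes rows to the same regression.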

Jun 1, 2022

Fast forecast reconciliation using linear models

Forecasting hierarchical or grouped time series usually involves two steps: computing base forecasts and reconciling the forecasts. Base forecasts can be computed by popular time series forecasting methods such as Exponential Smoothing (ETS) and Autoregressive Integrated Moving Average (ARIMA) models. The reconciliation step is a linear process that adjusts the base forecasts to ensure they are coherent. However, using ETS or ARIMA for base forecasts can be computationally challenging when there are a large number of series to forecast, as each model must be numerically optimized for each series. We propose a linear model that avoids this computational problem and handles the forecasting and reconciliation in a single step. The proposed method is very flexible in incorporating external data, handling missing values, and performing model selection. We illustrate our approach using two datasets: monthly Australian domestic tourism and daily Wikipedia pageviews. We compare our approach to reconciliation using ETS and ARIMA, and show that our approach is much faster while providing similar levels of forecast accuracy.
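
To make the idea of reconciliation as a linear process concrete, the sketch below applies the standard ordinary least squares (OLS) projection to a tiny, hypothetical two-level hierarchy. It shows only the conventional second step, not the single-step linear model proposed in the paper; the hierarchy and the base forecast values are invented for illustration.

```python
import numpy as np

# Hypothetical hierarchy: total = A + B.
# Each row of the summing matrix S maps the bottom-level series (A, B)
# to one series in the hierarchy.
S = np.array([
    [1.0, 1.0],  # total
    [1.0, 0.0],  # A
    [0.0, 1.0],  # B
])

# Made-up base forecasts that are incoherent: 40 + 60 != 105.
base = np.array([105.0, 40.0, 60.0])

# OLS reconciliation: project the base forecasts onto the column space of S,
# i.e. reconciled = S (S'S)^{-1} S' base.
bottom = np.linalg.solve(S.T @ S, S.T @ base)
reconciled = S @ bottom
```

After the projection, the reconciled total equals the sum of the reconciled bottom-level forecasts by construction.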

Jan 1, 2022

Tree-based methods for clustering time series using domain-relevant attributes

This research proposes two new methods for clustering time series that capture temporal information (trend, seasonality, and autocorrelation) and domain-relevant cross-sectional attributes. The methods are based on model-based partitioning (MOB) trees and can be used as an automated yet transparent tool for clustering a large collection of time series. Using a single linear regression model, the single-step method clusters series using trend, seasonality, time series lags, and domain-relevant cross-sectional attributes. The two-step method first clusters by trend, seasonality, and domain-relevant cross-sectional attributes, then further clusters the residual series by autocorrelation and the domain-relevant cross-sectional attributes. Both methods produce clusters that are interpretable by domain experts. We illustrate the usefulness of the proposed clustering approach by considering one-step-ahead forecasting. We present empirical results comparing our approach to forecasting each series with an Autoregressive Integrated Moving Average (ARIMA) model, using a large set of Wikipedia article pageviews time series. Our results show that the tree-based approach produces forecasts practically on par with ARIMA models yet is significantly faster and more efficient, making it suitable for scaling to large collections of time series. Moreover, our method produces simple parametric forecasting models for interpretable clusters of time series, whereas ARIMA cannot provide such interpretability.
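
The regression at the core of the single-step method can be sketched for one series as follows. This is a minimal illustration under stated assumptions: it builds a design matrix with an intercept, a linear trend, seasonal dummies, and a lag-1 term, then fits it by least squares. The MOB tree partitioning on cross-sectional attributes is not shown, and the function name, the lag order, and the synthetic series are hypothetical.

```python
import numpy as np

def design_matrix(y, period):
    """Rows for t = 1..n-1: intercept, trend, seasonal dummies, lag-1 value."""
    n = len(y)
    t = np.arange(1, n)
    season = t % period
    # Season 0 is the baseline, so only period - 1 dummy columns are needed.
    dummies = (season[:, None] == np.arange(1, period)[None, :]).astype(float)
    X = np.column_stack([np.ones(n - 1), t, dummies, y[:-1]])
    return X, y[1:]

# Synthetic monthly-style series (made up): trend + seasonality + AR(1) + noise.
rng = np.random.default_rng(1)
n, period = 120, 12
seasonal = 3.0 * np.sin(2.0 * np.pi * np.arange(period) / period)
y = np.empty(n)
y[0] = 20.0
for i in range(1, n):
    y[i] = 10.0 + 0.05 * i + seasonal[i % period] + 0.5 * y[i - 1] + rng.normal(0.0, 1.0)

X, target = design_matrix(y, period)
coef, *_ = np.linalg.lstsq(X, target, rcond=None)

# In-sample root mean squared error of the one-step-ahead fit.
rmse = float(np.sqrt(np.mean((target - X @ coef) ** 2)))
```

Because every model in such a tree is a plain linear regression, fitting requires no iterative numerical optimization per series, which is consistent with the speed advantage described above.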

Jun 1, 2019

Assessing the value of an information system for developing predictive analytics: the case of forecasting school-level demand in Taiwan

Analytics is important for education planning. Deploying forecasting analytics requires management information systems (MISs) that collect the needed data and deliver the forecasts to stakeholders. A critical question is whether the data collected by a system is adequate for producing the analytics for decision making. We describe the case of a new education MIS in Taiwan, where the population of preschool children in different school districts is constantly changing. These changes challenge school resource planning, especially in terms of teacher hiring. The bureaus of education in charge of resource allocation are in need of accurate school-level one-to-five-year-ahead forecasts of the number of incoming first-grade classrooms. The Ministry of Education therefore launched a K–9 student data management system (k9sdms) that allows schools to directly update data on existing and prospective students. We evaluate whether using this system supports the goal of generating one-to-five-year-ahead forecasts, thereby assessing the value of the MIS for its intended usage. Using data until 2014, we developed a forecasting model for the number of first-grade classrooms at each school in Taiwan in 2015–2019. The quality of forecasts shows that k9sdms can produce valuable results, thereby achieving its purpose.

Mar 1, 2018