This study compares popular Machine Learning (ML), Deep Learning (DL), and statistical algorithms for forecasting microservice time series. The evaluated algorithms are one statistical algorithm (AutoRegressive Integrated Moving Average (ARIMA)), five DL algorithms (Dual-Stage Attention-Based RNN (DARNN), Deep State Space Model (DeepState), DeepAR, Long Short-Term Memory (LSTM), and Temporal Fusion Transformer (TFT)), and four traditional ML algorithms (Multilayer Perceptron (MLP), Support Vector Regressor (SVR), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost)). They are evaluated on 40 time series extracted from microservices operating in production in a large-scale deployment within the Alibaba Cluster.
To set up the environment, create a virtual environment and install the dependencies:

```sh
$ virtualenv venv
$ source venv/bin/activate
$ pip3 install -r requirements.txt
```
Summary of the main repository files:

| Files | Content description |
|---|---|
| Data-descriptions.csv | Description of the datasets. |
| DTW | Results of the DTW algorithm used for selecting time series. |
| Friedman and Nemenyi tests | Friedman and Nemenyi test results. |
| Result | MSE and model efficiency time (MET) of the models. |
| Models | Trained models saved in .pickle format. |
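A saved model can be restored with Python's standard `pickle` module. A minimal sketch, assuming a hypothetical file name under `Models/` (actual file names in the repository may differ):

```python
import pickle

# Hypothetical path; replace with an actual file under Models/.
MODEL_PATH = "Models/lstm_series_01.pickle"

# Restore the trained model object from disk.
with open(MODEL_PATH, "rb") as f:
    model = pickle.load(f)

# Many of the evaluated models follow a scikit-learn-style interface;
# if the loaded object does, forecasting looks like:
# forecast = model.predict(X_test)
print(type(model))
```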
| File | File description |
|---|---|
| generate_met_results.py | Generates MET results from the trained models. |
| generate_mse_results.py | Generates MSE results from the trained models. |
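For reference, MSE is the mean of the squared differences between observed and forecast values. A minimal sketch of the metric (not the repository's exact implementation):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observed and forecast values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Example with dummy values:
print(mse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))  # ≈ 0.02
```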
The hyperparameter search spaces adopted for training the models are summarized below:
| Algorithm | Hyperparameters |
|---|---|
| ARIMA | AutoArima |
| DARNN | 'encoder': (16, 32, 64, 128, 256), 'decoder': (16, 32, 64, 128, 256) |
| DeepAR | 'encoder': (8), 'decoder': (8), 'batch': (64), 'learning_rate': (0.0001), 'layers': (3), 'lstm_nodes': (40) |
| DeepState | The algorithm itself selects the hyperparameters |
| LSTM | 'batch_size': (64, 128), 'epochs': (1, 2, 4, 8, 10), 'hidden_layers': (2, 3, 4, 5, 6), 'learning_rate': (0.05, 0.01, 0.001) |
| MLP | 'hidden_layer_sizes': (2, 5, 10, 15, 20), 'activation': ('logistic'), 'solver': ('adam'), 'max_iter': (1000), 'num_exec': 10 |
| RF | 'min_samples_leaf': (1, 5, 10), 'min_samples_split': (2, 5, 10, 15), 'n_estimators': (100, 500, 1000) |
| TFT | 'dropout_rate': (0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 0.9), 'learning_rate': (0.0001, 0.001, 0.01), 'num_heads': (1, 4), 'batch': (64, 128, 256) |
| SVR | 'gamma': (0.001, 0.01, 0.1, 1), 'kernel': ('rbf', 'sigmoid'), 'epsilon': (0.1, 0.001, 0.0001), 'C': (0.1, 1, 10, 100, 1000, 10000) |
| XGBoost | 'col_sample_by_tree': (0.4, 0.6, 0.8), 'gamma': (1, 5, 10), 'learning_rate': (0.01, 0.1, 1), 'max_depth': (3, 6, 10), 'n_estimators': (100, 150, 200), 'reg_alpha': (0.01, 0.1, 10), 'reg_lambda': (0.01, 0.1, 10), 'subsample': (0.4, 0.6, 0.8) |
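To illustrate how such a grid is used, here is a minimal sketch of tuning the SVR search space above with scikit-learn's `GridSearchCV` over time-ordered folds; the repository's actual training pipeline and data preparation may differ:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

# SVR search space taken from the table above.
param_grid = {
    "gamma": [0.001, 0.01, 0.1, 1],
    "kernel": ["rbf", "sigmoid"],
    "epsilon": [0.1, 0.001, 0.0001],
    "C": [0.1, 1, 10, 100, 1000, 10000],
}

# Dummy lagged-features setup: predict x[t] from the previous 5 points.
rng = np.random.default_rng(0)
series = rng.random(200)
X = np.array([series[i:i + 5] for i in range(len(series) - 5)])
y = series[5:]

# TimeSeriesSplit keeps folds in temporal order, so no future
# observations leak into the training portion of each fold.
search = GridSearchCV(SVR(), param_grid,
                      cv=TimeSeriesSplit(n_splits=3),
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```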