# SDFVAE

# Exp_datasets
Our work uses two multivariate time series datasets of structured CDN KPIs, stored in the folders named dataset1 and dataset2, which span 78 days and 64 days respectively. <br>
The two datasets were collected from two different provincial-level edge sites of a top ISP-operated CDN. <br>
There are 7 abnormal sequences in dataset1 and 5 abnormal sequences in dataset2. <br>
These labelled abnormal sequences were confirmed by human operators, so they can be treated as the ground truth. <br>
The KPIs in the datasets include the outbound and inbound traffic of CDN servers, cache hit ratio, average bitrate, and so on. <br>
For privacy reasons, these KPIs are anonymized and normalized. <br>


## Dataset Information

### Dataset1
#### Basic Statistics

Statistics | dataset1
--- | ---
Number of KPIs | 24
Duration (days) | 78
Granularity (min) | 5
Number of points | 22,356
Number of anomaly sequences | 7
Anomaly ratio (%) | 1.6
Train period | 1 ∼ 10,656
Test period | 10,657 ∼ 22,356
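
The train and test periods above are given as 1-indexed, inclusive point ranges. A minimal sketch of how such a split could be applied to one time-ordered KPI series is shown below. The function name `split_train_test` and the placeholder data are assumptions made for illustration and are not part of the dataset or its tooling.

```python
# Boundaries taken from the statistics table (1-indexed, inclusive).
TRAIN_END = 10_656
N_POINTS = 22_356


def split_train_test(series, train_end=TRAIN_END):
    """Split a time-ordered sequence into train and test parts.

    Points 1..train_end (1-indexed) go to train; the remainder goes to test.
    With 0-indexed Python slicing this is simply series[:train_end].
    """
    return series[:train_end], series[train_end:]


# Placeholder values standing in for one KPI (hypothetical data):
# element i holds its own 1-indexed position.
values = list(range(1, N_POINTS + 1))
train, test = split_train_test(values)
print(len(train), len(test))  # 10656 11700
```

The same helper would apply to dataset2 by passing `train_end=51_336`.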

#### Data format
There are 24 CSV files, one per KPI. <br>
Each CSV file has the following format: <br>
* First column is the timestamp <br>
* Second column is the value <br>
* Third column is the label. 0 for normal and 1 for abnormal <br>

Timestamp | Value | Label
--- | --- | ---
20181001000500 | 0.46444977152338346 | 0
20181001001000 | 0.4423121844530866 | 0
20181001001500 | 0.4186436700242946 | 0
20181001002000 | 0.39892597116922146 | 1
20181001002500 | 0.37977501905111494 | 1
20181001003000 | 0.36615750635812155 | 1
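
As an illustration, the three-column layout above could be read with the Python standard library roughly as follows. This is a sketch under assumptions: the timestamp is interpreted as `YYYYMMDDHHMMSS` (which matches the 5-minute spacing of the sample rows), the files are assumed to be comma-separated, and whether they carry a header row is not stated here, so the hypothetical `has_header` flag covers both cases. The function name `load_kpi_csv` is also hypothetical.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout: YYYYMMDDHHMMSS.
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def load_kpi_csv(fileobj, has_header=False):
    """Read (timestamp, value, label) rows from one per-KPI CSV file."""
    reader = csv.reader(fileobj)
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    rows = []
    for ts, value, label in reader:
        rows.append((
            datetime.strptime(ts.strip(), TIMESTAMP_FORMAT),
            float(value),
            int(label),
        ))
    return rows


# The sample rows from the table above, written as CSV text.
sample = io.StringIO(
    "20181001000500,0.46444977152338346,0\n"
    "20181001001000,0.4423121844530866,0\n"
    "20181001001500,0.4186436700242946,0\n"
    "20181001002000,0.39892597116922146,1\n"
    "20181001002500,0.37977501905111494,1\n"
    "20181001003000,0.36615750635812155,1\n"
)
rows = load_kpi_csv(sample)
print(rows[0][0], rows[0][2])  # 2018-10-01 00:05:00 0
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the `io.StringIO` object.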

### Dataset2

#### Basic Statistics

Statistics | dataset2
--- | ---
Number of KPIs | 16
Duration (days) | 64
Granularity (min) | 1
Number of points | 91,507
Number of anomaly sequences | 5
Anomaly ratio (%) | 0.32
Train period | 1 ∼ 51,336
Test period | 51,337 ∼ 91,507
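
The "Number of anomaly sequences" and "Anomaly ratio" rows can be recomputed from the label column. The sketch below assumes that an anomaly sequence is a maximal run of consecutive points labelled 1 and that the anomaly ratio is the percentage of points labelled 1; both definitions are assumptions, and the function names and the toy label list are hypothetical.

```python
def count_anomaly_sequences(labels):
    """Count maximal runs of consecutive 1s in a 0/1 label sequence."""
    count = 0
    previous = 0
    for label in labels:
        if label == 1 and previous == 0:
            count += 1  # a new run starts at each 0 -> 1 transition
        previous = label
    return count


def anomaly_ratio(labels):
    """Return the percentage of points labelled 1."""
    return 100.0 * sum(labels) / len(labels)


# Toy labels (hypothetical, not taken from the datasets).
labels = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
print(count_anomaly_sequences(labels))  # 2
print(anomaly_ratio(labels))            # 40.0
```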

#### Data format
There are 2 CSV files in the folder named "dataset2". <br>
The file named "dataset2.csv" contains the full data, and "dataset2_sample.csv" is a sample containing 1000 records. <br>
Both CSV files have the following format: <br>
* First column is the timestamp <br>
* Columns 2 through 17 are the values of the 16 KPIs <br>
* The last column is the label. 0 for normal and 1 for abnormal <br>

Timestamp | Kpi1 | ... | Kpi16 | Label
--- | --- | --- | --- | ---
20190903000200 | 0.46444977152338346 | ... | -0.588230235 | 0
20190903000300 | 0.4423121844530866 | ... | -0.595955299 | 0
20190903000400 | 0.4186436700242946 | ... | -0.600299795 | 0
20190903000500 | 0.39892597116922146 | ... | -0.604815951 | 1
20190903000600 | 0.37977501905111494 | ... | -0.610974025 | 1
20190903000700 | 0.36615750635812155 | ... | -0.616264816 | 1
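
The wide layout above (one timestamp, 16 KPI columns, one label) could be separated into features and labels along the following lines. This is a sketch under assumptions: the file is assumed to be comma-separated, the timestamp is read as `YYYYMMDDHHMMSS`, and the presence of a header row is unknown, so the hypothetical `has_header` flag handles either case. The function name `load_dataset2` is hypothetical, and the demo rows use placeholder zeros rather than real KPI values.

```python
import csv
import io
from datetime import datetime

N_KPIS = 16
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # assumed layout: YYYYMMDDHHMMSS


def load_dataset2(fileobj, has_header=False):
    """Return (timestamps, features, labels) from a dataset2-style CSV."""
    reader = csv.reader(fileobj)
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    timestamps, features, labels = [], [], []
    for row in reader:
        if len(row) != N_KPIS + 2:
            raise ValueError(f"expected {N_KPIS + 2} columns, got {len(row)}")
        timestamps.append(datetime.strptime(row[0].strip(), TIMESTAMP_FORMAT))
        features.append([float(v) for v in row[1:-1]])  # columns 2..17
        labels.append(int(row[-1]))                     # last column
    return timestamps, features, labels


# Two synthetic rows with placeholder KPI values (not real data).
lines = [
    ",".join([ts] + ["0.0"] * N_KPIS + [label])
    for ts, label in [("20190903000200", "0"), ("20190903000300", "1")]
]
sample = io.StringIO("\n".join(lines) + "\n")
timestamps, features, labels = load_dataset2(sample)
print(len(features), len(features[0]), labels)  # 2 16 [0, 1]
```

Checking the column count per row is a deliberate choice here: it fails early if a file with a different number of KPIs is passed in by mistake.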

### Public Dataset 
The public dataset (SMD) used in our evaluation experiments, along with its detailed description, is available at:
https://github.com/NetManAIOps/OmniAnomaly

For simplicity, we select the data of 2 of the 28 machines, namely "machine-1-2.txt" and "machine-1-3.txt", to conduct the evaluation experiments.

**Note that all KPIs are normalized and the real name of each KPI is omitted for confidentiality; this does not affect the accuracy of the evaluation experiments.**