# SDFVAE

# Exp_datasets

Two multivariate time series datasets of structured CDN KPIs are used in our work (folders named dataset1 and dataset2), spanning 78 days and 64 days respectively.
These two datasets are collected from two different provincial-level CDN edge sites of a top ISP-operated CDN.
There are 7 abnormal sequences in dataset1 and 5 abnormal sequences in dataset2.
These labelled abnormal sequences were confirmed by human operators, so they can be treated as the ground truth.
The KPIs in the datasets include the out-bound traffic and in-bound traffic of CDN servers, cache hit ratio, average bitrate, and so on.
For privacy reasons, these KPIs are anonymized and normalized.

## Dataset Information

### Dataset1

#### Basic Statistics

Statistics | dataset1
--- | ---
Number of KPIs | 24
Duration (days) | 78
Granularity (min) | 5
Number of points | 22,356
Number of anomaly sequences | 7
Anomaly ratio (%) | 1.6
Train period | 1 ∼ 10,656
Test period | 10,657 ∼ 22,356

#### Data format

There are 24 CSV files and each file corresponds to a KPI.
The CSV has the following format:
* First column is the timestamp
* Second column is the value
* Third column is the label. 0 for normal and 1 for abnormal
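
As a minimal sketch, the three-column layout described above can be parsed with the Python standard library alone. The comma delimiter, the presence of a header row, and the `YYYYMMDDhhmmss` timestamp encoding are assumptions inferred from the sample rows in this README, and `read_kpi_rows` is a hypothetical helper name, not part of the SDFVAE code.

```python
import csv
import io
from datetime import datetime

# Assumption: timestamps are encoded as YYYYMMDDhhmmss (inferred from the sample rows).
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def read_kpi_rows(handle, has_header=True):
    """Parse one per-KPI CSV into (timestamp, value, label) tuples.

    `has_header` is an assumption; pass False if the files carry no header row.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header line
    rows = []
    for record in reader:
        if not record:  # tolerate blank lines
            continue
        timestamp, value, label = record
        rows.append(
            (
                datetime.strptime(timestamp.strip(), TIMESTAMP_FORMAT),
                float(value),
                int(label),  # 0 for normal, 1 for abnormal
            )
        )
    return rows


# Three sample rows taken from this README, written as comma-separated text.
sample = io.StringIO(
    "Timestamp,Value,Label\n"
    "20181001000500,0.46444977152338346,0\n"
    "20181001001000,0.4423121844530866,0\n"
    "20181001002000,0.39892597116922146,1\n"
)
rows = read_kpi_rows(sample)
```

To read a real file, an open file object can be passed in place of `sample`, for example `open(path, newline="")`, where `path` points at one of the 24 per-KPI CSV files.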

Timestamp | Value | Label
--- | --- | ---
20181001000500 | 0.46444977152338346 | 0
20181001001000 | 0.4423121844530866 | 0
20181001001500 | 0.4186436700242946 | 0
20181001002000 | 0.39892597116922146 | 1
20181001002500 | 0.37977501905111494 | 1
20181001003000 | 0.36615750635812155 | 1

### Dataset2

#### Basic Statistics

Statistics | dataset2
--- | ---
Number of KPIs | 16
Duration (days) | 64
Granularity (min) | 1
Number of points | 91,507
Number of anomaly sequences | 5
Anomaly ratio (%) | 0.32
Train period | 1 ∼ 51,336
Test period | 51,337 ∼ 91,507

#### Data format

There are 2 CSV files in the folder named "dataset2".
The file named "dataset2.csv" is the full data, and "dataset2_sample.csv" is the sample which contains 1000 records.
The CSV has the following format:
* First column is the timestamp
* Columns 2 to 17 are the values of the 16 KPIs
* The last column is the label. 0 for normal and 1 for abnormal
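
The wide layout described above, together with the train and test periods from the statistics table, could be handled roughly as follows. This is a sketch under assumptions: a comma delimiter, a header row, and train/test periods that are 1-based inclusive row positions. The helper names (`read_wide_rows`, `split_train_test`, `anomaly_ratio`) are hypothetical, and the demo uses shortened toy rows with two KPI columns instead of 16.

```python
import csv
import io

# Last training point of dataset2, from the statistics table (1-based, inclusive).
DATASET2_TRAIN_END = 51_336


def read_wide_rows(handle, has_header=True):
    """Parse rows shaped as: timestamp, KPI values..., label."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header line (assumed to exist)
    rows = []
    for record in reader:
        if not record:  # tolerate blank lines
            continue
        timestamp = record[0].strip()
        values = [float(v) for v in record[1:-1]]  # every column between first and last
        label = int(record[-1])  # 0 for normal, 1 for abnormal
        rows.append((timestamp, values, label))
    return rows


def split_train_test(rows, train_end):
    """Points 1..train_end go to train, the remainder to test."""
    return rows[:train_end], rows[train_end:]


def anomaly_ratio(rows):
    """Fraction of points labelled 1."""
    return sum(label for _, _, label in rows) / len(rows) if rows else 0.0


# Toy data: rounded values and only two KPI columns, for illustration only.
demo = io.StringIO(
    "Timestamp,Kpi1,Kpi2,Label\n"
    "20190903000200,0.46,-0.58,0\n"
    "20190903000300,0.44,-0.59,0\n"
    "20190903000400,0.41,-0.60,0\n"
    "20190903000500,0.39,-0.60,1\n"
)
rows = read_wide_rows(demo)
train, test = split_train_test(rows, train_end=3)
```

For the full "dataset2.csv", the same split would use `train_end=DATASET2_TRAIN_END`. Because the split is by row position, it relies on the rows being stored in chronological order.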

Timestamp | Kpi1 | ... | Kpi16 | Label
--- | --- | --- | --- | ---
20190903000200 | 0.46444977152338346 | ... | -0.588230235 | 0
20190903000300 | 0.4423121844530866 | ... | -0.595955299 | 0
20190903000400 | 0.4186436700242946 | ... | -0.600299795 | 0
20190903000500 | 0.39892597116922146 | ... | -0.604815951 | 1
20190903000600 | 0.37977501905111494 | ... | -0.610974025 | 1
20190903000700 | 0.36615750635812155 | ... | -0.616264816 | 1

### Public Dataset

The public dataset (SMD) used in our evaluation experiments, together with its detailed description, is available at https://github.com/NetManAIOps/OmniAnomaly

For simplicity, we select 2 of the 28 machines, namely "machine-1-2.txt" and "machine-1-3.txt", to conduct the evaluation experiments.

**Note that all KPIs are normalized and we omitted the real name of each KPI for confidentiality, but this does not affect the accuracy of the evaluation experiments.**