# SDFVAE
# Exp_datasets
Our work uses two multivariate time series datasets of structured CDN KPIs (folders named dataset1 and dataset2), which span 78 days and 64 days respectively. <br>
The two datasets were collected from two different provincial-level CDN edge sites of a top ISP-operated CDN. <br>
Dataset 1 contains 7 abnormal sequences and dataset 2 contains 5. <br>
These labelled abnormal sequences were confirmed by human operators, so they can be treated as the ground truth. <br>
The KPIs in the datasets include the outbound and inbound traffic of CDN servers, cache hit ratio, average bitrate, and so on. <br>
For privacy reasons, these KPIs are anonymized and normalized. <br>
## Dataset Information
### Dataset1
#### Basic Statistics
Statistics | dataset1
--- | ---
Number of KPIs | 24
Duration (days) | 78
Granularity (min) | 5
Number of points | 22,356
Number of anomaly sequences | 7
Anomaly ratio (%) | 1.6
Train period | 1 ∼ 10,656
Test period | 10,657 ∼ 22,356
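
The train and test periods in the table read as 1-based, inclusive point indices. A minimal sketch of turning them into Python slices follows. The helper name `period_to_slice` and the constant names are hypothetical and not part of this repository, and the list of integers only stands in for the real series.

```python
def period_to_slice(start, end):
    """Convert a 1-based inclusive period (start ~ end) to a 0-based Python slice."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return slice(start - 1, end)

# Boundaries copied from the dataset1 statistics table.
TRAIN_PERIOD = (1, 10_656)
TEST_PERIOD = (10_657, 22_356)

# Stand-in for the real series: one integer per point, numbered from 1.
points = list(range(1, 22_356 + 1))

train = points[period_to_slice(*TRAIN_PERIOD)]
test = points[period_to_slice(*TEST_PERIOD)]
print(len(train), len(test))
```

The same helper applies to any other pair of boundaries given in this 1-based inclusive form.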
#### Data format
The folder contains 24 CSV files, one per KPI. <br>
Each CSV file has the following format: <br>
* First column is the timestamp <br>
* Second column is the value <br>
* Third column is the label. 0 for normal and 1 for abnormal <br>
Timestamp | Value | Label
--- | --- | ---
20181001000500 | 0.46444977152338346 | 0
20181001001000 | 0.4423121844530866 | 0
20181001001500 | 0.4186436700242946 | 0
20181001002000 | 0.39892597116922146 | 1
20181001002500 | 0.37977501905111494 | 1
20181001003000 | 0.36615750635812155 | 1
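
A minimal sketch of parsing one per-KPI file with the Python standard library, using four of the rows shown above as inline sample data. It assumes comma-separated fields and a `YYYYMMDDhhmmss` timestamp. Whether the real files carry a header row is not stated here, so the `has_header` flag is an assumption to adjust, and `parse_kpi_csv` is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime

def parse_kpi_csv(text, has_header=False):
    """Parse `timestamp,value,label` rows into (datetime, float, int) tuples."""
    reader = csv.reader(io.StringIO(text))
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    rows = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        ts, value, label = row
        rows.append((
            datetime.strptime(ts.strip(), "%Y%m%d%H%M%S"),
            float(value),
            int(label),
        ))
    return rows

# Inline sample taken from the table above (comma-separated, no header assumed).
sample = """20181001000500,0.46444977152338346,0
20181001001000,0.4423121844530866,0
20181001001500,0.4186436700242946,0
20181001002000,0.39892597116922146,1
"""

rows = parse_kpi_csv(sample)
print(rows[0][0], rows[-1][2])
```

For a real file, the text can be read with `open(path).read()` and passed to the same function.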
### Dataset2
#### Basic Statistics
Statistics | dataset2
--- | ---
Number of KPIs | 16
Duration (days) | 64
Granularity (min) | 1
Number of points | 91,507
Number of anomaly sequences | 5
Anomaly ratio (%) | 0.32
Train period | 1 ∼ 51,336
Test period | 51,337 ∼ 91,507
#### Data format
There are 2 CSV files in the folder named "dataset2". <br>
The file named "dataset2.csv" contains the full data, and "dataset2_sample.csv" is a sample of 1000 records. <br>
Each CSV file has the following format: <br>
* First column is the timestamp <br>
* Columns 2 to 17 are the values of the 16 KPIs <br>
* The last column is the label. 0 for normal and 1 for abnormal <br>
Timestamp | Kpi1 | ... | Kpi16 | Label
--- | --- | --- | --- | ---
20190903000200 | 0.46444977152338346 | ... | -0.588230235 | 0
20190903000300 | 0.4423121844530866 | ... | -0.595955299 | 0
20190903000400 | 0.4186436700242946 | ... | -0.600299795 | 0
20190903000500 | 0.39892597116922146 | ... | -0.604815951 | 1
20190903000600 | 0.37977501905111494 | ... | -0.610974025 | 1
20190903000700 | 0.36615750635812155 | ... | -0.616264816 | 1
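
A minimal sketch of splitting one wide record into timestamp, KPI values, and label, and of grouping point labels into contiguous abnormal sequences. The helper names `parse_wide_row` and `anomaly_segments` are hypothetical. The KPI values in the demo record are placeholders, because the table above elides most columns.

```python
from datetime import datetime

N_KPIS = 16  # from the dataset2 statistics table

def parse_wide_row(fields, n_kpis=N_KPIS):
    """Split one record into (timestamp, [kpi values], label)."""
    if len(fields) != n_kpis + 2:
        raise ValueError(f"expected {n_kpis + 2} columns, got {len(fields)}")
    timestamp = datetime.strptime(fields[0].strip(), "%Y%m%d%H%M%S")
    values = [float(v) for v in fields[1:-1]]
    label = int(fields[-1])
    return timestamp, values, label

def anomaly_segments(labels):
    """Return inclusive (start, end) index pairs for each run of 1s."""
    segments, start = [], None
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i  # a new abnormal run begins
        elif label != 1 and start is not None:
            segments.append((start, i - 1))  # the run ended at the previous point
            start = None
    if start is not None:
        segments.append((start, len(labels) - 1))  # run reaches the end
    return segments

# Placeholder record: timestamp and label follow the first row shown above,
# but the 16 KPI values are made up for illustration.
record = ["20190903000200"] + ["0.0"] * N_KPIS + ["0"]
timestamp, values, label = parse_wide_row(record)

# Labels of the six rows shown above.
segments = anomaly_segments([0, 0, 0, 1, 1, 1])
print(timestamp, len(values), label, segments)
```

Counting the segments returned for the full label column is one way to cross-check the number of anomaly sequences listed in the statistics table.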
### Public Dataset
The public dataset (SMD) used in our evaluation experiments, together with its detailed description, can be found at:
https://github.com/NetManAIOps/OmniAnomaly
For simplicity, we select 2 of the 28 machine datasets, namely "machine-1-2.txt" and "machine-1-3.txt", to conduct the evaluation experiments.
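
A minimal sketch of reading one machine file into a list of rows. It assumes each file is plain text with one comma-separated row of numeric values per time step, which should be checked against the OmniAnomaly repository. `load_matrix` is a hypothetical helper, and the demo writes a tiny placeholder file instead of using real SMD data.

```python
import os
import tempfile

def load_matrix(path, delimiter=","):
    """Read a text file of delimiter-separated numbers into a list of float rows."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                rows.append([float(x) for x in line.split(delimiter)])
    return rows

# Demo with a placeholder file; for real data, pass the path of the
# downloaded "machine-1-2.txt" or "machine-1-3.txt" instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "machine-demo.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("0.1,0.2,0.3\n0.4,0.5,0.6\n")
    matrix = load_matrix(demo_path)

print(len(matrix), len(matrix[0]))
```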
**Note that all KPIs are normalized and the real name of each KPI is omitted for confidentiality; this does not affect the accuracy of the evaluation experiments.**