diff --git a/README.md b/README.md
index 4e09dbb18d7fa680c7562a1ae6e0c79dde85f010..c3f98f216cace0d0ed1c0aa9a37f53788e94e488 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,82 @@
-# SDFVAE
\ No newline at end of file
+# SDFVAE
+
+# Exp_datasets
+Our work uses two multivariate time series datasets of structured CDN KPIs (folders named dataset1 and dataset2), which span 78 days and 64 days respectively.
+These two datasets are collected from two different provincial-level CDN edge sites of a top ISP-operated CDN.
+There are 7 abnormal sequences in dataset1 and 5 abnormal sequences in dataset2.
+These labelled abnormal sequences were confirmed by human operators, so they can be treated as the ground truth.
+The KPIs in the datasets include the outbound and inbound traffic of CDN servers, cache hit ratio, average bitrate, and so on.
+For privacy reasons, these KPIs are anonymized and normalized.
+
+
+## Dataset Information
+
+### Dataset1
+#### Basic Statistics
+
+Statistics | dataset1
+--- | ---
+Number of KPIs | 24
+Duration (days) | 78
+Granularity (min) | 5
+Number of points | 22,356
+Number of anomaly sequences | 7
+Anomaly ratio (%) | 1.6
+Train period | 1 ∼ 10,656
+Test period | 10,657 ∼ 22,356
+
+#### Data format
+There are 24 CSV files and each file corresponds to a KPI.
+The CSV has the following format:
+* First column is the timestamp
+* Second column is the value
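+* Third column is the label. 0 for normal and 1 for abnormal
+
+Timestamp | Value | Label
+--- | --- | ---
+20181001000500 | 0.46444977152338346 | 0
+20181001001000 | 0.4423121844530866 | 0
+20181001001500 | 0.4186436700242946 | 0
+20181001002000 | 0.39892597116922146 | 1
+20181001002500 | 0.37977501905111494 | 1
+20181001003000 | 0.36615750635812155 | 1
+
+### Dataset2
+
+#### Basic Statistics
+
+Statistics | dataset2
+--- | ---
+Number of KPIs | 16
+Duration (days) | 64
+Granularity (min) | 1
+Number of points | 91,507
+Number of anomaly sequences | 5
+Anomaly ratio (%) | 0.32
+Train period | 1 ∼ 51,336
+Test period | 51,337 ∼ 91,507
+
+#### Data format
+There are 2 CSV files in the folder named "dataset2".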
+ +Timestamp | Value | Label +--- | --- | --- +20181001000500 | 0.46444977152338346 | 0 +20181001001000 | 0.4423121844530866 | 0 +20181001001500 | 0.4186436700242946 | 0 +20181001002000 | 0.39892597116922146 | 1 +20181001002500 | 0.37977501905111494 | 1 +20181001003000 | 0.36615750635812155 | 1 + +### Dataset2 + +#### Basic Statistics + +Statistics | dataset2 +--- | --- +Number of KPIs | 16 +Durations (day) | 64 +Granularity (min) | 1 +Number of points | 91,507 +Number of anomaly sequences | 5 +Anomaly ratio (%) | 0.32 +Train period | 1 ∼ 51,336 +Test period | 51,337 ∼ 91,507 + +#### Data format +There are 2 CSV files in the folder named "dataset2".
+The file "dataset2.csv" holds the full data, and "dataset2_sample.csv" is a sample containing 1000 records.
+The CSV has the following format:
+* First column is the timestamp
+* Columns 2 to 17 are the values of the 16 KPIs
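+* The last column is the label. 0 for normal and 1 for abnormal
+
+Timestamp | Kpi1 | ... | Kpi16 | Label
+--- | --- | --- | --- | ---
+20190903000200 | 0.46444977152338346 | ... | -0.588230235 | 0
+20190903000300 | 0.4423121844530866 | ... | -0.595955299 | 0
+20190903000400 | 0.4186436700242946 | ... | -0.600299795 | 0
+20190903000500 | 0.39892597116922146 | ... | -0.604815951 | 1
+20190903000600 | 0.37977501905111494 | ... | -0.610974025 | 1
+20190903000700 | 0.36615750635812155 | ... | -0.616264816 | 1
+
+### Public Dataset
+The public dataset (SMD) used in our evaluation experiments, together with its detailed description, can be found at:
+https://github.com/NetManAIOps/OmniAnomaly
+
+For simplicity, we select 2 of the 28 machine data files, namely "machine-1-2.txt" and "machine-1-3.txt", to conduct the evaluation experiments.
+
+**Note that all KPIs are normalized and we omitted the real name of each KPI for confidentiality, but this does not affect the accuracy of the evaluation experiments.**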
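
The CSV layouts described under "Data format" could be read along the following lines. This is only a sketch under stated assumptions, not code from the repository: the files are assumed to be comma-delimited, a header row may or may not be present, the timestamp format is inferred from sample values such as 20190903000200, and the helper names `read_kpi_rows` and `split_train_test` are hypothetical.

```python
import csv
import io
from datetime import datetime

# Assumed from sample values such as 20181001000500 (YYYYMMDDHHMMSS).
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def read_kpi_rows(handle):
    """Parse rows laid out as: timestamp, one or more KPI values, label."""
    rows = []
    for fields in csv.reader(handle):
        # Assumption: a header row, if present, has a non-numeric first field.
        if not fields or not fields[0].strip().isdigit():
            continue
        timestamp = datetime.strptime(fields[0].strip(), TIMESTAMP_FORMAT)
        values = [float(v) for v in fields[1:-1]]  # 1 value (dataset1) or 16 (dataset2)
        label = int(fields[-1])  # 0 for normal, 1 for abnormal
        rows.append((timestamp, values, label))
    return rows


def split_train_test(rows, train_end):
    """Split at the 1-based index that ends the train period."""
    return rows[:train_end], rows[train_end:]


# Inline sample in the dataset1 layout, copied from the table above.
sample = io.StringIO(
    "20181001000500,0.46444977152338346,0\n"
    "20181001001000,0.4423121844530866,0\n"
    "20181001001500,0.4186436700242946,0\n"
    "20181001002000,0.39892597116922146,1\n"
)
rows = read_kpi_rows(sample)
# For the full data, train_end would be 10656 (dataset1) or 51336 (dataset2).
train, test = split_train_test(rows, train_end=3)
print(len(train), len(test), rows[3][2])
```

The same reader handles both layouts because it treats every column between the first and the last as a KPI value. Combining the 24 per-KPI files of dataset1 into one multivariate series would need an extra join on the timestamp, which is not shown here.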
+ +Timestamp | Kpi1 | ... | Kpi16 | Label +--- | --- | --- | --- | --- +20190903000200 | 0.46444977152338346 | ... | -0.588230235 | 0 | +20190903000300 | 0.4423121844530866 | ... | -0.595955299 | 0 | +20190903000400 | 0.4186436700242946 | ... | -0.600299795 | 0 | +20190903000500 | 0.39892597116922146 | ... | -0.604815951 | 1 | +20190903000600 | 0.37977501905111494 | ... | -0.610974025 | 1 | +20190903000700 | 0.36615750635812155 | ... | -0.616264816 | 1 | + +### Public Dataset +The public dataset (SMD) used in our evaluation experiments as well as its detailed description can be found in web sites: +https://github.com/NetManAIOps/OmniAnomaly + +For simplicity, we select 2 of 28 machine data namely "machine-1-2.txt" and "machine-1-3.txt" to conduct evaluation experiments. + +**Note that all KPIs are normalized and we omitted the real name of each KPI for confidentiality, but this does not affect the accuracy of the evaluation experiments.**