README.md

# InterFusion

数据github地址：https://github.com/zhhlee/InterFusion

**KDD 2021: Multivariate Time Series Anomaly Detection and Interpretation using Hierarchical Inter-Metric and Temporal Embedding**

InterFusion is an unsupervised MTS anomaly detection and interpretation method. It's core idea is to model the normal patterns of MTS using HVAE with jointly trained hierarchical stochastic latent variables, each of which explicitly learns low-dimensional inter-metric or temporal embeddings. You may refer to our [paper](https://dl.acm.org/doi/abs/10.1145/3447548.3467075) for more details.

## Getting Started

**Clone the repo**

```bash
git clone https://github.com/zhhlee/InterFusion.git && cd InterFusion
```

**Get data**

The datasets used in this paper are in folder ``data``. You may refer to ``data/Dataset Description`` for more details.

**Install dependencies (with python 3.6+)**

(virtualenv is recommended)

```bash
pip install -r requirements.txt
```

The code is tested under the following basic environments:

```
OS: Ubuntu 18.04
GPU: GTX 1080 Ti
Cuda: 9.0.176
Python: 3.6.6
```

**Run the code**

Please set the root directory of the project as your Python path.

For dataset ASD and SMD:

```bash
python algorithm/stack_train.py --dataset=omi-1			# training
python algorithm/stack_predict.py --load_model_dir=./results/stack_train/	# evaluation
```

For dataset SWaT and WADI (Note: you need to acquire these datasets first following ``data/Dataset Description`` and ``explib/raw_data_converter``):

SWaT:

```bash
python algorithm/stack_train.py --dataset=SWaT --train.train_start=21600 --train.valid_portion=0.1 --model.window_length=30 '--model.output_shape=[15, 15, 30]' --model.z2_dim=8	# training
python algorithm/stack_predict.py --load_model_dir=./results/stack_train/ --mcmc_track=False	# evaluation
```

WADI:

```bash
python algorithm/stack_train.py --dataset=WADI --train.train_start=259200 --train.max_train_size=789371 --train.valid_portion=0.1 --model.window_length=30 '--model.output_shape=[15, 15, 30]' --model.z2_dim=8 # training
python algorithm/stack_predict.py --load_model_dir=./results/stack_train/ --mcmc_track=False	# evaluation
```

The default model configurations are in ``algorithm/InterFusion.py``, train configs in ``algorithm/stack_train.py``, and evaluation configs in ``algorithm/stack_predict.py``. You may overwrite the configs using command line args. For example:

```bash
python algorithm/stack_train.py --dataset=omi-1 --model.z_dim=5 --train.batch_size=128
python algorithm/stack_predict.py --load_model_dir=./results/stack_train/ --test_batch_size=100
```

**Run on your own dataset**

1. Put your train/test/label files under ``data/processed`` folder. e.g., ``ds_train.pkl``, ``ds_test.pkl``, ``ds_test_label.pkl`` with shape ``(train_length, feature_dim)``, ``(test_length, feature_dim)``, ``(test_length,)``, respectively.
2. Put the interpretation files (optional) under ``data/interpretation_label`` folder.
3. Edit ``get_data_dim`` in ``algorithm/utils.py`` to add your dataset info.
4. Run the code following the instructions above.

**Results**

After running the algorithm, the results are shown in the ``results`` folder. The main results are:

```bash
Model: results/stack_train/result_params/
Training config: results/stack_train/config.json
Testing config: results/stack_predict/config.json
Testing statistics: results/stack_predict/result.json
```

If you find this code useful for your research, please cite our paper:

```bibTex
@inproceedings{li2021multivariate,
  title={Multivariate Time Series Anomaly Detection and Interpretation using Hierarchical Inter-Metric and Temporal Embedding},
  author={Li, Zhihan and Zhao, Youjian and Han, Jiaqi and Su, Ya and Jiao, Rui and Wen, Xidao and Pei, Dan},
  booktitle={Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
  pages={3220--3230},
  year={2021}
}
```