Anomaly Detection in Large Sets of High-Dimensional Symbol Sequences
- Author
- Turkov, Eugene
- Published
- September 2006.
- Physical Description
- 1 electronic document
- Additional Creators
- Budalakoti, Suratna, Srivastava, Ashok N., and Akella, Ram
Online Version
- hdl.handle.net, Connect to this object online.
- Restrictions on Access
- Unclassified, Unlimited, Publicly available.
- Summary
- This paper addresses the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. The approach taken uses unsupervised clustering of sequences using the normalized longest common subsequence (LCS) as a similarity measure, followed by detailed analysis of outliers to detect anomalies. As the LCS measure is expensive to compute, the first part of the paper discusses existing algorithms, such as the Hunt-Szymanski algorithm, that have low time-complexity. We then discuss why these algorithms often do not work well in practice and present a new hybrid algorithm for computing the LCS that, in our tests, outperforms the Hunt-Szymanski algorithm by a factor of five. The second part of the paper presents new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence was deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence, compared to more normal sequences. The algorithms we present are general and domain-independent, so we discuss applications in related areas such as anomaly detection.
- Collection
- NASA Technical Reports Server (NTRS) Collection.
- Note
- Document ID: 20060046368.
NASA/TM-2006-214553.
2006 SIAM International Conference on Data Mining; 20-22 Apr. 2006; Bethesda, MD; United States.
- Terms of Use and Reproduction
- No Copyright.
catkey: 15961605
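
The summary names the normalized longest common subsequence (LCS) as the similarity measure used for clustering. As a rough illustration only, the sketch below computes the LCS length with the textbook dynamic-programming recurrence. It is neither the Hunt-Szymanski algorithm nor the paper's hybrid algorithm. The normalization by the geometric mean of the two sequence lengths is an assumption made here for illustration, since this record does not state the formula, and the function names are hypothetical.

```python
from math import sqrt


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.

    Textbook dynamic programming: O(len(a) * len(b)) time,
    O(min(len(a), len(b))) extra space.
    """
    # Keep the shorter sequence on the inner loop to bound memory use.
    if len(a) < len(b):
        a, b = b, a
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a, b):
    """Similarity in [0, 1]: LCS length over sqrt(len(a) * len(b)).

    Assumption: geometric-mean normalization, chosen for illustration.
    Empty sequences are given similarity 0.0 by convention.
    """
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / sqrt(len(a) * len(b))
```

Any sequence of comparable symbols works as input, for example two strings or two lists of event codes. Identical non-empty sequences score 1.0, and sequences with no symbol in common score 0.0.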