Zhang, Kai, Liu, Chuanren, Zhang, Jie, Xiong, Hui, Xing, Eric, Ye, Jieping.
2017.

Randomization or Condensation?: Linear-Cost Matrix Sketching Via Cascaded Compression Sampling Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :615–623.

Matrix sketching is aimed at finding compact representations of a matrix while simultaneously preserving most of its properties, which is a fundamental building block in modern scientific computing. Randomized algorithms represent state-of-the-art and have attracted huge interest from the fields of machine learning, data mining, and theoretic computer science. However, it still requires the use of the entire input matrix in producing desired factorizations, which can be a major computational and memory bottleneck in truly large problems. In this paper, we uncover an interesting theoretic connection between matrix low-rank decomposition and lossy signal compression, based on which a cascaded compression sampling framework is devised to approximate an m-by-n matrix in only O(m+n) time and space. Indeed, the proposed method accesses only a small number of matrix rows and columns, which significantly improves the memory footprint. Meanwhile, by sequentially teaming two rounds of approximation procedures and upgrading the sampling strategy from a uniform probability to more sophisticated, encoding-orientated sampling, significant algorithmic boosting is achieved to uncover more granular structures in the data. Empirical results on a wide spectrum of real-world, large-scale matrices show that by taking only linear time and space, the accuracy of our method rivals those state-of-the-art randomized algorithms consuming a quadratic, O(mn), amount of resources.

Luo, Chen, Chen, Zhengzhang, Tang, Lu-An, Shrivastava, Anshumali, Li, Zhichun, Chen, Haifeng, Ye, Jieping.
2018.

TINET: Learning Invariant Networks via Knowledge Transfer. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. :1890-1899.

The latent behavior of an information system that can exhibit extreme events, such as system faults or cyber-attacks, is complex. Recently, the invariant network has shown to be a powerful way of characterizing complex system behaviors. Structures and evolutions of the invariance network, in particular, the vanishing correlations, can shed light on identifying causal anomalies and performing system diagnosis. However, due to the dynamic and complex nature of real-world information systems, learning a reliable invariant network in a new environment often requires continuous collecting and analyzing the system surveillance data for several weeks or even months. Although the invariant networks learned from old environments have some common entities and entity relationships, these networks cannot be directly borrowed for the new environment due to the domain variety problem. To avoid the prohibitive time and resource consuming network building process, we propose TINET, a knowledge transfer based model for accelerating invariant network construction. In particular, we first propose an entity estimation model to estimate the probability of each source domain entity that can be included in the final invariant network of the target domain. Then, we propose a dependency construction model for constructing the unbiased dependency relationships by solving a two-constraint optimization problem. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of TINET. We also apply TINET to a real enterprise security system for intrusion detection. TINET achieves superior detection performance at least 20 days lead-lag time in advance with more than 75% accuracy.