Visible to the public Biblio

Filters: Author is Xiong, Hui  [Clear All Filters]
2018-08-23
Zhang, Kai, Liu, Chuanren, Zhang, Jie, Xiong, Hui, Xing, Eric, Ye, Jieping.  2017.  Randomization or Condensation?: Linear-Cost Matrix Sketching Via Cascaded Compression Sampling Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :615–623.
Matrix sketching is aimed at finding compact representations of a matrix while simultaneously preserving most of its properties, which is a fundamental building block in modern scientific computing. Randomized algorithms represent state-of-the-art and have attracted huge interest from the fields of machine learning, data mining, and theoretic computer science. However, it still requires the use of the entire input matrix in producing desired factorizations, which can be a major computational and memory bottleneck in truly large problems. In this paper, we uncover an interesting theoretic connection between matrix low-rank decomposition and lossy signal compression, based on which a cascaded compression sampling framework is devised to approximate an m-by-n matrix in only O(m+n) time and space. Indeed, the proposed method accesses only a small number of matrix rows and columns, which significantly improves the memory footprint. Meanwhile, by sequentially teaming two rounds of approximation procedures and upgrading the sampling strategy from a uniform probability to more sophisticated, encoding-orientated sampling, significant algorithmic boosting is achieved to uncover more granular structures in the data. Empirical results on a wide spectrum of real-world, large-scale matrices show that by taking only linear time and space, the accuracy of our method rivals those state-of-the-art randomized algorithms consuming a quadratic, O(mn), amount of resources.
2018-06-20
Yang, Sen, Dong, Xin, Sun, Leilei, Zhou, Yichen, Farneth, Richard A., Xiong, Hui, Burd, Randall S., Marsic, Ivan.  2017.  A Data-driven Process Recommender Framework. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :2111–2120.
We present an approach for improving the performance of complex knowledge-based processes by providing data-driven step-by-step recommendations. Our framework uses the associations between similar historic process performances and contextual information to determine the prototypical way of enacting the process. We introduce a novel similarity metric for grouping traces into clusters that incorporates temporal information about activity performance and handles concurrent activities. Our data-driven recommender system selects the appropriate prototype performance of the process based on user-provided context attributes. Our approach for determining the prototypes discovers the commonly performed activities and their temporal relationships. We tested our system on data from three real-world medical processes and achieved recommendation accuracy up to an F1 score of 0.77 (compared to an F1 score of 0.37 using ZeroR) with 63.2% of recommended enactments being within the first five neighbors of the actual historic enactments in a set of 87 cases. Our framework works as an interactive visual analytic tool for process mining. This work shows the feasibility of data-driven decision support system for complex knowledge-based processes.