Biblio

Filters: Keyword is pattern clustering
2020-07-20
Boumiza, Safa, Braham, Rafik.  2019.  An Anomaly Detector for CAN Bus Networks in Autonomous Cars based on Neural Networks. 2019 International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob). :1–6.
The domain of securing in-vehicle networks has attracted both academic and industrial researchers due to high danger of attacks on drivers and passengers. While securing wired and wireless interfaces is important to defend against these threats, detecting attacks is still the critical phase to construct a robust secure system. There are only a few results on securing communication inside vehicles using anomaly-detection techniques despite their efficiencies in systems that need real-time detection. Therefore, we propose an intrusion detection system (IDS) based on Multi-Layer Perceptron (MLP) neural network for Controller Area Networks (CAN) bus. This IDS divides data according to the ID field of CAN packets using K-means clustering algorithm, then it extracts suitable features and uses them to train and construct the neural network. The proposed IDS works for each ID separately and finally it combines their individual decisions to construct the final score and generates alert in the presence of attack. The strength of our intrusion detection method is that it works simultaneously for two types of attacks which will eliminate the use of several separate IDS and thus reduce the complexity and cost of implementation.
2020-07-13
Ahmad, Sahan, Zobaed, SM, Gottumukkala, Raju, Salehi, Mohsen Amini.  2019.  Edge Computing for User-Centric Secure Search on Cloud-Based Encrypted Big Data. 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). :662–669.
Cloud service providers offer a low-cost and convenient solution to host unstructured data. However, cloud services act as third-party solutions and do not provide control of the data to users. This has raised security and privacy concerns for many organizations (users) with sensitive data to utilize cloud-based solutions. User-side encryption can potentially address these concerns by establishing user-centric cloud services and granting data control to the user. Nonetheless, user-side encryption limits the ability to process (e.g., search) encrypted data on the cloud. Accordingly, in this research, we provide a framework that enables processing (in particular, searching) of encrypted multiorganizational (i.e., multi-source) big data without revealing the data to cloud provider. Our framework leverages locality feature of edge computing to offer a user-centric search ability in a realtime manner. In particular, the edge system intelligently predicts the user's search pattern and prunes the multi-source big data search space to reduce the search time. The pruning system is based on efficient sampling from the clustered big dataset on the cloud. For each cluster, the pruning system dynamically samples appropriate number of terms based on the user's search tendency, so that the cluster is optimally represented. We developed a prototype of a user-centric search system and evaluated it against multiple datasets. Experimental results demonstrate 27% improvement in the pruning quality and search accuracy.
2020-07-06
Ben, Yongming, Han, Yanni, Cai, Ning, An, Wei, Xu, Zhen.  2019.  An Online System Dependency Graph Anomaly Detection based on Extended Weisfeiler-Lehman Kernel. MILCOM 2019 - 2019 IEEE Military Communications Conference (MILCOM). :1–6.
Modern operating systems are typical multitasking systems, running multiple tasks at the same time. Therefore, a large number of system calls belonging to different processes are invoked at the same time. By associating these invocations, one can construct the system dependency graph. In rapidly evolving system dependency graphs, how to quickly find outliers is an urgent issue for intrusion detection. Clustering analysis based on graph similarity will help solve this problem. In this paper, an extended Weisfeiler-Lehman (WL) kernel is proposed. Firstly, an embedded vector with indefinite dimensions is constructed based on the original dependency graph. Then, the vector is compressed with Simhash to generate a fingerprint. Finally, anomaly detection based on clustering is carried out according to these fingerprints. Our scheme can achieve prominent detection with high efficiency. For validation, we choose StreamSpot, a relevant prior work, to act as benchmark, and use the same data set to carry out evaluations. Experiments show that our scheme can achieve the highest detection precision of 98% while maintaining a perfect recall performance. Moreover, both quantitative and visual comparisons demonstrate that our scheme clusters better than StreamSpot.
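The fingerprinting step in the abstract above (compress a feature vector with Simhash, then compare fingerprints) can be illustrated with a minimal sketch. This is not the authors' implementation: the edge-label feature names, the use of MD5 as the per-feature hash, and the 64-bit width are all assumptions made for illustration.

```python
import hashlib


def simhash(weighted_features, bits=64):
    """Compress a {feature: weight} mapping into a `bits`-wide integer fingerprint.

    MD5 is used here only as a convenient stable hash (an assumption);
    `bits` must not exceed 128, the MD5 digest width.
    """
    totals = [0.0] * bits
    for feature, weight in weighted_features.items():
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest, "big") & ((1 << bits) - 1)
        for i in range(bits):
            # Each feature votes +weight or -weight on every bit position.
            totals[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical feature vectors: counts of system-call edges in two graphs.
graph_a = {"open->read": 3, "read->write": 2, "write->close": 1}
graph_b = {"open->read": 3, "read->write": 2, "write->close": 1, "fork->exec": 1}
fp_a, fp_b = simhash(graph_a), simhash(graph_b)
print(hamming_distance(fp_a, fp_b))
```

Fingerprints produced this way can be clustered by Hamming distance; how the paper builds its embedded vector from the extended WL kernel is not shown here.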
2020-07-03
Suo, Yucong, Zhang, Chen, Xi, Xiaoyun, Wang, Xinyi, Zou, Zhiqiang.  2019.  Video Data Hierarchical Retrieval via Deep Hash Method. 2019 IEEE 11th International Conference on Communication Software and Networks (ICCSN). :709–714.

Video retrieval technology faces a series of challenges with the tremendous growth in the number of videos. In order to improve the retrieval performance in efficiency and accuracy, a novel deep hash method for video data hierarchical retrieval is proposed in this paper. The approach first uses a cluster-based method to extract key frames, which reduces the workload of subsequent work. On the basis of this, high-level semantic features are extracted from VGG16, a widely used deep convolutional neural network (deep CNN) model. Then we utilize a hierarchical retrieval strategy to improve the retrieval performance, which can be roughly divided into coarse search and fine search. In coarse search, we modify simHash to learn hash codes for faster speed, and in fine search, we use the Euclidean distance to achieve higher accuracy. Finally, we compare our approach with two other methods through practical experiments on two videos, and the results demonstrate that our approach has better retrieval effect.

2020-05-11
Anand Sukumar, J V, Pranav, I, Neetish, MM, Narayanan, Jayasree.  2018.  Network Intrusion Detection Using Improved Genetic k-means Algorithm. 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI). :2441–2446.
The Internet is a platform widely used by people across the globe. This has led to advancement in science and technology. Many surveys show that network intrusion has registered a consistent increase, has led to personal privacy theft, and has become a major platform for attack in recent years. Network intrusion is any unauthorized activity on a computer network. Hence there is a need to develop an effective intrusion detection system. In this paper we present an intrusion detection system that uses the improved genetic k-means algorithm (IGKM) to detect the type of intrusion. This paper also shows a comparison between an intrusion detection system that uses the k-means++ algorithm and one that uses the IGKM algorithm, on a smaller subset of the KDD-99 dataset with a thousand instances and on the KDD-99 dataset. The experiment shows that the intrusion detection system that uses the IGKM algorithm is more accurate than the one that uses the k-means++ algorithm.
2020-05-08
Chaudhary, Anshika, Mittal, Himangi, Arora, Anuja.  2019.  Anomaly Detection using Graph Neural Networks. 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon). :346–350.
Conventional methods for anomaly detection include techniques based on clustering, proximity or classification. With the rapidly growing social networks, outliers or anomalies find ingenious ways to obscure themselves in the network, making the conventional techniques inefficient. In this paper, we utilize the ability of Deep Learning over topological characteristics of a social network to detect anomalies in an email network and a Twitter network. We present a model, Graph Neural Network, which is applied on social connection graphs to detect anomalies. The combinations of various social network statistical measures are taken into account to study the graph structure and functioning of the anomalous nodes by employing deep neural networks on it. The hidden layer of the neural network plays an important role in finding the impact of statistical measure combination in anomaly detection.
2020-04-20
Yuan, Jing, Ou, Yuyi, Gu, Guosheng.  2019.  An Improved Privacy Protection Method Based on k-degree Anonymity in Social Network. 2019 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA). :416–420.
To preserve the privacy of social networks, most existing methods are applied to satisfy different anonymity models, but there are some serious problems such as large information losses and great structural modifications of the original social network. Therefore, an improved privacy protection method called k-subgraph is proposed, which is based on the k-degree anonymous graph derived from k-anonymity to keep the network structure stable. The method firstly divides network nodes into several clusters by a label propagation algorithm, and then reconstructs the sub-graph by means of moving edges to achieve k-degree anonymity. Experimental results show that our k-subgraph method can not only effectively improve the defense capability against malicious attacks based on node degrees, but also maintain stability of network structure. In addition, the cost of information losses due to anonymity is minimized ideally.
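The target property named in the abstract above, k-degree anonymity, has a compact definition: every degree value that occurs in the graph must be shared by at least k nodes. The sketch below only checks that property on an undirected edge list; it does not reproduce the paper's label propagation or edge-moving steps, and the example graphs are hypothetical.

```python
from collections import Counter


def degree_sequence(edges):
    """Map each node appearing in an undirected edge list to its degree.

    Isolated nodes do not appear in an edge list, so they are not counted.
    """
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return dict(degrees)


def is_k_degree_anonymous(edges, k):
    """True if every degree value is shared by at least k nodes."""
    nodes_per_degree = Counter(degree_sequence(edges).values())
    return all(count >= k for count in nodes_per_degree.values())


# Hypothetical graphs: a 4-cycle (all degrees equal) and a 3-leaf star.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
star = [(0, 1), (0, 2), (0, 3)]
print(is_k_degree_anonymous(cycle, 4))  # every node has degree 2
print(is_k_degree_anonymous(star, 2))   # the hub's degree is unique
```

An adversary who knows only a target's degree cannot narrow the candidates below k nodes in a graph that passes this check, which is why the star graph, with its unique hub degree, fails for k = 2.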
2020-04-06
Haoliang, Sun, Dawei, Wang, Ying, Zhang.  2019.  K-Means Clustering Analysis Based on Adaptive Weights for Malicious Code Detection. 2019 IEEE 11th International Conference on Communication Software and Networks (ICCSN). :652–656.

Nowadays, a major challenge to network security is malicious codes. However, manual extraction of features is one of the characteristics of traditional detection techniques, which is inefficient. On the other hand, the features of the content and behavior of the malicious codes are easy to change, resulting in more inefficiency of the traditional techniques. In this paper, a K-Means Clustering Analysis is proposed based on Adaptive Weights (AW-MMKM). Identifying malicious codes in the proposed method is based on four types of network behavior that can be extracted from network traffic, including active, fault, network scanning, and page behaviors. The experimental results indicate that the AW-MMKM can detect malicious codes efficiently with higher accuracy.

2020-03-23
Naik, Nitin, Jenkins, Paul, Savage, Nick.  2019.  A Ransomware Detection Method Using Fuzzy Hashing for Mitigating the Risk of Occlusion of Information Systems. 2019 International Symposium on Systems Engineering (ISSE). :1–6.
Today, a significant threat to organisational information systems is ransomware that can completely occlude the information system by denying access to its data. To reduce this exposure and damage from ransomware attacks, organisations are obliged to concentrate explicitly on the threat of ransomware, alongside their malware prevention strategy. In attempting to prevent the escalation of ransomware attacks, it is important to account for their polymorphic behaviour and dispersion of inexhaustible versions. However, a number of ransomware samples possess similarity as they are created by similar groups of threat actors. A particular threat actor or group often adopts similar practices or codebase to create unlimited versions of their ransomware. As a result of these common traits and codebase, it is probable that new or unknown ransomware variants can be detected based on a comparison with their originating or existing samples. Therefore, this paper presents a detection method for ransomware by employing a similarity preserving hashing method called fuzzy hashing. This detection method is applied on the collected WannaCry or WannaCryptor ransomware corpus utilising three fuzzy hashing methods SSDEEP, SDHASH and mvHASH-B to evaluate the similarity detection success rate by each method. Moreover, their fuzzy similarity scores are utilised to cluster the collected ransomware corpus and its results are compared to determine the relative accuracy of the selected fuzzy hashing methods.
2020-02-26
Naik, Nitin, Jenkins, Paul, Savage, Nick, Yang, Longzhi.  2019.  Cyberthreat Hunting - Part 2: Tracking Ransomware Threat Actors Using Fuzzy Hashing and Fuzzy C-Means Clustering. 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). :1–6.

Threat actors are constantly seeking new attack surfaces, with ransomware being one of the most successful attack vectors that have been used for financial gain. This has been achieved through the dispersion of unlimited polymorphic samples of ransomware whilst those responsible evade detection and hide their identity. Nonetheless, every ransomware threat actor adopts some similar style or uses some common patterns in their malicious code writing, which can be significant evidence contributing to their identification. The first step in attempting to identify the source of the attack is to cluster a large number of ransomware samples based on very little or no information about the samples; accordingly, their traits and signatures can be analysed and identified. Therefore, this paper proposes an efficient fuzzy analysis approach to cluster ransomware samples based on the combination of two fuzzy techniques: fuzzy hashing and fuzzy c-means (FCM) clustering. Unlike other clustering techniques, FCM can directly utilise similarity scores generated by a fuzzy hashing method and cluster them into similar groups without requiring additional transformational steps to obtain distance among objects for clustering. Thus, it reduces the computational overheads by utilising fuzzy similarity scores obtained at the time of initial triaging of whether the sample is known or unknown ransomware. The performance of the proposed fuzzy method is compared against k-means clustering and the two fuzzy hashing methods SSDEEP and SDHASH which are evaluated based on their FCM clustering results to understand how the similarity score affects the clustering results.

Kumar, A. Ranjith, Sivagami, A..  2019.  Balanced Load Clustering with Trusted Multipath Relay Routing Protocol for Wireless Sensor Network. 2019 Innovations in Power and Advanced Computing Technologies (i-PACT). 1:1–6.

Clustering is an eminent mechanism for dealing with a large number of nodes and for effective consumption of energy in wireless sensor networks (WSN). Balanced Load Clustering is used to balance the channel bandwidth by incorporating the concept of HMAC. Presently, several research studies work to improve the quality of service and energy efficiency of WSN, but the security issues are not taken care of. Relay-based multipath trust is one of the methods to secure the network. To this end, a novel approach called Balanced Load Clustering with Trusted Multipath Relay Routing Protocol (BLC-TMR2) is proposed to improve the performance of the network. The proposed protocol consists of two algorithms. First, in order to reduce the energy consumption of the network, the balanced load clustering (BLC) concept is introduced. Second, to secure the network from malicious activity, the trusted multipath relay routing protocol (TMR2) is used. Multipath routing is monitored by the relay node, which computes the trust values. Network simulation (NS2) software is used to obtain the results, and the results show that the proposed system performs better than the earlier methods in terms of efficiency, consumption, QoS and throughput.

2020-02-17
Ezick, James, Henretty, Tom, Baskaran, Muthu, Lethin, Richard, Feo, John, Tuan, Tai-Ching, Coley, Christopher, Leonard, Leslie, Agrawal, Rajeev, Parsons, Ben et al..  2019.  Combining Tensor Decompositions and Graph Analytics to Provide Cyber Situational Awareness at HPC Scale. 2019 IEEE High Performance Extreme Computing Conference (HPEC). :1–7.
This paper describes MADHAT (Multidimensional Anomaly Detection fusing HPC, Analytics, and Tensors), an integrated workflow that demonstrates the applicability of HPC resources to the problem of maintaining cyber situational awareness. MADHAT combines two high-performance packages: ENSIGN for large-scale sparse tensor decompositions and HAGGLE for graph analytics. Tensor decompositions isolate coherent patterns of network behavior in ways that common clustering methods based on distance metrics cannot. Parallelized graph analysis then uses directed queries on a representation that combines the elements of identified patterns with other available information (such as additional log fields, domain knowledge, network topology, whitelists and blacklists, prior feedback, and published alerts) to confirm or reject a threat hypothesis, collect context, and raise alerts. MADHAT was developed using the collaborative HPC Architecture for Cyber Situational Awareness (HACSAW) research environment and evaluated on structured network sensor logs collected from Defense Research and Engineering Network (DREN) sites using HPC resources at the U.S. Army Engineer Research and Development Center DoD Supercomputing Resource Center (ERDC DSRC). To date, MADHAT has analyzed logs with over 650 million entries.
2020-02-10
Taneja, Shubbhi, Zhou, Yi, Chavan, Ajit, Qin, Xiao.  2019.  Improving Energy Efficiency of Hadoop Clusters using Approximate Computing. 2019 IEEE 5th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS). :206–211.
There is an ongoing search for finding energy-efficient solutions in multi-core computing platforms. Approximate computing is one such solution leveraging the forgiving nature of applications to improve the energy efficiency at different layers of the computing platform ranging from applications to hardware. We are interested in understanding the benefits of approximate computing in the realm of Apache Hadoop and its applications. A few mechanisms for introducing approximation in programming models include sampling input data, skipping selective computations, relaxing synchronization, and user-defined quality-levels. We believe that it is straightforward to apply the aforementioned mechanisms to conserve energy in Hadoop clusters as well. The emerging trend of approximate computing motivates us to systematically investigate thermal profiling of approximate computing strategies in this research. In particular, we design a thermal-aware approximate computing framework called tHadoop2, which is an extension of tHadoop proposed by Chavan et al. We investigated the thermal behavior of a MapReduce application called Pi running on Hadoop clusters by varying two input parameters - number of maps and number of sampling points per map. Our profiling results show that Pi exhibits inherent resilience in terms of the number of precision digits present in its value.
Lekha, J., Maheshwaran, J, Tharani, K, Ram, Prathap K, Surya, Murthy K, Manikandan, A.  2019.  Efficient Detection of Spam Messages Using OBF and CBF Blocking Techniques. 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI). :1175–1179.

Emails are a fundamental unit of web applications. There is exponential growth in sending and receiving emails online. However, spam mail has become a serious issue in the email communication environment. There are a number of content-based filtering systems available, namely the content-based filter (CBF), image-based filtering and many other systems, to filter spam messages. The existing technological solution consists of a combination of the Porter stemmer algorithm (PSA) and k-means clustering, which is adaptive in nature. These procedures are expensive in terms of computation and system resources, as they require examination of the entire spam message and computation over the entire content on the server. These filters are also not robust over time, because the nature of spam mail and spamming changes frequently. We propose an origin-based spam mail filtering system, which works on the header data of the mail message regardless of the body content of the mail. It streamlines system and server performance, with higher precision, recall and accuracy than the existing methods. The aim is to design an effective, efficient and autonomous spam detection system that improves network performance against unknown privileged user attacks.

2020-01-27
Xue, Hong, Wang, Jingxuan, Zhang, Miao, Wu, Yue.  2019.  Emergency Severity Assessment Method for Cluster Supply Chain Based on Cloud Fuzzy Clustering Algorithm. 2019 Chinese Control Conference (CCC). :7108–7114.

Aiming at the composite uncertainty characteristics and high-dimensional data stream characteristics of the evaluation index with both ambiguity and randomness, this paper proposes an emergency severity assessment method for cluster supply chain based on a cloud fuzzy clustering algorithm. The summary cloud model generation algorithm is created, and the multi-data fusion method is applied to the cloud model processing of the evaluation indexes for high-dimensional data streams with ambiguity and randomness. The synopsis data of the emergency severity assessment indexes are extracted. Based on the time attenuation model and the sliding window model, the data stream fuzzy clustering algorithm for emergency severity assessment is established. The evaluation results are rationally optimized according to the generalized Euclidean distances of the cluster centers and cluster microcluster weights, and the severity grade of cluster supply chain emergency is dynamically evaluated. The experimental results show that the proposed algorithm improves the clustering accuracy and reduces the operation time, and can provide more accurate theoretical support for the early warning decision of cluster supply chain emergency.

Fuchs, Caro, Spolaor, Simone, Nobile, Marco S., Kaymak, Uzay.  2019.  A Swarm Intelligence Approach to Avoid Local Optima in Fuzzy C-Means Clustering. 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). :1–6.
Clustering analysis is an important computational task that has applications in many domains. One of the most popular algorithms to solve the clustering problem is fuzzy c-means, which exploits notions from fuzzy logic to provide a smooth partitioning of the data into classes, allowing the possibility of multiple membership for each data sample. The fuzzy c-means algorithm is based on the optimization of a partitioning function, which minimizes inter-cluster similarity. This optimization problem is known to be NP-hard and it is generally tackled using a hill climbing method, a local optimizer that provides acceptable but sub-optimal solutions, since it is sensitive to initialization and tends to get stuck in local optima. In this work we propose an alternative approach based on the swarm intelligence global optimization method Fuzzy Self-Tuning Particle Swarm Optimization (FST-PSO). We solve the fuzzy clustering task by optimizing fuzzy c-means' partitioning function using FST-PSO. We show that this population-based metaheuristic is more effective than hill climbing, providing high quality solutions at the cost of additional computational complexity. It is noteworthy that, since this particle swarm optimization algorithm is self-tuning, the user does not have to specify additional hyperparameters for the optimization process.
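For orientation, the conventional local optimizer that the abstract above contrasts with FST-PSO can be sketched as the standard alternating update of fuzzy c-means. This is a minimal one-dimensional sketch, not the paper's method: the fuzzifier m = 2, the fixed iteration count, the sample points and the initial centers are all assumptions chosen for illustration.

```python
def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        # eps avoids division by zero when a point coincides with a center.
        dists = [abs(x - c) + eps for c in centers]
        rows.append(
            [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
        )
    return rows


def fuzzy_c_means_1d(points, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = list(centers)
    for _ in range(iterations):
        u = fcm_memberships(points, centers, m)
        # Center update: mean of the points weighted by u_ij^m.
        centers = [
            sum((row[j] ** m) * x for row, x in zip(u, points))
            / sum(row[j] ** m for row in u)
            for j in range(len(centers))
        ]
    return centers, fcm_memberships(points, centers, m)


# Hypothetical data: two well-separated groups on a line.
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
centers, memberships = fuzzy_c_means_1d(points, centers=[1.0, 9.0])
print(centers)
```

The result depends on the initial centers, which is the sensitivity to initialization the abstract mentions; a global optimizer such as a particle swarm would instead search over candidate center positions.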
2020-01-21
Aljamal, Ibraheem, Tekeoğlu, Ali, Bekiroglu, Korkut, Sengupta, Saumendra.  2019.  Hybrid Intrusion Detection System Using Machine Learning Techniques in Cloud Computing Environments. 2019 IEEE 17th International Conference on Software Engineering Research, Management and Applications (SERA). :84–89.

Intrusion detection is one essential tool towards building a secure and trustworthy Cloud computing environment, given the ubiquitous presence of cyber attacks that proliferate rapidly and morph dynamically. In our current working paradigm of resource, platform and service consolidations, Cloud Computing provides a significant improvement in the cost metrics via dynamic provisioning of IT services. Since almost all cloud computing networks lean on providing their services through the Internet, they are prone to experience a variety of security issues. Therefore, in cloud environments, it is necessary to deploy an Intrusion Detection System (IDS) to detect new and unknown attacks in addition to signature based known attacks, with high accuracy. In our deliberation we assume that a system or a network "anomalous" event is synonymous to an "intrusion" event when there is a significant departure in one or more underlying system or network activities. There are a couple of recently proposed ideas that aim to develop a hybrid detection mechanism, combining advantages of signature-based detection schemes with the ability to detect unknown attacks based on anomalies. In this work, we propose a network based anomaly detection system at the Cloud Hypervisor level that utilizes a hybrid algorithm: a combination of K-means clustering algorithm and SVM classification algorithm, to improve the accuracy of the anomaly detection system. Dataset from UNSW-NB15 study is used to evaluate the proposed approach and results are compared with previous studies. The accuracy for our proposed K-means clustering model is slightly higher than others. However, the accuracy we obtained from the SVM model is still low for supervised techniques.

2020-01-06
Fan, Zexuan, Xu, Xiaolong.  2019.  APDPk-Means: A New Differential Privacy Clustering Algorithm Based on Arithmetic Progression Privacy Budget Allocation. 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). :1737–1742.
How to protect users' private data during network data mining has become a hot issue in the fields of big data and network information security. Most current researches on differential privacy k-means clustering algorithms focus on optimizing the selection of initial centroids. However, the traditional privacy budget allocation has the problem that the random noise becomes too large as the number of iterations increases, which will reduce the performance of data clustering. To solve the problem, we improved the way of privacy budget allocation in differentially private clustering algorithm DPk-means, and proposed APDPk-means, a new differential privacy clustering algorithm based on arithmetic progression privacy budget allocation. APDPk-means decomposes the total privacy budget into a decreasing arithmetic progression, allocating the privacy budgets from large to small in the iterative process, so as to ensure the rapid convergence in early iteration. The experiment results show that compared with the other differentially private k-means algorithms, APDPk-means has better performance in availability and quality of the clustering result under the same level of privacy protection.
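The budget allocation idea in the abstract above (a decreasing arithmetic progression whose terms sum to the total privacy budget) can be sketched as follows. The abstract does not give the first term or the common difference, so the weighting T, T-1, ..., 1 used here is an assumption for illustration, as are the function names and the unit sensitivity.

```python
def arithmetic_budgets(total_epsilon, iterations):
    """Split total_epsilon into a decreasing arithmetic progression.

    Weights T, T-1, ..., 1 are an illustrative choice, not the paper's;
    the shares sum to total_epsilon (sequential composition).
    """
    if total_epsilon <= 0 or iterations < 1:
        raise ValueError("total_epsilon must be positive and iterations >= 1")
    unit = total_epsilon / (iterations * (iterations + 1) / 2)
    return [unit * (iterations - t) for t in range(iterations)]


def laplace_scales(budgets, sensitivity=1.0):
    """Laplace noise scale b = sensitivity / epsilon for each iteration."""
    return [sensitivity / eps for eps in budgets]


budgets = arithmetic_budgets(total_epsilon=1.0, iterations=5)
scales = laplace_scales(budgets)
for t, (eps, b) in enumerate(zip(budgets, scales), start=1):
    print(f"iteration {t}: epsilon={eps:.4f}, noise scale={b:.2f}")
```

Early iterations receive the largest share and therefore the least noise, which matches the abstract's stated goal of rapid convergence in early iterations; later iterations tolerate more noise because the centroids move less.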
Mo, Ran, Liu, Jianfeng, Yu, Wentao, Jiang, Fu, Gu, Xin, Zhao, Xiaoshuai, Liu, Weirong, Peng, Jun.  2019.  A Differential Privacy-Based Protecting Data Preprocessing Method for Big Data Mining. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :693–699.

Analyzing clustering results may lead to the privacy disclosure issue in big data mining. In this paper, we put forward a differential privacy-based protecting data preprocessing method for distance-based clustering. Firstly, the data distortion technique differential privacy is used to prevent the distances in distance-based clustering from disclosing the relationships. Differential privacy may affect the clustering results while protecting privacy. Then an adaptive privacy budget parameter adjustment mechanism is applied for keeping the balance between the privacy protection and the clustering results. By solving the maximum and minimum problems, the differential privacy budget parameter can be obtained for different clustering algorithms. Finally, we conduct extensive experiments to evaluate the performance of our proposed method. The results demonstrate that our method can provide privacy protection with precise clustering results.

2019-12-18
Dincalp, Uygar, Güzel, Mehmet Serdar, Sevine, Omer, Bostanci, Erkan, Askerzade, Iman.  2018.  Anomaly Based Distributed Denial of Service Attack Detection and Prevention with Machine Learning. 2018 2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT). :1–4.

Every day, DoS/DDoS attacks are increasing all over the world, and the methods attackers use are changing continuously. This increase and variety in attacks affect governments, institutions, organizations and corporations adversely. Every successful attack causes them to lose money and reputation. This paper presents an introduction to a method that can show what the attack is and where the attack is based. This is attempted by applying the clustering algorithm DBSCAN to network traffic, because of the change and variety in attack vectors.

2019-12-16
Wu, Jimmy Ming-Tai, Chun-Wei Lin, Jerry, Djenouri, Youcef, Fournier-Viger, Philippe, Zhang, Yuyu.  2019.  A Swarm-based Data Sanitization Algorithm in Privacy-Preserving Data Mining. 2019 IEEE Congress on Evolutionary Computation (CEC). :1461–1467.
In recent decades, privacy-preserving data mining (PPDM), which not only hides information but also provides information that is useful for making decisions, has become a critical concern. We present a sanitization algorithm that considers four side effects, based on multi-objective PSO and hierarchical clustering methods, to find optimized solutions for PPDM. Experiments showed that compared to existing approaches, the designed sanitization algorithm based on the hierarchical clustering method achieves satisfactory performance in terms of hiding failure, missing cost, and artificial cost.
2019-12-09
Yang, Chao, Chen, Xinghe, Song, Tingting, Jiang, Bin, Liu, Qin.  2018.  A Hybrid Recommendation Algorithm Based on Heuristic Similarity and Trust Measure. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :1413–1418.
In this paper, we propose a hybrid collaborative filtering recommendation algorithm based on heuristic similarity and trust measure, in order to alleviate the problem of data sparsity, cold start and trust measure. Firstly, a new similarity measure is implemented by weighted fusion of multiple similarity influence factors obtained from the rating matrix, so that the similarity measure becomes more accurate. Then, a user trust relationship computing model is implemented by constructing the user's trust network based on the trust propagation theory. On this basis, a SIMT collaborative filtering algorithm is designed which integrates trust and similarity instead of the similarity in traditional collaborative filtering algorithm. Further, an improved K nearest neighbor recommendation based on clustering algorithm is implemented for generation of a better recommendation list. Finally, a comparative experiment on FilmTrust dataset shows that the proposed algorithm has improved the quality and accuracy of recommendation, thus overcome the problem of data sparsity, cold start and trust measure to a certain extent.
2019-11-25
Zuin, Gianlucca, Chaimowicz, Luiz, Veloso, Adriano.  2018.  Learning Transferable Features For Open-Domain Question Answering. 2018 International Joint Conference on Neural Networks (IJCNN). :1–8.
Corpora used to learn open-domain Question-Answering (QA) models are typically collected from a wide variety of topics or domains. Since QA requires understanding natural language, open-domain QA models generally need very large training corpora. A simple way to alleviate data demand is to restrict the domain covered by the QA model, thus leading to domain-specific QA models. While learning improved QA models for a specific domain is still challenging due to the lack of sufficient training data in the topic of interest, additional training data can be obtained from related topic domains. Thus, instead of learning a single open-domain QA model, we investigate domain adaptation approaches in order to create multiple improved domain-specific QA models. We demonstrate that this can be achieved by stratifying the source dataset, without the need to search for complementary data, unlike many other domain adaptation approaches. We propose a deep architecture that jointly exploits convolutional and recurrent networks for learning domain-specific features while transferring domain-shared features. That is, we use transferable features to enable model adaptation from multiple source domains. We consider different transference approaches designed to learn span-level and sentence-level QA models. We found that domain adaptation greatly improves sentence-level QA performance, and that span-level QA benefits from sentence information. Finally, we also show that a simple clustering algorithm may be employed when the topic domains are unknown, and that the resulting loss in accuracy is negligible.
2019-10-28
Trunov, Artem S., Voronova, Lilia I., Voronov, Vyacheslav I., Ayrapetov, Dmitriy P..  2018.  Container Cluster Model Development for Legacy Applications Integration in Scientific Software System. 2018 IEEE International Conference "Quality Management, Transport and Information Security, Information Technologies" (IT QM IS). :815–819.
A feature of modern scientific information systems is their integration with computing applications, providing distributed computer simulation and intelligent processing of Big Data using high-performance computing. Often these software systems include legacy applications written in different programming languages, with non-standardized interfaces. To solve the problem of application integration, containerization systems are used, which allow the environment to be configured and the software system deployed in the shortest time. However, there are no such systems for computer simulation systems with a large number of nodes. The article considers the current task of combining containers into a cluster and integrating legacy applications to manage the distributed software system MD-SLAG-MELT v.14, which supports high-performance computing and visualization of the results of computer experiments. Test results for the container cluster, including an automatic load-sharing module for the MD-SLAG-MELT v.14 system, are given.
2019-10-15
Panagiotakis, C., Papadakis, H., Fragopoulou, P..  2018.  Detection of Hurriedly Created Abnormal Profiles in Recommender Systems. 2018 International Conference on Intelligent Systems (IS). :499–506.
Recommender systems try to predict the preferences of users for specific items. These systems suffer from profile injection attacks, where the attackers have some prior knowledge of the system ratings and their goal is to promote or demote a particular item by introducing abnormal (anomalous) ratings. The detection of both cases is a challenging problem. In this paper, we propose a framework to spot anomalous rating profiles (outliers), where the outliers hurriedly create a profile that injects into the system either random ratings or specific ratings, without any prior knowledge of the existing ratings. The proposed detection method is based on the unpredictable behavior of the outliers in a validation set, on the user-item rating matrix and on the similarity between users. The proposed system is fully unsupervised, and in the last step it uses the k-means clustering method to automatically spot the spurious profiles. For the cases where labeled sample data are available, a random forest classifier is trained to show how supervised methods outperform unsupervised ones. Experimental results on the MovieLens 100k and the MovieLens 1M datasets demonstrate the high performance of the proposed schemata.
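The final unsupervised step described above (k-means separating spurious profiles from genuine ones) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: it assumes each user has already been reduced to a single score, here a hypothetical mean absolute prediction error on a validation set, and it assumes two clusters with the higher-error cluster treated as suspicious. The function names are invented for the example.

```python
def two_means(values, max_iter=100):
    """Plain two-cluster k-means on scalars; returns (labels, centroids).

    Centroids start at the minimum and maximum value, so the result is
    deterministic and cluster 1 always holds the larger values.
    """
    centroids = [min(values), max(values)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in values
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


def flag_suspicious(errors):
    """errors: dict user -> per-user unpredictability score (assumed feature).

    Returns the set of users in the cluster with the higher centroid.
    """
    users = list(errors)
    labels, centroids = two_means([errors[u] for u in users])
    suspicious = 0 if centroids[0] > centroids[1] else 1
    return {u for u, lab in zip(users, labels) if lab == suspicious}
```

For example, `flag_suspicious({"a": 0.4, "b": 0.5, "c": 0.45, "x": 2.1, "y": 1.9})` returns `{"x", "y"}`, the two users whose ratings are hardest to predict. When every score is identical the sketch flags no one.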