Biblio

Keyword: data handling
Olaimat, M. Al, Lee, D., Kim, Y., Kim, J., Kim, J..  2020.  A Learning-based Data Augmentation for Network Anomaly Detection. 2020 29th International Conference on Computer Communications and Networks (ICCCN). :1–10.
While machine learning technologies have been remarkably advanced over the past several years, one of the fundamental requirements for the success of learning-based approaches would be the availability of high-quality data that thoroughly represent individual classes in a problem space. Unfortunately, it is not uncommon to observe a significant degree of class imbalance with only a few instances for minority classes in many datasets, including network traffic traces highly skewed toward a large number of normal connections while very small in quantity for attack instances. A well-known approach to addressing the class imbalance problem is data augmentation that generates synthetic instances belonging to minority classes. However, traditional statistical techniques may be limited since the extended data through statistical sampling should have the same density as original data instances with a minor degree of variation. This paper takes a learning-based approach to data augmentation to enable effective network anomaly detection. One of the critical challenges for the learning-based approach is the mode collapse problem resulting in a limited diversity of samples, which was also observed from our preliminary experimental result. To this end, we present a novel "Divide-Augment-Combine" (DAC) strategy, which groups the instances based on their characteristics and augments data on a group basis to represent a subset independently using a generative adversarial model. Our experimental results conducted with two recently collected public network datasets (UNSW-NB15 and IDS-2017) show that the proposed technique enhances performances up to 21.5% for identifying network anomalies.
Zhang, M., Wei, T., Li, Z., Zhou, Z..  2020.  A service-oriented adaptive anonymity algorithm. 2020 39th Chinese Control Conference (CCC). :7626–7631.

Recently, a large number of research studies on privacy-preserving data publishing have been conducted. We find that most K-anonymity algorithms fail to consider the distribution characteristics of attribute values in the data and the differences in the contribution values of quasi-identifier attributes in service-oriented settings. In this paper, the importance of the distribution characteristics of attribute values and of the differences in the contribution values of quasi-identifier attributes to anonymization results is illustrated. In order to maximize the utility of released data, a service-oriented adaptive anonymity algorithm is proposed. We establish a model of reaction dispersion degree to quantify the characteristics of attribute value distribution and introduce the concept of utility weight related to the contribution value of quasi-identifier attributes. The priority coefficient and the characterization coefficient of partition quality are defined to adaptively optimize the selection strategies for the dimension and splitting value in the anonymity group partition process, which reduces unnecessary information loss and further improves the utility of anonymized data. The rationality and validity of the algorithm are verified by theoretical analysis and multiple experiments.

Xin, B., Yang, W., Geng, Y., Chen, S., Wang, S., Huang, L..  2020.  Private FL-GAN: Differential Privacy Synthetic Data Generation Based on Federated Learning. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :2927–2931.
The Generative Adversarial Network (GAN) has already made a big splash in the field of generating realistic "fake" data. However, when data is distributed and data holders are reluctant to share data for privacy reasons, training a GAN is difficult. To address this issue, we propose private FL-GAN, a differential privacy generative adversarial network model based on federated learning. By strategically combining the Lipschitz limit with the differential privacy sensitivity, the model can generate high-quality synthetic data without sacrificing the privacy of the training data. We theoretically prove that private FL-GAN can provide a strict privacy guarantee with differential privacy, and experimentally demonstrate that our model can generate satisfactory data.
Flores, Pedro, Farid, Munsif, Samara, Khalid.  2019.  Assessing E-Security Behavior among Students in Higher Education. 2019 Sixth HCT Information Technology Trends (ITT). :253–258.
This study was conducted in order to assess the E-security behavior of students in a large higher education institution in the United Arab Emirates (UAE). Specifically, it sought to determine the current state of students' E-security behavior in the aspects of malware, password usage, data handling, phishing, social engineering, and online scams. An E-Security Behavior Survey Instrument (EBSI) was used to determine the status of the security behavior of the participants in their computing activities. To complement the survey tool, focus group discussions were conducted to elicit specific experiences and insights of the participants relative to E-security. The results of the study show that the overall E-security behavior among students in higher education in the UAE is moderately favorable. Specifically, the investigation reveals that the students behave favorably when it comes to phishing, social engineering, and online scams. However, their behavior is uncertain regarding malware issues, password usage, and data handling.
Rudd-Orthner, Richard N M, Mihaylova, Lyudmilla.  2019.  An Algebraic Expert System with Neural Network Concepts for Cyber, Big Data and Data Migration. 2019 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). :1–6.

This paper describes a machine assistance approach to grading decisions for values that might be missing or need validation, using a mathematical algebraic form of an Expert System instead of the traditional textual or logic forms, and builds a neural network computational graph structure. This Expert System approach is also structured into a neural-network-like format of input, hidden and output layers that provides a structured approach to the knowledge-base organization; this provides a useful abstraction for reuse in data migration applications in big data, Cyber and relational databases. The approach is further enhanced with a Bayesian probability tree approach to grade the confidences of value probabilities, instead of the traditional grading of the rule probabilities, and estimates the most probable value in light of all evidence presented. This is groundwork for a Machine Learning (ML) expert system approach in a form that is closer to a Neural Network node structure.

Parafita, Álvaro, Vitrià, Jordi.  2019.  Explaining Visual Models by Causal Attribution. 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). :4167–4175.

Model explanations based on pure observational data cannot compute the effects of features reliably, due to their inability to estimate how each factor alteration could affect the rest. We argue that explanations should be based on the causal model of the data and the derived intervened causal models, that represent the data distribution subject to interventions. With these models, we can compute counterfactuals, new samples that will inform us how the model reacts to feature changes on our input. We propose a novel explanation methodology based on Causal Counterfactuals and identify the limitations of current Image Generative Models in their application to counterfactual creation.

Garg, Hittu, Dave, Mayank.  2019.  Securing User Access at IoT Middleware Using Attribute Based Access Control. 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT). :1–6.
IoT middleware is an additional layer between IoT devices and the cloud applications that reduces computation and data handling on the cloud. In a typical IoT system model, middleware primarily connects to different IoT devices via an IoT gateway. Device data stored on middleware is sensitive and private to a user. Middleware must have built-in mechanisms to address these issues, as well as the implementation of user authentication and access control. This paper presents the current methods used for access control on middleware and introduces attribute-based encryption (ABE) on middleware for access control. ABE combines access control with data encryption for ensuring the integrity of data. In this paper, we propose a ciphertext-policy attribute-based encryption (CP-ABE) scheme on the middleware layer in the IoT system architecture for user access control. The proposed scheme aims to provide security and efficiency while reducing complexity on middleware. We have used the AVISPA tool to strengthen the proposed scheme.
Yang, Ying, Yu, Huanhuan, Yang, Lina, Yang, Ming, Chen, Lijuan, Zhu, Guichun, Wen, Liqiang.  2019.  Hadoop-based Dark Web Threat Intelligence Analysis Framework. 2019 IEEE 3rd Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). :1088–1091.

With the development of network services, people's privacy requirements continue to increase. In addition to providing anonymous user communication, it is necessary to protect the anonymity of the server. At the same time, there are many threatening crime messages in the dark network. However, many scholars lack the ability or expertise to conduct research on dark-net threat intelligence. Therefore, this paper designs a Hadoop-based framework for hidden threat intelligence. The framework uses HDFS as the underlying storage system to build an HBase-based distributed database to store and manage threat intelligence information. According to the heterogeneous types of forums, a web crawler is used to collect data through the anonymous TOR tool. The framework is used to identify the characteristics of key dark network criminal networks, which is the basis for later dark network research.

Souza, Renan, Azevedo, Leonardo, Lourenço, Vítor, Soares, Elton, Thiago, Raphael, Brandão, Rafael, Civitarese, Daniel, Brazil, Emilio, Moreno, Marcio, Valduriez, Patrick et al..  2019.  Provenance Data in the Machine Learning Lifecycle in Computational Science and Engineering. 2019 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS). :1–10.
Machine Learning (ML) has become essential in several industries. In Computational Science and Engineering (CSE), the complexity of the ML lifecycle comes from the large variety of data, scientists' expertise, tools, and workflows. If data are not tracked properly during the lifecycle, it becomes infeasible to recreate an ML model from scratch or to explain to stakeholders how it was created. The main limitation of provenance tracking solutions is that they cannot cope with provenance capture and integration of domain and ML data processed in the multiple workflows in the lifecycle, while keeping the provenance capture overhead low. To handle this problem, in this paper we contribute a detailed characterization of provenance data in the ML lifecycle in CSE; a new provenance data representation, called PROV-ML, built on top of W3C PROV and ML Schema; and extensions to a system that tracks provenance from multiple workflows to address the characteristics of ML and CSE, and to allow for provenance queries with a standard vocabulary. We show a practical use in a real case in the O&G industry, along with its evaluation using 239,616 CUDA cores in parallel.
Miao, Hui, Deshpande, Amol.  2019.  Understanding Data Science Lifecycle Provenance via Graph Segmentation and Summarization. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :1710–1713.
Increasingly modern data science platforms today have non-intrusive and extensible provenance ingestion mechanisms to collect rich provenance and context information, handle modifications to the same file using distinguishable versions, and use graph data models (e.g., property graphs) and query languages (e.g., Cypher) to represent and manipulate the stored provenance/context information. Due to the schema-later nature of the metadata, multiple versions of the same files, and unfamiliar artifacts introduced by team members, the resulting "provenance graphs" are quite verbose and evolving; further, it is very difficult for the users to compose queries and utilize this valuable information just using standard graph query model. In this paper, we propose two high-level graph query operators to address the verboseness and evolving nature of such provenance graphs. First, we introduce a graph segmentation operator, which queries the retrospective provenance between a set of source vertices and a set of destination vertices via flexible boundary criteria to help users get insight about the derivation relationships among those vertices. We show the semantics of such a query in terms of a context-free grammar, and develop efficient algorithms that run orders of magnitude faster than state-of-the-art. Second, we propose a graph summarization operator that combines similar segments together to query prospective provenance of the underlying project. The operator allows tuning the summary by ignoring vertex details and characterizing local structures, and ensures the provenance meaning using path constraints. We show the optimal summary problem is PSPACE-complete and develop effective approximation algorithms. We implement the operators on top of Neo4j, evaluate our query techniques extensively, and show the effectiveness and efficiency of the proposed methods.
Talluri, Sacheendra, Iosup, Alexandru.  2019.  Efficient Estimation of Read Density When Caching for Big Data Processing. IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). :502–507.

Big data processing systems are becoming increasingly present in cloud workloads. Consequently, they are starting to incorporate more sophisticated mechanisms from traditional database and distributed systems. We focus in this work on the use of caching policies, which for big data raise important new challenges. Not only must they respond to new variants of the trade-off between hit rate, response time, and the space consumed by the cache, but they must do so at possibly higher volume and velocity than web and database workloads. Previous caching policies have not been tested experimentally with big data workloads. We address these challenges in this work. We propose the Read Density family of policies, which is a principled approach to quantify the utility of cached objects through a family of utility functions that depend on the frequency of reads of an object. We further design the Approximate Histogram, which is a policy-based technique based on an array of counters. This technique promises to achieve runtime- and space-efficient computation of the metric required by the cache policy. We evaluate through trace-based simulation the caching policies from the Read Density family, and compare them with over ten state-of-the-art alternatives. We use two workload traces representative of big data processing, collected from commercial Spark and MapReduce deployments. While we achieve comparable performance to the state of the art with fewer parameters, meaningful performance improvements for big data workloads remain elusive.

Liu, Ying, He, Qiang, Zheng, Dequan, Zhang, Mingwei, Chen, Feifei, Zhang, Bin.  2019.  Data Caching Optimization in the Edge Computing Environment. 2019 IEEE International Conference on Web Services (ICWS). :99–106.

With the rapid increase in the use of mobile devices in people's daily lives, mobile data traffic has exploded in recent years. In the edge computing environment where edge servers are deployed around mobile users, caching popular data on edge servers can ensure mobile users' fast access to those data and reduce the data traffic between mobile users and the centralized cloud. Existing studies consider the data caching problem with a focus on the reduction of network delay and the improvement of mobile devices' energy efficiency. In this paper, we attack the data caching problem in the edge computing environment from the perspective of service providers, who would like to maximize their revenue from caching their data. This problem is complicated because data caching produces benefits at a cost and there is usually a trade-off between the two. We formulate the data caching problem as an integer programming problem and maximize the revenue of the service provider while satisfying a constraint on data access latency. Extensive experiments are conducted on a real-world dataset that contains the locations of edge servers and mobile users, and the results reveal that our approach significantly outperforms the baseline approaches.

Taneja, Shubbhi, Zhou, Yi, Chavan, Ajit, Qin, Xiao.  2019.  Improving Energy Efficiency of Hadoop Clusters using Approximate Computing. 2019 IEEE 5th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS). :206–211.
There is an ongoing search for finding energy-efficient solutions in multi-core computing platforms. Approximate computing is one such solution leveraging the forgiving nature of applications to improve the energy efficiency at different layers of the computing platform ranging from applications to hardware. We are interested in understanding the benefits of approximate computing in the realm of Apache Hadoop and its applications. A few mechanisms for introducing approximation in programming models include sampling input data, skipping selective computations, relaxing synchronization, and user-defined quality-levels. We believe that it is straightforward to apply the aforementioned mechanisms to conserve energy in Hadoop clusters as well. The emerging trend of approximate computing motivates us to systematically investigate thermal profiling of approximate computing strategies in this research. In particular, we design a thermal-aware approximate computing framework called tHadoop2, which is an extension of tHadoop proposed by Chavan et al. We investigated the thermal behavior of a MapReduce application called Pi running on Hadoop clusters by varying two input parameters - number of maps and number of sampling points per map. Our profiling results show that Pi exhibits inherent resilience in terms of the number of precision digits present in its value.
Shang, Chengya, Bao, Xianqiang, Fu, Lijun, Xia, Li, Xu, Xinghua, Xu, Chengcheng.  2019.  A Novel Key-Value Based Real-Time Data Management Framework for Ship Integrated Power Cyber-Physical System. 2019 IEEE Innovative Smart Grid Technologies - Asia (ISGT Asia). :854–858.
The new generation ship integrated power system (IPS) realizes high-level informatization for various physical equipment and is gradually developing into a cyber-physical system (CPS). The future trend is collecting ship big data to achieve data-driven intelligence for IPS. However, the traditional relational data management framework is inefficient at handling real-time data processing in the ship integrated power cyber-physical system. In order to process, within acceptable latency, the large-scale real-time data collected from numerous sensors over the field bus of IPS devices, especially semi-structured and non-structured data, this paper proposes a novel real-time data management framework based on a key-value data model, which enables batch processing and distributed deployment to achieve time efficiency as well as system scalability. We implement a real-time data management prototype system based on an open source in-memory key-value store. Finally, the evaluation results from the prototype verify the advantages of the novel framework compared with the traditional solution.
Alemán, Concepción Sánchez, Pissinou, Niki, Alemany, Sheila, Boroojeni, Kianoosh, Miller, Jerry, Ding, Ziqian.  2018.  Context-Aware Data Cleaning for Mobile Wireless Sensor Networks: A Diversified Trust Approach. 2018 International Conference on Computing, Networking and Communications (ICNC). :226–230.

In mobile wireless sensor networks (MWSN), data imprecision is a common problem. Decision making in real time applications may be greatly affected by a minor error. Even though there are many existing techniques that take advantage of the spatio-temporal characteristics exhibited in mobile environments, few measure the trustworthiness of sensor data accuracy. We propose a unique online context-aware data cleaning method that measures trustworthiness by employing an initial candidate reduction through the analysis of trust parameters used in financial markets theory. Sensors with similar trajectory behaviors are assigned trust scores estimated through the calculation of “betas” for finding the most accurate data to trust. Instead of devoting all the trust into a single candidate sensor's data to perform the cleaning, a Diversified Trust Portfolio (DTP) is generated based on the selected set of spatially autocorrelated candidate sensors. Our results show that samples cleaned by the proposed method exhibit lower percent error when compared to two well-known and effective data cleaning algorithms in tested outdoor and indoor scenarios.

Yang, C., Li, Z., Qu, W., Liu, Z., Qi, H..  2017.  Grid-Based Indexing and Search Algorithms for Large-Scale and High-Dimensional Data. 2017 14th International Symposium on Pervasive Systems, Algorithms and Networks 2017 11th International Conference on Frontier of Computer Science and Technology 2017 Third International Symposium of Creative Computing (ISPAN-FCST-ISCC). :46–51.

The rapid development of the Internet has recently resulted in massive information overload. This information is usually represented by high-dimensional feature vectors in many related applications such as recognition, classification and retrieval. These applications usually need efficient indexing and search methods for such large-scale and high-dimensional databases, which is typically a challenging task. Some efforts have been made and have solved this problem to some extent. However, most of them are implemented on a single machine, which is not suitable for handling large-scale databases. In this paper, we present a novel data index structure and nearest neighbor search algorithm implemented on Apache Spark. We impose a grid on the database and index data by non-empty grid cells. This grid-based index structure is simple and easy to implement in parallel. Moreover, we propose to build a scalable KNN graph on the grids, which increases the efficiency of this index structure at a low cost in parallel implementation. Finally, experiments are conducted on both public databases and synthetic databases, showing that the proposed methods achieve overall high performance in both efficiency and accuracy.
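
The idea of indexing data by non-empty grid cells can be illustrated with a small single-machine sketch. This is not the paper's Spark implementation: the function names, the uniform `cell_size`, and the lookup that only scans the query's cell and its adjacent cells are assumptions made for illustration, and that lookup is approximate rather than an exact nearest neighbor search.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def cell_of(point, cell_size):
    """Return the integer grid coordinates of the cell containing `point`."""
    return tuple(floor(x / cell_size) for x in point)


def build_grid_index(points, cell_size):
    """Map each non-empty grid cell to the indices of the points inside it."""
    index = defaultdict(list)
    for i, p in enumerate(points):
        index[cell_of(p, cell_size)].append(i)
    return dict(index)


def search_neighborhood(points, index, query, cell_size):
    """Return the index of the closest point found in the query's cell and
    its adjacent cells, or None if all of those cells are empty.
    Enumerating 3**d cells is only practical for small d (assumption)."""
    home = cell_of(query, cell_size)
    best, best_d = None, float("inf")
    for offset in product((-1, 0, 1), repeat=len(home)):
        cell = tuple(c + o for c, o in zip(home, offset))
        for i in index.get(cell, ()):
            d = dist(points[i], query)
            if d < best_d:
                best, best_d = i, d
    return best


# Hypothetical toy data: three 2-D points on a grid with unit-sized cells.
points = [(0.1, 0.2), (0.9, 0.8), (2.5, 2.5)]
index = build_grid_index(points, cell_size=1.0)
nearest = search_neighborhood(points, index, (0.8, 0.9), cell_size=1.0)
```

Only occupied cells appear as keys in `index`, so memory grows with the number of points rather than with the volume of the grid.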

Siddiqi, M., Ali, S. T., Sivaraman, V..  2017.  Secure Lightweight Context-Driven Data Logging for Bodyworn Sensing Devices. 2017 5th International Symposium on Digital Forensic and Security (ISDFS). :1–6.

Rapid advancement in wearable technology has unlocked a tremendous potential of its applications in the medical domain. Among the challenges in making the technology more useful for medical purposes is the lack of confidence in the data thus generated and communicated. Incentives have led to attacks on such systems. We propose a novel lightweight scheme to securely log the data from bodyworn sensing devices by utilizing neighboring devices as witnesses who store the fingerprints of data in Bloom filters to be later used for forensics. Medical data from each sensor is stored at various locations of the system in chronological epoch-level blocks chained together, similar to the blockchain. Besides secure logging, the scheme offers to secure other contextual information such as localization and timestamping. We prove the effectiveness of the scheme through experimental results. We define performance parameters of our scheme and quantify their cost benefit trade-offs through simulation.
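
The two building blocks named in this abstract, hash-chained epoch blocks and Bloom filters held by witnesses, can be sketched as follows. This is an illustrative toy, not the authors' scheme: the block layout, the filter parameters (`size_bits`, `num_hashes`), and the use of SHA-256 are all assumptions.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; the size and hash count are illustrative assumptions."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0  # bit array stored as a Python integer

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the hash number.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


GENESIS = b"\x00" * 32  # placeholder predecessor hash for the first block


def block_hash(prev, readings):
    """Hash a block's readings together with its predecessor's hash."""
    return hashlib.sha256(prev + b"|".join(readings)).digest()


def chain_blocks(epochs):
    """Build epoch-level blocks, each storing the hash of its predecessor."""
    chain, prev = [], GENESIS
    for readings in epochs:
        h = block_hash(prev, readings)
        chain.append({"prev": prev, "readings": readings, "hash": h})
        prev = h
    return chain


def verify_chain(chain):
    """Recompute every hash and check that the links are intact."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block["hash"] != block_hash(prev, block["readings"]):
            return False
        prev = block["hash"]
    return True


# Hypothetical sensor readings grouped into two epochs.
epochs = [[b"hr=71", b"hr=72"], [b"hr=70"]]
chain = chain_blocks(epochs)

# A neighboring device acting as witness stores only fingerprints of the blocks.
witness = BloomFilter()
for block in chain:
    witness.add(block["hash"])
```

A forensic check would call `verify_chain(chain)` and then ask the witness whether `might_contain(block["hash"])` holds for each block; a Bloom filter never reports a stored fingerprint as absent, but it can occasionally report an unseen one as present.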

Ukwandu, E., Buchanan, W. J., Russell, G..  2017.  Performance Evaluation of a Fragmented Secret Share System. 2017 International Conference On Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA). :1–6.
There are many risks in moving data into public storage environments, along with an increasing threat around large-scale data leakage. Secret sharing schemes have been proposed as a keyless and resilient mechanism to mitigate this, but scaling across large-scale data infrastructure has remained the bane of using secret sharing schemes in big data storage and retrieval. This work applies secret sharing methods as used in cryptography, in conjunction with data fragmentation, to create robust and secure data storage and retrieval. It outlines two different methods of distributing data equally to storage locations, as well as recovering them in a manner that ensures consistent data availability irrespective of file size and type. Our experiments consist of two different methods: data shares and key shares. Using our experimental results, we were able to validate previous works on the effects of the threshold on file recovery. The results also revealed the varying effects of writing shares to, and retrieving them from, storage locations other than computer memory. The implication is that increasing the fragment size at varying file and threshold sizes adds overhead to share creation rather than to file recovery, underscoring the importance of choosing a varying fragment size as file size increases.
Lim, H., Ni, A., Kim, D., Ko, Y. B..  2017.  Named data networking testbed for scientific data. 2017 2nd International Conference on Computer and Communication Systems (ICCCS). :65–69.

Named Data Networking (NDN) is one of the future internet architectures and takes a clean-slate approach. NDN provides intelligent data retrieval using the principles of name-based symmetrical forwarding of Interest/Data packets and in-network caching. The continually increasing demand for rapid dissemination of large-scale scientific data is driving the use of NDN in data-intensive science experiments. In this paper, we establish an intercontinental NDN testbed. In the testbed, an NDN-based application that targets climate science as an example data-intensive science application is designed and implemented, which has differentiated features compared to those of previous studies. We experimentally justify the use of NDN for climate science in the intercontinental network through performance comparisons between classical delivery techniques and NDN-based climate data delivery.

Vavala, B., Neves, N., Steenkiste, P..  2017.  Secure Tera-scale Data Crunching with a Small TCB. 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :169–180.

Outsourcing services to third-party providers comes with a high security cost: the need to fully trust the providers. Using trusted hardware can help, but current trusted execution environments do not adequately support services that process very large-scale datasets. We present LASTGT, a system that bridges this gap by supporting the execution of self-contained services over a large state, with a small and generic trusted computing base (TCB). LASTGT uses widely deployed trusted hardware to guarantee integrity and verifiability of the execution on a remote platform, and it securely supplies data to the service through simple techniques based on virtual memory. As a result, LASTGT is general and applicable to many scenarios such as computational genomics and databases, as we show in our experimental evaluation based on an implementation of LASTGT on a secure hypervisor. We also describe a possible implementation on Intel SGX.

Xu, X., Pautasso, C., Zhu, L., Gramoli, V., Ponomarev, A., Tran, A. B., Chen, S..  2016.  The Blockchain as a Software Connector. 2016 13th Working IEEE/IFIP Conference on Software Architecture (WICSA). :182–191.

Blockchain is an emerging technology for decentralized and transactional data sharing across a large network of untrusted participants. It enables new forms of distributed software architectures, where components can find agreements on their shared states without trusting a central integration point or any particular participating components. Considering the blockchain as a software connector helps make explicit important architectural considerations on the resulting performance and quality attributes (for example, security, privacy, scalability and sustainability) of the system. Based on our experience in several projects using blockchain, in this paper we provide rationales to support the architectural decision on whether to employ a decentralized blockchain as opposed to other software solutions, like traditional shared data storage. Additionally, we explore specific implications of using the blockchain as a software connector, including design trade-offs regarding quality attributes.

Swathy, V., Sudha, K., Aruna, R., Sangeetha, C., Janani, R..  2016.  Providing advanced security mechanism for scalable data sharing in cloud storage. 2016 International Conference on Inventive Computation Technologies (ICICT). 3:1–6.

Data sharing is a significant functionality in cloud storage. Cloud storage providers are responsible for keeping the data available and accessible, and for keeping the physical environment protected and running. Here we show how to share data with others in cloud storage securely, efficiently, and flexibly. A new public-key cryptosystem is proposed that creates constant-size ciphertexts such that efficient delegation of decryption rights for any set of ciphertexts is achievable. The novelty is that one can aggregate any set of secret keys and make them as compact as a single key while encompassing the power of all the keys being aggregated. This compact aggregate key can be easily sent to others or stored in a smart card with very limited secure storage. In KAC, users encrypt each file with a single key, so every file has its own key; there are also aggregate keys for two or more files, which are formed using a tree structure. Through this, the user can share more files with a single key at a time.

Henze, Martin, Hiller, Jens, Schmerling, Sascha, Ziegeldorf, Jan Henrik, Wehrle, Klaus.  2016.  CPPL: Compact Privacy Policy Language. Proceedings of the 2016 ACM on Workshop on Privacy in the Electronic Society. :99–110.

Recent technology shifts such as cloud computing, the Internet of Things, and big data lead to a significant transfer of sensitive data out of trusted edge networks. To counter resulting privacy concerns, we must ensure that this sensitive data is not inadvertently forwarded to third parties, used for unintended purposes, or handled and stored in violation of legal requirements. Related work proposes to solve this challenge by annotating data with privacy policies before data leaves the control sphere of its owner. However, we find that existing privacy policy languages are either not flexible enough or require excessive processing, storage, or bandwidth resources, which prevents their widespread deployment. To fill this gap, we propose CPPL, a Compact Privacy Policy Language which compresses privacy policies by taking advantage of flexibly specifiable domain knowledge. Our evaluation shows that CPPL reduces policy sizes by two orders of magnitude compared to related work and can check several thousand policies per second. This allows for individual per-data item policies in the context of cloud computing, the Internet of Things, and big data.

Zhang, L., Li, B., Zhang, L., Li, D..  2015.  Fuzzy clustering of incomplete data based on missing attribute interval size. 2015 IEEE 9th International Conference on Anti-counterfeiting, Security, and Identification (ASID). :101–104.

The fuzzy c-means algorithm is used to identify clusters of similar objects within a data set, but it cannot be directly applied to incomplete data. In this paper, we propose a novel fuzzy c-means algorithm based on missing attribute interval size for the clustering of incomplete data. In the new algorithm, the incomplete data set is transformed into an interval data set according to the nearest neighbor rule. Each missing attribute value is replaced by the corresponding interval median, and the interval size is set as an additional property of the incomplete data to control the effect of interval size in clustering. Experiments on standard UCI data sets show that our approach outperforms other clustering methods for incomplete data.
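
The preprocessing step described here (turning each missing value into an interval via nearest neighbors, filling in the interval's middle value, and keeping the interval size as an extra property) might look roughly like the sketch below. The abstract does not give the details, so the partial-distance measure, the neighbor count `q`, the use of the interval midpoint, and the restriction of neighbors to complete records are all assumptions; the clustering step itself is not shown.

```python
from math import sqrt


def partial_distance(a, b):
    """Euclidean distance over the attributes observed (not None) in both records."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return sqrt(sum((x - y) ** 2 for x, y in pairs))


def impute_with_intervals(data, q=2):
    """Fill each missing value with the midpoint of the interval spanned by
    its q nearest complete records, and report the interval size per attribute.
    Assumes at least one record in `data` has no missing values."""
    complete = [r for r in data if None not in r]
    result = []
    for row in data:
        neighbors = sorted(complete, key=lambda r: partial_distance(row, r))[:q]
        filled, sizes = [], []
        for j, value in enumerate(row):
            if value is None:
                candidates = [n[j] for n in neighbors]
                lo, hi = min(candidates), max(candidates)
                filled.append((lo + hi) / 2)  # interval midpoint
                sizes.append(hi - lo)         # interval size kept as an extra property
            else:
                filled.append(value)
                sizes.append(0.0)             # observed values have a zero-width interval
        result.append((filled, sizes))
    return result


# Hypothetical data: the last record is missing its second attribute.
data = [(1.0, 2.0), (1.5, 3.0), (5.0, 9.0), (1.25, None)]
imputed = impute_with_intervals(data, q=2)
filled_last, sizes_last = imputed[-1]
```

A clustering algorithm could then down-weight attributes with a large interval size, since a wide interval signals that the imputed value is less certain.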

Hassan, F., Magalini, J. L., de Campos Pentea, V., Santos, R. A..  2015.  A project-based multi-disciplinary elective on digital data processing techniques. 2015 IEEE Frontiers in Education Conference (FIE). :1–7.

Today's era of the internet of things, cloud computing and big data centers calls for more fresh graduates with expertise in digital data processing techniques such as compression, encryption and error correcting codes. This paper describes a project-based elective that covers these three main digital data processing techniques and can be offered to three different undergraduate majors: electrical engineering, computer engineering, and computer science. The course has been offered successfully for three years. Registration statistics show equal interest from the three different majors. Assessment data show that students have successfully completed the different course outcomes. Students' feedback shows that students appreciate the knowledge they attain from this elective and suggests that the workload for this course in relation to other courses of equal credit is as expected.