Visible to the public Biblio

Filters: Keyword is data analytics  [Clear All Filters]
2021-09-07
Simud, Thikamporn, Ruengittinun, Somchoke, Surasvadi, Navaporn, Sanglerdsinlapachai, Nuttapong, Plangprasopchok, Anon.  2020.  A Conversational Agent for Database Query: A Use Case for Thai People Map and Analytics Platform. 2020 15th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP). :1–6.
Since 2018, Thai People Map and Analytics Platform (TPMAP) has been developed with the aims of supporting government officials and policy makers with integrated household and community data to analyze strategic plans, implement policies and decisions to alleviate poverty. However, to acquire complex information from the platform, non-technical users with no database background have to ask a programmer or a data scientist to query data for them. Such a process is time-consuming and might result in inaccurate information retrieved due to miscommunication between non-technical and technical users. In this paper, we have developed a Thai conversational agent on top of TPMAP to support self-service data analytics on complex queries. Users can simply use natural language to fetch information from our chatbot and the query results are presented to users in easy-to-use formats such as statistics and charts. The proposed conversational agent retrieves and transforms natural language queries into query representations with relevant entities, query intentions, and output formats of the query. We employ Rasa, an open-source conversational AI engine, for agent development. The results show that our system yields Fl-score of 0.9747 for intent classification and 0.7163 for entity extraction. The obtained intents and entities are then used for query target information from a graph database. Finally, our system achieves end-to-end performance with accuracies ranging from 57.5%-80.0%, depending on query message complexity. The generated answers are then returned to users through a messaging channel.
2021-05-25
Zanin, M., Menasalvas, E., González, A. Rodriguez, Smrz, P..  2020.  An Analytics Toolbox for Cyber-Physical Systems Data Analysis: Requirements and Challenges. 2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO). :271–276.
The fast improvement in telecommunication technologies that has characterised the last decade is enabling a revolution centred on Cyber-Physical Systems (CPSs). Elements inside cities, from vehicles to cars, can now be connected and share data, describing both our environment and our behaviours. These data can also be used in an active way, by becoming the tenet of innovative services and products, i.e. of Cyber-Physical Products (CPPs). Still, having data is not tantamount to having knowledge, and an important overlooked topic is how should them be analysed. In this contribution we tackle the issue of the development of an analytics toolbox for processing CPS data. Specifically, we review and quantify the main requirements that should be fulfilled, both functional (e.g. flexibility or dependability) and technical (e.g. scalability, response time, etc.). We further propose an initial set of analysis that should in it be included. We finally review some challenges and open issues, including how security and privacy could be tackled by emerging new technologies.
2021-04-27
Khalid, O., Senthilananthan, S..  2020.  A review of data analytics techniques for effective management of big data using IoT. 2020 5th International Conference on Innovative Technologies in Intelligent Systems and Industrial Applications (CITISIA). :1—10.
IoT and big data are energetic technology of the world for quite a time, and both of these have become a necessity. On the one side where IoT is used to connect different objectives via the internet, the big data means having a large number of the set of structured, unstructured, and semi-structured data. The device used for processing based on the tools used. These tools help provide meaningful information used for effective management in different domains. Some of the commonly faced issues with the inadequate about the technologies are related to data privacy, insufficient analytical capabilities, and this issue is faced by in different domains related to the big data. Data analytics tools help discover the pattern of data and consumer preferences which is resulting in better decision making for the organizations. The major part of this work is to review different types of data analytics techniques for the effective management of big data using IoT. For the effective management of the ABD solution collection, analysis and control are used as the components. Each of the ingredients is described to find an effective way to manage big data. These components are considered and used in the validation criteria. The solution of effective data management is a stage towards the management of big data in IoT devices which will help the user to understand different types of elements of data management.
Reddy, C. b Manjunath, reddy, U. k, Brumancia, E., Gomathi, R. M., Indira, K..  2020.  Integrative Approach Of Big Data And Network Attacks Analysis In Cloud Environment. 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI)(48184). :314—317.

Lately mining of information from online life is pulling in more consideration because of the blast in the development of Big Data. In security, Big Data manages an assortment of immense advanced data for investigating, envisioning and to draw the bits of knowledge for the expectation and anticipation of digital assaults. Big Data Analytics (BDA) is the term composed by experts to portray the art of dealing with, taking care of and gathering a great deal of data for future evaluation. Data is being made at an upsetting rate. The quick improvement of the Internet, Internet of Things (IoT) and other creative advances are the rule liable gatherings behind this proceeded with advancement. The data made is an impression of the earth, it is conveyed out of, along these lines can use the data got away from structures to understand the internal exercises of that system. This has become a significant element in cyber security where the objective is to secure resources. Moreover, the developing estimation of information has made large information a high worth objective. Right now, investigate ongoing exploration works in cyber security comparable to huge information and feature how Big information is secured and how huge information can likewise be utilized as a device for cyber security. Simultaneously, a Big Data based concentrated log investigation framework is actualized to distinguish the system traffic happened with assailants through DDOS, SQL Injection and Bruce Force assault. The log record is naturally transmitted to the brought together cloud server and big information is started in the investigation process.

2021-02-22
Alzahrani, A., Feki, J..  2020.  Toward a Natural Language-Based Approach for the Specification of Decisional-Users Requirements. 2020 3rd International Conference on Computer Applications Information Security (ICCAIS). :1–6.
The number of organizations adopting the Data Warehouse (DW) technology along with data analytics in order to improve the effectiveness of their decision-making processes is permanently increasing. Despite the efforts invested, the DW design remains a great challenge research domain. More accurately, the design quality of the DW depends on several aspects; among them, the requirement-gathering phase is a critical and complex task. In this context, we propose a Natural language (NL) NL-template based design approach, which is twofold; firstly, it facilitates the involvement of decision-makers in the early step of the DW design; indeed, using NL is a good and natural means to encourage the decision-makers to express their requirements as query-like English sentences. Secondly, our approach aims to generate a DW multidimensional schema from a set of gathered requirements (as OLAP: On-Line-Analytical-Processing queries, written according to the NL suggested templates). This approach articulates around: (i) two NL-templates for specifying multidimensional components, and (ii) a set of five heuristic rules for extracting the multidimensional concepts from requirements. Really, we are developing a software prototype that accepts the decision-makers' requirements then automatically identifies the multidimensional components of the DW model.
2020-11-23
Jolfaei, A., Kant, K., Shafei, H..  2019.  Secure Data Streaming to Untrusted Road Side Units in Intelligent Transportation System. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :793–798.
The paper considers data security issues in vehicle-to-infrastructure communications, where vehicles stream data to a road side unit. We assume aggregated data in road side units can be stored or used for data analytics. In this environment, there are issues in regards to the scalability of key management and computation limitations at the edge of the network. To address these issues, we suggest the formation of groups in the vehicle layer, where a group leader is assigned to communicate with group devices and the road side unit. We propose a lightweight permutation mechanism for preserving the confidentiality of sensory data.
2020-10-06
Dattana, Vishal, Gupta, Kishu, Kush, Ashwani.  2019.  A Probability based Model for Big Data Security in Smart City. 2019 4th MEC International Conference on Big Data and Smart City (ICBDSC). :1—6.

Smart technologies at hand have facilitated generation and collection of huge volumes of data, on daily basis. It involves highly sensitive and diverse data like personal, organisational, environment, energy, transport and economic data. Data Analytics provide solution for various issues being faced by smart cities like crisis response, disaster resilience, emergence management, smart traffic management system etc.; it requires distribution of sensitive data among various entities within or outside the smart city,. Sharing of sensitive data creates a need for efficient usage of smart city data to provide smart applications and utility to the end users in a trustworthy and safe mode. This shared sensitive data if get leaked as a consequence can cause damage and severe risk to the city's resources. Fortification of critical data from unofficial disclosure is biggest issue for success of any project. Data Leakage Detection provides a set of tools and technology that can efficiently resolves the concerns related to smart city critical data. The paper, showcase an approach to detect the leakage which is caused intentionally or unintentionally. The model represents allotment of data objects between diverse agents using Bigraph. The objective is to make critical data secure by revealing the guilty agent who caused the data leakage.

2020-09-28
Madhan, E.S., Ghosh, Uttam, Tosh, Deepak K., Mandal, K., Murali, E., Ghosh, Soumalya.  2019.  An Improved Communications in Cyber Physical System Architecture, Protocols and Applications. 2019 16th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON). :1–6.
In recent trends, Cyber-Physical Systems (CPS) and Internet of Things interpret an evolution of computerized integration connectivity. The specific research challenges in CPS as security, privacy, data analytics, participate sensing, smart decision making. In addition, The challenges in Wireless Sensor Network (WSN) includes secure architecture, energy efficient protocols and quality of services. In this paper, we present an architectures of CPS and its protocols and applications. We propose software related mobile sensing paradigm namely Mobile Sensor Information Agent (MSIA). It works as plug-in based for CPS middleware and scalable applications in mobile devices. The working principle MSIA is acts intermediary device and gathers data from a various external sensors and its upload to cloud on demand. CPS needs tight integration between cyber world and man-made physical world to achieve stability, security, reliability, robustness, and efficiency in the system. Emerging software-defined networking (SDN) can be integrated as the communication infrastructure with CPS infrastructure to accomplish such system. Thus we propose a possible SDN-based CPS framework to improve the performance of the system.
2020-09-14
Ortiz Garcés, Ivan, Cazares, Maria Fernada, Andrade, Roberto Omar.  2019.  Detection of Phishing Attacks with Machine Learning Techniques in Cognitive Security Architecture. 2019 International Conference on Computational Science and Computational Intelligence (CSCI). :366–370.
The number of phishing attacks has increased in Latin America, exceeding the operational skills of cybersecurity analysts. The cognitive security application proposes the use of bigdata, machine learning, and data analytics to improve response times in attack detection. This paper presents an investigation about the analysis of anomalous behavior related with phishing web attacks and how machine learning techniques can be an option to face the problem. This analysis is made with the use of an contaminated data sets, and python tools for developing machine learning for detect phishing attacks through of the analysis of URLs to determinate if are good or bad URLs in base of specific characteristics of the URLs, with the goal of provide realtime information for take proactive decisions that minimize the impact of an attack.
2020-08-28
Zobaed, S.M., ahmad, sahan, Gottumukkala, Raju, Salehi, Mohsen Amini.  2019.  ClustCrypt: Privacy-Preserving Clustering of Unstructured Big Data in the Cloud. 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). :609—616.
Security and confidentiality of big data stored in the cloud are important concerns for many organizations to adopt cloud services. One common approach to address the concerns is client-side encryption where data is encrypted on the client machine before being stored in the cloud. Having encrypted data in the cloud, however, limits the ability of data clustering, which is a crucial part of many data analytics applications, such as search systems. To overcome the limitation, in this paper, we present an approach named ClustCrypt for efficient topic-based clustering of encrypted unstructured big data in the cloud. ClustCrypt dynamically estimates the optimal number of clusters based on the statistical characteristics of encrypted data. It also provides clustering approach for encrypted data. We deploy ClustCrypt within the context of a secure cloud-based semantic search system (S3BD). Experimental results obtained from evaluating ClustCrypt on three datasets demonstrate on average 60% improvement on clusters' coherency. ClustCrypt also decreases the search-time overhead by up to 78% and increases the accuracy of search results by up to 35%.
2020-08-07
Smith, Gary.  2019.  Artificial Intelligence and the Privacy Paradox of Opportunity, Big Data and The Digital Universe. 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE). :150—153.
Artificial Intelligence (AI) can and does use individual's data to make predictions about their wants, their needs, their influences on them and predict what they could do. The use of individual's data naturally raises privacy concerns. This article focuses on AI, the privacy issue against the backdrop of the endless growth of the Digital Universe where Big Data, AI, Data Analytics and 5G Technology live and grow in The Internet of Things (IoT).
2020-06-01
Vegh, Laura.  2018.  Cyber-physical systems security through multi-factor authentication and data analytics. 2018 IEEE International Conference on Industrial Technology (ICIT). :1369–1374.
We are living in a society where technology is present everywhere we go. We are striving towards smart homes, smart cities, Internet of Things, Internet of Everything. Not so long ago, a password was all you needed for secure authentication. Nowadays, even the most complicated passwords are not considered enough. Multi-factor authentication is gaining more and more terrain. Complex system may also require more than one solution for real, strong security. The present paper proposes a framework based with MFA as a basis for access control and data analytics. Events within a cyber-physical system are processed and analyzed in an attempt to detect, prevent and mitigate possible attacks.
2020-04-17
Almousa, May, Anwar, Mohd.  2019.  Detecting Exploit Websites Using Browser-based Predictive Analytics. 2019 17th International Conference on Privacy, Security and Trust (PST). :1—3.
The popularity of Web-based computing has given increase to browser-based cyberattacks. These cyberattacks use websites that exploit various web browser vulnerabilities. To help regular users avoid exploit websites and engage in safe online activities, we propose a methodology of building a machine learning-powered predictive analytical model that will measure the risk of attacks and privacy breaches associated with visiting different websites and performing online activities using web browsers. The model will learn risk levels from historical data and metadata scraped from web browsers.
2020-02-26
Tran, Geoffrey Phi, Walters, John Paul, Crago, Stephen.  2019.  Increased Fault-Tolerance and Real-Time Performance Resiliency for Stream Processing Workloads through Redundancy. 2019 IEEE International Conference on Services Computing (SCC). :51–55.

Data analytics and telemetry have become paramount to monitoring and maintaining quality-of-service in addition to business analytics. Stream processing-a model where a network of operators receives and processes continuously arriving discrete elements-is well-suited for these needs. Current and previous studies and frameworks have focused on continuity of operations and aggregate performance metrics. However, real-time performance and tail latency are also important. Timing errors caused by either performance or failed communication faults also affect real-time performance more drastically than aggregate metrics. In this paper, we introduce redundancy in the stream data to improve the real-time performance and resiliency to timing errors caused by either performance or failed communication faults. We also address limitations in previous solutions using a fine-grained acknowledgment tracking scheme to both increase the effectiveness for resiliency to performance faults and enable effectiveness for failed communication faults. Our results show that fine-grained acknowledgment schemes can improve the tail and mean latencies by approximately 30%. We also show that these schemes can improve resiliency to performance faults compared to existing work. Our improvements result in 47.4% to 92.9% fewer missed deadlines compared to 17.3% to 50.6% for comparable topologies and redundancy levels in the state of the art. Finally, we show that redundancies of 25% to 100% can reduce the number of data elements that miss their deadline constraints by 0.76% to 14.04% for applications with high fan-out and by 7.45% up to 50% for applications with no fan-out.

2019-08-26
Mavroeidis, V., Vishi, K., Jøsang, A..  2018.  A Framework for Data-Driven Physical Security and Insider Threat Detection. 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). :1108–1115.

This paper presents PSO, an ontological framework and a methodology for improving physical security and insider threat detection. PSO can facilitate forensic data analysis and proactively mitigate insider threats by leveraging rule-based anomaly detection. In all too many cases, rule-based anomaly detection can detect employee deviations from organizational security policies. In addition, PSO can be considered a security provenance solution because of its ability to fully reconstruct attack patterns. Provenance graphs can be further analyzed to identify deceptive actions and overcome analytical mistakes that can result in bad decision-making, such as false attribution. Moreover, the information can be used to enrich the available intelligence (about intrusion attempts) that can form use cases to detect and remediate limitations in the system, such as loosely-coupled provenance graphs that in many cases indicate weaknesses in the physical security architecture. Ultimately, validation of the framework through use cases demonstrates and proves that PS0 can improve an organization's security posture in terms of physical security and insider threat detection.

2019-06-24
Bessa, Ricardo J., Rua, David, Abreu, Cláudia, Machado, Paulo, Andrade, José R., Pinto, Rui, Gonçalves, Carla, Reis, Marisa.  2018.  Data Economy for Prosumers in a Smart Grid Ecosystem. Proceedings of the Ninth International Conference on Future Energy Systems. :622–630.

Smart grids technologies are enablers of new business models for domestic consumers with local flexibility (generation, loads, storage) and where access to data is a key requirement in the value stream. However, legislation on personal data privacy and protection imposes the need to develop local models for flexibility modeling and forecasting and exchange models instead of personal data. This paper describes the functional architecture of an home energy management system (HEMS) and its optimization functions. A set of data-driven models, embedded in the HEMS, are discussed for improving renewable energy forecasting skill and modeling multi-period flexibility of distributed energy resources.

2019-03-28
McDermott, C. D., Petrovski, A. V., Majdani, F..  2018.  Towards Situational Awareness of Botnet Activity in the Internet of Things. 2018 International Conference On Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA). :1-8.
The following topics are dealt with: security of data; risk management; decision making; computer crime; invasive software; critical infrastructures; data privacy; insurance; Internet of Things; learning (artificial intelligence).
2019-03-06
Leung, C. K., Hoi, C. S. H., Pazdor, A. G. M., Wodi, B. H., Cuzzocrea, A..  2018.  Privacy-Preserving Frequent Pattern Mining from Big Uncertain Data. 2018 IEEE International Conference on Big Data (Big Data). :5101-5110.
As we are living in the era of big data, high volumes of wide varieties of data which may be of different veracity (e.g., precise data, imprecise and uncertain data) are easily generated or collected at a high velocity in many real-life applications. Embedded in these big data is valuable knowledge and useful information, which can be discovered by big data science solutions. As a popular data science task, frequent pattern mining aims to discover implicit, previously unknown and potentially useful information and valuable knowledge in terms of sets of frequently co-occurring merchandise items and/or events. Many of the existing frequent pattern mining algorithms use a transaction-centric mining approach to find frequent patterns from precise data. However, there are situations in which an item-centric mining approach is more appropriate, and there are also situations in which data are imprecise and uncertain. Hence, in this paper, we present an item-centric algorithm for mining frequent patterns from big uncertain data. In recent years, big data have been gaining the attention from the research community as driven by relevant technological innovations (e.g., clouds) and novel paradigms (e.g., social networks). As big data are typically published online to support knowledge management and fruition processes, these big data are usually handled by multiple owners with possible secure multi-part computation issues. Thus, privacy and security of big data has become a fundamental problem in this research context. In this paper, we present, not only an item-centric algorithm for mining frequent patterns from big uncertain data, but also a privacy-preserving algorithm. In other words, we present- in this paper-a privacy-preserving item-centric algorithm for mining frequent patterns from big uncertain data. Results of our analytical and empirical evaluation show the effectiveness of our algorithm in mining frequent patterns from big uncertain data in a privacy-preserving manner.
Khan, Latifur.  2018.  Big IoT Data Stream Analytics with Issues in Privacy and Security. Proceedings of the Fourth ACM International Workshop on Security and Privacy Analytics. :22-22.
Internet of Things (IoT) Devices are monitoring and controlling systems that interact with the physical world by collecting, processing and transmitting data using the internet. IoT devices include home automation systems, smart grid, transportation systems, medical devices, building controls, manufacturing and industrial control systems. With the increase in deployment of IoT devices, there will be a corresponding increase in the amount of data generated by these devices, therefore, resulting in the need of large scale data processing systems to process and extract information for efficient and impactful decision making that will improve quality of living.
2018-06-07
Jiang, Jun, Zhao, Xinghui, Wallace, Scott, Cotilla-Sanchez, Eduardo, Bass, Robert.  2017.  Mining PMU Data Streams to Improve Electric Power System Resilience. Proceedings of the Fourth IEEE/ACM International Conference on Big Data Computing, Applications and Technologies. :95–102.
Phasor measurement units (PMUs) provide high-fidelity situational awareness of electric power grid operations. PMU data are used in real-time to inform wide area state estimation, monitor area control error, and event detection. As PMU data becomes more reliable, these devices are finding roles within control systems such as demand response programs and early fault detection systems. As with other cyber physical systems, maintaining data integrity and security are significant challenges for power system operators. In this paper, we present a comprehensive study of multiple machine learning techniques for detecting malicious data injection within PMU data streams. The two datasets used in this study are from the Bonneville Power Administration's PMU network and an inter-university PMU network among three universities, located in the U.S. Pacific Northwest. These datasets contain data from both the transmission level and the distribution level. Our results show that both SVM and ANN are generally effective in detecting spoofed data, and TensorFlow, the newly released tool, demonstrates potential for distributing the training workload and achieving higher performance. We expect these results to shed light on future work of adopting machine learning and data analytics techniques in the electric power industry.
2018-05-24
Sallam, A., Bertino, E..  2017.  Detection of Temporal Insider Threats to Relational Databases. 2017 IEEE 3rd International Conference on Collaboration and Internet Computing (CIC). :406–415.

The mitigation of insider threats against databases is a challenging problem as insiders often have legitimate access privileges to sensitive data. Therefore, conventional security mechanisms, such as authentication and access control, may be insufficient for the protection of databases against insider threats and need to be complemented with techniques that support real-time detection of access anomalies. The existing real-time anomaly detection techniques consider anomalies in references to the database entities and the amounts of accessed data. However, they are unable to track the access frequencies. According to recent security reports, an increase in the access frequency by an insider is an indicator of a potential data misuse and may be the result of malicious intents for stealing or corrupting the data. In this paper, we propose techniques for tracking users' access frequencies and detecting anomalous related activities in real-time. We present detailed algorithms for constructing accurate profiles that describe the access patterns of the database users and for matching subsequent accesses by these users to the profiles. Our methods report and log mismatches as anomalies that may need further investigation. We evaluated our techniques on the OLTP-Benchmark. The results of the evaluation indicate that our techniques are very effective in the detection of anomalies.

2018-03-19
Massonet, P., Deru, L., Achour, A., Dupont, S., Levin, A., Villari, M..  2017.  End-To-End Security Architecture for Federated Cloud and IoT Networks. 2017 IEEE International Conference on Smart Computing (SMARTCOMP). :1–6.

Smart Internet of Things (IoT) applications will rely on advanced IoT platforms that not only provide access to IoT sensors and actuators, but also provide access to cloud services and data analytics. Future IoT platforms should thus provide connectivity and intelligence. One approach to connecting IoT devices, IoT networks to cloud networks and services is to use network federation mechanisms over the internet to create network slices across heterogeneous platforms. Network slices also need to be protected from potential external and internal threats. In this paper we describe an approach for enforcing global security policies in the federated cloud and IoT networks. Our approach allows a global security to be defined in the form of a single service manifest and enforced across all federation network segments. It relies on network function virtualisation (NFV) and service function chaining (SFC) to enforce the security policy. The approach is illustrated with two case studies: one for a user that wishes to securely access IoT devices and another in which an IoT infrastructure administrator wishes to securely access some remote cloud and data analytics services.

2018-02-02
Mohamed, F., AlBelooshi, B., Salah, K., Yeun, C. Y., Damiani, E..  2017.  A Scattering Technique for Protecting Cryptographic Keys in the Cloud. 2017 IEEE 2nd International Workshops on Foundations and Applications of Self* Systems (FAS*W). :301–306.

Cloud computing has become a widely used computing paradigm providing on-demand computing and storage capabilities based on pay-as-you-go model. Recently, many organizations, especially in the field of big data, have been adopting the cloud model to perform data analytics through leasing powerful Virtual Machines (VMs). VMs can be attractive targets to attackers as well as untrusted cloud providers who aim to get unauthorized access to the business critical-data. The obvious security solution is to perform data analytics on encrypted data through the use of cryptographic keys as that of the Advanced Encryption Standard (AES). However, it is very easy to obtain AES cryptographic keys from the VM's Random Access Memory (RAM). In this paper, we present a novel key-scattering (KS) approach to protect the cryptographic keys while encrypting/decrypting data. Our solution is highly portable and interoperable. Thus, it could be integrated within today's existing cloud architecture without the need for further modifications. The feasibility of the approach has been proven by implementing a functioning prototype. The evaluation results show that our approach is substantially more resilient to brute force attacks and key extraction tools than the standard AES algorithm, with acceptable execution time.

2018-01-10
Bhattacharjee, S. Das, Talukder, A., Al-Shaer, E., Doshi, P..  2017.  Prioritized active learning for malicious URL detection using weighted text-based features. 2017 IEEE International Conference on Intelligence and Security Informatics (ISI). :107–112.

Data analytics is being increasingly used in cyber-security problems, and found to be useful in cases where data volumes and heterogeneity make it cumbersome for manual assessment by security experts. In practical cyber-security scenarios involving data-driven analytics, obtaining data with annotations (i.e. ground-truth labels) is a challenging and known limiting factor for many supervised security analytics task. Significant portions of the large datasets typically remain unlabelled, as the task of annotation is extensively manual and requires a huge amount of expert intervention. In this paper, we propose an effective active learning approach that can efficiently address this limitation in a practical cyber-security problem of Phishing categorization, whereby we use a human-machine collaborative approach to design a semi-supervised solution. An initial classifier is learnt on a small amount of the annotated data which in an iterative manner, is then gradually updated by shortlisting only relevant samples from the large pool of unlabelled data that are most likely to influence the classifier performance fast. Prioritized Active Learning shows a significant promise to achieve faster convergence in terms of the classification performance in a batch learning framework, and thus requiring even lesser effort for human annotation. An useful feature weight update technique combined with active learning shows promising classification performance for categorizing Phishing/malicious URLs without requiring a large amount of annotated training samples to be available during training. In experiments with several collections of PhishMonger's Targeted Brand dataset, the proposed method shows significant improvement over the baseline by as much as 12%.

2017-12-12
De La Peña Montero, Fabian, Hariri, Salim.  2017.  Autonomic and Integrated Management for Proactive Cyber Security (AIM-PSC). Companion Proceedings of the10th International Conference on Utility and Cloud Computing. :107–112.

The complexity, multiplicity, and impact of cyber-attacks have been increasing at an alarming rate despite the significant research and development investment in cyber security products and tools. The current techniques to detect and protect cyber infrastructures from these smart and sophisticated attacks are mainly characterized as being ad hoc, manual intensive, and too slow. We present in this paper AIM-PSC that is developed jointly by researchers at AVIRTEK and The University of Arizona Center for Cloud and Autonomic Computing that is inspired by biological systems, which can efficiently handle complexity, dynamism and uncertainty. In AIM-PSC system, an online monitoring and multi-level analysis are used to analyze the anomalous behaviors of networks, software systems and applications. By combining the results of different types of analysis using a statistical decision fusion approach we can accurately detect any types of cyber-attacks with high detection and low false alarm rates and proactively respond with corrective actions to mitigate their impacts and stop their propagation.