Visible to the public Biblio

Found 106 results

Filters: Keyword is software engineering  [Clear All Filters]
2021-06-24
Dang, Tran Khanh, Truong, Phat T. Tran, Tran, Pi To.  2020.  Data Poisoning Attack on Deep Neural Network and Some Defense Methods. 2020 International Conference on Advanced Computing and Applications (ACOMP). :15–22.
In recent years, Artificial Intelligence has disruptively changed information technology and software engineering with a proliferation of technologies and applications based-on it. However, recent researches show that AI models in general and the most greatest invention since sliced bread - Deep Learning models in particular, are vulnerable to being hacked and can be misused for bad purposes. In this paper, we carry out a brief review of data poisoning attack - one of the two recently dangerous emerging attacks - and the state-of-the-art defense methods for this problem. Finally, we discuss current challenges and future developments.
Chen, Sen, Fan, Lingling, Meng, Guozhu, Su, Ting, Xue, Minhui, Xue, Yinxing, Liu, Yang, Xu, Lihua.  2020.  An Empirical Assessment of Security Risks of Global Android Banking Apps. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :1310—1322.
Mobile banking apps, belonging to the most security-critical app category, render massive and dynamic transactions susceptible to security risks. Given huge potential financial loss caused by vulnerabilities, existing research lacks a comprehensive empirical study on the security risks of global banking apps to provide useful insights and improve the security of banking apps. Since data-related weaknesses in banking apps are critical and may directly cause serious financial loss, this paper first revisits the state-of-the-art available tools and finds that they have limited capability in identifying data-related security weaknesses of banking apps. To complement the capability of existing tools in data-related weakness detection, we propose a three-phase automated security risk assessment system, named Ausera, which leverages static program analysis techniques and sensitive keyword identification. By leveraging Ausera, we collect 2,157 weaknesses in 693 real-world banking apps across 83 countries, which we use as a basis to conduct a comprehensive empirical study from different aspects, such as global distribution and weakness evolution during version updates. We find that apps owned by subsidiary banks are always less secure than or equivalent to those owned by parent banks. In addition, we also track the patching of weaknesses and receive much positive feedback from banking entities so as to improve the security of banking apps in practice. We further find that weaknesses derived from outdated versions of banking apps or third-party libraries are highly prone to being exploited by attackers. To date, we highlight that 21 banks have confirmed the weaknesses we reported (including 126 weaknesses in total). We also exchange insights with 7 banks, such as HSBC in UK and OCBC in Singapore, via in-person or online meetings to help them improve their apps. We hope that the insights developed in this paper will inform the communities about the gaps among multiple stakeholders, including banks, academic researchers, and third-party security companies.
Messe, Nan, Belloir, Nicolas, Chiprianov, Vanea, El-Hachem, Jamal, Fleurquin, Régis, Sadou, Salah.  2020.  An Asset-Based Assistance for Secure by Design. 2020 27th Asia-Pacific Software Engineering Conference (APSEC). :178—187.
With the growing numbers of security attacks causing more and more serious damages in software systems, security cannot be added as an afterthought in software development. It has to be built in from the early development phases such as requirement and design. The role responsible for designing a software system is termed an “architect”, knowledgeable about the system architecture design, but not always well-trained in security. Moreover, involving other security experts into the system design is not always possible due to time-to-market and budget constraints. To address these challenges, we propose to define an asset-based security assistance in this paper, to help architects design secure systems even if these architects have limited knowledge in security. This assistance helps alert threats, and integrate the security controls over vulnerable parts of system into the architecture model. The central concept enabling this assistance is that of asset. We apply our proposal on a telemonitoring case study to show that automating such an assistance is feasible.
Pashchenko, Ivan, Scandariato, Riccardo, Sabetta, Antonino, Massacci, Fabio.  2021.  Secure Software Development in the Era of Fluid Multi-party Open Software and Services. 2021 IEEE/ACM 43rd International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER). :91—95.
Pushed by market forces, software development has become fast-paced. As a consequence, modern development projects are assembled from 3rd-party components. Security & privacy assurance techniques once designed for large, controlled updates over months or years, must now cope with small, continuous changes taking place within a week, and happening in sub-components that are controlled by third-party developers one might not even know they existed. In this paper, we aim to provide an overview of the current software security approaches and evaluate their appropriateness in the face of the changed nature in software development. Software security assurance could benefit by switching from a process-based to an artefact-based approach. Further, security evaluation might need to be more incremental, automated and decentralized. We believe this can be achieved by supporting mechanisms for lightweight and scalable screenings that are applicable to the entire population of software components albeit there might be a price to pay.
Angermeir, Florian, Voggenreiter, Markus, Moyón, Fabiola, Mendez, Daniel.  2021.  Enterprise-Driven Open Source Software: A Case Study on Security Automation. 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). :278—287.
Agile and DevOps are widely adopted by the industry. Hence, integrating security activities with industrial practices, such as continuous integration (CI) pipelines, is necessary to detect security flaws and adhere to regulators’ demands early. In this paper, we analyze automated security activities in CI pipelines of enterprise-driven open source software (OSS). This shall allow us, in the long-run, to better understand the extent to which security activities are (or should be) part of automated pipelines. In particular, we mine publicly available OSS repositories and survey a sample of project maintainers to better understand the role that security activities and their related tools play in their CI pipelines. To increase transparency and allow other researchers to replicate our study (and to take different perspectives), we further disclose our research artefacts.Our results indicate that security activities in enterprise-driven OSS projects are scarce and protection coverage is rather low. Only 6.83% of the analyzed 8,243 projects apply security automation in their CI pipelines, even though maintainers consider security to be rather important. This alerts industry to keep the focus on vulnerabilities of 3rd Party software and it opens space for other improvements of practice which we outline in this manuscript.
Moran, Kevin, Palacio, David N., Bernal-Cárdenas, Carlos, McCrystal, Daniel, Poshyvanyk, Denys, Shenefiel, Chris, Johnson, Jeff.  2020.  Improving the Effectiveness of Traceability Link Recovery using Hierarchical Bayesian Networks. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :873—885.
Traceability is a fundamental component of the modern software development process that helps to ensure properly functioning, secure programs. Due to the high cost of manually establishing trace links, researchers have developed automated approaches that draw relationships between pairs of textual software artifacts using similarity measures. However, the effectiveness of such techniques are often limited as they only utilize a single measure of artifact similarity and cannot simultaneously model (implicit and explicit) relationships across groups of diverse development artifacts. In this paper, we illustrate how these limitations can be overcome through the use of a tailored probabilistic model. To this end, we design and implement a HierarchiCal PrObabilistic Model for SoftwarE Traceability (Comet) that is able to infer candidate trace links. Comet is capable of modeling relationships between artifacts by combining the complementary observational prowess of multiple measures of textual similarity. Additionally, our model can holistically incorporate information from a diverse set of sources, including developer feedback and transitive (often implicit) relationships among groups of software artifacts, to improve inference accuracy. We conduct a comprehensive empirical evaluation of Comet that illustrates an improvement over a set of optimally configured baselines of ≈14% in the best case and ≈5% across all subjects in terms of average precision. The comparative effectiveness of Comet in practice, where optimal configuration is typically not possible, is likely to be higher. Finally, we illustrate Comet's potential for practical applicability in a survey with developers from Cisco Systems who used a prototype Comet Jenkins plugin.
2021-06-01
Ghosal, Sandip, Shyamasundar, R. K..  2020.  A Generalized Notion of Non-interference for Flow Security of Sequential and Concurrent Programs. 2020 27th Asia-Pacific Software Engineering Conference (APSEC). :51–60.
For the last two decades, a wide spectrum of interpretations of non-interference11The notion of non-interference discussed in this paper enforces flow security in a program and is different from the concept of non-interference used for establishing functional correctness of parallel programs [1] have been used in the security analysis of programs, starting with the notion proposed by Goguen & Meseguer along with arguments of its impact on security practice. While the majority of works deal with sequential programs, several researchers have extended the notion of non-interference to enforce information flow-security in non-deterministic and concurrent programs. Major efforts of generalizations are based on (i) considering input sequences as a basic unit for input/output with semantic interpretation on a two-point information flow lattice, or (ii) typing of expressions as values for reading and writing, or (iii) typing of expressions along with its limited effects. Such approaches have limited compositionality and, thus, pose issues while extending these notions for concurrent programs. Further, in a general multi-point lattice, the notion of a public observer (or attacker) is not unique as it depends on the level of the attacker and the one attacked. In this paper, we first propose a compositional variant of non-interference for sequential systems that follow a general information flow lattice and place it in the context of earlier definitions of non-interference. We show that such an extension leads to the capturing of violations of information flow security in a concrete setting of a sequential language. Finally, we generalize non-interference for concurrent programs and illustrate its use for security analysis, particularly in the cases where information is transmitted through shared variables.
2021-05-03
Paulsen, Brandon, Wang, Jingbo, Wang, Jiawei, Wang, Chao.  2020.  NEURODIFF: Scalable Differential Verification of Neural Networks using Fine-Grained Approximation. 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). :784–796.
As neural networks make their way into safety-critical systems, where misbehavior can lead to catastrophes, there is a growing interest in certifying the equivalence of two structurally similar neural networks - a problem known as differential verification. For example, compression techniques are often used in practice for deploying trained neural networks on computationally- and energy-constrained devices, which raises the question of how faithfully the compressed network mimics the original network. Unfortunately, existing methods either focus on verifying a single network or rely on loose approximations to prove the equivalence of two networks. Due to overly conservative approximation, differential verification lacks scalability in terms of both accuracy and computational cost. To overcome these problems, we propose NEURODIFF, a symbolic and fine-grained approximation technique that drastically increases the accuracy of differential verification on feed-forward ReLU networks while achieving many orders-of-magnitude speedup. NEURODIFF has two key contributions. The first one is new convex approximations that more accurately bound the difference of two networks under all possible inputs. The second one is judicious use of symbolic variables to represent neurons whose difference bounds have accumulated significant error. We find that these two techniques are complementary, i.e., when combined, the benefit is greater than the sum of their individual benefits. We have evaluated NEURODIFF on a variety of differential verification tasks. Our results show that NEURODIFF is up to 1000X faster and 5X more accurate than the state-of-the-art tool.
2021-04-09
Smith, B., Feather, M. S., Huntsberger, T., Bocchino, R..  2020.  Software Assurance of Autonomous Spacecraft Control. 2020 Annual Reliability and Maintainability Symposium (RAMS). :1—7.
Summary & Conclusions: The work described addresses assurance of a planning and execution software system being added to an in-orbit CubeSat to demonstrate autonomous control of that spacecraft. Our focus was on how to develop assurance of the correct operation of the added software in its operational context, our approach to which was to use an assurance case to guide and organize the information involved. The relatively manageable magnitude of the CubeSat and its autonomy demonstration experiment made it plausible to try out our assurance approach in a relatively short timeframe. Additionally, the time was ripe to inject useful assurance results into the ongoing development and testing of the autonomy demonstration. In conducting this, we sought to answer several questions about our assurance approach. The questions, and the conclusions we reached, are as follows: 1. Question: Would our approach to assurance apply to the introduction of a planning and execution software into an existing system? Conclusion: Yes. The use of an assurance case helped focus our attention on the more challenging aspects, notably the interactions between the added software and the existing software system into which it was being introduced. This guided us to choose a hazard analysis method specifically for software interactions. In addition, we were able to automate generation of assurance case elements from the hazard analysis' tabular representation. 2. Question: Would our methods prove understandable to the software engineers tasked with integrating the software into the CubeSat's existing system? Conclusion: Somewhat. In interim discussions with the software engineers we found the assurance case style, of decomposing an argument into smaller pieces, to be useful and understandable to organize discussion. Ultimately however we did not persuade them to adopt assurance cases as the means to present review information. We attribute this to reluctance to deviate from JPL's tried and true style of holding reviews. For the CubeSat project as a whole, hosting an autonomy demonstration was already a novelty. Combining this with presentation of review information via an assurance case, with which our reviewers would be unaccustomed, would have exacerbated the unfamiliarity. 3. Question: Would conducting our methods prove to be compatible with the (limited) time available of the software engineers? Conclusion: Yes. We used a series of six brief meetings (approximately one hour each) with the development team to first identify the interactions as the area on which to focus, and to then perform the hazard analysis on those interactions. We used the meetings to confirm, or correct as necessary, our understanding of the software system and the spacecraft context. Between meetings we studied the existing software documentation, did preliminary analyses by ourselves, and documented the results in a concise form suitable for discussion with the team. 4. Question: Would our methods yield useful results to the software engineers? Conclusion: Yes. The hazard analysis systematically confirmed existing hazards' mitigations, and drew attention to a mitigation whose implementation needed particular care. In some cases, the analysis identified potential hazards - and what to do about them - should some of the more sophisticated capabilities of the planning and execution software be used. These capabilities, not exercised in the initial experiments on the CubeSat, may be used in future experiments. We remain involved with the developers as they prepare for these future experiments, so our analysis results will be of benefit as these proceed.
2021-04-08
Mundie, D. A., Perl, S., Huth, C. L..  2013.  Toward an Ontology for Insider Threat Research: Varieties of Insider Threat Definitions. 2013 Third Workshop on Socio-Technical Aspects in Security and Trust. :26—36.
The lack of standardization of the terms insider and insider threat has been a noted problem for researchers in the insider threat field. This paper describes the investigation of 42 different definitions of the terms insider and insider threat, with the goal of better understanding the current conceptual model of insider threat and facilitating communication in the research community.
Claycomb, W. R., Huth, C. L., Phillips, B., Flynn, L., McIntire, D..  2013.  Identifying indicators of insider threats: Insider IT sabotage. 2013 47th International Carnahan Conference on Security Technology (ICCST). :1—5.
This paper describes results of a study seeking to identify observable events related to insider sabotage. We collected information from actual insider threat cases, created chronological timelines of the incidents, identified key points in each timeline such as when attack planning began, measured the time between key events, and looked for specific observable events or patterns that insiders held in common that may indicate insider sabotage is imminent or likely. Such indicators could be used by security experts to potentially identify malicious activity at or before the time of attack. Our process included critical steps such as identifying the point of damage to the organization as well as any malicious events prior to zero hour that enabled the attack but did not immediately cause harm. We found that nearly 71% of the cases we studied had either no observable malicious action prior to attack, or had one that occurred less than one day prior to attack. Most of the events observed prior to attack were behavioral, not technical, especially those occurring earlier in the case timelines. Of the observed technical events prior to attack, nearly one third involved installation of software onto the victim organizations IT systems.
2021-03-29
Grundy, J..  2020.  Human-centric Software Engineering for Next Generation Cloud- and Edge-based Smart Living Applications. 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID). :1—10.

Humans are a key part of software development, including customers, designers, coders, testers and end users. In this keynote talk I explain why incorporating human-centric issues into software engineering for next-generation applications is critical. I use several examples from our recent and current work on handling human-centric issues when engineering various `smart living' cloud- and edge-based software systems. This includes using human-centric, domain-specific visual models for non-technical experts to specify and generate data analysis applications; personality impact on aspects of software activities; incorporating end user emotions into software requirements engineering for smart homes; incorporating human usage patterns into emerging edge computing applications; visualising smart city-related data; reporting diverse software usability defects; and human-centric security and privacy requirements for smart living systems. I assess the usefulness of these approaches, highlight some outstanding research challenges, and briefly discuss our current work on new human-centric approaches to software engineering for smart living applications.

2021-03-22
Sai, C. C., Prakash, C. S., Jose, J., Mana, S. C., Samhitha, B. K..  2020.  Analysing Android App Privacy Using Classification Algorithm. 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI)(48184). :551–555.
The interface permits the client to scan for a subjective utility on the Play Store; the authorizations posting and the protection arrangement are then routinely recovered, on all events imaginable. The client has then the capability of choosing an interesting authorization, and a posting of pertinent sentences are separated with the guide of the privateer's inclusion and introduced to them, alongside a right depiction of the consent itself. Such an interface allows the client to rapidly assess the security-related dangers of an Android application, by utilizing featuring the pertinent segments of the privateer's inclusion and by introducing helpful data about shrewd authorizations. A novel procedure is proposed for the assessment of privateer's protection approaches with regards to Android applications. The gadget actualized widely facilitates the way toward understanding the security ramifications of placing in 1/3 birthday celebration applications and it has just been checked in a situation to feature troubling examples of uses. The gadget is created in light of expandability, and correspondingly inclines in the strategy can without trouble be worked in to broaden the unwavering quality and adequacy. Likewise, if your application handles non-open or delicate individual information, it would be ideal if you also allude to the extra necessities in the “Individual and Sensitive Information” territory underneath. These Google Play necessities are notwithstanding any prerequisites endorsed by method for material security or data assurance laws. It has been proposed that, an individual who needs to perform the establishment and utilize any 1/3 festival application doesn't perceive the significance and which methods for the consents mentioned by method for an application, and along these lines sincerely gives all the authorizations as a final product of which unsafe applications furthermore get set up and work their malevolent leisure activity in the rear of the scene.
Kellogg, M., Schäf, M., Tasiran, S., Ernst, M. D..  2020.  Continuous Compliance. 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). :511–523.
Vendors who wish to provide software or services to large corporations and governments must often obtain numerous certificates of compliance. Each certificate asserts that the software satisfies a compliance regime, like SOC or the PCI DSS, to protect the privacy and security of sensitive data. The industry standard for obtaining a compliance certificate is an auditor manually auditing source code. This approach is expensive, error-prone, partial, and prone to regressions. We propose continuous compliance to guarantee that the codebase stays compliant on each code change using lightweight verification tools. Continuous compliance increases assurance and reduces costs. Continuous compliance is applicable to any source-code compliance requirement. To illustrate our approach, we built verification tools for five common audit controls related to data security: cryptographically unsafe algorithms must not be used, keys must be at least 256 bits long, credentials must not be hard-coded into program text, HTTPS must always be used instead of HTTP, and cloud data stores must not be world-readable. We evaluated our approach in three ways. (1) We applied our tools to over 5 million lines of open-source software. (2) We compared our tools to other publicly-available tools for detecting misuses of encryption on a previously-published benchmark, finding that only ours are suitable for continuous compliance. (3) We deployed a continuous compliance process at AWS, a large cloud-services company: we integrated verification tools into the compliance process (including auditors accepting their output as evidence) and ran them on over 68 million lines of code. Our tools and the data for the former two evaluations are publicly available.
2021-03-18
Baolin, X., Minhuan, Z..  2020.  A Solution of Text Based CAPTCHA without Network Flow Consumption. 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS). :395—399.

With the widespread application of distributed information processing, information processing security issues have become one of the important research topics; CAPTCHA technology is often used as the first security barrier for distributed information processing and it prevents the client malicious programs to attack the server. The experiment proves that the existing “request / response” mode of CAPTCHA has great security risks. “The text-based CAPTCHA solution without network flow consumption” proposed in this paper avoids the “request / response” mode and the verification logic of the text-based CAPTCHA is migrated to the client in this solution, which fundamentally cuts off the client's attack facing to the server during the verification of the CAPTCHA and it is a high-security text-based CAPTCHA solution without network flow consumption.

2021-03-15
Staicu, C.-A., Torp, M. T., Schäfer, M., Møller, A., Pradel, M..  2020.  Extracting Taint Specifications for JavaScript Libraries. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :198—209.

Modern JavaScript applications extensively depend on third-party libraries. Especially for the Node.js platform, vulnerabilities can have severe consequences to the security of applications, resulting in, e.g., cross-site scripting and command injection attacks. Existing static analysis tools that have been developed to automatically detect such issues are either too coarse-grained, looking only at package dependency structure while ignoring dataflow, or rely on manually written taint specifications for the most popular libraries to ensure analysis scalability. In this work, we propose a technique for automatically extracting taint specifications for JavaScript libraries, based on a dynamic analysis that leverages the existing test suites of the libraries and their available clients in the npm repository. Due to the dynamic nature of JavaScript, mapping observations from dynamic analysis to taint specifications that fit into a static analysis is non-trivial. Our main insight is that this challenge can be addressed by a combination of an access path mechanism that identifies entry and exit points, and the use of membranes around the libraries of interest. We show that our approach is effective at inferring useful taint specifications at scale. Our prototype tool automatically extracts 146 additional taint sinks and 7 840 propagation summaries spanning 1 393 npm modules. By integrating the extracted specifications into a commercial, state-of-the-art static analysis, 136 new alerts are produced, many of which correspond to likely security vulnerabilities. Moreover, many important specifications that were originally manually written are among the ones that our tool can now extract automatically.

Danilova, A., Naiakshina, A., Smith, M..  2020.  One Size Does Not Fit All: A Grounded Theory and Online Survey Study of Developer Preferences for Security Warning Types. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :136–148.
A wide range of tools exist to assist developers in creating secure software. Many of these tools, such as static analysis engines or security checkers included in compilers, use warnings to communicate security issues to developers. The effectiveness of these tools relies on developers heeding these warnings, and there are many ways in which these warnings could be displayed. Johnson et al. [46] conducted qualitative research and found that warning presentation and integration are main issues. We built on Johnson et al.'s work and examined what developers want from security warnings, including what form they should take and how they should integrate into their workflow and work context. To this end, we conducted a Grounded Theory study with 14 professional software developers and 12 computer science students as well as a focus group with 7 academic researchers to gather qualitative insights. To back up the theory developed from the qualitative research, we ran a quantitative survey with 50 professional software developers. Our results show that there is significant heterogeneity amongst developers and that no one warning type is preferred over all others. The context in which the warnings are shown is also highly relevant, indicating that it is likely to be beneficial if IDEs and other development tools become more flexible in their warning interactions with developers. Based on our findings, we provide concrete recommendations for both future research as well as how IDEs and other security tools can improve their interaction with developers.
Hwang, S., Ryu, S..  2020.  Gap between Theory and Practice: An Empirical Study of Security Patches in Solidity. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :542–553.
Ethereum, one of the most popular blockchain platforms, provides financial transactions like payments and auctions through smart contracts. Due to the immense interest in smart contracts in academia, the research community of smart contract security has made a significant improvement recently. Researchers have reported various security vulnerabilities in smart contracts, and developed static analysis tools and verification frameworks to detect them. However, it is unclear whether such great efforts from academia has indeed enhanced the security of smart contracts in reality. To understand the security level of smart contracts in the wild, we empirically studied 55,046 real-world Ethereum smart contracts written in Solidity, the most popular programming language used by Ethereum smart contract developers. We first examined how many well-known vulnerabilities the Solidity compiler has patched, and how frequently the Solidity team publishes compiler releases. Unfortunately, we observed that many known vulnerabilities are not yet patched, and some patches are not even sufficient to avoid their target vulnerabilities. Subsequently, we investigated whether smart contract developers use the most recent compiler with vulnerabilities patched. We reported that developers of more than 98% of real-world Solidity contracts still use older compilers without vulnerability patches, and more than 25% of the contracts are potentially vulnerable due to the missing security patches. To understand actual impacts of the missing patches, we manually investigated potentially vulnerable contracts that are detected by our static analyzer and identified common mistakes by Solidity developers, which may cause serious security issues such as financial loss. We detected hundreds of vulnerable contracts and about one fourth of the vulnerable contracts are used by thousands of people. We recommend the Solidity team to make patches that resolve known vulnerabilities correctly, and developers to use the latest Solidity compiler to avoid missing security patches.
Brauckmann, A., Goens, A., Castrillon, J..  2020.  ComPy-Learn: A toolbox for exploring machine learning representations for compilers. 2020 Forum for Specification and Design Languages (FDL). :1–4.
Deep Learning methods have not only shown to improve software performance in compiler heuristics, but also e.g. to improve security in vulnerability prediction or to boost developer productivity in software engineering tools. A key to the success of such methods across these use cases is the expressiveness of the representation used to abstract from the program code. Recent work has shown that different such representations have unique advantages in terms of performance. However, determining the best-performing one for a given task is often not obvious and requires empirical evaluation. Therefore, we present ComPy-Learn, a toolbox for conveniently defining, extracting, and exploring representations of program code. With syntax-level language information from the Clang compiler frontend and low-level information from the LLVM compiler backend, the tool supports the construction of linear and graph representations and enables an efficient search for the best-performing representation and model for tasks on program code.
2021-03-09
Zhou, B., He, J., Tan, M..  2020.  A Two-stage P2P Botnet Detection Method Based on Statistical Features. 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS). :497—502.

P2P botnet has become one of the most serious threats to today's network security. It can be used to launch kinds of malicious activities, ranging from spamming to distributed denial of service attack. However, the detection of P2P botnet is always challenging because of its decentralized architecture. In this paper, we propose a two-stage P2P botnet detection method which only relies on several traffic statistical features. This method first detects P2P hosts based on three statistical features, and then distinguishes P2P bots from benign P2P hosts by means of another two statistical features. Experimental evaluations on real-world traffic datasets shows that our method is able to detect hidden P2P bots with a detection accuracy of 99.7% and a false positive rate of only 0.3% within 5 minutes.

Mashhadi, M. J., Hemmati, H..  2020.  Hybrid Deep Neural Networks to Infer State Models of Black-Box Systems. 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). :299–311.
Inferring behavior model of a running software system is quite useful for several automated software engineering tasks, such as program comprehension, anomaly detection, and testing. Most existing dynamic model inference techniques are white-box, i.e., they require source code to be instrumented to get run-time traces. However, in many systems, instrumenting the entire source code is not possible (e.g., when using black-box third-party libraries) or might be very costly. Unfortunately, most black-box techniques that detect states over time are either univariate, or make assumptions on the data distribution, or have limited power for learning over a long period of past behavior. To overcome the above issues, in this paper, we propose a hybrid deep neural network that accepts as input a set of time series, one per input/output signal of the system, and applies a set of convolutional and recurrent layers to learn the non-linear correlations between signals and the patterns, over time. We have applied our approach on a real UAV auto-pilot solution from our industry partner with half a million lines of C code. We ran 888 random recent system-level test cases and inferred states, over time. Our comparison with several traditional time series change point detection techniques showed that our approach improves their performance by up to 102%, in terms of finding state change points, measured by F1 score. We also showed that our state classification algorithm provides on average 90.45% F1 score, which improves traditional classification algorithms by up to 17%.
2021-03-04
Yangchun, Z., Zhao, Y., Yang, J..  2020.  New Virus Infection Technology and Its Detection. 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS). :388—394.

Computer virus detection technology is an important basic security technology in the information age. The current detection technology has a high success rate for the detection of known viruses and known virus infection technologies, but the development of detection technology often lags behind the development of computer virus infection technology. Under Windows system, there are many kinds of file viruses, which change rapidly, and pose a continuous security threat to users. The research of new file virus infection technology can provide help for the development of virus detection technology. In this paper, a new virus infection technology based on dynamic binary analysis is proposed to execute file virus infection. Using the new virus infection technology, the infected executable file can be detected in the experimental environment. At the same time, this paper discusses the detection method of new virus infection technology. We hope to provide help for the development of virus detection technology from the perspective of virus design.

2021-02-08
Chesnokov, N. I., Korochentsev, D. A., Cherckesova, L. V., Safaryan, O. A., Chumakov, V. E., Pilipenko, I. A..  2020.  Software Development of Electronic Digital Signature Generation at Institution Electronic Document Circulation. 2020 IEEE East-West Design Test Symposium (EWDTS). :1–5.
the purpose of this paper is investigation of existing approaches to formation of electronic digital signatures, as well as the possibility of software developing for electronic signature generation at electronic document circulation of institution. The article considers and analyzes the existing algorithms for generating and processing electronic signatures. Authors propose the model for documented information exchanging in institution, including cryptographic module and secure key storage, blockchain storage of electronic signatures, central web-server and web-interface. Examples of the developed software are demonstrated, and recommendations are given for its implementation, integration and using in different institutions.
Wang, Y., Wen, M., Liu, Y., Wang, Y., Li, Z., Wang, C., Yu, H., Cheung, S.-C., Xu, C., Zhu, Z..  2020.  Watchman: Monitoring Dependency Conflicts for Python Library Ecosystem. 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE). :125–135.
The PyPI ecosystem has indexed millions of Python libraries to allow developers to automatically download and install dependencies of their projects based on the specified version constraints. Despite the convenience brought by automation, version constraints in Python projects can easily conflict, resulting in build failures. We refer to such conflicts as Dependency Conflict (DC) issues. Although DC issues are common in Python projects, developers lack tool support to gain a comprehensive knowledge for diagnosing the root causes of these issues. In this paper, we conducted an empirical study on 235 real-world DC issues. We studied the manifestation patterns and fixing strategies of these issues and found several key factors that can lead to DC issues and their regressions. Based on our findings, we designed and implemented Watchman, a technique to continuously monitor dependency conflicts for the PyPI ecosystem. In our evaluation, Watchman analyzed PyPI snapshots between 11 Jul 2019 and 16 Aug 2019, and found 117 potential DC issues. We reported these issues to the developers of the corresponding projects. So far, 63 issues have been confirmed, 38 of which have been quickly fixed by applying our suggested patches.
2021-02-01
Calhoun, C. S., Reinhart, J., Alarcon, G. A., Capiola, A..  2020.  Establishing Trust in Binary Analysis in Software Development and Applications. 2020 IEEE International Conference on Human-Machine Systems (ICHMS). :1–4.
The current exploratory study examined software programmer trust in binary analysis techniques used to evaluate and understand binary code components. Experienced software developers participated in knowledge elicitations to identify factors affecting trust in tools and methods used for understanding binary code behavior and minimizing potential security vulnerabilities. Developer perceptions of trust in those tools to assess implementation risk in binary components were captured across a variety of application contexts. The software developers reported source security and vulnerability reports provided the best insight and awareness of potential issues or shortcomings in binary code. Further, applications where the potential impact to systems and data loss is high require relying on more than one type of analysis to ensure the binary component is sound. The findings suggest binary analysis is viable for identifying issues and potential vulnerabilities as part of a comprehensive solution for understanding binary code behavior and security vulnerabilities, but relying simply on binary analysis tools and binary release metadata appears insufficient to ensure a secure solution.