Biblio

Filters: Keyword is work factor metrics
2021-03-01
Said, S., Bouloiz, H., Gallab, M..  2020.  Identification and Assessment of Risks Affecting Sociotechnical Systems Resilience. 2020 IEEE 6th International Conference on Optimization and Applications (ICOA). :1–10.
Resilience is regarded nowadays as the ideal solution that sociotechnical systems can envisage for coping with potential threats and crises. That said, gaining and maintaining this ability is not always easy, given the multitude of risks driving adverse and challenging events. This paper proposes a method dedicated to the assessment of risks directly affecting resilience. This work is conducted within the framework of risk assessment and resilience engineering approaches. A 5×5 matrix, dedicated to the identification and assessment of risk factors that threaten system resilience, has been developed. This matrix consists of two axes: the impact on resilience metrics, and the availability and effectiveness of resilience planning. Checklists serving to collect information about these two attributes are established, and a case study is undertaken. In summary, this paper presents a new method for identifying and assessing risk factors that directly threaten the resilience of a given system. The analysis of these risks must be given priority to make the system more resilient to shocks.
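
A concrete (and purely illustrative) sketch of the two-axis lookup, assuming 1-5 scales and equal-width bands; the paper defines its own cells and labels.

RISK_LABELS = ["negligible", "low", "moderate", "high", "critical"]

def risk_level(impact: int, planning_gap: int) -> str:
    """Map impact-on-resilience-metrics and resilience-planning-gap scores
    (both 1-5, 5 = worst) onto a qualitative risk level. Banding is an
    illustrative assumption, not the paper's matrix."""
    assert 1 <= impact <= 5 and 1 <= planning_gap <= 5
    score = impact * planning_gap        # 1..25
    band = min((score - 1) // 5, 4)      # five equal bands
    return RISK_LABELS[band]

print(risk_level(impact=4, planning_gap=5))  # -> "high"
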
Kerim, A., Genc, B..  2020.  Mobile Games Success and Failure: Mining the Hidden Factors. 2020 7th International Conference on Soft Computing Machine Intelligence (ISCMI). :167–171.
Predicting the success of a mobile game is a prime issue in the game industry. Thousands of games are released each day, yet only a few succeed while the majority fail. This work investigates the potential correlation between the success of a mobile game and its specific attributes, considering more than 17 thousand games. We show that specific game attributes, such as the number of IAPs (In-App Purchases), belonging to the puzzle genre, supporting different languages, and being produced by a mature developer, highly and positively affect the future success of the game. Moreover, we show that releasing the game in July and not including any IAPs appear to be highly associated with the game's failure. Our second main contribution is the proposal of a novel success score metric that reflects multiple objectives, in contrast to evaluating only revenue, average rating or rating count. We also employ different machine learning models, namely SVM (Support Vector Machine), RF (Random Forest) and Deep Learning (DL), to predict this success score for a mobile game given its attributes. The trained models were able to predict this score, as well as the average rating and rating count of a mobile game, with more than 70% accuracy. This prediction can help developers avoid potential disappointments before releasing their game to the market.
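
As an illustration of the prediction step, the following sketch trains a Random Forest regressor on fabricated game attributes with a hypothetical success score; the real study uses ~17 thousand games and also evaluates SVM and deep learning models.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.integers(0, 50, n),      # number of IAPs
    rng.integers(0, 2, n),       # puzzle-genre flag
    rng.integers(1, 30, n),      # languages supported
    rng.integers(0, 2, n),       # mature-developer flag
])
# Hypothetical success score, loosely following the reported correlations.
y = 0.3 * X[:, 0] + 2.0 * X[:, 1] + 0.1 * X[:, 2] + 1.5 * X[:, 3] + rng.normal(0, 1, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"R^2 on held-out games: {model.score(X_te, y_te):.2f}")
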
Raj, C., Khular, L., Raj, G..  2020.  Clustering Based Incident Handling For Anomaly Detection in Cloud Infrastructures. 2020 10th International Conference on Cloud Computing, Data Science Engineering (Confluence). :611–616.
Incident Handling for Cloud Infrastructures focuses on how clustering-based and non-clustering-based algorithms can be implemented. Our research focuses on identifying anomalies and suspicious activities that might happen inside a cloud infrastructure, using available datasets. A brief study has been conducted in which a network statistics dataset, NSL-KDD, was chosen as the model to be worked upon, such that it can mirror a cloud infrastructure and its components. An important aspect of cloud security is to implement anomaly detection mechanisms in order to monitor the incidents that inhibit the development and efficiency of the cloud. Several methods help in achieving this goal: applying the Local Outlier Factor to cancel the noise created by irrelevant data points; applying the DBSCAN algorithm, which can detect less dense areas in order to identify the cause of their clustering; applying the K-Means algorithm to generate positive and negative clusters and thereby identify the anomalous ones; and applying the Isolation Forest algorithm to implement a decision-based approach to anomaly detection. The best algorithm would help in finding and fixing anomalies efficiently and would help us develop an incident handling model for the cloud.
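
The four detectors named above are all available in scikit-learn; the sketch below runs them on synthetic two-dimensional features standing in for NSL-KDD records, flagging points labelled -1 as outliers/noise.

import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)
normal = rng.normal(0, 1, size=(500, 2))       # benign traffic features
anomalies = rng.uniform(-6, 6, size=(20, 2))   # scattered suspicious points
X = np.vstack([normal, anomalies])

lof = LocalOutlierFactor(n_neighbors=20).fit_predict(X)   # -1 = outlier
dbscan = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)    # -1 = noise
iso = IsolationForest(random_state=0).fit_predict(X)      # -1 = anomaly
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

print("LOF outliers:", (lof == -1).sum())
print("DBSCAN noise points:", (dbscan == -1).sum())
print("IsolationForest anomalies:", (iso == -1).sum())
print("KMeans far-from-centroid points:", (dist > dist.mean() + 3 * dist.std()).sum())
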
Perisetty, A., Bodempudi, S. T., Shaik, P. Rahaman, Kumar, B. L. N. Phaneendra.  2020.  Classification of Hyperspectral Images using Edge Preserving Filter and Nonlinear Support Vector Machine (SVM). 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS). :1050–1054.
A hyperspectral image is acquired with a special sensor in which information is collected continuously. This sensor provides abundant data from the captured scene. The high volume of data in such an image enables the extraction of materials and other valuable items in it. This paper proposes a methodology to extract rich information from hyperspectral images. As the information is collected in a contiguous manner, there is a need to extract spectral bands that are uncorrelated. A factor analysis based dimensionality reduction technique is employed to extract the spectral bands, and a weighted least squares filter is used to get the spatial information from the data. Due to the edge-preserving property of the spatial filter, much information is extracted during the feature extraction phase. Finally, a nonlinear SVM is applied to assign a class label to the pixels in the image. The research work is tested on the standard Indian Pines dataset. The performance of the proposed method on this dataset is assessed through various accuracy measures; the resulting accuracies of 96%, 92.6%, and 95.4% improve over the other methods. This methodology can be applied in forestry applications to extract various metrics in the real world.
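
A minimal sketch of the spectral side of this pipeline, assuming synthetic data in place of Indian Pines: factor-analysis band reduction followed by a nonlinear (RBF) SVM. The edge-preserving weighted least squares spatial filter is omitted for brevity.

import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_pixels, n_bands = 1000, 200                  # Indian Pines has 200 usable bands
X = rng.normal(size=(n_pixels, n_bands))       # stand-in spectral signatures
y = rng.integers(0, 16, n_pixels)              # 16 land-cover classes

X_fa = FactorAnalysis(n_components=10, random_state=0).fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_fa, y, random_state=0)
clf = SVC(kernel="rbf", C=10, gamma="scale").fit(X_tr, y_tr)
print(f"Accuracy on synthetic data: {clf.score(X_te, y_te):.2f}")
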
Nasir, J., Norman, U., Bruno, B., Dillenbourg, P..  2020.  When Positive Perception of the Robot Has No Effect on Learning. 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). :313–320.
Humanoid robots, with a focus on personalised social behaviours, are increasingly being deployed in educational settings to support learning. However, crafting pedagogical HRI designs and robot interventions that have a real, positive impact on participants' learning, as well as effectively measuring such impact, is still an open challenge. As a first effort in tackling the issue, in this paper we propose a novel robot-mediated, collaborative problem solving activity for school children, called JUSThink, aiming at improving their computational thinking skills. JUSThink will serve as a baseline and reference for investigating how the robot's behaviour can influence the engagement of the children with the activity, as well as their collaboration and mutual understanding while working on it. To this end, this first iteration aims at investigating (i) participants' engagement with the activity (Intrinsic Motivation Inventory, IMI), their mutual understanding (IMI-like) and perception of the robot (Godspeed Questionnaire); (ii) participants' performance during the activity, using several performance and learning metrics. We carried out an extensive user study in two international schools in Switzerland, in which around 100 children participated in pairs in one-hour-long interactions with the activity. Surprisingly, we observe that while a team's performance significantly affects how team members evaluate their competence, mutual understanding and task engagement, it does not affect their perception of the robot and its helpfulness, a fact which highlights the need for baseline studies and multi-dimensional evaluation metrics when assessing the impact of robots in educational activities.
Zhang, Y., Groves, T., Cook, B., Wright, N. J., Coskun, A. K..  2020.  Quantifying the impact of network congestion on application performance and network metrics. 2020 IEEE International Conference on Cluster Computing (CLUSTER). :162–168.
In modern high-performance computing (HPC) systems, network congestion is an important factor contributing to performance degradation. However, how network congestion impacts application performance is not fully understood. Because the Aries network, a recent HPC network architecture featuring a dragonfly topology, is equipped with network counters measuring packet transmission statistics on each router, these network metrics can potentially be utilized to understand network performance. In this work, through experiments on a large HPC system, we quantify the impact of network congestion on various applications' performance in terms of execution time, and we correlate application performance with network metrics. Our results demonstrate diverse impacts of network congestion: while applications with intensive MPI operations (such as HACC and MILC) suffer more than 40% longer execution times under network congestion, applications with less intensive MPI operations (such as Graph500 and HPCG) are mostly not affected. We also demonstrate that a stall-to-flit ratio metric derived from Aries network counters is positively correlated with performance degradation and, thus, can serve as an indicator of network congestion in HPC systems.
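
A stall-to-flit ratio can be computed directly from per-router counters; the sketch below uses invented counter names and values purely to show the shape of the metric.

def stall_to_flit_ratio(routers):
    """Total stalled cycles over total flits transmitted, across routers."""
    total_stalls = sum(r["stall_cycles"] for r in routers)
    total_flits = sum(r["flits_sent"] for r in routers)
    return total_stalls / max(total_flits, 1)

counters = [
    {"stall_cycles": 1_200_000, "flits_sent": 9_000_000},
    {"stall_cycles": 4_500_000, "flits_sent": 8_500_000},  # congested router
]
print("stall/flit = %.3f" % stall_to_flit_ratio(counters))  # higher => more congested
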
Hynes, E., Flynn, R., Lee, B., Murray, N..  2020.  An Evaluation of Lower Facial Micro Expressions as an Implicit QoE Metric for an Augmented Reality Procedure Assistance Application. 2020 31st Irish Signals and Systems Conference (ISSC). :1–6.
Augmented reality (AR) has been identified as a key technology to enhance worker utility in the context of increasing automation of repeatable procedures. AR can achieve this by assisting the user in performing complex and frequently changing procedures. Crucial to the success of procedure assistance AR applications is user acceptability, which can be measured by user quality of experience (QoE). An active research topic in QoE is the identification of implicit metrics that can be used to continuously infer user QoE during a multimedia experience. A user's QoE is linked to their affective state. Affective state is reflected in facial expressions. Emotions shown in micro facial expressions resemble those expressed in normal expressions but are distinguished from them by their brief duration. The novelty of this work lies in the evaluation of micro facial expressions as a continuous QoE metric by means of correlation analysis to the more traditional and accepted post-experience self-reporting. In this work, an optimal Rubik's Cube solver AR application was used as a proof of concept for complex procedure assistance. This was compared with a paper-based procedure assistance control. QoE expressed by affect in normal and micro facial expressions was evaluated through correlation analysis with post-experience reports. The results show that the AR application yielded higher task success rates and shorter task durations. Micro facial expressions reflecting disgust correlated moderately to the questionnaire responses for instruction disinterest in the AR application.
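
The correlation step itself is simple; the sketch below relates fabricated per-participant micro-expression counts to self-reported disinterest scores with a Spearman rank correlation.

from scipy.stats import spearmanr

disgust_counts = [0, 2, 5, 1, 7, 3, 4, 6]                # micro expressions per session
disinterest = [1.0, 2.5, 4.0, 1.5, 4.5, 3.0, 3.5, 4.0]   # self-reported scores (1-5)

rho, p = spearmanr(disgust_counts, disinterest)
print(f"Spearman rho={rho:.2f}, p={p:.3f}")              # moderate positive correlation
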
Golagha, M., Pretschner, A., Briand, L. C..  2020.  Can We Predict the Quality of Spectrum-based Fault Localization? 2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST). :4–15.
Fault localization and repair are time-consuming and tedious. There is a significant and growing need for automated techniques to support such tasks. Despite significant progress in this area, existing fault localization techniques are not yet widely applied in practice, and their effectiveness varies greatly from case to case. Existing work suggests new algorithms and ideas, as well as adjustments to test suites, to improve the effectiveness of automated fault localization. However, important questions remain open: Why is the effectiveness of these techniques so unpredictable? What are the factors that influence the effectiveness of fault localization? Can we accurately predict fault localization effectiveness? In this paper, we try to answer these questions by collecting 70 static, dynamic, test suite, and fault-related metrics that we hypothesize are related to effectiveness. Our analysis shows that a combination of only a few static, dynamic, and test metrics enables the construction of a prediction model with excellent discrimination power between levels of effectiveness (eight metrics yielding an AUC of 0.86; fifteen metrics yielding an AUC of 0.88). The model hence yields a practically useful confidence factor that can be used to assess the potential effectiveness of fault localization. Since these metrics are the most influential in explaining the effectiveness of fault localization, they can also serve as a guide for corrective actions on code and test suites leading to more effective fault localization.
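
As a hedged illustration of such a prediction model, the following sketch fits a logistic regression on eight synthetic metrics and reports its AUC; the paper's actual metrics and models differ.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n, n_metrics = 500, 8                       # eight metrics, as in the paper
X = rng.normal(size=(n, n_metrics))
logit = X @ rng.normal(size=n_metrics)      # hypothetical linear signal
y = (logit + rng.normal(0, 1, n) > 0).astype(int)  # 1 = effective localization

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"AUC: {roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]):.2f}")
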
2021-01-15
Kumar, A., Bhavsar, A., Verma, R..  2020.  Detecting Deepfakes with Metric Learning. 2020 8th International Workshop on Biometrics and Forensics (IWBF). :1–6.

With the arrival of several face-swapping applications such as FaceApp, SnapChat, MixBooth, FaceBlender and many more, the authenticity of digital media content is hanging on a very loose thread. On social media platforms, videos are widely circulated, often at a high compression factor. In this work, we analyze several deep learning approaches in the context of deepfake classification in high compression scenarios and demonstrate that a proposed approach based on metric learning can be very effective in performing such a classification. Even when using fewer frames per video to assess its realism, the metric learning approach using a triplet network architecture proves fruitful: it learns to enhance the feature space distance between the clusters of real and fake video embedding vectors. We validated our approaches on two datasets to analyze the behavior in different environments. We achieved a state-of-the-art AUC score of 99.2% on the Celeb-DF dataset and an accuracy of 90.71% on a highly compressed Neural Texture dataset. Our approach is especially helpful on social media platforms where data compression is inevitable.
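
A minimal triplet-loss sketch in PyTorch, with a toy embedding network and random features standing in for video frames; the paper's triplet network likewise pulls same-class embeddings together and pushes real and fake embeddings apart.

import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 64))
loss_fn = nn.TripletMarginLoss(margin=1.0)

anchor = embed(torch.randn(32, 512))    # features from real-video frames
positive = embed(torch.randn(32, 512))  # features from another real video
negative = embed(torch.randn(32, 512))  # features from a deepfake video

loss = loss_fn(anchor, positive, negative)
loss.backward()                          # gradients push fakes away in embedding space
print(f"triplet loss: {loss.item():.3f}")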

2018-12-10
Widder, David Gray, Hilton, Michael, Kästner, Christian, Vasilescu, Bogdan.  2018.  I'm Leaving You, Travis: A Continuous Integration Breakup Story. Proceedings of the 15th International Conference on Mining Software Repositories. :165–169.

Continuous Integration (CI) services, which can automatically build, test, and deploy software projects, are an invaluable asset in distributed teams, increasing productivity and helping to maintain code quality. Prior work has shown that CI pipelines can be sophisticated, and choosing and configuring a CI system involves tradeoffs. As CI technology matures, new CI tool offerings arise to meet the distinct wants and needs of software teams, as they negotiate a path through these tradeoffs, depending on their context. In this paper, we begin to uncover these nuances, and tell the story of open-source projects falling out of love with Travis, the earliest and most popular cloud-based CI system. Using logistic regression, we quantify the effects that open-source community factors and project technical factors have on the rate of Travis abandonment. We find that increased build complexity reduces the chances of abandonment, that larger projects abandon at higher rates, and that a project's dominant language has significant but varying effects. Finally, we find the surprising result that metrics of configuration attempts and knowledge dispersion in the project do not affect the rate of abandonment.
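
The modelling step can be sketched as follows with fabricated project data; the coefficient signs mirror the reported findings (higher build complexity lowers abandonment, larger size raises it), but everything else is illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 1000
build_complexity = rng.normal(0, 1, n)    # e.g. standardized .travis.yml size
project_size = rng.normal(0, 1, n)        # e.g. standardized log(contributors)
X = np.column_stack([build_complexity, project_size])
# Synthetic outcome with the reported effect directions.
logit = -1.2 * build_complexity + 0.8 * project_size
y = (logit + rng.logistic(0, 1, n) > 0).astype(int)   # 1 = abandoned Travis

model = LogisticRegression().fit(X, y)
print("coefficients (complexity, size):", np.round(model.coef_[0], 2))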

Edge, Darren, Larson, Jonathan, White, Christopher.  2018.  Bringing AI to BI: Enabling Visual Analytics of Unstructured Data in a Modern Business Intelligence Platform. Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems. :CS02:1–CS02:9.

The Business Intelligence (BI) paradigm is challenged by emerging use cases such as news and social media analytics in which the source data are unstructured, the analysis metrics are unspecified, and the appropriate visual representations are unsupported by mainstream tools. This case study documents the work undertaken in Microsoft Research to enable these use cases in the Microsoft Power BI product. Our approach comprises: (a) back-end pipelines that use AI to infer navigable data structures from streams of unstructured text, media and metadata; and (b) front-end representations of these structures grounded in the Visual Analytics literature. Through our creation of multiple end-to-end data applications, we learned that representing the varying quality of inferred data structures was crucial for making the use and limitations of AI transparent to users. We conclude with reflections on BI in the age of AI, big data, and democratized access to data analytics.

Pasricha, Rajiv, McAuley, Julian.  2018.  Translation-based Factorization Machines for Sequential Recommendation. Proceedings of the 12th ACM Conference on Recommender Systems. :63–71.

Sequential recommendation algorithms aim to predict users' future behavior given their historical interactions. A recent line of work has achieved state-of-the-art performance on sequential recommendation tasks by adapting ideas from metric learning and knowledge-graph completion. These algorithms replace inner products with low-dimensional embeddings and distance functions, employing a simple translation dynamic to model user behavior over time. In this paper, we propose TransFM, a model that combines translation and metric-based approaches for sequential recommendation with Factorization Machines (FMs). Doing so allows us to reap the benefits of FMs (in particular, the ability to straightforwardly incorporate content-based features), while enhancing the state-of-the-art performance of translation-based models in sequential settings. Specifically, we learn an embedding and translation space for each feature dimension, replacing the inner product with the squared Euclidean distance to measure the interaction strength between features. Like FMs, we show that the model equation for TransFM can be computed in linear time and optimized using classical techniques. As TransFM operates on arbitrary feature vectors, additional content information can be easily incorporated without significant changes to the model itself. Empirically, the performance of TransFM significantly increases when taking content features into account, outperforming state-of-the-art models on sequential recommendation tasks for a wide variety of datasets.
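
A hedged reconstruction, in FM-style notation, of the scoring function this describes: each feature i gets an embedding v_i and a translation vector v_i', and the FM inner product is replaced by a squared Euclidean distance between the translated embedding and the other feature's embedding (the exact form is the paper's to define).

% Sketch of the TransFM-style model equation, assuming standard FM notation:
% w_0, w_i are linear weights; v_i, v_i' are embedding/translation vectors.
\hat{y}(\mathbf{x}) \;=\; w_0 \;+\; \sum_{i=1}^{n} w_i x_i
\;+\; \sum_{i=1}^{n}\sum_{j=i+1}^{n}
d^2\!\left(\mathbf{v}_i + \mathbf{v}'_i,\; \mathbf{v}_j\right) x_i x_j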

Hashemi, Soheil, Tann, Hokchhay, Reda, Sherief.  2018.  BLASYS: Approximate Logic Synthesis Using Boolean Matrix Factorization. Proceedings of the 55th Annual Design Automation Conference. :55:1–55:6.

Approximate computing is an emerging paradigm where design accuracy can be traded off for benefits in design metrics such as design area, power consumption or circuit complexity. In this work, we present a novel paradigm to synthesize approximate circuits using Boolean matrix factorization (BMF). In our methodology the truth table of a sub-circuit of the design is approximated using BMF to a controllable approximation degree, and the results of the factorization are used to synthesize a less complex subcircuit. To scale our technique to large circuits, we devise a circuit decomposition method and a subcircuit design-space exploration technique to identify the best order for subcircuit approximations. Our method leads to a smooth trade-off between accuracy and full circuit complexity as measured by design area and power consumption. Using an industrial strength design flow, we extensively evaluate our methodology on a number of testcases, where we demonstrate that the proposed methodology can achieve up to 63% in power savings, while introducing an average relative error of 5%. We also compare our work to previous works in Boolean circuit synthesis and demonstrate significant improvements in design metrics for same accuracy targets.
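
The core idea can be illustrated with a toy Boolean factorization of a small truth table; the exhaustive rank-1 search below is only a stand-in for the paper's BMF algorithm and circuit-decomposition machinery.

import numpy as np

rng = np.random.default_rng(5)
U_true = rng.integers(0, 2, (16, 2))
V_true = rng.integers(0, 2, (2, 4))
B = (U_true @ V_true > 0).astype(int)          # 4-input, 4-output truth table

def boolean_reconstruct(U, V):
    """Boolean matrix product: OR of ANDs."""
    return (U @ V > 0).astype(int)

# Pick the best rank-1 approximation; lower rank => simpler subcircuit,
# at the price of flipped truth-table entries (the approximation error).
best_err = B.size + 1
for u_col in range(2):
    for v_row in range(2):
        approx = boolean_reconstruct(U_true[:, [u_col]], V_true[[v_row], :])
        best_err = min(best_err, int((approx != B).sum()))
print(f"rank-1 approximation flips {best_err}/{B.size} truth-table entries")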

Kala, Srikant Manas, Sathya, Vanlin, Reddy, M. Pavan Kumar, Tamma, Bheemarjuna Reddy.  2018.  iCALM: A Topology Agnostic Socio-inspired Channel Assignment Performance Prediction Metric for Mesh Networks. :702–704.

A multitude of Channel Assignment (CA) schemes have created a paradox of plenty, making CA selection for Wireless Mesh Networks (WMNs) an onerous task. CA performance prediction (CAPP) metrics are novel tools that address the problem of appropriate CA selection. However, most CAPP metrics depend upon a variety of factors such as the WMN topology, the type of CA scheme, and connectedness of the underlying graph. In this work, we propose an improved Channel Assignment Link-Weight Metric (iCALM) that is independent of these constraints. To the best of our knowledge, iCALM is the first universal CAPP metric for WMNs. To evaluate iCALM, we design two WMN topologies that conform to the attributes of real-world mesh network deployments, and run rigorous simulations in ns-3. We compare iCALM to four existing CAPP metrics, and demonstrate that it performs exceedingly well, regardless of the CA type, and the WMN layout.

Cui, Limeng, Chen, Zhensong, Zhang, Jiawei, He, Lifang, Shi, Yong, Yu, Philip S..  2018.  Multi-view Collective Tensor Decomposition for Cross-modal Hashing. Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. :73–81.

Multimedia data available in various disciplines are usually heterogeneous, containing representations in multi-views, where the cross-modal search techniques become necessary and useful. It is a challenging problem due to the heterogeneity of data with multiple modalities, multi-views in each modality and the diverse data categories. In this paper, we propose a novel multi-view cross-modal hashing method named Multi-view Collective Tensor Decomposition (MCTD) to fuse these data effectively, which can exploit the complementary feature extracted from multi-modality multi-view while simultaneously discovering multiple separated subspaces by leveraging the data categories as supervision information. Our contributions are summarized as follows: 1) we exploit tensor modeling to get better representation of the complementary features and redefine a latent representation space; 2) a block-diagonal loss is proposed to explicitly pursue a more discriminative latent tensor space by exploring supervision information; 3) we propose a new feature projection method to characterize the data and to generate the latent representation for incoming new queries. An optimization algorithm is proposed to solve the objective function designed for MCTD, which works under an iterative updating procedure. Experimental results prove the state-of-the-art precision of MCTD compared with competing methods.

Ma, L. M., IJtsma, M., Feigh, K. M., Paladugu, A., Pritchett, A. R..  2018.  Modelling and evaluating failures in human-robot teaming using simulation. 2018 IEEE Aerospace Conference. :1–16.

As robotic capabilities improve and robots become more capable as team members, a better understanding of effective human-robot teaming is needed. In this paper, we investigate failures by robots in various team configurations in space EVA operations. This paper describes the methodology of extending Work Models that Compute (WMC), a computational simulation framework, and applying it to model robot failures, interruptions, and the resolutions they require. Using these models, we investigate how different team configurations respond to a robot's failure to correctly complete the task and overall mission. We also identify key factors that impact the teamwork metrics, for team designers to keep in mind while assembling teams and assigning taskwork to the agents. We highlight the team performance metrics that these failures impact through the varying components of teaming and interaction that occur. Finally, we discuss the implications of this work and the future work to be done to investigate function allocation in human-robot teams.

Versluis, L., Neacsu, M., Iosup, A..  2018.  A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters. 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). :223–232.

To improve customer experience, datacenter operators offer support for simplifying application and resource management. For example, running workloads of workflows on behalf of customers is desirable, but requires increasingly more sophisticated autoscaling policies, that is, policies that dynamically provision resources for the customer. Although selecting and tuning autoscaling policies is a challenging task for datacenter operators, so far relatively few studies investigate the performance of autoscaling for workloads of workflows. Complementing previous knowledge, in this work we propose the first comprehensive performance study in the field. Using trace-based simulation, we compare state-of-the-art autoscaling policies across multiple application domains, workload arrival patterns (e.g., burstiness), and system utilization levels. We further investigate the interplay between autoscaling and regular allocation policies, and the complexity cost of autoscaling. Our quantitative study focuses not only on traditional performance metrics and on state-of-the-art elasticity metrics, but also on time- and memory-related autoscaling-complexity metrics. Our main results give strong and quantitative evidence about previously unreported operational behavior, for example, that autoscaling policies perform differently across application domains and allocation and provisioning policies should be co-designed.

Shathanaa, R., Ramasubramanian, N..  2018.  Improving Power & Latency Metrics for Hardware Trojan Detection During High Level Synthesis. 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT). :1–7.

The globalization and outsourcing of the semiconductor industry has raised serious concerns about the trustworthiness of hardware. Importing third-party IP cores into integrated circuit designs has opened the gates to new forms of attack on hardware. Hardware Trojans embedded in third-party IPs have necessitated a secure IC design process. Design-for-Trust techniques aimed at detecting Hardware Trojans come with overhead in terms of area, latency and power consumption. In this work, we present a Cuckoo Search algorithm based design space exploration process for finding low-cost hardware solutions during High Level Synthesis. The exploration is conducted with respect to datapath resource allocation for single and nested loops. The proposed algorithm is compared with existing Hardware Trojan detection mechanisms, and experimental results show that it achieves a 3x improvement in cost when compared with existing algorithms.
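
A bare-bones cuckoo search sketch, minimizing a stand-in cost over a two-dimensional design space; the Lévy flight is approximated with heavy-tailed Cauchy steps, and all constants are assumptions rather than the paper's settings.

import numpy as np

def cost(x):
    """Hypothetical area/latency/power cost of a design point."""
    return np.sum((x - 3.0) ** 2)

rng = np.random.default_rng(11)
n_nests, dim, pa, iters = 15, 2, 0.25, 200
nests = rng.uniform(-10, 10, (n_nests, dim))
fitness = np.apply_along_axis(cost, 1, nests)

for _ in range(iters):
    # Heavy-tailed (Levy-like) moves generate new candidate solutions.
    candidates = nests + rng.standard_cauchy((n_nests, dim)) * 0.1
    cand_fit = np.apply_along_axis(cost, 1, candidates)
    improved = cand_fit < fitness
    nests[improved], fitness[improved] = candidates[improved], cand_fit[improved]
    # Abandon a fraction pa of the worst nests and re-seed them randomly.
    n_drop = int(pa * n_nests)
    worst = np.argsort(fitness)[-n_drop:]
    nests[worst] = rng.uniform(-10, 10, (n_drop, dim))
    fitness[worst] = np.apply_along_axis(cost, 1, nests[worst])

print("best design point:", np.round(nests[np.argmin(fitness)], 2))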

Pulparambil, S., Baghdadi, Y., Al-Hamdani, A., Al-Badawi, M..  2018.  Service Design Metrics to Predict IT-Based Drivers of Service Oriented Architecture Adoption. 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT). :1–7.

The key factors for deploying successful services are centered on the service design practices adopted by an enterprise. The design-level information should be validated, and measures are required to quantify structural attributes. Metrics at this stage support an early discovery of design flaws and help designers predict the capabilities of service oriented architecture (SOA) adoption. In this work, we take a deeper look at how we can forecast the key SOA capabilities, infrastructure efficiency and service reuse, from service designs modeled with the SOA modeling language. The proposed approach defines metrics based on the structural and domain-level similarity of service operations. The proposed metrics are analytically validated with respect to software engineering metric properties. Moreover, a tool has been developed to automate the proposed approach, and the results indicate that the metrics predict the SOA capabilities at the service design stage. This work can be further extended to predict business-oriented capabilities of SOA adoption such as flexibility and agility.

2018-03-26
Chekuri, Chandra, Madan, Vivek.  2017.  Approximating Multicut and the Demand Graph. Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. :855–874.

In the minimum Multicut problem, the input is an edge-weighted supply graph G = (V, E) and a demand graph H = (V, F). Either both G and H are directed (Dir-MulC) or both are undirected (Undir-MulC). The goal is to remove a minimum weight set of supply edges E' ⊆ E such that in G - E' there is no path from s to t for any demand edge (s, t) ∈ F. Undir-MulC admits an O(log k)-approximation, where k is the number of edges in H, while the best known approximation for Dir-MulC is min{k, Õ(|V|^(11/23))}. These approximations are obtained by proving corresponding results on the multicommodity flow-cut gap. In this paper we consider the role that the structure of the demand graph plays in determining the approximability of Multicut. We obtain several new positive and negative results. In undirected graphs our main result is a 2-approximation in n^(O(t)) time when the demand graph excludes an induced matching of size t. This gives a constant factor approximation for a specific demand graph that motivated this work, and is based on a reduction to uniform metric labeling and not via the flow-cut gap. In contrast to the positive result for undirected graphs, we prove that in directed graphs such approximation algorithms cannot exist. We prove, assuming the Unique Games Conjecture (UGC), that for a large class of fixed demand graphs Dir-MulC cannot be approximated to a factor better than the worst-case flow-cut gap. As a consequence we prove that for any fixed k, assuming UGC, Dir-MulC with k demand pairs is hard to approximate to within a factor better than k. On the positive side, we obtain a k-approximation when the demand graph excludes certain graphs as an induced subgraph. This generalizes the known 2-approximation for directed Multiway Cut to a larger class of demand graphs.

Eskandanian, Farzad, Mobasher, Bamshad, Burke, Robin.  2017.  A Clustering Approach for Personalizing Diversity in Collaborative Recommender Systems. Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. :280–284.

Much of the focus of recommender systems research has been on the accurate prediction of users' ratings for unseen items. Recent work has suggested that objectives such as diversity and novelty in recommendations are also important factors in the effectiveness of a recommender system. However, methods that attempt to increase diversity of recommendation lists for all users without considering each user's preference or tolerance for diversity may lead to monotony for some users and to poor recommendations for others. Our goal in this research is to evaluate the hypothesis that users' propensity towards diversity varies greatly and that the diversity of recommendation lists should be consistent with the level of user interest in diverse recommendations. We propose a pre-filtering clustering approach to group users with similar levels of tolerance for diversity. Our contributions are twofold. First, we propose a method for personalizing diversity by performing collaborative filtering independently on different segments of users based on the degree of diversity in their profiles. Secondly, we investigate the accuracy-diversity tradeoffs using the proposed method across different user segments. As part of this evaluation we propose new metrics, adapted from information retrieval, that help us measure the effectiveness of our approach in personalizing diversity. Our experimental evaluation is based on two different datasets: MovieLens movie ratings, and Yelp restaurant reviews.
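
The pre-filtering idea can be sketched with fabricated ratings: score each user's profile diversity, segment users on that score, and then run collaborative filtering within each segment (the CF step is only indicated here).

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(9)
n_users, n_items = 200, 50
ratings = rng.integers(1, 6, (n_users, n_items)).astype(float)
item_genre = rng.integers(0, 8, n_items)       # one genre id per item

# Profile-diversity proxy: number of distinct genres a user rated highly.
liked = ratings >= 4
diversity = np.array([np.unique(item_genre[liked[u]]).size for u in range(n_users)])

segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    diversity.reshape(-1, 1))
for s in range(3):
    members = segments == s
    print(f"segment {s}: {members.sum()} users, "
          f"mean diversity {diversity[members].mean():.1f}")
# A neighborhood CF model would then be trained on each segment independently.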

Valiant, Gregory, Valiant, Paul.  2017.  Estimating the Unseen: Improved Estimators for Entropy and Other Properties. J. ACM. 64:37:1–37:41.

We show that a class of statistical properties of distributions, which includes such practically relevant properties as entropy, the number of distinct elements, and distance metrics between pairs of distributions, can be estimated given a sublinear sized sample. Specifically, given a sample consisting of independent draws from any distribution over at most k distinct elements, these properties can be estimated accurately using a sample of size O(k/log k). For these estimation tasks, this performance is optimal, to constant factors. Complementing these theoretical results, we also demonstrate that our estimators perform exceptionally well, in practice, for a variety of estimation tasks, on a variety of natural distributions, for a wide range of parameters. The key step in our approach is to first use the sample to characterize the "unseen" portion of the distribution, effectively reconstructing this portion of the distribution as accurately as if one had a logarithmic factor larger sample. This goes beyond such tools as the Good-Turing frequency estimation scheme, which estimates the total probability mass of the unobserved portion of the distribution: we seek to estimate the shape of the unobserved portion of the distribution. This work can be seen as introducing a robust, general, and theoretically principled framework that, for many practical applications, essentially amplifies the sample size by a logarithmic factor; we expect that it may be fruitfully used as a component within larger machine learning and statistical analysis systems.
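
For contrast with the shape-estimation approach above, here is the classical Good-Turing estimate of the total unseen probability mass (species seen exactly once, divided by sample size), which the paper's estimator goes beyond.

from collections import Counter

sample = list("abracadabra" * 3) + list("xyzzy")
counts = Counter(sample)
f1 = sum(1 for c in counts.values() if c == 1)   # species seen exactly once
unseen_mass = f1 / len(sample)                   # Good-Turing missing-mass estimate
print(f"Good-Turing unseen mass estimate: {unseen_mass:.3f}")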

Jo, Changyeon, Cho, Youngsu, Egger, Bernhard.  2017.  A Machine Learning Approach to Live Migration Modeling. Proceedings of the 2017 Symposium on Cloud Computing. :351–364.

Live migration is one of the key technologies to improve data center utilization, power efficiency, and maintenance. Various live migration algorithms have been proposed; each exhibiting distinct characteristics in terms of completion time, amount of data transferred, virtual machine (VM) downtime, and VM performance degradation. To make matters worse, not only the migration algorithm but also the applications running inside the migrated VM affect the different performance metrics. With service-level agreements and operational constraints in place, choosing the optimal live migration technique has so far been an open question. In this work, we propose an adaptive machine learning-based model that is able to predict with high accuracy the key characteristics of live migration in dependence of the migration algorithm and the workload running inside the VM. We discuss the important input parameters for accurately modeling the target metrics, and describe how to profile them with little overhead. Compared to existing work, we are not only able to model all commonly used migration algorithms but also predict important metrics that have not been considered so far such as the performance degradation of the VM. In a comparison with the state-of-the-art, we show that the proposed model outperforms existing work by a factor of 2 to 5.

Naor, Assaf, Young, Robert.  2017.  The Integrality Gap of the Goemans-Linial SDP Relaxation for Sparsest Cut Is at Least a Constant Multiple of √log n. Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. :564–575.

We prove that the integrality gap of the Goemans–Linial semidefinite programming relaxation for the Sparsest Cut Problem is Ω(√log n) on inputs with n vertices, thus matching the previously best known upper bound (log n)^(1/2+o(1)) up to lower-order factors. This statement is a consequence of the following new isoperimetric-type inequality. Consider the 8-regular graph whose vertex set is the 5-dimensional integer grid ℤ^5 and where each vertex (a,b,c,d,e) ∈ ℤ^5 is connected to the 8 vertices (a±1,b,c,d,e), (a,b±1,c,d,e), (a,b,c±1,d,e±a), (a,b,c,d±1,e±b). This graph is known as the Cayley graph of the 5-dimensional discrete Heisenberg group. Given Ω ⊂ ℤ^5, denote the size of its edge boundary in this graph (a.k.a. the horizontal perimeter of Ω) by |∂_h Ω|. For t ∈ ℕ, denote by |∂_v^t Ω| the number of (a,b,c,d,e) ∈ ℤ^5 such that exactly one of the two vectors (a,b,c,d,e), (a,b,c,d,e+t) is in Ω. The vertical perimeter of Ω is defined to be |∂_v Ω| = √(∑_{t=1}^∞ |∂_v^t Ω|^2 / t^2). We show that every subset Ω ⊂ ℤ^5 satisfies |∂_v Ω| = O(|∂_h Ω|). This vertical-versus-horizontal isoperimetric inequality yields the above-stated integrality gap for Sparsest Cut and answers several geometric and analytic questions of independent interest. The theorem stated above is the culmination of a program whose aim is to understand the performance of the Goemans–Linial semidefinite program through the embeddability properties of Heisenberg groups. These investigations have mathematical significance even beyond their established relevance to approximation algorithms and combinatorial optimization. In particular they contribute to a range of mathematical disciplines including functional analysis, geometric group theory, harmonic analysis, sub-Riemannian geometry, geometric measure theory, ergodic theory, group representations, and metric differentiation. This article builds on the above cited works, with the "twist" that while those works were equally valid for any finite dimensional Heisenberg group, our result holds for the Heisenberg group of dimension 5 (or higher) but fails for the 3-dimensional Heisenberg group. This insight leads to our core contribution, which is a deduction of an endpoint L1-boundedness of a certain singular integral on ℝ^5 from the (local) L2-boundedness of the corresponding singular integral on ℝ^3. To do this, we devise a corona-type decomposition of subsets of a Heisenberg group, in the spirit of the construction that David and Semmes performed in ℝ^n, but with two main conceptual differences (in addition to more technical differences that arise from the peculiarities of the geometry of the Heisenberg group). Firstly, the "atoms" of our decomposition are perturbations of intrinsic Lipschitz graphs in the sense of Franchi, Serapioni, and Serra Cassano (plus the requisite "wild" regions that satisfy a Carleson packing condition). Secondly, we control the local overlap of our corona decomposition by using quantitative monotonicity rather than Jones-type β-numbers.

Chalkley, Joe D., Ranji, Thomas T., Westling, Carina E. I., Chockalingam, Nachiappan, Witchel, Harry J..  2017.  Wearable Sensor Metric for Fidgeting: Screen Engagement Rather Than Interest Causes NIMI of Wrists and Ankles. Proceedings of the European Conference on Cognitive Ergonomics 2017. :158–161.

Measuring fidgeting is an important goal for the psychology of mind-wandering and for human-computer interaction (HCI). Previous work measuring the movement of the head, torso and thigh during HCI has shown that engaging screen content leads to non-instrumental movement inhibition (NIMI). Camera-based methods for measuring wrist movements are limited by occlusions. Here we used the high-pass filtered magnitude of wearable tri-axial accelerometer recordings during 2-minute passive HCI stimuli as a surrogate for movement of the wrists and ankles. With 24 seated, healthy volunteers experiencing HCI, this metric showed that wrists moved significantly more than ankles. We found that NIMI could be detected in the wrists and ankles; it distinguished extremes of interest and boredom via restlessness. We conclude that both free-willed and forced screen engagement can elicit NIMI of the wrists and ankles.
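
The movement metric described here is straightforward to compute; the sketch below high-pass filters synthetic tri-axial accelerometer data and takes the per-sample magnitude. The sample rate and cutoff are assumptions, not the study's values.

import numpy as np
from scipy.signal import butter, filtfilt

fs = 50.0                                      # sample rate (Hz), assumed
t = np.arange(0, 120, 1 / fs)                  # a 2-minute stimulus window
acc = np.column_stack([                        # synthetic x, y, z signals
    9.81 + 0.2 * np.sin(2 * np.pi * 3 * t),    # gravity plus a small fidget
    0.1 * np.random.default_rng(2).normal(size=t.size),
    0.05 * np.sin(2 * np.pi * 5 * t),
])

# High-pass filter removes gravity and slow posture shifts from each axis.
b, a = butter(4, 0.5 / (fs / 2), btype="highpass")
filtered = filtfilt(b, a, acc, axis=0)
movement = np.linalg.norm(filtered, axis=1)    # per-sample movement magnitude
print(f"mean movement index: {movement.mean():.4f}")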