Visible to the public Biblio

Filters: Keyword is text analysis  [Clear All Filters]
2021-04-09
Mir, N., Khan, M. A. U..  2020.  Copyright Protection for Online Text Information : Using Watermarking and Cryptography. 2020 3rd International Conference on Computer Applications Information Security (ICCAIS). :1—4.
Information and security are interdependent elements. Information security has evolved to be a matter of global interest and to achieve this; it requires tools, policies and assurance of technologies against any relevant security risks. Internet influx while providing a flexible means of sharing the online information economically has rapidly attracted countless writers. Text being an important constituent of online information sharing, creates a huge demand of intellectual copyright protection of text and web itself. Various visible watermarking techniques have been studied for text documents but few for web-based text. In this paper, web page watermarking and cryptography for online content copyrights protection is proposed utilizing the semantic and syntactic rules using HTML (Hypertext Markup Language) and is tested for English and Arabic languages.
2021-03-18
Banday, M. T., Sheikh, S. A..  2020.  Improving Security Control of Text-Based CAPTCHA Challenges using Honeypot and Timestamping. 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC). :704—708.

The resistance to attacks aimed to break CAPTCHA challenges and the effectiveness, efficiency and satisfaction of human users in solving them called usability are the two major concerns while designing CAPTCHA schemes. User-friendliness, universality, and accessibility are related dimensions of usability, which must also be addressed adequately. With recent advances in segmentation and optical character recognition techniques, complex distortions, degradations and transformations are added to text-based CAPTCHA challenges resulting in their reduced usability. The extent of these deformations can be decreased if some additional security mechanism is incorporated in such challenges. This paper proposes an additional security mechanism that can add an extra layer of protection to any text-based CAPTCHA challenge, making it more challenging for bots and scripts that might be used to attack websites and web applications. It proposes the use of hidden text-boxes for user entry of CAPTCHA string which serves as honeypots for bots and automated scripts. The honeypot technique is used to trick bots and automated scripts into filling up input fields which legitimate human users cannot fill in. The paper reports implementation of honeypot technique and results of tests carried out over three months during which form submissions were logged for analysis. The results demonstrated great effectiveness of honeypots technique to improve security control and usability of text-based CAPTCHA challenges.

2021-03-04
Kalin, J., Ciolino, M., Noever, D., Dozier, G..  2020.  Black Box to White Box: Discover Model Characteristics Based on Strategic Probing. 2020 Third International Conference on Artificial Intelligence for Industries (AI4I). :60—63.

In Machine Learning, White Box Adversarial Attacks rely on knowing underlying knowledge about the model attributes. This works focuses on discovering to distrinct pieces of model information: the underlying architecture and primary training dataset. With the process in this paper, a structured set of input probes and the output of the model become the training data for a deep classifier. Two subdomains in Machine Learning are explored - image based classifiers and text transformers with GPT-2. With image classification, the focus is on exploring commonly deployed architectures and datasets available in popular public libraries. Using a single transformer architecture with multiple levels of parameters, text generation is explored by fine tuning off different datasets. Each dataset explored in image and text are distinguishable from one another. Diversity in text transformer outputs implies further research is needed to successfully classify architecture attribution in text domain.

2021-02-22
Si, Y., Zhou, W., Gai, J..  2020.  Research and Implementation of Data Extraction Method Based on NLP. 2020 IEEE 14th International Conference on Anti-counterfeiting, Security, and Identification (ASID). :11–15.
In order to accurately extract the data from unstructured Chinese text, this paper proposes a rule-based method based on natural language processing and regular expression. This method makes use of the language expression rules of the data in the text and other related knowledge to form the feature word lists and rule template to match the text. Experimental results show that the accuracy of the designed algorithm is 94.09%.
Martinelli, F., Marulli, F., Mercaldo, F., Marrone, S., Santone, A..  2020.  Enhanced Privacy and Data Protection using Natural Language Processing and Artificial Intelligence. 2020 International Joint Conference on Neural Networks (IJCNN). :1–8.

Artificial Intelligence systems have enabled significant benefits for users and society, but whilst the data for their feeding are always increasing, a side to privacy and security leaks is offered. The severe vulnerabilities to the right to privacy obliged governments to enact specific regulations to ensure privacy preservation in any kind of transaction involving sensitive information. In the case of digital and/or physical documents comprising sensitive information, the right to privacy can be preserved by data obfuscation procedures. The capability of recognizing sensitive information for obfuscation is typically entrusted to the experience of human experts, who are over-whelmed by the ever increasing amount of documents to process. Artificial intelligence could proficiently mitigate the effort of the human officers and speed up processes. Anyway, until enough knowledge won't be available in a machine readable format, automatic and effectively working systems can't be developed. In this work we propose a methodology for transferring and leveraging general knowledge across specific-domain tasks. We built, from scratch, specific-domain knowledge data sets, for training artificial intelligence models supporting human experts in privacy preserving tasks. We exploited a mixture of natural language processing techniques applied to unlabeled domain-specific documents corpora for automatically obtain labeled documents, where sensitive information are recognized and tagged. We performed preliminary tests just over 10.000 documents from the healthcare and justice domains. Human experts supported us during the validation. Results we obtained, estimated in terms of precision, recall and F1-score metrics across these two domains, were promising and encouraged us to further investigations.

2021-01-25
Abusukhon, A., AlZu’bi, S..  2020.  New Direction of Cryptography: A Review on Text-to-Image Encryption Algorithms Based on RGB Color Value. 2020 Seventh International Conference on Software Defined Systems (SDS). :235–239.
Data encryption techniques are important for answering the question: How secure is the Internet for sending sensitive data. Keeping data secure while they are sent through the global network is a difficult task. This is because many hackers are fishing these data in order to get some benefits. The researchers have developed various types of encryption algorithms to protect data from attackers. These algorithms are mainly classified into two categories namely symmetric and asymmetric encryption algorithms. This survey sheds light on the recent work carried out on encrypting a text into an image based on the RGB color value and held a comparison between them based on various factors evolved from the literature.
2020-12-14
Yu, L., Chen, L., Dong, J., Li, M., Liu, L., Zhao, B., Zhang, C..  2020.  Detecting Malicious Web Requests Using an Enhanced TextCNN. 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). :768–777.
This paper proposes an approach that combines a deep learning-based method and a traditional machine learning-based method to efficiently detect malicious requests Web servers received. The first few layers of Convolutional Neural Network for Text Classification (TextCNN) are used to automatically extract powerful semantic features and in the meantime transferable statistical features are defined to boost the detection ability, specifically Web request parameter tampering. The semantic features from TextCNN and transferable statistical features from artificially-designing are grouped together to be fed into Support Vector Machine (SVM), replacing the last layer of TextCNN for classification. To facilitate the understanding of abstract features in form of numerical data in vectors extracted by TextCNN, this paper designs trace-back functions that map max-pooling outputs back to words in Web requests. After investigating the current available datasets for Web attack detection, HTTP Dataset CSIC 2010 is selected to test and verify the proposed approach. Compared with other deep learning models, the experimental results demonstrate that the approach proposed in this paper is competitive with the state-of-the-art.
2020-12-11
Palash, M. H., Das, P. P., Haque, S..  2019.  Sentimental Style Transfer in Text with Multigenerative Variational Auto-Encoder. 2019 International Conference on Bangla Speech and Language Processing (ICBSLP). :1—4.

Style transfer is an emerging trend in the fields of deep learning's applications, especially in images and audio data this is proven very useful and sometimes the results are astonishing. Gradually styles of textual data are also being changed in many novel works. This paper focuses on the transfer of the sentimental vibe of a sentence. Given a positive clause, the negative version of that clause or sentence is generated keeping the context same. The opposite is also done with negative sentences. Previously this was a very tough job because the go-to techniques for such tasks such as Recurrent Neural Networks (RNNs) [1] and Long Short-Term Memories(LSTMs) [2] can't perform well with it. But since newer technologies like Generative Adversarial Network(GAN) and Variational AutoEncoder(VAE) are emerging, this work seem to become more and more possible and effective. In this paper, Multi-Genarative Variational Auto-Encoder is employed to transfer sentiment values. Inspite of working with a small dataset, this model proves to be promising.

Dabas, K., Madaan, N., Arya, V., Mehta, S., Chakraborty, T., Singh, G..  2019.  Fair Transfer of Multiple Style Attributes in Text. 2019 Grace Hopper Celebration India (GHCI). :1—5.

To preserve anonymity and obfuscate their identity on online platforms users may morph their text and portray themselves as a different gender or demographic. Similarly, a chatbot may need to customize its communication style to improve engagement with its audience. This manner of changing the style of written text has gained significant attention in recent years. Yet these past research works largely cater to the transfer of single style attributes. The disadvantage of focusing on a single style alone is that this often results in target text where other existing style attributes behave unpredictably or are unfairly dominated by the new style. To counteract this behavior, it would be nice to have a style transfer mechanism that can transfer or control multiple styles simultaneously and fairly. Through such an approach, one could obtain obfuscated or written text incorporated with a desired degree of multiple soft styles such as female-quality, politeness, or formalness. To the best of our knowledge this work is the first that shows and attempt to solve the issues related to multiple style transfer. We also demonstrate that the transfer of multiple styles cannot be achieved by sequentially performing multiple single-style transfers. This is because each single style-transfer step often reverses or dominates over the style incorporated by a previous transfer step. We then propose a neural network architecture for fairly transferring multiple style attributes in a given text. We test our architecture on the Yelp dataset to demonstrate our superior performance as compared to existing one-style transfer steps performed in a sequence.

Phu, T. N., Hoang, L., Toan, N. N., Tho, N. Dai, Binh, N. N..  2019.  C500-CFG: A Novel Algorithm to Extract Control Flow-based Features for IoT Malware Detection. 2019 19th International Symposium on Communications and Information Technologies (ISCIT). :568—573.

{Static characteristic extraction method Control flow-based features proposed by Ding has the ability to detect malicious code with higher accuracy than traditional Text-based methods. However, this method resolved NP-hard problem in a graph, therefore it is not feasible with the large-size and high-complexity programs. So, we propose the C500-CFG algorithm in Control flow-based features based on the idea of dynamic programming, solving Ding's NP-hard problem in O(N2) time complexity, where N is the number of basic blocks in decom-piled executable codes. Our algorithm is more efficient and more outstanding in detecting malware than Ding's algorithm: fast processing time, allowing processing large files, using less memory and extracting more feature information. Applying our algorithms with IoT data sets gives outstanding results on 2 measures: Accuracy = 99.34%

Slawinski, M., Wortman, A..  2019.  Applications of Graph Integration to Function Comparison and Malware Classification. 2019 4th International Conference on System Reliability and Safety (ICSRS). :16—24.

We classify .NET files as either benign or malicious by examining directed graphs derived from the set of functions comprising the given file. Each graph is viewed probabilistically as a Markov chain where each node represents a code block of the corresponding function, and by computing the PageRank vector (Perron vector with transport), a probability measure can be defined over the nodes of the given graph. Each graph is vectorized by computing Lebesgue antiderivatives of hand-engineered functions defined on the vertex set of the given graph against the PageRank measure. Files are subsequently vectorized by aggregating the set of vectors corresponding to the set of graphs resulting from decompiling the given file. The result is a fast, intuitive, and easy-to-compute glass-box vectorization scheme, which can be leveraged for training a standalone classifier or to augment an existing feature space. We refer to this vectorization technique as PageRank Measure Integration Vectorization (PMIV). We demonstrate the efficacy of PMIV by training a vanilla random forest on 2.5 million samples of decompiled. NET, evenly split between benign and malicious, from our in-house corpus and compare this model to a baseline model which leverages a text-only feature space. The median time needed for decompilation and scoring was 24ms. 11Code available at https://github.com/gtownrocks/grafuple.

2020-11-20
Han, H., Wang, Q., Chen, C..  2019.  Policy Text Analysis Based on Text Mining and Fuzzy Cognitive Map. 2019 15th International Conference on Computational Intelligence and Security (CIS). :142—146.
With the introduction of computer methods, the amount of material and processing accuracy of policy text analysis have been greatly improved. In this paper, Text mining(TM) and latent semantic analysis(LSA) were used to collect policy documents and extract policy elements from them. Fuzzy association rule mining(FARM) technique and partial association test (PA) were used to discover the causal relationships and impact degrees between elements, and a fuzzy cognitive map (FCM) was developed to deduct the evolution of elements through a soft computing method. This non-interventionist approach avoids the validity defects caused by the subjective bias of researchers and provides policy makers with more objective policy suggestions from a neutral perspective. To illustrate the accuracy of this method, this study experimented by taking the state-owned capital layout adjustment related policies as an example, and proved that this method can effectively analyze policy text.
2020-11-04
Flores, P..  2019.  Digital Simulation in the Virtual World: Its Effect in the Knowledge and Attitude of Students Towards Cybersecurity. 2019 Sixth HCT Information Technology Trends (ITT). :1—5.

The search for alternative delivery modes to teaching has been one of the pressing concerns of numerous educational institutions. One key innovation to improve teaching and learning is e-learning which has undergone enormous improvements. From its focus on text-based environment, it has evolved into Virtual Learning Environments (VLEs) which provide more stimulating and immersive experiences among learners and educators. An example of VLEs is the virtual world which is an emerging educational platform among universities worldwide. One very interesting topic that can be taught using the virtual world is cybersecurity. Simulating cybersecurity in the virtual world may give a realistic experience to students which can be hardly achieved by classroom teaching. To date, there are quite a number of studies focused on cybersecurity awareness and cybersecurity behavior. But none has focused looking into the effect of digital simulation in the virtual world, as a new educational platform, in the cybersecurity attitude of the students. It is in this regard that this study has been conducted by designing simulation in the virtual world lessons that teaches the five aspects of cybersecurity namely; malware, phishing, social engineering, password usage and online scam, which are the most common cybersecurity issues. The study sought to examine the effect of this digital simulation design in the cybersecurity knowledge and attitude of the students. The result of the study ascertains that students exposed under simulation in the virtual world have a greater positive change in cybersecurity knowledge and attitude than their counterparts.

2020-11-02
Pan, C., Huang, J., Gong, J., Yuan, X..  2019.  Few-Shot Transfer Learning for Text Classification With Lightweight Word Embedding Based Models. IEEE Access. 7:53296–53304.
Many deep learning architectures have been employed to model the semantic compositionality for text sequences, requiring a huge amount of supervised data for parameters training, making it unfeasible in situations where numerous annotated samples are not available or even do not exist. Different from data-hungry deep models, lightweight word embedding-based models could represent text sequences in a plug-and-play way due to their parameter-free property. In this paper, a modified hierarchical pooling strategy over pre-trained word embeddings is proposed for text classification in a few-shot transfer learning way. The model leverages and transfers knowledge obtained from some source domains to recognize and classify the unseen text sequences with just a handful of support examples in the target problem domain. The extensive experiments on five datasets including both English and Chinese text demonstrate that the simple word embedding-based models (SWEMs) with parameter-free pooling operations are able to abstract and represent the semantic text. The proposed modified hierarchical pooling method exhibits significant classification performance in the few-shot transfer learning tasks compared with other alternative methods.
2020-10-12
Granatyr, Jones, Gomes, Heitor Murilo, Dias, João Miguel, Paiva, Ana Maria, Nunes, Maria Augusta Silveira Netto, Scalabrin, Edson Emílio, Spak, Fábio.  2019.  Inferring Trust Using Personality Aspects Extracted from Texts. 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). :3840–3846.
Trust mechanisms are considered the logical protection of software systems, preventing malicious people from taking advantage or cheating others. Although these concepts are widely used, most applications in this field do not consider affective aspects to aid in trust computation. Researchers of Psychology, Neurology, Anthropology, and Computer Science argue that affective aspects are essential to human's decision-making processes. So far, there is a lack of understanding about how these aspects impact user's trust, particularly when they are inserted in an evaluation system. In this paper, we propose a trust model that accounts for personality using three personality models: Big Five, Needs, and Values. We tested our approach by extracting personality aspects from texts provided by two online human-fed evaluation systems and correlating them to reputation values. The empirical experiments show statistically significant better results in comparison to non-personality-wise approaches.
MacMahon, Silvana Togneri, Alfano, Marco, Lenzitti, Biagio, Bosco, Giosuè Lo, McCaffery, Fergal, Taibi, Davide, Helfert, Markus.  2019.  Improving Communication in Risk Management of Health Information Technology Systems by means of Medical Text Simplification. 2019 IEEE Symposium on Computers and Communications (ISCC). :1135–1140.
Health Information Technology Systems (HITS) are increasingly used to improve the quality of patient care while reducing costs. These systems have been developed in response to the changing models of care to an ongoing relationship between patient and care team, supported by the use of technology due to the increased instance of chronic disease. However, the use of HITS may increase the risk to patient safety and security. While standards can be used to address and manage these risks, significant communication problems exist between experts working in different departments. These departments operate in silos often leading to communication breakdowns. For example, risk management stakeholders who are not clinicians may struggle to understand, define and manage risks associated with these systems when talking to medical professionals as they do not understand medical terminology or the associated care processes. In order to overcome this communication problem, we propose the use of the “Three Amigos” approach together with the use of the SIMPLE tool that has been developed to assist patients in understanding medical terms. This paper examines how the “Three Amigos” approach and the SIMPLE tool can be used to improve estimation of severity of risk by non-clinical risk management stakeholders and provides a practical example of their use in a ten step risk management process.
2020-10-05
Su, Jinsong, Zeng, Jiali, Xiong, Deyi, Liu, Yang, Wang, Mingxuan, Xie, Jun.  2018.  A Hierarchy-to-Sequence Attentional Neural Machine Translation Model. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 26:623—632.

Although sequence-to-sequence attentional neural machine translation (NMT) has achieved great progress recently, it is confronted with two challenges: learning optimal model parameters for long parallel sentences and well exploiting different scopes of contexts. In this paper, partially inspired by the idea of segmenting a long sentence into short clauses, each of which can be easily translated by NMT, we propose a hierarchy-to-sequence attentional NMT model to handle these two challenges. Our encoder takes the segmented clause sequence as input and explores a hierarchical neural network structure to model words, clauses, and sentences at different levels, particularly with two layers of recurrent neural networks modeling semantic compositionality at the word and clause level. Correspondingly, the decoder sequentially translates segmented clauses and simultaneously applies two types of attention models to capture contexts of interclause and intraclause for translation prediction. In this way, we can not only improve parameter learning, but also well explore different scopes of contexts for translation. Experimental results on Chinese-English and English-German translation demonstrate the superiorities of the proposed model over the conventional NMT model.

Liu, Donglei, Niu, Zhendong, Zhang, Chunxia, Zhang, Jiadi.  2019.  Multi-Scale Deformable CNN for Answer Selection. IEEE Access. 7:164986—164995.

The answer selection task is one of the most important issues within the automatic question answering system, and it aims to automatically find accurate answers to questions. Traditional methods for this task use manually generated features based on tf-idf and n-gram models to represent texts, and then select the right answers according to the similarity between the representations of questions and the candidate answers. Nowadays, many question answering systems adopt deep neural networks such as convolutional neural network (CNN) to generate the text features automatically, and obtained better performance than traditional methods. CNN can extract consecutive n-gram features with fixed length by sliding fixed-length convolutional kernels over the whole word sequence. However, due to the complex semantic compositionality of the natural language, there are many phrases with variable lengths and be composed of non-consecutive words in natural language, such as these phrases whose constituents are separated by other words within the same sentences. But the traditional CNN is unable to extract the variable length n-gram features and non-consecutive n-gram features. In this paper, we propose a multi-scale deformable convolutional neural network to capture the non-consecutive n-gram features by adding offset to the convolutional kernel, and also propose to stack multiple deformable convolutional layers to mine multi-scale n-gram features by the means of generating longer n-gram in higher layer. Furthermore, we apply the proposed model into the task of answer selection. Experimental results on public dataset demonstrate the effectiveness of our proposed model in answer selection.

2020-08-28
Perry, Lior, Shapira, Bracha, Puzis, Rami.  2019.  NO-DOUBT: Attack Attribution Based On Threat Intelligence Reports. 2019 IEEE International Conference on Intelligence and Security Informatics (ISI). :80—85.

The task of attack attribution, i.e., identifying the entity responsible for an attack, is complicated and usually requires the involvement of an experienced security expert. Prior attempts to automate attack attribution apply various machine learning techniques on features extracted from the malware's code and behavior in order to identify other similar malware whose authors are known. However, the same malware can be reused by multiple actors, and the actor who performed an attack using a malware might differ from the malware's author. Moreover, information collected during an incident may contain many clues about the identity of the attacker in addition to the malware used. In this paper, we propose a method of attack attribution based on textual analysis of threat intelligence reports, using state of the art algorithms and models from the fields of machine learning and natural language processing (NLP). We have developed a new text representation algorithm which captures the context of the words and requires minimal feature engineering. Our approach relies on vector space representation of incident reports derived from a small collection of labeled reports and a large corpus of general security literature. Both datasets have been made available to the research community. Experimental results show that the proposed representation can attribute attacks more accurately than the baselines' representations. In addition, we show how the proposed approach can be used to identify novel previously unseen threat actors and identify similarities between known threat actors.

BOUGHACI, Dalila, BENMESBAH, Mounir, ZEBIRI, Aniss.  2019.  An improved N-grams based Model for Authorship Attribution. 2019 International Conference on Computer and Information Sciences (ICCIS). :1—6.

Authorship attribution is the problem of studying an anonymous text and finding the corresponding author in a set of candidate authors. In this paper, we propose a method based on N-grams model for the problem of authorship attribution. Several measures are used to assign an anonymous text to an author. The different variants of the proposed method are implemented and validated on PAN benchmarks. The numerical results are encouraging and demonstrate the benefit of the proposed idea.

Khomytska, Iryna, Teslyuk, Vasyl.  2019.  Mathematical Methods Applied for Authorship Attribution on the Phonological Level. 2019 IEEE 14th International Conference on Computer Sciences and Information Technologies (CSIT). 3:7—11.

The proposed combination of statistical methods has proved efficient for authorship attribution. The complex analysis method based on the proposed combination of statistical methods has made it possible to minimize the number of phoneme groups by which the authorial differentiation of texts has been done.

Jafariakinabad, Fereshteh, Hua, Kien A..  2019.  Style-Aware Neural Model with Application in Authorship Attribution. 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA). :325—328.

Writing style is a combination of consistent decisions associated with a specific author at different levels of language production, including lexical, syntactic, and structural. In this paper, we introduce a style-aware neural model to encode document information from three stylistic levels and evaluate it in the domain of authorship attribution. First, we propose a simple way to jointly encode syntactic and lexical representations of sentences. Subsequently, we employ an attention-based hierarchical neural network to encode the syntactic and semantic structure of sentences in documents while rewarding the sentences which contribute more to capturing the writing style. Our experimental results, based on four benchmark datasets, reveal the benefits of encoding document information from all three stylistic levels when compared to the baseline methods in the literature.

Khomytska, Iryna, Teslyuk, Vasyl.  2019.  The Software for Authorship and Style Attribution. 2019 IEEE 15th International Conference on the Experience of Designing and Application of CAD Systems (CADSM). :1—4.

A new program has been developed for style and authorship attribution. Differentiation of styles by transcription symbols has proved to be efficient The novel approach involves a combination of two ways of transforming texts into their transcription variants. The java programming language makes it possible to improve efficiency of style and authorship attribution.

2020-07-27
Gorodnichev, Mikhail G., Kochupalov, Alexander E., Gematudinov, Rinat A..  2018.  Asynchronous Rendering of Texts in iOS Applications. 2018 IEEE International Conference "Quality Management, Transport and Information Security, Information Technologies" (IT QM IS). :643–645.
This article is devoted to new asynchronous methods for rendering text information in mobile applications for iOS operating system.
2020-07-24
Wu, Chuxin, Zhang, Peng, Liu, Hongwei, Liu, Yuhong.  2019.  Multi-keyword Ranked Searchable Encryption Supporting CP-ABE Test. 2019 Computing, Communications and IoT Applications (ComComAp). :220—225.

Internet of Things (IoT) and cloud computing are promising technologies that change the way people communicate and live. As the data collected through IoT devices often involve users' private information and the cloud is not completely trusted, users' private data are usually encrypted before being uploaded to cloud for security purposes. Searchable encryption, allowing users to search over the encrypted data, extends data flexibility on the premise of security. In this paper, to achieve the accurate and efficient ciphertext searching, we present an efficient multi-keyword ranked searchable encryption scheme supporting ciphertext-policy attribute-based encryption (CP-ABE) test (MRSET). For efficiency, numeric hierarchy supporting ranked search is introduced to reduce the dimensions of vectors and matrices. For practicality, CP-ABE is improved to support access right test, so that only documents that the user can decrypt are returned. The security analysis shows that our proposed scheme is secure, and the experimental result demonstrates that our scheme is efficient.