Visible to the public Biblio

Filters: Author is Vasthimal, D. K.  [Clear All Filters]
Kumar, S., Vasthimal, D. K..  2019.  Raw Cardinality Information Discovery for Big Datasets. 2019 IEEE 5th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS). :200—205.
Real-time discovery of all different types of unique attributes within unstructured data is a challenging problem to solve when dealing with multiple petabytes of unstructured data volume everyday. Popular discovery solutions such as the creation of offline jobs to uniquely identify attributes or running aggregation queries on raw data sets limits real time discovery use-cases and often results into poor resource utilization. The discovery information must be treated as a parallel problem to just storing raw data sets efficiently onto back-end big data systems. Solving the discovery problem by creating a parallel discovery data store infrastructure has multiple benefits as it allows such to channel the actual search queries against the raw data set in much more funneled manner instead of being widespread across the entire data sets. Such focused search queries and data separation are far more performant and requires less compute and memory footprint.