Compressing the Semi-Structured Data For Inverted Index

B. Usharani

Abstract


Search engines usually rely on keywords and metadata to provide a more useful vocabulary for Internet or on-site searching. An index maps topics to specific page numbers: a reader who needs information on a topic opens the index and looks up the corresponding word. Metadata web indexing assigns keywords or phrases to web pages or web sites within a metadata tag field, so that a page or site can be retrieved by a search engine customized to search the keywords field. A search engine typically covers billions or even trillions of documents. An inverted index is a data structure used for text search in search engines. Its main advantage is quick and easy retrieval of documents when a search is performed: the inverted index tells the search engine which pages, out of billions, contain a given word. This paper discusses how an inverted index is constructed over semi-structured data and how it is compressed using various lossless compression algorithms. Many approaches have been proposed for compressing relational databases; this paper presents an approach for compressing semi-structured data.
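As a rough illustration of the idea in the abstract, the sketch below builds a small inverted index over XML records and compresses one posting list losslessly. It is not the paper's method: the sample records, the function names, and the choice of d-gap plus variable-byte coding are all assumptions made for this example, since the abstract only says "various lossless compression algorithms".

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical semi-structured records (XML); not taken from the paper.
DOCS = {
    1: "<book><title>Data Compression</title><author>Sayood</author></book>",
    2: "<book><title>Inverted Index Compression</title></book>",
    5: "<article><title>Keyword Search</title><topic>index compression</topic></article>",
}

def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        # itertext() yields the text of every element, whatever the tag layout is.
        for text in root.itertext():
            for term in text.lower().split():
                index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def to_gaps(postings):
    """Replace sorted doc ids by the first id followed by successive differences."""
    if not postings:
        return []
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]

def from_gaps(gaps):
    """Inverse of to_gaps: a running sum restores the original doc ids."""
    postings, total = [], 0
    for gap in gaps:
        total += gap
        postings.append(total)
    return postings

def vbyte_encode(numbers):
    """Variable-byte code: 7 payload bits per byte, high bit set on the last byte."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)

def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers

index = build_inverted_index(DOCS)
compressed = vbyte_encode(to_gaps(index["compression"]))
restored = from_gaps(vbyte_decode(compressed))
```

Storing gaps rather than raw document ids keeps the numbers small, which is what lets a byte-oriented code spend a single byte on most entries. Decoding restores the posting list exactly, so the scheme is lossless.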






MAYFEB Journal of Electrical and Computer Engineering
MAYFEB TECHNOLOGY DEVELOPMENT
Toronto, Ontario, Canada