Information Retrieval: Textual Indexing Using an Oriented Object Database

Mohammed Erritali

Abstract


The growth in the volume of text data such as books and articles in libraries for centuries has imposed to establish effective mechanisms to locate them. Early techniques such as abstraction, indexing and the use of classification categories have marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of defining models and systems whose purpose is to facilitate access to a set of documents in electronic form (corpus) to allow a user to find the relevant ones for him, that is to say, the contents which matches with the information needs of the user.  Most of the models of information retrieval use a specific data structure to index a corpus which is called "inverted file" or "reverse index". This inverted file collects information on all terms over the corpus documents specifying the identifiers of documents that contain the term in question, the frequency of each term in the documents of the corpus, the positions of the occurrences of the word. In this paper we use an oriented object database (db4o) instead of the inverted file, that is to say, instead to search a term in the inverted file, we will search it in the db4o database. The purpose of this work is to make a comparative study to see if the oriented object databases may be competing for the inverse index in terms of access speed and resource consumption using a large volume of data.

Keywords


Information Retrieval, indexation, oriented object database (db4o), inverted file

Full Text:

PDF


DOI: http://doi.org/10.11591/ijeecs.v2.i1.pp205-214

Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

shopify stats IJEECS visitor statistics