Contextual Information Retrieval: Efficient Search Engine
Muhammad Ahmad (MS170401436) [email protected]
Nazia Ishtiaq (MS170400900) [email protected]
#MS(CS), Department of Computer Science,
Virtual University of Pakistan
Lahore, Pakistan

Abstract
Contextual retrieval is a technique that helps current search
engines facilitate queries and return relevant
information. This document reports on the development and
evaluation of a system designed to address some of the
challenges associated with the retrieval of contextual
information. The system has been designed with the
objective of capturing implicit and explicit user data that is used
to develop a personal contextual profile. This profile is used to
refine search queries and improve both the search results for a
user and their search experience. An empirical study has been
carried out to evaluate the system against a series of hypotheses.
In this document, results related to one of these hypotheses are
presented; they support the claim that users can find information
more easily using the contextual search system.

Keywords: Web searching, information retrieval
I. INTRODUCTION
Research into contextual retrieval approaches has
become a prominent topic in the field of interactive web
information retrieval. The main objective of contextual
retrieval is to capture a user’s information-seeking behavior, such
as search and response activities, and incorporate this
information into a search system. The aim is to create a
more effective, efficient and personalized interaction by using an
appropriate retrieval strategy and adapting the system to the
user’s preferences. Contextual retrieval has been defined as an
information retrieval process that combines search
technologies, knowledge about a query and the context of the
user in a single framework to provide the most appropriate
response to the user’s information needs. Despite its growing
importance and the development of contextual retrieval
approaches, there is no comprehensive model that fully describes
contextual retrieval [2], owing to the difficulty of capturing and
representing knowledge about users, tasks and context in a
general Web search environment. Indeed, contextual retrieval
remains a significant long-term challenge [1]. This document
describes a contextual retrieval system that has been developed
to address some of the challenges associated with the effective
retrieval of information from search engines.
Several popular and useful search engines, such as AltaVista,
Excite, HotBot, Infoseek, Lycos and Northern Light, try to
maintain full-text indexes of the World Wide Web.
However, relying on a single standard search engine has
limitations. Standard search engines have limited coverage and
outdated databases, and are sometimes unavailable due to
problems with the network or the engine itself. The accuracy of
standard engine results can also vary because the engines generally
focus on handling queries quickly and use relatively simple
ranking schemes [3]. Rankings can be further
distorted by keyword “spamming” intended to raise a page’s
position in the results. Often, the relevance of a particular
page is obvious only after loading it and finding the query terms
in it.
Metasearch engines, such as MetaCrawler and SavvySearch,
try to deal with the problem of limited coverage by sending
queries to several standard search engines at the same time [4,5].
The main advantages of metasearch engines are that they
combine the results of several search engines and present a
consistent user interface [5]. However, most metasearch engines
rely on the documents and summaries returned by standard search
engines and, therefore, inherit their limited accuracy and
vulnerability to keyword spamming.

We developed the metasearch engine of the NEC Research
Institute (NECI) to improve the efficiency and accuracy of Web
search by downloading and analyzing each document and then
displaying results that show the terms of the query in context.
This helps users determine more quickly if the document is
relevant without having to download each page. This technique
is simple, but it can be very effective, particularly when dealing
with a large, diverse and poorly organized database such as the Web.
The results of the NECI engine are returned progressively as
each page is downloaded and analyzed, rather than only after all
pages have been downloaded. The pages are downloaded in parallel and
the first result is usually displayed in less time than a standard
search engine takes to show its response. The NECI metasearch
engine is currently in use by employees of the NEC Research
Institute. This article describes its characteristics,
implementation and performance.
II. INFORMATION RETRIEVAL TECHNIQUES
A. User profile modeling.
Several Web IR systems have explored various user modeling
approaches to improve the personalization of the user’s web
search experience. A review of these user modeling approaches
reveals that each uses either user behavior or user preferences to
build a contextual profile. However, none of the approaches
considered uses a combination of behavior and user
preferences. Apart from InfoFinder [6], none of the reviewed
approaches discusses Boolean query expansion using any form
of the user’s contextual profile. Similarly, apart from WebMate [7],
these approaches do not have the ability to share the contextual
profile information of a user with other users, which can lead to
less than optimal performance when the user needs to access
information outside of their original context. The use of shared
contextual profiles or collaborative filters takes advantage of
the collective profiles of several users and can help users with
similar interests.

While showing promise, previous IR approaches that employ
user profile models have had limited success. There continue to
be fundamental challenges, specifically: i) how to acquire,
maintain and represent accurate information about the multiple
interests of a user with minimal intervention; ii) how to use this
acquired information about the user to deliver personalized
search results, and iii) how to use the information acquired from
various users as a knowledge base in communities or large
groups.
B. Query expansion.

In general, a query expansion approach attempts to expand the
original search query by adding new or related
terms. These terms are added to an existing
query either by the user, known as interactive query expansion
(IQE), or by the retrieval system, known as automatic query
expansion (AQE).
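
The difference between the two modes can be illustrated with a minimal Python sketch. The related-term table, the function names and the `max_new` limit are assumptions made for illustration; the source does not specify how candidate terms are stored or selected.

```python
from typing import Dict, List

# Hypothetical related-term table standing in for a lexical database or a
# user profile; the entries are illustrative only.
RELATED_TERMS: Dict[str, List[str]] = {
    "car": ["automobile", "vehicle"],
    "jaguar": ["cat", "car"],
}


def expand_query(query: str, related: Dict[str, List[str]], max_new: int = 2) -> List[str]:
    """Automatic expansion: the system appends up to max_new related terms per query term."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for candidate in related.get(term, [])[:max_new]:
            if candidate not in expanded:  # avoid duplicating existing terms
                expanded.append(candidate)
    return expanded


def interactive_expand(query: str, chosen: List[str]) -> List[str]:
    """Interactive expansion: only the terms the user picked are appended."""
    terms = query.lower().split()
    return terms + [t for t in chosen if t not in terms]
```

In the automatic case the system decides which terms to add; in the interactive case the same candidates would be shown to the user, who selects the ones passed in as `chosen`.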
C. Relevance feedback.
The idea behind relevance feedback (RF) is to take the
results that are initially returned from a query and use
information about whether those results are relevant to perform
a new query. Relevance feedback provides a means to
automatically reformulate a query to more accurately reflect a
user’s interests. The main challenges faced by current RF
mechanisms are: i) how to capture a user’s information-seeking
behavior and preferences and structure this information so that
a search context can be defined and refined over time;
ii) how to help the user to form or join communities of interest
while respecting their personal privacy; and iii) how to develop
algorithms that combine multiple types of information to
calculate recommendations.
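
One well-known way to reformulate a query from relevance judgments is the Rocchio method. The source does not say which RF algorithm the system uses, so the sketch below is only an illustration of the general idea, with simple term counts as vectors and conventional default weights (`alpha`, `beta`, `gamma`) chosen as assumptions.

```python
from collections import Counter
from typing import Dict, Iterable

Vector = Dict[str, float]


def centroid(docs: Iterable[str]) -> Vector:
    """Average term-count vector of a set of documents."""
    docs = list(docs)
    if not docs:
        return {}
    total: Counter = Counter()
    for doc in docs:
        total.update(doc.lower().split())
    return {term: count / len(docs) for term, count in total.items()}


def rocchio(query: str, relevant: Iterable[str], non_relevant: Iterable[str],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Move the query toward relevant documents and away from non-relevant ones."""
    q = Counter(query.lower().split())
    rel = centroid(relevant)
    non = centroid(non_relevant)
    terms = set(q) | set(rel) | set(non)
    weights = {
        t: alpha * q.get(t, 0) + beta * rel.get(t, 0.0) - gamma * non.get(t, 0.0)
        for t in terms
    }
    # Terms pushed to zero or below are dropped from the new query.
    return {t: w for t, w in weights.items() if w > 0}
```

For the query "jaguar", marking a document about cars as relevant and one about cats as non-relevant yields a new query in which "car" gains weight and "cat" is dropped.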
III. CONTEXTUAL SEARCH SYSTEM

This section provides a brief overview of the contextual search
system. The contextual search system includes a variety of
functions, such as adaptation of the user’s search behavior,
recognition of a user’s preferences and interests,
recommendation of terms, generation of Boolean queries and
presentation of ranked contextual search results, in an effort to
improve the user experience.

The system uses a three-layer architecture, with the core
functions in the contextual search layer. The contextual search
layer links the two other layers: the presentation layer and the
database layer. For example, the contextual search layer processes
requests from the presentation layer (e.g., a user registration)
and sends instructions
to the database layer to store or retrieve a piece of data (e.g.,
registration data). It is the performance of the components in
this layer that will affect the ability of the system to improve
the user’s web search experience, and it is the components of
this layer that are evaluated in this document. Figure 1 shows
the architecture of the contextual search system.

The contextual search layer comprises two main modules, the
Profile Collector Module (PCM) and the Context Manager
Module (CMM), which perform the following functions:
1. Gather implicit user data, such as search queries
entered, URLs visited and meta keywords.
2. Capture the user’s explicit data, such as alternative terms,
meta keywords or similar phrases and concepts. These data come
from a lexical database, a shared contextual knowledge base
(SCKB) and domain-specific ontologies.
3. Construct the user’s personal contextual profile and a shared
contextual knowledge base using the data from steps 1 and 2.
4. Modify the user’s initial query to more accurately reflect
the user’s interests.
Each module consists of several components that perform these
various functions, with the PCM components forming the core
of the system. As a result, this document focuses on the PCM;
a full discussion of the CMM is beyond the scope of this
document.
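
The four functions listed above can be pictured with a small Python sketch. Everything in it is hypothetical: the class and method names, the term-count representation of the profile, the double weight given to explicit data and the shape of the generated Boolean query are assumptions for illustration, not the system's actual design.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContextualProfile:
    """Hypothetical personal contextual profile: term counts per data source."""
    implicit_terms: Counter = field(default_factory=Counter)
    explicit_terms: Counter = field(default_factory=Counter)

    def record_implicit(self, query: str, meta_keywords: List[str]) -> None:
        # Function 1: implicit data such as entered queries and page meta keywords.
        self.implicit_terms.update(query.lower().split())
        self.implicit_terms.update(k.lower() for k in meta_keywords)

    def record_explicit(self, chosen_terms: List[str]) -> None:
        # Function 2: explicit data such as terms the user selected.
        self.explicit_terms.update(t.lower() for t in chosen_terms)

    def top_terms(self, n: int, explicit_weight: int = 2) -> List[str]:
        # Function 3: combine both sources; explicit choices count more (assumption).
        combined: Counter = Counter()
        for term, count in self.implicit_terms.items():
            combined[term] += count
        for term, count in self.explicit_terms.items():
            combined[term] += explicit_weight * count
        ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
        return [term for term, _ in ranked[:n]]


def build_boolean_query(query: str, profile: ContextualProfile, n_context: int = 2) -> str:
    # Function 4: refine the initial query with the strongest profile terms.
    terms = query.lower().split()
    candidates = profile.top_terms(n_context + len(terms))
    context = [t for t in candidates if t not in terms][:n_context]
    base = " AND ".join(terms)
    if not context:
        return base
    return f"({base}) AND ({' OR '.join(context)})"
```

With a profile that has seen the query "jaguar speed", the meta keywords "cars" and "engine", and an explicit choice of "cars", the query "jaguar" becomes "(jaguar) AND (cars OR engine)".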

A. Profile collector module
The PCM is implemented to capture the behavior and
preferences of a user as the personal contextual profile of a user
and structure this information in such a way that it is able to
define a search context that can be refined over time.
Figure 2 illustrates the functionality of the PCM, a hybrid
contextual user profile approach that captures a user’s adaptive
search behavior by monitoring and capturing their explicit data
(i.e., classifications, inputs and explicit instructions) and implicit
data (i.e., navigation and typing). The PCM acquires and
maintains these data continuously with minimal user intervention.

Figure 3 provides a general description of the functionality of
the PC component, which consists of the Word Sense
Disambiguater (WSD), Meta Keyword Recommender (MKR)
and Concept Recommender (CR) processes. The PC
component learns the specific information needs of a user by
capturing their explicit preferences and, at the same time,
recommends terms, phrases and concepts that will be of
potential interest to the user.

Figure 4 provides a summary description of the functionality of
the BC component, centered on a Behavior Acquisition (BA)
process. The BA process monitors and captures a user’s daily
Internet search activities to represent a user’s behavior.

THE NECI METASEARCH ENGINE

A recent study by Anastasios Tombros verified the advantages of
summaries that incorporate the context of query terms [6]. The
study found that users who worked with query-sensitive
summaries found relevant documents faster and made relevance
judgments with greater precision and speed than users who
worked with a query-insensitive summary of
the document. The query-sensitive summaries also greatly
reduced the need for users to access the full text of the
documents.

Figure 1 shows a simplified control flow diagram of the NECI
metasearch engine, which consists of two main parts: the
metasearch code and a parallel page retrieval daemon. The
page retrieval daemon is relatively simple, but incorporates
features such as request queuing, load balancing across several
search processes and delaying requests to the same site to avoid
overloading it.
Figure 2 shows the main search form for the NECI metasearch
engine. Users can choose which search engines to query, how
many hits to retrieve, how much context to display (measured
in number of characters), and so on. The engine supports all
common search formats, including Boolean syntax. As with
many other metasearch engines, the NECI metasearch engine
dynamically modifies queries to match the search syntax of
each search engine. Users can control the amount of text
displayed by the NECI engine by specifying the number of
characters that will be displayed on either side of the query
terms. To improve readability, the engine omits most
non-alphanumeric characters, as well as partial words at the
beginning and end of the specified character count. At one point,
we sought to improve the context display by extracting logical
sentences instead of a fixed number of characters. However, in
general, users did not find this sentence-based method
superior, because the inclusion of full sentences increased the
screen space required for each summary without significantly
improving the ability of users to determine relevance. Because
the NECI engine generates results progressively as it
downloads and analyzes each page, the results are not
necessarily displayed in the order given by the individual
search engines, but the order is approximately the same. Perhaps
because web search engines are not good at ranking by relevance
to begin with, this difference in document ranking was not
a problem for users.
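
A minimal Python sketch of the fixed-width context display described above follows. It assumes a single query term and trims only at the edges of each window (partial words and leading or trailing non-alphanumeric characters); the function name and these simplifications are illustrative assumptions, not the engine's actual code.

```python
import re
from typing import List


def context_snippets(text: str, term: str, width: int) -> List[str]:
    """Return each occurrence of term with up to width characters of context per side."""
    snippets = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        window = text[start:end]
        # Drop a word that was cut in half at the left edge of the window.
        if start > 0 and text[start - 1].isalnum():
            window = re.sub(r"^[A-Za-z0-9]+", "", window, count=1)
        # Drop a word that was cut in half at the right edge of the window.
        if end < len(text) and text[end].isalnum():
            window = re.sub(r"[A-Za-z0-9]+$", "", window, count=1)
        # Trim leading and trailing non-alphanumeric characters.
        window = re.sub(r"^[^A-Za-z0-9]+|[^A-Za-z0-9]+$", "", window)
        snippets.append(window)
    return snippets
```

A larger `width` shows more surrounding text per hit at the cost of screen space, which is the trade-off the paragraph above describes for sentence-based summaries.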

Figure 3 shows a sample response from the NECI metasearch
engine for the query “digital watermark”. The bar at the top
allows users to switch between views of the search results;
below it are links to the individual engine results. The
“suggestion” that follows can be query-sensitive, for example
providing specific query format suggestions when the query
resembles a proper name. The shaded bars to the left of the
document titles indicate how close the query terms are to each
other in the document. With a single query term, the bar
shading indicates how close the term is to the top of the
document. The information to the right of the document title
shows which engine found the document and the age of the
document. For example, in the first entry, “A” refers to
AltaVista and “n / a” indicates that the age of the document is
not available.
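
The source does not give the formula behind the shaded bars, so the following Python sketch shows just one plausible way to turn term proximity into a score between 0 and 1. The smallest-window measure, the `len(terms) / span` scaling and the single-term rule are all assumptions for illustration.

```python
from typing import Dict, List, Optional


def min_term_span(doc_words: List[str], query_terms: List[str]) -> Optional[int]:
    """Size in words of the smallest window containing every query term, or None."""
    wanted = {t.lower() for t in query_terms}
    last_seen: Dict[str, int] = {}
    best: Optional[int] = None
    for i, word in enumerate(doc_words):
        w = word.lower()
        if w in wanted:
            last_seen[w] = i
            if len(last_seen) == len(wanted):
                # Smallest window ending here starts at the oldest last-seen term.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best


def proximity_score(doc_words: List[str], query_terms: List[str]) -> float:
    """Score in [0, 1]; in this sketch a higher score would mean darker shading."""
    if not doc_words:
        return 0.0
    wanted = {t.lower() for t in query_terms}
    if len(wanted) == 1:
        # Single term: score by how close the first occurrence is to the top.
        term = next(iter(wanted))
        positions = [i for i, w in enumerate(doc_words) if w.lower() == term]
        return 1.0 - positions[0] / len(doc_words) if positions else 0.0
    span = min_term_span(doc_words, query_terms)
    return len(wanted) / span if span else 0.0
```

Adjacent query terms give the maximum score of 1.0, and the score falls as the terms move further apart in the document.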

IMPLEMENTATION
The NECI metasearch engine is currently implemented as a
server-based application at the NEC Research Institute, where it
serves about 100 users. A client-based implementation is also
possible and would improve scalability. Its
disadvantages are increased processing and memory
requirements, and the need to update all clients when changes
are made to the metasearch engine. A client-based implementation
would also reduce the benefits of caching.

RESOURCE REQUIREMENTS
The NECI search engine uses roughly an order of magnitude
more bandwidth than other search engines. These bandwidth
requirements could limit the number of users that can
simultaneously use a server-based implementation. However,
these requirements are not as great as those of other
web developments, such as the increasing use of audio and
video, and bandwidth and access times on the Internet
continue to improve [9]. In addition, the engine does not
necessarily need to analyze more pages per query as the Web
grows (although precise queries will become more important). The
prototype engine runs on a Pentium Pro 200 PC, is written in
Perl and is not optimized for efficiency. When only a few
queries are run at the same time on the prototype,
the analysis does not usually slow down the response (network
response time is the limiting factor).

PERFORMANCE
We analyzed the response time of the following six search
engines: AltaVista, Excite, HotBot, Infoseek, Lycos and
Northern Light. The average response time of 3,000 queries to
these engines during November-December 1997 was 1.9
seconds. However, when all engines were queried simultaneously,
the median time for the first engine to respond was 0.7
seconds. A similar advantage is obtained by downloading
the web pages corresponding to the hits in parallel, which results
in a median time of 1.3 seconds for the NECI engine to receive the
first page. On average, the parallel architecture of the NECI
engine allows it to find, download and analyze the first page
faster than standard search engines respond, although standard
engines do not download or analyze the actual content of the pages.
In May 1998, we measured the time for the engine to show the
first five and first 10 relevant results for 200 queries. The
median time for the first five relevant results was 2.7 seconds,
and the median time for the first 10 relevant results was 3.2
seconds (these figures do not include queries that did not
return the target number of results).
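
The effect of querying in parallel and reporting results progressively can be reproduced with a small Python simulation. The engine names and latencies below are made up, and `asyncio.sleep` stands in for real network requests; the sketch only shows that the first result is available as soon as the fastest source responds.

```python
import asyncio
from typing import Dict, List, Tuple


async def fake_fetch(name: str, latency: float) -> Tuple[str, float]:
    """Stand-in for a network request; sleeps for the simulated latency."""
    await asyncio.sleep(latency)
    return name, latency


async def progressive_results(latencies: Dict[str, float]) -> List[str]:
    """Start all requests at once and collect results in completion order."""
    tasks = [asyncio.create_task(fake_fetch(n, l)) for n, l in latencies.items()]
    order = []
    for finished in asyncio.as_completed(tasks):
        name, _ = await finished
        order.append(name)  # a real engine would display the result here
    return order


def run_demo(latencies: Dict[str, float]) -> List[str]:
    return asyncio.run(progressive_results(latencies))
```

The total wait for the first result equals the smallest latency rather than the latency of any particular engine, which is consistent with the first-response figures reported above.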

CONCLUSION

This document has presented an investigation into the
implementation and evaluation of a contextual retrieval
system. The system uses a contextual user profile that
draws on both implicit and explicit data to provide
information that is likely to meet users’
information needs. An observational study was carried
out, and the initial analysis of the data has shown that the
system improves both the efficiency and the effectiveness
of search. This study is just one step in this direction.
Its results offer only a partial view of the
phenomenon and can also be interpreted in
other ways. More research is needed to validate or
invalidate these findings, using larger samples and, if
possible, a real-world setting.

The NECI metasearch engine demonstrates that it is
possible to download and analyze in real time the pages
that match a query. In fact, by calling the web search
engines and downloading web pages in parallel, the
NECI metasearch engine can, on average, show the first
result faster than a standard search engine. Like other
metasearch engines and various web tools, the NECI
metasearch engine relies on the underlying search
engines for important and valuable services. The wide
use of this or any metasearch engine requires an amicable
arrangement with the underlying search engines; such
arrangements may include passing through advertisements or
micropayment systems. There are numerous areas for
future research. Because the NECI engine collects the
full text of the documents that match, it is a good test
bed for information retrieval research. The areas in
which we are working include clustering, query
expansion and relevance feedback. Because
query-sensitive summaries allow users to better assess
relevance without having to view pages, implicit
feedback should be more successful and could be useful
for improving relevance measures and automatic relevance
feedback, and for learning specific expressive forms. Other
areas that we are looking at include the classification of
pages and the extension of the search technique of
specific expressive forms.

REFERENCES
[1] Allan, J. et al. (2003) Challenges in information retrieval and language modeling: report of a workshop held at the center for intelligent information retrieval. ACM SIGIR Forum, 37, 31-47.
[2] Wen, J.R., N. Lao, and W.-Y. Ma. Probabilistic model for contextual retrieval. Paper presented at the Annual ACM Conference on Research and Development in Information Retrieval. 2004. Sheffield, United Kingdom.
[3] Limbu, D.K., R. Pears, A.M. Connor and S.G. MacDonell. Contextual relevance feedback in web information retrieval. Paper presented at the 1st International Symposium on Information Interaction in Context, 2006. Copenhagen, Denmark.
[4] Limbu, D.K., R. Pears, A.M. Connor and S.G. MacDonell. Contextual and Concept-Based Interactive Query Expansion. Paper presented at the 19th Annual Conference of the National Advisory Committee on Computing Qualifications, 2006. Wellington, New Zealand.
[5] Limbu, D.K., R. Pears, A.M. Connor and S.G. MacDonell. A Framework for Contextual Information Retrieval from the WWW. Paper presented at the 14th International Conference on Intelligent and Adaptive Systems and Software Engineering, 2005, 185-189.
[6] E. Selberg and O. Etzioni, “Multi-Service Search and Comparison Using the MetaCrawler,” Proc. 1995 WWW Conf., 1995; available online at http://draz.cs.washington.edu/papers/www4/html/Overview.html.
[7] G.R. Notess, “Internet ‘Onesearch’ With the Mega Search Engines,” Online, Vol. 20, No. 6, 1996, pp. 36-39.
[8] E. Keen, “Term Position Ranking: Some New Test Results,” Proc. 15th Int’l ACM SIGIR Conf. Research and Development in Information Retrieval, 1992, pp. 66-76; available online at http://www.acm.org/pubs/citations/proceedings/ir/133160/p66-keen/
[9] D. Hawking and P. Thistlewaite, “Proximity Operators: So Near and Yet So Far,” Proc. Fourth Text Retrieval Conf., D.K. Harman, ed., 1995; available online at http://web.soi.city.ac.uk/~andym/PADRE/trec4.ps.Z.
[10] J. Boyan, D. Freitag, and T. Joachims, “A Machine-Learning Architecture for Optimizing Web Search Engines,” Proc. AAAI Workshop Internet-Based Information Systems, 1996; available online at http://www.lb.cs.cmu.edu/afs/cs/project/reinforcement/papers/boyan.laser.ps.
[11] S. Brin and L. Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Proc. 1998 WWW Conf., 1998; available online at http://google.stanford.edu/~backrub/google.html.
