Image Recognition: Techniques for Mobile Visual Search Architectures


Added on  2019/09/20

AI Summary
This report provides a comprehensive literature review of image recognition techniques, focusing on landmark recognition and mobile visual search architectures. It begins with an introduction to computer vision and its applications, particularly in identifying and understanding images, with an emphasis on the challenges of creating a real-time landmark recognition Android app. The report then delves into various image recognition and comparison techniques, including template-based and feature-based approaches, examining the work of researchers in the field, such as Colios et al., Mata et al., and Yan-Tao et al. The review also explores mobile visual search architectures, discussing the balance between mobile device processing and server-side support. It highlights the importance of efficient database representation and storage for large-scale applications like Google Goggles, while also considering the limitations of different methods. The report concludes with a summary of key findings and potential research directions, emphasizing the need for efficient and accurate image recognition systems for mobile devices, supported by techniques like parallel computing and hierarchical clustering. Finally, the report also includes references and bibliography to support the literature review.
Document Page
Abstract
Essentially, computer vision is the computer's ability to gain a high-level understanding
from text, digital images or videos that it is presented with.
Within the arena of machine learning, computer visualisation and image processing,
image recognition presents a broad horizon of challenging tasks.
How to extract optimal, representative key features that can reflect the intrinsic content of
an image as accurately and efficiently as possible remains both a primary interest and an
exacting task within the domain of computer vision.
Keywords: Image Recognition, Machine Learning, Data-set, Bag-of-Words, GPS
Receiver, Key Point Detection/Description, PICASA, Monulens and Google Goggles.
Table of Contents
1.0 Introduction
2.0 Literature Review:
2.1 A Survey of Major Image/Landmark Recognition & Comparison Techniques:
2.2 Mobile Visual Search Architectures:
3.0 Summary
4.0 Conclusion
5.0 References
6.0 Bibliography
1.0 Introduction
A touristic landmark is an instantly recognisable building or site (such as a monument or a
cathedral).
For a traveller, gathering real-time, interactive and
relevant information about a landmark or monument and
its surrounding areas of interest is an important part of
the journey because of the cultural and
historical context it can provide.
The World Wide Web holds an abundance of images
and video recordings.
Creating a mobile application which has the ability to
recognise and match a vast number of landmarks
efficiently still remains a challenge due to the sheer number of images that must
be searched within a database/dataset, along with the presence of possible visual
distortions such as background clutter, illumination changes and the varying geometry of the
imaging devices by which the visual media was recorded (Rekhansh et al., 2015).
However, with the vast number of landmark pictures emerging on the World Wide Web
and the growth of landmark picture sharing via websites such as Picasa and Flickr,
there is a need for computer vision systems that can recognise landmarks universally through
reliable image identification engines and algorithms (Rohr, 2010).
In this research work, a literature review will be conducted to assess research
developments in the field of image recognition and the discoveries and progress that have
been made in technologies such as image feature extraction, image processing,
visual search architectures and image search optimisation in general.
Finally, the paper will be summarised, with the primary conclusions highlighted and some
potential research directions and techniques identified that can be further explored and
improved upon.
Fig.1: Famous Landmarks (Google.com, 2016)
Only through this process may an application be bestowed with image recognition
capabilities advanced enough to fully recognise and understand a diverse range of images
using neural networks (Parker, 1996).
This includes taking into consideration a number of issues that often present themselves
when creating such a device, such as the variations that are guaranteed to emerge in any
given landmark from one observation to the next (Yairi, Hirama, and Hori, 2003).
The ultimate goal of the proposed project is to create a real-time landmark recognition
Android app.
The device will use a large dataset of images which will be stored on the device itself for
matching. By identifying a given landmark, the application will simplify both content
understanding and the geolocation of images returned via GPS, enabling a topographical
representation and navigation of landmarks in the local area to provide appropriate tour
guide suggestions and guidance via online resources such as Yelp and Google Maps.
However, as already highlighted, efficiency is a challenge for such a potentially unstable,
large-scale image recognition system (Pinto, Cox and DiCarlo, 2008).
2.0 Literature Review:
2.1 A Survey of Major Image/Landmark Recognition & Comparison Techniques:
Various works have been carried out within the area of image recognition adhering to the
requirements of the main specifications.
On observing the dearth of methods for the identification and correlation of specific details
from images stored on the World Wide Web and the limits of purely camera-based,
contemporary programs, various researchers have attempted to come up with a means of
landmark recognition and modelling (some on a global-scale).
Colios et al. (2001) experimented with automated landmark identification for robots by
means of both projective and point-permutation invariant vectors. These vectors were
used to identify landmark patterns based on workspace planar features, enabling the
creation of direct, point-to-point correspondences in an indoor setting.
This, in turn, allowed for the use of both projectivity constraints and the convex hull to
identify matches in sets of five different images.
There was a noticeable margin of error, though this was reduced through the use of
sub-landmarks as outlier patterns.
However, this approach was limited to indoor environments, which restricts its
effectiveness (Rekhansh et al., 2015).
As part of a different project, vision-based landmark recognition was also used by Mata et
al. (2002) to perform topological localisation of mobile robots.
Both natural and artificial landmarks alike were used here, enabling the creation of a
search function whereby pattern identification/recognition techniques were applied to
digital images.
Not only did this enable robot navigation, but it also allowed for text strings, when present,
to be captured and understood from within landmark pictures.
Tests on robots, such as a B21 mobile robot, proved the usefulness of this approach.
Unfortunately, the method did not touch on any network or internet-based navigation, so a
potential limitation occurs in the navigation capabilities of robots when utilising this
particular method.
This, in turn, suggests that computer vision still had some way to go before it could
combine with web-based localisation and map construction technologies
(Yairi, Hirama, and Hori, 2003; Rekhansh et al., 2015).
Sala et al. (2006) presented a novel graph-theoretic formulation of the problem of
automatically extracting an optimal set of landmarks from an environment for
visual navigation.
Its intractable complexity motivated the need for approximation algorithms, and the
authors presented six such algorithms.
To assess them, the authors first tested them on a simulator, where they could vary
the shape of the environment, the number and shape of obstacles, the distribution of the
features and the visibility of the features.
The algorithm that achieved the best results on synthetic data was then
demonstrated on real visibility data.
The resulting decompositions revealed large regions of the world in which a small
number of features could be tracked to support efficient online localisation.
Their formulation and solution of the problem were general, and could accommodate other
classes of image features.
Elmogy et al. (2009) developed a robust and fast landmark recognition system for
online landmark recognition during robot navigation. Colour histograms of the
landmarks encountered were used to provide a proximate initial estimate of the
landmark.
The resulting hypotheses were then processed to calculate an accurate estimate of
the landmark.
A topological map displaying the routes was used to reduce the processing time by
only processing landmarks mentioned in the route description, ignoring the other
landmarks during navigation.
The robot’s stereo vision was also combined with the classified landmarks to
locate the landmark nearest to the robot and to calculate the landmark’s
geographical position in the real world.
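The colour-histogram shortlisting step described above can be illustrated with a minimal sketch. It assumes histogram intersection as the similarity measure and a coarse RGB quantisation; the source does not state which measure or bin count Elmogy et al. used, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def colour_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Return a normalised RGB histogram with bins**3 cells."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = sum(hist) or 1.0  # avoid dividing by zero for an empty image
    return [h / total for h in hist]


def intersection(h1: List[float], h2: List[float]) -> float:
    """Histogram intersection: 1.0 for identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def shortlist(query: List[float], landmarks: Dict[str, List[float]],
              top_k: int = 2) -> List[str]:
    """Return the top_k landmark names most similar to the query histogram."""
    ranked = sorted(landmarks.items(),
                    key=lambda item: intersection(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]
```

The shortlist would then be handed to a slower, more accurate matcher, mirroring the two-stage estimate described in the text.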
Meyer-Delius et al. (2011) introduced a landmark placement approach that attempted
to reduce the overall ambiguity within an environment in order to improve the navigation
performance of a mobile robot.
They proposed a measure for the uniqueness of a robot's pose based on the appearance of the
environment as observed by the robot.
Because of the combinatorial nature of the landmark placement problem, they presented
an approximate approach that incrementally selected landmark locations from a set
of candidate locations, thereby maximising the average uniqueness in the environment.
Moreover, they described a concrete application in the context of localisation with laser range
scanners, given a grid-based representation of the environment.
They evaluated their approach in various environments, both in simulation and using real
data.
The results showed that their approach yielded substantial improvements in the localisation
performance of robots.
Being primarily concerned with efficiency issues of GPS based web enabled landmark
recognition, Yan-Tao et al. (2009) discovered that while the low-dimensionality of GPS
coordinates does not demand high-end equipment or efficiency, input images in high
quantities may easily inhibit the process, especially when landmark models are particularly
large.
Methods to improve the efficiency of landmark image mining and recognition of query
images were explored, including parallel computing, efficient
hierarchical clustering, and local feature indexing for faster matching.
These methods produced improvements in the speed and accuracy of the clustering and
image matching process.
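To make the clustering idea concrete, the following is a minimal sketch of agglomerative (hierarchical) clustering applied to the GPS coordinates of photos. It is a deliberately naive single-linkage version written for readability, not the efficient variant the authors describe; the distance threshold and function names are assumptions.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two coordinates in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def cluster_photos(coords: List[Coord], max_km: float) -> List[List[int]]:
    """Merge the closest pair of clusters until none lie within max_km.

    Returns clusters as lists of indices into coords.
    """
    clusters = [[i] for i in range(len(coords))]
    while True:
        best = None  # (distance, index_i, index_j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two nearest members.
                d = min(haversine_km(coords[p], coords[q])
                        for p in clusters[i] for q in clusters[j])
                if d <= max_km and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
```

Each resulting cluster is a candidate landmark location; the images inside it could then be matched visually, which is where local feature indexing would come in.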
However, parallel computing is not something that a mobile device can easily execute,
being a feature that generally shows its worth only in supercomputer clusters
(Almasi and Gottlieb, 1987; Hwang, Fox, and Dongarra, 2011).
Nevertheless, efficient hierarchical clustering and local feature indexing are techniques
that can be implemented efficiently at software level (Yan-Tao Zheng et al.,
2009).
Indeed, modern tablets and mobile phones have reached a point where they possess
computing and graphics hardware which are on a par with those of the personal
computers of barely two or three decades ago (Girod et al., 2011).
Additionally, with the current ubiquity of 3G and Wi-Fi for mobile devices, it is not out of the
question that a cloud-based feature could be added to an application, to provide it with
remote access to parallel computing resources via the cloud (Grama, Gupta, and Karypis,
2003; Landfeldt, 2009; Hwang, Fox, and Dongarra, 2011).
More recent landmark recognition techniques may be separated into two categories:
1. Template-based representation
2. Feature-based representation (Chan and Baciu, 2012)
The template-based approach to landmark recognition employs holistic texture features
and creates arrays from complete building/location image patterns. These arrays are then
compared by means of metrics such as the Euclidean distance (Ansari and Li, 1993).
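As a rough illustration of template comparison, the sketch below treats each stored landmark as a flat array of texture values and returns the template closest to the query under the Euclidean distance. The array contents and the names used are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length arrays."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_template(query: Sequence[float],
                     templates: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return (name, distance) of the stored template closest to the query."""
    if not templates:
        raise ValueError("no templates to compare against")
    name = min(templates, key=lambda n: euclidean(query, templates[n]))
    return name, euclidean(query, templates[name])
```

In practice the arrays would hold thousands of values per image, which is why the dimensionality reduction discussed next matters.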
The Eigenspace method is currently the most widely accepted approach to template-based
landmark recognition. Based on Principal Component Analysis (PCA), this method enables
high-speed landmark recognition via dimensionality
reduction (Chan and Baciu, 2012).
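The dimensionality reduction behind the Eigenspace method can be sketched as follows. This is a simplified illustration that extracts only the single leading principal axis by power iteration; a real system would keep several axes and use a linear algebra library. The starting vector and all names are assumptions.

```python
import math
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Component-wise mean of the training arrays."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def leading_axis(rows: List[Vector], iterations: int = 100) -> Vector:
    """Approximate the first principal axis by power iteration."""
    mu = mean_vector(rows)
    centred = [[x - m for x, m in zip(row, mu)] for row in rows]
    # Deterministic start; assumed not to be orthogonal to the true axis.
    axis = [1.0 / (k + 1) for k in range(len(mu))]
    for _ in range(iterations):
        # Multiply by X^T X without forming the covariance matrix.
        scores = [sum(x * a for x, a in zip(row, axis)) for row in centred]
        updated = [sum(s * row[k] for s, row in zip(scores, centred))
                   for k in range(len(mu))]
        norm = math.sqrt(sum(u * u for u in updated))
        if norm == 0.0:
            break  # degenerate data or unlucky start; axis left unchanged
        axis = [u / norm for u in updated]
    return axis


def project(row: Vector, mu: Vector, axis: Vector) -> float:
    """Coordinate of a centred array along the principal axis."""
    return sum((x - m) * a for x, m, a in zip(row, mu, axis))
```

Comparing the projected coordinates instead of the full arrays is what makes recognition fast: each image is reduced to a handful of numbers before any distance is computed.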
Feature-based representation, meanwhile, looks at a landmark’s geometrical features,
such as the relative position, or specific elements of, pillars, walls, doors, windows,
crossbeams and decorations. Based on these feature points, equations may be used to
calculate landmark positioning, in a way similar to camera-based positioning
(Seo and Yoo, 2004; Gillner, Weiß, and Mallot, 2008).
However, neither of these approaches is without its limitations.
Template-based methods (Eigenspace and fisher-space) lose most of their usefulness
when trying to parse variations in lighting, though wavelet decomposition may help to
mitigate this weakness somewhat (Chan and Baciu, 2012).
Feature-based applications, like vision-based positioning, demand that an application
constantly compares multiple visual scenes, preferably of high resolution.
This, whilst so far successfully applied to robots, demands computing resources that are
generally too great for a mobile application (Li and Yang, 2003; Li et al.,
2011).
2.2 Mobile Visual Search Architectures:
For about a decade there has been a growing interest in both landmark recognition and
optimisation techniques within the computer vision community.
However, their use in a mobile application setting has been relatively recent.
Mobile image-based retrieval applications emerge with their own set of challenges, one of
which is the consideration of how much of the processing (or which parts) should be
carried out by the mobile device itself, and what should be off-loaded to a server (Girod et
al., 2011; Li and Yap, 2012).
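One way to reason about this trade-off is to compare the estimated end-to-end latency of both options. The sketch below is a back-of-the-envelope model only; every parameter (payload size, uplink speed, round-trip time, compute times) is a hypothetical input that would have to be measured on real hardware and networks.

```python
def offload_latency_s(payload_bytes: int, uplink_bits_per_s: float,
                      round_trip_s: float, server_compute_s: float) -> float:
    """Estimated time to upload a query, process it remotely and get a reply."""
    if uplink_bits_per_s <= 0:
        raise ValueError("uplink speed must be positive")
    upload_s = payload_bytes * 8 / uplink_bits_per_s
    return upload_s + round_trip_s + server_compute_s


def should_offload(device_compute_s: float, payload_bytes: int,
                   uplink_bits_per_s: float, round_trip_s: float,
                   server_compute_s: float) -> bool:
    """True when the server path is estimated to be faster than the device."""
    remote = offload_latency_s(payload_bytes, uplink_bits_per_s,
                               round_trip_s, server_compute_s)
    return remote < device_compute_s
```

With these illustrative numbers, sending a 200 kB image over a 1 Mbit/s uplink costs about 1.9 s, slower than 1.5 s of on-device work, whereas sending 20 kB of extracted features instead brings the server path down to roughly 0.46 s. This is the kind of split (extract on the device, match on the server) that the question of "which parts" refers to.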
Such a device will require a database that can be searched quickly across many images.
The main consideration here is how to represent and store the database that will be used.
Most commercial, large-scale applications such as Google Goggles rely heavily on
server-side support and processing, and utilise an approach that incorporates priors
from noisy user location data while using the image content itself for identification.
However, the creation of an app that does not rely on any server-side support for
processing would help in curbing the latency, power and network requirements that are
often associated with communication to and from a server when using such a device
(Chen, 2013).
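The idea of combining a location prior with image content can be sketched as a simple re-ranking step. This is an assumed formulation, not the method used by any particular product: each candidate's visual match score is weighted by a Gaussian prior on its distance from the reported (noisy) user position, and the result is normalised. The parameter `sigma_km` and all names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def location_prior(distance_km: float, sigma_km: float = 1.0) -> float:
    """Unnormalised Gaussian prior: nearby landmarks are more plausible."""
    return math.exp(-0.5 * (distance_km / sigma_km) ** 2)


def rerank(visual_scores: Dict[str, float], distances_km: Dict[str, float],
           sigma_km: float = 1.0) -> List[Tuple[str, float]]:
    """Return (landmark, posterior) pairs sorted from most to least likely."""
    weighted = {name: score * location_prior(distances_km[name], sigma_km)
                for name, score in visual_scores.items()}
    total = sum(weighted.values())
    if total == 0.0:
        return [(name, 0.0) for name in weighted]
    ranked = sorted(weighted.items(), key=lambda item: item[1], reverse=True)
    return [(name, value / total) for name, value in ranked]
```

A larger `sigma_km` expresses less trust in the GPS fix, so the ranking falls back towards the purely visual scores.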
Monulens is an application that does not rely on server-side support. The whole procedure
happens on the device itself, and the match is found using a
sizeable on-device dataset.
While client-side implementations offer noteworthy advantages, one limitation of a
server-independent implementation is that data-intensive applications consume a considerable
amount of storage space on the device.
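The storage cost can be estimated with simple arithmetic. The figures below are hypothetical and not taken from Monulens: they assume a fixed number of local feature descriptors per image and a fixed descriptor size.

```python
def descriptor_store_bytes(n_images: int, features_per_image: int,
                           bytes_per_descriptor: int) -> int:
    """Raw size of an on-device descriptor database, ignoring index overhead."""
    return n_images * features_per_image * bytes_per_descriptor


def to_megabytes(n_bytes: int) -> float:
    """Convert bytes to decimal megabytes (1 MB = 1,000,000 bytes)."""
    return n_bytes / 1_000_000
```

For example, 10,000 images with 500 descriptors each at 128 bytes per descriptor come to 640 MB, while a compact 32-byte descriptor would reduce this to 160 MB. This is why descriptor size and database representation weigh so heavily on a server-independent design.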