Deep Web Information Retrieval Techniques


Added on  2021/04/21

Running head: IT SECURITY
IT SECURITY

Table of Contents
Introduction
Background
  Statistics of Deep Web
  Concept of information security relating to Deep web
Security challenges posed by Deep Web
Potential solutions
Future trends
Conclusion
References
Introduction
The deep web, also known as the invisible web, is the part of the internet that search
engines do not index. Its content includes chat messages, email messages, social media
files and electronic health records (EHRs). This content can be reached directly over the
internet, but it is not crawled or indexed by search engines such as Yahoo, Google,
DuckDuckGo and Bing. Security, which matters throughout the cyber world, plays a vital
role here as well. The deep web can be regarded as a place where illegal activity is easily
carried out without interference from third parties or other users. Using the deep web is
not in itself illegal, but the benefits or services obtained from it should be legal. Tor is
one of the best-known portals to the deep web. It provides a virtual pathway that allows
users to communicate and navigate the internet anonymously.
The aim of this report is to examine the concepts that apply to the deep web, with the
main emphasis on how it works. The main security challenges it poses are discussed, with a
focus on the currency used to exchange items on it. The role that the deep web is likely to
play in the near future is also considered.
Background
The concept of the deep web, or dark web, has existed for a long time, dating back to the
invention of the internet. The dark web refers to websites that sit on top of darknets:
overlay networks that can be accessed only with specific authorisations, software or
hardware configurations. In conversation, people often use the terms dark web, deep web
and shadow web interchangeably (Liu & Xiang, 2016).
The term “dark net” was coined in the 1970s to refer to networks that were isolated from
the ARPANET (the original form of what users now know as the Internet). The focus in
creating such networks was a private environment in which illegal activities could be
conducted, with users protected so that no one can directly see who is accessing what. One
example of a dark net is Tor, an anonymity network that obscures the user's actual IP
address. It achieves this by encrypting communication and bouncing it through relays
across the world. Tor was far from the first dark net, however: recent research has found
that students at Stanford University and MIT were the first to use the ARPANET to sell
cannabis. It can therefore be said that these students were the first to invent the concept
of the “dark web”. Other examples of dark nets in use today are the I2P network and
Freenet.
Statistics of Deep Web
This section focuses on how large the deep web is. The following statistics help to put
its scale in perspective:
The deep web contains almost 7,500 terabytes of information.
Compared with the surface web, the deep web contains between 400 and 550 times more
information.
More than 200,000 deep web sites are currently available.

About 550 billion individual documents are found on the deep web, whereas about 1 billion
individual documents are found on the surface web.
Almost 95% of the deep web is publicly accessible, meaning that users do not have to pay
to access and benefit from it.
Together, the 60 largest deep web sites contain around 750 terabytes of data, enough on
their own to exceed the size of the entire surface web about 40 times over.
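As a rough consistency check, some of the figures above can be combined with simple
arithmetic. The sketch below uses only the numbers quoted in this section; the variable
names are illustrative, and the results are only as reliable as the quoted statistics.

```python
# Figures quoted in the statistics above (not independently verified).
deep_web_tb = 7500                 # total deep web data, in terabytes
ratio_low, ratio_high = 400, 550   # deep web is 400 to 550 times the surface web
largest_60_tb = 750                # data held by the 60 largest deep web sites

# Implied size of the surface web, in terabytes.
surface_tb_max = deep_web_tb / ratio_low
surface_tb_min = deep_web_tb / ratio_high

# Share of all deep web data held by the 60 largest sites.
largest_60_share = largest_60_tb / deep_web_tb

print(f"Implied surface web size: {surface_tb_min:.2f} to {surface_tb_max:.2f} TB")
print(f"Share held by the 60 largest sites: {largest_60_share:.0%}")
```

On these figures the surface web would hold somewhere between roughly 14 and 19 terabytes,
and the 60 largest sites would account for about a tenth of all deep web data.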
The concept of information security relating to Deep web
Information security for any system, whether or not it benefits the user, rests on three
properties: the integrity, confidentiality and availability of data. When a user who is
not authorised to access certain data gains access to it, the confidentiality of that data
is lost. Gaining unauthorised access can be an easy task; catching the intruders, on the
other hand, is difficult. On the deep web, unauthorised users cannot access or alter the
data, so authentication and authorisation are both involved. The deep web nevertheless
bypasses a security safeguard, in that it prevents third parties from learning who accesses
which data. This can be considered a direct breach of information security (Zhao et al.,
2016).
Security challenges posed by Deep Web
The security challenges posed by the deep web stem from its nature. The deep web can
easily be accessed with a tool called the Tor Browser. Tor wraps outgoing and incoming data
in layers of encryption, which is why it is also known as “the onion router”. The question
that arises is what purpose this encryption serves. The idea may seem foreign given the
lack of privacy on the ordinary internet: everything that is searched for, viewed or traded
on the deep web is, as a rule, anonymous. This anonymity can have severe implications.
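The layering idea behind the name can be illustrated with a toy example. The sketch below
is not Tor's actual protocol or cryptography: it assumes three hypothetical relays with
made-up keys and uses a simple hash-based XOR keystream (not secure) purely to show how a
sender adds one layer per relay and how each relay removes only its own layer.

```python
import hashlib
from typing import Sequence


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from `key` (toy construction, not secure)."""
    blocks = []
    counter = 0
    while len(blocks) * 32 < length:
        blocks.append(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return b"".join(blocks)[:length]


def apply_layer(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; applying the same key again removes the layer."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


def wrap(message: bytes, relay_keys: Sequence[bytes]) -> bytes:
    """Sender adds one layer per relay, starting with the layer for the last relay."""
    for key in reversed(relay_keys):
        message = apply_layer(message, key)
    return message


# Hypothetical relay keys, invented for this example.
relay_keys = [b"entry-relay-key", b"middle-relay-key", b"exit-relay-key"]
message = b"hello hidden service"

cell = wrap(message, relay_keys)
for name, key in zip(["entry", "middle", "exit"], relay_keys):
    cell = apply_layer(cell, key)  # each relay peels only its own layer
    print(name, "relay sees plaintext:", cell == message)
```

Only after the last layer is removed does the original message reappear. Because XOR
layers commute, this toy does not enforce the order of the relays; a real design would.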
Everything done on the deep web is practically untraceable, and criminals can easily take
advantage of this. Put simply, the deep web is becoming a hub for many criminal activities.
Trade in illegal weapons, drug trafficking and the hiring of contract killers are daily
occurrences. Illegal marketplaces with bidding, similar to ordinary shopping websites, have
been set up on the deep web to sell illegal goods. Many laws have been enforced, but it has
not been possible to stop this criminal activity completely. These marketplaces are highly
efficient: they offer a user-friendly interface and a search bar that saves criminals time
and lets them find illegal goods easily. The currency used to complete transactions is
Bitcoin, a cryptocurrency. It gives criminals an extra advantage because it is nearly
impossible to trace them (Calì & Straccia, 2015).
The deep web has been around for a long time, but efforts to raise awareness and stop
this activity began only in 2013, in response to the primary deep web marketplace. The main
tool used to access the deep web is Tor. It is widely used because it routes traffic
through a network of nodes, which makes it very difficult for a third party to know who
accessed which site and when. A user who wants to access .onion sites can do so only
through Tor. The main risk associated with Tor is that a user who downloads it may be added
to an NSA list (Das, 2015). The same applies to Tails. If everyone downloaded and used Tor,
no individual user would look suspicious; the risk therefore falls as the number of Tor
users rises. There is no specific risk in using Tor itself; what matters is what is done
with the tool (Calì & Straccia, 2015).
Potential solutions
Monitoring has grown in importance recently, with a focus on the Tor network, and it may
be extended to other areas in the near future. Such measures would not only reduce the
number of access points to the deep web but also help to safeguard the internet and its
peripherals. The design of the dark web and the way its parts interact make monitoring it
challenging. The following solutions can be applied to address the issue.
Solution 1
Mapping the hidden service directory: Tor uses a domain database built on a distributed
system known as a “distributed hash table” (DHT). In a DHT, the nodes in the system take
responsibility for storing and maintaining subsets of the database (Orsolini et al.,
2017). Because hidden service domain resolution is distributed in this way, it is possible
to deploy nodes in the DHT that monitor the requests associated with a particular domain.
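The idea of a node observing lookups in a distributed hash table can be sketched with a
toy model. The code below is a simplified hash ring, not Tor's actual hidden service
directory protocol; the class name `ToyDHT`, the node names and the example domains are all
hypothetical. Each key is handled by the first node at or after its position on the ring,
so a node deployed on the ring sees every lookup for the keys it is responsible for.

```python
import hashlib
from bisect import bisect_left
from collections import Counter


def ring_position(name: str) -> int:
    """Map a node name or key to a position on the hash ring."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


class ToyDHT:
    def __init__(self, node_names):
        # Nodes sorted by their position on the ring.
        self.ring = sorted((ring_position(n), n) for n in node_names)
        self.positions = [pos for pos, _ in self.ring]
        # What each node has observed: key -> number of lookups.
        self.lookups_seen = {n: Counter() for n in node_names}

    def responsible_node(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        idx = bisect_left(self.positions, ring_position(key)) % len(self.ring)
        return self.ring[idx][1]

    def lookup(self, key: str) -> str:
        """Resolve a key; the responsible node records that it saw the request."""
        node = self.responsible_node(key)
        self.lookups_seen[node][key] += 1
        return node


dht = ToyDHT(["node-a", "node-b", "node-c", "observer"])
for domain in ["exampleone.onion", "exampletwo.onion", "exampleone.onion"]:
    dht.lookup(domain)

for node, seen in dht.lookups_seen.items():
    print(node, dict(seen))
```

In this toy model a monitoring party learns only about the keys that happen to fall in its
own segment of the ring, which is why several nodes would be needed for broader coverage.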
Solution 2
Social site monitoring: Sites such as Pastebin are often used to exchange addresses and
contact information for new hidden services. These sites should be kept under constant
surveillance to spot exchanges of messages that contain new dark web domains (Kaczmarek &
Węckowski, 2018). Once illegal activity is detected, the site should be blocked so that no
further illegal activity can be carried out from it.
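One building block of such monitoring is pulling candidate addresses out of pasted text.
The sketch below assumes the text has already been collected by some other means; the
function name and the sample text are made up. It relies on .onion hostnames being base32
strings (letters a-z and digits 2-7) of 16 characters for older version 2 services or 56
characters for version 3 services.

```python
import re

# 16 base32 characters (v2) or 56 base32 characters (v3), followed by ".onion".
ONION_RE = re.compile(r"\b[a-z2-7]{16}(?:[a-z2-7]{40})?\.onion\b")


def extract_onion_addresses(text: str) -> list:
    """Return the unique .onion hostnames found in text, lower-cased and sorted."""
    return sorted(set(ONION_RE.findall(text.lower())))


# Made-up sample text; none of these hostnames refers to a real service.
sample = (
    "new mirror at http://a2b3c4d5e6f7g2h3.onion/index "
    "and also " + "b" * 56 + ".onion, but not short.onion"
)
print(extract_onion_addresses(sample))
```

A pattern match only shows that a string has the right shape; it does not show that the
service exists or what it hosts.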
Solution 3
Monitoring of hidden services: Most hidden services are highly volatile; they tend to go
offline very often and come back under a new domain name. It is therefore essential to take
a snapshot of every site when it is seen, so that its online activity can be analysed later
(Sharma & Sharma, 2017).
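A snapshot store of this kind can be sketched as follows. The example assumes page content
has already been retrieved by some other means and keeps everything in memory; the class
`SnapshotStore`, its methods and the domains are hypothetical. Storing a content
fingerprint with each snapshot makes it possible to notice when identical content
reappears under a new domain name.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    domain: str
    seen_at: str      # timestamp supplied by the caller
    fingerprint: str  # SHA-256 hex digest of the page content


class SnapshotStore:
    def __init__(self):
        self.snapshots = []
        self.domains_by_fingerprint = defaultdict(set)

    def record(self, domain: str, content: bytes, seen_at: str) -> Snapshot:
        """Store a snapshot and index its domain by content fingerprint."""
        fingerprint = hashlib.sha256(content).hexdigest()
        snapshot = Snapshot(domain, seen_at, fingerprint)
        self.snapshots.append(snapshot)
        self.domains_by_fingerprint[fingerprint].add(domain)
        return snapshot

    def relocated(self):
        """Return groups of domains that served byte-identical content."""
        return [sorted(d) for d in self.domains_by_fingerprint.values() if len(d) > 1]


store = SnapshotStore()
store.record("oldname.onion", b"<html>market</html>", "2018-01-01T00:00:00Z")
store.record("newname.onion", b"<html>market</html>", "2018-02-01T00:00:00Z")
store.record("other.onion", b"<html>forum</html>", "2018-02-01T00:00:00Z")
print(store.relocated())
```

An exact hash only links byte-identical pages; a site that changes even slightly between
appearances would need a fuzzier comparison.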
Solution 4
Semantic analysis: Once data has been retrieved from hidden services, it should be stored
in a semantic database so that it can be tracked in the future. This gives an idea of the
activities taking place and where they originate.
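One common way to realise a semantic database is to store facts as
subject-predicate-object triples. The text does not say which kind of database is meant,
so the minimal in-memory triple store below is an assumption made for illustration; the
class name, predicates and sample facts are invented.

```python
class TripleStore:
    def __init__(self):
        self.triples = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Record one fact as a (subject, predicate, object) triple."""
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Invented sample facts about two hypothetical sites.
store = TripleStore()
store.add("site-1", "offers", "forum")
store.add("site-1", "first_seen", "2018-01-01")
store.add("site-2", "offers", "forum")

print(store.query(predicate="offers", obj="forum"))
```

Queries with wildcards make it possible to ask both what a given site does and which
sites share a given activity.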
Future trends
The future trends relating to the deep web are stated below:
It would be more secure than in the past: The technology behind the deep web is expected
to become more advanced in the near future, making activity carried out over Tor more
difficult to detect. It is also a technological field that is advancing rapidly.
Marketplaces would be stronger than before: Transactions made with electronic currency
would become more efficient and would not involve any liability. A full marketplace could
be implemented without any single point of failure. As marketplaces grow rapidly, they
would take on more items, so that more and more products become available through them.
Gauging reputation would be easier: Even under high anonymity, trust and reputation
between buyers and sellers could be established without relying on an external authority.
Tracking bitcoins would be harder: Cryptocurrency goes hand in hand with the deep web.
More advanced techniques are expected in the near future that would make it less traceable.
Malware can also take advantage of blockchain technology. Blockchain technology can reduce
the risk of users being detected when they access a site.
More people would be involved: A recent report stated that most users of the normal web
do not have much time to get involved in the deep web. If awareness among users increases,
they may become more involved (Barrio & Gravano, 2017). It is a field in which the risk of
detection falls rapidly as more and more people access it.
Conclusion
In conclusion, despite the information that the dark web holds, it remains an ambiguous
part of the digital world. Many web users have not yet heard of the term or of how it
operates, and they believe that what they see is the most that Google can deliver. Others
claim that it is an underground world of crime and unethical behaviour. In the near future,
the deep web is expected to play a vital role, which would make this sector stronger. To
benefit from the deep web, Tor has to be used, as it is one of the best-known portals. Tor
enhances the security and privacy of web browsing while allowing information to be shared
in a highly secret manner. Tor is expected to become so advanced in the near future that
users who access the deep web through it could not be detected at all. As innovation plays
a vital role in the advancement of technology, the deep web can also be considered a field
that contributes to it.

References
Barrio, P., & Gravano, L. (2017). Sampling strategies for information extraction over the
deep web. Information Processing & Management, 53(2), 309-331.
Calì, A., & Straccia, U. (2015). A Framework for Conjunctive Query Answering over
Distributed Deep Web Information Resources. In SEBD (pp. 358-365).
Das, G. (2015, May). Principled Optimization Frameworks for Query Reformulation of
Database Queries. In Proceedings of the Second International Workshop on
Exploratory Search in Databases and the Web (pp. 2-2). ACM.
Jiang, L., Kalantidis, Y., Cao, L., Farfade, S., Tang, J., & Hauptmann, A. G. (2017,
February). Delving deep into personal photo and video search. In Proceedings of the
Tenth ACM International Conference on Web Search and Data Mining (pp. 801-810).
ACM.
Kaczmarek, T., & Węckowski, D. G. (2018). Harvesting deep web data through produser
involvement. In The Dark Web: Breakthroughs in Research and Practice (pp. 175-
198). IGI Global.
Liu, B., & Xiang, J. (2016, August). Extraction and management of meta information on the
domain-oriented Deep Web. In Software Engineering and Service Science (ICSESS),
2016 7th IEEE International Conference on (pp. 787-790). IEEE.
Massouh, N., Babiloni, F., Tommasi, T., Young, J., Hawes, N., & Caputo, B. (2017).
Learning deep visual object models from noisy web data: How to make it work. arXiv
preprint arXiv:1702.08513.
Memon, M. H., Khan, A., Li, J. P., Shaikh, R. A., Memon, I., & Deep, S. (2014, December).
Content based image retrieval based on geo-location driven image tagging on the
social web. In Wavelet active media technology and information processing
(ICCWAMTIP), 2014 11th international computer conference on (pp. 280-283).
IEEE.
Orsolini, L., Papanti, D., Corkery, J., & Schifano, F. (2017). An insight into the deep web;
why it matters for addiction psychiatry? Human Psychopharmacology: Clinical and
Experimental, 32(3).
Pavai, G., & Geetha, T. V. (2014). A Bootstrapping Approach to Classification of Deep Web
Query Interfaces. International Journal on Recent Trends in Engineering &
Technology, 11(2), 1.
Qiang, B., Zhang, R., Wang, Y., He, Q., Li, W., & Wang, S. (2014). Research on Deep Web
Query Interface Clustering Based on Hadoop. JSW, 9(12), 3057-3062.
Sharma, D. K., & Sharma, A. K. (2017). Deep Web Information retrieval Process. The Dark
Web: Breakthroughs in Research and Practice, 114.
Su, H., Gong, S., Zhu, X., Popescu, A., Ginsca, A., Le Borgne, H., & Loh, Y. P. (2017,
October). Weblogo-2m: Scalable logo detection by deep learning from the web. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(pp. 270-279).
Wang, Y., Lin, X., Wu, L., & Zhang, W. (2017). Effective multi-query expansions:
Collaborative deep networks for robust landmark retrieval. IEEE Transactions on
Image Processing, 26(3), 1393-1404.
Zhao, F., Zhou, J., Nie, C., Huang, H., & Jin, H. (2016). SmartCrawler: a two-stage crawler
for efficiently harvesting deep-web interfaces. IEEE transactions on services
computing, 9(4), 608-620.