Unlocking the Potential of Unstructured Data: A Comprehensive Database Management Project

Added on 2023/04/25

Database
Table of Contents
1. Introduction
2. Question 1
3. Question 2
4. Conclusion
1. Introduction
The main aim of this project is to develop the management of unstructured data for
potential use in real applications, with the tools implemented in the Python language.
Document analytics tools gather unstructured information from a wide variety of sources and
prepare it for analysis. Unstructured information can be found in databases, in individual
documents (.txt, .xml, .doc, .xls and so forth) and in file systems. The algorithms used are a
search engine, the Boolean model and the vector model, which are applied to find the frequency
of terms in the documents of the unstructured data application.
2. Question 1
Three documents on different topics are considered. They are used to search for words in
each document and to find the stop words in each document. The removal of stop words is
shown below.
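Stop-word removal can be sketched in Python, the language the project uses. This is a minimal sketch under stated assumptions: the stop-word list `STOP_WORDS`, the tokenizer pattern and the function name `remove_stop_words` are hypothetical, since the project's own list and code are not shown.

```python
import re

# Hypothetical stop-word list; the project's actual list is not given.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "to", "with",
}

# Numbers such as "2.4" stay single tokens; otherwise take runs of letters.
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|[a-z]+")


def remove_stop_words(text):
    """Lower-case and tokenize `text`, then drop every stop word."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


print(remove_stop_words("Google claims to index the World Wide Web"))
# ['google', 'claims', 'index', 'world', 'wide', 'web']
```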
[Figure: Stemming]
[Figure: After stemming]
[Figure: Inverted index]
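An inverted index maps each term to the documents that contain it. The stems in the table below (for example "Googl", "Queri", "Quickli") appear to be Porter-stemmer output, such as NLTK's `PorterStemmer` would produce; the minimal sketch below assumes the tokens are already stemmed and uses hypothetical token lists, since the real document texts are not shown.

```python
from collections import defaultdict

# Hypothetical, already-stemmed tokens for three documents.
DOCUMENTS = {
    "D1": ["googl", "index", "web", "search", "engin"],
    "D2": ["queri", "system", "search", "engin"],
    "D3": ["metasearch", "search", "engin", "result"],
}


def build_inverted_index(documents):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, tokens in documents.items():
        for token in tokens:
            postings[token].add(doc_id)
    return {term: sorted(doc_ids) for term, doc_ids in postings.items()}


index = build_inverted_index(DOCUMENTS)
print(index["engin"])  # ['D1', 'D2', 'D3']
print(index["web"])    # ['D1']
```

Using a set per term avoids listing a document twice when a term repeats inside it; term frequencies would need a separate count.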
Terms as per document number

Term          Document Number    Term          Document Number
2.4           1                  File          2
Googl         1                  Document      2
Wide          1                  Collect       2
Us            1                  Quickli       2
Web           1                  Default       6
Search        1                  Engin         2
Engin         1                  Harvest       7
World         5                  Dogpil        8
Claim         1                  Metasearch    3
Comprehens    1                  Search        3
Index         1                  Four          3
Billion       1                  Engin         3
Page          1                  Time          9
Glimps        2                  List          3
Queri         2                  Result        3
System        2                  Page          10
Allow         2
Boolean Model:
1. Search AND Engine:
Search: D1, D2, D3    Engine: D1, D2, D3    Result: D1, D2, D3
2. Web AND Search:
Web: D1    Search: D1, D2, D3    Result: D1
3. Web AND Engine:
Web: D1    Engine: D1, D2, D3    Result: D1
[Figure: Venn diagrams of the document sets for each query]
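A Boolean AND query is the intersection of the terms' postings lists. In this minimal sketch the postings are an assumption: "search" and "engin" are placed in every document, which is what their IDF of 0 in the vector-model table below implies, and "web" only in D1. The function name `boolean_and` is hypothetical.

```python
# Hypothetical postings inferred from the IDF values in the vector-model table.
INDEX = {
    "search": ["D1", "D2", "D3"],
    "engin": ["D1", "D2", "D3"],
    "web": ["D1"],
}


def boolean_and(index, *terms):
    """Return the documents containing every term (intersection of postings)."""
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


for query in [("search", "engin"), ("web", "search"), ("web", "engin")]:
    print(" AND ".join(query), "->", boolean_and(INDEX, *query))
```

A term missing from the index yields an empty postings set, so any AND query containing it returns no documents.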
Vector Model:
1. Search AND Engine
2. Web AND Search
3. Web AND Engine
Term          Term Frequency (TF)    Inverse Document Frequency (IDF)    Weight (W)
2.4           1                      1.585                               1.585
Allow         1                      1.585                               1.585
Billion       1                      1.585                               1.585
Claim         1                      1.585                               1.585
Collect       1                      1.585                               1.585
Comprehens    1                      1.585                               1.585
Default       1                      1.585                               1.585
Document      1                      1.585                               1.585
Dogpil        1                      1.585                               1.585
Engin         2                      0                                   0
Engin         1                      0                                   0
Engin         3                      0                                   0
File          1                      1.585                               1.585
Four          1                      1.585                               1.585
Googl         1                      1.585                               1.585
Harvest       1                      1.585                               1.585
Index         1                      0.585                               0.585
Metasearch    1                      1.585                               1.585
Page          1                      0.585                               0.585
Page          1                      0.585                               0.585
Queri         1                      1.585                               1.585
Quickli       1                      1.585                               1.585
Result        1                      1.585                               1.585
Search        2                      0                                   0
Search        2                      0                                   0
Search        2                      0                                   0
System        2                      1.585                               3.17
Time          1                      1.585                               1.585
Us            1                      1.585                               1.585
Web           2                      1.585                               3.17
Wide          1                      1.585                               1.585
World         2                      1.585                               3.17
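The IDF values in this table match a base-2 logarithm over the N = 3 documents: 1.585 = log2(3/1), 0.585 = log2(3/2) and 0 = log2(3/3), and each weight is TF × IDF. A minimal sketch of that calculation, assuming this is the formula used (the function names are hypothetical):

```python
import math

N_DOCS = 3  # three documents in the collection


def idf(n_docs, doc_freq):
    """Inverse document frequency with a base-2 logarithm."""
    return math.log2(n_docs / doc_freq)


def tf_idf(tf, n_docs, doc_freq):
    """Weight W = TF x IDF."""
    return tf * idf(n_docs, doc_freq)


for df in (1, 2, 3):
    print(df, round(idf(N_DOCS, df), 3))
print(round(tf_idf(2, N_DOCS, 1), 2))
```

A term that occurs in every document (df = 3) gets a weight of 0, which is why Search and Engin contribute nothing to any score.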
      Search    Engine    Web
Q1    1         1         0
Q2    1         0         1
Q3    0         1         1
D1    2         2         2
D2    2         1         0
D3    2         3         0
$$|D_1| = \sqrt{1.585^2 + 1.585^2 + 1.585^2 + 1.585^2 + 0.585^2 + 1.585^2 + 1.585^2 + 1.585^2 + 0.585^2 + 3.17^2 + 3.17^2} = 6.194$$

The lengths of D2 and D3, computed in the same way, are $|D_2| = 6.535$ and $|D_3| = 3.926$.
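1. Search AND Engine
$$\cos(Q_1, D_1) = \frac{(1 \times 0) + (1 \times 0)}{\sqrt{1^2 + 1^2} \times 6.194} = 0$$
$$\cos(Q_1, D_2) = \frac{(1 \times 0) + (1 \times 0)}{\sqrt{1^2 + 1^2} \times 6.535} = 0$$
$$\cos(Q_1, D_3) = \frac{(1 \times 0) + (1 \times 0)}{\sqrt{1^2 + 1^2} \times 3.926} = 0$$
2. Web AND Search
$$\cos(Q_2, D_1) = \frac{(1 \times 3.17) + (1 \times 0)}{\sqrt{1^2 + 1^2} \times 6.194} \approx 0.362$$
$$\cos(Q_2, D_2) = \frac{(1 \times 0) + (1 \times 0)}{\sqrt{1^2 + 1^2} \times 6.535} = 0$$
$$\cos(Q_2, D_3) = \frac{(1 \times 0) + (1 \times 0)}{\sqrt{1^2 + 1^2} \times 3.926} = 0$$
3. Web AND Engine
$$\cos(Q_3, D_1) = \frac{(1 \times 3.17) + (1 \times 0)}{\sqrt{1^2 + 1^2} \times 6.194} \approx 0.362$$
$$\cos(Q_3, D_2) = \frac{(1 \times 0) + (1 \times 0)}{\sqrt{1^2 + 1^2} \times 6.535} = 0$$
$$\cos(Q_3, D_3) = \frac{(1 \times 0) + (1 \times 0)}{\sqrt{1^2 + 1^2} \times 3.926} = 0$$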
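The length and cosine calculations for D1 can be checked with a short script. This is a sketch under an assumption: D1's weights are taken from the TF-IDF table, choosing the eleven non-zero values (seven of 1.585, two of 0.585, two of 3.17) that appear in the length calculation for D1 above, plus zero weights for Search and Engin. The names `norm`, `cosine`, `D1` and `QUERIES` are hypothetical.

```python
import math

# Assumed D1 weights (from the TF-IDF table); Search and Engin have IDF 0.
D1 = {
    "2.4": 1.585, "googl": 1.585, "wide": 1.585, "us": 1.585,
    "claim": 1.585, "comprehens": 1.585, "billion": 1.585,
    "index": 0.585, "page": 0.585, "web": 3.17, "world": 3.17,
    "search": 0.0, "engin": 0.0,
}

QUERIES = {
    "Q1": {"search": 1, "engin": 1},  # Search AND Engine
    "Q2": {"web": 1, "search": 1},    # Web AND Search
    "Q3": {"web": 1, "engin": 1},     # Web AND Engine
}


def norm(vector):
    """Euclidean length of a sparse term-weight vector."""
    return math.sqrt(sum(w * w for w in vector.values()))


def cosine(query, doc):
    """Cosine similarity between two sparse vectors; 0 if either is empty."""
    dot = sum(w * doc.get(term, 0.0) for term, w in query.items())
    denom = norm(query) * norm(doc)
    return dot / denom if denom else 0.0


print(round(norm(D1), 3))
for name, query in QUERIES.items():
    print(name, round(cosine(query, D1), 3))
```

Because Search and Engine have an IDF of 0, every Q1 score is 0 regardless of the document.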
This document is about the full-text search capability of SQL Server 2018. It is simple to use,
fast and extensible, and it can search the content of many types of documents, for instance
Word files, Portable Document Format (PDF) files and HTML documents. To make documents
searchable, store them in a database table in which two fields need to be created. The first
field stores the document content that is full-text indexed; the second stores the file
extension, for example ".doc" or ".xls", so that the indexer knows how to read each document.
It is a good idea to also store the full file name and the file size, because real applications
will most likely need them, even though they are not required for full-text indexing.
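The table layout described above can be sketched with Python's built-in `sqlite3` module. SQLite stands in for SQL Server here and this sketch performs no full-text indexing; it only shows the two required fields (content and extension) plus the optional name and size. The table name, column names and helper functions are hypothetical.

```python
import os
import sqlite3


def create_document_table(conn):
    """Create a table with the content and extension fields, plus the
    optional file name and size."""
    conn.execute(
        """
        CREATE TABLE documents (
            id        INTEGER PRIMARY KEY,
            content   BLOB NOT NULL,  -- raw bytes of the document
            extension TEXT NOT NULL,  -- e.g. '.doc' or '.xls'
            file_name TEXT,           -- optional: full file name
            file_size INTEGER         -- optional: size in bytes
        )
        """
    )


def add_document(conn, file_name, data):
    """Store one document, deriving the extension from its file name."""
    extension = os.path.splitext(file_name)[1].lower()
    conn.execute(
        "INSERT INTO documents (content, extension, file_name, file_size) "
        "VALUES (?, ?, ?, ?)",
        (data, extension, file_name, len(data)),
    )


conn = sqlite3.connect(":memory:")
create_document_table(conn)
add_document(conn, "report.doc", b"full-text search example")
add_document(conn, "budget.xls", b"quarterly figures")
rows = conn.execute(
    "SELECT file_name, file_size FROM documents WHERE extension = ?", (".doc",)
).fetchall()
print(rows)
```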
3. Question 2
Query 1: MongoDB download and installation
Search engine 1: Google

Result    Precision    Recall
D1        1/1          1/15
D2        2/2          2/15
D3        3/3          3/15
D4        4/4          4/15
D5        5/5          5/15
D6        5/6          5/15
D7        5/7          5/15
D8        5/8          5/15
D9        5/9          5/15
D10       5/10         5/15
D11       6/11         6/15
D12       6/12         6/15
D13       6/13         6/15
D14       7/14         7/15
D15       7/15         7/15
D16       8/16         8/15
D17       9/17         9/15
D18       9/18         9/15
D19       9/19         9/15
D20       9/20         9/15
Query 2: MongoDB install document
Result    Precision    Recall
D1        1/1          1/15
D2        2/2          2/15
D3        3/3          3/15
D4        4/4          4/15
D5        5/5          5/15
D6        6/6          6/15
D7        6/7          6/15
D8        6/8          6/15
D9        6/9          6/15
D10       6/10         6/15
D11       7/11         7/15
D12       8/12         8/15
D13       8/13         8/15
D14       8/14         8/15
D15       8/15         8/15
D16       8/16         8/15
D17       8/17         8/15
D18       8/18         8/15
D19       8/19         8/15
D20       8/20         8/15
Google    Query 1                             Query 2
Rank      Rel.   Precision     Recall         Rel.   Precision     Recall
1         R      1             0.06667        R      1             0.06667
2         R      1             0.13333        R      1             0.13333
3         R      1             0.2            R      1             0.2
4         R      1             0.26667        R      1             0.26667
5         R      1             0.33333        R      1             0.33333
6                0.83333333    0.33333        R      1             0.4
7                0.71428571    0.33333               0.85714286    0.4
8                0.625         0.33333               0.75          0.4
9                0.55555556    0.33333               0.66666667    0.4
10               0.5           0.33333               0.6           0.4
11        R      0.54545455    0.4            R      0.63636364    0.46667
12               0.5           0.4            R      0.66666667    0.53333
13               0.46153846    0.4                   0.61538462    0.53333
14        R      0.5           0.46667               0.57142857    0.53333
15               0.46666667    0.46667               0.53333333    0.53333
16        R      0.5           0.53333               0.5           0.53333
17        R      0.52941176    0.6                   0.47058824    0.53333
18               0.5           0.6                   0.44444444    0.53333
19               0.47368421    0.6                   0.42105263    0.53333
20               0.45          0.6                   0.4           0.53333
R = relevant result.
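The precision and recall columns follow from which ranks are relevant: precision at rank k is (relevant results so far) / k, and recall is (relevant results so far) / 15, the total number of relevant documents used as the recall denominator. A minimal sketch, assuming the relevant ranks read from the hit counts in the tables above; the names are hypothetical.

```python
# Relevant ranks read from the hit counts in the Google tables (assumption).
RELEVANT_RANKS = {
    "Query 1": {1, 2, 3, 4, 5, 11, 14, 16, 17},
    "Query 2": {1, 2, 3, 4, 5, 6, 11, 12},
}
TOTAL_RELEVANT = 15  # recall denominator used in the tables
DEPTH = 20           # number of results judged per query


def precision_recall(relevant_ranks, total_relevant, depth):
    """Return (precision, recall) after each of the top `depth` results."""
    points = []
    hits = 0
    for rank in range(1, depth + 1):
        if rank in relevant_ranks:
            hits += 1
        points.append((hits / rank, hits / total_relevant))
    return points


for query, ranks in RELEVANT_RANKS.items():
    precision, recall = precision_recall(ranks, TOTAL_RELEVANT, DEPTH)[-1]
    print(query, round(precision, 3), round(recall, 3))
```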
Google: Precision vs. recall curve
Interpolated precision, Query 1

Recall    Precision
0.0       1
0.1       1
0.2       1
0.3       1
0.4       0.545
0.5       0.529
0.6       0.529

[Figure: Precision vs. recall curve]
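Interpolated precision at a recall level is commonly defined as the highest precision reached at any rank whose recall is greater than or equal to that level. The sketch below assumes this rule is the one meant by "Interpolation" and reuses the Query 1 relevant ranks read from the Google tables; the function names are hypothetical.

```python
def precision_recall(relevant_ranks, total_relevant, depth):
    """Return (precision, recall) after each of the top `depth` results."""
    points, hits = [], 0
    for rank in range(1, depth + 1):
        if rank in relevant_ranks:
            hits += 1
        points.append((hits / rank, hits / total_relevant))
    return points


def interpolated_precision(points, levels):
    """Highest precision at any point whose recall is >= each level (0 if none)."""
    return {
        level: max((p for p, r in points if r >= level - 1e-9), default=0.0)
        for level in levels
    }


QUERY1_RELEVANT = {1, 2, 3, 4, 5, 11, 14, 16, 17}  # assumed from the tables
points = precision_recall(QUERY1_RELEVANT, 15, 20)
levels = [i / 10 for i in range(7)]  # recall levels 0.0 to 0.6
for level, precision in interpolated_precision(points, levels).items():
    print(f"{level:.1f} {precision:.3f}")
```

Taking the maximum over all later points makes the interpolated curve non-increasing, which is why it can differ from the raw precision at the first rank reaching a given recall.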
Comparing the average precision of both queries:
Average precision of Q1: 0.79
Average precision of Q2: 0.91
For Google, the overall (mean) average precision over both queries is 0.85.
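Average precision can be computed as the mean of the precision values at the ranks of the relevant results retrieved, and the overall figure as the mean over queries. Dividing by the number of relevant results retrieved (9 for Q1) rather than by all 15 relevant documents reproduces the Q1 figure of 0.79, so this sketch assumes that convention; the relevant ranks are read from the Google tables and the names are hypothetical.

```python
def average_precision(relevant_ranks):
    """Mean of the precision values at the ranks of the relevant results retrieved."""
    ranks = sorted(relevant_ranks)
    if not ranks:
        return 0.0
    # The i-th relevant result (1-based) at rank r contributes precision i / r.
    return sum(hits / rank for hits, rank in enumerate(ranks, start=1)) / len(ranks)


RELEVANT_RANKS = {
    "Q1": {1, 2, 3, 4, 5, 11, 14, 16, 17},
    "Q2": {1, 2, 3, 4, 5, 6, 11, 12},
}
ap = {query: average_precision(ranks) for query, ranks in RELEVANT_RANKS.items()}
mean_ap = sum(ap.values()) / len(ap)
for query, value in ap.items():
    print(query, round(value, 2))
print("MAP", round(mean_ap, 2))
```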
Part 2:
Search Engine: Yahoo
4. Conclusion
This project has successfully analyzed information retrieval over these resources. Search
algorithms for structured and unstructured data were designed on the basis of the vector model
and the Boolean model, and their results were assessed. The findings on information retrieval
from unstructured data could be applied to indexing techniques, such as the inverted index used
by the search algorithm.