7COM1070 Cyber Security Masters Project: Duplicate Data Identification
Added on 2023/06/18
This cyber security project explores the identification of duplicate data in cloud storage through asymmetric encryption access control. The project addresses the issue of redundant data in the cloud, which wastes storage and complicates data sharing. It proposes a data storage management system that identifies duplicate data and implements access control using asymmetric encryption techniques. The study evaluates its performance through safety assessment, comparison, and implementation, demonstrating its safety efficiency for practical application. The project aims to save cloud storage space, provide accurate data analysis, and implement an efficient output algorithm. The solution is submitted in partial fulfillment of the requirement for the degree of Master of Computer Science in Cyber Security at the University of Hertfordshire.

7COM1070 and Cyber Security Masters Project.
Date: 03-09-2021.
IDENTIFICATION OF DUPLICATE DATA WITH
ASYMMETRIC ENCRYPTION ACCESS CONTROL
TO THE CLOUD DATA.
This report is submitted in partial fulfillment of the requirement for the degree of Master of
Computer Science in Cyber Security at the University of Hertfordshire (UH).
It is my own work except where indicated in the report.
I did not use human participants in my MSc Project.
I hereby give permission for the report to be made available on the university website
provided the source is acknowledged.

ACKNOWLEDGEMENT
This paper is part of my dissertation. I must thank the specialist for his support in finishing
this proposal report and for helping me to produce the highest quality results. I have managed
the concept well in a beneficial manner. I am grateful to the facilitators who allowed me to
express this idea to the whole community. I would like to thank Professor Silvia Moros for
giving me sound direction in shaping my proposal report.
ABSTRACT
Cloud storage is an effective way to meet the growing storage requirements of companies and
individuals. Users can encrypt data to ensure security and privacy before uploading it to the
cloud. In such cases, the same or other users can encrypt the same data, which increases the
amount of duplicate data in the cloud. To provide security and privacy for cloud users, data
is routinely encrypted. However, encrypted data can waste a great deal of cloud storage and
complicate data sharing among authorized users. Encrypted storage and deduplication
management therefore remain open problems. Traditional deduplication solutions usually
target specific applications in which deduplication is handled either by the data owners or by
the cloud servers. They cannot flexibly address the diverse demands of data owners, which
depend on data sensitivity. This paper presents a data storage management system that
identifies duplicate data and enforces access control using asymmetric encryption techniques
across cloud service providers (CSPs). The study evaluates the system's performance through
security analysis, comparison and implementation. The results show its security and
efficiency for practical application.
TABLE OF CONTENTS
ACKNOWLEDGEMENT
ABSTRACT
1. INTRODUCTION
1.1 PROJECT OVERVIEW:
1.2 INTRODUCTION:
1.3 OBJECTIVES:
2.0 LITERATURE REVIEW
2.1 Background:
2.2 RESEARCH QUESTIONS:
3. METHODOLOGY
3.1 EXPERIMENT SCREENSHOTS:
4. DESIGNING
4.1 SYSTEM DESIGN REQUIREMENTS:
4.2 Fundamental Algorithms:
SOFTWARE AND HARDWARE REQUIREMENT
4.5 System Design:
4.6 TECHNOLOGIES USED:
5. ETHICAL, PROFESSIONAL AND LEGAL ISSUES
6. RESULTS AND DISCUSSION
6.1 Access Control Techniques:
6.2 System Architecture:
6.3 System Configuration and Required Keys:
6.4 SYSTEM TESTING:
6.5 RESULTS LIST:
7. CONCLUSION AND FUTURE ENHANCEMENT
References:
Appendix:

LIST OF FIGURES
Figure 1: System Architecture
Figure 2: Use Case Diagram
Figure 3: Sequence Diagram of Data Owner
Figure 4: Sequence Diagram of AP, KGC, CSP
Figure 5: Collaboration Diagram of Data Owner
Figure 6: Collaboration of AP, KGC, CSP
Figure 7: Data Flow Diagram
LIST OF TABLES
Table 1: Different Approaches between USA and EU
Table 2: Positive Test Case
Table 3: Negative Test Case
1. INTRODUCTION
1.1 PROJECT OVERVIEW:
This project is about securing cloud storage by preventing the storage of duplicate or
repeated data uploaded by a data owner or data user. In this project, we will develop a
prototype that identifies duplicate data and prevents it from being uploaded to the cloud
again by checking across various Cloud Service Providers. For the data in the cloud, we use
asymmetric encryption techniques.
1.2 INTRODUCTION:
Cloud computing uses new technologies to provide services over the internet, reconfiguring
resources and delivering them online to clients. It plays an important role in supporting the
storage, processing, and management of data in the Internet of Things (IoT). Some Cloud
Service Providers (CSPs) offer large amounts of storage to store and manage IoT data such
as videos, personal information, etc. (Yan et al., 2016). These CSPs provide the required
quality of service: scalability, flexibility, fault tolerance and pay-as-you-go pricing.
Therefore, cloud computing has become a promising service paradigm for IoT applications
and IoT system deployment.
Data in the cloud can be accessed continuously through cloud providers. There are
some security issues with this kind of distributed computing. Other clients may be able to
modify or delete the data stored in the cloud. In certain cases and for certain reasons, cloud
clients need to transfer data to other parties. In these situations, customers should have the
option to use cloud management each time they receive and approve the information about
their data protection policy.
The main issue addressed in this dissertation is the existence of duplicated
information in the cloud. Storing the same information multiple times is known as
information duplication (Nahlah Aslam and Swaraj, 2019). Duplicated information is
redundant, and the extra space it occupies is wasted. The cloud offers a huge amount of
storage, but duplication means that storage is not used efficiently, and information
processing becomes complicated. Therefore, deduplication is important for managing
information in the cloud. Deduplication aims to reduce the cost of
storage capacity and to improve cloud efficiency. Managing encrypted, deduplicated data is an
important issue.
The process of identifying and deleting duplicate data is called data deduplication.
There are several solutions for deduplicating raw (unencrypted) data. However, for security
reasons, users store their data in the cloud in encrypted form. In such cases, the encrypted
data itself must be deduplicated, which is a difficult problem. Flexible cloud data
deduplication combined with data access control is still an open issue (Yan et al., 2019).
Duplicated data may be stored in the cloud in encrypted form, at the same or different CSPs,
by the same or different clients. From a practical standpoint, data deduplication is strongly
expected to work together with data access control. That is, the same data (whether
encrypted or not) is stored only once in the cloud, but access to it differs according to the
data owner's access policy (i.e., one unique copy of the data is kept for the various
clients).
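The difficulty of deduplicating encrypted data can be illustrated with a minimal Python sketch. It is not the scheme used in this project: the cipher below is a toy XOR keystream built from SHA-256 for illustration only (it is not secure), and the content-derived key approach, often called convergent encryption, is shown purely for contrast. The function names are hypothetical.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a block counter (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the toy keystream; applying it twice decrypts."""
    stream = keystream(key, nonce, len(plaintext))
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def randomized_upload(plaintext: bytes) -> bytes:
    """Each upload uses a fresh random key and nonce, so equal files look different."""
    key, nonce = os.urandom(32), os.urandom(16)
    return nonce + xor_encrypt(key, nonce, plaintext)


def content_derived_upload(plaintext: bytes) -> bytes:
    """Key and nonce are derived from the content, so equal files encrypt identically."""
    key = hashlib.sha256(b"key" + plaintext).digest()
    nonce = hashlib.sha256(b"nonce" + plaintext).digest()[:16]
    return nonce + xor_encrypt(key, nonce, plaintext)


data = b"quarterly-report contents"
# Two uploads of the same file under independent random keys.
same_random = randomized_upload(data) == randomized_upload(data)
# Two uploads of the same file under a content-derived key.
same_derived = content_derived_upload(data) == content_derived_upload(data)
print(same_random, same_derived)
```

With independent random keys, the cloud sees two unrelated ciphertexts and cannot tell that they hold the same file; with a content-derived key the ciphertexts match and can be compared directly.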
Cloud storage is huge, but storing duplicated data can waste network resources,
consume energy, increase operating costs and complicate data management. Economical
storage benefits CSPs by reducing their operating costs, and in turn benefits cloud customers
through reduced service rates. Cloud data deduplication is especially important for the
storage and management of very large volumes of data. Nevertheless, to our knowledge,
flexible deduplication of cloud data across multiple CSPs is still lacking. Existing solutions
cannot be widely deployed to support both cloud deduplication and access control in a
flexible and uniform way.
Existing system:
Data storage is one of the most widely used cloud services. Cloud users benefit greatly from
cloud storage because they can store huge amounts of data anytime, anywhere without
having to upgrade their devices. However, the cloud data storage offered by cloud service
providers (CSPs) still has various problems. First, because data differs in sensitivity,
different data stored in the cloud may require different protection solutions. Data stored in
the cloud includes confidentially shared information, information shared within a group,
personal information, and so on. Naturally, important cloud data needs to be protected.
Second, a lot of redundant data is stored on cloud servers without the prior knowledge of
users and data providers.
Study Motivation and Contributions:
Wasted cloud storage has become widespread. To address the storage problem in this cloud
environment, we provide a deduplication method for encrypted data. The approach treats
shared data whose ownership changes dynamically as a significant issue for efficient and
secure cloud storage services, and access is granted only with authorization. Within each
group of owners, a group key management method is adopted.
Compared with previous deduplication schemes for encrypted data, the proposed method has
the following security and efficiency advantages. It uses asymmetric encryption to identify
duplicate data and to control access. We propose a heterogeneous data management
framework that provides deduplication and access control according to the demands of the
data owner and that can be adapted to different applications. Our framework can flexibly
enable data sharing among eligible clients, controlled by the data owner, by another trusted
party, or by both.
1.3 OBJECTIVES:
The main goal of the project is to identify duplicate data in a flexible way and to combine
this with control over access to the data. Thus, although data is stored only once in the cloud
in encrypted form, different clients can access it according to the data owner's policy. We
provide a framework for deduplicating encrypted data stored in the cloud using
attribute-based encryption (ABE) while supporting secure control of data access. Analysis
and implementation show that our framework is secure and efficient.
1. To identify duplicate data in the cloud.
2. To save cloud storage by preventing multiple files containing the same data from
being stored.
3. To provide accurate analysis of the cloud data.
4. To implement an efficient output algorithm.
5. To implement asymmetric-encryption-based security for the encryption and
decryption of data at the cloud end.
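The storage goal stated in this section, keeping one stored copy while letting access follow the data owner's policy, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `DedupStore`, `upload` and `download` are hypothetical names, the fingerprint is a plain SHA-256 hash of the uploaded bytes, and encryption and ABE are left out so that only the duplicate check and the access check are shown.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    blob: bytes                                  # the single stored copy
    readers: set = field(default_factory=set)    # users authorized by any owner


class DedupStore:
    """Hypothetical CSP-side index keyed by a content fingerprint."""

    def __init__(self):
        self.objects = {}

    def upload(self, owner, data, share_with=()):
        """Store the data once; return its tag and whether it was a duplicate."""
        tag = hashlib.sha256(data).hexdigest()
        duplicate = tag in self.objects
        if not duplicate:
            self.objects[tag] = StoredObject(blob=data)
        # Every uploader becomes a reader, plus whoever that owner's policy names.
        self.objects[tag].readers.update({owner, *share_with})
        return tag, duplicate

    def download(self, user, tag):
        """Return the stored copy only to users some owner has authorized."""
        obj = self.objects[tag]
        if user not in obj.readers:
            raise PermissionError(f"{user} is not authorized for {tag[:8]}")
        return obj.blob


store = DedupStore()
tag, dup1 = store.upload("alice", b"same file", share_with=["carol"])
_, dup2 = store.upload("bob", b"same file")
print(dup1, dup2, len(store.objects))  # False True 1
```

The second upload is detected as a duplicate and no second copy is stored, yet both uploaders and the user named in the first owner's policy can still read the data.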

2.0 LITERATURE REVIEW
2.1 Background:
Existing research recommends encrypting data before moving it to the cloud in order to
protect data privacy at the CSP, and a variety of approaches have emerged in recent
years. Access control over encrypted data requires that the data can be decrypted only by
authorized entities. Ideally, each piece of data should be encrypted only once, and keys
should be issued to an authorized entity only once. However, frequent key updates caused by
changes in trust relationships complicate key management. Access control lists are used to
ensure the security of data held by distrusted or semi-trusted parties (such as CSPs). The
data owner sorts the data into several sets with the same access rights and encrypts each set
into a ciphertext that is made available to the clients in an Access Control List (ACL) before
sending the final data to the CSP. Therefore, each such group of data can be accessed by the
clients in its ACL.
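The grouping step described above, sorting data into sets that share the same access rights, can be sketched in Python. This is only an illustration: the function name `group_by_access_rights` and the sample files and users are hypothetical, and the encryption of each set is omitted.

```python
from collections import defaultdict


def group_by_access_rights(files: dict) -> dict:
    """Group file names by the set of users allowed to read them."""
    groups = defaultdict(list)
    for name, readers in files.items():
        # frozenset makes the reader set hashable and order-independent.
        groups[frozenset(readers)].append(name)
    return dict(groups)


files = {
    "a.txt": {"alice", "bob"},
    "b.txt": {"bob", "alice"},
    "c.txt": {"carol"},
}
groups = group_by_access_rights(files)
for acl, names in groups.items():
    print(sorted(acl), sorted(names))
```

Each resulting set could then be encrypted once for the clients in its ACL, rather than encrypting every file separately for every client.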
Traditional deduplication schemes are managed either by servers or by data
owners. Only occasionally has a hybrid scheme that gains the benefits of both approaches
been introduced. One earlier work proposed a deduplication scheme managed only by the
data owner. Managing access for multiple data holders relies on metadata, held by the CSP,
that describes the eligible clients. Applying public-key encryption in this way is
computationally expensive, and the cost grows linearly with the number of clients. Hur et al.
proposed another server-side deduplication scheme for encrypted data. Even if ownership
changes dynamically, the cloud server can still manage access to the stored data and ensure
secure distribution of a group key among the owners. This framework prevents data leakage
to honest-but-curious cloud storage servers but leaves additional clients unaddressed.
Deduplication technology reduces the duplicate data that frequently occurs in
cloud storage management by keeping only one copy, which limits the need for storage
space and network bandwidth. Deduplication is useful when multiple clients upload the same
data to cloud storage, but it introduces security and ownership issues. An ownership
verification scheme allows a user who holds the same data to prove to the cloud storage
server, in a robust way, that he owns the data. Many clients tend to encrypt their data to
protect it before outsourcing it to cloud storage. Encryption randomizes the data and thus
hinders deduplication. Later deduplication schemes therefore allow each owner to have a
similar