SIT719 Report: Analytics Issues at Dumnonia Corporation

Added on 2022/12/26

AI Summary

This report analyzes the data analytics challenges faced by Dumnonia Corporation, an Australian insurance company, particularly concerning the security and privacy of its extensive customer data. The report explores the drivers for implementing k-anonymity, a method to anonymize data and protect sensitive information, and assesses the technology solutions through interviews with key stakeholders. The analysis covers the company's existing IT infrastructure, cloud systems, and the concerns of the CEO, CIO, and CSO regarding data breaches, cyber threats, and the impact of encryption on system performance. The report also discusses the implementation guide for k-anonymity, data provenance, and potential issues like data encryption and its impact on system speed. It highlights the need for robust security policies, and the importance of protecting customer data while ensuring the effective use of big data analytics.

A report Secondary and Primary Issues in Analytics - Dumnonia Corporation 1
A report Secondary and Primary Issues in Analytics - Dumnonia Corporation
Student
Course
Tutor
Institutional Affiliations
State
Date

Executive summary
Dumnonia is one of the competitive insurance corporations in Australia and insures a very
large population. The organization must keep information on all of these customers as Big Data
hosted on its own machines. It offers medical insurance among various other types of insurance.
As such, customer information including identification numbers, payment details and health
information is kept in the system. The corporation’s CEO is greatly concerned about the security
of this sensitive information and of the organization’s assets, which is critical because the
Big Data contains medical data and personal information about the organization’s customers.
Securing this data will instill confidence among the organization’s clients and employees.
This document discusses certain aspects of data analysis as well as security and privacy
matters. The organization has outdated security systems such as an IDS and a firewall. It also
uses organizational technologies including malware protection, encryption and passwords to
reduce security risks. The corporation is, however, unsure of the critical aspects of protecting
its big data system. It depends on SDN as the security solution for its big data. Big Data grows
along with the growth of the public cloud, while traditional security approaches are mainly used
by the private sector. Dumnonia is a big organization and should not depend on traditional
security solutions, since doing so would put its sensitive data at risk.

The organizational drivers for Dumnonia Corporation relating to the implementation of k-anonymity
Dumnonia is a large corporation that provides insurance services across a wide-ranging
domain. Its customer population is increasing. Because Dumnonia is a big corporation that
operates in two countries, it holds a large volume of data; its big data is still expanding, and
this expansion brings privacy challenges. This acts as a driver towards the adoption of the
k-anonymity approach by Dumnonia. K-anonymity is a method that anonymizes the data fields of an
organization’s big data system so that each record is indistinguishable from at least k-1 other
records on its identifying attributes, and private data cannot be pinpointed to the records of
an individual.
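
The definition above can be checked mechanically. The following is a minimal sketch, assuming records are plain dictionaries and that postcode and age band are the quasi-identifiers; the field names and the sample records are hypothetical and are not Dumnonia data.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical, already-generalized insurance records.
records = [
    {"postcode": "30**", "age": "30-39", "condition": "asthma"},
    {"postcode": "30**", "age": "30-39", "condition": "diabetes"},
    {"postcode": "31**", "age": "40-49", "condition": "asthma"},
    {"postcode": "31**", "age": "40-49", "condition": "flu"},
]
QI = ["postcode", "age"]  # assumed quasi-identifier set

print(is_k_anonymous(records, QI, 2))  # True: each group holds two records
print(is_k_anonymous(records, QI, 3))  # False: no group holds three records
```

A check like this says nothing about how the generalization was produced; it only confirms that a released table meets the chosen k.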
The organization’s data management system is a traditional, organization-based system. It
uses an organizational data warehouse, with links for its clients through mobile application
access or the website. Additionally, the organization has a policy that should fit its customers
so as to keep them happy. This cannot be achieved when customers’ privacy is at risk. The
corporation also needs to gain a competitive advantage by exploiting its data (Ye, Cheng, Yuan,
Xu, Gao, and Cheng, 2016, pp. 268-272; Wu, Zhu, Wu, and Ding, 2014, pp.97-107). For
example, if it knows that customers in a given postcode experience privacy issues, the
corporation should launch new privacy preservation strategies as part of its new big data
approach. With this approach, the organization’s customers will be happier and healthier, with
fewer security concerns. This is another necessity that drives Dumnonia towards the
implementation of the k-anonymity approach.

The Technology Solution Assessment
This section assesses the corporation’s drivers concerning the implementation of k-anonymity.
The discussion is based on the three interviews that follow.
Interview 1
Dumnonia has invested considerably in its current IT system. The organization has a cloud
system, customer-facing web portals and mobile applications developed for its customers. In the
face of the organization’s ever-swelling data, it is worth noting that the corporation has taken
a critical step towards its goal. The cloud has become a well-suited vehicle for housing big
data workloads, and many organizations have been successful with it
(Hashem, Yaqoob, Anuar, Mokhtar, Gani, and Khan, 2015, pp.98-115). Nevertheless, using big
data and cloud systems is associated with various challenges.
Cloud systems are subject to security and privacy breaches, and a privacy breach has side
effects that may be highly costly for Dumnonia. As such, implementing the k-anonymity approach
will play a critical role in system security. Dumnonia also has dedicated third-party support
from its IT partners in India. The organization, however, needs to be aware that the privacy of
its customers should be preserved before it publishes big data to this third party. Two privacy
objectives are achieved when the big data is anonymized: preventing identity disclosure and
preventing sensitive attribute disclosure (Daries et al. 2014, pp.94). Therefore, coupling
k-anonymity with the cloud and big data system is the route by which the organization can
achieve privacy for its customers. Dumnonia is not yet up to date as far as the implementation
of the k-anonymity approach is concerned. An analysis was done of corporations that have found
it difficult to implement a big data system. Dumnonia also shares data with other corporations
for insurance purposes, and this is also essential for the overall success of the project.
Because of this concern, the organization should protect the data of its customers living in
New Zealand and Australia, which is again a major security breach issue. As such, the
organization needs to implement the k-anonymity approach, which is intended to provide security
and privacy for the organization (Chandra, Ray, and Goswami, 2017, pp. 89-94; Hodges, and
Creese, 2013, pp. 613-621). The Dumnonia CEO is also concerned about the organization’s
operational system as the corporation adopts the k-anonymity approach in a new system that
would draw data from all sectors of the organization. The CEO is worried that the corporation is
still not aware of the consequences of implementing the new approach, speaking from the point
of view of both the initial implementation and the current system’s operation. The CEO is also
well aware of other corporations that have not been successful with the adoption of the Big
Data approach. Because Dumnonia needs to expand its business and services to many countries,
the organization needs to adopt the k-anonymity approach with full rules and regulations in
order to protect the system from fraud (Jutla, and Bodorik, 2015, pp. 1919-1928).
K-anonymity involves no randomization that cyber criminals could exploit to tamper with the
organization’s data. The analysis here also considers encryption, which slows down operations
when a larger volume of data has to be calculated and processed. Records are compared and
matched during processing, after which the large volume of data is computed, and this slows down
data sources that otherwise appear autonomous. The k-anonymity approach spans data storage,
networking and effective data collection (Menandas, and Joshi, 2014, pp.68-80). Working with
autonomous services makes it easier to deal with decentralized and distributed control systems
when finding the evolving relationships among data sets.
Interview 2
The implementation of k-anonymity is reported here with some updates to ensure that a data
breach is prevented unless proper authorization and authentication are provided (Jagadish,
2016, pp.77-84; Atat, Liu, Wu, Li, Ye, and Yang, 2018, pp.73603-73636). The point of releasing
versions of privately held data is that the people who are the subjects of the data cannot be
recognized, and this is the key driver of the organization’s Big Data strategy. The issue of
holding customers’ personal medical information is also discussed in this interview. As per
the interview, Guinevere is always worried about security and privacy breaches of the data.
She holds that implementing the k-anonymity approach will give full support to information
security and hence protect customers’ personal information and privacy. She further discussed
some algorithms related to the k-anonymity approach that can help ensure security. These include
p-sensitive k-anonymity algorithms, which she describes as a simple extension of the
k-anonymity method. She is, however, not clear on the advantages and disadvantages of the
k-anonymity extensions.
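
To make the extension concrete, the sketch below checks the usual p-sensitive k-anonymity condition: every quasi-identifier group must hold at least k records and at least p distinct values of the sensitive attribute. The field names and records are hypothetical, and this only checks a released table; it does not anonymize one.

```python
from collections import defaultdict

def is_p_sensitive_k_anonymous(records, quasi_identifiers, sensitive, k, p):
    """Each QI group needs >= k records and >= p distinct sensitive values."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].append(r[sensitive])
    return all(len(vals) >= k and len(set(vals)) >= p for vals in groups.values())

# Hypothetical released table: the first group is 2-anonymous, but both of its
# records share one diagnosis, so the diagnosis is still revealed.
table = [
    {"postcode": "30**", "age": "30-39", "condition": "asthma"},
    {"postcode": "30**", "age": "30-39", "condition": "asthma"},
    {"postcode": "31**", "age": "40-49", "condition": "flu"},
    {"postcode": "31**", "age": "40-49", "condition": "diabetes"},
]
qi = ["postcode", "age"]

print(is_p_sensitive_k_anonymous(table, qi, "condition", k=2, p=1))  # True
print(is_p_sensitive_k_anonymous(table, qi, "condition", k=2, p=2))  # False
```

The example shows one advantage of the extension (it catches groups whose members all share a sensitive value) and hints at its cost: a larger p usually forces more generalization.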
Randomization can be configured once it is understood how the sensitive data is loaded, and
it can be implemented easily with the help of other records. The security standard here depends
on multi-party computation, in which functions are computed jointly over the parties’ data
through distributed techniques (Ramani, 2019, pp. 2014-2038;
Naderi, and Alizadeh, 2018, pp.775-784). Because inference from the data is the major issue,
privacy-preserving data mining is essential. It allows calculations that depend on aggregate
statistics over the data sets without interfering with people’s privacy.
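
One simple way to picture aggregate statistics computed by several parties is additive secret sharing, sketched below. This is an illustrative technique chosen for the example, not a method named in the interviews; the modulus, the function names and the claim totals are assumptions, and all parties run in one process here purely for demonstration.

```python
import secrets

MODULUS = 2**61 - 1  # assumed public modulus, larger than any possible total

def make_shares(value, n_parties):
    """Split a non-negative value into n random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_values):
    """Sum the parties' values while each party only sees random-looking shares."""
    n = len(private_values)
    # Row i holds the shares that party i hands out, one per party.
    distributed = [make_shares(v, n) for v in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [sum(row[j] for row in distributed) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS

claims_per_party = [120, 75, 310]  # hypothetical private claim counts
print(secure_sum(claims_per_party))  # 505
```

Only the total is revealed; an individual share carries no information about the value it came from as long as the other shares stay private.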
Interview 3
In this interview, the supporting documents and the policies related to the k-anonymity
privacy approach are discussed. As the Dumnonia CSO, Constantine’s primary focus is to
eliminate data breaches and instill robust security policies, because Constantine was involved
in the initial big data system for the corporation. As a result, handling and transferring big
data from one place to another is the major concern for providing data security (Kenekar, and
Dani, 2017, pp. 167-190). On this rationale, the main purpose of this part is to recommend some
additional features for the k-anonymity approach so that it can be enhanced further and offer
full authentication for the security issues.
This discussion is mainly based on various measures for effective modeling of privacy and
quality metrics. The p-sensitive k-anonymity extension enables understanding and targeting of
the data set attributes. Its main focus is preventing re-identification of users and handling
potential privacy breaches. The procedures are directed at Dumnonia’s Big Data system, with
encryption used to enhance security and privacy within the organization’s information system
(Munir, Al-Mutairi, and Mohammed, eds., 2015, pp.32; Fu, Wang, Qi, Liao, and Li, 2018,
pp.569-585). However, the CSO’s main worry is that the corporation’s operations will be slowed
down when any encryption is implemented within the Big Data system. For example, computation
and processing of massive data would slow down the organization’s system because the data would
need to be encrypted and decrypted constantly. The encryption process should take place before
data enters the big data system and when it leaves.

Moreover, there have been issues regarding k-anonymity. One is that implementing k-anonymity
may reduce the need to encrypt data, because much of the data will be anonymous (Damiani et
al. 2018, pp. 94). This is a point that Dumnonia’s senior management team has not considered
while implementing the k-anonymity approach.
Another issue raised in this section involves data provenance. Data provenance is the
historical record of the data that resides inside a machine (Litoiu, and Shtern,
Bitnobi Inc, 2018, pp.46-65). It provides metadata, so if it is not given the care it deserves,
unauthorized changes to the metadata may lead to wrong, unusable data sets and make it
difficult to find the information that is required. Additionally, untraceable sources may be a
big obstacle to tracing the roots of fake data generation and security breaches.
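
Unauthorized metadata changes of the kind described above can be made detectable by chaining keyed hashes over the provenance entries. The following is a minimal sketch using Python's standard `hmac` and `hashlib` modules; the key, the entry fields and the function names are hypothetical, and a real deployment would need proper key management.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key for illustration

def _tag(metadata, previous_tag):
    """Keyed hash over this entry's metadata and the previous entry's tag."""
    payload = json.dumps(metadata, sort_keys=True) + previous_tag
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()

def append_entry(log, metadata):
    previous_tag = log[-1]["tag"] if log else ""
    log.append({"metadata": metadata, "tag": _tag(metadata, previous_tag)})

def verify_log(log):
    """Recompute every tag in order; any edited or reordered entry breaks the chain."""
    previous_tag = ""
    for entry in log:
        expected = _tag(entry["metadata"], previous_tag)
        if not hmac.compare_digest(expected, entry["tag"]):
            return False
        previous_tag = entry["tag"]
    return True

log = []
append_entry(log, {"dataset": "claims_2019", "source": "web_portal", "action": "ingest"})
append_entry(log, {"dataset": "claims_2019", "source": "etl_job", "action": "anonymize"})
print(verify_log(log))  # True: the untouched log verifies

log[0]["metadata"]["source"] = "unknown"  # simulate an unauthorized metadata change
print(verify_log(log))  # False: the change is detected
```

This detects tampering but does not prevent it, and it cannot trace a source that never recorded an entry in the first place.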
The implementation guide for k-anonymity
This section presents the implementation guide for k-anonymity. The guide is presented with
reference to the technology solution assessment for Dumnonia discussed in the previous
sections, starting with interview 1, then interview 2 and lastly interview 3.
Interview 1
To prevent security attacks that might lead to a data breach, the k-anonymity implementation
must be kept up to date, and the microdata has to be protected by modifying the microdata
k-anonymization method. As the volume of data grows, finding an effective technique for
anonymizing it becomes challenging. This section therefore proposes a better algorithm after a
series of trials and systematic comparisons, as discussed in the preceding sections of the
document, and considers its efficiency and effectiveness.
The literature has helped researchers to establish the relationship between the value of k,
the choice of quasi-identifiers, the degree of anonymization and the execution time, where k is
treated as an arbitrary value since it has been taken as p or something else in the previous
section. Similarly, some anonymization algorithms have to be employed (Jordan, 2017, pp. 01).
However, the worry is the operational aspect of the system, with system data spread across the
entire corporation. Adoption of the Big Data system promises the ability to share data sets
with government entities and other corporations. There is concern among the organization’s
staff about how k-anonymity will help in dealing with security and privacy to ensure data
preservation, and about its effect on operational costing strategies. It is essential to
understand what is meant by holding the organization’s data sets and allowing them to be shared
easily. K-anonymity is the reference approach in this scenario, and the MapReduce method could
be used in its construction to handle scenarios involving non-published data
(Bilfaqih, and Khatoon, 2016, pp.09). This algorithm should be coupled with an operation, still
to be proposed and worked on, that addresses the scalability issues.
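
As a rough illustration of how a MapReduce-style pass could help with scale, the sketch below counts quasi-identifier groups per data partition (map), merges the counts (reduce), and reports the groups that fall below k. It runs in a single process and only imitates the pattern; the partitioning, field names and records are hypothetical.

```python
from collections import Counter
from functools import reduce

def map_partition(partition, quasi_identifiers):
    """Map step: count quasi-identifier combinations within one partition."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in partition)

def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right

def violating_groups(partitions, quasi_identifiers, k):
    """Return the groups whose total size across all partitions is below k."""
    partial = (map_partition(p, quasi_identifiers) for p in partitions)
    totals = reduce(reduce_counts, partial, Counter())
    return {group: n for group, n in totals.items() if n < k}

# Hypothetical records split across two storage partitions.
partitions = [
    [{"postcode": "3000", "age_band": "30-39"},
     {"postcode": "3000", "age_band": "30-39"}],
    [{"postcode": "3000", "age_band": "30-39"},
     {"postcode": "3121", "age_band": "40-49"}],
]

print(violating_groups(partitions, ["postcode", "age_band"], k=2))
# {('3121', '40-49'): 1}
```

Because the map step touches each partition independently, the counting can be spread over many workers; only the small count tables need to be merged.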
Interview 2
Because a data breach would create a very bad impression of Dumnonia Corporation, one of the
organization’s priorities should be protecting its information system through Big Data
security. Moreover, the organization still uses traditional encryption technology, including
the use of passwords, for transferring files from one place to another. Data processing speed
will decrease when encryption is applied to protect sensitive data (Yang, Wang, Ren, and Yu,
2017, pp. 243-263). It is necessary to enact management policies for access to cryptographic
material, whereby the security of static information is managed with specific types of
calculations.
System security also involves handling methods related to user-generated social network
content and Big Data applications such as the web-based big data system (Antonatos, Braghin,
Holohan, Gkoufas, and Mac Aonghusa, 2018, pp. 1531-1542). However, the organization’s concern
is that even after the target data is anonymized, its attributes may be stored in cookies in
users’ web browsers, which browsers accept automatically, and these may later be used to
identify the organization’s users and lead to a privacy breach. Guinevere also expressed her
concerns about the areas that may be subject to data breaches across various platforms. She
acknowledged, however, that she was not fully aware of the problem or of the actual cause of
the issue.
Interview 3
The UDP-based data transfer protocol is an efficient protocol used to transfer massive data
sets in Big Data systems over a high-speed WAN (Burmeister,
Lang, Bayrle, Catalkaya, Stelzer, and Schiebel, 2016, pp. 162-176). Nevertheless, the approach
comes with some implementation problems when it is used with big data systems or the cloud and
when encryption takes place. Dumnonia should focus on fabrications in generated reports on the
corporation’s data that may have been introduced during encryption and decryption.
Additionally, it is essential to focus on unauthorized changes to the metadata, since
untraceable data sources may be a weakness when identifying the causes of security incidents
and determining which cases are related to fraud. The process of encryption should take place
before the data enters or leaves the big data system. As discussed in the previous sections of
this document, there is also the point that adopting k-anonymity can reduce the need to encrypt
data, because much of the data will be anonymous.
Comparison of the publicly available implementations
The algorithms proposed for this project are the Mondrian and Datafly algorithms.
Mondrian algorithm: this is a multidimensional algorithm for partitioning the domain space into
various regions, each containing at least k records.
Datafly algorithm: this is an algorithm that provides anonymity in medical data. It generalizes
the attribute with the most distinct values until k-anonymity is fulfilled.
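
The partitioning idea behind Mondrian can be sketched in a few lines. This is a simplified version for numeric quasi-identifiers only: it picks the attribute with the widest raw range (published descriptions normalize the range by the attribute's domain), cuts at the median, and keeps a cut only if both sides still hold at least k records. It is not the ARX or UTD implementation, and the records are hypothetical.

```python
def mondrian(records, quasi_identifiers, k):
    """Recursively split records; return a list of partitions of size >= k."""
    spans = {q: max(r[q] for r in records) - min(r[q] for r in records)
             for q in quasi_identifiers}
    # Try attributes from the widest to the narrowest value range.
    for q in sorted(quasi_identifiers, key=spans.get, reverse=True):
        ordered = sorted(records, key=lambda r: r[q])
        median = ordered[len(ordered) // 2][q]
        left = [r for r in ordered if r[q] < median]
        right = [r for r in ordered if r[q] >= median]
        if len(left) >= k and len(right) >= k:
            return (mondrian(left, quasi_identifiers, k)
                    + mondrian(right, quasi_identifiers, k))
    return [records]  # no allowable cut: this region becomes one group

def generalize(partition, quasi_identifiers):
    """Replace each quasi-identifier value with the partition's min-max range."""
    ranges = {q: (min(r[q] for r in partition), max(r[q] for r in partition))
              for q in quasi_identifiers}
    return [{**r, **{q: f"{lo}-{hi}" for q, (lo, hi) in ranges.items()}}
            for r in partition]

# Hypothetical customer records with numeric quasi-identifiers.
customers = [
    {"age": 25, "postcode": 3000, "condition": "asthma"},
    {"age": 27, "postcode": 3002, "condition": "flu"},
    {"age": 34, "postcode": 3010, "condition": "diabetes"},
    {"age": 36, "postcode": 3011, "condition": "asthma"},
    {"age": 52, "postcode": 3050, "condition": "flu"},
    {"age": 58, "postcode": 3052, "condition": "diabetes"},
]
for part in mondrian(customers, ["age", "postcode"], k=2):
    print(generalize(part, ["age", "postcode"]))
```

With k=2 the six records end up in two groups of three, since neither group can be cut again without leaving a side with fewer than two records.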
This section compares the publicly available implementations of the k-anonymity algorithms
mentioned above. The implementations used are from the ARX Data Anonymization Tool and the UTD
Anonymization Toolbox respectively. The comparison was done to find the factors that affect the
performance of the algorithms, so as to provide a guide for selecting the most appropriate
algorithm for Dumnonia Corporation. The Adult data set was accessed from the ARX Data
Anonymization Tool. The NCP (Normalized Certainty Penalty) percentage was produced for each
data set for k=10 and analyzed. According to the experiment, Mondrian shows low sensitivity
compared to Datafly. The algorithms also performed much better on the Adult data set with
regard to efficiency.
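
For readers unfamiliar with the metric, the sketch below computes an NCP percentage for numeric quasi-identifiers under one common formulation: each group is penalized by the width of its value range relative to the attribute's full range, weighted by group size and averaged over all records and attributes. The function name and sample data are hypothetical, categorical attributes (which need a generalization hierarchy) are left out, and the result is not a figure from the report's experiment.

```python
def ncp_percentage(partitions, quasi_identifiers):
    """Information loss in percent: 0 means no generalization, 100 means full."""
    all_records = [r for part in partitions for r in part]
    domain = {q: max(r[q] for r in all_records) - min(r[q] for r in all_records)
              for q in quasi_identifiers}
    penalty = 0.0
    for part in partitions:
        for q in quasi_identifiers:
            if domain[q] == 0:
                continue  # a constant attribute loses nothing when generalized
            width = max(r[q] for r in part) - min(r[q] for r in part)
            penalty += len(part) * width / domain[q]
    return 100.0 * penalty / (len(quasi_identifiers) * len(all_records))

# Hypothetical anonymized output: two groups over a single attribute.
groups = [
    [{"age": 20}, {"age": 30}],  # published as 20-30, width 10 of 40
    [{"age": 40}, {"age": 60}],  # published as 40-60, width 20 of 40
]
print(ncp_percentage(groups, ["age"]))  # 37.5
```

A lower percentage means the released ranges are tighter, so more analytical value survives anonymization.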
In the UT Dallas anonymization toolbox, the Adult and INFORMS data sets were used. The
following table presents the configuration that was used in the experiment.

Experiment   Parameter                                      The size of datasets
|QIDs|       |QIDs| = 2; k-value = 3                        Adult: 60,475; INFORMS: 60,000
k-value      |QIDs| = 6; k-value = [10, 30, 60, 120, 240]   Adult: 60,475; INFORMS: 60,000
Size         |QIDs| = 6; k-value                            Adult: 10k, 20k, 40k, 50k, 100k, 200k
Table 1: The dataset configuration used in the UT Dallas anonymization toolbox
The following explains the parameters that were varied:
a. k-value: this parameter sets the level of privacy that an anonymization algorithm must
satisfy.
b. |QIDs|: this is the number of attributes contained in the QID (quasi-identifier) set.
Executing all algorithms within one framework enables a fair performance comparison. In the
UTD Anonymization Toolbox, the intermediate anonymization data sets were hosted in the system
database while the implementation was run. The application runs the implementation with all
attributes selected. Comparing Mondrian and Datafly, Mondrian performed better; this is
attributed to the fact that Mondrian produces the largest number of equivalence classes (EQs).
Therefore, with regard to group-size-based metrics, it can be deduced that Mondrian performs
much better than Datafly.
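
Group-size-based metrics of this kind are simple to compute. The sketch below counts equivalence classes and derives the normalized average class size, total records / (number of classes x k), where a value close to 1 means groups are barely larger than required. The two outputs are made-up illustrations and do not reproduce the Mondrian or Datafly results from the experiment.

```python
from collections import Counter

def equivalence_class_stats(records, quasi_identifiers, k):
    """Return (number of equivalence classes, normalized average class size)."""
    sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    n_classes = len(sizes)
    c_avg = len(records) / (n_classes * k)
    return n_classes, c_avg

qids = ["age"]
# Hypothetical anonymized outputs over the same six records, k = 2.
fine_grained = [{"age": a} for a in ["20-29", "20-29", "30-39", "30-39", "40-49", "40-49"]]
coarse = [{"age": a} for a in ["20-39", "20-39", "20-39", "40-59", "40-59", "40-59"]]

print(equivalence_class_stats(fine_grained, qids, k=2))  # (3, 1.0)
print(equivalence_class_stats(coarse, qids, k=2))        # (2, 1.5)
```

An output with more, smaller classes scores closer to 1, which is why producing the largest number of EQs counts in an algorithm's favour on these metrics.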