An Investigation into Shilling Attacks on Recommender Systems: A Study
AI Summary
This project delves into the critical issue of shilling attacks, also known as profile injection attacks, targeting recommender systems within e-commerce and online marketing platforms. The research investigates the effectiveness of these attacks and explores various detection and mitigation strategies. The project employs several methodologies, including the use of SVM classifiers, decision trees, and the Naive Bayes algorithm to analyze and identify malicious profiles. The study further includes coding implementation to simulate and test the impact of shilling attacks, as well as the development of an e-commerce website demo to illustrate the practical implications. Data analysis, including the building of a Binary Decision Tree (BDT) with intra-cluster correlation attributes, is performed to identify suspicious attack segments. The project also considers ethical implications, limitations, and the benefits of global communication in addressing these attacks. The ultimate goal is to provide insights and recommendations for enhancing the security and reliability of recommender systems, protecting both businesses and consumers from manipulation.

ON THE EFFECTIVENESS OF SHILLING ATTACK OR
PROFILE INJECTION ATTACKS AGAINST RECOMMENDER
SYSTEMS

Table of Contents
Chapter 1: Introduction....................................................................................................................5
1.1 Introduction............................................................................................................................5
1.2 Background of the study........................................................................................................5
1.3 Research aim and objectives..................................................................................................6
1.4 Significance of research.........................................................................................................6
1.5 Limitation of research............................................................................................................7
Chapter 2: Literature Review...........................................................................................................9
2.1 Concept of shilling attack......................................................................................................9
2.2 Identification of different types of attacks...........................................................................10
2.3 Impact of shilling attack on online e-commerce site...........................................................10
2.4 Prediction shift and rating variances of shilling attack........................................................11
2.5 Credibility of group users by rating prediction model.........................................................12
2.6 Locate suspicious attack segments by rating time series.....................................................12
2.7 Detecting profile injection attack.........................................................................................14
Chapter 3: Methodology................................................................................................................15
3.1 Data collection................................................................................................................15
3.2 SVM classifier.................................................................................................................15
3.3 Decision tree...................................................................................................................17
3.4 Method of Fog war..........................................................................................................20
3.5 Research onion................................................................................................................21
3.6 Research philosophy.......................................................................................................22
3.6.1 Justification...................................................................................................................22
3.7 Research Approach.........................................................................................................22
3.7.1 Justification...................................................................................................................23
3.8 Research Design..............................................................................................................23

3.9 Data collection methods..................................................................................................24
3.10 Population and sampling.................................................................................................24
3.11 Ethical consideration.......................................................................................................25
3.12 Limitation........................................................................................................................25
Chapter 4: Data analysis................................................................................................................26
4.1 Introduction..........................................................................................................................26
4.2 SVM Classifier.....................................................................................................................26
4.3 Building a BDT (Binary Decision Tree) with intra-cluster correlation attribute.................27
4.4 Traversing BDT for detecting shilling profiles....................................................................29
Chapter 5. Implementation............................................................................................................32
5.1 Detection of shilling attack with coding..............................................................................32
5.2 Impact of shilling attack on recommender system with data..............................................32
5.3 Identification of most dangerous attack on recommender system......................................34
5.4 Overview of the proposed Algorithm..................................................................................36
5.5 Implementation of developed coding with the help of respective models...........................37
5.6 Comparison existing approach with the proposed model....................................................40
5.7 E-commerce website creation..............................................................................................41
5.8 Demo for downloading user data from e-commerce website..............................................47
5.9 Analysis of the result or outcome........................................................................................61
5.10 Identification of method or techniques to prevent recommender system from shilling
attacks.........................................................................................................................................74
5.11 Benefits of global communication.....................................................................................75
Chapter 6: Conclusion...................................................................................................................76
6.1 Conclusion...........................................................................................................................76
6.2 Outcome of the experiment..................................................................................................77
6.3 Future scope.........................................................................................................................77

6.4 Recommendation.................................................................................................................78
6.4.1 Recommendation 1: Implementation of customer feedback.........................................78
6.4.2 Recommendation 2: Encourage open communication with consumers........................79
Reference list.................................................................................................................................80

Chapter 1: Introduction
1.1 Introduction
A recommender system is a form of information-filtering system that helps consumers find the right product at a reasonable price. It is used on many online marketing and e-commerce platforms to improve services and the customer experience. The purpose of this research is to examine the effectiveness of profile injection attacks, also known as shilling attacks, and to apply different models and algorithms that can reduce the rate of such attacks to a significant degree. Online selling platforms and digital marketing account for a large share of the global market and enable consumers to purchase desired products at affordable prices (Alonso et al., 2019). A recommender system benefits consumers who compare product specifications and prices in order to choose the right product. However, the growth of online marketing also allows hackers and other attackers to influence consumers' purchasing decisions through profile injection attacks. This research therefore examines different models and algorithms that can reduce shilling attacks as far as possible.
1.2 Background of the study
The rise of digital marketing and online selling websites has created demand for effective recommender systems that help consumers find the right products and services within a given time period. Amazon and Netflix are two well-known examples of companies that use recommender systems to improve their services by guiding consumers towards suitable products. Collaborative filtering algorithms are commonly used to improve the reliability and feasibility of recommender systems, which in turn increases profit and sales performance. The successful adoption of recommender systems on e-commerce sites has also intensified competition. On the other hand, shilling attacks have become evident in recent years, in which unauthorized or fake user profiles are inserted into the user-item rating matrix. This approach allows competitors to gain an advantage by biasing the predicted ratings and thereby shifting the apparent needs and demands of consumers. The world population is about 7.6 billion, of which roughly 4.1 billion are internet users, which underlines the need for effective algorithms. This research scrutinizes different shilling attack strategies and shilling attack detection schemes so that potential attacks can be minimized as far as possible.

A range of independent, qualitative user-level factors can be used within a recommender system to help detect shilling attacks. These metrics include the standard deviation of a user's ratings, the Number of Prediction-Differences (NPD), the degree of similarity with top neighbours and the degree of agreement with other users. This research focuses on different models and algorithms for detecting shilling attacks and thereby improving the recommender system.
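To make these metrics concrete, the following Python sketch (written for this discussion rather than taken from the project's own code) computes two of them, the standard deviation of a user's ratings and a simple degree-of-agreement score, from a small user-item matrix in which 0 marks an unrated item; the NPD and top-neighbour similarity metrics would additionally require running the predictor itself, so they are omitted here.

# Illustrative sketch (not the report's own code): per-user detection features
# computed from a user-item rating matrix, with 0 meaning "not rated".
import numpy as np

def rating_std(R):
    """Standard deviation of each user's observed ratings."""
    stds = []
    for row in R:
        rated = row[row > 0]
        stds.append(rated.std() if rated.size > 1 else 0.0)
    return np.array(stds)

def degree_of_agreement(R, tolerance=1.0):
    """Fraction of a user's rated items on which the user agrees
    (within `tolerance`) with the item's mean rating."""
    item_means = np.nanmean(np.where(R > 0, R, np.nan), axis=0)
    agreement = []
    for row in R:
        rated = row > 0
        if rated.sum() == 0:
            agreement.append(0.0)
            continue
        close = np.abs(row[rated] - item_means[rated]) <= tolerance
        agreement.append(close.mean())
    return np.array(agreement)

# Example: a toy 4-user x 5-item matrix, where the last user rates everything 5
R = np.array([[4, 0, 3, 5, 1],
              [4, 2, 3, 0, 1],
              [5, 1, 3, 4, 2],
              [5, 5, 5, 5, 5]], dtype=float)
print(rating_std(R))           # the all-5 profile has zero rating variance
print(degree_of_agreement(R))  # and the lowest agreement score

A flat profile that rates everything with the maximum score, a typical push-attack pattern, therefore stands out through its zero rating variance and low agreement with other users.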
1.3 Research aim and objectives
The aim of this research is to examine the effectiveness of profile injection attacks, or shilling attacks, with the help of different models and user-metric algorithms. The objectives are:
To understand the concept of shilling attacks against recommender systems
To use the Naïve Bayes algorithm to recognise attack patterns (a brief sketch follows this list)
To develop code in Python so that the user metrics can be computed and adjusted
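One possible reading of the second objective, sketched below purely for illustration, is to train a Gaussian Naïve Bayes classifier on per-profile features labelled as genuine or attack; the feature set, the toy data and the use of scikit-learn are assumptions made for this sketch and are not claimed to be the project's actual implementation.

# Hypothetical illustration of the Naive Bayes objective: classify user profiles
# as genuine (0) or attack (1) from simple per-profile features.
# scikit-learn is assumed available; the feature values are invented for the example.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Each row: [rating_std, degree_of_agreement, mean_rating]
X_train = np.array([
    [1.2, 0.8, 3.4],   # genuine
    [1.0, 0.7, 3.1],   # genuine
    [0.0, 0.3, 5.0],   # attack-like: flat five-star profile
    [0.1, 0.2, 4.9],   # attack-like
])
y_train = np.array([0, 0, 1, 1])

model = GaussianNB()
model.fit(X_train, y_train)

X_new = np.array([[0.05, 0.25, 5.0]])
print(model.predict(X_new))        # -> [1], flagged as a likely shilling profile
print(model.predict_proba(X_new))  # class probabilities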
1.4 Significance of research
The research takes a broad perspective on shilling attacks against recommender systems so that the sales and financial performance of the affected businesses can be protected and gradually improved. It supports online platforms and e-commerce sites in offering the right product at a reasonable price, which helps them gain a competitive advantage. In the UK, two attack types have been particularly evident against recommender systems, push attacks and nuke attacks, both of which make it harder for users to find the products they actually want. The code and implementation module developed in this project help to measure the impact of such attacks on a recommender system and, in turn, to improve customer service. The effectiveness of the recommender system is improved by implementing detection technology carefully. A detection method based on groups of users helps reduce shilling attacks by monitoring changes in the user-metrics algorithm. A rating prediction method, combined with a credibility evaluation method, is used to flag shilling attacks and resist unauthorized access. The wider effects of attacks on recommender systems, such as unfair competition and the loss of genuine users, are also examined critically. A suspect rating time series is used to prioritize suspicious segments and attack profiles so that potential attacks can be reduced as far as possible.

The shilling detection method is considered one of the feasible ways to synchronize the entire database (Alostad, 2019). One of the major benefits of this research is a reduction in shilling attacks, which improves the customer experience and protects user identities. It enables e-commerce sites and online marketing platforms to strengthen their recommender systems by using shilling detection algorithms, which in turn builds user trust. The research also encourages e-commerce sites to develop user-friendly algorithms so that the needs and demands of consumers can be met effectively. Proper research into recommender systems therefore helps in assessing the interests and preferences of consumers.
1.5 Limitation of research
Several limitations of the research affect the feasibility and reliability of the developed recommender system. The major limitation is the constrained budget, which creates a range of problems in accessing advanced technology for developing the shilling detection algorithm. It prevents the researcher from implementing newer, more advanced techniques that would improve the user experience and allow more significant changes to be deployed. In addition, the constrained budget narrows the boundaries of the research. This in turn limits the development of an effective user-metrics algorithm to prevent potential shilling attacks against the recommender system. Limited resources likewise restrict the user service models and algorithms that can be explored. Sustainable resourcing is required so that each task can be performed systematically and each deliverable completed properly. Similarly, a robust machine-learning approach needs to be implemented so that the user-metrics algorithm can be adapted to reduce unauthorized profiles. Limited resources and time reduce the reliability of the research by preventing the researcher from using the most appropriate tools and techniques to accomplish the task.
1.6 Scope of research
The research sheds light on potential issues and challenges in reducing shilling attacks against recommender systems so that effective code can be developed. The internet revolution has expanded the market for online marketing and e-commerce platforms, which in turn demands effective systems and user algorithms.

These user algorithms help provide the right product at the right time, based on consumers' preferences and choices, so that the maximum benefit can be obtained. The research enables e-commerce sites to assess the needs and demands of consumers, which eventually strengthens the customer relationship. Its major advantage is an improved customer experience: the right product is recommended at the right time while the user's personal information is protected. A constructive approach has been applied to understand the different variables of a recommender system so that shilling attacks can be minimized as far as possible. A detection method based on groups of users helps reduce shilling attacks by monitoring changes in user metrics. A range of models and algorithmic methods have been applied to develop an effective algorithm that strengthens the recommender system and delivers the right products and services (Patentimages.storage.googleapis.com, 2019). A suspect rating time series is incorporated into the user algorithm to prioritize suspicious segments and attack profiles so that potential attacks can be reduced as far as possible. The shilling detection method is considered one of the feasible ways to keep the entire database synchronized.

Chapter 2: Literature Review
2.1 Concept of shilling attack
E-commerce has entered everyone's daily life and has become an important part of the daily routine. People buy and sell all kinds of things online, for example food, clothes, transport, films, jobs and apartments, and many find it hard to imagine life without the internet. Its impact is both positive and negative. Some retailers track users' browsing and search records and, on that basis, recommend the products they want to sell, trying to make users feel that these products are important. Ordinary users may end up buying poor products simply because they see them at the top of a ranking that has been faked. The users most easily trapped are those who rely heavily on high ratings; information overload is the main reason, as online retailers try to push their products to the top of the ranking to attract buyers. The motive behind this strategy is usually purely economic: according to one survey, sixty-four per cent of users are attracted mainly by an item's rating and judge it by its comments. Profile injection is also known as a shilling attack. Push attacks and nuke attacks are the two main types (Bilge, Ozdemir and Polat, 2014). In a push attack the attacker injects biased ratings to promote a specified item, whereas a nuke attack works in the opposite way to demote it. Shilling attacks cause a great deal of damage to recommender systems and their users.
Most buyers register their accounts through social media such as Facebook, Instagram or LinkedIn, or use their existing user IDs, which makes it easier for retailers to learn about a customer's choices and preferences. A retailer can create a few fake profiles that post positive reviews and five-star ratings for the products it wants to sell. Much of the time buyers feel their only safe choice is the product that is highly rated and has positive reviews underneath, and shilling attackers trap them using this together with their other social media information. It is very hard to tell a real comment from a fake one: an attack profile can be designed so well that no one can tell whether it is genuine or fraudulent. A push attack is mainly used to inflate market demand for the attacker's own item, while a nuke attack is used to decrease demand for a competitor's item. There are many attack types, of which only a few are discussed here. A segment attack targets only a particular group of users, for example recommending comedy films to users who like comedy.

A bandwagon attack exploits the phenomenon in which a trend already adopted by others goes viral, so that more and more people come to believe that a particular item is fashionable; the attack profile rates such widely rated, popular items in order to blend in with the crowd. An average attack instead fills the profile with ratings that sit between the extremes, close to each item's typical value, so that the profile again looks like an ordinary member of the crowd. These, then, are the main aspects of retailer recommendation and the attacks against it: attackers track and follow the data that buyers provide, mislead them for their own profit, and take advantage of customers' casual and unaware behaviour, which gets those customers into trouble (Dhawan, 2016).
2.2 Identification of different types of attacks
We have reached an age in which technology is advanced and works smoothly and quickly: with a single click we can obtain past and present information about almost anything. In the past, people usually depended on other people to judge whether a product's value, quality or demand was good or bad. Today recommender systems capture this historical behaviour for us. The most popular approaches are content-based filtering and collaborative filtering. A content-based approach requires a good set of item features, for example a film's reviews, actors, directors or related articles. Collaborative filtering, on the other hand, requires only users' historical preferences: it generates a recommendation by determining the similarities and dissimilarities between users, and predictions are made purely from item scores and those similarities. Ratings fall into two categories:
1. Explicit rating: a rating given directly by the buyer for a particular product on a sliding scale, such as four stars for Jungle Book; this is also called a direct review (Dwivedi, 2018). The user expresses directly how he or she feels.
2. Implicit rating: the user expresses a like or dislike for an item indirectly, for example by whether or not they listen to a music track or click on it. Below we take a closer look at collaborative filtering.
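The following sketch illustrates the user-based collaborative filtering idea described above, predicting an unknown rating from the ratings of the most similar users; the cosine similarity measure and the small example matrix are chosen for illustration and are not necessarily the ones used later in the project.

# Illustrative user-based collaborative filtering (not the report's own code):
# predict a rating from the weighted ratings of similar users.
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity over co-rated items (0 = unrated)."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(R, user, item, k=2):
    """Predict R[user, item] from the k most similar users who rated the item."""
    sims = [(cosine_sim(R[user], R[v]), v)
            for v in range(R.shape[0]) if v != user and R[v, item] > 0]
    sims.sort(reverse=True)
    top = sims[:k]
    if not top:
        return 0.0
    num = sum(s * R[v, item] for s, v in top)
    den = sum(abs(s) for s, v in top)
    return num / den if den else 0.0

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 5, 4]], dtype=float)
print(round(predict(R, user=1, item=1), 2))  # predicted rating for an unrated item

Because predictions are weighted sums of neighbours' ratings, any injected profile that manages to appear similar to a genuine user can pull that user's predictions towards the attacker's target item, which is exactly what the shilling attacks discussed in the following sections exploit.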
2.3 Impact of shilling attack on online e-commerce site
E-commerce has spread far beyond what most people imagine. Users rarely realise how cleverly other people can steal their data and information, and how their own small, careless activities weave a large trap around them.

The online world also has a dark side in which we participate unknowingly. We share our thoughts and ideas, our email IDs, bank details, credit card details, friends and family, contact numbers and daily lifestyle, and in doing so we live in a kind of imaginary world. What we thought today, what we felt yesterday, what we desire: we rarely realise that all of this is visible to hidden third parties, who use the data to make a profit at our expense. They steer our behaviour as they wish, based on how many times we visit a site, which advertisements we click (even by mistake), how much time we spend viewing the things we like and how willing we are to buy. A shilling attacker secretly watches each of these activities, together with our past and present choices and preferences. On that basis, products are recommended to us according to our taste and preferences. Beneath the listed items the attackers place fake reviews and likes to attract viewers, trying to convince us that these are the items we have long been waiting to buy, that they are part of our lives, in other words necessities. If we skip or ignore them, similar items are recommended to us again and again (FAKE USER PROFILE DETECTION ON ONLINE SOCIAL NETWORKING., 2017). They also point us back to the sites we visit most frequently, so that buyers become addicted to the site in the hope of more offers and discounts. A shilling attacker raises the visibility of the items of their choice and lowers the apparent value of a competitor's product. For example, at first the buyer sees an item priced at two hundred and fifty rupees, but after placing the order a notification appears that an extra hundred rupees of shipping charges has been added. These are the strategies used to trap customers.
2.4 Prediction shift and rating variances of shilling attack
Prediction shift is based on the fluctuation of rating scores: it measures how predictions look before and after an attack is injected. Several experiments show how the injected profiles and the attack size change the system's behaviour. The attacker stalks users' data and, according to each user's taste, gives the desired item a five-star rating so that the user is presented with it inside their comfort zone. Large rating variance is very common in a shilling attack. E-commerce has a deep impact on both users and non-users, where "non-users" means people who are unaware, rarely use such sites, are less active or are afraid to visit them. Regular users are already trapped and follow the ratings blindly, while non-users enter the site purely on the strength of other users' words.

Rating variances fluctuate from time to time: if retailers want to push competitors down, they raise the ratings of their own items, and vice versa (Fire et al., 2014).
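A common way to quantify this effect, assumed here rather than quoted from the project, is the prediction shift: the average change in the predicted rating of the target item across genuine users, measured before and after the attack profiles are injected. A minimal sketch:

# Sketch of the prediction-shift measure (assumed definition, not taken verbatim
# from the report): the average change in the predicted rating of a target item
# across genuine users, before and after attack profiles are injected.
import numpy as np

def prediction_shift(preds_before, preds_after):
    """Both arguments: dict user_id -> predicted rating for the target item."""
    users = preds_before.keys() & preds_after.keys()
    diffs = [preds_after[u] - preds_before[u] for u in users]
    return float(np.mean(diffs)) if diffs else 0.0

before = {0: 2.4, 1: 2.9, 2: 3.1}
after  = {0: 3.8, 1: 4.2, 2: 4.1}   # after a push attack on the target item
print(prediction_shift(before, after))  # ~1.23: the item now looks more attractive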
2.5 Credibility of group users by rating prediction model
The aim here is an evaluation of web content that focuses on the users themselves, since the shilling attacker strikes at regular intervals. A credibility score q1 is computed for each user, based on how that user's ratings of the attacked items compare with the average score of the profile. In this rating-based approach it is proposed to calculate the credibility of a user entirely from their previous data and information. The approach works on the assumption that there is a measurable difference between forged ratings and the ratings of genuine profiles, so the gap between a genuine profile and an attack profile should lie above a certain range. Whenever the rating variance increases, the accuracy of prediction automatically decreases; the measurement descriptor therefore indicates how confident the recommender system can be that injection has occurred. Working on a per-user basis over the database also makes behaviour more predictable both for individual users and for groups of users. A higher value of the descriptor indicates a higher probability that the profile is forged, whereas a small value indicates that the profile is likely to be genuine. In a hybrid recommendation system this range is set between zero point eight (0.8) and one point two (1.2).
From the previous discussion we can see that the attacker works from the average score of the items rated by genuine users, and that shilling attackers highlight only items with a high rating variance. If an abnormal rating variance appears in a profile for a particular period of time, it gives the attacker a way to plan predictions and to inject profiles that trap users on the attacker's own terms without the users' knowledge. When a user rates a particular item, the rating shows their interest in it and how much they like or dislike it. Users are thus self-informers: if a profile keeps rating in this way without any disturbance, it falls into the genuine profile category (Gurajala et al., 2016). Conversely, if a profile is forged it is treated as an attack profile; ratings are normally given on a scale of one to five stars, so the attacker can easily identify which items a user has rated and which are unrated. Filler items are then picked at random from the items the user has left unrated.
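As a rough illustration of how such a credibility descriptor might behave, the sketch below compares a profile's rating variance with an assumed typical variance of genuine profiles and checks whether the ratio falls inside the 0.8 to 1.2 band mentioned above; the exact formula used in the rating prediction model is not reproduced here.

# Hypothetical credibility score (the report's exact formula is not reproduced):
# compare a profile's rating variance with the typical variance of genuine profiles
# and flag profiles falling outside an assumed "normal" band of 0.8 to 1.2.
import numpy as np

def credibility(profile, reference_std, low=0.8, high=1.2):
    rated = profile[profile > 0]
    if rated.size < 2 or reference_std == 0:
        return 0.0, False         # too little information: treat as suspicious
    ratio = rated.std() / reference_std
    return ratio, low <= ratio <= high

reference_std = 1.3               # assumed typical std of genuine ratings
genuine = np.array([4, 2, 0, 5, 3, 1], dtype=float)
forged  = np.array([5, 5, 5, 5, 5, 0], dtype=float)
print(credibility(genuine, reference_std))  # ratio near 1 -> within the band
print(credibility(forged, reference_std))   # ratio near 0 -> flagged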

2.6 Locate suspicious attack segments by rating time series
Attack profiles are constructed according to their composition, and many different attack models exist. In the experiments two parameters are varied: first the attack size and second the filler size, for which a certain prediction shift is then measured. The attack size ranges from approximately two per cent (2%) to fourteen per cent (14%), while the filler size ranges from one per cent (1%) to ten per cent (10%). The question examined in detail is how attack profiles fall into a suspicious time window and how many injected profiles are caught, which depends on the confidence coefficient that is chosen. With a high confidence coefficient, genuine profiles are filtered out and only the remainder passes into the second phase. With a lower confidence coefficient, however, the false-negative rate increases automatically and degrades the result overall: injected profiles that are not judged correctly appear as genuine profiles and the detection gets stuck in the first phase. In the paper a ninety per cent (90%) confidence coefficient was selected. The detected attack size increases only when the injected profiles fall within the suspicious period of time. Millions of profiles are in regular use, so only the suspicious ones are examined here; verifying the whole data set would be too time-consuming. For shilling attacks the focus is on groups that show clustering characteristics, since not all profiles and ratings are likely to be attacks. The data is divided into subsections, the previously described techniques are used to catch suspicious subsections, and only the suspected subsections are examined further. This mainly helps to narrow the range of suspicious profiles, and the algorithms applied on this basis reduce fraudulent activity and make detection more efficient. To construct such a rating time series for any product, first select all ratings of the item, each stamped with its time in the data stream (HEMELRIJK, 2008). The sample statistics of each window are then computed, and a test decides whether the window is anomalous or not; for each item, the important information has to be extracted from the ratings.
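The windowed test can be sketched as follows, assuming a simple z-test of each window's mean against the item's overall rating statistics and a threshold corresponding to roughly 90% confidence; the window length and threshold are illustrative choices, not the values fixed by the study.

# Sketch of windowed anomaly detection on an item's rating time series
# (an illustration of the idea, not the report's exact procedure): split the
# time-ordered ratings into windows and flag windows whose mean deviates from
# the item's overall mean by more than an assumed confidence threshold.
import numpy as np

def suspicious_windows(ratings, window=10, z_threshold=1.645):  # ~90% confidence
    ratings = np.asarray(ratings, dtype=float)
    overall_mean, overall_std = ratings.mean(), ratings.std()
    flagged = []
    for start in range(0, len(ratings) - window + 1, window):
        w = ratings[start:start + window]
        z = abs(w.mean() - overall_mean) / (overall_std / np.sqrt(window) + 1e-9)
        if z > z_threshold:
            flagged.append((start, start + window))
    return flagged

# Normal ratings followed by a burst of injected five-star ratings
series = [3, 4, 2, 3, 4, 3, 2, 4, 3, 3] * 3 + [5] * 10
print(suspicious_windows(series))   # only the final window is flagged as suspicious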
Attack models
An attack profile is built around an item that the attacker wants to see listed in the recommendations, the target item. In many cases the targeted items are unpopular or only average in their actual ratings, that is, less popular or low-rated by users.
13
rating variance, that is, items that are less popular or receive low ratings from users. There are four most popular attacks on recommender systems: first the random attack, second the average attack, third the segment attack, and last the bandwagon attack; all four were already elaborated in a previous topic. The ratings in an attack profile can be divided into three different sets: first the targeted item, second the selected items, which are usually chosen as a random series, and finally the remaining filler items. An illustrative sketch of one of these attack models is given below.
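As a purely illustrative sketch (not the exact generator used in the experiments reported here), an average-attack profile can be constructed by giving the target item the maximum rating while rating randomly chosen filler items around each item's observed mean. The function and data below are hypothetical and use NumPy.

import numpy as np

def average_attack_profile(item_means, target_item, filler_size, r_max=5):
    """Sketch of one average-attack profile: target item gets the maximum
    rating, filler items are rated around each item's mean rating."""
    n_items = len(item_means)
    profile = np.full(n_items, np.nan)          # unrated items stay NaN
    fillers = np.random.choice(
        [i for i in range(n_items) if i != target_item],
        size=filler_size, replace=False)
    for i in fillers:
        # filler ratings drawn around the item's observed mean rating
        profile[i] = np.clip(round(np.random.normal(item_means[i], 1.0)), 1, r_max)
    profile[target_item] = r_max                # push the target item
    return profile

# Example: 10 items with known mean ratings, target item 3, filler size 4.
item_means = np.array([3.2, 2.8, 4.1, 3.0, 3.6, 2.5, 4.4, 3.9, 3.1, 2.7])
print(average_attack_profile(item_means, target_item=3, filler_size=4))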
2.7 Detecting profile injection attack
Here we present a novel technique for detecting groups of attack profiles. The authors also extend their work with an in-depth analysis of the targeted items, and the experiments improve detection rates for the recommender system. As the data sets grow, the efficiency of TIA decreases. Whether or not to take action, cost-benefit considerations play an important role in recommendation. Normally the attacker's main motive is to gain the highest profit from the recommendation scheme with the help of a high prediction shift, so they operate for only a short period of time and inject profiles only for as long as they can profit. When there are no attackers, the raw ratings generally follow a very normal distribution. The approach is basically divided into two phases: the first part identifies, and the second examines, the data in order to select the suspected rating segments.
This approach is based on the credibility of recommender users. The idea put forward in this paper is to show how attackers have a negative impact on users and how they play with the rating variance. It is based on prediction and assumption: in their methodology the attackers apply a few kinds of formulas and tricks to trap users and customers into their money-making scheme. The strategy essentially follows rules and runs on an algorithmic method. These activities take place within a very small time window, so traditional detection cannot catch them this way (Hupp et al., 2016). In this group model only the rating factors between users are considered. The attackers measure all kinds of factors, such as social accounts, user activities, time zones and how much time users spend online, how many times they visit the recommended items, how much they like items and how they rate them, and so on, yet users get no idea that similar-choice products are being pushed in front of them. The attackers make users feel free, place them in a comfort zone, and present exactly the desirable product as a quick and easy way to buy. Future buyers should therefore be very careful and alert while buying or visiting any type of site or app: whenever visiting a site, try not to expose your identity and details. If that is not possible, then buyers
should always stay alert whenever they carry out e-commerce activities. If something is not required, do not enter it and waste valuable time, and do not give away personal information, emotions, social status, pictures, choices, tastes and preferences.
Chapter 3: Methodology
Digital marketing is now the biggest market in the world. People lead busy lives, so they tend to buy what they need through online shopping. They compare the products they receive as suggestions, choose products according to reviews and comments, and then decide whether those products are good and suitable for them or not. The huge problem is whether all of these suggestions, reviews and comments are real, and whether they come from genuine users. Sometimes competitors use forged identities for reviews and comments to promote their own products or demote others, and sometimes hackers do the same. Shilling attacks are nowadays one of the worst ways to push products to the top of the suggestions; their main aim is to manipulate users and recommend products that are actually not up to the mark. Identifying fake profiles helps to avoid shilling attacks. In this section the different techniques used to identify fake accounts on an e-commerce site are discussed. This chapter covers topics such as data collection from the e-commerce website and the various algorithms and techniques used to find fake IDs in the dataset.
3.1 Data collection
Generally, an e-commerce website allows users to register, and these data are stored on the server. These data need to be checked to find the fake accounts. Using the administration panel, the e-commerce website's user account details are gathered; most administration panel plugins allow saving the user data in different formats such as Excel or CSV. In the developed model a CSV file is given as input, and the system processes the CSV file and identifies the fake accounts in it. For the demonstration of the developed model, fake IDs and correct IDs are supplied as two separate CSV files used as input, which allows checking whether the model works well or not (Lawrence, 2015). Initially the model mixes both datasets; it then processes the combined dataset and finds the fake ID users in it.
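As an illustration of this preprocessing step, a minimal Python sketch is given below. The file names fake_users.csv and genuine_users.csv and the label column is_fake are assumptions made for illustration; the actual columns exported by the administration panel may differ.

import pandas as pd

# Load the two demonstration datasets (file names are assumed for illustration).
fake = pd.read_csv("fake_users.csv")
genuine = pd.read_csv("genuine_users.csv")

# Label each record: 1 = fake account, 0 = genuine account.
fake["is_fake"] = 1
genuine["is_fake"] = 0

# Mix both datasets and shuffle the rows so the classifier
# does not see them as two separate blocks.
combined = pd.concat([fake, genuine], ignore_index=True)
combined = combined.sample(frac=1.0, random_state=42).reset_index(drop=True)

combined.to_csv("combined_users.csv", index=False)
print(combined["is_fake"].value_counts())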
3.2 SVM classifier
A Support Vector Machine (SVM) model is a representation of the training examples as points in space, mapped so that examples of distinct categories are separated as widely as possible. In addition to performing linear classification, SVMs can effectively perform non-linear classification by mapping the inputs
implicitly into a high-dimensional feature space. In machine learning, SVMs are supervised models with associated learning algorithms that analyse data used for classification as well as regression analysis. An SVM is a discriminative classifier normally defined by a separating hyperplane: given labelled training data, the algorithm outputs the best hyperplane, which classifies new examples, so the hyperplane acts as the decision function. Suppose there is a dataset containing pairs (x(i), y(i)) ∈ Rⁿ × {0, 1}. A non-linear classifier with an RBF kernel is then used. The RBF kernel can be expressed as k(x, x′) = exp(−γ‖x − x′‖²), where γ is known as the kernel bandwidth and is tuned on the basis of the outcomes. The SVM algorithm first maps x into a high-dimensional space with a function ψ and finds a hyperplane H that maximises the distance between the mapped points ψ(xi) and H. The decision function is formulated as f(x) = ⟨w, ψ(x)⟩ − b, and the class label of x is given by the sign of f(x); all of the computations are carried out through the kernel k. A probability model for classification is implemented using the R package “e1071”: the decision values are fitted by maximum likelihood so that the outputs denote class likelihoods. The raw SVM outputs are then used for classification, and mapping these outputs to likelihoods permits combining the SVM results with other models.
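The text above refers to the R package e1071; as a purely illustrative sketch of the same idea, the snippet below trains an RBF-kernel SVM with probability outputs using Python and scikit-learn on the combined dataset assumed in the earlier snippet. The feature column names are hypothetical, not the project's actual features.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the mixed dataset prepared earlier (file and column names are assumed).
data = pd.read_csv("combined_users.csv")
X = data[["rating_count", "avg_rating", "rating_variance"]]  # hypothetical features
y = data["is_fake"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# RBF-kernel SVM; gamma plays the role of the kernel bandwidth discussed above.
# probability=True fits a likelihood model on top of the raw decision values.
clf = SVC(kernel="rbf", gamma="scale", C=1.0, probability=True, random_state=42)
clf.fit(X_train, y_train)

probs = clf.predict_proba(X_test)[:, 1]   # estimated probability that a profile is fake
labels = clf.predict(X_test)
print("Test accuracy:", clf.score(X_test, y_test))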
False Positive as well as False Negative evaluation
Consider a group of associates from one organisation: they share a particular IP address, and certain parts of their profile details might be the same (Li, 2014). To deal with false positives, there is a manual account inspection process: the classifier running in production marks accounts as fake and transmits them for manual evaluation, and these approaches help to solve the false positive problem. Similarly, accounts in the datasets that the model calculates to be valid but that individuals consider fake are evaluated manually; in most such instances it turns out that the likelihood produced by the model was accurate. With legitimate signups, a large number of signups may fall into a single cluster, which triggers a particular rule-based model to mark them as fake, and an individual labeller may mark them as fake for similar reasons. However, as the size of the cluster grows, the account profile patterns within the cluster vary greatly, which looks normal from the model's viewpoint, and the model is then able to correctly label the cluster as valid.
The figure above describes why, as clusters grow larger, the model becomes more precise. The model detects errors in the earlier human labels, reverses the earlier decision, and marks the accounts that are valid.
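One way to quantify the false positives and false negatives discussed above, assuming manually reviewed labels are available, is a simple confusion-matrix check. The sketch below uses scikit-learn with small placeholder label lists purely for illustration.

from sklearn.metrics import confusion_matrix, precision_score, recall_score

# y_true: manually reviewed labels (1 = fake), y_pred: labels assigned by the classifier.
# These small lists are placeholder values for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("False positives (genuine flagged as fake):", fp)
print("False negatives (fake passed as genuine):", fn)
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))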
Advantages and Disadvantages of the SVM algorithm
Advantages
Provides high generalisation performance in high-dimensional feature spaces.
Its polynomial kernel functions facilitate learning with combinations of multiple features.
It works efficiently even with unstructured data.
Highly scalable.
The risk of overfitting is low in SVM.
Disadvantages
Selecting the most suitable kernel function is more complex than in other algorithms.
Training time is higher for very big datasets.
Reading and interpreting the results requires a skilled person.
Making small calibrations to the model is difficult.
3.3 Decision tree
Decision trees are a most popular tool for prediction and categorisation. The algorithm used to classify between different parameters fits naturally on a decision tree. A tree is like a flowchart in which every internal node represents a test, every branch represents the outcome of that test, and every terminal node holds a class label.
Construction of Decision Tree
A tree is created by splitting the source set into subsets on the basis of an attribute value test. When this process is repeated recursively on each subset, it is called recursive partitioning (LI et al., 2016).
The recursive process is complete when the subset at a node all shares the same value of the target variable. The formation of a decision tree classifier does not require any parameter setting. Decision trees can manage high-dimensional data and have very good accuracy, which helps us learn about categorisation.
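As a minimal sketch of the splitting criterion behind recursive partitioning (not the authors' implementation), the functions below compute the Gini impurity of a candidate binary split; the attribute selection measure actually used in a given tool may differ, for example information gain.

from collections import Counter

def gini(labels):
    # Gini impurity of a set of class labels: 1 - sum(p_k^2).
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def split_gini(left_labels, right_labels):
    # Weighted Gini impurity of a binary split; lower is better.
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini(left_labels) + (len(right_labels) / n) * gini(right_labels)

# Example: splitting 6 profiles on some attribute value test.
print(split_gini([1, 1, 0], [0, 0, 0]))   # fairly pure split, low impurity
print(split_gini([1, 0, 1], [0, 1, 0]))   # poor split, high impurity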
Structure of Decision Tree
With a simplified decision tree, anyone can effectively evaluate the cost of living in a specific region. Starting from the root node, the location is assessed through child nodes such as Neighborhood and price.isMod, which describe values for a particular location. The outputs of each logical test are boolean values. From such a tree one can also derive a rule that tells, for a distinct area, whether the price of homes in that area is expensive.
Advantages of Decision Trees
With the help of the decision tree algorithm, we can produce reliable and interpretable models with little or no user intervention. The algorithm can be used effectively for both binary and multiclass classification.
The algorithm is fast and accurate both in building time and in applying (scoring) time. In addition, the tree-building process can be parallelised, and scoring can also run in parallel independently of the building algorithm.
Because scoring a decision tree is inexpensive, the tree structure created during model building can be applied as a series of simple tests, each most commonly based on a single predictor.
Disadvantages of Decision Trees
Overfitting is one of the most common flaws of decision trees. Setting restrictions on the model parameters, such as limiting the depth, and keeping the model simple are ways to improve how well a decision tree generalises to unseen test sets.
Predicting continuous values is another weakness of decision trees when continuous numerical input is adopted. Decision tree predictions are segregated into discrete sections, so they are not a practical way to obtain such values, and information is usually lost when the model is applied to continuous values (Most People Can Identify Fake Laughter, 2018).
Heavy feature engineering is another defect of decision trees: the flip side of their explanatory power is the need for complex feature engineering tactics. When handling unstructured data with latent structure, a decision tree becomes sub-optimal, whereas neural networks clearly excel in the same case. Besides, we should understand that knowing how decision trees or random forests are actually built, and implementing them, is non-trivial.
Key terms used to describe a decision tree
Root Node: The root node appears at the very beginning and represents the entire population to be analysed. The population is divided according to the various features, forming subgroups that split into the decision nodes below the root node.
Splitting: The process by which a node is divided into two or more sub-nodes.
Leaf Node or Terminal Node: A node that is not split any further is called a leaf or terminal node.
Decision Node: A sub-node that splits into further sub-nodes is called a decision node.
Pruning: Removing sub-nodes of a decision node is called pruning; a tree grown through splitting can later be contracted using the pruning feature of the decision tree.
Parent Node and Child Node: These are corresponding terms; a node that hangs below another node is a child node, and the node that introduces it is the parent node of those child nodes.
Branch or Sub-Tree: A sub-section of the decision tree is called a branch or sub-tree, just as a portion of a graph is called a sub-graph.
Working process of Decision Tree Algorithm
First, the best attribute is selected with the help of an Attribute Selection Measure (ASM), which determines how to split the records. That best attribute is made a decision node, splitting the dataset into smaller subsets. The tree-building process is then repeated recursively for every child until a stopping condition is met: all tuples in a node belong to the same class, or there are no remaining attributes left to split on.
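To illustrate this process (not the project's exact implementation), the sketch below fits a scikit-learn decision tree on the combined user dataset assumed in the earlier snippets; the feature columns are hypothetical and the Gini index is used as the attribute selection measure.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.read_csv("combined_users.csv")                      # assumed input file
X = data[["rating_count", "avg_rating", "rating_variance"]]   # hypothetical features
y = data["is_fake"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# criterion="gini" is one possible attribute selection measure; max_depth limits overfitting.
tree = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=42)
tree.fit(X_train, y_train)

print("Test accuracy:", tree.score(X_test, y_test))
print(export_text(tree, feature_names=list(X.columns)))       # human-readable tree structure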
3.4 Method of Fog war
This is a strategic term attributed to Carl von Clausewitz, for whom war is the realm of uncertainty: three-quarters of the factors on which action in war is based are wrapped in a fog of greater or lesser uncertainty. A sensitive and discriminating judgement is therefore called for, a skilled intelligence to scent out the truth.
Fog stands for uncertainty in war, and the fog of war is a truth of every military conflict. The term applies to the experience of soldiers in war: when soldiers become separated and lose communication, instructions become confused, and sound and vision become limited for the individual, the resulting unresolved confusion is the uncertainty known as the fog of war.
It depends completely on the situation during the war. When soldiers are separated from their group and lose communication, the fog can be reduced by intelligence, surveillance and reconnaissance technology. It also depends on spontaneous decision making in the fog of war, in other words the art of war. There are situations in which soldiers are unable to follow commands, and at those moments they need not wait for commands; they must have the ability to handle the situation, take what is positive from it, and make their own decisions.
3.5 Research onion
Figure 6: Research Onion
(Source: Saunders et al. 2015)
A research onion diagram is effective for understanding the three underlying principles of research philosophy, namely ontology, epistemology and axiology. Selecting a research philosophy is essential, as it allows the researcher to determine the path along which the rest of the research will be conducted. From that point of view, the research onion can be considered a roadmap or pathway guide for moving forward through the
research process by following a valid scientific method. The diagram is effective for the researcher as they move from the outer layer to the inner layers: the more the research onion is peeled open, the clearer the research outline becomes. Hence, it can be stated that the research onion helps embed scientific procedure in the research.
3.6 Research philosophy
Research philosophy concerns the source and nature of knowledge, and helps to develop proper knowledge in the study through appropriate evaluation (Ramalingam and Chinnaiah, 2018). Research philosophy is categorised into different segments, including interpretivism, positivism, realism and so on. By applying a positivism philosophy, the researcher can develop a highly structured research model incorporating a large sample of data (Kasiri et al. 2017, p.45). In contrast, interpretivism is effective for small-scale and in-depth evaluation. Realism draws on both by letting the researcher select the philosophy most suitable for the subject.
3.6.1 Justification
The current study involves a large sample drawn from a diversified population for evaluating customer satisfaction, and the researcher does not intend to focus on an in-depth qualitative evaluation. Thus, the positivism philosophy is the most applicable in the current context, and the study follows the positivism research philosophy, as it best suits the primary objectives of this ongoing evaluation.
3.7 Research Approach
The deductive research approach evaluates research questions by investigating relevant existing theories and models; a suitable conclusion is drawn by developing a new method or model, and sometimes an existing one is modified (Faculty.msb.edu, 2019). In contrast, the inductive approach depends upon a research hypothesis developed at the start of the study, and the concluding statement is drawn from the evaluation of primary and secondary data to either accept or reject that hypothesis.
Figure 7: Research approach
(Source: Created by a researcher)
3.7.1 Justification
The present study is concerned with developing research questions rather than a research hypothesis. In addition, the existing theories and models of customer satisfaction are discussed in the current study. It can therefore be claimed that, to establish the effectiveness of the current research, the deductive research approach should be applied, and at the end of the study modification of the existing research theories and models is considered. Since the current research does not involve the development of any research hypothesis, the deductive approach has been followed for this ongoing evaluation.
3.8 Research Design
Research design plays a crucial role in assessing factual and legitimate data, which increases the effectiveness of the research and helps ensure that Seagull can improve its customer satisfaction level. Research design is segmented into different categories, including descriptive, explanatory and exploratory, which facilitate building a firm structure. The descriptive design has been chosen to examine various aspects of customer satisfaction so that the different factors and attributes can be considered critically and without delay. The major purpose behind selecting a descriptive
design is to observe the respective factors and their impact so that an effective strategy can be developed.
The significant advantage of the descriptive design is that it helps in understanding the relationship between different variables or factors, which facilitates improving the customer experience. It enables the researcher to observe and analyse the relevant factors so that a positive change can be made effectively, and it allows the researcher to validate the collected data and information, which gradually increases the reliability and feasibility of the research in a systematic way.
3.9 Data collection methods
Data collection is considered one of the important attributes of research, as it helps determine the relevant tools and techniques for understanding the significance of customer experience for a business. There is a range of data collection methods, such as primary and secondary, in which different sources are explored to obtain authentic and reliable data. Primary research has been selected to understand the importance and significance of customer satisfaction for the progressive development of the company. Primary research is categorised into two parts, qualitative and quantitative methodology; a primary quantitative methodology has been selected for this research, in which a survey is conducted so that the effectiveness of shilling attacks can be minimised as far as possible.
3.10 Population and sampling
The sample size is considered an essential aspect of the research, as it facilitates assessing the needs and requirements of the study so that the maximum amount of authentic data and information can be collected. A sample of 100 consumers has been selected through a random sampling technique, which helps in extracting trustworthy and reliable data for the progressive development of an organisation. The significant advantage of this approach is a reduction in the risk of failure, which improves the reliability and feasibility of the research in an effective way (Neupane, 2015). It helps in developing an effective strategy and plan so that a positive change in the customer approach can be made effectively, and it helps in understanding the significance of all the relevant factors that influence customer experience and accessibility.
3.11 Ethical consideration
The research has been carried out in accordance with all relevant policy and legislation to ensure that the collected data is protected. The Data Protection Act 1998 has been followed to ensure that the collected data cannot be used for any commercial purpose. The implications of these policies and pieces of legislation have been followed critically, which ultimately increases the reliability and feasibility of the research. This ethical consideration helps in conducting the survey vigilantly so that authentic and legitimate data can be gathered, which in turn helps in developing an effective strategy.
3.12 Limitation
The research mainly emphasises Seagull Developer rather than organisations as a whole with respect to customer satisfaction, which may reduce its generalisability. The constraints of budget and time are other aspects that prevent the researcher from applying a more innovative approach to making a positive change, and the limited resources create problems in developing an effective strategy to eliminate potential risks and challenges to the progressive growth of the company.
Chapter 4: Data analysis
4.1 Introduction
Data analysis is considered one of the important aspects of research, as it helps in understanding the significance of the collected data on a wider scale so that potential barriers to customer satisfaction can be tackled positively. The purpose of this chapter is to analyse the collected data with different statistical tools to understand the significant relationships between different variables. The obtained results are then used in developing an effective strategy so that the organisation can achieve the desired goal and aim within the constrained time.
4.2 SVM Classifier
SVM is one of the most commonly used machine learning algorithms. It is a supervised algorithm, generally used to carry out classification or regression problems. During the analysis this algorithm uses the kernel trick to transform the data, and based on these transformations it identifies the most suitable boundaries between the possible outputs. Using this algorithm we can capture complex relationships between the data points, at the cost of longer training time. The SVM model is considered a feasible way to integrate a large amount of data in a synchronised order so that the respective tasks or activities can be arranged according to users' choices and interests. Shilling attacks have gradually been increasing, eventually affecting the sales and economic performance of e-commerce sites by systematically distorting the algorithm. Bayes' theorem has additionally been applied to reduce shilling attacks by relating the posterior probability to the prior probability of the predictors.
This approach helps in converting a given data set into a frequency table on the basis of likelihood and frequency so that the probability of purchasing any product can be determined.
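As a brief illustration of the frequency-table idea (a sketch with hypothetical counts, not the project's actual figures), Bayes' theorem states that P(A | B) = P(B | A) · P(A) / P(B); the snippet below computes a posterior probability from simple counts.

# Hypothetical counts taken from a frequency table of user profiles.
n_total = 200             # all profiles in the table
n_fake = 40               # profiles labelled fake
n_burst_given_fake = 30   # fake profiles that rated many items in a short burst
n_burst = 50              # all profiles with such a rating burst

p_fake = n_fake / n_total                          # prior P(fake)
p_burst_given_fake = n_burst_given_fake / n_fake   # likelihood P(burst | fake)
p_burst = n_burst / n_total                        # evidence P(burst)

# Bayes' theorem: posterior P(fake | burst) = P(burst | fake) * P(fake) / P(burst)
p_fake_given_burst = p_burst_given_fake * p_fake / p_burst
print("P(fake | burst) =", round(p_fake_given_burst, 3))   # 0.6 with these counts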
The posterior probability can be used to interpret the respective results and outcomes so that an effective algorithm can be developed to reduce the attack probability in a systematic way. The obtained result signifies that the probability of being attacked is higher, which demands an effective algorithm so that shilling attacks can be reduced as far as possible. An algorithm has been developed in the Python language that adjusts the user metrics and eventually improves the recommender system.
The algorithm helps in determining the class associated with a given data set, and it is effective and reliable in multi-class prediction. The implementation of this code works well even with little training data and is more reliable than logistic regression in that setting. The Python code has been developed using the normal distribution to make sure the interests of users can be prioritised carefully based on their choice lists.
There are generally three types of Naïve Bayes model in the scikit-learn library, namely Bernoulli, multinomial and Gaussian, which help in improving the recommender system. The Gaussian model, which uses the normal distribution, helps in prioritising the activities or interests of consumers so that the right product is offered at a reasonable price. The multinomial model, by contrast, emphasises how often a word occurs in a document, which gradually facilitates improving the recommender system. The Naïve Bayes model therefore helps in adjusting the matrices accordingly so that a set of data can be prioritised and any unauthorised or fake identity is resisted, which improves the feasibility of the recommender system in a systematic way.
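A minimal scikit-learn sketch of the Gaussian variant is shown below, again assuming the hypothetical combined dataset and feature columns used in the earlier snippets rather than the project's actual data.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = pd.read_csv("combined_users.csv")                      # assumed input file
X = data[["rating_count", "avg_rating", "rating_variance"]]   # hypothetical features
y = data["is_fake"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# GaussianNB models each feature with a normal distribution per class.
nb = GaussianNB()
nb.fit(X_train, y_train)

posterior = nb.predict_proba(X_test)[:, 1]   # posterior probability that a profile is fake
print("Test accuracy:", nb.score(X_test, y_test))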
4.3 Building a BDT (Binary Decision Tree) with intra-cluster correlation attribute
Building a Binary Decision Tree (BDT) is an essential part of the recommender, in which large amounts of data are compared and examined to reach a firm decision so that maximum benefit can be obtained (Science and R, 2019). The system works on the user-item matrix, which improves accuracy through the customers' preference levels. An intra-cluster correlation methodology has been applied in which a range of data is examined based on customer preference level. The personal choices of customers are integrated into the user-matrix algorithm, which automatically improves the accuracy and reliability of the recommender system. The intra-cluster method supports the
recommender system in providing the right product at the right time through the compilation of user data and information.
Intra cluster
Require: Cluster center (C, 1×m) & Cluster members (M, k×m)
1: function ICC(C, M) intra-cluster correlation calculation
2: Initialize: similarities (1×k) ← 0 correlation level for each member
3: μC ← mean(C) cluster center's mean
4: σC ← std(C) cluster center's standard deviation
Calculate similarities between each member and the cluster center:
5: for all ui in M (i ← 1 to k) do
6: similarities(i) = pcc(M(i), C, μC, σC) Pearson's correlation coefficient calculation
7: end for
Return the average of the similarities:
8: return mean(similarities)
9: end function
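A runnable Python version of this listing might look as follows: a sketch under the assumption that each profile is a dense rating vector of length m, whereas the project's actual data handling may differ.

import numpy as np

def icc(center, members):
    """Intra-cluster correlation: mean Pearson correlation between
    each cluster member and the cluster center."""
    center = np.asarray(center, dtype=float)                    # shape (m,)
    members = np.atleast_2d(np.asarray(members, dtype=float))   # shape (k, m)
    mu_c, sigma_c = center.mean(), center.std()
    similarities = []
    for member in members:
        mu_m, sigma_m = member.mean(), member.std()
        # Pearson's correlation coefficient between this member and the center.
        pcc = ((member - mu_m) * (center - mu_c)).mean() / (sigma_m * sigma_c)
        similarities.append(pcc)
    return float(np.mean(similarities))

# Example: a cluster center and three member rating profiles.
center = [4, 4, 5, 3, 4]
members = [[4, 5, 5, 3, 4], [3, 4, 4, 2, 4], [5, 4, 5, 3, 5]]
print(round(icc(center, members), 3))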
4.4 Traversing BDT for detecting shilling profiles
Shilling attack profiles refer to bot-based profiles that manipulate the produced predictions so that the attacker's demands are met. A correlation measure among rating profiles is computed over the co-rated items so that an effective strategy can be developed to reduce shilling attacks vigilantly. Traversing the BDT reflects the fact that a group of attackers injects malicious profiles in a clustered fashion, enabling unauthorised accounts to control the preference levels of users. The intra-cluster correlation values are therefore used in detecting shilling profiles so that a positive change can be made vigilantly. Traversing the BDT operates over a large database, which facilitates assessing the preference levels of consumers so that a user database model can be formulated for detecting shilling attacks. The intra-cluster correlation helps in prioritising the preference-level activities of consumers so that the large database can be synchronised, and the integrated data influences consumers' purchasing behaviour, which eventually increases sales performance based on the organisation's preference list (Teixeira da Silva, 2017).
Require: Binary decision tree (BDT) & upper and lower limit specifier (ρ)
1: function Detect(BDT, ρ)
2:   if BDT.leftElements.ICC > BDT.rightElements.ICC then
3:     parentTreeElements = BDT.leftElements
4:     BDT = BDT.left
5:   else
6:     parentTreeElements = BDT.rightElements
7:     BDT = BDT.right
8:   end if
     // Continue traversing according to changes in ICC:
9:   while (BDT.leftElements.ICC AND BDT.rightElements.ICC are not in range [parentTreeElements.ICC ∓ parentTreeElements.ICC × ρ%]) do
10:    if BDT.leftElements.ICC > BDT.rightElements.ICC then
11:      parentTreeElements = BDT.leftElements
12:      if BDT has a left child then
13:        BDT = BDT.left
14:      else break
15:      end if
16:    else
17:      parentTreeElements = BDT.rightElements
18:      if BDT has a right child then
19:        BDT = BDT.right
20:      else break
21:      end if
22:    end if
23:  end while
24:  return parentTreeElements
25: end function
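As a rough guide to how this traversal could be realized in code, the following Python sketch assumes a BDT node object that exposes left and right child nodes plus left_elements / right_elements profile groups carrying a precomputed icc value; it is an illustrative translation of the pseudocode under those assumptions, not the exact implementation used in this project.

# Illustrative sketch of the Detect traversal (assumed node structure, see lead-in).
def detect(bdt, rho):
    """Follow the branch with higher ICC until the ICC stops changing by more than rho%."""
    if bdt.left_elements.icc > bdt.right_elements.icc:
        parent_elements, bdt = bdt.left_elements, bdt.left
    else:
        parent_elements, bdt = bdt.right_elements, bdt.right

    def in_range(icc, ref):
        # Within ref +/- ref * rho% of the parent group's ICC.
        return abs(icc - ref) <= ref * rho / 100.0

    while not (in_range(bdt.left_elements.icc, parent_elements.icc)
               and in_range(bdt.right_elements.icc, parent_elements.icc)):
        if bdt.left_elements.icc > bdt.right_elements.icc:
            parent_elements = bdt.left_elements
            if bdt.left is None:
                break
            bdt = bdt.left
        else:
            parent_elements = bdt.right_elements
            if bdt.right is None:
                break
            bdt = bdt.right
    return parent_elements  # the suspected shilling group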
The algorithm above traverses the tree from the root node, always following the branch whose elements have the higher intra-cluster correlation. The obtained Naïve Bayes result, P(Sunny | yes) = 3/9 ≈ 0.33, signifies that the probability of being attacked is comparatively high, which underlines the need for an effective algorithm so that shilling attacks can be reduced as far as possible. The intra-cluster correlation values are used to detect shilling profiles (Teixeira da Silva, 2017), and the BDT traversal over the large database helps in assessing the preference levels of consumers so that a user database model can be formulated for detecting shilling attacks.
Chapter 5. Implementation
5.1 Detection of shilling attack with coding
This is the first step in detecting a shilling attack, so that a positive and reliable system can be put in place that minimizes the effectiveness of attacks on the recommender system. Several methods and techniques are used to understand how shilling works and how feasible its detection is. An SVM model is used to strengthen the recommender system by acting as a defence layer, so that exposure of consumers' personal and confidential details is minimized as far as possible. The SVM model is a feasible way to integrate a large amount of data in a synchronized order so that tasks and activities can be arranged according to users' choices and interests. Shilling attacks have been gradually increasing and eventually affect the sales and economic performance of e-commerce sites by systematically distorting the algorithm. Bayes' theorem is also applied to reduce shilling attacks, using the posterior probability derived from the prior probability of each predictor.
The shilling attack detection algorithm helps identify potential shilling attacks so that genuine users can still obtain the desired products at a reasonable time. Attack profiles are generated and presented according to their composition, covering several varieties of attack models. In the experiments two parameters are varied: first the attack size and second the filler size, and the resulting prediction shift is measured (Zhang, 2016). The attack size ranges from approximately two percent (2%) to fourteen percent (14%), while the filler size ranges from one percent (1%) to ten percent (10%). We then discuss in detail how attack profiles fall into a suspicious window and how many injected profiles are flagged whenever the confidence coefficient changes, since the reported value depends on the chosen confidence threshold.
5.2 Impact of shilling attack on the recommender system, with data
Shilling attacks have a significant impact on the recommender system, ranging from increased vulnerability of users' details to manipulation of their purchasing behaviour. The push attack in particular affects the recommender system by exploiting consumers' confidential details in order to enhance the sales and financial performance of the attacker's products. Therefore, a firm system is
required, one that reduces the effectiveness of shilling attacks.
Figure 1: Shilling attack detection for recommender systems based on detection rate
A detection method based on group users helps reduce shilling attacks against the recommender system by monitoring changes in the user metrics. A rating prediction method is used to flag shilling attacks on the basis of a credibility evaluation, resisting unauthorized access. The wider aspects of the recommender system are critically examined, since attacks create unfair competition that results in the loss of genuine users. A suspect time series is applied to the user data, which helps in prioritizing the tasks and attack profiles so that potential attacks can be reduced as far as possible (Zhang, 2015). The shilling detection method is a feasible way to keep the entire database consistent.
Figure 2: Shilling attack detection for recommender systems based on false positive rate
The obtained data signify that the average attack has a negative impact on the recommender system, which calls for a firmer algorithm. As the filler size increases from 3% to 9%, the vulnerability of the system gradually increases. A suspect time series applied to the user data helps in prioritizing the tasks and attack profiles so that potential attacks can be reduced as far as possible, and the shilling detection method keeps the entire database consistent.
5.3 Identification of the most dangerous attack on the recommender system
There are several types of shilling attacks, classified by their effectiveness, including the push attack, the average attack and the nuke attack, all of which need to be detected vigilantly so that their adverse effects can be minimized as far as possible. In the UK, two attacks against recommender systems are most evident, namely push attacks and nuke attacks, under which users face difficulties in purchasing the desired products. The developed coding and implementation module helps to limit their impact on the recommender system and eventually improves customer service. The effectiveness of the recommender system is improved through vigilant use of detection technology, and a detection method based on group users helps reduce shilling attacks by monitoring changes in user metrics (Zhang and Zhou, 2014).
Filler Size (%) \ Attack Size (%)   2%     4%     6%     8%     10%    12%    14%    16%    18%
3%                                  0.69   0.74   0.79   0.83   0.865  0.89   0.90   0.91   0.91
5%                                  0.74   0.78   0.82   0.86   0.89   0.905  0.916  0.925  0.93
7%                                  0.78   0.83   0.87   0.895  0.91   0.92   0.928  0.945  0.946
9%                                  0.81   0.85   0.88   0.905  0.92   0.935  0.945  0.95   0.954
These measures work on the assumption that there is a difference between the ratings of forged profiles and the ratings of genuine profiles; when genuine and attack profiles are compared, the descriptor for an attack profile should fall above the expected range. Whenever the rating variance increases, the accuracy of prediction decreases, so the measurement descriptor gives the recommender system more confidence about whether profiles have been injected. The measure is computed per user over the database, which makes behaviour more predictable for individual users and for groups of users. A higher value indicates a higher probability that the profile is forged, whereas a small value indicates a genuine profile. For the hybrid recommendation system, the acceptable range is set between zero point eight (0.8) and one point two (1.2).
5.4 Overview of the proposed algorithm
The proposed model analyses the user details to find fake accounts. When comparing a fake user account with a real or genuine user account, the main differences are usage, friends, name and similar attributes. By using these kinds of attributes we can separate the fake user accounts from the original accounts; that is the main idea of the proposed model. The flowchart below describes how the proposed model works.
The flowchart makes clear that the developed model initially reads the data. Here the data means the exported spreadsheet of user information, which can be downloaded from the e-commerce website through the administrative panel. The downloaded dataset contains information such as username, email ID, number of friends, number of comments, and other usage statistics such as time and location; these are the features of the dataset. From these attributes we can classify the data into fake and original. An example scale of the decision model is illustrated below, showing the difference between z-scores for different attributes. The proposed model splits the dataset in a similar way, but uses different attributes.
Flowchart of the proposed model: read the user profile → extract the required attributes → SVM classification → decision making (fake/real) → feedback → training.
The proposed model then extracts the attributes from the uploaded dataset, and the SVM classifier classifies the profiles based on these attributes. The attributes selected for the analysis are favourites count, listed count, gender, language code, status count, followers and friends details. Based on the analysis and statistical calculations, the proposed model makes its decision. In general, a fake user account has less information and lower usage than an original user account. At the end of the analysis the model records feedback, which is used for further training, so the accuracy of the proposed model increases with continued usage.
5.5 Implementation of the developed coding with the help of the respective models
Need for a dataset
The dataset acts as a local copy of the data and must contain a mixture of genuine and forged profiles. The training data is used to train the algorithm. Because of privacy concerns, no suitable public datasets were available. A clean dataset can then be produced by removing the identified fake profiles.
Parameter evaluation
Using the evaluation parameters, effectiveness (accuracy) is calculated by dividing the number of correct predictions by the total number of predictions. The false positive rate is obtained by dividing the number of genuine profiles flagged as fake by the total number of genuine profiles, and the false negative rate is obtained by dividing the number of forged profiles classified as genuine by the total number of forged profiles.
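A small sketch of these calculations, assuming the confusion-matrix counts are already known (which class counts as "positive" depends on how the labels are encoded), is given below; the example numbers are the counts reported later in Section 5.9.

# Illustrative metric calculations from confusion-matrix counts.
def evaluation_metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # correct predictions / all predictions
    fpr = fp / (fp + tn)                         # false positives / all actual negatives
    fnr = fn / (fn + tp)                         # false negatives / all actual positives
    return accuracy, fpr, fnr

# Example with the confusion-matrix counts reported in Section 5.9:
print(evaluation_metrics(tp=248, fp=6, tn=262, fn=48))  # approx. (0.90, 0.02, 0.16)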
Results
The SVM classifier is more accurate than the Naïve Bayes classifier, and as the number of attributes rises the accuracy of both algorithms also rises. The false positive rate of the SVM is lower, because when a profile is flagged as fake the SVM assigns it a high probability of actually being fake. With Naïve Bayes the false negative rate is very low. Overall, the SVM is well suited to grouping false profiles on social media.
Evaluation
The machine learning models are obtained using the proposed methodology, and the efficiency metrics are calculated for each classifier. The classifiers were trained with tenfold cross-validation: the cross-validation splits are defined first, and then the fundamental metrics and the assessment of the classifier are computed.
Import the packages
The code below imports the packages. Multiple packages are used in this program for detecting a profile injection attack (Zhou et al., 2015), including cross-validation utilities, the ROC curve, the SVM and the confusion matrix.
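The original screenshot of the imports is not reproduced here; a representative set of imports for this kind of pipeline, assuming scikit-learn, pandas and matplotlib are available, would look like the following sketch.

# Representative imports for the profile-classification pipeline (assumed libraries).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, cross_val_score, learning_curve
from sklearn import preprocessing
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc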
Read the dataset
The original and fake profile datasets are read with the pandas library using the read_csv() method and then combined with the concat() method. The combined dataset has 2818 rows and 34 columns.
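A minimal sketch of this step is shown below; the CSV file names and the label encoding are hypothetical and stand in for the actual exports used in the project.

# Read the genuine and fake profile exports and combine them (file names are assumed).
import pandas as pd

genuine_users = pd.read_csv("genuine_users.csv")
fake_users = pd.read_csv("fake_users.csv")

# Label the rows before combining: 1 = genuine, 0 = fake (encoding is illustrative).
genuine_users["label"] = 1
fake_users["label"] = 0

users = pd.concat([genuine_users, fake_users], ignore_index=True)
print(users.shape)  # on the order of (2818, 34), plus the added label column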
Train and test the dataset
The dataset is trained using the support vector machine algorithm. Data preprocessing is performed with the sklearn library, and the preprocessed data are stored in the X_train and X_test variables. A gamma value is then calculated to obtain the training score with the SVM algorithm, and the classification itself is performed with the SVC() method.
Feature extraction is performed to improve the performance of the machine learning models. The unique language values are stored in the lang_list variable and indexed through the lang_dict variable, and the gender is predicted from the person's name.
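The preprocessing, gamma search and SVC training described above could look roughly like the sketch below. The feature column names, the gamma grid and the train/test split are assumptions for illustration, reusing the users DataFrame from the previous sketch; they are not the exact values used in the project.

# Illustrative training step: scale the features, pick gamma by cross-validation, fit an SVC.
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC

feature_columns = ["statuses_count", "followers_count", "friends_count",
                   "favourites_count", "listed_count", "sex_code", "lang_code"]
X = preprocessing.scale(users[feature_columns].values)   # 'users' from the previous sketch
y = users["label"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Search a small logarithmic grid of gamma values and keep the best cross-validated score.
gammas = np.logspace(-2, 2, 5)
scores = [cross_val_score(SVC(kernel="rbf", gamma=g), X_train, y_train, cv=10).mean()
          for g in gammas]
best_gamma = gammas[int(np.argmax(scores))]

clf = SVC(kernel="rbf", gamma=best_gamma).fit(X_train, y_train)
print(best_gamma, clf.score(X_test, y_test))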
ROC curve
The Receiver Operating Characteristic (ROC) curve is plotted using the decision tree algorithm, showing the true positive rate against the false positive rate for distinguishing original from fake profiles.
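A compact sketch of this plot, assuming a decision tree is fitted on the same training split used in the previous sketch, is given below.

# Illustrative ROC curve for the fake/original classification using a decision tree.
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
scores_test = tree.predict_proba(X_test)[:, 1]           # probability of the positive class

fpr, tpr, _ = roc_curve(y_test, scores_test)
plt.plot(fpr, tpr, label="AUC = %.2f" % auc(fpr, tpr))
plt.plot([0, 1], [0, 1], linestyle="--")                  # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()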
Confusion matrix
The confusion matrix is then computed for the predictions, using the two target names "fake profile" and "original profile".
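For completeness, a minimal sketch of this step with scikit-learn (reusing clf and the test split from the training sketch; recent scikit-learn versions support the normalize argument) is shown below.

# Illustrative confusion matrix and per-class report for the SVC predictions.
from sklearn.metrics import confusion_matrix, classification_report

y_pred = clf.predict(X_test)                          # 'clf' from the training sketch
target_names = ["fake profile", "original profile"]

print(confusion_matrix(y_test, y_pred))
print(confusion_matrix(y_test, y_pred, normalize="true"))   # row-normalized version
print(classification_report(y_test, y_pred, target_names=target_names))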
5.6 Comparison of the existing approach with the proposed model
Functionally, the proposed system is quite similar to other existing systems: it uses the same technologies, such as machine learning and natural language processing. The classification method is the major difference between the proposed model and the existing models. Here the SVM classification algorithm is used for the classification process, whereas existing systems use different classification algorithms such as Naive Bayes or XGBoost. The biggest difference between the two models is that Naive Bayes treats the features as independent, whereas SVM captures interactions between the features to a certain degree, as long as a non-linear kernel is used. Compared with the previous model, the proposed model offers the
advantages listed below. The SVM algorithm scales better, so it can be used for datasets of almost any size, while the input space of the Naive Bayes algorithm is more limited. The SVM also handles sparse, high-dimensional feature vectors effectively and brings in few irrelevant features. For text classification problems, the SVM classifier therefore provides higher accuracy, because such problems are usually close to linearly separable.
5.7 E-commerce website creation
A WordPress-based e-commerce website was created.
Website link: https://995856mp.cloudaccess.host/
Login credential for cloudaccess.net
Email address: 995856mastersproject@gmail.com
Password: password@12345
Login credential for domain name
Username: tegkqyos
Password: W#2Lm(q5eMg4Y2
Home page
Men page
Women page
Cart
About us
Contact us
Register
Login
5.8 Demo for downloading user data from e-commerce website
1. Enter www.cloudaccess.net
2. Enter login credentials
3. The product page is now shown. Press the login button on that page.
4. Dashboard > Users > User Import Export > User/Customer Export. (Tick the required details.)
(Note: this is the free version of the plugin and website, so access is limited and not all of the required data can be downloaded. The free version only allows downloading basic details; the premium version allows downloading all the required details.)
5. Save the file in the preferred locations.
6. Register the user profile
The above screenshots show the user profile registration form. It contains the user name, user email, user password, confirm password, country, phone number, address and gender. After clicking the submit button, the user profile is added to the e-commerce website database.
The picture given shows the list of users. It displays the user name, email, role of the user, posts and status.
7. Members
The above pictures show all user profiles. The members' details span pages 1 to 132, and the user names are displayed row- and column-wise.
8. User roles
There are eight user roles on the e-commerce site: subscriber, shop manager, pending, editor, customer, contributor, author and administrator. The role listing contains the role title, ID, number of members, custom role flag and admin access. Only the administrator has admin access.
Subscriber
Users with the subscriber role have permission to create and maintain their profile. They cannot write or publish pages.
Shop manager
The shop manager manages the shop and can create and edit products. The list of users with the shop manager role is given in the above pictures.
Pending
A pending user profile is not permitted to read or write any comments or posts.
Editor
Users with the editor role are permitted to write, edit, delete and publish posts and comments. They can manage tags and categories and upload files.
Customer
Customers are permitted to view the published posts. They can also edit their own profile information.
Contributor
Users with the contributor role can modify and delete their own posts, but they are not permitted to edit or delete published posts.
Author
Users with the author role can upload files and write, publish, delete and edit their own articles. They can also change their password and edit their profile. The above screenshots show the list of users with the author role.
Administrator
The administrator is permitted to create new users with a username and password. They can add and delete other users, modify the themes, and add, modify and delete plugins.
No role
A user with no role is not permitted to view, create, modify or delete any posts or articles.
9. Comments
The above pictures show comments being created for a particular product; two comments are displayed, "very nice product" and "very nice". First give the rating of the product, then type the comment and click the submit button.
All comments are displayed in the administration panel; four comments have been posted on the e-commerce website by various users.
10. Login page
The login page has username and password fields, a remember-me option and a login button. Type the user name and password, then click the login button to access the e-commerce website.
11. User profile
The user profile contains the dashboard, orders, downloads, address, account details and logout sections. The address section displays the billing address, including the first name, last name, country and address.
The account details section shows the user's first name, last name, display name and email address. The user can change their password in this section by typing the current password, the new password and the confirmation, then clicking the save changes button.
12. The CSV file is then analysed using the developed Python code. The CSV file is obtained from the e-commerce website using the user import and export plugin; this is how the real model works in practice.
Note: in the given model the "fake data set" and "original data set" need to be uploaded separately. The code combines both into one single dataset and then carries out the data analysis. These steps are needed to verify that the developed code works and to measure the accuracy of the developed model.
5.9 Analysis of the result or outcome
As discussed in Section 4.4, shilling attack profiles are bot-generated profiles that manipulate the produced predictions, and a correlation measure over co-rated items is used to separate them from genuine profiles. Traversing the BDT with the intra-cluster correlation values therefore isolates the group of injected accounts that try to shift users' preferences. The traversal operates over the whole database, which helps in assessing consumers' preference levels so that a user database model can be formulated for detecting shilling attacks, keeping the large database consistent and limiting the attackers' influence on consumers' purchasing behaviour (Zhou et al., 2018).
Dataset description (information present in the dataset)
The figures below show the various pieces of information present in the uploaded dataset. The dataset was downloaded from the e-commerce website administration panel and contains the attributes required for the analysis, such as username, email ID, number of friends, number of comments, and other usage statistics such as time and location.
The above pictures show the dataset information. It contains 2818 rows and 34 columns, with many variables such as id, name, created at, screen name, status count, etc.
Feature extraction
Feature extraction is used to improve the machine learning models. The extracted features are the statuses count, followers count, friends count, favourites count, listed count, sex and lang code attributes.
Classification
The classifier is built using the SVC method, whose parameters include cache size, class weight, decision function shape, degree, gamma, probability, random state, etc. The cache size is 200, the gamma value is 31.62 and probability is set to false. The estimated score is 0.93301.
The learning curve is drawn for the training and cross-validation scores of the support vector machine, with the two curves shown in different colours. The training score is the higher of the two, reaching more than 0.98 at around 1800 training examples, while the cross-validation score stays below 0.90. The gamma value is 31.62.
Confusion matrix
Two forms of the confusion matrix are reported, with and without normalization. Without normalization, the true negative (TN) value is
262, the false positive (FP) value is 6, the false negative (FN) value is 48 and the true positive (TP) value is 248. With normalization, the TN value is 0.97, the FP value is 0.02, the FN value is 0.16 and the TP value is 0.83. The classification accuracy is 0.90.
Fake profile
Finally, the developed model splits the data into two groups based on the selected attributes: fake profiles and original profiles. First, the model lists all the fake profiles; the screenshot below shows the fake profile information. Using these details an administrator can take corrective actions to prevent shilling attacks on the e-commerce website, and the details can also be saved as an Excel file if required.
Original profile
Similarly to the fake profiles, the developed model shows the original profile details, as given in the figure below.
ROC curve
The ROC curve is computed for the separation of original and fake profiles, and the AUC value is 0.91. The AUC takes both true outcomes (0 and 1) into account and returns a value between 0.0 and 1.0; the false positive rate and true positive rate are plotted for the original-versus-fake classification.
The precision, recall, F1-score and support are computed for the fake and original profiles. The false positive rate is 0.022 and the true positive rate is 0.83. For the fake profile class the precision is 0.85, the recall is 0.98, the F1-score is 0.91 and the support is 268. For the original profile class the precision is 0.98, the recall is 0.84, the F1-score is 0.90 and the support is 296. Overall, the weighted precision is 0.91, the F1-score is 0.90, the recall is 0.90 and the total support is 564.
5.10 Identification of methods and techniques to prevent the recommender system from shilling attacks
Two main methods, the Naïve Bayes algorithm and the decision tree model, are used to improve the accuracy and reliability of the recommender system by gradually eliminating potential shilling attacks. These two methods help improve the customer experience by prioritizing the preference list of users; in addition, they examine the entire data and information to improve decision making so that the best possible result can be obtained. Shilling attackers lure users through their other social media information, and it is very hard to differentiate a real comment from a fake one, because shilling attack profiles are designed so well that nobody can tell whether they are fraudulent or genuine. The push attack is mainly used to create high demand for the attacker's own product, while the nuke attack is used to decrease the demand for competitors' products. Among the many attack types, the segment attack targets only a specific group of the audience: for example, if a user likes comedy movies, the attacker pushes comedy items to that user and their group.
The cultural barrier plays one of the most significant roles in global conversation, since most cultures have their own norms and specific ways of communication. Besides, the great importance of global communication has also been noticed,
particularly for various business processes, where it becomes vital to communicate on both sides at multiple levels (Hupp et al. 2016). Moreover, creating a productive relationship is also vital to minimizing barriers at the global level, and proactive communication removes many of the barriers that arise in cross-cultural meetings.
5.11 Benefits of global communication
Global communication is of great importance, and the sharing of information is its first and foremost benefit to the world. With the help of technological devices and the invention of communication devices such as smartphones and other smart devices, communication is no longer a task people find difficult. The administration of multinational enterprises also depends on the findings of global communication, and new opportunities emerge every day, with global communication being the most probable reason for them.
Besides this, there are other advantages of global communication, such as the spread of cultural education to various parts of the world; technology helps in identifying the vital characteristics of cultures worldwide and thus helps remove various cultural barriers, with substantial impacts on people (Lawrence, 2015).
Moreover, another factor exists that helps in specifying distinct aspects of communication and can act as a constraint when determining the precise operations needed to manage distribution globally. It is widely assumed in today's world that global communication provides a significant platform for communicating with and understanding the various cultures of the world, so that people live as if 'the whole world is our home.'
We should also notice the various barriers that exist and work to improve the level of communication among people. For efficient communication, we should identify the central elements that matter at the world level. While managing these barriers, potential perils can be largely eliminated, and new ideas can be used to build strong relationships with customers in the business, allowing entrepreneurs to obtain specific insight into the exact needs of their customers at all times.
Chapter 6: Conclusion
6.1 Conclusion
This research covers shilling attack types and brief the entire major and minor key of shilling
attack. The research will explain all advantage and disadvantage of this shilling attack. In this
research had explained how to shilling attack work and how can it protect. We discussed the
research about shilling attack focusing on shilling attack, algorithms, analysis. We examined the
effects of shilling attack in online selling recommendation on Amazon, Wal-Mart etc.
Our paper shows various attacks models and methods of detecting the fake user profiles with
various approaches mostly used by the researchers to deal with such attacks. Our study showed
multiple effects during the random attacks. Besides, we have also developed the cost protected
recommender system which is greatly capable to detect various shilling models at a particular
event.
A shilling attack detection method based on user reliability and rating time series has also been introduced. The underlying idea is that a user's reliability can be estimated from rating predictions and is negatively related to the deviation between the user's actual ratings and those predictions. Considering the group-based and opportunistic characteristics of shilling attacks, a credibility evaluation model built on multiple rating prediction models can derive proximity-based predictions (Zhou et al., 2016).
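To make this idea concrete, the sketch below estimates a per-user credibility score as the inverse of the average deviation between each user's ratings and a simple item-mean prediction, and ranks users so that the least credible ones are inspected first. It is a minimal illustration written in Python under assumed data structures (a dictionary of user ratings and an item-mean baseline predictor); it is not the exact model of Zhou et al. (2016).

from collections import defaultdict

def item_mean_predictions(ratings):
    # Baseline predictor (an assumption for this sketch): each item's mean rating.
    totals, counts = defaultdict(float), defaultdict(int)
    for user_ratings in ratings.values():
        for item, r in user_ratings.items():
            totals[item] += r
            counts[item] += 1
    return {item: totals[item] / counts[item] for item in totals}

def user_credibility(ratings):
    # Credibility = 1 / (1 + mean absolute deviation from the predicted ratings),
    # so users whose ratings deviate strongly from the predictions score lower.
    predicted = item_mean_predictions(ratings)
    credibility = {}
    for user, user_ratings in ratings.items():
        deviations = [abs(r - predicted[item]) for item, r in user_ratings.items()]
        credibility[user] = 1.0 / (1.0 + sum(deviations) / len(deviations))
    return credibility

# Toy example: 'u4' pushes item 'i1' and nukes 'i2', unlike the genuine users.
ratings = {
    'u1': {'i1': 2, 'i2': 4, 'i3': 3},
    'u2': {'i1': 2, 'i2': 5, 'i3': 3},
    'u3': {'i1': 3, 'i2': 4, 'i3': 4},
    'u4': {'i1': 5, 'i2': 1, 'i3': 5},
}
scores = user_credibility(ratings)
for user, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(user, round(score, 3))   # the least credible users are listed first

In practice the predictions would come from the recommender's own rating prediction models rather than an item mean, and credibility would be combined with the group-level and time-series evidence discussed below rather than used on its own.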
Moreover, suspicious rating time segments can be identified by constructing a rating time series for every product, so that the incoming data stream can be examined and suspect time windows flagged. When analysing the features of shilling attacks, profile-level indicators such as usage statistics (followers and friends), commenting rates and the number of comments are also used. In addition, the model and algorithm were verified against benchmark datasets and real e-commerce data.
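The sketch below illustrates the time-series step in a minimal form: one item's (timestamp, rating) pairs are bucketed into fixed-length windows, and a window is flagged when its rating volume and mean rating both diverge sharply from the rest of the series. The window length, thresholds and data layout are assumptions chosen for illustration, not the parameters used in the experiments.

from statistics import mean

def suspicious_windows(item_ratings, window_size=86400,
                       count_factor=2.0, mean_shift=1.0):
    # Bucket one item's (timestamp, rating) pairs into fixed-length windows and
    # flag windows whose rating volume and mean rating both deviate sharply from
    # the rest of the series. All thresholds here are illustrative assumptions.
    if not item_ratings:
        return []
    start = min(t for t, _ in item_ratings)
    buckets = {}
    for t, r in item_ratings:
        idx = int((t - start) // window_size)
        buckets.setdefault(idx, []).append(r)

    avg_count = len(item_ratings) / len(buckets)
    flagged = []
    for idx, rs in sorted(buckets.items()):
        others = [r for j, rest in buckets.items() if j != idx for r in rest]
        if not others:
            continue
        burst = len(rs) > count_factor * avg_count          # unusually many ratings
        shift = abs(mean(rs) - mean(others)) > mean_shift   # unusual average rating
        if burst and shift:
            flagged.append((idx, len(rs), round(mean(rs), 2)))
    return flagged

# Example: seven maximum ratings arriving within one day stand out as a suspect window.
data = [(0, 3), (40000, 4), (90000, 3), (200000, 2), (250000, 3), (360000, 4)]
data += [(300000 + k * 500, 5) for k in range(7)]
print(suspicious_windows(data))   # expected to flag the window containing the burst

Flagged windows would then be inspected together with the profile-level indicators mentioned above before any profile is treated as an attack profile.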
Recommender systems remain among the most important tools for combating information overload over a given period of time, because they can generate useful recommendations by drawing on the personal information of their users.
6.2 Outcome of the experiment
The developed coding and implementation modules had a significant positive impact on the recommender system and ultimately improved its customer service. The effectiveness of the recommender system was improved by applying detection technology carefully. A detection method built around group users reduces shilling attacks against the recommender system by adjusting the user-similarity metrics used by the algorithm. A range of models and algorithmic methods were applied to develop an effective detection algorithm that strengthens the recommender system. These methods help the system offer the right product and the right service so that the maximum benefit can be obtained. A suspect time series was incorporated into the user-level algorithm to prioritise suspicious activity and attack profiles so that potential attacks can be reduced as far as possible. The shilling detection method is therefore considered one of the feasible ways to keep the entire database consistent.
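As a complementary illustration of the detection side, the sketch below trains a simple SVM classifier on coarse per-profile features (number of ratings, mean rating, rating variance) to separate synthetic attack profiles from genuine ones. It assumes scikit-learn and NumPy are available, and the features, labels and data are invented for the example; it is not the exact pipeline or feature set used in this project.

import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

def profile_features(user_ratings):
    # Summarise one user's ratings as [count, mean, variance].
    r = np.asarray(user_ratings, dtype=float)
    return [len(r), r.mean(), r.var()]

# Toy labelled profiles: 0 = genuine, 1 = attack (many identical extreme ratings).
rng = np.random.default_rng(0)
genuine = [rng.integers(1, 6, size=int(rng.integers(5, 30))).tolist() for _ in range(50)]
attack = [[5] * int(n) for n in rng.integers(20, 40, size=50)]
X = np.array([profile_features(p) for p in genuine + attack])
y = np.array([0] * len(genuine) + [1] * len(attack))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', gamma='scale'))
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

In a real deployment the features would be drawn from the evidence discussed above (rating deviation, credibility scores and suspect time windows) rather than generated synthetically, and the labelled training data would come from the evaluated datasets.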
6.3 Future scope
The user algorithm helps deliver the right product at the right time, based on consumers' preferences and choices, so that the maximum benefit can be obtained. The research enables e-commerce sites to assess the needs and demands of consumers, which ultimately strengthens the customer relationship. The current strategy is essentially rule-based and runs on a single algorithmic method, and because attack activity is concentrated in very short time windows, traditional detection cannot capture it. In the present group model, only the rating relationships between users are considered. Future work could incorporate a wider set of factors, such as users' social accounts and activities, their time zone, how much time they spend on the site, how often they visit recommended items, and how they like and rate items; without such signals the system cannot reliably decide which similar products to present to each user.
6.4 Recommendation
6.4.1 Recommendation 1: Implementation of customer feedback
Specific: To implement a customer feedback option.
Measurable: This recommendation will help in assessing the needs and requirements of consumers, allowing the company to improve the quality of its services and enhance organisational performance. It is expected to increase the customer base from 10% to 22% by successfully meeting customer demands and expectations.
Achievable: Regular meetings and a feedback form should be provided, containing a review of existing services and suggestions, which helps in gaining customer loyalty.
Reasonable: This recommendation enables Seagull to develop a long-run relationship with consumers by eliminating potential risks and challenges.
Timescale: 2 months.
6.4.2 Recommendation 2: Encourage open communication with consumers
Specific: To encourage open communication with consumers.
Measurable: The effectiveness of this recommendation can be measured through consumer retention and the sales performance of Seagull. It will help build transparency between consumers and managers and is expected to increase financial performance from 20% to 35%.
Achievable: It can be achieved by implementing effective communication media, such as social media and email, for sharing information and new ideas, which also benefits productivity.
Reasonable: Open communication in the workplace reduces the risk of failure and builds a strong relationship with consumers. It enables Seagull to deliver the project on time, in line with consumers' demands and requirements, for progressive development.
Timescale: 1 week.
Reference list
Alonso, S., Bobadilla, J., Ortega, F. and Moya, R. (2019). Robust Model-Based Reliability
Approach to Tackle Shilling Attacks in Collaborative Filtering Recommender Systems. IEEE
Access, 7, pp.41782-41798.
Alostad, J. (2019). Improving the Shilling Attack Detection in Recommender Systems Using an
SVM Gaussian Mixture Model. Journal of Information & Knowledge Management, 18(01),
p.1950011.
Bilge, A., Ozdemir, Z. and Polat, H. (2014). A Novel Shilling Attack Detection
Method. Procedia Computer Science, 31, pp.165-174.
Dhawan, D. (2016). Implications of Various Fake Profile Detection Techniques in Social
Networks. IOSR Journal of Computer Engineering, 02(02), pp.49-55.
Dwivedi, M. (2018). Detection of Fake Users Account Based on Review. International Journal
for Research in Applied Science and Engineering Technology, 6(6), pp.20-24.
Fake User Profile Detection on Online Social Networking. (2017). International Journal of Advance Engineering and Research Development, 4(04).
Fire, M., Kagan, D., Elyashar, A. and Elovici, Y. (2014). Friend or foe? Fake profile
identification in online social networks. Social Network Analysis and Mining, 4(1).
Gurajala, S., White, J., Hudson, B., Voter, B. and Matthews, J. (2016). Profile characteristics of
fake Twitter accounts. Big Data & Society, 3(2), p.205395171667423.
Hemelrijk, J. (2008). A Fake or not a Fake... BABESCH - Bulletin Antieke Beschaving, 83(0), pp.47-60.
Hupp, P., Heene, M., Jacob, R. and Pflüger, D. (2016). Global communication schemes for the
numerical solution of high-dimensional PDEs. Parallel Computing, 52, pp.78-105.
Lawrence, T. (2015). Global Leadership Communication: A Strategic Proposal. Creighton
Journal of Interdisciplinary Leadership, 1(1), p.51.
Li, T. (2014). Shilling attack detection algorithm based on Genetic optimization. International
Journal of Security and Its Applications, 8(4), pp.273-286.
Li, W., Gao, M., Li, H., Zeng, J., Xiong, Q. and Hirokawa, S. (2016). Shilling Attack
Detection in Recommender Systems via Selecting Patterns Analysis. IEICE Transactions on
Information and Systems, E99.D(10), pp.2600-2611.
Most People Can Identify Fake Laughter. (2018). ASHA Leader, 23(10), p.16.
Patentimages.storage.googleapis.com. (2019). [online] Available at:
https://patentimages.storage.googleapis.com/e9/21/c4/3cd42963873067/US9369198.pdf
[Accessed 25 Jul. 2019].
Ramalingam, D. and Chinnaiah, V. (2018). Fake profile detection techniques in large-scale
online social networks: A comprehensive review. Computers & Electrical Engineering, 65,
pp.165-177.
Analytics Vidhya (2019). 6 Easy Steps to Learn Naive Bayes Algorithm (with code in Python). [online] Available at: https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/ [Accessed 25 Jul. 2019].
Teixeira da Silva, J. (2017). Fake peer reviews, fake identities, fake accounts, fake data:
beware!. AME Medical Journal, 2, pp.28-28.
Zhang, F. (2015). Robust Analysis of Network based Recommendation Algorithms against
Shilling Attacks. International Journal of Security and Its Applications, 9(3), pp.13-24.
Zhang, F. (2016). Segment-Focused Shilling Attacks against Recommendation Algorithms in
Binary Ratings-based Recommender Systems. International Journal of Hybrid Information
Technology, 9(2), pp.381-388.
Zhang, F. and Zhou, Q. (2014). HHT–SVM: An online method for detecting profile injection
attacks in collaborative recommender systems. Knowledge-Based Systems, 65, pp.96-105.
Zhou, W., Wen, J., Koh, Y., Xiong, Q., Gao, M., Dobbie, G. and Alam, S. (2015). Shilling
Attacks Detection in Recommender Systems Based on Target Item Analysis. PLOS ONE, 10(7),
p.e0130968.
Zhou, W., Wen, J., Qu, Q., Zeng, J. and Cheng, T. (2018). Shilling attack detection for
recommender systems based on credibility of group users and rating time series. PLOS ONE,
13(5), p.e0196533.
Zhou, W., Wen, J., Xiong, Q., Gao, M. and Zeng, J. (2016). SVM-TIA a shilling attack detection
method based on SVM and target item analysis in recommender systems. Neurocomputing, 210,
pp.197-205.