Project: Phishing Website Detection Using Machine Learning Techniques


Added on 2023/01/20

AI Summary

This project delves into the critical issue of phishing website detection, employing machine learning techniques to identify and mitigate the risks associated with these malicious sites. The project begins with an introduction to phishing, its goals, and objectives, emphasizing the use of machine learning in detecting phishing websites. A comprehensive literature review is conducted, comparing various machine learning algorithms and features, including lexical and host-based features, to understand their effectiveness in classifying phishing URLs. The project also highlights the importance of user education and software-based approaches in combating phishing attacks. The methodology utilizes a positivism philosophy, incorporating both human and scientific perspectives, with primary data collection planned. A detailed project schedule, including a Gantt chart, outlines the tasks, duration, and dependencies. The conclusion emphasizes the significance of online algorithms and the need for continuous adaptation to counter evolving phishing tactics, along with future research directions. The project aims to provide a framework for understanding and combating phishing attacks, contributing to enhanced cybersecurity measures.

Introduction:
 The term "phishing" is a variant spelling of "fishing".
 It is an attack in which the attacker lures the user into visiting
a fake website.
 The fake website is made to look similar to a legitimate
website.
 The attacker then steals the user's personal
information.

Goals
The main goal of this project is to understand the basic concept of phishing
and how machine learning can be used to detect phishing websites.
Phishing website detection techniques fall into two broad categories:
user education and software.
The user education approach teaches users safe browsing practices. The software
approach consists of

Objectives
 To understand phishing
 To understand how machine
learning can be used to
detect phishing websites

Literature review:
Ma et al. compare different batch-based learning algorithms for
classifying phishing URLs and show that combining host-based and
lexical features yields the highest classification accuracy. The
paper also compares the performance of batch-based algorithms with
that of online algorithms when the full feature set is used. They find
that online algorithms, in particular Confidence-Weighted (CW),
outperform the batch-based algorithms.
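
The difference between batch and online learning can be illustrated with a minimal sketch. It does not implement Confidence-Weighted learning; a plain perceptron is swapped in as a simpler online learner, and the feature names and toy examples are assumptions made only for illustration. The point is that the model is updated one labeled URL at a time instead of being refit on the whole dataset.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]


class OnlinePerceptron:
    """Online linear classifier: weights change after each single example."""

    def __init__(self) -> None:
        self.weights: Dict[str, float] = {}
        self.bias = 0.0

    def score(self, x: Features) -> float:
        return self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def predict(self, x: Features) -> int:
        # +1 means "phishing", -1 means "not phishing".
        return 1 if self.score(x) > 0 else -1

    def update(self, x: Features, y: int) -> bool:
        """Process one labeled example; return True if a mistake was corrected."""
        if self.predict(x) == y:
            return False
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) + y * v
        self.bias += y
        return True


# Toy stream of (features, label) pairs; values are invented for illustration.
stream: List[Tuple[Features, int]] = [
    ({"has_ip_host": 1.0, "url_length": 0.9}, 1),
    ({"has_ip_host": 0.0, "url_length": 0.2}, -1),
    ({"has_ip_host": 1.0, "url_length": 0.7}, 1),
    ({"has_ip_host": 0.0, "url_length": 0.3}, -1),
]

model = OnlinePerceptron()
for _ in range(5):  # a few passes over the stream
    for x, y in stream:
        model.update(x, y)
```

Because each update touches only the current example, a learner of this kind can keep adapting as new URLs arrive.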

Literature review:
Garera et al. use logistic regression over hand-selected features
to classify phishing URLs. The features include the presence of
red-flag keywords in the URL, features based on Google's PageRank,
and Google's Web page quality guidelines. However, a direct
comparison with this approach is difficult without access to the
same URLs and features.
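
As a rough sketch of the idea, logistic regression over hand-selected keyword features might look like the following. The keyword list, the toy URLs, and the training settings are all assumptions; the actual features used by Garera et al. (including those based on PageRank) are not reproduced here.

```python
import math
from typing import List, Tuple

# Hypothetical red-flag keywords; the source does not list the actual ones.
RED_FLAG_KEYWORDS = ("login", "signin", "verify", "account", "secure")


def keyword_features(url: str) -> List[float]:
    """One binary feature per keyword: 1.0 if it occurs in the URL."""
    lowered = url.lower()
    return [1.0 if kw in lowered else 0.0 for kw in RED_FLAG_KEYWORDS]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(X: List[List[float]], y: List[int],
          lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, xi)))
            error = p - yi  # gradient of the log loss w.r.t. the score
            weights = [w - lr * error * v for w, v in zip(weights, xi)]
            bias -= lr * error
    return weights, bias


def phishing_probability(url: str, weights: List[float], bias: float) -> float:
    x = keyword_features(url)
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Invented training data: label 1 = phishing, 0 = not phishing.
data = [
    ("http://example.test/secure-login/verify", 1),
    ("http://example.test/account/signin", 1),
    ("http://example.org/docs/tutorial", 0),
    ("http://example.org/blog/2019/05", 0),
]
X = [keyword_features(u) for u, _ in data]
y = [label for _, label in data]
weights, bias = train(X, y)
```

With real data, the same structure would apply, but the feature set and the labeled URLs would come from the datasets under study.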

Literature review:
McGrath and Gupta do not construct a classifier; instead, they
conduct a comparative analysis of phishing and non-phishing URLs
across datasets. The authors compare non-phishing URLs drawn from
the DMOZ Open Directory Project with phishing URLs obtained from
PhishTank. The features analyzed in the paper include IP addresses,
WHOIS thin records (containing dates and registrar-provided
information), geographic information, and lexical features of the URL
such as length, character distribution, and the presence of
predefined brand names.
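
The lexical features mentioned above can be computed from the URL string alone. The following is a minimal sketch, not the authors' method: the brand list is a hypothetical placeholder, and the feature names are chosen only for this example.

```python
import ipaddress
from collections import Counter
from typing import Dict
from urllib.parse import urlsplit

# Hypothetical brand names; a real study would use a predefined list.
BRAND_NAMES = ("examplebank", "examplepay")


def host_is_ip_address(host: str) -> bool:
    """True if the host part is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> Dict[str, object]:
    """Extract simple lexical features from a URL string."""
    host = urlsplit(url).hostname or ""
    lowered = url.lower()
    total = len(lowered) or 1  # avoid division by zero for empty input
    counts = Counter(lowered)
    return {
        "url_length": len(url),
        "host_is_ip": host_is_ip_address(host),
        "digit_ratio": sum(c.isdigit() for c in lowered) / total,
        # Relative frequency of each character in the URL.
        "char_distribution": {c: n / total for c, n in counts.items()},
        "brand_names": [b for b in BRAND_NAMES if b in lowered],
    }


features = lexical_features("http://192.0.2.10/examplebank/login")
```

Host-based features such as WHOIS records or geographic information need network lookups and are left out of this sketch.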

Summary of review:
 Comparing different features across different data mining algorithms shows
that efficient classification is achievable when lexical features are used.
 To protect end users from visiting phishing websites, phishing URLs need to be
identified by analysing their lexical and host-based
features.
 One problem in this domain is that cyber criminals constantly create new
strategies for breaching defence measures.

Project planning:
 Methodology
 A positivist philosophy will be used for the entire research.
 It combines the human perspective with the scientific
perspective.
 Primary data will be collected.
 Both quantitative and qualitative data will be used.

Project planning:
Scope:
The project aims to understand how phishing websites
can be detected using a machine learning
framework.

Project Schedule:
ID | Task Mode      | Task Name                                           | Duration | Start       | Finish      | Predecessors | Resource Names
 1 | Auto Scheduled | Research Project                                    | 48 days  | Wed 12/7/16 | Fri 2/10/17 |              |
 2 | Auto Scheduled | Research Proposal                                   | 8 days   | Fri 4/26/19 | Tue 5/7/19  |              |
 3 | Auto Scheduled | Choosing a topic for research                       | 3 days   | Fri 4/26/19 | Tue 4/30/19 |              |
 4 | Auto Scheduled | Background Study of the Research                    | 6 days   | Fri 4/26/19 | Fri 5/3/19  |              |
 5 | Auto Scheduled | Development of the Research Question                | 3 days   | Wed 5/1/19  | Fri 5/3/19  | 3            |
 6 | Auto Scheduled | Designing the Conceptual Framework                  | 2 days   | Mon 5/6/19  | Tue 5/7/19  | 4            |
 7 | Auto Scheduled | Development of the Research Question                | 2 days   | Mon 5/6/19  | Tue 5/7/19  | 5            |
 8 | Auto Scheduled | Research Proposal Submission                        | 0 days   | Tue 5/7/19  | Tue 5/7/19  | 6            |
 9 | Auto Scheduled | Review of the Literature and Collection of the Data | 27 days  | Fri 4/26/19 | Mon 6/3/19  |              |
10 | Auto Scheduled | Reviewing the available literature                  | 10 days  | Wed 5/8/19  | Tue 5/21/19 | 7            |
11 | Auto Scheduled | Selecting target population for collecting the data | 2 days   | Fri 4/26/19 | Mon 4/29/19 |              |
12 | Auto Scheduled | Collecting the Data of the Research Study           | 9 days   | Wed 5/22/19 | Mon 6/3/19  | 10           |
13 | Auto Scheduled | Analysing the gathered data                         | 4 days   | Tue 4/30/19 | Fri 5/3/19  | 11           |
14 | Auto Scheduled | Submission of the Draft Research paper              | 0 days   | Mon 6/3/19  | Mon 6/3/19  | 12           |
15 | Auto Scheduled | Submission of the Final Project Paper               | 10 days  | Fri 4/26/19 | Thu 5/9/19  |              |
16 | Auto Scheduled | Critical Analysis of the findings                   | 3 days   | Mon 5/6/19  | Wed 5/8/19  | 13           |
17 | Auto Scheduled | Concluding the Findings of the Study                | 3 days   | Fri 4/26/19 | Tue 4/30/19 |              |
18 | Auto Scheduled | Recommendations                                     | 1 day    | Thu 5/9/19  | Thu 5/9/19  | 16           |
19 | Auto Scheduled | Submitting the Final Project Report                 | 0 days   | Tue 4/30/19 | Tue 4/30/19 | 17           |
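
The durations in the schedule appear to count working days (Monday to Friday) from the start date to the finish date, inclusive. The sketch below checks this under the assumption that no holidays are excluded; the function name is chosen only for this example.

```python
from datetime import date, timedelta


def working_days(start: date, finish: date) -> int:
    """Count Monday-to-Friday days from start to finish, inclusive."""
    days = 0
    current = start
    while current <= finish:
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days += 1
        current += timedelta(days=1)
    return days


# Research Proposal: Fri 4/26/19 to Tue 5/7/19
proposal_days = working_days(date(2019, 4, 26), date(2019, 5, 7))    # 8
# Research Project: Wed 12/7/16 to Fri 2/10/17
project_days = working_days(date(2016, 12, 7), date(2017, 2, 10))    # 48
```

Milestones listed with 0 days are a scheduling convention and are not covered by this count.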