Data Analytics: Initial Data Exploration and Preprocessing Techniques


Added on  2023/06/12

Report
AI Summary
This report provides an initial exploration of a dataset related to US permanent visa applications, examining the frequency distribution of various attributes. The report covers aspects such as agent city, case status, class of admission, country of citizenship, employer details, and foreign worker information. It further delves into data preprocessing techniques, including equi-width binning, equi-depth binning, min/max normalization, z-score normalization, discretization, and binarization. The analysis aims to provide insights into the dataset's characteristics and prepare the data for effective modeling, concluding with key observations and summaries from the exploration.
Introduction to Data Analytics
Table of Contents
Section 1A Initial Data Exploration
row ID
add_these_pw_job_title_9089
agent_city
agent_firm_name
agent_state
application_type
case_no
case_number
case_received_date
case_status
class_of_admission
country_of_citizenship
country_of_citzenship
decision_date
employer_address_1
employer_address_2
employer_city
employer_country
employer_decl_info_title
employer_name
employer_num_employees
employer_phone
employer_phone_ext
employer_postal_code
employer_state
employer_yr_estab
foreign_worker_info_alt_edu_experience
foreign_worker_info_birth_country
foreign_worker_info_city
foreign_worker_info_education
Section 1B Data Preprocessing
Binning
Equi-Width Binning
Equi-depth Binning
Normalization
Discretise
Binarise
Section 1C Summarize
Conclusion
Reference
Section 1A Initial Data Exploration
The aim of this study is to explore the given dataset, which consists of US permanent visa
applications. Various organizations in the US hire foreign workers to work for them in the US.
Before a foreign worker can be hired, however, the organization has to submit an application
to the Department of Homeland Security. In addition, the organization intending to hire the
foreign worker has to certify to the Department of Labor that employing the foreign worker
would not adversely affect the wages and working conditions of US citizens who have similar
educational experience.
The given data is pre-processed and examined for the frequency distribution of the different
variables under study.
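A frequency distribution of this kind can be sketched in a few lines of Python. The snippet below is only a minimal illustration and not the tool used to produce this report; the function name `frequency_distribution` and the sample values are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def frequency_distribution(values):
    """Return (value, count) pairs sorted by descending count, then by value."""
    counts = Counter(values)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample of a categorical attribute (not real data from the report).
sample_status = ["Certified", "Denied", "Certified", "Withdrawn", "Certified", "Denied"]

for value, count in frequency_distribution(sample_status):
    print(f"{value}: {count}")
```

Sorting by count first makes the most and least frequent categories easy to read off from the two ends of the list.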
row ID
Attribute: The row ID is a nominal variable, since each value is a distinct identifier.
add_these_pw_job_title_9089
The data for the attribute is missing.
agent_city
Attribute: The variable agent_city is a nominal variable; it represents the city the agent
comes from.
Spread: There were 215 missing values.
The largest number of agents came from San Francisco (189).
The smallest number of agents from any single city was 1.
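The spread figures reported for a nominal attribute (missing values, highest count, lowest count) could be computed along the following lines. This is a sketch under stated assumptions: missing entries are assumed to appear as `None` or an empty string, at least one non-missing value is assumed to exist, and the function name `summarize_spread` and the sample list are hypothetical.

```python
from collections import Counter


def summarize_spread(values, missing_markers=(None, "")):
    """Count missing entries and report the highest and lowest category counts.

    Assumes at least one non-missing value is present.
    """
    present = [v for v in values if v not in missing_markers]
    counts = Counter(present)
    most_common_value, max_count = counts.most_common(1)[0]
    return {
        "missing": len(values) - len(present),
        "most_common": most_common_value,
        "max_count": max_count,
        "min_count": min(counts.values()),
    }


# Hypothetical sample (assumption: None and "" both mean "missing").
sample_cities = ["San Francisco", None, "New York", "San Francisco", "", "Chicago"]
summary = summarize_spread(sample_cities)
print(summary)
```

Missing entries are filtered out before counting so that they do not show up as a category of their own.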
agent_firm_name
Attribute: The variable agent_firm_name is a nominal variable, since its values are distinct labels.
agent_state
Attribute: The variable agent_state is a nominal variable, since its values are distinct labels.
application_type
The data for the attribute is missing.
case_no
The data for the attribute is missing.
case_number
Attribute: The variable case_number is a nominal variable, since each value is a distinct identifier.
case_received_date
Attribute: The variable case_received_date is treated as a nominal variable here, since its values are handled as distinct labels.
case_status
Attribute: The variable case_status is a nominal variable, since its values are distinct categories.
Statistics:
class_of_admission
Attribute: The variable class_of_admission is a nominal variable, since its values are distinct categories.
Statistics:
The most common class of admission is the H-1B visa, with 1555 cases.
The least common classes of admission are J-2, L-2, H-2, H-1B1 and R-1, with 1 case each.
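When several categories share the lowest count, as above, it helps to list all of them rather than a single minimum. A minimal sketch follows; the function name `least_frequent` and the sample values are hypothetical and do not reproduce the counts in this report.

```python
from collections import Counter


def least_frequent(values):
    """Return the lowest count and every category that occurs exactly that often."""
    counts = Counter(values)
    lowest = min(counts.values())
    tied = sorted(value for value, count in counts.items() if count == lowest)
    return lowest, tied


# Hypothetical sample of visa classes (not the real distribution).
sample_classes = ["H-1B", "H-1B", "L-1", "H-1B", "F-1", "L-1", "J-2", "R-1"]
lowest, tied = least_frequent(sample_classes)
print(f"Lowest count: {lowest}, classes: {', '.join(tied)}")
```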
country_of_citizenship
Attribute: The variable country_of_citizenship is a nominal variable, since its values are distinct categories.
Statistics:
The largest number of applicants are citizens of India (1190).
The smallest count is 1, which is shared by many countries.
country_of_citzenship
The data for the attribute is missing.
decision_date
Attribute: The variable decision_date is treated as a nominal variable here, since its values are handled as distinct labels.
employer_address_1
Attribute: The variable employer_address_1 is a nominal variable, since its values are distinct labels.
employer_address_2
Attribute: The variable employer_address_2 is a nominal variable, since its values are distinct labels.
employer_city
Attribute: The variable employer_city is a nominal variable, since its values are distinct labels.
employer_country
Attribute: The variable employer_country is a nominal variable, since its values are distinct labels.
Spread: The employer country is the United States of America.
employer_decl_info_title
Attribute: The variable employer_decl_info_title is a nominal variable, since its values are distinct labels.
Spread: It represents the title of the employer
employer_name
Attribute: The variable employer_name is a nominal variable, since its values are distinct labels.