Data Analytics: Initial Data Exploration and Preprocessing Techniques
Added on 2023/06/12
Report
AI Summary
This report provides an initial exploration of a dataset related to US permanent visa applications, examining the frequency distribution of various attributes. The report covers aspects such as agent city, case status, class of admission, country of citizenship, employer details, and foreign worker information. It further delves into data preprocessing techniques, including equi-width binning, equi-depth binning, min/max normalization, z-score normalization, discretization, and binarization. The analysis aims to provide insights into the dataset's characteristics and prepare the data for effective modeling, concluding with key observations and summaries from the exploration.

Introduction to Data Analytics

Table of Contents
Section 1A Initial Data Exploration
row ID
add_these_pw_job_title_9089
agent_city
agent_firm_name
agent_state
application_type
case_no
case_number
case_received_date
case_status
class_of_admission
country_of_citizenship
country_of_citzenship
decision_date
employer_address_1
employer_address_2
employer_city
employer_country
employer_decl_info_title
employer_name
employer_num_employees
employer_phone
employer_phone_ext
employer_postal_code
employer_state
employer_yr_estab
foreign_worker_info_alt_edu_experience
foreign_worker_info_birth_country
foreign_worker_info_city
foreign_worker_info_education
Section 1B Data Preprocessing
Binning
Equi-Width Binning
Equi-depth Binning
Normalization
Discretise
Binarise
Section 1C Summarize
Conclusion
Reference

Section 1A Initial Data Exploration
The aim of the present study is to explore the given dataset, which consists of US
permanent visa applications. Various organizations in the US hire foreign workers to work for
them in the US. However, before a foreign worker can be hired, the organization has to submit an
application to the Department of Homeland Security. In addition, the organization intending to
hire the foreign worker has to certify to the Department of Labor that employing the
foreign worker would not adversely affect the wages and working conditions of US workers who
have similar education and experience.
The given data is pre-processed and examined for the frequency distribution of the different
variables under study.
row ID
Attribute: row ID is a nominal variable, since each value is a distinct identifier.
add_these_pw_job_title_9089
The data for this attribute is missing.
agent_city
Attribute: agent_city is a nominal variable; each value is the city in which the agent is
based.
Spread: There were 215 missing values.
The largest number of agents (189) were from San Francisco.
The smallest number of agents from any city was 1.
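The spread figures reported for a nominal attribute such as agent_city (the missing-value count and the most and least frequent values) can be read off a simple frequency table. The following is a minimal sketch in Python, not the procedure used in this report. The function name summarize_nominal, the set of markers treated as missing, and the sample values are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: these markers denote a missing value in the raw data.
MISSING_MARKERS = {None, "", "NA", "?"}


def summarize_nominal(values):
    """Return the missing count, frequency table, and most/least frequent values."""
    present = [v for v in values if v not in MISSING_MARKERS]
    counts = Counter(present)
    if not counts:
        # Every value is missing, so there is no frequency table to report.
        return {"missing": len(values), "counts": counts, "most": [], "least": []}
    highest = max(counts.values())
    lowest = min(counts.values())
    return {
        "missing": len(values) - len(present),
        "counts": counts,
        # Ties are kept, because several values can share the same frequency.
        "most": sorted(k for k, c in counts.items() if c == highest),
        "least": sorted(k for k, c in counts.items() if c == lowest),
    }


# Hypothetical sample, not taken from the dataset.
sample = ["San Francisco", "New York", None, "San Francisco", "", "Austin"]
summary = summarize_nominal(sample)
print(summary["missing"], summary["most"], summary["least"])
```

Returning every value tied for the lowest count, rather than a single one, matches the way the report describes minima that are shared by several categories.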

agent_firm_name
Attribute: agent_firm_name is a nominal variable, since its values are distinct names with no inherent order.
agent_state
Attribute: agent_state is a nominal variable, since its values are state labels with no inherent order.
application_type
The data for this attribute is missing.
case_no
The data for this attribute is missing.
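Several attributes in this section (add_these_pw_job_title_9089, application_type, case_no) are reported as containing no data. A check of this kind could be sketched as follows, assuming the records are held as a list of dictionaries keyed by attribute name. The helper name fully_missing_attributes, the missing-value markers, and the sample rows are hypothetical and are not taken from the report.

```python
# Assumption: these markers denote a missing value in the raw data.
MISSING_MARKERS = {None, "", "NA", "?"}


def fully_missing_attributes(rows):
    """Return the attribute names for which every row has a missing value."""
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)  # keep first-seen attribute order
    return [
        name
        for name in names
        # An attribute absent from a row counts as missing (row.get gives None).
        if all(row.get(name) in MISSING_MARKERS for row in rows)
    ]


# Hypothetical rows, not taken from the dataset.
rows = [
    {"case_no": "", "agent_state": "CA", "application_type": None},
    {"case_no": None, "agent_state": "", "application_type": None},
]
print(fully_missing_attributes(rows))
```

An attribute flagged by such a check carries no information for the frequency analysis and would normally be set aside before further exploration.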
case_number
Attribute: case_number is a nominal variable, since each value is a distinct identifier.
case_received_date
Attribute: case_received_date is treated here as a nominal variable, with each date regarded as a distinct label.
case_status
Attribute: case_status is a nominal variable, since its values are category labels with no inherent order.
Statistics:

class_of_admission
Attribute: class_of_admission is a nominal variable, since its values are category labels with no inherent order.
Statistics:
The most frequent class of admission was the H-1B visa, with 1555 applications.
The least frequent classes were J-2, L-2, H-2, H-1B1 and R-1, with 1 application each.

country_of_citizenship
Attribute: country_of_citizenship is a nominal variable; its values are country names with no inherent order.
Statistics:
The largest number of applicants (1190) were citizens of India.
The smallest count was 1, which was shared by many countries.

country_of_citzenship
The data for this attribute is missing.
decision_date
Attribute: decision_date is treated here as a nominal variable, with each date regarded as a distinct label.

employer_address_1
Attribute: employer_address_1 is a nominal variable, since its values are distinct address
strings with no inherent order.
employer_address_2
Attribute: employer_address_2 is a nominal variable, since its values are distinct address
strings with no inherent order.
employer_city
Attribute: employer_city is a nominal variable, since its values are city names with no inherent order.
employer_country
Attribute: employer_country is a nominal variable, since its values are country names with no inherent order.
Spread: The only employer country recorded is the United States of America.
employer_decl_info_title
Attribute: employer_decl_info_title is a nominal variable, since its values are title
labels with no inherent order.
Spread: It represents the title of the employer.
employer_name
Attribute: employer_name is a nominal variable, since its values are distinct names with no inherent order.
