Data Analysis of CIS Faculty Using Hadoop and Pig: Assignment
VerifiedAdded on 2023/04/20
|2
|599
|300
Homework Assignment
AI Summary
This assignment solution demonstrates the use of Hadoop and Pig for analyzing CIS faculty data. The process begins with loading the 'CIS_FacultyList.csv' file into HDFS storage. The solution then utilizes Pig to create new datasets based on the degree level (Bachelors, Masters, Doctorate), years of teaching experience (less than 5 years, more than 5 years), and the location of the last degree (North America, elsewhere). The 'SPLIT' command is used extensively to categorize and filter the data. Finally, the solution includes commands to copy the resulting datasets from HDFS back to the local file system. The assignment covers essential aspects of data manipulation and processing within a Hadoop environment, including data loading, splitting, and dataset creation using Pig Latin.
1 out of 2


