Data Analysis of CIS_FacultyList.csv using Pig and HDFS

Added on 2020/04/21

Homework Assignment
AI Summary
This assignment covers data analysis with Apache Pig and the Hadoop Distributed File System (HDFS). The task is to load the 'CIS_FacultyList.csv' dataset into HDFS and then manipulate it with Pig. The solution creates new datasets based on three criteria: degree level (Bachelors, Masters, Doctorate), years of teaching experience (less than 5 years, more than 5 years), and the location of the last degree (North America or elsewhere). It uses Pig's SPLIT operator to categorize the data and displays the schema and data of each resulting relation (in Pig, DUMP prints a relation's contents and DESCRIBE prints its schema). Finally, the solution includes instructions for copying the processed data from HDFS back to the local file system. The exercise gives hands-on practice with loading, partitioning, and exporting data in a Hadoop environment.
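The workflow described in the summary can be sketched as a short Pig script. This is only an illustrative sketch, not the assignment's actual solution: the summary does not give the column names, value spellings, or paths, so the schema (`name`, `degree`, `years_teaching`, `degree_location`), the literal values such as 'Doctorate' and 'North America', and every path under `/user/hypothetical` are assumptions to be replaced with the real ones.

```pig
-- Assumed layout: all paths and column names below are hypothetical placeholders.
-- Copy the local CSV into HDFS using Pig's built-in fs command.
fs -mkdir -p /user/hypothetical/faculty
fs -put CIS_FacultyList.csv /user/hypothetical/faculty/

-- Load the file with an assumed four-column schema.
-- If the CSV has a header row, its years_teaching value will not cast to int
-- and becomes null, so that row matches none of the numeric conditions below.
faculty = LOAD '/user/hypothetical/faculty/CIS_FacultyList.csv'
          USING PigStorage(',')
          AS (name:chararray, degree:chararray,
              years_teaching:int, degree_location:chararray);

-- Split by degree level (value spellings are assumptions).
SPLIT faculty INTO
    bachelors IF degree == 'Bachelors',
    masters   IF degree == 'Masters',
    doctorate IF degree == 'Doctorate';

-- Split by teaching experience. Following the summary's wording
-- ("less than 5", "more than 5"), a value of exactly 5 matches neither branch.
SPLIT faculty INTO
    under_five IF years_teaching < 5,
    over_five  IF years_teaching > 5;

-- Split by location of last degree; OTHERWISE catches every remaining row.
SPLIT faculty INTO
    north_america IF degree_location == 'North America',
    elsewhere     OTHERWISE;

-- Inspect one of the results: DESCRIBE prints the schema, DUMP prints the rows.
DESCRIBE doctorate;
DUMP doctorate;

-- Write a result to HDFS, then copy it back to the local file system.
STORE doctorate INTO '/user/hypothetical/faculty/out/doctorate'
      USING PigStorage(',');
fs -copyToLocal /user/hypothetical/faculty/out/doctorate ./doctorate
```

SPLIT can route one row to several relations or to none, which is why the experience split leaves out faculty with exactly 5 years unless one condition is widened (for example `>= 5`) or an OTHERWISE branch is added.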