Upload and categorize faculty list using Pig in Hadoop

Added on 2023-04-20

About This Document

This tutorial explains how to upload and categorize a faculty list using Pig in Hadoop. It covers steps to copy the dataset to HDFS, create new datasets based on criteria such as degree level, years of teaching, and last degree obtained, and copy the files from HDFS to the local file system.

1. Upload the dataset ‘CIS_FacultyList.csv’ into HDFS storage on the cluster, in your designated storage
space. If you are copying from Windows rather than Linux, you can use WinSCP, a file transfer client
that lets you drag and drop files between a Windows machine and a Linux host.
>hadoop fs -copyFromLocal CIS_FacultyList.csv
Before copying the CSV file to HDFS, make sure it is present in the Linux local file system. To check,
list the current directory with the command below.
>ls
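The check and the upload above can be combined into one small shell function. This is a minimal sketch, not part of the original tutorial: the function name `upload_to_hdfs` and its return codes are hypothetical, and it assumes the `hadoop` client is on the PATH and that the destination (by default `.`, the HDFS working directory, normally your HDFS home) is your designated storage space.

```shell
#!/bin/sh
# Hypothetical helper: verify the local file, copy it to HDFS, list the result.
# Usage: upload_to_hdfs <local-file> [hdfs-destination]
upload_to_hdfs() {
    src="$1"
    dest="${2:-.}"   # assumption: "." is the designated HDFS storage space

    # Step 1: confirm the file exists on the local Linux file system.
    if [ ! -f "$src" ]; then
        echo "local file not found: $src" >&2
        return 1
    fi

    # Step 2: confirm a hadoop client is available before calling it.
    if ! command -v hadoop >/dev/null 2>&1; then
        echo "hadoop client not found on PATH" >&2
        return 2
    fi

    # Step 3: copy the file into HDFS.
    hadoop fs -copyFromLocal "$src" "$dest" || return 3

    # Step 4: list the destination to confirm the file arrived.
    hadoop fs -ls "$dest"
}
```

For example, `upload_to_hdfs CIS_FacultyList.csv` reproduces the command shown above, with the local check done first.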
2. Use Pig to create new datasets from the source file that categorise the instructors using the following
criteria:
a. The degree level: Bachelors, Masters or Doctorate
b. Number of years of teaching: less than 5 years, or more than 5 years
c. Whether the last degree was obtained from North America, Europe or elsewhere
HINT: Consider using the Pig Latin SPLIT (partition), FOREACH and GROUP statement constructs.
>pig
This command opens the Grunt shell, where Pig commands can be executed.
> CIS_Faculty = LOAD 'CIS_FacultyList.csv' USING PigStorage(',') AS (ID:int, Name:chararray,
Location:chararray, Grade:chararray, Title:chararray, Join_Date:chararray, LWD:chararray, Type:chararray,
Division:chararray, Reports_To:chararray, Highest:chararray, Highest_Qualification:chararray,
Major:chararray, University:chararray, All_Qualifications_from_Profile:chararray,
Courses_Taught_Term_201510:chararray, Major_Teaching_Field:chararray,
Document_Other_Professional_Certification_Criteria_Five_Years_Work_Experience_Teaching_Excellence_Professional:chararray,
Criteria:chararray);
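The above command loads the data. Field names in a Pig schema cannot contain spaces, so the column
headers are written with underscores, and text columns use the chararray type. To inspect the loaded
records, use the DUMP command at any point; DESCRIBE prints the schema.
> DUMP CIS_Faculty;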
Create new datasets that categorise the instructors by degree level (Bachelors, Masters or Doctorate)
using SPLIT. The doctorate condition uses OR, because a single value cannot equal both 'Doctorate' and
'Ph.D' at once.
>SPLIT CIS_Faculty INTO Bachelor_Level IF Highest == 'Bachelor', Master_Level IF Highest ==
'Masters', Doctorate_Level IF (Highest == 'Doctorate' OR Highest == 'Ph.D');
>DUMP Bachelor_Level;
>DUMP Master_Level;
>DUMP Doctorate_Level;
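The degree-level split can also be run outside the Grunt shell as a batch script. The sketch below writes the Pig Latin statements to a file and runs it in local mode if a `pig` client and the CSV file are present. Several things here are assumptions rather than part of the tutorial: the script name `categorize_degree.pig`, the `out/...` output directories, the `Other_Level` catch-all branch, and the use of the positional reference `$10` for the Highest column (it follows the column order of the LOAD schema above).

```shell
#!/bin/sh
# Write the Pig Latin statements to a script file. The quoted 'EOF' keeps the
# shell from expanding $0, $1 and $10, which are Pig positional references.
cat > categorize_degree.pig <<'EOF'
-- Load without a schema and keep only the columns this step needs.
-- Assumption: Highest is the 11th column ($10), as in the tutorial's schema.
raw = LOAD 'CIS_FacultyList.csv' USING PigStorage(',');
faculty = FOREACH raw GENERATE (int)$0 AS ID, (chararray)$1 AS Name,
                               (chararray)$10 AS Highest;

-- A row is sent to every branch whose condition is true.
-- Other_Level is a hypothetical catch-all for unexpected spellings.
SPLIT faculty INTO
    Bachelor_Level  IF Highest == 'Bachelor',
    Master_Level    IF Highest == 'Masters',
    Doctorate_Level IF (Highest == 'Doctorate' OR Highest == 'Ph.D'),
    Other_Level     OTHERWISE;

-- STORE fails if the output directory already exists.
STORE Bachelor_Level  INTO 'out/bachelor'  USING PigStorage(',');
STORE Master_Level    INTO 'out/master'    USING PigStorage(',');
STORE Doctorate_Level INTO 'out/doctorate' USING PigStorage(',');
STORE Other_Level     INTO 'out/other'     USING PigStorage(',');
EOF

# Run in local mode, where paths refer to the local file system, not HDFS.
if command -v pig >/dev/null 2>&1 && [ -f CIS_FacultyList.csv ]; then
    pig -x local -f categorize_degree.pig || echo "pig run failed" >&2
else
    echo "pig or CIS_FacultyList.csv not available; script written only" >&2
fi
```

Checking `out/other` after a run is a quick way to spot degree values that none of the three conditions matched.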