Designated Storage Space Assignment

Added on  2020-04-21

1. Upload the dataset 'CIS_FacultyList.csv' into HDFS storage on the cluster, in your designated storage space.

If the file is on a Windows machine, first copy it to the Linux machine's local file system with WinSCP. WinSCP is an SFTP/SCP client for Windows that lets you drag and drop files between Windows and Linux.

Before copying the CSV file to HDFS, make sure it is present in the Linux local file system. List the current directory to check:
>ls

Then copy the file into HDFS:
>hadoop fs -copyFromLocal CIS_FacultyList.csv

With no destination argument, the file is copied into your HDFS home directory. To place it in another directory of your designated space, give that HDFS path as a second argument. To confirm the upload, list your HDFS home directory:
>hadoop fs -ls

2. Use Pig to create new datasets from the source file that categorise the instructors using the following criteria:
a. The degree level: Bachelors, Masters or Doctorate
b. Number of years of teaching: less than 5 years, or more than 5 years
c. Whether the last degree was obtained from North America, Europe or elsewhere
HINT: Consider using the Pig Latin SPLIT (partition), FOREACH and GROUP statement constructs.

Start the Grunt shell, where Pig Latin statements can be entered interactively:
>pig

Load the data:
>CIS_Faculty = LOAD 'CIS_FacultyList.csv' USING PigStorage(',') AS (ID:int, Name:chararray, Location:chararray, Grade:chararray, Title:chararray, Join_Date:chararray, LWD:chararray, Type:chararray, Division:chararray, Reports_To:chararray, Highest:chararray, Highest_Qualification:chararray, Major:chararray, University:chararray, All_Qualifications:chararray, Courses_Taught_Term_201510:chararray, Major_Teaching_Field:chararray, Other_Certification_Criteria:chararray, Criteria:chararray);

Pig field names must be identifiers (a letter followed by letters, digits or underscores). Column headings that contain spaces or hyphens, such as "Join Date" or "Courses Taught- Term 201510", are therefore renamed with underscores. Because Pig has no string type, chararray is used in its place.

This statement only defines the relation CIS_Faculty; Pig reads the file when a DUMP or STORE statement needs the data. Two caveats apply to PigStorage(','):
- If the file has a header row, that row is loaded as an ordinary record.
- The file is split on every comma, including commas inside quoted fields.
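The quoting caveat matters if any field, such as a university name, contains a comma. The minimal Python sketch below is not part of the original solution; it only illustrates the problem. It compares a naive split on every comma, which approximates how PigStorage(',') tokenises a line, with Python's csv module, which honours double quotes. The sample row is hypothetical.

```python
import csv
import io

# Hypothetical sample row: the University field contains a quoted comma.
LINE = '101,Jane Doe,"University of Leeds, UK",Masters\n'

def naive_split(line, delimiter=","):
    # Approximates PigStorage(','): split on every delimiter, ignoring quotes.
    return line.rstrip("\n").split(delimiter)

def csv_split(line, delimiter=","):
    # csv.reader treats a double-quoted field as a single value.
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))

print(naive_split(LINE))  # 5 fields: the university name is broken in two
print(csv_split(LINE))    # 4 fields: the quoted name stays intact
```

With the naive split, every column after the quoted field shifts by one position, so values end up under the wrong names in the LOAD schema.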
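To check the schema, use DESCRIBE; to print the loaded records, use DUMP:
>DESCRIBE CIS_Faculty;
>DUMP CIS_Faculty;

Create new datasets that categorise the instructors by degree level (Bachelors, Masters or Doctorate) using SPLIT:
>SPLIT CIS_Faculty INTO Bachelor_Level IF Highest == 'Bachelor', Master_Level IF Highest == 'Masters', Doctorate_Level IF (Highest == 'Doctorate' OR Highest == 'Ph.D');
>DUMP Bachelor_Level;
>DUMP Master_Level;
>DUMP Doctorate_Level;

A few points about this statement:
- Pig string literals use single quotes, and the equality operator is ==.
- The doctorate condition needs OR, because a single value cannot equal both 'Doctorate' and 'Ph.D'.
- A record whose Highest value matches none of the conditions is not written to any of the three relations.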
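The behaviour of SPLIT, and how criteria b and c could be expressed as conditions, can be prototyped locally before writing the Pig script. The Python sketch below models SPLIT: each record is copied into every output relation whose condition holds. The degree conditions reuse the literal values from the SPLIT statement above. Several other details are assumptions rather than facts about the dataset:
- the record layout (dicts with Join_Date already parsed into a date);
- the reference date AS_OF;
- treating exactly 5 years as the second group;
- the REGION_KEYWORDS lookup.

In Pig, the same idea would roughly be a FOREACH that derives a tenure or region field, followed by SPLIT or GROUP.

```python
from datetime import date

def split(records, conditions):
    # Model of Pig's SPLIT: test each record against every condition and copy
    # it into each relation whose condition is true. Records matching no
    # condition are dropped (Pig keeps them only with an OTHERWISE branch).
    out = {name: [] for name in conditions}
    for rec in records:
        for name, cond in conditions.items():
            if cond(rec):
                out[name].append(rec)
    return out

# Criterion (a): mirrors the SPLIT statement; the real data may spell values differently.
DEGREE_CONDITIONS = {
    "Bachelor_Level": lambda r: r["Highest"] == "Bachelor",
    "Master_Level": lambda r: r["Highest"] == "Masters",
    "Doctorate_Level": lambda r: r["Highest"] in ("Doctorate", "Ph.D"),
}

def years_between(start, end):
    # Whole years elapsed from start to end (both datetime.date objects).
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years

# Criterion (b): AS_OF is a hypothetical reference date; exactly 5 years is
# assumed to fall in the second group, since the task leaves it unspecified.
AS_OF = date(2020, 1, 1)
TENURE_CONDITIONS = {
    "Under_5_Years": lambda r: years_between(r["Join_Date"], AS_OF) < 5,
    "Five_Or_More_Years": lambda r: years_between(r["Join_Date"], AS_OF) >= 5,
}

# Criterion (c): hypothetical keyword table mapping a University string to a
# region; a real solution needs a complete lookup for the dataset's values.
REGION_KEYWORDS = {
    "North America": ("USA", "United States", "Canada"),
    "Europe": ("UK", "United Kingdom", "Germany", "France"),
}

def region_of(university):
    # Return the first region whose keyword appears in the string, else "Elsewhere".
    for region, keywords in REGION_KEYWORDS.items():
        if any(k in university for k in keywords):
            return region
    return "Elsewhere"

REGION_CONDITIONS = {
    "North_America": lambda r: region_of(r["University"]) == "North America",
    "Europe": lambda r: region_of(r["University"]) == "Europe",
    "Elsewhere": lambda r: region_of(r["University"]) == "Elsewhere",
}

# Hypothetical sample records, not taken from CIS_FacultyList.csv.
sample = [
    {"Name": "A", "Highest": "Ph.D", "Join_Date": date(2012, 9, 1),
     "University": "University of Toronto, Canada"},
    {"Name": "B", "Highest": "Masters", "Join_Date": date(2017, 2, 15),
     "University": "University of Leeds, UK"},
    {"Name": "C", "Highest": "Diploma", "Join_Date": date(2019, 8, 20),
     "University": "Cairo University, Egypt"},
]

for criterion in (DEGREE_CONDITIONS, TENURE_CONDITIONS, REGION_CONDITIONS):
    for name, rows in split(sample, criterion).items():
        print(name, [r["Name"] for r in rows])
```

Record C has the value 'Diploma', which matches no degree condition. It therefore appears in none of the degree-level outputs, just as an unmatched record is left out of every relation produced by the Pig SPLIT.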