Designated Storage Space Assignment

Added on - 21 Apr 2020

1. Upload the dataset ‘CIS_FacultyList.csv’ into HDFS storage on the cluster, in your designated storage space.

If the file is being copied from Windows rather than Linux, WinSCP can be used. It is a file-transfer client for Windows that connects to the Linux machine and lets files be dragged and dropped between the two operating systems.

> hadoop fs -copyFromLocal CIS_FacultyList.csv

Before copying the CSV file to HDFS, make sure it is present in the Linux local file system. To check, list the current directory with the command below.

> ls

2. Use Pig to create new datasets from the source file that categorise the instructors using the following criteria:

a. The degree level – Bachelors, Masters or Doctorate
b. Number of years of teaching – less than 5 years, or more than 5 years
c. Whether the last degree was obtained from North America, Europe or elsewhere

HINT: Consider using the Pig Latin Split (Partition), For each and Group statement constructs.

> pig

The above command opens the Grunt shell, where all the Pig commands can be executed.

> CIS_Faculty = LOAD 'CIS_FacultyList.csv' USING PigStorage(',') AS (
    ID:int, Name:chararray, Location:chararray, Grade:chararray, Title:chararray,
    Join_Date:chararray, LWD:chararray, Type:chararray, Division:chararray,
    Reports_To:chararray, Highest:chararray, Highest_Qualification:chararray,
    Major:chararray, University:chararray, All_Qualifications_from_Profile:chararray,
    Courses_Taught_Term201510:chararray, Major_Teaching_Field:chararray,
    Document_Other_Professional_Certification_Criteria:chararray, Criteria:chararray);

The above command loads the data. Pig field names cannot contain spaces, so the multi-word column headers are shortened or joined with underscores, and the date columns are loaded as chararray because Pig has no string type.

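
Since the three criteria only touch a few of the loaded columns, it can help to narrow the relation before categorising. The following is a minimal sketch, not part of the original answer: the alias Faculty_Slim and the lower-case field names are hypothetical, and the positions ($0, $1, $5, $10, $13) assume the column order of the LOAD statement above.

```pig
-- Keep only the columns the three criteria need, addressed by position
-- so the sketch does not depend on how the schema names are spelled.
-- Assumed positions: $0 = ID, $1 = name, $5 = join date,
-- $10 = highest degree, $13 = university.
Faculty_Slim = FOREACH CIS_Faculty GENERATE
    $0 AS id,
    $1 AS name,
    $5 AS join_date,
    $10 AS highest,
    $13 AS university;

-- Print the schema of the narrowed relation.
DESCRIBE Faculty_Slim;
```
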
To check the loaded relation, use the DUMP command at any point; it prints the tuples to the console, while DESCRIBE prints the schema.

> DUMP CIS_Faculty;

Creating a new dataset that categorises the instructors based on the degree level (Bachelors, Masters or Doctorate) using SPLIT:

> SPLIT CIS_Faculty INTO Bachelor_Level IF Highest == 'Bachelor', Master_Level IF Highest == 'Masters', Doctorate_Level IF (Highest == 'Doctorate' OR Highest == 'Ph.D');
> DUMP Bachelor_Level;
> DUMP Master_Level;
> DUMP Doctorate_Level;
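
The hint also mentions the For each and Group constructs. As an alternative to SPLIT, the same degree-level categories can be attached as a label and then counted. This is a sketch under assumptions rather than the original solution: the aliases Labelled, By_Level and Level_Counts and the 'Other' bucket are hypothetical, and the values compared against Highest are the ones used in the SPLIT statement above.

```pig
-- Label each instructor with a degree level using nested bincond
-- expressions; rows that match none of the values fall into 'Other'.
Labelled = FOREACH CIS_Faculty GENERATE
    ID,
    Name,
    (Highest == 'Bachelor' ? 'Bachelors'
      : (Highest == 'Masters' ? 'Masters'
      : ((Highest == 'Doctorate' OR Highest == 'Ph.D') ? 'Doctorate'
      : 'Other'))) AS Degree_Level;

-- Group by the label and count the instructors in each group.
By_Level = GROUP Labelled BY Degree_Level;
Level_Counts = FOREACH By_Level GENERATE
    group AS Degree_Level,
    COUNT(Labelled) AS Instructors;

DUMP Level_Counts;
```

A non-empty 'Other' group would suggest that the Highest column contains spellings the SPLIT conditions do not cover.
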