Analyzing Logs with Hive: Using HiveQL for Data Query and Statistics

Verified

Added on  2023/04/25

|6
|458
|278
Practical Assignment
AI Summary
This assignment focuses on using Hive to analyze log files and report basic statistics using HiveQL. The solution details creating Hive tables with Data Analytics Studio (DAS), including uploading tables, setting field delimiters, and specifying data types. It covers loading data into the Hive table from HDFS, creating directories, setting permissions, and using the LOAD DATA INPATH command. The assignment also includes verifying data loading and provides steps to save and execute queries within the DAS environment, ensuring users can effectively query and analyze log data using Hive.
Document Page
AS#3 Using Hive to analyze log
In this assignment, you will use a log file as input, and use HiveQL to query the data and report basic statistics.
CREATE HIVE TABLES WITH DATA ANALYTICS STUDIO
Logon to Ambari user/password: maria_dev/maria_dev.
1. Select the Data Analytics Studio (DAS) service.
2. Navigate to the Data Analytics Studio UI located on the right hand side of the Ambari page.
3. On the DAS UI, click on Database
4. Select the + next to TABLES
5. Click on UPLOAD TABLE
1
tabler-icon-diamond-filled.svg

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
Once there, we are going to create the first table for the sample file by uploading it as follows:
6. Click Clear for Field Delimiter and then Enter TAB-(horizontal tab)
7. Check the Is first row header?
8. Select Upload from Local
9. Click to Browse, go to the location where you downloaded the samples.log
2
Document Page
10. Once the file has been updated, click on +CREATE at the bottom of the page
12. Following steps 4 to 11 create and verify the table for users. You will be asked to enter the Precision for the
column titled enter 1.
3. Click on Execute
4. Verify that the table omniturelogs was created by going to Compose then look for omniturelogs under
the Database Explorer > TABLES.
3
Document Page
LOAD DATA INTO A TABLE
We will be using HDFS Files View for loading the data into the omniturelog table.
1. Hover over to the Ambari Views icon
2. Select Files View
3. Click on the tmp folder
4. Create a New Folder and name it maria_dev
5. Select the maria_dev folder
6. Click on Permissions
7. Ensure that everyone has access to read, write and execute by clicking on the missing permission per user.
8. Click on Save
4
tabler-icon-diamond-filled.svg

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
9. Inside the maria_dev folder, select Upload to upload the file omniture-logs.tsv file
View after loading the file:
Let’s run a simple query to take the data we stored in HDFS and populate our new Hive table. Go back
to DAS and under Compose for Worksheet1 and click on the +. This will create a new query Worksheet2.
10. Paste the following query into Worksheet2
LOAD DATA INPATH '/tmp/maria_dev/omniture-logs.tsv' OVERWRITE INTO TABLE omniturelogs;
5
Document Page
11. Execute the query
12. Verify that the data was loaded properly by going to Database > TABLES >onmniturelogs > Data
PREVIEW tab.
SAVE AND EXECUTE A QUERY
6
chevron_up_icon
1 out of 6
circle_padding
hide_on_mobile
zoom_out_icon
[object Object]