Implementation of Hadoop and Big Data Project for Sales Analysis

Added on  2020/04/21

AI Summary
This project details the implementation of a Big Data solution using Hadoop and related technologies for sales analysis. The project uses VirtualBox to create a virtual environment, imports the Cloudera QuickStart VM, and installs Hue for a user-friendly interface. Pig scripts are then used to process and analyze sales data, demonstrating the use of MapReduce for data processing and HDFS for storage. The project also outlines the requirements, including Unix and Windows environments with Hadoop, Java, Ant, and JUnit, and describes the data loading procedures using Pig's load functions, specifically PigStorage() and TextLoader(). The conclusion emphasizes the use of VirtualBox, Cloudera, Hue, and Pig scripts for the implementation and analysis of sales data.
Hadoop and big data (java)
User
Table of Contents
1. Introduction
2. Objective
3. Requirements
4. Processing techniques
5. Data processing procedures
6. Conclusion
7. References
1. Introduction
VirtualBox is a software virtualization package that is installed on top of a host operating
system. It is used as the implementation platform for this project. Cloudera provides a virtual
machine that makes it possible to work in a Hadoop environment. The Cloudera QuickStart VM is
imported into VirtualBox, and Hue is installed in the QuickStart VM. Pig scripts for the sales
data are then added and run for analysis. The components required for the project are
VirtualBox, the Cloudera QuickStart VM, Hue and Pig scripts.
2. Objective
To show sales per month before and after the campaign.
To count advertised product sales by month.
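As a minimal illustration of these two objectives, the following Python sketch aggregates a handful of made-up sales records by month. The record layout (sale date, product, amount, advertised flag), the sample values and the campaign start date are all assumptions for illustration. The project itself performs this kind of aggregation with Pig scripts, which the report does not reproduce.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample records: (sale date, product, amount, advertised?).
# The real input layout is not specified in the report.
SALES = [
    (date(2019, 1, 15), "widget", 100.0, True),
    (date(2019, 1, 20), "gadget", 50.0, False),
    (date(2019, 2, 3), "widget", 120.0, True),
    (date(2019, 3, 9), "widget", 200.0, True),
    (date(2019, 3, 11), "gadget", 80.0, False),
]

# Assumed campaign start date, for illustration only.
CAMPAIGN_START = date(2019, 2, 1)


def monthly_sales(records, campaign_start):
    """Total sales per (period, month); period is 'before' or 'after'."""
    totals = defaultdict(float)
    for sale_date, _product, amount, _advertised in records:
        period = "before" if sale_date < campaign_start else "after"
        totals[(period, sale_date.strftime("%Y-%m"))] += amount
    return dict(totals)


def advertised_counts(records):
    """Number of advertised-product sales per month."""
    counts = defaultdict(int)
    for sale_date, _product, _amount, advertised in records:
        if advertised:
            counts[sale_date.strftime("%Y-%m")] += 1
    return dict(counts)


if __name__ == "__main__":
    for key, total in sorted(monthly_sales(SALES, CAMPAIGN_START).items()):
        print(key, total)
    for month, count in sorted(advertised_counts(SALES).items()):
        print(month, count)
```

Keying the totals on both the period and the month keeps the "before" and "after" figures separate, so they can be compared side by side.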
3. Requirements
The requirements of the project are as follows.
Unix and Windows users need the following:
Hadoop
Java
Ant
JUnit
The format of the data in the original data input file
1. A .pig file is a text file.
2. Apache Pig allows user-defined functions to be written in programming languages
and called from scripts. A Pig script can be run from a file with the .pig
extension.
Apache Pig has two execution modes (both available from the Grunt shell):
1. Local mode: runs on the local host against the local file system. It is used for
testing. HDFS is not required.
2. MapReduce mode: processes data stored in HDFS using Apache Pig.
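In MapReduce mode the work is expressed as a map phase and a reduce phase. The following Python sketch imitates that pattern in a single process for a per-month sales total. It does not use Hadoop or Pig, and the function names and sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical (month, amount) records, for illustration only.
RECORDS = [("2019-01", 100), ("2019-02", 120), ("2019-01", 50)]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for month, amount in records:
        yield month, amount


def shuffle(pairs):
    """Group emitted values by key, as happens between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values collected for each key into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(records):
    """Chain the three steps in the order map, shuffle, reduce."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    print(run_job(RECORDS))
```

On a real cluster the map and reduce steps run in parallel on different nodes over data stored in HDFS. The sketch only shows how the data flows between them.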
Data load methods in Pig
1. To load data in Pig, a load function is used. Pig provides several load
functions.
2. PigStorage() loads structured, delimited files. It is the default function, used
when the USING clause is not included in the LOAD operator.
3. TextLoader() loads unstructured text files into Pig.
4. BinStorage() loads data in a machine-readable format.
5. Handling compression: compressed data can be loaded and stored in Pig Latin.
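The difference between the delimiter-based and the line-based loader can be imitated in plain Python. The sketch below is not Pig code. The functions `load_delimited` and `load_lines` are hypothetical stand-ins for the behaviour of PigStorage(), which splits each line on a delimiter (tab by default), and TextLoader(), which keeps each line as a single field. The sample data is made up, and choosing the gzip reader by file extension is an assumption of this sketch.

```python
import gzip
import os
import tempfile


def load_delimited(path, delimiter="\t"):
    """Split each line into fields, similar in spirit to PigStorage()."""
    # Assumption of this sketch: a .gz extension marks compressed input.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return [tuple(line.rstrip("\n").split(delimiter)) for line in handle]


def load_lines(path):
    """Keep each line as one field, similar in spirit to TextLoader()."""
    with open(path, "rt", encoding="utf-8") as handle:
        return [(line.rstrip("\n"),) for line in handle]


def demo():
    """Write made-up sample data to a temp directory and load it three ways."""
    sample = "2019-01\twidget\t100\n2019-02\tgadget\t50\n"
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "sales.tsv")
        packed = os.path.join(tmp, "sales.tsv.gz")
        with open(plain, "w", encoding="utf-8") as handle:
            handle.write(sample)
        with gzip.open(packed, "wt", encoding="utf-8") as handle:
            handle.write(sample)
        return load_delimited(plain), load_lines(plain), load_delimited(packed)


if __name__ == "__main__":
    fields, lines, from_gzip = demo()
    print(fields)
    print(lines)
    print(from_gzip == fields)
```

The delimited loader yields one tuple of fields per record, which suits structured sales data. The line loader leaves parsing to later steps, which suits free-form text.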
Data processing procedure
Step 1: VirtualBox is downloaded (Wu et al., 2011).
Step 2: The Cloudera QuickStart VM is imported into VirtualBox.
Step 3: After the Cloudera QuickStart VM has been imported, Hue is installed in it.
Step 4: The Pig scripts for sales are uploaded into it and saved for analysis.
4. Processing techniques
VirtualBox
VirtualBox (VB) is a software virtualization package that is installed on top of a host
operating system (Park, Kim and Ha, 2015).
The Purpose of VirtualBox
VirtualBox is installed to run multiple operating systems on a desktop in a virtual
environment without disturbing the host operating system.
Once VirtualBox is installed, its "Import" option is used to import the Cloudera QuickStart VM
into VirtualBox. Clicking the Start button then opens the QuickStart VM. Cloudera QuickStart
is described below.
Cloudera QuickStart
The Cloudera VM is available for VMware, KVM and VirtualBox. Cloudera provides the
virtual machine so that one can work in a Hadoop environment. This is the simplest way to
study and experiment with Hadoop from the desktop. The QuickStart VM's features have been
improved, which makes it easier and simpler for cloud managers to access the
information. The Cloudera QuickStart VM includes everything needed, including Cloudera Manager
and Cloudera Search. The VM is installed from the package.
Hue is used for the implementation once the QuickStart VM has been imported into VirtualBox.
Hue makes it possible to use Hadoop directly from a browser. It is a lightweight web server
that provides a view on top of a Hadoop distribution. Hue can be installed on any type of
machine, and there are numerous ways to install it. Here, Hue is installed in the Cloudera
QuickStart VM. Once the QuickStart VM is set up and Hue has been initialized, the Pig script
for sales is uploaded into the QuickStart VM and saved for analysis. The Apache Pig script can
then be used to display the total sales of the department.
5. Data processing procedures
These are the steps to be followed:
Step 1: VirtualBox is downloaded (Wu et al., 2011).
Step 2: The Cloudera QuickStart VM is imported into VirtualBox.
Step 3: After the Cloudera QuickStart VM has been imported, Hue is installed in it.
Step 4: The Pig scripts for sales are uploaded into it and saved for analysis (Gupta,
Kumar and Gopal, 2015).
The QuickStart VM has been imported and is now processing.
6. Conclusion
VirtualBox is used as the implementation platform. The Cloudera QuickStart VM is imported
into it, and Hue is then installed. Finally, the Pig scripts for sales are added and run for
analysis.
7. References
Gupta, P., Kumar, P. and Gopal, G. (2015). Sentiment Analysis on Hadoop with Hadoop
Streaming. International Journal of Computer Applications, 121(11), pp.4-8.
Park, S., Kim, S. and Ha, Y. (2015). Scalable visualization for DBpedia ontology analysis using
Hadoop. Software: Practice and Experience, 45(8), pp.1103-1114.
Wu, X., Shen, Z., Wu, R. and Lin, Y. (2011). Jump-start cloud: efficient deployment framework
for large-scale cloud applications. Concurrency and Computation: Practice and Experience,
24(17), pp.2120-2137.