Uber's Big Data Implementation: COIT20253 Assessment Report
VerifiedAdded on 2023/01/10
|18
|4952
|1
Report
AI Summary
This report provides a comprehensive analysis of how Uber can leverage big data to improve its operations and customer experience. It begins with an executive summary and table of contents, followed by an introduction that highlights the importance of big data in the transportation industry. The report then explores three key big data use cases: customer analytics and loyalty marketing, capacity and pricing optimization, and predictive maintenance analytics. Each use case is thoroughly described, including the benefits and potential outcomes for Uber. The report then delves into the big data technologies that can support these use cases, including Apache Hadoop, Apache Spark, data lakes, MongoDB, and Apache Cassandra. Strengths and limitations of each technology are discussed. The report also proposes a big data architecture solution for Uber, using Apache Spark, MongoDB, and DATEX-II for data harmonization. The report concludes by recommending that Uber adopt big data and analytics to enhance its capacity, pricing, revenue, and customer satisfaction. The report references several academic sources to support its claims.
Contribute Materials
Your contribution can guide someone’s learning journey. Share your
documents today.

BIG DATA 1
Big Data
Student Name
Institutional Affiliation
Date
Big Data
Student Name
Institutional Affiliation
Date
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.

BIG DATA 2
Executive Summary
Big data have been employed in different use case including the transportation
industry. Big data technologies have had a significant impact on the various aspects in the
transport companies like Uber. Uber can leverage big data techniques to improve their
interaction with their partners and clients by pinpointing the new patterns emerging in the
industry and react by providing new products. Also, the company can employ big data tools
to bargain charges with suppliers, increase the income of each operation and reveal unseen
opportunities and this also applies to Uber. Generally, the transport industry can also use big
data approaches to create intelligent transportation system using toll data and vehicle
counting data to predict traffic conditions on the road. This enables travel companies like
Uber to make more informed decisions and choose the best routes with less traffic and ensure
that their clients have reached their destination on time.
This report has described how big data can by Uber by describing the various big data
technologies and tools with their strengths and limitations. The report has also proposed an
architectural solution for Uber which will use Apache Spark, MongoDB, and DATEX-II
standard for data harmonization. As such, the report has concluded by recommending on the
need for Uber to adopt big data and analytics to correctly model and improve capacity,
pricing, revenue, demand, schedules, customer sentiment, cost, among others. Utilization of
big data will enable Uber to significantly enhance the client experience, make the best use of
the available infrastructure and assets, improve services to manage capacity and maximize
revenue, and optimize operational effectiveness.
Executive Summary
Big data have been employed in different use case including the transportation
industry. Big data technologies have had a significant impact on the various aspects in the
transport companies like Uber. Uber can leverage big data techniques to improve their
interaction with their partners and clients by pinpointing the new patterns emerging in the
industry and react by providing new products. Also, the company can employ big data tools
to bargain charges with suppliers, increase the income of each operation and reveal unseen
opportunities and this also applies to Uber. Generally, the transport industry can also use big
data approaches to create intelligent transportation system using toll data and vehicle
counting data to predict traffic conditions on the road. This enables travel companies like
Uber to make more informed decisions and choose the best routes with less traffic and ensure
that their clients have reached their destination on time.
This report has described how big data can by Uber by describing the various big data
technologies and tools with their strengths and limitations. The report has also proposed an
architectural solution for Uber which will use Apache Spark, MongoDB, and DATEX-II
standard for data harmonization. As such, the report has concluded by recommending on the
need for Uber to adopt big data and analytics to correctly model and improve capacity,
pricing, revenue, demand, schedules, customer sentiment, cost, among others. Utilization of
big data will enable Uber to significantly enhance the client experience, make the best use of
the available infrastructure and assets, improve services to manage capacity and maximize
revenue, and optimize operational effectiveness.

BIG DATA 3
Table of Contents
Executive Summary...................................................................................................................2
Introduction................................................................................................................................4
Big Data Use Cases....................................................................................................................4
Customer Analytics and Loyalty Marketing..........................................................................4
Capacity and Pricing Optimization........................................................................................6
Predictive Maintenance Analytics..........................................................................................7
Big Data Technologies...............................................................................................................9
Apache Hadoop......................................................................................................................9
Apache Spark.......................................................................................................................10
Data lakes.............................................................................................................................10
NoSQL (MongoDB).............................................................................................................11
Apache Cassandra................................................................................................................11
Big Data Architecture Solution................................................................................................12
Conclusion................................................................................................................................15
List of References....................................................................................................................17
Table of Contents
Executive Summary...................................................................................................................2
Introduction................................................................................................................................4
Big Data Use Cases....................................................................................................................4
Customer Analytics and Loyalty Marketing..........................................................................4
Capacity and Pricing Optimization........................................................................................6
Predictive Maintenance Analytics..........................................................................................7
Big Data Technologies...............................................................................................................9
Apache Hadoop......................................................................................................................9
Apache Spark.......................................................................................................................10
Data lakes.............................................................................................................................10
NoSQL (MongoDB).............................................................................................................11
Apache Cassandra................................................................................................................11
Big Data Architecture Solution................................................................................................12
Conclusion................................................................................................................................15
List of References....................................................................................................................17

BIG DATA 4
Introduction
Big data is helping travel companies understand the travel choice and behavior of a
passenger as well as gaining insights on the performance of the industry as a whole. Uber can
make use of big data in their strategic pricing and revenue management which allow them to
provide excellent travel experiences to their customers and maximize their revenue
opportunities. Additionally, Uber can leverage big data tools to optimize their network
connectivity according to market needs. By utilizing big data tools, Uber can pinpoint the
new patterns emerging in the industry and react by providing new products. Big data tools
permit travel agents to bargain charges with suppliers, increase the income of each operation
and reveal unseen opportunities and this also applies to Uber (Riahi and Riahi, 2018).
Business analytics allows travel companies such as Uber to gain insight about the current
demand for brands and as a result, predict the supply and demand position. The context will
discuss the different Uber big data uses cases, critically analyze the big data technologies and
illustrate the big data architecture solution.
Big Data Use Cases
Based on the big data strategy developed in assessment 2, this section will describe
three major big data use cases that can provide positive business results to Uber. They
include:
Customer Analytics and Loyalty Marketing
Leaders of travel and transportation are dedicated to enhancing the experience of a
customer while improving marketing expenditure, customer service, and wallet share. By
securing data from client communications and other sources, Uber can acquire all important
and available data about a client to offer effective and most personalized client
service (Andonova, 2013). Apart from disclosing strategic knowledge into the purchasing
patterns of a customer, the all-round view assists in providing directly involved, client-facing
workers with the correct information to provide the best customer experience. They are well
prepared to effectively engage clients in relevant and personalized ways and to gain insights
from the previous client interactions, establishing reliable relationships and consistent
customer history.
Introduction
Big data is helping travel companies understand the travel choice and behavior of a
passenger as well as gaining insights on the performance of the industry as a whole. Uber can
make use of big data in their strategic pricing and revenue management which allow them to
provide excellent travel experiences to their customers and maximize their revenue
opportunities. Additionally, Uber can leverage big data tools to optimize their network
connectivity according to market needs. By utilizing big data tools, Uber can pinpoint the
new patterns emerging in the industry and react by providing new products. Big data tools
permit travel agents to bargain charges with suppliers, increase the income of each operation
and reveal unseen opportunities and this also applies to Uber (Riahi and Riahi, 2018).
Business analytics allows travel companies such as Uber to gain insight about the current
demand for brands and as a result, predict the supply and demand position. The context will
discuss the different Uber big data uses cases, critically analyze the big data technologies and
illustrate the big data architecture solution.
Big Data Use Cases
Based on the big data strategy developed in assessment 2, this section will describe
three major big data use cases that can provide positive business results to Uber. They
include:
Customer Analytics and Loyalty Marketing
Leaders of travel and transportation are dedicated to enhancing the experience of a
customer while improving marketing expenditure, customer service, and wallet share. By
securing data from client communications and other sources, Uber can acquire all important
and available data about a client to offer effective and most personalized client
service (Andonova, 2013). Apart from disclosing strategic knowledge into the purchasing
patterns of a customer, the all-round view assists in providing directly involved, client-facing
workers with the correct information to provide the best customer experience. They are well
prepared to effectively engage clients in relevant and personalized ways and to gain insights
from the previous client interactions, establishing reliable relationships and consistent
customer history.
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.

BIG DATA 5
Creating such a 360-degree view usually requires additional efforts to going beyond the
company’s environment for information to enhance current client loyalty and understand the
reference data of a customer (Agha Kasiri and Mansori, 2016). As such, Uber can acquire
more capabilities concerning customers across major functional sectors such as sales,
marketing, customer service, among others. These capabilities assist Uber in enabling:
Enhanced services or products launch strategies,
The ability to control the proliferation of mobile device
The use of social media analytics
Personal-level segmentation
Improved return on marketing spend.
Figure 1: Customer Analytics and Loyalty Marketing (Andonova, 2013)
Personal-level customer segmentation: comprehensive assessments of customer
purchasing behavior patterns within all platforms permit allows Uber to go past marketing to
the crowds and toward more targeted and divided marketing strategies. Analytics
comprehensions assist in creating more applicable self-service capabilities and marketing
tactics to meet the needs of each client.
Optimize services or products launch strategies: comprehensive knowledge can assist
in identifying new service and product opportunities and enhance the effectiveness of service
and product launches. The ability to assess huge amounts of client sentiment information
Creating such a 360-degree view usually requires additional efforts to going beyond the
company’s environment for information to enhance current client loyalty and understand the
reference data of a customer (Agha Kasiri and Mansori, 2016). As such, Uber can acquire
more capabilities concerning customers across major functional sectors such as sales,
marketing, customer service, among others. These capabilities assist Uber in enabling:
Enhanced services or products launch strategies,
The ability to control the proliferation of mobile device
The use of social media analytics
Personal-level segmentation
Improved return on marketing spend.
Figure 1: Customer Analytics and Loyalty Marketing (Andonova, 2013)
Personal-level customer segmentation: comprehensive assessments of customer
purchasing behavior patterns within all platforms permit allows Uber to go past marketing to
the crowds and toward more targeted and divided marketing strategies. Analytics
comprehensions assist in creating more applicable self-service capabilities and marketing
tactics to meet the needs of each client.
Optimize services or products launch strategies: comprehensive knowledge can assist
in identifying new service and product opportunities and enhance the effectiveness of service
and product launches. The ability to assess huge amounts of client sentiment information

BIG DATA 6
allows Uber to better measure the adoption and acceptance rates of new services and
products, allowing for strategy changes and incremental optimizations (Khade, 2016).
Besides, it provides enhanced insights into the use of employees and assets to improve the
delivery of a service or product.
Improve return on marketing spend: detailed client information can assist Uber to
conduct more efficient advertising, offers, marketing crusades, and promotions with better
relevance and personalization. In addition, a deeper insight of each client can assist in
determining how and when to provide offers and promotions, boosting the effectiveness of
marketing within lines and channels of business.
Use social media analytics: end-user-generated social media can improve client
segmentation and profiling, pinpoint major market influencers, assisting in detecting brand
sentiments and provide best customer interactions. Besides, social media can assist in
nurturing a personalized communication with clients for higher satisfaction levels and
knowledge into service and product needs. Many organizations are interacting with their
clients through Facebook, their mobile devices, Twitter, among others.
Control proliferation of mobile device: as clients constantly depend on mobile devices
for business and personal reasons, they hope for more direct and easier access to service
vendors. For instance, when trips disruption occurs, travelers usually reach out to their mobile
gadgets hoping to make quick and convenient changes in their travel plan. Uber can control
mobile and big data capabilities internally by providing associates of sales and service with
access to vital business processes and information required to provide best customer service.
Capacity and Pricing Optimization
Uber must successfully predict demand to improve the use of available capacity.
Concurrently, idle fleet, empty containers, vacant rooms, or unoccupied seats symbolizes lost
revenue. Successful price and capacity management need complex modeling and assessments
of revenue, demand and profit situations with an ever-increasing and continuously-changing
amount of available market and competitive information. This involves acquiring and
combining all pricing information to a more specific and quick improved pricing tactic in
regard to the competition and have a positive effect on return on assets (Huang, 2012).
If possible, breaking price competition alone would resolve the challenge of demand
planning. Nevertheless, factors such as service outages, weather, and property damage can
allows Uber to better measure the adoption and acceptance rates of new services and
products, allowing for strategy changes and incremental optimizations (Khade, 2016).
Besides, it provides enhanced insights into the use of employees and assets to improve the
delivery of a service or product.
Improve return on marketing spend: detailed client information can assist Uber to
conduct more efficient advertising, offers, marketing crusades, and promotions with better
relevance and personalization. In addition, a deeper insight of each client can assist in
determining how and when to provide offers and promotions, boosting the effectiveness of
marketing within lines and channels of business.
Use social media analytics: end-user-generated social media can improve client
segmentation and profiling, pinpoint major market influencers, assisting in detecting brand
sentiments and provide best customer interactions. Besides, social media can assist in
nurturing a personalized communication with clients for higher satisfaction levels and
knowledge into service and product needs. Many organizations are interacting with their
clients through Facebook, their mobile devices, Twitter, among others.
Control proliferation of mobile device: as clients constantly depend on mobile devices
for business and personal reasons, they hope for more direct and easier access to service
vendors. For instance, when trips disruption occurs, travelers usually reach out to their mobile
gadgets hoping to make quick and convenient changes in their travel plan. Uber can control
mobile and big data capabilities internally by providing associates of sales and service with
access to vital business processes and information required to provide best customer service.
Capacity and Pricing Optimization
Uber must successfully predict demand to improve the use of available capacity.
Concurrently, idle fleet, empty containers, vacant rooms, or unoccupied seats symbolizes lost
revenue. Successful price and capacity management need complex modeling and assessments
of revenue, demand and profit situations with an ever-increasing and continuously-changing
amount of available market and competitive information. This involves acquiring and
combining all pricing information to a more specific and quick improved pricing tactic in
regard to the competition and have a positive effect on return on assets (Huang, 2012).
If possible, breaking price competition alone would resolve the challenge of demand
planning. Nevertheless, factors such as service outages, weather, and property damage can

BIG DATA 7
affect capacity planning. Analytics and big data solutions can mine knowledge from huge
amounts of historical and current data by identifying correlations, patterns and optimizing the
accuracy of predictive modeling. Extracting more information can assist Uber to boost the
correctness of planning and forecasting models. Analytics and big data are allowing more
automated modification capabilities of fare/rate that can quickly evaluate huge amounts of
historical pricing information to identify patterns in order to react aggressively to market and
competitive conditions. It enables improved capacity planning, utilization of changing pricing
analytics, and successful return management.
Figure 2: Capacity and Pricing Optimization (Huang, 2012)
Changing pricing analytics: analytics and big data solutions can assist Uber to
identify, forecast and respond to pricing actions of competitors more dynamically and
quickly. This results in automated and smarter rate or fare modifications based on data
analysis of competitor and the market and can get rid of manual adjustments of rate or fare
and manual assessments.
Improved capacity planning: big data and analytics can assist in driving the success of
operations with enhanced accuracy of forecast and assist boost assets availability as well as
return on assets.
affect capacity planning. Analytics and big data solutions can mine knowledge from huge
amounts of historical and current data by identifying correlations, patterns and optimizing the
accuracy of predictive modeling. Extracting more information can assist Uber to boost the
correctness of planning and forecasting models. Analytics and big data are allowing more
automated modification capabilities of fare/rate that can quickly evaluate huge amounts of
historical pricing information to identify patterns in order to react aggressively to market and
competitive conditions. It enables improved capacity planning, utilization of changing pricing
analytics, and successful return management.
Figure 2: Capacity and Pricing Optimization (Huang, 2012)
Changing pricing analytics: analytics and big data solutions can assist Uber to
identify, forecast and respond to pricing actions of competitors more dynamically and
quickly. This results in automated and smarter rate or fare modifications based on data
analysis of competitor and the market and can get rid of manual adjustments of rate or fare
and manual assessments.
Improved capacity planning: big data and analytics can assist in driving the success of
operations with enhanced accuracy of forecast and assist boost assets availability as well as
return on assets.
Paraphrase This Document
Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

BIG DATA 8
Predictive Maintenance Analytics
Uber, just like any other travel and transportation company depend on assets that are
functioning properly. They rely on costly and huge networks of equipment and infrastructure.
Maintaining and managing these assets to accomplish full availability is vital to controlling
costs and increasing income. Unexpected service outages and interruptions caused by a
failure in equipment can lead to customer dissatisfaction, loss of revenue and safety issues
(Zhang and Zhang, 2015). Solutions of predictive maintenance analytics can acquire
information of equipment sensor in real time and combine it with information from manual
measurements, operational data, visual inspections, performance video, among others
(Earley, 2014). Solutions of big data and analytics can assist Uber to use this different data to
optimize levels of service, reduce parts inventory cost, boost return on assets, and minimize
risk on unexpected service outages and delays, and implement effective maintenance
planning.
Figure 3: Predictive Maintenance Analytics (Earley, 2014)
Optimized return on assets: through big data and analytics, Uber can acquire, combine
and assess data of equipment sensor to examine operating conditions of element level. As a
result, they can acquire a view of operating conditions of an asset in real-time and take proper
actions to minimize unexpected equipment outages and failures.
Predictive Maintenance Analytics
Uber, just like any other travel and transportation company depend on assets that are
functioning properly. They rely on costly and huge networks of equipment and infrastructure.
Maintaining and managing these assets to accomplish full availability is vital to controlling
costs and increasing income. Unexpected service outages and interruptions caused by a
failure in equipment can lead to customer dissatisfaction, loss of revenue and safety issues
(Zhang and Zhang, 2015). Solutions of predictive maintenance analytics can acquire
information of equipment sensor in real time and combine it with information from manual
measurements, operational data, visual inspections, performance video, among others
(Earley, 2014). Solutions of big data and analytics can assist Uber to use this different data to
optimize levels of service, reduce parts inventory cost, boost return on assets, and minimize
risk on unexpected service outages and delays, and implement effective maintenance
planning.
Figure 3: Predictive Maintenance Analytics (Earley, 2014)
Optimized return on assets: through big data and analytics, Uber can acquire, combine
and assess data of equipment sensor to examine operating conditions of element level. As a
result, they can acquire a view of operating conditions of an asset in real-time and take proper
actions to minimize unexpected equipment outages and failures.

BIG DATA 9
Optimize levels of service and minimize the risk of unexpected service outages and
delays: the utilization of predictive analytics develop element failure and wear patterns can
assist in preventing unexpected service failure and increase delivery of service with minimum
risk (Wilson, 2016).
Reduced parts inventory cost: big data and analytics can assist operators to effectively
handle their parts inventories.
Smarter maintenance planning: capabilities of big data can enable Uber to rank
maintenance based on utilization patterns and definite operating conditions to aggressively
solve potential concerns before they negatively affect operations.
Big Data Technologies
Big data technology can be divided into two main elements:
The hardware element: it represents the infrastructure and the component layer.
The software element: this module can be split into analytics and discovery software,
data management and organization software, and automation and decision support
software.
Technologies of big data illustrate an innovative generation of architectures and
technologies, structured to efficiently obtain value from huge amounts of a large variety of
data by allowing discovery, acquisition, and analysis at a high speed (Dhar, 2014). The
following are some of the examples of big data technologies that are available for Uber.
Apache Hadoop
Hadoop is an open source tool for managing big data. It can work with several sources
of data, either combining various data sources in order to conduct extensive processing or
reading database data to execute learning jobs of the processor-intensive machine. It
comprises of various use cases but its main application is for a huge amount of continuously
changing data such as location data from traffic to weather sensors, social media data, M2M
(machine-to-machine) transactional data (Glybovets and Dmytruk, 2016).
Advantages of Apache Hadoop
Data sources range: Hadoop saves the time of converting all gathered information into
one format as it can obtain important data from any data form. Besides, it has several
Optimize levels of service and minimize the risk of unexpected service outages and
delays: the utilization of predictive analytics develop element failure and wear patterns can
assist in preventing unexpected service failure and increase delivery of service with minimum
risk (Wilson, 2016).
Reduced parts inventory cost: big data and analytics can assist operators to effectively
handle their parts inventories.
Smarter maintenance planning: capabilities of big data can enable Uber to rank
maintenance based on utilization patterns and definite operating conditions to aggressively
solve potential concerns before they negatively affect operations.
Big Data Technologies
Big data technology can be divided into two main elements:
The hardware element: it represents the infrastructure and the component layer.
The software element: this module can be split into analytics and discovery software,
data management and organization software, and automation and decision support
software.
Technologies of big data illustrate an innovative generation of architectures and
technologies, structured to efficiently obtain value from huge amounts of a large variety of
data by allowing discovery, acquisition, and analysis at a high speed (Dhar, 2014). The
following are some of the examples of big data technologies that are available for Uber.
Apache Hadoop
Hadoop is an open source tool for managing big data. It can work with several sources
of data, either combining various data sources in order to conduct extensive processing or
reading database data to execute learning jobs of the processor-intensive machine. It
comprises of various use cases but its main application is for a huge amount of continuously
changing data such as location data from traffic to weather sensors, social media data, M2M
(machine-to-machine) transactional data (Glybovets and Dmytruk, 2016).
Advantages of Apache Hadoop
Data sources range: Hadoop saves the time of converting all gathered information into
one format as it can obtain important data from any data form. Besides, it has several

BIG DATA 10
functions such as detecting fraud, data warehousing, and market analysis and
campaign.
Cost effective: Hadoop stores the whole raw information created by an organization.
As such, if the firm needs to adjust its processes in the future, it can examine the raw
data and do the necessary. This is not the case in traditional approaches where raw
data is deleted as of a result of an increase in cost.
Speed: Hadoop ensures that work is completed at a fast rate
Multiple copies: Hadoop generates several copies of information stored in it.
Disadvantages of Apache Hadoop
Hadoop lacks preventative measures: companies have to offer required security
measures when managing confidential information since Hadoop, by default, has its
security measures disabled.
Suitable for only large business: Hadoop is ineffective in small businesses that only
creates small data.
Apache Spark
Apache Spark is an open source big data analysis tools that work with all main
programming languages such as Scala, R, Java, Python, and SQL. It is utilized by all types of
companies small to medium to large organizations (Mavridis and Karatza, 2017).
Advantages of Apache Spark
It is fast
It is effective
It is flexible
It scalable
It is fault tolerant
Disadvantages of Apache Spark
It is costly
It lacks its own file system
Data can only be integrated according to time and not records
functions such as detecting fraud, data warehousing, and market analysis and
campaign.
Cost effective: Hadoop stores the whole raw information created by an organization.
As such, if the firm needs to adjust its processes in the future, it can examine the raw
data and do the necessary. This is not the case in traditional approaches where raw
data is deleted as of a result of an increase in cost.
Speed: Hadoop ensures that work is completed at a fast rate
Multiple copies: Hadoop generates several copies of information stored in it.
Disadvantages of Apache Hadoop
Hadoop lacks preventative measures: companies have to offer required security
measures when managing confidential information since Hadoop, by default, has its
security measures disabled.
Suitable for only large business: Hadoop is ineffective in small businesses that only
creates small data.
Apache Spark
Apache Spark is an open source big data analysis tools that work with all main
programming languages such as Scala, R, Java, Python, and SQL. It is utilized by all types of
companies small to medium to large organizations (Mavridis and Karatza, 2017).
Advantages of Apache Spark
It is fast
It is effective
It is flexible
It scalable
It is fault tolerant
Disadvantages of Apache Spark
It is costly
It lacks its own file system
Data can only be integrated according to time and not records
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.

BIG DATA 11
Data lakes
Many companies are developing data lakes to ensure easy access to large amounts of
data. These are large repositories of data that gather information from various sources and
keeps it in its raw states (Mathis, 2017). It is specifically applicable when organizations want
to store information but is uncertain about how they might utilize it.
Advantages of the data lake
It has the ability to cope with the 3Vs of big data creation; variety, velocity, and
volume
It stores data in its natural state
Transformations and schemas are only used when queries are generated by other
systems or users
Apps and users can interfere with the information as they choose
Disadvantages of the data lake
Data hoarding is unsystematic, thus resulting in outdated data
Data interpretations from different app or user may conflict
It is challenging to finding particular data since its metatags are inaccurate or missing
NoSQL (MongoDB)
Not Only SQL (NoSQL) is used to manage unstructured data. Unstructured data is
stored in the indefinite schema in NoSQL databases. Better performance of NoSQL is
demonstrated in storing large volumes of data (Aggarwal and Sonika, 2016).
Advantages of NoSQL
NoSQL databases are easy to scale. They are structured for utilization with
inexpensive commodity hardware.
NoSQL databases manage huge data volumes
NoSQL databases require less practical management. It has auto repair and data
distribution capabilities and fewer administration and tuning requirements.
NoSQL databases can store and process large amounts of data at a reduced cost.
Disadvantages of NoSQL
It is underdeveloped. Many essential features have not been implemented
Data lakes
Many companies are developing data lakes to ensure easy access to large amounts of
data. These are large repositories of data that gather information from various sources and
keeps it in its raw states (Mathis, 2017). It is specifically applicable when organizations want
to store information but is uncertain about how they might utilize it.
Advantages of the data lake
It has the ability to cope with the 3Vs of big data creation; variety, velocity, and
volume
It stores data in its natural state
Transformations and schemas are only used when queries are generated by other
systems or users
Apps and users can interfere with the information as they choose
Disadvantages of the data lake
Data hoarding is unsystematic, thus resulting in outdated data
Data interpretations from different app or user may conflict
It is challenging to finding particular data since its metatags are inaccurate or missing
NoSQL (MongoDB)
Not Only SQL (NoSQL) is used to manage unstructured data. Unstructured data is
stored in the indefinite schema in NoSQL databases. Better performance of NoSQL is
demonstrated in storing large volumes of data (Aggarwal and Sonika, 2016).
Advantages of NoSQL
NoSQL databases are easy to scale. They are structured for utilization with
inexpensive commodity hardware.
NoSQL databases manage huge data volumes
NoSQL databases require less practical management. It has auto repair and data
distribution capabilities and fewer administration and tuning requirements.
NoSQL databases can store and process large amounts of data at a reduced cost.
Disadvantages of NoSQL
It is underdeveloped. Many essential features have not been implemented

BIG DATA 12
It offers less support
It lacks advanced specialists
It lacks administration
Apache Cassandra
Apache Cassandra is one of the contributors to the huge success of Facebook since it
permits the processing of structured sets of data disseminated across a large number of
connections around the world. As a result of its architecture, Apache Cassandra performs best
under heavy loads of work without demonstrating any failure and comprises of special
capabilities that no other relational database or NoSQL possess (Fan et al., 2015).
Advantages of Apache Cassandra
The query language applied is simple, as such, operations are simple
It is easy to add and remove connections from an executing cluster
In-built high-availability
Excellent linear scalability
Continuous duplication across connections
High tolerance of fault.
The disadvantage of Apache Cassandra
It lacks the join nodes concept
Limited querying alternatives for recovering data
Data sorting is a design decision. It lacks things such as group by or order by
Big data technologies have changed the travel industry. Previously, gathering and
organizing data involved substantial processes and it was challenging to generate correct
profile segments of travelers (You, Ogiela and Hwang, 2015). It is important for a company
to understand the trends of travelers to offer exciting and convenient traveling experiences to
its customers. Big data assist in assessing the trends of travelers by gathering data from
various client’s centers and creating a particular marketing tactic for the target market
(MATACUTA and POPA, 2018). For instance, Hadoop, a big data technology, offer
sufficient space for data storage and provide data gathered from different sources in an
organized way. Besides, it allows travel companies to take actions based on the varying needs
of customers. In addition, Hadoop offers major insights and predictive modeling analysis to
organizations.
It offers less support
It lacks advanced specialists
It lacks administration
Apache Cassandra
Apache Cassandra is one of the contributors to the huge success of Facebook since it
permits the processing of structured sets of data disseminated across a large number of
connections around the world. As a result of its architecture, Apache Cassandra performs best
under heavy loads of work without demonstrating any failure and comprises of special
capabilities that no other relational database or NoSQL possess (Fan et al., 2015).
Advantages of Apache Cassandra
The query language applied is simple, as such, operations are simple
It is easy to add and remove connections from an executing cluster
In-built high-availability
Excellent linear scalability
Continuous duplication across connections
High tolerance of fault.
The disadvantage of Apache Cassandra
It lacks the join nodes concept
Limited querying alternatives for recovering data
Data sorting is a design decision. It lacks things such as group by or order by
Big data technologies have changed the travel industry. Previously, gathering and
organizing data involved substantial processes and it was challenging to generate correct
profile segments of travelers (You, Ogiela and Hwang, 2015). It is important for a company
to understand the trends of travelers to offer exciting and convenient traveling experiences to
its customers. Big data assist in assessing the trends of travelers by gathering data from
various client’s centers and creating a particular marketing tactic for the target market
(MATACUTA and POPA, 2018). For instance, Hadoop, a big data technology, offer
sufficient space for data storage and provide data gathered from different sources in an
organized way. Besides, it allows travel companies to take actions based on the varying needs
of customers. In addition, Hadoop offers major insights and predictive modeling analysis to
organizations.

BIG DATA 13
Big Data Architecture Solution
The proposed architectural solution for Uber is able to extract, transform, and store all
the data identified efficiently (Kumar and Kirthika, 2017). The proposed solution will meet
the following technical specifications: transforming and storing big data efficiently, ensuring
quality of data, should have the ability to handle raw big data in different sizes and formats,
data level interoperability and facilitating creation of additional value-added services for
transport user, and efficient and robust storage capabilities that is scalable enough to meet big
data requirements from traffic sensors (Neilson et al., 2019).
The proposed solution will be developed using Apache Spark to address the above-
identified technical specifications for big data processing including transformation and
cleansing of data. This is important for Uber in ensuring that accurate information based on
predictive analytics can be used to identify customer preference, traffic conditions, pricing
optimization, and predictive maintenance analytics. One of the big data technologies that
support parallel processing is Apache Spark and has the capacity to handle big data intelligent
transportation system information (ITS) because it has better and more advanced features
than Apache Hadoop especially in processing time (Geerdink, 2015). It also has the capacity
to offer data management framework like SparkSQL in an environment that is integrated like
Uber operating environment. For management and storage of customer and traffic data,
MongoDB was employed as a NoSQL mechanism. MongoDB is an efficient and flexible
distributed database approach which employs a document-oriented methodology and has low
latency (Fiosina, Fiosins and Müller, 2013). It best fit Uber requirements because it is easily
integrated with Apache Spark and is easily scalable.
The figure below illustrates a conceptual architecture presenting how data may flow
in from different sources and means such as web services using REST or SOAP, server or
local documents which are in various formats such as JSON, XML, XLS, CSV, and text and
are of different sizes. Apache Spark also supports data harmonization and cleaning.
Big Data Architecture Solution
The proposed architectural solution for Uber is able to extract, transform, and store all
the data identified efficiently (Kumar and Kirthika, 2017). The proposed solution will meet
the following technical specifications: transforming and storing big data efficiently, ensuring
quality of data, should have the ability to handle raw big data in different sizes and formats,
data level interoperability and facilitating creation of additional value-added services for
transport user, and efficient and robust storage capabilities that is scalable enough to meet big
data requirements from traffic sensors (Neilson et al., 2019).
The proposed solution will be developed using Apache Spark to address the above-
identified technical specifications for big data processing including transformation and
cleansing of data. This is important for Uber in ensuring that accurate information based on
predictive analytics can be used to identify customer preference, traffic conditions, pricing
optimization, and predictive maintenance analytics. One of the big data technologies that
support parallel processing is Apache Spark and has the capacity to handle big data intelligent
transportation system information (ITS) because it has better and more advanced features
than Apache Hadoop especially in processing time (Geerdink, 2015). It also has the capacity
to offer data management framework like SparkSQL in an environment that is integrated like
Uber operating environment. For management and storage of customer and traffic data,
MongoDB was employed as a NoSQL mechanism. MongoDB is an efficient and flexible
distributed database approach which employs a document-oriented methodology and has low
latency (Fiosina, Fiosins and Müller, 2013). It best fit Uber requirements because it is easily
integrated with Apache Spark and is easily scalable.
The figure below illustrates a conceptual architecture presenting how data may flow
in from different sources and means such as web services using REST or SOAP, server or
local documents which are in various formats such as JSON, XML, XLS, CSV, and text and
are of different sizes. Apache Spark also supports data harmonization and cleaning.
Paraphrase This Document
Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

BIG DATA 14
Figure 4: Data Harmonization Approach (Ramana, Kumar and Nagesh, 2018)
In the proposed architecture, the configuration of MongoDB was done in such a
manner so as to have two different document collection; one for collecting historical data and
another for collecting real-time traffic data from traffic events such as road works,
roadblocks, and accidents and traffic sensor data. This is beneficial to Uber operators as it
will enable them to make more informed decisions on which best strategies to use such as
deciding on the best routes to take when there is heavy traffic ins some roads. This justifies
some of the advantages that have been described in the previous section; a unified mobility
structure for cleaning the data including real-time and historical data and allowing machine
coupling learning approaches like predicting traffic (Wangoo, 2019). Regarding traffic data
that will be processed, it will be categorized into two major groups: vehicle counting data and
toll data. This will lead to various data sources illustrating different semantic meanings and
attributes. Tolling data will represent or describe the true number of Uber cars that will be
tolled at an interval of 1 hours for each class in a specific highway segment. This is important
as the company can use the collected data to check on Uber demand. This data will be
available through CSV files which contain the attributes that will be presented in the table
below.
ConcessionID Determining the region concessioned,
integer value
TollGantryID Determining the toll, integer value
Date Time and date
Uber X Up to 4 passengers (least expensive)
Uber XL Up to 6 passengers, mostly mini van or
SUV
Uber Select Luxury Sedan like Audi, BMW, Mercedes
Figure 4: Data Harmonization Approach (Ramana, Kumar and Nagesh, 2018)
In the proposed architecture, the configuration of MongoDB was done in such a
manner so as to have two different document collection; one for collecting historical data and
another for collecting real-time traffic data from traffic events such as road works,
roadblocks, and accidents and traffic sensor data. This is beneficial to Uber operators as it
will enable them to make more informed decisions on which best strategies to use such as
deciding on the best routes to take when there is heavy traffic ins some roads. This justifies
some of the advantages that have been described in the previous section; a unified mobility
structure for cleaning the data including real-time and historical data and allowing machine
coupling learning approaches like predicting traffic (Wangoo, 2019). Regarding traffic data
that will be processed, it will be categorized into two major groups: vehicle counting data and
toll data. This will lead to various data sources illustrating different semantic meanings and
attributes. Tolling data will represent or describe the true number of Uber cars that will be
tolled at an interval of 1 hours for each class in a specific highway segment. This is important
as the company can use the collected data to check on Uber demand. This data will be
available through CSV files which contain the attributes that will be presented in the table
below.
ConcessionID Determining the region concessioned,
integer value
TollGantryID Determining the toll, integer value
Date Time and date
Uber X Up to 4 passengers (least expensive)
Uber XL Up to 6 passengers, mostly mini van or
SUV
Uber Select Luxury Sedan like Audi, BMW, Mercedes

BIG DATA 15
Uber BLACK Uber executive Luxury car service (Most
expensive)
Table 1: Tolling Data
The above table is used to define the different categories of Uber cars. For instance,
Uber BLACK represent Uber’s most curious and expensive car service. On the other hand,
vehicle counting data presents the total number of vehicles that have been counted at specific
points on the road at an interval of 1 hour. The data will be represented in a CSV file which
has features and attributes described in the table below (Xia, Zhang and Liu, 2016).
ConcessionID Determining the region concessioned, the
integer value
Road Road identifier
RoadSegment Segment identifier
Bearing Southbound or northbound
ComponentType Type of sensor
Date Time and date
TotalVehicles Total number of vehicles
UberX Total number of Uber X cars
UberXL Total number of Uber XL cars
Uber Select Total number of Uber select cars
UberBLACK Total number of Uber Black cars
Table 2: vehicle Counting Data
As described in the previous sections, DATEX-II will be used for data harmonization.
DATEX-II is a comprehensive standard for information that is associated with intelligent
transportation system (ITS) and has been widely used because it is very adaptable and has the
ability to assemble the major types of data that is needed. For easy retrieval bad
transportation of data, all the information will be mapped to an XML file regardless of the
type of technology used (Xiao, 2014). The table below described a few transformations done
at the data level to comply with DATEX-II standard.
Uber BLACK Uber executive Luxury car service (Most
expensive)
Table 1: Tolling Data
The above table is used to define the different categories of Uber cars. For instance,
Uber BLACK represent Uber’s most curious and expensive car service. On the other hand,
vehicle counting data presents the total number of vehicles that have been counted at specific
points on the road at an interval of 1 hour. The data will be represented in a CSV file which
has features and attributes described in the table below (Xia, Zhang and Liu, 2016).
ConcessionID Determining the region concessioned, the
integer value
Road Road identifier
RoadSegment Segment identifier
Bearing Southbound or northbound
ComponentType Type of sensor
Date Time and date
TotalVehicles Total number of vehicles
UberX Total number of Uber X cars
UberXL Total number of Uber XL cars
Uber Select Total number of Uber select cars
UberBLACK Total number of Uber Black cars
Table 2: vehicle Counting Data
As described in the previous sections, DATEX-II will be used for data harmonization.
DATEX-II is a comprehensive standard for information that is associated with intelligent
transportation system (ITS) and has been widely used because it is very adaptable and has the
ability to assemble the major types of data that is needed. For easy retrieval bad
transportation of data, all the information will be mapped to an XML file regardless of the
type of technology used (Xiao, 2014). The table below described a few transformations done
at the data level to comply with DATEX-II standard.

BIG DATA 16
Table 3: Heterogeneous sources to DATEX-II transformations (Xiao, 2014)
The above-described architecture solution will be able to meet the requirements of the
identified use case.
Conclusion
Big data and analytics have advanced beyond the sandbox technology realm. Service
providers are currently using solutions of big data and analytics within business operations
and are benefiting from the returned value of the business. Similarly, IT (Information
technology) experts are exploiting the lower costs and options of data management provided
through big data. As such, Uber should adopt big data and analytics to correctly model and
improve capacity, pricing, revenue, demand, schedules, customer sentiment, cost, among
others. Utilization of big data will assist Uber to significantly enhance the client experience,
make the best use of the available infrastructure and assets, improve services to manage
capacity and maximize revenue, and optimize operational effectiveness. Companies that use
big data and analytics to assist in driving improvements in sales, marketing, and operations,
will be well set with the agility and insights needed for competitive advantage.
Table 3: Heterogeneous sources to DATEX-II transformations (Xiao, 2014)
The above-described architecture solution will be able to meet the requirements of the
identified use case.
Conclusion
Big data and analytics have advanced beyond the sandbox technology realm. Service
providers are currently using solutions of big data and analytics within business operations
and are benefiting from the returned value of the business. Similarly, IT (Information
technology) experts are exploiting the lower costs and options of data management provided
through big data. As such, Uber should adopt big data and analytics to correctly model and
improve capacity, pricing, revenue, demand, schedules, customer sentiment, cost, among
others. Utilization of big data will assist Uber to significantly enhance the client experience,
make the best use of the available infrastructure and assets, improve services to manage
capacity and maximize revenue, and optimize operational effectiveness. Companies that use
big data and analytics to assist in driving improvements in sales, marketing, and operations,
will be well set with the agility and insights needed for competitive advantage.
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.

BIG DATA 17
List of References
Aggarwal, D. and Sonika, R. (2016). Emerging Technologies For Big Data Processing:
NOSQL And NEWSQL Data Stores. International Journal Of Engineering And Computer
Science.
Agha Kasiri, L. and Mansori, S. (2016). Standardization, customization, and customer loyalty
in service industry. Journal of Marketing Analytics, 4(2-3), pp.66-76.
Andonova, Y. (2013). Loyalty 3.0: How big data and gamification are revolutionizing
customer and employee engagement. Journal of Marketing Analytics, 1(4), pp.234-236.
Dhar, V. (2014). Why Big Data = Big Deal. Big Data, 2(2), pp.55-56.
Earley, S. (2014). Big Data and Predictive Analytics: What's New?. IT Professional, 16(1),
pp.13-15.
Fan, H., Ramaraju, A., McKenzie, M., Golab, W. and Wong, B. (2015). Understanding the
causes of consistency anomalies in Apache Cassandra. Proceedings of the VLDB Endowment,
8(7), pp.810-813.
Fiosina, J., Fiosins, M. and Müller, J. (2013). Big Data Processing and Mining for Next
Generation Intelligent Transportation Systems. Jurnal Teknologi, 63(3).
Geerdink, B. (2015). A reference architecture for big data solutions - introducing a model to
perform predictive analytics using big data technology. International Journal of Big Data
Intelligence, 2(4), p.236.
Glybovets, А. and Dmytruk, Y. (2016). The Effectiveness of Programming Languages in the
Apache Hadoop MapReduce Framework. Upravlâûŝie sistemy i mašiny, (5 (265), pp.84-92.
Huang, T. (2012). Personalized Pricing and Capacity Management in Electronic
Markets. SSRN Electronic Journal.
Khade, A. (2016). Performing Customer Behavior Analysis using Big Data
Analytics. Procedia Computer Science, 79, pp.986-992.
Kumar, S. and Kirthika, M. (2017). Big Data Analytics Architecture and Challenges, Issues
of Big Data Analytics. International Journal of Trend in Scientific Research and
Development, Volume-1(Issue-6), pp.669-673.
List of References
Aggarwal, D. and Sonika, R. (2016). Emerging Technologies For Big Data Processing:
NOSQL And NEWSQL Data Stores. International Journal Of Engineering And Computer
Science.
Agha Kasiri, L. and Mansori, S. (2016). Standardization, customization, and customer loyalty
in service industry. Journal of Marketing Analytics, 4(2-3), pp.66-76.
Andonova, Y. (2013). Loyalty 3.0: How big data and gamification are revolutionizing
customer and employee engagement. Journal of Marketing Analytics, 1(4), pp.234-236.
Dhar, V. (2014). Why Big Data = Big Deal. Big Data, 2(2), pp.55-56.
Earley, S. (2014). Big Data and Predictive Analytics: What's New?. IT Professional, 16(1),
pp.13-15.
Fan, H., Ramaraju, A., McKenzie, M., Golab, W. and Wong, B. (2015). Understanding the
causes of consistency anomalies in Apache Cassandra. Proceedings of the VLDB Endowment,
8(7), pp.810-813.
Fiosina, J., Fiosins, M. and Müller, J. (2013). Big Data Processing and Mining for Next
Generation Intelligent Transportation Systems. Jurnal Teknologi, 63(3).
Geerdink, B. (2015). A reference architecture for big data solutions - introducing a model to
perform predictive analytics using big data technology. International Journal of Big Data
Intelligence, 2(4), p.236.
Glybovets, А. and Dmytruk, Y. (2016). The Effectiveness of Programming Languages in the
Apache Hadoop MapReduce Framework. Upravlâûŝie sistemy i mašiny, (5 (265), pp.84-92.
Huang, T. (2012). Personalized Pricing and Capacity Management in Electronic
Markets. SSRN Electronic Journal.
Khade, A. (2016). Performing Customer Behavior Analysis using Big Data
Analytics. Procedia Computer Science, 79, pp.986-992.
Kumar, S. and Kirthika, M. (2017). Big Data Analytics Architecture and Challenges, Issues
of Big Data Analytics. International Journal of Trend in Scientific Research and
Development, Volume-1(Issue-6), pp.669-673.

BIG DATA 18
MATACUTA, A. and POPA, C. (2018). Big Data Analytics: Analysis of Features and
Performance of Big Data Ingestion Tools. Informatica Economica, 22(2/2018), pp.25-34.
Mathis, C. (2017). Data Lakes. Datenbank-Spektrum, 17(3), pp.289-293.
Mavridis, I. and Karatza, H. (2017). Performance evaluation of cloud-based log file analysis
with Apache Hadoop and Apache Spark. Journal of Systems and Software, 125, pp.133-151.
Neilson, A., Indratmo, Daniel, B. and Tjandra, S. (2019). Systematic Review of the Literature
on Big Data in the Transportation Domain: Concepts and Applications. Big Data Research.
Ramana, V., Kumar, S. and Nagesh, P. (2018). Analytic architecture to overcome real time
traffic control as an intelligent transportation system using big data. International Journal of
Engineering & Technology, 7(2.18), p.7.
Riahi, Y. and Riahi, S. (2018). Big Data and Big Data Analytics: concepts, types and
technologies. International Journal of Research and Engineering, 5(9), pp.524-528.
Wangoo, D. (2019). Intelligent Microservices in Service-Oriented Architecture for Big Data
Organization. SSRN Electronic Journal.
Wilson, A. (2016). Big-Data Analytics for Predictive Maintenance Modeling: Challenges and
Opportunities. Journal of Petroleum Technology, 68(10), pp.71-72.
Xia, Y., Zhang, L. and Liu, Y. (2016). Special issue on big data driven Intelligent
Transportation Systems. Neurocomputing, 181, pp.1-3.
Xiao, Q. (2014). Big Data analysis: Intelligent Transportation Development Engine. Urban
Transportation & Construction, 1(1), p.11.
You, I., Ogiela, M. and Hwang, M. (2015). Intelligent technologies and applications for big
data analytics. Software: Practice and Experience, 45(8), pp.1019-1021.
Zhang, Z. and Zhang, P. (2015). Seeing around the corner: an analytic approach for
predictive maintenance using sensor data. Journal of Management Analytics, 2(4), pp.333-
350.
MATACUTA, A. and POPA, C. (2018). Big Data Analytics: Analysis of Features and
Performance of Big Data Ingestion Tools. Informatica Economica, 22(2/2018), pp.25-34.
Mathis, C. (2017). Data Lakes. Datenbank-Spektrum, 17(3), pp.289-293.
Mavridis, I. and Karatza, H. (2017). Performance evaluation of cloud-based log file analysis
with Apache Hadoop and Apache Spark. Journal of Systems and Software, 125, pp.133-151.
Neilson, A., Indratmo, Daniel, B. and Tjandra, S. (2019). Systematic Review of the Literature
on Big Data in the Transportation Domain: Concepts and Applications. Big Data Research.
Ramana, V., Kumar, S. and Nagesh, P. (2018). Analytic architecture to overcome real time
traffic control as an intelligent transportation system using big data. International Journal of
Engineering & Technology, 7(2.18), p.7.
Riahi, Y. and Riahi, S. (2018). Big Data and Big Data Analytics: concepts, types and
technologies. International Journal of Research and Engineering, 5(9), pp.524-528.
Wangoo, D. (2019). Intelligent Microservices in Service-Oriented Architecture for Big Data
Organization. SSRN Electronic Journal.
Wilson, A. (2016). Big-Data Analytics for Predictive Maintenance Modeling: Challenges and
Opportunities. Journal of Petroleum Technology, 68(10), pp.71-72.
Xia, Y., Zhang, L. and Liu, Y. (2016). Special issue on big data driven Intelligent
Transportation Systems. Neurocomputing, 181, pp.1-3.
Xiao, Q. (2014). Big Data analysis: Intelligent Transportation Development Engine. Urban
Transportation & Construction, 1(1), p.11.
You, I., Ogiela, M. and Hwang, M. (2015). Intelligent technologies and applications for big
data analytics. Software: Practice and Experience, 45(8), pp.1019-1021.
Zhang, Z. and Zhang, P. (2015). Seeing around the corner: an analytic approach for
predictive maintenance using sensor data. Journal of Management Analytics, 2(4), pp.333-
350.
1 out of 18

Your All-in-One AI-Powered Toolkit for Academic Success.
+13062052269
info@desklib.com
Available 24*7 on WhatsApp / Email
Unlock your academic potential
© 2024 | Zucol Services PVT LTD | All rights reserved.