Oracle Database Sharding for Cloud Big Data Scalability and Tolerance

Added on 2022/08/27

AI Summary

This report delves into the concept of database sharding, particularly within the context of cloud big data, focusing on how it provides error tolerance and scalability. The report begins by introducing sharding as a method of horizontally partitioning a database to manage large datasets efficiently, contrasting it with vertical scaling limitations. It then explains the architecture of sharding, including key components like the shard director, GSM, regions, shards, shard groups, and sharding keys. The process of setting up sharding in Oracle Database is detailed, including installation steps, creating a shard catalog database, installing GSM software, and configuring shard groups and shards. The report also covers different types of sharding, such as system-managed, user-defined, and composite sharding, along with their respective benefits and drawbacks. Benefits include horizontal scaling, faster query responses, and improved application reliability. The report concludes with a comprehensive overview of the advantages and disadvantages of sharding, providing a valuable resource for students studying big data management.

To provide Error Tolerance and
Scalability of Cloud Big Data
Introduction
In database engines such as Oracle NoSQL,
sharding is a commonly used term that means
decomposing a database into multiple smaller
units that can handle requests independently
[1]. Oracle Database 12c Release 2 introduced
several new features, one of which was
native database sharding. Sharding in Oracle
Database is easiest to understand through
table partitioning, because it is built on a
similar logical architecture [3]. Table
partitioning is a process in which large
database tables are divided into multiple
smaller parts. Queries that access only a
small portion of the data run faster against
the smaller individual partitions because
there is less data to scan [1]. For example,
consider a database table holding 1000 GB of
health data, with millions of rows collected
over more than 15 years. In such a situation,
a bulk read by the user may consume a
significant amount of time.
To avoid such overheads, the table
can be partitioned using schemes such as
hash, interval, and range partitioning [4].
For instance, if a partition is created for
each year and the user fetches data for the
present year only, the partitions for the
other years are ignored. With such a
strategy, a significant performance gain is
achieved. Oracle Database 12.2 introduced a
similar concept with sharding, in which data
is horizontally partitioned across
independent databases. The primary goal of
sharding is easy scalability. Database
infrastructure can be scaled either
horizontally or vertically. Vertical scaling
can be easy and fast, but after some time it
may reach a ceiling due to software
limitations or constraints on the underlying
hardware [5]. Hence the only option may be
scaling horizontally, that is, distributing
data evenly across multiple independent
database nodes so that no single database
becomes a single point of failure.
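The yearly partitioning example above can be illustrated with a minimal Python sketch. It is not Oracle code: the record layout and the function names partition_by_year and query_year are assumptions made for illustration. It only shows why a query for one year touches fewer rows once the table is split by year.

```python
from collections import defaultdict


def partition_by_year(rows):
    """Group rows into one partition per calendar year (range partitioning)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["year"]].append(row)
    return partitions


def query_year(partitions, year):
    """Partition pruning: scan only the partition for the requested year."""
    return partitions.get(year, [])


# Hypothetical health records spanning several years.
records = [
    {"patient_id": 1, "year": 2019, "visits": 3},
    {"patient_id": 2, "year": 2020, "visits": 1},
    {"patient_id": 3, "year": 2020, "visits": 5},
    {"patient_id": 4, "year": 2021, "visits": 2},
]

parts = partition_by_year(records)
current = query_year(parts, 2020)
scanned = len(current)   # rows touched when only one partition is read
total = len(records)     # rows touched by a full table scan
```

Here the query for 2020 reads only the rows in that year's partition, while an unpartitioned scan would read every row.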
How Sharding Works
In sharding, the database is divided
into multiple independent databases, and the
dataset of the original database is
distributed across them. Every database has
its own hardware resources: a server with its
own CPU, storage, and memory [6]. The
original database from which the data is
distributed is called the shard catalog
database, and it coordinates with the GSM
(Global Service Manager). The sharded table
is created in the shard catalog database,
where it holds only the table definition and
metadata, without any rows [7]. The sharded
databases (shards) all contain the same
tables and columns but different rows. This
improves scalability and performance because
each independent shard processes and manages
only a subset of the sharded table's data
[8].
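As a rough illustration of "same columns, different rows", the sketch below spreads rows over a fixed number of shards with a stable hash. The column names, the shard count, and the simple modulo placement are assumptions for illustration only; Oracle's own placement uses chunks and consistent hashing rather than this rule.

```python
import hashlib

COLUMNS = ("cust_id", "name")   # table definition, as held by the catalog
NUM_SHARDS = 3                  # hypothetical shard count


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard index with a stable hash.

    hashlib is used instead of the built-in hash(), which is salted
    per process for strings and would not be repeatable.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(rows, num_shards=NUM_SHARDS):
    """Place every row on exactly one shard, chosen by its first column."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row[0], num_shards)].append(row)
    return shards


rows = [(i, f"customer-{i}") for i in range(1, 101)]
shards = distribute(rows)
```

Every shard ends up with rows of the same shape, and each row lives on one shard only, so a shard processes just its own subset.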
Components of Sharding
The entire configuration is treated as a
single framework, but it is made up of
various elements. The shard director is a
central software component that must be
installed and configured; it handles all
connections through the GSM, which acts as a
centralized regional listener [9]. The
primary role of the GSM is to route
connections based on real-time load
balancing and target preference. Another
element is the region, which refers to
databases hosted in different geographical
locations and makes them easy to reference.
Region names are used as a
label for all targets in the configuration
[10]. A shard is a single partition in the
set of horizontally partitioned databases. A
shard group is a set of one or more shards.
A shardspace consists of one or more shard
groups. The sharding key is a portion of the
primary key that determines how data is
distributed. When an application connects to
the database, it must pass in a sharding key
so that the connection can be routed to the
correct shard. A pool is created and used to
restrict usage to specific groups; each pool
is assigned to an administrative domain. The
shard catalog is a database that stores the
GSM configuration, service metadata, and
database information [11]. The global
service is another element: connections to
it go through the GSM, which decides which
database each one is routed to, based on the
database configuration and the availability
of targets.
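How a sharding key leads to a shard can be sketched in a few lines of Python. This is a simplified model and not the GSM implementation: the ShardDirector class, the round-robin assignment of chunks, and the use of SHA-256 are all assumptions chosen for illustration. The idea shown is that the key is hashed, the hash falls into a chunk, and each chunk belongs to one shard.

```python
import hashlib

HASH_SPACE = 2 ** 32


def key_hash(sharding_key):
    """Stable hash of the sharding key, in the range [0, 2**32)."""
    digest = hashlib.sha256(str(sharding_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ShardDirector:
    """Toy router: hash space -> chunk -> shard (hypothetical model)."""

    def __init__(self, shard_names, chunks):
        self.chunks = chunks
        # Assumption: chunks are handed out to shards round-robin.
        self.chunk_to_shard = {
            c: shard_names[c % len(shard_names)] for c in range(chunks)
        }

    def chunk_for(self, sharding_key):
        # Split the hash space into equal-width chunk ranges.
        return key_hash(sharding_key) * self.chunks // HASH_SPACE

    def route(self, sharding_key):
        return self.chunk_to_shard[self.chunk_for(sharding_key)]


director = ShardDirector(["shard1", "shard2"], chunks=12)
target = director.route("alice@example.com")
```

Because the mapping goes through chunks, moving a chunk to another shard only changes one dictionary entry rather than the hash function.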
Sharding Steps in Oracle Database
Setting up sharded databases in Oracle
follows a procedure whose steps are
discussed below.
Installation of Oracle 12c Release 2 on
Oracle Linux 6 and 7.
First download the Oracle software from
OTN or MOS, according to your support status.
After downloading, unzip the files using the
following command:
unzip linuxx64_12201_database.zip
You will then have a single folder
called “database” which contains the
installation files.
Hosts file: the “/etc/hosts” file must contain
a fully qualified name for the server:
<IP-address> <fully-qualified-machine-name> <machine-name>
Oracle installation prerequisites can be met
through automatic or manual setup, followed
by additional setup.
For the automatic setup you can use the
“oracle-database-server-12cR2-preinstall”
package to perform all prerequisite setup, by
issuing the following command [12]:
# yum install -y oracle-database-server-12cR2-preinstall
You can then issue the following command
to update all packages, although this is not
required:
# yum update -y
After carrying out either the manual or the
automatic setup, you must perform the
additional setup, which is mandatory, by
following these steps:
• Set the password for the oracle user:
# passwd oracle
• Set Secure Linux (SELinux) to permissive
by editing the “/etc/selinux/config” file
and ensuring that the SELINUX flag is
set as follows: SELINUX=permissive
• Restart the server, or run the
following command:
# setenforce Permissive
• If the Linux firewall is
enabled, disable it by issuing the
following commands:
# systemctl stop firewalld
# systemctl disable firewalld
• Create the directories in which the
Oracle software will be installed:
# mkdir -p /u01/app/oracle/product/12.2.0.1/db_1
# chown -R oracle:oinstall /u01
# chmod -R 775 /u01
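One way to double-check the SELinux step above before running the installer is to read the setting back from the configuration file. The helper below is a small sketch under stated assumptions: the function name selinux_mode is invented here, and the sample text only imitates the usual layout of “/etc/selinux/config”.

```python
def selinux_mode(config_text):
    """Return the value of the SELINUX= setting, or None if it is absent."""
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments and lines that are not key=value pairs.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == "SELINUX":
            return value.strip()
    return None


# Sample content standing in for the real file.
sample = (
    "# This file controls the state of SELinux\n"
    "SELINUX=permissive\n"
    "SELINUXTYPE=targeted\n"
)
mode = selinux_mode(sample)
```

On a real server the text would come from reading the file itself, for example with open("/etc/selinux/config").read().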

Then create a scripts directory using the
command mkdir /home/oracle/scripts, and
create a file called “setEnv.sh” in it. Add a
reference to “setEnv.sh” at the end of the
“/home/oracle/.bash_profile” file, as shown:
echo ". /home/oracle/scripts/setEnv.sh" >> /home/oracle/.bash_profile
Create “start_all.sh” and “stop_all.sh”
scripts that can be called from a startup or
shutdown service [13]. Also make sure that
the permissions and ownership are correct.
Once the installation is complete you should
be able to start and stop the database
using the following scripts, executed as
the oracle user:
~/scripts/start_all.sh
~/scripts/stop_all.sh
Installation and post installation
Log in as the oracle user and set the
DISPLAY environment variable if you
are using X emulation:
DISPLAY=<machine-name>:0.0; export DISPLAY
Start OUI by executing the following
command from the “database” directory:
./runInstaller
Create Shard Catalog database
A shard catalog database is a database that
manages all other shards. It stores all
SDB configuration data and also acts as the
query coordinator that processes
multi-shard queries.
Note that all configuration changes and
global services are initiated at the shard
catalog.
On the shardcat node,
create the following directories:
mkdir -p /u01/app/oracle/flash_recovery_area/SCAT
mkdir -p /u01/app/oracle/fast_recovery_area/SCAT
mkdir -p /u01/app/oracle/oradata_recovery_area/SCAT
mkdir -p /u01/app/oracle/admin/SCAT/adump
Run DBCA:
./dbca -silent -createDatabase \
-templateName General_Purpose.dbc \
-gdbName SCAT \
-sid SCAT \
-sysPassword *** \
-emConfiguration NONE \
-redoLogFileSize 100 \
-recoveryAreaDestination /u01/app/oracle/SCAT \
-datafileDestination /u01/app/oracle/SCAT \
-storageType FS \
-listeners LISTENER12 \
-registerWithDirService false \
-characterSet AL32UTF8 \
-nationalCharacterSet AL16UTF16 \
-databaseType MULTIPURPOSE \
-memoryPercentage 40 \
-memoryType AUTO
Install GSM software in Shardcat
On the shardcat node:
Download the GSM software from the Oracle
Downloads page, unzip it, and
install it in a separate Oracle home.
Prepare SCAT database for sharding
prerequisites.
On the shardcat node,
in the SCAT database:
ALTER SYSTEM SET db_create_file_dest='/u01/ora12/app/oracle/oradata' SCOPE=both;
ALTER SYSTEM SET open_links=16 SCOPE=spfile;
ALTER SYSTEM SET open_links_per_instance=16 SCOPE=spfile;
STARTUP FORCE
ALTER USER gsmcatuser ACCOUNT UNLOCK;
ALTER USER gsmcatuser IDENTIFIED BY oracle;
CREATE USER nicadmin IDENTIFIED BY oracle;
GRANT connect, create session, gsmadmin_role TO nicadmin;
GRANT inherit privileges ON USER SYS TO GSMADMIN_INTERNAL;
EXECUTE dbms_xdb.sethttpport(8080);
COMMIT;
EXEC DBMS_SCHEDULER.SET_AGENT_REGISTRATION_PASS('oracagent');
Create Shard Catalog in SCAT
On the shardcat node, set the environment
to the GSM home, then start GDSCTL and
create the shard catalog in the SCAT database:
# gdsctl
GDSCTL> create shardcatalog -database shardcat:1521:SCAT -chunks 12 -user nicadmin/oracle -sdb SCAT -region region1
GDSCTL> add gsm -gsm sharddirector1 -listener 1571 -pwd oracle -catalog shardcat:1521:SCAT -region region1
GDSCTL> start gsm -gsm sharddirector1
GDSCTL> add credential -credential oracle_cred -osaccount oracle -ospassword ****
GDSCTL> exit
Start the Scheduler Agent and Register
Shard Nodes
On shard 1 and shard 2, the scheduler agent
is installed along with the Oracle database
software, so it only needs to be started [14]:
# schagent -start
# schagent -status
# echo oracagent | schagent -registerdatabase shardcat 8080
Create Shard Group/ director/Add shards
GDSCTL> set gsm -gsm sharddirector1
GDSCTL> connect nicadmin/oracle
Catalog connection is established
Add shard group
GDSCTL> add shardgroup -shardgroup primary_shardgroup -deploy_as primary -region region1
The operation completed successfully
Add shard 1
GDSCTL> add invitednode shard1
GDSCTL> create shard -shardgroup primary_shardgroup -destination shard1 -credential oracle_cred
DB Unique Name: sha 1
Add shard 2
GDSCTL> add invitednode shard2
GDSCTL> create shard -shardgroup primary_shardgroup -destination shard2 -credential oracle_cred
DB Unique Name: sha 2
Deploy shards
On the shardcat node, use GDSCTL to run the
following command, which creates the
databases for shard 1 and shard 2:
GDSCTL> deploy
Verify Shard Status
To check the status of the shards, execute the
following commands in sequence:
GDSCTL> config shard
then
GDSCTL> databases
Types of Sharding
There are various types of sharding.
The first is system-managed sharding,
where Oracle takes care of everything and
users have no control over where data is
placed. It uses a hash partitioning approach
to distribute the data evenly and randomly
across the databases, and thereby provides
consistent performance across the shards [2].
The second type is user-defined
sharding, where the user defines how the
data is mapped to the individual shards.
User-defined sharding is used when certain
data needs to be stored on a given shard and
the user needs full control over moving data
between shards. The third type is composite
sharding, which combines system-managed
sharding and user-defined sharding. In
composite sharding, data is first partitioned
by list or range (super_sharding_key) and
then further partitioned by consistent hash
(sharding_key). These two levels of sharding
make it simple to map data onto a set of
shards and then maintain a balanced
distribution of data across that set of
databases [15]. Composite sharding is
appropriate when data is distributed
globally. Here shards are placed in every
geography, and within each geography data
is evenly distributed, which supports
linear scalability. With composite sharding,
one can create multiple shardspaces for
different subsets of the data in a table
partitioned by consistent hash. For example,
a sharded table Customers can be created
using email as the super sharding key and
std_id as the sharding key. Database
sharding comes with both benefits and
drawbacks.
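The two levels of composite sharding can be sketched as a list lookup followed by a consistent-hash lookup. The code below is an illustrative model, not Oracle's implementation: the HashRing class, the shardspace and shard names, and the number of ring points per shard are all assumptions.

```python
import bisect
import hashlib


def stable_hash(value):
    """Repeatable 64-bit hash of any value's string form."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent-hash ring: each shard owns several points on the ring."""

    def __init__(self, shards, points_per_shard=16):
        self._ring = sorted(
            (stable_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(points_per_shard)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, sharding_key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, stable_hash(sharding_key))
        return self._ring[idx % len(self._ring)][1]


# Level 1: list partitioning on the super sharding key picks a shardspace.
SHARDSPACES = {
    "europe": HashRing(["eu_shard1", "eu_shard2"]),
    "america": HashRing(["us_shard1", "us_shard2"]),
}


def locate(super_sharding_key, sharding_key):
    ring = SHARDSPACES[super_sharding_key]   # level 1: list lookup
    return ring.shard_for(sharding_key)      # level 2: consistent hash


home = locate("europe", 1001)
```

The first level keeps a row inside its geography, and the second level balances rows across the shards of that geography.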
The Benefits of Sharding
Database sharding enables horizontal
scaling, also known as scaling out.
Horizontal scaling means adding more
machines to the same system to distribute
the load and allow more traffic and
faster processing. This contrasts with
vertical scaling, or scaling up, which
involves upgrading the hardware of an
existing server, for example by adding more
RAM or CPU [2]. Sharded databases also
facilitate faster query responses. When a
query is submitted on a non-sharded
database, it
might have to search through all the rows in
the table being queried before it can find
the result set it is looking for. For large
databases, querying can become
extremely slow. By sharding a
table into multiple independent tables, each
query goes through fewer
rows, and result sets are returned much
faster.
Another reason to choose a sharded
database is to improve the
reliability of an application by reducing the
impact of downtime [16]. If an application
relies entirely on an unsharded database, an
outage can make the whole application
unavailable. With a sharded database, on the
other hand, an outage is likely to affect a
single shard and not the entire database or
application. Although an outage of one
shard might make some parts of the
application unavailable to users, the overall
effect is smaller than that of a crash of the
entire database.
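The reliability argument above comes down to simple arithmetic, which the following sketch makes concrete. The shard names and row counts are invented for illustration, and the model assumes that a row is unavailable only when its own shard is down.

```python
def unavailable_fraction(rows_per_shard, failed_shards):
    """Fraction of all rows that sit on shards that are currently down."""
    total = sum(rows_per_shard.values())
    lost = sum(rows_per_shard[name] for name in failed_shards)
    return lost / total if total else 0.0


# Hypothetical layout: four equally sized shards.
layout = {"shard1": 250, "shard2": 250, "shard3": 250, "shard4": 250}

one_down = unavailable_fraction(layout, ["shard2"])
unsharded = unavailable_fraction({"db": 1000}, ["db"])
```

With four even shards, losing one leaves a quarter of the data unreachable, whereas losing the single unsharded database makes all of it unreachable.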
The Drawbacks of Sharding
Implementing a sharded database
architecture properly is complex. If sharding
is done incorrectly, there is a risk that the
sharding process leads to lost data or
corrupted tables. Another drawback is that
once a database is sharded, it can be
difficult to return it to an unsharded
architecture. Backups made before the
database was sharded will not include data
added since partitioning.
Rebalancing of data may be
needed when one shard outgrows the other
shards and the distribution becomes
unbalanced. This situation is known as a
database hotspot. In this case, the benefits
of sharding the database are cancelled out.
To restore an even distribution, the database
has to be re-sharded, in effect rebuilt from
scratch, which is very expensive [10]. Lack
of native support is another drawback: not
all database engines natively support
sharding. Because of this, it is often
difficult to find
documentation or tips for troubleshooting
problems.
Considerations Before Sharding
There are several things to consider
before starting to shard. The first is
licensing for sharding or partitioning,
which covers licenses for on-premises,
cloud, and hybrid cloud models.
The second is application suitability.
Generally, Online Transaction Processing
(OLTP) applications fit well when related
data is distributed to a single node and
accessed via that node. Even though OLTP
applications suit sharding, they have to
meet some specific requirements. The
application should have a well-defined data
model and data distribution strategy that
accesses the data through a given key, the
sharding key, for each table in the table
family. The first column of the primary key
of the root table must be the sharding key
[2]. The data must be modeled
hierarchically, with a root table and child
and grandchild tables. Distinct global
services should be used for read-only and
read/write workloads.
The application should make use of
Oracle integrated connection pools. For each
key-based request, the application should
check out a new connection from the pool by
specifying the sharding key through the
API provided by Oracle Sharding [17].
Identify common reference tables and create
them as duplicated tables. The design of the
relational tables, especially the choice of
data distribution key, matters as it does in
other databases. One should keep in mind
that Oracle Sharding is a distributed
database and not a shared-everything (RAC)
architecture. The sharding flow
describes how data flows to specific
independent shards, from the point where
the user executes a query to the fetching of
results [18]. A GSM global service is
created in the shard catalog database with its
type and region affinity, using GDSCTL. This
service is used by the user or the
application. When the user executes a
query, the application connects to the
shard catalog database and retrieves the
distribution metadata. The shard directors
then route the connection to a specific
shard or to all nodes. The shard catalog
therefore acts as a leader node.
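The flow just described, where a request goes either to one shard or to all of them, can be modelled in a few lines. Everything in this sketch is hypothetical: the two in-memory shards, the odd/even placement rule in shard_of, and the run_query helper stand in for the real catalog and shard directors.

```python
# Two toy shards holding rows of the same table.
SHARDS = {
    "shard1": [{"cust_id": 1, "total": 40}, {"cust_id": 3, "total": 10}],
    "shard2": [{"cust_id": 2, "total": 25}, {"cust_id": 4, "total": 5}],
}


def shard_of(cust_id):
    # Hypothetical placement rule: odd keys on shard1, even keys on shard2.
    return "shard1" if cust_id % 2 else "shard2"


def run_query(predicate, sharding_key=None):
    """Return (matching rows, shards visited)."""
    if sharding_key is not None:
        targets = [shard_of(sharding_key)]   # direct routing to one shard
    else:
        targets = sorted(SHARDS)             # coordinator fans out to all shards
    rows = [row for name in targets for row in SHARDS[name] if predicate(row)]
    return rows, targets


single_rows, single_targets = run_query(
    lambda r: r["cust_id"] == 3, sharding_key=3
)
multi_rows, multi_targets = run_query(lambda r: r["total"] >= 25)
```

A request that carries a sharding key visits one shard, while a request without one is sent to every shard and the partial results are combined.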
Oracle Sharding has several software
requirements. They include Oracle Database
12c Release 2 with non-container databases,
the Oracle 12c Global Service Manager, and a
non-container database for the shard
catalog.
Oracle Sharding Compared with NoSQL
In contrast to NoSQL databases such as
MongoDB, Oracle Sharding offers all the
capabilities of an enterprise RDBMS [4].
These capabilities include SQL and other
programming interfaces, support for complex
data types, multi-core scalability,
compression, advanced encryption, ACID
transactions, online schema changes,
reliable processing, and JSON for developer
agility. Oracle Sharding is available with
Oracle Database Enterprise Edition, which
deploys a sharded architecture and provides
automation that simplifies many aspects of
lifecycle management, intelligent
data-dependent routing, excellent runtime
performance, and increased flexibility
through advanced partitioning methods [8].
Oracle Sharding gives users the ability to
scale with a sharded architecture without
the compromises of a NoSQL data store.
Reference
[1] Bagui, Sikha, and Loi Tang Nguyen.
"Database sharding: to provide fault
tolerance and scalability of big data on the
cloud." International Journal of Cloud
Applications and Computing (IJCAC) 5, no.
2 (2015): 36-52.
[2] Anderson Jr, Richard James, Barbara A.
Benton, and William Havinden Bridge Jr.
