CSG2132 Enterprise Data: Data Storage Strategies and RAID Solution

Added on  2023/04/17

Homework Assignment
AI Summary
This assignment solution for CSG2132 Enterprise Data explores data storage strategies and RAID configurations. It covers topics such as calculating available storage in RAID 5 systems, detecting data corruption using checksumming, and determining parity data for RAID 5 arrays. The solution also addresses the MTBF of disks in storage arrays and explains when and why physical drives are replaced in data centers, highlighting the importance of S.M.A.R.T. data for predicting hardware failures. Furthermore, it compares software RAID with hardware RAID, discussing their advantages and disadvantages, and examines the limitations of RAID as a data redundancy system. The document differentiates between RAID 0+1 and RAID 1+0, providing guidance on when to use each. Finally, it suggests restructuring a file storage solution using RAID 5 to improve fault tolerance without additional hardware costs, justifying the choice based on performance, capacity, and fault tolerance, making it easier for students to understand the key concepts. Desklib offers this and many other solved assignments.
Due Date: 9am 25th March 2019
Assignment Marks: 7% of unit
General Assignment Information:
This is the first of 4 quizzes that together add up to 30% of your grade for this
unit. You can do the quiz questions at any time until the due date and
answers should be submitted via the Turnitin link on Blackboard.
CSG2132 – Enterprise Data
Data Storage Strategies and RAID
1. If a RAID 5 system is built using 5 disks with 2TB of
storage available per disk, the total available storage will
be _8_ TB
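
The figure above follows from RAID 5 reserving one disk's worth of space for parity. A minimal sketch of that arithmetic, where the function name `raid5_usable_tb` is only an illustrative choice:

```python
def raid5_usable_tb(disk_count, disk_size_tb):
    """Usable RAID 5 capacity: one disk's worth of space is consumed by parity."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disk_count - 1) * disk_size_tb


# 5 disks of 2 TB each, as in question 1
print(raid5_usable_tb(5, 2))
```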
2. What filesystem technique is used to detect data
corruption of blocks?
Checksumming
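
As a rough illustration of block checksumming, the sketch below stores a CRC-32 alongside each block and recomputes it on read. CRC-32 from Python's `zlib` module stands in here for whichever checksum a real filesystem uses, and the 4096-byte block size and helper names are assumptions made for the example.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size in bytes


def write_block(data):
    """Return the block together with the checksum stored alongside it."""
    return data, zlib.crc32(data)


def verify_block(data, stored_checksum):
    """Recompute the checksum on read and compare it with the stored value."""
    return zlib.crc32(data) == stored_checksum


block, checksum = write_block(b"A" * BLOCK_SIZE)
corrupted = b"B" + block[1:]  # simulate corruption of the first byte

print(verify_block(block, checksum))      # intact block passes
print(verify_block(corrupted, checksum))  # corrupted block is detected
```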
3. Given the following data to be stored across four drives
in a RAID5 array, calculate the appropriate parity data.
Assume each stripe is one byte in length.
11110010 10001011 00111001
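Align the three bytes in rows and XOR each bit column (a column with
an odd number of 1s gives a parity bit of 1):
Byte 1   1 1 1 1 0 0 1 0
Byte 2   1 0 0 0 1 0 1 1
Byte 3   0 0 1 1 1 0 0 1
Parity   0 1 0 0 0 0 0 0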
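
The same parity byte can be checked with a few lines of code. This is a minimal sketch using bitwise XOR; the helper name `xor_parity` is illustrative. It also shows how a lost byte is rebuilt from the surviving bytes and the parity.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR all blocks together; for RAID 5 this yields the parity block."""
    return reduce(lambda a, b: a ^ b, blocks)


data = [0b11110010, 0b10001011, 0b00111001]
parity = xor_parity(data)
print(format(parity, "08b"))  # 01000000

# If the drive holding the second byte fails, XOR of the survivors
# and the parity reproduces the missing byte.
rebuilt = xor_parity([data[0], data[2], parity])
print(format(rebuilt, "08b"))  # 10001011
```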
4. If the MTBF of a single disk is 1 million hours, and there
are 80 disks within a storage array, what is the average
number of disks that will fail within the first 2 years
(assuming a constant failure rate)?
MTBF (Array) = MTBF (Single) / Number of Disks
= 1,000,000 / 80 = 12,500 hours
Hours in 2 years = 2 x 365 x 24 = 17,520
Average number of disks that fail in 2 years = 17,520 / 12,500
= 1.4016 (about 1.4 disks)
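
The calculation can be sketched as follows, assuming 365-day years (8,760 hours each) and a constant failure rate. The function name is illustrative, and the result shifts slightly if a different hours-per-year figure is used.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap days are ignored


def expected_failures(mtbf_hours, disk_count, years):
    """Expected number of disk failures, assuming a constant failure rate."""
    array_mtbf = mtbf_hours / disk_count  # mean time between failures across the array
    return (years * HOURS_PER_YEAR) / array_mtbf


print(round(expected_failures(1_000_000, 80, 2), 4))
```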
5. When are physical drives replaced in datacentre
environments? Why?
Data centres replace disks in two situations. The first is reactive:
the disk has already failed and must be replaced. The second is
precautionary: some reads or writes to the disk start failing, or
the system slows down because the disk can no longer perform at its
optimum level. Such degradation can also be identified from
statistical data aggregated from disk operations over a period of
time. These S.M.A.R.T. parameters can predict drive failures and
prompt a replacement before data is lost.
6. What is the purpose of S.M.A.R.T. data? How useful is it
at predicting hardware failures? Justify your answer.
S.M.A.R.T. stands for Self-Monitoring, Analysis and Reporting Technology. The drive
collects data throughout its operation and records it as a set of parameter values.
Most of these parameters indicate the health of the disk, for example the read/write
error rate. When errors exceed a certain level and occur too often, the disk is
likely developing a hardware fault and should be replaced before data becomes
inaccessible. Knowing about an imminent failure in advance allows the disk to be
replaced in time, before data is corrupted or system performance suffers. This makes
S.M.A.R.T. useful as an early warning, although it cannot predict failures that are
not preceded by rising error counts, such as a sudden electronic fault.
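
A minimal sketch of the threshold idea described above. The attribute names follow labels commonly reported by smartmontools, but the threshold values and the sample readings are hypothetical and chosen only for illustration; real limits depend on the drive vendor and on data-centre policy.

```python
# Hypothetical raw-value thresholds, for illustration only.
THRESHOLDS = {
    "Reallocated_Sector_Ct": 10,
    "Current_Pending_Sector": 1,
    "Offline_Uncorrectable": 1,
}


def attributes_over_threshold(raw_values):
    """Return the names of attributes whose raw value has reached its threshold."""
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if raw_values.get(name, 0) >= limit
    )


# Made-up sample readings for two drives.
healthy_drive = {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0}
suspect_drive = {"Reallocated_Sector_Ct": 48, "Current_Pending_Sector": 3}

print(attributes_over_threshold(healthy_drive))  # nothing flagged
print(attributes_over_threshold(suspect_drive))  # candidate for replacement
```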
7. Briefly describe two advantages and two disadvantages
of software RAID over hardware RAID.
Advantages: because the RAID is controlled by software, no dedicated
RAID controller is needed, which makes it an economical option. It
also avoids the effort of installing a hardware controller and wiring
the drives to it, which saves time.
Disadvantages: software RAID can be slower than hardware RAID,
because parity calculations and RAID management consume host CPU
time. In addition, any corruption of the software or OS can leave the
array unusable, since access to the data is managed entirely by that
software.
8. What are some major disadvantages of RAID as a data
redundancy system?
It needs more hardware than is strictly required to store the given
amount of data. The extra hardware also takes up more space in the
server or data centre. In addition, keeping data coherent across
disks is a complex mechanism: any change to data in one location
leads to multiple writes at different locations across the disks,
which reduces the overall throughput of the disk system.
9. What is the difference between RAID 0+1 and RAID 1+0?
When would you use each?
RAID 10 (1+0) is a stripe of mirrors: disks are first mirrored in pairs
(RAID 1) and the pairs are then striped (RAID 0). RAID 01 (0+1) is a mirror
of stripes: disks are first striped into sets (RAID 0) and the sets are then
mirrored (RAID 1). Both combine the performance of RAID 0 with the
redundancy of RAID 1, both need at least 4 disks, and neither requires any
parity calculations.
The difference lies in how they respond to failures. RAID 10 handles failures
better: a single disk failure affects only its own mirror pair, the data
remains available from the other copy in that pair, and the remaining pairs
stay fully active throughout. RAID 01 is less tolerant. Any single disk
failure takes its entire stripe set offline and puts the whole load on the
one remaining stripe set. A RAID 10 array is therefore easier to service
while keeping things running and protected.
Since RAID 10 provides better resilience and a faster rebuild (only the
failed disk has to be re-copied from its mirror), it is generally the
preferred choice for applications.
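
The difference in resilience can be illustrated by enumerating every two-disk failure in a four-disk array. This is a simplified sketch: the disk numbering and helper names are assumptions, and RAID 01 is modelled as losing a whole stripe set as soon as one of its disks fails.

```python
from itertools import combinations

DISKS = range(4)
# RAID 10: disks (0, 1) and (2, 3) are mirror pairs, striped together.
MIRROR_PAIRS = [{0, 1}, {2, 3}]
# RAID 01: disks (0, 1) and (2, 3) are stripe sets, mirrored against each other.
STRIPE_SETS = [{0, 1}, {2, 3}]


def raid10_survives(failed):
    """Every mirror pair must keep at least one working disk."""
    return all(pair - failed for pair in MIRROR_PAIRS)


def raid01_survives(failed):
    """At least one stripe set must be completely intact."""
    return any(not (stripe & failed) for stripe in STRIPE_SETS)


two_disk_failures = [set(c) for c in combinations(DISKS, 2)]
raid10_ok = sum(raid10_survives(f) for f in two_disk_failures)
raid01_ok = sum(raid01_survives(f) for f in two_disk_failures)

print("RAID 10 survives", raid10_ok, "of", len(two_disk_failures))  # 4 of 6
print("RAID 01 survives", raid01_ok, "of", len(two_disk_failures))  # 2 of 6
```

Both layouts survive any single disk failure in this model; the gap only appears once a second disk fails.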
10. A company is restructuring their current file storage
solution to improve their fault tolerance.
They currently store shared files on a file server with a 5-
disk RAID0 array. Each disk has a 1TB capacity and the
total size of the files they currently store is 2.5TB and is
likely to increase over time. The company does not wish
to purchase any additional hardware.
How would you suggest that the company restructure
their file storage and why?
Justify your choice based on performance, capacity and
fault tolerance.
It is suggested that the company adopt a RAID 5 array. This needs no
additional hardware, as the existing disks can be reconfigured under
software-based control. RAID 5 is also beneficial because it allows data
to be recovered in case of a disk
failure using the parity block in each stripe. RAID 0, by contrast, has no
fault tolerance at all: a single disk failure loses the whole array. In terms
of capacity, parity takes away about 20% of the raw space (1TB of the 5TB
total), but that is not an issue since only 2.5TB is currently used. The
array can hold up to 4TB of data with fault tolerance and without any
additional investment. In terms of performance, reads are still striped
across the disks, while writes are somewhat slower than on RAID 0 because
parity must be updated with each write.
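
The capacity side of this recommendation can be checked with a short comparison of the current and proposed layouts. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
DISK_COUNT = 5
DISK_TB = 1.0
DATA_TB = 2.5  # current size of the stored files

layouts = {
    "RAID 0": {"usable_tb": DISK_COUNT * DISK_TB, "disk_failures_tolerated": 0},
    "RAID 5": {"usable_tb": (DISK_COUNT - 1) * DISK_TB, "disk_failures_tolerated": 1},
}

for name, info in layouts.items():
    headroom_tb = info["usable_tb"] - DATA_TB  # room left for growth
    print(
        f"{name}: {info['usable_tb']} TB usable, "
        f"{headroom_tb} TB headroom, "
        f"tolerates {info['disk_failures_tolerated']} disk failure(s)"
    )
```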