This report discusses different topics related to data warehousing, data mining, Big Data technologies, NoSQL databases, and the impact of the Open Data movement. It also covers the differences between operational and strategic data sets and compares OLAP with OLTP.
Running head: INFORMATION TECHNOLOGY - DATABASE
Information Technology – Database
Name of the Student
Name of the University
Author’s note
Table of Contents
1. Introduction
2. Discussion
2.1 Data Warehousing and Differences between Operational and Strategic Data Sets
2.2 Data Mining and Comparison of OLAP with OLTP
2.3 Rise of ‘Big Data’ Technologies and Applications
2.4 ‘NoSQL’ Databases in Comparison with ‘ACID-Compliant’ Databases
2.5 Impact of ‘Open Data’ Movement
References
1. Introduction
Organizations increasingly need to integrate changes to their databases into the development process. This report discusses topics such as data warehousing, data mining, Big Data technologies, NoSQL databases and the Open Data movement. The deployment of database applications also raises security and legal concerns that need to be considered.

2. Discussion
2.1 Data Warehousing and Differences between Operational and Strategic Data Sets
Data Warehousing: Data warehousing is the process of constructing and using a data warehouse. A data warehouse is built by integrating data from multiple heterogeneous sources, and it supports ad hoc queries, analytical reporting and decision making (Cuzzocrea, Bellatreche and Song 2013). The process involves cleaning, integrating and consolidating data. A data warehouse can be regarded as a federated repository for the data collected by the various operational systems of the enterprise. The repository may be logical or physical. Data warehousing emphasizes capturing data from diverse sources for access and analysis rather than for transaction processing (Cuzzocrea 2013). Data warehouse platforms differ from operational databases in that they store historical information, which makes it easier for business leaders to analyse data over a specific period of time.
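The cleaning, integration and consolidation steps described above can be illustrated with a minimal Python sketch. The two source systems, their field names and the sample rows are hypothetical and exist only for illustration; a real warehouse load would use a dedicated ETL tool and a database rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical records from two operational systems with different field names.
crm_rows = [
    {"customer": " Alice ", "amount": "120.50", "date": "2023-01-15"},
    {"customer": "Bob", "amount": "n/a", "date": "2023-01-20"},
]
billing_rows = [
    {"client_name": "alice", "total": 80.0, "day": "2023-02-03"},
    {"client_name": "BOB", "total": 40.0, "day": "2023-02-10"},
]


def clean(name, amount, date):
    """Cleaning: normalise the name and reject rows whose amount is not numeric."""
    try:
        value = float(amount)
    except (TypeError, ValueError):
        return None
    return {"customer": name.strip().lower(), "amount": value, "month": date[:7]}


def integrate(crm, billing):
    """Integration: map both source schemas onto one common schema."""
    rows = [clean(r["customer"], r["amount"], r["date"]) for r in crm]
    rows += [clean(r["client_name"], r["total"], r["day"]) for r in billing]
    return [r for r in rows if r is not None]


def consolidate(rows):
    """Consolidation: aggregate to one historical fact per customer and month."""
    facts = defaultdict(float)
    for r in rows:
        facts[(r["customer"], r["month"])] += r["amount"]
    return dict(facts)


warehouse = consolidate(integrate(crm_rows, billing_rows))
```

Keeping one fact per customer and month, instead of overwriting a current balance, reflects the point that a warehouse stores historical information for analysis over a period of time.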
(Fig 1: Benefits of a Data Warehouse) (Source: Cuzzocrea 2013, p. 482)

Differences between Operational and Strategic Data Sets
Remote data is considered strategic data; it includes information on the economy, politics, social statistics, ecology and technological advances. Strategic managers use strategic data by collecting industry-related information in order to set standards (Chen et al. 2015). Economic data should cover the state of the economy, such as recession, boom and depression, together with other economic trends. Political data includes legal and regulatory information such as employment and taxation laws.
(Fig 2: Design of a Data Warehouse or Architectural Design) (Source: Bellatreche, Khouri and Berkani 2013, p. 79)

On the other hand, operational data includes information about competitors, such as statistics on a company’s labour force, information about suppliers, projections of required resources and accounting data (Cuzzocrea, Bellatreche and Song 2013). Data collected from these sources helps marketers to create products that are superior to competing products (Bellatreche, Khouri and Berkani 2013). Customer data helps marketers to build consumer profiles, which in turn support product creation, distribution and advertising.
2.2 Data Mining and Comparison of OLAP with OLTP
Data Mining: Data mining is the process of discovering patterns in large data sets using methods at the intersection of machine learning, database systems and statistics. Data sets are sorted in order to identify patterns and establish relationships (Wu et al. 2014). This helps to solve problems through data analysis, and data mining tools can be used to predict future trends.

(Fig 3: The Processes of Data Mining) (Source: D’Oca and Hong 2015, p. 398)

Aside from the raw analysis step, data mining also involves database and data management aspects, data processing, model and inference considerations, complexity considerations and the post-processing of discovered structures. Data mining techniques are used in many areas of research, including cybernetics, genetics, mathematics and marketing (D’Oca and Hong 2015). Used properly, they can predict customer behaviour and drive efficiencies.

(Fig 4: Steps of Data Mining and Knowledge Discovery) (Source: D’Oca and Hong 2015, p. 400)

Comparison of OLAP and OLTP

| Basis of Comparison | OLAP | OLTP |
|---|---|---|
| Basics | An online data analysis and data retrieval system. | An online transactional system that manages modifications to the database (Psaroudakis et al. 2014). |
| Data | OLTP databases are the primary source of data for OLAP. | OLTP and its transactions are the original source of data (Wu et al. 2014). |
| Transaction | OLAP has long transactions. | OLTP has short transactions. |
| Time | Processing a transaction takes comparatively longer in OLAP (Dehne et al. 2013). | Processing a transaction takes less time in OLTP than in OLAP. |
| Queries | OLAP has complex queries. | OLTP has simple queries. |
| Integrity | The OLAP database is not modified frequently, so data integrity is not affected. | The OLTP database must maintain data integrity constraints (Difallah et al. 2013). |
| Normalization | Tables in OLAP are not normalized. | Tables in OLTP databases are normalized to 3NF. |
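The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. The orders table, its columns and the sample values are assumptions made for illustration, and a single in-memory SQLite database stands in for what would normally be separate OLTP and OLAP systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT NOT NULL, "
    "amount REAL NOT NULL CHECK (amount > 0))"
)


def place_order(region, amount):
    """OLTP-style work: one short write transaction that touches a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )


for region, amount in [("north", 10.0), ("south", 25.0), ("north", 5.0)]:
    place_order(region, amount)

# The CHECK constraint rejects an invalid row, so the table stays consistent.
try:
    place_order("south", -1.0)
except sqlite3.IntegrityError:
    pass


def sales_by_region():
    """OLAP-style work: a read-only aggregate query that scans the whole table."""
    cursor = conn.execute(
        "SELECT region, COUNT(*), SUM(amount) FROM orders "
        "GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()


report = sales_by_region()
```

The write path is short and guarded by an integrity constraint, while the reporting query reads many rows and modifies none, which mirrors the Transaction, Queries and Integrity rows of the comparison.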
2.3 Rise of ‘Big Data’ Technologies and Applications
The concept of Big Data emerged in response to the constant increase in the amount of data being created. In recent times there has been a huge proliferation in the volume of data, and data at this scale requires new solutions for storage and analysis. Big Data technologies help to address the issues known as the 3Vs: variety, velocity and volume (Hashem et al. 2015). The 3Vs concern the collection, analysis and storage of data. Big Data makes it possible to respond to needs such as predictive analysis, better stock management and sales prediction.

(Fig 5: Rate of Big Data rise from 2017-2022) (Source: Hashem et al. 2015, p. 101)
Some of the latest Big Data technologies and applications are:

Apache Hadoop: This is one of the most important technologies and a popular framework that is widely used for dealing with large volumes of data (O’Driscoll, Daugelaite and Sleator 2013). One of the most common use cases of Hadoop is the Data Lake.

Batch Processing: Batch processing works on the data until no more data is entering the system. Incremental and continuous treatments allow the architecture to take new data into account without reprocessing earlier data. In this method, the desired results appear only after the entire data set has been processed. Examples of batch processing frameworks are Apache Spark and MapReduce.

Streaming Processing: This treatment is the opposite of batch processing. The desired results are accessible before processing has finished (Hu et al. 2014). Stream processing is considered a simple way to reduce processing time.

NoSQL Database: Compared with relational databases, NoSQL databases offer a new approach to data storage that is flexible, adapts to change and is less sensitive to failures.

Cloud Computing: Cloud computing is an innovative way of deploying Big Data technologies, which demand large processing capacity and large storage systems (Pokorny 2013). It is a powerful and less expensive approach.
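The difference between the batch and streaming treatments in the list above can be shown with a small, framework-free Python sketch. The function names and the sample readings are hypothetical; frameworks such as Apache Spark or MapReduce apply the same idea to distributed data, but their APIs are not used here.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Batch: the result is available only after the whole data set is processed."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Streaming: an updated result is emitted as each record arrives."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


data = [4.0, 8.0, 6.0, 2.0]
final_result = batch_average(data)
running_results = list(streaming_average(iter(data)))
```

The streaming version keeps only a running count and total, so it produces a usable answer before the input ends and never revisits earlier records. The batch version needs the complete input before it returns anything.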
2.4 ‘NoSQL’ Databases in Comparison with ‘ACID-Compliant’ Databases
NoSQL databases encompass a wide range of database technologies designed to meet the demands of modern applications. NoSQL systems are easy to deploy and can store a wide range of data types (Nayak, Poriya and Poojary 2013). They mostly excel in performance as long as strict data consistency is not required, because they emphasize performance over data integrity. Most NoSQL databases compromise on ACID compliance in favour of performance (Kaur and Rani 2013). Hence, most organizations use NoSQL for data types that are not affected by weaker consistency. SQL databases default to ACID compliance, although most of them offer options that favour performance over data integrity.

2.5 Impact of ‘Open Data’ Movement
The ‘Open Data’ movement is based on the idea that data should be freely available for everyone to use and republish as they need, without restrictions from patents, copyright or other control mechanisms. Tools for accessing and interpreting such data can lead to innovation (Attard et al. 2015). Open Data is free, publicly available, accessible to everyone and easy to use.
(Fig 6: User Views on Open Data Movement) (Source: Attard et al. 2015, p. 412)

Data can be viewed as a public utility. Open data can be leveraged by individuals and enterprises, including commercial enterprises. The aim of the Open Data movement is to make data produced with public resources accessible to the public free of cost. Open Data is a global movement. Several countries have adopted the International Open Data Charter, which intends government data to be published in an open digital format (Hashem et al. 2015). Open Data is considered a major global resource that can spur economic growth by launching new businesses, optimizing the operations of existing companies, creating jobs and improving the climate for foreign investment.
Freely available data from the U.S. Government can be considered an important national resource. It serves as fuel for innovation, entrepreneurship, scientific discovery and other public benefits. According to a recent report, the use of Open Data could generate more than $3 trillion a year in additional value in key sectors of the global economy such as transportation, electricity, education and healthcare. A series of Open Data Round Table meetings between government agencies and entrepreneurs has proved helpful for connecting with business leaders who make use of open data (Jetzek, Avital and Bjørn-Andersen 2014). The meetings also allow them to work with the government officials who are making the data easy to find, and so maximize its value for public use. A new initiative by the Open Data institute of the United States aims to create and implement open source software and standards for open government data. This data relates to fishing and hunting, and the initiative is aimed at streamlining and modernizing the industry.
References

Attard, J., Orlandi, F., Scerri, S. and Auer, S., 2015. A systematic review of open government data initiatives. Government Information Quarterly, 32(4), pp.399-418.

Bellatreche, L., Khouri, S. and Berkani, N., 2013, April. Semantic data warehouse design: From ETL to deployment à la carte. In International Conference on Database Systems for Advanced Applications (pp. 64-83). Springer, Berlin, Heidelberg.

Chen, J., Chen, J., Liao, A., Cao, X., Chen, L., Chen, X., He, C., Han, G., Peng, S., Lu, M. and Zhang, W., 2015. Global land cover mapping at 30 m resolution: A POK-based operational approach. ISPRS Journal of Photogrammetry and Remote Sensing, 103, pp.7-27.

Cuzzocrea, A., 2018. Effectively and Efficiently Supporting Encrypted OLAP Queries over Big Data: Models, Issues, Challenges. In Proceedings of the 7th International Conference on Emerging Databases (pp. 329-336). Springer, Singapore.

Cuzzocrea, A., Bellatreche, L. and Song, I.Y., 2013, October. Data warehousing and OLAP over big data: current challenges and future research directions. In Proceedings of the sixteenth international workshop on Data warehousing and OLAP (pp. 67-70). ACM.

D’Oca, S. and Hong, T., 2015. Occupancy schedules learning process through a data mining framework. Energy and Buildings, 88, pp.395-408.

Dehne, F.K.H.A., Kong, Q., Rau-Chaplin, A., Zaboli, H. and Zhou, R., 2013, October. A distributed tree data structure for real-time OLAP on cloud architectures. In Big Data, 2013 IEEE International Conference on (pp. 499-505). IEEE.

Difallah, D.E., Pavlo, A., Curino, C. and Cudre-Mauroux, P., 2013. Oltp-bench: An extensible testbed for benchmarking relational databases. Proceedings of the VLDB Endowment, 7(4), pp.277-288.

Hashem, I.A.T., Yaqoob, I., Anuar, N.B., Mokhtar, S., Gani, A. and Khan, S.U., 2015. The rise of “big data” on cloud computing: Review and open research issues. Information Systems, 47, pp.98-115.

Hu, H., Wen, Y., Chua, T.S. and Li, X., 2014. Toward scalable systems for big data analytics: A technology tutorial. IEEE Access, 2, pp.652-687.

Jetzek, T., Avital, M. and Bjørn-Andersen, N., 2014, June. Generating sustainable value from open data in a sharing society. In International Working Conference on Transfer and Diffusion of IT (pp. 62-82). Springer, Berlin, Heidelberg.

Kaur, K. and Rani, R., 2013, October. Modeling and querying data in NoSQL databases. In 2013 IEEE International Conference on Big Data (pp. 1-7). IEEE.

Nayak, A., Poriya, A. and Poojary, D., 2013. Type of NOSQL databases and its comparison with relational databases. International Journal of Applied Information Systems, 5(4), pp.16-19.

O’Driscoll, A., Daugelaite, J. and Sleator, R.D., 2013. ‘Big data’, Hadoop and cloud computing in genomics. Journal of Biomedical Informatics, 46(5), pp.774-781.

Pokorny, J., 2013. NoSQL databases: a step to database scalability in web environment. International Journal of Web Information Systems, 9(1), pp.69-82.

Psaroudakis, I., Wolf, F., May, N., Neumann, T., Böhm, A., Ailamaki, A. and Sattler, K.U., 2014, September. Scaling up mixed workloads: a battle of data freshness, flexibility, and scheduling. In Technology Conference on Performance Evaluation and Benchmarking (pp. 97-112). Springer, Cham.