This article covers topics related to data mining and data warehousing such as star and snowflake schema, view materialization, multi-way array aggregation, and Apriori algorithm. It includes solved assignments, essays, and dissertations for IT 446 course.
Contribute Materials
Your contribution can guide someone’s learning journey. Share your
documents today.
Data Mining and Data Warehousing IT 446 Student Details: Name:### CRN:### ID:### Instructions: You must submit two separate copies(one Word file and one PDF file)using the Assignment Template on Blackboard via the allocated folder. These filesmust not be in compressed format. It is your responsibility to check and make sure that you have uploaded both the correct files. Zero mark will be given if you try to bypass the SafeAssign (e.g. misspell words, remove spaces between words, hide characters, use different character sets or languages other than English or any kind of manipulation). Email submission will not be accepted. You are advised to make your work clear and well-presented. This includes filling your information on the cover page. You must use this template, failing which will result in zero mark. You MUST show all your work, and textmust notbe converted into an image, unless specified otherwise by the question. Late submission will result in ZERO mark. The work should be your own, copying from students or other resources will result in ZERO mark. UseTimes New Romanfont for all your answers.
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
Pg.1Question TwoQuestion Two Question One Compare and contrast Star and Snowflakes schema of data warehousing model. Star Schema:In terms of star schema, one fact table can remain in the center of star and other linked dimension tables around it. This schema is called star schema as the structure is similar to a star (Sanchez, 2016). Star schema is one of the simplest type of schema in data warehousing. It has another name called star join schema. This schema is suitable for querying large data sets. Example of Star Schema Learning Outcome(s):1 LO1: Explain different data mining tasks, problems and the algorithms most appropriate for addressing them. 2 Marks
Pg.2Question TwoQuestion Two Star schema represents every dimension using single dimension table. Attributes of dimension should be represented within dimension table. Each dimension table should be connected to fact table using foreign key. Dimension tables should not have connection between each other (Sidi et al.,2016). Fact table should consist of measure and key. Star schema is well known for optimal disk space and easily understandable concepts. Normalization concepts should not be applied to dimension tables. Star schema has wide support by business intelligence tools. Snowflake Schema:Snowflake is the extended version of star schema. This is because snowflake includes additional dimensions within star schema. The same reason is applied to naming this schema, its structure is similar to a snowflake. Snowflake tables are created by applying normalization concepts (Benjelloun, El Merouani & El Amin, 2017). For this reason, the dimension tables are decomposed into smaller tables which add additional dimension tables into schema. Snowflake Schema The biggest feature of snowflake schema is lesser usage of disk space than star schema. Dimension implementation is comparatively easier. Query performance is
Pg.3Question TwoQuestion Two slower because as number of relations between dimensions is more (Mohammed, 2019). Effort to maintain snowflake schema is comparatively more. Difference between Star Schema and Snowflake: Star SchemaSnowflake Star schema shows the hierarchical representation of dimensions through dimension tables Hierarchies are separated into dimension tables Each dimension table individually connect to fact table (Sidi et al.,2016) Dimension tables has connection with other dimension tables Single join is used for establishing connection between fact table and distinct dimension table Multiple joins are used to establish connection between dimensions and fact table Database design is very simple and easy to understand Database design is comparatively complex and need idea of normalization to understand (Mohammed, 2019) Data redundancy level is extremely highLow data redundancy is key feature Aggregated data is put into individual dimension table Data is separated into multiple dimension tables
Paraphrase This Document
Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Pg.4Question TwoQuestion Two Question Two What is view materialization? Why and under what condition partial view/lattice ofcuboidmaterializationispreferredoverno-materializationandfull materialization? Materialization can be referred to a copy of a target master from distinct point in time. In general it is the database object which consist of outcome of a query is called materialize view. The procedure of generating a materialized view is known as materialization. Materialization is referred to the form of caching the outcome of query. This process is similar to value memorization of a method within functional languages. Materialized views are supported by major relational database management systems such as Oracle, SQL server, PostgreSQL, MySQL and many more. No materializationprocess doesnot pre-computeanything relatedto “nonbase” cuboids. The outcome of this process is exclusive multidimensional computing that sums on–the-fly. This make the whole process extremely slow. Full materialization, on the other hand, pre-calculate entire available cuboids. Full cube is referred to as the result of computation which generate a lattice. This option demands large quantity of memoryspaceintermsofrecordingallofthepre-calculatedcuboids.Partial materialization calculates a define subset of enter collection of potential cuboids (Sohrabi & Ghods, 2016). This leads to possibility that, system can compute only those subsets that meets specific user criteria. Partial materialization establishes a perfect balance between the storage usage and response time. In order to process partial materialization of cuboids three factors must be considered as important such as identification of cuboids subset to materialize, materialize cuboids exploitation during processing of query and modification of materialization of cuboids efficiently. From the above discussion it is clear that partial materialize view is much faster and storage efficient in comparison two other options. Learning Outcome(s):2 LO2: Apply and evaluate data mining algorithms with respect to problems they are specifically designed for. 1 Marks
Pg.5Question TwoQuestion Two Question Three List two differences between Multi-Way Array Aggregation method and BUC method. The multi array aggregation or multiway array aggregation method is used for computing an entire data cube through utilizing multidimensional array to be assumed its foundation data structure. Multiway array aggregation is a usual MOLAP approach (Gulisano et al., 2017). On the other hand, BCU is used for computing iceberg cubes and sparse cubes. Multiway array uses multidimensional array but BCU utilizes apex cuboid to generate the cube. This way BCU can share costs of data partitioning. Through this order processing, BCU can trim at the time of construction, through utilizing Apriori Algorithm. Multiway array aggregation make partition of array into pieces. A chunk or piece can be assumed to be small enough to be fitted within the available memory of cube computation. In simple words, chunking is the process of dividing an n-dimension array into small n-dimension pieces. On the other hand, BCU is used for top down construction, in spite of its naming convention (Hamoud, Hashim & Awadh, 2018). BCU is used for exploration for the computation of a 3-D data cube. Learning Outcome(s):1 LO1: Explain different data mining tasks, problems and the algorithms most appropriate for addressing them 1.5 Marks
Pg.6Question TwoQuestion Two Question Four Consider the database containing transaction data as shown in the table below. Apply Apriori algorithm and find the frequent itemsets where min-sup=2. TIDItems Bought 1Diaper, Bread, Juice 2Eggs, Bread, Pasta 3Diaper, Eggs, Bread, Pasta 4Eggs, Pasta 5Rice Above table represents the transaction database that has many collection of transactions. First transaction in transaction database represents item set ofDiaper, Bread and Juice. It has to be understood that an item cannot appear twice in the same transaction as well as item is thought to be sorted by lexicographical order within the transaction table (Yuan, 2017).Total five transactions has been carried out as per information provided in the table. Items available for transaction are diaper, bread, juice, eggs, pasta and rice. If Apriori is applied to above table then following result is generated. The concern is about output of Apriori algorithm. Apriori algorithm is considered to be an algorithm that discovers set of items following frequently within the transaction database. A frequent item set is referred to as the item set occurring within at least minsup transactions. Minsup is the parameter provide by the user. Learning Outcome(s):2 LO2: Apply and evaluate data mining algorithms with respect to problems they are specifically designed for 1.5 Marks
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
Pg.7Question TwoQuestion Two Item Setssupport {Diaper}2 {Bread}3 {Juice}1 {Eggs}3 {Pasta}3 {Rice}1 {Diaper, Bread}2 {Eggs, Bread}2 {Eggs, Pasta}3 {Eggs, Bread, Pasta}2 From the above table it is found that{Diaper}, {Diaper, Bread}, {Eggs, Bread} and {Eggs, Bread, Pasta} are the item sets that has min-sup = 2.
Pg.8Question TwoQuestion Two Bibliography: Benjelloun, M., El Merouani, M., & El Amin, A. A. (2017). Using Snowflake Schema and Bitmap Index for Big Data Warehouse Volume.International Journal of Computer Applications,180(8), 30-32. Gulisano, V., Nikolakopoulos, Y., Cederman, D., Papatriantafilou, M., & Tsigas, P. (2017). Efficient data streaming multiway aggregation through concurrent algorithmic designs and new abstract data types.ACM Transactions on Parallel Computing (TOPC),4(2), 1-28. Hamoud, A., Hashim, A. S., & Awadh, W. A. (2018). Clinical Data Warehouse: A Review.Iraqi Journal for Computers and Informatics,44(2). Mohammed, K. I. (2019). Data Warehouse Design and Implementation Based on Star Schema vs. Snowflake Schema. Sanchez, J. (2016). A Review of Star Schema Benchmark.arXiv preprint arXiv:1606.00295. Sidi, E., El Merouani, M., Amin, E., & Abdelouarit, A. (2016). Star Schema Advantages on Data Warehouse: Using Bitmap Index and Partitioned Fact Tables.International Journal of Computer Applications,975, 8887. Sohrabi, M. K., & Ghods, V. (2016). Materialized View Selection for a Data Warehouse Using Frequent Itemset Mining.Jcp,11(2), 140-148. Yuan, X. (2017). An improved Apriori algorithm for mining association rules. InAIP conference proceedings(Vol. 1820, No. 1, p. 080005). AIP Publishing LLC.