PART I: DATA WAREHOUSE (One point each)

(1) For analysis purposes, you want the data stored in the data warehouse in a ______ form.
(2) Departmental data marts, from an enterprise perspective, have one major deficiency. What is that deficiency?
(3) A multi-dimensional structure to store data is called a ______.
(4) A Star Schema has a Fact Table and, at most, six Dimension Tables. (True or False)
(5) When designing an OLAP database, ______ are structured into "hierarchies."

PART I: MATCH TERMS (Two points per question)

Terms:
_ Discrete qualitative data
_ Temperature readings
_ Dependent variable
_ The independent variable is significant
_ Unsupervised learning
_ Predictive data mining
_ Text mining
_ Extraction of patterns
_ Outlier
_ Jaccard Coefficient

Descriptions:
(1) learning from discovery and observation
(2) induction
(3) used to identify similarity and diversity within a sample
(4) objects that are considered dissimilar from the remainder of the data
(5) continuous quantitative data
(6) gender, political parties or religions
(7) process of numericizing text for analysis, prediction and pattern identification
(8) if the value of "p" is less than 0.05 (p < 0.05)
(9) the goal is prediction
(10) variable to be predicted

PART II: TRUE-FALSE (Two points each)

(1) Supervised learning is learning from observation and discovery.
(2) Statistics is the branch of mathematics concerning the collection and description of data.
(3) Linear regression is similar to the task of finding a line that maximizes total distance to a set of data.
(4) Squaring the difference between the predicted value and the actual value is the most common way to calculate the error.
(5) The regression for Y = b0 + b1P + b2P^2 + b3Q + b4Q^2 + .... is a multiple regression equation.
(6) A simple regression equation has one independent variable and two dependent variables.
(7) The correlation coefficient for the below relationship between the average price and consumption per customer is -0.79. This suggests a strong positive correlation between average price and consumption per customer.

[Scatter plot: consumption per customer (y-axis) versus average price (x-axis)]

(8) The above scatter plot shows a linear relationship between the average price and consumption per employee.
(9) Supervised learning techniques are not used for prediction.
(10) Clustering is mostly used for consolidating data into high-level views and general grouping of records with unlike behavior.

PART III: FILL IN THE BLANK (Two points each)

(1) The regressed data points in a simple regression equation of x and y yield an intercept of 0 and a slope of 1. The R-squared coefficient for this equation is ______.
(2) The major limitation of all regression techniques is that you cannot be sure of the underlying relationship(s).
(3) Two classification algorithms are ______ and ______.
(4) Logit models produce probabilities that range from ______ to ______.

PART IV: MULTIPLE CHOICE (Two points each) CIRCLE OR PUT A () AROUND YOUR ANSWER

(1) Outliers are
a. Illegal
b. Normal
c. Infrequent observations
d. All of the above

(2) General Applications of Clustering
a. Pattern Recognition
b. Image Processing
c. Spatial Data Analysis
d. All of the above

(3) Decision Trees produce rules that are
a. Bottom up
b. Inclusive
c. Exclusive
d. All of the above

(4) If.... And.... Then
a. Is not good English
b. Is a rule format
c. Is redundant
d. None of the above

(5) The specific term "parent node" would be found
a. In clustering
b. In neural networks
c. In rule induction
d. None of the above

PART V: PROBLEM SET

PROBLEM (1): (10 points)

(1) This is a simple regression problem. You are a data analyst with the U.S. Department of Commerce. You have gathered information from the department's data warehouse on the world price of pulp and the exports of pulp to the rest of the world. You have been asked to see what would be the effect of a $1 increase in the price of pulp on shipments from the United States. Assume the United States produces 100% of the world pulp.

a. Table 1 below represents your data series (in a real-world scenario, your series might consist of hundreds of data points). The variable X is the world price in current dollars per ton. So, for example, the first value is $792.32 per metric ton. The other variable, Y, is pulp shipments in millions of metric tons.

b. Table 2 is a scatter plot of the relationship between pulp prices and shipments.

TABLE 2: Scatter Plot of Pulp Shipments
[Scatter plot: Pulp Shipments (millions of metric tons) on the y-axis versus World Pulp Price (dollars per ton) on the x-axis]

c. Table 3 is the regression equation model predicting the relationship between price and pulp shipments.
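
The setup in Problem 1 can be illustrated with a minimal Python sketch of an ordinary least-squares fit with one independent variable. Table 1 is not reproduced here, so the data below are hypothetical: only the first price, 792.32, comes from the problem statement, and every other number is made up for illustration. The function names `fit_simple_regression` and `r_squared` are likewise assumptions, not part of the exam. Under those assumptions, the fitted slope is read as the estimated change in shipments (millions of metric tons) for a $1 increase in the price per ton.

```python
# Hypothetical data: only 792.32 is taken from the problem text; all other
# values are invented for illustration because Table 1 is not shown.
prices = [792.32, 750.0, 700.0, 650.0, 600.0, 550.0]   # X: dollars per ton
shipments = [12.0, 15.5, 19.0, 24.0, 28.5, 33.0]       # Y: millions of metric tons


def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Return the share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


b0, b1 = fit_simple_regression(prices, shipments)
print(f"intercept b0 = {b0:.3f}")
print(f"slope b1 = {b1:.4f} million metric tons per $1 change in price")
print(f"R-squared = {r_squared(prices, shipments, b0, b1):.3f}")
```

With real data, the same two functions would be applied to the X and Y columns of Table 1. The sign and size of the slope depend entirely on that series, so the numbers printed here say nothing about the actual pulp market.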