Significant Unsupervised Learning

Added on - 16 Sep 2019

  • Type: Dissertation
  • Pages: 11
  • Words: 2602

Showing pages 1 to 4 of 11 pages
PART I: DATA WAREHOUSE (One point each)

(1) For analysis purposes, you want the data stored in the data warehouse in a _____ form.
(2) Departmental data marts, from an enterprise perspective, have one major deficiency. What is that deficiency?
(3) A multi-dimensional structure to store data is called a _____.
(4) A Star Schema has a Fact Table and, at most, six Dimension Tables. (True or False)
(5) When designing an OLAP database, _____ are structured into hierarchies.

PART I: MATCH TERMS (Two points per question)

Terms:
_____ Discrete qualitative data
_____ Temperature readings
_____ Dependent variable
_____ The independent variable is significant
_____ Unsupervised learning
_____ Predictive data mining
_____ Text mining
_____ Extraction of patterns
_____ Outlier
_____ Jaccard Coefficient

Descriptions:
(1) learning from discovery and observation
(2) induction
(3) used to identify similarity and diversity within a sample
(4) objects that are considered dissimilar from the remainder of the data
(5) continuous quantitative data
(6) gender, political parties or religions
(7) process of numericizing text for analysis, prediction and pattern identification
(8) if the value of p is less than 0.05
(9) the goal is prediction
(10) variable to be predicted

PART II: TRUE-FALSE (Two points each)

(1) Supervised learning is learning from observation and discovery.
(2) Statistics is the branch of mathematics concerning the collection and description of data.
(3) Linear regression is similar to the task of finding a line that maximizes total distance to a set of data.
(4) Squaring the difference between the predicted value and the actual value is the most common way to calculate the error.
(5) The regression for Y = b0 + b1 P + b2 P^2 + b3 Q + b4 Q^2 + ... is a multiple regression equation.
(6) A simple regression equation has one independent variable and two dependent variables.
(7) The correlation coefficient for the below relationship between the average price and consumption per customer is -0.79. This suggests a strong positive correlation between average price and consumption per customer.

[Scatter plot: consumption per customer (vertical axis) against average price (horizontal axis)]

(8) The above scatter plot shows a linear relationship between the average price and consumption per employee.
(9) Supervised learning techniques are not used for prediction.
(10) Clustering is mostly used for consolidating data into high-level views and general grouping of records with unlike behavior.

PART III: FILL IN THE BLANK (Two points each)

(1) The regressed data points in a simple regression equation of x and y yield an intercept of 0 and a slope of 1. The R-squared coefficient for this equation is _____.
(2) The major limitation of all regression techniques is that you cannot be sure of the underlying relationship(s).
(3) Two classification algorithms are _____ and _____.
(4) Logit models produce probabilities that range from _____ to _____.
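Part II refers to two quantities, the squared difference between predicted and actual values (item 4) and the correlation coefficient between average price and consumption per customer (item 7). The sketch below shows one way they can be computed. The function names and the five data points are illustrative assumptions, not values taken from the exam or its scatter plot.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def sum_squared_error(actual, predicted):
    """Sum of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


# Hypothetical toy data, NOT read from the exam's scatter plot
price = [20, 40, 60, 80, 100]
consumption = [150, 120, 100, 70, 40]

r = pearson_r(price, consumption)
print("correlation coefficient:", round(r, 3))
print("squared error example:", sum_squared_error([1, 2, 3], [1, 2, 5]))
```

The sign of the coefficient gives the direction of the relationship and its magnitude gives the strength, so both should be read before labelling a correlation.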
PART IV: MULTIPLE CHOICE (Two Points Each)
CIRCLE OR PUT A ( ) AROUND YOUR ANSWER

(1) Outliers are
    a. Illegal
    b. Normal
    c. Infrequent observations
    d. All of the above
(2) General Applications of Clustering
    a. Pattern Recognition
    b. Image Processing
    c. Spatial Data Analysis
    d. All of the above
(3) Decision Trees produce rules that are
    a. Bottom up
    b. Inclusive
    c. Exclusive
    d. All of the above
(4) If.... And.... Then
    a. Is not good English
    b. Is a rule format
    c. Is redundant
    d. None of the above
(5) The specific term parent node would be found
    a. In clustering
    b. In neural networks
    c. In rule induction
    d. None of the above

PART V: PROBLEM SET

PROBLEM (1): (10 points)

(1) This is a simple regression problem. You are a data analyst with the U.S. Department of Commerce. You have gathered information from the department's data warehouse on the world price of pulp and the exports of pulp to the rest of the world. You have been asked to see what would be the effect of a $1 increase in the price of pulp on shipments from the United States. Assume the United States produces 100% of the world pulp.

a. Table 1 below represents your data series (in a real world scenario, your series might consist of hundreds of data points). The variable X is the world price in current dollars per ton. So, for example, the first value is $792.32 per metric ton. The other variable Y is pulp shipments in millions of metric tons.
b. Table 2 is a scatter plot of the relationship between pulp prices and shipments.

TABLE 2: Scatter Plot of Pulp Shipments
[Scatter plot: Pulp Shipments (millions of metric tons) against World Pulp Price (dollars per ton)]

c. Table 3 is the regression equation model predicting the relationship between price and pulp shipments.
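Problem (1) asks what a $1 increase in the pulp price would do to shipments. In a simple regression of Y on X, that effect is the fitted slope. The sketch below fits the line by ordinary least squares. Table 1 is not reproduced in this preview, so the price and shipment figures are hypothetical placeholders, and `fit_simple_regression` is an illustrative name rather than anything defined in the exam.

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical placeholder data, NOT the exam's Table 1
prices = [600.0, 650.0, 700.0, 750.0, 800.0]   # X: dollars per ton
shipments = [30.0, 28.0, 26.0, 24.0, 22.0]     # Y: millions of metric tons

intercept, slope = fit_simple_regression(prices, shipments)

# The slope is the estimated change in shipments (millions of metric tons)
# for each $1 increase in the price per ton.
print("intercept:", intercept)
print("slope:", slope)
```

With the real Table 1 values substituted for the placeholder lists, the printed slope would be the quantity the problem asks about, in millions of metric tons per dollar.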