Statistics (English Edition), 37 pages

  • Published 2022-08-13

Part 1. Gathering and Exploring Data (descriptive statistics)

Different Types of Data (2.1)

Variable
A variable is any characteristic observed on the subjects in a study.
Examples: Marital status, Height, Weight, IQ, Sqft, Price, NE.
A variable can be classified as either Categorical (in categories) or Quantitative (numerical).

A variable is categorical if each observation belongs to one of a set of categories.
Examples:
  • Gender (Male or Female)
  • Religious Affiliation (Catholic, Jewish, …)
  • Type of Residence (Apartment, Condo, …)
  • Belief in Life After Death (Yes or No)
  • NE (located in the northeast sector of the city (1) or not (0))
A variable is called quantitative if observations on it take numerical values that represent different magnitudes of the variable.
Examples: Age, Number of Siblings, Annual Income, Selling price, Sqft

Discrete versus continuous quantitative variables
A quantitative variable is discrete if its possible values form a set of separate numbers, such as 0, 1, 2, 3, … The set of possible values is not dense.
Examples:
  o Number of pets in a household
  o Number of children in a family
  o Number of foreign languages spoken by an individual
A quantitative variable is continuous if its possible values form an interval. The set of possible values is dense.
Examples:
  o Height/Weight
  o Age
  o Blood pressure

Exercise
Identify the variable type:
  1. Number of siblings in a family
  2. County of residence
  3. Distance (in miles) of commute to school
  4. Marital status
  5. Length of time to take a test
  6. Number of people waiting in line
  7. Number of speeding tickets received last year
  8. Your dog's weight

Proportion & Percentage (Relative Frequencies)
The proportion of the observations that fall in a certain category is the frequency (count) of observations in that category divided by the total number of observations:

\[ \text{proportion} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}} \]

The percentage is the proportion multiplied by 100.
Proportions and percentages are also called relative frequencies.

Example
The table classifies the 630 parliamentary seats of the Italian Chamber of Deputies by coalition (2013 elections).

  Coalition            Seats (Freq.)   Prop.    Perc.
  Pierluigi Bersani    345             0.548    54.8
  Silvio Berlusconi    125             0.198    19.8
  Beppe Grillo         109             0.173    17.3
  Mario Monti          47              0.075    7.46
  Vallee d'Aoste       1               0.002    0.16
  MAIAE                2               0.003    0.32
  USEI                 1               0.002    0.16
  Antonio Ingroia      0               0        0
  Total                630             1        100

So, for Bersani, 345 is the frequency; 0.548 = 345/630 is the proportion (relative frequency); and 54.8 is the percentage, since 0.548 × 100 = 54.8%.

Frequency Table
A frequency table is a listing of possible values for a variable, together with the number of observations and/or relative frequencies for each value.

  Raw data
  Code      Gender
  000001    F
  000002    M
  ...       ...
  100000    F

  Frequency table
  Gender    n_i       f_i     p_i
  F         1000      0.01    1
  M         99000     0.99    99
  Total     100000    1.00    100

Example
A stock broker has been following different stocks over the last month and has recorded whether a stock is up, the same, or down in value. The results were:

  Performance of stock    Up    Same    Down
  Count                   21    7       12

  • What are the subjects?
  • What is the variable of interest?
  • What type of variable is it?
  • Add proportions to this frequency table.

Describe data using graphical summaries (2.2)

Distribution
A graph or frequency table describes a distribution.
A distribution tells us the possible values/categories a variable takes, as well as how often those values occur (frequency, relative frequency, or percentage).
In the 2008 General Social Survey, 2020 respondents answered the question, "How many children have you ever had?" The results were:
[Table: distribution of the number of children; not preserved in the source]

Graphs for categorical data: bar graphs and pie charts
Use pie charts and bar graphs to summarize categorical variables.
Pie Chart
  o A circle where each category is represented as a "slice of the pie"
  o The size of each pie slice is proportional to the percentage of observations falling in that category
Bar Graph
  o Bar graphs display a vertical bar for each category
  o The height of each bar represents either counts ("frequencies") or percentages ("relative frequencies") for that category

Pie Chart

  CAR COMPANIES    n_i      p_i (%)
  FIAT             38030    52
  FORD             12820    18
  OPEL             12720    17
  RENAULT          9630     13
  TOTAL            73200    100

[Pie chart: Cars sold. FIAT 52%, FORD 18%, OPEL 17%, RENAULT 13%]

Bar Graph
[Bar graphs: counts (left) and percentages (right; I = Italy, F = France) of sales for FIAT, FORD, OPEL, RENAULT]
Pie chart: easier to compare one category with the whole.
Bar graph: easier to compare categories with each other.
Bar graphs are called Pareto charts when the categories are ordered by their frequency, from the tallest bar to the shortest bar.

Graphs for quantitative data: dot plot
A dot plot shows a dot for each subject (observation), placed above its value on a number line.
To construct a dot plot:
  • Draw a horizontal line and label it with the name of the variable.
  • Mark regular values of the variable on it.
  • For each observation, place a dot above its value on the number line.

Graphs for quantitative data: histograms
A histogram is a graph that uses bars to portray the frequencies or the relative frequencies of the possible outcomes for a quantitative variable.
Steps for constructing a histogram:
  1. Divide the range of the data into intervals of equal width.
  2. Count the number of observations in each interval, creating a frequency table.
  3. On the horizontal axis, label the values or the endpoints of the intervals.
  4. Draw a bar over each value or interval with height equal to its frequency (or proportion or percentage), values of which are marked on the vertical axis.
  5. Label and title appropriately.

Sodium Data: 0, 210, 260, 125, 220, 290, 210, 140, 220, 200, 125, 170, 250, 150, 170, 70, 230, 200, 290, 180

Displaying Data over Time: time plots
Time plots are used for displaying a time series, a data set collected over time.
A time plot shows each observation on the vertical scale against the time it was measured on the horizontal scale. Points are usually connected.
Common patterns in the data over time, known as trends, should be noted.

Measuring the Center of Quantitative Data (2.3)
Distribution of resale home prices (in thousands of dollars) in city 0 and city 1
[Figure: density histograms of home prices for city 0 and city 1; horizontal axis from 50 to 250, vertical axis Density]
Where are the houses more expensive?

To answer the previous question we have to identify a single value (center, location) that represents the whole distribution. There are several ways to compute the center of a distribution:
  - Mean
  - Median
  - Mode

Calculating the mean (only quantitative variables)
The (arithmetic) mean is the sum of the observations divided by the number of observations:

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]

The mean is affected by extreme values.

Properties
The mean:
  1. is internal: it always lies between the minimum and the maximum;
  2. is the fair value: substituting the mean for the value of each observation preserves the total amount;
  3. makes the sum of the deviations \( (X_i - \bar{X}) \) equal to zero;
  4. is linear. In formulas: if \( \bar{X} \) is the mean of \( x_1, x_2, \ldots, x_n \) and \( \bar{Y} \) is the mean of \( y_1, y_2, \ldots, y_n \), where \( y_i = a + b x_i \), then \( \bar{Y} = a + b \bar{X} \).

Calculating the median
The median is the midpoint of the observations when they are ordered from the smallest to the largest (or from the largest to the smallest).
To compute the median:
  • Order the observations.
  • If the number of observations is:
      o Odd, then the median is the middle observation
      o Even, then the median is the average of the two middle observations
The median is NOT affected by extreme values.

Resistant Measures
A numerical summary measure is resistant if extreme observations (outliers) have little, if any, influence on its value.
The median is resistant to outliers.
The mean is not resistant to outliers.

The mode of a distribution (all types of variables)
  • Mode = value/category that occurs most often
  • The mode is most often used with categorical data
  • Not affected by extreme values
  • There may be no mode
  • There may be several modes

Review example: Five houses on a hill by the beach
Prices: $2,000K, $500K, $300K, $100K, $100K
Mean: ($3,000,000 / 5) = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000

Spread (variability) of Quantitative Data (2.4)
Distribution of resale home prices (in thousands of dollars) in city 0 and city 1
[Figure: density histograms of home prices for city 0 and city 1; horizontal axis from 100 to 250, vertical axis Density]
Where is the dispersion (spread, variability, inequality) greater?

Range
One way to measure the spread is to calculate the range. The range is the difference between the largest and smallest values in the data set:

  Range = max − min

The range is simple to compute and easy to understand, but it uses only the extreme values and ignores the other values. Therefore, it is severely affected by outliers.

Calculate the standard deviation
Each data value has an associated deviation from the mean, \( x_i - \bar{x} \).
A deviation is positive if the value falls above the mean and negative if it falls below the mean.
The sum of the deviations is always zero (i.e. the mean is the center).
A measure of variation can be obtained by summing the squared deviations of each observation from the mean and calculating the average of these squared deviations. In formulas:

Variance:
\[ \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

The original unit of measurement can be recovered by taking the square root.

Standard Deviation:
\[ \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

Steps
  o Find the mean
  o Find the deviation of each value from the mean
  o Square the deviations
  o Sum the squared deviations
  o Divide the sum by n
  o Compute the square root

Example 1
Metabolic rates of 7 men (cal./24 hr.): 1792, 1666, 1362, 1614, 1460, 1867, 1439

\[ \bar{x} = \frac{1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439}{7} = \frac{11{,}200}{7} = 1600 \text{ cal./24 hr.} \]

  Observations    Deviations              Squared deviations
  1792            1792 − 1600 = 192       (192)^2 = 36864
  1666            1666 − 1600 = 66        (66)^2 = 4356
  1362            1362 − 1600 = −238      (−238)^2 = 56644
  1614            1614 − 1600 = 14        (14)^2 = 196
  1460            1460 − 1600 = −140      (−140)^2 = 19600
  1867            1867 − 1600 = 267       (267)^2 = 71289
  1439            1439 − 1600 = −161      (−161)^2 = 25921
                  sum = 0                 sum = 214870

\[ \sigma^2 = \frac{214870}{7} = 30695.71 \]
\[ \sigma = \sqrt{30695.71} = 175.202 \text{ cal./24 hr.} \]

When the data set is a sample, the two previous indices are adjusted by dividing the sum of the squared deviations by n − 1 instead of n. In formulas:

Sample Variance:
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

Sample Standard Deviation:
\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

The motivation for this modification will be explained in the third part.
  - It is important to note that \( s^2 \) [\( s \)] has exactly the same properties as \( \sigma^2 \) [\( \sigma \)], even though its value is always larger (the two coincide only when all observations are equal, in which case both are zero).
  - \( \sigma^2 \) and \( \sigma \) are also called the population variance and the population standard deviation, respectively.

Examples 2 and 3
[Content not preserved in the source]

Properties of the standard deviation
  • \( \sigma \) [\( s \)] measures the spread (variability) of the data.
  • \( \sigma \) [\( s \)] = 0 only when all observations have the same value; otherwise \( \sigma \) [\( s \)] > 0.
  • As the spread of the data increases, \( \sigma \) [\( s \)] gets larger.
  • \( \sigma \) [\( s \)] has the same units of measurement as the original observations. The variance \( \sigma^2 \) [\( s^2 \)] has units that are squared.
  • \( \sigma \) [\( s \)] is not resistant. A few extreme values can greatly increase its value.
  • If \( \sigma_x^2 \) [\( s_x^2 \)] is the variance of \( x_1, x_2, \ldots, x_n \) and \( \sigma_y^2 \) [\( s_y^2 \)] is the variance of \( y_1, y_2, \ldots, y_n \), where \( y_i = a + b x_i \), then \( \sigma_y^2 = b^2 \sigma_x^2 \) [\( s_y^2 = b^2 s_x^2 \)] and \( \sigma_y = |b| \sigma_x \) [\( s_y = |b| s_x \)].

Measures of Position (2.5)

Percentile
The pth percentile is a value such that p percent of the observations fall below or at that value.

Quartiles
SUMMARY: Finding Quartiles

Z-Score
The z-score also identifies position and potential outliers.
The z-score for an observation is the number of standard deviations that it falls from the mean.
A positive z-score indicates the observation is above the mean. A negative z-score indicates the observation is below the mean.
For sample data, the z-score is calculated as:

\[ z_i = \frac{X_i - \bar{X}}{s} \]

An observation from a bell-shaped distribution is a potential outlier if its z-score is < −3 or > +3 (3 standard deviation criterion).
It is important to note that the z-scores have mean 0 and standard deviation 1.
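
The arithmetic in this part can be checked with a few lines of Python. The sketch below is not part of the original slides. It follows the listed steps for the standard deviation on the metabolic-rate data of Example 1, computes sample z-scores, and recomputes the mean, median, and mode of the five house prices. All variable names are illustrative assumptions.

```python
import math
import statistics

# Metabolic rates of 7 men (cal./24 hr.), taken from Example 1.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
n = len(rates)

# Steps: mean -> deviations -> squared deviations -> sum -> divide -> root.
mean = sum(rates) / n
deviations = [x - mean for x in rates]
sum_sq = sum(d ** 2 for d in deviations)

pop_var = sum_sq / n             # population variance (divide by n)
pop_sd = math.sqrt(pop_var)      # population standard deviation
sample_var = sum_sq / (n - 1)    # sample variance (divide by n - 1)
sample_sd = math.sqrt(sample_var)

# z-scores for sample data: (x_i - mean) / s.
z_scores = [d / sample_sd for d in deviations]
z_mean = sum(z_scores) / n
z_sd = statistics.stdev(z_scores)  # sample standard deviation of the z-scores

# Review example: five house prices, in thousands of dollars.
prices = [2000, 500, 300, 100, 100]
house_mean = statistics.mean(prices)
house_median = statistics.median(prices)
house_mode = statistics.mode(prices)

print("mean:", mean, "sum of deviations:", sum(deviations))
print("population variance:", round(pop_var, 2), "sd:", round(pop_sd, 3))
print("sample variance:", round(sample_var, 2), "sd:", round(sample_sd, 3))
print("z-scores mean and sd:", round(z_mean, 6), round(z_sd, 6))
print("houses (mean, median, mode):", house_mean, house_median, house_mode)
```

Dividing by n − 1 gives a sample standard deviation slightly larger than the population value for the same seven observations. Up to floating-point rounding, the z-scores have mean 0 and standard deviation 1, as stated above.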
