LECTIA 3
-
Upload
diana-carbune -
Category
Documents
-
view
213 -
download
0
description
Transcript of LECTIA 3
-
TEHNICIDATAMINING 2007
Lectordr.DANIELAJOITAUNIVERSITATEATITUMAIORESCU
LECTIA3Clustering Kmeansclustering;
ClusteringaglomerativClusteringdiviziv
-
TEHNICIDATAMINING 2007
Lectordr.DANIELAJOITAUNIVERSITATEATITUMAIORESCU
Cap4.ClusterizareClusterizareaesteprocesuldedivizareauneibazededateingrupedeinregistrarisimilarastfelincatmembriiaceleasigrupesafiecatsepoatedeapropriatiunuldealtul,iargrupurilesuntcatsepoatededepartateuneledecelelalte.Inclusterizarenuexistaniciomtdedatepreclasificatesinusefaceniciodistinctieintrevariabileleindepsiceledependente.Var(atributele)infctdecaresefaceoperatiadeclusterizaresenumescvardeintrare,iardimensiuneaproblemeiestedatadenrvardeintrare.Ppcaavemnvariabiledeintrare: nXXX ,...,, 21 .Atuncifiecareinregcecontinecateovaloareptcelenvar( nn xXxXxX === ,...,, 2211 )reprezintaunpunct ( )nxxx ,...,, 21 inspatialndimensional.CelmaidesfolositalgdeclusterizareestealgKmeans(alclusterizariicukclusterefolosindmediaaritmetica).AlgKmeans(MacQueen1967)Fiekfixat,k=nrclustere.
1. Sealeglaintamplarekpuncte(inregistrari)cafiindcentereleinitialealecelorkclustere.(MacQueenpropunealegereaprimelorkinreg)
2. Ptfiecareinregdeterminacelmaiapropriatcentrusiatribuieinregistrariiclusterulasociatcentrului.
3. Ptfiecarecluster,calculeazamediainregdincluster.Mutacentralclusteruluiinpctcorespmediei.
4. Repetapasii2si3panacandseobtineconvergentaadicapanacandnrdereatribuirialeclusterelorestemaimicdecatovaloaredata.
Obs.Algfunctioneazanumaiptatribuitecuvalorinumerice.Oposibilafctdistantcaresadescrienotiuneadecelmaiapropriatestefctdistantaeuclidianaintredouapct: ( )nxxxX ,...,, 21= si
( )nyyyY ,...,, 21= : ( ) ( )2211 ...),( nn yxyxYXd ++= .Exemplu:Sappcan=2adicaavemdouaattributedeintrare,atunciputemreprezentapunctele(inregistrarile)inplan:
0
1
2
3
4
5
0 1 2 3 4 5 6 7 8
-
TEHNICIDATAMINING 2007
Lectordr.DANIELAJOITAUNIVERSITATEATITUMAIORESCU
Dacaalegemk=3,AlegemlaintamplarecentreleclusterelorpuncteleA(1,1),B(3,1),C(7,1).Grupampuncteleinclustereastfel:cluster1:punctele(1,1),(0,2),(2,2)cluster2:punctele(3,1),(4,1),(4,3),(5,3)cluster3:punctele(6,4),(6.5,3.5),(7,1).Calculammediileclusterelorsirecentram.
cluster1:x=(1+0+2)/3=1,y=(1+2+2)/3=5/3cluster2:x=(3+4+4+5)/4=4,y=(1+1+3+3)/4=2cluster3:x=(7+6+6.5)/3=6.5,y=(1+4+3.5)/3Serecalcdistanteledelafiecarepunctlanoilecentre.Cumdistminimenuduclareatribuirialeclusterelor,seobtineconvergenta.Deasemenea,alegandcele3centreinitialealtfel,clusterelerezultatevorfialtele.Deexemplu,Cluster1:acelasicentruCluster2:acelasicentruCluster3:x=(2+3+4+7+4+5+6+6.5)/8=4.68,y=(2+1+1+3+3+4+3.5+1)/8=2.31
2)131.2()68.43()2,1( 22
-
TEHNICIDATAMINING 2007
Lectordr.DANIELAJOITAUNIVERSITATEATITUMAIORESCU
Exempludedendograma:
Metodeaglomerative:Producoseriedepartitiialeinreg: 11,..., PPP nn unden=nrtotaldeinreg
}}{},...,{},{{ 21 nn iiiP = nmt(nclustere)...
},...,,{ 211 niiiP = 1clusterLafiecarepasmetodaunsetecatedouaclusteremaiexactdouaclusterecaresuntcelemaiapropriate:AlegemR,Sa.i. ),(min),(
,srDSRD
clusteresr= .UnimRsiS.
Diferentadintremetodeaparedatoritamodurilordiferitedeamasuradistintreclustere.
1. ClusterizareculegsimplaDistintreclustere=celmaiscurtdrumintreclustere
}/),(min{),( sclusterulinjsirclusterulinesteijidsrD =
2. ClusterizareculegcompletaDistintreclustere=distantadintrecelemaidepartateinregdinclustere
}/),(max{),( sclusterulinjsirclusterulinesteijidsrD =
3. Clusterizareculegmedie
D(r,s)=media{d(i,j)/iinclusterulr,jsuntinclusteruls}=)()(
),(scardrcard
jid
4. Clusterizareculegmediedegruprsisseunesca.i.dupauniredistmediedininteriorulfiecaruiclustersafieminima.Ppcanoulclusterformatprinunirealuircusestet.Atunci:D(r,s)=media{d(i,j)/i,jsuntinclusterultformatprinfuzionarealuircus}
5. ClusterizareculegaturaWard:Dist=crestereainsumapatrateloreroriiESSdupafuzionareacelordouaclustereintrunulsingur.SealegpasisuccesivicaresaminimizezecrestereainESSlafiecarepas.
MetodeaglomerativeMetodeaglomerative Metodedivizive
-
TEHNICIDATAMINING 2007
Lectordr.DANIELAJOITAUNIVERSITATEATITUMAIORESCU
X=mt,n=nrelemaleluiX
=
=n
ii mediaxXESS
1
2)(
D(r,s)=ESS(t=clusterobtprinfuzionarealuirsis)(ESS(r)+ESS(s))Metodedivizive:Algoritm:Initialtoateinregsuntintrunsingurcluster.Sedecideunpragptdist.Secalcdistdintreorice2inregsisedetperecheacuceamaimaredist.Distmaxsecomparacupragulptdist.Dacadistmax>pragptdistatuncigrupulseimparteindoua.Cele2inregsepunin
clusterediferiteiarcelelaltepctesepuninclusterulcelmaiapropriat.Siserepetaprocedeul.
Dacadistmax