RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N...

17
RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS – UEFISCDI, PN-II-RU-TE-2014-4-2332 DEZVOLTAREA DE NOI MODELE COMPUTAȚIONALE PENTRU DETECTAREA DE COMUNITĂȚI BAZATE PE TEORIA JOCURILOR ȘI INTELIGENȚA COMPUTAȚIONALĂ. O1. Pregatirea dezvoltarii de noi modele computaționale pentru detectarea de comunități bazate pe teoria jocurilor și inteligența computațională. A1. Analiza teoretică a jocului propus și a conceptelor de echilibre disponibile ca și posibile soluții pentru problema de detectarea structurii de comunități. O2. Dezvoltarea de noi modele computaționale pentru detectarea de comunități bazate pe teoria jocurilor și inteligența computațională. A2. Dezvoltarea de noi operatori evolutivi potriviți pentru conceptele de echilibre de la O1A1; analiza complexității. A3. Efectuarea de experimente numerice pentru a testa operatorii din O1A2. Analiza rezultatelor și adaptarea/îmbunătățirea operatorilor. Repetarea experimentelor. Validarea prin teste statistice. A4. Comparații cu alte metode A5. Experimente pe rețele reale sociale și economice. Adaptare la specific de marketing. A6. Documentarea modelelor de difuzie de informatie existente A7. Construirea platformei online pentru detectarea structurilor de comunități. A8. Diseminarea rezultatelor REZULTATE PREVĂZUTE PENTRU LIVRARE: O1: - raport de cercetare; - articole trimise spre publicare; - pagina web; O2: - raport de cercetare; - 3 articole indexate Web of Science; - 10 participări la conferințe; - o platforma web pentru detectarea structurilor de comunități

Transcript of RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N...

Page 1: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC

NETWORKS.

GRANT CNCS – UEFISCDI, PN-II-RU-TE-2014-4-2332

DEZVOLTAREA DE NOI MODELE COMPUTAȚIONALE PENTRU DETECTAREA DE

COMUNITĂȚI BAZATE PE TEORIA JOCURILOR ȘI INTELIGENȚA COMPUTAȚIONALĂ.

O1. Pregatirea dezvoltarii de noi modele computaționale pentru detectarea de comunități bazate pe

teoria jocurilor și inteligența computațională.

A1. Analiza teoretică a jocului propus și a conceptelor de echilibre disponibile ca și posibile soluții

pentru problema de detectarea structurii de comunități.

O2. Dezvoltarea de noi modele computaționale pentru detectarea de comunități bazate pe teoria

jocurilor și inteligența computațională.

A2. Dezvoltarea de noi operatori evolutivi potriviți pentru conceptele de echilibre de la O1A1;

analiza complexității.

A3. Efectuarea de experimente numerice pentru a testa operatorii din O1A2. Analiza rezultatelor și

adaptarea/îmbunătățirea operatorilor. Repetarea experimentelor. Validarea prin teste

statistice.

A4. Comparații cu alte metode

A5. Experimente pe rețele reale sociale și economice. Adaptare la specific de marketing.

A6. Documentarea modelelor de difuzie de informatie existente

A7. Construirea platformei online pentru detectarea structurilor de comunități.

A8. Diseminarea rezultatelor

REZULTATE PREVĂZUTE PENTRU LIVRARE:

O1:

- raport de cercetare;

- articole trimise spre publicare;

- pagina web;

O2:

- raport de cercetare;

- 3 articole indexate Web of Science;

- 10 participări la conferințe;

- o platforma web pentru detectarea structurilor de comunități

Page 2: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

2

REZULTATE OBTINUTE:

LISTA DE LUCRARI PUBLICATE SAU TRIMISE SPRE PUBLICARE (ACTIVITATEA

ASOCIATA ESTE INDICATA IN PARANTEZE +O2 A8):

1. Noemi Gasko, Rodica Ioana Lung, Mihai Suciu, EXTREMAL OPTIMIZATION AND NETWORK

COMMUNITY STRUCTURE, The 7th International Conference on BIOINSPIRED

OPTIMIZATION METHODS AND THEIR APPLICATIONS, Bled, Slovenia, 18-20 mai 2016 (O2,

A1,A2,A3,A4)

2. Mihai Suciu, Rodica Ioana Lung, Noémi Gaskó, Game theory, Extremal optimization, and

Community Structure Detection in Complex Networks, GECCO '16, July 20-24, 2016, Denver,

CO, USA (conferinta categoria A – core conferences list, indexat ISI, O2, A1,A2,A3,A4)

3. Lung, R. I., Suciu, M., Meszlényi, R., Buza, K., & Gaskó, N. (2016, September). Community

Structure Detection for the Functional Connectivity Networks of the Brain. In International

Conference on Parallel Problem Solving from Nature (pp. 633-643). Springer International

Publishing (conferinta categoria A – core conferences list, indexat ISI, O2, A1,A2,A3,A4)

4. Noémi Gaskó, Rodica Ioana Lung and Mihai Suciu, Community Detection in Bipartite Networks

Using a Noisy Extremal Optimization Algorithm, Springer Series in Computational Intelligence

(accepted, indexat ISI, ISDA 2016, O2, A1,A2,A3,A4)

Lucrari aflate in evaluare la reviste:

1. Lung, R.I., Suciu, M.A., Gasko, N., A Probabilistic Game Theoretic Approach to the Problem of

Community Structure Detection in Complex Networks, in evaluare la revista PlosOne (zona

rosie AIS) O1

2. Lung, R.I., Suciu, M.A., Gasko, N., Adaptive Mixed Nash Extremal Optimization, acceptat cu

“major revisions” la MIT Evolutionary Computation (zona rosie AIS), retrimis spre evaluare dupe

ce recomandarea editorilor a fost de a face reviziuni.O1,O2.

PREZENTARI LA CONFERINTE:

1. Rodica Ioana Lung, Noemi Gasko, Mihai Suciu, Network community structure detection and

the Berge-Zhukovskii equilibrium, 28th European Conference on Operational Research,

Poznan, 3-6 iulie, 2016.

2. Mihai Suciu, Rodica Ioana Lung, Noémi Gaskó, Game theory, Extremal optimization, and

Community Structure Detection in Complex Networks, 12th European Meeting on Game

Theory (SING12), Odense (Denmark), 11-13 iulie, 2016.

3. Mihai Suciu, Rodica Ioana Lung, Noémi Gaskó, About Nash Equilibrium, Modularity

Optimization, and Network Community Structure Detection, GAMES 2016, the 5th World

Congress of the Game Theory Society, Maastricht University 24-28 iulie 2016.

PARTICIPĂRI LA CONFERINȚE:

Page 3: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

3

Nr. Crt

Nume si prenume Perioada Localitate conferinta

1. Lung Rodica Ioana 3-6 iulie 2016 Poznan /Polonia EURO 2016

2. Gasko Noemi 3-6 iulie 2016 Poznan /Polonia EURO 2016

3. Suciu Mihai Alexandru

3-6 iulie 2016 Poznan /Polonia EURO 2016

4. Lung Rodica Ioana 10-14 iulie 2016 Odense/Danemarca SING 2016

5. Lung Rodica Ioana 23-29 iulie 2016 Maastricht/Olanda GAMES 2016

6. Suciu Mihai Alexandru

19-29 iulie 2016 Maastricht/Olanda GAMES 2016

7. Suciu Mihai Alexandru

19-29 iulie 2016 Denver/SUA; GECCO 2016

8. Lung Rodica Ioana 17-22 sept. 2016 Edinburgh/Marea Britanie

PPSN 2016

9. Gasko Noemi 17-22 sept. 2016 Edinburgh/Marea Britanie

PPSN 2016

10. Gasko Noemi 17-21 mai 2016 Bled/Slovenia BIOMA 2016

11. Suciu Mihai Alexandru

17-21 mai 2016 Bled/Slovenia BIOMA 2016

12. Lung Rodica Ioana 17-21 mai 2016 Bled/Slovenia BIOMA 2016

PLATFORMA

www.econ.ubbcluj.ro/~rodica.lung/cscn

Page 4: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

Results

December 5, 2016

The community structure detection in networks is a graph clustering problem thatattempts to identify groups of nodes that are highly interconnected to each other andsparsely connected outside the group . While the concept of community seems trivial, theautomatic extraction of the community structure for simple unweighted and undirectednetworks has proven to be a computational challenge that requires truly adaptive andscalable methods.

Lung, R.I., Suciu, M.A., Gasko, N., Adaptive Mixed Nash Extremal Optimization,major revisions, MIT Evolutionary Computation, under evaluation. The AdaptiveMixed Nash Extremal Optimization (A-MNEO) method proposed in this paper extendsthe Mixed Nash Extremal Optimization (MNEO) by Suciu et al. (2015). MNEO is acommunity detection method that uses the game theoretic framework described aboveand a network shift diversity preservation mechanism within an extremal optimizationalgorithm.

One of the drawbacks of MNEO is that it requires an interval for the number of com-munities to search for, which may not always be available a-priori. A-MNEO addressesthis issue by estimating the number of communities during the initialization process.Another drawback of MNEO comes from the periodic use of the network shifting mech-anism, whether necessary or not. A-MNEO uses the network shifts adaptively, when thesearch stagnates for a period of time. Another adaptive feature introduced within A-MNEO is the updating mechanism of the number of nodes modified in one EO iteration,replacing the exponentially decreasing mechanism of MNEO. A-MNEO was tested onthe well-known GN and LFR benchmarks1 (Lancichinetti and Fortunato, 2009; Girvanand Newman, 2002).

The GN benchmark consists of networks having 128 nodes grouped in 4 communitiesof 32 nodes each. The degree of each node is zin + zout = 16, where zin is the number ofedges linking each node to others in the same community and zout represents the numberof edges linking each node with nodes from the other communities. Eight benchmarksets were randomly generated for different values of zout ∈ {1, 2, ..., 8}. Each set contains30 networks.

1generated using the code available at https://sites.google.com/site/ andrealancichinetti/software

1

Page 5: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

The LFR benchmark networks have 128 nodes, consisting of 6 sets of 30 networkswith the mixing parameter (ratio between number of links outside the community andnode degree) µ ∈ {0.1, 0.2, ..., 0.6}. The LFR parameters are: average vertex degree 20,maximum vertex degree 50, community size [10, 50].

The most challenging networks from the synthetic sets are the GN zout = 8 and theLFR sets with 128 nodes and µ ∈ {0.5, 0.6}, because these are networks with the mostunclear defined communities as the number of links a node has within its community isequal (zout = 8, µ = 0.5) or less (µ = 0.6) than the number of links to nodes outsideits community. Another feature that may increase the difficulty of the network is thedifference between the number of links a node has in its community and the maximumnumber of links connecting it to another community: a difference of only 1 makes itdifficult for an algorithm to correctly assign the node.

The results are evaluated using the normalized mutual information (NMI)2 proposedby Lancichinetti et al. (2009). The NMI can be used as an indicator when the correctcommunity structure is available; a higher value of the NMI indicates a ”better” solution;the maximum value of 1 indicates that the correct cover has been found. We computethe NMI of the individual having the best modularity value in A-MNEO population A,and also the best NMI value found in the population.

Statistical comparisons are performed by using the Wilcoxon sign rank nonparametrictest for NMIs reported by each considered method in 30 independent runs for each realnetwork and on the 30 networks for each GN or LFR set. The Wilcoxon sign rank assessesif there is a significant difference between two sample medians: the null hypothesis thattwo samples come from the same population can be rejected with a level of significanceα = 0.05 if the computed p-value is smaller than 0.05.

Comparison with other methods The results obtained with A-MNEO are comparedwith those provided by different state of the art algorithms: Louvain (Blondel et al.,2008), ModOpt - Modularity optimization (Sales-Pardo et al., 2007), Oslom (Lanci-chinetti et al., 2011), Infomap (Rosvall and Bergstrom, 2008)3, and with NoisyEO (Lunget al., 2015), a similar extremal optimization based method that maximizes modularity.

The Louvain method (Blondel et al., 2008) maximizes the modularity gradually bystarting with small communities detected by locally optimizing modularity, aggregatingthem as nodes in a new network, and repeating the local modularity optimization andaggregation until a maximum is found and a hierarchical structure is uncovered. Louvainwas designed to deal with large networks.

The ModOpt algorithm (Sales-Pardo et al., 2007) goes beyond simply maximizing themodularity, by defining a basin of attraction for covers that are local maxima in themodularity landscape and using it to compute an affinity matrix for network nodes. Byusing simulated annealing the nested block diagonal structure of this matrix is revealedand further used to identify the hierarchical structure of a network. If a network has a

2by using the code available at https://sites.google.com/site/ andrealancichinetti/software3For Louvain, ModOpt, Oslom, and Infomap we used the source code available at https://sites.google.com/site/andrealancichinetti/software.

2

Page 6: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

flat organization of nodes, ModOpt acts as a standard community detection algorithm.Oslom (Lancichinetti et al., 2011) optimizes repeatedly a score defined for clusters.

This score is computed by using the probability that an external node has a given numberof neighbors in a cluster in a random null model compared to the actual number ofconnections that the node has with the cluster. Optimized clusters are further analyzedin order to convey the optimum cover. Oslom is capable to deal with different types ofnetworks: unweighted and undirected, directed, weighed, hierarchical or dynamic.

Infomap (Rosvall and Bergstrom, 2008) uses a random walk to mimic the informationflow within a network, based on the assumption that information flows more quicklyand easily within a module. The random walk is encoded using Huffman coding; thecommunity structure problem is then transformed into a coding problem that minimizesits expected description of the code. A deterministic greedy search algorithm followedby a heath-bath algorithm are used to solve the optimization problem and uncover thecommunity structure.

NoisyEO (Lung et al., 2015) is another method that uses extremal optimization in asimilar setting with A-MNEO to maximize the modularity. Within NoisyEO the numberof nodes modified in a EO iteration is linearly decreased until the middle of the searchand set to one until the end, making the diversity induced by this mechanism dependenton the number of generations (in a similar manner with MNEO). NoisyEO also uses anetwork shifting mechanism, in a similar manner with MNEO, with the only differencethat all network modifications are random, following an uniform distribution, withoutpreserving node degrees.

Thus, A-MNEO is directly compared with two other methods that rely on modularityoptimization, two stochastic methods that are also based (indirectly) on optimization,and with a similar existing EO based approach.

Numerical results The results obtained for the synthetic benchmarks are presentedin Figure 1. For the GN and LFR benchmarks with 128 nodes and zout ∈ {1, 2, 3, 4} andµ ∈ {0.1, 0.2, 0.3} respectively, all methods reported NMI values of 1 for all networks, andtherefore we did not represent the results. However, increasing zout and µ, respectively,leads to differences between results. For the GN benchmark, Louvain and Infomap fail toidentify the community structure starting with zout = 7, while for zout = 8 all methodsexcept the EO based ones fail. With insignificant differences between A-MNEO andA-MNEO(m), A-MNEO provides best results for this benchmark. Other methods inliterature report an average NMI of approx. 0.4 for zout = 8 (Gong et al., 2012; Pizzuti,2008); Gong et al. (2013) report a maximum NMI value of approx. 0.6.

Moving to the LFR benchmark, it is clear that µ = 0.5 and µ = 0.6 are the mostchallenging sets for all methods. A-MNEO results are significantly better than thoseprovided by Oslom and Infomap; while there is no statistical difference indicated inthe matrix between A-MNEO and A-MNEO(m), the results of A-MNEO(m) cannot beconsidered statistically better than those provided by Louvain, ModOpt and NoisyEO.In (Honghao et al., 2013) the results reported by an Ant colony optimization, comparedwith other methods, report average NMI values of 0.2 for µ = 0.6.

3

Page 7: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

0

0.5

1

Wilcoxon h values1 2 3 4 5 6 7

1234567

0

0.5

11234567

0

0.5

11234567

0

0.5

1

A−

MN

EO

A−

MN

EO

(m)

Louvain

ModO

pt

Nois

yE

O

Oslo

m

Info

map

zo

ut=

8

zo

ut=

7

zo

ut=

6

zo

ut=

5

GN benchmark, 128 nodes

1 2 3 4 5 6 7

1 2 3 4 5 6 7

1234567

(a) GN

0

0.5

1

Wilcoxon h values1 2 3 4 5 6 7

1234567

0

0.5

11234567

0

0.5

11234567

0

0.5

1

A−

MN

EO

A−

MN

EO

(m)

Louvain

ModO

pt

Nois

yE

O

Oslo

m

Info

map

µ=

0.6

µ

=0.5

µ

=0.4

µ

=0.3

LFR benchmark, 128 nodes

1 2 3 4 5 6 7

1 2 3 4 5 6 7

1234567

(b) LFR 128

Figure 1: Comparisons with other methods. Boxplots of NMI values obtained for the 30networks in each set by each considered method. Wilcoxon h values matrixillustrate the statistical significance of the differences in results for the sevenmethods: a black box corresponds to p < 0.05 and rejection of the null hy-pothesis that the two samples have the same median. Results obtained for theGN and LFR benchmarks (128 nodes).

Lung, R.I., Suciu, M.A., Gasko, N., A Probabilistic Game Theoretic Approach tothe Problem of Community Structure Detection in Complex Networks, PlosOne,under evaluation. We explore the possibility of using a probabilistic version of theNash ascendancy relation within one of the extremal optimization methods designedby us to identify the community structure of a network. The probabilistic ascendancyrelation takes into account only a fraction of the network nodes when comparing twostrategy profiles. We show that p−Nash non-ascended solutions are also Nash equilibriaof the game. We address the practical concern that using the probabilistic relation maynot yield results as good as the Nash ascendancy relation when maintaining the samenumber of iterations by using numerical experiments. Results show that there are veryfew significant differences in results between different values for the fraction of nodes andthat the performance of pMNEO is significantly better than that of other state-of-artmethods on the tested benchmarks.

Regarding the running time, the differences between different probability levels arelow: an average decrease of 10% in running time when switching from p = 100% top = 25%. This result indicates that the Nash ascendancy relation, while indeed com-

4

Page 8: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

Algorithm 1 kEO(s, sbest) iteration

1: For current configuration s evaluate ui(s), the payoff function corresponding of nodei ∈ {1, . . . , n}.

2: find the k worst components and replace them with a random value;3: if (s is better4han sbest) then4: set sbest := s.5: end if

1 Nash, Berge, Modularity, Community fitness, or Replace all ;

putationally expensive, does not influence the running time of MNEO as much as othercomponents of the method. However, the probabilistic version does offer a less computa-tional expensive alternative to the Nash ascendancy relation, that can be used to enhanceother heuristics that attempt to solve the community structure detection problem usinggame theoretic approaches.

Mihai Suciu, Rodica Ioana Lung, Nomi Gask, Game theory, Extremal optimization,and Community Structure Detection in Complex Networks, GECCO ’16, July 20-24,2016, Denver, CO, USA (A core conferences list, ISI)

Tested variants Within NoisyEO the search is guided in two phases of the kEOiteration: (i) nodes having the lowest fitness are selected and randomly re-assigned inother communities, and (ii) selection for survival: the decision regarding the replacementof sbest (Alg, 1, line 3) can be made by using any fitness function that evaluates acommunity structure. The effect of this decision is studied in this paper: how does theuse of the game theoretic tools in this part influences the results compared with the useof the modularity or the community fitness (which actually defines the node payoff)? Toanswer this question five variants of NoisyEO that only differ in the fitness mechanismused for selection of survival in line 3, Alg. 1 are considered:

1. Nash: the Nash ascendancy relation is used: if s Nash ascends sbest, it will replaceit;

2. Berge: the Berge ascendancy relation is used: if s Berge ascends sbest, it willreplace it;

3. Modularity : s replaces sbest if it has a higher modularity (eq. (??)); in this casethe fitness of a node is also computed using the modularity:

u(Q)i (s) =

∑j∈Ci

(Aij −

kikj2m

)where A is the adjacency matrix, m is the total number of links, and ki is thedegree of node i, and Ci is the community of node i;

4. Community fitness: s replaces sbest if it has a higher community score (eq. (??));5. Replace all : s replaces sbest always, regardless of any fitness measure;

5

Page 9: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

Numerical experiments were performed with the five variants of NoisyEO and re-sults were compared with four state-of-art methods: Louvain Blondel et al. (2008),OSLOM Lancichinetti et al. (2011), Infomap Rosvall and Bergstrom (2008), and Mod-Opt Sales-Pardo et al. (2007)5.

On the small (128 nodes) synthetic benchmarks two variants stand out with bestresults that are significantly better than the others in most cases: Modularity (3) andCommunity fitness (4), both based on optimization, while results obtained when usingthe ascendancy relation are almost as good, with values close to 1 for the networks withwell defined community structures (µ < 0.5, zout ≤ 8). For µ = 0.5 results obtainedusing the Community fitness are statistically better than all others except Modularity,while for µ = 0.6 there are no statistical differences between results. Compared withother methods (Figure 2), for GN zout = 7, 8, and for LFR µ > 0.3 they are better thanModOpt and Infomap and as good as the others.

For the larger, but with better defined structures, LFR S and B sets with 1000 nodes,when comparing results with those reported by other methods (Figure 2), we can seethat the EO based approaches lack the precision provided by the rest, with very goodresults but having average values less than 1. Except ModOpt none of the methods isaffected by the value of µ.

Because in the final population the difference between the NMI value of the individ-ual having the best modularity value and the individual having the best NMI value aresignificant for these networks.The differences between the two illustrate the well knowndraw-back of modularity: in most cases, the maximum value of modularity does notcorrespond to the real community structure. This limitation is also visible in Figure 3where the results obtained by all methods on the real-world networks are presented: allmethods that rely on modularity report similar results with the EO variants when report-ing the NMI of the individual with best modularity value in the population (continuouscolored lines); only OSLOM, which does not use modularity, reports better results onthe dolphins and karate networks, and Infomap, which also does not use modularity, onkarate. When considering the individuals with best NMI values in the final EO popu-lation the results are different for the dolphins and karate networks: apparently gamebased variants (Nash and Berge) and Replace all report better results, but statisticalcomparisons show no difference between these two EO variants and the others. Simi-larities between results obtained by all methods for the real-world networks and thoseobtained for the LFR set with 128 nodes and µ = 0.6 suggest that this set may be closestto simulate real networks and that behavior of a method on this set may be an indicatorof possible good performance on real-world networks.

An interesting aspect of these results is related to the Replace all : even when noselection is used (the offspring always replaces the parent), results are similar to the gamebased approaches. The explanation may be that the extremal optimization proceduredoes select the worst nodes and modifies them randomly, and this selection is madebased on the node fitness computed using ui(s) = f(ci) − f(ci\{i}), the same with the

5using the source code available at https://sites.google. com/site/andrealancichinetti/software, lastaccessed May, 2015

6

Page 10: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

0 2 4 6 80

0.2

0.4

0.6

0.8

1

mean N

MI

zout

GN zout

= 8, 128 nodes

0 0.1 0.2 0.3 0.4 0.5 0.6 0.70

0.2

0.4

0.6

0.8

1

µ

LFR S, 128 nodes

0 0.1 0.2 0.3 0.4 0.5 0.6 0.70

0.2

0.4

0.6

0.8

1

mean N

MI

µ

LFR S, 1000 nodes

0 0.1 0.2 0.3 0.4 0.5 0.6 0.70

0.2

0.4

0.6

0.8

1

µ

LFR B, 1000 nodes

Comm. Fitness

Replace all

Louvain

ModOpt

Infomap

OSLOM

Figure 2: Synthetic benchmarks. Comparisons with other methods. Color error-bars(purple and blue) represent the best and worst EO variant, Community fitnessand Replace all, respectively. Black error bars correspond to results obtainedby the other state-of-art methods (Louvain, ModOpt, Oslom, and Infomap).

books dolphins football karate0

0.2

0.4

0.6

0.8

1

me

an

NM

I

Comm. Fitness

Replace all

Berge

Modularity

Nash

Comm. Fitness best NMI

Replace all best NMI

Berge best NMI

Modularity best NMI

Nash best NMI

Louvain

ModOpt

Infomap

OSLOM

Figure 3: Results obtained on the real-world networks: continuous color lines representaverage NMI values of the individuals reported by the method and dottedlines the best NMI value in the final population. Black lines represent resultsobtained by other methods.

7

Page 11: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

node payoff of game Γ. It can be inferred that the search is actually directed by theinner mechanism of the extremal optimization algorithm which randomly replaces theworst components in each individual, while the selection for survival plays only a smallrefinement role.

Mihai Suciu, Rodica Ioana Lung, Noemi Gasko, About Nash Equilibrium, ModularityOptimization, and Network Community Structure Detection, GAMES 2016, the5th World Congress of the Game Theory Society, Maastricht University 24-28 iulie2016. This article explores the possibility of using the Nash equilibrium concept inthe context of the network community detection problem. While it seems appealing tomodel the community structure by considering nodes as rational agents that choose thecommunity that best fits their interests, there is still a big gap that has to be filledwhen constructing the game and in considering the approximation method. In thispaper two possible payoff functions are considered. Results are controversial: one ofthem is better on the synthetic benchmark and the other yields better results on thereal-world networks. However, compared with results offered by state-of-art methods inthe literature, the tested approach appears better; further work consists in consideringmore realistic payoff functions, inspired from real-world models of community structureformation in order to further explore the possibility of using game theoretic concepts toapproach this problem.

Rodica Ioana Lung, Noemi Gasko, Mihai Suciu, Network community structure de-tection and the Berge-Zhukovskii equilibrium, 28th European Conference on Opera-tional Research, Poznan, 3-6 iulie, 2016. Figure 4 presents mean and standard errorsfor the results reported by BEO (ε = 0.001) and Infomap, OSLOM, and Modularity Op-timization in order to assess if these results are in fact comparable with those obtainedby other methods. According to the Wilcoxon sum rank test, BEO results are signif-

1 2 3 4 5 6 7 80

0.2

0.4

0.6

0.8

1

NM

I

zout

GN networks

0.2 0.4 0.6µ

LFR S networks

0.2 0.4 0.6µ

LFR B networks

books dolphins football karate

Real networks

network

BEO, ε = 10−3

Infomap

OSLOM

Modularity Optimization

Figure 4: Mean and standard error of the mean of NMI for BEO and the other algorithmsfor GN, LFR and real networks.

icantly better than all other methods considered for the GN zout = 8, LFR small andbig for µ = 0.5 and 0.6 and for the books and dolphins network. For GN zout = 7 BEOresults are significantly better than those of Oslom and Infomap, and for GN zout = 6,better than Infomap. For both LFR small and big, µ = 0.4 and for the karate network,

8

Page 12: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

BEO results are better than those obtained by Infomap and Modularity Optimization.The Wilcoxon sum rank test was performed for all tested values of ε.

These results indicate the potential of this approach: they show that the concept ofε-Berge-Zhukovskii equilibrium may be used to approach the community structure de-tection problem with results just as good and even better than those provided by otherstate-of-art methods. Many implications arise from here, of both practical and theoret-ical nature. From a practical point of view further experiments should be performed tostudy the behavior of search methods that attempt to compute the Berge-Zhukovskiiequilibrium for the community structure detection game. From a theoretical point ofview the challenge is to prove that the Berge-Zhukovskii equilibrium of this game indeedrepresents a network cover.

Nomi Gask, Rodica Ioana Lung and Mihai Suciu, Community Detection in Bipar-tite Networks Using a Noisy Extremal Optimization Algorithm, Springer Series inComputational Intelligence (accepted, indexat ISI, ISDA 2016) In this paper we areexploring the use of a community structure detection algorithm designed for unweightedand undirected networks for finding the structure of bipartite networks. By simply re-placing the Newman modularity with the Barber modularity we find promising results,but to improve the method more attention to the specificities of bipartite networks hasto be paid. The problem also requires attention: benchmarks with known structures areneeded to evaluate the performance of different methods and allow quantitative compar-isons, as a higher modularity may not always represent a better structure.

Noemi Gasko, Rodica Ioana Lung, Mihai Suciu, EXTREMAL OPTIMIZATION ANDNETWORK COMMUNITY STRUCTURE, The 7th International Conference onBIOINSPIRED OPTIMIZATION METHODS AND THEIR APPLICATIONS, Bled,Slovenia, 18-20 mai 2016 A comparative analysis of four variants of extremal opti-mization updating procedures for the community structure detection problem is per-formed in this paper. The results show that the use of an adaptive method of settingthe number of nodes to be randomly reassigned each iteration is beneficial; however,differences between tested variants are not significant enough to enable us to draw aconclusion regarding the best variant for the tested problems. Only when using an ex-ponential rule, results are worse than the other EO variants, but even in those situations,they are very good.

Numerical results also show that extremal optimization may be very powerful in ad-dressing the problem of community structure detection. Its main drawback, however,arises from the fact that random the computational time required by the iterative randomchanges makes this approach less efficient for large networks. On the other hand, thismethod proved very efficient for small networks with less visible community structures.Further work consists in finding the means to improve its scalability while maintainingits efficiency in dealing with ambiguous community structures.

9

Page 13: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

Figure 5: Community structure of the averaged whole brain functional connectivitynetwork.

Figure 6: Community structure of the anterior and posterior salience network in caseof (A) healthy subjects, (B) cannabis users without ADHD, (C) subjects withchildhood diagnoses of ADHD who does not use cannabis, (D) subjects withchildhood diagnoses of ADHD who regularly use cannabis.

Lung, R. I., Suciu, M., Meszlnyi, R., Buza, K., & Gask, N. (2016, September).Community Structure Detection for the Functional Connectivity Networks of theBrain. In International Conference on Parallel Problem Solving from Nature (pp.633-643). Springer International Publishing (A core conferences list, ISI) Theanalysis of brain functional connectivity networks from the community structure pointof view can offer important information about the structure and functioning of thebrain. The brain networks are relatively small, with very unclear structure, not detectedby existing algorithms. In this paper we propose a game theoretic approach capableto identify strong connections in these networks and construct community structuresthat can offer relevant knowledge about the functioning of the brain. Moreover, theemergent field of neuromarketing proposes the analysis of the brain’s reaction to variousstimuli in connection to advertisements and exposure to different products. The methodproposed in this paper can be used to analyze functional connectivity networks of thebrain constructed from such data and offer more insights into possible reactions. Thusthe developed method opens the door to use community structure detection algorithmfor analyzing sensitive marketing data.

Neuromarketing is an emerging interdisciplinary field that uses tools of neuroscience

10

Page 14: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

in marketing applications by studying the reaction of individuals to different marketingapproaches. In this project we are interested in analyzing functional magnetic resonanceimaging (fMRI data) by using the method proposed in the above paper to gain a deeperunderstanding about the functioning of the brain and its reaction to various stimuli.We find this application challenging, as it strongly relates to the second objective of heproject, that of studying information diffusion in networks.

Information Diffusion in Social Networks In this phase we performed a thorough doc-umentation of existing literature on information diffusion in social networks by using theScopus (www.scopus.com) database. We found several major approaches to informationdiffusion.

According to Pei and Makse (2013) the network diffusion models can be divided intwo main classes: independent interaction models and threshold models.

Within independent interaction models it is assumed that an infected node transmitsto its neighboring nodes the information/infection with a certain probability and thatsuch probabilities across the network are independent. Examples are the susceptible-infected-recovered (SIR), susceptible-infected-susceptible (SIS), and the Bass model.

Threshold models take into account the possibility that a node accepts an informationonly if a fraction of its neighbors have already done so. We mention here the LinearThreshold Model, the General Threshold Model, the standard rumor model, the votermodel, the strategic games model, etc.

An important component of diffusion models is also the identification of influentialnodes/ individuals in the network either by considering various centrality measures. Afurther step is to find sets of influential nodes.

Zhang et al. (2016) points out the lack of a common ground of the field. In hisextensive review classifies dynamic spreading models into: threshold models, cascadingmodels and epidemic models.

Related to our project, we were particularly interested in works relating the communitystructure of a network with modeling information diffusion. We found two mainstreamapproaches:

• The first one uses information diffusion methods to identify the community struc-ture of a network (Shen et al., 2010; Chen, 2011; Hajibagheri et al., 2013; Fanet al., 2015; Jeub et al., 2015; Salnikov et al., 2016);

• The second one, which is also in the scope of our project, is to use the insightsprovided by knowing the community structure of the network to model and opti-mize information diffusion in the network (Anshelevich et al., 2014; Jiang et al.,2015; Vega-Oliveros and Berton, 2015; Halappanavar et al., 2016; Curato and Lillo,2016)

Game theory offers efficient tools for modeling information diffusion (Jackson andZenou, 2015) and the implication of game theory in modeling information diffusion ismentioned in many articles. However, there are very few works that connect community

11

Page 15: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

structure detection methods, game theory and information diffusion models (Krishna-murthy, 2014; Zhang et al., 2016). One reason for this may be that equilibria conceptsprovided by game theory are often intractable and thus considered useless by the prac-titioners community; it is our assumptions that by providing methods of incorporatingsuch concepts as alternates to optimization solution concepts used within diffusion mod-els would enhance these models while also bridging the gap between game theory andpractical applications.

References

Anshelevich, E., Hate, A., and Magdon-Ismail, M. (2014). Seeding influential nodes innon-submodular models of information diffusion. Autonomous Agents and Multi-AgentSystems, 29(1):131–159.

Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. (2008). Fast unfold-ing of communities in large networks. Journal of Statistical Mechanics: Theory andExperiment, 2008(10):P10008.

Chen, W. (2011). Discovering communities by information diffusion. In Proceedings -2011 8th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD2011, volume 2, pages 1123–1132.

Curato, G. and Lillo, F. (2016). Optimal information diffusion in stochastic block models.94(3).

Fan, C., Wu, H., and Zhang, C. (2015). Community division of bipartite network basedon information transfer probability. volume 9329, pages 133–143. Springer Verlag.

Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biologicalnetworks. Proceedings of the National Academy of Sciences, 99(12):7821–7826.

Gong, M., Cai, Q., Chen, X., and Ma, L. (2013). Complex network clustering by multi-objective discrete particle swarm optimization based on decomposition. EvolutionaryComputation, IEEE Transactions on, PP(99):1–1.

Gong, M., Ma, L., Zhang, Q., and Jiao, L. (2012). Community detection in networks byusing multiobjective evolutionary algorithm with decomposition. Physica A: StatisticalMechanics and its Applications, 391(15):4050 – 4060.

Hajibagheri, A. R., Hamzeh, A., Alvari, H., and Hashemi, S. (2013). Social networkscommunity detection using the Shapley Value. Proceedings of the 2013 IEEE/ACMInternational Conference on Advances in Social Networks Analysis and Mining -ASONAM ’13, 37(E1):51–65.

Halappanavar, M., Sathanur, A. V., and Nandi, A. K. (2016). Accelerating the miningof influential nodes in complex networks through community detection. In Proceedings

12

Page 16: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

of the ACM International Conference on Computing Frontiers - CF ’16, pages 64–71.Association for Computing Machinery, Inc.

Honghao, C., Zuren, F., and Zhigang, R. (2013). Community detection using ant colonyoptimization. In Evolutionary Computation (CEC), 2013 IEEE Congress on, pages3072–3078.

Jackson, M. O. and Zenou, Y. (2015). Games on Networks. Handbook of Game Theorywith Economic Applications, 4(1):95–163.

Jeub, L. G. S., Balachandran, P., Porter, M. A., Mucha, P. J., and Mahoney, M. W.(2015). Think locally, act locally: Detection of small, medium-sized, and large commu-nities in large networks. Physical Review E - Statistical, Nonlinear, and Soft MatterPhysics, 91(1).

Jiang, J., Wen, S., Yu, S., Xiang, Y., and Zhou, W. (2015). K-Center: An Approachon the Multi-Source Identification of Information Diffusion. IEEE Transactions onInformation Forensics and Security, 10(12):2616–2626.

Krishnamurthy, V. (2014). Interactive Sensing and Decision Making in Social Networks.Foundations and Trends R© in Signal Processing, 7(1-2):1–196.

Lancichinetti, A. and Fortunato, S. (2009). Benchmarks for testing community detectionalgorithms on directed and weighted graphs with overlapping communities. Phys. Rev.E, 80:016118.

Lancichinetti, A., Fortunato, S., and Kertesz, J. (2009). Detecting the overlappingand hierarchical community structure in complex networks. New Journal of Physics,11(3):033015.

Lancichinetti, A., Radicchi, F., Ramasco, J. J., and Fortunato, S. (2011). Findingstatistically significant communities in networks. PloS one, 6(4):e18961.

Lung, R., Suciu, M., and Gasko, N. (2015). Noisy extremal optimization. Soft Comput-ing, pages 1–18.

Pei, S. and Makse, H. A. (2013). Spreading dynamics in complex networks. Journal ofStatistical Mechanics: Theory and Experiment, 2013(12):P12002.

Pizzuti, C. (2008). Ga-net: A genetic algorithm for community detection in social net-works. In Parallel Problem Solving from Nature–PPSN X, pages 1081–1090. Springer.

Rosvall, M. and Bergstrom, C. T. (2008). Maps of random walks on complex net-works reveal community structure. Proceedings of the National Academy of Sciences,105(4):1118–1123.

Sales-Pardo, M., Guimera, R., Moreira, A. A., and Amaral, L. A. N. (2007). Extractingthe hierarchical organization of complex systems. Proceedings of the National Academyof Sciences, 104(39):15224–15229.

13

Page 17: RAPORT DE CERCETARE - econ.ubbcluj.rorodica.lung/cscn/RaportFinal.pdf · RAPORT DE CERCETARE CSC-N - COMMUNITY STRUCTURE AND DIFFUSION IN SOCIAL AND ECONOMIC NETWORKS. GRANT CNCS

Salnikov, V., Schaub, M. T., and Lambiotte, R. (2016). Using higher-order Markovmodels to reveal flow-based communities in networks. Scientific Reports, 6:23194.

Shen, K., Song, L., Yang, X., and Zhang, W. (2010). A hierarchical diffusion algorithmfor community detection in social networks. In Proceedings - 2010 International Con-ference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC2010, pages 276–283.

Suciu, M., Lung, R. I., and Gasko, N. (2015). Mixing Network Extremal Optimization forCommunity Structure Detection, volume 9026 of Lecture Notes in Computer Science.Springer International Publishing, Cham.

Vega-Oliveros, D. A. and Berton, L. (2015). Spreader selection by community to maxi-mize information diffusion in social networks. In CEUR Workshop Proceedings, volume1478, pages 73–82. CEUR-WS.

Zhang, Z.-K., Liu, C., Zhan, X.-X., Lu, X., Zhang, C.-X., and Zhang, Y.-C. (2016).Dynamics of information diffusion and its applications on complex networks. PhysicsReports, 651:1–34.

14