Presentation Journée Thématique - Fouille de Grands Graphes

26
Point of View Based Clustering of Socio-Semantic Networks Juan David CRUZ 1 Cécile BOTHOREL 1 François POULET 2 JT – FGG2010

Transcript of Presentation Journée Thématique - Fouille de Grands Graphes

Point of View BasedClustering ofSocio-Semantic NetworksJuan David CRUZ1

Cécile BOTHOREL1

François POULET2JT – FGG2010

Outline

1 IntroductionSocial Networks and Points of ViewSome Previous Work

2 The Point of View of Social Networks

3 Influencing the Community Detection with the Point of ViewPhase 1Phase 2

4 Preliminary Experiments and Results

5 Conclusion

page 2 Cruz,Bothorel,Poulet Clustering by Point of View

MotivationIntroduction I Social Networks and Points of View

� Socio–semantic networks are contains both:• The social graph• Semantic information represented by the features of thevertices and the edges.

� Given this information, how to identify communities derivedfrom the conjoint use of it?

� It is necessary to measure the quality of the partitions foundusing this information from two perspectives:

• The quality of the graph clustering• The quality of the semantic information within thecommunities

page 3 Cruz,Bothorel,Poulet Clustering by Point of View

Social Networks and Points of ViewIntroduction I Social Networks and Points of View

It is possible to obtain different partitions from different pointsof view

page 4 Cruz,Bothorel,Poulet Clustering by Point of View

Quality MeasuresIntroduction I Some Previous Work

Type Objective Examples

Similarity

Reduce the distance betweenthe members of the samegroup while the distance

between groups is increased.

Manhattan L1Euclidean L2Chebyshev L∞

Quality

Increase the number of edgeswithin each community whilethe number of edges betweencommunities is reduced. In

general: index (C) = f (C)+g(C)N(G)

[Gae05]

CoverageConductancePerformanceModularity

page 5 Cruz,Bothorel,Poulet Clustering by Point of View

Graph Clustering AlgorithmsIntroduction I Some Previous Work

Several graph clustering algorithms have been developed, amongothers:� Newman [New01] (Modularity optimization)� Fast unfolding [BGLL08] (Modularity optimization)� Maximal cliques enumeration and kernel generation [DWP+07](Modularity optimization)

� Genetic algorithm for detecting communities in large graphs[LM09] (Fitness function based on modularity)

� Genetic algorithm for detecting overlapped communities[Piz09] (Fitness function based on internal edges vs. outgoingedges)

page 6 Cruz,Bothorel,Poulet Clustering by Point of View

The Representation of a Point of ViewThe Point of View of Social Networks

� The semantic information can be described by a set of features� The point of view is PoVi = si ×V

Point of ViewNodes Feature 1 Feature 2 . . . Feature fNode 1 1 0 . . . 0Node 2 0 1 . . . 1

......

......

Node n 1 0 . . . 1

� The assignation of features to each node in the network

page 7 Cruz,Bothorel,Poulet Clustering by Point of View

General ArchitectureInfluencing the Community Detection with the Point of View

� Guide the community detection algorithm according tosemantic information.

� Use of clustering techniques from different domains.

page 8 Cruz,Bothorel,Poulet Clustering by Point of View

Semantic ClusteringInfluencing the Community Detection with the Point of View I Phase 1

page 9 Cruz,Bothorel,Poulet Clustering by Point of View

Semantic ClusteringInfluencing the Community Detection with the Point of View I Phase 1

� Clustering of the defined point of view: search nodes withsimilar instances of features.

� Use of Self–Organizing Maps (SOM): non–supervised machinelearning method [Koh97].

� The proximity between the input vector (instance) and theweight vector of the network is measured with the Euclideandistance.

� The SOM algorithm will find some number of groups.

page 10 Cruz,Bothorel,Poulet Clustering by Point of View

Weights Assignation and CommunityDetection

Influencing the Community Detection with the Point of View I Phase 2

page 11 Cruz,Bothorel,Poulet Clustering by Point of View

Weights Assignation and CommunityDetection

Influencing the Community Detection with the Point of View I Phase 2

� Given the trained SOM network N and a graph G (V ,E ):� For each e (i , j) ∈ E , the weight will be changed according to:

wij = 1+αd (Nij)δij

where α ≥ 1 is constant parameter, d (Nij), is the distancebetween the node i and the node j in the SOM network andδij = 1 if i , j belong to the same group in the SOM network.

� After the weights are set, a classic graph clustering algorithm(the fast unfolding algorithm [BGLL08]) is used.

page 12 Cruz,Bothorel,Poulet Clustering by Point of View

Experiments Configuration IPreliminary Experiments and Results

� In each experiment three algorithm were compared:• SOM• Fast unfolding• Our method

� Performed in two levels:• The final modularity: to measure the quality of the partition.• The average intra–cluster Euclidean distance: to measure thequality of the semantic clustering.

� The experiment were executed using a graph of 5389 nodesand 27347 edges extracted from a Twitter data set. The initialmodularity of this graph is −2.5192×10−3

page 13 Cruz,Bothorel,Poulet Clustering by Point of View

Experiments Configuration IIPreliminary Experiments and Results

� The experiments were performed using two different points ofview.

� Point of View 1:• Composed of 33 features. Each feature represents a time zonefrom the Twitter data set.

• A feature will be set to 1 if the node has at least one friend inthe time zone represented by the feature.

• Distances vary from 0 to√32

� Point of View 2:• Composed of 4 features representing the messaging profile ofeach user.

• The first feature is set to 1 if the user has more friends thanfollowers.

page 14 Cruz,Bothorel,Poulet Clustering by Point of View

Experiments Configuration IIIPreliminary Experiments and Results

• The next three features indicate the user behavior according tothe number of messages sent: below the mean, between themean plus three standard deviations and, over mean plus threestandard deviations.

• Distances vary from 0 to√3

page 15 Cruz,Bothorel,Poulet Clustering by Point of View

Case Twitter – Point of View 1Preliminary Experiments and Results

Experiment Final Q Avg. Intracluster DistanceSOM Graph −7.5×10−3 0.3697

Graph based clustering 0.5728 1.8091PoV based Clustering 0.5747 1.1947

� The average intracluster distance found by our proposedmethod is less than the average intracluster distance found bythe graph based algorithm.

� The modularity obtained is very similar: the point of view usesinformation associated with the localization of people’s friends.

� The modularity of the graph from the SOM clustering is notvery different from the modularity of the original graph.

� SOM groups are close to the structure of the non–clusteredgraph.

page 16 Cruz,Bothorel,Poulet Clustering by Point of View

Case Twitter – Point of View 2Preliminary Experiments and Results

Experiment Final Q Avg. Intracluster DistanceSOM Graph -0.2991 0

Classic clustering 0.5728 0.7100PoV based Clustering 0.6351 0.5507

� The SOM clustered the nodes into six groups, each oneexpressing one of the possible instances described above. Thisexplains the average distance found.

� Creating a graph from the SOM clustering will produce bettersemantic clusters, however, the modularity is worst than theone from the original graph.

� The SOM groups are totally unrelated with the structure ofthe graph.

� Regarding the modularity and the average intracluster distance,the performance of the PoV based algorithm was better.

page 17 Cruz,Bothorel,Poulet Clustering by Point of View

Conclusion IConclusion

� The classic community detection algorithms do not take intoaccount the semantic information to influence the clusteringprocess.

� Changing the weights according to the results of the semanticclustering, the semantic information is included into thecommunity detection process.

� The two types of informations are merged to find andvisualize a social network from a selected point of view.

page 18 Cruz,Bothorel,Poulet Clustering by Point of View

Conclusion IIConclusion

� Outlook• Make tests over the obtained partitions: rand index, robustnesstests...

• Study the case of overlapping communities.• Include the features of the edges into the point of viewgeneration.

• Development of a visualization algorithm for representing thePoV and the transition between two points of view.

page 19 Cruz,Bothorel,Poulet Clustering by Point of View

page 20 Cruz,Bothorel,Poulet Clustering by Point of View

nav

Thanks for your attentionAppendix I Appendix

Questions?

page 20 Cruz,Bothorel,Poulet Clustering by Point of View

For Further Reading IAppendix I Appendix I For Further Reading

Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte,and Etienne Lefebvre.Fast unfolding of communities in large networks.Journal of Statistical Mechanics: Theory and Experiment,2008(10):P10008 (12pp), 2008.

Nan Du, Bin Wu, Xin Pei, Bai Wang, and Liutong Xu.Community detection in large-scale social networks.In WebKDD/SNA-KDD ’07: Proceedings of the 9th WebKDDand 1st SNA-KDD 2007 workshop on Web mining and socialnetwork analysis, pages 16–25, New York, NY, USA, 2007.ACM.

page 21 Cruz,Bothorel,Poulet Clustering by Point of View

For Further Reading IIAppendix I Appendix I For Further Reading

Marco Gaetler.Network Analysis: Methodological Foundations, chapterClustering, pages 178 – 215.Springer Berlin / Heidelberg, 2005.

Teuvo Kohonen.Self-Organizing Maps.Springer, 1997.

page 22 Cruz,Bothorel,Poulet Clustering by Point of View

For Further Reading IIIAppendix I Appendix I For Further Reading

Marek Lipczak and Evangelos Milios.Agglomerative genetic algorithm for clustering in socialnetworks.In GECCO ’09: Proceedings of the 11th Annual conference onGenetic and evolutionary computation, pages 1243–1250, NewYork, NY, USA, 2009. ACM.

M. E. Newman.Scientific collaboration networks. ii. shortest paths, weightednetworks, and centrality.Phys Rev E Stat Nonlin Soft Matter Phys, 64, July 2001.

page 23 Cruz,Bothorel,Poulet Clustering by Point of View

For Further Reading IVAppendix I Appendix I For Further Reading

Clara Pizzuti.Overlapped community detection in complex networks.In GECCO ’09: Proceedings of the 11th Annual conference onGenetic and evolutionary computation, pages 859–866, NewYork, NY, USA, 2009. ACM.

page 24 Cruz,Bothorel,Poulet Clustering by Point of View

Quality Measures SummaryAppendix I Appendix I For Further Reading

Summary of quality measures for graph clustering.

page 25 Cruz,Bothorel,Poulet Clustering by Point of View