ARESOS Project: Reconstruction, Analysis and Access to Data in Large Socio-Semantic Networks
CNRS Mission for Interdisciplinarity - Scientific Big Data Challenge - MASTODONS
Patrick GALLINARI - UPMC Paris 6 - UMR 7606
Participants
• CAMS, UMR 9557 - INSMI, EHESS, Paris
• CSI, UMR 7185 - INSHS, Ecole des Mines, Paris
• GIS Institut des Systèmes Complexes de Paris Ile-de-France (federation of 18 institutes and universities), INSHS, Paris
• IRISA, UMR 6074 - INS2I, IRISA, U. de Rennes 1
• IRIT, UMR 5505 - INS2I, U. Toulouse 3
• IXXI, INS2I, ENS Lyon
• LATTICE, UMR 8094 - INSHS, ENS / U. Paris 3
• LIG, UMR 5217 - INS2I, U. Joseph Fourier, Grenoble
• LIP6, UMR 7606 - INS2I, U. Pierre et Marie Curie, Paris
23/01/2015 Défi MASTODONS - Projet ARESOS 2
Context
• Community: ICWSM, International AAAI Conference on Web and Social Media
• "blends social science and computational approaches to answer important and challenging questions about human social behavior through social media while advancing computational tools for vast and unstructured data."
• Analysis of large socio-semantic networks: production and diffusion of content on media
• Humans at the centre of the process
• Characterization: interactions
• Individual and social links: structure of social interactions
• Content: multi-scale (micro, meso, macro), temporal
• Dynamics of conversations and concepts: multi-scale, multi-source
Context: diversity of information sources and time scales
ARESOS themes
• Representation of and access to social content: who speaks about what, and how? Natural language processing, data processing, identification of roles, information flows, topic evolution, conversation tracking, sentiment analysis
• Dynamics of socio-semantic structures and diffusion phenomena: discovery of latent structures, morphogenesis, content diffusion, co-evolution of structure and semantics
• Social information retrieval: analysis of microblogs, collaborative recommendation
Presentation
• Focus on:
• Emergence of socio-semantic structures: science evolution
• Representation learning: machine learning, natural language processing
Emergence of socio-semantic structures: Science evolution
ISC PIF – LIP6 – IRISA
D. Chavalarias, et al.
Example of challenges: analysis of socio-semantic networks
Quantitative epistemology
• Challenges: What is the structure of knowledge spaces?
• How does science evolve?
• What are the driving forces?
Science evolution in space and time: dynamics at the meso level
Scaling (CAMS/ IRISA/ LIP6)
• Currently: about 30M documents processed, but only about 100k of them with NLP
• Maps built from about 5,000 expressions (i.e. domain-centered semantic maps with 5,000 expressions)
• Target: NLP on all 30M documents (extraction of relevant key phrases)
• Maps built from about 1M expressions
• Feasibility study on the Web of Science (WoS) database (30M items, 1990-2013)
• Internship 1: Spark/MapReduce implementation of key-phrase extraction and clustering (finding maximal cliques)
• Internship 2: experimentation on a Hadoop cluster at Rennes and LIP6 using algebra on Resilient Distributed Datasets (selection, union, join, ...)
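The clique-finding step of the clustering can be sketched on a single machine with the Bron-Kerbosch algorithm with pivoting. This sketch assumes that key phrases are the vertices of a co-occurrence graph, with an edge between two phrases that co-occur. The phrases below are hypothetical, and the distributed Spark/MapReduce version from the internship is not shown.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]

def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected co-occurrence graph as adjacency sets (no self-loops)."""
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g

def maximal_cliques(graph: Graph) -> List[Set[str]]:
    """Enumerate all maximal cliques with Bron-Kerbosch (pivot variant)."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended any further: it is maximal
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques

# Hypothetical key-phrase co-occurrences.
EDGES = [
    ("neural network", "deep learning"),
    ("deep learning", "representation learning"),
    ("neural network", "representation learning"),
    ("graph", "clique"),
    ("graph", "neural network"),
]
```

Each maximal clique is a candidate cluster of mutually co-occurring phrases. Here the result is {neural network, deep learning, representation learning}, {graph, clique} and {graph, neural network}.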
Learning representations
• Objective: learning robust and meaningful representations from data
• Handcrafted versus learned representations: it is often hard to define what a good representation is
• General methods that can be used for different application domains, multimodal data and multi-task learning
• Learning the latent factors behind data generation: unsupervised feature learning
• Several families of techniques: algebraic and statistical models, (deep) neural networks
Learning representations - Success story
• A very active recent field whose technology has been adopted, sometimes already in production, by major companies (Google, Facebook, Microsoft, ...)
• Success on many academic benchmarks across a wide range of problems:
• Image/scene labeling
• Speech recognition
• Natural language processing
• Machine translation
• etc.
Learning language models (Mikolov et al., 2013)
• Simple neural network language model: the word2vec software
• Analogical reasoning: Paris - France + Italy ≈ Rome
• Discovery of three-way relations: semantic and syntactic
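The analogy above is solved with vector arithmetic. The query vector v(Paris) - v(France) + v(Italy) is formed, and the word whose vector has the highest cosine similarity to it is returned, with the three query words excluded. The 4-dimensional vectors below are hand-made, hypothetical stand-ins for vectors that word2vec would learn from text.

```python
import math
from typing import Dict, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(vectors: Dict[str, Sequence[float]], plus_a: str, minus: str, plus_b: str) -> str:
    """Word closest (cosine) to v[plus_a] - v[minus] + v[plus_b], query words excluded."""
    query = [a - m + b for a, m, b in zip(vectors[plus_a], vectors[minus], vectors[plus_b])]
    candidates = (w for w in vectors if w not in {plus_a, minus, plus_b})
    return max(candidates, key=lambda w: cosine(vectors[w], query))

# Hypothetical toy vectors with dimensions (capital, France, Italy, Germany).
VECTORS = {
    "Paris": (1.0, 1.0, 0.0, 0.0),
    "France": (0.0, 1.0, 0.0, 0.0),
    "Italy": (0.0, 0.0, 1.0, 0.0),
    "Rome": (1.0, 0.0, 1.0, 0.0),
    "Germany": (0.0, 0.0, 0.0, 1.0),
    "Berlin": (1.0, 0.0, 0.0, 1.0),
}
```

With these vectors, `analogy(VECTORS, "Paris", "France", "Italy")` returns `"Rome"`, because the "capital" component survives the subtraction and the country component switches from France to Italy.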
Neural image caption generator (Vinyals et al. 2015)
• Objective: learn to generate a textual description of an image
• i.e. given an image as input, generate a sentence that describes the objects in it and their relations
• Model: inspired by machine translation, except that the input is an image
• A recurrent neural network generates the textual description word by word, conditioned on an image representation learned by a deep convolutional neural network
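Word-by-word generation can be sketched as a greedy decoding loop: at each step the decoder scores candidate next words given its state and the previous word, and the best one is kept. In the model described above, the step function is a recurrent network whose state is initialized from CNN image features. Here, `toy_step` and `BIGRAM` are hypothetical stand-ins, and the "image" is a 2-d feature vector. Greedy search is only one decoding strategy; beam search is a common alternative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical decoder step: (state, previous word) -> (new state, next-word scores).
Step = Callable[[Sequence[float], str], Tuple[Sequence[float], Dict[str, float]]]

def greedy_decode(init_state: Sequence[float], step: Step, max_len: int = 20,
                  start: str = "<s>", end: str = "</s>") -> List[str]:
    """Generate a caption word by word, always keeping the highest-scoring word."""
    state, word, caption = init_state, start, []
    for _ in range(max_len):
        state, scores = step(state, word)
        word = max(scores, key=lambda w: scores[w])
        if word == end:
            break
        caption.append(word)
    return caption

# Hypothetical next-word scores given only the previous word.
BIGRAM = {
    "<s>": {"a": 1.0},
    "a": {"dog": 0.5, "cat": 0.5},
    "dog": {"runs": 1.0},
    "cat": {"sleeps": 1.0},
    "runs": {"</s>": 1.0},
    "sleeps": {"</s>": 1.0},
}

def toy_step(state: Sequence[float], word: str) -> Tuple[Sequence[float], Dict[str, float]]:
    """Bias the scores with a toy image feature (dog evidence, cat evidence)."""
    scores = dict(BIGRAM[word])
    if "dog" in scores:
        scores["dog"] += state[0]
    if "cat" in scores:
        scores["cat"] += state[1]
    return state, scores
```

With image features `(0.9, 0.1)`, `greedy_decode` produces `["a", "dog", "runs"]`. With `(0.1, 0.9)` it produces `["a", "cat", "sleeps"]`. The same language model thus yields different sentences depending on the image.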
Work in Aresos on representation learning
• Social networks: relational classification, information diffusion
• Natural language processing: semantic compositionality
Relational Classification: heterogeneous graphs
• Goal: label items with their corresponding tags, classes, …
• Several methods exist for homogeneous graphs, e.g. label propagation, graph metrics
• These methods do not extend to heterogeneous graphs with multiple link types
• Claim: the labels of different items are correlated
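Label propagation, the first homogeneous-graph method cited above, can be sketched in a few lines. Labeled seed nodes keep their one-hot label distribution. Every other node repeatedly replaces its distribution with the average of its neighbours' distributions. The toy chain graph and the class names are hypothetical.

```python
from typing import Dict, List, Set

def label_propagation(graph: Dict[str, Set[str]], seeds: Dict[str, str],
                      classes: List[str], n_iter: int = 50) -> Dict[str, str]:
    """Iterative label propagation on an undirected homogeneous graph."""
    uniform = {c: 1.0 / len(classes) for c in classes}
    # One-hot distributions for seeds, uniform distributions for unlabeled nodes.
    dist = {n: ({c: float(c == seeds[n]) for c in classes} if n in seeds else dict(uniform))
            for n in graph}
    for _ in range(n_iter):
        new = {}
        for n, nbrs in graph.items():
            if n in seeds or not nbrs:
                new[n] = dist[n]  # seeds stay clamped to their known label
            else:
                new[n] = {c: sum(dist[m][c] for m in nbrs) / len(nbrs) for c in classes}
        dist = new
    return {n: max(classes, key=lambda c: d[c]) for n, d in dist.items()}

# Hypothetical chain a - b - c - d with two labeled endpoints.
GRAPH = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
SEEDS = {"a": "db", "d": "ml"}
```

Node `b` converges to (2/3, 1/3) and is labeled `db`. Node `c` converges to (1/3, 2/3) and is labeled `ml`. Note that the method assumes a single node type and a single link type, which is exactly the limitation stated above.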
• Correlations among labels from different node types
• Example from DBLP data (author domains × conferences)
• Authors: 4 labels
• Conferences: 20 labels
• P(author domain | conference) reveals these correlations
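The conditional distribution P(author domain | conference) can be estimated by counting. This sketch assumes one (conference, author domain) pair per author-conference link in the graph. The pairs below are hypothetical and are not DBLP figures.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def conditional_label_distribution(
        pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(author domain | conference) from (conference, domain) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for conference, domain in pairs:
        counts[conference][domain] += 1
    return {conf: {d: n / sum(c.values()) for d, n in c.items()}
            for conf, c in counts.items()}

# Hypothetical author-conference links labeled with the author's domain.
PAIRS = [("SIGIR", "IR")] * 3 + [("SIGIR", "DB"), ("VLDB", "DB"), ("VLDB", "DB")]
```

Here P(IR | SIGIR) = 0.75 and P(DB | VLDB) = 1.0. A peaked distribution like this is the kind of cross-type correlation the slide refers to.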
Relational classification: representation learning
• Instead of working in a discrete space, solve the problem in a continuous representation space: project the whole heterogeneous graph into a common continuous space
• Solve simultaneously:
• Learn representations for all items, constrained by the graph relations
• Learn classifiers for the different node types in this new space
• This exploits:
• Graph proximity of items of different types
• Label correlation
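One generic way to write "solve simultaneously" is a single objective that trades off classification error against graph smoothness. The notation is ours and is an assumption, not necessarily the exact loss used in this work:
- z_i ∈ ℝ^d is the latent representation of node i, whatever its type.
- 𝓛_t is the set of labeled nodes of type t, with labels y_i.
- f_{θ_t} is the classifier for type t, and Δ is a classification loss (e.g. hinge).
- 𝓔 is the edge set of the heterogeneous graph, with weights w_{ij}.
- λ balances the two terms.

```latex
\min_{\{z_i\},\,\{\theta_t\}} \;
\sum_{t} \sum_{i \in \mathcal{L}_t} \Delta\big(f_{\theta_t}(z_i),\, y_i\big)
\;+\; \lambda \sum_{(i,j) \in \mathcal{E}} w_{ij}\, \lVert z_i - z_j \rVert^2
```

The second term pulls linked nodes of any type close together in the shared space. The first term lets each type-specific classifier exploit that proximity, and through it the correlations between labels of different types.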
• Example: visualization of the latent space for DBLP
Figure panels: class centroids | all nodes (1 color = 1 class)
Content Information diffusion
• Objective: predict information diffusion cascades
• State of the art: models often inspired by earlier work in epidemiology and social science
• Graph propagation models, e.g. the Independent Cascade model and the Linear Threshold model
• General assumptions: closed world; all nodes behave the same way; no features (e.g. content) are associated with nodes
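The Independent Cascade model mentioned above can be simulated as follows. Each newly activated node gets exactly one chance to activate each inactive neighbour v, and succeeds with the edge probability p_uv. The sketch makes the listed assumptions explicit: the graph is fully known (closed world), every node uses the same mechanism, and no content features are involved. Edge probabilities in the usage are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Edges = Dict[str, List[Tuple[str, float]]]

def independent_cascade(edges: Edges, seeds: Set[str], rng: random.Random) -> Set[str]:
    """One run of the Independent Cascade model; returns the final active set."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in edges.get(u, []):
                # Single activation attempt per (newly active u, inactive v) pair.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def activation_probability(edges: Edges, seeds: Set[str], target: str,
                           runs: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(target is eventually activated)."""
    rng = random.Random(seed)
    hits = sum(target in independent_cascade(edges, seeds, rng) for _ in range(runs))
    return hits / runs
```

With `edges = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)]}` and seed `{"a"}`, every run activates exactly `{"a", "b", "d"}`. With fractional probabilities, `activation_probability` averages over many runs.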
Learning representation for content information diffusion
• Learn propagation models directly from observed cascades, without any assumption about the underlying network: all the factors influencing the diffusion (external influences, node roles, etc.) are extracted directly from the data
• Representation learning: propagation is modeled in a latent space, where the diffusion process follows a simple formalization
• The discrete problem is mapped onto a continuous space
• The latent space is learned from the cascade data
• Additional benefit: inference (here, diffusion prediction) is extremely fast
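Fast inference can be illustrated once users have been embedded: predicting a cascade reduces to scoring every user by a simple function of their latent position relative to the source. This sketch assumes a single latent space and a heat-kernel diffusion, in which the heat received at time t decays with squared distance from the source. The specific formalization used in this work may differ, and the embedding `EMB` is hypothetical rather than learned.

```python
import math
from typing import Dict, List, Sequence

def heat_kernel(z_src: Sequence[float], z_u: Sequence[float], t: float) -> float:
    """Heat kernel in R^n: (4*pi*t)^(-n/2) * exp(-||z_src - z_u||^2 / (4t))."""
    n = len(z_src)
    sq = sum((a - b) ** 2 for a, b in zip(z_src, z_u))
    return (4 * math.pi * t) ** (-n / 2) * math.exp(-sq / (4 * t))

def predict_cascade(source: str, emb: Dict[str, Sequence[float]], t: float = 1.0) -> List[str]:
    """Rank the other users by the heat they receive from the source at time t."""
    scores = {u: heat_kernel(emb[source], z, t) for u, z in emb.items() if u != source}
    return sorted(scores, key=lambda u: scores[u], reverse=True)

# Hypothetical 2-d embedding (the slides visualize a latent space of size 2).
EMB = {"s": (0.0, 0.0), "u1": (0.5, 0.0), "u2": (2.0, 0.0), "u3": (0.0, 1.0)}
```

`predict_cascade("s", EMB)` ranks `u1`, then `u3`, then `u2`, from closest to farthest. Each prediction costs one distance computation per user, with no propagation over a graph.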
Information diffusion
• Visualization on the Digg dataset
• Users post stories that other users "digg" (diggs = likes)
• Cascades = stories and their diggs; 1 digg = 1 contamination
• 1 month of crawling: 5k users, 71k links, 150k training cascades, 66k test cascades
• Latent space of size 2: user clusters; colored points = 4 observed test cascades
Natural language processing: tensor model of semantic compositionality (Van de Cruys, Poibeau, Korhonen)
• Compositionality: the meaning of a complex expression is a function of the meanings of its parts and of the way they are combined
• Distributional hypothesis of meaning: words that appear in the same contexts tend to be semantically similar
• How can the principle of compositionality be reconciled with distributional semantics?
• Objective: model compositionality as a multi-way interaction between latent factors learned from the data
• Task: learn three-way (verb, subject, object) interactions from knowledge bases
• Method: use tensor decomposition to capture three-way dependencies
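A three-way dependency can be illustrated with a rank-R CP (PARAFAC) decomposition. In this decomposition, the plausibility of a (verb, subject, object) triple is the sum over latent components of the product of the three factor values. CP is used here only as a generic example. The decomposition in the cited work may differ (e.g. Tucker), and the rank-2 factors below are hypothetical.

```python
from typing import Sequence

def cp_score(verb: Sequence[float], subj: Sequence[float], obj: Sequence[float]) -> float:
    """Rank-R CP model: X[v, s, o] ~ sum_r V[v, r] * S[s, r] * O[o, r]."""
    if not len(verb) == len(subj) == len(obj):
        raise ValueError("factor vectors must share the same rank R")
    return sum(v * s * o for v, s, o in zip(verb, subj, obj))

# Hypothetical rank-2 factors: component 0 ~ "eating food", component 1 ~ "driving vehicles".
VERBS = {"eat": (1.0, 0.0), "drive": (0.0, 1.0)}
SUBJECTS = {"man": (1.0, 1.0), "cat": (1.0, 0.0)}
OBJECTS = {"apple": (1.0, 0.0), "car": (0.0, 1.0)}
```

"man drive car" scores 1.0, but "cat drive car" scores 0.0, even though "drive" and "car" are compatible. The score depends jointly on all three words, which a pairwise verb-object model could not express.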
• What is it good for?
• Learning knowledge bases in continuous representation form, e.g. Freebase, Wikipedia, etc.
• Inferring new knowledge: analogical reasoning properties
• A first step towards complex NLP/IR tasks, e.g. question answering
• Evaluation here: computing similarity scores between SVO phrases
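Similarity between SVO phrases can be sketched as the cosine between composed phrase vectors. This assumes a CP-style composition in which the phrase vector is the element-wise product of the verb, subject and object latent factors. The composition function and the 3-d factors are hypothetical and are not taken from the cited work.

```python
import math
from typing import List, Sequence

def compose_svo(verb: Sequence[float], subj: Sequence[float], obj: Sequence[float]) -> List[float]:
    """Hypothetical composition: element-wise product of rank-R latent factors."""
    return [v * s * o for v, s, o in zip(verb, subj, obj)]

def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0  # a zero phrase vector has no direction
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)

# Hypothetical latent factors.
VERB = {"eat": (1.0, 0.2, 0.0), "devour": (0.9, 0.3, 0.0), "drive": (0.0, 0.1, 1.0)}
MAN = (1.0, 1.0, 1.0)
APPLE = (1.0, 1.0, 0.1)
```

"man eat apple" and "man devour apple" have a cosine of about 0.99. "man eat apple" and "man drive apple" have a cosine of about 0.14. Scores like these would then be compared with human similarity judgements.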
Resources
• Corpora: dynamic Twitter database
• tina.iscpif.fr/bigdata
• Platforms: http://Gargantext.org
• reconstruction of socio-semantic networks and mapping
• http://graphbrain.algopol.fr/: parsing of socio-textual corpora to build socio-semantic graphs for sociologists
• Markovian segmentation and labeling of natural text
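Markovian labeling of text is classically decoded with the Viterbi algorithm, which finds the most likely label sequence under a first-order hidden Markov model. The sketch below works in log space to avoid underflow. The tag set, probabilities and sentence are hypothetical, and nothing here describes the internals of the tool listed above.

```python
import math
from typing import Dict, List, Sequence

def viterbi(obs: Sequence[str], states: Sequence[str], start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> List[str]:
    """Most likely hidden label sequence under a first-order HMM (log space)."""
    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    best = [{s: log(start_p[s]) + log(emit_p[s].get(obs[0], 0.0)) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor r for being in state s at time t.
            prev = max(states, key=lambda r: best[t - 1][r] + log(trans_p[r][s]))
            best[t][s] = best[t - 1][prev] + log(trans_p[prev][s]) + log(emit_p[s].get(obs[t], 0.0))
            back[t][s] = prev
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):  # follow back-pointers
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical toy HMM.
STATES = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {"DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
         "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
         "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1}}
EMIT = {"DET": {"the": 1.0}, "NOUN": {"dog": 0.5, "barks": 0.1},
        "VERB": {"barks": 0.5, "dog": 0.05}}
```

For the sentence "the dog barks", the decoded labels are DET, NOUN, VERB. Segment boundaries can be encoded the same way, with labels such as "begin" and "inside".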
• Thank you
• http://mastodons.lip6.fr/