Workshop on Dynamic Networks and Knowledge Discovery
(DyNaK 2010)

Barcelona, Spain, September 24, 2010

 

Modeling and analyzing networks is a major emerging topic in different research areas, such as computational biology, social science, and document retrieval. By connecting objects, it is possible to obtain an intuitive and global view of the relationships between components of a complex system.

Nowadays, the scientific communities have access to huge volumes of network-structured data, such as social networks, gene/protein/metabolic networks, sensor networks, and peer-to-peer networks. Most often, these data are not static: they are collected at different time points. This dynamic view of the system allows the time component to play a key role in the comprehension of the evolutionary behavior of the network (evolution of the network structure and/or of flows within the system). Time can help to determine the real causal relationships within, for instance, gene activations, link creation, and information flow.

Handling such data is a major challenge for current research in machine learning and data mining. It has led to the development of innovative techniques that consider complex/multi-level networks, time-evolving graphs, and heterogeneous information (nodes and links), and it requires scalable algorithms that are able to manage huge and complex networks.

The DyNaK workshop aims to provide a meeting point for scientists with different backgrounds who are interested in the study of large complex networks and the dynamic aspects of such networks. It aims at attracting contributions from both aspects of network analysis: large real network analysis and modeling, and knowledge discovery within those networks. Even though each type of real complex network has some peculiarities related to its specific domain, many aspects of the modeling and mining techniques for such networks are shareable. For instance, gene networks and social networks share a common architecture (scale-free), and involve similar data mining and machine learning methods: module/community extraction, hub identification, information-flow analysis, missing link detection and link prediction.

Special session on Sentiment Analysis and Opinion Mining

Every day, millions of people write their opinions about any issue in social media, such as social news sites, review sites, and blogs. The distillation of knowledge from this huge amount of unstructured information is a challenging task. Sentiment Analysis and Opinion Mining are two areas related to Natural Language Processing and Text Mining that deal with the identification of opinions and attitudes in natural language texts. In the Opinion Mining session of DyNaK we are interested in research results from academics and practitioners on the task of extracting knowledge from user-generated content, and on how time affects this analysis.

Topics of interest

Contributions to the DyNaK workshop should focus on the following (non-exhaustive) list of topics:

Methods:
  • Network inference from raw data
  • Graphical models
  • Graph mining algorithms
  • Graph kernel algorithms
  • Relational learning algorithms
  • Matrix/Tensor methods
  • Information retrieval algorithms
  • Bayesian methods
  • Evolutionary clustering
  • Mining and learning from heterogeneous domains
  • Bisociative information discovery
  • Clustering/Co-clustering/Biclustering
  • Pattern mining and clustering with constraints
  • Community detection/Module extraction
  • Analogies between social and biological networks
  • Opinion Extraction and Classification
  • Blogs Analysis and Social Search
  • Temporal Sentiment Analysis
  • Irony and Plagiarism detection in Opinion Mining
  • Recommender Systems
Applications:
  • Systems biology: regulatory gene networks, protein-protein interaction, miRNA networks, metabolic networks
  • Social networks: folksonomies, digital libraries, information networks, social media, collaborative networks
  • Sensor networks, peer-to-peer networks, Web, agent networks, body sensor networks

Key Dates

  • Paper Submission due: June 28, 2010
  • Notification of Acceptance: July 19, 2010
  • Camera Ready Papers due: July 28, 2010
  • Workshop: September 24, 2010

Organization

Program Committee

  • Riccardo Bellazzi, University of Pavia, Italy
  • Guillaume Beslon, INSA-Lyon, France
  • Bettina Berendt, Katholieke Universiteit Leuven, Belgium
  • Tanya Berger-Wolf, University of Illinois, USA
  • Karsten M. Borgwardt, MPI, Tübingen, Germany
  • Jean-François Boulicaut, INSA-Lyon, France
  • Raffaele Calogero, University of Torino, Italy
  • Iván Cantador, Universidad Autónoma de Madrid, Spain
  • Francisco M. Carrero, Universidad Europea de Madrid, Spain
  • José C. Cortizo, Universidad Europea de Madrid, Spain
  • Diego Di Bernardo, TIGEM, Italy
  • Mohamed Elati, University of Evry, France
  • Paolo Frasconi, University of Firenze, Italy
  • Lise Getoor, University of Maryland, USA
  • Dino Ienco, University of Torino, Italy
  • Tamara G. Kolda, Sandia National Laboratories, USA
  • Stefan Kramer, Technische Universität München, Germany
  • Tao Li, Florida International University, USA
  • Pietro Liò, University of Cambridge, UK
  • Huan Liu, Arizona State University, USA
  • Eric Yu-En Lu, University of Cambridge, UK
  • Sara C. Madeira, INESC-ID/IST, Portugal
  • Rosa Meo, University of Torino, Italy
  • Tsuyoshi Murata, Tokyo Institute of Technology, Japan
  • Mirco Nanni, ISTI-CNR, Italy
  • Arlindo Oliveira, INESC-ID, Portugal
  • Andrea Passerini, University of Trento, Italy
  • Lorenza Saitta, University of Piemonte Orientale, Italy
  • Rossano Schifanella, University of Torino, Italy
  • Einoshin Suzuki, Kyushu University, Japan
  • Hanghang Tong, Carnegie Mellon University, USA

Invited Speakers

  • Tanya Berger-Wolf, University of Illinois, USA

    Finding structure in dynamic networks (and what it means for zebras)

    Social creatures interact in diverse ways: forming groups, mating, sending emails, and sharing ideas. Some of the interactions are accidental while others are a consequence of the underlying explicit or implicit social structures. One of the most important questions in sociology is the identification of such structures, which are variously viewed as communities, hierarchies, or "social profiles."

    In analyzing social networks, one property has largely been ignored until recently: interactions and their nature change over time. The notion of "structure" is intricately linked with the dynamics of social interactions. On the one hand, it is in longitudinal data that the emergence of structures and the laws governing their development can be observed and inferred. On the other hand, the existence of such structures that constrain social interactions is what allows us to predict the behavior and nature of dynamic networks. The necessity to delve into the dynamic aspects of networking behavior may be clear, yet it would not be feasible without the data to support such explicitly dynamic analysis. Rapidly growing electronic networks, such as emails, the Web, blogs, and friendship sites, as well as mobile sensor networks on cars, humans, and animals, provide an abundance of dynamic social network data that for the first time allow the temporal component to be explicitly addressed in network analysis.

    I will present several examples of computational approaches we have developed to infer structure in dynamic networks and show applications of this analysis to population biology, from humans to zebras.

  • Stefan Kramer, Technische Universität München, Germany

    Learning Real-Time Automata from Multi-Attribute Event Logs

    Network structures often arise as descriptions of complex temporal phenomena in science and industry. Popular representation formalisms include Petri nets and (timed) automata. In process mining, the induction of Petri net models from event logs has been studied extensively. Less attention, however, has been paid to the induction of (timed) automata outside the field of grammatical inference. In the talk, I will present work on the induction of timed automata and show how they can be learned from multi-attribute event logs. I will present the learning method in some detail and give examples of network inference from synthetic, medical and biological data.

  • Carlos Rodríguez, Research Center, Barcelona-Media, Spain

    How much linguistics do we need in order to understand online opinions?

    The vast amount of online opinionated text has driven the interest of an active research community that exploits this user-generated content to gather market information and create business intelligence applications. State-of-the-art Natural Language Processing techniques can provide a level of text interpretation that might be adequate for certain tasks, but there is room for improvement over the current methods, which are based on pre-existing knowledge, such as prior polarity lexicons and domain ontologies. The crucial question is how much resource-intensive linguistic processing is needed to understand what people are talking about, and how they feel about it. A principled combination of symbolic and stochastic approaches that is guided by bootstrapping existing and extensive Web 2.0 resources seems to be a good compromise when full text interpretation is not available or practical.

Industrial Talk

  • Enrico M. Bucci, BioDigitalValley Srl, Italy

    Protein-centered biological networks by automatic caption analysis

    In past years, a lot of attention has been paid to the retrieval of meaningful biological information connecting proteins and genes, i.e. relationships between different players in the cascade of molecular events regulating the physiology and pathology of cells, tissues and eventually organisms. The main goal is to develop gene/protein connection models able to explain complex biological phenomena in terms of emerging properties of large, structured networks, whose topology and detailed structure account at least in part for these properties. This implies the use of experimental methods able to collect information on a large number of different proteins under different conditions, and then properly connecting the data to the results obtained all over the world, so as to get a coherent picture in a larger frame. In particular, to encompass a larger body of information and to figure out how a given experimental study fits into the accumulated knowledge, methods are required to retrieve the available data on all proteins involved in the study (the target proteins), as well as on all proteins that are connected by some piece of information to the targets. To this end, one method consists of automatically parsing the scientific literature, retrieving co-occurring names of proteins, genes or other kinds of molecules and attempting to identify terms that qualify the relationship between the identified proteins. This task is nontrivial, given the ambiguity in gene/protein nomenclature (which affects both precision and recall of the relevant data), and the strong dependence of the type of relationship on the context at multiple levels.

    Most of the available methods parse only the abstracts of the scientific literature; however, the information contained in the abstracts is often incomplete, because only those genes/proteins that are in the main scope of the paper are discussed, while data on a number of other proteins are often contained elsewhere. In an attempt to overcome these limitations, we focused on the analysis of the figure captions contained in the scientific literature. The captions of a paper refer in most cases to the experiments described in the paper, and thus contain an enriched amount of data describing the biology of different proteins, including the relationships between them. Moreover, since terms referring to genes/proteins and other terms related to experimental methodologies are simultaneously present in a reduced textual space, it is possible to identify groups of proteins studied with a certain experimental technique; by properly filtering for a specific technique, it is possible to characterize the type of relationship between the proteins. For example, proteins co-occurring in a caption describing a double-hybrid experiment are most likely binding partners, while proteins co-occurring in a caption describing a 2D-gel experiment are probably co-expressed in a given condition/biological sample.

    We thus developed Protein Quest, a tool that automatically and efficiently parses both the abstract and the captions of a scientific paper in a PDF document. Results obtained from more than 2,000,000 free, full-text papers will be discussed, with reference to the topological characterization of the obtained co-occurrence networks and to the dependence of their topology on different query strategies; moreover, some specific, disease-oriented networks and predictions will be presented.

Program


10:30 - 12:00 Dynamic Networks - Session 1

10:30 Opening remarks
10:45 Invited talk: Tanya Berger-Wolf (University of Illinois, USA)
Finding structure in dynamic networks (and what it means for zebras)
11:30 Discovering Inter-Dimensional Rules in Dynamic Graphs
K-N. T. Nguyen, L. Cerf, M. Plantevit, and J-F. Boulicaut

12:00 - 12:15 Short break

12:15 - 13:45 Dynamic Networks - Session 2

12:15 Relational Learning of Disjunctive Patterns in Spatial Networks
C. Loglisci, M. Ceci, and D. Malerba
12:45 Spectral Co-Clustering for Dynamic Bipartite Graphs
D. Greene and P. Cunningham
13:15 Stream-based Community Discovery via Relational Hypergraph Factorization on Evolving Networks
C. Bockermann and F. Jungermann

13:45 - 14:30 Lunch break

14:30 - 16:35 Dynamic Networks - Session 3

14:30 Invited talk: Stefan Kramer (Technische Universität München, Germany)
Learning Real-Time Automata from Multi-Attribute Event Logs
15:15 Network-Based Disease Candidate Gene Prioritization: Towards Global Diffusion in Heterogeneous Association Networks
J. P. Gonçalves, S. C. Madeira, and Y. Moreau
15:45 Industrial talk: Enrico M. Bucci (BioDigitalValley Srl, Italy)
Protein-centered biological networks by automatic caption analysis
16:05 Collaboration-based Social Tag Prediction in the Graph of Annotated Web Pages
H. Rahmani, B. Nobakht, and H. Blockeel

16:35 - 16:55 Coffee break

16:55 - 18:30 Sentiment Analysis and Opinion Mining

16:55 Invited talk: Carlos Rodríguez (Research Center, Barcelona-Media, Spain)
How much linguistics do we need in order to understand online opinions?
17:35 Automatic Sentiment Monitoring of Specific Topics in the Blogosphere
F. S. Pimenta, D. Obradović, R. Schirru, S. Baumann, and A. Dengel
18:00 Different Aggregation Strategies for Generically Contextualized Sentiment Lexicons (short)
S. Gindl
18:15 Towards an Automatic Evaluation for Topic Extraction Systems for Online Reputation Management (short)
E. Amigó, D. Spina, B. Beotas, and J. Gonzalo

Paper Submission

We welcome original contributions, either theoretical or empirical, describing ongoing projects or completed work (even partially published). The instructions for authors and the LaTeX packages can be found at http://www.springer.de/comp/lncs/authors.html. Paper length should not exceed 12 pages. A selection of the accepted papers will be taken into consideration for a special issue of an international journal.

To submit your paper(s), please log into the submission website.

Proceedings

Online proceedings are now available at CEUR-WS.org/Vol-655.

IDA Journal Special Issue

Following the success of the DyNaK workshop, held jointly with ECML/PKDD 2010 in Barcelona, we invite DyNaK's authors to send extended versions of their workshop contributions. The special issue is, however, also open to relevant contributions that were not presented at the DyNaK workshop.

Here you can find the Call for Papers

Authors of selected abstracts will be contacted by e-mail. They will soon receive the instructions for submitting their papers.