BARC/EADS Talk by Charalampos Tsourakakis

Title

Predicting Positive and Negative Links with Noisy Queries: Theory & Practice

Abstract

In this talk, Charalampos Tsourakakis will present recent results on an important graph mining problem known as the “Edge Sign Prediction Problem”: Can we predict whether an interaction between a pair of nodes will be positive or negative?

The edge sign prediction problem is modelled as follows: we are allowed to query any pair of nodes whether they belong to the same cluster or not, but the answer to the query is corrupted with some probability $0<q<\frac{1}{2}$. Let $\delta=1-2q$ be the bias. We provide an algorithm that recovers all signs correctly with high probability in the presence of noise for any constant gap $\delta$ with $O(\frac{n\log n}{\delta^4})$ queries. The algorithm uses breadth first search as its main algorithmic primitive. A byproduct of the proposed learning algorithm is the use of $s-t$ paths as an informative feature to predict the sign of the edge $(s,t)$. As a heuristic, edge disjoint $s-t$ paths of short length is used as a feature for predicting edge signs in real-world signed networks. The findings suggest that the use of paths improves the classification accuracy of state-of-the-art classifiers, especially for pairs of nodes with no or few common neighbors.

Joint work with Michael Mitzenmacher (Harvard), Jarosaw Basiok (Harvard), Ben Lawson (BU), Preetum Nakkiran (Harvard), Vasileios Nakos (Harvard).

Bio

Dr. Charalampos Tsourakakis received his Ph.D. from the Algorithms, Combinatorics and Optimization (ACO) program at Carnegie Mellon University, and served as a Postdoctoral Fellow in Harvard University. He holds a Diploma in Electrical and Diploma Engineering from the National Technical University of Athens and a Master of Science from the Machine Learning Department at Carnegie Mellon University. Before joining Boston University, he worked as a researcher in the Google Brain team. He won a best paper award in IEEE Data Mining, has delivered three tutorials in the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, and has designed two graph mining libraries for large-scale graph mining, one of which has been officially included in Windows Azure. His research focuses on large-scale graph mining, and machine learning.