December 8: Tutorial 1

Methods and Applications of Network Sampling

Mohammad A. Hasan (IUPUI), Nesreen K. Ahmed (Purdue), and J. Neville (Purdue)

Network data appears in various domains, including the social, communication, and information sciences. Analyzing such data is crucial for making inferences and predictions about these networks and for understanding the processes that drive their evolution. However, a major bottleneck in such analysis is the massive size of real-life networks, which often makes modeling and analyzing them infeasible. Further, many networks, specifically those in the social and communication domains, are not visible to the public due to privacy concerns, and other networks, such as the Web, are accessible only via crawling. To overcome these challenges, researchers overwhelmingly use network sampling as a key statistical approach to select a sub-population of interest that can be studied thoroughly.

In this tutorial, we aim to cover a diverse collection of methodologies and applications of network sampling. We will begin with a discussion of the problem setting in terms of objectives (such as sampling a representative subgraph or sampling graphlets), population of interest (vertices, edges, motifs), and sampling methodologies (such as Metropolis-Hastings, random walk, and snowball sampling). We will then present a number of applications of these methods and outline both the opportunities and the possible biases of each method in each application.


December 9: Tutorial 2

Applied Matrix Analytics: Recent Advance and Case Studies

Hanghang Tong (CUNY), Fei Wang (IBM TJ Watson), and Chris Ding (UTA)

Matrices provide a natural representation for many kinds of real-world data, such as images, documents, and networks. Matrix-based algorithms have attracted tremendous attention in the data mining research community because of their versatility, neat interpretability, and broad applicability. This tutorial will review emerging matrix-based data mining algorithms for understanding and analyzing human behavior. We will focus on the application of these technologies in two high-impact application domains: social informatics and healthcare informatics. Our emphasis will be on how recent matrix-based data mining algorithms have been advancing these application domains, and on the new challenges these applications pose.


December 10: Tutorial 3

Social Media Mining: Fundamental Issues and Challenges

Mohammad Ali Abbasi (ASU), Huan Liu (ASU), and Reza Zafarani (ASU)

Social media generates massive amounts of user-generated content. Such data differs from classic data and poses new challenges to data mining. This tutorial presents fundamental issues of social media mining, ranging from network representation to influence/diffusion modeling, elaborates on state-of-the-art approaches to processing and analyzing social media data, and shows how to apply patterns to real-world applications, such as recommendation and behavior analytics. The tutorial is designed for researchers, students, and scholars interested in studying social media and social networks. No prerequisites are required for ICDM participants to attend this tutorial.