Sunday Morning

  1. Interactive Data Analysis Tool by Augmenting MATLAB with Semantic Objects by Changhyun Lee, Jaegul Choo, Duen Horng (Polo) Chau, and Haesun Park.

    Traditional data analysis tools offer strong computational capabilities and numerous standard visualization techniques. However, they provide few visual interactions: because they are designed to apply broadly across diverse data domains, they can hardly take into account the inherent meanings associated with any particular domain. To address these limitations, we propose to augment MATLAB, one of the most widely used data analysis tools and computational languages, with the ability to handle semantic objects. This enables essential interaction capabilities such as brushing-and-linking, details-on-demand, and dynamic interactive updating of visualizations. In our demonstration, using a movie similarity graph data set, we will show the audience how to import semantic data, how visual interactions occur, and how convenient these functionalities are.

  2. The MiningZinc Framework for Constraint-based Itemset Mining by Tias Guns, Anton Dries, Guido Tack, Siegfried Nijssen, and Luc De Raedt.

    We present MiningZinc, a novel system for constraint-based pattern mining. It provides a declarative approach to data mining, in which a user specifies a problem in terms of constraints and the system employs advanced techniques to find solutions efficiently. Declarative programming and modeling are common in artificial intelligence and in database systems, but much less so in data mining; by building on ideas from these communities, MiningZinc significantly advances the state of the art in declarative data mining. Key components of the MiningZinc system are (1) a high-level, natural language for formalizing constraint-based itemset mining problems as models, and (2) an infrastructure for executing these models that supports both specialized mining algorithms and generic constraint solving systems. A use case demonstrates the generality of the language, its flexibility in adding and modifying constraints and data, and the use of different solution methods.
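
The abstract does not show MiningZinc's own modeling language. As a loose illustration of the declarative idea only, here is a minimal Python sketch in which each constraint is a plain predicate and a naive generic "solver" enumerates candidate itemsets and keeps those that satisfy every constraint. The names `mine`, `support`, `min_support`, `max_size` and `contains` are hypothetical, and brute-force enumeration stands in for MiningZinc's specialized mining algorithms and constraint solvers.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

Itemset = FrozenSet[str]
# Hypothetical constraint type: a predicate over (candidate itemset, transaction database).
Constraint = Callable[[Itemset, List[Itemset]], bool]


def support(itemset: Itemset, transactions: List[Itemset]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def min_support(k: int) -> Constraint:
    """Constraint: the itemset must occur in at least k transactions."""
    return lambda s, db: support(s, db) >= k


def max_size(n: int) -> Constraint:
    """Constraint: the itemset may contain at most n items."""
    return lambda s, db: len(s) <= n


def contains(item: str) -> Constraint:
    """Constraint: the itemset must include `item`."""
    return lambda s, db: item in s


def mine(transactions: List[Itemset], constraints: List[Constraint]) -> List[Itemset]:
    """Naive generic solver: enumerate every non-empty itemset (exponential in the
    number of distinct items) and keep those satisfying all constraints."""
    items = sorted(set().union(*transactions))
    solutions: List[Itemset] = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if all(c(candidate, transactions) for c in constraints):
                solutions.append(candidate)
    return solutions


if __name__ == "__main__":
    db = [frozenset(t) for t in (
        {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"})]
    print(mine(db, [min_support(2), max_size(2)]))
    # Adding a constraint changes the problem without touching the solver.
    print(mine(db, [min_support(2), max_size(2), contains("bread")]))
```

The point of the design is that the problem statement (the list of constraints) stays separate from the search procedure, so constraints can be added or swapped without rewriting `mine`. A real system would prune the search instead of enumerating everything.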

Monday Morning

  1. An Evaluation Framework for Temporal Subspace Clustering Approaches by Hardy Kremer, Stephan Günnemann, Arne Held, and Thomas Seidl.

    Mining multivariate time series data by clustering is an important research topic. Time series can be clustered by standard approaches like k-means, or by advanced methods such as subspace clustering and triclustering. A problem with these new methods is the lack of a general evaluation scheme that can be used by researchers to understand and compare the algorithms; publications on new algorithms mostly use different datasets and evaluation measures in their experiments, making comparisons with other algorithms rather unfair.
    In this demonstration, we present our ongoing work on an experimental framework that offers the means for extensive visualization and evaluation of time series clustering algorithms. It includes a multitude of methods from different clustering paradigms such as fullspace clustering, subspace clustering, and triclustering. It provides a flexible data generator that can simulate different scenarios, especially for temporal subspace clustering. It offers external evaluation measures and visualization features that allow for effective analysis and better understanding of the obtained clusterings. Our demonstration system is available on our website.

  2. Demand Finder: Set Top Box Television Ad Targeting using a Novel Interactive Data Visualization System by Brendan Kitts, Dyng Au, Brian Burdick, Jon Borchardt, Amanda Powter, and Todd Otis.

    This paper shows how machine learning and data visualization techniques are being used to execute real television ad buys. We present an innovative data visualization tool that lets users filter, histogram, and sort television inventory to identify the inventory with the highest value per dollar. Using the application, users have identified media that performs 50% better than previous campaigns, as measured by phone response in several live television campaigns.

  3. SaferCity: a System for Detecting and Analyzing Incidents from Social Media by Michele Berlingerio, Francesco Calabrese, Giusy Di Lorenzo, Xiaowen Dong, Yiannis Gkoufas, and Dimitrios Mavroeidis.

    This paper presents a system that identifies and characterizes public-safety-related incidents from social media and enriches the situational awareness that law enforcement entities have of potentially unreported activities in a city. The system is based on a new spatio-temporal clustering algorithm that can identify and characterize relevant incidents even from a small number of social media reports. We present a web-based application exposing the features of the system and demonstrate its usefulness in detecting, from Twitter, public-safety-related incidents that occurred in New York City during the Occupy Wall Street protests.
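
The temporal subspace clustering framework (the first Monday demo) mentions external evaluation measures without naming them. Purity is one widely used external measure: it compares found clusters against ground-truth labels. The sketch below is an assumption about the kind of measure involved, not the framework's code; the function name `purity` is illustrative, and it scores only object memberships, ignoring the dimensions and time points that subspace- and triclustering-aware measures also evaluate.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def purity(found: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Fraction of objects that belong to the majority ground-truth class of
    their found cluster.

    found[i] is the cluster id assigned to object i; truth[i] is its true class.
    """
    if len(found) != len(truth):
        raise ValueError("found and truth must have the same length")
    if not found:
        raise ValueError("need at least one object")
    # Count ground-truth labels inside each found cluster.
    per_cluster: Dict[Hashable, Counter] = {}
    for cluster, label in zip(found, truth):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    # Each cluster contributes the size of its most frequent true class.
    majority_total = sum(c.most_common(1)[0][1] for c in per_cluster.values())
    return majority_total / len(found)


if __name__ == "__main__":
    print(purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"]))
```

Purity rewards fragmentation (putting every object in its own cluster yields 1.0), which is one reason evaluation frameworks typically report several external measures side by side.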

Tuesday Morning

  1. NIM: Scalable Distributed Stream Processing System on Mobile Network Data by Lujia Pan, Jianfeng Qian, Caifeng He, Wei Fan, and Cheng He.

    The amount of 3G mobile broadband (MBB) data has grown 15 to 20 times over the past two years, so real-time processing of these data is becoming increasingly necessary. The overhead of storage and file transfer to HDFS, processing delays, and similar costs make offline analysis inefficient. Analyzing these datasets is non-trivial; example tasks include personal recommendation, anomaly detection, and fault diagnosis. We describe NIM (Network Intelligence Miner), a scalable and elastic streaming solution that analyzes MBB statistics and traffic patterns in real time and provides information for real-time decision making. The design and unique features of NIM (e.g., balanced data grouping and an aging strategy) benefit not only network data analysis tasks but also other applications such as Intelligent Transportation Systems (ITS).

  2. Demonstrating Interactive Multi-resolution Large Graph Exploration by Zhiyuan Lin, Nan Cao, Hanghang Tong, Fei Wang, U Kang, and Duen Horng (Polo) Chau.

    We present a scalable, interactive graph visualization system that supports multi-resolution exploration of million-node graphs in real time. By adapting a state-of-the-art graph algorithm called Slash & Burn, our prototype system generates a multi-resolution view of graphs with up to 69 million edges within a few seconds. We are experimenting with interaction techniques that help users explore this overview and drill down into details. While many visualization systems for million-node graphs require dedicated servers to process the graphs, our prototype runs on a commodity laptop computer. We aim to handle graphs at least an order of magnitude larger (100M edges) than what current systems can support.
    We demonstrate our system’s usage, benefits, and scalability using two large graphs: a LiveJournal friendship network with 69 million edges, and a related-movies network from Rotten Tomatoes with 200K edges.
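
The abstract names Slash & Burn but does not describe how the authors adapted it. As a hedged sketch of the core idea of the original SlashBurn algorithm only, one round removes the k highest-degree "hub" nodes, splits the remaining graph into connected components, keeps the giant connected component (GCC) for further rounds, and treats the small leftover components as "spokes". The functions `from_edges`, `connected_components` and `slash_burn_round`, and the adjacency-set representation, are illustrative assumptions rather than the demo system's code.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph stored as adjacency sets


def from_edges(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def connected_components(graph: Graph) -> List[Set[int]]:
    """Iterative depth-first search over every node to collect components."""
    seen: Set[int] = set()
    components: List[Set[int]] = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    stack.append(nb)
        components.append(comp)
    return components


def slash_burn_round(graph: Graph, k: int) -> Tuple[List[int], Set[int], List[Set[int]]]:
    """One hub-removal round: returns (hubs, giant component, spoke components)."""
    # "Slash": the k highest-degree nodes become hubs (ties broken by node id).
    hubs = sorted(graph, key=lambda n: (-len(graph[n]), n))[:k]
    hub_set = set(hubs)
    # Remove the hubs together with all their incident edges.
    rest: Graph = {n: nbrs - hub_set for n, nbrs in graph.items() if n not in hub_set}
    # "Burn": the largest remaining component is kept; the others are spokes.
    comps = connected_components(rest)
    if not comps:
        return hubs, set(), []
    gcc = max(comps, key=len)
    spokes = [c for c in comps if c is not gcc]
    return hubs, gcc, spokes


if __name__ == "__main__":
    g = from_edges([(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (2, 6), (6, 7), (3, 4)])
    print(slash_burn_round(g, 1))
```

The full algorithm repeats this round on the GCC until it becomes small, and the resulting hub/spoke structure is what makes a coarse-to-fine (multi-resolution) view of a large graph possible.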