Work Package 1

Mining Online Multimedia as Training Resource (MOM)

Objectives: develop tools and techniques for leveraging user-generated multimedia as a training resource for automatic semantic labeling.

The dominant element in the video retrieval paradigm based on semantic labeling is the availability of a large vocabulary of robust detectors. Scaling up the number of detectors will only be possible if the fundamental problem of automatic indexing based on supervised machine learning is resolved: the lack of a large and diverse set of manually labeled visual examples with which to model the diversity in object and scene appearance adequately. A new direction in tackling this fundamental problem is to employ user-tagged visual data provided by online services such as YouTube and Flickr. These annotations are less accurate than those used in current semantic video retrieval practice, but the number of training samples is several orders of magnitude larger.

Intuitively, if different persons label visually similar images and videos with the same tags, these tags are likely to reflect objective aspects of the visual content. We will study how this intuition can be exploited to obtain relevant labels for visual content. To that end, several data mining strategies will be explored, covering textual, visual, social, lexical, and multimodal approaches. All phases of the research will be evaluated in the TRECVID benchmark.
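The tag-agreement intuition above can be illustrated with a minimal neighbor-voting sketch: a tag on a target image gains relevance when it also appears on visually similar images tagged by other users, corrected for how common the tag is overall. This is only an illustration under stated assumptions; the function names, data, and the specific voting scheme are not part of the work package.

```python
from collections import Counter

def tag_relevance(target_tags, neighbor_tag_lists, prior_counts):
    """Illustrative neighbor-voting score for the tags of one target image.

    target_tags       -- tags assigned by the uploader of the target image
    neighbor_tag_lists-- tag lists of visually similar images (hypothetical
                         output of some visual nearest-neighbor search)
    prior_counts      -- how often each tag occurs in the whole collection
    """
    k = len(neighbor_tag_lists)
    votes = Counter()
    for tags in neighbor_tag_lists:
        votes.update(set(tags))  # each neighbor votes at most once per tag
    total = sum(prior_counts.values())
    scores = {}
    for tag in target_tags:
        # Subtract the number of votes the tag would get by chance alone,
        # so frequent but uninformative tags are penalized.
        expected = k * prior_counts.get(tag, 0) / total
        scores[tag] = votes[tag] - expected
    return scores

# Toy data: "beach" is confirmed by similar images, a personal tag is not.
neighbors = [["beach", "sea"], ["beach", "sand"], ["sunset", "beach"], ["car"]]
priors = {"beach": 50, "sea": 30, "sand": 10, "sunset": 5,
          "car": 100, "holiday2009": 5}
scores = tag_relevance(["beach", "holiday2009"], neighbors, priors)
```

In this toy example, "beach" receives a clearly positive score while the personal tag "holiday2009" does not, which is exactly the separation between objective and subjective tags the project aims to exploit at scale.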

Delivered items 2011


SocialZap is a multimedia search engine that finds the most interesting fragments, called zap points, in a television broadcast based on microblog posts and socially tagged photos. The main novelty of SocialZap is the fully automatic transfer of learned viewer interest from textual posts to the visual channel, without any manual effort in the process. Once SocialZap has found the zap points, users can easily browse through a television broadcast and jump directly to the interesting fragments. SocialZap thus adds a social experience to watching television.
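One way to picture the zap-point idea is as peak detection on the microblog post rate along the broadcast timeline: minutes where the post volume spikes well above its typical level are candidate zap points. The sketch below is a simplified assumption of ours, not SocialZap's actual method, which also transfers textual interest to the visual channel.

```python
import statistics

def zap_point_candidates(post_times, bucket=60, z=2.0):
    """Return broadcast offsets (in seconds) where microblog activity spikes.

    post_times -- post timestamps in seconds since the start of the broadcast
    bucket     -- bucket width in seconds (one minute by default)
    z          -- how many standard deviations above the mean counts as a spike
    """
    if not post_times:
        return []
    n_buckets = int(max(post_times) // bucket) + 1
    counts = [0] * n_buckets
    for t in post_times:
        counts[int(t // bucket)] += 1  # histogram of posts per bucket
    mean = statistics.mean(counts)
    sd = statistics.pstdev(counts) or 1.0  # avoid a zero threshold
    return [i * bucket for i, c in enumerate(counts) if c > mean + z * sd]

# Toy data: one post per minute as background, plus a burst in minute 5.
posts = [m * 60 + 1 for m in range(10)] + [5 * 60 + s for s in range(5, 15)]
```

Running this on the toy data flags only the fifth minute, mirroring how a burst of viewer reactions would mark an interesting broadcast fragment.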