Interactive visual learning (IVL)
Objectives: develop tools and techniques for leveraging user-generated multimedia as a training resource for interactive semantic labeling in a web-based user interface.
The semantic gap dictates that automatic methods will never solve the labeling problem completely; user involvement therefore remains essential. Traditional approaches to interactive visual learning put the emphasis on the human user, who assesses the relevance of individual shots interactively using an advanced visualization in the interface. In this WP we break from this tradition: we instead exploit weakly labeled online data as a starting point and emphasize in particular the role of diverse, yet compact, visual features and efficient machine learning schemes.
As online training data, even when disambiguated, is unlikely to suit every conceivable application domain, we will study how it can be tuned to a specific application domain using active and/or transfer learning techniques. All phases of the research will be evaluated in the TRECVID benchmark. In addition, together with Video Dock we will develop a demonstrator for interactive visual learning.
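To make the active learning idea concrete, the following is a minimal sketch of pool-based active learning with uncertainty sampling: a model trained on a small labeled seed set repeatedly queries the user (the oracle) for the label of the pool item it is least certain about. The toy nearest-centroid classifier, the function names, and the margin-based uncertainty measure are illustrative assumptions, not the project's actual implementation.

```python
# Illustrative sketch only: a toy nearest-centroid classifier standing in
# for the real visual-feature classifiers used in the project.

def train_centroids(labeled):
    """Compute a per-class mean feature vector (toy classifier)."""
    sums, counts = {}, {}
    for x, y in labeled:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def distances(centroids, x):
    """Squared Euclidean distance from x to each class centroid."""
    return {y: sum((a - b) ** 2 for a, b in zip(c, x))
            for y, c in centroids.items()}

def margin(centroids, x):
    """Uncertainty = small margin between the two closest centroids."""
    d = sorted(distances(centroids, x).values())
    return d[1] - d[0]

def active_learning(pool, oracle, seed_labeled, rounds=5):
    """Query the oracle for the most uncertain pool item each round."""
    labeled, pool = list(seed_labeled), list(pool)
    for _ in range(rounds):
        centroids = train_centroids(labeled)
        x = min(pool, key=lambda p: margin(centroids, p))
        pool.remove(x)
        labeled.append((x, oracle(x)))  # user provides the label
    return train_centroids(labeled)
```

In practice the oracle is the human user in the web interface, and each query is chosen to maximize the expected improvement of the model per unit of annotation effort.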
Delivered items 2011
- International embedding: The 2011 SESAME Multimedia Event Detection (MED) System
- Dissemination: Interview on image recognition
- User study: Video search engine user study report (Sound and Vision)
- Software: First method for video event detection (Snoek, Van de Sande, Mazloom, Jiang, Koelma, Smeulders)
- Scientific presentation: Visual search (Snoek)
- Scientific presentation: Progress in video concept and event search (Snoek)
- Scientific publication: Empowering visual categorization with the GPU (Van de Sande, Gevers, Snoek)
- Scientific publication: Adding semantics to image-region annotations with the Name-It-Game (Steggink, Snoek)