Cross-modal correlation mining using graph algorithms

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

3 Citations (Scopus)

Abstract

Multimedia objects like video clips or captioned images contain data of various modalities such as image, audio, and transcript text. Correlations across different modalities provide information about the multimedia content, and are useful in applications ranging from summarization to semantic captioning. We propose a graph-based method, MAGIC, which represents multimedia data as a graph and can find cross-modal correlations using "random walks with restarts." MAGIC has several desirable properties: (a) it is general and domain-independent; (b) it can detect correlations across any two modalities; (c) it is insensitive to parameter settings; (d) it scales up well for large datasets; (e) it enables novel multimedia applications (e.g., group captioning); and (f) it opens the door to applying graph algorithms to multimedia problems. When applied to automatic image captioning, MAGIC finds correlations between text and image and achieves a relative improvement of 58% in captioning accuracy as compared to recent machine learning techniques.
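The core mechanism the abstract describes, scoring cross-modal affinity via "random walks with restarts" on a mixed-modality graph, can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the toy graph, the node labeling (image-region nodes vs. caption-term nodes), and the restart probability are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy graph mixing two modalities: nodes 0-2 stand for image
# regions, nodes 3-5 for caption terms; edges link co-occurring items.
A = np.array([
    [0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
], dtype=float)

def random_walk_with_restart(A, seed, restart_prob=0.65, tol=1e-9):
    """Steady-state visit probabilities of a walk that restarts at `seed`.

    `restart_prob` is an illustrative setting, not a value from the chapter.
    """
    W = A / A.sum(axis=0)            # column-normalize: W[i, j] = P(j -> i)
    p = np.zeros(A.shape[0])
    p[seed] = 1.0
    r = p.copy()                     # restart distribution (all mass on seed)
    while True:
        p_next = (1 - restart_prob) * (W @ p) + restart_prob * r
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next

# Cross-modal query: rank caption-term nodes (3-5) by affinity to image node 0.
scores = random_walk_with_restart(A, seed=0)
best_term = 3 + int(np.argmax(scores[3:6]))
```

Nodes of one modality are scored against a seed node of another modality purely through graph connectivity, which is what makes the approach domain-independent: any modality that can contribute nodes and edges plugs into the same walk.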

Original language: English
Title of host publication: Knowledge Discovery and Data Mining
Subtitle of host publication: Challenges and Realities
Publisher: IGI Global
Pages: 49-73
Number of pages: 25
ISBN (Print): 9781599042527
DOIs
Publication status: Published - 2007
Externally published: Yes
