Media Fragments
Since LinkedTV enriches the seed video at the fragment level, our first contribution is to introduce Media Fragments URI as a media-format-independent, standard means of addressing media resources using URIs. The W3C Media Fragments Working Group has advanced the specification to Recommendation status, and a LinkedTV document (D2.1) explains how the HTTP protocol can be used and extended to serve media fragments, the impact on current Web media, and how the LinkedTV player will handle media fragments.
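For illustration, the short sketch below constructs a temporal fragment address using the #t= (Normal Play Time) syntax of Media Fragments URI 1.0; the video URL shown is a hypothetical example, not LinkedTV content.

    # A minimal sketch of addressing a temporal media fragment with the
    # Normal Play Time (NPT) syntax of Media Fragments URI 1.0; the base
    # URL below is a hypothetical example.
    def temporal_fragment(base_url: str, start: float, end: float) -> str:
        """Return a URI addressing the sub-clip from `start` to `end` seconds."""
        return f"{base_url}#t={start:g},{end:g}"

    # e.g. seconds 120 to 150 of a seed video:
    print(temporal_fragment("http://example.org/video/episode1.mp4", 120, 150))
    # -> http://example.org/video/episode1.mp4#t=120,150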
LinkedTV ontology
A LinkedTV ontology defines the permitted vocabulary to be used in the semantic annotation, so that all components in the LinkedTV workflow that re-use that annotation can work together. The ontology takes into account numerous existing formats and standards for multimedia description (to maximise interoperability) as well as LinkedTV requirements, leading to an ontology specification (http://linkedtv.eu/ontology) which is described in a LinkedTV document (D2.2).
To extract unique identifiers for concepts in the media, which can be used to obtain additional metadata about them following Linked Data principles, we have released the NERD platform (http://nerd.eurecom.fr) and integrated a number of new extractors for German and Dutch into it for LinkedTV, in particular the SemiTags service for named entity classification and THD (Targeted Hypernym Discovery), both from our partner UEP.
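As a rough illustration of the kind of output such extractors contribute, the sketch below shows one possible unified record for a recognised entity; the field names and example values are assumptions and do not reflect the actual NERD API or response format.

    # Illustrative only: a possible unified record for a recognised entity,
    # not the actual NERD response format.
    from dataclasses import dataclass

    @dataclass
    class NamedEntity:
        surface_form: str   # text as it appears in the subtitles or labels
        entity_type: str    # e.g. Person, Location, Organisation
        uri: str            # Linked Data identifier, e.g. a DBpedia resource
        start: int          # character offsets in the source text
        end: int
        extractor: str      # underlying extractor, e.g. "SemiTags" or "THD"

    entity = NamedEntity("Amsterdam", "Location",
                         "http://dbpedia.org/resource/Amsterdam",
                         42, 51, "SemiTags")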
Generating semantic metadata for TV content
The legacy metadata and the results of the hypervideo analysis of the seed video are converted into RDF (a semantic data model), and NER (Named Entity Recognition) is run on text and labels from both sources to relate concepts from the Linked Data cloud to fragments of the seed video in the RDF annotation.
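The sketch below indicates, in broad strokes, how a recognised concept can be attached to a media fragment in RDF using rdflib; the Open Annotation vocabulary and all URIs are illustrative assumptions rather than the exact terms of the LinkedTV ontology.

    # Illustrative sketch: relating a Linked Data concept to a media fragment
    # in RDF. The Open Annotation vocabulary and the URIs are assumptions,
    # not the exact LinkedTV ontology terms.
    from rdflib import Graph, Namespace, URIRef

    OA = Namespace("http://www.w3.org/ns/oa#")
    g = Graph()
    g.bind("oa", OA)

    annotation = URIRef("http://example.org/annotation/1")                # hypothetical
    fragment = URIRef("http://example.org/video/episode1.mp4#t=120,150")  # Media Fragment URI
    concept = URIRef("http://dbpedia.org/resource/Berlin")                # concept found by NER

    g.add((annotation, OA.hasTarget, fragment))   # the seed video fragment
    g.add((annotation, OA.hasBody, concept))      # the Linked Data concept
    print(g.serialize(format="turtle"))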
A Web service (TV2RDF) automates the semantic annotation, while a Web-based GUI is available for testing the functionality with content from the LinkedTV scenario partners.
Enriching seed video with media content
Linked Media is about mining, retrieving and discovering additional online content to enrich specific media fragments of the seed video program being watched by the user. Different methods, ranging from structured queries over XML or RDF data and API calls to public Web services through to site-specific Web mining over HTML, are applied to extract conceptual annotations of online media and match them to the seed video fragments based on concept similarity.
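A minimal sketch of the final matching step follows, assuming each seed fragment and each candidate enrichment item has already been annotated with a set of concept URIs; the Jaccard measure is just one simple choice of concept similarity, used here purely for illustration.

    # Sketch of concept-similarity matching between a seed video fragment and
    # candidate enrichment items, using Jaccard overlap of concept URI sets.
    def jaccard(a: set, b: set) -> float:
        return len(a & b) / len(a | b) if (a or b) else 0.0

    def rank_enrichments(fragment_concepts, candidates, threshold=0.2):
        """candidates: iterable of (media_url, concept_set) pairs."""
        scored = [(url, jaccard(fragment_concepts, concepts))
                  for url, concepts in candidates]
        return sorted((s for s in scored if s[1] >= threshold),
                      key=lambda s: s[1], reverse=True)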
EURECOM’s MediaCollector uses a set of media item extractors to identify and crawl media shared on popular social networks such as Facebook, Twitter, Instagram, Twitpic, YouTube, MobyPicture and Flickr. It concentrates on returning the “freshest” media related to a search term, which makes it of particular interest in our Linked News scenario.
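A rough sketch of how results from several per-network extractors could be merged and ordered by recency is given below; the MediaItem fields and the extractor interface are illustrative assumptions, not MediaCollector’s actual API.

    # Illustrative only: merging media items from several extractors and
    # returning the "freshest" first. Not MediaCollector's actual API.
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class MediaItem:
        url: str
        source: str          # e.g. "twitter", "flickr"
        published: datetime

    def collect(query: str, extractors) -> list:
        """extractors: callables mapping a search term to a list of MediaItem."""
        items = [item for extract in extractors for item in extract(query)]
        return sorted(items, key=lambda i: i.published, reverse=True)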
UEP is developing a website crawler for extracting media items from web pages together with their associated metadata (e.g. captions). The crawler is applied to the whitelisted sites identified by the scenario partners, and is particularly appropriate for linking to cultural heritage resources online in the Hyperlinked Documentary scenario.
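As a sketch of the kind of extraction involved, the snippet below pulls images and captions from a page that uses the common figure/figcaption markup; the selectors are assumptions and this is not UEP’s actual crawler code.

    # Illustrative sketch: extracting images and captions from a web page,
    # assuming <figure>/<figcaption> markup. Not UEP's actual crawler code.
    import requests
    from bs4 import BeautifulSoup

    def extract_media(page_url: str):
        html = requests.get(page_url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        for figure in soup.find_all("figure"):
            img = figure.find("img")
            caption = figure.find("figcaption")
            if img and img.get("src"):
                yield {"image": img["src"],
                       "caption": caption.get_text(strip=True) if caption else None,
                       "page": page_url}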
Deliverables and presentations
Specification of the Media Fragment URI scheme
Specification of lightweight metadata models for multimedia annotation
Specification of Web mining process for hypervideo concept identification
Online Demos
SemiTags
NERD Platform
Metadata Conversion Tool
LinkedTV deals with i) the technical architecture enabling deep linking to media objects, ii) the design of lightweight metadata models, iii) the specification of a Linked Media Layer on the Web using this metadata in combination with Linked Data, and iv) tools for mining and processing Web content in order to populate the metadata knowledge base.

The goal of this research is to retrieve and pre-process external Web resources in order to enrich broadcasts with additional information about important concepts that could be presented to the viewer. We will investigate advanced Web mining techniques that have not yet been considered in connection with multimedia concepts. First, large collections of Web resources, forming an ‘information cloud’ around individual broadcasts, will be partly pre-selected manually and partly retrieved via intelligent meta-search. Subsequently, they will be examined at the level of individual websites and documents, using web spiders and document categorisers respectively. Finally, the concrete documents selected via spidering and classification will be submitted to fine-grained information extraction (IE) tools.
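To make the intended flow concrete, the sketch below strings these stages together; every callable is a hypothetical placeholder standing in for a whole component (meta-search, spidering, document categorisation, information extraction), not an implementation of it.

    # High-level sketch of the pipeline described above; all callables are
    # hypothetical placeholders, not implementations.
    def build_information_cloud(broadcast, whitelist, meta_search):
        """Gather candidate Web resources around a single broadcast."""
        resources = list(whitelist)                    # manually pre-selected sites
        resources += meta_search(broadcast.keywords)   # intelligent meta-search
        return resources

    def enrich(broadcast, whitelist, meta_search, spider, is_relevant, extract):
        cloud = build_information_cloud(broadcast, whitelist, meta_search)
        documents = [doc for site in cloud for doc in spider(site)]   # web spiders
        relevant = [doc for doc in documents if is_relevant(doc)]     # document categorisers
        return [extract(doc) for doc in relevant]                     # fine-grained IE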
Since there is a limited number of different broadcast genres, and assuming that viewers are likely to have different preferences and requirements for each genre, there is a need to create genre-specific information gathering templates. These templates would provide the necessary granularity and adaptability to the user’s requests and interests.
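As an illustration of what such templates could look like, two hypothetical examples follow; the genres, slots and source preferences are purely illustrative assumptions.

    # Purely illustrative genre-specific information gathering templates.
    NEWS_TEMPLATE = {
        "genre": "news",
        "slots": ["who", "what", "where", "when"],
        "preferred_sources": ["whitelisted news sites", "freshest social media"],
    }

    DOCUMENTARY_TEMPLATE = {
        "genre": "documentary",
        "slots": ["artist", "artwork", "period", "location"],
        "preferred_sources": ["cultural heritage portals", "encyclopaedic resources"],
    }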