Ethnographic Multimedia Research Platform

From CCE wiki archived

short link:

The aim of this CCE project is twofold:

(1) to offer a solid research platform for computational "big data" ethnography (especially in ethnomusicology and the related disciplines of folklore and anthropology), enabling secure storage, search, and retrieval of researcher-contributed digital repositories of ethnographic multimedia, and thereby supporting an unprecedented level of collaborative and comparative research among these researchers; and

(2) to enable computational research in machine learning towards new heuristics and algorithms for multimedia search, classification, and retrieval, by providing researchers with a massive quantity of real-world, well-annotated data.

To this end, we propose to establish a large database and computation facility using resources from Compute Canada, in conjunction with a CANARIE/CRIM-funded project, VESTA (VidEo: Annotation Processing System), an integrated set of innovative computer tools for analyzing and annotating audio and video recordings. No such platform exists yet; establishing it would position Canada as a world leader in this domain.

Concretely, we anticipate requiring the following resources over a period of three years: 2 PB of secure, backed-up storage, plus 12 core-years of computation (that is, 4 cores of processing power for 3 years, or the equivalent). Storage will be dedicated primarily to large media files (mainly digital video and audio), along with textual metadata and annotations. Core-years will be devoted to media processing, including algorithms provided by VESTA as well as experimental algorithms proposed by researchers in computing and information science.

The project will thus advance national and international research agendas in two broad domains, each corresponding to a broad research community: (1) ethnographic study of human societies through multimedia documentation (arts, humanities, and social sciences); and (2) research and testing of algorithms for multimedia information retrieval, particularly those based on machine learning heuristics (computer science and information science). The two agendas and communities support each other, in a synergy of humanistic and scientific research.