Machine Learning for Sound Recognition

== Deep Learning for Sound Recognition ==
How do we recognize the components and attributes of sound, describe and parse an audio recording of music, speech, or environmental sounds, or extract sonic features, classify types, segment units, and identify the sources of sounds? Sometimes recordings capture a single sound source: one instrument, speaker, or bird. Others contain multiple but coordinated sources: a musical ensemble, or a conversation. Yet typically in fieldwork, a recording encompasses a complex mix of uncoordinated sound sources, a total soundscape that may include music as well as speech, music from multiple groups performing simultaneously, many speakers talking at once, or many bird calls, all layered together with “noise” such as the sounds of crowds, highways and factories, rain, wind, and thunder. Unlike the analogous challenges in visual “recordings” (photographs), recognizing complex sound environments on audio recordings remains a rather mysterious process.<br>
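To make “extracting sonic features” concrete, the sketch below computes a log-mel spectrogram, a representation that sound-recognition models commonly take as input. This is a minimal illustration only: it assumes Python with the librosa and NumPy libraries, which the project does not specify, and the file name is a placeholder.

<syntaxhighlight lang="python">
# Minimal feature-extraction sketch, assuming Python with librosa and
# NumPy installed; "field_recording.wav" is a placeholder path.
import librosa
import numpy as np

# Load the recording, resampled to a uniform rate so features are comparable.
y, sr = librosa.load("field_recording.wav", sr=22050, mono=True)

# Mel-scaled spectrogram: energy per mel frequency band per time frame.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                     hop_length=512, n_mels=64)

# Convert power to decibels; log-mel features are a typical input to
# deep networks for sound classification and segmentation.
log_mel = librosa.power_to_db(mel, ref=np.max)

print(log_mel.shape)  # (n_mels, n_frames): 64 bands, ~43 frames per second
</syntaxhighlight>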
In contrast to an earlier era of “small data” (largely the result of the limited capacity of expensive analog recorders), the advent of inexpensive, portable digital recording devices of enormous capacity, combined with growing interest in sound across the humanities, social sciences, and sciences, has produced vast collections of sound recordings, bringing sound into the realm of “big data.” To date, most of this audio data is not annotated and is therefore, for all practical purposes, inaccessible for research.<br>
Computational recognition of sound, its types, sources, and components is crucial for a wide array of fields, including ethnomusicology, music studies, sound studies, linguistics (especially phonetics), media studies, library and information science, and bioacoustics, because it enables indexing, searching, retrieval, and regression (prediction of continuous attributes) of audio information. While expert human listeners may recognize complex sound environments with ease, the process is slow: they listen in real time, and they must be trained to hear sonic events contrapuntally. Through this project, we aim to explore how deep learning applied to big data can ultimately enable these functions across large sound collections for ongoing interdisciplinary research.<br>
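The project does not specify a model architecture; as one common approach, the sketch below shows how a small convolutional network could tag log-mel spectrograms (as computed above) with soundscape labels for indexing and retrieval. It assumes PyTorch, and the class name, architecture, and label set are illustrative assumptions, not the project's actual model.

<syntaxhighlight lang="python">
# Hedged sketch of a small convolutional tagger over log-mel features,
# assuming PyTorch; architecture and labels are illustrative assumptions.
import torch
import torch.nn as nn

# Hypothetical soundscape labels for a fieldwork collection.
LABELS = ["music", "speech", "birdsong", "ambient_noise"]

class SoundscapeTagger(nn.Module):
    def __init__(self, n_labels: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool over both frequency and time
        )
        self.classifier = nn.Linear(32, n_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mels, n_frames) log-mel spectrograms.
        h = self.features(x).flatten(1)
        # Independent per-label scores: a field recording may contain
        # several overlapping sources, so this is multi-label, not one-of-N.
        return self.classifier(h)

model = SoundscapeTagger(len(LABELS))
dummy = torch.randn(1, 1, 64, 430)   # ~10 s of the features sketched above
probs = torch.sigmoid(model(dummy))  # per-label probabilities
print({lab: round(float(p), 2) for lab, p in zip(LABELS, probs[0])})
</syntaxhighlight>

The sigmoid (multi-label) output reflects the fieldwork scenario described above: music, speech, and environmental sounds may overlap in a single recording, so each label is scored independently rather than forcing a single choice.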
 
== Team Members ==
Principal Investigator: Michael Frishkopf, Professor of Ethnomusicology, Department of Music<br>
Antti Arppe, Assistant Professor of Quantitative Linguistics<br>
Erin Bayne, Professor, Department of Biological Sciences<br>
Vadim Bulitko, Associate Professor, Department of Computing Science<br>
Astrid Ensslin, Professor of Media and Digital Communication<br>
Abram Hindle, Assistant Professor, Department of Computing Science<br>
Mary Ingraham, Professor of Musicology, Director, Sound Studies Initiative, Department of Music<br>
Sean Luyk, Music Librarian and Service Manager of ERA Audio + Video, University of Alberta Libraries<br>
Scott Smallwood, Associate Professor of Music Composition, Department of Music<br>
Benjamin V. Tucker, Associate Professor of Phonetics, Department of Linguistics<br>
 
== Collaborators ==
Ichiro Fujinaga, Associate Professor in Music Technology, Schulich School of Music, McGill University<br>
George Tzanetakis, Associate Professor, Department of Computer Science, University of Victoria<br>
Anna Lomax Wood, President and Director of Research for the Association for Cultural Equity<br>
Michael Cohen, Professor of Computer Science, University of Aizu, Aizu-Wakamatsu, Japan<br>
Diane Thram, Professor Emerita, Music Department, Rhodes University, South Africa<br>
Philippe Collard, André Lapointe, Frédéric Osterrath, & Gilles Boulianne, Centre de recherche informatique de Montréal (CRIM)<br>
 
== Students ==
Sergio Poo Hernandez, MSc in Computing Science<br>
Noah Weninger, Undergraduate Research Assistant, Computing Science, University of Alberta<br>
 
== Funding Support (U of A) ==
KIAS Cluster Grant 2017<br>
Canadian Centre for Ethnomusicology<br>
Hindle/Bulitko Computing Science Labs<br>
Bioacoustic Unit (Biological Sciences)<br>
Alberta Phonetics Laboratory (Linguistics)<br>
Alberta Language Technology Lab (Linguistics)<br>
University of Alberta Research Experience (UARE)<br>

== Funding Support (Other) ==
NVIDIA Corporation<br>
Spatial Media Laboratory, University of Aizu, Japan<br>
Compute Canada<br>
Centre de recherche informatique de Montréal<br>
SSHRC<br>
