Multimedia editing and analysis software

From Canadian Centre for Ethnomusicology

Zero-cost solutions recommended here: software included with your computer (providing basic functionality, sufficient for Music 666), or open source tools (which can be quite sophisticated).


Audio and MIDI recording

Open source: Audacity is a free, open source, cross-platform (Mac and Windows) audio editor. Get to know it. It comes with help pages and tutorials (and you'll find many more on YouTube). Another open source tool is Ardour.

Proprietary: GarageBand (included with Macs) handles audio and MIDI, and can represent scores as well; it's quite useful. There are a number of similar options for Windows.

Pro: If you want to do a lot of mixing and editing, you may prefer a more professional product such as Cubase, Logic, Reason, Ableton, or Pro Tools.

Audio separation

In the past it was nearly impossible to separate a single audio track into its constituent musical parts by source (e.g. vocals, drums, etc.). Now a new generation of AI-based tools claims to have begun to solve this formerly intractable problem. I have not tried these tools myself, but they are worth trying if you need, for instance, to analyze an isolated vocal part.

Speech transcription

Several tools can help you transcribe speech.

TranscriberAG or Transcriva (see below also, under Text)

Speech-to-text software or automatic transcription: most computers have a built-in function; there are also dedicated software packages, such as Dragon.

YouTube, Zoom, and Android phones (check out their Recorder app) all offer excellent automated transcription, though typically they work well only for native English speakers.

Transcription can range in precision, from semantic level (general paraphrasing), to lexical level (what words were uttered?), to details of speed, pitch, expression, and accent. Linguistic software is most flexible at the microlevel; check Praat (link below). You may wish to consider using an IPA font: Charis unicode


Transcription from sound to "notes" (i.e. from audio to symbol) isn't as crucial as it used to be in ethnomusicology, but it's still an important intermediary zone between audio files and analysis, and transcribing remains a crucial step for getting close to the music (unless you're actually learning to perform it). The gap between sequencers and score writers has narrowed considerably, but for a flexible notational system you'll want the latter; see this Wikipedia article. I have mainly used Finale and Sibelius, which has student pricing, but there are even cheaper (and free!) solutions out there. Finale does everything, but is typically overkill for ethnomusicology, and is cumbersome, in my opinion.

A neat open source solution, suitable for monophony, is abc, widely used in the folk tune community, with the advantage of being typographical (you can also invent your own typographical system, e.g. using numbers for notes). An inexpensive but very powerful score writer is NoteAbility. A free (GNU) package is LilyPond. Another is MuseScore. You may couple score writers with MIDI keyboards or other devices (e.g. electronic drums) for faster input. See also Arduino below.

Alternatively, you can always use pencil and staff paper, which has the advantage of flexibility, even if you'd probably rather not use it for a published article or thesis (we used to do that though).

One advantage of symbolic representations is the ability to highlight structure by reducing extraneous information, or through additional markup. They allow you to indicate, for instance, a key melody, line, ornament, or rhythm, or the overall structure of a piece, rather than presenting the sound itself. With audio it's easy to prepare excerpts as illustrations, but hard to reduce a piece to its structure, or to pull out a particular line from the texture.

It is difficult to automate conversion from audio to score/symbolic representations, though pitch trackers can get you partway, particularly with monophonic music (see Tony below). Mainly, though, you'll use your ear!
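To give a feel for what a pitch tracker does with monophonic sound, here is a minimal sketch of one classic approach, autocorrelation: the frame is compared with delayed copies of itself, and the delay that matches best is taken as the period of the note. This is only an illustration, not the method used by Tony or any other tool listed on this page; the function name `estimate_pitch` and the 80-1000 Hz search range are assumptions chosen for the example.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of one monophonic frame.

    Returns None if no positive correlation is found (e.g. silence).
    """
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    n = len(samples)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        # Correlate the frame with a copy of itself delayed by `lag` samples.
        score = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


# Synthetic test tone: 1024 samples of a 200 Hz sine at 8000 samples/second.
frame = [math.sin(2 * math.pi * 200.0 * i / 8000) for i in range(1024)]
print(estimate_pitch(frame, 8000))
```

Real recordings are much messier than a sine wave (vibrato, noise, octave errors), which is one reason you will still rely mainly on your ear.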

Symbolic representations, tune databases, and statistical analysis

If you're interested in analyzing large musical corpora, traditional scores are not the way (and neither are audio files!). Some programming is probably required. You can analyze MIDI encodings directly, but that requires programming to read MIDI's binary format. Moreover, MIDI drops much important information (such as barlines) that is not required to drive a synthesizer. Considerably less programming is required to analyze textual representations, e.g. abc. Finally, consider the Humdrum toolkit, an encompassing suite of standards and tools in wide use by music theorists.
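As a rough illustration of how little code a textual representation needs, the sketch below reads the notes of an abc tune body and counts the melodic intervals between successive notes. It handles only a simplified subset of abc (note letters, accidentals written directly on a note, and octave marks) and ignores the key signature, durations, rests, chord symbols, and lyrics; the function names are my own, not part of any abc library.

```python
import re
from collections import Counter

# Semitone offsets of the natural notes above C.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# One abc note: optional accidental, a note letter, optional octave marks.
NOTE = re.compile(r"(\^{1,2}|_{1,2}|=)?([A-Ga-g])([,']*)")


def abc_to_midi(body):
    """Convert the notes in an abc tune body to MIDI note numbers."""
    pitches = []
    for accidental, letter, octave_marks in NOTE.findall(body):
        pitch = 60 + SEMITONES[letter.upper()]  # uppercase C is middle C
        if letter.islower():
            pitch += 12  # lowercase letters sound an octave higher
        pitch += 12 * octave_marks.count("'")  # each ' raises an octave
        pitch -= 12 * octave_marks.count(",")  # each , lowers an octave
        if accidental.startswith("^"):
            pitch += len(accidental)  # ^ sharp, ^^ double sharp
        elif accidental.startswith("_"):
            pitch -= len(accidental)  # _ flat, __ double flat
        pitches.append(pitch)
    return pitches


def interval_histogram(pitches):
    """Count the intervals (in semitones) between successive notes."""
    return Counter(b - a for a, b in zip(pitches, pitches[1:]))


scale = abc_to_midi("C D E F | G A B c |")
print(scale)
print(interval_histogram(scale))
```

Run over a whole tune database, the same few lines would give you interval statistics for the corpus; doing the equivalent from MIDI files would first require decoding the binary format.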

Visualization, Transcription and Analysis of Sound and Music: some tools

See this summary of automatic transcription tools: Benetos, E., Dixon, S., Duan, Z., & Ewert, S. (2019). Automatic music transcription: An overview. IEEE Signal Processing Magazine, 36(1), 20–30. (non-paywalled link:

See: Overtone Music Network

See: Automatic Music Transcription: An overview

See: Musical notation and visualization

See: Auditory Graphs and sonification sandbox

See: Ethnomusicology in the Laboratory: From the Tonometer to the Digital Melograph
David Cooper and Ian Sapiro
Ethnomusicology Forum
Vol. 15, No. 2 (Nov., 2006), pp. 301-313
Published by: Taylor & Francis, Ltd. on behalf of the British Forum for Ethnomusicology
Stable URL:
Contains an overview of Praat.

Voice Science, specializing in vocal sounds. Inexpensive.

Sonic Visualiser is a powerful analysis package for sound, and it's free.

Sonic Lineup, by the same folks, allows you to align two performances and check for differences.

Tony, a tool for transcribing pitch, notes, melodies.

For a linguistic approach, try Praat, developed for linguistics but suitable for short music segments also. Also free.

Simpler but often adequate is Audacity, which provides a number of analysis tools.

Another free tool is Speech Analyzer.

Not free, but apparently quite good, is Melodyne.

Spear, Sinusoidal Partial Editing Analysis and Resynthesis.

Arduino, hardware for possible input (you may also connect a MIDI keyboard).

Also see Resources and Tools in Speech, Hearing and Phonetics from University College London.

And SIL tools

Audio metadata

Certain audio file formats have slots for metadata; these include BWF (Broadcast Wave Format) and MP3. This is a pragmatic approach, since metadata encoded within the file (as opposed to a separate metadata database or table) travels with the file automatically. Unfortunately, the MP3 format provides for only limited information, in so-called ID3 tags, of which there are two versions (v1 and v2), the latter holding much more than the former. Various ID3 tag editors are available; I've been using fixtag, written in Java. iMusic can also edit ID3 tags, but it stores some information in its own database, so it's hard to tell whether the information you're adding is actually being stored in the file.
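One way to check what is actually stored in the file is to read the tag yourself. The sketch below reads an ID3v1 tag, which is simply the last 128 bytes of the file: the marker "TAG", then fixed-width fields for title, artist, album (30 bytes each), year (4 bytes), comment (30 bytes), and a one-byte genre code. The function names are my own, and the sketch assumes Latin-1 text; the much richer ID3v2 tag sits at the start of the file and is not handled here.

```python
import os
import struct


def parse_id3v1(block):
    """Decode a 128-byte ID3v1 block into a dict, or return None if absent."""
    if len(block) != 128 or block[:3] != b"TAG":
        return None
    _, title, artist, album, year, comment, genre = struct.unpack(
        "<3s30s30s30s4s30sB", block
    )

    def text(raw):
        # Fields are padded with zero bytes (or spaces) to their fixed width.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "genre_code": genre,
    }


def read_id3v1(path):
    """Read the ID3v1 tag stored at the end of the file at `path`, if any."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < 128:
            return None  # too short to hold a tag
        f.seek(-128, os.SEEK_END)
        return parse_id3v1(f.read(128))
```

The fixed 30-byte fields show why v1 is so limiting: a long track title or a detailed field note simply does not fit.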

AV annotation software: ELAN


Software: the GIMP (free), Photoshop (expensive). "Speaking photo" app.

Windows and Mac operating systems provide simple, free image editing software, e.g. iPhoto for Mac. More sophisticated is the open source, cross-platform project GIMP. The professional package is Adobe Photoshop.

Inpaint can remove objects from your photos and fill in with the background.


Wikipedia's list of video editing software



  • iMovie is simple video editing software included with every Mac; suitable for presentations.
  • Final Cut Pro is a professional package for Mac (not free)
  • Adobe (not free)


  • Movie Maker, included with Windows up to Vista
  • Live Movie Maker for Windows 7.
  • Many professional packages are available; probably Adobe's Premiere Pro (see below) is most widely used.

Linux: and several others listed here

Transcription software:

Video tracking software:


Many recording tools also enable embedded tags and annotations, as does qualitative analysis software. Annotation of audio and video recordings is possible via ELAN software.



Converting file formats

While conversions are sometimes non-trivial operations (and can take a long time), a few programs excel at them.

  • Audacity can convert among a large set of formats (free)
  • QuickTime can do the same for images (free)
  • For video as well as audio try the remarkable Adapter, which is a nice GUI based on the ffmpeg backend.
  • HandBrake is another useful tool, especially for converting DVDs to editable files
  • Also see MPEG Streamclip, which can also download YouTube videos
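If you have many files to convert, you can also drive ffmpeg (the backend that Adapter wraps) directly from a short script. The sketch below is a minimal example, assuming ffmpeg is installed and on your PATH; the function names and file names are hypothetical. ffmpeg infers the output format from the target file's extension, and which formats it can write depends on how your copy was built.

```python
import shutil
import subprocess


def build_ffmpeg_command(source, target):
    """Build the ffmpeg command line for a simple format conversion."""
    # -n: never overwrite an existing output file; -i: names the input file.
    return ["ffmpeg", "-n", "-i", source, target]


def convert(source, target):
    """Convert `source` to the format implied by `target`'s extension."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    # check=True raises CalledProcessError if ffmpeg reports a failure.
    subprocess.run(build_ffmpeg_command(source, target), check=True)


# Show the command that would be run (hypothetical file names).
print(" ".join(build_ffmpeg_command("interview.wav", "interview.flac")))
```

Calling `convert("interview.wav", "interview.flac")` would then run the conversion; wrap it in a loop over a folder to process a whole collection of field recordings.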