SURF Projects for Summer 2019

  • Creating a Python package

    Building a new Python package that reimplements existing MATLAB code for constructing aligned hierarchies, a representation for musical scores. This builds on work published at ISMIR 2016, but requires no background knowledge in music information retrieval (MIR). Students interested in this project should have experience coding in Python (e.g., CS 111) and should be excited by the prospect of building code that will be publicly available.
  • Interactive Visualization

    The Interactive Aligned Hierarchies project also builds on the aligned hierarchies, but links the static output to the score. This will allow for simultaneous interactive exploration of both the score and the representation. Students interested in this project should have experience designing large-scale projects and should be excited about connecting visualizations with sound files.
  • Data Science Education

    (Possible funding) As part of a TRIPODS+X grant, there is summer work investigating students’ data science misconceptions before and after their first formal course in data science. This project will rely on aspects of text mining in addition to data management.

Research Interests in MIR

My research is in applied mathematics. Specifically, I work at the intersection of network theory and machine learning. Working with high-dimensional and often noisy datasets, I seek to extract salient information from each data point to inform meaningful comparisons between the data points. Viewing my data as networks is essential to the techniques that I create. This network point of view introduces natural questions such as "Which data points cluster together?" and "What relationship does network structure have to the high-dimensional data domain at hand?" These kinds of questions are naturally approached with theory from the fields of numerical linear algebra, statistical learning, complex networks, and machine learning.

To be more precise, I work with data based on cultural artifacts, specifically sets of songs. To analyze a dataset, I begin by analyzing each data point individually and creating Aligned Hierarchies for it. These aligned hierarchies are created by identifying repeated structure at a variety of sizes, and are smaller, coarser representations of the original data. I then use the aligned hierarchies to create a network representation of the whole dataset, on which I perform the analysis.
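One way to picture the last step is the minimal sketch below. It is an illustration under stated assumptions, not the method used in this research: each song is stood in for by a made-up feature vector (a placeholder for a real aligned-hierarchies representation), the distance is plain Euclidean distance, and the names `threshold_graph`, `connected_components`, and `eps` are hypothetical. Songs become nodes, two songs are linked when their distance is at most `eps`, and the connected components of the resulting network give a crude answer to which data points cluster together.

```python
import numpy as np


def threshold_graph(dist, eps):
    """Build an adjacency dict linking items whose distance is at most eps."""
    n = dist.shape[0]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] <= eps:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj):
    """Return the connected components of the graph as sorted lists of nodes."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


# Placeholder feature vectors, one row per song (assumed, not real data).
features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]])

# Pairwise Euclidean distances between all songs.
dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)

adj = threshold_graph(dist, eps=0.5)
components = connected_components(adj)
```

In this toy example the first two songs and the last two songs form separate components. A real analysis would replace the placeholder vectors and the Euclidean distance with a comparison suited to aligned hierarchies.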

Currently, I am working with data based on Mazurkas, so my work is relevant to the field of Music Information Retrieval (MIR). My approach treats the data, in this case the songs, as a complex network on which mathematical theory and machine learning principles can naturally be applied, which makes my work of interest to both the mathematical and machine learning communities. Additionally, this network approach frees me to find structure without limiting myself to well-known and well-studied musical objects such as chords or codas.

Put simply, my present goal is to compare songs without listening to them. In this work, my song comparisons are task-dependent; I may be looking for exact matches of a recording, for cover songs of a specific song, or for remixes of all or part of a particular song. In MIR, these different comparisons are called tasks. My approach, regardless of task, is to build a representation space for the dataset created from the individual multiscale signatures that represent each song. To create the aligned hierarchies for a song, my algorithm finds repeats in the song matrices at a variety of sizes.
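The idea of finding repeats at a variety of sizes can be sketched as follows. This is a simplified illustration, not the aligned-hierarchies algorithm itself: it assumes a song is given as a sequence of section labels, builds a Boolean self-similarity matrix from them, and looks for diagonal runs of a given length, since a repeated passage shows up as a diagonal stripe off the main diagonal. The function names `toy_similarity` and `find_diagonal_repeats` are hypothetical, and overlapping repeats are not filtered out.

```python
import numpy as np


def toy_similarity(labels):
    """Boolean self-similarity matrix: entry (i, j) is True when labels match."""
    labels = np.asarray(labels)
    return labels[:, None] == labels[None, :]


def find_diagonal_repeats(sim, length):
    """Return (i, j) start pairs, i < j, whose next `length` time steps all match.

    A match means sim[i + k, j + k] is True for every k in range(length),
    i.e. the passage starting at i is repeated starting at j.
    """
    n = sim.shape[0]
    pairs = []
    for i in range(n - length + 1):
        for j in range(i + 1, n - length + 1):
            if all(sim[i + k, j + k] for k in range(length)):
                pairs.append((i, j))
    return pairs


# Toy "song": the passage A B C occurs twice, followed by a new section D.
labels = list("ABCABCD")
sim = toy_similarity(labels)

# Search for repeats at several sizes.
repeats_by_length = {
    length: find_diagonal_repeats(sim, length) for length in (1, 2, 3, 4)
}
```

For this toy input the longest repeat has length 3 and starts at time steps 0 and 3; no repeat of length 4 exists. Collecting the repeats found at every size is the raw material from which a multiscale summary of the song could be assembled.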