Sparse Interpretable Audio Model

Table of Contents

  1. Model Architecture
  2. Future Directions
  3. Cite this Work
  4. Sound Samples

Model Architecture

This small model attempts to decompose audio featuring acoustic instruments into a sparse set of events, each represented by a real-valued event vector and the time at which the event occurs.

While event data are encoded as real-valued vectors rather than discrete values, the learned representation still lends itself to a sparse, interpretable, and hopefully easy-to-manipulate encoding. This first draft was trained on the amazing MusicNet dataset.
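
As a concrete illustration (not the project's actual code), here is a minimal sketch of what such an encoding might look like. The event count, vector dimensionality, and normalized time range are assumptions chosen for the example:

import numpy as np

# Assumed sizes, for illustration only; the real model's dimensions are
# not specified in this post.
n_events = 16    # number of sparse events in a segment (assumed)
event_dim = 32   # dimensionality of each real-valued event vector (assumed)

rng = np.random.default_rng(0)

# An encoded segment is a small set of (time, vector) pairs: each event has
# a real-valued vector describing *what* sounds and a scalar time describing
# *when* it occurs.
event_vectors = rng.normal(size=(n_events, event_dim))
event_times = np.sort(rng.uniform(0.0, 1.0, size=n_events))

# Sparsity makes the code cheap to manipulate: nudge one event in time, or
# swap its vector, without disturbing the rest of the segment.
event_times[3] += 0.05
event_vectors[7] = event_vectors[2]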

Each sound sample below includes the following elements:

  1. The original recording
  2. The model's reconstruction
  3. New audio using the original timings, but random event vectors
  4. New audio using the original event vectors, but with random timings (see the sketch following this list)
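
The two resampled variants can be sketched as drawing new values for one half of the representation while holding the other half fixed. In the sketch below (again, not the project's actual code), random event vectors are drawn from a Gaussian fit to the sample's own vectors, matching the mean-and-variance note in the sound samples section; decode is a hypothetical stand-in for the model's renderer:

import numpy as np

def random_vector_variant(event_vectors, rng):
    # Variant 3: keep the original timings, but redraw each event vector
    # from a Gaussian fit (per-dimension mean and variance) to this
    # sample's own event vectors.
    mu = event_vectors.mean(axis=0)
    sigma = event_vectors.std(axis=0)
    return rng.normal(mu, sigma, size=event_vectors.shape)

def random_timing_variant(event_times, rng):
    # Variant 4: keep the original event vectors, but redraw the event
    # times uniformly over the segment.
    return np.sort(rng.uniform(0.0, 1.0, size=event_times.shape))

# Either variant would then be rendered with the model's decoder, e.g.:
#   audio = decode(random_vector_variant(event_vectors, rng), event_times)  # variant 3
#   audio = decode(event_vectors, random_timing_variant(event_times, rng))  # variant 4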

Future Directions

There are several areas that could provide further gains in compression and interpretability.

Cite this Work

@misc{vinyard2023audio,
    author = {Vinyard, John},
    title = {Sparse Interpretable Audio},
    url = {https://JohnVinyard.github.io/machine-learning/2023/11/15/sparse-physical-model.html},
    year = 2023
}

Sound Samples

[Four sound samples are embedded here as audio players. Each sample includes: the original recording; the model's reconstruction; a version with random event vectors, drawn from the mean and variance of that sample's event vectors; a version with random timings; and a timeline visualization of the decoded events.]