Posts
-
Sparse Interpretable Audio Model V2
Happy Leap Day! I’ve just published some high-level details and sound reconstructions from a newly-trained model that decomposes musical audio into an easy-to-manipulate format:
-
Sparse Interpretable Audio Model
This post covers a model I’ve recently developed that encodes audio as a high-dimensional and sparse tensor, inspired by algorithms such as matching pursuit and dictionary learning.
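For context, matching pursuit greedily approximates a signal as a sparse sum of dictionary atoms, which is the intuition behind that sparse encoding. Here's a minimal NumPy sketch of the classic algorithm; the random dictionary and signal are stand-ins, and this is not the model described in the post:

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms=10):
    """Greedily approximate `signal` as a sparse combination of
    unit-norm atoms (the rows of `dictionary`)."""
    residual = signal.copy()
    coeffs = np.zeros(len(dictionary))
    for _ in range(n_atoms):
        # pick the atom most correlated with what's left to explain
        scores = dictionary @ residual
        best = np.argmax(np.abs(scores))
        coeffs[best] += scores[best]
        residual = residual - scores[best] * dictionary[best]
    return coeffs, residual

# toy example: a random unit-norm dictionary and a random "signal"
rng = np.random.default_rng(0)
D = rng.standard_normal((128, 1024))
D /= np.linalg.norm(D, axis=1, keepdims=True)
coeffs, residual = matching_pursuit(rng.standard_normal(1024), D, n_atoms=16)
```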
-
A Music Vocoder Using Conditional Generative Adversarial Networks
The last couple of posts have been all about audio analysis and search, but in this one I'll return to some work that gets me a little closer to my ultimate goal: building synthesizers with high-level parameters, capable of producing audio ranging from convincingly real renderings of traditional acoustic instruments to novel synthetic textures and sounds.
-
Cochlea: A RESTful API for Annotating Audio
Cochlea is an early-stage, RESTful API that allows users to annotate audio files on the internet.
Segments or time intervals can be annotated with text tags or other arbitrary data. This may not sound very exciting on its own, but I believe that these simple primitives make possible incredibly diverse applications tailored to the needs of electronic musicians, sound designers and other folks interested in playing with sound.
-
Audio Query-By-Example via Unsupervised Embeddings
A couple of months ago, I gave a talk at the Austin Deep Learning Meetup about building Cochlea, a prototype audio similarity search engine. There was a lot to cover in an hour, some details were glossed over, and I’ve learned a few things since the talk, so I decided to write a blog post covering the process in a little more detail.
-
Perceptual Audio Loss
Today, I perform a small experiment to investigate whether a carefully designed loss function can help a very low-capacity neural network “spend” that capacity only on perceptually relevant features. If we can design audio codecs like Ogg Vorbis that allocate bits according to perceptual relevance, then we should be able to design a loss function that penalizes perceptually relevant errors, and doesn’t bother much with those that fall near or below the threshold of human awareness.
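One plausible shape for such a loss, sketched below, compares log-magnitude spectrograms rather than raw samples, so errors are weighted roughly by audibility; this is an illustrative assumption, not necessarily the loss the experiment uses:

```python
import torch
import torch.nn.functional as F

def log_spectrogram_loss(pred, target, n_fft=512, eps=1e-8):
    """MSE in the log-magnitude STFT domain: large, audible spectral
    errors dominate; tiny, inaudible ones barely register."""
    window = torch.hann_window(n_fft, device=pred.device)

    def log_mag(x):
        spec = torch.stft(x, n_fft, window=window, return_complex=True)
        return torch.log(spec.abs() + eps)

    return F.mse_loss(log_mag(pred), log_mag(target))
```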
-
PyTorch and Zounds
In a previous post, we used zounds to build a very simple synthesizer by learning K-Means centroids from individual frames of a spectrogram, and then using the sequence of one-hot encodings as parameters for our synthesizer.
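As a rough sketch of that idea using scikit-learn rather than zounds' actual API: cluster the spectrogram frames, encode each frame as the index of its nearest centroid, and "synthesize" by swapping each index back for its centroid:

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_codebook(frames, n_codes=512):
    # frames: (n_frames, n_bins) spectrogram magnitudes, assumed precomputed
    return KMeans(n_clusters=n_codes, n_init=10).fit(frames)

def encode(codebook, frames):
    # each frame collapses to a single code index (a compact one-hot)
    return codebook.predict(frames)

def decode(codebook, codes):
    # crude resynthesis: replace every code with its centroid frame
    return codebook.cluster_centers_[codes]
```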
-
Building a Simple Audio Encoder
Previously, we covered extracting some commonly used features from audio, and building a search. In this post, we’ll use a similar workflow to build a very rudimentary audio encoder and decoder. If you want to jump ahead, the complete code for this example is here.
-
Deploying a Search Interface with Docker Compose
I’m late to the party, but I’ve been learning about Docker recently, specifically because I need a quick and easy way to deploy interactive examples for this blog. In a previous post, we built an interactive, timbre-based similarity search that you could play with in your browser.
-
Building a Timbre-Based Similarity Search
For quite some time, I’ve been fascinated with the idea of indexing large audio corpora, based on some perceptual similarity metric. This means that it isn’t necessary for some human to listen to and tag each audio sample manually. Searches are by example (rather like the “search by image” feature that you’ve probably encountered), and it’s possible to explore “neighborhoods” of similar sounds.
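Once each sound is reduced to an embedding vector, search-by-example is just nearest-neighbor lookup. A minimal sketch, with the embeddings themselves assumed precomputed:

```python
import numpy as np

def build_index(embeddings):
    # L2-normalize so that a dot product equals cosine similarity
    return embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

def query(index, example, k=10):
    """Return the indices of the k sounds most similar to `example`."""
    q = example / np.linalg.norm(example)
    return np.argsort(index @ q)[::-1][:k]
```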
-
Getting Started with Zounds
Zounds is a Python library for building audio feature extraction pipelines in a declarative way, from reusable building blocks. It provides many of the primitives you'll need to start experimenting with audio, like short-time Fourier and discrete cosine transforms, chroma, and Bark-frequency cepstral coefficients.
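For a sense of the first of those primitives outside any pipeline, here's a short-time Fourier transform computed directly with scipy; this deliberately avoids zounds' own API, which wraps steps like this into declarative, reusable nodes:

```python
import numpy as np
from scipy.signal import stft

# one second of a 440 Hz tone at 44.1 kHz, standing in for real audio
sr = 44100
t = np.arange(sr) / sr
samples = np.sin(2 * np.pi * 440 * t)

# complex matrix of frequencies x frames; magnitudes give a spectrogram
freqs, times, spec = stft(samples, fs=sr, nperseg=2048, noverlap=1024)
magnitudes = np.abs(spec)
```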
subscribe via RSS