# Sound and Music Descriptors

## Energy of a signal

The energy of a signal can be computed either in the time domain, from \(x[n]\), or in the frequency domain, from \(X[k]\). In the time domain, the energy of a frame \(l\) of length \(N\) is the sum of the squared sample values:

\[ energy_l=\overset{N-1}{\underset{n=0}{\sum}}\left|x_l\left[n\right]\right|^{2} \]

By Parseval's theorem, the same quantity (up to a \(1/N\) scaling factor) can be computed from the magnitude spectrum of the frame:

\[ energy_l=\overset{N-1}{\underset{k=0}{\sum}}\left|X_l\left[k\right]\right|^{2} \]
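
A minimal numpy sketch of both computations (the frame contents and variable names here are illustrative assumptions, not from the source); the two results agree once the Parseval \(1/N\) factor is applied:

```python
import numpy as np

# A hypothetical frame of N samples (here: a decaying 440 Hz sinusoid).
N = 1024
n = np.arange(N)
x_l = np.exp(-n / N) * np.sin(2 * np.pi * 440 * n / 44100)

# Time-domain energy: sum of squared sample values.
energy_time = np.sum(x_l ** 2)

# Frequency-domain energy: sum of squared spectral magnitudes.
X_l = np.fft.fft(x_l)
energy_freq = np.sum(np.abs(X_l) ** 2)

# Parseval's theorem: the two agree up to a factor of 1/N.
print(energy_time, energy_freq / N)  # both print the same value
```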

## Stevens' power law

Stevens' power law provides an approximation of perceived loudness: raising the energy of a frame to the exponent 0.67 roughly models the compressive, non-linear response of human hearing to intensity.

\[ loudness_l= (\overset{N-1}{\underset{k=0}{\sum}}\left|X_l\left[k\right]\right|^{2})^{0.67} \]
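
As a sketch of the above, loudness is simply the spectral energy of the frame raised to the power 0.67 (the random test spectrum is an illustrative assumption):

```python
import numpy as np

# Hypothetical spectrum of one analysis frame (illustrative values).
X_l = np.fft.fft(np.random.default_rng(0).standard_normal(1024))

energy = np.sum(np.abs(X_l) ** 2)
loudness = energy ** 0.67  # Stevens' power-law compression
print(loudness)
```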

## Root mean square

The root mean square (RMS) is another measure of a frame's power. When computed from the spectrum, the \(1/N^2\) factor combines the averaging over the \(N\) samples with the \(1/N\) scaling given by Parseval's theorem:

\[ RMS_l=\sqrt{{1\over{N^2}}\overset{N-1}{\underset{k=0}{\sum}}\left|X_l\left[k\right]\right|^{2}} \]
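
A small sketch (with an assumed random test frame) confirming that the spectral formula matches the familiar time-domain RMS:

```python
import numpy as np

N = 1024
x_l = np.random.default_rng(0).standard_normal(N)  # hypothetical frame
X_l = np.fft.fft(x_l)

# RMS from the spectrum; 1/N**2 combines the averaging over N samples
# with the 1/N Parseval scaling of the unnormalized FFT.
rms_freq = np.sqrt(np.sum(np.abs(X_l) ** 2) / N ** 2)

# Equivalent time-domain computation for comparison.
rms_time = np.sqrt(np.mean(x_l ** 2))
print(rms_freq, rms_time)  # identical up to floating-point error
```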

## Euclidean distance

The Euclidean distance is the straight-line distance between two points in an n-dimensional space; the distance between points \(p\) and \(q\) is the length of the line segment connecting them. If \(p = (p_1, p_2, \dots, p_n)\) and \(q = (q_1, q_2, \dots, q_n)\) are two points in Euclidean n-space, then the distance \(d\) from \(p\) to \(q\) (or from \(q\) to \(p\)) is given by the Pythagorean formula:

\[ d(p,q) = \sqrt{\sum^n_{i=1} (q_i - p_i)^2} \]
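
For instance, a direct numpy translation of the formula (the points here are illustrative):

```python
import numpy as np

# Two hypothetical points in a 3-dimensional feature space.
p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])

d = np.sqrt(np.sum((q - p) ** 2))
# Equivalently: np.linalg.norm(q - p)
print(d)  # 5.0
```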

## K-means clustering (k-means)

K-means clustering is a method of vector quantization that is popular for cluster analysis in data mining. It aims to partition \(n\) observations into \(k\) clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. The problem is computationally difficult (NP-hard); however, efficient heuristic algorithms converge quickly to a local optimum.

Given a set of observations \((x_1, x_2, \dots, x_n)\), where each observation is a d-dimensional real vector, k-means clustering aims to partition the \(n\) observations into \(k\) (\(k \le n\)) sets \(S = \{S_1, S_2, \dots, S_k\}\) so as to minimize the within-cluster sum of squares (WCSS), i.e. the variance. Formally, the objective is to find:

\[ \underset{\mathbf{S}} {\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{\mathbf x \in S_i} \left\| \mathbf x - \boldsymbol\mu_i \right\|^2 = \underset{\mathbf{S}} {\operatorname{arg\,min}} \sum_{i=1}^k |S_i| \operatorname{Var} S_i \]

where \(\boldsymbol\mu_i\) is the mean of the points in \(S_i\).
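
A minimal sketch of Lloyd's algorithm, the standard heuristic for this objective (the function name, initialisation scheme, and toy data are our own assumptions; in practice a library implementation such as scikit-learn's `KMeans` would typically be used):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    # Initialise the k centroids with randomly chosen observations.
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster with the nearest mean.
        d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each mean becomes the centroid of its assigned points.
        new_mu = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                           else mu[i] for i in range(k)])
        if np.allclose(new_mu, mu):  # converged to a local optimum
            break
        mu = new_mu
    return labels, mu

# Toy data: two well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, mu = kmeans(X, k=2)
print(mu)  # centroids near (0, 0) and (5, 5)
```

Each iteration alternates an assignment step (reassign points to the nearest mean) and an update step (recompute each mean as the centroid of its points), so the WCSS objective never increases and the algorithm stops at a local optimum.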

## K-nearest neighbours classifier (k-NN)

K-nearest neighbours classification (k-NN) is a non-parametric method in which the input consists of the \(k\) closest training examples in the feature space. The output is a class membership. An object is classified by a plurality vote of its neighbours, with the object being assigned to the class most common among its \(k\) nearest neighbours (\(k\) is a positive integer, typically small). If \(k = 1\), the object is simply assigned to the class of its single nearest neighbour.
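
A minimal sketch of such a classifier (the function name and toy data are illustrative assumptions, not a reference implementation):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """Assign x to the most common class among its k nearest neighbours."""
    # Euclidean distance from x to every training example.
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]        # indices of the k closest points
    votes = Counter(y_train[nearest])  # plurality vote over their classes
    return votes.most_common(1)[0][0]

# Toy training set: two classes of 2-D feature vectors.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array(["a", "a", "b", "b"])
print(knn_classify(X_train, y_train, np.array([0.2, 0.1]), k=3))  # "a"
```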