Dynamic time warping

In time series analysis, dynamic time warping (DTW) is one of the algorithms for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data — indeed, any data that can be turned into a linear sequence can be analyzed with DTW. A well known application has been automatic speech recognition, to cope with different speaking speeds. Other applications include speaker recognition and online signature recognition. Also it is seen that it can be used in partial shape matching application.

In general, DTW is a method that calculates an optimal match between two given sequences (e.g. time series) with certain restriction and rules:

Every index from the first sequence must be matched with one or more indices from the other sequence, and vice versa
The first index from the first sequence must be matched with the first index from the other sequence (but it does not have to be its only match)
The last index from the first sequence must be matched with the last index from the other sequence (but it does not have to be its only match)
The mapping of the indices from the first sequence to indices from the other sequence must be monotonically increasing, and vice versa, i.e. if $j>i$ are indices from the first sequence, then there must not be two indices $l>k$ in the other sequence, such that index $i$ is matched with index $l$ and index $j$ is matched with index $k$ , and vice versa

The optimal match is denoted by the match that satisfies all the restrictions and the rules and that has the minimal cost, where the cost is computed as the sum of absolute differences, for each matched pair of indices, between their values.

The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in time series classification. Although DTW measures a distance-like quantity between two given sequences, it doesn't guarantee the triangle inequality to hold.

In addition to a similarity measure between the two sequences, a so called "warping path" is produced, by warping according to this path the two signals may be aligned in time. The signal with an original set of points X(original), Y(original) is transformed to X(warped), Y(warped). This finds applications in genetic sequence and audio synchronisation. In a related technique sequences of varying speed may be averaged using this technique see the average sequence section.

Implementation

This example illustrates the implementation of the dynamic time warping algorithm when the two sequences s and t are strings of discrete symbols. For two symbols x and y, d(x, y) is a distance between the symbols, e.g. d(x, y) = $| x - y |$ .

int DTWDistance(s: array [1..n], t: array [1..m]) {
   DTW := array [0..n, 0..m]

   for i := 1 to n
       DTW[i, 0] := infinity
   for i := 1 to m
       DTW[0, i] := infinity
   DTW[0, 0] := 0

   for i := 1 to n
       for j := 1 to m
           cost := d(s[i], t[j])
           DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                       DTW[i  , j-1],    // deletion
                                       DTW[i-1, j-1])    // match

   return DTW[n, m]
}

where DTW(i, j) is the distance between s[1:i] and t[1:j] with the best alignment.

We sometimes want to add a locality constraint. That is, we require that if s[i] is matched with t[j], then $| i - j |$ is no larger than w, a window parameter.

We can easily modify the above algorithm to add a locality constraint (differences marked in bold italic). However, the above given modification works only if $| n - m |$ is no larger than w, i.e. the end point is within the window length from diagonal. In order to make the algorithm work, the window parameter w must be adapted so that $| n - m | \le w$ (see the line marked with (*) in the code).

int DTWDistance(s: array [1..n], t: array [1..m], w: int) {
    DTW := array [0..n, 0..m]

    w := max(w, abs(n-m)) // adapt window size (*)

    for i := 0 to n
        for j:= 0 to m
            DTW[i, j] := infinity
    DTW[0, 0] := 0

    for i := 1 to n
        for j := max(1, i-w) to min(m, i+w)
            cost := d(s[i], t[j])
            DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                        DTW[i  , j-1],    // deletion
                                        DTW[i-1, j-1])    // match

    return DTW[n, m]
}

Complexity

The time complexity of DTW algorithm is $O(NM)$ , where $N$ and $M$ are the lengths of the two input sequences. More generally, without loss of generality, assuming that $N\geq M$ , the time complexity can be said to be $O(N^{2})$ . The same is true for space complexity.

Fast computation

Fast techniques for computing DTW include PrunedDTW,^[1] SparseDTW,^[2] FastDTW,^[3] and the MultiscaleDTW.^[4]^[5] A common task, retrieval of similar time series, can be accelerated by using lower bounds such as LB_Keogh^[6] or LB_Improved.^[7] In a survey, Wang et al. reported slightly better results with the LB_Improved lower bound than the LB_Keogh bound, and found that other techniques were inefficient.^[8]

Average sequence

Averaging for dynamic time warping is the problem of finding an average sequence for a set of sequences. The average sequence is the sequence that minimizes the sum of the squares to the set of objects. NLAAF^[9] is the exact method for two sequences. For more than two sequences, the problem is related to the one of the multiple alignment and requires heuristics. DBA^[10] is currently the reference method to average a set of sequences consistently with DTW. COMASA^[11] efficiently randomizes the search for the average sequence, using DBA as a local optimization process.

Supervised learning

A nearest-neighbour classifier can achieve state-of-the-art performance when using dynamic time warping as a distance measure.^[12]

Alternative approach

An alternative technique for DTW is based on functional data analysis, in which the time series are regarded as discretizations of smooth (differentiable) functions of time and therefore continuous mathematics is applied.^[13] Optimal nonlinear time warping functions are computed by minimizing a measure of distance of the set of functions to their warped average. Roughness penalty terms for the warping functions may be added, e.g., by constraining the size of their curvature. The resultant warping functions are smooth, which facilitates further processing. This approach has been successfully applied to analyze patterns and variability of speech movements.^[14]^[15]

Another related approach are hidden Markov models (HMM) and it has been shown that the Viterbi algorithm used to search for the most likely path through the HMM is equivalent to stochastic DTW.^[16]^[17]^[18]

Open-source software

The lbimproved C++ library implements Fast Nearest-Neighbor Retrieval algorithms under the GNU General Public License (GPL). It also provides a C++ implementation of dynamic time warping, as well as various lower bounds.
The FastDTW library is a Java implementation of DTW and a FastDTW implementation that provides optimal or near-optimal alignments with an O(N) time and memory complexity, in contrast to the O(N²) requirement for the standard DTW algorithm. FastDTW uses a multilevel approach that recursively projects a solution from a coarser resolution and refines the projected solution.
FastDTW fork (Java) published to Maven Central.
The R package dtw implements most known variants of the DTW algorithm family, including a variety of recursion rules (also called step patterns), constraints, and substring matching.
The mlpy Python library implements DTW.
The pydtw Python library implements the Manhattan and Euclidean flavoured DTW measures including the LB_Keogh lower bounds.
The cudadtw C++/CUDA library implements subsequence alignment of Euclidean-flavoured DTW and z-normalized Euclidean distance similar to the popular UCR-Suite on CUDA-enabled accelerators.
The JavaML machine learning library implements DTW.
The ndtw C# library implements DTW with various options.
Sketch-a-Char uses Greedy DTW (implemented in JavaScript) as part of LaTeX symbol classifier program.
The MatchBox implements DTW to match mel-frequency cepstral coefficients of audio signals.
Sequence averaging: a GPL Java implementation of DBA.^[10]
The Gesture Recognition Toolkit|GRT C++ real-time gesture-recognition toolkit implements DTW.
The PyHubs software package implements DTW and nearest-neighbour classifiers, as well as their extensions (hubness-aware classifiers).
The simpledtw Python library implements the classic O(NM) Dynamic Programming algorithm and bases on Numpy. It supports values of any dimension, as well as using custom norm functions for the distances. It is licensed under the MIT license.

Applications

Spoken-word recognition

Due to different speaking rates, a non-linear fluctuation occurs in speech pattern versus time axis, which needs to be eliminated.^[19] DP matching is a pattern-matching algorithm based on dynamic programming (DP), which uses a time-normalization effect, where the fluctuations in the time axis are modeled using a non-linear time-warping function. Considering any two speech patterns, we can get rid of their timing differences by warping the time axis of one so that the maximal coincidence is attained with the other. Moreover, if the warping function is allowed to take any possible value, very less distinction can be made between words belonging to different categories. So, to enhance the distinction between words belonging to different categories, restrictions were imposed on the warping function slope.

Correlation power analysis

Unstable clocks are used to defeat naive power analysis. Several techniques are used to counter this defense, one of which is dynamic time warp.

References

↑ Silva, D. F., Batista, G. E. A. P. A. (2015). Speeding Up All-Pairwise Dynamic Time Warping Matrix Calculation.
↑ Al-Naymat, G., Chawla, S., Taheri, J. (2012). SparseDTW: A Novel Approach to Speed up Dynamic Time Warping.
↑ Stan Salvador, Philip Chan, FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space. KDD Workshop on Mining Temporal and Sequential Data, pp. 70–80, 2004.
↑ Meinard Müller, Henning Mattes, and Frank Kurth (2006). An Efficient Multiscale Approach to Audio Synchronization. Proceedings of the International Conference on Music Information Retrieval (ISMIR), pp. 192—197.
↑ Thomas Prätzlich, Jonathan Driedger, and Meinard Müller (2016). Memory-Restricted Multiscale Dynamic Time Warping. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 569—573.
↑ Keogh, E.; Ratanamahatana, C. A. (2005). "Exact indexing of dynamic time warping". Knowledge and Information Systems. 7 (3): 358–386. doi:10.1007/s10115-004-0154-9.
↑ Lemire, D. (2009). "Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound". Pattern Recognition. 42 (9): 2169–2180. arXiv:0811.3301. doi:10.1016/j.patcog.2008.11.030.
↑ Wang, Xiaoyue; et al. "Experimental comparison of representation methods and distance measures for time series data". Data Mining and Knowledge Discovery. 2010: 1–35.
↑ Gupta, L.; Molfese, D. L.; Tammana, R.; Simos, P. G. (1996). "Nonlinear alignment and averaging for estimating the evoked potential". IEEE Transactions on Biomedical Engineering. 43 (4): 348–356. doi:10.1109/10.486255. PMID 8626184.
1 2 Petitjean, F. O.; Ketterlin, A.; Gançarski, P. (2011). "A global averaging method for dynamic time warping, with applications to clustering". Pattern Recognition. 44 (3): 678. doi:10.1016/j.patcog.2010.09.013.
↑ Petitjean, F. O.; Gançarski, P. (2012). "Summarizing a set of time series by averaging: From Steiner sequence to compact multiple alignment". Theoretical Computer Science. 414: 76. doi:10.1016/j.tcs.2011.09.029.
↑ Ding, Hui; Trajcevski, Goce; Scheuermann, Peter; Wang, Xiaoyue; Keogh, Eamonn (2008). "Querying and mining of time series data: experimental comparison of representations and distance measures". Proc. VLDB Endow. 1 (2): 1542–1552. doi:10.14778/1454159.1454226.
↑ Lucero, J. C.; Munhall, K. G.; Gracco, V. G.; Ramsay, J. O. (1997). "On the Registration of Time and the Patterning of Speech Movements". Journal of Speech, Language, and Hearing Research. 40: 1111–1117. doi:10.1044/jslhr.4005.1111.
↑ Howell, P.; Anderson, A.; Lucero, J. C. (2010). "Speech motor timing and fluency". In Maassen, B.; van Lieshout, P. Speech Motor Control: New Developments in Basic and Applied Research. Oxford University Press. pp. 215–225. ISBN 978-0199235797.
↑ Koenig, Laura L.; Lucero, Jorge C.; Perlman, Elizabeth (2008). "Speech production variability in fricatives of children and adults: Results of functional data analysis". The Journal of the Acoustical Society of America. 124 (5): 3158–3170. Bibcode:2008ASAJ..124.3158K. doi:10.1121/1.2981639. ISSN 0001-4966. PMC 2677351. PMID 19045800.
↑ Nakagawa, Seiichi; Nakanishi, Hirobumi (1988-01-01). "Speaker-Independent English Consonant and Japanese Word Recognition by a Stochastic Dynamic Time Warping Method". IETE Journal of Research. 34 (1): 87–95. doi:10.1080/03772063.1988.11436710. ISSN 0377-2063.
↑ Fang, Chunsheng. "From Dynamic Time Warping (DTW) to Hidden Markov Model (HMM)" (PDF).
↑ Juang, B. H. (September 1984). "On the hidden Markov model and dynamic time warping for speech recognition #x2014; A unified view". AT T Bell Laboratories Technical Journal. 63 (7): 1213–1243. doi:10.1002/j.1538-7305.1984.tb00034.x. ISSN 0748-612X.
↑ Sakoe, Hiroaki; Chiba, Seibi. "Dynamic programming algorithm optimization for spoken word recognition". IEEE Transactions on Acoustics, Speech and Signal Processing. 26 (1): 43–49. doi:10.1109/tassp.1978.1163055.