Similarity Measures And Indexing Methods For Time Series And Multiclass Recognition
This thesis investigates the problem of similarity search in multimedia databases. A key application domain of the proposed work is pattern classification, with emphasis on classification of gestures, handshapes, faces, and time series. A secondary application of the proposed work is efficient similarity search in large biological databases of protein sequences.

More specifically, the thesis makes contributions both by defining novel similarity measures that are used to identify the best database matches, and by proposing methods to improve efficiency. On the topic of similarity measures, the thesis contributes a method for measuring similarity in a database of videos from American Sign Language (ASL). This method produces promising results towards enabling useful educational applications for the ASL community. A second contribution of the thesis is on the theoretical problem of how to define a useful metric distance measure for time series data. The thesis proposes a novel metric, called MSM (abbreviation for Move-Split-Merge), which has both attractive theoretical properties and competitive classification accuracy on actual data.

With respect to the problem of improving the efficiency of similarity search, the thesis contributes a novel method for recognition of a large number of classes. While many researchers have worked on the topic of how to train good classifiers for this task, the thesis proposes a new perspective by explicitly addressing efficiency. In particular, the thesis shows that, under some conditions, multiclass recognition becomes theoretically equivalent to similarity search, and in that case we can use off-the-shelf similarity indexing methods to significantly speed up multiclass recognition. The thesis also proposes a dimensionality reduction method specifically designed for speeding up similarity search in large string databases. While dimensionality reduction methods are commonly used in vector spaces, our method allows similar techniques to be used for spaces of strings under the edit distance measure.

Thorough experimental evaluation on a variety of datasets demonstrates state-of-the-art performance for the methods that constitute the contributions of the thesis.
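
For readers unfamiliar with the edit distance mentioned above, the following is a minimal sketch of the standard Levenshtein distance with unit costs, together with a brute-force nearest-neighbor search over a small string database. It is not the thesis's dimensionality reduction method, which the abstract does not describe. It only illustrates the baseline computation that such a method would aim to speed up. The function names `edit_distance` and `nearest_neighbor` and the example strings are hypothetical, chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (an assumption of this sketch):
    the minimum number of single-character insertions, deletions, and
    substitutions needed to turn a into b."""
    # prev[j] is the distance between the prefix of a seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def nearest_neighbor(query: str, database: list[str]) -> str:
    """Brute-force search: compare the query against every database string."""
    return min(database, key=lambda s: edit_distance(query, s))


if __name__ == "__main__":
    db = ["TTTT", "ACGA", "GGGGGG"]  # hypothetical toy database
    print(edit_distance("kitten", "sitting"))
    print(nearest_neighbor("ACGT", db))
```

Each distance computation here takes time proportional to the product of the two string lengths, and the brute-force search repeats it for every database entry, which is why indexing and dimensionality reduction matter for large string databases.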