A Dissimilarity Measure for Clustering High- and Infinite Dimensional Data that Satisfies the Triangle Inequality
- Author:
- Bushnell, Dennis M.
- Published:
- December 2002.
- Physical Description:
- 1 electronic document
- Additional Creators:
- Socolovsky, Eduardo A.
Online Version
- hdl.handle.net
- Restrictions on Access:
- Unclassified, Unlimited, Publicly available.
Free-to-read Unrestricted online access
- Summary:
- The cosine or correlation measures of similarity used to cluster high-dimensional data are interpreted as projections, and the orthogonal components are used to define a complementary dissimilarity measure, forming a similarity-dissimilarity measure pair. Using a geometrical approach, a number of properties of this pair are established, and the approach is extended to general inner-product spaces of any dimension. The properties include the triangle inequality for the defined dissimilarity measure, error estimates for the triangle inequality, and bounds on both measures that can be obtained with a few floating-point operations from previously computed values of the measures. The bounds and error estimates for the similarity and dissimilarity measures can be used to reduce the computational complexity of clustering algorithms and enhance their scalability, and the triangle inequality allows the design of clustering algorithms for high-dimensional distributed data.
- Collection:
- NASA Technical Reports Server (NTRS) Collection.
- Note:
- Document ID: 20030004236.
ICASE-IR-43.
NASA/CR-2002-212136.
NAS 1.26:212136.
- Terms of Use and Reproduction:
- No Copyright.
catkey: 15637443
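
The similarity-dissimilarity pair described in the Summary can be illustrated with a minimal sketch. This record does not give the report's formulas, so the definitions here are assumptions: the similarity is taken to be the cosine of the angle between two vectors (the length of the projection of one unit vector onto the other), and the dissimilarity is taken to be the length of the orthogonal component, sqrt(1 - cos^2). The function names are hypothetical, and the random check only probes the triangle inequality numerically; it is not the report's proof.

```python
import math
import random


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def orthogonal_dissimilarity(x, y):
    """Length of the component of unit(x) orthogonal to y.

    Assumed form: sqrt(1 - cos^2). The max() guards against tiny
    negative values caused by floating-point rounding.
    """
    c = cosine_similarity(x, y)
    return math.sqrt(max(0.0, 1.0 - c * c))


def check_triangle_inequality(num_trials=1000, dim=50, seed=0, tol=1e-12):
    """Test d(x, z) <= d(x, y) + d(y, z) on random Gaussian vectors."""
    rng = random.Random(seed)
    for _ in range(num_trials):
        x, y, z = ([rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3))
        d_xz = orthogonal_dissimilarity(x, z)
        d_xy = orthogonal_dissimilarity(x, y)
        d_yz = orthogonal_dissimilarity(y, z)
        if d_xz > d_xy + d_yz + tol:
            return False
    return True
```

Under these assumptions the two measures satisfy similarity^2 + dissimilarity^2 = 1, so either one can be recovered from the other with a few floating-point operations.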