Horizons for Information Societies
Seminar #5

The 5th seminar on Horizons for Information Societies was about data processing. Dr Houle () presented his research on search and clustering at the . This seminar was recommended for any and all users of search, clustering, or classification techniques in their work. No special background in these areas was assumed.

See also: previous seminar (#4), next seminar (#6).



Slides of the presentation:

Date: 26 April 2007 (15:00-17:30)
Location: , Tokyo, Japan
Language: English
Registration fees: None
Organization: Dr ()


A Combinatorial Approach to Search and Clustering
by Dr

Abstract: This seminar introduces a new model for clustering based on combinatorial rather than distance information, that requires no direct knowledge of the nature or representation of the data. In lieu of such knowledge, the Relevant-Set Correlation (RSC) clustering model relies solely on the existence of an oracle that accepts a query in the form of a data item, and returns a ranked set of items relevant to the query. In principle, the role of the oracle could be played by any similarity search structure, or even a commercial search engine whose ranking function and relevancy scores are kept secret. The quality of cluster candidates, the degree of association between pairs of cluster candidates, and the degree of association between clusters and data items are all assessed according to the statistical significance of a form of correlation among pairs of relevant sets and/or candidate cluster sets. A clustering heuristic, GreedyRSC, has also been developed based on RSC. We show that GreedyRSC has many desirable features - in particular, it is able to determine an appropriate number of clusters naturally and automatically, without requiring the user to supply data-dependent parameter values.

Topics to be covered include:

Speaker: Dr Houle joined the as a Visiting Professor in 2004. In addition to his current work on search and clustering for data mining applications, his research interests include computational and combinatorial geometry, data structures, and graph algorithms.