SCHEDULE (This schedule is tentative and is subject to change)
=======

Session I: Clustering in Bioinformatics

 9:00- 9:45  Plenary Speaker: Prof. Edward Marcotte
 9:45-10:15  Block Clustering on continuous data,
             G\'erard Govaert (Universit\'e de Technologie de Compi\`egne) and {\bf Mohamed Nadif} (Universit\'e de Metz Ile du Saulcy)
10:15-10:45  Double Conjugated Clustering Applied to Leukemia Microarray Data,
             {\bf Stanislav Busygin}, Gerrit Jacobsen, Ewald Kraemer (Contentsoft AG, Munchen)
10:45-11:00  Break

Session II: Information-Theoretic Clustering

11:00-11:30  Entropy based clustering for high dimensional genomic data sets,
             {\bf Donglin Liu} and Gautam Singh (Oakland University)
11:30-12:00  An Information-Theoretical Approach to Clustering Categorical Databases using Genetic Algorithms,
             Dana Cristofor and {\bf Dan A. Simovici} (University of Massachusetts at Boston)
12:00-12:30  Cluster Initialization and Clusterability Detection,
             Scott Epter, Mukkai Krishnamoorthy, and {\bf Mohammed Zaki} (RPI)
12:30- 1:45  Lunch

Session III: Clustering Large and High-Dimensional Data

 1:45- 2:15  Using Low-Memory Approximations to Cluster Very Large Data Sets,
             {\bf David Littau} and Daniel Boley (University of Minnesota)
 2:15- 2:45  Refining clusters in high dimensional text data,
             {\bf Inderjit S. Dhillon}, Yuqiang Guan (University of Texas), and J. Kogan (University of Maryland)
 2:45- 3:15  Comparison of Agglomerative and Partitional Document Clustering Algorithms,
             {\bf Ying Zhao} and George Karypis (University of Minnesota)
 3:15- 3:30  Break

Session IV: Nearest Neighbor and Geometric Techniques

 3:30- 4:00  Making the Nearest Neighbor Meaningful,
             {\bf Daniel Tunkelang} (Endeca)
 4:00- 4:30  A New Shared Nearest Neighbor Clustering Algorithm and its Applications,
             Michael Steinbach, Vipin Kumar and {\bf Levent Ertoz} (University of Minnesota)
 4:30- 5:00  How to partition a low-dimensional data set into subsets of different geometric structures,
             {\bf Gilad Lerman} (Courant Institute)

Preview
=======

Clustering, or partitioning of datasets into subsets (also called clusters) so that the members of a cluster are more similar to each other than to members of other clusters, is a long-standing problem with a rich history of documented research. Explosive growth in the use of the Internet and recent advances in information technology bring new challenges to this exciting research area. In many important applications the data resides in high-dimensional vector spaces. While processing high-dimensional data poses computational challenges, in many cases the high-dimensional vectors are sparse, and this sparsity allows efficient clustering.

In this one-day workshop, papers will be presented by experts from academia and industry. Clustering issues that will be covered include bioinformatics applications, nearest neighbor techniques, high-dimensional clustering, and clustering in low-dimensional spaces. The workshop, held in conjunction with the Second SIAM Conference on Data Mining, brings together applied mathematicians, computer scientists, and computational statisticians working toward the design of next-generation clustering algorithms and software.

Acknowledgements
================

Special thanks to the members of the Program Committee for their diligent efforts in reviewing all the manuscripts submitted.