2018 2015 2013 2012
8 February 2018

As an optimization problem, clustering exhibits a striking phenomenon: It is generally regarded as easy in practice, while theory classifies it among the computationally intractable problems. To address this dichotomy, research has identified a number of conditions a data set must satisfy for a clustering to be (1) easily computable and (2) meaningful. In this talk we show that all previously proposed notions of struturedness of a data set are fundamentally local properties, i.e. the global optimum is in well defined sense close to a local optimum. As a corollary, this implies that the Local Search heuristic has strong performance guarantees for both the tasks of recovering the underlying optimal clustering and obtaining a clustering of small cost.
Room 222 - Galleria 1 @12:00

22 February 2018

The study of online algorithms and competitive analysis provide a solid foundation for studying the quality of irrevocable decision making when the data arrives in an online manner. While in some scenarios the decisions are indeed irrevocable, there are many practical situations when changing a previous decision is not impossible, but simply expensive. In this work we formalize this notion and introduce the consistent k-clustering problem. With points arriving online, the goal is to maintain a constant approximate solution, while minimizing the number of reclusterings necessary. We prove a lower bound, showing that Ω(k log n) changes are necessary in the worst case, for a wide range of objective functions. On the positive side, we give an algorithm that needs only O(k^2 log^4 n) changes to maintain a constant competitive solution. This is an exponential improvement on the naive solution of reclustering at every time step. Finally, we show experimentally that our approach performs much better than the theoretical bound, with the number of changes growing approximately as O(log n). Joint work with Sergei Vassilvitskii.
Manno, Galleria 1, 2nd floor, room G1-204 @12:00