Thread in #data-mining - Projects and more

Nityesh Agarwal December 16, 2019 at 11:38 AM

ct2

Nityesh Agarwal December 17, 2019 at 04:18 AM

clustering methods overview

Nityesh Agarwal December 17, 2019 at 04:18 AM

grid based vs. density based clustering

Nityesh Agarwal December 17, 2019 at 04:21 AM

Problems with clustering high-dimensional data:

Input parameters (such as the desired number of clusters) are often hard to determine, especially for high-dimensional data sets and when users do not yet understand their data well. A data set can contain numerous dimensions or attributes. Finding clusters of data objects in a high-dimensional space is challenging, especially because such data can be very sparse and highly skewed.
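
A rough way to see the sparsity problem: split each dimension into a few bins and count how many grid cells contain at least one point. The sketch below is only an illustration, assuming uniformly random data in the unit hypercube; `occupied_fraction`, `bins_per_dim`, and the sample sizes are made-up names and values, not from any library.

```python
import random


def occupied_fraction(n_points, n_dims, bins_per_dim=2, seed=0):
    """Fraction of grid cells that contain at least one random point.

    Assumption: points are drawn uniformly from [0, 1)^n_dims, and each
    dimension is split into `bins_per_dim` equal-width bins.
    """
    rng = random.Random(seed)
    cells = set()
    for _ in range(n_points):
        # Map each coordinate to the index of the bin it falls into.
        cell = tuple(int(rng.random() * bins_per_dim) for _ in range(n_dims))
        cells.add(cell)
    return len(cells) / bins_per_dim ** n_dims


if __name__ == "__main__":
    for d in (2, 5, 10, 20):
        print(d, occupied_fraction(1000, d))
```

With a fixed number of points, the number of cells grows as `bins_per_dim ** n_dims`, so the occupied fraction can be at most `n_points / bins_per_dim ** n_dims` and shrinks quickly as dimensions are added.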

Nityesh Agarwal December 17, 2019 at 04:32 AM

CLARA and CLARANS algorithms

Nityesh Agarwal December 17, 2019 at 04:36 AM

Overview of CLARANS:
CLARANS presents a trade-off between the cost and the effectiveness of using samples to obtain a clustering.

First, it randomly selects k objects in the data set as the current medoids. It then randomly selects a current medoid x and an object y that is not one of the current medoids.
Then it checks for the following condition:
> Can replacing x by y improve the absolute-error criterion?
If yes, the replacement is made. CLARANS conducts such a randomized search l times. The set of current medoids after the l steps is considered a local optimum.
CLARANS repeats this randomized process m times and returns the best local optimum as the final result.
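
The steps above can be sketched in Python roughly as follows. This is a minimal sketch of the procedure as described in these notes, not a reference implementation: the function names, the `dist` argument, and the `seed` parameter are assumptions for illustration, and the absolute-error criterion is taken to be the sum of distances from each object to its nearest medoid. It also assumes k is smaller than the number of objects.

```python
import random


def absolute_error(points, medoids, dist):
    """Sum of distances from every point to its nearest medoid.

    `medoids` is a list of indices into `points`.
    """
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def clarans(points, k, l, m, dist, seed=None):
    """Randomized medoid search: m restarts of l random swap attempts each."""
    rng = random.Random(seed)
    n = len(points)
    best_medoids, best_cost = None, float("inf")

    for _ in range(m):
        # Randomly select k objects as the current medoids.
        current = rng.sample(range(n), k)
        current_cost = absolute_error(points, current, dist)

        for _ in range(l):
            # Pick a current medoid x (by position) and a non-medoid y.
            i = rng.randrange(k)
            y = rng.choice([j for j in range(n) if j not in current])
            candidate = current[:i] + [y] + current[i + 1:]
            candidate_cost = absolute_error(points, candidate, dist)

            # Replace x by y only if it improves the absolute-error criterion.
            if candidate_cost < current_cost:
                current, current_cost = candidate, candidate_cost

        # After l steps, treat the current medoids as a local optimum
        # and keep the best one seen across all restarts.
        if current_cost < best_cost:
            best_medoids, best_cost = current, current_cost

    return [points[i] for i in best_medoids], best_cost
```

For example, `clarans([(0,), (1,), (2,), (100,), (101,), (102,)], k=2, l=200, m=5, dist=lambda a, b: abs(a[0] - b[0]), seed=0)` searches for two medoids in a small one-dimensional data set. Larger l and m cost more distance computations but make a poor local optimum less likely, which is the cost/effectiveness trade-off mentioned above.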