Title: | Sparse Functional Clustering |
---|---|
Description: | Provides a general framework for performing sparse functional clustering as originally described in Floriello and Vitelli (2017) <doi:10.1016/j.jmva.2016.10.008>, with the possibility of jointly handling data misalignment (see Vitelli, 2019, <doi:10.48550/arXiv.1912.00687>). |
Authors: | Valeria Vitelli [aut], Waldir Leoncio [cre] |
Maintainer: | Waldir Leoncio <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.0 |
Built: | 2024-10-31 16:33:28 UTC |
Source: | https://github.com/cran/SparseFunClust |
Given two partitions P and Q, cer(P, Q) measures how well they agree; the lower the value, the better the agreement. It is defined as the proportion of pairwise disagreements between the two partitions, i.e., the fraction of all possible pairs of elements in the sample that are placed in the same cluster in one partition and in different clusters in the other.
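The pairwise-disagreement definition can be illustrated with a few lines of base R. This is only a sketch: `cer_sketch` is a hypothetical helper written for illustration, not the package's own `cer()` implementation, and it builds full n x n comparison matrices, so it is meant for small examples.

```r
# Hypothetical base-R helper (not the package's cer()); it follows the
# pairwise-disagreement definition of the classification error rate.
cer_sketch <- function(P, Q) {
  stopifnot(length(P) == length(Q), length(P) >= 2)
  same_P <- outer(P, P, "==")   # TRUE where two elements share a cluster in P
  same_Q <- outer(Q, Q, "==")   # TRUE where two elements share a cluster in Q
  pairs <- upper.tri(same_P)    # count each unordered pair exactly once
  mean(same_P[pairs] != same_Q[pairs])
}

cer_sketch(c(1, 1, 2, 2), c(2, 2, 1, 1))  # same partition up to relabelling
cer_sketch(c(1, 1, 2, 2), c(1, 2, 1, 2))  # 4 of the 6 pairs disagree
```

Because only co-membership of pairs is compared, the result does not depend on how the clusters are labelled.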
cer(P, Q)
P | first vector of cluster assignments (length n) |
Q | second vector of cluster assignments (length n) |
The CER index, a number between 0 and 1. It equals 1 - Rand index (Rand, 1971), where the Rand index is a popular measure of the goodness of a clustering.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336), 846-850.
set.seed(8988327)
x <- seq(0, 1, len = 500)
out <- generate.data.FV17(50, x)
result <- SparseFunClust(out$data, x, K = 2, do.alignment = FALSE)
cer(out$true.partition, result$labels)
Generates a set of simulated functional data in 2 clusters that reproduces the examples in Simulations 2A and 2B of Floriello & Vitelli (2017).
generate.data.FV17(n, x, paramC = 0.5, plots = FALSE)
n | number of curves |
x | curves' domain |
paramC | proportion of cluster overlap (default 0.5, as in Simulation 2A) |
plots | boolean; should plots be drawn? (default FALSE) |
A list including:
$data: matrix (n x length(x)) with the simulated data
$true.partition: vector (length = n) with the true cluster assignments
generate.data.FV17(5, seq(0, 1, len = 3))
Compute Sparse Functional Clustering & Alignment
SparseFunClust(
  data, x, K, do.alignment,
  funct.measure = "L2",
  clust.method = "kmea",
  m.prop = 0.3,
  tuning.m = FALSE,
  tuning.par = list(mbound = NULL, nperm = 20),
  perc = 0.03,
  tol = 0.01,
  template.est = "raw",
  n.out = 500,
  iter.max = 50,
  vignette = TRUE
)
data | matrix representing the functions (n x p) |
x | matrix giving the domain of each function (n x p), or a p-dimensional vector giving the common domain |
K | number of clusters |
do.alignment | boolean (should alignment be performed?) |
funct.measure | the functional measure used to compare the functions in both the clustering and alignment procedures; can be 'L2' or 'H1' (default 'L2'); see Vitelli (2019) for details |
clust.method | the clustering method to be used; can be 'kmea' for k-means clustering, 'pam' for partitioning around medoids, or 'hier' for hierarchical clustering (default 'kmea') |
m.prop | the sparsity parameter (proportion of irrelevant domain, where w(x) = 0); default 30% |
tuning.m | boolean (should the sparsity parameter be tuned via a permutation-based approach?) |
tuning.par | list of settings for the tuning of the sparsity parameter (defaults to list(mbound = NULL, nperm = 20)) |
perc | alignment parameter (max proportion of shift / dilation at each iteration of the warping procedure); default 3% |
tol | tolerance criterion on the weighting function to exit the loop (default 1%) |
template.est | text string giving the template estimation method ('raw' or 'loess'; default 'raw') |
n.out | number of abscissa points on which w(x) is estimated (default 500) |
iter.max | maximum number of iterations of the clustering loop (default 50) |
vignette | boolean (should the algorithm progress be reported?) |
A list, with elements:
matrix (dim=K x n.out) with the final cluster templates
vector (length=n.out) of the abscissa values on which the template is defined
vector (length=n) of the cluster assignments (accessed as result$labels in the examples)
matrix (dim=n x 2) with the intercept (1st column) and slope (2nd column) of the estimated warping function for each of the n curves
matrix (dim=n x n.out) with the registered abscissa of each of the n curves
vector (length=n) of each curve's final distance to the assigned cluster template
vector (length=n.out) of the estimated weighting function w(x)
vector (length=n.out) of the final point-wise between-cluster sum-of-squares
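To make the link between the sparsity parameter, the weighting function w(x) and the point-wise between-cluster sum-of-squares more concrete, here is a rough base-R sketch. It is not the package's implementation: `sparse_weights_sketch` is a hypothetical helper, and both the thresholding rule (zero out the `m.prop` fraction of grid points with the smallest between-cluster sum-of-squares) and the unit-norm scaling are simplifying assumptions made for illustration.

```r
# Hypothetical helper, NOT part of SparseFunClust: a simplified illustration
# of a sparse weighting function on a common grid of p points.
sparse_weights_sketch <- function(data, labels, m.prop = 0.3) {
  overall <- colMeans(data)
  bcss <- numeric(ncol(data))
  for (k in unique(labels)) {
    members <- data[labels == k, , drop = FALSE]
    # point-wise between-cluster sum-of-squares contribution of cluster k
    bcss <- bcss + nrow(members) * (colMeans(members) - overall)^2
  }
  n_zero <- floor(m.prop * ncol(data))   # number of grid points to switch off
  w <- bcss
  if (n_zero > 0) w[order(bcss)[seq_len(n_zero)]] <- 0
  # assumed scaling: unit Euclidean norm (undefined if every weight is zero)
  list(w = w / sqrt(sum(w^2)), bcss = bcss)
}

set.seed(1)
labels <- rep(1:2, each = 3)
data <- matrix(rnorm(6 * 10, sd = 0.1), nrow = 6)
# the two clusters differ only on the last 5 grid points
data[labels == 2, 6:10] <- data[labels == 2, 6:10] + 5
res <- sparse_weights_sketch(data, labels, m.prop = 0.3)
which(res$w == 0)  # zeroed grid points; they fall among points 1-5 here
```

In this toy example the weights vanish on part of the domain where the clusters coincide, which is the behaviour the sparsity parameter is meant to encourage.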
data:
  assumed to be a vectorized version of the functional data AFTER smoothing
  when using the H1 functional measure, assumed to include the functions' FIRST DERIVATIVES
  the H1 functional measure also supports multidimensional functions R -> R^d, in which case data can be an array (n x p x d)
funct.measure: 'H1' only supported with alignment
clust.method: 'pam' and 'hier' only supported for the case of NO ALIGNMENT
m.prop: needs to be a proportion for compatibility with alignment; values > 1 not supported
tuning.m: tuning only supported for the case of NO ALIGNMENT
tuning.par:
  mbound must be lower than 1; the minimal value tested is 0
  nperm > 50 is inadvisable for computational reasons
perc: 5% is already extreme; don't set this above 8-10%
template.est:
  only supported with H1 measure + ALIGNMENT
  currently 2 choices are supported: 'raw' or 'loess'. 'raw' just computes the vector means across functions (default choice); 'loess' estimates the template via the R loess function
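The compatibility notes above can be turned into a small pre-flight check before calling SparseFunClust. The sketch below is an illustration only: `check_sfc_args_sketch` is a hypothetical helper, not a function of the package, and the rejection of negative `m.prop` values and the 10% cut-off for `perc` are assumptions read off the notes rather than documented behaviour.

```r
# Hypothetical helper, NOT part of SparseFunClust: collects messages for
# argument combinations that the notes above describe as unsupported.
check_sfc_args_sketch <- function(do.alignment, funct.measure = "L2",
                                  clust.method = "kmea", m.prop = 0.3,
                                  tuning.m = FALSE, perc = 0.03) {
  problems <- character(0)
  if (funct.measure == "H1" && !do.alignment)
    problems <- c(problems, "'H1' is only supported with alignment")
  if (clust.method %in% c("pam", "hier") && do.alignment)
    problems <- c(problems, "'pam' and 'hier' are only supported without alignment")
  if (m.prop < 0 || m.prop > 1)   # lower bound of 0 is an assumption
    problems <- c(problems, "m.prop must be a proportion between 0 and 1")
  if (tuning.m && do.alignment)
    problems <- c(problems, "tuning.m is only supported without alignment")
  if (perc > 0.10)                # cut-off taken from the '8-10%' advice
    problems <- c(problems, "perc above 10% is discouraged")
  problems
}

check_sfc_args_sketch(do.alignment = FALSE)                        # no problems
check_sfc_args_sketch(do.alignment = FALSE, funct.measure = "H1")  # one problem
```

Returning the messages as a character vector, rather than stopping at the first one, lets a caller report every incompatible setting at once.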
set.seed(8988327)
x <- seq(0, 1, len = 500)
out <- generate.data.FV17(50, x)
result <- SparseFunClust(out$data, x, K = 2, do.alignment = FALSE)
str(result)