
Julia package to perform Bayesian clustering of high-dimensional Euclidean data using pairwise dissimilarity information.


abhinavnatarajan/RedClust.jl


RedClust


Documentation

Documentation is available for both the development and stable versions, and the accompanying paper is available on arXiv. Please refer to the detailed documentation for full usage instructions.

Introduction

RedClust is a Julia package for Bayesian clustering of high-dimensional Euclidean data using pairwise dissimilarities instead of the raw observations. It uses an MCMC sampler to generate posterior samples from the space of all possible clustering structures on the data.
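Because the model works with pairwise dissimilarities rather than the raw observations, it may help to see what that input looks like. The sketch below computes a matrix of pairwise Euclidean distances in plain Julia, assuming the observations are stored as a vector of equal-length vectors. The function name pairwise_distances is hypothetical and is not part of RedClust's API.

```julia
using LinearAlgebra

# Hypothetical helper (not part of RedClust): pairwise Euclidean distances.
# `points` is assumed to be a vector of equal-length vectors, one per observation.
function pairwise_distances(points::AbstractVector{<:AbstractVector{<:Real}})
    n = length(points)
    D = zeros(Float64, n, n)
    # Fill the upper triangle and mirror it: the matrix is symmetric with zero diagonal.
    for j in 1:n, i in 1:j-1
        d = norm(points[i] - points[j])
        D[i, j] = d
        D[j, i] = d
    end
    return D
end

# Example: three points in two dimensions
pts = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
D = pairwise_distances(pts)
```

Only the upper triangle is computed explicitly, since Euclidean distance is symmetric and every point is at distance zero from itself.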

Installation

The package can be installed by typing ]add RedClust at the Julia REPL, or equivalently with:

using Pkg
Pkg.add("RedClust")

Basic example

using RedClust
# Generate data
points, distM, clusts, probs, oracle_coclustering = 
	generatemixture(100, 10; α = 10, σ = 0.25, dim = 10)
# Let RedClust choose the best prior hyperparameters
params = fitprior(points, "k-means", false)
# Set the MCMC options
options = MCMCOptionsList(numiters = 5000)
data = MCMCData(points)
# Run the sampler
result = runsampler(data, options, params)
# Get a point estimate 
pointestimate, index = getpointestimate(result)
# Summary of point estimate
summarise(pointestimate, clusts)
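The sampler produces posterior samples of clusterings. One common way to summarise such samples is the posterior co-clustering matrix, whose (i, j) entry is the fraction of samples in which observations i and j are assigned to the same cluster. The sketch below computes it in plain Julia, assuming the samples are available as a vector of label vectors (one per MCMC iteration). The helper coclustering_matrix is hypothetical, and this is not a description of how getpointestimate works internally.

```julia
# Hypothetical helper (not part of RedClust): posterior co-clustering matrix.
# `samples` is assumed to be a vector of cluster-label vectors, one per MCMC iteration.
function coclustering_matrix(samples::AbstractVector{<:AbstractVector{<:Integer}})
    n = length(first(samples))
    C = zeros(Float64, n, n)
    for labels in samples
        # Count the samples in which observations i and j share a cluster.
        for j in 1:n, i in 1:n
            C[i, j] += labels[i] == labels[j]
        end
    end
    # Convert counts to proportions.
    return C ./ length(samples)
end

# Example: four posterior samples of clusterings of three observations
samples = [[1, 1, 2], [1, 2, 2], [3, 3, 3], [1, 1, 2]]
C = coclustering_matrix(samples)
```

Because the comparison depends only on whether labels match, the result is invariant to relabelling of clusters across samples.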

A more elaborate example can be found in the detailed documentation. Examples from the paper and its supplementary material can be found in the 'examples' branch of this repository.

Citing this package

If you use this package in your work, please cite:

Natarajan, A., De Iorio, M., Heinecke, A., Mayer, E. and Glenn, S. (2023). ‘Cohesion and Repulsion in Bayesian Distance Clustering’, Journal of the American Statistical Association, 119(546), pp. 1374–1384. DOI: 10.1080/01621459.2023.2191821.

For BibTeX users:

@article{NDI23,
  doi = {10.1080/01621459.2023.2191821},
  author = {Natarajan, Abhinav and De Iorio, Maria and Heinecke, Andreas and Mayer, Emanuel and Glenn, Simon},
  title = {Cohesion and Repulsion in Bayesian Distance Clustering},
  journal = {Journal of the American Statistical Association},
  volume = {119},
  number = {546},
  pages = {1374--1384},
  year = {2023}
}
