Mathematics > Statistics Theory
[Submitted on 5 Feb 2010 (v1), last revised 8 Mar 2014 (this version, v3)]
Title:Clustering and variable selection for categorical multivariate data
View PDFAbstract:This article investigates unsupervised classification techniques for categorical multivariate data. The study employs multivariate multinomial mixture modeling, which is a type of model particularly applicable to multilocus genotypic data. A model selection procedure is used to simultaneously select the number of components and the relevant variables. A non-asymptotic oracle inequality is obtained, leading to the proposal of a new penalized maximum likelihood criterion. The selected model proves to be asymptotically consistent under weak assumptions on the true probability underlying the observations. The main theoretical result obtained in this study suggests a penalty function defined to within a multiplicative parameter. In practice, the data-driven calibration of the penalty function is made possible by slope heuristics. Based on simulated data, this procedure is found to improve the performance of the selection procedure with respect to classical criteria such as BIC and AIC. The new criterion provides an answer to the question "Which criterion for which sample size?" Examples of real dataset applications are also provided.
Submission history
From: Dominique Bontemps [view email] [via CCSD proxy][v1] Fri, 5 Feb 2010 07:40:26 UTC (33 KB)
[v2] Tue, 19 Oct 2010 19:11:17 UTC (36 KB)
[v3] Sat, 8 Mar 2014 18:14:10 UTC (468 KB)
Current browse context:
math.ST
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.