Detecting Novel Associations in Large Data Sets
Added 2011/12/19 15:07:57 | 3549 views | 2 recommendations | 4 comments
Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
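To make the idea behind MIC more concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the abstract gives no implementation details, so everything below is an assumption made for illustration. The sketch bins both variables into equal-frequency grids, computes the mutual information of each grid, divides it by the log of the smaller grid dimension, and keeps the largest value. The function names (`equal_frequency_bins`, `mutual_information`, `approx_mic`) are hypothetical, and the grid-size budget of n^0.6 is assumed here as the commonly cited default. The published method also searches over grid placements rather than using fixed equal-frequency bins, so this sketch will generally underestimate the true MIC.

```python
import math
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Label each value with a bin index 0..n_bins-1 based on its rank.

    Ties are split arbitrarily by sort order (a simplification).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_bins // len(values)
    return labels


def mutual_information(xb, yb):
    """Plug-in mutual information (in nats) of two discrete label sequences."""
    n = len(xb)
    joint = Counter(zip(xb, yb))
    px = Counter(xb)
    py = Counter(yb)
    mi = 0.0
    for (i, j), count in joint.items():
        mi += (count / n) * math.log(count * n / (px[i] * py[j]))
    return mi


def approx_mic(x, y, exponent=0.6):
    """Largest normalized mutual information over small equal-frequency grids.

    Assumption: grids are limited to nx * ny <= n ** exponent cells.
    Returns 0.0 when the sample is too small for even a 2-by-2 grid.
    """
    n = len(x)
    budget = n ** exponent
    best = 0.0
    nx = 2
    while nx * 2 <= budget:
        xb = equal_frequency_bins(x, nx)
        ny = 2
        while nx * ny <= budget:
            yb = equal_frequency_bins(y, ny)
            score = mutual_information(xb, yb) / math.log(min(nx, ny))
            best = max(best, score)
            ny += 1
        nx += 1
    return best
```

Under these assumptions, a noiseless monotone relationship such as `approx_mic(xs, xs)` scores close to 1, while independent random inputs score close to 0, which is consistent with the abstract's claim that MIC behaves like R² for functional relationships.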