Journal of Shandong University (Engineering Science), 2019, Vol. 49, Issue (5): 105-111. doi: 10.6040/j.issn.1672-3961.0.2018.209
Yanxia MENG1, Yuchen GUO1, Li WANG2,*
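Abstract:
To address the high computational time complexity of the maximal information coefficient (MIC) algorithm, an improved algorithm based on dynamic equipartition, DE-MIC (dynamic equipartition of maximal information coefficient), is proposed. Dynamic equipartition is used to search iteratively for the best grid partition of the scatter plot of two variables, and the resulting mutual information is normalized to obtain the optimal DE-MIC value. In addition, the standard portable operating system interface of UNIX (POSIX) is used to process the data set with multiple threads, which makes the algorithm more efficient on large-scale data sets. In comparisons with the rapid computation of the maximal information coefficient (RapidMIC) algorithm on multiple data sets, DE-MIC computes faster and more efficiently while preserving the generality and equitability of the original MIC algorithm.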
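The grid-and-normalize idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' DE-MIC implementation: the function names (`equal_frequency_bins`, `mutual_information`, `equipartition_mic`) are hypothetical, the grid budget of n^0.6 cells is an assumption borrowed from the default commonly used for the original MIC, and both the iterative optimization of the partition and the POSIX multithreading are omitted. The sketch only shows equal-frequency partitioning of both axes, mutual information on the resulting grid, and normalization by the logarithm of the smaller grid dimension.

```python
import math
from collections import Counter


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding (nearly) equal numbers of points.

    Binning is rank based, so tied values may land in different bins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * k // n
    return bins


def mutual_information(bx, by):
    """Mutual information (in nats) between two sequences of bin labels."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i, j) * log(p(i, j) / (p(i) * p(j))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


def equipartition_mic(x, y, alpha=0.6):
    """Largest normalized mutual information over all equipartition grids.

    Grids of kx columns by ky rows are tried while kx * ky <= n ** alpha
    (assumed budget; a floor of 4 keeps at least the 2 x 2 grid).
    """
    n = len(x)
    budget = max(n ** alpha, 4)
    best = 0.0
    kx = 2
    while kx * 2 <= budget:
        bx = equal_frequency_bins(x, kx)
        ky = 2
        while kx * ky <= budget:
            by = equal_frequency_bins(y, ky)
            score = mutual_information(bx, by) / math.log(min(kx, ky))
            best = max(best, score)
            ky += 1
        kx += 1
    return best
```

For a noiseless monotonic relationship such as `y = 2 * x + 1`, the 2 x 2 grid already separates the points perfectly, so the score is 1 up to floating-point rounding. Because every grid is evaluated independently, the double loop is the natural place to distribute work across threads.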
|