Paper intensive reading (20): Batch effects correction for microbiome data

Paper title: Batch effects correction for microbiome data with Dirichlet-multinomial regression

Google Scholar citations: 6

Pages: 8

Published: 23 August 2018

Journal: Bioinformatics

Authors: Zhenwei Dai (1,2), Sunny H. Wong (1,2), Jun Yu (1,2) and Yingying Wei (3,*), The Chinese University of Hong Kong

Abstract:

Motivation: Metagenomic sequencing techniques enable quantitative analyses of the microbiome. However, combining the microbial data from these experiments is challenging due to the variations between experiments. The existing methods for correcting batch effects do not consider the interactions between variables—microbial taxa in microbial studies—and the overdispersion of the microbiome data. Therefore, they are not applicable to microbiome data.
Results: We develop a new method, Bayesian Dirichlet-multinomial regression meta-analysis (BDMMA), to simultaneously model the batch effects and detect the microbial taxa associated with phenotypes. BDMMA automatically models the dependence among microbial taxa and is robust to the high dimensionality of the microbiome and their association sparsity. Simulation studies and real data analysis show that BDMMA can successfully adjust batch effects and substantially reduce false discoveries in microbial meta-analyses.
Key topic: microbial meta-analysis.
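The abstract says that BDMMA models batch effects and phenotype associations together. The sketch below makes that idea concrete by simulating taxon counts from a Dirichlet-multinomial (DM) regression. The parameterization is an assumption for illustration and is not necessarily BDMMA's exact formulation. For taxon j, the DM concentration is taken as α_j = exp(a_j + δ_{b,j} + β_j·x), where a_j is a baseline, δ_{b,j} is a batch effect, β_j is a phenotype effect and x is a binary phenotype. All function names are hypothetical, and only the Python standard library is used.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_multinomial(n_reads, probs, rng):
    """Draw multinomial counts by assigning each of n_reads reads to one taxon."""
    counts = [0] * len(probs)
    for taxon in rng.choices(range(len(probs)), weights=probs, k=n_reads):
        counts[taxon] += 1
    return counts


def dm_concentration(baseline, batch_effect, phenotype_effect, x):
    """Hypothetical log-linear link: log(alpha_j) = a_j + delta_{b,j} + beta_j * x."""
    return [
        math.exp(a + d + b * x)
        for a, d, b in zip(baseline, batch_effect, phenotype_effect)
    ]


def simulate_sample(n_reads, baseline, batch_effect, phenotype_effect, x, rng):
    """Simulate one sample's taxon counts under the assumed DM regression."""
    alpha = dm_concentration(baseline, batch_effect, phenotype_effect, x)
    probs = sample_dirichlet(alpha, rng)  # sample-specific composition
    return sample_multinomial(n_reads, probs, rng)  # reads drawn from it


if __name__ == "__main__":
    rng = random.Random(2018)
    baseline = [1.0, 0.5, 0.0, -0.5]  # a_j: baseline log-concentration per taxon
    phenotype = [0.0, 1.0, 0.0, 0.0]  # beta_j: only taxon 1 is phenotype-associated
    batches = {
        "batch A": [0.0, 0.0, 0.0, 0.0],   # delta_{b,j}: reference batch
        "batch B": [0.8, 0.0, -0.8, 0.0],  # shifts taxa 0 and 2 regardless of phenotype
    }
    for name, delta in batches.items():
        for x in (0, 1):  # x = 0 control, x = 1 case
            counts = simulate_sample(5000, baseline, delta, phenotype, x, rng)
            print(name, "x =", x, counts)
```

In this toy setup, batch B shifts taxa 0 and 2 for cases and controls alike, while only taxon 1 responds to x. A meta-analysis has to tell these two kinds of shift apart. Because all taxa in a sample share one Dirichlet draw and one fixed read total, their counts are also dependent, which is the property the notes say ComBat-style methods ignore.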

Discussion:

  • BDMMA captures the characteristics of metagenomic data and considers the dependence between microbial taxa. (Key features of the method.)
  • As a result, BDMMA dramatically reduces the number of false discoveries and substantially improves the detection of associations compared with existing meta-analysis approaches. (Advantage over current methods.)
  • As shown by both the simulation studies and the application to CRC metagenome studies, BDMMA is able to identify the small set of taxa that are truly associated with the phenotypes with very low false discovery rates and high recall. (Tested on both simulated and real datasets.)
  • In our project, we focused on shotgun metagenomic sequencing data for the analyses. (16S data should presumably behave similarly.)
  • Previous research has demonstrated that the DM distribution is suitable for genus-level data. (An established result.)
  • The DM distribution may not be a good model for OTU-level 16S data because of its high sparsity. (Possibly unsuitable for OTU-level 16S data.)
  • Therefore, we would not recommend applying BDMMA to gene annotation profiles. (Not recommended for gene annotation profiles.)
  • BDMMA requires the batch information to be known for all samples. (The authors may study the case of missing batch information in future work.)
  • We envision that BDMMA will greatly facilitate meta-analysis of microbiome studies, especially for large consortium projects such as the American Gut Project and the MetaHIT project, which will ultimately improve disease diagnostics and treatments. (The authors' outlook for the method.)
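The Discussion says the DM fits genus-level data well but may not fit sparse OTU-level data. Whether the model fits is ultimately judged through the DM likelihood and how much extra variance the model allows. Below is a minimal standard-library sketch of the standard DM log-probability mass function and its per-taxon variance. It is textbook DM algebra, not code from the paper, and the function names are hypothetical.

```python
import math


def dm_log_pmf(counts, alpha):
    """Log-probability of a count vector under Dirichlet-multinomial(N, alpha)."""
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    alpha0 = sum(alpha)
    # Multinomial coefficient: N! / prod_j x_j!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Normalizer: Gamma(alpha0) / Gamma(N + alpha0)
    log_norm = math.lgamma(alpha0) - math.lgamma(n + alpha0)
    # Per-taxon terms: prod_j Gamma(x_j + alpha_j) / Gamma(alpha_j)
    log_taxa = sum(math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha))
    return log_coef + log_norm + log_taxa


def dm_variance(n, alpha, j):
    """Variance of taxon j's count: multinomial variance times (N + alpha0) / (1 + alpha0)."""
    alpha0 = sum(alpha)
    p_j = alpha[j] / alpha0
    multinomial_var = n * p_j * (1 - p_j)
    return multinomial_var * (n + alpha0) / (1 + alpha0)


if __name__ == "__main__":
    # Two taxa, alpha = (1, 1), N = 2: (2,0), (1,1) and (0,2) each have probability 1/3.
    for x in ([2, 0], [1, 1], [0, 2]):
        print(x, math.exp(dm_log_pmf(x, [1.0, 1.0])))
    # Same mean proportions, different alpha0: smaller alpha0 means stronger overdispersion.
    for alpha in ([1.0, 1.0], [100.0, 100.0]):
        print(alpha, dm_variance(1000, alpha, 0), "vs multinomial", 1000 * 0.25)
```

The factor (N + α0)/(1 + α0) in `dm_variance` is the overdispersion relative to a multinomial with the same mean proportions. It approaches 1 as α0 grows and approaches N as α0 shrinks toward 0.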

Introduction:

  • Differences in reagents, labs, platforms, or even just personnel can all cause variations between batches.
  • Commonly used batch-effect correction methods, such as ComBat and SVA, are not suitable for microbiome data, as they assume that different microbial taxa are independent.
  • The reason: microbiome sequencing techniques generate count data that represent compositions. As a result, the read counts for different taxa are dependent.
  • Researchers usually handle this in one of two ways: they either convert the raw read counts to species proportions or rarefy the read counts of all samples to the same total read count.
  • The Dirichlet-multinomial (DM) model addresses the overdispersion in microbial count data and considers the dependence among microbial taxa. (A large body of work has demonstrated the effectiveness of the DM model for microbiome data.)
  • BDMMA is compared with three methods: the mixed effect model (Laird and Ware, 1982), combined P-values based on the weighted Z-test (Zaykin, 2011), and ComBat (Johnson et al., 2007).
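One of the three baselines combines per-study P-values with the weighted Z-test (Zaykin, 2011). The combined statistic is Z_w = Σ w_i Z_i / sqrt(Σ w_i²), with Z_i = Φ^{-1}(1 - p_i). Below is a minimal sketch. It assumes one-sided p-values that all test the same direction and weights w_i = sqrt(n_i), the square root of each study's sample size. The function name and example inputs are hypothetical, and it uses `statistics.NormalDist` from the Python standard library (3.8+).

```python
from math import sqrt
from statistics import NormalDist


def weighted_z_combine(p_values, sample_sizes):
    """Combine one-sided p-values with weights sqrt(n_i) (weighted Z-test)."""
    if len(p_values) != len(sample_sizes):
        raise ValueError("need one sample size per p-value")
    if any(not 0 < p < 1 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    weights = [sqrt(n) for n in sample_sizes]  # assumed weighting scheme
    # Z_i = Phi^{-1}(1 - p_i): a large Z_i is strong evidence in the tested direction
    z_scores = [std_normal.inv_cdf(1 - p) for p in p_values]
    z_combined = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    p_combined = 1 - std_normal.cdf(z_combined)  # one-sided combined p-value
    return z_combined, p_combined


if __name__ == "__main__":
    # Hypothetical p-values for one taxon from three cohorts of different sizes.
    z, p = weighted_z_combine([0.04, 0.20, 0.01], [120, 45, 300])
    print(f"combined Z = {z:.3f}, combined p = {p:.4g}")
```

With equal sample sizes the weights cancel, and the statistic reduces to the unweighted Stouffer Z.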

Structure of the paper:

1. Introduction

2. Materials and methods

2.1 The BDMMA model

2.2 The effect of the abundance of associated microbial taxa

3. Results

3.1 Data description

3.2 Simulation studies

      3.2.1 Comparison with existing methods

      3.2.2 Sensitivity to hyperparameters

      3.2.3 Sensitivity to the abundance of associated microbial taxa

      3.2.4 Sensitivity to over-dispersion

3.3 Real data analysis

4. Discussion

 
