01 GATK Best Practices for Tumor Somatic Variants: Introduction to the SnakeMake Workflow

For bioinformatics discussion and collaboration, please follow the WeChat official account @生信摸索

Code repository

https://jihulab.com/BioQuest/smkhss
https://github.com/BioQuestX/smkhss

Pipeline summary (GATK Best Practices workflow)

SnakeMake workflow for human somatic short variants (SNVs + indels)

Expected fastq inputs

Matched normal and tumor samples.
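Because Mutect2 is run in tumor-normal mode, every patient needs both a normal and a tumor FASTQ. The following is a minimal sketch, assuming every file follows the `{patient}.{N|T}.fastq.gz` naming visible in the `raw/` directory of the Outputs tree. The function `pair_tumor_normal` is hypothetical and is not taken from the smkhss repository, which presumably lists its samples in `config/samples.tsv`.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme (hypothetical, inferred from raw/P1.N.fastq.gz and
# raw/P1.T.fastq.gz): {patient}.{N|T}.fastq.gz, N = normal, T = tumor.
FASTQ_PATTERN = re.compile(r"^(?P<patient>[^.]+)\.(?P<tissue>[NT])\.fastq\.gz$")


def pair_tumor_normal(fastq_names):
    """Group FASTQ paths into {patient: {"N": path, "T": path}}.

    Raises ValueError when a patient lacks the tumor or the normal file,
    because tumor-normal calling needs both.
    """
    pairs = defaultdict(dict)
    for name in fastq_names:
        match = FASTQ_PATTERN.match(Path(name).name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        pairs[match["patient"]][match["tissue"]] = name
    for patient, tissues in pairs.items():
        missing = {"N", "T"} - set(tissues)
        if missing:
            raise ValueError(f"{patient} is missing sample(s): {sorted(missing)}")
    return dict(pairs)


print(pair_tumor_normal(["raw/P1.N.fastq.gz", "raw/P1.T.fastq.gz"]))
```

Failing early on an unpaired sample is cheaper than discovering the gap after hours of alignment.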

Reference

  1. Reference genome files and GATK resource bundle files (GATK)
  2. VEP variant annotation files (VEP)
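GATK tools expect two companion files next to the reference FASTA: a `.fai` index (from `samtools faidx`) and a `.dict` sequence dictionary (from `gatk CreateSequenceDictionary`) whose name replaces the FASTA extension. A minimal pre-flight check under that assumption; the helper names and the `ref/genome.fa` path are hypothetical, and bgzipped references (which also need a `.gzi`) are deliberately not handled.

```python
from pathlib import Path


def gatk_reference_companions(fasta):
    """Return the index files GATK looks for next to a plain-text FASTA."""
    path = Path(fasta)
    if path.suffix not in {".fa", ".fasta"}:
        raise ValueError(f"unsupported reference extension: {path.suffix}")
    return {
        "fai": path.with_name(path.name + ".fai"),  # genome.fa -> genome.fa.fai
        "dict": path.with_suffix(".dict"),          # genome.fa -> genome.dict
    }


def missing_reference_files(fasta):
    """List the FASTA and companion files that do not exist on disk yet."""
    files = [Path(fasta), *gatk_reference_companions(fasta).values()]
    return [str(f) for f in files if not f.exists()]


# Hypothetical path; prints whichever of the three files are absent.
print(missing_reference_files("ref/genome.fa"))
```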

Prepare

  1. Adapter trimming (Fastp)
  2. Read alignment (BWA-MEM2)
  3. Mark duplicates (samblaster)
  4. Generate a recalibration table for base quality score recalibration (BaseRecalibrator)
  5. Apply base quality score recalibration (ApplyBQSR)
  6. Merge the CRAM files of each sample separately (Picard)
  7. Create CRAM index (samtools)
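The Prepare steps form a linear per-sample chain. The sketch below only builds the shell commands as strings so the data flow is visible; the output paths, read-group fields and single-end FASTQ layout are assumptions, and the options actually used by the workflow's rules may differ. The CRAM merge (step 6) is omitted because it only matters when a sample has several input files.

```python
import shlex


def prepare_commands(sample, fastq, ref, known_sites, threads=8):
    """Build the Prepare-stage commands for one sample (illustrative sketch).

    All output paths and the read-group string are hypothetical.
    """
    q = shlex.quote
    trimmed = f"results/trimmed/{sample}.fastq.gz"
    cram = f"results/prepared/{sample}.markdup.cram"
    table = f"results/prepared/{sample}.recal.table"
    bqsr = f"results/prepared/{sample}.bqsr.cram"
    # bwa-mem2 expands the literal "\t" escapes in the read-group line.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    known = " ".join(f"--known-sites {q(vcf)}" for vcf in known_sites)
    return [
        # 1. Adapter trimming; fastp also writes the JSON/HTML QC reports.
        f"fastp -i {q(fastq)} -o {trimmed} -w {threads} "
        f"-j report/{sample}.fastp.json -h report/{sample}.fastp.html",
        # 2-3. Align, mark duplicates on the unsorted SAM stream, then sort to CRAM.
        f"bwa-mem2 mem -t {threads} -R {q(read_group)} {q(ref)} {trimmed} "
        f"| samblaster | samtools sort -@ {threads} -O cram --reference {q(ref)} -o {cram} -",
        # 4. Model systematic base-quality errors at non-variant sites.
        f"gatk BaseRecalibrator -R {q(ref)} -I {cram} {known} -O {table}",
        # 5. Rewrite base qualities using that model.
        f"gatk ApplyBQSR -R {q(ref)} -I {cram} --bqsr-recal-file {table} -O {bqsr}",
        # 7. Index the final CRAM for random access by the callers.
        f"samtools index {bqsr}",
    ]


for cmd in prepare_commands("P1.T", "raw/P1.T.fastq.gz", "ref/genome.fa",
                            ["dbsnp.vcf.gz", "mills.vcf.gz"]):
    print(cmd)
```

samblaster runs before sorting because it expects reads grouped by name, which is how BWA-MEM2 emits them.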

Quality control report

  1. Fastp report (MultiQC)
  2. Alignment report (MultiQC)
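MultiQC builds its fastp section from each sample's fastp JSON report (the `report/*.fastp.json` files in the Outputs tree). A minimal sketch of pulling the same headline numbers yourself, assuming fastp's report layout with `summary.before_filtering` and `summary.after_filtering` blocks holding `total_reads` and `q30_rate`; `fastp_summary` is a hypothetical helper.

```python
import json
from pathlib import Path


def fastp_summary(json_text):
    """Extract read counts and the post-filter Q30 rate from a fastp JSON report."""
    summary = json.loads(json_text)["summary"]
    before = summary["before_filtering"]
    after = summary["after_filtering"]
    return {
        "reads_before": before["total_reads"],
        "reads_after": after["total_reads"],
        "fraction_kept": after["total_reads"] / before["total_reads"],
        "q30_after": after["q30_rate"],
    }


# Usage: one summary line per sample report found under report/.
for report in sorted(Path("report").glob("*.fastp.json")):
    print(report.name, fastp_summary(report.read_text()))
```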

Call

  1. Call somatic SNVs and indels via local assembly of haplotypes (Mutect2)
  2. Tabulate pileup metrics for inferring contamination (GetPileupSummaries)
  3. Calculate the fraction of reads coming from cross-sample contamination (CalculateContamination)
  4. Get the maximum likelihood estimates of artifact prior probabilities in the orientation bias mixture model filter (LearnReadOrientationModel)
  5. Filter somatic SNVs and indels called by Mutect2 (FilterMutectCalls)
  6. Merge all the VCF files (Picard)
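The Call steps depend on each other: FilterMutectCalls consumes the contamination table and the orientation-bias priors produced by the earlier tools. The sketch below builds the per-patient commands in that order, following the option names in the GATK somatic short variant documentation; the output paths and the `common_sites` VCF are hypothetical, and `normal_name` is assumed to equal the SM tag of the normal CRAM's read group.

```python
def mutect2_commands(patient, ref, tumor_cram, normal_cram, normal_name, common_sites):
    """Build one patient's Call-stage commands in dependency order (sketch)."""
    out = f"results/called/{patient}"  # hypothetical output prefix
    return [
        # 1. Tumor-normal calling; --f1r2-tar-gz collects counts for step 4.
        f"gatk Mutect2 -R {ref} -I {tumor_cram} -I {normal_cram} "
        f"-normal {normal_name} --f1r2-tar-gz {out}.f1r2.tar.gz -O {out}.unfiltered.vcf.gz",
        # 2. Pileup summaries at common germline sites, for tumor and normal.
        f"gatk GetPileupSummaries -R {ref} -I {tumor_cram} -V {common_sites} "
        f"-L {common_sites} -O {out}.tumor.pileups.table",
        f"gatk GetPileupSummaries -R {ref} -I {normal_cram} -V {common_sites} "
        f"-L {common_sites} -O {out}.normal.pileups.table",
        # 3. Contamination estimate, using the matched normal.
        f"gatk CalculateContamination -I {out}.tumor.pileups.table "
        f"-matched {out}.normal.pileups.table -O {out}.contamination.table "
        f"--tumor-segmentation {out}.segments.table",
        # 4. Orientation-bias priors learned from the F1R2 counts.
        f"gatk LearnReadOrientationModel -I {out}.f1r2.tar.gz "
        f"-O {out}.read-orientation-model.tar.gz",
        # 5. Filtering with the outputs of steps 3 and 4; the Mutect2 .stats
        #    file next to the unfiltered VCF is picked up by default.
        f"gatk FilterMutectCalls -R {ref} -V {out}.unfiltered.vcf.gz "
        f"--contamination-table {out}.contamination.table "
        f"--tumor-segmentation {out}.segments.table "
        f"--ob-priors {out}.read-orientation-model.tar.gz -O {out}.filtered.vcf.gz",
    ]
```

The final merge (step 6) runs once over all filtered VCFs rather than per patient, so it is left out of this per-patient sketch.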

Annotation

Annotate variant calls with VEP (VEP)
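When VEP writes VCF output, it stores its annotations in a `CSQ` INFO field and declares the pipe-separated sub-field names in the header, after the text `Format:`. A minimal parser under that assumption; `csq_fields` and `parse_csq` are hypothetical helpers, and the header line and INFO value in the usage example are synthetic.

```python
def csq_fields(header_lines):
    """Return the CSQ sub-field names declared in a VEP-annotated VCF header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            # The description ends with '... Format: Allele|Consequence|...">'
            fmt = line.split("Format: ", 1)[1]
            return fmt.rstrip('">\n').split("|")
    raise ValueError("no CSQ header line found; was VEP run with VCF output?")


def parse_csq(info, fields):
    """Split the CSQ value of an INFO column into one dict per annotated transcript."""
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "CSQ":
            return [dict(zip(fields, entry.split("|"))) for entry in value.split(",")]
    return []


# Synthetic example data, not real workflow output.
header = ['##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">']
fields = csq_fields(header)
print(parse_csq("DP=50;CSQ=T|missense_variant|MODERATE|TP53", fields))
```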

SnakeMake Report

Outputs

├── config
│  ├── captured_regions.bed
│  ├── config.yaml
│  └── samples.tsv
├── dag.svg
├── logs
│  ├── annotate
│  ├── call
│  ├── prepare
│  ├── qc
│  ├── ref
│  └── trim
├── raw
│  ├── P1.N.fastq.gz
│  └── P1.T.fastq.gz
├── report
│  ├── fastp_multiqc_data
│  ├── fastp_multiqc.html
│  ├── P1.N.fastp.html
│  ├── P1.N.fastp.json
│  ├── P1.T.fastp.html
│  ├── P1.T.fastp.json
│  ├── prepare_multiqc_data
│  ├── prepare_multiqc.html
│  └── vep_report.html
├── results
│  ├── annotated
│  ├── called
│  ├── prepared
│  └── trimmed
└── workflow
    ├── envs
    ├── report
    ├── rules
    ├── schemas
    ├── scripts
    └── Snakefile
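File names such as `report/P1.N.fastp.html` and `report/P1.T.fastp.json` follow a pattern per sample. In a Snakefile such target lists are usually produced with Snakemake's `expand()`, which by default fills a pattern with every combination of wildcard values. The sketch below is a stand-alone re-implementation of that default behaviour for illustration only (the name `expand_paths` is hypothetical, not the library function).

```python
from itertools import product


def expand_paths(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    keys = list(wildcards)
    return [
        pattern.format(**dict(zip(keys, values)))
        for values in product(*(wildcards[k] for k in keys))
    ]


samples = ["P1.N", "P1.T"]
raw = expand_paths("raw/{sample}.fastq.gz", sample=samples)
reports = expand_paths("report/{sample}.fastp.{ext}", sample=samples, ext=["html", "json"])
print(raw)
print(reports)
```

The resulting lists match the `raw/` and per-sample `report/` entries in the tree above.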

Directed Acyclic Graph
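Snakemake orders jobs by building a directed acyclic graph from the input/output files of each rule; it can render that graph itself (for example `snakemake --dag | dot -Tsvg > dag.svg`, which is presumably how the `dag.svg` in the output tree was made). The toy sketch below shows the same idea with Python's standard `graphlib` (Python 3.9+). The rule names and edges are hypothetical, modelled on the steps listed above rather than on the real rules in `workflow/rules`.

```python
from graphlib import TopologicalSorter

# Hypothetical rule graph: each rule maps to the rules whose outputs it consumes.
RULE_DEPENDENCIES = {
    "fastp": set(),
    "bwa_mem2_samblaster": {"fastp"},
    "base_recalibrator": {"bwa_mem2_samblaster"},
    "apply_bqsr": {"base_recalibrator", "bwa_mem2_samblaster"},
    "merge_crams": {"apply_bqsr"},
    "samtools_index": {"merge_crams"},
    "multiqc_fastp": {"fastp"},
    "multiqc_prepare": {"samtools_index"},
    "mutect2": {"samtools_index"},
    "get_pileup_summaries": {"samtools_index"},
    "calculate_contamination": {"get_pileup_summaries"},
    "learn_read_orientation_model": {"mutect2"},
    "filter_mutect_calls": {"mutect2", "calculate_contamination",
                            "learn_read_orientation_model"},
    "merge_vcfs": {"filter_mutect_calls"},
    "vep": {"merge_vcfs"},
}

# static_order() yields every rule after all of its predecessors.
order = list(TopologicalSorter(RULE_DEPENDENCIES).static_order())
print(order)
```

Any order that respects the edges is valid; rules with no path between them (such as `multiqc_fastp` and `mutect2`) are independent and could run in parallel.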

Reference

https://gatk.broadinstitute.org/hc/en-us/articles/360035894731-Somatic-short-variant-discovery-SNVs-Indels-
