Usage

Reference data

You can run CLIPipe with the provided reference files. By default, you can choose from hg19/hg38 (human) or mm10/mm39 (mouse). You can also build your own reference based on the examples provided.

ls /home/CLIPipe_user/clipipe/clipipe_ref/

Demo data

You can use the provided demo data to run CLIPipe:

cd /home/CLIPipe_user/clipipe/clipipe_demo/general/

The demo data folder has the following structure:

./
├── config
│   ├── default_config.yaml
│   └── user_config.yaml
├── data
│   ├── fastq/
│   └── sample_ids.txt
└── output
    └── ...

Note:
    `config/user_config.yaml`: configuration file with user-defined parameters for each step.
    `config/default_config.yaml`: configuration file with additional detailed parameters for each step. Do not change the default file unless you are sure of what you are doing.
    `data/fastq/`: folder of raw CLIP-seq fastq files.
    `data/sample_ids.txt`: table of sample names.
    `output/example/`: output folder.
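
Before running the pipeline on your own data, it can help to confirm that a project folder follows the same layout. The following is a minimal sketch; `check_demo_layout` is a hypothetical helper written for this page, not a CLIPipe command, and it only checks the entries named in the tree above.

```shell
#!/usr/bin/env bash
# Sketch: verify that a project folder contains the entries shown in the tree.
# check_demo_layout is a hypothetical helper, not part of CLIPipe.
check_demo_layout() {
    local project_dir="$1" entry missing=0
    for entry in \
        config/default_config.yaml \
        config/user_config.yaml \
        data/fastq \
        data/sample_ids.txt
    do
        if [ ! -e "${project_dir}/${entry}" ]; then
            echo "missing: ${entry}" >&2
            missing=1
        fi
    done
    # Returns 0 when everything is present, 1 otherwise.
    return "${missing}"
}
```

For the demo data this could be called as `check_demo_layout /home/CLIPipe_user/clipipe/clipipe_demo/general`.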

User config file

The user config file looks like this:

# default config
default_config_file: /home/CLIPipe_user/clipipe/clipipe_demo/general/config/default_config.yaml

# basic config file path
species: Human_hg38
reference_dir: /home/CLIPipe_user/clipipe/clipipe_ref

data_dir: /home/CLIPipe_user/clipipe/clipipe_demo/general/data
temp_dir: /home/CLIPipe_user/clipipe/clipipe_demo/general/temp
output_dir: /home/CLIPipe_user/clipipe/clipipe_demo/general/output_human_hg38
summary_dir: /home/CLIPipe_user/clipipe/clipipe_demo/general/summary

# general parameters
threads_compress: 2
threads_mapping: 4

# pre process parameters
barcode_length: 1
adaptor1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

# mapping parameters
aligner: bowtie

# peak calling parameters
peak_caller: Piranha
read_length: 100
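
To try a different setting without touching the original file, one option is to copy the user config and change a single `key: value` line in the copy. The sketch below assumes the flat layout shown above (one top-level key per line); `set_config_value` is a hypothetical helper, not a CLIPipe command, and it leaves the file unchanged if the key is absent.

```shell
#!/usr/bin/env bash
# Sketch: replace the value of one top-level "key: value" line in a config file.
# set_config_value is a hypothetical helper, not part of CLIPipe.
set_config_value() {
    local key="$1" value="$2" file="$3" tmp status
    tmp="$(mktemp)"
    # Rewrite only the line whose key matches; print every other line as-is.
    awk -v k="${key}" -v v="${value}" '
        BEGIN { FS = ":" }
        $1 == k { print k ": " v; next }
        { print }
    ' "${file}" > "${tmp}" && cat "${tmp}" > "${file}"
    status=$?
    rm -f "${tmp}"
    return "${status}"
}
```

For example, after `cp config/user_config.yaml config/user_config_bwa.yaml`, calling `set_config_value aligner bwa config/user_config_bwa.yaml` would switch the copy to bwa.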

Pre-processing

CLIPipe provides a pre-processing step for raw CLIP-seq data. Set up config/user_config.yaml correctly before running it. Other parameters for the pre-processing step are in config/default_config.yaml.

cd /home/CLIPipe_user/clipipe/clipipe_demo/general/;
clipipe -u ./config/user_config.yaml pre_process;

Note:
    The output folder `output_human_hg38/fastqc_raw/` contains quality control results of raw CLIP-seq data.
    The output folder `output_human_hg38/multiqc_raw/` contains a summary of all raw sequencing data quality control results.
    The output folder `output_human_hg38/pre_process/` contains the pre-processing results of raw CLIP-seq data.

Alignment

CLIPipe provides bowtie, bwa and novoalign for mapping CLIP-seq data. Set the alignment tool in config/user_config.yaml. It is recommended to specify the number of threads, either by adding threads_mapping: N to config/user_config.yaml or by adding the -j N parameter to the CLIPipe command. Other detailed parameters for alignment are in config/default_config.yaml.

cd /home/CLIPipe_user/clipipe/clipipe_demo/general/;
clipipe -u ./config/user_config.yaml mapping;

Note:
    The output folder `output_human_hg38/mapping_bowtie/` contains alignment results using bowtie.
    The output folder `output_human_hg38/mapping_bwa/` contains alignment results using bwa.
    The output folder `output_human_hg38/mapping_novoalign/` contains alignment results using novoalign.
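
The folder names in the note follow the pattern `<output_dir>/mapping_<aligner>/`. As a rough sketch, the expected folder can be derived from the user config. Both helpers below are hypothetical and assume simple `key: value` lines with no trailing comments.

```shell
#!/usr/bin/env bash
# Sketch: print the mapping output folder implied by a user config file.
# get_config_value and expected_mapping_dir are hypothetical helpers.
get_config_value() {
    local key="$1" file="$2"
    # Print the value of the first top-level "key: value" line.
    awk -v k="${key}" '
        BEGIN { FS = ":[ \t]*" }
        $1 == k { print $2; exit }
    ' "${file}"
}

expected_mapping_dir() {
    local file="$1" output_dir aligner
    output_dir="$(get_config_value output_dir "${file}")"
    aligner="$(get_config_value aligner "${file}")"
    echo "${output_dir}/mapping_${aligner}"
}
```

With the demo config, `expected_mapping_dir ./config/user_config.yaml` would print a path ending in `output_human_hg38/mapping_bowtie`.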

Peak calling

CLIPipe provides multiple peak calling methods for identifying recurring fragments of CLIP-seq data.

CLIP technology	Method-specific	Non-specific
HITS-CLIP	CTK	Piranha
PAR-CLIP	CTK, PARA suite
iCLIP	CTK, PureCLIP, iCLIPro, iCount
iCLAP	PureCLIP, iCLIPro, iCount
eCLIP	CTK, PureCLIP, iCLIPro, iCount
4sU-iCLIP	PureCLIP, iCLIPro, iCount
urea-iCLIP	PureCLIP, iCLIPro, iCount
BrdU-CLIP	CTK
Fr-iCLIP	PureCLIP, iCLIPro, iCount
FAST-iCLIP	PureCLIP, iCLIPro, iCount
irCLIP	PureCLIP, iCLIPro, iCount
seCLIP	PureCLIP
uvCLAP	JAMM
FLASH	PureCLIP
dCLIP	PeakRanger

cd /home/CLIPipe_user/clipipe/clipipe_demo/general/;
# Choose peak_caller from: Piranha (mapping method: bowtie), CTK (mapping method: novoalign), PureCLIP (mapping method: bowtie), parclip_suite (does not need the mapping step)
clipipe -u ./config/user_config.yaml peak_calling;

Note:
    The output folder `output_human_hg38/peak_calling_piranha/` contains peak calling results using Piranha.
    The output folder `output_human_hg38/peak_calling_CTK/` contains peak calling results using CTK.
    The output folder `output_human_hg38/peak_calling_pureclip/` contains peak calling results using PureCLIP.
    The output folder `output_human_hg38/peak_calling_parclip_suite/` contains peak calling results using parclip_suite.
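
Because each peak caller expects reads from a particular aligner, a small lookup can catch a mismatched config before a long run. The sketch below encodes the pairing given in the comment on the peak calling command (reading the PureCLIP entry as bowtie); `required_aligner` is a hypothetical helper, not a CLIPipe command.

```shell
#!/usr/bin/env bash
# Sketch: report which aligner a given peak_caller value expects.
# required_aligner is a hypothetical helper, not part of CLIPipe.
required_aligner() {
    case "$1" in
        Piranha|PureCLIP) echo "bowtie" ;;
        CTK)              echo "novoalign" ;;
        parclip_suite)    echo "none" ;;   # no mapping step needed
        *)
            echo "unknown peak caller: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `required_aligner CTK` prints `novoalign`, which can be compared against the `aligner` value in config/user_config.yaml.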

Several other peak calling tools can be used directly in the CLIPipe docker:

# iCLIPro
$ iCLIPro [options] in.bam

# iCount
$ iCount [-h] [-v] ...

# JAMM
$ JAMM.sh --help

# PeakRanger
$ peakranger <command> <arguments>

# clipcontext
$ clipcontext [-h] [-v] {g2t,t2g,lst,int,exb,eir} ...

Motif discovery

The motif discovery functions can be used directly in the CLIPipe docker. You only need the final binding peaks for your data as the input, ${sample_id}.all_peak.bed.

For HOMER, a demo script looks like this:

# input: ${sample_id}.all_peak.bed

# 1. split training and test dataset
perl /home/CLIPipe_user/clipipe2/clipipe_software/bin/homer/1.split.pl ${sample_id}.all_peak.bed

# 2. prepare training and test fasta
perl /home/CLIPipe_user/clipipe2/clipipe_software/bin/homer/2.prepare_Homer.pl ${sample_id} training ${genome_fasta}
perl /home/CLIPipe_user/clipipe2/clipipe_software/bin/homer/2.prepare_Homer.pl ${sample_id} test ${genome_fasta}

# 3. Run Homer on training dataset
findMotifs.pl ${sample_id}.training_peak.fa fasta Homer_training_output -len 4,5,6,7,8,9,10 -rna # the -len values can be adjusted

# 4. Run Homer on test dataset
mkdir Homer_test_output
findMotifs.pl ${sample_id}.test_peak.fa fasta Homer_test_output -rna -find Homer_training_output/homerMotifs.all.motifs > Homer_test_output/count.txt
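
To illustrate what a training/test split of a peak file means, here is a minimal sketch. It is not the logic of `1.split.pl`, which this page does not describe. The 4:1 ratio, the `split_peaks` name, and the output file names are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch: split a BED file of peaks into training and test subsets.
# Assumptions (not from CLIPipe): every 5th peak goes to the test set,
# and the output file names are illustrative only.
split_peaks() {
    local bed="$1" prefix="$2"
    awk -v train_out="${prefix}.train.bed" -v test_out="${prefix}.test.bed" '
        NR % 5 == 0 { print > test_out; next }
        { print > train_out }
    ' "${bed}"
}
```

A deterministic split like this keeps the example reproducible. A random split is another common choice.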

For MEME, a demo script looks like this:

# input: ${sample_id}.all_peak.bed

# 1. split training and test dataset
perl /home/CLIPipe_user/clipipe2/clipipe_software/bin/meme/1.split.pl ${sample_id}.all_peak.bed

# 2. prepare training and test fasta
perl /home/CLIPipe_user/clipipe2/clipipe_software/bin/meme/2.prepare_MEME.pl ${sample_id} training ${genome_fasta}
perl /home/CLIPipe_user/clipipe2/clipipe_software/bin/meme/2.prepare_MEME.pl ${sample_id} test ${genome_fasta}

# 3. run MEME on training dataset
meme ${sample_id}.training_peak.fa -o MEME_output -dna -minw 4 -maxw 10 -nmotifs 25 # the -minw, -maxw and -nmotifs values can be adjusted

# 4. run FIMO on test dataset
# Rewrite values such as 10.0e+000 as 1.0e+001 before passing the motifs to FIMO
sed -e 's/10.0e+000/1.0e+001/g' \
    -e 's/10.0e+001/1.0e+002/g' \
    -e 's/10.0e+002/1.0e+003/g' \
    -e 's/10.0e+003/1.0e+004/g' \
    -e 's/10.0e+004/1.0e+005/g' \
    -e 's/10.0e+005/1.0e+006/g' \
    -e 's/10.0e+006/1.0e+007/g' \
    -e 's/10.0e+007/1.0e+008/g' \
    -e 's/10.0e+008/1.0e+009/g' \
    -e 's/10.0e+009/1.0e+010/g' \
    -e 's/10.0e+010/1.0e+011/g' \
    MEME_output/meme.txt > meme.txt
fimo --thresh 0.01 -o FIMO_output meme.txt ${sample_id}.test_peak.fa # the --thresh value can be adjusted
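
The sed substitutions in step 4 cover exponents 000 through 010 only. Assuming the intent is to rewrite any value of the form `10.0e+NNN` as `1.0e+(NNN+1)`, the same idea can be sketched with awk for every three-digit exponent. `normalize_meme_evalues` is a hypothetical helper that reads standard input and writes standard output.

```shell
#!/usr/bin/env bash
# Sketch: rewrite values like "10.0e+003" as "1.0e+004" for any 3-digit exponent.
# normalize_meme_evalues is a hypothetical helper, not part of CLIPipe or MEME.
normalize_meme_evalues() {
    awk '{
        line = $0; out = ""
        while (match(line, /10\.0e\+[0-9][0-9][0-9]/)) {
            # The exponent digits start 6 characters after the match start.
            new_exp = substr(line, RSTART + 6, 3) + 1
            out = out substr(line, 1, RSTART - 1) sprintf("1.0e+%03d", new_exp)
            line = substr(line, RSTART + RLENGTH)
        }
        print out line
    }'
}
```

It could be used as `normalize_meme_evalues < MEME_output/meme.txt > meme.txt`. Unlike the sed patterns, it matches a literal dot only.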

Other related tools are also provided:

# PhyloGibbs
$ phylogibbs-mp [-m motifwidth] input_seqfile [input_seqfile2 ...]

# STREME
$ streme [options]