The BED file contains the phenotype data in a format derived from UCSC BED: a standard BED file with one additional column per sample. Below is an example of 3 molecular phenotypes for 4 samples:
#Chr start end ID UNR1 UNR2 UNR3 UNR4
chr1 173863 173864 ENSG123 -0.50 0.82 -0.71 0.83
chr1 685395 685396 ENSG456 -1.13 1.18 -0.03 0.11
chr1 700304 700305 ENSG789 -1.18 1.32 -0.36 1.26
This file is tab-delimited. Each line corresponds to a single molecular phenotype. The first 4 columns are:
- Chromosome ID [string]
- Start genomic position of the phenotype (e.g. TSS) [integer]
- End genomic position of the phenotype (e.g. TSS) [integer]
- Phenotype ID [string]
The remaining columns give the phenotype quantifications for each sample, encoded as floating-point numbers. Excluding the header line, the file should therefore have P lines and N+4 columns, where P and N are the numbers of phenotypes and samples, respectively.
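To make the layout concrete, here is a minimal Python sketch that parses phenotype BED text and enforces the N+4 column rule described above. It assumes a header line starting with `#` whose fields after the first four are the sample IDs, as in the example; the function name `parse_phenotype_bed` is hypothetical and this is not FastQTL's own parser.

```python
from typing import List, Tuple


def parse_phenotype_bed(text: str) -> Tuple[List[str], List[dict]]:
    """Parse tab-delimited phenotype BED text into (sample IDs, phenotype records).

    Assumes the first non-empty line is a '#'-prefixed header naming the samples.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    if not header[0].startswith("#"):
        raise ValueError("expected a header line starting with '#'")
    # Sample IDs are the header fields after the four fixed columns.
    samples = header[4:]
    n = len(samples)
    records = []
    for lineno, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        # Every phenotype line must have exactly N+4 columns.
        if len(fields) != n + 4:
            raise ValueError(f"line {lineno}: expected {n + 4} columns, got {len(fields)}")
        chrom, start, end, pid = fields[:4]
        records.append({
            "chr": chrom,
            "start": int(start),  # integer genomic positions
            "end": int(end),
            "id": pid,
            "values": [float(x) for x in fields[4:]],  # one quantification per sample
        })
    return samples, records


# The example above, written out with real TAB separators.
EXAMPLE = "\n".join("\t".join(row) for row in [
    ["#Chr", "start", "end", "ID", "UNR1", "UNR2", "UNR3", "UNR4"],
    ["chr1", "173863", "173864", "ENSG123", "-0.50", "0.82", "-0.71", "0.83"],
    ["chr1", "685395", "685396", "ENSG456", "-1.13", "1.18", "-0.03", "0.11"],
    ["chr1", "700304", "700305", "ENSG789", "-1.18", "1.32", "-0.36", "1.26"],
])

samples, records = parse_phenotype_bed(EXAMPLE)
print(len(records), len(samples))  # 3 4
```

A row with a missing sample value raises `ValueError` instead of being silently misaligned, which is usually the safer behavior for a file that is about to be compressed and indexed.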
Indexing BED file (required)
To feed FastQTL with BED files containing phenotypes, you first need to index them with tabix. The following commands do this:
bgzip phenotypes.bed && tabix -p bed phenotypes.bed.gz
See the Tabix and bgzip documentation for more details on these command lines. The above command line produces a file phenotypes.bed.gz.tbi that contains the index for phenotypes.bed.gz. These two files must be in the same folder so that FastQTL can also read the index file when reading phenotypes.bed.gz.
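Because the index must sit next to the compressed file, a small pre-flight check can catch a missing or misplaced .tbi before a run. The Python sketch below assumes the default tabix naming (the data file name with `.tbi` appended, in the same folder); the helpers `tabix_index_path` and `check_indexed` are hypothetical and not part of FastQTL.

```python
import os


def tabix_index_path(bed_gz: str) -> str:
    """Default tabix index location: '<file>.tbi' in the same folder as the data."""
    return bed_gz + ".tbi"


def check_indexed(bed_gz: str) -> None:
    """Raise FileNotFoundError unless both the .bed.gz and its .tbi index exist."""
    for path in (bed_gz, tabix_index_path(bed_gz)):
        if not os.path.isfile(path):
            # Suggest the bgzip/tabix command from the documentation above.
            bed = bed_gz[:-3] if bed_gz.endswith(".gz") else bed_gz
            raise FileNotFoundError(
                f"missing {path}; index with: bgzip {bed} && tabix -p bed {bed_gz}"
            )


# Usage (hypothetical path):
# check_indexed("phenotypes.bed.gz")
```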