This function will calculate D statistics in blocks of prespecified size for all pairs of populations specified in the poplabels file. Please refer to the Relate documentation for input file formats (https://myersgroup.github.io/relate/).

D_from_Relate(
  file_anc,
  file_mut,
  poplabels,
  pop1,
  pop2,
  pop3,
  pop4,
  file_map = NULL,
  chrs = NULL,
  blgsize = NULL,
  mu = NULL,
  tmin = NULL,
  t = NULL,
  transitions = NULL,
  use_muts = NULL,
  minMAF = NULL
)

Arguments

file_anc

Filename of anc file. If chrs is specified, this should only be the prefix, resulting in filenames of ${file_anc}_chr${chr}.anc(.gz).

file_mut

Filename of mut file. If chrs is specified, this should only be the prefix, resulting in filenames of ${file_anc}_chr${chr}.anc(.gz).

poplabels

Filename of poplabels file

pop1

Population 1. Either a population in poplabels or Root.

pop2

Population 2. Either a population in poplabels or Root.

pop3

Population 3. Either a population in poplabels or Root.

pop4

Population 4. Either a population in poplabels or Root.

file_map

(Optional) File prefix of recombination map. Not needed if blgsize is given in base-pairs, i.e. blgsize > 100

chrs

(Optional) Vector of chromosome IDs

blgsize

(Optional) SNP block size in Morgan. Default is 0.05 (5 cM). If blgsize is 100 or greater, if will be interpreted as base pair distance rather than centimorgan distance. If blgsize is negative, every tree is its own block.

mu

(Optional) Per base per generation mutation rate to scale f2 values. Default: 1.25e-8

tmin

(Optional) Minimum time cutof in generations. Any lineages younger than tmin will be excluded from the analysis. Default: t = 0.

t

(Optional) Time cutoff in generations. Default: Inf

transitions

(Optional) Set this to FALSE to exclude transition SNPs. Only meaningful with use_muts

use_muts

(Optional) Calculate traditional f2 statistics by only using mutations mapped to Relate trees. Default: false.

minMAF

(Optional) Minimum frequency cutoff. Default: 1 (i.e. excl singletons)

Value

Data frame accepted by jackknife function.

Examples

file_anc  <- system.file("sim/msprime_ad0.8_split250_1_chr1.anc.gz", package = "twigstats")
file_mut  <- system.file("sim/msprime_ad0.8_split250_1_chr1.mut.gz", package = "twigstats")
poplabels <- system.file("sim/msprime_ad0.8_split250_1.poplabels", package = "twigstats")
file_map  <- system.file("sim/genetic_map_combined_b37_chr1.txt.gz", package = "twigstats")

#Calculate D
D_blocks <- D_from_Relate(file_anc, file_mut, poplabels, "P4", "P3", "P1", "P2", file_map)
print(jackknife(D_blocks))
#>           val          se
#> 1 0.004914178 0.003732302
D_blocks <- D_from_Relate(file_anc, file_mut, poplabels, "P4", "P3", "P1", "PX", file_map)
print(jackknife(D_blocks))
#>          val          se
#> 1 0.02788034 0.005402604

#Calculate D with time cutoff
D_blocks <- D_from_Relate(file_anc, file_mut, poplabels, "P4", "P3", "P1", "PX", file_map, t = 1000)
print(jackknife(D_blocks))
#>         val         se
#> 1 0.2185745 0.01000329

#Calculate D with mutations only (and time cutoff)
D_blocks <- D_from_Relate(file_anc, file_mut, poplabels, "P4", "P3", "P1", "PX", file_map, use_muts = T, t = 1000)
print(jackknife(D_blocks))
#>         val         se
#> 1 0.2196723 0.01072831