This function will calculate f2 statistics in blocks of prespecified size for all pairs of populations specified in the input files. Input is assumed to be in PLINK binary format https://www.cog-genomics.org/plink/1.9/formats#bed and mutation ages are in Relate mut format https://myersgroup.github.io/relate/. The output is in a format that is directly accepted by the admixtools R package to calculate f2, f3, f4, f4ratio, D statistics and more (https://uqrmaie1.github.io/admixtools/).

f2_blocks_from_RelateAges(
  pref,
  file_mut,
  blgsize = NULL,
  transitions = NULL,
  maxmiss = NULL,
  fam = NULL,
  pops = NULL,
  chrs = NULL,
  tmin = NULL,
  t = NULL,
  include_undated = NULL,
  minMAF = NULL,
  apply_corr = NULL,
  debug_mode = 0L
)

Arguments

pref

Prefix of PLINK binary files, assuming filenames of form ${pref}.bed, ${pref}.bim, ${pref}.fam.

file_mut

(Optional) Prefix of filenames of mut files, assuming filenames of form ${file_mut}_chr1.mut(.gz). Chromosome names have to be consistent to those in the PLINK files. If no file is specified, all mutations in the PLINK file are used.

blgsize

(Optional) SNP block size in Morgan. Default is 0.05 (5 cM). If blgsize is 100 or greater, if will be interpreted as base pair distance rather than centimorgan distance.

transitions

(Optional) Set this to FALSE to exclude transition SNPs

maxmiss

(Optional) Discard SNPs which are missing in a fraction of populations higher than maxmiss

fam

(Optional) 1d-array assigning individuals to populations. Corresponds to the first column in the fam file and is useful if you want to change population assignments.

pops

(Optional) Populations for which data should be extracted. Names need to match the first column in the fam file (or fam option below)

chrs

(Optional) List chromosome names to use.

tmin

(Optional) Minimum time cutof in generations. Any mutations younger than tmin will be excluded from the analysis. Default: t = 0.

t

(Optional) Time cutoff in generations. Any mutations older that t will be excluded from the analysis. Default: t = Inf.

include_undated

(Optional) Include mutations that are not dated. Default: false.

minMAF

(Optional) minimum minor allele count. Default: 1.

apply_corr

(Optional) Use small sample size correction. Default: true.

debug_mode

(Optional) Prints progress used for debugging.

Value

3d array of dimension #groups x #groups x #blocks containing f2 statistics. Analogous to output of f2_from_geno in admixtools.

Examples

path <- paste0(system.file("sim/", package = "twigstats"),"/")
pref <- "msprime_ad0.8_split250_1"
file_plink <- paste0(path,pref) #only need prefix
file_mut  <- paste0(path,pref) #only need prefix (here same name as plink file but can be different)

system(paste0("gunzip ", file_plink, ".bim.gz"))

#Compute f2 statistics between all pairs of populations. You can use pops to only calculate f2s between specified populations.
f2_blocks <- f2_blocks_from_RelateAges(pref = file_plink, file_mut)
f4_ratio(f2_blocks, popX="PX", popI="P1", pop1="P2", pop2="P3", popO="P4")
#>   popO popI pop1 pop2 popX      val        se
#> 1   P4   P1   P2   P3   PX 1.082681 0.2931867

#Use a cutoff of 500 generations
f2_blocks <- f2_blocks_from_RelateAges(pref = file_plink, file_mut, t = 500)
f4_ratio(f2_blocks, popX="PX", popI="P1", pop1="P2", pop2="P3", popO="P4")
#>   popO popI pop1 pop2 popX      val         se
#> 1   P4   P1   P2   P3   PX 0.810077 0.03097378