摘录一段代码,来自https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/
countToTpm <- function(counts, effLen)
{
rate <- log(counts) - log(effLen)
denom <- log(sum(exp(rate)))
exp(rate - denom + log(1e6))
}
countToFpkm <- function(counts, effLen)
{
N <- sum(counts)
exp( log(counts) + log(1e9) - log(effLen) - log(N) )
}
fpkmToTpm <- function(fpkm)
{
exp(log(fpkm) - log(sum(fpkm)) + log(1e6))
}
countToEffCounts <- function(counts, len, effLen)
{
counts * (len / effLen)
}
################################################################################
# An example
################################################################################
cnts <- c(4250, 3300, 200, 1750, 50, 0)
lens <- c(900, 1020, 2000, 770, 3000, 1777)
countDf <- data.frame(count = cnts, length = lens)
# assume a mean(FLD) = 203.7
countDf$effLength <- countDf$length - 203.7 + 1
countDf$tpm <- with(countDf, countToTpm(count, effLength))
countDf$fpkm <- with(countDf, countToFpkm(count, effLength))
with(countDf, all.equal(tpm, fpkmToTpm(fpkm)))
countDf$effCounts <- with(countDf, countToEffCounts(count, length, effLength))
Here, the effective length is the transcript length minus the mean fragment length plus 1; that is, all the possible positions of an average fragment inside the transcript, which equals the number of all distinct fragments that can be sampled from a transcript.
在一段QA里面,也有相关的讨论:https://bioinformatics.stackexchange.com/questions/66/how-to-compute-rpkm-in-r
补链接:
- https://github.com/crazyhottommy/RNA-seq-analysis/blob/master/salmon_kalliso_STAR_compare.md
- https://www.biostars.org/p/171766/
- http://www.bio-info-trainee.com/2017.html
- http://www.bioinfo-scrounger.com/archives/342
- https://thegenomefactory.blogspot.com/2013/08/paired-end-read-confusion-library.html