Tool: liftOver
https://gwaslab.com/2021/05/09/liftover-%E5%9F%BA%E5%9B%A0%E7%BB%84%E5%9D%90%E6%A0%87%E5%8F%98%E6%8D%A2/
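Input file
(extract the circRNA IDs from the expression profile)
A BED file. Only the first three columns are required: chr, start, end. There is no header line.
Note: start must not equal end. UCSC uses a 0-based coordinate system, whereas Ensembl and others use a 1-based one, so a 1-based interval start..end becomes start-1..end in BED (only the start shifts).
R processing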
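The coordinate note above can be made concrete with a small sketch. It assumes a 1-based, closed interval as input; the helper name to_bed0 and the example coordinates are hypothetical and only serve as illustration.

```r
# Hypothetical helper: convert a 1-based closed interval to BED (0-based, half-open).
# Only the start shifts by one; the end keeps its value.
to_bed0 <- function(chr, start, end) {
  data.frame(chr = chr,
             start = as.integer(start) - 1L,
             end = as.integer(end),
             stringsAsFactors = FALSE)
}

# Made-up example interval
bed <- to_bed0("chr1", 1000, 2000)
print(bed)

# The BED length (end - start) equals the 1-based length (end - start + 1)
bed_length <- bed$end - bed$start
```

Because the end is unchanged, a single-base feature in 1-based terms still has start < end in BED, which is why start and end never coincide in a valid record.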
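library(stringr)   # str_split
library(magrittr)  # %>%
A <- expr  # expression matrix; row names have the form chr:start|end with a 1-based start
# Convert IDs from a 1-based to a 0-based start (BED convention): start - 1, end unchanged
IDchange1_0 <- function(x){
  str_split(x,"[:,|]") %>% sapply(function(y){
    # as.integer keeps values such as 100000 from being pasted as "1e+05"
    paste(y[1],as.integer(y[2])-1L,sep = ":") %>% paste(y[3],sep = "|")
  })
}
rownames(A) <- IDchange1_0(rownames(A))
# Split each ID into chr, start, end and write a headerless, tab-separated BED file
rawID <- str_split(rownames(A),"[:,|]")
rawID <- data.frame(matrix(unlist(rawID),ncol = 3,byrow = TRUE))
colnames(rawID) <- c("chr","start","end")
write.table(rawID,"GSE_hg19_0.bed",col.names = F,row.names = F,quote = F, sep = "\t")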
Coordinate conversion
Linux processing
chmod +x liftOver
./liftOver GSE_hg19_0.bed hg19ToHg38.over.chain GSE_hg38_0.bed unmapped.txt
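Before the IDs are renamed, it may be worth checking that no record was lost. The sketch below assumes that, with default options, each input record ends up in exactly one of the two files written by liftOver, and that the unmapped file carries "#" reason lines. The helper count_records is hypothetical, and the temporary files are toy stand-ins for the real input, output and unmapped files.

```r
# Hypothetical helper: count data records in a BED-like file,
# skipping blank lines and "#" comment lines
count_records <- function(path) {
  lines <- readLines(path)
  sum(nzchar(lines) & !startsWith(lines, "#"))
}

# Toy files with made-up coordinates, standing in for the real ones
input <- tempfile(); mapped <- tempfile(); unmapped <- tempfile()
writeLines(c("chr1\t999\t2000", "chr2\t4999\t6000", "chr3\t99\t300"), input)
writeLines(c("chr1\t1099\t2100", "chr3\t149\t350"), mapped)
writeLines(c("#Deleted in new", "chr2\t4999\t6000"), unmapped)

n_in  <- count_records(input)
n_out <- count_records(mapped) + count_records(unmapped)
if (n_in != n_out) stop("mapped + unmapped does not equal the number of input records")
```

With the real files, the three paths would be GSE_hg19_0.bed, GSE_hg38_0.bed and unmapped.txt.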
Output file
R processing
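library(data.table)  # fread
library(stringr)     # str_split
library(magrittr)    # %>%
# Mapped hg38 coordinates (0-based start), in the same order as the mapped input rows
ID <- fread("GSE_hg38_0.bed",data.table = F)
ID <- paste(ID$V1,ID$V2,sep = ":") %>% paste(ID$V3,sep = "|")
# Convert IDs back from a 0-based to a 1-based start: start + 1, end unchanged
IDchange0_1 <- function(x){
  str_split(x,"[:,|]") %>% sapply(function(y){
    paste(y[1],as.integer(y[2])+1L,sep = ":") %>% paste(y[3],sep = "|")
  })
}
ID <- IDchange0_1(ID)
# Unmapped hg19 records; the file name must match the liftOver call, and "#" reason lines are skipped
unID <- read.table("unmapped.txt",sep = "\t",comment.char = "#")
unID <- paste(unID$V1,unID$V2,sep = ":") %>% paste(unID$V3,sep = "|")
# Drop unmapped rows (A still has 0-based hg19 row names), then assign the hg38 IDs
A1 <- A[!rownames(A) %in% unID,]
rownames(A1) <- ID
write.csv(A1,"GSE_circRNA_counts_hg38_0.csv")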
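The renaming above depends on the mapped records coming back in the same order as the input rows. A more defensive variant, sketched here under assumptions, carries the original ID as a fourth BED column (name) and matches on it afterwards. It assumes that liftOver passes the name column through unchanged. The matrix values and the lifted table are made up and only stand in for a real expression matrix and a real liftOver output.

```r
# Toy expression matrix with 1-based hg19 IDs (made-up values)
A <- matrix(1:6, nrow = 3,
            dimnames = list(c("chr1:1000|2000", "chr2:5000|6000", "chr3:100|300"),
                            c("s1", "s2")))

# Stand-in for a liftOver output whose 4th column carries the original ID;
# chr2 is absent, as if it had been written to the unmapped file,
# and the row order differs from A on purpose
lifted <- data.frame(V1 = c("chr3", "chr1"),
                     V2 = c(149L, 1099L),
                     V3 = c(350L, 2100L),
                     V4 = c("chr3:100|300", "chr1:1000|2000"),
                     stringsAsFactors = FALSE)

# Select mapped rows by original ID rather than by position
A1 <- A[lifted$V4, , drop = FALSE]
# New hg38 IDs with a 1-based start (0-based start + 1)
rownames(A1) <- paste0(lifted$V1, ":", lifted$V2 + 1L, "|", lifted$V3)
print(A1)
```

Matching by name also removes the need to read the unmapped file just to drop rows, since any ID missing from the output is simply not selected.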