Your code seems to be completely backwards to what you’re trying to achieve:
“For each gene (in d2) which SNPs (from d1) are within 10kb of that gene?”
First of all, your code for d1$matched is backwards. All your p's and d2s should be the other way round (currently it doesn't make much sense), giving you a list of SNPs that are in cis with each gene (+/- 10kb).
I would approach it the way I've phrased your question:
cisWindow <- 10000 # size of your +/- window, in this case 10kb.
d3 <- data.frame()
# For each gene, locate the cis-SNPs.
# seq_len() rather than 1:nrow(d2), so the loop is skipped if d2 has no rows.
for (i in seq_len(nrow(d2))) {
  # Broken down into steps for readability.
  inCis <- d1[which(d1[,"CHR"] == d2[i, "chromosome"]),]
  inCis <- inCis[which(inCis[,"POS"] >= (d2[i, "start"] - cisWindow)),]
  inCis <- inCis[which(inCis[,"POS"] <= (d2[i, "end"] + cisWindow)),]
  # Now we have the cis-SNPs, so let's build the data.frame for this gene,
  # and grow our data.frame d3:
  if (nrow(inCis) > 0) {
    d3 <- rbind(d3, cbind(d2[i,], inCis))
  }
}
I tried to find a solution which didn't involve growing d3 in the loop, but because you're attaching each row of d2 to 0 or more rows from d1, I wasn't able to come up with a solution that's not horribly inefficient.
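For what it's worth, one possible way to avoid calling rbind on d3 at every iteration is to collect only the matching row indices per gene and assemble the result once at the end. This is a rough sketch on made-up data, not something I've benchmarked: the column names CHR, POS, chromosome, start and end are the ones used above, while the SNP and gene columns and all the values are assumptions added purely for illustration.

```r
# Toy inputs. CHR, POS, chromosome, start and end follow the code above;
# the SNP and gene columns (and all values) are hypothetical.
d1 <- data.frame(SNP = c("rs1", "rs2", "rs3", "rs4"),
                 CHR = c(1, 1, 2, 2),
                 POS = c(5000, 50000, 12000, 900000))
d2 <- data.frame(gene = c("A", "B", "C"),
                 chromosome = c(1, 2, 3),
                 start = c(10000, 20000, 1000),
                 end = c(20000, 30000, 2000))

cisWindow <- 10000 # +/- window around each gene, 10kb

# For each gene, record which rows of d1 fall inside its cis window.
hits <- lapply(seq_len(nrow(d2)), function(i) {
  snpRow <- which(d1$CHR == d2$chromosome[i] &
                  d1$POS >= d2$start[i] - cisWindow &
                  d1$POS <= d2$end[i] + cisWindow)
  data.frame(geneRow = rep(i, length(snpRow)), snpRow = snpRow)
})

# Combine the small index tables once, then build d3 in a single step.
hits <- do.call(rbind, hits)
d3 <- cbind(d2[hits$geneRow, , drop = FALSE],
            d1[hits$snpRow, , drop = FALSE])
rownames(d3) <- NULL
```

Genes with no SNPs in their window contribute zero rows to hits, so they simply drop out of d3, the same as with the nrow(inCis) > 0 check in the loop version.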