[Solved] Creating a List of Lists for Three Reading Frames [closed]


I assume you have a DNA nucleotide sequence and want to know whether any of its three reading frames contains a gene. I recommend using Biopython for this, but there are other ways to do it without the “standard” solution. Here is one of my own.

The main problem is finding the positions of the methionine (start) codon and the stop codons; the length is simply the difference between those positions. To do that we need to know what they look like:

start = "AUG"
stop = ('UAA', 'UGA', 'UAG')

The start of the ORF is the codon that codes for methionine, and the stop can be any of the amber, ochre or opal codons.

Now we can collect the positions where they occur (the sequence here is DNA, so the codons are written with T instead of U):

sequence = "ATATAGACATCGAATACTAATAGCATACAGTCCAAATTCGGAGCCCGACATTCTTCGATAACGACCGCTGATTATATGGGGCTCCGTCTACTCTAGGAGTTCTTGGCTGAGCCCTTCTAATTACCCACCGGGTGGCACACCAAGGACCGAAAAACCGTGGCCCGTGGGGAAAGTATCAGAACGGTACGGACCGTTTTTCCACCTCAAGGGACACCTTTGTCCCCGCCAATTATGCCAACCTCTCATAGTATTATATCTCCTCAATTTCTATGTGCGCAGTGTCTGTATGTTAGGACGCGCATGCACTGAAATGGCGATAGTGTGAATACATAGGTCATCTTGTGCCAGTGGCTGACTGATCGTCTACAAGTGACAATGCTGTGAATAACAAGATTGTGCACATGTCTAACCCGTGAGCTGGAGCTCCATAGCTATGGAGCTCCAGCTCACGGGTTAGACATTTTACAGTAGCGTACATTTCTGGCCGACCAACAGTGCATGGAGTTCAAGGCACATCCTTACTAAATTCTCCGTGTCCAGATTTAACAGCGAAGACGCTTTCCACGGACACAAGTATGAAAAGCGGCCGAAGGGGTCATTTGGACCAATGGACTGTTAGCGATACGCAAGAGTGAAAGGCGGGCGATCCACATTACAAATCCCTATCAGGACTGCGAATAAGATTTTCCTGAATATGAGTGGTGTGGACAGAGCTATGTTTTTCGAATTCCGCACACTCGAGTGCGCGGCCTTCTCAGAGTTTTAAACTTTGCCTGGGTACTGATTATTATAGTCCAAGTAGAATAGTCACTCTATATTTTTAATAGAATGCGGGTGACACCGGCAAGAGAACCGAGCATTT"

start = "ATG"
stop = ('TAA', 'TGA', 'TAG')
stop_positions = []
start_positions = []

# Normal direction
for i in range(len(sequence)):  # Visit every position in the sequence
    if sequence.startswith(start, i):  # Is there a methionine codon at this position?
        start_positions.append(i)
    elif sequence.startswith(stop, i):  # Is there a stop codon here? (startswith accepts a tuple)
        stop_positions.append(i)

To scan in the opposite direction, reverse the string with sequence = sequence[::-1] and repeat the code above. (Note that the indexes you obtain will then refer to the reversed string.)
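
If what you actually need is the opposite strand rather than the same string read backwards, the usual approach is the reverse complement. The following is only a sketch under that assumption: reverse_complement and to_forward_index are hypothetical helper names, not part of the code above, and the input is assumed to be uppercase A, C, G, T.

```python
# Hypothetical helpers (not from the original answer).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def to_forward_index(rev_index, length, width=3):
    """Map the 0-based start of a codon found on the reversed string
    to the 0-based start of the same span on the original string."""
    return length - rev_index - width

example = "ATGAAATAG"
print(reverse_complement(example))            # CTATTTCAT
print(to_forward_index(0, len(example)))      # 6
```

The second helper shows one way to translate an index found on the reversed string back to the original coordinates.
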
Now we need to work out which start and stop pairs form an ORF. We can do so with a couple of loops:

# Pair each start with the first stop found after it (the frame is not checked here)
for start_ in start_positions:
    for stop_ in stop_positions:
        if stop_ <= start_:
            continue
        else:
            print("{}...{}".format(start_, stop_))
            break
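
The loops above pair a start with the next stop regardless of frame. One possible way to keep start and stop in the same frame is to walk the sequence codon by codon in each of the three frames. This is a sketch, not the solution to the exercise: find_orfs is a hypothetical name, it reports only the first start of each ORF (nested starts are skipped), and the stop index is the position of the first base of the stop codon.

```python
def find_orfs(seq, start="ATG", stops=("TAA", "TGA", "TAG")):
    """Return (frame, start_index, stop_index) tuples with 0-based indexes."""
    orfs = []
    for frame in range(3):
        open_start = None  # first start codon in this frame not yet closed by a stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == start and open_start is None:
                open_start = i
            elif codon in stops and open_start is not None:
                orfs.append((frame + 1, open_start, i))
                open_start = None
    return orfs

print(find_orfs("CCATGAAATAGGATGCCCTGA"))
# [(1, 12, 18), (3, 2, 8)]
```

Because the step is 3, a start and the stop that closes it always share the same frame, so the difference between the two indexes is a multiple of 3.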

Getting the frame is simple: take the start position modulo 3. The remainder (0, 1 or 2) is the frame number minus 1.
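
As a small illustration of the frame calculation, assuming 0-based positions and frames numbered 1 to 3 (frame_of is a hypothetical helper name):

```python
def frame_of(position):
    """Frame number (1, 2 or 3) of a 0-based start position."""
    return position % 3 + 1

for start_ in (75, 231, 269):
    print(start_, frame_of(start_))
# 75 1
# 231 1
# 269 3
```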

Here is the output of the above code. As you can see, I didn’t implement everything you need, but it is not very complicated to add.

75...93
231...245
269...290
286...290
300...306
310...317
375...381
401...406
433...454
498...522
575...576
607...616
694...695
715...763
828...835
Reversed sequence
# Note that the indexes start from 1; to get the real index, do len(sequence) - index
61...62 
82...101
286...287
387...393
392...393
575...582
612...613
676...688
687...688

Some notes:

Check that the length is more than 1 nucleotide. A faster way is to translate the sequence and take everything between a methionine and a stop codon. This code works fine on Python 3; with other versions you might need to change something.
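
The translation idea could look roughly like this in plain Python. It is a sketch under assumptions: the standard genetic code is used, unknown codons become "X", and translate and peptides are hypothetical helpers written for this example (Biopython has its own translation facilities, which are not used here).

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna, frame=0):
    """Translate a DNA string starting at the given 0-based frame offset."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def peptides(protein):
    """Return the stretches that run from the first M of a segment to its stop."""
    found = []
    for segment in protein.split("*")[:-1]:  # the last segment has no closing stop
        m = segment.find("M")
        if m != -1:
            found.append(segment[m:])
    return found

dna = "CCATGAAATAGGATGCCCTGA"
for frame in range(3):
    protein = translate(dna, frame)
    print(frame + 1, protein, peptides(protein))
# 1 P*NRMP* ['MP']
# 2 HEIGCP []
# 3 MK*DAL ['MK']
```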

There is much to improve in this code to get the exact result you want. If you can’t adapt it, or you don’t really understand how it works, comment below; if you have made progress or run into a major problem, start a new question. (I don’t want to provide a direct answer to a Rosalind problem.)
