Description
Currently the only way to extract the amino-acid sequence from a PDB file is to chain two functions, `load_from_rcsb` and `struct_to_aaseq`, like this:
from pyaptamer.datasets import load_from_rcsb
from pyaptamer.utils import struct_to_aaseq
print(struct_to_aaseq(load_from_rcsb("8UF8")))
So `load_from_rcsb` converts the PDB file to a Structure, and `struct_to_aaseq` converts the Structure to a string. The problem with this approach is that Biopython builds the Structure from the ATOM records of the PDB file, since those carry the 3D coordinate data. Hence some parts of the sequence don't get captured, because the 3D structure of those parts is unknown. The SEQRES records, on the other hand, contain the "simplified" version of the protein (the residue sequence of each chain) and give the "correct" sequence when extracted (as described in #138). Hence we should add a PDB-to-string converter, as it would skip the conversion to Structure and produce the complete sequence.
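To make the proposal concrete, here is a minimal sketch of what such a converter could look like. It is not existing pyaptamer code: the function name `seqres_to_aaseq` and its return type (a dict mapping chain ID to sequence) are hypothetical, and it assumes the fixed-column layout of SEQRES records (chain ID in column 12, residue names from column 20 onward). Non-standard residues are mapped to `X`. Biopython's `SeqIO` also offers a `pdb-seqres` format that could serve the same purpose.

```python
from collections import defaultdict

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_aaseq(pdb_text):
    """Hypothetical helper: build one sequence per chain from SEQRES records.

    Only SEQRES lines are read, so residues without ATOM coordinates
    are still included. Unknown residue names become "X".
    """
    residues_by_chain = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11:12]           # column 12 of the record
        residue_names = line[19:].split()  # columns 20+ hold residue names
        residues_by_chain[chain_id].extend(residue_names)
    return {
        chain: "".join(THREE_TO_ONE.get(name, "X") for name in names)
        for chain, names in residues_by_chain.items()
    }


# Tiny hand-written example, not taken from a real entry.
SAMPLE_PDB = (
    "HEADER    EXAMPLE\n"
    "SEQRES   1 A    5  GLY ILE VAL GLU GLN\n"
    "SEQRES   1 B    3  PHE VAL ASN\n"
)

if __name__ == "__main__":
    print(seqres_to_aaseq(SAMPLE_PDB))  # {'A': 'GIVEQ', 'B': 'FVN'}
```

Returning one sequence per chain keeps chains separate; whether the real converter should return a dict, a list, or a single concatenated string is an open design question.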
@KubiczekD @JaBirke please correct me if I am wrong.