[ENH] PDB to string loader using SEQRES records #147

@satvshr

Description

Currently, the only way to extract the amino acid sequence from a PDB file is to chain two functions, load_from_rcsb and struct_to_aaseq, like this:

from pyaptamer.datasets import load_from_rcsb
from pyaptamer.utils import struct_to_aaseq

print(struct_to_aaseq(load_from_rcsb("8UF8")))

Here, load_from_rcsb converts the PDB file to a Structure, and struct_to_aaseq converts the Structure to a string. The problem with this approach is that Biopython builds the Structure from the ATOM records of the PDB file, since those hold the 3D coordinate data. Hence, some parts of the sequence are not captured, because the 3D structure of those parts is unknown. The SEQRES records, on the other hand, contain the "simplified" version of the protein and give the "correct" sequence when extracted (as described in #138). Hence, we should add a PDB-to-string converter, as it would skip the conversion to Structure and produce the complete sequence.
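
To make the proposal concrete, here is a minimal sketch of what reading the sequence-record lines directly could look like. It uses only the standard library and the fixed-column layout of the PDB format (chain ID in column 12, residue names from column 20 onward). The function name seqres_to_aaseq, the THREE_TO_ONE table, and the inline sample are hypothetical and not part of pyaptamer; residues outside the 20 standard amino acids are mapped to "X" as an assumption. Biopython's SeqIO also offers a "pdb-seqres" format that could be used instead.

```python
# Hypothetical helper, not an existing pyaptamer function.
# Maps the 20 standard three-letter residue names to one-letter codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_aaseq(pdb_text):
    """Return {chain_id: one-letter sequence} built from SEQRES records."""
    chains = {}
    for line in pdb_text.splitlines():
        # Skip everything that is not a SEQRES record with residue columns.
        if not line.startswith("SEQRES") or len(line) < 20:
            continue
        chain_id = line[11]              # column 12 of the PDB format
        residues = line[19:70].split()   # columns 20-70 hold residue names
        # Assumption: unknown or modified residues become "X".
        codes = [THREE_TO_ONE.get(name, "X") for name in residues]
        chains.setdefault(chain_id, []).extend(codes)
    return {chain: "".join(codes) for chain, codes in chains.items()}


# Made-up sample for illustration only; not taken from 8UF8.
sample = "\n".join([
    "HEADER    EXAMPLE",
    "SEQRES   1 A    5  MET GLY SER ALA LYS",
    "SEQRES   1 B    3  TRP MSE VAL",
])

sequences = seqres_to_aaseq(sample)
print(sequences)
```

Returning one string per chain keeps the chains separate. Whether the loader should concatenate them, as a single-string return type would imply, is left open here.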

@KubiczekD @JaBirke please correct me if I am wrong.
