[ENH] `pdb` to `String` loader using `SEQRES` records #147

Currently the only way to extract the amino acid sequence from a pdb file is to chain two functions, `load_from_rcsb` and `struct_to_aaseq`, like this:
```
from pyaptamer.datasets import load_from_rcsb
from pyaptamer.utils import struct_to_aaseq

print(struct_to_aaseq(load_from_rcsb("8UF8")))
```
So `load_from_rcsb` converts from pdb to Structure, and `struct_to_aaseq` converts from Structure to String. The problem with this approach is that biopython builds the Structure from the `ATOM` records of the pdb file, since those hold the 3D structure data. Hence, parts of the sequence whose 3D structure is unknown don't get captured. `SEQRES`, on the other hand, contains the "simplified" version of the protein and yields the "correct" sequence when extracted (as described in #138). Hence we should add a pdb to String converter, as it would skip the conversion to Structure and produce the complete sequence.
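To make the idea concrete, here is a minimal sketch of reading `SEQRES` records directly from the text of a pdb file. The helper name `seqres_to_aaseq` and the `THREE_TO_ONE` table are hypothetical and not part of pyaptamer; the column positions follow the fixed-width PDB layout (chain ID in column 12, residue names from column 20 onward), and anything outside the 20 standard amino acids is mapped to `X` as a simplifying assumption.

```python
# Hypothetical helper, not part of pyaptamer: a sketch of a SEQRES-based loader.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_aaseq(pdb_text):
    """Return {chain_id: one-letter sequence} built from SEQRES records only."""
    chains = {}
    for line in pdb_text.splitlines():
        # Skip everything that is not a SEQRES record (ATOM, HETATM, ...).
        if not line.startswith("SEQRES") or len(line) < 20:
            continue
        chain_id = line[11]           # column 12 in the PDB format
        residues = line[19:].split()  # three-letter residue names, column 20 onward
        # Assumption: non-standard residues are reported as "X".
        chains.setdefault(chain_id, []).extend(
            THREE_TO_ONE.get(name, "X") for name in residues
        )
    return {chain: "".join(seq) for chain, seq in chains.items()}
```

The result is keyed by chain ID because a pdb file can hold several chains; whether the loader should return one String per chain or a single concatenated String is an open design choice. Biopython's `SeqIO.parse` also accepts a `"pdb-seqres"` format, which might be a simpler basis for the actual implementation.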

@KubiczekD @JaBirke please correct me if I am wrong.
