[ENH] Generalized PSeAAC algorithm #59

@satvshr

The current implementation of the PSeAAC algorithm was written solely to reproduce AptaNet, and it uses the specific PSeAAC variant that AptaNet describes.

The purpose of this PR is to generalize the algorithm, since it is a very popular method in bioinformatics. The original paper has more than 1,500 citations, and a Google Scholar search turns up numerous papers that reference it in the context of aptamer design.

The problems with the current implementation are:

  1. It is tuned for a k-mer length of 4, but incorrectly: a k-mer of length 4 gives the best results with 18 physicochemical properties, not the 21 used in the current implementation.
  2. It uses 21 properties for every k-mer length, which is not optimal, since each k-mer length has its own preferred number of physicochemical properties.
  3. Properties can only be combined in groups of 3, which is needlessly rigid.
  4. Arbitrary property indices cannot be grouped together; only groups of 3 in ascending order are allowed.
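To make points 3 and 4 concrete, here is a minimal sketch of what a generalized interface could look like. It is not the existing implementation. The function name `pseaac`, the parameters `properties`, `groups`, `lam` and `weight`, and their defaults are all hypothetical, and it assumes each property group yields its own block of 20 composition features plus `lam` sequence-order features. The property values are placeholder numbers, not real physicochemical data, and the usual per-property normalization is left out for brevity.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pseaac(sequence, properties, groups, lam=2, weight=0.05):
    """Concatenate one (20 + lam)-long PseAAC block per property group.

    properties: list of dicts, each mapping amino acid -> float.
    groups: list of tuples of indices into `properties` (any size, any order).
    """
    if not 0 < lam < len(sequence):
        raise ValueError("lam must be positive and smaller than the sequence length")
    if any(len(group) == 0 for group in groups):
        raise ValueError("every property group needs at least one index")

    counts = Counter(sequence)
    freqs = [counts[a] / len(sequence) for a in AMINO_ACIDS]

    features = []
    for group in groups:
        thetas = []
        for k in range(1, lam + 1):
            # Mean squared property difference between residues k positions apart.
            total = 0.0
            for i in range(len(sequence) - k):
                a, b = sequence[i], sequence[i + k]
                total += sum(
                    (properties[p][a] - properties[p][b]) ** 2 for p in group
                ) / len(group)
            thetas.append(total / (len(sequence) - k))

        denom = sum(freqs) + weight * sum(thetas)
        features.extend(f / denom for f in freqs)
        features.extend(weight * t / denom for t in thetas)
    return features


# Placeholder property tables (illustrative numbers only, not real data).
toy_properties = [
    {a: ((i * (p + 3)) % 7) / 7 for i, a in enumerate(AMINO_ACIDS)}
    for p in range(6)
]

# Groups of different sizes, with indices in arbitrary order.
features = pseaac(
    "ACDKLMWYAC", toy_properties, groups=[(0, 1, 2), (5, 3), (4,)], lam=3
)
print(len(features))
```

Passing the groups in as plain index tuples means the number of properties and the group size become caller decisions, so a configuration with 18 properties for k-mer length 4 would not need any special casing inside the function.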

Labels: enhancement