A broad breakdown of the tasks to be done:
- Fetch the data from here, keeping only the category of images that corresponds to lung cancer: https://stanfordmlgroup.github.io/projects/chexnext/
- Train a classifier that maximizes accuracy on this data. Feel free to rely heavily on transfer learning and build on pretrained CNNs.
- Apply Grad-CAM to the classifier's output to highlight the pixels that most strongly correspond to the predicted class: https://github.com/ramprs/grad-cam
- Use this dataset to train variational autoencoders (VAEs) for data augmentation (a useful reference: https://www.scitepress.org/Papers/2018/66186/66186.pdf).
- Put everything together on a GitHub page, as requested.
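For the data-fetching step above, the category filter can be sketched with Python's standard csv module. This is a minimal sketch under hypothetical assumptions: a label CSV with one row per image, an image-path column called `Path`, and one column per finding holding 1.0 (positive), 0.0 (negative), -1.0 (uncertain) or a blank cell. The function name `select_positive_images` and the column name `Lung Lesion` are placeholders, so check the actual dataset files for the real layout and label names.

```python
import csv
import io
from typing import List, TextIO


def select_positive_images(
    label_file: TextIO,
    label_column: str,
    path_column: str = "Path",
) -> List[str]:
    """Return the image paths whose `label_column` is marked positive (1 or 1.0).

    Assumption: blank cells mean "not mentioned" and -1 means "uncertain";
    both are treated as non-positive here.
    """
    reader = csv.DictReader(label_file)
    selected = []
    for row in reader:
        value = (row.get(label_column) or "").strip()
        # float() accepts both "1" and "1.0"; empty cells are skipped.
        if value and float(value) == 1.0:
            selected.append(row[path_column])
    return selected


if __name__ == "__main__":
    # Tiny in-memory CSV standing in for the real (hypothetical) label file.
    sample = io.StringIO(
        "Path,Lung Lesion\n"
        "img/0001.jpg,1.0\n"
        "img/0002.jpg,0.0\n"
        "img/0003.jpg,\n"
        "img/0004.jpg,-1.0\n"
    )
    print(select_positive_images(sample, "Lung Lesion"))
```

Treating uncertain labels as negative is only one policy; mapping them to positive, or dropping those rows entirely, are equally plausible choices and can change the classifier noticeably.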
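For the Grad-CAM step above, the core computation is short: capture the activations of a chosen convolutional layer, backpropagate the score of the target class, average the gradients over each channel to get channel weights, and take the ReLU of the weighted sum of activation maps. The sketch below is a from-scratch PyTorch version written for illustration, not the code from the linked repository. The function name `grad_cam` is hypothetical, and the choice of target layer (for example `model.layer4` on a torchvision ResNet) is an assumption.

```python
import torch
import torch.nn.functional as F
from torch import nn


def grad_cam(model: nn.Module, target_layer: nn.Module,
             image: torch.Tensor, class_idx: int = 0) -> torch.Tensor:
    """Grad-CAM heatmap of shape (H, W), scaled to [0, 1], for one image (1, C, H, W)."""
    activations, gradients = [], []

    def save_activation_and_grad(module, inputs, output):
        activations.append(output)
        # Tensor hook: receives d(class score)/d(output) during backward.
        output.register_hook(gradients.append)

    handle = target_layer.register_forward_hook(save_activation_and_grad)
    try:
        model.eval()
        model.zero_grad()
        # Requiring grad on the input keeps the graph intact even when the
        # backbone parameters were frozen for transfer learning.
        image = image.detach().clone().requires_grad_(True)
        scores = model(image)            # (1, num_classes) raw logits
        scores[0, class_idx].backward()  # gradient of the chosen class score
    finally:
        handle.remove()

    acts, grads = activations[0], gradients[0]    # both (1, K, h, w)
    alpha = grads.mean(dim=(2, 3), keepdim=True)  # channel weights: averaged gradients
    cam = F.relu((alpha * acts).sum(dim=1, keepdim=True))  # keep positive evidence only
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam[0, 0] - cam.min()
    return (cam / cam.max().clamp(min=1e-8)).detach()
```

With a single-logit binary classifier, `class_idx=0` explains the positive-class logit, for example `grad_cam(model, model.layer4, x)`. The resulting map can then be overlaid on the X-ray, for instance with matplotlib's `imshow` and an `alpha` value.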
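For the VAE step above, a variational autoencoder encodes each image into the mean and log-variance of a Gaussian latent, samples from it with the reparameterization trick, decodes, and is trained on reconstruction loss plus a KL term. New images for augmentation are then decoded from random latent vectors. This is a generic small convolutional VAE, not the architecture from the referenced paper. It assumes 1x64x64 grayscale inputs scaled to [0, 1] and a 32-dimensional latent space. `ConvVAE`, `vae_loss` and `sample_images` are hypothetical names. To augment one class specifically, one simple option is to train the VAE only on images of that class.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ConvVAE(nn.Module):
    """Small convolutional VAE for 1x64x64 images with pixel values in [0, 1]."""

    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.latent_dim = latent_dim
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),    # -> 32 x 32 x 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # -> 64 x 16 x 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # -> 128 x 8 x 8
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(128 * 8 * 8, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, 128 * 8 * 8)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # -> 16 x 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # -> 32 x 32
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # -> 64 x 64
        )

    def encode(self, x):
        h = self.encoder(x)
        return self.fc_mu(h), self.fc_logvar(h)

    def decode(self, z):
        return self.decoder(self.fc_dec(z).view(-1, 128, 8, 8))

    def forward(self, x):
        mu, logvar = self.encode(x)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decode(z), mu, logvar


def vae_loss(recon, x, mu, logvar):
    """Negative ELBO: BCE reconstruction plus KL(q(z|x) || N(0, I)), summed over the batch."""
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld


@torch.no_grad()
def sample_images(model: ConvVAE, n: int) -> torch.Tensor:
    """Decode n latent vectors drawn from N(0, I) into synthetic images."""
    model.eval()
    device = next(model.parameters()).device
    return model.decode(torch.randn(n, model.latent_dim, device=device))


if __name__ == "__main__":
    vae = ConvVAE()
    optimizer = torch.optim.Adam(vae.parameters(), lr=1e-3)
    batch = torch.rand(8, 1, 64, 64)  # stand-in for resized, [0, 1]-scaled X-rays
    for step in range(3):
        recon, mu, logvar = vae(batch)
        loss = vae_loss(recon, batch, mu, logvar)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(sample_images(vae, 4).shape)  # candidate augmented images
```

VAE samples tend to be blurry, so it is worth checking whether adding them to the training set actually improves validation accuracy before relying on them.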
A couple of other useful references:
- The paper itself, which gives a very extensive account of this task (or rather, of its first part): https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1002686
- A description of the dataset: https://arxiv.org/pdf/1901.07031.pdf
- General PyTorch tutorials: https://pytorch.org/tutorials/