Machine learning & AI

Literature, Course and Book Recommendations

1. Getting Started with Literature Reading

  • Deep Learning: 2015 Nature Biotech. - DeepBind: Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning

  • Cancer Locator: 2017 Genome Biology - CancerLocator: non-invasive cancer diagnosis and tissue-of-origin prediction using methylation profiles of cell-free DNA

  • Batch correction: 2018 Nature Biotech. - Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors

PDFs

2. Deep Learning on RNA

  • Alternative splicing (AS) of RNA: 2019 Cell - Predicting Splicing from Primary Sequence with Deep Learning

  • AS of RNA (DARTS): 2019 Nature Methods - Deep-learning augmented RNA-seq analysis of transcript splicing

  • Alternative polyadenylation (APA) of RNA: 2019 Cell - A Deep Neural Network for Predicting and Engineering Alternative Polyadenylation

  • RNA/DNA-Protein Binding (DeepBind): 2015 Nature Biotech. - DeepBind: Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning

  • RNA secondary structure: see below

  • Cancer prediction using exRNA-seq: see below

PDFs

3. RNA Structure

3.1 RNA Secondary Structure Prediction

RNAstructure/Mfold and RNAfold perform well for sequences shorter than 200 nt.

SuperFold uses the partition program in the RNAstructure package to compute partition functions for subsequences of a long RNA, then merges the results. It therefore claims to perform better on long-distance base pairs.
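The fold-in-windows-then-merge idea described above can be illustrated with a small sketch. This is not SuperFold's actual code or parameters. It assumes one plausible merge rule: average each base-pair probability over all windows that contain both positions. The names `windows`, `toy_fold` and `merge_windowed_pair_probs` are hypothetical, and `toy_fold` is only a placeholder for a real partition-function calculation such as the one in RNAstructure.

```python
from typing import Callable, Dict, List, Tuple

PairProbs = Dict[Tuple[int, int], float]

# Watson-Crick and G-U wobble pairs, used only by the toy folding function.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def windows(length: int, size: int, step: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) windows that together cover the whole sequence."""
    if length <= size:
        return [(0, length)]
    starts = list(range(0, length - size + 1, step))
    if starts[-1] + size < length:  # make sure the 3' end is covered
        starts.append(length - size)
    return [(s, s + size) for s in starts]


def toy_fold(sub: str) -> PairProbs:
    """Hypothetical stand-in for a real partition-function calculation.

    It pairs position k with its mirror position in the window when the two
    bases are complementary and at least 3 nucleotides lie between them.
    """
    n = len(sub)
    probs: PairProbs = {}
    for k in range(n // 2):
        i, j = k, n - 1 - k
        if j - i > 3 and (sub[i], sub[j]) in PAIRS:
            probs[(i, j)] = 1.0
    return probs


def merge_windowed_pair_probs(
    seq: str,
    fold_window: Callable[[str], PairProbs],
    size: int,
    step: int,
) -> PairProbs:
    """Fold each window separately, then average probabilities across windows."""
    spans = windows(len(seq), size, step)
    sums: PairProbs = {}
    for start, _end in spans:
        for (i, j), p in fold_window(seq[start:_end]).items():
            key = (i + start, j + start)  # shift local indices to global ones
            sums[key] = sums.get(key, 0.0) + p
    merged: PairProbs = {}
    for (i, j), total in sums.items():
        # A window that contains both i and j but did not report the pair
        # contributes a probability of 0 to the average.
        covering = sum(1 for s, e in spans if s <= i and j < e)
        merged[(i, j)] = total / covering
    return merged


if __name__ == "__main__":
    print(merge_windowed_pair_probs("AAGAAAACAAAA", toy_fold, size=10, step=2))
```

In a real pipeline the placeholder would be replaced by per-window base-pair probabilities from an actual folding program. The window size and step then limit the longest pair that can be detected, since a pair is only visible to windows that contain both of its positions.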

  • SCFG Model (Rfam/Infernal)

    • What is a hidden Markov model? (Sean R Eddy) 2004 Nature Biotech

    • You can read the SCFG section in the book above. (Need a short tutorial for SCFG.)

  • Deep Learning Method

    • 2020 ICLR - RNA Secondary Structure Prediction By Learning Unrolled Algorithms (Chinese comments)

    • (Transfer learning) 2019 Nature Commun. - SPOT-RNA: RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning

PDFs & PPTs

3.2 Structural Motif Finder

PDFs & PPTs

4. Computational Methods for Liquid Biopsy

4.1 Imputation, Normalization and Batch Correction

  • Methods for Single Cell RNA-seq

    • (1) Dropout/Sparseness and Imputation

    • (2) Heterogeneity and Normalization

    • (3) Batch effect and Confounder

    • (Pseudo-time and Others)

  • Recommendations:

    • 2020 Nature Commun. - Embracing the dropouts in single-cell RNA-seq analysis

    • 2019 Nature Methods - A discriminative learning approach to differential expression analysis for single-cell RNA-seq

    • 2017 Nature Methods - Normalizing single-cell RNA sequencing data: challenges and opportunities

PDFs

  • (1) Dropout/Sparseness and Imputation

  • (2) Heterogeneity and Normalization

  • (3) Batch effect and Confounder

  • (Pseudo-time and Others)

Tutorial

4.2 Feature Selection Method

PDFs

4.3 Network Approach and Clustering

  • 2020 Bioinformatics - DeepType: Deep-learning approach to identifying cancer subtypes using high-dimensional genomic data

  • 2018 Nature Commun. - Pathway based subnetworks enable cross-disease biomarker discovery

PDFs

4.4 Tumor Location Method

4.5 Transfer Learning

  • 2020 Bioinformatics - Exploiting transfer learning for the reconstruction of the human gene regulatory network

  • (see another example of transfer learning in SPOT-RNA above)

PDFs