Pipelines for the SARS-CoV-2 subgenome project

This repository includes all the code and analysis pipelines for studying the SARS-CoV-2 subgenome landscape.

Fork me on GitHub

Data analysis for the nCoV project, carried out in collaboration with the labs of Drs. Yu Chen and Ke Lan at KLV.

Integrative Gene Isoform Assembler

Several high-throughput sequencing techniques are currently available for transcriptome profiling. Next-generation sequencing (NGS) based RNA-seq, which generates millions of short reads, is often used for gene expression profiling, but it cannot accurately identify full-length transcripts, not to mention the potential amplification biases introduced during library construction. PacBio sequencing offers long reads, with average read lengths over 10 kb, but is hindered by lower throughput, a higher error rate (11%-15%), and higher cost. We devised a computational pipeline named Integrative Gene Isoform Assembler (IGIA) to reconstruct accurate gene structures from PacBio long reads improved by ssRNA-seq correction, combined with TSS/TES boundary information.

A user-friendly omics database of Malvaceae species

Malvaceae is a family of flowering plants containing many economically important species, including cotton, cacao, and durian. Recently, the genomes of several Malvaceae species have been decoded, and a large amount of omics data has been generated for individual species. However, no integrative database covering multiple species, which would let users jointly compare and analyse relevant data, is available for Malvaceae. We therefore developed a user-friendly database named MaGenDB as a functional genomics hub for the plant community. We collected all available Malvaceae genomes and comprehensively annotated genes from several perspectives, including functional RNA/protein elements, gene ontology, KEGG orthology, and gene families. We processed more than 300 sets of diverse omics data with standard pipelines, integrated them into a customised genome browser, and designed multiple dynamic charts to present gene/RNA/protein-level knowledge such as dynamic expression profiles and functional elements. We also implemented a smart search system for mining genes efficiently. In addition, we constructed a functional comparison system to support comparative analysis of genes across multiple features, within one species or across closely related species. This database and its associated tools will allow users to quickly retrieve large-scale functional information for biological discovery.
