2008 ~ 2010 (Harvard University, Dana-Farber Cancer Institute)
Computational models for cataloging smRNAs (smRNA-Seq) and RNA-mediated Transcriptional Gene Silencing
Generally, regulatory ncRNAs can be classified primarily by size: long ncRNAs (>40 nt) and small ncRNAs (20~25 nt, smRNAs). Long ncRNAs usually possess miRNA-like signatures and influence specific genes located antisense to or near them, while smRNAs are involved in a wide spectrum of pathways and function in either trans- or cis-mode. In plants, endogenous smRNAs comprise microRNAs (miRNAs) and short-interfering RNAs (siRNAs), which are produced by distinct pathways; the latter have multiple subclasses. Two of these subclasses, trans-acting siRNAs (ta-siRNAs) and natural-antisense-transcript-derived siRNAs (nat-siRNAs), arise from TAS genes and convergent gene pairs, respectively, and are incorporated into RISC (RNA-induced silencing complex) to mediate translational repression or mRNA degradation. The remaining siRNA classes, mostly derived from tandem repeats or transposable elements (TEs), are found in the RITS (RNA-induced initiation of transcriptional silencing, [yeast]) or RdDM (RNA-directed DNA methylation, [Arabidopsis]) complexes, where they mediate epigenetic changes by targeting nascent RNAs during transcription. However, the function and biogenesis pathways of many siRNA classes remain poorly characterized.
- A. classification of small RNAs in plants
- B. smRNA-Seq reads mapped within a genomic locus are defined as a Pri-TU (primary transcript unit)
- C. miRNAs are separated from the siRNAs
- D. Principal Component Analysis (PCA) was used to predict new miRNA candidates
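Step B above can be illustrated with a minimal sketch that groups mapped smRNA-Seq reads into locus-level clusters standing in for Pri-TUs. The function name `cluster_reads`, the `(chromosome, start, end)` read tuple, and the `max_gap` parameter are hypothetical choices for illustration; the original work does not state how a genomic locus was delimited.

```python
from typing import List, Tuple

# A mapped read as (chromosome, start, end); coordinates are 1-based, inclusive.
Read = Tuple[str, int, int]
# A cluster as (chromosome, start, end, number_of_reads).
Cluster = Tuple[str, int, int, int]


def cluster_reads(reads: List[Read], max_gap: int = 0) -> List[Cluster]:
    """Merge mapped reads into clusters (assumed stand-ins for Pri-TUs).

    A read joins the current cluster when it lies on the same chromosome and
    its start is no more than `max_gap` bases past the cluster end.
    `max_gap` is an assumed tuning parameter, not a value from the source.
    """
    clusters: List[Cluster] = []
    # Sorting by (chromosome, start, end) lets us sweep once, left to right.
    for chrom, start, end in sorted(reads):
        if clusters:
            c_chrom, c_start, c_end, c_count = clusters[-1]
            if chrom == c_chrom and start <= c_end + max_gap:
                # Extend the open cluster and count the read.
                clusters[-1] = (c_chrom, c_start, max(c_end, end), c_count + 1)
                continue
        # Otherwise open a new cluster seeded by this read.
        clusters.append((chrom, start, end, 1))
    return clusters


if __name__ == "__main__":
    example = [
        ("chr1", 100, 121),
        ("chr1", 110, 131),
        ("chr1", 500, 521),
        ("chr2", 100, 121),
    ]
    for cluster in cluster_reads(example, max_gap=50):
        print(cluster)
```

Each resulting cluster could then be passed to a downstream step that separates miRNA-like loci from siRNA loci (step C); that step is not sketched here because the source does not describe its criteria.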
2007 ~ 2008 Harvard University and Yale University
Epigenome, transcriptome and smRNAome by high-throughput Solexa sequencing (ChIP-Seq, RNA-Seq)
Shirley’s Lab and Deng Lab collaborated on Illumina/Solexa sequencing to generate data for four histone modifications (H3K4, H3K36 and H3K27 tri-methylation, and H3K9 acetylation), plus DNA methylation, mRNA and small RNA data in maize, rice and Arabidopsis. This highly integrated epigenomic dataset not only provides an opportunity to interpret the relationships among epigenetic marks, gene transcription and small RNAs, but also allows comparison between species (Wang et al., 2009, The Plant Cell).
2006 ~ 2007 MCDB, Yale University
Mapping of H3K4me2, H3K4me3, H3K9me2, H3K27me3 and DNA methylation in rice and Arabidopsis (ChIP-Chip)
This analysis reveals combinatorial interactions among these epigenetic modifications, chromatin structure and gene expression, and we found several rules governing the epigenetic marks tested:
- 1. Cytologically densely stained heterochromatin had less H3K4me2 and H3K4me3 and more methylated DNA than the less densely stained euchromatin, whereas centromeres had a unique epigenetic composition.
- 2. Protein-coding genes had both methylated DNA and di- and/or trimethylated H3K4. Methylation of DNA but not H3K4 was correlated with suppressed transcription.
- 3. If DNA and H3K4 were comethylated, transcription was only slightly reduced.
- 4. Transcriptional activity was positively correlated with the ratio of H3K4me3/H3K4me2: genes with predominantly H3K4me3 were actively transcribed, whereas genes with predominantly H3K4me2 were transcribed at moderate levels.
- 5. More protein-coding genes contained all three modifications, and more transposons contained DNA methylation, in shoots than in cultured cells. Differential epigenetic modifications correlated with tissue-specific expression between shoots and cultured cells.
Collectively, this study provides insights into the rice epigenomes and their effect on gene expression and plant development. The Plant Cell (2008 Feb); 20: 259-276
Our data support a model in which genes in rice chromatin are marked by different epigenetic modifications whose combinations determine distinct gene expression states. There are four typical chromatin states with respect to the three epigenetic modifications examined in this study (Figure 6D). DNA methylation in the absence of methylated H3K4 (state 1) marks a gene for silencing, resulting in a condensed chromatin structure that impedes transcription. The presence of H3K4me2, even in the presence of DNA methylation (state 2), alters the chromatin structure to a form permissive for initiation of transcription. The presence of moderate amounts of H3K4me3 (state 3) adjusts the chromatin to a state permitting more active transcription. Finally, if H3K4me3 is the dominant modification (state 4), the chromatin adopts a conformation permitting maximal transcription.
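The four-state model can be written down as a small decision rule. This is only a sketch: the function name `chromatin_state`, the use of numeric signal levels, and the cut-offs (a mark is "present" when its signal is above zero, and H3K4me3 is "dominant" when it exceeds H3K4me2) are assumptions made for illustration, not thresholds reported in the study.

```python
from typing import Optional


def chromatin_state(dna_methylated: bool, h3k4me2: float, h3k4me3: float) -> Optional[int]:
    """Assign one of the four chromatin states described in the text.

    Assumed conventions (not from the source): a histone mark is "present"
    when its signal is > 0, and H3K4me3 is "dominant" when it is larger than
    the H3K4me2 signal. Returns None for loci carrying none of the three
    marks, which the four-state model does not cover.
    """
    has_me2 = h3k4me2 > 0
    has_me3 = h3k4me3 > 0

    if not has_me2 and not has_me3:
        # State 1: DNA methylation without methylated H3K4 (silenced).
        return 1 if dna_methylated else None
    if not has_me3:
        # State 2: H3K4me2 present, with or without DNA methylation.
        return 2
    if h3k4me3 > h3k4me2:
        # State 4: H3K4me3 is the dominant modification (maximal transcription).
        return 4
    # State 3: moderate H3K4me3 alongside H3K4me2 (more active transcription).
    return 3


if __name__ == "__main__":
    examples = [
        (True, 0.0, 0.0),
        (True, 1.2, 0.0),
        (False, 1.2, 0.6),
        (False, 0.4, 2.0),
    ]
    for dna, me2, me3 in examples:
        print(dna, me2, me3, "-> state", chromatin_state(dna, me2, me3))
```

In practice the presence and dominance calls would come from array signal statistics rather than fixed cut-offs; the rule above only mirrors the ordering of the states in the text.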
2006 MCDB, Yale University
Statistical analysis of tiling-path microarrays. A review chapter in the book: Oligonucleotide microarray sequence analyses
Illustrations of tiling arrays for mRNA analysis, noncoding RNA analysis and ChIP-on-chip experiments. A) A tiling array with an average resolution of 46 bp, used to experimentally confirm predicted gene structures and identify novel transcriptionally active regions (TARs). B) A tiling array with a resolution of 5 bp; the higher the resolution of the tiling array, the smaller the exons that can be identified. C) Tiling arrays in a ChIP-on-chip experiment for detecting histone H3 lysine acetylation. Pink peaks show the distribution of P values calculated by a Hidden Markov Model. The second track is a heat map of the tiling array signal, in which intense yellow regions represent stronger ChIP enrichment. D) High-resolution (5 bp) tiling arrays used for analysis of non-coding RNA transcripts.
2005 ~ 2006 MCDB, Yale University
Global analyses of intergenic transcriptionally active regions (TAR) in rice by tiling-path microarrays
Genome tiling-path microarray experiments in several model organisms have discovered rich transcription activity beyond annotated genes. These transcriptionally active regions (TARs) have been regarded as the “Dark Matter” of the genome. In the third phase of the rice genome transcription study, we conducted a global identification and characterization of TARs in the rice japonica subspecies. Using a less stringent criterion, we identified a total of 25,352 and 27,747 TARs not encoded by annotated exons in the two rice subspecies, japonica and indica, respectively. Approximately two thirds of all TARs are conserved between japonica and indica. Subsequent analysis indicated that about 80% of the japonica TARs can be assigned to various putative functions and structural elements of the rice genome, including splicing variants, uncharacterized portions of incompletely annotated genes, antisense transcripts, duplicated gene fragments, and potential non-coding RNAs. PLoS ONE; 2(3): e294
2005 (National Institute of Biological Sciences, Peking University, Beijing)
NMPP, A software for processing NimbleGen microarray data
The NMPP package is a bundle of user-customized tools, based on established algorithms and methods, for processing self-designed NimbleGen microarray data. It features a command-line-based integrative processing procedure that comprises five major functional components: the raw microarray data parsing and integrating module, the array spatial effect smoothing and visualization module, the probe-level multi-array normalization module, the gene expression intensity summarization module, and the gene expression status inference module. http://plantgenomics.biology.yale.edu/nmpp
Bioinformatics (2006 Dec); 22(23): 2955-7
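To make the probe-level multi-array normalization step more concrete, here is a minimal sketch of quantile normalization, one widely used multi-array method. Whether NMPP uses this particular method is an assumption; the function name `quantile_normalize` and the list-of-lists input layout are likewise hypothetical and are not part of the NMPP interface.

```python
from typing import List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize probe intensities across arrays.

    `arrays` holds one list of probe intensities per array; all lists must
    have the same length and the same probe order. After normalization every
    array shares the same intensity distribution (the mean of the sorted
    values across arrays). Ties are broken by probe order, a simplification.
    """
    if not arrays:
        return []
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")

    # Reference distribution: mean of the k-th smallest intensity over arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n_probes)
    ]

    normalized: List[List[float]] = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity in this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            # Replace each intensity by the reference value of the same rank.
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
    print(quantile_normalize(raw))
```

The normalized values could then feed a summarization step that combines the probes of each gene into a single expression intensity.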
2005 ~ 2006 (National Institute of Biological Sciences, Peking University, Beijing)
Transcriptional map of rice indica by genome-wide tiling-path microarrays
We conducted a comprehensive analysis of transcription activity in the rice indica genome and provided experimental evidence for the rice genome annotation based on computational prediction. Our analysis detected transcription activity for 35,970 (81.9%) annotated gene models and found that 10,425 (23.8%) gene models showed significant antisense transcription. We also identified 5,464 unique transcribed intergenic regions (TARs). 73.1% of the TARs are highly conserved in the rice japonica genome, while 44.7% of the TARs were found to be homologous to plant ESTs. Analysis of the frequency of simple sequence repeat (SSR) motifs indicated that the “GA” SSR motif was abundant in TARs. Nature Genetics; 38: 124 – 129
2004 ~ 2005 (National Institute of Biological Sciences, Peking University, Beijing)
Activity of transposable elements changes following developmental stages in rice (PCR-based Tiling array)
Rice chromosome 4 is unique in that the entire chromosome can be divided into a distinct heterochromatic half (0~17.5 Mb) and a euchromatic half (17.5~34 Mb). Our tiling analysis revealed a close correlation between transcriptional activity and chromosome organization, as well as developmental regulation of transcription activity at the chromosome level. In early developmental stages, the gene-rich euchromatic portion is more actively transcribed than the transposon-rich heterochromatic portion of the chromosome. In mature developmental stages, however, transcription of the transposon-related genes in heterochromatic regions increased markedly, whereas transcription of protein-coding genes in the euchromatic regions was reduced. The Plant Cell; 17(6):1641-57.
2003 ~ 2008 (NIBS, Peking University, Beijing Genomics Institute, CAS)
Microarray related projects
Gene microarray analysis related projects
- A microarray analysis of the rice transcriptome and its comparison to Arabidopsis. Genome Research. 15(9):1274-1283
- Global genome expression analysis of rice in response to drought and high-salinity stresses in shoot, flag leaf, and panicle. (2007 Mar) Plant Molecular Biology; 63(5):591-608. Epub 2007 Jan 16
- A Genome-Wide Transcription Analysis Reveals a Close Correlation of Promoter INDEL Polymorphism and Heterotic Gene Expression in Rice Hybrids. (2008 Aug) Molecular Plant; 1: 720-731
- Characterization of the genome expression trends in the heading-stage panicle of six rice lineages. Accepted by Genomics