A steganalysis-based approach
for genome-wide identification of regulatory DNA sequence elements
Guandong Wang and Weixiong Zhang
Genome-wide identification of cis-acting elements,
or transcription factor binding motifs (TFBMs), is a challenging problem.
We approach the problem by viewing the regulatory regions of a genome
as a stegoscript with over-represented words, i.e., TFBMs, being embedded
in a covertext. We model the stegoscript with a statistical model consisting
of a dictionary and a grammar, and progressively learn a series of models,
resulting in an efficient genome-wide motif finding algorithm called WordSpy.
From the promoters of 645 distinct cell-cycle related genes of S. cerevisiae,
WordSpy is able to identify all known cell-cycle related TFBMs with high
rankings based on two evaluation methods, a genome-wide Monte Carlo simulation
and a gene expression coherence measure. We further apply the method to
detect, de novo, putative cell-cycle related TFBMs of A. thaliana.
Several top-ranking motifs resemble the binding motifs of mitotic-specific
activation (MSA) and E2F transcription factors. WordSpy can also be applied
to identify discriminative motifs. By utilizing the ChIP-chip data of
Lee et al., we predict potential binding motifs of 113 known transcription
factors of budding yeast.
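
To make the stegoscript picture concrete, the following is a minimal toy sketch, not the WordSpy algorithm itself. It assumes a uniform random covertext, a hypothetical two-word dictionary, and a simple binomial z-score against an i.i.d. base-frequency background (which ignores word overlap). The function names, the insertion probability, and the scoring rule are all illustrative assumptions rather than details taken from the method.

```python
import math
import random
from collections import Counter


def make_stegoscript(length, words, insert_prob, rng):
    """Build a toy stegoscript: random covertext with dictionary words embedded.

    At each step a dictionary word is emitted with probability insert_prob;
    otherwise a single uniformly random base is emitted (assumed covertext).
    """
    pieces = []
    n = 0
    while n < length:
        if rng.random() < insert_prob:
            word = rng.choice(words)
            pieces.append(word)
            n += len(word)
        else:
            pieces.append(rng.choice("ACGT"))
            n += 1
    return "".join(pieces)


def kmer_zscores(seq, k):
    """Score each observed k-mer by a z-score under an i.i.d. base model.

    The expected count uses the empirical base frequencies of seq and a
    binomial approximation; this is a stand-in for a real background model.
    """
    total = len(seq)
    base_freq = Counter(seq)
    positions = total - k + 1
    counts = Counter(seq[i:i + k] for i in range(positions))
    scores = {}
    for word, count in counts.items():
        p = math.prod(base_freq[b] / total for b in word)
        mean = positions * p
        sd = math.sqrt(positions * p * (1.0 - p))
        scores[word] = (count - mean) / sd
    return scores


# Hypothetical dictionary of two illustrative words hidden in the covertext.
rng = random.Random(0)
dictionary = ["ACGCGT", "CACGAAA"]
script = make_stegoscript(20000, dictionary, 0.02, rng)
scores = kmer_zscores(script, 6)
for word, z in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(word, round(z, 1))
```

In this toy setting the embedded words, or 6-mers contained in them, should stand out with far higher scores than the surrounding covertext. The actual method learns the dictionary and grammar progressively instead of scoring fixed-length k-mers in a single pass.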