AME - Analysis of Motif Enrichment

Usage:

A usage example and a full list of options are provided for each tool below:

AME Use Case:

By default, AME performs unconstrained partition maximisation. With --verbose set to 3, however, it also outputs a constrained partition maximisation score for each sequence in the input: the score obtained when the positive set contains every sequence from the head of the file up to and including that sequence. To generate the fluorescence-based constrained partition maximisation plots in the paper, a Perl script was used to convert the sequence numbers in the verbose output into fluorescence thresholds.
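As a rough illustration of the idea, the sketch below scores every partition of a ranked list with a hypergeometric tail probability, picks the best one, and looks up the fluorescence value at that cutoff. This is a minimal sketch of the standard single-hit minimum-hypergeometric approach, not AME's own code: the function names, the 0/1 hit vector and the fluorescence list are all assumptions made for illustration, and AME's MultiHG statistic and its verbose output format may differ.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(N items, B hits in total, n drawn)."""
    upper = min(n, B)
    favourable = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return favourable / comb(N, n)


def prefix_scores(hits):
    """Score each partition 'top n sequences vs the rest' for n = 1..N.

    hits is a list of 0/1 flags, one per sequence, in the same order as the
    (already sorted) sequence file. Entry n-1 of the result is the score for
    the partition whose positive set is the first n sequences.
    """
    N, B = len(hits), sum(hits)
    scores, seen = [], 0
    for n, hit in enumerate(hits, start=1):
        seen += hit
        scores.append(hypergeom_tail(N, B, n, seen))
    return scores


def best_partition(hits, fluorescence):
    """Return (sequence number, score, fluorescence threshold) of the best cutoff.

    fluorescence is a hypothetical list holding the fluorescence value of each
    sequence, in file order; it stands in for the conversion step that the
    Perl script performed.
    """
    scores = prefix_scores(hits)
    best = min(range(len(scores)), key=scores.__getitem__)
    return best + 1, scores[best], fluorescence[best]
```

Calling prefix_scores gives one score per cutoff, which is one reading of the constrained scores; best_partition corresponds to the unconstrained maximisation over all cutoffs.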

Additionally, the linreg mode in AME reports raw values only. For the full approach used in the paper, please use the tool RAMEN, which is also included in this download. RAMEN also supports simulation of p-values.

We wish to run an unconstrained partition maximisation MHG analysis using essentially the default options. We invoke AME with the following command:

./ame --method mhg --scoring totalhits --bgformat 0 macissac_yeast.v1.meme ARO80_YPD.fsa

The following output is produced:


ame (Analysis of Motif Enrichment): Compiled on Nov 30 2009
------------------------------
Copyright (C) Robert McLeay <r.mcleay@imb.uq.edu.au> & Timothy Bailey <t.bailey@imb.uq.edu.au>, 2009.

1. MultiHG p-value of motif RGT1 top 13 seqs: 0.0002104 (Corrected p-value: 0.006084)
2. MultiHG p-value of motif ARO80 top 3 seqs: 0.000821 (Corrected p-value: 0.02354)
3. MultiHG p-value of motif UME6 top 3 seqs: 0.001642 (Corrected p-value: 0.04654)
4. MultiHG p-value of motif AZF1 top 14 seqs: 0.003042 (Corrected p-value: 0.08457)
5. MultiHG p-value of motif GLN3 top 3 seqs: 0.005747 (Corrected p-value: 0.1539)
...Truncated additional lines...
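
For downstream processing, the result lines shown above can be pulled apart with a regular expression. The following is a minimal sketch that assumes the line layout is exactly as printed in this example (MHG method); the function name and dictionary keys are chosen here for illustration, and other methods may print their results differently.

```python
import re

# Matches lines such as:
# 1. MultiHG p-value of motif RGT1 top 13 seqs: 0.0002104 (Corrected p-value: 0.006084)
AME_LINE = re.compile(
    r"^\s*(\d+)\.\s+MultiHG p-value of motif (\S+) top (\d+) seqs: (\S+) "
    r"\(Corrected p-value: ([^)]+)\)"
)


def parse_ame_output(text):
    """Return one dict per result line; banner and other lines are skipped."""
    results = []
    for line in text.splitlines():
        match = AME_LINE.match(line)
        if match is None:
            continue
        rank, motif, top, p_value, corrected = match.groups()
        results.append({
            "rank": int(rank),
            "motif": motif,
            "top_seqs": int(top),
            "p_value": float(p_value),
            "corrected_p_value": float(corrected),
        })
    return results
```

For example, filtering the parsed list on corrected_p_value <= 0.05 keeps RGT1, ARO80 and UME6 from the output above.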

RAMEN Use Case:

We wish to run a linear regression analysis using the default options. We invoke RAMEN with the following command:

./ramen --bgformat 0 macissac_yeast.v1.meme ARO80_YPD.fsa

The following output is produced:


ramen (Regression Analysis of Motif ENrichment): Compiled on Nov 30 2009
------------------------------
Copyright (C) Robert McLeay <r.mcleay@imb.uq.edu.au> & Timothy Bailey <t.bailey@imb.uq.edu.au>, 2009.

Options Invoked:
----------------

Background Format: Uniform
Motif Format: MEME
y-axis: Log_e of Fluorescence Scores
x-axis: PWM Scores
Motif Scoring Function: RMA (normalised motif scores)
Sampling Repetitions for p-values: 10000
Pseudocount: 0.25

Motif File: macissac_yeast.v1.meme
Sequence File: ARO80_YPD.fsa


Results:
========

Showing all motifs with p-value <= 0.05
Fitting motifs to y: = mx + b

Over-represented Motifs:
------------------------

Rank      Motif     MSE            p-value (adj)  p-value (raw)  m              b              
----      -----     ---            -------------  -------------  -              -              
1         ARO80     1.14741        0.0244974      0.0002         -9.95713e+06   -13.6832       


Under-represented Motifs:
-------------------------

Rank      Motif     MSE            p-value (adj)  p-value (raw)  m              b              
----      -----     ---            -------------  -------------  -              -              


---
Elapsed wall clock time: 3 seconds
Elapsed CPU time:        2.340000 seconds
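
To make the columns of the results table more concrete, the sketch below fits y = mx + b by ordinary least squares, reports the mean squared error, and estimates a p-value by repeatedly shuffling the y-values. It is a generic permutation test written under stated assumptions and is not RAMEN's implementation: the function names are hypothetical, and RAMEN's exact sampling scheme, MSE normalisation and p-value adjustment are not described in this document.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b, mse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    # Mean of the squared residuals (assumed normalisation: divide by n).
    mse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / n
    return m, b, mse


def permutation_pvalue(xs, ys, repeats=1000, seed=0):
    """Fraction of shuffles whose fit is at least as good as the observed one."""
    rng = random.Random(seed)
    _, _, observed = fit_line(xs, ys)
    shuffled = list(ys)
    as_good = 0
    for _ in range(repeats):
        rng.shuffle(shuffled)
        _, _, mse = fit_line(xs, shuffled)
        if mse <= observed:
            as_good += 1
    # Add-one correction so the estimate is never exactly zero.
    return (as_good + 1) / (repeats + 1)
```

Here xs would play the role of the PWM scores and ys the log_e fluorescence scores, matching the axes listed under "Options Invoked" above.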

AME Options:

ame: Compiled on Nov 30 2009
Error: Must specify a motif file and sequence file.
USAGE: ame [options] <motif file> <sequence file>

   Key Options:
     --method  [fisher|mhg|4dmhg|ranksum|linreg|spearman] Select the association function for motif significance
     --scoring [avg|max|totalhits]                        Motif-to-sequence affinity function:

         Hints:   Use avg (recommended) or max for ranksum, linreg, spearman methods.
                  Use totalhits for fisher, mhg, 4dmhg (and possibly other) methods.

   File format options:
     --bgformat     [0|1|2]            Source used to determine background frequencies
                                           0: uniform background
                                           1: MEME motif file
                                           2: Background file
     --bgfile       <background>       File containing background frequencies
     --motif-format [meme|tamo|regexp] Format of input motif file (default meme)

   Ranksum-specific options:
     --rsmethod [better|quick]     Whether to use a slower and more accurate ranksum method or a quicker one
     --poslist  [fl (default)|pwm] For partition max., threshold on either X (pwm) or Y (fluorescence)

   LR- and Spearman- specific options:
     --log-fscores           Regress on the log_e of the fluorescence scores
     --log-pwmscores         Regress on the log_e of the PWM scores
     --normalise-linreg      Normalise the motif scores so that the motifs are comparable
     --linreg-switchxy       Make the x-points fluorescence scores and the y-points PWM scores

   Fisher, MHG, 4D-MHG, Ranksum in TOTALHITS affinity mode options:
     --length-correction                      Correct for length bias by subtracting expected hits
     --pvalue-threshold <float, default=2e-4> Threshold to consider a single motif hit significant

   Fisher Test with either AVG or MAX affinity (undefined results in TOTALHITS mode) options:
     --fl-threshold  <float, default=1e-3> (Requires --poslist fl)  Max fluorescence p-value to consider a 'positive'
     --pwm-threshold <float, default=1>    (Requires --poslist pwm) Min PWM score to call a sequence a 'positive'
     --poslist       [fl (default)|pwm]    For partition max., threshold on either X (pwm) or Y (fluorescence)

          Hints: Be careful when switching the poslist. In the case of the Fisher test, it switches between
                 using X and Y for determining true positives in the contingency matrix, in addition to switching
                 which of X and Y is used for partition maximisation.

   Miscellaneous Options:
     --pseudocount <float, default = 0.25> Pseudocount for motif affinity scan
     --verbose     <1...5>                 Integer describing verbosity. Best used as first argument in list.
     --help                                Show this message again

   Note:
     By default, this tool performs unconstrained partition maximisation. With verbose set to 3, however, it will
     output a constrained partition maximisation score for each sequence in the input set. To generate the fluorescence
     based constrained partition maximisation plots in the paper, a Perl script was used to convert sequence numbers
     into fluorescence thresholds from the verbose output.

   WARNING:
     This tool will not re-sort input sequences. It assumes that input FastA files are sorted from most likely to be
     bound to least likely to be bound.

   Citing ame:
     If ame is of use to you in your research, please cite:

	  Robert C. McLeay, Timothy L. Bailey.
	  "Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data."
	  BMC Bioinformatics 2010, 11:165, doi:10.1186/1471-2105-11-165.

   Contact the authors:
     You can contact the authors via email:

         Robert McLeay <r.mcleay@imb.uq.edu.au>, and
         Timothy Bailey <t.bailey@imb.uq.edu.au>.

     Bug reports should be directed to Robert McLeay.
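
Because AME does not re-sort its input (see the WARNING above), it can be worth checking the order of a sequence file before running it. The sketch below reads FastA records, checks whether they are ordered from most likely to least likely bound, and re-orders them if needed. The header layout used here ("name score", with the score as the last whitespace-separated field) is purely hypothetical and must be adapted to the real files; whether a higher or a lower score means "more likely bound" is also left as a parameter, since it depends on the data.

```python
def read_fasta(text):
    """Parse FastA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def score_from_header(header):
    # HYPOTHETICAL header layout: "<name> <score>"; adapt to the real files.
    return float(header.split()[-1])


def order_records(records, higher_is_better=True, score=score_from_header):
    """Return records ordered from most likely to least likely bound."""
    return sorted(records, key=lambda rec: score(rec[0]), reverse=higher_is_better)


def is_ordered(records, higher_is_better=True, score=score_from_header):
    """True if records are already in most-likely-bound-first order."""
    values = [score(header) for header, _ in records]
    if not higher_is_better:
        values = [-v for v in values]
    return all(a >= b for a, b in zip(values, values[1:]))
```

Set higher_is_better=False when the per-sequence score is a p-value, so that the smallest values come first.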

RAMEN Options:


USAGE: ramen [options] <motif file> <sequence file>

   Linear Regression Options:
     --log-fscores [on|off] Regression on the log_e of the fluorescence scores
     			  on: (Default) Use the log_e(fluorescence) in the regression.
     			 off: Use the score directly provided in the sequence file.
     --log-pwmscores [on|off] Regression on the log_e of the PWM scores
     			  on: Use the log_e(RMA or AMA Score) in the regression.
     			 off: (Default) Use the RMA/AMA score directly.
     --normalise-motifs [on|off] Normalise the motif scores so that the motifs are comparable
     			  on: (Default) Normalise motifs for comparison (Use RMA score).
     			 off: Use raw AMA score (Not recommended).
     --linreg-switchxy [on|off] Switch the x and y axes for the linear regression
		          on: y-points are PWM scores, x-values are fluorescence scores.
		         off: (Default) y-points are fluorescence scores, x-points are PWM scores.
     --linreg-dumpdir  Dump (R-format) TSV files of each regression.

   P-Value Simulation Options:
     --repeats  (default=10,000) Number of times to sample for p-value determination.
     --pvalue-cutoff  (default=0.05) Only show results with p-value <= this cutoff

   File format options:
     --bgformat [0|1|2] source used to determine background frequencies
                        0: uniform background
                        1: MEME motif file
                        2: Background file
     --bgfile  file containing background frequencies
     --motif-format [meme|tamo|regexp] format of input motif file (default meme)

   Miscellaneous Options:
     --pseudocount  Pseudocount for motif affinity scan
     --verbose     <1...5>                 Integer describing verbosity. Best used as first argument in list.
     --help                                Show this message again

   Citing ramen (Regression Analysis of Motif ENrichment):
     If ramen is of use to you in your research, please cite:

	  Robert C. McLeay, Timothy L. Bailey.
	  "Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data."
	  BMC Bioinformatics 2010, 11:165, doi:10.1186/1471-2105-11-165.

   Contact the authors:
     You can contact the authors via email:

         Robert McLeay <r.mcleay@imb.uq.edu.au>, and
         Timothy Bailey <t.bailey@imb.uq.edu.au>.

     Bug reports should be directed to Robert McLeay.