| |
- ProbeSet
class ProbeSet |
|
class Probeset -- Class that represents groups of discrete genome sequences (such as may be represented
on a microarray, or the upstream regions of all genes) in such a way is to facilitate
computing motif scores and metrics. See module documentation and documentation of
member functions. |
|
Methods defined here:
- E(self, ambig, NMlist, verbose='', factor=None, omitlist=[])
- ProbeSet.E (ambig, NMlist,verbose='',factor=None,omitlist=[]) --
Synonym for ProbeSet.Enrichment. See its docstring
- E_chi2(self, ambig, NMlist, bins=None, verbose='', factor=None, omitlist=[])
- ProbeSet.E_chi2(ambig, NMlist,bins=None,verbose='',factor=None,omitlist=[]) --
Metric that compares the distribution of number of probes with multiple occurrences
as compared to the distribution seen the the genome at large. Not completely stable,
because counts are often too small for reliable chi2.
- E_seq(self, ambig, NMlist, verbose='', factor=None, omitlist=[])
- ProbeSet.E_seq(ambig, NMlist,verbose='',factor=None,omitlist=[]) -- [Unstable]
Compute the Enrichment score, with these two changes:
The number of "draws" is the number of bases (both in the genome and
in the NMlist probe/gene sequences) and the motif counts are the number
of matching, but non-ovelapping binding sites. In theory, this should
behave similarly to the overrep score with MAXVALUE set very high.
- E_site(self, ambig, NMlist, ceil=1, verbose='', factor=None, omitlist=[])
- ProbeSet.E_site(ambig, NMlist,ceil=1,verbose='',factor=None,omitlist=[]) --
Like the enrichment score, but computes the hypergrometric on the number of sites.
Not recommended, because the "total" and "numpicked" are based on number of probes,
so the metric intrinsically contains an apple vs. orange comparison.
- E_sitef(self, ambig, NMlist, k, verbose='', factor=None, omitlist=[])
- ProbeSet.E_sitef(ambig, NMlist,k, verbose='',factor=None,omitlist=[]) --
Another variant of the Enrichment score, which requires at least
k occurences of a motif to occur on a probe for the probe to be
counted. Each probe is counted k times.
- Enrichment(self, ambig, NMlist, verbose='', factor=None, omitlist=[])
- ProbeSet.Enrichment(ambig, NMlist,verbose='',factor=None,omitlist=[]) --
Computes Enrichment score as described in Harbison & Gordon et al. Nature 2004
Optional factor is expressed in terms of fraction of the maxiumum possible
PSSM ("ambig") score. Result is computed by using the hypergeometric distribution
to compute the probability of observing the number entries in NMlist that
match the motif by considering them has drawn randomly from the ProbeSet without
replacement.
- GO(self, ambig, NMlist, verbose='', factor=None, omitlist=[])
- ProbeSet.GO(ambig,NMlist,verbose='',factor=None,omitlist=[]) --
Check whether the subset of NMlist ids that match the motif have significant
categorization according to the Gene Ontology (GO) database.
- MNCP(self, ambig, NMlist, verbose='', factor=None, omitlist=[])
- ProbeSet.MNCP(ambig, NMlist,verbose='',factor=None,omitlist=[]) --
Compute MNCP as decribed in Clarke ND, Granek JA Bioinformatics 2003
- ROC_AUC(self, ambig, NMlist, verbose='', factor=None, omitlist=[])
- ProbeSet.ROC_AUC(ambig, NMlist,verbose='',factor=None,omitlist=[]) --
Compute ROC_AUC as decribed in Clarke ND, Granek JA Bioinformatics 2003
- __init__(self, genome='YEAST', factor=0.69999999999999996)
- ProbeSet.__init__(genome='YEAST',factor=0.7) -- 'genome' can be "YEAST" or a fasta-formatted file.
'factor' specifies the PSSM threshold in terms of
its maximum possible value.
- best_Enrichment(self, ambig, NMlist, verbose='')
- ProbeSet.best_Enrichment(ambig, NMlist,verbose='') --
Find the scoring threshold at which the motif ("ambig") has the optimal Enrichment score
(see ProbeSet.Enrichment). Returns (best_enrichment_score, best_factor) tuple.
- best_p_value(self, ambig, NMlist, verbose='')
- ProbeSet.best_p_value(ambig, NMlist,verbose='') --
Synonym for best_Enrichment. See its docstring.
- best_pvalue(self, ambig, NMlist, verbose='')
- ProbeSet.best_pvalue(ambig, NMlist,verbose='') --
Find the scoring threshold at which the motif ("ambig") has the optimal Enrichment score
(see ProbeSet.Enrichment) Returns (best_enrichment_score, best_factor) tuple. (Synonym
for best_Enrichment)
- binomial(self, ambig, NMlist, verbose='', factor=None, omitlist=[], MAXVALUE=4)
- ProbeSet.binomial(ambig,NMlist, verbose='',factor=None,omitlist=[],MAXVALUE=4) --
Synonym for ProbeSet.overrep. See its docstring.
- church(self, ambig, preNMlist, verbose='', topcount=100, omitlist=[])
- ProbeSet.church(ambig,preNMlist,verbose='',topcount=None,omitlist=[]) --
The "Group Specificity Score" as used by AlignACE (Hughes et al. JMB 2000)
Optional topcount is used as "100" in AlignACE. A negative fractional value for
topcount is used to specify the count in terms of the number of sequences
in NMlist.
- countD_matching_sites(self, ambig, NMlist=[], thresh=None, Nooverlap=None)
- ProbeSet.countD_matching_sites( ambig, NMlist=[], thresh=None,Nooverlap=None) -- [Utility, Dispatch]
Confirm that ambig is a motif object (as opposed to a regular expression) and invoke
_countD_matching_sites_motif_fast. See its documentation.
- count_matching_probes(self, ambig, NMlist=[], thresh=None)
- ProbeSet.count_matching_probes( ambig, NMlist=[], thresh=None): -- [Utility, Dispatch]
Check whether ambig is a Motif object or a regular expression, and invoke the
appropriate routine to use ambig to count the number of matching probes.
- count_matching_sites(self, ambig, ceil=1, NMlist=[], thresh=None)
- ProbeSet.count_matching_sites( ambig, ceil=1,NMlist=[], thresh=None) -- [Utility, Dispatch]
Confirm that ambig is a motif object (as opposed to a regular expression) and invoke
_count_matching_sites_motif_fast(). See its docstring.
- distribution(self, motif, NMlist)
- ProbeSet.distribution(motif,NMlist,) --
Prints the "id best_score, bestscore/maxscore" information for all probes in NMlist,
followed by some general statistical information. If Stats.stats is installed,
also attempts KS test of distribution of scores within bound probes versus the genome.
- filter(self, probelist)
- ProbeSet.filter(probelist) --
Return list of ids that are also represented in the ProbeSet object.
- frac(self, ambig, NMlist, verbose='', factor=None, omitlist=[])
- ProbeSet.frac(ambig, NMlist,verbose='',factor=None,omitlist=[]) --
Report and return the fraction of the probes/gene sequences in NMlist
that contain matches to the motif ("ambig") under the default
threshold score or the specified theshold according to factor (which
indicates the fraction of the maximum possible PSSM score to be used
as a threshold.
- fsa_string_from_ids(self, ids, comments=[])
- ProbeSet.fsa_string_from_ids( ids, comments=[]) --
Prepare a text string from ids for
printing in Fasta format. Similar functionality in Fasta module.
- ids_from_file(self, filename, QUIET='')
- ProbeSet.ids_from_file(filename,QUIET='') -- [Deprecated]
Fasta and txt file loader. Use Fasta.keys() or Fasta.ids() instead.
- matches_with_features(self, ambig, NMlist, featurelist, featurename='fE', verbose='', factor=None)
- ProbeSet.matches_with_features(ambig,NMlist,featurelist,featurename='fE',verbose='',factor=None) --
Check Enrichment of motif among ids in featurelist that have something "interesting" about
them. For example, you would use this function if you wanted to ask if a motif has a good
Enrichment score among probes identified in a microarray experiment that are, say,
telomeric, or perhaps known to be important during cell-cycle.
NMlist and featurelist are both lists of gene/probe ids. Featurename is used only for
visual reference when printing the results. Optional factor is expressed in terms
of fraction of the maxiumum possible PSSM score.
- matching_ids(self, ambig, NMlist=[], factor=None)
- ProbeSet.matching_ids(ambig,NMlist,factor=None) --
Return a list of sequence ids among NMlist (probe or gene ids) that match the motif.
Optional factor is expressed in terms of fraction of the maxiumum possible PSSM score.
- overrep(self, ambig, NMlist, verbose='', factor=None, omitlist=[], MAXVALUE=4)
- ProbeSet.overrep(ambig,NMlist, verbose='',factor=None,omitlist=[],MAXVALUE=4) --
Computes Over-representation score.
Optional factor is expressed in terms of fraction of the maxiumum possible
PSSM ("ambig") score.
The score is computed by using the binomial distribution to compute the
probability of observing the number sites that match the motif among the sequences
in NMlist by considering them has drawn randomly from the ProbeSet with
replacement. The expected fractions are based on the number of bases in the NMlist
probe/gene sequences and the total number of bases in the genome.
MAXVALUE is the maximum number of binding sites to count within a probe/gene
sequence. A good value for 500-1000bp sequences is 4, but MAXVALUE should
be made very high if analyzing longer sequences.
- p_value(self, ambig, NMlist, verbose='', factor=None, omitlist=[])
- ProbeSet.p_value(ambig, NMlist,verbose='',factor=None,omitlist=[]) --
Synonym for ProbeSet.Enrichment. See its docstring
- p_value_noomit(self, ambig, NMlist, verbose='', factor=None)
- ProbeSet.p_value_noomit(ambig, NMlist,verbose='',factor=None) -- [Deprecated]
Synonym for ProbeSet.Enrichment. See its docstring
- print_fsa_from_idfile(self, filename)
- print_fsa_from_ids(self, ids)
- ProbeSet.print_fsa_from_ids(ids) --
Print Fasta-formatted sequences to standard output.
Similar functionality also exists in the Fasta module.
- reduce_probelist(self, ids)
- ProbeSet.reduce_probelist(ids) --
Change the the internal probelist. Keep only sequences associated with the supplied IDs.
- seqs_from_idfile(self, filename, QUIET='')
- seqs_from_ids(self, ids)
| |