- Motif_from_counts(countmat, beta=0.01, bg={'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25})
- m = Motif_from_counts(countmat,beta=0.01,bg={'A':.25,'C':.25,'G':.25,'T':.25})
Construct a Motif object from a matrix of counts (or probabilities or frequencies).
A default set of uniform background frequencies may be overridden.
beta refers to the number of pseudocounts that should be distributed over each position
of the PSSM.
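The pseudocount step described above can be sketched as follows. This is a standalone illustration, not TAMO's code: the helper name counts_to_log_odds, the list-of-dicts input, and the choice to split beta across letters in proportion to the background are all assumptions.

```python
import math

def counts_to_log_odds(countmat, beta=0.01, bg=None):
    """Turn per-position count dicts into log2-odds dicts (illustrative only)."""
    bg = bg or {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}
    ll = []
    for col in countmat:
        total = sum(col.get(b, 0) for b in bg)
        row = {}
        for b in bg:
            # Assumption: the beta pseudocounts for this position are
            # shared out according to the background frequencies.
            p = (col.get(b, 0) + beta * bg[b]) / (total + beta)
            row[b] = math.log(p / bg[b], 2)
        ll.append(row)
    return ll
```

With beta > 0 no letter ever gets probability zero, so every log-odds entry stays finite.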
- Motif_from_ll(ll)
- m = Motif_from_ll(ll)
Constructs a motif object from a log-likelihood matrix, which is
in the form of a list of dictionaries.
- Motif_from_text(text, beta=0.050000000000000003, source='', bg=None)
- m = Motif_from_text(text,beta=0.05,source='',bg=None)
Construct a Motif object from a text string constructed from IUPAC
ambiguity codes.
A default set of uniform background frequencies may be overridden with
a dictionary of the form {'A':.25,'C':.25,'G':.25,'T':.25}.
beta refers to the number of pseudocounts that should be distributed over each position
of the PSSM.
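As a rough illustration of how IUPAC ambiguity codes map to per-position letter distributions, the sketch below expands a text string into probability columns. The helper name iupac_to_columns and the equal split among the letters of an ambiguity code are assumptions; pseudocounts and background handling are left out.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T',
    'R': 'AG', 'Y': 'CT', 'S': 'CG', 'W': 'AT', 'K': 'GT', 'M': 'AC',
    'B': 'CGT', 'D': 'AGT', 'H': 'ACT', 'V': 'ACG', 'N': 'ACGT',
}

def iupac_to_columns(text):
    """Expand an IUPAC string into a list of {letter: probability} dicts."""
    cols = []
    for ch in text.upper():
        letters = IUPAC[ch]
        # Assumption: an ambiguity code weights its letters equally.
        share = 1.0 / len(letters)
        cols.append({b: (share if b in letters else 0.0) for b in 'ACGT'})
    return cols
```

For 'TGASTCA' the fourth column (the 'S') gives C and G a probability of 0.5 each.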
- Random_motif(w)
- Random_motif(w) -- Generate a random motif of width w. Each position will
have a dominant letter with probability around 0.91.
- avestd(vals)
- avestd(vals) -- [Utility function] Return an (average, stddev) tuple computed from the supplied list of values
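A minimal stand-in for this kind of helper is shown below. Whether the original uses the sample (n-1) or the population (n) standard deviation is not stated; the sketch assumes the sample form.

```python
import math

def avestd(vals):
    """Return (average, stddev) for a non-empty list of numbers."""
    n = len(vals)
    mean = sum(vals) / float(n)
    if n < 2:
        return (mean, 0.0)
    # Assumption: sample standard deviation (divide by n - 1).
    var = sum((v - mean) ** 2 for v in vals) / (n - 1)
    return (mean, math.sqrt(var))
```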
- bestseqs(motif, thresh, seq='', score=0, depth=0, bestcomplete=None, SEQS=[])
- bestseqs(motif,threshold)
This function returns a list of all sequences that a motif could
match with a sum(log-odds) score greater than thresh.
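One way to enumerate such sequences is a depth-first search that abandons a prefix as soon as even the best possible completion cannot exceed the threshold. The sketch below works on a plain list of log-odds dicts; the name best_seqs and the (score, sequence) return shape are assumptions, and this is not necessarily how the module does it.

```python
def best_seqs(ll, thresh):
    """List (score, seq) pairs whose summed log-odds score exceeds thresh."""
    width = len(ll)
    # best_rest[i] = highest score obtainable from positions i..width-1.
    best_rest = [0.0] * (width + 1)
    for i in range(width - 1, -1, -1):
        best_rest[i] = best_rest[i + 1] + max(ll[i].values())

    hits = []  # local list, rather than a mutable default argument

    def extend(prefix, score, depth):
        if depth == width:
            if score > thresh:
                hits.append((score, prefix))
            return
        for base, s in sorted(ll[depth].items()):
            # Prune: skip branches that cannot reach the threshold.
            if score + s + best_rest[depth + 1] > thresh:
                extend(prefix + base, score + s, depth + 1)

    extend('', 0.0, 0)
    return sorted(hits, reverse=True)
```

The pruning bound keeps the search practical for wide motifs, where listing all 4^w sequences would be infeasible.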
- compare_seqs(s1, s2)
- compare_seqs(s1, s2)
- copy(motif)
- m = copy(n)
Utility routine for copying motifs
- diff(self, other)
- diff(m1,m2) - pseudo-Euclidean (sum_col(sqrt(norm(sum_row)))/#col)
- diverge(self, other)
- Yet another distance metric
- giflogo(motif, id, title=None, scale=0.80000000000000004)
- giflogo(motif,id,title=None,scale=0.8) -- Interface to the 'weblogo/seqlogo' perl
scripts that generate colorful sequence logos
- infomaskdiff(self, other)
- infomaskdiff(m1,m2) -- Return pseudo-Euclidean distance, but scale column distance by
information content of "other". Used by THEME
- load(filename)
- load(filename) -- Load a 'TAMO'-formatted motif file.
- m_matches(seqs, wmer, m)
- m_matches(seqs,wmer,m) -- Returns list of all kmers among sequences that have at most
m mismatches to the supplied wmer (kmer).
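The mismatch search can be sketched as a sliding-window Hamming-distance check. Returning each matching kmer once, in sorted order, and ignoring reverse complements are assumptions of this sketch.

```python
def m_matches(seqs, wmer, m):
    """Collect kmers in seqs that differ from wmer at no more than m positions."""
    w = len(wmer)
    found = set()
    for seq in seqs:
        for i in range(len(seq) - w + 1):
            kmer = seq[i:i + w]
            mismatches = sum(1 for a, b in zip(kmer, wmer) if a != b)
            if mismatches <= m:
                found.add(kmer)
    # Assumption: duplicates are collapsed and the result is sorted.
    return sorted(found)
```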
- maskdiff(self, other)
- maskdiff(m1,m2) - diff, but excluding positions with 'N' in m2
Return pseudo-Euclidean distance, but only include columns that are not background
- merge(A, B, overlap=0)
- merge(A,B,overlap=0) -- [Utility function] Use the '+' operator instead. Used for concatenating motifs
into a new motif, allowing for the averaging of overlapping bases between them.
- minaligndiff(M1, M2, overlap=5, diffmethod='diff')
- minwindowdiff(M1, M2, overlap=5, diffmethod='diff')
- m.minwindowdiff(M1,M2,overlap=5,diffmethod='diff')
- nlog10(x, min=9.8813129168249309e-324)
- nlog10(x,min=1e-323) -- returns -log10(x) with a maximum default value of 323.
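The clamping behaviour can be illustrated with a few lines. The parameter is renamed floor here to avoid shadowing the built-in min; that name is an assumption of the sketch.

```python
import math

def nlog10(x, floor=1e-323):
    """Return -log10(x), clamping x so the result never exceeds about 323."""
    if x < floor:
        x = floor  # avoids log10(0) and caps the returned value
    return -math.log10(x)
```

The floor sits close to the smallest positive double, which is why the cap lands near 323.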
- pickletxt2motifs(toks)
- pickletxt2motifs(toks) -- [Utility function] See txt2motifs documentation.
- print_motif(motif, kmer_count=20, istart=0)
- print_motif(motif,kmer_count=20,istart=0) -- Print a motif in the 'TAMO'-format. istart specifies the motif number, and
optional kmer_count specifies how many sequences to include in the printed
multiple sequence alignment that recapitulates the probability matrix.
- print_motifs(motifs, kmer_count=20, istart=0)
- print_motifs(motifs,kmer_count=20,istart=0) -- Print a list of motifs in the 'TAMO' motif-file format.
Optional kmer_count specifies how many sequences to include in the printed
multiple sequence alignment that recapitulates the probability matrix.
istart specifies the number from which to begin motif ids.
- random(...)
- random() -> x in the interval [0, 1).
- random_diff_avestd(motif, iters=5000)
- random_diff_avestd(motif,iters=5000) -- Return the average & stddev distance ('diff') between a
motif and "iters" random motifs of the same width.
- revcomplement(seq)
- revcomplement(seq)
A quick reverse-complement routine that memoizes queries, understands
IUPAC ambiguity codes, and preserves case.
- revcompmotif(self)
- revcompmotif(self) -- [Utility function] Construct the reverse complement of the motif. Use m.revcomp() member function instead.
- save_motifs(motifs, filename, kmer_count=20)
- save_motifs(motifs,filename,kmer_count=20) -- Save list of motifs as a 'TAMO'-formatted motif file to the specified file.
Optional kmer_count specifies how many sequences to include in the printed
multiple sequence alignment that recapitulates the probability matrix.
- seqs2fasta(seqs, fasta_file='')
- seqs2fasta(seqs,fasta_file = '') -- Dumps a Fasta formatted file of sequences, keyed by the sequence itself:
>ACTTTTTGTCCCA
ACTTTTTGTCCCA
>ACTTTTGGGGCCA
ACTTTTGGGGCCA
...
- shuffle_bases(m)
- shuffle_bases(m) -- Return a new motif object in which the probabilities are
randomly re-assigned to different letters at the same position.
- shuffledP(self)
- shuffledP(self) -- Construct a motif in which the letter distributions are preserved but
are reassigned to random positions in the motif.
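The two shuffles differ in what they permute: shuffle_bases permutes letters within each position, while shuffledP permutes whole positions. The sketch below shows both on a plain list of probability dicts; the function names and the optional rng argument are assumptions made for illustration.

```python
import random

def shuffle_within_columns(cols, rng=random):
    """Reassign each column's probabilities to randomly chosen letters."""
    out = []
    for col in cols:
        probs = [col[b] for b in 'ACGT']
        rng.shuffle(probs)
        out.append(dict(zip('ACGT', probs)))
    return out

def shuffle_columns(cols, rng=random):
    """Keep each column intact but place the columns in random order."""
    out = [dict(c) for c in cols]
    rng.shuffle(out)
    return out
```

Passing a seeded random.Random instance as rng makes either shuffle reproducible.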
- sortby(motiflist, property, REV=0)
- sortby(motiflist, property, REV=0) -- Sort a motif list according to a particular property
- submotif(self, beg, end)
- submotif(self,beg,end) -- Utility function for extracting sub-motifs and padding motifs.
Use slice functionality (m[2:4]) instead.
- sum(motifs, weights=[])
- sum(motifs,weights=[]) -- Perhaps better called 'average'. Constructs a motif by averaging the
probabilities at each position of the (pre-aligned) input motifs. Optional
weights can be assigned, and must be in the same order as the motifs.
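The weighted averaging can be sketched on plain probability columns as below. The name average_motifs is hypothetical (it also avoids shadowing the built-in sum), and equal weights when none are given is an assumption.

```python
def average_motifs(motifs, weights=None):
    """Average pre-aligned motifs, each a list of {letter: probability} dicts."""
    if not weights:
        weights = [1.0] * len(motifs)  # assumption: unweighted by default
    total = float(sum(weights))
    width = len(motifs[0])
    averaged = []
    for i in range(width):
        col = {}
        for b in 'ACGT':
            col[b] = sum(w * m[i][b] for m, w in zip(motifs, weights)) / total
        averaged.append(col)
    return averaged
```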
- toDict(M)
- toDict(M) -- Convert a 2D array to a list of dictionaries (which is how the motif object
stores information internally). Assumes M entries are in alphabetical order (ACGT)
- toDictVect(V)
- toDictVect(V) -- Convert a 1D vector to a dictionary of DNA letters. Assumes values
in V are in alphabetical order (ACGT).
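Both conversions amount to pairing values with the letters A, C, G, T in order. A minimal sketch, assuming each row of M is one motif position:

```python
def to_dict_vect(V):
    """Map a 4-element vector (A, C, G, T order) to a letter-keyed dict."""
    return dict(zip('ACGT', V))

def to_dict(M):
    """Map a 2D array (one row per position) to a list of dicts."""
    return [to_dict_vect(row) for row in M]
```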
- top_nmers(N, seqs, with_counts=0, purge_Ns='')
- top_nmers(N,seqs,with_counts = 0,purge_Ns = '') -- Assemble list of all nmers (kmers) with width 'N' from
supplied sequences. Option with_counts returns list
of (kmer, count) tuples instead. Purge N's ignores
kmers containing N's.
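A kmer tally of this kind can be written with collections.Counter. The sketch assumes results are ordered by descending count (ties broken alphabetically) and that reverse complements are counted separately; neither detail is stated in the description above.

```python
from collections import Counter

def top_nmers(N, seqs, with_counts=False, purge_Ns=False):
    """List the width-N kmers found in seqs, most frequent first."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - N + 1):
            kmer = seq[i:i + N]
            if purge_Ns and 'N' in kmer:
                continue  # skip kmers that contain an N
            counts[kmer] += 1
    # Assumption: order by count (descending), then alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if with_counts else [k for k, _ in ranked]
```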
- txt2motifs(txt, VERBOSE=1)
- txt2motifs(txt,VERBOSE=1) -- Convert a text string into a list of motifs:
Examples:
'TGASTCA,GAATC' --> 2 motifs from ambiguity codes
'results.tamo' --> All motifs in TAMO-format file
'results.tamo:34,45' --> Motifs 34 and 45 in TAMO-format file
'results.pickle' --> All motifs in pickle (list or dict of Motifs)
'results.pickle%GAL4' --> 'GAL4' entry in results.pickle dictionary
'results.pickle:34,45' --> Motifs 34 and 45 in results.pickle list
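The string forms listed above can be told apart with a small parser. The sketch below only classifies the specification and does not load anything. The helper name parse_motif_spec, the returned tuple shape, and the rule that a '.' marks a file name are all assumptions, not documented behaviour.

```python
def parse_motif_spec(txt):
    """Classify a motif specification string as (kind, source, selectors)."""
    if '%' in txt:
        # 'file%KEY': one entry of a pickled dictionary
        source, key = txt.split('%', 1)
        return ('pickle_key', source, [key])
    if ':' in txt:
        # 'file:34,45': selected motif numbers from a file
        source, ids = txt.rsplit(':', 1)
        return ('indexed', source, [int(i) for i in ids.split(',')])
    if '.' in txt:
        # Assumption: a dot means a file name rather than IUPAC text.
        return ('file', txt, [])
    # Otherwise treat as comma-separated IUPAC strings.
    return ('iupac', None, txt.split(','))
```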