Python: module TAMO.seq.Fasta

TAMO.seq.Fasta

/home/David_Gordon/docs/TAMO/seq/Fasta.py

Fasta.py -- Very efficient code for loading biological sequences in Fasta format into Python dictionaries.
Copyright (2005) Whitehead Institute for Biomedical Research (except as noted below). All Rights Reserved.
Author: David Benjamin Gordon

Modules

os
random
re
sys

Functions


delN(fsaD)
Remove every entry in the Fasta-derived dictionary whose sequence contains a DNA ambiguity code. Reports the ids of the deleted sequences.
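The filtering described above can be sketched as follows. This is not TAMO's implementation: `drop_ambiguous` is a hypothetical helper, it treats any character other than A, C, G, or T as an ambiguity code (an assumption), and it returns a new dictionary plus the removed ids because the documentation does not say whether delN modifies its argument in place.

```python
def drop_ambiguous(seq_dict, allowed="ACGT"):
    """Hypothetical helper: keep only sequences made of the allowed letters.

    Returns (kept_dict, removed_ids). Comparison is case-insensitive.
    """
    allowed_set = set(allowed.upper())
    kept = {}
    removed_ids = []
    for seq_id, seq in seq_dict.items():
        if set(seq.upper()) <= allowed_set:
            kept[seq_id] = seq
        else:
            removed_ids.append(seq_id)
    return kept, removed_ids


# Made-up example data: seq2 contains N, seq3 contains R and Y.
example = {"seq1": "ACGTAC", "seq2": "ACNNGT", "seq3": "acgtRY"}
kept, removed = drop_ambiguous(example)
print("kept:", sorted(kept))
print("removed:", sorted(removed))
```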

fasta2dict(filename, want_dict='YES', key_func=None)
Very fast Fasta loader. Used internally; use Fasta.load() or Fasta.seqs() instead.

file2dict(filename, key_func=None)
Synonymous with Fasta.load(). See the documentation for Fasta.load().

find(name, pathhint=None)
Find a ".fsa" file with a name similar to that of the supplied file. For example, given "GAL4_YPD.meme", this function will look in the current directory, then in the parent directory or the optional "hint" directory, for a file named "GAL4_YPD.fsa".
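A minimal sketch of this kind of lookup, under stated assumptions: `find_fsa_sketch` is a hypothetical stand-in, "similar name" is reduced to swapping the extension for ".fsa", the search order (current directory, parent directory, hint directory) is a guess, and returning None on failure is also assumed. The real Fasta.find() may differ on all of these points.

```python
import os
import tempfile


def find_fsa_sketch(name, pathhint=None):
    """Hypothetical lookup: return the path of a matching .fsa file, or None."""
    fsa_name = os.path.splitext(os.path.basename(name))[0] + ".fsa"
    directories = [os.curdir, os.pardir]
    if pathhint is not None:
        directories.append(pathhint)
    for directory in directories:
        candidate = os.path.join(directory, fsa_name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Demonstration with a temporary "hint" directory holding GAL4_YPD.fsa.
hint_dir = tempfile.mkdtemp()
target = os.path.join(hint_dir, "GAL4_YPD.fsa")
open(target, "w").close()
found = find_fsa_sketch("GAL4_YPD.meme", pathhint=hint_dir)
print(found)
```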

ids(filename, key_func=None)
Return the ids in a Fasta file. Same as Fasta.keys(filename, key_func=None). key_func is a function (or lambda expression) that extracts the key. The default is to take the first word separated by whitespace. For example, key_func=lambda x: x.split('|')[3] would use the 4th token separated by the "|" symbol.

keys(filename, key_func=None)
Return the ids in a Fasta file. Same as Fasta.ids(filename, key_func=None). key_func is a function (or lambda expression) that extracts the key. The default is to take the first word separated by whitespace. For example, key_func=lambda x: x.split('|')[3] would use the 4th token separated by the "|" symbol.
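To illustrate how key_func changes the ids that come back, here is a small sketch that works on header lines held in memory. `header_ids` is a hypothetical helper, the records are made up, and passing the header to key_func without its leading ">" is an assumption about the real module.

```python
def header_ids(lines, key_func=None):
    """Hypothetical helper: extract one id per '>' header line."""
    if key_func is None:
        # Default described in the documentation: first whitespace-separated word.
        key_func = lambda header: header.split()[0]
    return [key_func(line[1:].strip()) for line in lines if line.startswith(">")]


# Made-up records in a pipe-delimited header style.
records = [
    ">gi|12345|ref|NM_000001| hypothetical example record",
    "ACGTACGT",
    ">gi|67890|ref|NM_000002| another hypothetical record",
    "GGCCTTAA",
]

default_ids = header_ids(records)
accession_ids = header_ids(records, key_func=lambda x: x.split('|')[3])
print(default_ids)
print(accession_ids)
```

With the default, each id is the whole first word of the header; with the lambda, only the 4th "|"-separated token is kept.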

load(filename, key_func=None)
Load the file "filename" as a dictionary of sequences, indexed according to key_func. The default is to take the first word separated by whitespace. For example, key_func=lambda x: x.split('|')[3] would use the 4th token separated by the "|" symbol.
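The sketch below shows one way a loader of this shape could assemble multi-line sequences into a dictionary. It is an illustration, not the module's code: `load_sketch` is hypothetical, it reads from any iterable of lines rather than a filename so the example runs without a file, and the choice to skip blank lines is an assumption.

```python
import io


def load_sketch(handle, key_func=None):
    """Hypothetical loader: map each record's key to its joined sequence."""
    if key_func is None:
        key_func = lambda header: header.split()[0]
    seqs = {}
    current_key = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            if current_key is not None:
                seqs[current_key] = "".join(chunks)
            current_key = key_func(line[1:])
            chunks = []
        elif current_key is not None:
            chunks.append(line)
    if current_key is not None:
        seqs[current_key] = "".join(chunks)
    return seqs


# Made-up two-record input; the first sequence spans two lines.
fasta_text = u">YAL001C upstream region\nACGT\nTTGA\n>YAL002W\nGGCC\n"
seqD = load_sketch(io.StringIO(fasta_text))
print(seqD)
```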

random_split(filename_or_seqD, frac=0.5)
Randomly partition a Fasta-derived dictionary. The input may be either a filename or a dictionary of sequences. The "frac" argument specifies what fraction of the sequences goes into one dictionary; the remainder goes into the other. Returns two dictionaries.
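A random partition like this can be sketched by shuffling the ids and cutting the list once. The helper name `random_split_sketch`, the `rng` parameter (added so the example is reproducible), the rounding rule, and the choice that "frac" sizes the first dictionary are all assumptions; the sketch also accepts only a dictionary, not a filename.

```python
import random


def random_split_sketch(seq_dict, frac=0.5, rng=random):
    """Hypothetical helper: split seq_dict into two disjoint dictionaries."""
    ids = list(seq_dict)
    rng.shuffle(ids)
    cut = int(round(frac * len(ids)))  # assumption: round to nearest count
    first = {seq_id: seq_dict[seq_id] for seq_id in ids[:cut]}
    second = {seq_id: seq_dict[seq_id] for seq_id in ids[cut:]}
    return first, second


# Ten made-up sequences, split roughly 30/70 with a seeded generator.
seqs = {"s%d" % n: "ACGT" * n for n in range(1, 11)}
first, second = random_split_sketch(seqs, frac=0.3, rng=random.Random(0))
print(len(first), len(second))
```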

random_subset(filename_or_seqD, target_count=30)
Pick a subset of entries at random from the specified input, and return a dictionary. The input may be either a filename or a dictionary of sequences. "target_count" is the desired size of the subset.
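As a rough illustration, sampling without replacement gives a subset of the requested size. `random_subset_sketch` and its `rng` parameter are hypothetical, and capping the subset at the input size when fewer than "target_count" entries exist is an assumption, since the documentation does not describe that case.

```python
import random


def random_subset_sketch(seq_dict, target_count=30, rng=random):
    """Hypothetical helper: return a random sub-dictionary of seq_dict."""
    count = min(target_count, len(seq_dict))  # assumption: cap at input size
    chosen = rng.sample(sorted(seq_dict), count)
    return {seq_id: seq_dict[seq_id] for seq_id in chosen}


# Fifty made-up sequences; the default target_count of 30 is used.
pool = {"gene%02d" % n: "ACGT" for n in range(50)}
subset = random_subset_sketch(pool, rng=random.Random(1))
small = random_subset_sketch({"only": "ACGT"}, target_count=30)
print(len(subset), len(small))
```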

seqs(filename)
Return a list of the sequences contained in the file.

text(D, toupper=0, linelen=70)
Utility function for generating Fasta-formatted output from a dictionary of sequences. toupper specifies whether all sequences should be converted to upper case, and linelen specifies how many sequence characters are allowed on each line. Returns a single string.
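The line-wrapping behavior can be sketched as below. `fasta_text_sketch` is a hypothetical stand-in, and emitting records in sorted id order with a trailing newline are assumptions; the real function's record order and exact whitespace are not documented here.

```python
def fasta_text_sketch(seq_dict, toupper=0, linelen=70):
    """Hypothetical formatter: return Fasta text with wrapped sequence lines."""
    lines = []
    for seq_id in sorted(seq_dict):  # assumption: sorted order for stable output
        seq = seq_dict[seq_id]
        if toupper:
            seq = seq.upper()
        lines.append(">" + seq_id)
        # Slice the sequence into pieces of at most linelen characters.
        for start in range(0, len(seq), linelen):
            lines.append(seq[start:start + linelen])
    return "\n".join(lines) + "\n"


# Made-up input with a short line length so the wrapping is visible.
out = fasta_text_sketch({"b": "acgtacgtac", "a": "ttga"}, toupper=1, linelen=4)
print(out)
```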

write(D, filename, linelen=70)
Write a dictionary of sequences out to a file. The optional linelen argument specifies how many sequence characters are allowed on each line.