TAMO.HT
/home/David_Gordon/docs/TAMO/HT.py

HT.py --  Fast Interface to Tabular High-Throughput data (e.g. microarray)
 
CORE OBJECTS: 
class Dataset
class metaDataset
 
Example:
 
Say you have a comma-separated file summarizing p-values for a large number of
high-throughput experiments in the form:
 
refseq_id, HNF4a_HepG2,  HNF4a_Hepcyt, HNF1a_HepG2, HNF1a_Hepcyt, ....
NM_000345,  0.0001,      0.01,         0.343,       0.23,   
NM_000347,  0.01,        0.443,        0.13,        0.5,
NM_000456,  0.21,        0.04,         1.0,         0.004,
.
.
.
 
Such files could represent enrichment ratios or p-values from expression data, ChIP-chip data, or
other high-throughput data.
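
To make the file layout concrete, here is a minimal sketch of reading such a table into
nested dictionaries using only the standard csv module.  It is not TAMO's own loader; the
names SAMPLE and parse_table are hypothetical and exist only for this illustration.

```python
import csv

# Rows copied from the example above; an in-memory stand-in for a real file.
SAMPLE = """refseq_id, HNF4a_HepG2,  HNF4a_Hepcyt, HNF1a_HepG2, HNF1a_Hepcyt
NM_000345,  0.0001,      0.01,         0.343,       0.23,
NM_000347,  0.01,        0.443,        0.13,        0.5,
NM_000456,  0.21,        0.04,         1.0,         0.004,
"""

def parse_table(text):
    """Return {experiment: {probe_id: float}} from comma-separated text."""
    rows = csv.reader(text.splitlines(), skipinitialspace=True)
    header = [h.strip() for h in next(rows)]
    experiments = header[1:]            # first column holds the probe ids
    table = dict((name, {}) for name in experiments)
    for row in rows:
        cells = [c.strip() for c in row]
        if not cells or not cells[0]:
            continue                    # skip blank lines
        probe = cells[0]
        # zip stops at the shorter sequence, so the empty cell produced by a
        # trailing comma is ignored.
        for name, cell in zip(experiments, cells[1:]):
            if cell:
                table[name][probe] = float(cell)
    return table

table = parse_table(SAMPLE)
```

Each column becomes one experiment keyed by name, which keeps per-experiment threshold
queries down to a single dictionary scan.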
 
Instantiate a Dataset object:
 
>>> DATA = HT.Dataset('human_chip_data.csv')
 
The first time a file is loaded, a cached '.dataset' file is generated for faster access later.  You
must therefore have write permission in the directory of the original .csv file if it is being
instantiated into a Dataset object for the first time.
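
The caching behaviour can be pictured with the following sketch, which pickles the parsed
result next to the source file and reuses it on later loads.  This is an assumption about
the general pattern, not TAMO's code: the function load_with_cache, the helper count_lines,
and the exact cache file name (source name plus '.dataset') are hypothetical, and no
staleness check is attempted.

```python
import os
import pickle
import tempfile

def load_with_cache(csv_path, parse):
    """Parse csv_path once, then reuse a pickled copy stored beside it."""
    cache_path = csv_path + '.dataset'   # assumed naming, for illustration only
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as fh:
            return pickle.load(fh)
    with open(csv_path) as fh:
        data = parse(fh.read())
    # This write is why the directory holding the CSV must be writable.
    with open(cache_path, 'wb') as fh:
        pickle.dump(data, fh)
    return data

def count_lines(text):
    """Trivial stand-in parser used only for this demonstration."""
    return len(text.splitlines())

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'example.csv')
with open(path, 'w') as fh:
    fh.write('id,expt\nA,0.5\n')

first = load_with_cache(path, count_lines)     # parses and writes the cache
cache_exists = os.path.exists(path + '.dataset')
second = load_with_cache(path, count_lines)    # served from the cache
```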
 
Now you can ask questions:
 
>>> print DATA.bound('HNF4a_HepG2',threshold=0.001)  #Produces ['NM_000345']
>>> print DATA.bound('HNF4a_HepG2',0.01)             #Produces ['NM_000345', 'NM_000347']
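
Conceptually, a threshold query of this kind is a filter over one column of the table.  The
sketch below reproduces the two results using the sample p-values from the table above.  The
function bound_ids is hypothetical, and the inclusive comparison (<=) is an assumption
inferred from the second example, where a p-value of exactly 0.01 is reported at a threshold
of 0.01.

```python
# P-values for one experiment, copied from the sample table above.
HNF4A_HEPG2 = {'NM_000345': 0.0001, 'NM_000347': 0.01, 'NM_000456': 0.21}

def bound_ids(pvalues, threshold=0.001):
    """Return the sorted ids whose p-value is at or below threshold."""
    return sorted(pid for pid, p in pvalues.items() if p <= threshold)

strict = bound_ids(HNF4A_HEPG2, threshold=0.001)
loose = bound_ids(HNF4A_HEPG2, 0.01)
```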
 
If the input file contains expression data, the Dataset object can be queried for genes that
are overexpressed or underexpressed according to the ratios stored in the dataset:
 
 
>>> genes = DATA.ratioabove('yeast_heat',2.0)  #With correct dataset, produces upregulated gene list
>>> genes = DATA.ratiobelow('yeast_heat',0.2)  #With correct dataset, produces downregulated gene list
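
The ratio queries follow the same filtering idea, with the comparison direction reversed for
upregulated genes.  In this sketch the gene names and ratio values are invented purely for
illustration, the functions ratio_above and ratio_below are hypothetical, and whether the
real methods treat the threshold as inclusive is an assumption.

```python
# Hypothetical expression ratios (treated / control); values are made up.
RATIOS = {'geneA': 3.5, 'geneB': 1.1, 'geneC': 0.15, 'geneD': 2.0}

def ratio_above(ratios, threshold=2.0):
    """Return sorted ids whose ratio is at or above threshold (upregulated)."""
    return sorted(g for g, r in ratios.items() if r >= threshold)

def ratio_below(ratios, threshold):
    """Return sorted ids whose ratio is at or below threshold (downregulated)."""
    return sorted(g for g, r in ratios.items() if r <= threshold)

up = ratio_above(RATIOS, 2.0)
down = ratio_below(RATIOS, 0.2)
```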
 
In conjunction with the ProbeSet object (in the MotifMetrics module), these genes may be
directly associated with sequences.
 
A 'metaDataset' provides a way to treat a collection of '.csv' files as a single dataset.
 
Other member functions include:
    boundq(experiment,id,threshold)   # True or False: is this id bound (or ratiobelow) in this experiment?
    probe_boundby(id,threshold)       # List of experiments in which 'id' is bound (or ratiobelow).
    value(experiment,id)              # Query a single value
    values(experiment, idlist)        # Query many values
    scores(experiment)                # Query all values, as (value, id) tuples
    boundre(regexp,threshold)         # Logical 'and' over all experiments matching regexp (bound or ratiobelow)
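
The per-id queries listed above can be sketched over a plain {experiment: {id: value}}
mapping.  These are hypothetical stand-ins, not the Dataset methods themselves: the function
names, the inclusive threshold, and the sorted return order are all assumptions made for the
illustration.  The values come from the sample table at the top of this document.

```python
import re

TABLE = {
    'HNF4a_HepG2':  {'NM_000345': 0.0001, 'NM_000347': 0.01,  'NM_000456': 0.21},
    'HNF4a_Hepcyt': {'NM_000345': 0.01,   'NM_000347': 0.443, 'NM_000456': 0.04},
    'HNF1a_HepG2':  {'NM_000345': 0.343,  'NM_000347': 0.13,  'NM_000456': 1.0},
    'HNF1a_Hepcyt': {'NM_000345': 0.23,   'NM_000347': 0.5,   'NM_000456': 0.004},
}

def is_bound(table, experiment, probe, threshold=0.001):
    """True if probe's value in experiment is at or below threshold."""
    return table[experiment][probe] <= threshold

def bound_by(table, probe, threshold=0.001):
    """Sorted experiment names in which probe is at or below threshold."""
    return sorted(e for e, col in table.items() if col[probe] <= threshold)

def scores(table, experiment):
    """All values of one experiment as (value, id) tuples, lowest first."""
    return sorted((v, p) for p, v in table[experiment].items())

def bound_in_all(table, pattern, threshold=0.001):
    """Ids at or below threshold in every experiment whose name matches pattern."""
    names = [e for e in table if re.search(pattern, e)]
    if not names:
        return []
    ids = set(table[names[0]])
    for name in names:
        ids &= set(p for p, v in table[name].items() if v <= threshold)
    return sorted(ids)

hit = is_bound(TABLE, 'HNF4a_HepG2', 'NM_000345')
where = bound_by(TABLE, 'NM_000345', 0.01)
ranked = scores(TABLE, 'HNF1a_Hepcyt')
both = bound_in_all(TABLE, 'HNF4a', 0.05)
```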
 
In the metaDataset object, there are the member functions:
    highest_n(experiment,N,threshold) # Return a list of N ids with values above threshold
    lowest_n(experiment,N,threshold)  # Return a list of N ids with values below threshold
    scores(experiment)                # Same as for Dataset object
    values(experiment,idlist)         # Same as for Dataset object
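
One way to picture the metaDataset idea is to merge several per-file tables into one mapping
and then rank ids within an experiment.  The sketch below is an assumption about the general
approach rather than TAMO's implementation: merge_tables and the two ranking functions are
hypothetical, the strict comparisons against the threshold are a guess, and the sample values
are invented.

```python
import heapq

def merge_tables(tables):
    """Combine several {experiment: {id: value}} tables into one."""
    merged = {}
    for table in tables:
        for experiment, column in table.items():
            merged.setdefault(experiment, {}).update(column)
    return merged

def highest_n(table, experiment, n, threshold=None):
    """Up to n ids with the largest values, optionally only those above threshold."""
    items = [(v, p) for p, v in table[experiment].items()
             if threshold is None or v > threshold]
    return [p for v, p in heapq.nlargest(n, items)]

def lowest_n(table, experiment, n, threshold=None):
    """Up to n ids with the smallest values, optionally only those below threshold."""
    items = [(v, p) for p, v in table[experiment].items()
             if threshold is None or v < threshold]
    return [p for v, p in heapq.nsmallest(n, items)]

# Two hypothetical per-file tables with invented values.
file_a = {'expt1': {'id1': 0.9, 'id2': 0.2}}
file_b = {'expt1': {'id3': 0.5}, 'expt2': {'id1': 0.01}}

merged = merge_tables([file_a, file_b])
top_two = highest_n(merged, 'expt1', 2)
low_capped = lowest_n(merged, 'expt1', 2, threshold=0.4)
```

Fewer than N ids come back when the threshold excludes the rest, as in the last call.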
    
 
Copyright (2005) Whitehead Institute for Biomedical Research
All Rights Reserved
Author: David Benjamin Gordon

 
Modules
       
TAMO.seq.Fasta
copy
copy_reg
math
multiarray
os
pickle
re
shelve
string
sys
tempfile
time
types

 
Classes
       
Dataset
Experiment
metaDataset

 
class Dataset
    Represent a dataset of p-values of binding regions -- See TAMO.HT module documentation
 
  Methods defined here:
__init__(self, csvfile)
bound(self, expt, threshold=0.001)
boundq(self, experiment, probe, threshold=0.001)
boundre(self, regexps, threshold=0.001)
matching_exp(self, regexp)
overexpressed(self, expt, threshold=2.0)
probe_boundby(self, probe, threshold=0.001)
pvalue(self, experiment, probe)
ratioabove(self, expt, threshold=2.0)
readpickle(self)
readprobecsv(self)
savepickle(self)
scores(self, experiment)
underexpressed(self, expt, threshold=0.001)
value(self, experiment, probe)
values(self, experiment, probeids)

 
class Experiment
  Methods defined here:
__init__(self, name=None)

 
class metaDataset
    Represent a dataset of p-values of binding regions -- See TAMO.HT module documentation
 
  Methods defined here:
__init__(self, filelist=[])
bound(self, expt, pvalue=0.001)
boundprobes(self, experiment_id, pvalue=0.001)
highest_n(self, experiment_id, N, threshold=None)
lowest_n(self, experiment_id, N, threshold=None)
matching_exp(self, regexp)
overexpressed(self, expt, threshold=2.0)
ratioabove(self, expt, threshold=2.0)
scores(self, expt)
underexpressed(self, expt, threshold=0.001)
values(self, expt, probeids)

 
Functions
       
arange(...)
arange(start, stop=None, step=1, typecode=None)
 
 Just like range() except it returns an array whose type can be
specified by the keyword argument typecode.
array(...)
array(sequence, typecode=None, copy=1, savespace=0) will return a new array formed from the given (potentially nested) sequence with type given by typecode.  If no typecode is given, then the type will be determined as the minimum type required to hold the objects in sequence.  If copy is zero and sequence is already an array, a reference will be returned.  If savespace is nonzero, the new array will maintain its precision in operations.
arrayrange = arange(...)
choose(...)
choose(a, (b1,b2,...))
cross_correlate(...)
cross_correlate(a,v, mode=0)
fromstring(...)
fromstring(string, typecode='l', count=-1) returns a new 1d array initialized from the raw binary data in string.  If count is positive, the new array will have count elements; otherwise its size is determined by the size of string.
l_and(l1, l2)
l_andnot(l1, l2)
l_intersection(l1, l2)
l_or(l1, l2)
l_union(l1, l2)
l_xor(l1, l2)
loadcsv(filename)
reshape(...)
reshape(a, (d1, d2, ..., dn)).  Change the shape of a to be an n-dimensional array with dimensions given by d1...dn.  Note: the size specified for the new array must be exactly equal to the size of the old one or an error will occur.
searchsorted = binarysearch(...)
binarysearch(a,v)
take(...)
take(a, indices, axis=0).  Selects the elements in indices from array a along the given axis.
vcsv2dict(filename)
zeros(...)
zeros((d1,...,dn),typecode='l',savespace=0) will return a new array of shape (d1,...,dn) and type typecode with all its entries initialized to zero.  If savespace is nonzero the array will be a spacesaver array.

 
Data
Character = 'c'
Complex = 'D'
Complex0 = 'F'
Complex16 = 'F'
Complex32 = 'F'
Complex64 = 'D'
Complex8 = 'F'
Float = 'd'
Float0 = 'f'
Float16 = 'f'
Float32 = 'f'
Float64 = 'd'
Float8 = 'f'
Int = 'l'
Int0 = '1'
Int16 = 's'
Int32 = 'i'
Int8 = '1'
LittleEndian = True
NewAxis = None
PyObject = 'O'
UInt = 'u'
UInt16 = 'w'
UInt32 = 'u'
UInt8 = 'b'
UnsignedInt16 = 'w'
UnsignedInt32 = 'u'
UnsignedInt8 = 'b'
UnsignedInteger = 'u'
absolute = <ufunc 'absolute'>
add = <ufunc 'add'>
arccos = <ufunc 'arccos'>
arccosh = <ufunc 'arccosh'>
arcsin = <ufunc 'arcsin'>
arcsinh = <ufunc 'arcsinh'>
arctan = <ufunc 'arctan'>
arctan2 = <ufunc 'arctan2'>
arctanh = <ufunc 'arctanh'>
bitwise_and = <ufunc 'bitwise_and'>
bitwise_or = <ufunc 'bitwise_or'>
bitwise_xor = <ufunc 'bitwise_xor'>
ceil = <ufunc 'ceil'>
conjugate = <ufunc 'conjugate'>
cos = <ufunc 'cos'>
cosh = <ufunc 'cosh'>
divide = <ufunc 'divide'>
divide_safe = <ufunc 'divide_safe'>
e = 2.7182818284590455
equal = <ufunc 'equal'>
exp = <ufunc 'exp'>
fabs = <ufunc 'fabs'>
floor = <ufunc 'floor'>
floor_divide = <ufunc 'floor_divide'>
fmod = <ufunc 'fmod'>
greater = <ufunc 'greater'>
greater_equal = <ufunc 'greater_equal'>
hypot = <ufunc 'hypot'>
invert = <ufunc 'invert'>
left_shift = <ufunc 'left_shift'>
less = <ufunc 'less'>
less_equal = <ufunc 'less_equal'>
log = <ufunc 'log'>
log10 = <ufunc 'log10'>
logical_and = <ufunc 'logical_and'>
logical_not = <ufunc 'logical_not'>
logical_or = <ufunc 'logical_or'>
logical_xor = <ufunc 'logical_xor'>
maximum = <ufunc 'maximum'>
minimum = <ufunc 'minimum'>
multiply = <ufunc 'multiply'>
negative = <ufunc 'negative'>
not_equal = <ufunc 'not_equal'>
pi = 3.1415926535897931
power = <ufunc 'power'>
remainder = <ufunc 'remainder'>
right_shift = <ufunc 'right_shift'>
sin = <ufunc 'sin'>
sinh = <ufunc 'sinh'>
sqrt = <ufunc 'sqrt'>
subtract = <ufunc 'subtract'>
tan = <ufunc 'tan'>
tanh = <ufunc 'tanh'>
true_divide = <ufunc 'true_divide'>
typecodes = {'Character': 'c', 'Complex': 'FD', 'Float': 'fd', 'Integer': '1sil', 'UnsignedInteger': 'bwu'}