imlanal |
imlanal --mutation inserted_mutation --wtseqdoc path --wtseqfmt seqio_format --readnameconvention attributelist=expression --tagattribute attribute=valueform --outseqpath filepath --cloneattribute attributelist --revcompif attribute=pattern --threshold 97 --alignoutformat fmt < seqread > gff_results
Determine the locations of mutation events in a (screened) insertion mutation library (iml) by analyzing selected sequence reads.
Given the wild-type sequence for a gene, a description of an insertion mutation, and the DNA sequence reads of a clone library being screened for insertion mutations of the gene, recover the location of the mutation on each clone and, if present, map it to the corresponding location on the wild-type (by way of an assembly). Produce as output two files: 1) a GenBank file holding (a copy of) the wild-type sequence along with features corresponding to the recovered and mapped mutation locations; 2) a GFF file describing only the features.
Include a diagnostic message for any clone that fails to produce a mutation.
Read the pseudo-code in the method for more details.
Option names are case-insensitive, and they may be abbreviated to uniqueness (e.g. --v instead of --verbose).
Options may be specified on the command line, and may optionally also be read from files by providing on the command line the path to the file preceded by a '@'. These option files provide simple access to typical calling scenarios (such as an analysis that is repeatedly invoked from the command line with the same parameters). Additionally, if the current directory contains a file named imlanal.config, it will be automatically used as an option file.
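The abbreviation and case rules described above match the default behaviour of Perl's core Getopt::Long module. The following is a minimal sketch under the assumption that a Getopt::Long-style parser is in use (the source does not say which parser imlanal uses); the option table is a hypothetical subset of the real one.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical subset of imlanal's options, for illustration only.
my @spec = ('verbose', 'threshold=i', 'wtseqdoc=s');

# Mixed case and unique abbreviations, as the documentation allows.
my @args = ('--V', '--Thresh', '97');

my %opt;
GetOptionsFromArray(\@args, \%opt, @spec)
    or die "could not parse options\n";

print "verbose=$opt{verbose} threshold=$opt{threshold}\n";
```

An abbreviation that matches more than one option name is rejected by Getopt::Long as ambiguous, which is what "abbreviated to uniqueness" implies.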
Defaults to a filename formed by appending to wtseqdoc a new extension consisting of this script's name followed by ``.gb'' (i.e. by adding .imlanal.gb).
Implemented as an association between a comma-delimited list of attribute names and a perl expression which, when evaluated, yields corresponding values for each attribute. The expression is evaluated in the context of $_ being the filename holding a seqread.
For example:
--readnameconvention='gene,library,plate,primer,platewellcoord=split /[-_.]/'
when reading the file: VPL3-15-I-BSB460_A01.seq
will assign to the current read attributes as follows:
gene => VPL3
library => 15
plate => I
primer => BSB460
platewellcoord => A01
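As an illustration of how such an attributelist=expression pair could be evaluated, here is a minimal sketch. The helper name parse_read_name is hypothetical and this is not necessarily how imlanal implements the option; it only reproduces the assignment shown in the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: split "attributelist=expression", evaluate the
# expression with $_ set to the read's file name, and pair names with values.
sub parse_read_name {
    my ($convention, $filename) = @_;
    my ($attrlist, $expr) = split /=/, $convention, 2;
    my @names = split /,/, $attrlist;

    local $_ = $filename;
    my @values = eval $expr;
    die "bad expression '$expr': $@" if $@;

    my %attr;
    @attr{@names} = @values;    # surplus values (here the "seq" suffix) are dropped
    return \%attr;
}

my $attr = parse_read_name(
    'gene,library,plate,primer,platewellcoord=split /[-_.]/',
    'VPL3-15-I-BSB460_A01.seq',
);
print "$_ => $attr->{$_}\n" for qw(gene library plate primer platewellcoord);
```

Because the expression is run through a string eval, it should only ever come from a trusted command line or option file.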
Named aliases exist as shorthand methods for decoding commonly used naming conventions. The following are predefined:
(TODO - IMPLEMENT ALIASES - THIS FEATURE IS NOT IMPLEMENTED!?!?!)
This is used to allow tracking whether a mutation has already been identified for the clone of the current read, allowing imlanal to skip processing such reads (in the name of efficiency).
For the simr1 naming convention, this should name all the attributes except for the primer.
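The clone tracking described above can be pictured as a lookup keyed on the clone attributes. This is a minimal sketch with made-up read attributes; the names clone_key and %mutation_found are hypothetical, and the sketch simply pretends that every processed read yields a mutation.

```perl
use strict;
use warnings;

# Attributes that identify a clone: everything except the primer.
my @clone_attrs = qw(gene library plate platewellcoord);

# Build a key identifying the clone a read belongs to.
sub clone_key {
    my ($attr) = @_;
    return join "\t", map { $attr->{$_} } @clone_attrs;
}

my %mutation_found;    # clone key => true once a mutation location is recovered

# Illustrative reads: two primers for well A01, one for well A02.
my @reads = (
    { gene => 'VPL3', library => 15, plate => 'I', primer => 'BSB460', platewellcoord => 'A01' },
    { gene => 'VPL3', library => 15, plate => 'I', primer => 'BSB458', platewellcoord => 'A01' },
    { gene => 'VPL3', library => 15, plate => 'I', primer => 'BSB460', platewellcoord => 'A02' },
);

my @processed;
for my $read (@reads) {
    my $key = clone_key($read);
    next if $mutation_found{$key};    # clone already resolved: skip this read
    push @processed, "$read->{platewellcoord}/$read->{primer}";
    $mutation_found{$key} = 1;        # assumption: this read yielded a mutation
}
print "processed: @processed\n";
```

Leaving the primer out of the key is what lets the forward and reverse reads of one clone share a single entry.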
If supplied, the alignment between each readseq and the wild-type reference sequence is printed, as a trace, in the named format to the file whose name is formed by appending ``.aln.<alignoutformat>'' to the --outseqpath.
Note: The list of formats is that provided by the Bio::AlignIO module.
For example:
--revcompif primer=m/BSB460|BSB458/
--revcompif primer=m/_R$/
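A minimal sketch of how an attribute=pattern pair like the ones above might be applied. It assumes the pattern is evaluated with $_ set to the value of the named attribute; the helper names revcomp and orient_read are hypothetical, and plain string operations stand in for whatever BioPerl calls the script actually makes.

```perl
use strict;
use warnings;

# Reverse complement a DNA string (plain A/C/G/T, either case).
sub revcomp {
    my ($seq) = @_;
    my $rc = scalar reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical helper: reverse complement $seq when the named attribute
# matches the pattern given as "attribute=pattern".
sub orient_read {
    my ($spec, $attr, $seq) = @_;
    my ($name, $pattern) = split /=/, $spec, 2;

    local $_ = $attr->{$name};
    my $matched = eval $pattern;
    die "bad pattern '$pattern': $@" if $@;

    return $matched ? revcomp($seq) : $seq;
}

my $spec    = 'primer=m/BSB460|BSB458/';
my $flipped = orient_read($spec, { primer => 'BSB460' }, 'AACGT');
my $kept    = orient_read($spec, { primer => 'M13F' },   'AACGT');
print "flipped=$flipped kept=$kept\n";
```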
The value of the attribute is the result of evaluating the expression in the context of $_ being set to a reference to a hash of other current read attributes (such as established using readnameconvention).
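For illustration, an attribute=valueform pair could be evaluated along these lines. This is a minimal sketch: the helper name tag_value and the example valueform are hypothetical, chosen only to show $_ acting as a reference to the hash of read attributes.

```perl
use strict;
use warnings;

# Hypothetical helper: evaluate the valueform of "attribute=valueform" with
# $_ set to a reference to the hash of the read's other attributes.
sub tag_value {
    my ($spec, $attr) = @_;
    my ($name, $expr) = split /=/, $spec, 2;

    local $_ = $attr;
    my $value = eval $expr;
    die "bad expression '$expr': $@" if $@;

    return ($name, $value);
}

my %read_attr = (gene => 'VPL3', plate => 'I', platewellcoord => 'A01');
my ($tag, $value) = tag_value(
    'label=join "-", $_->{plate}, $_->{platewellcoord}',
    \%read_attr,
);
print "$tag => $value\n";
```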
Assuming your account is configured so that your .tcshrc sources imlanal.tcshrc,
./imlanal.tcshrc
further assuming that you wish to use the aliases established there (which are especially useful to Vlad), and so want to use the mgs1 configuration options,
./config/mgs1.imlanal
then change to a test directory (recursively) holding sequence reads:
$ cd /home/mec/src/imlanal/t/data/VAP/VPL3/data-VPL3_15_I/ (http://helix/~mec/src/imlanal/t/data/VAP/VPL3/data-VPL3_15_I/)
and analyze all the sequence read files below it:
$ imlmgslib
OR
The GFF files can be opened using Vector NTI version 8 under Windows. Its GFF reader ignores all but the label attribute (I have submitted a bug report / feature request to them on this topic).
The GenBank file may also be viewed in Vector NTI (any version), and will show ALL the feature labels generated.
Not yet under version control.
Malcolm Cook mec@stowers-institute.org
perl, BioPerl, clustalw, argvFile
If this script is invoked as a CGI program, it produces HTML to document itself.
It detects that it is running as a CGI by looking for '.cgi' as an extension on the name of the running script. Thus the script should be installed without an extension, and a symbolic link to it should be created with same name, only having .cgi as an extension.
Email the author for sources.
...or get the source now! ...or see the htmlized source!
For each sequence read in the library:
> Skip it if the read belongs to a clone for which a mutation location has already been recovered
> Reverse complement it if needed (i.e. it is a reverse read)
> Identify the number and location of subsequences matching the insertion mutation
> Filter out any such subsequences whose flanking sequences do NOT match
> Recover the wild-type clone sequence by splicing out the inserted sequence and one of the reduplicated flanking sequences
> Align the recovered clone to the actual wild-type.
> Skip it if the alignment is poor
> Use the alignment to map the insertion location on the clone to the wild-type
> Create the necessary output data structures (GFF and Genbank features)
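The identify, filter, and splice steps above can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes exact string matches for the inserted sequence and a fixed-length duplicated flank, and it omits the alignment and threshold steps entirely. The function name recover_wildtype and all sequences are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: find the insert in a read, keep only hits whose
# $dup_len flanking bases are duplicated on both sides, then splice out the
# insert plus one copy of the duplicated flank.
# Returns (recovered sequence, 0-based offset of the insertion), or an empty
# list unless exactly one hit survives the flank filter.
sub recover_wildtype {
    my ($read, $insert, $dup_len) = @_;
    my @hits;
    my $pos = 0;
    while ((my $i = index($read, $insert, $pos)) >= 0) {
        $pos = $i + 1;
        next if $i < $dup_len;    # not enough sequence left of the insert
        my $left  = substr($read, $i - $dup_len, $dup_len);
        my $right = substr($read, $i + length($insert), $dup_len);
        next unless length($right) == $dup_len && $left eq $right;
        push @hits, $i;
    }
    return unless @hits == 1;

    my $i = $hits[0];
    my $recovered = substr($read, 0, $i)
                  . substr($read, $i + length($insert) + $dup_len);
    return ($recovered, $i);
}

# Made-up example: a 5-base flank duplicated around a 9-base insert.
my ($head, $flank, $tail) = ('ACGT', 'TGCAT', 'GCCGATTA');
my $insert   = 'GGGAAACCC';
my $wildtype = $head . $flank . $tail;
my $read     = $head . $flank . $insert . $flank . $tail;

my ($recovered, $offset) = recover_wildtype($read, $insert, length $flank);
print "recovered wild-type: ", ($recovered eq $wildtype ? "yes" : "no"),
      ", insertion offset: $offset\n";
```

Real reads contain sequencing errors, so an exact index() search is only a stand-in for whatever matching the script performs; the subsequent alignment step is what maps the offset onto the reference.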
create a distribution & installation package
> Is the location of insertion uniformly distributed across the sequence?
> Provided a partitioning of the sequence, is the partitioning predictive of the distribution of insertions?
trim sequence reads for quality (phred?) and/or vector contamination (crossmatch?)
assemble (using phrap?) reads from the same clone together prior to alignment
align all reads against wildtype in one fell swoop - subsequently ensure that reads from the same clone