TRACTS online help page

Input Parameters

Output reports

 

Input Parameters 

Annotated sequence file

To determine the genic subregion in which a tract resides, TRACTS must know where each exon and intron starts and ends. This information is obtained by parsing the annotation file. The sequence file is therefore entered in this window together with its annotation in GenBank, EMBL or DDBJ style, in flat file format (suffix .gbk). The extracted data are shown on the sequence output by different background colors for exon +, exon -, intron etc., and are listed in the Gene table. If the sequence to be analyzed differs from the one attached to the annotation, it can be entered separately in the "Alternate sequence entry file" window below, and that sequence will be processed.

 Important notes:

Exact numerical correspondence between the annotation and sequence is required for inspection of outputs.
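As an illustration of how exon and intron boundaries follow from an annotation, the following minimal Python sketch derives them from a simple GenBank-style location string. It is not the parser used by TRACTS; the function name `exons_and_introns` is hypothetical, and the sketch assumes plain `start..end` pairs (uncertain positions marked with `<` or `>` are not handled).

```python
import re

def exons_and_introns(location):
    """Split a simple location such as 'join(10..50,80..120)' into
    exon and intron ranges (1-based, inclusive)."""
    # Collect every 'start..end' pair in order of appearance.
    exons = [(int(a), int(b))
             for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]
    # An intron is the gap between two consecutive exons.
    introns = [(prev_end + 1, next_start - 1)
               for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]
    return exons, introns

exons, introns = exons_and_introns("join(10..50,80..120,200..260)")
print(exons)    # [(10, 50), (80, 120), (200, 260)]
print(introns)  # [(51, 79), (121, 199)]
```

Because the ranges are plain sequence coordinates, they only point at the right bases when the annotation and the sequence correspond exactly, as noted above.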

Extract using mRNA/CDS

These buttons are in the upper right corner of the Annotated sequence window. Annotation files may contain mRNA feature keys, CDS feature keys, or both. When both are present, the user can choose between the mRNA and CDS (and exon/intron) features. mRNA is the preferred feature whenever it is provided. However, when the mRNA data are incomplete (many mRNA start or end positions are uncertain) or missing (check the annotation file), the CDS data need to be used instead. In that case noncoding mRNA will be read as intergenic ("intercoding" now), with the consequence that 5' and 3' UTR regions will be assigned and counted as intercoding. RNA genes (tRNA, ribosomal etc.) are extracted and listed in both cases. The default option is CDS.

Browse

Use this box to transfer your input annotation file directly, if saved on your machine.

Alternate sequence entry file

If for any reason the sequence attached to the annotation file is not the required one, you can enter your own sequence separately in this window. You can also process a sequence alone, without annotation. In that case only the "Tracts list", "Tract frequencies" and "Sequence output" outputs will be generated, without the sub-regional distribution and Gene tables.

Binary Motif

The user can choose between the three possible pairs: R.Y (purine.pyrimidine), K.M (keto.imino) or S.W (strong.weak). The R.Y and K.M tracts can be run together because these tracts usually distribute evenly between the two DNA strands. S and W are better run separately, as weak and strong sequences tend to behave quite differently.

An option "None" is provided to produce a color-annotated sequence without any tracts being indicated. At the request of a referee, an option to run unary sequences (polyA, polyC etc.) has been added.
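To make the notion of a binary tract concrete, here is a minimal Python sketch that finds uninterrupted runs of one binary class at a 100% match level. It is only an illustration, not the TRACTS implementation: the names `BINARY_CLASSES` and `find_tracts` are hypothetical, and the base assignments follow the standard IUPAC two-base codes.

```python
import re

# IUPAC two-base classes; each pair partitions A, C, G, T.
BINARY_CLASSES = {
    "R": "AG", "Y": "CT",   # purine / pyrimidine
    "K": "GT", "M": "AC",   # the K.M pair
    "S": "CG", "W": "AT",   # strong / weak
}

def find_tracts(sequence, motif, min_length=15):
    """Return (start, end, length) for each maximal run of `motif` bases,
    1-based inclusive, allowing no nonspecified bases."""
    pattern = re.compile("[%s]{%d,}" % (BINARY_CLASSES[motif], min_length))
    return [(m.start() + 1, m.end(), m.end() - m.start())
            for m in pattern.finditer(sequence.upper())]

seq = "CTAGGAGAAGGAGCTTTCCTCTA"
print(find_tracts(seq, "R", min_length=5))  # [(3, 13, 11)]
print(find_tracts(seq, "Y", min_length=5))  # [(14, 22, 9)]
```

The default of 15 mirrors the default minimum length of the Tracts list; a short minimum is used in the example only to keep the sequence readable.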

Match Level

TRACTS can also identify binary tracts in which a limited percentage of nonspecified bases is included. Thus at a 90% match level, one nonspecified base ("nonbase") in ten is permitted, e.g. 3 C in a 30 nt R tract. TRACTS handles match levels down to 70%, in intervals of 5%. Below 70%, tracts cover most of the genome and difficulties in calculating expected values are encountered. 100% is the default level.
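The arithmetic behind the match level can be sketched in a few lines of Python. The function name `allowed_nonbases` is hypothetical, and rounding down for lengths that do not divide evenly is an assumption of this sketch; how TRACTS itself rounds is not stated here.

```python
def allowed_nonbases(tract_length, match_percent):
    """Number of nonspecified bases tolerated in a tract of the given length."""
    # Match levels run from 70% to 100% in steps of 5%.
    if match_percent not in range(70, 101, 5):
        raise ValueError("match level must be 70-100 in steps of 5")
    # Integer arithmetic avoids floating-point rounding surprises;
    # rounding down is an assumption of this sketch.
    return tract_length * (100 - match_percent) // 100

print(allowed_nonbases(30, 90))   # 3
print(allowed_nonbases(30, 100))  # 0
```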

Important: when a match level other than 100% is chosen, the "Tract frequencies" table will not be generated, because the expected values calculated here are valid for 100% only. Those interested in calculating expected values for match levels below 100% should contact gad.yagil@weizmann.ac.il.

Comment

 

A line for free text, to enter auxiliary data of your run and comments, to be displayed on top of the output reports. 


Select Output Report

In these five checkboxes you can select the output files (A-E) you need. For certain inputs not all outputs can be generated, as described below. By default, all output boxes except "Annotated Sequence" are checked. The "Annotated Sequence" output is the largest, so to save time (determined mainly by data transfer rates) it is left unchecked by default.

A.     Tracts list

Produces a list of all tracts at or above a minimum length, which is selected in the pull down box on the right. Range is 10 - 50. Default is 15 bases. This feature is enabled only when a Binary Motif is chosen.

B.     Tract frequencies table

Produces a table which summarizes the frequencies of the chosen tracts, their found and expected values, as well as found/expected ratios. Note: The lower limits selected in A. or D. are for display purposes only. For the Tract frequencies table, all tract lengths are identified and listed.

C.     Sub-region distribution table

The sub-regional distributions will be calculated for all tracts equal to or longer than the tract length selected in the pull down box on the right. This table is generated only when a Binary Motif is chosen and an annotation file is provided.

D. Annotated Sequence

Relists the sequence, with exons and introns colored in the background. Tracts at or above a chosen minimum length are shown as colored letters. The minimum length is selected in the pull down box on the right. The lower the number chosen, the more tracts will be marked on the sequence. Range is 7 - 50. Default is 10 bases.

E.      GeneTable

A list of all genes (exons/introns) in the chromosome/contig/scaffolds analyzed. (Note: an exon/intron list is produced whether CDS or mRNA is specified.)


 

Output reports                                                                                                                    

A. Tract list

A list of all the tracts that are longer than or equal to a given length, as specified by the user during input. The tracts are listed according to their order of appearance in the sequence. Each line shows:

B. Tracts frequencies

A table in which the lines show:

·    Column 1: The lengths of the tracts, in nt.

·    Column 2: The number of tracts of one member of the binary pair chosen.

·    Column 3: The number of tracts of the other member (this and the next column will be absent if only a single tract is chosen).

·    Column 4: The sum of both members, i.e. the number of tracts of length l found in the input sequence.

·    Column 5: The number of tracts of that length expected in a random DNA sequence of the same length and base composition as the input sequence, calculated as L(p^l q^2 + q^l p^2).

·    Column 6: The difference between columns 5 and 4.

·    Column 7: The number of bases found in tracts of the length listed in column 1 (i.e. column 4 multiplied by tract length l).

·    Column 8: The number of bases expected in random DNA (i.e. column 5 multiplied by tract length l).

·    Column 9: The ratio between the number found and the number expected, i.e. column 4 divided by column 5. This yields the same value as column 7 divided by column 8, i.e. the value is the same whether the number of tracts or the number of bases in these tracts is counted. The ratio is the best indicator of under- or over-representation of the binary tracts.

·    Column 10: The number of bases found in tracts equal to or longer than (GE) the length listed in column 1.

·    Column 11: The number of bases expected in random DNA in tracts GE l. (For the formula see paper.)

·    Column 12: The ratio of found/expected GE l values (i.e. column 10 divided by column 11).

The numbers of tracts or bases expected (and the ratio values) in this table are valid only when a match level of 100% is specified (see Match Level); if other match level values are specified, the table will not be generated.
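The expected number of tracts of a given length can be illustrated with a short Python sketch. It reads the formula as L(p^l q^2 + q^l p^2) and assumes that L is the sequence length, l the tract length, and p and q = 1 - p the fractions of the two members of the binary pair in the input sequence; these interpretations, and the function name `expected_tracts`, are assumptions of the sketch rather than statements from this page.

```python
def expected_tracts(seq_length, p, tract_length):
    """Expected number of tracts of exactly `tract_length` bases in random
    DNA, at a 100% match level (assumed reading of the formula)."""
    q = 1.0 - p          # fraction of the other member of the pair
    l = tract_length
    # A tract of one member is l bases of that member flanked on both
    # sides by the other member; sum over both members.
    return seq_length * (p**l * q**2 + q**l * p**2)

# 1 Mb sequence, equal fractions of both members, tracts of 10 nt.
print(expected_tracts(1_000_000, 0.5, 10))  # 488.28125
```

Multiplying this value by the tract length gives the expected number of bases in such tracts, which is why a found/expected ratio is the same whether tracts or bases are counted.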

C. Sub region distributions

A summary table showing:

·    Number of bases in exons, introns, and intergenic regions in the input sequence.

·    Percentage of exons, introns, and intergenic regions in the input sequence.

·    Number of bases found in tracts of each genomic sub region. The numbers shown are for tracts equal to or longer than the length selected by the user.

·    Number of bases expected in tracts of each genomic sub region.

This table can be generated only when the annotation data gives the subregional composition information.

D. Annotated Sequence

The full sequence analyzed, in a convenient format of 100 bases per line (in "blocks" of 10). Found tracts have their letters colored according to their binary motif. Exons and introns are indicated by their background colors; introns are in italics. The minimum tract length to be colored is selected by the user. Moving the mouse over a colored region shows a tool tip indicating the gene name, gene product (function), sub region type and where the region starts and ends. Clicking on the region makes the display jump to the corresponding entry in the Gene table.
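The plain layout of this output (without the coloring) can be approximated with the following Python sketch. It assumes that the "100 base format" means 100 bases per line, split into space-separated blocks of 10; the function name `format_sequence` is hypothetical and the actual TRACTS output is HTML with colors and tool tips.

```python
def format_sequence(sequence, line_width=100, block=10):
    """Lay out a sequence with `line_width` bases per line,
    separated into blocks of `block` bases."""
    lines = []
    for i in range(0, len(sequence), line_width):
        line = sequence[i:i + line_width]
        # Cut the line into blocks and join them with single spaces.
        blocks = [line[j:j + block] for j in range(0, len(line), block)]
        lines.append(" ".join(blocks))
    return "\n".join(lines)

# 120 bases give one full line of ten blocks and one line of two blocks.
print(format_sequence("ACGT" * 30))
```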

E. Gene table

A one-line summary of each exon and intron of all genes (RNAs) extracted from the annotated sequence. Each line shows:

This table can be generated only if the annotation data supplies the regional information.
