The Institute for Systems Biology

Interspersed Repeat Masking Based on Protein Similarity

Description

Query DNA sequences are compared to a database of proteins encoded by transposable elements, and the matching regions are masked. Copies of non-coding transposable elements, such as SINEs and the long terminal repeats of retroviral-like elements, will not be masked, so the masked sequence is not "ready" for DNA-based comparisons. However, this approach is especially useful when no repeat library is yet available for the query species and the primary concern is avoiding spurious matches in BLASTX-like searches. The method is also much faster than the DNA-based approach. False positives are rare, but be aware that genes derived from transposable elements exist and may be masked as well.
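To make the idea of masking concrete, here is a minimal Python sketch of what replacing matched regions in a DNA sequence could look like. It is not the server's implementation: the function name `mask_intervals`, the 0-based half-open coordinates, and the use of "N" as the masking character are all assumptions made for illustration.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Return a copy of `sequence` with each interval overwritten by `mask_char`.

    Assumptions of this sketch (not taken from the server):
    intervals are (start, end) pairs, 0-based and half-open, and may overlap.
    """
    masked = list(sequence)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range hits cannot fail.
        start = max(start, 0)
        end = min(end, len(masked))
        for i in range(start, end):
            masked[i] = mask_char
    return "".join(masked)


# Two overlapping hypothetical protein matches covering positions 2 to 6.
example = mask_intervals("ACGTACGTACGT", [(2, 5), (4, 7)])
print(example)  # ACNNNNNTACGT
```

Only the listed intervals are changed; everything else, including any non-coding repeats that produced no protein match, is left as it was.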

Please let us know if you experience any problems with your analysis. You may submit your feedback here.


Sequence File: Select a sequence file from your computer... or
Sequence: Enter a sequence directly. Multiple sequences must be in FASTA format.
Simple repeats?: In addition to masking matches to repeat proteins, also mask tandem repeats and low-complexity DNA.
Email Address (optional): Enter your email address if you would like to be notified by email when your results are ready.
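Since multiple sequences must be supplied in FASTA format, a small Python sketch for assembling such input may be helpful. The function name `to_fasta`, the record names, and the 60-column line width are illustrative assumptions; the form itself only states that FASTA is required.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as multi-FASTA text.

    Each record gets a '>' header line followed by the sequence,
    wrapped at `width` characters per line (an arbitrary choice here).
    """
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Hypothetical sequence names and contents, for illustration only.
fasta_text = to_fasta([("seq1", "ACGTACGT"), ("seq2", "GG")], width=4)
print(fasta_text)
```

The resulting text can be pasted into the Sequence box or saved to a file and uploaded through the Sequence File field.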

This server is made possible by funding from the National Human Genome Research Institute (NHGRI grant # R01 HG002939).