|
|
|
Welcome!
|
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). On average, almost 50% of a human genomic DNA sequence currently will be masked by the program. Sequence comparisons in RepeatMasker are performed by the program cross_match, an efficient implementation of the Smith-Waterman-Gotoh algorithm developed by Phil Green.
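As a rough illustration of the default masking step described above, the sketch below replaces annotated repeat intervals with Ns and reports the masked fraction. It is not RepeatMasker's own code; the function names and the 1-based, inclusive coordinate convention are assumptions made for this example.

```python
def mask_repeats(sequence, repeat_intervals, mask_char="N"):
    """Return `sequence` with every annotated interval replaced by `mask_char`.

    `repeat_intervals` holds (start, end) pairs. This sketch assumes they are
    1-based and inclusive; intervals may overlap.
    """
    masked = list(sequence)
    for start, end in repeat_intervals:
        # Convert to 0-based half-open indices and clamp to the sequence.
        lo = max(start, 1) - 1
        hi = min(end, len(masked))
        for i in range(lo, hi):
            masked[i] = mask_char
    return "".join(masked)


def masked_fraction(masked_sequence, mask_char="N"):
    """Fraction of positions equal to `mask_char`.

    Ns already present in the unmasked input are counted as well.
    """
    if not masked_sequence:
        return 0.0
    return masked_sequence.count(mask_char) / len(masked_sequence)


if __name__ == "__main__":
    query = "ACGTACGTAC"
    masked = mask_repeats(query, [(3, 6)])
    print(masked, masked_fraction(masked))
```

Building the result from a list of characters keeps overlapping intervals harmless, since masking a position twice has no further effect.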
|
|
Latest News
|
If you would like to keep up with news and announcements relating to
RepeatMasker, you can subscribe to the new
RepeatMasker Announcements List.
Disk Crash
Friday Nov 14, 2008
|
Last night the RepeatMasker cluster lost a disk and stopped responding to web requests. The cluster is now back up; however, jobs running at the time of the failure will have to be resubmitted. Sorry for the inconvenience.
|
COSEG Beta Released
Monday Aug 11, 2008
|
COSEG is a program which automatically identifies repeat subfamilies
using significant co-segregating ( 2-3 bp ) mutations.
This program is derived from three C programs and several perl
scripts written by Alkes Price as part of an analysis of Alu
elements in the human genome ( Whole-genome analysis of Alu repeat
elements reveals complex evolutionary history, Alkes L. Price,
Eleazar Eskin, and Pavel Pevzner, 2004 Genome Research ). The program
was first adapted for use with other repeat families and then
extended to support consideration of three co-segregating mutations
using Alkes Price's statistical model. In 2008, with the help of Andy Siegel,
an alternative statistical model was developed and the codebase was
repackaged into a single source file.
The code may be downloaded from here. We appreciate any
feedback on this new software package.
|
Pre-Masked Genomes Update - Drosophila, Mosquito, Fugu, and Zebrafish
Monday Aug 11, 2008
|
Today we updated the Pre-Masked Genomes page with the latest runs of RepeatMasker on the genome assemblies dm3 (Drosophila), anoGam1 (Mosquito), fr2 (Fugu), and danRer5 (Zebrafish). The complete annotation sets are also available for these genomes as compressed files.
|
RepeatModeler/WUBlast and RepeatScout Bugs
Friday Aug 8, 2008
|
If you see the error message "Not found: ....." or "Identifiers not found: #" in RepeatModeler open-1.0.3, there is a simple fix. Edit the RepeatModeler script and change line 1547 from:
`$RepModelConfig::XDGET_PRGM -n $xdfDBFile -a$start -b$end "$seqID"`;
to:
`$RepModelConfig::XDGET_PRGM -n -a$start -b$end $xdfDBFile "$seqID"`;
The error is caused by a parameter order dependency bug in WUBlast.
We have also updated the RepeatScout release ( RepeatScout-1.0.5 ) with a bugfix to the filter-stage-1.prl script.
Thanks to Eric Ganko and others for locating and reporting these problems.
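The xdget fix above amounts to placing all option flags ahead of the positional arguments (database file, then sequence identifier). A minimal Python sketch of that ordering is shown below. The helper name and its parameters are hypothetical; only the -n, -a and -b flags are taken from the line quoted in this item.

```python
def build_xdget_command(xdget_path, database, seq_id, start, end):
    """Build an argument list with every option before the positionals.

    Hypothetical helper for illustration; it mirrors the corrected line:
        XDGET_PRGM -n -a$start -b$end $xdfDBFile "$seqID"
    """
    options = ["-n", f"-a{start}", f"-b{end}"]
    positionals = [database, seq_id]
    return [xdget_path] + options + positionals


if __name__ == "__main__":
    # Placeholder path and names; pass the list to subprocess.run() to execute.
    print(build_xdget_command("xdget", "genome.xdf", "seq1", 100, 200))
```

Passing the command as a list rather than a single shell string also avoids quoting problems when a sequence identifier contains spaces or shell metacharacters.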
|
RepeatMasker and RepeatMasker Libraries Update
Thursday Aug 7, 2008
|
RepeatMasker open-3.2.6 was released along with an updated set of repeat libraries ( RM-20080801, including sequences up to RepBase 13.06 ). This release includes some repeat subclass nomenclature changes:
Charlie -> hAT-Charlie, Fot1 -> TcMar-Fot1,
MER1_type -> hAT-Charlie, MER2_type -> TcMar-Tigger,
MaLR -> ERVL-MaLR, Mariner -> TcMar-Mariner,
Pogo -> TcMar-Pogo, Tc1 -> TcMar-Tc1,
Tc2 -> TcMar-Tc2, Tc4 -> TcMar-Tc4,
Tigger -> TcMar-Tigger, Tip100 -> hAT-Tip100, and
hAT_Tol2 -> hAT-Tol2.
Note that the *.tbl output files also reflect these changes.
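If you need to compare annotations produced before and after this release, the renames listed above can be applied with a small lookup table. The sketch below is only an illustration: it assumes labels of the form "class/subclass" (for example "DNA/MER1_type"), and the function name is hypothetical.

```python
# Old subclass name -> new subclass name, as listed in this announcement.
SUBCLASS_RENAMES = {
    "Charlie": "hAT-Charlie",
    "Fot1": "TcMar-Fot1",
    "MER1_type": "hAT-Charlie",
    "MER2_type": "TcMar-Tigger",
    "MaLR": "ERVL-MaLR",
    "Mariner": "TcMar-Mariner",
    "Pogo": "TcMar-Pogo",
    "Tc1": "TcMar-Tc1",
    "Tc2": "TcMar-Tc2",
    "Tc4": "TcMar-Tc4",
    "Tigger": "TcMar-Tigger",
    "Tip100": "hAT-Tip100",
    "hAT_Tol2": "hAT-Tol2",
}


def update_label(label):
    """Translate an old-style label to the new nomenclature.

    Assumes "class/subclass" labels; a bare name is treated as a subclass.
    Labels with no entry in the table are returned unchanged.
    """
    class_name, sep, subclass = label.partition("/")
    if not sep:
        return SUBCLASS_RENAMES.get(label, label)
    return f"{class_name}/{SUBCLASS_RENAMES.get(subclass, subclass)}"


if __name__ == "__main__":
    for old in ("DNA/MER1_type", "LTR/MaLR", "SINE/Alu"):
        print(old, "->", update_label(old))
```

Note that the mapping is many-to-one (both Charlie and MER1_type become hAT-Charlie), so it cannot be inverted to recover the old names.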
|
RepeatMasker, RepeatMasker Libraries, and RepeatScout Updates
Friday Jun 13, 2008
|
RepeatMasker open-3.2.5 was released along with an updated set of repeat libraries ( RM-20080611, RepBase 13.02 available from GIRI ). This release includes several fixes to the alignment output format ( single "X"s in the query portion of the alignment, incorrect sequence indexes in lines containing only gap characters "-" ) and an updated set of libraries for the RepeatProteinMask program. In addition, a new release of RepeatScout is available for download ( http://repeatscout.bioprojects.org/ ). It fixes a bug in which the RepeatScout filter-stage-1.prl script skipped the last sequence in the input file. Thanks to Gyorgy Abrusan for reporting this.
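Skipping the final record is a common pitfall when a reader only emits a sequence on seeing the next header line. We do not claim this is how the filter-stage-1.prl bug arose; the Python sketch below simply shows a generic reader, assuming FASTA-formatted input, that flushes the last record after the loop ends. The function name is hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a list of (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Flush the final record: no header follows it, so the loop
    # above never emits it on its own.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = [">R=0", "ACGT", "ACGT", ">R=1", "TTTT"]
    print(read_fasta(example))
```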
|
Pre-Masked Genomes Update - Rhesus, Chicken, Rat, Cat, and Chimp
Wednesday Jun 11, 2008
|
Over the last few weeks we have updated the Pre-Masked Genomes page with the latest runs of RepeatMasker on the genome assemblies rheMac2 (Rhesus), galGal3 (Chicken), rn4 (Rat), felCat3 (Cat), and panTro2 (Chimp). The complete annotation sets are also available for these genomes as compressed files.
|
DupMasker - A Tool for Annotating Primate Segmental Duplications
Tuesday Jun 10, 2008
|
In collaboration with Evan Eichler and Zhaoshi Jiang at the University of Washington we developed the DupMasker program. DupMasker uses a library of non-redundant consensus sequences of human segmental duplications, wherein a majority of the ancestral origins have been determined based on comparisons to mammalian outgroup genomes. Using DupMasker, new human and non-human primate (NHP) sequences may be readily queried to provide details on the origin and degree of sequence identity of each duplicon. This program can be applied to delineate the order and orientation of duplicons within complex duplication blocks and used to characterize structural variation differences between sequenced human haplotypes. The paper describing this work can be found in the latest issue of Genome Research ( abstract ) and the software download page is here.
|
RECON Fixed for 64-bit Compilers
Wednesday May 21, 2008
|
We released a new version of RECON ( RECON1.06 ) which fixes problems users have experienced on 64-bit platforms ( crashes, lockups ). For now you can download our version here. Please remember to update your RepeatModeler configuration ( by running "configure" ) if you install the new RECON in a different location from previous versions.
|
Pre-Masked Genomes Update - Rice, Arabidopsis, Opossum, Platypus and Cow
Tuesday May 13, 2008
|
Today we updated the Pre-Masked Genomes page with the latest runs of RepeatMasker on the genome assemblies orySat5 (Rice), araTha5 (Arabidopsis), monDom4 (Opossum), ornAna1 (Platypus), and bosTau4 (Cow). The complete annotation sets are also available for these genomes as compressed files.
|
RepeatModeler: RECON Bug?
Wednesday May 7, 2008
|
We have noticed a few times that RECON's "re-definition of elements ( eleredef )" step appears to hang ( in a recent run it sat on this step for over 6 hours before we killed it ). If you also experience this problem while running RepeatModeler, please let us know. When we restart the run from the beginning, it has been able to complete this step in a reasonable amount of time ( hg18: 1-5 minutes ).
|
Pre-Masked Genomes Update - HG18 and MM9
Tuesday May 6, 2008
|
Today we updated the Pre-Masked Genomes page with the latest runs of RepeatMasker on the genome assemblies HG18 and MM9. In addition to the data query service we have also provided the ability to download the complete annotation sets as compressed files.
|
WUBlastXSearchEngine.pm Missing
Friday May 2, 2008
|
The WUBlastXSearchEngine.pm module was missing from yesterday's RepeatMasker release. This module is needed for the RepeatProteinMask program. Please re-download the 3.2.2 release of RepeatMasker if you had previously obtained it.
|
RepeatProteinMask Released, RepeatModeler/RepeatMasker Updates
Thursday May 1, 2008
|
The program which runs the repeat protein search on the website is now available as a standalone program within the RepeatMasker package.
Thanks to the Sanger Institute and various testers, we have patched a few bugs in the first RepeatModeler release. The fixes required a new version of RepeatMasker ( version 3.2.2 ), although the changes will not impact RepeatMasker results. If you have experienced problems installing the first release or running the repeat classifier, please download the latest RepeatMasker and RepeatModeler packages.
|
RepeatModeler Beta: Repeat Discovery Workbench Released
Wednesday April 16, 2008
|
RepeatModeler is a de-novo repeat family identification and modeling package. At the heart of RepeatModeler are two de-novo repeat finding programs ( RECON and RepeatScout ) which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data. RepeatModeler assists in automating the runs of RECON and RepeatScout given a genomic database and uses the output to build, refine and classify consensus models of putative interspersed repeats.
The software is available for download here.
Also note that RepeatMasker is now up to version 3.2.1. This version as well as the previous version are organizational updates to support RepeatModeler and have little or no impact on RepeatMasker results.
|
RepeatScout On Multiple Sequences
Tuesday April 1, 2008
|
RepeatScout is a highly successful de-novo repeat discovery algorithm developed by Price, A. L. et al. We have created a modified version of RepeatScout ( version 1.0.3 ) which supports searches on highly fragmented genomes. The new version does not attempt to extend seeds across sequence boundaries. Pavel Pevzner's lab has offered to host the download for this new version at: http://repeatscout.bioprojects.org/
Continuing with the theme of de-novo repeat identification, I have added two recent papers on the topic by Saha et al. to the Related Papers page.
|
Unexpected Downtime
Thursday, February 28 2008
|
At 3am last night the RepeatMasker cluster went down unexpectedly. We restored service at 10am this morning and are looking into the cause of the problem. If you had jobs queued/running at the time they will need to be resubmitted. We apologize for any inconvenience this may have caused.
|
New Compute Node & Scheduler
Wednesday, January 23 2008
|
The RepeatMasker cluster received a new compute node ( for processing web requests ) and an upgrade to the job scheduler over the weekend. This upgrade should improve the overall throughput of the masking service.
|
RepeatMasker open-3.1.9 Released
Friday, January 11 2008
|
A new version of RepeatMasker is available for download. Updates include:
- Codebase synced to recently released library RMLib 20071204.
- Improved DNA transposon fragment identification.
- BugFix: The .align files generated by 3.1.8 did not contain
cross_match style headers for each alignment.
- BugFix: ProcessRepeats would infrequently exit in cycle 10 with the
error: cycle 10 Can't call method "addDerivedFromAnnot" on an undefined value at ProcessRepeats line 3835. No results for the search were given.
|
[Archived News]
|
|
Links
|
- RepeatMasker makes use of Repbase which is a service of the Genetic Information Research Institute. Repbase is a comprehensive database of repetitive element consensus sequences.
- Data and computational resources for the Pre-Masked Genomes page are
provided courtesy of the UCSC Genome Bioinformatics group.
|
|