Welcome!
|
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats present in the query sequence, as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). On average, almost 50% of a human genomic DNA sequence is currently masked by the program. Sequence comparisons in RepeatMasker are performed by one of several popular search engines, including cross_match, ABBlast/WUBlast, RMBlast, and Decypher.
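To make the masking step concrete, here is a minimal sketch in Python of what "replaced by Ns" means for a query sequence. It is not RepeatMasker's code: the function names, the 1-based inclusive coordinate convention, and the toy sequence are all assumptions made for illustration.

```python
def mask_repeats(sequence, repeats, mask_char="N"):
    """Replace each annotated interval with mask_char.

    `repeats` is a list of (start, end) pairs, assumed here to be
    1-based and inclusive. Overlapping intervals are harmless because
    masking the same base twice has no further effect.
    """
    masked = list(sequence)
    for start, end in repeats:
        # Clamp to the sequence bounds, then convert to 0-based indexing.
        for i in range(max(start, 1) - 1, min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)


def masked_fraction(masked, mask_char="N"):
    """Fraction of bases that were masked (0.0 for an empty sequence)."""
    return masked.count(mask_char) / len(masked) if masked else 0.0


# Hypothetical 20 bp query with one annotated repeat at positions 9-16.
query = "ACGTACGT" + "T" * 8 + "ACGT"
masked = mask_repeats(query, [(9, 16)])
print(masked)
print(masked_fraction(masked))
```

The second helper corresponds to the "almost 50%" figure quoted above: it is simply the share of bases in the masked output that were replaced.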
|
|
Latest News
|
If you would like to keep up with news and announcements relating to
RepeatMasker, you can either follow us on Twitter
or subscribe to our low-volume, announcement-only mailing list:
RepeatMasker Announcements List.
RepeatMasker 4.0.2 Maintenance Update And New Library Release
Monday, April 29, 2013
|
Today we released RepeatMasker 4.0.2. This is a maintenance update which fixes several problems in 4.0.0/4.0.1. Notably there were issues with human Alu refinement, short input sequences producing "FastaDB::substr - Error index out of bounds!" errors, and lastly an issue with overlapping annotations not being merged. We have also released a new RepeatMasker library ( rm-20130422 ) which includes updates from Repbase as well as four new genome libraries: Gibbon (Nomascus leucogenys), American alligator (Alligator mississippiensis), saltwater crocodile (Crocodylus porosus), and gharial (Gavialis gangeticus). The new release is available here: www.repeatmasker.org/RMDownload.html.
|
RepeatMasker 4.0.1 Maintenance Update
Friday, February 22, 2013
|
Today we released RepeatMasker 4.0.1. This is a maintenance update which fixes problems observed by some of our users. Notably this fixes error messages produced by the configure script, problems using the older wublast program with RepeatMasker, empty classname columns when custom libraries are used, and noisy perl warnings. Also included in this release are an updated taxonomy database, and an expanded repeat protein database.
The new release is available here: www.repeatmasker.org/RMDownload.html.
|
RepeatModeler 1.0.7 - Update
Tuesday, January 15, 2013
|
Today we released RepeatModeler 1.0.7. This version adds support for the newly released RepeatMasker 4.0 package and the RMBlast 2.2.27+ search engine. The release is available here: www.repeatmasker.org/RepeatModeler.html.
|
RepeatMasker 4.0
Thursday, January 10, 2013
|
Today we released RepeatMasker 4.0 adding support for the new nhmmer program and the new profile HMM database of transposable elements - Dfam. Other changes include: a new alignment file format for improved cross referencing of database/annotation identifiers, adoption of TRF for simple repeat identification, improved SINE subfamily refinement, and plenty of bugfixes.
The new release is available here: www.repeatmasker.org/RMDownload.html.
|
NCBI Releases BLAST+/RMBlast 2.2.27
Friday, September 14, 2012
|
In collaboration with NCBI we now have a synchronized release of the RMBlast and NCBI BLAST+ tools. NCBI now hosts the source code and pre-compiled binaries for RMBlast, allowing us to support a more diverse set of hardware/software platforms. Please see our RMBlast page for details on how to install the new release with RepeatMasker and RepeatModeler. Special thanks to George Coulouris at NCBI for all his assistance in getting this distribution system set up.
|
Dfam: A Database for Profile HMMs of Transposable Elements
Thursday, September 13, 2012
|
The first version of a transposable element profile HMM database was released this month. This represents a major improvement in the characterization of these interesting sequences. Profile methods are known to improve sensitivity over single sequence search, with profile HMMs in particular leveraging the additional information content in position-specific residue and indel variability. Until very recently the use of DNA/DNA profile HMMs to conduct large scale genomic searches was impractical. Advances by the HMMER3 development team at HHMI Janelia Farm have made genome scale searches of profile HMMs feasible and enabled the development of this new community resource. A new version of RepeatMasker which uses Dfam and nhmmer will be released in the next few weeks. This work is a collaboration between HHMI Janelia Farm, GIRI ( Genetic Information Research Institute, Repbase ), and the Institute for Systems Biology. The official announcement of the resource: http://selab.janelia.org/people/eddys/blog/?p=675. The database website: http://dfam.janelia.org
|
Analysis of Human ALU Subfamilies
Tuesday June 26, 2012
|
As often happens in science, an analysis is performed to improve the overall performance of a program ( RepeatMasker ) and once a new version of the program is out we forget about the datasets themselves. Today we uploaded the results of a coseg run on over 500k human ALUs. Follow this link to see the tree produced by coseg along with a multiple alignment of the various subfamilies, old and new ( at the time ).
|
RepeatMasker open-3-3-0 Crossmatch Patch
Friday June 22, 2012
|
RepeatMasker + Crossmatch users have reported an intermittent bug with RepeatMasker open-3-3-0. On runs using Crossmatch as the search engine the following error can appear:
WARNING: The search engine returned an error (141)
A search phase could not complete on this batch.
The batch file will be re-run and if possible the
program will resume.
WARNING: Retrying batch ( 1 )...
In some cases this is caused by a bug in the file "CrossmatchSearchEngine.pm". To fix this problem either download the patched version of the RepeatMasker-open-3-3-0 package from here or you can simply download the new CrossmatchSearchEngine.pm file and replace the old one with it. Please submit feedback if you have any problems with this process.
|
New RepeatMasker Libraries
Thursday April 19, 2012
|
Today we updated the RepeatMasker libraries to version rm-20120418. This update includes new RepBase sequences up to RepBase 17.02, including significant additions for lizard, potato and soybean. We have also extended many ancient elements with the aid of the crocodile genome and Laurasiatherian reconstruction. The new database may be downloaded from GIRI: http://www.girinst.org/
|
Repeat Landscapes And New Download Pages
Thursday Mar 22, 2012
|
We have expanded upon the "Pre-Masked Genomes Download" page at the RepeatMasker website. We have made it easier to navigate to the species of interest using a tree view of our datasets. In addition to the ability to download annotation and alignment files for various assemblies/libraries, you will also find repeat landscapes for the latest RepeatMasker runs. Repeat landscapes depict the relative abundance of repeat classes in the genome versus the Kimura divergence from the consensus. The landscapes are a useful tool for visualizing the activity of transposable elements over time. The new page is listed as Genomic Analysis and Downloads under the Services menu at the top left of the main site.
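For readers unfamiliar with the x-axis of a repeat landscape, the following Python sketch computes the textbook Kimura two-parameter distance, K = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), between a repeat copy and its consensus, where P and Q are the proportions of transitions and transversions. This is an illustration only: the function name is hypothetical, and whether the landscapes on the site apply further adjustments to this distance is not stated here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_divergence(copy_seq, consensus_seq):
    """Kimura two-parameter distance between two aligned sequences.

    Columns containing gaps or ambiguous bases are skipped, which is
    an assumption of this sketch rather than a documented convention.
    """
    if len(copy_seq) != len(consensus_seq):
        raise ValueError("aligned sequences must have equal length")

    transitions = transversions = compared = 0
    for a, b in zip(copy_seq.upper(), consensus_seq.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: not counted
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable columns")

    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * q <= 0 or 1 - 2 * p - q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition (A -> G) in ten aligned bases.
print(kimura_divergence("GAAAAAAAAA", "AAAAAAAAAA"))
```

Binning the copies of each repeat class by this value and plotting the amount of sequence per bin gives the general shape of a landscape: older insertions have accumulated more substitutions and sit further to the right.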
|
Pre-Masked Genomes Update - Tree Shrew, Armadillo, Sloth, Hyrax, Hedgehog, Alligator, and Lizard
Thursday Mar 22, 2012
|
Today we updated the RepeatMasker full genome datasets page adding some new genomes run with RepeatMasker ( RM-3.3.0 and db-20120124 ). The new assemblies include: tupBel1 (Tree Shrew), dasNov2 (Armadillo), choHof1 (Sloth), proCap1 (Hyrax), eriEur1 (Hedgehog), allMis (Alligator), and anoCar2 (Lizard).
|
Pre-Masked Genomes Update - Human, Rat, Bat, Pig, Rabbit, Sea urchin, Lancelet, Guinea pig, Stickleback, and Ciona
Tuesday Feb 14, 2012
|
Today we updated the Pre-Masked Genomes Search page and the Pre-Masked Genomes Download page with the latest runs of RepeatMasker ( RM-3.3.0 and db-20120124 ) on the genome assemblies hg19 (Human), rn4 (Rat), myoLuc2 (Bat), susScr2 (Pig), oryCun2 (Rabbit), strPur2 (Sea urchin), braFlo1 (Lancelet), cavPor3 (Guinea pig), gasAcu1 (Stickleback), and ci2 (Ciona tunicate).
|
New RepeatMasker Libraries
Wednesday September 21, 2011
|
Today we updated the RepeatMasker libraries to version rm-20110920. This update includes new RepBase sequences up to RepBase 16.08, 30+ new ancient mammalian elements derived from the Laurasiatherian reconstruction, as well as numerous classification improvements. The new database may be downloaded from GIRI: http://www.girinst.org/
|
New RECON Released - Version 1.07
Thursday June 9, 2011
|
We released a new version of RECON ( RECON1.07 ) which works around a rare problem with division by zero in the original RECON code. The workaround is a temporary measure to avert the division by zero but doesn't fix the underlying problem, which is as yet undiagnosed. A warning is produced when the workaround is invoked. Using this version allows long RepeatModeler runs to complete despite one rare case exercising this bug. For now you can download our version here. Please remember to update your RepeatModeler configuration ( by running "configure" ) if you install the new RECON in a different place than previous versions.
|
RepeatMasker Patch - Crossmatch 1.080812 Bugfix
Wednesday June 8, 2011
|
We have a patch for recent versions of RepeatMasker 3.2.X and higher which allows the program to work with the most-recent version of Crossmatch ( 1.080812 ). Without this patch the program will intermittently report "FastaDB::substr - Error index out of bounds! at RepeatMasker line 4969" when used with this version of Crossmatch. The patch is easy to apply. Simply replace the CrossmatchSearchEngine.pm module in your RepeatMasker directory with this one: CrossmatchSearchEngine.pm. Thanks to Jigui Shan for reporting this!
|
New RepeatModeler Released - Bugfix
Monday June 6, 2011
|
Today we released a new RepeatModeler ( open-1.0.5 ) to fix a critical bug with runs using the RMBlast search engine. The results produced by RMBlast were being mis-interpreted by the model refinement algorithm. The resulting consensi were produced with mis-aligned input data. NOTE: This problem only impacts runs using RMBlast. ABBlast and previous WUBlast runs would not have been affected.
The new release is available here: www.repeatmasker.org/RepeatModeler.html.
|
New RepeatMasker and RepeatMasker Libraries
Tuesday April 26, 2011
|
Today we released a new RepeatMasker ( open-3.3.0 ) along with an updated set of repeat libraries ( RM-20110914, including most sequences up to RepBase 16.01 ).
The new release includes the following changes:
- Compatible with the latest versions of RMBlast
- Updated reporting of databases used in the search
- New taxonomy database imported from NCBI
- Fixed error "refinelib doesn't exist" when -species option is used
- Removed the use of "defined" on hash variables, which is deprecated in Perl 5.12.2 and later
The new release is available here: www.repeatmasker.org/RMDownload.html.
|
New COSEG Released
Tuesday Aug 3, 2010
|
Today we released a new version of COSEG ( 0.2.1 ).
This release fixes a few bugs, switches the default statistical model
and has improved code documentation. To download the latest version
please visit our
download page.
|
New RepeatModeler Released
Thursday July 8, 2010
|
Today we released a new RepeatModeler ( open-1.0.4 ) for download.
The release adds support for the new RMBlast search engine and
fixes a few small bugs in the previous release. To download the
latest version please visit our
download page.
|
New RepeatMasker & NCBI Search Engine Released
Thursday July 1, 2010
|
Today we released a new RepeatMasker ( open-3.2.9 ) on the website and for
download. The new release adds two important updates:
- RepeatMasker is now enabled to use RMBlast as a search engine.
RMBlast is a modified version of NCBI's Blast suite which adds
features needed for RepeatMasker's sensitive DNA searches. We
are releasing pre-compiled versions for x64-linux and
for Mac-intel as well as the source code. RMBlast may be downloaded
from here
- Custom library searches are now possible without having to obtain
the RepBase library. In the past the program required
the user to install the RepBase repeat library even for searches
using the "-lib mydatabase.fa" option. We do encourage the use
of the RepBase library as RepeatMasker is optimised for use
against this highly curated database.
|
Server Upgrades Complete
Tuesday May 11, 2010
|
We have upgraded the RepeatMasker webserver and compute cluster to accommodate the growing load on the service. We have also upgraded the job batching system to improve performance and to fix the cause of the recent downtime. Details of
this upgrade can be found here.
|
ABBlast Has Been Released
Friday Oct 16, 2009
|
We have tested RepeatMasker with the newly released ABBlast ( commercial replacement for WUBlast ). If you have been having problems obtaining WUBlast for use with RepeatModeler or RepeatMasker, please go to http://blast.advbiocomp.com/licensing/ for details on how to obtain this new version.
|
Pre-Masked Genomes Update - Human, Mouse, Cow, Zebrafish, and Opossum
Tuesday Jul 14, 2009
|
Today we updated the Pre-Masked Genomes page with the latest runs of RepeatMasker ( RM-3.2.8 and db-20090604 ) on the genome assemblies hg19 (Human), bosTau4 (Cow), danRer6 (Zebrafish), and monDom5 (Opossum). The complete annotation sets are also available for these genomes as compressed files by following a link from the above page.
|
New RepeatMasker and Libraries Released
Thursday Jun 4, 2009
|
RepeatMasker open-3.2.8 was released along with an updated set of repeat libraries ( RM-20090604, including most sequences up to RepBase 14.04 ). The library has been submitted to GIRI and will be available shortly.
Notably this release includes support for the ABBlast search engine from Advanced Biocomputing. This is the commercial version of the academic program WUBlast ( which is no longer available ) and will hopefully be released sometime later this month to the general public. Also a major optimization bug was found in ProcessRepeats and fixed in this release. Minor bugfixes in this release include: fixes for Debian/Ubuntu use of the DASH shell, GFF format inconsistency, and some documentation fixes.
|
Pre-Masked Genomes Update - Human, Zebrafinch, Zebrafish, Chicken, Frog, and Elephant
Thursday Feb 5, 2009
|
Today we updated the Pre-Masked Genomes page with the latest runs of RepeatMasker ( 3.2.7 ) on the genome assemblies hg18 (Human), taeGut1 (Zebrafinch), danRer5 (Zebrafish), galGal3 (Chicken), xenTro2 (Frog), and loxAfr2 (Elephant). The complete annotation sets are also available for these genomes as compressed files.
|
New RepeatMasker and Libraries Released
Thursday Jan 29, 2009
|
RepeatMasker open-3.2.7 was released along with an updated set of repeat libraries ( RM-20090120, including most sequences up to RepBase 13.11 ). This release includes a new approach to refining ALU annotations and coincides with the addition of several new ALU subfamilies to the database. The database also includes the new Zebrafinch library.
|
COSEG Beta Released
Monday Aug 11, 2008
|
COSEG is a program which automatically identifies repeat subfamilies
using significant co-segregating ( 2-3 bp ) mutations.
This program is derived from three C programs and several perl
scripts written by Alkes Price as part of an analysis of Alu
elements in the human genome ( Whole-genome analysis of Alu repeat
elements reveals complex evolutionary history, Alkes L. Price,
Eleazar Eskin, and Pavel Pevzner, 2004 Genome Research ). The program
was first adapted for use with other repeat families and then
extended to support consideration of three co-segregating mutations
using Alkes's statistical model. In 2008 with the help of Andy Siegel
an alternative statistical model was developed and the codebase
repackaged into a single source file.
The code may be downloaded from here. We appreciate any
feedback on this new software package.
|
Pre-Masked Genomes Update - Drosophila, Mosquito, Fugu, and Zebrafish
Monday Aug 11, 2008
|
Today we updated the Pre-Masked Genomes page with the latest runs of RepeatMasker on the genome assemblies dm3 (Drosophila), anoGam1 (Mosquito), fr2 (Fugu), and danRer5 (Zebrafish). The complete annotation sets are also available for these genomes as compressed files.
|
RepeatModeler/WUBlast and RepeatScout Bugs
Friday Aug 8, 2008
|
If you experience the error message "Not found: ....." or "Identifiers not found: #" in RepeatModeler open-1.0.3, there is a simple fix. Edit the RepeatModeler script and change line 1547 from:
`$RepModelConfig::XDGET_PRGM -n $xdfDBFile -a$start -b$end "$seqID"`;
to:
`$RepModelConfig::XDGET_PRGM -n -a$start -b$end $xdfDBFile "$seqID"`;
The error is caused by a parameter order dependency bug in WUBlast.
We have also updated the RepeatScout release ( RepeatScout-1.0.5 ) with a bugfix to the filter-stage-1.prl script.
Thanks to Eric Ganko and others for locating and reporting these problems.
|
RepeatMasker and RepeatMasker Libraries Update
Thursday Aug 7, 2008
|
RepeatMasker open-3.2.6 was released along with an updated set of repeat libraries ( RM-20080801, including sequences up to RepBase 13.06 ). This release includes some repeat subclass nomenclature changes:
Charlie -> hAT-Charlie, Fot1 -> TcMar-Fot1,
MER1_type -> hAT-Charlie, MER2_type -> TcMar-Tigger,
MaLR -> ERVL-MaLR, Mariner -> TcMar-Mariner,
Pogo -> TcMar-Pogo, Tc1 -> TcMar-Tc1,
Tc2 -> TcMar-Tc2, Tc4 -> TcMar-Tc4,
Tigger -> TcMar-Tigger, Tip100 -> hAT-Tip100, and
hAT_Tol2 -> hAT-Tol2.
Also note that the *.tbl output file reflects these changes.
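If you have annotation files produced with older libraries, the renames listed above can be applied mechanically. The Python sketch below encodes the list as a lookup table. The helper name is hypothetical, and the handling of "class/subclass" labels such as "LTR/MaLR" is an assumption about how the names appear in your files, so check it against your own output before relying on it.

```python
# Old subclass name -> new subclass name, copied from the list above.
SUBCLASS_RENAMES = {
    "Charlie": "hAT-Charlie",
    "Fot1": "TcMar-Fot1",
    "MER1_type": "hAT-Charlie",
    "MER2_type": "TcMar-Tigger",
    "MaLR": "ERVL-MaLR",
    "Mariner": "TcMar-Mariner",
    "Pogo": "TcMar-Pogo",
    "Tc1": "TcMar-Tc1",
    "Tc2": "TcMar-Tc2",
    "Tc4": "TcMar-Tc4",
    "Tigger": "TcMar-Tigger",
    "Tip100": "hAT-Tip100",
    "hAT_Tol2": "hAT-Tol2",
}


def rename_label(label):
    """Translate an old subclass name to the new nomenclature.

    Accepts either a bare subclass ("Tigger") or a "class/subclass"
    label ("DNA/Tigger"); only the part after the last "/" is looked
    up. Names that are not in the table are returned unchanged.
    """
    prefix, sep, subclass = label.rpartition("/")
    return prefix + sep + SUBCLASS_RENAMES.get(subclass, subclass)


for old in ("LTR/MaLR", "DNA/MER1_type", "Tigger", "SINE/Alu"):
    print(old, "->", rename_label(old))
```

Note that the mapping is many-to-one (both Charlie and MER1_type become hAT-Charlie), so it cannot be reversed to recover the old names.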
|
RepeatMasker, RepeatMasker Libraries, and RepeatScout Updates
Friday Jun 13, 2008
|
RepeatMasker open-3.2.5 was released along with an updated set of repeat libraries ( RM-20080611, RepBase 13.02 available from GIRI ). This release includes several fixes to the alignment output format ( single "X"s in the query portion of the alignment, incorrect sequence indexes in lines containing only gap characters "-" ) and an updated set of libraries for the RepeatProteinMask program. In addition a new release of RepeatScout is available for download ( http://repeatscout.bioprojects.org/ ). The RepeatScout filter-stage-1.prl script skipped the last sequence in the input file. Thanks to Gyorgy Abrusan for reporting this.
|
Pre-Masked Genomes Update - Rhesus, Chicken, Rat, Cat, and Chimp
Wednesday Jun 11, 2008
|
Over the last few weeks we have updated the Pre-Masked Genomes page with the latest runs of RepeatMasker on the genome assemblies rheMac2 (Rhesus), galGal3 (Chicken), rn4 (Rat), felCat3 (Cat), and panTro2 (Chimp). The complete annotation sets are also available for these genomes as compressed files.
|
DupMasker - A Tool for Annotating Primate Segmental Duplications
Tuesday Jun 10, 2008
|
In collaboration with Evan Eichler and Zhaoshi Jiang at the University of Washington we developed the DupMasker program. DupMasker uses a library of non-redundant consensus sequences of human segmental duplications, wherein a majority of the ancestral origins have been determined based on comparisons to mammalian outgroup genomes. Using DupMasker, new human and non-human primate (NHP) sequences may be readily queried to provide details on the origin and degree of sequence identity of each duplicon. This program can be applied to delineate the order and orientation of duplicons within complex duplication blocks and used to characterize structural variation differences between sequenced human haplotypes. The paper describing this work can be found in the latest issue of Genome Research ( abstract ) and the software download page is here.
|
RECON Fixed for 64bit compilers
Wednesday May 21, 2008
|
We released a new version of RECON ( RECON1.06 ) which fixes problems users have experienced on 64bit platforms ( crashes, lockups ). For now you can download our version here. Please remember to update your RepeatModeler configuration ( by running "configure" ) if you install the new RECON in a different place than previous versions.
|
Pre-Masked Genomes Update - Rice, Arabidopsis, Opossum, Platypus and Cow
Tuesday May 13, 2008
|
Today we updated the Pre-Masked Genomes page with the latest runs of RepeatMasker on the genome assemblies orySat5 (Rice), araTha5 (Arabidopsis), monDom4 (Opossum), ornAna1 (Platypus), and bosTau4 (Cow). The complete annotation sets are also available for these genomes as compressed files.
|
RepeatModeler: RECON Bug?
Wednesday May 7, 2008
|
We have noticed a few times that RECON's "re-definition of elements ( eleredef )" appears to hang up ( in a recent run it sat on this step for over 6 hours before we killed it ). If you also experience this problem while running RepeatModeler please let us know. When we restart the run from the beginning it has been able to accomplish this step in a reasonable amount of time ( hg18: 1-5 minutes ).
|
Pre-Masked Genomes Update - HG18 and MM9
Tuesday May 6, 2008
|
Today we updated the Pre-Masked Genomes page with the latest runs of RepeatMasker on the genome assemblies HG18 and MM9. In addition to the data query service we have also provided the ability to download the complete annotation sets as compressed files.
|
WUBlastXSearchEngine.pm Missing
Friday May 2, 2008
|
The WUBlastXSearchEngine.pm module was missing from yesterday's RepeatMasker release. This module is needed for the RepeatProteinMask program. Please re-download the 3.2.2 release of RepeatMasker if you had previously obtained it.
|
RepeatProteinMask Released, RepeatModeler/RepeatMasker Updates
Thursday May 1, 2008
|
The program which runs the repeat protein search on the website is now available as a standalone program within the RepeatMasker package.
Thanks to the Sanger Institute and various testers we have patched a few bugs in the first RepeatModeler release. The fixes required a new version of RepeatMasker to be created ( version 3.2.2 ) although the changes will not impact RepeatMasker results. If you have experienced problems installing the first release or with running the repeat classifier, please download the latest RepeatMasker and RepeatModeler packages.
|
RepeatModeler Beta: Repeat Discovery Workbench Released
Wednesday April 16, 2008
|
RepeatModeler is a de-novo repeat family identification and modeling package. At the heart of RepeatModeler are two de-novo repeat finding programs ( RECON and RepeatScout ) which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data. RepeatModeler assists in automating the runs of RECON and RepeatScout given a genomic database and uses the output to build, refine and classify consensus models of putative interspersed repeats.
The software is available for download here.
Also note that RepeatMasker is now up to version 3.2.1. This version as well as the previous version are organizational updates to support RepeatModeler and have little or no impact on RepeatMasker results.
|
RepeatScout On Multiple Sequences
Tuesday April 1, 2008
|
RepeatScout is a highly successful de-novo repeat discovery algorithm developed by Price, A. L. et al. We have created a modified version of RepeatScout ( version 1.0.3 ) which supports searches on highly fragmented genomes. The new version does not attempt to extend seeds across sequence boundaries. Pavel Pevzner's lab has offered to host the download for this new version at: http://repeatscout.bioprojects.org/
Continuing with the theme of de-novo repeat identification I have added two recent papers on the topic by Saha et al. to the Related Papers page.
|
Unexpected Downtime
Thursday, February 28 2008
|
At 3am last night the RepeatMasker cluster went down unexpectedly. We restored service at 10am this morning and are looking into the cause of the problem. If you had jobs queued/running at the time they will need to be resubmitted. We apologize for any inconvenience this may have caused.
|
New Compute Node & Scheduler
Wednesday, January 23 2008
|
The RepeatMasker cluster received a new compute node ( for processing web requests ) and an upgrade to the job scheduler over the weekend. This upgrade should improve the overall throughput of the masking service.
|
RepeatMasker open-3.1.9 Released
Friday, January 11 2008
|
A new version of RepeatMasker is available for download. Updates include:
- Codebase synced to recently released library RMLib 20071204.
- Improved DNA transposon fragment identification.
- BugFix: The .align files generated by 3.1.8 did not contain
cross_match style headers for each alignment.
- BugFix: ProcessRepeats will infrequently exit in cycle 10 with the
error: cycle 10 Can't call method "addDerivedFromAnnot" on an undefined value at ProcessRepeats line 3835. No results for the search are given.
|
[Archived News]
|
|
Links
|
- RepeatMasker makes use of Repbase which is a service of the Genetic Information Research Institute. Repbase is a comprehensive database of repetitive element consensus sequences.
- Data and computational resources for the Pre-Masked Genomes page are
provided courtesy of the UCSC Genome Bioinformatics group.
|
|