The Institute for Systems Biology RepeatMasker Archived News

Analysis of Human ALU Subfamilies
Tuesday June 26, 2012
As often happens in science, an analysis is performed to improve the overall performance of a program ( RepeatMasker ) and once a new version of the program is out we forget about the datasets themselves. Today we uploaded the results of a coseg run on over 500k human ALUs. Follow this link to see the tree produced by coseg along with a multiple alignment of the various subfamilies, old and new ( at the time ).
RepeatMasker open-3-3-0 Crossmatch Patch
Friday June 22, 2012
RepeatMasker + Crossmatch users have reported an intermittent bug with RepeatMasker open-3-3-0. On runs using Crossmatch as the search engine the following error can appear:

        WARNING: The search engine returned an error (141)
        A search phase could not complete on this batch.
        The batch file will be re-run and if possible the
        program will resume.
        WARNING: Retrying batch ( 1 )...

In some cases this is caused by a bug in the file "CrossmatchSearchEngine.pm". To fix this problem, either download the patched version of the RepeatMasker-open-3-3-0 package from here or simply download the new CrossmatchSearchEngine.pm file and replace the old one with it. Please submit feedback if you have any problems with this process.

New RepeatMasker Libraries
Thursday April 19, 2012
Today we updated the RepeatMasker libraries to version rm-20120418. This update includes new RepBase sequences up to RepBase 17.02, including significant additions for lizard, potato and soybean. We have also extended many ancient elements with the aid of the crocodile genome and Laurasiatherian reconstruction. The new database may be downloaded from GIRI: http://www.girinst.org/
Repeat Landscapes And New Download Pages
Thursday Mar 22, 2012
We have expanded upon the "Pre-Masked Genomes Download" page at the repeatmasker website. We have made it easier to navigate to the species of interest using a tree view of our datasets. In addition to the ability to download annotation and alignment files for various assemblies/libraries, you will also find repeat landscapes for the latest RepeatMasker runs. Repeat landscapes depict the relative abundance of repeat classes in the genome versus the Kimura divergence from the consensus. The landscapes are a useful tool for visualizing the activity of transposable elements over time. The new page is listed as Genomic Analysis and Downloads under the Service menu at the top left of the main site.
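As a rough illustration of the divergence axis, the Kimura two-parameter distance can be computed from the proportions of transitions and transversions between a repeat copy and its consensus. The sketch below is a plain textbook calculation in Python; the function name and inputs are hypothetical, and it does not reproduce any CpG adjustment or other processing that may be applied when the landscapes on the site are built.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(consensus, copy):
    """Kimura two-parameter distance between two aligned sequences.

    Columns containing a gap or an ambiguous base are skipped.
    """
    if len(consensus) != len(copy):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(consensus.upper(), copy.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: not comparable
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("divergence too high for the Kimura correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))
```

Binning these distances per repeat class and summing the bases covered in each bin would give the kind of histogram the landscapes display.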
Pre-Masked Genomes Update - Tree Shrew, Armadillo, Sloth, Hyrax, Hedgehog, Alligator, and Lizard
Thursday Mar 22, 2012
Today we updated the RepeatMasker full genome datasets page adding some new genomes run with RepeatMasker ( RM-3.3.0 and db-20120124 ). The new assemblies include: tupBel1 (Tree Shrew), dasNov2 (Armadillo), choHof1 (Sloth), proCap1 (Hyrax), eriEur1 (Hedgehog), allMis (Alligator), and anoCar2 (Lizard).
Pre-Masked Genomes Update - Human, Rat, Bat, Pig, Rabbit, Sea urchin, Lancelet, Guinea pig, Stickleback, and Ciona
Tuesday Feb 14, 2012
Today we updated the Pre-Masked Genomes Search page and the Pre-Masked Genomes Download page with the latest runs of RepeatMasker ( RM-3.3.0 and db-20120124 ) on the genome assemblies hg19 (Human), rn4 (Rat), myoLuc2 (Bat), susScr2 (Pig), oryCun2 (Rabbit), strPur2 (Sea urchin), braFlo1 (Lancelet), cavPor3 (Guinea pig), gasAcu1 (Stickleback), and ci2 (Ciona tunicate).
New RepeatMasker Libraries
Wednesday September 21, 2011
Today we updated the RepeatMasker libraries to version rm-20110920. This update includes new RepBase sequences up to RepBase 16.08, 30+ new ancient mammalian elements derived from the Laurasiatherian reconstruction, as well as numerous classification improvements. The new database may be downloaded from GIRI: http://www.girinst.org/
New RECON Released - Version 1.07
Thursday June 9, 2011
We released a new version of RECON ( RECON1.07 ) which works around a rare problem with division by zero in the original RECON code. The workaround is a temporary measure to avert the division by zero but doesn't fix the underlying problem, which is as yet undiagnosed. A warning is produced when the workaround is invoked. Using this version allows long RepeatModeler runs to complete despite one rare case exercising this bug. For now you can download our version here. Please remember to update your RepeatModeler configuration ( by running "configure" ) if you install the new RECON in a different place than previous versions.
RepeatMasker Patch - Crossmatch 1.080812 Bugfix
Wednesday June 8, 2011
We have a patch for recent versions of RepeatMasker 3.2.X and higher which allows the program to work with the most-recent version of Crossmatch ( 1.080812 ). Without this patch the program will intermittently report "FastaDB::substr - Error index out of bounds! at RepeatMasker line 4969" when used with this version of Crossmatch. The patch is easy to apply. Simply replace the CrossmatchSearchEngine.pm module in your RepeatMasker directory with this one: CrossmatchSearchEngine.pm. Thanks to Jigui Shan for reporting this!
New RepeatModeler Released - Bugfix
Monday June 6, 2011
Today we released a new RepeatModeler ( open-1.0.5 ) to fix a critical bug with runs using the RMBlast search engine. The results produced by RMBlast were being mis-interpreted by the model refinement algorithm. The resulting consensi were produced with mis-aligned input data. NOTE: This problem only impacts runs using RMBlast. ABBlast and previous WUBlast runs would not have been affected. The new release is available here: www.repeatmasker.org/RepeatModeler.html.
New RepeatMasker and RepeatMasker Libraries
Tuesday April 26, 2011
Today we released a new RepeatMasker ( open-3.3.0 ) along with an updated set of repeat libraries ( RM-20110914, including most sequences up to RepBase 16.01 ). The new release includes the following changes:
  • Compatible with the latest versions of RMBlast
  • Updated reporting of databases used in the search
  • New taxonomy database imported from NCBI
  • Fixed error "refinelib doesn't exist" when -species option is used
  • Updated for Perl 5.12.2 and later, which deprecated the use of "defined" for hash variables
The new release is available here: www.repeatmasker.org/RepeatMasker/.
New COSEG Released
Tuesday Aug 3, 2010
Today we released a new version of COSEG ( 0.2.1 ). This release fixes a few bugs, switches the default statistical model and has improved code documentation. To download the latest version please visit our download page.
New RepeatModeler Released
Thursday July 8, 2010
Today we released a new RepeatModeler ( open-1.0.4 ) for download. The release adds support for the new RMBlast search engine and fixes a few small bugs in the previous release. To download the latest version please visit our download page.
New RepeatMasker & NCBI Search Engine Released
Thursday July 1, 2010
Today we released a new RepeatMasker ( open-3.2.9 ) on the website and for download. The new release adds two important updates:
  • RepeatMasker is now enabled to use RMBlast as a search engine. RMBlast is a modified version of NCBI's Blast suite which adds features needed for RepeatMasker's sensitive DNA searches. We are releasing pre-compiled versions for x64-linux and for Mac-intel as well as the source code. RMBlast may be downloaded from here

  • Custom library searches are now possible without having to obtain the RepBase library. In the past the program required the user to install the RepBase repeat library even for searches using the "-lib mydatabase.fa" option. We do encourage the use of the RepBase library as RepeatMasker is optimised for use against this highly curated database.
Server Upgrades Complete
Tuesday May 11, 2010
We have upgraded the RepeatMasker webserver and compute cluster to accommodate the growing load on the service. We have also upgraded the job batching system to improve performance and to fix the cause of the recent downtime. Details of this upgrade can be found here.
ABBlast Has Been Released
Friday Oct 16, 2009
We have tested RepeatMasker with the newly released ABBlast ( commercial replacement for WUBlast ). If you have been having problems obtaining WUBlast for use with RepeatModeler or RepeatMasker, please go to http://blast.advbiocomp.com/licensing/ for details on how to obtain this new version.
Pre-Masked Genomes Update - Human, Mouse, Cow, Zebrafish, and Opossum
Tuesday Jul 14, 2009
Today we updated the Pre-Masked Genomes page with the latest runs of RepeatMasker ( RM-3.2.8 and db-20090604 ) on the genome assemblies hg19 (Human), bosTau4 (Cow), danRer6 (Zebrafish), and monDom5 (Opossum). The complete annotation sets are also available for these genomes as compressed files by following a link from the above page.
New RepeatMasker and Libraries Released
Thursday Jun 4, 2009
RepeatMasker open-3.2.8 was released along with an updated set of repeat libraries ( RM-20090604, including most sequences up to RepBase 14.04 ). The library has been submitted to GIRI and will be available shortly.

Notably this release includes support for the ABBlast search engine from Advanced Biocomputing. This is the commercial version of the academic program WUBlast ( which is no longer available ) and will hopefully be released sometime later this month to the general public. Also a major optimization bug was found in ProcessRepeats and fixed in this release. Minor bugfixes also included in this release include: fixes for Debian/Ubuntu use of the DASH shell, GFF format inconsistency, and some documentation fixes.

Pre-Masked Genomes Update - Human, Zebrafinch, Zebrafish, Chicken, Frog, and Elephant
Thursday Feb 5, 2009
Today we updated the Pre-Masked Genomes page with the latest runs of RepeatMasker ( 3.2.7 ) on the genome assemblies hg18 (Human), taeGut1 (Zebrafinch), danRer5 (Zebrafish), galGal3 (Chicken), xenTro2 (Frog), and loxAfr2 (Elephant). The complete annotation sets are also available for these genomes as compressed files.
New RepeatMasker and Libraries Released
Thursday Jan 29, 2009
RepeatMasker open-3.2.7 was released along with an updated set of repeat libraries ( RM-20090120, including most sequences up to RepBase 13.11 ). This release includes a new approach to refining ALU annotations and coincides with the addition of several new ALU subfamilies to the database. The database also includes the new Zebrafinch library.
COSEG Beta Released
Monday Aug 11, 2008
COSEG is a program which automatically identifies repeat subfamilies using significant co-segregating ( 2-3 bp ) mutations.

This program is derived from three C programs and several perl scripts written by Alkes Price as part of an analysis of Alu elements in the human genome ( Whole-genome analysis of Alu repeat elements reveals complex evolutionary history, Alkes L. Price, Eleazar Eskin, and Pavel Pevzner, 2004 Genome Research ). The program was first adapted for use with other repeat families and then extended to support consideration of three co-segregating mutations using Alkes Price's statistical model. In 2008, with the help of Andy Siegel, an alternative statistical model was developed and the codebase was repackaged into a single source file.

The code may be downloaded from here. We appreciate any feedback on this new software package.

Pre-Masked Genomes Update - Drosophila, Mosquito, Fugu, and Zebrafish
Monday Aug 11, 2008
Today we updated the Pre-Masked Genomes page with the latest runs of RepeatMasker on the genome assemblies dm3 (Drosophila), anoGam1 (Mosquito), fr2 (Fugu), and danRer5 (Zebrafish). The complete annotation sets are also available for these genomes as compressed files.
RepeatModeler/WUBlast and RepeatScout Bugs
Friday Aug 8, 2008
If you experience the error message "Not found: ....." or "Identifiers not found: #" in RepeatModeler open-1.0.3, there is a simple fix. Edit the RepeatModeler script and change line 1547 from:
  `$RepModelConfig::XDGET_PRGM -n $xdfDBFile -a$start -b$end "$seqID"`;
to:
  `$RepModelConfig::XDGET_PRGM -n -a$start -b$end $xdfDBFile "$seqID"`;
The error is caused by a parameter order dependency bug in WUBlast. We have also updated the RepeatScout release ( RepeatScout-1.0.5 ) with a bugfix to the filter-stage-1.prl script. Thanks to Eric Ganko and others for locating and reporting these problems.
RepeatMasker and RepeatMasker Libraries Update
Thursday Aug 7, 2008
RepeatMasker open-3.2.6 was released along with an updated set of repeat libraries ( RM-20080801, including sequences up to RepBase 13.06 ). This release includes some repeat subclass nomenclature changes:

     Charlie     -> hAT-Charlie,      Fot1        -> TcMar-Fot1, 
     MER1_type   -> hAT-Charlie,      MER2_type   -> TcMar-Tigger,
     MaLR        -> ERVL-MaLR,        Mariner     -> TcMar-Mariner,
     Pogo        -> TcMar-Pogo,       Tc1         -> TcMar-Tc1,
     Tc2         -> TcMar-Tc2,        Tc4         -> TcMar-Tc4,
     Tigger      -> TcMar-Tigger,     Tip100      -> hAT-Tip100, and
     hAT_Tol2    -> hAT-Tol2.
Note that the *.tbl output file also reflects these changes.
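For scripts that are keyed on the old names, the renaming above can be captured in a simple lookup table. The following is a minimal Python sketch: the dictionary entries are copied from the list above, while the function name and the assumption that labels arrive either as a bare subclass or in "class/subclass" form ( for example "DNA/MER1_type" ) are illustrative and not part of the RepeatMasker package.

```python
# Old subclass name -> new subclass name, as listed in the open-3.2.6 notes.
SUBCLASS_RENAMES = {
    "Charlie": "hAT-Charlie",
    "Fot1": "TcMar-Fot1",
    "MER1_type": "hAT-Charlie",
    "MER2_type": "TcMar-Tigger",
    "MaLR": "ERVL-MaLR",
    "Mariner": "TcMar-Mariner",
    "Pogo": "TcMar-Pogo",
    "Tc1": "TcMar-Tc1",
    "Tc2": "TcMar-Tc2",
    "Tc4": "TcMar-Tc4",
    "Tigger": "TcMar-Tigger",
    "Tip100": "hAT-Tip100",
    "hAT_Tol2": "hAT-Tol2",
}


def update_subclass(label):
    """Translate an old label to the new nomenclature.

    Assumes a bare subclass ("Tigger") or a "class/subclass" pair
    ("DNA/MER1_type"); names that are not in the table pass through unchanged.
    """
    head, sep, subclass = label.rpartition("/")
    return head + sep + SUBCLASS_RENAMES.get(subclass, subclass)
```

Applying a function like this to the class column of older annotation files is one way to make them comparable with output from open-3.2.6 and later.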
RepeatMasker, RepeatMasker Libraries, and RepeatScout Updates
Friday Jun 13, 2008
RepeatMasker open-3.2.5 was released along with an updated set of repeat libraries ( RM-20080611, RepBase 13.02 available from GIRI ). This release includes several fixes to the alignment output format ( single "X"s in the query portion of the alignment, incorrect sequence indexes in lines containing only gap characters "-" ) and an updated set of libraries for the RepeatProteinMask program. In addition, a new release of RepeatScout is available for download ( http://repeatscout.bioprojects.org/ ). It fixes a bug in which the RepeatScout filter-stage-1.prl script skipped the last sequence in the input file. Thanks to Gyorgy Abrusan for reporting this.
Pre-Masked Genomes Update - Rhesus, Chicken, Rat, Cat, and Chimp
Wednesday Jun 11, 2008
The last few weeks we have updated the Pre-Masked Genomes page with the latest runs of RepeatMasker on the genome assemblies rheMac2 (Rhesus), galGal3 (Chicken), rn4 (Rat), felCat3 (Cat), and panTro2 (Chimp). The complete annotation sets are also available for these genomes as compressed files.
DupMasker - A Tool for Annotating Primate Segmental Duplications
Tuesday Jun 10, 2008
In collaboration with Evan Eichler and Zhaoshi Jiang at the University of Washington we developed the DupMasker program. DupMasker uses a library of non-redundant consensus sequences of human segmental duplications, wherein a majority of the ancestral origins have been determined based on comparisons to mammalian outgroup genomes. Using DupMasker, new human and non-human primate (NHP) sequences may be readily queried to provide details on the origin and degree of sequence identity of each duplicon. This program can be applied to delineate the order and orientation of duplicons within complex duplication blocks and used to characterize structural variation differences between sequenced human haplotypes. The paper describing this work can be found in the latest issue of Genome Research ( abstract ) and the software download page is here.
RECON Fixed for 64bit compilers
Wednesday May 21, 2008
We released a new version of RECON ( RECON1.06 ) which fixes problems users have experienced on 64bit platforms ( crashes, lockups ). For now you can download our version here. Please remember to update your RepeatModeler configuration ( by running "configure" ) if you install the new RECON in a different place than previous versions.
Pre-Masked Genomes Update - Rice, Arabidopsis, Opossum, Platypus and Cow
Tuesday May 13, 2008
Today we updated the Pre-Masked Genomes page with the latest runs of RepeatMasker on the genome assemblies orySat5 (Rice), araTha5 (Arabidopsis), monDom4 (Opossum), ornAna1 (Platypus), and bosTau4 (Cow). The complete annotation sets are also available for these genomes as compressed files.
RepeatModeler: RECON Bug?
Wednesday May 7, 2008
We have noticed a few times that RECON's "re-definition of elements ( eleredef )" appears to hang up ( in a recent run it sat on this step for over 6 hours before we killed it ). If you also experience this problem while running RepeatModeler please let us know. When we restart the run from the beginning it has been able to accomplish this step in a reasonable amount of time ( hg18: 1-5 minutes ).
Pre-Masked Genomes Update - HG18 and MM9
Tuesday May 6, 2008
Today we updated the Pre-Masked Genomes page with the latest runs of RepeatMasker on the genome assemblies HG18 and MM9. In addition to the data query service we have also provided the ability to download the complete annotation sets as compressed files.
WUBlastXSearchEngine.pm Missing
Friday May 2, 2008
The WUBlastXSearchEngine.pm module was missing from yesterday's RepeatMasker release. This module is needed for the RepeatProteinMask program. Please re-download the 3.2.2 release of RepeatMasker if you had previously obtained it.
RepeatProteinMask Released, RepeatModeler/RepeatMasker Updates
Thursday May 1, 2008
The program which runs the repeat protein search on the website is now available as a standalone program within the RepeatMasker package.

Thanks to the Sanger Institute and various testers we have patched a few bugs in the first RepeatModeler release. The fixes required a new version of RepeatMasker to be created ( version 3.2.2 ) although the changes will not impact RepeatMasker results. If you have experienced problems installing the first release or with running the repeat classifier, please download the latest RepeatMasker and RepeatModeler packages.

RepeatModeler Beta: Repeat Discovery Workbench Released
Wednesday April 16, 2008
RepeatModeler is a de-novo repeat family identification and modeling package. At the heart of RepeatModeler are two de-novo repeat finding programs ( RECON and RepeatScout ) which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data. RepeatModeler assists in automating the runs of RECON and RepeatScout given a genomic database and uses the output to build, refine and classify consensus models of putative interspersed repeats.
The software is available for download here.

Also note that RepeatMasker is now up to version 3.2.1. This version as well as the previous version are organizational updates to support RepeatModeler and have little or no impact on RepeatMasker results.

RepeatScout On Multiple Sequences
Tuesday April 1, 2008
RepeatScout is a highly successful de-novo repeat discovery algorithm developed by Price, A. L. et al. We have created a modified version of RepeatScout ( version 1.0.3 ) which supports searches on highly fragmented genomes. The new version does not attempt to extend seeds across sequence boundaries. Pavel Pevzner's lab has offered to host the download for this new version at: http://repeatscout.bioprojects.org/

Continuing with the theme of de-novo repeat identification I have added two recent papers on the topic by Saha et al. to the Related Papers page.

De-Novo Repeat Discovery and Detection
Thursday, February 28 2008
A nice survey paper by Bergman and Quesneville appeared recently in Briefings In Bioinformatics ( "Discovering and detecting transposable elements in genome sequences", Vol 8, No 6, 382-392 ). Many software packages have been developed to research repetitive DNA and this paper provides a succinct summary of each program's capabilities and their relationships to each other.
Unexpected Downtime
Thursday, February 28 2008
At 3am last night the RepeatMasker cluster went down unexpectedly. We restored service at 10am this morning and are looking into the cause of the problem. If you had jobs queued/running at the time they will need to be resubmitted. We apologize for any inconvenience this may have caused.
New Compute Node & Scheduler
Wednesday, January 23 2008
The RepeatMasker cluster received a new compute node ( for processing web requests ) and an upgrade to the job scheduler over the weekend. This upgrade should improve the overall throughput of the masking service.
RepeatMasker open-3.1.9 Released
Friday, January 11 2008
A new version of RepeatMasker is available for download. Updates include:
  • Codebase synced to recently released library RMLib 20071204.
  • Improved DNA transposon fragment identification.
  • BugFix: The .align files generated by 3.1.8 did not contain cross_match style headers for each alignment.
  • BugFix: ProcessRepeats will infrequently exit in cycle 10 with the error: cycle 10 Can't call method "addDerivedFromAnnot" on an undefined value at ProcessRepeats line 3835. No results for the search are given.
RepeatMasker Library Update
Wednesday, December 12th, 2007
A new version of the RepeatMasker repeat library ( RMLib: 20071204, RepBase: 12.06 ) is now available for download from GIRI.
RepeatMasker Webserver Upgraded
Wednesday, December 5th, 2007
We replaced the main RepeatMasker webserver today with a new dual Xeon quad core server with 8GB of main memory. The new server also contains increased disk storage for expansion of the cached genomes.
RepeatMasker Evidence Reporting
Thursday August 9th, 2007
A new version of the RepeatMasker webservice has been installed. The new version produces an additional output file ( *.out.html ) which provides the evidence ( source hsps ) with each final annotation call. The page is displayed in the typical one-annotation-per-line format with links ( the "+" preceding each line ) to expand the evidence data below the line. In addition to evidence reporting the repeat names on this new page link to details for each particular type of repeat.
Open 3.1.8 Bugfix
A bug in ProcessRepeats causes the program to crash when rare transposon join scenarios are encountered. The error message looks like this: "join(): Invalid join!$this == $partner at ProcessRepeats line 8164." or this: "This violates recursion....Died at ProcessRepeats line 1828". The fix is to replace your Open-3.1.8 ProcessRepeats file with the one contained in this archive RepeatMasker-open-3-1-8-patch-2.tar.gz. NOTE: You may have to alter the first line in ProcessRepeats to correctly reference your perl installation location.
RepeatMasker Official Release Available
The recent beta version of RepeatMasker has been tested and is now ready for an official release. We assigned it the version "Open-3-1-8" as there were several minor bugs fixed. You may download the release from here: download. The webserver is now running the updated version as well.
RepeatMasker Beta Release Available
A new version of RepeatMasker ( Open-3-1-7 ) is available for testing at download. This version includes a major refactoring of the ProcessRepeats code along with many bugfixes. Due to the volume of changes in this release we are offering it as a downloadable beta-release for a short period while we continue to test it. The webserver will continue to use open-3-1-6 until we are ready for the official release. Changes in the release include:
  • Repeat Defragmentation Improvements: The defragmentation stages of ProcessRepeats have been refactored improving the annotation of LTR, LINE and SINE repeats.
  • Metadata Migration: We have begun to move metadata ( subfamily relationships, consensus model relationships, genomic frequency etc ) out of the ProcessRepeats code. In the near future this will provide researchers greater access to these detailed repeat characteristics and enable the same processing rules to be used on custom generated repeat libraries.
  • Bugfixes:
    • IS Element Bugfix: In certain cases the extraction of IS elements fails, causing the sequence indices to be off. The final result is an error message of the form: "ArrayList::get( -1 ) Index out of bounds!".
    • Division By Zero Bugfix: Under special circumstances ProcessRepeats produces an "Illegal division by zero at ProcessRepeats line 1860." error.
    • Long Sequence Names Bugfix: Long sequence names > 20 characters can cause ProcessRepeats to fail. Thanks to Gordon Lack for finding and reporting this.
    • Negative Sequence Positions Bugfix: ProcessRepeats was reporting negative sequence positions in the final output file.
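For anyone still running a version affected by the long sequence name problem, the input can be checked before a run. The snippet below is a small Python sketch rather than part of RepeatMasker: the function name is hypothetical, the 20-character limit is taken from the note above, and the sequence name is assumed to be the first whitespace-delimited token after the ">" on each FASTA header line.

```python
def long_sequence_names(fasta_lines, max_len=20):
    """Return the sequence names that are longer than max_len characters.

    fasta_lines is any iterable of lines, for example an open file handle.
    """
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if len(name) > max_len:
                names.append(name)
    return names
```

A call such as long_sequence_names(open("genome.fa")) ( with a hypothetical file name ) lists the identifiers that would need shortening.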
RepeatMasker open-3.1.6 Released
A new version of RepeatMasker is available for download. Included in this new release are several major improvements:
  • The repeat database is updated with 694 new entries and 147 improvements on existing ones, including RepBase version 11.06. Major advances were for ancient mammalian repeats, which are shared by all mammals or all eutherians, and for marsupial repeats, especially for the opossum. Other significant additions were for Chlamydomonas and Caenorhabditis briggsae.
  • The annotation of DNA Transposon fragments has been improved. A new method of joining related transposon fragments improves classification of ambiguous fragments. More details here
  • A new option ( -lcambig ) identifies DNA Transposon annotations which, without any supporting evidence, are ambiguously defined, i.e. the fragment falls within a non-unique portion of the family consensus. When this option is used all ambiguous repeat names are printed in lower case while the rest are in upper case.
  • Fixed a bug with fasta files containing more than 60MB of sequence on a single file line.
  • Updated the taxonomy database and added the "-tree" option to the queryRepeatDatabase.pl script. The new option prints out the taxonomic tree of all species contained in the RepeatMasker database with information on the number of repeat families defined at each level.
  • Several bugs have been fixed in the DateRepeats routine, which appeared when large numbers were involved (e.g. analysis of whole chromosome RepeatMasker output) and/or the input was a concatenation of RepeatMasker outputs (repeat IDs are not necessarily unique anymore).
  • Other improvements in DateRepeats are better labeling of ambiguously called repeats, correct assignment of lineage specificity to some elements that have independently inserted in separate lineages of mammals, and refinements in the phylogeny.
Pre-Masked Genome Annotations Available
The November 2003 and the January 2006 assemblies of the chimp genome ( panTro1 and panTro2 ), the May 2005 assembly of the dog ( canFam2 ), the May 2005 assembly of the Zebrafish genome ( danRer3 ), and the August 2002 assembly of the takifugu genome ( fr1 ) have been added to the Pre-Masked Genomes Page. You can query RepeatMasker annotations, alignments, and masked sequence using this webservice.

Institute for Systems Biology
This server is made possible by funding from the National Human Genome Research Institute (NHGRI grant # RO1 HG002939).