| Prerequisites | 
    
    | 
  Unix system with perl 5.8.0 or higher installed
  Python 3 and the h5py python library.
    
  Sequence Search Engine
     RepeatMasker uses a sequence search engine to perform
     its search for repeats. Currently Cross_Match,
     RMBlast, HMMER and WUBlast/ABBlast are supported.  You will need to obtain
     at least one of these and install it on your
     system.
 
         For Cross_Match go to 
           http://www.phrap.org
           You will want to select "Phred/Phrap/Consed" as 
           Cross_Match is part of the Phrap package.
         For RMBlast ( NCBI Blast modified for use with RepeatMasker/RepeatModeler ) please go to our download page: http://www.repeatmasker.org/rmblast.  It is highly recommended to use 2.13.0 or higher.
         For HMMER please download the v3.2.1 version here: http://hmmer.org/
 
         For ABBlast/WUBlast go to http://blast.advbiocomp.com/licensing/ [ NOTE: Rights to BLAST 2.0 (WU-BLAST) have been acquired by Advanced Biocomputing, LLC.  RepeatMasker 3.2.8 and above fully support both variants ]
     TRF - Tandem Repeat Finder, G. Benson et al.
     You can obtain a free copy at
     http://tandem.bu.edu/trf/trf.html.
     RepeatMasker was developed using TRF version 4.0.9.
Repeat Database
     RepeatMasker can be used with custom libraries, or with Dfam.
     Dfam is an open database of transposable element (TE) profile HMMs and consensus
     sequences.  The current release of RepeatMasker is shipped without a TE database; however,
     libraries in FamDB H5 format may be downloaded from Dfam at: 
     https://www.dfam.org/releases/current/families/FamDB
     and installed in the Libraries/famdb directory.
 
 The files are divided
    by taxa groups and numbered starting from '0' ( aka the root partition )
    which contains required information for RepeatMasker/FamDB.  In addition
    the last Repbase RepeatMasker Edition may be downloaded and combined with Dfam.
    The RepBase RepeatMasker Library can be obtained at: http://www.girinst.org.
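
The prerequisites listed above can be checked from a shell before installing. The following is a minimal sketch, not part of the RepeatMasker distribution: the helper names `have_cmd` and `check_prereqs` are hypothetical, and the executable names `rmblastn`, `cross_match`, `nhmmer` and `trf` are assumptions about how the tools are installed on your system (TRF binaries in particular are often named after their version and platform).

```shell
#!/bin/sh
# Return 0 if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print one status line per prerequisite.
check_prereqs() {
    # perl 5.8.0 or higher: "use 5.008" lets perl enforce the minimum itself.
    if have_cmd perl && perl -e 'use 5.008;' 2>/dev/null; then
        echo "perl: ok"
    else
        echo "perl: missing or older than 5.8.0"
    fi
    # Python 3 with the h5py library importable.
    if have_cmd python3 && python3 -c 'import h5py' 2>/dev/null; then
        echo "python3 + h5py: ok"
    else
        echo "python3 + h5py: missing"
    fi
    # ASSUMPTION: these executable names may differ on your system.
    for exe in rmblastn cross_match nhmmer trf; do
        if have_cmd "$exe"; then
            echo "$exe: found at $(command -v "$exe")"
        else
            echo "$exe: not on PATH"
        fi
    done
}

check_prereqs
```

Only one search engine is needed, so a "not on PATH" line for the others is not necessarily a problem.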
 
 | 
    
      | Installation | 
    
    | 
  Download RepeatMasker
  Latest Released Version: 8/19/25: RepeatMasker-4.2.1.tar.gz
 Previous Released Version: 7/1/25: RepeatMasker-4.2.0.tar.gz
 
Unpack Distribution
Unpack the distribution in your home directory or in a location where it may be shared with other users of your system ( e.g. /usr/local/ ).  Do not extract it in a directory that already contains a directory called "RepeatMasker", as extraction will attempt to overwrite the files within it.
 
     cp RepeatMasker-open-4-#-#.tar.gz /usr/local
     cd /usr/local
     gunzip RepeatMasker-open-4-#-#.tar.gz
     tar xvf RepeatMasker-open-4-#-#.tar
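
The warning about a pre-existing "RepeatMasker" directory can be enforced in a small wrapper around the unpack step. This is a sketch under the assumption that the archive's top-level directory is named `RepeatMasker`; the function name `safe_unpack` is hypothetical.

```shell
# safe_unpack TARBALL DEST
# Extract TARBALL into DEST only if DEST/RepeatMasker does not exist yet.
safe_unpack() {
    tarball=$1
    dest=$2
    if [ -e "$dest/RepeatMasker" ]; then
        echo "refusing to unpack: $dest/RepeatMasker already exists" >&2
        return 1
    fi
    # gunzip -c leaves the original archive in place; tar reads from stdin.
    gunzip -c "$tarball" | tar xf - -C "$dest"
}

# Example (archive name as in the commands above):
# safe_unpack RepeatMasker-open-4-#-#.tar.gz /usr/local
```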
  Install RepeatMasker Libraries
RepeatMasker is currently not distributed with a database.  The program may
be used immediately with custom databases ("-lib mylib.fa" option) or you 
may download TE libraries and configure them for use with RepeatMasker.
The options for supplementing/updating the main RepeatMasker library are:
 
       The Dfam database may be downloaded from www.dfam.org in famdb HDF5 format partitioned by taxa.
        The root ("dfam##_full.0.h5") partition is required if you plan to use Dfam; any combination of additional partitions
        may also be downloaded and configured.
        For example:
            
               wget https://www.dfam.org/releases/Dfam_3.9/families/FamDB/dfam39_full.1.h5.gz
               gunzip dfam39_full.1.h5.gz
               mv dfam39_full.1.h5 /usr/local/RepeatMasker/Libraries/famdb
                 NOTE: Only partitions from the same Dfam release should be in this directory.
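
The three commands above generalize to a loop over partition numbers. The sketch below assumes that other releases and partitions follow the same URL and file naming pattern as the Dfam 3.9 example; that pattern is inferred from the single example and is not confirmed for every release. The function names are hypothetical.

```shell
# Build the download URL for partition PART of a Dfam release such as "3.9".
# ASSUMPTION: all releases follow the naming pattern of the example above.
dfam_partition_url() {
    url_release=$1
    url_part=$2
    url_tag=$(printf '%s' "$url_release" | tr -d '.')   # "3.9" -> "39"
    printf 'https://www.dfam.org/releases/Dfam_%s/families/FamDB/dfam%s_full.%s.h5.gz\n' \
        "$url_release" "$url_tag" "$url_part"
}

# fetch_dfam_partitions RELEASE FAMDB_DIR PART [PART...]
# Download, decompress and install each listed partition.
fetch_dfam_partitions() {
    release=$1
    famdb_dir=$2
    shift 2
    for part in "$@"; do
        url=$(dfam_partition_url "$release" "$part")
        file=${url##*/}                       # strip everything up to the last "/"
        wget "$url" || return 1
        gunzip "$file" || return 1
        mv "${file%.gz}" "$famdb_dir"/ || return 1
    done
}

# Example: the required root partition plus partition 1.
# fetch_dfam_partitions 3.9 /usr/local/RepeatMasker/Libraries/famdb 0 1
```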
Repeat for any additional partitions you wish to use.  A list of partitions and the taxa they contain can be found here.
 
       and/or:
 
       The RepBase RepeatMasker Edition ( final version 10/26/2018 )
           may be downloaded from www.girinst.org and unpacked in the
           RepeatMasker directory.  For example:
            
               cp RepBaseRepeatMaskerEdition-20181026.tar.gz /usr/local/RepeatMasker/
               cd /usr/local/RepeatMasker
               gunzip RepBaseRepeatMaskerEdition-20181026.tar.gz
               tar xvf RepBaseRepeatMaskerEdition-20181026.tar
               rm RepBaseRepeatMaskerEdition-20181026.tar
 
  Run Configure Script
  The program requires some initial configuration.  This should also be re-run after updates to the library files.
 
     cd /usr/local/RepeatMasker
     perl ./configure
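
Before running configure, it may be worth confirming that the famdb directory satisfies the two constraints stated earlier: the root partition is present, and all partitions come from the same Dfam release. The following is a sketch, not something shipped with RepeatMasker. It assumes partition files are named `dfam##_full.N.h5` as in the examples above and ignores any other files; `check_famdb_dir` is a hypothetical name.

```shell
# check_famdb_dir DIR
# Succeed only if DIR holds partitions from exactly one Dfam release
# and that release's root partition (number 0) is among them.
check_famdb_dir() {
    dir=$1
    releases=$(
        for f in "$dir"/dfam*_full.*.h5; do
            [ -e "$f" ] || continue           # glob matched nothing
            base=${f##*/}
            printf '%s\n' "${base%%_*}"       # "dfam39_full.1.h5" -> "dfam39"
        done | sort -u
    )
    # Count distinct release prefixes (0 when the directory has no partitions).
    count=$(printf '%s\n' "$releases" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one Dfam release in $dir, found $count" >&2
        return 1
    fi
    if [ ! -e "$dir/${releases}_full.0.h5" ]; then
        echo "root partition ${releases}_full.0.h5 is missing from $dir" >&2
        return 1
    fi
    echo "$dir: release $releases, root partition present"
}

# Example:
# check_famdb_dir /usr/local/RepeatMasker/Libraries/famdb && perl ./configure
```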
    
 RepeatMasker "open-3.0/4.0" is licensed under the Open Source License v2.1. | 
| Release Notes | 
| RepeatMasker-4.2.1 
 
   BUGFIX: An infinite loop was possible in rare circumstances
    involving interleaved satellite consensi while using the
    "-a" option.  This caused the alignment file to keep growing
    and eventually ProcessRepeats to crash.
   BUGFIX: Fix an off-by-one issue in alignment range calculations
    induced by the clipping mechanism in RepeatMasker.
   BUGFIX: The configure script was crashing while attempting to
    import Repbase in the previous version.
   Updated to the latest famdb version (2.0.5)
   Crossmatch alignment format parser hardening.  While not proper, 
    negative coordinates can appear in the alignment file due to 
    incorrect range calculations.  This change allows for the parser
    to handle these cases and not mis-align the data columns.
   Updated the length of the Charlie7 family in RepeatAnnotationData.pm
 | 
| RepeatMasker-4.2.0 
 
   BUGFIX: For mammalian genomes searched using the "-species" option,
    newer Dfam families without search stages were not being included in
    the search; with the "-uncurated" flag, Dfam DR families were also omitted.
   BUGFIX: RepeatMasker exits with an error code if no repetitive
    sequences were found in the input file.
   BUGFIX: famdb.py was not reporting the correct counts for families
    ancestral to the query term.  RepeatMasker uses this count and therefore
    was also reporting an incorrect value.
   Remove duplicate families when attempting to merge Repbase 
    RepeatMaskerEdition into FamDB.
   The configure utility was not using the -libdir path when creating
    the RepeatMasker.lib file. (github #344)
 | 
| RepeatMasker-4.1.9 
 
   BUGFIX: When the RepeatMasker adjudicator (ProcessRepeats) is 
    faced with a huge number of overlapping and redundant alignments 
    the joining can go awry, creating an infinite loop.  The effect
    is that ProcessRepeats never completes.
   BUGFIX: When using nhmmer/pHMMs and a non-species-level "-species"
    parameter (e.g. "-species mammals"), RepeatMasker was using an
    extremely permissive per-family score cutoff.  This overwhelmed
    ProcessRepeats with many low-scoring alignments.
   BUGFIX: ProcessRepeats dies around line 8190 with a "Division by
    zero" error.
   BUGFIX: Fixed call to famdb for buildSummary.pl in the -species 
    option.  Updated documentation.
   Updated repeatmasker.help.
 | 
| RepeatMasker-4.1.8 
 
   Fixed table format for non-mammals.  The "Retroelements" heading
  didn't include Penelope elements ( "PLE/" ) in its tabulation.
   ProcessRepeats - A major change in how annotations are processed
  to reduce PR's memory footprint.  Instead of loading all annotations
  in memory, they are now streamed and processed a sequence at a time.
  For chromosomal assemblies this reduced the memory footprint to
  the size of the largest chromosome, instead of the whole genome.
   Fixed a bug that caused general/is.lib to be left in an unfrozen
  state for rmblastn.
   Updated to FamDB 2.0.0
 | 
| RepeatMasker-4.1.7-p1 
 
  This is a critical fix to the 4.1.7 release.  The -species option
    was not being interpreted correctly by RepeatMasker/ProcessRepeats.
    The most obvious change for mammals/primates was that the output
    format (*.tbl file) looked different than it had before.  Under the
    hood this also changed the way some species were handled in the 
    library searches. 
 | 
| RepeatMasker-4.1.7 
 
   Handle the case where the default search engine is not configured
    and require the user to provide it using -e/-engine.
   Use the global alignment bandwidth simulation for the refinement
    step.
   Allow for minimal database install and warn when used with -species.
   Fixed a bug on configure with setting the default search engine when
    only one search engine is configured.
   RepeatProteinMask wasn't obtaining the correct search engine if
    configure didn't set it up correctly.
   FamDB wasn't exporting RepeatMasker.lib with class-name suffixes.
    This had an impact on the RepeatModeler/RepeatClassifier package
    but not on RepeatMasker itself. Upgraded to FamDB 1.0.5
   Fixed an issue with parsing RepBase.  Some records were not being
    parsed correctly for records that do not include the keyword "repbase"
    following the identifier.
   New semantics for NCBIBlastSearchEngine bandwidth interpretation.
   Fixed a divide-by-zero error in ProcessRepeats' merging of 
    overlapping DNA transposon fragments.
   FamDB fixed inconsistency between curated/uncurated counts vs
    exports.
 | 
| RepeatMasker-4.1.6 
 
Upgraded to FamDB 1.0.2 to support Dfam 3.8 and the new partitioned
    database format.
Added Libraries/RMRB_spec_to_tax.json to project.  This maps the
    RepBase taxa names to current NCBI tax_ids and needs to be refreshed
    with each new Dfam release.
Added softmasking support to NCBIBlastSearchEngine.pm.
Added new '--uncurated' flag to handle single export Dfam format.     
    If this flag is used the CONS/HMM cached directories will be suffixed
    with "_wunc".
Fixed sunk error messages from famdb.py.  Now they will be displayed
    and cause RepeatMasker to quit. 
Additional library setup steps and error checking for configure utility.
CAF documentation in SearchResult.
calcDivergenceFromAlign clarified use of "-a" in documentation.
 | 
| RepeatMasker-4.1.5 
 
Updated codebase for Dfam 3.7 compatibility (famdb format 4.3).
Penelope classification change caused *.tbl file accounting
    to place them in the Unknown category.  Also fixed landscape
    generation tool.
Added a new utility to merge *.out *.align files generated by
    running RepeatMasker serially.
Repbase metadata was out-of-date, updated species names
    so that they match the current NCBI Taxonomy names.
Fixed an issue with the HMM parser.  It wasn't recognizing 
    negative values for Tau with models that do not have GA thresholds.
 | 
| RepeatMasker-4.1.4 
 
Added support for RMBlast 2.13.0.
Release of the TE genome browser visualization (UCSC) and trackhub generation tool.
New CpGSites and unadjusted Kimura stats in the *.align file.
Fixed a bug that caused the read-only state of the input fasta file to propagate to the intermediate files and cause the program to exit.
Removed DateRepeats as it's based on old library formats - this functionality will return with the refactored version of RM in the works.
 | 
| RepeatMasker-4.1.3-p1 
 
A recent change in 4.1.3 to correct blank fragment ID fields can, in rare
cases, cause the error message: 'Can't call method "setLeftLinkedHit"'.
The RepeatAnnotationData.pm file containing necessary information for
recognizing equivalent fragments of DNA transposons was missing data.
The MULE-MuDR class was added to the *.tbl file for "-lib" searches.
 | 
| RepeatMasker-4.1.3 
 
A new utility for generating trackHubs for our new UCSC TE visualization
Fix a bug where killing RM while starting up can leave the cached libraries
in an inconsistent state. 
Fixed a bug where in rare cases the joined fragment ID field is blank
Merged in changes to Dupmasker supporting multi-threaded use
Fixed legacy RepBase taxonomic labels
Added support for GFF v3 output and fixed the utility/rmOutToGFF3.pl
 | 
| RepeatMasker-4.1.2-p1 
 
Releases 4.1.1-4.1.2 contained a bug with the processing of Alu sequences in primates.  The step
where an initial annotation is refined into a particular Alu subfamily was not performed and the annotations
remained labeled with the initial capture sequence ( AluJb, AluSx, or AluY ).  This patch release fixes this
one issue.
 | 
| RepeatMasker-4.1.2 
 
 Fixed 21 protein family classifications in RepeatProteinLib.
 Fixed a problem with the generation of the RepeatMasker.lib file
  for use by RepeatModeler.  In release 4.1.1 it did not add the
  classification info to this auxiliary file.
 Fixed a "log(0)" error that can cause the program to fault in rare
  circumstances.
 buildSummary now supports FamDB and has improved documentation.
 Bugfixes and improvements to FamDB.
 | 
| RepeatMasker-4.1.1 
 
Dfam (starting with version 3.2) is now distributed in the FamDB file format
based on HDF5, which has improved support for large datasets compared to the
EMBL and HMM formats that were previously used.  RepeatMasker therefore
includes a copy of famdb.py, and depends on the python package h5py.
  
    The 'configure' script and other parts of RepeatMasker have been
    updated to accommodate these changes.
    The utilities 'queryTaxonomyDatabase.pl' and 'queryRepeatDatabase.pl'
    are no longer included, since that data is now included in FamDB. The
    'famdb.py' tool can be used to make many of the same queries as the
    removed utilities, and even more.
     | 
| RepeatMasker-4.1.0 
 
RepeatMasker now has a refactored configuration system
making it easier to distribute RepeatMasker via package
managers and/or bundle RepeatMasker into containers.
 | 
| RepeatMasker-open-4-0-9-p1 
 
Input files containing multiple FASTA sequences
caused RepeatMasker to error out with a message
like:
      "WARNING: TRF returned an error (Return code = ### )
 TRF parameters: 2.7.7.80.10.50.10
 A search phase could not complete on this batch.
 The batch file will be re-run and if possible the
 program will resume.
 WARNING: Retrying batch ( 1 ) [ 255,, 195]..."
 
 
    This bug was introduced when we attempted to improve TRF
    error catching.  Unfortunately the return codes are
    not documented for TRF and the assumption that 256
    is the only successful return code is wrong.  The
    "success" code appears to change depending on the
    number of sequences in the file.  The workaround is
    to fail only if there is a message in the error output
    file.
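
The workaround described above (ignore the exit status and fail only when something was written to the error output) can be illustrated in shell. This is an illustration of the idea only, not RepeatMasker's own code; the function name `run_trusting_stderr` is hypothetical.

```shell
# run_trusting_stderr CMD [ARGS...]
# Run CMD, disregard its exit status, and fail only if it wrote to stderr.
run_trusting_stderr() {
    errfile=$(mktemp) || return 2
    status=0
    "$@" 2>"$errfile" || status=$?
    if [ -s "$errfile" ]; then                # non-empty error output
        echo "command reported an error (exit status $status):" >&2
        cat "$errfile" >&2
        rm -f "$errfile"
        return 1
    fi
    rm -f "$errfile"
    return 0
}
```

A command that exits non-zero but writes nothing to stderr is treated as successful, which matches the behaviour described for TRF; the trade-off is that a tool printing harmless warnings to stderr would be reported as failed.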
 | 
| RepeatMasker-open-4-0-9 
 
General compatibility update for Dfam 3.0.  Dfam
    and Dfam_consensus have merged into one combined
    database.  RepeatMasker can use Dfam using any of
    its search engines and will automatically switch to
    using consensus sequences or profile HMMs based on
    the engine used. It is important to note that, by
    default RepeatMasker will use Dfam consensus
    sequences when library duplicates are detected.
Bugfix: The -dir option no longer assumes that the
    directory already exists.
Feature: The configure script now accepts command-line
    parameters to change configuration settings.  Configure
    also re-reads existing configuration options to use
    as prompt defaults.
 | 
| Archived Releases | 
|  |