Overview
The RepeatMasker program is distributed with three small but growing open repeat databases:
- Dfam: A collection of profile hidden Markov models (HMMs) for repetitive DNA. The current release (Dfam2.0) contains 4,150 models spanning five organisms: human, mouse, zebrafish, fruit fly, and nematode. (www.dfam.org)
- Dfam_consensus: A new database of freely available repetitive DNA consensus sequences that are either already in Dfam or destined for inclusion in it. This is a pre-release of the database and currently contains a handful of new unrestricted sequences. (www.dfam-consensus.org)
- RepeatPeps: The Repeat Protein Database (RepeatPeps) is a large database of curated protein sequences identified in transposable elements. It is distributed with the RepeatMasker package (Libraries/RepeatPeps.lib) and is used by the RepeatMasker utility RepeatProteinMask.
Datasets
We currently distribute Dfam_consensus and RepeatPeps, along with several other RepeatMasker metadata files, as a bundled dataset. The most current versions are included in the RepeatMasker package. Updates to the datasets can be downloaded separately; the current and previous releases are listed below. If you use RepBase RepeatMasker Edition (available from GIRI), it is important that the release date of the RepBase library matches that of the associated RepeatMaskerMetaData release, and that both are installed at the same time.
Current Release: RepeatMaskerMetaData-20181026.tar.gz
Previous Release: RepeatMaskerMetaData-20170127.tar.gz
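The advice above about matching release dates can be checked mechanically before installing. The following is a minimal sketch, not part of RepeatMasker: it assumes both archives carry a YYYYMMDD stamp immediately before `.tar.gz`, as the RepeatMaskerMetaData filenames listed above do. The helper names `release_date` and `releases_match` are hypothetical, and the RepBase filename in the usage note is an assumed naming pattern used only for illustration.

```python
import re
from datetime import date, datetime

# Assumption: the release stamp is an 8-digit YYYYMMDD string placed
# right before ".tar.gz", e.g. "RepeatMaskerMetaData-20181026.tar.gz".
_STAMP = re.compile(r"-(\d{8})\.tar\.gz$")


def release_date(filename: str) -> date:
    """Extract the release date embedded in an archive filename."""
    match = _STAMP.search(filename)
    if match is None:
        raise ValueError(f"no YYYYMMDD release stamp found in {filename!r}")
    # strptime also rejects impossible dates such as 20181340.
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def releases_match(metadata_file: str, repbase_file: str) -> bool:
    """Return True when both archives carry the same release date."""
    return release_date(metadata_file) == release_date(repbase_file)
```

For example, `releases_match("RepeatMaskerMetaData-20181026.tar.gz", "RepBaseRepeatMaskerEdition-20181026.tar.gz")` compares the two stamps and would flag a mismatch if the metadata archive were the 20170127 release instead.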