RepeatMasker Installation Notes
===============================

Pre-requisites
--------------

  - A UNIX-based operating system.

  - Perl 5.8.0 or higher installed.

  - Python 3.0 or higher installed.

  - The h5py HDF5 library for Python.
      See https://docs.h5py.org/en/latest/build.html for installation
      instructions.

  - TRF 4.09 or higher ( http://tandem.bu.edu/trf/trf.html )
  
  - A search engine (at least one required).  We currently support:

       rmblast         : http://www.repeatmasker.org/RMBlast.html
       crossmatch      : http://www.phrap.org
       nhmmer          : pre-release version ( Dfam required )
                         ftp://selab.janelia.org/pub/pickup/hmmer3.1-snap20121016.1.tgz
       abblast/wublast : http://blast.advbiocomp.com/licensing/

  - TE Libraries - RepeatMasker is not distributed with a TE library; however,
    it may be used with user-supplied libraries via the '-lib' option. TE
    libraries in FamDB HDF5 format may be downloaded from Dfam at:

         https://www.dfam.org/releases/current/families/FamDB

    and uncompressed into the Libraries/famdb directory.  The files are
    partitioned by taxonomic group and numbered starting from '0'.  Partition
    0 ( aka the root partition ) contains information required by
    RepeatMasker/FamDB and must be installed whenever Dfam is used.  In
    addition, the final RepBase RepeatMasker Edition may be downloaded and
    combined with Dfam.  It can be obtained at:

          http://www.girinst.org

    and unpacked in the RepeatMasker directory (it will automatically install
    its files in RepeatMasker/Libraries).  To combine it with Dfam, re-run the
    RepeatMasker configure script after both the Dfam FamDB partitions and the
    RepBase RepeatMasker Edition have been installed.
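Before running the configure script, it can help to confirm that the prerequisites above are actually reachable. The following is a minimal sketch, not part of the RepeatMasker distribution. It assumes the programs are on your PATH under the names "perl", "python3", "trf", "rmblastn", "cross_match" and "nhmmer"; these names are assumptions (TRF in particular is often distributed with a version or platform suffix), and abblast/wublast is not checked.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the RepeatMasker prerequisites.
# Program names below are assumptions; adjust them to your installation.

# Print whether a command is on PATH; return non-zero if it is not.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    printf 'found:   %s\n' "$1"
  else
    printf 'MISSING: %s\n' "$1"
    return 1
  fi
}

# Run all checks; return non-zero if anything is missing.
check_prereqs() {
  missing=0
  check_cmd perl || missing=$((missing + 1))
  # "use 5.008" makes perl exit non-zero if it is older than 5.8.0.
  perl -e 'use 5.008;' 2>/dev/null ||
    { echo "Perl 5.8.0 or higher not available"; missing=$((missing + 1)); }
  check_cmd python3 || missing=$((missing + 1))
  # h5py must be importable by the same python3 found on PATH.
  python3 -c 'import h5py' 2>/dev/null ||
    { echo "h5py not importable from python3"; missing=$((missing + 1)); }
  check_cmd trf || missing=$((missing + 1))
  # At least one search engine is required.
  engines=0
  for engine in rmblastn cross_match nhmmer; do
    command -v "$engine" >/dev/null 2>&1 && engines=$((engines + 1))
  done
  [ "$engines" -gt 0 ] ||
    { echo "MISSING: no supported search engine found"; missing=$((missing + 1)); }
  [ "$missing" -eq 0 ]
}
```

Source the file and run `check_prereqs`; it prints one line per check and returns non-zero if anything appears to be missing. The configure script remains the authoritative check.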
       

Installation 
------------

  1. Unpack the distribution in your home directory or in a location where it
     may be shared with other users of your system ( e.g. /usr/local/ ). Do
     not extract it into a directory that already contains a directory called
     "RepeatMasker", as the files contained within will be overwritten.  For
     example:

         % cp RepeatMasker-open-4-#-#.tar.gz /usr/local
         % cd /usr/local
         % gunzip RepeatMasker-open-4-#-#.tar.gz
         % tar xvf RepeatMasker-open-4-#-#.tar

  2. RepeatMasker is currently not distributed with a TE library. The program
     may be used immediately with custom libraries ("-lib mylib.fa" option),
     or you may download TE libraries and configure them for use with
     RepeatMasker. There are two options for supplementing/updating the
     main RepeatMasker library:

     - The Dfam database may be downloaded from www.dfam.org in FamDB HDF5
       format (partitioned by taxa). The root ("dfam##_full.0.h5") partition
       is required if you plan to use Dfam; any combination of additional
       partitions may also be downloaded and configured.
       For example:

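             % wget https://www.dfam.org/releases/current/families/FamDB/dfam39_full.0.h5.gz
             % wget https://www.dfam.org/releases/current/families/FamDB/dfam39_full.1.h5.gz
             % gunzip dfam39_full.0.h5.gz dfam39_full.1.h5.gz
             % mv dfam39_full.0.h5 dfam39_full.1.h5 /usr/local/RepeatMasker/Libraries/famdb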

             NOTE: Only partitions from the same Dfam release should be 
                   in this directory. 

     and/or
     - The RepBase RepeatMasker Edition ( final version 10/26/2018 ) may be
       downloaded from www.girinst.org and unpacked in the RepeatMasker
       directory. For example:

             % cp RepBaseRepeatMaskerEdition-20181026.tar.gz /usr/local/RepeatMasker/
             % cd /usr/local/RepeatMasker
             % gunzip RepBaseRepeatMaskerEdition-20181026.tar.gz
             % tar xvf RepBaseRepeatMaskerEdition-20181026.tar
             % rm RepBaseRepeatMaskerEdition-20181026.tar
               
  3. Configure the distribution by invoking Perl on the
     "configure" script, i.e.:
  
            perl ./configure

     The configure script will prompt you for all the
     information needed to set up the RepeatMasker suite of
     programs.
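Because the root partition is required and only partitions from a single Dfam release should share the famdb directory, a quick sanity check before running configure may save a failed run. This is a hypothetical helper, not a RepeatMasker tool. It assumes partition files follow the "dfam##_full.N.h5" naming shown in step 2 and treats everything before the first "." as the release name.

```shell
#!/bin/sh
# Hypothetical check of a FamDB directory (not part of RepeatMasker).
# Assumes files are named like "dfam39_full.0.h5", as in the example above.
check_famdb_dir() {
  dir=$1
  releases=""
  have_root=no
  for f in "$dir"/*.h5; do
    [ -e "$f" ] || continue          # the glob matched nothing
    name=$(basename "$f")
    release=${name%%.*}              # e.g. "dfam39_full"
    case " $releases " in
      *" $release "*) ;;             # release already recorded
      *) releases="$releases $release" ;;
    esac
    case "$name" in
      *.0.h5) have_root=yes ;;       # root partition present
    esac
  done
  set -- $releases                   # count distinct release names
  if [ "$#" -eq 0 ]; then
    echo "no .h5 partitions found in $dir"; return 1
  fi
  if [ "$#" -gt 1 ]; then
    echo "mixed Dfam releases in $dir:$releases"; return 1
  fi
  if [ "$have_root" != yes ]; then
    echo "root partition (*.0.h5) missing in $dir"; return 1
  fi
  echo "OK:$releases"
}
```

For example, `check_famdb_dir /usr/local/RepeatMasker/Libraries/famdb` should report a single release name when the directory is consistent.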



Library Cache Directories
-------------------------

Since version 3.0, RepeatMasker creates species-specific libraries
on the fly.  These libraries are cached in the first writable
directory in the program's library search path.  The default search
path is:

  1. The RepeatMasker installation "Libraries" directory.
  2. The ".RepeatMaskerCache" subdirectory of the user's home
     directory.
  3. The temporary processing directory "RM_#" created
     in the same directory as the sequence file and 
     removed at the end of the run.
  
NOTE: If the program cannot save libraries in either path 1 or 2, the
libraries will be rebuilt in the temporary directory each time the
program runs.  This adds overhead to every run, which is most
noticeable on shorter sequences.
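The "first writable directory" rule can be illustrated with a small shell function. This is a hypothetical sketch of the lookup order described above, not RepeatMasker's own code; it assumes the candidate directories already exist, and the paths in the usage line are examples only.

```shell
#!/bin/sh
# Hypothetical illustration of the cache lookup rule: print the first
# argument that is an existing, writable directory; fail if none is.
first_writable_dir() {
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}
```

For example, `first_writable_dir /usr/local/RepeatMasker/Libraries "$HOME/.RepeatMaskerCache"` prints whichever of the two would receive the cached library; if it fails, the situation corresponds to the NOTE above, where libraries live only in the per-run "RM_#" directory.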


