Prerequisites
- Unix system with RepeatMasker 3.2.0 or higher installed
- DupMasker is included in RepeatMasker releases 3.2.0 and higher. See the RepeatMasker installation instructions for setup details.
- Sequence Search Engine
DupMasker/RepeatMasker use a sequence search engine to perform
their searches. Currently DupMasker only supports the RMBlast
and WUBlast/ABBlast engines.
- Duplicon Database
The duplicon database developed by Jiang Z. et al. is an essential component of this system. It is available for download from:
dupliconlib-20080314.tar.gz
Details on how this database was constructed may be found in:
Jiang Z, Tang H, Ventura M, Cardone MF, Marques-Bonet T, She X, Pevzner PA, Eichler EE.
"Ancestral reconstruction of segmental duplications reveals punctuated cores of human genome evolution."
Nat Genet. 2007 Nov;39(11):1361-8. Epub 2007 Oct 7.
Installation
- Install DupMasker Database
Download the duplicon database and unpack it in the RepeatMasker program directory.
- cp dupliconlib-20080314.tar.gz /usr/local/RepeatMasker
- cd /usr/local/RepeatMasker
- gunzip dupliconlib-20080314.tar.gz
- tar xvf dupliconlib-20080314.tar
- rm dupliconlib-20080314.tar
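The same unpacking step can be scripted. The following is a minimal Python sketch, not part of the RepeatMasker distribution: the function name unpack_duplicon_library is hypothetical, and it assumes the archive has already been downloaded and that the RepeatMasker program directory is writable.

```python
import tarfile
from pathlib import Path


def unpack_duplicon_library(archive, repeatmasker_dir):
    """Unpack a gzipped tar archive into the RepeatMasker program directory.

    Equivalent in effect to the cp / gunzip / tar xvf / rm sequence, except
    that the archive is read in place rather than copied and then deleted.
    Returns the list of member names found in the archive.
    """
    dest = Path(repeatmasker_dir)
    if not dest.is_dir():
        raise FileNotFoundError(f"not a directory: {dest}")
    # "r:gz" reads the gzip-compressed tar directly, so no separate gunzip step.
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names
```

For the installation described here, the call would be unpack_duplicon_library("dupliconlib-20080314.tar.gz", "/usr/local/RepeatMasker").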
Output Format
The *.duplicons file format mimics the RepeatMasker *.out file format (which in turn is based on the cross_match file format). The specific fields are described below:
Forward Strand Annotation:
SW     perc  perc  perc  qry   qry    qry  qry     subj       subj   subj  subj
score  div.  del.  ins.  seq   begin  end  (left)  seq        begin  end   (left)
---------------------------------------------------------------------------------
2334   8.44  0.00  3.25  chr1  127    737  (8222)  SD1132...  1      298   (14)
Reverse Strand Annotation:
SW     perc  perc  perc  qry   qry    qry  qry        subj       subj    subj  subj
score  div.  del.  ins.  seq   begin  end  (left)  C  seq        (left)  end   begin
------------------------------------------------------------------------------------
2334   8.44  0.00  3.25  chr1  127    737  (8222)  C  SD1132...  (14)    298   1
- SW score = Smith-Waterman score of the match (complexity-adjusted).
- perc div. = % substitutions in matching region.
- perc del. = % deletions (in query seq rel. to subject) in matching region.
- perc ins. = % insertions (in query seq rel. to subject) in matching region.
- qry seq = id of query sequence.
- qry begin = starting position of match in query sequence.
- qry end = ending position of match in query sequence.
- qry (left) = no. of bases in query sequence past the ending position of
match (so 0 means that the match extended all the way to
the end of the query sequence).
- C = present only when the match is found on the reverse (complement) strand.
- subj seq = id of the duplicon.
- subj (left) = no. of bases remaining in the (complement of the) subject
sequence prior to the beginning of the match.
- subj end = in the reverse-strand layout, the starting position of the match
in the subject sequence (using top-strand numbering).
- subj begin = in the reverse-strand layout, the ending position of the match
in the subject sequence.
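The field layout above can be read programmatically. The following Python sketch is one possible parser, not code shipped with DupMasker: the DupliconHit class and parse_duplicons_line function are hypothetical names. It assumes whitespace-separated columns exactly as in the two example rows, and it also tolerates a "+" strand marker as used in RepeatMasker *.out files, which is an assumption about DupMasker output. Values are stored under their column labels, so for a reverse-strand row subj_end holds the column headed "end".

```python
from dataclasses import dataclass


@dataclass
class DupliconHit:
    sw_score: int
    perc_div: float
    perc_del: float
    perc_ins: float
    qry_seq: str
    qry_begin: int
    qry_end: int
    qry_left: int
    strand: str      # "+" for forward rows, "C" for reverse-strand rows
    subj_seq: str
    subj_begin: int
    subj_end: int
    subj_left: int


def _paren_int(token):
    """Convert a parenthesised count such as '(8222)' to the integer 8222."""
    return int(token.strip("()"))


def parse_duplicons_line(line):
    """Parse one data row; return None for headers, rulers and blank lines."""
    f = line.split()
    if len(f) < 12 or not f[0].isdigit():
        return None
    rest = f[8:]
    if rest[0] == "C":
        # Reverse-strand layout: C, seq, (left), end, begin
        if len(rest) < 5:
            return None
        strand, subj_seq = "C", rest[1]
        subj_left = _paren_int(rest[2])
        subj_end, subj_begin = int(rest[3]), int(rest[4])
    else:
        # Forward layout: seq, begin, end, (left); skip an optional "+" marker
        if rest[0] == "+":
            rest = rest[1:]
        if len(rest) < 4:
            return None
        strand, subj_seq = "+", rest[0]
        subj_begin, subj_end = int(rest[1]), int(rest[2])
        subj_left = _paren_int(rest[3])
    return DupliconHit(
        sw_score=int(f[0]), perc_div=float(f[1]),
        perc_del=float(f[2]), perc_ins=float(f[3]),
        qry_seq=f[4], qry_begin=int(f[5]), qry_end=int(f[6]),
        qry_left=_paren_int(f[7]), strand=strand, subj_seq=subj_seq,
        subj_begin=subj_begin, subj_end=subj_end, subj_left=subj_left,
    )
```

Applied to each line of a *.duplicons file, the function returns one DupliconHit per data row and None for the header and separator lines, so those can simply be filtered out.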
Example Run
In this example we first downloaded the 173 kb sequence AC097264.4 from GenBank and then ran DupMasker on it:
- [DupMaskerPath]/DupMasker AC097264.4
The output generated:
- AC097264.4.dupout - An intermediate file of results obtained by searching the masked sequence against the duplicons library.
- AC097264.4.duplicons - The final output of fully-extended duplicons.
The duplicons file is then run through the dupliconToSVG.pl script:
- [DupMaskerPath]/util/dupliconToSVG.pl AC097264.4.duplicons
This produces an SVG graphical representation of the region:
- AC097264.4.duplicons.1.svg
Note: Some browsers can view this file directly. In addition, Firefox can zoom in and out on SVG files.
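The two commands in this example can also be driven from a script. The sketch below is illustrative only: build_commands and run_example are hypothetical helper names, dupmasker_path stands in for [DupMaskerPath], and it assumes the *.duplicons file is written next to the input under the name shown in this example.

```python
import subprocess
from pathlib import Path


def build_commands(dupmasker_path, sequence_file):
    """Return the DupMasker and dupliconToSVG.pl command lines as argv lists."""
    base = Path(dupmasker_path)
    # Assumption: the final output is named <sequence_file>.duplicons.
    duplicons_file = f"{sequence_file}.duplicons"
    return [
        [str(base / "DupMasker"), sequence_file],
        [str(base / "util" / "dupliconToSVG.pl"), duplicons_file],
    ]


def run_example(dupmasker_path, sequence_file):
    """Run both steps in order, stopping at the first command that fails."""
    for cmd in build_commands(dupmasker_path, sequence_file):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it possible to inspect or log the command lines before anything is run.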
Release Notes
RM-open-3.2.3
- A missing parameter to wublast in WUBlastSearchEngine.pm caused DupMasker to produce negative values in the *.duplicons file. This bug was only manifested when the input file contained NCBI accession identifiers such as "gi|238332|gb|AC839293.1|".
- Also in this release dupliconToSVG.pl was added to the RepeatMasker/util directory.
RM-open-3.2.2
- DupMasker now supports the GFF output format. Use the -gff option to generate a *.duplicons.gff file in addition to the *.duplicons file.
- A new utility has been written to convert the *.duplicons file (e.g., hg18-chr1.duplicons) into a Scalable Vector Graphics visualization of the duplication blocks (e.g., hg18-chr1.duplicons.1.svg). This utility was originally available as a separate download and is now included in the RepeatMasker package (see above). Run the program to view the documentation.
RM-open-3.2.0
- First release of DupMasker.