Index of /~andy/programs/FFindex/


databases/                                         03-Jan-2014 15:11                   -
MD5SUMS                                            03-Sep-2015 22:27                 672
SHA256SUMS                                         03-Sep-2015 22:27                1056
ffindex-0.9.3.tar.gz                               24-Feb-2012 06:53               19842
ffindex-0.9.5.tar.gz                               22-May-2012 16:44               22082
ffindex-0.9.6.1.tar.gz                             22-Jun-2012 13:41               22158
ffindex-0.9.6.tar.gz                               15-Jun-2012 22:43               22142
ffindex-0.9.7.tar.gz                               17-Aug-2012 20:34               22629
ffindex-0.9.8.tar.gz                               21-Sep-2012 13:36               24387
ffindex-0.9.9.tar.gz                               22-Nov-2012 14:30               24627
ffindex-0.9.9.1.tar.gz                             22-Jan-2013 16:54               24828
ffindex-0.9.9.3.tar.gz                             11-Nov-2013 11:15               26070
ffindex-0.9.9.4.tar.gz                             25-Aug-2015 22:58               31418
ffindex-0.9.9.5.tar.gz                             02-Sep-2015 11:07               34563
ffindex-0.9.9.6.tar.gz                             03-Sep-2015 22:26               34833
ffindex-0.9.9.7.tar.gz                             21-Mar-2016 13:20               36139

* Copyright

FFindex was written by Andreas Hauser.
Please add your name here if you distribute modified versions.

FFindex is provided under the Creative Commons license "Attribution-ShareAlike 3.0",
which basically captures the spirit of the GNU General Public License (GPL).

See:
http://creativecommons.org/licenses/by-sa/3.0/

* Thanks

Thanks to Laszlo Kajan for creating and maintaining Debian packages
and for many suggestions to improve the build and user experience.


* Overview

FFindex is a very simple index/database for huge amounts of small files. The
files are stored concatenated in one big data file, separated by '\0'. A second
file contains a plain text index, giving the name, offset and length of each
small file. The lookup is currently done with a binary search on an array made
from the index file.
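
The layout described above can be tried out by hand with standard tools. The
following is only a sketch and does not use the ffindex programs. It assumes
(this is not stated above) that index lines are tab-separated as
"name<TAB>offset<TAB>length" and that the length counts the trailing '\0'.
The file names and the get function are made up for illustration, and the awk
lookup is a plain linear scan standing in for the binary search.

```sh
#!/bin/sh
# Hand-built data and index files, using only printf, awk and dd.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
data="$dir/demo.ffdata"      # hypothetical file names
index="$dir/demo.ffindex"

# Data file: entry contents concatenated, each followed by a '\0' byte.
printf 'alpha\n\0'   >  "$data"   # entry "a":   offset 0,  6 bytes + NUL
printf 'bb\n\0'      >> "$data"   # entry "b":   offset 7,  3 bytes + NUL
printf 'foo bar\n\0' >> "$data"   # entry "foo": offset 11, 8 bytes + NUL

# Index file, sorted by name. ASSUMED format: name<TAB>offset<TAB>length,
# with length including the trailing NUL.
printf 'a\t0\t7\n'    >  "$index"
printf 'b\t7\t4\n'    >> "$index"
printf 'foo\t11\t9\n' >> "$index"

# get NAME: look the name up in the index (linear scan, not a binary
# search), then copy length-1 bytes starting at offset from the data file.
get() {
    entry=$(awk -F '\t' -v name="$1" '$1 == name { print $2, $3; exit }' "$index")
    [ -n "$entry" ] || return 1          # name not in index
    offset=${entry% *}
    length=${entry#* }
    dd if="$data" bs=1 skip="$offset" count="$((length - 1))" 2>/dev/null
}

get b      # prints: bb
get foo    # prints: foo bar
```

Because every entry is addressed by offset and length alone, a lookup never
has to scan the data file itself; only the much smaller index is searched.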


* Installation
 
$ cd src
$ make
$ make test

If you have MPI and want to compile ffindex_apply_mpi:
$ make HAVE_MPI=1

On OS X, use this in place of the first make line:
$ make -f Makefile.osx

# Please use a sensible value for INSTALL_DIR, e.g. /usr/local or /opt/ffindex
# or $HOME/ffindex instead of "..".
$ make install INSTALL_DIR=.. 

and with MPI:

$ make install INSTALL_DIR=.. HAVE_MPI=1


* Usage

Please note that an ffindex must be sorted before querying or unlinking
entries, although you can add entries to an unsorted one. So either specify -s
with ffindex_build or sort later with ffindex_modify -s.
Also, the length of the entry names is restricted; see the usage output of
the ffindex_build program.

Setup environment:
$ export PATH="$INSTALL_DIR/bin:$PATH"
$ export LD_LIBRARY_PATH="$INSTALL_DIR/lib:$LD_LIBRARY_PATH"
On OS X set DYLD_LIBRARY_PATH instead of LD_LIBRARY_PATH.

Build an index from the files in test/data and test/data2:
$ ffindex_build -s /tmp/test.data /tmp/test.ffindex test/data test/data2

Retrieve three entries:
$ ffindex_get /tmp/test.data /tmp/test.ffindex a b foo

Unlink an entry (remove its reference from the index):
$ ffindex_modify -u /tmp/test.ffindex b

Retrieve the same three entries; "b" should now be missing:
$ ffindex_get /tmp/test.data /tmp/test.ffindex a b foo

Convert a FASTA file to ffindex; entry names are incremental IDs starting from 1:
$ ffindex_from_fasta -s fasta.ffdata fasta.ffindex NC_007779.ffn

Get first entry by name:
$ ffindex_get fasta.ffdata fasta.ffindex 1

Get the first and third entries by entry index; this is a little faster:
$ ffindex_get fasta.ffdata fasta.ffindex -n 1 3

Count the characters including header in each entry:
$ mpirun -np 1 ffindex_apply_mpi fasta.ffdata fasta.ffindex wc -c

Count the number of characters in each sequence, without the header:
$ mpirun -np 1 ffindex_apply_mpi fasta.ffdata fasta.ffindex perl -ne '$x += length unless(/^>/); END{print "$x\n"}'

Parallel version for counting the characters including header in each entry:
$ mpirun -np 4 ffindex_apply_mpi fasta.ffdata fasta.ffindex -- wc -c

Parallel version for counting the characters including header in each entry and
saving the output to a new ffindex:
$ mpirun -np 4 ffindex_apply_mpi fasta.ffdata fasta.ffindex -i out-wc.ffindex -d out-wc.ffdata -- wc -c
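
What "wc -c" per entry computes can be approximated without the ffindex
programs. This is a serial sketch only, under the same assumptions as any
hand-built example: index lines are taken to be "name<TAB>offset<TAB>length"
with the length counting the trailing '\0', and the entry is assumed to reach
the program on standard input without that '\0'. The apply_each function and
the file names are hypothetical.

```sh
#!/bin/sh
# Emulates "run a command once per entry" with printf, dd and a read loop.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
data="$dir/demo.ffdata"      # hypothetical file names
index="$dir/demo.ffindex"

# Two FASTA-like entries named 1 and 2, each terminated by '\0'.
printf '>seq1\nACGT\n\0'   >  "$data"   # 11 bytes of content + NUL
printf '>seq2\nACGTAC\n\0' >> "$data"   # 13 bytes of content + NUL

# ASSUMED index format: name<TAB>offset<TAB>length (length includes the NUL).
printf '1\t0\t12\n2\t12\t14\n' > "$index"

# apply_each CMD [ARGS...]: feed every entry to CMD on standard input.
# The default IFS splits on tabs, which is enough for these names.
apply_each() {
    while read -r name offset length; do
        dd if="$data" bs=1 skip="$offset" count="$((length - 1))" 2>/dev/null | "$@"
    done < "$index"
}

apply_each wc -c    # prints 11, then 13 (some wc versions pad with spaces)
```

The real tool distributes the entries over MPI ranks; the loop above only
shows the per-entry input each invocation of the command would see.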