= mtbl_fileset(3) =

== NAME ==

mtbl_fileset - automatic multiple MTBL data file merger

== SYNOPSIS ==

^#include <mtbl.h>^

Fileset objects:

[verse]
^struct mtbl_fileset *
mtbl_fileset_init(const char *'fname', const struct mtbl_fileset_options *'fopt');^

[verse]
^struct mtbl_fileset *
mtbl_fileset_dup(struct mtbl_fileset *'orig', const struct mtbl_fileset_options *'fopt');^

[verse]
^void
mtbl_fileset_destroy(struct mtbl_fileset **'f');^

[verse]
^void
mtbl_fileset_reload(struct mtbl_fileset *'f');^

[verse]
^void
mtbl_fileset_reload_now(struct mtbl_fileset *'f');^

[verse]
^const struct mtbl_source *
mtbl_fileset_source(struct mtbl_fileset *'f');^

[verse]
^void
mtbl_fileset_partition(struct mtbl_fileset *'f',
		mtbl_filename_filter_func 'cb',
		void *'clos',
		struct mtbl_merger \**'m1',
		struct mtbl_merger **'m2');^

Fileset options:

[verse]
^struct mtbl_fileset_options *
mtbl_fileset_options_init(void);^

[verse]
^void
mtbl_fileset_options_destroy(struct mtbl_fileset_options **'fopt');^

[verse]
^void
mtbl_fileset_options_set_merge_func(
        struct mtbl_fileset_options *'fopt',
        mtbl_merge_func 'fp',
        void *'clos');^

[verse]
^void
mtbl_fileset_options_set_dupsort_func(
        struct mtbl_fileset_options *'fopt',
        mtbl_dupsort_func 'fp',
        void *'clos');^

[verse]
^void
mtbl_fileset_options_set_filename_filter_func(
        struct mtbl_fileset_options *'fopt',
        mtbl_filename_filter_func 'fp',
        void *'clos');^

[verse]
^void
mtbl_fileset_options_set_reload_interval(
        struct mtbl_fileset_options *'fopt',
        uint32_t 'reload_interval');^

== DESCRIPTION ==

The ^mtbl_fileset^ is a convenience interface for automatically maintaining a
merged view of a set of MTBL data files. The merged entries may be consumed via
the ^mtbl_source^(3) and ^mtbl_iter^(3) interfaces.

^mtbl_fileset^ objects are initialized from a "setfile", which
specifies a list of filenames of MTBL data files, one per line.  For
each filename, if it is not an absolute path, then the directory name
of the fileset file's pathname is prepended to it.  Examples: If the
setfile path is "/export/dnstable/mtbl/dns.fileset", and a line in it
is of the form "dns.2017.Y.mtbl" then ^dnstable^ will try to open
"/export/dnstable/mtbl/dns.2017.Y.mtbl".  If the setfile path is
"/export/dnstable/mtbl/dns.fileset", and a line in it is of the form
"/tmp/dns.2017.Y.mtbl" then ^dnstable^ will try to open
"/tmp/dns.2017.Y.mtbl".

Internally, an ^mtbl_reader^ object is initialized from each filename
and added to an ^mtbl_merger^ object.  The setfile is watched for
changes and the addition or removal of filenames from the setfile will
result in the corresponding addition or removal of ^mtbl_reader^
objects.

Because the MTBL format does not allow duplicate keys, the caller must provide a
function which will accept a key and two conflicting values for that key and
return a replacement value. This function may be called multiple times for the
same key if the same key is inserted more than twice. See ^mtbl_merger^(3) for
further details about the merge function.

^mtbl_fileset^ objects are created with the ^mtbl_fileset_init^() function,
which requires the path to a "setfile", _fname_, and a non-NULL _fopt_ argument
which has been configured with a merge function _fp_. ^mtbl_fileset_source^()
should then be called in order to consume output via the ^mtbl_source^(3)
interface.

The ^mtbl_fileset_dup^() function, which requires a ^mtbl_fileset^ object
and a non-NULL ^opt^ argument, creates a duplicate ^mtbl_fileset^ object
that shares the underlying fileset, but with different fileset options, as specified.

Accesses via the ^mtbl_source^(3) interface will implicitly check for updates
to the setfile. However, it may be necessary to explicitly call the
^mtbl_fileset_reload^() function in order to check for updates, especially if
files are being removed from the setfile and the ^mtbl_source^ is infrequently
accessed.

The ^mtbl_fileset_reload^() function avoids checking for updates more
frequently than every _reload_interval_ seconds.  If 
^reload_interval^ is set to ^MTBL_FILESET_RELOAD_INTERVAL_NEVER^, then ^mtbl_fileset_reload^() function
will only load the fileset once.  The ^mtbl_fileset_reload_now^()
function can be called to bypass the _reload_interval_ check.

The ^mtbl_fileset_partition^() function yields two ^struct mtbl_merger^
objects that are split based on the output of a callback. The caller is
responsible for calling ^mtbl_merger_destroy^() on each of these mergers.
Calls to ^mtbl_source_*^() on the fileset's source object, and calls to
^mtbl_fileset_reload^() and ^mtbl_fileset_reload_now^() may leave these
mergers in an inconsistent state.

=== Fileset options ===

==== merge_func ====
See ^mtbl_merger^(3). An ^mtbl_merger^ object is used internally for the
external sort.

==== dupsort_func ====
See ^mtbl_merger^(3). Used to sort the entries with duplicate keys during the merge process based on their data.

==== fname_filter_func ====
Used to filter specific files by name from a fileset. If the function returns ^false^, the file's data
will not be included in the results returned by any iterators on the fileset.

==== reload_interval ====
Specifies the interval between checks for updates to the setfile, in seconds.
Defaults to 60 seconds.  ^MTBL_FILESET_RELOAD_INTERVAL_NEVER^ is a special value that indicates to never reload the fileset.

