ICEDB

Overview

ICEDB is a continuous query processor for streaming data and a central part of the CarTel project. ICEDB differs from traditional stream processing applications in how query results are sent to the querying application (running on the portal). Because network connectivity is variable and intermittent, with ICEDB:

Thus, applications can think of the data distributed across the mobile network as being stored locally in a standard SQL relational database, which simplifies how they are written. The programming model is familiar, essentially the same as what web developers today use. ICEDB deals with the underlying complexity of distributing queries to the mobile nodes (where they run in situ), coping with the network’s vagaries, and ensuring that the results are available locally.

ICEDB handles heterogeneous sensor data, allowing the set of sensors to be expanded without requiring major software changes on the remote nodes. Each sensor has an adapter running on the node that handles the details of configuring and extracting information from that sensor and converting it into a normalized form. To ease management and deployment, when a new sensor is added, or when the functions of an adapter need to be modified, only the adapter module needs to change.

Features

Setup

Requirements

Both the ICEDB node and ICEDB portal require the following software. We have tested with the indicated versions.

The ICEDB portal additionally requires the following software.

To deploy ICEDB onto nodes that consist of a master and a slave in the same NFS-based setup as the one used in CarTel, you may find the utilities in tools/deploy/ to be helpful.

Downloads

Download ICEDB v0.1 from http://cartel.csail.mit.edu/icedb/icedb-0.1.tgz.

Installation

To install ICEDB, set your ICEDB_DATA_DIR environment variable to the location of ICEDB’s “data cluster”, i.e. the directory where ICEDB will store all of its data (database, catalog, logs, configurations, etc.). This variable must be set before running any part of ICEDB. Also ensure (either before or after the following installation step) that the directory gives write access to whatever user(s) ICEDB will be running under. Note that ICEDB cannot run as the root user, for security reasons.
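
The preconditions above (variable set, directory writable, not running as root) can be checked before starting any ICEDB tool. The following is a minimal Python sketch and is not part of ICEDB; the function name check_icedb_env is hypothetical, and only the ICEDB_DATA_DIR variable comes from the instructions above.

```python
import os


def check_icedb_env(environ, euid):
    """Return a list of problems that would prevent ICEDB from running.

    `environ` is a mapping such as os.environ; `euid` is the effective
    user id. This helper is hypothetical and is not shipped with ICEDB.
    """
    problems = []
    if euid == 0:
        problems.append("ICEDB cannot run as the root user")
    data_dir = environ.get("ICEDB_DATA_DIR")
    if not data_dir:
        problems.append("ICEDB_DATA_DIR is not set")
    elif not os.path.isdir(data_dir):
        problems.append("ICEDB_DATA_DIR is not a directory: " + data_dir)
    elif not os.access(data_dir, os.W_OK):
        problems.append("ICEDB_DATA_DIR is not writable: " + data_dir)
    return problems


if __name__ == "__main__":
    # Report every problem found for the current user and environment.
    for problem in check_icedb_env(os.environ, os.geteuid()):
        print("error:", problem)
```

Passing the environment and user id as arguments keeps the check easy to exercise without changing the real environment.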

Then run setup.bash, which is a standard SimpleSetup installer. This should be the only step that requires root access, if you’re installing to a root-only location.

Configuration

Once ICEDB is installed, modify the localconf.py configuration file under $ICEDB_DATA_DIR/central and $ICEDB_DATA_DIR/device (i.e. on both the central and device nodes). This file can contain arbitrary Python code, which is executed automatically before many of the ICEDB tools run. Its primary purpose, however, is to override the default values specified in the icedb.conf module; refer to that module for examples of what can be changed.

In particular, one commonly changed value is the location of the central node, which is where data from the device nodes is collected. Set it with the central_host variable.
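
As a minimal sketch, a localconf.py override for this value might look like the following. The hostname is a placeholder, and the assumption that central_host takes a plain hostname string should be checked against the defaults in the icedb.conf module.

```python
# localconf.py (under $ICEDB_DATA_DIR/central and $ICEDB_DATA_DIR/device)
# Override the default from icedb.conf so that both nodes agree on where
# the central node lives. "central.example.org" is a placeholder hostname,
# not a real default.
central_host = "central.example.org"
```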

ICEDB expects to receive the POSIX signals SIGUSR1 and SIGUSR2 as notifications of network connection and disconnection, respectively. In the CarTel system, the software that sends these signals is called OpenWifi.
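
To illustrate this notification mechanism, here is a minimal Python sketch of what a connectivity monitor might do. It is not OpenWifi's code; the function name notify_connectivity is hypothetical, and how the monitor learns the process ID of the ICEDB daemon is not covered here and is left as an assumption.

```python
import os
import signal


def notify_connectivity(pid, connected):
    """Signal process `pid` about a connectivity change.

    SIGUSR1 means the network came up, SIGUSR2 means it went down,
    matching the convention ICEDB expects. `pid` is assumed to be the
    process ID of the ICEDB daemon, obtained by some other means.
    """
    signum = signal.SIGUSR1 if connected else signal.SIGUSR2
    os.kill(pid, signum)
    return signum
```

A monitor would call notify_connectivity(pid, True) when a link is established and notify_connectivity(pid, False) when it is lost.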

Cluster Setup

Now initialize the data cluster on the central host. This also wipes out any previous ICEDB data cluster, but it does not affect any other PostgreSQL clusters or databases on your system:

central-host $ icedb-setup central cluster

Do the same on the device:

device-host $ icedb-setup device cluster

You should only ever need to do this once. This also starts up the backend PostgreSQL daemons.

Database Setup

Initialize the ICEDB catalogs/users/databases on the nodes by running:

central-host $ icedb-setup central db
device-host $ icedb-setup device db

NOTE: The PostgreSQL daemons need to be running for these commands to work, so the cluster setup command must have been run first. In the future, this database setup step will automatically start PostgreSQL if it is not already running.

Usage

icedb-ctl is the ICEDB daemon controller. To start ICEDB on the central and device nodes, run:

central-host $ icedb-ctl central start
device-host $ icedb-ctl device start

Besides start, icedb-ctl also accepts stop, status, and restart. The actual ICEDB executables are named icedb-central and icedb-device, but you should not need to invoke them manually, since they expect the environment to have been prepared properly.

To manage adapters or queries, use the icedb-client tool, which currently must be run on the central host:

central-host $ icedb-client add-adapter my_simple_gps \
               'latitude double precision, longitude double precision' \
               --path /path/to/simple-gps-adapter.exe
central-host $ icedb-client add-query myfirstquery \
               'select * from my_simple_gps
                where time > now - 5 every 5 seconds'

For help regarding icedb-client’s command line syntax, see icedb-client --help.
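
When these commands are issued from a script rather than a shell, passing the arguments as a list avoids quoting problems with the attribute and query strings. The following Python sketch assumes only the argument order shown in the examples above; the helper names are hypothetical, and icedb-client --help remains the authority on the actual syntax.

```python
import subprocess


def add_adapter_argv(name, attributes, path):
    """Build the argument list for registering an adapter."""
    return ["icedb-client", "add-adapter", name, attributes, "--path", path]


def add_query_argv(name, query):
    """Build the argument list for deploying a continuous query."""
    return ["icedb-client", "add-query", name, query]


def run(argv):
    """Execute the command on the central host; raises on a non-zero exit."""
    subprocess.run(argv, check=True)
```

For example, run(add_query_argv("myfirstquery", "select * from my_simple_gps where time > now - 5 every 5 seconds")) passes the whole query as a single argument, with no shell quoting involved.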

Adapters

Adapters are the data sources from which ICEDB collects data. We refer to the executable binaries or scripts that insert the data into ICEDB as “packages”. ICEDB downloads adapters and saves the packages to a configuration-specified location (by default, to $ICEDB_DATA_DIR/commands/). ICEDB is not responsible for starting or stopping these adapter packages; the user must do this manually.

One handy tool for this is icedb-adapter, which exports ICEDB configuration options as environment variables for an adapter package to consume. The usage is simple:

device-host $ icedb-adapter my-adapter-package

This in turn uses the bash-conf tool, which developers may find useful to get configuration variables.

To write an adapter package, refer to the icedb-dummy-pusher Perl script as a simple example; gps2db is a GPS adapter that serves as an example of a complete adapter package. The IcedbPusher Perl module is provided for your convenience. ICEDB expects the data to be sent in CSV format over a socket on icedb_data_port, with the name of the adapter as the first line.
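
The wire format described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a port of IcedbPusher: it assumes a TCP socket, newline-terminated lines, and UTF-8 encoding, and it does not address which columns (such as a timestamp) ICEDB expects in each row. The function names, host, and port value are hypothetical; the real port comes from the icedb_data_port configuration option.

```python
import csv
import io
import socket


def encode_batch(adapter_name, rows):
    """Build one payload: the adapter name on the first line, then CSV rows."""
    buf = io.StringIO()
    buf.write(adapter_name + "\n")
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().encode("utf-8")


def push(host, port, adapter_name, rows):
    """Open a connection to the ICEDB data port and send one batch.

    `host` and `port` are assumptions supplied by the caller; `port`
    should be the configured value of icedb_data_port.
    """
    with socket.create_connection((host, port)) as conn:
        conn.sendall(encode_batch(adapter_name, rows))
```

Separating encode_batch from push makes the payload easy to inspect without a running ICEDB device daemon.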

Queries

Aside from data sources, continuous queries can also be deployed. These are simply normal SQL queries with an optional final clause, either:

The results from these queries, which are executed by the ICEDB device daemon over the historical data stored in the PostgreSQL backend, are buffered in separate PostgreSQL tables and sent back to the central node.

Schemas

Data from adapters is stored in tables on the device node’s local database. The table schema for each adapter has the attributes specified in the adapter definition, along with the following fields:

For instance, the gps adapter’s attributes string might be:

lat double precision not null, long double precision not null

This will result in the following DDL on the device:

create table gps (
  rec_id serial4 not null,
  time timestamp not null,
  lat double precision not null,
  long double precision not null
)

These tuples are forwarded to the central node when connectivity is available. The central node essentially builds tables with the same schema, but with the following additional fields:

Query results are also buffered in tables on the device node’s local database. These tables are similarly mirrored onto the central node.

Applications are provided direct access to these tables, which are created and populated by ICEDB.
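
To make the device-side schema rule concrete, the following Python sketch derives the DDL shown in the gps example from an adapter name and its attributes string. It only mirrors that example (the rec_id and time columns are taken from the DDL above) and is not ICEDB's actual implementation; the function name device_table_ddl is hypothetical.

```python
def device_table_ddl(adapter_name, attributes):
    """Return a CREATE TABLE statement for an adapter's device-side table.

    The two leading columns are the ones shown in the gps example.
    Splitting on commas is a simplification: it would break on types
    that contain commas themselves, such as numeric(10,2).
    """
    columns = ["rec_id serial4 not null", "time timestamp not null"]
    columns += [attr.strip() for attr in attributes.split(",") if attr.strip()]
    body = ",\n".join("  " + col for col in columns)
    return "create table {} (\n{}\n)".format(adapter_name, body)


if __name__ == "__main__":
    print(device_table_ddl(
        "gps",
        "lat double precision not null, long double precision not null",
    ))
```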

Miscellaneous

License

ICEDB is released under the GNU General Public License.

Acknowledgements

Support for ICEDB was provided by the NSF CAREER Program under grant number 0448124.

Contact