Daniele Branchini edited this page Aug 11, 2015 · 1 revision

Features

General

  • All arkimet functionality apart from metadata extraction and dataset recovery is file-format agnostic.
  • Data is treated as an opaque, read-only binary string that is never modified, to guarantee integrity.
  • Data files in the archive are only accessed using append operations, to avoid the risk of accidentally corrupting existing data.
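
The append-only rule above can be illustrated with a minimal sketch. It is not arkimet's implementation: the function names `append_datum` and `read_datum` and the file name are hypothetical, and the only assumption is that each datum is an opaque byte string located by its offset and size.

```python
import os
import tempfile
from typing import Tuple


def append_datum(path: str, datum: bytes) -> Tuple[int, int]:
    """Append opaque bytes to a data file and return (offset, size).

    Hypothetical helper: mode "ab" only ever writes at the end of the
    file, so bytes that are already stored are never overwritten.
    """
    with open(path, "ab") as f:
        f.seek(0, os.SEEK_END)   # make the current position explicit
        offset = f.tell()
        f.write(datum)
    return offset, len(datum)


def read_datum(path: str, offset: int, size: int) -> bytes:
    """Read back exactly the bytes that were appended at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.data")  # hypothetical file name
        offset, size = append_datum(path, b"\x00opaque payload\xff")
        # The payload comes back byte for byte, untouched.
        print(read_datum(path, offset, size))
```

Because a datum is only ever located by offset and size, nothing in this sketch needs to understand the file format of the payload.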

Metadata

  • Metadata extraction is flexible and can be customized with scripts written in Lua, a simple and well-known scripting language.
  • Metadata contains timestamped annotations to track data workflow.
  • Metadata can be summarised, to represent what data can be found in a big dataset without needing to access its contents. Summaries can be shared to build data catalogs.
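
As a rough illustration of what a summary could look like, the sketch below groups metadata records by a few fields and keeps only counts, total sizes and the covered time range. The record layout (`product`, `origin`, `reftime`, `size`) and the names `Stats`, `summarise` and `merge_summaries` are assumptions made for this example, not arkimet's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional, Sequence, Tuple


@dataclass
class Stats:
    """Aggregate figures for one combination of metadata values."""
    count: int = 0
    size: int = 0
    begin: Optional[datetime] = None
    end: Optional[datetime] = None

    def add(self, reftime: datetime, size: int) -> None:
        self.count += 1
        self.size += size
        self.begin = reftime if self.begin is None else min(self.begin, reftime)
        self.end = reftime if self.end is None else max(self.end, reftime)

    def merge(self, other: "Stats") -> None:
        if other.count == 0:
            return
        self.count += other.count
        self.size += other.size
        self.begin = other.begin if self.begin is None else min(self.begin, other.begin)
        self.end = other.end if self.end is None else max(self.end, other.end)


Summary = Dict[Tuple, Stats]


def summarise(items: Iterable[dict], keys: Sequence[str]) -> Summary:
    """Build a summary from metadata records without touching the data."""
    summary: Summary = {}
    for md in items:
        group = tuple(md.get(k) for k in keys)
        summary.setdefault(group, Stats()).add(md["reftime"], md["size"])
    return summary


def merge_summaries(*summaries: Summary) -> Summary:
    """Combine summaries from several datasets into one catalog."""
    merged: Summary = {}
    for summary in summaries:
        for group, stats in summary.items():
            merged.setdefault(group, Stats()).merge(stats)
    return merged
```

A summary built this way is much smaller than the dataset it describes, which is what makes it practical to share and to merge into a catalog covering several datasets.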

Remote Access

  • Remote data access is provided through arki-server, an HTTP server application.
  • arki-server can serve data from local datasets, as well as from remote datasets served by other arki-server instances (this makes it possible, for example, to provide a single external arki-server front-end to various internal arki-servers in an organisation).
  • arki-server can be run behind Apache mod_proxy to provide encrypted (SSL) or authenticated access.
  • Clients access data through libcurl, so they can reach the server over SSL or through HTTP proxies.
  • When performing a query, it is possible to extract only the summary of its results, as a quick preview before actually transferring the result data.
  • Postprocessing chains can be provided by the server to transfer only the postprocessed data (e.g. transferring an average value instead of a large grid of data).

Archive

  • File layout can be customised depending on data volumes (one file per day, one file per month, etc.)
  • Each dataset can be configured to index a different set of metadata items, to provide the best tradeoff between indexing speed, disk space used by the index and query speed.
  • arkimet can detect if a datum already exists in a dataset, and either replace the old version or refuse to import the new one. It is possible to customize what metadata fields make data unique in each dataset.
  • Datasets are self-contained, so they can be stored on offline media and queried as soon as the media comes back online.
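
Two of the points above, date-based file layout and duplicate detection, can be sketched in a few lines. This is a toy model under stated assumptions: the step names, the path patterns, the `.data` extension, and the names `segment_name`, `ToyIndex` and `DuplicateError` are all hypothetical and do not reflect arkimet's real configuration keys or on-disk layout.

```python
from datetime import datetime
from typing import Dict, Sequence, Tuple

# Hypothetical mapping from a layout step to a relative file name pattern.
_STEP_FORMATS = {
    "daily": "%Y/%m-%d",
    "monthly": "%Y/%m",
    "yearly": "%Y",
}


def segment_name(reftime: datetime, step: str) -> str:
    """Choose the data file a datum belongs to, given its reference time."""
    return reftime.strftime(_STEP_FORMATS[step]) + ".data"


class DuplicateError(Exception):
    """Raised when a datum already exists and replacing is not allowed."""


class ToyIndex:
    """Maps the metadata fields that make a datum unique to its location."""

    def __init__(self, unique_keys: Sequence[str], replace: bool = False):
        self.unique_keys = tuple(unique_keys)
        self.replace = replace
        self.entries: Dict[Tuple, str] = {}

    def add(self, metadata: dict, location: str) -> None:
        key = tuple(metadata[k] for k in self.unique_keys)
        if key in self.entries and not self.replace:
            # Refuse the import and keep the old version.
            raise DuplicateError(f"datum already in dataset: {key!r}")
        # Either a new datum, or the new version replaces the old entry.
        self.entries[key] = location
```

Changing `unique_keys` per dataset changes what counts as "the same datum", which mirrors the per-dataset uniqueness configuration described above. In this sketch a replaced datum only loses its index entry; the old bytes stay where they are, in keeping with the append-only rule.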

User interfaces

  • A powerful and flexible suite of command-line tools makes it easy to integrate arkimet into automated data processing chains in production systems.
  • arki-server not only allows remote access to the datasets, but it also provides a low-level, web-based query interface.
  • ArkiWEB (soon to be released) is a web-based front-end to arkimet that provides simple and powerful browsing and data retrieval for end users.