leakiEst

Command Line Options

When used as a command-line tool, leakiEst supports a number of options that change its default behaviour. They are specified in the usual way:

$ java -jar leakiest-1.4.9.jar [OPTION].. [CONFIG | DATASET [DATASET2]]

A summary of the information presented below can be found by passing the -h option to leakiEst.

leakiEst can read its options from a configuration file in the directory that contains leakiest-1.4.9.jar. CONFIG should be the name of a file whose first line is //CFG//, followed by lines that each contain one of the options listed on this page. leakiEst will behave as if those options had been given on the command line. Comments are supported inside configuration files: lines beginning with // are ignored by leakiEst.
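
As an illustration only, a configuration file following the rules above might look like the fragment below. The dataset name observations.txt is hypothetical, and placing each option together with its arguments on a single line is an assumption based on the description above, not something this page spells out.

```
//CFG//
// Estimate min-entropy leakage from an observation file.
-me
-v 1
-o observations.txt
```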

Selecting a leakage measure

leakiEst can estimate four common information leakage measures from datasets:

  • -mi estimates the amount of mutual information between the secrets and outputs in the dataset;
  • -me estimates the min-entropy leakage from the secrets to the outputs;
  • -cp estimates the capacity of an information-theoretic channel whose inputs are the dataset's secrets and whose outputs are the dataset's outputs;
  • -mc estimates the min-capacity of this channel.

If none of these options is supplied, mutual information is estimated by default.
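
To make the first two measures concrete, the following is a minimal Python sketch of the uncorrected "plug-in" estimates of mutual information and min-entropy leakage, computed from a list of (secret, output) pairs. It is not leakiEst's implementation: leakiEst also reports corrected estimates, which this sketch does not attempt, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(observations):
    """Plug-in estimate of I(secret; output) in bits from (secret, output) pairs."""
    n = len(observations)
    joint = Counter(observations)
    secrets = Counter(s for s, _ in observations)
    outputs = Counter(o for _, o in observations)
    # Sum p(s,o) * log2(p(s,o) / (p(s) * p(o))) over the observed pairs.
    return sum(
        (count / n) * log2(count * n / (secrets[s] * outputs[o]))
        for (s, o), count in joint.items()
    )


def min_entropy_leakage(observations):
    """Plug-in estimate of min-entropy leakage in bits.

    This is log2 of the ratio between the probability of guessing the
    secret in one try after seeing the output and before seeing it.
    """
    n = len(observations)
    joint = Counter(observations)
    secrets = Counter(s for s, _ in observations)
    prior_vulnerability = max(secrets.values()) / n
    # For each output, keep the count of its most likely secret.
    best_per_output = {}
    for (s, o), count in joint.items():
        best_per_output[o] = max(best_per_output.get(o, 0), count)
    posterior_vulnerability = sum(best_per_output.values()) / n
    return log2(posterior_vulnerability / prior_vulnerability)
```

For example, a dataset in which the output always equals one of two equally likely secrets gives 1 bit under both measures, while a dataset in which the output is independent of the secret gives 0 bits.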

Discrete and continuous outputs in datasets

If mutual information is being estimated, leakiEst can process datasets containing either discrete or continuous outputs:

  • -di indicates a dataset containing discrete outputs;
  • -co indicates a dataset containing continuous outputs.

If neither of these options is supplied, leakiEst assumes that the outputs in the dataset are discrete.

If a leakage measure other than mutual information is being estimated, outputs currently must be discrete.

Selecting the dataset type

leakiEst can usually automatically detect which dataset type is being processed, but if it cannot, one of the following options must be given:

  • -o [DATASET] or -obs [DATASET], if DATASET is an observation file;
  • -o2 [DATASET] [DATASET2] or -obs2 [DATASET] [DATASET2], if DATASET and DATASET2 are both observation files (lines will be read alternately from DATASET and DATASET2);
  • -c [DATASET] or -ch [DATASET], if DATASET is a channel file;
  • -a [DATASET] or -arff [DATASET], if DATASET is an ARFF file.

Attribute selection in ARFF files

If leakiEst is processing an ARFF file, it must be able to distinguish attributes containing secret information from attributes containing public outputs. This is achieved with the following options. (The names of attributes are defined in the list of @attribute declarations in the ARFF file's header.)

  • -high [ATTRIBUTE[,..]] specifies the names of ATTRIBUTEs that contain secret information. The default value is the final attribute in the header.
  • -low [ATTRIBUTE[,..]] specifies the names of ATTRIBUTEs that contain public outputs, or one of the special values @all (meaning all of the attributes not specified in -high) or @each (which causes leakiEst to estimate the leakage to each remaining attribute separately and rank the attributes according to how much information leaks to them). The default value is @each.

Attributes can also be specified by their (zero-based) index in the list of @attribute declarations in the header.
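
The selection rules above can be sketched in Python as follows. This is an illustration under stated assumptions rather than leakiEst's own logic: the function names are hypothetical, attribute names are assumed to be unquoted and free of spaces, and a purely numeric token is treated as a zero-based index.

```python
def attribute_names(arff_text):
    """Return attribute names in declaration order from an ARFF header."""
    names = []
    for line in arff_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("@data"):
            break  # the header ends where the data section begins
        if stripped.lower().startswith("@attribute"):
            # Assumes an unquoted name: "@attribute <name> <type>".
            names.append(stripped.split()[1])
    return names


def resolve(spec, names):
    """Map a comma-separated list of names or zero-based indices to names."""
    return [names[int(tok)] if tok.isdigit() else tok for tok in spec.split(",")]


def select(names, high=None, low="@each"):
    """Return (secret attributes, public attributes) for the given settings."""
    # Default for -high: the final attribute in the header.
    high_names = resolve(high, names) if high else [names[-1]]
    if low in ("@all", "@each"):
        # Both special values cover every attribute not listed in -high;
        # they differ only in whether leakage is estimated jointly or per attribute.
        low_names = [n for n in names if n not in high_names]
    else:
        low_names = resolve(low, names)
    return high_names, low_names
```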

Output control

There are some options that control the type and amount of information that leakiEst outputs about its processing.

  • -p prints the conditional probability matrix representing the information-theoretic channel whose inputs are the dataset's secrets and whose outputs are the dataset's outputs. This option is only supported when processing datasets containing discrete outputs.
  • -v [LEVEL] controls the verbosity of leakiEst's output during processing. The given LEVEL must be between 0 and 5 inclusive; higher LEVELs cause more information to be printed. The default LEVEL is 0.
  • -csv [INTERVAL] [CSVFILE] causes leakiEst to write a comma-separated values (CSV) file to CSVFILE containing information about the leakage estimations after each batch of INTERVAL system executions has been read from the dataset and processed. The generated CSV file contains the following columns:
    • the number of observations processed so far;
    • the uncorrected leakage estimation;
    • the corrected leakage estimation;
    • the confidence interval for the corrected leakage estimation;
    • the upper bound on the value of corrected leakage that is consistent with zero leakage.
    For the data in the CSV file to be meaningful, the order of the execution data in the dataset must be randomised. -csv can only be used if the dataset is an observation file (or two observation files) or an ARFF file, and only when the dataset contains discrete outputs.

Leakage estimation accuracy control

There are several options that influence the accuracy of the results reported by leakiEst. Using them generally represents some form of time/accuracy trade-off.

  • -t causes leakiEst to stop reading execution data and terminate when the corrected leakage value shows signs of "stabilising" (changing by less than 0.01 bits over a number of system executions proportional to the number of unique secrets and outputs). This option requires the dataset to be either a discrete-output observation file or a discrete-output ARFF file with defined output attributes (i.e., a value of -low other than @each). In addition, this option can only be used when mutual information is being estimated. Although this often greatly reduces the number of executions that leakiEst processes from the dataset before terminating, the result it gives may not be optimal. When -t is used, the estimation provided by leakiEst is meaningless unless the order of the execution data in the dataset is properly randomised.
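
The idea behind the stabilisation check can be sketched in Python as below. This is only an illustration: the function name is hypothetical, the window is passed in explicitly because this page does not give the exact proportionality rule, and measuring the change as the spread (maximum minus minimum) of the recent estimates is an assumption.

```python
def first_stable_index(estimates, window, tolerance=0.01):
    """Return the index of the first estimate at which the last `window`
    estimates differ by less than `tolerance` bits, or None if that never happens.
    """
    for end in range(window, len(estimates) + 1):
        recent = estimates[end - window:end]
        if max(recent) - min(recent) < tolerance:
            return end - 1
    return None
```

A larger window makes the check more conservative, at the cost of reading more of the dataset before stopping.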

The estimation of channel capacity (-cp) is performed with the Blahut-Arimoto algorithm, as presented in "A generalized Blahut-Arimoto algorithm" by Pascal Vontobel. It is an iterative algorithm, and its parameters can be controlled with the following options:

  • -i [TOTAL] sets the maximum number of iterations to TOTAL (the default is 10000);
  • -e [ERROR] sets the maximum acceptable error level to ERROR (the default is 0.000000000001).
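
To show how an iteration limit and an error level interact in this kind of algorithm, here is a minimal Python sketch of the classic Blahut-Arimoto iteration for a discrete memoryless channel. It is not the generalized algorithm from Vontobel's paper and not leakiEst's implementation. The function name is hypothetical, the default arguments mirror the defaults of -i and -e, and treating the "error" as the gap between an upper and a lower bound on the capacity is an assumption.

```python
from math import log2


def blahut_arimoto(channel, max_iterations=10000, max_error=1e-12):
    """Estimate the capacity in bits of a discrete memoryless channel.

    channel[x][y] is P(output y | input x); every row must sum to 1.
    """
    n_in = len(channel)
    n_out = len(channel[0])
    p = [1.0 / n_in] * n_in  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iterations):
        # Output distribution induced by the current input distribution.
        q = [sum(p[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
        # Divergence (in bits) between each channel row and the output distribution.
        d = [
            sum(w * log2(w / q[y]) for y, w in enumerate(channel[x]) if w > 0)
            for x in range(n_in)
        ]
        lower = sum(p[x] * d[x] for x in range(n_in))  # mutual information for p
        upper = max(d)  # standard upper bound on the capacity
        if upper - lower < max_error:
            break
        # Multiplicative update that shifts weight towards more informative inputs.
        weights = [p[x] * 2 ** d[x] for x in range(n_in)]
        total = sum(weights)
        p = [w / total for w in weights]
    return lower
```

In this sketch, lowering the iteration limit or raising the error level makes the function return sooner, with a looser lower bound on the capacity.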