CIDM2013

Supporting Code and Data for

P. Weber, B. Bordbar and P. Tiňo: A Principled Approach to Mining From Noisy Logs Using Heuristics Miner. In The IEEE Symposium on Computational Intelligence and Data Mining (CIDM) 2013, part of the IEEE Symposium Series on Computational Intelligence (SSCI) 2013, Singapore, 2013.

Automata (OpenFst/AT&T format):

Results:
  • RESULTS.tar.gz: gzipped tar archive containing the following files:
    • RESULTS / N2_{i} / Ni_{Kkk}_{0dd}_{po}_{rtb} /
      • q_neg{j}_{ppp}_{ddd}_{po}_{rtb}_{nnnn}.out: output from a batch run on ProM over multiple MXML process files of {nnnn} traces each. Contains debug output, the metrics for each mining result, and their means, standard deviations and standard errors.
        {i} = 1 (mix of M and O1, varying noise), 2 (mix of M and O2, varying noise), params (mix of M and O2, varying parameters)
        {j} = 1,2: noise model used.
        {Kkk}: proportion of noise, 0.Kkk (for N1_2, Kkkk = K.kkk noise).
        {0dd}: DT = 0.dd
        {po}: PO = po
        {rtb}: RTB = r.tb
        {nnnn}: number of traces in log.
      • q_neg{j}_{ppp}_{ddd}_{po}_{rtb}_GT_ML_metrics.csv: aggregated results from .out files, one record per file size.
      • rs_q_neg{j}_{ppp}_{ddd}_{po}_{rtb}_GT_ML_metrics.eps: convergence graph, metrics plotted against numbers of traces.
      • rsprob_q_neg{j}_{ppp}_{ddd}_{po}_{rtb}_pac.eps: convergence graph, number of mined models for which each metric is below a threshold.
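
The .out files report means, standard deviations and standard errors of each metric over the mined models. A minimal Python sketch of that kind of summary is given here for orientation only. It assumes the sample (n - 1) standard deviation, which the original scripts may or may not use. The function name summarize and the input numbers are hypothetical.

```python
import math
import statistics


def summarize(values):
    """Return (mean, sample standard deviation, standard error) of metric values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate spread")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)          # standard error of the mean
    return mean, sd, se


# Hypothetical metric values from five mined models (illustrative numbers only).
mean, sd, se = summarize([0.12, 0.15, 0.11, 0.14, 0.13])
print(f"mean={mean:.4f} sd={sd:.4f} se={se:.4f}")
```

The standard error shrinks with the square root of the number of mined models, so it is the natural quantity to read against the convergence graphs.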

This code can be used to run batch process mining on logs simulated from probabilistic automata, comparing the results as probability distributions over strings of symbols.
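
To illustrate what comparing results as distributions over strings can look like, the following Python sketch estimates a distribution from each of two logs and measures the total variation distance between them. This is an assumption chosen for illustration, not necessarily one of the distances computed by batchplus.pl. The function names, activity names and logs are all hypothetical.

```python
from collections import Counter


def empirical_distribution(traces):
    """Map each distinct trace (a tuple of symbols) to its relative frequency."""
    counts = Counter(tuple(t) for t in traces)
    total = sum(counts.values())
    return {trace: c / total for trace, c in counts.items()}


def total_variation(p, q):
    """Total variation distance: half the L1 distance between two distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)


# Hypothetical logs: one from a ground-truth model, one from a mined model.
ground_truth = [("a", "b", "c")] * 8 + [("a", "c", "b")] * 2
mined = [("a", "b", "c")] * 6 + [("a", "c", "b")] * 4

distance = total_variation(
    empirical_distribution(ground_truth), empirical_distribution(mined)
)
print(distance)
```

The distance is 0 when the two logs induce the same distribution and 1 when they share no traces at all.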

  • batchplus.pl: Monolithic Perl script to simulate .fst probabilistic automata to MXML files, call ProM to mine using Heuristics Miner, and compare results as probability distributions.
    batchplus.pl \
    -f order_eg.fst \ # Ground truth .fst
    -fn noise_eg1_noise.fst \ # Noise model .fst
    -n 10 \ # Number of files in set
    -p 0.01 \ # Probability of selecting noise model
    -d {destination directory} \ # Directory to write MXML and output files
    -g 10000 \ # Number of traces in large 'ground truth' MXML file
    -a HM \ # Call ProM's Heuristics Miner (HM)
    -b '-u 1 -o 10 -t 0.9 -r 0.05' \ # Parameters passed to HM (use the all-activities-connected heuristic, PO=10, DT=0.9, RTB=0.05).
    -t 100 \ # Timeout for state exploration (seconds)
    -m 1 \ # Calculate distances between probability distributions
    -o 7 \ # Use various methods for labelling the mined models with probabilities
    10 20 30 ... # Sizes of MXML logs to simulate
  • sa_pn.pm, sim_sa_pn.pm: Perl modules used by batchplus.pl.
  • bpl_extract_csvs.sh: Produces aggregate results .csv files from the .out files written by batchplus.pl (also calls Matlab scripts to produce graphs).
  • HMBatch.sh: Shell script to call Heuristics Miner from batchplus.pl. Mines from all the simulated files and aggregates the results.
  • HMBatch.java: Java class to call Heuristics Miner on multiple MXML files. Called by HMBatch.sh.
  • Timer.java: Utility class to kill long-running state space exploration.
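
The role of the -p option (probability of selecting the noise model) can be pictured with a small Python sketch. It assumes the choice between ground truth and noise model is made independently for each trace, which this README does not state. The two automata are hypothetical toy examples, not the contents of the .fst files, and sample_trace and simulate_log are illustrative names rather than functions from batchplus.pl.

```python
import random

# Hypothetical toy automata: state -> list of (probability, symbol, next_state).
# A symbol of None means "stop and emit the trace".
GROUND_TRUTH = {
    0: [(1.0, "a", 1)],
    1: [(0.7, "b", 2), (0.3, "c", 2)],
    2: [(1.0, None, None)],
}
NOISE = {
    0: [(0.5, "x", 0), (0.5, None, None)],
}


def sample_trace(automaton, rng, start=0):
    """Random walk from the start state until a stop arc is chosen."""
    state, trace = start, []
    while True:
        arcs = automaton[state]
        weights = [arc[0] for arc in arcs]
        _, symbol, next_state = rng.choices(arcs, weights=weights, k=1)[0]
        if symbol is None:
            return trace
        trace.append(symbol)
        state = next_state


def simulate_log(n_traces, p_noise, seed=0):
    """Draw each trace from NOISE with probability p_noise, else from GROUND_TRUTH."""
    rng = random.Random(seed)
    log = []
    for _ in range(n_traces):
        model = NOISE if rng.random() < p_noise else GROUND_TRUTH
        log.append(sample_trace(model, rng))
    return log


log = simulate_log(n_traces=10, p_noise=0.01)
print(len(log))
```

Fixing the seed makes a simulated log reproducible, which helps when comparing mining results across parameter settings.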