Descriptive analytics with MPC

In this write-up we demonstrate how secure multiparty computation (MPC) can be used for descriptive analytics. We show how to apply MPC to Kaplan-Meier survival analysis while still gaining insight from the data.


Kaplan-Meier

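The Kaplan-Meier estimator estimates a survival function from time-to-event data, for example the time until death or relapse of patients in a medical study. The resulting curve is a step function that shows, at each point in time, the estimated fraction of patients who have not yet experienced the event. It also takes into account patients who drop out of the study before an event is observed (censored observations).
A log-rank test can determine whether two such curves differ significantly, for example between two treatment groups.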
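For readers new to the estimator, the following is a minimal sketch of the Kaplan-Meier computation in plain Python on data in the clear, without any MPC. The function name `kaplan_meier` and the input format (a list of times plus a list of 0/1 event indicators, where 0 marks a patient who dropped out) are our own assumptions, not the interface of the demo's `km` module.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return the (time, survival) points where the Kaplan-Meier curve drops.

    durations[i] is the follow-up time of patient i; events[i] is 1 if the
    event was observed at that time and 0 if the patient dropped out
    (a censored observation).
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # patients leaving the risk set at each time
    at_risk = len(durations)  # patients still under observation
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths[t]  # Counter returns 0 for times without events
        if d:
            # Multiply by the estimated probability of surviving past time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Patients with an event or a drop-out at t are no longer at risk after t.
        at_risk -= exits[t]
    return curve


# Six patients; one patient at time 2 and the patient at time 4 dropped out.
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
print(curve)  # approximately [(1, 0.833), (2, 0.667), (3, 0.444), (5, 0.0)]
```

Drop-outs never cause a step in the curve, but they shrink the number of patients at risk, which is how the estimator accounts for them.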

The problem

A challenge arises when privacy prohibits hospitals from combining the results of their medical studies, for example to determine the effect of treatment options. Ordered from most to least privacy-sensitive, we face three potential leaks:

  • the input data: the occurrence of an event at a certain time may be linked to an individual. These data should remain with the input party.
  • the Kaplan-Meier curves: like the input data, the exact curves reveal when individual events occurred (see figure). The curves should preserve the privacy of the individuals but still offer insight.
  • the log-rank test output: the test statistic by itself already preserves the privacy of the individuals.
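One way to obtain curves that preserve privacy but still offer insight is to aggregate events over time intervals before computing the curve, as the demo does with its `step` parameter. The sketch below only assumes what such aggregation could look like; it is not the demo's `aggregate_events` function, and the name `aggregate_by_interval` and the choice to move each time to the end of its interval are hypothetical.

```python
from collections import Counter


def aggregate_by_interval(durations, events, step):
    """Count events and drop-outs per time interval of length step.

    Each (integer) time t is moved to the end of its interval,
    step * ceil(t / step), so only per-interval totals remain.
    """
    event_counts = Counter()
    dropout_counts = Counter()
    for t, e in zip(durations, events):
        end = -(-t // step) * step  # ceiling division for integers
        if e:
            event_counts[end] += 1
        else:
            dropout_counts[end] += 1
    ends = sorted(set(event_counts) | set(dropout_counts))
    return [(end, event_counts[end], dropout_counts[end]) for end in ends]


# Events at times 7 and 8 both end up in the interval (5, 10].
print(aggregate_by_interval([3, 4, 7, 8, 12, 15], [1, 0, 1, 1, 0, 1], 5))
# [(5, 1, 1), (10, 2, 0), (15, 1, 1)]
```

A Kaplan-Meier curve computed from these counts can only step at multiples of `step`, so it no longer shows when an individual event happened; a larger `step` hides more but gives a coarser curve.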

[Figure: preserve privacy]

[Figure: test]

Demo

data

In [1]:
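import sys
from scipy.stats import chi2
# MPyC reads its command-line options when mpyc.runtime is imported, so the
# configuration file for this party must be added to sys.argv first.
sys.argv = sys.argv + ['-c', 'party3_0.ini']
from mpyc.runtime import mpc
await mpc.start()  # connect to the other parties (top-level await works in Jupyter)
%matplotlib inline
# Helper functions for this demo from the accompanying km module
from km import read_config, load_and_share, combine_series, plot_km, aggregate_events, logrank, to_lifelines_in_clear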
2019-04-16 15:52:10,363 Start MPyC runtime v0.5
In [2]:
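# Read the demo parameters from the parameter file
filename, npoints, step, ninputters = read_config('kmtest2-params.txt')
# Load the event data and secret-share it among the parties
series = load_and_share(filename, ninputters, npoints)
inputs = combine_series(series)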
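Before any computation the parties' data are secret-shared, so that no single party sees the events of another party. To give an idea of why this works, here is a toy additive secret-sharing sketch in plain Python that adds up per-interval event counts of several hospitals while revealing only the totals. It illustrates the principle only: MPyC itself is based on Shamir secret sharing with its own protocols, and `share`, `secure_total` and `MODULUS` are hypothetical names chosen for this example.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical prime modulus; counts must stay below it


def share(value, nparties):
    """Split value into nparties random additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(nparties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_total(counts):
    """Add one private count per hospital, revealing only the total.

    Each hospital splits its count into one share per computing party; with
    at least two parties, any nparties - 1 shares are uniformly random and
    reveal nothing about the count. Party j adds the j-th shares of all
    hospitals locally, and combining the local sums yields only the total.
    """
    nparties = len(counts)  # toy setting: one computing party per hospital
    all_shares = [share(c, nparties) for c in counts]
    local_sums = [sum(s[j] for s in all_shares) % MODULUS for j in range(nparties)]
    return sum(local_sums) % MODULUS


# Events per time interval at two hospitals; only the per-interval totals are revealed.
hospital_a = [1, 2, 0]
hospital_b = [0, 1, 3]
print([secure_total(list(col)) for col in zip(hospital_a, hospital_b)])  # [1, 3, 3]
```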
In [3]:
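# For comparison only: open the inputs in the clear and plot the Kaplan-Meier
# curves of the individual events
T1, E1, T2, E2 = await to_lifelines_in_clear(inputs)
plot_km(T1, E1, T2, E2, 'Kaplan Meier curves for individual events')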
In [4]:
# Reveal only events aggregated per time step and plot the resulting curves
T1, E1, T2, E2, anon = await aggregate_events(inputs, step)
plot_km(T1, E1, T2, E2, 'Kaplan Meier curves for aggregated events', True)
In [5]:
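# Compute the log-rank statistic on the secret-shared inputs; the p-value is the
# upper tail of a chi-square distribution with one degree of freedom
c2 = await logrank(inputs, anon, step)
print(f'Chi2 statistic={c2}, p-value={1-chi2.cdf(c2,1)}')
await mpc.shutdown()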
2019-04-16 15:52:15,492 Barrier 0 1 [460]
2019-04-16 15:52:16,330 Barrier 0 1 [1004]
2019-04-16 15:52:16,849 Barrier 0 1 [1380]
2019-04-16 15:52:17,837 Barrier 0 1 [2092]
Chi2 statistic=2.966583251953125, p-value=0.08500120941345202
2019-04-16 15:52:17,931 Stop MPyC runtime -- elapsed time: 0:00:07.551545
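The last cell computes the log-rank statistic from secret-shared data. For comparison, here is a minimal sketch of the standard two-group log-rank test computed in the clear in plain Python. `logrank_in_clear` is a hypothetical helper, not the `logrank` function of the `km` module, and it works on exact event times rather than aggregated ones. The p-value printed by the demo, `1-chi2.cdf(c2,1)`, is the upper tail of a chi-square distribution with one degree of freedom, which equals erfc(sqrt(c2/2)); the sketch uses this identity so it needs only the standard library.

```python
import math


def logrank_in_clear(durations1, events1, durations2, events2):
    """Two-group log-rank chi-square statistic and its p-value (1 degree of freedom).

    events[i] is 1 for an observed event and 0 for a drop-out (censored).
    """
    data = [(t, e, 1) for t, e in zip(durations1, events1)]
    data += [(t, e, 2) for t, e in zip(durations2, events2)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({u for u, e, _ in data if e}):  # distinct event times
        n = sum(1 for u, _, _ in data if u >= t)  # at risk in both groups
        n1 = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in group 1
        d = sum(1 for u, e, _ in data if u == t and e)  # events at t
        d1 = sum(1 for u, e, g in data if u == t and e and g == 1)
        # Observed minus expected events in group 1 if both curves were equal
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    statistic = observed_minus_expected ** 2 / variance
    # Chi-square upper tail with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


statistic, p_value = logrank_in_clear([1, 3], [1, 1], [2, 4], [1, 1])
print(statistic, p_value)  # statistic = 8/13, about 0.615

# Applied to the statistic printed above, the tail formula gives a p-value of about 0.085.
print(math.erfc(math.sqrt(2.966583251953125 / 2)))
```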