Pan Predictor

Dan Riesel, Data Scientist & Michael Leshchinsky, ML engineer
May 29, 2022

Pan Predictor is an AutoML framework, used for exploration and development of prediction models. It takes simple population tables as input (id, prediction date, outcome) and returns the best trained model it can find.
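To make the input format concrete, here is a minimal sketch of what such a population table might look like in pandas. The column names (`id`, `prediction_date`, `outcome`), the binary outcome, and the `validate_population` helper are assumptions for illustration only; they are not Pan Predictor's actual schema or API.

```python
import pandas as pd

# Hypothetical column names: the post only says the table holds
# (id, prediction date, outcome). The rows are made-up toy values.
population = pd.DataFrame(
    {
        "id": [101, 102, 103, 104],
        "prediction_date": pd.to_datetime(
            ["2021-01-01", "2021-01-01", "2021-06-15", "2021-06-15"]
        ),
        "outcome": [0, 1, 0, 0],
    }
)


def validate_population(table: pd.DataFrame) -> None:
    """Raise ValueError if the table does not look like a population table.

    Illustrative helper, not part of Pan Predictor. It assumes one row per
    (id, prediction_date) pair and a binary outcome.
    """
    required = {"id", "prediction_date", "outcome"}
    missing = required - set(table.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if table.duplicated(subset=["id", "prediction_date"]).any():
        raise ValueError("duplicate (id, prediction_date) rows")
    if not table["outcome"].isin([0, 1]).all():
        raise ValueError("outcome must be binary (0 or 1)")


validate_population(population)
```

Each row pairs a person with the date from which the prediction is made and the outcome observed afterwards, which is all the framework is said to need as a starting point.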

Pan Predictor is a modular tool implemented as a Python package

As a user, you can use all of the modules or only some of them:

Feature extraction – thousands of features are queried from Clalit's DB, including demographics, diagnoses, lab results, medications, clinical covariates, procedures, vaccinations, and many more.
Feature preprocessing – supports any sklearn or custom-built preprocessing method, defined in an easy-to-use pipeline style.
Feature selection – several methods, including an upstream prediction model that selects features before the main model is trained.
Modeling – Bayesian search over models and hyperparameters, using the "optuna" package.
Evaluation – various performance metrics and graphs, including details for all risk thresholds, using the {rtichoke} package.
Explanation – feature explanation, using the "shap" package.
Documentation – models and results are automatically documented in a dedicated MLflow artifactory.
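As a rough illustration of how the preprocessing and selection modules might fit together, the sketch below chains a scikit-learn preprocessing step, an upstream model used only for feature selection, and a main model in a single pipeline. This is not Pan Predictor's code: the synthetic data, the choice of `StandardScaler`, `RandomForestClassifier`, and `LogisticRegression`, and the median importance threshold are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an extracted feature matrix (not real data).
X, y = make_classification(
    n_samples=500, n_features=30, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline(
    steps=[
        # Preprocessing: any sklearn-style transformer could go here.
        ("scale", StandardScaler()),
        # Upstream model: keep features whose importance is at or above
        # the median importance, before the main model is trained.
        (
            "select",
            SelectFromModel(
                RandomForestClassifier(n_estimators=100, random_state=0),
                threshold="median",
            ),
        ),
        # Main model, trained only on the selected features.
        ("model", LogisticRegression(max_iter=1000)),
    ]
)

pipeline.fit(X_train, y_train)

n_selected = int(pipeline.named_steps["select"].get_support().sum())
auc = roc_auc_score(y_test, pipeline.predict_proba(X_test)[:, 1])
print(f"kept {n_selected} of {X.shape[1]} features, test AUC = {auc:.3f}")
```

Because every step lives inside one `Pipeline`, the scaler and the selector are fitted on the training split only, which avoids leaking test information into the selection step.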

Minimum code involved

Data scientists control the process through simple configurations with minimal code, so they can focus on the important issues:

Designing the study / intervention / product
Evaluating the model and its clinical value
Communicating with stakeholders – customers & users

Pan Predictor Flow

The framework is inseparable from our daily work. We use it to develop almost all of our prediction models. Many of these models are deployed and used with real patients, creating a meaningful impact on their lives.