
Pan Predictor


Pan Predictor is an AutoML framework for exploring and developing prediction models. It takes a simple population table as input (id, prediction date, outcome) and returns the best trained model it can find.

Dan Riesel, Data Scientist

Michael Leshchinsky, ML engineer
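To make the input format concrete, here is a minimal sketch of a population table with the three columns named above (id, prediction date, outcome). The column names, the binary 0/1 outcome encoding, the `PopulationRow` class, and the `read_population_table` helper are assumptions made for illustration and are not part of Pan Predictor's actual interface. The rows are synthetic.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PopulationRow:
    """One row of a population table: who, when, and what happened."""
    id: str
    prediction_date: date
    outcome: int  # assumed binary: 1 if the outcome occurred, 0 otherwise


def read_population_table(text: str) -> list[PopulationRow]:
    """Parse CSV text with the columns id, prediction_date, outcome."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        outcome = int(record["outcome"])
        if outcome not in (0, 1):
            raise ValueError(f"outcome must be 0 or 1, got {outcome}")
        rows.append(
            PopulationRow(
                id=record["id"],
                prediction_date=date.fromisoformat(record["prediction_date"]),
                outcome=outcome,
            )
        )
    return rows


# Synthetic example rows (hypothetical ids and dates).
EXAMPLE_CSV = """id,prediction_date,outcome
1001,2021-01-01,0
1002,2021-01-01,1
1003,2021-03-15,0
"""

population = read_population_table(EXAMPLE_CSV)
```

Each row pairs a person with the date at which a prediction is made and the outcome to be predicted. Everything else (features, preprocessing, model) is derived by the framework's modules.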

Pan Predictor is a modular tool implemented as a Python package.

As a user, you can use all of the modules or only some of them:

  1. Feature extraction - thousands of features are queried from Clalit's DB, including demographics, diagnoses, lab results, medications, clinical covariates, procedures, vaccinations, and many more.

  2. Feature preprocessing - supports any sklearn or custom-built preprocessing method through easy-to-use, pipeline-style definitions.

  3. Feature selection - several methods are available, including an upstream prediction model that selects features before the main model is trained.

  4. Modeling – Bayesian search over models and hyperparameters, using the "optuna" package.

  5. Evaluation – various performance metrics and graphs, including details for all risk thresholds, using the {rtichoke} package.

  6. Explanation – feature explanation, using the "shap" package.

  7. Documentation – models and results are automatically documented in a dedicated MLflow artifactory.


Data scientists control the process through simple configurations, with minimal code involved, and can focus on the important issues:

  • Designing the study / intervention / product

  • Evaluating the model and its clinical value

  • Communicating with stakeholders – customers & users

Pan Predictor Flow

The framework is inseparable from our daily work. We use it to develop almost all of our prediction models. Many of these models are deployed and used with real patients, creating a meaningful impact on their lives.

Figure: Pan Predictor flow