Pan Predictor
Pan Predictor is an AutoML framework for exploring and developing prediction models. It takes a simple population table as input (id, prediction date, outcome) and returns the best trained model it can find.
Dan Riesel, Data Scientist
Michael Leshchinsky, ML Engineer
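The population table is the only input a user has to provide. The sketch below shows one way such a table could be parsed and checked before it is handed to the rest of the pipeline. The column names `id`, `prediction_date` and `outcome`, the CSV format, the binary outcome and the helper `validate_population` are all hypothetical assumptions for illustration, not Pan Predictor's actual input API.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PopulationRow:
    """One row of a population table: who, when to predict, and what happened."""
    id: str
    prediction_date: date
    outcome: int  # assumption: binary, 1 = event occurred, 0 = no event


def validate_population(csv_text: str) -> list[PopulationRow]:
    """Parse and sanity-check a CSV population table (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    required = {"id", "prediction_date", "outcome"}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    seen = set()
    for rec in reader:
        # The same person should not appear twice at the same prediction date.
        key = (rec["id"], rec["prediction_date"])
        if key in seen:
            raise ValueError(f"duplicate row for id={key[0]} at {key[1]}")
        seen.add(key)
        outcome = int(rec["outcome"])
        if outcome not in (0, 1):
            raise ValueError(f"outcome must be 0 or 1, got {outcome}")
        rows.append(PopulationRow(
            id=rec["id"],
            prediction_date=date.fromisoformat(rec["prediction_date"]),
            outcome=outcome,
        ))
    return rows


example = """id,prediction_date,outcome
1001,2021-01-01,0
1002,2021-01-01,1
1003,2021-06-15,0
"""
population = validate_population(example)
print(len(population), sum(r.outcome for r in population))  # 3 1
```

Keeping the input this narrow means every other column (features) is produced by the framework itself rather than supplied by the user.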
Pan Predictor is a modular tool implemented as a Python package.
Users can run all of its modules or only a subset of them:
- Feature extraction – thousands of features are queried from Clalit's database, including demographics, diagnoses, lab results, medications, clinical covariates, procedures, vaccinations, and more.
- Feature preprocessing – supports any sklearn or custom-built preprocessing method, defined in an easy-to-use pipeline style.
- Feature selection – supports several methods, including an upstream prediction model that selects features before the main model is trained.
- Modeling – Bayesian search over models and hyperparameters, using the "optuna" package.
- Evaluation – various performance metrics and graphs, including details for every risk threshold, using the "rtichoke" package.
- Explanation – feature-level explanations, using the "shap" package.
- Documentation – models and results are automatically logged to a dedicated MLflow artifact store.
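The preprocessing and feature-selection modules can be pictured as steps of an sklearn `Pipeline`, with an upstream model choosing features before the main model is fit. The sketch below uses synthetic data and illustrative choices (median imputation, scaling, random-forest importances via `SelectFromModel`, logistic regression as the main model); these are assumptions, not Pan Predictor's configuration or internal code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an extracted feature matrix (not real Clalit data).
X, y = make_classification(n_samples=500, n_features=40, n_informative=8, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # simulate missing values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline([
    # Preprocessing: any sklearn transformer could be placed here.
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    # Upstream model for selection: keep features whose random-forest
    # importance is at or above the median importance.
    ("select", SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0),
        threshold="median",
    )),
    # Main model, trained only on the selected features.
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)

n_kept = int(pipeline.named_steps["select"].get_support().sum())
auc = roc_auc_score(y_test, pipeline.predict_proba(X_test)[:, 1])
print(n_kept, round(auc, 3))
```

Putting selection inside the pipeline means the upstream model is refit on training data only, so the test split does not leak into feature selection.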
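The evaluation module reports performance at every risk threshold. As a rough illustration of what "details for all risk thresholds" means, the sketch below computes confusion-matrix metrics at a list of thresholds with plain NumPy. The function `performance_by_threshold` and its output keys are hypothetical; this is a simple stand-in, not the rtichoke API.

```python
import numpy as np


def performance_by_threshold(y_true, y_prob, thresholds):
    """Confusion-matrix metrics at each risk threshold (hypothetical helper)."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    rows = []
    for t in thresholds:
        predicted = y_prob >= t  # flag everyone at or above the threshold
        tp = int(np.sum(predicted & y_true))
        fp = int(np.sum(predicted & ~y_true))
        fn = int(np.sum(~predicted & y_true))
        tn = int(np.sum(~predicted & ~y_true))
        rows.append({
            "threshold": t,
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
            "ppv": tp / (tp + fp) if tp + fp else float("nan"),
            # Share of the population flagged as high risk at this threshold.
            "flagged_rate": float(predicted.mean()),
        })
    return rows


y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
for row in performance_by_threshold(y_true, y_prob, [0.3, 0.5]):
    print(row)
```

Seeing sensitivity, specificity and the flagged share side by side at each threshold is what lets clinicians pick an operating point that matches the capacity of an intervention.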
Minimum code involved
Data scientists control the process through simple configurations with minimal code, so they can focus on the important issues:
- Designing the study / intervention / product
- Evaluating the model and its clinical value
- Communicating with stakeholders – customers & users
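Configuration-driven control, combined with the option to run only some modules, can be sketched as a runner that executes the enabled steps in a fixed order. Everything here is hypothetical: the config keys, the step names and the `run` function are illustrative assumptions rather than Pan Predictor's real configuration schema, and the step functions are stubs that only record that they ran.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical configuration: keys and step names are illustrative only.
config = {
    "population_table": "population.csv",
    "steps": {
        "feature_extraction": True,
        "preprocessing": True,
        "feature_selection": False,  # this module is skipped
        "modeling": True,
        "evaluation": True,
        "explanation": False,
        "documentation": True,
    },
}

STEP_ORDER = [
    "feature_extraction", "preprocessing", "feature_selection",
    "modeling", "evaluation", "explanation", "documentation",
]


def run(config: dict, steps: dict[str, Callable[[dict], dict]]) -> tuple[list[str], dict]:
    """Run the enabled steps in a fixed order, threading a shared state dict through them."""
    state = {"population_table": config["population_table"]}
    executed = []
    for name in STEP_ORDER:
        if config["steps"].get(name, False):
            state = steps[name](state)
            executed.append(name)
    return executed, state


def make_stub(name: str) -> Callable[[dict], dict]:
    """Placeholder step that only marks itself as done in the state."""
    def step(state: dict) -> dict:
        return {**state, name: "done"}
    return step


stubs = {name: make_stub(name) for name in STEP_ORDER}
executed, state = run(config, stubs)
print(executed)
```

In this design the user edits only the configuration; which modules run, and in what order, is decided by the runner.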