Mini Course Machine Learning and Knowledge Extraction for Health Informatics

Mini Course MAKE-Health:
Machine Learning & Knowledge Extraction
for Health Informatics

“It is remarkable that a science which began with the consideration of games of
chance should have become the most important object of human knowledge”
Pierre Simon de Laplace, 1812.

Summer Term 2017
Venue: Universita di Verona > Dipartimento di Informatica

Week 15, April 10-14, 2017
Lecture Room: Strada Le Grazie 15 – 37134 VERONA

Short Description: The mini-course MAKE-Health follows a research-based teaching (RBT) approach and discusses experimental methods for combining human intelligence with machine learning to extract and discover knowledge from health data. For practical applications we focus on Python, which is currently one of the most widely used languages for machine learning and knowledge extraction.

Motto: Science is to test crazy ideas – Engineering is to put these ideas into Business.

Motivation:

Machine Learning & Health Informatics is a Growing Challenge:

Machine learning (ML) is the fastest-growing field in computer science (Jordan & Mitchell, 2015. Machine learning: Trends, perspectives, and prospects. Science, 349, (6245), 255-260), and it is well accepted that health informatics is amongst the greatest challenges (LeCun, Bengio, & Hinton, 2015. Deep learning. Nature, 521, (7553), 436-444).
Future medicine will be a data science, and privacy-aware machine (un-)learning is no longer a nice-to-have but a must.

Internationally outstanding universities count on the combination of machine learning and health informatics and expand these fields, for example: Carnegie-Mellon University, Harvard, Stanford – just to name a few!

Machine Learning & Health Informatics offer enormous Business Opportunities:

McKinsey: An executive’s guide to machine learning
NY Times: The Race Is On to Control Artificial Intelligence, and Tech’s Future
Economist: Million-dollar babies

Machine Learning & Health Informatics provide Employability for Graduates:

“Fei-Fei Li, a Stanford University professor who is an expert in computer vision, said one of her Ph.D. candidates had an offer for a job paying more than $1 million a year, and that was only one of four from big and small companies.”
https://www.mckinsey.com/industries/high-tech/our-insights/an-executives-guide-to-machine-learning

Machine Learning & Health Informatics offer Market Opportunities for Spin-offs:

“By 2020, the market for machine learning applications will reach $40 billion, IDC, a market research firm, estimates.
By 2018, IDC predicts, at least 50 percent of developers will include A.I. features in what they create.”
https://www.nytimes.com/2016/03/26/technology/the-race-is-on-to-control-artificial-intelligence-and-techs-future.html?_r=2

Description:

The goal of ML is to develop algorithms which can learn and improve over time and can be used for predictions. In automatic Machine Learning (aML), great advances have been made, e.g., in speech recognition, recommender systems, or autonomous vehicles. Automatic approaches, e.g. deep learning, greatly benefit from big data with many training samples. In the health domain, however, we are sometimes confronted with a small number of data sets or with rare events, where aML approaches suffer from insufficient training samples. Here interactive Machine Learning (iML) may be of help, having its roots in Reinforcement Learning (RL), Preference Learning (PL) and Active Learning (AL). The term iML can be defined as algorithms that can interact with agents and can optimize their learning behaviour through these interactions, where the agents can also be human. This human-in-the-loop can be beneficial in solving computationally hard problems, e.g., subspace clustering, protein folding, or k-anonymization, where human expertise can help to reduce an exponential search space through heuristic selection of samples. Therefore, what would otherwise be an NP-hard problem can be reduced greatly in complexity through the input and the assistance of a human agent involved in the learning phase. However, although humans are excellent at pattern recognition in dimensions of ≤3, most biomedical data sets are in dimensions much higher than 3, making manual data analysis very hard. Successful application of machine learning in health informatics requires considering the whole pipeline from data preprocessing to data visualization. Consequently, this course fosters the HCI-KDD approach, which encompasses a synergistic combination of methods from two areas to unravel such challenges: Human-Computer Interaction (HCI) and Knowledge Discovery/Data Mining (KDD), with the goal of supporting human intelligence with machine learning.
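The human-in-the-loop idea described above can be illustrated with a deliberately tiny sketch of pool-based active learning. It is not taken from the course material: the function names, the simulated expert `human_oracle`, and the threshold value are hypothetical, and the example assumes noise-free labels on one-dimensional data. The learner asks the expert only about the sample that halves the remaining uncertainty, so a pool of 1000 unlabeled points needs about ten labels instead of 1000.

```python
def human_oracle(x, true_threshold=0.6205):
    """Hypothetical stand-in for a human expert who labels one sample on request."""
    return int(x >= true_threshold)


def active_learn_threshold(pool, oracle, budget):
    """Learn a 1-D decision threshold by querying as few labels as possible.

    Assumes labels are noise-free and monotone: 0 below the threshold, 1 above.
    Returns (estimated_threshold, number_of_queries_used).
    """
    points = sorted(pool)
    if not points:
        raise ValueError("pool must not be empty")
    lo, hi = 0, len(points) - 1  # indices whose labels are still unknown
    queries = 0
    while lo <= hi and queries < budget:
        mid = (lo + hi) // 2  # most informative sample: splits the unknown region
        queries += 1
        if oracle(points[mid]) == 1:
            hi = mid - 1  # everything from mid upwards is now known to be 1
        else:
            lo = mid + 1  # everything up to mid is now known to be 0
    if lo > hi:
        # Boundary located: it lies between points[lo - 1] and points[lo].
        if lo == 0:
            estimate = points[0]
        elif lo == len(points):
            estimate = points[-1]
        else:
            estimate = (points[lo - 1] + points[lo]) / 2
    else:
        # Budget exhausted: best guess is the middle of the unknown region.
        estimate = points[(lo + hi) // 2]
    return estimate, queries


if __name__ == "__main__":
    pool = [i / 1000 for i in range(1000)]
    estimate, used = active_learn_threshold(pool, human_oracle, budget=20)
    print(f"estimated threshold {estimate:.4f} after {used} expert queries")
```

Real iML settings are far less tidy (noisy experts, high-dimensional data, costly queries), so this should be read only as an intuition for why well-chosen questions to a human can shrink the search space.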

Course Content:

For the successful application of ML in health informatics, a comprehensive understanding of the whole HCI-KDD pipeline is necessary, ranging from the physical data ecosystem to the understanding of the end-user in the problem domain. In the medical world the inclusion of privacy, data protection, safety and security is mandatory. This three-module (lucky Chinese number three) course provides an introduction to some selected topics of machine learning and knowledge extraction (MAKE) for health informatics.

Starting material:

1) Mathematical Notations can be found here as pdf (221 KB)

2) Python Tutorial Paper: M. D. Bloice and A. Holzinger, “A Tutorial on Machine Learning and Data Science Tools with Python“, in Machine Learning for Health Informatics, Lecture Notes in Artificial Intelligence LNAI 9605, Springer, 2016, pp. 437-483.

3) Introduction Paper: HOLZINGER (2016) Machine Learning for Health Informatics. (Students please read this paper first)

4) Introduction Video: https://www.youtube.com/watch?v=lc2hvuh0FwQ
(Students please watch this video first)

Module 01 – Introduction: Machine Learning meets health informatics – challenges and future directions

In the first module we get only a rough overview of the differences between automatic machine learning and interactive machine learning, and we discuss a few future challenges as a teaser.

Topic 01: The HCI-KDD approach: Towards an integrative MAKE-pipeline
Topic 02: Understanding Intelligence
Topic 03: The complexity of the application area Health
Topic 04: Probabilistic Information & Gaussian Processes
Topic 05: Automatic Machine Learning (aML)
Topic 06: Interactive Machine Learning (iML)
Topic 07: Active Representation Learning
Topic 08: Multi-Task Learning
Topic 09: Generalization and Transfer Learning

Lecture slides 2×2 (7,051 kB): 01-DAY-MAKE-Challenges-HOLZINGER-Verona-2017-2×2

Here are some prereading/postreading and video recommendations:

Holzinger, A. 2016. Interactive Machine Learning for Health Informatics: When do we need the human-in-the-loop? Springer Brain Informatics, 1-13. doi:10.1007/s40708-016-0042-6
Dossier: HOLZINGER (2016) Dossier interactive Machine Learning Health Informatics
Watch the video of Andreas Holzinger: https://youtu.be/lc2hvuh0FwQ
Watch the video of Google DeepMindHealth: https://youtu.be/teZ2m5oTKwM
“Medicine is so complex, the challenges are so great … we need everything that we can bring to make our diagnostics more precise, more accurate and our therapeutics more focused on that patient.” Sir Malcolm GRANT, NHS England, in: Machine learning: ROYAL SOCIETY Conference report (part of the conference series Breakthrough science and technologies: Transforming our future with machine learning), https://royalsociety.org/topics-policy/projects/machine-learning
Watch the videos: https://www.youtube.com/playlist?list=PLg7f-TkW11iX3JlGjgbM2s8E1jKSXUTsG

Module 02 – Health Data Jungle: Selected Topics on Fundamentals of Data and Information Entropy

Topic 01 Data – The underlying physics of data
Topic 02 Data – Biomedical data sources – taxonomy of data
Topic 03 Data – Integration, Mapping and Fusion of data
Topic 04 Information – Bayes and Laplace probabilistic information p(x)
Topic 05 Information Theory – Information Entropy
Topic 06 Information Cross-Entropy and Kullback-Leibler Divergence

Lecture Slides 2×2 (8,520 kB) 02-DAY-MAKE-Data-HOLZINGER-Verona-2017-2×2

Keywords: data, information, probability, entropy, cross-entropy, Kullback-Leibler divergence
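The three information-theoretic keywords can be made concrete with a short sketch for discrete distributions. The function names and the example distributions are illustrative assumptions, not part of the lecture slides; logarithms are taken to base 2, so all quantities are in bits.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in bits of a discrete distribution p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross-entropy H(p, q) in bits; q must be positive wherever p is."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in bits; q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    p = [0.5, 0.5]    # e.g. the true distribution of a binary finding
    q = [0.25, 0.75]  # e.g. a model's estimate of that distribution
    print("H(p)       =", entropy(p))
    print("H(p, q)    =", cross_entropy(p, q))
    print("KL(p || q) =", kl_divergence(p, q))
    print("KL(q || p) =", kl_divergence(q, p))
```

The three quantities are linked by H(p, q) = H(p) + D_KL(p || q), and the last two printed values differ, which shows that the Kullback-Leibler divergence is not symmetric and therefore not a distance metric.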

Learning Goals:
At the end of this module the students
1) are aware of the problems of health data and understand the importance of data integration in the life sciences.
2) understand the concept of probabilistic information with a focus on the problem of estimating the parameters of a Gaussian distribution (maximum likelihood problem).
3) recognize the usefulness of the relative entropy, called Kullback–Leibler divergence, which is very important, particularly for sparse variational methods, where it is computed between stochastic processes.
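For the maximum likelihood problem named in learning goal 2, the estimates for a univariate Gaussian have a closed form: the sample mean and the (biased, divide-by-n) sample variance. The following is a minimal sketch under that assumption; the function names and the data values are made up for illustration.

```python
import math


def gaussian_mle(samples):
    """Maximum likelihood estimates (mean, variance) of a univariate Gaussian.

    The variance divides by n, not n - 1: this is the ML estimate,
    which is biased but maximizes the likelihood.
    """
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, var


def gaussian_log_likelihood(samples, mu, var):
    """Log-likelihood of i.i.d. samples under N(mu, var)."""
    n = len(samples)
    squared_error = sum((x - mu) ** 2 for x in samples)
    return -0.5 * n * math.log(2 * math.pi * var) - squared_error / (2 * var)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]  # hypothetical measurements
    mu_hat, var_hat = gaussian_mle(data)
    print("mu_hat =", mu_hat, "var_hat =", var_hat)
    print("log-likelihood at the MLE:",
          gaussian_log_likelihood(data, mu_hat, var_hat))
    print("log-likelihood at a shifted mean:",
          gaussian_log_likelihood(data, mu_hat + 0.3, var_hat))
```

Evaluating the log-likelihood at any other parameter pair gives a value no larger than at the estimates, which is exactly what "maximum likelihood" means.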

Here are some prereading/postreading recommendations (alphabetically sorted according to author):

Banerjee, O., El Ghaoui, L. & D’aspremont, A. 2008. Model selection through sparse maximum likelihood estimation for multivariate gaussian or binary data. The Journal of Machine Learning Research, 9, 485-516, https://www.jmlr.org/papers/v9/banerjee08a.html
De Boer, P.-T., Kroese, D. P., Mannor, S. & Rubinstein, R. Y. 2005. A tutorial on the cross-entropy method. Annals of operations research, 134, (1), 19-67. doi:10.1007/s10479-005-5724-z
Galas, D. J., Dewey, T. G., Kunert-Graf, J. & Sakhanenko, N. A. 2017. Expansion of the Kullback-Leibler Divergence, and a new class of information metrics. arXiv:1702.00033.
Holzinger, A., Dehmer, M. & Jurisica, I. (2014). Knowledge Discovery and interactive Data Mining in Bioinformatics – State-of-the-Art, future challenges and research directions. BMC Bioinformatics, 15, (S6), I1. doi:10.1186/1471-2105-15-S6-I1
Holzinger, A., Hörtenhuber, M., Mayer, C., Bachler, M., Wassertheurer, S., Pinho, A. & Koslicki, D. (2014). On Entropy-Based Data Mining. In: Holzinger, A. & Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics, Lecture Notes in Computer Science, LNCS 8401. Berlin Heidelberg: Springer, pp. 209-226. doi:10.1007/978-3-662-43968-5_12
Online available: https://pure.tugraz.at/portal/files/3108669/HOLZINGER_Entropy_based_data_mining.pdf
Loshchilov, Ilya, Schoenauer, Marc & Sebag, Michele (2013). KL-based Control of the Learning Schedule for Surrogate Black-Box Optimization. arXiv:1308.2655.
Matthews, A., Hensman, J., Turner, R. E. & Ghahramani, Z. On sparse variational methods and the Kullback-Leibler divergence between stochastic processes. Proceedings of the Nineteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2016. JMLR, 231-239 https://www.jmlr.org/proceedings/papers/v51/matthews16.html

Additional reading to foster a deeper understanding of information theory related to the life sciences:

Manca, Vincenzo (2013). Infobiotics: Information in Biotic Systems. Heidelberg: Springer. (This book is a fascinating journey through the world of discrete biomathematics and a continuation of the 1944 Paper by Erwin Schrödinger: What Is Life? The Physical Aspect of the Living Cell, Dublin, Dublin Institute for Advanced Studies at Trinity College)

Module 03 – Probabilistic Graphical Models Part 1: From Knowledge Representation to Graph Model Learning

In order to get well prepared for the tutorial on probabilistic programming, this module starts with reasoning under uncertainty, provides some basics on graphical models, and goes towards graph model learning and methods for Monte Carlo sampling from probability distributions based on Markov chains (MCMC). MCMC is very important: it is similar to how our brain may work, it allows computing hierarchical models with a large number of unknown parameters, and it also works well for rare-event sampling, which is often needed in the health informatics domain. One such MCMC method is the Metropolis-Hastings algorithm, which obtains a sequence of random samples from high-dimensional probability distributions, a challenge we often face in the health domain. The algorithm is often counted among the top 10 most important algorithms and is named after Nicholas Metropolis (1915-1999) and Wilfred K. Hastings (1930-2016): Metropolis introduced it in 1953 and Hastings generalized it in 1970 (remember: generalization is a grand goal in science).
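To give a feeling for how such a sampler works, here is a minimal sketch of random-walk Metropolis, the special case of Metropolis-Hastings with a symmetric Gaussian proposal (so the Hastings correction term cancels). It is an illustration under simplifying assumptions, not the course's exercise code: the target is a one-dimensional standard normal, and the function names, step size and seed are arbitrary choices.

```python
import math
import random


def log_standard_normal(x):
    """Unnormalized log-density of N(0, 1); the normalizing constant is not needed."""
    return -0.5 * x * x


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D target density.

    log_target: function returning the log of the (unnormalized) target density
    step:       standard deviation of the symmetric Gaussian proposal
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        samples.append(x)  # on rejection the current state is repeated
    return samples


if __name__ == "__main__":
    chain = metropolis_hastings(log_standard_normal, x0=0.0, n_samples=20000)
    kept = chain[2000:]  # discard an initial burn-in period
    mean = sum(kept) / len(kept)
    var = sum((s - mean) ** 2 for s in kept) / len(kept)
    print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

Only ratios of target densities are used, which is why the method also applies when the normalizing constant of a posterior distribution is unknown. For high-dimensional health data the proposal distribution and the step size usually need careful tuning.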

Topic 01 Reasoning/Decision Making under uncertainty
Topic 02 Graphs > Networks
Topic 03 Examples of Knowledge Representation in Network Medicine
Topic 04 Graphical Models and Decision Making
Topic 05 Bayes’ Nets
Topic 06 Graphical Model Learning
Topic 07 Probabilistic Programming
Topic 08 Markov Chain Monte Carlo (MCMC)
Topic 09 Example: Metropolis Hastings Algorithm

Lecture Slides 2×2 (7,467 kB) 03-DAY-MAKE-Graphs-HOLZINGER-Verona-2017-2×2

For the exercises please refer to the main course pages:
https://human-centered.ai/machine-learning-for-health-informatics-course

Keywords in this lecture: Reasoning under uncertainty, graph extraction, network medicine, metrics and measures, point-cloud data sets, graphical model learning, MCMC, Metropolis-Hastings Algorithm

Reading List (in alphabetical order):

Bishop, Christopher M (2007) Pattern Recognition and Machine Learning. Heidelberg: Springer [Chapter 8: Graphical Models]
Chenney, S. & Forsyth, D. A. 2000. Sampling plausible solutions to multi-body constraint problems. Proceedings of the 27th annual conference on Computer graphics and interactive techniques. ACM. 219-228, doi:10.1145/344779.344882.
Ghahramani, Z. 2015. Probabilistic machine learning and artificial intelligence. Nature, 521, (7553), 452-459, doi:10.1038/nature14541
Gordon, A. D., Henzinger, T. A., Nori, A. V. & Rajamani, S. K. Probabilistic programming. Proceedings of the on Future of Software Engineering, 2014. ACM, 167-181, doi:10.1145/2593882.2593900
KOLLER, Daphne & FRIEDMAN, Nir (2009) Probabilistic graphical models: principles and techniques. Cambridge (MA): MIT press.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. 1953. Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics, 21, (6), 1087-1092, doi:10.1063/1.1699114. (34,123 citations as of 21.03.2017)
Wainwright, Martin J. & Jordan, Michael I. (2008) Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends in Machine Learning, Vol.1, 1-2, 1-305, doi:10.1561/2200000001
Wood, F., Van De Meent, J.-W. & Mansinghka, V. A New Approach to Probabilistic Programming Inference. AISTATS, 2014. 1024-1032.

A hot topic in ML is graph bandits:

Villar, S. S., Bowden, J. & Wason, J. 2015. Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges. Statistical Science, 199-215, doi:10.1214/14-STS504, accessible via: https://arxiv.org/abs/1507.08025

Highly recommended:

Murphy, K. P. 2012. Machine learning: a probabilistic perspective, MIT press. Chapter 26 (pp. 907) – Graphical model structure learning, https://www.cs.ubc.ca/~murphyk/MLbook/

Short Bio of Lecturer:

Andreas HOLZINGER is head of the Holzinger Group, HCI-KDD, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz; and Assoc.Prof (Univ.-Doz.) at the Faculty of Computer Science and Biomedical Engineering, Graz University of Technology, Institute of Information Systems and Computer Media. His research interests are in supporting human intelligence with machine learning to help solve complex problems in biomedical informatics and the life sciences. Andreas obtained a Ph.D. in Cognitive Science from Graz University in 1998 and his Habilitation (second Ph.D.) in Computer Science from Graz University of Technology in 2003. Andreas was Visiting Professor in Berlin, Innsbruck, London (2 times), and Aachen. He was program co-chair of the 14th IEEE International Conference on Machine Learning and Applications of the Association for Machine Learning and Applications (AMLA), and is Associate Editor of the Springer Journal Knowledge and Information Systems (KAIS), Springer Brain Informatics (BRIN), BMC Medical Informatics and Decision Making (MIDM), and founder and leader of the international expert network HCI-KDD. Andreas is a member of the IFIP WG 12.9 Computational Intelligence and co-chair of the Cross-Disciplinary IFIP CD-ARES 2016 conference, organizing a special session on privacy aware machine learning (PAML) for health data science. Since 2003 he has participated in leading positions in 30+ multi-national R&D projects (budget 4+ MEUR); 7800+ citations, h-index = 39, g-index = 166.

Video for Students: https://youtu.be/lc2hvuh0FwQ

Group Homepage: https://human-centered.ai

Personal Homepage: https://www.aholzinger.at

Additional study material and reading can be found here:
https://human-centered.ai/learning-machine-learning/
