Posts

Google Brain says Explainability is the “new deep learning”

There is a very interesting interview in the Talking Machines*) series from May 31, 2018. Katherine GORMAN interviews Maithra RAGHU **) from the Google Brain team, who mentions that “explainability is the new deep learning”. This is particularly important for health informatics, where it is essential to re-trace, re-enact, understand and explain why a machine decision has been reached. This is great for us: when I tell my students that this is important, nobody believes me; but now I can emphasize that it is not I who is saying it, but Google Brain. Excellent.

However, the whole field needs a lot of work before we can provide usable solutions for the end-user in daily routine (e.g. a medical doctor); urgently needed are approaches to explainable user interfaces and, most of all, a research framework for testing explainability.

*) Talking Machines is an excellent, highly recommendable podcast series, founded by Katherine GORMAN and Ryan ADAMS in 2015 and now run by Katherine together with Neil LAWRENCE (who leads Amazon Research in Cambridge, UK).

**) Maithra RAGHU is currently a PhD student working with Jon KLEINBERG at Cornell (see https://maithraraghu.com ), where she is doing extended research with the Google Brain Team, see: https://ai.google/research/teams/brain
Maithra has published some very interesting papers, e.g.: Maithra Raghu, Justin Gilmer, Jason Yosinski & Jascha Sohl-Dickstein. SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability. Advances in Neural Information Processing Systems, 2017. 6078-6087.
Another very interesting paper is:
Ben Poole, Subhaneil Lahiri, Maithra Raghu, Jascha Sohl-Dickstein & Surya Ganguli. Exponential expressivity in deep neural networks through transient chaos. Advances in neural information processing systems, 2016. 3360-3368.

Google Brain says we urgently need a Research Framework around the field of interpretability

In a recent interview Been KIM from the Google Brain team emphasized the significance of research in explainable AI. In particular, she stressed the importance of Human-Computer Interaction (HCI) for Artificial Intelligence generally and Machine Learning specifically (see the differences between AI and ML here), and the urgent need for a research framework around the field of interpretability. Listen to episode six of season four of Talking Machines by Katherine GORMAN and Neil LAWRENCE here (start at approx. 26:00): https://www.thetalkingmachines.com/episodes/explainability-and-inexplicable

Been KIM is a research scientist at the Google Brain team and is interested in designing machine learning methods that make sense to humans. Her current focus is building interpretability methods for already-trained models (e.g., high-performance neural networks). In particular, she believes that the language of explanations should include higher-level, human-friendly concepts. Been gave a tutorial on explainable AI at ICML 2017, and recently the group published the paper: Menaka Narayanan, Emily Chen, Jeffrey He, Been Kim, Sam Gershman & Finale Doshi-Velez 2018. How do Humans Understand Explanations from Machine Learning Systems? An Evaluation of the Human-Interpretability of Explanation. arXiv:1802.00682.
https://people.csail.mit.edu/beenkim

“Machine Learning for Health Informatics” Lecture Notes in Artificial Intelligence 9605: more than 40,626 downloads in 2017

Since its online publication on December 10, 2016, the volume “Machine Learning for Health Informatics”, edited by Andreas Holzinger (Springer Lecture Notes in Artificial Intelligence, LNAI Volume 9605), has been downloaded 54,960 times as of today (May 11, 2018, 20:00 CEST), and 44,988 times with status as of April 2018 according to the official Springer Bookmetrix book performance report – a record. In the year 2017 alone there were 40,626 downloads, which is 10 times higher than for a typical volume of the Springer/Nature Lecture Notes in Artificial Intelligence series. A cordial thank you to my international colleagues for this huge acceptance!

https://www.springer.com/978-3-319-50478-0

https://www.springer.com/gp/book/9783319504773

A popular passage from the book:
https://books.google.com/talktobooks/query?q=What%E2%80%99s%20the%20difference%20between%20Machine%20Learning%20and%20deep%20learning%3F

Update on 15th September 2018: 63k downloads

 

AI will change Radiology – NOT replace Radiologists

After the rather shocking statement by Geoffrey HINTON at the Machine Learning and Market for Intelligence Conference in Toronto, where he recommended that hospitals should stop training radiologists because deep learning will replace them (watch the video below), Thomas H. DAVENPORT and Keith J. DREYER published on March 27, 2018 a really nice article on “AI will change radiology, but it won’t replace radiologists” (see [1]) – which supports our human-in-the-loop approach: for sure, AI/machine learning (difference here) will change workflows, but we envision that the expert will be augmented by new technologies, i.e. routine (boring) tasks will be taken over by automatic algorithms, which will free up expert time to spend on challenging (cool) tasks and more research – and there are plenty of problems where we need human intelligence!

[1] https://hbr.org/2018/03/ai-will-change-radiology-but-it-wont-replace-radiologists

 

 

On-Device Machine Intelligence

A very interesting approach to federated machine learning is presented by Sujith RAVI from Google: machine learning models (e.g. CNNs) are successfully used for the design of intelligent systems capable of visual recognition as well as speech and language understanding. Most of these run in a cloud – where it is often unclear where they are physically running. A huge problem so far is that typical machine learning models are awkward to use on mobile devices due to both computational and memory constraints. While these devices could make use of models running in high-performance data centers with CPUs or GPUs, this is not feasible for many applications and scenarios where inference needs to be performed directly “on” the device. This requires re-thinking existing machine learning algorithms and coming up with new models that are directly optimized for on-device machine intelligence rather than doing post-hoc model compression. Sujith Ravi introduces a novel “projection-based” machine learning system for training compact neural networks. The approach uses a joint optimization framework to simultaneously train a “full” deep network and a lightweight “projection” network. Unlike the full deep network, the projection network uses random projection operations that are efficient to compute and operates in bit space, yielding a low memory footprint. The system is trained end-to-end using backpropagation. Ravi shows that the approach is flexible and easily extensible to other machine learning paradigms; for example, they can learn graph-based projection models using label propagation. The trained “projection” models are then directly used for inference – please watch the original video. A small sketch of the joint training idea is given below.
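To make the joint “full + projection” training idea a bit more concrete, here is a minimal sketch in Python (PyTorch). It is only an illustration under simplifying assumptions: the layer sizes, the fixed sign-of-random-projection features and the loss weighting are ours, not the original ProjectionNet implementation.

```python
# Minimal sketch of jointly training a "full" network and a lightweight
# "projection" network. All sizes and loss weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullNet(nn.Module):                       # the "full" (trainer) network
    def __init__(self, d_in, n_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(),
                                 nn.Linear(256, n_classes))
    def forward(self, x):
        return self.net(x)

class ProjectionNet(nn.Module):
    """Tiny network operating on binary random projections of the input."""
    def __init__(self, d_in, n_bits, n_classes):
        super().__init__()
        self.register_buffer("R", torch.randn(d_in, n_bits))  # fixed, not trained
        self.out = nn.Linear(n_bits, n_classes)               # small trainable part
    def forward(self, x):
        bits = (x @ self.R > 0).float()          # sign of random projections -> bit space
        return self.out(bits)

def joint_loss(logits_full, logits_proj, y, lam=1.0, mu=1.0):
    # Supervised loss for both networks plus a distillation term that pushes
    # the projection network towards the full network's predictions.
    ce_full = F.cross_entropy(logits_full, y)
    ce_proj = F.cross_entropy(logits_proj, y)
    distill = F.kl_div(F.log_softmax(logits_proj, dim=1),
                       F.softmax(logits_full, dim=1).detach(),
                       reduction="batchmean")
    return ce_full + lam * ce_proj + mu * distill

# Toy training step on random data
d_in, n_classes = 32, 4
full, proj = FullNet(d_in, n_classes), ProjectionNet(d_in, 64, n_classes)
opt = torch.optim.Adam(list(full.parameters()) + list(proj.parameters()), lr=1e-3)
x, y = torch.randn(128, d_in), torch.randint(0, n_classes, (128,))
loss = joint_loss(full(x), proj(x), y)
opt.zero_grad(); loss.backward(); opt.step()
# At inference time only `proj` is kept for the device; `full` can be discarded.
```

The design point this illustrates is that only the projection network needs to fit on the device, while the full network is used only during training as a teacher.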

 

Prefetching – Predicting what will be most likely needed next

A very interesting paper has just been published about prefetching, which lends itself nicely to a machine learning solution: predicting which information will most likely be needed next, so that it can be prepared in advance:

Milad Hashemi, Kevin Swersky, Jamie A Smith, Grant Ayers, Heiner Litz, Jichuan Chang, Christos Kozyrakis & Parthasarathy Ranganathan 2018. Learning Memory Access Patterns. arXiv preprint arXiv:1803.02329.

Prefetching is the process of predicting, based on past history, future memory accesses that will miss in the on-chip cache and access memory. Each of these memory addresses is generated by a memory instruction (a load/store). Memory instructions are a subset of all instructions that interact with the addressable memory of the computer system. The paper treats this as a sequence prediction problem over memory access patterns; a toy sketch of this idea follows below.
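As a hedged illustration of treating prefetching as sequence prediction, here is a small Python (PyTorch) sketch: an LSTM sees a history of memory-access deltas and predicts the next delta out of a fixed vocabulary of frequent deltas. The vocabulary size, layer sizes and toy data are our assumptions, not the setup of the paper.

```python
# Toy sketch: next-delta prediction for prefetching as a sequence model.
import torch
import torch.nn as nn

class DeltaPrefetcher(nn.Module):
    def __init__(self, n_deltas=1000, d_embed=64, d_hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_deltas, d_embed)   # one id per frequent address delta
        self.lstm = nn.LSTM(d_embed, d_hidden, batch_first=True)
        self.head = nn.Linear(d_hidden, n_deltas)      # classifier over candidate deltas
    def forward(self, delta_ids):                      # (batch, seq_len) of delta ids
        h, _ = self.lstm(self.embed(delta_ids))
        return self.head(h[:, -1])                     # logits for the next delta

model = DeltaPrefetcher()
history = torch.randint(0, 1000, (8, 16))              # toy batch of delta histories
next_delta_logits = model(history)
predicted = next_delta_logits.argmax(dim=1)            # prefetch = last address + predicted delta
```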

 

There is a nice article in the MIT Technology Review by Will Knight from March 8, 2018 on the similarity to how humans improve their behaviour with age – a very nice read:

https://www.technologyreview.com/s/610453/your-next-computer-could-improve-with-age/?set=

Python in Machine Learning still No. 1 and increasing

There is of course no such thing as a ‘best language for machine learning’ – but as a matter of fact, Python is still No. 1 and its use is increasing:
Image Source: https://stackoverflow.blog/2017/09/06/incredible-growth-python/

We use Python in all our courses because it is an “industry standard” and widely available. I would love, e.g., Julia, which is much faster, but it remains rather academic and requires a lot of additional effort. It is not astonishing that Python is the world’s most popular tool for machine learning and artificial intelligence, as a rich set of libraries and frameworks is available, including TensorFlow, pandas, NumPy, PyBrain, scikit-learn, SimpleAI, EasyAI, etc.

Consequently, in our courses we teach Python – have a look at:

Marcus D. Bloice & Andreas Holzinger 2016. A Tutorial on Machine Learning and Data Science Tools with Python. In: Holzinger, Andreas (ed.) Machine Learning for Health Informatics, Lecture Notes in Artificial Intelligence LNAI 9605. Heidelberg: Springer, pp. 437-483, doi:10.1007/978-3-319-50478-0_22. [link to paper]
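To give a flavour of the kind of workflow covered in the tutorial chapter above, here is a tiny, self-contained Python example using scikit-learn. The dataset and model choice are just illustrative, not taken from the chapter.

```python
# Minimal scikit-learn workflow: load data, split, train, evaluate.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)          # small built-in health dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))  # held-out accuracy
```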

NIPS-2017 Best paper “Explainability was one of the major reasons the paper was given the award”

Congratulations to Arthur GRETTON from the Gatsby Computational Neuroscience Unit at University College London and his team. Their paper “A Linear-Time Kernel Goodness-of-Fit Test”, authored by Wittawat JITKRITTUM, Wenkai XU, Zoltan SZABO, Kenji FUKUMIZU and Arthur GRETTON, won the prestigious NIPS 2017 best paper award. In the interview by Sam Charrington from TWiML&AI, the authors said at 14:10 in the following video that “… explainability was one of the reasons that the paper was given the award …”; listen here:

Here is the original talk:

Algorithms

Live from NIPS 2017, presentations from the Algorithms session:
• A Linear-Time Kernel Goodness-of-Fit Test
• Generalization Properties of Learning with Random Features
• Communication-Efficient Distributed Learning of Discrete Distributions
• Optimistic posterior sampling for reinforcement learning: worst-case regret bounds
• Regret Analysis for Continuous Dueling Bandit
• Minimal Exploration in Structured Stochastic Bandits
• Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe
• Diving into the shallows: a computational perspective on large-scale shallow learning
• Monte-Carlo Tree Search by Best Arm Identification
• A framework for Multi-A(rmed)/B(andit) Testing with Online FDR Control
• Parameter-Free Online Learning via Model Selection
• Bregman Divergence for Stochastic Variance Reduction: Saddle-Point and Adversarial Prediction
• Gaussian Quadrature for Kernel Features
• Learning Linear Dynamical Systems via Spectral Filtering

Posted by Neural Information Processing Systems on Tuesday, December 5, 2017

 

https://papers.nips.cc/paper/6630-a-linear-time-kernel-goodness-of-fit-test

In their paper the authors propose a novel adaptive test of goodness-of-fit, with computational cost linear in the number of samples. They learn the test features that best indicate the differences between the observed samples and a reference model by minimizing the false negative rate. These features are constructed via Stein’s method, i.e. it is not necessary to compute the normalising constant of the model. They further analyse the asymptotic Bahadur efficiency of the new test and prove that, under a mean-shift alternative, the test always has greater relative efficiency than a previous linear-time kernel test, regardless of the choice of parameters for that particular test. In experiments, the performance of their method exceeds that of the earlier linear-time test, and matches or exceeds the power of a quadratic-time kernel test. In high dimensions and where model structure may be exploited, this new goodness-of-fit test performs far better than a quadratic-time two-sample test based on the Maximum Mean Discrepancy with samples drawn from the model.
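To illustrate the key property exploited here – a Stein-based test only needs the model’s score function (the gradient of the log density), so the normalising constant is never required – here is a small NumPy sketch. Note that this is the basic quadratic-time kernel Stein discrepancy, not the linear-time FSSD test of the paper; it only demonstrates the underlying idea, and the kernel bandwidth and toy data are our assumptions.

```python
# Quadratic-time kernel Stein discrepancy (V-statistic) with an RBF kernel.
import numpy as np

def ksd_vstat(X, score, sigma=1.0):
    """Estimate the squared kernel Stein discrepancy of samples X under a model
    given only by its score function s(x) = grad_x log p(x)."""
    n, d = X.shape
    S = score(X)                                   # (n, d) model scores at the samples
    diffs = X[:, None, :] - X[None, :, :]          # (n, n, d) pairwise differences
    sqd = np.sum(diffs ** 2, axis=2)               # squared pairwise distances
    K = np.exp(-sqd / (2 * sigma ** 2))            # k(x, y) = exp(-||x-y||^2 / (2 sigma^2))
    term1 = (S @ S.T) * K                                        # s(x)^T s(y) k(x,y)
    term2 = np.einsum('id,ijd->ij', S, diffs) * K / sigma ** 2   # s(x)^T grad_y k(x,y)
    term3 = np.einsum('jd,ijd->ij', S, -diffs) * K / sigma ** 2  # s(y)^T grad_x k(x,y)
    term4 = K * (d / sigma ** 2 - sqd / sigma ** 4)              # trace of grad_x grad_y k
    return np.mean(term1 + term2 + term3 + term4)

# Example: test samples against a standard Gaussian model, whose score is -x
# (no normalising constant needed anywhere).
rng = np.random.default_rng(0)
gauss_score = lambda X: -X
X_good = rng.normal(size=(500, 2))                 # drawn from the model: small discrepancy
X_bad = rng.normal(loc=0.5, size=(500, 2))         # mean-shifted alternative: larger discrepancy
print(ksd_vstat(X_good, gauss_score), ksd_vstat(X_bad, gauss_score))
```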

The original paper can be downloaded via the NIPS pages:
https://nips.cc/Conferences/2017/Schedule?showEvent=8823

The paper is also available at arXiv:

Jitkrittum, W., Xu, W., Szabo, Z., Fukumizu, K. & Gretton, A. 2017. A Linear-Time Kernel Goodness-of-Fit Test. arXiv preprint arXiv:1705.07673.

 

What is the difference between AI/ML/DL?

Digital Pathology: World’s fastest WSI scanner is now working in Graz

October 26, 2017. Today, Prof. Kurt ZATLOUKAL and his group, together with us and the digital pathology team of 3DHISTECH, our industrial partner, completed the installation of the new generation Pannoramic P1000 scanner [0]. The world’s fastest whole slide imaging (WSI) scanner is now located in Graz. The new scanner outperforms current state-of-the-art systems by a factor of 6, which provides enormous opportunities for our machine learning/AI MAKEpatho project.

Digital Pathology and Artificial Intelligence/Machine Learning

Digital pathology [1] is not just the transformation of the classical microscopic analysis of histopathological slides by pathologists into a digital visualization. Digital pathology is an innovation that will dramatically change medical workflows in the coming years. At its center is Whole Slide Imaging (WSI), but the true added value will result from the combination of heterogeneous data sources. This will generate a new kind of information not available today. Much information is hidden in arbitrarily high-dimensional spaces and not accessible to a human pathologist. Consequently, we need novel approaches from artificial intelligence (AI) and machine learning (ML) (see definition) to exploit the full possibilities of digital pathology [2]. The goal is to gain knowledge from this information, which is not yet available and not exploited to date [3].

Digital Pathology opportunities

Major changes enabled by digital pathology include the improvement of medical decision making with new human-AI interfaces, new opportunities for education and research, and the globalization of diagnostic services. The latter allows bringing top-level expertise to essentially any patient in the world via the Internet/Web. This will also generate totally new business models for worldwide diagnostic services. Furthermore, by using AI/ML we can make new information in images accessible and quantifiable (e.g. through geometrical approaches and machine learning), which is not yet available in current diagnostics. Another effect will be that digital pathology and machine learning will change education and training systems, an urgently needed solution to address the global shortage of medical specialists. While digitalization is called Pathology 2.0 [4], we envision a Pathology 4.0 – and here explainable AI will become important.

3DHISTECH

3DHISTECH Ltd. (the name is derived from “Three-dimensional Histological Technologies”) is a leading company that has been developing high-performance hardware and software products for digital pathology since 1996. As the first European manufacturer, 3DHISTECH is one of the market leaders in the world with more than 1,500 systems sold. Founded by Dr. Bela MOLNAR from Semmelweis University Budapest, they are pioneers in this field and develop high-speed digital slide scanners that create high-quality brightfield and fluorescent digital slides, digital histology software and tissue microarray machinery. 3DHISTECH’s aim is to fully digitalize the traditional pathology workflow so that it can adapt to the ever-growing demands of healthcare today. Furthermore, educational programs are organized to help pathologists learn and master these new technologies more easily.

[0] P1000 https://www.youtube.com/watch?v=WuCXkTpy5js (1:41 min)

[1]  Shaimaa Al‐Janabi, Andre Huisman & Paul J. Van Diest (2012). Digital pathology: current status and future perspectives. Histopathology, 61, (1), 1-9, doi:10.1111/j.1365-2559.2011.03814.x.

[2] Anant Madabhushi & George Lee (2016). Image analysis and machine learning in digital pathology: Challenges and opportunities. Medical Image Analysis, 33, 170-175, doi:10.1016/j.media.2016.06.037.

[3]  Andreas Holzinger, Bernd Malle, Peter Kieseberg, Peter M. Roth, Heimo Müller, Robert Reihs & Kurt Zatloukal (2017). Machine Learning and Knowledge Extraction in Digital Pathology needs an integrative approach. In: Springer Lecture Notes in Artificial Intelligence Volume LNAI 10344. Cham: Springer International, pp. 13-50. 10.1007/978-3-319-69775-8_2  [pdf-preprint available here]

[4]  Nikolas Stathonikos, Mitko Veta, André Huisman & Paul J Van Diest (2013). Going fully digital: Perspective of a Dutch academic pathology lab. Journal of pathology informatics, 4. doi:  10.4103/2153-3539.114206

[5] Francesca Demichelis, Mattia Barbareschi, P Dalla Palma & S Forti 2002. The virtual case: a new method to completely digitize cytological and histological slides. Virchows Archiv, 441, (2), 159-164. https://doi.org/10.1007/s00428-001-0561-1

[6] Marcus Bloice, Klaus-Martin Simonic & Andreas Holzinger 2013. On the usage of health records for the design of virtual patients: a systematic review. BMC Medical Informatics and Decision Making, 13, (1), 103, doi:10.1186/1472-6947-13-103.

[7] https://www.3dhistech.com

[8]  https://pathologie.medunigraz.at/forschung/forschungslabor-fuer-experimentelle-zellforschung-und-onkologie

Mini Glossary:

Digital Pathology = not only the conversion of histopathological slides into a digital image (WSI) that can be uploaded to a computer for storage and viewing, but a completely new medical work procedure (from Pathology 2.0 to Pathology 4.0) – its basis is Virtual Microscopy.

Explainability = motivated by the lacking transparency of black-box approaches, which do not foster trust in and acceptance of AI generally and ML specifically among end-users. Rising legal and privacy concerns, e.g. with the new European General Data Protection Regulation (which comes into effect in May 2018), will make black-box approaches difficult to use, because they often cannot explain why a decision has been made (see explainable AI).

Explainable AI = rising legal and ethical concerns make it mandatory to enable a human to understand why a machine decision has been made, i.e. to make machine decisions re-traceable and to explain why a decision was reached [see Wikipedia on Explainable Artificial Intelligence]. (Note: this does not mean that it is always necessary to explain everything in full detail – but to be able to explain it if necessary, e.g. for general understanding, for teaching, for learning, for research – or in court!)

Machine Aided Pathology = the management, discovery and extraction of knowledge from a virtual case, driven by advances in digital pathology and supported by feature detection and classification algorithms.

Virtual Case = the set of all histopathological slides of a case together with metadata from the macroscopic pathological diagnosis [5]

Virtual Microscopy = not only the viewing of slides on a computer screen over a network; it can be enhanced by supporting the pathologist with optical resolution and magnification equivalent to a microscope while changing the magnification; machine learning and AI methods can help to extract new knowledge out of the image data.

Virtual Patient = has very different definitions (see [6]); we define it as a model of electronic records (images, reports, *omics) for studying e.g. diseases.

WSI = Whole Slide Image, a.k.a. virtual slide, a digitized histopathology glass slide that has been created on a slide scanner and represents a high-resolution volume data cube, which can be handled via a virtual microscope and, most of all, where methods from artificial intelligence generally, and interactive machine learning specifically, together with methods from topological data analysis, can make information accessible to a human pathologist that would otherwise remain hidden.

WSS = Whole Slide Scanner, the machinery for acquiring a WSI, including the hardware and the software for creating the WSI.