Science Magazine, Vol. 350, Issue 6266

A proof of the importance of the human-in-the-loop

Machine learning has again made it to the title page of Science: further evidence for the importance of the human-in-the-loop, in a paper by

Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. 2015. Human-level concept learning through probabilistic program induction. Science, 350, (6266), 1332-1338.

Whilst humans can often learn new concepts from very few examples, automated machine learning (aML) methods usually need many examples (often called: big data) to perform with similar accuracy (with the attendant danger of modelling artefacts, e.g. through overfitting). The authors present a computational model which captures these human learning abilities for a large class of simple visual concepts: handwritten characters from the world’s alphabets. The model represents concepts as simple programs that best explain observed examples under a Bayesian criterion. Remarkably, on a challenging one-shot classification task, this model achieves human-level performance and outperforms recent deep learning approaches!
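The full probabilistic program induction model cannot be reproduced here, but the core one-shot idea – score a new example against a generative model fitted to a single training example per concept, and pick the concept under which it is most probable – can be sketched. The code below is a deliberately naive stand-in (pixel-wise Bernoulli likelihoods instead of the authors' stroke-level programs); all names and data are invented for illustration:

```python
import math

def log_likelihood(example, prototype, eps=0.25):
    # P(pixel = 1) is pushed toward the single prototype pixel, with
    # smoothing eps so one noisy pixel cannot veto a concept
    ll = 0.0
    for x, p in zip(example, prototype):
        prob_one = (1 - eps) if p == 1 else eps
        ll += math.log(prob_one if x == 1 else 1 - prob_one)
    return ll

def one_shot_classify(example, prototypes):
    # uniform prior over concepts -> pick the highest-likelihood prototype
    scores = {name: log_likelihood(example, proto)
              for name, proto in prototypes.items()}
    return max(scores, key=scores.get)

# one training example (binary 3x3 "image", flattened) per concept
prototypes = {
    "vertical":   [0, 1, 0, 0, 1, 0, 0, 1, 0],
    "horizontal": [0, 0, 0, 1, 1, 1, 0, 0, 0],
}
query = [0, 1, 0, 0, 1, 0, 0, 1, 1]   # noisy vertical stroke
print(one_shot_classify(query, prototypes))  # -> vertical
```

The power of the Lake et al. model comes from representing characters as compositional stroke programs with shared primitives, which is exactly what this pixel-template toy lacks.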

The authors also present several “visual Turing tests” probing the model’s creative generalization abilities, which in many cases are indistinguishable from human behavior – a must-read at: https://www.sciencemag.org/content/350/6266/1332.full

Workshop “Machine Learning for Health Informatics” November 30, 2015

Workshop

Machine Learning for Health Informatics

Machine learning is a large and rapidly developing subfield of computer science that evolved from artificial intelligence (AI) and is tightly connected with data mining and knowledge discovery. The ultimate goal of machine learning is to design and develop algorithms which can learn from data. Consequently, machine learning systems learn and improve with experience over time, and their trained models can be used to predict outcomes based on previously seen data. In fact, learning intelligent behaviour from noisy examples is one of the major questions in the field. The ability to learn from noisy, high-dimensional data is highly relevant for many applications in the health informatics domain, due to the inherent nature of biomedical data, and health will increasingly be a focus of machine learning research in the near future.

Program

https://human-centered.ai/machine-learning-for-health-informatics/

December 3, 2015 Seminar Talk on human protein interaction networks

Title: Coordination of post-translational modifications in human protein interaction network

Lecturer: Ulrich Stelzl, Network Pharmacology, Institute of Pharmaceutical Sciences, Karl-Franzens University Graz

Abstract: Comprehensive protein interaction networks are a prerequisite for a better understanding of complex genotype–phenotype relationships. Post-translational modifications (PTMs) regulate protein activity, stability and protein-protein interaction (PPI) profiles critical for cellular functioning. In combined experimental and computational approaches, we want to elucidate the role of post-translational protein modifications, such as phosphorylation, for these dynamic processes, and investigate how the large number of changing PTMs is coordinated in cellular protein networks and, likewise, how PTMs may modulate protein-protein interaction networks. We identified hundreds of protein complexes that selectively accumulate different PTMs, i.e. phosphorylation, acetylation and ubiquitination. Protein regions of very high PTM density, termed PTMi spots, were also characterized and show domain-like features. The analysis of phosphorylation-dependent interactions provides clues on how these PPIs are dynamically and spatially constrained to separate simultaneously triggered growth signals, which are often altered in oncogenic conditions. Our data indicate coordinated targeting of specific molecular functions via PTMs at different levels, emphasizing a protein network approach as requisite to better understand the impact of modifications on cellular signaling and cancer phenotypes.

Short bio: Ulrich Stelzl studied Chemistry/Biochemistry at the TU Vienna and ETH Zürich. His PhD thesis (MPIMG, Berlin) and first PostDoc (MSKCC, New York) addressed detailed biochemical questions of RNA-protein recognition, such as the assembly and dynamics of ribonucleo-protein complexes in gene expression and regulation. Then, at the MDC Berlin, Ulrich Stelzl contributed significantly to well-recognized protein-protein interaction (PPI) studies, such as the generation and analysis of the first human proteome-scale PPI networks and the development of an empirical framework for human interactome mapping. The importance of this work and its interdisciplinary character was recognized by the Erwin Schrödinger Prize 2008 of the German Helmholtz Society. From 2007 on, Ulrich Stelzl headed the Max-Planck Research Group “Molecular Interaction Networks” at the MPIMG, Berlin, and recently joined the Department of Pharmaceutical Sciences of the University of Graz.

November 9, 2015 Welcome Seminar Machine Learning for Mitochondria Research

We welcome Irina KUZNETSOVA to our group; she will do her PhD with us on machine learning for mitochondria research.

Her inaugural talk is on

Mitochondrial Interactions

Mitochondrial diseases are progressive and debilitating multi-system disorders that occur at a frequency of up to 1 in 5,000 live births and have no known cure. A variety of different complex mechanisms disrupt normal mitochondrial functions and lead to the development of mitochondrial diseases. Identification of the molecular and pathophysiological mechanisms that cause mitochondrial disease remains challenging. However, establishing mouse models of mitochondrial disease would enable the study of the onset, progression and penetrance of mitochondrial disease, as well as investigation of the tissues it specifically affects. Consequently, this will enable the development of pre-clinical models of mitochondrial disease that could be used for testing a range of treatments.

Irina did her Bachelor's in computing sciences in St. Petersburg and her Master's in Bioinformatics at the Tampere University of Technology in Finland. Currently she is working at the Mitochondrial Medicine and Biology laboratory at the University of Western Australia in Perth, where she is co-supervised by Professor Aleksandra Filipovska.

Lecture-Irina-02-11-2015-machine-learning

 

July 7, 2015 Seminar Metabolomics data types

The potential of metabolomics and its various data types

Lecturer: Natalie BORDAG,  CBmed – Center for Biomarker Research in Medicine Graz

Abstract: Metabolomics is one of the youngest -omics technologies, primarily concerned with the identification and quantification of small molecules (<1500 Da). The specific advantage of metabolomics in biomarker research lies in the concept that metabolites fall downstream of genetic, transcriptomic, proteomic, microbiomic and environmental variation, thus providing the most integrated and dynamic measure of phenotype and medical condition. Metabolomics can therefore deliver biologically highly valuable results, enabling, for example, early diagnostic biomarkers, optimization of biotechnological production, deep insights into pathological mechanisms, identification of new therapeutic targets, and much more. Metabolomics, especially MS (mass spectrometry) based metabolomics, delivers highly diverse data types along the flow from measurement towards knowledge generation, with most of their potential yet to be exploited. The biological potential for knowledge generation by metabolomics will be shown with a real-life example. The different data types and common data aggregations (e.g. peak detection, identification), transformations, statistical analyses and visualizations will be shown, and open potentials jointly discussed.
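As a hedged sketch of two of the processing steps mentioned above (transformation and statistical analysis; this is not material from the talk, and all numbers are invented): log2-transform the right-skewed raw peak intensities of one metabolite, then compare the two groups with Welch's t statistic.

```python
import math

def log2_transform(intensities):
    # metabolite peak intensities are typically right-skewed;
    # the log2 transform stabilises the variance
    return [math.log2(v) for v in intensities]

def welch_t(a, b):
    # Welch's t statistic: group comparison without assuming equal variances
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# toy feature: raw peak intensities of one metabolite in two groups
control = log2_transform([1100, 980, 1210, 1050])
disease = log2_transform([2300, 2100, 2600, 2450])
print(round(welch_t(disease, control), 2))
```

A real metabolomics analysis would of course run this per feature across thousands of peaks and correct for multiple testing.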

July 7, 2015 Seminar Feature Based Search

Visual-Interactive Search and Exploration in Complex Data Repositories
– Feature-Based Search, Applications and Research Challenges

Lecturer: Tobias SCHRECK, University of Konstanz and Graz University of Technology <link>

Abstract: Advances in data acquisition and storage technology are leading to the creation of large, complex data sets in many different domains, including science, engineering and social media. Often, this data is of non-textual / non-spatial nature. Important user tasks for leveraging large complex data sets include retrieval of relevant information, exploration for patterns and insights, and re-using data for authoring purposes. User-oriented, effective and scalable approaches are needed to support these tasks. Visual-interactive techniques in combination with automatic data analysis approaches can provide effective user interfaces for handling large, complex data sets, and help users to factor in background knowledge when solving search and analysis tasks. We will discuss approaches for visual-interactive, content-based search and analysis tasks in time-oriented and multivariate data sets, with applications in Digital Data Libraries. We will discuss how sketch- and example-based search interfaces allow users to formulate queries effectively, and how appropriate similarity functions for these data types can be defined and evaluated. We will also discuss approaches for visual-interactive search in 3D model repositories. Furthermore, we will present approaches for the repair of 3D models of deteriorated Cultural Heritage objects, relying on appropriate feature-based 3D similarity functions. We conclude this talk with a discussion of interesting research challenges at the intersection of visual data analysis, novel non-textual data types, and applications in Digital Libraries.
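A minimal sketch of the feature-based search idea, assuming each repository object has already been reduced to a fixed-length feature descriptor (the four-dimensional descriptors and object names below are invented): the query – e.g. a user sketch – is mapped into the same feature space and results are ranked by cosine similarity.

```python
import math

def cosine(a, b):
    # cosine similarity between two feature descriptors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, repository, k=2):
    # rank repository items by descriptor similarity to the query
    ranked = sorted(repository.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# hypothetical shape descriptors extracted from 3D models
repository = {
    "amphora": [0.9, 0.1, 0.4, 0.2],
    "column":  [0.1, 0.9, 0.2, 0.7],
    "vase":    [0.8, 0.2, 0.5, 0.1],
}
print(search([0.85, 0.15, 0.45, 0.15], repository))  # -> ['amphora', 'vase']
```

Choosing and evaluating the similarity function for a given data type is exactly the hard part the talk addresses; cosine distance is only one common default.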


Natalie Bordag and Tobias Schreck as guests at the Holzinger Group

Machine Learning in Nature again

LeCun, Y., Bengio, Y. & Hinton, G. 2015. Deep learning. Nature, 521, (7553), 436-444.

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
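The mechanism described – using backpropagation to indicate how internal parameters should change – can be illustrated on the smallest possible case: a single sigmoid unit trained by stochastic gradient descent (a toy sketch, not the deep multi-layer architectures the review covers).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# smallest possible "network": one sigmoid unit learning logical AND
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = [0.0, 0.0], 0.0
lr = 0.5
for _ in range(5000):
    for x, t in data:
        y = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        # backpropagation step: for cross-entropy loss with a sigmoid
        # output, the chain rule reduces to dLoss/dw_i = (y - t) * x_i
        err = y - t
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
        b    -= lr * err
print([round(sigmoid(w[0] * x[0] + w[1] * x[1] + b)) for x, _ in data])  # -> [0, 0, 0, 1]
```

In a deep network the same chain rule is applied layer by layer, propagating the error signal from the output back through each internal representation.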

More information: https://www.nature.com/nature/journal/v521/n7553/full/nature14539.html

Nature Issue 7553 contains a special section on computational intelligence!

https://www.nature.com/nature/current_issue.html

 

 

June 23, 2015 Seminar Talk Machine Learning

Title: Towards Knowledge Discovery with the human in the machine learning loop: An Ontology-Guided Meta-Classifying Approach for the Biomedical Domain

Lecturer: Dominic GIRARDI, RISC-Software Linz, Austria <expertise>

Abstract: The process of knowledge discovery in clinical research differs significantly from that in other business domains, for example market research. While in general definitions of knowledge discovery the domain expert has a rather consulting, supervising or customer-like role, the complex process of (bio-)medical or clinical knowledge discovery requires the medical domain expert to be deeply involved in the process. At the same time, data integration and data pre-processing are known to be major pitfalls in such (bio-)medical data projects, because in the (bio-)medical domain we are confronted with extremely high complexity and heterogeneity, along with unprecedented amounts of data. This lecture discusses the consequences for the knowledge discovery process when the domain expert is moved to a central position in this process, and, as a consequence, how advanced machine learning algorithms can be combined with traditional, ontology-centered approaches for the benefit of advancing (bio-)medical research. Examples are given from different medical research projects, e.g. clinical benchmarking, cerebral aneurysm research and a biometric study of children and young adults.
The theoretical focus of this talk is on how the elaborate structural meta-information of the domain ontology can be used to parametrize and automate advanced machine learning algorithms and data visualization methods. Two examples will be presented: an ontology-guided dimensionality reduction focusing on hierarchically structured, multi-select categorical variables, and an ontology-guided meta-classifier.
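A hypothetical sketch (not from the talk; the mini-ontology below is invented) of how structural meta-information can guide dimensionality reduction of hierarchically structured categorical variables: fine-grained values are rolled up to ancestor concepts, collapsing many sparse categories into a few dense ones.

```python
# hypothetical mini-ontology: child concept -> parent concept
ontology = {
    "type 1 diabetes": "diabetes",
    "type 2 diabetes": "diabetes",
    "diabetes": "endocrine disorder",
    "hypothyroidism": "endocrine disorder",
}

def roll_up(concept, levels=1):
    # ontology-guided dimensionality reduction: replace a fine-grained
    # category by its ancestor, 'levels' steps up the hierarchy
    for _ in range(levels):
        concept = ontology.get(concept, concept)
    return concept

records = ["type 1 diabetes", "type 2 diabetes", "hypothyroidism"]
print([roll_up(r) for r in records])
print([roll_up(r, levels=2) for r in records])  # -> all 'endocrine disorder'
```

The ontology itself decides how aggressively to reduce: the deeper the cut in the hierarchy, the fewer (but coarser) the resulting categories a downstream classifier has to handle.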

Dominic@Holzinger-Group

May 19, 2015 Seminar Talks Machine Learning

Title: Towards Personalization of Diabetes Therapy Using Computerized Decision Support and Machine Learning

Lecturer: Klaus DONSA <expertise> and Stephan SPAT <expertise>

Abstract: Diabetes mellitus (DM) is a growing global disease which strongly affects the individual patient and represents a global health burden with financial impact on national health care systems. Therapeutic options include lifestyle changes, such as a change of diet and an increase in physical activity, but also administration of oral or injectable antidiabetic drugs. Diabetes therapy, especially with insulin, is complex: therapy decisions involve a variety of medical and lifestyle-related information. Computerized decision support systems (CDSS) aim to improve the treatment process in the patient's self-management as well as in institutional care. Personalization of the patient's diabetes treatment is thus possible at different levels and is further facilitated by new therapy aids such as food and activity recognition systems, lifestyle support tools and pattern recognition for insulin therapy optimization. In this talk we discuss the role of machine learning in this context. Furthermore, we provide insights into different strategies to personalize diabetes therapy and show how CDSS can support the therapy process. During our work we identified open problems and challenges for the personalization of diabetes therapy. In a final discussion we will address these open problems, with focus on decision support systems and especially machine learning technology.

April 14, 2015 Seminar Talks Deep Learning

Title:  Using Deep Learning for Discovering Knowledge from Images: Pitfalls and Best Practices

Lecturer: Marcus BLOICE <expertise>

Abstract: Neural networks have been shown to be adept at image analysis and image classification. Deep layered neural networks especially so. However, deep learning requires two things in order to work proficiently: large amounts of data and lots of processing power. In this talk both aspects are covered, allowing you to maximise the potential of deep learning. Firstly, we will learn how the computational power of GPUs can be used to speed up learning by orders of magnitude, making it possible to learn from very large datasets on commodity hardware. Thanks to software such as Theano, Caffe, and Pylearn2, the GPU can be leveraged without needing to be an expert in parallel programming. This talk will discuss how. Secondly, data preprocessing, data augmentation, and artificial data generation are discussed. These methods allow you to ensure you are making the most of the data you possess, by expanding your dataset and preparing your data properly before analysis. This means discussing best practices in data preparation, using methods such as histogram equalisation, contrast stretching or normalisation, and discussing artificial data generation in detail. The tools you require to do so are described, using multi-platform software that is freely available. Finally, the talk will touch on hyper-parameters and the best practices and pitfalls of hyper-parameter choice when training deep neural networks.
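Two of the preprocessing/augmentation steps mentioned can be sketched without any of the named libraries (a toy illustration on a flattened 3x3 grayscale patch; real pipelines would apply the same ideas to full images): contrast stretching and a horizontal-flip augmentation.

```python
def contrast_stretch(pixels, lo=0, hi=255):
    # linearly rescale intensities so the darkest pixel maps to lo and
    # the brightest to hi -- a common preprocessing step before training
    pmin, pmax = min(pixels), max(pixels)
    scale = (hi - lo) / (pmax - pmin)
    return [round(lo + (p - pmin) * scale) for p in pixels]

def mirror(image, width):
    # simple data augmentation: horizontal flip of a flattened grayscale image
    rows = [image[i:i + width] for i in range(0, len(image), width)]
    return [p for row in rows for p in reversed(row)]

img = [52, 55, 61, 59, 79, 61, 76, 61, 110]   # 3x3 grayscale patch
print(contrast_stretch(img))
print(mirror(img, 3))
```

Flips, crops and small rotations cheaply multiply the effective size of a labelled image dataset, which is exactly the "artificial data generation" lever discussed in the talk.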

Title: Pitfalls for applying Machine Learning in HCI-KDD: Things to be aware of and how to avoid them

Lecturer: Christof STOCKER <expertise>

Abstract: When dealing with big and unstructured data sets, we often try to be creative and experiment with a number of different approaches for the purpose of knowledge discovery. This can lead to new insights and even spark novel ideas. However, ignorant application of algorithms to unknown data is dangerous and can lead to false conclusions – with high statistical significance. In finite data sets, structure can emerge from sheer randomness. Furthermore, hidden variables can lead to significant correlations that in turn might result in wrong conclusions. Beyond this, data science as a discipline has developed into a complex area in which mistakes can occur with ease and even lead experienced scientists astray. In this talk we will investigate these pitfalls together using simple examples and discuss how we can address these concerns with manageable effort.
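The point that structure can emerge from sheer randomness in finite data sets is easy to demonstrate (a self-contained sketch, not an example from the talk): with many variables and few samples, some pair will correlate strongly by pure chance.

```python
import math
import random

def pearson(a, b):
    # Pearson correlation coefficient between two equal-length samples
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

random.seed(1)
# 200 completely random "biomarkers" measured on only 10 "patients"
variables = [[random.gauss(0, 1) for _ in range(10)] for _ in range(200)]

# scan all ~20,000 variable pairs for the strongest correlation
best = max(
    (abs(pearson(variables[i], variables[j])), i, j)
    for i in range(200) for j in range(i + 1, 200)
)
print(f"strongest spurious |r| = {best[0]:.2f}")
```

Any honest analysis would have to correct for the roughly 20,000 comparisons performed here; reported in isolation, the winning pair would look like a highly significant finding despite being pure noise.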