CD-MAKE machine learning and knowledge extraction

Cross Domain Conference for Machine Learning & Knowledge Extraction

cd-make.net

Call for Papers – due May 15, 2017

https://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=61244&copyownerid=17803

International IFIP Cross Domain Conference for Machine Learning & Knowledge Extraction CD-MAKE
in Reggio di Calabria (Italy) August 29 – September 1, 2017

https://cd-make.net

CD stands for Cross-Domain and means the integration and appraisal of different fields and application domains (e.g. health, Industry 4.0) to provide an atmosphere that fosters different perspectives and opinions. The conference is dedicated to offering an international platform for novel ideas and a fresh look at the methodologies needed to put crazy ideas into business for the benefit of humans. Serendipity is a desired effect and should cross-fertilize methodologies and the transfer of algorithmic developments.

MAKE stands for MAchine Learning & Knowledge Extraction.

CD-MAKE is a joint effort of IFIP TC 5, IFIP WG 8.4, IFIP WG 8.9 and IFIP WG 12.9 and is held in conjunction with the International Conference on Availability, Reliability and Security (ARES).
Keynote Speakers are Neil D. LAWRENCE (Amazon) and Marta MILO (University of Sheffield).

IFIP is the International Federation for Information Processing, the leading multi-national, non-governmental, apolitical organization in Information & Communications Technologies and Computer Sciences. It is recognized by the United Nations and was established in 1960 under the auspices of UNESCO as an outcome of the first World Computer Congress, held in Paris in 1959.

Papers are sought from the following seven topical areas. Papers that deal with fundamental questions and theoretical aspects of machine learning are very welcome.

❶ Data science (data fusion, preprocessing, data mapping, knowledge representation),
❷ Machine learning (both automatic ML and interactive ML with the human-in-the-loop),
❸ Graphs/network science (i.e. graph-based data mining),
❹ Topological data analysis (i.e. topology data mining),
❺ Time/entropy (i.e. entropy-based data mining),
❻ Data visualization (i.e. visual analytics), and last but not least
❼ Privacy, data protection, safety and security (i.e. privacy aware machine learning).

Proposals for Workshops, Special Sessions, Tutorials: April 19, 2017
Submission Deadline: May 15, 2017
Author Notification: June 14, 2017
Camera Ready Deadline: July 7, 2017

 

 https://cd-make.net/call-for-papers

 

Stan: A probabilistic programming language

A paper that the Stan developers (https://mc-stan.org/) submitted a long time ago has finally appeared in the Journal of Statistical Software (https://www.jstatsoft.org):

Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M. A., Guo, J., Li, P. & Riddell, A. 2017. Stan: A probabilistic programming language. Journal of Statistical Software, 76, (1), 1-32, doi:10.18637/jss.v076.i01

The Python package can also be downloaded from the site!

Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.
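
To give a feeling for what "a log probability function plus its gradient" buys you, here is a minimal sketch of plain Hamiltonian Monte Carlo in pure Python. It is our own illustration and not Stan code: it uses a fixed step size and a fixed number of leapfrog steps (so it is not the adaptive No-U-Turn sampler), the target is an assumed one-dimensional standard normal, and all function names and default values are hypothetical.

```python
import math
import random


def log_density(theta):
    """Unnormalized log density of a standard normal (assumed toy target)."""
    return -0.5 * theta * theta


def grad_log_density(theta):
    """Gradient of log_density with respect to theta."""
    return -theta


def hmc_step(theta, step_size, n_leapfrog, rng):
    """One HMC transition: draw a momentum, run leapfrog, accept or reject."""
    momentum = rng.gauss(0.0, 1.0)
    # Hamiltonian = potential energy (-log density) + kinetic energy.
    h_start = -log_density(theta) + 0.5 * momentum * momentum

    q, p = theta, momentum
    p += 0.5 * step_size * grad_log_density(q)        # half step for momentum
    for i in range(n_leapfrog):
        q += step_size * p                            # full step for position
        if i < n_leapfrog - 1:
            p += step_size * grad_log_density(q)      # full step for momentum
    p += 0.5 * step_size * grad_log_density(q)        # final half step

    h_end = -log_density(q) + 0.5 * p * p
    # Metropolis correction for the discretization error of the integrator.
    accept_prob = math.exp(min(0.0, h_start - h_end))
    if rng.random() < accept_prob:
        return q, accept_prob
    return theta, accept_prob


def sample(n_draws, seed=0, step_size=0.2, n_leapfrog=8):
    """Run a chain from theta = 0 and return the list of draws."""
    rng = random.Random(seed)
    theta = 0.0
    draws = []
    for _ in range(n_draws):
        theta, _ = hmc_step(theta, step_size, n_leapfrog, rng)
        draws.append(theta)
    return draws


draws = sample(2000)
mean = sum(draws) / len(draws)
print(f"estimated mean of the target: {mean:.3f}")
```

The acceptance probability computed in `hmc_step` is the kind of intermediate quantity the abstract mentions. In Stan itself you only write the model; the log density, its gradient and the tuning of step size and trajectory length are handled for you.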

Congratulations from the Holzinger Group to the authors!

Machine Learning Podcast: Data Skeptic (recommended)

Data Skeptic is a weekly podcast that is skeptical of and with data. They explain methods and algorithms that power our world in an accessible manner through short mini-episode discussions and longer interviews with experts in the field, see:

https://dataskeptic.com

 

Call for Papers – Privacy Aware Machine Learning (PAML), due April 1, 2017

Privacy Aware Machine Learning (PAML)
for Health Data Science

Special Session on September 1, 2017, organized by Andreas HOLZINGER, Peter KIESEBERG, Edgar WEIPPL and A Min TJOA in the context of the 12th International Conference on Availability, Reliability and Security (ARES and CD-ARES), Reggio di Calabria, Italy, August 29 – September 2, 2017

Session Homepage

Supported by the International Federation for Information Processing (IFIP): TC 5, WG 8.4 and WG 8.9
https://cd-ares-conference.eu
https://www.ares-conference.eu

Keynote Talk by Neil D. LAWRENCE, University of Sheffield and Amazon

With the new European data protection and privacy regulations coming into effect on January 1, 2018, issues that have so far been nice-to-have are becoming must-haves. Privacy aware machine learning will be one of the most important fields for the European research community and the IT business in particular. Most affected is the whole area of biology, medicine and health, particularly driven by the fact that the health sciences are becoming more and more data-intensive.

This special session will bring together scientists with diverse backgrounds who are interested in both the underlying theoretical principles and the application of such methods for practical use in the biomedical, life sciences and health care domain. The cross-domain integration and appraisal of different fields will provide an atmosphere that fosters different perspectives and opinions; it will offer a platform for novel crazy ideas and a fresh look at the methodologies needed to put these ideas into business.

All papers will be peer-reviewed by three members of the international PAML committee. The paper acceptance rate of the last session was 35 %. Accepted papers will be published in a Springer Lecture Notes in Computer Science (LNCS) volume, and excellent contributions will be invited to be extended for a special issue of a journal (planned: Springer MACH and/or BMC MIDM).

Research topics covered by this special session include but are not limited to the following topics:

– Production of Open Data Sets
– Synthetic data sets for learning algorithm testing
– Privacy preserving machine learning, data mining and knowledge discovery
– Data leak detection
– Data citation
– Differential privacy
– Anonymization and pseudonymization
– Securing expert-in-the-loop machine learning systems
– Evaluation and benchmarking

This picture was taken by our local host, Francesco Buccafurri, on January 3, 2017. From the conference venue you have a direct view of the Etna volcano:

Picture taken by Francesco Buccafurri on January 3, 2017

3.2 Trillion USD on health per year

The U.S. spends more on health care than any other country

Dieleman et al. (2016) have just published (December 27, 2016) a paper [1] that uses data from the National Health Expenditure Accounts to estimate US spending on personal health care and public health according to condition, age and sex group, and type of care. The paper was covered in the Washington Post by Carolyn Y. Johnson on December 27 at 11:00 AM.

Here is the reference to the original paper:

[1] Dieleman JL, Baral R, Birger M, Bui AL, Bulchis A, Chapin A, Hamavid H, Horst C, Johnson EK, Joseph J, Lavado R, Lomsadze L, Reynolds A, Squires E, Campbell M, DeCenso B, Dicker D, Flaxman AD, Gabert R, Highfill T, Naghavi M, Nightingale N, Templin T, Tobias MI, Vos T, Murray CJL. US Spending on Personal Health Care and Public Health, 1996-2013. JAMA. 2016;316(24):2627-2646. doi:10.1001/jama.2016.16885

Here is the (shortened) article from the Washington Post:

American health-care spending, measured in trillions of dollars, boggles the mind. Last year, we spent $3.2 trillion on health care, a number so large that it can be difficult to grasp its scale.

A new study published in the Journal of the American Medical Association reveals what patients and their insurers are spending that money on, breaking it down by 155 diseases, patient age and category — such as pharmaceuticals or hospitalizations. Among its findings:

  • Chronic — and often preventable — diseases are a huge driver of personal health spending. The three most expensive diseases in 2013: diabetes ($101 billion), the most common form of heart disease ($88 billion) and back and neck pain ($88 billion).
  • Yearly spending increases aren’t uniform: Over a nearly two-decade period, diabetes and low back and neck pain grew at more than 6 percent per year — much faster than overall spending. Meanwhile, heart disease spending grew at 0.2 percent.
  • Medical spending increases with age — with the exception of newborns. About 38 percent of personal health spending in 2013 was for people over age 65. Annual spending for girls between 1 and 4 years old averaged $2,000 per person; older women 70 to 74 years old averaged $16,000.
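
To get a feeling for what these yearly growth rates mean over the whole period, here is a small sketch of our own (it is not part of the article). It assumes constant compound growth over 17 years (1996 to 2013, the span named in the paper title) and treats "more than 6 percent" as exactly 6 percent, so the first result is a lower bound; the function and variable names are hypothetical.

```python
def growth_factor(annual_rate, years):
    """Total multiplication factor after compounding annual_rate for years."""
    return (1.0 + annual_rate) ** years


YEARS = 17  # assumption: 1996 to 2013, as in the paper title

fast_factor = growth_factor(0.06, YEARS)    # diabetes, low back and neck pain
slow_factor = growth_factor(0.002, YEARS)   # heart disease

print(f"6% per year over {YEARS} years multiplies spending by {fast_factor:.2f}")
print(f"0.2% per year over {YEARS} years multiplies spending by {slow_factor:.2f}")
```

Under these assumptions, spending that grows at 6 percent per year ends up at roughly 2.7 times its starting level, while spending that grows at 0.2 percent per year rises by only about 3 percent in total.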

Here is the link to the original article:
https://www.washingtonpost.com/news/wonk/wp/2016/12/27/the-u-s-spends-more-on-health-care-than-any-other-country-heres-what-were-buying/?tid=pm_business_pop&utm_term=.71fc517cdc11

machine learning for health informatics

LNAI 9605 Machine Learning for Health Informatics available

14.12.2016 LNAI 9605 just appeared

Machine Learning for Health Informatics Lecture Notes in Artificial Intelligence LNAI 9605

Holzinger, Andreas (ed.) 2016. Machine Learning for Health Informatics: State-of-the-Art and Future Challenges. Cham: Springer International Publishing, doi:10.1007/978-3-319-50478-0

[book homepage]

Machine learning (ML) is the fastest growing field in computer science, and Health Informatics (HI) is amongst the greatest application challenges, providing future benefits in improved medical diagnoses, disease analyses, and pharmaceutical development. However, successful ML for HI needs a concerted effort, fostering integrative research between experts from diverse disciplines, ranging from data science to visualization.

Tackling complex challenges needs both disciplinary excellence and cross-disciplinary networking without any boundaries. The HCI-KDD approach combines the best of two worlds and aims to support human intelligence with machine intelligence.

This state-of-the-art survey is an output of the international HCI-KDD expert network and features 22 carefully selected and peer-reviewed chapters on hot topics in machine learning for health informatics; they discuss open problems and future challenges in order to stimulate further research and international progress in this field.

NIPS 2016

NIPS 2016 is over

A crazy event with 5,700 attendees is over: NIPS 2016 in Barcelona. Registration opened on Sunday, December 4. On Monday, December 5, the tutorials were presented as usual, followed by the first keynote talk by Yann LeCun (now director of Facebook AI Research), the official opening and the first poster session. Tuesday, December 6, started with a keynote by Drew Purves (Google DeepMind), followed by parallel tracks on clustering and graphical models, a keynote by Saket Navlakha (The Salk Institute), parallel tracks on deep learning and machine learning theory, and poster sessions and demonstrations. Wednesday opened with a keynote by Kyle Cranmer (New York University) and the award talk “Matrix Completion has No Spurious Local Minimum”, and was dominated by parallel tracks on algorithms and applications. These were followed by a keynote by Marc Raibert (Boston Dynamics), who presented the latest advances in robotic learning, then parallel tracks on deep learning and optimization, and finally poster sessions with cool demonstrations. Thursday opened with keynotes by Irina Rish (IBM) and Susan Holmes (Stanford), followed by parallel tracks on interpretable models and cognitive neuroscience and by various symposia that ran until 21:30! Friday and Saturday were full-day workshops, and Sunday was reserved for recreation on the sandy beach of Barcelona 🙂

NIPS is definitely the most exciting conference, with an amazing variety of topics and themes revolving around machine learning and covering all sorts of theory and applications.

Machine Learning with Fun

Google Research hosts a number of very interesting so-called A.I. experiments. There you can play with machine learning algorithms in a very amusing way. A recent example is QUICK, DRAW *). This is an online guessing game that challenges humans to hand-sketch a given object (the sketches are called doodles). The game uses a neural network to learn from the input data:

https://quickdraw.withgoogle.com

which is part of the A.I. Experiments platform:

https://aiexperiments.withgoogle.com

and here is the explanatory video:
https://www.youtube.com/watch?v=oOwfiYnRi5c

Have fun and enjoy!

Here you can see more than 100,000 hedgehog drawings made by people on the internet:

https://quickdraw.withgoogle.com/data/hedgehog

*) not to be confused with QuickDraw [1], a sketch-based drawing tool that facilitates drawing precise geometry diagrams. It can automatically recognize sketched diagrams containing components such as line segments and circles, infer geometric constraints relating the recognized components, and use this information to “beautify” the sketched diagram. This beautification is based on an algorithm that iteratively computes various sub-components of the components using an extensible set of deductive rules.

[1] Cheema, S., Gulwani, S. & Laviola, J. QuickDraw: improving drawing experience for geometric diagrams. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2012. ACM, 1037-1064. doi: 10.1145/2207676.2208550

[2] https://experiments.withgoogle.com/ai

Visualization of High Dimensional Data

Google is doing experiments with the visualization of high-dimensional data. This experiment helps visualize what’s happening in machine learning. It allows coders to see and explore their high-dimensional data. The goal is to eventually make this an open-source tool within TensorFlow, so that any coder can use these visualization techniques to explore their data.
Built by Daniel Smilkov, Fernanda Viégas, Martin Wattenberg, and the Big Picture team at Google.
This work is based on a method developed by Laurens van der Maaten & Geoffrey Hinton in 2008:
Maaten, L. V. D. & Hinton, G. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 11, 2579-2605, https://www.jmlr.org/papers/v9/vandermaaten08a.html
t-Distributed Stochastic Neighbor Embedding (t-SNE, pronounced “tee-snee”) is a (prize-winning) nonlinear technique for dimensionality reduction that is particularly well suited for visualizing high-dimensional data sets by embedding them into R2 or R3. The technique can be implemented via Barnes-Hut approximations, which allows it to be applied to large real-world datasets (“big data”).
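
As a rough illustration of the idea behind t-SNE, the following pure-Python sketch computes the two sets of pairwise similarities the method compares and the Kullback-Leibler divergence between them, which is the quantity t-SNE minimizes. It is a simplified sketch and not the reference implementation: it assumes a single fixed Gaussian bandwidth `sigma` instead of the per-point bandwidths that t-SNE tunes via perplexity, it performs no optimization and no Barnes-Hut approximation, and the toy data and all names are hypothetical.

```python
import math


def squared_distances(points):
    """Matrix of squared Euclidean distances between all pairs of points."""
    n = len(points)
    return [[sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
             for j in range(n)] for i in range(n)]


def gaussian_affinities(points, sigma=1.0):
    """High-dimensional similarities p_ij (fixed bandwidth, an assumption)."""
    n = len(points)
    d2 = squared_distances(points)
    cond = []
    for i in range(n):
        row = [0.0 if i == j else math.exp(-d2[i][j] / (2.0 * sigma ** 2))
               for j in range(n)]
        total = sum(row)
        cond.append([v / total for v in row])
    # Symmetrize the conditional similarities: p_ij = (p_j|i + p_i|j) / (2n).
    return [[(cond[i][j] + cond[j][i]) / (2.0 * n) for j in range(n)]
            for i in range(n)]


def student_t_affinities(embedding):
    """Low-dimensional similarities q_ij from a Student-t kernel (1 d.o.f.)."""
    n = len(embedding)
    d2 = squared_distances(embedding)
    kernel = [[0.0 if i == j else 1.0 / (1.0 + d2[i][j]) for j in range(n)]
              for i in range(n)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def kl_divergence(p, q):
    """KL(P || Q) over all pairs with p_ij > 0."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0.0)


# Toy data: two tight clusters in three dimensions.
data = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (5.0, 5.0, 5.0), (5.1, 5.0, 4.9)]
good_map = [(0.0, 0.0), (0.2, 0.0), (10.0, 0.0), (10.2, 0.0)]   # keeps clusters
bad_map = [(0.0, 0.0), (10.0, 0.0), (0.2, 0.0), (10.2, 0.0)]    # mixes clusters

p = gaussian_affinities(data)
kl_good = kl_divergence(p, student_t_affinities(good_map))
kl_bad = kl_divergence(p, student_t_affinities(bad_map))
print(f"KL for the cluster-preserving map: {kl_good:.3f}")
print(f"KL for the cluster-mixing map:     {kl_bad:.3f}")
```

A two-dimensional map that keeps neighbours together yields a much smaller divergence than one that tears them apart. The actual algorithm starts from a random map and moves the points by gradient descent on this divergence.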
For details please refer directly to:
Compare this method to our own work on subspace clustering:
Neural Information Processing Systems

Holzinger Group at NIPS

Our crazy iML concept has been accepted at the CiML 2016 workshop (organized by Isabelle Guyon, Evelyne Viegas, Sergio Escalera, Ben Hammer & Balazs Kegl) at NIPS 2016 (December 5-10, 2016) in Barcelona:

https://docs.google.com/viewer?a=v&pid=sites&srcid=Y2hhbGVhcm4ub3JnfHdvcmtzaG9wfGd4OjFiMGRmNzQ5MmM5MTZhYzE