ODSL - The ORIGINS Data Science Laboratory

The ODSL provides a new pillar for novel analysis methods, algorithms and computational tools to fully exploit high-dimensional, complex data sets. The ODSL specialises in advanced techniques for pattern recognition in noisy data and for the identification and extraction of weak signals. Modern machine learning techniques, together with augmented reality and visualisation techniques, will be employed to extract scientific results.

Journal Club

We run a journal club to discuss data science topics every Friday at 2 pm. To join our mailing list and receive notifications, please send an empty email to odsl-subscribe(at)lists.lrz.de or visit this website: https://lists.lrz.de/mailman/listinfo/odsl.

If you have ideas for topics to discuss, feel free to propose them in the following Google Doc: http://bit.ly/odsljc20.

ODSL Call for Proposals

We are opening a call for proposals from Origins Cluster scientists to collaborate with the ODSL team on data analysis projects. Our team has a wide range of expertise in applied statistics and can offer dedicated support to help you make the most of your data. We are looking for scientific projects with flexible durations, anything from a few weeks to many months. 

Our core team consists of three postdocs: Francesca Capel, Philipp Eller and Jakob Knollmüller. We are also joined by two ODSL fellows, Johannes Buchner and Oliver Schulz. Together we have experience in a variety of data analysis topics, including Bayesian analysis, Monte Carlo methods, hierarchical modelling, machine learning, likelihood-free inference and variational inference.

What we offer

  • Help in the formulation of statistical aspects of an analysis
  • Advice on the best tools for approaching an analysis
  • Assistance in setting up the necessary software
  • Evaluation of analysis and model performance
  • Full implementation of a reduced analysis on small-scale computing
  • Flexible consulting throughout the project

In return, we expect acknowledgement or authorship on resulting publications, depending on the level of involvement. We want to make it clear that we are not offering help in setting up computing environments or basic software, nor are we a high-performance computing facility. 

Proposal guidelines

Proposals are welcome from all Origins member scientists but must be endorsed by an Origins PI. Proposals should include:

  • An introduction to the scientific topic (max. 1 page)
  • A description of the analysis task, including its ultimate goals
  • A statement of what help is expected from the ODSL consultant
  • An estimate of the ODSL consultant's required time commitment 
  • An estimate of the project duration (start/end date)
  • Name and email address of central contact person on the project
  • Name and email address of Origins PI 

If you have any questions regarding the proposals, then please contact us at odsl-team(at)origins-cluster.de. We are happy to discuss and help you to formulate your request. A selection committee made up of the ODSL core team, fellows, and Origins scientists from different disciplines will evaluate the proposals promptly after the submission deadline. 

Proposal deadline

The deadline for this call is October 31st 2020. We anticipate regular calls, with the next call early next year. 

Submit a proposal

Please send your proposal in English as a single PDF file to odsl-team(at)origins-cluster.de.

UPDATE: The proposals for this period have now been assigned. We will consider further proposals in Spring 2021. 

Workshop on state-of-the-art in sampling and clustering

The workshop will cover both introductory and advanced topics in the field of statistical sampling and clustering. In addition to lectures on state-of-the-art approaches, the workshop will also include hands-on and exercise sessions.

The workshop, hosted by the Max Planck Institute for Physics (MPP), is organised by the INSIGHTS ITN, the MPP IMPRS and the ORIGINS Excellence Cluster. It is open to everybody affiliated with these organizations.

Due to Covid-19, all lectures, exercise sessions and social interactions will be held online.

Participants may still be able to travel to MPP for some in-person interaction, depending on how the current situation develops.

Participation is free, but all participants should register by Sept. 20, 2020.

More information: https://indico.mpp.mpg.de/event/7494/overview

Origins Data Science Block Courses

The Origins Data Science Lab (ODSL) is organizing two block courses of three afternoons each on data science topics.

Each block consists of six lectures of one hour, followed by the possibility to work on a set of problems, including small calculations and implementations.


Block I (September 1-3):  Introduction to Probabilistic Reasoning

In this course we will introduce the basic concepts of reasoning under uncertainty. After a brief introduction to probability theory and commonly used probability distributions, we discuss inference tasks with various probabilistic models. We conclude by outlining methods to approach more involved inference tasks through approximation or sampling.

Lecturer: Jakob Knollmüller

Prerequisites: Linear Algebra, basic Analysis, a programming language of choice

Skills acquired: basics of probabilistic reasoning and Bayesian inference, probabilistic modelling, model comparison, approximate inference


Block II (September 8-10): Introduction to Numerical Methods and Machine Learning

This course focuses on methods for data processing, optimization and machine learning. First we will learn the basics of data decorrelation, data reduction and optimization algorithms. Building on these skills, we dive into machine learning topics such as clustering, classification and regression with tree-based algorithms and neural networks. In the last part, deep learning models and different architectures will be introduced and explained.

Lecturer: Dr. Philipp Eller

Prerequisites: Linear Algebra, basic Analysis, a programming language of choice

Skills acquired: basic data transformations, knowledge of various optimization algorithms, k-means clustering, decision trees, neural networks, convolutional neural networks, auto-encoders, generative models

Forms of credit

It is possible to get a Certificate of Participation or ECTS points for participation in the Block Courses:

To get a Certificate of Participation (for either one of the two blocks or both), you will need to turn in solutions to the exercises that will be assigned during the course and get a passing grade.  The certification will be done on a course-by-course basis, and will state that you have successfully completed the Block Course in the respective topic.  Please register for the course in advance so we can estimate how much work will be involved in the evaluation of the reports.

To get the 3 ECTS points, you will need to turn in solutions to the exercises for both Block Courses that will be offered this year. The grade for the course will be based on the two sets of exercises, and there will not be an additional exam. The deadline to hand in the report is September 30, 2020. Please register for the courses in advance so we can estimate how much work will be involved in the evaluation of the reports.

For more information and to register please visit https://indico.ph.tum.de/event/4491/


Example Projects within ODSL

Universal Imaging Using Information Field Theory

In order to reconstruct a good image of a spatially varying quantity, a field, from incomplete and noisy measurement data, it is necessary to combine the measurements with knowledge about general physical properties of the field, such as its smoothness, correlation structure, or freedom from divergence. Information field theory uses the elegant formalism of field theories to mathematically derive optimal Bayesian imaging algorithms for different measurement situations. These algorithms can be implemented efficiently and generally by means of the "Numerical Information Field Theory" (NIFTy) programming package. Algorithms using NIFTy are already used in radio and gamma-ray astronomy. NIFTy is developing into a universal tool for imaging problems in astronomy, particle physics, medicine, and other fields.
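As a rough illustration of the idea of combining incomplete, noisy data with prior knowledge about a field's correlation structure, the following minimal sketch applies a Wiener filter to a one-dimensional toy problem. It uses plain NumPy and does not use the NIFTy API. The function names, the squared-exponential prior covariance, the periodic grid and all parameter values are assumptions chosen for this example only.

```python
import numpy as np

def wiener_filter(data, observed_idx, n_pixels, corr_length, signal_var, noise_var):
    """Posterior mean of a Gaussian random field given noisy, incomplete data."""
    # Prior covariance S (assumed): squared-exponential correlations on a periodic 1D grid.
    x = np.arange(n_pixels)
    dist = np.abs(x[:, None] - x[None, :])
    dist = np.minimum(dist, n_pixels - dist)
    S = signal_var * np.exp(-0.5 * (dist / corr_length) ** 2)

    # Response R selects the observed pixels; the noise covariance N is diagonal.
    R = np.eye(n_pixels)[observed_idx]
    N = noise_var * np.eye(len(observed_idx))

    # Wiener filter: m = S R^T (R S R^T + N)^{-1} d
    return S @ R.T @ np.linalg.solve(R @ S @ R.T + N, data)

def demo(seed=0):
    """Toy setup: a smooth field observed at half of the pixels with Gaussian noise."""
    rng = np.random.default_rng(seed)
    n_pixels, n_obs, noise_std = 128, 64, 0.3
    x = np.arange(n_pixels)
    truth = np.sin(2 * np.pi * x / n_pixels)
    observed_idx = np.sort(rng.choice(n_pixels, size=n_obs, replace=False))
    data = truth[observed_idx] + rng.normal(0.0, noise_std, size=n_obs)
    recon = wiener_filter(data, observed_idx, n_pixels,
                          corr_length=16.0, signal_var=1.0, noise_var=noise_std**2)
    return truth, data, observed_idx, recon

if __name__ == "__main__":
    truth, data, observed_idx, recon = demo()
    rms = np.sqrt(np.mean((recon - truth) ** 2))
    print(f"RMS error of the reconstruction over all pixels: {rms:.3f}")
```

The reconstruction is defined on all pixels, including the unobserved ones, because the prior covariance carries information from measured locations to their neighbours. For the large, non-linear problems mentioned above, explicit dense matrices like these would not be practical; this toy version only shows the underlying principle.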

The Bayesian Analysis Toolkit

The Bayesian Analysis Toolkit, BAT, is a software package designed to help solve statistical problems encountered in Bayesian inference. BAT is based on Bayes' Theorem and is currently implemented using Markov Chain Monte Carlo. This gives access to the full posterior probability distribution and enables straightforward parameter estimation, limit setting and uncertainty propagation. Novel sampling methods, optimization schemes and parallelization are examples of current development areas.
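To illustrate how Markov Chain Monte Carlo gives access to a posterior distribution, here is a minimal random-walk Metropolis sketch in plain NumPy. It is not BAT's interface or its sampling algorithm. The toy model (Gaussian data with known width and a broad Gaussian prior on the mean), the function names and the tuning values are all assumptions made for this example.

```python
import numpy as np

def log_posterior(mu, data, sigma=1.0, prior_width=10.0):
    """Unnormalised log posterior: Gaussian likelihood times a Gaussian prior on mu."""
    log_like = -0.5 * np.sum((data - mu) ** 2) / sigma**2
    log_prior = -0.5 * (mu / prior_width) ** 2
    return log_like + log_prior

def metropolis(log_prob, start, n_steps, step_size, rng):
    """Random-walk Metropolis sampler for a one-dimensional parameter."""
    chain = np.empty(n_steps)
    current, current_lp = start, log_prob(start)
    for i in range(n_steps):
        proposal = current + step_size * rng.normal()
        proposal_lp = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

def run_example(seed=1):
    """Draw toy data, run the sampler and discard an initial burn-in phase."""
    rng = np.random.default_rng(seed)
    data = rng.normal(loc=2.0, scale=1.0, size=50)
    chain = metropolis(lambda mu: log_posterior(mu, data),
                       start=0.0, n_steps=20000, step_size=0.3, rng=rng)
    return data, chain[2000:]

if __name__ == "__main__":
    data, samples = run_example()
    print("posterior mean:", samples.mean())
    print("posterior std: ", samples.std())
    print("95% upper limit:", np.quantile(samples, 0.95))
```

Once samples are available, parameter estimates and limits reduce to summary statistics of the chain, and uncertainty propagation amounts to evaluating any derived quantity on each sample.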

The Dark Matter Data Centre

We will build a platform to host and combine overarching information on experimental studies, astronomical observations and theoretical modeling of Dark Matter (DM) to facilitate the combination and cross-correlation of existing and forthcoming data. This data centre will allow DM candidates to be tested against all existing benchmarks from cosmology, astrophysics and particle physics, from both experiment and theory. We plan to probe for tensions between different data sets and theories that point towards new, hidden properties of DM. The data centre will make the data available to the international community for further global analysis and model benchmarking, following examples in astro- and high-energy physics.