Journal Club

We run a journal club to discuss data science topics every Friday at 2 pm. To join our mailing list and receive notifications, please send an empty email to odsl-subscribe(at)lists.lrz.de or visit this website: https://lists.lrz.de/mailman/listinfo/odsl.

If you have ideas for topics to discuss, feel free to propose them in the following Google Doc: http://bit.ly/odsljc20.

ODSL Block Courses - Sept 2021

We are organising our next set of block courses from September 6th to 16th, 2021, under the title Practical Inference for Researchers in the Physical Sciences.

This session consists of two one-week courses:

1. Monte Carlo inference methods

Introduction to Bayesian inference with physical models. Parameter uncertainties, degeneracies and knowledge updates. Model comparison and criticism. Modern Monte Carlo algorithms for Bayesian inference in practice and probabilistic computation packages: Importance Sampling, Markov Chain Monte Carlo, Nested Sampling.

2. Bayesian workflow

Bayesian thinking, going from a science question to a generative statistical model, defining sensible priors, verification through simulations, diagnosing problems in models and computation, robust decision making, experiment design.
 

The courses will be held online and organised by Johannes Buchner and Francesca Capel. We plan to offer credits to both TUM and LMU students. 

For more information and to register please visit: https://indico.ph.tum.de/event/6875/

ODSL Call for Proposals (2021)

We are opening the second call for proposals from Origins Cluster scientists to collaborate with the ODSL team on data analysis projects. Our team has a wide range of expertise in applied statistics and can offer dedicated support to help you make the most of your data. We are looking for scientific projects with flexible durations, anything from a few weeks to many months. 

Our core team currently consists of two postdocs, Francesca Capel and Jakob Knollmüller, and a PhD student. We are also joined by four ODSL fellows: Johannes Buchner, Philipp Eller, Nahuel Ferreiro and Oliver Schulz. Together we have experience in a variety of data analysis topics, including Bayesian analysis, Monte Carlo methods, hierarchical modelling, machine learning, likelihood-free inference and variational inference, to give some examples.

What we offer

  • Help in the formulation of statistical aspects of an analysis
  • Advice on the best tools for approaching an analysis
  • Assistance in setting up the necessary software
  • Evaluation of analysis and model performance
  • Full implementation of a reduced analysis on small-scale computing
  • Flexible consulting throughout the project

In return, we expect acknowledgement or authorship on resulting publications, depending on the level of involvement. We want to make it clear that we are not offering help in setting up computing environments or basic software, nor are we a high-performance computing facility. 

Proposal guidelines

Proposals are welcome from all Origins member scientists but must be endorsed by an Origins PI. Proposals should include

  • An introduction to the scientific topic (max. 1 page)
  • A description of the analysis task, including its ultimate goals
  • A statement of what help is expected from the ODSL consultant
  • An estimate of the ODSL consultant's required time commitment 
  • An estimate of the project duration (start/end date)
  • Name and email address of central contact person on the project
  • Name and email address of Origins PI 

If you have any questions regarding the proposals, then please contact us at odsl-team(at)origins-cluster.de. We are happy to discuss and help you to formulate your request. A selection committee made up of the ODSL core team, fellows, and Origins scientists from different disciplines will evaluate the proposals promptly after the submission deadline. 

Proposal deadline

The deadline for this call is October 31st, 2021.

Submit a proposal

Please send your proposal in English as a single PDF file to odsl-team(at)origins-cluster.de.

Introductory C++ course (March 2021)

An introductory C++ course will take place from 8.3. to 19.3.2021, with the exam on 19.3.

Moodle page: https://www.moodle.tum.de/course/view.php?id=64027

Students may also register as guests (please contact alice.smith-gicklhorn@origins-cluster.de to get access).

Course information, slides and literature are available for download on the Moodle page.

The course takes place online via Zoom:

https://cern.zoom.us/j/63382034823?pwd=VHF3T1VCOWpybjN4MmRFWUcySU5SQT09

Meeting ID: 633 8203 4823
Passcode: (please contact alice.smith-gicklhorn@origins-cluster.de to get the passcode)

Dates:

8.3.2021-19.3.2021 (exam: 19.3.2021; 3 ECTS credits)

Times:

10.00 to 12.00-12.30: lectures (2 lectures, 20-20 slides each)
14.00 to 16.00-17.00: practical part (students program, the lecturer answers their questions)

Lecturer: Sergei Gerassimov

Language: English

ORIGINS Data Science Block Courses (March 2021)

We are organizing two block courses from March 1 to 11, 2021. They introduce statistical methods and Monte Carlo methods.

The courses, as well as the tutorials, will take place online. The tutorials will be organised through breakout rooms, each assigned to a tutor.

The block courses follow the schedule:

  • Lecture: Monday-Wednesday 14:00-17:00
  • Tutorial: Tuesday-Thursday 9:00-12:00

All exercises will be made available before the courses and the deadline to hand in the report is March 31.

For students successfully completing both Block Courses, ECTS points can be awarded.  

Block I (March 1-4):  Introduction to Statistical Methods

Lecturer: Prof. Allen Caldwell

Topics: Derivation and application of the most commonly used statistical distributions, Central Limit Theorem, point estimates, confidence intervals, test statistics, p-values and related topics.

Block II (March 8-11):  Introduction to Monte Carlo methods

Lecturer: Prof. Allen Caldwell

Topics: Variable transformations, accept-reject methods, sample mean, importance sampling, random walks, Markov Chain Monte Carlo and applications.
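
As a rough illustration of the accept-reject method listed among the topics, a minimal Python sketch might look as follows. The target density, the envelope height `f_max` and all function names are assumptions chosen for this example; they are not taken from the course material.

```python
import random


def target_pdf(x):
    """Unnormalised target density on [0, 1]: f(x) = x * (1 - x)."""
    return x * (1.0 - x)


def accept_reject(n_samples, rng, f_max=0.25):
    """Draw n_samples from target_pdf using a uniform proposal on [0, 1].

    f_max must be an upper bound of target_pdf on the interval
    (here the maximum, reached at x = 0.5).
    """
    samples = []
    while len(samples) < n_samples:
        x = rng.random()            # propose a point uniformly on [0, 1)
        u = rng.random() * f_max    # uniform height below the envelope
        if u < target_pdf(x):       # keep only points under the curve
            samples.append(x)
    return samples


rng = random.Random(42)
samples = accept_reject(20000, rng)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The normalised target is a Beta(2, 2) distribution, so the sample mean and variance should come out close to 0.5 and 0.05. A tighter envelope means fewer rejected proposals; here about two thirds of the proposals are accepted.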

Registration and further details can be found here: https://indico.ph.tum.de/event/6797/

ODSL Call for Proposals

We are opening a call for proposals from Origins Cluster scientists to collaborate with the ODSL team on data analysis projects. Our team has a wide range of expertise in applied statistics and can offer dedicated support to help you make the most of your data. We are looking for scientific projects with flexible durations, anything from a few weeks to many months. 

Our core team consists of three postdocs: Francesca Capel, Philipp Eller and Jakob Knollmüller. We are also joined by two ODSL fellows, Johannes Buchner and Oliver Schulz. Together we have experience in a variety of data analysis topics, including Bayesian analysis, Monte Carlo methods, hierarchical modelling, machine learning, likelihood-free inference and variational inference, to give some examples.

What we offer

  • Help in the formulation of statistical aspects of an analysis
  • Advice on the best tools for approaching an analysis
  • Assistance in setting up the necessary software
  • Evaluation of analysis and model performance
  • Full implementation of a reduced analysis on small-scale computing
  • Flexible consulting throughout the project

In return, we expect acknowledgement or authorship on resulting publications, depending on the level of involvement. We want to make it clear that we are not offering help in setting up computing environments or basic software, nor are we a high-performance computing facility. 

Proposal guidelines

Proposals are welcome from all Origins member scientists but must be endorsed by an Origins PI. Proposals should include

  • An introduction to the scientific topic (max. 1 page)
  • A description of the analysis task, including its ultimate goals
  • A statement of what help is expected from the ODSL consultant
  • An estimate of the ODSL consultant's required time commitment 
  • An estimate of the project duration (start/end date)
  • Name and email address of central contact person on the project
  • Name and email address of Origins PI 

If you have any questions regarding the proposals, then please contact us at odsl-team(at)origins-cluster.de. We are happy to discuss and help you to formulate your request. A selection committee made up of the ODSL core team, fellows, and Origins scientists from different disciplines will evaluate the proposals promptly after the submission deadline. 

Proposal deadline

The deadline for this call is October 31st 2020. We anticipate regular calls, with the next call early next year. 

Submit a proposal

Please send your proposal in English as a single PDF file to odsl-team(at)origins-cluster.de.

UPDATE: The proposals for this period have now been assigned. We will consider further proposals in Spring 2021. 

Origins Data Science Block Courses (Sept 2020)

The Origins Data Science Lab (ODSL) is organizing two block courses of three afternoons each on data science topics.

Each block consists of six lectures of one hour, followed by the possibility to work on a set of problems, including small calculations and implementations.

 

Block I (September 1-3):  Introduction to Probabilistic Reasoning


In this course we will introduce the basic concepts of reasoning under uncertainty. After a brief introduction to probability theory and commonly used probability distributions, we discuss inference tasks with various probabilistic models. We conclude by outlining methods to approach more involved inference tasks through approximation or sampling.

Lecturer: Jakob Knollmüller

Prerequisites: Linear Algebra, basic Analysis, a programming language of choice

Skills acquired: basics of probabilistic reasoning and Bayesian inference, probabilistic modelling, model comparison, approximate inference
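
To give a flavour of the kind of inference task this block covers, the following is a minimal sketch of a Bayesian update for a success probability, using a Beta prior and binomial data. The example, the numbers and the function names are assumptions made for illustration and are not part of the course material.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Return the Beta posterior parameters for a Beta(alpha, beta) prior
    after observing the given numbers of successes and failures."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


prior = (1.0, 1.0)  # flat prior on the success probability
posterior = beta_binomial_update(*prior, successes=7, failures=3)
posterior_mean = beta_mean(*posterior)
```

With 7 successes and 3 failures, the flat prior is updated to a Beta(8, 4) posterior with mean 2/3, slightly pulled towards 0.5 compared with the raw fraction 0.7.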

 

Block II (September 8-10): Introduction to Numerical Methods and Machine Learning


This course focuses on methods for data processing, optimization and machine learning. First we will learn the basics of data decorrelation, reduction and optimization algorithms. Based on these new skills, we dive into machine learning topics, such as clustering, classification and regression with tree-based algorithms and neural networks. In the last part, deep learning models and different architectures will be introduced and explained.

Lecturer: Dr. Philipp Eller

Prerequisites: Linear Algebra, basic Analysis, a programming language of choice

Skills acquired: basic data transformations, knowledge of various optimization algorithms, k-means clustering, decision trees, neural networks, convolutional neural networks, auto-encoders, generative models

Forms of credit


It is possible to obtain a certificate or ECTS points for participating in the Block Courses:

To get a Certificate of Participation (for either one of the two blocks or both), you will need to turn in solutions to the exercises that will be assigned during the course and get a passing grade.  The certification will be done on a course-by-course basis, and will state that you have successfully completed the Block Course in the respective topic.  Please register for the course in advance so we can estimate how much work will be involved in the evaluation of the reports.

To get the 3 ECTS points, you will need to turn in solutions to the exercises for both Block Courses that will be offered this year. The grade for the course will be based on the two sets of exercises, and there will not be an additional exam. The deadline to hand in the report is September 30, 2020. Please register for the courses in advance so we can estimate how much work will be involved in the evaluation of the reports.

For more information and to register please visit https://indico.ph.tum.de/event/4491/

 

Workshop on the State of the Art in Sampling and Clustering

The workshop will cover both introductory and advanced topics in statistical sampling and clustering. In addition to talks on the latest state of the art, the workshop will include hands-on and exercise sessions.

The workshop, hosted by the Max Planck Institute for Physics (MPP), is organised by INSIGHTS ITN, MPP IMPRS and the ORIGINS Excellence Cluster and is open to all members of these organisations.

Due to Covid-19, all lectures, exercise sessions and social interactions will be held online.

Depending on how the current situation develops, it may still be possible to travel to MPP to make personal contacts.

Participation is free of charge, but all participants should register by September 20, 2020.

More information: https://indico.mpp.mpg.de/event/7494/overview

Example Projects within ODSL

Universal Imaging with Information Field Theory

Reconstructing a good image of a spatially varying quantity, a field, from incomplete and noisy measurement data requires combining the measurements with knowledge of general physical properties of the field, such as its smoothness, correlation structure or freedom from divergence. Information field theory uses the elegant formalism of field theories to mathematically derive optimal Bayesian imaging algorithms for a wide variety of measurement situations. These algorithms can be implemented efficiently and generically using the "Numerical Information Field Theory" (NIFTy) software package. Algorithms based on NIFTy are already in use in radio and gamma-ray astronomy, for example. NIFTy is currently developing into a universally applicable tool for imaging problems in astronomy, particle physics, medicine and other fields.

The Bayesian Analysis Toolkit

The Bayesian Analysis Toolkit, BAT, is a software package designed to help solve statistical problems encountered in Bayesian inference. BAT is based on Bayes' Theorem and is currently implemented using Markov Chain Monte Carlo. This gives access to the full posterior probability distribution and enables straightforward parameter estimation, limit setting and uncertainty propagation. Novel sampling methods, optimization schemes and parallelization are example development areas.
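
To illustrate how Markov Chain Monte Carlo gives access to a posterior distribution, here is a minimal random-walk Metropolis sketch in plain Python. It does not use BAT and does not reflect BAT's interface or algorithms; the model (Gaussian data with known width and a flat prior on the mean), the step size and all names are assumptions chosen for the example.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0):
    """Log posterior of mu, up to a constant, for Gaussian data with known
    sigma and a flat prior on mu."""
    return -0.5 * sum((x - mu) ** 2 for x in data) / sigma ** 2


def metropolis(log_post, start, n_steps, step_size, rng):
    """Random-walk Metropolis sampler; returns the chain of visited values."""
    chain = [start]
    current_lp = log_post(start)
    for _ in range(n_steps):
        proposal = chain[-1] + rng.gauss(0.0, step_size)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio);
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            chain.append(proposal)
            current_lp = proposal_lp
        else:
            chain.append(chain[-1])
    return chain


rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(50)]  # synthetic measurements
chain = metropolis(lambda mu: log_posterior(mu, data), 0.0, 20000, 0.3, rng)
posterior = chain[2000:]                         # discard burn-in
post_mean = sum(posterior) / len(posterior)
post_std = math.sqrt(
    sum((m - post_mean) ** 2 for m in posterior) / len(posterior)
)
```

For this simple model the posterior is known analytically: a Gaussian centred on the sample mean of the data with width sigma / sqrt(n). The chain's mean and standard deviation should therefore approximate these values, which makes the example easy to check.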

The Dark Matter Data Centre

We will build a platform to host and combine overarching information on experimental studies, astronomical observations and theoretical modeling of Dark Matter to facilitate the combination and cross-correlation of existing and forthcoming data. This data centre will allow tests for Dark Matter (DM) candidates passing all existing benchmarks in cosmology, astrophysics and particle physics, in experiment and in theory. We plan to probe for tensions between different data sets and theories that point towards new, hidden properties of DM. The data centre will make the data available to the international community for further global analysis and model benchmarking, following examples in astrophysics and high-energy physics.