|Claudia Czado||Analyzing dependent data with vine copulas||TBA|
This course is designed for graduate students and researchers who are interested in using copula-based models for multivariate data structures. It provides a step-by-step introduction to the class of vine copulas and their statistical inference. This class of flexible copula models has become very popular in recent years for applications in diverse fields such as finance, insurance, hydrology, marketing, engineering, chemistry, aviation, climatology and health.
The popularity of vine copulas is due to the fact that, in addition to the separation of margins and dependence achieved by the copula approach, they allow for tail asymmetries and separate modeling of multivariate components. This is accomplished by constructing multivariate copulas using only bivariate building blocks, which can be selected independently. These building blocks are glued together into valid multivariate copulas by appropriate conditioning; hence the term pair copula construction was coined by Aas et al. (2009). This approach allows for flexible and tractable dependence models in dimensions of several hundred, thus providing a long-desired extension of the elliptical and Archimedean copula classes. It forms the basis of new approaches in risk, reliability, spatial analysis, simulation, survival analysis and data mining, to name a few.
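In three dimensions, for example, the pair copula construction of Aas et al. (2009) decomposes a joint density into marginal densities and bivariate copula densities:

```latex
f(x_1, x_2, x_3) = f_1(x_1)\, f_2(x_2)\, f_3(x_3)
  \times c_{12}\bigl(F_1(x_1),\, F_2(x_2)\bigr)\,
         c_{23}\bigl(F_2(x_2),\, F_3(x_3)\bigr)
  \times c_{13|2}\bigl(F_{1|2}(x_1 \mid x_2),\, F_{3|2}(x_3 \mid x_2)\bigr)
```

Here $f_i$ and $F_i$ are the marginal densities and distribution functions, $c_{12}$ and $c_{23}$ are unconditional pair copula densities, and $c_{13|2}$ is the pair copula of the conditional distributions given $x_2$ — each of the three bivariate building blocks can be chosen from a different parametric family.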
The course starts with background on multivariate and conditional distributions and copulas. Basic bivariate dependence measures are then discussed. Bivariate parametric classes of elliptical and Archimedean copulas are introduced, and graphical tools for identifying sensible bivariate copula models for data are developed. The decomposition and construction principle of vines is first given in three dimensions and then extended to the special cases of drawable (D-) and canonical (C-) vines. Finally, the general case of regular (R-) vines is developed. Simulation algorithms and parameter estimation methods will be constructed. Model selection methods for vine models are considered. The short course closes with a case study. Computations are facilitated using the freely available package VineCopula of Schepsmeier et al. (2017) in R (see R Core Team (2017)). Further resources on vine models can be found at vine-copula.org.
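As a small taste of the kind of computation covered, the following sketch uses the VineCopula package named above to simulate from a single bivariate building block and re-estimate its parameter (the choice of a Clayton copula with parameter 2 is arbitrary, for illustration only):

```r
# Simulate from a bivariate Clayton copula and refit it with VineCopula.
library(VineCopula)

set.seed(1)
# family = 3 codes the Clayton copula; par = 2 is its dependence parameter
u <- BiCopSim(N = 1000, family = 3, par = 2)

# Fit a Clayton copula to the simulated (u1, u2) pairs by maximum likelihood
fit <- BiCopEst(u[, 1], u[, 2], family = 3)
fit$par  # estimate should be close to the true value 2
```

In a vine model, many such bivariate fits — to original and conditioned pseudo-observations — are combined tree by tree.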
|Lachlan Mitchel and Peter Kasprzak||Shiny App Development||TBA|
|Julian Taylor||Whole Genome Analysis with wgaim||TBA|
|Chris Brien and Sam Rogers||Identifying, randomizing, canonically analyzing and formulating mixed models for designs for comparative experiments using R||TBA|
Brien (2017) outlines a mixed-model-based paradigm for obtaining A-optimal designs for comparative experiments and deriving, from the allocation involved in the design, an initial mixed model for data from an experiment that employs the design. This course explores the use of R packages od and dae for implementing this paradigm: the od package can be used, if required, to generate designs that are A-optimal for a specified mixed model; the dae package can be used to randomize designs and to perform canonical analyses (eigenanalysis) of designs. The canonical analysis is useful in elucidating the properties of a design and in formulating and checking a mixed model for it.
The course covers those basic concepts in experimental design that are necessary for using the paradigm. Methods for describing the factor allocation in a design, and their use in producing a canonical analysis of the design, are discussed, along with the interpretation of the canonical analysis. The formulation of allocation-based mixed models from the canonical analysis is also exposited. That is, the course follows the trail from the recognition, in the planning stages, of important sources of variation, through the construction of the design, to the mixed model for data from the experiment using the design. Participants will rehearse the techniques in practical sessions, using the R packages dae and od.
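To fix ideas on what randomizing a design involves, the base-R sketch below hand-rolls the randomization of a randomized complete block design; it is an illustration of the concept only, not the dae package's interface, which handles this (and the canonical analysis) properly:

```r
# Randomize a complete block design: each of b blocks receives all
# t treatments in an independent random order.
set.seed(42)
t_trt <- 4                         # number of treatments (arbitrary)
b_blk <- 3                         # number of blocks (arbitrary)
design <- data.frame(
  Block     = rep(seq_len(b_blk), each = t_trt),
  Plot      = rep(seq_len(t_trt), times = b_blk),
  Treatment = unlist(lapply(seq_len(b_blk),
                            function(i) sample(seq_len(t_trt))))
)
# Every treatment appears exactly once in every block
table(design$Block, design$Treatment)
```

The allocation of Treatment to Plot within Block recorded here is exactly the information from which an initial allocation-based mixed model is derived in the paradigm.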
|James Carpenter||Handling missing data in administrative studies: multiple imputation and inverse probability weighting||TBA|
The course will consider the issues raised by missing data (both item and unit non-response) in studies using survey and routinely collected data, for example electronic health records. Following a review of the issues raised by missing data, we will focus on two methods of analysis: multiple imputation and inverse probability weighting. We will also discuss how they can be used together. The concepts will be illustrated with medical and social examples.
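The second of these methods can be sketched in a few lines of base R. In this toy example (all variable names hypothetical), the outcome is missing with probability depending on an observed covariate; a logistic model for the response probability supplies inverse probability weights that correct the bias of the naive complete-case mean:

```r
# Toy illustration of inverse probability weighting (IPW) in base R.
set.seed(7)
n <- 2000
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)          # true population mean of y is 2

# Response (non-missingness) depends on x: missing at random given x
p_obs <- plogis(0.5 + x)
obs   <- rbinom(n, 1, p_obs) == 1

# Step 1: model the probability of response with logistic regression
p_hat <- fitted(glm(obs ~ x, family = binomial))

# Step 2: weight observed units by the inverse of that probability
mean(y[obs])                             # naive complete-case mean: biased upward
weighted.mean(y[obs], 1 / p_hat[obs])    # IPW mean: approximately unbiased
```

Multiple imputation would instead fill in the missing y values several times from a model given x and pool the resulting analyses; the course discusses when each approach, or their combination, is preferable.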
|Emi Tanaka||Introduction to R||TBA|
R is a programming language, with particular emphasis on statistical computing and graphics, that first appeared in 1993, i.e. 26 years ago. Over this time, use of R has grown exponentially, and R developers and users, collectively called the R community, have become increasingly diverse.
R has evolved significantly, driven in particular by a suite of R packages, collectively called the tidyverse, that adopt a common design philosophy, grammar, and data structures. Additionally, R packages such as rmarkdown allow seamless integration of the R language to produce various outputs, including interactive HTML documents and applications. The increased use of these tools is reflected in their growing number of citations.
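To make the "common grammar" point concrete, here is one summary computed in base R, with its tidyverse phrasing shown in a comment for comparison (only the base-R line is executed here; the tidyverse version would require the dplyr package):

```r
# Mean sepal length per species from the built-in iris data, in base R:
by_species <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
by_species

# The tidyverse equivalent reads as a left-to-right pipeline:
#   iris |>
#     dplyr::group_by(Species) |>
#     dplyr::summarise(mean_sepal = mean(Sepal.Length))
```

Both give the same three-row table; the tidyverse version composes small verbs with a consistent data-frame-in, data-frame-out contract, which is the design philosophy the workshop teaches.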
Many members of the biometrics community use R; however, the modern evolution of R means that many recent contributions are yet to be widely adopted within the community. This workshop has the potential to raise modern statistical programming literacy by teaching key tools and best practices that are increasingly being adopted.
|Christopher K. Wikle and Dan Pagendam||An Introduction to Deep Learning with Biometric and Environmetric Applications||TBA|
Deep learning is a type of machine learning (ML) that exploits a connected hierarchical set of models to predict or classify elements of complex data sets. The ML deep learning revolution is relatively recent and primarily associated with neural models such as feedforward neural networks (FNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), or some combination of these neural architectures. There are remarkable success stories associated with these approaches, such as models that can defeat experts in Go, Chess, or Shogi, and of course, there are failures as well. Statisticians should not be surprised by the success (and failure) of these deep ML methods as we have been using deep hierarchical models (HMs) for years. Indeed, many of the reasons for success and failure of deep ML and deep HMs are the same.
This course will present an introduction to deep models in ML from a statistician’s perspective. Topics will include an introduction to stochastic gradient optimization and concepts in regularization and dimension reduction, followed by discussion of deep FNNs, CNNs, RNNs, and GANs. We will also touch upon some recent developments that may be of particular interest to statistical practitioners (e.g. Bayesian Neural Networks). The course will focus on concepts and modeling intuition, and will include hands-on implementation using the R interface to Keras, with examples from biomedical, ecological, and environmental statistics.
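The course's opening topics — stochastic gradient optimization and feedforward networks — can be illustrated without any deep learning library at all. The base-R sketch below trains a one-hidden-layer network by plain SGD on a toy regression problem; the architecture, target function, and learning rate are arbitrary choices for illustration, and in the course itself the R interface to Keras would handle all of this:

```r
# A one-hidden-layer feedforward network trained by stochastic gradient
# descent in base R, fitting y = sin(x) on [-3, 3].
set.seed(123)
n <- 400
x <- runif(n, -3, 3)
y <- sin(x)

H  <- 16                        # hidden units (arbitrary)
W1 <- rnorm(H, sd = 0.5); b1 <- rnorm(H, sd = 0.1)   # hidden layer
W2 <- rnorm(H, sd = 0.5); b2 <- 0                    # output layer
lr <- 0.01                      # learning rate

for (epoch in 1:200) {
  for (i in sample(n)) {        # one SGD pass in random order
    # forward pass
    h    <- tanh(W1 * x[i] + b1)        # hidden activations (length H)
    yhat <- sum(W2 * h) + b2
    err  <- yhat - y[i]
    # backward pass: gradients of the squared-error loss 0.5 * err^2
    gW2 <- err * h
    gh  <- err * W2 * (1 - h^2)         # back through tanh
    # SGD update
    W2 <- W2 - lr * gW2;        b2 <- b2 - lr * err
    W1 <- W1 - lr * gh * x[i];  b1 <- b1 - lr * gh
  }
}

pred <- sapply(x, function(xi) sum(W2 * tanh(W1 * xi + b1)) + b2)
mean((pred - y)^2)              # training MSE, far below var(y) ~ 0.5
```

A deep FNN simply stacks more such layers, and CNNs and RNNs replace the dense weights with convolutional or recurrent structure; the gradient bookkeeping done by hand here is exactly what Keras automates.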