"Latent Markov models for aggregate data: application to disease mapping and small area estimation" by Gaia Bertarelli, UNIVERSITY OF MILANO - BICOCCA, PHD IN Statistica - XXVIII Ciclo, Co-tutor Ranalli Maria Giovanna. 

Latent Markov Models (LMMs) are a class of statistical models in which a latent process is assumed. In studying LMMs, it is important to distinguish between two components: the measurement model, i.e. the conditional distribution of the response variables given the latent process, and the latent model, i.e. the distribution of the latent process. LMMs allow for the analysis of longitudinal data when the response variables measure common characteristics of interest that are not directly observable. In LMMs, the characteristics of interest and their evolution over time are represented by a latent process that follows a first-order discrete Markov chain, and units are allowed to change latent state over time. This thesis focuses on LMMs for aggregated data and considers two fields of application: disease mapping and small area estimation.
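
The two components described above can be illustrated with a minimal simulation sketch. It is not the model developed in the thesis: the function name `simulate_lmm`, the two-state setup, the Gaussian measurement model with unit variance, and all parameter values are assumptions chosen only for illustration.

```python
import random


def simulate_lmm(n_units, n_times, init_probs, trans_probs, state_means, seed=0):
    """Simulate latent paths and responses from a basic latent Markov model.

    Hypothetical illustration: Gaussian responses with unit variance.
    """
    rng = random.Random(seed)
    n_states = len(init_probs)
    states, responses = [], []
    for _ in range(n_units):
        # Latent model: draw the initial state, then follow a
        # first-order Markov chain using the transition matrix rows.
        u = rng.choices(range(n_states), weights=init_probs)[0]
        path = [u]
        for _ in range(1, n_times):
            u = rng.choices(range(n_states), weights=trans_probs[u])[0]
            path.append(u)
        # Measurement model: each response depends only on the
        # latent state occupied at the same time point.
        y = [rng.gauss(state_means[s], 1.0) for s in path]
        states.append(path)
        responses.append(y)
    return states, responses


# Assumed example values: two latent states, five units, four time points.
states, responses = simulate_lmm(
    n_units=5,
    n_times=4,
    init_probs=[0.6, 0.4],
    trans_probs=[[0.9, 0.1], [0.2, 0.8]],
    state_means=[0.0, 5.0],
)
```

Each unit receives its own latent path, so a unit can switch state between time points, while the observed responses are generated only through the current state.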
The goal of disease mapping is to study the geographical pattern and variation of a disease measured through counts and incidence rates. From a methodological point of view, this work extends LMMs to include a spatial pattern in the latent model. This extension allows the probability of being in a latent state, and the probability of moving from one latent state to another over time, to be influenced by the neighbouring areas. The model is fitted within a Bayesian framework using Gibbs and Random Metropolis-Hastings algorithms with augmented data, which allows for more efficient sampling of the model parameters. Simulation studies are also conducted to investigate the performance of the proposed model on data generated under different settings. The model is also applied to a data set of county-specific lung cancer death counts in the state of Ohio, USA, for the years 1968-1988.
Small area estimation (SAE) methods are used in inference for finite populations to obtain estimates of parameters of interest when domain sample sizes are too small to provide adequate precision for direct domain estimators. The second work develops a new area-level SAE method using LMMs. In particular, since area-level SAE models consist of a sampling model and a linking model, an LMM is used as the linking model. In a hierarchical Bayesian framework, the sampling model is introduced as the highest level of the hierarchy. In this context, data are considered aggregated because direct estimates are usually means and frequencies. Under the assumption of normality for the response variable, the model is estimated using Gibbs sampling in a data augmentation context. The application in this second work is particularly relevant: it uses yearly unemployment rates at the Local Labour Market Area level for the period 2004-2014 from the Labour Force Survey conducted by the Italian National Statistical Institute (ISTAT).
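
One way to picture the hierarchy described above is the following sketch. The notation is assumed for illustration and is not taken from the thesis: $\hat{\theta}_{it}$ denotes the direct estimate for area $i$ at time $t$, $\psi_{it}$ its sampling variance (treated as known, as is common in area-level models), $\theta_{it}$ the true area parameter, and $U_{it}$ the latent state.

```latex
\begin{align*}
\hat{\theta}_{it} \mid \theta_{it} &\sim N(\theta_{it}, \psi_{it})
  && \text{(sampling model)} \\
\theta_{it} \mid U_{it} = u &\sim N(\mu_u, \sigma_u^2)
  && \text{(linking model, measurement part)} \\
P(U_{i1} = u) = \pi_u, \quad
P(U_{it} = v \mid U_{i,t-1} = u) &= \pi_{v \mid u}
  && \text{(linking model, latent part)}
\end{align*}
```

Under this assumed formulation, the normality of both levels is what would make Gibbs sampling with data augmentation on the latent states a natural estimation strategy.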

Available at