\documentclass[12pt]{article}
%------------- fonts ------------------------------
\usepackage[english]{babel}
\usepackage[compact]{titlesec}
\usepackage[bottom]{footmisc}
%-------------- Math packages ----------------------
\usepackage{amsmath, amssymb, bm, bbm, natbib, xfrac}
\input{ee.sty}
%----------- Hyperref------------------------------
\usepackage[colorlinks = true, linkcolor = blue, citecolor = blue, urlcolor = black, breaklinks]{hyperref}
% ------------ Graphics -----------------
\usepackage{graphicx}
\usepackage{float}
\usepackage{epstopdf}
\DeclareGraphicsExtensions{.png,.pdf,.jpg,.eps}
% ------ packages for tables ------------
\usepackage{longtable, booktabs, dcolumn, multirow, siunitx}
\usepackage[hang, singlelinecheck=off, center]{caption}
\usepackage[flushleft]{threeparttable}
% ------ format of pages ---------------
\usepackage{pdflscape}
\usepackage[right=2.5cm,left=4cm,top=2cm,bottom=2cm,headsep=0.5cm,footskip=0.5cm]{geometry}
\raggedbottom
\sloppy
\pretolerance=2000
\tolerance=3000
\usepackage{afterpage}
\usepackage{setspace}
\doublespacing
%------- appendix ---------------
\usepackage[title]{appendix}
%------------- other ---------------
\usepackage{color}
\renewcommand{\baselinestretch}{1.2}
\pagestyle{plain}
\frenchspacing
% Tikz
\usepackage{tikz}
\usetikzlibrary{arrows,calc}
\usepackage{relsize}
\newcommand\W{\ensuremath{\mathit{W}}}
%----------------------------------------------------------------------------------------
% TITLE SECTION
%----------------------------------------------------------------------------------------
\begin{document}
\SweaveOpts{concordance=TRUE}
\title{Individual-specific posterior distributions from Mixed Logit models: Properties, limitations and diagnostic checks}
\author{Mauricio Sarrias}
\date{\today}
\maketitle
\begin{abstract}
\noindent Individual-specific posterior distributions are an attractive tool for disentangling the tastes of each person in the sample. However, there exist some risks and limitations regarding their use. This study reviews and summarizes the theoretical literature on the individual-specific posterior distributions derived from the Mixed Logit model, focusing on their properties, limitations and common pitfalls. Using Monte Carlo experiments, it also analyzes the behavior of some diagnostic checks proposed in the literature for assessing the reliability of such estimates in applied work. Finally, this article provides reasonable guidelines for the correct use of individual-specific posterior distributions.
\end{abstract}
\bigskip
\noindent \textsl{Key words:} mixed logit, discrete choice, conditional distribution, individual-specific estimates
\newpage
%------------------------------
\section{Introduction}
%------------------------------
Since the work of \cite{revelt2000customer} and the book by \cite{train2009discrete}, the use of conditional estimators derived from conditional posterior distributions has been an attractive tool for disentangling the tastes of each individual in the sample. Conditional estimates, put simply, allow us to learn something about respondents' tastes from their previous choices by providing their most likely location in the population distribution.
Due to the attractiveness of eliciting the tastes for each person, individual-specific estimates (or conditional estimates) derived from the Mixed Logit (MIXL) Model have received a lot of attention from the applied literature in different fields. For example, they have been used to compute willingness-to-pay (WTP) measures at the individual level \citep{sillano2005willingness, greene2005using, hensher2003taking, hensher2006deriving, hess2007posterior, sandorf2017disentangling, dumont2015individual}, to analyze the spatial dependence of tastes \citep{campbell2009using, budzinski2018using, abildtrup2013spatial}, to retrieve individual-specific attribute processing strategies \citep{hess2010using}, to create clusters or segments of individuals \citep{richter2018smart, huber2001similarity} and for predicting the future behavior of the individuals \citep{train2009discrete, dumont2015individual}.
Unfortunately, there exist some risks and limitations regarding the use of conditional estimates. For example, \cite{revelt2000customer} showed that individual-specific estimates are consistent if and only if, for a fixed number of individuals, the number of choice situations increases without bound.\footnote{For a similar study in the Latent Class Multinomial Logit model see \cite{sarrias2018individual}.} That is, we need several choice situations per individual to learn something about the preferences of each respondent: if we could observe an unlimited number of choice situations per individual, and how each individual reacts to changes in the attributes, we could in theory elucidate his or her specific tastes. However, there exist published articles using individual-specific estimates with a very low number of choice situations and without reporting any type of diagnostic check on whether the estimates are reliable enough to be used in a second-step procedure.
In addition, there still seems to be some confusion regarding the relationship between the variance of the individual-specific estimates and the variance of the population distribution of the random parameters. Some researchers use individual-specific estimates arguing that they show more plausible results in terms of their domain. For example, it is common to find in the applied literature claims such as ``\emph{conditional estimates show more reasonable estimates}'' or comments such as ``\emph{using conditional means, sign violations of the coefficients disappear}'', especially when computing individual-specific WTP. But, as briefly mentioned by \cite{daly2012assuring}, such claims ignore the fact that the variance of the conditional means will be lower than the variance of the population distribution of the random parameters (the unconditional distribution), especially when the number of choice situations per individual is low. In other words, the apparently better fit of the individual estimates in terms of signs and values might be an artifact of the statistical behavior of the conditional estimates when the number of choice situations is not large enough.
Another problem not yet analyzed in depth, and pointed out by \cite{hess2010conditional}, is the potential impact of the misspecification of the parametric population distribution on the individual estimates. As argued by \cite{hess2010conditional}, researchers should analyze the impact of assumptions made for the unconditional distributions on the shape of the conditional distributions. Failing to choose the most adequate distribution for the random parameters might invalidate the use of conditional estimates and lead to misleading conclusions.
Thus, the first objective of this paper is to review and summarize the properties, limitations and common pitfalls when using individual-specific estimates in the context of the MIXL model, and to reinforce their understanding. Relying on the previous work of \cite{revelt2000customer}, \cite{daly2012assuring} and \cite{hess2010conditional}, this study revisits the theoretical properties of the individual-specific estimator regarding its consistency and the relationship between the conditional and unconditional distributions of tastes. The second objective is to provide reasonable guidelines for the correct use of individual-specific tastes under a correctly specified model and to analyze the potential problems under misspecification, extending the previous work of \cite{hess2010conditional}. To achieve this goal, the study extends the Monte Carlo experiment carried out by \cite{revelt2000customer} by analyzing the behavior of the conditional estimates under misspecification and by including new measures as diagnostic tools.
The rest of the paper is organized as follows. Section \ref{sec:mixl} briefly reviews the MIXL model. Section \ref{sec:individual} summarizes the main theoretical properties of the individual-specific estimates. Section \ref{sec:monte} explains the Monte Carlo setup, whereas Section \ref{sec:results} presents the results. Finally, Section \ref{sec:conclusion} discusses the results and concludes.
%-------------------------------------------
\section{The Mixed Logit model}\label{sec:mixl}
%-------------------------------------------
Assume that each individual $(i = 1, ..., N)$ faces a choice among $J$ alternatives in each of $T$ choice situations.\footnote{For simplicity in the notation, we assume that all individuals face the same number of choice situations.} Then, the utility associated with each alternative $j = 1, ..., J$ for individual $i$ in choice situation $t$ is:
\begin{equation*}
U_{ijt} = \vx_{ijt}'\vbeta_i +\epsilon_{ijt},
\end{equation*}
%
where $\vx_{ijt}$ is a $K\times 1$ vector of attributes of the alternatives. The important characteristic of the MIXL model is the vector of tastes $\vbeta_i$, which is assumed to vary across individuals according to $g(\vbeta_i | \mOmega)$, where $g(\cdot)$ is some well-behaved parametric distribution with parameters given by $\mOmega$. In general, $\mOmega$ contains the mean and variance of such distribution.
Let $\vy_i = \left\lbrace y_{i1}, y_{i2}, ...., y_{iT}\right\rbrace$ be the sequence of choices made by individual $i$, and $\mX_i = \left\lbrace \vx_{i1}', \vx_{i2}', ...., \vx_{iT}'\right\rbrace$ be a matrix collecting the variables for individual $i$. If $\epsilon_{ijt}$ is IID Type 1 extreme value, then the conditional probability (conditional on the value of $\vbeta_i$) of respondent $i$'s sequence of choices is:
\begin{equation}\label{eq:conditional_prob}
f(\vy_i| \mX_i, \vbeta_i) = \prod_{t = 1}^T\prod_{j = 1}^J\left[\frac{\exp(\vx_{ijt}'\vbeta_i)}{\sum_{j = 1}^J\exp(\vx_{ijt}'\vbeta_i)}\right]^{y_{ijt}},
\end{equation}
%
where $y_{ijt} = 1$ if individual $i$ chooses alternative $j$ in choice occasion $t$, and 0 otherwise.
The unconditional probability for each individual is obtained by weighting Equation (\ref{eq:conditional_prob}) over all the possible values of $\vbeta_i$:
\begin{equation}\label{eq:unconditional_prob}
f(\vy_i| \mX_i, \mOmega) = \int f(\vy_i| \mX_i, \vbeta_i) g(\vbeta_i | \mOmega) d \vbeta_i.
\end{equation}
Equation (\ref{eq:unconditional_prob}) is basically the expected probability over all possible values of $\vbeta_i$. Thus, the unconditional probability that individual $i$ will choose alternative $j$ given the specific characteristics of their choice set and the underlying parameters is equal to the expected value of the conditional probability as it ranges over the possible values of $\vbeta_i$.
Then, the exact log-likelihood function is:
\begin{equation*}
\log L(\vtheta) = \sum_{i = 1}^N \log f(\vy_i| \mX_i, \mOmega) = \sum_{i = 1}^N \log \left[\int f(\vy_i| \mX_i, \vbeta_i) g(\vbeta_i | \mOmega) d \vbeta_i\right],
\end{equation*}
%
where $\vtheta$ is a vector that collects all the parameters of the model. Since the integrals do not have a closed-form solution, they are approximated by simulation using Monte Carlo integration. For a given value of the parameters $\vtheta$, a value of $\vbeta_i$ is drawn from $g(\vbeta_i | \mOmega)$. Using this draw, the conditional probability in Equation (\ref{eq:conditional_prob}) is computed. This process is repeated for many draws, and the average over the draws gives the simulated probability (the Monte Carlo approximation of the integral in Equation (\ref{eq:unconditional_prob})). In particular, the simulated probability for individual $i$ is:
\begin{equation*}
\widehat{f}(\vy_i| \mX_i, \mOmega) = \frac{1}{R}\sum_{r = 1}^R f(\vy_i| \mX_i, \vbeta_{ir}),
\end{equation*}
%
and the Simulated Maximum Likelihood (SML) is \citep{train2009discrete}:
\begin{equation*}
\log L_s(\vtheta) = \sum_{i = 1}^N \log \left[\frac{1}{R}\sum_{r = 1}^R f(\vy_i| \mX_i, \vbeta_{ir})\right],
\end{equation*}
%
where $\vbeta_{ir}$ is the $r$-th draw of $\vbeta_i$, and $R$ is the total number of draws.
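To fix ideas, the simulated probability can be sketched in R for a single individual. The following chunk is a minimal illustration only: it assumes independent normally distributed coefficients, and its arguments (the choice vector, the list of attribute matrices and the population parameters) are hypothetical:

<<eval=FALSE>>=
## Simulated probability for one individual: average the logit
## sequence probability over R draws of beta from g(beta | Omega).
## y: vector of length T with the chosen alternative in each occasion
## X: list of T matrices (J x K) with the attributes
sim.prob <- function(y, X, mu, sigma, R = 1000) {
  probs <- numeric(R)
  for (r in 1:R) {
    beta.r <- rnorm(length(mu), mean = mu, sd = sigma)  # draw from g()
    p.seq <- 1
    for (t in seq_along(y)) {
      v <- exp(X[[t]] %*% beta.r)        # exponentiated utilities
      p.seq <- p.seq * v[y[t]] / sum(v)  # logit probability of the choice made
    }
    probs[r] <- p.seq
  }
  mean(probs)  # Monte Carlo approximation of f(y_i | X_i, Omega)
}
@

Summing the logarithm of this quantity over individuals gives the simulated log-likelihood $\log L_s(\vtheta)$.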
After the parameters are estimated, researchers can carry out a series of analyses regarding the heterogeneity of the tastes (or preferences) of the individuals for certain attributes. For example, if the normal distribution is used, we can obtain the share of the population that places a positive value on some attribute using $\Phi(\widehat{\beta}/\widehat{\sigma}_{\beta})$, where $\widehat{\beta}$ and $\widehat{\sigma}_{\beta}$ are the estimated mean and standard deviation of the distribution of $\beta_i$, and $\Phi(\cdot)$ is the cumulative distribution function (CDF) of the standard normal distribution. We might also plot the estimated unconditional distribution of the parameters by taking (many) draws from the estimated distribution $g(\vbeta_i| \widehat{\mOmega})$.
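For instance, for a normally distributed coefficient with hypothetical estimates $\widehat{\beta} = 1$ and $\widehat{\sigma}_{\beta} = 3$, the implied share with a positive taste can be computed directly in R:

<<>>=
beta.hat  <- 1   # hypothetical estimated mean
sigma.hat <- 3   # hypothetical estimated standard deviation
pnorm(beta.hat / sigma.hat)  # share with a positive coefficient, about 0.63
@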
Although the MIXL model is quite flexible to capture the aggregated heterogeneity of tastes with respect to attributes, it does not tell us where each individual lies in the distribution of tastes in the population. The next section describes how this can be done using the individual-specific estimators.
%------------------------------------------
\section{Individual-specific estimates}\label{sec:individual}
%------------------------------------------
%----------------------------
\subsection{Estimator}
%----------------------------
The individual-specific parameters can be obtained by deriving the individual's conditional distribution based on the choices observed for that individual using Bayes's formula \citep{revelt2000customer, train2009discrete}. Explicitly, the conditional distribution for individual $i$ is given by:
\begin{equation}\label{eq:posterior}
h(\vbeta_i| \vy_i, \mX_i, \vtheta) = \frac{f(\vy_i|\mX_i, \vbeta_i) g(\vbeta_i| \mOmega)}{f(\vy_i| \mX_i, \mOmega)}= \frac{f(\vy_i|\mX_i, \vbeta_i) g(\vbeta_i| \mOmega)}{\int f(\vy_i| \mX_i, \vbeta_i)g(\vbeta_i | \mOmega) d \vbeta_i}.
\end{equation}
Thus, $h(\vbeta_i| \vy_i, \mX_i, \vtheta)$ is the conditional (on the observed choices) distribution of the individual $i$'s tastes, whereas $g(\vbeta_i| \mOmega)$ is the unconditional distribution in the population. Specifically, $h(\vbeta_i| \vy_i, \mX_i, \vtheta)$ gives us all potential values of $\vbeta$ for individual $i$, given what we observe about that individual (data) and the underlying parameters. It can also be interpreted as the density of $\vbeta_i$ in the subpopulation of individuals who would choose the sequence of choices that individual $i$ made, $\vy_i$, when facing $\mX_i$ \citep{huber2001similarity, train2009discrete}.
The individual-specific coefficients are obtained by computing the average value among all potential values for individual $i$. Thus, taking the expectation of $h(\vbeta_i| \vy_i, \mX_i, \vtheta)$ yields:
\begin{equation}\label{eq:population_expected_beta}
\begin{aligned}
\bar{\vbeta}_i & = \E\left[\vbeta_i| \vy_i, \mX_i, \vtheta\right] \\
& = \int \vbeta_i h(\vbeta_i| \vy_i, \mX_i, \vtheta)d\vbeta_i \\
& = \frac{\int \vbeta_if(\vy_i|\mX_i, \vbeta_i) g(\vbeta_i| \mOmega)d\vbeta_i}{\int f(\vy_i| \mX_i, \vbeta_i)g(\vbeta_i | \mOmega) d \vbeta_i}.
\end{aligned}
\end{equation}
It is important to mention that Equation (\ref{eq:posterior}) is also known as the posterior distribution, whereas Equation (\ref{eq:population_expected_beta}) is known as the posterior mean in a Bayesian setting with quadratic loss. Prior knowledge about the individual estimates is given by the prior distribution $g(\vbeta_i| \mOmega)$ and the sampling distribution, or likelihood function, is given by $f(\vy_i|\mX_i, \vbeta_i)$.
Another statistic used in applied research is the variance of the conditional distribution, which is given by:
\begin{equation}\label{eq:conditional_variance}
\begin{aligned}
\bar{\vsigma}_i^2 & = \var(\vbeta_i| \vy_i, \mX_i, \vtheta) \\
& = \E\left[\vbeta_i^2| \vy_i, \mX_i, \vtheta\right] -\left(\E\left[\vbeta_i| \vy_i, \mX_i, \vtheta\right]\right)^2 \\
& = \frac{\int \vbeta_i^2f(\vy_i|\mX_i, \vbeta_i) g(\vbeta_i| \mOmega)d\vbeta_i}{\int f(\vy_i| \mX_i, \vbeta_i)g(\vbeta_i | \mOmega) d \vbeta_i} - \left(\frac{\int \vbeta_if(\vy_i|\mX_i, \vbeta_i) g(\vbeta_i| \mOmega)d\vbeta_i}{\int f(\vy_i| \mX_i, \vbeta_i)g(\vbeta_i | \mOmega) d \vbeta_i}\right)^2.
\end{aligned}
\end{equation}
Thus, the conditional (posterior) distribution of tastes for individual $i$ is given by $h(\vbeta_i| \vy_i, \mX_i, \vtheta)$, with expected value (or conditional mean) given by Equation (\ref{eq:population_expected_beta}) and variance given by Equation (\ref{eq:conditional_variance}).
One key problem when obtaining estimates of $\bar{\vbeta}_i$ and $\bar{\vsigma}_i^2$ is that they do not have a closed-form solution. Again, we can rely on simulation to approximate the integrals and obtain the Bayes point estimator. For the conditional mean, the procedure consists of taking $R$ draws of $\vbeta_i$ from the population density $g(\vbeta_i | \widehat{\mOmega})$, say $\widehat{\vbeta}_{ir}$, $r = 1, ..., R$, and computing \citep{train2009discrete}:
\begin{equation}\label{eq:estimate_expected_beta_1}
\widehat{\bar{\vbeta}}_i = \widehat{\E}\left[\vbeta_i| \vy_i, \mX_i, \widehat{\vtheta}\right] = \frac{\sum_{r = 1}^R \widehat{\vbeta}_{ir} f(\vy_i|\mX_i, \widehat{\vbeta}_{ir})}{\sum_{r = 1}^R f(\vy_i|\mX_i, \widehat{\vbeta}_{ir})},
\end{equation}
%
where $\widehat{\vbeta}_{ir}$ is the $r$th draw from its distribution, and $f(\vy_i|\mX_i, \widehat{\vbeta}_{ir})$ is the estimated likelihood function for individual $i$ given $\widehat{\vbeta}_{ir}$.\footnote{As an aside, since the conditional estimates are computed for each sampled individual based on his or her observed choices, out-of-sample predictions are not possible. The unconditional estimates are more suitable for this purpose \citep{jones2004predicting}.}
Likewise, $\bar{\vsigma}_i^2$ can be estimated using \citep{hess2010conditional, greenebook}:
\begin{equation}\label{eq:estimate_var_beta_1}
\widehat{\bar{\vsigma}}_i^2 = \widehat{\var}(\vbeta_i| \vy_i, \mX_i, \widehat{\vtheta}) = \frac{\sum_{r = 1}^R \left(\widehat{\vbeta}_{ir}- \widehat{\bar{\vbeta}}_i\right)^2 f(\vy_i|\mX_i, \widehat{\vbeta}_{ir})}{\sum_{r = 1}^R f(\vy_i|\mX_i, \widehat{\vbeta}_{ir})}.
\end{equation}
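Equations (\ref{eq:estimate_expected_beta_1}) and (\ref{eq:estimate_var_beta_1}) take only a few lines of R. The chunk below is a sketch for a single random coefficient, where \texttt{seq.lik} is a hypothetical function returning the likelihood $f(\vy_i|\mX_i, \widehat{\vbeta}_{ir})$ of individual $i$'s sequence of choices:

<<eval=FALSE>>=
## Simulated conditional mean and variance for one individual,
## single random coefficient with estimated population N(mu.hat, sigma.hat^2)
conditional.moments <- function(seq.lik, mu.hat, sigma.hat, R = 1000) {
  beta.r <- rnorm(R, mean = mu.hat, sd = sigma.hat)  # draws from g()
  w      <- sapply(beta.r, seq.lik)                  # likelihood weights
  b.bar  <- sum(beta.r * w) / sum(w)                 # conditional mean
  s2.bar <- sum((beta.r - b.bar)^2 * w) / sum(w)     # conditional variance
  c(mean = b.bar, var = s2.bar)
}
@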
%------------------------------------------------------------------------------------------------
\subsection{Consistency of individual estimates and implications}\label{sec:consistency}
%------------------------------------------------------------------------------------------------
\subsubsection{$T$-asymptotic}
Researchers interested in the tastes of each sampled individual usually compute the conditional mean using Equation (\ref{eq:estimate_expected_beta_1}).\footnote{\cite{revelt2000customer} offer two additional estimators depending on how the simulation is performed. Instead of taking the population parameters $\vtheta$ as given and using the point estimates $\widehat{\vtheta}$ as in Equation (\ref{eq:estimate_expected_beta_1}), a different approach is to take into consideration the sampling distribution of $\vtheta$. Let $\mathcal{N}(\vtheta| \bar{\vtheta}, \mSigma_{\vtheta})$ be the multivariate normal density (asymptotic distribution) of $\vtheta$ with mean $\bar{\vtheta}$ and covariance $\mSigma_{\vtheta}$. Then, the expected value of the individual-level preference parameters $\vbeta_i$, conditional on the population parameters $\bar{\vtheta},\mSigma_{\vtheta}$ and on the individual choices $\vy_i$, is: $\E\left[\vbeta_i|\vy_i, \bar{\vtheta}, \mSigma_{\vtheta}\right] = \int_{\vtheta} \E\left[\vbeta_i| \vy_i, \mX_i, \vtheta\right]\mathcal{N}(\vtheta| \bar{\vtheta}, \mSigma_{\vtheta}) d\vtheta.$ A Monte Carlo approximation to this expectation is obtained by computing the empirical mean over draws of $\vtheta$ from the asymptotic distribution of the estimator. \cite{revelt2000customer} and \cite{sarrias2018individual} have shown that the estimators based on the point estimates and on the sampling distribution deliver similar results in Monte Carlo experiments. The second estimator, instead of taking draws from the asymptotic distribution, takes draws from the posterior distribution of $\vbeta_i$. Since the denominator in Equation (\ref{eq:posterior}) is a constant, we can write the following proportionality relation: $h(\vbeta_i| \vy_i, \mX_i, \vtheta)\propto f(\vy_i| \mX_i, \vbeta_i)g(\vbeta_i | \mOmega)$. Draws from the posterior $h(\vbeta_i| \vy_i, \mX_i, \vtheta)$ can then be obtained using the Metropolis-Hastings algorithm \citep{sillano2005willingness}.} Since the conditional means are generally used for second-stage procedures, such as targeted product design, spatial analysis of tastes or the computation of WTP measures, a natural question that arises is whether $\widehat{\bar{\vbeta}}_i$ can be considered an estimate of $\vbeta_i$. In other words: when is the estimated mean of the conditional distribution a reliable estimate of individual $i$'s coefficient? As \cite{revelt2000customer} and \cite{train2009discrete} have shown, we can be confident that $\widehat{\bar{\vbeta}}_i$ is a consistent estimate only when the number of choice situations that each individual faces is large enough. Strictly speaking, as $T\to \infty$ the mean of the conditional distribution converges to the true individual-specific coefficient, that is, $\widehat{\bar{\vbeta}}_i\pto \vbeta_i$: intuitively, if we could observe an unlimited number of choice situations, and how the individual reacts to changes in the attributes, we could in theory elucidate his or her specific tastes.\footnote{The reader familiar with panel data estimation can draw an analogy with the consistency of the fixed effects (FE) estimator of the individual effects. If $T$ is fixed and $N\to \infty$, as is typical in short panels, then only the FE estimator of $\vbeta$ is consistent, whereas the estimators of the individual effects are not, since the number of these parameters increases as $N$ increases.} This result can also be obtained by considering the frequentist view of the asymptotic behavior of the Bayesian estimator, which is summarized by the Bernstein-von Mises Theorem.
This theorem states that under mild conditions, the mean of the posterior distribution, $\widehat{\bar{\vbeta}}_i$, is asymptotically equivalent to the maximum likelihood estimator of $\vbeta_i$ as the sample size ($T$ in this case) grows \citep[see][for a formal exposition of this]{train2009discrete}.
To understand the mechanism behind the convergence of the conditional mean, consider Figure \ref{example}. The black curve represents the unconditional distribution $g(\vbeta_i| \mOmega)$---the aggregate population distribution of tastes---whereas the blue curves represent the conditional distributions, $h(\vbeta_i| \vy_i, \mX_i, \vtheta)$, for one specific individual and different numbers of choice situations $(T = 5, 10, 50)$. Assume that the red line, $\beta = 1$, represents the true coefficient for this particular individual in $g(\vbeta_i | \mOmega)$. We can observe that as $T$ increases (i.e., as we observe how this particular individual chooses under different configurations of the attributes), the mean of $h(\vbeta_i| \vy_i, \mX_i, \vtheta)$, $\widehat{\bar{\vbeta}}_i$, converges to the true individual coefficient. In other words, $h(\vbeta_i| \vy_i, \mX_i, \vtheta)$ should collapse onto $\vbeta_i$ as $T \to \infty$. Finally, it is important to point out that the number of choice situations per individual is often smaller than the number of individual-level coefficients, in which case the individual-level coefficients are not identified by the individual's choices alone.\footnote{The author is grateful to Kenneth Train for pointing this out.}
\begin{figure}[ht]
\caption{Conditional and unconditional distributions}\label{example}
\centering
<<echo=FALSE, fig=TRUE>>=
set.seed(2)
condi1 <- expression(paste("h(", bold(beta)[i], "|" ,bold(y)[i], ", ", bold(X)[i], " ), ", "T = 5"))
condi2 <- expression(paste("h(", bold(beta)[i], "|" ,bold(y)[i], ", ", bold(X)[i], " ), ", "T = 10"))
condi3 <- expression(paste("h(", bold(beta)[i], "|" ,bold(y)[i], ", ", bold(X)[i], " ), ", "T = 50"))
par(cex = 0.9, mar = c(4, 4, 2, 1) + 0.1)
plot(density(rnorm(100000, mean = 1, sd = 1)),
ylim = c(0, 1),
col = "gray0",
lwd = 2, lty = 1,
xlab = expression(hat(beta)[i]),
ylab = "",
main = "")
abline(v = 1, col = "red", lwd = 2)
lines(density(rnorm(10000, mean = 0, sd = 1)) , col = "slateblue4", lwd = 3, lty = 2)
lines(density(rnorm(10000, mean = 0.5, sd = 0.5)), col = "slateblue3", lwd = 3, lty = 3)
lines(density(rnorm(10000, mean = 1, sd = 0.1)), col = "slateblue2", lwd = 3, lty = 5)
legend("topright",
c(expression(paste("g(", bold(beta)[i], " | ", bold(Omega), ")")),
condi1,
condi2,
condi3),
lty = c(1, 2, 3, 5),
col = c("gray0", "slateblue4", "slateblue3", "slateblue2"))
@
\end{figure}
Another important point that can be observed in Figure \ref{example} is that the variance of the conditional distribution shrinks as more information about individual $i$'s choices becomes available. That is, the variance of the conditional distribution, $\bar{\vsigma}_i^2$, tends to zero as $T$ increases without bound. Thus, more observations per individual reduce the uncertainty in the determination of the conditional mean, reducing $\var(\vbeta_i|\vy_i, \mX_i, \vtheta)$.
In sum, there are two important conclusions regarding the mean and variance of the conditional distribution. Formally:
\begin{equation}\label{eq:result}
\widehat{\bar{\vbeta}}_i = \widehat{\E}\left[\vbeta_i| \vy_i, \mX_i, \widehat{\vtheta}\right] \pto \vbeta_i\quad \mbox{and}\quad \widehat{\bar{\vsigma}}_i^2 = \widehat{\var}(\vbeta_i| \vy_i, \mX_i, \widehat{\vtheta})\pto \vzeros\quad \mbox{as $T\to \infty$}.
\end{equation}
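The result in (\ref{eq:result}) can be illustrated numerically with a simple binary logit. In the sketch below (all values are invented for illustration), one individual with true coefficient $\beta_i = 1$, drawn from a $\rN(1,1)$ population, makes $T$ choices; as $T$ grows, the simulated conditional mean should settle near 1 and the conditional variance should shrink toward zero:

<<eval=FALSE>>=
set.seed(1)
beta.true <- 1                    # this individual's true coefficient
x <- rnorm(500)                   # attribute difference between two alternatives
y <- rbinom(500, 1, plogis(beta.true * x))   # simulated binary choices
seq.lik <- function(b, Tn) {      # f(y_i | X_i, b) for the first Tn choices
  p <- plogis(b * x[1:Tn])
  prod(p^y[1:Tn] * (1 - p)^(1 - y[1:Tn]))
}
for (Tn in c(5, 50, 500)) {
  beta.r <- rnorm(5000, mean = 1, sd = 1)   # draws from the population N(1, 1)
  w  <- sapply(beta.r, seq.lik, Tn = Tn)    # likelihood weights
  bi <- sum(beta.r * w) / sum(w)            # conditional mean
  s2 <- sum((beta.r - bi)^2 * w) / sum(w)   # conditional variance
  cat("T =", Tn, " mean =", round(bi, 2), " var =", round(s2, 3), "\n")
}
@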
\subsubsection{$N$-asymptotic}
A final question remains: what about the asymptotics in terms of the sample size, $N$? Does increasing the number of individuals help to achieve consistency of the individual-level estimators? When the number of choice situations is fixed but the number of individuals goes to infinity, the population parameters $\vtheta$ are estimated with less bias, improving the estimate $\widehat{\vtheta}$ and, hence, the point estimates $\widehat{\bar{\vbeta}}_i$ and $\widehat{\bar{\vsigma}}_i^2$. However, increasing the sample size while keeping the number of choice situations fixed does not make the mean of the conditional distribution converge to the true individual parameter \citep{revelt2000customer}. In other words, a large sample size with a low number of choice situations per individual does not guarantee consistency of the conditional means.
%------------------------------------------------------------------------------------------------
\subsection{Relationship between unconditional and conditional distributions}\label{sec:relationship}
%------------------------------------------------------------------------------------------------
The result in (\ref{eq:result}) has several consequences, which can be used as diagnostic tools in applied research. For a correctly specified model at the true population parameters, the conditional distributions aggregated over all individuals should equal the unconditional (population) distribution \citep{revelt2000customer, hensher2005applied}\footnote{For a proof of this result see \cite{revelt2000customer} or \citet[section 8.1]{hensher2005applied}.}, and consequently the moments of the aggregated conditional means should equal the moments of $g(\vbeta_i|\mOmega)$. This implies, first, that the sample average of the conditional means over individuals should converge to the mean of the population distribution. The intuition is that since each conditional distribution collapses onto the true individual parameter, so does its expected value. Therefore, for a sufficiently large $T$, the conditional means converge to the most probable position of each individual in $g(\vbeta_i|\mOmega)$, and their distribution must coincide with $g(\vbeta_i|\mOmega)$. Formally, let $\widehat{\vmu}_{\bar{\vbeta}_i} = \frac{1}{N}\sum_{i = 1}^N \widehat{\bar{\vbeta}}_i$ be the sample average of the estimated conditional means and $\vmu$ be the mean of the unconditional distribution. Then $\widehat{\vmu}_{\bar{\vbeta}_i}\pto \vmu$ as $T$ increases without bound.
Second, the variance of the conditional means should converge to the population variance of the unconditional distribution. To see this point, recall that the unconditional variance of $\vbeta_i$ can be written as:
\begin{equation*}\label{eq:unconditional_variance}
\begin{aligned}
\var\left(\vbeta_i\right) & = \E\left[\var(\vbeta_i| \vy_i, \mX_i, \vtheta) \right] + \var\left[\E\left[\vbeta_i| \vy_i, \mX_i, \vtheta\right]\right] \\
\vsigma^2& = \E\left[\bar{\vsigma}_i^2 \right] + \var\left[\bar{\vbeta}_i\right],
\end{aligned}
\end{equation*}
%
where $\var\left(\vbeta_i\right) = \vsigma^2$ is the variance of the unconditional distribution $g(\vbeta_i|\mOmega)$, and $\var(\vbeta_i| \vy_i, \mX_i, \vtheta)$ and $\E\left[\vbeta_i| \vy_i, \mX_i, \vtheta\right]$ are the variance and expectation of the conditional distribution, respectively. Since $\widehat{\bar{\vsigma}}_i^2\pto \vzeros$ as $T\to\infty$, it follows that $\var(\widehat{\bar{\vbeta}}_i)\pto \vsigma^2$ as $T\to\infty$. In words, as the number of choice situations per individual increases without bound, the variance of the conditional means converges to the variance of the unconditional distribution.
Moreover, since $\widehat{\bar{\vsigma}}_i^2 \geq 0$ when $T$ is not sufficiently large, so that $\E\left[\widehat{\bar{\vsigma}}_i^2\right] \geq 0$, it is true that:
\begin{equation*}
\var\left(\vbeta_i\right) \geq \var\left[\widehat{\bar{\vbeta}}_i\right].
\end{equation*}
That is, the distribution of the conditional means generally has a narrower range than the unconditional distribution. Although this property has been pointed out by several authors \citep[see for example][]{revelt2000customer,hensher2003taking, greene2005using,hess2010conditional, daly2012assuring, sarrias2018individual}, it has not been considered in most of the empirical studies that rely on individual-specific estimates. For example, several articles have found that individuals' tastes show more plausible and less counter-intuitive results in terms of signs when conditional means are used. This might be true, or it might be a consequence of the result above. As \cite{daly2012assuring} state: ``\emph{The procedure mistakenly ignores the fact that conditional distributions are derived from, and must aggregate to, the unconditional distribution. In particular, the unconditional population variance is equal to the variance of the conditional means plus the variance of the unconditional distribution around the conditional means. The variance of conditional means is less than the unconditional variance not because it is more reasonable estimate, but rather because \textbf{it incorrectly excludes the variance around the conditional means}.}'' Therefore, the conditional means are estimated with uncertainty, which is usually ignored in practice, and this uncertainty decreases as more choices per individual are observed.
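A conjugate Gaussian example, although not the logit model itself, makes the mechanics of this variance decomposition concrete. If $\beta_i\sim \rN(\mu, \sigma^2)$ and a noisy signal of $\beta_i$ with variance $v^2$ is observed, standard normal-normal updating implies that the variance of the conditional means is $\sigma^4/(\sigma^2 + v^2) < \sigma^2$, while the average conditional variance $\sigma^2 v^2/(\sigma^2 + v^2)$ accounts exactly for the difference:

<<>>=
sigma2 <- 1; v2 <- 4                          # illustrative variances
var.cond.means <- sigma2^2 / (sigma2 + v2)    # variance of conditional means: 0.2
mean.cond.var  <- sigma2 * v2 / (sigma2 + v2) # average conditional variance: 0.8
var.cond.means + mean.cond.var                # recovers sigma2 = 1
@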
These results provide three potential tools for diagnosing how reliable the estimates at the individual level are \citep{allenby1998marketing, revelt2000customer, hess2010conditional}. All of them aim to evaluate how similar the distribution of the conditional means across respondents is to the population distribution. Consider the following two ratios: $\widehat{\vmu}_{\bar{\vbeta}_i}/ \widehat{\vmu}$ and $sd(\widehat{\bar{\vbeta}}_i)/ \widehat{\vsigma}$. If the model is correctly specified and accurately estimated, the moments of the conditional means should equal the moments of the population distribution and, consequently, these two ratios should be very close to 1. If they are not, the difference could be due to (1) misspecification of the population distribution, (2) an insufficient number of draws, (3) an inadequate sample size, (4) the maximum likelihood routine converging to a local rather than the global maximum, (5) not enough choices per person, and/or (6) a bad experimental design \citep{revelt2000customer}. A more formal method to evaluate whether both distributions are equal is to use, for example, the Kolmogorov-Smirnov (KS) test, where the null hypothesis is that both distributions are equal.
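In R, these diagnostics can be computed directly from the estimated conditional means. In the sketch below, \texttt{bi.hat}, \texttt{mu.hat} and \texttt{sigma.hat} are hypothetical objects holding the $N$ estimated conditional means and the estimated population parameters of a normally distributed coefficient:

<<eval=FALSE>>=
## bi.hat: vector of estimated conditional means (one per respondent)
## mu.hat, sigma.hat: estimated population mean and standard deviation
mean(bi.hat) / mu.hat      # ratio of means, should be close to 1
sd(bi.hat)   / sigma.hat   # ratio of standard deviations, should be close to 1
## Two-sample KS test: conditional means vs. draws from the fitted N(mu, sigma^2)
ks.test(bi.hat, rnorm(10000, mean = mu.hat, sd = sigma.hat))
@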
Finally, it is important to note that all these results also apply to conditional means estimated from models re-parameterized in WTP space, as suggested by \cite{train2005discrete}.
%----------------------------
\section{Monte Carlo setup}\label{sec:monte}
%----------------------------
To understand how well the diagnostic measures for individual-specific estimates perform under a well-specified model and different numbers of choice situations, a Monte Carlo experiment with a setup similar to that of \cite{revelt2000customer} is carried out.\footnote{The program and code for the experiment can be downloaded at \url{https://data.mendeley.com/datasets/k8v7ns2xv3/1}.} Specifically, it is assumed that the true utility for individual $i$ when choosing alternative $j$ in choice occasion $t$ is given by:
\begin{equation}\label{eq:u-mc}
U_{ijt}= \beta_1 x_{1,ijt} + \beta_{2i}x_{2, ijt} + \beta_{3i}x_{3, ijt} + \beta_{4i}x_{4, ijt} + \beta_{5i}x_{5, ijt} + \epsilon_{ijt},
\end{equation}
%
where $\epsilon_{ijt}$ is an iid type 1 extreme value error term; the five attributes $x_{k, ijt}$, $k = 1, \ldots, 5$, are generated as independent standard normal variables; $\beta_1$ is a fixed coefficient with true value equal to 1; and the number of alternatives is 3 ($J = 3$). The true population densities for the random parameters are set as follows:
\begin{itemize}
\item $\beta_{2i}\sim \rN(1, 1)$, so that about 84\% of the individuals have a positive coefficient,
\item $\beta_{3i}\sim \rN(1, 3^2)$, so that about 63\% of the individuals have a positive coefficient,
\item $\beta_{4i}= \exp(\rN(0.5, 0.5^2))$, such that $\beta_{4i}\sim \log \rN(0.5, 0.5^2)$. Thus $\E(\beta_{4i}) = \exp(0.5 + 0.5^2/2) = 1.868$, and $sd(\beta_{4i}) = 1.868\times \sqrt{(\exp(0.5^2) - 1)} = 0.996$,
\item $\beta_{5i}= \exp(\rN(0.5, 1))$, such that $\beta_{5i}\sim \log \rN(0.5, 1)$. Thus $\E(\beta_{5i}) = \exp(0.5 + 1/2) = 2.718$, and $sd(\beta_{5i}) = 2.718\times \sqrt{(\exp(1) - 1)} = 3.563$.
\end{itemize}
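As a sketch, the four random coefficients can be generated in R as follows (the vector names are illustrative); the last lines verify the log-normal moments stated above:

```r
set.seed(123)
N <- 300
b2 <- rnorm(N, mean = 1, sd = 1)            # N(1, 1)
b3 <- rnorm(N, mean = 1, sd = 3)            # N(1, 3^2)
b4 <- exp(rnorm(N, mean = 0.5, sd = 0.5))   # log-normal LN(0.5, 0.5^2)
b5 <- exp(rnorm(N, mean = 0.5, sd = 1))     # log-normal LN(0.5, 1)
# Implied moments of b4: mean = 1.868, sd = 0.996
m4 <- exp(0.5 + 0.5^2 / 2)
s4 <- m4 * sqrt(exp(0.5^2) - 1)
# Shares of positive coefficients for b2 and b3: pnorm(1) = 0.84, pnorm(1/3) = 0.63
```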
To understand the behavior of the conditional estimates and the diagnostic measures, the experiment is run for different numbers of choice situations. In particular, similar to \cite{revelt2000customer} and \cite{sarrias2018individual}, we choose $T = \left\lbrace 1, 5, 10, 20, 50\right\rbrace$.
To keep the estimation time manageable, in each experiment $S = 100$ samples of size $N = 300$ are created, and in each sample $\epsilon_{ijt}$ and the five attributes are randomly generated. The alternative selected by each of the 300 individuals is the one that delivers the highest utility according to Equation (\ref{eq:u-mc}). The fixed and random parameters are created before running the experiment and held fixed in each trial.
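The choice-generation step can be sketched as follows; \texttt{simulate\_choice} is an illustrative helper (not part of the original program) that draws attributes, adds type 1 extreme value errors via the inverse-CDF transform, and returns the utility-maximizing alternative:

```r
# One synthetic choice occasion for an individual with coefficient vector `beta`.
simulate_choice <- function(beta, J = 3, K = 5) {
  X   <- matrix(rnorm(J * K), J, K)   # standard normal attributes, one row per alternative
  eps <- -log(-log(runif(J)))         # iid type 1 extreme value (Gumbel) draws
  U   <- drop(X %*% beta) + eps       # utilities as in Eq. (u-mc)
  which.max(U)                        # chosen alternative
}
```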
As shown by \cite{revelt2000customer}, the number of Halton draws used in the simulation does not seem to affect the results when more than 100 Halton draws are used. Thus, 300 Halton draws per random parameter and individual are used throughout the experiment.
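To make this simulation step concrete, here is a minimal base-R sketch of a one-dimensional Halton sequence and of the simulated conditional mean it feeds into: each draw from a normal population distribution is weighted by the likelihood of the individual's observed choices. The function \texttt{loglik\_i} is a hypothetical stand-in for the log-likelihood of person $i$'s $T$ choices at a given coefficient value:

```r
# Base-b Halton sequence (first n terms, index starting at 1).
halton <- function(n, base = 2) {
  sapply(seq_len(n), function(i) {
    h <- 0; f <- 1 / base
    while (i > 0) { h <- h + f * (i %% base); i <- i %/% base; f <- f / base }
    h
  })
}

# Simulated posterior (conditional) mean of beta_i: weight each Halton-based
# draw from N(mu, sigma^2) by the likelihood of i's observed choice sequence.
cond_mean <- function(loglik_i, mu, sigma, R = 300) {
  draws <- mu + sigma * qnorm(halton(R, base = 2))
  w <- exp(sapply(draws, loglik_i))
  sum(draws * w) / sum(w)
}
```

With a flat likelihood the weights are constant and the conditional mean collapses to the population mean, as expected.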
For each experiment $s = 1, \ldots, S$, the conditional mean of each random parameter, $\widehat{\bar{\beta}}_{is}$, is computed using Equation (\ref{eq:estimate_expected_beta_1}), and the standard deviation of the conditional distribution, $\widehat{\bar{\sigma}}_{is}$, using the square root of Equation (\ref{eq:estimate_var_beta_1}). After running the experiments, we compute the following statistics averaged over the $S$ samples:
\begin{itemize}
\item Sample mean of the conditional means:
\begin{equation*}
\widehat{\mu}_{\bar{\beta}_i} = \frac{1}{S}\sum_{s = 1}^S \left(\frac{1}{N}\sum_{i = 1}^{300}\widehat{\bar{\beta}}_{is}\right).
\end{equation*}
\item Standard deviation of the conditional means:
\begin{equation*}
sd(\widehat{\bar{\beta}}_i) =\frac{1}{S} \sum_{s = 1}^S \left(\frac{1}{(N - 1)} \sum_{i = 1}^{300}\left[\widehat{\bar{\beta}}_{is} - \overline{\widehat{\bar{\beta}}}_{s}\right]^2\right)^{1/2}.
\end{equation*}
\item The sample mean of the standard deviation of the conditional distribution:
\begin{equation*}
\overline{\widehat{\sigma}}_i = \frac{1}{S} \sum_{s = 1}^S \left(\frac{1}{N} \sum_{i = 1}^{300}\widehat{\bar{\sigma}}_{is}\right).
\end{equation*}
\item The percentage of positive conditional means:
\begin{equation*}
\% > 0 = \frac{1}{S}\sum_{s = 1}^S \left(\frac{1}{N} \sum_{i = 1}^{300} \mathbbm{1}\left[\widehat{\bar{\beta}}_{is} > 0\right]\right).
\end{equation*}
\item As an additional diagnostic tool, for each trial we compute two tests that evaluate how similar the distribution of the conditional means is to the true population distribution. The first test is based on Lepage's $D$ statistic (\emph{Lp}), which compares the location and scale of two distributions. In particular, let $\delta = \widehat{\mu}_{\bar{\beta}_i} - \mu$ and $\gamma^2 = \frac{\var(\widehat{\bar{\beta}}_i)}{\sigma^2}$. Then the null hypothesis is $H_0: \delta = 0$ and $\gamma^2 = 1$. The second test is the traditional Kolmogorov-Smirnov (\emph{KS}) test, which quantifies the distance between the two empirical distributions; its null hypothesis is that both samples are drawn from the same distribution. The p-values of both tests are saved for each experiment and their averages over $S$ are reported.
\item The absolute difference bias:
\begin{equation*}
Bias = \frac{1}{S}\sum_{s = 1}^S \left(\frac{1}{N} \sum_{i = 1}^{300} \left\lvert\widehat{\bar{\beta}}_{is} - \beta_i \right\rvert \right).
\end{equation*}
\end{itemize}
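The statistics above can be collected with a simplified stand-in for the helper used in the code below (illustrative only; the actual helper, \texttt{stats\_bi}, lives in \texttt{helpers.R}):

```r
# Summary statistics for an N x S matrix `bhat` of conditional means
# (rows = individuals, columns = Monte Carlo samples) against the
# true N-vector `btrue` of individual coefficients.
mc_stats <- function(bhat, btrue) {
  list(mean_cm = mean(colMeans(bhat)),              # mean of conditional means
       sd_cm   = mean(apply(bhat, 2, sd)),          # sd of conditional means
       pct_pos = mean(colMeans(bhat > 0)),          # share of positive means
       bias    = mean(colMeans(abs(bhat - btrue)))) # absolute-difference bias
}
```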
%---------------------
\section{Results}\label{sec:results}
%---------------------
%----------------------------------------------------
\subsection{Revisiting \cite{revelt2000customer}}
%---------------------------------------------------
Similar in spirit to \cite{revelt2000customer}'s work, this section analyzes the behavior of the conditional means and the diagnostic measures under a well-specified model.
Table \ref{tableind} presents the Monte Carlo results averaged over $S$, where $S$ is the total number of samples for which the SML converged.\footnote{Samples for which the iteration limit (300) was exceeded or the optimization algorithm (BHHH) found a flat region are dropped.} The estimations were carried out using the gmnl package in R \citep{sarrias2017}.
First, as in \cite{revelt2000customer} and \cite{train2009discrete}, it can be observed that as $T$ increases the sample mean of the conditional means converges to the true mean of the unconditional distribution for our four parameters: $\widehat{\mu}_{\bar{\beta}_i}\pto \mu$. For example, for both normal parameters $(\beta_{i2}, \beta_{i3})$, the mean of the conditional means across individuals converges to 1. The same occurs for $\beta_{i4}$ and $\beta_{i5}$, which converge to 1.87 and 2.72, respectively. As mentioned before, under a well-specified model the average of the conditional means should be close to the unconditional mean. Thus, a good diagnostic tool is to compute $\widehat{\mu}_{\bar{\beta}_i}/\mu$ \citep{allenby1998marketing, revelt2000customer, train2009discrete, hess2010conditional}. Considering $T = 10$---the most typical number of choice situations in real choice experiments---the sample mean of the conditional means represents 98\%, 94\%, 97\% and 96\% of the true population mean, respectively. Therefore, if the model is well-specified and consistently estimated, $\widehat{\mu}_{\bar{\beta}_i}$ should be very close to $\mu$ (with low bias) for a reasonable number of choice situations.
<<echo = FALSE>>=
rm(list = ls(all = TRUE))
load("../Data/MIXL_MC_1.RData")
source('../Manuscript/helpers.R')
library('xtable')
library('knitr')
library("NSM3")
@
\tabcolsep=0.06cm
<<echo = FALSE, results = tex>>=
# Table for results
Table_1 <- matrix(NA, nrow = 10, ncol = 16)
for (t in 1:5) {
out <- stats_bi(bhat = bi2.sim.e1[, , t],
se = si2.sim.e1[, , t],
true = b2.e1,
drop = 1000,
mess = mess.e1[, t],
dis.test = TRUE)
out2 <- stats_bi(bhat = bi3.sim.e1[, , t],
se = si3.sim.e1[, , t],
true = b3.e1,
drop = 1000,
mess = mess.e1[, t],
dis.test = TRUE)
out3 <- stats_bi(bhat = bi4.sim.e1[, , t],
se = si4.sim.e1[, , t],
true = b4.e1,
drop = 1000,
mess = mess.e1[, t],
dis.test = TRUE)
out4 <- stats_bi(bhat = bi5.sim.e1[, , t],
se = si5.sim.e1[, , t],
true = b5.e1,
drop = 1000,
mess = mess.e1[, t],
dis.test = TRUE)
Table_1[t, 1] <- mean(out$mean.S)
Table_1[t, 2] <- mean(out$sd.S)
Table_1[t, 3] <- mean(out$av.se)
Table_1[t, 4] <- mean(out$av.pos)
Table_1[t, 5] <- mean(out$ks)
Table_1[t, 6] <- mean(out$plepage)
Table_1[t, 7] <- out$bias_1
Table_1[t, 8] <- out$S
Table_1[t, 9] <- mean(out2$mean.S)
Table_1[t, 10] <- mean(out2$sd.S)
Table_1[t, 11] <- mean(out2$av.se)
Table_1[t, 12] <- mean(out2$av.pos)
Table_1[t, 13] <- mean(out2$ks)
Table_1[t, 14] <- mean(out2$plepage)
Table_1[t, 15] <- out2$bias_1
Table_1[t, 16] <- out2$S
Table_1[t + 5, 1] <- mean(out3$mean.S)
Table_1[t + 5, 2] <- mean(out3$sd.S)
Table_1[t + 5, 3] <- mean(out3$av.se)
Table_1[t + 5, 4] <- mean(out3$av.pos)
Table_1[t + 5, 5] <- mean(out3$ks)
Table_1[t + 5, 6] <- mean(out3$plepage)
Table_1[t + 5, 7] <- out3$bias_1
Table_1[t + 5, 8] <- out3$S
Table_1[t + 5, 9] <- mean(out4$mean.S)
Table_1[t + 5, 10] <- mean(out4$sd.S)
Table_1[t + 5, 11] <- mean(out4$av.se)
Table_1[t + 5, 12] <- mean(out4$av.pos)
Table_1[t + 5, 13] <- mean(out4$ks)
Table_1[t + 5, 14] <- mean(out4$plepage)
Table_1[t + 5, 15] <- out4$bias_1
Table_1[t + 5, 16] <- out4$S
}
## Table for latex
rownames(Table_1) <- c("T = 1", "T = 5", "T = 10", "T = 20", "T = 50",
" T = 1", " T = 5", " T = 10", " T = 20", " T = 50")
Table_1 <- xtable(Table_1)
caption(Table_1) <- "Simulation results for MIXL: Conditional means. The results are averaged over the number of samples $S$ for which the SML converged.\\label{tableind}"
align(Table_1) <- "l|cccccccc|cccccccc"
digits(Table_1) <- c(0, rep(3, 7), 0, rep(3, 7), 0)
addtorow <- list()
addtorow$pos <- list(0, 0, 5, 5)
addtorow$command <- c("\\midrule & \\multicolumn{8}{c}{$\\rN(1, 1)$} & \\multicolumn{8}{c}{$\\rN(1, 3^2)$} \\\\\n",
"& $\\widehat{\\mu}_{{\\bar{\\beta}}_i}$ & \\emph{$sd(\\widehat{\\bar{\\beta}}_i)$} & \\emph{$\\bar{\\widehat{\\sigma}}_i$} & \\emph{\\%>0} & \\emph{KS} & \\emph{Lp} & \\emph{Bias} & \\emph{S}
& $\\widehat{\\mu}_{{\\bar{\\beta}}_i}$ & \\emph{$sd(\\widehat{\\bar{\\beta}}_i)$} & \\emph{$\\bar{\\widehat{\\sigma}}_i$} & \\emph{\\%>0} & \\emph{KS} & \\emph{Lp} & \\emph{Bias} & \\emph{S} \\\\\n",
"\\midrule & \\multicolumn{8}{c}{$LN(0.5, 0.5^2)$} & \\multicolumn{8}{c}{$LN(0.5, 1)$} \\\\\n",
"& $\\widehat{\\mu}_{{\\bar{\\beta}}_i}$ & \\emph{$sd(\\widehat{\\bar{\\beta}}_i)$} & \\emph{$\\bar{\\widehat{\\sigma}}_i$} & \\emph{\\%>0} & \\emph{KS} & \\emph{Lp} & \\emph{Bias} & \\emph{S}
& $\\widehat{\\mu}_{{\\bar{\\beta}}_i}$ & \\emph{$sd(\\widehat{\\bar{\\beta}}_i)$} & \\emph{$\\bar{\\widehat{\\sigma}}_i$} & \\emph{\\%>0} & \\emph{KS} & \\emph{Lp} & \\emph{Bias} & \\emph{S} \\\\\n")
print(Table_1,
booktabs = TRUE,
tabular.environment = "longtable",
add.to.row = addtorow,
include.colnames = FALSE,
include.rownames = TRUE,
latex.environments = "center",
table.placement = "ht",
caption.placement = "top",
size = "scriptsize",
hline.after = c(-1, 0, 5, 10))
@
Second, as $T$ increases the mean of the conditional standard deviations, $\bar{\widehat{\sigma}}_i$, decreases, whereas the standard deviation of the conditional means, $sd(\widehat{\bar{\beta}}_i)$, increases and converges to the standard deviation of the population distribution, $\sigma$. Computing $sd(\widehat{\bar{\beta}}_i)/\sigma$ for $T = 10$, the standard deviation of the conditional means is about 73\%, 90\%, 62\% and 84\% of the true standard deviation of the unconditional distribution, respectively.
The results obtained for the conditional mean and standard deviation emphasize two important points. First, the rate of convergence of the mean and the variance of the conditional means is not the same. The mean of the conditional means is a consistent estimator of the population mean even with a small $T$, while the variance of the conditional means is biased downward for the population variance for any finite $T$ and becomes consistent if and only if $T$ rises without bound.\footnote{I am grateful to one of the reviewers for pointing this out.} Second, the moments of the distribution of the conditional means should converge to the moments of the unconditional distribution as the number of choice situations increases. This can also be observed in Figure \ref{density_true_bi}, which shows the true population distribution alongside the distribution of the conditional means for the 300 individuals (averaged over the $S$ datasets) for each $T$. For a small number of choice situations the conditional variance for each individual, $\widehat{\bar{\sigma}}_i^2$, is high, resulting in more concentrated conditional means. But, as $T$ increases, $\widehat{\bar{\sigma}}_i^2$ decreases and $sd(\widehat{\bar{\beta}}_i)$ converges to $\sigma$. It is also important to note that when $T$ is low, the percentage of individuals with positive coefficients tends to be overestimated when the true population distribution is normal. Table \ref{tableind} shows that the percentage of positive conditional means converges to the true percentage of positive coefficients in the population distribution---84\% and 63\%, respectively---when we are able to observe more choice situations per individual. This result is very important for practitioners, since the usual claim that \emph{``the conditional means are more reasonable estimates because they do not present sign violations''} is misleading \citep{daly2012assuring}.
This result is due to the fact that when the number of choice situations is low the variance of the population distribution will always be greater than the variance of the conditional means: $\widehat{\sigma}_{\beta_i}^2 >\var(\widehat{\bar{\beta}}_i)$.
\begin{figure}[H]
\centering
<<echo = FALSE, fig = TRUE>>=
out1_b1 <- stats_bi(bhat = bi2.sim.e1[, , 1],
se = si2.sim.e1[, , 1],
true = b2.e1,
drop = 100,
mess = mess.e1[, 1])
out1_b2 <- stats_bi(bhat = bi2.sim.e1[, , 2],
se = si2.sim.e1[, , 2],
true = b2.e1,
drop = 100,
mess = mess.e1[, 2])
out1_b3 <- stats_bi(bhat = bi2.sim.e1[, , 3],
se = si2.sim.e1[, , 3],
true = b2.e1,
drop = 100,
mess = mess.e1[, 3])
out1_b4 <- stats_bi(bhat = bi2.sim.e1[, , 4],
se = si2.sim.e1[, , 4],
true = b2.e1,
drop = 100,
mess = mess.e1[, 4])
out1_b5 <- stats_bi(bhat = bi2.sim.e1[, , 5],
se = si2.sim.e1[, , 5],
true = b2.e1,
drop = 100,
mess = mess.e1[, 5])
out2_b1 <- stats_bi(bhat = bi3.sim.e1[, , 1],
se = si3.sim.e1[, , 1],
true = b3.e1,
drop = 100,
mess = mess.e1[, 1])
out2_b2 <- stats_bi(bhat = bi3.sim.e1[, , 2],
se = si3.sim.e1[, , 2],
true = b3.e1,
drop = 100,
mess = mess.e1[, 2])
out2_b3 <- stats_bi(bhat = bi3.sim.e1[, , 3],
se = si3.sim.e1[, , 3],
true = b3.e1,
drop = 100,
mess = mess.e1[, 3])
out2_b4 <- stats_bi(bhat = bi3.sim.e1[, , 4],
se = si3.sim.e1[, , 4],
true = b3.e1,
drop = 100,
mess = mess.e1[, 4])
out2_b5 <- stats_bi(bhat = bi3.sim.e1[, , 5],
se = si3.sim.e1[, , 5],
true = b3.e1,
drop = 100,
mess = mess.e1[, 5])
out3_b1 <- stats_bi(bhat = bi4.sim.e1[, , 1],
se = si4.sim.e1[, , 1],
true = b4.e1,
drop = 100,
mess = mess.e1[, 1])
out3_b2 <- stats_bi(bhat = bi4.sim.e1[, , 2],
se = si4.sim.e1[, , 2],
true = b4.e1,
drop = 100,
mess = mess.e1[, 2])
out3_b3 <- stats_bi(bhat = bi4.sim.e1[, , 3],
se = si4.sim.e1[, , 3],
true = b4.e1,
drop = 100,
mess = mess.e1[, 3])
out3_b4 <- stats_bi(bhat = bi4.sim.e1[, , 4],
se = si4.sim.e1[, , 4],
true = b4.e1,
drop = 100,
mess = mess.e1[, 4])
out3_b5 <- stats_bi(bhat = bi4.sim.e1[, , 5],
se = si4.sim.e1[, , 5],
true = b4.e1,
drop = 100,
mess = mess.e1[, 5])
out4_b1 <- stats_bi(bhat = bi5.sim.e1[, , 1],
se = si5.sim.e1[, , 1],
true = b5.e1,
drop = 100,
mess = mess.e1[, 1])
out4_b2 <- stats_bi(bhat = bi5.sim.e1[, , 2],
se = si5.sim.e1[, , 2],
true = b5.e1,
drop = 100,
mess = mess.e1[, 2])
out4_b3 <- stats_bi(bhat = bi5.sim.e1[, , 3],
se = si5.sim.e1[, , 3],
true = b5.e1,
drop = 100,
mess = mess.e1[, 3])
out4_b4 <- stats_bi(bhat = bi5.sim.e1[, , 4],
se = si5.sim.e1[, , 4],
true = b5.e1,
drop = 100,
mess = mess.e1[, 4])
out4_b5 <- stats_bi(bhat = bi5.sim.e1[, , 5],
se = si5.sim.e1[, , 5],
true = b5.e1,
drop = 100,
mess = mess.e1[, 5])
par(mfrow = c(2, 2), cex = 0.6, mar = c(4, 4, 2, 1) + 0.1)
plot(density(b2.e1), lwd = 2, lty = 1,
ylim = c(0, 0.7),
xlab = expression(hat(E)(beta[i])),
main = "A: N(1, 1) as population distribution",
col = "gray0")
lines(density(out1_b1$av.bi), col = "red", lwd = 2, lty = 2)
lines(density(out1_b2$av.bi), col = "blue", lwd = 2, lty = 3)
lines(density(out1_b3$av.bi), col = "purple", lwd = 2, lty = 4)
lines(density(out1_b4$av.bi), col = "green", lwd = 2, lty = 5)
lines(density(out1_b5$av.bi), col = "orange", lwd = 2, lty = 6)
legend("topright",
c("True", "T = 1", "T = 5", "T = 10", "T = 20", "T = 50"),
lty = c(1, 2, 3, 4, 5, 6),
col = c("black", "red", "blue", "purple", "green", "orange"))
plot(density(b3.e1), lwd = 2, lty = 1,
ylim = c(0, 0.25),
xlab = expression(hat(E)(beta[i])),
main = "B: N(1, 9) as population distribution",
col = "gray0")
lines(density(out2_b1$av.bi), col = "red", lwd = 2, lty = 2)
lines(density(out2_b2$av.bi), col = "blue", lwd = 2, lty = 3)
lines(density(out2_b3$av.bi), col = "purple", lwd = 2, lty = 4)
lines(density(out2_b4$av.bi), col = "green", lwd = 2, lty = 5)
lines(density(out2_b5$av.bi), col = "orange", lwd = 2, lty = 6)
legend("topright",
c("True", "T = 1", "T = 5", "T = 10", "T = 20", "T = 50"),
lty = c(1, 2, 3, 4, 5, 6),
col = c("black", "red", "blue", "purple", "green", "orange"))
plot(density(b4.e1), lwd = 2, lty = 1,
ylim = c(0, 0.7),
xlab = expression(hat(E)(beta[i])),
main = "C: LN(0.5, 0.5^2) as population distribution",
col = "gray0")
lines(density(out3_b1$av.bi), col = "red", lwd = 2, lty = 2)
lines(density(out3_b2$av.bi), col = "blue", lwd = 2, lty = 3)
lines(density(out3_b3$av.bi), col = "purple", lwd = 2, lty = 4)
lines(density(out3_b4$av.bi), col = "green", lwd = 2, lty = 5)
lines(density(out3_b5$av.bi), col = "orange", lwd = 2, lty = 6)
legend("topright",
c("True", "T = 1", "T = 5", "T = 10", "T = 20", "T = 50"),
lty = c(1, 2, 3, 4, 5, 6),
col = c("black", "red", "blue", "purple", "green", "orange"))
plot(density(b5.e1), lwd = 2, lty = 1,
ylim = c(0, 0.6),
xlab = expression(hat(E)(beta[i])),
main = "D: LN(0.5, 1) as population distribution",
col = "gray0")
lines(density(out4_b1$av.bi), col = "red", lwd = 2, lty = 2)
lines(density(out4_b2$av.bi), col = "blue", lwd = 2, lty = 3)
lines(density(out4_b3$av.bi), col = "purple", lwd = 2, lty = 4)
lines(density(out4_b4$av.bi), col = "green", lwd = 2, lty = 5)
lines(density(out4_b5$av.bi), col = "orange", lwd = 2, lty = 6)
legend("topright",
c("True", "T = 1", "T = 5", "T = 10", "T = 20", "T = 50"),
lty = c(1, 2, 3, 4, 5, 6),
col = c("black", "red", "blue", "purple", "green", "orange"))
@
\caption{Kernel densities of conditional means estimates. The true distribution corresponds to the unconditional (population) distribution. The kernel densities for each number of choice situations show the estimated conditional mean for each individual averaged over $S$ constructed data sets.}\label{density_true_bi}
\end{figure}
Although Figure \ref{density_true_bi} gives a graphical idea of how fast the conditional distribution converges to the unconditional distribution, a statistical measure is needed to assess whether both distributions are the same under a well-specified model. Table \ref{tableind} shows the p-values for both the \emph{KS} and \emph{Lp} tests averaged over $S$. The results depend on the type of distribution and the degree of individual heterogeneity in the population distribution. For example, for the normally distributed coefficients, the number of choice situations needed to obtain a p-value large enough not to reject the null is lower when the true population distribution is $\rN(1, 9)$ rather than $\rN(1, 1)$: $T = 5$ vs.\ $T = 50$. If the population distribution is log-normal, both tests require a high number of choice situations to not reject the null. Thus, in applied work, including a test of whether both distributions are equal is a conservative diagnostic tool.
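The \emph{Lp} p-values reported here are obtained with the NSM3 package loaded in the code above; purely as an illustration of the statistic itself, a permutation version can be built in base R from the Wilcoxon (location) and Ansari-Bradley (scale) rank statistics. This sketch is a stand-in, not the code used for the tables:

```r
# Permutation version of Lepage's D: squared standardized Wilcoxon rank-sum
# statistic plus squared standardized Ansari-Bradley statistic.
lepage_D <- function(x, y, B = 1000) {
  stat <- function(x, y) {
    N <- length(x) + length(y)
    r <- rank(c(x, y))
    a <- pmin(r, N + 1 - r)          # Ansari-Bradley scores
    c(W = sum(r[seq_along(x)]),      # location component
      C = sum(a[seq_along(x)]))      # scale component
  }
  obs  <- stat(x, y)
  perm <- replicate(B, {             # permutation null distribution
    z <- sample(c(x, y))
    stat(z[seq_along(x)], z[-seq_along(x)])
  })
  mu <- rowMeans(perm); s <- apply(perm, 1, sd)
  D  <- sum(((obs - mu) / s)^2)
  p  <- mean(colSums(((perm - mu) / s)^2) >= D)
  list(D = D, p.value = p)
}
```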
Next, we focus our attention on the bias of the conditional means. We can observe that the bias of the individual-specific estimates decreases as the number of choice situations increases. For example, with $T = 10$ the absolute difference between $\widehat{\bar{\beta}}_i$ and $\beta_i$ is about 0.574, 0.959, 0.603 and 1.121, respectively. Figure \ref{cm_true_bi} presents the conditional means estimates for each individual (averaged over the $S$ samples) at different choice-situation scenarios and the true unconditional distribution for our four random parameters. In general, it can be observed that: 1) with just one choice occasion it is almost impossible to disentangle the true individuals' tastes, and 2) it is difficult to estimate the true individual-specific parameter when the true coefficient lies in the tails of the distribution.\footnote{The Monte Carlo experiments in this study were also carried out using $N=1000$. As explained in Section \ref{sec:consistency}, when $N\to \infty$ the consistency of $\widehat{\vtheta}$ increases, and thus the conditional means are estimated with less bias. Yet, if the sample size rises while the number of choice situations faced by individuals is fixed, the conditional distribution and its mean do not change. Our results, not reported here, confirm this: holding $T$ fixed, increasing $N$ from 300 to 1000 reduces the bias of the conditional means by about 20\% for low levels of $T$ and up to 50\% for $T = 50$; however, increasing the sample size while holding $T$ fixed does not deliver convergence of the conditional means to the true individuals' parameters. These results are in line with those reported by \cite{sarrias2018individual}, who conducted a similar study of the behavior of the individual-specific estimates for the LC-MNL model.}
\begin{figure}[H]
\centering
<<echo = FALSE, fig = TRUE>>=
par(mfrow = c(2, 2), cex = 0.6, mar = c(4, 4, 3, 1) + 0.1)
plot(1:300, b2.e1[order(b2.e1)], pch = 19,
ylab = expression(hat(E)(beta[i])),
xlab = "Individuals",
main = "A: N(1, 1)")
lines(out1_b1$av.bi[order(b2.e1)], col = "red")
lines(out1_b2$av.bi[order(b2.e1)], col = "blue")
lines(out1_b3$av.bi[order(b2.e1)], col = "purple")
lines(out1_b4$av.bi[order(b2.e1)], col = "green")
lines(out1_b5$av.bi[order(b2.e1)], col = "orange")
legend("topleft",
c("True", "T = 1", "T = 5", "T = 10", "T = 20", "T = 50"),
lty = c(1, 1, 1, 1, 1, 1),
col = c("black", "red", "blue", "purple", "green", "orange"))
plot(1:300, b3.e1[order(b3.e1)], pch = 19, lty = 2,
ylab = expression(hat(E)(beta[i])),
xlab = "Individuals",
main = "B: N(1, 9)")
lines(out2_b1$av.bi[order(b3.e1)], col = "red")
lines(out2_b2$av.bi[order(b3.e1)], col = "blue")
lines(out2_b3$av.bi[order(b3.e1)], col = "purple")
lines(out2_b4$av.bi[order(b3.e1)], col = "green")
lines(out2_b5$av.bi[order(b3.e1)], col = "orange")
legend("topleft",
c("True", "T = 1", "T = 5", "T = 10", "T = 20", "T = 50"),
lty = c(1, 1, 1, 1, 1, 1),
col = c("black", "red", "blue", "purple", "green", "orange"))
plot(1:300, b4.e1[order(b4.e1)], pch = 19, lty = 2,
ylab = expression(hat(E)(beta[i])),
xlab = "Individuals",
main = "C: LN(0.5, 0.5^2)")
lines(out3_b1$av.bi[order(b4.e1)], col = "red")
lines(out3_b2$av.bi[order(b4.e1)], col = "blue")
lines(out3_b3$av.bi[order(b4.e1)], col = "purple")
lines(out3_b4$av.bi[order(b4.e1)], col = "green")
lines(out3_b5$av.bi[order(b4.e1)], col = "orange")
legend("topleft",
c("True", "T = 1", "T = 5", "T = 10", "T = 20", "T = 50"),
lty = c(1, 1, 1, 1, 1, 1),
col = c("black", "red", "blue", "purple", "green", "orange"))
plot(1:300, b5.e1[order(b5.e1)], pch = 19, lty = 2,
ylab = expression(hat(E)(beta[i])),
xlab = "Individuals",
main = "D: LN(0.5, 1)")
lines(out4_b1$av.bi[order(b5.e1)], col = "red")
lines(out4_b2$av.bi[order(b5.e1)], col = "blue")
lines(out4_b3$av.bi[order(b5.e1)], col = "purple")
lines(out4_b4$av.bi[order(b5.e1)], col = "green")
lines(out4_b5$av.bi[order(b5.e1)], col = "orange")
legend("topleft",
c("True", "T = 1", "T = 5", "T = 10", "T = 20", "T = 50"),
lty = c(1, 1, 1, 1, 1, 1),
col = c("black", "red", "blue", "purple", "green", "orange"))
@
\caption{Conditional means estimates for each individual. The true distribution corresponds to the unconditional (population) distribution. Each point shows the conditional mean for each individual averaged over $S$ constructed data sets.}\label{cm_true_bi}
\end{figure}
%-------------------------------
\subsection{Misspecification}
%-------------------------------
So far, we have looked at the behavior of the conditional means under a well-specified and consistently estimated model with the true unconditional distribution. However, another issue raised by \cite{hess2010conditional} is the impact of the unconditional distribution on the shape of the conditional distributions. We can consider the impact of misspecification from either a Bayesian or a frequentist perspective. Under a Bayesian perspective, the likelihood function should dominate the population (prior) distribution as the number of choice situations increases. Hence, the relevant question from a Bayesian point of view is not whether the population distribution is misspecified, but rather how large $T$ should be so that the selected prior distribution becomes irrelevant. From a frequentist point of view, if the population distribution is misspecified, then the likelihood function will be biased. Therefore, and considering the Bernstein-von Mises theorem, the individual coefficients will be inconsistent no matter how much $T$ increases. However, the objective of this section is not to establish which of the two paradigms is the correct one for analyzing the impact of the population distribution on the individual estimates, but rather to understand how reliable the diagnostic measures are at detecting a potential misspecification problem.\footnote{Evaluating how large $T$ should be so that the prior becomes irrelevant would require a Monte Carlo experiment with values of $T$ greater than those used in this study. Therefore, although it may be an interesting theoretical question, it would have little practical relevance, since real choice experiments have a fairly small number of choice situations.}
To analyze the impact of misspecifying the population distribution on the individual-specific estimates, we choose incorrect distributions for our four parameters during the Monte Carlo experiment. Specifically, we choose the following distributions:
\begin{itemize}
\item For $\beta_{2i}\sim \rN(1, 1)$ we specify a triangular distribution without any type of constraint on the parameters,
\item for $\beta_{3i}\sim \rN(1, 9)$ we specify a log-normal distribution,
\item for $\beta_{4i}\sim LN(0.5, 0.5^2)$ we specify a normal distribution,
\item for $\beta_{5i}\sim LN(0.5, 1)$ we specify a truncated normal distribution.
\end{itemize}
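For reference, draws from two of these alternative mixing distributions can be generated by inverse-CDF transforms of uniforms, as is standard in MIXL simulation. This is a minimal sketch; the function names and parameterizations are illustrative, not those estimated in the experiment:

```r
# Symmetric triangular on [m - s, m + s], via the inverse CDF.
rtri <- function(n, m, s) {
  u <- runif(n)
  m + s * ifelse(u < 0.5, sqrt(2 * u) - 1, 1 - sqrt(2 * (1 - u)))
}

# Normal(m, s^2) truncated below at 0, via the inverse-CDF of the
# conditional distribution above the truncation point.
rtnorm0 <- function(n, m, s) {
  a <- pnorm(0, mean = m, sd = s)   # mass below the truncation point
  qnorm(a + runif(n) * (1 - a), mean = m, sd = s)
}
```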
Again, the sensitivity of the diagnostic measures is analyzed, but now under a misspecified model. We first look at the ratios $\widehat{\mu}_{\bar{\beta}_i}/\mu$ and $sd(\widehat{\bar{\beta}}_i)/\sigma$. Table \ref{tablemiss} shows that for the sample average of the conditional means, and $T = 10$, the ratios are 77\%, 285\%, 74\% and 62\%, respectively, whereas for the standard deviation of the conditional means the ratios are 60\%, 323\%, 48\%, and 33\%. As expected, when the unconditional distribution is poorly specified, the sample moments of the conditional means do not match those of the population distribution (for a given number of choice situations). Another important result is that for any $T$, the average p-values for the \emph{KS} and \emph{Lp} tests are always lower than 0.05, implying that we always reject the null hypothesis that both distributions are equal at the 5\% significance level.
These results suggest two important conclusions. First, researchers should always analyze the robustness of the results across different choices of population distribution and report diagnostic tools to reveal whether there is a potential misspecification problem affecting the individual-specific estimates.\footnote{See for example \cite{hess2010conditional} for an application.} Second, even if the researcher is not interested in the individual-specific estimates, these tools can be used to detect a misspecification problem in the population distribution. However, it is important to emphasize that failure of the diagnostic tests can also indicate another type of problem, not necessarily misspecification.
The results also reveal that better knowledge of the potential domain of tastes results in lower bias for the true coefficient of each individual. That is, the population distribution assumed for the parameters (the priors) has an impact on the magnitude of the bias. Although these results are not completely generalizable to other specifications, it can be observed that if a triangular unconditional distribution is assumed when the actual distribution is normal, the bias is approximately 1.1 times larger for any number of choice situations. However, if a log-normal distribution is assumed (which restricts the domain of the coefficients) when the true distribution is normal, the bias is 2.7 times larger. Finally, in all misspecification scenarios the bias of the conditional means decreases as $T$ increases, which might indicate that the role of the population distribution in determining the true coefficient diminishes as we observe more choices per individual.
\tabcolsep=0.06cm
<<echo = FALSE, results = tex>>=
# Table for results
Table_1 <- matrix(NA, nrow = 10, ncol = 16)
for (t in 1:5) {
out <- stats_bi(bhat = bbi2.sim.e1[, , t],
se = bsi2.sim.e1[, , t],
true = b2.e1,
drop = 1000,
mess = mess.e1[, t],
dis.test = TRUE)
out2 <- stats_bi(bhat = bbi3.sim.e1[, , t],
se = bsi3.sim.e1[, , t],
true = b3.e1,
drop = 1000,
mess = mess.e1[, t],
dis.test = TRUE)
out3 <- stats_bi(bhat = bbi4.sim.e1[, , t],
se = bsi4.sim.e1[, , t],
true = b4.e1,
drop = 1000,
mess = mess.e1[, t],
dis.test = TRUE)
out4 <- stats_bi(bhat = bbi5.sim.e1[, , t],
se = bsi5.sim.e1[, , t],
true = b5.e1,
drop = 1000,
mess = mess.e1[, t],
dis.test = TRUE)
Table_1[t, 1] <- mean(out$mean.S)
Table_1[t, 2] <- mean(out$sd.S)
Table_1[t, 3] <- mean(out$av.se)
Table_1[t, 4] <- mean(out$av.pos)
Table_1[t, 5] <- mean(out$ks)
Table_1[t, 6] <- mean(out$plepage)
Table_1[t, 7] <- out$bias_1
Table_1[t, 8] <- out$S
Table_1[t, 9] <- mean(out2$mean.S)
Table_1[t, 10] <- mean(out2$sd.S)
Table_1[t, 11] <- mean(out2$av.se)
Table_1[t, 12] <- mean(out2$av.pos)
Table_1[t, 13] <- mean(out2$ks)
Table_1[t, 14] <- mean(out2$plepage)
Table_1[t, 15] <- out2$bias_1
Table_1[t, 16] <- out2$S
Table_1[t + 5, 1] <- mean(out3$mean.S)
Table_1[t + 5, 2] <- mean(out3$sd.S)
Table_1[t + 5, 3] <- mean(out3$av.se)
Table_1[t + 5, 4] <- mean(out3$av.pos)
Table_1[t + 5, 5] <- mean(out3$ks)
Table_1[t + 5, 6] <- mean(out3$plepage)
Table_1[t + 5, 7] <- out3$bias_1
Table_1[t + 5, 8] <- out3$S
Table_1[t + 5, 9] <- mean(out4$mean.S)
Table_1[t + 5, 10] <- mean(out4$sd.S)
Table_1[t + 5, 11] <- mean(out4$av.se)
Table_1[t + 5, 12] <- mean(out4$av.pos)
Table_1[t + 5, 13] <- mean(out4$ks)
Table_1[t + 5, 14] <- mean(out4$plepage)
Table_1[t + 5, 15] <- out4$bias_1
Table_1[t + 5, 16] <- out4$S
}
## Table for latex
rownames(Table_1) <- c("T = 1", "T = 5", "T = 10", "T = 20", "T = 50",
" T = 1", " T = 5", " T = 10", " T = 20", " T = 50")
Table_1 <- xtable(Table_1)
caption(Table_1) <- "Simulation results for MIXL: Misspecification. The results are averaged over the number of samples $S$ for which the SML converged.\\label{tablemiss}"
align(Table_1) <- "l|cccccccc|cccccccc"
digits(Table_1) <- c(0, rep(3, 7), 0, rep(3, 7), 0)
addtorow <- list()
addtorow$pos <- list(0, 0, 5, 5)
addtorow$command <- c("\\midrule & \\multicolumn{8}{c}{$\\rN(1, 1)$ as Triangular} & \\multicolumn{8}{c}{$\\rN(1, 3^2)$ as Log-normal} \\\\\n",
"& $\\widehat{\\mu}_{{\\bar{\\beta}}_i}$ & \\emph{$sd(\\widehat{\\bar{\\beta}}_i)$} & \\emph{$\\bar{\\widehat{\\sigma}}_i$} & \\emph{\\%>0} & \\emph{KS} & \\emph{Lp} & \\emph{Bias} & \\emph{S}
& $\\widehat{\\mu}_{{\\bar{\\beta}}_i}$ & \\emph{$sd(\\widehat{\\bar{\\beta}}_i)$} & \\emph{$\\bar{\\widehat{\\sigma}}_i$} & \\emph{\\%>0} & \\emph{KS} & \\emph{Lp} & \\emph{Bias} & \\emph{S} \\\\\n",
"\\midrule & \\multicolumn{8}{c}{$LN(0.5, 0.5^2)$ as Normal} & \\multicolumn{8}{c}{$LN(0.5, 1)$ as Truncated Normal} \\\\\n",
"& $\\widehat{\\mu}_{{\\bar{\\beta}}_i}$ & \\emph{$sd(\\widehat{\\bar{\\beta}}_i)$} & \\emph{$\\bar{\\widehat{\\sigma}}_i$} & \\emph{\\%>0} & \\emph{KS} & \\emph{Lp} & \\emph{Bias} & \\emph{S}
& $\\widehat{\\mu}_{{\\bar{\\beta}}_i}$ & \\emph{$sd(\\widehat{\\bar{\\beta}}_i)$} & \\emph{$\\bar{\\widehat{\\sigma}}_i$} & \\emph{\\%>0} & \\emph{KS} & \\emph{Lp} & \\emph{Bias} & \\emph{S} \\\\\n")
print(Table_1,
booktabs = TRUE,
tabular.environment = "longtable",
add.to.row = addtorow,
include.colnames = FALSE,
include.rownames = TRUE,
latex.environments = "center",
table.placement = "ht",
caption.placement = "top",
size = "scriptsize",
hline.after = c(-1, 0, 5, 10))
@
%----------------------------------------------------
\section{Conclusion}\label{sec:conclusion}
%---------------------------------------------------
The MIXL model has revolutionized the field of discrete choice modeling during the last two decades due to its capability to accommodate heterogeneity in tastes by assuming that the marginal utility coefficients are randomly distributed across respondents. However, in many practical cases, just knowing that a coefficient varies across individuals is not enough \citep{revelt2000customer, hess2010conditional}. For this reason, individual-specific estimators have been gaining attention from researchers. Their ability to determine the tastes of each individual in the sample is an attractive property from the applied point of view, since it allows richer and more detailed analyses, which can be used to retrieve individual-specific attribute processing strategies or to create segments of customers, among other uses. Yet, there are some issues regarding their properties that have not generally been considered in the applied literature, which may cast doubt on the conclusions derived from their use.
This study reviews and summarizes the literature on the theoretical properties of individual-specific estimators, focusing on: 1) their consistency, 2) the relationship between the unconditional and conditional distributions, and 3) potential diagnostic measures for their correct use. Then, a Monte Carlo experiment is run to analyze the behavior of the individual-specific estimates, focusing on the conditions required for true parameter recovery and on the sensitivity of some diagnostic measures under both a well-specified and a misspecified model. The aim is to assess whether the diagnostic tools can detect departures of the distribution of the conditional means from the population (unconditional) distribution under these two scenarios.
The results from the Monte Carlo experiment allow us to derive some reasonable guidelines for the correct use of individual-specific estimates. Researchers using either a preference- or WTP-space model should analyze and report the following diagnostics to be confident that the conditional estimates are reliable:
\begin{enumerate}
\item If the model is correctly specified and accurately estimated, then we should observe that both the distribution of the estimated conditional means and the estimated population distribution are very similar \citep{revelt2000customer}. Thus:
\begin{enumerate}
\item It is always advisable to analyze whether the average of the conditional means across all individuals, $\widehat{\mu}_{\bar{\beta}_i}$, is similar to the estimated mean of the population distribution, $\widehat{\mu}$. With 10 choice situations, and under a well-specified model, $\widehat{\mu}_{\bar{\beta}_i}$ should represent at least 90\% of $\widehat{\mu}$.
\item In addition to the above, researchers should analyze whether the variance of the conditional means, $\var(\widehat{\bar{\beta}}_i)$, equals the estimated variance of the population distribution, $\widehat{\sigma}^2$. Again, with 10 choice situations, and under a well-specified model, $\var(\widehat{\bar{\beta}}_i)$ should represent at least 60\% of $\widehat{\sigma}^2$.
\item It is also recommended to test whether the distribution of the conditional means equals the estimated unconditional distribution using, for example, the KS test. Failure to reject the null hypothesis is a conservative indicator that the distribution selected for the parameter is correct.
\end{enumerate}
\end{enumerate}
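The three checks above can be collected in a small helper function. The following is a minimal sketch, assuming a normal population distribution; \texttt{bi\_hat}, \texttt{mu\_hat}, and \texttt{sigma\_hat} are illustrative names for the vector of estimated conditional means and the estimated population parameters, not objects returned by any particular package.

```r
## Diagnostic checks for individual-specific estimates (illustrative names):
##   bi_hat:    vector of estimated conditional means, one per individual
##   mu_hat:    estimated mean of the (normal) population distribution
##   sigma_hat: estimated SD of the (normal) population distribution
diagnose_bi <- function(bi_hat, mu_hat, sigma_hat) {
  list(
    mean_ratio = mean(bi_hat) / mu_hat,      # check (a): at least 0.9 with T = 10
    var_ratio  = var(bi_hat) / sigma_hat^2,  # check (b): at least 0.6 with T = 10
    ks_pvalue  = ks.test(bi_hat, "pnorm",
                         mean = mu_hat,
                         sd   = sigma_hat)$p.value  # check (c): KS test
  )
}
```

For a non-normal population distribution, the reference CDF in \texttt{ks.test} would be replaced accordingly (e.g., \texttt{"plnorm"} for a log-normal specification).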
These diagnostic checks should be carried out for different choices of the population distribution of each parameter, selecting the distribution that best fits the distribution of the conditional means. Another important point is that a discrepancy between the distribution of the individual-specific coefficients and the estimated population distribution may be due not only to misspecification, but also to an insufficient number of draws, an inadequate sample size, a bad experimental design, and/or the maximum likelihood procedure converging at a local rather than a global maximum. Therefore, researchers must check whether the estimation procedure suffers from any of these problems to ensure a sound use of the diagnostic tools.
Once it has been verified that the individual estimates are reliable, care should be taken when using them in the following cases:
\begin{enumerate}
\item When using unbounded distributions (e.g., normal) and the number of choice situations is not large enough, it is not advisable to use the estimated conditional means to identify individuals with positive or negative tastes for certain attributes, since the sample variance of the conditional means tends to be underestimated.
\item Since the variance of the conditional distribution tends to decrease as $T\to \infty$, it should not be used to construct confidence intervals for the conditional means.
\item It is much more difficult to estimate with precision the specific tastes of individuals with extreme tastes (the tails of the distribution), especially with a small number of choice situations. In other words, estimates for individuals with extreme (very high or low) tastes are expected to be shrunk toward the center of the distribution.
\item When computing individual-specific WTP, researchers must check whether the unconditional distribution has well-defined moments using \cite{daly2012assuring}'s theorems.
\end{enumerate}
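The last point can be illustrated with a short simulation. This is a sketch of the problem highlighted by \cite{daly2012assuring}, not their code: a normally distributed cost coefficient has positive density at zero, so the WTP ratio (attribute coefficient over cost coefficient) has no finite moments, and its simulated mean does not stabilize as the number of draws grows. All parameter values below are illustrative.

```r
## WTP ratio with a normal cost coefficient: no finite moments.
## The simulated mean keeps jumping around as R grows instead of converging.
set.seed(123)
sim_wtp_mean <- function(R) {
  num <- rnorm(R, mean = 1,  sd = 1)    # attribute coefficient draws
  den <- rnorm(R, mean = -1, sd = 0.5)  # cost coefficient draws (mass near 0)
  mean(num / den)
}
sapply(c(1e3, 1e4, 1e5), sim_wtp_mean)  # erratic: the mean is undefined
```

A bounded or strictly signed cost distribution (e.g., log-normal with a sign change) avoids this problem, which is why the moment conditions should be verified before reporting individual-specific WTP.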
This work is not without shortcomings, which can be addressed in future work. First, the design of the experiment does not consider more complex MIXL models that include correlation among the parameters and observed heterogeneity, or other models widely used in applied studies such as the Generalized Multinomial Logit model or the Mixed-Mixed Multinomial Logit model \citep[see for example][]{keane2013comparing}. Since all these models add additional layers of unobserved heterogeneity, the convergence of individual-level estimators is expected to be more demanding in terms of the number of choice situations. Second, it would be interesting to analyze the behavior of two additional estimators proposed by \cite{revelt2000customer}. The first accounts for the sampling distribution of $\vtheta$ by taking draws from its asymptotic distribution, whereas the second takes draws from the posterior distribution of $\vtheta$ using, for example, the Metropolis-Hastings algorithm as in \cite{sillano2005willingness} (see footnote 4). Finally, a less explored issue is the analysis of different approaches to generate interval estimators of individual-specific parameters for the MIXL model. For example, \cite{sarrias2018individual} assessed the empirical coverage of two procedures for deriving standard errors proposed in the literature in the context of the Latent Class Multinomial Logit model. A similar study could be carried out for the MIXL model.
\pagebreak
\bibliography{IndividualParam}
\bibliographystyle{apalike}
\newpage
\end{document}
\section{Figures and tables}
\begin{landscape}
\tabcolsep=0.06cm
<>=
rm(list = ls(all = TRUE))
load("../Data/MIXL_MC_1.RData")
source('../Manuscript/helpers.R')
library('xtable')
# Table for results
Table <- matrix(NA, nrow = 5, ncol = 27)
Table[ ,1] <- apply(b1.sim.e1, 2, mean)
Table[ ,2] <- apply(b1.sim.e1, 2, sd)
Table[ ,3] <- apply(b1se.sim.e1, 2, mean)
Table[ ,4] <- apply(b2.sim.e1, 2, mean)
Table[ ,5] <- apply(b2.sim.e1, 2, sd)
Table[ ,6] <- apply(b2se.sim.e1, 2, mean)
Table[ ,7] <- apply(s2.sim.e1, 2, mean)
Table[ ,8] <- apply(s2.sim.e1, 2, sd)
Table[ ,9] <- apply(s2se.sim.e1, 2, mean)
Table[ ,10] <- apply(b3.sim.e1, 2, mean)
Table[ ,11] <- apply(b3.sim.e1, 2, sd)
Table[ ,12] <- apply(b3se.sim.e1, 2, mean)
Table[ ,13] <- apply(s3.sim.e1, 2, mean)
Table[ ,14] <- apply(s3.sim.e1, 2, sd)
Table[ ,15] <- apply(s3se.sim.e1, 2, mean)
Table[ ,16] <- apply(b4.sim.e1, 2, mean)
Table[ ,17] <- apply(b4.sim.e1, 2, sd)
Table[ ,18] <- apply(b4se.sim.e1, 2, mean)
Table[ ,19] <- apply(s4.sim.e1, 2, mean)
Table[ ,20] <- apply(s4.sim.e1, 2, sd)
Table[ ,21] <- apply(s4se.sim.e1, 2, mean)
Table[ ,22] <- apply(b5.sim.e1, 2, mean)
Table[ ,23] <- apply(b5.sim.e1, 2, sd)
Table[ ,24] <- apply(b5se.sim.e1, 2, mean)
Table[ ,25] <- apply(s5.sim.e1, 2, mean)
Table[ ,26] <- apply(s5.sim.e1, 2, sd)
Table[ ,27] <- apply(s5se.sim.e1, 2, mean)
rownames(Table) <- c("T = 1", "T = 5", "T = 10", "T = 20", "T = 50")
Table <- xtable(Table,
caption = "Simulation results for MIXL: Population parameters",
label = "tableMain")
align(Table) <- c("l|ccc|ccc|ccc|ccc|ccc|ccc|ccc|ccc|ccc")
digits(Table) <- c(0, rep(3, 27))
addtorow <- list()
addtorow$pos <- list(0, 0)
addtorow$command <- c("& \\multicolumn{3}{c}{$\\beta_1 = 1$} & \\multicolumn{3}{c}{$\\beta_2 = 1$} & \\multicolumn{3}{c}{$\\sigma_2 = 1$} & \\multicolumn{3}{c}{$\\beta_3 = 1$}& \\multicolumn{3}{c}{$\\sigma_3 = 3$}& \\multicolumn{3}{c}{$\\beta_4 = 0.5$} & \\multicolumn{3}{c}{$\\sigma_4 = 0.5$} & \\multicolumn{3}{c}{$\\beta_5 = 0.5$} & \\multicolumn{3}{c}{$\\sigma_5 = 1$}\\\\\n",
"& Mean & SD & SE & Mean & SD & SE & Mean & SD & SE & Mean & SD & SE & Mean & SD & SE & Mean & SD & SE & Mean & SD & SE & Mean & SD & SE & Mean & SD & SE \\\\\n")
print(Table,
booktabs = TRUE,
tabular.environment = "longtable",
add.to.row = addtorow,
include.colnames = FALSE,
include.rownames = TRUE,
latex.environments = "center",
table.placement = "ht",
caption.placement = "top",
size = "scriptsize")
@
\vspace{-0.2cm}
\noindent \textsl{\footnotesize{Notes: The results are averaged over the number of samples, $S$.}}
\end{landscape}
\begin{figure}[H]
\caption{Kernel densities of conditional means for WTP}\label{density_true_wtp}
\centering
<>=
rm(list = ls(all = TRUE))
load("../Data/MIXL_MC_1.RData")
source('../Manuscript/helpers.R')
out1_b1 <- stats_bi(bhat = wbi54.sim.e1[, , 1],
se = wsi54.sim.e1[, , 1],
true = b5.e1 / b4.e1,
drop = 3000,
mess = mess.e1[, 1],
dis.test = FALSE)
out1_b2 <- stats_bi(bhat = wbi54.sim.e1[, , 2],
se = wsi54.sim.e1[, , 2],
true = b5.e1 / b4.e1,
drop = 3000,
mess = mess.e1[, 2],
dis.test = FALSE)
out1_b3 <- stats_bi(bhat = wbi54.sim.e1[, , 3],
se = wsi54.sim.e1[, , 3],
true = b5.e1 / b4.e1,
drop = 3000,
mess = mess.e1[, 3],
dis.test = FALSE)
out1_b4 <- stats_bi(bhat = wbi54.sim.e1[, , 4],
se = wsi54.sim.e1[, , 4],
true = b5.e1 / b4.e1,
drop = 3000,
mess = mess.e1[, 4],
dis.test = FALSE)
out1_b5 <- stats_bi(bhat = wbi54.sim.e1[, , 5],
se = wsi54.sim.e1[, , 5],
true = b5.e1 / b4.e1,
drop = 3000,
mess = mess.e1[, 5],
dis.test = FALSE)
out2_b1 <- stats_bi(bhat = wbi24.sim.e1[, , 1],
se = wsi24.sim.e1[, , 1],
true = b2.e1 / b4.e1,
drop = 3000,
mess = mess.e1[, 1],
dis.test = FALSE)
out2_b2 <- stats_bi(bhat = wbi24.sim.e1[, , 2],
se = wsi24.sim.e1[, , 2],
true = b2.e1 / b4.e1,
drop = 3000,
mess = mess.e1[, 2],
dis.test = FALSE)
out2_b3 <- stats_bi(bhat = wbi24.sim.e1[, , 3],
se = wsi24.sim.e1[, , 3],
true = b2.e1 / b4.e1,
drop = 3000,
mess = mess.e1[, 3],
dis.test = FALSE)
out2_b4 <- stats_bi(bhat = wbi24.sim.e1[, , 4],
se = wsi24.sim.e1[, , 4],
true = b2.e1 / b4.e1,
drop = 3000,
mess = mess.e1[, 4],
dis.test = FALSE)
out2_b5 <- stats_bi(bhat = wbi24.sim.e1[, , 5],
se = wsi24.sim.e1[, , 5],
true = b2.e1 / b4.e1,
drop = 3000,
mess = mess.e1[, 5],
dis.test = FALSE)
out3_b1 <- stats_bi(bhat = wbi32.sim.e1[, , 1],
se = wsi32.sim.e1[, , 1],
true = b3.e1 / b2.e1,
drop = 1000000,
mess = mess.e1[, 1],
dis.test = FALSE)
out3_b2 <- stats_bi(bhat = wbi32.sim.e1[, , 2],
se = wsi32.sim.e1[, , 2],
true = b3.e1 / b2.e1,
drop = 1000000,
mess = mess.e1[, 2],
dis.test = FALSE)
out3_b3 <- stats_bi(bhat = wbi32.sim.e1[, , 3],
se = wsi32.sim.e1[, , 3],
true = b3.e1 / b2.e1,
drop = 1000000,
mess = mess.e1[, 3],
dis.test = FALSE)
out3_b4 <- stats_bi(bhat = wbi32.sim.e1[, , 4],
se = wsi32.sim.e1[, , 4],
true = b3.e1 / b2.e1,
drop = 1000000,
mess = mess.e1[, 4],
dis.test = FALSE)
out3_b5 <- stats_bi(bhat = wbi32.sim.e1[, , 5],
se = wsi32.sim.e1[, , 5],
true = b3.e1 / b2.e1,
drop = 1000000,
mess = mess.e1[, 5],
dis.test = FALSE)
par(mfrow = c(1, 3), cex = 0.6, mar = c(4, 4, 2, 1) + 0.1)
plot(density( b5.e1 / b4.e1), lwd = 2, lty = 1,
ylim = c(0, 0.7),
xlab = expression(hat(E)(beta[i])),
main = "A: N(1, 1) as population distribution",
col = "gray0")
lines(density(out1_b1$av.bi), col = "red", lwd = 2, lty = 2)
lines(density(out1_b2$av.bi), col = "blue", lwd = 2, lty = 3)
lines(density(out1_b3$av.bi), col = "purple", lwd = 2, lty = 4)
lines(density(out1_b4$av.bi), col = "green", lwd = 2, lty = 5)
lines(density(out1_b5$av.bi), col = "orange", lwd = 2, lty = 6)
legend("topright",
c("True", "T = 1", "T = 5", "T = 10", "T = 20", "T = 50"),
lty = c(1, 2, 3, 4, 5, 6),
col = c("black", "red", "blue", "purple", "green", "orange"))
plot(density(b2.e1 / b4.e1), lwd = 2, lty = 1,
ylim = c(0, 0.25),
xlab = expression(hat(E)(beta[i])),
main = "B: N(1, 9) as population distribution",
col = "gray0")
lines(density(out2_b1$av.bi), col = "red", lwd = 2, lty = 2)
lines(density(out2_b2$av.bi), col = "blue", lwd = 2, lty = 3)
lines(density(out2_b3$av.bi), col = "purple", lwd = 2, lty = 4)
lines(density(out2_b4$av.bi), col = "green", lwd = 2, lty = 5)
lines(density(out2_b5$av.bi), col = "orange", lwd = 2, lty = 6)
legend("topright",
c("True", "T = 1", "T = 5", "T = 10", "T = 20", "T = 50"),
lty = c(1, 2, 3, 4, 5, 6),
col = c("black", "red", "blue", "purple", "green", "orange"))
plot(density( b3.e1 / b2.e1), lwd = 2, lty = 1,
ylim = c(0, 0.15),
xlim = c(-400, 100),
xlab = expression(hat(E)(beta[i])),
main = "C: LN(0.5, 0.5^2) as population distribution",
col = "gray0")
lines(density(out3_b1$av.bi), col = "red", lwd = 2, lty = 2)
lines(density(out3_b2$av.bi), col = "blue", lwd = 2, lty = 3)
lines(density(out3_b3$av.bi), col = "purple", lwd = 2, lty = 4)
lines(density(out3_b4$av.bi), col = "green", lwd = 2, lty = 5)
lines(density(out3_b5$av.bi), col = "orange", lwd = 2, lty = 6)
legend("topright",
c("True", "T = 1", "T = 5", "T = 10", "T = 20", "T = 50"),
lty = c(1, 2, 3, 4, 5, 6),
col = c("black", "red", "blue", "purple", "green", "orange"))
@
\end{figure}
\vspace{-0.2cm}
\noindent \textsl{\footnotesize{Notes: The true distribution corresponds to the unconditional (population) distribution. The kernel densities for each number of choice situations show the estimated conditional mean for each individual averaged over $S$ constructed data sets.}}
\begin{figure}[H]
\caption{Kernel densities of Individual-specific estimates}\label{figure1}
\centering
<>=
rm(list = ls(all = TRUE))
load("../Data/MIXL_MC_1.RData")
source('../Manuscript/helpers.R')
out1_b1 <- stats_bi(bhat = bbi2.sim.e1[, , 1],
se = bsi2.sim.e1[, , 1],
true = b2.e1,
drop = 100,
mess = mess.e1[, 1])
out1_b2 <- stats_bi(bhat = bbi2.sim.e1[, , 2],
se = bsi2.sim.e1[, , 2],
true = b2.e1,
drop = 100,
mess = mess.e1[, 2])
out1_b3 <- stats_bi(bhat = bbi2.sim.e1[, , 3],
se = bsi2.sim.e1[, , 3],
true = b2.e1,
drop = 100,
mess = mess.e1[, 3])
out1_b4 <- stats_bi(bhat = bbi2.sim.e1[, , 4],
se = bsi2.sim.e1[, , 4],
true = b2.e1,
drop = 100,
mess = mess.e1[, 4])
out1_b5 <- stats_bi(bhat = bbi2.sim.e1[, , 5],
se = bsi2.sim.e1[, , 5],
true = b2.e1,
drop = 100,
mess = mess.e1[, 5])
out2_b1 <- stats_bi(bhat = bbi3.sim.e1[, , 1],
se = bsi3.sim.e1[, , 1],
true = b3.e1,
drop = 100,
mess = mess.e1[, 1])
out2_b2 <- stats_bi(bhat = bbi3.sim.e1[, , 2],
se = bsi3.sim.e1[, , 2],
true = b3.e1,
drop = 100,
mess = mess.e1[, 2])
out2_b3 <- stats_bi(bhat = bbi3.sim.e1[, , 3],
se = bsi3.sim.e1[, , 3],
true = b3.e1,
drop = 100,
mess = mess.e1[, 3])
out2_b4 <- stats_bi(bhat = bbi3.sim.e1[, , 4],
se = bsi3.sim.e1[, , 4],
true = b3.e1,
drop = 100,
mess = mess.e1[, 4])
out2_b5 <- stats_bi(bhat = bbi3.sim.e1[, , 5],
se = bsi3.sim.e1[, , 5],
true = b3.e1,
drop = 100,
mess = mess.e1[, 5])
out3_b1 <- stats_bi(bhat = bbi4.sim.e1[, , 1],
se = bsi4.sim.e1[, , 1],
true = b4.e1,
drop = 100,
mess = mess.e1[, 1])
out3_b2 <- stats_bi(bhat = bbi4.sim.e1[, , 2],
se = bsi4.sim.e1[, , 2],
true = b4.e1,
drop = 100,
mess = mess.e1[, 2])
out3_b3 <- stats_bi(bhat = bbi4.sim.e1[, , 3],
se = bsi4.sim.e1[, , 3],
true = b4.e1,
drop = 100,
mess = mess.e1[, 3])
out3_b4 <- stats_bi(bhat = bbi4.sim.e1[, , 4],
se = bsi4.sim.e1[, , 4],
true = b4.e1,
drop = 100,
mess = mess.e1[, 4])
out3_b5 <- stats_bi(bhat = bbi4.sim.e1[, , 5],
se = bsi4.sim.e1[, , 5],
true = b4.e1,
drop = 100,
mess = mess.e1[, 5])
out4_b1 <- stats_bi(bhat = bbi5.sim.e1[, , 1],
se = bsi5.sim.e1[, , 1],
true = b5.e1,
drop = 100,
mess = mess.e1[, 1])
out4_b2 <- stats_bi(bhat = bbi5.sim.e1[, , 2],
se = bsi5.sim.e1[, , 2],
true = b5.e1,
drop = 100,
mess = mess.e1[, 2])
out4_b3 <- stats_bi(bhat = bbi5.sim.e1[, , 3],
se = bsi5.sim.e1[, , 3],
true = b5.e1,
drop = 100,
mess = mess.e1[, 3])
out4_b4 <- stats_bi(bhat = bbi5.sim.e1[, , 4],
se = bsi5.sim.e1[, , 4],
true = b5.e1,
drop = 100,
mess = mess.e1[, 4])
out4_b5 <- stats_bi(bhat = bbi5.sim.e1[, , 5],
se = bsi5.sim.e1[, , 5],
true = b5.e1,
drop = 100,
mess = mess.e1[, 5])
par(mfrow = c(2, 2), cex = 0.6)
plot(density(b2.e1), lwd = 2, lty = 1,
ylim = c(0, 0.7),
xlab = expression(hat(E)(beta[i])),
main = "A: Conditional estimates Normal",
col = "gray0")
lines(density(out1_b1$av.bi), col = "gray12", lwd = 2, lty = 2)
lines(density(out1_b2$av.bi), col = "gray25", lwd = 2, lty = 3)
lines(density(out1_b3$av.bi), col = "gray35", lwd = 2, lty = 4)
lines(density(out1_b4$av.bi), col = "gray45", lwd = 2, lty = 5)
lines(density(out1_b5$av.bi), col = "gray55", lwd = 2, lty = 6)
legend("topright",
c("True", "T = 1", "T = 5", "T = 10", "T = 20", "T = 50"),
lty = c(1, 2, 3, 4, 5, 6),
col = c("gray0", "gray12", "gray25", "gray35", "gray45", "gray55"))
plot(density(b3.e1), lwd = 2, lty = 1,
ylim = c(0, 0.7),
xlab = expression(hat(E)(beta[i])),
main = "B: Conditional estimates Normal",
col = "gray0")
lines(density(out2_b1$av.bi), col = "gray12", lwd = 2, lty = 2)
lines(density(out2_b2$av.bi), col = "gray25", lwd = 2, lty = 3)
lines(density(out2_b3$av.bi), col = "gray35", lwd = 2, lty = 4)
lines(density(out2_b4$av.bi), col = "gray45", lwd = 2, lty = 5)
lines(density(out2_b5$av.bi), col = "gray55", lwd = 2, lty = 6)
legend("topright",
c("True", "T = 1", "T = 5", "T = 10", "T = 20", "T = 50"),
lty = c(1, 2, 3, 4, 5, 6),
col = c("gray0", "gray12", "gray25", "gray35", "gray45", "gray55"))
plot(density(b4.e1), lwd = 2, lty = 1,
ylim = c(0, 0.7),
xlab = expression(hat(E)(beta[i])),
main = "C: Conditional estimates Lognormal",
col = "gray0")
lines(density(out3_b1$av.bi), col = "gray12", lwd = 2, lty = 2)
lines(density(out3_b2$av.bi), col = "gray25", lwd = 2, lty = 3)
lines(density(out3_b3$av.bi), col = "gray35", lwd = 2, lty = 4)
lines(density(out3_b4$av.bi), col = "gray45", lwd = 2, lty = 5)
lines(density(out3_b5$av.bi), col = "gray55", lwd = 2, lty = 6)
legend("topright",
c("True", "T = 1", "T = 5", "T = 10", "T = 20", "T = 50"),
lty = c(1, 2, 3, 4, 5, 6),
col = c("gray0", "gray12", "gray25", "gray35", "gray45", "gray55"))
plot(density(b5.e1), lwd = 2, lty = 1,
ylim = c(0, 0.2),
xlab = expression(hat(E)(beta[i])),
main = "D: Conditional estimates Lognormal",
col = "gray0")
lines(density(out4_b1$av.bi), col = "gray12", lwd = 2, lty = 2)
lines(density(out4_b2$av.bi), col = "gray25", lwd = 2, lty = 3)
lines(density(out4_b3$av.bi), col = "gray35", lwd = 2, lty = 4)
lines(density(out4_b4$av.bi), col = "gray45", lwd = 2, lty = 5)
lines(density(out4_b5$av.bi), col = "gray55", lwd = 2, lty = 6)
legend("topright",
c("True", "T = 1", "T = 5", "T = 10", "T = 20", "T = 50"),
lty = c(1, 2, 3, 4, 5, 6),
col = c("gray0", "gray12", "gray25", "gray35", "gray45", "gray55"))
@
\end{figure}
<>=
par(mfrow = c(2, 2), cex = 0.6)
plot(1:300, b2.e1[order(b2.e1)], pch = 19, lty = 2)
lines(out1_b1$av.bi[order(b2.e1)], col = "red")
lines(out1_b2$av.bi[order(b2.e1)], col = "blue")
lines(out1_b3$av.bi[order(b2.e1)], col = "purple")
lines(out1_b4$av.bi[order(b2.e1)], col = "green")
lines(out1_b5$av.bi[order(b2.e1)], col = "orange")
plot(1:300, b3.e1[order(b3.e1)], pch = 19, lty = 2)
lines(out2_b1$av.bi[order(b3.e1)], col = "red")
lines(out2_b2$av.bi[order(b3.e1)], col = "blue")
lines(out2_b3$av.bi[order(b3.e1)], col = "purple")
lines(out2_b4$av.bi[order(b3.e1)], col = "green")
lines(out2_b5$av.bi[order(b3.e1)], col = "orange")
plot(1:300, b4.e1[order(b4.e1)], pch = 19, lty = 2)
lines(out3_b1$av.bi[order(b4.e1)], col = "red")
lines(out3_b2$av.bi[order(b4.e1)], col = "blue")
lines(out3_b3$av.bi[order(b4.e1)], col = "purple")
lines(out3_b4$av.bi[order(b4.e1)], col = "green")
lines(out3_b5$av.bi[order(b4.e1)], col = "orange")
plot(1:300, b5.e1[order(b5.e1)], pch = 19, lty = 2)
lines(out4_b1$av.bi[order(b5.e1)], col = "red")
lines(out4_b2$av.bi[order(b5.e1)], col = "blue")
lines(out4_b3$av.bi[order(b5.e1)], col = "purple")
lines(out4_b4$av.bi[order(b5.e1)], col = "green")
lines(out4_b5$av.bi[order(b5.e1)], col = "orange")
@
\end{document}