Why we can, and probably should, use Missing At Random imputation methods for data that’s not missing at random

May 17, 2024 | October 12, 2024 | by Kurious Fox

Missing At Random (MAR) imputation methods are based on the assumption that the probability of a value being missing is not related to the missing value itself, but might be related to some of the observed data. In other words, the missingness can be predicted from other observed variables.

Missing Not At Random (MNAR) refers to any data set where the pattern of missingness depends on data values that are not observed, and hence the missingness cannot be predicted solely from the observed data.
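
The two definitions above can be written compactly. The notation here is an assumption, not something the post itself uses: R is the missingness indicator, Y_obs and Y_mis are the observed and missing parts of the data, and phi stands for the parameters of the missingness mechanism.

```latex
% R: missingness indicator; Y_obs / Y_mis: observed / missing values;
% \phi: parameters of the missingness mechanism (all notation assumed).
\begin{align*}
\text{MAR:}  \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \phi) = P(R \mid Y_{\text{obs}}, \phi) \\
\text{MNAR:} \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \phi) \neq P(R \mid Y_{\text{obs}}, \phi)
  \quad \text{(the probability still depends on } Y_{\text{mis}}\text{)}
\end{align*}
```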

For example, suppose you are conducting a survey on income level. It might be the case that people with higher incomes are less likely to disclose their income compared to those with lower incomes. If someone didn’t disclose their income, it’s more likely that they are from the higher-income group. So, the probability of income being missing is related to the value of the income itself, which is not observed. This is a classic case of data Missing Not At Random.
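
To make the income example concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the log-normal income distribution, the logistic non-response model, and the variable names. It only shows that when non-response grows with income, the average of the disclosed incomes falls below the true average.

```python
import numpy as np

# Hypothetical survey: incomes drawn from a log-normal distribution (assumed).
rng = np.random.default_rng(0)
n = 10_000
income = rng.lognormal(mean=10.5, sigma=0.5, size=n)

# MNAR mechanism (assumed): the probability of NOT disclosing income
# increases with the income value itself, via a logistic curve.
z = (np.log(income) - 10.5) / 0.5          # standardized log-income
p_missing = 1.0 / (1.0 + np.exp(-2.0 * z))
missing = rng.random(n) < p_missing

true_mean = income.mean()                  # unknown to the analyst in practice
observed_mean = income[~missing].mean()    # what the survey actually sees

print(f"share missing:      {missing.mean():.2f}")
print(f"true mean income:   {true_mean:,.0f}")
print(f"observed mean only: {observed_mean:,.0f}")
```

Because the high earners are the ones who drop out, the observed mean understates the true mean, and nothing in the observed incomes alone reveals by how much.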

For MNAR, the missingness depends on information that is not available in the data set. This makes it very difficult to impute the missing values accurately. Also, implementations of MNAR techniques are not as widely available as those of MAR techniques.

Despite this complexity, we can still apply MAR imputation methods to MNAR data, of course with some sacrifice in imputation quality. However, in most cases the performance is still promising and acceptable. Why is that? There are several reasons:

For MNAR, the missingness depends on both the observed and the missing data. Using MAR techniques on MNAR data therefore basically means ignoring the part of the missingness that depends on the missing values. So, if that part does not carry much information, it may not be worth the trouble to use MNAR techniques for MNAR data.
MNAR techniques may be more complicated and require solving a more difficult optimization problem, so they may also have a harder time converging to the optimum. There is therefore no guarantee that MNAR techniques work better than MAR techniques.
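
The first reason can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a general result: the data-generating model, the `education` covariate, and the helper names `simulate` and `regression_impute` are all hypothetical, and plain regression imputation stands in here for MAR-style methods in general. The missingness depends on income itself (MNAR), but income is also correlated with a fully observed variable, so a method that only uses observed data still recovers part of the lost information.

```python
import numpy as np

def simulate(n=20_000, seed=0):
    """Toy data (assumed model): income depends on an observed covariate."""
    rng = np.random.default_rng(seed)
    education = rng.normal(0.0, 1.0, n)                      # fully observed
    income = 50 + 10 * education + rng.normal(0.0, 5.0, n)   # in thousands
    # MNAR: probability of a missing income rises with income itself.
    z = (income - 50) / income.std()
    p_missing = 1.0 / (1.0 + np.exp(-1.5 * z))
    missing = rng.random(n) < p_missing
    return education, income, missing

def regression_impute(x, y, missing):
    """Fit y ~ a + b*x on complete cases, then fill missing y with predictions."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X[~missing], y[~missing], rcond=None)
    y_imputed = y.copy()
    y_imputed[missing] = X[missing] @ coef
    return y_imputed

education, income, missing = simulate()
true_mean = income.mean()
complete_case_mean = income[~missing].mean()
imputed_mean = regression_impute(education, income, missing).mean()

bias_cc = abs(complete_case_mean - true_mean)
bias_imp = abs(imputed_mean - true_mean)
print(f"bias, complete cases only:   {bias_cc:.2f}")
print(f"bias, regression imputation: {bias_imp:.2f}")
```

In this setup the imputed mean is still too low, because the imputation cannot see the part of the missingness driven by the unobserved incomes, but it is noticeably closer to the truth than simply dropping the incomplete rows. How much is recovered depends on how strongly the observed variables track the missing ones.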
