Sensitivity Analysis for Unobserved Confounding | by Ugur Yildirim | Feb, 2024

How to know the unknowable in observational studies

  1. Introduction
  2. Problem Setup
    2.1. Causal Graph
    2.2. Model With and Without Z
    2.3. Strength of Z as a Confounder
  3. Sensitivity Analysis
    3.1. Goal
    3.2. Robustness Value
  4. PySensemakr
  5. Conclusion
  6. Acknowledgements
  7. References

1. Introduction

The specter of unobserved confounding (aka omitted variable bias) is a notorious problem in observational studies. In most observational studies, unless we can reasonably assume that treatment assignment is as-if random as in a natural experiment, we can never be truly certain that we controlled for all possible confounders in our model. As a result, our model estimates can be severely biased if we fail to control for an important confounder, and we wouldn't even know it since the unobserved confounder is, well, unobserved!

Given this problem, it is important to assess how sensitive our estimates are to possible sources of unobserved confounding. In other words, it is a helpful exercise to ask ourselves: how much unobserved confounding would there have to be for our estimates to drastically change (e.g., for the treatment effect to no longer be statistically significant)? Sensitivity analysis for unobserved confounding is an active area of research, and there are several approaches to tackling this problem. In this post, I will cover a simple linear method [1] based on the concept of partial R² that is widely applicable to a large spectrum of cases.

2. Problem Setup

2.1. Causal Graph

Let us assume that we have four variables:

  • Y: outcome
  • D: treatment
  • X: observed confounder(s)
  • Z: unobserved confounder(s)

This is a common setting in many observational studies where the researcher is interested in understanding whether the treatment of interest has an effect on the outcome after controlling for possible treatment-outcome confounders.

In our hypothetical setting, the relationships between these variables are such that X and Z both affect D and Y, but D has no effect on Y. In other words, we are describing a scenario where the true treatment effect is null. As will become clear in the next section, the purpose of sensitivity analysis is being able to reason about this treatment effect when we have no access to Z, as we normally won't since it is unobserved. Figure 1 visualizes our setup.

Figure 1: Problem Setup

2.2. Model With and Without Z

To demonstrate the problem that our unobserved Z can cause, I simulated some data in line with the problem setup described above. You can refer to this notebook for the details of the simulation.

Since Z would be unobserved in real life, the only model we can normally fit to data is Y~D+X. Let us see what results we get if we run that regression.

Based on these results, it seems like D has a statistically significant effect of 0.2686 (p<0.001) per one unit change on Y, which we know isn't true based on how we generated the data (no D effect).

Now, let's see what happens to our D estimate when we control for Z as well. (In real life, we of course won't be able to run this additional regression since Z is unobserved, but our simulation setting allows us to peek behind the scenes into the true data generation process.)

As expected, controlling for Z correctly removes the D effect by shrinking the estimate towards zero and giving us a p-value that is no longer statistically significant at the 𝛼=0.05 threshold (p=0.059).
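
To make the contrast between the two regressions concrete, here is a minimal sketch that uses only the Python standard library. It does not reproduce the notebook mentioned above: the seed, sample size, coefficients, and the helper `ols_coefs` are assumptions chosen for illustration, so the printed numbers will differ from the 0.2686 reported in the text. The point is only that the D coefficient is clearly non-zero when Z is left out and close to zero once Z is included.

```python
import random

def ols_coefs(y, columns):
    """OLS coefficients [intercept, b1, b2, ...] of y on the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[best] = a[best], a[p]
        for r in range(k):
            if r != p:
                factor = a[r][p] / a[p][p]
                a[r] = [v - factor * w for v, w in zip(a[r], a[p])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical data-generating process (not the author's): X and Z both
# drive D and Y, while D has no effect on Y.
random.seed(0)
n = 2000
X = [random.gauss(0, 1) for _ in range(n)]
Z = [random.gauss(0, 1) for _ in range(n)]
D = [0.5 * x + 0.5 * z + random.gauss(0, 1) for x, z in zip(X, Z)]
Y = [0.5 * x + 0.5 * z + random.gauss(0, 1) for x, z in zip(X, Z)]

coef_d_without_z = ols_coefs(Y, [D, X])[1]   # model Y~D+X
coef_d_with_z = ols_coefs(Y, [D, X, Z])[1]   # model Y~D+X+Z
print(f"D coefficient without Z: {coef_d_without_z:.3f}")
print(f"D coefficient with Z:    {coef_d_with_z:.3f}")
```

With these made-up coefficients the spurious D coefficient is about 0.2 in expectation, while the coefficient from the model that includes Z is centered on zero.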

2.3. Strength of Z as a Confounder

At this point, we have established that Z is strong enough of a confounder to eliminate the spurious D effect, since the statistically significant D effect disappears when we control for Z. What we haven't discussed yet is exactly how strong Z is as a confounder. For this, we will utilize a useful statistical concept called partial R², which quantifies the proportion of variation that a given variable of interest can explain that can't already be explained by the existing variables in a model. In other words, partial R² tells us the added explanatory power of that variable of interest, above and beyond the other variables that are already in the model. Formally, it can be defined as follows

  partial R² = (RSS_reduced − RSS_full) / RSS_reduced

where RSS_reduced is the residual sum of squares from the model that doesn't include the variable(s) of interest and RSS_full is the residual sum of squares from the model that includes the variable(s) of interest.

In our case, the variable of interest is Z, and we would like to know what proportion of the variation in Y and D that Z can explain that can't already be explained by the existing variables. More precisely, we are interested in the following two partial R² values

  (1) partial R² of Y~Z|D,X = (RSS of Y~D+X − RSS of Y~D+X+Z) / (RSS of Y~D+X)
  (2) partial R² of D~Z|X = (RSS of D~X − RSS of D~X+Z) / (RSS of D~X)

where (1) quantifies the proportion of variance in Y that can be explained by Z that can't already be explained by D and X (so the reduced model is Y~D+X and the full model is Y~D+X+Z), and (2) quantifies the proportion of variance in D that can be explained by Z that can't already be explained by X (so the reduced model is D~X and the full model is D~X+Z).

Now, let us see how strongly associated Z is with D and Y in our data in terms of partial R².

It turns out that Z explains 16% of the variation in Y that can't already be explained by D and X (this is partial R² equation #1 above), and 20% of the variation in D that can't already be explained by X (this is partial R² equation #2 above).
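
The reduced-versus-full comparison behind partial R² can be sketched with the standard library alone. As before, this is not the author's notebook: the data-generating process, the seed, and the helpers `ols_rss` and `partial_r2` are assumptions for illustration. The coefficients were picked so that the population values come out at roughly 0.17 and 0.20, near the figures quoted above, but any particular sample will differ somewhat.

```python
import random

def ols_rss(y, columns):
    """Residual sum of squares from OLS of y on an intercept plus the columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Solve the normal equations by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[best] = a[best], a[p]
        for r in range(k):
            if r != p:
                factor = a[r][p] / a[p][p]
                a[r] = [v - factor * w for v, w in zip(a[r], a[p])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def partial_r2(y, reduced, added):
    """(RSS_reduced - RSS_full) / RSS_reduced for the variables in `added`."""
    rss_reduced = ols_rss(y, reduced)
    rss_full = ols_rss(y, reduced + added)
    return (rss_reduced - rss_full) / rss_reduced

# Hypothetical simulation: Z confounds D and Y, and D has no effect on Y.
random.seed(0)
n = 2000
X = [random.gauss(0, 1) for _ in range(n)]
Z = [random.gauss(0, 1) for _ in range(n)]
D = [0.5 * x + 0.5 * z + random.gauss(0, 1) for x, z in zip(X, Z)]
Y = [0.5 * x + 0.5 * z + random.gauss(0, 1) for x, z in zip(X, Z)]

r2_yz_dx = partial_r2(Y, [D, X], [Z])  # (1): reduced Y~D+X, full Y~D+X+Z
r2_dz_x = partial_r2(D, [X], [Z])      # (2): reduced D~X,   full D~X+Z
print(f"partial R2 of Y~Z|D,X: {r2_yz_dx:.3f}")
print(f"partial R2 of D~Z|X:   {r2_dz_x:.3f}")
```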

3. Sensitivity Analysis

3.1. Goal

As we discussed in the previous section, unobserved confounding poses a problem in real research settings precisely because, unlike in our simulation setting, Z cannot be observed. In other words, we are stuck with the model Y~D+X, having no way to know what our results would have been if we could run the model Y~D+X+Z instead. So, what can we do?

Intuitively, a reasonable sensitivity analysis approach should be able to tell us that if a Z such as the one we have in our data were to exist, it would nullify our results. Remember that our Z explains 16% of the variation in Y and 20% of the variation in D that can't be explained by observed variables. Therefore, we expect sensitivity analysis to tell us that a hypothetical Z-like confounder of similar strength would be enough to eliminate the statistically significant D effect.

But how can we calculate that the unobserved confounder's strength needs to be in this 16–20% range in the partial R² scale without ever having access to it? Enter robustness value.

3.2. Robustness Value

Robustness value (RV) formalizes the idea we mentioned above of determining the necessary strength of a hypothetical unobserved confounder that could nullify our results. The usefulness of RV emanates from the fact that we only need our observable model Y~D+X and not the unobservable model Y~D+X+Z to be able to calculate it.

Formally, we can write down as follows the RV that quantifies how strong unobserved confounding needs to be to change our observed statistical significance of the treatment effect (if the notation is too much to follow, just remember the key idea that the RV is a measure of the strength of confounding needed to change our results)

Image by author, equations based on [1], see pages 49–52

where

  • 𝛼 is our chosen significance level (usually set to 0.05 or 5%),
  • q determines the percent reduction q*100% in significance that we care about (usually set to 1, since we usually care about confounding that could reduce statistical significance by 1*100%=100%, hence rendering it not statistically significant),
  • t_betahat_treat is the observed t-value of our treatment from the model Y~D+X (which is 8.389 in this case, as can be seen from the regression results above),
  • df is our degrees of freedom (which is 1000–3=997 in this case since we simulated 1000 samples and are estimating 3 parameters including the intercept), and
  • t*_alpha,df-1 is the t-value threshold associated with a given 𝛼 and df-1 (1.96 if 𝛼 is set to 0.05).
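
For reference, the expression that these symbols belong to can be written out as follows. This is a transcription of the robustness value formulas as I understand them from [1], not a copy of the original equation image, so it should be checked against the paper. The intermediate names f_q, f* and f_{q,α} are shorthand for the rescaled t-values.

```latex
\begin{aligned}
f_q &= q\,\frac{|t_{\hat\beta_{treat}}|}{\sqrt{df}}, \qquad
f^{*}_{\alpha,\,df-1} = \frac{|t^{*}_{\alpha,\,df-1}|}{\sqrt{df-1}}, \qquad
f_{q,\alpha} = f_q - f^{*}_{\alpha,\,df-1} \\[4pt]
RV_{q,\alpha} &=
\begin{cases}
0, & f_{q,\alpha} < 0 \\[2pt]
\tfrac{1}{2}\left(\sqrt{f_{q,\alpha}^{4} + 4 f_{q,\alpha}^{2}} - f_{q,\alpha}^{2}\right),
  & f_{q,\alpha} \ge 0 \ \text{and}\ f_q \le 1 / f^{*}_{\alpha,\,df-1} \\[2pt]
\dfrac{f_q^{2} - \left(f^{*}_{\alpha,\,df-1}\right)^{2}}{1 + f_q^{2}}, & \text{otherwise}
\end{cases}
\end{aligned}
```

In the example of this post only the middle case is relevant, because f_q is far below 1/f*.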

We are now ready to calculate the RV in our own data using only the observed model Y~D+X (res_ydx).
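
A minimal sketch of that calculation, using only the two numbers quoted in the text (t-value 8.389 and df 997) rather than the fitted model object. The function name `robustness_value` and its arguments are illustrative, the critical value is rounded to 1.96 as in the text, and the formula is my reading of [1].

```python
import math

def robustness_value(t_stat, dof, q=1.0, t_crit=None):
    """Robustness value in the partial R2 scale.

    With t_crit=None no significance threshold is applied, which gives the
    RV for shrinking the point estimate itself by q*100%.
    """
    f_q = q * abs(t_stat) / math.sqrt(dof)
    f_crit = 0.0 if t_crit is None else abs(t_crit) / math.sqrt(dof - 1)
    f_qa = f_q - f_crit
    if f_qa < 0:
        return 0.0  # the result is already not significant
    if f_crit > 0 and f_q > 1 / f_crit:
        return (f_q**2 - f_crit**2) / (1 + f_q**2)
    return 0.5 * (math.sqrt(f_qa**4 + 4 * f_qa**2) - f_qa**2)

# Values reported in the text for the model Y~D+X.
rv_significance = robustness_value(t_stat=8.389, dof=997, q=1, t_crit=1.96)
rv_estimate = robustness_value(t_stat=8.389, dof=997, q=1)

print(round(rv_significance, 3))  # 0.184
print(round(rv_estimate, 3))      # 0.233
```

The first number is the RV for statistical significance (about 18%), and the second is the RV for driving the coefficient estimate all the way to zero.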

It is by no stroke of luck that our RV (18%) falls right in the range of the partial R² values we calculated for Y~Z|D,X (16%) and D~Z|X (20%) above. What the RV is telling us here is that, even without any explicit knowledge of Z, we can still reason that any unobserved confounder needs, on average, at least 18% strength in the partial R² scale vis-à-vis both the treatment and the outcome to be able to nullify our statistically significant result.

The reason why the RV isn't 16% or 20% but falls somewhere in between (18%) is that it is designed to be a single number that summarizes the necessary strength of the confounder with both the outcome and the treatment, so 18% makes perfect sense given what we know about the data. You can think about it like this: since the method doesn't have access to the actual numbers 16% and 20% when calculating the RV, it is doing its best to quantify the strength of the confounder by assigning 18% to both partial R² values (Y~Z|D,X and D~Z|X), which isn't too far off from the truth at all and actually does a great job summarizing the strength of the confounder.

Of course, in real life we won't have the Z variable to double check that our RV is correct, but seeing how the two results align here should at least give you some confidence in the method. Finally, once we calculate the RV, we should think about whether an unobserved confounder of that strength is plausible. In our case, the answer is 'yes' because we have access to the data generation process, but for your specific real-life application, the existence of such a strong confounder might be an unreasonable assumption. This would be good news for you since no realistic unobserved confounder could drastically change your results.

4. PySensemakr

The sensitivity analysis technique described above has already been implemented with all of its bells and whistles as a Python package under the name PySensemakr (R, Stata, and Shiny App versions exist as well). For example, to get the exact same result that we manually calculated in the previous section, we can simply pass the fitted Y~D+X model to the package and print its summary.
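
A sketch of what that call can look like, assuming `res_ydx` is a fitted statsmodels OLS result whose treatment column is named `D`. The wrapper function `sensitivity_summary` is hypothetical, and the constructor arguments follow the PySensemakr documentation as I understand it, so check them against the version you have installed.

```python
def sensitivity_summary(res_ydx, treatment="D", q=1, alpha=0.05):
    """Wrap a fitted statsmodels OLS result in a PySensemakr sensitivity object."""
    # Imported inside the function so that the package is only required
    # when the function is actually called.
    import sensemakr as smkr

    sensitivity = smkr.Sensemakr(model=res_ydx, treatment=treatment,
                                 q=q, alpha=alpha)
    sensitivity.summary()  # prints a report that includes the robustness values
    return sensitivity

# Example usage (needs PySensemakr, statsmodels and pandas installed, plus a
# DataFrame `df` with columns Y, D and X):
#   import statsmodels.formula.api as smf
#   res_ydx = smf.ols("Y ~ D + X", data=df).fit()
#   sensitivity = sensitivity_summary(res_ydx)
```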

Note that “Robustness Value, q = 1 alpha = 0.05” is 0.184, which is exactly what we calculated above. In addition to the RV for statistical significance, the package also provides the RV that is needed for the coefficient estimate itself to shrink to 0. Not surprisingly, unobserved confounding needs to be even larger for this to happen (0.233 vs 0.184).

The package also provides contour plots for the two partial R² values, which allows for an intuitive visual display of sensitivity to possible levels of confounding with the treatment and the outcome (in this case, it shouldn't be surprising to see that the x/y-axis value pairs that meet the red dotted line include 0.18/0.18 as well as 0.20/0.16).
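
The quantity behind such a contour can be sketched by hand: for any hypothesized pair of partial R² values, the omitted variable bias formulas in [1] give an adjusted estimate and standard error, and therefore an adjusted t-value. The sketch below uses only the numbers reported in the text (estimate 0.2686, t-value 8.389, df 997). The function name `adjusted_t` is illustrative, and it assumes the confounder pulls the estimate towards zero.

```python
import math

def adjusted_t(estimate, se, dof, r2dz_x, r2yz_dx):
    """t-value of the treatment after adjusting for a hypothetical confounder.

    r2dz_x  : partial R2 of the confounder with the treatment (x-axis)
    r2yz_dx : partial R2 of the confounder with the outcome (y-axis)
    """
    bias = math.sqrt(r2yz_dx * r2dz_x / (1 - r2dz_x)) * se * math.sqrt(dof)
    sign = 1.0 if estimate >= 0 else -1.0
    adj_estimate = sign * (abs(estimate) - bias)  # bias reduces the estimate
    adj_se = (math.sqrt((1 - r2yz_dx) / (1 - r2dz_x))
              * se * math.sqrt(dof / (dof - 1)))
    return adj_estimate / adj_se

estimate = 0.2686        # D coefficient from Y~D+X, as reported in the text
se = estimate / 8.389    # standard error implied by the reported t-value
dof = 997

for r2dz_x, r2yz_dx in [(0.0, 0.0), (0.18, 0.18), (0.20, 0.16), (0.184, 0.184)]:
    t_adj = adjusted_t(estimate, se, dof, r2dz_x, r2yz_dx)
    print(f"r2dz_x={r2dz_x:.3f}, r2yz_dx={r2yz_dx:.3f} -> adjusted t = {t_adj:.2f}")
```

The pair built from the robustness value lands almost exactly on the 1.96 threshold, and the two rounded pairs mentioned in the text come out slightly above it.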

One can even add benchmark values to the contour plot as proxies for possible amounts of confounding. In our case, since we only have one observed covariate X, we can set our benchmarks to be 0.25x, 0.5x and 1x as strong as that observed covariate. The resulting plot tells us that a confounder that is half as strong as X should be enough to nullify our statistically significant result (since the “0.5x X” value falls right on the red dotted line).
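
A sketch of how such a benchmarked plot can be requested, again assuming a fitted statsmodels result with columns named D and X. The wrapper `plot_with_benchmarks` is hypothetical. The `benchmark_covariates`, `kd` and `sensitivity_of` arguments are taken from my reading of the PySensemakr documentation and should be treated as assumptions to verify against your installed version.

```python
def plot_with_benchmarks(res_ydx, multiples=(0.25, 0.5, 1)):
    """Draw a t-value contour plot with benchmarks based on the covariate X."""
    # Imported inside the function so that the package is only required
    # when the function is actually called.
    import sensemakr as smkr

    sensitivity = smkr.Sensemakr(
        model=res_ydx,
        treatment="D",
        benchmark_covariates=["X"],  # observed covariate used as a yardstick
        kd=list(multiples),          # confounders 0.25x, 0.5x and 1x as strong as X
        q=1,
        alpha=0.05,
    )
    # Contours of the adjusted t-value, so the significance threshold is visible.
    sensitivity.plot(sensitivity_of="t-value")
    return sensitivity
```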

Finally, I would like to note that while the simulated data in this example used a continuous treatment variable, in practice the method works for any kind of treatment variable including binary treatments. On the other hand, the outcome variable technically needs to be a continuous one since we are operating in the OLS framework. However, the method can still be used even with a binary outcome if we model it using OLS (this is called a linear probability model, or LPM [2]).

5. Conclusion

The possibility that our effect estimate may be biased due to unobserved confounding is a common danger in observational studies. Despite this potential danger, observational studies are a vital tool in data science because randomization simply isn't feasible in many cases. Therefore, it is important to know how we can address the issue of unobserved confounding by running sensitivity analyses to see how robust our estimates are to potential such confounding.

The robustness value method by Cinelli and Hazlett discussed in this post is a simple and intuitive approach to sensitivity analysis formulated in a familiar linear model framework. If you are interested in learning more about the method, I highly recommend taking a look at the original paper and the package documentation, where you can learn about many more interesting applications of the method such as 'extreme scenario' analysis.

There are also many other approaches to sensitivity analysis for unobserved confounding, and I would like to briefly mention some of them here for readers who would like to continue learning more on this topic. One versatile technique is the E-value developed by VanderWeele and Ding that formulates the problem in terms of risk ratios [3] (implemented in R here). Another technique is the Austen plot developed by Veitch and Zaveri based on the concepts of partial R² and propensity score [4] (implemented in Python here), and yet another recent approach is by Chernozhukov et al. [5] (implemented in Python here).

6. Acknowledgements

I would like to thank Chad Hazlett for answering my question related to using the method with binary outcomes and Xinyi Zhang for providing a lot of helpful feedback on the post. Unless otherwise noted, all images are by the author.

7. References

[1] C. Cinelli and C. Hazlett, Making Sense of Sensitivity: Extending Omitted Variable Bias (2019), Journal of the Royal Statistical Society

[2] J. Murray, Linear Probability Model, Murray's personal website

[3] T. VanderWeele and P. Ding, Sensitivity Analysis in Observational Research: Introducing the E-Value (2017), Annals of Internal Medicine

[4] V. Veitch and A. Zaveri, Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding (2020), NeurIPS

[5] V. Chernozhukov, C. Cinelli, W. Newey, A. Sharma, and V. Syrgkanis, Long Story Short: Omitted Variable Bias in Causal Machine Learning (2022), NBER
