WeeBytes
Debiasing Embeddings: Why Simple Fixes Don't Work


Early approaches to embedding bias tried to 'project out' bias directions from vector space — remove the gender axis, remove the race axis. Research now shows these techniques are superficial: bias is distributed across the embedding space and resurfaces in downstream tasks even after apparent removal.

The intuitive fix for biased embeddings is appealing. Identify the direction in vector space corresponding to bias (say, the gender axis between 'he' and 'she'), then mathematically project it out of every embedding. Problem solved — right? Research from 2019 onward has shown the picture is far messier. Gonen and Goldberg's 'Lipstick on a Pig' paper demonstrated that post-hoc debiasing methods make bias less detectable by the specific tests they target, but the underlying geometric structure remains. Embeddings for 'doctor' and 'nurse' still cluster in ways that correlate with gender even after apparent debiasing — and downstream tasks like coreference resolution still show biased outcomes.

The deeper issue is that bias in embeddings isn't confined to a single axis. It's distributed across many dimensions, entangled with genuine semantic signal, and encoded through complex correlations in training data. Removing it surgically without degrading useful semantic structure turns out to be extremely difficult.

More promising approaches tackle the problem at earlier stages: careful training data curation, counterfactual data augmentation (duplicating examples with swapped demographic attributes), adversarial debiasing during training, and continual bias evaluation across downstream tasks rather than just intrinsic tests. The frontier also includes contextual debiasing: adjusting how biased embeddings are used in downstream decisions rather than trying to purify the embeddings themselves.

For practitioners, the key lesson is sobering: debiasing embeddings is a research problem, not a solved engineering task. Bias audits must test downstream outcomes, not just geometric properties.
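The 'project out the bias direction' idea can be sketched in a few lines of NumPy. The vectors below are random toy values standing in for real word embeddings, and estimating the gender axis from a single 'he'/'she' pair is a deliberate simplification (real methods like Bolukbasi et al.'s hard debiasing average over several definitional pairs):

```python
import numpy as np

# Toy 8-dimensional "embeddings" (random stand-ins, not a real model).
rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=8) for w in ["he", "she", "doctor", "nurse"]}

# Estimate a gender direction from one definitional pair (simplified).
g = vecs["he"] - vecs["she"]
g /= np.linalg.norm(g)

def project_out(v, direction):
    """Remove the component of v along `direction` (hard debiasing)."""
    return v - np.dot(v, direction) * direction

debiased = {w: project_out(v, g) for w, v in vecs.items()}

# Every debiased vector is now exactly orthogonal to the estimated axis...
for v in debiased.values():
    assert abs(np.dot(v, g)) < 1e-9
# ...yet distances among the remaining dimensions are untouched, which is
# why gender-correlated clustering can survive the projection.
```

This is precisely the operation Gonen and Goldberg critique: it zeroes one measurable component while leaving the many other dimensions that co-vary with gender intact.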
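Counterfactual data augmentation, one of the earlier-stage approaches mentioned above, can be sketched as a term-swapping pass over the training corpus. The `SWAPS` lexicon here is a tiny hypothetical example; a real system needs a curated bidirectional lexicon and care with ambiguous words (e.g. 'her' as possessive vs. object pronoun):

```python
import re

# Hypothetical swap lexicon (illustrative only, far from complete).
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "man": "woman", "woman": "man"}

PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)

def counterfactual(text):
    """Return a copy of `text` with demographic terms swapped."""
    def repl(m):
        word = m.group(0)
        swapped = SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return PATTERN.sub(repl, text)

corpus = ["He is a doctor.", "She is a nurse."]
# Train on both originals and swapped variants so the model sees each
# profession paired with each demographic attribute equally often.
augmented = corpus + [counterfactual(s) for s in corpus]
```

The design intent is that the model's training signal no longer correlates professions with one gender, rather than trying to scrub that correlation out of the vectors afterwards.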

