Two OODs, Out of Domain and Out of Distribution

Two concepts that are often confused

Shiro Matsumoto
Feb 7, 2024

In machine learning papers, the terms Out of Distribution and Out of Domain appear in discussions of uncertainty estimation and generalization. Do they mean the same thing? As I understand them, they are different concepts.

In the context of machine learning, Out of Domain is a concept paired with Applicable Domain, and Out of Distribution is a concept paired with In Distribution.

  1. Applicable Domain and Out of Domain
    Applicable Domain
    This term refers to the range or domain of data for which the model performs adequately for its intended purpose. This term broadly implies the characteristics and conditions of the data under which the model will function properly.
    Out of Domain
    This term refers to data outside the original domain or range for which the model was designed or trained. This could include different characteristics or different types of data, not just different distributions.
  2. In Distribution and Out of Distribution
    In Distribution
    This term refers to data drawn from the same distribution as the data set used to train the model. In other words, it is data whose types and characteristics are familiar to the model.
    Out of Distribution
    This term refers to data from a distribution that differs from the distribution of the data the model saw during training. In other words, it is data that is unknown or anomalous to the model.

In short, Out of Distribution and Out of Domain both refer to unusual data on which the model may not function properly. The difference is what the data falls outside of: when the unusual data is “data outside the domain of the training data,” it is Out of Domain, and when it is “data outside the distribution of the training data,” it is Out of Distribution.

Let me elaborate a bit. Out of Distribution samples can exist within the Applicable Domain. The Applicable Domain is the range of data the model was designed or trained to handle, but not all data within it follows the same distribution as the model’s training data. Data in the domain that follows a different distribution is Out of Distribution. Conversely, it is rare to find In Distribution data outside the Applicable Domain; Out of Domain data is almost always Out of Distribution as well.

Consider a model trained on English text. If the model is trained on an extremely large amount of English text, most English text will be within the Applicable Domain and In Distribution. However, if contractions used only by a subset of the population (e.g., young people or gamers) were left out of the training data set, those contractions are within the Applicable Domain but Out of Distribution. Also, since the training data is English text only, Japanese text is Out of Domain. In addition, since the distributional characteristics of most Japanese text differ from those of English text, most Japanese text is also Out of Distribution.
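The English-text example can be sketched as two separate checks. This is only a toy illustration under loud assumptions: the tiny `TRAIN_VOCAB`, the ASCII-letters test standing in for "is this English?", the vocabulary-coverage score standing in for "does this follow the training distribution?", and the 0.6 threshold are all hypothetical choices of mine, not a real detection method.

```python
# Hypothetical stand-in for the vocabulary seen during training.
TRAIN_VOCAB = {"i", "am", "going", "to", "the", "store", "see", "you", "later"}


def is_english_domain(text):
    """Crude domain proxy (assumption): every letter is an ASCII letter."""
    letters = [c for c in text if c.isalpha()]
    return bool(letters) and all(c.isascii() for c in letters)


def vocab_coverage(text):
    """Fraction of whitespace-separated tokens found in TRAIN_VOCAB."""
    tokens = [t.strip(".,!?'") for t in text.lower().split()]
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in TRAIN_VOCAB)
    return known / len(tokens)


def label_text(text, min_coverage=0.6):
    """Return a (domain label, distribution label) pair for one text."""
    domain = "Applicable Domain" if is_english_domain(text) else "Out of Domain"
    if vocab_coverage(text) >= min_coverage:
        distribution = "In Distribution"
    else:
        distribution = "Out of Distribution"
    return domain, distribution


for sample in ["I am going to the store", "gg wp brb afk", "こんにちは"]:
    print(sample, "->", label_text(sample))
```

With these toy settings, the ordinary sentence is labeled Applicable Domain and In Distribution, the gamer abbreviations are labeled Applicable Domain but Out of Distribution, and the Japanese greeting is labeled Out of Domain and Out of Distribution. The point is only that the two labels come from two different questions.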

As another example, a simple regression problem is shown as an image. Assuming that a regression line is obtained using the blue dots as training data, the center of the figure (light blue background) is the Applicable Domain, and the left and right sides are the Out of Domain. Some of the test (production) data are In Distribution and some are Out of Distribution.
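The regression picture can be approximated in one dimension. The sketch below is a rough illustration, not the setup behind the figure: the domain interval `DOMAIN`, the synthetic Gaussian training inputs, and the z-score threshold used as a stand-in for "same distribution" are all assumptions introduced here.

```python
import random
import statistics

random.seed(0)

# Hypothetical applicable domain: the x-range the model is meant to serve.
DOMAIN = (0.0, 10.0)

# Synthetic training inputs (assumption): clustered around x = 5.
train_x = [random.gauss(5.0, 1.0) for _ in range(200)]
train_mean = statistics.mean(train_x)
train_std = statistics.stdev(train_x)


def in_domain(x, domain=DOMAIN):
    """True when x lies inside the intended input range."""
    low, high = domain
    return low <= x <= high


def in_distribution(x, max_z=3.0):
    """Crude heuristic: x is within max_z standard deviations of the training mean."""
    return abs(x - train_mean) / train_std <= max_z


def label_point(x):
    """Return a (domain label, distribution label) pair for one input."""
    domain = "Applicable Domain" if in_domain(x) else "Out of Domain"
    distribution = "In Distribution" if in_distribution(x) else "Out of Distribution"
    return domain, distribution


for x in (5.0, 9.5, 15.0):
    print(x, "->", label_point(x))
```

Here x = 5.0 sits in the middle of the training data, x = 9.5 is still inside the intended range but far from anything seen in training, and x = 15.0 is outside the range altogether. A z-score is only one of many ways to approximate "same distribution"; a density estimate or a distance to nearest training points would serve the same role.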

The abbreviation OOD most likely means Out of Distribution, but in some papers it means Out of Domain, so check which meaning a given paper intends.


