Although in our previous post we established ∆E00 as the winner, thanks to the research and verification effort behind it, and although it is the metric recommended by the CIE and ISO, we also know that industry cannot change its norms overnight, even when the change brings solid advantages. In fact, despite that recommendation, many people today keep using ∆ECMC and ∆E94 (not to mention those who stick to ∆Eab). In this last post we will try to determine what to expect after adopting any of these metrics, by means of conveniently prepared graphics that will allow us to compare them.
We have already made some of those comparisons in previous posts, but now we will try to find a common ground for evaluating them all.
How can we compare metrics?
In order to answer this question, let us first note that, since a metric is a comparison formula, comparing metrics means... comparing comparisons! Faced with this concern, which kept me awake for a long time (well, not so long, really), my first reaction was to find out how the researchers who developed these metrics justified, at the time, that "this metric I have just invented is better than that one". How do they objectively claim a metric's superiority?
In the first post of this series, we established this:
A metric should be considered better than any other when the color differences it predicts correlate better with a human observer's perception.
This means that in order to prove that superiority we must define in an objective way the correlation between metric predictions and human perception.
The papers where this research is published often use a correlation factor known as the Standardized Residual Sum of Squares, or STRESS. In plain terms, this method works over an observation data set (compared color pairs, in our case): for each pair, it compares the difference observed by human subjects with the value predicted by the formula. The difference between those differences yields an error quantity, either positive (the formula predicts a larger difference than observed) or negative (it predicts a smaller one).
The merit of a metric depends on the whole set of those differences: if, combined, they represent a small quantity, the metric should be better than another one where the same quantity is larger. The simplest way of combining them would be to sum them up, but since some are positive and others negative, they would tend to cancel each other out. We mathematicians and engineers have a little trick for cases like this: we first square the differences, turning them all into positive quantities, and then sum them up. The STRESS method builds on this idea, but instead of producing an absolute value for the combined error, it yields a relative one on a percentage scale, allowing the direct comparison of different factors.
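As a minimal sketch of this idea (the function name and the toy data are mine, for illustration only), the usual STRESS formulation first applies a scaling factor that brings the metric's predictions to the scale of the visual data, then expresses the remaining squared residuals as a percentage:

```python
import math

def stress(predicted, observed):
    """STRESS (%) between metric predictions and observed visual differences.

    0 % means perfect (proportional) agreement; larger values mean a
    worse correlation between the metric and the observers.
    """
    # Scaling factor that removes the arbitrary scale of the metric.
    f = sum(p * p for p in predicted) / sum(p * o for p, o in zip(predicted, observed))
    # Sum of squared residuals, relative to the scaled observed data.
    num = sum((p - f * o) ** 2 for p, o in zip(predicted, observed))
    den = sum((f * o) ** 2 for o in observed)
    return 100.0 * math.sqrt(num / den)

# A metric exactly proportional to the observations scores 0 %:
print(stress([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # → 0.0
```

Note that squaring makes all residuals positive, exactly as described above, and the percentage scale is what lets STRESS values from different metrics be compared directly.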
A more "visual" comparison
The STRESS method may be easy for experts to grasp, but for the general public, even readers with a technical background, it can be daunting. So I came up with a more intuitive approach, based on a simple idea:
Let's imagine we move freely through L*a*b* space in "steps" of exactly 1 ∆E according to the simplest formula, ∆Eab. Every step then has length 1 (one) only in that metric; the others will say the step is longer or shorter, depending on the "place" where we are. Since color space is three-dimensional, for the sake of simplicity we will choose to travel along straight routes or "paths", which lets us represent in a simple graph how that unit-sized step changes as we travel each path. To make the visualization more complete, we will choose several different paths.
There are certainly many criteria we could use to choose those paths; once again, for simplicity, we will limit ourselves to two classes:
- We will depart from the L* axis (a* = b* = 0) and move in a straight line towards higher C* values, keeping h* constant, for several values of h* (differences in "chroma"):
- We will spin in circles around the L* axis, keeping the radius C* constant and visiting every h*, for several values of C* (differences in "hue"):
In both cases, we will stay at the "middle" of the L*a*b* space, i.e. L* = 50.
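As a sketch of how those paths can be generated (a minimal Python version I wrote for illustration, not the tool actually used for the figures), the chroma paths simply increase C* by 1 at constant h*, while on the hue circles the angular step is chosen so that the chord between consecutive samples, i.e. the ∆Eab step, is exactly 1:

```python
import math

def chroma_path(h_deg, c_max=100):
    """(L*, a*, b*) samples along a constant-hue line, in steps of 1 dEab."""
    h = math.radians(h_deg)
    return [(50.0, c * math.cos(h), c * math.sin(h)) for c in range(c_max + 1)]

def hue_path(c):
    """(L*, a*, b*) samples around a constant-C* circle, chord (dEab) = 1.

    Assumes C* >= 0.5, so a chord of 1 fits on the circle.
    """
    # A chord of 1 on a circle of radius C* spans an angle of 2*asin(1/(2*C*)).
    step = 2.0 * math.asin(1.0 / (2.0 * c))
    n = int(2.0 * math.pi / step)
    return [(50.0, c * math.cos(i * step), c * math.sin(i * step)) for i in range(n)]
```

Each consecutive pair of samples is then fed to every metric, and the resulting values are what the graphs below plot.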
This "mathematical tour" was made possible thanks to an advanced version (currently in development) of an Excel add-on I published some time ago (available here), which greatly facilitates this kind of calculation. The important thing for us is that the results are easy to plot. Let's look at them right away.
Straight paths (h* = constant)
The next graph, at L* = 50 and varying h* in 5º steps, shows the equivalence of a ∆Eab = 1 step in each of the ∆Eab, ∆ECMC, ∆E94, and ∆E00 metrics. Since we take ∆Eab as the reference, its value is always 1. Recall that we are analyzing here the behaviour of differences between samples that change in "chroma" only. Several facts are apparent:
- Only ∆E00 takes hue into account when chroma changes: its curve "oscillates" with h* between ∆E94 and ∆ECMC, which remain "fixed". Thus, ∆E00 behaves just like ∆E94 on yellows (h* = 90º) and blues (h* = 270º), and almost like ∆ECMC on red-magentas (h* = 0º) and turquoise-greens (h* = 180º);
- By using ∆E94 we will underestimate differences near the achromatic axis, for C* below 7 (approx.), for colors on the a* axis (b* near zero);
- By using ∆ECMC we will overestimate differences a little almost everywhere, especially for colors on the b* axis (a* near zero);
- Last, we should note that ∆Eab almost always overestimates chroma differences as soon as we leave the achromatic zone, reaching estimates 5 times greater for highly saturated colors (C* > 90).
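That last point is easy to verify numerically. For a pure chroma step, the ∆E94 formula reduces to ∆C*/SC with SC = 1 + 0.045·C*, so the value it assigns to a ∆Eab = 1 step shrinks as C* grows (a minimal sketch, assuming the default kC = 1):

```python
def de94_chroma_step(c):
    """dE94 value of a pure dC* = 1 step taken at chroma c (kC = 1)."""
    return 1.0 / (1.0 + 0.045 * c)

print(de94_chroma_step(0))   # → 1.0   (identical to dEab on the achromatic axis)
print(de94_chroma_step(90))  # ≈ 0.2   (dEab overestimates about 5x, as noted above)
```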
Circular paths (C* = constant)
The following graph, at L* = 50 and with C* varying in steps of 2 units, shows the equivalence of a ∆Eab = 1 step in each of the ∆Eab, ∆ECMC, ∆E94, and ∆E00 metrics. In this case we cannot speak of "∆h* = 1 steps", because h* is an angular parameter; the distance between two colors with the same C* but different h* depends on C* (this difference is known as ∆H* and equals the chord between the samples on the C* circle they belong to). As before, ∆Eab is our reference metric, so it remains constantly equal to 1. Recall that we are analyzing here the behaviour of differences between samples that change in "hue" only. Several facts are again apparent:
- ∆E94 (besides ∆Eab, naturally) does not take hue changes into account in color differences (it assigns the same "weight" to all of them, depending only on C*), so its graph is always "flat";
- ∆ECMC does take hue changes into account, but it seems to exaggerate them on oranges (h* ≈ 63º), and it shows three abrupt changes, the other two being on greens (h* ≈ 171º) and blues (h* ≈ 286º);
- ∆E00 also adapts to hue changes, but in a smooth, uniform way, oscillating around ∆E94 and suggesting a "better" behaviour.
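For readers who want to reproduce these curves, here is a self-contained CIEDE2000 implementation following the published formulation (a sketch for experimentation, with kL = kC = kH = 1 by default, not production-verified code):

```python
import math

def delta_e00(L1, a1, b1, L2, a2, b2, kL=1.0, kC=1.0, kH=1.0):
    """CIEDE2000 color difference between two L*a*b* colors."""
    C1, C2 = math.hypot(a1, b1), math.hypot(a2, b2)
    Cbar = (C1 + C2) / 2.0
    # G rescales a* to correct the non-uniformity near the achromatic axis.
    G = 0.5 * (1.0 - math.sqrt(Cbar**7 / (Cbar**7 + 25.0**7)))
    a1p, a2p = (1.0 + G) * a1, (1.0 + G) * a2
    C1p, C2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360.0
    h2p = math.degrees(math.atan2(b2, a2p)) % 360.0

    dLp, dCp = L2 - L1, C2p - C1p
    if C1p * C2p == 0.0:
        dhp = 0.0
    else:
        dhp = h2p - h1p
        if dhp > 180.0:
            dhp -= 360.0
        elif dhp < -180.0:
            dhp += 360.0
    dHp = 2.0 * math.sqrt(C1p * C2p) * math.sin(math.radians(dhp) / 2.0)

    Lbp, Cbp = (L1 + L2) / 2.0, (C1p + C2p) / 2.0
    if C1p * C2p == 0.0:
        hbp = h1p + h2p
    elif abs(h1p - h2p) <= 180.0:
        hbp = (h1p + h2p) / 2.0
    elif h1p + h2p < 360.0:
        hbp = (h1p + h2p + 360.0) / 2.0
    else:
        hbp = (h1p + h2p - 360.0) / 2.0

    # Hue-dependent weighting: this is what makes the curves "oscillate".
    T = (1.0 - 0.17 * math.cos(math.radians(hbp - 30.0))
             + 0.24 * math.cos(math.radians(2.0 * hbp))
             + 0.32 * math.cos(math.radians(3.0 * hbp + 6.0))
             - 0.20 * math.cos(math.radians(4.0 * hbp - 63.0)))
    SL = 1.0 + 0.015 * (Lbp - 50.0)**2 / math.sqrt(20.0 + (Lbp - 50.0)**2)
    SC = 1.0 + 0.045 * Cbp
    SH = 1.0 + 0.015 * Cbp * T
    # Rotation term: chroma-hue interaction in the blue region.
    dtheta = 30.0 * math.exp(-(((hbp - 275.0) / 25.0)**2))
    RC = 2.0 * math.sqrt(Cbp**7 / (Cbp**7 + 25.0**7))
    RT = -math.sin(math.radians(2.0 * dtheta)) * RC

    return math.sqrt((dLp / (kL * SL))**2
                     + (dCp / (kC * SC))**2
                     + (dHp / (kH * SH))**2
                     + RT * (dCp / (kC * SC)) * (dHp / (kH * SH)))
```

Evaluating this function over consecutive pairs of the paths described above is all it takes to rebuild the ∆E00 curves in both graphs.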
I decided to go with ∆E00. What should I consider?
Let's assume we are convinced of ∆E00's superiority, as already shown. If we were establishing a tolerance system from scratch, with no relationship whatsoever to previous experience, it would be rather simple (although not trivial) to look for appropriate tolerances to use with ∆E00. The problem arises when we expect a statistical result that matches our previous experience. So, the bonus question is:
Is there a tolerance equivalence between ∆Eab and ∆E00?
Should that equivalence exist, we could replace the limits applicable with ∆Eab by their ∆E00 equivalents and expect a pass/fail ratio close to the usual one. Unfortunately, there is nothing close to saying "so many ∆Eab are equivalent to so many ∆E00", as we will soon show. This means:
- Adopting ∆E00 as our standard, we necessarily have to admit that, sometimes, we may have passed jobs that should have been rejected, and the other way round;
- If we want to keep roughly the same pass/fail ratio, not only do tolerances have to be modified, but a single tolerance value may not even be enough.
About the former we can do nothing, right? But about the latter, we at least have the ISO 12647-2:2013 directives, stating that CMYK solid tolerances have a limit of 5 ∆Eab, but that this limit must be lowered to 3.5 for CMY when using ∆E00, keeping 5 for black. This directive enables another argument (in fact the same one, viewed from the other side) for those willing to "stick" to ∆Eab: the (wrong) sensation of having to comply with tighter tolerances. And, to make things worse, ISO does not explain why a 5 ∆Eab tolerance seems to be equivalent to 3.5 ∆E00, and for chromatic solids only.
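The ISO limits just mentioned are easy to encode; here is a minimal sketch (the dictionary layout and function name are mine, not from the standard):

```python
# ISO 12647-2:2013 solid tolerances, as described above.
ISO_TOLERANCE = {
    "dEab": {"C": 5.0, "M": 5.0, "Y": 5.0, "K": 5.0},
    "dE00": {"C": 3.5, "M": 3.5, "Y": 3.5, "K": 5.0},
}

def solid_passes(metric, ink, measured):
    """True if a measured solid difference is within the ISO limit."""
    return measured <= ISO_TOLERANCE[metric][ink]

print(solid_passes("dE00", "C", 4.0))  # → False (4.0 exceeds the 3.5 limit)
print(solid_passes("dE00", "K", 4.0))  # → True  (black keeps the 5 limit)
```

Of course, the same measured value may pass under one metric and fail under the other, which is precisely the adoption problem discussed here.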
In a study that specifically asked whether an equivalence between ∆Eab and ∆E00 could be established, not only does it conclude that it cannot, but it also establishes that the optimal tolerance values (at least for the analyzed data set) needed to match the pass/fail ratio that would result from applying the classic 5 ∆Eab limit of ISO 12647-2:2013 are different for each ink. Those optimal values are the following:
| Tolerance | Cyan | Magenta | Yellow | Black |
|---|---|---|---|---|
| ∆Eab, ISO 12647-2 | 5 | 5 | 5 | 5 |
| ∆E00, ISO 12647-2 | 3.5 | 3.5 | 3.5 | 5 |
| ∆E00 giving the same result as ∆Eab, ISO 12647-2 | 4.1 | | 2.4 | |
That report also explains that it is not possible to safely choose an "intermediate" limit. For instance, if we choose the largest one (cyan: 4.1), almost all yellows will pass, while if we choose the smallest one (yellow: 2.4), about half of the cyans will fail. The proposed value of 3.5 seems to have been chosen as a compromise.