This is easily solved with the scientific method, which you've already applied.
Try to prove each configuration wrong; whichever one survives is the answer.
There are only three options.
Box 1: Message 1 = ✅; Message 2 = ❌; Message 3 = ✅;
Box 2: Message 1 = ❌; Message 2 = ✅; Message 3 = ✅;
Box 3: Message 1 = ✅; Message 2 = ❌; Message 3 = ❌;
Only one configuration fits the description: box 3.
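The elimination above can be sketched as a brute-force check. This is only a rough sketch: the truth values are copied from the three configurations listed, but the rule itself is an assumption, since the puzzle's wording isn't restated here. I'm assuming the description requires exactly one message to be true. The names `TRUTH_TABLE`, `fits_description`, and `surviving_boxes` are made up for illustration.

```python
# Truth value of (Message 1, Message 2, Message 3) if the prize is in each box,
# copied from the three configurations listed above.
TRUTH_TABLE = {
    1: (True, False, True),
    2: (False, True, True),
    3: (True, False, False),
}


def fits_description(messages, required_true=1):
    """Assumed rule (not stated in this comment): exactly `required_true` messages are true."""
    return sum(messages) == required_true


def surviving_boxes(table, required_true=1):
    """Return every box whose configuration is not ruled out by the assumed rule."""
    return [
        box
        for box, messages in table.items()
        if fits_description(messages, required_true)
    ]


print(surviving_boxes(TRUTH_TABLE))  # prints [3]
```

If the list came back empty, that would be the "trick question" case; a single survivor means there is a provable answer.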
If all configurations were incorrect, then we would assume it's a trick question.
But because we already have a provable answer, there's no reason to assume it's wrong, even if a bunch of other people got it wrong or are gaslighting on purpose.
I mean, seriously though, people are dumb, and most can't even do basic PEMDAS arithmetic these days.