You note the amount of the out-of-tolerance reading on the gage you sent to the outside vendor. You then use that to determine the amount of error it induced in the gages it was used to calibrate. From that, determine whether any of those gages were erroneously found in tolerance or erroneously found out of tolerance. If the accuracy of your standard gage was sufficiently better than that of the gages it was used to calibrate, you MAY find that there was no impact. But you must determine whether or not there were erroneous results as described above. Hopefully the standard gage was of much higher accuracy than the gages it was used to calibrate (an uncertainty ratio of at least 4:1). The final step is to evaluate the gages that were erroneously found in tolerance or erroneously found out of tolerance. Compare the amount of this error with your process or product specifications. (Hopefully, in this case, you had a suitable ratio between the accuracy and stability of the gages calibrated by your standard and the specification limits of your process.)
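A minimal Python sketch of the re-evaluation steps above, assuming the standard's as-found error is a single constant offset (standard's indicated value minus true value) over the range used, and that each gage has a symmetric ± tolerance. The `CalRecord` type, the function names, and all numbers are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CalRecord:
    """One calibration result for a gage checked against the standard (hypothetical layout)."""
    gage_id: str
    observed_error: float  # gage reading minus the standard's indicated value, as recorded
    tolerance: float       # symmetric gage tolerance, +/- this value


def corrected_error(record: CalRecord, std_error: float) -> float:
    # Assumes std_error = (standard indicated - true). The reference itself was off
    # by std_error, so the gage's error relative to the true value is observed + std_error.
    return record.observed_error + std_error


def classify(record: CalRecord, std_error: float) -> str:
    """Compare the original in/out finding with the finding after correcting for the standard."""
    was_in = abs(record.observed_error) <= record.tolerance
    is_in = abs(corrected_error(record, std_error)) <= record.tolerance
    if was_in and not is_in:
        return "erroneously in-tolerance"
    if not was_in and is_in:
        return "erroneously out-of-tolerance"
    return "finding unchanged"


def fraction_of_spec(error: float, spec_tolerance: float) -> float:
    """Share of a symmetric product/process tolerance consumed by a gage error (one simple comparison)."""
    return abs(error) / spec_tolerance


if __name__ == "__main__":
    std_error = 0.0004  # hypothetical: the standard was found reading 0.0004 high
    records = [
        CalRecord("A", 0.0008, 0.001),
        CalRecord("B", -0.0012, 0.001),
        CalRecord("C", 0.0002, 0.001),
    ]
    for rec in records:
        err = corrected_error(rec, std_error)
        share = fraction_of_spec(err, 0.005)  # hypothetical +/-0.005 product spec
        print(rec.gage_id, classify(rec, std_error), f"{share:.0%} of product tolerance")
```

With these numbers, gage A (recorded +0.0008, actually +0.0012) was erroneously passed, gage B (recorded -0.0012, actually -0.0008) was erroneously failed, and C's finding stands. If the standard's error varies across its range, you would apply the as-found error at each calibration point rather than one offset.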
If the erroneous findings on the gages caused erroneous measurement results on your product, you then need to make a formal determination as to whether any containment or other corrective actions must be taken to ensure that no defective product reaches the customer. And, of course, it all needs to be well documented.
I have tried to walk through the stages of an out-of-tolerance evaluation. This situation underscores the importance of using adequate standards (I don't know whether yours are or not). The point is to make sure there is an adequate test uncertainty ratio (TUR) between the standard and the units it is used to calibrate. This gives you breathing room: when a standard is found out of tolerance, quite often there will be no impact on product because of that guard band. The same applies to the measurement tools used on product. For any measurement tool that measures an important or critical parameter of your product or process, it is always wise to make sure it is, if anything, a little too accurate for the process (using good MSA methodology to determine this, along with whatever other statistical techniques are appropriate). With these two guard bands in place, you minimize the chance of things like (for example) having to do a tire recall. I won't throw any stones at that one. But whenever these things happen, they remind those of us in metrology that we have a responsibility that can sometimes affect people's lives. Sorry... got on my soapbox.
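The TUR and guard-band ideas above come down to simple arithmetic. This sketch assumes symmetric ± tolerances and uses one common simplified definition of TUR (tolerance of the unit under test divided by the standard's expanded uncertainty); some standards define it differently. The guard band shown, shrinking the acceptance limit by the expanded uncertainty, is only one simple method. Function names and numbers are hypothetical.

```python
def tur(uut_tolerance: float, std_uncertainty: float) -> float:
    """Ratio of the unit-under-test tolerance to the standard's expanded uncertainty (both +/- half-widths)."""
    return uut_tolerance / std_uncertainty


def guard_banded_limit(tolerance: float, uncertainty: float) -> float:
    """Acceptance limit pulled inside the tolerance by the measurement uncertainty (simple guard band)."""
    return max(tolerance - uncertainty, 0.0)


def accept(measured_error: float, tolerance: float, uncertainty: float) -> bool:
    """Pass only if the measured error falls inside the guard-banded limit."""
    return abs(measured_error) <= guard_banded_limit(tolerance, uncertainty)


if __name__ == "__main__":
    tol, u = 0.001, 0.0002  # hypothetical gage tolerance and standard uncertainty
    ratio = tur(tol, u)
    print(f"TUR = {ratio:.1f}:1, meets 4:1 rule: {ratio >= 4}")
    print(f"acceptance limit: {guard_banded_limit(tol, u):.4f}")
    print("reading of 0.00085 accepted:", accept(0.00085, tol, u))
```

A reading of 0.00085 is inside the ±0.001 tolerance but outside the guard-banded limit, so it is rejected; that margin is the "breathing room" that absorbs a later-discovered error in the standard.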
Anyway, hope this is of some help.