So the reference was for the OP, not you, since they were the one who asked for a reference.
I would ask the OP to clarify why they think that recalculating the center line is a good idea. A simple question can often obscure the real question and lead to a simple answer that misses the point.
In the article you mention, Wheeler does discuss continually recalculating the limits, with an example. He calls it “polishing the limits”. He also concludes that this polishing is extra work that adds no additional insight at best and can allow real changes to be missed at worst. We are better off searching for assignable causes and eliminating or controlling them to eliminate their effect. Then we change the limits to reflect the new stable process.
There are some situations where the center line might be recalculated: for example, some assays have different average values due to the non-homogeneity of the lots. The within-lot variation is the same, but the average may differ. This typically requires a re-calibration but may only require a change in the average used for the QC testing. But this isn’t a continual recalculation - it is lot related and a reaction to a natural (albeit assignable) shift in the process that cannot be eliminated. It can be adjusted for - like tool wear.
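As a hedged sketch of that kind of adjustment (the sigma and lot averages below are hypothetical, not from any real assay): the within-lot variation, and therefore the width of the limits, stays fixed; only the center is reset to the new lot’s established average.

```python
# Hypothetical numbers only: the limit width comes from the (unchanged)
# within-lot variation; only the center line is reset when a new lot's
# average is established, e.g. after re-calibration / QC value assignment.
def limits_for_lot(lot_average, within_lot_sigma=1.2):
    """3-sigma limits re-centered on a lot-specific average; width unchanged."""
    return lot_average - 3 * within_lot_sigma, lot_average + 3 * within_lot_sigma

print(limits_for_lot(100.0))   # previous lot
print(limits_for_lot(102.5))   # new lot with a different (assignable) average
```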
It is important to note that the laws of physics and math haven’t changed since 1939.
Technology has changed - it has made our manufacturing processes and our products more complex. This complexity often results in naturally non-homogeneous processes and a more frequent need for rational subgrouping. But control charts still work the way they did from 1930 through to today, when applied appropriately…
Of course there are ‘gurus’ who have gotten it wrong, or at least only partially right, from 1000 years ago to just yesterday. There are hacks and charlatans who simplify things beyond recognition to make it easy to make money. There are incompetent and/or greedy people who chase research money, tenure and ‘fame’. (I now recommend “Science Fictions” by Stuart Ritchie - it is an excellent dissection of fraud and misuse in the scientific research field.) This leaves us with critical thinking and logic, as well as the better angels who can reproduce or drill into the original studies. But Shewhart’s charts have stood the test of time, even if they aren’t the only ‘statistical’ method in our toolbox.
Some of the technology now enables other forms of process control to monitor and adjust processes to reduce variation - I’ve used them and advocated for all of them, provided they are used properly and appropriately. I have also automated tens of thousands of control charts that monitor massive amounts of field data from very complex medical diagnostic instruments, as well as hundreds of manufacturing charts for test results of complex diagnostic instruments, automobiles, semiconductors and jet engines, using the same math Shewhart proposed back during the dark ages. I’ve also used the p' chart and the 3D chart for situations that Shewhart couldn’t have imagined because he never encountered them. They were his ‘black swans’.
Since I am retired I no longer have access to the raw data from my confirmatory studies regarding continual recalculation (I used it for teaching purposes, for the non-believers). Even if I had the data I wouldn’t publish it, both for proprietary reasons and because of its size: massive data sets of well over 100k records that I would have to edit to obscure proprietary information. I also don’t have any fancy stats software anymore - I’d rather buy Prosecco and murder mysteries with that money…so you’ll have to do your own experiments. Or maybe ask Dr. Wheeler himself for another example. And simulations aren’t experiments. BUT if you follow the logic of how the math works, you can intuit or deduce that recalculating the limits adds no value and will artificially inflate or shift the limits by incorporating the increased variation of process changes.
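To make that arithmetic concrete, here is a minimal sketch with made-up numbers - not an experiment, and certainly not the proprietary data mentioned above. I’m using an individuals (XmR) chart here only because it keeps the math short: when a shifted period is folded into the limit calculation, the recalculated center line chases the shift and the limits move with it.

```python
# Minimal arithmetic sketch (invented numbers): fixed XmR limits from a stable
# baseline vs. limits recalculated after a sustained shift is folded in.
import random
import statistics

random.seed(1)
baseline = [random.gauss(10.0, 1.0) for _ in range(50)]   # stable period
shifted  = [random.gauss(11.5, 1.0) for _ in range(50)]   # period after an assignable-cause shift
combined = baseline + shifted

def xmr_limits(data):
    """Center line and natural process limits from an individuals (XmR) chart."""
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    sigma_hat = statistics.mean(moving_ranges) / 1.128     # d2 for n = 2
    center = statistics.mean(data)
    return center, center - 3 * sigma_hat, center + 3 * sigma_hat

fixed = xmr_limits(baseline)      # limits frozen from the stable period
polished = xmr_limits(combined)   # limits "polished" with the shifted period folded in

for name, (cl, lcl, ucl) in (("fixed", fixed), ("recalculated", polished)):
    beyond = sum(1 for x in shifted if x < lcl or x > ucl)
    print(f"{name:12s} CL={cl:6.2f}  LCL={lcl:6.2f}  UCL={ucl:6.2f}  "
          f"shifted points outside limits: {beyond}")
```

The printed limits are the point: once the shifted data is averaged in, the center and limits drift toward the new level - exactly the “polishing” Wheeler warns about.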
One example, without the data, is a diagnostic instrument that utilized slides for certain chemical tests. The slides were consumables that came in rather large lots. The instruments were connected to the internet and we collected all data from them, including test failures for invalid results (out-of-range results reported as failed: categorical data) and the actual test values (continuous data) for every test, including the invalid results. Each slide type was monitored on two charts: a p chart for the invalid results and an Xbar-S chart for the actual test results. We also plotted Customer complaints for each characteristic/failure, but unless the failure was critical the Customer tended to under-report until they wanted replacement product. This monitoring was done by software ‘behind the scenes’ with built-in alarm emails and flags on the data. Many of the charts were plotted and viewed by human eyes, but only for the critical characteristics and the top 10 failures for each product; the rest relied on the software alarming to identify emerging problems.

As the new lot of slides entered use (some practices had enough of the last lot to not need material from the new lot for months; some needed to use it as soon as it entered the marketplace), the invalid test rate began to creep up slowly (not every slide in the new lot failed). The Xbar-S chart also started to show an increase in the within-subgroup variation, and the subgroup averages began to creep away from the fixed average. It didn’t take long for the p chart, the S chart and then the Xbar chart to alarm as the values moved above the fixed averages (or center) of the stable process. An investigation that broke down the results by the two active lots showed that the degradation came from the new lot, and it was pretty bad, but it had only a small effect on the overall results because it was integrated into active use slowly as the older lot was consumed and replaced by the new lot of material.

A post hoc analysis of the data, in which the fixed limits (including the average) were recalculated as new data came in, showed that the charts would not have alarmed during the period when we actually experienced the alarm. Extrapolating the known failure rates and the test result dispersion/shift for each of the two lots showed that a continually recalculated set of limits would only have alarmed on the failure chart, and then only toward the end of the full incorporation of the new lot. We knew (through the data) that the Customer complaint rate would have increased substantially before the recalculated charts would have alarmed. That would have been a Customer satisfaction disaster.
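For anyone who hasn’t set up those two chart types, here is a hedged sketch of the limit arithmetic behind them - the numbers are invented placeholders, not the field data described above, and the constants are the standard control chart factors computed from c4:

```python
# Hedged sketch (made-up numbers): the limit arithmetic for a p chart of
# invalid-result rates and an Xbar-S chart of the reported test values.
import math

def c4(n):
    """Bias-correction constant for the sample standard deviation."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

def p_chart_limits(p_bar, n):
    """p chart: center line and 3-sigma limits for subgroups of size n."""
    s = math.sqrt(p_bar * (1.0 - p_bar) / n)
    return p_bar, max(0.0, p_bar - 3 * s), p_bar + 3 * s

def xbar_s_limits(grand_mean, s_bar, n):
    """Xbar-S chart: limits for the subgroup averages and standard deviations."""
    a3 = 3.0 / (c4(n) * math.sqrt(n))
    b3 = max(0.0, 1.0 - 3.0 * math.sqrt(1.0 - c4(n) ** 2) / c4(n))
    b4 = 1.0 + 3.0 * math.sqrt(1.0 - c4(n) ** 2) / c4(n)
    xbar_limits = (grand_mean - a3 * s_bar, grand_mean + a3 * s_bar)
    s_limits = (b3 * s_bar, b4 * s_bar)
    return xbar_limits, s_limits

# Illustrative baseline values only; the real charts froze these limits from
# the stable process and then monitored the new lots against them.
print(p_chart_limits(p_bar=0.02, n=200))
print(xbar_s_limits(grand_mean=100.0, s_bar=2.5, n=5))
```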
If you have a stable process, recalculating the limits will not make things better - or worse; but few processes stay stable for long periods of time, and in that case recalculating the limits can miss actual changes. So what value is there in continually recalculating the limits? How can it help us improve quality, or catch changes early before defects/failures occur? Please enlighten us. You also don’t say what the “better ways of detecting a long-term drift” are. Can you expound on this?
And so I add that you have provided no logical rationale for the added complexity of recalculating the limits. You also haven’t explained how the math works to meet the intent: determine what the stable variation is and then monitor the process to (1) not tamper when you see stable random variation and (2) investigate when an assignable change has occurred. As for the excuse that the software can easily do it, remember, just because you can doesn’t mean you should.
What control rules did you apply? E.g., if you try to detect a long-term drift using only the "one point beyond the ±3-sigma limits" rule, it completely makes sense that fixing the control limits is the better option. However, this is not how I control stable processes.
Looking forward to your input.
We used them all, as appropriate. In the example above, both the run rule (a number of consecutive points above or below the average) AND the trend rule alarmed.
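For completeness, here is a minimal sketch of those two supplementary rules in their common Nelson-rule form. The exact run lengths vary between rule sets (Western Electric uses 8 points for the run rule, Nelson uses 9), so treat the counts below as illustrative rather than the exact ones our software used:

```python
# Hedged sketch of two supplementary run rules: a run of points on one side
# of the center line, and a monotone (trend) run. Run lengths are illustrative.
def run_on_one_side(values, center, run_length=9):
    """Alarm if run_length consecutive points fall on the same side of the center line."""
    streak = 0
    last_side = 0
    for v in values:
        side = 1 if v > center else -1 if v < center else 0
        streak = streak + 1 if (side == last_side and side != 0) else 1
        last_side = side
        if side != 0 and streak >= run_length:
            return True
    return False

def monotone_trend(values, run_length=6):
    """Alarm if run_length consecutive points are steadily increasing or decreasing."""
    up = down = 1
    for prev, cur in zip(values, values[1:]):
        up = up + 1 if cur > prev else 1
        down = down + 1 if cur < prev else 1
        if up >= run_length or down >= run_length:
            return True
    return False

# Example: a slow upward drift trips both supplementary rules even though no
# single point has jumped far from the (hypothetical) center line of 10.35.
drift = [10.0 + 0.1 * i for i in range(14)]
print(run_on_one_side(drift, center=10.35), monotone_trend(drift))
```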