There are three separate concepts in your example, but you used 95% for all three. If I explain each concept with a different percentage, the distinctions may be easier to see.

The terms confidence and beta error come from the world of hypothesis testing. In a manufacturing quality context, we start with the hypothesis that the supplier's entire shipment is of acceptable quality. There is the true (unknown) state of the shipment, and there is a decision made based on an experiment or a sample. There are four possible outcomes:

In reality, the product is good, and the decision is to accept <- the desired outcome

In reality, the product is bad, and the decision is to reject <- the desired outcome

In reality, the product is good, and the decision is to reject <- a Type I error

In reality, the product is bad, and the decision is to accept <- a Type II error

You used the term reliability to indicate that some fraction of the shipment can be non-conforming and the batch is still acceptable. For my example, out of a batch of 1000 widgets I want assurance that 90% are within spec, so the required reliability is 90%. Based on testing a sample, I accept or reject the entire lot. Because we are counting rejects rather than evaluating the mean of a continuous variable, I use a discrete probability distribution rather than a Z-test or a t-test. (I used the binomial instead of the hypergeometric since the batch size is much larger than the sample size.) If there are 10% bad parts in the batch of 1000, there is a 35% chance of seeing none in a sample of 10 (which I calculated using the BINOM.DIST function in Excel), and therefore a 65% chance of seeing one or more. To be at least 95% confident of detecting a lot that is 10% bad, I need to see zero nonconforming parts in a sample of 29 (0.9^29 ≈ 4.7%, so this one-sided test gives slightly better than 95% confidence).
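The numbers above are easy to reproduce. Here is a minimal Python sketch of the same calculation (function names are mine; the binomial probability of zero defects simply reduces to (1 − p)^n, the same quantity Excel's BINOM.DIST returns at k = 0):

```python
from math import comb

def prob_zero_defects(p_defect: float, n: int) -> float:
    """Probability that a sample of n contains zero nonconforming parts,
    using the binomial approximation (valid when batch >> sample size)."""
    # Binomial pmf at k = 0: comb(n, 0) * p^0 * (1 - p)^n = (1 - p)^n
    return comb(n, 0) * (p_defect ** 0) * ((1 - p_defect) ** n)

def min_sample_size(p_defect: float, confidence: float) -> int:
    """Smallest n such that a lot with defect rate p_defect passes
    (zero defects seen) with probability at most 1 - confidence."""
    n = 1
    while prob_zero_defects(p_defect, n) > 1 - confidence:
        n += 1
    return n

print(round(prob_zero_defects(0.10, 10), 3))  # ~0.349: a 35% chance of seeing none in 10
print(min_sample_size(0.10, 0.95))            # 29: sample size for >95% confidence
```

Running this confirms both figures: a 10%-bad lot slips through a sample of 10 about 35% of the time, and a zero-defect result in 29 parts is needed for 95% confidence.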

In a single shipment, there is a 5% risk that I reject a batch whose quality is actually acceptable (better than 90% good), due to the luck of the draw in the sample. In this scenario the producer's risk of loss is 5%, also called the alpha risk: the risk of wrongly rejecting the hypothesis that the quality of the parts is better than 90%. The significance level of the hypothesis test is therefore 5%, and the corresponding confidence level is 95%.

For every possible quality level, there is some probability that a shipment with unacceptable quality is accepted. Let's say I want no more than a 20% chance of a Type II error, where the sample of widgets drawn from a defective lot contains zero nonconforming parts (I am willing to accept a little more risk because I have automatic in-process gaging). This consumer's risk is called beta: the risk of wrongly accepting the hypothesis of 90%-quality parts. Because beta depends on the true (unknown) quality level, it must be evaluated across the range of possible scenarios. The complement of the beta risk (1 − beta) is called the power of a statistical test: the probability of correctly rejecting the hypothesis when the alternative is true. The power in this scenario would typically be determined using statistical tables or software.

See also

Other 1-Sample Binomial | Power and Sample Size Calculators | HyLown
http://www.real-statistics.com/bino...ions/statistical-power-binomial-distribution/