Histogram beginner dilemma - Manual Calculation vs. JMP 7


Bjourne

I am just starting to study the 7 QC Tools, and I began with the Histogram and then Control Charts (variable data first). After I finished computing my data manually, I checked it with an officemate using JMP 7. I got different results, mainly because my spread is not the same as JMP 7's....

Please see the image of my data table below.

No of observations = 75

Max = 2.15 Min = 1.70

Range = Max - Min = .45

Class Interval = square root of 75 = 8.6602, rounded to 9

STDEV = .099742912

Internal Class Width = STDEV / 3 = .099742912 / 3 = .0332, rounded to .03
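The arithmetic above can be checked with a short Python sketch. It uses only the summary numbers quoted in the post; the variable names are illustrative, and the last line (how many classes a width of STDEV / 3 would need to span the range) is an added check rather than something the post states.

```python
import math

# Summary figures taken from the post
n = 75                      # number of observations
x_max, x_min = 2.15, 1.70
stdev = 0.099742912

data_range = x_max - x_min                    # about 0.45
k_sqrt = round(math.sqrt(n))                  # sqrt(75) = 8.66 -> 9 classes
width_sd = stdev / 3                          # about 0.0332
k_from_sd = math.ceil(data_range / width_sd)  # classes needed at that width

print(k_sqrt, round(width_sd, 4), k_from_sd)
```

Under these assumptions the square-root rule suggests 9 classes, while a width of STDEV / 3 needs about 14 classes to cover the same range, so the two rules are alternatives and are not meant to be combined.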

Please see my tally sheet and histogram below.

Here is the JMP 7 image my officemate created. (I do not know how to use JMP 7 by the way :)...)

Please allow me some beginner questions..

As the image shows, JMP 7's is different than mine. I have 10 bars the same as JMP but the distance of the data "0" is not the same.

Is my histogram wrong?

What's the rule of the thumb for the "internal class width"...? Is it:

(1) STDEV / 3 which is the Standard Deviation divided by 3

or

(2) Internal Class Width = Range / Class Interval....? With my data above, my class interval is the square root of 75 rounded to a whole number, which is 9. So if I apply the formula, .45 / 9 = .05

Please see image below if I use (2).


When will I use (1)...?

When will I use (2)...?
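As a rough illustration of option (2), the class boundaries that follow from Range / Class Interval can be listed directly. This is only a sketch using the Min, Max and class count from the post; rounding the boundaries to two decimals is an assumption made here to match the precision of the quoted values.

```python
# Option (2): width = Range / number of classes
x_min, x_max = 1.70, 2.15
k = 9                                   # classes from the square-root rule

width = (x_max - x_min) / k             # about 0.05
# k classes need k + 1 boundaries; round to the two decimals used in the post
edges = [round(x_min + i * width, 2) for i in range(k + 1)]

print(round(width, 2))
print(edges)
```

With this choice the boundaries run from 1.70 to 2.15 in steps of 0.05, so the classes tile the range exactly.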

I also saw this at the moresteam_com website about histograms,


1. Count the number of data points (50 in our height example).

2. Determine the range of the sample - the difference between the highest and lowest values (73.1-65, or 8.1 inches in our height example).

3. Determine the number of class intervals.

You can use either of two methods as general guidelines in determining the number of intervals:
A. Use ten intervals as a rule of thumb.
B. Calculate the square root of the number of data points and round to the nearest whole number. In the case of our height example, the square root of 50 is 7.07, or 7 when rounded. You may wish to experiment with different interval numbers. If there are too many, the distribution will spread out, and the histogram will look flat. Likewise, if there are too few intervals, the distribution can look artificially tight.

4. Determine the interval class width by one of two methods:

A. Width = Range/# Intervals = 8.1 / 10 = 0.81
B. Divide the Standard Deviation by three. In this case, the height data has a Standard Deviation of 1.85, which yields a class interval size of 0.62 inches, and therefore a total of 14 class intervals (Range of 8.1 divided by 0.62, rounded up).

This is slightly more class intervals than our rule of thumb indicated....
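The numbers in the quoted height example can be reproduced with a few lines of Python. This is a sketch built only from the figures quoted above (50 points, range 8.1, standard deviation 1.85); the variable names are made up for illustration.

```python
import math

# Figures from the quoted height example
n = 50
data_range = 73.1 - 65.0          # 8.1 inches
stdev = 1.85

k_sqrt = round(math.sqrt(n))              # method 3B: sqrt(50) = 7.07 -> 7
width_a = data_range / 10                 # method 4A with ten intervals: 0.81
width_b = round(stdev / 3, 2)             # method 4B: 0.62
k_b = math.ceil(data_range / width_b)     # intervals implied by 4B: 14

print(k_sqrt, round(width_a, 2), width_b, k_b)
```

Seen this way, method 4B fixes the width first and the number of intervals follows from it, which is why it does not need to agree with the count chosen in step 3.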

There are two options given there, (A) and (B). If I use (A) in determining the number of class intervals, am I to use (A) also in determining the internal class width..?

If I use (B) in determining the number of class intervals then also in determining the internal class width..?

This is getting confusing to me because I am a beginner and just trying to learn stuff that I can apply at work.

One officemate handed me an old training sheet she had which has a table for determining Class Interval. Please see below.

The number of observations I have is 75. If I use the table above, what will I use as CI...? It says there 6 to 10...

It also says there,

Alternatively: Cell interval = Range / (1 + 3.22 log n)

So if I use that,

CI = .45 / (1+0.507855872)
CI = .45 / 1.507855872
CI = .298437011
CI = .298
CI = .30

Width = Range/# Intervals = .45 / .30 = 1.5
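A quick check of the training-sheet formula in Python may help here. The 0.5079 term in the working above appears to be log10(3.22) rather than 3.22 × log10(75); the sketch below evaluates the formula as written, assuming "log" means base 10. The constant 3.22 is kept as quoted from the sheet, and the more commonly cited Sturges constant of 3.322 is shown only for comparison.

```python
import math

n = 75
data_range = 0.45

# Denominator = suggested number of cells
cells_sheet = 1 + 3.22 * math.log10(n)      # constant as quoted from the sheet
cells_sturges = 1 + 3.322 * math.log10(n)   # usual Sturges form, 1 + log2(n)

# The formula already divides the range by the cell count,
# so its result is the cell width itself
width_sheet = data_range / cells_sheet
width_sturges = data_range / cells_sturges

print(round(cells_sheet, 2), round(width_sheet, 3))
print(round(cells_sturges, 2), round(width_sturges, 3))
```

On this reading the formula gives about 7 cells and a cell width of roughly 0.06, which is in the same neighbourhood as the 0.05 obtained from Range / 9, and no second division by the result is needed.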

I do not know what is the correct formula to use now.....?

Please help...and I beg for your consideration as I really am a beginner. Thanks for understanding gurus :)
 

Attachments

  • Histogram Dilemma Images in PDF.pdf
    78.3 KB · Views: 153

Statistical Steven

Statistician
Leader
Super Moderator
There are lots of different rules for bin size, ranging from simple convention (n=10) to the Sturges or Rice rules. It looks like your histogram and the JMP histogram are the same. The purpose of the histogram is to show the distribution of the data (to check normality, for example). Don't get caught up in the calculation of bin size, as these are usually just rules of thumb.


 

Miner

Forum Moderator
Leader
Admin

Agreed. If the selection of bin sizes leaves regular gaps in the data (due to measurement resolution), I recommend reducing the number of bins to provide regular groupings without gaps.
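The effect described here can be sketched in Python. The readings below are hypothetical (values on a 0.05 grid between 1.70 and 2.15, stored in hundredths so the arithmetic stays exact); they are not the poster's data, and `empty_bins` is a helper invented for this illustration.

```python
from collections import Counter

def empty_bins(values, start, width, n_bins):
    """Return the indices of bins that receive no values."""
    counts = Counter((v - start) // width for v in values)
    return [b for b in range(n_bins) if counts[b] == 0]

# Hypothetical readings in hundredths: 1.70, 1.75, ..., 2.15
readings = list(range(170, 216, 5))

# Bin width 0.03 (16 bins): narrower than the measurement step, so gaps appear
print(empty_bins(readings, 170, 3, 16))   # [2, 4, 7, 9, 12, 14]

# Bin width 0.05 (10 bins): matches the measurement step, so no gaps
print(empty_bins(readings, 170, 5, 10))   # []
```

In this toy case, widening the bins to a whole multiple of the measurement resolution reduces the bin count and removes the regular gaps.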
 

Bjourne


Thank you for the reply. I really appreciate it.

Since you posted it I checked out Sturges' rule and the Rice rule. Sturges' rule is 1 + 3.3 log10(n), where n = total number of observations. (Do correct me if there is a mistake in the formula please).

Sturges = 1 + 3.3 log10(75)
= 1 + 3.3(1.87506)
= 1 + 6.1877
= 7.1877
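For comparison, both rules mentioned here can be evaluated for n = 75 in Python. The sketch assumes the usual textbook forms, Sturges as 1 + log2(n) (about 1 + 3.322 log10(n)) and Rice as 2 × n^(1/3), and rounds up to whole bins, which is one common convention among several.

```python
import math

n = 75

sturges = 1 + math.log2(n)     # about 7.23
rice = 2 * n ** (1 / 3)        # about 8.43

# Round up to a whole number of bins
print(round(sturges, 2), math.ceil(sturges))
print(round(rice, 2), math.ceil(rice))
```

Under these assumptions the two rules suggest 8 and 9 bins, both close to the conventional 10, which fits the point that these are rules of thumb.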

Anyway as you mentioned they are all rules of thumb and the conventional n=10 is commonly used right?

What's the best way to go based on the given examples and images?

(a) Conventional n=10

(b) Internal Class Width = STDEV / 3

(c) Internal Class Width = Range / the square root of the "total no. of observations" (rounded to a whole number)

Is it better to do 2 kinds so I can check out the normality of the distribution? Or if I use just 10 and go from there..?

Those data are seal strength data from an engineer at work. I think that is a hermetic seal specification for a hermetic semiconductor package from the front-end line (I work at back-end). Sort of temporary data that he had thrown away. The Max / Min are actually the strength specs, 1.70 - 2.15 kg/cm². I also plan to plot it in a control chart and see what I can get.

Thanks and I really appreciate it. :)

Anyone would want to take a stab or share a comment :)

@ Miner,


One of my images shows a histogram without the gaps; I have attached it as a PNG file. It is the one where I used Range / square root of the total # of observations. I think this one is more sound, as there are no gaps in it.
 

Attachments

  • icw_range over ci_sqr root_n.png
    13.8 KB · Views: 216

Statistical Steven

Statistician
Leader
Super Moderator
I think you need to realize there is no "rule". As with most graphical methods, you are trying to convey information with a picture, and it might require some tweaks or adjustments. About five or more different approaches have been mentioned in this thread alone; we all have our little tricks that we like and that work for us.

 

Bev D

Heretical Statistician
Leader
Super Moderator
Steven and Miner are giving good advice. There is nothing statistically or mathematically precise about the histogram - certainly not enough to spend so much energy trying to differentiate the different methods.

The histogram is a simple graphical display of the shape of a distribution. As long as your bins are representative (not so many that there are gaps, nor so few that there is no shape), you are good to go.

The important thing to focus on is what the data is telling you.

I always remind my students and engineers that statistics is an essay question, not a math question...
 

Bjourne

Just be flexible. I use Minitab, and about 30% of the time, I change the default bin sizes to obtain a better look.

Thanks for the reply. I'll keep that in mind. We do have Minitab but still am learning it at the office (that is if the engineers aren't using the pc's).
 