Histogram beginner dilemma - Manual Calculation vs. JMP 7

Bjourne

#1
I am trying to study the 7 QC Tools. I started with the Histogram and plan to move on to Control Charts (variable data first). After computing my histogram manually, I checked it against one an officemate made in JMP 7. The results differ, mainly because my spread is not the same as JMP 7's.

Please see the image of my data table below.

No of observations = 75

Max = 2.15 Min = 1.70

Range = Max - Min = .45

Number of Class Intervals = square root of 75 = 8.6602, rounded to 9

STDEV = .099742912

Internal Class Width = STDEV / 3 = .099742912 / 3 = .0332, rounded to .03

Please see my tally sheet and histogram below.

Here is the JMP 7 image my officemate created. (I do not know how to use JMP 7 by the way :)...)

Please allow me some beginner questions..

As the image shows, JMP 7's histogram is different from mine. I have 10 bars, the same as JMP, but the distance of the data "0" is not the same.

Is my histogram wrong?

What's the rule of the thumb for the "internal class width"...? Is it:

(1) STDEV / 3 which is the Standard Deviation divided by 3

or

(2) Internal Class Width = Range / Number of Class Intervals? With my data above, the number of class intervals is the square root of 75 rounded to a whole number, which is 9. So if I apply the formula mentioned, .45 / 9 = .05

Please see image below if I use (2).


When will I use (1)...?

When will I use (2)...?
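For readers who want to check the two options side by side, here is a minimal Python sketch that uses only the summary figures quoted above (the raw 75 readings are not shown in the thread). The variable names are made up for illustration, and rounding up in option (1) is just one possible convention.

```python
import math

# Summary statistics quoted in the post (the raw readings are not available).
n = 75
data_range = 2.15 - 1.70
stdev = 0.099742912

# Option (2): pick the number of classes first, then derive the width.
k_sqrt = round(math.sqrt(n))           # sqrt(75) = 8.66, rounded to 9
width_from_range = data_range / k_sqrt

# Option (1): pick the width first, then see how many classes it implies.
width_from_sd = stdev / 3
k_from_sd = math.ceil(data_range / width_from_sd)  # rounding up is an assumption

print(f"option (2): {k_sqrt} classes, width {width_from_range:.3f}")
print(f"option (1): width {width_from_sd:.4f}, about {k_from_sd} classes")
```

The two options answer different questions: (2) fixes the number of bars and lets the width follow, while (1) fixes the width and lets the number of bars follow, so they will rarely give the same picture.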

I also saw this at the moresteam_com website about histograms,


1. Count the number of data points (50 in our height example).

2. Determine the range of the sample - the difference between the highest and lowest values (73.1-65, or 8.1 inches in our height example).

3. Determine the number of class intervals.

You can use either of two methods as general guidelines in determining the number of intervals:
A. Use ten intervals as a rule of thumb.
B. Calculate the square root of the number of data points and round to the nearest whole number. In the case of our height example, the square root of 50 is 7.07, or 7 when rounded. You may wish to experiment with different interval numbers. If there are too many, the distribution will spread out, and the histogram will look flat. Likewise, if there are too few intervals, the distribution can look artificially tight.

4. Determine the interval class width by one of two methods:

A. Width = Range/# Intervals = 8.1 / 10 = 0.81
B. Divide the Standard Deviation by three. In this case, the height data has a Standard Deviation of 1.85, which yields a class interval size of 0.62 inches, and therefore a total of 14 class intervals (Range of 8.1 divided by 0.62, rounded up).

This is slightly more class intervals than our rule of thumb indicated....
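The arithmetic in steps 3 and 4 of the quoted procedure can be reproduced with a short Python sketch. The function name and dictionary keys below are hypothetical labels, not anything from the website, and the standard-deviation width is rounded to two decimals only because the quoted text does so.

```python
import math

def interval_options(n, data_range, stdev, rule_of_thumb=10):
    """Apply steps 3 and 4 of the quoted procedure to summary statistics."""
    width_sd = round(stdev / 3, 2)  # step 4B, rounded to 2 decimals as in the quote
    return {
        "3A_intervals": rule_of_thumb,                     # ten intervals by convention
        "3B_intervals": round(math.sqrt(n)),               # square root of the count
        "4A_width": data_range / rule_of_thumb,            # range / number of intervals
        "4B_width": width_sd,                              # standard deviation / 3
        "4B_intervals": math.ceil(data_range / width_sd),  # intervals implied by 4B
    }

# Height example from the quoted text: 50 points, range 8.1, SD 1.85.
height = interval_options(n=50, data_range=8.1, stdev=1.85)
for name, value in height.items():
    print(name, value)
```

Run on the height example, this gives 7 intervals for 3B, a width of 0.62 for 4B and 14 intervals implied by that width, which matches the figures in the quoted text.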
There are two options given there, (A) and (B). If I use (A) in determining the number of class intervals, am I to use (A) also in determining the internal class width..?

If I use (B) in determining the number of class intervals then also in determining the internal class width..?

This is getting confusing to me because I am a beginner and just trying to learn stuff that I can apply at work.

One officemate handed me an old training sheet she had which has a table for determining Class Interval. Please see below.

The number of observations I have is 75. If I use the table above, what will I use as CI...? It says there 6 to 10...

It also says there,

Alternatively: Cell interval = Range / (1 + 3.22 log n)
So if I use that,

CI = .45 / (1+0.507855872)
CI = .45 / 1.507855872
CI = .298437011
CI = .298
CI = .30

Width = Range/# Intervals = .45 / .30 = 1.5

I do not know what is the correct formula to use now.....?
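As a cross-check on the training-sheet formula, here is a small Python sketch. It assumes that "Log" on the sheet means the base-10 logarithm and that the formula returns the cell width directly; neither is confirmed in the thread. The last line is included because the 0.5078... figure in the calculation above matches log10(3.22) rather than 3.22 times log10(75).

```python
import math

n = 75
data_range = 0.45

# Denominator of the training-sheet formula: 1 + 3.22 * log10(n).
# Assumption: "Log" on the sheet is the base-10 logarithm.
sturges_classes = 1 + 3.22 * math.log10(n)
cell_interval = data_range / sturges_classes

print(f"log10(75)          = {math.log10(n):.4f}")
print(f"1 + 3.22*log10(75) = {sturges_classes:.2f}")
print(f"cell interval      = {cell_interval:.3f}")

# For comparison only: the logarithm of the coefficient itself.
print(f"log10(3.22)        = {math.log10(3.22):.4f}")
```

Under those assumptions the denominator is about 7, so the cell interval comes out near 0.064 and there is no need to divide the range by it a second time.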

Please help...and I beg for your consideration as I really am a beginner. Thanks for understanding gurus :)
 


Statistical Steven

Statistician
Staff member
Super Moderator
#3
There are lots of different rules for bin size, ranging from simple convention (10 bins) to the Sturges or Rice rules. It looks like your histogram and the JMP histogram are the same. The purpose of the histogram is to show the distribution of the data (to check normality, for example). Don't get caught up in the calculation of bin size, as these are usually just rules of thumb.



Miner

Forum Moderator
Staff member
Admin
#4
Agreed. If the selection of bin sizes leaves regular gaps in the data (due to measurement resolution), I recommend reducing the number of bins to provide regular groupings without gaps.
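To illustrate how measurement resolution can leave regular gaps, here is a minimal Python sketch. The resolution of 0.05 is purely an assumption for illustration (the thread does not state it), and the function name is hypothetical. It lists every value a gauge could report between the minimum and maximum and counts the bins that none of those values can land in.

```python
from fractions import Fraction
import math

def empty_bins(low, high, resolution, width):
    """Return (number of bins, bins that no possible reading can fall into)."""
    # Fractions built from strings keep the arithmetic exact (no float drift).
    low, high, resolution, width = (Fraction(x) for x in (low, high, resolution, width))
    steps = int((high - low) / resolution)
    # Every value the gauge could report from low to high.
    readings = [low + i * resolution for i in range(steps + 1)]
    n_bins = math.ceil((high - low) / width)
    # Bin index of each possible reading; the maximum is placed in the last bin.
    occupied = {min(int((r - low) / width), n_bins - 1) for r in readings}
    return n_bins, n_bins - len(occupied)

# Assumed resolution of 0.05 (hypothetical), with the two widths from the thread.
print(empty_bins("1.70", "2.15", "0.05", "0.03"))  # (15, 5)
print(empty_bins("1.70", "2.15", "0.05", "0.05"))  # (9, 0)
```

Under that assumed resolution, a width of .03 leaves 5 of 15 bins permanently empty, while a width of .05 leaves none, which is the kind of regrouping suggested above.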
 
Bjourne

#5
Thank you for the reply. I really appreciate it.

Since you posted it I checked out Sturges' rule and the Rice rule. Sturges' rule is 1 + 3.3 log(n), where n = total number of observations. (Do correct me if there is a mistake in the formula please.)

Sturges = 1 + 3.3 log(75)
= 1 + 3.3(1.87506)
= 1 + 6.187702
= 7.187702
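For comparison, the bin-count rules mentioned so far can be put next to each other in a short Python sketch. The function names are made up for illustration, the Rice rule is taken in its usual form of twice the cube root of n, and rounding every result up is only one convention.

```python
import math

def sqrt_rule(n):
    # Square root of the number of observations, rounded up.
    return math.ceil(math.sqrt(n))

def sturges_rule(n, coefficient=3.3):
    # The post uses 3.3; 3.322 (= 1 / log10(2)) is also commonly quoted.
    return math.ceil(1 + coefficient * math.log10(n))

def rice_rule(n):
    # Twice the cube root of the number of observations, rounded up.
    return math.ceil(2 * n ** (1 / 3))

n = 75
for name, rule in [("square root", sqrt_rule), ("Sturges", sturges_rule), ("Rice", rice_rule)]:
    print(f"{name:12s}: {rule(n)} bins")
```

For 75 observations these land between 8 and 9 bins, close to the conventional 10, which fits the point that they are all rules of thumb.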

Anyway as you mentioned they are all rules of thumb and the conventional n=10 is commonly used right?

What's the best way to go based on the given examples and images?

(a) Conventional n=10

(b) Internal Class Width = STDEV / 3

(c) Internal Class Width = Range / the square root of the "total no. of observations" rounded to a whole number

Is it better to do 2 kinds so I can check out the normality of the distribution? Or if I use just 10 and go from there..?

Those data are seal strength data from an engineer at work. I think it is the hermetic seal specification for a hermetic semiconductor package from the front-end line (I work at back-end). It is sort of temporary data after he threw it away. The Max / Min are actually the strength specs, 1.70 - 2.15 kg/cm². I also plan to plot it on a control chart and see what I can get.

Thanks and I really appreciate it. :)

Anyone would want to take a stab or share a comment :)

@ Miner,

In my images there is one without the gaps; I have attached it as a png file. It is the one where I used width = Range / square root of the total # of observations. I think this one is more sound, as there are no gaps in it.
 


Steve Prevette

Deming Disciple
Staff member
Super Moderator
#7

Statistical Steven

Statistician
Staff member
Super Moderator
#8
I think you need to realize there is no "rule". As with most graphical methods, you are trying to convey information with a picture, and it might require some tweaks or adjustments. About five different approaches have been mentioned in this thread alone; we all have our little tricks that we like and that work.


Bev D

Heretical Statistician
Staff member
Super Moderator
#9
Steven and Miner are giving good advice. There is nothing statistically or mathematically precise about the histogram - certainly not enough to spend so much energy trying to differentiate the different methods.

The histogram is a simple graphical display of the shape of a distribution. As long as your bins are representative (not so many that there are gaps, nor so few that there is no shape), you are good to go.

The important thing to focus on is what the data is telling you.

I always remind my students and engineers that statistics is an essay question, not a math question...
 
Bjourne

#10
Just be flexible. I use Minitab, and about 30% of the time, I change the default bin sizes to obtain a better look.
Thanks for the reply. I'll keep that in mind. We do have Minitab, but I am still learning it at the office (that is, when the engineers aren't using the PCs).
 