SAS vs. R vs. Python: Which Tool Do Analytics Pros Prefer?

Bev D

Heretical Statistician
Staff member
Super Moderator
#2
I'll wade in here a bit, as my organization is also struggling with this 'big data thing'. We use SAS and R for predictive modeling: SAS for the end product and R for some initial screening. Non-statisticians also use Python, primarily for data manipulation and preparation for SAS or R.

Here's my issue: too many people who know nothing about data are analyzing it. Using observational data is a huge risk if you don't understand the biasing and confounding that exist. This issue has been around since the beginning of the analysis of observational data in the social sciences and in medical 'investigations'. It is why it took so long to 'prove' that smoking causes cancer, or that coffee is bad for you or good for you or whatever the latest 'study' says. We are now applying this same lack of knowledge to larger and larger data sets. Some analysts are highly skilled and knowledgeable; others are hacks. The consuming public has no way to know which is which...

The blog itself is an example of poor data analysis: it is simply one pie chart after another (some pies disguised as doughnuts or bars). The charts show only proportions within a single category, not actual quantities or what the software is actually being used for. They also ignore the fact that popularity is not the same thing as usefulness.
 

Miner

Forum Moderator
Staff member
Admin
#3
I will second Bev's comments. I read an article a few years ago that summed up the problem nicely. In short, it said that years ago, data were scarce, so there was typically only one way to analyze a given data set (e.g., 2 groups of unpaired sample data = 2-sample t-test). You could botch a few things, but the selection of the analysis was not typically one of them.

In the big data world, there are a multitude of ways to potentially analyze the data. While some are obviously wrong to the trained analyst, they are still used by the untrained analyst. Others may even seem correct to the trained analyst, but are incorrect, not because they are unsuited to the data, but because they answer a different question than the one asked (e.g., ANOVA vs. ANOM; they answer different questions).

In other words, there are a lot more ways to make a mistake with big data.
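
To make the ANOVA vs. ANOM point concrete, here is a minimal Python sketch on made-up, balanced data (the numbers and function names are illustrative assumptions, not from the article). ANOVA produces a single F ratio for the question "do any of the group means differ?", while ANOM produces one standardized deviation per group for the question "which groups differ from the overall mean?". The critical values needed to make either decision (F tables, ANOM h tables) are deliberately left out.

```python
from math import sqrt
from statistics import mean

# Made-up, balanced data: three groups of four readings each (illustrative only).
groups = {
    "A": [10.0, 11.0, 9.0, 10.0],
    "B": [10.0, 9.0, 11.0, 10.0],
    "C": [14.0, 15.0, 13.0, 14.0],
}


def within_mean_square(groups):
    """Pooled within-group variance (the ANOVA error mean square)."""
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_within = sum(len(g) for g in groups.values()) - len(groups)
    return ss_within / df_within


def anova_f(groups):
    """One number for one question: is there any difference among the means?"""
    all_values = [v for g in groups.values() for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ms_between = ss_between / (len(groups) - 1)
    return ms_between / within_mean_square(groups)


def anom_deviations(groups):
    """One number per group: how far is each mean from the grand mean?

    Assumes equal group sizes. Each value would be compared with +/- h,
    where h comes from ANOM tables (not computed here).
    """
    k = len(groups)
    n = len(next(iter(groups.values())))
    grand = mean([v for g in groups.values() for v in g])
    scale = sqrt(within_mean_square(groups)) * sqrt((k - 1) / (k * n))
    return {name: (mean(g) - grand) / scale for name, g in groups.items()}


print("ANOVA F ratio:", anova_f(groups))
print("ANOM standardized deviations:", anom_deviations(groups))
```

On this data the F ratio works out to 32, which only says that something differs. The ANOM deviations (about -4, -4 and +8) point at group C as the one sitting far from the overall mean. Same data, two different questions.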
 

Mark Meer

Trusted Information Resource
#5
Bev D said:
Here's my issue: too many people who know nothing about data are analyzing it. Using observational data is a huge risk if you don't understand the biasing and confounding that exist. This issue has been around since the beginning of analysis of observational data for the social sciences and medical 'investigations'. It is why it took so long to 'prove' that smoking causes cancer or that coffee is bad for you or good for you or whatever the latest 'study' says. We are now applying this same lack of knowledge to larger and larger data sets. Some are very well skilled and knowledgeable and others are hacks. The consuming public has no way to know which is which...

Interesting discussion...

Is this becoming an increasing problem in the sciences? I have no doubt it is, as in your examples of consumption studies and, I'm certain, a litany of studies in the social sciences.

So the question is: why is the peer-review process not sufficient? Too few knowledgeable statisticians involved? ...perhaps... but I suspect that a lot of it has to do with a lack of consensus on the appropriate way to analyze big data, rather than a lack of knowledge...

MM
 

Bev D

Heretical Statistician
Staff member
Super Moderator
#6
What peer review process? Unlike many of the past observational studies that I cited, there really is no peer review process for Big Data. Much of it is 'black box', where one can't actually see what the analysis is doing, or not doing. In many of those observational studies, blind acceptance of p<.05 increasingly became the norm over the last two-plus decades, and many were conducted and peer reviewed by researchers rather than statisticians. Even so, the statisticians involved were certainly aware of the dangers of biased and confounded data sets. Today's crop of big data scientists is even less aware of these dangers.

What statisticians? Data Scientists aren't statisticians. IT groups are selling the black box analytical engines with the seductive promise of perfect results regardless of the data - they believe that the sheer size will overcome any biasing or confounding...

Of course, there is some good, statistically sound scientific research going on now and then, but in the main today's big data is even further removed from solid statistics.
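
The claim that sheer size does not overcome confounding can be illustrated with a small simulation. This is a minimal sketch under assumptions of my own choosing (a single confounder Z that drives both X and Y, with X having no effect on Y at all); the function names are hypothetical. The naive slope of Y on X settles near 0.5 however large the sample gets, while the slope after adjusting for Z settles near zero.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """What is left of y after removing its least-squares fit on x."""
    b = slope(x, y)
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n, seed=0):
    """Assumed model: Z drives both X and Y; X has NO effect on Y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [zi + rng.gauss(0, 1) for zi in z]
    naive = slope(x, y)  # ignores the confounder
    # Adjust for Z by regressing it out of both X and Y first.
    adjusted = slope(residuals(z, x), residuals(z, y))
    return naive, adjusted


for n in (100, 10_000, 100_000):
    naive, adjusted = simulate(n)
    print(f"n={n:>7}: naive slope={naive:.3f}, adjusted slope={adjusted:.3f}")
```

More rows only make the naive estimate more precisely wrong. The adjustment works here only because Z was measured and known to matter, which is exactly the kind of knowledge a black box cannot supply.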
 

Mark Meer

Trusted Information Resource
#7
Ah, I see. I apologize... following your mention of consumption studies, I was pushing the thread astray from the original topic of software into the realm of scientific studies in general...
I'm admittedly no expert in the field... I just find the discussion interesting.

To continue: Let me know if I've got this correct...

There are a (small) number of tech companies cornering the market on analysis software, and these software packages are being used in both industry and scientific research without the users knowing how the packages actually implement the analysis?

If so, that is troubling indeed...

Knowing little about how these tools work (forgive me), can I ask: To deal with the "black-box" problem, can the results not be cross-referenced with outputs from other software to gain confidence in the results? In other words, can one not use multiple software packages and verify that they all output approximately the same results given the same inputs?
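
The kind of cross-check described in the question above might look like the following minimal Python sketch. It stands in for "two packages" with two independent computations of the same quantity: the standard library's `statistics.variance` and a hand-written Welford update. The data, the function names and the tolerance are illustrative assumptions.

```python
import math
import statistics


def welford_variance(values):
    """Sample variance via Welford's one-pass update (independent implementation)."""
    n = 0
    mean = 0.0
    m2 = 0.0
    for v in values:
        n += 1
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)
    return m2 / (n - 1)


def cross_check(values, rel_tol=1e-9):
    """Compute the same statistic two ways and report whether they agree."""
    a = statistics.variance(values)   # "package A"
    b = welford_variance(values)      # "package B"
    return a, b, math.isclose(a, b, rel_tol=rel_tol)


data = [2.5, 3.1, 4.7, 5.0, 6.2, 7.9]  # made-up readings
a, b, agree = cross_check(data)
print(f"package A: {a!r}\npackage B: {b!r}\nagree within tolerance: {agree}")
```

Agreement of this kind shows that the tools compute the same thing from the same inputs. It says nothing about whether that computation is the right analysis, or whether the inputs are biased or confounded.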
 