Many people perform data analysis, but few have offered a theoretical model for the process. The descriptions that have been offered disagree with each other and appear to be based on personal intuition. This module examines the accuracy of conceptualizing data analysis as a sense making process, as described in cognitive science literature. A review of 11 articles that feature data analysis tasks suggests that a sense making model for data analysis would be accurate. Future work will examine if and how statistical data analysis safeguards itself against the sources of bias contained in the sense making process.

MOTIVATION

Data analysis is the process by which we glean understanding from data. While the origins of data analysis extend at least as far back as Francis Bacon and certainly further, the term “Data Analysis” was first introduced as a field of academic study in 1962 by John Tukey.

Improvements in technology have increased both the amount of data that we can store and the speed with which we can analyze it (Friedman 1997). With each improvement, data analysis becomes more relevant. Modern commentators now claim we live in the midst of a “data deluge,” where we no longer have the cognitive power to understand all of the data available (Hey 2003). Further advances in data collection technology will require further advances in data analysis methods.

The fields of Machine Learning, Data Mining, InfoVis, and Visual Analytics are all attempts to improve upon Data Analysis to better meet our analytical needs. But even with the research already done in these areas, scientists claim that there is very little Data Analysis theory to build upon, and that the theory that is available is hard to access (Unwin 2001, Mallows 2006, Cox 2007). This lack of theoretical understanding stymies improvement in the field. Many academic disciplines create innovations by extending existing theory in new ways. Data analysis, by contrast, appears to proceed through a trial and error process.

Researchers have offered multiple suggestions to remedy this. Cox and Mallows propose reviewing data analysis case studies to induce a general pattern of analysis. Unwin suggests creating a pattern language of Data Analysis similar to the pattern language first proposed by architects Alexander, Ishikawa, and Silverstein (1977), and used successfully in the field of software engineering (Coplien 1996). While we are intrigued by Unwin’s proposition, we do not presently have the resources to define a complete pattern language. However, we begin our examination of data analysis by reviewing the data analysis case studies that exist in the literature of statistical consulting, as suggested by Cox and Mallows.

RESEARCH QUESTION

Can the sensemaking model of cognitive science provide a theoretical model for data analysis?

PREVIOUS MODELS OF DATA ANALYSIS

Past efforts to describe data analysis reveal a lack of consensus about the process. Below are three illustrations of the process provided by Box (1976), Box, Hunter, and Hunter (1978), and Wild and Pfannkuch (1999).

Source:  OpenStax, The art of the pfug. OpenStax CNX. Jun 05, 2013 Download for free at http://cnx.org/content/col10523/1.34