on open source data analysis applications

Those of you who have been following Clay Spinuzzi on Twitter know that he has begun gathering data for his latest study (his latest book, Network: Theorizing Knowledge Work in Telecommunications, just came out in hardback from Cambridge UP). He mentioned in one of his tweets that he ought to blog about the new data analysis tool that he has mashed together. As I am in the process of gathering similar artifactual and tagged data in my current study, I shot him a message requesting that he do just that. He came through big time with: “Rolling your own free, customized, free, multiplatform, and free qualitative data analysis tool. For free.”

Note the free nature of the tool.

Clay provides a rare glimpse into the evolution of the tools he has used for gathering, organizing, and protecting qualitative data. He begins:

Qualitative data analysis tools are expensive. When I came to UT in 2001, I had the university spend $500 on one popular QDA tool, NVivo. It was going to change the way I did research. So I installed it, played around with it, was not impressed, and abandoned it.

Much more recently, I decided to try HyperResearch on the advice of a grad student from Education. Again, UT sprang for the $400 needed to buy it. I used it for two studies and again, I was not impressed: in some ways it was very limiting, particularly in terms of relating various types of data and coding. The interface was clunky.

And look: $900 spent for nothing.

But between those two times, I managed to analyze 89 sets of observations, 84 interviews, and assorted artifacts. This work followed me across three platforms (Linux, Mac OS X, OpenZaurus), and it didn’t involve an off-the-shelf qualitative research tool. I’m coming back to this solution for managing the data in my latest study, a study of collaboration and project management at high tech organizations. It offers better print formatting, more flexible data analysis, and multiple interfaces that can be chosen for the specific type of analysis or data entry. It’s multiplatform. Fast. And it didn’t cost me a dime.

His new system:

I use a MySQL database to store the data, with a different database table for each kind of data. The first table to set up is the Participant table, with each participant receiving a key index number. Other tables are all indexed by that participant number, so I can join tables based on participant.
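Clay doesn’t publish his exact schema, so here is a minimal sketch of the kind of setup he describes, in plain MySQL; the table and column names (participants, interviews, and so on) are my own stand-ins, not his:

    -- Participant table: each participant gets a key index number.
    CREATE TABLE participants (
        participant_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name           VARCHAR(100)
    );

    -- One table per kind of data, each indexed by the participant number.
    CREATE TABLE interviews (
        interview_id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        participant_id INT NOT NULL,
        notes          TEXT,
        codes          TEXT,
        FOREIGN KEY (participant_id) REFERENCES participants (participant_id)
    );

    -- Joining on the participant number pulls together all the data
    -- for a given participant across tables.
    SELECT p.name, i.notes
    FROM participants p
    JOIN interviews i ON i.participant_id = p.participant_id;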

Each table has a CODES field where I can insert codes from a list. I keep the list of codes in a text editor and surround each one with asterisks like this:

**COMPANY_HISTORY**

The asterisks allow me to search across a table and pick up just the codes: searching for “**COMPANY” picks up codes that start with that string, while searching for “COMPANY” might pick up uses of the actual word in interview or observational notes.
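Clay runs these searches through his front ends rather than by hand, but against the hypothetical interviews table sketched above, the raw SQL equivalent would look something like this:

    -- Find rows tagged with any code starting with COMPANY
    -- (e.g., **COMPANY_HISTORY**, **COMPANY_CULTURE**).
    -- The literal asterisks keep the search from matching the plain
    -- word "company" appearing in the notes themselves.
    SELECT participant_id, notes, codes
    FROM interviews
    WHERE codes LIKE '%**COMPANY%';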

To analyze the data, I use several MySQL front ends, including YourSQL, CocoaMySQL, and phpMyAdmin. These front ends are all free; they afford different views of the data, but they all work on the same underlying data. The result is far more flexibility than I would get from an off-the-shelf QDA tool.

Clay goes on to describe the limitations of his new system, how to set it up, how to search the database, and how to print data. He acknowledges that it is (and it is) a lot to absorb.

But for researchers striving to find an affordable (that is, free; did Clay mention that it is free? Because it is) data collection and analysis system, who aren’t afraid to get their hands dirty with a little coding, I strongly recommend considering adapting his system. I’m going to try to do just that over the winter break.
