22 Free Tools for Data Visualization and Analysis

You may not think you've got much in common with an investigative journalist or an academic medical researcher. But if you're trying to extract useful information from an ever-increasing inflow of data, you'll likely find visualization useful -- whether it's to show patterns or trends with graphics instead of mountains of text, or to try to explain complex issues to a nontechnical audience.

Drawbacks: As is the case with other JavaScript libraries, it's pretty much essential for users to have knowledge of JavaScript (or at least some other programming language). While it's possible to copy, paste and modify code without really understanding what it's doing, I find it difficult to recommend that approach for nontechnical end users.

Skill level: Expert.

Runs on: JavaScript-enabled Web browsers.

Learn more: Try the How-to: Get Started Guide. You can also find examples of the types of graphics you can build with Protovis at the Protovis Gallery.

GIS/mapping on the desktop

There's a wide range of business uses for geographic information systems (GIS), ranging from oil exploration to choosing sites for new retail stores. Or, as The Miami Herald did for its Pulitzer Prize-winning coverage of Hurricane Andrew, you can compare maximum wind speeds with damage reports and building information (and perhaps discover, for example, that the worst damage didn't happen in the areas suffering the heaviest winds, but in areas with a lot of new, shoddy construction).

Quantum GIS (QGIS)

What it does: This is full-fledged GIS software, designed for creating maps that offer sophisticated, detailed data-based analysis of geographic regions.

The best-known desktop GIS software is probably Esri's ArcView, a robust, well-supported application that costs quite a bit of money. The open-source QGIS is an alternative to ArcView.

As OpenOffice is to Microsoft Office, QGIS is to ArcView. ArcView enthusiasts argue that Esri's offering is a couple of years ahead of open-source alternatives, has a better-developed interface, enjoys commercial support and is better suited for print output. But QGIS users say the open-source alternative is an excellent program that does a great deal of useful GIS work -- and may even be better than ArcView when it comes to generating maps for the Web, thanks to a plug-in dedicated to generating HTML image maps.

What's cool: QGIS has an enormous amount of GIS functionality, including the ability to create maps, overlay various types of data, do spatial analysis, publish to the Web and more. It can also be enhanced with plug-ins that add support for numerous undertakings, including geocoding, managing underlying table data, exporting to MySQL and generating HTML image maps.

Drawbacks: As with any sophisticated GIS application, learning to use this software entails a serious commitment of time and training. Even in hour-long hands-on sessions with first ArcView and then QGIS, I noticed things that were easier to do in the commercial option. For example, ArcView had a one-click "normalize" function to immediately calculate, say, the percentage of people 65 and over versus the total population from a data table with both columns; in QGIS, I needed to pull up a "field calculator" and create a new column with the formula to do that calculation myself.
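For readers who want to see what that "normalize" step amounts to, here is a minimal Python sketch of the same arithmetic done outside any GIS package. It is not QGIS field-calculator syntax; the column names (pop_total, pop_65_plus, pct_65_plus) and the sample figures are made up for illustration.

```python
import csv
import io

# Hypothetical sample table; column names and numbers are assumptions.
SAMPLE = """region,pop_total,pop_65_plus
A,1000,150
B,2500,500
C,0,0
"""

def add_percent_column(rows, part="pop_65_plus", whole="pop_total", out="pct_65_plus"):
    """Return copies of the rows with a percentage column added.

    The new column is None when the total is zero, to avoid dividing by zero.
    """
    result = []
    for row in rows:
        total = float(row[whole])
        share = float(row[part])
        new_row = dict(row)
        new_row[out] = round(100.0 * share / total, 1) if total else None
        result.append(new_row)
    return result

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
normalized = add_percent_column(rows)
for row in normalized:
    print(row["region"], row["pct_65_plus"])
```

The point is simply that the one-click option and the manual formula produce the same new column: the 65-and-over count divided by the total population, times 100.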

Runs on: Linux, Unix, Mac OS X, Windows. (This is one case where installation is more complicated on OS X, since it requires manual installation of several dependencies. There's a one-click installer for Windows.)

Skill level: Intermediate to expert.

Learn more: Timothy Barmann of The Providence Journal posted two very useful tutorials for the CAR conference that are still available: Introduction to QGIS and The Latest in Mapping With JavaScript and jQuery. Barmann also offers a sample: Rhode Island's Ethnic Mosaic. Another resource to help you get started: QGIS Tutorial Labs from Richard E. Plant, professor emeritus at the University of California, Davis.

Note: If you're interested in GIS and want to consider other free software options, download this PDF listing of Open Source/Non-Commercial GIS Products. And if you're looking for a free open-source desktop GIS program that might be fairly easy to use, Jacob Fenton, director of computer-assisted reporting at American University's Investigative Reporting Workshop, recommends taking a look at the System for Automated Geoscientific Analyses (SAGA) site. Finally, if analyzing geographic data in a conventional database sounds interesting, PostGIS "spatially enables" the PostgreSQL relational database, according to the site.

Web-based GIS/mapping

Most of us are familiar with mapping tools from major companies like Google (which has a number of third-party front ends such as Map A List, an add-on that adds info to a Google Map from a spreadsheet). There's also Yahoo Maps Web Services and Bing Maps -- all with APIs. But there are numerous other options from smaller organizations or lone open-source enthusiasts that were designed from the ground up to map geographic data.


OpenHeatMap

What it does: This user-friendly website generates color-coded maps; the colors change depending on underlying info such as population change or average income. It can also place markers on a map, varying the size of the markers based on a data table.

In addition to providing the Web-based service, author Pete Warden has also packaged OpenHeatMap as a jQuery plug-in for those who don't want to rely on hosting at OpenHeatMap.com. However, not all data formats work correctly when hosted locally. "My recommended way is to embed the maps from the site," Warden wrote via Skype chat.

What's cool: It is astonishingly easy to create a color-coded map from many types of location data -- even IP addresses (just use the column header ip_address).

It took me about 60 seconds to create a basic map from a spreadsheet of magnitude 7 or higher earthquakes around the world since Jan. 1, 2000, then a couple of minutes more to customize the rollover box to display both date and magnitude. (You can see a larger version on OpenHeatMap.com.)

Marker transparency, size and color are extremely simple to customize; you can also upload your own marker image, and customize what appears in the tooltips rollover by adding a tooltip column to your data source.

OpenHeatMap automatically figures out and maps locations based on a wide range of place definitions, relying on how the location columns are named -- "address," "country," "fips_code" (used by the U.S. Census Bureau), "zip_code_area" (for five-digit ZIP codes), "lat" (latitude), "lon" (longitude) and so on.
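Because the service keys off column names, preparing a file is mostly a matter of labeling columns correctly. The Python sketch below builds a small CSV with "lat," "lon" and "tooltip" headers, as described above. The "value" header for the number being mapped is an assumption on my part, and the earthquake rows are invented placeholders, not real records.

```python
import csv
import io

# Invented placeholder records, used only to show the layout.
quakes = [
    {"date": "2001-01-01", "magnitude": 7.1, "lat": 10.0, "lon": 120.0},
    {"date": "2005-06-15", "magnitude": 8.0, "lat": -20.5, "lon": -70.25},
]

def to_map_rows(records):
    """Rename fields to location-style headers and build a tooltip string."""
    rows = []
    for r in records:
        rows.append({
            "lat": r["lat"],
            "lon": r["lon"],
            "value": r["magnitude"],  # assumed header name for the mapped number
            "tooltip": f"{r['date']}: magnitude {r['magnitude']}",
        })
    return rows

def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["lat", "lon", "value", "tooltip"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = rows_to_csv(to_map_rows(quakes))
print(csv_text)
```

The resulting text could be saved as a .csv file or pasted into a spreadsheet before uploading.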

This is a well-thought-out interface from a onetime Apple engineer. (Warden said he worked on several software projects at Apple, including Final Cut Studio.)

Drawbacks: There's no way to delete data once it's been uploaded (you can get around this by using a Google Spreadsheet as a data source), and editing time is limited to as long as your browser is open and you haven't started a new map. Embedded OpenHeatMap.com-hosted maps may be slow to load.

The documentation doesn't make it clear whether you can set where the map is centered or what the default zoom level should be; Warden told me by e-mail that the system remembers where you last positioned and zoomed the map before saving. And this feature still can occasionally be buggy, although Warden is responsive to bug reports.

Skill level: Beginner.

Runs on: Web browsers enabled for Flash or HTML 5 Canvas.

Learn more: Its title notwithstanding, the four-minute video "How OpenHeatMap Can Help Journalists" offers a clear explanation for anyone interested in using the service. You can also view samples on the OpenHeatMap Gallery and check out this Guardian interactive map of where Facebook is used.


OpenLayers

What it does: OpenLayers is a JavaScript library for displaying map information. It aims to provide functionality similar to the mapping libraries from big companies such as Google, Yahoo and Microsoft -- but with open-source code. OpenLayers works with OpenStreetMap and other maps, as this tutorial about use with Google shows.

Other projects build on it to add functionality or ease of use, such as GeoExt, which adds more GIS capabilities. For users who are comfortable hand-coding JavaScript and prefer not to use a commercial platform such as Google or Bing, this can be a compelling option.

Drawbacks: OpenLayers is not yet as developed or as easy to use as, say, Google Maps. The project page notes that it is "still undergoing rapid development."

Skill level: Expert.

Runs on: Any Web browser.

Learn more: Try this OpenLayers Simple Example. A good sample is Ushahidi's Haiti map.

There are other JavaScript libraries for overlaying information on maps, such as Polymaps. And there are a number of other mapping platforms, such as Google Maps, which offers numerous mapping APIs; Yahoo Maps Web Services, with its own APIs; the Bing Maps platform and APIs; and GeoCommons.


OpenStreetMap

What it does: OpenStreetMap is somewhat like the Wikipedia of the mapping world, with various features such as roads and buildings contributed by users worldwide.

What's cool: The main attraction of OpenStreetMap is its community nature, which has led to a number of interesting uses. For example, it is compatible with the Ushahidi mobile platform used to crowdsource information after the earthquakes in Haiti and Japan. (While Ushahidi can use several different providers for the base map layer, including Google and Yahoo, some project creators feel most comfortable sticking with an open-source option.)

Drawbacks: As with any project accepting public input, there can be issues with contributors' accuracy at times (such as the helicopter landing pad someone once placed in my neighborhood -- it's actually quite a few miles away). To be fair, though, I've encountered more than one business listing on Google Maps that was woefully out of date. In addition, the general look and feel of the maps isn't quite as polished as commercial alternatives.

Skill level: Advanced beginner to intermediate.

Runs on: Any Web browser.

Learn more: See the Quick Tutorial on the OpenLayers site.

Temporal data analysis

If time is an important component of your data, traditional timeline visualizations may show patterns, but they don't allow for sophisticated analysis or a great deal of interaction. That's where this project comes in.


TimeFlow

What it does: This desktop software is for analyzing data points that involve a time component. In a demo I wrote about last summer, creators Fernanda Viégas and Martin Wattenberg -- the pair behind the Many Eyes project who are now working at Google -- showed how TimeFlow can generate visual timelines from text files, with entries color- and size-coded for easy pattern spotting. It also allows the information to be sorted and filtered, and it gives some statistical summaries of the data.

What's cool: TimeFlow makes it incredibly easy to interact with data in various ways, such as switching views or filtering by criteria such as date ranges or earthquakes of magnitude 8 or more. The timeline view offers a slider so you can zero in on a time period. While many applications can plot bar graphs, fewer also offer calendar views. And unlike Web-based Google Fusion Tables, TimeFlow is a desktop application that makes it quick and painless to edit individual entries.
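To make the filtering idea concrete, here is a rough Python sketch of the kind of query described above: keep only the events inside a date range that also meet a magnitude threshold. It says nothing about how TimeFlow works internally, and the event records are invented for illustration.

```python
from datetime import date

# Invented sample events; not real earthquake data.
events = [
    {"label": "Event 1", "date": date(2003, 5, 1), "magnitude": 7.2},
    {"label": "Event 2", "date": date(2004, 12, 26), "magnitude": 9.1},
    {"label": "Event 3", "date": date(2010, 2, 27), "magnitude": 8.8},
    {"label": "Event 4", "date": date(2012, 4, 11), "magnitude": 8.6},
]

def filter_events(items, start, end, min_magnitude=0.0):
    """Return events dated from start through end (inclusive) at or above min_magnitude."""
    return [
        e for e in items
        if start <= e["date"] <= end and e["magnitude"] >= min_magnitude
    ]

selected = filter_events(events, date(2004, 1, 1), date(2011, 12, 31), min_magnitude=8.0)
for e in selected:
    print(e["date"].isoformat(), e["label"], e["magnitude"])
```

A slider in a timeline view is, in effect, an interactive way of changing the start and end values passed to a filter like this one.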

Drawbacks: This is an alpha release designed to help individual reporters doing investigative work. There are no facilities for publishing or sharing results other than taking a screen snapshot, and additional development appears unlikely in the near future.

Skill level: Beginner.

Runs on: Desktop systems running Java 1.6, including Windows and Mac OS X.

Learn more: Check out Top tips.

Note: If you're looking to publish visualized timelines, better options include Google Fusion Tables, VIDI or the SIMILE Timeline widget.

Text/word clouds

Some data visualization geeks think word clouds are either not very serious or not very original. You can think of them as the tiramisu of visualizations -- once trendy, now overused. But I still enjoy these graphics that display each word from a text file once, with the size of the words varying depending on how often each one appears in the source.
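The idea behind a word cloud is simple enough to sketch in a few lines of Python: count each word, then scale a font size between a minimum and a maximum according to the count. This is a generic illustration, not the method used by any tool named in this article; the function name and size limits are arbitrary choices.

```python
import re
from collections import Counter

def word_sizes(text, min_size=10, max_size=48):
    """Map each distinct word to a font size that grows with its frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for word, count in counts.items():
        # When every word appears equally often, give them all the largest size.
        scale = (count - lo) / span if span else 1.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes

sizes = word_sizes("data data data map map chart")
print(sizes)
```

Each word appears once in the result, and the most frequent word gets the largest size.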

IBM Word-Cloud Generator

What it does: Several tools mentioned previously can create word clouds, including Many Eyes and the Google Visualization API, as well as the website Wordle (which is a handy tool for making word clouds from websites instead of text files). But if you're looking for easy desktop software dedicated to the task, IBM's free Word-Cloud desktop application fits the bill.

What's cool: This is a quick, fun and easy way to find frequency of words in text.

Drawbacks: Because it's trying to ignore words such as "a" and "the," the basic configuration can miss some important terms. In my tests, it didn't know the difference between "it" and "IT," and completely missed "AT&T."
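One plausible explanation for that behavior, sketched in Python below, is a filter that lowercases everything and splits on punctuation before dropping common words. I have not confirmed that IBM's tool works this way. The stop-word list and both functions are illustrative assumptions, shown next to a case-aware variant that keeps acronyms.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "the", "it", "at", "t"}

def naive_counts(text):
    """Lowercase, split on non-letters, then drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def case_aware_counts(text):
    """Keep multi-letter all-caps tokens (including ones joined by '&') as acronyms."""
    tokens = re.findall(r"[A-Za-z]+(?:&[A-Za-z]+)*", text)
    kept = []
    for tok in tokens:
        is_acronym = tok.isupper() and len(tok) > 1
        if is_acronym:
            kept.append(tok)          # never treated as a stop word
        elif tok.lower() not in STOP_WORDS:
            kept.append(tok.lower())
    return Counter(kept)

text = "The IT team said it moved to AT&T. IT approved it."
print(naive_counts(text))
print(case_aware_counts(text))
```

With the naive version, "IT" vanishes along with "it," and "AT&T" is split into two fragments that are both discarded. The case-aware version keeps both terms.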

Skill level: Advanced beginner. This app runs on the command line, so users should be able to find file paths and plug them into a sample command.

Runs on: Windows, Mac OS X and Linux running Java.

Learn more: Check the examples that come with the download.

Social and other network analysis

These tools use a pre-Facebook/Twitter definition of "social network analysis" (SNA), referring to the discipline of finding connections between people based on various data sets. Investigative journalists have used such tools to, for example, find links between people who are involved in development projects or who are members of various boards of directors.
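At its simplest, finding such links means grouping people by the organizations they belong to and listing every pair that shares one. The Python sketch below shows that basic step only; it is not the statistical node analysis these tools perform, and the names and boards are invented.

```python
from collections import defaultdict
from itertools import combinations

# Invented (person, board) records for illustration.
memberships = [
    ("Alice", "Board A"),
    ("Bob", "Board A"),
    ("Bob", "Board B"),
    ("Carol", "Board B"),
    ("Alice", "Board B"),
]

def shared_boards(records):
    """Return {(person1, person2): set of boards both people sit on}."""
    members = defaultdict(set)
    for person, board in records:
        members[board].add(person)

    links = defaultdict(set)
    for board, people in members.items():
        # Sorting gives each pair a single, predictable ordering.
        for a, b in combinations(sorted(people), 2):
            links[(a, b)].add(board)
    return dict(links)

links = shared_boards(memberships)
for pair, boards in sorted(links.items()):
    print(pair, sorted(boards))
```

Pairs that share several boards, like the first pair here, are the kind of connection a reporter might want to look at more closely.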

An understanding of statistical theories of network node analysis is necessary in order to use this category of software. Since I've only had a very basic introduction to that discipline, this is one category of tools I did not test hands-on. But if you're seeking software to do such analysis, one of these might meet your needs.


