Tag Archive for 'Statistics'

R visualizations

New Year’s resolution: harvest some knowledge from those colleagues who are digging through R. Apparently it’s not only an excellent tool for number crunching, but can also be used for neat geographic data visualizations.

Visualizing Facebook Friends: Eye Candy in R

Facebook visualization using R

EUROSTAT follow up

Yesterday we found out about the NUTS region codes update at EUROSTAT. Today I had the chance to dig a little deeper and look for a workaround for the non-matching regional data. So far we’ve got an updated EUROSTAT database which, combined with the non-updated GISCO geographic data, looks something like this:

EUROSTAT GISCO coverage

The map illustrates the coverage of available regional data in Europe. For testing I’ve chosen an unproblematic regional indicator which is usually available for all of Europe: population in 2004 at NUTS3 level.

A verbal description of all region code changes can be found in this document. The map above visualizes most changes (because change = no data available any more).
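
The coverage check behind a map like this boils down to comparing two sets of region codes. Here is a minimal sketch in Python, assuming both code lists have already been extracted from the statistical table and from the geodata. The function name and the XX sample codes are made up for illustration and are not real NUTS codes.

```python
def coverage(stat_codes, geo_codes):
    """Compare region codes from a statistical table with those in the geodata.

    Returns (matched, missing, orphaned, share):
      matched  - codes present in both sources
      missing  - map regions without any statistical data
      orphaned - statistical rows without a region on the map
      share    - fraction of map regions that have data
    """
    stat = set(stat_codes)
    geo = set(geo_codes)
    matched = geo & stat
    missing = geo - stat
    orphaned = stat - geo
    share = len(matched) / len(geo) if geo else 0.0
    return matched, missing, orphaned, share


# Hypothetical sample: four map regions, three statistical rows.
geo_codes = ["XX001", "XX002", "XX003", "XX004"]
stat_codes = ["XX001", "XX002", "XX005"]

matched, missing, orphaned, share = coverage(stat_codes, geo_codes)
print("regions without data:", sorted(missing))
print("data without region:", sorted(orphaned))
print("coverage: {:.0%}".format(share))
```

The "orphaned" set is the interesting one after a code revision: those are the new codes that the old geometry simply doesn’t know about.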

It used to be possible to cover EU27 (with a few region code tweaks even Romania and Bulgaria), EFTA and candidate countries. Seriously, that’s not my understanding of a successful data update.

However, all overview maps I’ve seen so far (well, only those in the *new directory* of the NUTS documentation) are already updated to NUTS 2006 codes. So there is hope that GISCO updates the downloadable data soon.

Update

The definitive geographical dataset with the new NUTS 2006 boundaries is presently under development. We expect that the data will be ready for downloading before the end of May.

European geodata

What’s TIGER in the US is GISCO in Europe. Not quite as detailed and up to date, but at least free to use under the following conditions:

a) the data will not be used for commercial purposes;
b) the source will be acknowledged. A copyright notice, as specified below, will have to be visible on any printed or electronic publication using the data downloaded from this page.

The available geodata is primarily intended for use in combination with other EUROSTAT products (which are also available for free on their website). The scale is too small for detailed map production, and on most layers the date is indicated as 199x.

If the left hand of EUROSTAT knew what the right hand was doing, everybody interested could now start creating statistical maps and analyses across Europe by simply downloading all the necessary data. Unfortunately it’s not as easy as it seems: the left hand changed the statistical units in Europe (NUTS), while the right hand didn’t. So what we now have is a statistical database using new region codes and a geographic database using old region codes. Needless to say, a lot of GIS out there working with EUROSTAT data are now somewhat screwed because geographic and statistical data don’t match anymore. A workaround until updated geodata is available is to avoid the NUTS3 level; NUTS2 (and larger) data seems less problematic. Not the best solution if you’re in the field of regional analysis, of course.
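
The NUTS2 workaround relies on the hierarchical structure of the codes: a NUTS3 code has five characters, and its first four characters are the code of the parent NUTS2 region. Below is a rough sketch in Python of rolling NUTS3 values up to NUTS2. It assumes an additive indicator such as population, and it assumes that the NUTS2 parent codes themselves were not touched by the revision, which has to be checked case by case. The function name, the XX sample codes and the values are invented for illustration.

```python
from collections import defaultdict


def aggregate_to_nuts2(nuts3_values):
    """Sum an additive indicator from NUTS3 regions up to their NUTS2 parents.

    nuts3_values maps 5-character NUTS3 codes to numbers. The parent NUTS2
    code is taken to be the first four characters of each NUTS3 code.
    """
    totals = defaultdict(int)
    for code, value in nuts3_values.items():
        if len(code) != 5:
            raise ValueError("not a NUTS3 code: %r" % code)
        totals[code[:4]] += value
    return dict(totals)


# Hypothetical population figures, not real data.
population_nuts3 = {"XX111": 100, "XX112": 200, "XX121": 1000}
print(aggregate_to_nuts2(population_nuts3))
```

This only makes sense for values that can be summed. Rates and averages would need to be recomputed from their underlying counts instead.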

Just one more detail on today’s EUROSTAT confusion:

http://ec.europa.eu/eurostat/ramon/nuts/changes_1999_en.html
http://ec.europa.eu/comm/eurostat/ramon/nuts/changes_1999_en.html

Apparently the www-directory was copied and only one copy was updated. Now which of the two sites holds the correct information? All bookmarks lead to the old one, with no hint (or redirect??) that the entire site has moved and was updated…

Update

Regarding interoperability and openness: the downloadable geodata comes as an ESRI Personal Geodatabase 9.2, and I’m not sure how many GIS applications can cope with that file format. The provided metadata is excellent, though; then again, GISCO already had excellent metadata in 2001.

Blog Metrics

Google Analytics is free (well, there is a limit for free use, but I think it’s about a 7-digit page view number) and is certainly one of the most comprehensive tools in terms of web site traffic analysis. It’s powerful and provides excellent reporting tools.

I have used Google Analytics and I’m still using it on some sites. However, after testing Clicky for the last couple of weeks I ordered a Premium account today. Clicky isn’t nearly as complete as Google Analytics, but it provides a very slick, functional user interface, easy blog integration and an API. Perfect for tracking a low traffic blog.

I’m a curious person: I want to know who my visitors are, what they are doing here, where they came from and where they’ll go. Clicky gives me easy overviews and quick answers to all of those points.

Clicky currently doesn’t accept sites with more than 10,000 page views per day, so whether it fits depends on what site you’re running and how much traffic it generates. For low traffic blogs, the long tail of the blogosphere, I can definitely recommend Clicky.

Going NUTS on Qype

I just tried to find a second opinion or review about a new restaurant I’d like to go to, and therefore went through some local recommendation sites.

Qype, the European version of Yelp, was one of them.

One thing on Qype, which I visited for the first time btw, immediately caught my attention: the geocode in the address bar, where a 5-character NUTS code followed by a place name showed up.

NUTS (Nomenclature d’unités territoriales statistiques) is a geocode standard for referencing the administrative division of countries for statistical purposes in Europe. [Wikipedia]

First of all I found it rather fascinating that a trendy 2007 Web 2.0 company, in times of folksonomy and the semantic web, makes use of an old and dusty statistical classification standard developed during the 1980s.

My second thought was that, since we deal a lot with those region codes because they’re the only way to homogenize European statistical information, I can dynamically link our regional databases directly to Qype sites without having to deal with place name, spelling and search accuracy issues.

Considering that option, Qype is probably one of the best resources for a general overview of NUTS regions. Other regional information sites usually provide different, mostly national, views and definitions of regions. Actually I’m not aware of a resource where you can go through all European NUTS regions and get for each a homogenized picture and description.

An amusing detail is that you can even search by NUTS codes instead of place names in Qype: for instance looking up “pizza” in “AT130”. The local search any hardcore statistician ever dreamed of came true in Qype!
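
What makes a code like AT130 so handy for linking is that it carries its own hierarchy: two letters for the country, then one more character per NUTS level. The small Python sketch below expands a code into all of its parent levels, assuming nothing about Qype’s actual URL scheme. The function name is hypothetical, and the validation is deliberately loose (two letters followed by up to three letters or digits).

```python
def nuts_levels(code):
    """Expand a NUTS code into [country, NUTS1, NUTS2, NUTS3] prefixes.

    Shorter codes simply yield fewer levels, e.g. a NUTS2 code
    returns three entries.
    """
    code = code.strip().upper()
    if not (2 <= len(code) <= 5 and code[:2].isalpha() and code.isalnum()):
        raise ValueError("not a plausible NUTS code: %r" % code)
    # Prefixes of length 2, 3, 4, 5 correspond to country, NUTS1, NUTS2, NUTS3.
    return [code[:n] for n in range(2, len(code) + 1)]


print(nuts_levels("AT130"))  # ['AT', 'AT1', 'AT13', 'AT130']
```

With the parent levels at hand, a regional database keyed by NUTS3 can be joined to anything published at NUTS2, NUTS1 or country level without touching place names at all.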

The restaurant was I Carusi btw, anybody been there yet?

SVG diagrams on Google Maps

A smart Swiss Google Maps user posted an interesting example where SVG diagrams are added as an overlay to a Google Map.

Very clever!

Though the diagrams are static and don’t make sense, it should be easy to connect them to dynamic data. The example demonstrates a promising method of overlaying vector symbols and point signatures on Google Maps, based on only a few lines of JavaScript.

Compared to raster graphic symbols, e.g. PNGs or JPGs, any SVG symbol attribute like color, size, transparency, etc. can be changed easily on the fly, without the need to produce and store tons of new files somewhere on the server.
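
To illustrate why this is so cheap: an SVG symbol is just an element with attributes, so restyling it means changing a value rather than rendering a new image. The posted example does this with JavaScript in the browser. The sketch below shows the same idea in Python with the standard library, purely for illustration. The function name, the default colors and the square-root scaling (so that the circle area grows with the value) are my own assumptions, not taken from the example.

```python
import math
import xml.etree.ElementTree as ET


def circle_symbol(cx, cy, value, scale=2.0, fill="#cc0000", opacity=0.7):
    """Build an SVG <circle> element whose area is proportional to value."""
    radius = scale * math.sqrt(value)
    return ET.Element("circle", {
        "cx": str(cx),
        "cy": str(cy),
        "r": "{:.1f}".format(radius),
        "fill": fill,
        "fill-opacity": str(opacity),
    })


symbol = circle_symbol(10, 20, value=9)
print(ET.tostring(symbol, encoding="unicode"))

# Restyling is a matter of changing attributes, no new image file needed.
symbol.set("fill", "#0066cc")
symbol.set("fill-opacity", "0.4")
print(ET.tostring(symbol, encoding="unicode"))
```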

The example doesn’t work with Adobe’s SVG Plugin; so far it’s only accessible with Firefox.

Region codes

Statistics lectures should always begin by explaining the importance of homogeneous region codes. I have spent days of my life cleaning up individually invented region codes which are supposed to match other regional data sets. Even some EU administration departments prefer customising to using official NUTS or ISO codes.

SVG mapping in the wild

In his comment on my “Flash vs. SVG” post, Brian Timothy pointed to the website of the US Department of Agriculture (USDA), which contains maps done completely in SVG. Apart from the “political matter” he mentioned, I would like to stress the potential of SVG for mapping statistics.

The USDA maps let the user change and apply various styles immediately without reloading the whole map. Following the same method you could, for instance, allow users to change thresholds too. Since every single attribute is stored within or linked to the corresponding map object, it’s easy to modify the map presentation with JavaScript and XML/SVG/DOM on the client side. As you can read in this short review by Jeff Thurston, the application appears fast, with a snappy user interface.
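
Changing thresholds is really just re-running a classification and writing the resulting fill color back to each map object. Here is a minimal sketch of that classification step in Python. On a real SVG map the same logic would run in JavaScript on the client. The function name, region keys, values and color names are hypothetical.

```python
from bisect import bisect_right


def classify(values, thresholds, colors):
    """Assign a color class to each region based on ascending thresholds.

    With n thresholds there must be n + 1 colors. A value equal to a
    threshold falls into the upper class.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    if len(colors) != len(thresholds) + 1:
        raise ValueError("need exactly one more color than thresholds")
    return {
        region: colors[bisect_right(thresholds, value)]
        for region, value in values.items()
    }


# Invented sample values for three regions.
values = {"A": 5, "B": 50, "C": 500}
colors = ["light", "medium", "dark"]

print(classify(values, [10, 100], colors))
# The user moves the class breaks; only the fills change, not the geometry.
print(classify(values, [1, 20], colors))
```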

If you’re interested in webmapping using SVG you’ll find below a few well done examples:

In addition you can find my first SVG mapping experiment here. I stopped development 2 years ago. It won’t work in Firefox 1.5 (I was compelled to optimize it for Adobe’s plugin back then, *the* showstopper for SVG), nor is it fully functional, because the backend was a PostgreSQL/PostGIS database which I don’t have available online.