The Map of Life

integrating species distribution knowledge

Map of Life Dream Team (and, hey, we are ready to blog!)

Map of Life has been chugging along for about six months in its current configuration, and now seems as good a time as any to step back and consider how far we have come and what might be next. The team working on Map of Life is an interesting one. Geographically, we are spread across the United States, from the East Coast (Yale University) to the Midwest (University of Kansas), and from the Mountain West (University of Colorado) to the Pacific Coast (University of California, Berkeley). We are also diverse in country of origin (Australia, Germany, U.S.A.), academic training (computer science, ecology, evolution, informatics) and skill set (programming, systems engineering, informatics, macroecology, systematics, etc.). A gratifying part of the first six months, for me, is that these differences have translated into a strong working relationship and collaborative spirit, where the strengths of the group, not the weaknesses, have multiplied. I think this likely reflects a strong commitment to meeting regularly, as often as three times a week, by phone or Skype, to synchronize efforts. Plus, good peeps and, as it turns out, we like working with each other!

So what have we accomplished with all this goodwill and great vibes? A lot, as it turns out, and much of it is “behind the scenes”. Andrew Hill and Aaron Steele have been bouncing great ideas back and forth about how to create an information architecture that is robust, scalable and efficient. We’ve put together a broad technological overview here: Looking back, the list so far is a solid one. First, we deployed a cloud-based copy of the Catalogue of Life, accumulated a large set of range maps for amphibians and mammals, checked the taxonomy of those range maps against the Catalogue of Life database, and provided an initial mechanism to search those maps. Next, we developed the means to display range maps via Google Maps, using map tiles rendered with a toolkit named Mapnik, and developed some initial user interface frameworks and designs. We are currently polishing the access and display of species occurrence data points. All of this is great, but we are still in well-charted waters. The excellent AmphibiaWeb project has also developed the means for displaying range maps and occurrence data points, for example.
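
For readers curious about the tiling step: Google Maps requests imagery as square tiles addressed by a zoom level and an x/y index, so a renderer needs to know which tiles a given range map touches. The Python sketch below shows only the standard Web Mercator tile arithmetic. It is not Map of Life's actual code, and the function names (lonlat_to_tile, tiles_for_bbox) and the example bounding box are made up for illustration.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the map edge (or beyond about 85 degrees
    # latitude, where Web Mercator stops) still map to a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tiles_for_bbox(west, south, east, north, zoom):
    """List the (zoom, x, y) tiles that cover a lon/lat bounding box.

    Assumes the box does not cross the antimeridian (west <= east).
    """
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left tile
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    return [
        (zoom, x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]


if __name__ == "__main__":
    # Hypothetical bounding box for a species range, in degrees.
    for tile in tiles_for_bbox(-105.0, 35.0, -95.0, 42.0, zoom=5):
        print(tile)
```

At zoom level z the world is split into 2^z by 2^z tiles, so the number of tiles covering a large range grows quickly with zoom. That is one reason tile rendering is usually paired with caching.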

Soon we will be pulling together some new types of distribution data, such as “occurrence polygons” (places where species have been reported via species lists) and habitat preferences such as “wet broadleaf forests” or “shortgrass prairie”. These pose new challenges for storage, query and visualization. An even greater challenge will be providing all these sources of knowledge through a single search and user interface. Exciting times for Map of Life! We are looking forward to having some demonstrations soon, so you can try out some Map of Life features. Stay tuned (and thanks for reading).


Written by Rob

April 6, 2011 at 11:38 am

Posted in overview

2 Responses

  1. Can you guys elaborate on the cloud-based front-end and back-end story and how that achieves scalability in a cost-effective way? The front-end must be up all the time, right, and you don’t want low traffic (what’s the point of all this if you are planning for low traffic), so that must cost much more than $1 a day in a realistically desired scenario? Can we have some meat here for the technophiles as to how you guys envision this working?

    Hilmar Lapp

    April 7, 2011 at 7:00 am

    • Wow Hilmar, it is great that you narrowed in on this topic so quickly. We have actually discussed making the cloud/non-cloud duality the topic of our next blog post. In short though, the MOL front end is deployed as an App Engine application. So, unlike other cloud solutions such as Amazon’s EC2, we get a rather respectable free quota for many parts of the front end (CPU, visits, storage) and seamless scaling when page views start to ramp up. This has been great for keeping costs near zero during development. Further, we make the most of the free quota by managing much of the MOL metadata on the front end (in addition to a good level of caching for recently requested data), while large files and long-running analyses live on the back end. I’m sure that as the project expands, we will move more and more of the system to the cloud, but it is all about picking which part of the cloud you need and who provides it best and most cheaply. More next week.


      April 7, 2011 at 9:50 am
