<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>The Map of Life</title>
	<atom:link href="http://mappinglife.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://mappinglife.wordpress.com</link>
	<description>integrating species distribution knowledge</description>
	<lastBuildDate>Mon, 02 Jan 2012 03:26:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='mappinglife.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>The Map of Life</title>
		<link>http://mappinglife.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://mappinglife.wordpress.com/osd.xml" title="The Map of Life" />
	<atom:link rel='hub' href='http://mappinglife.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Map of Life and CartoDB</title>
		<link>http://mappinglife.wordpress.com/2011/12/30/map-of-life-and-cartodb/</link>
		<comments>http://mappinglife.wordpress.com/2011/12/30/map-of-life-and-cartodb/#comments</comments>
		<pubDate>Fri, 30 Dec 2011 11:51:15 +0000</pubDate>
		<dc:creator>Gaurav</dc:creator>
				<category><![CDATA[technology]]></category>
		<category><![CDATA[updates]]></category>
		<category><![CDATA[CartoDB]]></category>

		<guid isPermaLink="false">http://mappinglife.wordpress.com/?p=113</guid>
		<description><![CDATA[It&#8217;s been an exciting month for Map of Life! We had a great time at TDWG 2011 in sunny New Orleans, where John Wieczorek and I presented Map of Life&#8216;s big dream: to use existing maps to make better maps of where species actually are. John and Aaron Steele also presented some radical ideas about [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://mappinglife.files.wordpress.com/2011/12/screen-shot-2011-12-12-at-am-03-57-51.png"><img class="alignright size-medium wp-image-125" style="clear:right;" title="NWDSSD on CartoDB, Australian points only" src="http://mappinglife.files.wordpress.com/2011/12/screen-shot-2011-12-12-at-am-03-57-51.png?w=300&#038;h=237" alt="" width="300" height="237" /></a></p>
<p>It&#8217;s been an exciting month for Map of Life! We had a great time at <a href="http://www.tdwg.org/conference2011/">TDWG 2011</a> in sunny New Orleans, where John Wieczorek and I presented <a href="http://speakerdeck.com/u/gaurav/p/map-of-life-computer-demo-at-tdwg-2011"><em>Map of Life</em>&#8217;s big dream</a>: to use existing maps to make better maps of where species actually are. John and Aaron Steele also presented some radical ideas about <a href="http://eightysteele.github.com/presentations/tdwg/2011/dce">hooking CouchDB and CouchApp together</a> to build simple, powerful applications. Their switch in strategy made us wonder if perhaps we could pull that off with Map of Life, too.</p>
<p>It was in this frame of mind that we attended <a href="http://www.vizzuality.com/team/jatorre">Javier de la Torre</a>&#8217;s demonstration of <a href="http://cartodb.com/">CartoDB</a>, a Google Fusion Table-like application to store and render mapping data. The more we saw, the more we liked: <a href="http://github.com/vizzuality/cartodb">open-source</a> (and available on GitHub!), based on <a href="http://postgis.refractions.net/">PostGIS</a> on <a href="http://www.postgresql.org/">PostgreSQL</a> (already our platform of choice), and incorporating Mapnik, the super-fast tile rendering engine we discussed <a href="http://mappinglife.wordpress.com/2011/10/11/whats-new-october-edition/">in our last blog post</a>. They&#8217;re also quick to respond to our requests: last week, CartoDB added <a href="http://blog.cartodb.com/post/13968310966/dynamic-map-styles">support for per-request tile styling</a>, an essential feature for the next phase of our development.</p>
<p>Over the last two months, we&#8217;ve been working on moving our map tiling infrastructure to leverage CartoDB while continuing to use Google App Engine for indexing and searching. Although there are still some small glitches to work out before we can claim full success, our system now works in two parts:</p>
<ul>
<li>A set of scripts that upload data into a CartoDB database; and</li>
<li>A frontend which queries and accesses that database to create a map to show our users.</li>
</ul>
<p>In so doing, we&#8217;ve reaped the rewards of a much smaller, simpler code base. Many of the more complicated tasks we were doing earlier, such as indexing our attributes or drawing the map layers, are now being handled by programs perfectly designed to take on these tasks (PostgreSQL and CartoDB respectively). So our job has been simplified to doing what we do best: managing the data, combining it easily and quickly in our front end, and analysing it for global patterns on our back end.  We&#8217;ll be working to further simplify this upload process soon, and we&#8217;ll be showing off more of our new architecture shortly. Stay tuned!</p>
]]></content:encoded>
			<wfw:commentRss>http://mappinglife.wordpress.com/2011/12/30/map-of-life-and-cartodb/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ef10e56567f5d6e20bc3f7f4ab4e3254?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Gaurav</media:title>
		</media:content>

		<media:content url="http://mappinglife.files.wordpress.com/2011/12/screen-shot-2011-12-12-at-am-03-57-51.png?w=300" medium="image">
			<media:title type="html">NWDSSD on CartoDB, Australian points only</media:title>
		</media:content>
	</item>
		<item>
		<title>What&#8217;s new, October edition</title>
		<link>http://mappinglife.wordpress.com/2011/10/11/whats-new-october-edition/</link>
		<comments>http://mappinglife.wordpress.com/2011/10/11/whats-new-october-edition/#comments</comments>
		<pubDate>Tue, 11 Oct 2011 22:00:30 +0000</pubDate>
		<dc:creator>Gaurav</dc:creator>
				<category><![CDATA[updates]]></category>

		<guid isPermaLink="false">http://mappinglife.wordpress.com/?p=100</guid>
		<description><![CDATA[Welcome back to the Map of Life blog! I&#8217;m Gaurav Vaidya, a first-year graduate student at the Guralnick lab in beautiful Boulder, Colorado. I joined the Map of Life team just under two months ago, and have been having a great time working on the project. In these two months, Map of Life has had [...]]]></description>
			<content:encoded><![CDATA[<p>Welcome back to the Map of Life blog! I&#8217;m <a href="http://www.ggvaidya.com/">Gaurav Vaidya</a>, a first-year graduate student at the <a href="http://sites.google.com/site/robgur/">Guralnick lab</a> in beautiful <a href="http://www.flickr.com/places/United+States/Colorado/Boulder">Boulder, Colorado</a>. I joined the <a href="http://www.mappinglife.org/people">Map of Life team</a> just under two months ago, and have been having a great time working on the project. In these two months, Map of Life has had a lot of fantastic new features approaching completion, and we thought the time was ripe to show some of them off to you!</p>
<p>Our most impressive new feature is our map rendering. As you may <a href="http://mappinglife.wordpress.com/2011/04/21/40/">recall from earlier blog posts</a>, mapping is handled by our backend, set up on a Linode VPS server. <a href="http://www.aaron.io">Aaron</a> recently restructured our backend to use <a href="https://github.com/Vizzuality/Windshaft">Windshaft</a>, a high-speed map tiler generously released under an open-source license by <a href="http://vizzuality.com/">Vizzuality</a>, based on cutting-edge open-source tools such as <a href="http://mapnik.org/">Mapnik</a>, <a href="http://postgis.refractions.net/">PostGIS</a>, <a href="http://redis.io/">Redis</a> and <a href="http://nodejs.org/">node.js</a>. Between Windshaft and Aaron&#8217;s work on <a href="http://en.wikipedia.org/wiki/Canvas_element">HTML5 canvas</a>, we&#8217;ve achieved some unbelievable results. At the moment, we&#8217;re rendering tiles <em>without caching</em> in the hundreds of milliseconds time-frame. This includes support for species occurrence data, protected areas, expert species range maps, ecoregions or any other geo-referenced vector or point data you care to throw at it. Aaron is currently working on merging these innovations into our main data preparation work flow.</p>
<p>Meanwhile, <a href="https://plus.google.com/108513729351420126811/about">John</a> has been busy adding raster environmental layers to Map of Life, probably with <a href="http://trac.osgeo.org/postgis/wiki/WKTRaster">PostGIS&#8217; upcoming raster support</a>. Supporting raster layers is a first step towards linking all the species geographic distribution data already in Map of Life to local environmental variables such as climate, land cover and vegetation. This will also help us facilitate analyses on our platform, meeting <a href="http://www.mappinglife.org/tech">one of the key goals</a> for Map of Life. We’ll definitely be talking more about the planned analyses in future blog posts, so stay tuned!</p>
<p>Map of Life depends on data producers, compilers, and aggregators to add their data to the project, so it’s vitally important to ensure that they can do so quickly and easily. With a lot of help from the entire team, I am taking a first stab at this process. Before the <a href="http://www.tdwg.org/conference2011/program/">TDWG 2011 Annual Conference</a> (only a week away!), we hope to have all the scripts in place to provide an efficient pipeline from shapefiles and accompanying Map of Life-specific metadata to pretty, searchable layers on your favourite browser. So far, we&#8217;ve managed to compress most of the functionality we need <a href="https://github.com/andrewxhill/MOL/wiki/Layer-Workflow-Prototype">into one easy-to-use program</a>. This is a great area of Map of Life on which to have started, since it lets me connect with all the components of the software &#8212; from data preparation to map visualization &#8212; while also giving me a chance to work closely with the entire team.</p>
<p>There are many other exciting things we’d like to talk to you about, from analyses we have planned, to our TDWG demonstration next week, to the first release of a demo you can play around with yourself. For now, though, this brief glance will regrettably have to suffice. Please ask us any questions in the comments &#8212; it’s a huge help for us in gauging interest in features for the software, as well as for topics we need to cover in future blog posts. As always, Map of Life’s <a href="https://github.com/mapoflife/MOL">source code</a> and <a href="https://github.com/andrewxhill/MOL/issues">open issues</a> are available online, so please do contact us or contribute there if you have any specific concerns.</p>
]]></content:encoded>
			<wfw:commentRss>http://mappinglife.wordpress.com/2011/10/11/whats-new-october-edition/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/ef10e56567f5d6e20bc3f7f4ab4e3254?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Gaurav</media:title>
		</media:content>
	</item>
		<item>
		<title>Git for development data</title>
		<link>http://mappinglife.wordpress.com/2011/06/09/git-for-development-data/</link>
		<comments>http://mappinglife.wordpress.com/2011/06/09/git-for-development-data/#comments</comments>
		<pubDate>Fri, 10 Jun 2011 05:35:51 +0000</pubDate>
		<dc:creator>andrewxhill</dc:creator>
				<category><![CDATA[architecture]]></category>
		<category><![CDATA[overview]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[backend]]></category>
		<category><![CDATA[chef]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[git]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://mappinglife.wordpress.com/?p=65</guid>
		<description><![CDATA[In an earlier post I had promised more discussion of the backend-frontend architecture we are using at MOL. One of the reasons that has been slower to develop than I had hoped is that our architecture is evolving so rapidly, I haven&#8217;t been certain when I can say something will be around long enough to [...]]]></description>
			<content:encoded><![CDATA[<p>
In an earlier post I had <a href="http://mappinglife.wordpress.com/2011/04/21/40/" title="Cloud where appropriate">promised more discussion</a> of the backend-frontend architecture we are using at MOL. One reason that discussion has been slower to arrive than I had hoped is that our architecture is evolving so rapidly that I haven&#8217;t been certain when something will be around long enough to call it core MOL. Today, though, I&#8217;d like to show how <a href="http://aaron.io" title="Aaron Steele">Aaron Steele</a> and I are using <a href="http://git-scm.com/" title="git">Git</a> to manage rapidly evolving data.
</p>
<p>
For this project, we are trying to quickly pull together many different data types, structure them, and combine them in ways that will serve APIs, user interfaces, and analytics. No matter how much planning we do, we just aren&#8217;t going to get the structure and combinations right the first or even the Nth time. On top of that, we need to be able to move data and data structures between our two development laptops, development server, and production server, testing new features and restructuring that data quickly without breaking any component. Enter git. Below, I&#8217;ll walk through the steps of setting up a privately hosted git repository and managing changes to the data effectively. At the end, I will also cover how we are using <a href="http://www.opscode.com/chef/" title="Chef">Chef</a> to keep application servers up-to-date with code and data simultaneously.
</p>
<p>
The first thing you are going to need is a remote repository: a server with a static IP that you have SSH access to.
</p>
<h3>Remote repository</h3>
<p>First, ssh into the server. Next, you will want to set up a user for git.<br />
<pre class="brush: bash;">
adduser git
passwd git
</pre><br />
Next, inside your /home/git directory, create a new repo<br />
<pre class="brush: bash;">
mkdir bigdata
cd bigdata
git init --bare
</pre><br />
For our uses, I wanted to be able to clone this repo on any machine without ssh access. For that, you need to expose it via HTTP. We use <a href="http://wiki.nginx.org/" title="nginx">Nginx</a> for all of our servers, so to enable this we just need to add the git repo directory we just created as a site in the Nginx sites-enabled config. The important part is:<br />
<pre class="brush: bash;">
location /bigdata {
    alias   /home/git/bigdata;
    allow  all;
}
</pre><br />
At this point, be sure to restart Nginx. You can test that it is working by loading the repository&#8217;s description file in your browser:</p>
<blockquote>
<p>http://yoursite.org/bigdata/description</p>
</blockquote>
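<p>One step is easy to miss here, and it is my assumption about this setup rather than something stated above. When Nginx serves the bare repository as plain static files, git clients fall back to the so-called dumb HTTP transport, which first reads info/refs and objects/info/packs. Those files are written by git update-server-info, and the sample post-update hook that git ships re-runs it after every push. A minimal sketch, using a scratch directory in place of /home/git/bigdata:</p>

```shell
set -e
# Scratch directory standing in for /home/git/bigdata (hypothetical path).
repo="$(mktemp -d)/bigdata"
mkdir -p "$repo"
cd "$repo"
git init --bare --quiet

# Write the index files that dumb-HTTP clients download first.
git update-server-info

# Keep them fresh after each push by enabling the sample hook, if present.
if [ -f hooks/post-update.sample ]; then
  cp hooks/post-update.sample hooks/post-update
  chmod +x hooks/post-update
fi

# List what a browser or git client would now be able to fetch.
ls "$repo"
```

<p>If a clone over HTTP reports that the repository was not found even though the description file loads, a missing or stale info/refs is a likely culprit.</p>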
<h3>Local machine</h3>
<p>Now you&#8217;ll want to set up a repository on your local machine for the data. Our backend architecture expects the data in a particular place, so on my local machine I put the repository in that same spot; when I run my dev server, the data is just there.<br />
<pre class="brush: bash;">
mkdir bigdata
cd bigdata
git init
</pre><br />
Next, following standard git procedure, make a first commit:<br />
<pre class="brush: bash;">
touch README
git add README
git commit -m 'First Commit'
</pre><br />
Then we tell the local git repo that there is a remote git repo waiting for its contents:<br />
<pre class="brush: bash;">
git remote add origin git@yoursite.org:bigdata
</pre><br />
This is where you will need SSH access to your remote server. If you are not running SSH on port 22, it might help to add the actual port to your ssh config so that it is used by default:<br />
<pre class="brush: bash;">
nano ~/.ssh/config
#now add the lines
Host yoursite.org
   Port {your-port-#-here}
</pre><br />
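If you would rather not edit your ssh config, git also accepts an ssh:// URL that carries the port itself. A small sketch in a throwaway repository, where port 2222 and the absolute path /home/git/bigdata are assumptions for illustration:<br />

```shell
set -e
# Throwaway repository so the sketch does not touch your real checkout.
cd "$(mktemp -d)"
git init --quiet

# The remote exactly as added above (scp-style syntax, default port 22).
git remote add origin git@yoursite.org:bigdata

# Same remote as ssh://user@host:port/absolute/path; 2222 is a made-up port.
git remote set-url origin ssh://git@yoursite.org:2222/home/git/bigdata

# Show the URL that git will use for origin from now on.
git config --get remote.origin.url
```

Note that the ssh:// form takes an absolute path on the server, while the scp-style form resolves the path relative to the git user&#8217;s home directory.<br />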
Finally, we can push our changes to the remote server.<br />
<pre class="brush: bash;">
git push origin master
</pre></p>
<h3>Function</h3>
<p>
Now, we might want to add a bunch of data to our repo. In MOL, we are playing with ~1.5-2 gigs of data right now. That number will grow, but probably not much faster than the rate at which we finalize structures and databases. For now, I have a directory structure for all the types of data we are playing with. I just drop that into my bigdata folder, commit the changes, and git push. That data is now on our shared remote repo. If I push changes to the data structure and change features in the application, Aaron need only pull each of the repos (our code has its own <a href="http://github.com/andrewxhill/MOL" title="MOL Code">repo</a>) and everything should be bug free.
</p>
<p>
Next, we will want to make this available to both development servers and production servers. This is where the magic of Git+Chef comes in. By using <a href="http://learn.github.com/p/tagging.html" title="Git Tags">Git Tags</a>, I can basically freeze the data as it is right now, while still committing changes to the branch. So, while I may want to push changes in code to a production server, I probably don&#8217;t want it pulling in or restructuring data unless I tell it to explicitly.
</p>
<p>So once I have my data in my local git repo and my application works, I can commit and add a tag,<br />
<pre class="brush: bash;">
git tag -a v0.1
</pre></p>
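<p>Note that a plain git push does not send tags, so the tag has to be pushed by name before any server can check it out. Here is a self-contained sketch that uses throwaway repositories under a temporary directory instead of yoursite.org; the demo identity and the tag message are placeholders:</p>

```shell
set -e
work="$(mktemp -d)"
# Placeholder identity so the demo commit and tag work on a fresh machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# A bare repository standing in for the shared remote.
git init --bare --quiet "$work/remote.git"

# A local data repository with one commit on master.
git init --quiet "$work/bigdata"
cd "$work/bigdata"
git symbolic-ref HEAD refs/heads/master
touch README
git add README
git commit --quiet -m 'First Commit'
git remote add origin "$work/remote.git"
git push --quiet origin master

# Annotated tag with an inline message, so no editor opens.
git tag -a v0.1 -m 'Data snapshot v0.1'

# Tags are not pushed by default; name the tag explicitly.
git push --quiet origin v0.1

# Confirm that the remote now knows about the tag.
git ls-remote --tags origin
```

<p>Passing -m to git tag keeps the step scriptable; without it, git opens an editor to ask for the tag message.</p>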
<h3>Using Chef (I&#8217;ll come back to Git in a sec)</h3>
<p>A few days ago, we decided it was time to move the MOL backend off of a local development server and into a more easily scalable infrastructure. On top of the fact that I had to migrate all the services, application code, and data to a new server, I realized that while we were developing the backend to be modular so that different components could eventually scale independently (database versus long running tasks versus application interfaces), we didn&#8217;t need a ton of computing yet. Chef is beautiful in that it allowed me to look at the currently running development server, encode each part into actual instructions (<a href="http://wiki.opscode.com/display/chef/Roles" title="roles">roles</a>, <a href="http://wiki.opscode.com/display/chef/Recipes" title="Chef recipe">recipes</a>, and <a href="http://wiki.opscode.com/display/chef/Cookbooks" title="cookbook">cookbooks</a>), and run those instructions to build a new server. While we put all of the backend on a remote node for now, it will be fairly straightforward to break up our Chef recipe down the road and move parts of the architecture to independent nodes in the backend.</p>
<p>
So, I have encoded the backend in Chef, and I have a virtual server running on <a href="http://www.linode.com/index.cfm" title="Linode">Linode</a>. To deploy my Chef instructions (we use http://opscode.com/ to host our Chef) all I do is run,<br />
<pre class="brush: bash;">
knife bootstrap root@{node-ip-address}  -r 'role[{name-of-node-role}]'
</pre>
</p>
<p>Voila! I have a running backend server on Linode. I won&#8217;t go into much detail on the magic that is Chef here because I want to get back to Git for data. It was at this point that I wanted a method to populate data using Chef. Sure, I could run Dropbox on the server, or have it pull statically hosted data from elsewhere, and I&#8217;m sure there are more specialized tools. Git is a warm cozy blanket though. It offers version management, branching, tagging. We already have it, know the methods, dream about it. So, now that we have the remote git repo we set up above, I created a &#8216;resource&#8217; in our mol-backend cookbook. When I bootstrap a new or existing backend node, this resource checks out the data from the git repo. Here is what it looks like in Chef:<br />
<pre class="brush: ruby;">
# download mol source and checkout specific version/branch
execute &quot;fetch data from git data repo&quot; do
  command &quot;git clone  #{node[:mol][:remote_data_repo]} #{node[:mol][:base_data_dir]}&quot;
  node.set['mol']['node_existed'] = false
  not_if { FileTest.exists?(node[:mol][:base_data_dir]) }
end
execute &quot;switch to specified branch/tag of data repo&quot; do
  command &quot;cd #{node[:mol][:base_data_dir]} &amp;&amp; git checkout #{node[:mol][:remote_data_checkout_point]}&quot;
end
execute &quot;pull updates of data repo&quot; do
  command &quot;cd #{node[:mol][:base_data_dir]} &amp;&amp; git pull #{node[:mol][:remote_data_repo]} #{node[:mol][:remote_data_checkout_point]}&quot;
end
</pre><br />
A few things just happened. First, if this is the first time the node is being built, it performs a clone of the remote repository we set up above over HTTP (no ssh creds needed), with the repository URL passed to the code via a variable,</p>
<blockquote><p>
#{node[:mol][:remote_data_repo]}
</p></blockquote>
<p>Next, with the repo in place, the second block performs a checkout of the data. Here it gets beautiful again: from inside the repo that was just cloned, we tell it to check out a specific branch or tag, passed via the variable,</p>
<blockquote><p>
#{node[:mol][:remote_data_checkout_point]}
</p></blockquote>
<p>What we have just said is this: if we are executing this Chef build on a production server, we can have it check out a specific snapshot of our data, say the v0.1 we created above. At the same time, Aaron and I may have committed hundreds of changes to that data that we are still working on in development. Our development node can check out those changes from, say, the master branch whenever we execute the Chef instructions on it. The next really great part is that we have executed the Chef build command on our backend many times already.</p>
<p>
In a normal case, we might have a problem if we wanted it to pull in a big data set every time it rebuilds. But because the clone only runs if the git repo does not already exist (checked by the &#8216;not_if&#8217; guard above) and later runs just use &#8216;git pull&#8217;, the backend node only spends time pulling in changes to the declared version or branch. Since all cloud services are going to charge you for bandwidth in and out, it is important to minimize wastage. Beyond that more obvious benefit, if we decide we need to move the data around, or rename files, we can track those changes using git. When we execute our Chef code again on the servers, instead of rebuilding those data resources from scratch, it will just replicate moves and renames.
</p>
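<p>The same clone-once, update-afterwards idea can be tried outside Chef with plain shell. This is only a sketch of the pattern, not our cookbook: the sync_data function, the node-data directory, and the demo identity are all made up for illustration, and it handles a pinned tag only (a branch would also need a merge or pull after the fetch).</p>

```shell
set -e
work="$(mktemp -d)"
# Placeholder identity so the demo commits work on a fresh machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# Stand-in for the remote data repo: a tagged snapshot, then newer work.
git init --quiet "$work/remote"
cd "$work/remote"
git symbolic-ref HEAD refs/heads/master
touch README
git add README
git commit --quiet -m 'First Commit'
git tag -a v0.1 -m 'Data snapshot v0.1'
touch later.txt
git add later.txt
git commit --quiet -m 'Work in progress'

# sync_data REPO DIR TAG: clone only if DIR is missing, otherwise fetch,
# then check out the requested tag.
sync_data() {
  if [ -d "$2/.git" ]; then
    git -C "$2" fetch --quiet --tags origin
  else
    git clone --quiet "$1" "$2"
  fi
  git -C "$2" checkout --quiet "$3"
}

sync_data "$work/remote" "$work/node-data" v0.1   # first run: full clone
sync_data "$work/remote" "$work/node-data" v0.1   # later runs: fetch only
ls "$work/node-data"
```

<p>Because the node is pinned to v0.1, the work-in-progress file committed after the tag never shows up in node-data, which is the behaviour we want on a production server.</p>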
<h3>Doubts of scalability</h3>
<p>At this point, some more hard-core Git users will likely be shaking their heads at the use of Git for large directories with lots of binary data. In our case though, and I think for many projects in this domain, it works. We don&#8217;t want to waste time hard coding data structures before we have tested the data use and functional requirements. On top of that, we don&#8217;t have time to be developing version control systems for data structures and sources that are ultimately going to be rolled into our databases. But we need methods to quickly change, share, and track the structure of these datasets. For us, getting this solution in place frees up a lot of our time for developing more useful components and eventually getting away from I/O heavy first passes.</p>
<h3>Chef. Wow</h3>
<p>I first picked up on this project through Anthony Goddard&#8217;s <a href="http://crankstations.com/some-nice-new-features-in-chef-0100" title="Anthony Goddard">blog</a>. In just the past couple days I feel like Chef has changed my view of deployment and architecture. I would like to spend time in another post talking about how much I love Chef. This project is so wonderful. Anyone who does development on remote servers or instances should really check this out, especially if you have a toolkit of your favorite technology layers that you find yourself deploying all the time. Partially for that reason, I think the technology offers a lot to our community, particularly in the use of Cookbooks. Cookbooks are nuggets of coded functionality for your system architecture. They are reusable, easily linked (via git!) to a maintainer&#8217;s repository, and powerful. Stealing from a conversation earlier tonight with Aaron, we can relatively easily assemble Cookbooks that would facilitate the sharing and publishing of, say, Darwin Core records, taxonomic databases, or annotation services that anyone could then modify and use in their systems. Love it.</p>
]]></content:encoded>
			<wfw:commentRss>http://mappinglife.wordpress.com/2011/06/09/git-for-development-data/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/35f4d000a88cdbcf6392dfb206ebd5e2?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andrewxhill</media:title>
		</media:content>
	</item>
		<item>
		<title>Cloud where appropriate</title>
		<link>http://mappinglife.wordpress.com/2011/04/21/40/</link>
		<comments>http://mappinglife.wordpress.com/2011/04/21/40/#comments</comments>
		<pubDate>Fri, 22 Apr 2011 02:22:37 +0000</pubDate>
		<dc:creator>andrewxhill</dc:creator>
				<category><![CDATA[architecture]]></category>
		<category><![CDATA[overview]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://mappinglife.wordpress.com/?p=40</guid>
		<description><![CDATA[Here at the Map of Life, we’ve been cranking away at development for some months now. We are finalizing some of our beta release user interfaces (UIs) and application programming interfaces (APIs), but in the meantime we wanted to start opening up some of our development ideas for a wider discussion. During discussions we have [...]]]></description>
			<content:encoded><![CDATA[<div><p>Here at the Map of Life, we’ve been cranking away at development for some months now. We are finalizing some of our beta release user interfaces (UIs) and application programming interfaces (APIs), but in the meantime we wanted to start opening up some of our development ideas for a wider discussion. Throughout our internal discussions we have reiterated the importance of explicitly opening our methods and solutions to the community. We do this not only because we think our methods and solutions are interesting, but also because we think that only by starting these conversations can we receive the feedback and knowledge that could be key to making MOL a success.</p>
<p>Over the next months and years, this blog will be a primary place where MOL begins these discussions. One technical solution that we have been excited to write about is our approach to scalable architectures across MOL. The project has some diverse scaling challenges. At the most basic level, MOL will provide APIs and UIs that allow users and other projects to rapidly access high quality species distribution maps. But this is a major simplification of the scope, ambition and challenges of MOL. MOL is not just providing distribution maps; we are also provisioning many diverse analyses across data types and scales, methods to deploy long-running analysis jobs, toolkits for expert users to harness those analyses to improve distribution maps and, finally, storage and versioning for many parts of the system.</p>
<p>From the beginning, the project has been guided by milestones that evolve over time. The goal is to finish one milestone before spending too much time on later milestones, while still allowing some fuzziness along the borders. So, with a small and overworked team of developers, we adopted an <a title="Agile Development" href="http://en.wikipedia.org/wiki/Software_development_process#Agile_development">agile development</a> strategy that supports fast iterations, rapid prototyping, and quick refactoring as the project moves forward. Part of that plan was a decision to minimize the time spent developing complex technology solutions that would likely need to be painfully refactored after reaching later milestones.</p>
<p>Early milestones deal with making data accessible and discoverable in standardized formats, with a focus on high quality metadata combined with very fast data access for visualization and search. This situation led us to a bold decision: many of our databasing and back-end scaling solutions could be developed later, only after preliminary tests of data loads were available (and a more robust map of data relationships had been developed). On the other hand, our front end would need to be streamlined, fast, and able to handle a lot of caching from the outset so that we could start rolling out releases without knowing the initial request volume. To handle this situation while also maximizing the benefits of cloud computing during these early stages, we decoupled the front end and back end of the system architecture.</p>
<p><a href="http://mappinglife.files.wordpress.com/2011/04/mol-front-back.jpg"><img class="aligncenter size-full wp-image-43" title="MOL Simple Architecture" src="http://mappinglife.files.wordpress.com/2011/04/mol-front-back.jpg?w=700" alt=""   /></a></p>
<p>For now, we have built the back end using existing local hardware. Since the back end is never directly accessed by users, we can more easily predict its load based on the number of analyses for which it is responsible. When that load becomes unmanageable on its current hardware, we will begin developing new storage strategies and redeploy the back end on a more scalable system. At their core, though, the applications we have deployed on the back end will be directly reusable in conjunction with novel storage solutions and new hardware environments.</p>
<p>The same early milestones led us to feel that we would need to solve scalability challenges in the front end of MOL more immediately. The front end needed to expose data &#8211; primarily data about what data sets and data types we have available &#8211; and provide access to processed forms of that data for our web-based UIs (e.g. map tiles). It also needed to handle widely varying loads while managing the number of requests it sent to our back end. Faced with unpredictable and diverse client requests, we could not just deploy an app locally and ensure that it would run quickly night and day. For this reason we have developed a front-end application using the <a title="App Engine" href="http://code.google.com/appengine/docs/python/overview.html">Python SDK on Google App Engine</a> (GAE).</p>
<p>GAE provides apps with generous <a href="http://code.google.com/appengine/docs/quotas.html">free quotas</a> for data storage and CPU cycles, so during both development and slow days at MOL we can keep costs under a dollar. We manage this by never storing large data on the front end, instead taking advantage of the GAE Datastore to provide fast access to the specific parts of the data that we want to query, such as metadata, taxonomy, and data relationships. We also take advantage of a large free quota for the <a href="http://code.google.com/appengine/docs/python/memcache/">Memcache API</a> to cache small pieces of data, such as map tiles, that the front end pulls from the back end using REST APIs. Now, when MOL starts rolling out features, we can be confident that our app will scale with increased loads!</p>
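<p>The tile-caching flow described above can be sketched roughly as follows. This is only an illustration of the cache-aside pattern, not MOL's production code. The <code>TileCache</code> class is a hypothetical in-memory stand-in for a memcache-style store (so the example runs anywhere), <code>fetch_from_backend</code> is an assumed callable representing the REST request to the back end, and the cache key format is made up for the example.</p>
<pre>
```python
class TileCache(object):
    """Hypothetical in-memory stand-in for a memcache-style key/value store."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        """Return the cached value, or None on a cache miss."""
        return self._store.get(key)

    def set(self, key, value):
        """Remember a value under the given key."""
        self._store[key] = value


def get_tile(cache, fetch_from_backend, species, zoom, x, y):
    """Cache-aside lookup: try the cache first, fall back to the back end."""
    # Made-up key format; any scheme that uniquely identifies a tile works.
    key = "tile:%s:%d:%d:%d" % (species, zoom, x, y)
    tile = cache.get(key)
    if tile is None:
        # Cache miss: send one request to the back end and remember the result.
        tile = fetch_from_backend(species, zoom, x, y)
        cache.set(key, tile)
    return tile
```
</pre>
<p>With this shape, repeated requests for the same tile reach the back end only once and later requests are answered from the cache. A real memcache service can evict entries at any time, so the fallback path to the back end always has to stay in place.</p>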
<p>The MOL architecture has one foot in the cloud while keeping the other firmly in hardware sitting at the University of Colorado (for now). This hybrid system has reduced our early development costs while still ensuring that we can easily scale up as we announce early releases. By limiting the responsibilities of the front end to storing and searching metadata and caching data requests, we have been able to develop our system cheaply while already working on cloud solutions that will scale far into the future. I’ll devote a later blog post to a concrete example, likely focusing on how we serve species-ecoregion occurrence polygon data for the mapping user interface.</p></div>
]]></content:encoded>
			<wfw:commentRss>http://mappinglife.wordpress.com/2011/04/21/40/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/35f4d000a88cdbcf6392dfb206ebd5e2?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andrewxhill</media:title>
		</media:content>

		<media:content url="http://mappinglife.files.wordpress.com/2011/04/mol-front-back.jpg" medium="image">
			<media:title type="html">MOL Simple Architecture</media:title>
		</media:content>
	</item>
		<item>
		<title>Map of Life Dream Team (and, hey, we are ready to blog!)</title>
		<link>http://mappinglife.wordpress.com/2011/04/06/introduction-to-mol/</link>
		<comments>http://mappinglife.wordpress.com/2011/04/06/introduction-to-mol/#comments</comments>
		<pubDate>Wed, 06 Apr 2011 18:38:29 +0000</pubDate>
		<dc:creator>Rob</dc:creator>
				<category><![CDATA[overview]]></category>
		<category><![CDATA[introduction]]></category>
		<category><![CDATA[team]]></category>

		<guid isPermaLink="false">http://mappinglife.wordpress.com/?p=25</guid>
		<description><![CDATA[Map of Life has been chugging along for about 6 months now in its current configuration and now seems like as good a time as any to step back and consider how far we have come and what might be next. The team working on Map of Life is such an interesting one. Geographically we [...]]]></description>
			<content:encoded><![CDATA[<div><p><a title="MOL" href="http://mappinglife.org">Map of Life</a> has been chugging along for about 6 months now in its current configuration, and now seems like as good a time as any to step back and consider how far we have come and what might be next. The team working on Map of Life is such an interesting one. Geographically we are spread out across the United States, from the East Coast (Yale University) to the Midwest (University of Kansas), Mountain West (University of Colorado) and Pacific Coast (University of California, Berkeley). We are also diverse by country of origin (Australia, Germany, U.S.A.), academic training (computer science, ecology, evolution, informatics) and skill set (programming, systems engineering, informatics, macroecology, systematics, etc). A gratifying part of the first six months, for me, is that these differences and this diversity have translated into a strong working relationship and collaborative spirit, where the strengths of the group, not the weaknesses, have multiplied. I think this likely reflects a strong impetus to meet regularly, as often as three times a week, by phone or Skype, to synchronize efforts. Plus, good peeps and &#8211; turns out &#8211; we like working with each other!</p>
<p>So what have we accomplished with all this good will and great vibes? A lot, as it turns out! Much of that is &#8220;behind the scenes&#8221;. Andrew Hill and Aaron Steele have been bouncing great ideas back and forth about how to create an information architecture that is robust, scalable and efficient. We&#8217;ve put together a broad technological overview here: <a title="MOL Tech" href="http://www.mappinglife.org/tech">http://www.mappinglife.org/tech</a>. First, we deployed a cloud-based copy of the <a title="COL" href="http://www.catalogueoflife.org/">Catalogue of Life</a>, accumulated a large set of range maps for amphibians and mammals, checked the taxonomy of those range maps against the Catalogue of Life database, and provided an initial mechanism to search those maps. Next, we developed the means to display range maps via <a title="Google Maps API" href="http://code.google.com/apis/maps/documentation/javascript/">Google Maps</a>, using a map tiling tool named <a title="Mapnik" href="http://mapnik.org/">Mapnik</a>, along with some initial user interface frameworks and designs. We are currently polishing off access to and display of species occurrence data points. All of this is great, but we are still treading in known waters. The excellent <a title="AmphibiaWeb" href="http://amphibiaweb.org/">AmphibiaWeb</a> project, for example, has also developed the means for displaying range maps and occurrence data points.</p>
<p>Soon we will be pulling together some new types of distribution data, such as &#8220;occurrence polygons&#8221; (places where species have been recorded via species lists) and habitat preferences such as &#8220;wet broadleaf forests&#8221; or &#8220;shortgrass prairie&#8221;. These bring new challenges for storage, query and visualization. An even greater challenge will be providing all these sources of knowledge through a single search and user interface. Exciting times for Map of Life! We are looking forward to having some demonstrations soon, so you can try out some Map of Life features. Stay tuned (and thanks for reading).</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://mappinglife.wordpress.com/2011/04/06/introduction-to-mol/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b67f826745311eced80f5f0da70b89b6?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">robgur</media:title>
		</media:content>
	</item>
	</channel>
</rss>
