An Open Database of Addresses

— March 27, 2015

One of the coolest open source / open data projects happening right now is OpenAddresses, a growing group effort assembling around the problem of geocoding, the process of turning human-friendly addresses into spatial coordinates (and its reverse). I’ve been following the project for close to a year now, but it seems to have really gained momentum in the last 6 months.

The project was started last year and is happening over on GitHub. It now has over 60 contributors, with over 100 million aggregated address points from 20 countries, and growing by the day. There’s also a live-updating data repository where you can download the entire OpenAddresses dataset online—it’s currently at about 1.1 gigabytes of address points.

Pinellas addresses

Here’s how it works:

Contributors identify data out in the wild online and contribute small index files: pointers to where the data is hosted, plus details on how to merge it into the project’s common data format. There’s no need to download or commit any of the data itself, only to find where the CSV file or web service lives and how to get to it. The technique is neat in its simplicity; more on the mechanics below.
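To make this concrete, here’s a rough sketch of what one of those index entries contains. The field names and URL are my own illustration, not the project’s exact schema: a pointer to a county’s published CSV, plus a mapping that tells the merge tooling which source columns hold which address parts.

```python
# A hypothetical index entry for a single source. A contributor commits a
# small file like this; nobody ever commits the address data itself.
# (Field names and the URL are made up for illustration.)
pinellas_source = {
    "coverage": {"country": "us", "state": "fl", "county": "pinellas"},
    "data": "http://example.com/pinellas/addresses.csv",  # where the CSV lives
    "type": "http",                                       # how to fetch it
    "conform": {
        # Map the project's common fields onto the source's column names.
        "number": "HOUSE_NUM",
        "street": "STREET_NAME",
        "city": "CITY",
        "postcode": "ZIP",
    },
}
```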

It sounds weird to think something as basic as address data could be so fascinating and exciting. Most people in the geo community understand the potential impact of projects like this on our industry, but let me review for the uninitiated why this is cool.

Why care about boring addresses?

Address data is what makes almost any map useful: it connects our human-friendly identifiers for places into real locations on the ground. Almost everything that consumers do with maps these days has to do with places of interest: Foursquare checkins, Instagramming, turn-by-turn directions. Without connecting the places as we know them to actual map coordinates a computer can understand, we don’t have many useful mapping applications.

There are existing APIs and resources out there for building mapping applications that require addressing and geocoding, but none of them are open to build on. They’re proprietary systems with either restrictive licensing or steep costs. Having to pay for a high quality geocoding service like Google’s isn’t crazy or surprising — building universally searchable, uniform address databases is insanely expensive and hard. Good geocoding is one of the perennial pains in the ass of the geospatial problem set, so it’s understandable that when someone solves it, they’d want to charge for it.

There is the OpenStreetMap project, the free and open map database for the globe, which has tons of potential as a resource for geocoding. By a quick estimate, the OSM database contains something like 50 million specific address points for the globe. But its license is not compatible with most commercial requirements for republication of data, so developers looking for an open resource have had to look elsewhere. There’s still no good worldwide, open resource for address geocoding that app developers and mappers can use with no strings attached. (OSM’s license and its “friendliness” for commercial use has a long history of debate and argument in the community. It’s complicated. I’m not a lawyer.)

Address data is harder than it looks

Simple data, big problem

The data that composes a postal address is pretty straightforward: house number, street name, city, admin boundary, postal code. That handful of properties gets you to a fixed coordinate on the Earth in most places with an organized addressing scheme. Pretty simple, right?
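Simple enough, in fact, that a normalized record fits in a couple of lines of code. A sketch (the field names and values are mine, loosely mirroring how a project like this might shape its output):

```python
from collections import namedtuple

# One plausible shape for a normalized address record (illustrative only).
Address = namedtuple(
    "Address", ["number", "street", "city", "region", "postcode", "lon", "lat"]
)

# Example record; the coordinates are approximate and for illustration.
addr = Address("501", "1st Ave N", "St. Petersburg", "FL", "33701",
               -82.6403, 27.7709)
```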

But addressing systems are non-standard, vary widely with geography, and are actually non-existent in many countries. The data literally carpets the developed world and comes in dozens of shapes and formats, so bringing it all together into a consistent, unified whole to create a platform for applications is a huge deal.

In the US, for example, one of the biggest challenges is that there isn’t a single standardized structure for the data, and even worse, no single “owner” of address data. Sometimes it’s maintained at the county level, and sometimes the city level. One county’s GIS division will manage it; in another it’s the E911 system manager. Then you have the challenge of finding the actual data files. It’s becoming commonplace for municipalities to publish this stuff online, but it’s far from universal. For some (especially rural) counties, you’d better be ready to take a hard drive down to the property appraiser’s office, or pay them to burn you a CD.

To me this is where the OpenAddresses model gets interesting. The project brings together a distributed network of contributors and focuses their resources on a common goal: building a massive open dataset. Creating a central place around which contributors can mobilize and gradually accrete data into a larger and larger whole is the unique angle to this project. Anyone with enough time and energy can go chase down hundreds of datasets, but it’s much easier when a group with a defined mission can divide and conquer, intersecting the open source contribution model with a data production line. It’s not just a platform for aggregating this data into a single database; it’s also a petitioning system for tracking down the data, and for advocating that it be made open where it isn’t publicly available.

Current US status

Building the glue

The OpenStreetMap method of contribution is one where contributors are manually finding, converting, and adding data to a separate database. For addresses, this strategy makes ingesting the individual datasets and the thousands of updates per year a huge pain. OA takes a different approach. Instead of manually finding and merging all the datasets together, the main OA repository is a huge pile of index files that function as the glue between all the disparate sources out on the web and a centralized core. It’s an open source ETL system for all flavors of address datasets. People go out and find all the building blocks, and OA is the place where we write the instructions to put them all together.
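As a sketch of what that glue amounts to (my own simplification, not the project’s actual tooling): fetch the CSV that an index entry points at, rename its columns per the conform mapping, and emit rows in the shared format. Rebuilding the whole dataset is then just a loop over every index file in the repository.

```python
import csv
import io
from urllib.request import urlopen

def process_source(source):
    """Fetch one source's CSV and yield rows in the common format.

    `source` is an index entry like the sketch earlier: a "data" URL plus
    a "conform" mapping of common field names to source column names.
    A deliberate simplification of the real tooling, for illustration.
    """
    raw = urlopen(source["data"]).read().decode("utf-8")
    for row in csv.DictReader(io.StringIO(raw)):
        # Rename the source's columns onto the shared schema.
        yield {field: row.get(column, "")
               for field, column in source["conform"].items()}
```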

The project isn’t only the data. It’s tools for working with the data, resources for teaching local advocacy for acquiring the data, and a system of ETL “glue” to bring the sources together to build a platform for other tools and creative mapping projects. Go over to the project and check it out. If you know where some address data is for your neighborhood, dive in and contribute to the effort.

A Quick Guide for New Developers

— March 18, 2015

This entire post comes with a caveat: I am not a software engineer. I do build a software product, and work with a bunch of people way smarter than me, though. I’m experienced enough to have an opinion on the topic.

I talk to lots of young people looking to get into the software world: sometimes wanting to build mobile apps or create simple tools, sometimes looking to create entire products. There are a lot of possible places to start. The world is full of blog posts, podcasts, books, and videos that purport to “teach you to code”. Don’t get me wrong, it’s an awesome world we live in where this stuff is accessible, but I think people get their priorities twisted at that early, impressionable stage by thinking they can make a successful iPhone app from scratch in a few months. Even if that’s possible, is it really a life goal? Or do you want to actually become an engineer?

Young people interested in coding could gain a lot by starting with smaller steps. Instead of diving immediately into learning node.js, or beginning with “Build Your Own Rails App in 15 Minutes” blog posts, focus your energy on some foundations that will be 100% useful in building your skills as an engineer.

In no particular order:

The terminal

Learn how to use the Linux command line

It almost doesn’t even matter what exactly you do with Linux to get started on this. Install some variant of the OS on a computer or virtual machine, and start trying to do stuff. Install packages, set up PHP, get Postgres running. Most importantly: learn the basic command line tools that you’ll use for the rest of your working life—things like grep, sed, cat, ack, curl, find. Think of these as tools of the trade; once you know how to work them, you’ll use them every day. Compare your craft to cooking. It’s possible to create good food without a razor sharp chef’s knife, a large rigid cutting board, and fresh ingredients, but it’s a lot easier when you have them.

Work on tools

Work on tools instead of systems

Starting out by building entire products is a bad idea. The most readily available ideas are typically the ones that require a lot of moving parts; they’re also the ones that sound fun. Starting to assemble some knowledge by building your own blog engine or social sharing site or photo database system will certainly teach you something, but it puts the cart before the horse. A few hours into building your photo sharing site (with the objective of making something to share photos), you’ll be working on a login system and a way to reset passwords instead of addressing the problem you set out to solve in the first place. The easier place to start is to identify small pain points in your technology life, and build utilities to fill those voids. A script for uploading files to Google Drive. Wrappers to simplify other utilities. A command line tool to strip whitespace from files. You’ll be biting off something you can actually build as a novice, and you might be able to ship and open source something useful to others (one of the bar-none best resume builders around). Scratching small itches is your friend when you’re learning; see the sketch below for just how small these itches can be.
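Take that whitespace-stripping tool: the whole thing fits in a screenful of Python. A minimal sketch (the filename and behavior are illustrative, not a prescription):

```python
#!/usr/bin/env python
"""strip_whitespace.py: remove trailing whitespace from files.

Usage: python strip_whitespace.py file1 [file2 ...]
"""
import sys

def strip_file(path):
    # Read the file, drop trailing spaces/tabs from each line,
    # then write the cleaned lines back in place.
    with open(path) as f:
        lines = f.readlines()
    cleaned = [line.rstrip() + "\n" for line in lines]
    with open(path, "w") as f:
        f.writelines(cleaned)

if __name__ == "__main__":
    for path in sys.argv[1:]:
        strip_file(path)
        print("cleaned " + path)
```

Small as it is, a tool like this touches argument handling, file I/O, and shipping something you’d actually reuse: exactly the fundamentals worth practicing.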

The Cloud, c. 1990

Prime yourself on “devops” knowledge

The “cloud” sounds like a huge, loaded buzzword, and it is. But nearly every useful technology stack, even one that isn’t a public-facing consumer product, is now built on these core architectures. If your mission is to build iOS games, you’ll think this stuff isn’t valuable, but learning how to stand up instances on AWS, install database servers, and understand the network security stack is guaranteed to add indispensable chunks of knowledge that you will need in the near future. It’s now free to get a place to hack around, so there’s no excuse not to plunge in; a starting point is sketched below.
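For a taste of how little code standing up a server takes, here’s a minimal sketch using the boto3 library. It assumes AWS credentials are already configured on your machine, and the AMI ID is a placeholder, not a real image:

```python
import boto3  # the AWS SDK for Python

# Connect to EC2 in a region and launch one free-tier-sized instance.
ec2 = boto3.resource("ec2", region_name="us-east-1")
instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID, not a real image
    InstanceType="t2.micro",          # free-tier eligible size
    MinCount=1,
    MaxCount=1,
)
print("launched", instances[0].id)
```

From there, installing Postgres and locking down the security groups on that box will teach you more than any amount of reading about it.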

Spend hours on GitHub

Dig for open source projects you find interesting. Pick apart their code. Follow the developers. Read the issue threads. You’ll find something you can contribute back to, without a doubt, even if in tiny ways at first. This is not only hugely satisfying to an engineer’s brain, but you’ll slowly build valuable trust and presence within the community. Don’t be afraid to dig in and have conversations on GitHub projects (trust me, no one else is scared to make comments or offer opinions). Being thoughtful, considerate, and positive, and acting like you’re part of the team, are excellent ways to get invited into the fold on open source efforts.

See also: traditional resources

Code schools and crash courses are an awesome new resource, without a doubt, and I don’t mean to discount traditional educational structures for building foundations and creating a regimented path through the process. The good ones will teach you plenty of the core pieces I mentioned above without getting you ahead of yourself. But the bad ones get new students thinking about picking libraries and frameworks immediately. So little of the initial hill to climb has to do with your choice of JavaScript vs. Python vs. Ruby, or whether to use Angular or Backbone in your first app. None of that matters, because you don’t really know anything yet; you haven’t even climbed the first three or four rungs of the ladder, and you shouldn’t attempt leaping to the sixth or seventh without some scars from the lower levels. Jobs that have you mucking around with data in VBScript or maintaining old SQL Server databases are (unfortunately) excellent seasoning for your career. This is usually where you’ll determine whether you really like this career choice. If you come out of the trenches still interested in being a programmer, you’ll love it when you get to work on something satisfying, and you’ll appreciate what you have.

I’m a huge fan of starting by getting your hands dirty. This post was intended to help you find the best mud pits to put those hands into.

William Noel McCormick, Jr.

— February 23, 2015

A few weeks ago, my grandfather passed away after a long fight with cancer and Parkinson’s. He was a close part of our family, and we’re sad to see him go, but I wanted to write down some thoughts here.

His list of good qualities and accomplishments is almost too much to mention. During my childhood and after, we spent lots of time with my grandparents, and my grandpa was always a man of action. That’s one of the most memorable parts of growing up with him. In his presence, you couldn’t be bored.

Bill McCormick

For my whole life he was an avid boater, carpenter, fisherman, engineer, and builder, with a seemingly endless knowledge of how things worked. As boys we grew up always excited to spend time in his shop building things (the Mecca for kids wanting to tinker); he always had a project for us, whether it was making wooden guns, boxes, model boats, or just messing around helping out with whatever his current project was. He was a master carpenter, and produced hundreds of pieces of furniture, tools, containers, and even boats.

Project list

He had a drive to understand the way everything worked, to take things apart, and to get his hands dirty, and it instilled those same values in my brothers and me. The “how would Grandpa approach this problem?” question still crosses my mind today when I’m working on projects, around the house or at the office. The notion that there’s no better way to get something built than to start building has always been around in my family of engineers. I thought building things yourself and fixing anything was just what you do. When I’d be over at friends’ houses and hear about the parents calling a plumber, mechanic, or A/C repairman, I always thought that was strange. Why wouldn’t they just look at it and fix it themselves?

All his life he had boats. So many boats. We grew up on the Catalina 22 that he kept at Lake Lanier in North Georgia, and we’d spend weekends out on the lake sailing and fishing. We took that boat on trips to the Florida Panhandle and the Keys. We visited Sombrero Reef and snorkeled from that boat. He showed me how to use a sextant to find your latitude, and how to plot a course on a nautical chart. His love of maps and geography was a major influence in my eventual choice of career. He used to show us his personally annotated atlases charting his travels all over the world.

Grandpa's wall of boats

His professional accomplishments are astonishing. He earned his mechanical engineering degree from Auburn University. He served in the US Navy in the Mediterranean aboard a minesweeper. He worked at Ingalls Shipyard in Pascagoula, then spent over 30 years with the US Army Corps of Engineers, eventually appointed Chief of Engineering for the Corps’ worldwide operations. His travels with the Corps took him to Italy (as Mediterranean division chief) and all over the region: Greece, Saudi Arabia, Oman, Jordan, Sicily, Sardinia, and Tunisia. He made his way to far-flung places like Prudhoe Bay and Attu in Alaska. He even met on construction issues with then-Prince Abdullah of Saudi Arabia, when the prince was commander of the Saudi National Guard.

Travels of the USS Fitch

He could always laugh at himself and have a good time. His sense of humor was incredible, and some of his stories are the stuff of legend.

Even though he had a long, fruitful, and incredible life full of accomplishments, all of that pales in comparison to his integrity and devotion to his family. We’ll miss you, Grandpa.

Public Speaking

— December 9, 2014

Reading this post on the value of conference participation prompted some thoughts on the subject, from my perspective as someone who’s done it a couple dozen times, with a wide range of results.

A few years back, I had never presented or given a talk at a conference, but had attended quite a few. I’d always treated conferences and events with a focus on meeting people and absorbing the “state of the art” of whatever industry or topic was at hand. After a few conferences in a given sector, though, they begin to run together. If you’re a doer who is continually self-educating, you quickly find that you’re already caught up with or ahead of the game on much of the subject matter you’re there to learn. With the pervasiveness of online information, you can read up on any subject without waiting for the so-called experts at a conference to tell you about it.

I think 2011 was the first time I gave an actual talk to a crowd of peers on a topic I cared about (read: not for school or an assignment). I’m not a natural at public speaking, so breaking down that wall and just doing it wasn’t easy. Ever since, though, I feel that events and conferences are barely worth attending unless I’m an active participant—whether I’m putting something out there I’ve been recently working on, talking about products or projects of my company, or even simply talking on a subject I enjoy and want to promote.

That’s not to say all events are wasteful if you don’t have an opportunity to present. After all, not every one of a thousand attendees can take the mic and have the floor. The value of active participation depends on your objective for the event: strictly educational, promotional, or meeting and engaging peers in the community. Whatever my motivation going into an event, I find that a mission to engage with as many people as possible is where I draw the most value. I form lasting relationships that go beyond the last day of the show, and they ultimately feed the other two motivators: I end up learning a ton and finding plenty of chances to promote what I’m doing.

Ultimately, my primary reason for promoting public speaking to my peers is that you always get a return on the time you invest doing it. At the most minimal level, you get a lot smarter on your subject matter if you’re forced to organize your thoughts and convey them to someone else. And most of the time, you’ll end up having interesting conversations and meeting new people based on throwing something out there.

The Diminishing Coast

— September 29, 2014

Yesterday I read this fascinating piece on the state of Louisiana’s Gulf Coast. This slow, man-made terraforming of the coastline is permanently eradicating bayou communities, and it’s becoming a high-profile issue in the state. One of the author’s contentions is that the misrepresentation of the state’s ever-changing shape on official maps contributes to the lack of attention paid to this drastic situation. I love this use of accurate maps as an amplifier of focus, clarifying what bad maps hide from the general population.

This issue of map miscommunication isn’t isolated to crises like the one on the Louisiana coast; it’s inherent in thousands of official government-produced maps, both nationally and internationally. I thought some of the quotes from GIS experts in the article did a good job demonstrating this fact, that old data tells lies:

He pulled up an aerial image of Pass Manchac, the channel between lakes Pontchartrain and Maurepas. On both the image and the Louisiana state map, the area appears to be forest. Anyone who has visited the flood-prone town of Manchac, about a 45-minute drive northwest of New Orleans, knows it is surrounded by wetlands. “People see the vegetation and the trees and think it’s land,” Mitchell said.

Louisiana's moving edges

Where ancient natural processes of erosion and sedimentation collide with human influence — as in the canals, flood control systems, levees, and shipping channels in the bayous of Louisiana — the age and inaccuracy of the maps on record are thrown into sharp relief. As a contributor in the article states, the various layers of government-produced data that are generally thought to be relatively static can be decades old:

His experience updating maps with digital tools has exposed how inconsistent existing maps already were. “The topographic layer might have been done in 1956, and the land cover layer was done in 1962, and the transportation came from 1945,” Mitchell said of his findings. “And those are some of the good ones.”

Keeping these sorts of data up to date is a costly affair, no doubt. But with a natural ecosystem as dynamic as that of southern Louisiana, pretending that 50-year-old data is good enough is an exercise in denial. The cartographer Harold Fisk created a map series in the 1940s (featured in the piece) showing a historical picture of the natural environment: a 200-mile-wide swath of meandering Mississippi riverbed that once spread its southerly-transported sediment all over the southeastern parts of Louisiana’s boot. That process was massively disrupted when the Corps of Engineers rigidly fixed the river’s course with dike and levee systems, to keep it from straying and affecting the extensive infrastructure and human settlement that runs along the riverfront from New Orleans to Natchez.

As drastic as the situation is, it’s one without a clear solution; it’s an issue of competing priorities with completely opposite but equally critical ends. Fixing the coastline and allowing renewed alluvial deposits to repair the missing land would mean a tremendous impact on Louisiana’s oil and gas industry (one of the largest in the union). Doing nothing and keeping existing man-made infrastructure in place and unaffected means losing land at a lightning pace, not to mention the negative impact on the fishing industry up and down the coast (again, one of the nation’s largest producers). And with every passing year of the Corps’ nonstop work to control the river’s path, the risk of disastrous floods increases.

Louisiana from space

Last month at a GIS conference in New Orleans, I sat in on a talk given by Allison Plyer from The Data Center, a NOLA non-profit specializing in advocacy around opening and publishing civic map data for all sorts of local issues. She showed some of these maps published earlier this year by ProPublica in their “Losing Ground” series. I highly recommend the ProPublica maps, as well as The Data Center’s projects to showcase the human geography of greater NOLA, particularly their work post-Katrina.

Read the article; it’s a great piece of writing.