Building a SpatioTemporal Feature Registry

Shared data on boundaries, identifiers, and associated information.

USGS is working to build a National Biogeographic Map (NBM) that includes information tied to areas of interest that range from political boundaries to land-use classifications (e.g., parks and wildlife refuges) to more ecologically focused "bounded" areas like ecoregions. The NBM includes features such as dams and other stream barriers (needed in connectivity analysis), boat ramps (needed for invasive species risk modeling), and other discrete human-made features. It is also linked to the National Hydrography Dataset's identified watershed boundaries and stream network.

The NBM team is constantly challenged with chasing down the appropriate and best sources for these boundaries, identifiers, and associated information. To confront this challenge, the NBM developer team has started conceptualizing a Spatial Feature Registry. The team thinks the resources being developed have broad applicability within the science community and would like to engage YOU to determine how this effort could become a community resource.

CAMPAIGN BRIEF

Remember to select "SUBMIT NEW IDEA" at the top of the page if you want to contribute.

As a start, the team at USGS has a few key design principles (listed below) to consider:

1) "Feature" is something of a GIS term and may be confusing to some, but we couldn't come up with anything better for now. What we mean here, in terms of overall scope, is anything that has both a spatial and a temporal aspect: a place that has some temporal bounds (in any definition of time). For right now that means a place on Earth, but we (USGS, NASA, and others) also have a bunch of named places on the Moon and Mars that could be in the mix.

2) The whole thing has to be based in code-driven data and infrastructure. The registry part of this denotes the idea that different sources of features are registered in some kind of catalog or index and then something is done with the registrants. Anything that happens with the registrants needs to be done with open code that anyone can see and build on. The code needs to be buildable and adaptable, meaning that if we decide over time to do different things with the registrants, we can go back to the code and build from there. In that sense, the end result is more about the code than it is about the data.

3) The data that source the features in the registry need to have been officially published online somewhere. The code needs something concrete to operate against: a file, a streaming data service, or something else with substance. The sources need to have some degree of permanence, or at least we need an understanding of their projected longevity as part of understanding and working with them. This also means that whatever mechanism each source uses to identify its features needs to be understood.

4) We believe that the process we're looking for in terms of assembling features is more about a flexible system for processing features into many different specialized indexes than it is about creating one massive database that would amount to a different version of geonames.org. It's more about understanding what can be done with the stuff in the registry and then building out any number of indexes or forms of the information that are fit for a given set of purposes. The registry needs to provide the intelligence for how to Extract features from registered sources, provide the information on how to access and Load them, and describe some of what's possible in terms of Transformations. But then it's up to the code that lots of creative people write and share against the registry for various purposes to determine how many and what types of transformations are made.

5) One of the key capabilities that the registry should help bring about is a growing understanding of and ability to exploit the explicit and implicit relationships between features. The "easy" relationships are those based simply on spatial/temporal proximity, overlap, or other dynamics. Harder relationships to figure out are those based on some other characteristics of the features. We see a huge role in both of these for the semantic communities in ESIP and CDI (USGS Community for Data Integration) and a connection to the Community Ontology Repository as a venue to make the ontologies for these relationships tangible and usable. An aspect of this that conflicts a little bit with some of the other principles is the need to have reasonably persistent identifiers at the core of a relationship graph. We're interested in ideas on how to balance a distributed and possibly transient data system with the need to build, store, and serve relationships between heterogeneous end members.

6) People do need to be able to directly use something that comes out of this. Though the idea is to have more of a usable/repeatable process than a big data store, we do want to provide some kind of master index that gives people one concrete way to see what we're talking about. A cool thing to do with that might be to provide a schema.org Place encoding from a search API as a way to give people something immediately useful as we continue building out linked data across our various systems. Backed by a robust search indexing platform (we use Elasticsearch), we could do some really interesting exploration in text mining for places, feeding back schema.org formatted information, and helping content owners validate findings and incorporate structured place information into their apps, web pages, etc.
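
To make principles 3, 5, and 6 a little more concrete, here is a minimal Python sketch under stated assumptions. It is not the USGS implementation. The `Feature` class, its field names, the composite `source:local` identifier, and the sample features are all hypothetical. It shows a feature carrying spatial and temporal bounds, the "easy" relationship test based on bounding-box and time-range overlap, and one possible schema.org Place encoding. The `box` ordering used here (south west north east) is our reading of the schema.org GeoShape convention and should be checked against the schema.org documentation.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Feature:
    """A feature pulled from a registered source. All field names are hypothetical."""
    source_id: str   # which registered source the feature came from
    local_id: str    # the feature's identifier within that source
    name: str
    bbox: Tuple[float, float, float, float]  # (west, south, east, north) in degrees
    start: Optional[date] = None  # None means no known lower time bound
    end: Optional[date] = None    # None means no known upper time bound

    @property
    def identifier(self) -> str:
        # Composite identifier: only as persistent as the source and its local IDs.
        return f"{self.source_id}:{self.local_id}"


def spatial_overlap(a: Feature, b: Feature) -> bool:
    """True if the two bounding boxes intersect (ignores antimeridian crossing)."""
    a_w, a_s, a_e, a_n = a.bbox
    b_w, b_s, b_e, b_n = b.bbox
    return a_w <= b_e and b_w <= a_e and a_s <= b_n and b_s <= a_n


def temporal_overlap(a: Feature, b: Feature) -> bool:
    """True if the two time ranges intersect; missing bounds are treated as open."""
    a_start, a_end = a.start or date.min, a.end or date.max
    b_start, b_end = b.start or date.min, b.end or date.max
    return a_start <= b_end and b_start <= a_end


def to_schema_org_place(f: Feature) -> dict:
    """Encode a feature as a schema.org Place (assumed box order: S W N E)."""
    west, south, east, north = f.bbox
    return {
        "@context": "https://schema.org",
        "@type": "Place",
        "name": f.name,
        "identifier": f.identifier,
        "geo": {"@type": "GeoShape", "box": f"{south} {west} {north} {east}"},
    }


# Made-up example features; the names, IDs, and coordinates are not real data.
dam = Feature("example-dams", "d-001", "Example Dam",
              (-105.30, 39.90, -105.20, 40.00), start=date(1950, 1, 1))
watershed = Feature("example-watersheds", "w-042", "Example Watershed",
                    (-105.50, 39.80, -105.00, 40.20))

# The "easy" relationship: the two features overlap in both space and time.
related = spatial_overlap(dam, watershed) and temporal_overlap(dam, watershed)
print(related)
print(json.dumps(to_schema_org_place(dam), indent=2))
```

Bounding boxes are only a coarse first pass; real features would need true geometry tests, and the harder, characteristic-based relationships from principle 5 are out of scope for a sketch like this. The composite identifier also illustrates the tension noted in principle 5: a relationship graph built on it is only as stable as the sources underneath.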

This idea campaign is designed to explore these and other principles across the community. USGS is actively developing technologies that explore these concepts, generating a data source and API with some of the feature types outlined above. As these efforts continue, the USGS team will contribute thoughts, links, code, and data into this idea space and they encourage others to do the same. Depending on the level and direction of interest, ESIP and CDI may work together to develop a code challenge of some kind to put some investment into a key aspect or blocker that could have big payoff for the community.

We are seeking your input on the following questions:

Question 1: Do you agree with, disagree with, or quibble with the design principles? What would you change? What other key parameters should we be considering in developing this resource?

Question 2: Would you use such a resource if we built it together in the commons? What would you use it for? Share your use cases. Link to and describe existing systems that already do this or parts of it.

Question 3: What software components, ontologies, or other resources already exist as building blocks for this architecture? Link to them and describe how they fit.

Question 4: What are the missing elements, the areas where small investments could fill important gaps?

Please remember to select "SUBMIT A NEW IDEA" at the top of the page if you want to contribute!