Community PROV Challenge

The Community PROV Challenge seeks to explore how community-based provenance and annotation capabilities enhance scientific integrity throughout the data lifecycle. To do this, the USGS has partnered with the ESIP Lab to find creative solutions to PROV challenges.

Invisible, unobtrusive provenance: a gap analysis

The provenance community currently faces the classic engineering problem of selecting the appropriate wrench for pounding a selection of screws into a wide variety of bricks. Which is to say, we have many possible components of many possible solutions to many real problems. I suggest we take a step back and focus on actual scientific problems for which we believe that provenance is part of the solution. Given Frew's ...

Submitted by (@jamesfrew)

Voting

1 vote
1 up vote
0 down votes

Blockchain and Provenance

ESIP could promote using blockchain technology to implement a distributed ledger for W3C PROV information. The Challenge, as I understand it, is to improve interoperability of provenance information between agencies. One aspect of provenance looks very similar to a ledger: a series of transactions. Data are collected, then processed in a series of steps. In a closed-data world, it is possible for one authority ...

Submitted by (@jgallagher)

Voting

1 vote
1 up vote
0 down votes

Connect hypothes.is to PROV

One of the requests outlined in the blog post that described the impetus for this challenge (http://testbed.esipfed.org/node/9350) was the need to connect annotations made by authors with provenance. hypothes.is is an annotation tool for web documents that supports the W3C annotation specifications. It would be interesting to extend that tool with the ability to generate PROV statements and also allow annotators ...

Submitted by (@pgroth)
2 comments

Voting

1 vote
1 up vote
0 down votes

User Interface to assemble provenance reports

Currently, most provenance is generated by automated systems. Often, however, provenance is already known to users, authors, or others. What is needed are user-friendly tools that let people easily connect a paper to its data, data to its author, papers to experiments, and so on. Such tools would leverage the growing persistent identifier space to connect all these artifacts.
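
A minimal sketch of what such a tool might record behind the scenes, assuming the user-supplied links are expressed with the W3C PROV relations wasDerivedFrom, wasAttributedTo, and wasGeneratedBy. The `Link` class, the `to_prov_n` helper, and the `ex:` identifiers are hypothetical stand-ins (real use would substitute persistent identifiers), and the output is PROV-N-style text rather than a validated PROV document.

```python
from dataclasses import dataclass

# Relation names come from W3C PROV; everything else here is illustrative.
RELATIONS = {"wasDerivedFrom", "wasAttributedTo", "wasGeneratedBy"}


@dataclass(frozen=True)
class Link:
    """One user-asserted connection between two identified artifacts."""
    relation: str
    subject: str
    object: str


def to_prov_n(links):
    """Render links as PROV-N-style statements, one per line."""
    lines = []
    for link in links:
        if link.relation not in RELATIONS:
            raise ValueError(f"unsupported relation: {link.relation}")
        lines.append(f"{link.relation}({link.subject}, {link.object})")
    return "\n".join(lines)


# Hypothetical identifiers standing in for DOIs, ORCID iDs, and so on.
links = [
    Link("wasDerivedFrom", "ex:paper1", "ex:dataset1"),    # paper -> its data
    Link("wasAttributedTo", "ex:dataset1", "ex:author1"),  # data -> its author
    Link("wasGeneratedBy", "ex:dataset1", "ex:experiment1"),  # data -> experiment
]
report = to_prov_n(links)
print(report)
```

A user interface would only need to collect the three fields of each `Link`; the rendering step stays the same regardless of how the links are entered.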

Submitted by (@pgroth)

Voting

1 vote
1 up vote
0 down votes

Report generation from PROV

This is similar to Tom Narock's idea, but instead of focusing on visualization, it would create a website that builds good-looking dashboards or generates reports from provenance traces submitted by various institutions and agencies.

This could build on what was done by https://data.globalchange.gov to provide the provenance underlying all the evidence in their reports.

Submitted by (@pgroth)

Voting

0 votes
0 up votes
0 down votes

Provide a starting container/VM that includes all the PROV tools you need

Currently, getting started with generating PROV from applications is difficult. Applications need to be tailored to use PROV, or people have to adopt workflow systems or know about systems like SPADE (https://github.com/ashish-gehani/spade/wiki). Systems like ReproZip (https://reprozip.readthedocs.io) provide container packaging for experiments run from the shell command line. It would be good to provide a container (i.e. Docker ...

Submitted by (@pgroth)
2 comments

Voting

2 votes
2 up votes
0 down votes

Reproducibility and reusability as drivers for W3C-PROV interoperability

A key goal of provenance and W3C-PROV is reproducibility. Workflow systems based on PROV can reuse existing "recipes" as well as generate reproducible "traces" of dataset generation procedures. Extending these capabilities to nesting and concatenation of PROV-based workflows connected to existing datasets could drastically reduce the effort of both generating and documenting new datasets, serving as an impetus for creating ...

Submitted by (@joshlieberman)

Voting

2 votes
2 up votes
0 down votes

Onboarding PROV-ES as part of the Drone data capture process

Science data systems are key to enabling the capture, management, and use of production provenance information. Science analysis may now also involve merging multisensor datasets, where lineage can facilitate the understanding of the data. More recently, the emergence of small Unmanned Aircraft Systems (sUAS), a.k.a. drones, has necessitated the development of new data processing and management workflows for ...

Submitted by (@lewismc)
1 comment

Voting

3 votes
3 up votes
0 down votes

Tracing ESGF data holdings to source constituent coverage datasets

The ESGF [0] is an international peer-to-peer repository of data. Currently, its main holdings are the CMIP experiment model outputs (CMIP3, CMIP5, and soon CMIP6), CORDEX data, and obs4MIPs data prepared for model evaluation. There is interest within the ESGF user community and governance board in finding a way to make NASA data easily searchable and accessible via the ESGF user interfaces. It has been determined ...

Submitted by (@lewismc)

Voting

3 votes
3 up votes
0 down votes

Engage the EarthCube CDF to participate in the ESIP PROV effort directly via a CDF working group

Presently, the EarthCube Council of Data Facilities (CDF) [1] is operating a working group on metadata for facilities [2]. This proposal is to form a similar working group to participate with this ESIP group on PROV. The goal is to formally engage EarthCube, and specifically the CDF, to participate in PROV with this group. The working group would engage CDF members to comment ...

Submitted by (@dougfils)
1 comment

Voting

2 votes
2 up votes
0 down votes

Visualization of Provenance Traces

Some simple visualization tools that will graphically show the lifecycle of a dataset would be very helpful. I'd suggest a web-based visualization service (perhaps using D3) that can aggregate related PROV and visualize the resulting data lifecycle. The service would dynamically generate a list of all datasets it knows about, users would select one, and the service would visualize the provenance. Additional features might ...

Submitted by (@tnarock)
12 comments

Voting

6 votes
6 up votes
0 down votes