Community PROV Challenge

Visualization of Provenance Traces

Some simple visualization tools that will graphically show the lifecycle of a dataset would be very helpful. I'd suggest a web-based visualization service (perhaps using D3) that can aggregate related PROV and visualize the resulting data lifecycle. The service would dynamically generate a list of all datasets it knows about, users would select one, and the service would visualize the provenance. Additional features might ...more »

Submitted by (@tnarock)

Voting

6 votes
6 up votes
0 down votes
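
As a rough illustration of the visualization idea above, the sketch below turns a handful of PROV relations into the nodes/links JSON that D3 force layouts commonly consume, after filtering to the lineage of one selected dataset. The triples, the `ex:` identifiers, and the function names `lineage` and `to_node_link` are all hypothetical; a real service would read PROV from wherever it is aggregated.

```python
import json

# Hypothetical PROV relations as (subject, relation, object) triples.
# Every identifier here is made up for illustration.
PROV_TRIPLES = [
    ("ex:calibrated.nc", "prov:wasGeneratedBy", "ex:calibration"),
    ("ex:calibration", "prov:used", "ex:raw.nc"),
    ("ex:calibrated.nc", "prov:wasDerivedFrom", "ex:raw.nc"),
    ("ex:calibration", "prov:wasAssociatedWith", "ex:alice"),
]


def lineage(triples, dataset):
    """Keep only the triples reachable by following relations from `dataset`."""
    seen, frontier, kept = {dataset}, [dataset], []
    while frontier:
        current = frontier.pop()
        for subj, rel, obj in triples:
            if subj == current:
                kept.append((subj, rel, obj))
                if obj not in seen:
                    seen.add(obj)
                    frontier.append(obj)
    return kept


def to_node_link(triples):
    """Convert triples into a {"nodes": [...], "links": [...]} structure."""
    ids = []
    for subj, _, obj in triples:
        for term in (subj, obj):
            if term not in ids:
                ids.append(term)
    nodes = [{"id": term} for term in ids]
    links = [{"source": s, "target": o, "relation": r} for s, r, o in triples]
    return {"nodes": nodes, "links": links}


graph = to_node_link(lineage(PROV_TRIPLES, "ex:calibrated.nc"))
print(json.dumps(graph, indent=2))
```

The printed JSON could be served to a browser page that draws it with D3; the drawing itself is left out here.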

Dataset as prov "salt"

Start with the concept of data as a fundamental core on which various actors perform various events. Given that, we data managers need some way to let our datasets act as salts for prov. So, we need some best practice for providing this tiny bit of prov around a dataset. I’ve tried this and likely failed at https://gist.github.com/fils/3d337cd3768342646376206b7d5ac873 (I’m happy to have it fixed!). I’ve also ...more »

Submitted by (@dougfils)

Voting

3 votes
3 up votes
0 down votes

Reference implementations of PROV-AQ pingback.

I’ve been talking with Nicholas Car in Australia via some RDA telecoms. He’s been very patient with my many poor questions about prov pingback. https://www.w3.org/TR/2013/NOTE-prov-aq-20130430/#provenance-pingback I’m attracted to this approach for its basis in web architecture and also for the relative ease with which it could be implemented in existing LOD stacks. If I could work out how to generate the ...more »

Submitted by (@dougfils)

Voting

3 votes
3 up votes
0 down votes
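
For readers new to the pingback idea above, here is a minimal client-side sketch in Python, based on my reading of the PROV-AQ note: a resource advertises a pingback URI in an HTTP `Link` header with `rel="http://www.w3.org/ns/prov#pingback"`, and a client POSTs a `text/uri-list` of provenance record URIs to it. The example URIs and the helper names `find_pingback_uri` and `build_pingback_request` are hypothetical, the `Link` parsing is deliberately simplified, and nothing is sent over the network.

```python
import re
import urllib.request

PINGBACK_REL = "http://www.w3.org/ns/prov#pingback"


def find_pingback_uri(link_header):
    """Return the target of the first Link entry whose rel is prov:pingback.

    Simplified parsing: assumes link parameters contain no commas.
    """
    for target, params in re.findall(r'<([^>]*)>([^,]*)', link_header):
        rel = re.search(r'rel\s*=\s*"([^"]*)"', params)
        if rel and PINGBACK_REL in rel.group(1).split():
            return target
    return None


def build_pingback_request(pingback_uri, provenance_uris):
    """Build (but do not send) a POST carrying a text/uri-list body."""
    body = "".join(uri + "\r\n" for uri in provenance_uris).encode("utf-8")
    return urllib.request.Request(
        pingback_uri,
        data=body,
        headers={"Content-Type": "text/uri-list"},
        method="POST",
    )


# Hypothetical Link header as a dataset landing page might return it.
header = (
    '<http://example.org/provenance/data1>; '
    'rel="http://www.w3.org/ns/prov#has_provenance", '
    '<http://example.org/pingback/data1>; '
    'rel="http://www.w3.org/ns/prov#pingback"'
)
pingback = find_pingback_uri(header)
request = build_pingback_request(
    pingback, ["http://example.net/prov/derived-product-1"]
)
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`; the receiving side, which stores the submitted URIs, is the part a reference implementation would need to supply.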

Tracing ESGF data holdings to source constituent coverage datasets

The ESGF [0] is an international peer-to-peer repository of data. Currently, its main holdings are the CMIP experiment model outputs (CMIP3, CMIP5, and soon CMIP6), CORDEX data, and obs4MIPs data prepared for model evaluation. There is interest within the ESGF user community and governance board in finding a way to make NASA data easily searchable and accessible via the ESGF user interfaces. It has been determined ...more »

Submitted by (@lewismc)

Voting

3 votes
3 up votes
0 down votes

Onboarding PROV-ES as part of the Drone data capture process

Science data systems are key to enabling the capture, management, and use of production provenance information. Science analysis may now also involve merging multisensor datasets, where lineage can facilitate understanding of the data. More recently, the emergence of small Unmanned Aircraft Systems (sUAS), a.k.a. drones, has necessitated the development of new data processing and management workflows for ...more »

Submitted by (@lewismc)

Voting

3 votes
3 up votes
0 down votes

Generate a simple prov example tool

In relation to another idea, perhaps a simple tool like those that exist for things like schema.org (see https://hallanalysis.com/json-ld-generator/ for example) could be of value. This would allow people to take IDs for things like datasets or people and produce examples of what events or other actors could look like. With such examples vetted by the community, programmers and data managers can better see how ...more »

Submitted by (@dougfils)

Voting

2 votes
2 up votes
0 down votes

Capturing and Linking PROV Records

This idea is similar to Doug Fils' PROV-AQ pingback idea, but with a modification to the architecture. The pingback approach, as I understand it, would not link all the steps in a data processing/usage workflow. If multiple derivative products are created they are not linked back to the original source. It would also place requirements on data producers to host pingback services as well as Linked Data infrastructure. ...more »

Submitted by (@tnarock)

Voting

2 votes
2 up votes
0 down votes

Engage the EarthCube CDF to participate in the ESIP prov effort directly via a CDF working group

Presently the EarthCube Council of Data Facilities (CDF) [1] is operating a working group on metadata for facilities [2]. This proposal is to have a similar working group formed to participate with this ESIP group on prov. The goal is to formally engage EarthCube, and specifically the CDF, to participate in prov with this group. The working group would engage CDF members to comment ...more »

Submitted by (@dougfils)

Voting

2 votes
2 up votes
0 down votes

Reproducibility and reusability as drivers for W3C-PROV interoperability

A key goal of provenance and W3C-PROV is reproducibility. Workflow systems based on PROV can reuse existing "recipes" as well as generate reproducible "traces" of dataset generation procedures. Extending these capabilities to nesting and concatenation of PROV-based workflows connected to existing datasets could drastically reduce the effort of both generating and documenting new datasets, serving as an impetus for creating ...more »

Submitted by (@joshlieberman)

Voting

2 votes
2 up votes
0 down votes

Provide a starting container/vm that includes all the PROV tools you need

Currently, getting started with generating PROV from applications is difficult. Applications need to be tailored to use PROV, or people have to adopt workflow systems or know about systems like SPADE (https://github.com/ashish-gehani/spade/wiki). Systems like ReproZip (https://reprozip.readthedocs.io) provide container packaging for experiments run from the shell command line. It would be good to provide a container (i.e. docker ...more »

Submitted by (@pgroth)

Voting

2 votes
2 up votes
0 down votes

User Interface to assemble provenance reports

Currently, most provenance is generated by automated systems. Often, though, provenance is actually known by users, authors, or others. What is needed is a user-friendly tool that lets people easily connect a paper to its data, data to its author, papers to experiments, etc. This would leverage the growing persistent identifier space to connect all these artifacts.

Submitted by (@pgroth)

Voting

1 vote
1 up vote
0 down votes
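
To make the user-interface idea above a little more concrete, here is a small Python sketch of what such a tool might emit once a user has linked a few artifacts: PROV-N-style statements built from user-entered links. The link kinds, the mapping from those kinds to PROV relations, the function name `to_prov_n`, and the `ex:` identifiers (stand-ins for persistent identifiers such as DOIs or ORCID iDs) are all assumptions made for illustration.

```python
# Hypothetical links a user might enter in a form: (kind, source, target).
LINKS = [
    ("paper_uses_data", "ex:paper-1", "ex:dataset-1"),
    ("data_has_author", "ex:dataset-1", "ex:author-1"),
    ("data_from_experiment", "ex:dataset-1", "ex:experiment-1"),
]

# Assumed mapping: link kind -> (PROV relation, source type, target type).
RELATIONS = {
    "paper_uses_data": ("wasDerivedFrom", "entity", "entity"),
    "data_has_author": ("wasAttributedTo", "entity", "agent"),
    "data_from_experiment": ("wasGeneratedBy", "entity", "activity"),
}


def to_prov_n(links):
    """Return PROV-N-style lines: declarations first, then relations."""
    declared = {}   # identifier -> PROV type, in first-seen order
    statements = []
    for kind, source, target in links:
        relation, source_type, target_type = RELATIONS[kind]
        declared.setdefault(source, source_type)
        declared.setdefault(target, target_type)
        statements.append(f"{relation}({source}, {target})")
    declarations = [f"{ptype}({ident})" for ident, ptype in declared.items()]
    return declarations + statements


lines = to_prov_n(LINKS)
print("\n".join(lines))
```

A real tool would also need to wrap these lines in a PROV-N document with prefix declarations and resolve the identifiers the user supplies.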