Some simple visualization tools that graphically show the lifecycle of a dataset would be very helpful. I'd suggest a web-based visualization service (perhaps using D3) that can aggregate related PROV and visualize the resulting data lifecycle. The service would dynamically generate a list of all datasets it knows about, users would select one, and the service would visualize the provenance. Additional features might ...
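As a rough illustration of the aggregation step, the sketch below turns a handful of PROV statements into the node-link JSON shape that D3 force layouts commonly consume. The function name `to_node_link`, the `ex:` identifiers, and the choice of plain (subject, predicate, object) tuples as input are all assumptions made for the example, not part of the proposal.

```python
import json


def to_node_link(triples):
    """Convert (subject, predicate, object) PROV statements to node-link form.

    Hypothetical helper: the output shape ({"nodes": [...], "links": [...]})
    is one common input format for D3 graph layouts.
    """
    nodes = {}   # keyed by identifier so each resource appears once
    links = []
    for subject, predicate, obj in triples:
        nodes.setdefault(subject, {"id": subject})
        nodes.setdefault(obj, {"id": obj})
        links.append({"source": subject, "target": obj, "label": predicate})
    return {"nodes": list(nodes.values()), "links": links}


# Made-up example statements about one derived dataset.
triples = [
    ("ex:derived.csv", "prov:wasGeneratedBy", "ex:regrid"),
    ("ex:regrid", "prov:used", "ex:raw.nc"),
    ("ex:derived.csv", "prov:wasDerivedFrom", "ex:raw.nc"),
]
graph = to_node_link(triples)
print(json.dumps(graph, indent=2))
```

A real service would first parse PROV from several sources (for example with an RDF library) before flattening it this way.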
Community PROV Challenge
Start with the concept of data as a fundamental core on which various actors perform various events. Given that, we data managers need some way to allow our data sets to be salts for prov. So, we need some best practice to provide this tiny bit of prov around a dataset. I’ve tried this and likely failed at https://gist.github.com/fils/3d337cd3768342646376206b7d5ac873 (I’m happy to have it fixed!). I’ve also ...
I’ve been talking with Nicholas Car in Australia via some RDA telecons. He’s been very patient with my many poor questions about prov pingback (https://www.w3.org/TR/2013/NOTE-prov-aq-20130430/#provenance-pingback). I’m attracted to this approach for its basis in web architecture and for the relative ease with which it could be implemented in existing LOD stacks. If I could work out how to generate the ...
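For readers unfamiliar with the mechanism, here is a minimal sketch of the client side of a pingback as I read the PROV-AQ note: the consumer of a resource POSTs a `text/uri-list` body of provenance record URIs to the pingback URI that the resource advertised. The function name and both example URIs are placeholders, and the request is only built here, not sent.

```python
from urllib.request import Request


def build_pingback_request(pingback_uri, provenance_uris):
    """Build (but do not send) a PROV-AQ style pingback POST.

    Hypothetical helper. The body is a text/uri-list: one URI per line,
    with CRLF line endings.
    """
    body = "".join(uri + "\r\n" for uri in provenance_uris).encode("utf-8")
    return Request(
        pingback_uri,
        data=body,
        method="POST",
        headers={"Content-Type": "text/uri-list"},
    )


# Placeholder URIs: a pingback endpoint advertised by a dataset's publisher,
# and a provenance record describing something derived from that dataset.
req = build_pingback_request(
    "https://example.org/pingback/dataset-1",
    ["https://example.com/prov/derived-product-1"],
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would actually send it. The receiving side, which stores and serves the collected links, is the part that needs support in an existing LOD stack.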
The ESGF is an international peer-to-peer repository of data. Currently, its main holdings are the CMIP experiment model outputs (CMIP3, CMIP5, and soon CMIP6), CORDEX data, and obs4MIPs data prepared for model evaluation. There is interest within the ESGF user community and governance board in finding a way to make NASA data easily searchable and accessible via the ESGF user interfaces. It has been determined ...
Science data systems are key to enabling the capture, management, and use of production provenance information. Science analysis now may also involve merging multisensor datasets, where lineage can facilitate the understanding of the data. More recently, the emergence of small Unmanned Aircraft Systems (sUAS), a.k.a. drones, has necessitated the development of new data processing and management workflows for ...
In relation to another idea, perhaps a simple tool like those that exist for schema.org (see https://hallanalysis.com/json-ld-generator/ for example) could be of value. This would allow people to take IDs for things like datasets or people and see examples of what related events or other actors could look like. With such examples vetted by the community, programmers and data managers can better see how ...
This idea is similar to Doug Fils' PROV-AQ pingback idea, but with a modification to the architecture. The pingback approach, as I understand it, would not link all the steps in a data processing/usage workflow. If multiple derivative products are created, they are not linked back to the original source. It would also place requirements on data producers to host pingback services as well as Linked Data infrastructure. ...
Presently the EarthCube Council of Data Facilities (CDF) is operating a working group on metadata for facilities. This idea/proposal is to raise the intent to have a similar working group formed to participate with this ESIP group on prov. The goal is to formally engage EarthCube, and specifically the CDF, to participate in prov with this group. The working group would engage CDF members to comment ...
A key goal of provenance and W3C-PROV is reproducibility. Workflow systems based on PROV can reuse existing "recipes" as well as generate reproducible "traces" of dataset generation procedures. Extending these capabilities to nesting and concatenation of PROV-based workflows connected to existing datasets could drastically reduce the effort of both generating and documenting new datasets, serving as an impetus for creating ...
Currently, getting started with generating PROV from applications is difficult. Applications need to be tailored to use PROV, or people have to adopt workflow systems or know about systems like SPADE (https://github.com/ashish-gehani/spade/wiki). Systems like ReproZip (https://reprozip.readthedocs.io) provide container packaging for experiments run from the shell command line. It would be good to provide a container (e.g. Docker ...
Use this question to let the admins know if you have further questions about the PROV challenge process and goals, to give feedback on the GUI, or to raise any other issue you have. We would love to hear from you!
Currently, most provenance is generated by automated systems. Often, though, provenance is actually known by users, authors, or others. What is needed are user-friendly tools that let people easily connect a paper to its data, data to its author, papers to experiments, etc. This would leverage the growing persistent identifier space to connect all these artifacts.
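To make the idea concrete, here is a small sketch of the record such a tool might emit once a user has picked the identifiers. It uses PROV-O terms (`prov:wasDerivedFrom`, `prov:wasAttributedTo`) in JSON-LD. The function name is hypothetical, and the DOIs and ORCID iD are documentation placeholders, not real artifacts.

```python
import json

PROV = "http://www.w3.org/ns/prov#"


def link_artifacts(paper_id, dataset_id, author_id):
    """Return a JSON-LD document linking a paper, its data, and the data's author.

    Hypothetical helper: a real tool would collect these identifiers from a
    form or a persistent identifier lookup.
    """
    return {
        "@context": {"prov": PROV},
        "@graph": [
            {"@id": paper_id, "@type": "prov:Entity",
             "prov:wasDerivedFrom": {"@id": dataset_id}},
            {"@id": dataset_id, "@type": "prov:Entity",
             "prov:wasAttributedTo": {"@id": author_id}},
            {"@id": author_id, "@type": "prov:Agent"},
        ],
    }


# Placeholder identifiers for illustration only.
doc = link_artifacts(
    "https://doi.org/10.5555/example-paper",
    "https://doi.org/10.5555/example-dataset",
    "https://orcid.org/0000-0002-1825-0097",
)
print(json.dumps(doc, indent=2))
```

Because every node is a resolvable persistent identifier, statements like these from different users can be merged without any further coordination.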