![](https://hedgedoc.softwareheritage.org/uploads/upload_cdffd7a454b474e194de636d3bd9e688.png)

# Roadmap 2021

#### Version 1.0, last modified 5/4/2021

This document provides an overview of the technical roadmap of Software Heritage for 2021.

[toc]

----

Links:

- [Kanban board](https://forge.softwareheritage.org/project/board/160/query/all/)

----

## Collect

### Faster and more reliable save code now

**tags:** openscience
**task:** [T3082](https://forge.softwareheritage.org/T3082)
**lead:** ardumont
**effort:** 1 PM

- set up dedicated fast track pipeline for save code now
- improve save code now monitoring (user and admin)

### Improve deposit integration, management and display

**tags:** openscience
**task:** [T3128](https://forge.softwareheritage.org/T3128)
**lead:** moranegg
**effort:** 3 PM

Includes work for:

- full invenioRDM integration [T2344](https://forge.softwareheritage.org/T2344)
- metadata only deposit [T2540](https://forge.softwareheritage.org/T2540)

### Save forge now

**tags:** expand
**task:** [T1538](https://forge.softwareheritage.org/T1538)
**lead:** ardumont
**effort:** 1 PM

- tooling & process

### Admin tooling for takedown notices (URLs)

**tags:** contract, compliance
**task:** [T3087](https://forge.softwareheritage.org/T3087)
**lead:** anlambert
**effort:** 2 PM

- admin interface
- journal of operations
- web page with list of accepted TDN

----

## Preserve

### Complete and up-to-date archive copy on S3

**tags:** stability
**task:** [T3085](https://forge.softwareheritage.org/T3085)
**lead:** douardda
**effort:** 1 PM

- live update of the objects
- regular dumps of the (anonymized) Merkle graph

### Scale-out graph storage in production

**tags:** scalability
**task:** [T2214](https://forge.softwareheritage.org/T2214)
**lead:** vlorentz
**effort:** 3 PM

- post-Postgres database
- Cassandra: [T1892](https://forge.softwareheritage.org/T1892) (*maybe with external help*)

### Scale-out object storage prototype

**tags:** stability, scalability,
*externalized*
**task:** [T3054](https://forge.softwareheritage.org/T3054)
**lead:** dachary
**effort:** 3 PM

### Cold storage archive in Vitam instance at CINES

**tags:** contract
**task:** [T3113](https://forge.softwareheritage.org/T3113)
**lead:** douardda
**effort:** 4 PM

### Mirrors

**tags:** stability, scalability
**depends:** scale-out object storage
**task:** [T3116](https://forge.softwareheritage.org/T3116)
**lead:** douardda
**effort:** 3 PM

- get at least one mirror up and running

### SWHID v2

**tags:** stability, evolution, datamodel
**task:** [T3134](https://forge.softwareheritage.org/T3134)
**lead:** zack
**effort:** 6 PM

- complete on-paper spec
- align with new git hashes
- including migration plan from v1
- understand impact on internal microservice architecture
- keep correspondence with v1 (there may be multiple v2 for one v1!)
- reviewed by crypto experts

### Integrity

**tags:** stability, reliability
**task:** [T3135](https://forge.softwareheritage.org/T3135)
**lead:** olasd
**effort:** 2 PM

- making sure objects aren't corrupted before insertion [T399](https://forge.softwareheritage.org/T399)
- ...
and that existing ones aren't [T75](https://forge.softwareheritage.org/T75), and recheck from time to time

----

## Share

### swh-graph in production

**tags:** scalability
**task:** [T2220](https://forge.softwareheritage.org/T2220)
**lead:** zack
**effort:** 2 PM

### Efficient and reliable Vault download

**tags:** stability
**task:** [T3096](https://forge.softwareheritage.org/T3096)
**lead:** vlorentz
**effort:** 3 PM

- swh-graph may speed up operations considerably

### Web API 2.0

**tags:** reliability, interoperability
**task:** [T2194](https://forge.softwareheritage.org/T2194)
**lead:** anlambert
**effort:** 4 PM

- OpenAPI specification
- implementation

### Expose metadata and make them searchable

**tags:** openscience
**task:** [T3097](https://forge.softwareheritage.org/T3097)
**lead:** vlorentz
**effort:** 3 PM

- index extrinsic metadata in swh-search/Elasticsearch from the journal [T2073](https://forge.softwareheritage.org/T2073)
- create API endpoint to access raw_extrinsic_metadata [T2938](https://forge.softwareheritage.org/T2938)
- show metadata in the web UI [T2088](https://forge.softwareheritage.org/T2088)

### Full text search prototype

**tags:** feature, wishlist
**task:** [T2204](https://forge.softwareheritage.org/T2204)
**lead:** anlambert
**effort:** 3 PM

- requires integration with swh-graph and/or provenance index

----

## Organize

### Collect extrinsic metadata

**tags:** compliance
**task:** [T2202](https://forge.softwareheritage.org/T2202)
**lead:** vlorentz
**effort:** 3 PM

- working pipeline
- at least 1 instance running
- ClearlyDefined
- forge metadata (info on the main page, etc.)
### Provenance in production

**tags:** contract, feature
**task:** [T3112](https://forge.softwareheritage.org/T3112)
**lead:** zack
**effort:** 6 PM

### Prior art

**tags:** compliance
**depends:** provenance | swh-graph in production
**task:** [T3136](https://forge.softwareheritage.org/T3136)
**lead:** zack
**effort:** 3 PM

- pinpoint origin of selected source code artifacts
- possibly integrated with swh-scanner

----

## Measurement

### Efficient archive counters (HyperLogLog)

**tags:** measure, comm
**task:** [T2912](https://forge.softwareheritage.org/T2912)
**lead:** vsellier
**effort:** 1 PM

### Distribution of origins by forge

**tags:** measure, comm
**task:** [T3127](https://forge.softwareheritage.org/T3127)
**lead:** anlambert
**effort:** 1 PM

### Stats on regular crawling by forge

**tags:** measure, comm
**task:** [T1363](https://forge.softwareheritage.org/T1363)
**lead:** olasd
**effort:** 1 PM

- lag, periodicity, # of changes since last visit, etc.

### View deposits per user (admin and user)

**tags:** measure, support
**task:** [T3128](https://forge.softwareheritage.org/T3128)
**lead:** ardumont
**effort:** 1 PM

### Reliable user-level monitoring of services

**tags:** stability
**task:** [T3129](https://forge.softwareheritage.org/T3129)
**lead:** vsellier
**effort:** 2 PM

- status.softwareheritage.org

----

## Documentation

### Write use case-specific documentation

**tags:** comm, web, doc
**task:** [T2234](https://forge.softwareheritage.org/T2234)
**lead:** moranegg
**effort:** 2 PM

Includes FAQ for:

- users
- ambassadors

### Improve quality of code documentation

**tags:** doc, *externalized*
**task:** TODO
**lead:** TBD
**effort:** 2 PM

- doc(string) audit
- team training about doc writing

### Documentation strategy

**tags:** doc
**task:** [T2624](https://forge.softwareheritage.org/T2624)
**lead:** moranegg
**effort:** 1 PM

- respective role of docs.s.o, wiki, www.s.o, etc.
----

## Community

### Tooling for fundraising campaigns

**tags:** web
**task:** [T3077](https://forge.softwareheritage.org/T3077)
**lead:** anlambert
**effort:** 1 PM

### Dedicated page to list status of supported listers/loaders

**tags:** web, doc
**task:** [T3117](https://forge.softwareheritage.org/T3117)
**lead:** anlambert
**effort:** 1 PM

- [T1870](https://forge.softwareheritage.org/T1870)
- design web page
- process to keep it up to date
- make it clearly visible and link to Sloan subgrants

----

## Tooling

### Migration to GitLab

**tags:** forge, development
**task:** [T2225](https://forge.softwareheritage.org/T2225)
**lead:** olasd
**effort:** 1 PM