Each day, new information - ranging from blog posts to EU-funded research projects - appears online. Yet the web is an ephemeral medium.
For example, just 7% of project websites associated with the Fourth Framework Programme (FP4, 1994-1998) were still live in 2015.
R&D websites provide invaluable information that cannot be found elsewhere, including:
- software used in experiments
- test data sets
- documents falling outside the academic and commercial publishing circuits
- news
- material for dissemination.
Deactivating such websites thus depletes the sum of human knowledge.
How can we prevent this? The answer is web archiving.
Arquivo.pt, run by the Portuguese Foundation for Science and Technology, keeps scientific and academic information available online, making it easy to browse through past R&D projects.
It automatically identifies R&D project websites and collects their content. Arquivo.pt has already saved 52 million files from R&D projects financed since FP4, gathered from nearly 54 000 websites, and the collection keeps growing.
The datasets for FP4, FP5, FP6 and FP7 are now publicly available, so other organisations keen to preserve this digital heritage can improve and reuse them.