Versions & changelog

Versioning

Our versioning policy follows the Semantic Versioning specification. In our case, given a version MAJOR.MINOR.PATCH, we increment the:

  • MAJOR version when the data model of the Graph changes
  • MINOR version when the pipeline (e.g., different deduplication method, different implementation for an enrichment process) or major data sources change
  • PATCH version when the graph data are updated
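
As an illustration of the policy above, here is a minimal sketch that computes the next version number from the kind of change. The function name `next_version` and the change labels (`"data_model"`, `"pipeline"`, `"data_update"`) are hypothetical names chosen for this example and are not part of any OpenAIRE tooling. Resetting the lower-order components to zero follows the Semantic Versioning specification.

```python
def next_version(current: str, change: str) -> str:
    """Return the version that follows `current` for the given kind of change."""
    # Accept both "8.0.1" and "v8.0.1".
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    if change == "data_model":    # data model of the Graph changed -> MAJOR
        return f"{major + 1}.0.0"
    if change == "pipeline":      # pipeline or major data sources changed -> MINOR
        return f"{major}.{minor + 1}.0"
    if change == "data_update":   # graph data were updated -> PATCH
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change!r}")
```

For instance, `next_version("v8.0.0", "data_update")` yields `8.0.1`, and `next_version("v8.0.1", "data_model")` yields `9.0.0`.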

Changelog

This section documents all notable changes for each graph version.


v9.0.0

Start Date: 2024-10-03 • Release Date: 2024-10-23 • Dataset release: no

Added

  • ~2.5% increase (+6.7Mi) in the number of research products
  • ~6.35% increase (+11.9Mi) in the number of affiliations
  • ~7.3% increase (+311K) in the number of funded research products
  • Import of SDG classifications without a DOI
  • Introduced plugins for collecting research results from the OSF preprints server and the UKRI registry

Changed

  • Updated Crossref publications to include contents until Aug 2024 and updated mapping so that
    • records with an "is-review-of" relationship are mapped as publications of type "Review";
    • the hostedby of Crossref records with DOI prefix 10.3410 or 10.12703 is forced to the H1 Connect data source.
  • Updated ORCID contents until Sept 2024
  • Updated Datacite contents until Sept 2024
  • Improvements in the comparators used in the organization deduplication.
  • Changed the selection criteria for the pivot record of a group so that the best PID type is the first criterion; as a consequence, pivots will converge to records that have a DOI.
  • Community tags added to all the entity types.
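
The two Crossref mapping rules listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the flat dictionary it returns, and the `subtype` key are hypothetical, and the actual mapping code may represent records quite differently. Only the relationship name, the two DOI prefixes, and the target data source come from the changelog.

```python
# DOI prefixes whose records are attributed to H1 Connect (from the changelog).
H1_CONNECT_PREFIXES = ("10.3410/", "10.12703/")


def map_crossref_record(doi: str, relation_types: list[str]) -> dict:
    """Apply the two mapping rules to a simplified, hypothetical record shape."""
    mapped = {"doi": doi, "subtype": None, "hostedby": None}
    # Rule 1: an "is-review-of" relationship marks the publication as a Review.
    if "is-review-of" in relation_types:
        mapped["subtype"] = "Review"
    # Rule 2: the hostedby is forced for the two DOI prefixes.
    if doi.lower().startswith(H1_CONNECT_PREFIXES):
        mapped["hostedby"] = "H1 Connect"
    return mapped
```

Matching on the prefix followed by a slash avoids accidentally matching longer prefixes that merely start with the same digits.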

v8.0.1

Start Date: 2024-08-09 • Release Date: 2024-09-12 • Dataset release: no

Added

  • Introduced mapping of affiliations from publisher websites

Changed

  • Updated Crossref publications to include contents until June 2024
  • Updated ORCID contents until July 2024
  • Updated Datacite contents until July 2024
  • Include only FoS L1..L2 in the record serialization

v8.0.0

Start Date: 2024-07-03 • Release Date: 2024-07-15 • Dataset release: yes

Added

  • General increase of the scientific products with ORCID identified authors +0.43% (+145K)

Changed

  • Improved matching of organizations in the deduplication algorithm, leading to fewer false positives
  • Updated Crossref publications to include contents until May 2024
  • Updated ORCID contents until June 2024
  • Updated Datacite contents until June 2024
  • Updated serialization of the data model as follows
    • The serialization of the property names is changed to camelCase
    • The serialization of the impact indicators was updated renaming the element bipIndicators as citationImpact, which includes the following:
      • citationCount, influence, popularity, impulse, all of them typed as Double
      • citationClass, influenceClass, impulseClass, popularityClass, all of them typed as String
    • The element datasettype was renamed to type
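
To make the renamed impact indicators concrete, the sketch below models the `citationImpact` element with the property names and types listed above. The class name `CitationImpact` and all the values are placeholders invented for this example; mapping the documented `Double` type to a Python `float` is also an assumption.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class CitationImpact:
    # Numeric indicators, documented as Double.
    citationCount: float
    influence: float
    popularity: float
    impulse: float
    # Class labels, documented as String.
    citationClass: str
    influenceClass: str
    impulseClass: str
    popularityClass: str


# Placeholder values, not taken from any real record.
example = CitationImpact(
    citationCount=3.0, influence=0.5, popularity=0.25, impulse=1.0,
    citationClass="placeholder", influenceClass="placeholder",
    impulseClass="placeholder", popularityClass="placeholder",
)

# Serialise with the camelCase property names introduced in this release.
print(json.dumps({"citationImpact": asdict(example)}, indent=2))
```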

v7.2.0

Start Date: 2024-05-15 • Release Date: 2024-06-20 • Dataset release: no

Added

  • Introduced new Field of Science classifications for publications, reaching a total of ~77.2Mi publications classified
  • General increase of the affiliations +20% (from 162Mi to 195Mi)
  • General increase of the scientific products with ORCID identified authors +10% (from 3.09Mi to 3.39Mi)

Changed

  • Revised deduplication configuration to better exploit resource types
  • The DOIBoost dataset was superseded by the direct aggregation of its datasources: Crossref, Unpaywall, Microsoft Academic Graph, ORCID. See the section on the aggregation of non-compatible sources for more details
  • Relaxed Crossref publication inclusion criteria, now accepting records without author information, leading to a +15% increase (from 127Mi to 146Mi records). Included contents until April 2024
  • Updated ORCID contents until April 2024
  • Updated Datacite contents until April 2024

v7.1.3

Start Date: 2024-04-10 • Release Date: 2024-04-22 • Dataset release: no

Added

  • Introduced new Field of Science classifications, reaching a total of ~73Mi publications classified
  • General increase of the funded scientific outputs, thanks to the full-text mining scanning new OpenAccess publications. Some examples:
    • European Commission - EC +7% (from 1.52Mi to 1.62Mi)
    • Irish Research Council - IRC +7% (from 12.7K to 13.5K)
    • French National Research Agency - ANR +5.8% (from 91.5K to 96.8K)
    • National Institutes of Health - NIH +5% (from 594K to 626K)
    • UK Research and Innovation - UKRI +3.7% (from 434K to 450K)
  • General increase of the scientific products with author affiliation information +2% (from 83.12Mi to 84.88Mi)

Changed

  • Updated Crossref publications to include contents until March 2024
  • Updated Datacite contents until March 2024
  • Updated ORCID contents until March 2024

v7.1.2

Start Date: 2024-03-15 • Release Date: 2024-03-27 • Dataset release: no

Added

  • General increase of the funded scientific outputs, thanks to the full-text mining scanning new OpenAccess publications

Changed

  • Updated Crossref publications to include contents until February 2024
  • Updated Datacite contents until February 2024
  • Updated ORCID contents until February 2024

v7.1.1

Start Date: 2024-02-23 • Release Date: 2024-03-06 • Dataset release: no

Added

  • Updated the content import criteria applied to Datacite, resulting in +13Mi Other Research Products (+167%)
  • Introduced project PIDs; DOI currently available for grants funded by FCT and TWCF

Changed

  • Scientific products typed as "Collection" categorized under "Research Data" instead of "Other Research Product".
  • Updated Crossref publications to include contents until January 2024
  • Updated Datacite contents until January 2024

v7.1.0

Start Date: 2024-01-30 • Release Date: 2024-02-20 • Dataset release: no

Added

  • The scientific products aggregated increased by ~5Mi records (+1.6%)

Changed

  • A refined version of the deduplication strategy caught more duplicates among the scientific products, decreasing their total number by ~3.2Mi (-1.35%). More details about the deduplication algorithm are available here.
  • Updated Crossref publications to include contents until November 2023
  • Updated Datacite contents until December 2023

v7.0.0

Start Date: 2023-12-18 • Release Date: 2024-01-06 • Dataset release: yes

Added

  • The scientific products increased by ~3Mi records (+1.26%)
  • The number of relations increased by 28.6Mi (+1%)
  • The funded contents increased by 5%, from 3.6Mi to 3.8Mi. Funders that recorded the highest increase include, for example, EC with +120K linked research products and SFI with +1K products.

Changed

This graph release also introduces new fields to identify research products published using specific open access models, in diamond journals, and those that received public funding. These fields will also be added to the graph dataset in Zenodo. In detail:

  • ResearchProduct.isGreen (true, false): indicates whether or not the research product was published following the green open access model;
  • ResearchProduct.openAccessColor (bronze, gold, hybrid): indicates the specific open access model used for the publication;
  • ResearchProduct.isInDiamondJournal (true, false): indicates whether or not the research product was published in a diamond journal;
  • ResearchProduct.publicly-funded (true, false): indicates whether or not the grants acknowledged by the publication come from public funds.
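
As a rough illustration of how the boolean flags above might be consumed, the sketch below counts how many records set `isGreen` and `isInDiamondJournal`. It assumes each research product is available as a flat dictionary keyed by the property name and holding JSON booleans; the function name and the sample records are hypothetical.

```python
from collections import Counter

FLAGS = ("isGreen", "isInDiamondJournal")


def summarise_open_access(products: list[dict]) -> Counter:
    """Count the records that set each flag to true; missing flags count as false."""
    counts = Counter()
    for product in products:
        for flag in FLAGS:
            if product.get(flag) is True:
                counts[flag] += 1
    return counts


# Invented sample records, for demonstration only.
sample = [
    {"isGreen": True, "isInDiamondJournal": False},
    {"isGreen": True},
    {"isInDiamondJournal": True},
]
summary = summarise_open_access(sample)
```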

v6.2.2

Start Date: 2023-11-07 • Release Date: 2023-11-23 • Dataset release: no

Added

  • Imported OpenCitations' POCI dataset, containing citations among publications in PubMed
  • Imported Affiliations from Crossref and from PubMed
  • Imported Software Heritage identifiers for Software records
  • Extended coverage of Irish funders imported from Crossref
  • Peer-reviewed material is now identified with a revised heuristic that improves coverage
  • Project references identified by TDM increased by ~10%
  • Introduced new Field of Science classifications for ~40Mi publications

Changed

  • Updated Crossref publications to include contents until October 2023
  • Updated Datacite contents until October 2023
  • Indicators regarding data source downloads and views are taken from the usage counts of September 2023

v6.1.1

Start Date: 2023-09-11 • Release Date: 2023-10-15 • Dataset release: no

Added

  • Affiliation (research product to organization) relations from Crossref
  • Links to the full text of research products
  • Cleaning of author and publisher names (removal of tabs, CR characters, and \n(s); escaping of double quotes)
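
The name cleaning mentioned in the last item could look roughly like the sketch below. It is an assumption that control characters are replaced by a single space rather than dropped, and that double quotes are escaped with a backslash; the function name `clean_name` is hypothetical and this is not the pipeline's actual code.

```python
import re


def clean_name(raw: str) -> str:
    """Remove tabs, CR and newline characters, then escape double quotes."""
    # Replace each run of tabs, carriage returns, and newlines with one space.
    text = re.sub(r"[\t\r\n]+", " ", raw)
    # Collapse any repeated spaces left behind and trim both ends.
    text = re.sub(r" {2,}", " ", text).strip()
    # Escape double quotes with a backslash.
    return text.replace('"', '\\"')
```

Note that the escaping step is not idempotent: applying the function twice to the same string would escape the quotes again.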

Changed

  • Projects without a grant code are removed
  • Crossref dump from July 2023
  • ORCID works without a DOI from March 2023
  • Usage counts from July 2023
  • Datacite contents from early July 2023
  • OpenCitations relations from December 2022

v6.0.0

Start Date: 2023-07-26 • Release Date: 2023-08-16 • Dataset release: yes

Changed

  • Relationship data model: flattened properties source, sourceType, target, targetType
  • BIP! indicators are now serialised as an array; see the updated model here
  • Crossref dump from June 2023
  • ORCID works without a DOI from June 2023
  • Usage counts from June 2023
  • Datacite contents from June 2023
  • OpenCitations relations from January 2023
  • BIP! indicators from June 2023
  • New Datasources/Services were added, collected from an updated EOSC Service catalogue endpoint

v5.2.0

Start Date: 2023-07-03 • Release Date: 2023-07-17 • Dataset release: no

Added

  • Citations imported from Crossref & MAG
  • FoS and SDG classifications introduced for ~16Mi research products

Changed

  • Removed the numerical prefix from the OpenAIRE identifiers ("20|openorgs____::..." --> "openorgs____::...")
  • Dataset file names in the Zenodo depositions changed from dump to dataset
  • Crossref dump from May 2023
  • ORCID works without a DOI from June 2023
  • Usage counts from April 2023
  • Datacite contents from June 2023
  • OpenCitations relations from January 2023
  • Deduplication of the datasource
  • Avoid duplicated organisation PIDs
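
Clients that still hold identifiers in the old form may need to normalise them. The sketch below strips the numerical prefix shown in the first item of the list above; it assumes the prefix is always a run of digits followed by a pipe character, which generalises from the single "20|" example given there. The function name is hypothetical.

```python
import re

# One or more digits followed by a pipe, anchored at the start of the identifier.
_NUMERIC_PREFIX = re.compile(r"^\d+\|")


def strip_numeric_prefix(identifier: str) -> str:
    """Return the identifier without its leading numerical prefix, if any."""
    return _NUMERIC_PREFIX.sub("", identifier, count=1)
```

Identifiers that are already in the new form pass through unchanged, so the function is safe to apply to mixed inputs.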

v5.1.3

Start Date: 2023-05-22 • Release Date: 2023-06-12 • Dataset release: no

Added

  • Datasource and project level usage counts

Changed

  • Crossref dump from April 2023
  • ORCID works without a DOI from May 2023
  • Usage counts from April 2023
  • Datacite contents from May 2023
  • OpenCitations relations from January 2023
  • Deduplication of the datasource

v5.1.2

Start Date: 2023-03-20 • Release Date: 2023-04-04 • Dataset release: no

Changed

  • Crossref dump from February 2023
  • ORCID works without a DOI from March 2023
  • Usage counts from February 2023 (+76% Downloads per Datasource for 2023)
  • Datacite contents from mid March 2023
  • OpenCitations relations from January 2023

v5.1.1

Start Date: 2023-02-13 • Release Date: 2023-03-01 • Dataset release: no

Added

Changed

  • Crossref dump from January 2023
  • ORCID works without a DOI from January 2023
  • Usage counts from January 2023
  • Datacite contents from mid February 2023
  • OpenCitations relations from December 2022

v5.1.0

Start Date: 2023-01-16 • Release Date: 2023-01-30 • Dataset release: no

Added

  • Revised SDG classification: better accuracy, lower coverage (will improve in the next months)

Changed

  • Crossref dump from December 2022
  • ORCID works without a DOI from January 2023
  • Usage counts from December 2022
  • DataCite contents from January 2023

v5.0.0

Start Date: 2022-12-19 • Release Date: 2022-12-28 • Dataset release: yes

Added

Changed

  • FOS and SDGs were removed from the ResearchProduct.subjects
  • Measures were removed from the ResearchProduct.instance
  • Updated DOIBoost to include publications from Crossref and the works from ORCID with a DOI until November 2022
  • Added ORCID works without a DOI from November 2022