Reference: D-Net/dnet-hadoop#9
This task aims to introduce a set of extensions to the dhp.Oaf data model and to propagate the changes to other modules and workflows where needed. The changes include:
Structure for the proposed measures to be attached to Result entities
Update: the dhp_oaf_model branch has been aligned with the master branch. @alessia.bardi do we know which data type we must consider for the addition of the H2020 programme information?
Another update:
Update: the information for the H2020 programme has been introduced in the data model: programme is a list of objects of type Programme, each having a code (String) and a description (String).
Each project can be associated with more than one programme.
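A minimal sketch of the structure described above, assuming plain POJOs (class and accessor names here are illustrative, not necessarily those in the actual dhp.Oaf sources): a Programme with a code and a description, both Strings, and a repeatable programme field on the project, so one project can carry several programmes.

```java
import java.util.ArrayList;
import java.util.List;

public class ProgrammeSketch {

    // The Programme type described above: a code and a description.
    static class Programme {
        private final String code;
        private final String description;

        Programme(String code, String description) {
            this.code = code;
            this.description = description;
        }

        String getCode() { return code; }
        String getDescription() { return description; }
    }

    // A project holds a list of Programme objects (repeatable field).
    static class Project {
        private final List<Programme> programme = new ArrayList<>();
        List<Programme> getProgramme() { return programme; }
    }

    public static void main(String[] args) {
        Project p = new Project();
        // illustrative values, not taken from the actual CSV
        p.getProgramme().add(new Programme("H2020-EU.1.1.", "Excellent Science"));
        p.getProgramme().add(new Programme("H2020-EU.1.3.", "Marie Sklodowska-Curie Actions"));
        System.out.println(p.getProgramme().size() + " programmes on one project");
    }
}
```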
The update to the projects is created as an action set, produced as follows:
we select from the set of projects in the CSV only those also present in the database, then we join them with the programme information in the other CSV file and create a set of atomic actions, one for each project for which we get a match.
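The selection and join described above can be sketched in plain Java as follows (a simplified stand-in for the actual Spark job; method and variable names are hypothetical): keep only the CSV projects also present in the database, join them with the programme CSV by project code, and emit one atomic action per matched project.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class PrepareProgrammeSketch {

    // Returns one action per project that is both present in the database
    // and matched by the programme CSV.
    static List<String> prepareActions(List<String> csvProjectCodes,
                                       Set<String> dbProjectCodes,
                                       Map<String, String> programmeByProject) {
        return csvProjectCodes.stream()
                .filter(dbProjectCodes::contains)        // keep projects known to the DB
                .filter(programmeByProject::containsKey) // join with the programme CSV
                .map(code -> "UPDATE " + code + " -> " + programmeByProject.get(code))
                .collect(Collectors.toList());           // one atomic action per match
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList("p1", "p2", "p3");
        Set<String> db = new HashSet<>(Arrays.asList("p1", "p3"));
        Map<String, String> prog = Map.of("p1", "H2020-EU.1.1.");
        System.out.println(prepareActions(csv, db, prog)); // only p1 matches both sources
    }
}
```

In the real workflow the two inputs are distributed datasets and the join is performed by Spark, but the shape of the operation (filter, join on project code, map to atomic action) is the same.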
Two other requests that imply changes in the data model:
Another aspect that impacts the model definition derives from the record merge operation performed in the deduplication workflow on each group of duplicates. Given that the representative records produced by this operation conform to the exact same model used for the duplicated records, the merge policy generally gathers all the occurrences of a given field from the duplicates and sets them in the corresponding repeatable field of the representative record being built.
This approach works well when no restrictions are applied to the N values from the duplicates. However, the field publisher is declared as non-repeatable, so the winning value depends only on the trust-based ordering performed in the merge procedure (this case was originally spotted in #5915). One idea to solve this case would be to move the definition of the field publisher inside each record instance, but the issue can be generalized to every field defined as non-repeatable.
Make room for the OpenAccess statuses:
As indicated by the method to be applied on the Unpaywall records
In order to support, in the future, the validation of context tags in a way similar to how we address the validation of result-project links, I think we should also add the validated and validationDate properties to the Context model class.
The information about the validation of context tags can already be set when the tag has been added via a claim, because the gateway curators can already approve/reject claims. Further ways to get this kind of feedback for tags that do not come from claims still need to be thought out.
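A minimal sketch of the proposed extension, assuming the Context class gains fields analogous to the ones used for result-project links (field and accessor names here are assumptions, not the final model):

```java
public class ContextSketch {

    // Simplified Context with the proposed validated/validationDate properties.
    static class Context {
        private String id;
        private boolean validated;
        private String validationDate; // 'yyyy-MM-dd', set e.g. when a curator approves a claim

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }

        public boolean isValidated() { return validated; }
        public void setValidated(boolean validated) { this.validated = validated; }

        public String getValidationDate() { return validationDate; }
        public void setValidationDate(String validationDate) { this.validationDate = validationDate; }
    }

    public static void main(String[] args) {
        Context c = new Context();
        c.setId("eosc");               // hypothetical context id
        c.setValidated(true);          // tag approved via a claim
        c.setValidationDate("2021-03-15");
        System.out.println(c.getId() + " validated=" + c.isValidated());
    }
}
```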
Let me also remind you that the main goals of the validated property are: