dnet-applications/apps/dnet-orgs-database-application
Michele Artini 75a27e1fbd openaire greph node ids 2022-05-26 11:47:01 +02:00

README.txt

First Import
============
The first import of the organizations should be performed using the SQL script first_import_grid_ac.sql:

1) Download the latest dump from https://www.grid.ac
2) Update the paths in the sql script
3) Launch the script

If you want to add missing ROR identifiers:

1) Download ror.json from https://figshare.com/collections/ROR_Data/4596503
2) Update the paths in prepare_grid_ror_update.pl and update_ror_ids.sql
3) Launch prepare_grid_ror_update.pl
4) Launch update_ror_ids.sql

NB: The grid.ac dump is richer than the ROR dump: ROR does not include some fields (city, lat, lng) or the hierarchical relationships among the organizations.
If grid.ac is deprecated, we will start using the import from ROR (a script is available: prepare_import_ror.pl)


General Description
===================

# Schema

Main table: 
	organizations
Tables for Multiple properties: 
	acronyms, 
	urls, 
	other_ids, 
	other_names
Tables for vocabularies: 
	countries, 
	languages, 
	id_types, 
	org_types,
	relationships (e.g.: child, parent, merged_in, merges, ...)
Table for conflicts and duplicates:
	oa_conflicts,
	oa_duplicates
Specific Views for the UI: 
	organizations_view
	organizations_simple_view
	organizations_info_view
	suggestions_info_by_country_view
	oa_duplicates_view
	conflict_groups_view
	duplicate_groups_view
To manage authorizations: 
	users, 
	user_roles, 
	user_countries, 
	users_view (VIEW)
Other: 
	organizations_id_seq (SEQUENCE to generate new OpenOrg IDs), 
	org_index_search (for fulltext search), 
	tmp_dedup_events (to import new suggestions from the Dedup workflow)


# User Roles

User:
	Can work only on organizations of specific countries 
	Can edit the metadata of approved organizations
	Can manage duplicates
National Admin: 
	All the User rights
	Can work only on organizations of specific countries 
	Can approve/register organizations
	Can manage conflicts
	Can approve users of their own countries
Super Admin:
	All the National Admin rights, but for all countries
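
The role rules above can be read as a small permission check. The following is a minimal Python sketch, not the application's actual code: the names Role, User and can, and the action names, are hypothetical, and the sketch ignores the organization status (for example, that metadata editing applies to approved organizations).

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    USER = "user"
    NATIONAL_ADMIN = "national_admin"
    SUPER_ADMIN = "super_admin"


# Rights per role, following the list above (action names are hypothetical).
USER_RIGHTS = {"edit_metadata", "manage_duplicates"}
ADMIN_RIGHTS = USER_RIGHTS | {"approve_org", "manage_conflicts", "approve_users"}

RIGHTS = {
    Role.USER: USER_RIGHTS,
    Role.NATIONAL_ADMIN: ADMIN_RIGHTS,
    Role.SUPER_ADMIN: ADMIN_RIGHTS,
}


@dataclass
class User:
    role: Role
    countries: set = field(default_factory=set)


def can(user: User, action: str, country: str) -> bool:
    """Return True if the user may perform the action on an org of the given country."""
    if action not in RIGHTS[user.role]:
        return False
    # A Super Admin has the National Admin rights for all countries.
    if user.role is Role.SUPER_ADMIN:
        return True
    return country in user.countries
```

With this sketch, can(User(Role.USER, {"IT"}), "approve_org", "IT") is False, while the same call for a National Admin of "IT" is True.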

# Actions

1) Create a new org from scratch
	The ID is a valid OpenOrgId (generated by the system)
	The status is 'approved'

2) Approve a suggested org (prefix: pending_org_::)
	ID: A new org is created with OpenOrg Id and status='approved' 
	Copy the duplicates from the old organization to the new one (status will be 'suggested')
	The pending org is deleted
	
3) Approve a suggested duplicate (the status of the duplicates is always 'raw')
	in oa_duplicates: reltype = 'is_similar'

4) Discard a suggested duplicate
	in oa_duplicates: reltype = 'is_different'

5) Resolve a conflict using a subset of suggested conflicts (approve)
	Generate a new org 
	New org status: 'approved'
	Conflict reltype: 'is_similar'
	Old orgs status: 'hidden'
	Rels new -> old : 'merges'
	Rels old -> new : 'merged_in'

6) Resolve a conflict using a subset of suggested conflicts (discard)
	Conflict reltype: 'is_different'
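
Actions 5 and 6 can be sketched as follows. This is an illustrative Python model under stated assumptions, not the application's code: Store is a hypothetical in-memory stand-in for the organizations, relationships and oa_conflicts tables, the new ID is passed in rather than generated, and the conflict reltype is assumed to be set on every pair of the selected subset.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the database tables."""
    org_status: dict = field(default_factory=dict)     # org id -> status
    relationships: set = field(default_factory=set)    # (source id, reltype, target id)
    conflicts: dict = field(default_factory=dict)      # frozenset({id1, id2}) -> reltype


def approve_conflict(store: Store, old_ids: list, new_id: str) -> None:
    """Action 5: merge the selected conflicting orgs into a new approved org."""
    store.org_status[new_id] = "approved"
    # Assumption: each pair of the selected subset is marked as similar.
    for a, b in combinations(old_ids, 2):
        store.conflicts[frozenset((a, b))] = "is_similar"
    for old_id in old_ids:
        store.org_status[old_id] = "hidden"
        store.relationships.add((new_id, "merges", old_id))
        store.relationships.add((old_id, "merged_in", new_id))


def discard_conflict(store: Store, old_ids: list) -> None:
    """Action 6: mark the selected conflicting orgs as different."""
    for a, b in combinations(old_ids, 2):
        store.conflicts[frozenset((a, b))] = "is_different"
```

Keying the conflicts by an unordered pair is a design choice of the sketch: it makes the (A, B) and (B, A) conflicts the same entry.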
		
# Loading new suggestions using a Dedup Workflow
	The dedup workflow writes the suggestions to the tmp_dedup_events table; at the end it calls the method /import/dedupEvents
	The previous suggestions (orgs, dups and conflicts) are deleted
	The suggestions are moved from the temp table according to:
		1) not(isOpenOrg(oa_original_id)) AND (oa_original_id = local_id OR isEmpty(local_id)) -> new suggested org with id = 'pending_org_::...'
		2) not(isOpenOrg(oa_original_id)) AND (oa_original_id != local_id OR isEmpty(local_id)) -> duplicate of a suggested org
		3) isOpenOrg(oa_original_id) AND (oa_original_id != local_id OR isEmpty(local_id)) -> duplicate of an existing OpenOrg
		4) Create a group using 'group_id', it should contain only OpenOrg Ids (using oa_original_id and local_id): each pair of IDs in the group is a conflict
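
The four rules above can be sketched in Python as below. This is only one reading of the rules, not the actual import code: the function names are hypothetical, the isOpenOrg test is passed in as a parameter because its definition is not given here, rules 1-3 are assumed to be evaluated in order (the first match wins, which settles the overlap between rules 1 and 2 when local_id is empty), and non-OpenOrg IDs are assumed to be dropped from a group.

```python
from itertools import combinations
from typing import Callable, Iterable, Optional


def classify_event(oa_original_id: str, local_id: Optional[str],
                   is_open_org: Callable[[str], bool]) -> Optional[str]:
    """Apply rules 1-3 in order; the first matching rule wins."""
    empty = not local_id
    same = oa_original_id == local_id
    if not is_open_org(oa_original_id):
        if same or empty:
            return "suggested_org"             # rule 1: id = 'pending_org_::...'
        return "duplicate_of_suggested_org"    # rule 2
    if not same or empty:
        return "duplicate_of_open_org"         # rule 3
    return None  # not covered by rules 1-3


def conflict_pairs(group_ids: Iterable[str],
                   is_open_org: Callable[[str], bool]) -> list:
    """Rule 4: every pair of distinct OpenOrg IDs in a group is a conflict."""
    ids = sorted({i for i in group_ids if i and is_open_org(i)})
    return list(combinations(ids, 2))
```

A group of three OpenOrg IDs therefore yields three conflicts, one for each pair.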