dnet-applications/apps/dnet-orgs-database-application
Michele Artini 303e1433ac skip empty values importing urls and acronyms 2021-04-23 10:00:51 +02:00

README.txt

First Import
============
The first import of the organizations should be performed using the SQL script first_import_grid_ac.sql

1) Download the latest dump from https://www.grid.ac
2) Update the paths in the sql script
3) Launch the script

If you want to add missing ROR identifiers:

1) Download ror.json from https://figshare.com/collections/ROR_Data/4596503
2) Update the paths in prepare_grid_ror_update.pl and update_ror_ids.sql
3) Launch prepare_grid_ror_update.pl
4) Launch update_ror_ids.sql
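
The matching between GRID and ROR identifiers can be illustrated with a short sketch. It is not the logic of prepare_grid_ror_update.pl (which is not shown here): the record layout used below (an "id" field plus "external_ids" -> "GRID" -> "preferred"/"all") is an assumption about ror.json and should be checked against the downloaded file. The function name grid_to_ror and the sample records are hypothetical.

```python
import json


def grid_to_ror(records):
    """Map each GRID id to the ROR id of the record that declares it.

    Assumed record layout (verify against the real ror.json):
    {"id": "<ror id>", "external_ids": {"GRID": {"preferred": ..., "all": ...}}}
    """
    mapping = {}
    for rec in records:
        ror_id = rec.get("id")
        grid = (rec.get("external_ids") or {}).get("GRID") or {}
        # "preferred" may be missing or null, so fall back to "all"
        candidates = grid.get("preferred") or grid.get("all")
        if not ror_id or not candidates:
            continue  # record without a usable GRID id: nothing to update
        if isinstance(candidates, str):
            candidates = [candidates]
        for grid_id in candidates:
            mapping[grid_id] = ror_id
    return mapping


# Hypothetical sample records, not taken from a real dump
sample = json.loads("""[
  {"id": "https://ror.org/example1",
   "external_ids": {"GRID": {"preferred": "grid.0000.1", "all": "grid.0000.1"}}},
  {"id": "https://ror.org/example2", "external_ids": {}}
]""")

for grid_id, ror_id in grid_to_ror(sample).items():
    print(grid_id, ror_id)
```

The resulting pairs are the kind of input an update script would need in order to add a ROR entry next to an existing GRID id.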

NB: The grid.ac dump is richer than the ROR dump: ROR does not include some fields (city, lat, lng) or the hierarchical relationships among the organizations.
If grid.ac is deprecated, we will switch to the import from ROR (a script is available: prepare_import_ror.pl)


General Description
===================

# Schema

Main table: 
	organizations
Tables for Multiple properties: 
	acronyms, 
	urls, 
	other_ids, 
	other_names
Tables for vocabularies: 
	countries, 
	languages, 
	id_types, 
	org_types,
	relationships (e.g. child, parent, merged_in, merges, ...)
Table for conflicts and duplicates:
	oa_conflicts,
	oa_duplicates
Specific Views for the UI: 
	organizations_view
	organizations_simple_view
	organizations_info_view
	suggestions_info_by_country_view
	oa_duplicates_view
	conflict_groups_view
	duplicate_groups_view
To manage authorizations: 
	users, 
	user_roles, 
	user_countries, 
	users_view (VIEW)
Other: 
	organizations_id_seq (SEQUENCE to generate new OpenOrg IDs), 
	org_index_search (for fulltext search), 
	tmp_dedup_events (to import new suggestions from the Dedup workflow)


# User Roles

User:
	Can work only on organizations of the assigned countries
	Can edit metadata of approved organizations
	Can manage duplicates
National Admin: 
	All the User rights
	Can work only on organizations of the assigned countries
	Can approve/register organizations
	Can manage conflicts
	Can approve users of the admin's own countries
Super Admin:
	All the National Admin rights, but for all countries
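
The role rules above can be read as a simple permission check. The following is a minimal sketch, not the application's actual authorization code: the action names, the Role enum and the is_allowed function are hypothetical labels introduced only for illustration.

```python
from enum import Enum


class Role(Enum):
    USER = 1
    NATIONAL_ADMIN = 2
    SUPER_ADMIN = 3


# Hypothetical action names derived from the role descriptions
USER_ACTIONS = {"edit_metadata", "manage_duplicates"}
ADMIN_ACTIONS = USER_ACTIONS | {
    "approve_organization",
    "manage_conflicts",
    "approve_users",
}


def is_allowed(role, user_countries, action, country):
    """Return True if the role may perform the action for the given country."""
    if role is Role.SUPER_ADMIN:
        # National Admin rights, without the country restriction
        return action in ADMIN_ACTIONS
    if country not in user_countries:
        # User and National Admin are limited to their own countries
        return False
    if role is Role.NATIONAL_ADMIN:
        return action in ADMIN_ACTIONS
    return action in USER_ACTIONS


print(is_allowed(Role.USER, {"IT"}, "manage_duplicates", "IT"))
print(is_allowed(Role.USER, {"IT"}, "manage_conflicts", "IT"))
print(is_allowed(Role.SUPER_ADMIN, set(), "manage_conflicts", "FR"))
```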

# Actions

1) Create a new org from scratch
	The ID is a valid OpenOrgId (generated by the system)
	The status is 'approved'

2) Approve a suggested org
	ID: A new org is created with OpenOrg Id and status='approved' 
	Status of old organization: 'duplicate'
	Add a new duplicate to the old Id (status = 'approved') 
	Copy the duplicates from old to new organizations (status will be 'suggested')
	
3) Approve a suggested duplicate
	in oa_duplicates: reltype = 'is_similar'
	in organization: the duplicated org will have status = 'duplicate'
4) Discard a suggested duplicate
	in oa_duplicates: reltype = 'is_different'

5) Resolve a conflict using a subset of suggested conflicts (approve)
	Generate a new org 
	New org status: 'approved'
	Conflict reltype: 'is_similar'
	Old orgs: 'hidden'
	Rels new -> old : 'merges'
	Rels old -> new : 'merged_in'
6) Resolve a conflict using a subset of suggested conflicts (discard)
	Conflict reltype: 'is_different'
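
As an illustration of actions 5 and 6, the sketch below applies the listed state changes to an in-memory store. It is only a model of the rules, not the application's code: the Store class, the generated id format and the choice to mark every pair of the selected subset with the same reltype are assumptions.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the database tables."""
    status: dict = field(default_factory=dict)     # org id -> status
    rels: set = field(default_factory=set)         # (source, reltype, target)
    conflicts: dict = field(default_factory=dict)  # frozenset({a, b}) -> reltype
    next_id: int = 1


def new_org_id(store):
    # Placeholder id; the real system generates an OpenOrg id from a sequence
    org_id = f"new-org-{store.next_id}"
    store.next_id += 1
    return org_id


def resolve_conflict(store, org_ids, approve):
    """Approve or discard a conflict among the selected organizations."""
    pairs = [frozenset(p) for p in combinations(sorted(org_ids), 2)]
    if not approve:
        for pair in pairs:
            store.conflicts[pair] = "is_different"
        return None
    new_id = new_org_id(store)
    store.status[new_id] = "approved"
    for pair in pairs:
        store.conflicts[pair] = "is_similar"
    for old_id in org_ids:
        store.status[old_id] = "hidden"
        store.rels.add((new_id, "merges", old_id))
        store.rels.add((old_id, "merged_in", new_id))
    return new_id


store = Store(status={"a": "approved", "b": "approved"})
merged = resolve_conflict(store, ["a", "b"], approve=True)
print(merged, store.status)
```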
		
# Load of new suggestions using a Dedup Workflow
	The dedup workflow writes the suggestions to the tmp_dedup_events table; at the end it calls the method /import/dedupEvents
	The previous suggestions (orgs, dups and conflicts) are deleted
	The suggestions are moved from the temp table according to:
		1) not(isOpenOrg(oa_original_id)) AND (oa_original_id = local_id OR isEmpty(local_id)) -> suggested org
		2) not(isOpenOrg(oa_original_id)) AND (oa_original_id != local_id OR isEmpty(local_id)) -> duplicate of a suggested org
		3) isOpenOrg(oa_original_id) AND (oa_original_id != local_id OR isEmpty(local_id)) -> duplicate of an existing OpenOrg
		4) Create a group using 'group_id'; it should contain only OpenOrg Ids (using oa_original_id and local_id): each pair in the group is a conflict
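
The four rules above can be sketched as follows. This is an illustration, not the code behind /import/dedupEvents. The OpenOrg id prefix, the function names and the returned labels are assumptions. Because rules 1 and 2 overlap when local_id is empty, the sketch evaluates them in the listed order and returns the first match, which is also an assumption.

```python
from itertools import combinations

# Assumed prefix of OpenOrg ids; adjust to the real id format
OPENORG_PREFIX = "openorgs____::"


def is_open_org(org_id):
    return bool(org_id) and org_id.startswith(OPENORG_PREFIX)


def classify(oa_original_id, local_id):
    """Apply rules 1-3 in order and return a label, or None if no rule matches."""
    same_or_empty = not local_id or oa_original_id == local_id
    diff_or_empty = not local_id or oa_original_id != local_id
    if not is_open_org(oa_original_id):
        if same_or_empty:
            return "suggested_org"                # rule 1
        if diff_or_empty:
            return "duplicate_of_suggested_org"   # rule 2
    elif diff_or_empty:
        return "duplicate_of_openorg"             # rule 3
    return None


def conflicts_for_group(ids):
    """Rule 4: every pair of distinct OpenOrg ids in a group is a conflict."""
    open_ids = sorted({i for i in ids if is_open_org(i)})
    return list(combinations(open_ids, 2))


print(classify("x::1", "x::1"))
print(classify("x::2", "x::1"))
print(conflicts_for_group([OPENORG_PREFIX + "1", OPENORG_PREFIX + "2", "x::3"]))
```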