dnet-applications/apps/dnet-orgs-database-application/README.txt


First Import
============
The first import of the organizations should be performed using the SQL script first_import_grid_ac.sql:
1) Download the latest dump from https://www.grid.ac
2) Update the paths in the sql script
3) Launch the script
If you want to add missing ROR identifiers:
1) Download ror.json from https://figshare.com/collections/ROR_Data/4596503
2) Update the paths in prepare_grid_ror_update.pl and update_ror_ids.sql
3) Launch prepare_grid_ror_update.pl
4) Launch update_ror_ids.sql
NB: The grid.ac dump is richer than the ROR dump: ROR does not include some fields (city, lat, lng) or the hierarchical relationships among the organizations.
If grid.ac is DEPRECATED, we will start using the import from ROR (a script is available: prepare_import_ror.pl)
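
The ROR identifier update above boils down to building a GRID-to-ROR lookup from ror.json. The following is only a minimal sketch of that idea, not the logic of prepare_grid_ror_update.pl. It assumes each record carries an `id` field and an `external_ids.GRID.preferred` (or `all`) entry, which depends on the version of the ROR dump. The function name and the sample identifiers are made up for illustration.

```python
import json


def grid_to_ror(records):
    """Map each GRID id to its ROR id (record layout is an assumption)."""
    mapping = {}
    for rec in records:
        grid = rec.get("external_ids", {}).get("GRID", {})
        grid_id = grid.get("preferred") or grid.get("all")
        # "all" may be a single string or a list, depending on the dump
        if isinstance(grid_id, list):
            grid_id = grid_id[0] if grid_id else None
        if grid_id:
            mapping[grid_id] = rec["id"]
    return mapping


# Tiny in-memory sample standing in for ror.json (identifiers are fake)
sample = json.loads("""
[
  {"id": "https://ror.org/00example1",
   "external_ids": {"GRID": {"preferred": "grid.0000.1", "all": "grid.0000.1"}}},
  {"id": "https://ror.org/00example2",
   "external_ids": {}}
]
""")

lookup = grid_to_ror(sample)
```

Records without a GRID entry are simply skipped, so only organizations already imported from grid.ac would receive a ROR identifier.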
General Description
===================
# Schema
Main table:
organizations
Tables for Multiple properties:
acronyms,
urls,
other_ids,
other_names
Tables for vocabularies:
countries,
languages,
id_types,
org_types,
relationships (e.g. child, parent, merged_in, merges, ...)
Table for conflicts and duplicates:
oa_conflicts,
oa_duplicates
Specific Views for the UI:
organizations_view
organizations_simple_view
organizations_info_view
suggestions_info_by_country_view
oa_duplicates_view
conflict_groups_view
duplicate_groups_view
To manage authorizations:
users,
user_roles,
user_countries,
users_view (VIEW)
Other:
organizations_id_seq (SEQUENCE to generate new OpenOrg IDs),
org_index_search (for fulltext search),
tmp_dedup_events (to import new suggestions from DedupWF)
# User Roles
User:
Can work only on organizations of specific countries
Can edit metadata of approved organizations
Can manage duplicates
National Admin:
All the User rights
Can work only on organizations of specific countries
Can approve/register organizations
Can manage conflicts
Can approve users of their own countries
Super Admin:
All the National Admin rights, but for all countries
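
The role rules above can be sketched as a small permission check. This is an illustrative sketch only: the role constants, action names, `Account` class and `is_allowed` function are hypothetical and are not taken from the application code. The restriction to approved organizations when editing metadata is not modeled here.

```python
from dataclasses import dataclass, field

USER, NATIONAL_ADMIN, SUPER_ADMIN = "USER", "NATIONAL_ADMIN", "SUPER_ADMIN"

# Actions granted to each role, following the list above (names are hypothetical)
USER_ACTIONS = {"edit_metadata", "manage_duplicates"}
ADMIN_ACTIONS = USER_ACTIONS | {"approve_org", "manage_conflicts", "approve_users"}
ROLE_ACTIONS = {
    USER: USER_ACTIONS,
    NATIONAL_ADMIN: ADMIN_ACTIONS,
    SUPER_ADMIN: ADMIN_ACTIONS,
}


@dataclass
class Account:
    role: str
    countries: set = field(default_factory=set)


def is_allowed(account, action, country):
    """Return True if the account may perform the action in the given country."""
    if action not in ROLE_ACTIONS.get(account.role, set()):
        return False
    if account.role == SUPER_ADMIN:
        return True  # same rights as a National Admin, but for all countries
    return country in account.countries


user = Account(USER, {"IT"})
admin = Account(NATIONAL_ADMIN, {"FR"})
root = Account(SUPER_ADMIN)
```

For example, `is_allowed(user, "edit_metadata", "IT")` holds, while `is_allowed(user, "approve_org", "IT")` and `is_allowed(admin, "manage_conflicts", "IT")` do not.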
# Actions
1) Create a new org from scratch
The ID is a valid OpenOrgId (generated by the system)
The status is 'approved'
2) Approve a suggested org (prefix: pending_org_::)
ID: A new org is created with OpenOrg Id and status='approved'
Copy the duplicates from old to new organizations (status will be 'suggested')
The pending org is deleted
3) Approve a suggested duplicate (the status of the duplicates is always 'raw')
in oa_duplicates: reltype = 'is_similar'
4) Discard a suggested duplicate
in oa_duplicates: reltype = 'is_different'
5) Resolve a conflict using a subset of suggested conflicts (approve)
Generate a new org
New org status: 'approved'
Conflict reltype: 'is_similar'
Old orgs status: 'hidden'
Rels new <-> old : 'merges'
Rels old <-> new : 'merged_in'
6) Resolve a conflict using a subset of suggested conflicts (discard)
Conflict reltype: 'is_different'
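
Actions 5 and 6 can be summarized as a small state transition. The sketch below is an assumption-laden illustration, not the application's implementation: organizations are modeled as a plain dict from id to status, and the function name, the tuple layout of the relationships and the sample ids are hypothetical.

```python
from itertools import combinations


def resolve_conflict(orgs, group, new_id=None, approve=True):
    """Apply action 5 (approve) or 6 (discard) to a group of conflicting orgs.

    orgs maps org id -> status and is updated in place.
    Returns (conflicts, rels): the reltype for each pair of the group and
    the relationships created between the new org and the old ones.
    """
    reltype = "is_similar" if approve else "is_different"
    conflicts = {pair: reltype for pair in combinations(sorted(group), 2)}
    rels = []
    if approve:
        orgs[new_id] = "approved"          # the newly generated org
        for old in sorted(group):
            orgs[old] = "hidden"           # old orgs are hidden
            rels.append((new_id, "merges", old))
            rels.append((old, "merged_in", new_id))
    return conflicts, rels


orgs = {"org_a": "approved", "org_b": "approved"}
conflicts, rels = resolve_conflict(orgs, {"org_a", "org_b"}, new_id="org_new")
```

When `approve=False`, only the conflict reltype changes: no org is created and no status is touched.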
# Load of new suggestion using a Dedup Workflow
The dedup workflow writes the suggestions to the tmp_dedup_events table; at the end, it calls the method /import/dedupEvents
The previous suggestions (orgs, dups and conflicts) are deleted
The suggestions are moved from the temp table according to:
1) not(isOpenOrg(oa_original_id)) AND (oa_original_id = local_id OR isEmpty(local_id)) -> new suggested org with id = 'pending_org_::...'
2) not(isOpenOrg(oa_original_id)) AND (oa_original_id != local_id OR isEmpty(local_id)) -> duplicate of a suggested org
3) isOpenOrg(oa_original_id) AND (oa_original_id != local_id OR isEmpty(local_id)) -> duplicate of an existing OpenOrg
4) Create a group using 'group_id', it should contain only OpenOrg Ids (using oa_original_id and local_id): each pair of the group is a conflict
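
The four rules above can be sketched as follows. This is a hedged illustration, not the SQL actually run by /import/dedupEvents. The `OPENORG_PREFIX` value and all function names are assumptions. As written, rules 1 and 2 both match when local_id is empty, so the sketch reports every matching rule instead of guessing a precedence.

```python
from itertools import combinations

OPENORG_PREFIX = "openorgs____::"  # assumption: how an OpenOrg id is recognized


def is_open_org(org_id):
    return bool(org_id) and org_id.startswith(OPENORG_PREFIX)


def is_empty(value):
    return value is None or value.strip() == ""


def matching_rules(oa_original_id, local_id):
    """Return the labels of rules 1-3 that match a row of tmp_dedup_events."""
    same = oa_original_id == local_id
    empty = is_empty(local_id)
    rules = []
    if not is_open_org(oa_original_id):
        if same or empty:
            rules.append("new_suggested_org")           # rule 1
        if not same or empty:
            rules.append("duplicate_of_suggested_org")  # rule 2
    elif not same or empty:
        rules.append("duplicate_of_open_org")           # rule 3
    return rules


def conflict_pairs(group_ids):
    """Rule 4: every pair of OpenOrg ids in a group is a conflict."""
    open_ids = sorted({i for i in group_ids if is_open_org(i)})
    return list(combinations(open_ids, 2))
```

For instance, a group holding three OpenOrg ids yields three conflicts, one for each pair, while ids that are not OpenOrg ids are ignored.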