Compare commits

...

1028 Commits

Author SHA1 Message Date
dimitrispie aedd279f78 Updates Promotion DBs
- Add a step for promoting the split monitor DBs
2023-07-13 15:35:46 +03:00
Claudio Atzori 5b6844b969 mapping funding relations from Datacite should be done according to the actual result identifier 2021-07-23 18:14:37 +02:00
Claudio Atzori ffdb2a3ea3 [cleaning] fixed filtering function for missing titles 2021-07-23 11:55:55 +02:00
Alessia Bardi 9069958479 tests for enermaps 2021-07-20 19:31:43 +02:00
Claudio Atzori 77e8c6c7f7 filtering 'old' OpenAIRE ids from the entity.originalId[] array in the OAF -> XML serialization procedure 2021-07-20 11:51:33 +02:00
Claudio Atzori 5947cddafc adding record identifier among the originalIds regardless of what IdentifierFactory produces 2021-07-19 17:52:24 +02:00
Miriam Baglioni 13cf444f85 Merge pull request 'force orginalId for claimed records' (#124) from forceOrginalId_claims into master
Reviewed-on: D-Net/dnet-hadoop#124
2021-07-19 17:41:58 +02:00
Claudio Atzori 5e5f65a3c3 contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph 2021-07-19 15:56:55 +02:00
Claudio Atzori 9913b6073c Merge pull request 'orcid-no-doi' (#123) from enrico.ottonello/dnet-hadoop:orcid-no-doi into master
Reviewed-on: D-Net/dnet-hadoop#123
2021-07-15 17:53:58 +02:00
Enrico Ottonello 2dc50c0999 added default value to process path 2021-07-14 17:02:22 +02:00
Enrico Ottonello 66604bb2b4 added absolute path to process folder 2021-07-14 16:44:51 +02:00
Enrico Ottonello 7840cc6526 merged with master 2021-07-14 15:33:59 +02:00
Enrico Ottonello a65667d217 added publication to dataset even if no contributors 2021-07-14 15:07:07 +02:00
Sandro La Bruzzo 10068c00ea Code refactor:
- removed old workflows in doiboost
- split the workflow of doiboost into preprocess and process
2021-07-14 14:45:50 +02:00
Miriam Baglioni 1cdd09cd8e Tentative fix for testing of Jenkins 2021-07-14 11:14:59 +02:00
Sandro La Bruzzo 4cb65bc64a fixed process doiboost workflow:
- split OrcidToOAF into two phases: preprocess and process
- updated workflow used in production
2021-07-14 09:44:32 +02:00
Claudio Atzori 734de62474 [doiboost] added workflow for the ActionSet update dedicated to production 2021-07-13 17:26:04 +02:00
Claudio Atzori fa720c1da4 [doiboost] added workflow for the ActionSet update dedicated to production 2021-07-13 16:59:30 +02:00
Claudio Atzori 9629569e22 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2021-07-13 16:04:08 +02:00
Claudio Atzori f13e11e3f7 [aggregation] datacite wf: defined parameter declaring the path used to store the OAF objects produced by the transformation phase 2021-07-13 16:04:02 +02:00
Miriam Baglioni f5486ffb14 Fixed issues to tests 2021-07-13 14:07:45 +02:00
Claudio Atzori e0061232e9 [aggregation] datacite wf: conditional creation of links, optional resume from intermediate phases 2021-07-13 13:41:21 +02:00
Claudio Atzori 28a66af425 updated URL in the issueManagement tag 2021-07-13 11:52:24 +02:00
Claudio Atzori 783988af06 depending on dhp-schemas:2.6.14 (release) 2021-07-13 11:17:25 +02:00
Claudio Atzori 9038fdc771 depending on dhp-schemas:2.7.14 (release) 2021-07-12 17:46:12 +02:00
Sandro La Bruzzo bbe8193930 merged stable ids 2021-07-12 17:00:43 +02:00
Claudio Atzori ae2b47b29d [broker] added coalesce(1) on the stats dataset before storing it on postgres 2021-07-09 15:47:51 +02:00
Sandro La Bruzzo 57c74c73c6 fixed mistakes in oozie workflow 2021-07-09 12:28:09 +02:00
Sandro La Bruzzo 61ccb54fde removed wrong loop on oozie wf 2021-07-09 12:17:57 +02:00
Sandro La Bruzzo 9f5a0f3ab6 moved wf indexing of Scholexplorer in dhp-graph-provision 2021-07-09 12:06:43 +02:00
Sandro La Bruzzo 09fccf8000 added workflow to serialize scholix and summary in json 2021-07-09 11:01:42 +02:00
Sandro La Bruzzo 0ea576745f updated CreateInputGraph because generics don't work on Spark Dataset 2021-07-09 10:29:24 +02:00
Sandro La Bruzzo cd17e19044 implemented branch workflow to import datacite and crossref in scholexplorer 2021-07-08 21:20:19 +02:00
Miriam Baglioni c30f3ce647 merge doi normalization 2021-07-08 19:20:02 +02:00
Sandro La Bruzzo 8a034e46e1 updated baseline workflow 2021-07-08 11:11:41 +02:00
Claudio Atzori b7b8e0986e [raw_all] The claim merge procedure includes the claimed contexts in the merged result 2021-07-08 10:42:31 +02:00
Sandro La Bruzzo 0799ac9fb6 fixed wrong path 2021-07-08 10:36:37 +02:00
Sandro La Bruzzo 4d53402712 extended ebiLinks to create a dataset before generation of OAF 2021-07-08 10:26:21 +02:00
Sandro La Bruzzo a4a54a3786 code refactor 2021-07-08 09:08:25 +02:00
Sandro La Bruzzo a01dbe0ab0 completed workflow of generation of scholix and summaries 2021-07-07 23:10:34 +02:00
Claudio Atzori fdcff42e46 [raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity 2021-07-07 19:01:59 +02:00
Claudio Atzori 777536ce91 [aggregation] string values used as regular expressions in the OAI collection classes are defined in a single point as constants, to be reused across the code (PR#122) 2021-07-07 11:23:48 +02:00
Claudio Atzori bc014023c8 Merge pull request 'to solve the scala SI-3623' (#122) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
Reviewed-on: D-Net/dnet-hadoop#122
2021-07-07 11:13:51 +02:00
Claudio Atzori 32bdfdccbc [raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity 2021-07-07 11:08:27 +02:00
Andreas Czerniak ebf3f47a02 from&until more OAI2.0 compl., adding tfs 2021-07-07 09:29:49 +02:00
Claudio Atzori f580cb77e1 added mapping for claim relation 'resultResult_publicationDataset_isRelatedTo' (present on BETA) 2021-07-06 21:11:11 +02:00
Sandro La Bruzzo ed684874f2 deleted old scholix project 2021-07-06 17:20:08 +02:00
Sandro La Bruzzo 8535506c22 added scholix generation 2021-07-06 17:18:06 +02:00
Sandro La Bruzzo 4c54bd8742 add test to verify merge scholix on source 2021-07-06 11:32:14 +02:00
Andreas Czerniak 3531802710 to solve the scala SI-3623 2021-07-06 11:30:56 +02:00
Sandro La Bruzzo 7d8db2eb8a betterRenamingMethod 2021-07-06 09:56:32 +02:00
Sandro La Bruzzo c952c8d236 generate first side of scholix mapping 2021-07-06 09:53:14 +02:00
Claudio Atzori 70ded407bb HttpClient used in metadata collection retries also on 404 2021-07-05 18:04:30 +02:00
Miriam Baglioni 7177c25261 added check for null value during doi normalization 2021-07-05 16:22:38 +02:00
Miriam Baglioni 0892cad4e8 the normalization of the content of value was not visible outside the block. Moved doi normalization operation while returning value 2021-07-05 16:21:42 +02:00
Claudio Atzori 350a0823bd Merge pull request 'using organization ids instead of names in monitor db creation' (#121) from antonis.lempesis/dnet-hadoop:stable_ids into stable_ids
Reviewed-on: D-Net/dnet-hadoop#121
2021-07-05 11:07:39 +02:00
Antonis Lempesis 89e6f46682 using organization ids instead of names in monitor db creation 2021-07-05 12:00:00 +03:00
Sandro La Bruzzo e4b84ef5d6 fixed mapping OAF to Scholix summary 2021-07-02 16:48:48 +02:00
Sandro La Bruzzo 8fa0841898 Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer 2021-07-01 22:14:04 +02:00
Sandro La Bruzzo c6fa8598e1 massive code refactor:
removed modules dhp-*-scholexplorer
2021-07-01 22:13:45 +02:00
Antonis Lempesis 829caee4fd added the missing indicators files 2021-06-30 17:31:33 +02:00
Sandro La Bruzzo 84b834c893 added test dataset test for pangaea 2021-06-30 17:31:09 +02:00
Sandro La Bruzzo 1a6b398968 implemented Creation of Raw Graph and Resolution 2021-06-30 17:27:55 +02:00
Miriam Baglioni bc34347643 added assertions to verify doi normalization 2021-06-30 14:37:08 +02:00
Miriam Baglioni 86f47afcc7 slight modification of the resource to accommodate also doi normalization tests 2021-06-30 14:36:49 +02:00
Miriam Baglioni 03767ea8e6 slight modification of the resource to accommodate also doi normalization tests 2021-06-30 13:21:24 +02:00
Miriam Baglioni f8eec0ca9a added resource to test the normalization of doi during the import of MAG 2021-06-30 13:19:54 +02:00
Miriam Baglioni 149f85ddf5 added tests for the normalization of the dois 2021-06-30 13:00:52 +02:00
Miriam Baglioni e487b5544c added tests for the normalization of the dois 2021-06-30 12:57:11 +02:00
Miriam Baglioni 1503ccbbb5 added tests for the normalization of the dois 2021-06-30 12:55:37 +02:00
Miriam Baglioni 1299bfb357 Added class to test the normalization of doi 2021-06-30 12:53:27 +02:00
Sandro La Bruzzo 623a0c4edb code Refactor, renaming packages 2021-06-30 11:09:30 +02:00
Miriam Baglioni cf758f4f91 added normalization step for the doi 2021-06-30 10:03:15 +02:00
Miriam Baglioni 801763a0fa there is no longer any need to lower-case the doi since it is done in the first step. Also changed the creation of the id by using the factory 2021-06-29 19:07:23 +02:00
Miriam Baglioni a74de1cda2 added normalization step to the doi 2021-06-29 18:51:11 +02:00
Miriam Baglioni 06074ea7d3 added normalization step to the doi 2021-06-29 18:46:08 +02:00
Miriam Baglioni 8b8ffe82dc added step of normalization for the doi 2021-06-29 18:41:39 +02:00
Miriam Baglioni 50cc21d92e Added method to normalize doi values (lower case, remove everything preceding 10., filter out dois not starting with 10.) 2021-06-29 18:35:28 +02:00
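A minimal sketch of the normalization steps named in the commit message above (lower-casing, stripping everything before the first "10.", discarding values that still do not start with "10."); the object and method names are illustrative, not the repository's actual code:

```scala
object DoiNormalizationSketch {
  // Illustrative helper, not the project's actual implementation.
  def normalizeDoi(doi: String): Option[String] = {
    if (doi == null) return None
    val lower = doi.toLowerCase.trim // lower case
    val idx = lower.indexOf("10.")   // remove everything preceding "10."
    if (idx < 0) None                // filter out values with no "10." prefix at all
    else Some(lower.substring(idx))
  }

  def main(args: Array[String]): Unit = {
    println(normalizeDoi("https://doi.org/10.1234/ABC")) // Some(10.1234/abc)
    println(normalizeDoi("not-a-doi"))                   // None
  }
}
```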
Claudio Atzori 6d3f960238 Merge pull request 'added the missing indicators files' (#120) from antonis.lempesis/dnet-hadoop:stable_ids into stable_ids
Reviewed-on: D-Net/dnet-hadoop#120
2021-06-29 15:57:39 +02:00
Antonis Lempesis ae18171212 Merge branch 'stable_ids' into stable_ids 2021-06-29 15:33:39 +02:00
Antonis Lempesis 87f14a3899 added the missing indicators files 2021-06-29 16:31:51 +03:00
Sandro La Bruzzo db933ebd21 Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer 2021-06-29 14:16:12 +02:00
Sandro La Bruzzo 7e08655e5f added relation dates in all scholexplorer Datasources 2021-06-29 12:02:03 +02:00
Sandro La Bruzzo 075055eaca added relation dates in bio mapping 2021-06-29 10:33:09 +02:00
Sandro La Bruzzo f36f92287d implemented mapping from Crossref Event Data to Oaf 2021-06-29 10:21:23 +02:00
Claudio Atzori 986a8011ec Merge pull request 'copied latest changes from old fork: indicators+monitor institutions' (#119) from antonis.lempesis/dnet-hadoop:stable_ids into stable_ids
Reviewed-on: D-Net/dnet-hadoop#119
2021-06-29 08:49:12 +02:00
Antonis Lempesis 018c4eb52c copied latest changes from old fork: indicators+monitor institutions 2021-06-28 23:46:52 +03:00
Sandro La Bruzzo 511ec14c63 implemented mapping from EBI and Scholix Resolved to OAF 2021-06-28 22:04:22 +02:00
Claudio Atzori af42377d0e HttpClient used in metadata collection retries on 502, 503, 504 2021-06-28 09:34:30 +02:00
Sandro La Bruzzo ad50415167 Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer 2021-06-24 17:20:50 +02:00
Sandro La Bruzzo 80e15cc455 implemented mapping from uniprot, pdb and ebi links 2021-06-24 17:20:00 +02:00
Claudio Atzori 67afd06cd1 [cleaning] cleaning instance.pid and instance.alternateidentifier using the same procedure used to clean result.pid 2021-06-24 12:10:17 +02:00
Claudio Atzori 2e8fd2c531 cleanup 2021-06-23 14:38:24 +02:00
Claudio Atzori 4dc9ebf217 [raw_all] fixed unit test 2021-06-23 14:38:07 +02:00
Claudio Atzori 50fc5a64a0 [raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity 2021-06-23 11:49:42 +02:00
Claudio Atzori 5edcc6832a applying sonarLint suggestions 2021-06-23 09:53:29 +02:00
Sandro La Bruzzo 080a280bea added pdb to Oaf Transformation 2021-06-21 16:23:59 +02:00
Sandro La Bruzzo 1dc0c59e20 merged fix thai dates from stable_ids 2021-06-21 10:39:46 +02:00
Sandro La Bruzzo dc66cf615b Merge branch 'stable_id_scholexplorer' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer 2021-06-21 09:38:33 +02:00
Sandro La Bruzzo 507e42102a added pdb to oaf class 2021-06-21 09:36:40 +02:00
Sandro La Bruzzo a167543637 Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer 2021-06-21 09:14:11 +02:00
Sandro La Bruzzo 4fe7b75644 renamed packages 2021-06-18 16:41:24 +02:00
Sandro La Bruzzo 3990165d05 changed typologies of unresolved relation 2021-06-18 11:43:59 +02:00
Claudio Atzori 2dd5449c13 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-06-18 10:08:15 +02:00
Claudio Atzori fd54ecf7bd bumped dhp-schemas dependency version 2021-06-18 10:08:07 +02:00
Miriam Baglioni 180d671127 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-06-18 09:46:18 +02:00
Miriam Baglioni 13c96622c9 - 2021-06-18 09:45:16 +02:00
Miriam Baglioni b486ae498f added test and test resource to verify the generation of the date of acceptance from the input extracted from the dump 2021-06-18 09:43:32 +02:00
Miriam Baglioni 464c2ddde3 changed to split in two steps the generation of the crossref dataset 2021-06-18 09:42:31 +02:00
Miriam Baglioni 6aca0d8ebb added kryo encoding for input files 2021-06-18 09:42:07 +02:00
Miriam Baglioni 3585e53da3 changed to split in two steps the generation of the crossref dataset 2021-06-18 09:41:23 +02:00
Claudio Atzori 41b551562e applying PR#115 (DatePicker) on stable_ids 2021-06-17 09:33:50 +02:00
Sandro La Bruzzo 3100166d29 Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer 2021-06-16 16:22:16 +02:00
Claudio Atzori 74833d04f1 Merge branch 'pids_beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into stable_ids 2021-06-16 15:54:18 +02:00
Claudio Atzori 7243a40c88 code formatting 2021-06-16 15:03:03 +02:00
Sandro La Bruzzo dfcf78cf24 removed wrong code 2021-06-16 14:57:42 +02:00
Sandro La Bruzzo cc0f2b11fb Implemented mapping from pubmed baseline to OAF 2021-06-16 14:56:24 +02:00
Miriam Baglioni 95885bcf12 forces executor memory and driver memory to be 7G (trying to avoid OOM) 2021-06-16 10:17:52 +02:00
Miriam Baglioni 2550a73981 - 2021-06-16 10:04:41 +02:00
Miriam Baglioni 1c47c0d786 modified the number of executors trying to avoid OOM exception 2021-06-15 21:05:39 +02:00
Miriam Baglioni 7deac55138 added one option for resume from in the wf 2021-06-15 18:38:20 +02:00
Antonis Lempesis f7c0b80e35 storing result_instance as parquet 2021-06-15 14:45:48 +03:00
Miriam Baglioni 66e7ef892f changed the parameter name 2021-06-15 11:08:54 +02:00
Miriam Baglioni 4f47ad0891 no need to rename the folders, just write in overwrite mode, so I changed the name of the output folder 2021-06-15 09:28:31 +02:00
Miriam Baglioni 9f9dd00b94 refactoring 2021-06-15 09:24:46 +02:00
Miriam Baglioni 63d74ee379 refactoring 2021-06-15 09:24:11 +02:00
Miriam Baglioni 6ebc236657 added needed property: outputPath 2021-06-15 09:23:24 +02:00
Miriam Baglioni f7379255b6 changed the workflow to extract info from the dump 2021-06-15 09:22:54 +02:00
Miriam Baglioni d6e21bb6ea creates the crossref dataset used for doiboost together with unpacking part from tar 2021-06-14 17:27:19 +02:00
Miriam Baglioni 4da141bd7c Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-06-14 13:41:02 +02:00
Miriam Baglioni ce0cfd79e0 creates the crossref dataset used for doiboost 2021-06-14 13:40:19 +02:00
Miriam Baglioni 93efe4de82 split the construction of crossref dataset in two parts. This one just unpacks the tar entries 2021-06-14 13:39:40 +02:00
Michele Artini ada063ce70 fixed a problem with empty mdstore list (2) 2021-06-14 12:04:47 +02:00
Michele Artini 83132ee99a fixed a problem with empty mdstore list 2021-06-14 11:57:00 +02:00
Miriam Baglioni cf360d7c97 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-06-14 10:19:49 +02:00
Miriam Baglioni 8873e6b6d1 workflow and parameter 2021-06-14 10:15:57 +02:00
Miriam Baglioni 0f1acdf6b6 workflow and parameter 2021-06-14 10:08:55 +02:00
Sandro La Bruzzo aeb8132627 Merged branch stable_ids 2021-06-14 10:07:29 +02:00
Sandro La Bruzzo efbea1e01a minor fix 2021-06-14 09:45:14 +02:00
Miriam Baglioni 75780fc636 extraction of the tar for the dump of crossref, and creation of the dataset 2021-06-14 09:45:07 +02:00
Claudio Atzori 2039bb9f5f orcid / orcid_pending cleaning backported from master branch 2021-06-14 09:40:50 +02:00
Claudio Atzori dd19c4ac5a Merge pull request 'import_new_mdstores' (#112) from import_new_mdstores into stable_ids
Reviewed-on: D-Net/dnet-hadoop#112
2021-06-14 09:23:55 +02:00
Claudio Atzori e9e86a237d Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-06-11 17:00:02 +02:00
Claudio Atzori 10bd6ca194 depending on dhp-schemas:2.5.12 (release) 2021-06-11 16:59:56 +02:00
Claudio Atzori a900bfb874 delegating the date parsing to https://github.com/sisyphsu/dateparser 2021-06-11 16:53:01 +02:00
Sandro La Bruzzo dd997c49e0 fix wrong relation id
fix date thai ticket #6791
2021-06-10 14:47:18 +02:00
Antonis Lempesis d413b24611 added instances, orgs for monitor, totalcost for projects, apcs 2021-06-10 02:35:46 +03:00
Claudio Atzori 741077dbca Merge pull request 'Fix in Affiliation Propagation' (#113) from miriam.baglioni/dnet-hadoop:master into stable_ids
Reviewed-on: D-Net/dnet-hadoop#113
2021-06-09 18:42:42 +02:00
Miriam Baglioni 32b0c27217 Update 'dhp-workflows/dhp-enrichment/src/main/java/eu/dnetlib/dhp/resulttoorganizationfrominstrepo/PrepareResultInstRepoAssociation.java'
fix in SQL query: while writing the blacklist constraint it used d.id to indicate the datasource id, but no alias for the datasource was defined. So I removed the alias
2021-06-09 18:36:11 +02:00
Sandro La Bruzzo 0d1f37302f Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer 2021-06-09 09:35:16 +02:00
Miriam Baglioni dc07f1079b added check in case the author set to be enriched is null 2021-06-08 12:06:10 +02:00
Miriam Baglioni 8d2e086e48 changes to avoid reassignment to val 2021-06-07 17:50:37 +02:00
Miriam Baglioni f33521d338 Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
to be able to replace the object assigned to author, val has been replaced by var
2021-06-07 17:27:07 +02:00
Miriam Baglioni bc12e9819e Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
The change fixes the issue that arises when the same work appears more than once on the same ORCID profile. It avoids replicating the association doi -> author when the orcid id is already associated to the doi.
2021-06-07 16:37:01 +02:00
Sandro La Bruzzo 0cdb7ccdaa added inverse relations to datacite mapping 2021-06-04 15:10:20 +02:00
Sandro La Bruzzo 5b724d9972 added relations to datacite mapping 2021-06-04 10:14:22 +02:00
Sandro La Bruzzo e57294ac99 implemented changes on PUBMed dataflow 2021-06-03 10:52:09 +02:00
Michele Artini ede2749822 orcid pid type 2021-06-01 12:42:43 +02:00
Michele Artini f0fbfdcfae Merge branch 'stable_ids' into import_new_mdstores 2021-06-01 12:03:00 +02:00
Michele Artini e950750262 add nodes to import hdfs mdstores 2021-06-01 10:48:50 +02:00
Michele Artini 03a510859a removed coalesce(1) 2021-05-31 14:10:51 +02:00
Michele Artini e9f2b6037c patch of mdstore records 2021-05-31 11:36:26 +02:00
Sandro La Bruzzo 02ef46535f Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-05-31 09:50:15 +02:00
Sandro La Bruzzo aeadc5a366 updated wf Datacite Import to retrieve the block size as parameter 2021-05-31 09:49:53 +02:00
Claudio Atzori 96238152cb added serialization for alternateIdentifiers and pids within each record instance 2021-05-28 16:57:30 +02:00
Michele Artini ad56a44fda save as gzipped sequence file 2021-05-28 14:45:39 +02:00
Claudio Atzori 83722ebc47 pull #111 replied on stable_ids 2021-05-28 14:11:46 +02:00
Claudio Atzori eb6acfbabc [cleaning] removing non parsable relation.validationDate(s) 2021-05-28 10:50:44 +02:00
Claudio Atzori 6e3a4e9237 updated test expectations 2021-05-28 09:37:50 +02:00
Claudio Atzori ac3d090e9e bumped dhp-schemas dependency version 2021-05-27 17:31:12 +02:00
Michele Artini 4fa5671d16 first implementation of Hdfs Mdstores Importer 2021-05-27 16:22:07 +02:00
Claudio Atzori c3d92247d3 bumped dhp-schemas dependency version 2021-05-27 15:10:51 +02:00
Claudio Atzori d512062b58 integrating pull #109, H2020Classification 2021-05-27 12:22:47 +02:00
Claudio Atzori 5e4b91d9ef more pervasive use of constants from ModelConstants, especially for ORCID 2021-05-26 18:20:23 +02:00
Sandro La Bruzzo bced804151 updated wf Datacite Import to retrieve the block size as parameter 2021-05-26 17:06:50 +02:00
Claudio Atzori 4f58418184 depending on dhp-schemas:2.4.7 (release) 2021-05-24 10:32:48 +02:00
Miriam Baglioni abd88f663d changed test resource to mirror change in the input file 2021-05-21 15:20:47 +02:00
Miriam Baglioni c844877de2 changed workflow flow to possibly parallelize also the programme and project preparation steps 2021-05-21 14:41:57 +02:00
Miriam Baglioni 073d76864d refactoring 2021-05-21 14:41:03 +02:00
Miriam Baglioni 4c8b4a774c removed not needed code 2021-05-21 14:40:07 +02:00
Enrico Ottonello abdd0ade1f added temporary output folder as workflow parameter 2021-05-21 12:08:16 +02:00
Miriam Baglioni 53b9d87fec new prepareProgramme according to the new file 2021-05-21 11:49:31 +02:00
Miriam Baglioni 1ee8f13580 refactoring and added "left" as join type to be 100% sure to get the whole set of projects 2021-05-21 11:49:05 +02:00
Miriam Baglioni e07c3ba089 due to change in the input file the filtering step is no more needed 2021-05-21 11:47:43 +02:00
Miriam Baglioni 54f6e2f693 changed to get the needed information to build the action set as parallel jobs 2021-05-21 11:47:00 +02:00
Miriam Baglioni 7180505519 removed non needed variable 2021-05-21 11:46:13 +02:00
Miriam Baglioni 2eb1a8b344 changed because the input file changed 2021-05-21 11:40:20 +02:00
Enrico Ottonello d0945c3c78 added temporary output folder, because folder access rights are different on beta and prod 2021-05-20 19:14:31 +02:00
Enrico Ottonello 1265dadc90 workflow aligned with stable_ids 2021-05-20 19:01:28 +02:00
Enrico Ottonello 0821d8e97d Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2021-05-20 18:33:18 +02:00
Enrico Ottonello ae7bd24d79 removed old workflows 2021-05-20 18:32:22 +02:00
Enrico Ottonello 4d6c473bf1 removed redundant classes contained now in dhp-schema 2021-05-20 18:26:42 +02:00
Claudio Atzori 9d725efdc1 reverted implementation of the mdstore client 2021-05-20 18:26:09 +02:00
Miriam Baglioni 9610224671 added param to workflow property 2021-05-20 18:21:12 +02:00
Claudio Atzori 863b56b6ce using constants from ModelConstants 2021-05-20 16:23:58 +02:00
Claudio Atzori ae5c28e54f code formatting 2021-05-20 16:13:06 +02:00
Miriam Baglioni aa45b4df9b - 2021-05-20 15:57:40 +02:00
Miriam Baglioni 052c837843 - 2021-05-20 15:54:44 +02:00
Claudio Atzori b695932ae4 integrated pull#108 2021-05-20 15:34:04 +02:00
Claudio Atzori ea9b00ce56 adjusted test 2021-05-20 15:31:42 +02:00
Claudio Atzori 2e70aa43f0 Merge pull request 'H2020Classification fix and possibility to add datasources in blacklist for propagation of result to organization' (#108) from miriam.baglioni/dnet-hadoop:master into master
Reviewed-on: D-Net/dnet-hadoop#108

The changes look ok, but please drop a comment to describe how the parameters should be changed from the workflow caller for both workflows
* H2020Classification
* propagation of result to organization
2021-05-20 15:25:05 +02:00
Claudio Atzori b572f56763 Merge branch 'master' into master 2021-05-20 15:22:35 +02:00
Claudio Atzori 2578b7fbb3 code formatting 2021-05-20 14:59:02 +02:00
Miriam Baglioni dc0ad8d2e0 fixed issue related to change in the file name downloaded. Added sheet name as parameter and also a check if the name should change 2021-05-20 14:53:53 +02:00
Claudio Atzori 232dce83db fixes #6701: xpath for titles to support both datacite and Guidelines v4 mapping 2021-05-20 14:41:15 +02:00
Claudio Atzori aef2977ad0 fixes #6701: xpath for titles to support both datacite and Guidelines v4 mapping 2021-05-20 14:40:22 +02:00
Miriam Baglioni 02b80cf24f resolved conflicts 2021-05-20 10:59:39 +02:00
Claudio Atzori c4a23c2f4d fix: preserving the old identifier among the originalIds in the doiboost construction process, trying to avoid UnsupportedOperationException while adding elements to the originalIds 2021-05-19 16:01:52 +02:00
Claudio Atzori ba03f549d7 fix: preserving the old identifier among the originalIds in the doiboost construction process 2021-05-19 15:43:26 +02:00
Claudio Atzori 239d0f0a9a ROR actionset import workflow backported from branch stable_ids 2021-05-18 16:12:11 +02:00
Antonis Lempesis 168edcbde3 added the final steps for the observatory promote wf and some cleanup 2021-05-18 15:23:20 +03:00
Michele Artini e56ccec536 Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-05-18 14:00:28 +02:00
Michele Artini c1e20de7cf fixed the deserialization of a json property 2021-05-18 14:00:14 +02:00
Claudio Atzori a9f512103b using constants from ModelConstants 2021-05-18 11:19:07 +02:00
Claudio Atzori eeb8bcf075 using constants from ModelConstants 2021-05-18 11:10:07 +02:00
Claudio Atzori 2cbf15f4fb using ModelConstants 2021-05-17 09:54:45 +02:00
Enrico Ottonello e13926cdd0 merged with master 2021-05-14 18:10:31 +02:00
Claudio Atzori f19feceaf0 set the old identifier before switching to the new one 2021-05-14 12:53:40 +02:00
Claudio Atzori 1bd70fa2c6 preserving the old identifier among the originalIds in the doiboost construction process 2021-05-14 11:30:41 +02:00
Claudio Atzori ca3f3a7687 using ModelConstants 2021-05-14 11:29:49 +02:00
Claudio Atzori 0358ae16ce depending on the latest dhp-schema version 2021-05-14 11:28:33 +02:00
Claudio Atzori 23b8883ab1 applied intellij code cleanup 2021-05-14 10:58:12 +02:00
Claudio Atzori 609eb711b3 IndexRecordTransformerTest for producing a record that can be manually submitted to solr 2021-05-13 16:13:28 +02:00
Claudio Atzori 1517bf7c92 IndexRecordTransformerTest for producing a record that can be manually submitted to solr 2021-05-13 16:11:22 +02:00
Sandro La Bruzzo d9a0bbda7b implemented new phase in doiboost to make the dataset Distinct by ID 2021-05-13 12:25:14 +02:00
Sandro La Bruzzo 6424cd9062 Added passing of the following parameters:
-varDataSourceId
-varOfficialName

in Each transformation Rule
2021-05-11 15:17:38 +02:00
Sandro La Bruzzo 073dcea2aa Added passing of the following parameters:
-varDataSourceId
-varOfficialName

in Each transformation Rule
2021-05-11 15:05:58 +02:00
Claudio Atzori d4c3476152 mapping datasource.journal only when an issn is available, null otherwise 2021-05-11 11:08:54 +02:00
Claudio Atzori da9d6f3887 mapping datasource.journal only when an issn is available, null otherwise 2021-05-11 10:45:30 +02:00
Sandro La Bruzzo 54217d73ff removed old parameters from oozie workflow 2021-05-11 09:59:02 +02:00
Claudio Atzori d1cbee8413 imported methods from CleaningFunctions, defined in GraphCleaningFunctions 2021-05-10 16:43:39 +02:00
Claudio Atzori 3797543600 MDStoreManager model classes moved in dhp-schemas 2021-05-10 14:32:05 +02:00
Claudio Atzori 3925eb6a79 MDStoreManager model classes moved in dhp-schemas 2021-05-10 13:58:23 +02:00
Claudio Atzori 25254885b9 [ActionManagement] reduced number of xqueries used to access ActionSet info 2021-05-07 17:32:03 +02:00
Claudio Atzori 8a0de2fc18 [ActionManagement] reduced number of xqueries used to access ActionSet info 2021-05-07 17:31:32 +02:00
Sandro La Bruzzo 7dc824fc23 imported changes in stable_id into master 2021-05-07 12:53:50 +02:00
Michele Artini d82071ba6c originalId with prefix 2021-05-06 15:34:48 +02:00
Claudio Atzori d4a30fabe3 clean up tests 2021-05-05 17:28:15 +02:00
Claudio Atzori dccaf173cf fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials 2021-05-05 16:36:15 +02:00
Claudio Atzori 8c96a82a03 fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials 2021-05-05 15:30:06 +02:00
Claudio Atzori 50fc128ff7 alternative way to set timeouts for the ISLookup client 2021-05-05 11:24:44 +02:00
Claudio Atzori 2e1eb96f9a code formatting 2021-05-05 11:23:57 +02:00
Claudio Atzori b1785ba77c alternative way to set timeouts for the ISLookup client 2021-05-05 11:23:46 +02:00
Sandro La Bruzzo 1adfc41d23 merged manually changes on stable_id for doiboost into master 2021-05-05 10:23:32 +02:00
Claudio Atzori fb930b84d3 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-05-04 18:06:30 +02:00
Claudio Atzori 923d19ea8e mdstore read lock/unlock when bulk copying records from mongodb to hdfs 2021-05-04 18:06:21 +02:00
Sandro La Bruzzo 714b71bd21 updated pubmed 2021-05-04 14:54:12 +02:00
Claudio Atzori ba86835951 using common constants from ModelConstants 2021-05-04 11:51:52 +02:00
Claudio Atzori c00be646f3 Merge pull request 'prepare_ror_actionset' (#106) from prepare_ror_actionset into stable_ids
Reviewed-on: D-Net/dnet-hadoop#106

Thanks Michele, looks good to me.
2021-05-04 11:41:58 +02:00
Michele Artini f4bd2b5619 revert file SparkDedupTest.java 2021-05-04 10:26:14 +02:00
Michele Artini 49910aedca Merge branch 'stable_ids' into prepare_ror_actionset 2021-05-04 10:00:12 +02:00
Claudio Atzori 5cc3e6d61c bumped pace-core dependency version 2021-05-03 16:40:50 +02:00
Michele Artini b4877da363 Merge branch 'stable_ids' into prepare_ror_actionset 2021-05-03 08:13:55 +02:00
Alessia Bardi 9a20057615 fixed query for organisations' pids 2021-04-29 15:23:39 +02:00
Michele Artini 6692128234 Merge branch 'stable_ids' into prepare_ror_actionset 2021-04-29 13:24:08 +02:00
Alessia Bardi a801999e75 fixed query for organisations' pids 2021-04-29 12:18:42 +02:00
Michele Artini a278d67175 parse input file 2021-04-29 11:34:47 +02:00
Claudio Atzori f6ccd54d87 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-04-29 10:10:01 +02:00
Claudio Atzori 91e7220f20 cleaned up workflow for actionset migration, adjusted dnet|cnr* dependency versions 2021-04-29 10:09:52 +02:00
Michele Artini f77ba34126 pid types 2021-04-29 09:50:05 +02:00
Michele Artini 7c5cd86927 annotations and tests 2021-04-29 09:29:19 +02:00
Michele Artini b5cf505cc6 partial implementation of the ROR->actionset workflow 2021-04-28 16:00:24 +02:00
Enrico Ottonello c537986b7c deleted folders with merged data immediately before merge phases 2021-04-28 11:25:25 +02:00
Sandro La Bruzzo 2129e9caa7 updated pangaea transformation to parse directly the xml 2021-04-28 10:21:03 +02:00
Claudio Atzori 5afa7d3e0c core utilities in dhp-common moved in external module dhp-schemas 2021-04-27 15:44:01 +02:00
Alessia Bardi e6075bb917 updated json schema for results - added instances and accessright definition 2021-04-27 15:15:08 +02:00
Claudio Atzori ac77a245a3 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-04-27 14:05:00 +02:00
Claudio Atzori f783e60ff7 cleanup 2021-04-27 14:04:50 +02:00
Sandro La Bruzzo 63c0303137 removed unused import, add log 2021-04-27 12:17:23 +02:00
Sandro La Bruzzo 74484d2823 bug fixing 2021-04-27 12:13:44 +02:00
Claudio Atzori dd2e0a81f4 added dnet45-bootstrap-snapshot and dnet45-bootstrap-release repositories 2021-04-27 12:08:43 +02:00
Claudio Atzori 233d849f90 added dnet45-bootstrap-snapshot and dnet45-bootstrap-release repositories 2021-04-27 12:03:40 +02:00
Claudio Atzori fcd13f5350 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-04-27 11:37:45 +02:00
Claudio Atzori 4028176559 enabled snapshots from dnet45-snapshots repository 2021-04-27 11:37:32 +02:00
Sandro La Bruzzo c74b03d59c Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-04-27 11:31:07 +02:00
Sandro La Bruzzo 7f8848ecdd added first implementation of Pangaea Mapping 2021-04-27 11:30:37 +02:00
Claudio Atzori 27ab8a704d adjusted poms to align with the external dhp-schema module 2021-04-27 10:12:27 +02:00
Claudio Atzori a7cf449b36 cleanup 2021-04-27 10:11:26 +02:00
Claudio Atzori 82de6fb634 dhp-schema moved to dedicated module https://code-repo.d4science.org/D-Net/dhp-schemas 2021-04-27 10:10:50 +02:00
Claudio Atzori fa42026590 fixed PersonCleaner extension functions 2021-04-27 10:10:06 +02:00
Claudio Atzori ef4bfd82e2 code formatting 2021-04-27 10:09:31 +02:00
Claudio Atzori faa8f6f4e2 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-04-27 09:57:03 +02:00
miconis 6d5c14e030 assertions updated in entity merger test 2021-04-27 09:47:49 +02:00
Claudio Atzori c2bb03c8b5 depending on external dhp-schemas module 2021-04-23 17:57:35 +02:00
Claudio Atzori 7ed107be53 depending on external dhp-schemas module 2021-04-23 17:52:36 +02:00
Claudio Atzori c25238480c making ODF record parsing namespace unaware (#6629) 2021-04-23 17:34:57 +02:00
Claudio Atzori 99cfb027fa making ODF record parsing namespace unaware (#6629) 2021-04-23 17:09:36 +02:00
Miriam Baglioni 72e5aa3b42 refactoring 2021-04-23 12:10:30 +02:00
Miriam Baglioni 4ae6fba01d refactoring 2021-04-23 12:09:19 +02:00
Miriam Baglioni 7d1b8b7f64 merge upstream 2021-04-23 11:55:49 +02:00
miconis d0e3366c34 Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-04-22 11:45:19 +02:00
miconis 3c12eeadce bug fix in propagation of relations 2021-04-22 11:44:33 +02:00
Claudio Atzori e5abbec2ba [orcid] download of the lambda file defined in a script 2021-04-22 11:22:10 +02:00
Claudio Atzori 55964cbd81 [orcid] large oozie workflow cleanup; updated workflow for the orcidnodoi actionset creation 2021-04-22 10:18:09 +02:00
Claudio Atzori 8f309b72ff [dedup] using node names consistently across the workflow 2021-04-21 17:54:51 +02:00
Claudio Atzori 52244f813a merging from enrico.ottonello/dnet-hadoop:orcid-no-doi 2021-04-21 12:24:09 +02:00
Sandro La Bruzzo fd29307b84 updated workflow name 2021-04-21 09:21:41 +02:00
Claudio Atzori 815b9f4d56 [openorgs dedup] fixed workflow parameter declarations. Introduced support for resuming the execution from intermediate steps 2021-04-20 17:24:45 +02:00
Claudio Atzori d0d477cca3 code formatting 2021-04-20 12:50:34 +02:00
miconis 0393cdce42 addition of alternative names in export queries 2021-04-20 12:45:21 +02:00
miconis cadd0a5de8 modification of the queries for openorgs: they now consider also pending orgs 2021-04-20 12:06:56 +02:00
Sandro La Bruzzo e06c7f32f6 updated id figshare as described in #6377 2021-04-20 10:18:07 +02:00
Sandro La Bruzzo dbe0d0378e resolved ticket #6377 2021-04-20 09:44:44 +02:00
Antonis Lempesis 625d993cd9 added step for observatory db 2021-04-20 02:31:06 +03:00
Antonis Lempesis 25d0512fbd code cleanup 2021-04-20 01:43:23 +03:00
Sandro La Bruzzo 524e5f3092 Improved parallelization on transformation wf on hadoop 2021-04-19 15:17:25 +02:00
Sandro La Bruzzo cdfe01bbae improved parallelization on transformation job 2021-04-19 15:14:52 +02:00
Sandro La Bruzzo 3ae67b7a1d Merge remote-tracking branch 'origin/stable_ids' into stable_ids 2021-04-16 17:36:57 +02:00
Sandro La Bruzzo a16e5299f9 applied unique function on the final dataset 2021-04-16 17:36:48 +02:00
Claudio Atzori 45057440c1 code formatting 2021-04-16 17:28:25 +02:00
Enrico Ottonello 34ca792a55 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2021-04-16 17:18:46 +02:00
Enrico Ottonello 27068aacd1 wf to move orcid-no-doi dataset on the folder ready the import 2021-04-16 17:17:47 +02:00
miconis 7ad573d023 bug fix: changed join in propagaterelations without applying filter on the id 2021-04-16 16:40:42 +02:00
Sandro La Bruzzo 67085da305 fixed NPE 2021-04-16 11:05:58 +02:00
Sandro La Bruzzo 644aa8f40c Merge remote-tracking branch 'origin/stable_ids' into stable_ids 2021-04-16 09:14:26 +02:00
Sandro La Bruzzo 7d6a80e2f2 added new type on MAG mapping 2021-04-16 09:14:15 +02:00
Claudio Atzori 8704d32780 code formatting 2021-04-15 16:52:58 +02:00
Claudio Atzori ba4b4c74d8 do not make the identifier prefix depend on the Handle 2021-04-15 16:48:26 +02:00
Claudio Atzori 906d50563c Merge pull request 'properly invalidating impala metadata' (#105) from antonis.lempesis/dnet-hadoop:master into master
Reviewed-on: D-Net/dnet-hadoop#105
2021-04-15 15:06:22 +02:00
Claudio Atzori 3d58f95522 [stats update] properly invalidating impala metadata 2021-04-15 15:03:05 +02:00
Antonis Lempesis 03d36fadea properly invalidating impala metadata 2021-04-15 13:34:22 +03:00
miconis f64e57c112 refactoring of the id generation, sparkcreatemergerels collects entities to create root id after a join 2021-04-15 10:59:24 +02:00
miconis 176a5e493d Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-04-14 18:06:34 +02:00
miconis 3525a8f504 id generation of representative record moved to the SparkCreateMergeRel job 2021-04-14 18:06:07 +02:00
Claudio Atzori 745fa92db8 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-04-14 10:14:00 +02:00
Claudio Atzori 083c2959dc cleanup 2021-04-14 10:13:53 +02:00
Sandro La Bruzzo 3f77bfceb0 fixed test failure on jenkins 2021-04-14 10:03:01 +02:00
Claudio Atzori 3125cef545 code formatting 2021-04-14 09:11:54 +02:00
Sandro La Bruzzo 44a0064df6 Merge remote-tracking branch 'origin/stable_ids' into stable_ids 2021-04-13 17:48:12 +02:00
Sandro La Bruzzo 479abd10cb Add into ORCID workflow a method that extracts orcid directly to the dump generated by Enrico 2021-04-13 17:47:43 +02:00
Claudio Atzori 710cd1e8f2 Merge pull request 'add xslt, personname cleaner' (#104) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
Reviewed-on: D-Net/dnet-hadoop#104

LGTM
2021-04-13 14:43:05 +02:00
Claudio Atzori d1ca025b0b [cleaning] removing authors without fullname or providing the 'deactivated' keyword. Removing test titles 2021-04-13 14:32:41 +02:00
miconis 1542196a33 bug fix: starting node of duplicate scan wf changed 2021-04-13 10:15:43 +02:00
miconis 369ed1cd8a bug fix: lookupurl parameter added to dedup record job 2021-04-13 09:08:05 +02:00
Andreas Czerniak 52fbece3b3 Merge branch 'stable_ids' of https://code-repo.d4science.org/andreas.czerniak/BrStableId_dnet-hadoop into stable_ids 2021-04-13 07:05:09 +02:00
Andreas Czerniak d7614c1f85 introduce new const 2021-04-13 07:04:27 +02:00
Andreas Czerniak 3b694074ff add xslt, personname cleaner 2021-04-13 07:04:27 +02:00
Claudio Atzori 511c0521e5 [dedup] avoiding NPEs handling OpenOrg relations 2021-04-12 17:45:11 +02:00
Claudio Atzori 72dcadd8e6 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-04-12 17:32:09 +02:00
Claudio Atzori 902d05f548 [cleaning] avoiding NPEs handling null author PIDs 2021-04-12 17:31:40 +02:00
miconis d442e25cbc bug fix: ids in self mergerels are not marked deletedbyinference=true 2021-04-12 15:56:22 +02:00
miconis dcff9cecdf bug fix: ids in self mergerels are not marked deletedbyinference=true 2021-04-12 15:55:27 +02:00
Andreas Czerniak 34df35926c add xslt, personname cleaner 2021-04-09 14:35:36 +02:00
miconis 11b22b2d23 bug fix in the query, it now exports only relations with non-hidden organizations 2021-04-08 11:51:47 +02:00
miconis 0857100fb8 implementation of the tests for the openorgs integration in the openaire provision 2021-04-07 18:42:16 +02:00
miconis bf685d849f addition of pids in the query for the export of openorgs for the provision, addition of ec_fields in the openorgs model 2021-04-07 14:27:43 +02:00
Miriam Baglioni 70e391d427 merge upstream 2021-04-07 10:38:08 +02:00
miconis eaaefb8b4c implementation of the procedure to reuse content of different dbs when creating the raw graph 2021-04-06 14:35:51 +02:00
miconis c39c82dfe9 modification of the jobs for the integration of openorgs in the provision, dedup records are no more created by merging but simply taking results of openorgs portal 2021-04-06 14:31:00 +02:00
Claudio Atzori 37b65cc3ad Merge pull request 'updates on stats-update workflow' (#100) from antonis.lempesis/dnet-hadoop:master into master
The workflow integrated in the _stable_ids_ branch has been run correctly on the BETA content, thus IMO this PR can be integrated in the master branch.

Reviewed-on: D-Net/dnet-hadoop#100
2021-04-02 16:13:35 +02:00
Claudio Atzori 1e7e5180fa [Graph model] updated definition of ExternalReference: added alternateLabel, removed description (#6503) 2021-04-02 12:32:12 +02:00
Claudio Atzori e686b8de8d [ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98 2021-04-01 17:11:03 +02:00
Claudio Atzori ee34cc51c3 [ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98 2021-04-01 17:07:49 +02:00
Claudio Atzori 70e49ed53c [OpenOrgsWf] trivial refactoring 2021-04-01 10:30:51 +02:00
Claudio Atzori 7941d7be29 WIP: using common definitions from ModelConstants 2021-03-31 18:33:57 +02:00
Claudio Atzori 879e8cc7ef WIP: using common definitions from ModelConstants 2021-03-31 17:12:01 +02:00
Claudio Atzori 72ce741ea6 WIP: using common definitions from ModelConstants 2021-03-31 17:07:13 +02:00
Enrico Ottonello 59ec5137e1 improvement related to https://issue.openaire.research-infrastructures.eu/issues/6501 2021-03-31 16:25:41 +02:00
Sandro La Bruzzo 616d2ecce2 split the workflow collecting datacite into two workflows.
Released on beta
2021-03-31 15:45:58 +02:00
Miriam Baglioni 4b6e514f02 merge upstream 2021-03-30 10:27:12 +02:00
Claudio Atzori 27681b876c code formatting 2021-03-29 17:47:11 +02:00
Claudio Atzori 9237d55d7f [OpenOrgsWf] cleanup 2021-03-29 17:40:34 +02:00
Claudio Atzori 7f4e9479ec [OpenOrgsWf] graph construction wf: allow to skip the import openorgs node (importOpenorgs true|false) 2021-03-29 16:59:16 +02:00
Claudio Atzori 940556f6d3 Merge pull request 'OpenOrgs dedup and Integration with OpenAIRE Provision' (#102) from openorgswf into stable_ids
Reviewed-on: D-Net/dnet-hadoop#102
2021-03-29 16:41:09 +02:00
miconis 2709d08fc2 Merge branch 'stable_ids' into openorgswf 2021-03-29 16:39:07 +02:00
miconis f446580e9f code refactoring (useless classes and wf removed), implementation of the test for the openorgs dedup 2021-03-29 16:10:46 +02:00
Claudio Atzori 3becaa5539 [Cleaning] drop alternate identifiers with empty values 2021-03-29 16:01:35 +02:00
Claudio Atzori a0837ac357 [Stats update] integrating PR#100 for testing D-Net/dnet-hadoop#100 2021-03-29 15:59:58 +02:00
Claudio Atzori 48f2b6127e [Cleaning] drop alternate identifiers with empty values 2021-03-29 14:23:18 +02:00
miconis 2355cc4e9b minor changes and bug fix 2021-03-29 10:07:12 +02:00
Sandro La Bruzzo 1dfda3624e improved workflow importing datacite 2021-03-26 13:56:29 +01:00
Claudio Atzori b5b7dc2104 [Cleaning] drop alternate identifiers with empty values 2021-03-26 12:30:00 +01:00
Enrico Ottonello 91d8660982 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2021-03-25 11:21:20 +01:00
Enrico Ottonello ebd67b8c8f removed duplicates orcid data on authors set 2021-03-25 11:20:52 +01:00
Claudio Atzori 827e7e37db [Cleaning] drop instance.alternateIdentifier elements when they are available among instance.pid 2021-03-25 11:07:59 +01:00
miconis 28c1cdd132 merged stable_ids into openorgswf 2021-03-25 10:44:49 +01:00
miconis 5dfb66b0fa minor changes 2021-03-25 10:29:34 +01:00
miconis 348b0ef921 bug fix, implementation of the workflow for the creation of raw_organizations (openorgs dedup), addition of the pid lists to the openorgs postgres db 2021-03-24 15:51:27 +01:00
Claudio Atzori 751125fdf9 [Actionmanager] zero function considers empty entity.id as well as rel.source/rel.target 2021-03-23 17:34:32 +01:00
Claudio Atzori 1e423fdc07 [Actionmanager] remove invalid records from the input graph before groupGraphTableByIdAndMerge 2021-03-23 13:39:24 +01:00
Claudio Atzori e5ebb500cf fixed pom versions; included missing workflow modules in dhp-workflows/pom.xml 2021-03-23 12:13:53 +01:00
Claudio Atzori b75ad76f79 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-03-23 09:59:12 +01:00
Claudio Atzori 8db248aa13 avoiding error on jenkins compilations: java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (on a random free port)! 2021-03-23 09:56:34 +01:00
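The commit above does not show the actual fix; one common way to avoid that BindException in Spark unit tests (an assumption here, not necessarily what was done in this commit) is to pin the driver to the loopback interface when building the test session:

```scala
import org.apache.spark.sql.SparkSession

object LocalSparkTestSessionSketch {
  def main(args: Array[String]): Unit = {
    // Binding the driver explicitly to localhost avoids
    // "java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed"
    // on CI hosts with unusual network configurations.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("unit-test")
      .config("spark.driver.host", "localhost")
      .config("spark.driver.bindAddress", "127.0.0.1")
      .getOrCreate()

    spark.stop()
  }
}
```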
Sandro La Bruzzo 625e4c29c4 added model constants 2021-03-23 09:39:56 +01:00
Claudio Atzori b4febed138 updated mapping tests as consequence of the special treatment reserved to Handle PIDs 2021-03-23 09:37:48 +01:00
Claudio Atzori 431cbe9955 handle missing instance.pid during bulk cleaning 2021-03-23 09:28:58 +01:00
Sandro La Bruzzo c392936b97 fixed error on best access right 2021-03-23 09:23:22 +01:00
Sandro La Bruzzo c73072079d fix conflicts 2021-03-22 16:36:31 +01:00
Sandro La Bruzzo 098914dcff fix wrong relation with source null 2021-03-22 11:35:02 +01:00
miconis 0fe40b08e4 addition of deduplication profiles for the results, double check on pids and the title with a lower threshold 2021-03-19 17:12:05 +01:00
miconis 98854b0124 minor changes 2021-03-19 16:57:40 +01:00
Claudio Atzori 5a043e95ea code formatting 2021-03-19 11:37:27 +01:00
Claudio Atzori a4e82a65aa integrated filter applied when merging BETA & PROD graphs to rule out records from Datacite 2021-03-19 11:34:44 +01:00
Claudio Atzori 3256b9c836 code formatting 2021-03-19 09:36:12 +01:00
Claudio Atzori 75144dacb3 Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-03-19 09:07:40 +01:00
Claudio Atzori 9588bfba81 [cleaning] entries available as PIDs must not appear as alternateIdentifier 2021-03-19 09:07:30 +01:00
Claudio Atzori 972d5a3d98 [dedup] Datacite should be authoritative for datasets 2021-03-19 09:04:20 +01:00
Sandro La Bruzzo 25d5663d97 added filter 2021-03-18 10:24:42 +01:00
Sandro La Bruzzo 5f98ea74a9 Added fix for pid generation in stableIds 2021-03-17 15:53:24 +01:00
Sandro La Bruzzo b4805b989d Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-03-17 15:14:59 +01:00
Claudio Atzori 734232d3b9 identifier factory doesn't depend on pre-existing entity.id 2021-03-17 15:14:53 +01:00
Sandro La Bruzzo 76b10090fc Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-03-17 15:06:46 +01:00
Claudio Atzori a3dac32f16 pidFilter a bit more permissive 2021-03-17 15:06:05 +01:00
Sandro La Bruzzo 2be0428047 Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-03-17 14:54:28 +01:00
Claudio Atzori 8257f9a2bc result.pid: adjusted the mapping applied to the contents from the aggregator 2021-03-17 12:45:38 +01:00
Sandro La Bruzzo 7c97a4d900 Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-03-17 12:13:03 +01:00
Sandro La Bruzzo cc5bbafa5d some fix to make workflows runs 2021-03-17 12:12:56 +01:00
Claudio Atzori 3b2da86f0a added precondition on IdentifierFactory to check the presence of entity.id 2021-03-16 17:05:38 +01:00
Claudio Atzori 640b885706 added instance.alternativeIdentifiers to the graph model, adjusted the mapping applied to the contents from the aggregator 2021-03-16 14:19:32 +01:00
Claudio Atzori 61a2551e74 migrated last changes from svn (dnet45) 2021-03-15 17:17:55 +01:00
Claudio Atzori 9cac6da9bd added AccessRight and OpenAccessRoute classes to ModelSupport.getOafModelClasses() 2021-03-12 16:31:17 +01:00
Antonis Lempesis 0ba0a6b9da update promote wf to support monitor&production 2021-03-12 16:42:59 +02:00
Antonis Lempesis 60ebdf2dbe update promote wf to support monitor&production 2021-03-12 16:34:53 +02:00
Antonis Lempesis 236435b470 following redirects 2021-03-12 14:11:21 +02:00
Antonis Lempesis 3c75a05044 fixed a ton of typos 2021-03-12 13:47:04 +02:00
Claudio Atzori 19f3580b3d introduced java8-based date parsing 2021-03-11 16:46:23 +01:00
Claudio Atzori d3cb923f24 introduced java8-based date parsing 2021-03-11 12:37:33 +01:00
Sandro La Bruzzo 4bb3bcafa5 add author sequence number 2021-03-11 11:32:32 +01:00
Sandro La Bruzzo a8e5d0ea0d updated test and fixed assign of access right 2021-03-11 10:41:24 +01:00
Sandro La Bruzzo f5e7c57654 Fixed ticket 6282 2021-03-11 10:32:45 +01:00
Claudio Atzori f74e464942 create bestaccessright as Qualifier 2021-03-10 15:40:05 +01:00
Antonis Lempesis fa1ec5b5e9 fixed typo... 2021-03-10 14:05:58 +02:00
Claudio Atzori c801ab6c1d minor 2021-03-09 17:22:31 +01:00
Claudio Atzori 9917d7e01c PID authorities include ArXiv 2021-03-09 17:12:52 +01:00
Claudio Atzori 01630f638d IdentifierFactory implementation based on the list of datasources authoritative for a given pid type 2021-03-09 17:11:50 +01:00
Claudio Atzori b3f3b895e5 [#6282 open access status in the Graph] OAStatus renamed as openAccessRoute 2021-03-09 11:41:11 +01:00
Claudio Atzori 765f9bdee7 merged from dhp_oaf_model 2021-03-09 11:37:41 +01:00
Claudio Atzori 59532b0919 [#6281 Provenance of product PIDs] Added PIDs to the Instance type; extended mapping for OAF/ODF records 2021-03-09 11:14:45 +01:00
Claudio Atzori d525785497 [#6282 open access status in the Graph] Result.Instance.accessRight defined with dedicated data type that includes the open access color. 2021-03-09 11:12:55 +01:00
Sandro La Bruzzo bbe1a7c69a [#6281 Provenance of product PIDs] Added PIDs to the Instance type in Scholexplorer Export 2021-03-09 10:46:36 +01:00
Sandro La Bruzzo a2169ccf07 implemented Ticket #6281: added pid to Instance in doiBoost 2021-03-09 10:46:36 +01:00
Claudio Atzori f468c7f0d7 merged from master 2021-03-09 09:12:41 +01:00
Claudio Atzori 76441f4edd code formatting 2021-03-09 09:12:26 +01:00
Claudio Atzori 8d2bb24512 merged from master 2021-03-08 15:44:34 +01:00
Claudio Atzori acbe3119a4 RestCollectorPlugin imported from dnet45 2021-03-08 09:44:09 +01:00
Antonis Lempesis f40c150a0d fixed steps... 2021-03-06 00:35:57 +02:00
Claudio Atzori fa7930d2e2 merging contributions from PR#97 2021-03-05 15:45:28 +01:00
Antonis Lempesis 6147ee4950 assigning correctly hive contexts to concepts 2021-03-05 14:12:18 +02:00
Antonis Lempesis c5fbad8093 Contexts are now downloaded instead of using the stats_ext db 2021-03-04 00:42:21 +02:00
Claudio Atzori 55f6ff5f55 README.md for aggregation workflows 2021-03-03 16:18:34 +01:00
Claudio Atzori e8789b0cdb Merge pull request 'stats DB for monitor' (#99) from antonis.lempesis/dnet-hadoop:master into master
Looks good to me, just a note on the parsing of the citations: since the last version, IIS produces citations as proper relationships among results. This is what we got already in the BETA graph

```
count		r.reltype	r.subreltype	r.relclass
62.129.254	resultResult	citation	cites
62.043.309	resultResult	citation	isCitedBy
```

Thus, I suggest moving away from the current property-based implementation for the extraction of the citation links and relying on the relationships instead.
2021-03-03 10:29:09 +01:00
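A hedged sketch of what relying on the citation relationships could look like; the case class, field names and input path below are assumptions for illustration, mirroring the r.reltype / r.subreltype / r.relclass columns quoted in the review comment, and are not taken from the repository:

```scala
import org.apache.spark.sql.SparkSession

object CitationRelationsSketch {
  // Assumed shape of a relation record, for illustration only.
  case class Rel(source: String, target: String, relType: String, subRelType: String, relClass: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("citations").getOrCreate()
    import spark.implicits._

    val rels = spark.read.json("/path/to/relation").as[Rel] // input path is illustrative
    val cites = rels.filter(r =>
      r.relType == "resultResult" && r.subRelType == "citation" && r.relClass == "cites")

    // Each surviving row links a citing result (source) to a cited result (target).
    cites.select("source", "target").show(10, false)
    spark.stop()
  }
}
```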
Claudio Atzori ec80b7ade3 code formatting 2021-03-03 10:22:53 +01:00
Claudio Atzori 36f750cd1d removed unused classes 2021-03-03 10:22:29 +01:00
Claudio Atzori b73dce3e3a more logging on the MDStore mongodb client. Forcing UTF_8 encoding on the content 2021-03-03 10:17:16 +01:00
Antonis Lempesis 27796343ca crude sleep. hardcoded value 2021-03-03 01:37:47 +02:00
Enrico Ottonello 20c0438f11 reformatted code after compile step 2021-03-01 11:07:01 +01:00
Enrico Ottonello 70cb100647 added updating last orcid dataset folders after completion 2021-03-01 10:17:04 +01:00
Enrico Ottonello bd3b16402b added result typologies 2021-03-01 10:16:02 +01:00
Claudio Atzori e76c4f62c1 MetadataRecord moved in dhp-schemas 2021-02-26 10:58:48 +01:00
miconis 1a85020572 bug fix in graph-mapper, changes in the implementation of the openorgs wf to create relations and populate openorgs db 2021-02-26 10:19:28 +01:00
Enrico Ottonello ca1800510a Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2021-02-25 18:45:02 +01:00
Enrico Ottonello 53d7023460 dateOfCollection taken from orcid last_update.txt on hdfs; cleaned wf parameters 2021-02-25 18:43:29 +01:00
Claudio Atzori 7df2461ccc indent XML records collected from oai-pmh endpoints 2021-02-25 16:19:12 +01:00
Enrico Ottonello d43ea88caf aligned orcid result typologies with openaire vocabulary 2021-02-25 15:02:10 +01:00
Claudio Atzori b830e33392 mdstore collector plugin 2021-02-25 12:30:30 +01:00
Claudio Atzori dc98c39500 more logging 2021-02-25 12:29:18 +01:00
Claudio Atzori 271e88537b code formatting 2021-02-25 12:28:56 +01:00
Claudio Atzori 9c899f4433 cleanup on transformation functions and the relative tests 2021-02-24 15:07:59 +01:00
Claudio Atzori fc3fa5e343 implemented mdstore collector plugin 2021-02-24 15:07:24 +01:00
Enrico Ottonello 975823b968 data from last updated orcid 2021-02-23 15:35:04 +01:00
Miriam Baglioni 896919e735 merge upstream 2021-02-23 10:45:29 +01:00
Antonis Lempesis d90767c733 correctly invalidating metadata 2021-02-19 03:18:47 +02:00
Antonis Lempesis 3681afbe04 typo 2021-02-19 03:04:27 +02:00
Antonis Lempesis c5502eba8f actually moved stats computation in impala instead of hive... 2021-02-19 02:54:39 +02:00
Antonis Lempesis 33c85d4e66 moved stats computation in impala instead of hive 2021-02-18 17:23:34 +02:00
Antonis Lempesis b8e96c8ae7 moved cache update to the end 2021-02-18 16:42:22 +02:00
Antonis Lempesis bcbfc052b1 fixed last errors in step 21 2021-02-18 16:32:54 +02:00
Antonis Lempesis 10a29a4b9a fixes in monitor step 2021-02-18 15:05:59 +02:00
Antonis Lempesis 8ef66452d5 fixed typo 2021-02-17 22:24:44 +02:00
Antonis Lempesis a8836e2f5f fixed typo 2021-02-17 19:27:07 +02:00
Claudio Atzori e7eba9f7e7 WIP: transformation workflow error reporting; cleanup 2021-02-17 16:54:08 +01:00
Claudio Atzori 58467aaf1e WIP: transformation workflow error reporting 2021-02-17 16:14:41 +01:00
Claudio Atzori cc88701f29 retry for any Socket exception 2021-02-17 16:13:54 +01:00
Antonis Lempesis a445c1ac3d fixed variable names in monitor script 2021-02-17 16:45:09 +02:00
Antonis Lempesis 00d516360f added missing ; 2021-02-17 16:41:10 +02:00
Claudio Atzori 545f8f3e48 using jackson objectmapper instead of GSon to serialise the aggregation report 2021-02-17 12:15:00 +01:00
Claudio Atzori b592d78bb4 WIP: collectorWorker error reporting, generalised reported implementation 2021-02-17 10:28:01 +01:00
Antonis Lempesis cd1b794409 added the monitor db wf 2021-02-17 02:11:55 +02:00
Claudio Atzori cf27905a71 WIP: collectorWorker error reporting, added report messages 2021-02-16 16:53:14 +01:00
Alessia Bardi bf2830b981 Merge pull request 'manage merging of Relation validation attributes' (#95) from merge_rel_validation into master 2021-02-16 12:14:27 +01:00
Claudio Atzori 6f9864c564 manage merging of Relation validation attributes 2021-02-16 11:53:44 +01:00
Alessia Bardi 32e81c2d89 non validated rel has null value in validated field 2021-02-16 11:01:42 +01:00
Claudio Atzori 58288a95b8 WIP: collectorWorker error reporting, added report messages 2021-02-15 15:28:53 +01:00
Claudio Atzori 1abe6d1ad7 WIP: collectorWorker error reporting, added report messages 2021-02-15 15:08:59 +01:00
Claudio Atzori 523a6bfa97 Merge pull request 'first commit to the correct branch' (#94) from andreas.czerniak/BrAggr_dnet-hadoop:hadoop_aggregator into hadoop_aggregator
Looks good to me, thanks Andreas!
2021-02-15 12:15:31 +01:00
Antonis Lempesis 1c029b9fc0 fixed formatting 2021-02-14 03:14:24 +02:00
Antonis Lempesis 2c4dcc90ba analyzing tables to produce stats 2021-02-14 02:54:55 +02:00
Sandro La Bruzzo 7edcc87ed4 changed xslt behaviour on failure 2021-02-12 17:27:08 +01:00
Sandro La Bruzzo 6a37c7f175 merge fixed 2021-02-12 16:38:47 +01:00
Sandro La Bruzzo b3f5c2351d Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into hadoop_aggregator
 Conflicts:
	dhp-workflows/dhp-aggregation/src/test/java/eu/dnetlib/dhp/transformation/TransformationJobTest.java
2021-02-12 16:37:14 +01:00
Sandro La Bruzzo f216277219 Implemented cleaning date 2021-02-12 16:34:52 +01:00
Andreas Czerniak 5a9017cf18 clone, min. changes, test, run 2021-02-12 14:32:36 +01:00
Claudio Atzori aa55dedb8a Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator 2021-02-12 12:31:05 +01:00
Claudio Atzori 29c6f7e255 classes related to the collection workflow moved into common package; implemented MongoDB collection plugins 2021-02-12 12:31:02 +01:00
Sandro La Bruzzo 17e6f1934e fixed NPE on cleaner 2021-02-12 11:48:11 +01:00
Sandro La Bruzzo ebcc3ec14f updated wrong datacite identifier in trasformation 2021-02-11 16:25:51 +01:00
Michele Artini 83d815d0bc only stats 2021-02-11 10:57:23 +01:00
Michele Artini 8c836bf930 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2021-02-11 10:54:41 +01:00
Michele Artini 8c1600398a added resumeFrom parameter 2021-02-11 10:54:16 +01:00
Claudio Atzori 3f8f78cbfb Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2021-02-11 09:36:10 +01:00
Claudio Atzori b34b5a39ca index field authoridtypevalue mixes up different author id-type value pairs, dropped in favour of orcidtypevalue 2021-02-11 09:36:04 +01:00
Michele Artini 7249cceb53 switch of 2 nodes 2021-02-11 09:27:08 +01:00
Claudio Atzori 73393d3c4d Merge pull request 'validatedLinksToProjects' (#93) from validatedLinksToProjects into master
LGTM
2021-02-10 12:32:35 +01:00
Alessia Bardi 986dd969d3 use the proper import for Lists 2021-02-10 12:03:54 +01:00
miconis 4b2124a18e implementation of the openorgs wfs, implementation of the raw_all wf to migrate openorgs db entities 2021-02-10 11:51:50 +01:00
Alessia Bardi c4d1feca74 mapper test with validated link to project 2021-02-10 11:22:54 +01:00
Alessia Bardi 09fc7e2f78 serialization of validated flag on relationships 2021-02-10 11:22:09 +01:00
Enrico Ottonello ee4ba7298b fix last update read/write from file on hdfs 2021-02-09 23:24:57 +01:00
Claudio Atzori bc458d1b54 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2021-02-09 16:27:30 +01:00
Claudio Atzori 82e6c50f3f updated solr fields (authoridtypevalue, resultsubject, resultresourcetypename) 2021-02-09 16:27:04 +01:00
Claudio Atzori 62bd3c53ee Merge branch 'master' into provision_indexing 2021-02-09 15:46:26 +01:00
Claudio Atzori bae029f828 collection_java_xmx allows declaring the heap size allocated for the java actions involved in the metadata collection workflow 2021-02-08 18:07:23 +01:00
Claudio Atzori bebc54d5bf seq file storing native records is now compressed 2021-02-08 18:06:25 +01:00
Claudio Atzori 50add4c61b added requestDelay to HttpConnector2 configuration; Aggregation workflow constants moved in dhp-common 2021-02-08 12:19:38 +01:00
Miriam Baglioni 2f5e6647c6 merge upstream 2021-02-08 10:33:11 +01:00
Claudio Atzori 40df0f987d better logging, WIP: collectorWorker error reporting; common functions moved in DHPUtils 2021-02-06 20:12:00 +01:00
Claudio Atzori a8a758925e better logging, WIP: collectorWorker error reporting 2021-02-05 19:18:05 +01:00
Michele Artini 2ee0c3e47e http entity as json string 2021-02-05 09:45:39 +01:00
Claudio Atzori 730973679a Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator 2021-02-04 17:25:00 +01:00
Claudio Atzori deb85706db imported HttpConnector from https://svn.driver.research-infrastructures.eu/driver/dnet45/modules/dnet-modular-collector-service/trunk/src/main/java/eu/dnetlib/data/collector/plugins/HttpConnector.java as HttpConnector2 2021-02-04 17:24:52 +01:00
Sandro La Bruzzo 4dae5e605d implemented messaging between collection worker and Dnet 2021-02-04 15:51:15 +01:00
Claudio Atzori 72c57b28fa switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT 2021-02-04 14:08:18 +01:00
Claudio Atzori 40764cf626 better logging, WIP: collectorWorker error reporting 2021-02-04 14:06:02 +01:00
Enrico Ottonello c238561001 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2021-02-04 10:44:21 +01:00
Enrico Ottonello 465ce39f75 job execution now based on file last_update.txt on hdfs 2021-02-04 10:44:04 +01:00
Sandro La Bruzzo 69c253710b fixed test 2021-02-04 10:30:49 +01:00
Michele Artini 3ea8c328ac Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into hadoop_aggregator 2021-02-04 09:46:13 +01:00
Michele Artini 26d2eb946f messages sender 2021-02-04 09:45:46 +01:00
Claudio Atzori 4758b58aa2 Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator 2021-02-03 17:58:29 +01:00
Claudio Atzori e04045089f better logging, WIP: collectorWorker error reporting 2021-02-03 17:58:22 +01:00
Alessia Bardi c67329d3ad updated test for EU Open Data portal datasets 2021-02-03 17:06:48 +01:00
Michele Artini 1b9731632b Message Sender 2021-02-03 16:42:36 +01:00
Michele Artini 820d729e99 recover of Message and MessageType class 2021-02-03 16:20:34 +01:00
Michele Artini 33f4696d6e Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into hadoop_aggregator 2021-02-03 16:08:21 +01:00
Michele Artini c286d28ad2 logs 2021-02-03 16:07:49 +01:00
Claudio Atzori 0e8a4f9f1a better logging, WIP: collectorWorker error reporting 2021-02-03 12:33:41 +01:00
Alessia Bardi fd705404a1 tests for EU Open Data portal dataset mapping 2021-02-03 10:28:17 +01:00
Miriam Baglioni 6190465851 merge upstream 2021-02-03 10:27:27 +01:00
Claudio Atzori 53884d12c2 code formatting 2021-02-02 14:38:03 +01:00
Claudio Atzori ac46c247d2 code formatting 2021-02-02 14:24:00 +01:00
Claudio Atzori bde14b149a fixed transformation target paths 2021-02-02 12:49:29 +01:00
Claudio Atzori ca4391aa1c minor changes 2021-02-02 12:44:04 +01:00
Claudio Atzori bb89b99b24 code formatting 2021-02-02 12:34:14 +01:00
Claudio Atzori 75807ea5ae factored out constants 2021-02-02 12:28:21 +01:00
Sandro La Bruzzo 4ed1e306b6 Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into hadoop_aggregator 2021-02-02 12:12:51 +01:00
Sandro La Bruzzo 0634674add implemented transformation test 2021-02-02 12:12:14 +01:00
Claudio Atzori d62ea1490d cleaned up RabbitMQ stuff 2021-02-02 10:53:19 +01:00
Claudio Atzori 73d772a4b4 added method to list the known vocabulary names 2021-02-02 10:39:47 +01:00
Claudio Atzori 8eaa1fd4b4 WIP: metadata collection in INCREMENTAL mode and relative test 2021-02-01 19:29:10 +01:00
Sandro La Bruzzo bead34d11a code refactor 2021-02-01 14:58:06 +01:00
Sandro La Bruzzo 6ff234d81b Implemented a first prototype of incremental harvesting and transformation using readlock 2021-02-01 13:56:05 +01:00
Sandro La Bruzzo b6b835ef49 update transformation Factory to get Transformation Rule by Id and not by Title 2021-02-01 08:49:42 +01:00
Sandro La Bruzzo e423634cb6 RollBack in case of error WORKS!!! 2021-01-29 17:21:42 +01:00
Sandro La Bruzzo 8ee82576c6 Collection on Refresh WORKS!!! 2021-01-29 17:02:46 +01:00
Sandro La Bruzzo 0276180039 WIP mdstore
transaction implemented on hadoop side
2021-01-29 16:42:41 +01:00
Michele Artini d942d0c77d methods toString(), hashCode() and equals() 2021-01-29 13:16:48 +01:00
Sandro La Bruzzo 0f8e2ecce6 Merged Datacite transform into this branch 2021-01-29 10:45:07 +01:00
Sandro La Bruzzo 99cf3a8ea4 Merged Datacite transform into this branch 2021-01-28 16:34:46 +01:00
Sandro La Bruzzo 2da8bf7429 Merge pull request 'aggregation_on_hadoop' (#91) from sandro.labruzzo/dnet-hadoop:aggregation_on_hadoop into hadoop_aggregator
ok
2021-01-28 10:06:49 +01:00
Sandro La Bruzzo 686e7b507c Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into aggregation_on_hadoop 2021-01-28 10:02:13 +01:00
Sandro La Bruzzo 98b9498b57 Removed old messaging system not quite used from collection and Transformation workflow
code refactor
2021-01-28 09:51:17 +01:00
Michele Artini 38f2508c87 new fields in mdstore beans 2021-01-28 08:24:45 +01:00
Sandro La Bruzzo 184e7b3856 Implemented new Transformation using spark 2021-01-27 15:43:08 +01:00
Sandro La Bruzzo 150a617bd1 Merge pull request 'aggregation_on_hadoop' (#90) from sandro.labruzzo/dnet-hadoop:aggregation_on_hadoop into hadoop_aggregator
Wonderful code... You're the best, Sandro
2021-01-26 16:00:47 +01:00
Claudio Atzori f1a852f278 align usage-stats workflow poms with latest snapshot version 2021-01-26 15:42:42 +01:00
Claudio Atzori 9c32119dc2 Merge pull request 'usage-stats-export-wf-v2' (#89) from dimitris.pierrakos/dnet-hadoop:usage-stats-export-wf-v2 into master
Thank you Dimitris!
2021-01-26 15:01:41 +01:00
Claudio Atzori 885e0dd926 [Cleaning] filter authors not providing word characters in the fullname 2021-01-26 09:48:53 +01:00
Claudio Atzori 2890511613 [Cleaning] normalise missing Result.country 2021-01-26 09:41:44 +01:00
Claudio Atzori 4eb9ed35b1 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2021-01-25 18:12:24 +01:00
Claudio Atzori cd379eb5e3 [Cleaning] trying to avoid NPEs, this time by ruling out authors without a defined fullname 2021-01-25 18:11:49 +01:00
Alessia Bardi 505477f36f format code 2021-01-25 18:02:49 +01:00
Alessia Bardi ded6ed8d7d no ',' author if there are no authors in ODF records 2021-01-25 17:57:51 +01:00
Claudio Atzori 3465c8ccee [Cleaning] trying to avoid NPEs 2021-01-25 16:54:53 +01:00
Sandro La Bruzzo a54848a59c Moved Vocabulary stuff to common module 2021-01-25 15:43:04 +01:00
Sandro La Bruzzo ffb092b8d3 removed duplicate code HttpConnector.java 2021-01-25 15:05:37 +01:00
Sandro La Bruzzo cda210a2ca changed documentation since it didn't reflect the current status 2021-01-25 14:17:42 +01:00
Claudio Atzori 07a0ccfc96 [Cleaning] trying to avoid NPEs 2021-01-25 13:36:01 +01:00
miconis c7e2d5a59a minor changes 2021-01-25 12:40:45 +01:00
Claudio Atzori 646dab7f68 trying to avoid NPEs 2021-01-22 18:24:34 +01:00
Claudio Atzori 34d653de41 [Cleaning] updated cleaning rule for DOIs 2021-01-22 14:16:33 +01:00
Miriam Baglioni fe36895c53 added datasource blacklist for the organization to result propagation through institutional repositories 2021-01-22 11:55:10 +01:00
miconis 8fea29177c refactoring, minor changes and implementation of the wf for openorgs with integration of organization phases into the scan wf 2021-01-18 16:48:08 +01:00
Dimitris 3e8d2a6b2d Clean workflows 2021-01-15 16:19:12 +02:00
Michele Artini f667e94a31 Merge pull request 'broker' (#88) from broker into master 2021-01-14 14:48:13 +01:00
Michele Artini cfbcdc95bc fixed a wf param 2021-01-14 14:45:23 +01:00
Michele Artini 69ba3203c0 fixed a conflict 2021-01-14 14:43:25 +01:00
Michele Artini fafb5b2e08 Merge branch 'broker' of code-repo.d4science.org:D-Net/dnet-hadoop into broker 2021-01-14 14:32:42 +01:00
Michele Artini b230d44411 fixed conflict 2021-01-14 14:32:31 +01:00
Michele Artini b9d90e95b8 Added eventId to ShortEventMessage 2021-01-14 14:32:31 +01:00
Michele Artini 64b0b0bfb3 fixed a bug with invalid subject topic 2021-01-14 14:32:31 +01:00
Michele Artini e3e0ab1de1 fixed a problem with join 2021-01-14 14:32:31 +01:00
Michele Artini 26a941315a openaireId 2021-01-14 14:32:31 +01:00
Michele Artini 6f4d1a37f0 ES wf properties 2021-01-14 14:32:31 +01:00
Michele Artini 1391341d06 mkdir of output dir 2021-01-14 14:32:31 +01:00
Michele Artini 3c9cbd19f3 whitelist of topics 2021-01-14 14:32:31 +01:00
Michele Artini 467aa77279 workingDir and outputDir 2021-01-14 14:32:31 +01:00
Michele Artini 10f3f7eca7 workingDir and outputDir 2021-01-14 14:32:31 +01:00
Michele Artini ff41a7b3a4 gzipped output 2021-01-14 14:32:31 +01:00
Michele Artini 223fa660cb fixed conflict 2021-01-14 14:23:44 +01:00
Michele Artini ac91e495fc Added eventId to ShortEventMessage 2021-01-14 13:20:35 +01:00
Claudio Atzori 80cf55ef2e [Broker] fixed partitionEventsByOpendoarIds workflow parameter names 2021-01-13 16:24:30 +01:00
Claudio Atzori 41500669e2 [BIP! Scores integration] merged missing classes from bipFinder branch 2021-01-11 14:39:47 +01:00
Claudio Atzori 2a7a10809e [BIP! Scores integration] merged missing classes from bipFinder branch 2021-01-11 10:05:02 +01:00
Claudio Atzori 5bd999efe7 Merge pull request 'bipFinder_master_test' (#84) from bipFinder_master_test into master 2021-01-08 18:16:34 +01:00
Claudio Atzori d6686dd7cf merged from master 2021-01-08 18:16:12 +01:00
Claudio Atzori 34229970e6 [BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation 2021-01-08 16:29:17 +01:00
Claudio Atzori 1361c9eb0c [BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation 2021-01-07 10:07:30 +01:00
Claudio Atzori ab2fe9266a [DOIBoost] minor fixes in workflow definition 2021-01-05 10:26:39 +01:00
Claudio Atzori 7c722f3fdc [DOIBoost] fixed typo 2021-01-05 10:25:54 +01:00
Claudio Atzori 8879704ba0 [DOIBoost] configurable ES server url and index name in crossref importer 2021-01-05 10:00:13 +01:00
Claudio Atzori 26e9d55c13 code formatting 2021-01-05 09:59:26 +01:00
Sandro La Bruzzo 7834a35768 avoid to save intermediate dataset before generation of Sequence file 2021-01-04 17:54:57 +01:00
Sandro La Bruzzo e79445a8b4 minor fix for claudio polemica 2021-01-04 17:39:25 +01:00
Sandro La Bruzzo 8765020b85 minor fix 2021-01-04 17:37:08 +01:00
Sandro La Bruzzo b0dc92786f defined a single oozie workflow for the generation of doiboost 2021-01-04 17:01:35 +01:00
Claudio Atzori 7185158942 ignore missing properties 2020-12-29 11:06:28 +01:00
Claudio Atzori 28460c2cd1 using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper 2020-12-23 16:59:52 +01:00
Claudio Atzori 60649ac7d2 swapped expected and actual in tests, updated expected number of authors 2020-12-23 12:26:04 +01:00
Claudio Atzori 723b01f9e9 trivial: the less magic numbers and values around, the better 2020-12-23 12:22:48 +01:00
Claudio Atzori 6848d0c3d7 trivial: avoid duplicated code 2020-12-23 12:21:58 +01:00
Claudio Atzori d8b5f43a7e code formatting 2020-12-22 14:59:03 +01:00
Claudio Atzori 7bfc35df5e Merge pull request 'Changed typo in script names' (#82) from antonis.lempesis/dnet-hadoop:master into master
no need to! :)
2020-12-22 12:36:21 +01:00
Antonis Lempesis be5969a8c2 Changed typo in script names 2020-12-22 13:33:32 +02:00
miconis 794e22b09c bug fix in the authormerge: now authors with higher size have priority, normalization of author name fixed 2020-12-21 17:51:42 +01:00
miconis 1e1aab83e3 implementation of the raw wf for openorgs: still not complete, some functionalities are missing 2020-12-21 11:58:21 +01:00
Claudio Atzori 6cb0dc3f43 extended ORCID cleaning procedure 2020-12-21 11:40:17 +01:00
Claudio Atzori 573a8a3272 Merge pull request 'Changed typo in script names' (#81) from antonis.lempesis/dnet-hadoop:master into master
ok! LGTM
2020-12-18 17:44:26 +01:00
Antonis Lempesis 2a074c3b2b Changed typo in script names 2020-12-18 18:40:48 +02:00
Claudio Atzori 47270d9af5 lenient mock can be lenient 2020-12-18 15:38:59 +01:00
Claudio Atzori 2e503ee101 code formatting 2020-12-17 13:47:38 +01:00
Claudio Atzori 5a3e2199b2 Merge pull request 'Creation of the action set to include the bipFinder! score' (#80) from miriam.baglioni/dnet-hadoop:bipFinder into bipFinder_master_test 2020-12-17 12:26:38 +01:00
Claudio Atzori 03319d3bd9 Revert "Merge pull request 'Creation of the action set to include the bipFinder! score' (#62) from miriam.baglioni/dnet-hadoop:bipFinder into master"
This reverts commit add7e1693b, reversing
changes made to f9a8fd8bbd.
2020-12-17 12:23:58 +01:00
Claudio Atzori add7e1693b Merge pull request 'Creation of the action set to include the bipFinder! score' (#62) from miriam.baglioni/dnet-hadoop:bipFinder into master 2020-12-17 12:09:03 +01:00
Alessia Bardi f9a8fd8bbd updated test record for textgrid 2020-12-17 11:59:45 +01:00
Claudio Atzori 4766495f5b [orcid_to_result_from_semrel_propagation] fixed typo in SQL 2020-12-17 09:15:50 +01:00
Claudio Atzori de00094ebc Merge pull request 'FIX on the creation of subject based broker enrichments' (#79) from broker into master 2020-12-15 14:58:31 +01:00
Michele Artini f9dc1e45fd fixed a bug with invalid subject topic 2020-12-15 14:54:11 +01:00
Sandro La Bruzzo f92bd56f56 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-12-15 11:47:29 +01:00
Sandro La Bruzzo 1f6c8a9e83 added orcid_pending type to records coming from Crossref 2020-12-15 11:47:15 +01:00
Enrico Ottonello b2de598c1a all actions, from downloading the lambda file to merging updated data, in one wf 2020-12-15 10:42:55 +01:00
Claudio Atzori 9f1181290e Merge pull request 'broker' (#78) from broker into master
The changes look good to me.
2020-12-15 10:03:45 +01:00
Claudio Atzori 6299f75807 Merge pull request 'validation in claim rels' (#77) from claims_validation into master
LGTM
2020-12-15 09:28:24 +01:00
Michele Artini 0a0f62bd01 Merge branch 'master' into broker 2020-12-15 08:30:52 +01:00
Michele Artini 12fa5d122a fixed a problem with join 2020-12-15 08:30:26 +01:00
Michele Artini 991e675dc6 validation in claim rels 2020-12-14 15:41:25 +01:00
Michele Artini 3e19cf7b4a openaireId 2020-12-14 15:24:33 +01:00
Claudio Atzori b6f08ce226 re-adding the old junit:junit dep as solr-test-framework needs it 2020-12-14 15:07:31 +01:00
Claudio Atzori e8ef8c63d4 delegate merging of OafEntity.dataInfo to the implementation of subclasses 2020-12-14 15:04:44 +01:00
Claudio Atzori 7d325e2c57 using actual result subclasses instead of their parent class 2020-12-14 14:40:54 +01:00
Claudio Atzori 152916890f renamed test name 2020-12-14 14:40:05 +01:00
Michele Artini a203aee32a ES wf properties 2020-12-14 12:02:33 +01:00
Claudio Atzori 1506f49052 Xml record serialization for author PIDs: 1) only one value per PID type is allowed; 2) orcid prevails over orcid_pending 2020-12-14 11:14:03 +01:00
Michele Artini d03756c962 mkdir of output dir 2020-12-14 11:11:41 +01:00
Michele Artini 399548f221 whitelist of topics 2020-12-14 11:03:55 +01:00
Michele Artini 38da1c282a Merge branch 'master' into broker 2020-12-14 09:14:02 +01:00
Dimitris dc9c2f3272 Commit 12122020 2020-12-12 12:00:14 +02:00
Enrico Ottonello efe4c2a9c5 authors and works are now updated in two separate spark actions of the wf 2020-12-12 02:06:21 +01:00
Enrico Ottonello 858efbfad1 fix dataset creation for downloaded works 2020-12-11 16:49:54 +01:00
Claudio Atzori 61cd129ded XML serialisation test 2020-12-11 12:44:53 +01:00
Claudio Atzori ce7a319e01 using the correct assertion import 2020-12-11 12:44:17 +01:00
Claudio Atzori 7fe2433137 excluded transitive older junit dependencies, they can compromise the unit test executions 2020-12-11 12:42:55 +01:00
Claudio Atzori d9532446eb imported more diffs from master branch; code formatting 2020-12-10 16:14:16 +01:00
Claudio Atzori 1eaad89a3c do not fail on unknown properties when grouping entities by ID 2020-12-10 15:56:11 +01:00
Michele Artini 933b4c1ada workingDir and outputDir 2020-12-10 14:47:51 +01:00
Michele Artini 2e7df07328 workingDir and outputDir 2020-12-10 14:47:22 +01:00
Michele Artini 94bfed1c84 gzipped output 2020-12-10 11:59:28 +01:00
Claudio Atzori 3c10941376 Merge pull request 'bipFinder_resolve_conflicts' (#73) from bipFinder_resolve_conflicts into stable_ids 2020-12-10 11:00:46 +01:00
Claudio Atzori 12e2f930c8 resolved conflicts 2020-12-10 10:57:39 +01:00
Miriam Baglioni b7adbc7c3e merge branch with master 2020-12-10 10:35:27 +01:00
Alessia Bardi 112da6d76a in theory, just auto-formatting after mvn compile 2020-12-09 20:00:27 +01:00
Alessia Bardi bece04b330 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-12-09 19:54:43 +01:00
Alessia Bardi 426b76ee8e more asserts for TextGrid record 2020-12-09 19:46:11 +01:00
Claudio Atzori ff72fcd91a allow orcid_pending to percolate to the XML graph serialization 2020-12-09 19:04:50 +01:00
Claudio Atzori 4705144918 Merge pull request 'rel_project_validation' (#69) from rel_project_validation into master
LGTM
2020-12-09 19:01:20 +01:00
Claudio Atzori 211aa04726 allow orcid_pending to percolate to the XML graph serialization 2020-12-09 18:08:51 +01:00
Claudio Atzori db4e400a0b introduced Oaf.mergeFrom method to allow merging of dataInfo(s), with prevalence based on datainfo.trust 2020-12-09 18:01:45 +01:00
Claudio Atzori ada21ad920 Merge pull request 'dump of the results related to at least one project' (#61) from miriam.baglioni/dnet-hadoop:dump into master
LGTM
2020-12-09 17:22:56 +01:00
Miriam Baglioni 6fbc67a959 using ModelConstant.ORCID and removing not used constants 2020-12-09 17:10:20 +01:00
Claudio Atzori 3c5ce1dada code formatting 2020-12-09 17:07:20 +01:00
Miriam Baglioni 212b52614f added graph mapper versus community result without context and project in common to be used for the doiboost mapping 2020-12-09 16:59:02 +01:00
Michele Artini 1bc9adc10d default trust for validated rels 2020-12-09 16:18:37 +01:00
Claudio Atzori fcd7689b50 promote actions: shouldGroupById parameter marked as optional (default is true) 2020-12-09 13:10:16 +01:00
Michele Artini 5f21a356fd reindent 2020-12-09 11:24:30 +01:00
Michele Artini 370a5e650b validation attributes in resultProject relations 2020-12-09 11:18:26 +01:00
Antonis Lempesis aead9efd24 added the new parameter (stats_tool_api_url) in the workflow parameters 2020-12-09 10:45:24 +01:00
Antonis Lempesis 77a3a6d82e added the new parameter (stats_tool_api_url) in the workflow parameters 2020-12-09 10:45:24 +01:00
Antonis Lempesis 91226117b3 ignoring deletedbyinference relations 2020-12-09 10:45:24 +01:00
Antonis Lempesis b7f29db126 finished first implementation of wf 2020-12-09 10:45:24 +01:00
Antonis Lempesis ded2392275 initial implementation of the promote wf 2020-12-09 10:45:24 +01:00
Antonis Lempesis 1a87a1effd added last step to update cache 2020-12-09 10:45:24 +01:00
Michele Artini 75bf708351 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-12-09 10:31:33 +01:00
Michele Artini 620e1307a3 indentation 2020-12-09 10:30:47 +01:00
Enrico Ottonello 2233750a37 original orcid xml data are stored in a field of the class that models orcid data 2020-12-09 09:45:19 +01:00
Claudio Atzori 491ad24750 introduced filtering for DOIs in graph cleaning workflow 2020-12-09 09:10:33 +01:00
Claudio Atzori 27e96767e0 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-12-07 21:53:22 +01:00
Claudio Atzori fba11eef2a cleanup 2020-12-07 21:53:13 +01:00
Claudio Atzori 2fcc24b36e code formatting 2020-12-07 21:52:32 +01:00
Claudio Atzori 197f286fa4 removed duplicated dependency (org.apache.httpcomponents:httpclent 2020-12-07 21:52:17 +01:00
Sandro La Bruzzo 7f8b93de72 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-12-07 19:59:39 +01:00
Sandro La Bruzzo 302baab67b fixed doiboost mapping and workflows 2020-12-07 19:59:33 +01:00
Enrico Ottonello 5c65e602d3 wf doi_authors generates one json record for each row 2020-12-07 15:28:10 +01:00
Michele Artini d6934f370e Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-12-07 14:56:23 +01:00
Michele Artini 5de8a7276f wf to partition opendoar events 2020-12-07 14:56:06 +01:00
Claudio Atzori 5e8509bef7 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-12-07 13:50:08 +01:00
Claudio Atzori 026ad40633 disabled test 2020-12-07 13:50:01 +01:00
Claudio Atzori 21ddcf3a73 actions promotion can optionally avoid grouping objects by id (configured via shouldGroupById parameter) 2020-12-07 13:45:18 +01:00
Enrico Ottonello fa1855a4b8 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-12-07 11:02:59 +01:00
Enrico Ottonello b1b589ada1 wf to generate orcid dataset 2020-12-07 11:02:32 +01:00
Sandro La Bruzzo 620e585b63 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-12-07 10:42:53 +01:00
Sandro La Bruzzo b31dd126fb fixed crossref workflow; added common ORCID class 2020-12-07 10:42:38 +01:00
Enrico Ottonello 8812ab65e1 completed download function to wf; added accumulators 2020-12-04 21:13:49 +01:00
Claudio Atzori a104a632df cleanup 2020-12-04 16:32:47 +01:00
Claudio Atzori 5b4e1142a8 Merge pull request 'added last step to update cache' (#64) from antonis.lempesis/dnet-hadoop:master into master
Looks good to me, thanks!
2020-12-04 14:42:31 +01:00
Antonis Lempesis b1ed1afdcc added the new parameter (stats_tool_api_url) in the workflow parameters 2020-12-04 13:07:18 +02:00
Antonis Lempesis 7cb113e088 added the new parameter (stats_tool_api_url) in the workflow parameters 2020-12-04 13:04:25 +02:00
Antonis Lempesis d23ccae0d5 ignoring deletedbyinference relations 2020-12-04 12:42:17 +02:00
Miriam Baglioni 5fb65ffc4a merge branch with master 2020-12-03 11:24:35 +01:00
Miriam Baglioni ea88dc3401 fixed issue in property name 2020-12-03 11:24:23 +01:00
Miriam Baglioni 4c58bd1c93 merge with upstream 2020-12-03 11:24:00 +01:00
Miriam Baglioni 05c452f58d merge with upstream 2020-12-03 10:26:45 +01:00
Enrico Ottonello 53b22c1937 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-12-02 23:21:27 +01:00
Enrico Ottonello 1b1e9ea67c wf to generate doi_author_list for doiboost; wf to download updated works 2020-12-02 23:20:16 +01:00
Antonis Lempesis 413afcfed5 finished first implementation of wf 2020-12-02 15:57:17 +02:00
Antonis Lempesis 0948536614 initial implementation of the promote wf 2020-12-02 15:41:56 +02:00
Sandro La Bruzzo 7da679542f fixed wrong projectId 2020-12-02 14:28:09 +01:00
Sandro La Bruzzo 6ba8037cc7 fixed failure to test due to changing of input 2020-12-02 11:34:46 +01:00
Claudio Atzori cfb55effd9 code formatting 2020-12-02 11:23:49 +01:00
Claudio Atzori 74242e450e using constants from ModelConstants 2020-12-02 11:23:35 +01:00
Miriam Baglioni d5efa6963a using constants in ModelConstants 2020-12-02 11:20:26 +01:00
Claudio Atzori 873c358d1d Merge pull request 'added extension for new author pid (orcid_pending)' (#63) from miriam.baglioni/dnet-hadoop:master into master
LGTM
2020-12-02 11:15:00 +01:00
Miriam Baglioni cd285e98bc using the constants defined in the ModelConstants class 2020-12-02 11:13:23 +01:00
Miriam Baglioni 51c582c08c added orcid class name among the constants set 2020-12-02 11:12:54 +01:00
Miriam Baglioni 4b0d1530a2 merge upstream 2020-12-02 11:05:00 +01:00
Claudio Atzori faa977df7e Merge pull request 'orcid-no-doi' (#43) from enrico.ottonello/dnet-hadoop:orcid-no-doi into master
The dataset was generated and is now part of the actionsets available in BETA
2020-12-02 10:55:12 +01:00
Claudio Atzori 57f448b7a4 graph cleaning workflow separate orcid_pending from orcid, depending on the author pid provenance 2020-12-02 10:44:05 +01:00
Alessia Bardi 2d15667b4a testing XML generation from json object (case AMS ACTA) 2020-12-02 10:16:26 +01:00
Alessia Bardi a417624670 tests for raw graph mapping 2020-12-02 10:15:26 +01:00
Claudio Atzori 943b961cf6 introduced PidBlacklist 2020-12-02 09:30:34 +01:00
Claudio Atzori 893ac4a77b GenerateEntitiesApplication can be configured to hash the id value or not 2020-12-02 09:30:06 +01:00
Miriam Baglioni f8468c9c22 added extension for new author pid (orcid_pending) 2020-12-01 20:09:35 +01:00
Miriam Baglioni 888175baf7 added java doc 2020-12-01 18:36:29 +01:00
Miriam Baglioni 3d62d99d5d fixed issue in workflow variable 2020-12-01 15:02:49 +01:00
Miriam Baglioni 17680296b9 removed unnecessary variable and unused method 2020-12-01 15:02:31 +01:00
Miriam Baglioni 5b3ed70808 refactoring 2020-12-01 14:31:34 +01:00
Miriam Baglioni 62ff4999e3 added workflow and last step of collection and save 2020-12-01 14:30:56 +01:00
Miriam Baglioni 45d06c45c7 collecting all the atomic actions for result types and saving them all in the AS path 2020-12-01 14:29:18 +01:00
Miriam Baglioni 0051ebede5 extending test 2020-12-01 12:43:03 +01:00
Miriam Baglioni 719da15f04 added test resources 2020-12-01 12:42:30 +01:00
Miriam Baglioni e819155eb2 added implements Serializable 2020-12-01 09:51:58 +01:00
Miriam Baglioni db36e11912 classes test classes and resources for production of the actionset to include bipFinder score in results 2020-11-30 20:14:23 +01:00
Claudio Atzori 349e7246aa do not consider NCID, GBIF as candidate PIDs for the ID creation 2020-11-30 16:52:40 +01:00
Enrico Ottonello f2df3ead74 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-11-30 14:22:46 +01:00
Enrico Ottonello 40c4559e92 added datainfo on authors pid with "sysimport:crosswalk:entityregistry", 2020-11-30 14:19:22 +01:00
Claudio Atzori 2c407e775e GenerateEntitiesApplication can be configured to hash the id value or not 2020-11-30 12:00:38 +01:00
Antonis Lempesis 815d6b25d9 added last step to update cache 2020-11-30 00:48:10 +02:00
Claudio Atzori 758d27745d cleaning tab characters from text fields 2020-11-27 16:07:24 +01:00
Claudio Atzori 596a2a459d added testing class for OafMapperUtils 2020-11-27 12:01:11 +01:00
Claudio Atzori e731a7658d cleaning texts to remove tab characters too 2020-11-27 09:00:04 +01:00
Claudio Atzori fa66e5b6b8 ResultTypeComparator gives priority to Records collectedfrom Crossref 2020-11-26 13:09:19 +01:00
Claudio Atzori 5151850a19 CROSSREF and DATACITE constants moved in common ModelConstants 2020-11-26 13:08:36 +01:00
Claudio Atzori a104d2b6ad cleanup 2020-11-26 11:12:00 +01:00
Claudio Atzori d0d5525d40 minor changes 2020-11-26 11:04:17 +01:00
Claudio Atzori 13eae4b31e GroupEntitiesSparkJob must read all graph paths but relations 2020-11-26 11:04:01 +01:00
Claudio Atzori 76363a8512 SimpleDateFormat is not thread safe; improved error reporting in case of invalid dates 2020-11-26 11:03:12 +01:00
Claudio Atzori c1b9a4045a grouping of records will be performed by the dedup workflow 2020-11-26 10:59:10 +01:00
Miriam Baglioni 124591a7f3 refactoring 2020-11-25 18:23:28 +01:00
Miriam Baglioni 1a89f8211c D-Net/dnet-hadoop#61 (comment) 2020-11-25 18:12:40 +01:00
Miriam Baglioni 5fbe54ef54 D-Net/dnet-hadoop#61 (comment) 2020-11-25 18:10:28 +01:00
Miriam Baglioni ed01e5a5e1 D-Net/dnet-hadoop#61 (comment) 2020-11-25 18:09:34 +01:00
Miriam Baglioni d4ddde2ef2 changed because of D-Net/dnet-hadoop#61 (comment) 2020-11-25 18:01:01 +01:00
Miriam Baglioni f5e5e92a10 changed because of D-Net/dnet-hadoop#61 (comment) 2020-11-25 17:58:53 +01:00
Miriam Baglioni 1df94b85b4 changed because of D-Net/dnet-hadoop#61 (comment) 2020-11-25 17:57:43 +01:00
Miriam Baglioni 66c0e3e574 changed because of D-Net/dnet-hadoop#61 (comment) 2020-11-25 17:52:17 +01:00
Claudio Atzori db0181b8af Merge pull request 'added bidirectionality to relations from project and result coming from crossref' (#60) from miriam.baglioni/dnet-hadoop:sxBidirectionality into master 2020-11-25 17:17:40 +01:00
Sandro La Bruzzo ec3e238de6 Fixed problem on duplicated identifier 2020-11-25 17:15:54 +01:00
Claudio Atzori 1372a4d1bf fixed merging method 2020-11-25 16:05:51 +01:00
Claudio Atzori e208b03755 renamed workflow 2020-11-25 14:55:50 +01:00
Claudio Atzori dfd6205b95 Consistency graph workflow merges all the entities by ID 2020-11-25 14:55:32 +01:00
Miriam Baglioni 90d4369fd2 added test to verify the compression in writing community info on hdfs 2020-11-25 14:34:58 +01:00
Miriam Baglioni 6750e33d69 merge branch with master 2020-11-25 14:09:01 +01:00
Miriam Baglioni b2c455f883 added java doc 2020-11-25 14:08:09 +01:00
Miriam Baglioni 1f130cdf92 changed the relation (produces -> isProducedBy) due to the change in the code 2020-11-25 14:04:26 +01:00
Miriam Baglioni e758d5d9b4 refactoring 2020-11-25 13:46:39 +01:00
Miriam Baglioni 87a9f616ae refactoring and addition of the funder nsp first part as name for the dump instead of the whole nsp 2020-11-25 13:45:41 +01:00
Miriam Baglioni e7e418e444 added decision node to verify if to upload in Zenodo 2020-11-25 13:44:10 +01:00
Miriam Baglioni 305e3d0c9c added resource file for relation with relClass = isProducedBy 2020-11-25 13:43:41 +01:00
Miriam Baglioni 21ce175d17 added FilterFunction specification if filter operation 2020-11-25 13:42:31 +01:00
Miriam Baglioni bde6d337dd test classes for dump of results related to funders 2020-11-25 13:42:01 +01:00
Miriam Baglioni b37b9352d7 added constant value for semantic relationship between projects and results 2020-11-25 13:41:08 +01:00
Sandro La Bruzzo 264723ffd8 updated stuff for zenodo upload 2020-11-25 11:56:07 +01:00
Claudio Atzori 36173c13a5 reverted filters in the cleaning process 2020-11-25 10:24:42 +01:00
Claudio Atzori eeebd5a920 Cleaning workflow: remove newlines from titles, descriptions, subjects 2020-11-24 18:40:25 +01:00
Claudio Atzori e1a1bb3ee4 moved class CleaningFunctions in the correct package. Remove newlines from titles, descriptions, subjects 2020-11-24 18:34:03 +01:00
Enrico Ottonello 99a086f0c6 max concurrent executors set to 10, according to ORCID Director of Technology mail request 2020-11-24 17:49:32 +01:00
Miriam Baglioni 72bb0fe360 changed directory name 2020-11-24 16:47:07 +01:00
Miriam Baglioni 00874a8ce6 added bidirectionality to relations from project and result 2020-11-24 15:17:23 +01:00
Miriam Baglioni 39f4a20873 changed the path and the name for saving the communities_infrastructures dump file 2020-11-24 14:47:32 +01:00
Miriam Baglioni 7e14452a87 final version of the wf to get the dump of results associated to at least one funder per funder 2020-11-24 14:46:34 +01:00
Miriam Baglioni c167a18057 added new parameter for the dumpType 2020-11-24 14:45:50 +01:00
Miriam Baglioni 54a309bb6b refactoring 2020-11-24 14:45:30 +01:00
Miriam Baglioni 35ecea8842 changed to consider the modification for the specification of the type of dump 2020-11-24 14:45:15 +01:00
Miriam Baglioni b9b6bdb2e6 fixing issue on previous implementation 2020-11-24 14:44:53 +01:00
Miriam Baglioni 7e940f1991 changed to consider the modification for the specification of the type of dump 2020-11-24 14:43:34 +01:00
Miriam Baglioni 62928ef7a5 changed to save the communities_infrastructures information as the other entity dumps: in a json.gz file 2020-11-24 14:42:41 +01:00
Claudio Atzori 33bae02451 reverted behaviour of the cleaning workflow: grouping entities by ID will be managed differently 2020-11-24 14:42:33 +01:00
Claudio Atzori e43ab07af6 code formatting 2020-11-24 14:41:39 +01:00
Miriam Baglioni 3319440c53 changed the direction of the relation between projects and result considered to select the results linked to projects 2020-11-24 14:41:09 +01:00
Miriam Baglioni 00c377dac2 added specification of MapFunction types in map 2020-11-24 14:40:22 +01:00
Miriam Baglioni 44db258dc4 added enumerated for the dump type 2020-11-24 14:38:06 +01:00
Miriam Baglioni 1832708c42 replaced boolean variable with a string one which specifies the type of dump we are performing: complete, community or funder 2020-11-24 14:37:36 +01:00
Miriam Baglioni 73dbb79602 removed the check for the community name in the common version of MakeTar 2020-11-24 14:36:15 +01:00
Claudio Atzori c016cc050a IdentifierFactory: in case a record provides more than one pid of the same type, the lexicographically lower value is chosen as best pick 2020-11-23 19:16:40 +01:00
Enrico Ottonello 5c17e768b2 set wf configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions 2020-11-23 16:01:23 +01:00
Enrico Ottonello 5c9a727895 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-11-23 09:49:53 +01:00
Enrico Ottonello 97c8111847 action to convert lambda file in seq file; spark action to download updated authors 2020-11-23 09:49:22 +01:00
Miriam Baglioni 259c67ce36 fixed issue in path name 2020-11-20 12:32:23 +01:00
Miriam Baglioni 0a9db67eec - 2020-11-20 12:21:33 +01:00
Miriam Baglioni d362f2637d merge branch with master 2020-11-19 19:17:20 +01:00
Miriam Baglioni cf3f47563f new parameter files 2020-11-19 19:16:05 +01:00
Miriam Baglioni 24c56fa7a3 new logic and workflow for dump of results with link to projects. In this implementation the result matches the model of the communityresult. 2020-11-19 19:15:39 +01:00
Claudio Atzori d48f388fb2 Merge branch 'provision_indexing' 2020-11-19 15:59:55 +01:00
Claudio Atzori 46bde9c13f Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-11-19 15:26:27 +01:00
Claudio Atzori 7c9feaf9e7 project attributes removed from the XML record serialization: contactfullname, contactfax, contactphone, contactemail 2020-11-19 15:26:20 +01:00
Claudio Atzori fcbb05eb21 cleanup 2020-11-19 15:14:33 +01:00
Claudio Atzori 3f34757c63 merged from master 2020-11-19 14:34:54 +01:00
Michele Artini 293da47ad9 Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-11-19 10:42:31 +01:00
Michele Artini ab08d12c46 considering abstract > MIN_LENGTH in ENRICH_MISSING_ABSTRACT 2020-11-19 10:42:10 +01:00
Claudio Atzori e503271abe fixed notification workflow name 2020-11-19 10:41:38 +01:00
Claudio Atzori 0374d34c3e introduced configuration param outputFormat: HDFS | SOLR 2020-11-19 10:34:28 +01:00
Miriam Baglioni fafb688887 - 2020-11-18 18:56:48 +01:00
Miriam Baglioni 906db690d2 - 2020-11-18 17:43:08 +01:00
Claudio Atzori ede7fae6c8 Merge pull request 'XML record indexing test' (#58) from provision_indexing into master 2020-11-18 17:04:34 +01:00
Miriam Baglioni 5402062ff5 changed parameter file with the one associated to the job 2020-11-18 16:58:20 +01:00
Miriam Baglioni a172a37ad1 fixed typo 2020-11-18 16:55:07 +01:00
Miriam Baglioni 46ba3793f6 code, workflow and parameters for the dump of the results associated to funders 2020-11-18 16:47:31 +01:00
Claudio Atzori 5218718e8b updated set of fields from the MDFormatDSResourceType on PROD 2020-11-18 15:00:41 +01:00
Claudio Atzori d9e07a242b extended XmlIndexingJob to accept an optional parameter: outputPath. When present, forces the job to write its output on the specified HDFS location 2020-11-18 14:34:55 +01:00
Claudio Atzori 29dcff0f34 spark complains about missing classes, so here they are again 2020-11-18 14:32:32 +01:00
Miriam Baglioni 57cac36898 changed the workflow name 2020-11-18 13:38:03 +01:00
Claudio Atzori 12acf25519 Merge pull request 'starting from first step...' (#57) from antonis.lempesis/dnet-hadoop:master into master
No judging. Just re-deploying...
2020-11-18 11:01:49 +01:00
Claudio Atzori 8177ce7939 test for XmlIndexingJob based on a local miniSolrCluster 2020-11-18 10:58:05 +01:00
Alessia Bardi 10e673660f Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-11-18 10:01:23 +01:00
Alessia Bardi be7b310cef rel semantics ignore case 2020-11-18 10:01:20 +01:00
Michele Artini 33da2e3d6c xpaths for dateOfCollection and dateOfTransformation 2020-11-18 09:26:20 +01:00
Antonis Lempesis 01a6e03989 starting from first step... 2020-11-17 23:26:47 +02:00
Alessia Bardi 8f87020a50 #56: map relevantDates from aggregated ODF records 2020-11-17 18:42:09 +01:00
Alessia Bardi 7e0a76a8ac test fr TextGrid 2020-11-17 18:39:25 +01:00
Enrico Ottonello 2b0c9bbb7e Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-11-17 18:24:34 +01:00
Enrico Ottonello c0c2e05eae added wf to extract authors and works xml data from orcid dump to hdfs; added wf to download the lambda file (containing last orcid update information) from orcid to hdfs 2020-11-17 18:23:12 +01:00
Claudio Atzori cfc01f136e PID filtering based on a blacklist 2020-11-17 12:27:06 +01:00
Claudio Atzori 628ca54dd3 disable old maven repository URLs 2020-11-17 12:26:16 +01:00
Dimitris bbcf6b7c8b Commit 17112020 2020-11-17 08:36:51 +02:00
Enrico Ottonello c796adae24 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-11-16 11:57:19 +01:00
Claudio Atzori 6ab1ce53c9 fixed condition in result pid cleaning; cleanup 2020-11-16 10:09:17 +01:00
Claudio Atzori 4de8c8b237 fixed workflow variable name 2020-11-16 10:03:11 +01:00
Dimitris 3e24c9b176 Changes 14112020 2020-11-14 18:42:07 +02:00
Claudio Atzori 331d621800 added test resource 2020-11-14 12:16:15 +01:00
Claudio Atzori 5d4e34e26a fixed typo in variable name 2020-11-14 10:32:26 +01:00
Claudio Atzori 768bc5304c Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-11-13 15:40:34 +01:00
Claudio Atzori 93f7b7974f Merge pull request 'trust truncated to 3 decimals' (#24) from trunc_trust into master
LGTM
2020-11-13 15:40:02 +01:00
Claudio Atzori 2facfefc19 updated maven repository URL 2020-11-13 15:38:40 +01:00
Claudio Atzori 528231a287 grouping graph entities by id turned out to be an easy extension for the already existing cleaning workflow 2020-11-13 15:37:48 +01:00
Enrico Ottonello 005f849674 added compression to output dataset 2020-11-13 12:45:31 +01:00
Enrico Ottonello 9a2fa9dc2f added test for other names parsing from summaries dump 2020-11-13 10:25:34 +01:00
Claudio Atzori 2bed29eb09 WIP: added oozie workflow for grouping graph entities by id 2020-11-13 10:05:12 +01:00
Claudio Atzori 13e36a4da0 WIP: added oozie workflow for grouping graph entities by id 2020-11-13 10:05:02 +01:00
Enrico Ottonello 13f28fa225 moved AuthorData to dhp-schemas; added other names to author data 2020-11-12 17:43:32 +01:00
Enrico Ottonello 2af21150c5 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-11-12 09:58:33 +01:00
Claudio Atzori 9b0fb9e958 merged from master 2020-11-12 09:27:12 +01:00
Claudio Atzori 75324ae58a Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop 2020-11-12 09:23:37 +01:00
Claudio Atzori 822971f54f no need to filter relations in CreateRelatedEntitiesJob_phase1; replaced 'left outer' join with 'left' join in CreateRelatedEntitiesJob_phase2; cleanup; 2020-11-12 09:22:59 +01:00
Enrico Ottonello 1f861f2b0d now wf output is a sequence file with the format seq("eu.dnetlib.dhp.schema.oaf.Publication",eu.dnetlib.dhp.schema.action.AtomicActions) 2020-11-11 17:38:50 +01:00
Claudio Atzori 9841488482 Merge pull request 'latest changes in stats wf' (#54) from antonis.lempesis/dnet-hadoop:master into master
LGTM, thanks!
2020-11-11 16:01:51 +01:00
Antonis Lempesis 99ebaee347 fixed #5913 2020-11-11 16:56:46 +02:00
Claudio Atzori e3d3481fb9 Merge pull request 'organizations pids' (#53) from organization_pids into master
LGTM
2020-11-11 14:08:25 +01:00
Antonis Lempesis f14e65f6a3 reverted wrong change 2020-11-10 17:23:04 +02:00
Antonis Lempesis c02c7741c9 fixes in db creation 2020-11-10 17:11:30 +02:00
Antonis Lempesis e603fa5847 fixes in db creation 2020-11-10 17:11:12 +02:00
Enrico Ottonello fea2451658 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-11-10 11:49:43 +01:00
Claudio Atzori 18d9aad70c improved documentation in dhp-graph-provision 2020-11-10 11:48:55 +01:00
Enrico Ottonello 1513174d7e added further test case 2020-11-10 11:44:55 +01:00
Michele Artini 40160d171f organizations pids 2020-11-09 12:58:36 +01:00
Sandro La Bruzzo 8e1d43aab2 Implemented ID generation using IdentifierRecordFactory on DOIBoost 2020-11-09 11:53:55 +01:00
Sandro La Bruzzo 027ef2326c Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-11-06 17:12:42 +01:00
Sandro La Bruzzo cd27df91a1 fixed bug on missing relation in ANDS 2020-11-06 17:12:31 +01:00
Enrico Ottonello 6bc7dbeca7 first version of dataset successful generated from orcid dump 2020 2020-11-06 13:47:50 +01:00
Claudio Atzori d10447e747 re-packaged graph dump workflow sources 2020-11-05 17:38:18 +01:00
Claudio Atzori 2d76497488 cleanup 2020-11-05 17:10:24 +01:00
Claudio Atzori 144216fb88 Merge pull request 'OpenAIRE graph dump' (#51) from miriam.baglioni/dnet-hadoop:dump into master
LGTM
2020-11-05 17:09:52 +01:00
Miriam Baglioni f8e9bda24c merge branch with master 2020-11-05 16:31:18 +01:00
Miriam Baglioni afa0b1489b merge upstream 2020-11-05 16:31:09 +01:00
Miriam Baglioni 7ebdfacee9 removed commented code and added documentation to new method 2020-11-05 16:30:36 +01:00
Miriam Baglioni be5ed8f554 added check to avoid sending empty metadata. 2020-11-05 16:10:17 +01:00
Claudio Atzori 2148a51fae minor changes 2020-11-05 11:24:12 +01:00
Claudio Atzori 4625b7486e code formatting 2020-11-04 18:12:43 +01:00
Claudio Atzori f5f346dd2b Merge pull request 'dump' (#50) from miriam.baglioni/dnet-hadoop:dump into master
LGTM
2020-11-04 18:07:01 +01:00
Miriam Baglioni e9ac471ae9 removed dependency from classes for the pid graph dump 2020-11-04 18:04:42 +01:00
Miriam Baglioni f45c23316f removed entities added for the pid graph dump 2020-11-04 17:31:24 +01:00
Miriam Baglioni e9d948786d removed commented code 2020-11-04 17:30:51 +01:00
Miriam Baglioni b90a945c49 removed property files for pid graph dump 2020-11-04 17:28:33 +01:00
Miriam Baglioni bac307155a removed properties specific for pid graph dump 2020-11-04 17:28:04 +01:00
Miriam Baglioni 9c9d50f486 removed code specific for pid graph dump 2020-11-04 17:26:22 +01:00
Miriam Baglioni 5669890934 removed commented lines 2020-11-04 17:15:21 +01:00
Miriam Baglioni 6a89f59be9 removed commented lines 2020-11-04 17:13:59 +01:00
Miriam Baglioni 56150d7e5e removed all code related to the dump of pids graph 2020-11-04 17:13:12 +01:00
Miriam Baglioni 16c54a96f8 removed pid dump 2020-11-04 17:11:32 +01:00
Miriam Baglioni d9d8de63cc merge upstream 2020-11-04 13:36:38 +01:00
Miriam Baglioni 0cac5436ff Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump 2020-11-04 13:21:11 +01:00
Alessia Bardi 51808b5afd Updated descriptions 2020-11-04 12:29:48 +01:00
Alessia Bardi e6becf8659 Updated descriptions 2020-11-04 12:17:57 +01:00
Alessia Bardi 0abe0eee33 Updated descriptions 2020-11-04 12:15:30 +01:00
Alessia Bardi f6ab238f5d Updated descriptions 2020-11-04 11:50:47 +01:00
Sandro La Bruzzo 3581244daf Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop 2020-11-04 09:04:22 +01:00
Sandro La Bruzzo 66efb39634 implemented merge scholix 2020-11-04 09:04:01 +01:00
Miriam Baglioni c010a8442f fixed issue on test code 2020-11-03 17:26:51 +01:00
Miriam Baglioni 8ec7a61188 merge branch with master 2020-11-03 16:59:08 +01:00
Miriam Baglioni 8b4f7bf492 merge upstream 2020-11-03 16:58:59 +01:00
Miriam Baglioni c209284ca7 new schemas for the entities in the dump with added descriptions 2020-11-03 16:58:08 +01:00
Miriam Baglioni 08806deddf added the splitSize non mandatory parameter. Default size 10G 2020-11-03 16:57:34 +01:00
Miriam Baglioni 7d2eda43ca added new non-mandatory property publish to determine whether to publish the upload or leave it pending. Default value false 2020-11-03 16:57:01 +01:00
Miriam Baglioni cbbb1bdc54 moved business logic to new class in common for handling the zip of the archives 2020-11-03 16:55:50 +01:00
Miriam Baglioni 7d95a5e2b4 refactoring 2020-11-03 16:55:13 +01:00
Miriam Baglioni d4382b54df moved the tar archive with max size to the common module 2020-11-03 16:54:50 +01:00
Miriam Baglioni 1124ac29fc merge upstream 2020-11-02 10:22:51 +01:00
Dimitris 32bf943979 Changes to download only updates 2020-11-02 09:08:25 +02:00
Miriam Baglioni dabb33e018 changed the discriminant by which to split the file 2020-10-30 17:52:22 +01:00
Miriam Baglioni 0fba08eae4 max allowed size per file 10 Gb 2020-10-30 16:05:55 +01:00
Miriam Baglioni b828587252 prevent the code from cycling indefinitely 2020-10-30 15:01:25 +01:00
Miriam Baglioni f747e303ac classes for dumping of the graph as ttl file 2020-10-30 14:13:45 +01:00
Miriam Baglioni 16baf5b69e formatting 2020-10-30 14:13:14 +01:00
Miriam Baglioni a9eef9c852 added check for possible Optional value in relation dataInfo 2020-10-30 14:12:28 +01:00
Miriam Baglioni 5f4de9a962 formatting 2020-10-30 14:11:40 +01:00
Miriam Baglioni 10d8bbada8 replaced deprecated method with non-deprecated version 2020-10-30 14:10:10 +01:00
Miriam Baglioni 14bf2e7238 added option to split dumps bigger than 40Gb into different files 2020-10-30 14:09:04 +01:00
Dimitris b8a3392b59 Commit 30102020 2020-10-30 14:07:21 +02:00
Miriam Baglioni 78fdb11c3f merge branch with master 2020-10-29 12:55:22 +01:00
Miriam Baglioni d6e8dc0313 merge upstream 2020-10-29 12:55:06 +01:00
Miriam Baglioni 4cf4454341 changed from deprecated method to new one 2020-10-27 17:46:19 +01:00
Miriam Baglioni c8f32dd109 - 2020-10-27 17:45:58 +01:00
Miriam Baglioni 3582eba565 - 2020-10-27 17:31:33 +01:00
Miriam Baglioni d2374e3b9e added code to handle cases where the funding tree is not existing 2020-10-27 16:15:21 +01:00
Miriam Baglioni 5d3012eeb4 changed code to dump only the programme list and not the classification list 2020-10-27 16:14:18 +01:00
Miriam Baglioni 1bd638d291 removed h2020classification from dump information. Added back the programme info 2020-10-27 16:13:36 +01:00
Miriam Baglioni 3241ec1777 added connection timeout and socket timeout 600 sec 2020-10-27 16:12:11 +01:00
Miriam Baglioni cc68855a1e merge upstream 2020-10-27 15:54:16 +01:00
Miriam Baglioni 1cb60aede4 added connection timeout and socket timeout 600 sec 2020-10-27 15:53:02 +01:00
Enrico Ottonello 9818e74a70 added dependency version in main pom.xml for orcid no doi 2020-10-22 16:38:00 +02:00
Enrico Ottonello 210a50e4f4 replaced null value 2020-10-22 16:24:42 +02:00
Enrico Ottonello b0290dbcb7 moved all dependencies version to main pom.xml 2020-10-22 16:20:46 +02:00
Enrico Ottonello a38ab57062 let run test methods 2020-10-22 15:43:50 +02:00
Enrico Ottonello 1139d6568d replaced null value with a more safe empty string as return value 2020-10-22 15:32:26 +02:00
Enrico Ottonello c58db1c8ea added filter on null value after map function 2020-10-22 15:11:02 +02:00
Enrico Ottonello 846ba30873 if typologies mapping fails, an exception will be propagated 2020-10-22 14:36:18 +02:00
Enrico Ottonello c3114ba0ae replaced null as return value with a more safe empty string 2020-10-22 14:21:31 +02:00
Enrico Ottonello c295c71ca0 added comment 2020-10-22 14:07:26 +02:00
Enrico Ottonello ab083f9946 propagate exception on parsing work (PR request) 2020-10-22 14:02:32 +02:00
Miriam Baglioni 959f30811e added connection timeout and socket timeout 600 sec 2020-10-16 10:52:30 +02:00
Miriam Baglioni 11b7eaae09 changed the name of the folder where to store the context entity from context to communities_infrastructures 2020-10-05 11:24:54 +02:00
Miriam Baglioni 32bffb0134 changed the name from communities_infrastructures to communities_infrastuctures.json 2020-10-05 11:24:17 +02:00
Miriam Baglioni 25cbcf6114 changed to solve issues about names. context renamed communities_infrastructure.json and removed the double json.gz extension from the name of the part in the tar 2020-10-02 12:17:46 +02:00
Miriam Baglioni 01117a46e1 whole workflow activated 2020-10-01 17:19:21 +02:00
Miriam Baglioni cfb5766c6b removed double json.gz from names of files in the tar 2020-10-01 17:18:34 +02:00
Miriam Baglioni fcaedac980 merge branch with master 2020-10-01 16:46:59 +02:00
Miriam Baglioni 983a12ed15 temporary modification to allow the upload of files in the sandbox without the need to recreate the mapping from scratch 2020-09-25 16:41:51 +02:00
Miriam Baglioni 8b36d19182 added property depositionId and changed property newVersion from boolean to string to handle the three possible distinct values 2020-09-25 16:41:15 +02:00
Miriam Baglioni ed5239f9ec added new code to handle the new possibility to upload files to an already open deposition 2020-09-25 16:34:32 +02:00
Miriam Baglioni 3a8c524fce refactor 2020-09-25 16:34:02 +02:00
Miriam Baglioni ccd48dd78a added new test for new method 2020-09-25 16:33:43 +02:00
Miriam Baglioni 3e5497b336 added new method to handle an open deposition to which upload data 2020-09-25 16:33:15 +02:00
Miriam Baglioni 2ac2b537b6 merge branch with master 2020-09-25 14:40:47 +02:00
Miriam Baglioni 54800fb9b0 enabled only the step to upload in zenodo 2020-09-25 14:40:22 +02:00
Miriam Baglioni de6c4d46d8 fixed conflicts 2020-09-24 15:35:01 +02:00
Enrico Ottonello a97ad20c7b exception is now propagated (PR review) 2020-09-22 10:46:34 +02:00
Enrico Ottonello fefbcfb106 dependency version moved to main pom (PR review) 2020-09-22 10:20:25 +02:00
Enrico Ottonello 7cffd14fb0 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-09-15 16:05:55 +02:00
Enrico Ottonello 9e8e7fe6ef add comments 2020-09-15 11:32:49 +02:00
Miriam Baglioni c2b5c780ff - 2020-09-14 14:34:03 +02:00
Miriam Baglioni e2ceefe9be - 2020-09-14 14:33:28 +02:00
Miriam Baglioni 1f893e63dc - 2020-09-14 14:33:10 +02:00
Enrico Ottonello 538f299767 merged 2020-09-14 12:35:16 +02:00
Enrico Ottonello eb8c9b2348 Merge remote-tracking branch 'upstream/master' into orcid-no-doi 2020-09-14 12:00:56 +02:00
Miriam Baglioni b72a7dad46 resource for pid graph dump 2020-08-24 17:09:01 +02:00
Miriam Baglioni 8694bb9b31 refactoring due to compilation 2020-08-24 17:07:34 +02:00
Miriam Baglioni 428f1022fd refactoring due to compilation 2020-08-24 17:05:50 +02:00
Miriam Baglioni c90a0d39dd refactoring due to compilation 2020-08-24 17:04:59 +02:00
Miriam Baglioni bd5a72929b refactoring due to compilation 2020-08-24 17:03:06 +02:00
Miriam Baglioni 8a069a4fea - 2020-08-24 17:01:30 +02:00
Miriam Baglioni 34fa96f3b1 - 2020-08-24 17:00:20 +02:00
Miriam Baglioni 5fb2949cb8 added utils methods 2020-08-24 17:00:09 +02:00
Miriam Baglioni 2a540b6c01 added constants for the pid graph dump 2020-08-24 16:55:35 +02:00
Miriam Baglioni da103c399a resources for the pid graph dump test 2020-08-24 16:52:07 +02:00
Miriam Baglioni 630a6a1fe7 first tests for the pid graph dump 2020-08-24 16:51:26 +02:00
Miriam Baglioni 40c8d2de7b test resources for the dump of the pids graph 2020-08-24 16:50:39 +02:00
Miriam Baglioni bef79d3bdf first attempt to the dump of pids graph 2020-08-24 16:49:38 +02:00
Miriam Baglioni 85203c16e3 merge branch with master 2020-08-19 11:49:03 +02:00
Miriam Baglioni 2c783793ba removed the affiliation from the author to mirror the changes in the model 2020-08-19 11:48:12 +02:00
Miriam Baglioni c325acef3f changed as extensions of the classes defining the common parameter 2020-08-19 11:47:17 +02:00
Miriam Baglioni 6b8c5034fc extends the result with parameters specific for the community dump. 2020-08-19 11:46:20 +02:00
Miriam Baglioni f26382fa51 extends the instance with parameters collectedfrom and hostedby to be dumped only for communities 2020-08-19 11:45:23 +02:00
Miriam Baglioni 4584cf6334 contains the specialized instance parameter for the dump of the result in the whole graph 2020-08-19 11:44:46 +02:00
Miriam Baglioni f5bae426f7 modified to store information concerning instance and result common to both the dumps for community and the whole graph 2020-08-19 11:43:44 +02:00
Miriam Baglioni 11b80899d7 added to store information concerning project common to both the dumps for community and the whole graph 2020-08-19 11:43:18 +02:00
Miriam Baglioni f6bf888016 removed affiliation from author to mirror the changes in the model 2020-08-19 11:41:41 +02:00
Miriam Baglioni 66d0e0d3f2 - 2020-08-19 11:31:50 +02:00
Miriam Baglioni 1c593a9cfe - 2020-08-19 11:29:51 +02:00
Miriam Baglioni e42b2f5ae2 - 2020-08-19 11:29:09 +02:00
Miriam Baglioni f81ee22418 changed to mirror the changes in the model (Instance, CommunityInstance, GraphResult) 2020-08-19 11:28:26 +02:00
Miriam Baglioni 387be43fd4 changed to discriminate whether to dump all the result types together or each one in its own archive 2020-08-19 11:25:27 +02:00
Miriam Baglioni c5858afb88 added parameter to guide the dump for the result (resultAggregation). true if all the result types should be dumped together, false otherwise. 2020-08-19 11:24:14 +02:00
Miriam Baglioni d407852ac2 changed to reflect the changed in the model 2020-08-19 11:15:05 +02:00
Miriam Baglioni 47c21a8961 refactoring due to compilation 2020-08-19 11:11:57 +02:00
Miriam Baglioni 5570678c65 changed parameter name from hfdsNameNode to nameNode 2020-08-19 10:59:26 +02:00
Miriam Baglioni dc5096a327 refactoring due to compilation 2020-08-19 10:57:36 +02:00
Miriam Baglioni 1791cf2e78 refactoring due to compilation 2020-08-19 10:14:41 +02:00
Miriam Baglioni c7f944a533 refactoring due to compilation 2020-08-19 10:01:26 +02:00
Enrico Ottonello 0377b40fba output to one parquet file 2020-07-30 18:38:07 +02:00
Enrico Ottonello 196f36c6ed fix publication dataset creation 2020-07-30 13:38:33 +02:00
Enrico Ottonello c82b15b5f4 migrate configuration to ocean, fix publication dataset creation 2020-07-28 15:23:52 +02:00
Enrico Ottonello a6acb37689 Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-07-28 08:07:40 +02:00
Michele Artini 3adedd0a68 trust truncated to 3 decimals 2020-07-17 11:58:11 +02:00
Enrico Ottonello ca37d3427b separate workflow to parse orcid summaries, activities and generate dataset with no doi publications; test 2020-07-03 23:30:31 +02:00
Enrico Ottonello 1729cc5cf3 publication conversion from json to oaf test 2020-07-02 18:46:20 +02:00
Enrico Ottonello 5525f57ec8 converter from orcid work json to oaf 2020-07-01 18:36:14 +02:00
Enrico Ottonello b7b6be12a5 fixed enriched works generation 2020-06-29 18:03:16 +02:00
Enrico Ottonello b2213b6435 merged with dnet version 2020-06-26 17:27:34 +02:00
Enrico Ottonello c5e149c46e Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi 2020-06-26 16:15:38 +02:00
Enrico Ottonello d6498278ed added workflow to generate seq(orcidId,work) and seq(orcidId,enrichedWork) 2020-06-25 18:43:29 +02:00
Enrico Ottonello fcbb4c1489 parser of orcid publication data from xml original dump 2020-06-24 16:29:32 +02:00
995 changed files with 66960 additions and 32782 deletions

3
.gitignore vendored
View File

@ -7,6 +7,8 @@
*.iws *.iws
*~ *~
.vscode .vscode
.metals
.bloop
.classpath .classpath
/*/.classpath /*/.classpath
/*/*/.classpath /*/*/.classpath
@ -24,4 +26,5 @@
spark-warehouse spark-warehouse
/**/job-override.properties /**/job-override.properties
/**/*.log /**/*.log
/**/.factorypath

View File

@ -15,12 +15,12 @@
<snapshotRepository> <snapshotRepository>
<id>dnet45-snapshots</id> <id>dnet45-snapshots</id>
<name>DNet45 Snapshots</name> <name>DNet45 Snapshots</name>
<url>http://maven.research-infrastructures.eu/nexus/content/repositories/dnet45-snapshots</url> <url>https://maven.d4science.org/nexus/content/repositories/dnet45-snapshots</url>
<layout>default</layout> <layout>default</layout>
</snapshotRepository> </snapshotRepository>
<repository> <repository>
<id>dnet45-releases</id> <id>dnet45-releases</id>
<url>http://maven.research-infrastructures.eu/nexus/content/repositories/dnet45-releases</url> <url>https://maven.d4science.org/nexus/content/repositories/dnet45-releases</url>
</repository> </repository>
</distributionManagement> </distributionManagement>

View File

@ -6,7 +6,8 @@
<groupId>eu.dnetlib.dhp</groupId> <groupId>eu.dnetlib.dhp</groupId>
<artifactId>dhp</artifactId> <artifactId>dhp</artifactId>
<version>1.2.4-SNAPSHOT</version> <version>1.2.4-SNAPSHOT</version>
<relativePath>../</relativePath> <relativePath>../pom.xml</relativePath>
</parent> </parent>
<artifactId>dhp-common</artifactId> <artifactId>dhp-common</artifactId>
@ -20,6 +21,10 @@
<groupId>org.apache.hadoop</groupId> <groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId> <artifactId>hadoop-common</artifactId>
</dependency> </dependency>
<dependency>
<groupId>com.github.sisyphsu</groupId>
<artifactId>dateparser</artifactId>
</dependency>
<dependency> <dependency>
<groupId>org.apache.spark</groupId> <groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId> <artifactId>spark-core_2.11</artifactId>
@ -29,12 +34,6 @@
<artifactId>spark-sql_2.11</artifactId> <artifactId>spark-sql_2.11</artifactId>
</dependency> </dependency>
<dependency>
<groupId>eu.dnetlib.dhp</groupId>
<artifactId>dhp-schemas</artifactId>
<version>${project.version}</version>
</dependency>
<dependency> <dependency>
<groupId>commons-cli</groupId> <groupId>commons-cli</groupId>
<artifactId>commons-cli</artifactId> <artifactId>commons-cli</artifactId>
@ -59,11 +58,6 @@
<groupId>com.fasterxml.jackson.core</groupId> <groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId> <artifactId>jackson-databind</artifactId>
</dependency> </dependency>
<!-- https://mvnrepository.com/artifact/com.rabbitmq/amqp-client -->
<dependency>
<groupId>com.rabbitmq</groupId>
<artifactId>amqp-client</artifactId>
</dependency>
<dependency> <dependency>
<groupId>net.sf.saxon</groupId> <groupId>net.sf.saxon</groupId>
<artifactId>Saxon-HE</artifactId> <artifactId>Saxon-HE</artifactId>
@ -104,10 +98,19 @@
<artifactId>dnet-pace-core</artifactId> <artifactId>dnet-pace-core</artifactId>
</dependency> </dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
</dependency>
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>mongo-java-driver</artifactId>
</dependency>
<dependency> <dependency>
<groupId>eu.dnetlib.dhp</groupId> <groupId>eu.dnetlib.dhp</groupId>
<artifactId>dhp-schemas</artifactId> <artifactId>dhp-schemas</artifactId>
<version>${project.version}</version>
</dependency> </dependency>
</dependencies> </dependencies>

View File

@ -1,119 +0,0 @@
package eu.dnetlib.data.mdstore.manager.common.model;
import java.io.Serializable;
import java.util.UUID;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;
@Entity
@Table(name = "mdstores")
public class MDStore implements Serializable {
/** */
private static final long serialVersionUID = 3160530489149700055L;
@Id
@Column(name = "id")
private String id;
@Column(name = "format")
private String format;
@Column(name = "layout")
private String layout;
@Column(name = "interpretation")
private String interpretation;
@Column(name = "datasource_name")
private String datasourceName;
@Column(name = "datasource_id")
private String datasourceId;
@Column(name = "api_id")
private String apiId;
public String getId() {
return id;
}
public void setId(final String id) {
this.id = id;
}
public String getFormat() {
return format;
}
public void setFormat(final String format) {
this.format = format;
}
public String getLayout() {
return layout;
}
public void setLayout(final String layout) {
this.layout = layout;
}
public String getInterpretation() {
return interpretation;
}
public void setInterpretation(final String interpretation) {
this.interpretation = interpretation;
}
public String getDatasourceName() {
return datasourceName;
}
public void setDatasourceName(final String datasourceName) {
this.datasourceName = datasourceName;
}
public String getDatasourceId() {
return datasourceId;
}
public void setDatasourceId(final String datasourceId) {
this.datasourceId = datasourceId;
}
public String getApiId() {
return apiId;
}
public void setApiId(final String apiId) {
this.apiId = apiId;
}
public static MDStore newInstance(
final String format, final String layout, final String interpretation) {
return newInstance(format, layout, interpretation, null, null, null);
}
public static MDStore newInstance(
final String format,
final String layout,
final String interpretation,
final String dsName,
final String dsId,
final String apiId) {
final MDStore md = new MDStore();
md.setId("md-" + UUID.randomUUID());
md.setFormat(format);
md.setLayout(layout);
md.setInterpretation(interpretation);
md.setDatasourceName(dsName);
md.setDatasourceId(dsId);
md.setApiId(apiId);
return md;
}
}

View File

@ -1,51 +0,0 @@
package eu.dnetlib.data.mdstore.manager.common.model;
import java.io.Serializable;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;
@Entity
@Table(name = "mdstore_current_versions")
public class MDStoreCurrentVersion implements Serializable {
/** */
private static final long serialVersionUID = -4757725888593745773L;
@Id
@Column(name = "mdstore")
private String mdstore;
@Column(name = "current_version")
private String currentVersion;
public String getMdstore() {
return mdstore;
}
public void setMdstore(final String mdstore) {
this.mdstore = mdstore;
}
public String getCurrentVersion() {
return currentVersion;
}
public void setCurrentVersion(final String currentVersion) {
this.currentVersion = currentVersion;
}
public static MDStoreCurrentVersion newInstance(final String mdId, final String versionId) {
final MDStoreCurrentVersion cv = new MDStoreCurrentVersion();
cv.setMdstore(mdId);
cv.setCurrentVersion(versionId);
return cv;
}
public static MDStoreCurrentVersion newInstance(final MDStoreVersion v) {
return newInstance(v.getMdstore(), v.getId());
}
}

View File

@ -1,99 +0,0 @@
package eu.dnetlib.data.mdstore.manager.common.model;
import java.io.Serializable;
import java.util.Date;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;
@Entity
@Table(name = "mdstore_versions")
public class MDStoreVersion implements Serializable {
/** */
private static final long serialVersionUID = -4763494442274298339L;
@Id
@Column(name = "id")
private String id;
@Column(name = "mdstore")
private String mdstore;
@Column(name = "writing")
private boolean writing;
@Column(name = "readcount")
private int readCount = 0;
@Column(name = "lastupdate")
@Temporal(TemporalType.TIMESTAMP)
private Date lastUpdate;
@Column(name = "size")
private long size = 0;
public static MDStoreVersion newInstance(final String mdId, final boolean writing) {
final MDStoreVersion t = new MDStoreVersion();
t.setId(mdId + "-" + new Date().getTime());
t.setMdstore(mdId);
t.setLastUpdate(null);
t.setWriting(writing);
t.setReadCount(0);
t.setSize(0);
return t;
}
public String getId() {
return id;
}
public void setId(final String id) {
this.id = id;
}
public String getMdstore() {
return mdstore;
}
public void setMdstore(final String mdstore) {
this.mdstore = mdstore;
}
public boolean isWriting() {
return writing;
}
public void setWriting(final boolean writing) {
this.writing = writing;
}
public int getReadCount() {
return readCount;
}
public void setReadCount(final int readCount) {
this.readCount = readCount;
}
public Date getLastUpdate() {
return lastUpdate;
}
public void setLastUpdate(final Date lastUpdate) {
this.lastUpdate = lastUpdate;
}
public long getSize() {
return size;
}
public void setSize(final long size) {
this.size = size;
}
}
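A minimal sketch, not part of the changeset, of how the two entities above were meant to be used together before their removal from this module; the mdstore identifier is illustrative:
public static void openAndPromoteVersion() {
    // open a writable version for an existing mdstore; its id becomes "<mdId>-<timestamp>"
    MDStoreVersion v = MDStoreVersion.newInstance("md-1234abcd", true);
    // ... records get written under this version ...
    // once writing completes, promote it to be the current version of the store
    MDStoreCurrentVersion cv = MDStoreCurrentVersion.newInstance(v); // same as newInstance(v.getMdstore(), v.getId())
}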

View File

@ -1,143 +0,0 @@
package eu.dnetlib.data.mdstore.manager.common.model;
import java.io.Serializable;
import java.util.Date;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;
@Entity
@Table(name = "mdstores_with_info")
public class MDStoreWithInfo implements Serializable {
/** */
private static final long serialVersionUID = -8445784770687571492L;
@Id
@Column(name = "id")
private String id;
@Column(name = "format")
private String format;
@Column(name = "layout")
private String layout;
@Column(name = "interpretation")
private String interpretation;
@Column(name = "datasource_name")
private String datasourceName;
@Column(name = "datasource_id")
private String datasourceId;
@Column(name = "api_id")
private String apiId;
@Column(name = "current_version")
private String currentVersion;
@Column(name = "lastupdate")
@Temporal(TemporalType.TIMESTAMP)
private Date lastUpdate;
@Column(name = "size")
private long size = 0;
@Column(name = "n_versions")
private long numberOfVersions = 0;
public String getId() {
return id;
}
public void setId(final String id) {
this.id = id;
}
public String getFormat() {
return format;
}
public void setFormat(final String format) {
this.format = format;
}
public String getLayout() {
return layout;
}
public void setLayout(final String layout) {
this.layout = layout;
}
public String getInterpretation() {
return interpretation;
}
public void setInterpretation(final String interpretation) {
this.interpretation = interpretation;
}
public String getDatasourceName() {
return datasourceName;
}
public void setDatasourceName(final String datasourceName) {
this.datasourceName = datasourceName;
}
public String getDatasourceId() {
return datasourceId;
}
public void setDatasourceId(final String datasourceId) {
this.datasourceId = datasourceId;
}
public String getApiId() {
return apiId;
}
public void setApiId(final String apiId) {
this.apiId = apiId;
}
public String getCurrentVersion() {
return currentVersion;
}
public void setCurrentVersion(final String currentVersion) {
this.currentVersion = currentVersion;
}
public Date getLastUpdate() {
return lastUpdate;
}
public void setLastUpdate(final Date lastUpdate) {
this.lastUpdate = lastUpdate;
}
public long getSize() {
return size;
}
public void setSize(final long size) {
this.size = size;
}
public long getNumberOfVersions() {
return numberOfVersions;
}
public void setNumberOfVersions(final long numberOfVersions) {
this.numberOfVersions = numberOfVersions;
}
}

View File

@ -0,0 +1,14 @@
package eu.dnetlib.dhp.application;
import java.io.*;
import java.util.Map;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import com.google.common.collect.Maps;
public class ApplicationUtils {
}

View File

@ -1,10 +1,7 @@
package eu.dnetlib.dhp.application; package eu.dnetlib.dhp.application;
import java.io.ByteArrayInputStream; import java.io.*;
import java.io.ByteArrayOutputStream;
import java.io.Serializable;
import java.io.StringWriter;
import java.util.*; import java.util.*;
import java.util.zip.GZIPInputStream; import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream; import java.util.zip.GZIPOutputStream;
@ -12,17 +9,21 @@ import java.util.zip.GZIPOutputStream;
import org.apache.commons.cli.*; import org.apache.commons.cli.*;
import org.apache.commons.codec.binary.Base64; import org.apache.commons.codec.binary.Base64;
import org.apache.commons.io.IOUtils; import org.apache.commons.io.IOUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.fasterxml.jackson.databind.ObjectMapper; import com.fasterxml.jackson.databind.ObjectMapper;
public class ArgumentApplicationParser implements Serializable { public class ArgumentApplicationParser implements Serializable {
private static final Logger log = LoggerFactory.getLogger(ArgumentApplicationParser.class);
private final Options options = new Options(); private final Options options = new Options();
private final Map<String, String> objectMap = new HashMap<>(); private final Map<String, String> objectMap = new HashMap<>();
private final List<String> compressedValues = new ArrayList<>(); private final List<String> compressedValues = new ArrayList<>();
public ArgumentApplicationParser(final String json_configuration) throws Exception { public ArgumentApplicationParser(final String json_configuration) throws IOException {
final ObjectMapper mapper = new ObjectMapper(); final ObjectMapper mapper = new ObjectMapper();
final OptionsParameter[] configuration = mapper.readValue(json_configuration, OptionsParameter[].class); final OptionsParameter[] configuration = mapper.readValue(json_configuration, OptionsParameter[].class);
createOptionMap(configuration); createOptionMap(configuration);
@ -33,7 +34,6 @@ public class ArgumentApplicationParser implements Serializable {
} }
private void createOptionMap(final OptionsParameter[] configuration) { private void createOptionMap(final OptionsParameter[] configuration) {
Arrays Arrays
.stream(configuration) .stream(configuration)
.map( .map(
@ -47,10 +47,6 @@ public class ArgumentApplicationParser implements Serializable {
return o; return o;
}) })
.forEach(options::addOption); .forEach(options::addOption);
// HelpFormatter formatter = new HelpFormatter();
// formatter.printHelp("myapp", null, options, null, true);
} }
public static String decompressValue(final String abstractCompressed) { public static String decompressValue(final String abstractCompressed) {
@ -61,7 +57,7 @@ public class ArgumentApplicationParser implements Serializable {
IOUtils.copy(gis, stringWriter); IOUtils.copy(gis, stringWriter);
return stringWriter.toString(); return stringWriter.toString();
} catch (Throwable e) { } catch (Throwable e) {
System.out.println("Wrong value to decompress:" + abstractCompressed); log.error("Wrong value to decompress:" + abstractCompressed);
throw new RuntimeException(e); throw new RuntimeException(e);
} }
} }
@ -74,7 +70,7 @@ public class ArgumentApplicationParser implements Serializable {
return java.util.Base64.getEncoder().encodeToString(out.toByteArray()); return java.util.Base64.getEncoder().encodeToString(out.toByteArray());
} }
public void parseArgument(final String[] args) throws Exception { public void parseArgument(final String[] args) throws ParseException {
CommandLineParser parser = new BasicParser(); CommandLineParser parser = new BasicParser();
CommandLine cmd = parser.parse(options, args); CommandLine cmd = parser.parse(options, args);
Arrays Arrays
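A brief sketch of how this parser is typically driven from a workflow job. The resource path, the option name and the get(String) accessor over the internal objectMap are assumptions, not shown in this hunk:
public static void readArguments(String[] args) throws Exception {
    // the JSON resource describes the expected options as an array of OptionsParameter
    final String jsonConf = IOUtils
        .toString(SomeSparkJob.class.getResourceAsStream("/eu/dnetlib/dhp/some_job_parameters.json")); // illustrative path
    final ArgumentApplicationParser parser = new ArgumentApplicationParser(jsonConf);
    parser.parseArgument(args); // now declared to throw ParseException instead of the generic Exception
    final String inputPath = parser.get("inputPath"); // assumed accessor; "inputPath" is an example option
}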

View File

@ -1,5 +1,5 @@
package eu.dnetlib.collector.worker.model; package eu.dnetlib.dhp.collection;
import java.util.HashMap; import java.util.HashMap;
import java.util.Map; import java.util.Map;

View File

@ -0,0 +1,52 @@
package eu.dnetlib.dhp.common;
import java.util.Map;
import com.google.common.collect.Maps;
public class Constants {
public static final Map<String, String> accessRightsCoarMap = Maps.newHashMap();
public static final Map<String, String> coarCodeLabelMap = Maps.newHashMap();
public static String COAR_ACCESS_RIGHT_SCHEMA = "http://vocabularies.coar-repositories.org/documentation/access_rights/";
static {
accessRightsCoarMap.put("OPEN", "c_abf2");
accessRightsCoarMap.put("RESTRICTED", "c_16ec");
accessRightsCoarMap.put("OPEN SOURCE", "c_abf2");
accessRightsCoarMap.put("CLOSED", "c_14cb");
accessRightsCoarMap.put("EMBARGO", "c_f1cf");
}
static {
coarCodeLabelMap.put("c_abf2", "OPEN");
coarCodeLabelMap.put("c_16ec", "RESTRICTED");
coarCodeLabelMap.put("c_14cb", "CLOSED");
coarCodeLabelMap.put("c_f1cf", "EMBARGO");
}
public static final String SEQUENCE_FILE_NAME = "/sequence_file";
public static final String REPORT_FILE_NAME = "/report";
public static final String MDSTORE_DATA_PATH = "/store";
public static final String MDSTORE_SIZE_PATH = "/size";
public static final String COLLECTION_MODE = "collectionMode";
public static final String METADATA_ENCODING = "metadataEncoding";
public static final String OOZIE_WF_PATH = "oozieWfPath";
public static final String DNET_MESSAGE_MGR_URL = "dnetMessageManagerURL";
public static final String MAX_NUMBER_OF_RETRY = "maxNumberOfRetry";
public static final String REQUEST_DELAY = "requestDelay";
public static final String RETRY_DELAY = "retryDelay";
public static final String CONNECT_TIMEOUT = "connectTimeOut";
public static final String READ_TIMEOUT = "readTimeOut";
public static final String FROM_DATE_OVERRIDE = "fromDateOverride";
public static final String UNTIL_DATE_OVERRIDE = "untilDateOverride";
public static final String CONTENT_TOTALITEMS = "TotalItems";
public static final String CONTENT_INVALIDRECORDS = "InvalidRecords";
public static final String CONTENT_TRANSFORMEDRECORDS = "transformedItems";
}
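A short sketch of how the two maps above are combined, mirroring what GraphResultMapper does further down in this changeset; the class id value is just an example:
public static AccessRight toCoar(String classid) {
    // classid is an access-right class id coming from the graph model, e.g. "OPEN"
    if (Constants.accessRightsCoarMap.containsKey(classid)) {
        String code = Constants.accessRightsCoarMap.get(classid); // e.g. "c_abf2"
        return AccessRight
            .newInstance(code, Constants.coarCodeLabelMap.get(code), Constants.COAR_ACCESS_RIGHT_SCHEMA);
    }
    return null; // UNKNOWN / OTHER access rights are not mapped
}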

View File

@ -14,7 +14,7 @@ public class DbClient implements Closeable {
private static final Log log = LogFactory.getLog(DbClient.class); private static final Log log = LogFactory.getLog(DbClient.class);
private Connection connection; private final Connection connection;
public DbClient(final String address, final String login, final String password) { public DbClient(final String address, final String login, final String password) {

View File

@ -0,0 +1,412 @@
package eu.dnetlib.dhp.common;
import java.io.Serializable;
import java.util.*;
import java.util.stream.Collectors;
import eu.dnetlib.dhp.schema.common.ModelConstants;
import eu.dnetlib.dhp.schema.dump.oaf.*;
import eu.dnetlib.dhp.schema.dump.oaf.community.CommunityInstance;
import eu.dnetlib.dhp.schema.dump.oaf.community.CommunityResult;
import eu.dnetlib.dhp.schema.oaf.DataInfo;
import eu.dnetlib.dhp.schema.oaf.Field;
import eu.dnetlib.dhp.schema.oaf.Journal;
import eu.dnetlib.dhp.schema.oaf.StructuredProperty;
public class GraphResultMapper implements Serializable {
public static <E extends eu.dnetlib.dhp.schema.oaf.OafEntity> Result map(
E in) {
CommunityResult out = new CommunityResult();
eu.dnetlib.dhp.schema.oaf.Result input = (eu.dnetlib.dhp.schema.oaf.Result) in;
Optional<eu.dnetlib.dhp.schema.oaf.Qualifier> ort = Optional.ofNullable(input.getResulttype());
if (ort.isPresent()) {
switch (ort.get().getClassid()) {
case "publication":
Optional<Journal> journal = Optional
.ofNullable(((eu.dnetlib.dhp.schema.oaf.Publication) input).getJournal());
if (journal.isPresent()) {
Journal j = journal.get();
Container c = new Container();
c.setConferencedate(j.getConferencedate());
c.setConferenceplace(j.getConferenceplace());
c.setEdition(j.getEdition());
c.setEp(j.getEp());
c.setIss(j.getIss());
c.setIssnLinking(j.getIssnLinking());
c.setIssnOnline(j.getIssnOnline());
c.setIssnPrinted(j.getIssnPrinted());
c.setName(j.getName());
c.setSp(j.getSp());
c.setVol(j.getVol());
out.setContainer(c);
out.setType(ModelConstants.PUBLICATION_DEFAULT_RESULTTYPE.getClassname());
}
break;
case "dataset":
eu.dnetlib.dhp.schema.oaf.Dataset id = (eu.dnetlib.dhp.schema.oaf.Dataset) input;
Optional.ofNullable(id.getSize()).ifPresent(v -> out.setSize(v.getValue()));
Optional.ofNullable(id.getVersion()).ifPresent(v -> out.setVersion(v.getValue()));
out
.setGeolocation(
Optional
.ofNullable(id.getGeolocation())
.map(
igl -> igl
.stream()
.filter(Objects::nonNull)
.map(gli -> {
GeoLocation gl = new GeoLocation();
gl.setBox(gli.getBox());
gl.setPlace(gli.getPlace());
gl.setPoint(gli.getPoint());
return gl;
})
.collect(Collectors.toList()))
.orElse(null));
out.setType(ModelConstants.DATASET_DEFAULT_RESULTTYPE.getClassname());
break;
case "software":
eu.dnetlib.dhp.schema.oaf.Software is = (eu.dnetlib.dhp.schema.oaf.Software) input;
Optional
.ofNullable(is.getCodeRepositoryUrl())
.ifPresent(value -> out.setCodeRepositoryUrl(value.getValue()));
Optional
.ofNullable(is.getDocumentationUrl())
.ifPresent(
value -> out
.setDocumentationUrl(
value
.stream()
.map(v -> v.getValue())
.collect(Collectors.toList())));
Optional
.ofNullable(is.getProgrammingLanguage())
.ifPresent(value -> out.setProgrammingLanguage(value.getClassid()));
out.setType(ModelConstants.SOFTWARE_DEFAULT_RESULTTYPE.getClassname());
break;
case "other":
eu.dnetlib.dhp.schema.oaf.OtherResearchProduct ir = (eu.dnetlib.dhp.schema.oaf.OtherResearchProduct) input;
out
.setContactgroup(
Optional
.ofNullable(ir.getContactgroup())
.map(value -> value.stream().map(cg -> cg.getValue()).collect(Collectors.toList()))
.orElse(null));
out
.setContactperson(
Optional
.ofNullable(ir.getContactperson())
.map(value -> value.stream().map(cp -> cp.getValue()).collect(Collectors.toList()))
.orElse(null));
out
.setTool(
Optional
.ofNullable(ir.getTool())
.map(value -> value.stream().map(t -> t.getValue()).collect(Collectors.toList()))
.orElse(null));
out.setType(ModelConstants.ORP_DEFAULT_RESULTTYPE.getClassname());
break;
}
Optional
.ofNullable(input.getAuthor())
.ifPresent(ats -> out.setAuthor(ats.stream().map(at -> getAuthor(at)).collect(Collectors.toList())));
// I do not map Access Right UNKNOWN or OTHER
Optional<eu.dnetlib.dhp.schema.oaf.Qualifier> oar = Optional.ofNullable(input.getBestaccessright());
if (oar.isPresent()) {
if (Constants.accessRightsCoarMap.containsKey(oar.get().getClassid())) {
String code = Constants.accessRightsCoarMap.get(oar.get().getClassid());
out
.setBestaccessright(
AccessRight
.newInstance(
code,
Constants.coarCodeLabelMap.get(code),
Constants.COAR_ACCESS_RIGHT_SCHEMA));
}
}
final List<String> contributorList = new ArrayList<>();
Optional
.ofNullable(input.getContributor())
.ifPresent(value -> value.stream().forEach(c -> contributorList.add(c.getValue())));
out.setContributor(contributorList);
Optional
.ofNullable(input.getCountry())
.ifPresent(
value -> out
.setCountry(
value
.stream()
.map(
c -> {
if (c.getClassid().equals((ModelConstants.UNKNOWN))) {
return null;
}
Country country = new Country();
country.setCode(c.getClassid());
country.setLabel(c.getClassname());
Optional
.ofNullable(c.getDataInfo())
.ifPresent(
provenance -> country
.setProvenance(
Provenance
.newInstance(
provenance
.getProvenanceaction()
.getClassname(),
c.getDataInfo().getTrust())));
return country;
})
.filter(Objects::nonNull)
.collect(Collectors.toList())));
final List<String> coverageList = new ArrayList<>();
Optional
.ofNullable(input.getCoverage())
.ifPresent(value -> value.stream().forEach(c -> coverageList.add(c.getValue())));
out.setCoverage(coverageList);
out.setDateofcollection(input.getDateofcollection());
final List<String> descriptionList = new ArrayList<>();
Optional
.ofNullable(input.getDescription())
.ifPresent(value -> value.forEach(d -> descriptionList.add(d.getValue())));
out.setDescription(descriptionList);
Optional<Field<String>> oStr = Optional.ofNullable(input.getEmbargoenddate());
if (oStr.isPresent()) {
out.setEmbargoenddate(oStr.get().getValue());
}
final List<String> formatList = new ArrayList<>();
Optional
.ofNullable(input.getFormat())
.ifPresent(value -> value.stream().forEach(f -> formatList.add(f.getValue())));
out.setFormat(formatList);
out.setId(input.getId());
out.setOriginalId(input.getOriginalId());
Optional<List<eu.dnetlib.dhp.schema.oaf.Instance>> oInst = Optional
.ofNullable(input.getInstance());
if (oInst.isPresent()) {
out
.setInstance(
oInst.get().stream().map(i -> getInstance(i)).collect(Collectors.toList()));
}
Optional<eu.dnetlib.dhp.schema.oaf.Qualifier> oL = Optional.ofNullable(input.getLanguage());
if (oL.isPresent()) {
eu.dnetlib.dhp.schema.oaf.Qualifier language = oL.get();
out.setLanguage(Qualifier.newInstance(language.getClassid(), language.getClassname()));
}
Optional<Long> oLong = Optional.ofNullable(input.getLastupdatetimestamp());
if (oLong.isPresent()) {
out.setLastupdatetimestamp(oLong.get());
}
Optional<List<StructuredProperty>> otitle = Optional.ofNullable(input.getTitle());
if (otitle.isPresent()) {
List<StructuredProperty> iTitle = otitle
.get()
.stream()
.filter(t -> t.getQualifier().getClassid().equalsIgnoreCase("main title"))
.collect(Collectors.toList());
if (iTitle.size() > 0) {
out.setMaintitle(iTitle.get(0).getValue());
}
iTitle = otitle
.get()
.stream()
.filter(t -> t.getQualifier().getClassid().equalsIgnoreCase("subtitle"))
.collect(Collectors.toList());
if (iTitle.size() > 0) {
out.setSubtitle(iTitle.get(0).getValue());
}
}
List<ControlledField> pids = new ArrayList<>();
Optional
.ofNullable(input.getPid())
.ifPresent(
value -> value
.stream()
.forEach(
p -> pids
.add(
ControlledField
.newInstance(p.getQualifier().getClassid(), p.getValue()))));
out.setPid(pids);
oStr = Optional.ofNullable(input.getDateofacceptance());
if (oStr.isPresent()) {
out.setPublicationdate(oStr.get().getValue());
}
oStr = Optional.ofNullable(input.getPublisher());
if (oStr.isPresent()) {
out.setPublisher(oStr.get().getValue());
}
List<String> sourceList = new ArrayList<>();
Optional
.ofNullable(input.getSource())
.ifPresent(value -> value.stream().forEach(s -> sourceList.add(s.getValue())));
// out.setSource(input.getSource().stream().map(s -> s.getValue()).collect(Collectors.toList()));
List<Subject> subjectList = new ArrayList<>();
Optional
.ofNullable(input.getSubject())
.ifPresent(
value -> value
.forEach(s -> subjectList.add(getSubject(s))));
out.setSubjects(subjectList);
out.setType(input.getResulttype().getClassid());
}
out
.setCollectedfrom(
input
.getCollectedfrom()
.stream()
.map(cf -> KeyValue.newInstance(cf.getKey(), cf.getValue()))
.collect(Collectors.toList()));
return out;
}
private static CommunityInstance getInstance(eu.dnetlib.dhp.schema.oaf.Instance i) {
CommunityInstance instance = new CommunityInstance();
setCommonValue(i, instance);
instance
.setCollectedfrom(
KeyValue
.newInstance(i.getCollectedfrom().getKey(), i.getCollectedfrom().getValue()));
instance
.setHostedby(
KeyValue.newInstance(i.getHostedby().getKey(), i.getHostedby().getValue()));
return instance;
}
private static <I extends Instance> void setCommonValue(eu.dnetlib.dhp.schema.oaf.Instance i, I instance) {
Optional<eu.dnetlib.dhp.schema.oaf.Qualifier> opAr = Optional
.ofNullable(i.getAccessright());
if (opAr.isPresent()) {
if (Constants.accessRightsCoarMap.containsKey(opAr.get().getClassid())) {
String code = Constants.accessRightsCoarMap.get(opAr.get().getClassid());
instance
.setAccessright(
AccessRight
.newInstance(
code,
Constants.coarCodeLabelMap.get(code),
Constants.COAR_ACCESS_RIGHT_SCHEMA));
}
}
Optional
.ofNullable(i.getLicense())
.ifPresent(value -> instance.setLicense(value.getValue()));
Optional
.ofNullable(i.getDateofacceptance())
.ifPresent(value -> instance.setPublicationdate(value.getValue()));
Optional
.ofNullable(i.getRefereed())
.ifPresent(value -> instance.setRefereed(value.getClassname()));
Optional
.ofNullable(i.getInstancetype())
.ifPresent(value -> instance.setType(value.getClassname()));
Optional.ofNullable(i.getUrl()).ifPresent(value -> instance.setUrl(value));
}
private static Subject getSubject(StructuredProperty s) {
Subject subject = new Subject();
subject.setSubject(ControlledField.newInstance(s.getQualifier().getClassid(), s.getValue()));
Optional<DataInfo> di = Optional.ofNullable(s.getDataInfo());
if (di.isPresent()) {
Provenance p = new Provenance();
p.setProvenance(di.get().getProvenanceaction().getClassname());
p.setTrust(di.get().getTrust());
subject.setProvenance(p);
}
return subject;
}
private static Author getAuthor(eu.dnetlib.dhp.schema.oaf.Author oa) {
Author a = new Author();
a.setFullname(oa.getFullname());
a.setName(oa.getName());
a.setSurname(oa.getSurname());
a.setRank(oa.getRank());
Optional<List<StructuredProperty>> oPids = Optional
.ofNullable(oa.getPid());
if (oPids.isPresent()) {
Pid pid = getOrcid(oPids.get());
if (pid != null) {
a.setPid(pid);
}
}
return a;
}
private static Pid getOrcid(List<StructuredProperty> p) {
for (StructuredProperty pid : p) {
if (pid.getQualifier().getClassid().equals(ModelConstants.ORCID)) {
Optional<DataInfo> di = Optional.ofNullable(pid.getDataInfo());
if (di.isPresent()) {
return Pid
.newInstance(
ControlledField
.newInstance(
pid.getQualifier().getClassid(),
pid.getValue()),
Provenance
.newInstance(
di.get().getProvenanceaction().getClassname(),
di.get().getTrust()));
} else {
return Pid
.newInstance(
ControlledField
.newInstance(
pid.getQualifier().getClassid(),
pid.getValue())
);
}
}
}
return null;
}
}
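A minimal sketch of invoking the mapper above on a graph-model entity; the publication is assumed to come from the materialized graph:
public static Result dumpPublication(eu.dnetlib.dhp.schema.oaf.Publication p) {
    // the returned object is in fact a CommunityResult, populated with journal, authors, pids, instances, ...
    return GraphResultMapper.map(p);
}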

View File

@ -0,0 +1,114 @@
package eu.dnetlib.dhp.common;
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Serializable;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.hadoop.fs.*;
public class MakeTarArchive implements Serializable {
private static TarArchiveOutputStream getTar(FileSystem fileSystem, String outputPath) throws IOException {
Path hdfsWritePath = new Path(outputPath);
FSDataOutputStream fsDataOutputStream = null;
if (fileSystem.exists(hdfsWritePath)) {
fileSystem.delete(hdfsWritePath, true);
}
fsDataOutputStream = fileSystem.create(hdfsWritePath);
return new TarArchiveOutputStream(fsDataOutputStream.getWrappedStream());
}
private static void write(FileSystem fileSystem, String inputPath, String outputPath, String dir_name)
throws IOException {
Path hdfsWritePath = new Path(outputPath);
FSDataOutputStream fsDataOutputStream = null;
if (fileSystem.exists(hdfsWritePath)) {
fileSystem.delete(hdfsWritePath, true);
}
fsDataOutputStream = fileSystem.create(hdfsWritePath);
TarArchiveOutputStream ar = new TarArchiveOutputStream(fsDataOutputStream.getWrappedStream());
RemoteIterator<LocatedFileStatus> fileStatusListIterator = fileSystem
.listFiles(
new Path(inputPath), true);
while (fileStatusListIterator.hasNext()) {
writeCurrentFile(fileSystem, dir_name, fileStatusListIterator, ar, 0);
}
ar.close();
}
public static void tarMaxSize(FileSystem fileSystem, String inputPath, String outputPath, String dir_name,
int gBperSplit) throws IOException {
final long bytesPerSplit = 1024L * 1024L * 1024L * gBperSplit;
long sourceSize = fileSystem.getContentSummary(new Path(inputPath)).getSpaceConsumed();
if (sourceSize < bytesPerSplit) {
write(fileSystem, inputPath, outputPath + ".tar", dir_name);
} else {
int partNum = 0;
RemoteIterator<LocatedFileStatus> fileStatusListIterator = fileSystem
.listFiles(
new Path(inputPath), true);
boolean next = fileStatusListIterator.hasNext();
while (next) {
TarArchiveOutputStream ar = getTar(fileSystem, outputPath + "_" + (partNum + 1) + ".tar");
long current_size = 0;
while (next && current_size < bytesPerSplit) {
current_size = writeCurrentFile(fileSystem, dir_name, fileStatusListIterator, ar, current_size);
next = fileStatusListIterator.hasNext();
}
partNum += 1;
ar.close();
}
}
}
private static long writeCurrentFile(FileSystem fileSystem, String dir_name,
RemoteIterator<LocatedFileStatus> fileStatusListIterator,
TarArchiveOutputStream ar, long current_size) throws IOException {
LocatedFileStatus fileStatus = fileStatusListIterator.next();
Path p = fileStatus.getPath();
String p_string = p.toString();
if (!p_string.endsWith("_SUCCESS")) {
String name = p_string.substring(p_string.lastIndexOf("/") + 1);
TarArchiveEntry entry = new TarArchiveEntry(dir_name + "/" + name);
entry.setSize(fileStatus.getLen());
current_size += fileStatus.getLen();
ar.putArchiveEntry(entry);
InputStream is = fileSystem.open(fileStatus.getPath());
BufferedInputStream bis = new BufferedInputStream(is);
int count;
byte[] data = new byte[1024];
while ((count = bis.read(data, 0, data.length)) != -1) {
ar.write(data, 0, count);
}
bis.close();
ar.closeArchiveEntry();
}
return current_size;
}
}
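A usage sketch for the archiver above, assuming illustrative HDFS paths and a 10 GB split size:
public static void archiveDump(FileSystem fileSystem) throws IOException {
    // produces either a single .tar or numbered _1.tar, _2.tar, ... parts depending on the source size
    MakeTarArchive
        .tarMaxSize(fileSystem, "/user/dump/publication", "/user/dump/tar/publication", "publication", 10);
}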

View File

@ -1,39 +1,60 @@
package eu.dnetlib.dhp.oa.graph.raw.common; package eu.dnetlib.dhp.common;
import java.io.Closeable; import java.io.Closeable;
import java.io.IOException; import java.io.IOException;
import java.util.ArrayList; import java.util.ArrayList;
import java.util.HashMap; import java.util.HashMap;
import java.util.Map; import java.util.Map;
import java.util.Optional;
import java.util.stream.StreamSupport; import java.util.stream.StreamSupport;
import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.StringUtils;
import org.apache.commons.logging.Log; import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory; import org.apache.commons.logging.LogFactory;
import org.bson.Document; import org.bson.Document;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.google.common.collect.Iterables; import com.google.common.collect.Iterables;
import com.mongodb.BasicDBObject;
import com.mongodb.MongoClient; import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI; import com.mongodb.MongoClientURI;
import com.mongodb.QueryBuilder;
import com.mongodb.client.MongoCollection; import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase; import com.mongodb.client.MongoDatabase;
public class MdstoreClient implements Closeable { public class MdstoreClient implements Closeable {
private static final Logger log = LoggerFactory.getLogger(MdstoreClient.class);
private final MongoClient client; private final MongoClient client;
private final MongoDatabase db; private final MongoDatabase db;
private static final String COLL_METADATA = "metadata"; private static final String COLL_METADATA = "metadata";
private static final String COLL_METADATA_MANAGER = "metadataManager"; private static final String COLL_METADATA_MANAGER = "metadataManager";
private static final Log log = LogFactory.getLog(MdstoreClient.class);
public MdstoreClient(final String baseUrl, final String dbName) { public MdstoreClient(final String baseUrl, final String dbName) {
this.client = new MongoClient(new MongoClientURI(baseUrl)); this.client = new MongoClient(new MongoClientURI(baseUrl));
this.db = getDb(client, dbName); this.db = getDb(client, dbName);
} }
public MongoCollection<Document> mdStore(final String mdId) {
BasicDBObject query = (BasicDBObject) QueryBuilder.start("mdId").is(mdId).get();
log.info("querying current mdId: {}", query.toJson());
final String currentId = Optional
.ofNullable(getColl(db, COLL_METADATA_MANAGER, true).find(query))
.map(r -> r.first())
.map(d -> d.getString("currentId"))
.orElseThrow(() -> new IllegalArgumentException("cannot find current mdstore id for: " + mdId));
log.info("currentId: {}", currentId);
return getColl(db, currentId, true);
}
public Map<String, String> validCollections( public Map<String, String> validCollections(
final String mdFormat, final String mdLayout, final String mdInterpretation) { final String mdFormat, final String mdLayout, final String mdInterpretation) {
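A sketch of the new mdStore lookup above. The MongoDB URL, database name and mdstore identifier are illustrative, and the "body" field holding the record payload is an assumption:
public static void readCurrentStore() throws IOException {
    try (MdstoreClient client = new MdstoreClient("mongodb://localhost:27017", "mdstore")) {
        MongoCollection<Document> coll = client.mdStore("md-1234abcd"); // resolves currentId via metadataManager
        for (Document doc : coll.find()) {
            String body = doc.getString("body"); // assumed field with the record payload
        }
    }
}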

View File

@ -13,9 +13,9 @@ import okio.Source;
public class InputStreamRequestBody extends RequestBody { public class InputStreamRequestBody extends RequestBody {
private InputStream inputStream; private final InputStream inputStream;
private MediaType mediaType; private final MediaType mediaType;
private long lenght; private final long lenght;
public static RequestBody create(final MediaType mediaType, final InputStream inputStream, final long len) { public static RequestBody create(final MediaType mediaType, final InputStream inputStream, final long len) {

View File

@ -3,6 +3,7 @@ package eu.dnetlib.dhp.common.api;
import java.io.*; import java.io.*;
import java.io.IOException; import java.io.IOException;
import java.util.concurrent.TimeUnit;
import com.google.gson.Gson; import com.google.gson.Gson;
@ -50,14 +51,15 @@ public class ZenodoAPIClient implements Serializable {
/** /**
* Brand new deposition in Zenodo. It sets the deposition_id and the bucket where to store the files to upload * Brand new deposition in Zenodo. It sets the deposition_id and the bucket where to store the files to upload
*
* @return response code * @return response code
* @throws IOException * @throws IOException
*/ */
public int newDeposition() throws IOException { public int newDeposition() throws IOException {
String json = "{}"; String json = "{}";
OkHttpClient httpClient = new OkHttpClient(); OkHttpClient httpClient = new OkHttpClient.Builder().connectTimeout(600, TimeUnit.SECONDS).build();
RequestBody body = RequestBody.create(MEDIA_TYPE_JSON, json); RequestBody body = RequestBody.create(json, MEDIA_TYPE_JSON);
Request request = new Request.Builder() Request request = new Request.Builder()
.url(urlString) .url(urlString)
@ -86,13 +88,18 @@ public class ZenodoAPIClient implements Serializable {
/** /**
* Upload files in Zenodo. * Upload files in Zenodo.
*
* @param is the inputStream for the file to upload * @param is the inputStream for the file to upload
* @param file_name the name of the file as it will appear on Zenodo * @param file_name the name of the file as it will appear on Zenodo
* @param len the size of the file * @param len the size of the file
* @return the response code * @return the response code
*/ */
public int uploadIS(InputStream is, String file_name, long len) throws IOException { public int uploadIS(InputStream is, String file_name, long len) throws IOException {
OkHttpClient httpClient = new OkHttpClient(); OkHttpClient httpClient = new OkHttpClient.Builder()
.writeTimeout(600, TimeUnit.SECONDS)
.readTimeout(600, TimeUnit.SECONDS)
.connectTimeout(600, TimeUnit.SECONDS)
.build();
Request request = new Request.Builder() Request request = new Request.Builder()
.url(bucket + "/" + file_name) .url(bucket + "/" + file_name)
@ -110,15 +117,16 @@ public class ZenodoAPIClient implements Serializable {
/** /**
* Associates metadata information to the current deposition * Associates metadata information to the current deposition
*
* @param metadata the metadata * @param metadata the metadata
* @return response code * @return response code
* @throws IOException * @throws IOException
*/ */
public int sendMretadata(String metadata) throws IOException { public int sendMretadata(String metadata) throws IOException {
OkHttpClient httpClient = new OkHttpClient(); OkHttpClient httpClient = new OkHttpClient.Builder().connectTimeout(600, TimeUnit.SECONDS).build();
RequestBody body = RequestBody.create(MEDIA_TYPE_JSON, metadata); RequestBody body = RequestBody.create(metadata, MEDIA_TYPE_JSON);
Request request = new Request.Builder() Request request = new Request.Builder()
.url(urlString + "/" + deposition_id) .url(urlString + "/" + deposition_id)
@ -140,6 +148,7 @@ public class ZenodoAPIClient implements Serializable {
/** /**
* To publish the current deposition. It works for both new deposition or new version of an old deposition * To publish the current deposition. It works for both new deposition or new version of an old deposition
*
* @return response code * @return response code
* @throws IOException * @throws IOException
*/ */
@ -147,12 +156,14 @@ public class ZenodoAPIClient implements Serializable {
String json = "{}"; String json = "{}";
OkHttpClient httpClient = new OkHttpClient(); OkHttpClient httpClient = new OkHttpClient.Builder().connectTimeout(600, TimeUnit.SECONDS).build();
RequestBody body = RequestBody.create(json, MEDIA_TYPE_JSON);
Request request = new Request.Builder() Request request = new Request.Builder()
.url(urlString + "/" + deposition_id + "/actions/publish") .url(urlString + "/" + deposition_id + "/actions/publish")
.addHeader("Authorization", "Bearer " + access_token) .addHeader("Authorization", "Bearer " + access_token)
.post(RequestBody.create(MEDIA_TYPE_JSON, json)) .post(body)
.build(); .build();
try (Response response = httpClient.newCall(request).execute()) { try (Response response = httpClient.newCall(request).execute()) {
@ -166,11 +177,12 @@ public class ZenodoAPIClient implements Serializable {
} }
/** /**
* To create a new version of an already published deposition. * To create a new version of an already published deposition. It sets the deposition_id and the bucket to be used
* It sets the deposition_id and the bucket to be used for the new version. * for the new version.
* @param concept_rec_id the concept record id of the deposition for which to create a new version. It is *
* the last part of the url for the DOI Zenodo suggests to use to cite all versions: * @param concept_rec_id the concept record id of the deposition for which to create a new version. It is the last
* DOI: 10.xxx/zenodo.656930 concept_rec_id = 656930 * part of the url for the DOI Zenodo suggests to use to cite all versions: DOI: 10.xxx/zenodo.656930
* concept_rec_id = 656930
* @return response code * @return response code
* @throws IOException * @throws IOException
* @throws MissingConceptDoiException * @throws MissingConceptDoiException
@ -179,12 +191,14 @@ public class ZenodoAPIClient implements Serializable {
setDepositionId(concept_rec_id); setDepositionId(concept_rec_id);
String json = "{}"; String json = "{}";
OkHttpClient httpClient = new OkHttpClient(); OkHttpClient httpClient = new OkHttpClient.Builder().connectTimeout(600, TimeUnit.SECONDS).build();
RequestBody body = RequestBody.create(json, MEDIA_TYPE_JSON);
Request request = new Request.Builder() Request request = new Request.Builder()
.url(urlString + "/" + deposition_id + "/actions/newversion") .url(urlString + "/" + deposition_id + "/actions/newversion")
.addHeader("Authorization", "Bearer " + access_token) .addHeader("Authorization", "Bearer " + access_token)
.post(RequestBody.create(MEDIA_TYPE_JSON, json)) .post(body)
.build(); .build();
try (Response response = httpClient.newCall(request).execute()) { try (Response response = httpClient.newCall(request).execute()) {
@ -201,6 +215,41 @@ public class ZenodoAPIClient implements Serializable {
} }
} }
/**
* To finish uploading a version or new deposition not published
* It sets the deposition_id and the bucket to be used
*
*
* @param deposition_id the deposition id of the not yet published upload
* concept_rec_id = 656930
* @return response code
* @throws IOException
* @throws MissingConceptDoiException
*/
public int uploadOpenDeposition(String deposition_id) throws IOException, MissingConceptDoiException {
this.deposition_id = deposition_id;
OkHttpClient httpClient = new OkHttpClient.Builder().connectTimeout(600, TimeUnit.SECONDS).build();
Request request = new Request.Builder()
.url(urlString + "/" + deposition_id)
.addHeader("Authorization", "Bearer " + access_token)
.build();
try (Response response = httpClient.newCall(request).execute()) {
if (!response.isSuccessful())
throw new IOException("Unexpected code " + response + response.body().string());
ZenodoModel zenodoModel = new Gson().fromJson(response.body().string(), ZenodoModel.class);
bucket = zenodoModel.getLinks().getBucket();
return response.code();
}
}
private void setDepositionId(String concept_rec_id) throws IOException, MissingConceptDoiException { private void setDepositionId(String concept_rec_id) throws IOException, MissingConceptDoiException {
ZenodoModelList zenodoModelList = new Gson().fromJson(getPrevDepositions(), ZenodoModelList.class); ZenodoModelList zenodoModelList = new Gson().fromJson(getPrevDepositions(), ZenodoModelList.class);
@ -217,7 +266,7 @@ public class ZenodoAPIClient implements Serializable {
} }
private String getPrevDepositions() throws IOException { private String getPrevDepositions() throws IOException {
OkHttpClient httpClient = new OkHttpClient(); OkHttpClient httpClient = new OkHttpClient.Builder().connectTimeout(600, TimeUnit.SECONDS).build();
Request request = new Request.Builder() Request request = new Request.Builder()
.url(urlString) .url(urlString)
@ -238,7 +287,9 @@ public class ZenodoAPIClient implements Serializable {
} }
private String getBucket(String url) throws IOException { private String getBucket(String url) throws IOException {
OkHttpClient httpClient = new OkHttpClient(); OkHttpClient httpClient = new OkHttpClient.Builder()
.connectTimeout(600, TimeUnit.SECONDS)
.build();
Request request = new Request.Builder() Request request = new Request.Builder()
.url(url) .url(url)
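A sketch of the deposition flow with the longer timeouts introduced above. The two-argument constructor is assumed from the urlString and access_token fields used throughout the class; the token, file and metadata are illustrative:
public static void deposit(File dump) throws IOException {
    ZenodoAPIClient client = new ZenodoAPIClient(
        "https://zenodo.org/api/deposit/depositions", "ACCESS_TOKEN"); // assumed constructor
    client.newDeposition(); // sets deposition_id and the upload bucket
    try (InputStream is = new FileInputStream(dump)) {
        client.uploadIS(is, dump.getName(), dump.length()); // streamed upload, now with 600s timeouts
    }
    client.sendMretadata("{\"metadata\":{\"title\":\"OpenAIRE graph dump\"}}"); // method name as defined above
}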

View File

@ -0,0 +1,72 @@
package eu.dnetlib.dhp.common.rest;
import java.util.Arrays;
import java.util.stream.Collectors;
import org.apache.commons.io.IOUtils;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.methods.HttpUriRequest;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.fasterxml.jackson.databind.ObjectMapper;
public class DNetRestClient {
private static final Logger log = LoggerFactory.getLogger(DNetRestClient.class);
private static final ObjectMapper mapper = new ObjectMapper();
public static <T> T doGET(final String url, Class<T> clazz) throws Exception {
final HttpGet httpGet = new HttpGet(url);
return doHTTPRequest(httpGet, clazz);
}
public static String doGET(final String url) throws Exception {
final HttpGet httpGet = new HttpGet(url);
return doHTTPRequest(httpGet);
}
public static <V> String doPOST(final String url, V objParam) throws Exception {
final HttpPost httpPost = new HttpPost(url);
if (objParam != null) {
final StringEntity entity = new StringEntity(mapper.writeValueAsString(objParam));
httpPost.setEntity(entity);
httpPost.setHeader("Accept", "application/json");
httpPost.setHeader("Content-type", "application/json");
}
return doHTTPRequest(httpPost);
}
public static <T, V> T doPOST(final String url, V objParam, Class<T> clazz) throws Exception {
return mapper.readValue(doPOST(url, objParam), clazz);
}
private static String doHTTPRequest(final HttpUriRequest r) throws Exception {
CloseableHttpClient client = HttpClients.createDefault();
log.info("performing HTTP request, method {} on URI {}", r.getMethod(), r.getURI().toString());
log
.info(
"request headers: {}",
Arrays
.asList(r.getAllHeaders())
.stream()
.map(h -> h.getName() + ":" + h.getValue())
.collect(Collectors.joining(",")));
CloseableHttpResponse response = client.execute(r);
return IOUtils.toString(response.getEntity().getContent());
}
private static <T> T doHTTPRequest(final HttpUriRequest r, Class<T> clazz) throws Exception {
return mapper.readValue(doHTTPRequest(r), clazz);
}
}
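A short sketch of the helper above; the service URLs are illustrative and the typed variant relies on the shared Jackson ObjectMapper:
public static void queryService() throws Exception {
    // plain GET returning the raw response body
    String raw = DNetRestClient.doGET("http://localhost:8080/api/mdstores");
    // GET deserialized into a target class
    Map info = DNetRestClient.doGET("http://localhost:8080/api/mdstores/md-1234abcd", Map.class);
}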

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.graph.raw.common; package eu.dnetlib.dhp.common.vocabulary;
import java.io.Serializable; import java.io.Serializable;
import java.util.HashMap; import java.util.HashMap;
@ -10,8 +10,8 @@ import org.apache.commons.lang3.StringUtils;
import com.google.common.collect.Maps; import com.google.common.collect.Maps;
import eu.dnetlib.dhp.schema.oaf.OafMapperUtils;
import eu.dnetlib.dhp.schema.oaf.Qualifier; import eu.dnetlib.dhp.schema.oaf.Qualifier;
import eu.dnetlib.dhp.schema.oaf.utils.OafMapperUtils;
public class Vocabulary implements Serializable { public class Vocabulary implements Serializable {

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.graph.raw.common; package eu.dnetlib.dhp.common.vocabulary;
import java.io.Serializable; import java.io.Serializable;
import java.util.*; import java.util.*;
@ -7,8 +7,8 @@ import java.util.stream.Collectors;
import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.StringUtils;
import eu.dnetlib.dhp.schema.oaf.OafMapperUtils;
import eu.dnetlib.dhp.schema.oaf.Qualifier; import eu.dnetlib.dhp.schema.oaf.Qualifier;
import eu.dnetlib.dhp.schema.oaf.utils.OafMapperUtils;
import eu.dnetlib.enabling.is.lookup.rmi.ISLookUpException; import eu.dnetlib.enabling.is.lookup.rmi.ISLookUpException;
import eu.dnetlib.enabling.is.lookup.rmi.ISLookUpService; import eu.dnetlib.enabling.is.lookup.rmi.ISLookUpService;
@ -67,6 +67,10 @@ public class VocabularyGroup implements Serializable {
private final Map<String, Vocabulary> vocs = new HashMap<>(); private final Map<String, Vocabulary> vocs = new HashMap<>();
public Set<String> vocabularyNames() {
return vocs.keySet();
}
public void addVocabulary(final String id, final String name) { public void addVocabulary(final String id, final String name) {
vocs.put(id.toLowerCase(), new Vocabulary(id, name)); vocs.put(id.toLowerCase(), new Vocabulary(id, name));
} }
@ -118,7 +122,31 @@ public class VocabularyGroup implements Serializable {
return vocs.get(vocId.toLowerCase()).getSynonymAsQualifier(syn); return vocs.get(vocId.toLowerCase()).getSynonymAsQualifier(syn);
} }
/**
* getSynonymAsQualifierCaseSensitive
*
* reflects the case where the vocabulary lookup must be case sensitive
*/
public Qualifier getSynonymAsQualifierCaseSensitive(final String vocId, final String syn) {
if (StringUtils.isBlank(vocId)) {
return OafMapperUtils.unknown("", "");
}
return vocs.get(vocId).getSynonymAsQualifier(syn);
}
/**
* termExists
*
* two methods: without and with caseSensitive check
*/
public boolean termExists(final String vocId, final String id) { public boolean termExists(final String vocId, final String id) {
return termExists(vocId, id, Boolean.FALSE);
}
public boolean termExists(final String vocId, final String id, final Boolean caseSensitive) {
if (Boolean.TRUE.equals(caseSensitive)) {
return vocabularyExists(vocId) && vocs.get(vocId).termExists(id);
}
return vocabularyExists(vocId) && vocs.get(vocId.toLowerCase()).termExists(id); return vocabularyExists(vocId) && vocs.get(vocId.toLowerCase()).termExists(id);
} }
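A sketch contrasting the pre-existing case-insensitive check with the case-sensitive variants added above; the vocabulary and term identifiers are illustrative:
public static void checkTerms(VocabularyGroup vocs) {
    boolean lax = vocs.termExists("dnet:languages", "ENG"); // vocabulary id is lower-cased internally
    boolean strict = vocs.termExists("dnet:languages", "eng", Boolean.TRUE); // requires the exact vocabulary id
    Qualifier english = vocs.getSynonymAsQualifierCaseSensitive("dnet:languages", "English");
}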

View File

@ -1,5 +1,5 @@
package eu.dnetlib.dhp.oa.graph.raw.common; package eu.dnetlib.dhp.common.vocabulary;
import java.io.Serializable; import java.io.Serializable;

View File

@ -0,0 +1,64 @@
package eu.dnetlib.dhp.message;
import java.io.Serializable;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
public class Message implements Serializable {
private static final long serialVersionUID = 401753881204524893L;
public static String CURRENT_PARAM = "current";
public static String TOTAL_PARAM = "total";
private MessageType messageType;
private String workflowId;
private Map<String, String> body;
public Message() {
}
public Message(final MessageType messageType, final String workflowId) {
this(messageType, workflowId, new LinkedHashMap<>());
}
public Message(final MessageType messageType, final String workflowId, final Map<String, String> body) {
this.messageType = messageType;
this.workflowId = workflowId;
this.body = body;
}
public MessageType getMessageType() {
return messageType;
}
public void setMessageType(MessageType messageType) {
this.messageType = messageType;
}
public String getWorkflowId() {
return workflowId;
}
public void setWorkflowId(final String workflowId) {
this.workflowId = workflowId;
}
public Map<String, String> getBody() {
return body;
}
public void setBody(final Map<String, String> body) {
this.body = body;
}
@Override
public String toString() {
return String.format("Message [type=%s, workflowId=%s, body=%s]", messageType, workflowId, body);
}
}

View File

@ -0,0 +1,94 @@
package eu.dnetlib.dhp.message;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
public class MessageSender {
private static final Logger log = LoggerFactory.getLogger(MessageSender.class);
private static final int SOCKET_TIMEOUT_MS = 2000;
private static final int CONNECTION_REQUEST_TIMEOUT_MS = 2000;
private static final int CONNTECTION_TIMEOUT_MS = 2000;
private final ObjectMapper objectMapper = new ObjectMapper();
private final String dnetMessageEndpoint;
private final String workflowId;
private final ExecutorService executorService = Executors.newCachedThreadPool();
public MessageSender(final String dnetMessageEndpoint, final String workflowId) {
this.workflowId = workflowId;
this.dnetMessageEndpoint = dnetMessageEndpoint;
}
public void sendMessage(final Message message) {
executorService.submit(() -> _sendMessage(message));
}
public void sendMessage(final Long current, final Long total) {
sendMessage(createOngoingMessage(current, total));
}
public void sendReport(final Map<String, String> report) {
sendMessage(new Message(MessageType.REPORT, workflowId, report));
}
private Message createOngoingMessage(final Long current, final Long total) {
final Message m = new Message(MessageType.ONGOING, workflowId);
m.getBody().put(Message.CURRENT_PARAM, current.toString());
if (total != null) {
m.getBody().put(Message.TOTAL_PARAM, total.toString());
}
return m;
}
private void _sendMessage(final Message message) {
try {
final String json = objectMapper.writeValueAsString(message);
final HttpPut req = new HttpPut(dnetMessageEndpoint);
req.setEntity(new StringEntity(json, ContentType.APPLICATION_JSON));
final RequestConfig requestConfig = RequestConfig
.custom()
.setConnectTimeout(CONNTECTION_TIMEOUT_MS)
.setConnectionRequestTimeout(CONNECTION_REQUEST_TIMEOUT_MS)
.setSocketTimeout(SOCKET_TIMEOUT_MS)
.build();
try (final CloseableHttpClient client = HttpClients
.custom()
.setDefaultRequestConfig(requestConfig)
.build();
final CloseableHttpResponse response = client.execute(req)) {
log.debug("Sent Message to " + dnetMessageEndpoint);
log.debug("MESSAGE:" + message);
} catch (final Throwable e) {
log.error("Error sending message to " + dnetMessageEndpoint + ", message content: " + message, e);
}
} catch (final JsonProcessingException e) {
log.error("Error sending message to " + dnetMessageEndpoint + ", message content: " + message, e);
}
}
}
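A usage sketch for the sender above; the manager endpoint and workflow id are illustrative:
MessageSender sender = new MessageSender("http://localhost:8080/dnet-manager/messages", "wf-1234");
sender.sendMessage(150L, 1000L); // asynchronous ONGOING message carrying current/total counters
Map<String, String> report = new HashMap<>();
report.put("TotalItems", "1000");
sender.sendReport(report); // REPORT message delivered to the same endpoint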

View File

@ -0,0 +1,21 @@
package eu.dnetlib.dhp.message;
import java.io.Serializable;
import java.util.Optional;
import org.apache.commons.lang3.StringUtils;
public enum MessageType implements Serializable {
ONGOING, REPORT;
public MessageType from(String value) {
return Optional
.ofNullable(value)
.map(StringUtils::upperCase)
.map(MessageType::valueOf)
.orElseThrow(() -> new IllegalArgumentException("unknown message type: " + value));
}
}

View File

@ -1,121 +0,0 @@
package eu.dnetlib.dhp.model.mdstore;
import java.io.Serializable;
import eu.dnetlib.dhp.utils.DHPUtils;
/** This class models a record inside the new Metadata store collection on HDFS * */
public class MetadataRecord implements Serializable {
/** The D-Net Identifier associated to the record */
private String id;
/** The original Identifier of the record */
private String originalId;
/** The encoding of the record, should be JSON or XML */
private String encoding;
/**
* The information about the provenance of the record see @{@link Provenance} for the model of this information
*/
private Provenance provenance;
/** The content of the metadata */
private String body;
/** the date when the record has been stored */
private long dateOfCollection;
/** the date when the record has been stored */
private long dateOfTransformation;
public MetadataRecord() {
this.dateOfCollection = System.currentTimeMillis();
}
public MetadataRecord(
String originalId,
String encoding,
Provenance provenance,
String body,
long dateOfCollection) {
this.originalId = originalId;
this.encoding = encoding;
this.provenance = provenance;
this.body = body;
this.dateOfCollection = dateOfCollection;
this.id = DHPUtils.generateIdentifier(originalId, this.provenance.getNsPrefix());
}
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getOriginalId() {
return originalId;
}
public void setOriginalId(String originalId) {
this.originalId = originalId;
}
public String getEncoding() {
return encoding;
}
public void setEncoding(String encoding) {
this.encoding = encoding;
}
public Provenance getProvenance() {
return provenance;
}
public void setProvenance(Provenance provenance) {
this.provenance = provenance;
}
public String getBody() {
return body;
}
public void setBody(String body) {
this.body = body;
}
public long getDateOfCollection() {
return dateOfCollection;
}
public void setDateOfCollection(long dateOfCollection) {
this.dateOfCollection = dateOfCollection;
}
public long getDateOfTransformation() {
return dateOfTransformation;
}
public void setDateOfTransformation(long dateOfTransformation) {
this.dateOfTransformation = dateOfTransformation;
}
@Override
public boolean equals(Object o) {
if (!(o instanceof MetadataRecord)) {
return false;
}
return ((MetadataRecord) o).getId().equalsIgnoreCase(id);
}
@Override
public int hashCode() {
return id.hashCode();
}
}

View File

@ -1,52 +0,0 @@
package eu.dnetlib.dhp.model.mdstore;
import java.io.Serializable;
/**
* @author Sandro La Bruzzo
* <p>
* Provenance class models the provenance of the record in the metadataStore. It contains the identifier and the
* name of the datasource that provides the record
*/
public class Provenance implements Serializable {
private String datasourceId;
private String datasourceName;
private String nsPrefix;
public Provenance() {
}
public Provenance(String datasourceId, String datasourceName, String nsPrefix) {
this.datasourceId = datasourceId;
this.datasourceName = datasourceName;
this.nsPrefix = nsPrefix;
}
public String getDatasourceId() {
return datasourceId;
}
public void setDatasourceId(String datasourceId) {
this.datasourceId = datasourceId;
}
public String getDatasourceName() {
return datasourceName;
}
public void setDatasourceName(String datasourceName) {
this.datasourceName = datasourceName;
}
public String getNsPrefix() {
return nsPrefix;
}
public void setNsPrefix(String nsPrefix) {
this.nsPrefix = nsPrefix;
}
}
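A hedged sketch of how Provenance and MetadataRecord fit together; all identifiers, names and the record body are made-up examples:
Provenance prov = new Provenance("openaire____::1234", "Example Repository", "example_____");
MetadataRecord record = new MetadataRecord(
"oai:example.org:record/123", // originalId
"XML", // encoding
prov,
"<record>...</record>", // body
System.currentTimeMillis()); // dateOfCollection
// the D-Net identifier is derived from the originalId and the namespace prefix of the provenance
String id = record.getId();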

View File

@ -4,6 +4,7 @@ package eu.dnetlib.dhp.oa.merge;
import java.text.Normalizer; import java.text.Normalizer;
import java.util.*; import java.util.*;
import java.util.stream.Collectors; import java.util.stream.Collectors;
import java.util.stream.Stream;
import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.StringUtils;
@ -32,27 +33,33 @@ public class AuthorMerger {
} }
public static List<Author> mergeAuthor(final List<Author> a, final List<Author> b) { public static List<Author> mergeAuthor(final List<Author> a, final List<Author> b, Double threshold) {
int pa = countAuthorsPids(a); int pa = countAuthorsPids(a);
int pb = countAuthorsPids(b); int pb = countAuthorsPids(b);
List<Author> base, enrich; List<Author> base, enrich;
int sa = authorsSize(a); int sa = authorsSize(a);
int sb = authorsSize(b); int sb = authorsSize(b);
if (pa == pb) { if (sa == sb) {
base = sa > sb ? a : b;
enrich = sa > sb ? b : a;
} else {
base = pa > pb ? a : b; base = pa > pb ? a : b;
enrich = pa > pb ? b : a; enrich = pa > pb ? b : a;
} else {
base = sa > sb ? a : b;
enrich = sa > sb ? b : a;
} }
enrichPidFromList(base, enrich); enrichPidFromList(base, enrich, threshold);
return base; return base;
} }
private static void enrichPidFromList(List<Author> base, List<Author> enrich) { public static List<Author> mergeAuthor(final List<Author> a, final List<Author> b) {
return mergeAuthor(a, b, THRESHOLD);
}
private static void enrichPidFromList(List<Author> base, List<Author> enrich, Double threshold) {
if (base == null || enrich == null) if (base == null || enrich == null)
return; return;
// <pidComparableString, Author> (if an Author has more than 1 pid, it appears 2 times in the list)
final Map<String, Author> basePidAuthorMap = base final Map<String, Author> basePidAuthorMap = base
.stream() .stream()
.filter(a -> a.getPid() != null && a.getPid().size() > 0) .filter(a -> a.getPid() != null && a.getPid().size() > 0)
@ -63,6 +70,7 @@ public class AuthorMerger {
.map(p -> new Tuple2<>(pidToComparableString(p), a))) .map(p -> new Tuple2<>(pidToComparableString(p), a)))
.collect(Collectors.toMap(Tuple2::_1, Tuple2::_2, (x1, x2) -> x1)); .collect(Collectors.toMap(Tuple2::_1, Tuple2::_2, (x1, x2) -> x1));
// <pid, Author> (list of pid that are missing in the other list)
final List<Tuple2<StructuredProperty, Author>> pidToEnrich = enrich final List<Tuple2<StructuredProperty, Author>> pidToEnrich = enrich
.stream() .stream()
.filter(a -> a.getPid() != null && a.getPid().size() > 0) .filter(a -> a.getPid() != null && a.getPid().size() > 0)
@ -83,10 +91,10 @@ public class AuthorMerger {
.max(Comparator.comparing(Tuple2::_1)); .max(Comparator.comparing(Tuple2::_1));
if (simAuthor.isPresent()) { if (simAuthor.isPresent()) {
double th = THRESHOLD; double th = threshold;
// increase the threshold if the surname is too short // increase the threshold if the surname is too short
if (simAuthor.get()._2().getSurname() != null if (simAuthor.get()._2().getSurname() != null
&& simAuthor.get()._2().getSurname().length() <= 3) && simAuthor.get()._2().getSurname().length() <= 3 && threshold > 0.0)
th = 0.99; th = 0.99;
if (simAuthor.get()._1() > th) { if (simAuthor.get()._1() > th) {
@ -156,7 +164,7 @@ public class AuthorMerger {
} }
private static String normalize(final String s) { private static String normalize(final String s) {
return nfd(s) String[] normalized = nfd(s)
.toLowerCase() .toLowerCase()
// do not compact the regexes in a single expression, would cause StackOverflowError // do not compact the regexes in a single expression, would cause StackOverflowError
// in case // in case
@ -166,7 +174,12 @@ public class AuthorMerger {
.replaceAll("(\\p{Punct})+", " ") .replaceAll("(\\p{Punct})+", " ")
.replaceAll("(\\d)+", " ") .replaceAll("(\\d)+", " ")
.replaceAll("(\\n)+", " ") .replaceAll("(\\n)+", " ")
.trim(); .trim()
.split(" ");
Arrays.sort(normalized);
return String.join(" ", normalized);
} }
private static String nfd(final String s) { private static String nfd(final String s) {
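A sketch of the three-argument mergeAuthor introduced by this change, assuming eu.dnetlib.dhp.schema.oaf.Author and java.util.* are imported; the author names and the 0.95 threshold are illustrative:
Author a1 = new Author();
a1.setFullname("Doe, Jane");
a1.setName("Jane");
a1.setSurname("Doe");
Author a2 = new Author();
a2.setFullname("Jane Doe");
// when the two lists have the same size, the one with more PIDs acts as the base,
// otherwise the longer list does; the other list only enriches the base PIDs
List<Author> merged = AuthorMerger.mergeAuthor(
new ArrayList<>(Arrays.asList(a1)),
new ArrayList<>(Arrays.asList(a2)),
0.95);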

View File

@ -1,238 +0,0 @@
package eu.dnetlib.dhp.schema.oaf;
import java.util.LinkedHashMap;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;
import org.apache.commons.lang3.StringUtils;
import com.clearspring.analytics.util.Lists;
import eu.dnetlib.dhp.schema.common.ModelConstants;
public class CleaningFunctions {
public static final String DOI_URL_PREFIX_REGEX = "(^http(s?):\\/\\/)(((dx\\.)?doi\\.org)|(handle\\.test\\.datacite\\.org))\\/";
public static final String ORCID_PREFIX_REGEX = "^http(s?):\\/\\/orcid\\.org\\/";
public static final String NONE = "none";
public static <T extends Oaf> T fixVocabularyNames(T value) {
if (value instanceof Datasource) {
// nothing to clean here
} else if (value instanceof Project) {
// nothing to clean here
} else if (value instanceof Organization) {
Organization o = (Organization) value;
if (Objects.nonNull(o.getCountry())) {
fixVocabName(o.getCountry(), ModelConstants.DNET_COUNTRY_TYPE);
}
} else if (value instanceof Relation) {
// nothing to clean here
} else if (value instanceof Result) {
Result r = (Result) value;
fixVocabName(r.getLanguage(), ModelConstants.DNET_LANGUAGES);
fixVocabName(r.getResourcetype(), ModelConstants.DNET_DATA_CITE_RESOURCE);
fixVocabName(r.getBestaccessright(), ModelConstants.DNET_ACCESS_MODES);
if (Objects.nonNull(r.getSubject())) {
r.getSubject().forEach(s -> fixVocabName(s.getQualifier(), ModelConstants.DNET_SUBJECT_TYPOLOGIES));
}
if (Objects.nonNull(r.getInstance())) {
for (Instance i : r.getInstance()) {
fixVocabName(i.getAccessright(), ModelConstants.DNET_ACCESS_MODES);
fixVocabName(i.getRefereed(), ModelConstants.DNET_REVIEW_LEVELS);
}
}
if (Objects.nonNull(r.getAuthor())) {
r.getAuthor().forEach(a -> {
if (Objects.nonNull(a.getPid())) {
a.getPid().forEach(p -> {
fixVocabName(p.getQualifier(), ModelConstants.DNET_PID_TYPES);
});
}
});
}
if (value instanceof Publication) {
} else if (value instanceof eu.dnetlib.dhp.schema.oaf.Dataset) {
} else if (value instanceof OtherResearchProduct) {
} else if (value instanceof Software) {
}
}
return value;
}
public static <T extends Oaf> T fixDefaults(T value) {
if (value instanceof Datasource) {
// nothing to clean here
} else if (value instanceof Project) {
// nothing to clean here
} else if (value instanceof Organization) {
Organization o = (Organization) value;
if (Objects.isNull(o.getCountry()) || StringUtils.isBlank(o.getCountry().getClassid())) {
o.setCountry(qualifier("UNKNOWN", "Unknown", ModelConstants.DNET_COUNTRY_TYPE));
}
} else if (value instanceof Relation) {
// nothing to clean here
} else if (value instanceof Result) {
Result r = (Result) value;
if (Objects.nonNull(r.getPublisher()) && StringUtils.isBlank(r.getPublisher().getValue())) {
r.setPublisher(null);
}
if (Objects.isNull(r.getLanguage()) || StringUtils.isBlank(r.getLanguage().getClassid())) {
r
.setLanguage(
qualifier("und", "Undetermined", ModelConstants.DNET_LANGUAGES));
}
if (Objects.nonNull(r.getSubject())) {
r
.setSubject(
r
.getSubject()
.stream()
.filter(Objects::nonNull)
.filter(sp -> StringUtils.isNotBlank(sp.getValue()))
.filter(sp -> Objects.nonNull(sp.getQualifier()))
.filter(sp -> StringUtils.isNotBlank(sp.getQualifier().getClassid()))
.collect(Collectors.toList()));
}
if (Objects.nonNull(r.getPid())) {
r
.setPid(
r
.getPid()
.stream()
.filter(Objects::nonNull)
.filter(sp -> StringUtils.isNotBlank(StringUtils.trim(sp.getValue())))
.filter(sp -> !NONE.equalsIgnoreCase(sp.getValue()))
.filter(sp -> Objects.nonNull(sp.getQualifier()))
.filter(sp -> StringUtils.isNotBlank(sp.getQualifier().getClassid()))
.map(CleaningFunctions::normalizePidValue)
.collect(Collectors.toList()));
}
if (Objects.isNull(r.getResourcetype()) || StringUtils.isBlank(r.getResourcetype().getClassid())) {
r
.setResourcetype(
qualifier("UNKNOWN", "Unknown", ModelConstants.DNET_DATA_CITE_RESOURCE));
}
if (Objects.nonNull(r.getInstance())) {
for (Instance i : r.getInstance()) {
if (Objects.isNull(i.getAccessright()) || StringUtils.isBlank(i.getAccessright().getClassid())) {
i.setAccessright(qualifier("UNKNOWN", "not available", ModelConstants.DNET_ACCESS_MODES));
}
if (Objects.isNull(i.getHostedby()) || StringUtils.isBlank(i.getHostedby().getKey())) {
i.setHostedby(ModelConstants.UNKNOWN_REPOSITORY);
}
if (Objects.isNull(i.getRefereed())) {
i.setRefereed(qualifier("0000", "Unknown", ModelConstants.DNET_REVIEW_LEVELS));
}
}
}
if (Objects.isNull(r.getBestaccessright()) || StringUtils.isBlank(r.getBestaccessright().getClassid())) {
Qualifier bestaccessrights = OafMapperUtils.createBestAccessRights(r.getInstance());
if (Objects.isNull(bestaccessrights)) {
r
.setBestaccessright(
qualifier("UNKNOWN", "not available", ModelConstants.DNET_ACCESS_MODES));
} else {
r.setBestaccessright(bestaccessrights);
}
}
if (Objects.nonNull(r.getAuthor())) {
boolean nullRank = r
.getAuthor()
.stream()
.anyMatch(a -> Objects.isNull(a.getRank()));
if (nullRank) {
int i = 1;
for (Author author : r.getAuthor()) {
author.setRank(i++);
}
}
for (Author a : r.getAuthor()) {
if (Objects.isNull(a.getPid())) {
a.setPid(Lists.newArrayList());
} else {
a
.setPid(
a
.getPid()
.stream()
.filter(p -> Objects.nonNull(p.getQualifier()))
.filter(p -> StringUtils.isNotBlank(p.getValue()))
.map(p -> {
p.setValue(p.getValue().trim().replaceAll(ORCID_PREFIX_REGEX, ""));
return p;
})
.collect(
Collectors
.toMap(
StructuredProperty::getValue, Function.identity(), (p1, p2) -> p1,
LinkedHashMap::new))
.values()
.stream()
.collect(Collectors.toList()));
}
}
}
if (value instanceof Publication) {
} else if (value instanceof eu.dnetlib.dhp.schema.oaf.Dataset) {
} else if (value instanceof OtherResearchProduct) {
} else if (value instanceof Software) {
}
}
return value;
}
// HELPERS
private static void fixVocabName(Qualifier q, String vocabularyName) {
if (Objects.nonNull(q) && StringUtils.isBlank(q.getSchemeid())) {
q.setSchemeid(vocabularyName);
q.setSchemename(vocabularyName);
}
}
private static Qualifier qualifier(String classid, String classname, String scheme) {
return OafMapperUtils
.qualifier(
classid, classname, scheme, scheme);
}
/**
* Utility method that normalises PID values on a per-type basis.
* @param pid the PID whose value will be normalised.
* @return the PID containing the normalised value.
*/
public static StructuredProperty normalizePidValue(StructuredProperty pid) {
String value = Optional
.ofNullable(pid.getValue())
.map(String::trim)
.orElseThrow(() -> new IllegalArgumentException("PID value cannot be empty"));
switch (pid.getQualifier().getClassid()) {
// TODO add cleaning for more PID types as needed
case "doi":
pid.setValue(value.toLowerCase().replaceAll(DOI_URL_PREFIX_REGEX, ""));
break;
}
return pid;
}
}
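A short sketch of normalizePidValue, assuming the OAF schema classes above are on the classpath; the DOI value is illustrative:
StructuredProperty pid = new StructuredProperty();
pid.setQualifier(OafMapperUtils.qualifier("doi", "doi",
ModelConstants.DNET_PID_TYPES, ModelConstants.DNET_PID_TYPES));
pid.setValue(" https://doi.org/10.1234/ABCD ");
CleaningFunctions.normalizePidValue(pid);
// the value is trimmed, lower-cased and stripped of the resolver prefix: pid.getValue() -> "10.1234/abcd"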

View File

@ -1,14 +0,0 @@
package eu.dnetlib.dhp.schema.oaf;
public class ModelHardLimits {
public static final int MAX_EXTERNAL_ENTITIES = 50;
public static final int MAX_AUTHORS = 200;
public static final int MAX_AUTHOR_FULLNAME_LENGTH = 1000;
public static final int MAX_TITLE_LENGTH = 5000;
public static final int MAX_TITLES = 10;
public static final int MAX_ABSTRACT_LENGTH = 150000;
public static final int MAX_INSTANCES = 10;
}

View File

@ -1,49 +0,0 @@
package eu.dnetlib.dhp.schema.oaf;
import java.util.Comparator;
import eu.dnetlib.dhp.schema.common.ModelConstants;
public class ResultTypeComparator implements Comparator<Result> {
@Override
public int compare(Result left, Result right) {
if (left == null && right == null)
return 0;
if (left == null)
return 1;
if (right == null)
return -1;
String lClass = left.getResulttype().getClassid();
String rClass = right.getResulttype().getClassid();
if (lClass.equals(rClass))
return 0;
if (lClass.equals(ModelConstants.PUBLICATION_RESULTTYPE_CLASSID))
return -1;
if (rClass.equals(ModelConstants.PUBLICATION_RESULTTYPE_CLASSID))
return 1;
if (lClass.equals(ModelConstants.DATASET_RESULTTYPE_CLASSID))
return -1;
if (rClass.equals(ModelConstants.DATASET_RESULTTYPE_CLASSID))
return 1;
if (lClass.equals(ModelConstants.SOFTWARE_RESULTTYPE_CLASSID))
return -1;
if (rClass.equals(ModelConstants.SOFTWARE_RESULTTYPE_CLASSID))
return 1;
if (lClass.equals(ModelConstants.ORP_RESULTTYPE_CLASSID))
return -1;
if (rClass.equals(ModelConstants.ORP_RESULTTYPE_CLASSID))
return 1;
// Else (but unlikely), lexicographical ordering will do.
return lClass.compareTo(rClass);
}
}
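An illustrative ordering check for the comparator above; the resulttype qualifiers are set explicitly here and the vocabulary constants are assumed to exist in ModelConstants:
Publication pub = new Publication();
pub.setResulttype(OafMapperUtils.qualifier(
ModelConstants.PUBLICATION_RESULTTYPE_CLASSID, "publication",
ModelConstants.DNET_RESULT_TYPOLOGIES, ModelConstants.DNET_RESULT_TYPOLOGIES));
eu.dnetlib.dhp.schema.oaf.Dataset dataset = new eu.dnetlib.dhp.schema.oaf.Dataset();
dataset.setResulttype(OafMapperUtils.qualifier(
ModelConstants.DATASET_RESULTTYPE_CLASSID, "dataset",
ModelConstants.DNET_RESULT_TYPOLOGIES, ModelConstants.DNET_RESULT_TYPOLOGIES));
List<Result> results = new ArrayList<>(Arrays.asList(dataset, pub));
results.sort(new ResultTypeComparator()); // the publication is now first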

View File

@ -0,0 +1,459 @@
package eu.dnetlib.dhp.schema.oaf.utils;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.*;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import org.apache.commons.lang3.StringUtils;
import org.jetbrains.annotations.NotNull;
import com.github.sisyphsu.dateparser.DateParserUtils;
import com.google.common.collect.Lists;
import com.google.common.collect.Maps;
import com.google.common.collect.Sets;
import eu.dnetlib.dhp.schema.common.ModelConstants;
import eu.dnetlib.dhp.schema.common.ModelSupport;
import eu.dnetlib.dhp.schema.oaf.*;
public class GraphCleaningFunctions extends CleaningFunctions {
public static final String ORCID_CLEANING_REGEX = ".*([0-9]{4}).*[-–—−=].*([0-9]{4}).*[-–—−=].*([0-9]{4}).*[-–—−=].*([0-9x]{4})";
public static final int ORCID_LEN = 19;
public static final String CLEANING_REGEX = "(?:\\n|\\r|\\t)";
public static final String INVALID_AUTHOR_REGEX = ".*deactivated.*";
public static final String TITLE_FILTER_REGEX = "[.*test.*\\W\\d]";
public static final int TITLE_FILTER_RESIDUAL_LENGTH = 10;
public static <T extends Oaf> T fixVocabularyNames(T value) {
if (value instanceof Datasource) {
// nothing to clean here
} else if (value instanceof Project) {
// nothing to clean here
} else if (value instanceof Organization) {
Organization o = (Organization) value;
if (Objects.nonNull(o.getCountry())) {
fixVocabName(o.getCountry(), ModelConstants.DNET_COUNTRY_TYPE);
}
} else if (value instanceof Relation) {
// nothing to clean here
} else if (value instanceof Result) {
Result r = (Result) value;
fixVocabName(r.getLanguage(), ModelConstants.DNET_LANGUAGES);
fixVocabName(r.getResourcetype(), ModelConstants.DNET_DATA_CITE_RESOURCE);
fixVocabName(r.getBestaccessright(), ModelConstants.DNET_ACCESS_MODES);
if (Objects.nonNull(r.getSubject())) {
r.getSubject().forEach(s -> fixVocabName(s.getQualifier(), ModelConstants.DNET_SUBJECT_TYPOLOGIES));
}
if (Objects.nonNull(r.getInstance())) {
for (Instance i : r.getInstance()) {
fixVocabName(i.getAccessright(), ModelConstants.DNET_ACCESS_MODES);
fixVocabName(i.getRefereed(), ModelConstants.DNET_REVIEW_LEVELS);
}
}
if (Objects.nonNull(r.getAuthor())) {
r.getAuthor().stream().filter(Objects::nonNull).forEach(a -> {
if (Objects.nonNull(a.getPid())) {
a.getPid().stream().filter(Objects::nonNull).forEach(p -> {
fixVocabName(p.getQualifier(), ModelConstants.DNET_PID_TYPES);
});
}
});
}
if (value instanceof Publication) {
} else if (value instanceof Dataset) {
} else if (value instanceof OtherResearchProduct) {
} else if (value instanceof Software) {
}
}
return value;
}
public static <T extends Oaf> boolean filter(T value) {
if (value instanceof Datasource) {
// nothing to evaluate here
} else if (value instanceof Project) {
// nothing to evaluate here
} else if (value instanceof Organization) {
// nothing to evaluate here
} else if (value instanceof Relation) {
// nothing to clean here
} else if (value instanceof Result) {
Result r = (Result) value;
if (Objects.isNull(r.getTitle()) || r.getTitle().isEmpty()) {
return false;
}
if (value instanceof Publication) {
} else if (value instanceof Dataset) {
} else if (value instanceof OtherResearchProduct) {
} else if (value instanceof Software) {
}
}
return true;
}
public static <T extends Oaf> T cleanup(T value) {
if (value instanceof Datasource) {
// nothing to clean here
} else if (value instanceof Project) {
// nothing to clean here
} else if (value instanceof Organization) {
Organization o = (Organization) value;
if (Objects.isNull(o.getCountry()) || StringUtils.isBlank(o.getCountry().getClassid())) {
o.setCountry(ModelConstants.UNKNOWN_COUNTRY);
}
} else if (value instanceof Relation) {
Relation r = (Relation) value;
Optional<String> validationDate = doCleanDate(r.getValidationDate());
if (validationDate.isPresent()) {
r.setValidationDate(validationDate.get());
r.setValidated(true);
} else {
r.setValidationDate(null);
r.setValidated(false);
}
} else if (value instanceof Result) {
Result r = (Result) value;
if (Objects.nonNull(r.getDateofacceptance())) {
Optional<String> date = cleanDateField(r.getDateofacceptance());
if (date.isPresent()) {
r.getDateofacceptance().setValue(date.get());
} else {
r.setDateofacceptance(null);
}
}
if (Objects.nonNull(r.getRelevantdate())) {
r
.setRelevantdate(
r
.getRelevantdate()
.stream()
.filter(Objects::nonNull)
.filter(sp -> Objects.nonNull(sp.getQualifier()))
.filter(sp -> StringUtils.isNotBlank(sp.getQualifier().getClassid()))
.map(sp -> {
sp.setValue(GraphCleaningFunctions.cleanDate(sp.getValue()));
return sp;
})
.filter(sp -> StringUtils.isNotBlank(sp.getValue()))
.collect(Collectors.toList()));
}
if (Objects.nonNull(r.getPublisher()) && StringUtils.isBlank(r.getPublisher().getValue())) {
r.setPublisher(null);
}
if (Objects.isNull(r.getLanguage()) || StringUtils.isBlank(r.getLanguage().getClassid())) {
r
.setLanguage(
qualifier("und", "Undetermined", ModelConstants.DNET_LANGUAGES));
}
if (Objects.nonNull(r.getSubject())) {
r
.setSubject(
r
.getSubject()
.stream()
.filter(Objects::nonNull)
.filter(sp -> StringUtils.isNotBlank(sp.getValue()))
.filter(sp -> Objects.nonNull(sp.getQualifier()))
.filter(sp -> StringUtils.isNotBlank(sp.getQualifier().getClassid()))
.map(GraphCleaningFunctions::cleanValue)
.collect(Collectors.toList()));
}
if (Objects.nonNull(r.getTitle())) {
r
.setTitle(
r
.getTitle()
.stream()
.filter(Objects::nonNull)
.filter(sp -> StringUtils.isNotBlank(sp.getValue()))
.filter(
sp -> sp
.getValue()
.toLowerCase()
.replaceAll(TITLE_FILTER_REGEX, "")
.length() > TITLE_FILTER_RESIDUAL_LENGTH)
.map(GraphCleaningFunctions::cleanValue)
.collect(Collectors.toList()));
}
if (Objects.nonNull(r.getDescription())) {
r
.setDescription(
r
.getDescription()
.stream()
.filter(Objects::nonNull)
.filter(sp -> StringUtils.isNotBlank(sp.getValue()))
.map(GraphCleaningFunctions::cleanValue)
.collect(Collectors.toList()));
}
if (Objects.nonNull(r.getPid())) {
r.setPid(processPidCleaning(r.getPid()));
}
if (Objects.isNull(r.getResourcetype()) || StringUtils.isBlank(r.getResourcetype().getClassid())) {
r
.setResourcetype(
qualifier(ModelConstants.UNKNOWN, "Unknown", ModelConstants.DNET_DATA_CITE_RESOURCE));
}
if (Objects.nonNull(r.getInstance())) {
for (Instance i : r.getInstance()) {
if (Objects.nonNull(i.getPid())) {
i.setPid(processPidCleaning(i.getPid()));
}
if (Objects.nonNull(i.getAlternateIdentifier())) {
i.setAlternateIdentifier(processPidCleaning(i.getAlternateIdentifier()));
}
Optional
.ofNullable(i.getPid())
.ifPresent(pid -> {
final Set<StructuredProperty> pids = Sets.newHashSet(pid);
Optional
.ofNullable(i.getAlternateIdentifier())
.ifPresent(altId -> {
final Set<StructuredProperty> altIds = Sets.newHashSet(altId);
i.setAlternateIdentifier(Lists.newArrayList(Sets.difference(altIds, pids)));
});
});
if (Objects.isNull(i.getAccessright()) || StringUtils.isBlank(i.getAccessright().getClassid())) {
i
.setAccessright(
accessRight(
ModelConstants.UNKNOWN, ModelConstants.NOT_AVAILABLE,
ModelConstants.DNET_ACCESS_MODES));
}
if (Objects.isNull(i.getHostedby()) || StringUtils.isBlank(i.getHostedby().getKey())) {
i.setHostedby(ModelConstants.UNKNOWN_REPOSITORY);
}
if (Objects.isNull(i.getRefereed())) {
i.setRefereed(qualifier("0000", "Unknown", ModelConstants.DNET_REVIEW_LEVELS));
}
if (Objects.nonNull(i.getDateofacceptance())) {
Optional<String> date = cleanDateField(i.getDateofacceptance());
if (date.isPresent()) {
i.getDateofacceptance().setValue(date.get());
} else {
i.setDateofacceptance(null);
}
}
}
}
if (Objects.isNull(r.getBestaccessright()) || StringUtils.isBlank(r.getBestaccessright().getClassid())) {
Qualifier bestaccessrights = OafMapperUtils.createBestAccessRights(r.getInstance());
if (Objects.isNull(bestaccessrights)) {
r
.setBestaccessright(
qualifier(
ModelConstants.UNKNOWN, ModelConstants.NOT_AVAILABLE,
ModelConstants.DNET_ACCESS_MODES));
} else {
r.setBestaccessright(bestaccessrights);
}
}
if (Objects.nonNull(r.getAuthor())) {
r
.setAuthor(
r
.getAuthor()
.stream()
.filter(a -> Objects.nonNull(a))
.filter(a -> StringUtils.isNotBlank(a.getFullname()))
.filter(a -> StringUtils.isNotBlank(a.getFullname().replaceAll("[\\W]", "")))
.collect(Collectors.toList()));
boolean nullRank = r
.getAuthor()
.stream()
.anyMatch(a -> Objects.isNull(a.getRank()));
if (nullRank) {
int i = 1;
for (Author author : r.getAuthor()) {
author.setRank(i++);
}
}
for (Author a : r.getAuthor()) {
if (Objects.isNull(a.getPid())) {
a.setPid(Lists.newArrayList());
} else {
a
.setPid(
a
.getPid()
.stream()
.filter(Objects::nonNull)
.filter(p -> Objects.nonNull(p.getQualifier()))
.filter(p -> StringUtils.isNotBlank(p.getValue()))
.map(p -> {
// hack to distinguish orcid from orcid_pending
String pidProvenance = Optional
.ofNullable(p.getDataInfo())
.map(
d -> Optional
.ofNullable(d.getProvenanceaction())
.map(Qualifier::getClassid)
.orElse(""))
.orElse("");
if (p
.getQualifier()
.getClassid()
.toLowerCase()
.contains(ModelConstants.ORCID)) {
if (pidProvenance
.equals(ModelConstants.SYSIMPORT_CROSSWALK_ENTITYREGISTRY)) {
p.getQualifier().setClassid(ModelConstants.ORCID);
} else {
p.getQualifier().setClassid(ModelConstants.ORCID_PENDING);
}
final String orcid = p
.getValue()
.trim()
.toLowerCase()
.replaceAll(ORCID_CLEANING_REGEX, "$1-$2-$3-$4");
if (orcid.length() == ORCID_LEN) {
p.setValue(orcid);
} else {
p.setValue("");
}
}
return p;
})
.filter(p -> StringUtils.isNotBlank(p.getValue()))
.collect(
Collectors
.toMap(
p -> p.getQualifier().getClassid() + p.getValue(),
Function.identity(),
(p1, p2) -> p1,
LinkedHashMap::new))
.values()
.stream()
.collect(Collectors.toList()));
}
}
}
if (value instanceof Publication) {
} else if (value instanceof Dataset) {
} else if (value instanceof OtherResearchProduct) {
} else if (value instanceof Software) {
}
}
return value;
}
private static Optional<String> cleanDateField(Field<String> dateofacceptance) {
return Optional
.ofNullable(dateofacceptance)
.map(Field::getValue)
.map(GraphCleaningFunctions::cleanDate)
.filter(Objects::nonNull);
}
protected static Optional<String> doCleanDate(String date) {
return Optional.ofNullable(cleanDate(date));
}
public static String cleanDate(final String inputDate) {
if (StringUtils.isBlank(inputDate)) {
return null;
}
try {
final LocalDate date = DateParserUtils
.parseDate(inputDate.trim())
.toInstant()
.atZone(ZoneId.systemDefault())
.toLocalDate();
return DateTimeFormatter.ofPattern(ModelSupport.DATE_FORMAT).format(date);
} catch (DateTimeParseException e) {
return null;
}
}
// HELPERS
private static boolean isValidAuthorName(Author a) {
return !Stream
.of(a.getFullname(), a.getName(), a.getSurname())
.filter(s -> s != null && !s.isEmpty())
.collect(Collectors.joining(""))
.toLowerCase()
.matches(INVALID_AUTHOR_REGEX);
}
private static List<StructuredProperty> processPidCleaning(List<StructuredProperty> pids) {
return pids
.stream()
.filter(Objects::nonNull)
.filter(sp -> StringUtils.isNotBlank(StringUtils.trim(sp.getValue())))
.filter(sp -> !PID_BLACKLIST.contains(sp.getValue().trim().toLowerCase()))
.filter(sp -> Objects.nonNull(sp.getQualifier()))
.filter(sp -> StringUtils.isNotBlank(sp.getQualifier().getClassid()))
.map(CleaningFunctions::normalizePidValue)
.filter(CleaningFunctions::pidFilter)
.collect(Collectors.toList());
}
private static void fixVocabName(Qualifier q, String vocabularyName) {
if (Objects.nonNull(q) && StringUtils.isBlank(q.getSchemeid())) {
q.setSchemeid(vocabularyName);
q.setSchemename(vocabularyName);
}
}
private static AccessRight accessRight(String classid, String classname, String scheme) {
return OafMapperUtils
.accessRight(
classid, classname, scheme, scheme);
}
private static Qualifier qualifier(String classid, String classname, String scheme) {
return OafMapperUtils
.qualifier(
classid, classname, scheme, scheme);
}
protected static StructuredProperty cleanValue(StructuredProperty s) {
s.setValue(s.getValue().replaceAll(CLEANING_REGEX, " "));
return s;
}
protected static Field<String> cleanValue(Field<String> s) {
s.setValue(s.getValue().replaceAll(CLEANING_REGEX, " "));
return s;
}
}
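A hedged sketch of the cleaning sequence the functions above are typically chained into, plus the date normalisation (the expected date outputs match the unit test shown later in this diff):
Publication p = new Publication(); // in practice deserialised from the graph dump
p = GraphCleaningFunctions.fixVocabularyNames(p);
if (GraphCleaningFunctions.filter(p)) { // false here, since this publication has no title
p = GraphCleaningFunctions.cleanup(p);
}
String d1 = GraphCleaningFunctions.cleanDate("2020-09-10 11:08:52"); // -> "2020-09-10"
String d2 = GraphCleaningFunctions.cleanDate("oct 7, 1970"); // -> "1970-10-07"
String d3 = GraphCleaningFunctions.cleanDate("not a date"); // -> null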

View File

@ -1,102 +0,0 @@
package eu.dnetlib.dhp.schema.oaf.utils;
import java.io.Serializable;
import java.util.Objects;
import java.util.Optional;
import java.util.regex.Pattern;
import org.apache.commons.lang.StringUtils;
import eu.dnetlib.dhp.schema.oaf.CleaningFunctions;
import eu.dnetlib.dhp.schema.oaf.OafEntity;
import eu.dnetlib.dhp.schema.oaf.StructuredProperty;
import eu.dnetlib.dhp.utils.DHPUtils;
/**
* Factory class for OpenAIRE identifiers in the Graph
*/
public class IdentifierFactory implements Serializable {
public static final String ID_SEPARATOR = "::";
public static final String ID_PREFIX_SEPARATOR = "|";
public final static String ID_REGEX = "^[0-9][0-9]\\" + ID_PREFIX_SEPARATOR + ".{12}" + ID_SEPARATOR
+ "[a-zA-Z0-9]{32}$";
public final static String DOI_REGEX = "(^10\\.[0-9]{4,9}\\/[-._;()\\/:a-zA-Z0-9]+$)|" +
"(^10\\.1002\\/[^\\s]+$)|" +
"(^10\\.1021\\/[a-zA-Z0-9_][a-zA-Z0-9_][0-9]++$)|" +
"(^10\\.1207\\/[a-zA-Z0-9_]+\\&[0-9]+_[0-9]+$)";
public static final int ID_PREFIX_LEN = 12;
public static final String NONE = "none";
/**
* Creates an identifier from the most relevant PID (if available) in the given entity T. Returns entity.id
* when no PID is available
* @param entity the entity providing PIDs and a default ID.
* @param <T> the specific entity type. Currently Organization and Result subclasses are supported.
* @return an identifier from the most relevant PID, entity.id otherwise
*/
public static <T extends OafEntity> String createIdentifier(T entity) {
if (Objects.isNull(entity.getPid()) || entity.getPid().isEmpty()) {
return entity.getId();
}
return entity
.getPid()
.stream()
.filter(s -> pidFilter(s))
.min(new PidComparator<>(entity))
.map(s -> idFromPid(entity, s))
.map(IdentifierFactory::verifyIdSyntax)
.orElseGet(entity::getId);
}
protected static boolean pidFilter(StructuredProperty s) {
if (Objects.isNull(s.getQualifier()) ||
StringUtils.isBlank(StringUtils.trim(s.getValue()))) {
return false;
}
try {
switch (PidType.valueOf(s.getQualifier().getClassid())) {
case doi:
final String doi = StringUtils.trim(StringUtils.lowerCase(s.getValue()));
return doi.matches(DOI_REGEX);
default:
return true;
}
} catch (IllegalArgumentException e) {
return false;
}
}
private static String verifyIdSyntax(String s) {
if (StringUtils.isBlank(s) || !s.matches(ID_REGEX)) {
throw new RuntimeException(String.format("malformed id: '%s'", s));
} else {
return s;
}
}
private static <T extends OafEntity> String idFromPid(T entity, StructuredProperty s) {
return new StringBuilder()
.append(StringUtils.substringBefore(entity.getId(), ID_PREFIX_SEPARATOR))
.append(ID_PREFIX_SEPARATOR)
.append(createPrefix(s.getQualifier().getClassid()))
.append(ID_SEPARATOR)
.append(DHPUtils.md5(CleaningFunctions.normalizePidValue(s).getValue()))
.toString();
}
// create the prefix (length = 12)
private static String createPrefix(String pidType) {
StringBuilder prefix = new StringBuilder(StringUtils.left(pidType, ID_PREFIX_LEN));
while (prefix.length() < ID_PREFIX_LEN) {
prefix.append("_");
}
return prefix.substring(0, ID_PREFIX_LEN);
}
}
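A sketch of createIdentifier consistent with the IdentifierFactoryTest further down; the fallback id and the DOI are the same values used there:
Publication pub = new Publication();
pub.setId("50|DansKnawCris::0829b5191605bdbea36d6502b8c1ce1f");
StructuredProperty doi = new StructuredProperty();
doi.setQualifier(OafMapperUtils.qualifier("doi", "doi",
ModelConstants.DNET_PID_TYPES, ModelConstants.DNET_PID_TYPES));
doi.setValue("10.1016/j.cmet.2011.03.013");
pub.setPid(Collections.singletonList(doi));
String id = IdentifierFactory.createIdentifier(pub);
// -> "50|doi_________::" + DHPUtils.md5("10.1016/j.cmet.2011.03.013")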

View File

@ -1,8 +1,7 @@
package eu.dnetlib.dhp.schema.oaf; package eu.dnetlib.dhp.schema.oaf.utils;
import static eu.dnetlib.dhp.schema.common.ModelConstants.*; import static eu.dnetlib.dhp.schema.common.ModelConstants.*;
import static eu.dnetlib.dhp.schema.common.ModelConstants.DNET_ACCESS_MODES;
import java.util.*; import java.util.*;
import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.ConcurrentHashMap;
@ -12,11 +11,48 @@ import java.util.stream.Collectors;
import org.apache.commons.lang3.StringUtils; import org.apache.commons.lang3.StringUtils;
import eu.dnetlib.dhp.schema.common.LicenseComparator; import eu.dnetlib.dhp.schema.common.AccessRightComparator;
import eu.dnetlib.dhp.utils.DHPUtils; import eu.dnetlib.dhp.schema.common.ModelSupport;
import eu.dnetlib.dhp.schema.oaf.*;
public class OafMapperUtils { public class OafMapperUtils {
public static Oaf merge(final Oaf left, final Oaf right) {
if (ModelSupport.isSubClass(left, OafEntity.class)) {
return mergeEntities((OafEntity) left, (OafEntity) right);
} else if (ModelSupport.isSubClass(left, Relation.class)) {
((Relation) left).mergeFrom((Relation) right);
} else {
throw new RuntimeException("invalid Oaf type:" + left.getClass().getCanonicalName());
}
return left;
}
public static OafEntity mergeEntities(OafEntity left, OafEntity right) {
if (ModelSupport.isSubClass(left, Result.class)) {
return mergeResults((Result) left, (Result) right);
} else if (ModelSupport.isSubClass(left, Datasource.class)) {
left.mergeFrom(right);
} else if (ModelSupport.isSubClass(left, Organization.class)) {
left.mergeFrom(right);
} else if (ModelSupport.isSubClass(left, Project.class)) {
left.mergeFrom(right);
} else {
throw new RuntimeException("invalid OafEntity subtype:" + left.getClass().getCanonicalName());
}
return left;
}
public static Result mergeResults(Result left, Result right) {
if (new ResultTypeComparator().compare(left, right) < 0) {
left.mergeFrom(right);
return left;
} else {
right.mergeFrom(left);
return right;
}
}
public static KeyValue keyValue(final String k, final String v) { public static KeyValue keyValue(final String k, final String v) {
final KeyValue kv = new KeyValue(); final KeyValue kv = new KeyValue();
kv.setKey(k); kv.setKey(k);
@ -69,6 +105,29 @@ public class OafMapperUtils {
return qualifier("UNKNOWN", "Unknown", schemeid, schemename); return qualifier("UNKNOWN", "Unknown", schemeid, schemename);
} }
public static AccessRight accessRight(
final String classid,
final String classname,
final String schemeid,
final String schemename) {
return accessRight(classid, classname, schemeid, schemename, null);
}
public static AccessRight accessRight(
final String classid,
final String classname,
final String schemeid,
final String schemename,
final OpenAccessRoute openAccessRoute) {
final AccessRight accessRight = new AccessRight();
accessRight.setClassid(classid);
accessRight.setClassname(classname);
accessRight.setSchemeid(schemeid);
accessRight.setSchemename(schemename);
accessRight.setOpenAccessRoute(openAccessRoute);
return accessRight;
}
public static Qualifier qualifier( public static Qualifier qualifier(
final String classid, final String classid,
final String classname, final String classname,
@ -82,6 +141,15 @@ public class OafMapperUtils {
return q; return q;
} }
public static Qualifier qualifier(final Qualifier qualifier) {
final Qualifier q = new Qualifier();
q.setClassid(qualifier.getClassid());
q.setClassname(qualifier.getClassname());
q.setSchemeid(qualifier.getSchemeid());
q.setSchemename(qualifier.getSchemename());
return q;
}
public static StructuredProperty structuredProperty( public static StructuredProperty structuredProperty(
final String value, final String value,
final String classid, final String classid,
@ -150,7 +218,8 @@ public class OafMapperUtils {
final String issnOnline, final String issnOnline,
final String issnLinking, final String issnLinking,
final DataInfo dataInfo) { final DataInfo dataInfo) {
return journal(
return hasIssn(issnPrinted, issnOnline, issnLinking) ? journal(
name, name,
issnPrinted, issnPrinted,
issnOnline, issnOnline,
@ -162,7 +231,7 @@ public class OafMapperUtils {
null, null,
null, null,
null, null,
dataInfo); dataInfo) : null;
} }
public static Journal journal( public static Journal journal(
@ -179,10 +248,7 @@ public class OafMapperUtils {
final String conferencedate, final String conferencedate,
final DataInfo dataInfo) { final DataInfo dataInfo) {
if (StringUtils.isNotBlank(name) if (StringUtils.isNotBlank(name) || hasIssn(issnPrinted, issnOnline, issnLinking)) {
|| StringUtils.isNotBlank(issnPrinted)
|| StringUtils.isNotBlank(issnOnline)
|| StringUtils.isNotBlank(issnLinking)) {
final Journal j = new Journal(); final Journal j = new Journal();
j.setName(name); j.setName(name);
j.setIssnPrinted(issnPrinted); j.setIssnPrinted(issnPrinted);
@ -202,6 +268,12 @@ public class OafMapperUtils {
} }
} }
private static boolean hasIssn(String issnPrinted, String issnOnline, String issnLinking) {
return StringUtils.isNotBlank(issnPrinted)
|| StringUtils.isNotBlank(issnOnline)
|| StringUtils.isNotBlank(issnLinking);
}
public static DataInfo dataInfo( public static DataInfo dataInfo(
final Boolean deletedbyinference, final Boolean deletedbyinference,
final String inferenceprovenance, final String inferenceprovenance,
@ -228,7 +300,7 @@ public class OafMapperUtils {
} else if (to_md5) { } else if (to_md5) {
final String nsPrefix = StringUtils.substringBefore(originalId, "::"); final String nsPrefix = StringUtils.substringBefore(originalId, "::");
final String rest = StringUtils.substringAfter(originalId, "::"); final String rest = StringUtils.substringAfter(originalId, "::");
return String.format("%s|%s::%s", prefix, nsPrefix, DHPUtils.md5(rest)); return String.format("%s|%s::%s", prefix, nsPrefix, IdentifierFactory.md5(rest));
} else { } else {
return String.format("%s|%s", prefix, originalId); return String.format("%s|%s", prefix, originalId);
} }
@ -268,12 +340,12 @@ public class OafMapperUtils {
protected static Qualifier getBestAccessRights(final List<Instance> instanceList) { protected static Qualifier getBestAccessRights(final List<Instance> instanceList) {
if (instanceList != null) { if (instanceList != null) {
final Optional<Qualifier> min = instanceList final Optional<AccessRight> min = instanceList
.stream() .stream()
.map(i -> i.getAccessright()) .map(i -> i.getAccessright())
.min(new LicenseComparator()); .min(new AccessRightComparator<>());
final Qualifier rights = min.isPresent() ? min.get() : new Qualifier(); final Qualifier rights = min.isPresent() ? qualifier(min.get()) : new Qualifier();
if (StringUtils.isBlank(rights.getClassid())) { if (StringUtils.isBlank(rights.getClassid())) {
rights.setClassid(UNKNOWN); rights.setClassid(UNKNOWN);
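An illustrative use of the helpers added by this change; the classid/classname values are examples and OpenAccessRoute.gold is assumed to be one of the enum constants:
Qualifier openAccess = OafMapperUtils.qualifier("OPEN", "Open Access",
ModelConstants.DNET_ACCESS_MODES, ModelConstants.DNET_ACCESS_MODES);
AccessRight accessRight = OafMapperUtils.accessRight("OPEN", "Open Access",
ModelConstants.DNET_ACCESS_MODES, ModelConstants.DNET_ACCESS_MODES, OpenAccessRoute.gold);
// journal(..) now returns null when neither a printed, online nor linking ISSN is given
Journal journal = OafMapperUtils.journal("A Journal Without ISSNs", null, null, null, null);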

View File

@ -1,27 +0,0 @@
package eu.dnetlib.dhp.schema.oaf.utils;
import java.util.Comparator;
public class OrganizationPidComparator implements Comparator<PidType> {
@Override
public int compare(PidType pLeft, PidType pRight) {
if (pLeft.equals(PidType.GRID))
return -1;
if (pRight.equals(PidType.GRID))
return 1;
if (pLeft.equals(PidType.mag_id))
return -1;
if (pRight.equals(PidType.mag_id))
return 1;
if (pLeft.equals(PidType.urn))
return -1;
if (pRight.equals(PidType.urn))
return 1;
return 0;
}
}

View File

@ -1,54 +0,0 @@
package eu.dnetlib.dhp.schema.oaf.utils;
import java.util.Comparator;
import eu.dnetlib.dhp.schema.common.ModelSupport;
import eu.dnetlib.dhp.schema.oaf.OafEntity;
import eu.dnetlib.dhp.schema.oaf.Organization;
import eu.dnetlib.dhp.schema.oaf.Result;
import eu.dnetlib.dhp.schema.oaf.StructuredProperty;
public class PidComparator<T extends OafEntity> implements Comparator<StructuredProperty> {
private T entity;
public PidComparator(T entity) {
this.entity = entity;
}
@Override
public int compare(StructuredProperty left, StructuredProperty right) {
if (left == null && right == null)
return 0;
if (left == null)
return 1;
if (right == null)
return -1;
PidType lClass = PidType.valueOf(left.getQualifier().getClassid());
PidType rClass = PidType.valueOf(right.getQualifier().getClassid());
if (lClass.equals(rClass))
return 0;
if (ModelSupport.isSubClass(entity, Result.class)) {
return compareResultPids(lClass, rClass);
}
if (ModelSupport.isSubClass(entity, Organization.class)) {
return compareOrganizationtPids(lClass, rClass);
}
// Else (but unlikely), lexicographical ordering will do.
return lClass.compareTo(rClass);
}
private int compareResultPids(PidType lClass, PidType rClass) {
return new ResultPidComparator().compare(lClass, rClass);
}
private int compareOrganizationtPids(PidType lClass, PidType rClass) {
return new OrganizationPidComparator().compare(lClass, rClass);
}
}

View File

@ -1,29 +0,0 @@
package eu.dnetlib.dhp.schema.oaf.utils;
import org.apache.commons.lang3.EnumUtils;
public enum PidType {
// Result
doi, pmid, pmc, handle, arXiv, NCID, GBIF, nct, pdb,
// Organization
GRID, mag_id, urn,
// Used by dedup
undefined, original;
public static boolean isValid(String type) {
return EnumUtils.isValidEnum(PidType.class, type);
}
public static PidType tryValueOf(String s) {
try {
return PidType.valueOf(s);
} catch (Exception e) {
return PidType.original;
}
}
}
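A quick sketch of the fallback behaviour above; the input strings are illustrative:
PidType t1 = PidType.tryValueOf("doi"); // -> PidType.doi
PidType t2 = PidType.tryValueOf("fake-type"); // -> PidType.original (fallback)
boolean valid = PidType.isValid("pmid"); // -> true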

View File

@ -1,57 +0,0 @@
package eu.dnetlib.dhp.schema.oaf.utils;
import java.util.Comparator;
public class ResultPidComparator implements Comparator<PidType> {
@Override
public int compare(PidType pLeft, PidType pRight) {
if (pLeft.equals(PidType.doi))
return -1;
if (pRight.equals(PidType.doi))
return 1;
if (pLeft.equals(PidType.pmid))
return -1;
if (pRight.equals(PidType.pmid))
return 1;
if (pLeft.equals(PidType.pmc))
return -1;
if (pRight.equals(PidType.pmc))
return 1;
if (pLeft.equals(PidType.handle))
return -1;
if (pRight.equals(PidType.handle))
return 1;
if (pLeft.equals(PidType.arXiv))
return -1;
if (pRight.equals(PidType.arXiv))
return 1;
if (pLeft.equals(PidType.NCID))
return -1;
if (pRight.equals(PidType.NCID))
return 1;
if (pLeft.equals(PidType.GBIF))
return -1;
if (pRight.equals(PidType.GBIF))
return 1;
if (pLeft.equals(PidType.nct))
return -1;
if (pRight.equals(PidType.nct))
return 1;
if (pLeft.equals(PidType.urn))
return -1;
if (pRight.equals(PidType.urn))
return 1;
return 0;
}
}

View File

@ -1,23 +1,43 @@
package eu.dnetlib.dhp.utils; package eu.dnetlib.dhp.utils;
import java.io.ByteArrayInputStream; import java.io.*;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets; import java.nio.charset.StandardCharsets;
import java.security.MessageDigest; import java.security.MessageDigest;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.zip.GZIPInputStream; import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream; import java.util.zip.GZIPOutputStream;
import org.apache.commons.codec.binary.Base64; import org.apache.commons.codec.binary.Base64;
import org.apache.commons.codec.binary.Base64OutputStream; import org.apache.commons.codec.binary.Base64OutputStream;
import org.apache.commons.codec.binary.Hex; import org.apache.commons.codec.binary.Hex;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SaveMode;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.common.collect.Maps;
import com.jayway.jsonpath.JsonPath; import com.jayway.jsonpath.JsonPath;
import net.minidev.json.JSONArray; import net.minidev.json.JSONArray;
import scala.collection.JavaConverters;
import scala.collection.Seq;
public class DHPUtils { public class DHPUtils {
private static final Logger log = LoggerFactory.getLogger(DHPUtils.class);
public static Seq<String> toSeq(List<String> list) {
return JavaConverters.asScalaIteratorConverter(list.iterator()).asScala().toSeq();
}
public static String md5(final String s) { public static String md5(final String s) {
try { try {
final MessageDigest md = MessageDigest.getInstance("MD5"); final MessageDigest md = MessageDigest.getInstance("MD5");
@ -72,4 +92,72 @@ public class DHPUtils {
return ""; return "";
} }
} }
public static final ObjectMapper MAPPER = new ObjectMapper();
public static void writeHdfsFile(final Configuration conf, final String content, final String path)
throws IOException {
log.info("writing file {}, size {}", path, content.length());
try (FileSystem fs = FileSystem.get(conf);
BufferedOutputStream os = new BufferedOutputStream(fs.create(new Path(path)))) {
os.write(content.getBytes(StandardCharsets.UTF_8));
os.flush();
}
}
public static String readHdfsFile(Configuration conf, String path) throws IOException {
log.info("reading file {}", path);
try (FileSystem fs = FileSystem.get(conf)) {
final Path p = new Path(path);
if (!fs.exists(p)) {
throw new FileNotFoundException(path);
}
return IOUtils.toString(fs.open(p));
}
}
public static <T> T readHdfsFileAs(Configuration conf, String path, Class<T> clazz) throws IOException {
return MAPPER.readValue(readHdfsFile(conf, path), clazz);
}
public static <T> void saveDataset(final Dataset<T> mdstore, final String targetPath) {
log.info("saving dataset in: {}", targetPath);
mdstore
.write()
.mode(SaveMode.Overwrite)
.format("parquet")
.save(targetPath);
}
public static Configuration getHadoopConfiguration(String nameNode) {
// ====== Init HDFS File System Object
Configuration conf = new Configuration();
// Set FileSystem URI
conf.set("fs.defaultFS", nameNode);
// Because of Maven
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
System.setProperty("hadoop.home.dir", "/");
return conf;
}
public static void populateOOZIEEnv(final Map<String, String> report) throws IOException {
File file = new File(System.getProperty("oozie.action.output.properties"));
Properties props = new Properties();
report.forEach((k, v) -> props.setProperty(k, v));
try (OutputStream os = new FileOutputStream(file)) {
props.store(os, "");
}
}
public static void populateOOZIEEnv(final String paramName, String value) throws IOException {
Map<String, String> report = Maps.newHashMap();
report.put(paramName, value);
populateOOZIEEnv(report);
}
} }
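A sketch of the HDFS helpers added above; the name node URL and paths are placeholders, and the calls may throw IOException:
Configuration conf = DHPUtils.getHadoopConfiguration("hdfs://nameservice1");
DHPUtils.writeHdfsFile(conf, "{\"records\":42}", "/tmp/example/report.json");
String content = DHPUtils.readHdfsFile(conf, "/tmp/example/report.json");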

View File

@ -1,11 +1,11 @@
package eu.dnetlib.dhp.utils; package eu.dnetlib.dhp.utils;
import java.util.Map; import org.apache.cxf.endpoint.Client;
import org.apache.cxf.frontend.ClientProxy;
import javax.xml.ws.BindingProvider;
import org.apache.cxf.jaxws.JaxWsProxyFactoryBean; import org.apache.cxf.jaxws.JaxWsProxyFactoryBean;
import org.apache.cxf.transport.http.HTTPConduit;
import org.apache.cxf.transports.http.configuration.HTTPClientPolicy;
import org.slf4j.Logger; import org.slf4j.Logger;
import org.slf4j.LoggerFactory; import org.slf4j.LoggerFactory;
@ -15,8 +15,8 @@ public class ISLookupClientFactory {
private static final Logger log = LoggerFactory.getLogger(ISLookupClientFactory.class); private static final Logger log = LoggerFactory.getLogger(ISLookupClientFactory.class);
private static int requestTimeout = 60000 * 10; private static final int requestTimeout = 60000 * 10;
private static int connectTimeout = 60000 * 10; private static final int connectTimeout = 60000 * 10;
public static ISLookUpService getLookUpService(final String isLookupUrl) { public static ISLookUpService getLookUpService(final String isLookupUrl) {
return getServiceStub(ISLookUpService.class, isLookupUrl); return getServiceStub(ISLookUpService.class, isLookupUrl);
@ -31,20 +31,23 @@ public class ISLookupClientFactory {
final T service = (T) jaxWsProxyFactory.create(); final T service = (T) jaxWsProxyFactory.create();
if (service instanceof BindingProvider) { Client client = ClientProxy.getClient(service);
if (client != null) {
HTTPConduit conduit = (HTTPConduit) client.getConduit();
HTTPClientPolicy policy = new HTTPClientPolicy();
log log
.info( .info(
"setting timeouts for {} to requestTimeout: {}, connectTimeout: {}", String
BindingProvider.class.getName(), requestTimeout, connectTimeout); .format(
"setting connectTimeout to %s, requestTimeout to %s for service %s",
connectTimeout,
requestTimeout,
clazz.getCanonicalName()));
Map<String, Object> requestContext = ((BindingProvider) service).getRequestContext(); policy.setConnectionTimeout(connectTimeout);
policy.setReceiveTimeout(requestTimeout);
requestContext.put("com.sun.xml.internal.ws.request.timeout", requestTimeout); conduit.setClient(policy);
requestContext.put("com.sun.xml.internal.ws.connect.timeout", connectTimeout);
requestContext.put("com.sun.xml.ws.request.timeout", requestTimeout);
requestContext.put("com.sun.xml.ws.connect.timeout", connectTimeout);
requestContext.put("javax.xml.ws.client.receiveTimeout", requestTimeout);
requestContext.put("javax.xml.ws.client.connectionTimeout", connectTimeout);
} }
return service; return service;

View File

@ -1,76 +0,0 @@
package eu.dnetlib.message;
import java.io.IOException;
import java.util.Map;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
public class Message {
private String workflowId;
private String jobName;
private MessageType type;
private Map<String, String> body;
public static Message fromJson(final String json) throws IOException {
final ObjectMapper jsonMapper = new ObjectMapper();
return jsonMapper.readValue(json, Message.class);
}
public Message() {
}
public Message(String workflowId, String jobName, MessageType type, Map<String, String> body) {
this.workflowId = workflowId;
this.jobName = jobName;
this.type = type;
this.body = body;
}
public String getWorkflowId() {
return workflowId;
}
public void setWorkflowId(String workflowId) {
this.workflowId = workflowId;
}
public String getJobName() {
return jobName;
}
public void setJobName(String jobName) {
this.jobName = jobName;
}
public MessageType getType() {
return type;
}
public void setType(MessageType type) {
this.type = type;
}
public Map<String, String> getBody() {
return body;
}
public void setBody(Map<String, String> body) {
this.body = body;
}
@Override
public String toString() {
final ObjectMapper jsonMapper = new ObjectMapper();
try {
return jsonMapper.writeValueAsString(this);
} catch (JsonProcessingException e) {
return null;
}
}
}

View File

@ -1,47 +0,0 @@
package eu.dnetlib.message;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.LinkedBlockingQueue;
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;
public class MessageConsumer extends DefaultConsumer {
final LinkedBlockingQueue<Message> queueMessages;
/**
* Constructs a new instance and records its association to the passed-in channel.
*
* @param channel the channel to which this consumer is attached
* @param queueMessages
*/
public MessageConsumer(Channel channel, LinkedBlockingQueue<Message> queueMessages) {
super(channel);
this.queueMessages = queueMessages;
}
@Override
public void handleDelivery(
String consumerTag, Envelope envelope, AMQP.BasicProperties properties, byte[] body)
throws IOException {
final String json = new String(body, StandardCharsets.UTF_8);
Message message = Message.fromJson(json);
try {
this.queueMessages.put(message);
System.out.println("Receiving Message " + message);
} catch (InterruptedException e) {
if (message.getType() == MessageType.REPORT)
throw new RuntimeException("Error on sending message");
else {
// TODO LOGGING EXCEPTION
}
} finally {
getChannel().basicAck(envelope.getDeliveryTag(), false);
}
}
}

View File

@ -1,136 +0,0 @@
package eu.dnetlib.message;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeoutException;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
public class MessageManager {
private final String messageHost;
private final String username;
private final String password;
private Connection connection;
private final Map<String, Channel> channels = new HashMap<>();
private boolean durable;
private boolean autodelete;
private final LinkedBlockingQueue<Message> queueMessages;
public MessageManager(
String messageHost,
String username,
String password,
final LinkedBlockingQueue<Message> queueMessages) {
this.queueMessages = queueMessages;
this.messageHost = messageHost;
this.username = username;
this.password = password;
}
public MessageManager(
String messageHost,
String username,
String password,
boolean durable,
boolean autodelete,
final LinkedBlockingQueue<Message> queueMessages) {
this.queueMessages = queueMessages;
this.messageHost = messageHost;
this.username = username;
this.password = password;
this.durable = durable;
this.autodelete = autodelete;
}
private Connection createConnection() throws IOException, TimeoutException {
ConnectionFactory factory = new ConnectionFactory();
factory.setHost(this.messageHost);
factory.setUsername(this.username);
factory.setPassword(this.password);
return factory.newConnection();
}
private Channel createChannel(
final Connection connection,
final String queueName,
final boolean durable,
final boolean autodelete)
throws Exception {
Map<String, Object> args = new HashMap<>();
args.put("x-message-ttl", 10000);
Channel channel = connection.createChannel();
channel.queueDeclare(queueName, durable, false, this.autodelete, args);
return channel;
}
private Channel getOrCreateChannel(final String queueName, boolean durable, boolean autodelete)
throws Exception {
if (channels.containsKey(queueName)) {
return channels.get(queueName);
}
if (this.connection == null) {
this.connection = createConnection();
}
channels.put(queueName, createChannel(this.connection, queueName, durable, autodelete));
return channels.get(queueName);
}
public void close() throws IOException {
channels
.values()
.forEach(
ch -> {
try {
ch.close();
} catch (Exception e) {
// TODO LOG
}
});
this.connection.close();
}
public boolean sendMessage(final Message message, String queueName) throws Exception {
try {
Channel channel = getOrCreateChannel(queueName, this.durable, this.autodelete);
channel.basicPublish("", queueName, null, message.toString().getBytes());
return true;
} catch (Throwable e) {
throw new RuntimeException(e);
}
}
public boolean sendMessage(
final Message message, String queueName, boolean durable_var, boolean autodelete_var)
throws Exception {
try {
Channel channel = getOrCreateChannel(queueName, durable_var, autodelete_var);
channel.basicPublish("", queueName, null, message.toString().getBytes());
return true;
} catch (Throwable e) {
throw new RuntimeException(e);
}
}
public void startConsumingMessage(
final String queueName, final boolean durable, final boolean autodelete) throws Exception {
Channel channel = createChannel(createConnection(), queueName, durable, autodelete);
channel.basicConsume(queueName, false, new MessageConsumer(channel, queueMessages));
}
}

View File

@ -1,6 +0,0 @@
package eu.dnetlib.message;
public enum MessageType {
ONGOING, REPORT
}

File diff suppressed because one or more lines are too long

View File

@ -19,6 +19,30 @@ public class ZenodoAPIClientTest {
private final String CONCEPT_REC_ID = "657113"; private final String CONCEPT_REC_ID = "657113";
private final String depositionId = "674915";
@Test
public void testUploadOldDeposition() throws IOException, MissingConceptDoiException {
ZenodoAPIClient client = new ZenodoAPIClient(URL_STRING,
ACCESS_TOKEN);
Assertions.assertEquals(200, client.uploadOpenDeposition(depositionId));
File file = new File(getClass()
.getResource("/eu/dnetlib/dhp/common/api/COVID-19.json.gz")
.getPath());
InputStream is = new FileInputStream(file);
Assertions.assertEquals(200, client.uploadIS(is, "COVID-19.json.gz", file.length()));
String metadata = IOUtils.toString(getClass().getResourceAsStream("/eu/dnetlib/dhp/common/api/metadata.json"));
Assertions.assertEquals(200, client.sendMretadata(metadata));
Assertions.assertEquals(202, client.publish());
}
@Test @Test
public void testNewDeposition() throws IOException { public void testNewDeposition() throws IOException {

View File

@ -1,16 +0,0 @@
package eu.dnetlib.dhp.model.mdstore;
import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;
public class MetadataRecordTest {
@Test
public void getTimestamp() {
MetadataRecord r = new MetadataRecord();
assertTrue(r.getDateOfCollection() > 0);
}
}

View File

@ -0,0 +1,100 @@
package eu.dnetlib.dhp.oa.merge;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import com.fasterxml.jackson.databind.ObjectMapper;
import eu.dnetlib.dhp.schema.oaf.Author;
import eu.dnetlib.dhp.schema.oaf.Publication;
import eu.dnetlib.dhp.schema.oaf.StructuredProperty;
import eu.dnetlib.pace.util.MapDocumentUtil;
import scala.Tuple2;
public class AuthorMergerTest {
private String publicationsBasePath;
private List<List<Author>> authors;
@BeforeEach
public void setUp() throws Exception {
publicationsBasePath = Paths
.get(AuthorMergerTest.class.getResource("/eu/dnetlib/dhp/oa/merge").toURI())
.toFile()
.getAbsolutePath();
authors = readSample(publicationsBasePath + "/publications_with_authors.json", Publication.class)
.stream()
.map(p -> p._2().getAuthor())
.collect(Collectors.toList());
}
@Test
public void mergeTest() { // used in the dedup: threshold set to 0.95
for (List<Author> authors1 : authors) {
System.out.println("List " + (authors.indexOf(authors1) + 1));
for (Author author : authors1) {
System.out.println(authorToString(author));
}
}
List<Author> merge = AuthorMerger.merge(authors);
System.out.println("Merge ");
for (Author author : merge) {
System.out.println(authorToString(author));
}
Assertions.assertEquals(7, merge.size());
}
public <T> List<Tuple2<String, T>> readSample(String path, Class<T> clazz) {
List<Tuple2<String, T>> res = new ArrayList<>();
BufferedReader reader;
try {
reader = new BufferedReader(new FileReader(path));
String line = reader.readLine();
while (line != null) {
res
.add(
new Tuple2<>(
MapDocumentUtil.getJPathString("$.id", line),
new ObjectMapper().readValue(line, clazz)));
// read next line
line = reader.readLine();
}
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
return res;
}
public String authorToString(Author a) {
String print = "Fullname = ";
print += a.getFullname() + " pid = [";
if (a.getPid() != null)
for (StructuredProperty sp : a.getPid()) {
print += sp.toComparableString() + " ";
}
print += "]";
return print;
}
}

View File

@ -1,47 +0,0 @@
package eu.dnetlib.dhp.schema.oaf.utils;
import static org.junit.jupiter.api.Assertions.*;
import java.io.IOException;
import org.apache.commons.io.IOUtils;
import org.junit.jupiter.api.Test;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import eu.dnetlib.dhp.schema.oaf.Publication;
import eu.dnetlib.dhp.utils.DHPUtils;
public class IdentifierFactoryTest {
private static ObjectMapper OBJECT_MAPPER = new ObjectMapper()
.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
@Test
public void testCreateIdentifierForPublication() throws IOException {
verifyIdentifier("publication_doi.json", "50|doi_________::" + DHPUtils.md5("10.1016/j.cmet.2011.03.013"));
verifyIdentifier("publication_pmc.json", "50|pmc_________::" + DHPUtils.md5("21459329"));
verifyIdentifier(
"publication_urn.json",
"50|urn_________::" + DHPUtils.md5("urn:nbn:nl:ui:29-f3ed5f9e-edf6-457e-8848-61b58a4075e2"));
final String defaultID = "50|DansKnawCris::0829b5191605bdbea36d6502b8c1ce1f";
verifyIdentifier("publication_3.json", defaultID);
verifyIdentifier("publication_4.json", defaultID);
verifyIdentifier("publication_5.json", defaultID);
}
protected void verifyIdentifier(String filename, String expectedID) throws IOException {
final String json = IOUtils.toString(getClass().getResourceAsStream(filename));
final Publication pub = OBJECT_MAPPER.readValue(json, Publication.class);
String id = IdentifierFactory.createIdentifier(pub);
assertNotNull(id);
assertEquals(expectedID, id);
}
}
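
As a reference for the expected values asserted above, the identifiers follow the pattern prefix|pidType right-padded to 12 characters with '_'::md5(pid value). A minimal sketch of how such an id could be recomputed, assuming DHPUtils.md5 is a plain lowercase hex MD5 digest (the class and helper names here are illustrative):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class IdentifierSketch {
	// Hypothetical stand-in for DHPUtils.md5: lowercase hex MD5 of the input string.
	static String md5(String s) throws Exception {
		MessageDigest md = MessageDigest.getInstance("MD5");
		StringBuilder sb = new StringBuilder();
		for (byte b : md.digest(s.getBytes(StandardCharsets.UTF_8)))
			sb.append(String.format("%02x", b));
		return sb.toString();
	}

	public static void main(String[] args) throws Exception {
		// "doi" right-padded with '_' to 12 characters, as in "50|doi_________::<md5>"
		String ns = String.format("%-12s", "doi").replace(' ', '_');
		System.out.println("50|" + ns + "::" + md5("10.1016/j.cmet.2011.03.013"));
	}
}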


@ -0,0 +1,180 @@
package eu.dnetlib.dhp.schema.oaf.utils;
import static org.junit.jupiter.api.Assertions.*;
import java.io.IOException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.stream.Collectors;
import org.apache.commons.io.IOUtils;
import org.junit.jupiter.api.Test;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import eu.dnetlib.dhp.schema.common.ModelConstants;
import eu.dnetlib.dhp.schema.oaf.*;
public class OafMapperUtilsTest {
private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper()
.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
@Test
public void testDateValidation() {
assertTrue(GraphCleaningFunctions.doCleanDate("2016-05-07T12:41:19.202Z ").isPresent());
assertTrue(GraphCleaningFunctions.doCleanDate("2020-09-10 11:08:52 ").isPresent());
assertTrue(GraphCleaningFunctions.doCleanDate(" 2016-04-05").isPresent());
assertEquals("2016-04-05", GraphCleaningFunctions.doCleanDate("2016 Apr 05").get());
assertEquals("2009-05-08", GraphCleaningFunctions.doCleanDate("May 8, 2009 5:57:51 PM").get());
assertEquals("1970-10-07", GraphCleaningFunctions.doCleanDate("oct 7, 1970").get());
assertEquals("1970-10-07", GraphCleaningFunctions.doCleanDate("oct 7, '70").get());
assertEquals("1970-10-07", GraphCleaningFunctions.doCleanDate("oct. 7, 1970").get());
assertEquals("1970-10-07", GraphCleaningFunctions.doCleanDate("oct. 7, 70").get());
assertEquals("2006-01-02", GraphCleaningFunctions.doCleanDate("Mon Jan 2 15:04:05 2006").get());
assertEquals("2006-01-02", GraphCleaningFunctions.doCleanDate("Mon Jan 2 15:04:05 MST 2006").get());
assertEquals("2006-01-02", GraphCleaningFunctions.doCleanDate("Mon Jan 02 15:04:05 -0700 2006").get());
assertEquals("2006-01-02", GraphCleaningFunctions.doCleanDate("Monday, 02-Jan-06 15:04:05 MST").get());
assertEquals("2006-01-02", GraphCleaningFunctions.doCleanDate("Mon, 02 Jan 2006 15:04:05 MST").get());
assertEquals("2017-07-11", GraphCleaningFunctions.doCleanDate("Tue, 11 Jul 2017 16:28:13 +0200 (CEST)").get());
assertEquals("2006-01-02", GraphCleaningFunctions.doCleanDate("Mon, 02 Jan 2006 15:04:05 -0700").get());
assertEquals("2018-01-04", GraphCleaningFunctions.doCleanDate("Thu, 4 Jan 2018 17:53:36 +0000").get());
assertEquals("2015-08-10", GraphCleaningFunctions.doCleanDate("Mon Aug 10 15:44:11 UTC+0100 2015").get());
assertEquals(
"2015-07-03",
GraphCleaningFunctions.doCleanDate("Fri Jul 03 2015 18:04:07 GMT+0100 (GMT Daylight Time)").get());
assertEquals("2012-09-17", GraphCleaningFunctions.doCleanDate("September 17, 2012 10:09am").get());
assertEquals("2012-09-17", GraphCleaningFunctions.doCleanDate("September 17, 2012 at 10:09am PST-08").get());
assertEquals("2012-09-17", GraphCleaningFunctions.doCleanDate("September 17, 2012, 10:10:09").get());
assertEquals("1970-10-07", GraphCleaningFunctions.doCleanDate("October 7, 1970").get());
assertEquals("1970-10-07", GraphCleaningFunctions.doCleanDate("October 7th, 1970").get());
assertEquals("2006-02-12", GraphCleaningFunctions.doCleanDate("12 Feb 2006, 19:17").get());
assertEquals("2006-02-12", GraphCleaningFunctions.doCleanDate("12 Feb 2006 19:17").get());
assertEquals("1970-10-07", GraphCleaningFunctions.doCleanDate("7 oct 70").get());
assertEquals("1970-10-07", GraphCleaningFunctions.doCleanDate("7 oct 1970").get());
assertEquals("2013-02-03", GraphCleaningFunctions.doCleanDate("03 February 2013").get());
assertEquals("2013-07-01", GraphCleaningFunctions.doCleanDate("1 July 2013").get());
assertEquals("2013-02-03", GraphCleaningFunctions.doCleanDate("2013-Feb-03").get());
assertEquals("2014-03-31", GraphCleaningFunctions.doCleanDate("3/31/2014").get());
assertEquals("2014-03-31", GraphCleaningFunctions.doCleanDate("03/31/2014").get());
assertEquals("1971-08-21", GraphCleaningFunctions.doCleanDate("08/21/71").get());
assertEquals("1971-01-08", GraphCleaningFunctions.doCleanDate("8/1/71").get());
assertEquals("2014-08-04", GraphCleaningFunctions.doCleanDate("4/8/2014 22:05").get());
assertEquals("2014-08-04", GraphCleaningFunctions.doCleanDate("04/08/2014 22:05").get());
assertEquals("2014-08-04", GraphCleaningFunctions.doCleanDate("4/8/14 22:05").get());
assertEquals("2014-02-04", GraphCleaningFunctions.doCleanDate("04/2/2014 03:00:51").get());
assertEquals("1965-08-08", GraphCleaningFunctions.doCleanDate("8/8/1965 12:00:00 AM").get());
assertEquals("1965-08-08", GraphCleaningFunctions.doCleanDate("8/8/1965 01:00:01 PM").get());
assertEquals("1965-08-08", GraphCleaningFunctions.doCleanDate("8/8/1965 01:00 PM").get());
assertEquals("1965-08-08", GraphCleaningFunctions.doCleanDate("8/8/1965 1:00 PM").get());
assertEquals("1965-08-08", GraphCleaningFunctions.doCleanDate("8/8/1965 12:00 AM").get());
assertEquals("2014-02-04", GraphCleaningFunctions.doCleanDate("4/02/2014 03:00:51").get());
assertEquals("2012-03-19", GraphCleaningFunctions.doCleanDate("03/19/2012 10:11:59").get());
assertEquals("2012-03-19", GraphCleaningFunctions.doCleanDate("03/19/2012 10:11:59.3186369").get());
assertEquals("2014-03-31", GraphCleaningFunctions.doCleanDate("2014/3/31").get());
assertEquals("2014-03-31", GraphCleaningFunctions.doCleanDate("2014/03/31").get());
assertEquals("2014-04-08", GraphCleaningFunctions.doCleanDate("2014/4/8 22:05").get());
assertEquals("2014-04-08", GraphCleaningFunctions.doCleanDate("2014/04/08 22:05").get());
assertEquals("2014-04-02", GraphCleaningFunctions.doCleanDate("2014/04/2 03:00:51").get());
assertEquals("2014-04-02", GraphCleaningFunctions.doCleanDate("2014/4/02 03:00:51").get());
assertEquals("2012-03-19", GraphCleaningFunctions.doCleanDate("2012/03/19 10:11:59").get());
assertEquals("2012-03-19", GraphCleaningFunctions.doCleanDate("2012/03/19 10:11:59.3186369").get());
assertEquals("2014-04-08", GraphCleaningFunctions.doCleanDate("2014年04月08日").get());
assertEquals("2006-01-02", GraphCleaningFunctions.doCleanDate("2006-01-02T15:04:05+0000").get());
assertEquals("2009-08-13", GraphCleaningFunctions.doCleanDate("2009-08-12T22:15:09-07:00").get());
assertEquals("2009-08-12", GraphCleaningFunctions.doCleanDate("2009-08-12T22:15:09").get());
assertEquals("2009-08-12", GraphCleaningFunctions.doCleanDate("2009-08-12T22:15:09Z").get());
assertEquals("2014-04-26", GraphCleaningFunctions.doCleanDate("2014-04-26 17:24:37.3186369").get());
assertEquals("2012-08-03", GraphCleaningFunctions.doCleanDate("2012-08-03 18:31:59.257000000").get());
assertEquals("2014-04-26", GraphCleaningFunctions.doCleanDate("2014-04-26 17:24:37.123").get());
assertEquals("2013-04-01", GraphCleaningFunctions.doCleanDate("2013-04-01 22:43").get());
assertEquals("2013-04-01", GraphCleaningFunctions.doCleanDate("2013-04-01 22:43:22").get());
assertEquals("2014-12-16", GraphCleaningFunctions.doCleanDate("2014-12-16 06:20:00 UTC").get());
assertEquals("2014-12-16", GraphCleaningFunctions.doCleanDate("2014-12-16 06:20:00 GMT").get());
assertEquals("2014-04-26", GraphCleaningFunctions.doCleanDate("2014-04-26 05:24:37 PM").get());
assertEquals("2014-04-26", GraphCleaningFunctions.doCleanDate("2014-04-26 13:13:43 +0800").get());
assertEquals("2014-04-26", GraphCleaningFunctions.doCleanDate("2014-04-26 13:13:43 +0800 +08").get());
assertEquals("2014-04-26", GraphCleaningFunctions.doCleanDate("2014-04-26 13:13:44 +09:00").get());
assertEquals("2012-08-03", GraphCleaningFunctions.doCleanDate("2012-08-03 18:31:59.257000000 +0000 UTC").get());
assertEquals("2015-09-30", GraphCleaningFunctions.doCleanDate("2015-09-30 18:48:56.35272715 +0000 UTC").get());
assertEquals("2015-02-18", GraphCleaningFunctions.doCleanDate("2015-02-18 00:12:00 +0000 GMT").get());
assertEquals("2015-02-18", GraphCleaningFunctions.doCleanDate("2015-02-18 00:12:00 +0000 UTC").get());
assertEquals(
"2015-02-08", GraphCleaningFunctions.doCleanDate("2015-02-08 03:02:00 +0300 MSK m=+0.000000001").get());
assertEquals(
"2015-02-08", GraphCleaningFunctions.doCleanDate("2015-02-08 03:02:00.001 +0300 MSK m=+0.000000001").get());
assertEquals("2017-07-19", GraphCleaningFunctions.doCleanDate("2017-07-19 03:21:51+00:00").get());
assertEquals("2014-04-26", GraphCleaningFunctions.doCleanDate("2014-04-26").get());
assertEquals("2014-04-01", GraphCleaningFunctions.doCleanDate("2014-04").get());
assertEquals("2014-01-01", GraphCleaningFunctions.doCleanDate("2014").get());
assertEquals("2014-05-11", GraphCleaningFunctions.doCleanDate("2014-05-11 08:20:13,787").get());
assertEquals("2014-03-31", GraphCleaningFunctions.doCleanDate("3.31.2014").get());
assertEquals("2014-03-31", GraphCleaningFunctions.doCleanDate("03.31.2014").get());
assertEquals("1971-08-21", GraphCleaningFunctions.doCleanDate("08.21.71").get());
assertEquals("2014-03-01", GraphCleaningFunctions.doCleanDate("2014.03").get());
assertEquals("2014-03-30", GraphCleaningFunctions.doCleanDate("2014.03.30").get());
assertEquals("2014-06-01", GraphCleaningFunctions.doCleanDate("20140601").get());
assertEquals("2014-07-22", GraphCleaningFunctions.doCleanDate("20140722105203").get());
assertEquals("2012-03-19", GraphCleaningFunctions.doCleanDate("1332151919").get());
assertEquals("2013-11-12", GraphCleaningFunctions.doCleanDate("1384216367189").get());
assertEquals("2013-11-12", GraphCleaningFunctions.doCleanDate("1384216367111222").get());
assertEquals("2013-11-12", GraphCleaningFunctions.doCleanDate("1384216367111222333").get());
}
@Test
public void testDate() {
System.out.println(GraphCleaningFunctions.cleanDate("23-FEB-1998"));
}
@Test
public void testMergePubs() throws IOException {
Publication p1 = read("publication_1.json", Publication.class);
Publication p2 = read("publication_2.json", Publication.class);
Dataset d1 = read("dataset_1.json", Dataset.class);
Dataset d2 = read("dataset_2.json", Dataset.class);
assertEquals(p1.getCollectedfrom().size(), 1);
assertEquals(p1.getCollectedfrom().get(0).getKey(), ModelConstants.CROSSREF_ID);
assertEquals(d2.getCollectedfrom().size(), 1);
assertFalse(cfId(d2.getCollectedfrom()).contains(ModelConstants.CROSSREF_ID));
assertTrue(
OafMapperUtils
.mergeResults(p1, d2)
.getResulttype()
.getClassid()
.equals(ModelConstants.PUBLICATION_RESULTTYPE_CLASSID));
assertEquals(p2.getCollectedfrom().size(), 1);
assertFalse(cfId(p2.getCollectedfrom()).contains(ModelConstants.CROSSREF_ID));
assertEquals(d1.getCollectedfrom().size(), 1);
assertTrue(cfId(d1.getCollectedfrom()).contains(ModelConstants.CROSSREF_ID));
assertTrue(
OafMapperUtils
.mergeResults(p2, d1)
.getResulttype()
.getClassid()
.equals(ModelConstants.DATASET_RESULTTYPE_CLASSID));
}
protected HashSet<String> cfId(List<KeyValue> collectedfrom) {
return collectedfrom.stream().map(c -> c.getKey()).collect(Collectors.toCollection(HashSet::new));
}
protected <T extends Result> T read(String filename, Class<T> clazz) throws IOException {
final String json = IOUtils.toString(getClass().getResourceAsStream(filename));
return OBJECT_MAPPER.readValue(json, clazz);
}
}
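
The assertions above all rely on GraphCleaningFunctions.doCleanDate returning an Optional<String> with the date normalized to the yyyy-MM-dd form. A minimal usage sketch (the caller shown here is hypothetical; it assumes GraphCleaningFunctions lives in eu.dnetlib.dhp.schema.oaf.utils, as the test's package suggests):

import java.util.Optional;

import eu.dnetlib.dhp.schema.oaf.utils.GraphCleaningFunctions;

public class DateCleaningExample {
	public static void main(String[] args) {
		Optional<String> cleaned = GraphCleaningFunctions.doCleanDate("Mon Jan 2 15:04:05 2006");
		// Empty when the free-text date cannot be parsed, so callers can fall back safely.
		System.out.println(cleaned.orElse("unknown")); // prints 2006-01-02
	}
}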


@ -1,51 +0,0 @@
package eu.dnetlib.message;
import static org.junit.jupiter.api.Assertions.*;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.junit.jupiter.api.Test;
public class MessageTest {
@Test
public void fromJsonTest() throws IOException {
Message m = new Message();
m.setWorkflowId("wId");
m.setType(MessageType.ONGOING);
m.setJobName("Collection");
Map<String, String> body = new HashMap<>();
body.put("parsedItem", "300");
body.put("ExecutionTime", "30s");
m.setBody(body);
System.out.println("m = " + m);
Message m1 = Message.fromJson(m.toString());
assertEquals(m1.getWorkflowId(), m.getWorkflowId());
assertEquals(m1.getType(), m.getType());
assertEquals(m1.getJobName(), m.getJobName());
assertNotNull(m1.getBody());
m1.getBody().keySet().forEach(it -> assertEquals(m1.getBody().get(it), m.getBody().get(it)));
assertEquals(m1.getJobName(), m.getJobName());
}
@Test
public void toStringTest() {
final String expectedJson = "{\"workflowId\":\"wId\",\"jobName\":\"Collection\",\"type\":\"ONGOING\",\"body\":{\"ExecutionTime\":\"30s\",\"parsedItem\":\"300\"}}";
Message m = new Message();
m.setWorkflowId("wId");
m.setType(MessageType.ONGOING);
m.setJobName("Collection");
Map<String, String> body = new HashMap<>();
body.put("parsedItem", "300");
body.put("ExecutionTime", "30s");
m.setBody(body);
assertEquals(expectedJson, m.toString());
}
}

File diff suppressed because one or more lines are too long


@ -0,0 +1 @@
{"id":"50|DansKnawCris::0829b5191605bdbea36d6502b8c1ce1g", "resuttype" : { "classid" : "dataset" }, "pid":[{"qualifier":{"classid":"doi"},"value":"10.1016/j.cmet.2011.03.013"},{"qualifier":{"classid":"urn"},"value":"urn:nbn:nl:ui:29-f3ed5f9e-edf6-457e-8848-61b58a4075e2"},{"qualifier":{"classid":"scp-number"},"value":"79953761260"},{"qualifier":{"classid":"pmc"},"value":"21459329"}], "collectedfrom" : [ { "key" : "10|openaire____::081b82f96300b6a6e3d282bad31cb6e2", "value" : "Crossref"} ]}


@ -0,0 +1 @@
{"id":"50|DansKnawCris::0829b5191605bdbea36d6502b8c1ce1g", "resuttype" : { "classid" : "dataset" }, "pid":[{"qualifier":{"classid":"doi"},"value":"10.1016/j.cmet.2011.03.013"},{"qualifier":{"classid":"urn"},"value":"urn:nbn:nl:ui:29-f3ed5f9e-edf6-457e-8848-61b58a4075e2"},{"qualifier":{"classid":"scp-number"},"value":"79953761260"},{"qualifier":{"classid":"pmc"},"value":"21459329"}], "collectedfrom" : [ { "key" : "10|openaire____::081b82f96300b6a6e3d282bad31cb6e3", "value" : "Repository B"} ]}


@ -0,0 +1 @@
{"id":"50|DansKnawCris::0829b5191605bdbea36d6502b8c1ce1f", "resuttype" : { "classid" : "publication" }, "pid":[{"qualifier":{"classid":"doi"},"value":"10.1016/j.cmet.2011.03.013"},{"qualifier":{"classid":"urn"},"value":"urn:nbn:nl:ui:29-f3ed5f9e-edf6-457e-8848-61b58a4075e2"},{"qualifier":{"classid":"scp-number"},"value":"79953761260"},{"qualifier":{"classid":"pmc"},"value":"21459329"}], "collectedfrom" : [ { "key" : "10|openaire____::081b82f96300b6a6e3d282bad31cb6e2", "value" : "Crossref"} ]}


@ -0,0 +1 @@
{"id":"50|DansKnawCris::0829b5191605bdbea36d6502b8c1ce1f", "resuttype" : { "classid" : "publication" }, "pid":[{"qualifier":{"classid":"doi"},"value":"10.1016/j.cmet.2011.03.013"},{"qualifier":{"classid":"urn"},"value":"urn:nbn:nl:ui:29-f3ed5f9e-edf6-457e-8848-61b58a4075e2"},{"qualifier":{"classid":"scp-number"},"value":"79953761260"},{"qualifier":{"classid":"pmc"},"value":"21459329"}], "collectedfrom" : [ { "key" : "10|openaire____::081b82f96300b6a6e3d282bad31cb6e3", "value" : "Repository A"} ]}


@ -1 +0,0 @@
{"id":"50|DansKnawCris::0829b5191605bdbea36d6502b8c1ce1f","pid":[{"qualifier":{"classid":"scp-number"},"value":"79953761260"}]}


@ -1 +0,0 @@
{"id":"50|DansKnawCris::0829b5191605bdbea36d6502b8c1ce1f","pid":[]}


@ -1 +0,0 @@
{"id":"50|DansKnawCris::0829b5191605bdbea36d6502b8c1ce1f"}


@ -1 +0,0 @@
{"id":"50|DansKnawCris::0829b5191605bdbea36d6502b8c1ce1f","pid":[{"qualifier":{"classid":"doi"},"value":"10.1016/j.cmet.2011.03.013"},{"qualifier":{"classid":"urn"},"value":"urn:nbn:nl:ui:29-f3ed5f9e-edf6-457e-8848-61b58a4075e2"},{"qualifier":{"classid":"scp-number"},"value":"79953761260"},{"qualifier":{"classid":"pmc"},"value":"21459329"}]}


@ -1 +0,0 @@
{"id":"50|DansKnawCris::0829b5191605bdbea36d6502b8c1ce1f","pid":[{"qualifier":{"classid":"urn"},"value":"urn:nbn:nl:ui:29-f3ed5f9e-edf6-457e-8848-61b58a4075e2"},{"qualifier":{"classid":"scp-number"},"value":"79953761260"},{"qualifier":{"classid":"pmc"},"value":"21459329"}]}


@ -1 +0,0 @@
{"id":"50|DansKnawCris::0829b5191605bdbea36d6502b8c1ce1f","pid":[{"qualifier":{"classid":"urn"},"value":"urn:nbn:nl:ui:29-f3ed5f9e-edf6-457e-8848-61b58a4075e2"},{"qualifier":{"classid":"scp-number"},"value":"79953761260"},{"qualifier":{"classid":"pmcid"},"value":"21459329"}]}

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,11 +0,0 @@
Description of the project
--------------------------
This project defines the **object schemas** of the OpenAIRE main entities and the relationships among them.
Namely, it defines the model for
- **research product (result)**, which is subclassed into publication, dataset, other research product, and software
- **data source**, the object describing the data provider (institutional repositories, aggregators, CRIS systems)
- **organization**, the research bodies managing a data source or participating in a research project
- **project**, the research project
The serialization of such objects (data store files) is used to pass data between workflow nodes in the processing pipeline.
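
A minimal sketch of that intended usage, assuming a standard Jackson ObjectMapper (the class name and field values below are illustrative, not part of the schema module): one workflow node writes a model object as JSON, the next node reads it back.

import com.fasterxml.jackson.databind.ObjectMapper;

import eu.dnetlib.dhp.schema.oaf.Publication;

public class SerializationSketch {
	public static void main(String[] args) throws Exception {
		ObjectMapper mapper = new ObjectMapper();

		Publication p = new Publication();
		p.setId("50|doi_________::example"); // illustrative identifier

		String json = mapper.writeValueAsString(p); // written to a data store file
		Publication back = mapper.readValue(json, Publication.class); // read by the next node
		System.out.println(back.getId());
	}
}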


@ -1,73 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>eu.dnetlib.dhp</groupId>
<artifactId>dhp</artifactId>
<version>1.2.4-SNAPSHOT</version>
<relativePath>../</relativePath>
</parent>
<artifactId>dhp-schemas</artifactId>
<packaging>jar</packaging>
<description>This module contains common schema classes meant to be used across the dnet-hadoop submodules</description>
<build>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>4.0.1</version>
<executions>
<execution>
<id>scala-compile-first</id>
<phase>initialize</phase>
<goals>
<goal>add-source</goal>
<goal>compile</goal>
</goals>
</execution>
<execution>
<id>scala-test-compile</id>
<phase>process-test-resources</phase>
<goals>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</dependency>
</dependencies>
</project>


@ -1,40 +0,0 @@
package eu.dnetlib.dhp.schema.action;
import java.io.Serializable;
import com.fasterxml.jackson.databind.annotation.JsonDeserialize;
import eu.dnetlib.dhp.schema.oaf.Oaf;
@JsonDeserialize(using = AtomicActionDeserializer.class)
public class AtomicAction<T extends Oaf> implements Serializable {
private Class<T> clazz;
private T payload;
public AtomicAction() {
}
public AtomicAction(Class<T> clazz, T payload) {
this.clazz = clazz;
this.payload = payload;
}
public Class<T> getClazz() {
return clazz;
}
public void setClazz(Class<T> clazz) {
this.clazz = clazz;
}
public T getPayload() {
return payload;
}
public void setPayload(T payload) {
this.payload = payload;
}
}


@ -1,32 +0,0 @@
package eu.dnetlib.dhp.schema.action;
import java.io.IOException;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import eu.dnetlib.dhp.schema.oaf.Oaf;
public class AtomicActionDeserializer extends JsonDeserializer {
@Override
public Object deserialize(JsonParser jp, DeserializationContext ctxt)
throws IOException {
JsonNode node = jp.getCodec().readTree(jp);
String classTag = node.get("clazz").asText();
JsonNode payload = node.get("payload");
ObjectMapper mapper = new ObjectMapper();
try {
final Class<?> clazz = Class.forName(classTag);
return new AtomicAction(clazz, (Oaf) mapper.readValue(payload.toString(), clazz));
} catch (ClassNotFoundException e) {
throw new IOException(e);
}
}
}
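
A round-trip sketch for the deserializer above (class name and values are illustrative; it assumes Jackson's default serialization writes the clazz field as the fully qualified class name, which Class.forName can then resolve):

import com.fasterxml.jackson.databind.ObjectMapper;

import eu.dnetlib.dhp.schema.action.AtomicAction;
import eu.dnetlib.dhp.schema.oaf.Publication;

public class AtomicActionRoundTrip {
	public static void main(String[] args) throws Exception {
		ObjectMapper mapper = new ObjectMapper();

		Publication payload = new Publication();
		payload.setId("50|doi_________::example"); // illustrative id

		// Serializes roughly as {"clazz":"eu.dnetlib.dhp.schema.oaf.Publication","payload":{...}}
		String json = mapper.writeValueAsString(new AtomicAction<>(Publication.class, payload));

		// The custom deserializer reads "clazz" and rebuilds the typed payload.
		AtomicAction<?> restored = mapper.readValue(json, AtomicAction.class);
		System.out.println(restored.getClazz().getSimpleName()); // Publication
	}
}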


@ -1,21 +0,0 @@
package eu.dnetlib.dhp.schema.common;
import eu.dnetlib.dhp.schema.oaf.OafEntity;
/** Actual entity types in the Graph */
public enum EntityType {
publication, dataset, otherresearchproduct, software, datasource, organization, project;
/**
* Resolves the EntityType, given the relative class name
*
* @param clazz the given class name
* @param <T> actual OafEntity subclass
* @return the EntityType associated to the given class
*/
public static <T extends OafEntity> EntityType fromClass(Class<T> clazz) {
return EntityType.valueOf(clazz.getSimpleName().toLowerCase());
}
}


@ -1,69 +0,0 @@
package eu.dnetlib.dhp.schema.common;
import java.util.Comparator;
import eu.dnetlib.dhp.schema.oaf.Qualifier;
public class LicenseComparator implements Comparator<Qualifier> {
@Override
public int compare(Qualifier left, Qualifier right) {
if (left == null && right == null)
return 0;
if (left == null)
return 1;
if (right == null)
return -1;
String lClass = left.getClassid();
String rClass = right.getClassid();
if (lClass.equals(rClass))
return 0;
if (lClass.equals("OPEN SOURCE"))
return -1;
if (rClass.equals("OPEN SOURCE"))
return 1;
if (lClass.equals("OPEN"))
return -1;
if (rClass.equals("OPEN"))
return 1;
if (lClass.equals("6MONTHS"))
return -1;
if (rClass.equals("6MONTHS"))
return 1;
if (lClass.equals("12MONTHS"))
return -1;
if (rClass.equals("12MONTHS"))
return 1;
if (lClass.equals("EMBARGO"))
return -1;
if (rClass.equals("EMBARGO"))
return 1;
if (lClass.equals("RESTRICTED"))
return -1;
if (rClass.equals("RESTRICTED"))
return 1;
if (lClass.equals("CLOSED"))
return -1;
if (rClass.equals("CLOSED"))
return 1;
if (lClass.equals("UNKNOWN"))
return -1;
if (rClass.equals("UNKNOWN"))
return 1;
// Else (but unlikely), lexicographical ordering will do.
return lClass.compareTo(rClass);
}
}
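
A short usage sketch (hypothetical data) showing how the comparator ranks access-right qualifiers, so that the most open one sorts first:

import java.util.Arrays;
import java.util.List;

import eu.dnetlib.dhp.schema.common.LicenseComparator;
import eu.dnetlib.dhp.schema.oaf.Qualifier;

public class AccessRightRanking {
	private static Qualifier q(String classid) {
		Qualifier q = new Qualifier();
		q.setClassid(classid);
		return q;
	}

	public static void main(String[] args) {
		List<Qualifier> rights = Arrays.asList(q("CLOSED"), q("OPEN"), q("EMBARGO"));
		rights.sort(new LicenseComparator());
		System.out.println(rights.get(0).getClassid()); // OPEN
	}
}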


@ -1,7 +0,0 @@
package eu.dnetlib.dhp.schema.common;
/** Main entity types in the Graph */
public enum MainEntityType {
result, datasource, organization, project
}


@ -1,124 +0,0 @@
package eu.dnetlib.dhp.schema.common;
import eu.dnetlib.dhp.schema.oaf.DataInfo;
import eu.dnetlib.dhp.schema.oaf.KeyValue;
import eu.dnetlib.dhp.schema.oaf.Qualifier;
public class ModelConstants {
public static final String DNET_SUBJECT_TYPOLOGIES = "dnet:subject_classification_typologies";
public static final String DNET_RESULT_TYPOLOGIES = "dnet:result_typologies";
public static final String DNET_PUBLICATION_RESOURCE = "dnet:publication_resource";
public static final String DNET_ACCESS_MODES = "dnet:access_modes";
public static final String DNET_LANGUAGES = "dnet:languages";
public static final String DNET_PID_TYPES = "dnet:pid_types";
public static final String DNET_DATA_CITE_DATE = "dnet:dataCite_date";
public static final String DNET_DATA_CITE_RESOURCE = "dnet:dataCite_resource";
public static final String DNET_PROVENANCE_ACTIONS = "dnet:provenanceActions";
public static final String DNET_COUNTRY_TYPE = "dnet:countries";
public static final String DNET_REVIEW_LEVELS = "dnet:review_levels";
public static final String SYSIMPORT_CROSSWALK_REPOSITORY = "sysimport:crosswalk:repository";
public static final String SYSIMPORT_CROSSWALK_ENTITYREGISTRY = "sysimport:crosswalk:entityregistry";
public static final String USER_CLAIM = "user:claim";
public static final String DATASET_RESULTTYPE_CLASSID = "dataset";
public static final String PUBLICATION_RESULTTYPE_CLASSID = "publication";
public static final String SOFTWARE_RESULTTYPE_CLASSID = "software";
public static final String ORP_RESULTTYPE_CLASSID = "other";
public static final String RESULT_RESULT = "resultResult";
/**
* @deprecated Use {@link ModelConstants#RELATIONSHIP} instead.
*/
@Deprecated
public static final String PUBLICATION_DATASET = "publicationDataset";
public static final String IS_RELATED_TO = "isRelatedTo";
public static final String SUPPLEMENT = "supplement";
public static final String IS_SUPPLEMENT_TO = "isSupplementTo";
public static final String IS_SUPPLEMENTED_BY = "isSupplementedBy";
public static final String PART = "part";
public static final String IS_PART_OF = "isPartOf";
public static final String HAS_PARTS = "hasParts";
public static final String RELATIONSHIP = "relationship";
public static final String CITATION = "citation";
public static final String CITES = "cites";
public static final String IS_CITED_BY = "isCitedBy";
public static final String REVIEW = "review";
public static final String REVIEWS = "reviews";
public static final String IS_REVIEWED_BY = "isReviewedBy";
public static final String RESULT_PROJECT = "resultProject";
public static final String OUTCOME = "outcome";
public static final String IS_PRODUCED_BY = "isProducedBy";
public static final String PRODUCES = "produces";
public static final String DATASOURCE_ORGANIZATION = "datasourceOrganization";
public static final String PROVISION = "provision";
public static final String IS_PROVIDED_BY = "isProvidedBy";
public static final String PROVIDES = "provides";
public static final String PROJECT_ORGANIZATION = "projectOrganization";
public static final String PARTICIPATION = "participation";
public static final String HAS_PARTICIPANT = "hasParticipant";
public static final String IS_PARTICIPANT = "isParticipant";
public static final String RESULT_ORGANIZATION = "resultOrganization";
public static final String AFFILIATION = "affiliation";
public static final String IS_AUTHOR_INSTITUTION_OF = "isAuthorInstitutionOf";
public static final String HAS_AUTHOR_INSTITUTION = "hasAuthorInstitution";
public static final String MERGES = "merges";
public static final String UNKNOWN = "UNKNOWN";
public static final String NOT_AVAILABLE = "not available";
public static final Qualifier PUBLICATION_DEFAULT_RESULTTYPE = qualifier(
PUBLICATION_RESULTTYPE_CLASSID, PUBLICATION_RESULTTYPE_CLASSID,
DNET_RESULT_TYPOLOGIES, DNET_RESULT_TYPOLOGIES);
public static final Qualifier DATASET_DEFAULT_RESULTTYPE = qualifier(
DATASET_RESULTTYPE_CLASSID, DATASET_RESULTTYPE_CLASSID,
DNET_RESULT_TYPOLOGIES, DNET_RESULT_TYPOLOGIES);
public static final Qualifier SOFTWARE_DEFAULT_RESULTTYPE = qualifier(
SOFTWARE_RESULTTYPE_CLASSID, SOFTWARE_RESULTTYPE_CLASSID,
DNET_RESULT_TYPOLOGIES, DNET_RESULT_TYPOLOGIES);
public static final Qualifier ORP_DEFAULT_RESULTTYPE = qualifier(
ORP_RESULTTYPE_CLASSID, ORP_RESULTTYPE_CLASSID,
DNET_RESULT_TYPOLOGIES, DNET_RESULT_TYPOLOGIES);
public static final Qualifier REPOSITORY_PROVENANCE_ACTIONS = qualifier(
SYSIMPORT_CROSSWALK_REPOSITORY, SYSIMPORT_CROSSWALK_REPOSITORY,
DNET_PROVENANCE_ACTIONS, DNET_PROVENANCE_ACTIONS);
public static final Qualifier ENTITYREGISTRY_PROVENANCE_ACTION = qualifier(
SYSIMPORT_CROSSWALK_ENTITYREGISTRY, SYSIMPORT_CROSSWALK_ENTITYREGISTRY,
DNET_PROVENANCE_ACTIONS, DNET_PROVENANCE_ACTIONS);
public static final KeyValue UNKNOWN_REPOSITORY = keyValue(
"10|openaire____::55045bd2a65019fd8e6741a755395c8c", "Unknown Repository");
private static Qualifier qualifier(
final String classid,
final String classname,
final String schemeid,
final String schemename) {
final Qualifier q = new Qualifier();
q.setClassid(classid);
q.setClassname(classname);
q.setSchemeid(schemeid);
q.setSchemename(schemename);
return q;
}
private static KeyValue keyValue(String key, String value) {
KeyValue kv = new KeyValue();
kv.setKey(key);
kv.setValue(value);
kv.setDataInfo(new DataInfo());
return kv;
}
}
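
An illustrative sketch of how the default result-type qualifiers defined above might be applied when building an entity (the surrounding code is hypothetical, not taken from the mapping modules):

import eu.dnetlib.dhp.schema.common.ModelConstants;
import eu.dnetlib.dhp.schema.oaf.Publication;

public class ResultTypeExample {
	public static void main(String[] args) {
		Publication p = new Publication();
		p.setResulttype(ModelConstants.PUBLICATION_DEFAULT_RESULTTYPE);
		System.out.println(p.getResulttype().getClassid()); // publication
	}
}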


@ -1,476 +0,0 @@
package eu.dnetlib.dhp.schema.common;
import static com.google.common.base.Preconditions.checkArgument;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;
import org.apache.commons.lang3.StringUtils;
import com.google.common.collect.Maps;
import eu.dnetlib.dhp.schema.oaf.*;
/** Oaf model utility methods. */
public class ModelSupport {
/** Defines the mapping between the actual entity type and the main entity type */
private static Map<EntityType, MainEntityType> entityMapping = Maps.newHashMap();
static {
entityMapping.put(EntityType.publication, MainEntityType.result);
entityMapping.put(EntityType.dataset, MainEntityType.result);
entityMapping.put(EntityType.otherresearchproduct, MainEntityType.result);
entityMapping.put(EntityType.software, MainEntityType.result);
entityMapping.put(EntityType.datasource, MainEntityType.datasource);
entityMapping.put(EntityType.organization, MainEntityType.organization);
entityMapping.put(EntityType.project, MainEntityType.project);
}
/**
* Defines the mapping between the actual entity types and the corresponding classes implementing them
*/
public static final Map<EntityType, Class> entityTypes = Maps.newHashMap();
static {
entityTypes.put(EntityType.datasource, Datasource.class);
entityTypes.put(EntityType.organization, Organization.class);
entityTypes.put(EntityType.project, Project.class);
entityTypes.put(EntityType.dataset, Dataset.class);
entityTypes.put(EntityType.otherresearchproduct, OtherResearchProduct.class);
entityTypes.put(EntityType.software, Software.class);
entityTypes.put(EntityType.publication, Publication.class);
}
public static final Map<String, Class> oafTypes = Maps.newHashMap();
static {
oafTypes.put("datasource", Datasource.class);
oafTypes.put("organization", Organization.class);
oafTypes.put("project", Project.class);
oafTypes.put("dataset", Dataset.class);
oafTypes.put("otherresearchproduct", OtherResearchProduct.class);
oafTypes.put("software", Software.class);
oafTypes.put("publication", Publication.class);
oafTypes.put("relation", Relation.class);
}
public static final Map<Class, String> idPrefixMap = Maps.newHashMap();
static {
idPrefixMap.put(Datasource.class, "10");
idPrefixMap.put(Organization.class, "20");
idPrefixMap.put(Project.class, "40");
idPrefixMap.put(Dataset.class, "50");
idPrefixMap.put(OtherResearchProduct.class, "50");
idPrefixMap.put(Software.class, "50");
idPrefixMap.put(Publication.class, "50");
}
public static final Map<String, String> entityIdPrefix = Maps.newHashMap();
static {
entityIdPrefix.put("datasource", "10");
entityIdPrefix.put("organization", "20");
entityIdPrefix.put("project", "40");
entityIdPrefix.put("result", "50");
}
public static final Map<String, String> idPrefixEntity = Maps.newHashMap();
static {
idPrefixEntity.put("10", "datasource");
idPrefixEntity.put("20", "organization");
idPrefixEntity.put("40", "project");
idPrefixEntity.put("50", "result");
}
public static final Map<String, RelationInverse> relationInverseMap = Maps.newHashMap();
static {
relationInverseMap
.put(
"personResult_authorship_isAuthorOf", new RelationInverse()
.setRelation("isAuthorOf")
.setInverse("hasAuthor")
.setRelType("personResult")
.setSubReltype("authorship"));
relationInverseMap
.put(
"personResult_authorship_hasAuthor", new RelationInverse()
.setInverse("isAuthorOf")
.setRelation("hasAuthor")
.setRelType("personResult")
.setSubReltype("authorship"));
relationInverseMap
.put(
"projectOrganization_participation_isParticipant", new RelationInverse()
.setRelation("isParticipant")
.setInverse("hasParticipant")
.setRelType("projectOrganization")
.setSubReltype("participation"));
relationInverseMap
.put(
"projectOrganization_participation_hasParticipant", new RelationInverse()
.setInverse("isParticipant")
.setRelation("hasParticipant")
.setRelType("projectOrganization")
.setSubReltype("participation"));
relationInverseMap
.put(
"resultOrganization_affiliation_hasAuthorInstitution", new RelationInverse()
.setRelation("hasAuthorInstitution")
.setInverse("isAuthorInstitutionOf")
.setRelType("resultOrganization")
.setSubReltype("affiliation"));
relationInverseMap
.put(
"resultOrganization_affiliation_isAuthorInstitutionOf", new RelationInverse()
.setInverse("hasAuthorInstitution")
.setRelation("isAuthorInstitutionOf")
.setRelType("resultOrganization")
.setSubReltype("affiliation"));
relationInverseMap
.put(
"organizationOrganization_dedup_merges", new RelationInverse()
.setRelation("merges")
.setInverse("isMergedIn")
.setRelType("organizationOrganization")
.setSubReltype("dedup"));
relationInverseMap
.put(
"organizationOrganization_dedup_isMergedIn", new RelationInverse()
.setInverse("merges")
.setRelation("isMergedIn")
.setRelType("organizationOrganization")
.setSubReltype("dedup"));
relationInverseMap
.put(
"organizationOrganization_dedupSimilarity_isSimilarTo", new RelationInverse()
.setInverse("isSimilarTo")
.setRelation("isSimilarTo")
.setRelType("organizationOrganization")
.setSubReltype("dedupSimilarity"));
relationInverseMap
.put(
"resultProject_outcome_isProducedBy", new RelationInverse()
.setRelation("isProducedBy")
.setInverse("produces")
.setRelType("resultProject")
.setSubReltype("outcome"));
relationInverseMap
.put(
"resultProject_outcome_produces", new RelationInverse()
.setInverse("isProducedBy")
.setRelation("produces")
.setRelType("resultProject")
.setSubReltype("outcome"));
relationInverseMap
.put(
"projectPerson_contactPerson_isContact", new RelationInverse()
.setRelation("isContact")
.setInverse("hasContact")
.setRelType("projectPerson")
.setSubReltype("contactPerson"));
relationInverseMap
.put(
"projectPerson_contactPerson_hasContact", new RelationInverse()
.setInverse("isContact")
.setRelation("hasContact")
.setRelType("personPerson")
.setSubReltype("coAuthorship"));
relationInverseMap
.put(
"personPerson_coAuthorship_isCoauthorOf", new RelationInverse()
.setInverse("isCoAuthorOf")
.setRelation("isCoAuthorOf")
.setRelType("personPerson")
.setSubReltype("coAuthorship"));
relationInverseMap
.put(
"personPerson_dedup_merges", new RelationInverse()
.setInverse("isMergedIn")
.setRelation("merges")
.setRelType("personPerson")
.setSubReltype("dedup"));
relationInverseMap
.put(
"personPerson_dedup_isMergedIn", new RelationInverse()
.setInverse("merges")
.setRelation("isMergedIn")
.setRelType("personPerson")
.setSubReltype("dedup"));
relationInverseMap
.put(
"personPerson_dedupSimilarity_isSimilarTo", new RelationInverse()
.setInverse("isSimilarTo")
.setRelation("isSimilarTo")
.setRelType("personPerson")
.setSubReltype("dedupSimilarity"));
relationInverseMap
.put(
"datasourceOrganization_provision_isProvidedBy", new RelationInverse()
.setInverse("provides")
.setRelation("isProvidedBy")
.setRelType("datasourceOrganization")
.setSubReltype("provision"));
relationInverseMap
.put(
"datasourceOrganization_provision_provides", new RelationInverse()
.setInverse("isProvidedBy")
.setRelation("provides")
.setRelType("datasourceOrganization")
.setSubReltype("provision"));
relationInverseMap
.put(
"resultResult_similarity_hasAmongTopNSimilarDocuments", new RelationInverse()
.setInverse("isAmongTopNSimilarDocuments")
.setRelation("hasAmongTopNSimilarDocuments")
.setRelType("resultResult")
.setSubReltype("similarity"));
relationInverseMap
.put(
"resultResult_similarity_isAmongTopNSimilarDocuments", new RelationInverse()
.setInverse("hasAmongTopNSimilarDocuments")
.setRelation("isAmongTopNSimilarDocuments")
.setRelType("resultResult")
.setSubReltype("similarity"));
relationInverseMap
.put(
"resultResult_relationship_isRelatedTo", new RelationInverse()
.setInverse("isRelatedTo")
.setRelation("isRelatedTo")
.setRelType("resultResult")
.setSubReltype("relationship"));
relationInverseMap
.put(
"resultResult_similarity_isAmongTopNSimilarDocuments", new RelationInverse()
.setInverse("hasAmongTopNSimilarDocuments")
.setRelation("isAmongTopNSimilarDocuments")
.setRelType("resultResult")
.setSubReltype("similarity"));
relationInverseMap
.put(
"resultResult_supplement_isSupplementTo", new RelationInverse()
.setInverse("isSupplementedBy")
.setRelation("isSupplementTo")
.setRelType("resultResult")
.setSubReltype("supplement"));
relationInverseMap
.put(
"resultResult_supplement_isSupplementedBy", new RelationInverse()
.setInverse("isSupplementTo")
.setRelation("isSupplementedBy")
.setRelType("resultResult")
.setSubReltype("supplement"));
relationInverseMap
.put(
"resultResult_part_isPartOf", new RelationInverse()
.setInverse("hasPart")
.setRelation("isPartOf")
.setRelType("resultResult")
.setSubReltype("part"));
relationInverseMap
.put(
"resultResult_part_hasPart", new RelationInverse()
.setInverse("isPartOf")
.setRelation("hasPart")
.setRelType("resultResult")
.setSubReltype("part"));
relationInverseMap
.put(
"resultResult_dedup_merges", new RelationInverse()
.setInverse("isMergedIn")
.setRelation("merges")
.setRelType("resultResult")
.setSubReltype("dedup"));
relationInverseMap
.put(
"resultResult_dedup_isMergedIn", new RelationInverse()
.setInverse("merges")
.setRelation("isMergedIn")
.setRelType("resultResult")
.setSubReltype("dedup"));
relationInverseMap
.put(
"resultResult_dedupSimilarity_isSimilarTo", new RelationInverse()
.setInverse("isSimilarTo")
.setRelation("isSimilarTo")
.setRelType("resultResult")
.setSubReltype("dedupSimilarity"));
}
private static final String schemeTemplate = "dnet:%s_%s_relations";
private ModelSupport() {
}
public static <E extends OafEntity> String getIdPrefix(Class<E> clazz) {
return idPrefixMap.get(clazz);
}
/**
* Checks subclass-superclass relationship.
*
* @param subClazzObject Subclass object instance
* @param superClazzObject Superclass object instance
* @param <X> Subclass type
* @param <Y> Superclass type
* @return True if X is a subclass of Y
*/
public static <X extends Oaf, Y extends Oaf> Boolean isSubClass(
X subClazzObject, Y superClazzObject) {
return isSubClass(subClazzObject.getClass(), superClazzObject.getClass());
}
/**
* Checks subclass-superclass relationship.
*
* @param subClazzObject Subclass object instance
* @param superClazz Superclass class
* @param <X> Subclass type
* @param <Y> Superclass type
* @return True if X is a subclass of Y
*/
public static <X extends Oaf, Y extends Oaf> Boolean isSubClass(
X subClazzObject, Class<Y> superClazz) {
return isSubClass(subClazzObject.getClass(), superClazz);
}
/**
* Checks subclass-superclass relationship.
*
* @param subClazz Subclass class
* @param superClazz Superclass class
* @param <X> Subclass type
* @param <Y> Superclass type
* @return True if X is a subclass of Y
*/
public static <X extends Oaf, Y extends Oaf> Boolean isSubClass(
Class<X> subClazz, Class<Y> superClazz) {
return superClazz.isAssignableFrom(subClazz);
}
/**
* Lists all the OAF model classes
*
* @param <T>
* @return
*/
public static <T extends Oaf> Class<T>[] getOafModelClasses() {
return new Class[] {
Author.class,
Context.class,
Country.class,
DataInfo.class,
Dataset.class,
Datasource.class,
ExternalReference.class,
ExtraInfo.class,
Field.class,
GeoLocation.class,
Instance.class,
Journal.class,
KeyValue.class,
Oaf.class,
OafEntity.class,
OAIProvenance.class,
Organization.class,
OriginDescription.class,
OtherResearchProduct.class,
Project.class,
Publication.class,
Qualifier.class,
Relation.class,
Result.class,
Software.class,
StructuredProperty.class
};
}
public static String getMainType(final EntityType type) {
return entityMapping.get(type).name();
}
public static boolean isResult(EntityType type) {
return MainEntityType.result.name().equals(getMainType(type));
}
public static String getScheme(final String sourceType, final String targetType) {
return String
.format(
schemeTemplate,
entityMapping.get(EntityType.valueOf(sourceType)).name(),
entityMapping.get(EntityType.valueOf(targetType)).name());
}
public static <T extends Oaf> String tableIdentifier(String dbName, String tableName) {
checkArgument(StringUtils.isNotBlank(dbName), "DB name cannot be empty");
checkArgument(StringUtils.isNotBlank(tableName), "table name cannot be empty");
return String.format("%s.%s", dbName, tableName);
}
public static <T extends Oaf> String tableIdentifier(String dbName, Class<T> clazz) {
checkArgument(Objects.nonNull(clazz), "clazz is needed to derive the table name, thus cannot be null");
return tableIdentifier(dbName, clazz.getSimpleName().toLowerCase());
}
public static <T extends Oaf> Function<T, String> idFn() {
return x -> {
if (isSubClass(x, Relation.class)) {
return idFnForRelation(x);
}
return idFnForOafEntity(x);
};
}
private static <T extends Oaf> String idFnForRelation(T t) {
Relation r = (Relation) t;
return Optional
.ofNullable(r.getSource())
.map(
source -> Optional
.ofNullable(r.getTarget())
.map(
target -> Optional
.ofNullable(r.getRelType())
.map(
relType -> Optional
.ofNullable(r.getSubRelType())
.map(
subRelType -> Optional
.ofNullable(r.getRelClass())
.map(
relClass -> String
.join(
source,
target,
relType,
subRelType,
relClass))
.orElse(
String
.join(
source,
target,
relType,
subRelType)))
.orElse(String.join(source, target, relType)))
.orElse(String.join(source, target)))
.orElse(source))
.orElse(null);
}
private static <T extends Oaf> String idFnForOafEntity(T t) {
return ((OafEntity) t).getId();
}
}
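
A lookup sketch for the relationInverseMap defined above, showing how the inverse of a relation class can be resolved (the class name is illustrative; the printed values follow from the entries in the static block):

import eu.dnetlib.dhp.schema.common.ModelSupport;
import eu.dnetlib.dhp.schema.common.RelationInverse;

public class InverseRelationExample {
	public static void main(String[] args) {
		RelationInverse ri = ModelSupport.relationInverseMap.get("resultProject_outcome_isProducedBy");
		System.out.println(ri.getRelation()); // isProducedBy
		System.out.println(ri.getInverse()); // produces
		System.out.println(ri.getRelType() + " / " + ri.getSubReltype()); // resultProject / outcome
	}
}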


@ -1,46 +0,0 @@
package eu.dnetlib.dhp.schema.common;
public class RelationInverse {
private String relation;
private String inverse;
private String relType;
private String subReltype;
public String getRelType() {
return relType;
}
public RelationInverse setRelType(String relType) {
this.relType = relType;
return this;
}
public String getSubReltype() {
return subReltype;
}
public RelationInverse setSubReltype(String subReltype) {
this.subReltype = subReltype;
return this;
}
public String getRelation() {
return relation;
}
public RelationInverse setRelation(String relation) {
this.relation = relation;
return this;
}
public String getInverse() {
return inverse;
}
public RelationInverse setInverse(String inverse) {
this.inverse = inverse;
return this;
}
}


@ -1,29 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
import java.io.Serializable;
/**
* Used to refer to the Article Processing Charge information. Not dumped in this release. It contains two parameters: -
* currency of type String to store the currency of the APC - amount of type String to store the charged amount
*/
public class APC implements Serializable {
private String currency;
private String amount;
public String getCurrency() {
return currency;
}
public void setCurrency(String currency) {
this.currency = currency;
}
public String getAmount() {
return amount;
}
public void setAmount(String amount) {
this.amount = amount;
}
}


@ -1,31 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
/**
* AccessRight. Used to represent the result access rights. It extends the eu.dnetlib.dhp.schema.dump.oaf.Qualifier
* element with a parameter scheme of type String to store the scheme. Values for this element are found against the
* COAR access right scheme. The classid of the element accessright in eu.dnetlib.dhp.schema.oaf.Result is used to get
* the COAR corresponding code whose value will be used to set the code parameter. The COAR label corresponding to the
* COAR code will be used to set the label parameter. The scheme value will always be the one referring to the COAR
* access right scheme
*/
public class AccessRight extends Qualifier {
private String scheme;
public String getScheme() {
return scheme;
}
public void setScheme(String scheme) {
this.scheme = scheme;
}
public static AccessRight newInstance(String code, String label, String scheme) {
AccessRight ar = new AccessRight();
ar.setCode(code);
ar.setLabel(label);
ar.setScheme(scheme);
return ar;
}
}


@ -1,73 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
import java.io.Serializable;
import java.util.List;
/**
* Used to represent the generic author of the result. It has five parameters: - name of type String to store the given
* name of the author. The value for this parameter corresponds to eu.dnetlib.dhp.schema.oaf.Author name - surname of
* type String to store the family name of the author. The value for this parameter corresponds to
* eu.dnetlib.dhp.schema.oaf.Author surname - fullname of type String to store the fullname of the author. The value for
* this parameter corresponds to eu.dnetlib.dhp.schema.oaf.Author fullname - rank of type Integer to store the rank on
* the author in the result's authors list. The value for this parameter corresponds to eu.dnetlib.dhp.schema.oaf.Author
* rank - pid of type eu.dnetlib.dhp.schema.dump.oaf.Pid to store the persistent identifier for the author. For the
* moment only ORCID identifiers will be dumped. - The id element is instantiated by using the following values in the
* eu.dnetlib.dhp.schema.oaf.Result pid: * Qualifier.classid for scheme * value for value - The provenance element is
* instantiated only if the dataInfo is set for the pid in the result to be dumped. The provenance element is
* instantiated by using the following values in the eu.dnetlib.dhp.schema.oaf.Result pid: *
* dataInfo.provenanceaction.classname for provenance * dataInfo.trust for trust
*/
public class Author implements Serializable {
private String fullname;
private String name;
private String surname;
private Integer rank;
private Pid pid;
public String getFullname() {
return fullname;
}
public void setFullname(String fullname) {
this.fullname = fullname;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getSurname() {
return surname;
}
public void setSurname(String surname) {
this.surname = surname;
}
public Integer getRank() {
return rank;
}
public void setRank(Integer rank) {
this.rank = rank;
}
public Pid getPid() {
return pid;
}
public void setPid(Pid pid) {
this.pid = pid;
}
}


@ -1,136 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
import java.io.Serializable;
import java.util.Objects;
/**
* To store information about the conference or journal where the result has been presented or published. It contains
* eleven parameters: - name of type String to store the name of the journal or conference. It corresponds to the
* parameter name of eu.dnetlib.dhp.schema.oaf.Journal - issnPrinted of type String to store the journal printed issn.
* It corresponds to the parameter issnPrinted of eu.dnetlib.dhp.schema.oaf.Journal - issnOnline of type String to store
* the journal online issn. It corresponds to the parameter issnOnline of eu.dnetlib.dhp.schema.oaf.Journal -
* issnLinking of type String to store the journal linking issn. It corresponds to the parameter issnLinking of
* eu.dnetlib.dhp.schema.oaf.Journal - ep of type String to store the end page. It corresponds to the parameter ep of
* eu.dnetlib.dhp.schema.oaf.Journal - iss of type String to store the journal issue. It corresponds to the parameter
* iss of eu.dnetlib.dhp.schema.oaf.Journal - sp of type String to store the start page. It corresponds to the parameter
* sp of eu.dnetlib.dhp.schema.oaf.Journal - vol of type String to store the Volume. It corresponds to the parameter vol
* of eu.dnetlib.dhp.schema.oaf.Journal - edition of type String to store the edition of the journal or conference
* proceeding. It corresponds to the parameter edition of eu.dnetlib.dhp.schema.oaf.Journal - conferenceplace of type
* String to store the place of the conference. It corresponds to the parameter conferenceplace of
* eu.dnetlib.dhp.schema.oaf.Journal - conferencedate of type String to store the date of the conference. It corresponds
* to the parameter conferencedate of eu.dnetlib.dhp.schema.oaf.Journal
*/
public class Container implements Serializable {
private String name;
private String issnPrinted;
private String issnOnline;
private String issnLinking;
private String ep;
private String iss;
private String sp;
private String vol;
private String edition;
private String conferenceplace;
private String conferencedate;
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getIssnPrinted() {
return issnPrinted;
}
public void setIssnPrinted(String issnPrinted) {
this.issnPrinted = issnPrinted;
}
public String getIssnOnline() {
return issnOnline;
}
public void setIssnOnline(String issnOnline) {
this.issnOnline = issnOnline;
}
public String getIssnLinking() {
return issnLinking;
}
public void setIssnLinking(String issnLinking) {
this.issnLinking = issnLinking;
}
public String getEp() {
return ep;
}
public void setEp(String ep) {
this.ep = ep;
}
public String getIss() {
return iss;
}
public void setIss(String iss) {
this.iss = iss;
}
public String getSp() {
return sp;
}
public void setSp(String sp) {
this.sp = sp;
}
public String getVol() {
return vol;
}
public void setVol(String vol) {
this.vol = vol;
}
public String getEdition() {
return edition;
}
public void setEdition(String edition) {
this.edition = edition;
}
public String getConferenceplace() {
return conferenceplace;
}
public void setConferenceplace(String conferenceplace) {
this.conferenceplace = conferenceplace;
}
public String getConferencedate() {
return conferencedate;
}
public void setConferencedate(String conferencedate) {
this.conferencedate = conferencedate;
}
}


@ -1,38 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
import java.io.Serializable;
/**
* To represent the information described by a scheme and a value in that scheme (i.e. pid). It has two parameters: -
* scheme of type String to store the scheme - value of type String to store the value in that scheme
*/
public class ControlledField implements Serializable {
private String scheme;
private String value;
public String getScheme() {
return scheme;
}
public void setScheme(String scheme) {
this.scheme = scheme;
}
public String getValue() {
return value;
}
public void setValue(String value) {
this.value = value;
}
public static ControlledField newInstance(String scheme, String value) {
ControlledField cf = new ControlledField();
cf.setScheme(scheme);
cf.setValue(value);
return cf;
}
}


@ -1,37 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
/**
* Represents the country associated to this result. It extends eu.dnetlib.dhp.schema.dump.oaf.Qualifier with a
* provenance parameter of type eu.dnetlib.dhp.schema.dump.oaf.Provenance. The country is not mapped if its value in the
* result represented in the internal format is Unknown. The value for this element corresponds to: - code corresponds
* to the classid of eu.dnetlib.dhp.schema.oaf.Country - label corresponds to the classname of
* eu.dnetlib.dhp.schema.oaf.Country - provenance set only if the dataInfo associated to the Country of the result to be
* dumped is not null. In this case : - provenance corresponds to dataInfo.provenanceaction.classid (to be modified with
* datainfo.provenanceaction.classname) - trust corresponds to dataInfo.trust
*/
public class Country extends Qualifier {
private Provenance provenance;
public Provenance getProvenance() {
return provenance;
}
public void setProvenance(Provenance provenance) {
this.provenance = provenance;
}
public static Country newInstance(String code, String label, Provenance provenance) {
Country c = new Country();
c.setProvenance(provenance);
c.setCode(code);
c.setLabel(label);
return c;
}
public static Country newInstance(String code, String label, String provenance, String trust) {
return newInstance(code, label, Provenance.newInstance(provenance, trust));
}
}
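
A mapping sketch based on the comment above (the internal Country instance and the provenance/trust values are illustrative): code and label are taken from the classid and classname of the graph-side Country.

import eu.dnetlib.dhp.schema.dump.oaf.Country;
import eu.dnetlib.dhp.schema.dump.oaf.Provenance;

public class CountryMappingSketch {
	public static void main(String[] args) {
		eu.dnetlib.dhp.schema.oaf.Country internal = new eu.dnetlib.dhp.schema.oaf.Country();
		internal.setClassid("NL"); // illustrative values
		internal.setClassname("Netherlands");

		Country dumped = Country
			.newInstance(
				internal.getClassid(), // code
				internal.getClassname(), // label
				Provenance.newInstance("sysimport:crosswalk:repository", "0.9"));

		System.out.println(dumped.getCode() + " - " + dumped.getLabel());
	}
}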


@ -1,36 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
import java.io.Serializable;
public class Funder implements Serializable {
private String shortName;
private String name;
private String jurisdiction;
public String getJurisdiction() {
return jurisdiction;
}
public void setJurisdiction(String jurisdiction) {
this.jurisdiction = jurisdiction;
}
public String getShortName() {
return shortName;
}
public void setShortName(String shortName) {
this.shortName = shortName;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
}


@ -1,53 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
import java.io.Serializable;
import org.apache.commons.lang3.StringUtils;
import com.fasterxml.jackson.annotation.JsonIgnore;
/**
* Represents the geolocation information. It has three parameters: - point of type String to store the point
* information. It corresponds to eu.dnetlib.dhp.schema.oaf.GeoLocation point - box of type String to store the box
* information. It corresponds to eu.dnetlib.dhp.schema.oaf.GeoLocation box - place of type String to store the place
* information. It corresponds to eu.dnetlib.dhp.schema.oaf.GeoLocation place
*/
public class GeoLocation implements Serializable {
private String point;
private String box;
private String place;
public String getPoint() {
return point;
}
public void setPoint(String point) {
this.point = point;
}
public String getBox() {
return box;
}
public void setBox(String box) {
this.box = box;
}
public String getPlace() {
return place;
}
public void setPlace(String place) {
this.place = place;
}
@JsonIgnore
public boolean isBlank() {
return StringUtils.isBlank(point) && StringUtils.isBlank(box) && StringUtils.isBlank(place);
}
}


@ -1,107 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
import java.io.Serializable;
import java.util.List;
/**
* Represents the manifestations (i.e. different versions) of the result. For example: the pre-print and the published
* versions are two manifestations of the same research result. It has the following parameters: - license of type
* String to store the license applied to the instance. It corresponds to the value of the licence in the instance to be
* dumped - accessright of type eu.dnetlib.dhp.schema.dump.oaf.AccessRight to store the accessright of the instance. -
* type of type String to store the type of the instance as defined in the corresponding dnet vocabulary
* (dnet:publication_resource). It corresponds to the instancetype.classname of the instance to be mapped - hostedby of
* type eu.dnetlib.dhp.schema.dump.oaf.KeyValue to store the information about the source from which the instance can be
* viewed or downloaded. It is mapped against the hostedby parameter of the instance to be dumped and - key corresponds
* to hostedby.key - value corresponds to hostedby.value - url of type List<String> list of locations where the instance
* is accessible. It corresponds to url of the instance to be dumped - collectedfrom of type
* eu.dnetlib.dhp.schema.dump.oaf.KeyValue to store the information about the source from which the instance has been
* collected. It is mapped against the collectedfrom parameter of the instance to be dumped and - key corresponds to
* collectedfrom.key - value corresponds to collectedfrom.value - publicationdate of type String to store the
* publication date of the instance ;// dateofacceptance; - refereed of type String to store information about the
* review status of the instance. Possible values are 'Unknown', 'nonPeerReviewed', 'peerReviewed'. It corresponds to
* refereed.classname of the instance to be dumped
*/
public class Instance implements Serializable {
private String license;
private AccessRight accessright;
private String type;
private KeyValue hostedby;
private List<String> url;
private KeyValue collectedfrom;
private String publicationdate;// dateofacceptance;
private String refereed; // peer-review status
public String getLicense() {
return license;
}
public void setLicense(String license) {
this.license = license;
}
public AccessRight getAccessright() {
return accessright;
}
public void setAccessright(AccessRight accessright) {
this.accessright = accessright;
}
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
public KeyValue getHostedby() {
return hostedby;
}
public void setHostedby(KeyValue hostedby) {
this.hostedby = hostedby;
}
public List<String> getUrl() {
return url;
}
public void setUrl(List<String> url) {
this.url = url;
}
public KeyValue getCollectedfrom() {
return collectedfrom;
}
public void setCollectedfrom(KeyValue collectedfrom) {
this.collectedfrom = collectedfrom;
}
public String getPublicationdate() {
return publicationdate;
}
public void setPublicationdate(String publicationdate) {
this.publicationdate = publicationdate;
}
public String getRefereed() {
return refereed;
}
public void setRefereed(String refereed) {
this.refereed = refereed;
}
}
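
A minimal usage sketch (not part of the original sources) of how an Instance of the dump model might be filled during the mapping; all values are invented and the AccessRight field is left unset to keep the example self-contained:

import java.util.Arrays;

import eu.dnetlib.dhp.schema.dump.oaf.Instance;
import eu.dnetlib.dhp.schema.dump.oaf.KeyValue;

public class InstanceExample {
    public static void main(String[] args) {
        Instance instance = new Instance();
        instance.setType("Article"); // instancetype.classname, illustrative value
        instance.setLicense("https://creativecommons.org/licenses/by/4.0/");
        instance.setHostedby(KeyValue.newInstance("10|openaire____::1234", "Example Repository")); // invented ids
        instance.setCollectedfrom(KeyValue.newInstance("10|openaire____::5678", "Example Aggregator"));
        instance.setUrl(Arrays.asList("https://example.org/record/1"));
        instance.setPublicationdate("2020-01-01");
        instance.setRefereed("peerReviewed");
        System.out.println(instance.getType());
    }
}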


@ -1,48 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
import java.io.Serializable;
import org.apache.commons.lang3.StringUtils;
import com.fasterxml.jackson.annotation.JsonIgnore;
/**
* To represent the information described by a key and a value. It has two parameters: - key to store the key (generally
* the OpenAIRE id for some entity) - value to store the value (generally the OpenAIRE name for the key)
*/
public class KeyValue implements Serializable {
private String key;
private String value;
public String getKey() {
return key;
}
public void setKey(String key) {
this.key = key;
}
public String getValue() {
return value;
}
public void setValue(String value) {
this.value = value;
}
public static KeyValue newInstance(String key, String value) {
KeyValue inst = new KeyValue();
inst.key = key;
inst.value = value;
return inst;
}
@JsonIgnore
public boolean isBlank() {
return StringUtils.isBlank(key) && StringUtils.isBlank(value);
}
}
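
A minimal usage sketch (not part of the original sources) of the newInstance factory and the isBlank check; the identifier and name are invented:

import eu.dnetlib.dhp.schema.dump.oaf.KeyValue;

public class KeyValueExample {
    public static void main(String[] args) {
        // the key is generally an OpenAIRE id, the value the corresponding name (both invented here)
        KeyValue collectedfrom = KeyValue.newInstance("10|openaire____::abcd", "Example Datasource");
        System.out.println(collectedfrom.isBlank()); // false
        System.out.println(KeyValue.newInstance(null, null).isBlank()); // true
    }
}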


@ -1,45 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
import java.io.Serializable;
/**
* To represent the generic persistent identifier. It has two parameters: - id of type
* eu.dnetlib.dhp.schema.dump.oaf.ControlledField to store the scheme and value of the Persistent Identifier. -
* provenance of type eu.dnetlib.dhp.schema.dump.oaf.Provenance to store the provenance and trust of the information
*/
public class Pid implements Serializable {
private ControlledField id;
private Provenance provenance;
public ControlledField getId() {
return id;
}
public void setId(ControlledField pid) {
this.id = pid;
}
public Provenance getProvenance() {
return provenance;
}
public void setProvenance(Provenance provenance) {
this.provenance = provenance;
}
public static Pid newInstance(ControlledField pid, Provenance provenance) {
Pid p = new Pid();
p.id = pid;
p.provenance = provenance;
return p;
}
public static Pid newInstance(ControlledField pid) {
Pid p = new Pid();
p.id = pid;
return p;
}
}


@ -1,45 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
import java.io.Serializable;
public class Project implements Serializable {
protected String id;// OpenAIRE id
protected String code;
protected String acronym;
protected String title;
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getCode() {
return code;
}
public void setCode(String code) {
this.code = code;
}
public String getAcronym() {
return acronym;
}
public void setAcronym(String acronym) {
this.acronym = acronym;
}
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
}


@ -1,41 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
import java.io.Serializable;
/**
* Indicates the process that produced (or provided) the information, and the trust associated to the information. It
* has two parameters: - provenance of type String to store the provenance of the information, - trust of type String to
* store the trust associated to the information
*/
public class Provenance implements Serializable {
private String provenance;
private String trust;
public String getProvenance() {
return provenance;
}
public void setProvenance(String provenance) {
this.provenance = provenance;
}
public String getTrust() {
return trust;
}
public void setTrust(String trust) {
this.trust = trust;
}
public static Provenance newInstance(String provenance, String trust) {
Provenance p = new Provenance();
p.provenance = provenance;
p.trust = trust;
return p;
}
public String toString() {
return provenance + trust;
}
}
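
A minimal usage sketch (not part of the original sources) of the Provenance factory; the provenance label and trust value are illustrative:

import eu.dnetlib.dhp.schema.dump.oaf.Provenance;

public class ProvenanceExample {
    public static void main(String[] args) {
        // illustrative values: a provenance action description and the associated trust
        Provenance p = Provenance.newInstance("Harvested", "0.9");
        System.out.println(p); // toString() concatenates provenance and trust
    }
}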


@ -1,42 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
import java.io.Serializable;
import org.apache.commons.lang3.StringUtils;
import com.fasterxml.jackson.annotation.JsonIgnore;
/**
* To represent the information described by a code and a value. It has two parameters:
* - code to store the code (generally the classid of the eu.dnetlib.dhp.schema.oaf.Qualifier element)
* - label to store the label (generally the classname of the eu.dnetlib.dhp.schema.oaf.Qualifier element)
*/
public class Qualifier implements Serializable {
private String code; // the classid in the Qualifier
private String label; // the classname in the Qualifier
public String getCode() {
return code;
}
public void setCode(String code) {
this.code = code;
}
public String getLabel() {
return label;
}
public void setLabel(String label) {
this.label = label;
}
public static Qualifier newInstance(String code, String value) {
Qualifier qualifier = new Qualifier();
qualifier.setCode(code);
qualifier.setLabel(value);
return qualifier;
}
}
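
A minimal usage sketch (not part of the original sources) showing how a language could be mapped, following the convention described in the Result documentation below (code from classid, label from classname); the values are illustrative:

import eu.dnetlib.dhp.schema.dump.oaf.Qualifier;

public class QualifierExample {
    public static void main(String[] args) {
        // e.g. a language: code from language.classid, label from language.classname
        Qualifier language = Qualifier.newInstance("eng", "English");
        System.out.println(language.getCode() + " / " + language.getLabel());
    }
}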


@ -1,391 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
import java.io.Serializable;
import java.util.List;
import eu.dnetlib.dhp.schema.dump.oaf.community.Project;
/**
* To represent the dumped result. It will be extended in the dump for Research Communities - Research
* Initiative/Infrastructures. It has the following parameters:
* - author of type List<eu.dnetlib.dhp.schema.dump.oaf.Author> to describe the authors of a result. For each author in
*   the result represented in the internal model one author in the external model is produced
* - type of type String to represent the category of the result. Possible values are publication, dataset, software,
*   other. It corresponds to resulttype.classname of the dumped result
* - language of type eu.dnetlib.dhp.schema.dump.oaf.Qualifier to store information about the language of the result.
*   It is dumped as: code corresponds to language.classid, value corresponds to language.classname
* - country of type List<eu.dnetlib.dhp.schema.dump.oaf.Country> to store the list of countries associated to the
*   result. For each country in the result represented in the internal model one country in the external model is
*   produced
* - subjects of type List<eu.dnetlib.dhp.schema.dump.oaf.Subject> to store the subjects of the result. For each
*   subject in the result represented in the internal model one subject in the external model is produced
* - maintitle of type String to store the main title of the result. It corresponds to the value of the first title in
*   the result to be dumped having classid equal to "main title"
* - subtitle of type String to store the subtitle of the result. It corresponds to the value of the first title in the
*   result to be dumped having classid equal to "subtitle"
* - description of type List<String> to store the description of the result. It corresponds to the list of
*   description.value in the result represented in the internal model
* - publicationdate of type String to store the publication date. It corresponds to dateofacceptance.value in the
*   result represented in the internal model
* - publisher of type String to store information about the publisher. It corresponds to publisher.value of the result
*   represented in the internal model
* - embargoenddate of type String to store the embargo end date. It corresponds to embargoenddate.value of the result
*   represented in the internal model
* - source of type List<String>, see the definition of the Dublin Core field dc:source. It corresponds to the list of
*   source.value in the result represented in the internal model
* - format of type List<String>. It corresponds to the list of format.value in the result represented in the internal
*   model
* - contributor of type List<String> to represent the contributors of this result. It corresponds to the list of
*   contributor.value in the result represented in the internal model
* - coverage of type List<String>. It corresponds to the list of coverage.value in the result represented in the
*   internal model
* - bestaccessright of type eu.dnetlib.dhp.schema.dump.oaf.AccessRight to store information about the most open access
*   right associated to the manifestations of this research result. It corresponds to the same parameter in the result
*   represented in the internal model
* - instance of type List<eu.dnetlib.dhp.schema.dump.oaf.Instance> to store all the instances associated to the
*   result. It corresponds to the same parameter in the result represented in the internal model
* - container of type eu.dnetlib.dhp.schema.dump.oaf.Container (only for results of type publication). It corresponds
*   to the parameter journal of the result represented in the internal model
* - documentationUrl of type List<String> (only for results of type software) to store the URLs to the software
*   documentation. It corresponds to the list of documentationUrl.value of the result represented in the internal model
* - codeRepositoryUrl of type String (only for results of type software) to store the URL of the repository with the
*   source code. It corresponds to codeRepositoryUrl.value of the result represented in the internal model
* - programmingLanguage of type String (only for results of type software) to store the programming language. It
*   corresponds to programmingLanguage.classid of the result represented in the internal model
* - contactperson of type List<String> (only for results of type other) to store the contact persons for this result.
*   It corresponds to the list of contactperson.value of the result represented in the internal model
* - contactgroup of type List<String> (only for results of type other) to store the information for the contact group.
*   It corresponds to the list of contactgroup.value of the result represented in the internal model
* - tool of type List<String> (only for results of type other) to store information about tools useful for the
*   interpretation and/or re-use of the research product. It corresponds to the list of tool.value in the result
*   represented in the internal model
* - size of type String (only for results of type dataset) to store the size of the dataset. It corresponds to
*   size.value in the result represented in the internal model
* - version of type String (only for results of type dataset) to store the version. It corresponds to version.value of
*   the result represented in the internal model
* - geolocation of type List<eu.dnetlib.dhp.schema.dump.oaf.GeoLocation> (only for results of type dataset) to store
*   geolocation information. For each geolocation element in the result represented in the internal model a
*   GeoLocation in the external model is produced
* - id of type String to store the OpenAIRE id of the result. It corresponds to the id of the result represented in
*   the internal model
* - originalId of type List<String> to store the original ids of the result. It corresponds to the originalId of the
*   result represented in the internal model
* - pid of type List<eu.dnetlib.dhp.schema.dump.oaf.ControlledField> to store the persistent identifiers for the
*   result. For each pid in the result represented in the internal model one pid in the external model is produced:
*   scheme corresponds to pid.qualifier.classid, value corresponds to pid.value
* - dateofcollection of type String to store information about the time OpenAIRE collected the record. It corresponds
*   to dateofcollection of the result represented in the internal model
* - lastupdatetimestamp of type Long to store the timestamp of the last update of the record. It corresponds to
*   lastupdatetimestamp of the record represented in the internal model
*/
public class Result implements Serializable {
private List<Author> author;
// resulttype allows subclassing results into publications | datasets | software
private String type; // resulttype
// common fields
private Qualifier language;
private List<Country> country;
private List<Subject> subjects;
private String maintitle;
private String subtitle;
private List<String> description;
private String publicationdate; // dateofacceptance;
private String publisher;
private String embargoenddate;
private List<String> source;
private List<String> format;
private List<String> contributor;
private List<String> coverage;
private AccessRight bestaccessright;
private List<Instance> instance;
private Container container;// Journal
private List<String> documentationUrl; // software
private String codeRepositoryUrl; // software
private String programmingLanguage; // software
private List<String> contactperson; // orp
private List<String> contactgroup; // orp
private List<String> tool; // orp
private String size; // dataset
private String version; // dataset
private List<GeoLocation> geolocation; // dataset
private String id;
private List<String> originalId;
private List<ControlledField> pid;
private String dateofcollection;
private Long lastupdatetimestamp;
public Long getLastupdatetimestamp() {
return lastupdatetimestamp;
}
public void setLastupdatetimestamp(Long lastupdatetimestamp) {
this.lastupdatetimestamp = lastupdatetimestamp;
}
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public List<String> getOriginalId() {
return originalId;
}
public void setOriginalId(List<String> originalId) {
this.originalId = originalId;
}
public List<ControlledField> getPid() {
return pid;
}
public void setPid(List<ControlledField> pid) {
this.pid = pid;
}
public String getDateofcollection() {
return dateofcollection;
}
public void setDateofcollection(String dateofcollection) {
this.dateofcollection = dateofcollection;
}
public List<Author> getAuthor() {
return author;
}
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
public Container getContainer() {
return container;
}
public void setContainer(Container container) {
this.container = container;
}
public void setAuthor(List<Author> author) {
this.author = author;
}
public Qualifier getLanguage() {
return language;
}
public void setLanguage(Qualifier language) {
this.language = language;
}
public List<Country> getCountry() {
return country;
}
public void setCountry(List<Country> country) {
this.country = country;
}
public List<Subject> getSubjects() {
return subjects;
}
public void setSubjects(List<Subject> subjects) {
this.subjects = subjects;
}
public String getMaintitle() {
return maintitle;
}
public void setMaintitle(String maintitle) {
this.maintitle = maintitle;
}
public String getSubtitle() {
return subtitle;
}
public void setSubtitle(String subtitle) {
this.subtitle = subtitle;
}
public List<String> getDescription() {
return description;
}
public void setDescription(List<String> description) {
this.description = description;
}
public String getPublicationdate() {
return publicationdate;
}
public void setPublicationdate(String publicationdate) {
this.publicationdate = publicationdate;
}
public String getPublisher() {
return publisher;
}
public void setPublisher(String publisher) {
this.publisher = publisher;
}
public String getEmbargoenddate() {
return embargoenddate;
}
public void setEmbargoenddate(String embargoenddate) {
this.embargoenddate = embargoenddate;
}
public List<String> getSource() {
return source;
}
public void setSource(List<String> source) {
this.source = source;
}
public List<String> getFormat() {
return format;
}
public void setFormat(List<String> format) {
this.format = format;
}
public List<String> getContributor() {
return contributor;
}
public void setContributor(List<String> contributor) {
this.contributor = contributor;
}
public List<String> getCoverage() {
return coverage;
}
public void setCoverage(List<String> coverage) {
this.coverage = coverage;
}
public AccessRight getBestaccessright() {
return bestaccessright;
}
public void setBestaccessright(AccessRight bestaccessright) {
this.bestaccessright = bestaccessright;
}
public List<Instance> getInstance() {
return instance;
}
public void setInstance(List<Instance> instance) {
this.instance = instance;
}
public List<String> getDocumentationUrl() {
return documentationUrl;
}
public void setDocumentationUrl(List<String> documentationUrl) {
this.documentationUrl = documentationUrl;
}
public String getCodeRepositoryUrl() {
return codeRepositoryUrl;
}
public void setCodeRepositoryUrl(String codeRepositoryUrl) {
this.codeRepositoryUrl = codeRepositoryUrl;
}
public String getProgrammingLanguage() {
return programmingLanguage;
}
public void setProgrammingLanguage(String programmingLanguage) {
this.programmingLanguage = programmingLanguage;
}
public List<String> getContactperson() {
return contactperson;
}
public void setContactperson(List<String> contactperson) {
this.contactperson = contactperson;
}
public List<String> getContactgroup() {
return contactgroup;
}
public void setContactgroup(List<String> contactgroup) {
this.contactgroup = contactgroup;
}
public List<String> getTool() {
return tool;
}
public void setTool(List<String> tool) {
this.tool = tool;
}
public String getSize() {
return size;
}
public void setSize(String size) {
this.size = size;
}
public String getVersion() {
return version;
}
public void setVersion(String version) {
this.version = version;
}
public List<GeoLocation> getGeolocation() {
return geolocation;
}
public void setGeolocation(List<GeoLocation> geolocation) {
this.geolocation = geolocation;
}
}
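
A minimal usage sketch (not part of the original sources) assembling a dump Result using only types defined in this diff (Qualifier, GeoLocation); every value is invented:

import java.util.Arrays;

import eu.dnetlib.dhp.schema.dump.oaf.GeoLocation;
import eu.dnetlib.dhp.schema.dump.oaf.Qualifier;
import eu.dnetlib.dhp.schema.dump.oaf.Result;

public class ResultExample {
    public static void main(String[] args) {
        Result result = new Result();
        result.setId("50|doi_________::0000000000000000000000000000dead"); // invented OpenAIRE id
        result.setType("dataset"); // resulttype.classname
        result.setMaintitle("An example dataset");
        result.setPublicationdate("2019-06-01");
        result.setLanguage(Qualifier.newInstance("eng", "English"));
        result.setDescription(Arrays.asList("A made-up description."));

        GeoLocation gl = new GeoLocation();
        gl.setPlace("Athens");
        result.setGeolocation(Arrays.asList(gl)); // dataset-only field
        System.out.println(result.getMaintitle());
    }
}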


@ -1,34 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf;
import java.io.Serializable;
/**
* To represent keywords associated to the result. It has two parameters:
* - subject of type eu.dnetlib.dhp.schema.dump.oaf.ControlledField to describe the subject. It is mapped as:
*   scheme corresponds to qualifier.classid of the dumped subject, value corresponds to the subject value
* - provenance of type eu.dnetlib.dhp.schema.dump.oaf.Provenance to represent the provenance of the subject. It is
*   dumped only if dataInfo is not null. In this case: provenance corresponds to dataInfo.provenanceaction.classname,
*   trust corresponds to dataInfo.trust
*/
public class Subject implements Serializable {
private ControlledField subject;
private Provenance provenance;
public ControlledField getSubject() {
return subject;
}
public void setSubject(ControlledField subject) {
this.subject = subject;
}
public Provenance getProvenance() {
return provenance;
}
public void setProvenance(Provenance provenance) {
this.provenance = provenance;
}
}


@ -1,51 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf.community;
import java.util.List;
import eu.dnetlib.dhp.schema.dump.oaf.KeyValue;
import eu.dnetlib.dhp.schema.dump.oaf.Result;
/**
* Extends eu.dnetlib.dhp.schema.dump.oaf.Result with the following parameters:
* - projects of type List<eu.dnetlib.dhp.schema.dump.oaf.community.Project> to store the list of projects related to
*   the result. The information is added after the result is mapped to the external model
* - context of type List<eu.dnetlib.dhp.schema.dump.oaf.community.Context> to store information about the RC/RI
*   related to the result. For each context in the result represented in the internal model one context in the
*   external model is produced
* - collectedfrom of type List<eu.dnetlib.dhp.schema.dump.oaf.KeyValue> to store information about the sources from
*   which the record has been collected. For each collectedfrom in the result represented in the internal model one
*   collectedfrom in the external model is produced
*/
public class CommunityResult extends Result {
private List<Project> projects;
private List<Context> context;
protected List<KeyValue> collectedfrom;
public List<KeyValue> getCollectedfrom() {
return collectedfrom;
}
public void setCollectedfrom(List<KeyValue> collectedfrom) {
this.collectedfrom = collectedfrom;
}
public List<Project> getProjects() {
return projects;
}
public void setProjects(List<Project> projects) {
this.projects = projects;
}
public List<Context> getContext() {
return context;
}
public void setContext(List<Context> context) {
this.context = context;
}
}
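
A minimal usage sketch (not part of the original sources) of the enrichment described above, where contexts and collectedfrom entries are added after the Result fields have been mapped; all identifiers, labels and trust values are invented:

import java.util.Arrays;

import eu.dnetlib.dhp.schema.dump.oaf.KeyValue;
import eu.dnetlib.dhp.schema.dump.oaf.Provenance;
import eu.dnetlib.dhp.schema.dump.oaf.community.CommunityResult;
import eu.dnetlib.dhp.schema.dump.oaf.community.Context;

public class CommunityResultExample {
    public static void main(String[] args) {
        CommunityResult cr = new CommunityResult();

        // context: code is the part of the context id before "::", label comes from the RC/RI profile
        Context context = new Context();
        context.setCode("egi"); // illustrative context id
        context.setLabel("EGI Federation");
        context.setProvenance(Arrays.asList(Provenance.newInstance("Inferred by OpenAIRE", "0.8")));
        cr.setContext(Arrays.asList(context));

        // collectedfrom: one entry per source the record was collected from (invented values)
        cr.setCollectedfrom(Arrays.asList(KeyValue.newInstance("10|openaire____::9999", "Example Datasource")));
        System.out.println(cr.getContext().get(0).getCode());
    }
}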


@ -1,40 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf.community;
import java.util.List;
import java.util.Objects;
import eu.dnetlib.dhp.schema.dump.oaf.Provenance;
import eu.dnetlib.dhp.schema.dump.oaf.Qualifier;
/**
* Reference to a relevant research infrastructure, initiative or community (RI/RC) among those collaborating with
* OpenAIRE. It extends eu.dnetlib.dhp.schema.dump.oaf.Qualifier with a parameter provenance of type
* List<eu.dnetlib.dhp.schema.dump.oaf.Provenance> to store the provenances of the association between the result and
* the RC/RI. The values for this element correspond to:
* - code: it corresponds to the id of the context in the result to be mapped. If the context id refers to an RC/RI and
*   contains '::', only the part of the id before the first "::" will be used as value for code
* - label: it corresponds to the label associated to the id. The information is taken from the profile of the RC/RI
* - provenance: it is set only if the dataInfo associated to the context element of the result to be dumped is not
*   null. For each dataInfo one instance of type eu.dnetlib.dhp.schema.dump.oaf.Provenance is instantiated if the
*   element dataInfo.provenanceaction is not null. In this case provenance corresponds to
*   dataInfo.provenanceaction.classname and trust corresponds to dataInfo.trust
*/
public class Context extends Qualifier {
private List<Provenance> provenance;
public List<Provenance> getProvenance() {
return provenance;
}
public void setProvenance(List<Provenance> provenance) {
this.provenance = provenance;
}
@Override
public int hashCode() {
// accumulate the provenance entries so that they actually contribute to the hash
final StringBuilder sb = new StringBuilder();
if (provenance != null) {
provenance.forEach(p -> sb.append(p.toString()));
}
return Objects.hash(getCode(), getLabel(), sb.toString());
}
}


@ -1,52 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf.community;
import java.io.Serializable;
/**
* To store information about the funder funding the project related to the result. It has the following parameters:
* - shortName of type String to store the funder short name (e.g. AKA)
* - name of type String to store the funder name (e.g. Academy of Finland)
* - fundingStream of type String to store the funding stream
* - jurisdiction of type String to store the jurisdiction of the funder
*/
public class Funder implements Serializable {
private String shortName;
private String name;
private String fundingStream;
private String jurisdiction;
public String getJurisdiction() {
return jurisdiction;
}
public void setJurisdiction(String jurisdiction) {
this.jurisdiction = jurisdiction;
}
public String getShortName() {
return shortName;
}
public void setShortName(String shortName) {
this.shortName = shortName;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getFundingStream() {
return fundingStream;
}
public void setFundingStream(String fundingStream) {
this.fundingStream = fundingStream;
}
}


@ -1,88 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf.community;
import java.io.Serializable;
import eu.dnetlib.dhp.schema.dump.oaf.Provenance;
/**
* To store information about the project related to the result. This information is not directly mapped from the
* result represented in the internal model, because it is not there. The mapped result is enriched with project
* information derived from the relations between results and projects. The Project class has the following parameters:
* - id of type String to store the OpenAIRE id of the project
* - code of type String to store the grant agreement
* - acronym of type String to store the acronym of the project
* - title of type String to store the title of the project
* - funder of type eu.dnetlib.dhp.schema.dump.oaf.community.Funder to store information about the funder funding the
*   project
* - provenance of type eu.dnetlib.dhp.schema.dump.oaf.Provenance to store information about the provenance of the
*   association between the result and the project
*/
public class Project implements Serializable {
private String id;// OpenAIRE id
private String code;
private String acronym;
private String title;
private Funder funder;
private Provenance provenance;
public Provenance getProvenance() {
return provenance;
}
public void setProvenance(Provenance provenance) {
this.provenance = provenance;
}
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getCode() {
return code;
}
public void setCode(String code) {
this.code = code;
}
public String getAcronym() {
return acronym;
}
public void setAcronym(String acronym) {
this.acronym = acronym;
}
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
public Funder getFunder() {
return funder;
}
public void setFunder(Funder funders) {
this.funder = funders;
}
public static Project newInstance(String id, String code, String acronym, String title, Funder funder) {
Project project = new Project();
project.setAcronym(acronym);
project.setCode(code);
project.setFunder(funder);
project.setId(id);
project.setTitle(title);
return project;
}
}
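
A minimal usage sketch (not part of the original sources) combining Funder, Provenance and the newInstance factory; the grant number, names, ids and trust value are invented:

import eu.dnetlib.dhp.schema.dump.oaf.Provenance;
import eu.dnetlib.dhp.schema.dump.oaf.community.Funder;
import eu.dnetlib.dhp.schema.dump.oaf.community.Project;

public class ProjectExample {
    public static void main(String[] args) {
        Funder funder = new Funder();
        funder.setShortName("AKA");
        funder.setName("Academy of Finland");
        funder.setFundingStream("Academy Project"); // invented funding stream
        funder.setJurisdiction("FI");

        Project project = Project.newInstance(
            "40|aka_________::0000000000000000000000000000beef", // invented OpenAIRE id
            "123456", "ACRO", "An example project title", funder);
        project.setProvenance(Provenance.newInstance("Harvested", "0.9"));
        System.out.println(project.getFunder().getShortName());
    }
}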


@ -1,21 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf.graph;
import java.io.Serializable;
public class Constants implements Serializable {
// collectedFrom goes with isProvidedBy -> taken from ModelSupport
public static final String HOSTED_BY = "isHostedBy";
public static final String HOSTS = "hosts";
// community result uses isrelatedto
public static final String RESULT_ENTITY = "result";
public static final String DATASOURCE_ENTITY = "datasource";
public static final String CONTEXT_ENTITY = "context";
public static final String CONTEXT_ID = "60";
public static final String CONTEXT_NS_PREFIX = "context____";
}


@ -1,316 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf.graph;
import java.io.Serializable;
import java.util.List;
import eu.dnetlib.dhp.schema.dump.oaf.Container;
import eu.dnetlib.dhp.schema.dump.oaf.ControlledField;
import eu.dnetlib.dhp.schema.dump.oaf.KeyValue;
/**
* To store information about the datasource OpenAIRE collects information from. It contains the following parameters:
* - id of type String to store the OpenAIRE id of the datasource. It corresponds to the parameter id of the datasource
*   represented in the internal model
* - originalId of type List<String> to store the list of original ids associated to the datasource. It corresponds to
*   the parameter originalId of the datasource represented in the internal model. Null values are filtered out
* - pid of type List<eu.dnetlib.dhp.schema.dump.oaf.ControlledField> to store the persistent identifiers of the
*   datasource. For each pid in the datasource represented in the internal model one pid in the external model is
*   produced: scheme corresponds to pid.qualifier.classid, value corresponds to pid.value of the datasource
*   represented in the internal model
* - datasourcetype of type eu.dnetlib.dhp.schema.dump.oaf.ControlledField to store the datasource type (e.g.
*   pubsrepository::institutional, Institutional Repository) as in the dnet vocabulary dnet:datasource_typologies. It
*   corresponds to datasourcetype of the datasource represented in the internal model: code corresponds to
*   datasourcetype.classid, value corresponds to datasourcetype.classname
* - openairecompatibility of type String to store information about the OpenAIRE compatibility of the ingested results
*   (which guidelines they are compliant to). It corresponds to openairecompatibility.classname of the datasource
*   represented in the internal model
* - officialname of type String to store the official name of the datasource. It corresponds to officialname.value of
*   the datasource represented in the internal model
* - englishname of type String to store the English name of the datasource. It corresponds to englishname.value of the
*   datasource represented in the internal model
* - websiteurl of type String to store the URL of the website of the datasource. It corresponds to websiteurl.value of
*   the datasource represented in the internal model
* - logourl of type String to store the URL of the logo of the datasource. It corresponds to logourl.value of the
*   datasource represented in the internal model
* - dateofvalidation of type String to store the date of validation against the guidelines for the datasource records.
*   It corresponds to dateofvalidation.value of the datasource represented in the internal model
* - description of type String to store the description of the datasource. It corresponds to description.value of the
*   datasource represented in the internal model
*/
public class Datasource implements Serializable {
private String id; // string
private List<String> originalId; // list string
private List<ControlledField> pid; // List<ControlledField>
private ControlledField datasourcetype; // value
private String openairecompatibility; // value
private String officialname; // string
private String englishname; // string
private String websiteurl; // string
private String logourl; // string
private String dateofvalidation; // string
private String description; // description
private List<String> subjects; // List<String>
// opendoar specific fields (od*)
private List<String> languages; // odlanguages List<String>
private List<String> contenttypes; // odcontent types List<String>
// re3data fields
private String releasestartdate; // string
private String releaseenddate; // string
private String missionstatementurl; // string
// {open, restricted or closed}
private String accessrights; // databaseaccesstype string
// {open, restricted or closed}
private String uploadrights; // datauploadtype string
// {feeRequired, registration, other}
private String databaseaccessrestriction; // string
// {feeRequired, registration, other}
private String datauploadrestriction; // string
private Boolean versioning; // boolean
private String citationguidelineurl; // string
// {yes, no, unknown}
private String pidsystems; // string
private String certificates; // string
private List<Object> policies; //
private Container journal; // issn etc. of the Journal
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public List<String> getOriginalId() {
return originalId;
}
public void setOriginalId(List<String> originalId) {
this.originalId = originalId;
}
public List<ControlledField> getPid() {
return pid;
}
public void setPid(List<ControlledField> pid) {
this.pid = pid;
}
public ControlledField getDatasourcetype() {
return datasourcetype;
}
public void setDatasourcetype(ControlledField datasourcetype) {
this.datasourcetype = datasourcetype;
}
public String getOpenairecompatibility() {
return openairecompatibility;
}
public void setOpenairecompatibility(String openairecompatibility) {
this.openairecompatibility = openairecompatibility;
}
public String getOfficialname() {
return officialname;
}
public void setOfficialname(String officialname) {
this.officialname = officialname;
}
public String getEnglishname() {
return englishname;
}
public void setEnglishname(String englishname) {
this.englishname = englishname;
}
public String getWebsiteurl() {
return websiteurl;
}
public void setWebsiteurl(String websiteurl) {
this.websiteurl = websiteurl;
}
public String getLogourl() {
return logourl;
}
public void setLogourl(String logourl) {
this.logourl = logourl;
}
public String getDateofvalidation() {
return dateofvalidation;
}
public void setDateofvalidation(String dateofvalidation) {
this.dateofvalidation = dateofvalidation;
}
public String getDescription() {
return description;
}
public void setDescription(String description) {
this.description = description;
}
public List<String> getSubjects() {
return subjects;
}
public void setSubjects(List<String> subjects) {
this.subjects = subjects;
}
public List<String> getLanguages() {
return languages;
}
public void setLanguages(List<String> languages) {
this.languages = languages;
}
public List<String> getContenttypes() {
return contenttypes;
}
public void setContenttypes(List<String> contenttypes) {
this.contenttypes = contenttypes;
}
public String getReleasestartdate() {
return releasestartdate;
}
public void setReleasestartdate(String releasestartdate) {
this.releasestartdate = releasestartdate;
}
public String getReleaseenddate() {
return releaseenddate;
}
public void setReleaseenddate(String releaseenddate) {
this.releaseenddate = releaseenddate;
}
public String getMissionstatementurl() {
return missionstatementurl;
}
public void setMissionstatementurl(String missionstatementurl) {
this.missionstatementurl = missionstatementurl;
}
public String getAccessrights() {
return accessrights;
}
public void setAccessrights(String accessrights) {
this.accessrights = accessrights;
}
public String getUploadrights() {
return uploadrights;
}
public void setUploadrights(String uploadrights) {
this.uploadrights = uploadrights;
}
public String getDatabaseaccessrestriction() {
return databaseaccessrestriction;
}
public void setDatabaseaccessrestriction(String databaseaccessrestriction) {
this.databaseaccessrestriction = databaseaccessrestriction;
}
public String getDatauploadrestriction() {
return datauploadrestriction;
}
public void setDatauploadrestriction(String datauploadrestriction) {
this.datauploadrestriction = datauploadrestriction;
}
public Boolean getVersioning() {
return versioning;
}
public void setVersioning(Boolean versioning) {
this.versioning = versioning;
}
public String getCitationguidelineurl() {
return citationguidelineurl;
}
public void setCitationguidelineurl(String citationguidelineurl) {
this.citationguidelineurl = citationguidelineurl;
}
public String getPidsystems() {
return pidsystems;
}
public void setPidsystems(String pidsystems) {
this.pidsystems = pidsystems;
}
public String getCertificates() {
return certificates;
}
public void setCertificates(String certificates) {
this.certificates = certificates;
}
public List<Object> getPolicies() {
return policies;
}
public void setPolicies(List<Object> policiesr3) {
this.policies = policiesr3;
}
public Container getJournal() {
return journal;
}
public void setJournal(Container journal) {
this.journal = journal;
}
}


@ -1,54 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf.graph;
import java.io.Serializable;
/**
* To store information about the funder funding the project related to the result. It has the following parameters:
* - private String shortName to store the short name of the funder (e.g. AKA)
* - private String name to store information about the name of the funder (e.g. Academy of Finland)
* - private Fundings funding_stream to store the funding stream
* - private String jurisdiction to store information about the jurisdiction of the funder
*/
public class Funder implements Serializable {
private String shortName;
private String name;
private Fundings funding_stream;
private String jurisdiction;
public String getShortName() {
return shortName;
}
public void setShortName(String shortName) {
this.shortName = shortName;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getJurisdiction() {
return jurisdiction;
}
public void setJurisdiction(String jurisdiction) {
this.jurisdiction = jurisdiction;
}
public Fundings getFunding_stream() {
return funding_stream;
}
public void setFunding_stream(Fundings funding_stream) {
this.funding_stream = funding_stream;
}
}


@ -1,35 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf.graph;
import java.io.Serializable;
/**
* To store information about the funding stream. It has two parameters:
* - private String id to store the id of the funding stream. The id is created by joining the short name of the funder
* with the name of each level in the xml representing the funding stream. For example: if the funder is the
* European Commission, the funding level 0 name is FP7, the funding level 1 name is SP3 and the funding level 2 name is
* PEOPLE then the id will be: EC::FP7::SP3::PEOPLE
* - private String description to describe the funding stream. It is created by concatenating the description of each
* funding level, so for the example above the description would be: SEVENTH FRAMEWORK PROGRAMME - SP3-People - Marie-Curie Actions
*/
public class Fundings implements Serializable {
private String id;
private String description;
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getDescription() {
return description;
}
public void setDescription(String description) {
this.description = description;
}
}
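
A minimal usage sketch (not part of the original sources) that reproduces the funding stream example given in the Javadoc above:

import eu.dnetlib.dhp.schema.dump.oaf.graph.Fundings;

public class FundingsExample {
    public static void main(String[] args) {
        // values taken from the example in the Javadoc above
        Fundings fundingStream = new Fundings();
        fundingStream.setId("EC::FP7::SP3::PEOPLE");
        fundingStream.setDescription("SEVENTH FRAMEWORK PROGRAMME - SP3-People - Marie-Curie Actions");
        System.out.println(fundingStream.getId());
    }
}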


@ -1,56 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf.graph;
import java.io.Serializable;
import java.util.Optional;
/**
* To describe the funded amount. It has the following parameters:
* - private String currency to store the currency of the fund
* - private float totalcost to store the total cost of the project
* - private float fundedamount to store the funded amount by the funder
*/
public class Granted implements Serializable {
private String currency;
private float totalcost;
private float fundedamount;
public String getCurrency() {
return currency;
}
public void setCurrency(String currency) {
this.currency = currency;
}
public float getTotalcost() {
return totalcost;
}
public void setTotalcost(float totalcost) {
this.totalcost = totalcost;
}
public float getFundedamount() {
return fundedamount;
}
public void setFundedamount(float fundedamount) {
this.fundedamount = fundedamount;
}
public static Granted newInstance(String currency, float totalcost, float fundedamount) {
Granted granted = new Granted();
granted.currency = currency;
granted.totalcost = totalcost;
granted.fundedamount = fundedamount;
return granted;
}
public static Granted newInstance(String currency, float fundedamount) {
Granted granted = new Granted();
granted.currency = currency;
granted.fundedamount = fundedamount;
return granted;
}
}
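
A minimal usage sketch (not part of the original sources) of the two Granted factory methods; the currency and amounts are invented:

import eu.dnetlib.dhp.schema.dump.oaf.graph.Granted;

public class GrantedExample {
    public static void main(String[] args) {
        // invented amounts, expressed in EUR
        Granted fullyDescribed = Granted.newInstance("EUR", 1_500_000f, 1_200_000f);
        Granted fundedOnly = Granted.newInstance("EUR", 800_000f);
        System.out.println(fullyDescribed.getFundedamount() + " / " + fundedOnly.getFundedamount());
    }
}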


@ -1,82 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf.graph;
import java.io.Serializable;
/**
* To store information about the classification for the project. The classification depends on the programme. For example
* H2020-EU.3.4.5.3 can be classified as
* H2020-EU.3. => Societal Challenges (level1)
* H2020-EU.3.4. => Transport (level2)
* H2020-EU.3.4.5. => CLEANSKY2 (level3)
* H2020-EU.3.4.5.3. => IADP Fast Rotorcraft (level4)
*
* We decided to explicitly represent up to three levels in the classification.
*
* H2020Classification has the following parameters:
* - private Programme programme to store the information about the programme related to this classification
* - private String level1 to store the information about the level 1 of the classification (Priority or Pillar of the EC)
* - private String level2 to store the information about the level 2 of the classification (Objectives (?))
* - private String level3 to store the information about the level3 of the classification
* - private String classification to store the entire classification related to the programme
*/
public class H2020Classification implements Serializable {
private Programme programme;
private String level1;
private String level2;
private String level3;
private String classification;
public Programme getProgramme() {
return programme;
}
public void setProgramme(Programme programme) {
this.programme = programme;
}
public String getLevel1() {
return level1;
}
public void setLevel1(String level1) {
this.level1 = level1;
}
public String getLevel2() {
return level2;
}
public void setLevel2(String level2) {
this.level2 = level2;
}
public String getLevel3() {
return level3;
}
public void setLevel3(String level3) {
this.level3 = level3;
}
public String getClassification() {
return classification;
}
public void setClassification(String classification) {
this.classification = classification;
}
public static H2020Classification newInstance(String programme_code, String programme_description, String level1,
String level2, String level3, String classification) {
H2020Classification h2020classification = new H2020Classification();
h2020classification.programme = Programme.newInstance(programme_code, programme_description);
h2020classification.level1 = level1;
h2020classification.level2 = level2;
h2020classification.level3 = level3;
h2020classification.classification = classification;
return h2020classification;
}
}
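
A minimal usage sketch (not part of the original sources) that rebuilds the H2020-EU.3.4.5.3. example from the Javadoc via the newInstance factory; the format of the full classification string is illustrative:

import eu.dnetlib.dhp.schema.dump.oaf.graph.H2020Classification;

public class H2020ClassificationExample {
    public static void main(String[] args) {
        // levels taken from the example in the Javadoc above
        H2020Classification classification = H2020Classification.newInstance(
            "H2020-EU.3.4.5.3.", "IADP Fast Rotorcraft", // programme code and description
            "Societal Challenges",                        // level1
            "Transport",                                  // level2
            "CLEANSKY2",                                  // level3
            "H2020-EU.3.4.5.3. - IADP Fast Rotorcraft");  // full classification (illustrative formatting)
        System.out.println(classification.getLevel1());
    }
}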


@ -1,41 +0,0 @@
package eu.dnetlib.dhp.schema.dump.oaf.graph;
import java.io.Serializable;
/**
* To represent the generic node in a relation. It has the following parameters:
* - private String id to store the OpenAIRE id of the entity in the relation
* - private String type to store the type of the entity in the relation.
*
* Consider the generic relation between a Result R and a Project P: the node representing R will have
* as id the id of R and as type result, while the node representing the project will have as id the id of the project
* and as type project
*/
public class Node implements Serializable {
private String id;
private String type;
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getType() {
return type;
}
public void setType(String type) {
this.type = type;
}
public static Node newInstance(String id, String type) {
Node node = new Node();
node.id = id;
node.type = type;
return node;
}
}
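
A minimal usage sketch (not part of the original sources) instantiating the two nodes of the result-project relation described in the Javadoc; the identifiers are invented:

import eu.dnetlib.dhp.schema.dump.oaf.graph.Node;

public class NodeExample {
    public static void main(String[] args) {
        // invented OpenAIRE ids for a result R and a project P
        Node source = Node.newInstance("50|doi_________::0000000000000000000000000000cafe", "result");
        Node target = Node.newInstance("40|corda_______::0000000000000000000000000000f00d", "project");
        System.out.println(source.getType() + " -> " + target.getType());
    }
}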

Some files were not shown because too many files have changed in this diff.