Sandro La Bruzzo
3d8e2aa146
Code refactor:
...
- removed old workflows in doiboost
- splitted workflow of doiboost in preprocess and process
2021-07-14 14:37:06 +02:00
Sandro La Bruzzo
c35c117601
fixed process doiboost workflow:
...
- splitted OrcidToOAF into two phase preprocess and process
- updated workflow used in production
2021-07-14 12:48:01 +02:00
Sandro La Bruzzo
bbe8193930
merged stable ids
2021-07-12 17:00:43 +02:00
Miriam Baglioni
7177c25261
added check for null value during doi normalization
2021-07-05 16:22:38 +02:00
Miriam Baglioni
0892cad4e8
the normalization of the content of value was not visible outside the block. Moved doi normalization operation while returning value
2021-07-05 16:21:42 +02:00
Sandro La Bruzzo
c6fa8598e1
massive code refactor:
...
removed modules dhp-*-scholexplorer
2021-07-01 22:13:45 +02:00
Miriam Baglioni
cf758f4f91
added normalization step for the doi
2021-06-30 10:03:15 +02:00
Miriam Baglioni
801763a0fa
there is no more the need to lower case the doi since it is done in the first step. Also changed the creation of the id by using the factory
2021-06-29 19:07:23 +02:00
Miriam Baglioni
a74de1cda2
added normalization step to the doi
2021-06-29 18:51:11 +02:00
Miriam Baglioni
06074ea7d3
added normalization step to the doi
2021-06-29 18:46:08 +02:00
Miriam Baglioni
8b8ffe82dc
added step of normalization for the doi
2021-06-29 18:41:39 +02:00
Miriam Baglioni
50cc21d92e
Added method to normalize doi values (lower case, remove all preceeding 10., filtering out doi not starting with 10.)
2021-06-29 18:35:28 +02:00
Sandro La Bruzzo
80e15cc455
implemented mapping from uniprot, pdb and ebi links
2021-06-24 17:20:00 +02:00
Sandro La Bruzzo
a167543637
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer
2021-06-21 09:14:11 +02:00
Miriam Baglioni
464c2ddde3
changed to split in two steps the generation of the crossref dataset
2021-06-18 09:42:31 +02:00
Miriam Baglioni
6aca0d8ebb
added kryo encoding for input files
2021-06-18 09:42:07 +02:00
Sandro La Bruzzo
3100166d29
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-16 16:22:16 +02:00
Miriam Baglioni
9f9dd00b94
refactoring
2021-06-15 09:24:46 +02:00
Miriam Baglioni
63d74ee379
refactoring
2021-06-15 09:24:11 +02:00
Miriam Baglioni
f7379255b6
changed the workflow to extract info from the dump
2021-06-15 09:22:54 +02:00
Miriam Baglioni
d6e21bb6ea
creates the crossref dataset used for doiboost together with unpacking part from tar
2021-06-14 17:27:19 +02:00
Miriam Baglioni
ce0cfd79e0
creates the crossref dataset used for doiboost
2021-06-14 13:40:19 +02:00
Miriam Baglioni
93efe4de82
split the construction of crossref dataset in two parts. This one just unpacks the tar entries
2021-06-14 13:39:40 +02:00
Sandro La Bruzzo
efbea1e01a
minor fix
2021-06-14 09:45:14 +02:00
Miriam Baglioni
75780fc636
extraction of the tar for the dump of crossref, and creation of the dataset
2021-06-14 09:45:07 +02:00
Miriam Baglioni
8d2e086e48
changes to avoid reassignment to val
2021-06-07 17:50:37 +02:00
Miriam Baglioni
f33521d338
Aggiornare 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
...
to be able to replace the aboject assigned to author val has been replaced by var
2021-06-07 17:27:07 +02:00
Miriam Baglioni
bc12e9819e
Aggiornare 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
...
The change is to fix the issue that arises when the same work appears more than once on the same ORCID profile. The change avoid to replicate the association doi -> author when the orcid id is already associated to the doi.
2021-06-07 16:37:01 +02:00
Claudio Atzori
5e4b91d9ef
more pervasive use of constants from ModelConstants, especially for ORCID
2021-05-26 18:20:23 +02:00
Claudio Atzori
c4a23c2f4d
fix: preserving the old identifier among the originalIds in the doiboost construction process, trying to avoid UnsupportedOperationException while adding elements to the originalIds
2021-05-19 16:01:52 +02:00
Claudio Atzori
ba03f549d7
fix: preserving the old identifier among the originalIds in the doiboost construction process
2021-05-19 15:43:26 +02:00
Claudio Atzori
2cbf15f4fb
using ModelConstants
2021-05-17 09:54:45 +02:00
Claudio Atzori
f19feceaf0
set the old identifier before switching to the new one
2021-05-14 12:53:40 +02:00
Claudio Atzori
1bd70fa2c6
preserving the old identifier among the originalIds in the doiboost construction process
2021-05-14 11:30:41 +02:00
Claudio Atzori
ca3f3a7687
using ModelConstants
2021-05-14 11:29:49 +02:00
Claudio Atzori
23b8883ab1
applied intellij code cleanup
2021-05-14 10:58:12 +02:00
Claudio Atzori
5afa7d3e0c
core utilities in dhp-common moved in external module dhp-schemas
2021-04-27 15:44:01 +02:00
Sandro La Bruzzo
a16e5299f9
applied unique function on the final dataset
2021-04-16 17:36:48 +02:00
Sandro La Bruzzo
67085da305
fixed NPE
2021-04-16 11:05:58 +02:00
Sandro La Bruzzo
7d6a80e2f2
added new type on MAG mapping
2021-04-16 09:14:15 +02:00
Sandro La Bruzzo
479abd10cb
Add into ORCID workflow a method that extracts orcid directly to the dump generated by Enrico
2021-04-13 17:47:43 +02:00
Claudio Atzori
e686b8de8d
[ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98
2021-04-01 17:11:03 +02:00
Claudio Atzori
ee34cc51c3
[ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98
2021-04-01 17:07:49 +02:00
Claudio Atzori
7941d7be29
WIP: using common definitions from ModelConstants
2021-03-31 18:33:57 +02:00
Sandro La Bruzzo
616d2ecce2
splitted workflow collecting datacite into two workflows.
...
Released on beta
2021-03-31 15:45:58 +02:00
Sandro La Bruzzo
1dfda3624e
improved workflow importing datacite
2021-03-26 13:56:29 +01:00
Sandro La Bruzzo
625e4c29c4
added model constants
2021-03-23 09:39:56 +01:00
Sandro La Bruzzo
c392936b97
fixed error on best access right
2021-03-23 09:23:22 +01:00
Sandro La Bruzzo
c73072079d
fix conflicts
2021-03-22 16:36:31 +01:00
Sandro La Bruzzo
098914dcff
fix wrong relation with source null
2021-03-22 11:35:02 +01:00