miconis
c39c82dfe9
modification of the jobs for the integration of openorgs in the provision, dedup records are no more created by merging but simply taking results of openorgs portal
2021-04-06 14:31:00 +02:00
Claudio Atzori
37b65cc3ad
Merge pull request 'updates on stats-update workflow' ( #100 ) from antonis.lempesis/dnet-hadoop:master into master
...
The workflow integrated in the _stable_ids_ branch has been run correctly on the BETA content, thus IMO this PR can be integrated in the master branch.
Reviewed-on: D-Net/dnet-hadoop#100
2021-04-02 16:13:35 +02:00
Claudio Atzori
1e7e5180fa
[Graph model] updated definition of ExternalReference: added alternateLabel, removed description ( #6503 )
2021-04-02 12:32:12 +02:00
Claudio Atzori
e686b8de8d
[ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98
2021-04-01 17:11:03 +02:00
Claudio Atzori
ee34cc51c3
[ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98
2021-04-01 17:07:49 +02:00
Claudio Atzori
70e49ed53c
[OpenOrgsWf] trivial refactoring
2021-04-01 10:30:51 +02:00
Claudio Atzori
7941d7be29
WIP: using common definitions from ModelConstants
2021-03-31 18:33:57 +02:00
Claudio Atzori
879e8cc7ef
WIP: using common definitions from ModelConstants
2021-03-31 17:12:01 +02:00
Claudio Atzori
72ce741ea6
WIP: using common definitions from ModelConstants
2021-03-31 17:07:13 +02:00
Enrico Ottonello
59ec5137e1
improvement related to https://issue.openaire.research-infrastructures.eu/issues/6501
2021-03-31 16:25:41 +02:00
Sandro La Bruzzo
616d2ecce2
splitted workflow collecting datacite into two workflows.
...
Released on beta
2021-03-31 15:45:58 +02:00
Claudio Atzori
9237d55d7f
[OpenOrgsWf] cleanup
2021-03-29 17:40:34 +02:00
Claudio Atzori
7f4e9479ec
[OpenOrgsWf] graph construction wf: allow to skip the import openorgs node (importOpenorgs true|false)
2021-03-29 16:59:16 +02:00
miconis
2709d08fc2
Merge branch 'stable_ids' into openorgswf
2021-03-29 16:39:07 +02:00
miconis
f446580e9f
code refactoring (useless classes and wf removed), implementation of the test for the openorgs dedup
2021-03-29 16:10:46 +02:00
Claudio Atzori
a0837ac357
[Stats update] integrating PR#100 for testing D-Net/dnet-hadoop#100
2021-03-29 15:59:58 +02:00
miconis
2355cc4e9b
minor changes and bug fix
2021-03-29 10:07:12 +02:00
Sandro La Bruzzo
1dfda3624e
improved workflow importing datacite
2021-03-26 13:56:29 +01:00
Enrico Ottonello
91d8660982
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-03-25 11:21:20 +01:00
Enrico Ottonello
ebd67b8c8f
removed duplicates orcid data on authors set
2021-03-25 11:20:52 +01:00
Claudio Atzori
827e7e37db
[Cleaning] drop instance.alternateIdentifier elements when they are available among instance.pid
2021-03-25 11:07:59 +01:00
miconis
28c1cdd132
merged stable_ids into openorgswf
2021-03-25 10:44:49 +01:00
miconis
5dfb66b0fa
minor changes
2021-03-25 10:29:34 +01:00
miconis
348b0ef921
bug fix, implementation of the workflow for the creation of raw_organizations (openorgs dedup), addition of the pid lists to the openorgs postgres db
2021-03-24 15:51:27 +01:00
Claudio Atzori
751125fdf9
[Actionmanager] zero function considers empty entity.id as well as rel.source/rel.target
2021-03-23 17:34:32 +01:00
Claudio Atzori
1e423fdc07
[Actionmanager] remove invalid records from the input graph before groupGraphTableByIdAndMerge
2021-03-23 13:39:24 +01:00
Claudio Atzori
e5ebb500cf
fixed pom versions; included missing workflow modules in dhp-workflows/pom.xml
2021-03-23 12:13:53 +01:00
Claudio Atzori
b75ad76f79
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-03-23 09:59:12 +01:00
Claudio Atzori
8db248aa13
avoiding error on jenkins compilations: java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (on a random free port)!
2021-03-23 09:56:34 +01:00
Sandro La Bruzzo
625e4c29c4
added model constants
2021-03-23 09:39:56 +01:00
Claudio Atzori
b4febed138
updated mapping tests as consequence of the special treatment reserved to Handle PIDs
2021-03-23 09:37:48 +01:00
Claudio Atzori
431cbe9955
handle missing instance.pid during bulk cleaning
2021-03-23 09:28:58 +01:00
Sandro La Bruzzo
c392936b97
fixed error on best access right
2021-03-23 09:23:22 +01:00
Sandro La Bruzzo
c73072079d
fix conflicts
2021-03-22 16:36:31 +01:00
Sandro La Bruzzo
098914dcff
fix wrong relation with source null
2021-03-22 11:35:02 +01:00
miconis
0fe40b08e4
addition of deduplication profiles for the results, double check on pids and the title with a lower threshold
2021-03-19 17:12:05 +01:00
miconis
98854b0124
minor changes
2021-03-19 16:57:40 +01:00
Claudio Atzori
5a043e95ea
code formatting
2021-03-19 11:37:27 +01:00
Claudio Atzori
a4e82a65aa
integrated filter applied when merging BETA & PROD graphs to rule our records from Datacite
2021-03-19 11:34:44 +01:00
Claudio Atzori
75144dacb3
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-03-19 09:07:40 +01:00
Claudio Atzori
972d5a3d98
[dedup] Datacite should be authoritative for datasets
2021-03-19 09:04:20 +01:00
Sandro La Bruzzo
25d5663d97
added filter
2021-03-18 10:24:42 +01:00
Sandro La Bruzzo
5f98ea74a9
Added fix for pid generation in stableIds
2021-03-17 15:53:24 +01:00
Sandro La Bruzzo
2be0428047
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-03-17 14:54:28 +01:00
Claudio Atzori
8257f9a2bc
result.pid: adjusted the mapping applied to the contents from the aggregator
2021-03-17 12:45:38 +01:00
Sandro La Bruzzo
7c97a4d900
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-03-17 12:13:03 +01:00
Sandro La Bruzzo
cc5bbafa5d
some fix to make workflows runs
2021-03-17 12:12:56 +01:00
Claudio Atzori
640b885706
added instance.alternativeIdentifiers to the graph model, adjusted the mapping applied to the contents from the aggregator
2021-03-16 14:19:32 +01:00
Claudio Atzori
61a2551e74
migrated last changes from svn (dnet45)
2021-03-15 17:17:55 +01:00
Antonis Lempesis
0ba0a6b9da
update promote wf to support monitor&production
2021-03-12 16:42:59 +02:00
Antonis Lempesis
60ebdf2dbe
update promote wf to support monitor&production
2021-03-12 16:34:53 +02:00
Antonis Lempesis
236435b470
following redirects
2021-03-12 14:11:21 +02:00
Antonis Lempesis
3c75a05044
fixed a ton of typos
2021-03-12 13:47:04 +02:00
Sandro La Bruzzo
4bb3bcafa5
add author sequence number
2021-03-11 11:32:32 +01:00
Sandro La Bruzzo
a8e5d0ea0d
updated test and fixed assign of access right
2021-03-11 10:41:24 +01:00
Sandro La Bruzzo
f5e7c57654
Fixed ticket 6282
2021-03-11 10:32:45 +01:00
Antonis Lempesis
fa1ec5b5e9
fixed typo...
2021-03-10 14:05:58 +02:00
Claudio Atzori
01630f638d
IdentifierFactory implementation based on the list of datasources authoritative for a given pid type
2021-03-09 17:11:50 +01:00
Claudio Atzori
59532b0919
[ #6281 Provenance of product PIDs] Added PIDs to the Instance type; extended mapping for OAF/ODF records
2021-03-09 11:14:45 +01:00
Claudio Atzori
d525785497
[ #6282 open access status in the Graph] Result.Instance.accessRight defined with dedicated data type that includes the open access color.
2021-03-09 11:12:55 +01:00
Sandro La Bruzzo
bbe1a7c69a
[ #6281 Provenance of product PIDs] Added PIDs to the Instance type in Scholexplorer Export
2021-03-09 10:46:36 +01:00
Sandro La Bruzzo
a2169ccf07
// implemented Ticket #6281 added pid to Instance in doiBoost
2021-03-09 10:46:36 +01:00
Claudio Atzori
f468c7f0d7
merged from master
2021-03-09 09:12:41 +01:00
Claudio Atzori
8d2bb24512
merged from master
2021-03-08 15:44:34 +01:00
Claudio Atzori
acbe3119a4
RestCollectorPlugin imported from dne45
2021-03-08 09:44:09 +01:00
Antonis Lempesis
f40c150a0d
fixed steps...
2021-03-06 00:35:57 +02:00
Claudio Atzori
fa7930d2e2
merging contributions from PR#97
2021-03-05 15:45:28 +01:00
Antonis Lempesis
6147ee4950
assigning correctly hive contexts to concepts
2021-03-05 14:12:18 +02:00
Antonis Lempesis
c5fbad8093
Contexts are now downloaded instead of using the stats_ext db
2021-03-04 00:42:21 +02:00
Claudio Atzori
55f6ff5f55
README.md for aggregation workflows
2021-03-03 16:18:34 +01:00
Claudio Atzori
e8789b0cdb
Merge pull request 'stats DB for monitor' ( #99 ) from antonis.lempesis/dnet-hadoop:master into master
...
Looks good to me, just a note on the parsing of the citations: since the last version, IIS produces citations as proper relationships among results. This is what we got already in the BETA graph
```
count r.reltype r.subreltype r.relclass
62.129.254 resultResult citation cites
62.043.309 resultResult citation isCitedBy
```
Thus, I suggest to move away from the current property based implementation for the extraction of the citation links and start relying on the relationships instead.
2021-03-03 10:29:09 +01:00
Claudio Atzori
36f750cd1d
removed unused classes
2021-03-03 10:22:29 +01:00
Claudio Atzori
b73dce3e3a
more logging on the MDStore mongodb client. Forcing UTF_8 encoding on the content
2021-03-03 10:17:16 +01:00
Antonis Lempesis
27796343ca
crude sleep. hardcoded value
2021-03-03 01:37:47 +02:00
Enrico Ottonello
70cb100647
added updating last orcid dataset folders after completion
2021-03-01 10:17:04 +01:00
Enrico Ottonello
bd3b16402b
added result typologies
2021-03-01 10:16:02 +01:00
Claudio Atzori
e76c4f62c1
MetadataRecord moved in dhp-schemas
2021-02-26 10:58:48 +01:00
miconis
1a85020572
bug fix in graph-mapper, changes in the implementation of the openorgs wf to create relations and populate openorgs db
2021-02-26 10:19:28 +01:00
Enrico Ottonello
ca1800510a
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-02-25 18:45:02 +01:00
Enrico Ottonello
53d7023460
dateOfCollection taken from orcid last_update.txt on hdfs; cleaned wf parameters
2021-02-25 18:43:29 +01:00
Claudio Atzori
7df2461ccc
indent XML records collected from oai-pmh endpoints
2021-02-25 16:19:12 +01:00
Enrico Ottonello
d43ea88caf
aligned orcid result typologies with openaire vocabulary
2021-02-25 15:02:10 +01:00
Claudio Atzori
b830e33392
mdstore collector plugin
2021-02-25 12:30:30 +01:00
Claudio Atzori
271e88537b
code formatting
2021-02-25 12:28:56 +01:00
Claudio Atzori
9c899f4433
cleanup on transformation functions and the relative tests
2021-02-24 15:07:59 +01:00
Claudio Atzori
fc3fa5e343
implemented mdstore collector plugin
2021-02-24 15:07:24 +01:00
Enrico Ottonello
975823b968
data from last updated orcid
2021-02-23 15:35:04 +01:00
Antonis Lempesis
d90767c733
correctly invalidating metadata
2021-02-19 03:18:47 +02:00
Antonis Lempesis
3681afbe04
typo
2021-02-19 03:04:27 +02:00
Antonis Lempesis
c5502eba8f
actually moved stats computation in impala instead of hive...
2021-02-19 02:54:39 +02:00
Antonis Lempesis
33c85d4e66
moved stats computation in impala instead of hive
2021-02-18 17:23:34 +02:00
Antonis Lempesis
b8e96c8ae7
moved cache update to the end
2021-02-18 16:42:22 +02:00
Antonis Lempesis
bcbfc052b1
fixed last errors in step 21
2021-02-18 16:32:54 +02:00
Antonis Lempesis
10a29a4b9a
fixes in monitor step
2021-02-18 15:05:59 +02:00
Antonis Lempesis
8ef66452d5
fixed typo
2021-02-17 22:24:44 +02:00
Antonis Lempesis
a8836e2f5f
fixed typo
2021-02-17 19:27:07 +02:00
Claudio Atzori
e7eba9f7e7
WIP: transformation workflow error reporting; cleanup
2021-02-17 16:54:08 +01:00
Claudio Atzori
58467aaf1e
WIP: transformation workflow error reporting
2021-02-17 16:14:41 +01:00
Claudio Atzori
cc88701f29
retry for any Socket exception
2021-02-17 16:13:54 +01:00
Antonis Lempesis
a445c1ac3d
fixed variable names in monitor script
2021-02-17 16:45:09 +02:00