miconis
3c12eeadce
bug fix in propagation of relations
2021-04-22 11:44:33 +02:00
Claudio Atzori
e5abbec2ba
[orcid] download of the lambda file defined in a script
2021-04-22 11:22:10 +02:00
Claudio Atzori
55964cbd81
[orcid] large oozie workflow cleanup; updated workflow for the orcidnodoi actionset creation
2021-04-22 10:18:09 +02:00
Claudio Atzori
8f309b72ff
[dedup] using node names consistently across the workflow
2021-04-21 17:54:51 +02:00
Claudio Atzori
52244f813a
merging from enrico.ottonello/dnet-hadoop:orcid-no-doi
2021-04-21 12:24:09 +02:00
Sandro La Bruzzo
fd29307b84
updated workflow name
2021-04-21 09:21:41 +02:00
Claudio Atzori
815b9f4d56
[openorgs dedup] fixed workflow parameter declarations. Introduced support for resuming the execution from intermediate steps
2021-04-20 17:24:45 +02:00
Claudio Atzori
d0d477cca3
code formatting
2021-04-20 12:50:34 +02:00
miconis
0393cdce42
addition of alternative names in export queries
2021-04-20 12:45:21 +02:00
miconis
cadd0a5de8
modification of the queries for openorgs: they now also consider pending orgs
2021-04-20 12:06:56 +02:00
Sandro La Bruzzo
e06c7f32f6
updated id figshare as described in #6377
2021-04-20 10:18:07 +02:00
Sandro La Bruzzo
dbe0d0378e
resolved ticket #6377
2021-04-20 09:44:44 +02:00
Sandro La Bruzzo
524e5f3092
Improved parallelization on transformation wf on hadoop
2021-04-19 15:17:25 +02:00
Sandro La Bruzzo
cdfe01bbae
improved parallelization on transformation job
2021-04-19 15:14:52 +02:00
Sandro La Bruzzo
3ae67b7a1d
Merge remote-tracking branch 'origin/stable_ids' into stable_ids
2021-04-16 17:36:57 +02:00
Sandro La Bruzzo
a16e5299f9
applied unique function on the final dataset
2021-04-16 17:36:48 +02:00
Claudio Atzori
45057440c1
code formatting
2021-04-16 17:28:25 +02:00
Enrico Ottonello
34ca792a55
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-04-16 17:18:46 +02:00
Enrico Ottonello
27068aacd1
wf to move the orcid-no-doi dataset to the folder ready for the import
2021-04-16 17:17:47 +02:00
miconis
7ad573d023
bug fix: changed join in propagaterelations without applying filter on the id
2021-04-16 16:40:42 +02:00
Sandro La Bruzzo
67085da305
fixed NPE
2021-04-16 11:05:58 +02:00
Sandro La Bruzzo
644aa8f40c
Merge remote-tracking branch 'origin/stable_ids' into stable_ids
2021-04-16 09:14:26 +02:00
Sandro La Bruzzo
7d6a80e2f2
added new type on MAG mapping
2021-04-16 09:14:15 +02:00
Claudio Atzori
906d50563c
Merge pull request 'properly invalidating impala metadata' ( #105 ) from antonis.lempesis/dnet-hadoop:master into master
Reviewed-on: #105
2021-04-15 15:06:22 +02:00
Claudio Atzori
3d58f95522
[stats update] properly invalidating impala metadata
2021-04-15 15:03:05 +02:00
Antonis Lempesis
03d36fadea
properly invalidating impala metadata
2021-04-15 13:34:22 +03:00
miconis
f64e57c112
refactoring of the id generation, sparkcreatemergerels collects entities to create root id after a join
2021-04-15 10:59:24 +02:00
miconis
176a5e493d
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-04-14 18:06:34 +02:00
miconis
3525a8f504
id generation of representative record moved to the SparkCreateMergeRel job
2021-04-14 18:06:07 +02:00
Sandro La Bruzzo
3f77bfceb0
fixed test failure on jenkins
2021-04-14 10:03:01 +02:00
Claudio Atzori
3125cef545
code formatting
2021-04-14 09:11:54 +02:00
Sandro La Bruzzo
44a0064df6
Merge remote-tracking branch 'origin/stable_ids' into stable_ids
2021-04-13 17:48:12 +02:00
Sandro La Bruzzo
479abd10cb
Add into ORCID workflow a method that extracts orcid directly to the dump generated by Enrico
2021-04-13 17:47:43 +02:00
Claudio Atzori
710cd1e8f2
Merge pull request 'add xslt, personname cleaner' ( #104 ) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
Reviewed-on: #104
LGTM
2021-04-13 14:43:05 +02:00
Claudio Atzori
d1ca025b0b
[cleaning] removing authors without fullname or providing 'deactivated' keyword. Removing test test titles
2021-04-13 14:32:41 +02:00
miconis
1542196a33
bug fix: starting node of duplicate scan wf changed
2021-04-13 10:15:43 +02:00
miconis
369ed1cd8a
bug fix: lookupurl parameter added to dedup record job
2021-04-13 09:08:05 +02:00
Andreas Czerniak
3b694074ff
add xslt, personname cleaner
2021-04-13 07:04:27 +02:00
Claudio Atzori
511c0521e5
[dedup] avoiding NPEs handling OpenOrg relations
2021-04-12 17:45:11 +02:00
miconis
d442e25cbc
bug fix: ids in self mergerels are not marked deletedbyinference=true
2021-04-12 15:56:22 +02:00
miconis
dcff9cecdf
bug fix: ids in self mergerels are not marked deletedbyinference=true
2021-04-12 15:55:27 +02:00
miconis
11b22b2d23
bug fix in the query, it now exports only relations with non-hidden organizations
2021-04-08 11:51:47 +02:00
miconis
0857100fb8
implementation of the tests for the openorgs integration in the openaire provision
2021-04-07 18:42:16 +02:00
miconis
bf685d849f
addition of pids in the query for the export of openorgs for the provision, addition of ec_fields in the openorgs model
2021-04-07 14:27:43 +02:00
Miriam Baglioni
70e391d427
merge upstream
2021-04-07 10:38:08 +02:00
miconis
eaaefb8b4c
implementation of the procedure to reuse content of different dbs when creating the raw graph
2021-04-06 14:35:51 +02:00
miconis
c39c82dfe9
modification of the jobs for the integration of openorgs in the provision: dedup records are no longer created by merging but simply taken from the results of the openorgs portal
2021-04-06 14:31:00 +02:00
Claudio Atzori
37b65cc3ad
Merge pull request 'updates on stats-update workflow' ( #100 ) from antonis.lempesis/dnet-hadoop:master into master
The workflow integrated in the _stable_ids_ branch has been run correctly on the BETA content, thus IMO this PR can be integrated in the master branch.
Reviewed-on: #100
2021-04-02 16:13:35 +02:00
Claudio Atzori
1e7e5180fa
[Graph model] updated definition of ExternalReference: added alternateLabel, removed description ( #6503 )
2021-04-02 12:32:12 +02:00
Claudio Atzori
e686b8de8d
[ORCID-no-doi] integrating PR #98
2021-04-01 17:11:03 +02:00
Claudio Atzori
ee34cc51c3
[ORCID-no-doi] integrating PR #98
2021-04-01 17:07:49 +02:00
Claudio Atzori
70e49ed53c
[OpenOrgsWf] trivial refactoring
2021-04-01 10:30:51 +02:00
Claudio Atzori
7941d7be29
WIP: using common definitions from ModelConstants
2021-03-31 18:33:57 +02:00
Claudio Atzori
879e8cc7ef
WIP: using common definitions from ModelConstants
2021-03-31 17:12:01 +02:00
Claudio Atzori
72ce741ea6
WIP: using common definitions from ModelConstants
2021-03-31 17:07:13 +02:00
Enrico Ottonello
59ec5137e1
improvement related to https://issue.openaire.research-infrastructures.eu/issues/6501
2021-03-31 16:25:41 +02:00
Sandro La Bruzzo
616d2ecce2
split the workflow collecting datacite into two workflows.
Released on beta
2021-03-31 15:45:58 +02:00
Miriam Baglioni
4b6e514f02
merge upstream
2021-03-30 10:27:12 +02:00
Claudio Atzori
9237d55d7f
[OpenOrgsWf] cleanup
2021-03-29 17:40:34 +02:00
Claudio Atzori
7f4e9479ec
[OpenOrgsWf] graph construction wf: allow to skip the import openorgs node (importOpenorgs true|false)
2021-03-29 16:59:16 +02:00
miconis
2709d08fc2
Merge branch 'stable_ids' into openorgswf
2021-03-29 16:39:07 +02:00
miconis
f446580e9f
code refactoring (useless classes and wf removed), implementation of the test for the openorgs dedup
2021-03-29 16:10:46 +02:00
Claudio Atzori
a0837ac357
[Stats update] integrating PR #100 for testing
2021-03-29 15:59:58 +02:00
miconis
2355cc4e9b
minor changes and bug fix
2021-03-29 10:07:12 +02:00
Sandro La Bruzzo
1dfda3624e
improved workflow importing datacite
2021-03-26 13:56:29 +01:00
Enrico Ottonello
91d8660982
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-03-25 11:21:20 +01:00
Enrico Ottonello
ebd67b8c8f
removed duplicate orcid data from the authors set
2021-03-25 11:20:52 +01:00
Claudio Atzori
827e7e37db
[Cleaning] drop instance.alternateIdentifier elements when they are available among instance.pid
2021-03-25 11:07:59 +01:00
miconis
28c1cdd132
merged stable_ids into openorgswf
2021-03-25 10:44:49 +01:00
miconis
5dfb66b0fa
minor changes
2021-03-25 10:29:34 +01:00
miconis
348b0ef921
bug fix, implementation of the workflow for the creation of raw_organizations (openorgs dedup), addition of the pid lists to the openorgs postgres db
2021-03-24 15:51:27 +01:00
Claudio Atzori
751125fdf9
[Actionmanager] zero function considers empty entity.id as well as rel.source/rel.target
2021-03-23 17:34:32 +01:00
Claudio Atzori
1e423fdc07
[Actionmanager] remove invalid records from the input graph before groupGraphTableByIdAndMerge
2021-03-23 13:39:24 +01:00
Claudio Atzori
e5ebb500cf
fixed pom versions; included missing workflow modules in dhp-workflows/pom.xml
2021-03-23 12:13:53 +01:00
Claudio Atzori
b75ad76f79
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-03-23 09:59:12 +01:00
Claudio Atzori
8db248aa13
avoiding error on jenkins compilations: java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (on a random free port)!
2021-03-23 09:56:34 +01:00
Sandro La Bruzzo
625e4c29c4
added model constants
2021-03-23 09:39:56 +01:00
Claudio Atzori
b4febed138
updated mapping tests as a consequence of the special treatment reserved to Handle PIDs
2021-03-23 09:37:48 +01:00
Claudio Atzori
431cbe9955
handle missing instance.pid during bulk cleaning
2021-03-23 09:28:58 +01:00
Sandro La Bruzzo
c392936b97
fixed error on best access right
2021-03-23 09:23:22 +01:00
Sandro La Bruzzo
c73072079d
fix conflicts
2021-03-22 16:36:31 +01:00
Sandro La Bruzzo
098914dcff
fix wrong relation with source null
2021-03-22 11:35:02 +01:00
miconis
0fe40b08e4
addition of deduplication profiles for the results, double check on pids and the title with a lower threshold
2021-03-19 17:12:05 +01:00
miconis
98854b0124
minor changes
2021-03-19 16:57:40 +01:00
Claudio Atzori
5a043e95ea
code formatting
2021-03-19 11:37:27 +01:00
Claudio Atzori
a4e82a65aa
integrated filter applied when merging BETA & PROD graphs to rule out records from Datacite
2021-03-19 11:34:44 +01:00
Claudio Atzori
75144dacb3
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-03-19 09:07:40 +01:00
Claudio Atzori
972d5a3d98
[dedup] Datacite should be authoritative for datasets
2021-03-19 09:04:20 +01:00
Sandro La Bruzzo
25d5663d97
added filter
2021-03-18 10:24:42 +01:00
Sandro La Bruzzo
5f98ea74a9
Added fix for pid generation in stableIds
2021-03-17 15:53:24 +01:00
Sandro La Bruzzo
2be0428047
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-03-17 14:54:28 +01:00
Claudio Atzori
8257f9a2bc
result.pid: adjusted the mapping applied to the contents from the aggregator
2021-03-17 12:45:38 +01:00
Sandro La Bruzzo
7c97a4d900
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-03-17 12:13:03 +01:00
Sandro La Bruzzo
cc5bbafa5d
some fixes to make the workflows run
2021-03-17 12:12:56 +01:00
Claudio Atzori
640b885706
added instance.alternativeIdentifiers to the graph model, adjusted the mapping applied to the contents from the aggregator
2021-03-16 14:19:32 +01:00
Claudio Atzori
61a2551e74
migrated last changes from svn (dnet45)
2021-03-15 17:17:55 +01:00
Antonis Lempesis
0ba0a6b9da
update promote wf to support monitor&production
2021-03-12 16:42:59 +02:00
Antonis Lempesis
60ebdf2dbe
update promote wf to support monitor&production
2021-03-12 16:34:53 +02:00
Antonis Lempesis
236435b470
following redirects
2021-03-12 14:11:21 +02:00
Antonis Lempesis
3c75a05044
fixed a ton of typos
2021-03-12 13:47:04 +02:00
Sandro La Bruzzo
4bb3bcafa5
add author sequence number
2021-03-11 11:32:32 +01:00
Sandro La Bruzzo
a8e5d0ea0d
updated test and fixed assignment of access right
2021-03-11 10:41:24 +01:00
Sandro La Bruzzo
f5e7c57654
Fixed ticket 6282
2021-03-11 10:32:45 +01:00
Antonis Lempesis
fa1ec5b5e9
fixed typo...
2021-03-10 14:05:58 +02:00
Claudio Atzori
01630f638d
IdentifierFactory implementation based on the list of datasources authoritative for a given pid type
2021-03-09 17:11:50 +01:00
Claudio Atzori
59532b0919
[ #6281 Provenance of product PIDs] Added PIDs to the Instance type; extended mapping for OAF/ODF records
2021-03-09 11:14:45 +01:00
Claudio Atzori
d525785497
[ #6282 open access status in the Graph] Result.Instance.accessRight defined with dedicated data type that includes the open access color.
2021-03-09 11:12:55 +01:00
Sandro La Bruzzo
bbe1a7c69a
[ #6281 Provenance of product PIDs] Added PIDs to the Instance type in Scholexplorer Export
2021-03-09 10:46:36 +01:00
Sandro La Bruzzo
a2169ccf07
implemented ticket #6281: added pid to Instance in doiBoost
2021-03-09 10:46:36 +01:00
Claudio Atzori
f468c7f0d7
merged from master
2021-03-09 09:12:41 +01:00
Claudio Atzori
8d2bb24512
merged from master
2021-03-08 15:44:34 +01:00
Claudio Atzori
acbe3119a4
RestCollectorPlugin imported from dnet45
2021-03-08 09:44:09 +01:00
Antonis Lempesis
f40c150a0d
fixed steps...
2021-03-06 00:35:57 +02:00
Claudio Atzori
fa7930d2e2
merging contributions from PR#97
2021-03-05 15:45:28 +01:00
Antonis Lempesis
6147ee4950
assigning correctly hive contexts to concepts
2021-03-05 14:12:18 +02:00
Antonis Lempesis
c5fbad8093
Contexts are now downloaded instead of using the stats_ext db
2021-03-04 00:42:21 +02:00
Claudio Atzori
55f6ff5f55
README.md for aggregation workflows
2021-03-03 16:18:34 +01:00
Claudio Atzori
e8789b0cdb
Merge pull request 'stats DB for monitor' ( #99 ) from antonis.lempesis/dnet-hadoop:master into master
Looks good to me, just a note on the parsing of the citations: since the last version, IIS produces citations as proper relationships among results. This is what we already have in the BETA graph:
```
count       r.reltype     r.subreltype  r.relclass
62.129.254  resultResult  citation      cites
62.043.309  resultResult  citation      isCitedBy
```
Thus, I suggest moving away from the current property-based implementation for the extraction of the citation links and relying on the relationships instead.
2021-03-03 10:29:09 +01:00
Claudio Atzori
36f750cd1d
removed unused classes
2021-03-03 10:22:29 +01:00
Claudio Atzori
b73dce3e3a
more logging on the MDStore mongodb client. Forcing UTF_8 encoding on the content
2021-03-03 10:17:16 +01:00
Antonis Lempesis
27796343ca
crude sleep. hardcoded value
2021-03-03 01:37:47 +02:00
Enrico Ottonello
70cb100647
added updating last orcid dataset folders after completion
2021-03-01 10:17:04 +01:00
Enrico Ottonello
bd3b16402b
added result typologies
2021-03-01 10:16:02 +01:00
Claudio Atzori
e76c4f62c1
MetadataRecord moved in dhp-schemas
2021-02-26 10:58:48 +01:00
miconis
1a85020572
bug fix in graph-mapper, changes in the implementation of the openorgs wf to create relations and populate openorgs db
2021-02-26 10:19:28 +01:00
Enrico Ottonello
ca1800510a
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-02-25 18:45:02 +01:00
Enrico Ottonello
53d7023460
dateOfCollection taken from orcid last_update.txt on hdfs; cleaned wf parameters
2021-02-25 18:43:29 +01:00
Claudio Atzori
7df2461ccc
indent XML records collected from oai-pmh endpoints
2021-02-25 16:19:12 +01:00
Enrico Ottonello
d43ea88caf
aligned orcid result typologies with openaire vocabulary
2021-02-25 15:02:10 +01:00
Claudio Atzori
b830e33392
mdstore collector plugin
2021-02-25 12:30:30 +01:00
Claudio Atzori
271e88537b
code formatting
2021-02-25 12:28:56 +01:00
Claudio Atzori
9c899f4433
cleanup on transformation functions and the relative tests
2021-02-24 15:07:59 +01:00
Claudio Atzori
fc3fa5e343
implemented mdstore collector plugin
2021-02-24 15:07:24 +01:00
Enrico Ottonello
975823b968
data from last updated orcid
2021-02-23 15:35:04 +01:00
Miriam Baglioni
896919e735
merge upstream
2021-02-23 10:45:29 +01:00
Antonis Lempesis
d90767c733
correctly invalidating metadata
2021-02-19 03:18:47 +02:00
Antonis Lempesis
3681afbe04
typo
2021-02-19 03:04:27 +02:00
Antonis Lempesis
c5502eba8f
actually moved stats computation in impala instead of hive...
2021-02-19 02:54:39 +02:00
Antonis Lempesis
33c85d4e66
moved stats computation in impala instead of hive
2021-02-18 17:23:34 +02:00
Antonis Lempesis
b8e96c8ae7
moved cache update to the end
2021-02-18 16:42:22 +02:00
Antonis Lempesis
bcbfc052b1
fixed last errors in step 21
2021-02-18 16:32:54 +02:00
Antonis Lempesis
10a29a4b9a
fixes in monitor step
2021-02-18 15:05:59 +02:00
Antonis Lempesis
8ef66452d5
fixed typo
2021-02-17 22:24:44 +02:00
Antonis Lempesis
a8836e2f5f
fixed typo
2021-02-17 19:27:07 +02:00
Claudio Atzori
e7eba9f7e7
WIP: transformation workflow error reporting; cleanup
2021-02-17 16:54:08 +01:00
Claudio Atzori
58467aaf1e
WIP: transformation workflow error reporting
2021-02-17 16:14:41 +01:00
Claudio Atzori
cc88701f29
retry for any Socket exception
2021-02-17 16:13:54 +01:00
Antonis Lempesis
a445c1ac3d
fixed variable names in monitor script
2021-02-17 16:45:09 +02:00
Antonis Lempesis
00d516360f
added missing ;
2021-02-17 16:41:10 +02:00
Claudio Atzori
545f8f3e48
using jackson objectmapper instead of GSon to serialise the aggregation report
2021-02-17 12:15:00 +01:00