Claudio Atzori
8a0de2fc18
[ActionManagement] reduced number of xqueries used to access ActionSet info
2021-05-07 17:31:32 +02:00
Sandro La Bruzzo
7dc824fc23
imported changes in stable_id into master
2021-05-07 12:53:50 +02:00
Michele Artini
d82071ba6c
originalId with prefix
2021-05-06 15:34:48 +02:00
Claudio Atzori
d4a30fabe3
clean up tests
2021-05-05 17:28:15 +02:00
Claudio Atzori
dccaf173cf
fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials
2021-05-05 16:36:15 +02:00
Claudio Atzori
8c96a82a03
fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials
2021-05-05 15:30:06 +02:00
Claudio Atzori
2e1eb96f9a
code formatting
2021-05-05 11:23:57 +02:00
Sandro La Bruzzo
1adfc41d23
merged manually changes on stable_id for doiboost into master
2021-05-05 10:23:32 +02:00
Claudio Atzori
fb930b84d3
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-05-04 18:06:30 +02:00
Claudio Atzori
923d19ea8e
mdstore read lock/unlock when bulk copying records from mongodb to hdfs
2021-05-04 18:06:21 +02:00
Sandro La Bruzzo
714b71bd21
updated pubmed
2021-05-04 14:54:12 +02:00
Claudio Atzori
ba86835951
using common constants from ModelConstants
2021-05-04 11:51:52 +02:00
Michele Artini
f4bd2b5619
revert file SparkDedupTest.java
2021-05-04 10:26:14 +02:00
Michele Artini
b4877da363
Merge branch 'stable_ids' into prepare_ror_actionset
2021-05-03 08:13:55 +02:00
Alessia Bardi
9a20057615
fixed query for organisations' pids
2021-04-29 15:23:39 +02:00
Michele Artini
6692128234
Merge branch 'stable_ids' into prepare_ror_actionset
2021-04-29 13:24:08 +02:00
Alessia Bardi
a801999e75
fixed query for organisations' pids
2021-04-29 12:18:42 +02:00
Michele Artini
a278d67175
parse input file
2021-04-29 11:34:47 +02:00
Claudio Atzori
f6ccd54d87
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-04-29 10:10:01 +02:00
Claudio Atzori
91e7220f20
cleaned up workflow for actionset migration, adjusted dnet|cnr* dependency versions
2021-04-29 10:09:52 +02:00
Michele Artini
f77ba34126
pid types
2021-04-29 09:50:05 +02:00
Michele Artini
7c5cd86927
annotations and tests
2021-04-29 09:29:19 +02:00
Michele Artini
b5cf505cc6
partial implementation of the ROR->actionset workflow
2021-04-28 16:00:24 +02:00
Enrico Ottonello
c537986b7c
deleted folders with merged data immediately before merge phases
2021-04-28 11:25:25 +02:00
Sandro La Bruzzo
2129e9caa7
updated pangaea transformation to parse directly the xml
2021-04-28 10:21:03 +02:00
Claudio Atzori
5afa7d3e0c
core utilities in dhp-common moved in external module dhp-schemas
2021-04-27 15:44:01 +02:00
Alessia Bardi
e6075bb917
updated json schema for results - added instances and accessright definition
2021-04-27 15:15:08 +02:00
Sandro La Bruzzo
63c0303137
removed unused import, add log
2021-04-27 12:17:23 +02:00
Sandro La Bruzzo
74484d2823
bug fixing
2021-04-27 12:13:44 +02:00
Sandro La Bruzzo
c74b03d59c
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-04-27 11:31:07 +02:00
Sandro La Bruzzo
7f8848ecdd
added first implementation of Pangaea Mapping
2021-04-27 11:30:37 +02:00
Claudio Atzori
27ab8a704d
adjusted poms to align with the external dhp-schema module
2021-04-27 10:12:27 +02:00
Claudio Atzori
a7cf449b36
cleanup
2021-04-27 10:11:26 +02:00
Claudio Atzori
fa42026590
fixed PersonCleaner extension functions
2021-04-27 10:10:06 +02:00
Claudio Atzori
ef4bfd82e2
code formatting
2021-04-27 10:09:31 +02:00
Claudio Atzori
faa8f6f4e2
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-04-27 09:57:03 +02:00
miconis
6d5c14e030
assertions updated in entity merger test
2021-04-27 09:47:49 +02:00
Claudio Atzori
c2bb03c8b5
depending on external dhp-schemas module
2021-04-23 17:57:35 +02:00
Claudio Atzori
7ed107be53
depending on external dhp-schemas module
2021-04-23 17:52:36 +02:00
Claudio Atzori
c25238480c
making ODF record parsing namespace unaware ( #6629 )
2021-04-23 17:34:57 +02:00
Claudio Atzori
99cfb027fa
making ODF record parsing namespace unaware ( #6629 )
2021-04-23 17:09:36 +02:00
Miriam Baglioni
72e5aa3b42
refactoring
2021-04-23 12:10:30 +02:00
Miriam Baglioni
7d1b8b7f64
merge upstream
2021-04-23 11:55:49 +02:00
miconis
d0e3366c34
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-04-22 11:45:19 +02:00
miconis
3c12eeadce
bug fix in propagation of relations
2021-04-22 11:44:33 +02:00
Claudio Atzori
e5abbec2ba
[orcid] download of the lambda file defined in a script
2021-04-22 11:22:10 +02:00
Claudio Atzori
55964cbd81
[orcid] large oozie workflow cleanup; updated workflow for the orcidnodoi actionset creation
2021-04-22 10:18:09 +02:00
Claudio Atzori
8f309b72ff
[dedup] using node names consistently across the workflow
2021-04-21 17:54:51 +02:00
Claudio Atzori
52244f813a
merging from enrico.ottonello/dnet-hadoop:orcid-no-doi
2021-04-21 12:24:09 +02:00
Sandro La Bruzzo
fd29307b84
updated workflow name
2021-04-21 09:21:41 +02:00
Claudio Atzori
815b9f4d56
[openorgs dedup] fixed workflow parameter declarations. Introduced support for resuming the execution from intermediate steps
2021-04-20 17:24:45 +02:00
Claudio Atzori
d0d477cca3
code formatting
2021-04-20 12:50:34 +02:00
miconis
0393cdce42
addition of alternative names in export queries
2021-04-20 12:45:21 +02:00
miconis
cadd0a5de8
modification of the queries for openorgs: they now also consider pending orgs
2021-04-20 12:06:56 +02:00
Sandro La Bruzzo
e06c7f32f6
updated id figshare as described in #6377
2021-04-20 10:18:07 +02:00
Sandro La Bruzzo
dbe0d0378e
resolved ticket #6377
2021-04-20 09:44:44 +02:00
Antonis Lempesis
625d993cd9
added step for observatory db
2021-04-20 02:31:06 +03:00
Antonis Lempesis
25d0512fbd
code cleanup
2021-04-20 01:43:23 +03:00
Sandro La Bruzzo
524e5f3092
Improved parallelization on transformation wf on hadoop
2021-04-19 15:17:25 +02:00
Sandro La Bruzzo
cdfe01bbae
improved parallelization on transformation job
2021-04-19 15:14:52 +02:00
Sandro La Bruzzo
3ae67b7a1d
Merge remote-tracking branch 'origin/stable_ids' into stable_ids
2021-04-16 17:36:57 +02:00
Sandro La Bruzzo
a16e5299f9
applied unique function on the final dataset
2021-04-16 17:36:48 +02:00
Claudio Atzori
45057440c1
code formatting
2021-04-16 17:28:25 +02:00
Enrico Ottonello
34ca792a55
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-04-16 17:18:46 +02:00
Enrico Ottonello
27068aacd1
wf to move orcid-no-doi dataset to the folder ready for the import
2021-04-16 17:17:47 +02:00
miconis
7ad573d023
bug fix: changed join in propagaterelations without applying filter on the id
2021-04-16 16:40:42 +02:00
Sandro La Bruzzo
67085da305
fixed NPE
2021-04-16 11:05:58 +02:00
Sandro La Bruzzo
644aa8f40c
Merge remote-tracking branch 'origin/stable_ids' into stable_ids
2021-04-16 09:14:26 +02:00
Sandro La Bruzzo
7d6a80e2f2
added new type on MAG mapping
2021-04-16 09:14:15 +02:00
Claudio Atzori
906d50563c
Merge pull request 'properly invalidating impala metadata' ( #105 ) from antonis.lempesis/dnet-hadoop:master into master
...
Reviewed-on: D-Net/dnet-hadoop#105
2021-04-15 15:06:22 +02:00
Claudio Atzori
3d58f95522
[stats update] properly invalidating impala metadata
2021-04-15 15:03:05 +02:00
Antonis Lempesis
03d36fadea
properly invalidating impala metadata
2021-04-15 13:34:22 +03:00
miconis
f64e57c112
refactoring of the id generation, sparkcreatemergerels collects entities to create root id after a join
2021-04-15 10:59:24 +02:00
miconis
176a5e493d
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-04-14 18:06:34 +02:00
miconis
3525a8f504
id generation of representative record moved to the SparkCreateMergeRel job
2021-04-14 18:06:07 +02:00
Sandro La Bruzzo
3f77bfceb0
fixed test failure on jenkins
2021-04-14 10:03:01 +02:00
Claudio Atzori
3125cef545
code formatting
2021-04-14 09:11:54 +02:00
Sandro La Bruzzo
44a0064df6
Merge remote-tracking branch 'origin/stable_ids' into stable_ids
2021-04-13 17:48:12 +02:00
Sandro La Bruzzo
479abd10cb
Add into ORCID workflow a method that extracts orcid directly to the dump generated by Enrico
2021-04-13 17:47:43 +02:00
Claudio Atzori
710cd1e8f2
Merge pull request 'add xslt, personname cleaner' ( #104 ) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
...
Reviewed-on: D-Net/dnet-hadoop#104
LGTM
2021-04-13 14:43:05 +02:00
Claudio Atzori
d1ca025b0b
[cleaning] removing authors without fullname or providing 'deactivated' keyword. Removing test test titles
2021-04-13 14:32:41 +02:00
miconis
1542196a33
bug fix: starting node of duplicate scan wf changed
2021-04-13 10:15:43 +02:00
miconis
369ed1cd8a
bug fix: lookupurl parameter added to dedup record job
2021-04-13 09:08:05 +02:00
Andreas Czerniak
3b694074ff
add xslt, personname cleaner
2021-04-13 07:04:27 +02:00
Claudio Atzori
511c0521e5
[dedup] avoiding NPEs handling OpenOrg relations
2021-04-12 17:45:11 +02:00
miconis
d442e25cbc
bug fix: ids in self mergerels are not marked deletedbyinference=true
2021-04-12 15:56:22 +02:00
miconis
dcff9cecdf
bug fix: ids in self mergerels are not marked deletedbyinference=true
2021-04-12 15:55:27 +02:00
miconis
11b22b2d23
bug fix in the query, it now exports only relations with non-hidden organizations
2021-04-08 11:51:47 +02:00
miconis
0857100fb8
implementation of the tests for the openorgs integration in the openaire provision
2021-04-07 18:42:16 +02:00
miconis
bf685d849f
addition of pids in the query for the export of openorgs for the provision, addition of ec_fields in the openorgs model
2021-04-07 14:27:43 +02:00
Miriam Baglioni
70e391d427
merge upstream
2021-04-07 10:38:08 +02:00
miconis
eaaefb8b4c
implementation of the procedure to reuse content of different dbs when creating the raw graph
2021-04-06 14:35:51 +02:00
miconis
c39c82dfe9
modification of the jobs for the integration of openorgs in the provision, dedup records are no longer created by merging but simply by taking results of the openorgs portal
2021-04-06 14:31:00 +02:00
Claudio Atzori
37b65cc3ad
Merge pull request 'updates on stats-update workflow' ( #100 ) from antonis.lempesis/dnet-hadoop:master into master
...
The workflow integrated in the _stable_ids_ branch has been run correctly on the BETA content, thus IMO this PR can be integrated in the master branch.
Reviewed-on: D-Net/dnet-hadoop#100
2021-04-02 16:13:35 +02:00
Claudio Atzori
1e7e5180fa
[Graph model] updated definition of ExternalReference: added alternateLabel, removed description ( #6503 )
2021-04-02 12:32:12 +02:00
Claudio Atzori
e686b8de8d
[ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98
2021-04-01 17:11:03 +02:00
Claudio Atzori
ee34cc51c3
[ORCID-no-doi] integrating PR#98 D-Net/dnet-hadoop#98
2021-04-01 17:07:49 +02:00
Claudio Atzori
70e49ed53c
[OpenOrgsWf] trivial refactoring
2021-04-01 10:30:51 +02:00
Claudio Atzori
7941d7be29
WIP: using common definitions from ModelConstants
2021-03-31 18:33:57 +02:00
Claudio Atzori
879e8cc7ef
WIP: using common definitions from ModelConstants
2021-03-31 17:12:01 +02:00
Claudio Atzori
72ce741ea6
WIP: using common definitions from ModelConstants
2021-03-31 17:07:13 +02:00
Enrico Ottonello
59ec5137e1
improvement related to https://issue.openaire.research-infrastructures.eu/issues/6501
2021-03-31 16:25:41 +02:00
Sandro La Bruzzo
616d2ecce2
split workflow collecting datacite into two workflows.
...
Released on beta
2021-03-31 15:45:58 +02:00
Miriam Baglioni
4b6e514f02
merge upstream
2021-03-30 10:27:12 +02:00
Claudio Atzori
9237d55d7f
[OpenOrgsWf] cleanup
2021-03-29 17:40:34 +02:00
Claudio Atzori
7f4e9479ec
[OpenOrgsWf] graph construction wf: allow to skip the import openorgs node (importOpenorgs true|false)
2021-03-29 16:59:16 +02:00
miconis
2709d08fc2
Merge branch 'stable_ids' into openorgswf
2021-03-29 16:39:07 +02:00
miconis
f446580e9f
code refactoring (useless classes and wf removed), implementation of the test for the openorgs dedup
2021-03-29 16:10:46 +02:00
Claudio Atzori
a0837ac357
[Stats update] integrating PR#100 for testing D-Net/dnet-hadoop#100
2021-03-29 15:59:58 +02:00
miconis
2355cc4e9b
minor changes and bug fix
2021-03-29 10:07:12 +02:00
Sandro La Bruzzo
1dfda3624e
improved workflow importing datacite
2021-03-26 13:56:29 +01:00
Enrico Ottonello
91d8660982
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-03-25 11:21:20 +01:00
Enrico Ottonello
ebd67b8c8f
removed duplicates orcid data on authors set
2021-03-25 11:20:52 +01:00
Claudio Atzori
827e7e37db
[Cleaning] drop instance.alternateIdentifier elements when they are available among instance.pid
2021-03-25 11:07:59 +01:00
miconis
28c1cdd132
merged stable_ids into openorgswf
2021-03-25 10:44:49 +01:00
miconis
5dfb66b0fa
minor changes
2021-03-25 10:29:34 +01:00
miconis
348b0ef921
bug fix, implementation of the workflow for the creation of raw_organizations (openorgs dedup), addition of the pid lists to the openorgs postgres db
2021-03-24 15:51:27 +01:00
Claudio Atzori
751125fdf9
[Actionmanager] zero function considers empty entity.id as well as rel.source/rel.target
2021-03-23 17:34:32 +01:00
Claudio Atzori
1e423fdc07
[Actionmanager] remove invalid records from the input graph before groupGraphTableByIdAndMerge
2021-03-23 13:39:24 +01:00
Claudio Atzori
e5ebb500cf
fixed pom versions; included missing workflow modules in dhp-workflows/pom.xml
2021-03-23 12:13:53 +01:00
Claudio Atzori
b75ad76f79
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-03-23 09:59:12 +01:00
Claudio Atzori
8db248aa13
avoiding error on jenkins compilations: java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (on a random free port)!
2021-03-23 09:56:34 +01:00
Sandro La Bruzzo
625e4c29c4
added model constants
2021-03-23 09:39:56 +01:00
Claudio Atzori
b4febed138
updated mapping tests as consequence of the special treatment reserved to Handle PIDs
2021-03-23 09:37:48 +01:00
Claudio Atzori
431cbe9955
handle missing instance.pid during bulk cleaning
2021-03-23 09:28:58 +01:00
Sandro La Bruzzo
c392936b97
fixed error on best access right
2021-03-23 09:23:22 +01:00
Sandro La Bruzzo
c73072079d
fix conflicts
2021-03-22 16:36:31 +01:00
Sandro La Bruzzo
098914dcff
fix wrong relation with source null
2021-03-22 11:35:02 +01:00
miconis
0fe40b08e4
addition of deduplication profiles for the results, double check on pids and the title with a lower threshold
2021-03-19 17:12:05 +01:00
miconis
98854b0124
minor changes
2021-03-19 16:57:40 +01:00
Claudio Atzori
5a043e95ea
code formatting
2021-03-19 11:37:27 +01:00
Claudio Atzori
a4e82a65aa
integrated filter applied when merging BETA & PROD graphs to rule out records from Datacite
2021-03-19 11:34:44 +01:00
Claudio Atzori
75144dacb3
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-03-19 09:07:40 +01:00
Claudio Atzori
972d5a3d98
[dedup] Datacite should be authoritative for datasets
2021-03-19 09:04:20 +01:00
Sandro La Bruzzo
25d5663d97
added filter
2021-03-18 10:24:42 +01:00
Sandro La Bruzzo
5f98ea74a9
Added fix for pid generation in stableIds
2021-03-17 15:53:24 +01:00
Sandro La Bruzzo
2be0428047
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-03-17 14:54:28 +01:00
Claudio Atzori
8257f9a2bc
result.pid: adjusted the mapping applied to the contents from the aggregator
2021-03-17 12:45:38 +01:00
Sandro La Bruzzo
7c97a4d900
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-03-17 12:13:03 +01:00
Sandro La Bruzzo
cc5bbafa5d
some fixes to make workflows run
2021-03-17 12:12:56 +01:00
Claudio Atzori
640b885706
added instance.alternativeIdentifiers to the graph model, adjusted the mapping applied to the contents from the aggregator
2021-03-16 14:19:32 +01:00
Claudio Atzori
61a2551e74
migrated last changes from svn (dnet45)
2021-03-15 17:17:55 +01:00
Antonis Lempesis
0ba0a6b9da
update promote wf to support monitor&production
2021-03-12 16:42:59 +02:00
Antonis Lempesis
60ebdf2dbe
update promote wf to support monitor&production
2021-03-12 16:34:53 +02:00
Antonis Lempesis
236435b470
following redirects
2021-03-12 14:11:21 +02:00
Antonis Lempesis
3c75a05044
fixed a ton of typos
2021-03-12 13:47:04 +02:00
Sandro La Bruzzo
4bb3bcafa5
add author sequence number
2021-03-11 11:32:32 +01:00
Sandro La Bruzzo
a8e5d0ea0d
updated test and fixed assign of access right
2021-03-11 10:41:24 +01:00
Sandro La Bruzzo
f5e7c57654
Fixed ticket 6282
2021-03-11 10:32:45 +01:00
Antonis Lempesis
fa1ec5b5e9
fixed typo...
2021-03-10 14:05:58 +02:00
Claudio Atzori
01630f638d
IdentifierFactory implementation based on the list of datasources authoritative for a given pid type
2021-03-09 17:11:50 +01:00
Claudio Atzori
59532b0919
[ #6281 Provenance of product PIDs] Added PIDs to the Instance type; extended mapping for OAF/ODF records
2021-03-09 11:14:45 +01:00
Claudio Atzori
d525785497
[ #6282 open access status in the Graph] Result.Instance.accessRight defined with dedicated data type that includes the open access color.
2021-03-09 11:12:55 +01:00
Sandro La Bruzzo
bbe1a7c69a
[ #6281 Provenance of product PIDs] Added PIDs to the Instance type in Scholexplorer Export
2021-03-09 10:46:36 +01:00
Sandro La Bruzzo
a2169ccf07
// implemented Ticket #6281 added pid to Instance in doiBoost
2021-03-09 10:46:36 +01:00
Claudio Atzori
f468c7f0d7
merged from master
2021-03-09 09:12:41 +01:00
Claudio Atzori
8d2bb24512
merged from master
2021-03-08 15:44:34 +01:00
Claudio Atzori
acbe3119a4
RestCollectorPlugin imported from dnet45
2021-03-08 09:44:09 +01:00
Antonis Lempesis
f40c150a0d
fixed steps...
2021-03-06 00:35:57 +02:00
Claudio Atzori
fa7930d2e2
merging contributions from PR#97
2021-03-05 15:45:28 +01:00
Antonis Lempesis
6147ee4950
assigning correctly hive contexts to concepts
2021-03-05 14:12:18 +02:00
Antonis Lempesis
c5fbad8093
Contexts are now downloaded instead of using the stats_ext db
2021-03-04 00:42:21 +02:00
Claudio Atzori
55f6ff5f55
README.md for aggregation workflows
2021-03-03 16:18:34 +01:00
Claudio Atzori
e8789b0cdb
Merge pull request 'stats DB for monitor' ( #99 ) from antonis.lempesis/dnet-hadoop:master into master
...
Looks good to me, just a note on the parsing of the citations: since the last version, IIS produces citations as proper relationships among results. This is what we got already in the BETA graph
```
count r.reltype r.subreltype r.relclass
62.129.254 resultResult citation cites
62.043.309 resultResult citation isCitedBy
```
Thus, I suggest moving away from the current property-based implementation for the extraction of the citation links and relying on the relationships instead.
2021-03-03 10:29:09 +01:00
Claudio Atzori
36f750cd1d
removed unused classes
2021-03-03 10:22:29 +01:00
Claudio Atzori
b73dce3e3a
more logging on the MDStore mongodb client. Forcing UTF_8 encoding on the content
2021-03-03 10:17:16 +01:00
Antonis Lempesis
27796343ca
crude sleep. hardcoded value
2021-03-03 01:37:47 +02:00
Enrico Ottonello
70cb100647
added updating last orcid dataset folders after completion
2021-03-01 10:17:04 +01:00
Enrico Ottonello
bd3b16402b
added result typologies
2021-03-01 10:16:02 +01:00
Claudio Atzori
e76c4f62c1
MetadataRecord moved in dhp-schemas
2021-02-26 10:58:48 +01:00
miconis
1a85020572
bug fix in graph-mapper, changes in the implementation of the openorgs wf to create relations and populate openorgs db
2021-02-26 10:19:28 +01:00
Enrico Ottonello
ca1800510a
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-02-25 18:45:02 +01:00
Enrico Ottonello
53d7023460
dateOfCollection taken from orcid last_update.txt on hdfs; cleaned wf parameters
2021-02-25 18:43:29 +01:00
Claudio Atzori
7df2461ccc
indent XML records collected from oai-pmh endpoints
2021-02-25 16:19:12 +01:00
Enrico Ottonello
d43ea88caf
aligned orcid result typologies with openaire vocabulary
2021-02-25 15:02:10 +01:00
Claudio Atzori
b830e33392
mdstore collector plugin
2021-02-25 12:30:30 +01:00
Claudio Atzori
271e88537b
code formatting
2021-02-25 12:28:56 +01:00
Claudio Atzori
9c899f4433
cleanup on transformation functions and the relative tests
2021-02-24 15:07:59 +01:00
Claudio Atzori
fc3fa5e343
implemented mdstore collector plugin
2021-02-24 15:07:24 +01:00
Enrico Ottonello
975823b968
data from last updated orcid
2021-02-23 15:35:04 +01:00
Miriam Baglioni
896919e735
merge upstream
2021-02-23 10:45:29 +01:00
Antonis Lempesis
d90767c733
correctly invalidating metadata
2021-02-19 03:18:47 +02:00
Antonis Lempesis
3681afbe04
typo
2021-02-19 03:04:27 +02:00
Antonis Lempesis
c5502eba8f
actually moved stats computation in impala instead of hive...
2021-02-19 02:54:39 +02:00
Antonis Lempesis
33c85d4e66
moved stats computation in impala instead of hive
2021-02-18 17:23:34 +02:00
Antonis Lempesis
b8e96c8ae7
moved cache update to the end
2021-02-18 16:42:22 +02:00
Antonis Lempesis
bcbfc052b1
fixed last errors in step 21
2021-02-18 16:32:54 +02:00
Antonis Lempesis
10a29a4b9a
fixes in monitor step
2021-02-18 15:05:59 +02:00
Antonis Lempesis
8ef66452d5
fixed typo
2021-02-17 22:24:44 +02:00
Antonis Lempesis
a8836e2f5f
fixed typo
2021-02-17 19:27:07 +02:00
Claudio Atzori
e7eba9f7e7
WIP: transformation workflow error reporting; cleanup
2021-02-17 16:54:08 +01:00
Claudio Atzori
58467aaf1e
WIP: transformation workflow error reporting
2021-02-17 16:14:41 +01:00
Claudio Atzori
cc88701f29
retry for any Socket exception
2021-02-17 16:13:54 +01:00
Antonis Lempesis
a445c1ac3d
fixed variable names in monitor script
2021-02-17 16:45:09 +02:00
Antonis Lempesis
00d516360f
added missing ;
2021-02-17 16:41:10 +02:00
Claudio Atzori
545f8f3e48
using jackson objectmapper instead of GSon to serialise the aggregation report
2021-02-17 12:15:00 +01:00
Claudio Atzori
b592d78bb4
WIP: collectorWorker error reporting, generalised reported implementation
2021-02-17 10:28:01 +01:00
Antonis Lempesis
cd1b794409
added the monitor db wf
2021-02-17 02:11:55 +02:00
Claudio Atzori
cf27905a71
WIP: collectorWorker error reporting, added report messages
2021-02-16 16:53:14 +01:00
Alessia Bardi
32e81c2d89
non validated rel has null value in validated field
2021-02-16 11:01:42 +01:00
Claudio Atzori
1abe6d1ad7
WIP: collectorWorker error reporting, added report messages
2021-02-15 15:08:59 +01:00
Claudio Atzori
523a6bfa97
Merge pull request 'first commit to the correct branch' ( #94 ) from andreas.czerniak/BrAggr_dnet-hadoop:hadoop_aggregator into hadoop_aggregator
...
Looks good to me, thanks Andreas!
2021-02-15 12:15:31 +01:00
Antonis Lempesis
1c029b9fc0
fixed formatting
2021-02-14 03:14:24 +02:00
Antonis Lempesis
2c4dcc90ba
analyzing tables to produce stats
2021-02-14 02:54:55 +02:00
Sandro La Bruzzo
7edcc87ed4
changed xslt behaviour on failure
2021-02-12 17:27:08 +01:00
Sandro La Bruzzo
6a37c7f175
merge fixed
2021-02-12 16:38:47 +01:00
Sandro La Bruzzo
b3f5c2351d
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into hadoop_aggregator
...
Conflicts:
dhp-workflows/dhp-aggregation/src/test/java/eu/dnetlib/dhp/transformation/TransformationJobTest.java
2021-02-12 16:37:14 +01:00
Sandro La Bruzzo
f216277219
Implemented cleaning date
2021-02-12 16:34:52 +01:00
Andreas Czerniak
5a9017cf18
clone, min. changes, test, run
2021-02-12 14:32:36 +01:00
Claudio Atzori
aa55dedb8a
Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator
2021-02-12 12:31:05 +01:00
Claudio Atzori
29c6f7e255
classes related to the collection workflow moved into common package; implemented MongoDB collection plugins
2021-02-12 12:31:02 +01:00
Sandro La Bruzzo
17e6f1934e
fixed NPE on cleaner
2021-02-12 11:48:11 +01:00
Sandro La Bruzzo
ebcc3ec14f
updated wrong datacite identifier in transformation
2021-02-11 16:25:51 +01:00
Michele Artini
83d815d0bc
only stats
2021-02-11 10:57:23 +01:00
Michele Artini
8c836bf930
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2021-02-11 10:54:41 +01:00
Michele Artini
8c1600398a
added resumeFrom parameter
2021-02-11 10:54:16 +01:00
Claudio Atzori
3f8f78cbfb
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2021-02-11 09:36:10 +01:00
Claudio Atzori
b34b5a39ca
index field authoridtypevalue mixes up different author id-type value pairs, dropped in favour of orcidtypevalue
2021-02-11 09:36:04 +01:00
Michele Artini
7249cceb53
switch of 2 nodes
2021-02-11 09:27:08 +01:00
Alessia Bardi
986dd969d3
use the proper import for Lists
2021-02-10 12:03:54 +01:00
miconis
4b2124a18e
implementation of the openorgs wfs, implementation of the raw_all wf to migrate openorgs db entities
2021-02-10 11:51:50 +01:00
Alessia Bardi
c4d1feca74
mapper test with validated link to project
2021-02-10 11:22:54 +01:00
Alessia Bardi
09fc7e2f78
serialization of validated flag on relationships
2021-02-10 11:22:09 +01:00
Enrico Ottonello
ee4ba7298b
fix last update read/write from file on hdfs
2021-02-09 23:24:57 +01:00
Claudio Atzori
bc458d1b54
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2021-02-09 16:27:30 +01:00
Claudio Atzori
82e6c50f3f
updated solr fields (authoridtypevalue, resultsubject, resultresourcetypename)
2021-02-09 16:27:04 +01:00
Claudio Atzori
62bd3c53ee
Merge branch 'master' into provision_indexing
2021-02-09 15:46:26 +01:00
Claudio Atzori
bae029f828
collection_java_xmx allows declaring the heap size allocated for the java actions involved in the metadata collection workflow
2021-02-08 18:07:23 +01:00
Claudio Atzori
bebc54d5bf
seq file storing native records is now compressed
2021-02-08 18:06:25 +01:00
Claudio Atzori
50add4c61b
added requestDelay to HttpConnector2 configuration; Aggregation workflow constants moved in dhp-common
2021-02-08 12:19:38 +01:00
Miriam Baglioni
2f5e6647c6
merge upstream
2021-02-08 10:33:11 +01:00
Claudio Atzori
40df0f987d
better logging, WIP: collectorWorker error reporting; common functions moved in DHPUtils
2021-02-06 20:12:00 +01:00
Claudio Atzori
a8a758925e
better logging, WIP: collectorWorker error reporting
2021-02-05 19:18:05 +01:00
Claudio Atzori
730973679a
Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator
2021-02-04 17:25:00 +01:00
Claudio Atzori
deb85706db
imported HttpConnector from https://svn.driver.research-infrastructures.eu/driver/dnet45/modules/dnet-modular-collector-service/trunk/src/main/java/eu/dnetlib/data/collector/plugins/HttpConnector.java as HttpConnector2
2021-02-04 17:24:52 +01:00
Sandro La Bruzzo
4dae5e605d
implemented messaging between collection worker and Dnet
2021-02-04 15:51:15 +01:00
Claudio Atzori
72c57b28fa
switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT
2021-02-04 14:08:18 +01:00
Claudio Atzori
40764cf626
better logging, WIP: collectorWorker error reporting
2021-02-04 14:06:02 +01:00
Enrico Ottonello
c238561001
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-02-04 10:44:21 +01:00
Enrico Ottonello
465ce39f75
job execution now based on file last_update.txt on hdfs
2021-02-04 10:44:04 +01:00
Sandro La Bruzzo
69c253710b
fixed test
2021-02-04 10:30:49 +01:00
Claudio Atzori
e04045089f
better logging, WIP: collectorWorker error reporting
2021-02-03 17:58:22 +01:00
Alessia Bardi
c67329d3ad
updated test for EU Open Data portal datasets
2021-02-03 17:06:48 +01:00
Claudio Atzori
0e8a4f9f1a
better logging, WIP: collectorWorker error reporting
2021-02-03 12:33:41 +01:00
Alessia Bardi
fd705404a1
tests for EU Open Data portal dataset mapping
2021-02-03 10:28:17 +01:00
Miriam Baglioni
6190465851
merge upstream
2021-02-03 10:27:27 +01:00
Claudio Atzori
53884d12c2
code formatting
2021-02-02 14:38:03 +01:00
Claudio Atzori
ac46c247d2
code formatting
2021-02-02 14:24:00 +01:00
Claudio Atzori
bde14b149a
fixed transformation target paths
2021-02-02 12:49:29 +01:00
Claudio Atzori
ca4391aa1c
minor changes
2021-02-02 12:44:04 +01:00
Claudio Atzori
bb89b99b24
code formatting
2021-02-02 12:34:14 +01:00
Claudio Atzori
75807ea5ae
factored out constants
2021-02-02 12:28:21 +01:00
Sandro La Bruzzo
0634674add
implemented transformation test
2021-02-02 12:12:14 +01:00
Claudio Atzori
8eaa1fd4b4
WIP: metadata collection in INCREMENTAL mode and relative test
2021-02-01 19:29:10 +01:00
Sandro La Bruzzo
bead34d11a
code refactor
2021-02-01 14:58:06 +01:00
Sandro La Bruzzo
6ff234d81b
Implemented a first prototype of incremental harvesting and transformation using readlock
2021-02-01 13:56:05 +01:00
Sandro La Bruzzo
b6b835ef49
update transformation Factory to get Transformation Rule by Id and not by Title
2021-02-01 08:49:42 +01:00
Sandro La Bruzzo
e423634cb6
RollBack in case of error WORKS!!!
2021-01-29 17:21:42 +01:00
Sandro La Bruzzo
8ee82576c6
Collection on Refresh WORKS!!!
2021-01-29 17:02:46 +01:00
Sandro La Bruzzo
0276180039
WIP mdstore
...
transaction implemented on hadoop side
2021-01-29 16:42:41 +01:00
Sandro La Bruzzo
0f8e2ecce6
Merged Datacite transform into this branch
2021-01-29 10:45:07 +01:00
Sandro La Bruzzo
99cf3a8ea4
Merged Datacite transform into this branch
2021-01-28 16:34:46 +01:00
Sandro La Bruzzo
686e7b507c
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into aggregation_on_hadoop
2021-01-28 10:02:13 +01:00
Sandro La Bruzzo
98b9498b57
Removed old messaging system not quite used from collection and Transformation workflow
...
code refactor
2021-01-28 09:51:17 +01:00
Sandro La Bruzzo
184e7b3856
Implemented new Transformation using spark
2021-01-27 15:43:08 +01:00
Sandro La Bruzzo
150a617bd1
Merge pull request 'aggregation_on_hadoop' ( #90 ) from sandro.labruzzo/dnet-hadoop:aggregation_on_hadoop into hadoop_aggregator
...
Wonderful code... You're the Best Sandro
2021-01-26 16:00:47 +01:00
Claudio Atzori
f1a852f278
align usage-stats workflow poms with latest snapshot version
2021-01-26 15:42:42 +01:00
Claudio Atzori
9c32119dc2
Merge pull request 'usage-stats-export-wf-v2' ( #89 ) from dimitris.pierrakos/dnet-hadoop:usage-stats-export-wf-v2 into master
...
Thank you Dimitris!
2021-01-26 15:01:41 +01:00
Claudio Atzori
885e0dd926
[Cleaning] filter authors not providing word characters in the fullname
2021-01-26 09:48:53 +01:00
Claudio Atzori
2890511613
[Cleaning] normalise missing Result.country
2021-01-26 09:41:44 +01:00
Claudio Atzori
4eb9ed35b1
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2021-01-25 18:12:24 +01:00
Claudio Atzori
cd379eb5e3
[Cleaning] trying to avoid NPEs, this time by ruling out authors without a defined fullname
2021-01-25 18:11:49 +01:00
Alessia Bardi
505477f36f
format code
2021-01-25 18:02:49 +01:00
Alessia Bardi
ded6ed8d7d
no ',' author if there are no authors in ODF records
2021-01-25 17:57:51 +01:00
Claudio Atzori
3465c8ccee
[Cleaning] trying to avoid NPEs
2021-01-25 16:54:53 +01:00
Sandro La Bruzzo
a54848a59c
Moved Vocabulary stuff to common module
2021-01-25 15:43:04 +01:00
Sandro La Bruzzo
ffb092b8d3
removed duplicate code HttpConnector.java
2021-01-25 15:05:37 +01:00
Sandro La Bruzzo
cda210a2ca
changed documentation since it didn't reflect the current status
2021-01-25 14:17:42 +01:00
Claudio Atzori
07a0ccfc96
[Cleaning] trying to avoid NPEs
2021-01-25 13:36:01 +01:00
miconis
c7e2d5a59a
minor changes
2021-01-25 12:40:45 +01:00
Claudio Atzori
34d653de41
[Cleaning] updated cleaning rule for DOIs
2021-01-22 14:16:33 +01:00
Miriam Baglioni
fe36895c53
added datasource blacklist for the organization to result propagation through institutional repositories
2021-01-22 11:55:10 +01:00
miconis
8fea29177c
refactoring, minor changes and implementation of the wf for openorgs with integration of organization phases into the scan wf
2021-01-18 16:48:08 +01:00
Dimitris
3e8d2a6b2d
Clean workflows
2021-01-15 16:19:12 +02:00
Michele Artini
cfbcdc95bc
fixed a wf param
2021-01-14 14:45:23 +01:00
Michele Artini
69ba3203c0
fixed a conflict
2021-01-14 14:43:25 +01:00
Michele Artini
b230d44411
fixed conflict
2021-01-14 14:32:31 +01:00
Michele Artini
b9d90e95b8
Added eventId to ShortEventMessage
2021-01-14 14:32:31 +01:00
Michele Artini
64b0b0bfb3
fixed a bug with invalid subject topic
2021-01-14 14:32:31 +01:00
Michele Artini
e3e0ab1de1
fixed a problem with join
2021-01-14 14:32:31 +01:00
Michele Artini
26a941315a
openaireId
2021-01-14 14:32:31 +01:00
Michele Artini
6f4d1a37f0
ES wf properties
2021-01-14 14:32:31 +01:00
Michele Artini
1391341d06
mkdir of output dir
2021-01-14 14:32:31 +01:00
Michele Artini
3c9cbd19f3
whitelist of topics
2021-01-14 14:32:31 +01:00
Michele Artini
467aa77279
workingDir and outputDir
2021-01-14 14:32:31 +01:00
Michele Artini
10f3f7eca7
workingDir and outputDir
2021-01-14 14:32:31 +01:00
Michele Artini
ff41a7b3a4
gzipped output
2021-01-14 14:32:31 +01:00
Claudio Atzori
80cf55ef2e
[Broker] fixed partitionEventsByOpendoarIds workflow parameter names
2021-01-13 16:24:30 +01:00
Claudio Atzori
41500669e2
[BIP! Scores integration] merged missing classes from bipFinder branch
2021-01-11 14:39:47 +01:00
Claudio Atzori
2a7a10809e
[BIP! Scores integration] merged missing classes from bipFinder branch
2021-01-11 10:05:02 +01:00
Claudio Atzori
d6686dd7cf
merged from master
2021-01-08 18:16:12 +01:00
Claudio Atzori
34229970e6
[BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation
2021-01-08 16:29:17 +01:00
Claudio Atzori
1361c9eb0c
[BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation
2021-01-07 10:07:30 +01:00
Claudio Atzori
ab2fe9266a
[DOIBoost] minor fixes in workflow definition
2021-01-05 10:26:39 +01:00
Claudio Atzori
7c722f3fdc
[DOIBoost] fixed typo
2021-01-05 10:25:54 +01:00
Claudio Atzori
8879704ba0
[DOIBoost] configurable ES server url and index name in crossref importer
2021-01-05 10:00:13 +01:00
Claudio Atzori
26e9d55c13
code formatting
2021-01-05 09:59:26 +01:00
Sandro La Bruzzo
7834a35768
avoid saving intermediate dataset before generation of Sequence file
2021-01-04 17:54:57 +01:00
Sandro La Bruzzo
e79445a8b4
minor fix for claudio polemica
2021-01-04 17:39:25 +01:00
Sandro La Bruzzo
8765020b85
minor fix
2021-01-04 17:37:08 +01:00
Sandro La Bruzzo
b0dc92786f
defined a single oozie workflow for the generation of doiboost
2021-01-04 17:01:35 +01:00
Claudio Atzori
7185158942
ignore missing properties
2020-12-29 11:06:28 +01:00
Claudio Atzori
28460c2cd1
using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper
2020-12-23 16:59:52 +01:00
Claudio Atzori
60649ac7d2
swapped expected and actual in tests, updated expected number of authors
2020-12-23 12:26:04 +01:00
Claudio Atzori
723b01f9e9
trivial: the less magic numbers and values around, the better
2020-12-23 12:22:48 +01:00
Claudio Atzori
7bfc35df5e
Merge pull request 'Changed typo in script names' ( #82 ) from antonis.lempesis/dnet-hadoop:master into master
...
no need to! :)
2020-12-22 12:36:21 +01:00
Antonis Lempesis
be5969a8c2
Changed typo in script names
2020-12-22 13:33:32 +02:00
miconis
1e1aab83e3
implementation of the raw wf for openorgs: still not complete, some functionalities are missing
2020-12-21 11:58:21 +01:00
Claudio Atzori
6cb0dc3f43
extended ORCID cleaning procedure
2020-12-21 11:40:17 +01:00
Claudio Atzori
573a8a3272
Merge pull request 'Changed typo in script names' ( #81 ) from antonis.lempesis/dnet-hadoop:master into master
...
ok! LGTM
2020-12-18 17:44:26 +01:00
Antonis Lempesis
2a074c3b2b
Changed typo in script names
2020-12-18 18:40:48 +02:00
Claudio Atzori
47270d9af5
lenient mock can be lenient
2020-12-18 15:38:59 +01:00
Claudio Atzori
2e503ee101
code formatting
2020-12-17 13:47:38 +01:00
Claudio Atzori
5a3e2199b2
Merge pull request 'Creation of the action set to include the bipFinder! score' ( #80 ) from miriam.baglioni/dnet-hadoop:bipFinder into bipFinder_master_test
2020-12-17 12:26:38 +01:00
Claudio Atzori
03319d3bd9
Revert "Merge pull request 'Creation of the action set to include the bipFinder! score' ( #62 ) from miriam.baglioni/dnet-hadoop:bipFinder into master"
...
This reverts commit add7e1693b, reversing changes made to f9a8fd8bbd.
2020-12-17 12:23:58 +01:00
Claudio Atzori
add7e1693b
Merge pull request 'Creation of the action set to include the bipFinder! score' ( #62 ) from miriam.baglioni/dnet-hadoop:bipFinder into master
2020-12-17 12:09:03 +01:00
Alessia Bardi
f9a8fd8bbd
updated test record for textgrid
2020-12-17 11:59:45 +01:00
Claudio Atzori
4766495f5b
[orcid_to_result_from_semrel_propagation] fixed typo in SQL
2020-12-17 09:15:50 +01:00
Claudio Atzori
de00094ebc
Merge pull request 'FIX on the creation of subject based broker enrichments' ( #79 ) from broker into master
2020-12-15 14:58:31 +01:00
Michele Artini
f9dc1e45fd
fixed a bug with invalid subject topic
2020-12-15 14:54:11 +01:00
Sandro La Bruzzo
f92bd56f56
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-12-15 11:47:29 +01:00
Sandro La Bruzzo
1f6c8a9e83
added orcid_pending type to records coming from Crossref
2020-12-15 11:47:15 +01:00
Enrico Ottonello
b2de598c1a
all actions from download lambda file to merge updated data into one wf
2020-12-15 10:42:55 +01:00
Claudio Atzori
9f1181290e
Merge pull request 'broker' ( #78 ) from broker into master
...
The changes look good to me.
2020-12-15 10:03:45 +01:00
Michele Artini
0a0f62bd01
Merge branch 'master' into broker
2020-12-15 08:30:52 +01:00
Michele Artini
12fa5d122a
fixed a problem with join
2020-12-15 08:30:26 +01:00
Michele Artini
991e675dc6
validation in claim rels
2020-12-14 15:41:25 +01:00
Michele Artini
3e19cf7b4a
openaireId
2020-12-14 15:24:33 +01:00
Claudio Atzori
b6f08ce226
re-adding the old junit:junit dep as solr-test-framework needs it
2020-12-14 15:07:31 +01:00
Claudio Atzori
7d325e2c57
using actual result subclasses instead of their parent class
2020-12-14 14:40:54 +01:00
Claudio Atzori
152916890f
renamed test name
2020-12-14 14:40:05 +01:00
Michele Artini
a203aee32a
ES wf properties
2020-12-14 12:02:33 +01:00
Claudio Atzori
1506f49052
Xml record serialization for author PIDs: 1) only one value per PID type is allowed; 2) orcid prevails over orcid_pending
2020-12-14 11:14:03 +01:00
Michele Artini
d03756c962
mkdir of output dir
2020-12-14 11:11:41 +01:00
Michele Artini
399548f221
whitelist of topics
2020-12-14 11:03:55 +01:00
Michele Artini
38da1c282a
Merge branch 'master' into broker
2020-12-14 09:14:02 +01:00
Dimitris
dc9c2f3272
Commit 12122020
2020-12-12 12:00:14 +02:00
Enrico Ottonello
efe4c2a9c5
authors and works are now updated in two separate spark actions of the wf
2020-12-12 02:06:21 +01:00
Enrico Ottonello
858efbfad1
fix dataset creation for downloaded works
2020-12-11 16:49:54 +01:00
Claudio Atzori
61cd129ded
XML serialisation test
2020-12-11 12:44:53 +01:00
Claudio Atzori
ce7a319e01
using the correct assertion import
2020-12-11 12:44:17 +01:00
Claudio Atzori
7fe2433137
excluded transitive older junit dependencies, they can compromise the unit test executions
2020-12-11 12:42:55 +01:00
Claudio Atzori
d9532446eb
imported more diffs from master branch; code formatting
2020-12-10 16:14:16 +01:00
Claudio Atzori
1eaad89a3c
do not fail on unknown properties when grouping entities by ID
2020-12-10 15:56:11 +01:00
Michele Artini
933b4c1ada
workingDir and outputDir
2020-12-10 14:47:51 +01:00
Michele Artini
2e7df07328
workingDir and outputDir
2020-12-10 14:47:22 +01:00
Michele Artini
94bfed1c84
gzipped output
2020-12-10 11:59:28 +01:00
Claudio Atzori
12e2f930c8
resolved conflicts
2020-12-10 10:57:39 +01:00
Miriam Baglioni
b7adbc7c3e
merge branch with master
2020-12-10 10:35:27 +01:00
Alessia Bardi
112da6d76a
in theory, just auto-formatting after mvn compile
2020-12-09 20:00:27 +01:00
Alessia Bardi
bece04b330
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-12-09 19:54:43 +01:00
Alessia Bardi
426b76ee8e
more asserts for TextGrid record
2020-12-09 19:46:11 +01:00
Claudio Atzori
ff72fcd91a
allow orcid_pending to percolate to the XML graph serialization
2020-12-09 19:04:50 +01:00
Claudio Atzori
4705144918
Merge pull request 'rel_project_validation' ( #69 ) from rel_project_validation into master
...
LGTM
2020-12-09 19:01:20 +01:00
Claudio Atzori
211aa04726
allow orcid_pending to percolate to the XML graph serialization
2020-12-09 18:08:51 +01:00
Claudio Atzori
ada21ad920
Merge pull request 'dump of the results related to at least one project' ( #61 ) from miriam.baglioni/dnet-hadoop:dump into master
...
LGTM
2020-12-09 17:22:56 +01:00
Claudio Atzori
3c5ce1dada
code formatting
2020-12-09 17:07:20 +01:00
Michele Artini
1bc9adc10d
default trust for validated rels
2020-12-09 16:18:37 +01:00
Claudio Atzori
fcd7689b50
promote actions: shouldGroupById parameter marked as optional (default is true)
2020-12-09 13:10:16 +01:00
Michele Artini
5f21a356fd
reindent
2020-12-09 11:24:30 +01:00
Michele Artini
370a5e650b
validation attributes in resultProject relations
2020-12-09 11:18:26 +01:00
Antonis Lempesis
aead9efd24
added the new parameter (stats_tool_api_url) in the workflow parameters
2020-12-09 10:45:24 +01:00
Antonis Lempesis
77a3a6d82e
added the new parameter (stats_tool_api_url) in the workflow parameters
2020-12-09 10:45:24 +01:00
Antonis Lempesis
91226117b3
ignoring deletedbyinference relations
2020-12-09 10:45:24 +01:00
Antonis Lempesis
b7f29db126
finished first implementation of wf
2020-12-09 10:45:24 +01:00
Antonis Lempesis
ded2392275
initial implementation of the promote wf
2020-12-09 10:45:24 +01:00
Antonis Lempesis
1a87a1effd
added last step to update cache
2020-12-09 10:45:24 +01:00
Enrico Ottonello
2233750a37
original orcid xml data are stored in a field of the class that models orcid data
2020-12-09 09:45:19 +01:00
Claudio Atzori
27e96767e0
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-12-07 21:53:22 +01:00
Claudio Atzori
fba11eef2a
cleanup
2020-12-07 21:53:13 +01:00
Sandro La Bruzzo
7f8b93de72
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-12-07 19:59:39 +01:00
Sandro La Bruzzo
302baab67b
fixed doiboost mapping and workflows
2020-12-07 19:59:33 +01:00
Enrico Ottonello
5c65e602d3
wf doi_authors generates one json data foreach row
2020-12-07 15:28:10 +01:00
Michele Artini
d6934f370e
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-12-07 14:56:23 +01:00
Michele Artini
5de8a7276f
wf to partition opendoar events
2020-12-07 14:56:06 +01:00
Claudio Atzori
5e8509bef7
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-12-07 13:50:08 +01:00
Claudio Atzori
026ad40633
disabled test
2020-12-07 13:50:01 +01:00
Claudio Atzori
21ddcf3a73
actions promotion can optionally avoid grouping objects by id (configured via shouldGroupById parameter)
2020-12-07 13:45:18 +01:00
Enrico Ottonello
fa1855a4b8
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-12-07 11:02:59 +01:00
Enrico Ottonello
b1b589ada1
wf to generate orcid dataset
2020-12-07 11:02:32 +01:00
Sandro La Bruzzo
620e585b63
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-12-07 10:42:53 +01:00
Sandro La Bruzzo
b31dd126fb
fixed crossref workflow added common ORCID Class
2020-12-07 10:42:38 +01:00
Enrico Ottonello
8812ab65e1
completed download function to wf; added accumulators
2020-12-04 21:13:49 +01:00
Claudio Atzori
a104a632df
cleanup
2020-12-04 16:32:47 +01:00
Claudio Atzori
5b4e1142a8
Merge pull request 'added last step to update cache' ( #64 ) from antonis.lempesis/dnet-hadoop:master into master
...
Looks good to me, thanks!
2020-12-04 14:42:31 +01:00
Antonis Lempesis
b1ed1afdcc
added the new parameter (stats_tool_api_url) in the workflow parameters
2020-12-04 13:07:18 +02:00
Antonis Lempesis
7cb113e088
added the new parameter (stats_tool_api_url) in the workflow parameters
2020-12-04 13:04:25 +02:00
Antonis Lempesis
d23ccae0d5
ignoring deletedbyinference relations
2020-12-04 12:42:17 +02:00
Miriam Baglioni
5fb65ffc4a
merge branch with master
2020-12-03 11:24:35 +01:00
Miriam Baglioni
ea88dc3401
fixed issue in property name
2020-12-03 11:24:23 +01:00
Miriam Baglioni
4c58bd1c93
merge with upstream
2020-12-03 11:24:00 +01:00
Miriam Baglioni
05c452f58d
merge with upstream
2020-12-03 10:26:45 +01:00
Enrico Ottonello
53b22c1937
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-12-02 23:21:27 +01:00
Enrico Ottonello
1b1e9ea67c
wf to generate doi_author_list for doiboost; wf to download updated works
2020-12-02 23:20:16 +01:00
Antonis Lempesis
413afcfed5
finished first implementation of wf
2020-12-02 15:57:17 +02:00
Antonis Lempesis
0948536614
initial implementation of the promote wf
2020-12-02 15:41:56 +02:00
Sandro La Bruzzo
7da679542f
fixed wrong projectId
2020-12-02 14:28:09 +01:00
Sandro La Bruzzo
6ba8037cc7
fixed failure to test due to changing of input
2020-12-02 11:34:46 +01:00
Claudio Atzori
cfb55effd9
code formatting
2020-12-02 11:23:49 +01:00
Claudio Atzori
74242e450e
using constants from ModelConstants
2020-12-02 11:23:35 +01:00
Miriam Baglioni
d5efa6963a
using constants in ModelConstants
2020-12-02 11:20:26 +01:00
Miriam Baglioni
cd285e98bc
using the constants defined in the ModelConstants class
2020-12-02 11:13:23 +01:00
Miriam Baglioni
4b0d1530a2
merge upstream
2020-12-02 11:05:00 +01:00
Claudio Atzori
faa977df7e
Merge pull request 'orcid-no-doi' ( #43 ) from enrico.ottonello/dnet-hadoop:orcid-no-doi into master
...
The dataset was generated and is now part of the actionsets available in BETA
2020-12-02 10:55:12 +01:00
Claudio Atzori
57f448b7a4
graph cleaning workflow separate orcid_pending from orcid, depending on the author pid provenance
2020-12-02 10:44:05 +01:00
Alessia Bardi
2d15667b4a
testing XML generation from json object (case AMS ACTA)
2020-12-02 10:16:26 +01:00
Alessia Bardi
a417624670
tests for raw graph mapping
2020-12-02 10:15:26 +01:00
Claudio Atzori
893ac4a77b
GenerateEntitiesApplication can be configured to hash the id value or not
2020-12-02 09:30:06 +01:00
Miriam Baglioni
f8468c9c22
added extension for new author pid (orcid_pending)
2020-12-01 20:09:35 +01:00
Miriam Baglioni
888175baf7
added java doc
2020-12-01 18:36:29 +01:00
Miriam Baglioni
3d62d99d5d
fixed issue in workflow variable
2020-12-01 15:02:49 +01:00
Miriam Baglioni
17680296b9
removed unnecessary variable and unused method
2020-12-01 15:02:31 +01:00
Miriam Baglioni
5b3ed70808
refactoring
2020-12-01 14:31:34 +01:00
Miriam Baglioni
62ff4999e3
added workflow and last step of collection and save
2020-12-01 14:30:56 +01:00
Miriam Baglioni
45d06c45c7
collecting all the atomic actions for result type and save them all in the AS path
2020-12-01 14:29:18 +01:00
Miriam Baglioni
0051ebede5
extending test
2020-12-01 12:43:03 +01:00
Miriam Baglioni
719da15f04
added test resources
2020-12-01 12:42:30 +01:00
Miriam Baglioni
db36e11912
classes, test classes and resources for production of the actionset to include bipFinder score in results
2020-11-30 20:14:23 +01:00
Enrico Ottonello
f2df3ead74
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-30 14:22:46 +01:00
Enrico Ottonello
40c4559e92
added datainfo on authors pid with "sysimport:crosswalk:entityregistry",
2020-11-30 14:19:22 +01:00
Claudio Atzori
2c407e775e
GenerateEntitiesApplication can be configured to hash the id value or not
2020-11-30 12:00:38 +01:00
Antonis Lempesis
815d6b25d9
added last step to update cache
2020-11-30 00:48:10 +02:00
Claudio Atzori
758d27745d
cleaning tab characters from text fields
2020-11-27 16:07:24 +01:00
Claudio Atzori
e731a7658d
cleaning texts to remove tab characters too
2020-11-27 09:00:04 +01:00
Claudio Atzori
5151850a19
CROSSREF and DATACITE constants moved in common ModelConstants
2020-11-26 13:08:36 +01:00
Claudio Atzori
a104d2b6ad
cleanup
2020-11-26 11:12:00 +01:00
Claudio Atzori
d0d5525d40
minor changes
2020-11-26 11:04:17 +01:00
Claudio Atzori
13eae4b31e
GroupEntitiesSparkJob must read all graph paths but relations
2020-11-26 11:04:01 +01:00
Claudio Atzori
76363a8512
SimpleDateFormat is not thread safe; improved error reporting in case of invalid dates
2020-11-26 11:03:12 +01:00
Claudio Atzori
c1b9a4045a
grouping of records will be performed by the dedup workflow
2020-11-26 10:59:10 +01:00
Miriam Baglioni
124591a7f3
refactoring
2020-11-25 18:23:28 +01:00
Miriam Baglioni
1a89f8211c
D-Net/dnet-hadoop#61 (comment)
2020-11-25 18:12:40 +01:00
Miriam Baglioni
5fbe54ef54
D-Net/dnet-hadoop#61 (comment)
2020-11-25 18:10:28 +01:00
Miriam Baglioni
ed01e5a5e1
D-Net/dnet-hadoop#61 (comment)
2020-11-25 18:09:34 +01:00
Miriam Baglioni
d4ddde2ef2
changed because of D-Net/dnet-hadoop#61 (comment)
2020-11-25 18:01:01 +01:00
Miriam Baglioni
f5e5e92a10
changed because of D-Net/dnet-hadoop#61 (comment)
2020-11-25 17:58:53 +01:00
Miriam Baglioni
1df94b85b4
changed because of D-Net/dnet-hadoop#61 (comment)
2020-11-25 17:57:43 +01:00
Claudio Atzori
db0181b8af
Merge pull request 'added bidirectionality to relations from project and result coming from crossref' ( #60 ) from miriam.baglioni/dnet-hadoop:sxBidirectionality into master
2020-11-25 17:17:40 +01:00
Sandro La Bruzzo
ec3e238de6
Fixed problem on duplicated identifier
2020-11-25 17:15:54 +01:00
Claudio Atzori
e208b03755
renamed workflow
2020-11-25 14:55:50 +01:00
Claudio Atzori
dfd6205b95
Consistency graph workflow merges all the entities by ID
2020-11-25 14:55:32 +01:00
Miriam Baglioni
90d4369fd2
added test to verify the compression in writing community info on hdfs
2020-11-25 14:34:58 +01:00
Miriam Baglioni
6750e33d69
merge branch with master
2020-11-25 14:09:01 +01:00
Miriam Baglioni
b2c455f883
added java doc
2020-11-25 14:08:09 +01:00
Miriam Baglioni
1f130cdf92
changed the relation (produces -> isProducedBy) due to the change in the code
2020-11-25 14:04:26 +01:00
Miriam Baglioni
e758d5d9b4
refactoring
2020-11-25 13:46:39 +01:00
Miriam Baglioni
87a9f616ae
refactoring and addition of the funder nsp first part as name for the dump instead of the whole nsp
2020-11-25 13:45:41 +01:00
Miriam Baglioni
e7e418e444
added decision node to verify if to upload in Zenodo
2020-11-25 13:44:10 +01:00
Miriam Baglioni
305e3d0c9c
added resource file for relation with relClass = isProducedBy
2020-11-25 13:43:41 +01:00
Miriam Baglioni
21ce175d17
added FilterFunction specification in filter operation
2020-11-25 13:42:31 +01:00
Miriam Baglioni
bde6d337dd
test classes for dump of results related to funders
2020-11-25 13:42:01 +01:00
Miriam Baglioni
b37b9352d7
added constant value for semantic relationship between projects and results
2020-11-25 13:41:08 +01:00
Sandro La Bruzzo
264723ffd8
updated stuff for zenodo upload
2020-11-25 11:56:07 +01:00
Claudio Atzori
36173c13a5
reverted filters in the cleaning process
2020-11-25 10:24:42 +01:00
Claudio Atzori
eeebd5a920
Cleaning workflow: remove newlines from titles, descriptions, subjects
2020-11-24 18:40:25 +01:00
Claudio Atzori
e1a1bb3ee4
moved class CleaningFunctions in the correct package. Remove newlines from titles, descriptions, subjects
2020-11-24 18:34:03 +01:00
Enrico Ottonello
99a086f0c6
max concurrent executors set to 10, according to ORCID Director of Technology mail request
2020-11-24 17:49:32 +01:00
Miriam Baglioni
72bb0fe360
changed directory name
2020-11-24 16:47:07 +01:00
Miriam Baglioni
00874a8ce6
added bidirectionality to relations from project and result
2020-11-24 15:17:23 +01:00
Miriam Baglioni
39f4a20873
changed the path and the name for saving the communities_infrastructures dump file
2020-11-24 14:47:32 +01:00
Miriam Baglioni
7e14452a87
final version of the wf to get the dump of results associated to at least one funder per funder
2020-11-24 14:46:34 +01:00
Miriam Baglioni
c167a18057
added new parameter for the dumpType
2020-11-24 14:45:50 +01:00
Miriam Baglioni
54a309bb6b
refactoring
2020-11-24 14:45:30 +01:00
Miriam Baglioni
35ecea8842
changed to consider the modification for the specification of the type of dump
2020-11-24 14:45:15 +01:00
Miriam Baglioni
b9b6bdb2e6
fixing issue on previous implementation
2020-11-24 14:44:53 +01:00
Miriam Baglioni
7e940f1991
changed to consider the modification for the specification of the type of dump
2020-11-24 14:43:34 +01:00
Miriam Baglioni
62928ef7a5
changed to save the communities_infrastructures information as the other entity dumps: in a json.gz file
2020-11-24 14:42:41 +01:00
Claudio Atzori
33bae02451
reverted behaviour of the cleaning workflow: grouping entities by ID will be managed differently
2020-11-24 14:42:33 +01:00
Miriam Baglioni
3319440c53
changed the direction of the relation between projects and result considered to select the results linked to projects
2020-11-24 14:41:09 +01:00
Miriam Baglioni
00c377dac2
added specification of MapFunction types in map
2020-11-24 14:40:22 +01:00
Miriam Baglioni
44db258dc4
added enumerated for the dump type
2020-11-24 14:38:06 +01:00
Miriam Baglioni
1832708c42
modified boolean variable with string one which specifies the type of dump we are performing: complete, community or funder
2020-11-24 14:37:36 +01:00
Enrico Ottonello
5c17e768b2
set wf configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions
2020-11-23 16:01:23 +01:00
Enrico Ottonello
5c9a727895
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-23 09:49:53 +01:00
Enrico Ottonello
97c8111847
action to convert lambda file in seq file; spark action to download updated authors
2020-11-23 09:49:22 +01:00
Miriam Baglioni
259c67ce36
fixed issue in path name
2020-11-20 12:32:23 +01:00
Miriam Baglioni
0a9db67eec
-
2020-11-20 12:21:33 +01:00
Miriam Baglioni
d362f2637d
merge branch with master
2020-11-19 19:17:20 +01:00
Miriam Baglioni
cf3f47563f
new parameter files
2020-11-19 19:16:05 +01:00
Miriam Baglioni
24c56fa7a3
new logic and workflow for dump of results with link to projects. In this implementation the result match the model of the communityresult.
2020-11-19 19:15:39 +01:00
Claudio Atzori
d48f388fb2
Merge branch 'provision_indexing'
2020-11-19 15:59:55 +01:00
Claudio Atzori
46bde9c13f
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-11-19 15:26:27 +01:00
Claudio Atzori
7c9feaf9e7
project attributes removed from the XML record serialization: contactfullname, contactfax, contactphone, contactemail
2020-11-19 15:26:20 +01:00
Claudio Atzori
fcbb05eb21
cleanup
2020-11-19 15:14:33 +01:00
Claudio Atzori
3f34757c63
merged from master
2020-11-19 14:34:54 +01:00
Michele Artini
293da47ad9
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-11-19 10:42:31 +01:00
Michele Artini
ab08d12c46
considering abstract > MIN_LENGTH in ENRICH_MISSING_ABSTRACT
2020-11-19 10:42:10 +01:00
Claudio Atzori
e503271abe
fixed notification workflow name
2020-11-19 10:41:38 +01:00
Claudio Atzori
0374d34c3e
introduced configuration param outputFormat: HDFS | SOLR
2020-11-19 10:34:28 +01:00
Miriam Baglioni
fafb688887
-
2020-11-18 18:56:48 +01:00
Miriam Baglioni
906db690d2
-
2020-11-18 17:43:08 +01:00
Claudio Atzori
ede7fae6c8
Merge pull request 'XML record indexing test' ( #58 ) from provision_indexing into master
2020-11-18 17:04:34 +01:00
Miriam Baglioni
5402062ff5
changed parameter file with the one associated to the job
2020-11-18 16:58:20 +01:00
Miriam Baglioni
a172a37ad1
fixed typo
2020-11-18 16:55:07 +01:00
Miriam Baglioni
46ba3793f6
code, workflow and parameters for the dump of the results associated to funders
2020-11-18 16:47:31 +01:00
Claudio Atzori
5218718e8b
updated set of fields from the MDFormatDSResourceType on PROD
2020-11-18 15:00:41 +01:00
Claudio Atzori
d9e07a242b
extended XmlIndexingJob to accept an optional parameter: outputPath. When present, forces the job to write its output on the specified HDFS location
2020-11-18 14:34:55 +01:00
Claudio Atzori
29dcff0f34
spark complains about missing classes, so here they are again
2020-11-18 14:32:32 +01:00
Miriam Baglioni
57cac36898
changed the workflow name
2020-11-18 13:38:03 +01:00
Claudio Atzori
12acf25519
Merge pull request 'starting from first step...' ( #57 ) from antonis.lempesis/dnet-hadoop:master into master
...
No judging. Just re-deploying...
2020-11-18 11:01:49 +01:00
Claudio Atzori
8177ce7939
test for XmlIndexingJob based on a local miniSolrCluster
2020-11-18 10:58:05 +01:00
Alessia Bardi
10e673660f
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-11-18 10:01:23 +01:00
Alessia Bardi
be7b310cef
rel semantics ignore case
2020-11-18 10:01:20 +01:00
Michele Artini
33da2e3d6c
xpaths for dateOfCollection and dateOfTransformation
2020-11-18 09:26:20 +01:00
Antonis Lempesis
01a6e03989
starting from first step...
2020-11-17 23:26:47 +02:00
Alessia Bardi
8f87020a50
#56 : map relevantDates from aggregated ODF records
2020-11-17 18:42:09 +01:00
Alessia Bardi
7e0a76a8ac
test for TextGrid
2020-11-17 18:39:25 +01:00
Enrico Ottonello
2b0c9bbb7e
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-17 18:24:34 +01:00
Enrico Ottonello
c0c2e05eae
added wf to extract authors and works xml data from orcid dump to hdfs; added wf to download the lambda file (containing last orcid update information) from orcid to hdfs
2020-11-17 18:23:12 +01:00
Claudio Atzori
cfc01f136e
PID filtering based on a blacklist
2020-11-17 12:27:06 +01:00
Dimitris
bbcf6b7c8b
Commit 17112020
2020-11-17 08:36:51 +02:00
Enrico Ottonello
c796adae24
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-16 11:57:19 +01:00
Claudio Atzori
6ab1ce53c9
fixed condition in result pid cleaning; cleanup
2020-11-16 10:09:17 +01:00
Claudio Atzori
4de8c8b237
fixed workflow variable name
2020-11-16 10:03:11 +01:00
Dimitris
3e24c9b176
Changes 14112020
2020-11-14 18:42:07 +02:00
Claudio Atzori
331d621800
added test resource
2020-11-14 12:16:15 +01:00
Claudio Atzori
5d4e34e26a
fixed typo in variable name
2020-11-14 10:32:26 +01:00
Claudio Atzori
768bc5304c
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-11-13 15:40:34 +01:00
Claudio Atzori
93f7b7974f
Merge pull request 'trust truncated to 3 decimals' ( #24 ) from trunc_trust into master
...
LGTM
2020-11-13 15:40:02 +01:00
Claudio Atzori
528231a287
grouping graph entities by id turned out to be an easy extension for the already existing cleaning workflow
2020-11-13 15:37:48 +01:00
Enrico Ottonello
005f849674
added compression to output dataset
2020-11-13 12:45:31 +01:00
Enrico Ottonello
9a2fa9dc2f
added test for other names parsing from summaries dump
2020-11-13 10:25:34 +01:00
Claudio Atzori
2bed29eb09
WIP: added oozie workflow for grouping graph entities by id
2020-11-13 10:05:12 +01:00
Claudio Atzori
13e36a4da0
WIP: added oozie workflow for grouping graph entities by id
2020-11-13 10:05:02 +01:00
Enrico Ottonello
13f28fa225
moved AuthorData to dhp-schemas; added other names to author data
2020-11-12 17:43:32 +01:00
Enrico Ottonello
2af21150c5
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-12 09:58:33 +01:00
Claudio Atzori
9b0fb9e958
merged from master
2020-11-12 09:27:12 +01:00
Claudio Atzori
75324ae58a
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-11-12 09:23:37 +01:00
Claudio Atzori
822971f54f
no need to filter relations in CreateRelatedEntitiesJob_phase1; replaced 'left outer' join with 'left' join in CreateRelatedEntitiesJob_phase2; cleanup;
2020-11-12 09:22:59 +01:00
Enrico Ottonello
1f861f2b0d
now wf output is a sequence file with the format seq("eu.dnetlib.dhp.schema.oaf.Publication",eu.dnetlib.dhp.schema.action.AtomicActions)
2020-11-11 17:38:50 +01:00
Claudio Atzori
9841488482
Merge pull request 'latest changes in stats wf' ( #54 ) from antonis.lempesis/dnet-hadoop:master into master
...
LGTM, thanks!
2020-11-11 16:01:51 +01:00
Antonis Lempesis
99ebaee347
fixed #5913
2020-11-11 16:56:46 +02:00
Claudio Atzori
e3d3481fb9
Merge pull request 'organizations pids' ( #53 ) from organization_pids into master
...
LGTM
2020-11-11 14:08:25 +01:00
Antonis Lempesis
f14e65f6a3
reverted wrong change
2020-11-10 17:23:04 +02:00
Antonis Lempesis
c02c7741c9
fixes in db creation
2020-11-10 17:11:30 +02:00
Antonis Lempesis
e603fa5847
fixes in db creation
2020-11-10 17:11:12 +02:00
Enrico Ottonello
fea2451658
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-10 11:49:43 +01:00
Claudio Atzori
18d9aad70c
improved documentation in dhp-graph-provision
2020-11-10 11:48:55 +01:00
Enrico Ottonello
1513174d7e
added further test case
2020-11-10 11:44:55 +01:00
Michele Artini
40160d171f
organizations pids
2020-11-09 12:58:36 +01:00
Sandro La Bruzzo
8e1d43aab2
Implemented ID generation using IdentifierRecordFactory on DOIBoost
2020-11-09 11:53:55 +01:00
Sandro La Bruzzo
027ef2326c
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-11-06 17:12:42 +01:00
Sandro La Bruzzo
cd27df91a1
fixed bug on missing relation in ANDS
2020-11-06 17:12:31 +01:00
Enrico Ottonello
6bc7dbeca7
first version of dataset successfully generated from orcid dump 2020
2020-11-06 13:47:50 +01:00
Claudio Atzori
d10447e747
re-packaged graph dump workflow sources
2020-11-05 17:38:18 +01:00
Claudio Atzori
2d76497488
cleanup
2020-11-05 17:10:24 +01:00
Miriam Baglioni
f8e9bda24c
merge branch with master
2020-11-05 16:31:18 +01:00
Miriam Baglioni
be5ed8f554
added check to avoid sending empty metadata.
2020-11-05 16:10:17 +01:00
Claudio Atzori
2148a51fae
minor changes
2020-11-05 11:24:12 +01:00
Claudio Atzori
4625b7486e
code formatting
2020-11-04 18:12:43 +01:00
Claudio Atzori
f5f346dd2b
Merge pull request 'dump' ( #50 ) from miriam.baglioni/dnet-hadoop:dump into master
...
LGTM
2020-11-04 18:07:01 +01:00
Miriam Baglioni
e9ac471ae9
removed dependency from classes for the pid graph dump
2020-11-04 18:04:42 +01:00
Miriam Baglioni
b90a945c49
removed property files for pid graph dump
2020-11-04 17:28:33 +01:00
Miriam Baglioni
bac307155a
removed properties specific for pid graph dump
2020-11-04 17:28:04 +01:00
Miriam Baglioni
9c9d50f486
removed code specific for pid graph dump
2020-11-04 17:26:22 +01:00
Miriam Baglioni
5669890934
removed commented lines
2020-11-04 17:15:21 +01:00
Miriam Baglioni
6a89f59be9
removed commented lines
2020-11-04 17:13:59 +01:00
Miriam Baglioni
56150d7e5e
removed all code related to the dump of pids graph
2020-11-04 17:13:12 +01:00
Miriam Baglioni
16c54a96f8
removed pid dump
2020-11-04 17:11:32 +01:00
Claudio Atzori
e5da4ee9b1
dedup workflow using the common PidComparator
2020-11-04 15:02:02 +01:00
Miriam Baglioni
0cac5436ff
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
2020-11-04 13:21:11 +01:00
Alessia Bardi
51808b5afd
Updated descriptions
2020-11-04 12:29:48 +01:00
Alessia Bardi
e6becf8659
Updated descriptions
2020-11-04 12:17:57 +01:00
Alessia Bardi
0abe0eee33
Updated descriptions
2020-11-04 12:15:30 +01:00
Alessia Bardi
f6ab238f5d
Updated descriptions
2020-11-04 11:50:47 +01:00
Sandro La Bruzzo
3581244daf
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-11-04 09:04:22 +01:00
Sandro La Bruzzo
66efb39634
implemented merge scholix
2020-11-04 09:04:01 +01:00
Miriam Baglioni
c010a8442f
fixed issue on test code
2020-11-03 17:26:51 +01:00
Miriam Baglioni
8ec7a61188
merge branch with master
2020-11-03 16:59:08 +01:00
Miriam Baglioni
c209284ca7
new schemas for the entities in the dump with added descriptions
2020-11-03 16:58:08 +01:00
Miriam Baglioni
08806deddf
added the splitSize non mandatory parameter. Default size 10G
2020-11-03 16:57:34 +01:00
Miriam Baglioni
7d2eda43ca
added new non mandatory property publish to determine if to publish the upload or leave it pending. Default value false
2020-11-03 16:57:01 +01:00
Miriam Baglioni
cbbb1bdc54
moved business logic to new class in common for handling the zip of the archives
2020-11-03 16:55:50 +01:00
Miriam Baglioni
d4382b54df
moved the tar archive with max size to common module
2020-11-03 16:54:50 +01:00
Claudio Atzori
86d6fbe95b
refactoring: CleaningFunctions and OafMapperUtils moved in dhp-common
2020-11-03 12:19:46 +01:00
Claudio Atzori
8471888ad3
Merge branch 'graph_cleaning' into stable_ids
2020-11-03 11:52:47 +01:00
Claudio Atzori
5310e56dba
remove empty PIDs
2020-11-03 11:52:10 +01:00
Claudio Atzori
3fcd669e99
result merge operation leverages a custom ResultTypeComparator in the aggregator graph construction
2020-11-03 10:53:23 +01:00
Claudio Atzori
8e7f81c5f5
code formatting
2020-11-02 14:25:00 +01:00
Claudio Atzori
09e44dabff
Merge branch 'master' into stable_ids
2020-11-02 12:16:01 +01:00
Sandro La Bruzzo
754c86f33e
fixed test to work on jenkins
2020-11-02 09:35:01 +01:00
Sandro La Bruzzo
39337d8a8a
fixed test
2020-11-02 09:26:25 +01:00
Dimitris
32bf943979
Changes to download only updates
2020-11-02 09:08:25 +02:00
Miriam Baglioni
dabb33e018
changed the criterion used to split the file
2020-10-30 17:52:22 +01:00
Claudio Atzori
c5dda3a00c
Merge pull request 'h2020classification' ( #49 ) from miriam.baglioni/dnet-hadoop:h2020classification into master
LGTM
2020-10-30 17:10:05 +01:00
Miriam Baglioni
4905739be6
changed resource file to mirror change in business logic
2020-10-30 17:02:57 +01:00
Miriam Baglioni
b40360ebfb
changed the code to mirror the changed decision in the classification level and programme description labels
2020-10-30 17:02:30 +01:00
Miriam Baglioni
696409fb9f
disabled tests because they need a remote resource
2020-10-30 17:01:48 +01:00
Miriam Baglioni
0fba08eae4
max allowed size per file 10 Gb
2020-10-30 16:05:55 +01:00
Claudio Atzori
385214eeae
code formatting
2020-10-30 15:47:05 +01:00
Claudio Atzori
04ad8969b2
anticipated execution of the graph cleaning workflow
2020-10-30 15:46:55 +01:00
Claudio Atzori
4ca75d6951
Merge pull request 'Dedup ID creation policy' ( #48 ) from deduptesting into stable_ids
2020-10-30 15:15:32 +01:00
Miriam Baglioni
b828587252
prevent the code from cycling indefinitely
2020-10-30 15:01:25 +01:00
Miriam Baglioni
f747e303ac
classes for dumping of the graph as ttl file
2020-10-30 14:13:45 +01:00
Miriam Baglioni
16baf5b69e
formatting
2020-10-30 14:13:14 +01:00
Miriam Baglioni
a9eef9c852
added check for possible Optional value in relation dataInfo
2020-10-30 14:12:28 +01:00
Miriam Baglioni
5f4de9a962
formatting
2020-10-30 14:11:40 +01:00
Miriam Baglioni
14bf2e7238
added option to split dumps bigger than 40Gb into different files
2020-10-30 14:09:04 +01:00
Dimitris
b8a3392b59
Commit 30102020
2020-10-30 14:07:21 +02:00
Claudio Atzori
58f28296ea
ProvisionConstants moved as ModelHardLimits in dhp-common and applied to truncate long abstracts (len > 150000). Further filtering for empty PID values
2020-10-30 10:56:42 +01:00
Miriam Baglioni
78fdb11c3f
merge branch with master
2020-10-29 12:55:22 +01:00
Sandro La Bruzzo
1d9fdb7367
fixed spark memory issue in SparkSplitOafTODLIEntities
2020-10-28 12:30:32 +01:00
Miriam Baglioni
d2374e3b9e
added code to handle cases where the funding tree does not exist
2020-10-27 16:15:21 +01:00
Miriam Baglioni
5d3012eeb4
changed code to dump only the programme list and not the classification list
2020-10-27 16:14:18 +01:00
Miriam Baglioni
3241ec1777
added connection timeout and socket timeout 600 sec
2020-10-27 16:12:11 +01:00
Enrico Ottonello
9818e74a70
added dependency version in main pom.xml for orcid no doi
2020-10-22 16:38:00 +02:00
Enrico Ottonello
210a50e4f4
replaced null value
2020-10-22 16:24:42 +02:00
Enrico Ottonello
b0290dbcb7
moved all dependencies version to main pom.xml
2020-10-22 16:20:46 +02:00
Enrico Ottonello
a38ab57062
let run test methods
2020-10-22 15:43:50 +02:00
Enrico Ottonello
1139d6568d
replaced null value with a safer empty string as return value
2020-10-22 15:32:26 +02:00
Enrico Ottonello
c58db1c8ea
added filter on null value after map function
2020-10-22 15:11:02 +02:00
Enrico Ottonello
846ba30873
if typologies mapping fails, an exception will be propagated
2020-10-22 14:36:18 +02:00
Enrico Ottonello
c3114ba0ae
replaced null as return value with a safer empty string
2020-10-22 14:21:31 +02:00
Enrico Ottonello
c295c71ca0
added comment
2020-10-22 14:07:26 +02:00
Enrico Ottonello
ab083f9946
propagate exception on parsing work (PR request)
2020-10-22 14:02:32 +02:00
sandro
3a81a940b7
solved bug on merge publication
2020-10-21 22:41:55 +02:00
Miriam Baglioni
a2ce527fae
changed to match the requirements for short titles in level and long titles in classification
2020-10-20 17:03:25 +02:00
Sandro La Bruzzo
346ed65e2c
added upload to zenodo node
2020-10-20 16:59:55 +02:00
sandro
271b4db450
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-10-20 16:09:49 +02:00
sandro
d58d02d448
added workflow upload on zenodo
2020-10-20 16:09:07 +02:00
miconis
c4a59d1b9a
merge with the master to port the new packages
2020-10-20 16:07:30 +02:00
miconis
708d887e64
minor changes
2020-10-20 15:12:19 +02:00
miconis
0e54803177
bug fix in the id generator and implementation of jobs for organization dedup
2020-10-20 12:19:46 +02:00
Alessia Bardi
1425d810a8
testing mapping
2020-10-19 17:46:14 +02:00
Claudio Atzori
266bf1a221
common IdentifierFactory in use on the mapping from the aggregator data; merge the entities sharing the same id; code formatting
2020-10-16 17:02:10 +02:00
Claudio Atzori
34f1d0904b
common IdentifierFactory in use on the mapping from the aggregator data
2020-10-16 16:00:19 +02:00
Sandro La Bruzzo
fed711da80
Merge remote-tracking branch 'origin/master' into merge_record_to_common
2020-10-13 15:32:45 +02:00
Sandro La Bruzzo
34bf64c94f
fixed export Scholexplorer to OpenAire
2020-10-13 08:47:58 +02:00
Alessia Bardi
8775a64bc1
Merge pull request 'Merging different compatibility levels (pinocchio operator)' ( #47 ) from merge_graph into master
2020-10-09 14:44:52 +02:00
Claudio Atzori
e751c1402f
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-10-09 13:53:21 +02:00
Claudio Atzori
b961dc7d1e
added originalid to the fields in the result graph view
2020-10-09 13:53:15 +02:00
miconis
6f8720982c
bug fix in the idgenerator and test implementation
2020-10-09 09:30:23 +02:00
Sandro La Bruzzo
734934e2eb
fixed error on empty intersection with publication and relation on export to OAF
2020-10-08 17:29:29 +02:00
Sandro La Bruzzo
eec418cd26
moved AuthoreMerger into dhp-common
2020-10-08 10:33:55 +02:00
Sandro La Bruzzo
fe0a7870e6
Added test to check if merge authors works
2020-10-08 10:33:12 +02:00
Sandro La Bruzzo
cd9c377d18
adapted scholexplorer Dump generation to the new Dataset definition
2020-10-08 10:10:13 +02:00
Claudio Atzori
a3f37a9414
javadoc
2020-10-07 16:44:22 +02:00
Claudio Atzori
8d85a2fced
[BETA wf only] datasources involved in the merge operation don't obey the infra precedence policy, but rely on a custom behaviour that, given two datasources from beta and prod, returns the one from prod with the highest compatibility among the two
2020-10-07 16:28:52 +02:00
Claudio Atzori
5f7b75f5c5
code formatting
2020-10-07 13:22:54 +02:00
miconis
1804c5d809
refactoring: classes moved in the right package
2020-10-06 16:44:51 +02:00
miconis
7093355487
bug fix and minor changes
2020-10-06 16:21:34 +02:00
miconis
5a8bc329c5
bug fix in the result merge: it takes the correct bestaccessright based on the license instead of the trust
2020-10-06 15:26:44 +02:00
miconis
a2ac7e52fb
implementation of the workflow for new organizations in openorgs
2020-10-06 13:58:09 +02:00
Miriam Baglioni
061527f06e
adding short description
2020-10-05 13:54:39 +02:00
Miriam Baglioni
0c12d7bdd8
adding short description
2020-10-05 11:39:55 +02:00
Miriam Baglioni
ae08b3c0dd
merge branch with master
2020-10-05 11:35:55 +02:00
Miriam Baglioni
11b7eaae09
changed the name of the folder where to store the context entity from context to communities_infrastructures
2020-10-05 11:24:54 +02:00
Miriam Baglioni
32bffb0134
changed the name from communities_infrastructures to communities_infrastuctures.json
2020-10-05 11:24:17 +02:00
Claudio Atzori
23f64d9eb4
updated dedup tests following the dnet-pace-core library update
2020-10-02 14:30:53 +02:00
Miriam Baglioni
fc2f7636be
removed not used code
2020-10-02 12:33:52 +02:00
Miriam Baglioni
25cbcf6114
changed to solve issues about names. context renamed communities_infrastructure.json and removed the double json.gz extension from the name of the part in the tar
2020-10-02 12:17:46 +02:00
Claudio Atzori
9db0f88fb8
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-10-02 09:43:35 +02:00
Claudio Atzori
49ae3450a9
code formatting
2020-10-02 09:43:24 +02:00
Claudio Atzori
c2a6e2a9bf
fixed mapping for datasource journal info (ISSNs)
2020-10-02 09:37:08 +02:00
Miriam Baglioni
01117a46e1
whole workflow activated
2020-10-01 17:19:21 +02:00
Miriam Baglioni
cfb5766c6b
removed double json.gz from names of files in the tar
2020-10-01 17:18:34 +02:00
Miriam Baglioni
fcaedac980
merge branch with master
2020-10-01 16:46:59 +02:00
Miriam Baglioni
c6e6ed1bd8
merge branch with master
2020-10-01 16:24:41 +02:00
Miriam Baglioni
4aec347351
refactoring
2020-10-01 16:23:52 +02:00
Miriam Baglioni
61946b4092
refactoring
2020-10-01 16:22:48 +02:00
Miriam Baglioni
7e6d35e56c
added the link to the excel file related to topic
2020-10-01 15:53:31 +02:00
Sandro La Bruzzo
1a0a44e85a
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-10-01 15:46:53 +02:00
Sandro La Bruzzo
c4a3c52e45
fixed Doiboost bug in the identifier
2020-10-01 15:46:44 +02:00
Miriam Baglioni
43cbd62c2b
added classpath.first in the configuration
2020-10-01 15:46:34 +02:00
Miriam Baglioni
cd69c6b023
added dependency for the topic file path
2020-10-01 15:45:59 +02:00
Miriam Baglioni
771cde3d05
moved the library version to global pom
2020-10-01 15:43:47 +02:00
Miriam Baglioni
632351c0da
modified test resources to mirror the changes in the code
2020-10-01 15:43:02 +02:00
Miriam Baglioni
ebc1c5513f
modified test resources to mirror the changes in the code
2020-10-01 15:42:29 +02:00
Miriam Baglioni
3a374c34b6
fixed null pointer exception
2020-10-01 15:41:01 +02:00
Miriam Baglioni
83ea746163
added check to the test
2020-10-01 15:40:28 +02:00
Claudio Atzori
2e9e13444d
author pids made unique by value
2020-10-01 12:50:40 +02:00
Miriam Baglioni
6e5db85b32
-
2020-10-01 11:51:11 +02:00
Miriam Baglioni
a46179f61c
refactoring
2020-10-01 11:22:01 +02:00
Miriam Baglioni
b90bee124b
removing rows that are empty from those imported
2020-10-01 11:16:49 +02:00
Miriam Baglioni
c107f193c9
refactoring
2020-10-01 11:16:22 +02:00
Claudio Atzori
e265c3e125
cleaning functions factored out in a dedicated class
2020-10-01 10:50:15 +02:00
Miriam Baglioni
706a80a29a
added test to check that separator '-' (not hyphen) will be recognized
2020-10-01 10:38:31 +02:00
Miriam Baglioni
3dca586b3b
refactoring
2020-10-01 10:34:48 +02:00
Miriam Baglioni
416bda6066
changed the programme.description by using the same value used in the classification instead of the short title or the title
2020-10-01 10:31:33 +02:00
Miriam Baglioni
f6587c91f3
added comparison to a char that looks like - but is not
2020-10-01 10:30:26 +02:00
Claudio Atzori
4287164aba
include relevantdate field in the result view
2020-10-01 10:28:55 +02:00
miconis
e3f7798d1b
minor changes in dedup tests, bug fix in the idgenerator and pace-core version update
2020-09-29 15:31:46 +02:00
Miriam Baglioni
7e73bb88b3
changed the logic to add the topic description to the project
2020-09-28 17:21:43 +02:00
Miriam Baglioni
0a035e3630
-
2020-09-28 17:20:49 +02:00
Miriam Baglioni
16bee2084d
added the topic code to the project subset
2020-09-28 17:20:11 +02:00
Miriam Baglioni
0bf2d0db52
added to the workflow the download of the topic excel file and one property needed to get the input path of the topic file in the hdfs filesystem
2020-09-28 12:17:22 +02:00
Miriam Baglioni
c2abde4d9f
changed the implementation of Atomic Actions creation by exploiting the topic information obtained from the cordis excel file
2020-09-28 12:16:34 +02:00
Miriam Baglioni
d930b8d3fc
changed the query to get only the code of the project and not the optional1 (topic code) and optional2 (topic description)
2020-09-28 12:15:48 +02:00
Miriam Baglioni
f8f5cfd5cc
removed the part added to set the topic code and description in the step of project preparation
2020-09-28 12:13:33 +02:00
Miriam Baglioni
9e19c9a221
remove the topic description from the values in the CSVProject class
2020-09-28 12:11:03 +02:00
Miriam Baglioni
6d8b932e40
refactoring
2020-09-28 12:06:56 +02:00
Miriam Baglioni
b77f166549
changed the package name from csvutils to utils
2020-09-28 12:05:47 +02:00
Miriam Baglioni
e33e3277de
added needed dependency to read the excel file
2020-09-28 12:03:14 +02:00
Miriam Baglioni
f4739a371a
code to get the information related to the topic association between code and description.
2020-09-28 12:02:48 +02:00
Miriam Baglioni
7b6a7333e6
merge branch with master
2020-09-25 16:42:07 +02:00
Miriam Baglioni
983a12ed15
temporary modification to allow the upload of files in the sandbox without the need to recreate the mapping from scratch
2020-09-25 16:41:51 +02:00
Miriam Baglioni
8b36d19182
added property depositionId and changed property newVersion from boolean to string to handle the three possible distinct values
2020-09-25 16:41:15 +02:00
Miriam Baglioni
ed5239f9ec
added new code to handle the new possibility to upload files to an already open deposition
2020-09-25 16:34:32 +02:00
Miriam Baglioni
3a8c524fce
refactor
2020-09-25 16:34:02 +02:00
Miriam Baglioni
2ac2b537b6
merge branch with master
2020-09-25 14:40:47 +02:00
Miriam Baglioni
54800fb9b0
enabled only the step to upload in zenodo
2020-09-25 14:40:22 +02:00
Miriam Baglioni
12c2dfc268
modified the resource to consider the information added to the model
2020-09-25 14:17:23 +02:00
Miriam Baglioni
969fa8d96e
fixed issue and changed the transformation of the programme file to consider the new model
2020-09-25 13:32:34 +02:00
miconis
4cf79f32eb
implementation of the oozie wf to prepare the openorgs input: relations between organizations
2020-09-25 11:29:51 +02:00
Michele Artini
c171fdebe1
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-09-25 09:03:09 +02:00
Michele Artini
c96598aaa4
opendoar partition
2020-09-25 09:02:58 +02:00
Miriam Baglioni
de6c4d46d8
fixed conflicts
2020-09-24 15:35:01 +02:00
Miriam Baglioni
e917281822
-
2020-09-24 15:24:05 +02:00
Miriam Baglioni
9f54f69e6d
added topic information
2020-09-24 15:23:35 +02:00
Miriam Baglioni
d6206d6e63
add the topic description to the action set associated to the project
2020-09-24 15:22:40 +02:00
Miriam Baglioni
6b50226f3b
added topic code and topic description
2020-09-24 15:21:49 +02:00
Miriam Baglioni
15af1f527e
modified to consider the topic information
2020-09-24 15:20:56 +02:00
Miriam Baglioni
609ff17cfc
now the commission gives us the framework programme (FP7 - H2020) so use this information to filter out programmes not associated to H2020
2020-09-24 15:19:31 +02:00
Miriam Baglioni
b66f930466
Added optional1 and optional2 information to the files read from the db. Optional1 contains the topic code and optional2 contains the topic description
2020-09-24 15:16:56 +02:00
Miriam Baglioni
860e6d38a6
added topic description to the CSV project variables
2020-09-24 15:15:26 +02:00
Claudio Atzori
044d3a0214
fixed query used to load datasources in the Graph
2020-09-24 13:48:58 +02:00
Claudio Atzori
27df1cea6d
code formatting
2020-09-24 12:16:00 +02:00
Claudio Atzori
fb22f4d70b
included values for projects fundedamount and totalcost fields in the mapping tests. Swapped expected and actual values in junit test assertions
2020-09-24 12:10:59 +02:00
Claudio Atzori
42f55395c8
fixed order of the ISSNs returned by the SQL query
2020-09-24 12:09:58 +02:00
Claudio Atzori
fadf5c7c69
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-09-24 10:42:52 +02:00
Claudio Atzori
9a7e72d528
using concat_ws to join textual columns from PSQL. When using || to perform the concatenation, Null columns make the result of the operation Null
2020-09-24 10:42:47 +02:00
Claudio Atzori
9e3e93c6b6
setting the correct issn type in the datasource.journal element
2020-09-24 10:39:16 +02:00
Miriam Baglioni
0d83f47166
merge branch with master
2020-09-23 17:33:49 +02:00
Miriam Baglioni
39eb8ab25b
changed the dump to move from h2020programme to h2020classification
2020-09-23 17:33:00 +02:00
Miriam Baglioni
1d84cf19a6
added new line to resource file
2020-09-23 17:32:22 +02:00
Miriam Baglioni
f0c476b6c9
modification to the test classes to consider h2020classification
2020-09-23 17:31:49 +02:00
Miriam Baglioni
2cba3cb484
modification to the classes building the actionset to consider the h2020classification
2020-09-23 17:31:15 +02:00
Miriam Baglioni
1069cf243a
modification to the schema to consider the H2020classification of the programme. The field Programme has been moved inside the H2020classification that is now associated to the Project. Programme is no longer associated directly to the Project but via H2020CLassification
2020-09-22 14:38:00 +02:00
Enrico Ottonello
a97ad20c7b
exception is now propagated (PR review)
2020-09-22 10:46:34 +02:00
Enrico Ottonello
fefbcfb106
dependency version moved to main pom (PR review)
2020-09-22 10:20:25 +02:00
miconis
259362ef47
implementation of the job to collect simrels from postgres db
2020-09-22 09:43:27 +02:00
Michele Artini
9e681609fd
stats to sql file
2020-09-17 15:51:22 +02:00
Michele Artini
51321c2701
partition of events by opedoarId
2020-09-17 11:38:07 +02:00
Claudio Atzori
cf2ce1a09b
code formatting
2020-09-15 15:58:03 +02:00
Enrico Ottonello
9e8e7fe6ef
add comments
2020-09-15 11:32:49 +02:00
Miriam Baglioni
c2b5c780ff
-
2020-09-14 14:34:03 +02:00
Miriam Baglioni
e2ceefe9be
-
2020-09-14 14:33:28 +02:00
Miriam Baglioni
1f893e63dc
-
2020-09-14 14:33:10 +02:00
Enrico Ottonello
538f299767
merged
2020-09-14 12:35:16 +02:00
Enrico Ottonello
eb8c9b2348
Merge remote-tracking branch 'upstream/master' into orcid-no-doi
2020-09-14 12:00:56 +02:00
Michele Artini
9b0c12f5d3
send notifications
2020-09-11 12:06:16 +02:00
Michele Artini
028613b751
remove old notifications
2020-09-09 15:32:06 +02:00
Michele Artini
9cfc124ac5
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-09-08 16:39:54 +02:00
Michele Artini
a597a218ab
* forall topics
2020-09-08 16:39:40 +02:00
Claudio Atzori
8a523474b7
code formatting
2020-09-07 11:40:16 +02:00
Michele Artini
bb459caf69
support for all topic subscriptions
2020-08-27 11:01:21 +02:00
Michele Artini
82ed8edafd
notification indexing
2020-08-26 15:10:48 +02:00
Miriam Baglioni
b72a7dad46
resource for pid graph dump
2020-08-24 17:09:01 +02:00
Miriam Baglioni
8694bb9b31
refactoring due to compilation
2020-08-24 17:07:34 +02:00
Miriam Baglioni
8a069a4fea
-
2020-08-24 17:01:30 +02:00
Miriam Baglioni
34fa96f3b1
-
2020-08-24 17:00:20 +02:00
Miriam Baglioni
5fb2949cb8
added utils methods
2020-08-24 17:00:09 +02:00
Miriam Baglioni
2a540b6c01
added constants for the pid graph dump
2020-08-24 16:55:35 +02:00
Miriam Baglioni
da103c399a
resources for the pid graph dump test
2020-08-24 16:52:07 +02:00
Miriam Baglioni
630a6a1fe7
first tests for the pid graph dump
2020-08-24 16:51:26 +02:00
Miriam Baglioni
40c8d2de7b
test resources for the dump of the pids graph
2020-08-24 16:50:39 +02:00
Miriam Baglioni
bef79d3bdf
first attempt to the dump of pids graph
2020-08-24 16:49:38 +02:00
Michele Artini
da470422d3
deleting events
2020-08-21 14:52:48 +02:00
Michele Artini
6e60bf026a
indexing only a subset of events
2020-08-19 12:39:22 +02:00
Miriam Baglioni
85203c16e3
merge branch with master
2020-08-19 11:49:03 +02:00
Miriam Baglioni
2c783793ba
removed the affiliation from the author to mirror the changes in the model
2020-08-19 11:48:12 +02:00
Miriam Baglioni
f6bf888016
removed affiliation from author to mirror the changes in the model
2020-08-19 11:41:41 +02:00
Miriam Baglioni
66d0e0d3f2
-
2020-08-19 11:31:50 +02:00
Miriam Baglioni
1c593a9cfe
-
2020-08-19 11:29:51 +02:00
Miriam Baglioni
e42b2f5ae2
-
2020-08-19 11:29:09 +02:00
Miriam Baglioni
f81ee22418
changed to mirror the changes in the model (Instance, CommunityInstance, GraphResult)
2020-08-19 11:28:26 +02:00
Miriam Baglioni
387be43fd4
changed to discriminate between dumping all the result types together or each one in its own archive
2020-08-19 11:25:27 +02:00
Miriam Baglioni
c5858afb88
added parameter to guide the dump for the result (resultAggregation). true if all the result types should be dumped together, false otherwise.
2020-08-19 11:24:14 +02:00
Miriam Baglioni
d407852ac2
changed to reflect the changes in the model
2020-08-19 11:15:05 +02:00
Miriam Baglioni
47c21a8961
refactoring due to compilation
2020-08-19 11:11:57 +02:00
Miriam Baglioni
5570678c65
changed parameter name from hdfsNameNode to nameNode
2020-08-19 10:59:26 +02:00
Miriam Baglioni
dc5096a327
refactoring due to compilation
2020-08-19 10:57:36 +02:00
Miriam Baglioni
55e24c2547
relclass for relation and corresponding values have been put to lower case (isSupplementedBy written as IsSupplementedBy - orcid propagation)
2020-08-18 16:42:08 +02:00
Miriam Baglioni
f44dd5d886
changed in mapping the result semantic name as it will be visible in the relclass Relation: from IsSupplementedBy to isSupplementedBy
2020-08-17 17:15:09 +02:00
Miriam Baglioni
bc6b5d5b34
removed leftover parameter
2020-08-15 11:22:35 +02:00
Miriam Baglioni
200cd5c730
removed leftover parameter
2020-08-15 11:22:19 +02:00
Miriam Baglioni
96600ed04a
modified test resource for mirroring the deletion of affiliation from author parameters
2020-08-14 20:41:49 +02:00
Miriam Baglioni
09f5b92763
added specific reference to class
2020-08-14 20:00:09 +02:00
Miriam Baglioni
37e7c43652
changed parameter name from hdfsNameNode to nameNode
2020-08-14 18:18:25 +02:00
Claudio Atzori
5b994d7ccf
Merge branch 'dump' of https://code-repo.d4science.org/miriam.baglioni/dnet-hadoop into resolve_conflicts_pr40_dump
2020-08-14 15:32:29 +02:00
Miriam Baglioni
de995970ea
try again to solve clash with master
2020-08-14 15:24:36 +02:00
Miriam Baglioni
5040d72d5e
changed to make it equal to master branch
2020-08-14 15:20:17 +02:00
Miriam Baglioni
be8106c339
added space to avoid conflicts with master branch
2020-08-14 15:16:27 +02:00
Claudio Atzori
1871d1c6f6
solve error java.lang.NoSuchFieldError: INSTANCE when instantiating Solr client
2020-08-14 11:18:30 +02:00
Miriam Baglioni
d2a8a4961a
refactoring
2020-08-13 18:50:33 +02:00
Miriam Baglioni
a5043de5da
added method to get the mapped instance
2020-08-13 18:45:50 +02:00
Miriam Baglioni
b7e49aee8d
removed commented code
2020-08-13 18:44:07 +02:00
Miriam Baglioni
f439a6231e
added missing constraint in XQuery (verify the status of the RC/RI different from hidden)
2020-08-13 15:30:55 +02:00
Miriam Baglioni
0fe800b1c9
modified because of D-Net/dnet-hadoop#40 #issuecomment-1902
2020-08-13 15:17:12 +02:00
Miriam Baglioni
270c89489c
fixed issue created while renaming subject to subjects in community configuration xml
2020-08-13 15:16:04 +02:00
Miriam Baglioni
fcd10f452c
changed because of D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:55:32 +02:00
Miriam Baglioni
fd48ae3b85
changed because of D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:19:15 +02:00
Miriam Baglioni
04a3e1ab38
disabled tests
2020-08-13 12:18:13 +02:00
Miriam Baglioni
2ede397933
Apply change because of D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:16:39 +02:00
Miriam Baglioni
bfd1fcde6d
removed not useful method and changed because of D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:14:37 +02:00
Miriam Baglioni
7fd8397123
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:13:15 +02:00
Miriam Baglioni
753d448cc9
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:12:58 +02:00
Miriam Baglioni
c0e071fa26
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:12:40 +02:00
Miriam Baglioni
526db915bc
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:12:16 +02:00
Miriam Baglioni
b0fab0d138
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:11:57 +02:00
Miriam Baglioni
1b6320b251
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:11:41 +02:00
Miriam Baglioni
743d31be22
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:11:22 +02:00
Miriam Baglioni
65b48df652
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:11:06 +02:00
Miriam Baglioni
90b54d3efb
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:08:24 +02:00
Miriam Baglioni
69bbb9592a
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:07:39 +02:00
Miriam Baglioni
945323299a
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:07:24 +02:00
Miriam Baglioni
e04c993247
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:07:07 +02:00
Miriam Baglioni
ed0812d0ce
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:06:49 +02:00
Miriam Baglioni
d55cfe0ea5
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:06:20 +02:00
Miriam Baglioni
80866bec7d
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:06:05 +02:00
Miriam Baglioni
1400978c0a
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:05:44 +02:00
Miriam Baglioni
7b941a2e0a
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:05:17 +02:00
Miriam Baglioni
f7474f50fe
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:04:52 +02:00
Miriam Baglioni
367203f412
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:04:33 +02:00
Miriam Baglioni
3ab4809d31
apply changes in D-Net/dnet-hadoop#40 (comment)
2020-08-13 12:04:10 +02:00
Miriam Baglioni
02a4986e7b
Applying changes from code reviews D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment)
2020-08-13 11:53:01 +02:00
Miriam Baglioni
235d4e4d6e
moved Context as relevant for Communities dump
2020-08-12 18:16:45 +02:00
Miriam Baglioni
adf9f96a67
test for extraction of relation between organizations and context
2020-08-12 10:04:47 +02:00
Miriam Baglioni
7400cd019d
removed not needed variable
2020-08-12 10:03:33 +02:00
Miriam Baglioni
98d28bab5c
fixed missing _ in context nsprefix
2020-08-12 10:00:18 +02:00
Miriam Baglioni
8f48cb29f4
changed resource because of a change in the XQuery that returned the XML to be parsed. The main Zenodo community is no longer a separate element, but part of the <zenodocommunities> element
2020-08-11 18:04:38 +02:00
Miriam Baglioni
c3672b162b
merge branch with master
2020-08-11 17:53:04 +02:00
Miriam Baglioni
a16bbf3202
changed test resource to mirror change in the XQuery that produced data to be parsed. The main Zenodo community is no longer provided in a different element, but is part of the <zenodocommunities>
2020-08-11 17:48:44 +02:00
Miriam Baglioni
25f4fbceea
draft of test and resources
2020-08-11 17:37:22 +02:00
Miriam Baglioni
30a2b19b65
changed metadata for deposition of covid-19 dump in Zenodo
2020-08-11 17:36:56 +02:00
Claudio Atzori
f7cc52ab02
Merge pull request 'enrichment_wfs' ( #39 ) from enrichment_wfs into master
LGTM
2020-08-11 17:26:13 +02:00
Miriam Baglioni
49788b532a
changed to mirror changes in the schema
2020-08-11 16:05:03 +02:00
Miriam Baglioni
b08511287b
-
2020-08-11 16:01:36 +02:00
Miriam Baglioni
7e81a17068
changed the XQUERY to mirror the change in the code
2020-08-11 16:00:33 +02:00
Miriam Baglioni
37ad2f28e9
removed added | in prefix for datasource
2020-08-11 15:55:06 +02:00
Miriam Baglioni
f31c2e9461
enabled test
2020-08-11 15:49:25 +02:00
Miriam Baglioni
2d67476417
merge branch with master
2020-08-11 15:46:04 +02:00
Miriam Baglioni
77a390878c
merge upstream
2020-08-11 15:45:48 +02:00
Miriam Baglioni
6d3804e24c
-
2020-08-11 15:45:12 +02:00
Miriam Baglioni
0603ec4757
changed test to upload the dump for covid-19 community
2020-08-11 15:43:25 +02:00
Miriam Baglioni
7dfd56df9d
-
2020-08-11 15:42:35 +02:00
Miriam Baglioni
a169d7e7c1
added test file for the MakeTar class
2020-08-11 15:40:41 +02:00
Miriam Baglioni
acb0926b2e
json schemas for the dumped entities and relation
2020-08-11 15:39:48 +02:00
Miriam Baglioni
ff52c51f92
added the communityMapPath parameter and removed the isLookUpUrl parameter
2020-08-11 15:39:22 +02:00
Miriam Baglioni
6f43acda5e
added the maketar and send to zenodo step. Adjusted wf parameters
2020-08-11 15:38:20 +02:00