Miriam Baglioni
|
21ce175d17
|
added FilterFunction specification if filter operation
|
2020-11-25 13:42:31 +01:00 |
Miriam Baglioni
|
bde6d337dd
|
test classes for dump of results related to funders
|
2020-11-25 13:42:01 +01:00 |
Miriam Baglioni
|
b37b9352d7
|
added constant value for semantic relationship between projects and results
|
2020-11-25 13:41:08 +01:00 |
Sandro La Bruzzo
|
264723ffd8
|
updated stuff for zenodo upload
|
2020-11-25 11:56:07 +01:00 |
Claudio Atzori
|
eeebd5a920
|
Cleanig workflow: remove newlines from titles, descriptions, subjects
|
2020-11-24 18:40:25 +01:00 |
Enrico Ottonello
|
99a086f0c6
|
max concurrent executors set to 10, according to ORCID Director of Technology mail request
|
2020-11-24 17:49:32 +01:00 |
Miriam Baglioni
|
72bb0fe360
|
changed directory name
|
2020-11-24 16:47:07 +01:00 |
Miriam Baglioni
|
00874a8ce6
|
added bidirectionality to relations from project and result
|
2020-11-24 15:17:23 +01:00 |
Miriam Baglioni
|
39f4a20873
|
chenged the path and the name for saving the communities_infrastructures dump file
|
2020-11-24 14:47:32 +01:00 |
Miriam Baglioni
|
7e14452a87
|
final versione of the wf to get the dump of results associated to at least one funder per funder
|
2020-11-24 14:46:34 +01:00 |
Miriam Baglioni
|
c167a18057
|
added new parameter for the dumpType
|
2020-11-24 14:45:50 +01:00 |
Miriam Baglioni
|
54a309bb6b
|
refactoring
|
2020-11-24 14:45:30 +01:00 |
Miriam Baglioni
|
35ecea8842
|
changed to consider the modification for the specification of the type of dump
|
2020-11-24 14:45:15 +01:00 |
Miriam Baglioni
|
b9b6bdb2e6
|
fixing issue on previous implementation
|
2020-11-24 14:44:53 +01:00 |
Miriam Baglioni
|
7e940f1991
|
changed to consider the modification for the specification of the type of dump
|
2020-11-24 14:43:34 +01:00 |
Miriam Baglioni
|
62928ef7a5
|
changed to save the communities_infrastructures information as the other entity dumps: in a json.gz file
|
2020-11-24 14:42:41 +01:00 |
Miriam Baglioni
|
3319440c53
|
changed the direction of the relation between projects and result considered to select the results linked to projects
|
2020-11-24 14:41:09 +01:00 |
Miriam Baglioni
|
00c377dac2
|
added specification of MapFunction types in map
|
2020-11-24 14:40:22 +01:00 |
Miriam Baglioni
|
44db258dc4
|
added enumerated for the dump type
|
2020-11-24 14:38:06 +01:00 |
Miriam Baglioni
|
1832708c42
|
modified boolean variable with string one whcih specify the type of dump we are performing: complete, community or funder
|
2020-11-24 14:37:36 +01:00 |
Enrico Ottonello
|
5c17e768b2
|
set wf configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions
|
2020-11-23 16:01:23 +01:00 |
Enrico Ottonello
|
5c9a727895
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-23 09:49:53 +01:00 |
Enrico Ottonello
|
97c8111847
|
action to convert lambda file in seq file; spark action to download updated authors
|
2020-11-23 09:49:22 +01:00 |
Miriam Baglioni
|
259c67ce36
|
fixed issue in path name
|
2020-11-20 12:32:23 +01:00 |
Miriam Baglioni
|
0a9db67eec
|
-
|
2020-11-20 12:21:33 +01:00 |
Miriam Baglioni
|
d362f2637d
|
merge branch with master
|
2020-11-19 19:17:20 +01:00 |
Miriam Baglioni
|
cf3f47563f
|
new parameter files
|
2020-11-19 19:16:05 +01:00 |
Miriam Baglioni
|
24c56fa7a3
|
new logic and workflow for dump of results with link to projects. In this implementation the result match the model of the communityresult.
|
2020-11-19 19:15:39 +01:00 |
Claudio Atzori
|
d48f388fb2
|
Merge branch 'provision_indexing'
|
2020-11-19 15:59:55 +01:00 |
Claudio Atzori
|
46bde9c13f
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-19 15:26:27 +01:00 |
Claudio Atzori
|
7c9feaf9e7
|
project attributes removed from the XML record serialization: contactfullname, contactfax, contactphone, contactemail
|
2020-11-19 15:26:20 +01:00 |
Michele Artini
|
293da47ad9
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-11-19 10:42:31 +01:00 |
Michele Artini
|
ab08d12c46
|
considering abstract > MIN_LENGTH in ENRICH_MISSING_ABSTRACT
|
2020-11-19 10:42:10 +01:00 |
Claudio Atzori
|
e503271abe
|
fixed notification workflow name
|
2020-11-19 10:41:38 +01:00 |
Claudio Atzori
|
0374d34c3e
|
introduced configuration param outputFormat: HDFS | SOLR
|
2020-11-19 10:34:28 +01:00 |
Miriam Baglioni
|
fafb688887
|
-
|
2020-11-18 18:56:48 +01:00 |
Miriam Baglioni
|
906db690d2
|
-
|
2020-11-18 17:43:08 +01:00 |
Claudio Atzori
|
ede7fae6c8
|
Merge pull request 'XML record indexing test' (#58) from provision_indexing into master
|
2020-11-18 17:04:34 +01:00 |
Miriam Baglioni
|
5402062ff5
|
changed parameter file with the ono associated to the job
|
2020-11-18 16:58:20 +01:00 |
Miriam Baglioni
|
a172a37ad1
|
fixed typo
|
2020-11-18 16:55:07 +01:00 |
Miriam Baglioni
|
46ba3793f6
|
code, workflow and parameters for the dump of the results associated to funders
|
2020-11-18 16:47:31 +01:00 |
Claudio Atzori
|
5218718e8b
|
updated set of fields from the MDFormatDSResourceType on PROD
|
2020-11-18 15:00:41 +01:00 |
Claudio Atzori
|
d9e07a242b
|
extended XmlIndexingJob to accept an optional parameter: outputPath. When present, forces the job to write its output on the specified HDFS location
|
2020-11-18 14:34:55 +01:00 |
Claudio Atzori
|
29dcff0f34
|
spark complains about missing classes, so here they are again
|
2020-11-18 14:32:32 +01:00 |
Miriam Baglioni
|
57cac36898
|
changed the workflow name
|
2020-11-18 13:38:03 +01:00 |
Claudio Atzori
|
12acf25519
|
Merge pull request 'starting from first step...' (#57) from antonis.lempesis/dnet-hadoop:master into master
No judging. Just re-deploying...
|
2020-11-18 11:01:49 +01:00 |
Claudio Atzori
|
8177ce7939
|
test for XmlIndexingJob based on a local miniSolrCluster
|
2020-11-18 10:58:05 +01:00 |
Alessia Bardi
|
10e673660f
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-18 10:01:23 +01:00 |
Alessia Bardi
|
be7b310cef
|
rel semantcis ignore case
|
2020-11-18 10:01:20 +01:00 |
Michele Artini
|
33da2e3d6c
|
xpaths for dateOfCollection and dateOfTransformation
|
2020-11-18 09:26:20 +01:00 |
Antonis Lempesis
|
01a6e03989
|
starting from first step...
|
2020-11-17 23:26:47 +02:00 |
Alessia Bardi
|
8f87020a50
|
#56: map relevantDates from aggregated ODF records
|
2020-11-17 18:42:09 +01:00 |
Alessia Bardi
|
7e0a76a8ac
|
test fr TextGrid
|
2020-11-17 18:39:25 +01:00 |
Enrico Ottonello
|
2b0c9bbb7e
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-17 18:24:34 +01:00 |
Enrico Ottonello
|
c0c2e05eae
|
added wf to extracting authors and works xml data from orcid dump to hdfs; added wf to download the lamda file (containing last orcid update informations) from orcid to hdfs
|
2020-11-17 18:23:12 +01:00 |
Claudio Atzori
|
cfc01f136e
|
PID filtering based on a blacklist
|
2020-11-17 12:27:06 +01:00 |
Enrico Ottonello
|
c796adae24
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-16 11:57:19 +01:00 |
Claudio Atzori
|
6ab1ce53c9
|
fixed condition in result pid cleaning; cleanup
|
2020-11-16 10:09:17 +01:00 |
Claudio Atzori
|
4de8c8b237
|
fixed workflow variable name
|
2020-11-16 10:03:11 +01:00 |
Claudio Atzori
|
331d621800
|
added test resource
|
2020-11-14 12:16:15 +01:00 |
Claudio Atzori
|
5d4e34e26a
|
fixed typo in variable name
|
2020-11-14 10:32:26 +01:00 |
Claudio Atzori
|
768bc5304c
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-13 15:40:34 +01:00 |
Claudio Atzori
|
93f7b7974f
|
Merge pull request 'trust truncated to 3 decimals' (#24) from trunc_trust into master
LGTM
|
2020-11-13 15:40:02 +01:00 |
Claudio Atzori
|
528231a287
|
grouping graph entities by id turned out to be an easy extension for the already existing cleaning workflow
|
2020-11-13 15:37:48 +01:00 |
Enrico Ottonello
|
005f849674
|
added compression to output dataset
|
2020-11-13 12:45:31 +01:00 |
Enrico Ottonello
|
9a2fa9dc2f
|
added test for other names parsing from summaries dump
|
2020-11-13 10:25:34 +01:00 |
Claudio Atzori
|
2bed29eb09
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:12 +01:00 |
Claudio Atzori
|
13e36a4da0
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:02 +01:00 |
Enrico Ottonello
|
13f28fa225
|
moved AuthorData to dhp-schemas; added other names to author data
|
2020-11-12 17:43:32 +01:00 |
Enrico Ottonello
|
2af21150c5
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-12 09:58:33 +01:00 |
Claudio Atzori
|
75324ae58a
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-12 09:23:37 +01:00 |
Claudio Atzori
|
822971f54f
|
no need to filter relations in CreateRelatedEntitiesJob_phase1; replaced 'left outer' join with 'left' join in CreateRelatedEntitiesJob_phase2; cleanup;
|
2020-11-12 09:22:59 +01:00 |
Enrico Ottonello
|
1f861f2b0d
|
now wf output is a sequence file with the format seq("eu.dnetlib.dhp.schema.oaf.Publication",eu.dnetlib.dhp.schema.action.AtomicActions)
|
2020-11-11 17:38:50 +01:00 |
Claudio Atzori
|
9841488482
|
Merge pull request 'latest changes in stats wf' (#54) from antonis.lempesis/dnet-hadoop:master into master
LGTM, thanks!
|
2020-11-11 16:01:51 +01:00 |
Antonis Lempesis
|
99ebaee347
|
fixed #5913
|
2020-11-11 16:56:46 +02:00 |
Claudio Atzori
|
e3d3481fb9
|
Merge pull request 'organizations pids' (#53) from organization_pids into master
LGTM
|
2020-11-11 14:08:25 +01:00 |
Antonis Lempesis
|
f14e65f6a3
|
reverted wrong change
|
2020-11-10 17:23:04 +02:00 |
Antonis Lempesis
|
c02c7741c9
|
fixes in db creation
|
2020-11-10 17:11:30 +02:00 |
Antonis Lempesis
|
e603fa5847
|
fixes in db creation
|
2020-11-10 17:11:12 +02:00 |
Enrico Ottonello
|
fea2451658
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-10 11:49:43 +01:00 |
Claudio Atzori
|
18d9aad70c
|
improved documentation in dhp-graph-provision
|
2020-11-10 11:48:55 +01:00 |
Enrico Ottonello
|
1513174d7e
|
added further test case
|
2020-11-10 11:44:55 +01:00 |
Michele Artini
|
40160d171f
|
organizations pids
|
2020-11-09 12:58:36 +01:00 |
Sandro La Bruzzo
|
027ef2326c
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-11-06 17:12:42 +01:00 |
Sandro La Bruzzo
|
cd27df91a1
|
fixed bug on missing relation in ANDS
|
2020-11-06 17:12:31 +01:00 |
Enrico Ottonello
|
6bc7dbeca7
|
first version of dataset successful generated from orcid dump 2020
|
2020-11-06 13:47:50 +01:00 |
Claudio Atzori
|
d10447e747
|
re-packaged graph dump workflow sources
|
2020-11-05 17:38:18 +01:00 |
Miriam Baglioni
|
f8e9bda24c
|
merge branch with master
|
2020-11-05 16:31:18 +01:00 |
Miriam Baglioni
|
be5ed8f554
|
added check to avoid sending empty metadata.
|
2020-11-05 16:10:17 +01:00 |
Claudio Atzori
|
2148a51fae
|
minor changes
|
2020-11-05 11:24:12 +01:00 |
Claudio Atzori
|
4625b7486e
|
code formatting
|
2020-11-04 18:12:43 +01:00 |
Claudio Atzori
|
f5f346dd2b
|
Merge pull request 'dump' (#50) from miriam.baglioni/dnet-hadoop:dump into master
LGTM
|
2020-11-04 18:07:01 +01:00 |
Miriam Baglioni
|
e9ac471ae9
|
removed dependency from classes for the pid graph dump
|
2020-11-04 18:04:42 +01:00 |
Miriam Baglioni
|
b90a945c49
|
removed property files for pid graph dump
|
2020-11-04 17:28:33 +01:00 |
Miriam Baglioni
|
bac307155a
|
removed properties specific for pid graph dump
|
2020-11-04 17:28:04 +01:00 |
Miriam Baglioni
|
9c9d50f486
|
removed code specific for pid graph dump
|
2020-11-04 17:26:22 +01:00 |
Miriam Baglioni
|
5669890934
|
removed commented lines
|
2020-11-04 17:15:21 +01:00 |
Miriam Baglioni
|
6a89f59be9
|
removed commented lines
|
2020-11-04 17:13:59 +01:00 |
Miriam Baglioni
|
56150d7e5e
|
removed all code related to the dump of pids graph
|
2020-11-04 17:13:12 +01:00 |
Miriam Baglioni
|
16c54a96f8
|
removed pid dump
|
2020-11-04 17:11:32 +01:00 |
Miriam Baglioni
|
0cac5436ff
|
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
|
2020-11-04 13:21:11 +01:00 |
Alessia Bardi
|
51808b5afd
|
Updated descriptions
|
2020-11-04 12:29:48 +01:00 |
Alessia Bardi
|
e6becf8659
|
Updated descriptions
|
2020-11-04 12:17:57 +01:00 |
Alessia Bardi
|
0abe0eee33
|
Updated descriptions
|
2020-11-04 12:15:30 +01:00 |
Alessia Bardi
|
f6ab238f5d
|
Updated descriptions
|
2020-11-04 11:50:47 +01:00 |
Sandro La Bruzzo
|
3581244daf
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-11-04 09:04:22 +01:00 |
Sandro La Bruzzo
|
66efb39634
|
implemented merge scholix
|
2020-11-04 09:04:01 +01:00 |
Miriam Baglioni
|
c010a8442f
|
fixed issue on test code
|
2020-11-03 17:26:51 +01:00 |
Miriam Baglioni
|
8ec7a61188
|
merge branch with master
|
2020-11-03 16:59:08 +01:00 |
Miriam Baglioni
|
c209284ca7
|
new schemas for the entities in the dump with added descriptions
|
2020-11-03 16:58:08 +01:00 |
Miriam Baglioni
|
08806deddf
|
added the splitSize non mandatory parameter. Default size 10G
|
2020-11-03 16:57:34 +01:00 |
Miriam Baglioni
|
7d2eda43ca
|
added new non mandatory property publish to determine if to publish the upload or leave it pending. Default value flase
|
2020-11-03 16:57:01 +01:00 |
Miriam Baglioni
|
cbbb1bdc54
|
moved business logic to new class in common for handling the zip of hte archives
|
2020-11-03 16:55:50 +01:00 |
Miriam Baglioni
|
d4382b54df
|
moved the tar archive with maz size on common module
|
2020-11-03 16:54:50 +01:00 |
Claudio Atzori
|
5310e56dba
|
remove empy PIDs
|
2020-11-03 11:52:10 +01:00 |
Sandro La Bruzzo
|
754c86f33e
|
fixed test to work on jenkins
|
2020-11-02 09:35:01 +01:00 |
Sandro La Bruzzo
|
39337d8a8a
|
fixed test
|
2020-11-02 09:26:25 +01:00 |
Miriam Baglioni
|
dabb33e018
|
changed the discriminant for which split the file
|
2020-10-30 17:52:22 +01:00 |
Claudio Atzori
|
c5dda3a00c
|
Merge pull request 'h2020classification' (#49) from miriam.baglioni/dnet-hadoop:h2020classification into master
LGTM
|
2020-10-30 17:10:05 +01:00 |
Miriam Baglioni
|
4905739be6
|
changed resource file to mirror change in business logic
|
2020-10-30 17:02:57 +01:00 |
Miriam Baglioni
|
b40360ebfb
|
changed the code to mirror the changed decision in the classification level and prodramme description labels
|
2020-10-30 17:02:30 +01:00 |
Miriam Baglioni
|
696409fb9f
|
disabled tests because needing remote resource
|
2020-10-30 17:01:48 +01:00 |
Miriam Baglioni
|
0fba08eae4
|
max allowed size per file 10 Gb
|
2020-10-30 16:05:55 +01:00 |
Miriam Baglioni
|
b828587252
|
prevent the code to cicle indefinetly
|
2020-10-30 15:01:25 +01:00 |
Miriam Baglioni
|
f747e303ac
|
classes for dumping of the graph as ttl file
|
2020-10-30 14:13:45 +01:00 |
Miriam Baglioni
|
16baf5b69e
|
formatting
|
2020-10-30 14:13:14 +01:00 |
Miriam Baglioni
|
a9eef9c852
|
added check for possible Optional value in relation dataInfo
|
2020-10-30 14:12:28 +01:00 |
Miriam Baglioni
|
5f4de9a962
|
formatting
|
2020-10-30 14:11:40 +01:00 |
Miriam Baglioni
|
14bf2e7238
|
added option to split dumps bigger that 40Gb on different files
|
2020-10-30 14:09:04 +01:00 |
Miriam Baglioni
|
78fdb11c3f
|
merge branch with master
|
2020-10-29 12:55:22 +01:00 |
Sandro La Bruzzo
|
1d9fdb7367
|
fixed spark memory issue in SparkSplitOafTODLIEntities
|
2020-10-28 12:30:32 +01:00 |
Miriam Baglioni
|
d2374e3b9e
|
added code to handle cases where the funding tree is not existing
|
2020-10-27 16:15:21 +01:00 |
Miriam Baglioni
|
5d3012eeb4
|
changed code to dump only the programme list and not the classification list
|
2020-10-27 16:14:18 +01:00 |
Miriam Baglioni
|
3241ec1777
|
added connection timeout and socket timeout 600 sec
|
2020-10-27 16:12:11 +01:00 |
Enrico Ottonello
|
9818e74a70
|
added dependency version in main pom.xml for orcid no doi
|
2020-10-22 16:38:00 +02:00 |
Enrico Ottonello
|
210a50e4f4
|
replaced null value
|
2020-10-22 16:24:42 +02:00 |
Enrico Ottonello
|
b0290dbcb7
|
moved all dependencies version to main pom.xml
|
2020-10-22 16:20:46 +02:00 |
Enrico Ottonello
|
a38ab57062
|
let run test methods
|
2020-10-22 15:43:50 +02:00 |
Enrico Ottonello
|
1139d6568d
|
replaced null value with a more safe empty string as return value
|
2020-10-22 15:32:26 +02:00 |
Enrico Ottonello
|
c58db1c8ea
|
added filter on null value after map function
|
2020-10-22 15:11:02 +02:00 |
Enrico Ottonello
|
846ba30873
|
if typologies mapping fails, an exception will be propagated
|
2020-10-22 14:36:18 +02:00 |
Enrico Ottonello
|
c3114ba0ae
|
replaced null as return value with a more safe empty string
|
2020-10-22 14:21:31 +02:00 |
Enrico Ottonello
|
c295c71ca0
|
added comment
|
2020-10-22 14:07:26 +02:00 |
Enrico Ottonello
|
ab083f9946
|
propagate exception on parsing work (PR request)
|
2020-10-22 14:02:32 +02:00 |
sandro
|
3a81a940b7
|
solved bug on merge publication
|
2020-10-21 22:41:55 +02:00 |
Miriam Baglioni
|
a2ce527fae
|
changed to match the requirements for short titles in level and long titles in classification
|
2020-10-20 17:03:25 +02:00 |
Sandro La Bruzzo
|
346ed65e2c
|
added upload to zenodo node
|
2020-10-20 16:59:55 +02:00 |
sandro
|
271b4db450
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-10-20 16:09:49 +02:00 |
sandro
|
d58d02d448
|
added workflow upload on zenodo
|
2020-10-20 16:09:07 +02:00 |
Alessia Bardi
|
1425d810a8
|
testing mapping
|
2020-10-19 17:46:14 +02:00 |