Claudio Atzori
|
c016cc050a
|
IdentifierFactory: in case a record provides more than one pid of the same type, the lexicographically lowest value is chosen as best pick
|
2020-11-23 19:16:40 +01:00 |
Enrico Ottonello
|
5c17e768b2
|
set wf configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions
|
2020-11-23 16:01:23 +01:00 |
Enrico Ottonello
|
5c9a727895
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-23 09:49:53 +01:00 |
Enrico Ottonello
|
97c8111847
|
action to convert lambda file into seq file; spark action to download updated authors
|
2020-11-23 09:49:22 +01:00 |
Miriam Baglioni
|
259c67ce36
|
fixed issue in path name
|
2020-11-20 12:32:23 +01:00 |
Miriam Baglioni
|
0a9db67eec
|
-
|
2020-11-20 12:21:33 +01:00 |
Miriam Baglioni
|
d362f2637d
|
merge branch with master
|
2020-11-19 19:17:20 +01:00 |
Miriam Baglioni
|
cf3f47563f
|
new parameter files
|
2020-11-19 19:16:05 +01:00 |
Miriam Baglioni
|
24c56fa7a3
|
new logic and workflow for dump of results with link to projects. In this implementation the result matches the model of the communityresult.
|
2020-11-19 19:15:39 +01:00 |
Claudio Atzori
|
d48f388fb2
|
Merge branch 'provision_indexing'
|
2020-11-19 15:59:55 +01:00 |
Claudio Atzori
|
46bde9c13f
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-19 15:26:27 +01:00 |
Claudio Atzori
|
7c9feaf9e7
|
project attributes removed from the XML record serialization: contactfullname, contactfax, contactphone, contactemail
|
2020-11-19 15:26:20 +01:00 |
Claudio Atzori
|
fcbb05eb21
|
cleanup
|
2020-11-19 15:14:33 +01:00 |
Claudio Atzori
|
3f34757c63
|
merged from master
|
2020-11-19 14:34:54 +01:00 |
Michele Artini
|
293da47ad9
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-11-19 10:42:31 +01:00 |
Michele Artini
|
ab08d12c46
|
considering abstract > MIN_LENGTH in ENRICH_MISSING_ABSTRACT
|
2020-11-19 10:42:10 +01:00 |
Claudio Atzori
|
e503271abe
|
fixed notification workflow name
|
2020-11-19 10:41:38 +01:00 |
Claudio Atzori
|
0374d34c3e
|
introduced configuration param outputFormat: HDFS | SOLR
|
2020-11-19 10:34:28 +01:00 |
Miriam Baglioni
|
fafb688887
|
-
|
2020-11-18 18:56:48 +01:00 |
Miriam Baglioni
|
906db690d2
|
-
|
2020-11-18 17:43:08 +01:00 |
Claudio Atzori
|
ede7fae6c8
|
Merge pull request 'XML record indexing test' (#58) from provision_indexing into master
|
2020-11-18 17:04:34 +01:00 |
Miriam Baglioni
|
5402062ff5
|
changed parameter file with the one associated with the job
|
2020-11-18 16:58:20 +01:00 |
Miriam Baglioni
|
a172a37ad1
|
fixed typo
|
2020-11-18 16:55:07 +01:00 |
Miriam Baglioni
|
46ba3793f6
|
code, workflow and parameters for the dump of the results associated to funders
|
2020-11-18 16:47:31 +01:00 |
Claudio Atzori
|
5218718e8b
|
updated set of fields from the MDFormatDSResourceType on PROD
|
2020-11-18 15:00:41 +01:00 |
Claudio Atzori
|
d9e07a242b
|
extended XmlIndexingJob to accept an optional parameter: outputPath. When present, forces the job to write its output on the specified HDFS location
|
2020-11-18 14:34:55 +01:00 |
Claudio Atzori
|
29dcff0f34
|
spark complains about missing classes, so here they are again
|
2020-11-18 14:32:32 +01:00 |
Miriam Baglioni
|
57cac36898
|
changed the workflow name
|
2020-11-18 13:38:03 +01:00 |
Claudio Atzori
|
12acf25519
|
Merge pull request 'starting from first step...' (#57) from antonis.lempesis/dnet-hadoop:master into master
No judging. Just re-deploying...
|
2020-11-18 11:01:49 +01:00 |
Claudio Atzori
|
8177ce7939
|
test for XmlIndexingJob based on a local miniSolrCluster
|
2020-11-18 10:58:05 +01:00 |
Alessia Bardi
|
10e673660f
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-18 10:01:23 +01:00 |
Alessia Bardi
|
be7b310cef
|
rel semantics ignore case
|
2020-11-18 10:01:20 +01:00 |
Michele Artini
|
33da2e3d6c
|
xpaths for dateOfCollection and dateOfTransformation
|
2020-11-18 09:26:20 +01:00 |
Antonis Lempesis
|
01a6e03989
|
starting from first step...
|
2020-11-17 23:26:47 +02:00 |
Alessia Bardi
|
8f87020a50
|
#56: map relevantDates from aggregated ODF records
|
2020-11-17 18:42:09 +01:00 |
Alessia Bardi
|
7e0a76a8ac
|
test for TextGrid
|
2020-11-17 18:39:25 +01:00 |
Enrico Ottonello
|
2b0c9bbb7e
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-17 18:24:34 +01:00 |
Enrico Ottonello
|
c0c2e05eae
|
added wf to extract authors and works xml data from orcid dump to hdfs; added wf to download the lambda file (containing last orcid update information) from orcid to hdfs
|
2020-11-17 18:23:12 +01:00 |
Claudio Atzori
|
cfc01f136e
|
PID filtering based on a blacklist
|
2020-11-17 12:27:06 +01:00 |
Claudio Atzori
|
628ca54dd3
|
disable old maven repository URLs
|
2020-11-17 12:26:16 +01:00 |
Dimitris
|
bbcf6b7c8b
|
Commit 17112020
|
2020-11-17 08:36:51 +02:00 |
Enrico Ottonello
|
c796adae24
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-16 11:57:19 +01:00 |
Claudio Atzori
|
6ab1ce53c9
|
fixed condition in result pid cleaning; cleanup
|
2020-11-16 10:09:17 +01:00 |
Claudio Atzori
|
4de8c8b237
|
fixed workflow variable name
|
2020-11-16 10:03:11 +01:00 |
Dimitris
|
3e24c9b176
|
Changes 14112020
|
2020-11-14 18:42:07 +02:00 |
Claudio Atzori
|
331d621800
|
added test resource
|
2020-11-14 12:16:15 +01:00 |
Claudio Atzori
|
5d4e34e26a
|
fixed typo in variable name
|
2020-11-14 10:32:26 +01:00 |
Claudio Atzori
|
768bc5304c
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-13 15:40:34 +01:00 |
Claudio Atzori
|
93f7b7974f
|
Merge pull request 'trust truncated to 3 decimals' (#24) from trunc_trust into master
LGTM
|
2020-11-13 15:40:02 +01:00 |
Claudio Atzori
|
2facfefc19
|
updated maven repository URL
|
2020-11-13 15:38:40 +01:00 |