Claudio Atzori
|
1eaad89a3c
|
do not fail on unknown properties when grouping entities by ID
|
2020-12-10 15:56:11 +01:00 |
Michele Artini
|
933b4c1ada
|
workingDir and outputDir
|
2020-12-10 14:47:51 +01:00 |
Michele Artini
|
2e7df07328
|
workingDir and outputDir
|
2020-12-10 14:47:22 +01:00 |
Michele Artini
|
94bfed1c84
|
gzipped output
|
2020-12-10 11:59:28 +01:00 |
Claudio Atzori
|
12e2f930c8
|
resolved conflicts
|
2020-12-10 10:57:39 +01:00 |
Miriam Baglioni
|
b7adbc7c3e
|
merge branch with master
|
2020-12-10 10:35:27 +01:00 |
Alessia Bardi
|
112da6d76a
|
in theory, just auto-formatting after mvn compile
|
2020-12-09 20:00:27 +01:00 |
Alessia Bardi
|
bece04b330
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-12-09 19:54:43 +01:00 |
Alessia Bardi
|
426b76ee8e
|
more asserts for TextGrid record
|
2020-12-09 19:46:11 +01:00 |
Claudio Atzori
|
ff72fcd91a
|
allow orcid_pending to percolate to the XML graph serialization
|
2020-12-09 19:04:50 +01:00 |
Claudio Atzori
|
4705144918
|
Merge pull request 'rel_project_validation' (#69) from rel_project_validation into master
LGTM
|
2020-12-09 19:01:20 +01:00 |
Claudio Atzori
|
211aa04726
|
allow orcid_pending to percolate to the XML graph serialization
|
2020-12-09 18:08:51 +01:00 |
Claudio Atzori
|
ada21ad920
|
Merge pull request 'dump of the results related to at least one project' (#61) from miriam.baglioni/dnet-hadoop:dump into master
LGTM
|
2020-12-09 17:22:56 +01:00 |
Claudio Atzori
|
3c5ce1dada
|
code formatting
|
2020-12-09 17:07:20 +01:00 |
Michele Artini
|
1bc9adc10d
|
default trust for validated rels
|
2020-12-09 16:18:37 +01:00 |
Claudio Atzori
|
fcd7689b50
|
promote actions: shouldGroupById parameter marked as optional (default is true)
|
2020-12-09 13:10:16 +01:00 |
Michele Artini
|
5f21a356fd
|
reindent
|
2020-12-09 11:24:30 +01:00 |
Michele Artini
|
370a5e650b
|
validation attributes in resultProject relations
|
2020-12-09 11:18:26 +01:00 |
Antonis Lempesis
|
aead9efd24
|
added the new parameter (stats_tool_api_url) in the workflow parameters
|
2020-12-09 10:45:24 +01:00 |
Antonis Lempesis
|
77a3a6d82e
|
added the new parameter (stats_tool_api_url) in the workflow parameters
|
2020-12-09 10:45:24 +01:00 |
Antonis Lempesis
|
91226117b3
|
ignoring deletedbyinference relations
|
2020-12-09 10:45:24 +01:00 |
Antonis Lempesis
|
b7f29db126
|
finished first implementation of wf
|
2020-12-09 10:45:24 +01:00 |
Antonis Lempesis
|
ded2392275
|
initial implementation of the promote wf
|
2020-12-09 10:45:24 +01:00 |
Antonis Lempesis
|
1a87a1effd
|
added last step to update cache
|
2020-12-09 10:45:24 +01:00 |
Enrico Ottonello
|
2233750a37
|
original orcid xml data are stored in a field of the class that models orcid data
|
2020-12-09 09:45:19 +01:00 |
Claudio Atzori
|
27e96767e0
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-12-07 21:53:22 +01:00 |
Claudio Atzori
|
fba11eef2a
|
cleanup
|
2020-12-07 21:53:13 +01:00 |
Sandro La Bruzzo
|
7f8b93de72
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-12-07 19:59:39 +01:00 |
Sandro La Bruzzo
|
302baab67b
|
fixed doiboost mapping and workflows
|
2020-12-07 19:59:33 +01:00 |
Enrico Ottonello
|
5c65e602d3
|
wf doi_authors generates one json data foreach row
|
2020-12-07 15:28:10 +01:00 |
Michele Artini
|
d6934f370e
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-12-07 14:56:23 +01:00 |
Michele Artini
|
5de8a7276f
|
wf to partition opendoar events
|
2020-12-07 14:56:06 +01:00 |
Claudio Atzori
|
5e8509bef7
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-12-07 13:50:08 +01:00 |
Claudio Atzori
|
026ad40633
|
disabled test
|
2020-12-07 13:50:01 +01:00 |
Claudio Atzori
|
21ddcf3a73
|
actions promotion can optionally avoid grouping objects by id (configured via shouldGroupById parameter)
|
2020-12-07 13:45:18 +01:00 |
Enrico Ottonello
|
fa1855a4b8
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-12-07 11:02:59 +01:00 |
Enrico Ottonello
|
b1b589ada1
|
wf to generate orcid dataset
|
2020-12-07 11:02:32 +01:00 |
Sandro La Bruzzo
|
620e585b63
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-12-07 10:42:53 +01:00 |
Sandro La Bruzzo
|
b31dd126fb
|
fixed crossref workflow added common ORCID Class
|
2020-12-07 10:42:38 +01:00 |
Enrico Ottonello
|
8812ab65e1
|
completed download function to wf; added accumulators
|
2020-12-04 21:13:49 +01:00 |
Claudio Atzori
|
a104a632df
|
cleanup
|
2020-12-04 16:32:47 +01:00 |
Claudio Atzori
|
5b4e1142a8
|
Merge pull request 'added last step to update cache' (#64) from antonis.lempesis/dnet-hadoop:master into master
Looks good to me, thanks!
|
2020-12-04 14:42:31 +01:00 |
Antonis Lempesis
|
b1ed1afdcc
|
added the new parameter (stats_tool_api_url) in the workflow parameters
|
2020-12-04 13:07:18 +02:00 |
Antonis Lempesis
|
7cb113e088
|
added the new parameter (stats_tool_api_url) in the workflow parameters
|
2020-12-04 13:04:25 +02:00 |
Antonis Lempesis
|
d23ccae0d5
|
ignoring deletedbyinference relations
|
2020-12-04 12:42:17 +02:00 |
Miriam Baglioni
|
5fb65ffc4a
|
merge branch with master
|
2020-12-03 11:24:35 +01:00 |
Miriam Baglioni
|
ea88dc3401
|
fixed issue in property name
|
2020-12-03 11:24:23 +01:00 |
Miriam Baglioni
|
4c58bd1c93
|
merge with upstream
|
2020-12-03 11:24:00 +01:00 |
Miriam Baglioni
|
05c452f58d
|
merge with upstream
|
2020-12-03 10:26:45 +01:00 |
Enrico Ottonello
|
53b22c1937
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-12-02 23:21:27 +01:00 |
Enrico Ottonello
|
1b1e9ea67c
|
wf to generate doi_author_list for doiboost; wf to download updated works
|
2020-12-02 23:20:16 +01:00 |
Antonis Lempesis
|
413afcfed5
|
finished first implementation of wf
|
2020-12-02 15:57:17 +02:00 |
Antonis Lempesis
|
0948536614
|
initial implementation of the promote wf
|
2020-12-02 15:41:56 +02:00 |
Sandro La Bruzzo
|
7da679542f
|
fixed wrong projectId
|
2020-12-02 14:28:09 +01:00 |
Sandro La Bruzzo
|
6ba8037cc7
|
fixed failure to test due to changing of input
|
2020-12-02 11:34:46 +01:00 |
Claudio Atzori
|
cfb55effd9
|
code formatting
|
2020-12-02 11:23:49 +01:00 |
Claudio Atzori
|
74242e450e
|
using constants from ModelConstants
|
2020-12-02 11:23:35 +01:00 |
Miriam Baglioni
|
d5efa6963a
|
using constants in ModelConstants
|
2020-12-02 11:20:26 +01:00 |
Miriam Baglioni
|
cd285e98bc
|
using the constants defined in the ModelConstants class
|
2020-12-02 11:13:23 +01:00 |
Miriam Baglioni
|
4b0d1530a2
|
merge upstream
|
2020-12-02 11:05:00 +01:00 |
Claudio Atzori
|
faa977df7e
|
Merge pull request 'orcid-no-doi' (#43) from enrico.ottonello/dnet-hadoop:orcid-no-doi into master
The dataset was generated and is now part of the actionsets available in BETA
|
2020-12-02 10:55:12 +01:00 |
Claudio Atzori
|
57f448b7a4
|
graph cleaning workflow separate orcid_pending from orcid, depending on the author pid provenance
|
2020-12-02 10:44:05 +01:00 |
Alessia Bardi
|
2d15667b4a
|
testing XML generation from json object (case AMS ACTA)
|
2020-12-02 10:16:26 +01:00 |
Alessia Bardi
|
a417624670
|
tests for raw graph mapping
|
2020-12-02 10:15:26 +01:00 |
Claudio Atzori
|
893ac4a77b
|
GenerateEntitiesApplication can be configured to hash the id value or not
|
2020-12-02 09:30:06 +01:00 |
Miriam Baglioni
|
f8468c9c22
|
added extension for new author pid (orcid_pending)
|
2020-12-01 20:09:35 +01:00 |
Miriam Baglioni
|
888175baf7
|
added java doc
|
2020-12-01 18:36:29 +01:00 |
Miriam Baglioni
|
3d62d99d5d
|
fixed issue in workflow variable
|
2020-12-01 15:02:49 +01:00 |
Miriam Baglioni
|
17680296b9
|
removed unnecessary variable and unused method
|
2020-12-01 15:02:31 +01:00 |
Miriam Baglioni
|
5b3ed70808
|
refactoring
|
2020-12-01 14:31:34 +01:00 |
Miriam Baglioni
|
62ff4999e3
|
added workflow and last step of collection and save
|
2020-12-01 14:30:56 +01:00 |
Miriam Baglioni
|
45d06c45c7
|
collecting all the atomic actions for result type and save them all in the AS path
|
2020-12-01 14:29:18 +01:00 |
Miriam Baglioni
|
0051ebede5
|
extending test
|
2020-12-01 12:43:03 +01:00 |
Miriam Baglioni
|
719da15f04
|
added test resources
|
2020-12-01 12:42:30 +01:00 |
Miriam Baglioni
|
db36e11912
|
classes test classes and resources for production of the actionset to include bipFinder score in results
|
2020-11-30 20:14:23 +01:00 |
Enrico Ottonello
|
f2df3ead74
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-30 14:22:46 +01:00 |
Enrico Ottonello
|
40c4559e92
|
added datainfo on authors pid with "sysimport:crosswalk:entityregistry",
|
2020-11-30 14:19:22 +01:00 |
Claudio Atzori
|
2c407e775e
|
GenerateEntitiesApplication can be configured to hash the id value or not
|
2020-11-30 12:00:38 +01:00 |
Antonis Lempesis
|
815d6b25d9
|
added last step to update cache
|
2020-11-30 00:48:10 +02:00 |
Claudio Atzori
|
758d27745d
|
cleaning tab characters from text fields
|
2020-11-27 16:07:24 +01:00 |
Claudio Atzori
|
e731a7658d
|
cleaning texts to remove tab characters too
|
2020-11-27 09:00:04 +01:00 |
Claudio Atzori
|
5151850a19
|
CROSSREF and DATACITE constants moved in common ModelConstants
|
2020-11-26 13:08:36 +01:00 |
Claudio Atzori
|
a104d2b6ad
|
cleanup
|
2020-11-26 11:12:00 +01:00 |
Claudio Atzori
|
d0d5525d40
|
minor changes
|
2020-11-26 11:04:17 +01:00 |
Claudio Atzori
|
13eae4b31e
|
GroupEntitiesSparkJob must read all graph paths but relations
|
2020-11-26 11:04:01 +01:00 |
Claudio Atzori
|
76363a8512
|
SimpleDateFormat is not thread safe; improved error reporting in case of invalid dates
|
2020-11-26 11:03:12 +01:00 |
Claudio Atzori
|
c1b9a4045a
|
grouping of records will be performed by the dedup workflow
|
2020-11-26 10:59:10 +01:00 |
Miriam Baglioni
|
124591a7f3
|
refactoring
|
2020-11-25 18:23:28 +01:00 |
Miriam Baglioni
|
1a89f8211c
|
D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:12:40 +01:00 |
Miriam Baglioni
|
5fbe54ef54
|
D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:10:28 +01:00 |
Miriam Baglioni
|
ed01e5a5e1
|
D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:09:34 +01:00 |
Miriam Baglioni
|
d4ddde2ef2
|
changed because of D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 18:01:01 +01:00 |
Miriam Baglioni
|
f5e5e92a10
|
changed because of D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 17:58:53 +01:00 |
Miriam Baglioni
|
1df94b85b4
|
changed because of D-Net/dnet-hadoop#61 (comment)
|
2020-11-25 17:57:43 +01:00 |
Claudio Atzori
|
db0181b8af
|
Merge pull request 'added bidirectionality to relations from project and result coming from crossref' (#60) from miriam.baglioni/dnet-hadoop:sxBidirectionality into master
|
2020-11-25 17:17:40 +01:00 |
Sandro La Bruzzo
|
ec3e238de6
|
Fixed problem on duplicated identifier
|
2020-11-25 17:15:54 +01:00 |
Claudio Atzori
|
e208b03755
|
renamed workflow
|
2020-11-25 14:55:50 +01:00 |
Claudio Atzori
|
dfd6205b95
|
Consistency graph workflow merges all the entities by ID
|
2020-11-25 14:55:32 +01:00 |
Miriam Baglioni
|
90d4369fd2
|
added test to verify the compression in writing community info on hdfs
|
2020-11-25 14:34:58 +01:00 |
Miriam Baglioni
|
6750e33d69
|
merge branch with master
|
2020-11-25 14:09:01 +01:00 |
Miriam Baglioni
|
b2c455f883
|
added java doc
|
2020-11-25 14:08:09 +01:00 |
Miriam Baglioni
|
1f130cdf92
|
changed the relation (produces -> isProducedBy) due to the change in the code
|
2020-11-25 14:04:26 +01:00 |
Miriam Baglioni
|
e758d5d9b4
|
refactoring
|
2020-11-25 13:46:39 +01:00 |
Miriam Baglioni
|
87a9f616ae
|
refactoring and addition of the funder nsp first part as name for the dump instead of the whole nsp
|
2020-11-25 13:45:41 +01:00 |
Miriam Baglioni
|
e7e418e444
|
added decision node to verify whether to upload to Zenodo
|
2020-11-25 13:44:10 +01:00 |
Miriam Baglioni
|
305e3d0c9c
|
added resource file for relation with relClass = isProducedBy
|
2020-11-25 13:43:41 +01:00 |
Miriam Baglioni
|
21ce175d17
|
added FilterFunction specification in filter operation
|
2020-11-25 13:42:31 +01:00 |
Miriam Baglioni
|
bde6d337dd
|
test classes for dump of results related to funders
|
2020-11-25 13:42:01 +01:00 |
Miriam Baglioni
|
b37b9352d7
|
added constant value for semantic relationship between projects and results
|
2020-11-25 13:41:08 +01:00 |
Sandro La Bruzzo
|
264723ffd8
|
updated stuff for zenodo upload
|
2020-11-25 11:56:07 +01:00 |
Claudio Atzori
|
36173c13a5
|
reverted filters in the cleaning process
|
2020-11-25 10:24:42 +01:00 |
Claudio Atzori
|
eeebd5a920
|
Cleaning workflow: remove newlines from titles, descriptions, subjects
|
2020-11-24 18:40:25 +01:00 |
Claudio Atzori
|
e1a1bb3ee4
|
moved class CleaningFunctions in the correct package. Remove newlines from titles, descriptions, subjects
|
2020-11-24 18:34:03 +01:00 |
Enrico Ottonello
|
99a086f0c6
|
max concurrent executors set to 10, according to ORCID Director of Technology mail request
|
2020-11-24 17:49:32 +01:00 |
Miriam Baglioni
|
72bb0fe360
|
changed directory name
|
2020-11-24 16:47:07 +01:00 |
Miriam Baglioni
|
00874a8ce6
|
added bidirectionality to relations from project and result
|
2020-11-24 15:17:23 +01:00 |
Miriam Baglioni
|
39f4a20873
|
changed the path and the name for saving the communities_infrastructures dump file
|
2020-11-24 14:47:32 +01:00 |
Miriam Baglioni
|
7e14452a87
|
final version of the wf to get the dump of results associated to at least one funder per funder
|
2020-11-24 14:46:34 +01:00 |
Miriam Baglioni
|
c167a18057
|
added new parameter for the dumpType
|
2020-11-24 14:45:50 +01:00 |
Miriam Baglioni
|
54a309bb6b
|
refactoring
|
2020-11-24 14:45:30 +01:00 |
Miriam Baglioni
|
35ecea8842
|
changed to consider the modification for the specification of the type of dump
|
2020-11-24 14:45:15 +01:00 |
Miriam Baglioni
|
b9b6bdb2e6
|
fixing issue on previous implementation
|
2020-11-24 14:44:53 +01:00 |
Miriam Baglioni
|
7e940f1991
|
changed to consider the modification for the specification of the type of dump
|
2020-11-24 14:43:34 +01:00 |
Miriam Baglioni
|
62928ef7a5
|
changed to save the communities_infrastructures information as the other entity dumps: in a json.gz file
|
2020-11-24 14:42:41 +01:00 |
Claudio Atzori
|
33bae02451
|
reverted behaviour of the cleaning workflow: grouping entities by ID will be managed differently
|
2020-11-24 14:42:33 +01:00 |
Miriam Baglioni
|
3319440c53
|
changed the direction of the relation between projects and result considered to select the results linked to projects
|
2020-11-24 14:41:09 +01:00 |
Miriam Baglioni
|
00c377dac2
|
added specification of MapFunction types in map
|
2020-11-24 14:40:22 +01:00 |
Miriam Baglioni
|
44db258dc4
|
added enumerated for the dump type
|
2020-11-24 14:38:06 +01:00 |
Miriam Baglioni
|
1832708c42
|
modified boolean variable with string one which specifies the type of dump we are performing: complete, community or funder
|
2020-11-24 14:37:36 +01:00 |
Enrico Ottonello
|
5c17e768b2
|
set wf configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions
|
2020-11-23 16:01:23 +01:00 |
Enrico Ottonello
|
5c9a727895
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-23 09:49:53 +01:00 |
Enrico Ottonello
|
97c8111847
|
action to convert lambda file in seq file; spark action to download updated authors
|
2020-11-23 09:49:22 +01:00 |
Miriam Baglioni
|
259c67ce36
|
fixed issue in path name
|
2020-11-20 12:32:23 +01:00 |
Miriam Baglioni
|
0a9db67eec
|
-
|
2020-11-20 12:21:33 +01:00 |
Miriam Baglioni
|
d362f2637d
|
merge branch with master
|
2020-11-19 19:17:20 +01:00 |
Miriam Baglioni
|
cf3f47563f
|
new parameter files
|
2020-11-19 19:16:05 +01:00 |
Miriam Baglioni
|
24c56fa7a3
|
new logic and workflow for dump of results with link to projects. In this implementation the result match the model of the communityresult.
|
2020-11-19 19:15:39 +01:00 |
Claudio Atzori
|
d48f388fb2
|
Merge branch 'provision_indexing'
|
2020-11-19 15:59:55 +01:00 |
Claudio Atzori
|
46bde9c13f
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-19 15:26:27 +01:00 |
Claudio Atzori
|
7c9feaf9e7
|
project attributes removed from the XML record serialization: contactfullname, contactfax, contactphone, contactemail
|
2020-11-19 15:26:20 +01:00 |
Claudio Atzori
|
fcbb05eb21
|
cleanup
|
2020-11-19 15:14:33 +01:00 |
Claudio Atzori
|
3f34757c63
|
merged from master
|
2020-11-19 14:34:54 +01:00 |
Michele Artini
|
293da47ad9
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-11-19 10:42:31 +01:00 |
Michele Artini
|
ab08d12c46
|
considering abstract > MIN_LENGTH in ENRICH_MISSING_ABSTRACT
|
2020-11-19 10:42:10 +01:00 |
Claudio Atzori
|
e503271abe
|
fixed notification workflow name
|
2020-11-19 10:41:38 +01:00 |
Claudio Atzori
|
0374d34c3e
|
introduced configuration param outputFormat: HDFS | SOLR
|
2020-11-19 10:34:28 +01:00 |
Miriam Baglioni
|
fafb688887
|
-
|
2020-11-18 18:56:48 +01:00 |
Miriam Baglioni
|
906db690d2
|
-
|
2020-11-18 17:43:08 +01:00 |
Claudio Atzori
|
ede7fae6c8
|
Merge pull request 'XML record indexing test' (#58) from provision_indexing into master
|
2020-11-18 17:04:34 +01:00 |
Miriam Baglioni
|
5402062ff5
|
changed parameter file with the one associated to the job
|
2020-11-18 16:58:20 +01:00 |
Miriam Baglioni
|
a172a37ad1
|
fixed typo
|
2020-11-18 16:55:07 +01:00 |
Miriam Baglioni
|
46ba3793f6
|
code, workflow and parameters for the dump of the results associated to funders
|
2020-11-18 16:47:31 +01:00 |
Claudio Atzori
|
5218718e8b
|
updated set of fields from the MDFormatDSResourceType on PROD
|
2020-11-18 15:00:41 +01:00 |
Claudio Atzori
|
d9e07a242b
|
extended XmlIndexingJob to accept an optional parameter: outputPath. When present, forces the job to write its output on the specified HDFS location
|
2020-11-18 14:34:55 +01:00 |
Claudio Atzori
|
29dcff0f34
|
spark complains about missing classes, so here they are again
|
2020-11-18 14:32:32 +01:00 |
Miriam Baglioni
|
57cac36898
|
changed the workflow name
|
2020-11-18 13:38:03 +01:00 |
Claudio Atzori
|
12acf25519
|
Merge pull request 'starting from first step...' (#57) from antonis.lempesis/dnet-hadoop:master into master
No judging. Just re-deploying...
|
2020-11-18 11:01:49 +01:00 |
Claudio Atzori
|
8177ce7939
|
test for XmlIndexingJob based on a local miniSolrCluster
|
2020-11-18 10:58:05 +01:00 |
Alessia Bardi
|
10e673660f
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-18 10:01:23 +01:00 |
Alessia Bardi
|
be7b310cef
|
rel semantics ignore case
|
2020-11-18 10:01:20 +01:00 |
Michele Artini
|
33da2e3d6c
|
xpaths for dateOfCollection and dateOfTransformation
|
2020-11-18 09:26:20 +01:00 |
Antonis Lempesis
|
01a6e03989
|
starting from first step...
|
2020-11-17 23:26:47 +02:00 |
Alessia Bardi
|
8f87020a50
|
#56: map relevantDates from aggregated ODF records
|
2020-11-17 18:42:09 +01:00 |
Alessia Bardi
|
7e0a76a8ac
|
test for TextGrid
|
2020-11-17 18:39:25 +01:00 |
Enrico Ottonello
|
2b0c9bbb7e
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-17 18:24:34 +01:00 |
Enrico Ottonello
|
c0c2e05eae
|
added wf to extract authors and works xml data from orcid dump to hdfs; added wf to download the lambda file (containing last orcid update information) from orcid to hdfs
|
2020-11-17 18:23:12 +01:00 |
Claudio Atzori
|
cfc01f136e
|
PID filtering based on a blacklist
|
2020-11-17 12:27:06 +01:00 |
Dimitris
|
bbcf6b7c8b
|
Commit 17112020
|
2020-11-17 08:36:51 +02:00 |
Enrico Ottonello
|
c796adae24
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-16 11:57:19 +01:00 |
Claudio Atzori
|
6ab1ce53c9
|
fixed condition in result pid cleaning; cleanup
|
2020-11-16 10:09:17 +01:00 |
Claudio Atzori
|
4de8c8b237
|
fixed workflow variable name
|
2020-11-16 10:03:11 +01:00 |
Dimitris
|
3e24c9b176
|
Changes 14112020
|
2020-11-14 18:42:07 +02:00 |
Claudio Atzori
|
331d621800
|
added test resource
|
2020-11-14 12:16:15 +01:00 |
Claudio Atzori
|
5d4e34e26a
|
fixed typo in variable name
|
2020-11-14 10:32:26 +01:00 |
Claudio Atzori
|
768bc5304c
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-13 15:40:34 +01:00 |
Claudio Atzori
|
93f7b7974f
|
Merge pull request 'trust truncated to 3 decimals' (#24) from trunc_trust into master
LGTM
|
2020-11-13 15:40:02 +01:00 |
Claudio Atzori
|
528231a287
|
grouping graph entities by id turned out to be an easy extension for the already existing cleaning workflow
|
2020-11-13 15:37:48 +01:00 |
Enrico Ottonello
|
005f849674
|
added compression to output dataset
|
2020-11-13 12:45:31 +01:00 |
Enrico Ottonello
|
9a2fa9dc2f
|
added test for other names parsing from summaries dump
|
2020-11-13 10:25:34 +01:00 |
Claudio Atzori
|
2bed29eb09
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:12 +01:00 |
Claudio Atzori
|
13e36a4da0
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:02 +01:00 |
Enrico Ottonello
|
13f28fa225
|
moved AuthorData to dhp-schemas; added other names to author data
|
2020-11-12 17:43:32 +01:00 |
Enrico Ottonello
|
2af21150c5
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-12 09:58:33 +01:00 |
Claudio Atzori
|
9b0fb9e958
|
merged from master
|
2020-11-12 09:27:12 +01:00 |
Claudio Atzori
|
75324ae58a
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-11-12 09:23:37 +01:00 |
Claudio Atzori
|
822971f54f
|
no need to filter relations in CreateRelatedEntitiesJob_phase1; replaced 'left outer' join with 'left' join in CreateRelatedEntitiesJob_phase2; cleanup;
|
2020-11-12 09:22:59 +01:00 |
Enrico Ottonello
|
1f861f2b0d
|
now wf output is a sequence file with the format seq("eu.dnetlib.dhp.schema.oaf.Publication",eu.dnetlib.dhp.schema.action.AtomicActions)
|
2020-11-11 17:38:50 +01:00 |
Claudio Atzori
|
9841488482
|
Merge pull request 'latest changes in stats wf' (#54) from antonis.lempesis/dnet-hadoop:master into master
LGTM, thanks!
|
2020-11-11 16:01:51 +01:00 |
Antonis Lempesis
|
99ebaee347
|
fixed #5913
|
2020-11-11 16:56:46 +02:00 |
Claudio Atzori
|
e3d3481fb9
|
Merge pull request 'organizations pids' (#53) from organization_pids into master
LGTM
|
2020-11-11 14:08:25 +01:00 |
Antonis Lempesis
|
f14e65f6a3
|
reverted wrong change
|
2020-11-10 17:23:04 +02:00 |
Antonis Lempesis
|
c02c7741c9
|
fixes in db creation
|
2020-11-10 17:11:30 +02:00 |
Antonis Lempesis
|
e603fa5847
|
fixes in db creation
|
2020-11-10 17:11:12 +02:00 |
Enrico Ottonello
|
fea2451658
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
|
2020-11-10 11:49:43 +01:00 |
Claudio Atzori
|
18d9aad70c
|
improved documentation in dhp-graph-provision
|
2020-11-10 11:48:55 +01:00 |
Enrico Ottonello
|
1513174d7e
|
added further test case
|
2020-11-10 11:44:55 +01:00 |
Michele Artini
|
40160d171f
|
organizations pids
|
2020-11-09 12:58:36 +01:00 |
Sandro La Bruzzo
|
8e1d43aab2
|
Implemented ID generation using IdentifierRecordFactory on DOIBoost
|
2020-11-09 11:53:55 +01:00 |
Sandro La Bruzzo
|
027ef2326c
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-11-06 17:12:42 +01:00 |
Sandro La Bruzzo
|
cd27df91a1
|
fixed bug on missing relation in ANDS
|
2020-11-06 17:12:31 +01:00 |
Enrico Ottonello
|
6bc7dbeca7
|
first version of dataset successfully generated from orcid dump 2020
|
2020-11-06 13:47:50 +01:00 |
Claudio Atzori
|
d10447e747
|
re-packaged graph dump workflow sources
|
2020-11-05 17:38:18 +01:00 |
Claudio Atzori
|
2d76497488
|
cleanup
|
2020-11-05 17:10:24 +01:00 |
Miriam Baglioni
|
f8e9bda24c
|
merge branch with master
|
2020-11-05 16:31:18 +01:00 |
Miriam Baglioni
|
be5ed8f554
|
added check to avoid sending empty metadata.
|
2020-11-05 16:10:17 +01:00 |
Claudio Atzori
|
2148a51fae
|
minor changes
|
2020-11-05 11:24:12 +01:00 |
Claudio Atzori
|
4625b7486e
|
code formatting
|
2020-11-04 18:12:43 +01:00 |
Claudio Atzori
|
f5f346dd2b
|
Merge pull request 'dump' (#50) from miriam.baglioni/dnet-hadoop:dump into master
LGTM
|
2020-11-04 18:07:01 +01:00 |
Miriam Baglioni
|
e9ac471ae9
|
removed dependency from classes for the pid graph dump
|
2020-11-04 18:04:42 +01:00 |
Miriam Baglioni
|
b90a945c49
|
removed property files for pid graph dump
|
2020-11-04 17:28:33 +01:00 |
Miriam Baglioni
|
bac307155a
|
removed properties specific for pid graph dump
|
2020-11-04 17:28:04 +01:00 |
Miriam Baglioni
|
9c9d50f486
|
removed code specific for pid graph dump
|
2020-11-04 17:26:22 +01:00 |
Miriam Baglioni
|
5669890934
|
removed commented lines
|
2020-11-04 17:15:21 +01:00 |
Miriam Baglioni
|
6a89f59be9
|
removed commented lines
|
2020-11-04 17:13:59 +01:00 |
Miriam Baglioni
|
56150d7e5e
|
removed all code related to the dump of pids graph
|
2020-11-04 17:13:12 +01:00 |
Miriam Baglioni
|
16c54a96f8
|
removed pid dump
|
2020-11-04 17:11:32 +01:00 |
Claudio Atzori
|
e5da4ee9b1
|
dedup workflow using the common PidComparator
|
2020-11-04 15:02:02 +01:00 |
Miriam Baglioni
|
0cac5436ff
|
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
|
2020-11-04 13:21:11 +01:00 |
Alessia Bardi
|
51808b5afd
|
Updated descriptions
|
2020-11-04 12:29:48 +01:00 |
Alessia Bardi
|
e6becf8659
|
Updated descriptions
|
2020-11-04 12:17:57 +01:00 |
Alessia Bardi
|
0abe0eee33
|
Updated descriptions
|
2020-11-04 12:15:30 +01:00 |
Alessia Bardi
|
f6ab238f5d
|
Updated descriptions
|
2020-11-04 11:50:47 +01:00 |
Sandro La Bruzzo
|
3581244daf
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-11-04 09:04:22 +01:00 |
Sandro La Bruzzo
|
66efb39634
|
implemented merge scholix
|
2020-11-04 09:04:01 +01:00 |
Miriam Baglioni
|
c010a8442f
|
fixed issue on test code
|
2020-11-03 17:26:51 +01:00 |
Miriam Baglioni
|
8ec7a61188
|
merge branch with master
|
2020-11-03 16:59:08 +01:00 |
Miriam Baglioni
|
c209284ca7
|
new schemas for the entities in the dump with added descriptions
|
2020-11-03 16:58:08 +01:00 |
Miriam Baglioni
|
08806deddf
|
added the splitSize non mandatory parameter. Default size 10G
|
2020-11-03 16:57:34 +01:00 |
Miriam Baglioni
|
7d2eda43ca
|
added new non mandatory property publish to determine whether to publish the upload or leave it pending. Default value false
|
2020-11-03 16:57:01 +01:00 |
Miriam Baglioni
|
cbbb1bdc54
|
moved business logic to new class in common for handling the zip of the archives
|
2020-11-03 16:55:50 +01:00 |
Miriam Baglioni
|
d4382b54df
|
moved the tar archive with max size on common module
|
2020-11-03 16:54:50 +01:00 |
Claudio Atzori
|
86d6fbe95b
|
refactoring: CleaningFunctions and OafMapperUtils moved in dhp-common
|
2020-11-03 12:19:46 +01:00 |
Claudio Atzori
|
8471888ad3
|
Merge branch 'graph_cleaning' into stable_ids
|
2020-11-03 11:52:47 +01:00 |
Claudio Atzori
|
5310e56dba
|
remove empty PIDs
|
2020-11-03 11:52:10 +01:00 |
Claudio Atzori
|
3fcd669e99
|
result merge operation leverage on custom ResultTypeComparator in the aggregator graph construction
|
2020-11-03 10:53:23 +01:00 |
Claudio Atzori
|
8e7f81c5f5
|
code formatting
|
2020-11-02 14:25:00 +01:00 |
Claudio Atzori
|
09e44dabff
|
Merge branch 'master' into stable_ids
|
2020-11-02 12:16:01 +01:00 |
Sandro La Bruzzo
|
754c86f33e
|
fixed test to work on jenkins
|
2020-11-02 09:35:01 +01:00 |
Sandro La Bruzzo
|
39337d8a8a
|
fixed test
|
2020-11-02 09:26:25 +01:00 |
Dimitris
|
32bf943979
|
Changes to download only updates
|
2020-11-02 09:08:25 +02:00 |
Miriam Baglioni
|
dabb33e018
|
changed the discriminant for which split the file
|
2020-10-30 17:52:22 +01:00 |
Claudio Atzori
|
c5dda3a00c
|
Merge pull request 'h2020classification' (#49) from miriam.baglioni/dnet-hadoop:h2020classification into master
LGTM
|
2020-10-30 17:10:05 +01:00 |
Miriam Baglioni
|
4905739be6
|
changed resource file to mirror change in business logic
|
2020-10-30 17:02:57 +01:00 |
Miriam Baglioni
|
b40360ebfb
|
changed the code to mirror the changed decision in the classification level and programme description labels
|
2020-10-30 17:02:30 +01:00 |
Miriam Baglioni
|
696409fb9f
|
disabled tests because needing remote resource
|
2020-10-30 17:01:48 +01:00 |
Miriam Baglioni
|
0fba08eae4
|
max allowed size per file 10 Gb
|
2020-10-30 16:05:55 +01:00 |
Claudio Atzori
|
385214eeae
|
code formatting
|
2020-10-30 15:47:05 +01:00 |
Claudio Atzori
|
04ad8969b2
|
anticipated execution of the graph cleaning workflow
|
2020-10-30 15:46:55 +01:00 |
Claudio Atzori
|
4ca75d6951
|
Merge pull request 'Dedup ID creation policy' (#48) from deduptesting into stable_ids
|
2020-10-30 15:15:32 +01:00 |
Miriam Baglioni
|
b828587252
|
prevent the code from cycling indefinitely
|
2020-10-30 15:01:25 +01:00 |
Miriam Baglioni
|
f747e303ac
|
classes for dumping of the graph as ttl file
|
2020-10-30 14:13:45 +01:00 |
Miriam Baglioni
|
16baf5b69e
|
formatting
|
2020-10-30 14:13:14 +01:00 |
Miriam Baglioni
|
a9eef9c852
|
added check for possible Optional value in relation dataInfo
|
2020-10-30 14:12:28 +01:00 |
Miriam Baglioni
|
5f4de9a962
|
formatting
|
2020-10-30 14:11:40 +01:00 |
Miriam Baglioni
|
14bf2e7238
|
added option to split dumps bigger than 40Gb on different files
|
2020-10-30 14:09:04 +01:00 |
Dimitris
|
b8a3392b59
|
Commit 30102020
|
2020-10-30 14:07:21 +02:00 |
Claudio Atzori
|
58f28296ea
|
ProvisionConstants moved as ModelHardLimits in dhp-common and applied to truncate long abstracts (len > 150000). Further filtering for empty PID values
|
2020-10-30 10:56:42 +01:00 |
Miriam Baglioni
|
78fdb11c3f
|
merge branch with master
|
2020-10-29 12:55:22 +01:00 |
Sandro La Bruzzo
|
1d9fdb7367
|
fixed spark memory issue in SparkSplitOafTODLIEntities
|
2020-10-28 12:30:32 +01:00 |
Miriam Baglioni
|
d2374e3b9e
|
added code to handle cases where the funding tree is not existing
|
2020-10-27 16:15:21 +01:00 |
Miriam Baglioni
|
5d3012eeb4
|
changed code to dump only the programme list and not the classification list
|
2020-10-27 16:14:18 +01:00 |
Miriam Baglioni
|
3241ec1777
|
added connection timeout and socket timeout 600 sec
|
2020-10-27 16:12:11 +01:00 |
Enrico Ottonello
|
9818e74a70
|
added dependency version in main pom.xml for orcid no doi
|
2020-10-22 16:38:00 +02:00 |
Enrico Ottonello
|
210a50e4f4
|
replaced null value
|
2020-10-22 16:24:42 +02:00 |
Enrico Ottonello
|
b0290dbcb7
|
moved all dependencies version to main pom.xml
|
2020-10-22 16:20:46 +02:00 |
Enrico Ottonello
|
a38ab57062
|
let run test methods
|
2020-10-22 15:43:50 +02:00 |
Enrico Ottonello
|
1139d6568d
|
replaced null value with a more safe empty string as return value
|
2020-10-22 15:32:26 +02:00 |
Enrico Ottonello
|
c58db1c8ea
|
added filter on null value after map function
|
2020-10-22 15:11:02 +02:00 |
Enrico Ottonello
|
846ba30873
|
if typologies mapping fails, an exception will be propagated
|
2020-10-22 14:36:18 +02:00 |
Enrico Ottonello
|
c3114ba0ae
|
replaced null as return value with a more safe empty string
|
2020-10-22 14:21:31 +02:00 |
Enrico Ottonello
|
c295c71ca0
|
added comment
|
2020-10-22 14:07:26 +02:00 |
Enrico Ottonello
|
ab083f9946
|
propagate exception on parsing work (PR request)
|
2020-10-22 14:02:32 +02:00 |
sandro
|
3a81a940b7
|
solved bug on merge publication
|
2020-10-21 22:41:55 +02:00 |
Miriam Baglioni
|
a2ce527fae
|
changed to match the requirements for short titles in level and long titles in classification
|
2020-10-20 17:03:25 +02:00 |
Sandro La Bruzzo
|
346ed65e2c
|
added upload to zenodo node
|
2020-10-20 16:59:55 +02:00 |
sandro
|
271b4db450
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-10-20 16:09:49 +02:00 |
sandro
|
d58d02d448
|
added workflow upload on zenodo
|
2020-10-20 16:09:07 +02:00 |
miconis
|
c4a59d1b9a
|
merge with the master to port the new packages
|
2020-10-20 16:07:30 +02:00 |
miconis
|
708d887e64
|
minor changes
|
2020-10-20 15:12:19 +02:00 |
miconis
|
0e54803177
|
bug fix in the id generator and implementation of jobs for organization dedup
|
2020-10-20 12:19:46 +02:00 |
Alessia Bardi
|
1425d810a8
|
testing mapping
|
2020-10-19 17:46:14 +02:00 |
Claudio Atzori
|
266bf1a221
|
common IdentifierFactory in use on the mapping from the aggregator data; merge the entities sharing the same id; code formatting
|
2020-10-16 17:02:10 +02:00 |
Claudio Atzori
|
34f1d0904b
|
common IdentifierFactory in use on the mapping from the aggregator data
|
2020-10-16 16:00:19 +02:00 |
Sandro La Bruzzo
|
fed711da80
|
Merge remote-tracking branch 'origin/master' into merge_record_to_common
|
2020-10-13 15:32:45 +02:00 |
Sandro La Bruzzo
|
34bf64c94f
|
fixed export Scholexplorer to OpenAire
|
2020-10-13 08:47:58 +02:00 |
Alessia Bardi
|
8775a64bc1
|
Merge pull request 'Merging different compatibility levels (pinocchio operator)' (#47) from merge_graph into master
|
2020-10-09 14:44:52 +02:00 |
Claudio Atzori
|
e751c1402f
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-10-09 13:53:21 +02:00 |
Claudio Atzori
|
b961dc7d1e
|
added originalid to the fields in the result graph view
|
2020-10-09 13:53:15 +02:00 |
miconis
|
6f8720982c
|
bug fix in the idgenerator and test implementation
|
2020-10-09 09:30:23 +02:00 |
Sandro La Bruzzo
|
734934e2eb
|
fixed error on empty intersection with publication and relation on export to OAF
|
2020-10-08 17:29:29 +02:00 |
Sandro La Bruzzo
|
eec418cd26
|
moved AuthoreMerger into dhp-common
|
2020-10-08 10:33:55 +02:00 |
Sandro La Bruzzo
|
fe0a7870e6
|
Added test to check if merge authors works
|
2020-10-08 10:33:12 +02:00 |
Sandro La Bruzzo
|
cd9c377d18
|
adapted scholexplorer Dump generation to the new Dataset definition
|
2020-10-08 10:10:13 +02:00 |
Claudio Atzori
|
a3f37a9414
|
javadoc
|
2020-10-07 16:44:22 +02:00 |
Claudio Atzori
|
8d85a2fced
|
[BETA wf only] datasources involved in the merge operation don't obey the infra precedence policy, but rely on a custom behaviour that, given two datasources from beta and prod, returns the one from prod with the highest compatibility among the two
|
2020-10-07 16:28:52 +02:00 |
Claudio Atzori
|
5f7b75f5c5
|
code formatting
|
2020-10-07 13:22:54 +02:00 |
miconis
|
1804c5d809
|
refactoring: classes moved in the right package
|
2020-10-06 16:44:51 +02:00 |
miconis
|
7093355487
|
bug fix and minor changes
|
2020-10-06 16:21:34 +02:00 |
miconis
|
5a8bc329c5
|
bug fix in the result merge: it takes the correct bestaccessright based on the license instead of the trust
|
2020-10-06 15:26:44 +02:00 |
miconis
|
a2ac7e52fb
|
implementation of the workflow for new organizations in openorgs
|
2020-10-06 13:58:09 +02:00 |
Miriam Baglioni
|
061527f06e
|
adding short description
|
2020-10-05 13:54:39 +02:00 |
Miriam Baglioni
|
0c12d7bdd8
|
adding short description
|
2020-10-05 11:39:55 +02:00 |
Miriam Baglioni
|
ae08b3c0dd
|
merge branch with master
|
2020-10-05 11:35:55 +02:00 |
Miriam Baglioni
|
11b7eaae09
|
changed the name of the folder where to store the context entity from context to communities_infrastructures
|
2020-10-05 11:24:54 +02:00 |
Miriam Baglioni
|
32bffb0134
|
changed the name from communities_infrastructures to communities_infrastuctures.json
|
2020-10-05 11:24:17 +02:00 |
Claudio Atzori
|
23f64d9eb4
|
updated dedup tests following the dnet-pace-core library update
|
2020-10-02 14:30:53 +02:00 |
Miriam Baglioni
|
fc2f7636be
|
removed not used code
|
2020-10-02 12:33:52 +02:00 |
Miriam Baglioni
|
25cbcf6114
|
changed to solve issues about names. context renamed communities_infrastructure.json and removed the double json.gz extension from the name of the part in the tar
|
2020-10-02 12:17:46 +02:00 |
Claudio Atzori
|
9db0f88fb8
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-10-02 09:43:35 +02:00 |
Claudio Atzori
|
49ae3450a9
|
code formatting
|
2020-10-02 09:43:24 +02:00 |
Claudio Atzori
|
c2a6e2a9bf
|
fixed mapping for datasource journal info (ISSNs)
|
2020-10-02 09:37:08 +02:00 |
Miriam Baglioni
|
01117a46e1
|
whole workflow activated
|
2020-10-01 17:19:21 +02:00 |
Miriam Baglioni
|
cfb5766c6b
|
removed double json.gz from names of files in the tar
|
2020-10-01 17:18:34 +02:00 |
Miriam Baglioni
|
fcaedac980
|
merge branch with master
|
2020-10-01 16:46:59 +02:00 |
Miriam Baglioni
|
c6e6ed1bd8
|
merge branch with master
|
2020-10-01 16:24:41 +02:00 |
Miriam Baglioni
|
4aec347351
|
refactoring
|
2020-10-01 16:23:52 +02:00 |
Miriam Baglioni
|
61946b4092
|
refactoring
|
2020-10-01 16:22:48 +02:00 |
Miriam Baglioni
|
7e6d35e56c
|
added the link to the excel file related to topic
|
2020-10-01 15:53:31 +02:00 |
Sandro La Bruzzo
|
1a0a44e85a
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-10-01 15:46:53 +02:00 |
Sandro La Bruzzo
|
c4a3c52e45
|
fixed Doiboost bug in the identifier
|
2020-10-01 15:46:44 +02:00 |
Miriam Baglioni
|
43cbd62c2b
|
added classpath.first in the configuration
|
2020-10-01 15:46:34 +02:00 |
Miriam Baglioni
|
cd69c6b023
|
added dependency for the topic file path
|
2020-10-01 15:45:59 +02:00 |
Miriam Baglioni
|
771cde3d05
|
moved the library version to global pom
|
2020-10-01 15:43:47 +02:00 |
Miriam Baglioni
|
632351c0da
|
modified test resources to mirror the changes in the code
|
2020-10-01 15:43:02 +02:00 |
Miriam Baglioni
|
ebc1c5513f
|
modified test resources to mirror the changes in the code
|
2020-10-01 15:42:29 +02:00 |
Miriam Baglioni
|
3a374c34b6
|
fixed null pointer exception
|
2020-10-01 15:41:01 +02:00 |
Miriam Baglioni
|
83ea746163
|
added check to the test
|
2020-10-01 15:40:28 +02:00 |
Claudio Atzori
|
2e9e13444d
|
author pids made unique by value
|
2020-10-01 12:50:40 +02:00 |
Miriam Baglioni
|
6e5db85b32
|
-
|
2020-10-01 11:51:11 +02:00 |
Miriam Baglioni
|
a46179f61c
|
refactoring
|
2020-10-01 11:22:01 +02:00 |
Miriam Baglioni
|
b90bee124b
|
removing rows that are empty from those imported
|
2020-10-01 11:16:49 +02:00 |
Miriam Baglioni
|
c107f193c9
|
refactoring
|
2020-10-01 11:16:22 +02:00 |
Claudio Atzori
|
e265c3e125
|
cleaning functions factored out in a dedicated class
|
2020-10-01 10:50:15 +02:00 |
Miriam Baglioni
|
706a80a29a
|
added test to check that separator '-' (not hyphen) will be recognized
|
2020-10-01 10:38:31 +02:00 |
Miriam Baglioni
|
3dca586b3b
|
refactoring
|
2020-10-01 10:34:48 +02:00 |
Miriam Baglioni
|
416bda6066
|
changed the programme.description by using the same value used in the classification instead of the short title or the title
|
2020-10-01 10:31:33 +02:00 |
Miriam Baglioni
|
f6587c91f3
|
added comparison to a char that looks like '-' but is not
|
2020-10-01 10:30:26 +02:00 |
Claudio Atzori
|
4287164aba
|
include relevantdate field in the result view
|
2020-10-01 10:28:55 +02:00 |
miconis
|
e3f7798d1b
|
minor changes in dedup tests, bug fix in the idgenerator and pace-core version update
|
2020-09-29 15:31:46 +02:00 |
Miriam Baglioni
|
7e73bb88b3
|
changed the logic to add the topic description to the project
|
2020-09-28 17:21:43 +02:00 |
Miriam Baglioni
|
0a035e3630
|
-
|
2020-09-28 17:20:49 +02:00 |
Miriam Baglioni
|
16bee2084d
|
added the topic code to the project subset
|
2020-09-28 17:20:11 +02:00 |
Miriam Baglioni
|
0bf2d0db52
|
added to the workflow the download of the topic excel file and one property needed to get the input path of the topic file in the hdfs filesystem
|
2020-09-28 12:17:22 +02:00 |
Miriam Baglioni
|
c2abde4d9f
|
changed the implementation of Atomic Actions creation by exploiting the topic information obtained from the cordis excel file
|
2020-09-28 12:16:34 +02:00 |
Miriam Baglioni
|
d930b8d3fc
|
changed the query to get only the code of the project and not the optional1 (topic code) and optional2 (topic description)
|
2020-09-28 12:15:48 +02:00 |
Miriam Baglioni
|
f8f5cfd5cc
|
removed the part added to set the topic code and description in the step of project preparation
|
2020-09-28 12:13:33 +02:00 |
Miriam Baglioni
|
9e19c9a221
|
remove the topic description from the values in the CSVProject class
|
2020-09-28 12:11:03 +02:00 |
Miriam Baglioni
|
6d8b932e40
|
refactoring
|
2020-09-28 12:06:56 +02:00 |
Miriam Baglioni
|
b77f166549
|
changed the package name from csvutils to utils
|
2020-09-28 12:05:47 +02:00 |
Miriam Baglioni
|
e33e3277de
|
added needed dependency to read the excel file
|
2020-09-28 12:03:14 +02:00 |
Miriam Baglioni
|
f4739a371a
|
code to get the information related to the topic association between code and description.
|
2020-09-28 12:02:48 +02:00 |
Miriam Baglioni
|
7b6a7333e6
|
merge branch with master
|
2020-09-25 16:42:07 +02:00 |
Miriam Baglioni
|
983a12ed15
|
temporary modification to allow the upload of files in the sandbox without the need to recreate the mapping from scratch
|
2020-09-25 16:41:51 +02:00 |
Miriam Baglioni
|
8b36d19182
|
added property depositionId and changed property newVersion from boolean to string to handle the three possible distinct values
|
2020-09-25 16:41:15 +02:00 |
Miriam Baglioni
|
ed5239f9ec
|
added new code to handle the new possibility to upload files to an already open deposition
|
2020-09-25 16:34:32 +02:00 |
Miriam Baglioni
|
3a8c524fce
|
refactor
|
2020-09-25 16:34:02 +02:00 |
Miriam Baglioni
|
2ac2b537b6
|
merge branch with master
|
2020-09-25 14:40:47 +02:00 |
Miriam Baglioni
|
54800fb9b0
|
enabled only the step to upload in zenodo
|
2020-09-25 14:40:22 +02:00 |
Miriam Baglioni
|
12c2dfc268
|
modified the resource to consider the information added to the model
|
2020-09-25 14:17:23 +02:00 |
Miriam Baglioni
|
969fa8d96e
|
fixed issue and changed the transformation of the programme file to consider the new model
|
2020-09-25 13:32:34 +02:00 |
miconis
|
4cf79f32eb
|
implementation of the oozie wf to prepare the openorgs input: relations between organizations
|
2020-09-25 11:29:51 +02:00 |
Michele Artini
|
c171fdebe1
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-09-25 09:03:09 +02:00 |
Michele Artini
|
c96598aaa4
|
opendoar partition
|
2020-09-25 09:02:58 +02:00 |
Miriam Baglioni
|
de6c4d46d8
|
fixed conflicts
|
2020-09-24 15:35:01 +02:00 |
Miriam Baglioni
|
e917281822
|
-
|
2020-09-24 15:24:05 +02:00 |
Miriam Baglioni
|
9f54f69e6d
|
added topic information
|
2020-09-24 15:23:35 +02:00 |
Miriam Baglioni
|
d6206d6e63
|
add the topic description to the action set associated to the project
|
2020-09-24 15:22:40 +02:00 |
Miriam Baglioni
|
6b50226f3b
|
added topic code and topic description
|
2020-09-24 15:21:49 +02:00 |
Miriam Baglioni
|
15af1f527e
|
modified to consider the topic information
|
2020-09-24 15:20:56 +02:00 |
Miriam Baglioni
|
609ff17cfc
|
now the commission gives us the framework programme (FP7 - H2020), so use this information to filter out programmes not associated to H2020
|
2020-09-24 15:19:31 +02:00 |
Miriam Baglioni
|
b66f930466
|
Added optional1 and optional2 information to the files read from the db. Optional1 contains the topic code and optional2 contains the topic description
|
2020-09-24 15:16:56 +02:00 |
Miriam Baglioni
|
860e6d38a6
|
added topic description to the CSV project variables
|
2020-09-24 15:15:26 +02:00 |
Claudio Atzori
|
044d3a0214
|
fixed query used to load datasources in the Graph
|
2020-09-24 13:48:58 +02:00 |
Claudio Atzori
|
27df1cea6d
|
code formatting
|
2020-09-24 12:16:00 +02:00 |
Claudio Atzori
|
fb22f4d70b
|
included values for projects fundedamount and totalcost fields in the mapping tests. Swapped expected and actual values in junit test assertions
|
2020-09-24 12:10:59 +02:00 |
Claudio Atzori
|
42f55395c8
|
fixed order of the ISSNs returned by the SQL query
|
2020-09-24 12:09:58 +02:00 |
Claudio Atzori
|
fadf5c7c69
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-09-24 10:42:52 +02:00 |
Claudio Atzori
|
9a7e72d528
|
using concat_ws to join textual columns from PSQL. When using || to perform the concatenation, a Null column makes the result of the operation Null
|
2020-09-24 10:42:47 +02:00 |
Claudio Atzori
|
9e3e93c6b6
|
setting the correct issn type in the datasource.journal element
|
2020-09-24 10:39:16 +02:00 |
Miriam Baglioni
|
0d83f47166
|
merge branch with master
|
2020-09-23 17:33:49 +02:00 |
Miriam Baglioni
|
39eb8ab25b
|
changed the dump to move from h2020programme to h2020classification
|
2020-09-23 17:33:00 +02:00 |
Miriam Baglioni
|
1d84cf19a6
|
added new line to resource file
|
2020-09-23 17:32:22 +02:00 |
Miriam Baglioni
|
f0c476b6c9
|
modification to the test classes to consider h2020classification
|
2020-09-23 17:31:49 +02:00 |
Miriam Baglioni
|
2cba3cb484
|
modification to the classes building the actionset to consider the h2020classification
|
2020-09-23 17:31:15 +02:00 |
Miriam Baglioni
|
1069cf243a
|
modification to the schema to consider the H2020classification of the programme. The field Programme has been moved inside the H2020classification that is now associated to the Project. Programme is no longer associated directly to the Project but via H2020Classification
|
2020-09-22 14:38:00 +02:00 |
Enrico Ottonello
|
a97ad20c7b
|
exception is now propagated (PR review)
|
2020-09-22 10:46:34 +02:00 |
Enrico Ottonello
|
fefbcfb106
|
dependency version moved to main pom (PR review)
|
2020-09-22 10:20:25 +02:00 |
miconis
|
259362ef47
|
implementation of the job to collect simrels from postgres db
|
2020-09-22 09:43:27 +02:00 |
Michele Artini
|
9e681609fd
|
stats to sql file
|
2020-09-17 15:51:22 +02:00 |
Michele Artini
|
51321c2701
|
partition of events by opendoarId
|
2020-09-17 11:38:07 +02:00 |
Claudio Atzori
|
cf2ce1a09b
|
code formatting
|
2020-09-15 15:58:03 +02:00 |
Enrico Ottonello
|
9e8e7fe6ef
|
add comments
|
2020-09-15 11:32:49 +02:00 |
Miriam Baglioni
|
c2b5c780ff
|
-
|
2020-09-14 14:34:03 +02:00 |
Miriam Baglioni
|
e2ceefe9be
|
-
|
2020-09-14 14:33:28 +02:00 |
Miriam Baglioni
|
1f893e63dc
|
-
|
2020-09-14 14:33:10 +02:00 |
Enrico Ottonello
|
538f299767
|
merged
|
2020-09-14 12:35:16 +02:00 |
Enrico Ottonello
|
eb8c9b2348
|
Merge remote-tracking branch 'upstream/master' into orcid-no-doi
|
2020-09-14 12:00:56 +02:00 |
Michele Artini
|
9b0c12f5d3
|
send notifications
|
2020-09-11 12:06:16 +02:00 |
Michele Artini
|
028613b751
|
remove old notifications
|
2020-09-09 15:32:06 +02:00 |
Michele Artini
|
9cfc124ac5
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-09-08 16:39:54 +02:00 |
Michele Artini
|
a597a218ab
|
* forall topics
|
2020-09-08 16:39:40 +02:00 |
Claudio Atzori
|
8a523474b7
|
code formatting
|
2020-09-07 11:40:16 +02:00 |
Michele Artini
|
bb459caf69
|
support for all topic subscriptions
|
2020-08-27 11:01:21 +02:00 |
Michele Artini
|
82ed8edafd
|
notification indexing
|
2020-08-26 15:10:48 +02:00 |
Miriam Baglioni
|
b72a7dad46
|
resource for pid graph dump
|
2020-08-24 17:09:01 +02:00 |
Miriam Baglioni
|
8694bb9b31
|
refactoring due to compilation
|
2020-08-24 17:07:34 +02:00 |
Miriam Baglioni
|
8a069a4fea
|
-
|
2020-08-24 17:01:30 +02:00 |
Miriam Baglioni
|
34fa96f3b1
|
-
|
2020-08-24 17:00:20 +02:00 |
Miriam Baglioni
|
5fb2949cb8
|
added utils methods
|
2020-08-24 17:00:09 +02:00 |
Miriam Baglioni
|
2a540b6c01
|
added constants for the pid graph dump
|
2020-08-24 16:55:35 +02:00 |
Miriam Baglioni
|
da103c399a
|
resources for the pid graph dump test
|
2020-08-24 16:52:07 +02:00 |
Miriam Baglioni
|
630a6a1fe7
|
first tests for the pid graph dump
|
2020-08-24 16:51:26 +02:00 |
Miriam Baglioni
|
40c8d2de7b
|
test resources for the dump of the pids graph
|
2020-08-24 16:50:39 +02:00 |
Miriam Baglioni
|
bef79d3bdf
|
first attempt to the dump of pids graph
|
2020-08-24 16:49:38 +02:00 |
Michele Artini
|
da470422d3
|
deleting events
|
2020-08-21 14:52:48 +02:00 |
Michele Artini
|
6e60bf026a
|
indexing only a subset of events
|
2020-08-19 12:39:22 +02:00 |
Miriam Baglioni
|
85203c16e3
|
merge branch with master
|
2020-08-19 11:49:03 +02:00 |
Miriam Baglioni
|
2c783793ba
|
removed the affiliation from the author to mirror the changes in the model
|
2020-08-19 11:48:12 +02:00 |
Miriam Baglioni
|
f6bf888016
|
removed affiliation from author to mirror the changes in the model
|
2020-08-19 11:41:41 +02:00 |
Miriam Baglioni
|
66d0e0d3f2
|
-
|
2020-08-19 11:31:50 +02:00 |
Miriam Baglioni
|
1c593a9cfe
|
-
|
2020-08-19 11:29:51 +02:00 |
Miriam Baglioni
|
e42b2f5ae2
|
-
|
2020-08-19 11:29:09 +02:00 |
Miriam Baglioni
|
f81ee22418
|
changed to mirror the changes in the model (Instance, CommunityInstance, GraphResult)
|
2020-08-19 11:28:26 +02:00 |
Miriam Baglioni
|
387be43fd4
|
changed to discriminate if dumping all the results type together or each one in its own archive
|
2020-08-19 11:25:27 +02:00 |
Miriam Baglioni
|
c5858afb88
|
added parameter to guide the dump for the result (resultAggregation). true if all the result types should be dumped together, false otherwise.
|
2020-08-19 11:24:14 +02:00 |
Miriam Baglioni
|
d407852ac2
|
changed to reflect the changes in the model
|
2020-08-19 11:15:05 +02:00 |
Miriam Baglioni
|
47c21a8961
|
refactoring due to compilation
|
2020-08-19 11:11:57 +02:00 |
Miriam Baglioni
|
5570678c65
|
changed parameter name from hdfsNameNode to nameNode
|
2020-08-19 10:59:26 +02:00 |
Miriam Baglioni
|
dc5096a327
|
refactoring due to compilation
|
2020-08-19 10:57:36 +02:00 |
Miriam Baglioni
|
55e24c2547
|
relclass for relation and corresponding values have been put to lower case (isSupplementedBy was written as IsSupplementedBy - orcid propagation)
|
2020-08-18 16:42:08 +02:00 |
Miriam Baglioni
|
f44dd5d886
|
changed in mapping the result semantic name as it will be visible in the relclass Relation: from IsSupplementedBy to isSupplementedBy
|
2020-08-17 17:15:09 +02:00 |
Miriam Baglioni
|
bc6b5d5b34
|
removed leftover parameter
|
2020-08-15 11:22:35 +02:00 |
Miriam Baglioni
|
200cd5c730
|
removed leftover parameter
|
2020-08-15 11:22:19 +02:00 |
Miriam Baglioni
|
96600ed04a
|
modified test resource for mirroring the deletion of affiliation from author parameters
|
2020-08-14 20:41:49 +02:00 |
Miriam Baglioni
|
09f5b92763
|
added specific reference to class
|
2020-08-14 20:00:09 +02:00 |
Miriam Baglioni
|
37e7c43652
|
changed parameter name from hdfsNameNode to nameNode
|
2020-08-14 18:18:25 +02:00 |
Claudio Atzori
|
5b994d7ccf
|
Merge branch 'dump' of https://code-repo.d4science.org/miriam.baglioni/dnet-hadoop into resolve_conflicts_pr40_dump
|
2020-08-14 15:32:29 +02:00 |
Miriam Baglioni
|
de995970ea
|
try again to solve clash with master
|
2020-08-14 15:24:36 +02:00 |
Miriam Baglioni
|
5040d72d5e
|
changed to make it equal to master branch
|
2020-08-14 15:20:17 +02:00 |
Miriam Baglioni
|
be8106c339
|
added space to avoid conflicts with master branch
|
2020-08-14 15:16:27 +02:00 |
Claudio Atzori
|
1871d1c6f6
|
solve error java.lang.NoSuchFieldError: INSTANCE when instantiating Solr client
|
2020-08-14 11:18:30 +02:00 |
Miriam Baglioni
|
d2a8a4961a
|
refactoring
|
2020-08-13 18:50:33 +02:00 |
Miriam Baglioni
|
a5043de5da
|
added method to get the mapped instance
|
2020-08-13 18:45:50 +02:00 |
Miriam Baglioni
|
b7e49aee8d
|
removed commented code
|
2020-08-13 18:44:07 +02:00 |
Miriam Baglioni
|
f439a6231e
|
added missing constraint in XQuery (verify the status of the RC/RI different from hidden)
|
2020-08-13 15:30:55 +02:00 |
Miriam Baglioni
|
0fe800b1c9
|
modified because of D-Net/dnet-hadoop#40#issuecomment-1902
|
2020-08-13 15:17:12 +02:00 |
Miriam Baglioni
|
270c89489c
|
fixed issue created while renaming subject to subjects in community configuration xml
|
2020-08-13 15:16:04 +02:00 |
Miriam Baglioni
|
fcd10f452c
|
changed because of D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:55:32 +02:00 |
Miriam Baglioni
|
fd48ae3b85
|
changed because of D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:19:15 +02:00 |
Miriam Baglioni
|
04a3e1ab38
|
disabled tests
|
2020-08-13 12:18:13 +02:00 |
Miriam Baglioni
|
2ede397933
|
Apply change because of D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:16:39 +02:00 |
Miriam Baglioni
|
bfd1fcde6d
|
removed not useful method and changed because of D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:14:37 +02:00 |
Miriam Baglioni
|
7fd8397123
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:13:15 +02:00 |
Miriam Baglioni
|
753d448cc9
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:12:58 +02:00 |
Miriam Baglioni
|
c0e071fa26
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:12:40 +02:00 |
Miriam Baglioni
|
526db915bc
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:12:16 +02:00 |
Miriam Baglioni
|
b0fab0d138
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:11:57 +02:00 |
Miriam Baglioni
|
1b6320b251
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:11:41 +02:00 |
Miriam Baglioni
|
743d31be22
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:11:22 +02:00 |
Miriam Baglioni
|
65b48df652
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:11:06 +02:00 |
Miriam Baglioni
|
90b54d3efb
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:08:24 +02:00 |
Miriam Baglioni
|
69bbb9592a
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:07:39 +02:00 |
Miriam Baglioni
|
945323299a
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:07:24 +02:00 |
Miriam Baglioni
|
e04c993247
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:07:07 +02:00 |
Miriam Baglioni
|
ed0812d0ce
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:06:49 +02:00 |
Miriam Baglioni
|
d55cfe0ea5
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:06:20 +02:00 |
Miriam Baglioni
|
80866bec7d
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:06:05 +02:00 |
Miriam Baglioni
|
1400978c0a
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:05:44 +02:00 |
Miriam Baglioni
|
7b941a2e0a
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:05:17 +02:00 |
Miriam Baglioni
|
f7474f50fe
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:04:52 +02:00 |
Miriam Baglioni
|
367203f412
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:04:33 +02:00 |
Miriam Baglioni
|
3ab4809d31
|
apply changes in D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 12:04:10 +02:00 |
Miriam Baglioni
|
02a4986e7b
|
Applying changes from code reviews D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment) and D-Net/dnet-hadoop#40 (comment)
|
2020-08-13 11:53:01 +02:00 |
Miriam Baglioni
|
235d4e4d6e
|
moved Context as relevant for Communities dump
|
2020-08-12 18:16:45 +02:00 |
Miriam Baglioni
|
adf9f96a67
|
test for extraction of relation between organizations and context
|
2020-08-12 10:04:47 +02:00 |
Miriam Baglioni
|
7400cd019d
|
removed not needed variable
|
2020-08-12 10:03:33 +02:00 |
Miriam Baglioni
|
98d28bab5c
|
fixed missing _ in context nsprefix
|
2020-08-12 10:00:18 +02:00 |
Miriam Baglioni
|
8f48cb29f4
|
changed resource because of a change in the XQuery that returned the XML to be parsed. The main Zenodo community is no longer a separate element, but part of the <zenodocommunities> element
|
2020-08-11 18:04:38 +02:00 |
Miriam Baglioni
|
c3672b162b
|
merge branch with master
|
2020-08-11 17:53:04 +02:00 |
Miriam Baglioni
|
a16bbf3202
|
changed test resource to mirror the change in the XQuery that produced the data to be parsed. The main Zenodo community is no longer provided in a separate element, but is part of the <zenodocommunities> element
|
2020-08-11 17:48:44 +02:00 |
Miriam Baglioni
|
25f4fbceea
|
draft of test and resources
|
2020-08-11 17:37:22 +02:00 |
Miriam Baglioni
|
30a2b19b65
|
changed metadata for deposition of covid-19 dump in Zenodo
|
2020-08-11 17:36:56 +02:00 |
Claudio Atzori
|
f7cc52ab02
|
Merge pull request 'enrichment_wfs' (#39) from enrichment_wfs into master
LGTM
|
2020-08-11 17:26:13 +02:00 |
Miriam Baglioni
|
49788b532a
|
changed to mirror changes in the schema
|
2020-08-11 16:05:03 +02:00 |
Miriam Baglioni
|
b08511287b
|
-
|
2020-08-11 16:01:36 +02:00 |
Miriam Baglioni
|
7e81a17068
|
changed the XQUERY to mirror the change in the code
|
2020-08-11 16:00:33 +02:00 |
Miriam Baglioni
|
37ad2f28e9
|
removed added | in prefix for datasource
|
2020-08-11 15:55:06 +02:00 |
Miriam Baglioni
|
f31c2e9461
|
enabled test
|
2020-08-11 15:49:25 +02:00 |
Miriam Baglioni
|
2d67476417
|
merge branch with master
|
2020-08-11 15:46:04 +02:00 |
Miriam Baglioni
|
77a390878c
|
merge upstream
|
2020-08-11 15:45:48 +02:00 |
Miriam Baglioni
|
6d3804e24c
|
-
|
2020-08-11 15:45:12 +02:00 |
Miriam Baglioni
|
0603ec4757
|
changed test to upload the dump for covid-19 community
|
2020-08-11 15:43:25 +02:00 |
Miriam Baglioni
|
7dfd56df9d
|
-
|
2020-08-11 15:42:35 +02:00 |
Miriam Baglioni
|
a169d7e7c1
|
added test file for the MakeTar class
|
2020-08-11 15:40:41 +02:00 |
Miriam Baglioni
|
acb0926b2e
|
json schemas for the dumped entities and relation
|
2020-08-11 15:39:48 +02:00 |
Miriam Baglioni
|
ff52c51f92
|
added the communityMapPath parameter and removed the isLookUpUrl parameter
|
2020-08-11 15:39:22 +02:00 |
Miriam Baglioni
|
6f43acda5e
|
added the maketar and send to zenodo step. Adjusted wf parameters
|
2020-08-11 15:38:20 +02:00 |
Miriam Baglioni
|
ddc19de2e9
|
removed the isLookUpUrl among the parameters
|
2020-08-11 15:37:47 +02:00 |
Miriam Baglioni
|
592a8ea573
|
added parameter file for maketar class
|
2020-08-11 15:37:14 +02:00 |
Miriam Baglioni
|
77a0951b32
|
added the make archive step in the workflow
|
2020-08-11 15:32:32 +02:00 |
Miriam Baglioni
|
cf4d918787
|
added description, changed parameter name and added method
|
2020-08-11 15:27:31 +02:00 |
Miriam Baglioni
|
dc5fc5366d
|
Creation of an archive for each related dump part
|
2020-08-11 15:26:06 +02:00 |
Miriam Baglioni
|
0ce49049d6
|
added description
|
2020-08-11 15:25:11 +02:00 |
Miriam Baglioni
|
9bae991167
|
added description of the class
|
2020-08-11 11:20:43 +02:00 |
Miriam Baglioni
|
341dc59ead
|
removed the repartition(1). Added code for the creation of an archive containing all the parts dumped for each community
|
2020-08-11 11:18:58 +02:00 |
Sandro La Bruzzo
|
fe8d640aee
|
fixed error on oozie workflow
|
2020-08-11 09:43:03 +02:00 |
Sandro La Bruzzo
|
304590e854
|
updated workflow of indexing to start from begin
|
2020-08-11 09:17:47 +02:00 |
Sandro La Bruzzo
|
eaf0dc68a2
|
fixed indexing
|
2020-08-11 09:17:03 +02:00 |
Miriam Baglioni
|
1991a49f70
|
removed reference to isLookUp to get the communityMap
|
2020-08-10 18:02:56 +02:00 |
Miriam Baglioni
|
c378c38546
|
disabled test. The testing functionalities for the upload to Zenodo are moved to common
|
2020-08-10 12:41:11 +02:00 |
Miriam Baglioni
|
63ad0ed209
|
changed to use communityMapPath instead of IsLookUp
|
2020-08-10 12:40:19 +02:00 |
Miriam Baglioni
|
cec795f2ea
|
changed resources to mirror changes in the model
|
2020-08-10 12:39:35 +02:00 |
Miriam Baglioni
|
f50e3e7333
|
changed the class for which to generate the schema
|
2020-08-10 12:03:49 +02:00 |
Miriam Baglioni
|
b8c26f656c
|
test using communityMapPath instead of isLookUp
|
2020-08-10 12:02:55 +02:00 |
Miriam Baglioni
|
fe88904df0
|
changed the wf definition
|
2020-08-10 12:01:14 +02:00 |
Miriam Baglioni
|
87856467e2
|
removed isLookUpUrl and added code to read the communitymap from HDFS
|
2020-08-10 11:38:41 +02:00 |
Miriam Baglioni
|
1cf7043e26
|
removed isLookUpUrl from the parameters
|
2020-08-10 11:38:03 +02:00 |
Claudio Atzori
|
cf6b68ce5a
|
Merge pull request 'data provision workflow: add nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed' (#36) from provision_indexing into master
|
2020-08-10 11:16:29 +02:00 |
Sandro La Bruzzo
|
0ade33ad15
|
updated mergeFrom function for DLI Unknown
|
2020-08-10 10:18:35 +02:00 |
Miriam Baglioni
|
46986aae2d
|
added the new parameter for newdeposition/newversion and concept_record_id
|
2020-08-07 18:00:06 +02:00 |
Miriam Baglioni
|
3aedfdf0d6
|
added option to do a new deposition or new version of an old deposition
|
2020-08-07 17:49:14 +02:00 |
Miriam Baglioni
|
1b3ad1bce6
|
filter out authors pid (only orcid). Added check to get unique provenance for context id. filter out countries with code UNKNOWN
|
2020-08-07 17:48:18 +02:00 |
Miriam Baglioni
|
5ceb8c5f0a
|
moved constants from graph.Constants
|
2020-08-07 17:46:47 +02:00 |
Miriam Baglioni
|
6c65c93c0e
|
refactoring
|
2020-08-07 17:45:35 +02:00 |
Miriam Baglioni
|
68adf86fe4
|
refactoring
|
2020-08-07 17:43:20 +02:00 |
Miriam Baglioni
|
26d2ad6ebb
|
refactoring
|
2020-08-07 17:41:56 +02:00 |
Miriam Baglioni
|
9675af7965
|
refactoring
|
2020-08-07 17:41:07 +02:00 |
Miriam Baglioni
|
346a91f4d9
|
Added constants
|
2020-08-07 17:35:39 +02:00 |
Miriam Baglioni
|
d52b0e1797
|
no use of IsLookUp. The query is done once and its result stored on HDFS. The path to the result is given instead of the isLookUpUrl
|
2020-08-07 17:34:40 +02:00 |
Miriam Baglioni
|
ae1b7fbfdb
|
changed method signature from set of mapkey entries to String representing path on file system where to find the map
|
2020-08-07 17:32:27 +02:00 |
Miriam Baglioni
|
931fa2ff00
|
removed dependencies
|
2020-08-07 16:46:37 +02:00 |
Miriam Baglioni
|
545ea9f77e
|
moved in common. Zenodo response model and APIClient to deposit in Zenodo
|
2020-08-07 16:44:51 +02:00 |
Sandro La Bruzzo
|
ddb1446ceb
|
fixed test
|
2020-08-07 11:34:33 +02:00 |
Sandro La Bruzzo
|
718bc7bbc8
|
implemented provision workflows using the new implementation with Dataset
|
2020-08-07 11:05:18 +02:00 |
Miriam Baglioni
|
da9b012c15
|
fixed description
|
2020-08-06 11:55:44 +02:00 |
Miriam Baglioni
|
6dbadcf181
|
the new schema for the dumped result
|
2020-08-06 11:05:56 +02:00 |
Sandro La Bruzzo
|
a44e5abaa7
|
reformat code
|
2020-08-06 10:30:22 +02:00 |
Sandro La Bruzzo
|
4fb1821fab
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-08-06 10:28:31 +02:00 |
Sandro La Bruzzo
|
9d9e9edbd2
|
improved extractEntity Relation workflows using dataset
|
2020-08-06 10:28:24 +02:00 |
Miriam Baglioni
|
adf0ca5aa7
|
test to send is from hdfs
|
2020-08-05 14:24:43 +02:00 |
Miriam Baglioni
|
14eda4f46e
|
added method to try to put inputstream to zenodo
|
2020-08-05 14:18:25 +02:00 |
Miriam Baglioni
|
e737a47270
|
added classes to try to send input stream to zenodo for the upload
|
2020-08-05 14:17:40 +02:00 |
Miriam Baglioni
|
873e9cd50c
|
changed hadoop setting to connect to s3
|
2020-08-04 15:37:25 +02:00 |
Alessia Bardi
|
a29565ff57
|
code formatting
|
2020-08-04 12:55:27 +02:00 |
Alessia Bardi
|
01db29e208
|
fixes redmine issue #5846: datacite and its different namespace declarations
|
2020-08-04 12:53:48 +02:00 |
Alessia Bardi
|
b4e4e5f858
|
do not duplicate result PIDs
|
2020-08-04 12:52:14 +02:00 |
Alessia Bardi
|
09a323d18d
|
testing a dataset from Nakala
|
2020-08-04 12:50:52 +02:00 |
Alessia Bardi
|
c35bf486cc
|
added handle among the possible PIDs
|
2020-08-04 12:50:12 +02:00 |
Miriam Baglioni
|
5b651abf82
|
merge branch with master
|
2020-08-04 10:14:07 +02:00 |
Miriam Baglioni
|
88e4c3b751
|
added default trust to context bulktagged
|
2020-08-04 10:13:25 +02:00 |
Miriam Baglioni
|
f9342cb484
|
added constant
|
2020-08-03 18:32:35 +02:00 |
Miriam Baglioni
|
96c3c891f4
|
added trust
|
2020-08-03 18:32:17 +02:00 |
Miriam Baglioni
|
53656600ad
|
changed XQuery to select only community and ri with status not hidden
|
2020-08-03 18:29:30 +02:00 |
Miriam Baglioni
|
b34177d8ef
|
merge upstream
|
2020-08-03 18:13:42 +02:00 |
Miriam Baglioni
|
901ae37f7b
|
added step to workflow
|
2020-08-03 18:12:54 +02:00 |
Miriam Baglioni
|
fa38cdb10b
|
added resource
|
2020-08-03 18:11:12 +02:00 |
Miriam Baglioni
|
e9fcc0b2f1
|
commented test unit - to decide change for mirroring the changed logics
|
2020-08-03 18:10:53 +02:00 |
Miriam Baglioni
|
e43aeb139a
|
added new property file and changed some parameter to old files
|
2020-08-03 18:07:28 +02:00 |
Miriam Baglioni
|
aa9f3d9698
|
changed logic for save in s3 directly
|
2020-08-03 18:06:18 +02:00 |
Miriam Baglioni
|
d465f0eec9
|
added fulltext to result
|
2020-08-03 18:03:27 +02:00 |
Miriam Baglioni
|
ec4b392d12
|
added new dependencies for writing on s3
|
2020-08-03 17:57:04 +02:00 |
Miriam Baglioni
|
c892c7dfa7
|
changed to query for community map just once and save the result for remaining executions
|
2020-08-03 17:56:31 +02:00 |
Claudio Atzori
|
3a11a387a9
|
data provision workflow enhancement: added nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed
|
2020-08-03 14:28:08 +02:00 |
Alessia Bardi
|
8cc067fe76
|
specific test for claims
|
2020-08-03 11:17:50 +02:00 |
Claudio Atzori
|
a89b6cc3ba
|
Merge pull request 'nsprefix_blacklist' (#34) from nsprefix_blacklist into master
|
2020-07-31 11:52:23 +02:00 |
Sandro La Bruzzo
|
0c3bc9ea4b
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-07-31 09:07:18 +02:00 |
Sandro La Bruzzo
|
168bfb496a
|
adapted dedup to the new schema
|
2020-07-31 09:06:57 +02:00 |
Michele Artini
|
652b13abb6
|
Merge branch 'master' into nsprefix_blacklist
|
2020-07-31 07:58:37 +02:00 |
Enrico Ottonello
|
0377b40fba
|
output to one parquet file
|
2020-07-30 18:38:07 +02:00 |
Claudio Atzori
|
cd631bb5bc
|
defaults fixed in the cleaning workflow forces result.publisher to NULL when result.publisher.value is empty
|
2020-07-30 17:03:53 +02:00 |
Miriam Baglioni
|
872d7783fc
|
-
|
2020-07-30 16:50:36 +02:00 |
Miriam Baglioni
|
57c87b7653
|
re-implemented to fix issue on not serializable Set<String> variable
|
2020-07-30 16:43:43 +02:00 |
Miriam Baglioni
|
ef8e5957b5
|
added specific directory where to save results
|
2020-07-30 16:42:46 +02:00 |
Miriam Baglioni
|
75f3361c85
|
-
|
2020-07-30 16:41:31 +02:00 |
Miriam Baglioni
|
3f695b25fa
|
refactoring
|
2020-07-30 16:40:15 +02:00 |
Miriam Baglioni
|
e623f12bef
|
refactoring
|
2020-07-30 16:32:59 +02:00 |
Miriam Baglioni
|
ff7d05abb4
|
added support class to store the organizationId/representativeId pair got from sql query on hive
|
2020-07-30 16:32:04 +02:00 |
Miriam Baglioni
|
cf6d80b2ab
|
added command to close the writer
|
2020-07-30 16:31:22 +02:00 |
Miriam Baglioni
|
f985bca37b
|
added USER_CLAIM constant value
|
2020-07-30 16:25:26 +02:00 |
Claudio Atzori
|
4bbfcf1ac6
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-07-30 16:25:06 +02:00 |
Claudio Atzori
|
4ff8007518
|
added function to set the missing vocabulary names, used in the cleaning workflow as a pre-cleaning step
|
2020-07-30 16:24:39 +02:00 |
Miriam Baglioni
|
6f1c40a933
|
-
|
2020-07-30 16:24:28 +02:00 |
Miriam Baglioni
|
2b66a93f9e
|
added property file that was missing
|
2020-07-30 16:24:17 +02:00 |
Michele Artini
|
bdece15ca0
|
blacklist of nsprefix
|
2020-07-30 16:13:38 +02:00 |
Enrico Ottonello
|
196f36c6ed
|
fix publication dataset creation
|
2020-07-30 13:38:33 +02:00 |
Sandro La Bruzzo
|
c97c8f0c44
|
implemented new oozie job to extract entities in a separate dataset
|
2020-07-30 12:13:58 +02:00 |
Sandro La Bruzzo
|
3010a362bc
|
updated the provision workflow in the aggregation phase: removed serialization in JSON RDD and used spark Dataset
|
2020-07-30 09:25:56 +02:00 |
Sandro La Bruzzo
|
487226f669
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-07-30 09:25:39 +02:00 |
Sandro La Bruzzo
|
16ae3c9ccf
|
updated the provision workflow in the aggregation phase: removed serialization in JSON RDD and used spark Dataset
|
2020-07-30 09:25:32 +02:00 |
Miriam Baglioni
|
ee8420c6b3
|
added resource for datasource test
|
2020-07-29 18:28:43 +02:00 |
Miriam Baglioni
|
76bcab98ce
|
added code to filter out null originalId from the dump
|
2020-07-29 18:28:21 +02:00 |
Miriam Baglioni
|
ef1d8aef17
|
added one test to verify the dump for the datasources
|
2020-07-29 18:27:46 +02:00 |
Miriam Baglioni
|
86bab79512
|
-
|
2020-07-29 18:20:22 +02:00 |
Miriam Baglioni
|
31791dcf3d
|
fixed wrong property file path name
|
2020-07-29 18:20:08 +02:00 |
Miriam Baglioni
|
9e722aa1ef
|
-
|
2020-07-29 18:00:08 +02:00 |
Miriam Baglioni
|
d22f106f27
|
added constant to identify datasource associated to funders
|
2020-07-29 17:56:55 +02:00 |
Miriam Baglioni
|
40e194fe2f
|
added check to not dump datasources related to funders
|
2020-07-29 17:56:18 +02:00 |
Miriam Baglioni
|
b48934f6df
|
changed the workflow name
|
2020-07-29 17:43:43 +02:00 |
Miriam Baglioni
|
1433db825d
|
refactoring
|
2020-07-29 17:43:24 +02:00 |
Miriam Baglioni
|
074e9ab75e
|
refactoring
|
2020-07-29 17:42:50 +02:00 |