Miriam Baglioni
5076e4f320
changed test to comply with the modifications
2020-07-20 17:55:18 +02:00
Miriam Baglioni
08dbd99455
changed to dump the whole results graph using classes already implemented for communities. Added class to also dump organizations
2020-07-20 17:54:28 +02:00
Miriam Baglioni
e47ea9349c
extended some types by adding provenance as the pair (provenance, trust), and moved some classes so that they can also be used by the complete graph dump
2020-07-20 17:46:27 +02:00
Claudio Atzori
32f5e466e3
imports cleanup
2020-07-20 17:42:58 +02:00
Claudio Atzori
54ac583923
code formatting
2020-07-20 17:37:08 +02:00
Claudio Atzori
124e7ce19c
when the attribute //dr:CobjCategory/@type is missing, the resulttype is derived by looking up the first available instance type in the vocabulary dnet:result_typologies
2020-07-20 17:33:37 +02:00
Claudio Atzori
050dda223d
Merge pull request 'removed duplicated fields' ( #25 ) from unique_field_in_lists into master
Looks good as a temporary workaround. I agree the model could perform the distinct operation seamlessly by using HashSets instead of Linked (or Array) Lists.
The task of updating the model accordingly has been added in #9#issuecomment-1583
Thanks!
2020-07-20 12:12:50 +02:00
Claudio Atzori
e0c4cf6f7b
added parameter to drive the graph merge strategy: priority (BETA|PROD)
2020-07-20 10:48:01 +02:00
Claudio Atzori
94ccdb4852
Merge branch 'master' into merge_graph
2020-07-20 10:14:55 +02:00
Michele Artini
331a3cbdd0
fixed originalId
2020-07-20 09:50:29 +02:00
Sandro La Bruzzo
9116d75b3e
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-17 18:01:30 +02:00
Miriam Baglioni
47c7122773
changed priority from beta to production
2020-07-17 12:56:35 +02:00
Michele Artini
442f30930c
removed duplicated fields
2020-07-17 12:25:36 +02:00
Claudio Atzori
1781609508
code formatting
2020-07-16 19:06:56 +02:00
Claudio Atzori
878f2b931c
Merge branch 'master' into merge_graph
2020-07-16 16:34:24 +02:00
Miriam Baglioni
f9ad6f3255
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
2020-07-10 19:42:53 +02:00
Miriam Baglioni
c27f12d6e8
avoid considering the _SUCCESS file
2020-07-10 19:42:23 +02:00
Claudio Atzori
31071e363f
Merge branch 'provision_indexing'
2020-07-10 19:03:57 +02:00
Claudio Atzori
cc77446dc4
added dbSchema parameter to the raw_db workflow
2020-07-10 19:01:50 +02:00
Michele Artini
e1ae964bc4
stats
2020-07-10 16:12:08 +02:00
Sandro La Bruzzo
c01efed79b
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-10 14:44:57 +02:00
Sandro La Bruzzo
a7d3977481
added generation of EBI Dataset
2020-07-10 14:44:50 +02:00
Claudio Atzori
67e1d222b6
bulk cleaning of null or empty values; sets bestaccessrights by evaluating the result instances
2020-07-08 17:53:35 +02:00
Claudio Atzori
610d377d57
first implementation of the BETA & PROD graphs merge procedure
2020-07-08 16:54:26 +02:00
Alessia Bardi
9a898c0e4c
Json schema generator
2020-07-08 12:52:00 +02:00
Alessia Bardi
636f9ce7d6
json schema generator lib
2020-07-08 12:50:57 +02:00
Alessia Bardi
8f83b726fa
Dump JSON schema compliant with JSON Schema Draft 7
2020-07-08 12:48:46 +02:00
Miriam Baglioni
1b0b968548
fixed issue with substring
2020-07-08 12:11:51 +02:00
Miriam Baglioni
7fe00cb4fb
-
2020-07-08 10:29:37 +02:00
Miriam Baglioni
375ef07d7b
changed the description for the upload
2020-07-07 18:41:27 +02:00
Miriam Baglioni
35c8265793
added the .json extension to the filename
2020-07-07 18:29:49 +02:00
Miriam Baglioni
81434f8e5e
added method newInstance
2020-07-07 18:26:10 +02:00
Miriam Baglioni
817cddfc52
-
2020-07-07 18:25:12 +02:00
Miriam Baglioni
a66aa9bd83
removed unnecessary tests
2020-07-07 18:25:00 +02:00
Miriam Baglioni
9b20a21b24
removed unnecessary tests
2020-07-07 18:23:37 +02:00
Miriam Baglioni
8a1b42ff21
added check to verify that the dump contains at least one product
2020-07-07 18:21:35 +02:00
Miriam Baglioni
d86adb82a7
-
2020-07-07 18:20:51 +02:00
Miriam Baglioni
b2782025f6
enabled the whole workflow to run. Added property to give priority to dependencies in the classpath, to solve conflicts
2020-07-07 18:10:47 +02:00
Miriam Baglioni
83d2c84b77
added constraints to the XQuery to get only profiles with status manager or all
2020-07-07 18:09:48 +02:00
Miriam Baglioni
4c8d86493c
-
2020-07-07 18:09:06 +02:00
Miriam Baglioni
0208bc18f3
added new resource for testing
2020-07-07 17:47:24 +02:00
Miriam Baglioni
f5bb65c9ef
the json schema for the dump of the results
2020-07-07 17:34:40 +02:00
Miriam Baglioni
c19818a3f8
merge branch with fork master
2020-07-06 13:58:23 +02:00
Miriam Baglioni
f8bf4acd76
-
2020-07-02 16:03:11 +02:00
Miriam Baglioni
e6c79d44e6
-
2020-07-02 16:02:02 +02:00
Miriam Baglioni
d7f6f0c216
changed code to use another library
2020-07-02 16:01:34 +02:00
Miriam Baglioni
8fdc9e070c
added dependency to OkHttp
2020-07-02 16:01:08 +02:00
Miriam Baglioni
94500a581b
merge branch with fork master
2020-07-02 14:25:39 +02:00
Claudio Atzori
ed1c7e5d75
fixed workflow for the import of the claims alone
2020-07-02 12:40:21 +02:00
Sandro La Bruzzo
1d420eedb4
added generation of EBI Dataset
2020-07-02 12:37:43 +02:00
Claudio Atzori
e4a29a4513
fixed workflow for the import of the claims alone
2020-07-02 12:36:33 +02:00
Claudio Atzori
6f5771c1c9
sets author.rank when null
2020-06-25 14:06:21 +02:00
Claudio Atzori
2d77d3a388
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-06-25 12:54:30 +02:00
Miriam Baglioni
05a99cfb61
change the position of value and description elements in the workflow definition
2020-06-25 12:36:08 +02:00
Claudio Atzori
7df2712824
Merge branch 'provision_indexing'
2020-06-25 12:22:41 +02:00
Michele Artini
abcbebcbb4
fixed generation of ids
2020-06-25 09:50:46 +02:00
Michele Artini
77d2a1b1c4
params to choose sql queries for beta or production
2020-06-25 09:28:13 +02:00
Claudio Atzori
0e723d378b
added default from vocab for missing instance.refereed; removed spurious prefixes from orcid values; WIP: prepare relation job
2020-06-24 18:34:42 +02:00
Miriam Baglioni
3e5570de7a
-
2020-06-23 15:44:54 +02:00
Michele Artini
38bb45d0b6
test osf:refereed
2020-06-23 10:14:39 +02:00
Miriam Baglioni
e4b21be004
-
2020-06-22 17:31:50 +02:00
Miriam Baglioni
afa19b0c84
changed the way files are PUT to the REST API
2020-06-22 17:20:07 +02:00
Miriam Baglioni
250fd1c854
merge branch with fork master
2020-06-22 16:25:48 +02:00
Claudio Atzori
9cd27183b6
[maven-release-plugin] prepare for next development iteration
2020-06-22 11:27:44 +02:00
Claudio Atzori
1e3dab0631
[maven-release-plugin] prepare release dhp-1.2.3
2020-06-22 11:27:39 +02:00
Miriam Baglioni
df80ae5c1b
merge branch with fork master
2020-06-22 10:51:23 +02:00
Miriam Baglioni
e8f914f8b3
-
2020-06-22 10:50:41 +02:00
Miriam Baglioni
edeb862476
excluded a dependency that generates a conflict in the module
2020-06-22 10:49:56 +02:00
Miriam Baglioni
185facb8e5
replaced the deprecated DefaultHttpClient with CloseableHttpClient
2020-06-22 10:49:10 +02:00
Claudio Atzori
7d416f08d8
graph cleaning workflow: set hostedby to unknown repository when defined as NULL
2020-06-22 09:50:43 +02:00
Miriam Baglioni
669a509430
-
2020-06-19 17:39:46 +02:00
Claudio Atzori
d0ac7514b2
cleaning workflow to include cleaning of default values
2020-06-18 19:37:25 +02:00
Miriam Baglioni
44a12d244f
-
2020-06-18 18:38:54 +02:00
Miriam Baglioni
fb80353018
-
2020-06-18 14:21:36 +02:00
Miriam Baglioni
65bf312360
merge branch with fork master
2020-06-18 11:35:27 +02:00
Miriam Baglioni
3953f56bd3
added dependency to pom
2020-06-18 11:34:47 +02:00
Miriam Baglioni
a118b66858
-
2020-06-18 11:34:30 +02:00
Miriam Baglioni
f9578312b5
-
2020-06-18 11:34:15 +02:00
Miriam Baglioni
8b145e6aba
-
2020-06-18 11:25:28 +02:00
Miriam Baglioni
e8b3e972f2
changed the input params and the workflow definition to handle Result as the type of all produced result products
2020-06-18 11:25:05 +02:00
Miriam Baglioni
3233b01089
changes due to grouping all the result types under Result
2020-06-18 11:22:58 +02:00
Miriam Baglioni
5c8533d1a1
changes in the testing classes
2020-06-18 11:20:08 +02:00
Miriam Baglioni
bc8611a95a
added new resources for testing
2020-06-18 11:19:20 +02:00
Sandro La Bruzzo
9bf67f5de1
resolved conflicts
2020-06-17 09:15:43 +02:00
Sandro La Bruzzo
1d4275acc4
implemented the first version of the Scholexplorer export into ActionSet
2020-06-17 09:10:38 +02:00
Claudio Atzori
1bc1d15eaf
the stub for mock datasource.identities must be typed as an array
2020-06-16 16:54:28 +02:00
Claudio Atzori
5441f01586
Merge pull request 'missing landingPage urls in instances' ( #22 ) from instances-with-landing-page into master
Looks good, thanks!
2020-06-16 15:32:44 +02:00
Claudio Atzori
89859111ee
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-06-16 15:28:29 +02:00
Claudio Atzori
4ec262db53
included externalreference(s) in the result view on the Hive graph DB
2020-06-16 15:28:20 +02:00
Michele Artini
8a4f84f8c0
refactoring
2020-06-16 12:34:13 +02:00
Claudio Atzori
2a4f65795f
WIP: graph cleaner implementation
2020-06-15 18:32:24 +02:00
Claudio Atzori
c15c8c0ad0
map datasource identities (including piwik ids) as original IDs
2020-06-15 16:07:30 +02:00
Miriam Baglioni
9dd3ef22c5
merge branch with fork master
2020-06-15 11:23:26 +02:00
Miriam Baglioni
68cf0fd03f
test input
2020-06-15 11:14:42 +02:00
Miriam Baglioni
0467145ae3
test for graph dump
2020-06-15 11:13:51 +02:00
Miriam Baglioni
e43eedb5b0
added resources and workflow for dump of community products
2020-06-15 11:13:21 +02:00
Miriam Baglioni
f96ca900e1
fixed issues while running on cluster
2020-06-15 11:12:14 +02:00
Miriam Baglioni
20b9e67728
added new class funder
2020-06-15 11:06:18 +02:00
Claudio Atzori
0d52816244
WIP: graph cleaner implementation
2020-06-13 13:06:04 +02:00
Claudio Atzori
bed65a1be6
WIP: graph cleaner implementation
2020-06-12 18:25:47 +02:00
Claudio Atzori
c4d9f1837f
[maven-release-plugin] prepare for next development iteration
2020-06-12 12:21:08 +02:00
Claudio Atzori
f0746a7605
[maven-release-plugin] prepare release dhp-1.2.2
2020-06-12 12:21:03 +02:00
Claudio Atzori
463489f59f
code formatting
2020-06-12 12:03:25 +02:00
Claudio Atzori
4bcad1c9c3
Merge branch 'graph_cleaning'
2020-06-12 11:40:25 +02:00
Claudio Atzori
cdb1956fe9
WIP: graph cleaner implementation
2020-06-12 11:36:59 +02:00
Alessia Bardi
b347499745
do not use deprecated subreltype
2020-06-12 10:58:02 +02:00
Claudio Atzori
97b1c4057c
WIP: graph cleaner implementation
2020-06-12 10:45:18 +02:00
Claudio Atzori
ba8a024af9
avoid NPEs merging titles
2020-06-12 10:45:11 +02:00
Miriam Baglioni
e145972962
-
2020-06-11 13:08:39 +02:00
Miriam Baglioni
a01800224c
-
2020-06-11 13:02:04 +02:00
Miriam Baglioni
356dd582a3
moved map construction into a class
2020-06-11 12:59:22 +02:00
Michele Artini
a41e0cb648
missing landingPage urls in instances
2020-06-11 12:28:34 +02:00
Michele Artini
99f88e1cb8
fixed generation of entities from claims
2020-06-11 10:51:57 +02:00
Miriam Baglioni
db27663750
-
2020-06-11 10:49:01 +02:00
Miriam Baglioni
bb9f21d0e7
job test for the class producing the first step of the results dump
2020-06-11 10:20:05 +02:00
Claudio Atzori
d1d92c4d8c
fixed integration of claims in the graph
2020-06-11 10:12:00 +02:00
Claudio Atzori
953da4a427
Merge branch 'master' into graph_cleaning
2020-06-10 21:36:56 +02:00
Claudio Atzori
f1bce64391
WIP: graph cleaner implementation
2020-06-10 21:36:31 +02:00
Michele Artini
c08e66e01e
fixed a workflow parameter
2020-06-10 10:11:56 +02:00
Michele Artini
7177a32d75
import of invisible stores
2020-06-10 10:04:00 +02:00
Claudio Atzori
a2fdf85ba1
WIP: graph cleaner implementation
2020-06-09 19:52:53 +02:00
Claudio Atzori
d9f33582c5
WIP: graph cleaner implementation
2020-06-09 17:20:40 +02:00
Miriam Baglioni
a089db18f1
workflow and parameters to execute the dump
2020-06-09 15:39:38 +02:00
Miriam Baglioni
6bbe27587f
new classes to execute the dump of products associated with a community: they enrich each result with project information and assign the result to each community it belongs to
2020-06-09 15:39:03 +02:00
Miriam Baglioni
5121cbaf6a
new classes for the external dump. Only the classes needed to dump products
2020-06-09 15:37:46 +02:00
Claudio Atzori
b2349659cf
WIP: graph property fixing implementation
2020-06-05 18:37:38 +02:00
Claudio Atzori
5e23fb3a74
code formatting
2020-05-30 10:52:56 +02:00
Claudio Atzori
54ca8ed6c3
made param name uniform (isLookupUrl); Vocab model classes defined as Serializable
2020-05-29 18:17:30 +02:00
Claudio Atzori
1577bd5b8b
added IsLookupUrl to the raw_db workflow parameters
2020-05-29 16:18:16 +02:00
Michele Artini
adb798faa5
import from DB using IS vocabularies
2020-05-29 12:03:51 +02:00
Michele Artini
f5ce7d76e1
resolve conflicts
2020-05-27 12:49:17 +02:00
Michele Artini
b81f2741d2
xquery
2020-05-27 12:10:20 +02:00
Michele Artini
a25598140a
result pids (new xpaths + IS vocabularies)
2020-05-27 12:10:20 +02:00
Michele Artini
7a7272d9ec
result pids (new xpaths + IS vocabularies)
2020-05-27 12:10:20 +02:00
Michele Artini
3ceb2d2853
match terms with vocabularies
2020-05-27 11:34:13 +02:00
Michele Artini
c15d997925
xquery
2020-05-26 13:13:17 +02:00
Michele Artini
c6af36496a
result pids (new xpaths + IS vocabularies)
2020-05-26 13:11:09 +02:00
Michele Artini
093f1aff03
result pids (new xpaths + IS vocabularies)
2020-05-26 13:06:55 +02:00
Miriam Baglioni
54d869e618
merge upstream
2020-05-26 09:22:04 +02:00
Claudio Atzori
7582532e73
[maven-release-plugin] prepare for next development iteration
2020-05-25 19:48:18 +02:00
Claudio Atzori
01c2e93395
[maven-release-plugin] prepare release dhp-1.2.1
2020-05-25 19:48:14 +02:00
Miriam Baglioni
d3d36647d2
merge upstream
2020-05-25 10:38:22 +02:00
Miriam Baglioni
dbde2d243a
changes due to moving PacePerson from dhp-graph-mapper to dhp-common
2020-05-25 10:35:39 +02:00
Miriam Baglioni
8f6ce970f9
moved PacePerson to dhp-common to avoid conflict in dependency with graph-mapper
2020-05-25 10:25:55 +02:00
Claudio Atzori
de108f54d6
code formatting
2020-05-23 10:21:19 +02:00
Claudio Atzori
6b56cae57d
added mapping for bestaccessrights
2020-05-23 09:57:39 +02:00
Claudio Atzori
3cf2796ac6
code formatting
2020-05-22 12:34:00 +02:00
Michele Artini
dc4621b3cb
filter ORCID and MAG identifiers
2020-05-22 12:25:01 +02:00
Michele Artini
9f2d0f1b08
filter ORCID and MAG identifiers
2020-05-22 11:00:27 +02:00
Michele Artini
9de71e54a8
filter ORCID and MAG identifiers
2020-05-22 10:47:39 +02:00
Michele Artini
c5f7e17348
author fullnames
2020-05-22 10:08:02 +02:00
Michele Artini
e43d4d7778
added a coalesce in sql query
2020-05-21 11:08:07 +02:00
Michele Artini
b3bcbb3129
resolve names of organization countries
2020-05-21 08:41:32 +02:00
Claudio Atzori
7838f2c63f
init the empty list for author pids mapped from OAF
2020-05-15 17:06:01 +02:00
Claudio Atzori
7a89507ab1
code formatting
2020-05-15 15:16:54 +02:00
Claudio Atzori
cfc8948717
fixed mapping OdfToGraph: pick the correct element to map author pids and author affiliations; extended mapping Oaf2Graph: added support for author pids
2020-05-15 12:26:16 +02:00
Claudio Atzori
a832658296
code formatting
2020-05-15 10:21:09 +02:00
Claudio Atzori
18f46e47b9
added relations to the graph2hive import workflow
2020-05-15 09:34:48 +02:00
Claudio Atzori
9d028ffe1c
cleanup
2020-05-15 09:28:55 +02:00
Claudio Atzori
fd62359538
cleanup
2020-05-15 09:28:15 +02:00
Claudio Atzori
eb64335a54
parallel implementation for graph Hive importer
2020-05-15 09:05:26 +02:00
Claudio Atzori
f044d09315
revised mapping: more accurate mapping for name/surname from datacite format; improved mapping of null values
2020-05-14 15:07:24 +02:00
Claudio Atzori
ab37953332
added global properties in wf definitions to avoid repeating name-node and job-tracker in the (many) distcp actions; reintroduced output directory removal at the beginning of each spark action
2020-05-14 10:25:41 +02:00
Claudio Atzori
5ecacad70a
fixed default resource typing in Oaf/Odf mapping
2020-05-13 17:01:11 +02:00
Miriam Baglioni
f5d785e096
used the DbClient moved to dhp-common
2020-05-11 13:59:42 +02:00
Miriam Baglioni
2abb84877d
Merge branch 'master' into blacklist
2020-05-11 10:37:49 +02:00
Miriam Baglioni
bb59bdd60f
merge upstream
2020-05-11 10:33:17 +02:00
Miriam Baglioni
5e3548add6
-
2020-05-11 10:33:08 +02:00
Miriam Baglioni
871e079b45
merged with master
2020-05-11 10:20:00 +02:00
Claudio Atzori
60c40618d3
[maven-release-plugin] prepare for next development iteration
2020-05-11 10:17:14 +02:00
Claudio Atzori
c267d958d5
[maven-release-plugin] prepare release dhp-1.2.0
2020-05-11 10:17:10 +02:00
Miriam Baglioni
391b2399cc
merge upstream
2020-05-11 10:08:51 +02:00
Claudio Atzori
42f1a2bf94
bumped project version to 1.2.0-SNAPSHOT
2020-05-11 10:05:57 +02:00
Miriam Baglioni
32301451ec
merge upstream
2020-05-11 09:42:23 +02:00
Claudio Atzori
0ccc864ad9
[maven-release-plugin] prepare for next development iteration
2020-05-08 17:01:31 +02:00
Claudio Atzori
6e47c724c6
[maven-release-plugin] prepare release dhp-1.1.7
2020-05-08 17:01:27 +02:00
Miriam Baglioni
4c94231cad
merge with master fork
2020-05-08 12:25:57 +02:00
Claudio Atzori
62ea19f1d3
introduced mapping for ExternalReferences, made urls defined within an instance unique
2020-05-08 09:43:26 +02:00
Miriam Baglioni
207b899d6d
merged with upstream
2020-05-07 11:43:53 +02:00
Miriam Baglioni
5efae3acb9
new workflow for job3
2020-05-07 11:38:10 +02:00
Claudio Atzori
17860d3ab6
general changes in the RAW graph mapping: missing collectedfrom/hostedby causes records to be skipped; factored out most of the constants in ModelConstants class (dhp-schemas)
2020-05-06 13:20:02 +02:00
Michele Artini
8f30a09d84
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-05-05 17:12:22 +02:00
Michele Artini
ccc609f909
new module for the production of broker events
2020-05-05 17:09:00 +02:00
Claudio Atzori
4a8487165c
using long param names in wf definition
2020-05-04 19:19:29 +02:00
Claudio Atzori
a2fc37df5f
adjusted parameters
2020-05-04 19:18:59 +02:00
Claudio Atzori
f1b7e14036
code formatting
2020-05-04 19:18:34 +02:00
Miriam Baglioni
31ea05297d
moved the DbClient to common and added needed dependency to pom
2020-05-04 12:22:28 +02:00
Miriam Baglioni
4b0bd91012
-
2020-04-30 12:45:28 +02:00
Miriam Baglioni
3abb76ff7a
merge with upstream
2020-04-30 11:15:54 +02:00
Michele Artini
eb9bd42970
fixed a problem with journals
2020-04-30 11:06:05 +02:00
Miriam Baglioni
638a3c465b
-
2020-04-30 11:05:17 +02:00
Michele Artini
a0a6109bbc
fixed a problem with journals
2020-04-30 11:03:46 +02:00
Claudio Atzori
439c6255a2
cleanup
2020-04-29 19:09:07 +02:00
Claudio Atzori
77ac995770
cleaned up poms, added descriptions
2020-04-29 18:44:17 +02:00
Miriam Baglioni
3cffee74b9
merge with upstream
2020-04-29 18:25:29 +02:00
Michele Artini
c43b4c8962
formatting
2020-04-29 12:56:58 +02:00
Michele Artini
a5d7007005
Fix relations in migration
Fix pom.xml in dhp-stats-update
2020-04-29 12:05:41 +02:00
Miriam Baglioni
f7695e833c
resolved conflicts
2020-04-29 11:41:31 +02:00
Claudio Atzori
6f5b899038
reformatted code according to the updated style descriptor
2020-04-28 11:23:29 +02:00
Claudio Atzori
ac25f2d8d1
integrated changes from master
2020-04-28 08:55:28 +02:00