Miriam Baglioni
d0ff2a56fb
-
2020-07-29 17:06:53 +02:00
Miriam Baglioni
b96dedb56b
changed due to changes in the model
2020-07-29 17:05:31 +02:00
Miriam Baglioni
6d0f08277b
classes to implement the dump of the whole graph.
2020-07-29 17:03:19 +02:00
Miriam Baglioni
8d4327b292
input parameters and workflow definition for the dump of the whole graph
2020-07-29 17:00:34 +02:00
Miriam Baglioni
b5f995ab12
refactoring
2020-07-29 16:59:48 +02:00
Miriam Baglioni
f7a87cc447
added new constants value
2020-07-29 16:58:40 +02:00
Miriam Baglioni
b71d12cf26
refactoring
2020-07-29 16:52:44 +02:00
Miriam Baglioni
a8d65b68cb
removed the part that checked whether it was a test or a real execution
2020-07-29 16:47:57 +02:00
Miriam Baglioni
3ec2392904
Added new class to move the place where the split is actually run
2020-07-29 16:46:50 +02:00
Miriam Baglioni
178c2729a7
changed the path to reach the java class to be executed
2020-07-29 12:29:51 +02:00
Miriam Baglioni
437ac12139
removed unused parameter
2020-07-29 12:28:16 +02:00
Michele Artini
35e6e9c064
tests
2020-07-28 12:02:15 +02:00
Miriam Baglioni
6c2223d1fc
added code to get the openaire id for contexts
2020-07-24 17:30:15 +02:00
Miriam Baglioni
afd54c1684
removed not needed upload and refactoring
2020-07-24 17:28:56 +02:00
Miriam Baglioni
7b0569d989
changed to map also the result associated to the whole graph
2020-07-24 17:28:11 +02:00
Miriam Baglioni
082225ad61
-
2020-07-24 17:27:26 +02:00
Miriam Baglioni
968c59d97a
added the logic to dump also the products for the whole graph. They will lack the collected from and context information, which will be materialized as new relations
2020-07-24 17:25:19 +02:00
Miriam Baglioni
332258d199
split the classes related to the communities dump and to the whole graph dump
2020-07-24 17:21:48 +02:00
Claudio Atzori
56bbfdc65d
introduced parameter 'numParitions', driving the hive DB table data partitioning. Currently specified only for table 'project'
2020-07-23 08:54:10 +02:00
Sandro La Bruzzo
9ab594ccf6
fixed test
2020-07-21 10:36:21 +02:00
Claudio Atzori
ebf60020ac
map results as OPRs in case of missing //CobjCategory/@type and the vocabulary dnet:result_typologies doesn't resolve the super type
2020-07-20 19:01:10 +02:00
Miriam Baglioni
355d7e426e
added dump for project - not finished
2020-07-20 18:54:43 +02:00
Miriam Baglioni
a2f01e5259
added getter and setter
2020-07-20 18:54:17 +02:00
Miriam Baglioni
40bbe94f7c
merge with master fork
2020-07-20 18:10:03 +02:00
Miriam Baglioni
23160b4d29
realignment of the workflow classes with the changes in the structure of the module
2020-07-20 18:04:30 +02:00
Miriam Baglioni
3aab7680f6
changed the test results
2020-07-20 18:00:43 +02:00
Miriam Baglioni
5076e4f320
changed test to comply with the modifications
2020-07-20 17:55:18 +02:00
Miriam Baglioni
08dbd99455
changed to dump the whole results graph by using classes already implemented for communities. Added class to dump also organizations
2020-07-20 17:54:28 +02:00
Miriam Baglioni
e47ea9349c
extended some types by adding provenance as the pair (provenance, trust) and moved some classes so they can also be used by the complete graph dump
2020-07-20 17:46:27 +02:00
Claudio Atzori
32f5e466e3
imports cleanup
2020-07-20 17:42:58 +02:00
Claudio Atzori
54ac583923
code formatting
2020-07-20 17:37:08 +02:00
Claudio Atzori
124e7ce19c
in case of missing attribute //dr:CobjCategory/@type the resulttype is derived by looking up the vocabulary dnet:result_typologies with the 1st instance type available
2020-07-20 17:33:37 +02:00
Claudio Atzori
050dda223d
Merge pull request 'removed duplicated fields' ( #25 ) from unique_field_in_lists into master
Looks good as a temporary workaround. I agree the model could seamlessly make the distinct operation by using HashSets instead of Linked (or Array) Lists.
The task to update the model in such a way is added on #9#issuecomment-1583
Thanks!
2020-07-20 12:12:50 +02:00
Claudio Atzori
e0c4cf6f7b
added parameter to drive the graph merge strategy: priority (BETA|PROD)
2020-07-20 10:48:01 +02:00
Claudio Atzori
94ccdb4852
Merge branch 'master' into merge_graph
2020-07-20 10:14:55 +02:00
Michele Artini
331a3cbdd0
fixed originalId
2020-07-20 09:50:29 +02:00
Sandro La Bruzzo
9116d75b3e
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-17 18:01:30 +02:00
Miriam Baglioni
47c7122773
changed priority from beta to production
2020-07-17 12:56:35 +02:00
Michele Artini
442f30930c
removed duplicated fields
2020-07-17 12:25:36 +02:00
Claudio Atzori
1781609508
code formatting
2020-07-16 19:06:56 +02:00
Claudio Atzori
878f2b931c
Merge branch 'master' into merge_graph
2020-07-16 16:34:24 +02:00
Miriam Baglioni
f9ad6f3255
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
2020-07-10 19:42:53 +02:00
Miriam Baglioni
c27f12d6e8
avoid considering the _SUCCESS file
2020-07-10 19:42:23 +02:00
Claudio Atzori
31071e363f
Merge branch 'provision_indexing'
2020-07-10 19:03:57 +02:00
Claudio Atzori
cc77446dc4
added dbSchema parameter to the raw_db workflow
2020-07-10 19:01:50 +02:00
Michele Artini
e1ae964bc4
stats
2020-07-10 16:12:08 +02:00
Sandro La Bruzzo
c01efed79b
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-10 14:44:57 +02:00
Sandro La Bruzzo
a7d3977481
added generation of EBI Dataset
2020-07-10 14:44:50 +02:00
Claudio Atzori
67e1d222b6
bulk cleaning when found null or empty, sets bestaccessrights evaluating the result instances
2020-07-08 17:53:35 +02:00
Claudio Atzori
610d377d57
first implementation of the BETA & PROD graphs merge procedure
2020-07-08 16:54:26 +02:00
Alessia Bardi
9a898c0e4c
Json schema generator
2020-07-08 12:52:00 +02:00
Alessia Bardi
636f9ce7d6
json schema generator lib
2020-07-08 12:50:57 +02:00
Alessia Bardi
8f83b726fa
Dump json schema compliant to json schema Draft 7
2020-07-08 12:48:46 +02:00
Miriam Baglioni
1b0b968548
fixed issue on substring
2020-07-08 12:11:51 +02:00
Miriam Baglioni
7fe00cb4fb
-
2020-07-08 10:29:37 +02:00
Miriam Baglioni
375ef07d7b
changed the description for the upload
2020-07-07 18:41:27 +02:00
Miriam Baglioni
35c8265793
added the json extension to filename
2020-07-07 18:29:49 +02:00
Miriam Baglioni
81434f8e5e
added method newInstance
2020-07-07 18:26:10 +02:00
Miriam Baglioni
817cddfc52
-
2020-07-07 18:25:12 +02:00
Miriam Baglioni
a66aa9bd83
removed useless tests
2020-07-07 18:25:00 +02:00
Miriam Baglioni
9b20a21b24
removed useless tests
2020-07-07 18:23:37 +02:00
Miriam Baglioni
8a1b42ff21
added check to verify that dump contains at least one product
2020-07-07 18:21:35 +02:00
Miriam Baglioni
d86adb82a7
-
2020-07-07 18:20:51 +02:00
Miriam Baglioni
b2782025f6
enabled the whole workflow to run. Added property to give priority to dependency in the classpath - to solve conflicts
2020-07-07 18:10:47 +02:00
Miriam Baglioni
83d2c84b77
added constraints to xquery so as to get only profiles with status manager or all
2020-07-07 18:09:48 +02:00
Miriam Baglioni
4c8d86493c
-
2020-07-07 18:09:06 +02:00
Miriam Baglioni
0208bc18f3
added new resource for testing
2020-07-07 17:47:24 +02:00
Miriam Baglioni
f5bb65c9ef
the json schema for the dump of the results
2020-07-07 17:34:40 +02:00
Miriam Baglioni
c19818a3f8
merge branch with fork master
2020-07-06 13:58:23 +02:00
Miriam Baglioni
f8bf4acd76
-
2020-07-02 16:03:11 +02:00
Miriam Baglioni
e6c79d44e6
-
2020-07-02 16:02:02 +02:00
Miriam Baglioni
d7f6f0c216
changed code to use other lib
2020-07-02 16:01:34 +02:00
Miriam Baglioni
8fdc9e070c
added dependency to OkHttp
2020-07-02 16:01:08 +02:00
Miriam Baglioni
94500a581b
merge branch with fork master
2020-07-02 14:25:39 +02:00
Claudio Atzori
ed1c7e5d75
fixed workflow for the import of the claims alone
2020-07-02 12:40:21 +02:00
Sandro La Bruzzo
1d420eedb4
added generation of EBI Dataset
2020-07-02 12:37:43 +02:00
Claudio Atzori
e4a29a4513
fixed workflow for the import of the claims alone
2020-07-02 12:36:33 +02:00
Claudio Atzori
6f5771c1c9
sets author.rank when null
2020-06-25 14:06:21 +02:00
Claudio Atzori
2d77d3a388
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-06-25 12:54:30 +02:00
Miriam Baglioni
05a99cfb61
change the position of value and description elements in the workflow definition
2020-06-25 12:36:08 +02:00
Claudio Atzori
7df2712824
Merge branch 'provision_indexing'
2020-06-25 12:22:41 +02:00
Michele Artini
abcbebcbb4
fixed generation of ids
2020-06-25 09:50:46 +02:00
Michele Artini
77d2a1b1c4
params to choose sql queries for beta or production
2020-06-25 09:28:13 +02:00
Claudio Atzori
0e723d378b
added default from vocab for missing instance.refereed; remove spurious prefixes from orcid values; WIP: prepare relation job
2020-06-24 18:34:42 +02:00
Miriam Baglioni
3e5570de7a
-
2020-06-23 15:44:54 +02:00
Michele Artini
38bb45d0b6
test osf:refereed
2020-06-23 10:14:39 +02:00
Miriam Baglioni
e4b21be004
-
2020-06-22 17:31:50 +02:00
Miriam Baglioni
afa19b0c84
changed the way to PUT the files to the rest API
2020-06-22 17:20:07 +02:00
Miriam Baglioni
250fd1c854
merge branch with fork master
2020-06-22 16:25:48 +02:00
Claudio Atzori
9cd27183b6
[maven-release-plugin] prepare for next development iteration
2020-06-22 11:27:44 +02:00
Claudio Atzori
1e3dab0631
[maven-release-plugin] prepare release dhp-1.2.3
2020-06-22 11:27:39 +02:00
Miriam Baglioni
df80ae5c1b
merge branch with fork master
2020-06-22 10:51:23 +02:00
Miriam Baglioni
e8f914f8b3
-
2020-06-22 10:50:41 +02:00
Miriam Baglioni
edeb862476
excluded dependency in module that generates conflict
2020-06-22 10:49:56 +02:00
Miriam Baglioni
185facb8e5
replaced the deprecated DefaultHttpClient with CloseableHttpClient
2020-06-22 10:49:10 +02:00
Claudio Atzori
7d416f08d8
graph cleaning workflow: set hostedby to unknown repository when defined as NULL
2020-06-22 09:50:43 +02:00
Miriam Baglioni
669a509430
-
2020-06-19 17:39:46 +02:00
Claudio Atzori
d0ac7514b2
cleaning workflow to include cleaning of default values
2020-06-18 19:37:25 +02:00
Miriam Baglioni
44a12d244f
-
2020-06-18 18:38:54 +02:00
Miriam Baglioni
fb80353018
-
2020-06-18 14:21:36 +02:00
Miriam Baglioni
65bf312360
merge branch with fork master
2020-06-18 11:35:27 +02:00
Miriam Baglioni
3953f56bd3
added dependency to pom
2020-06-18 11:34:47 +02:00
Miriam Baglioni
a118b66858
-
2020-06-18 11:34:30 +02:00
Miriam Baglioni
f9578312b5
-
2020-06-18 11:34:15 +02:00
Miriam Baglioni
8b145e6aba
-
2020-06-18 11:25:28 +02:00
Miriam Baglioni
e8b3e972f2
changed the input params and the workflow definition to tackle the Result as all result product produced
2020-06-18 11:25:05 +02:00
Miriam Baglioni
3233b01089
changes due to adding all the result type under Result
2020-06-18 11:22:58 +02:00
Miriam Baglioni
5c8533d1a1
changed in the testing classes
2020-06-18 11:20:08 +02:00
Miriam Baglioni
bc8611a95a
added new resources for testing
2020-06-18 11:19:20 +02:00
Sandro La Bruzzo
9bf67f5de1
resolved conflicts
2020-06-17 09:15:43 +02:00
Sandro La Bruzzo
1d4275acc4
implemented first version of the export of Scholexplorer into ActionSet
2020-06-17 09:10:38 +02:00
Claudio Atzori
1bc1d15eaf
stubbing for mock datasource.identities must be typed as array
2020-06-16 16:54:28 +02:00
Claudio Atzori
5441f01586
Merge pull request 'missing landingPage urls in instances' ( #22 ) from instances-with-landing-page into master
Looks good, thanks!
2020-06-16 15:32:44 +02:00
Claudio Atzori
89859111ee
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-06-16 15:28:29 +02:00
Claudio Atzori
4ec262db53
included externalreference(s) in the result view on the Hive graph DB
2020-06-16 15:28:20 +02:00
Michele Artini
8a4f84f8c0
refactoring
2020-06-16 12:34:13 +02:00
Claudio Atzori
2a4f65795f
WIP: graph cleaner implementation
2020-06-15 18:32:24 +02:00
Claudio Atzori
c15c8c0ad0
map datasource identities (including piwik ids) as original IDs
2020-06-15 16:07:30 +02:00
Miriam Baglioni
9dd3ef22c5
merge branch with fork master
2020-06-15 11:23:26 +02:00
Miriam Baglioni
68cf0fd03f
test input
2020-06-15 11:14:42 +02:00
Miriam Baglioni
0467145ae3
test for graph dump
2020-06-15 11:13:51 +02:00
Miriam Baglioni
e43eedb5b0
added resources and workflow for dump of community products
2020-06-15 11:13:21 +02:00
Miriam Baglioni
f96ca900e1
fixed issues while running on cluster
2020-06-15 11:12:14 +02:00
Miriam Baglioni
20b9e67728
added new class funder
2020-06-15 11:06:18 +02:00
Claudio Atzori
0d52816244
WIP: graph cleaner implementation
2020-06-13 13:06:04 +02:00
Claudio Atzori
bed65a1be6
WIP: graph cleaner implementation
2020-06-12 18:25:47 +02:00
Claudio Atzori
c4d9f1837f
[maven-release-plugin] prepare for next development iteration
2020-06-12 12:21:08 +02:00
Claudio Atzori
f0746a7605
[maven-release-plugin] prepare release dhp-1.2.2
2020-06-12 12:21:03 +02:00
Claudio Atzori
463489f59f
code formatting
2020-06-12 12:03:25 +02:00
Claudio Atzori
4bcad1c9c3
Merge branch 'graph_cleaning'
2020-06-12 11:40:25 +02:00
Claudio Atzori
cdb1956fe9
WIP: graph cleaner implementation
2020-06-12 11:36:59 +02:00
Alessia Bardi
b347499745
do not use deprecated subreltype
2020-06-12 10:58:02 +02:00
Claudio Atzori
97b1c4057c
WIP: graph cleaner implementation
2020-06-12 10:45:18 +02:00
Claudio Atzori
ba8a024af9
avoid NPEs merging titles
2020-06-12 10:45:11 +02:00
Miriam Baglioni
e145972962
-
2020-06-11 13:08:39 +02:00
Miriam Baglioni
a01800224c
-
2020-06-11 13:02:04 +02:00
Miriam Baglioni
356dd582a3
map construction moved in class
2020-06-11 12:59:22 +02:00
Michele Artini
a41e0cb648
missing landingPage urls in instances
2020-06-11 12:28:34 +02:00
Michele Artini
99f88e1cb8
fixed generation entities from claims
2020-06-11 10:51:57 +02:00
Miriam Baglioni
db27663750
-
2020-06-11 10:49:01 +02:00
Miriam Baglioni
bb9f21d0e7
job test for class producing first step of results dump
2020-06-11 10:20:05 +02:00
Claudio Atzori
d1d92c4d8c
fixed integration of claims in the graph
2020-06-11 10:12:00 +02:00
Claudio Atzori
953da4a427
Merge branch 'master' into graph_cleaning
2020-06-10 21:36:56 +02:00
Claudio Atzori
f1bce64391
WIP: graph cleaner implementation
2020-06-10 21:36:31 +02:00
Michele Artini
c08e66e01e
fixed a workflow parameter
2020-06-10 10:11:56 +02:00
Michele Artini
7177a32d75
import of invisible stores
2020-06-10 10:04:00 +02:00
Claudio Atzori
a2fdf85ba1
WIP: graph cleaner implementation
2020-06-09 19:52:53 +02:00
Claudio Atzori
d9f33582c5
WIP: graph cleaner implementation
2020-06-09 17:20:40 +02:00
Miriam Baglioni
a089db18f1
workflow and parameters to execute the dump
2020-06-09 15:39:38 +02:00
Miriam Baglioni
6bbe27587f
new classes to execute the dump for products associated to community, enrich each result with project information and assign the result to each community it belongs to
2020-06-09 15:39:03 +02:00
Miriam Baglioni
5121cbaf6a
new classes for external dump. Only classes functional to dump products
2020-06-09 15:37:46 +02:00
Claudio Atzori
b2349659cf
WIP: graph property fixing implementation
2020-06-05 18:37:38 +02:00
Claudio Atzori
5e23fb3a74
code formatting
2020-05-30 10:52:56 +02:00
Claudio Atzori
54ca8ed6c3
unified param name (isLookupUrl), Vocab model classes defined as Serializable
2020-05-29 18:17:30 +02:00
Claudio Atzori
1577bd5b8b
added IsLookupUrl to the raw_db workflow parameters
2020-05-29 16:18:16 +02:00
Michele Artini
adb798faa5
import from db using is vocabularies
2020-05-29 12:03:51 +02:00
Michele Artini
f5ce7d76e1
resolve conflicts
2020-05-27 12:49:17 +02:00
Michele Artini
b81f2741d2
xquery
2020-05-27 12:10:20 +02:00
Michele Artini
a25598140a
result pids (new xpaths + IS vocabularies)
2020-05-27 12:10:20 +02:00
Michele Artini
7a7272d9ec
result pids (new xpaths + IS vocabularies)
2020-05-27 12:10:20 +02:00
Michele Artini
3ceb2d2853
match terms with vocabularies
2020-05-27 11:34:13 +02:00
Michele Artini
c15d997925
xquery
2020-05-26 13:13:17 +02:00
Michele Artini
c6af36496a
result pids (new xpaths + IS vocabularies)
2020-05-26 13:11:09 +02:00
Michele Artini
093f1aff03
result pids (new xpaths + IS vocabularies)
2020-05-26 13:06:55 +02:00
Miriam Baglioni
54d869e618
merge upstream
2020-05-26 09:22:04 +02:00
Claudio Atzori
7582532e73
[maven-release-plugin] prepare for next development iteration
2020-05-25 19:48:18 +02:00
Claudio Atzori
01c2e93395
[maven-release-plugin] prepare release dhp-1.2.1
2020-05-25 19:48:14 +02:00
Miriam Baglioni
d3d36647d2
merge upstream
2020-05-25 10:38:22 +02:00
Miriam Baglioni
dbde2d243a
changed due to move of PacePerson from dhp-graph-mapper to dhp-common
2020-05-25 10:35:39 +02:00
Miriam Baglioni
8f6ce970f9
moved PacePerson to dhp-common to avoid conflict in dependency with graph-mapper
2020-05-25 10:25:55 +02:00
Claudio Atzori
de108f54d6
code formatting
2020-05-23 10:21:19 +02:00
Claudio Atzori
6b56cae57d
added mapping for bestaccessrights
2020-05-23 09:57:39 +02:00
Claudio Atzori
3cf2796ac6
code formatting
2020-05-22 12:34:00 +02:00
Michele Artini
dc4621b3cb
filter ORCID and MAG identifiers
2020-05-22 12:25:01 +02:00
Michele Artini
9f2d0f1b08
filter ORCID and MAG identifiers
2020-05-22 11:00:27 +02:00
Michele Artini
9de71e54a8
filter ORCID and MAG identifiers
2020-05-22 10:47:39 +02:00
Michele Artini
c5f7e17348
author fullnames
2020-05-22 10:08:02 +02:00
Michele Artini
e43d4d7778
added a coalesce in sql query
2020-05-21 11:08:07 +02:00
Michele Artini
b3bcbb3129
resolve name of organization countries
2020-05-21 08:41:32 +02:00
Claudio Atzori
7838f2c63f
init the empty list for author pids mapped from OAF
2020-05-15 17:06:01 +02:00
Claudio Atzori
7a89507ab1
code formatting
2020-05-15 15:16:54 +02:00
Claudio Atzori
cfc8948717
fixed mapping OdfToGraph: pick the correct element to map author pids and author affiliations; extended mapping Oaf2Graph: added support for author pids
2020-05-15 12:26:16 +02:00
Claudio Atzori
a832658296
code formatting
2020-05-15 10:21:09 +02:00
Claudio Atzori
18f46e47b9
added relations to the graph2hive import workflow
2020-05-15 09:34:48 +02:00
Claudio Atzori
9d028ffe1c
cleanup
2020-05-15 09:28:55 +02:00
Claudio Atzori
fd62359538
cleanup
2020-05-15 09:28:15 +02:00
Claudio Atzori
eb64335a54
parallel implementation for graph Hive importer
2020-05-15 09:05:26 +02:00
Claudio Atzori
f044d09315
revised mapping: more accurate mapping for name/surname from datacite format; improved mapping of null values
2020-05-14 15:07:24 +02:00
Claudio Atzori
ab37953332
added global properties in wf definitions to avoid repeating name-node and job-tracker in the (many) distcp actions; reintroduced output directory removal at the beginning of each spark action
2020-05-14 10:25:41 +02:00
Claudio Atzori
5ecacad70a
fixed default resource typing in Oaf/Odf mapping
2020-05-13 17:01:11 +02:00
Miriam Baglioni
f5d785e096
used the DbClient moved in dhp-common
2020-05-11 13:59:42 +02:00
Miriam Baglioni
2abb84877d
Merge branch 'master' into blacklist
2020-05-11 10:37:49 +02:00
Miriam Baglioni
bb59bdd60f
merge upstream
2020-05-11 10:33:17 +02:00
Miriam Baglioni
5e3548add6
-
2020-05-11 10:33:08 +02:00
Miriam Baglioni
871e079b45
merged with master
2020-05-11 10:20:00 +02:00
Claudio Atzori
60c40618d3
[maven-release-plugin] prepare for next development iteration
2020-05-11 10:17:14 +02:00
Claudio Atzori
c267d958d5
[maven-release-plugin] prepare release dhp-1.2.0
2020-05-11 10:17:10 +02:00
Miriam Baglioni
391b2399cc
merge upstream
2020-05-11 10:08:51 +02:00
Claudio Atzori
42f1a2bf94
bumped project version to 1.2.0-SNAPSHOT
2020-05-11 10:05:57 +02:00
Miriam Baglioni
32301451ec
merge upstream
2020-05-11 09:42:23 +02:00