Miriam Baglioni
6dbadcf181
the new schema for the dumped result
2020-08-06 11:05:56 +02:00
Sandro La Bruzzo
4fb1821fab
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-08-06 10:28:31 +02:00
Sandro La Bruzzo
9d9e9edbd2
improved extractEntity Relation workflows using dataset
2020-08-06 10:28:24 +02:00
Miriam Baglioni
14eda4f46e
added method to try to put inputstream to zenodo
2020-08-05 14:18:25 +02:00
Miriam Baglioni
e737a47270
added classes to try to send input stream to zenodo for the upload
2020-08-05 14:17:40 +02:00
Miriam Baglioni
873e9cd50c
changed hadoop setting to connect to s3
2020-08-04 15:37:25 +02:00
Alessia Bardi
a29565ff57
code formatting
2020-08-04 12:55:27 +02:00
Alessia Bardi
01db29e208
fixes redmine issue #5846 : datacite and its different namespace declarations
2020-08-04 12:53:48 +02:00
Alessia Bardi
b4e4e5f858
do not duplicate result PIDs
2020-08-04 12:52:14 +02:00
Miriam Baglioni
5b651abf82
merge branch with master
2020-08-04 10:14:07 +02:00
Miriam Baglioni
901ae37f7b
added step to workflow
2020-08-03 18:12:54 +02:00
Miriam Baglioni
e43aeb139a
added new property file and changed some parameter to old files
2020-08-03 18:07:28 +02:00
Miriam Baglioni
aa9f3d9698
changed logic for save in s3 directly
2020-08-03 18:06:18 +02:00
Miriam Baglioni
d465f0eec9
added fulltext to result
2020-08-03 18:03:27 +02:00
Miriam Baglioni
c892c7dfa7
changed to query for community map just once and save the result for remaining executions
2020-08-03 17:56:31 +02:00
Michele Artini
652b13abb6
Merge branch 'master' into nsprefix_blacklist
2020-07-31 07:58:37 +02:00
Claudio Atzori
cd631bb5bc
defaults fixed in the cleaning workflow forces result.publisher to NULL when result.publisher.value is empty
2020-07-30 17:03:53 +02:00
Miriam Baglioni
57c87b7653
re-implemented to fix issue on not serializable Set<String> variable
2020-07-30 16:43:43 +02:00
Miriam Baglioni
ef8e5957b5
added specific directory where to save results
2020-07-30 16:42:46 +02:00
Miriam Baglioni
75f3361c85
-
2020-07-30 16:41:31 +02:00
Miriam Baglioni
3f695b25fa
refactoring
2020-07-30 16:40:15 +02:00
Miriam Baglioni
e623f12bef
refactoring
2020-07-30 16:32:59 +02:00
Miriam Baglioni
ff7d05abb4
added support class to store the couple organizationId representativeId got from sql query on hive
2020-07-30 16:32:04 +02:00
Miriam Baglioni
cf6d80b2ab
added command to close the writer
2020-07-30 16:31:22 +02:00
Miriam Baglioni
f985bca37b
added USER_CLAIM constant value
2020-07-30 16:25:26 +02:00
Claudio Atzori
4bbfcf1ac6
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-07-30 16:25:06 +02:00
Claudio Atzori
4ff8007518
added function to set the missing vocabulary names, used in the cleaning workflow as a pre-cleaning step
2020-07-30 16:24:39 +02:00
Miriam Baglioni
6f1c40a933
-
2020-07-30 16:24:28 +02:00
Miriam Baglioni
2b66a93f9e
added property file that was missing
2020-07-30 16:24:17 +02:00
Michele Artini
bdece15ca0
blacklist of nsprefix
2020-07-30 16:13:38 +02:00
Sandro La Bruzzo
c97c8f0c44
implemented new oozie job to extract entities in a separate dataset
2020-07-30 12:13:58 +02:00
Sandro La Bruzzo
3010a362bc
updated changing in the workflow of provision in the phase of aggregation. Removed serialization in JSON RDD and used spark Dataset
2020-07-30 09:25:56 +02:00
Sandro La Bruzzo
16ae3c9ccf
updated changing in the workflow of provision in the phase of aggregation. Removed serialization in JSON RDD and used spark Dataset
2020-07-30 09:25:32 +02:00
Miriam Baglioni
76bcab98ce
added code to filter out null originalId from the dump
2020-07-29 18:28:21 +02:00
Miriam Baglioni
86bab79512
-
2020-07-29 18:20:22 +02:00
Miriam Baglioni
31791dcf3d
fixed wrong property file path name
2020-07-29 18:20:08 +02:00
Miriam Baglioni
9e722aa1ef
-
2020-07-29 18:00:08 +02:00
Miriam Baglioni
d22f106f27
added constant to identify datasource associated to funders
2020-07-29 17:56:55 +02:00
Miriam Baglioni
40e194fe2f
added check to not dump datasources related to funders
2020-07-29 17:56:18 +02:00
Miriam Baglioni
b48934f6df
changed the workflow name
2020-07-29 17:43:43 +02:00
Miriam Baglioni
074e9ab75e
refactoring
2020-07-29 17:42:50 +02:00
Miriam Baglioni
8ad8dac7d4
merge branch with fork master
2020-07-29 17:38:28 +02:00
Miriam Baglioni
9fa82dc93b
fixed issue
2020-07-29 17:36:16 +02:00
Miriam Baglioni
8907648d6a
-
2020-07-29 17:35:47 +02:00
Miriam Baglioni
40a8dafbdc
-
2020-07-29 17:30:44 +02:00
Miriam Baglioni
6d0f08277b
classes to implement the dump of the whole graph.
2020-07-29 17:03:19 +02:00
Miriam Baglioni
8d4327b292
input parameters and workflow definition for the dump of the whole graph
2020-07-29 17:00:34 +02:00
Miriam Baglioni
b5f995ab12
refactoring
2020-07-29 16:59:48 +02:00
Miriam Baglioni
f7a87cc447
added new constants value
2020-07-29 16:58:40 +02:00
Miriam Baglioni
b71d12cf26
refactoring
2020-07-29 16:52:44 +02:00
Miriam Baglioni
a8d65b68cb
changed to delete the part to check if it was a test or a real execution
2020-07-29 16:47:57 +02:00
Miriam Baglioni
3ec2392904
Added new class to move the place the split is effectively run
2020-07-29 16:46:50 +02:00
Miriam Baglioni
178c2729a7
changed the path to reach the java class to be executed
2020-07-29 12:29:51 +02:00
Miriam Baglioni
437ac12139
removed unused parameter
2020-07-29 12:28:16 +02:00
Miriam Baglioni
6c2223d1fc
added code to get the openaire id for contexts
2020-07-24 17:30:15 +02:00
Miriam Baglioni
afd54c1684
removed not needed upload and refactoring
2020-07-24 17:28:56 +02:00
Miriam Baglioni
7b0569d989
changed to map also the result associated to the whole graph
2020-07-24 17:28:11 +02:00
Miriam Baglioni
082225ad61
-
2020-07-24 17:27:26 +02:00
Miriam Baglioni
968c59d97a
added the logic to dump also the products for the whole graph. They will miss collected from and context information that will be materialized as new relations
2020-07-24 17:25:19 +02:00
Miriam Baglioni
332258d199
split the classes related to the communities dump and to the whole graph dump
2020-07-24 17:21:48 +02:00
Claudio Atzori
56bbfdc65d
introduced parameter 'numParitions', driving the hive DB table data partitioning. Currently specified only for table 'project'
2020-07-23 08:54:10 +02:00
Claudio Atzori
ebf60020ac
map results as OPRs in case of missing //CobjCategory/@type and the vocabulary dnet:result_typologies doesn't resolve the super type
2020-07-20 19:01:10 +02:00
Miriam Baglioni
355d7e426e
added dump for project - not finished
2020-07-20 18:54:43 +02:00
Miriam Baglioni
a2f01e5259
added getter and setter
2020-07-20 18:54:17 +02:00
Miriam Baglioni
40bbe94f7c
merge with master fork
2020-07-20 18:10:03 +02:00
Miriam Baglioni
23160b4d29
realignment of the workflow classes with the changes in the structure of the module
2020-07-20 18:04:30 +02:00
Miriam Baglioni
08dbd99455
changed to dump the whole results graph by using classes already implemented for communities. Added class to dump also organization
2020-07-20 17:54:28 +02:00
Miriam Baglioni
e47ea9349c
extended some types by adding provenance as the couple (provenance, trust) and moved some classes to be used by the complete graph dump also
2020-07-20 17:46:27 +02:00
Claudio Atzori
32f5e466e3
imports cleanup
2020-07-20 17:42:58 +02:00
Claudio Atzori
54ac583923
code formatting
2020-07-20 17:37:08 +02:00
Claudio Atzori
124e7ce19c
in case of missing attribute //dr:CobjCategory/@type the resulttype is derived by looking up the vocabulary dnet:result_typologies with the 1st instance type available
2020-07-20 17:33:37 +02:00
Claudio Atzori
050dda223d
Merge pull request 'removed duplicated fields' ( #25 ) from unique_field_in_lists into master
...
Looks good as a temporary workaround. I agree the model could seamlessly make the distinct operation by using HashSets instead of Linked (or Array) Lists.
The task to update the model in such a way is added on #9#issuecomment-1583
Thanks!
2020-07-20 12:12:50 +02:00
Claudio Atzori
e0c4cf6f7b
added parameter to drive the graph merge strategy: priority (BETA|PROD)
2020-07-20 10:48:01 +02:00
Claudio Atzori
94ccdb4852
Merge branch 'master' into merge_graph
2020-07-20 10:14:55 +02:00
Michele Artini
331a3cbdd0
fixed originalId
2020-07-20 09:50:29 +02:00
Sandro La Bruzzo
9116d75b3e
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-17 18:01:30 +02:00
Miriam Baglioni
47c7122773
changed priority from beta to production
2020-07-17 12:56:35 +02:00
Michele Artini
442f30930c
removed duplicated fields
2020-07-17 12:25:36 +02:00
Michele Artini
3adedd0a68
trust truncated to 3 decimals
2020-07-17 11:58:11 +02:00
Claudio Atzori
1781609508
code formatting
2020-07-16 19:06:56 +02:00
Claudio Atzori
878f2b931c
Merge branch 'master' into merge_graph
2020-07-16 16:34:24 +02:00
Miriam Baglioni
f9ad6f3255
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
2020-07-10 19:42:53 +02:00
Miriam Baglioni
c27f12d6e8
avoid considering the _SUCCESS file
2020-07-10 19:42:23 +02:00
Claudio Atzori
31071e363f
Merge branch 'provision_indexing'
2020-07-10 19:03:57 +02:00
Claudio Atzori
cc77446dc4
added dbSchema parameter to the raw_db workflow
2020-07-10 19:01:50 +02:00
Michele Artini
e1ae964bc4
stats
2020-07-10 16:12:08 +02:00
Sandro La Bruzzo
c01efed79b
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-10 14:44:57 +02:00
Sandro La Bruzzo
a7d3977481
added generation of EBI Dataset
2020-07-10 14:44:50 +02:00
Claudio Atzori
67e1d222b6
bulk cleaning when found null or empty, sets bestaccessrights evaluating the result instances
2020-07-08 17:53:35 +02:00
Claudio Atzori
610d377d57
first implementation of the BETA & PROD graphs merge procedure
2020-07-08 16:54:26 +02:00
Alessia Bardi
8f83b726fa
Dump json schema compliant to json schema Draft 7
2020-07-08 12:48:46 +02:00
Miriam Baglioni
1b0b968548
fixed issue on substring
2020-07-08 12:11:51 +02:00
Miriam Baglioni
7fe00cb4fb
-
2020-07-08 10:29:37 +02:00
Miriam Baglioni
35c8265793
added the json extension to filename
2020-07-07 18:29:49 +02:00
Miriam Baglioni
81434f8e5e
added method newInstance
2020-07-07 18:26:10 +02:00
Miriam Baglioni
8a1b42ff21
added check to verify that dump contains at least one product
2020-07-07 18:21:35 +02:00
Miriam Baglioni
d86adb82a7
-
2020-07-07 18:20:51 +02:00
Miriam Baglioni
b2782025f6
enabled the whole workflow to run. Added property to give priority to dependency in the classpath - to solve conflicts
2020-07-07 18:10:47 +02:00
Miriam Baglioni
83d2c84b77
added constraints to xquery so as to get only profiles with status manager or all
2020-07-07 18:09:48 +02:00
Miriam Baglioni
4c8d86493c
-
2020-07-07 18:09:06 +02:00
Miriam Baglioni
f5bb65c9ef
the json schema for the dump of the results
2020-07-07 17:34:40 +02:00
Miriam Baglioni
c19818a3f8
merge branch with fork master
2020-07-06 13:58:23 +02:00
Miriam Baglioni
d7f6f0c216
changed code to use other lib
2020-07-02 16:01:34 +02:00
Miriam Baglioni
94500a581b
merge branch with fork master
2020-07-02 14:25:39 +02:00
Claudio Atzori
ed1c7e5d75
fixed workflow for the import of the claims alone
2020-07-02 12:40:21 +02:00
Sandro La Bruzzo
1d420eedb4
added generation of EBI Dataset
2020-07-02 12:37:43 +02:00
Claudio Atzori
e4a29a4513
fixed workflow for the import of the claims alone
2020-07-02 12:36:33 +02:00
Claudio Atzori
6f5771c1c9
sets author.rank when null
2020-06-25 14:06:21 +02:00
Claudio Atzori
2d77d3a388
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-06-25 12:54:30 +02:00
Miriam Baglioni
05a99cfb61
change the position of value and description elements in the workflow definition
2020-06-25 12:36:08 +02:00
Claudio Atzori
7df2712824
Merge branch 'provision_indexing'
2020-06-25 12:22:41 +02:00
Michele Artini
abcbebcbb4
fixed generation of ids
2020-06-25 09:50:46 +02:00
Michele Artini
77d2a1b1c4
params to choose sql queries for beta or production
2020-06-25 09:28:13 +02:00
Claudio Atzori
0e723d378b
added default from vocab for missing instance.refereed; remove spurious prefixes from orcid values; WIP: prepare relation job
2020-06-24 18:34:42 +02:00
Miriam Baglioni
afa19b0c84
changed the way to PUT the files to the rest API
2020-06-22 17:20:07 +02:00
Miriam Baglioni
df80ae5c1b
merge branch with fork master
2020-06-22 10:51:23 +02:00
Miriam Baglioni
e8f914f8b3
-
2020-06-22 10:50:41 +02:00
Miriam Baglioni
185facb8e5
replaced the deprecated DefaultHttpClient with the CloseableHttpClient
2020-06-22 10:49:10 +02:00
Claudio Atzori
7d416f08d8
graph cleaning workflow: set hostedby to unknown repository when defined as NULL
2020-06-22 09:50:43 +02:00
Miriam Baglioni
669a509430
-
2020-06-19 17:39:46 +02:00
Claudio Atzori
d0ac7514b2
cleaning workflow to include cleaning of default values
2020-06-18 19:37:25 +02:00
Miriam Baglioni
44a12d244f
-
2020-06-18 18:38:54 +02:00
Miriam Baglioni
fb80353018
-
2020-06-18 14:21:36 +02:00
Miriam Baglioni
65bf312360
merge branch with fork master
2020-06-18 11:35:27 +02:00
Miriam Baglioni
f9578312b5
-
2020-06-18 11:34:15 +02:00
Miriam Baglioni
e8b3e972f2
changed the input params and the workflow definition to tackle the Result as all result product produced
2020-06-18 11:25:05 +02:00
Miriam Baglioni
3233b01089
changes due to adding all the result type under Result
2020-06-18 11:22:58 +02:00
Miriam Baglioni
bc8611a95a
added new resources for testing
2020-06-18 11:19:20 +02:00
Sandro La Bruzzo
9bf67f5de1
resolved conflicts
2020-06-17 09:15:43 +02:00
Sandro La Bruzzo
1d4275acc4
implemented first version of exportation of Scholexplorer into ActionSet
2020-06-17 09:10:38 +02:00
Claudio Atzori
5441f01586
Merge pull request 'missing landingPage urls in instances' ( #22 ) from instances-with-landing-page into master
...
Looks good, thanks!
2020-06-16 15:32:44 +02:00
Claudio Atzori
4ec262db53
included externalreference(s) in the result view on the Hive graph DB
2020-06-16 15:28:20 +02:00
Claudio Atzori
2a4f65795f
WIP: graph cleaner implementation
2020-06-15 18:32:24 +02:00
Claudio Atzori
c15c8c0ad0
map datasource identities (including piwik ids) as original IDs
2020-06-15 16:07:30 +02:00
Miriam Baglioni
9dd3ef22c5
merge branch with fork master
2020-06-15 11:23:26 +02:00
Miriam Baglioni
e43eedb5b0
added resources and workflow for dump of community products
2020-06-15 11:13:21 +02:00
Miriam Baglioni
f96ca900e1
fixed issues while running on cluster
2020-06-15 11:12:14 +02:00
Claudio Atzori
0d52816244
WIP: graph cleaner implementation
2020-06-13 13:06:04 +02:00
Claudio Atzori
bed65a1be6
WIP: graph cleaner implementation
2020-06-12 18:25:47 +02:00
Claudio Atzori
463489f59f
code formatting
2020-06-12 12:03:25 +02:00
Claudio Atzori
4bcad1c9c3
Merge branch 'graph_cleaning'
2020-06-12 11:40:25 +02:00
Claudio Atzori
cdb1956fe9
WIP: graph cleaner implementation
2020-06-12 11:36:59 +02:00
Alessia Bardi
b347499745
do not use deprecated subreltype
2020-06-12 10:58:02 +02:00
Claudio Atzori
97b1c4057c
WIP: graph cleaner implementation
2020-06-12 10:45:18 +02:00
Claudio Atzori
ba8a024af9
avoid NPEs merging titles
2020-06-12 10:45:11 +02:00
Miriam Baglioni
a01800224c
-
2020-06-11 13:02:04 +02:00
Miriam Baglioni
356dd582a3
map construction moved in class
2020-06-11 12:59:22 +02:00
Michele Artini
a41e0cb648
missing landingPage urls in instances
2020-06-11 12:28:34 +02:00
Michele Artini
99f88e1cb8
fixed generation entities from claims
2020-06-11 10:51:57 +02:00
Miriam Baglioni
db27663750
-
2020-06-11 10:49:01 +02:00
Claudio Atzori
d1d92c4d8c
fixed integration of claims in the graph
2020-06-11 10:12:00 +02:00
Claudio Atzori
953da4a427
Merge branch 'master' into graph_cleaning
2020-06-10 21:36:56 +02:00
Claudio Atzori
f1bce64391
WIP: graph cleaner implementation
2020-06-10 21:36:31 +02:00
Michele Artini
c08e66e01e
fixed a workflow parameter
2020-06-10 10:11:56 +02:00
Michele Artini
7177a32d75
import of invisible stores
2020-06-10 10:04:00 +02:00
Claudio Atzori
a2fdf85ba1
WIP: graph cleaner implementation
2020-06-09 19:52:53 +02:00
Claudio Atzori
d9f33582c5
WIP: graph cleaner implementation
2020-06-09 17:20:40 +02:00
Miriam Baglioni
a089db18f1
workflow and parameters to execute the dump
2020-06-09 15:39:38 +02:00
Miriam Baglioni
6bbe27587f
new classes to execute the dump for products associated to community, enrich each result with project information and assign the result to each community it belongs to
2020-06-09 15:39:03 +02:00
Miriam Baglioni
5121cbaf6a
new classes for external dump. Only classes functional to dump products
2020-06-09 15:37:46 +02:00
Claudio Atzori
b2349659cf
WIP: graph property fixing implementation
2020-06-05 18:37:38 +02:00
Claudio Atzori
5e23fb3a74
code formatting
2020-05-30 10:52:56 +02:00
Claudio Atzori
54ca8ed6c3
uniformed param name (isLookupUrl), Vocab model classes defined as Serializable
2020-05-29 18:17:30 +02:00
Claudio Atzori
1577bd5b8b
added IsLookupUrl to the raw_db workflow parameters
2020-05-29 16:18:16 +02:00
Michele Artini
adb798faa5
import from db using is vocabularies
2020-05-29 12:03:51 +02:00
Michele Artini
f5ce7d76e1
resolve conflicts
2020-05-27 12:49:17 +02:00
Michele Artini
b81f2741d2
xquery
2020-05-27 12:10:20 +02:00
Michele Artini
a25598140a
result pids (new xpaths + IS vocabularies)
2020-05-27 12:10:20 +02:00
Michele Artini
7a7272d9ec
result pids (new xpaths + IS vocabularies)
2020-05-27 12:10:20 +02:00
Michele Artini
3ceb2d2853
match terms with vocabularies
2020-05-27 11:34:13 +02:00
Michele Artini
c15d997925
xquery
2020-05-26 13:13:17 +02:00
Michele Artini
c6af36496a
result pids (new xpaths + IS vocabularies)
2020-05-26 13:11:09 +02:00
Michele Artini
093f1aff03
result pids (new xpaths + IS vocabularies)
2020-05-26 13:06:55 +02:00
Miriam Baglioni
d3d36647d2
merge upstream
2020-05-25 10:38:22 +02:00
Miriam Baglioni
dbde2d243a
changed due to move of PacePerson from dhp-graph-mapper to dhp-common
2020-05-25 10:35:39 +02:00
Miriam Baglioni
8f6ce970f9
moved PacePerson to dhp-common to avoid conflict in dependency with graph-mapper
2020-05-25 10:25:55 +02:00
Claudio Atzori
de108f54d6
code formatting
2020-05-23 10:21:19 +02:00
Claudio Atzori
6b56cae57d
added mapping for bestaccessrights
2020-05-23 09:57:39 +02:00
Claudio Atzori
3cf2796ac6
code formatting
2020-05-22 12:34:00 +02:00
Michele Artini
dc4621b3cb
filter ORCID and MAG identifiers
2020-05-22 12:25:01 +02:00
Michele Artini
9f2d0f1b08
filter ORCID and MAG identifiers
2020-05-22 11:00:27 +02:00
Michele Artini
9de71e54a8
filter ORCID and MAG identifiers
2020-05-22 10:47:39 +02:00
Michele Artini
c5f7e17348
author fullnames
2020-05-22 10:08:02 +02:00
Michele Artini
e43d4d7778
added a coalesce in sql query
2020-05-21 11:08:07 +02:00
Michele Artini
b3bcbb3129
resolve name of organization countries
2020-05-21 08:41:32 +02:00
Claudio Atzori
7838f2c63f
init the empty list for author pids mapped from OAF
2020-05-15 17:06:01 +02:00
Claudio Atzori
7a89507ab1
code formatting
2020-05-15 15:16:54 +02:00
Claudio Atzori
cfc8948717
fixed mapping OdfToGraph: pick the correct element to map author pids and author affiliations; extended mapping Oaf2Graph: added support for author pids
2020-05-15 12:26:16 +02:00
Claudio Atzori
a832658296
code formatting
2020-05-15 10:21:09 +02:00
Claudio Atzori
18f46e47b9
added relations to the graph2hive import workflow
2020-05-15 09:34:48 +02:00
Claudio Atzori
9d028ffe1c
cleanup
2020-05-15 09:28:55 +02:00
Claudio Atzori
fd62359538
cleanup
2020-05-15 09:28:15 +02:00
Claudio Atzori
eb64335a54
parallel implementation for graph Hive importer
2020-05-15 09:05:26 +02:00
Claudio Atzori
f044d09315
revised mapping: more accurate mapping for name/surname from datacite format; improved mapping of null values
2020-05-14 15:07:24 +02:00
Claudio Atzori
ab37953332
added global properties in wf definitions to avoid repeating name-node and job-tracker in the (many) distcp actions; reintroduced output directory removal at the beginning of each spark action
2020-05-14 10:25:41 +02:00
Claudio Atzori
5ecacad70a
fixed default resource typing in Oaf/Odf mapping
2020-05-13 17:01:11 +02:00
Miriam Baglioni
f5d785e096
used the DbClient moved in dhp-common
2020-05-11 13:59:42 +02:00
Miriam Baglioni
2abb84877d
Merge branch 'master' into blacklist
2020-05-11 10:37:49 +02:00
Miriam Baglioni
5e3548add6
-
2020-05-11 10:33:08 +02:00
Miriam Baglioni
871e079b45
merged with master
2020-05-11 10:20:00 +02:00
Miriam Baglioni
32301451ec
merge upstream
2020-05-11 09:42:23 +02:00
Miriam Baglioni
4c94231cad
merge with master fork
2020-05-08 12:25:57 +02:00
Claudio Atzori
62ea19f1d3
introduced mapping for ExternalReferences, made urls defined within an instance unique
2020-05-08 09:43:26 +02:00
Miriam Baglioni
207b899d6d
merged with upstream
2020-05-07 11:43:53 +02:00
Miriam Baglioni
5efae3acb9
new workflow for job3
2020-05-07 11:38:10 +02:00
Claudio Atzori
17860d3ab6
general changes in the RAW graph mapping: missing collectedfrom/hostedby causes records to be skipped; factored out most of the constants in ModelConstants class (dhp-schemas)
2020-05-06 13:20:02 +02:00
Michele Artini
8f30a09d84
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-05-05 17:12:22 +02:00
Michele Artini
ccc609f909
new module for the production of broker events
2020-05-05 17:09:00 +02:00
Claudio Atzori
4a8487165c
using long param names in wf definition
2020-05-04 19:19:29 +02:00
Claudio Atzori
a2fc37df5f
adjusted parameters
2020-05-04 19:18:59 +02:00
Claudio Atzori
f1b7e14036
code formatting
2020-05-04 19:18:34 +02:00
Miriam Baglioni
31ea05297d
moved the DbClient to common and added needed dependency to pom
2020-05-04 12:22:28 +02:00
Miriam Baglioni
4b0bd91012
-
2020-04-30 12:45:28 +02:00
Miriam Baglioni
3abb76ff7a
merge with upstream
2020-04-30 11:15:54 +02:00
Michele Artini
eb9bd42970
fixed a problem with journals
2020-04-30 11:06:05 +02:00
Miriam Baglioni
638a3c465b
-
2020-04-30 11:05:17 +02:00
Michele Artini
a0a6109bbc
fixed a problem with journals
2020-04-30 11:03:46 +02:00
Claudio Atzori
439c6255a2
cleanup
2020-04-29 19:09:07 +02:00
Claudio Atzori
77ac995770
cleaned up poms, added descriptions
2020-04-29 18:44:17 +02:00
Miriam Baglioni
3cffee74b9
merge with upstream
2020-04-29 18:25:29 +02:00
Michele Artini
c43b4c8962
formatting
2020-04-29 12:56:58 +02:00
Michele Artini
a5d7007005
Fix relations in migration
...
Fix pom.xml in dhp-stats-update
2020-04-29 12:05:41 +02:00
Miriam Baglioni
f7695e833c
resolved conflicts
2020-04-29 11:41:31 +02:00
Claudio Atzori
6f5b899038
reformatted code according to the updated style descriptor
2020-04-28 11:23:29 +02:00
Claudio Atzori
ac25f2d8d1
integrated changes from master
2020-04-28 08:55:28 +02:00
Claudio Atzori
a0bdbacdae
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
2020-04-27 14:52:31 +02:00
Claudio Atzori
7a3f8085f7
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
2020-04-27 14:45:40 +02:00
Michele Artini
1260d03eba
skip empty projects
2020-04-27 13:51:13 +02:00
Claudio Atzori
268462623a
refined definition of equals and hash methods for Oaf model classes, now based on entity identifier, while relations consider sourceid, targetid and relationship semantic; Factored out function to group Oaf objects in grouping operations; Raw graph creation procedure merges entities and relationships providing the same identity
2020-04-24 14:42:01 +02:00
Claudio Atzori
a3e480d1c9
implemented DispatchEntitiesApplication using spark2 datasets
2020-04-24 14:36:53 +02:00
Claudio Atzori
48157e0fc4
GraphHiveImporterJob moved to a dedicated package
2020-04-24 14:32:28 +02:00
Michele Artini
072eae3803
fixed a problem with missing contexts
2020-04-23 16:42:49 +02:00
Michele Artini
b164d96874
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-04-23 16:19:16 +02:00
Michele Artini
d920ce501e
fixed a problem with missing instances
2020-04-23 16:18:40 +02:00
Claudio Atzori
8851050814
replaced hive_db_name with hiveDbName
2020-04-23 08:36:40 +02:00
Claudio Atzori
91f81107b1
applying code formatting
2020-04-23 07:52:32 +02:00
Claudio Atzori
ade4cb97af
fixed parameters passed to the postprocessing action in the workflow mapping the graph as hive DB
2020-04-22 18:24:06 +02:00
Claudio Atzori
e81960335c
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-04-22 10:46:37 +02:00
Michele Artini
9e4d58f505
ResultType
2020-04-22 10:07:26 +02:00
Claudio Atzori
c891661822
small adjustments in the graph2hive workflow
2020-04-21 18:52:23 +02:00
Claudio Atzori
cd320efa96
added extra spark options to graph to hive workflow
2020-04-21 16:12:20 +02:00
Claudio Atzori
d772d967aa
restored changes from master branch
2020-04-20 18:53:06 +02:00
miconis
4da13e4570
Revert "Merge branch 'master' into deduptesting"
...
This reverts commit 772f75d167, reversing changes made to 5f45f2c77f.
2020-04-20 16:04:49 +02:00
Claudio Atzori
d714bfb4d4
collectedfrom field moved in common parent class Oaf.java
2020-04-20 12:25:19 +02:00
Michele Artini
8ff7facfa3
fixed collectedFrom ID
2020-04-20 11:09:27 +02:00
Michele Artini
25307965d2
add a default datainfo if missing
2020-04-20 09:43:27 +02:00
Claudio Atzori
ad7a131b18
introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin , applied to each java class in the project
2020-04-18 12:42:58 +02:00
Claudio Atzori
ff30f99c65
using newline delimited json files for the raw graph materialization. Introduced contentPath parameter
2020-04-15 16:16:20 +02:00
Alessia Bardi
550a9f82ed
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-04-14 17:53:01 +02:00
Alessia Bardi
a68fae9bcb
now supporting openaire 4.0 compliance
2020-04-14 17:52:48 +02:00
Sandro La Bruzzo
c36239e693
fixed incremental indexing
2020-04-14 17:47:36 +02:00
Claudio Atzori
82e8341f50
reorganizing parameter names in the provision workflow
2020-04-14 15:54:41 +02:00
Claudio Atzori
6b5f9ca9cb
raw graph creation workflow moved under dhp-graph-mapper, claims integration is included
2020-04-10 17:53:07 +02:00
Claudio Atzori
47f3d9b757
unit test for GraphHiveImporterJob
2020-04-08 13:24:43 +02:00
Claudio Atzori
d74e128aa6
Utility classes moved in dhp-common and dhp-schemas
2020-04-07 11:56:22 +02:00
Sandro La Bruzzo
62cc257e5c
fixed step1 workflow
2020-03-27 17:07:34 +01:00
Claudio Atzori
1767dfaa3f
method can be protected, it is meant to be used only in tests
2020-03-27 14:31:26 +01:00
Sandro La Bruzzo
15d9106b3f
Fixed merge of dhp dedup
2020-03-27 13:48:44 +01:00
Sandro La Bruzzo
8c9a56a0c8
refactored package name
2020-03-27 13:19:33 +01:00
Sandro La Bruzzo
a9935f80d4
refactor class name and workflow name for graph mapper, added javadoc
2020-03-27 13:16:24 +01:00
Claudio Atzori
673e744649
moved openaire specific implementations under dedicated package eu.dnetlib.dhp.oa
2020-03-27 10:42:17 +01:00
Claudio Atzori
098fabab3f
reorganizing content under dhp-workflows/dhp-graph-mapper
2020-03-26 19:44:19 +01:00
Claudio Atzori
77c4294924
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-03-26 18:26:52 +01:00
Claudio Atzori
43cbcda7ef
unit test for SparkGraphImporterJob
2020-03-26 18:26:40 +01:00
Sandro La Bruzzo
0cd022ad6a
merge with master
2020-03-26 14:08:29 +01:00
Claudio Atzori
2180cc4fe7
more fields included in result view definition
2020-03-25 11:21:46 +01:00
Claudio Atzori
8b0ba3d76a
postprocessing script correctly run as hive2 action
2020-03-23 17:40:39 +01:00
Claudio Atzori
658d40ccbe
WIP trying to use hive2 actions
2020-03-23 11:14:54 +01:00
Sandro La Bruzzo
0594b92a6d
implemented relation with dataset
2020-03-19 11:11:07 +01:00
Claudio Atzori
abe8fb69a2
added global properties, moved postprocessing script inside the oozie_app directory
2020-03-18 15:43:54 +01:00
Claudio Atzori
8fe7ae1482
xml formatting
2020-03-13 15:53:56 +01:00
Sandro La Bruzzo
addaaa091f
migrate relation from RDD to Dataset
2020-03-13 09:13:20 +01:00
Claudio Atzori
7b6f0c8756
reading graph dump as text files, encoded as newline-delimited JSON records, as indicated in the wiki
2020-03-10 17:19:17 +01:00
Claudio Atzori
0233987603
introduced post processing step following the hive DB creation/population
2020-03-04 10:56:50 +01:00
Claudio Atzori
9af3e904be
close the SparkSession at the end
2020-03-04 10:53:31 +01:00
Claudio Atzori
25ceec29ab
code formatting
2020-03-04 10:44:24 +01:00
Claudio Atzori
60bc2b1a20
drop the hive DB before populating it from scratch
2020-02-27 10:10:55 +01:00
Sandro La Bruzzo
2b8675462f
refactoring code
2020-02-19 10:07:08 +01:00
Claudio Atzori
1b18fd4d54
sync with master branch
2020-02-17 13:49:46 +01:00
Sandro La Bruzzo
76ee85141a
added oozie job for DNET migration and implemented Spark job for extracting entities
2020-02-17 12:31:44 +01:00
Claudio Atzori
1fee6e2b7e
implemented XML records construction and serialization, indexing WIP
2020-02-13 16:53:27 +01:00
Sandro La Bruzzo
19a80e4638
implemented workflow for aggregation and generation of infospace graph
2020-01-24 09:58:55 +01:00
Michele Artini
b35c59eb42
partial implementation of entities from db
2020-01-20 16:04:19 +01:00
Sandro La Bruzzo
abd9034da0
implemented DedupRecord factory with the merge of publications
2019-12-11 15:43:24 +01:00
miconis
4b66b471a4
implementation of the sorting by trust mechanism and the merge of oaf entities
2019-12-10 14:57:16 +01:00
Claudio Atzori
245b4cbbb3
removed import limit
2019-11-08 17:41:01 +01:00
Claudio Atzori
5308f05a02
allow to specify the target hive DB name in the infospace import workflow
2019-11-07 17:38:09 +01:00
Claudio Atzori
a52d5bde4f
simplified import procedure, maps the infospace as hive tables
2019-11-06 17:45:52 +01:00
Claudio Atzori
1e7a2ac41d
align parameter names, graph import procedure WIP
2019-11-04 17:41:01 +01:00
Claudio Atzori
32ed4ae8d6
conversion utilities from protobuffer model to DHP model moved in dnet-mapreduce-jobs. Removed also the relative protobuf dependencies
2019-11-04 12:28:56 +01:00
Sandro La Bruzzo
997e57d45b
Added entity filter to spark class
2019-10-30 12:19:03 +01:00
Sandro La Bruzzo
a336956708
added default property to job
2019-10-30 12:01:42 +01:00
Claudio Atzori
78b5b57e86
trying to make the spark action to be run as spark2
2019-10-29 18:56:34 +01:00
Claudio Atzori
c8bb81cd9a
align dependencies with IIS cluster
2019-10-29 18:10:20 +01:00
Sandro La Bruzzo
fe62ccd6dd
implemented oozie wf
2019-10-28 12:12:50 +01:00
Sandro La Bruzzo
9ee4e5a196
remove a bit of syntactic sugar on the object inheritance :(
2019-10-25 18:10:30 +02:00
miconis
9fa5aebe9c
minor changes
2019-10-25 12:52:28 +02:00
miconis
551eda1600
dataset, orp and software mapping implemented. addition of test resources for results. implementation of tests to check the result of the mapping
2019-10-25 12:48:25 +02:00
Sandro La Bruzzo
eef14fade3
fixed conflict
2019-10-25 11:58:20 +02:00
Sandro La Bruzzo
0ea7e861ab
added organizations test
2019-10-25 11:56:28 +02:00
miconis
4908165e05
implementation of the createPublication method to map publications
2019-10-25 11:54:14 +02:00
miconis
df37bd6aaf
placeholders for setters in createpublication
2019-10-25 10:57:19 +02:00
Sandro La Bruzzo
c8d6d6bbd1
implemented organization mapping
2019-10-25 10:23:51 +02:00
miconis
b525b54130
starting implementing the createPublication class
2019-10-25 09:55:31 +02:00
Claudio Atzori
4b331790e7
resolved conflicts
2019-10-25 09:45:12 +02:00
Claudio Atzori
c929c1dfac
more proto 2 graph model mappings
2019-10-25 09:25:36 +02:00
Sandro La Bruzzo
09ffda03a2
removed circular dependencies
2019-10-25 09:24:18 +02:00
Sandro La Bruzzo
a10d071cf4
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2019-10-24 17:55:44 +02:00
Sandro La Bruzzo
3a8bb11695
mapped first part
2019-10-24 17:55:40 +02:00
Claudio Atzori
d46371ceab
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2019-10-24 17:43:55 +02:00
Claudio Atzori
0d88f9a6a4
added mapping for projects
2019-10-24 17:43:42 +02:00
Sandro La Bruzzo
2dd9572f41
added Mapping of OriginalDescription
2019-10-24 17:36:44 +02:00
miconis
351d850ad3
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2019-10-24 17:29:07 +02:00
miconis
b66a7e3030
publication test added
2019-10-24 17:29:01 +02:00
Sandro La Bruzzo
6c32d418ac
added conversion of ExtraInfo
2019-10-24 17:26:55 +02:00
Claudio Atzori
5f339a2c24
added mappings for basic types
2019-10-24 17:21:45 +02:00
Sandro La Bruzzo
9d04111391
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2019-10-24 17:05:52 +02:00
Sandro La Bruzzo
0902bac7dd
fixed conflict
2019-10-24 17:05:42 +02:00
Claudio Atzori
d8bfaa3687
added mapping for relations
2019-10-24 17:04:13 +02:00
Sandro La Bruzzo
d2965636e0
created test for converting json into new OAF data model
2019-10-24 17:02:35 +02:00
Claudio Atzori
79c4f1bbd8
Protobuf to internal graph model, early steps
2019-10-24 16:56:13 +02:00
Claudio Atzori
d38aeb8c6e
DataInfo.provenanceaction not repeatable, fluent setters
2019-10-24 16:55:38 +02:00
Sandro La Bruzzo
5744a64478
added module dhp-graph-mapper
2019-10-24 16:00:28 +02:00