Miriam Baglioni
2a540b6c01
added constants for the pid graph dump
2020-08-24 16:55:35 +02:00
Miriam Baglioni
da103c399a
resources for the pid graph dump test
2020-08-24 16:52:07 +02:00
Miriam Baglioni
630a6a1fe7
first tests for the pid graph dump
2020-08-24 16:51:26 +02:00
Miriam Baglioni
40c8d2de7b
test resources for the dump of the pids graph
2020-08-24 16:50:39 +02:00
Miriam Baglioni
bef79d3bdf
first attempt at the dump of the pids graph
2020-08-24 16:49:38 +02:00
Miriam Baglioni
85203c16e3
merge branch with master
2020-08-19 11:49:03 +02:00
Miriam Baglioni
2c783793ba
removed the affiliation from the author to mirror the changes in the model
2020-08-19 11:48:12 +02:00
Miriam Baglioni
f6bf888016
removed affiliation from author to mirror the changes in the model
2020-08-19 11:41:41 +02:00
Miriam Baglioni
66d0e0d3f2
-
2020-08-19 11:31:50 +02:00
Miriam Baglioni
1c593a9cfe
-
2020-08-19 11:29:51 +02:00
Miriam Baglioni
e42b2f5ae2
-
2020-08-19 11:29:09 +02:00
Miriam Baglioni
f81ee22418
changed to mirror the changes in the model (Instance, CommunityInstance, GraphResult)
2020-08-19 11:28:26 +02:00
Miriam Baglioni
387be43fd4
changed to discriminate if dumping all the results type together or each one in its own archive
2020-08-19 11:25:27 +02:00
Miriam Baglioni
c5858afb88
added parameter to guide the dump for the result (resultAggregation). true if all the result types should be dumped together, false otherwise.
2020-08-19 11:24:14 +02:00
Miriam Baglioni
d407852ac2
changed to reflect the changes in the model
2020-08-19 11:15:05 +02:00
Miriam Baglioni
47c21a8961
refactoring due to compilation
2020-08-19 11:11:57 +02:00
Miriam Baglioni
5570678c65
changed parameter name from hdfsNameNode to nameNode
2020-08-19 10:59:26 +02:00
Miriam Baglioni
dc5096a327
refactoring due to compilation
2020-08-19 10:57:36 +02:00
Miriam Baglioni
96600ed04a
modified test resource for mirroring the deletion of affiliation from author parameters
2020-08-14 20:41:49 +02:00
Miriam Baglioni
09f5b92763
added specific reference to class
2020-08-14 20:00:09 +02:00
Miriam Baglioni
37e7c43652
changed parameter name from hdfsNameNode to nameNode
2020-08-14 18:18:25 +02:00
Miriam Baglioni
d2a8a4961a
refactoring
2020-08-13 18:50:33 +02:00
Miriam Baglioni
a5043de5da
added method to get the mapped instance
2020-08-13 18:45:50 +02:00
Miriam Baglioni
fcd10f452c
changed because of #40 (comment)
2020-08-13 12:55:32 +02:00
Miriam Baglioni
fd48ae3b85
changed because of #40 (comment)
2020-08-13 12:19:15 +02:00
Miriam Baglioni
04a3e1ab38
disabled tests
2020-08-13 12:18:13 +02:00
Miriam Baglioni
2ede397933
Apply change because of #40 (comment)
2020-08-13 12:16:39 +02:00
Miriam Baglioni
bfd1fcde6d
removed not useful method and changed because of #40 (comment) and #40 (comment)
2020-08-13 12:14:37 +02:00
Miriam Baglioni
7fd8397123
apply changes in #40 (comment)
2020-08-13 12:13:15 +02:00
Miriam Baglioni
753d448cc9
apply changes in #40 (comment)
2020-08-13 12:12:58 +02:00
Miriam Baglioni
c0e071fa26
apply changes in #40 (comment)
2020-08-13 12:12:40 +02:00
Miriam Baglioni
526db915bc
apply changes in #40 (comment)
2020-08-13 12:12:16 +02:00
Miriam Baglioni
b0fab0d138
apply changes in #40 (comment)
2020-08-13 12:11:57 +02:00
Miriam Baglioni
1b6320b251
apply changes in #40 (comment)
2020-08-13 12:11:41 +02:00
Miriam Baglioni
743d31be22
apply changes in #40 (comment)
2020-08-13 12:11:22 +02:00
Miriam Baglioni
65b48df652
apply changes in #40 (comment)
2020-08-13 12:11:06 +02:00
Miriam Baglioni
90b54d3efb
apply changes in #40 (comment)
2020-08-13 12:08:24 +02:00
Miriam Baglioni
69bbb9592a
apply changes in #40 (comment)
2020-08-13 12:07:39 +02:00
Miriam Baglioni
945323299a
apply changes in #40 (comment)
2020-08-13 12:07:24 +02:00
Miriam Baglioni
e04c993247
apply changes in #40 (comment)
2020-08-13 12:07:07 +02:00
Miriam Baglioni
ed0812d0ce
apply changes in #40 (comment)
2020-08-13 12:06:49 +02:00
Miriam Baglioni
d55cfe0ea5
apply changes in #40 (comment)
2020-08-13 12:06:20 +02:00
Miriam Baglioni
80866bec7d
apply changes in #40 (comment)
2020-08-13 12:06:05 +02:00
Miriam Baglioni
1400978c0a
apply changes in #40 (comment)
2020-08-13 12:05:44 +02:00
Miriam Baglioni
7b941a2e0a
apply changes in #40 (comment)
2020-08-13 12:05:17 +02:00
Miriam Baglioni
f7474f50fe
apply changes in #40 (comment)
2020-08-13 12:04:52 +02:00
Miriam Baglioni
367203f412
apply changes in #40 (comment)
2020-08-13 12:04:33 +02:00
Miriam Baglioni
3ab4809d31
apply changes in #40 (comment)
2020-08-13 12:04:10 +02:00
Miriam Baglioni
235d4e4d6e
moved Context as relevant for Communities dump
2020-08-12 18:16:45 +02:00
Miriam Baglioni
adf9f96a67
test for extraction of relation between organizations and context
2020-08-12 10:04:47 +02:00
Miriam Baglioni
7400cd019d
removed not needed variable
2020-08-12 10:03:33 +02:00
Miriam Baglioni
98d28bab5c
fixed missing _ in context nsprefix
2020-08-12 10:00:18 +02:00
Miriam Baglioni
25f4fbceea
draft of test and resources
2020-08-11 17:37:22 +02:00
Miriam Baglioni
30a2b19b65
changed metadata for deposition of covid-19 dump in Zenodo
2020-08-11 17:36:56 +02:00
Miriam Baglioni
49788b532a
changed to mirror changes in the schema
2020-08-11 16:05:03 +02:00
Miriam Baglioni
b08511287b
-
2020-08-11 16:01:36 +02:00
Miriam Baglioni
7e81a17068
changed the XQUERY to mirror the change in the code
2020-08-11 16:00:33 +02:00
Miriam Baglioni
37ad2f28e9
removed added | in prefix for datasource
2020-08-11 15:55:06 +02:00
Miriam Baglioni
f31c2e9461
enabled test
2020-08-11 15:49:25 +02:00
Miriam Baglioni
2d67476417
merge branch with master
2020-08-11 15:46:04 +02:00
Miriam Baglioni
6d3804e24c
-
2020-08-11 15:45:12 +02:00
Miriam Baglioni
0603ec4757
changed test to upload the dump for covid-19 community
2020-08-11 15:43:25 +02:00
Miriam Baglioni
7dfd56df9d
-
2020-08-11 15:42:35 +02:00
Miriam Baglioni
a169d7e7c1
added test file for the MakeTar class
2020-08-11 15:40:41 +02:00
Miriam Baglioni
acb0926b2e
json schemas for the dumped entities and relation
2020-08-11 15:39:48 +02:00
Miriam Baglioni
ff52c51f92
added the communityMapPath parameter and removed the isLookUpUrl parameter
2020-08-11 15:39:22 +02:00
Miriam Baglioni
6f43acda5e
added the maketar and send to zenodo step. Adjusted wf parameters
2020-08-11 15:38:20 +02:00
Miriam Baglioni
ddc19de2e9
removed the isLookUpUrl among the parameters
2020-08-11 15:37:47 +02:00
Miriam Baglioni
592a8ea573
added parameter file for maketar class
2020-08-11 15:37:14 +02:00
Miriam Baglioni
77a0951b32
added the make archive step in the workflow
2020-08-11 15:32:32 +02:00
Miriam Baglioni
cf4d918787
added description, changed parameter name and added method
2020-08-11 15:27:31 +02:00
Miriam Baglioni
dc5fc5366d
Creation of an archive for each related dump part
2020-08-11 15:26:06 +02:00
Miriam Baglioni
0ce49049d6
added description
2020-08-11 15:25:11 +02:00
Miriam Baglioni
9bae991167
added description of the class
2020-08-11 11:20:43 +02:00
Miriam Baglioni
341dc59ead
removed the repartition(1). Added code for the creation of an archive containing all the parts dumped for each community
2020-08-11 11:18:58 +02:00
Miriam Baglioni
1991a49f70
removed reference to isLookUp to get the communityMap
2020-08-10 18:02:56 +02:00
Miriam Baglioni
c378c38546
disabled test. The testing functionalities for the upload to Zenodo are moved to common
2020-08-10 12:41:11 +02:00
Miriam Baglioni
63ad0ed209
changed to use communityMapPath instead of IsLookUp
2020-08-10 12:40:19 +02:00
Miriam Baglioni
cec795f2ea
changed resources to mirror changes in the model
2020-08-10 12:39:35 +02:00
Miriam Baglioni
f50e3e7333
changed the class for which to generate the schema
2020-08-10 12:03:49 +02:00
Miriam Baglioni
b8c26f656c
test using communityMapPath instead of isLookUp
2020-08-10 12:02:55 +02:00
Miriam Baglioni
fe88904df0
changed the wf definition
2020-08-10 12:01:14 +02:00
Miriam Baglioni
87856467e2
removed isLookUpUrl and added code to read the communitymap from HDFS
2020-08-10 11:38:41 +02:00
Miriam Baglioni
1cf7043e26
removed isLookUpUrl from the parameters
2020-08-10 11:38:03 +02:00
Sandro La Bruzzo
0ade33ad15
updated mergeFrom function for DLI Unknown
2020-08-10 10:18:35 +02:00
Miriam Baglioni
46986aae2d
added the new parameter for newdeposion/newversion and concept_record_id
2020-08-07 18:00:06 +02:00
Miriam Baglioni
3aedfdf0d6
added option to do a new deposition or new version of an old deposition
2020-08-07 17:49:14 +02:00
Miriam Baglioni
1b3ad1bce6
filter out authors pid (only orcid). Added check to get unique provenance for context id. filter out countries with code UNKNOWN
2020-08-07 17:48:18 +02:00
Miriam Baglioni
5ceb8c5f0a
moved constants from graph.Constants
2020-08-07 17:46:47 +02:00
Miriam Baglioni
6c65c93c0e
refactoring
2020-08-07 17:45:35 +02:00
Miriam Baglioni
68adf86fe4
refactoring
2020-08-07 17:43:20 +02:00
Miriam Baglioni
26d2ad6ebb
refactoring
2020-08-07 17:41:56 +02:00
Miriam Baglioni
9675af7965
refactoring
2020-08-07 17:41:07 +02:00
Miriam Baglioni
346a91f4d9
Added constants
2020-08-07 17:35:39 +02:00
Miriam Baglioni
d52b0e1797
no use of IsLookUp. The query is done once and its result stored on HDFS. The path to the result is given instead of the isLookUpUrl
2020-08-07 17:34:40 +02:00
Miriam Baglioni
ae1b7fbfdb
changed method signature from set of mapkey entries to String representing path on file system where to find the map
2020-08-07 17:32:27 +02:00
Miriam Baglioni
545ea9f77e
moved in common. Zenodo response model and APIClient to deposit in Zenodo
2020-08-07 16:44:51 +02:00
Miriam Baglioni
da9b012c15
fixed description
2020-08-06 11:55:44 +02:00
Miriam Baglioni
6dbadcf181
the new schema for the dumped result
2020-08-06 11:05:56 +02:00
Sandro La Bruzzo
4fb1821fab
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-08-06 10:28:31 +02:00
Sandro La Bruzzo
9d9e9edbd2
improved extractEntity Relation workflows using dataset
2020-08-06 10:28:24 +02:00
Miriam Baglioni
adf0ca5aa7
test to send is from hdfs
2020-08-05 14:24:43 +02:00
Miriam Baglioni
14eda4f46e
added method to try to put inputstream to zenodo
2020-08-05 14:18:25 +02:00
Miriam Baglioni
e737a47270
added classes to try to send input stream to zenodo for the upload
2020-08-05 14:17:40 +02:00
Miriam Baglioni
873e9cd50c
changed hadoop setting to connect to s3
2020-08-04 15:37:25 +02:00
Alessia Bardi
a29565ff57
code formatting
2020-08-04 12:55:27 +02:00
Alessia Bardi
01db29e208
fixes redmine issue #5846 : datacite and its different namespace declarations
2020-08-04 12:53:48 +02:00
Alessia Bardi
b4e4e5f858
do not duplicate result PIDs
2020-08-04 12:52:14 +02:00
Alessia Bardi
09a323d18d
testing a dataset from Nakala
2020-08-04 12:50:52 +02:00
Alessia Bardi
c35bf486cc
added handle among the possible PIDs
2020-08-04 12:50:12 +02:00
Miriam Baglioni
5b651abf82
merge branch with master
2020-08-04 10:14:07 +02:00
Miriam Baglioni
901ae37f7b
added step to workflow
2020-08-03 18:12:54 +02:00
Miriam Baglioni
fa38cdb10b
added resource
2020-08-03 18:11:12 +02:00
Miriam Baglioni
e9fcc0b2f1
commented test unit - to decide change for mirroring the changed logics
2020-08-03 18:10:53 +02:00
Miriam Baglioni
e43aeb139a
added new property file and changed some parameter to old files
2020-08-03 18:07:28 +02:00
Miriam Baglioni
aa9f3d9698
changed logic for save in s3 directly
2020-08-03 18:06:18 +02:00
Miriam Baglioni
d465f0eec9
added fulltext to result
2020-08-03 18:03:27 +02:00
Miriam Baglioni
c892c7dfa7
changed to query for community map just once and save the result for remaining executions
2020-08-03 17:56:31 +02:00
Alessia Bardi
8cc067fe76
specific test for claims
2020-08-03 11:17:50 +02:00
Michele Artini
652b13abb6
Merge branch 'master' into nsprefix_blacklist
2020-07-31 07:58:37 +02:00
Claudio Atzori
cd631bb5bc
defaults fixed in the cleaning workflow: forces result.publisher to NULL when result.publisher.value is empty
2020-07-30 17:03:53 +02:00
Miriam Baglioni
872d7783fc
-
2020-07-30 16:50:36 +02:00
Miriam Baglioni
57c87b7653
re-implemented to fix issue on not serializable Set<String> variable
2020-07-30 16:43:43 +02:00
Miriam Baglioni
ef8e5957b5
added specific directory where to save results
2020-07-30 16:42:46 +02:00
Miriam Baglioni
75f3361c85
-
2020-07-30 16:41:31 +02:00
Miriam Baglioni
3f695b25fa
refactoring
2020-07-30 16:40:15 +02:00
Miriam Baglioni
e623f12bef
refactoring
2020-07-30 16:32:59 +02:00
Miriam Baglioni
ff7d05abb4
added support class to store the couple organizationId representativeId got from sql query on hive
2020-07-30 16:32:04 +02:00
Miriam Baglioni
cf6d80b2ab
added command to close the writer
2020-07-30 16:31:22 +02:00
Miriam Baglioni
f985bca37b
added USER_CLAIM constant value
2020-07-30 16:25:26 +02:00
Claudio Atzori
4bbfcf1ac6
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-07-30 16:25:06 +02:00
Claudio Atzori
4ff8007518
added function to set the missing vocabulary names, used in the cleaning workflow as a pre-cleaning step
2020-07-30 16:24:39 +02:00
Miriam Baglioni
6f1c40a933
-
2020-07-30 16:24:28 +02:00
Miriam Baglioni
2b66a93f9e
added property file that was missing
2020-07-30 16:24:17 +02:00
Michele Artini
bdece15ca0
blacklist of nsprefix
2020-07-30 16:13:38 +02:00
Sandro La Bruzzo
c97c8f0c44
implemented new oozie job to extract entities in a separate dataset
2020-07-30 12:13:58 +02:00
Sandro La Bruzzo
3010a362bc
updated changing in the workflow of provision in the phase of aggregation. Removed serialization in JSON RDD and used spark Dataset
2020-07-30 09:25:56 +02:00
Sandro La Bruzzo
487226f669
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-30 09:25:39 +02:00
Sandro La Bruzzo
16ae3c9ccf
updated changing in the workflow of provision in the phase of aggregation. Removed serialization in JSON RDD and used spark Dataset
2020-07-30 09:25:32 +02:00
Miriam Baglioni
ee8420c6b3
added resource for datasource test
2020-07-29 18:28:43 +02:00
Miriam Baglioni
76bcab98ce
added code to filter out null originalId from the dump
2020-07-29 18:28:21 +02:00
Miriam Baglioni
ef1d8aef17
added one test to verify the dump for the datasources
2020-07-29 18:27:46 +02:00
Miriam Baglioni
86bab79512
-
2020-07-29 18:20:22 +02:00
Miriam Baglioni
31791dcf3d
fixed wrong property file path name
2020-07-29 18:20:08 +02:00
Miriam Baglioni
9e722aa1ef
-
2020-07-29 18:00:08 +02:00
Miriam Baglioni
d22f106f27
added constant to identify datasource associated to funders
2020-07-29 17:56:55 +02:00
Miriam Baglioni
40e194fe2f
added check to not dump datasources related to funders
2020-07-29 17:56:18 +02:00
Miriam Baglioni
b48934f6df
changed the workflow name
2020-07-29 17:43:43 +02:00
Miriam Baglioni
1433db825d
refactoring
2020-07-29 17:43:24 +02:00
Miriam Baglioni
074e9ab75e
refactoring
2020-07-29 17:42:50 +02:00
Miriam Baglioni
8ad8dac7d4
merge branch with fork master
2020-07-29 17:38:28 +02:00
Miriam Baglioni
9fa82dc93b
fixed issue
2020-07-29 17:36:16 +02:00
Miriam Baglioni
8907648d6a
-
2020-07-29 17:35:47 +02:00
Miriam Baglioni
536e7f6352
added and changed resources for testing of the whole graph dump and of community related products dumps
2020-07-29 17:33:34 +02:00
Miriam Baglioni
4d7f590493
testings for the whole graph dump
2020-07-29 17:32:37 +02:00
Miriam Baglioni
a2f73ec2c7
changed due to changes in the model
2020-07-29 17:32:02 +02:00
Miriam Baglioni
481585e9d3
-
2020-07-29 17:31:41 +02:00
Miriam Baglioni
40a8dafbdc
-
2020-07-29 17:30:44 +02:00
Miriam Baglioni
de2ebb467e
changed due to changes in the model
2020-07-29 17:08:02 +02:00
Miriam Baglioni
d0ff2a56fb
-
2020-07-29 17:06:53 +02:00
Miriam Baglioni
b96dedb56b
changed due to changes in the model
2020-07-29 17:05:31 +02:00
Miriam Baglioni
6d0f08277b
classes to implement the dump of the whole graph.
2020-07-29 17:03:19 +02:00
Miriam Baglioni
8d4327b292
input parameters and workflow definition for the dump of the whole graph
2020-07-29 17:00:34 +02:00
Miriam Baglioni
b5f995ab12
refactoring
2020-07-29 16:59:48 +02:00
Miriam Baglioni
f7a87cc447
added new constants value
2020-07-29 16:58:40 +02:00
Miriam Baglioni
b71d12cf26
refactoring
2020-07-29 16:52:44 +02:00
Miriam Baglioni
a8d65b68cb
changed to delete the part to check if it was a test or a real execution
2020-07-29 16:47:57 +02:00
Miriam Baglioni
3ec2392904
Added new class to move the place the split is effectively run
2020-07-29 16:46:50 +02:00
Miriam Baglioni
178c2729a7
changed the path to reach the java class to be executed
2020-07-29 12:29:51 +02:00
Miriam Baglioni
437ac12139
removed unused parameter
2020-07-29 12:28:16 +02:00
Michele Artini
35e6e9c064
tests
2020-07-28 12:02:15 +02:00
Miriam Baglioni
6c2223d1fc
added code to get the openaire id for contexts
2020-07-24 17:30:15 +02:00
Miriam Baglioni
afd54c1684
removed not needed upload and refactoring
2020-07-24 17:28:56 +02:00
Miriam Baglioni
7b0569d989
changed to map also the result associated to the whole graph
2020-07-24 17:28:11 +02:00
Miriam Baglioni
082225ad61
-
2020-07-24 17:27:26 +02:00
Miriam Baglioni
968c59d97a
added the logic to dump also the products for the whole graph. They will miss collected from and context information that will be materialized as new relations
2020-07-24 17:25:19 +02:00
Miriam Baglioni
332258d199
split the classes related to the communities dump and to the whole graph dump
2020-07-24 17:21:48 +02:00
Claudio Atzori
56bbfdc65d
introduced parameter 'numParitions', driving the hive DB table data partitioning. Currently specified only for table 'project'
2020-07-23 08:54:10 +02:00
Sandro La Bruzzo
9ab594ccf6
fixed test
2020-07-21 10:36:21 +02:00
Claudio Atzori
ebf60020ac
map results as OPRs in case of missing //CobjCategory/@type and the vocabulary dnet:result_typologies doesn't resolve the super type
2020-07-20 19:01:10 +02:00
Miriam Baglioni
355d7e426e
added dump for project - not finished
2020-07-20 18:54:43 +02:00
Miriam Baglioni
a2f01e5259
added getter and setter
2020-07-20 18:54:17 +02:00
Miriam Baglioni
40bbe94f7c
merge with master fork
2020-07-20 18:10:03 +02:00
Miriam Baglioni
23160b4d29
realignment of the workflow classes with the changes in the structure of the module
2020-07-20 18:04:30 +02:00
Miriam Baglioni
3aab7680f6
changed the test results
2020-07-20 18:00:43 +02:00
Miriam Baglioni
5076e4f320
changed test to comply with the modifications
2020-07-20 17:55:18 +02:00
Miriam Baglioni
08dbd99455
changed to dump the whole results graph by using classes already implemented for communities. Added class to dump also organization
2020-07-20 17:54:28 +02:00
Miriam Baglioni
e47ea9349c
extended some types by adding provenance as the couple (provenance, trust) and moved some classes to be used by the complete graph dump also
2020-07-20 17:46:27 +02:00
Claudio Atzori
32f5e466e3
imports cleanup
2020-07-20 17:42:58 +02:00
Claudio Atzori
54ac583923
code formatting
2020-07-20 17:37:08 +02:00
Claudio Atzori
124e7ce19c
in case of missing attribute //dr:CobjCategory/@type the resulttype is derived by looking up the vocabulary dnet:result_typologies with the 1st instance type available
2020-07-20 17:33:37 +02:00
Claudio Atzori
050dda223d
Merge pull request 'removed duplicated fields' ( #25 ) from unique_field_in_lists into master
Looks good as a temporary workaround. I agree the model could seamlessly make the distinct operation by using HashSets instead of Linked (or Array) Lists.
The task to update the model in such a way is added on #9#issuecomment-1583
Thanks!
2020-07-20 12:12:50 +02:00
Claudio Atzori
e0c4cf6f7b
added parameter to drive the graph merge strategy: priority (BETA|PROD)
2020-07-20 10:48:01 +02:00
Claudio Atzori
94ccdb4852
Merge branch 'master' into merge_graph
2020-07-20 10:14:55 +02:00
Michele Artini
331a3cbdd0
fixed originalId
2020-07-20 09:50:29 +02:00
Sandro La Bruzzo
9116d75b3e
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-17 18:01:30 +02:00
Miriam Baglioni
47c7122773
changed priority from beta to production
2020-07-17 12:56:35 +02:00
Michele Artini
442f30930c
removed duplicated fields
2020-07-17 12:25:36 +02:00
Michele Artini
3adedd0a68
trust truncated to 3 decimals
2020-07-17 11:58:11 +02:00
Claudio Atzori
1781609508
code formatting
2020-07-16 19:06:56 +02:00
Claudio Atzori
878f2b931c
Merge branch 'master' into merge_graph
2020-07-16 16:34:24 +02:00
Miriam Baglioni
f9ad6f3255
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
2020-07-10 19:42:53 +02:00
Miriam Baglioni
c27f12d6e8
avoid to consider _SUCCESS file
2020-07-10 19:42:23 +02:00
Claudio Atzori
31071e363f
Merge branch 'provision_indexing'
2020-07-10 19:03:57 +02:00
Claudio Atzori
cc77446dc4
added dbSchema parameter to the raw_db workflow
2020-07-10 19:01:50 +02:00
Michele Artini
e1ae964bc4
stats
2020-07-10 16:12:08 +02:00
Sandro La Bruzzo
c01efed79b
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-10 14:44:57 +02:00
Sandro La Bruzzo
a7d3977481
added generation of EBI Dataset
2020-07-10 14:44:50 +02:00
Claudio Atzori
67e1d222b6
bulk cleaning when found null or empty, sets bestaccessrights evaluating the result instances
2020-07-08 17:53:35 +02:00
Claudio Atzori
610d377d57
first implementation of the BETA & PROD graphs merge procedure
2020-07-08 16:54:26 +02:00
Alessia Bardi
9a898c0e4c
Json schema generator
2020-07-08 12:52:00 +02:00
Alessia Bardi
8f83b726fa
Dump json schema compliant to json schema Draft 7
2020-07-08 12:48:46 +02:00
Miriam Baglioni
1b0b968548
fixed issue on substring
2020-07-08 12:11:51 +02:00
Miriam Baglioni
7fe00cb4fb
-
2020-07-08 10:29:37 +02:00
Miriam Baglioni
375ef07d7b
changed the description for the upload
2020-07-07 18:41:27 +02:00
Miriam Baglioni
35c8265793
added the json extension to filename
2020-07-07 18:29:49 +02:00
Miriam Baglioni
81434f8e5e
added method newInstance
2020-07-07 18:26:10 +02:00
Miriam Baglioni
817cddfc52
-
2020-07-07 18:25:12 +02:00
Miriam Baglioni
a66aa9bd83
removed useless tests
2020-07-07 18:25:00 +02:00
Miriam Baglioni
9b20a21b24
removed useless tests
2020-07-07 18:23:37 +02:00
Miriam Baglioni
8a1b42ff21
added check to verify that dump contains at least one product
2020-07-07 18:21:35 +02:00
Miriam Baglioni
d86adb82a7
-
2020-07-07 18:20:51 +02:00
Miriam Baglioni
b2782025f6
enabled the whole workflow to run. Added property to give priority to dependency in the classpath - to solve conflicts
2020-07-07 18:10:47 +02:00
Miriam Baglioni
83d2c84b77
added constraints to xquery so as to get only profiles with status manager or all
2020-07-07 18:09:48 +02:00
Miriam Baglioni
4c8d86493c
-
2020-07-07 18:09:06 +02:00
Miriam Baglioni
0208bc18f3
added new resource for testing
2020-07-07 17:47:24 +02:00
Miriam Baglioni
f5bb65c9ef
the json schema for the dump of the results
2020-07-07 17:34:40 +02:00
Miriam Baglioni
c19818a3f8
merge branch with fork master
2020-07-06 13:58:23 +02:00
Miriam Baglioni
f8bf4acd76
-
2020-07-02 16:03:11 +02:00
Miriam Baglioni
e6c79d44e6
-
2020-07-02 16:02:02 +02:00
Miriam Baglioni
d7f6f0c216
changed code to use other lib
2020-07-02 16:01:34 +02:00
Miriam Baglioni
94500a581b
merge branch with fork master
2020-07-02 14:25:39 +02:00
Claudio Atzori
ed1c7e5d75
fixed workflow for the import of the claims alone
2020-07-02 12:40:21 +02:00
Sandro La Bruzzo
1d420eedb4
added generation of EBI Dataset
2020-07-02 12:37:43 +02:00
Claudio Atzori
e4a29a4513
fixed workflow for the import of the claims alone
2020-07-02 12:36:33 +02:00
Claudio Atzori
6f5771c1c9
sets author.rank when null
2020-06-25 14:06:21 +02:00
Claudio Atzori
2d77d3a388
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-06-25 12:54:30 +02:00
Miriam Baglioni
05a99cfb61
change the position of value and description elements in the workflow definition
2020-06-25 12:36:08 +02:00
Claudio Atzori
7df2712824
Merge branch 'provision_indexing'
2020-06-25 12:22:41 +02:00
Michele Artini
abcbebcbb4
fixed generation of ids
2020-06-25 09:50:46 +02:00
Michele Artini
77d2a1b1c4
params to choose sql queries for beta or production
2020-06-25 09:28:13 +02:00
Claudio Atzori
0e723d378b
added default from vocab for missing instance.refereed; remove spurious prefixes from orcid values; WIP: prepare relation job
2020-06-24 18:34:42 +02:00
Miriam Baglioni
3e5570de7a
-
2020-06-23 15:44:54 +02:00
Michele Artini
38bb45d0b6
test osf:refereed
2020-06-23 10:14:39 +02:00
Miriam Baglioni
e4b21be004
-
2020-06-22 17:31:50 +02:00
Miriam Baglioni
afa19b0c84
changed the way to PUT the files to the rest API
2020-06-22 17:20:07 +02:00
Miriam Baglioni
df80ae5c1b
merge branch with fork master
2020-06-22 10:51:23 +02:00
Miriam Baglioni
e8f914f8b3
-
2020-06-22 10:50:41 +02:00
Miriam Baglioni
185facb8e5
replaced the deprecated DefaultHttpClient with the CloseableHttpClient
2020-06-22 10:49:10 +02:00
Claudio Atzori
7d416f08d8
graph cleaning workflow: set hostedby to unknown repository when defined as NULL
2020-06-22 09:50:43 +02:00
Miriam Baglioni
669a509430
-
2020-06-19 17:39:46 +02:00
Claudio Atzori
d0ac7514b2
cleaning workflow to include cleaning of default values
2020-06-18 19:37:25 +02:00
Miriam Baglioni
44a12d244f
-
2020-06-18 18:38:54 +02:00
Miriam Baglioni
fb80353018
-
2020-06-18 14:21:36 +02:00
Miriam Baglioni
65bf312360
merge branch with fork master
2020-06-18 11:35:27 +02:00
Miriam Baglioni
a118b66858
-
2020-06-18 11:34:30 +02:00
Miriam Baglioni
f9578312b5
-
2020-06-18 11:34:15 +02:00
Miriam Baglioni
8b145e6aba
-
2020-06-18 11:25:28 +02:00
Miriam Baglioni
e8b3e972f2
changed the input params and the workflow definition to tackle the Result as all result product produced
2020-06-18 11:25:05 +02:00
Miriam Baglioni
3233b01089
changes due to adding all the result type under Result
2020-06-18 11:22:58 +02:00
Miriam Baglioni
5c8533d1a1
changed in the testing classes
2020-06-18 11:20:08 +02:00
Miriam Baglioni
bc8611a95a
added new resources for testing
2020-06-18 11:19:20 +02:00
Sandro La Bruzzo
9bf67f5de1
resolved conflicts
2020-06-17 09:15:43 +02:00
Sandro La Bruzzo
1d4275acc4
implemented first version of exportation of Scholexplorer into ActionSet
2020-06-17 09:10:38 +02:00
Claudio Atzori
1bc1d15eaf
stubbing for mock datasource.identities must be typed as array
2020-06-16 16:54:28 +02:00
Claudio Atzori
5441f01586
Merge pull request 'missing landingPage urls in instances' ( #22 ) from instances-with-landing-page into master
Looks good, thanks!
2020-06-16 15:32:44 +02:00
Claudio Atzori
89859111ee
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-06-16 15:28:29 +02:00
Claudio Atzori
4ec262db53
included externalreference(s) in the result view on the Hive graph DB
2020-06-16 15:28:20 +02:00
Michele Artini
8a4f84f8c0
refactoring
2020-06-16 12:34:13 +02:00
Claudio Atzori
2a4f65795f
WIP: graph cleaner implementation
2020-06-15 18:32:24 +02:00
Claudio Atzori
c15c8c0ad0
map datasource identities (including piwik ids) as original IDs
2020-06-15 16:07:30 +02:00
Miriam Baglioni
9dd3ef22c5
merge branch with fork master
2020-06-15 11:23:26 +02:00
Miriam Baglioni
68cf0fd03f
test input
2020-06-15 11:14:42 +02:00
Miriam Baglioni
0467145ae3
test for graph dump
2020-06-15 11:13:51 +02:00
Miriam Baglioni
e43eedb5b0
added resources and workflow for dump of community products
2020-06-15 11:13:21 +02:00
Miriam Baglioni
f96ca900e1
fixed issues while running on cluster
2020-06-15 11:12:14 +02:00
Miriam Baglioni
20b9e67728
added new class funder
2020-06-15 11:06:18 +02:00
Claudio Atzori
0d52816244
WIP: graph cleaner implementation
2020-06-13 13:06:04 +02:00
Claudio Atzori
bed65a1be6
WIP: graph cleaner implementation
2020-06-12 18:25:47 +02:00
Claudio Atzori
463489f59f
code formatting
2020-06-12 12:03:25 +02:00
Claudio Atzori
4bcad1c9c3
Merge branch 'graph_cleaning'
2020-06-12 11:40:25 +02:00
Claudio Atzori
cdb1956fe9
WIP: graph cleaner implementation
2020-06-12 11:36:59 +02:00
Alessia Bardi
b347499745
do not use deprecated subreltype
2020-06-12 10:58:02 +02:00
Claudio Atzori
97b1c4057c
WIP: graph cleaner implementation
2020-06-12 10:45:18 +02:00
Claudio Atzori
ba8a024af9
avoid NPEs merging titles
2020-06-12 10:45:11 +02:00
Miriam Baglioni
e145972962
-
2020-06-11 13:08:39 +02:00
Miriam Baglioni
a01800224c
-
2020-06-11 13:02:04 +02:00
Miriam Baglioni
356dd582a3
map construction moved in class
2020-06-11 12:59:22 +02:00
Michele Artini
a41e0cb648
missing landingPage urls in instances
2020-06-11 12:28:34 +02:00
Michele Artini
99f88e1cb8
fixed generation entities from claims
2020-06-11 10:51:57 +02:00
Miriam Baglioni
db27663750
-
2020-06-11 10:49:01 +02:00
Miriam Baglioni
bb9f21d0e7
job test for class producing first step of results dump
2020-06-11 10:20:05 +02:00
Claudio Atzori
d1d92c4d8c
fixed integration of claims in the graph
2020-06-11 10:12:00 +02:00
Claudio Atzori
953da4a427
Merge branch 'master' into graph_cleaning
2020-06-10 21:36:56 +02:00
Claudio Atzori
f1bce64391
WIP: graph cleaner implementation
2020-06-10 21:36:31 +02:00
Michele Artini
c08e66e01e
fixed a workflow parameter
2020-06-10 10:11:56 +02:00
Michele Artini
7177a32d75
import of invisible stores
2020-06-10 10:04:00 +02:00
Claudio Atzori
a2fdf85ba1
WIP: graph cleaner implementation
2020-06-09 19:52:53 +02:00
Claudio Atzori
d9f33582c5
WIP: graph cleaner implementation
2020-06-09 17:20:40 +02:00
Miriam Baglioni
a089db18f1
workflow and parameters to execute the dump
2020-06-09 15:39:38 +02:00