Claudio Atzori
2d76497488
cleanup
2020-11-05 17:10:24 +01:00
Miriam Baglioni
f8e9bda24c
merge branch with master
2020-11-05 16:31:18 +01:00
Miriam Baglioni
be5ed8f554
added check to avoid sending empty metadata.
2020-11-05 16:10:17 +01:00
Claudio Atzori
2148a51fae
minor changes
2020-11-05 11:24:12 +01:00
Claudio Atzori
4625b7486e
code formatting
2020-11-04 18:12:43 +01:00
Claudio Atzori
f5f346dd2b
Merge pull request 'dump' ( #50 ) from miriam.baglioni/dnet-hadoop:dump into master
...
LGTM
2020-11-04 18:07:01 +01:00
Miriam Baglioni
e9ac471ae9
removed dependency from classes for the pid graph dump
2020-11-04 18:04:42 +01:00
Miriam Baglioni
b90a945c49
removed property files for pid graph dump
2020-11-04 17:28:33 +01:00
Miriam Baglioni
bac307155a
removed properties specific for pid graph dump
2020-11-04 17:28:04 +01:00
Miriam Baglioni
9c9d50f486
removed code specific for pid graph dump
2020-11-04 17:26:22 +01:00
Miriam Baglioni
5669890934
removed commented lines
2020-11-04 17:15:21 +01:00
Miriam Baglioni
6a89f59be9
removed commented lines
2020-11-04 17:13:59 +01:00
Miriam Baglioni
56150d7e5e
removed all code related to the dump of pids graph
2020-11-04 17:13:12 +01:00
Miriam Baglioni
16c54a96f8
removed pid dump
2020-11-04 17:11:32 +01:00
Claudio Atzori
e5da4ee9b1
dedup workflow using the common PidComparator
2020-11-04 15:02:02 +01:00
Miriam Baglioni
0cac5436ff
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
2020-11-04 13:21:11 +01:00
Alessia Bardi
51808b5afd
Updated descriptions
2020-11-04 12:29:48 +01:00
Alessia Bardi
e6becf8659
Updated descriptions
2020-11-04 12:17:57 +01:00
Alessia Bardi
0abe0eee33
Updated descriptions
2020-11-04 12:15:30 +01:00
Alessia Bardi
f6ab238f5d
Updated descriptions
2020-11-04 11:50:47 +01:00
Sandro La Bruzzo
3581244daf
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-11-04 09:04:22 +01:00
Sandro La Bruzzo
66efb39634
implemented merge scholix
2020-11-04 09:04:01 +01:00
Miriam Baglioni
c010a8442f
fixed issue on test code
2020-11-03 17:26:51 +01:00
Miriam Baglioni
8ec7a61188
merge branch with master
2020-11-03 16:59:08 +01:00
Miriam Baglioni
c209284ca7
new schemas for the entities in the dump with added descriptions
2020-11-03 16:58:08 +01:00
Miriam Baglioni
08806deddf
added the splitSize non mandatory parameter. Default size 10G
2020-11-03 16:57:34 +01:00
Miriam Baglioni
7d2eda43ca
added new non-mandatory property publish to determine whether to publish the upload or leave it pending. Default value false
2020-11-03 16:57:01 +01:00
Miriam Baglioni
cbbb1bdc54
moved business logic to new class in common for handling the zip of the archives
2020-11-03 16:55:50 +01:00
Miriam Baglioni
d4382b54df
moved the tar archive with max size to the common module
2020-11-03 16:54:50 +01:00
Claudio Atzori
86d6fbe95b
refactoring: CleaningFunctions and OafMapperUtils moved to dhp-common
2020-11-03 12:19:46 +01:00
Claudio Atzori
8471888ad3
Merge branch 'graph_cleaning' into stable_ids
2020-11-03 11:52:47 +01:00
Claudio Atzori
5310e56dba
remove empty PIDs
2020-11-03 11:52:10 +01:00
Claudio Atzori
3fcd669e99
result merge operation leverages a custom ResultTypeComparator in the aggregator graph construction
2020-11-03 10:53:23 +01:00
Claudio Atzori
8e7f81c5f5
code formatting
2020-11-02 14:25:00 +01:00
Claudio Atzori
09e44dabff
Merge branch 'master' into stable_ids
2020-11-02 12:16:01 +01:00
Sandro La Bruzzo
754c86f33e
fixed test to work on jenkins
2020-11-02 09:35:01 +01:00
Sandro La Bruzzo
39337d8a8a
fixed test
2020-11-02 09:26:25 +01:00
Dimitris
32bf943979
Changes to download only updates
2020-11-02 09:08:25 +02:00
Miriam Baglioni
dabb33e018
changed the discriminant used to split the file
2020-10-30 17:52:22 +01:00
Claudio Atzori
c5dda3a00c
Merge pull request 'h2020classification' ( #49 ) from miriam.baglioni/dnet-hadoop:h2020classification into master
...
LGTM
2020-10-30 17:10:05 +01:00
Miriam Baglioni
4905739be6
changed resource file to mirror change in business logic
2020-10-30 17:02:57 +01:00
Miriam Baglioni
b40360ebfb
changed the code to mirror the changed decision in the classification level and programme description labels
2020-10-30 17:02:30 +01:00
Miriam Baglioni
696409fb9f
disabled tests because they need a remote resource
2020-10-30 17:01:48 +01:00
Miriam Baglioni
0fba08eae4
max allowed size per file 10 Gb
2020-10-30 16:05:55 +01:00
Claudio Atzori
385214eeae
code formatting
2020-10-30 15:47:05 +01:00
Claudio Atzori
04ad8969b2
anticipated execution of the graph cleaning workflow
2020-10-30 15:46:55 +01:00
Claudio Atzori
4ca75d6951
Merge pull request 'Dedup ID creation policy' ( #48 ) from deduptesting into stable_ids
2020-10-30 15:15:32 +01:00
Miriam Baglioni
b828587252
prevent the code from cycling indefinitely
2020-10-30 15:01:25 +01:00
Miriam Baglioni
f747e303ac
classes for dumping of the graph as ttl file
2020-10-30 14:13:45 +01:00
Miriam Baglioni
16baf5b69e
formatting
2020-10-30 14:13:14 +01:00
Miriam Baglioni
a9eef9c852
added check for possible Optional value in relation dataInfo
2020-10-30 14:12:28 +01:00
Miriam Baglioni
5f4de9a962
formatting
2020-10-30 14:11:40 +01:00
Miriam Baglioni
14bf2e7238
added option to split dumps bigger than 40Gb into different files
2020-10-30 14:09:04 +01:00
Dimitris
b8a3392b59
Commit 30102020
2020-10-30 14:07:21 +02:00
Claudio Atzori
58f28296ea
ProvisionConstants moved as ModelHardLimits in dhp-common and applied to truncate long abstracts (len > 150000). Further filtering for empty PID values
2020-10-30 10:56:42 +01:00
Miriam Baglioni
78fdb11c3f
merge branch with master
2020-10-29 12:55:22 +01:00
Sandro La Bruzzo
1d9fdb7367
fixed spark memory issue in SparkSplitOafTODLIEntities
2020-10-28 12:30:32 +01:00
Miriam Baglioni
d2374e3b9e
added code to handle cases where the funding tree is not existing
2020-10-27 16:15:21 +01:00
Miriam Baglioni
5d3012eeb4
changed code to dump only the programme list and not the classification list
2020-10-27 16:14:18 +01:00
Miriam Baglioni
3241ec1777
added connection timeout and socket timeout 600 sec
2020-10-27 16:12:11 +01:00
Enrico Ottonello
9818e74a70
added dependency version in main pom.xml for orcid no doi
2020-10-22 16:38:00 +02:00
Enrico Ottonello
210a50e4f4
replaced null value
2020-10-22 16:24:42 +02:00
Enrico Ottonello
b0290dbcb7
moved all dependencies version to main pom.xml
2020-10-22 16:20:46 +02:00
Enrico Ottonello
a38ab57062
let test methods run
2020-10-22 15:43:50 +02:00
Enrico Ottonello
1139d6568d
replaced null value with a safer empty string as return value
2020-10-22 15:32:26 +02:00
Enrico Ottonello
c58db1c8ea
added filter on null value after map function
2020-10-22 15:11:02 +02:00
Enrico Ottonello
846ba30873
if typologies mapping fails, an exception will be propagated
2020-10-22 14:36:18 +02:00
Enrico Ottonello
c3114ba0ae
replaced null as return value with a safer empty string
2020-10-22 14:21:31 +02:00
Enrico Ottonello
c295c71ca0
added comment
2020-10-22 14:07:26 +02:00
Enrico Ottonello
ab083f9946
propagate exception on parsing work (PR request)
2020-10-22 14:02:32 +02:00
sandro
3a81a940b7
solved bug on merge publication
2020-10-21 22:41:55 +02:00
Miriam Baglioni
a2ce527fae
changed to match the requirements for short titles in level and long titles in classification
2020-10-20 17:03:25 +02:00
Sandro La Bruzzo
346ed65e2c
added upload to zenodo node
2020-10-20 16:59:55 +02:00
sandro
271b4db450
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-10-20 16:09:49 +02:00
sandro
d58d02d448
added workflow upload on zenodo
2020-10-20 16:09:07 +02:00
miconis
c4a59d1b9a
merge with the master to port the new packages
2020-10-20 16:07:30 +02:00
miconis
708d887e64
minor changes
2020-10-20 15:12:19 +02:00
miconis
0e54803177
bug fix in the id generator and implementation of jobs for organization dedup
2020-10-20 12:19:46 +02:00
Alessia Bardi
1425d810a8
testing mapping
2020-10-19 17:46:14 +02:00
Claudio Atzori
266bf1a221
common IdentifierFactory in use on the mapping from the aggregator data; merge the entities sharing the same id; code formatting
2020-10-16 17:02:10 +02:00
Claudio Atzori
34f1d0904b
common IdentifierFactory in use on the mapping from the aggregator data
2020-10-16 16:00:19 +02:00
Sandro La Bruzzo
fed711da80
Merge remote-tracking branch 'origin/master' into merge_record_to_common
2020-10-13 15:32:45 +02:00
Sandro La Bruzzo
34bf64c94f
fixed export Scholexplorer to OpenAire
2020-10-13 08:47:58 +02:00
Alessia Bardi
8775a64bc1
Merge pull request 'Merging different compatibility levels (pinocchio operator)' ( #47 ) from merge_graph into master
2020-10-09 14:44:52 +02:00
Claudio Atzori
e751c1402f
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-10-09 13:53:21 +02:00
Claudio Atzori
b961dc7d1e
added originalid to the fields in the result graph view
2020-10-09 13:53:15 +02:00
miconis
6f8720982c
bug fix in the idgenerator and test implementation
2020-10-09 09:30:23 +02:00
Sandro La Bruzzo
734934e2eb
fixed error on empty intersection with publication and relation on export to OAF
2020-10-08 17:29:29 +02:00
Sandro La Bruzzo
eec418cd26
moved AuthoreMerger into dhp-common
2020-10-08 10:33:55 +02:00
Sandro La Bruzzo
fe0a7870e6
Added test to check if merge authors works
2020-10-08 10:33:12 +02:00
Sandro La Bruzzo
cd9c377d18
adapted scholexplorer Dump generation to the new Dataset definition
2020-10-08 10:10:13 +02:00
Claudio Atzori
a3f37a9414
javadoc
2020-10-07 16:44:22 +02:00
Claudio Atzori
8d85a2fced
[BETA wf only] datasources involved in the merge operation don't obey the infra precedence policy, but rely on a custom behaviour that, given two datasources from beta and prod, returns the one from prod with the highest compatibility among the two
2020-10-07 16:28:52 +02:00
Claudio Atzori
5f7b75f5c5
code formatting
2020-10-07 13:22:54 +02:00
miconis
1804c5d809
refactoring: classes moved in the right package
2020-10-06 16:44:51 +02:00
miconis
7093355487
bug fix and minor changes
2020-10-06 16:21:34 +02:00
miconis
5a8bc329c5
bug fix in the result merge: it takes the correct bestaccessright based on the license instead of the trust
2020-10-06 15:26:44 +02:00
miconis
a2ac7e52fb
implementation of the workflow for new organizations in openorgs
2020-10-06 13:58:09 +02:00
Miriam Baglioni
061527f06e
adding short description
2020-10-05 13:54:39 +02:00
Miriam Baglioni
0c12d7bdd8
adding short description
2020-10-05 11:39:55 +02:00
Miriam Baglioni
ae08b3c0dd
merge branch with master
2020-10-05 11:35:55 +02:00
Miriam Baglioni
11b7eaae09
changed the name of the folder where to store the context entity from context to communities_infrastructures
2020-10-05 11:24:54 +02:00
Miriam Baglioni
32bffb0134
changed the name from communities_infrastructures to communities_infrastuctures.json
2020-10-05 11:24:17 +02:00
Claudio Atzori
23f64d9eb4
updated dedup tests following the dnet-pace-core library update
2020-10-02 14:30:53 +02:00
Miriam Baglioni
fc2f7636be
removed not used code
2020-10-02 12:33:52 +02:00
Miriam Baglioni
25cbcf6114
changed to solve issues about names. context renamed communities_infrastructure.json and removed the double json.gz extension from the name of the part in the tar
2020-10-02 12:17:46 +02:00
Claudio Atzori
9db0f88fb8
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-10-02 09:43:35 +02:00
Claudio Atzori
49ae3450a9
code formatting
2020-10-02 09:43:24 +02:00
Claudio Atzori
c2a6e2a9bf
fixed mapping for datasource journal info (ISSNs)
2020-10-02 09:37:08 +02:00
Miriam Baglioni
01117a46e1
whole workflow activated
2020-10-01 17:19:21 +02:00
Miriam Baglioni
cfb5766c6b
removed double json.gz from names of files in the tar
2020-10-01 17:18:34 +02:00
Miriam Baglioni
fcaedac980
merge branch with master
2020-10-01 16:46:59 +02:00
Miriam Baglioni
c6e6ed1bd8
merge branch with master
2020-10-01 16:24:41 +02:00
Miriam Baglioni
4aec347351
refactoring
2020-10-01 16:23:52 +02:00
Miriam Baglioni
61946b4092
refactoring
2020-10-01 16:22:48 +02:00
Miriam Baglioni
7e6d35e56c
added the link to the excel file related to topic
2020-10-01 15:53:31 +02:00
Sandro La Bruzzo
1a0a44e85a
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-10-01 15:46:53 +02:00
Sandro La Bruzzo
c4a3c52e45
fixed Doiboost bug in the identifier
2020-10-01 15:46:44 +02:00
Miriam Baglioni
43cbd62c2b
added classpath.first in the configuration
2020-10-01 15:46:34 +02:00
Miriam Baglioni
cd69c6b023
added dependency for the topic file path
2020-10-01 15:45:59 +02:00
Miriam Baglioni
771cde3d05
moved the library version to global pom
2020-10-01 15:43:47 +02:00
Miriam Baglioni
632351c0da
modified test resources to mirror the changes in the code
2020-10-01 15:43:02 +02:00
Miriam Baglioni
ebc1c5513f
modified test resources to mirror the changes in the code
2020-10-01 15:42:29 +02:00
Miriam Baglioni
3a374c34b6
fixed null pointer exception
2020-10-01 15:41:01 +02:00
Miriam Baglioni
83ea746163
added check to the test
2020-10-01 15:40:28 +02:00
Claudio Atzori
2e9e13444d
author pids made unique by value
2020-10-01 12:50:40 +02:00
Miriam Baglioni
6e5db85b32
-
2020-10-01 11:51:11 +02:00
Miriam Baglioni
a46179f61c
refactoring
2020-10-01 11:22:01 +02:00
Miriam Baglioni
b90bee124b
removing rows that are empty from those imported
2020-10-01 11:16:49 +02:00
Miriam Baglioni
c107f193c9
refactoring
2020-10-01 11:16:22 +02:00
Claudio Atzori
e265c3e125
cleaning functions factored out in a dedicated class
2020-10-01 10:50:15 +02:00
Miriam Baglioni
706a80a29a
added test to check that separator '-' (not hyphen) will be recognized
2020-10-01 10:38:31 +02:00
Miriam Baglioni
3dca586b3b
refactoring
2020-10-01 10:34:48 +02:00
Miriam Baglioni
416bda6066
changed the programme.description by using the same value used in the classification instead of the short title or the title
2020-10-01 10:31:33 +02:00
Miriam Baglioni
f6587c91f3
added comparison to a char that looks like - but is not
2020-10-01 10:30:26 +02:00
Claudio Atzori
4287164aba
include relevantdate field in the result view
2020-10-01 10:28:55 +02:00
miconis
e3f7798d1b
minor changes in dedup tests, bug fix in the idgenerator and pace-core version update
2020-09-29 15:31:46 +02:00
Miriam Baglioni
7e73bb88b3
changed the logic to add the topic description to the project
2020-09-28 17:21:43 +02:00
Miriam Baglioni
0a035e3630
-
2020-09-28 17:20:49 +02:00
Miriam Baglioni
16bee2084d
added the topic code to the project subset
2020-09-28 17:20:11 +02:00
Miriam Baglioni
0bf2d0db52
added to the workflow the download of the topic excel file and one property needed to get the input path of the topic file in the hdfs filesystem
2020-09-28 12:17:22 +02:00
Miriam Baglioni
c2abde4d9f
changed the implementation of Atomic Actions creation by exploiting the topic information obtained from the cordis excel file
2020-09-28 12:16:34 +02:00
Miriam Baglioni
d930b8d3fc
changed the query to get only the code of the project and not the optional1 (topic code) and optional2 (topic description)
2020-09-28 12:15:48 +02:00
Miriam Baglioni
f8f5cfd5cc
removed the part added to set the topic code and description in the step of project preparation
2020-09-28 12:13:33 +02:00
Miriam Baglioni
9e19c9a221
remove the topic description from the values in the CSVProject class
2020-09-28 12:11:03 +02:00
Miriam Baglioni
6d8b932e40
refactoring
2020-09-28 12:06:56 +02:00
Miriam Baglioni
b77f166549
changed the package name from csvutils to utils
2020-09-28 12:05:47 +02:00
Miriam Baglioni
e33e3277de
added needed dependency to read the excel file
2020-09-28 12:03:14 +02:00
Miriam Baglioni
f4739a371a
code to get the information related to the topic association between code and description.
2020-09-28 12:02:48 +02:00
Miriam Baglioni
7b6a7333e6
merge branch with master
2020-09-25 16:42:07 +02:00
Miriam Baglioni
983a12ed15
temporary modification to allow the upload of files in the sandbox without the need to recreate the mapping from scratch
2020-09-25 16:41:51 +02:00
Miriam Baglioni
8b36d19182
added property depositionId and changed property newVersion from boolean to string to handle the three possible distinct values
2020-09-25 16:41:15 +02:00
Miriam Baglioni
ed5239f9ec
added new code to handle the new possibility to upload files to an already open deposition
2020-09-25 16:34:32 +02:00
Miriam Baglioni
3a8c524fce
refactor
2020-09-25 16:34:02 +02:00
Miriam Baglioni
2ac2b537b6
merge branch with master
2020-09-25 14:40:47 +02:00
Miriam Baglioni
54800fb9b0
enabled only the step to upload in zenodo
2020-09-25 14:40:22 +02:00
Miriam Baglioni
12c2dfc268
modified the resource to consider the information added to the model
2020-09-25 14:17:23 +02:00
Miriam Baglioni
969fa8d96e
fixed issue and changed the transformation of the programme file to consider the new model
2020-09-25 13:32:34 +02:00
miconis
4cf79f32eb
implementation of the oozie wf to prepare the openorgs input: relations between organizations
2020-09-25 11:29:51 +02:00
Michele Artini
c171fdebe1
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-09-25 09:03:09 +02:00
Michele Artini
c96598aaa4
opendoar partition
2020-09-25 09:02:58 +02:00
Miriam Baglioni
de6c4d46d8
fixed conflicts
2020-09-24 15:35:01 +02:00
Miriam Baglioni
e917281822
-
2020-09-24 15:24:05 +02:00
Miriam Baglioni
9f54f69e6d
added topic information
2020-09-24 15:23:35 +02:00
Miriam Baglioni
d6206d6e63
add the topic description to the action set associated to the project
2020-09-24 15:22:40 +02:00
Miriam Baglioni
6b50226f3b
added topic code and topic description
2020-09-24 15:21:49 +02:00
Miriam Baglioni
15af1f527e
modified to consider the topic information
2020-09-24 15:20:56 +02:00
Miriam Baglioni
609ff17cfc
now the commission gives us the framework programme (FP7 - H2020), so use this information to filter out programmes not associated to H2020
2020-09-24 15:19:31 +02:00
Miriam Baglioni
b66f930466
Added optional1 and optional2 information to the files read from the db. Optional1 contains the topic code and optional2 contains the topic description
2020-09-24 15:16:56 +02:00
Miriam Baglioni
860e6d38a6
added topic description to the CSV project variables
2020-09-24 15:15:26 +02:00
Claudio Atzori
044d3a0214
fixed query used to load datasources in the Graph
2020-09-24 13:48:58 +02:00
Claudio Atzori
27df1cea6d
code formatting
2020-09-24 12:16:00 +02:00
Claudio Atzori
fb22f4d70b
included values for projects fundedamount and totalcost fields in the mapping tests. Swapped expected and actual values in junit test assertions
2020-09-24 12:10:59 +02:00
Claudio Atzori
42f55395c8
fixed order of the ISSNs returned by the SQL query
2020-09-24 12:09:58 +02:00
Claudio Atzori
fadf5c7c69
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-09-24 10:42:52 +02:00
Claudio Atzori
9a7e72d528
using concat_ws to join textual columns from PSQL. When using || to perform the concatenation, a Null column makes the result of the operation Null
2020-09-24 10:42:47 +02:00
Claudio Atzori
9e3e93c6b6
setting the correct issn type in the datasource.journal element
2020-09-24 10:39:16 +02:00
Miriam Baglioni
0d83f47166
merge branch with master
2020-09-23 17:33:49 +02:00
Miriam Baglioni
39eb8ab25b
changed the dump to move from h2020programme to h2020classification
2020-09-23 17:33:00 +02:00
Miriam Baglioni
1d84cf19a6
added new line to resource file
2020-09-23 17:32:22 +02:00
Miriam Baglioni
f0c476b6c9
modification to the test classes to consider h2020classification
2020-09-23 17:31:49 +02:00
Miriam Baglioni
2cba3cb484
modification to the classes building the actionset to consider the h2020classification
2020-09-23 17:31:15 +02:00
Miriam Baglioni
1069cf243a
modification to the schema to consider the H2020classification of the programme. The field Programme has been moved inside the H2020classification that is now associated to the Project. Programme is no longer associated directly to the Project but via H2020CLassification
2020-09-22 14:38:00 +02:00
Enrico Ottonello
a97ad20c7b
exception is now propagated (PR review)
2020-09-22 10:46:34 +02:00
Enrico Ottonello
fefbcfb106
dependency version moved to main pom (PR review)
2020-09-22 10:20:25 +02:00
miconis
259362ef47
implementation of the job to collect simrels from postgres db
2020-09-22 09:43:27 +02:00
Michele Artini
9e681609fd
stats to sql file
2020-09-17 15:51:22 +02:00
Michele Artini
51321c2701
partition of events by opendoarId
2020-09-17 11:38:07 +02:00
Claudio Atzori
cf2ce1a09b
code formatting
2020-09-15 15:58:03 +02:00
Enrico Ottonello
9e8e7fe6ef
add comments
2020-09-15 11:32:49 +02:00
Miriam Baglioni
c2b5c780ff
-
2020-09-14 14:34:03 +02:00
Miriam Baglioni
e2ceefe9be
-
2020-09-14 14:33:28 +02:00
Miriam Baglioni
1f893e63dc
-
2020-09-14 14:33:10 +02:00
Enrico Ottonello
538f299767
merged
2020-09-14 12:35:16 +02:00
Enrico Ottonello
eb8c9b2348
Merge remote-tracking branch 'upstream/master' into orcid-no-doi
2020-09-14 12:00:56 +02:00
Michele Artini
9b0c12f5d3
send notifications
2020-09-11 12:06:16 +02:00
Michele Artini
028613b751
remove old notifications
2020-09-09 15:32:06 +02:00
Michele Artini
9cfc124ac5
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-09-08 16:39:54 +02:00
Michele Artini
a597a218ab
* forall topics
2020-09-08 16:39:40 +02:00
Claudio Atzori
8a523474b7
code formatting
2020-09-07 11:40:16 +02:00
Michele Artini
bb459caf69
support for all topic subscriptions
2020-08-27 11:01:21 +02:00
Michele Artini
82ed8edafd
notification indexing
2020-08-26 15:10:48 +02:00
Miriam Baglioni
b72a7dad46
resource for pid graph dump
2020-08-24 17:09:01 +02:00
Miriam Baglioni
8694bb9b31
refactoring due to compilation
2020-08-24 17:07:34 +02:00
Miriam Baglioni
8a069a4fea
-
2020-08-24 17:01:30 +02:00
Miriam Baglioni
34fa96f3b1
-
2020-08-24 17:00:20 +02:00
Miriam Baglioni
5fb2949cb8
added utils methods
2020-08-24 17:00:09 +02:00
Miriam Baglioni
2a540b6c01
added constants for the pid graph dump
2020-08-24 16:55:35 +02:00
Miriam Baglioni
da103c399a
resources for the pid graph dump test
2020-08-24 16:52:07 +02:00
Miriam Baglioni
630a6a1fe7
first tests for the pid graph dump
2020-08-24 16:51:26 +02:00
Miriam Baglioni
40c8d2de7b
test resources for the dump of the pids graph
2020-08-24 16:50:39 +02:00
Miriam Baglioni
bef79d3bdf
first attempt to the dump of pids graph
2020-08-24 16:49:38 +02:00
Michele Artini
da470422d3
deleting events
2020-08-21 14:52:48 +02:00
Michele Artini
6e60bf026a
indexing only a subset of events
2020-08-19 12:39:22 +02:00
Miriam Baglioni
85203c16e3
merge branch with master
2020-08-19 11:49:03 +02:00
Miriam Baglioni
2c783793ba
removed the affiliation from the author to mirror the changes in the model
2020-08-19 11:48:12 +02:00
Miriam Baglioni
f6bf888016
removed affiliation from author to mirror the changes in the model
2020-08-19 11:41:41 +02:00
Miriam Baglioni
66d0e0d3f2
-
2020-08-19 11:31:50 +02:00
Miriam Baglioni
1c593a9cfe
-
2020-08-19 11:29:51 +02:00
Miriam Baglioni
e42b2f5ae2
-
2020-08-19 11:29:09 +02:00
Miriam Baglioni
f81ee22418
changed to mirror the changes in the model (Instance, CommunityInstance, GraphResult)
2020-08-19 11:28:26 +02:00
Miriam Baglioni
387be43fd4
changed to discriminate whether to dump all the result types together or each one in its own archive
2020-08-19 11:25:27 +02:00
Miriam Baglioni
c5858afb88
added parameter to guide the dump for the result (resultAggregation). true if all the result types should be dumped together, false otherwise.
2020-08-19 11:24:14 +02:00
Miriam Baglioni
d407852ac2
changed to reflect the changes in the model
2020-08-19 11:15:05 +02:00
Miriam Baglioni
47c21a8961
refactoring due to compilation
2020-08-19 11:11:57 +02:00
Miriam Baglioni
5570678c65
changed parameter name from hdfsNameNode to nameNode
2020-08-19 10:59:26 +02:00
Miriam Baglioni
dc5096a327
refactoring due to compilation
2020-08-19 10:57:36 +02:00
Miriam Baglioni
55e24c2547
relclass for relation and corresponding values have been put to lower case (isSupplementedBy written as IsSupplementedBy - orcid propagation)
2020-08-18 16:42:08 +02:00
Miriam Baglioni
f44dd5d886
changed in mapping the result semantic name as it will be visible in the relclass Relation: from IsSupplementedBy to isSupplementedBy
2020-08-17 17:15:09 +02:00
Miriam Baglioni
bc6b5d5b34
removed leftover parameter
2020-08-15 11:22:35 +02:00
Miriam Baglioni
200cd5c730
removed leftover parameter
2020-08-15 11:22:19 +02:00
Miriam Baglioni
96600ed04a
modified test resource for mirroring the deletion of affiliation from author parameters
2020-08-14 20:41:49 +02:00
Miriam Baglioni
09f5b92763
added specific reference to class
2020-08-14 20:00:09 +02:00
Miriam Baglioni
37e7c43652
changed parameter name from hdfsNameNode to nameNode
2020-08-14 18:18:25 +02:00
Claudio Atzori
5b994d7ccf
Merge branch 'dump' of https://code-repo.d4science.org/miriam.baglioni/dnet-hadoop into resolve_conflicts_pr40_dump
2020-08-14 15:32:29 +02:00
Miriam Baglioni
de995970ea
try again to solve clash with master
2020-08-14 15:24:36 +02:00
Miriam Baglioni
5040d72d5e
changed to make it equal to master branch
2020-08-14 15:20:17 +02:00
Miriam Baglioni
be8106c339
added space to avoid conflicts with master branch
2020-08-14 15:16:27 +02:00
Claudio Atzori
1871d1c6f6
solve error java.lang.NoSuchFieldError: INSTANCE when instantiating Solr client
2020-08-14 11:18:30 +02:00
Miriam Baglioni
d2a8a4961a
refactoring
2020-08-13 18:50:33 +02:00
Miriam Baglioni
a5043de5da
added method to get the mapped instance
2020-08-13 18:45:50 +02:00
Miriam Baglioni
b7e49aee8d
removed commented code
2020-08-13 18:44:07 +02:00
Miriam Baglioni
f439a6231e
added missing constraint in XQuery (verify the status of the RC/RI different from hidden)
2020-08-13 15:30:55 +02:00
Miriam Baglioni
0fe800b1c9
modified because of #40 #issuecomment-1902
2020-08-13 15:17:12 +02:00
Miriam Baglioni
270c89489c
fixed issue created while renaming subject to subjects in community configuration xml
2020-08-13 15:16:04 +02:00
Miriam Baglioni
fcd10f452c
changed because of #40 (comment)
2020-08-13 12:55:32 +02:00
Miriam Baglioni
fd48ae3b85
changed because of #40 (comment)
2020-08-13 12:19:15 +02:00
Miriam Baglioni
04a3e1ab38
disabled tests
2020-08-13 12:18:13 +02:00
Miriam Baglioni
2ede397933
Apply change because of #40 (comment)
2020-08-13 12:16:39 +02:00
Miriam Baglioni
bfd1fcde6d
removed not useful method and changed because of #40 (comment) and #40 (comment)
2020-08-13 12:14:37 +02:00
Miriam Baglioni
7fd8397123
apply changes in #40 (comment)
2020-08-13 12:13:15 +02:00
Miriam Baglioni
753d448cc9
apply changes in #40 (comment)
2020-08-13 12:12:58 +02:00
Miriam Baglioni
c0e071fa26
apply changes in #40 (comment)
2020-08-13 12:12:40 +02:00
Miriam Baglioni
526db915bc
apply changes in #40 (comment)
2020-08-13 12:12:16 +02:00
Miriam Baglioni
b0fab0d138
apply changes in #40 (comment)
2020-08-13 12:11:57 +02:00
Miriam Baglioni
1b6320b251
apply changes in #40 (comment)
2020-08-13 12:11:41 +02:00
Miriam Baglioni
743d31be22
apply changes in #40 (comment)
2020-08-13 12:11:22 +02:00
Miriam Baglioni
65b48df652
apply changes in #40 (comment)
2020-08-13 12:11:06 +02:00
Miriam Baglioni
90b54d3efb
apply changes in #40 (comment)
2020-08-13 12:08:24 +02:00
Miriam Baglioni
69bbb9592a
apply changes in #40 (comment)
2020-08-13 12:07:39 +02:00
Miriam Baglioni
945323299a
apply changes in #40 (comment)
2020-08-13 12:07:24 +02:00
Miriam Baglioni
e04c993247
apply changes in #40 (comment)
2020-08-13 12:07:07 +02:00
Miriam Baglioni
ed0812d0ce
apply changes in #40 (comment)
2020-08-13 12:06:49 +02:00
Miriam Baglioni
d55cfe0ea5
apply changes in #40 (comment)
2020-08-13 12:06:20 +02:00
Miriam Baglioni
80866bec7d
apply changes in #40 (comment)
2020-08-13 12:06:05 +02:00
Miriam Baglioni
1400978c0a
apply changes in #40 (comment)
2020-08-13 12:05:44 +02:00
Miriam Baglioni
7b941a2e0a
apply changes in #40 (comment)
2020-08-13 12:05:17 +02:00
Miriam Baglioni
f7474f50fe
apply changes in #40 (comment)
2020-08-13 12:04:52 +02:00
Miriam Baglioni
367203f412
apply changes in #40 (comment)
2020-08-13 12:04:33 +02:00
Miriam Baglioni
3ab4809d31
apply changes in #40 (comment)
2020-08-13 12:04:10 +02:00
Miriam Baglioni
02a4986e7b
Applying changes from code reviews #40 (comment) and #40 (comment) and #40 (comment)
2020-08-13 11:53:01 +02:00
Miriam Baglioni
235d4e4d6e
moved Context as relevant for Communities dump
2020-08-12 18:16:45 +02:00
Miriam Baglioni
adf9f96a67
test for extraction of relation between organizations and context
2020-08-12 10:04:47 +02:00
Miriam Baglioni
7400cd019d
removed not needed variable
2020-08-12 10:03:33 +02:00
Miriam Baglioni
98d28bab5c
fixed missing _ in context nsprefix
2020-08-12 10:00:18 +02:00
Miriam Baglioni
8f48cb29f4
changed resource because of a change in the XQuery that returned the XML to be parsed. The main Zenodo community is no longer a separate element, but part of the <zenodocommunities> element
2020-08-11 18:04:38 +02:00
Miriam Baglioni
c3672b162b
merge branch with master
2020-08-11 17:53:04 +02:00
Miriam Baglioni
a16bbf3202
changed test resource to mirror change in the XQuery that produced data to be parsed. The main Zenodo community is no longer provided in a different element, but is part of the <zenodocommunities>
2020-08-11 17:48:44 +02:00
Miriam Baglioni
25f4fbceea
draft of test and resources
2020-08-11 17:37:22 +02:00
Miriam Baglioni
30a2b19b65
changed metadata for deposition of covid-19 dump in Zenodo
2020-08-11 17:36:56 +02:00
Claudio Atzori
f7cc52ab02
Merge pull request 'enrichment_wfs' ( #39 ) from enrichment_wfs into master
...
LGTM
2020-08-11 17:26:13 +02:00
Miriam Baglioni
49788b532a
changed to mirror changes in the schema
2020-08-11 16:05:03 +02:00
Miriam Baglioni
b08511287b
-
2020-08-11 16:01:36 +02:00
Miriam Baglioni
7e81a17068
changed the XQUERY to mirror the change in the code
2020-08-11 16:00:33 +02:00
Miriam Baglioni
37ad2f28e9
removed added | in prefix for datasource
2020-08-11 15:55:06 +02:00
Miriam Baglioni
f31c2e9461
enabled test
2020-08-11 15:49:25 +02:00
Miriam Baglioni
2d67476417
merge branch with master
2020-08-11 15:46:04 +02:00
Miriam Baglioni
77a390878c
merge upstream
2020-08-11 15:45:48 +02:00
Miriam Baglioni
6d3804e24c
-
2020-08-11 15:45:12 +02:00
Miriam Baglioni
0603ec4757
changed test to upload the dump for covid-19 community
2020-08-11 15:43:25 +02:00
Miriam Baglioni
7dfd56df9d
-
2020-08-11 15:42:35 +02:00
Miriam Baglioni
a169d7e7c1
added test file for the MakeTar class
2020-08-11 15:40:41 +02:00
Miriam Baglioni
acb0926b2e
json schemas for the dumped entities and relation
2020-08-11 15:39:48 +02:00
Miriam Baglioni
ff52c51f92
added the communityMapPath parameter and removed the isLookUpUrl parameter
2020-08-11 15:39:22 +02:00
Miriam Baglioni
6f43acda5e
added the maketar and send to zenodo step. Adjusted wf parameters
2020-08-11 15:38:20 +02:00
Miriam Baglioni
ddc19de2e9
removed the isLookUpUrl among the parameters
2020-08-11 15:37:47 +02:00
Miriam Baglioni
592a8ea573
added parameter file for maketar class
2020-08-11 15:37:14 +02:00
Miriam Baglioni
77a0951b32
added the make archive step in the workflow
2020-08-11 15:32:32 +02:00
Miriam Baglioni
cf4d918787
added description, changed parameter name and added method
2020-08-11 15:27:31 +02:00
Miriam Baglioni
dc5fc5366d
Creation of an archive for each related dump part
2020-08-11 15:26:06 +02:00
Miriam Baglioni
0ce49049d6
added description
2020-08-11 15:25:11 +02:00
Miriam Baglioni
9bae991167
added description of the class
2020-08-11 11:20:43 +02:00
Miriam Baglioni
341dc59ead
removed the repartition(1). Added code for the creation of an archive containing all the parts dumped for each community
2020-08-11 11:18:58 +02:00
Sandro La Bruzzo
fe8d640aee
fixed error on oozie workflow
2020-08-11 09:43:03 +02:00
Sandro La Bruzzo
304590e854
updated workflow of indexing to start from the beginning
2020-08-11 09:17:47 +02:00
Sandro La Bruzzo
eaf0dc68a2
fixed indexing
2020-08-11 09:17:03 +02:00
Miriam Baglioni
1991a49f70
removed reference to isLookUp to get the communityMap
2020-08-10 18:02:56 +02:00
Miriam Baglioni
c378c38546
disabled test. The testing functionalities for the upload to Zenodo are moved to common
2020-08-10 12:41:11 +02:00
Miriam Baglioni
63ad0ed209
changed to use communityMapPath instead of IsLookUp
2020-08-10 12:40:19 +02:00
Miriam Baglioni
cec795f2ea
changed resources to mirror changes in the model
2020-08-10 12:39:35 +02:00
Miriam Baglioni
f50e3e7333
changed the class for which to generate the schema
2020-08-10 12:03:49 +02:00
Miriam Baglioni
b8c26f656c
test using communityMapPath instead of isLookUp
2020-08-10 12:02:55 +02:00
Miriam Baglioni
fe88904df0
changed the wf definition
2020-08-10 12:01:14 +02:00
Miriam Baglioni
87856467e2
removed isLookUpUrl and added code to read the communityMap from HDFS
2020-08-10 11:38:41 +02:00
Miriam Baglioni
1cf7043e26
removed isLookUpUrl from the parameters
2020-08-10 11:38:03 +02:00
Claudio Atzori
cf6b68ce5a
Merge pull request 'data provision workflow: add nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed' ( #36 ) from provision_indexing into master
2020-08-10 11:16:29 +02:00
Sandro La Bruzzo
0ade33ad15
updated mergeFrom function for DLI Unknown
2020-08-10 10:18:35 +02:00
Miriam Baglioni
46986aae2d
added the new parameter for newdeposion/newversion and concept_record_id
2020-08-07 18:00:06 +02:00
Miriam Baglioni
3aedfdf0d6
added option to do a new deposition or new version of an old deposition
2020-08-07 17:49:14 +02:00
Miriam Baglioni
1b3ad1bce6
filter out authors pid (only orcid). Added check to get unique provenance for context id. filter out countries with code UNKNOWN
2020-08-07 17:48:18 +02:00
Miriam Baglioni
5ceb8c5f0a
moved constants from graph.Constants
2020-08-07 17:46:47 +02:00
Miriam Baglioni
6c65c93c0e
refactoring
2020-08-07 17:45:35 +02:00
Miriam Baglioni
68adf86fe4
refactoring
2020-08-07 17:43:20 +02:00
Miriam Baglioni
26d2ad6ebb
refactoring
2020-08-07 17:41:56 +02:00
Miriam Baglioni
9675af7965
refactoring
2020-08-07 17:41:07 +02:00
Miriam Baglioni
346a91f4d9
Added constants
2020-08-07 17:35:39 +02:00
Miriam Baglioni
d52b0e1797
no use of IsLookUp. The query is done once and its result stored on HDFS. The path to the result is given instead of the isLookUpUrl
2020-08-07 17:34:40 +02:00
Miriam Baglioni
ae1b7fbfdb
changed method signature from set of mapkey entries to String representing path on file system where to find the map
2020-08-07 17:32:27 +02:00
Miriam Baglioni
931fa2ff00
removed dependencies
2020-08-07 16:46:37 +02:00
Miriam Baglioni
545ea9f77e
moved in common. Zenodo response model and APIClient to deposit in Zenodo
2020-08-07 16:44:51 +02:00
Sandro La Bruzzo
ddb1446ceb
fixed test
2020-08-07 11:34:33 +02:00
Sandro La Bruzzo
718bc7bbc8
implemented provision workflows using the new implementation with Dataset
2020-08-07 11:05:18 +02:00
Miriam Baglioni
da9b012c15
fixed description
2020-08-06 11:55:44 +02:00
Miriam Baglioni
6dbadcf181
the new schema for the dumped result
2020-08-06 11:05:56 +02:00
Sandro La Bruzzo
a44e5abaa7
reformat code
2020-08-06 10:30:22 +02:00
Sandro La Bruzzo
4fb1821fab
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-08-06 10:28:31 +02:00
Sandro La Bruzzo
9d9e9edbd2
improved extractEntity Relation workflows using dataset
2020-08-06 10:28:24 +02:00
Miriam Baglioni
adf0ca5aa7
test to send is from hdfs
2020-08-05 14:24:43 +02:00
Miriam Baglioni
14eda4f46e
added method to try to put inputstream to zenodo
2020-08-05 14:18:25 +02:00
Miriam Baglioni
e737a47270
added classes to try to send input stream to zenodo for the upload
2020-08-05 14:17:40 +02:00
Miriam Baglioni
873e9cd50c
changed hadoop setting to connect to s3
2020-08-04 15:37:25 +02:00
Alessia Bardi
a29565ff57
code formatting
2020-08-04 12:55:27 +02:00
Alessia Bardi
01db29e208
fixes redmine issue #5846 : datacite and its different namespace declarations
2020-08-04 12:53:48 +02:00
Alessia Bardi
b4e4e5f858
do not duplicate result PIDs
2020-08-04 12:52:14 +02:00
Alessia Bardi
09a323d18d
testing a dataset from Nakala
2020-08-04 12:50:52 +02:00
Alessia Bardi
c35bf486cc
added handle among the possible PIDs
2020-08-04 12:50:12 +02:00
Miriam Baglioni
5b651abf82
merge branch with master
2020-08-04 10:14:07 +02:00
Miriam Baglioni
88e4c3b751
added default trust to context bulktagged
2020-08-04 10:13:25 +02:00
Miriam Baglioni
f9342cb484
added constant
2020-08-03 18:32:35 +02:00
Miriam Baglioni
96c3c891f4
added trust
2020-08-03 18:32:17 +02:00
Miriam Baglioni
53656600ad
changed XQuery to select only community and ri with status not hidden
2020-08-03 18:29:30 +02:00
Miriam Baglioni
b34177d8ef
merge upstream
2020-08-03 18:13:42 +02:00
Miriam Baglioni
901ae37f7b
added step to workflow
2020-08-03 18:12:54 +02:00
Miriam Baglioni
fa38cdb10b
added resource
2020-08-03 18:11:12 +02:00
Miriam Baglioni
e9fcc0b2f1
commented test unit - to decide change for mirroring the changed logics
2020-08-03 18:10:53 +02:00
Miriam Baglioni
e43aeb139a
added new property file and changed some parameter to old files
2020-08-03 18:07:28 +02:00
Miriam Baglioni
aa9f3d9698
changed logic for save in s3 directly
2020-08-03 18:06:18 +02:00
Miriam Baglioni
d465f0eec9
added fulltext to result
2020-08-03 18:03:27 +02:00
Miriam Baglioni
ec4b392d12
added new dependencies for writing on s3
2020-08-03 17:57:04 +02:00
Miriam Baglioni
c892c7dfa7
changed to query for community map just once and save the result for remaining executions
2020-08-03 17:56:31 +02:00
Claudio Atzori
3a11a387a9
data provision workflow enhancement: added nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed
2020-08-03 14:28:08 +02:00
Alessia Bardi
8cc067fe76
specific test for claims
2020-08-03 11:17:50 +02:00
Claudio Atzori
a89b6cc3ba
Merge pull request 'nsprefix_blacklist' ( #34 ) from nsprefix_blacklist into master
2020-07-31 11:52:23 +02:00
Sandro La Bruzzo
0c3bc9ea4b
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-31 09:07:18 +02:00
Sandro La Bruzzo
168bfb496a
adopted dedup to the new schema
2020-07-31 09:06:57 +02:00
Michele Artini
652b13abb6
Merge branch 'master' into nsprefix_blacklist
2020-07-31 07:58:37 +02:00
Enrico Ottonello
0377b40fba
output to one parquet file
2020-07-30 18:38:07 +02:00
Claudio Atzori
cd631bb5bc
defaults fixed in the cleaning workflow: forces result.publisher to NULL when result.publisher.value is empty
2020-07-30 17:03:53 +02:00
Miriam Baglioni
872d7783fc
-
2020-07-30 16:50:36 +02:00
Miriam Baglioni
57c87b7653
re-implemented to fix issue on not serializable Set<String> variable
2020-07-30 16:43:43 +02:00
Miriam Baglioni
ef8e5957b5
added specific directory where to save results
2020-07-30 16:42:46 +02:00
Miriam Baglioni
75f3361c85
-
2020-07-30 16:41:31 +02:00
Miriam Baglioni
3f695b25fa
refactoring
2020-07-30 16:40:15 +02:00
Miriam Baglioni
e623f12bef
refactoring
2020-07-30 16:32:59 +02:00
Miriam Baglioni
ff7d05abb4
added support class to store the couple organizationId representativeId got from sql query on hive
2020-07-30 16:32:04 +02:00
Miriam Baglioni
cf6d80b2ab
added command to close the writer
2020-07-30 16:31:22 +02:00
Miriam Baglioni
f985bca37b
added USER_CLAIM constant value
2020-07-30 16:25:26 +02:00
Claudio Atzori
4bbfcf1ac6
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-07-30 16:25:06 +02:00
Claudio Atzori
4ff8007518
added function to set the missing vocabulary names, used in the cleaning workflow as a pre-cleaning step
2020-07-30 16:24:39 +02:00
Miriam Baglioni
6f1c40a933
-
2020-07-30 16:24:28 +02:00
Miriam Baglioni
2b66a93f9e
added property file that was missing
2020-07-30 16:24:17 +02:00
Michele Artini
bdece15ca0
blacklist of nsprefix
2020-07-30 16:13:38 +02:00
Enrico Ottonello
196f36c6ed
fix publication dataset creation
2020-07-30 13:38:33 +02:00
Sandro La Bruzzo
c97c8f0c44
implemented new oozie job to extract entities in a separate dataset
2020-07-30 12:13:58 +02:00
Sandro La Bruzzo
3010a362bc
updated changing in the workflow of provision in the phase of aggregation. Removed serialization in JSON RDD and used spark Dataset
2020-07-30 09:25:56 +02:00
Sandro La Bruzzo
487226f669
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-30 09:25:39 +02:00
Sandro La Bruzzo
16ae3c9ccf
updated changing in the workflow of provision in the phase of aggregation. Removed serialization in JSON RDD and used spark Dataset
2020-07-30 09:25:32 +02:00
Miriam Baglioni
ee8420c6b3
added resource for datasource test
2020-07-29 18:28:43 +02:00
Miriam Baglioni
76bcab98ce
added code to filter out null originalId from the dump
2020-07-29 18:28:21 +02:00
Miriam Baglioni
ef1d8aef17
added one test to verify the dump for the datasources
2020-07-29 18:27:46 +02:00
Miriam Baglioni
86bab79512
-
2020-07-29 18:20:22 +02:00
Miriam Baglioni
31791dcf3d
fixed wrong property file path name
2020-07-29 18:20:08 +02:00
Miriam Baglioni
9e722aa1ef
-
2020-07-29 18:00:08 +02:00
Miriam Baglioni
d22f106f27
added constant to identify datasource associated to funders
2020-07-29 17:56:55 +02:00
Miriam Baglioni
40e194fe2f
added check to not dump datasources related to funders
2020-07-29 17:56:18 +02:00
Miriam Baglioni
b48934f6df
changed the workflow name
2020-07-29 17:43:43 +02:00
Miriam Baglioni
1433db825d
refactoring
2020-07-29 17:43:24 +02:00
Miriam Baglioni
074e9ab75e
refactoring
2020-07-29 17:42:50 +02:00
Miriam Baglioni
8ad8dac7d4
merge branch with fork master
2020-07-29 17:38:28 +02:00
Miriam Baglioni
9e997e63a2
merge upstream
2020-07-29 17:38:14 +02:00
Miriam Baglioni
9fa82dc93b
fixed issue
2020-07-29 17:36:16 +02:00
Miriam Baglioni
8907648d6a
-
2020-07-29 17:35:47 +02:00
Miriam Baglioni
536e7f6352
added and changed resources for testing of the whole graph dump and of community related products dumps
2020-07-29 17:33:34 +02:00
Miriam Baglioni
4d7f590493
testings for the whole graph dump
2020-07-29 17:32:37 +02:00
Miriam Baglioni
a2f73ec2c7
changed due to changes in the model
2020-07-29 17:32:02 +02:00
Miriam Baglioni
481585e9d3
-
2020-07-29 17:31:41 +02:00
Miriam Baglioni
40a8dafbdc
-
2020-07-29 17:30:44 +02:00
Miriam Baglioni
de2ebb467e
changed due to changes in the model
2020-07-29 17:08:02 +02:00
Miriam Baglioni
d0ff2a56fb
-
2020-07-29 17:06:53 +02:00
Miriam Baglioni
b96dedb56b
changed due to changes in the model
2020-07-29 17:05:31 +02:00
Miriam Baglioni
6d0f08277b
classes to implement the dump of the whole graph.
2020-07-29 17:03:19 +02:00
Miriam Baglioni
8d4327b292
input parameters and workflow definition for the dump of the whole graph
2020-07-29 17:00:34 +02:00
Miriam Baglioni
b5f995ab12
refactoring
2020-07-29 16:59:48 +02:00
Miriam Baglioni
f7a87cc447
added new constants value
2020-07-29 16:58:40 +02:00
Miriam Baglioni
b71d12cf26
refactoring
2020-07-29 16:52:44 +02:00
Miriam Baglioni
a8d65b68cb
changed to delete the part to check if it was a test or a real execution
2020-07-29 16:47:57 +02:00
Miriam Baglioni
3ec2392904
Added new class to move the place the split is effectively run
2020-07-29 16:46:50 +02:00
Michele Artini
8ba94833bd
added an es prop
2020-07-29 14:16:08 +02:00
Miriam Baglioni
178c2729a7
changed the path to reach the java class to be executed
2020-07-29 12:29:51 +02:00
Miriam Baglioni
437ac12139
removed unused parameter
2020-07-29 12:28:16 +02:00
Enrico Ottonello
c82b15b5f4
migrate configuration to ocean, fix publication dataset creation
2020-07-28 15:23:52 +02:00
Claudio Atzori
6f11c0496e
fixed typo in module name dhp-worfklow-profiles -> dhp-workflow-profiles
2020-07-28 15:01:58 +02:00
Claudio Atzori
f680eb3e12
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-07-28 14:10:56 +02:00
Claudio Atzori
985b360c31
fixed typo in module name dhp-worfklow-profiles -> dhp-workflow-profiles
2020-07-28 14:10:52 +02:00
Michele Artini
3acd632123
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-07-28 12:02:30 +02:00
Michele Artini
35e6e9c064
tests
2020-07-28 12:02:15 +02:00
Enrico Ottonello
a6acb37689
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-07-28 08:07:40 +02:00
Claudio Atzori
ee832f358e
Merge pull request 'stats_wf_extensions_and_corrections' ( #28 ) from spyros/dnet-hadoop:stats_wf_extensions_and_corrections into master
...
Thank you Guys! The update workflow will be made available to the beta & production orchestration systems under the HDFS path
```/lib/dnet/oa/graph/stats/oozie_app```
2020-07-27 16:02:03 +02:00
miconis
d47352cbc7
refactoring of the procedure for the id generation, minor changes and addition of a comparison on the original id and the origin datasource
2020-07-24 20:10:47 +02:00
Antonis Lempesis
4ac8ebe427
correctly calculating the project duration
2020-07-24 19:50:40 +03:00
Antonis Lempesis
18d9464b52
creating shadow db only if it does not exist...
2020-07-24 19:50:40 +03:00
Antonis Lempesis
e217d496ab
added the dest db...
2020-07-24 19:50:40 +03:00
Antonis Lempesis
b16bb68b9f
added the target db name...
2020-07-24 19:50:40 +03:00
Antonis Lempesis
1ee7eeedf3
added the source db name...
2020-07-24 19:50:40 +03:00
Antonis Lempesis
cecbbfa0fc
added missing tables and views: contexts, creation_date, funder
2020-07-24 19:50:40 +03:00
Antonis Lempesis
25b7a615f5
moved datasource_sources table creating in the datasource section
2020-07-24 19:50:40 +03:00
Antonis Lempesis
a8da4ab9c0
years in projects are now integers
2020-07-24 19:50:40 +03:00
Antonis Lempesis
c9cfc165d9
not using impala since the resulting tables are not visible
2020-07-24 19:50:40 +03:00
Antonis Lempesis
dd3d6a6e15
compute stats for the used and new impala tables
2020-07-24 19:50:40 +03:00
Antonis Lempesis
e6f50de6ef
Separated impala from hive steps
2020-07-24 19:50:40 +03:00
Antonis Lempesis
de49173420
fixed a typo in queries
2020-07-24 19:50:40 +03:00
antleb
391cf80fb8
Added peer-reviewed, green, gold tables and fields in result. Added shortcuts from result-country
2020-07-24 19:50:40 +03:00
antleb
68389d0125
Corrected the script used by the last step of the wf
2020-07-24 19:50:40 +03:00
antleb
ec52141f1a
changed refereed type from value to classname
2020-07-24 19:50:40 +03:00
Spyros Zoupanos
63cd797aba
Comment out step 15 to make it work with the new schema of Claudio
2020-07-24 19:50:40 +03:00
Spyros Zoupanos
138c6ddffa
Insert statement to datasource table that takes into account the piwik_id of the openAIRE graph
2020-07-24 19:50:40 +03:00
Spyros Zoupanos
3630794cef
Fix to consider the relationships that have been 'virtually deleted' for project_results - defect #5607
2020-07-24 19:50:40 +03:00
Spyros Zoupanos
5546f29e63
Corrections on the shadow schema and the impala table stats calculation
2020-07-24 19:50:40 +03:00
Spyros Zoupanos
adf8a025d2
Adding more relations (Sources, Licences, Additional) and shadow schema as provided and discussed with Antonis Lempesis
2020-07-24 19:50:40 +03:00
Spyros Zoupanos
657a40536b
Corrections by Spyros: Script cleanup, corrections and re-arrangement
2020-07-24 19:50:40 +03:00