Miriam Baglioni | 873e9cd50c | changed Hadoop settings to connect to S3 | 2020-08-04 15:37:25 +02:00
Alessia Bardi | a29565ff57 | code formatting | 2020-08-04 12:55:27 +02:00
Alessia Bardi | 01db29e208 | fixes Redmine issue #5846: DataCite and its different namespace declarations | 2020-08-04 12:53:48 +02:00
Alessia Bardi | b4e4e5f858 | do not duplicate result PIDs | 2020-08-04 12:52:14 +02:00
Alessia Bardi | 09a323d18d | testing a dataset from Nakala | 2020-08-04 12:50:52 +02:00
Alessia Bardi | c35bf486cc | added handle among the possible PIDs | 2020-08-04 12:50:12 +02:00
Miriam Baglioni | 5b651abf82 | merge branch with master | 2020-08-04 10:14:07 +02:00
Miriam Baglioni | 88e4c3b751 | added default trust to context bulktagged | 2020-08-04 10:13:25 +02:00
Miriam Baglioni | f9342cb484 | added constant | 2020-08-03 18:32:35 +02:00
Miriam Baglioni | 96c3c891f4 | added trust | 2020-08-03 18:32:17 +02:00
Miriam Baglioni | 53656600ad | changed XQuery to select only community and ri with status not hidden | 2020-08-03 18:29:30 +02:00
Miriam Baglioni | b34177d8ef | merge upstream | 2020-08-03 18:13:42 +02:00
Miriam Baglioni | 901ae37f7b | added step to workflow | 2020-08-03 18:12:54 +02:00
Miriam Baglioni | fa38cdb10b | added resource | 2020-08-03 18:11:12 +02:00
Miriam Baglioni | e9fcc0b2f1 | commented out unit test; the change needed to mirror the changed logic is still to be decided | 2020-08-03 18:10:53 +02:00
Miriam Baglioni | e43aeb139a | added new property file and changed some parameters in old files | 2020-08-03 18:07:28 +02:00
Miriam Baglioni | aa9f3d9698 | changed logic to save directly to S3 | 2020-08-03 18:06:18 +02:00
Miriam Baglioni | 627c1dc73a | added new field and doc | 2020-08-03 18:05:41 +02:00
Miriam Baglioni | d465f0eec9 | added fulltext to result | 2020-08-03 18:03:27 +02:00
Miriam Baglioni | ec4b392d12 | added new dependencies for writing on S3 | 2020-08-03 17:57:04 +02:00
Miriam Baglioni | c892c7dfa7 | changed to query for community map just once and save the result for remaining executions | 2020-08-03 17:56:31 +02:00
Claudio Atzori | 3a11a387a9 | data provision workflow enhancement: added nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed | 2020-08-03 14:28:08 +02:00
Alessia Bardi | 8cc067fe76 | specific test for claims | 2020-08-03 11:17:50 +02:00
Claudio Atzori | a89b6cc3ba | Merge pull request 'nsprefix_blacklist' (#34) from nsprefix_blacklist into master | 2020-07-31 11:52:23 +02:00
Sandro La Bruzzo | 0c3bc9ea4b | Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop | 2020-07-31 09:07:18 +02:00
Sandro La Bruzzo | 168bfb496a | adapted dedup to the new schema | 2020-07-31 09:06:57 +02:00
Michele Artini | 652b13abb6 | Merge branch 'master' into nsprefix_blacklist | 2020-07-31 07:58:37 +02:00
Claudio Atzori | cd631bb5bc | defaults fixed in the cleaning workflow: forces result.publisher to NULL when result.publisher.value is empty | 2020-07-30 17:03:53 +02:00
Miriam Baglioni | 872d7783fc | - | 2020-07-30 16:50:36 +02:00
Miriam Baglioni | 3a0f5f1e1b | added static newInstance method | 2020-07-30 16:49:09 +02:00
Miriam Baglioni | e163b1df79 | removed the duration attribute (most of the time it is set to 0) | 2020-07-30 16:45:03 +02:00
Miriam Baglioni | 57c87b7653 | re-implemented to fix issue with non-serializable Set<String> variable | 2020-07-30 16:43:43 +02:00
Miriam Baglioni | ef8e5957b5 | added specific directory in which to save results | 2020-07-30 16:42:46 +02:00
Miriam Baglioni | 75f3361c85 | - | 2020-07-30 16:41:31 +02:00
Miriam Baglioni | 3f695b25fa | refactoring | 2020-07-30 16:40:15 +02:00
Miriam Baglioni | e623f12bef | refactoring | 2020-07-30 16:32:59 +02:00
Miriam Baglioni | ff7d05abb4 | added support class to store the (organizationId, representativeId) pair obtained from SQL query on Hive | 2020-07-30 16:32:04 +02:00
Miriam Baglioni | cf6d80b2ab | added command to close the writer | 2020-07-30 16:31:22 +02:00
Miriam Baglioni | f985bca37b | added USER_CLAIM constant value | 2020-07-30 16:25:26 +02:00
Claudio Atzori | 4bbfcf1ac6 | Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop | 2020-07-30 16:25:06 +02:00
Claudio Atzori | 4ff8007518 | added function to set the missing vocabulary names, used in the cleaning workflow as a pre-cleaning step | 2020-07-30 16:24:39 +02:00
Miriam Baglioni | 6f1c40a933 | - | 2020-07-30 16:24:28 +02:00
Miriam Baglioni | 2b66a93f9e | added property file that was missing | 2020-07-30 16:24:17 +02:00
Michele Artini | bdece15ca0 | blacklist of nsprefix | 2020-07-30 16:13:38 +02:00
Sandro La Bruzzo | c97c8f0c44 | implemented new Oozie job to extract entities in a separate dataset | 2020-07-30 12:13:58 +02:00
Sandro La Bruzzo | 3010a362bc | updated the provision workflow in the aggregation phase: removed serialization in JSON RDD and used Spark Dataset | 2020-07-30 09:25:56 +02:00
Sandro La Bruzzo | 487226f669 | Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop | 2020-07-30 09:25:39 +02:00
Sandro La Bruzzo | 16ae3c9ccf | updated the provision workflow in the aggregation phase: removed serialization in JSON RDD and used Spark Dataset | 2020-07-30 09:25:32 +02:00
Miriam Baglioni | ee8420c6b3 | added resource for datasource test | 2020-07-29 18:28:43 +02:00
Miriam Baglioni | 76bcab98ce | added code to filter out null originalId from the dump | 2020-07-29 18:28:21 +02:00