Claudio Atzori
|
d10447e747
|
re-packaged graph dump workflow sources
|
2020-11-05 17:38:18 +01:00 |
Claudio Atzori
|
2d76497488
|
cleanup
|
2020-11-05 17:10:24 +01:00 |
Miriam Baglioni
|
f8e9bda24c
|
merge branch with master
|
2020-11-05 16:31:18 +01:00 |
Miriam Baglioni
|
be5ed8f554
|
added check to avoid sending empty metadata.
|
2020-11-05 16:10:17 +01:00 |
Claudio Atzori
|
2148a51fae
|
minor changes
|
2020-11-05 11:24:12 +01:00 |
Claudio Atzori
|
4625b7486e
|
code formatting
|
2020-11-04 18:12:43 +01:00 |
Claudio Atzori
|
f5f346dd2b
|
Merge pull request 'dump' (#50) from miriam.baglioni/dnet-hadoop:dump into master
LGTM
|
2020-11-04 18:07:01 +01:00 |
Miriam Baglioni
|
e9ac471ae9
|
removed dependency from classes for the pid graph dump
|
2020-11-04 18:04:42 +01:00 |
Miriam Baglioni
|
b90a945c49
|
removed property files for pid graph dump
|
2020-11-04 17:28:33 +01:00 |
Miriam Baglioni
|
bac307155a
|
removed properties specific for pid graph dump
|
2020-11-04 17:28:04 +01:00 |
Miriam Baglioni
|
9c9d50f486
|
removed code specific for pid graph dump
|
2020-11-04 17:26:22 +01:00 |
Miriam Baglioni
|
5669890934
|
removed commented lines
|
2020-11-04 17:15:21 +01:00 |
Miriam Baglioni
|
6a89f59be9
|
removed commented lines
|
2020-11-04 17:13:59 +01:00 |
Miriam Baglioni
|
56150d7e5e
|
removed all code related to the dump of pids graph
|
2020-11-04 17:13:12 +01:00 |
Miriam Baglioni
|
16c54a96f8
|
removed pid dump
|
2020-11-04 17:11:32 +01:00 |
Claudio Atzori
|
e5da4ee9b1
|
dedup workflow using the common PidComparator
|
2020-11-04 15:02:02 +01:00 |
Miriam Baglioni
|
0cac5436ff
|
Merge branch 'dump' of code-repo.d4science.org:miriam.baglioni/dnet-hadoop into dump
|
2020-11-04 13:21:11 +01:00 |
Alessia Bardi
|
51808b5afd
|
Updated descriptions
|
2020-11-04 12:29:48 +01:00 |
Alessia Bardi
|
e6becf8659
|
Updated descriptions
|
2020-11-04 12:17:57 +01:00 |
Alessia Bardi
|
0abe0eee33
|
Updated descriptions
|
2020-11-04 12:15:30 +01:00 |
Alessia Bardi
|
f6ab238f5d
|
Updated descriptions
|
2020-11-04 11:50:47 +01:00 |
Sandro La Bruzzo
|
3581244daf
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-11-04 09:04:22 +01:00 |
Sandro La Bruzzo
|
66efb39634
|
implemented merge scholix
|
2020-11-04 09:04:01 +01:00 |
Miriam Baglioni
|
c010a8442f
|
fixed issue on test code
|
2020-11-03 17:26:51 +01:00 |
Miriam Baglioni
|
8ec7a61188
|
merge branch with master
|
2020-11-03 16:59:08 +01:00 |
Miriam Baglioni
|
c209284ca7
|
new schemas for the entities in the dump with added descriptions
|
2020-11-03 16:58:08 +01:00 |
Miriam Baglioni
|
08806deddf
|
added the splitSize non mandatory parameter. Default size 10G
|
2020-11-03 16:57:34 +01:00 |
Miriam Baglioni
|
7d2eda43ca
|
added new non-mandatory property publish to determine whether to publish the upload or leave it pending. Default value false
|
2020-11-03 16:57:01 +01:00 |
Miriam Baglioni
|
cbbb1bdc54
|
moved business logic to new class in common for handling the zip of the archives
|
2020-11-03 16:55:50 +01:00 |
Miriam Baglioni
|
d4382b54df
|
moved the tar archive with max size to common module
|
2020-11-03 16:54:50 +01:00 |
Claudio Atzori
|
86d6fbe95b
|
refactoring: CleaningFunctions and OafMapperUtils moved to dhp-common
|
2020-11-03 12:19:46 +01:00 |
Claudio Atzori
|
8471888ad3
|
Merge branch 'graph_cleaning' into stable_ids
|
2020-11-03 11:52:47 +01:00 |
Claudio Atzori
|
5310e56dba
|
remove empty PIDs
|
2020-11-03 11:52:10 +01:00 |
Claudio Atzori
|
3fcd669e99
|
result merge operation leverages a custom ResultTypeComparator in the aggregator graph construction
|
2020-11-03 10:53:23 +01:00 |
Claudio Atzori
|
8e7f81c5f5
|
code formatting
|
2020-11-02 14:25:00 +01:00 |
Claudio Atzori
|
09e44dabff
|
Merge branch 'master' into stable_ids
|
2020-11-02 12:16:01 +01:00 |
Sandro La Bruzzo
|
754c86f33e
|
fixed test to work on jenkins
|
2020-11-02 09:35:01 +01:00 |
Sandro La Bruzzo
|
39337d8a8a
|
fixed test
|
2020-11-02 09:26:25 +01:00 |
Miriam Baglioni
|
dabb33e018
|
changed the discriminant by which the file is split
|
2020-10-30 17:52:22 +01:00 |
Claudio Atzori
|
c5dda3a00c
|
Merge pull request 'h2020classification' (#49) from miriam.baglioni/dnet-hadoop:h2020classification into master
LGTM
|
2020-10-30 17:10:05 +01:00 |
Miriam Baglioni
|
4905739be6
|
changed resource file to mirror change in business logic
|
2020-10-30 17:02:57 +01:00 |
Miriam Baglioni
|
b40360ebfb
|
changed the code to mirror the changed decision in the classification level and programme description labels
|
2020-10-30 17:02:30 +01:00 |
Miriam Baglioni
|
696409fb9f
|
disabled tests because they need a remote resource
|
2020-10-30 17:01:48 +01:00 |
Miriam Baglioni
|
0fba08eae4
|
max allowed size per file 10 Gb
|
2020-10-30 16:05:55 +01:00 |
Claudio Atzori
|
385214eeae
|
code formatting
|
2020-10-30 15:47:05 +01:00 |
Claudio Atzori
|
04ad8969b2
|
anticipated execution of the graph cleaning workflow
|
2020-10-30 15:46:55 +01:00 |
Claudio Atzori
|
4ca75d6951
|
Merge pull request 'Dedup ID creation policy' (#48) from deduptesting into stable_ids
|
2020-10-30 15:15:32 +01:00 |
Miriam Baglioni
|
b828587252
|
prevent the code from cycling indefinitely
|
2020-10-30 15:01:25 +01:00 |
Miriam Baglioni
|
f747e303ac
|
classes for dumping of the graph as ttl file
|
2020-10-30 14:13:45 +01:00 |
Miriam Baglioni
|
16baf5b69e
|
formatting
|
2020-10-30 14:13:14 +01:00 |
Miriam Baglioni
|
a9eef9c852
|
added check for possible Optional value in relation dataInfo
|
2020-10-30 14:12:28 +01:00 |
Miriam Baglioni
|
5f4de9a962
|
formatting
|
2020-10-30 14:11:40 +01:00 |
Miriam Baglioni
|
14bf2e7238
|
added option to split dumps bigger than 40Gb into different files
|
2020-10-30 14:09:04 +01:00 |
Claudio Atzori
|
58f28296ea
|
ProvisionConstants moved as ModelHardLimits in dhp-common and applied to truncate long abstracts (len > 150000). Further filtering for empty PID values
|
2020-10-30 10:56:42 +01:00 |
Miriam Baglioni
|
78fdb11c3f
|
merge branch with master
|
2020-10-29 12:55:22 +01:00 |
Sandro La Bruzzo
|
1d9fdb7367
|
fixed spark memory issue in SparkSplitOafTODLIEntities
|
2020-10-28 12:30:32 +01:00 |
Miriam Baglioni
|
d2374e3b9e
|
added code to handle cases where the funding tree does not exist
|
2020-10-27 16:15:21 +01:00 |
Miriam Baglioni
|
5d3012eeb4
|
changed code to dump only the programme list and not the classification list
|
2020-10-27 16:14:18 +01:00 |
Miriam Baglioni
|
3241ec1777
|
added connection timeout and socket timeout 600 sec
|
2020-10-27 16:12:11 +01:00 |
sandro
|
3a81a940b7
|
solved bug on merge publication
|
2020-10-21 22:41:55 +02:00 |
Miriam Baglioni
|
a2ce527fae
|
changed to match the requirements for short titles in level and long titles in classification
|
2020-10-20 17:03:25 +02:00 |
Sandro La Bruzzo
|
346ed65e2c
|
added upload to zenodo node
|
2020-10-20 16:59:55 +02:00 |
sandro
|
271b4db450
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-10-20 16:09:49 +02:00 |
sandro
|
d58d02d448
|
added workflow upload on zenodo
|
2020-10-20 16:09:07 +02:00 |
miconis
|
c4a59d1b9a
|
merge with the master to port the new packages
|
2020-10-20 16:07:30 +02:00 |
miconis
|
708d887e64
|
minor changes
|
2020-10-20 15:12:19 +02:00 |
miconis
|
0e54803177
|
bug fix in the id generator and implementation of jobs for organization dedup
|
2020-10-20 12:19:46 +02:00 |
Alessia Bardi
|
1425d810a8
|
testing mapping
|
2020-10-19 17:46:14 +02:00 |
Claudio Atzori
|
266bf1a221
|
common IdentifierFactory in use on the mapping from the aggregator data; merge the entities sharing the same id; code formatting
|
2020-10-16 17:02:10 +02:00 |
Claudio Atzori
|
34f1d0904b
|
common IdentifierFactory in use on the mapping from the aggregator data
|
2020-10-16 16:00:19 +02:00 |
Sandro La Bruzzo
|
fed711da80
|
Merge remote-tracking branch 'origin/master' into merge_record_to_common
|
2020-10-13 15:32:45 +02:00 |
Sandro La Bruzzo
|
34bf64c94f
|
fixed export Scholexplorer to OpenAire
|
2020-10-13 08:47:58 +02:00 |
Alessia Bardi
|
8775a64bc1
|
Merge pull request 'Merging different compatibility levels (pinocchio operator)' (#47) from merge_graph into master
|
2020-10-09 14:44:52 +02:00 |
Claudio Atzori
|
e751c1402f
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-10-09 13:53:21 +02:00 |
Claudio Atzori
|
b961dc7d1e
|
added originalid to the fields in the result graph view
|
2020-10-09 13:53:15 +02:00 |
miconis
|
6f8720982c
|
bug fix in the idgenerator and test implementation
|
2020-10-09 09:30:23 +02:00 |
Sandro La Bruzzo
|
734934e2eb
|
fixed error on empty intersection with publication and relation on export to OAF
|
2020-10-08 17:29:29 +02:00 |
Sandro La Bruzzo
|
eec418cd26
|
moved AuthorMerger into dhp-common
|
2020-10-08 10:33:55 +02:00 |
Sandro La Bruzzo
|
fe0a7870e6
|
Added test to check if merge authors works
|
2020-10-08 10:33:12 +02:00 |
Sandro La Bruzzo
|
cd9c377d18
|
adapted scholexplorer Dump generation to the new Dataset definition
|
2020-10-08 10:10:13 +02:00 |
Claudio Atzori
|
a3f37a9414
|
javadoc
|
2020-10-07 16:44:22 +02:00 |
Claudio Atzori
|
8d85a2fced
|
[BETA wf only] datasources involved in the merge operation don't obey the infra precedence policy, but rely on a custom behaviour that, given two datasources from beta and prod, returns the one from prod with the highest compatibility among the two
|
2020-10-07 16:28:52 +02:00 |
Claudio Atzori
|
5f7b75f5c5
|
code formatting
|
2020-10-07 13:22:54 +02:00 |
miconis
|
1804c5d809
|
refactoring: classes moved in the right package
|
2020-10-06 16:44:51 +02:00 |
miconis
|
7093355487
|
bug fix and minor changes
|
2020-10-06 16:21:34 +02:00 |
miconis
|
5a8bc329c5
|
bug fix in the result merge: it takes the correct bestaccessright based on the license instead of the trust
|
2020-10-06 15:26:44 +02:00 |
miconis
|
a2ac7e52fb
|
implementation of the workflow for new organizations in openorgs
|
2020-10-06 13:58:09 +02:00 |
Miriam Baglioni
|
061527f06e
|
adding short description
|
2020-10-05 13:54:39 +02:00 |
Miriam Baglioni
|
0c12d7bdd8
|
adding short description
|
2020-10-05 11:39:55 +02:00 |
Miriam Baglioni
|
ae08b3c0dd
|
merge branch with master
|
2020-10-05 11:35:55 +02:00 |
Miriam Baglioni
|
11b7eaae09
|
changed the name of the folder storing the context entity from context to communities_infrastructures
|
2020-10-05 11:24:54 +02:00 |
Miriam Baglioni
|
32bffb0134
|
changed the name from communities_infrastructures to communities_infrastuctures.json
|
2020-10-05 11:24:17 +02:00 |
Claudio Atzori
|
23f64d9eb4
|
updated dedup tests following the dnet-pace-core library update
|
2020-10-02 14:30:53 +02:00 |
Miriam Baglioni
|
fc2f7636be
|
removed unused code
|
2020-10-02 12:33:52 +02:00 |
Miriam Baglioni
|
25cbcf6114
|
changed to solve issues about names. context renamed communities_infrastructure.json and removed the double json.gz extension from the name of the part in the tar
|
2020-10-02 12:17:46 +02:00 |
Claudio Atzori
|
9db0f88fb8
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-10-02 09:43:35 +02:00 |
Claudio Atzori
|
49ae3450a9
|
code formatting
|
2020-10-02 09:43:24 +02:00 |
Claudio Atzori
|
c2a6e2a9bf
|
fixed mapping for datasource journal info (ISSNs)
|
2020-10-02 09:37:08 +02:00 |
Miriam Baglioni
|
01117a46e1
|
whole workflow activated
|
2020-10-01 17:19:21 +02:00 |
Miriam Baglioni
|
cfb5766c6b
|
removed double json.gz from names of files in the tar
|
2020-10-01 17:18:34 +02:00 |