Michele Artini
|
0fda2c3a30
|
some tests on db records
|
2020-03-25 09:43:58 +01:00 |
miconis
|
02320de371
|
minor changes
|
2020-03-24 17:43:51 +01:00 |
miconis
|
8e8b5e8f30
|
roots wf merged in scan wf
|
2020-03-24 17:40:58 +01:00 |
Miriam Baglioni
|
19d7f8b51d
|
uncommented execution for some of the result types for testing purposes
|
2020-03-24 16:49:46 +01:00 |
Miriam Baglioni
|
ad24c8478f
|
added missing parameter
|
2020-03-24 16:19:59 +01:00 |
Miriam Baglioni
|
46094a3eec
|
bug fixing for implementation with dataset
|
2020-03-24 16:19:36 +01:00 |
Claudio Atzori
|
51ff68db66
|
Merge branch 'dedupTest' of https://code-repo.d4science.org/D-Net/dnet-hadoop into dedupTest
|
2020-03-24 11:18:19 +01:00 |
Claudio Atzori
|
1e869e7bed
|
using method available from currently used library
|
2020-03-24 11:17:44 +01:00 |
miconis
|
f0d72b76a8
|
package structure fixed
|
2020-03-24 10:51:40 +01:00 |
Claudio Atzori
|
aaedbb1b8b
|
WIP: dedup workflow, stage 2
|
2020-03-24 09:59:28 +01:00 |
Michele Artini
|
e3760c7f39
|
fix a bug with organization countries
|
2020-03-24 08:43:56 +01:00 |
Claudio Atzori
|
8b0ba3d76a
|
postprocessing script correctly run as hive2 action
|
2020-03-23 17:40:39 +01:00 |
miconis
|
93e2291291
|
minor changes
|
2020-03-23 17:17:56 +01:00 |
miconis
|
f7890a90df
|
implementation of the mechanism that checks the existence of a mergerel file
|
2020-03-23 17:13:30 +01:00 |
Miriam Baglioni
|
ad712f2d79
|
added the needed variables in the config and read the variables in the workflow
|
2020-03-23 17:11:36 +01:00 |
Miriam Baglioni
|
f1e9fe9752
|
changed implementation using dataset and query on hive
|
2020-03-23 17:11:00 +01:00 |
Miriam Baglioni
|
f09cd1e911
|
removed unneeded variable in the configuration
|
2020-03-23 17:10:14 +01:00 |
Miriam Baglioni
|
9418e3d4fa
|
read dataset from files instead of using hive tables
|
2020-03-23 17:09:27 +01:00 |
Miriam Baglioni
|
a7bf037306
|
remove unused class
|
2020-03-23 14:36:43 +01:00 |
Miriam Baglioni
|
8ab8b6b0bf
|
minor
|
2020-03-23 14:35:23 +01:00 |
Miriam Baglioni
|
30d58fd98c
|
change the configuration of the workflow
|
2020-03-23 14:32:49 +01:00 |
Miriam Baglioni
|
a440152b46
|
refactoring
|
2020-03-23 14:30:56 +01:00 |
Miriam Baglioni
|
47561f3597
|
changed the implementation from rdd to dataset got from sql queries (on hive)
|
2020-03-23 11:58:32 +01:00 |
miconis
|
c20e179f5a
|
structure of the workflows updated
|
2020-03-23 11:43:49 +01:00 |
Claudio Atzori
|
658d40ccbe
|
WIP trying to use hive2 actions
|
2020-03-23 11:14:54 +01:00 |
Claudio Atzori
|
ecb64e4998
|
Merge branch 'migration_wfs_regular_all_steps'
|
2020-03-23 08:57:01 +01:00 |
Michele Artini
|
15160032bd
|
fixed a bug setting some organization fields
|
2020-03-23 08:39:14 +01:00 |
Claudio Atzori
|
a4c52661a0
|
WIP: fixing dedup workflows
|
2020-03-20 19:17:24 +01:00 |
Claudio Atzori
|
6cb0a9bff0
|
dedup wf directory structure aligned with project commons
|
2020-03-20 16:48:14 +01:00 |
miconis
|
e16e644faf
|
implementation of the workflow for entity update and for relations update
|
2020-03-20 13:01:56 +01:00 |
przemek
|
638b78f96a
|
Merge remote-tracking branch 'origin/master' into przemyslawjacewicz_actionmanager_impl_prototype
|
2020-03-19 15:12:56 +01:00 |
miconis
|
4e82a24af2
|
minor changes and implementation of the create connected components action
|
2020-03-19 15:01:07 +01:00 |
Claudio Atzori
|
36236dd1c1
|
action migration workflow produces eu.dnetlib.dhp.schema.action.AtomicAction(s)
|
2020-03-19 14:00:38 +01:00 |
Claudio Atzori
|
a0ab15a64c
|
need to stick with guava:11.0.2 as it is the version used by the hadoop components (oozie client for sure). The latest version (28.2-jre) breaks the oozie workflow submission
|
2020-03-19 13:58:58 +01:00 |
Sandro La Bruzzo
|
0594b92a6d
|
implemented relation with dataset
|
2020-03-19 11:11:07 +01:00 |
miconis
|
679b5869e5
|
implementation of the lookup procedure to take dedup conf from the resource profiles
|
2020-03-18 17:41:56 +01:00 |
Claudio Atzori
|
abe8fb69a2
|
added global properties, moved postprocessing script inside the oozie_app directory
|
2020-03-18 15:43:54 +01:00 |
miconis
|
f32eae5ce9
|
implementation of the spark action for the simrel creation
|
2020-03-18 14:27:49 +01:00 |
Claudio Atzori
|
c7e0730720
|
compress the output produced by migration steps 1 and 2
|
2020-03-18 09:34:57 +01:00 |
Claudio Atzori
|
2f11e37602
|
fixed expansion of path variables
|
2020-03-17 19:41:07 +01:00 |
Claudio Atzori
|
2795b0b096
|
no need to mkdir the all_entities file
|
2020-03-17 17:22:14 +01:00 |
Claudio Atzori
|
19746ad308
|
when reuseContent, reset ${workingPath}/all_entities
|
2020-03-17 17:17:06 +01:00 |
Claudio Atzori
|
2f0c85eeb3
|
updated parameters for regular_all_steps workflow, introduced flag 'reuseContent'
|
2020-03-17 17:04:58 +01:00 |
Miriam Baglioni
|
67ea3cf3ed
|
changed the way to read the file with info on resource or relation. From sequenceFile to textFile
|
2020-03-17 16:32:05 +01:00 |
Miriam Baglioni
|
b4652d018c
|
moved the creation of new dir to common class.
|
2020-03-17 16:31:24 +01:00 |
Claudio Atzori
|
b8290b5851
|
updated parameters for regular_all_steps workflow
|
2020-03-17 15:45:30 +01:00 |
Claudio Atzori
|
4706f24ec5
|
updated parameters for regular_all_steps workflow
|
2020-03-17 15:23:54 +01:00 |
Claudio Atzori
|
aeb01fa353
|
reading from newline delimited json textfiles instead of sequence files
|
2020-03-17 11:57:24 +01:00 |
Miriam Baglioni
|
92f4e0001d
|
Merge branch 'bulktag'
|
2020-03-16 13:33:27 +01:00 |
Miriam Baglioni
|
ab08a37024
|
Merge remote-tracking branch 'upstream/master'
|
2020-03-16 12:45:23 +01:00 |
Claudio Atzori
|
af835f2f98
|
when migrating actionsets from DM cluster, populate the AtomicAction.targetValue when empty (dedup similarities)
|
2020-03-15 18:07:59 +01:00 |
Claudio Atzori
|
9c84e21b87
|
added workflow to migrate latest version of each actionset content from DM to OCEAN cluster, mapping the targetValues from the old protobuf data model to the dhp.OAF datamodel
|
2020-03-13 15:56:52 +01:00 |
Claudio Atzori
|
8fe7ae1482
|
xml formatting
|
2020-03-13 15:53:56 +01:00 |
Przemysław Jacewicz
|
d0c9b0cdd6
|
WIP promote job functions updated
|
2020-03-13 12:36:42 +01:00 |
Przemysław Jacewicz
|
8d9b3c5de2
|
WIP action payload mapping into OAF type moved, (local) graph table name enum created, tests fixed
|
2020-03-13 10:01:39 +01:00 |
Przemysław Jacewicz
|
5cc560c7e5
|
Removed unnecessary dependency on old OAF model
|
2020-03-13 09:57:46 +01:00 |
Sandro La Bruzzo
|
addaaa091f
|
migrate relation from RDD to Dataset
|
2020-03-13 09:13:20 +01:00 |
Przemysław Jacewicz
|
3f24593e51
|
WIP: promote job tests and test resources implementation snapshot
|
2020-03-11 17:06:29 +01:00 |
Przemysław Jacewicz
|
2e996d610f
|
WIP: promote job functions implementation snapshot
|
2020-03-11 17:02:57 +01:00 |
Przemysław Jacewicz
|
cc63cdc9e6
|
WIP: promote job implementation snapshot
|
2020-03-11 17:02:06 +01:00 |
Przemysław Jacewicz
|
69540f6f78
|
Serialization-safe supplier added
|
2020-03-11 16:59:05 +01:00 |
Przemysław Jacewicz
|
e6e214dab5
|
Oaf merge and get strategy added
|
2020-03-11 16:58:17 +01:00 |
Claudio Atzori
|
7b6f0c8756
|
reading graph dump as text files, encoded as newline-delimited JSON records, as indicated in the wiki
|
2020-03-10 17:19:17 +01:00 |
Claudio Atzori
|
60aedb1110
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-10 17:09:44 +01:00 |
Claudio Atzori
|
a3f184fd3f
|
added field websiteurl in related organizations
|
2020-03-10 17:08:58 +01:00 |
Claudio Atzori
|
0e95544495
|
fixed serialization for datasource subjects
|
2020-03-10 17:07:44 +01:00 |
Sandro La Bruzzo
|
7b28783fb4
|
updated unpaywall mapping
|
2020-03-08 17:00:19 +01:00 |
Michele Artini
|
b6efa9d6ab
|
Configuration of the SequenceFile Writer
|
2020-03-05 15:49:14 +01:00 |
Claudio Atzori
|
5e342a555c
|
no need to compute the inverse relClass, fixed text() in xpath expressions
|
2020-03-05 12:51:48 +01:00 |
Claudio Atzori
|
6ec04d4e02
|
specified column used to perform the join operation in the javadoc
|
2020-03-05 12:50:38 +01:00 |
Michele Artini
|
7a2a466161
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-03-04 14:50:59 +01:00 |
Michele Artini
|
755eade2fb
|
fix creation ids
|
2020-03-04 14:49:45 +01:00 |
Claudio Atzori
|
6379f32466
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-04 10:57:06 +01:00 |
Claudio Atzori
|
0233987603
|
introduced post processing step following the hive DB creation/population
|
2020-03-04 10:56:50 +01:00 |
Claudio Atzori
|
1e563bc15e
|
introduced distinct properties driving the resource usage for the XML record creation and the indexing phase
|
2020-03-04 10:55:11 +01:00 |
Claudio Atzori
|
9af3e904be
|
close the SparkSession at the end
|
2020-03-04 10:53:31 +01:00 |
Michele Artini
|
e7167b996a
|
logs and closeable
|
2020-03-04 10:46:36 +01:00 |
Claudio Atzori
|
25ceec29ab
|
code formatting
|
2020-03-04 10:44:24 +01:00 |
Claudio Atzori
|
63c00c5e88
|
fixed typo
|
2020-03-04 10:43:44 +01:00 |
Miriam Baglioni
|
c37f2bd1b5
|
moved some classes to package to make code clearer
|
2020-03-03 16:42:23 +01:00 |
Miriam Baglioni
|
d9d2060561
|
implementation for bulk tagging
|
2020-03-03 16:38:50 +01:00 |
Miriam Baglioni
|
e80f80ca93
|
properties and workflow for new propagation
|
2020-03-02 17:03:31 +01:00 |
Claudio Atzori
|
9cf5ce2e66
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-03-02 17:03:10 +01:00 |
Claudio Atzori
|
bc7cfd5975
|
indexing workflow WIP: fixed projects fundingtree xml conversion, prioritized links between results and projects when limiting them to 100 in the join procedure
|
2020-03-02 17:03:07 +01:00 |
Miriam Baglioni
|
50080c1b3c
|
changed the implementation of addAll method. Before adding all the items in a collection, we check if the accumulator set is not empty
|
2020-03-02 16:41:37 +01:00 |
Miriam Baglioni
|
02815dd2cf
|
update result for community moved in propagationconstants
|
2020-03-02 16:40:56 +01:00 |
Miriam Baglioni
|
95f8c3092f
|
update for new propagation implementation and moving of updateResult for community business logic since the same can be used for result to community from organization and result to community from semrel
|
2020-03-02 16:40:17 +01:00 |
Miriam Baglioni
|
3d63f35dcb
|
implementation of new propagation. Result to community for results linked to given organization. We exploit the hasAuthorInstitution semantic link to discover which results are related to institutions
|
2020-03-02 16:39:03 +01:00 |
Michele Artini
|
4b29a121b0
|
migration using spark in step2
|
2020-03-02 16:12:14 +01:00 |
Michele Artini
|
5445a57102
|
migration using spark in step2
|
2020-03-02 16:11:59 +01:00 |
Miriam Baglioni
|
3a4ccb26c0
|
New properties for the orcid to result propagation through semantic relation
|
2020-02-28 18:26:04 +01:00 |
Miriam Baglioni
|
b50166b9ad
|
None
|
2020-02-28 18:25:28 +01:00 |
Miriam Baglioni
|
550cb21c23
|
None
|
2020-02-28 18:24:39 +01:00 |
Miriam Baglioni
|
b098ee0bae
|
Changed the structure of typed row to contain also list of authors with orcid
|
2020-02-28 18:23:51 +01:00 |
Miriam Baglioni
|
841f5523fe
|
Added information and methods for the new propagation of orcid to result through semrel
|
2020-02-28 18:23:16 +01:00 |
Miriam Baglioni
|
2b7b05fb29
|
New propagation of ORCID to result exploiting the semantic relation connecting them. If R has an author with orcid o, and R is bound by a strong semantic relationship to R1 that has the same author without orcid, then o is also associated to the author in R1
|
2020-02-28 18:22:41 +01:00 |
Miriam Baglioni
|
833c83c694
|
Wrong file name
|
2020-02-28 18:21:01 +01:00 |
Miriam Baglioni
|
a86426776a
|
Changed from Oaf to Result the type of the updateResult method parameter, not to be forced to cast each time
|
2020-02-28 18:20:19 +01:00 |
Sandro La Bruzzo
|
b32655e48e
|
changed code to save intermediate result
|
2020-02-27 10:18:46 +01:00 |
Claudio Atzori
|
60bc2b1a20
|
drop the hive DB before populating it from scratch
|
2020-02-27 10:10:55 +01:00 |
Sandro La Bruzzo
|
f09e065865
|
incremented number of repartition
|
2020-02-26 19:26:19 +01:00 |
Sandro La Bruzzo
|
071f5c3e52
|
fixed NPE
|
2020-02-26 15:42:20 +01:00 |
Sandro La Bruzzo
|
a1a6fc8315
|
fixed NPE
|
2020-02-26 15:42:13 +01:00 |
Sandro La Bruzzo
|
1edf02a3ce
|
added log
|
2020-02-26 15:25:03 +01:00 |
Sandro La Bruzzo
|
c3ecabd8e8
|
fixed NPE
|
2020-02-26 14:40:02 +01:00 |
Sandro La Bruzzo
|
5d0f46651b
|
fixed NPE
|
2020-02-26 14:31:34 +01:00 |
Sandro La Bruzzo
|
bc342bf73a
|
fixed wrong generation type in summary
|
2020-02-26 12:49:47 +01:00 |
Sandro La Bruzzo
|
3112e21858
|
fixed typo
|
2020-02-26 12:22:43 +01:00 |
Sandro La Bruzzo
|
119ae6eef5
|
fixed wrong loop in the workflow
|
2020-02-26 12:18:50 +01:00 |
Sandro La Bruzzo
|
7936583a3d
|
added generation of Scholix collection
|
2020-02-26 12:09:06 +01:00 |
Przemysław Jacewicz
|
02db368dc5
|
Merge branch 'master' into przemyslawjacewicz_actionmanager_impl_prototype
|
2020-02-26 11:50:20 +01:00 |
Sandro La Bruzzo
|
2ef3705b2c
|
Added Provision workflow
|
2020-02-26 10:51:35 +01:00 |
Michele Artini
|
689908b2e9
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-02-25 16:00:51 +01:00 |
Michele Artini
|
93665773ea
|
Fixed a problem with JavaRDD Union
|
2020-02-25 15:59:21 +01:00 |
Sandro La Bruzzo
|
b021b8a2e1
|
Added index wf
|
2020-02-24 10:15:55 +01:00 |
Claudio Atzori
|
6a73fd5da5
|
in order to reuse the same XmlRecordFactory across different tasks, the state of contexts must be one per record built
|
2020-02-21 09:17:19 +01:00 |
Michele Artini
|
d49cd2fdc6
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-02-20 11:21:54 +01:00 |
Miriam Baglioni
|
3f941a2af4
|
Merge branch 'master' into propagationCommunityToResult
|
2020-02-19 18:05:22 +01:00 |
Miriam Baglioni
|
b2bdc9b99b
|
merging project to result propagation logic to master
|
2020-02-19 18:04:59 +01:00 |
Miriam Baglioni
|
a153a07997
|
none
|
2020-02-19 18:03:13 +01:00 |
Miriam Baglioni
|
d0279af630
|
start to implement the business logic
|
2020-02-19 17:59:24 +01:00 |
Miriam Baglioni
|
5f63ab1416
|
to query the information system to get the list of communities up to now. It will have a more general usage when introducing bulk tagging
|
2020-02-19 17:59:02 +01:00 |
Miriam Baglioni
|
5ceb174d24
|
Merge branch 'master' into propagationCommunityToResult
|
2020-02-19 17:13:38 +01:00 |
Miriam Baglioni
|
e8af7a6b64
|
Merge remote-tracking branch 'upstream/master'
|
2020-02-19 17:03:10 +01:00 |
Miriam Baglioni
|
79ff79b0cd
|
propagation of result to community through semantic relation: C -> R and R -> isSupplementedBy R1 => C -> R1
|
2020-02-19 17:02:39 +01:00 |
Claudio Atzori
|
5e5e32cb48
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-02-19 16:56:52 +01:00 |
Claudio Atzori
|
33185fd0b7
|
ISLookupClientFactory moved in dhp-common
|
2020-02-19 16:56:38 +01:00 |
Michele Artini
|
5d3739b5cf
|
migration of claims
|
2020-02-19 15:11:17 +01:00 |
Miriam Baglioni
|
ab84163bb3
|
added set accumulator in TypedRow and used it to accumulate country information in Country Propagation
|
2020-02-19 15:02:50 +01:00 |
Miriam Baglioni
|
bb0fdf5e0a
|
fix wrong source target in new relation
|
2020-02-19 15:00:46 +01:00 |
Miriam Baglioni
|
9e1678ccf8
|
fix error in workflow name
|
2020-02-19 14:59:24 +01:00 |
Miriam Baglioni
|
8aa3b4d7c0
|
adding to propagation constants the ones needed for propagation of project to result and addition of new accumulator Set in typed row to collect values of a type
|
2020-02-19 14:55:54 +01:00 |
Miriam Baglioni
|
7167673a58
|
implementation and configuration for propagation of project to result through semantic relation: P -> R1 and R1 -> supplemented by -> R2 => P -> R2
|
2020-02-19 14:54:18 +01:00 |
Michele Artini
|
173f1df1e5
|
saved a query for openaire production database
|
2020-02-19 10:15:08 +01:00 |
Sandro La Bruzzo
|
9a2d74ac82
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-02-19 10:13:45 +01:00 |
Sandro La Bruzzo
|
e5d7cdf422
|
fixed sql query
|
2020-02-19 10:13:36 +01:00 |
Sandro La Bruzzo
|
2b8675462f
|
refactoring code
|
2020-02-19 10:07:08 +01:00 |
Miriam Baglioni
|
b81e6af429
|
added config for new propagation
|
2020-02-18 17:30:44 +01:00 |
Miriam Baglioni
|
b736a9581c
|
changed relclass and reltype in relation specification for country propagation and implementation of propagation of result affiliation through institutional repositories
|
2020-02-18 17:27:28 +01:00 |
Miriam Baglioni
|
ed262293a6
|
aligned to new snapshot version 1.1.6
|
2020-02-18 17:25:32 +01:00 |
Miriam Baglioni
|
2688a89c21
|
changed relclass and reltype in relation specification
|
2020-02-18 17:24:40 +01:00 |
Miriam Baglioni
|
c0022fec9f
|
moved on upper package to serve other propagations
|
2020-02-18 17:24:11 +01:00 |
Miriam Baglioni
|
e0a777028a
|
fix problem in parameters
|
2020-02-18 17:23:34 +01:00 |
Claudio Atzori
|
ed76521d9b
|
removed stale test resources, will be re-added later on
|
2020-02-18 11:51:08 +01:00 |
Claudio Atzori
|
0f364605ff
|
removed stale tests, need to reimplement them anyway
|
2020-02-18 11:48:19 +01:00 |
Miriam Baglioni
|
5868ff8a86
|
synch fork with master
|
2020-02-17 18:22:27 +01:00 |
Przemysław Jacewicz
|
958f0693d6
|
WIP: logic for promoting action sets added
|
2020-02-17 18:19:19 +01:00 |
Miriam Baglioni
|
18e4092d5c
|
change name of properties dir
|
2020-02-17 18:07:06 +01:00 |
Miriam Baglioni
|
bd0e504b42
|
changes to the wf configuration
|
2020-02-17 18:04:15 +01:00 |
Miriam Baglioni
|
3a9d723655
|
adding default parameters in code
|
2020-02-17 16:30:52 +01:00 |