Miriam Baglioni
|
a01800224c
|
-
|
2020-06-11 13:02:04 +02:00 |
Miriam Baglioni
|
356dd582a3
|
map construction moved in class
|
2020-06-11 12:59:22 +02:00 |
Alessia Bardi
|
e79943965b
|
Fixes #5604: field oamandatepublications in XML
|
2020-06-11 12:49:31 +02:00 |
Michele Artini
|
a41e0cb648
|
missing landingPage urls in instances
|
2020-06-11 12:28:34 +02:00 |
Michele Artini
|
04fdcacd83
|
results with all joined entities
|
2020-06-11 11:25:18 +02:00 |
Michele Artini
|
99f88e1cb8
|
fixed generation entities from claims
|
2020-06-11 10:51:57 +02:00 |
Miriam Baglioni
|
db27663750
|
-
|
2020-06-11 10:49:01 +02:00 |
Miriam Baglioni
|
bb9f21d0e7
|
job test for class producing first step of results dump
|
2020-06-11 10:20:05 +02:00 |
Claudio Atzori
|
d1d92c4d8c
|
fixed integration of claims in the graph
|
2020-06-11 10:12:00 +02:00 |
Claudio Atzori
|
953da4a427
|
Merge branch 'master' into graph_cleaning
|
2020-06-10 21:36:56 +02:00 |
Claudio Atzori
|
f1bce64391
|
WIP: graph cleaner implementation
|
2020-06-10 21:36:31 +02:00 |
Claudio Atzori
|
67c7b31ba6
|
Merge branch 'master' into graph_cleaning
|
2020-06-10 15:00:35 +02:00 |
Claudio Atzori
|
3ebf81d2b0
|
Merge pull request 'oaf-store-interpretation' (#21) from oaf-store-interpretation into master
Looks good, thanks Michele!
|
2020-06-10 14:58:09 +02:00 |
Michele Artini
|
5869cb76b3
|
reformatting
|
2020-06-10 12:11:16 +02:00 |
Michele Artini
|
c08e66e01e
|
fixed a workflow parameter
|
2020-06-10 10:11:56 +02:00 |
Michele Artini
|
7177a32d75
|
import of invisible stores
|
2020-06-10 10:04:00 +02:00 |
Claudio Atzori
|
ce12f236bb
|
disabled test, need to update the joined_entity.json file
|
2020-06-09 20:07:36 +02:00 |
Claudio Atzori
|
a2fdf85ba1
|
WIP: graph cleaner implementation
|
2020-06-09 19:52:53 +02:00 |
Alessia Bardi
|
4551c1082f
|
mapping csv for orcid
|
2020-06-09 18:08:47 +02:00 |
Alessia Bardi
|
2d3f7d1eb4
|
fixed log classes to make the ORCID test run
|
2020-06-09 18:07:14 +02:00 |
Alessia Bardi
|
a3a6755d58
|
mapping csv for Unpaywall
|
2020-06-09 17:45:44 +02:00 |
Claudio Atzori
|
d9f33582c5
|
WIP: graph cleaner implementation
|
2020-06-09 17:20:40 +02:00 |
Alessia Bardi
|
f3b033cf09
|
added csv line for funders from Crossref
|
2020-06-09 17:08:26 +02:00 |
Alessia Bardi
|
79969d78b9
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-06-09 17:05:39 +02:00 |
Alessia Bardi
|
fc4d220964
|
updated function name for SNSF
|
2020-06-09 17:05:31 +02:00 |
Michele Artini
|
baaa55f4a3
|
use of pace to calculate trusts
|
2020-06-09 16:01:31 +02:00 |
Alessia Bardi
|
33b130ec43
|
Mapping instructions for MAG
|
2020-06-09 15:57:15 +02:00 |
Miriam Baglioni
|
206abba48c
|
merge branch with fork master
|
2020-06-09 15:41:14 +02:00 |
Miriam Baglioni
|
a089db18f1
|
workflow and parameters to execute the dump
|
2020-06-09 15:39:38 +02:00 |
Miriam Baglioni
|
6bbe27587f
|
new classes to execute the dump for products associated to community, enrich each result with project information and assign the result to each community it belongs to
|
2020-06-09 15:39:03 +02:00 |
Miriam Baglioni
|
5121cbaf6a
|
new classes for external dump. Only classes functional to dump products
|
2020-06-09 15:37:46 +02:00 |
Alessia Bardi
|
d6de406e11
|
fixed classid for subjects
|
2020-06-09 14:43:34 +02:00 |
Alessia Bardi
|
f072125152
|
map volume and issue in journal information from MAG
|
2020-06-09 14:32:10 +02:00 |
Alessia Bardi
|
b7cb1163ea
|
identifiers always start with 50
|
2020-06-09 10:39:11 +02:00 |
Alessia Bardi
|
181f52b9bc
|
Added mapping table for Crossref
|
2020-06-08 19:33:47 +02:00 |
Alessia Bardi
|
9fd25887f7
|
Result identifiers all start with 50|
|
2020-06-08 19:32:24 +02:00 |
Alessia Bardi
|
16cb073b15
|
set the instance dateofacceptance with the Crossref createdDate in case the issuedDate is blank
|
2020-06-08 19:06:03 +02:00 |
Michele Artini
|
bb659d870c
|
join simrels
|
2020-06-08 16:29:01 +02:00 |
Michele Artini
|
81e85465d8
|
join simrels
|
2020-06-08 16:26:16 +02:00 |
Claudio Atzori
|
3d871c6651
|
Merge branch 'master' into graph_cleaning
|
2020-06-08 15:23:24 +02:00 |
Claudio Atzori
|
25a093b1a4
|
integrated changes from master
|
2020-06-08 15:04:00 +02:00 |
Sandro La Bruzzo
|
e34e7d6728
|
merge DOIBoost
|
2020-06-08 08:32:22 +02:00 |
Sandro La Bruzzo
|
e46e2a4776
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-06-08 08:17:14 +02:00 |
Spyros Zoupanos
|
3576dd186b
|
Adding hive timeout as workflow parameter
|
2020-06-05 22:29:54 +03:00 |
Claudio Atzori
|
b2349659cf
|
WIP: graph property fixing implementation
|
2020-06-05 18:37:38 +02:00 |
Michele Artini
|
a73973a74b
|
partial implementation of broker events generation
|
2020-06-05 11:43:00 +02:00 |
Michele Artini
|
7e82996e7c
|
partial implementation of broker events generation
|
2020-06-04 17:10:43 +02:00 |
Sandro La Bruzzo
|
b57e8ba374
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-06-04 14:39:41 +02:00 |
Sandro La Bruzzo
|
7ac1ba2e35
|
improvement DOIBoost
|
2020-06-04 14:39:20 +02:00 |
Michele Artini
|
97177d7f7b
|
partial refactoring
|
2020-06-04 10:26:34 +02:00 |
Sandro La Bruzzo
|
13815d5d13
|
improvement DOIBoost
|
2020-06-01 17:52:12 +02:00 |
Claudio Atzori
|
05f269a1c0
|
kryo based parallel implementation of CreateRelatedEntitiesJob_phase2, now works by OafType; introduced custom aggregator in AdjacencyListBuilderJob
|
2020-06-01 00:32:42 +02:00 |
Claudio Atzori
|
5e23fb3a74
|
code formatting
|
2020-05-30 10:52:56 +02:00 |
Claudio Atzori
|
54ca8ed6c3
|
uniformed param name (isLookupUrl), Vocab model classes defined as Serializable
|
2020-05-29 18:17:30 +02:00 |
Claudio Atzori
|
1577bd5b8b
|
added IsLookupUrl to the raw_db workflow parameters
|
2020-05-29 16:18:16 +02:00 |
Claudio Atzori
|
91d78b825b
|
Merge pull request 'import from db using is vocabularies' (#17) from result_pids into master
Looks good, thanks Michele!
|
2020-05-29 16:02:40 +02:00 |
Michele Artini
|
adb798faa5
|
import from db using is vocabularies
|
2020-05-29 12:03:51 +02:00 |
Claudio Atzori
|
6f5f498c78
|
restored common properties driving executor-cores and executor-memory in join_organization_relations wf node
|
2020-05-29 11:22:00 +02:00 |
Claudio Atzori
|
b2f9564f13
|
WIP: fixed PrepareRelationsJob; parallel implementation of CreateRelatedEntitiesJob_phase2, now works by OafType; introduced custom aggregator in AdjacencyListBuilderJob
|
2020-05-29 10:58:15 +02:00 |
Miriam Baglioni
|
dfa4997a4f
|
removed commented code
|
2020-05-29 10:45:18 +02:00 |
Miriam Baglioni
|
6f1eea28b6
|
changed message in log
|
2020-05-29 10:41:39 +02:00 |
Sandro La Bruzzo
|
b87b3ddb6b
|
changed mapping ORCIDToOAF
|
2020-05-29 09:32:04 +02:00 |
Miriam Baglioni
|
8b6e886fb6
|
added new resource for testing
|
2020-05-28 23:54:31 +02:00 |
Miriam Baglioni
|
6989fb9c8a
|
changed the project test according to the newly introduced join with the db project codes
|
2020-05-28 23:53:24 +02:00 |
Miriam Baglioni
|
782984d8e5
|
added needed parameter
|
2020-05-28 23:52:41 +02:00 |
Miriam Baglioni
|
01f7876595
|
fix issue with flatMap - the return type must not be null
|
2020-05-28 23:50:32 +02:00 |
Claudio Atzori
|
a57965a3ea
|
limiting the dimensions of outliers
|
2020-05-28 17:36:37 +02:00 |
Miriam Baglioni
|
773735f870
|
added the path to the file containing the projects code from the db
|
2020-05-28 17:30:45 +02:00 |
Miriam Baglioni
|
6a15067a64
|
added one step in the workflow
|
2020-05-28 17:30:09 +02:00 |
Miriam Baglioni
|
5309a99a70
|
modified the PrepareProjects to consider those in the db
|
2020-05-28 17:29:53 +02:00 |
Miriam Baglioni
|
b737ed8236
|
added part to read projects from the openaire db to filter out those in the csv file that are not in the db
|
2020-05-28 17:29:21 +02:00 |
Claudio Atzori
|
821be1f8b6
|
experimental implementation of custom aggregation using kryo encoders
|
2020-05-28 13:53:13 +02:00 |
Claudio Atzori
|
83504ecace
|
limiting the maximum number of authors allowed in XML records to MAX_AUTHORS = 200; authors with ORCID can exceed that limit
|
2020-05-28 13:52:30 +02:00 |
Claudio Atzori
|
ef11593068
|
JoinedEntity.links defined as empty list by default
|
2020-05-28 13:50:44 +02:00 |
Claudio Atzori
|
5dea155a87
|
increased number of partitions produced by the join_all_entities phase as well as spark.sql.shuffle.partitions in adjancency_lists phase
|
2020-05-28 13:49:59 +02:00 |
Miriam Baglioni
|
35b7279147
|
changed test because data are saved as SequenceFile now, and because of the group by the number of produced updates decreases
|
2020-05-28 10:26:12 +02:00 |
Miriam Baglioni
|
37c155b86a
|
merge branch with fork master
|
2020-05-28 10:09:51 +02:00 |
Miriam Baglioni
|
df44db686a
|
refactoring
|
2020-05-28 10:07:00 +02:00 |
Miriam Baglioni
|
87b07f4af8
|
removed unused variables
|
2020-05-28 10:05:43 +02:00 |
Miriam Baglioni
|
1060977272
|
added fs actions to remove and then create the workingDir
|
2020-05-28 10:04:36 +02:00 |
Miriam Baglioni
|
96d1a3c431
|
deleted the file where to store the csv files
|
2020-05-28 10:04:10 +02:00 |
Miriam Baglioni
|
669c05c771
|
added groupBy before creating Actions
|
2020-05-28 10:00:45 +02:00 |
Sandro La Bruzzo
|
02f90eeb07
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-28 09:58:32 +02:00 |
Sandro La Bruzzo
|
7d29b61c62
|
code refactor
|
2020-05-28 09:57:46 +02:00 |
Claudio Atzori
|
fdd54bad1c
|
code formatting
|
2020-05-27 19:31:54 +02:00 |
Miriam Baglioni
|
1855453434
|
changed the outputdir of the last step
|
2020-05-27 17:59:36 +02:00 |
Claudio Atzori
|
b9b1bc9967
|
Merge branch 'master' into provision_indexing
|
2020-05-27 12:55:20 +02:00 |
Claudio Atzori
|
aac1515b58
|
Merge pull request 'result_pids without conflicts ???' (#16) from result_pids into master
Looks good, thanks Michele
|
2020-05-27 12:54:52 +02:00 |
Michele Artini
|
f5ce7d76e1
|
resolve conflicts
|
2020-05-27 12:49:17 +02:00 |
Claudio Atzori
|
cfd753217c
|
repartition the join_entities in 24k files
|
2020-05-27 12:44:01 +02:00 |
Claudio Atzori
|
2f1a623d09
|
sync from master branch
|
2020-05-27 12:39:58 +02:00 |
Claudio Atzori
|
9e4ec1543b
|
updated test
|
2020-05-27 12:38:42 +02:00 |
Claudio Atzori
|
8047d16dd9
|
added RDD based adjacency list creation procedure
|
2020-05-27 12:38:12 +02:00 |
Claudio Atzori
|
f057dcdf65
|
limit the max number of externalreferences to MAX_EXTERNAL_ENTITIES
|
2020-05-27 12:37:33 +02:00 |
Michele Artini
|
b81f2741d2
|
xquery
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
a25598140a
|
result pids (new xpaths + IS vocabularies)
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
7a7272d9ec
|
result pids (new xpaths + IS vocabularies)
|
2020-05-27 12:10:20 +02:00 |
Michele Artini
|
3ceb2d2853
|
match terms with vocabularies
|
2020-05-27 11:34:13 +02:00 |
Claudio Atzori
|
4e36d689dd
|
fixed XML serialization for children sub-elements (duplicates & externalreferences)
|
2020-05-26 18:30:40 +02:00 |
Miriam Baglioni
|
92e3a52e91
|
merge branch with fork master
|
2020-05-26 15:57:51 +02:00 |
Michele Artini
|
c15d997925
|
xquery
|
2020-05-26 13:13:17 +02:00 |
Michele Artini
|
c6af36496a
|
result pids (new xpaths + IS vocabularies)
|
2020-05-26 13:11:09 +02:00 |
Michele Artini
|
093f1aff03
|
result pids (new xpaths + IS vocabularies)
|
2020-05-26 13:06:55 +02:00 |
Claudio Atzori
|
b8e541a454
|
fixing repeated organization.websiteurl in organization entities (#5645) as well as project.ecinternationalorganizationeurinterests
|
2020-05-26 10:30:09 +02:00 |
Claudio Atzori
|
55595d7235
|
HACK: patch NULL values with defaults found in result.datainfo.deletedbyinference and result.context
|
2020-05-26 10:28:35 +02:00 |
Claudio Atzori
|
7b288a94cb
|
code formatting
|
2020-05-26 09:54:13 +02:00 |
Miriam Baglioni
|
54d869e618
|
merge upstream
|
2020-05-26 09:22:04 +02:00 |
Miriam Baglioni
|
eea07f4c42
|
refactoring
|
2020-05-26 09:21:49 +02:00 |
Sandro La Bruzzo
|
79c26382da
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-26 09:15:50 +02:00 |
Sandro La Bruzzo
|
25f52e19a4
|
implemented generation of ActionSet
|
2020-05-26 09:15:33 +02:00 |
Michele Artini
|
d6aada4957
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-26 08:44:31 +02:00 |
Michele Artini
|
b1546605e3
|
updated version of a dependency
|
2020-05-26 08:44:15 +02:00 |
Claudio Atzori
|
7582532e73
|
[maven-release-plugin] prepare for next development iteration
|
2020-05-25 19:48:18 +02:00 |
Claudio Atzori
|
01c2e93395
|
[maven-release-plugin] prepare release dhp-1.2.1
|
2020-05-25 19:48:14 +02:00 |
miconis
|
da1e5cf557
|
implementation of the result title merge. main title with higher trust, distinct between the others
|
2020-05-25 18:02:57 +02:00 |
Miriam Baglioni
|
d3d36647d2
|
merge upstream
|
2020-05-25 10:38:22 +02:00 |
Miriam Baglioni
|
74215f6d9f
|
refactoring
|
2020-05-25 10:38:16 +02:00 |
Miriam Baglioni
|
dbde2d243a
|
changed due to move of PacePerson from dhp-graph-mapper to dhp-common
|
2020-05-25 10:35:39 +02:00 |
Miriam Baglioni
|
f754c424bd
|
changed logic to compute PacePerson only once for each Author to be enriched
|
2020-05-25 10:35:02 +02:00 |
Miriam Baglioni
|
8f51af4e9b
|
added PacePerson to get name surname for authors having only fullname set
|
2020-05-25 10:34:30 +02:00 |
Miriam Baglioni
|
b258f99ece
|
fix for issue that duplicated result
|
2020-05-25 10:26:48 +02:00 |
Miriam Baglioni
|
8f6ce970f9
|
moved PacePerson to dhp-common to avoid conflict in dependency with graph-mapper
|
2020-05-25 10:25:55 +02:00 |
Claudio Atzori
|
de108f54d6
|
code formatting
|
2020-05-23 10:21:19 +02:00 |
Claudio Atzori
|
6b56cae57d
|
added mapping for bestaccessrights
|
2020-05-23 09:57:39 +02:00 |
Claudio Atzori
|
7181807e64
|
code formatting
|
2020-05-23 09:51:48 +02:00 |
Sandro La Bruzzo
|
2408083566
|
implemented filtering step
|
2020-05-23 08:46:49 +02:00 |
Sandro La Bruzzo
|
244f6e50cf
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-22 20:52:15 +02:00 |
Sandro La Bruzzo
|
147dd389bf
|
minor fix
|
2020-05-22 20:51:42 +02:00 |
Miriam Baglioni
|
0d1ec1913f
|
added fix to avoid duplication of results
|
2020-05-22 18:42:25 +02:00 |
miconis
|
5d7ac78c41
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-22 17:25:08 +02:00 |
miconis
|
0fd0c7d725
|
reimplementation of the sim between two authors. now it takes into account both name and surname. threshold incremented to 1.0 if the name is too short
|
2020-05-22 17:24:57 +02:00 |
Michele Artini
|
eb606dc1e2
|
partial implementation of events with rels
|
2020-05-22 17:17:41 +02:00 |
Miriam Baglioni
|
29066a6b46
|
applied code cleanup
|
2020-05-22 15:38:50 +02:00 |
Miriam Baglioni
|
8610ad5142
|
added groupby id to fix multiple result with same id at join step
|
2020-05-22 15:32:55 +02:00 |
Miriam Baglioni
|
1e44703e3e
|
merge upstream
|
2020-05-22 15:30:07 +02:00 |
Miriam Baglioni
|
ac8025f469
|
-
|
2020-05-22 15:29:41 +02:00 |
Miriam Baglioni
|
50ad83b97f
|
-
|
2020-05-22 15:27:19 +02:00 |
Miriam Baglioni
|
473c6d3a23
|
produces AtomicActions instead of Projects
|
2020-05-22 15:26:57 +02:00 |
Sandro La Bruzzo
|
72278b9375
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-22 15:17:13 +02:00 |
Sandro La Bruzzo
|
22936d0877
|
Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost
|
2020-05-22 15:15:17 +02:00 |
Sandro La Bruzzo
|
9fbb221457
|
completed mapping of UnpayWall and ORCID
|
2020-05-22 15:15:09 +02:00 |
Miriam Baglioni
|
70389b0a30
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-22 13:53:23 +02:00 |
Miriam Baglioni
|
4308f31165
|
added fix to make test run
|
2020-05-22 13:13:01 +02:00 |
Claudio Atzori
|
946598cfba
|
Merge branch 'master' into provision_indexing
|
2020-05-22 12:35:41 +02:00 |
Claudio Atzori
|
3cf2796ac6
|
code formatting
|
2020-05-22 12:34:00 +02:00 |
Michele Artini
|
dc4621b3cb
|
filter ORCID and MAG identifiers
|
2020-05-22 12:25:01 +02:00 |
Michele Artini
|
9f2d0f1b08
|
filter ORCID and MAG identifiers
|
2020-05-22 11:00:27 +02:00 |
Michele Artini
|
9de71e54a8
|
filter ORCID and MAG identifiers
|
2020-05-22 10:47:39 +02:00 |
Michele Artini
|
c5f7e17348
|
author fullnames
|
2020-05-22 10:08:02 +02:00 |
Claudio Atzori
|
ad40470040
|
Merge branch 'master' into provision_indexing
|
2020-05-22 08:51:22 +02:00 |
Claudio Atzori
|
925d933204
|
making XmlRecordFactory immune to graph encoding changes (mostly to avoid NPEs)
|
2020-05-22 08:50:44 +02:00 |
Claudio Atzori
|
b33dd58be4
|
replaced parameter 'reuseRecords' with 'resumeFrom', allowing to restart the provision workflow execution from any step, useful for manual submissions or debugging
|
2020-05-22 08:50:06 +02:00 |
Michele Artini
|
c7ca3cf35b
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-21 16:48:20 +02:00 |
Michele Artini
|
3e34517479
|
partial implementation of events with rels
|
2020-05-21 16:47:53 +02:00 |
Miriam Baglioni
|
eae12a6586
|
Merge branch 'master' into dhp_oaf_model
|
2020-05-21 16:31:22 +02:00 |
Miriam Baglioni
|
6750075fbd
|
merge upstream
|
2020-05-21 16:31:09 +02:00 |
Miriam Baglioni
|
4589c428b1
|
generate action sets and saves them in the hdfs path for the actions sets
|
2020-05-21 16:30:39 +02:00 |
miconis
|
8b35e0e7f0
|
reimplementation of the author merging in deduprecord creation. implementation of the test class. minor changes
|
2020-05-21 12:02:44 +02:00 |
miconis
|
8bbd1d0501
|
reimplementation of the author merging in deduprecord creation. implementation of the test class.
|
2020-05-21 11:52:14 +02:00 |
Michele Artini
|
e43d4d7778
|
added a coalesce in sql query
|
2020-05-21 11:08:07 +02:00 |
Claudio Atzori
|
dbfb9c19fe
|
minor changes
|
2020-05-21 10:00:14 +02:00 |
Michele Artini
|
b3bcbb3129
|
resolve name of organization countries
|
2020-05-21 08:41:32 +02:00 |
Enrico Ottonello
|
1109d3b3fc
|
Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost
|
2020-05-21 00:41:27 +02:00 |
Enrico Ottonello
|
869a53040e
|
save to text file format
|
2020-05-21 00:41:21 +02:00 |
Sandro La Bruzzo
|
5818abaab4
|
fixed Crossref Mapping
|
2020-05-20 17:05:46 +02:00 |
Claudio Atzori
|
da4267d0fe
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-05-20 14:58:22 +02:00 |
Claudio Atzori
|
d7d2a0637f
|
added extra parameters to the provision indexing workflow
|
2020-05-20 14:55:38 +02:00 |
Miriam Baglioni
|
055eec5a77
|
added resource for prepare project test
|
2020-05-20 13:54:10 +02:00 |
Miriam Baglioni
|
9079bc1f61
|
-
|
2020-05-20 13:53:32 +02:00 |
Miriam Baglioni
|
67ba4fde57
|
added test for prepare projects step
|
2020-05-20 13:53:08 +02:00 |
Miriam Baglioni
|
5e0e554000
|
Merge branch 'master' into dhp_oaf_model
|
2020-05-20 10:57:30 +02:00 |
Miriam Baglioni
|
76f3f73caa
|
merge upstream
|
2020-05-20 10:31:40 +02:00 |
Miriam Baglioni
|
3c0eb12d3e
|
removed the not zipped files
|
2020-05-20 10:31:05 +02:00 |
Miriam Baglioni
|
c0d9e02340
|
zipped test resources that are too big
|
2020-05-20 10:30:25 +02:00 |
Miriam Baglioni
|
5e9c9fa87c
|
tests
|
2020-05-20 10:29:57 +02:00 |
Miriam Baglioni
|
faed7521bf
|
added resources for testing
|
2020-05-20 10:29:29 +02:00 |
Miriam Baglioni
|
75491482de
|
added a new preparation step to replicate each project for the programme it is associated to
|
2020-05-20 10:28:56 +02:00 |
Miriam Baglioni
|
eb0e47ba53
|
parameters for h2020 programme
|
2020-05-20 10:26:44 +02:00 |
Sandro La Bruzzo
|
b771d67e9d
|
next step of MAG conversion implemented
|
2020-05-20 08:14:03 +02:00 |
Miriam Baglioni
|
08218d2f3f
|
new workflow with added steps
|
2020-05-19 18:44:25 +02:00 |
Miriam Baglioni
|
457293ccc0
|
test for the various steps of project update with programme
|
2020-05-19 18:43:42 +02:00 |
Miriam Baglioni
|
9447d78ef3
|
added preparation classes
|
2020-05-19 18:42:50 +02:00 |
Michele Artini
|
85ca5622d4
|
partial implementation of generation of simple events
|
2020-05-19 16:17:35 +02:00 |
Claudio Atzori
|
0bdfbb0a57
|
reintroduced RDD based relation cut off procedure
|
2020-05-19 15:02:21 +02:00 |
Enrico Ottonello
|
934ad570e0
|
joined summaries and activities dataset
|
2020-05-19 12:57:21 +02:00 |
Enrico Ottonello
|
ca722d4d18
|
merged
|
2020-05-19 09:43:12 +02:00 |
Enrico Ottonello
|
7362bc3e9d
|
workflow to generate seq(doi,AuthorList)
|
2020-05-19 09:34:44 +02:00 |
Sandro La Bruzzo
|
8c95b50f26
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-19 09:25:04 +02:00 |
Sandro La Bruzzo
|
486e850bcc
|
next step of MAG conversion implemented
|
2020-05-19 09:24:45 +02:00 |
Enrico Ottonello
|
d4e9075f22
|
Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost
|
2020-05-18 19:51:36 +02:00 |
Enrico Ottonello
|
fc80e8c7de
|
added accumulator; last modified date of the record is added to saved data; lambda file is partitioned into 20 parts before starting downloading
|
2020-05-18 19:51:29 +02:00 |
Claudio Atzori
|
f3bc8aed31
|
lifted memory requirements for country propagation wf
|
2020-05-18 15:29:10 +02:00 |
Miriam Baglioni
|
b71fbb68b1
|
removed the removeOutputDir command from code. Relations are written in Append mode. Erasing the output dir meant removing all the relations computed in the previous steps
|
2020-05-18 13:57:20 +02:00 |
Miriam Baglioni
|
629af7cb79
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-18 13:07:36 +02:00 |
Miriam Baglioni
|
f0f14caf99
|
removed script files for shell actions not performed
|
2020-05-18 13:06:16 +02:00 |
Miriam Baglioni
|
23bbac7d7c
|
-
|
2020-05-18 13:05:03 +02:00 |
Miriam Baglioni
|
4f1ff7ba73
|
added dependency to org.apache.commons common-csv
|
2020-05-18 13:04:39 +02:00 |
Miriam Baglioni
|
abc45f2708
|
added dnet-45 HttpConnector and related Classes, produced the POJO for projects and programme
|
2020-05-18 13:04:06 +02:00 |
Claudio Atzori
|
ef9a9a9f1a
|
remove the output path when starting
|
2020-05-15 22:34:19 +02:00 |
Enrico Ottonello
|
0b29bb7e3b
|
spark job to download orcid record modified after a fixed date
|
2020-05-15 19:49:26 +02:00 |
Miriam Baglioni
|
5a648016ef
|
parameters from the GetFile class
|
2020-05-15 18:18:50 +02:00 |
Miriam Baglioni
|
83c262a483
|
workflow to download the files
|
2020-05-15 18:18:31 +02:00 |
Miriam Baglioni
|
22cb9e0da7
|
simple code to get file from URL
|
2020-05-15 18:18:01 +02:00 |
Claudio Atzori
|
7838f2c63f
|
init the empty list for author pids mapped from OAF
|
2020-05-15 17:06:01 +02:00 |
Claudio Atzori
|
82b615ab33
|
NPE check
|
2020-05-15 16:04:46 +02:00 |
Miriam Baglioni
|
e26a67c3eb
|
merge with upstream
|
2020-05-15 15:53:05 +02:00 |
Claudio Atzori
|
7a89507ab1
|
code formatting
|
2020-05-15 15:16:54 +02:00 |
Miriam Baglioni
|
5ec8c49ad5
|
removed serialization points
|
2020-05-15 12:49:58 +02:00 |
Claudio Atzori
|
1d35836a58
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-05-15 12:26:31 +02:00 |
Claudio Atzori
|
cfc8948717
|
fixed mapping OdfToGraph: pick the correct element to map author pids and author affiliations; extended mapping Oaf2Graph: added support for author pids
|
2020-05-15 12:26:16 +02:00 |
Michele Artini
|
2a4e68a292
|
events recognition
|
2020-05-15 12:25:37 +02:00 |
Claudio Atzori
|
a832658296
|
code formatting
|
2020-05-15 10:21:09 +02:00 |
Claudio Atzori
|
50d6a2ad3c
|
added output directory removal in the blacklist spark actions; included common global properties in blacklist's workflow.xml
|
2020-05-15 09:53:37 +02:00 |
Claudio Atzori
|
18f46e47b9
|
added relations to the graph2hive import workflow
|
2020-05-15 09:34:48 +02:00 |
Claudio Atzori
|
9d028ffe1c
|
cleanup
|
2020-05-15 09:28:55 +02:00 |
Claudio Atzori
|
fd62359538
|
cleanup
|
2020-05-15 09:28:15 +02:00 |
Claudio Atzori
|
eb64335a54
|
parallel implementation for graph Hive importer
|
2020-05-15 09:05:26 +02:00 |
Miriam Baglioni
|
94571c9a51
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-14 18:29:55 +02:00 |
Miriam Baglioni
|
f25db01664
|
changed in the constant from propagationconstants to modelconstants
|
2020-05-14 18:29:24 +02:00 |
Miriam Baglioni
|
d05630d979
|
removed the constants added in ModelConstants
|
2020-05-14 18:22:50 +02:00 |
Claudio Atzori
|
f044d09315
|
revised mapping: more accurate mapping for name/surname from datacite format; improved mapping of null values
|
2020-05-14 15:07:24 +02:00 |
Miriam Baglioni
|
e7eb4f377e
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-14 10:34:17 +02:00 |
Miriam Baglioni
|
8828458acf
|
minor changes
|
2020-05-14 10:34:12 +02:00 |
Claudio Atzori
|
ab37953332
|
added global properties in wf definitions to avoid repeating name-node and job-tracker in the (many) distcp actions; reintroduced output directory removal at the beginning of each spark action
|
2020-05-14 10:25:41 +02:00 |
Claudio Atzori
|
12bfa6702e
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-05-13 17:01:17 +02:00 |
Claudio Atzori
|
5ecacad70a
|
fixed default resource typing in Oaf/Odf mapping
|
2020-05-13 17:01:11 +02:00 |
Enrico Ottonello
|
12756f9d41
|
multithread (4 threads) test to feed elastic search
|
2020-05-13 16:11:40 +02:00 |
Michele Artini
|
c0265213a0
|
partial implementation
|
2020-05-13 12:00:27 +02:00 |
Sandro La Bruzzo
|
a92ee0f41e
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-13 10:38:13 +02:00 |
Sandro La Bruzzo
|
d876f47d06
|
next step of MAG conversion implemented
|
2020-05-13 10:38:04 +02:00 |
Claudio Atzori
|
1ddd33de41
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-05-13 09:04:41 +02:00 |
Claudio Atzori
|
85f3c55992
|
fixed node names in blacklist workflow
|
2020-05-13 09:04:33 +02:00 |
Miriam Baglioni
|
43f127448d
|
changed the package name from dhp-propagation to dhp-enrichment for the preparation phase of funding propagation
|
2020-05-12 18:24:26 +02:00 |
Enrico Ottonello
|
08040cef80
|
spark action to analyze orcid lambda file
|
2020-05-12 16:57:43 +02:00 |
Claudio Atzori
|
ec0782e582
|
renamed jar containing the bulktagging and propagation workflows from dhp-[bulktagging|propagation] to dhp-enrichment; adjusted xml formatting
|
2020-05-12 15:49:28 +02:00 |
Miriam Baglioni
|
1547ca7e15
|
added blacklist step to the end of the provision wf
|
2020-05-12 12:17:27 +02:00 |
Miriam Baglioni
|
14979f299e
|
changed the configuration factory
|
2020-05-12 11:28:38 +02:00 |
Miriam Baglioni
|
f8aef6161a
|
minor modification
|
2020-05-12 11:28:07 +02:00 |
Miriam Baglioni
|
7387f3449a
|
changed the route to find the verb resolver classes
|
2020-05-12 11:27:38 +02:00 |
Miriam Baglioni
|
7687519f00
|
merged conflicts with upstream branch
|
2020-05-12 10:03:44 +02:00 |
Miriam Baglioni
|
8ffc050b8a
|
fixed problem in communityconfigurationfactory test
|
2020-05-12 10:01:09 +02:00 |
Claudio Atzori
|
527e8169a8
|
adjusted paths pointing to test configurations, cleanup
|
2020-05-11 18:17:05 +02:00 |
Claudio Atzori
|
f9a62ba63b
|
added wf nodes to copy entities to the output path
|
2020-05-11 18:16:39 +02:00 |
Miriam Baglioni
|
ad63effb4e
|
removed deletion of working dir
|
2020-05-11 17:48:22 +02:00 |
Claudio Atzori
|
c6b028f2af
|
code formatting
|
2020-05-11 17:38:08 +02:00 |
Claudio Atzori
|
6d0b11252e
|
bulktagging wfs moved into common dhp-enrichment module
|
2020-05-11 17:32:06 +02:00 |
Miriam Baglioni
|
50659011eb
|
refactoring
|
2020-05-11 16:14:26 +02:00 |
Miriam Baglioni
|
e883daf87e
|
added the outputPath parameter and the reset path to remove the outputPath directory
|
2020-05-11 16:10:24 +02:00 |
Miriam Baglioni
|
5ab3424c77
|
removed unused dependencies
|
2020-05-11 16:09:37 +02:00 |
Miriam Baglioni
|
6a3b081263
|
added the last step of blacklisting
|
2020-05-11 16:09:20 +02:00 |
Enrico Ottonello
|
3b1a68cbf5
|
elastic search feed test
|
2020-05-11 14:53:52 +02:00 |
Enrico Ottonello
|
f53e42bda7
|
merged
|
2020-05-11 14:49:28 +02:00 |
Enrico Ottonello
|
7990894454
|
different date format in lambda file parsing
|
2020-05-11 14:41:11 +02:00 |
Sandro La Bruzzo
|
0c6774e4da
|
updated pom version
|
2020-05-11 14:35:14 +02:00 |
Miriam Baglioni
|
bbc9b4f329
|
removed unused imports
|
2020-05-11 14:28:55 +02:00 |
Miriam Baglioni
|
757bae53ea
|
removed useless serialization points
|
2020-05-11 14:28:37 +02:00 |
Miriam Baglioni
|
b35d57a1ac
|
added resources for test
|
2020-05-11 14:15:30 +02:00 |
Miriam Baglioni
|
e563e65335
|
moved check from join to method
|
2020-05-11 14:11:44 +02:00 |
Sandro La Bruzzo
|
b90609848b
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-05-11 14:08:31 +02:00 |
Sandro La Bruzzo
|
4062eafbdb
|
merged from branch
|
2020-05-11 14:08:16 +02:00 |
Miriam Baglioni
|
f5d785e096
|
used the DbClient moved in dhp-common
|
2020-05-11 13:59:42 +02:00 |
Miriam Baglioni
|
112b2cb3c3
|
added the test class
|
2020-05-11 13:58:58 +02:00 |
Miriam Baglioni
|
9a7ae523c9
|
update to version 1.2.1-SNAPSHOT
|
2020-05-11 13:57:47 +02:00 |
Miriam Baglioni
|
2abb84877d
|
Merge branch 'master' into blacklist
|
2020-05-11 10:37:49 +02:00 |
Miriam Baglioni
|
b0f0b24263
|
update to version 1.2.1-SNAPSHOT
|
2020-05-11 10:37:31 +02:00 |
Miriam Baglioni
|
a7e91e23ba
|
update to version 1.2.1-SNAPSHOT
|
2020-05-11 10:34:30 +02:00 |
Miriam Baglioni
|
bb59bdd60f
|
merge upstream
|
2020-05-11 10:33:17 +02:00 |
Miriam Baglioni
|
5e3548add6
|
-
|
2020-05-11 10:33:08 +02:00 |
Miriam Baglioni
|
dc8c8fa480
|
changed the version
|
2020-05-11 10:20:48 +02:00 |
Miriam Baglioni
|
871e079b45
|
merged with master
|
2020-05-11 10:20:00 +02:00 |
Claudio Atzori
|
60c40618d3
|
[maven-release-plugin] prepare for next development iteration
|
2020-05-11 10:17:14 +02:00 |
Claudio Atzori
|
c267d958d5
|
[maven-release-plugin] prepare release dhp-1.2.0
|
2020-05-11 10:17:10 +02:00 |
Miriam Baglioni
|
622ba87ec2
|
changed the version
|
2020-05-11 10:10:36 +02:00 |
Miriam Baglioni
|
391b2399cc
|
merge upstream
|
2020-05-11 10:08:51 +02:00 |
Claudio Atzori
|
42f1a2bf94
|
bumped project version to 1.2.0-SNAPSHOT
|
2020-05-11 10:05:57 +02:00 |
Sandro La Bruzzo
|
1412158a6f
|
merged from branch
|
2020-05-11 09:45:50 +02:00 |
Miriam Baglioni
|
32301451ec
|
merge upstream
|
2020-05-11 09:42:23 +02:00 |
Miriam Baglioni
|
7e66bc2527
|
fix a typo in the compression keyword and added some logging info in the spark job
|
2020-05-11 09:40:58 +02:00 |
Sandro La Bruzzo
|
1662f221f5
|
added test class
|
2020-05-11 09:39:11 +02:00 |
Sandro La Bruzzo
|
2b48a2c32c
|
Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost
|
2020-05-11 09:38:36 +02:00 |
Sandro La Bruzzo
|
4cebca09d2
|
start implementing MAG mapping
|
2020-05-11 09:38:27 +02:00 |
Spyros Zoupanos
|
ae0f535c73
|
Fixing hardcoded reference to main openAIRE graph db
|
2020-05-09 22:34:48 +03:00 |
Claudio Atzori
|
fd519df616
|
new rels produced by dedup workflow must be unique
|
2020-05-08 19:00:38 +02:00 |
Claudio Atzori
|
0ccc864ad9
|
[maven-release-plugin] prepare for next development iteration
|
2020-05-08 17:01:31 +02:00 |
Claudio Atzori
|
6e47c724c6
|
[maven-release-plugin] prepare release dhp-1.1.7
|
2020-05-08 17:01:27 +02:00 |
Claudio Atzori
|
5b28bb4131
|
code formatting
|
2020-05-08 16:49:47 +02:00 |
Claudio Atzori
|
8fd1952f16
|
code formatting
|
2020-05-08 16:01:09 +02:00 |
miconis
|
3420998bb4
|
reltype set in mergerels
|
2020-05-08 15:43:30 +02:00 |
Enrico Ottonello
|
b9d126dd1f
|
formatting modified after commit
|
2020-05-08 14:54:37 +02:00 |
Enrico Ottonello
|
7e1c987370
|
Merge branch 'doiboost' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doiboost
|
2020-05-08 14:49:50 +02:00 |
Enrico Ottonello
|
9d812788e4
|
added job to download from orcid the records modified after a fixed date, the info are taken from last_modified.csv on hdfs
|
2020-05-08 14:49:39 +02:00 |
Miriam Baglioni
|
9a29ab7508
|
went back to the readPath we had before
|
2020-05-08 13:08:56 +02:00 |
Miriam Baglioni
|
28556507e7
|
-
|
2020-05-08 12:54:52 +02:00 |
Claudio Atzori
|
b2192fdcdc
|
simplified reset_outputpath nodes across the workflows, applied common xml formatting
|
2020-05-08 12:33:31 +02:00 |
Miriam Baglioni
|
4c94231cad
|
merge with master fork
|
2020-05-08 12:25:57 +02:00 |
Miriam Baglioni
|
9b4c0d4b3a
|
-
|
2020-05-08 11:51:45 +02:00 |
Miriam Baglioni
|
53952707b6
|
modified test because of new step of data preparation. It now expects to find ResultCountrySet serialization instead of DatasourceCountry
|
2020-05-08 11:49:19 +02:00 |
Claudio Atzori
|
62ea19f1d3
|
introduced mapping for ExternalReferences, made urls defined within an instance unique
|
2020-05-08 09:43:26 +02:00 |
Claudio Atzori
|
8c67073a07
|
force speculative execution to false
|
2020-05-08 09:42:21 +02:00 |
Miriam Baglioni
|
d6b9de9f46
|
Merge branch 'master' of https://code-repo.d4science.org/miriam.baglioni/dnet-hadoop
|
2020-05-07 18:22:59 +02:00 |
Miriam Baglioni
|
f95d288681
|
fixed switch of parameters
|
2020-05-07 18:22:32 +02:00 |
Claudio Atzori
|
166aafd936
|
heavy cleanup
|
2020-05-07 18:22:26 +02:00 |
Michele Artini
|
ac0da5a7ee
|
Partial implementation of broker events
|
2020-05-07 12:31:26 +02:00 |
Miriam Baglioni
|
fb405275f7
|
merged with master
|
2020-05-07 11:48:21 +02:00 |
Miriam Baglioni
|
e124278934
|
-
|
2020-05-07 11:47:11 +02:00 |
Claudio Atzori
|
5111671e62
|
cleanup
|
2020-05-07 11:47:00 +02:00 |
Miriam Baglioni
|
9f8855991c
|
changed Encoders.bean to Encoders.kryo
|
2020-05-07 11:44:35 +02:00 |
Miriam Baglioni
|
207b899d6d
|
merged with upstream
|
2020-05-07 11:43:53 +02:00 |
Claudio Atzori
|
5b3f8a0e90
|
using Encoders.bean instead of kryo
|
2020-05-07 11:41:41 +02:00 |
Miriam Baglioni
|
182225becb
|
Merge branch 'master' of https://code-repo.d4science.org/miriam.baglioni/dnet-hadoop
|
2020-05-07 11:38:17 +02:00 |
Miriam Baglioni
|
5efae3acb9
|
new workflow for job3
|
2020-05-07 11:38:10 +02:00 |
Claudio Atzori
|
73243793b2
|
Dataset based implementation for SparkCountryPropagationJob3
|
2020-05-07 11:15:24 +02:00 |
Claudio Atzori
|
128c3bf1c8
|
restored Author bean with simple getter/setter, author pid addition moved into dedicated implementation SparkOrcidToResultFromSemRelJob3
|
2020-05-07 11:14:56 +02:00 |
Miriam Baglioni
|
b2fec32c87
|
new workflow for job3
|
2020-05-07 10:01:57 +02:00 |
Miriam Baglioni
|
29bc8c44b1
|
changes in the construction of new country set
|
2020-05-07 10:01:34 +02:00 |
Miriam Baglioni
|
55e825acd4
|
changed the test according to changes in SparkCountryPropagationJob2
|
2020-05-07 10:01:00 +02:00 |
Miriam Baglioni
|
16193cf0ba
|
new workflow and parameter for country propagation
|
2020-05-07 09:59:58 +02:00 |
Miriam Baglioni
|
5a476c7a13
|
changed the xquery for the cfhb table
|
2020-05-07 09:58:17 +02:00 |
Miriam Baglioni
|
42ad51577a
|
new implementation with one more serialization step
|
2020-05-07 09:57:49 +02:00 |
Claudio Atzori
|
17860d3ab6
|
general changes in the RAW graph mapping: missing collectedfrom/hostedby causes records to be skipped; factored out most of the constants in ModelConstants class (dhp-schemas)
|
2020-05-06 13:20:02 +02:00 |
Claudio Atzori
|
fdfecc9578
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-05-06 11:28:01 +02:00 |
Claudio Atzori
|
c79e2f5977
|
drop workingPath before starting the dedup workflow
|
2020-05-06 11:27:44 +02:00 |
Michele Artini
|
8f30a09d84
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-05 17:12:22 +02:00 |
Michele Artini
|
ccc609f909
|
new module for the production of broker events
|
2020-05-05 17:09:00 +02:00 |
Miriam Baglioni
|
dd2e698a72
|
added a sequentialization step on the spark job. Added new parameter
|
2020-05-05 17:03:43 +02:00 |
Claudio Atzori
|
0825321d0b
|
improved unit tests in dhp-aggregation
|
2020-05-05 12:39:04 +02:00 |
Miriam Baglioni
|
252b219dd5
|
changed the name of some properties
|
2020-05-05 10:03:32 +02:00 |
Claudio Atzori
|
4a8487165c
|
using long param names in wf definition
|
2020-05-04 19:19:29 +02:00 |
Claudio Atzori
|
a2fc37df5f
|
adjusted parameters
|
2020-05-04 19:18:59 +02:00 |
Claudio Atzori
|
f1b7e14036
|
code formatting
|
2020-05-04 19:18:34 +02:00 |
Miriam Baglioni
|
78578c3ccf
|
fixed wrong transition name in workflow
|
2020-05-04 15:46:24 +02:00 |
Miriam Baglioni
|
cc7d9b6b19
|
merge upstream
|
2020-05-04 13:59:09 +02:00 |
Miriam Baglioni
|
3957c815b9
|
changed the name of some parameters
|
2020-05-04 13:58:52 +02:00 |
Miriam Baglioni
|
e218360f8a
|
changed code for the mode of DbClient and also removed the dependency to graph-mapper
|
2020-05-04 12:26:17 +02:00 |
Miriam Baglioni
|
31ea05297d
|
moved the DbClient to common and added needed dependency to pom
|
2020-05-04 12:22:28 +02:00 |
miconis
|
085cf173d7
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-05-04 12:08:20 +02:00 |
miconis
|
3df703f67d
|
mergerels added to propagate relations
|
2020-05-04 12:08:12 +02:00 |
Claudio Atzori
|
bac37b3973
|
fixed children expansion in XML records
|
2020-05-04 11:51:17 +02:00 |
Claudio Atzori
|
077ccd8743
|
stats wf properties cleanup
|
2020-05-04 11:41:46 +02:00 |
Miriam Baglioni
|
b7dd400e51
|
added check if author.pid exists or is null
|
2020-05-01 15:09:02 +02:00 |
Miriam Baglioni
|
dbf3ba051a
|
minor
|
2020-04-30 20:22:07 +02:00 |
Miriam Baglioni
|
43053a286d
|
workflow pom with added blacklist module
|
2020-04-30 18:30:21 +02:00 |
Miriam Baglioni
|
0631fe548a
|
pom.xml
|
2020-04-30 18:29:46 +02:00 |
Miriam Baglioni
|
38ecfd5785
|
the wf with all the three steps for blacklisting relations
|
2020-04-30 18:28:46 +02:00 |
Miriam Baglioni
|
95433e1087
|
parameters for the preparation phase and blacklist phase
|
2020-04-30 18:28:13 +02:00 |
Miriam Baglioni
|
1070790c19
|
minor
|
2020-04-30 18:26:58 +02:00 |
Miriam Baglioni
|
b9d56b3ced
|
applies the actual removal of the relations
|
2020-04-30 18:26:25 +02:00 |
Miriam Baglioni
|
d6d6ebeae5
|
preparation step: creates the subset of the merges relations
|
2020-04-30 18:25:33 +02:00 |
Miriam Baglioni
|
13f30664ea
|
minor
|
2020-04-30 15:23:49 +02:00 |
Miriam Baglioni
|
276b95b7b3
|
add create file instruction
|
2020-04-30 15:05:17 +02:00 |
Miriam Baglioni
|
65a5d67b8b
|
minor modifications
|
2020-04-30 14:45:27 +02:00 |
Miriam Baglioni
|
418595fec2
|
removed the saveGraph parameter
|
2020-04-30 14:45:00 +02:00 |
Miriam Baglioni
|
ce8b1d0bc3
|
new workflow definition to be inserted in the provision pipeline
|
2020-04-30 14:38:54 +02:00 |
Miriam Baglioni
|
4b0bd91012
|
-
|
2020-04-30 12:45:28 +02:00 |
Miriam Baglioni
|
2349bfd8b8
|
changed the job test to remove the writeUpdate option
|
2020-04-30 11:43:33 +02:00 |
Sandro La Bruzzo
|
1e06bbaee8
|
fixed test
|
2020-04-30 11:38:58 +02:00 |
Miriam Baglioni
|
951517f9ec
|
new input parameters and workflow definition to be used in the provision pipeline
|
2020-04-30 11:32:50 +02:00 |
Miriam Baglioni
|
026f297e49
|
removed the writeUpdate option
|
2020-04-30 11:31:59 +02:00 |
Sandro La Bruzzo
|
b8e95295e2
|
merged from master
|
2020-04-30 11:27:59 +02:00 |
Miriam Baglioni
|
c89fe762b1
|
modified relation datasource organization
|
2020-04-30 11:17:03 +02:00 |
Miriam Baglioni
|
3abb76ff7a
|
merge with upstream
|
2020-04-30 11:15:54 +02:00 |
Michele Artini
|
eb9bd42970
|
fixed a problem with journals
|
2020-04-30 11:06:05 +02:00 |
Miriam Baglioni
|
638a3c465b
|
-
|
2020-04-30 11:05:17 +02:00 |
Michele Artini
|
a0a6109bbc
|
fixed a problem with journals
|
2020-04-30 11:03:46 +02:00 |
Miriam Baglioni
|
354f0162be
|
changes in the blacklist and workflow definition
|
2020-04-30 10:26:50 +02:00 |
Claudio Atzori
|
439c6255a2
|
cleanup
|
2020-04-29 19:09:07 +02:00 |
Claudio Atzori
|
77ac995770
|
cleaned up poms, added descriptions
|
2020-04-29 18:44:17 +02:00 |
Miriam Baglioni
|
3cffee74b9
|
merge with upstream
|
2020-04-29 18:25:29 +02:00 |
Miriam Baglioni
|
9ab46535e7
|
pom with the new blacklist module added
|
2020-04-29 18:17:15 +02:00 |
Miriam Baglioni
|
6a47e6191d
|
read from blacklist and write the result as relations on hdfs
|
2020-04-29 18:16:01 +02:00 |
Miriam Baglioni
|
869f576273
|
added hash map for relationship entityType id prefix, and relation inverse
|
2020-04-29 18:14:52 +02:00 |
Miriam Baglioni
|
b85ad7012a
|
reads the blacklist from the blacklist db and writes it as a set of relations on hdfs
|
2020-04-29 17:29:49 +02:00 |
Claudio Atzori
|
8fd81e863d
|
added default value for the external_stats_db_name
|
2020-04-29 15:36:24 +02:00 |
Claudio Atzori
|
c6f3ff4462
|
stats workflow content relocated into common package; added <global> property definitions in stats workflow.xml
|
2020-04-29 14:29:27 +02:00 |
Sandro La Bruzzo
|
4a89465740
|
reformatted code
|
2020-04-29 13:24:29 +02:00 |
Sandro La Bruzzo
|
a6b1a59d0a
|
merged with master
|
2020-04-29 13:20:57 +02:00 |
Sandro La Bruzzo
|
920c0f19c3
|
Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost
|
2020-04-29 13:13:16 +02:00 |
Sandro La Bruzzo
|
09f161f1f4
|
implemented unit test
|
2020-04-29 13:13:02 +02:00 |
miconis
|
e0d14fe4f8
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-29 13:02:53 +02:00 |
miconis
|
0352d3b0ba
|
entity dumps in dedup compressed
|
2020-04-29 13:02:34 +02:00 |
Michele Artini
|
c43b4c8962
|
formatting
|
2020-04-29 12:56:58 +02:00 |
Michele Artini
|
a5d7007005
|
Fix relations in migration
Fix pom.xml in dhp-stats-update
|
2020-04-29 12:05:41 +02:00 |
Miriam Baglioni
|
f7695e833c
|
resolved conflicts
|
2020-04-29 11:41:31 +02:00 |
Claudio Atzori
|
3616d0f88d
|
Merge pull request 'Adding the stats workflow to the dnet-hadoop hierarchy' (#6) from spyros/dnet-hadoop:master into master
Integrating stats update workflow.
|
2020-04-29 10:35:02 +02:00 |
Claudio Atzori
|
964972d29a
|
added data provision workflow definition WIP
|
2020-04-29 09:25:50 +02:00 |
Enrico Ottonello
|
1edcd53581
|
added shell actions to download all 11 activities files from ORCID
|
2020-04-28 20:25:09 +02:00 |
miconis
|
62e467eb0c
|
assertion numbers updated to fit the new implementation of the pace-core
|
2020-04-28 11:46:23 +02:00 |
Claudio Atzori
|
6f5b899038
|
reformatted code according to the updated style descriptor
|
2020-04-28 11:23:29 +02:00 |
Claudio Atzori
|
ac25f2d8d1
|
integrated changes from master
|
2020-04-28 08:55:28 +02:00 |
Miriam Baglioni
|
2980e50edf
|
merge upstream
|
2020-04-27 15:06:48 +02:00 |
Claudio Atzori
|
a0bdbacdae
|
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
|
2020-04-27 14:52:31 +02:00 |
Claudio Atzori
|
7a3f8085f7
|
switched automatic code formatting plugin to net.revelc.code.formatter:formatter-maven-plugin
|
2020-04-27 14:45:40 +02:00 |
Michele Artini
|
1260d03eba
|
skip empty projects
|
2020-04-27 13:51:13 +02:00 |
Miriam Baglioni
|
df34a4ebcc
|
changed the configuration to add ignorecase option to each verb related to covid-19 community
|
2020-04-27 12:32:56 +02:00 |
Miriam Baglioni
|
7a59324ccf
|
changed the test to check for the new ignorecase option
|
2020-04-27 12:31:46 +02:00 |
Miriam Baglioni
|
986c97348d
|
added the ignorecase option to each selection verb
|
2020-04-27 12:31:05 +02:00 |
Miriam Baglioni
|
a303fc9f73
|
resources for testing propagation of result to community from organization and from semrel
|
2020-04-27 11:14:16 +02:00 |
Miriam Baglioni
|
c093d764a3
|
-
|
2020-04-27 11:12:38 +02:00 |
Miriam Baglioni
|
c925e2be16
|
test for propagation of result to community from organization and result to community from semrel
|
2020-04-27 10:59:53 +02:00 |
Miriam Baglioni
|
ec7f166690
|
changed the bl because of changes to the examples for the re-implementation of the propagation step
|
2020-04-27 10:58:41 +02:00 |
Miriam Baglioni
|
6135096ef1
|
refactoring
|
2020-04-27 10:57:50 +02:00 |
Miriam Baglioni
|
d30e710165
|
fixed duplicate action name in the workflow
|
2020-04-27 10:52:30 +02:00 |
Miriam Baglioni
|
f9ee343fc0
|
new parametrized workflow with preparation steps and new parameter input files
|
2020-04-27 10:48:31 +02:00 |
Miriam Baglioni
|
e2093644dc
|
changed in the workflow the directory where to store the preparedInfo and the graph generated at this step
|
2020-04-27 10:46:44 +02:00 |
Miriam Baglioni
|
8a58bf2744
|
removed the writeUpdate option
|
2020-04-27 10:45:06 +02:00 |
Miriam Baglioni
|
5dccbe13db
|
merge with upstream
|
2020-04-27 10:43:59 +02:00 |
Miriam Baglioni
|
7b6505ec69
|
new resources for testing propagation of project to result after the re-implementation
|
2020-04-27 10:42:16 +02:00 |
Miriam Baglioni
|
1b0e0bd1b5
|
refactoring
|
2020-04-27 10:40:26 +02:00 |
Miriam Baglioni
|
e5a177f0a7
|
refactoring
|
2020-04-27 10:36:21 +02:00 |
Miriam Baglioni
|
e000754c92
|
refactoring
|
2020-04-27 10:34:03 +02:00 |
Miriam Baglioni
|
95a54d5460
|
removed the writeUpdate option. The update is available in the preparedInfo path
|
2020-04-27 10:30:32 +02:00 |
Miriam Baglioni
|
8802e4126b
|
re-implemented inverting the couple: from (projectId, relatedResultList) to (resultId, relatedProjectList)
|
2020-04-27 10:26:55 +02:00 |
Enrico Ottonello
|
a1861b9eaa
|
workflow works in parallel on 2 activity files
|
2020-04-24 18:33:37 +02:00 |
Enrico Ottonello
|
941e94af06
|
added workflow for generating authors with dois data sequence file
|
2020-04-24 15:50:40 +02:00 |
Claudio Atzori
|
268462623a
|
refined definition of equals and hash methods for Oaf model classes, now based on entity identifier, while relations consider sourceid, targetid and relationship semantic; Factored out function to group Oaf objects in grouping operations; Raw graph creation procedure merges entities and relationships providing the same identity
|
2020-04-24 14:42:01 +02:00 |
Claudio Atzori
|
a3e480d1c9
|
implemented DispatchEntitiesApplication using spark2 datasets
|
2020-04-24 14:36:53 +02:00 |
Claudio Atzori
|
48157e0fc4
|
GraphHiveImporterJob moved to a dedicated package
|
2020-04-24 14:32:28 +02:00 |
Miriam Baglioni
|
adcbf0e29a
|
refactoring
|
2020-04-24 10:47:43 +02:00 |
Claudio Atzori
|
278fc9d276
|
code formatting
|
2020-04-23 18:51:38 +02:00 |
miconis
|
5414236644
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-23 18:17:23 +02:00 |
miconis
|
8d258c85ff
|
spark dedup test fixed, sample for dataset and orp added, test implemented
|
2020-04-23 18:16:20 +02:00 |
Michele Artini
|
072eae3803
|
fixed a problem with missing contexts
|
2020-04-23 16:42:49 +02:00 |
Michele Artini
|
b164d96874
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-23 16:19:16 +02:00 |
Michele Artini
|
d920ce501e
|
fixed a problem with missing instances
|
2020-04-23 16:18:40 +02:00 |
Miriam Baglioni
|
0e447add66
|
removed unuseful classes
|
2020-04-23 12:59:43 +02:00 |
Miriam Baglioni
|
edb00db86a
|
refactoring
|
2020-04-23 12:57:35 +02:00 |
Miriam Baglioni
|
44fab140de
|
-
|
2020-04-23 12:42:07 +02:00 |
Miriam Baglioni
|
769aa8178a
|
refactoring
|
2020-04-23 12:40:44 +02:00 |
Miriam Baglioni
|
d8dc31d4af
|
refactoring
|
2020-04-23 12:35:49 +02:00 |
Miriam Baglioni
|
8c5dac5cc3
|
removed unuseful classes
|
2020-04-23 12:30:58 +02:00 |
Miriam Baglioni
|
15656684b9
|
added properties for the preparation step and actual propagation. Added the new parametrized workflow
|
2020-04-23 12:13:34 +02:00 |
Miriam Baglioni
|
6f35f5ca42
|
added the steps of reset output dir and copy information not changed by the propagation step
|
2020-04-23 12:12:07 +02:00 |
Miriam Baglioni
|
19cd5b85c0
|
changed the classname to execute
|
2020-04-23 12:07:41 +02:00 |
Miriam Baglioni
|
fa2ff5c6f5
|
refactoring
|
2020-04-23 11:58:26 +02:00 |
Miriam Baglioni
|
540f70298b
|
added missing property
|
2020-04-23 11:51:48 +02:00 |
Miriam Baglioni
|
e431fe4f5b
|
added the implements Serializable to each class
|
2020-04-23 11:48:47 +02:00 |
Miriam Baglioni
|
24fa81d7e8
|
implementation parametrized for result type
|
2020-04-23 11:44:19 +02:00 |
Miriam Baglioni
|
ab2a24cc2b
|
changed the dependency to use reflections to find annotated classes
|
2020-04-23 11:08:47 +02:00 |
Miriam Baglioni
|
5153d88bd3
|
definition of workflow and properties for bulktagging
|
2020-04-23 11:04:53 +02:00 |
Miriam Baglioni
|
3b2e4ab670
|
test for bulktag
|
2020-04-23 10:00:10 +02:00 |
Sandro La Bruzzo
|
fdc0523e4c
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-23 09:34:13 +02:00 |
Sandro La Bruzzo
|
4ba386d996
|
improved crossref mapping
|
2020-04-23 09:33:48 +02:00 |
Claudio Atzori
|
8851050814
|
replaced hive_db_name with hiveDbName
|
2020-04-23 08:36:40 +02:00 |
Claudio Atzori
|
91f81107b1
|
applying code formatting
|
2020-04-23 07:52:32 +02:00 |
Claudio Atzori
|
1e7583c5a6
|
filtered invisible records in data provision workflow
|
2020-04-23 07:51:34 +02:00 |
Claudio Atzori
|
9ddafd46ca
|
fixed dedup record id prefix, set the correct dataInfo in the DedupRecordFactory
|
2020-04-23 07:50:18 +02:00 |
Claudio Atzori
|
ade4cb97af
|
fixed parameters passed to the postprocessing action in the workflow mapping the graph as hive DB
|
2020-04-22 18:24:06 +02:00 |
Sandro La Bruzzo
|
bb6c9785b4
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-22 15:00:57 +02:00 |
Sandro La Bruzzo
|
157915988c
|
improved crossref mapping
|
2020-04-22 15:00:44 +02:00 |
Enrico Ottonello
|
5977f08e92
|
merged
|
2020-04-22 14:50:50 +02:00 |
Enrico Ottonello
|
7d759947ae
|
used vtd for parsing orcid xml record, set 4g heapspace
|
2020-04-22 14:41:19 +02:00 |
Claudio Atzori
|
e81960335c
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-04-22 10:46:37 +02:00 |
Michele Artini
|
9e4d58f505
|
ResultType
|
2020-04-22 10:07:26 +02:00 |
Claudio Atzori
|
c891661822
|
small adjustments in the graph2hive workflow
|
2020-04-21 18:52:23 +02:00 |
Miriam Baglioni
|
259525cb93
|
Merge remote-tracking branch 'upstream/master'
|
2020-04-21 18:33:46 +02:00 |
Miriam Baglioni
|
30e53261d0
|
minor
|
2020-04-21 18:00:53 +02:00 |
Claudio Atzori
|
0b55795d4d
|
small adjustments in the provisioning workflow
|
2020-04-21 16:15:04 +02:00 |
Claudio Atzori
|
88fbb3a353
|
added sparkSqlWarehouseDir to the default extra spark options passed to each workflow
|
2020-04-21 16:13:43 +02:00 |
Claudio Atzori
|
cd320efa96
|
added extra spark options to graph to hive workflow
|
2020-04-21 16:12:20 +02:00 |
Miriam Baglioni
|
90c768dde6
|
added shaded libs module
|
2020-04-21 16:03:51 +02:00 |
Claudio Atzori
|
91e72a6944
|
Dataset based implementation for SparkCreateDedupRecord phase, fixed datasource entity dump supplementing dedup unit tests
|
2020-04-21 12:06:08 +02:00 |
miconis
|
5c9ef08a8e
|
spark dedup test fixed
|
2020-04-21 10:19:04 +02:00 |
Sandro La Bruzzo
|
3624947a7f
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-21 08:34:24 +02:00 |
Claudio Atzori
|
d772d967aa
|
restored changes from master branch
|
2020-04-20 18:53:06 +02:00 |
Claudio Atzori
|
eb8a020859
|
fixed behaviour of DedupRecordFactory
|
2020-04-20 18:44:06 +02:00 |
Sandro La Bruzzo
|
039f9b7871
|
Merge remote-tracking branch 'origin/master' into doiboost
|
2020-04-20 18:10:29 +02:00 |
Sandro La Bruzzo
|
e4b105cece
|
improved crossref mapping
|
2020-04-20 18:10:07 +02:00 |
Claudio Atzori
|
ede1af3d85
|
Merge branch 'master' into deduptesting
|
2020-04-20 16:52:14 +02:00 |
miconis
|
1102e32462
|
SparkDedupTest updated and organization dump fixed
|
2020-04-20 16:49:01 +02:00 |
Claudio Atzori
|
667d23c58b
|
finalising Actionset migration workflow
|
2020-04-20 16:45:21 +02:00 |
miconis
|
4da13e4570
|
Revert "Merge branch 'master' into deduptesting"
This reverts commit 772f75d167, reversing
changes made to 5f45f2c77f.
|
2020-04-20 16:04:49 +02:00 |
Claudio Atzori
|
9147af7fed
|
actionsets migration workflow moved in dhp-workflows/dhp-actionmanager
|
2020-04-20 15:24:33 +02:00 |
miconis
|
772f75d167
|
Merge branch 'master' into deduptesting
|
2020-04-20 14:50:12 +02:00 |
Sandro La Bruzzo
|
5d46ec7d5f
|
fixed name of wrong package
|
2020-04-20 14:49:32 +02:00 |
Sandro La Bruzzo
|
82cc3b707d
|
fixed name of wrong package
|
2020-04-20 14:47:06 +02:00 |
Sandro La Bruzzo
|
b2c872cb4d
|
merged master
|
2020-04-20 14:04:40 +02:00 |
Sandro La Bruzzo
|
7029942e06
|
Merge branch 'doiboost' of code-repo.d4science.org:D-Net/dnet-hadoop into doiboost
|
2020-04-20 13:26:41 +02:00 |
Sandro La Bruzzo
|
0e45f4d450
|
continue mapping from crossref to OAF
|
2020-04-20 13:26:29 +02:00 |
Enrico Ottonello
|
a466648b4b
|
renamed output file
|
2020-04-20 12:32:03 +02:00 |
Claudio Atzori
|
d714bfb4d4
|
collectedfrom field moved in common parent class Oaf.java
|
2020-04-20 12:25:19 +02:00 |
Enrico Ottonello
|
4ae55e3891
|
added workflow parameters
|
2020-04-20 12:00:04 +02:00 |
Michele Artini
|
8ff7facfa3
|
fixed collectedFrom ID
|
2020-04-20 11:09:27 +02:00 |
Sandro La Bruzzo
|
eef60bb9f4
|
created structure of oozie wf for ORCID
|
2020-04-20 10:24:57 +02:00 |
Sandro La Bruzzo
|
4d0d9de07e
|
reorganized package and fixed test
|
2020-04-20 10:02:42 +02:00 |
Sandro La Bruzzo
|
618bc1fc72
|
first implementation of crossrefMapping
|
2020-04-20 09:53:34 +02:00 |
Michele Artini
|
25307965d2
|
add a default datainfo if missing
|
2020-04-20 09:43:27 +02:00 |
Michele Artini
|
d2058fdc47
|
tests
|
2020-04-20 09:31:14 +02:00 |
Enrico Ottonello
|
1d44a359ea
|
renamed package folder
|
2020-04-20 09:25:40 +02:00 |
Michele Artini
|
478a958f09
|
tests
|
2020-04-20 09:15:27 +02:00 |
Miriam Baglioni
|
e1848b7603
|
minor
|
2020-04-18 14:16:42 +02:00 |
Miriam Baglioni
|
0ff9b1ef05
|
added needed parameter
|
2020-04-18 14:16:29 +02:00 |
Miriam Baglioni
|
e2dfe8b656
|
removed not used action
|
2020-04-18 14:16:07 +02:00 |
Miriam Baglioni
|
437ebbad76
|
refactoring
|
2020-04-18 14:15:09 +02:00 |
Miriam Baglioni
|
9a8876ac86
|
added needed parameter
|
2020-04-18 14:14:08 +02:00 |
Miriam Baglioni
|
9854852878
|
refactoring
|
2020-04-18 14:13:16 +02:00 |
Miriam Baglioni
|
454b8a6a29
|
Merge remote-tracking branch 'upstream/master'
|
2020-04-18 14:09:44 +02:00 |
Miriam Baglioni
|
890ec28f0f
|
input parameters for preparation step1
|
2020-04-18 14:09:37 +02:00 |
Miriam Baglioni
|
fbf5c27c27
|
Added preparation classes before actual propagation
|
2020-04-18 14:09:03 +02:00 |
Claudio Atzori
|
5f45f2c77f
|
Merge branch 'master' into deduptesting
|
2020-04-18 12:46:40 +02:00 |
Claudio Atzori
|
ad7a131b18
|
introduced common project code formatting plugin, works on the commit hook, based on https://github.com/Cosium/git-code-format-maven-plugin, applied to each java class in the project
|
2020-04-18 12:42:58 +02:00 |
Claudio Atzori
|
a2938dd059
|
cleanup
|
2020-04-18 12:24:22 +02:00 |
Claudio Atzori
|
9374ff03ea
|
Merge branch 'master' into deduptesting
|
2020-04-18 12:06:58 +02:00 |
Claudio Atzori
|
71813795f6
|
various refactorings on the dnet-dedup-openaire workflow
|
2020-04-18 12:06:23 +02:00 |
Enrico Ottonello
|
7011d4203e
|
parser of orcid summaries from tar gz file on hdfs, that creates a sequence file with author information (oid, name, surname, credit name)
|
2020-04-17 18:52:39 +02:00 |
miconis
|
6450bb0daa
|
test for software dedup added. definition of orp, dataset and sw dedup configurations
|
2020-04-17 17:31:59 +02:00 |
Miriam Baglioni
|
72c63a326e
|
removed unuseful class
|
2020-04-17 17:14:51 +02:00 |
Miriam Baglioni
|
00c2ca3ee5
|
-
|
2020-04-17 17:14:25 +02:00 |
Miriam Baglioni
|
5cd092114f
|
use mergeFrom method to add the new community contexts
|
2020-04-17 17:13:18 +02:00 |
Miriam Baglioni
|
264c82f21e
|
minor
|
2020-04-17 16:54:46 +02:00 |
Miriam Baglioni
|
8c079c7a49
|
unit test for orcid to result propagation from semrel
|
2020-04-17 16:53:03 +02:00 |
Miriam Baglioni
|
eacd140a98
|
added missing parameter(s)
|
2020-04-17 16:52:30 +02:00 |
Miriam Baglioni
|
390e250faf
|
use the addPid method of the Author class to add a new pid
|
2020-04-17 16:52:02 +02:00 |
Miriam Baglioni
|
b46b080ddc
|
use mergeFrom method call to add the country(ies) instead of modify the result directly.
|
2020-04-17 16:50:54 +02:00 |
Miriam Baglioni
|
c4987dd12a
|
minor
|
2020-04-17 16:49:08 +02:00 |
Claudio Atzori
|
038ac7afd7
|
relation consistency workflow separated from dedup scan and creation of CCs
|
2020-04-17 13:12:44 +02:00 |
Claudio Atzori
|
c92bfeeaee
|
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
|
2020-04-17 13:07:52 +02:00 |
Miriam Baglioni
|
adc11c97a7
|
Merge remote-tracking branch 'upstream/master'
|
2020-04-17 12:34:31 +02:00 |
Sandro La Bruzzo
|
a329ea5575
|
merged with master branch
|
2020-04-17 12:23:54 +02:00 |
Sandro La Bruzzo
|
01ea7721f3
|
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
|
2020-04-17 12:12:25 +02:00 |
Sandro La Bruzzo
|
5e2fa996aa
|
fixed problem with conversion of long into string
|
2020-04-17 12:11:51 +02:00 |
miconis
|
418cf94642
|
implementation of the deletedbyinference test in propagating relations
|
2020-04-17 10:40:21 +02:00 |
Miriam Baglioni
|
5d772e5263
|
new implementation of propagation of community to result from organization that exploits the prepared info
|
2020-04-16 18:45:22 +02:00 |
Miriam Baglioni
|
fff1e5ec39
|
classes to (de)serialize the data provided in the preparation step
|
2020-04-16 18:44:43 +02:00 |
Miriam Baglioni
|
3fd9d6b02f
|
preparation phase for the propagation of community to result from organization
|
2020-04-16 18:43:55 +02:00 |
Miriam Baglioni
|
a9120164aa
|
added hive parameter and a step of reset of the working dir in the workflow
|
2020-04-16 18:42:04 +02:00 |
Miriam Baglioni
|
6afbd542ca
|
changed the save mode to avoid NegativeArraySize... error. Needed to modify also the preparationstep2
|
2020-04-16 18:40:14 +02:00 |
Miriam Baglioni
|
d60fd36046
|
changed the save method
|
2020-04-16 16:14:15 +02:00 |
Miriam Baglioni
|
951b13ac46
|
input parameters and workflow for new implementation of propagation of orcid to result from semrel and preparation phases
|
2020-04-16 16:13:10 +02:00 |
Miriam Baglioni
|
4d89f3dfed
|
removed unuseful classes
|
2020-04-16 16:11:44 +02:00 |
Miriam Baglioni
|
5e72a51f11
|
-
|
2020-04-16 16:11:20 +02:00 |
Miriam Baglioni
|
c33a593381
|
renamed
|
2020-04-16 16:09:47 +02:00 |
Miriam Baglioni
|
0e5399bf74
|
second phase of data preparation. Groups all the possible updates by id
|
2020-04-16 16:08:51 +02:00 |
Miriam Baglioni
|
548ba915ac
|
first phase of data preparation. For each result type (parallel) it produces the possible updates
|
2020-04-16 15:58:42 +02:00 |
Miriam Baglioni
|
243013cea3
|
to (de)serialize the association from the resultId and the list of authoritative authors with orcid to possibly propagate
|
2020-04-16 15:57:29 +02:00 |
Miriam Baglioni
|
ac3ad25b36
|
to (de)serialize needed information of the author to determine if the orcid can be passed (name, surname, fullname (?), orcid)
|
2020-04-16 15:56:33 +02:00 |
Miriam Baglioni
|
d6cd700a32
|
new implementation that exploits prepared information (the list of possible updates: resultId - possible list of orcid to be added)
|
2020-04-16 15:55:25 +02:00 |
Miriam Baglioni
|
f077f22f73
|
minor
|
2020-04-16 15:54:16 +02:00 |
Miriam Baglioni
|
fd5d792e35
|
refactoring
|
2020-04-16 15:53:34 +02:00 |
Claudio Atzori
|
cb0952428e
|
Merge branch 'master' into deduptesting
|
2020-04-16 14:42:25 +02:00 |
Claudio Atzori
|
cc21bbfb1a
|
Merge branch 'deduptesting' of https://code-repo.d4science.org/D-Net/dnet-hadoop into deduptesting
|
2020-04-16 14:41:37 +02:00 |
Claudio Atzori
|
ec5dfc068d
|
added spark.sql.shuffle.partitions=3840 to dedup scan wf
|
2020-04-16 14:41:28 +02:00 |
Claudio Atzori
|
09f356b047
|
Merge pull request 'Closes #7: subdirs inside graph table dirs' (#8) from przemyslaw.jacewicz/dnet-hadoop:przemyslawjacewicz_7_distcp_configuration_fix into master
Ran the code from this PR in isolation and it worked fine. Thanks!
|
2020-04-16 14:38:46 +02:00 |
Claudio Atzori
|
3437383112
|
Merge branch 'master' into deduptesting
|
2020-04-16 12:46:14 +02:00 |
miconis
|
0eccbc318b
|
Deduper class (utilities for dedup) cleaned. Useless methods removed
|
2020-04-16 12:36:37 +02:00 |
Claudio Atzori
|
76d23895e6
|
Merge branch 'deduptesting' of https://code-repo.d4science.org/D-Net/dnet-hadoop into deduptesting
|
2020-04-16 12:18:32 +02:00 |
miconis
|
6a089ec287
|
minor changes
|
2020-04-16 12:15:38 +02:00 |
Claudio Atzori
|
376efd67de
|
removed prepare statement in spark action
|
2020-04-16 12:14:16 +02:00 |
miconis
|
9b36458b6a
|
Merge branch 'deduptesting' of code-repo.d4science.org:D-Net/dnet-hadoop into deduptesting
|
2020-04-16 12:13:58 +02:00 |
miconis
|
cd4d9a148f
|
creating temporary directories in dedup test
|
2020-04-16 12:13:26 +02:00 |
Claudio Atzori
|
b39ff36c16
|
improving the wf definitions
|
2020-04-16 12:11:37 +02:00 |