Claudio Atzori
|
2ac2d928bd
|
[maven-release-plugin] prepare for next development iteration
|
2022-04-07 12:18:47 +02:00 |
Claudio Atzori
|
85bc722ff4
|
[maven-release-plugin] prepare release dhp-1.2.4
|
2022-04-07 12:18:43 +02:00 |
Claudio Atzori
|
bc05b6168a
|
[maven-release-plugin] rollback the release of dhp-1.2.4
|
2022-04-07 11:49:06 +02:00 |
Claudio Atzori
|
505420fd61
|
[maven-release-plugin] prepare for next development iteration
|
2022-04-07 11:34:06 +02:00 |
Claudio Atzori
|
66e718981e
|
[maven-release-plugin] prepare release dhp-1.2.4
|
2022-04-07 11:34:02 +02:00 |
Claudio Atzori
|
48d32466e4
|
instances grouped by URL expose only one refereed
|
2022-03-23 14:52:03 +01:00 |
Miriam Baglioni
|
2b643059fa
|
[Country Propagation] changed the logic to get the collectedfrom at the result level, fixing the issue where no instance is created for a result that should have the country associated. Changed the code to use spark instead of hive to prepare the data needed for the propagation step. Added new tests for the intermediate steps and new verification for the propagation itself
|
2022-03-11 13:56:48 +01:00 |
Claudio Atzori
|
a87c070447
|
conflicts resolved, merged from beta
|
2022-02-24 12:51:31 +01:00 |
Claudio Atzori
|
86cdb7a38f
|
[provision] serialize measures defined on the result level
|
2022-02-23 15:54:18 +01:00 |
Alessia Bardi
|
9d6203f79b
|
test mapping datasource
|
2022-02-23 15:00:53 +01:00 |
Alessia Bardi
|
600ede1798
|
serialisation of APCs in the XML records
|
2022-02-11 11:00:20 +01:00 |
Claudio Atzori
|
cccb16900c
|
https://support.openaire.eu/issues/7330 normalising DOI urls
|
2021-12-23 12:33:53 +01:00 |
Claudio Atzori
|
98eb292c59
|
avoid NPEs merging XMLInstance(s)
|
2021-12-13 13:27:20 +01:00 |
Claudio Atzori
|
5e17247bb6
|
avoid NPEs merging XMLInstance(s)
|
2021-12-13 11:48:40 +01:00 |
Claudio Atzori
|
b70ecccea0
|
avoid NPEs merging XMLInstance(s)
|
2021-12-12 12:37:38 +01:00 |
Alessia Bardi
|
e53228401b
|
style
|
2021-12-09 15:46:22 +01:00 |
Alessia Bardi
|
6b5d7688a4
|
#7275 serialize license information in XML records
|
2021-12-09 13:46:48 +01:00 |
Claudio Atzori
|
9cac283bec
|
implemented Instance serialization features requested in https://support.openaire.eu/issues/7156
|
2021-12-02 17:20:33 +01:00 |
Claudio Atzori
|
1de881b796
|
resolved conflicts for #165
|
2021-11-26 16:15:11 +01:00 |
Sandro La Bruzzo
|
c9870c5122
|
code formatted
|
2021-10-19 15:24:59 +02:00 |
Claudio Atzori
|
e471f12d5e
|
hotfix: recovered implementation removing the hardcoded working_dirs
|
2021-10-19 12:35:38 +02:00 |
Claudio Atzori
|
14fbf92ad6
|
Merge branch 'beta' into beta_solr_config
|
2021-10-14 11:08:44 +02:00 |
Sandro La Bruzzo
|
5606014b17
|
code refactor see ticket #7065
|
2021-10-12 08:11:53 +02:00 |
Claudio Atzori
|
2f61054cd1
|
code formatting
|
2021-10-11 18:29:42 +02:00 |
Serafeim Chatzopoulos
|
201ce71cc1
|
Add resultsubject, relprojectname and resultacceptanceyear to __all field
|
2021-10-11 13:16:39 +03:00 |
Serafeim Chatzopoulos
|
e468a7b96b
|
Add tests to query Solr with different configurations
|
2021-10-08 16:58:51 +03:00 |
Serafeim Chatzopoulos
|
de81007302
|
Add exploreTestConfig, a new Solr configuration folder
|
2021-10-08 16:54:56 +03:00 |
Alessia Bardi
|
8d3b60f446
|
test for patching records for EOSC Future
|
2021-10-07 17:30:45 +02:00 |
Alessia Bardi
|
b924276e18
|
tests to generate records for the EOSC-Future demo with the EOSC Jupyter Notebook subject
|
2021-09-24 17:11:56 +02:00 |
Sandro La Bruzzo
|
d4dadf6d77
|
reduced max number of PID in Relatedentity
|
2021-09-02 14:21:24 +02:00 |
Sandro La Bruzzo
|
9f8a80deb7
|
fixed wrong import of unresolved relation in openaire
|
2021-09-01 14:16:27 +02:00 |
Alessia Bardi
|
3762b17f7b
|
added VERSION and PART relationship and re-ordered according to my personal and obviously possibly biased ordering
|
2021-08-31 20:20:05 +02:00 |
Alessia Bardi
|
931f430129
|
Merge branch 'beta' into datasource_model_eosc_beta
|
2021-08-23 11:57:21 +02:00 |
Claudio Atzori
|
9f4db73f30
|
updated/fixed unit tests
|
2021-08-11 15:02:51 +02:00 |
Claudio Atzori
|
2ee21da43b
|
suggestions from SonarLint
|
2021-08-11 12:13:22 +02:00 |
Sandro La Bruzzo
|
6358f92c3a
|
added sleep to solve problem of lost request of creating index
|
2021-07-30 08:54:37 +02:00 |
Claudio Atzori
|
c53d106e80
|
[provision] lowercase relation filter
|
2021-07-29 13:57:00 +02:00 |
Sandro La Bruzzo
|
3721df7aa6
|
refactoring create actionset of scholexplorer, moved on package dhp-aggregation
|
2021-07-29 10:45:35 +02:00 |
Sandro La Bruzzo
|
3d8f0f629b
|
implemented workflow of creation action set for scholexplorer
|
2021-07-28 16:15:34 +02:00 |
Alessia Bardi
|
df8715a1ec
|
format code after mvn compile
|
2021-07-28 11:58:26 +02:00 |
Michele Artini
|
3e2a2d6e71
|
added new fields in xml
|
2021-07-28 11:56:55 +02:00 |
Alessia Bardi
|
c806387d4b
|
tests for enermaps
|
2021-07-28 11:54:36 +02:00 |
Claudio Atzori
|
2fff24df55
|
code formatting
|
2021-07-28 11:34:19 +02:00 |
Sandro La Bruzzo
|
16c91203bd
|
implemented workflow of creation action set for scholexplorer
|
2021-07-28 10:30:49 +02:00 |
Michele Artini
|
52e2315ba2
|
removed trick for datasourcetypeui
|
2021-07-28 10:23:00 +02:00 |
Claudio Atzori
|
10d7b4f0b4
|
filtering 'old' OpenAIRE ids from the entity.originalId[] array in the OAF -> XML serialization procedure
|
2021-07-20 11:52:05 +02:00 |
Sandro La Bruzzo
|
bbe8193930
|
merged stable ids
|
2021-07-12 17:00:43 +02:00 |
Sandro La Bruzzo
|
57c74c73c6
|
fixed mistakes in oozie workflow
|
2021-07-09 12:28:09 +02:00 |
Sandro La Bruzzo
|
61ccb54fde
|
removed wrong loop on oozie wf
|
2021-07-09 12:17:57 +02:00 |
Sandro La Bruzzo
|
9f5a0f3ab6
|
moved wf indexing of Scholexplorer in dhp-graph-provision
|
2021-07-09 12:06:43 +02:00 |
Claudio Atzori
|
96238152cb
|
added serialization for alternateIdentifiers and pids within each record instance
|
2021-05-28 16:57:30 +02:00 |
Claudio Atzori
|
23b8883ab1
|
applied intellij code cleanup
|
2021-05-14 10:58:12 +02:00 |
Claudio Atzori
|
609eb711b3
|
IndexRecordTransformerTest for producing a record that can be manually submitted to solr
|
2021-05-13 16:13:28 +02:00 |
Claudio Atzori
|
1517bf7c92
|
IndexRecordTransformerTest for producing a record that can be manually submitted to solr
|
2021-05-13 16:11:22 +02:00 |
Claudio Atzori
|
5afa7d3e0c
|
core utilities in dhp-common moved in external module dhp-schemas
|
2021-04-27 15:44:01 +02:00 |
Claudio Atzori
|
27ab8a704d
|
adjusted poms to align with the external dhp-schema module
|
2021-04-27 10:12:27 +02:00 |
Claudio Atzori
|
c2bb03c8b5
|
depending on external dhp-schemas module
|
2021-04-23 17:57:35 +02:00 |
Claudio Atzori
|
1e7e5180fa
|
[Graph model] updated definition of ExternalReference: added alternateLabel, removed description (#6503)
|
2021-04-02 12:32:12 +02:00 |
Claudio Atzori
|
7941d7be29
|
WIP: using common definitions from ModelConstants
|
2021-03-31 18:33:57 +02:00 |
Claudio Atzori
|
72ce741ea6
|
WIP: using common definitions from ModelConstants
|
2021-03-31 17:07:13 +02:00 |
Sandro La Bruzzo
|
c73072079d
|
fix conflicts
|
2021-03-22 16:36:31 +01:00 |
Claudio Atzori
|
8d2bb24512
|
merged from master
|
2021-03-08 15:44:34 +01:00 |
Alessia Bardi
|
32e81c2d89
|
non validated rel has null value in validated field
|
2021-02-16 11:01:42 +01:00 |
Claudio Atzori
|
29c6f7e255
|
classes related to the collection workflow moved into common package; implemented MongoDB collection plugins
|
2021-02-12 12:31:02 +01:00 |
Claudio Atzori
|
b34b5a39ca
|
index field authoridtypevalue mixes up different author id-type value pairs, dropped in favour of orcidtypevalue
|
2021-02-11 09:36:04 +01:00 |
Alessia Bardi
|
986dd969d3
|
use the proper import for Lists
|
2021-02-10 12:03:54 +01:00 |
Alessia Bardi
|
09fc7e2f78
|
serialization of validated flag on relationships
|
2021-02-10 11:22:09 +01:00 |
Claudio Atzori
|
82e6c50f3f
|
updated solr fields (authoridtypevalue, resultsubject, resultresourcetypename)
|
2021-02-09 16:27:04 +01:00 |
Claudio Atzori
|
62bd3c53ee
|
Merge branch 'master' into provision_indexing
|
2021-02-09 15:46:26 +01:00 |
Claudio Atzori
|
72c57b28fa
|
switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT
|
2021-02-04 14:08:18 +01:00 |
Claudio Atzori
|
b6f08ce226
|
re-adding the old junit:junit dep as solr-test-framework needs it
|
2020-12-14 15:07:31 +01:00 |
Claudio Atzori
|
1506f49052
|
Xml record serialization for author PIDs: 1) only one value per PID type is allowed; 2) orcid prevails over orcid_pending
|
2020-12-14 11:14:03 +01:00 |
Claudio Atzori
|
61cd129ded
|
XML serialisation test
|
2020-12-11 12:44:53 +01:00 |
Claudio Atzori
|
ce7a319e01
|
using the correct assertion import
|
2020-12-11 12:44:17 +01:00 |
Claudio Atzori
|
7fe2433137
|
excluded transitive older junit dependencies, they can compromise the unit test executions
|
2020-12-11 12:42:55 +01:00 |
Claudio Atzori
|
d9532446eb
|
imported more diffs from master branch; code formatting
|
2020-12-10 16:14:16 +01:00 |
Claudio Atzori
|
12e2f930c8
|
resolved conflicts
|
2020-12-10 10:57:39 +01:00 |
Claudio Atzori
|
ff72fcd91a
|
allow orcid_pending to percolate to the XML graph serialization
|
2020-12-09 19:04:50 +01:00 |
Claudio Atzori
|
211aa04726
|
allow orcid_pending to percolate to the XML graph serialization
|
2020-12-09 18:08:51 +01:00 |
Claudio Atzori
|
026ad40633
|
disabled test
|
2020-12-07 13:50:01 +01:00 |
Claudio Atzori
|
cfb55effd9
|
code formatting
|
2020-12-02 11:23:49 +01:00 |
Alessia Bardi
|
2d15667b4a
|
testing XML generation from json object (case AMS ACTA)
|
2020-12-02 10:16:26 +01:00 |
Claudio Atzori
|
d48f388fb2
|
Merge branch 'provision_indexing'
|
2020-11-19 15:59:55 +01:00 |
Claudio Atzori
|
7c9feaf9e7
|
project attributes removed from the XML record serialization: contactfullname, contactfax, contactphone, contactemail
|
2020-11-19 15:26:20 +01:00 |
Claudio Atzori
|
3f34757c63
|
merged from master
|
2020-11-19 14:34:54 +01:00 |
Claudio Atzori
|
0374d34c3e
|
introduced configuration param outputFormat: HDFS | SOLR
|
2020-11-19 10:34:28 +01:00 |
Claudio Atzori
|
5218718e8b
|
updated set of fields from the MDFormatDSResourceType on PROD
|
2020-11-18 15:00:41 +01:00 |
Claudio Atzori
|
d9e07a242b
|
extended XmlIndexingJob to accept an optional parameter: outputPath. When present, forces the job to write its output on the specified HDFS location
|
2020-11-18 14:34:55 +01:00 |
Claudio Atzori
|
29dcff0f34
|
spark complains about missing classes, so here they are again
|
2020-11-18 14:32:32 +01:00 |
Claudio Atzori
|
8177ce7939
|
test for XmlIndexingJob based on a local miniSolrCluster
|
2020-11-18 10:58:05 +01:00 |
Claudio Atzori
|
2bed29eb09
|
WIP: added oozie workflow for grouping graph entities by id
|
2020-11-13 10:05:12 +01:00 |
Claudio Atzori
|
9b0fb9e958
|
merged from master
|
2020-11-12 09:27:12 +01:00 |
Claudio Atzori
|
822971f54f
|
no need to filter relations in CreateRelatedEntitiesJob_phase1; replaced 'left outer' join with 'left' join in CreateRelatedEntitiesJob_phase2; cleanup;
|
2020-11-12 09:22:59 +01:00 |
Claudio Atzori
|
18d9aad70c
|
improved documentation in dhp-graph-provision
|
2020-11-10 11:48:55 +01:00 |
Claudio Atzori
|
58f28296ea
|
ProvisionConstants moved as ModelHardLimits in dhp-common and applied to truncate long abstracts (len > 150000). Further filtering for empty PID values
|
2020-10-30 10:56:42 +01:00 |
Claudio Atzori
|
1871d1c6f6
|
solve error java.lang.NoSuchFieldError: INSTANCE when instantiating Solr client
|
2020-08-14 11:18:30 +02:00 |
Claudio Atzori
|
3a11a387a9
|
data provision workflow enhancement: added nodes to perform DELETE BY QUERY before the indexing begins and COMMIT after the indexing is completed
|
2020-08-03 14:28:08 +02:00 |
Claudio Atzori
|
cc5d13da85
|
introduced parameter shouldIndex (true|false)
|
2020-07-16 13:46:39 +02:00 |
Claudio Atzori
|
b098cc3cbe
|
avoid repeating identical values for fields: source, description
|
2020-07-16 13:45:53 +02:00 |
Claudio Atzori
|
7d6e269b40
|
reverted CreateRelatedEntitiesJob_phase1 to its previous state
|
2020-07-13 22:54:04 +02:00 |
Claudio Atzori
|
8e97598eb4
|
avoid NPE in case of null instances
|
2020-07-13 20:46:14 +02:00 |
Claudio Atzori
|
06c1913062
|
added different limits for grouping by source and by target, incremented spark.sql.shuffle.partitions for the join operations
|
2020-07-10 19:03:33 +02:00 |
Claudio Atzori
|
4c3836f62e
|
materialize the related entities before joining them
|
2020-07-10 19:00:44 +02:00 |
Claudio Atzori
|
b21866a2da
|
allow setting different relation cut points by source and by target; adjusted weight assigned to relationship types
|
2020-07-10 13:59:48 +02:00 |
Claudio Atzori
|
ff4d6214f1
|
experimenting with pruning of relations
|
2020-07-10 10:06:41 +02:00 |
Claudio Atzori
|
b383ed42fa
|
pass optional parameter relationFilter to the PrepareRelationJob implementation
|
2020-07-07 14:21:28 +02:00 |
Claudio Atzori
|
d380b85246
|
unit test for the preparation of the relations
|
2020-07-02 12:42:13 +02:00 |
Claudio Atzori
|
7817338e05
|
added test to verify the relation pre-processing
|
2020-06-26 17:58:33 +02:00 |
Claudio Atzori
|
8d59fdf34e
|
WIP: dataset based PrepareRelationsJob
|
2020-06-26 14:32:58 +02:00 |
Claudio Atzori
|
216975c4ec
|
restored complete provision workflow
|
2020-06-25 12:55:52 +02:00 |
Claudio Atzori
|
93f627ea51
|
code formatting
|
2020-06-25 12:54:21 +02:00 |
Claudio Atzori
|
e62333192c
|
WIP: prepare relation job
|
2020-06-25 12:22:18 +02:00 |
Claudio Atzori
|
6933ec11fb
|
WIP: prepare relation job
|
2020-06-25 11:04:12 +02:00 |
Sandro La Bruzzo
|
a6c0faac70
|
added test to verify secondary sorting
|
2020-06-25 10:48:15 +02:00 |
Claudio Atzori
|
69b0391708
|
WIP: prepare relation job
|
2020-06-25 10:19:56 +02:00 |
Claudio Atzori
|
46e76affeb
|
WIP: prepare relation job
|
2020-06-24 19:01:15 +02:00 |
Claudio Atzori
|
0e723d378b
|
added default from vocab for missing instance.refereed; remove spurious prefixes from orcid values; WIP: prepare relation job
|
2020-06-24 18:34:42 +02:00 |
Claudio Atzori
|
9cd27183b6
|
[maven-release-plugin] prepare for next development iteration
|
2020-06-22 11:27:44 +02:00 |
Claudio Atzori
|
1e3dab0631
|
[maven-release-plugin] prepare release dhp-1.2.3
|
2020-06-22 11:27:39 +02:00 |
Claudio Atzori
|
c4d9f1837f
|
[maven-release-plugin] prepare for next development iteration
|
2020-06-12 12:21:08 +02:00 |
Claudio Atzori
|
f0746a7605
|
[maven-release-plugin] prepare release dhp-1.2.2
|
2020-06-12 12:21:03 +02:00 |
Claudio Atzori
|
463489f59f
|
code formatting
|
2020-06-12 12:03:25 +02:00 |
Claudio Atzori
|
4bcad1c9c3
|
Merge branch 'graph_cleaning'
|
2020-06-12 11:40:25 +02:00 |
Alessia Bardi
|
e79943965b
|
Fixes #5604: field oamandatepublications in XML
|
2020-06-11 12:49:31 +02:00 |
Claudio Atzori
|
67c7b31ba6
|
Merge branch 'master' into graph_cleaning
|
2020-06-10 15:00:35 +02:00 |
Claudio Atzori
|
ce12f236bb
|
disabled test, need to update the joined_entity.json file
|
2020-06-09 20:07:36 +02:00 |
Claudio Atzori
|
a2fdf85ba1
|
WIP: graph cleaner implementation
|
2020-06-09 19:52:53 +02:00 |
Claudio Atzori
|
05f269a1c0
|
kryo based parallel implementation of CreateRelatedEntitiesJob_phase2, now works by OafType; introduced custom aggregator in AdjacencyListBuilderJob
|
2020-06-01 00:32:42 +02:00 |
Claudio Atzori
|
6f5f498c78
|
restored common properties driving executor-cores and executor-memory in join_organization_relations wf node
|
2020-05-29 11:22:00 +02:00 |
Claudio Atzori
|
b2f9564f13
|
WIP: fixed PrepareRelationsJob; parallel implementation of CreateRelatedEntitiesJob_phase2, now works by OafType; introduced custom aggregator in AdjacencyListBuilderJob
|
2020-05-29 10:58:15 +02:00 |
Claudio Atzori
|
a57965a3ea
|
limiting the dimensions of outliers
|
2020-05-28 17:36:37 +02:00 |
Claudio Atzori
|
821be1f8b6
|
experimental implementation of custom aggregation using kryo encoders
|
2020-05-28 13:53:13 +02:00 |
Claudio Atzori
|
83504ecace
|
limiting the maximum number of authors allowed in XML records to MAX_AUTHORS = 200; authors with ORCID can exceed that limit
|
2020-05-28 13:52:30 +02:00 |
Claudio Atzori
|
ef11593068
|
JoinedEntity.links defined as empty list by default
|
2020-05-28 13:50:44 +02:00 |
Claudio Atzori
|
5dea155a87
|
increased number of partitions produced by the join_all_entities phase as well as spark.sql.shuffle.partitions in adjancency_lists phase
|
2020-05-28 13:49:59 +02:00 |
Claudio Atzori
|
fdd54bad1c
|
code formatting
|
2020-05-27 19:31:54 +02:00 |
Claudio Atzori
|
cfd753217c
|
repartition the join_entities in 24k files
|
2020-05-27 12:44:01 +02:00 |
Claudio Atzori
|
2f1a623d09
|
sync from master branch
|
2020-05-27 12:39:58 +02:00 |
Claudio Atzori
|
9e4ec1543b
|
updated test
|
2020-05-27 12:38:42 +02:00 |
Claudio Atzori
|
8047d16dd9
|
added RDD based adjacency list creation procedure
|
2020-05-27 12:38:12 +02:00 |
Claudio Atzori
|
f057dcdf65
|
limit the max number of externalreferences to MAX_EXTERNAL_ENTITIES
|
2020-05-27 12:37:33 +02:00 |
Claudio Atzori
|
4e36d689dd
|
fixed XML serialization for children sub-elements (duplicates & externalreferences)
|
2020-05-26 18:30:40 +02:00 |
Claudio Atzori
|
b8e541a454
|
fixing repeated organization.websiteurl in organization entities (#5645) as well as project.ecinternationalorganizationeurinterests
|
2020-05-26 10:30:09 +02:00 |
Claudio Atzori
|
7582532e73
|
[maven-release-plugin] prepare for next development iteration
|
2020-05-25 19:48:18 +02:00 |
Claudio Atzori
|
01c2e93395
|
[maven-release-plugin] prepare release dhp-1.2.1
|
2020-05-25 19:48:14 +02:00 |
Claudio Atzori
|
925d933204
|
making XmlRecordFactory immune to graph encoding changes (mostly to avoid NPEs)
|
2020-05-22 08:50:44 +02:00 |
Claudio Atzori
|
b33dd58be4
|
replaced parameter 'reuseRecords' with 'resumeFrom', allowing to restart the provision workflow execution from any step, useful for manual submissions or debugging
|
2020-05-22 08:50:06 +02:00 |
Claudio Atzori
|
dbfb9c19fe
|
minor changes
|
2020-05-21 10:00:14 +02:00 |
Claudio Atzori
|
d7d2a0637f
|
added extra parameters to the provision indexing workflow
|
2020-05-20 14:55:38 +02:00 |
Claudio Atzori
|
0bdfbb0a57
|
reintroduced RDD based relation cut off procedure
|
2020-05-19 15:02:21 +02:00 |