Giambattista Bloisi
73316d8c83
Add jaxb and jaxws dependencies when compiling with spark-34 profile as they are required to run with jdk > 8
2024-05-28 14:14:51 +02:00
Miriam Baglioni
75d5ddb999
Update to include a blackList that filters out the results we know are wrongly associated to IE - update workflow definition - the blacklist parameter
2024-05-27 12:01:28 +02:00
Miriam Baglioni
87c9c61b41
Update to include a blackList that filters out the results we know are wrongly associated to IE - refactoring
2024-05-27 12:01:16 +02:00
Miriam Baglioni
b55fed09f8
Update to include a blackList that filters out the results we know are wrongly associated to IE
2024-05-27 12:01:01 +02:00
Claudio Atzori
107d958b89
[org dedup] avoid NPEs in SparkPrepareNewOrgs
2024-05-27 11:59:54 +02:00
Claudio Atzori
3a7a6ecc32
[org dedup] avoid NPEs in SparkPrepareOrgRels
2024-05-27 11:59:45 +02:00
Claudio Atzori
1af4224d3d
[org dedup] avoid NPEs in SparkPrepareOrgRels
2024-05-27 11:59:33 +02:00
Claudio Atzori
0d5bdb2db0
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-05-27 11:59:02 +02:00
Claudio Atzori
66548e6a83
Merge pull request 'changes in copy script' (#438) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #438
2024-05-27 11:54:03 +02:00
Lampros Smyrnaios
888637773c
Add missing "/*EOS*/" comments.
2024-05-27 12:34:49 +03:00
Lampros Smyrnaios
e0ac494859
Merge branch 'beta' into convert_hive_to_spark_actions
...
# Conflicts:
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step15.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step15_5.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step16_1-definitions.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step16_5.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step2.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step7.sql
2024-05-27 12:27:40 +03:00
Antonis Lempesis
15b54a345a
added fos lvl4
2024-05-24 13:21:28 +03:00
Lampros Smyrnaios
b48ed6e617
Change configuration in the copy-operation to Impala Cluster:
...
Set the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" parameter to "false".
2024-05-23 16:58:12 +03:00
Lampros Smyrnaios
68322843e2
Small updates to the copy-operation to Impala Cluster:
...
- Add a configuration-"switch" to control whether the script exits upon an error or not.
- Allow the script to exit when a table could not be created.
- Show the elapsed time for processing each database.
2024-05-23 15:07:49 +03:00
Lampros Smyrnaios
c7b32bbacc
Update CopyDataToImpalaCluster:
...
Update the code that acquires the entities from the Ocean cluster through Hive, in order to optimize the process and account for additional reserved keywords in Impala.
Co-authored-by: Antonis Lempesis <antleb@di.uoa.gr>
2024-05-23 13:00:19 +03:00
Giambattista Bloisi
1b2357e10a
Merge pull request 'Changes in maven poms to build and test the project using Spark 3.4.x and scala 2.12' (#327) from spark34-integration into beta
...
Reviewed-on: #327
2024-05-23 09:20:28 +02:00
Sandro La Bruzzo
f1fe363b19
merged again from beta (I hope for the last time)
2024-05-22 11:08:52 +02:00
Sandro La Bruzzo
66c1ffc866
merged again from beta (I hope for the last time)
2024-05-22 11:02:46 +02:00
Claudio Atzori
1ea67eba82
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-05-21 13:48:48 +02:00
Claudio Atzori
f9fb2fef6e
Merge pull request 'Modification of Microsoft Academic Graph Mapping' (#435) from mag_only_doi into beta
...
Reviewed-on: #435
2024-05-21 13:48:42 +02:00
Claudio Atzori
834461ba26
[graph provision] fixed wf definition, revised serialization of the usage counts measures
2024-05-21 13:48:06 +02:00
Sandro La Bruzzo
e8a61d5dd5
removed plugin, use only FileGZip plugin
2024-05-21 13:45:29 +02:00
Sandro La Bruzzo
ca9414b737
Implement multiple node name splitter on GZipCollectorPlugin and all nodes that use XMLIterator. If the splitter name is a comma-separated list of values, it splits on all of the values
2024-05-21 09:11:13 +02:00
Sandro La Bruzzo
032bcc8279
since the last beta workflow we decided to introduce in the graph only MAG items with a DOI and set them invisible (this should be the same behaviour as the previous DOIBoost mapping).
...
This commit applies this type of mapping
2024-05-20 09:24:15 +02:00
Sandro La Bruzzo
103e2652b3
merged beta
2024-05-17 14:43:07 +02:00
Sandro La Bruzzo
a87f9ea643
fixed scholexplorer bug
2024-05-17 14:16:43 +02:00
Sandro La Bruzzo
6efab4d88e
fixed scholexplorer bug
2024-05-16 16:19:18 +02:00
Claudio Atzori
92f018d196
[graph provision] fixed path pointing to an intermediate data store in the working directory
2024-05-15 15:39:18 +02:00
Claudio Atzori
0611c81a2f
[graph provision] using Qualifier.classNames to populate the corresponding fields in the JSON payload
2024-05-15 15:33:10 +02:00
Michele Artini
2b3b5fe9a1
oai finalization and test
2024-05-15 14:13:16 +02:00
Claudio Atzori
1efe7f7e39
[graph provision] upgrade to dhp-schema:6.1.2, included project.oamandatepublications in the JSON payload mapping, fixed serialisation of the usageCounts measures
2024-05-14 12:39:31 +02:00
Claudio Atzori
53e7bb4336
Merge pull request 'rest-collector-plugin-with-retry' (#432) from rest-collector-plugin-with-retry into beta
...
Reviewed-on: #432
2024-05-10 09:02:33 +02:00
Claudio Atzori
f7d56e2ef2
Merge branch 'beta' into rest-collector-plugin-with-retry
2024-05-10 09:02:21 +02:00
Claudio Atzori
c1237ab39e
Merge pull request 'Fixes in Graph Provision' (#434) from beta_provision_relation into beta
...
Reviewed-on: #434
2024-05-09 14:15:05 +02:00
Claudio Atzori
dc3a5858f7
Merge branch 'beta' into beta_provision_relation
2024-05-09 14:14:43 +02:00
Claudio Atzori
55f39f7850
[graph provision] adds the possibility to validate the XML records before storing them via the validateXML parameter
2024-05-09 14:06:04 +02:00
Claudio Atzori
39a2afe8b5
[graph provision] fixed XML serialization of the usage counts measures, renamed workflow actions to better reflect their role
2024-05-09 13:54:42 +02:00
Claudio Atzori
908ed9da7a
Merge pull request 'Various fixes in the stats wf' (#430) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #430
2024-05-08 13:41:02 +02:00
Antonis Lempesis
0cada3cc8f
every step is run in the analytics queue. Hardcoded for now, will make a parameter later
2024-05-08 13:42:53 +03:00
Antonis Lempesis
90a4fb3547
fixed typos
2024-05-08 13:17:58 +03:00
Claudio Atzori
18aa323ee9
cleanup unused classes, adjustments in the oozie wf definition
2024-05-08 11:36:46 +02:00
Michele Artini
c9a327bc50
refactoring of gzip method
2024-05-08 11:34:08 +02:00
Michele Artini
e234848af8
oaf record: xpath for root
2024-05-08 10:00:53 +02:00
Claudio Atzori
b4e3389432
fixed property mapping creating the RelatedEntity transient objects. spark cores & memory adjustments. Code formatting
2024-05-07 16:25:17 +02:00
Giambattista Bloisi
711048ceed
PrepareRelationsJob rewritten to use Spark Dataframe API and Windowing functions
2024-05-07 15:44:33 +02:00
Michele Artini
70bf6ac415
oai exporter tests
2024-05-07 09:36:26 +02:00
Michele Artini
aa40e53c19
oai exporter parameters
2024-05-07 08:01:19 +02:00
Michele Artini
ed052a3476
job for the population of the oai database
2024-05-06 16:08:33 +02:00
Claudio Atzori
26363060ed
fixed id prefix creation for the fosnodoi records, again
2024-05-03 15:53:52 +02:00
Claudio Atzori
0486227185
[cleaning] deactivating the cleaning of FOS subjects found in the metadata provided by repositories
2024-05-03 14:31:12 +02:00