Antonis Lempesis
459167ac2f
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-03-21 12:44:58 +02:00
Antonis Lempesis
07f634a46d
code cleanup
2024-03-21 12:44:30 +02:00
Antonis Lempesis
9521625a07
code cleanup
2024-03-21 11:45:08 +02:00
Sandro La Bruzzo
58dbe71d39
update crossref mapping to be runnable separately as a single datasource outside doiboost
2024-03-20 17:04:52 +01:00
Antonis Lempesis
67a5aa0a38
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-03-19 11:24:54 +02:00
dimitrispie
a3a570e9a0
Commit monitor-updates-wf
2024-03-19 09:42:21 +02:00
Giambattista Bloisi
664a381d31
Unify merge logic of entities in MergeUtils.class
2024-03-18 16:04:49 +01:00
Michele Artini
cb29b9773c
xslt rules
2024-03-18 15:31:34 +01:00
Michele Artini
85b844d57e
updated BASE filter param
2024-03-15 15:03:27 +01:00
Michele Artini
455f2e1e07
apply commits from master
2024-03-15 14:56:39 +01:00
Michele Artini
30167aa882
mapped oaf:country from results
2024-03-15 11:24:16 +01:00
Michele Artini
88fef367b9
new plugin to collect from a dump of BASE
2024-03-15 10:47:52 +01:00
Claudio Atzori
078169b922
cleanup
2024-03-15 09:56:04 +01:00
Claudio Atzori
af154d4456
implemented changes from #9497: sort abstracts by string length, included author fullnames in the related results, expanded instance details within each children/result XML element
2024-03-14 16:21:23 +01:00
Claudio Atzori
7863c92466
expanded paper abstract in the result/children XML element (ticket #9497)
2024-03-13 16:25:31 +01:00
Claudio Atzori
eb5887cb9a
including related organization url in the XML record serialization (ticket #9498)
2024-03-13 14:46:00 +01:00
Sandro La Bruzzo
5281f010a5
applied cherry pick
2024-03-13 09:59:20 +01:00
Sandro La Bruzzo
ee1fcb672b
code refactor
2024-03-13 09:46:31 +01:00
Miriam Baglioni
5a32bb9578
[OC New] last fix
2024-03-13 09:36:18 +01:00
Sandro La Bruzzo
c532831718
Moved Crossref Mapping to dhp-aggregations,
refactored code to avoid the utilities defined in DOIBoostMappingUtils for creating parts of the oaf, using the utilities in OafMappingUtils instead
2024-03-13 06:56:10 +01:00
Miriam Baglioni
48c052215c
[OC New] last fix
2024-03-12 23:12:32 +01:00
Claudio Atzori
db66555ebb
WIP: updated provision workflow to create a JSON based representation of the payload
2024-03-12 09:56:09 +01:00
Antonis Lempesis
f74c7e8689
selecting distinct peer_reviewed
2024-03-12 02:13:04 +02:00
Giambattista Bloisi
9092075760
Enrich authors with ORCID info using new matching algorithm
2024-03-11 13:23:59 +01:00
Sandro La Bruzzo
cbd4e5e4bb
update mag mapping
2024-03-08 16:31:40 +01:00
Claudio Atzori
d4871b31e8
WIP: extended provision workflow to create the JSON based payload
2024-03-08 11:43:20 +01:00
Antonis Lempesis
3c79720342
fixed the irish result subset
2024-03-07 14:08:57 +02:00
Antonis Lempesis
5ae4b4286c
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-03-07 12:15:19 +02:00
Miriam Baglioni
5180b6ec8a
[FOSNEW] removed test class
2024-03-07 10:47:13 +01:00
Miriam Baglioni
7827a2d66b
[OCNEW] added creation of the actionset for the results classified with FoS based on the OpenAIRE identifier
2024-03-07 10:36:30 +01:00
Antonis Lempesis
316d585c8a
using distinct apcs per publication to avoid huge sums
2024-03-07 02:07:59 +02:00
Miriam Baglioni
fd34372c40
[OCNEW] first implementation
2024-03-06 13:42:00 +01:00
Sandro La Bruzzo
d34cef3f8d
Merge remote-tracking branch 'origin/beta' into doidoost_dismiss
2024-03-05 11:45:31 +01:00
Sandro La Bruzzo
3b837d38ce
added oozie workflow
2024-03-05 11:44:59 +01:00
Sandro La Bruzzo
f417515e43
Implemented class that generates a normalized table of MAG, which is the starting point for the creation of the mag source
2024-03-04 17:15:13 +01:00
Sandro La Bruzzo
ad0e9aa80c
added first part of the refactoring of the code generating MAG,
making it more readable using spark sql queries
2024-02-29 18:16:15 +01:00
Sandro La Bruzzo
9d94648f3b
code formatted
2024-02-29 18:15:20 +01:00
Giambattista Bloisi
3cd5590f3b
When converting json to XML, remove characters that are not allowed in the XML 1.0 specs, as they will cause xpath failures even if escaped
2024-02-28 15:14:18 +01:00
Giambattista Bloisi
56dd05f85c
Merge pull request 'Revised procedure when converting json data into xml' (#395) from restiterator_xmlcleanup into beta
Reviewed-on: D-Net/dnet-hadoop#395
2024-02-28 10:38:54 +01:00
Claudio Atzori
6fcf872daa
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into index_records
2024-02-28 10:27:28 +01:00
Claudio Atzori
3f07390a58
WIP
2024-02-28 10:10:10 +01:00
Sandro La Bruzzo
7d806a434c
formatted code
2024-02-28 09:31:58 +01:00
Sandro La Bruzzo
b63994dcc4
Merge remote-tracking branch 'origin/beta' into orcid_update
2024-02-28 09:11:18 +01:00
Sandro La Bruzzo
915a76a796
following the comment on the pull requests:
- Added #NUM_OF_THREADS complete job in the queue at the end of the main loop to avoid deadlock
2024-02-28 09:10:55 +01:00
Giambattista Bloisi
773e856550
Revised procedure when converting json data into xml:
- json object keys are renamed to be conformant to xml tag elements, special characters are substituted or removed
- json string values are no longer post-processed as they are already escaped by the org.json.XML.toString method
2024-02-24 16:54:30 +01:00
Sandro La Bruzzo
a712df1e1d
Merge remote-tracking branch 'origin/beta' into orcid_update
2024-02-23 10:12:25 +01:00
Sandro La Bruzzo
b32a9d1994
Implemented workflow for updating table, added step to check if the newly generated table is valid
2024-02-23 10:04:28 +01:00
Michele Artini
3268570b2c
mapping of project PIDs
2024-02-22 14:47:21 +01:00
Miriam Baglioni
72bae7af76
[Transformative Agreement] removed the relations from the ActionSet waiting to have the green light from Ioanna
2024-02-19 16:20:12 +01:00
Miriam Baglioni
43da7e1191
[Tagging Projects and Datasource] changed the way the pathMap parameter is passed. It was too long and was truncated
2024-02-19 16:12:59 +01:00
Serafeim Chatzopoulos
f0dc12634b
Add Action Set creation for affiliations inferred from the OpenAPC data
2024-02-18 18:02:09 +02:00
Claudio Atzori
a63b091bae
Merge branch 'beta' into import_orps_fix
2024-02-15 15:01:56 +01:00
Miriam Baglioni
8dae10b442
-
2024-02-14 14:57:08 +01:00
Miriam Baglioni
83bb97be83
[Tagging Projects and Datasource] added test to check datasource tagging. Fixed issue
2024-02-14 11:23:47 +01:00
Miriam Baglioni
6e1f383e4a
[Tagging Projects and Datasource] first extension of bulktagging to add the context to projects and datasource
2024-02-13 16:37:14 +01:00
Miriam Baglioni
3f7d262a4e
merging with branch beta
2024-02-13 14:05:58 +01:00
Miriam Baglioni
eca021f4d6
[Transformative Agreement] add results with information about the agreement and the country of the organization that paid for it
2024-02-13 12:21:07 +01:00
Miriam Baglioni
bdb6bbb365
merging with branch beta
2024-02-12 15:50:43 +01:00
Claudio Atzori
d85d2df6ad
[graph raw] fixed mapping of the original resource type from the Datacite format
2024-02-09 10:20:20 +01:00
Giambattista Bloisi
b19643f6eb
Dedup aliases, created when a dedup in a previous build has been merged in a new dedup, need to be marked as "deletedbyinference", since they are "merged" in the new dedup
2024-02-08 15:34:59 +01:00
Antonis Lempesis
dd4c27f4f3
added 2 new institutions in monitor
2024-02-08 12:57:57 +02:00
Claudio Atzori
38c9001147
fixed import of ORPs stored on HDFS in the internal graph format (e.g. Datacite)
2024-02-07 17:02:05 +01:00
Claudio Atzori
fd17c1f17c
[actionsets] fixed join type
2024-02-05 16:55:36 +02:00
Claudio Atzori
009dcf6aea
[actionsets] introduced support for the PromoteAction strategy
2024-02-05 16:43:40 +02:00
Claudio Atzori
42f5506306
[orcid enrichment] fixed directory cleanup before distcp
2024-02-05 09:45:36 +02:00
Alessia Bardi
f2a08d8cc2
test for Italian records from IRS repositories
2024-01-30 19:20:14 +01:00
Antonis Lempesis
a512ead447
changed orcid ids to all capital
2024-01-30 16:54:47 +02:00
Miriam Baglioni
07a373a0bd
[bulkTagging] removing checks while performing the substring action so that it will fire an Exception if the parameters are wrongly set
2024-01-30 13:51:11 +01:00
Miriam Baglioni
ead08b0dd4
merging with branch beta
2024-01-30 12:19:10 +01:00
Antonis Lempesis
bb10a22290
merged changes from dnet-hadoop
2024-01-29 21:51:47 +02:00
Miriam Baglioni
a5995ab557
[orcid-enrichment] change the value of parameters.
2024-01-29 18:19:48 +01:00
Miriam Baglioni
a418dacb47
[UsageCount] code extension to include also the name of the datasource
2024-01-29 18:12:33 +01:00
Miriam Baglioni
e9131f4e4a
merging with branch beta
2024-01-29 16:27:18 +01:00
Sandro La Bruzzo
9aebca77a0
Added exception throwing in Hadoop transformation when TR is not syntactically valid
2024-01-29 14:41:02 +01:00
Claudio Atzori
926903b06b
Merge branch 'beta' into stats_with_spark_sql
2024-01-29 09:11:45 +01:00
Giambattista Bloisi
078df0b4d1
Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf
2024-01-26 21:56:55 +01:00
Claudio Atzori
ce3200263e
Merge branch 'beta' into crossref_missing_author_fix
2024-01-26 15:57:04 +01:00
Sandro La Bruzzo
e889808daa
Fixed problem on missing author in crossref Mapping
2024-01-26 12:19:04 +01:00
Antonis Lempesis
c548796463
Changed step16-createIndicatorsTables to use a spark oozie action instead of hive
2024-01-26 02:04:48 +02:00
Sandro La Bruzzo
0386f36385
Added workflow to update ORCID and replaced some parsing, because the update works and employments xml differs from the dump one.
2024-01-25 19:40:59 +01:00
Antonis Lempesis
a7115cfa9e
max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%.
2024-01-25 15:13:16 +01:00
Antonis Lempesis
fd43b0e84a
max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%.
2024-01-25 15:06:34 +01:00
Claudio Atzori
9b13c22e5d
[graph provision] retrieve all the context information by adding all=true to the requests issued to the API
2024-01-23 15:36:08 +01:00
Sandro La Bruzzo
43e0bba7ed
log added during download
2024-01-23 15:04:49 +01:00
Miriam Baglioni
f7d06dc661
compilation after merging
2024-01-23 11:43:08 +01:00
Miriam Baglioni
6e58d79623
merging with branch beta
2024-01-23 11:36:47 +01:00
Miriam Baglioni
e0ec800d7e
[BulkTagging] extend the definition of the pathMap to include also actions that should be performed on the value extracted from the result before applying the constraint
2024-01-23 11:34:53 +01:00
Claudio Atzori
f87f3a6483
[graph provision] updated param specification for the XML converter job
2024-01-23 08:54:37 +01:00
Claudio Atzori
6fd25cf549
code formatting
2024-01-23 08:47:12 +01:00
Claudio Atzori
f76852f385
Merge branch 'beta' into update_pivots_table
2024-01-22 16:37:22 +01:00
Claudio Atzori
1c6db320f4
[graph provision] obtain context info from the context API instead from the ISLookUp service
2024-01-22 15:53:17 +01:00
Claudio Atzori
2655eea5bc
[orcid enrichment] drop paths before copying the non-modified contents
2024-01-19 16:28:05 +01:00
Claudio Atzori
c6b3401596
increased shuffle partitions for publications in the country propagation workflow
2024-01-19 10:15:39 +01:00
Miriam Baglioni
bcc0a13981
[enrichment single step] adding <end> element in wf definition
2024-01-18 17:39:14 +01:00
Miriam Baglioni
6af536541d
[enrichment single step] moving parameter file in correct location
2024-01-18 15:35:40 +01:00
Miriam Baglioni
a12a3eb143
-
2024-01-18 15:18:10 +01:00
Miriam Baglioni
82e9e262ee
[enrichment single step] remove parameter from execution
2024-01-17 17:38:03 +01:00
Miriam Baglioni
67ce2d54be
[enrichment single step] refactoring to fix issues in disappeared result type
2024-01-17 16:50:00 +01:00
Miriam Baglioni
59eaccbd87
[enrichment single step] refactoring to fix issue in disappeared result type
2024-01-15 17:49:54 +01:00
Giambattista Bloisi
21a14fcd80
Reusable RunSQLSparkJob for executing SQL in Spark through Oozie Spark Actions
Implements pivots table update oozie workflow
2024-01-15 10:18:14 +01:00
Sandro La Bruzzo
e0753f19da
Fixed error of connection timeout
2024-01-13 09:27:08 +01:00
sandro.labruzzo
e328bc0ade
fixed missing parameter on download update
2024-01-12 16:18:20 +01:00
Miriam Baglioni
f612125939
fix issue on FoS integration. Removing the null values from FoS
2024-01-12 10:20:28 +01:00
Claudio Atzori
cb9e739484
Merge branch 'beta' into resource_types
2024-01-11 16:29:41 +01:00
Claudio Atzori
2753044d13
refined mapping for the extraction of the original resource type
2024-01-11 16:28:26 +01:00
Giambattista Bloisi
3c66e3bd7b
Create dedup record for "merged" pivots
Do not create dedup records for groups that have more than 20 different acceptance dates
2024-01-10 22:59:52 +01:00
Giambattista Bloisi
10e135db1e
Use dedup_wf_002 in place of dedup_wf_001 to make explicit that a different algorithm has been used to generate this kind of id
2024-01-10 22:59:52 +01:00
Giambattista Bloisi
831cc1fdde
Generate "merged" dedup id relations also for records that are filtered out by the cut parameters
2024-01-10 22:59:52 +01:00
Giambattista Bloisi
1287315ffb
No longer use dedupId information from pivotHistory Database
2024-01-10 22:59:52 +01:00
Giambattista Bloisi
02636e802c
SparkCreateSimRels:
- Create dedup blocks from the complete queue of records matching cluster key instead of truncating the results
- Clean titles once before clustering and similarity comparisons
- Added support for filtered fields in model
- Added support for sorting List fields in model
- Added new JSONListClustering and numAuthorsTitleSuffixPrefixChain clustering functions
- Added new maxLengthMatch comparator function
- Use reduced complexity Levenshtein with threshold in levensteinTitle
- Use reduced complexity AuthorsMatch with threshold early-quit
- Use incremental Connected Component to decrease comparisons in similarity match in BlockProcessor
- Use new clusterings configuration in Dedup tests
SparkWhitelistSimRels: use left semi join for clarity and performance
SparkCreateMergeRels:
- Use new connected component algorithm that converges faster than the algorithm provided by Spark GraphX
- Refactored to use Windowing sorting rather than groupBy to reduce memory pressure
- Use historical pivot table to generate singleton rels, merged rels and keep continuity with dedupIds used in the past
- Comparator for pivot record selection now uses "tomorrow" as filler for missing or incorrect date instead of "2000-01-01"
- Changed generation of ids of type dedup_wf_001 to avoid collisions
DedupRecordFactory: use reduceGroups instead of mapGroups to decrease memory pressure
2024-01-10 22:59:52 +01:00
Antonis Lempesis
e024718f73
creating result_instances even when no pids exist for the instance
2024-01-10 22:25:50 +01:00
Sandro La Bruzzo
859babf722
added some useful comment
2024-01-10 19:51:13 +01:00
Sandro La Bruzzo
39ebb60b38
Merge remote-tracking branch 'origin/beta' into orcid_update
2024-01-10 19:50:00 +01:00
Sandro La Bruzzo
9d5a7c3b22
code refactor
2024-01-10 19:42:34 +01:00
Sandro La Bruzzo
8f61063201
Added workflow
2024-01-10 19:42:22 +01:00
Sandro La Bruzzo
1a42a5c10d
Implemented Download update of ORCID
2024-01-10 18:03:20 +01:00
Miriam Baglioni
e711a05229
fixed conflicts
2024-01-10 11:03:42 +01:00
Miriam Baglioni
71d6f30711
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-01-10 10:59:58 +01:00
dimitrispie
b920307bdd
Changes to indicators
2024-01-09 00:47:09 +02:00
dimitrispie
8b2cbb611e
Changes to beta db names
2024-01-09 00:40:56 +02:00
Antonis Lempesis
2e4cab026c
fixed the result_country definition
2024-01-08 16:01:26 +02:00
dimitrispie
6b823100ae
Update buildIrishMonitorDB.sql
New indicators added
2024-01-07 22:54:39 +02:00
dimitrispie
75bfde043c
Historical Snapshots Workflow
Create historical snapshots db with parameters:
hist_db_name=openaire_beta_historical_snapshots_xxx
hist_db_name_prev=openaire_beta_historical_snapshots_xxx (previous run of wf)
stats_db_name=openaire_beta_stats_xxx
stats_irish_db_name=openaire_beta_stats_monitor_ie_xxx
monitor_db_name=openaire_beta_stats_monitor_xxx
monitor_db_prod_name=openaire_beta_stats_monitor
monitor_irish_db_name=openaire_beta_stats_monitor_ie_xxx
monitor_irish_db_prod_name=openaire_beta_stats_monitor_ie
hist_db_prod_name=openaire_beta_historical_snapshots
hist_db_shadow_name=openaire_beta_historical_snapshots_shadow
hist_date=122023
hive_timeout=150000
hadoop_user_name=xxx
resumeFrom=CreateDB
2024-01-04 15:11:04 +02:00
Miriam Baglioni
cb14470ba6
added properties file in the folder for the workflow of result to organization from inst repo propagation. Changes the path in the classes implementing the propagation
2023-12-22 14:50:05 +01:00
Miriam Baglioni
9f966b59d4
added properties file in the folder for the workflow of result to community from semrel propagation. Changes the path in the classes implementing the propagation
2023-12-22 14:11:47 +01:00
Miriam Baglioni
2f3b5a133d
added properties file in the folder for the workflow of result to community from organization propagation. Changes the path in the classes implementing the propagation
2023-12-22 13:56:40 +01:00
Miriam Baglioni
2f7b9ad815
added properties file in the folder for the workflow of project to result propagation. Changes the path in the classes implementing the propagation
2023-12-22 11:46:15 +01:00
Miriam Baglioni
f2352e8a78
changed in the classes the path for the property files for the propagation of community from project
2023-12-22 11:43:34 +01:00
Miriam Baglioni
009730b3d1
added properties file in the folder for the workflow of orcid propagation. Changes the path in the classes implementing the propagation. Changed the path to the parameter file in the class for entitytoorganization propagation
2023-12-22 11:42:09 +01:00
Miriam Baglioni
89f269c7f4
changed the path to the parameter file in the class for entitytoorganization propagation
2023-12-22 11:37:50 +01:00
Miriam Baglioni
b06aea0adf
adding the bulkTag parameter file in the folder for the oozie workflow for bulkTagging. Changes the path in the class
2023-12-22 11:35:37 +01:00
Miriam Baglioni
3afd4aa57b
adjustments for country propagation
2023-12-22 11:27:30 +01:00
dimitrispie
ffdd03d2f4
Monitor Irish Stats WF
Parameters (with examples):
stats_db_name=openaire_beta_stats_20231208
monitor_irish_db_name=openaire_beta_stats_monitor_ie_20231208b
monitor_irish_db_prod_name=openaire_beta_stats_monitor_ie
graph_db_name=openaire_beta_20231208
monitor_irish_db_shadow_name=openaire_beta_stats_monitor_ie_shadow
hive_timeout=150000
hadoop_user_name=dnet.beta
resumeFrom=Step1-buildIrishMonitorDB
2023-12-22 11:05:24 +02:00
dimitrispie
40b98d8182
Changes to indicators and funders definition
- Changes result_refereed definition
- Added result_country indicator
- Added indi_pub_green_with_license indicator
- Added country from jurisdiction to funders
2023-12-22 10:29:20 +02:00
Claudio Atzori
62104790ae
added metaresourcetype to the result hive DB view
2023-12-21 12:27:10 +01:00
Miriam Baglioni
5011c4d11a
refactoring after compilation
2023-12-20 15:57:26 +01:00
Miriam Baglioni
4740c808f7
-
2023-12-20 14:26:54 +01:00
Miriam Baglioni
d410ea8a41
added needed parameter
2023-12-19 12:15:01 +01:00
Miriam Baglioni
624f5f3f21
[Transformative Agreement] added check to verify the APC were paid by the IReL funder
2023-12-18 15:28:19 +01:00
Miriam Baglioni
354e02e6a9
[Transformative Agreement] removed not needed class. Read directly the json and no need to pass from the csv
2023-12-18 15:20:27 +01:00
Miriam Baglioni
b00771c7cc
[Transformative Agreement] added code to extract relations from the transformative agreement file for the IE products got from OpenAPC
2023-12-18 15:12:44 +01:00
Sandro La Bruzzo
15fd93a2b6
uploaded input parameters on CreateBaseline WF
2023-12-18 12:21:55 +01:00
Sandro La Bruzzo
9d342a47da
updated the transformation Baseline workflow to include mdstore rollback/commit action
2023-12-18 11:48:57 +01:00
Miriam Baglioni
3eca5d2e1c
-
2023-12-18 09:55:27 +01:00
Miriam Baglioni
01ce0b9c76
[doiboost - preprocess] remove transition to orcid preparation from sequence of steps at the beginning of the workflow
2023-12-15 12:24:55 +01:00
Miriam Baglioni
0d8e496a63
-
2023-12-15 12:16:43 +01:00
Claudio Atzori
ff924215b8
[graph provision] added tests for new peerreviewed field
2023-12-12 11:21:30 +01:00
Claudio Atzori
7e8eff40c1
[graph provision] added tests for the new model fields
2023-12-12 08:54:15 +01:00
Miriam Baglioni
8752d275fa
removed not needed parameter
2023-12-09 15:24:45 +01:00
Miriam Baglioni
d4eedada71
adjusting workflow definition
2023-12-09 15:20:11 +01:00
Claudio Atzori
cb71a7936b
[graph cleaning] avoid stack overflow error when navigating Oaf objects declaring an Enum
2023-12-07 23:09:54 +01:00
Claudio Atzori
70eb1796b2
logging typo
2023-12-07 14:08:04 +01:00
Claudio Atzori
c381bacee0
[enrichment] passing the community API base URL
2023-12-07 14:07:11 +01:00
Miriam Baglioni
336fb31d87
[community_result_propagation] adjusting starting point of workflow
2023-12-07 10:27:25 +01:00
Miriam Baglioni
c0cde53bf6
[bulktagging] setting first step of bulktagging as the copy of the entities and relations not involved in the tagging
2023-12-07 10:08:35 +01:00
Miriam Baglioni
616622d2bb
first version of the workflow single step
2023-12-07 09:59:52 +01:00
Claudio Atzori
259c69e446
[orcid enrichment] fixed workflow definition
2023-12-06 19:41:53 +01:00
Claudio Atzori
431c6bb08a
[dedup] added isLookupUrl to the graph consistency workflow definition, required now by the entity grouping phase
2023-12-06 11:06:46 +01:00
Giambattista Bloisi
613ec5ffce
Add profiles for different spark versions: spark-24, spark-34, spark-35
2023-12-05 19:11:06 +01:00
Sandro La Bruzzo
52495f2cd2
used javax.xml.stream.XMLEventReader instead of deprecated scala.xml.pull.XMLEventReader
2023-12-05 19:11:06 +01:00
Sandro La Bruzzo
8c3e9a09d3
added repository openaire-third-parties
2023-12-05 19:11:06 +01:00
Giambattista Bloisi
2fa78f6071
Changes required to build and run tests with Java 17
2023-12-05 19:11:06 +01:00
Giambattista Bloisi
326c9dc08c
Changes in maven poms to build and test the project using Spark 3.4.x and scala 2.12
2023-12-05 19:11:06 +01:00
Claudio Atzori
321922772b
added serialization for the new fields imported for the Irish tender
2023-12-05 16:37:04 +01:00
Claudio Atzori
c5b7253130
[community_organization propagation] fixed workflow parameters
2023-12-05 09:13:33 +01:00
Claudio Atzori
3c3bdb8318
[bulktagging] fixed workflow parameters
2023-12-05 09:08:48 +01:00
Claudio Atzori
2a233a89aa
[graph grouping] added isLookupUrl to the workflow definition, passed to the grouping spark action
2023-12-03 13:32:52 +01:00
Claudio Atzori
178a14c491
code formatting
2023-12-03 13:31:58 +01:00
Sandro La Bruzzo
3caf6ff27e
Extracted the correct original type to pass to instanceTypeMapping in Crossref Mapping
2023-12-01 16:33:56 +01:00
Claudio Atzori
511a98dd80
fixed doiboost process workflow, removed references to the ProcessORCID step
2023-12-01 16:21:53 +01:00
Claudio Atzori
09d061e90b
Merge branch 'beta' into orcid_import
2023-12-01 15:05:35 +01:00
Claudio Atzori
93a700742a
Merge pull request 'Changes for tables and creation of the new indicator indi_is_result_accessible' (#363) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#363
2023-12-01 15:05:23 +01:00
Claudio Atzori
0c3c9ea43d
Merge pull request 'StatsDB workflow to export actionsets about OA routes, diamond, and publicly-funded' (#355) from dimitris.pierrakos/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#355
2023-12-01 15:03:56 +01:00
Claudio Atzori
33cb483c75
using objectSubType as originalType in Crossref2Oaf, code formatting
2023-12-01 15:03:05 +01:00
dimitrispie
c9d995dde0
New institutions added
2023-12-01 15:44:35 +02:00
dimitrispie
a397112cb8
Add new indicator
Add indi_pub_publicly_funded
2023-12-01 15:00:18 +02:00
dimitrispie
76594ded23
Changes to indicators
Fixes on open access colours indicators
- indi_pub_green_oa
- indi_pub_gold_oa
- indi_pub_hybrid
- indi_pub_bronze_oa
- indi_pub_diamond
2023-12-01 13:38:19 +02:00
Claudio Atzori
622fafbd2e
Merge branch 'beta' into orcid_import
2023-12-01 12:28:14 +01:00
Sandro La Bruzzo
bf0fd27c36
Removed unused function
Applied PR Comment of Giambattista in the PR
2023-12-01 12:16:42 +01:00
dimitrispie
48430a32a6
Update StatsAtomicActionsJob.java
Added indi_funded_result_with_fundref indicator
2023-12-01 11:35:01 +02:00
Sandro La Bruzzo
cdfb7588dd
code formatting
2023-11-30 15:31:42 +01:00
Sandro La Bruzzo
5e22b67b8a
Merge remote-tracking branch 'origin/beta' into orcid_import
2023-11-30 15:27:46 +01:00
Sandro La Bruzzo
f718caaac9
Added copy of the untouched entities of the graph
2023-11-30 14:51:00 +01:00
Sandro La Bruzzo
7b5e04f37e
removed Orcid intersection on DOIBoost
2023-11-30 14:36:50 +01:00
Claudio Atzori
6f10791e77
Merge branch 'beta' into propagationapi
2023-11-30 14:20:18 +01:00
Claudio Atzori
4e1aac2e2f
resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350
2023-11-29 14:37:52 +01:00
Sandro La Bruzzo
86b5775e08
added vocabulary in instanceTypeMapping for
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 13:15:43 +01:00
Sandro La Bruzzo
c96ff54b45
Merge remote-tracking branch 'origin/resource_types' into resource_types
2023-11-29 12:45:41 +01:00
Sandro La Bruzzo
af1c2634b3
added instanceTypeMapping original field in the mapping of
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 12:45:30 +01:00
Sandro La Bruzzo
279100fa52
added test
2023-11-29 11:17:58 +01:00
Sandro La Bruzzo
59111713fa
added comment
2023-11-28 09:00:48 +01:00
Sandro La Bruzzo
6f4d0c05ea
Implemented Author Merger for ORCID that takes into account the case when name and surname are swapped
2023-11-28 08:43:56 +01:00
Miriam Baglioni
8eb70e6657
refactoring
2023-11-27 15:13:15 +01:00
Miriam Baglioni
e3cce9a5a0
merging with branch beta
2023-11-27 15:10:55 +01:00
Miriam Baglioni
48e0427a23
changed the parameter from production to baseURL. Fixed issue in tagging configuration
2023-11-27 15:10:27 +01:00
Sandro La Bruzzo
34a4b3cbdf
Implemented ORCID Enrichment
2023-11-24 12:39:58 +01:00
dimitrispie
359e81b7a6
Update StatsAtomicActionsJob.java
Bug fix for duplicate bronze checks
2023-11-23 10:48:55 +02:00
Claudio Atzori
2c77638bf5
Merge branch 'beta' into cleaning_8898
2023-11-22 14:00:10 +01:00
Claudio Atzori
745039ad5b
Merge branch 'beta' into 9117_pubmed_affiliations
2023-11-22 13:52:53 +01:00
Claudio Atzori
11a1207f9c
[graph cleaning] applying coar based vocabularies in bulk
2023-11-22 12:22:14 +01:00
dimitrispie
a94a54a2d0
Changes for tables and creation of the new indicator indi_is_result_accessible
- Drop table statements for all tables to avoid duplicates in case of wf rerun
- Add pdfsaggregated step to create the indi_is_result_accessible table. This step is executed on the new impala cluster only, since the pdfaggregation_i is updated on this cluster.
2023-11-15 14:32:18 +02:00
Miriam Baglioni
eaf0a702de
-
2023-11-14 14:53:34 +01:00
Sandro La Bruzzo
6ce36b3e41
Implemented ORCID Workflow on DHP-Aggregation for retrieving ORCID DUMP and generating tables
2023-11-14 12:04:29 +01:00
dimitrispie
d524e30866
Changes to actionsets
Resolve comments from
D-Net/dnet-hadoop#355
2023-11-14 09:46:52 +02:00
Miriam Baglioni
5bc97615d5
-
2023-11-03 15:35:10 +01:00
Miriam Baglioni
7b1e34f159
refactoring
2023-11-03 15:30:01 +01:00
Miriam Baglioni
638ad9e74f
changing test for new implementation
2023-11-03 15:06:50 +01:00
Miriam Baglioni
edcb17ca98
refactoring and test
2023-11-03 13:01:14 +01:00
Miriam Baglioni
937ff6a7c7
-
2023-10-31 15:56:08 +01:00
Miriam Baglioni
a737dd47b6
removed not needed test class
2023-10-31 15:54:49 +01:00
Miriam Baglioni
c80b768af0
test for project propagation
2023-10-31 15:49:42 +01:00
Miriam Baglioni
e9a20fc8f6
merging with branch beta
2023-10-31 14:36:03 +01:00
Claudio Atzori
262d7c581b
[graph cleaning] implemented further suggestions from https://support.openaire.eu/issues/8898
2023-10-31 14:34:10 +01:00
Serafeim Chatzopoulos
2090003ea9
Adjust tests to new WF input params
2023-10-26 13:47:06 -07:00
Serafeim Chatzopoulos
a82aaf57b2
Renaming input param for crossref input path
2023-10-25 12:05:02 -07:00
Claudio Atzori
b3a61ea955
Merge branch 'beta' into url_validation
2023-10-25 14:22:56 +02:00
dimitrispie
89c4dfbaf4
StatsDB workflow to export actionsets about OA routes, diamond, and publicly-funded
A new oozie workflow capable of reading from the stats db to produce a new actionSet for updating results with:
- green_oa ={true, false}
- openAccesColor = {gold, hybrid, bronze}
- in_diamond_journal={true, false}
- publicly_funded={true, false}
Inputs:
- outputPath
- statsDB
2023-10-24 09:48:23 +03:00
Claudio Atzori
7fc621cdec
added defaults to the graph resolution workflow config-default.xml
2023-10-20 22:28:12 +02:00
Serafeim Chatzopoulos
aad5982bf1
Change the description of the workflow
2023-10-20 12:48:21 +03:00
Miriam Baglioni
a4214ced1e
fixing issue on propagation organization. added --config to workflow definition. added oozie_app to community project
2023-10-20 10:14:20 +02:00
Serafeim Chatzopoulos
6b19dcee80
Add actionset creation for pubmed affiliations
2023-10-19 19:58:25 +03:00
Claudio Atzori
2b9d0416ec
[graph raw] URL Validator to accept double slashes
2023-10-19 16:26:37 +02:00
Claudio Atzori
b0fed1725e
avoid NPEs
2023-10-19 12:13:45 +02:00
Miriam Baglioni
f1b898c6b4
merging with branch beta
2023-10-19 09:04:35 +02:00
Claudio Atzori
6dfcd0c9a2
[raw graph] mapping original resource types
2023-10-16 12:57:18 +02:00
Claudio Atzori
39d24d5469
Merge branch 'beta' into resource_types
2023-10-16 11:56:38 +02:00
Sandro La Bruzzo
a5a89a702f
new spark parameter updated
2023-10-16 11:46:12 +02:00
Miriam Baglioni
159388f9c2
testing and fix some issues
2023-10-16 11:26:07 +02:00
Claudio Atzori
03670bb9ce
[dedup] use common saveParquet and save methods to ensure outputs are compressed
2023-10-16 10:55:47 +02:00
Claudio Atzori
54fbf09ac6
[raw graph] WIP: mapping original resource types
2023-10-16 08:57:47 +02:00
Claudio Atzori
6cf64d5d8b
[SWH] renamed 'Software Heritage Identifier' to 'Software Hash Identifier'
2023-10-13 10:09:26 +02:00
Claudio Atzori
76447958bb
cleanup & docs
2023-10-12 12:23:20 +02:00
Claudio Atzori
dda602fff7
[AMF] docs
2023-10-12 10:05:46 +02:00
Miriam Baglioni
8e9493fad9
merging with branch beta
2023-10-11 18:18:09 +02:00
Miriam Baglioni
89184d5b4f
used the API instead of the IS for bulktagging and propagation for community through organization. Added a new propagation step for communities through projects. Still using the API and not the IS
2023-10-11 18:17:35 +02:00
Claudio Atzori
554551682d
[raw graph] adopting the new COAR based vocabularies for the resource typing
2023-10-11 16:09:19 +02:00
Claudio Atzori
a460ebe215
[UnresolvedEntities] updated action name
2023-10-10 15:50:11 +02:00
Claudio Atzori
66064e99fe
Merge branch 'beta' into fos
2023-10-10 15:07:21 +02:00
Miriam Baglioni
a431b04814
leftover for the properties and removal of bipfinder
2023-10-10 12:53:57 +02:00
Claudio Atzori
ed9282ef2a
removed module dhp-stats-monitor-update
2023-10-10 09:52:03 +02:00
Miriam Baglioni
110ce4b40f
extend the fos model to include the level4 and the scores for level3 and level4. removed bip indicators from the instance
2023-10-10 09:46:40 +02:00
Claudio Atzori
204404b0e3
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-10-10 09:36:13 +02:00
Claudio Atzori
9a98f408b3
code formatting
2023-10-10 09:36:11 +02:00
Claudio Atzori
4e6fccf4f6
Merge pull request 'Beta stats wf updated' (#332) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#332
2023-10-10 09:35:32 +02:00
Miriam Baglioni
a3d01ccb24
refactoring
2023-10-09 14:52:17 +02:00
Miriam Baglioni
8448b9ebfb
merging with branch beta
2023-10-09 14:27:23 +02:00
Miriam Baglioni
3d6be20989
changes to use the API instead of the IS to get the information for the communities to be used during bulktagging and context propagation
2023-10-09 14:26:33 +02:00
dimitrispie
17586f0ff8
Update step20-createMonitorDB.sql
Add result_orcid table to monitor dbs
2023-10-09 14:21:31 +03:00
dimitrispie
489a082f04
Update step16-createIndicatorsTables.sql
Change scripts for gold, hybrid, bronze indicators
2023-10-09 14:00:50 +03:00
Claudio Atzori
ef833840c3
[Doiboost] removed linkage to SFI unidentified project
2023-10-06 15:48:18 +02:00