Michele Artini
5cdba9172b
implementation of the new collector plugin: research_fi
2024-07-10 14:53:13 +02:00
Miriam Baglioni
c465835061
[Person]new implementation for the extraction of the coAuthorship relations
2024-07-09 12:29:55 +02:00
Miriam Baglioni
814e650e12
[Irish Tender]changed the irish.json file according to comments #26, #29, and #34 for 9635
2024-07-04 12:24:28 +02:00
Miriam Baglioni
ddd20e7f8e
[Person]first implementation of the action set to include Person entity in the graph starting from the orcid data
2024-07-04 12:08:46 +02:00
Miriam Baglioni
9cbe966b4a
[AffiliationIngestion]refactoring
2024-06-29 18:35:49 +02:00
Miriam Baglioni
236b64d830
[AffiliationIngestion]Extended the ingestion of affiliation from OpenAIRE to include also links derived from Web Crawl. Extended the test. Inserted in Constants the id and name of the webcrawl datasource to be used here and also in the ingestion of links from web crawl
2024-06-29 18:29:20 +02:00
Miriam Baglioni
67ff783e65
[Person]First implementation to include Person entity in the graph
2024-06-29 17:13:01 +02:00
Miriam Baglioni
d35edac212
[IrishFunderList]made changes according to 9635 comments 20, 21, 22 and 23
2024-06-20 12:28:28 +02:00
Miriam Baglioni
6421f8fece
Merge remote-tracking branch 'origin/beta' into beta
2024-06-19 11:12:15 +02:00
Miriam Baglioni
ac270f795b
[IrishFunderList]made changes according to 9635 comments 14, 15 and 16
2024-06-19 11:11:52 +02:00
Giambattista Bloisi
9bf2bda1c6
Fix: next returned a null value at end of stream
2024-06-12 13:28:51 +02:00
Giambattista Bloisi
d90cb099b8
Fix for paginationStart parameter management
2024-06-11 20:23:44 +02:00
Miriam Baglioni
8fe934810f
Merge remote-tracking branch 'origin/beta' into beta
2024-06-11 10:28:51 +02:00
Miriam Baglioni
9da006e98c
[SDGFoSActionSet]remove datainfo for the result. It is not needed (qualifier.classid = UPDATE) since subjects do not go at the level of the instance
2024-06-11 10:28:32 +02:00
Giambattista Bloisi
85c1eae7e0
Fixes for pagination strategy looping at end of download
2024-06-10 19:03:58 +02:00
Michele Artini
c726572418
changed some parameters in OSF test
2024-06-07 12:03:26 +02:00
Claudio Atzori
a02f3f0d2b
code formatting
2024-05-30 10:21:18 +02:00
Alessia Bardi
05ee783c07
Merge branch 'beta' into dblp_collection_plugin
2024-05-29 16:04:39 +02:00
Claudio Atzori
c272c4ad68
code formatting
2024-05-29 15:50:07 +02:00
Alessia Bardi
c5f4da16a4
Merge branch 'beta' into rest-collector-request-header-map
2024-05-29 15:46:23 +02:00
Alessia
1b165a14a0
Rest collector plugin on hadoop supports a new param to pass request headers
2024-05-29 15:41:36 +02:00
Michele Artini
e996787be2
OSF test
2024-05-29 15:05:17 +02:00
Miriam Baglioni
5d85b70e1f
[NOAMI] removed Ireland funder id 501100011103. ticket 9635
2024-05-29 11:55:00 +02:00
Miriam Baglioni
75d5ddb999
Update to include a blackList that filters out the results we know are wrongly associated to IE - update workflow definition - the blacklist parameter
2024-05-27 12:01:28 +02:00
Miriam Baglioni
87c9c61b41
Update to include a blackList that filters out the results we know are wrongly associated to IE - refactoring
2024-05-27 12:01:16 +02:00
Miriam Baglioni
b55fed09f8
Update to include a blackList that filters out the results we know are wrongly associated to IE
2024-05-27 12:01:01 +02:00
Sandro La Bruzzo
66c1ffc866
merged again from beta (I hope for the last time)
2024-05-22 11:02:46 +02:00
Sandro La Bruzzo
e8a61d5dd5
removed plugin, use only FileGZip plugin
2024-05-21 13:45:29 +02:00
Sandro La Bruzzo
ca9414b737
Implement multiple node name splitter on GZipCollectorPlugin and all nodes that use XMLIterator. If the splitter name is a comma-separated list of values, it splits on all of the values
2024-05-21 09:11:13 +02:00
Sandro La Bruzzo
032bcc8279
Since the last beta workflow we decided to introduce in the graph only MAG items with a DOI and set them invisible (this should be the same behaviour as the previous DOIBoost mapping).
...
This commit applies this type of mapping
2024-05-20 09:24:15 +02:00
Claudio Atzori
f7d56e2ef2
Merge branch 'beta' into rest-collector-plugin-with-retry
2024-05-10 09:02:21 +02:00
Claudio Atzori
26363060ed
fixed id prefix creation for the fosnodoi records, again
2024-05-03 15:53:52 +02:00
Claudio Atzori
e1a0fb8933
fixed id prefix creation for the fosnodoi records
2024-05-03 14:14:18 +02:00
Michele Artini
f4068de298
code reindent + tests
2024-05-02 09:51:33 +02:00
Michele Artini
2615136efc
added a retry mechanism
2024-04-30 11:58:42 +02:00
Sandro La Bruzzo
052c6aac9d
formatted code
2024-04-26 16:03:04 +02:00
Sandro La Bruzzo
0d628cd62b
merged again from beta
2024-04-23 17:34:55 +02:00
Claudio Atzori
93dd9cc639
code formatting
2024-04-23 11:28:00 +02:00
Miriam Baglioni
6189879643
[NOAMI] removed entry for Irish Research eLibrary (IReL) Care Board from the list of funders.
2024-04-23 11:09:18 +02:00
Miriam Baglioni
7de114bda0
[WebCrawl] addressing comments from PR
2024-04-22 13:52:50 +02:00
Miriam Baglioni
776c898c4b
[WebCrawl] adding affiliation relations from web information
2024-04-22 11:04:17 +02:00
Claudio Atzori
0656ab2838
code formatting
2024-04-20 08:10:58 +02:00
Claudio Atzori
e5879b68c7
[transformative agreement] including result-funder relations to the information imported from the TRs
2024-04-19 17:14:18 +02:00
Sandro La Bruzzo
b84ad0c06e
merged beta
2024-04-19 14:39:59 +02:00
Miriam Baglioni
0625b9061f
removed the funder id: 100011062 Asian Spinal Cord Network, wrongly associated to Ireland
2024-04-16 15:26:53 +02:00
Miriam Baglioni
9eeb9f5d32
merging with branch beta
2024-04-16 15:24:40 +02:00
Sandro La Bruzzo
a5ddd8dfbb
Added Action set generation for the MAG organization
2024-04-16 13:39:15 +02:00
Michele Artini
78b9d84e4a
test
2024-04-16 09:41:16 +02:00
Sandro La Bruzzo
41a42dde64
code formatted
2024-04-11 17:43:48 +02:00
Sandro La Bruzzo
843dc95340
resolved conflict
2024-04-11 17:38:16 +02:00
Sandro La Bruzzo
1e30454ee0
added vocabulary to instanceTypeMapping of MAG
2024-04-11 17:32:30 +02:00
Sandro La Bruzzo
2581672c11
updated wf of MAG and crossref to use transaction
2024-04-11 17:27:49 +02:00
Sandro La Bruzzo
a0642bd190
added instanceTypeMapping field on MAG
2024-04-11 13:10:12 +02:00
Sandro La Bruzzo
98dc042db5
mapping generated for MAG,
...
missing generation of Organization Action set
2024-04-05 18:12:53 +02:00
Sandro La Bruzzo
ef582948a7
Updated mapping
2024-04-05 11:10:44 +02:00
Sandro La Bruzzo
5142f462b5
completed mapping from paper to OAF, not tested
2024-04-04 21:06:04 +02:00
Miriam Baglioni
0794e0667b
Merge branch 'doidoost_dismiss' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doidoost_dismiss
2024-04-04 09:16:18 +02:00
Miriam Baglioni
4b1de076ac
[DataciteHostedByMap] added entry for EBRAINS
2024-04-04 09:16:14 +02:00
Miriam Baglioni
c8a88b2187
[DataciteHostedByMap] added entry for EBRAINS
2024-04-04 09:14:58 +02:00
Sandro La Bruzzo
31e152d2bb
Merge remote-tracking branch 'origin/doidoost_dismiss' into doidoost_dismiss
2024-04-03 17:08:35 +02:00
Sandro La Bruzzo
6f3e925cae
Implemented first part of the new MAG mapping
2024-04-03 17:07:14 +02:00
Miriam Baglioni
f0f6abf892
[MapToFunderLink]added references for HFRI and Erasmus+ for the creation of links for funders
2024-04-03 14:59:09 +02:00
Miriam Baglioni
50fbebf186
[NOAMI] removed entry for Health and Social Care Board from the list of funders. Modified IRC putting 1596 and 1597 as synonyms, as required in ticket 9635
2024-04-03 11:45:40 +02:00
Michele Artini
71d6e02886
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2024-04-03 09:50:41 +02:00
Michele Artini
02c9a311c8
base datainfo with trust=0.89
2024-04-03 09:50:21 +02:00
Miriam Baglioni
42846d3b91
[OpenCitation] add compression option when writing the sequence file
2024-04-03 09:25:00 +02:00
Miriam Baglioni
4f0a044245
Merge pull request 'Add action set creation for Datacite affiliations' ( #413 ) from 9647_datacite_affiliations into beta
...
Reviewed-on: #413
2024-04-02 17:33:38 +02:00
Serafeim Chatzopoulos
cbe13a5c61
Fix datacite input path in properties file
2024-04-02 18:00:35 +03:00
Miriam Baglioni
9c9a9562ae
[UsageCount] fixed error
2024-04-02 16:56:37 +02:00
Miriam Baglioni
b42bdd5fb3
[UsageCount] add check in case the datasource is not matched against those present in the graph
2024-04-02 16:28:27 +02:00
Miriam Baglioni
64cbd8abe9
Merge pull request '[UsageCount] Usage count per result split by datasource' ( #318 ) from UsageStatsRecordDS into beta
...
Reviewed-on: #318
2024-04-02 10:21:39 +02:00
Serafeim Chatzopoulos
0eb0701b26
Add action set creation for Datacite affiliations
2024-04-01 17:23:26 +03:00
Sandro La Bruzzo
73a67c0e4a
Improved Crossref mapping to include also unpaywall tested
2024-03-26 17:26:47 +01:00
Miriam Baglioni
94b931f7bd
[BulkTagging - tag datasource and projects]merging with branch beta
2024-03-26 14:25:19 +01:00
Claudio Atzori
ef52128c55
included new stats* workflows in parent pom list of modules, code formatting
2024-03-26 10:42:10 +01:00
Sandro La Bruzzo
ece56f0178
update crossref mapping to be transformed together with UnpayWall
2024-03-25 18:18:10 +01:00
Claudio Atzori
74e5d05577
Merge branch 'beta' into ocnew
2024-03-25 16:10:31 +01:00
Claudio Atzori
6c3b692f60
integrated minor change from beta branch
2024-03-25 16:10:23 +01:00
Claudio Atzori
9a5b134ddf
Merge branch 'beta' into FOSNew
2024-03-25 16:07:37 +01:00
Claudio Atzori
71c1f81b54
Merge branch 'beta' into exception_on_invalid_transofmation_rule
2024-03-25 16:05:11 +01:00
Claudio Atzori
91b61687fa
Merge branch 'beta' into bulkTaggingPathMapExtention
2024-03-25 15:50:18 +01:00
Claudio Atzori
54936b7f42
Merge branch 'beta' into transformativeagreement
2024-03-25 15:42:22 +01:00
Michele Artini
e1149eb5c4
xslt rules and tests
2024-03-25 15:01:42 +01:00
Michele Artini
6ffb1faf09
fixed a problem with multiple nodes
2024-03-25 12:15:51 +01:00
Michele Artini
7faa115ba0
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2024-03-22 11:08:59 +01:00
Michele Artini
f9c74c98fa
fixed an identifier xpath
2024-03-22 11:08:45 +01:00
Sandro La Bruzzo
58dbe71d39
update crossref mapping to be runnable separately as a single datasource outside doiboost
2024-03-20 17:04:52 +01:00
Giambattista Bloisi
664a381d31
Unify merge logic of entities in MergeUtils.class
2024-03-18 16:04:49 +01:00
Michele Artini
cb29b9773c
xslt rules
2024-03-18 15:31:34 +01:00
Michele Artini
85b844d57e
updated BASE filter param
2024-03-15 15:03:27 +01:00
Michele Artini
455f2e1e07
apply commits from master
2024-03-15 14:56:39 +01:00
Michele Artini
88fef367b9
new plugin to collect from a dump of BASE
2024-03-15 10:47:52 +01:00
Sandro La Bruzzo
5281f010a5
applied cherry pick
2024-03-13 09:59:20 +01:00
Sandro La Bruzzo
ee1fcb672b
code refactor
2024-03-13 09:46:31 +01:00
Miriam Baglioni
5a32bb9578
[OC New] last fix
2024-03-13 09:36:18 +01:00
Sandro La Bruzzo
c532831718
Moved Crossref Mapping on dhp-aggregations,
...
refactored code to avoid using the utilities for creating parts of the OAF defined in DOIBoostMappingUtils, using the utilities in OafMappingUtils instead
2024-03-13 06:56:10 +01:00
Miriam Baglioni
48c052215c
[OC New] last fix
2024-03-12 23:12:32 +01:00
Sandro La Bruzzo
cbd4e5e4bb
update mag mapping
2024-03-08 16:31:40 +01:00
Miriam Baglioni
5180b6ec8a
[FOSNEW] removed test class
2024-03-07 10:47:13 +01:00
Miriam Baglioni
7827a2d66b
[OCNEW] added creation of the actionset for the results classified with FoS based on the OpenAIRE identifier
2024-03-07 10:36:30 +01:00
Miriam Baglioni
fd34372c40
[OCNEW] first implementation
2024-03-06 13:42:00 +01:00
Sandro La Bruzzo
d34cef3f8d
Merge remote-tracking branch 'origin/beta' into doidoost_dismiss
2024-03-05 11:45:31 +01:00
Sandro La Bruzzo
3b837d38ce
added oozie workflow
2024-03-05 11:44:59 +01:00
Sandro La Bruzzo
f417515e43
Implemented class that generates a normalized table of MAG, which is the starting point for the creation of the mag source
2024-03-04 17:15:13 +01:00
Sandro La Bruzzo
ad0e9aa80c
added first part of refactoring of the code generating MAG,
...
made it more readable using Spark SQL queries
2024-02-29 18:16:15 +01:00
Giambattista Bloisi
3cd5590f3b
When converting json to XML, remove characters that are not allowed in the XML 1.0 specs, as they will cause xpath failures even if escaped
2024-02-28 15:14:18 +01:00
Giambattista Bloisi
56dd05f85c
Merge pull request 'Revised procedure when converting json data into xml' ( #395 ) from restiterator_xmlcleanup into beta
...
Reviewed-on: #395
2024-02-28 10:38:54 +01:00
Sandro La Bruzzo
7d806a434c
formatted code
2024-02-28 09:31:58 +01:00
Sandro La Bruzzo
915a76a796
following the comment on the pull requests:
...
- Added #NUM_OF_THREADS complete job in the queue at the end of the main loop to avoid deadlock
2024-02-28 09:10:55 +01:00
Giambattista Bloisi
773e856550
Revised procedure when converting json data into xml:
...
- json object keys are renamed to be conformant to xml tag elements, special characters are substituted or removed
- json string values are no longer post-processed as they are already escaped by the org.json.XML.toString method
2024-02-24 16:54:30 +01:00
Sandro La Bruzzo
a712df1e1d
Merge remote-tracking branch 'origin/beta' into orcid_update
2024-02-23 10:12:25 +01:00
Sandro La Bruzzo
b32a9d1994
Implemented workflow for updating table, added step to check if the new generated table is valid
2024-02-23 10:04:28 +01:00
Miriam Baglioni
72bae7af76
[Transformative Agreement] removed the relations from the ActionSet waiting to have the green light from Ioanna
2024-02-19 16:20:12 +01:00
Serafeim Chatzopoulos
f0dc12634b
Add Action Set creation for affiliations inferred from the OpenAPC data
2024-02-18 18:02:09 +02:00
Miriam Baglioni
eca021f4d6
[Transformative Agreement] add results with information about the agreement and the country of the organization that paid for it
2024-02-13 12:21:07 +01:00
Miriam Baglioni
bdb6bbb365
merging with branch beta
2024-02-12 15:50:43 +01:00
Miriam Baglioni
07a373a0bd
[bulkTagging] removing checks while performing the substring action so that it will fire an Exception if the parameters are wrongly set
2024-01-30 13:51:11 +01:00
Miriam Baglioni
a418dacb47
[UsageCount] code extension to include also the name of the datasource
2024-01-29 18:12:33 +01:00
Miriam Baglioni
e9131f4e4a
merging with branch beta
2024-01-29 16:27:18 +01:00
Sandro La Bruzzo
9aebca77a0
Added exception throwing in Hadoop transformation when TR is not syntactically valid
2024-01-29 14:41:02 +01:00
Sandro La Bruzzo
0386f36385
Added workflow to update ORCID and replaced some parsing, because the update works and employments xml differs from the dump one.
2024-01-25 19:40:59 +01:00
Sandro La Bruzzo
43e0bba7ed
log added during download
2024-01-23 15:04:49 +01:00
Miriam Baglioni
f7d06dc661
compilation after merging
2024-01-23 11:43:08 +01:00
Miriam Baglioni
e0ec800d7e
[BulkTagging] extend the definition of the pathMap to include also actions that should be performed on the value extracted from the result before applying the constraint
2024-01-23 11:34:53 +01:00
Sandro La Bruzzo
e0753f19da
Fixed error of connection timeout
2024-01-13 09:27:08 +01:00
sandro.labruzzo
e328bc0ade
fixed missing parameter on download update
2024-01-12 16:18:20 +01:00
Miriam Baglioni
f612125939
fix issue on FoS integration. Removing the null values from FoS
2024-01-12 10:20:28 +01:00
Sandro La Bruzzo
859babf722
added some useful comment
2024-01-10 19:51:13 +01:00
Sandro La Bruzzo
8f61063201
Added workflow
2024-01-10 19:42:22 +01:00
Sandro La Bruzzo
1a42a5c10d
Implemented Download update of ORCID
2024-01-10 18:03:20 +01:00
Miriam Baglioni
624f5f3f21
[Transformative Agreement] added check to verify the APC were paid by the IReL funder
2023-12-18 15:28:19 +01:00
Miriam Baglioni
354e02e6a9
[Transformative Agreement] removed unneeded class. Read the JSON directly, with no need to go through the CSV
2023-12-18 15:20:27 +01:00
Miriam Baglioni
b00771c7cc
[Transformative Agreement] added code to extract relations from the transformative agreement file for the IE products got from OpenAPC
2023-12-18 15:12:44 +01:00
Sandro La Bruzzo
15fd93a2b6
uploaded input parameters on CreateBaseline WF
2023-12-18 12:21:55 +01:00
Sandro La Bruzzo
9d342a47da
updated the transformation Baseline workflow to include mdstore rollback/commit action
2023-12-18 11:48:57 +01:00
Giambattista Bloisi
613ec5ffce
Add profiles for different spark versions: spark-24, spark-34, spark-35
2023-12-05 19:11:06 +01:00
Sandro La Bruzzo
52495f2cd2
used javax.xml.stream.XMLEventReader instead of deprecated scala.xml.pull.XMLEventReader
2023-12-05 19:11:06 +01:00
Claudio Atzori
33cb483c75
using objectSubType as originalType in Crossref2Oaf, code formatting
2023-12-01 15:03:05 +01:00
Claudio Atzori
622fafbd2e
Merge branch 'beta' into orcid_import
2023-12-01 12:28:14 +01:00
Sandro La Bruzzo
cdfb7588dd
code formatting
2023-11-30 15:31:42 +01:00
Sandro La Bruzzo
5e22b67b8a
Merge remote-tracking branch 'origin/beta' into orcid_import
2023-11-30 15:27:46 +01:00
Claudio Atzori
6f10791e77
Merge branch 'beta' into propagationapi
2023-11-30 14:20:18 +01:00
Claudio Atzori
4e1aac2e2f
resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350
2023-11-29 14:37:52 +01:00
Sandro La Bruzzo
86b5775e08
added vocabulary in instanceTypeMapping for
...
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 13:15:43 +01:00
Sandro La Bruzzo
af1c2634b3
added instanceTypeMapping original field in the mapping of
...
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 12:45:30 +01:00
Miriam Baglioni
8eb70e6657
refactoring
2023-11-27 15:13:15 +01:00
Sandro La Bruzzo
34a4b3cbdf
Implemented ORCID Enrichment
2023-11-24 12:39:58 +01:00
Sandro La Bruzzo
6ce36b3e41
Implemented ORCID Workflow on DHP-Aggregation for retrieving ORCID DUMP and generating tables
2023-11-14 12:04:29 +01:00
Serafeim Chatzopoulos
2090003ea9
Adjust tests to new WF input params
2023-10-26 13:47:06 -07:00
Serafeim Chatzopoulos
a82aaf57b2
Renaming input param for crossref input path
2023-10-25 12:05:02 -07:00