Claudio Atzori
6fd50266f1
translate 'otherresearchproduct' into 'other' when setting the related record type
2024-10-28 10:42:46 +01:00
Claudio Atzori
dffa376eb6
Merge pull request 'dhp-schema upgrade & provision mapping' ( #498 ) from beta_provision_alignment_9.0.0 into beta
...
Reviewed-on: D-Net/dnet-hadoop#498
2024-10-28 10:03:24 +01:00
Claudio Atzori
32fa579b80
[graph provision] select the longest abstract
2024-10-28 10:03:02 +01:00
Claudio Atzori
67e37f41fb
Merge pull request 'blacklist filtering moved before the cleanup phase in order to have case sensitive regex' ( #485 ) from dedup_blacklist_fix into beta
...
Reviewed-on: D-Net/dnet-hadoop#485
2024-10-28 09:42:51 +01:00
Miriam Baglioni
0fb6af5586
Updated main pom dependency against dhp-schema, from 8.0.1 to 9.0.0. The new fields included in the updated schema module are populated by the Solr JSON payload mapping, which also limits the number of authors serialised to 200.
2024-10-25 16:28:50 +02:00
Claudio Atzori
46dbb62598
Merge pull request ' #9839 : include claimed affiliation relationships' ( #476 ) from claim-orgs into beta
...
Reviewed-on: D-Net/dnet-hadoop#476
2024-10-25 10:12:59 +02:00
Claudio Atzori
4a9aeb6238
Merge pull request '9126-impact-indicators-wf-optimisation' ( #471 ) from 9126-impact-indicators-wf-optimisation into beta
...
Reviewed-on: D-Net/dnet-hadoop#471
2024-10-25 10:10:44 +02:00
Claudio Atzori
8172bee8c8
Merge pull request 'Minor fixes' ( #496 ) from beta_fixes_oct into beta
...
Reviewed-on: D-Net/dnet-hadoop#496
2024-10-25 10:09:56 +02:00
Miriam Baglioni
1fce7d5a0f
[Person] remove the isolated nodes from the person set
2024-10-25 10:05:17 +02:00
Miriam Baglioni
842cc75dae
[AffRo] fix name
2024-10-25 09:42:52 +02:00
Miriam Baglioni
e75326d6ec
[FundersMatchFromCrossref] added match from CrossRef to DFG unidentified project
2024-10-25 09:13:54 +02:00
Miriam Baglioni
32f444984e
[person] -
2024-10-24 17:51:42 +02:00
Miriam Baglioni
cab8f1135f
[affroNewModel] -
2024-10-24 17:44:33 +02:00
Miriam Baglioni
c93bf82487
[affroNewModel] extended wf definition
2024-10-24 17:34:34 +02:00
Miriam Baglioni
a7699558ed
[person] -
2024-10-24 16:15:12 +02:00
Miriam Baglioni
01679c935a
[person] added test class to be implemented
2024-10-24 15:27:06 +02:00
Miriam Baglioni
c773421cc7
[person] added new substep in propagation workflow main
2024-10-24 14:44:13 +02:00
Miriam Baglioni
cf07ed9058
[person] refactoring
2024-10-24 14:35:14 +02:00
Miriam Baglioni
c921cf7ee0
[personEntity] removed the deletedbyinference results (not indexed, but still in the graph). Changed the writing mode: append instead of overwrite
2024-10-24 09:57:20 +02:00
Giambattista Bloisi
aa7b8fd014
Use workingDir parameter for temporary data of ORCID enrichment
2024-10-23 14:02:17 +02:00
Giambattista Bloisi
0e34b0ece1
Fix imports: point them from the main distribution packages
2024-10-23 14:01:52 +02:00
Miriam Baglioni
aac5eb3499
[personEntity] changed the data info for the relations with projects. added missing parameters to the job.properties file
2024-10-22 11:54:16 +02:00
Miriam Baglioni
821540f94a
[personEntity] updated the property file to include also the db parameters. The same for the wf definition. Refactoring for compilation
2024-10-22 10:13:30 +02:00
Miriam Baglioni
09a2c93fc7
[personEntity] added relations with projects extracting the info from the database
2024-10-21 16:21:15 +02:00
Miriam Baglioni
ce4ee1189f
[personEntity] create entity for each profile in orcid even without works. Added validated true to each relation coming from orcid data
2024-10-21 14:38:15 +02:00
Miriam Baglioni
2b27afaec8
[createASfromAffRo] refactoring after compilation
2024-10-18 16:22:51 +02:00
Miriam Baglioni
0e5dd14538
[createASfromAffRo] adding the provenance datasource used to get the relation (no datasource can be webcrawl = publisher, rawaff means oalex)
2024-10-18 16:22:21 +02:00
Claudio Atzori
62ff843334
adopting dhp-schemas:8.0.1 to support Author's rawAffiliationString(s). Improved graph2hive implementation
2024-10-08 16:22:54 +02:00
Claudio Atzori
d5867a1992
merged #490
2024-10-08 15:39:59 +02:00
Claudio Atzori
e5df68772d
[graph provision] fixed serialisation of the usage counts as measures in the XML records
2024-10-02 09:35:21 +02:00
Miriam Baglioni
7e6d12fa77
[UsageCount] fixed error
...
(cherry picked from commit 9c9a9562ae)
2024-10-01 15:55:07 +02:00
Miriam Baglioni
191fc3a461
[UsageCount] add check in case the datasource is not matched against those present in the graph
...
(cherry picked from commit b42bdd5fb3)
2024-10-01 15:54:31 +02:00
Claudio Atzori
10696f2a44
reverted procedure for creating the UsageCounts actionset
2024-10-01 15:54:13 +02:00
Claudio Atzori
5734b80861
Merge pull request 'datasource table creation split in steps' ( #489 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#489
2024-09-30 16:34:38 +02:00
Antonis Lempesis
f3c179658a
datasource table creation split in steps
2024-09-30 17:12:21 +03:00
Miriam Baglioni
b18ad035c1
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-09-30 15:10:44 +02:00
Miriam Baglioni
e430826e00
[ImportOC] fix to move original folder instead of extracted ones
2024-09-30 15:10:10 +02:00
Claudio Atzori
3fcafc7ed6
Merge pull request 'Latest institutions in monitor dbs' ( #472 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#472
2024-09-26 09:49:01 +02:00
Miriam Baglioni
599e56dbc6
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-09-25 17:28:23 +02:00
Claudio Atzori
6397141e56
code formatting
2024-09-25 15:27:32 +02:00
Claudio Atzori
e354f9853a
[OpenCitations] move the extracted contents under a backup path to avoid needing to re-download it in case of errors
2024-09-25 15:27:02 +02:00
Sandro La Bruzzo
6a097abc89
as described on ticket #9525
...
1. Changed the mapping applied to Crossref records: anything that has a relationship "is-review-of" must be mapped as publication of type "Review".
2. Force the hostedby of Crossref records with DOI prefix 10.3410 and 10.12703 to the H1 Connect data source.
2024-09-25 11:32:54 +02:00
Michele Artini
9754521847
Merge pull request 'fixed a bug with id' ( #486 ) from osfPreprints_plugin into beta
...
Reviewed-on: D-Net/dnet-hadoop#486
2024-09-25 10:02:24 +02:00
Michele Artini
fa2532db30
fixed a bug with id
2024-09-25 09:38:50 +02:00
Michele Artini
54f8b4da39
Merge pull request 'fixed a bug with 'null' string' ( #484 ) from osfPreprints_plugin into beta
...
Reviewed-on: D-Net/dnet-hadoop#484
2024-09-24 15:19:54 +02:00
Michele Artini
b35d046fd2
fixed a bug with 'null' string
2024-09-24 15:18:54 +02:00
Claudio Atzori
4f0463d779
[graph provision] person serialisation, limit the number of authorships and coauthorships before expanding the payloads
2024-09-24 14:54:34 +02:00
Miriam Baglioni
4d3e079590
Merge remote-tracking branch 'origin/beta' into beta
2024-09-24 14:26:29 +02:00
Claudio Atzori
d1cadc77c9
[graph provision] person serialisation, limit the number of authorships and coauthorships before expanding the payloads
2024-09-24 10:57:20 +02:00
Michele Artini
0e89d4a1cf
fixed a bug with topic ENRICH/MORE/SUBJECT/ARXIV
2024-09-24 08:57:49 +02:00
Michele Artini
e941adbe2b
fixed a bug with topic ENRICH/MORE/SUBJECT/ARXIV
2024-09-24 08:57:37 +02:00
Michele Artini
7f81673f3c
removed the deletedByInference=true filter
2024-09-23 15:27:43 +02:00
Michele Artini
fdbe629f49
removed the deletedByInference=true filter
2024-09-23 15:27:28 +02:00
Antonis Lempesis
619aa34a15
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-09-23 15:25:59 +03:00
Antonis Lempesis
dbea7a4072
removed duplicate line
2024-09-23 14:57:11 +03:00
Antonis Lempesis
c9241dba0d
Merge pull request 'convert_hive_to_spark_actions' ( #1 ) from convert_hive_to_spark_actions into beta
...
Reviewed-on: antonis.lempesis/dnet-hadoop#1
2024-09-23 13:53:28 +02:00
Claudio Atzori
e0ff84baf0
[graph provision] person serialisation, limit the number of authorships and coauthorships before expanding the payloads
2024-09-23 10:29:46 +02:00
Michele Artini
2d7a7a962d
unit test @Disabled
2024-09-23 10:19:36 +02:00
Michele Artini
6b0f7cc8b0
skip urls with authentication
2024-09-23 10:16:53 +02:00
Claudio Atzori
5f86c93be6
[graph provision] person serialisation
2024-09-20 12:20:00 +02:00
Michele Artini
339d8124f2
osf plugin: links to contributors and primary_file
2024-09-20 08:44:05 +02:00
Michele Artini
52bb7af03b
use of dom4j
2024-09-19 14:59:05 +02:00
Michele Artini
9073b1159d
partial implementation of osfPreprints plugin + tests
2024-09-19 13:58:53 +02:00
Michele Artini
dcf09811a2
partial implementation of osfPreprints plugin
2024-09-19 12:42:45 +02:00
Claudio Atzori
23e0ab3a7c
run mergeResultsOfDifferentTypes only when checkDelegatedAuthority is true
2024-09-17 15:36:10 +02:00
Claudio Atzori
bfd05cdab2
run mergeResultsOfDifferentTypes only when checkDelegatedAuthority is true
2024-09-17 10:49:32 +02:00
Michele Artini
a2fac78dcc
fixed a problem in incremental harvesting
2024-09-17 10:16:28 +02:00
Michele Artini
99b7adda0c
gtr2 unit test
2024-09-16 15:13:44 +02:00
Michele Artini
bb9cee4f40
implementation of gtr2Publications plugin
2024-09-16 14:16:56 +02:00
Michele De Bonis
6df6b4583e
blacklist filtering moved before the cleanup phase in order to have case sensitive regex
2024-09-16 14:04:59 +02:00
Alessia
07e6e7b4d6
#9839 : include claimed affiliation relationships
2024-09-16 13:41:56 +02:00
Antonis Lempesis
37ad259296
cleanup
2024-09-05 16:02:44 +03:00
Antonis Lempesis
b64c144abf
added new institutions
2024-09-05 16:00:09 +03:00
Serafeim Chatzopoulos
b043f8a963
Remove redundant error messages from impact indicators workflow
2024-09-04 14:28:43 +03:00
Serafeim Chatzopoulos
db03f85366
Remove steps for updating BIP! from the impact indicators workflow
2024-09-04 14:25:44 +03:00
Miriam Baglioni
468f2aa5a5
[AffiliationAffRo]align beta with new affiliation from publisher webpage introduced in production. AffRo collectedfrom OpenAIRE to discriminate against WebCrawl
2024-08-12 18:10:46 +02:00
Miriam Baglioni
89fcf4086c
[Person]fix issue in affiliation relation id construction for person (missing ::)
2024-08-12 18:04:43 +02:00
Miriam Baglioni
45605f93ae
merging with branch beta
2024-08-12 18:03:10 +02:00
Miriam Baglioni
5a7ba77271
[Person]fix issue in affiliation relation id construction for person (missing ::)
2024-08-12 18:01:15 +02:00
Miriam Baglioni
8c185a7b1a
resolving conflicts
2024-08-05 17:14:11 +02:00
Claudio Atzori
e16616b964
added dataInfo to person records
2024-08-05 15:57:37 +02:00
Claudio Atzori
8e7ef79ce0
[bip affiliations] considers only DOI based records
2024-08-05 12:13:48 +02:00
Miriam Baglioni
985ca15264
[openaire-affiliation]removes matchings without DOI
2024-08-05 12:10:40 +02:00
Claudio Atzori
0bf76f2a34
[graph provision] added person to the graph2hive workflow
2024-08-05 09:35:07 +02:00
Claudio Atzori
975d44cac7
[graph provision] added person to the provision workflow
2024-08-02 16:14:10 +02:00
Claudio Atzori
fecbf93e0e
Merge pull request 'FoS L1 & L2' ( #465 ) from fos_l1l2 into beta
...
Reviewed-on: D-Net/dnet-hadoop#465
2024-08-01 13:58:04 +02:00
Claudio Atzori
6bdb8643e6
ActionManager promote: allow to ingest person records in a graph that did not contain them, bumped dhp-schemas version
2024-07-31 11:02:22 +02:00
Claudio Atzori
9486e21a44
copy or process the person records throughout the graph pipeline
2024-07-30 14:25:31 +02:00
Claudio Atzori
64740475d0
depending on dhp-schemas:7.0.1
2024-07-29 11:51:42 +02:00
Miriam Baglioni
1af6571474
merging with branch beta
2024-07-25 15:48:05 +02:00
Claudio Atzori
a81c555fe6
[graph provision] include only FoS L1..L2 in the record serialization
2024-07-25 15:26:47 +02:00
Claudio Atzori
359b8ebda8
[graph provision] include only FoS L1..L2 in the record serialization
2024-07-25 15:22:29 +02:00
Miriam Baglioni
c7f6669f1a
[webcrawl] the blacklist is now in json and no longer in csv after the normalization process
2024-07-25 15:20:18 +02:00
Miriam Baglioni
7cff281d3e
[webcrawl] the blacklist is now in json and no longer in csv after the normalization process
2024-07-25 15:16:42 +02:00
Claudio Atzori
d4bf449e8c
minor
2024-07-25 14:53:06 +02:00
Miriam Baglioni
fc60661ac5
[webcrawl] added code and test (code/resource) to verify the deletion of the relations related to results put in blacklist
2024-07-25 12:25:14 +02:00
Claudio Atzori
d771a883f9
[dedup] updated sql query used to read organizations from the OpenOrgs DB to include their typology
2024-07-25 09:53:48 +02:00
Claudio Atzori
01958a3e07
[graph provision] added filter to exclude records marked with datainfo.deletedbyinference = true
2024-07-24 10:00:10 +02:00
Miriam Baglioni
6f1801d7d1
[webcrawl]-
2024-07-23 17:34:48 +02:00
Miriam Baglioni
19806c2ae3
[SDG]fixed switch of methods
2024-07-23 17:12:55 +02:00