Giambattista Bloisi
613ec5ffce
Add profiles for different spark versions: spark-24, spark-34, spark-35
2023-12-05 19:11:06 +01:00
Sandro La Bruzzo
52495f2cd2
used javax.xml.stream.XMLEventReader instead of deprecated scala.xml.pull.XMLEventReader
2023-12-05 19:11:06 +01:00
Claudio Atzori
33cb483c75
using objectSubType as originalType in Crossref2Oaf, code formatting
2023-12-01 15:03:05 +01:00
Claudio Atzori
622fafbd2e
Merge branch 'beta' into orcid_import
2023-12-01 12:28:14 +01:00
Sandro La Bruzzo
cdfb7588dd
code formatting
2023-11-30 15:31:42 +01:00
Sandro La Bruzzo
5e22b67b8a
Merge remote-tracking branch 'origin/beta' into orcid_import
2023-11-30 15:27:46 +01:00
Claudio Atzori
6f10791e77
Merge branch 'beta' into propagationapi
2023-11-30 14:20:18 +01:00
Claudio Atzori
4e1aac2e2f
resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350
2023-11-29 14:37:52 +01:00
Sandro La Bruzzo
86b5775e08
added vocabulary in instanceTypeMapping for
...
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 13:15:43 +01:00
Sandro La Bruzzo
af1c2634b3
added instanceTypeMapping original field in the mapping of
...
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 12:45:30 +01:00
Miriam Baglioni
8eb70e6657
refactoring
2023-11-27 15:13:15 +01:00
Sandro La Bruzzo
34a4b3cbdf
Implemented ORCID Enrichment
2023-11-24 12:39:58 +01:00
Sandro La Bruzzo
6ce36b3e41
Implemented ORCID Workflow on DHP-Aggregation for retrieving ORCID DUMP and generating tables
2023-11-14 12:04:29 +01:00
Serafeim Chatzopoulos
2090003ea9
Adjust tests to new WF input params
2023-10-26 13:47:06 -07:00
Serafeim Chatzopoulos
a82aaf57b2
Renaming input param for crossref input path
2023-10-25 12:05:02 -07:00
Serafeim Chatzopoulos
aad5982bf1
Change the description of the workflow
2023-10-20 12:48:21 +03:00
Serafeim Chatzopoulos
6b19dcee80
Add actionset creation for pubmed affiliations
2023-10-19 19:58:25 +03:00
Claudio Atzori
a460ebe215
[UnresolvedEntities] updated action name
2023-10-10 15:50:11 +02:00
Miriam Baglioni
a431b04814
leftover for the properties and removal of bipfinder
2023-10-10 12:53:57 +02:00
Miriam Baglioni
110ce4b40f
extend the fos model to include the level4 and the scores for level3 and level4. removed bip indicators from the instance
2023-10-10 09:46:40 +02:00
Claudio Atzori
84a58802ab
[OC] using the common pid cleaning function
2023-10-06 14:48:05 +02:00
Claudio Atzori
46034630cf
[OC] compress the output actionset
2023-10-06 14:42:02 +02:00
Claudio Atzori
ee8a39e7d2
cleanup and refinements
2023-10-04 12:32:05 +02:00
Miriam Baglioni
d7fccdc64b
fixed paths in wf to match the req of the pathname
2023-10-02 14:10:57 +02:00
Miriam Baglioni
9898470b0e
Addressing comments in #340 \#issuecomment-10592
2023-10-02 12:54:16 +02:00
Miriam Baglioni
e84f5b5e64
extended existing codo to accomodate import of POCI from open citation
2023-10-02 09:25:16 +02:00
Claudio Atzori
4786aa0e09
added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map
2023-09-07 11:21:07 +02:00
Claudio Atzori
15666e86a8
added collectedfrom to the affiliation relations imported from Crossref
2023-09-04 15:56:06 +02:00
Serafeim Chatzopoulos
7de0164c26
Fix import of affiliations relations from Crossref
2023-09-04 16:04:41 +03:00
Miriam Baglioni
9c8b41475a
Merge pull request '8172_impact_indicators_workflow' ( #284 ) from 8172_impact_indicators_workflow into beta
...
Reviewed-on: #284
2023-08-14 15:50:48 +02:00
Serafeim Chatzopoulos
97c1ba8918
Merge actionsets of results and projects
2023-08-11 15:56:53 +03:00
Serafeim Chatzopoulos
7cefe2665b
Remove unnecessary classes
2023-07-28 19:14:39 +03:00
Serafeim Chatzopoulos
26a92ce762
Merge branch '8876' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8876
2023-07-28 19:03:57 +03:00
Serafeim Chatzopoulos
ebfba38ab6
Add changes from code review
2023-07-28 19:03:47 +03:00
Serafeim Chatzopoulos
eb8684a8cf
Merge branch 'beta' into 8876
2023-07-28 13:39:33 +02:00
Giambattista Bloisi
e64c2854a3
Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
...
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Giambattista Bloisi
bb5b845e3c
Use scala.binary.version property to resolve scala maven dependencies
...
Ensure consistent usage of maven properties
Profile for compiling with scala 2.12 and Spark 3.4
2023-07-24 11:13:48 +02:00
Serafeim Chatzopoulos
2cc5b1a39b
Fixes in workflow.xml
2023-07-21 15:26:50 +03:00
Serafeim Chatzopoulos
be320ba3c1
Indentation fixes
2023-07-17 16:04:21 +03:00
Serafeim Chatzopoulos
bc1a4611aa
Minor changes
2023-07-17 11:17:53 +03:00
Serafeim Chatzopoulos
4eba14a80e
Add oozie workflow
2023-07-06 21:07:50 +03:00
Serafeim Chatzopoulos
c2998a14e8
Add basic tests for affiliation relations
2023-07-06 20:28:16 +03:00
Serafeim Chatzopoulos
bc7b00bcd1
Add bi-directional affiliation relations
2023-07-06 18:29:15 +03:00
Serafeim Chatzopoulos
12528ed2ef
Refactor PrepareAffiliationRelations.java to use OafMapperUtils common functions
2023-07-06 18:08:33 +03:00
Serafeim Chatzopoulos
bbc245696e
Prepare actionsets for BIP affiliations
2023-07-06 15:56:12 +03:00
Serafeim Chatzopoulos
347a889b20
Read affiliation relations
2023-07-06 00:51:01 +03:00
Miriam Baglioni
4c9bc4c3a5
refactoring
2023-06-30 19:05:15 +02:00
Miriam Baglioni
7738372125
[UsageCount] fixed typo in attribute name for datasource table
2023-06-30 18:56:41 +02:00
Miriam Baglioni
55ea485783
[UsageCount] split the count for result at the level of the datasource. for each indicator one unit is specified for each datasource contrinuting to that indicator value. The datasource key is the value of the key element in the unit for the measure, while the count for that datasource is in the value
2023-06-30 18:39:30 +02:00
Michele Artini
88a1cbc37d
fixed a datasource id
2023-06-22 07:56:33 +02:00
Alessia Bardi
d5be6a13e9
Updated officialnmae of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal
2023-06-06 14:43:32 +02:00
Claudio Atzori
8acad52a0c
Merge branch 'beta' into apc_affiliation
2023-05-15 15:47:33 +02:00
Claudio Atzori
8a463cc3e8
fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common
2023-05-15 15:44:46 +02:00
Miriam Baglioni
78b07400c0
changed test classes
2023-05-15 11:37:08 +02:00
Miriam Baglioni
86fe886c1a
removed the inverse of the Citing relation
2023-05-15 11:20:51 +02:00
Serafeim Chatzopoulos
815a4ddbba
Add actionset creation for project bip indicators in workflow
2023-04-26 20:40:06 +03:00
Serafeim Chatzopoulos
ee04cf92bf
Add actionsets for project impact indicators
2023-04-26 20:23:46 +03:00
Claudio Atzori
24e2fd828b
code formatting
2023-03-08 21:17:08 +01:00
Miriam Baglioni
0fff98a14c
[ECclassification] removed print
2023-03-02 11:46:57 +01:00
Miriam Baglioni
b0c2f7e526
[ECclassification] removed not needed resources
2023-03-02 11:44:48 +01:00
Miriam Baglioni
d4fc62c2f6
mergin with branch beta
2023-03-02 11:14:54 +01:00
Miriam Baglioni
de8ad1caef
[ECclassification] new implementation for the H2020 classification
2023-03-02 11:14:03 +01:00
Miriam Baglioni
c1f9848953
[ECclassification] added new classes
2023-03-01 15:29:11 +01:00
Claudio Atzori
16ad42e8f3
code formatting
2023-03-01 10:22:13 +01:00
Miriam Baglioni
4f2df876cd
[ECclassification] new implementation first try
2023-02-28 14:44:00 +01:00
Claudio Atzori
2f7346e9cf
WIP monodirectional citations, Datacite
2023-02-28 13:30:51 +01:00
Claudio Atzori
7aebedb43c
code formatting
2023-02-27 11:51:27 +01:00
Miriam Baglioni
80987801d7
[FoS] added check for null on level1 subject
2023-02-27 11:40:22 +01:00
Claudio Atzori
31e97c2a6b
[unresolved entities] updated oozie wf node labels
2023-02-27 11:38:29 +01:00
Miriam Baglioni
23112929e9
[FoS] changed the default separator from comma to tab to solve the issue in subject value split
2023-02-27 10:18:39 +01:00
Claudio Atzori
0c1be41b30
code formatting
2023-02-22 10:15:25 +01:00
Claudio Atzori
477a7c416f
Merge branch 'beta' into UsageCountOnProjectAndDatasource
2023-02-22 09:55:51 +01:00
Miriam Baglioni
016337a0f9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-02-16 15:54:59 +01:00
Claudio Atzori
9a03f71db1
code formatting
2023-02-13 16:25:47 +01:00
Miriam Baglioni
5cf902a2b0
[UsageCount] changed query to make the sum be computed via sql instead of grouping
2023-02-10 16:16:37 +01:00
Miriam Baglioni
f803530df6
[UsageCount] fixed query
2023-02-10 15:50:56 +01:00
Miriam Baglioni
bb5bba51b3
[UsageCount] extended test
2023-02-09 19:08:30 +01:00
Miriam Baglioni
85e53fad00
[UsageCount] addition of usagecount for Projects and datasources. Extention of the action set created for the results with new entities for projects and datasources. Extention of the resource set and modification of the testing class
2023-02-09 18:59:45 +01:00
Michele Artini
7b7520850b
fixed an invalid char
2023-01-11 09:22:18 +01:00
Sandro La Bruzzo
3c9826f186
updated lines function to it's implementation linesWithSeparators.map(l => l.stripLineEnd) in this way we force scala plugin compiler to consider this pipeline scala code and not java.string.lines() pipeline
2022-12-21 11:21:17 +01:00
Sandro La Bruzzo
72f0d88d6c
formatted code
2022-10-19 14:18:42 +02:00
Sandro La Bruzzo
a1f94530a3
added documentation
2022-10-13 11:47:11 +02:00
Claudio Atzori
f3f7604e6c
trying to fix a test that fails only on Jenkins
2022-09-27 15:21:37 +02:00
Claudio Atzori
27a91841e7
WIP: cleaning of subjects
2022-08-04 11:39:39 +02:00
Claudio Atzori
eb53b52f7c
code formatting
2022-08-02 13:24:47 +02:00
Claudio Atzori
209c7e9dab
[datacite] avoid UnsupportedOperationException
2022-08-01 09:05:35 +02:00
Claudio Atzori
92e48f12f7
[metadata collection] updated collector plugin name
2022-07-29 13:54:00 +02:00
Claudio Atzori
f62c4e05cd
code formatting
2022-07-29 11:56:01 +02:00
Claudio Atzori
ed98a6d9d0
[Datacite mapping] include the older datacite prefixed OpenAIRE id among the originalId[]
2022-07-28 10:15:14 +02:00
Sandro La Bruzzo
67525076ec
fixed test, now it compiles after commit a6977197b3
2022-07-26 15:35:17 +02:00
Claudio Atzori
d43663d30f
adapted RorActionSet test, it should not create parent/child rels
2022-07-25 17:54:10 +02:00
Claudio Atzori
c3ede1b379
Merge branch 'beta' into pubmed_update
2022-07-25 14:10:22 +02:00
Claudio Atzori
1138b2ac8e
code formatting
2022-07-19 14:15:49 +02:00
Sandro La Bruzzo
00168303db
Added unit test to verify the generation in the OriginalID the old openaire Identifier generated by OAI
2022-07-14 10:19:59 +02:00
Sandro La Bruzzo
0a4f4d98fa
added PMCId to PmArticle
2022-07-13 15:27:17 +02:00
Claudio Atzori
0cb1c70788
code formatting
2022-07-01 10:44:08 +02:00
Claudio Atzori
929b145130
code formatting
2022-06-21 23:07:06 +02:00
Claudio Atzori
06b5533d4c
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-06-16 09:22:16 +02:00
Alessia Bardi
88d531dc91
exclude FAIRsharing records from Datacite
2022-06-13 16:17:17 +02:00
Claudio Atzori
b8cda65487
code formatting
2022-06-13 09:20:03 +02:00
Michele Artini
634869ce95
deleted hierarchical rels from ror action set
2022-06-13 09:12:21 +02:00
Michele Artini
b94a791bc5
unit tests to transform cnr explora
2022-06-09 12:25:34 +02:00
Claudio Atzori
d098ad0d93
[hb patch] updated map
2022-05-16 15:54:04 +02:00
Claudio Atzori
1dda11e031
[hb patch] updated map
2022-05-16 15:53:27 +02:00
Claudio Atzori
8dd5517548
code formatting
2022-05-16 14:35:24 +02:00
Michele Artini
46c07e0724
deleted hierarchical rels from ror action set
2022-05-16 09:39:54 +02:00
Miriam Baglioni
89657a0b78
[UsageCount] refactoring
2022-05-09 14:43:27 +02:00
Miriam Baglioni
a056f59c6e
[UsageCount] make it as an action set as it should be, plus changed the test to make them work as well now
2022-05-09 12:51:35 +02:00
Serafeim Chatzopoulos
623f7be26d
Fix reading files from HDFS in FileCollector & FileGZipCollector plugins
2022-04-28 16:31:11 +03:00
Claudio Atzori
30105f0722
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-04-22 11:22:21 +02:00
Miriam Baglioni
20de75ca64
[Measures] removed typo
2022-04-21 12:14:03 +02:00
Miriam Baglioni
b61efd613b
[Measures] addressed comments in the PR
2022-04-21 12:09:37 +02:00
Claudio Atzori
eabb40fccc
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-04-21 11:42:43 +02:00
Miriam Baglioni
c304657d91
[Measures] put the logic in common, no need to change the schema
2022-04-21 11:27:26 +02:00
Miriam Baglioni
5295effc96
[Measures] fixed issue
2022-04-20 16:20:40 +02:00
Miriam Baglioni
5feae77937
[Measures] last changes to accomodate tests
2022-04-20 15:13:09 +02:00
Miriam Baglioni
869407c6e2
[Measures] added new measure (usagecounts) as action set. Measure added at the level of the result. Ref #7587
2022-04-20 14:02:05 +02:00
Serafeim Chatzopoulos
d0b84d3297
Add FileCollectorPlugin and respective test
2022-04-07 15:06:38 +03:00
Serafeim Chatzopoulos
bc1bf55507
Add AbstractSplittedRecordPlugin
2022-04-07 14:33:04 +03:00
Claudio Atzori
c26222623f
[maven-release-plugin] prepare for next development iteration
2022-04-07 13:32:22 +02:00
Claudio Atzori
86585a6b27
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 13:32:19 +02:00
Claudio Atzori
ad85d88eaf
[maven-release-plugin] rollback the release of dhp-1.2.4
2022-04-07 13:28:35 +02:00
Claudio Atzori
598e11dfd7
[maven-release-plugin] prepare for next development iteration
2022-04-07 13:27:02 +02:00
Claudio Atzori
db3d9877a5
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 13:26:58 +02:00
Claudio Atzori
3bba6d6e38
[maven-release-plugin] rollback the release of dhp-1.2.4
2022-04-07 12:23:17 +02:00
Claudio Atzori
2ac2d928bd
[maven-release-plugin] prepare for next development iteration
2022-04-07 12:18:47 +02:00
Claudio Atzori
85bc722ff4
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 12:18:43 +02:00
Claudio Atzori
bc05b6168a
[maven-release-plugin] rollback the release of dhp-1.2.4
2022-04-07 11:49:06 +02:00
Claudio Atzori
505420fd61
[maven-release-plugin] prepare for next development iteration
2022-04-07 11:34:06 +02:00
Claudio Atzori
66e718981e
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 11:34:02 +02:00
Serafeim Chatzopoulos
e612489670
Add fileGZip collector plugin and respective test
2022-04-06 19:12:44 +03:00
Claudio Atzori
8c457f1b2c
conflicts resolved, merged from beta
2022-04-06 10:27:52 +02:00
Miriam Baglioni
e77d104951
[OC] added / to workflow path
2022-04-05 15:07:11 +02:00
Sandro La Bruzzo
1b11010169
minor fix
2022-03-29 10:59:14 +02:00
Claudio Atzori
a87c070447
conflicts resolved, merged from beta
2022-02-24 12:51:31 +01:00
Claudio Atzori
5226d0a100
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-02-18 15:21:07 +01:00
Claudio Atzori
401dd38074
code formatting
2022-02-18 15:19:15 +01:00
Sandro La Bruzzo
891781ee3f
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2022-02-18 11:11:32 +01:00
Sandro La Bruzzo
d3f03abd51
fixed wrong json path
2022-02-18 11:11:17 +01:00
Claudio Atzori
89c7313fc5
Merge branch 'beta' into hierarchical_orgs_relations
2022-02-17 10:30:04 +01:00
Sandro La Bruzzo
3aa2020b24
added script to regenerate hostedBy Map following instruction defined on ticket #7539
...
updated hosted By Map
2022-02-15 11:05:27 +01:00
Miriam Baglioni
be64055cfe
[OpenCitation] changed the name of destination folders
2022-02-14 15:49:44 +01:00
Miriam Baglioni
1490867cc7
[OpenCitation] cleaning of the COCI model
2022-02-14 14:52:12 +01:00
Miriam Baglioni
5c4043dba8
[OpenCitation] refactoring
2022-02-08 16:23:05 +01:00
Miriam Baglioni
759ed519f2
[OpenCitation] added logic to avoid the genration of self citations relations
2022-02-08 16:15:34 +01:00
Miriam Baglioni
b071f8e415
[OpenCitation] change to extract in json format each folder just onece
2022-02-08 15:37:28 +01:00
Miriam Baglioni
fbc28ee8c3
[OpenCitation] change the integration logic to consider dois with commas inside
2022-02-07 18:32:08 +01:00
Miriam Baglioni
493caef358
[stats-wf]fixed the result_result table related to PR#191
2022-02-04 14:51:25 +01:00
Miriam Baglioni
73eba34d42
[UnresolvedEntities] Changed the way to merge the unresolved because the new merge removed the dataInfo from the merged result. Added also data info for subjects
2022-02-01 08:38:41 +01:00
Claudio Atzori
b37bc277c4
reintroduced the hostedby patching to the datacite records
2022-01-21 09:15:13 +01:00