Miriam Baglioni
110ce4b40f
extend the fos model to include the level4 and the scores for level3 and level4. removed bip indicators from the instance
2023-10-10 09:46:40 +02:00
Claudio Atzori
84a58802ab
[OC] using the common pid cleaning function
2023-10-06 14:48:05 +02:00
Claudio Atzori
46034630cf
[OC] compress the output actionset
2023-10-06 14:42:02 +02:00
Claudio Atzori
ee8a39e7d2
cleanup and refinements
2023-10-04 12:32:05 +02:00
Miriam Baglioni
d7fccdc64b
fixed paths in wf to match the req of the pathname
2023-10-02 14:10:57 +02:00
Miriam Baglioni
9898470b0e
Addressing comments in D-Net/dnet-hadoop#340 #issuecomment-10592
2023-10-02 12:54:16 +02:00
Miriam Baglioni
e84f5b5e64
extended existing code to accommodate import of POCI from open citation
2023-10-02 09:25:16 +02:00
Claudio Atzori
4786aa0e09
added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map
2023-09-07 11:21:07 +02:00
Claudio Atzori
15666e86a8
added collectedfrom to the affiliation relations imported from Crossref
2023-09-04 15:56:06 +02:00
Serafeim Chatzopoulos
7de0164c26
Fix import of affiliations relations from Crossref
2023-09-04 16:04:41 +03:00
Miriam Baglioni
9c8b41475a
Merge pull request '8172_impact_indicators_workflow' (#284) from 8172_impact_indicators_workflow into beta
Reviewed-on: D-Net/dnet-hadoop#284
2023-08-14 15:50:48 +02:00
Serafeim Chatzopoulos
97c1ba8918
Merge actionsets of results and projects
2023-08-11 15:56:53 +03:00
Serafeim Chatzopoulos
7cefe2665b
Remove unnecessary classes
2023-07-28 19:14:39 +03:00
Serafeim Chatzopoulos
26a92ce762
Merge branch '8876' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8876
2023-07-28 19:03:57 +03:00
Serafeim Chatzopoulos
ebfba38ab6
Add changes from code review
2023-07-28 19:03:47 +03:00
Serafeim Chatzopoulos
eb8684a8cf
Merge branch 'beta' into 8876
2023-07-28 13:39:33 +02:00
Giambattista Bloisi
e64c2854a3
Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Serafeim Chatzopoulos
2cc5b1a39b
Fixes in workflow.xml
2023-07-21 15:26:50 +03:00
Serafeim Chatzopoulos
be320ba3c1
Indentation fixes
2023-07-17 16:04:21 +03:00
Serafeim Chatzopoulos
bc1a4611aa
Minor changes
2023-07-17 11:17:53 +03:00
Serafeim Chatzopoulos
4eba14a80e
Add oozie workflow
2023-07-06 21:07:50 +03:00
Serafeim Chatzopoulos
bc7b00bcd1
Add bi-directional affiliation relations
2023-07-06 18:29:15 +03:00
Serafeim Chatzopoulos
12528ed2ef
Refactor PrepareAffiliationRelations.java to use OafMapperUtils common functions
2023-07-06 18:08:33 +03:00
Serafeim Chatzopoulos
bbc245696e
Prepare actionsets for BIP affiliations
2023-07-06 15:56:12 +03:00
Serafeim Chatzopoulos
347a889b20
Read affiliation relations
2023-07-06 00:51:01 +03:00
Miriam Baglioni
4c9bc4c3a5
refactoring
2023-06-30 19:05:15 +02:00
Miriam Baglioni
7738372125
[UsageCount] fixed typo in attribute name for datasource table
2023-06-30 18:56:41 +02:00
Miriam Baglioni
55ea485783
[UsageCount] split the count for result at the level of the datasource. For each indicator, one unit is specified for each datasource contributing to that indicator value. The datasource key is the value of the key element in the unit for the measure, while the count for that datasource is in the value
2023-06-30 18:39:30 +02:00
Michele Artini
88a1cbc37d
fixed a datasource id
2023-06-22 07:56:33 +02:00
Alessia Bardi
d5be6a13e9
Updated officialname of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal
2023-06-06 14:43:32 +02:00
Claudio Atzori
8acad52a0c
Merge branch 'beta' into apc_affiliation
2023-05-15 15:47:33 +02:00
Claudio Atzori
8a463cc3e8
fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common
2023-05-15 15:44:46 +02:00
Miriam Baglioni
86fe886c1a
removed the inverse of the Citing relation
2023-05-15 11:20:51 +02:00
Serafeim Chatzopoulos
815a4ddbba
Add actionset creation for project bip indicators in workflow
2023-04-26 20:40:06 +03:00
Serafeim Chatzopoulos
ee04cf92bf
Add actionsets for project impact indicators
2023-04-26 20:23:46 +03:00
Miriam Baglioni
d4fc62c2f6
merging with branch beta
2023-03-02 11:14:54 +01:00
Miriam Baglioni
de8ad1caef
[ECclassification] new implementation for the H2020 classification
2023-03-02 11:14:03 +01:00
Miriam Baglioni
c1f9848953
[ECclassification] added new classes
2023-03-01 15:29:11 +01:00
Claudio Atzori
16ad42e8f3
code formatting
2023-03-01 10:22:13 +01:00
Miriam Baglioni
4f2df876cd
[ECclassification] new implementation first try
2023-02-28 14:44:00 +01:00
Claudio Atzori
2f7346e9cf
WIP monodirectional citations, Datacite
2023-02-28 13:30:51 +01:00
Claudio Atzori
7aebedb43c
code formatting
2023-02-27 11:51:27 +01:00
Miriam Baglioni
80987801d7
[FoS] added check for null on level1 subject
2023-02-27 11:40:22 +01:00
Claudio Atzori
31e97c2a6b
[unresolved entities] updated oozie wf node labels
2023-02-27 11:38:29 +01:00
Miriam Baglioni
23112929e9
[FoS] changed the default separator from comma to tab to solve the issue in subject value split
2023-02-27 10:18:39 +01:00
Claudio Atzori
0c1be41b30
code formatting
2023-02-22 10:15:25 +01:00
Claudio Atzori
477a7c416f
Merge branch 'beta' into UsageCountOnProjectAndDatasource
2023-02-22 09:55:51 +01:00
Miriam Baglioni
016337a0f9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-02-16 15:54:59 +01:00
Claudio Atzori
9a03f71db1
code formatting
2023-02-13 16:25:47 +01:00
Miriam Baglioni
5cf902a2b0
[UsageCount] changed query to make the sum be computed via sql instead of grouping
2023-02-10 16:16:37 +01:00
Miriam Baglioni
f803530df6
[UsageCount] fixed query
2023-02-10 15:50:56 +01:00
Miriam Baglioni
85e53fad00
[UsageCount] addition of usagecount for Projects and datasources. Extension of the action set created for the results with new entities for projects and datasources. Extension of the resource set and modification of the testing class
2023-02-09 18:59:45 +01:00
Sandro La Bruzzo
3c9826f186
replaced the lines function with its implementation, linesWithSeparators.map(l => l.stripLineEnd); this forces the Scala compiler plugin to treat the pipeline as Scala code and not as a Java String.lines() pipeline
2022-12-21 11:21:17 +01:00
Sandro La Bruzzo
72f0d88d6c
formatted code
2022-10-19 14:18:42 +02:00
Sandro La Bruzzo
a1f94530a3
added documentation
2022-10-13 11:47:11 +02:00
Claudio Atzori
27a91841e7
WIP: cleaning of subjects
2022-08-04 11:39:39 +02:00
Claudio Atzori
eb53b52f7c
code formatting
2022-08-02 13:24:47 +02:00
Claudio Atzori
209c7e9dab
[datacite] avoid UnsupportedOperationException
2022-08-01 09:05:35 +02:00
Claudio Atzori
92e48f12f7
[metadata collection] updated collector plugin name
2022-07-29 13:54:00 +02:00
Claudio Atzori
f62c4e05cd
code formatting
2022-07-29 11:56:01 +02:00
Claudio Atzori
ed98a6d9d0
[Datacite mapping] include the older datacite prefixed OpenAIRE id among the originalId[]
2022-07-28 10:15:14 +02:00
Sandro La Bruzzo
0a4f4d98fa
added PMCId to PmArticle
2022-07-13 15:27:17 +02:00
Claudio Atzori
929b145130
code formatting
2022-06-21 23:07:06 +02:00
Claudio Atzori
06b5533d4c
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-06-16 09:22:16 +02:00
Alessia Bardi
88d531dc91
exclude FAIRsharing records from Datacite
2022-06-13 16:17:17 +02:00
Claudio Atzori
b8cda65487
code formatting
2022-06-13 09:20:03 +02:00
Michele Artini
634869ce95
deleted hierarchical rels from ror action set
2022-06-13 09:12:21 +02:00
Claudio Atzori
d098ad0d93
[hb patch] updated map
2022-05-16 15:54:04 +02:00
Miriam Baglioni
89657a0b78
[UsageCount] refactoring
2022-05-09 14:43:27 +02:00
Miriam Baglioni
a056f59c6e
[UsageCount] made it an action set, as it should be, and changed the tests so that they work as well
2022-05-09 12:51:35 +02:00
Serafeim Chatzopoulos
623f7be26d
Fix reading files from HDFS in FileCollector & FileGZipCollector plugins
2022-04-28 16:31:11 +03:00
Claudio Atzori
30105f0722
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-04-22 11:22:21 +02:00
Miriam Baglioni
20de75ca64
[Measures] removed typo
2022-04-21 12:14:03 +02:00
Miriam Baglioni
b61efd613b
[Measures] addressed comments in the PR
2022-04-21 12:09:37 +02:00
Miriam Baglioni
c304657d91
[Measures] put the logic in common, no need to change the schema
2022-04-21 11:27:26 +02:00
Miriam Baglioni
5295effc96
[Measures] fixed issue
2022-04-20 16:20:40 +02:00
Miriam Baglioni
5feae77937
[Measures] last changes to accommodate tests
2022-04-20 15:13:09 +02:00
Miriam Baglioni
869407c6e2
[Measures] added new measure (usagecounts) as action set. Measure added at the level of the result. Ref #7587
2022-04-20 14:02:05 +02:00
Serafeim Chatzopoulos
d0b84d3297
Add FileCollectorPlugin and respective test
2022-04-07 15:06:38 +03:00
Serafeim Chatzopoulos
bc1bf55507
Add AbstractSplittedRecordPlugin
2022-04-07 14:33:04 +03:00
Serafeim Chatzopoulos
e612489670
Add fileGZip collector plugin and respective test
2022-04-06 19:12:44 +03:00
Miriam Baglioni
e77d104951
[OC] added / to workflow path
2022-04-05 15:07:11 +02:00
Claudio Atzori
5226d0a100
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-02-18 15:21:07 +01:00
Claudio Atzori
401dd38074
code formatting
2022-02-18 15:19:15 +01:00
Sandro La Bruzzo
891781ee3f
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2022-02-18 11:11:32 +01:00
Sandro La Bruzzo
d3f03abd51
fixed wrong json path
2022-02-18 11:11:17 +01:00
Claudio Atzori
89c7313fc5
Merge branch 'beta' into hierarchical_orgs_relations
2022-02-17 10:30:04 +01:00
Sandro La Bruzzo
3aa2020b24
added script to regenerate hostedBy Map following the instructions defined on ticket #7539
updated hosted By Map
2022-02-15 11:05:27 +01:00
Miriam Baglioni
be64055cfe
[OpenCitation] changed the name of destination folders
2022-02-14 15:49:44 +01:00
Miriam Baglioni
1490867cc7
[OpenCitation] cleaning of the COCI model
2022-02-14 14:52:12 +01:00
Miriam Baglioni
5c4043dba8
[OpenCitation] refactoring
2022-02-08 16:23:05 +01:00
Miriam Baglioni
759ed519f2
[OpenCitation] added logic to avoid the generation of self-citation relations
2022-02-08 16:15:34 +01:00
Miriam Baglioni
b071f8e415
[OpenCitation] changed to extract each folder in json format just once
2022-02-08 15:37:28 +01:00
Miriam Baglioni
fbc28ee8c3
[OpenCitation] change the integration logic to consider dois with commas inside
2022-02-07 18:32:08 +01:00
Miriam Baglioni
73eba34d42
[UnresolvedEntities] Changed the way to merge the unresolved because the new merge removed the dataInfo from the merged result. Added also data info for subjects
2022-02-01 08:38:41 +01:00
Claudio Atzori
b37bc277c4
reintroduced the hostedby patching to the datacite records
2022-01-21 09:15:13 +01:00
Miriam Baglioni
e7d5a39c03
[BipFinderInstanceLevel] added tests in test class
2022-01-12 17:25:04 +01:00
Miriam Baglioni
4993666d73
[BipFinderInstanceLevel] changed creation of the instance to allow to enrich existing instances with same pid
2022-01-12 16:53:47 +01:00
Sandro La Bruzzo
57e2c4b749
formatted code
2022-01-12 09:40:28 +01:00
Claudio Atzori
dcd282977c
pulled from beta
2022-01-11 16:59:41 +01:00
Claudio Atzori
4f212652ca
scalafmt: code formatting
2022-01-11 16:57:48 +01:00
Miriam Baglioni
b7e450070b
[SDG-FOS] to import SDG file not considering the header
2022-01-07 12:13:26 +01:00
Miriam Baglioni
639190370a
merging with branch beta
2022-01-07 11:29:25 +01:00
Miriam Baglioni
adccc2346a
[SDG-FOS] to lower case for the doi
2022-01-07 11:28:50 +01:00
Claudio Atzori
58f8998e3d
OAF-store-graph mdstores: save them in text format
2022-01-04 15:02:09 +01:00
Claudio Atzori
174c3037e1
OAF-store-graph mdstores: save them in text format
2022-01-04 14:40:16 +01:00
Claudio Atzori
045d767013
OAF-store-graph mdstores: save them in text format
2022-01-04 14:23:01 +01:00
Claudio Atzori
a6977197b3
serialise records in the OAF-store-graph mdstores in json format. Read them again in the graph construction phase using a tolerant parser to support backward compatible changes in the evolution of the schema
2022-01-03 17:25:26 +01:00
Miriam Baglioni
92fd69e25d
[SDG-FOS] alternative way to get input data to avoid OOM error while getting csv
2022-01-03 15:23:06 +01:00
Miriam Baglioni
7a1b440413
[SDG] logic to create unresolved entities out of SDG input. This also changes some classes related to FOS to reuse the same code. The code under createunresolvedentities creates results with the merged update of the inputs provided (bip at the level of the instance, fos and sdg for subjects)
2021-12-23 13:24:28 +01:00
Miriam Baglioni
2a67ee13ec
[SDG] added model class
2021-12-23 10:37:52 +01:00
Miriam Baglioni
10579c0dd0
[FOS] fixed doi value in test
2021-12-22 23:10:16 +01:00
Miriam Baglioni
6116fc5d40
[FOS] added logic to include only different subjects. Test refactoring and extension
2021-12-22 23:04:22 +01:00
Miriam Baglioni
b81efb6a9d
[FOS] changed the mapping between the csv and the model. Changed Test classes and resources
2021-12-22 21:40:35 +01:00
Miriam Baglioni
de6c4c8968
[FOS] creation of the unresolved entities: removed the split for the doi, no longer needed since each row is related to one doi
2021-12-22 16:44:44 +01:00
Miriam Baglioni
34ac56565d
refactoring
2021-12-22 16:28:11 +01:00
Miriam Baglioni
20ef1d657f
refactoring
2021-12-22 16:26:36 +01:00
Miriam Baglioni
813f856d3f
[BipFinder] removing left over parameter in wf
2021-12-22 16:11:12 +01:00
Miriam Baglioni
2c126ed014
[BipFinder] create unresolved entities with measures at the level of the instance
2021-12-22 16:03:41 +01:00
Miriam Baglioni
b5e11a3a0a
[BipFinder] put in common package BipFinder model
2021-12-22 15:33:05 +01:00
Miriam Baglioni
c5739c4266
[BipFinder] create action set for the measures at the level of the result
2021-12-22 15:08:33 +01:00
Miriam Baglioni
e24a7f3496
merging with branch beta
2021-12-21 13:57:19 +01:00
Sandro La Bruzzo
3920d68992
Fixed workflow generation of delta in datacite
2021-12-21 11:41:49 +01:00
Sandro La Bruzzo
b881ee5ef8
[scholexplorer]
- implemented generation of scholix of delta update of datacite
2021-12-15 11:25:32 +01:00
Miriam Baglioni
22d4b5619b
[BipFinder Result] last changes to test and resources files
2021-12-14 14:54:13 +01:00
Miriam Baglioni
6fb6236cd4
changed the way to produce the AS for bipFinder.
2021-12-14 14:51:14 +01:00
Miriam Baglioni
4eb8276493
-
2021-12-14 11:12:17 +01:00
Sandro La Bruzzo
7af0bbd0b1
[scala-refactor] Module dhp-aggregation:
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 11:26:36 +01:00
Sandro La Bruzzo
2164a2a889
Datacite: code refactor, created a general Scala SparkApplication from which all the Spark Scala applications have to inherit
Added some comments to the Datacite transformation code
2021-11-25 10:54:13 +01:00
Sandro La Bruzzo
a7cf277d98
Datacite: Removed HostedBy Patch as described on ticket #7219. Now all the records will have hosted by Unknown Repository
2021-11-22 16:03:17 +01:00
Claudio Atzori
3a4d925386
Merge branch 'beta' into hierarchical_orgs_relations
2021-11-18 18:07:08 +01:00
Claudio Atzori
bafa2990f3
code formatting
2021-11-15 17:07:16 +01:00
Sandro La Bruzzo
efa09057db
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-11-15 14:32:09 +01:00
Sandro La Bruzzo
48923e46a1
added documentation to Pubmed Class and also added mvn site for dhp-aggregations
2021-11-15 14:32:01 +01:00
Miriam Baglioni
4ec88c718c
merge with beta - resolved conflict in pom
2021-11-15 10:52:16 +01:00
Miriam Baglioni
157d33ebf9
[Bypass Action Set] Refactoring
2021-11-15 09:58:48 +01:00
Miriam Baglioni
92d0e18b55
[Bypass Action Set] used constant DOI instead of "doi"
2021-11-12 10:56:58 +01:00
Miriam Baglioni
881113743f
[Bypass Action Set] refactoring
2021-11-12 10:55:50 +01:00
Miriam Baglioni
47ccb53c4f
[Bypass Action Set] modification for comment D-Net/dnet-hadoop#157 (comment)
2021-11-12 10:54:09 +01:00
Miriam Baglioni
716021546e
[Bypass Action Set] minor fix
2021-11-12 10:18:01 +01:00
Miriam Baglioni
935062edec
[Bypass Action Set] creation of unresolved entities
2021-11-11 16:11:25 +01:00
Claudio Atzori
d02caef185
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-27 15:36:29 +02:00
Sandro La Bruzzo
4acfa8fa2e
Scholexplorer Datasource Aggregation:
- Added collectedfrom in the inverse relation generated
Relation resolution:
- increased number of partitions in workflow.xml
- using classid instead of classname to build the pid-dnetId mapping
2021-10-26 17:51:20 +02:00
Sandro La Bruzzo
034304b33a
conflict resolved on merge
2021-10-26 09:40:47 +02:00
Michele Artini
d66e20e7ac
added hierarchy rel in ROR actionset
2021-10-21 15:51:48 +02:00
Sandro La Bruzzo
aeeebd573b
code refactor renamed datacite package
2021-10-20 17:37:42 +02:00
Sandro La Bruzzo
ab3a99d3e9
removed old datacite oozie workflow
2021-10-20 17:19:47 +02:00
Sandro La Bruzzo
ae4e99a471
Adapted workflow of resolution of PID to work into OpenAIRE data workflow
- Added relations in both directions on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Miriam Baglioni
1cc09adfaa
Opencitations: changed the test class to mirror the creation or not of duplicate dois for .refs oc original plus added optional parameter to duplicate the relation
2021-10-18 14:11:27 +02:00
Sandro La Bruzzo
7b15b88d4c
renamed wrong package, implemented last aggregation workflow for scholexplorer
2021-10-15 15:00:15 +02:00