Claudio Atzori
5130eac247
mapping by participant project contribution
2022-06-24 17:16:42 +02:00
Miriam Baglioni
b98f904d48
[Funder Products Dump] new way to avoid using hive
2022-06-21 17:52:27 +02:00
Miriam Baglioni
7423577a08
[Graph DUMP] add code to produce the delta of new projects with respect to the previous delta/dump
2022-06-21 14:51:38 +02:00
Claudio Atzori
4c8e820ff0
mapping relationship from trasformed records based on oaf:relation
2022-06-14 08:49:02 +02:00
Claudio Atzori
116902c028
mapping relationship from trasformed records based on oaf:relation
2022-06-13 14:31:48 +02:00
Claudio Atzori
5d3b4a9c25
[graph merge beta] merge datasource originalid, collectedfrom, and pid lists
2022-05-11 14:13:06 +02:00
Claudio Atzori
2a8e0fb72f
[openorgs] mapping parent/child relations without massaging the semantic labels
2022-05-10 08:45:53 +02:00
Claudio Atzori
77bc9863e9
[openorgs] mapping parent/child relations without massaging the semantic labels
2022-05-09 16:06:04 +02:00
Claudio Atzori
da611cfbbd
[eosc_services] resolved merge conflicts
2022-05-03 13:37:15 +02:00
Claudio Atzori
2ade69dea6
EOSC Services - minor
2022-05-02 17:03:31 +02:00
Claudio Atzori
b6a7ff3a99
EOSC Services - removed fields from mapping, testing preparation
2022-05-02 15:52:33 +02:00
Claudio Atzori
f5f532d134
EOSC Services - ongoing update
2022-04-29 12:25:24 +02:00
Claudio Atzori
5ffc24d1ba
EOSC Services - ongoing update
2022-04-26 16:18:41 +02:00
Claudio Atzori
29150a5d0c
code formatting
2022-04-21 13:31:56 +02:00
Miriam Baglioni
a38f0f5ea7
mergin with branch beta
2022-04-20 15:44:18 +02:00
Claudio Atzori
05fafa1408
[graph raw] avoid NPEs importing datasource consent fields
2022-04-06 15:23:50 +02:00
Claudio Atzori
8c457f1b2c
conflicts resolved, merged from beta
2022-04-06 10:27:52 +02:00
Miriam Baglioni
79336d46c5
[Clean Context] first naive implementation of a functionality to clean not wanted contextes from one result. This implementation simply verifies the main title of the results start with a given string
2022-04-04 15:52:31 +02:00
Claudio Atzori
0a0ae84c22
[graph raw] DOI based instance URLs on https
2022-03-29 10:52:58 +02:00
Miriam Baglioni
0f7d8ca2e0
[HostedByMap] change on master to align to PR 201 on beta merged as 9f3036c847
2022-03-11 15:16:02 +01:00
Claudio Atzori
f25407bbe2
added mapping for datasource consent fields to integrate them in the graph
2022-03-11 09:32:42 +01:00
Miriam Baglioni
2c5087d55a
[HostedByMap] download of doaj from json, modification of test resources, deletion of class no more needed for the CSV download
2022-03-04 15:18:21 +01:00
Miriam Baglioni
5d608d6291
[HostedByMap] changed the model to include also oaStart date and review process that could be possibly used in the future
2022-03-04 11:06:09 +01:00
Miriam Baglioni
8a41f63348
[HostedByMap] update to download the json instead of the csv
2022-03-04 10:38:43 +01:00
Miriam Baglioni
44b0c03080
[HostedByMap] update to download the json instead of the csv
2022-03-04 10:37:59 +01:00
Alessia Bardi
600ede1798
serialisation of APCs int he XML records
2022-02-11 11:00:20 +01:00
Miriam Baglioni
aae667e6b6
[APC at the result level] added the APC at the level of the result and modified test class
2022-02-04 12:34:25 +01:00
Claudio Atzori
f0ea2410e5
improved mapping titles from datacite records to consider title types
2022-01-21 10:50:34 +01:00
Miriam Baglioni
31b26d48ac
[Graph Dump] fixed issue on extraction of relation between entities and contexts: the relationship name and type were swapped
2021-12-23 10:09:47 +01:00
Miriam Baglioni
56409d1281
[Dump] resolved conflicts with beta and merging
2021-12-14 15:03:45 +01:00
Miriam Baglioni
8d755cca80
-
2021-12-13 15:01:40 +01:00
Claudio Atzori
41c70c607d
cleaning workflow assigns the proper default instance type when a value could not be cleaned using the vocabularies
2021-12-09 16:44:28 +01:00
Sandro La Bruzzo
bf880e2508
[scala-refactor] Module dhp-graph-mapper:
...
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 13:57:41 +01:00
Miriam Baglioni
58bc3f223a
[GRAPH DUMP] Add filtering for relation we do not want to dump. It is based on the relclass
2021-12-02 14:09:46 +01:00
Miriam Baglioni
8905a39bf3
mergin with branch beta
2021-12-02 13:17:29 +01:00
Claudio Atzori
d85af6fc25
[cleaning wf] fixed OAF record navigation, a mapping defined on a container object would have prevented the natvigation to continue on its properties
2021-12-01 15:49:15 +01:00
Sandro La Bruzzo
93fe8ce8b2
entity resolution: fix test
2021-11-22 15:50:43 +01:00
Sandro La Bruzzo
35e20b0647
updated resolution wf:
...
- generate a new version of the graph
- changed merge from union to join
2021-11-22 11:48:55 +01:00
Miriam Baglioni
0506fa2654
[Graph Dump] changed to mirror the changes in the model
2021-11-19 15:56:25 +01:00
Miriam Baglioni
9fae872181
[Graph Dump] changed to mirror the changes in the model
2021-11-19 11:25:50 +01:00
Claudio Atzori
bb5dca7979
cleanup
2021-11-18 17:10:46 +01:00
Miriam Baglioni
793b5a8e5f
Aggiornare 'dhp-workflows/dhp-graph-mapper/src/main/java/eu/dnetlib/dhp/oa/graph/dump/ResultMapper.java'
...
Removing the dump of Measure at the level of the result. We decided not to map it
2021-11-18 14:49:38 +01:00
Miriam Baglioni
c6a9f0a1a8
mergin with branch beta
2021-11-16 12:04:40 +01:00
Miriam Baglioni
99d86134f5
[Graph Dump] changed the dump since the measures have been moded at the level of the instance
2021-11-16 12:04:21 +01:00
Claudio Atzori
668ac25224
[graph resolution] using existing argument parser file name
2021-11-15 17:02:45 +01:00
Miriam Baglioni
b3f9370125
merge with beta - resolved conflict in pom
2021-11-12 11:25:26 +01:00
Sandro La Bruzzo
a7763d2492
removed alternate identifier in resolutionMap
2021-11-12 09:56:45 +01:00
Sandro La Bruzzo
2ca0a436ad
added SparkResolveEntities node to the oozie wf
2021-11-11 10:25:42 +01:00
Sandro La Bruzzo
9cb195314f
implemented and tested resolution of entities
2021-11-11 10:17:40 +01:00
Miriam Baglioni
8cc50ecee0
[Graph Dump] changed AccessRight with BestAccessRight in the dump and modified the dependency to the schema to the SNAPSHOT
2021-11-11 08:59:20 +01:00
Miriam Baglioni
88b73f4f49
mergin with branch beta
2021-11-10 17:00:52 +01:00
Sandro La Bruzzo
6477a40670
implement filter of openCitation
2021-11-09 11:27:12 +01:00
Miriam Baglioni
94918a673c
[Graph DUMP] Fix issue for empty origilaId list
2021-11-08 10:25:28 +01:00
Miriam Baglioni
8442efd8d1
[Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE
2021-11-05 12:29:22 +01:00
Miriam Baglioni
a22c29fba1
[Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE
2021-11-05 12:08:33 +01:00
Miriam Baglioni
0857849a86
[Graph DUMP] Remove dump of measure until it will be clear where to put it (at the level of result or at the level of the instance)
2021-11-05 11:02:37 +01:00
Sandro La Bruzzo
7bd224f051
implement first version of scholexplorer integration for the generation of final graph
2021-11-02 15:58:15 +01:00
Sandro La Bruzzo
d9cbca83f7
moved filter on next phase
2021-10-28 16:13:24 +02:00
Sandro La Bruzzo
1be9aa0a5f
Removed filter of datacite items from the raw graph merging phase, Datacite is not an actionset anymore in beta
2021-10-26 17:52:20 +02:00
Sandro La Bruzzo
4acfa8fa2e
Scholexplorer Datasource Aggregation:
...
- Added collectedfrom in the inverse relation generated
Relation resolution:
- increased number of partitions in workflow.xml
- using classid instead of classname to build the pid-dnetId mapping
2021-10-26 17:51:20 +02:00
Sandro La Bruzzo
034304b33a
conflict resolved on merge
2021-10-26 09:40:47 +02:00
Claudio Atzori
d147295c2f
avoiding java.io.NotSerializableException: java.util.HashMap
2021-10-21 14:15:57 +02:00
Claudio Atzori
3702fe478d
cleanup
2021-10-21 12:05:02 +02:00
Sandro La Bruzzo
ac36aa7d1c
fixed wrong Encoding during a map phase
2021-10-21 11:35:02 +02:00
Sandro La Bruzzo
ae4e99a471
Adapted workflow of resolution of PID to work into OpenAIRE data workflow
...
- Added relations in both verse on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Claudio Atzori
00b78b9c58
cleanup: mapping contents in the graph already defined in the OAF graph model doesn't require to be aware of the vocabularies
2021-10-20 14:04:45 +02:00
Claudio Atzori
c01dd0c925
registered oaf model classes for the KryoSerializer
2021-10-20 13:55:07 +02:00
Claudio Atzori
515e068a78
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-19 16:46:06 +02:00
Claudio Atzori
512e7b0170
code formatting
2021-10-19 16:19:29 +02:00
Claudio Atzori
e9157c67aa
Merge branch 'beta' into dump
2021-10-19 16:15:03 +02:00
Claudio Atzori
98f37c8d81
WIP: worflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 16:14:40 +02:00
Claudio Atzori
7a73010acd
WIP: worflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 11:59:16 +02:00
Miriam Baglioni
c7f6cd2591
added again the setting for saXReader
2021-10-19 10:15:26 +02:00
miconis
5f780a6ba1
bug fix in migrate entities: parameter name was wrong
2021-10-18 23:30:40 +02:00
miconis
995c1eddaf
minor change
2021-10-13 17:07:10 +02:00
miconis
326bf63775
integration of parent child orgs relations
2021-10-13 12:24:48 +02:00
Miriam Baglioni
63933808d4
added fix for mixing result types, added configuration default to funder subworkflow
2021-10-13 11:28:28 +02:00
Miriam Baglioni
fec40bdd95
merging with branch beta - resolved conflicts
2021-10-12 09:16:36 +02:00
Sandro La Bruzzo
5606014b17
code refactor see ticket #7065
2021-10-12 08:11:53 +02:00
Sandro La Bruzzo
2557bb41f5
Implemented new method for update baseline inside scala node
2021-10-06 16:41:08 +02:00
Sandro La Bruzzo
b84e0cabeb
Implemented new method for update baseline
2021-10-05 16:34:47 +02:00
Sandro La Bruzzo
991b06bd0b
removed generation of EBI links from old dump, now EBI link dump is created by another wf
2021-10-05 10:21:33 +02:00
Miriam Baglioni
e653756e3d
applied some suggestiond from Sonar Lint
2021-10-04 18:40:07 +02:00
Miriam Baglioni
c4ccd7b32c
-
2021-10-01 12:59:47 +02:00
Miriam Baglioni
c8321ad31a
merge with branch beta
2021-10-01 12:59:08 +02:00
Claudio Atzori
ebf53a1616
added cleaning for relation fields: subRelType & relClass according to dedicated vocabs
2021-09-15 16:10:37 +02:00
Sandro La Bruzzo
e8b3cb9147
Implemented method to download delta updates in EBI Links
2021-08-30 09:32:45 +02:00
Alessia Bardi
ccf4103a25
keep the original url if the decoder fails for any reason
2021-08-25 10:07:58 +02:00
Sandro La Bruzzo
45898c71ac
fixed wrong doi in pubmed
2021-08-24 15:20:04 +02:00
Alessia Bardi
931f430129
Merge branch 'beta' into datasource_model_eosc_beta
2021-08-23 11:57:21 +02:00
Alessia Bardi
4c1474e693
Dealing with #6859#note-2: we have to decode URLs to avoid & and other chars encoded becasue of the original XML representation of data
2021-08-20 17:03:30 +02:00
Miriam Baglioni
e5cf11d088
change open access route to result matching hbm to gold
2021-08-19 10:29:04 +02:00
Claudio Atzori
f74adc4752
added DownloadCSV2 as alternative implementation of the same download procedure
2021-08-13 15:52:15 +02:00
Claudio Atzori
5f0903d50d
fixed CSV downloader & tests
2021-08-13 14:17:54 +02:00
Claudio Atzori
7ee2757fcd
fixed DownloadCSV parameters spec; workflow patching the hostedby replaces the graph content (publication, datasource) rather than creating a copy
2021-08-13 12:41:01 +02:00
Claudio Atzori
c3ad4ab701
minor fixes
2021-08-13 12:23:15 +02:00
Claudio Atzori
3359f73fcf
cleanup & best practices
2021-08-13 12:00:42 +02:00
Miriam Baglioni
f4ec81c92c
mergin with branch beta
2021-08-13 10:31:35 +02:00
Miriam Baglioni
32fd75691f
refactoring
2021-08-13 10:15:42 +02:00
Claudio Atzori
9587d4aee8
Merge branch 'beta' into hostedbymap
2021-08-12 17:04:30 +02:00
Claudio Atzori
8cdce59e0e
[graph raw] let the mapping exceptions propagate
2021-08-12 11:32:26 +02:00
Miriam Baglioni
b1c6140ebf
removed all comments in Italian
2021-08-11 16:23:33 +02:00
Miriam Baglioni
8da3a25cf6
merging with branch beta
2021-08-11 15:55:34 +02:00
Claudio Atzori
9f4db73f30
updated/fixed unit tests
2021-08-11 15:02:51 +02:00
Claudio Atzori
2ee21da43b
suggestions from SonarLint
2021-08-11 12:13:22 +02:00
Miriam Baglioni
9731a6144a
hostedbymap - in case the journal is open access the access may be changed also for the best access right in the result
2021-08-10 17:49:45 +02:00
Miriam Baglioni
c3931557e3
extended the logic of the dump to consider the validation date in the relation (also in the dumped result for communities and funders at the level of the project), the extention on the instance for the APC, the pid, the alternate identifiers, and the extention of the AccessRight to store the OpenAccessRoute. Added new resourec for testing and extended the old class to verify the new dump. Fixed also issue on relation dump: only relation whose source and target are entities in the graph are dumped. The same hold for references to projects
2021-08-06 18:56:18 +02:00
Miriam Baglioni
6bd1eca7e0
merge branch with beta
2021-08-05 15:23:32 +02:00
Miriam Baglioni
73dc082927
added new dumped field (openaccessroute, pid and alternate identifier at the level of the instance) and the bipFinder measure at the level of the result
2021-08-05 15:20:50 +02:00
Miriam Baglioni
ee13da9258
merge branch with master
2021-08-05 11:34:20 +02:00
Miriam Baglioni
eccf3851b0
Hosted By Map - refactoring
2021-08-04 10:16:30 +02:00
Miriam Baglioni
1e952cccf6
Hosted By Map - refactoring and deletion of not needed methods
2021-08-04 10:15:43 +02:00
Miriam Baglioni
8ba8c77f92
Hosted By Map - refactoring
2021-08-04 10:14:57 +02:00
Miriam Baglioni
8f7623e77a
Hosted By Map - refactoring and application of the new aggregator
2021-08-04 10:14:20 +02:00
Miriam Baglioni
a7bf314fd2
Hosted By Map - added new aggregator to get just one result per datasource id
2021-08-04 10:13:30 +02:00
Miriam Baglioni
100e54e6c8
mergin with branch beta
2021-08-03 10:47:11 +02:00
Miriam Baglioni
461b8a29a0
removed not needed class
2021-08-03 10:46:51 +02:00
Miriam Baglioni
327cddde33
Hosted By Map - refactoring
2021-08-03 10:44:13 +02:00
Miriam Baglioni
1e859706a3
Hosted By Map - Classes to apply the HBM to results and datasources
2021-08-02 19:35:23 +02:00
Miriam Baglioni
72df8f9232
Hosted By Map - removed the aggregator for the datasource (it is no more needed) and added a new aggregator for the results. Changed also the hostedBYMap aggregator
2021-08-02 19:34:44 +02:00
Miriam Baglioni
ff1ce75e33
Hosted By Map - modification in the code to prepare the info needed to apply the HostedByMap. There is no need to join datasources with the hbm: all the information needed is in the hosted by map already
2021-08-02 19:32:59 +02:00
Claudio Atzori
e826aae848
using constants from ModelConstants
2021-08-02 14:28:59 +02:00
Miriam Baglioni
7c6ea2f4c7
Hosted By Map - first attempt for the creation of intermedia information to be used to applu the hosted by map on the graph entities
2021-07-30 17:56:27 +02:00
Miriam Baglioni
d8b9b0553b
Hosted By Map - model classes to store the intermediate information to be used to apply the hosted by map
2021-07-30 17:55:39 +02:00
Miriam Baglioni
613bd3bde0
Hosted By Map - refactor of the first attemp to prepare a new hosted by map dependent on the datasource in the graph and on two external sources: the gold list from unibi ad the doaj list of open access journal. Both the lists are downloaded from provided url parameter
2021-07-30 17:54:45 +02:00
Miriam Baglioni
d1807781c0
mergin with branch beta
2021-07-30 14:34:07 +02:00
Miriam Baglioni
1d6ac3715b
merge branch with beta
2021-07-30 11:58:29 +02:00
Claudio Atzori
19620eed46
applying PR#131, Patch the identifiers (source/target) in the relations, refinements
2021-07-30 11:09:32 +02:00
Claudio Atzori
a6a38cca9e
fixed implementation of PatchRelationsApplication, refined the relative unit test
2021-07-30 11:06:11 +02:00
Claudio Atzori
576693d782
added unit test for PatchRelationsApplication
2021-07-30 10:13:33 +02:00
Miriam Baglioni
baad01cadc
hostedbymap
2021-07-29 13:04:39 +02:00
Claudio Atzori
e87e1805c4
[raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset
2021-07-29 12:13:06 +02:00
Claudio Atzori
1923c1ce21
replaced full join + filtering with a left join
2021-07-29 11:36:20 +02:00
Michele Artini
e6f1773d63
mapping of new eosc fields
2021-07-28 11:17:11 +02:00
Miriam Baglioni
0424f47494
HostedByMap fixing issues
2021-07-28 10:24:13 +02:00
Claudio Atzori
d267dce520
[raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset
2021-07-27 17:18:29 +02:00
Miriam Baglioni
74f801b689
mergin with branch beta
2021-07-27 13:18:31 +02:00
Miriam Baglioni
35e395eae8
merge with master
2021-07-27 12:34:59 +02:00
Miriam Baglioni
eb07f7f40f
Hosted By Map
2021-07-27 12:27:26 +02:00
Sandro La Bruzzo
848aabbb6c
minor fix
2021-07-25 12:06:41 +02:00
Sandro La Bruzzo
8fac10c91e
fixed defintion wf of creation final infospace of scholexplorer
2021-07-25 11:15:37 +02:00
Sandro La Bruzzo
3920c69bc8
change implementation of resolve Relation to generate jsonRdd in output
2021-07-25 09:51:36 +02:00
Sandro La Bruzzo
d9e3b89937
implemented last part of workflows to generate scholixGraph
2021-07-23 16:38:32 +02:00
Sandro La Bruzzo
cfde63a7c3
fixed resolve relation join
2021-07-23 14:17:29 +02:00
Sandro La Bruzzo
4a439c3863
NPE fixed
2021-07-23 14:17:29 +02:00
Sandro La Bruzzo
43e9380cd3
update resolve relation to use the same format of openaire graph
2021-07-23 11:25:18 +02:00
Sandro La Bruzzo
62ae36a3d2
fixed NPE
2021-07-22 15:41:38 +02:00
Sandro La Bruzzo
31d2d6d41e
Scholexplorer: introduction of dedup openaire
2021-07-21 18:09:32 +02:00
Claudio Atzori
65934888a1
adding record identifier among the originalIds regardless of what IdentifierFactory produces
2021-07-19 17:52:52 +02:00
Claudio Atzori
5947cddafc
adding record identifier among the originalIds regardless of what IdentifierFactory produces
2021-07-19 17:52:24 +02:00