Sandro La Bruzzo
efa09057db
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-11-15 14:32:09 +01:00
Sandro La Bruzzo
48923e46a1
added documentation to Pubmed Class and also added mvn site for dhp-aggregations
2021-11-15 14:32:01 +01:00
Claudio Atzori
d2c787d416
[graph resolution] fixed sequence of the workflow steps
2021-11-15 14:31:15 +01:00
Claudio Atzori
975b10b711
[actionmanager] increased spark.sql.shuffle.partitions to 5000
2021-11-15 12:31:45 +01:00
Miriam Baglioni
4ec88c718c
merge with beta - resolved conflict in pom
2021-11-15 10:52:16 +01:00
Miriam Baglioni
6f1a434e90
[Bypass Action Set] Fixed test to consider the new identifier utils
2021-11-15 09:59:23 +01:00
Miriam Baglioni
157d33ebf9
[Bypass Action Set] Refactoring
2021-11-15 09:58:48 +01:00
Miriam Baglioni
6595135a1a
[Dump Schemas] changed the schema of the dumped result according to the modifications in the bestAccessRight type
2021-11-12 11:45:38 +01:00
Miriam Baglioni
43cae4ad88
Merge branch 'dump' of https://code-repo.d4science.org/D-Net/dnet-hadoop into dump
2021-11-12 11:36:54 +01:00
Miriam Baglioni
b3f9370125
merge with beta - resolved conflict in pom
2021-11-12 11:25:26 +01:00
Miriam Baglioni
92d0e18b55
[Bypass Action Set] used constant DOI instead of "doi"
2021-11-12 10:56:58 +01:00
Miriam Baglioni
881113743f
[Bypass Action Set] refactoring
2021-11-12 10:55:50 +01:00
Miriam Baglioni
47ccb53c4f
[Bypass Action Set] modification for comment D-Net/dnet-hadoop#157 (comment)
2021-11-12 10:54:09 +01:00
Miriam Baglioni
ffb0ce1d59
merge with beta - resolved conflict in pom
2021-11-12 10:19:59 +01:00
Miriam Baglioni
716021546e
[Bypass Action Set] minor fix
2021-11-12 10:18:01 +01:00
Sandro La Bruzzo
3469cc2b1d
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-11-12 09:56:52 +01:00
Sandro La Bruzzo
a7763d2492
removed alternate identifier in resolutionMap
2021-11-12 09:56:45 +01:00
Miriam Baglioni
b8bdabfae9
[Graph Dump] removed OpenAccessRoute from test in best access right
2021-11-11 16:16:48 +01:00
Miriam Baglioni
e5498052e8
[Graph Dump] removed OpenAccessRoute from test in best access right
2021-11-11 16:14:10 +01:00
Miriam Baglioni
935062edec
[Bypass Action Set] creation of unresolved entities
2021-11-11 16:11:25 +01:00
Antonis Lempesis
26f086dd64
removed the too restrictive clause. will discuss again
2021-11-11 12:57:19 +02:00
Claudio Atzori
148289150f
Merge branch 'beta' into doiboost_url
2021-11-11 10:40:19 +01:00
Sandro La Bruzzo
2ca0a436ad
added SparkResolveEntities node to the oozie wf
2021-11-11 10:25:42 +01:00
Sandro La Bruzzo
9cb195314f
implemented and tested resolution of entities
2021-11-11 10:17:40 +01:00
Miriam Baglioni
6d3c4c4abe
merging with branch beta
2021-11-11 08:59:53 +01:00
Miriam Baglioni
8cc50ecee0
[Graph Dump] changed AccessRight with BestAccessRight in the dump and modified the dependency to the schema to the SNAPSHOT
2021-11-11 08:59:20 +01:00
Miriam Baglioni
88b73f4f49
merging with branch beta
2021-11-10 17:00:52 +01:00
Miriam Baglioni
c371b23077
-
2021-11-10 17:00:37 +01:00
Alessia Bardi
fc8fceaac3
create direct link to WT projects as well
2021-11-10 14:11:52 +01:00
Alessia Bardi
6cd91004e3
fixed DOI for Wellcome Trust in mapping relationships from Crossref
2021-11-09 12:22:57 +01:00
Miriam Baglioni
9e214ce0eb
[BypassAS] addition of OC relations
2021-11-09 12:07:19 +01:00
Alessia Bardi
b9d4f115cc
fixed Crossref mapping for SFI projects
2021-11-09 12:04:45 +01:00
Sandro La Bruzzo
6477a40670
implement filter of openCitation
2021-11-09 11:27:12 +01:00
Miriam Baglioni
6f7ca539c6
[BypassAS] update of results for bipFinder and FOS
2021-11-09 11:25:41 +01:00
Miriam Baglioni
a7d50c499b
[BypassAS] prepare FOS subject, test and model for FOS and BipFinder scores
2021-11-08 16:44:19 +01:00
Antonis Lempesis
91354c6068
- fetching all context related results
- storing tables as parquet
2021-11-08 15:15:46 +02:00
Miriam Baglioni
94918a673c
[Graph DUMP] Fix issue for empty originalId list
2021-11-08 10:25:28 +01:00
Claudio Atzori
9cb8e4ad21
Merge branch 'beta' into hierarchical_orgs_relations
2021-11-08 09:40:24 +01:00
Miriam Baglioni
4c70201412
merging with branch beta
2021-11-05 12:29:56 +01:00
Miriam Baglioni
8442efd8d1
[Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE
2021-11-05 12:29:22 +01:00
Claudio Atzori
5681e89544
Update 'dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump/schemas/result_schema.json'
2021-11-05 12:18:24 +01:00
Miriam Baglioni
a22c29fba1
[Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE
2021-11-05 12:08:33 +01:00
Miriam Baglioni
c10ff6928c
[Graph DUMP] add schema of the dump related to the model as in dhp-schemas.2.8.31. Note the measure element at the level of the result has been removed because of issues on where to display it: at the level of the result or at the level of the entity
2021-11-05 11:36:21 +01:00
Miriam Baglioni
0857849a86
[Graph DUMP] Remove dump of measure until it will be clear where to put it (at the level of result or at the level of the instance)
2021-11-05 11:02:37 +01:00
Miriam Baglioni
df7ee77c7a
[DOIBoost Mapping] removed not needed comments
2021-11-04 16:24:07 +01:00
Miriam Baglioni
de63d29b6f
[DOIBoost Mapping] Fix to avoid producing results with null as identifier (probably due to the filtering function in the factory for the creation of the id)
2021-11-04 16:16:40 +01:00
Miriam Baglioni
d50057b2d9
[DOIBoost Mapping] changed the way to create the url for the instance: we use the Crossref guidelines https://doi.org/doi
2021-11-03 16:59:37 +01:00
Miriam Baglioni
edf55395e9
added test resource
2021-11-03 16:49:30 +01:00
Miriam Baglioni
d97ea82a29
[DOIBoost Mapping] Added test to verify the instance created for Crossref will have just the url related to the doi
2021-11-03 16:45:15 +01:00
Miriam Baglioni
96769b4481
[DOIBoost - Mapping] Changed the logic that brought into the instance urls that should not be there: the url of the doi in the json is reachable from the root (json/"URL"); other urls were added from the links element. Now the mapping from the link element has been removed
2021-11-03 16:43:36 +01:00
Miriam Baglioni
683fe093cf
[DOIBoost - Mapping] Remove the addition of the instance to the MAG publication record
2021-11-03 15:51:26 +01:00
Miriam Baglioni
b2bb8d9d79
[DOIBoost - Mapping] selecting the url from Crossref containing the doi
2021-11-03 15:44:57 +01:00
Miriam Baglioni
779318961c
[DOIBoost - Mapping] removed the url from crossref containing the api.elsevier.com... string in the url
2021-11-03 14:38:52 +01:00
Miriam Baglioni
2480e590d1
[DOIBoost - Mapping] changed the type on which to map dissertation from Crossref: from 006 Doctoral thesis to 0044 Thesis since dissertation could be either Doctoral or master thesis
2021-11-03 14:25:23 +01:00
Miriam Baglioni
b9d124bb7c
[Enrichment: Propagation through parent-child relationships] Added counters, and changed constraint to verify if filtering out the relation (from classname = harvested to classid != propagation)
2021-11-03 13:55:37 +01:00
Sandro La Bruzzo
7bd224f051
implement first version of scholexplorer integration for the generation of final graph
2021-11-02 15:58:15 +01:00
Antonis Lempesis
b97b78f874
removed hardcoded reference
2021-11-02 09:12:49 +01:00
Claudio Atzori
7fa49f6956
Merge pull request 'removed hardcoded reference' ( #154 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#154
2021-11-02 09:11:30 +01:00
Antonis Lempesis
f78afb5ef9
removed hardcoded reference
2021-11-01 15:42:29 +02:00
Miriam Baglioni
2aca6bfa0a
merging with branch beta
2021-10-29 11:20:45 +02:00
Miriam Baglioni
09f36cffb8
[Enrichment: Propagation through parent-child relationships] First implementation, testing, and wf for propagation of result to organization through semantic relation
2021-10-29 11:20:03 +02:00
Claudio Atzori
1225ba0b92
[resolution] increasing number of partitions to avoid OOM
2021-10-28 16:18:17 +02:00
Sandro La Bruzzo
d9cbca83f7
moved filter on next phase
2021-10-28 16:13:24 +02:00
Claudio Atzori
d02caef185
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-27 15:36:29 +02:00
Sandro La Bruzzo
1be9aa0a5f
Removed filter of datacite items from the raw graph merging phase, Datacite is not an actionset anymore in beta
2021-10-26 17:52:20 +02:00
Sandro La Bruzzo
4acfa8fa2e
Scholexplorer Datasource Aggregation:
- Added collectedfrom in the inverse relation generated
Relation resolution:
- increased number of partitions in workflow.xml
- using classid instead of classname to build the pid-dnetId mapping
2021-10-26 17:51:20 +02:00
Miriam Baglioni
d0ef7d91c5
adding test resource
2021-10-26 17:34:11 +02:00
Sandro La Bruzzo
034304b33a
conflict resolved on merge
2021-10-26 09:40:47 +02:00
Michele Artini
d66e20e7ac
added hierarchy rel in ROR actionset
2021-10-21 15:51:48 +02:00
Claudio Atzori
d147295c2f
avoiding java.io.NotSerializableException: java.util.HashMap
2021-10-21 14:15:57 +02:00
Claudio Atzori
3702fe478d
cleanup
2021-10-21 12:05:02 +02:00
Sandro La Bruzzo
ac36aa7d1c
fixed wrong Encoding during a map phase
2021-10-21 11:35:02 +02:00
Sandro La Bruzzo
aeeebd573b
code refactor renamed datacite package
2021-10-20 17:37:42 +02:00
Sandro La Bruzzo
ab3a99d3e9
removed old datacite oozie workflow
2021-10-20 17:19:47 +02:00
Sandro La Bruzzo
ae4e99a471
Adapted workflow of resolution of PID to work into OpenAIRE data workflow
- Added relations in both directions on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Claudio Atzori
cece432adc
[stats] reducing the step22 wait time
2021-10-20 14:16:33 +02:00
Antonis Lempesis
a7376907c2
invalidating metadata before context thingies
2021-10-20 14:16:25 +02:00
Antonis Lempesis
43f4eb492b
fetching affiliated results for 4 orgs in monitor. fixed affiliated orgs in stats db
2021-10-20 14:16:11 +02:00
Claudio Atzori
4f8970f8ed
[stats] reducing the step22 wait time
2021-10-20 14:14:53 +02:00
Claudio Atzori
00b78b9c58
cleanup: mapping contents in the graph already defined in the OAF graph model doesn't require to be aware of the vocabularies
2021-10-20 14:04:45 +02:00
Claudio Atzori
c01dd0c925
registered oaf model classes for the KryoSerializer
2021-10-20 13:55:07 +02:00
Miriam Baglioni
652114c641
[affiliationPropagation] first try. preparation
2021-10-20 11:44:23 +02:00
Claudio Atzori
59f76b50d4
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-20 09:42:35 +02:00
Antonis Lempesis
241dcf6df1
Merge branch 'beta' into beta
2021-10-19 23:54:21 +02:00
Claudio Atzori
515e068a78
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-19 16:46:06 +02:00
Claudio Atzori
512e7b0170
code formatting
2021-10-19 16:19:29 +02:00
Michele Artini
c4fce785ab
fixed a compilation problem of a unit test
2021-10-19 16:18:26 +02:00
Claudio Atzori
e9157c67aa
Merge branch 'beta' into dump
2021-10-19 16:15:03 +02:00
Claudio Atzori
98f37c8d81
WIP: workflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 16:14:40 +02:00
Claudio Atzori
c8850456e9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-19 16:09:54 +02:00
Claudio Atzori
172363e7f1
[broker] integrating PR#147, notification record creation phase separated from indexing on ES
2021-10-19 15:56:27 +02:00
Claudio Atzori
bdffa86c2f
undo last commit
2021-10-19 15:39:38 +02:00
Sandro La Bruzzo
c9870c5122
code formatted
2021-10-19 15:24:59 +02:00
Sandro La Bruzzo
f8329bc110
since dhp-schemas changed, introducing new Relation inverse model, this class has been updated
2021-10-19 15:24:22 +02:00
Claudio Atzori
e471f12d5e
hotfix: recovered implementation removing the hardcoded working_dirs
2021-10-19 12:35:38 +02:00
Claudio Atzori
7a73010acd
WIP: workflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 11:59:16 +02:00
Miriam Baglioni
c7f6cd2591
added again the setting for saXReader
2021-10-19 10:15:26 +02:00
miconis
5f780a6ba1
bug fix in migrate entities: parameter name was wrong
2021-10-18 23:30:40 +02:00
Miriam Baglioni
1315952702
merge with branch beta
2021-10-18 14:17:09 +02:00
Miriam Baglioni
1cc09adfaa
Opencitations: changed the test class to mirror the creation or not of duplicate dois for .refs oc original plus added optional parameter to duplicate the relation
2021-10-18 14:11:27 +02:00
Miriam Baglioni
76d41602be
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-18 10:53:22 +02:00
Miriam Baglioni
46f82c7c8f
removed not needed folder deletion
2021-10-18 10:53:16 +02:00
Sandro La Bruzzo
7b15b88d4c
renamed wrong package, implemented last aggregation workflow for scholexplorer
2021-10-15 15:00:15 +02:00
Antonis Lempesis
41ecb1eb61
invalidating metadata before context thingies
2021-10-15 13:42:55 +03:00
Antonis Lempesis
4b7c8dff2d
fetching affiliated results for 4 orgs in monitor. fixed affiliated orgs in stats db
2021-10-14 18:53:35 +03:00
Claudio Atzori
e15a1969a5
applying fix on the DOIBoost construction process that somehow wasn't part of the merge done in 83c90c7180
2021-10-14 14:33:56 +02:00
Sandro La Bruzzo
51a03c0a50
refactor code for EBI from dhp-graph-mapper into dhp-aggregation
2021-10-14 14:23:13 +02:00
Claudio Atzori
14fbf92ad6
Merge branch 'beta' into beta_solr_config
2021-10-14 11:08:44 +02:00
Miriam Baglioni
4b1920f008
changed the working path parameter value to depend on the dnet-workflow working dir parameter
2021-10-14 09:18:09 +02:00
Miriam Baglioni
8db39c86e2
added new parameter in the doiboost process workflow to specify a folder for the process of MAG dataset
2021-10-14 09:17:39 +02:00
Claudio Atzori
b292e4a700
[stats wf] added extra logging in the context data retrieval phase
2021-10-13 17:31:53 +02:00
miconis
995c1eddaf
minor change
2021-10-13 17:07:10 +02:00
Miriam Baglioni
5d9cc2452d
changed the working path parameter value to depend on the dnet-workflow working dir parameter
2021-10-13 15:33:50 +02:00
miconis
326bf63775
integration of parent child orgs relations
2021-10-13 12:24:48 +02:00
Miriam Baglioni
16b28494a9
added new parameter in the doiboost process workflow to specify a folder for the process of MAG dataset
2021-10-13 11:34:24 +02:00
Miriam Baglioni
63933808d4
added fix for mixing result types, added configuration default to funder subworkflow
2021-10-13 11:28:28 +02:00
Sandro La Bruzzo
7387416e90
added params skip update to direct transform in OAF, this should be set to true in production
2021-10-12 12:36:30 +02:00
Sandro La Bruzzo
511da98d0c
- fixed bug on download pmc Article
- removed unused line of code in SparkCreateActionset
2021-10-12 11:47:49 +02:00
Miriam Baglioni
fec40bdd95
merging with branch beta - resolved conflicts
2021-10-12 09:16:36 +02:00
Miriam Baglioni
83f51f1812
refactoring
2021-10-12 09:14:43 +02:00
Sandro La Bruzzo
5606014b17
code refactor see ticket #7065
2021-10-12 08:11:53 +02:00
Claudio Atzori
2f61054cd1
code formatting
2021-10-11 18:29:42 +02:00
Claudio Atzori
83c90c7180
manually merging PR#149 D-Net/dnet-hadoop#149
2021-10-11 18:27:05 +02:00
Serafeim Chatzopoulos
201ce71cc1
Add resultsubject, relprojectname and resultacceptanceyear to __all field
2021-10-11 13:16:39 +03:00
Serafeim Chatzopoulos
e468a7b96b
Add tests to query Solr with different configurations
2021-10-08 16:58:51 +03:00
Serafeim Chatzopoulos
de81007302
Add exploreTestConfig, a new Solr configuration folder
2021-10-08 16:54:56 +03:00
Sandro La Bruzzo
8f99d2af86
Make the node of doiBoost to point to the correct OpenAire Organization in relations
2021-10-08 08:35:12 +02:00
Alessia Bardi
c48c43fa9e
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-07 17:30:53 +02:00
Alessia Bardi
8d3b60f446
test for patching records for EOSC Future
2021-10-07 17:30:45 +02:00
miconis
611ca511db
set configuration property in openorgs duplicates wf
2021-10-07 15:39:55 +02:00
miconis
9646b9fd98
implementation of the http call for the update of openorgs suggestions
2021-10-07 11:29:11 +02:00
Sandro La Bruzzo
2557bb41f5
Implemented new method for update baseline inside scala node
2021-10-06 16:41:08 +02:00
Sandro La Bruzzo
b84e0cabeb
Implemented new method for update baseline
2021-10-05 16:34:47 +02:00
Michele Artini
d6e1f22408
max numbers of workers for indexing
2021-10-05 15:09:18 +02:00
Michele Artini
210d6c0e6d
generateNotificationsJob and indexNotificationsJob
2021-10-05 13:57:46 +02:00
Michele Artini
69008e20c2
log and tests
2021-10-05 11:58:20 +02:00
Sandro La Bruzzo
f258bbb927
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-10-05 10:21:50 +02:00
Sandro La Bruzzo
991b06bd0b
removed generation of EBI links from old dump, now EBI link dump is created by another wf
2021-10-05 10:21:33 +02:00
Claudio Atzori
cb7efe12ac
Merge pull request 'beta' ( #146 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#146
2021-10-05 10:09:37 +02:00
Michele Artini
8bbaa17335
reimplemented the conditions cache as a non-static variable
2021-10-05 09:20:37 +02:00
Miriam Baglioni
e653756e3d
applied some suggestions from Sonar Lint
2021-10-04 18:40:07 +02:00
Michele Artini
0a9ef34b56
test
2021-10-04 15:46:12 +02:00
Michele Artini
31a6ad1d79
optimization of verifySubsriptions()
2021-10-04 12:01:56 +02:00
dimitrispie
3f25d2efb2
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2021-10-01 16:03:48 +03:00
dimitrispie
13687fd887
Sprint 3 indicators update
2021-10-01 16:02:02 +03:00
Miriam Baglioni
9814c3e700
merging with branch beta
2021-10-01 13:00:03 +02:00
Miriam Baglioni
c4ccd7b32c
-
2021-10-01 12:59:47 +02:00
Miriam Baglioni
c8321ad31a
merge with branch beta
2021-10-01 12:59:08 +02:00
Claudio Atzori
b01cd521b0
removed configuration specifying the limit to 8 for spark.dynamicAllocation.maxExecutors
2021-10-01 11:26:33 +02:00
Claudio Atzori
ec94cc9b93
IndexNotificationsJob test: persist contents on HDFS instead of passing them to ES
2021-10-01 09:41:27 +02:00
Claudio Atzori
60a6a9a583
[graph2hive] added field 'measures' to the result view
2021-09-30 09:27:26 +02:00
Sandro La Bruzzo
66702b1973
Added node to update datacite
2021-09-28 08:59:06 +02:00
Sandro La Bruzzo
477cb10715
Merge remote-tracking branch 'origin/beta' into beta
2021-09-27 16:57:23 +02:00
Sandro La Bruzzo
be79d74e3d
Fixed DoiBoost generation to point to correct organization in affiliation relation
2021-09-27 16:57:04 +02:00
Claudio Atzori
474117c2e8
Merge branch 'beta' into dedup_whitelist
2021-09-27 16:41:25 +02:00
Miriam Baglioni
476a4708d6
merging with branch beta
2021-09-27 16:02:32 +02:00
Miriam Baglioni
5ec69889db
OpenCitations: creation of AS from OC
2021-09-27 16:02:06 +02:00
Claudio Atzori
a53acfbc06
Merge pull request '[stats] updates in the mapping, indicators, wf' ( #145 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#145
2021-09-27 15:59:54 +02:00
Alessia Bardi
b924276e18
tests to generate records for the EOSC-Future demo with the EOSC Jupyter Notebook subject
2021-09-24 17:11:56 +02:00
Antonis Lempesis
a1e1cf32d7
fixed an impala error
2021-09-24 12:57:24 +03:00
Antonis Lempesis
f358cabb2b
fixed typo
2021-09-22 21:50:37 +03:00
Miriam Baglioni
eedf7c3310
merging with branch beta
2021-09-22 15:18:34 +02:00
Miriam Baglioni
f2118d771a
first steps in the implementation of the integration of opencitations
2021-09-22 15:18:05 +02:00
Claudio Atzori
7fa60e166e
Merge branch 'beta' into dedup_whitelist
2021-09-22 11:31:18 +02:00
Antonis Lempesis
421d55265d
created hive action for observatory queries
2021-09-21 03:07:58 +03:00
Enrico Ottonello
92a63f78fe
multiple download attempts handling if a connection to orcid server fails
2021-09-20 18:25:00 +02:00
Enrico Ottonello
0c74f5667e
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-09-20 18:12:31 +02:00
miconis
853333bdde
implementation of the whitelist for similarity relations
2021-09-20 16:21:47 +02:00
Antonis Lempesis
8b681dcf1b
attempt to make the observatory wf run in hive
2021-09-18 00:35:14 +03:00
Antonis Lempesis
2943287d10
fixed the definition of cc_licence, part II
2021-09-16 15:59:06 +03:00
Antonis Lempesis
dd2329849f
fixed the definition of cc_licence
2021-09-16 13:50:34 +03:00
Claudio Atzori
09c2eb7f62
Merge branch 'beta' into clean_relations
2021-09-16 11:09:47 +02:00
Miriam Baglioni
e9ccdf853f
related to D-Net/dnet-hadoop#132
2021-09-15 18:44:54 +02:00
Claudio Atzori
12766bf5f2
Merge branch 'beta' into clean_relations
2021-09-15 17:18:15 +02:00
Claudio Atzori
663b1556d7
manually integrating PR#140 D-Net/dnet-hadoop#140
2021-09-15 16:40:25 +02:00
Claudio Atzori
ebf53a1616
added cleaning for relation fields: subRelType & relClass according to dedicated vocabs
2021-09-15 16:10:37 +02:00
Enrico Ottonello
8b804e7fe1
removed unused imports
2021-09-14 17:30:52 +02:00
Enrico Ottonello
aefa36c54b
other task executions go ahead if UnknownHostException happens on a single task
2021-09-14 17:26:15 +02:00
Antonis Lempesis
de9bf3a161
added cc_licences and abstracts in observatory db
2021-09-14 01:29:08 +03:00
Antonis Lempesis
9b1936701c
fixed yet another typo
2021-09-13 21:07:44 +03:00
Antonis Lempesis
8fc89ae822
moved context table creation before indicators
2021-09-13 14:33:23 +03:00
Antonis Lempesis
461bf90ca6
fixed the gold_oa definition
2021-09-13 11:10:30 +03:00
Antonis Lempesis
43852bac0e
creating other::other concept for all contexts
2021-09-13 01:36:41 +03:00
Antonis Lempesis
f13cca7e83
moved dependencies of indicators before them...
2021-09-08 23:07:58 +03:00
Antonis Lempesis
c6ada217a1
fixed typo
2021-09-08 22:34:59 +03:00
Antonis Lempesis
1250ae197f
using new indicators for the definition of peerreviewed, gold, and green
2021-09-08 14:08:43 +03:00
Antonis Lempesis
ccee451dde
added indicators of sprint 2 in monitor db
2021-09-07 23:17:13 +03:00
Sandro La Bruzzo
aed29156c7
changed behavior in transformation job, that doesn't fail at first error
2021-09-07 19:05:46 +02:00
Sandro La Bruzzo
370dddb2fa
fix bug on oai iterator that skip record cleaned
2021-09-07 11:20:41 +02:00
Sandro La Bruzzo
3c6fc2096c
fix bug on oai iterator that skip record cleaned
2021-09-07 10:46:26 +02:00
Sandro La Bruzzo
d4dadf6d77
reduced max number of PID in Relatedentity
2021-09-02 14:21:24 +02:00
Sandro La Bruzzo
9f8a80deb7
fixed wrong import of unresolved relation in openaire
2021-09-01 14:16:27 +02:00
Alessia Bardi
3762b17f7b
added VERSION and PART relationship and re-ordered according to my personal and obviously possibly biased ordering
2021-08-31 20:20:05 +02:00
Sandro La Bruzzo
e8b3cb9147
Implemented method to download delta updates in EBI Links
2021-08-30 09:32:45 +02:00
Alessia Bardi
ccf4103a25
keep the original url if the decoder fails for any reason
2021-08-25 10:07:58 +02:00
Sandro La Bruzzo
45898c71ac
fixed wrong doi in pubmed
2021-08-24 15:20:04 +02:00
Alessia Bardi
00a28c0080
originalId was renamed to acronym
2021-08-23 15:02:21 +02:00
Alessia Bardi
f19b04d41b
code formatting after mvn compile
2021-08-23 14:33:39 +02:00
Alessia Bardi
931f430129
Merge branch 'beta' into datasource_model_eosc_beta
2021-08-23 11:57:21 +02:00
Alessia Bardi
4c1474e693
Dealing with #6859#note-2: we have to decode URLs to avoid & and other chars encoded because of the original XML representation of data
2021-08-20 17:03:30 +02:00
Miriam Baglioni
5f8ccbc365
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-08-20 11:13:47 +02:00
Miriam Baglioni
882abb40e4
CrossrefDump -
2021-08-20 11:12:53 +02:00
Miriam Baglioni
45c62609af
CrossrefDump - modified because parameter file was moved
2021-08-20 11:12:31 +02:00
Miriam Baglioni
35880c0e7b
CrossrefDump - changed the wf to be able to resume from one of the steps
2021-08-20 11:11:35 +02:00
Miriam Baglioni
f3b6c392c1
CrossrefDump - moving parameter file under folder crossref_dump_reader
2021-08-20 11:10:58 +02:00
Miriam Baglioni
65822400ce
CrossrefDump - added new parameter file that was missing
2021-08-20 11:10:35 +02:00
Alessia Bardi
a053e1513c
different funders in blacklist from BETA and PROD aggregator
2021-08-19 11:32:27 +02:00
Alessia Bardi
812bd54c57
different funders in blacklist from BETA and PROD aggregator
2021-08-19 11:30:14 +02:00
Miriam Baglioni
a65d3caaea
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-08-19 10:29:10 +02:00
Miriam Baglioni
e5cf11d088
change open access route to result matching hbm to gold
2021-08-19 10:29:04 +02:00
Claudio Atzori
7c0c67bdd6
added mock pom
2021-08-13 17:45:53 +02:00
Claudio Atzori
82086f3422
fixed directory name
2021-08-13 17:42:14 +02:00
Claudio Atzori
bc7068106c
added crossref download oozie workflow
2021-08-13 17:19:44 +02:00
Claudio Atzori
2c0a05f11a
manually merged PR#139
2021-08-13 17:15:53 +02:00
Claudio Atzori
d43667d857
Merge pull request 'Automatic download of Crossref' ( #138 ) from crossref_dw_wf into beta
Reviewed-on: D-Net/dnet-hadoop#138
2021-08-13 17:10:10 +02:00
Miriam Baglioni
5856ca8a7b
merging with branch beta - resolved conflicts
2021-08-13 16:45:45 +02:00
Miriam Baglioni
6fec71e8d2
removed the specific of the infra we are running the wf from the wf name
2021-08-13 16:39:02 +02:00
Miriam Baglioni
ed7e28490a
change in sh
2021-08-13 16:19:01 +02:00
Claudio Atzori
7743d0f919
consolidated dnet wf profiles into the same submodule
2021-08-13 16:14:54 +02:00
Miriam Baglioni
6eb7508995
merging with branch beta
2021-08-13 16:07:04 +02:00
Claudio Atzori
f74adc4752
added DownloadCSV2 as alternative implementation of the same download procedure
2021-08-13 15:52:15 +02:00
Claudio Atzori
5f0903d50d
fixed CSV downloader & tests
2021-08-13 14:17:54 +02:00
Claudio Atzori
17cefe6a97
[HBM] removed stale replace option
2021-08-13 12:43:59 +02:00
Claudio Atzori
7ee2757fcd
fixed DownloadCSV parameters spec; workflow patching the hostedby replaces the graph content (publication, datasource) rather than creating a copy
2021-08-13 12:41:01 +02:00
Claudio Atzori
c3ad4ab701
minor fixes
2021-08-13 12:23:15 +02:00
Claudio Atzori
baed5e3337
test classes moved in specific components
2021-08-13 12:14:47 +02:00
Claudio Atzori
3359f73fcf
cleanup & best practices
2021-08-13 12:00:42 +02:00
Miriam Baglioni
f4ec81c92c
merging with branch beta
2021-08-13 10:31:35 +02:00
Miriam Baglioni
dc8b05b39e
Hosted By Map - changed the association with the datasource id for the hostedby element: there is no longer any need to compute it. With the new HBM it is already the id in the graph
2021-08-13 10:18:25 +02:00
Miriam Baglioni
32fd75691f
refactoring
2021-08-13 10:15:42 +02:00
Miriam Baglioni
01db1f8bc4
GetCSV refactoring - removed not needed import
2021-08-13 10:14:17 +02:00
Miriam Baglioni
964a46ca21
GetCSV refactoring - modified due to movement of classes
2021-08-13 10:11:18 +02:00
Miriam Baglioni
eaf077fc34
GetCSV refactoring - removed not needed dependency
2021-08-13 10:08:58 +02:00
Miriam Baglioni
5f674efb0c
moved dependency version in external pom
2021-08-13 10:07:53 +02:00
Miriam Baglioni
5cd5714530
GetCSV refactoring - added ignore annotation for fields not in input csv
2021-08-13 10:06:49 +02:00
Miriam Baglioni
ed183d878e
GetCSV refactoring - modified test classes due to change in the model of projects and programme
2021-08-13 09:28:51 +02:00
Miriam Baglioni
8769dd8eef
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:20:56 +02:00
Miriam Baglioni
6b9e1bf2e3
GetCSV refactoring - removing not needed dependency
2021-08-12 18:17:50 +02:00
Miriam Baglioni
d57b2bb927
GetCSV refactoring - removing not needed dependency
2021-08-12 18:12:51 +02:00
Miriam Baglioni
9da74b544a
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:12:15 +02:00
Miriam Baglioni
ab8abd61bb
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:11:07 +02:00
Miriam Baglioni
335a824e34
GetCSV refactoring - fixed issue
2021-08-12 18:10:10 +02:00
Miriam Baglioni
f0845e9865
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:04:58 +02:00
Miriam Baglioni
7a789423aa
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:04:27 +02:00
Miriam Baglioni
e9fc3ef3bc
GetCSV refactoring - changed to use the new class to get and write the csv file
2021-08-12 18:03:41 +02:00
Miriam Baglioni
4317211a2b
GetCSV refactoring - refactoring due to movement
2021-08-12 18:03:14 +02:00
Miriam Baglioni
b62cd656a7
GetCSV refactoring - changed the model to store only the information needed
2021-08-12 18:01:10 +02:00
Miriam Baglioni
d36e925277
GetCSV refactoring - moved under model package
2021-08-12 18:00:21 +02:00
Miriam Baglioni
6e84b3951f
GetCSV refactoring - moving classes to dhp-common that have dependency with GetCSV class (that was located in graph-mapper)
2021-08-12 17:57:41 +02:00
Claudio Atzori
9587d4aee8
Merge branch 'beta' into hostedbymap
2021-08-12 17:04:30 +02:00
Claudio Atzori
86d940044c
added test to verify bad records from FWF-E-Book-Library
2021-08-12 11:32:56 +02:00
Claudio Atzori
8cdce59e0e
[graph raw] let the mapping exceptions propagate
2021-08-12 11:32:26 +02:00
Miriam Baglioni
08dd2b2102
moving the dependency version to the external pom file
2021-08-11 18:09:41 +02:00
Miriam Baglioni
ac417ca798
removed not needed test resource
2021-08-11 17:50:33 +02:00
Miriam Baglioni
e33daaeee8
reverting
2021-08-11 17:46:19 +02:00
Miriam Baglioni
785db1d5b2
refactoring
2021-08-11 17:44:07 +02:00
Miriam Baglioni
95e5482bbb
removing not needed dependency
2021-08-11 17:42:26 +02:00
Miriam Baglioni
b966329833
reverting
2021-08-11 17:37:00 +02:00
Miriam Baglioni
8ad7c71417
reverting
2021-08-11 17:36:12 +02:00
Miriam Baglioni
0e1a6bec20
reverting
2021-08-11 17:32:29 +02:00
Miriam Baglioni
c6a2a780a9
reverting
2021-08-11 17:30:17 +02:00
Miriam Baglioni
b6b58bba28
reverting
2021-08-11 17:25:37 +02:00
Miriam Baglioni
804589eb30
reverting
2021-08-11 17:23:35 +02:00
Miriam Baglioni
d688749ad9
reverting
2021-08-11 17:22:28 +02:00
Miriam Baglioni
524c06e028
reverting
2021-08-11 17:20:30 +02:00
Miriam Baglioni
7aa3260729
reverting
2021-08-11 17:18:45 +02:00
Miriam Baglioni
55fc500d8d
reverting
2021-08-11 17:17:48 +02:00
Miriam Baglioni
8229632839
adding assertions to the mapping of the unibi part of gold list
2021-08-11 16:36:01 +02:00
Miriam Baglioni
b1c6140ebf
removed all comments in Italian
2021-08-11 16:23:33 +02:00
Miriam Baglioni
52c18c2697
removed not needed test class. The functionality has been moved
2021-08-11 16:16:55 +02:00
Miriam Baglioni
8da3a25cf6
merging with branch beta
2021-08-11 15:55:34 +02:00
Claudio Atzori
9f4db73f30
updated/fixed unit tests
2021-08-11 15:02:51 +02:00
Claudio Atzori
61d811ba53
suggestions from intellij
2021-08-11 12:18:20 +02:00
Claudio Atzori
2ee21da43b
suggestions from SonarLint
2021-08-11 12:13:22 +02:00
Miriam Baglioni
b954fe9ba8
merging with branch beta
2021-08-11 10:12:46 +02:00
Miriam Baglioni
b688567db5
hostedbymap - modified part of test to check the bestaccessright changed
2021-08-11 10:12:10 +02:00
Miriam Baglioni
9731a6144a
hostedbymap - in case the journal is open access the access may be changed also for the best access right in the result
2021-08-10 17:49:45 +02:00
Miriam Baglioni
a90bac3bc9
Graph Dump - added method to test class to verify addition of validation date in projects for community result
2021-08-09 16:36:54 +02:00
Miriam Baglioni
bd0d7bfba7
Graph Dump - added resources for testing addition of validation date in project for communityresult
2021-08-09 16:36:17 +02:00
Miriam Baglioni
8daaa32e90
Graph Dump - added resources for testing
2021-08-09 15:46:29 +02:00
Miriam Baglioni
bc9e3a06ba
Graph Dump - extended the test class
2021-08-09 15:46:06 +02:00
Claudio Atzori
d64a942a76
fixed MappersTest
2021-08-09 12:32:26 +02:00
Miriam Baglioni
2efa5abda5
refactoring
2021-08-09 12:28:36 +02:00
Claudio Atzori
577f3b1ac8
added dnet workflows responsible for the graph construction, enrichment, provision
2021-08-09 11:53:58 +02:00
Miriam Baglioni
da20fceaf7
removed all the part related to the crossref dump download since it is done in a separate workflow
2021-08-09 11:53:45 +02:00
Claudio Atzori
964f97ed4d
cleanup
2021-08-09 11:53:06 +02:00
Miriam Baglioni
54a6cbb244
CrossrefDump - put token among the parameters
2021-08-09 11:41:10 +02:00
Miriam Baglioni
b7079804cb
CrossrefDump - put token among the parameters
2021-08-09 11:34:35 +02:00
Miriam Baglioni
a5f82f442b
Merge branch 'beta' into doiboost_wf
2021-08-09 11:17:51 +02:00
Miriam Baglioni
b6dcf89d22
merging with branch beta
2021-08-09 11:14:43 +02:00
Miriam Baglioni
eff499af9f
added new tests and changed the test example
2021-08-09 11:12:30 +02:00
Claudio Atzori
a45b95ccc1
resolving conflicts for PR#134
2021-08-09 10:50:03 +02:00
Miriam Baglioni
5d70f842eb
merging with branch beta
2021-08-06 18:57:09 +02:00
Miriam Baglioni
c3931557e3
extended the logic of the dump to consider the validation date in the relation (also in the dumped result for communities and funders at the level of the project), the extension on the instance for the APC, the pid, the alternate identifiers, and the extension of the AccessRight to store the OpenAccessRoute. Added new resource for testing and extended the old class to verify the new dump. Also fixed an issue in the relation dump: only relations whose source and target are entities in the graph are dumped. The same holds for references to projects
2021-08-06 18:56:18 +02:00
Claudio Atzori
66f398fe6f
Merge pull request '[stats] fixed a typo' ( #133 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#133
2021-08-06 14:29:57 +02:00
Miriam Baglioni
6bd1eca7e0
merge branch with beta
2021-08-05 15:23:32 +02:00
Miriam Baglioni
73dc082927
added new dumped field (openaccessroute, pid and alternate identifier at the level of the instance) and the bipFinder measure at the level of the result
2021-08-05 15:20:50 +02:00
Miriam Baglioni
ee13da9258
merge branch with master
2021-08-05 11:34:20 +02:00
Miriam Baglioni
bd096f5170
removed not needed param file
2021-08-05 10:55:43 +02:00
Miriam Baglioni
5faeefbda8
added script to download the dump, changed the workflow input parameters
2021-08-05 10:54:03 +02:00
Miriam Baglioni
1965e4eece
new workflow for downloading the dump of crossref and unpack it
2021-08-04 18:29:03 +02:00
Claudio Atzori
83c04e5d28
mapping test for dataset records adapted to reflect the delegated pid authority (zenodo)
2021-08-04 10:37:57 +02:00
Miriam Baglioni
b4eb026c8b
merging with branch beta
2021-08-04 10:21:37 +02:00
Miriam Baglioni
c7b71647c6
Hosted By Map - modification of the resource for testing the presence of only one entry per datasource id
2021-08-04 10:20:02 +02:00
Miriam Baglioni
eb8c3f8594
Hosted By Map - test modified because of the application of the new aggregator on datasources
2021-08-04 10:19:17 +02:00
Miriam Baglioni
e94ae0b1de
Hosted By Map - extension of the workflow to consider also the application of the map to publications and datasources
2021-08-04 10:18:11 +02:00
Miriam Baglioni
67ba4c40e0
Hosted By Map - added parameter resources
2021-08-04 10:17:28 +02:00
Miriam Baglioni
eccf3851b0
Hosted By Map - refactoring
2021-08-04 10:16:30 +02:00
Sandro La Bruzzo
74afe43c3a
fixed wrong test file
2021-08-04 10:16:17 +02:00
Miriam Baglioni
1e952cccf6
Hosted By Map - refactoring and deletion of not needed methods
2021-08-04 10:15:43 +02:00
Miriam Baglioni
8ba8c77f92
Hosted By Map - refactoring
2021-08-04 10:14:57 +02:00
Miriam Baglioni
8f7623e77a
Hosted By Map - refactoring and application of the new aggregator
2021-08-04 10:14:20 +02:00
Sandro La Bruzzo
3fc820203b
fixed wrong test file
2021-08-04 10:13:59 +02:00
Miriam Baglioni
a7bf314fd2
Hosted By Map - added new aggregator to get just one result per datasource id
2021-08-04 10:13:30 +02:00
Miriam Baglioni
9831725073
Hosted By Map - removed from the workflow a step that is not needed. The hbm will also take care of the integration of the unibi list of gold open access journals
2021-08-03 11:02:17 +02:00
Miriam Baglioni
100e54e6c8
merging with branch beta
2021-08-03 10:47:11 +02:00
Miriam Baglioni
461b8a29a0
removed not needed class
2021-08-03 10:46:51 +02:00
Miriam Baglioni
327cddde33
Hosted By Map - refactoring
2021-08-03 10:44:13 +02:00
Miriam Baglioni
17292c6641
Hosted By Map - resources for testing purposes
2021-08-02 19:37:08 +02:00
Miriam Baglioni
ee7ccb98dc
Hosted By Map - test class to verify the application of the hbm to results and datasource
2021-08-02 19:36:18 +02:00
Miriam Baglioni
90e91486e2
Hosted By Map - test class to verify each step in the preparation process
2021-08-02 19:35:52 +02:00
Miriam Baglioni
1e859706a3
Hosted By Map - Classes to apply the HBM to results and datasources
2021-08-02 19:35:23 +02:00
Miriam Baglioni
72df8f9232
Hosted By Map - removed the aggregator for the datasource (it is no longer needed) and added a new aggregator for the results. Also changed the hostedBYMap aggregator
2021-08-02 19:34:44 +02:00
Miriam Baglioni
ff1ce75e33
Hosted By Map - modification in the code to prepare the info needed to apply the HostedByMap. There is no need to join datasources with the hbm: all the information needed is in the hosted by map already
2021-08-02 19:32:59 +02:00
Claudio Atzori
e826aae848
using constants from ModelConstants
2021-08-02 14:28:59 +02:00
Antonis Lempesis
117c3d5c67
fixed a typo
2021-08-02 12:15:58 +03:00
Miriam Baglioni
1695d45bd4
Hosted By Map - Test class to verify the preparation of the intermediate information
2021-07-30 17:57:01 +02:00
Miriam Baglioni
7c6ea2f4c7
Hosted By Map - first attempt at the creation of intermediate information to be used to apply the hosted by map on the graph entities
2021-07-30 17:56:27 +02:00
Miriam Baglioni
d8b9b0553b
Hosted By Map - model classes to store the intermediate information to be used to apply the hosted by map
2021-07-30 17:55:39 +02:00
Miriam Baglioni
613bd3bde0
Hosted By Map - refactor of the first attempt to prepare a new hosted by map dependent on the datasource in the graph and on two external sources: the gold list from unibi and the doaj list of open access journals. Both lists are downloaded from the provided url parameter
2021-07-30 17:54:45 +02:00
Miriam Baglioni
d1807781c0
merging with branch beta
2021-07-30 14:34:07 +02:00
Miriam Baglioni
1d6ac3715b
merge branch with beta
2021-07-30 11:58:29 +02:00
Claudio Atzori
19620eed46
applying PR#131, Patch the identifiers (source/target) in the relations, refinements
2021-07-30 11:09:32 +02:00
Claudio Atzori
4f78565c04
fixed implementation of PatchRelationsApplication, refined the relative unit test
2021-07-30 11:07:09 +02:00
Claudio Atzori
a6a38cca9e
fixed implementation of PatchRelationsApplication, refined the relative unit test
2021-07-30 11:06:11 +02:00
Miriam Baglioni
9bc4fd3b69
Patch FCT relations - fixed issue with join
2021-07-30 10:34:05 +02:00
Miriam Baglioni
2fc89fc9b5
Merge branch 'fct_project_id_replacement' of https://code-repo.d4science.org/D-Net/dnet-hadoop into fct_project_id_replacement
2021-07-30 10:20:43 +02:00
Claudio Atzori
081fe92a21
Merge branch 'fct_project_id_replacement' of https://code-repo.d4science.org/D-Net/dnet-hadoop into fct_project_id_replacement
2021-07-30 10:13:56 +02:00
Claudio Atzori
576693d782
added unit test for PatchRelationsApplication
2021-07-30 10:13:33 +02:00
Claudio Atzori
55e6470f44
Merge pull request 'added the sprint 2 indicators in monitor db' ( #129 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#129
2021-07-30 10:11:46 +02:00
Sandro La Bruzzo
6358f92c3a
added sleep to solve the problem of lost index creation requests
2021-07-30 08:54:37 +02:00
Antonis Lempesis
26af0320d0
added the sprint 2 indicators in monitor db
2021-07-30 00:31:33 +03:00
Claudio Atzori
7b172e7cd9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-07-29 13:57:06 +02:00
Claudio Atzori
c53d106e80
[provision] lowercase relation filter
2021-07-29 13:57:00 +02:00
Claudio Atzori
6e3554a45e
[provision] lowercase relation filter
2021-07-29 13:56:37 +02:00
Sandro La Bruzzo
b1b0cc3f15
fixed wrong package name
2021-07-29 13:55:08 +02:00
Miriam Baglioni
baad01cadc
hostedbymap
2021-07-29 13:04:39 +02:00
Claudio Atzori
e725c88ebb
[raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations
2021-07-29 13:03:43 +02:00
Claudio Atzori
5d08ad86ae
[raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations
2021-07-29 13:03:16 +02:00
Claudio Atzori
e87e1805c4
[raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset
2021-07-29 12:13:06 +02:00