Claudio Atzori
|
ae7cd0735a
|
[graph2hive] more partitions
|
2022-10-14 15:47:58 +02:00 |
Claudio Atzori
|
b47aaf4dd1
|
[cleaning] subjects declared as belonging to specific vocabularies whose values are not found in the vocab are set to type keyword
|
2022-10-13 11:23:43 +02:00 |
Claudio Atzori
|
6163ecbf63
|
[cleaning] renamed parameters in wf action
|
2022-10-11 11:20:03 +02:00 |
Claudio Atzori
|
b301e9fdff
|
[cleaning] renamed action name/description
|
2022-10-11 11:08:52 +02:00 |
Claudio Atzori
|
ece40adc09
|
[cleaning] fixing NPE in the country cleaning phase
|
2022-10-11 10:10:20 +02:00 |
Claudio Atzori
|
8d97949316
|
[cleaning] fixed loop in wf nodes
|
2022-10-07 09:52:45 +02:00 |
Alessia Bardi
|
49360770d7
|
map w3id as instance url
|
2022-09-28 14:16:39 +02:00 |
Miriam Baglioni
|
b5b5a4c192
|
[CleanCountry] fixed issue
|
2022-09-28 12:42:51 +02:00 |
Claudio Atzori
|
3f90d159e3
|
code formatting
|
2022-09-27 15:08:00 +02:00 |
Claudio Atzori
|
0b3e44e521
|
Merge branch 'beta' into relation-from-odf
|
2022-09-27 14:57:01 +02:00 |
Claudio Atzori
|
57dbeb08d2
|
code formatting
|
2022-09-27 14:55:10 +02:00 |
Claudio Atzori
|
25e9d92aad
|
Merge branch 'beta' into clean_country
|
2022-09-27 14:27:49 +02:00 |
Alessia Bardi
|
fd63e9bfac
|
Mapping all relationships supported in ModelConstants and ModelSupport
|
2022-09-26 11:24:13 +02:00 |
Alessia Bardi
|
c5eb722170
|
relationships from relatedIdentifier whose target id type is one of the pid type with an authority
|
2022-09-23 15:47:05 +02:00 |
Claudio Atzori
|
c86cc53520
|
suppressing hyper verbose spark logs during unit test execution
|
2022-09-23 15:20:40 +02:00 |
Alessia Bardi
|
ba33ff71fd
|
refactoring for the generation of relationships from related identifier of type 'OPENAIRE'
|
2022-09-23 15:17:13 +02:00 |
Alessia Bardi
|
982bcc1e35
|
test wrid pid and record identifier
|
2022-09-23 12:06:06 +02:00 |
Claudio Atzori
|
c42850328e
|
fixed semantic (subreltype) for ServiceOrganization relations
|
2022-09-22 16:23:25 +02:00 |
Claudio Atzori
|
e45ec15221
|
Merge branch 'beta' into clean_country
|
2022-09-19 11:34:02 +02:00 |
Claudio Atzori
|
26e1badded
|
added instance.url syntactical validation, avoid creating multiple duplicated URLs
|
2022-09-19 11:19:10 +02:00 |
Claudio Atzori
|
192215a18e
|
merged from branch discard-non-wellformed
|
2022-09-19 10:17:10 +02:00 |
Claudio Atzori
|
e370e940d8
|
[aggregator graph] save invalid records aside for further inspection
|
2022-09-16 14:06:28 +02:00 |
Claudio Atzori
|
1e42d984e1
|
[aggregator graph] save invalid records aside for further inspection
|
2022-09-15 10:49:42 +02:00 |
Alessia Bardi
|
9e7ec4198f
|
fixed test
|
2022-09-14 18:08:56 +02:00 |
Claudio Atzori
|
c48f6e9c57
|
[aggregator graph] save invalid records aside for further inspection
|
2022-09-14 17:11:26 +02:00 |
Claudio Atzori
|
a0919ed495
|
[aggregator graph] save invalid records aside for further inspection
|
2022-09-14 13:27:39 +02:00 |
Alessia Bardi
|
b99a011345
|
return empty Oaf list if record cannot be parsed
|
2022-09-13 11:51:55 +02:00 |
Alessia Bardi
|
27af5122d2
|
logs for non well formed XML files
|
2022-09-12 14:25:23 +02:00 |
Claudio Atzori
|
ff6f789b6d
|
code formatting
|
2022-09-09 15:16:31 +02:00 |
Claudio Atzori
|
b5d6966c01
|
Merge branch 'beta' into clean_country
|
2022-09-09 12:20:19 +02:00 |
Claudio Atzori
|
b5f7bd30be
|
Merge branch 'beta' into clean_subjects
|
2022-09-09 12:20:04 +02:00 |
Alessia Bardi
|
a539c6ccaf
|
https for handle URLs
|
2022-09-09 12:16:28 +02:00 |
Claudio Atzori
|
1203378441
|
Merge branch 'beta' into clean_subjects
|
2022-09-09 10:38:47 +02:00 |
Claudio Atzori
|
14dc909a14
|
Merge branch 'beta' into clean_country
|
2022-09-09 10:38:17 +02:00 |
Alessia Bardi
|
9ef063d502
|
#7861#note-8 instance url from handle
|
2022-09-07 17:29:54 +03:00 |
Alessia Bardi
|
5c45d52af3
|
testing for RiuNet
|
2022-09-07 15:40:57 +03:00 |
Alessia Bardi
|
a11eb38065
|
testing for RO-Hub
|
2022-09-02 16:07:36 +02:00 |
Claudio Atzori
|
b7c387c21f
|
cleaning of subjects: avoid duplicated subjects, prioritise collected vs inferred or other sources
|
2022-08-12 15:09:16 +02:00 |
Claudio Atzori
|
adb526b0e1
|
Merge branch 'beta' into clean_subjects
|
2022-08-12 10:51:17 +02:00 |
Claudio Atzori
|
cb7c07c54e
|
[scholix] added step to create tar archive
|
2022-08-11 11:25:24 +02:00 |
Claudio Atzori
|
2aa16d0432
|
[scholix] fixed OpenCitation dump procedure
|
2022-08-10 17:39:29 +02:00 |
Miriam Baglioni
|
7dbdd4a0fe
|
[Clean Country] changes related to #241 (comment)
|
2022-08-10 15:13:10 +02:00 |
Claudio Atzori
|
51ad93e545
|
[scholix] fixed OpenCitation dump procedure
|
2022-08-10 11:57:56 +02:00 |
Miriam Baglioni
|
62d2138806
|
[Clean Context] changed the logic slightly. Added a check that the result is not hosted by a datasource of type institutional repository from NL. Also added a check that the country must have been added to the result via propagation for it to be removed
|
2022-08-08 14:10:47 +02:00 |
Claudio Atzori
|
3418ce50ac
|
cleaning of subjects: perform the cleaning when the given value is equivalent to one of the terms in the vocabulary
|
2022-08-08 12:48:47 +02:00 |
Miriam Baglioni
|
390013a4b2
|
merging with branch beta
|
2022-08-08 12:30:31 +02:00 |
Claudio Atzori
|
4eaa063b1f
|
cleaning of subjects
|
2022-08-05 16:56:09 +02:00 |
Claudio Atzori
|
32cee1f619
|
WIP: cleaning of subjects
|
2022-08-05 12:32:08 +02:00 |
Claudio Atzori
|
6c0fd9284b
|
merge from beta
|
2022-08-05 10:42:53 +02:00 |
Claudio Atzori
|
b78889a0ce
|
WIP: cleaning of subjects
|
2022-08-05 09:11:37 +02:00 |
Miriam Baglioni
|
a7a18d7630
|
[Graph Dump] removed code for the dump from the project. Fixed issues in tests when possible
|
2022-08-04 17:40:40 +02:00 |
Claudio Atzori
|
27a91841e7
|
WIP: cleaning of subjects
|
2022-08-04 11:39:39 +02:00 |
Claudio Atzori
|
e62018e95d
|
[aggregator graph] added more assertions in test
|
2022-08-03 12:26:05 +02:00 |
Claudio Atzori
|
f62c4e05cd
|
code formatting
|
2022-07-29 11:56:01 +02:00 |
Claudio Atzori
|
1dd1e4fe3a
|
extended test for mapping project_organization relations
|
2022-07-28 11:27:08 +02:00 |
Claudio Atzori
|
09ccc7b472
|
Merge branch 'beta' into project_organization_contribution
|
2022-07-28 09:49:59 +02:00 |
Miriam Baglioni
|
5968ec018d
|
[Clean Country] modified workflow and added param file
|
2022-07-22 16:48:38 +02:00 |
Miriam Baglioni
|
a12d28c644
|
[Clean Country] added logic not to remove country from result if a hosting datasource with that country exists. Moreover, the country will be removed only if it was added via propagation
|
2022-07-22 16:23:12 +02:00 |
Miriam Baglioni
|
2c933f1158
|
merging with branch beta
|
2022-07-22 14:57:41 +02:00 |
Sandro La Bruzzo
|
ddc414b258
|
fixed wrong json param
|
2022-07-22 09:43:15 +02:00 |
Sandro La Bruzzo
|
5f651f2316
|
changed filter relation on SubRelType
|
2022-07-21 10:11:48 +02:00 |
Miriam Baglioni
|
65cc736e2f
|
[Clean Country] first implementation to remove country NL from results collected from NARCIS when the DOI starts with the Mendeley prefix
|
2022-07-20 17:05:56 +02:00 |
Sandro La Bruzzo
|
5b76321d9c
|
implemented oozie workflow to generate scholix dump filtering relclass semantic
|
2022-07-20 16:34:32 +02:00 |
Claudio Atzori
|
1138b2ac8e
|
code formatting
|
2022-07-19 14:15:49 +02:00 |
Claudio Atzori
|
0c1cfee396
|
mapping oaf:fulltext elements in the result.fulltext field
|
2022-07-11 17:34:59 +02:00 |
Claudio Atzori
|
0cb1c70788
|
code formatting
|
2022-07-01 10:44:08 +02:00 |
Claudio Atzori
|
4ec13e2b66
|
Merge branch 'master' into dump_new_funded_products
|
2022-07-01 10:30:28 +02:00 |
Claudio Atzori
|
7da24c1dec
|
added more logging
|
2022-06-28 13:47:49 +02:00 |
Miriam Baglioni
|
71744a1f52
|
[DUMP DELTA PROJECTS] refactoring
|
2022-06-27 18:07:58 +02:00 |
Miriam Baglioni
|
1d1fe3b151
|
[DUMP DELTA PROJECTS] refactoring
|
2022-06-27 18:04:59 +02:00 |
Claudio Atzori
|
a8773af0cb
|
Merge branch 'beta' into project_organization_contribution
|
2022-06-27 09:37:40 +02:00 |
Claudio Atzori
|
5130eac247
|
mapping by participant project contribution
|
2022-06-24 17:16:42 +02:00 |
Miriam Baglioni
|
edddfc6c63
|
[DUMP DELTA PROJECTS] adding test and resource
|
2022-06-21 18:28:53 +02:00 |
Miriam Baglioni
|
f561f13dd9
|
[Funder Products Dump] fixed names of parameters in workflow
|
2022-06-21 18:18:17 +02:00 |
Miriam Baglioni
|
ff74e73369
|
[DUMP NEW FUNDED PRODUCTS] change in resources
|
2022-06-21 18:02:51 +02:00 |
Miriam Baglioni
|
b98f904d48
|
[Funder Products Dump] new way to avoid using hive
|
2022-06-21 17:52:27 +02:00 |
Miriam Baglioni
|
7423577a08
|
[Graph DUMP] add code to produce the delta of new projects with respect to the previous delta/dump
|
2022-06-21 14:51:38 +02:00 |
Claudio Atzori
|
b295a40d9c
|
restored use of name_particles when parsing author names
|
2022-06-16 12:20:43 +02:00 |
Claudio Atzori
|
4c8e820ff0
|
mapping relationship from transformed records based on oaf:relation
|
2022-06-14 08:49:02 +02:00 |
Claudio Atzori
|
116902c028
|
mapping relationship from transformed records based on oaf:relation
|
2022-06-13 14:31:48 +02:00 |
Alessia Bardi
|
68bd58d6a4
|
tests for ROHub
|
2022-06-10 17:29:11 +02:00 |
Claudio Atzori
|
52cb086506
|
[graph grouping] drop relation target path before copying from source
|
2022-05-16 12:08:36 +02:00 |
Claudio Atzori
|
997c50078e
|
[graph grouping] drop relation target path before copying from source
|
2022-05-16 12:07:40 +02:00 |
Claudio Atzori
|
6031acb2e3
|
[openorgs] fixed parent/child query, using the correct semantic labels
|
2022-05-16 09:20:48 +02:00 |
Claudio Atzori
|
0dc33ea391
|
[openorgs] fixed parent/child query, using the correct semantic labels
|
2022-05-16 09:20:30 +02:00 |
Miriam Baglioni
|
e4eac1d20b
|
[EOSC TAG] added code to remove EOSC Jupyter Notebook from subjects and put EOSC as classid in the qualifier
|
2022-05-13 11:01:33 +02:00 |
Sandro La Bruzzo
|
22f65680b9
|
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
|
2022-05-11 15:30:12 +02:00 |
Sandro La Bruzzo
|
ca8d26bcb4
|
added better filter for openCitations
|
2022-05-11 15:29:57 +02:00 |
Claudio Atzori
|
5d3b4a9c25
|
[graph merge beta] merge datasource originalid, collectedfrom, and pid lists
|
2022-05-11 14:13:06 +02:00 |
Claudio Atzori
|
2a8e0fb72f
|
[openorgs] mapping parent/child relations without massaging the semantic labels
|
2022-05-10 08:45:53 +02:00 |
Claudio Atzori
|
77bc9863e9
|
[openorgs] mapping parent/child relations without massaging the semantic labels
|
2022-05-09 16:06:04 +02:00 |
Claudio Atzori
|
378020e30a
|
[eosc_services] unit test adaptation
|
2022-05-09 16:05:06 +02:00 |
Claudio Atzori
|
846975c886
|
[eosc_services] using the correct 'keyword' subject type, as declared in the dnet:subject_classification_typologies vocabulary
|
2022-05-05 11:37:58 +02:00 |
Claudio Atzori
|
da611cfbbd
|
[eosc_services] resolved merge conflicts
|
2022-05-03 13:37:15 +02:00 |
Claudio Atzori
|
2ade69dea6
|
EOSC Services - minor
|
2022-05-02 17:03:31 +02:00 |
Claudio Atzori
|
b6a7ff3a99
|
EOSC Services - removed fields from mapping, testing preparation
|
2022-05-02 15:52:33 +02:00 |
Claudio Atzori
|
a8c51f6f16
|
EOSC Services - fixed query and testing preparation
|
2022-05-02 11:09:03 +02:00 |
Claudio Atzori
|
05c1ea92e9
|
EOSC Services - added Service-specific fields in the XML record serialization
|
2022-04-29 15:56:55 +02:00 |
Claudio Atzori
|
f5f532d134
|
EOSC Services - ongoing update
|
2022-04-29 12:25:24 +02:00 |
Claudio Atzori
|
5ffc24d1ba
|
EOSC Services - ongoing update
|
2022-04-26 16:18:41 +02:00 |
Miriam Baglioni
|
19d90658fc
|
[Clean Context] added description to parameters
|
2022-04-22 15:41:23 +02:00 |
Miriam Baglioni
|
e0915061c2
|
[Clean Context] fixed issue in param name
|
2022-04-21 16:32:40 +02:00 |
Miriam Baglioni
|
9a961a0092
|
[Clean Context] fixed issue in param name
|
2022-04-21 15:12:24 +02:00 |
Miriam Baglioni
|
5b7d9e741c
|
[Clean Context] added logic to cleaning workflow to accommodate also context cleaning
|
2022-04-21 13:02:14 +02:00 |
Miriam Baglioni
|
ccba1a3db1
|
[Clean Context] added logic to cleaning workflow to accommodate also context cleaning
|
2022-04-21 13:00:06 +02:00 |
Miriam Baglioni
|
a38f0f5ea7
|
merging with branch beta
|
2022-04-20 15:44:18 +02:00 |
Miriam Baglioni
|
dbfbe8841a
|
[Clean Context] changed the description in input parameters
|
2022-04-20 15:41:03 +02:00 |
Michele Artini
|
c96a8613f8
|
update SQL queries
|
2022-04-20 12:07:49 +02:00 |
Michele Artini
|
4314db55c8
|
migration to services: update sql queries
|
2022-04-19 15:05:02 +02:00 |
Claudio Atzori
|
05fafa1408
|
[graph raw] avoid NPEs importing datasource consent fields
|
2022-04-06 15:23:50 +02:00 |
Claudio Atzori
|
8c457f1b2c
|
conflicts resolved, merged from beta
|
2022-04-06 10:27:52 +02:00 |
Miriam Baglioni
|
79336d46c5
|
[Clean Context] first naive implementation of a functionality to clean unwanted contexts from a result. This implementation simply verifies that the main title of the result starts with a given string
|
2022-04-04 15:52:31 +02:00 |
Claudio Atzori
|
0a0ae84c22
|
[graph raw] DOI based instance URLs on https
|
2022-03-29 10:52:58 +02:00 |
Claudio Atzori
|
741bc99c47
|
Merge branch 'beta' into datasource_pdf_consent
|
2022-03-28 09:20:48 +02:00 |
Miriam Baglioni
|
89fd275480
|
[HostedByMap] added left over from PR and fixed issue on workflow
|
2022-03-21 09:54:45 +01:00 |
Miriam Baglioni
|
0f7d8ca2e0
|
[HostedByMap] change on master to align to PR 201 on beta merged as 9f3036c847
|
2022-03-11 15:16:02 +01:00 |
Claudio Atzori
|
f25407bbe2
|
added mapping for datasource consent fields to integrate them in the graph
|
2022-03-11 09:32:42 +01:00 |
Miriam Baglioni
|
2c5087d55a
|
[HostedByMap] download of doaj from json, modification of test resources, deletion of class no longer needed for the CSV download
|
2022-03-04 15:18:21 +01:00 |
Miriam Baglioni
|
5d608d6291
|
[HostedByMap] changed the model to include also oaStart date and review process that could be possibly used in the future
|
2022-03-04 11:06:09 +01:00 |
Miriam Baglioni
|
8a41f63348
|
[HostedByMap] update to download the json instead of the csv
|
2022-03-04 10:38:43 +01:00 |
Miriam Baglioni
|
44b0c03080
|
[HostedByMap] update to download the json instead of the csv
|
2022-03-04 10:37:59 +01:00 |
Claudio Atzori
|
99f5b14469
|
[graph raw] invisible records stored among the raw graph rather than the claimed subgraph
|
2022-02-18 15:20:57 +01:00 |
Claudio Atzori
|
cf8443780e
|
added processingchargeamount to the result view
|
2022-02-18 15:17:48 +01:00 |
Alessia Bardi
|
600ede1798
|
serialisation of APCs in the XML records
|
2022-02-11 11:00:20 +01:00 |
Miriam Baglioni
|
aae667e6b6
|
[APC at the result level] added the APC at the level of the result and modified test class
|
2022-02-04 12:34:25 +01:00 |
Alessia Bardi
|
2e215abfa8
|
test for instances with URLs for OpenAPC
|
2022-02-02 17:27:44 +01:00 |
Claudio Atzori
|
8eb75ca169
|
adapted GenerateEntitiesApplicationTest behaviour
|
2022-01-27 16:24:37 +01:00 |
Claudio Atzori
|
af61e44acc
|
ported changes to the GraphCleaningFunctionsTest from 8de9788308
|
2022-01-27 16:19:14 +01:00 |
Claudio Atzori
|
1322379741
|
Merge branch 'beta' into delegated_authorities
|
2022-01-25 14:28:25 +01:00 |
Claudio Atzori
|
97ad94d7d9
|
[graph resolution] drop output path at the beginning
|
2022-01-24 18:02:07 +01:00 |
Claudio Atzori
|
dd52bf1bb8
|
copy relations to the graphOutputPath
|
2022-01-21 13:59:29 +01:00 |
Claudio Atzori
|
4983d6536d
|
Merge branch 'beta' into delegated_authorities
|
2022-01-21 13:02:48 +01:00 |
Claudio Atzori
|
f0ea2410e5
|
improved mapping titles from datacite records to consider title types
|
2022-01-21 10:50:34 +01:00 |
Claudio Atzori
|
3b9020c1b7
|
added unit test for the DispatchEntitiesJob
|
2022-01-19 18:15:55 +01:00 |
Claudio Atzori
|
abfa9c6045
|
code formatting
|
2022-01-19 17:17:11 +01:00 |
Claudio Atzori
|
391aa1373b
|
added unit test
|
2022-01-19 17:13:21 +01:00 |
Claudio Atzori
|
44a937f4ed
|
factored out entity grouping implementation, extended to consider results from delegated authorities rather than identical records from other sources
|
2022-01-19 12:24:52 +01:00 |
Miriam Baglioni
|
a7c4d0d16d
|
[DoiBoost Organizations] added parameter to specify the action in the wf raw_organizations to be able to load the openorgs organization as in the loading step for the construction of the graph
|
2022-01-13 13:52:00 +01:00 |
Sandro La Bruzzo
|
57e2c4b749
|
formatted code
|
2022-01-12 09:40:28 +01:00 |
Claudio Atzori
|
4f212652ca
|
scalafmt: code formatting
|
2022-01-11 16:57:48 +01:00 |
Claudio Atzori
|
908294d86e
|
OAF-store-graph mdstores: further fix for PR#180
|
2022-01-05 15:49:05 +01:00 |
Claudio Atzori
|
bd59b58efb
|
test for the tolerant deserialisation utility method
|
2022-01-04 11:26:56 +01:00 |
Claudio Atzori
|
a6977197b3
|
serialise records in the OAF-store-graph mdstores in json format. Read them again in the graph construction phase using a tolerant parser to support backward compatible changes in the evolution of the schema
|
2022-01-03 17:25:26 +01:00 |
Miriam Baglioni
|
7a1b440413
|
[SDG] logic to create unresolved entities out of SDG input. This also changes some classes related to FOS to reuse the same code. The code under createunresolvedentities creates results with the merged update of the inputs provided (bip at the level of the instance, fos and sdg for subjects)
|
2021-12-23 13:24:28 +01:00 |
Miriam Baglioni
|
69e9ea9eeb
|
[Graph Dump] Test for extraction of rels from entities extended
|
2021-12-23 10:15:30 +01:00 |
Miriam Baglioni
|
31b26d48ac
|
[Graph Dump] fixed issue on extraction of relation between entities and contexts: the relationship name and type were swapped
|
2021-12-23 10:09:47 +01:00 |
Miriam Baglioni
|
be0acccf42
|
Merge branch 'beta' into dump
|
2021-12-22 12:39:57 +01:00 |
Miriam Baglioni
|
460e6b95d6
|
[Graph Dump] -
|
2021-12-21 13:48:03 +01:00 |
Sandro La Bruzzo
|
3920d68992
|
Fixed workflow generation of delta in datacite
|
2021-12-21 11:41:49 +01:00 |
Sandro La Bruzzo
|
b881ee5ef8
|
[scholexplorer]
- implemented generation of scholix of delta update of datacite
|
2021-12-15 11:25:32 +01:00 |