Miriam Baglioni
c298c148cb
[CountryPropagation] fix NPE issue
2022-05-20 09:11:46 +02:00
Miriam Baglioni
eaf9385ae5
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-05-17 15:09:37 +02:00
Miriam Baglioni
f5207885e3
[EOSCTag] changed code to remove EOSC Jupyter Notebook and modified test to exclude galaxy + software from the tagging for Galaxy
2022-05-17 15:09:22 +02:00
Claudio Atzori
d098ad0d93
[hb patch] updated map
2022-05-16 15:54:04 +02:00
Claudio Atzori
1dda11e031
[hb patch] updated map
2022-05-16 15:53:27 +02:00
Claudio Atzori
8dd5517548
code formatting
2022-05-16 14:35:24 +02:00
Claudio Atzori
52cb086506
[graph grouping] drop relation target path before copying from source
2022-05-16 12:08:36 +02:00
Claudio Atzori
6442763f97
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-05-16 12:07:45 +02:00
Claudio Atzori
997c50078e
[graph grouping] drop relation target path before copying from source
2022-05-16 12:07:40 +02:00
Sandro La Bruzzo
c1971d52c4
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2022-05-16 10:30:35 +02:00
Sandro La Bruzzo
4c50f35c8b
update publication Date format
2022-05-16 10:29:36 +02:00
Michele Artini
46c07e0724
deleted hierarchical rels from ror action set
2022-05-16 09:39:54 +02:00
Claudio Atzori
6031acb2e3
[openorgs] fixed parent/child query, using the correct semantic labels
2022-05-16 09:20:48 +02:00
Claudio Atzori
0dc33ea391
[openorgs] fixed parent/child query, using the correct semantic labels
2022-05-16 09:20:30 +02:00
Antonis Lempesis
3fc9efeab6
fixed typo, added open citations and apcs in monitor
2022-05-13 14:28:13 +03:00
Miriam Baglioni
e4eac1d20b
[EOSC TAG] added code to remove EOSC Jupyter Notebook from subjects and put EOSC as classid in the qualifier
2022-05-13 11:01:33 +02:00
Sandro La Bruzzo
22f65680b9
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2022-05-11 15:30:12 +02:00
Sandro La Bruzzo
ca8d26bcb4
added better filter for openCitations
2022-05-11 15:29:57 +02:00
Claudio Atzori
5d3b4a9c25
[graph merge beta] merge datasource originalid, collectedfrom, and pid lists
2022-05-11 14:13:06 +02:00
Antonis Lempesis
23334479bb
removed yet another collab, added more orgs in monitor
2022-05-11 13:05:52 +03:00
Claudio Atzori
2a8e0fb72f
[openorgs] mapping parent/child relations without massaging the semantic labels
2022-05-10 08:45:53 +02:00
Claudio Atzori
77bc9863e9
[openorgs] mapping parent/child relations without massaging the semantic labels
2022-05-09 16:06:04 +02:00
Claudio Atzori
378020e30a
[eosc_services] unit test adaptation
2022-05-09 16:05:06 +02:00
Miriam Baglioni
89657a0b78
[UsageCount] refactoring
2022-05-09 14:43:27 +02:00
Miriam Baglioni
a056f59c6e
[UsageCount] made it an action set, as it should be, and changed the tests so that they work again
2022-05-09 12:51:35 +02:00
Antonis Lempesis
61b4c19e65
restored indi_result_org_country_collab, removed indi_result_org_collab
2022-05-06 12:52:10 +03:00
Antonis Lempesis
cfbbcaf7c4
commented out indi_result_org_country_collab
2022-05-06 12:49:36 +03:00
Claudio Atzori
658450d9a3
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-05-05 11:38:08 +02:00
Claudio Atzori
846975c886
[eosc_services] using the correct 'keyword' subject type, as declared in the dnet:subject_classification_typologies vocabulary
2022-05-05 11:37:58 +02:00
Miriam Baglioni
8a72de4011
[EOSCTag] modified workflow to execute all the steps and not only the last one
2022-05-04 10:10:56 +02:00
Miriam Baglioni
bd1108f98b
merging with branch beta
2022-05-04 10:06:56 +02:00
Miriam Baglioni
3aeedd931a
[EOSCTag] fixed issue in case description is null. Modified test resources and classes
2022-05-04 10:06:38 +02:00
Claudio Atzori
da611cfbbd
[eosc_services] resolved merge conflicts
2022-05-03 13:37:15 +02:00
Claudio Atzori
9e12cb3c92
EOSC Services - removed field knowledgegraph; depending on the released schema module
2022-05-03 11:55:45 +02:00
Miriam Baglioni
a21fe310e5
[EOSCTag] last test and change in the implementation to search in title and description
2022-05-02 17:43:20 +02:00
Claudio Atzori
2ade69dea6
EOSC Services - minor
2022-05-02 17:03:31 +02:00
Claudio Atzori
b6a7ff3a99
EOSC Services - removed fields from mapping, testing preparation
2022-05-02 15:52:33 +02:00
Miriam Baglioni
e37177e1ce
merging with branch beta
2022-05-02 12:31:50 +02:00
Claudio Atzori
a8c51f6f16
EOSC Services - fixed query and testing preparation
2022-05-02 11:09:03 +02:00
Claudio Atzori
05c1ea92e9
EOSC Services - added Service-specific fields in the XML record serialization
2022-04-29 15:56:55 +02:00
Claudio Atzori
f5f532d134
EOSC Services - ongoing update
2022-04-29 12:25:24 +02:00
Antonis Lempesis
0353f93d54
added new hive opts
2022-04-29 12:49:27 +03:00
Serafeim Chatzopoulos
623f7be26d
Fix reading files from HDFS in FileCollector & FileGZipCollector plugins
2022-04-28 16:31:11 +03:00
Claudio Atzori
5ffc24d1ba
EOSC Services - ongoing update
2022-04-26 16:18:41 +02:00
Sandro La Bruzzo
78015a5733
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2022-04-26 09:56:34 +02:00
Sandro La Bruzzo
8c22e5c30a
added fix to include date array with only year or year and month
2022-04-26 09:56:27 +02:00
Claudio Atzori
81c4496d32
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-04-26 09:02:15 +02:00
Miriam Baglioni
e342ec93f0
[EOSCTag] prepared resources for test
2022-04-22 18:35:37 +02:00
Miriam Baglioni
88562c0930
[EOSC TAG] added test for galaxy for title and description criteria
2022-04-22 18:35:03 +02:00
Miriam Baglioni
dfbd2bcbea
[EOSC TAG] added logic in case subject is null
2022-04-22 18:34:03 +02:00
Miriam Baglioni
27c85e901a
[EOSCTag] added resources and finalized test for Jupyter Notebook tagging
2022-04-22 17:38:10 +02:00
Miriam Baglioni
87bff36d9e
merging with branch beta
2022-04-22 15:52:34 +02:00
Miriam Baglioni
911ce0780a
Merge branch 'cleancontext' of https://code-repo.d4science.org/D-Net/dnet-hadoop into cleancontext
2022-04-22 15:41:42 +02:00
Miriam Baglioni
19d90658fc
[Clean Context] added description to parameters
2022-04-22 15:41:23 +02:00
Claudio Atzori
54162f5c4f
Merge branch 'beta' into cleancontext
2022-04-22 11:49:33 +02:00
Miriam Baglioni
bbb77052d3
[EOSCTag] first test
2022-04-22 11:32:57 +02:00
Claudio Atzori
30105f0722
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-04-22 11:22:21 +02:00
Sandro La Bruzzo
a82ec3aaaf
code formatter
2022-04-22 11:08:13 +02:00
Sandro La Bruzzo
aa12429f50
Modified last intersection since we lost many titles.
2022-04-22 11:05:08 +02:00
Miriam Baglioni
7cb7066472
[EoscTag] first "rough" implementation
2022-04-22 10:44:17 +02:00
Sandro La Bruzzo
d660895b30
fixed wrong mapping type of dataset
2022-04-21 20:41:13 +02:00
Miriam Baglioni
e0915061c2
[Clean Context] fixed issue in param name
2022-04-21 16:32:40 +02:00
Miriam Baglioni
6dc68c48e0
[EOSCTag] -
2022-04-21 16:19:04 +02:00
Miriam Baglioni
9a961a0092
[Clean Context] fixed issue in param name
2022-04-21 15:12:24 +02:00
Claudio Atzori
29150a5d0c
code formatting
2022-04-21 13:31:56 +02:00
Miriam Baglioni
5b7d9e741c
[Clean Context] added logic to the cleaning workflow to also accommodate context cleaning
2022-04-21 13:02:14 +02:00
Miriam Baglioni
ccba1a3db1
[Clean Context] added logic to the cleaning workflow to also accommodate context cleaning
2022-04-21 13:00:06 +02:00
Miriam Baglioni
20de75ca64
[Measures] removed typo
2022-04-21 12:14:03 +02:00
Miriam Baglioni
bebb2a0560
Merge branch 'eosc_dimitris' of https://code-repo.d4science.org/D-Net/dnet-hadoop into eosc_dimitris
2022-04-21 12:10:19 +02:00
Miriam Baglioni
b61efd613b
[Measures] addressed comments in the PR
2022-04-21 12:09:37 +02:00
Miriam Baglioni
d012d125d7
[EOSCTag] -
2022-04-21 12:02:09 +02:00
Claudio Atzori
88acad76f9
Merge branch 'beta' into eosc_dimitris
2022-04-21 12:00:03 +02:00
Claudio Atzori
eabb40fccc
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-04-21 11:42:43 +02:00
Miriam Baglioni
c304657d91
[Measures] put the logic in common, no need to change the schema
2022-04-21 11:27:26 +02:00
Sandro La Bruzzo
d580e15442
Modified last intersection since we lost many titles.
this is my last resource, after that, I've to change my job
2022-04-21 11:06:08 +02:00
Miriam Baglioni
5295effc96
[Measures] fixed issue
2022-04-20 16:20:40 +02:00
Miriam Baglioni
a38f0f5ea7
merging with branch beta
2022-04-20 15:44:18 +02:00
Miriam Baglioni
dbfbe8841a
[Clean Context] changed the description in input parameters
2022-04-20 15:41:03 +02:00
Miriam Baglioni
5feae77937
[Measures] last changes to accommodate tests
2022-04-20 15:13:09 +02:00
Miriam Baglioni
869407c6e2
[Measures] added new measure (usagecounts) as action set. Measure added at the level of the result. Ref #7587
2022-04-20 14:02:05 +02:00
Antonis Lempesis
b7cd2c6ca1
added open citations
2022-04-20 14:46:55 +03:00
Michele Artini
c96a8613f8
update SQL queries
2022-04-20 12:07:49 +02:00
Michele Artini
4314db55c8
migration to services: update sql queries
2022-04-19 15:05:02 +02:00
Miriam Baglioni
0012e57bf9
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2022-04-14 14:14:44 +02:00
Miriam Baglioni
c5a863132c
[BulkTagging] revert it
2022-04-14 14:14:13 +02:00
Sandro La Bruzzo
d5b29d96a7
fix merging in crossrefAggregator which creates dataInfo null
2022-04-14 11:07:04 +02:00
Miriam Baglioni
8e8933d41a
[BulkTagging] added fix if result.dataInfo is null
2022-04-14 09:04:24 +02:00
Claudio Atzori
b93a141d6c
[Doiboost] fixed fundingReference extraction from the Crossref records
2022-04-12 10:26:05 +02:00
Claudio Atzori
73c172926a
[Doiboost] fixed fundingReference extraction from the Crossref records
2022-04-12 10:25:42 +02:00
Claudio Atzori
48b580b45c
[graph enrichment] fixed country_propagation oozie workflow definition, parameter saveGraph is not needed anymore by the SparkCountryPropagationJob
2022-04-11 08:52:36 +02:00
Claudio Atzori
21f32b83c6
[graph enrichment] fixed country_propagation oozie workflow definition, parameter saveGraph is not needed anymore by the SparkCountryPropagationJob
2022-04-11 08:52:12 +02:00
Claudio Atzori
4eff7856f5
Merge pull request '[stats-wf] computing stats in each step' ( #210 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#210
2022-04-08 14:21:01 +02:00
Serafeim Chatzopoulos
d0b84d3297
Add FileCollectorPlugin and respective test
2022-04-07 15:06:38 +03:00
Serafeim Chatzopoulos
bc1bf55507
Add AbstractSplittedRecordPlugin
2022-04-07 14:33:04 +03:00
Claudio Atzori
c26222623f
[maven-release-plugin] prepare for next development iteration
2022-04-07 13:32:22 +02:00
Claudio Atzori
86585a6b27
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 13:32:19 +02:00
Claudio Atzori
ad85d88eaf
[maven-release-plugin] rollback the release of dhp-1.2.4
2022-04-07 13:28:35 +02:00
Claudio Atzori
598e11dfd7
[maven-release-plugin] prepare for next development iteration
2022-04-07 13:27:02 +02:00
Claudio Atzori
db3d9877a5
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 13:26:58 +02:00
Claudio Atzori
3bba6d6e38
[maven-release-plugin] rollback the release of dhp-1.2.4
2022-04-07 12:23:17 +02:00
Claudio Atzori
2ac2d928bd
[maven-release-plugin] prepare for next development iteration
2022-04-07 12:18:47 +02:00
Claudio Atzori
85bc722ff4
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 12:18:43 +02:00
Claudio Atzori
bc05b6168a
[maven-release-plugin] rollback the release of dhp-1.2.4
2022-04-07 11:49:06 +02:00
Claudio Atzori
505420fd61
[maven-release-plugin] prepare for next development iteration
2022-04-07 11:34:06 +02:00
Claudio Atzori
66e718981e
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 11:34:02 +02:00
Serafeim Chatzopoulos
e612489670
Add fileGZip collector plugin and respective test
2022-04-06 19:12:44 +03:00
Claudio Atzori
4190c9f6bc
[graph raw] avoid NPEs importing datasource consent fields
2022-04-06 15:34:31 +02:00
Claudio Atzori
05fafa1408
[graph raw] avoid NPEs importing datasource consent fields
2022-04-06 15:23:50 +02:00
Antonis Lempesis
c442c91f89
computing stats in each step
2022-04-06 12:40:02 +03:00
Claudio Atzori
8c457f1b2c
conflicts resolved, merged from beta
2022-04-06 10:27:52 +02:00
Miriam Baglioni
e77d104951
[OC] added / to workflow path
2022-04-05 15:07:11 +02:00
Miriam Baglioni
79336d46c5
[Clean Context] first naive implementation of a functionality to clean unwanted contexts from a result. This implementation simply verifies that the main title of the result starts with a given string
2022-04-04 15:52:31 +02:00
Claudio Atzori
873369af1c
Merge pull request '[stats wf] added apcs in monitor db' ( #207 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#207
2022-03-29 15:40:20 +02:00
Antonis Lempesis
7112806a73
views cannot be stored as parquet...
2022-03-29 16:37:29 +03:00
Antonis Lempesis
fff0b3cc19
added apcs in monitor db
2022-03-29 14:15:31 +03:00
Claudio Atzori
de85367695
Merge pull request '[stats wf] fix: views cannot be stored as parquet...' ( #206 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#206
2022-03-29 12:51:02 +02:00
Antonis Lempesis
ee24f3eb2c
views cannot be stored as parquet...
2022-03-29 13:47:48 +03:00
Sandro La Bruzzo
1b11010169
minor fix
2022-03-29 10:59:14 +02:00
Claudio Atzori
0a0ae84c22
[graph raw] DOI based instance URLs on https
2022-03-29 10:52:58 +02:00
Claudio Atzori
9fa3dd78fe
Merge pull request '[stats wf] various fixes, organization ids for inst. dashboard' ( #205 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#205
2022-03-28 22:03:49 +02:00
Claudio Atzori
96aa2a5d0d
Merge branch 'beta' into instance_group_by_url
2022-03-28 09:23:52 +02:00
Claudio Atzori
741bc99c47
Merge branch 'beta' into datasource_pdf_consent
2022-03-28 09:20:48 +02:00
Claudio Atzori
61319b2e83
updated dhp-schema version; set entity-level dataInfo before & after merging the fields from the group of duplicates
2022-03-25 16:38:33 +01:00
Antonis Lempesis
d8503cd191
added moooar organizations
2022-03-24 14:02:36 +02:00
Miriam Baglioni
7b8f85692e
[Enrichment country] fixed issues with parameters and workflow args
2022-03-23 17:20:23 +01:00
Claudio Atzori
48d32466e4
instances grouped by URL expose only one refereed
2022-03-23 14:52:03 +01:00
Claudio Atzori
f10066547b
increased spark.sql.shuffle.partitions in affiliation_from_semrel_propagation
2022-03-23 12:22:26 +01:00
Claudio Atzori
43733c1a18
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-03-23 12:14:27 +01:00
Antonis Lempesis
62f91b0869
cleanup
2022-03-22 16:17:49 +02:00
Antonis Lempesis
2e8394ecf8
creating aaall tables as parquet
2022-03-22 16:16:08 +02:00
Antonis Lempesis
dcfbeb8142
yet more typos
2022-03-21 12:36:03 +02:00
Miriam Baglioni
89fd275480
[HostedByMap] added left over from PR and fixed issue on workflow
2022-03-21 09:54:45 +01:00
miconis
c763aded70
dependency updated to the new pace-core version
2022-03-16 16:41:50 +01:00
miconis
c959639bd5
dependency updated to the new pace-core version
2022-03-15 16:33:03 +01:00
Miriam Baglioni
0f7d8ca2e0
[HostedByMap] change on master to align to PR 201 on beta merged as 9f3036c847
2022-03-11 15:16:02 +01:00
Claudio Atzori
f430029596
cleanup
2022-03-11 14:28:28 +01:00
Miriam Baglioni
12de9acb0d
[Country Propagation] left out from previous commit
2022-03-11 14:17:02 +01:00
Miriam Baglioni
2fbb35ade5
merging with branch beta
2022-03-11 13:58:10 +01:00
Miriam Baglioni
4437f9345d
[Country Propagation] left out from previous commit
2022-03-11 13:57:47 +01:00
Miriam Baglioni
2b643059fa
[Country Propagation] changed the logic to get the collectedfrom at the result level, to fix the issue occurring when no instance is created for a result that should have the country associated. Changed the code to use spark instead of hive to prepare the data needed for the propagation step. Added new tests for the intermediate steps and a new verification for the propagation itself
2022-03-11 13:56:48 +01:00
Claudio Atzori
f25407bbe2
added mapping for datasource consent fields to integrate them in the graph
2022-03-11 09:32:42 +01:00
Miriam Baglioni
2c5087d55a
[HostedByMap] download of doaj from json, modification of test resources, deletion of the class no longer needed for the CSV download
2022-03-04 15:18:21 +01:00
Miriam Baglioni
5d608d6291
[HostedByMap] changed the model to include also oaStart date and review process that could be possibly used in the future
2022-03-04 11:06:09 +01:00
Miriam Baglioni
b7c2340952
[HostedByMap - DOIBoost] changed to use code moved to common since used also from hostedbymap now
2022-03-04 11:05:23 +01:00
Miriam Baglioni
8a41f63348
[HostedByMap] update to download the json instead of the csv
2022-03-04 10:38:43 +01:00
Miriam Baglioni
44b0c03080
[HostedByMap] update to download the json instead of the csv
2022-03-04 10:37:59 +01:00
Antonis Lempesis
ad78e505da
yet another fix
2022-03-03 12:28:12 +02:00
Miriam Baglioni
3be8737c32
[graph-stats] fixed query after the change in the indicator table related to PR#200
2022-03-02 14:09:05 +01:00
Miriam Baglioni
3970651ee1
Merge pull request 'fixed query after the change in the indicator table' ( #200 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#200
2022-03-02 14:05:58 +01:00
Antonis Lempesis
efeeebfee1
fixed query after the change in the indicator table
2022-03-02 13:29:25 +02:00
Claudio Atzori
580d904aae
manually merging PR#199 D-Net/dnet-hadoop#199
2022-02-25 12:22:50 +01:00
Claudio Atzori
1932a65d1c
Merge pull request '[Stats wf] sprint 6 indicators' ( #198 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#198
2022-02-25 12:09:18 +01:00
Miriam Baglioni
f5b0a6f89c
[master to beta] fixed issues in test files
2022-02-25 10:21:57 +01:00
miconis
8991d097b4
bug fix in the DedupRecordFactory, DataInfo set before merge
2022-02-24 17:13:12 +01:00
miconis
fe1c966cbf
Merge branch 'master_202203' of code-repo.d4science.org:D-Net/dnet-hadoop into master_202203
2022-02-24 17:08:38 +01:00
miconis
b0f369dc78
bug fix in the DedupRecordFactory, DataInfo set before merge
2022-02-24 17:08:24 +01:00
Miriam Baglioni
859cb7ac9d
[DoiBoost AR] changed test resource to be sure the result will always have EMBARGO as value for AccessRight
2022-02-24 16:55:32 +01:00
Miriam Baglioni
a40b59b7d5
[ResultToOrgFromInstRepoTest] fixed issue in model of the input resources
2022-02-24 16:05:57 +01:00
Claudio Atzori
66c09b1bc7
code formatting
2022-02-24 12:58:07 +01:00
Claudio Atzori
a87c070447
conflicts resolved, merged from beta
2022-02-24 12:51:31 +01:00
Claudio Atzori
86cdb7a38f
[provision] serialize measures defined on the result level
2022-02-23 15:54:18 +01:00
Alessia Bardi
9d6203f79b
test mapping datasource
2022-02-23 15:00:53 +01:00
Antonis Lempesis
3b92a2ab9c
added the rest of sprint 6 in monitor db
2022-02-23 12:05:57 +02:00
Antonis Lempesis
87c91f70a2
added sprint 6 indicators to monitor db
2022-02-22 14:41:48 +02:00
Claudio Atzori
5226d0a100
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-02-18 15:21:07 +01:00
Claudio Atzori
99f5b14469
[graph raw] invisible records stored among the raw graph rather than the claimed subgraph
2022-02-18 15:20:57 +01:00
Claudio Atzori
401dd38074
code formatting
2022-02-18 15:19:15 +01:00
Claudio Atzori
cf8443780e
added processingchargeamount to the result view
2022-02-18 15:17:48 +01:00
Sandro La Bruzzo
891781ee3f
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2022-02-18 11:11:32 +01:00
Sandro La Bruzzo
d3f03abd51
fixed wrong json path
2022-02-18 11:11:17 +01:00
Claudio Atzori
89c7313fc5
Merge branch 'beta' into hierarchical_orgs_relations
2022-02-17 10:30:04 +01:00
dimitrispie
58c59f46eb
Added Sprint 6
2022-02-17 10:21:09 +02:00
Antonis Lempesis
5772f92dba
merged beta changes in hive branch
2022-02-15 13:24:51 +02:00
Antonis Lempesis
393a4ee956
fixed yet another typo...
2022-02-15 12:56:50 +02:00
Sandro La Bruzzo
3aa2020b24
added script to regenerate hostedBy Map following instruction defined on ticket #7539
updated hosted By Map
2022-02-15 11:05:27 +01:00
Miriam Baglioni
be64055cfe
[OpenCitation] changed the name of destination folders
2022-02-14 15:49:44 +01:00
Miriam Baglioni
1490867cc7
[OpenCitation] cleaning of the COCI model
2022-02-14 14:52:12 +01:00
Miriam Baglioni
c191080965
merging with branch beta
2022-02-14 14:49:39 +01:00
Alessia Bardi
600ede1798
serialisation of APCs in the XML records
2022-02-11 11:00:20 +01:00
Miriam Baglioni
5c4043dba8
[OpenCitation] refactoring
2022-02-08 16:23:05 +01:00
Miriam Baglioni
759ed519f2
[OpenCitation] added logic to avoid the generation of self-citation relations
2022-02-08 16:15:34 +01:00
Miriam Baglioni
b071f8e415
[OpenCitation] change to extract each folder in json format just once
2022-02-08 15:37:28 +01:00
Miriam Baglioni
fbc28ee8c3
[OpenCitation] change the integration logic to consider dois with commas inside
2022-02-07 18:32:08 +01:00
Miriam Baglioni
78be2975f0
[stats-wf]fixed another typo related to PR#193
2022-02-07 11:22:08 +01:00
Miriam Baglioni
1f8302dc37
Merge pull request '[stats-wf]fixed yet another typo' ( #193 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#193
2022-02-07 11:19:26 +01:00
Antonis Lempesis
5f762cbd09
fixed yet another typo
2022-02-07 12:09:12 +02:00
Alessia Bardi
ac8b8f224f
Merge branch 'beta' into extendResult
2022-02-04 16:43:27 +01:00
Miriam Baglioni
493caef358
[stats-wf]fixed the result_result table related to PR#191
2022-02-04 14:51:25 +01:00
Miriam Baglioni
0547fd6ee7
Merge pull request '[stats-wf]fixed the result_result table' ( #191 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#191
2022-02-04 14:47:31 +01:00
Antonis Lempesis
ae633c566b
fixed the result_result table
2022-02-04 15:04:19 +02:00
Miriam Baglioni
aae667e6b6
[APC at the result level] added the APC at the level of the result and modified test class
2022-02-04 12:34:25 +01:00
Sandro La Bruzzo
bcfdf9a0d7
iis repository with https
2022-02-03 16:49:31 +01:00
Miriam Baglioni
3c60e53a96
[stats-wf]fixed the result_result creation for monitor PR#190 on beta
2022-02-03 14:47:08 +01:00
Miriam Baglioni
89922156c9
Merge pull request '[stats-wf]fixed the result_result creation for monitor' ( #190 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#190
2022-02-03 13:00:56 +01:00
Antonis Lempesis
c2b44530a3
typo...
2022-02-03 13:44:07 +02:00
Antonis Lempesis
dbd2646d59
fixed the result_result creation for monitor
2022-02-03 12:37:10 +02:00
Alessia Bardi
2e215abfa8
test for instances with URLs for OpenAPC
2022-02-02 17:27:44 +01:00
Miriam Baglioni
37784209c9
[dhp-schemas-] updated the version of dhp-schema to 2.10.27 for APC name and id modification
2022-02-02 12:46:31 +01:00
Miriam Baglioni
73eba34d42
[UnresolvedEntities] Changed the way to merge the unresolved because the new merge removed the dataInfo from the merged result. Added also data info for subjects
2022-02-01 08:38:41 +01:00
Miriam Baglioni
dce7f5fea8
[BULK TAGGING] changed to fix issue that should have been fixed already
2022-01-31 08:20:28 +01:00
Claudio Atzori
8eb75ca169
adapted GenerateEntitiesApplicationTest behaviour
2022-01-27 16:24:37 +01:00
Claudio Atzori
af61e44acc
ported changes to the GraphCleaningFunctionsTest from 8de9788308
2022-01-27 16:19:14 +01:00
Claudio Atzori
1322379741
Merge branch 'beta' into delegated_authorities
2022-01-25 14:28:25 +01:00
Claudio Atzori
59a250337c
[graph resolution] drop output path at the beginning
2022-01-24 18:02:39 +01:00
Claudio Atzori
97ad94d7d9
[graph resolution] drop output path at the beginning
2022-01-24 18:02:07 +01:00
Claudio Atzori
8de9788308
applied fix for avoiding ruling out the invisible (APC) records during the graph cleaning
2022-01-24 11:29:22 +01:00
Claudio Atzori
2f385b3ac6
updated dnet workflow profile definitions
2022-01-21 13:59:46 +01:00
Claudio Atzori
dd52bf1bb8
copy relations to the graphOutputPath
2022-01-21 13:59:29 +01:00
Claudio Atzori
4983d6536d
Merge branch 'beta' into delegated_authorities
2022-01-21 13:02:48 +01:00
Claudio Atzori
f0ea2410e5
improved mapping titles from datacite records to consider title types
2022-01-21 10:50:34 +01:00
Claudio Atzori
b37bc277c4
reintroduced the hostedby patching to the datacite records
2022-01-21 09:15:13 +01:00
Claudio Atzori
f2fde5566b
using helper method from ModelSupport to find the inverse relation descriptor
2022-01-20 09:19:07 +01:00
Claudio Atzori
3b9020c1b7
added unit test for the DispatchEntitiesJob
2022-01-19 18:15:55 +01:00
Claudio Atzori
abfa9c6045
code formatting
2022-01-19 17:17:11 +01:00
Claudio Atzori
391aa1373b
added unit test
2022-01-19 17:13:21 +01:00
Claudio Atzori
44a937f4ed
factored out entity grouping implementation, extended to consider results from delegated authorities rather than identical records from other sources
2022-01-19 12:24:52 +01:00
Miriam Baglioni
a7c4d0d16d
[DoiBoost Organizations] added parameter to specify the action in the wf raw_organizations to be able to load the openorgs organization as in the loading step for the construction of the graph
2022-01-13 13:52:00 +01:00
Miriam Baglioni
a75fb8c47a
[BipFinderInstanceLevel] change pom to align to the dhp-schema release 2.10.24 and refactoring
2022-01-12 18:06:26 +01:00
Miriam Baglioni
e7d5a39c03
[BipFinderInstanceLevel] added tests in test class
2022-01-12 17:25:04 +01:00
Miriam Baglioni
4993666d73
[BipFinderInstanceLevel] changed creation of the instance to allow to enrich existing instances with same pid
2022-01-12 16:53:47 +01:00
Claudio Atzori
9acc32faa6
[stats wf] final touches for the integration of PRs #166 , #179 in the master branch
2022-01-12 12:04:31 +01:00
dimitrispie
b053b0178e
Sprint 5 and other changes
2022-01-12 11:23:37 +01:00
Antonis Lempesis
b6b4bc0df9
added first indicator of sprint 5
2022-01-12 11:20:28 +01:00
Antonis Lempesis
e91f06f39b
fixed typos in indicators. Added extra views in monitor
2022-01-12 11:18:28 +01:00
Antonis Lempesis
3ce1976627
fixed column names
2022-01-12 11:14:41 +01:00
Antonis Lempesis
4878d7485c
added usage stats
2022-01-12 11:13:25 +01:00
Antonis Lempesis
a4316bafed
fixed a typo
2022-01-12 11:12:53 +01:00
Antonis Lempesis
bb17e070d8
added result_result relations
2022-01-12 11:09:38 +01:00
Claudio Atzori
a30a98a716
Applying PR#166 in the master branch (Added sprint 3&4 of indicators). Merge commit '0df9574a6f5d9d75bc840decb023561ae941f9d6'
2022-01-12 10:57:19 +01:00
Sandro La Bruzzo
57e2c4b749
formatted code
2022-01-12 09:40:28 +01:00
Claudio Atzori
0f2144b5e0
scalafmt: code formatting
2022-01-11 17:03:44 +01:00
Claudio Atzori
dcd282977c
pulled from beta
2022-01-11 16:59:41 +01:00
Claudio Atzori
4f212652ca
scalafmt: code formatting
2022-01-11 16:57:48 +01:00
Sandro La Bruzzo
0163dadb7f
[doiboost]
- update MAG schema, new field added in version dec-2021
2022-01-11 11:05:44 +01:00
Miriam Baglioni
904e1c2667
Merge pull request 'Affiliation Propagation through semantic relation' ( #183 ) from enrichment into beta
Reviewed-on: D-Net/dnet-hadoop#183
2022-01-07 19:18:16 +01:00
Miriam Baglioni
064f9bbd87
[AFFPropSR] added new parameter for the number of iterations and new code for just one iteration
2022-01-07 18:58:51 +01:00
Miriam Baglioni
b7e450070b
[SDG-FOS] to import SDG file not considering the header
2022-01-07 12:13:26 +01:00
Miriam Baglioni
639190370a
merging with branch beta
2022-01-07 11:29:25 +01:00
Miriam Baglioni
adccc2346a
[SDG-FOS] to lower case for the doi
2022-01-07 11:28:50 +01:00
Claudio Atzori
8ae46ca789
OAF-store-graph mdstores: further fix for PR#180
2022-01-05 15:52:15 +01:00
Claudio Atzori
908294d86e
OAF-store-graph mdstores: further fix for PR#180
2022-01-05 15:49:05 +01:00
Claudio Atzori
3bd3653be9
OAF-store-graph mdstores: save them in text format
2022-01-04 16:39:39 +01:00
Claudio Atzori
3dc48c7ab5
OAF-store-graph mdstores: save them in text format
2022-01-04 16:39:27 +01:00
Claudio Atzori
f82db765db
OAF-store-graph mdstores: save them in text format
2022-01-04 16:39:15 +01:00
Claudio Atzori
8d13effa31
test for the tolerant deserialisation utility method
2022-01-04 16:38:26 +01:00
Claudio Atzori
9458ee7938
serialise records in the OAF-store-graph mdstores in json format. Read them again in the graph construction phase using a tolerant parser to support backward compatible changes in the evolution of the schema
2022-01-04 16:38:09 +01:00
Claudio Atzori
58f8998e3d
OAF-store-graph mdstores: save them in text format
2022-01-04 15:02:09 +01:00
Claudio Atzori
174c3037e1
OAF-store-graph mdstores: save them in text format
2022-01-04 14:40:16 +01:00
Claudio Atzori
045d767013
OAF-store-graph mdstores: save them in text format
2022-01-04 14:23:01 +01:00
Claudio Atzori
bd59b58efb
test for the tolerant deserialisation utility method
2022-01-04 11:26:56 +01:00
Claudio Atzori
a6977197b3
serialise records in the OAF-store-graph mdstores in json format. Read them again in the graph construction phase using a tolerant parser to support backward compatible changes in the evolution of the schema
2022-01-03 17:25:26 +01:00
Miriam Baglioni
4c60ee1718
merging with branch beta
2022-01-03 15:24:02 +01:00
Miriam Baglioni
92fd69e25d
[SDG-FOS] alternative way to get input data to avoid OOM error while getting csv
2022-01-03 15:23:06 +01:00
Claudio Atzori
fe7e5f4748
Merge pull request '[stats wf] result_result relations, usage stats, monitor views, indicator for sprint 5' ( #179 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#179
2022-01-03 14:52:11 +01:00
Claudio Atzori
bcea4e3a9b
added dnet workflow profile for the orchestration of the simplified and complete graph construction and processing pipeline, where the IIS works on the non-deduplicated graph
2022-01-03 14:33:00 +01:00
Miriam Baglioni
a706ba0c08
Merge pull request 'SDG Integration' ( #178 ) from SDG into beta
Reviewed-on: D-Net/dnet-hadoop#178
2021-12-23 14:50:00 +01:00
Antonis Lempesis
81ee654271
added result_result relations
2021-12-23 15:46:17 +02:00
Antonis Lempesis
7551e52e95
fixed a typo
2021-12-23 15:33:53 +02:00
Miriam Baglioni
7a1b440413
[SDG] logic to create unresolved entities out of SDG input. This also changes some classes related to FOS to reuse the same code. The code under createunresolvedentities creates results with the merged update of the inputs provided (bip at the level of the instance, fos and sdg for subjects)
2021-12-23 13:24:28 +01:00
Claudio Atzori
cccb16900c
https://support.openaire.eu/issues/7330 normalising DOI urls
2021-12-23 12:33:53 +01:00
Miriam Baglioni
2a67ee13ec
[SDG] added model class
2021-12-23 10:37:52 +01:00
Miriam Baglioni
69e9ea9eeb
[Graph Dump] Test for extraction of rels from entities extended
2021-12-23 10:15:30 +01:00
Miriam Baglioni
31b26d48ac
[Graph Dump] fixed issue on extraction of relation between entities and contexts: the relationship name and type were swapped
2021-12-23 10:09:47 +01:00
Miriam Baglioni
10579c0dd0
[FOS]fixed doi value in test
2021-12-22 23:10:16 +01:00
Miriam Baglioni
6116fc5d40
[FOS]added logic to include only different subjects. Test refactoring and extension
2021-12-22 23:04:22 +01:00
Miriam Baglioni
b81efb6a9d
[FOS]changed the mapping between the csv and the model. Changed Test classes and resources
2021-12-22 21:40:35 +01:00
Miriam Baglioni
de6c4c8968
[FOS]creation of the unresolved entities: remove the split for the doi, no longer needed since each row is related to one doi
2021-12-22 16:44:44 +01:00
Miriam Baglioni
34ac56565d
refactoring
2021-12-22 16:28:11 +01:00
Miriam Baglioni
20ef1d657f
refactoring
2021-12-22 16:26:36 +01:00
Miriam Baglioni
813f856d3f
[BipFinder] removing left over parameter in wf
2021-12-22 16:11:12 +01:00
Miriam Baglioni
2c126ed014
[BipFinder] create unresolved entities with measures at the level of the instance
2021-12-22 16:03:41 +01:00
Miriam Baglioni
0807fdb65a
[BipFinder] remove not needed resources
2021-12-22 15:37:00 +01:00
Miriam Baglioni
b5e11a3a0a
[BipFinder] put in common package BipFinder model
2021-12-22 15:33:05 +01:00
Miriam Baglioni
c5739c4266
[BipFinder] create action set for the measures at the level of the result
2021-12-22 15:08:33 +01:00
Miriam Baglioni
da5f6260aa
merging with branch beta
2021-12-22 13:12:02 +01:00
Miriam Baglioni
be0acccf42
Merge branch 'beta' into dump
2021-12-22 12:39:57 +01:00
Antonis Lempesis
16539d7360
added usage stats
2021-12-22 02:54:42 +02:00
Antonis Lempesis
3edd661608
fixed column names
2021-12-21 22:55:04 +02:00
Antonis Lempesis
a4c0cbb98c
fixed typos in indicators. Added extra views in monitor
2021-12-21 15:54:38 +02:00
Miriam Baglioni
e24a7f3496
merging with branch beta
2021-12-21 13:57:19 +01:00
Miriam Baglioni
d1ae219cb4
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-21 13:55:53 +01:00
Miriam Baglioni
460e6b95d6
[Graph Dump] -
2021-12-21 13:48:03 +01:00
Sandro La Bruzzo
3920d68992
Fixed workflow generation of delta in datacite
2021-12-21 11:41:49 +01:00
Antonis Lempesis
58996972d9
added first indicator of sprint 5
2021-12-21 03:35:04 +02:00
dimitrispie
c1cdec09a9
Sprint 5 and other changes
2021-12-20 19:23:57 +02:00
Miriam Baglioni
3cc1b7b153
merging with branch beta
2021-12-15 17:25:02 +01:00
Miriam Baglioni
63b648b0dd
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-15 12:41:15 +01:00
Antonis Lempesis
f0b523cfa7
removed the too restrictive clause. will discuss again
2021-12-15 12:32:15 +01:00
Sandro La Bruzzo
b881ee5ef8
[scholexplorer]
- implemented generation of scholix of delta update of datacite
2021-12-15 11:25:32 +01:00
Sandro La Bruzzo
63952018c0
[scholexplorer]
-moved SparkRetrieveDataciteDelta in scala folder
2021-12-15 11:25:32 +01:00
Sandro La Bruzzo
e5bff64f2e
[scholexplorer]
- Minor fix on SparkConvertRDDtoDataset
-first implementation of retrieve datacite dump
2021-12-15 11:25:32 +01:00
Claudio Atzori
1790fa2d44
Merge branch 'beta' into affiliationPropagation
2021-12-14 15:26:56 +01:00
Miriam Baglioni
56409d1281
[Dump] resolved conflicts with beta and merging
2021-12-14 15:03:45 +01:00
Miriam Baglioni
22d4b5619b
[BipFinder Result] last changes to test and resources files
2021-12-14 14:54:13 +01:00
Miriam Baglioni
6fb6236cd4
changed the way to produce the AS for bipFinder.
2021-12-14 14:51:14 +01:00
Miriam Baglioni
573bd17cbb
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-14 11:12:25 +01:00
Miriam Baglioni
4eb8276493
-
2021-12-14 11:12:17 +01:00
Antonis Lempesis
ddd34087c2
removed 'stored as parquet' from views..
2021-12-13 23:05:00 +02:00
Antonis Lempesis
915f758c82
moving data to impala cluster and creating shadow databases there
2021-12-13 16:26:14 +02:00
Miriam Baglioni
936578aaf1
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-13 15:01:47 +01:00
Miriam Baglioni
8d755cca80
-
2021-12-13 15:01:40 +01:00
Claudio Atzori
98eb292c59
avoid NPEs merging XMLInstance(s)
2021-12-13 13:27:20 +01:00
Claudio Atzori
5e17247bb6
avoid NPEs merging XMLInstance(s)
2021-12-13 11:48:40 +01:00
Claudio Atzori
b70ecccea0
avoid NPEs merging XMLInstance(s)
2021-12-12 12:37:38 +01:00
Claudio Atzori
c1b6ae47cd
cleaning workflow assigns the proper default instance type when a value could not be cleaned using the vocabularies
2021-12-09 16:47:41 +01:00
Claudio Atzori
eb43eda42a
Merge branch 'beta' into graph_cleaning
2021-12-09 16:46:48 +01:00
Claudio Atzori
41c70c607d
cleaning workflow assigns the proper default instance type when a value could not be cleaned using the vocabularies
2021-12-09 16:44:28 +01:00
Alessia Bardi
cba63e9f82
Merge branch 'beta' into sygma_indexing
2021-12-09 15:52:16 +01:00
Alessia Bardi
e53228401b
style
2021-12-09 15:46:22 +01:00
Claudio Atzori
cd9c51fd7a
vocabulary based cleaning considers also the term label when looking up for a synonym
2021-12-09 14:49:24 +01:00
Claudio Atzori
e6e177dda0
vocabulary based cleaning considers also the term label when looking up for a synonym
2021-12-09 13:57:53 +01:00
Alessia Bardi
6b5d7688a4
#7275 serialize license information in XML records
2021-12-09 13:46:48 +01:00
Miriam Baglioni
b113586207
resolved conflicts
2021-12-07 10:16:14 +01:00
Sandro La Bruzzo
5d51b3dd4a
Merge pull request 'scala_refactor' ( #169 ) from scala_refactor into beta
Reviewed-on: D-Net/dnet-hadoop#169
2021-12-06 15:33:44 +01:00
Miriam Baglioni
d9836f0cf3
[OpenCitations] fixed test when executed one after the other
2021-12-06 15:27:09 +01:00
Miriam Baglioni
d1df01ff1e
[Graph Dump] fixed resource for test
2021-12-06 15:15:48 +01:00
Sandro La Bruzzo
ed0c352799
[test-fixing] fixed wrong test
2021-12-06 15:07:41 +01:00
Miriam Baglioni
96a7d46278
[Graph Dump] fixed tests
2021-12-06 15:06:32 +01:00
Sandro La Bruzzo
e9f285ec4d
[scala-refactor] Module dhp-doiboost:
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 14:24:03 +01:00
Sandro La Bruzzo
bf880e2508
[scala-refactor] Module dhp-graph-mapper:
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 13:57:41 +01:00
Sandro La Bruzzo
7af0bbd0b1
[scala-refactor] Module dhp-aggregation:
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 11:26:36 +01:00
Claudio Atzori
08795cbd30
using helper method from ModelSupport to find the inverse relation descriptor
2021-12-06 10:39:56 +01:00
Miriam Baglioni
f430688ff7
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-03 12:36:08 +01:00
Miriam Baglioni
4bb1d43afc
-
2021-12-03 12:35:51 +01:00
Sandro La Bruzzo
f7011b90d8
format code
2021-12-03 11:15:09 +01:00
Claudio Atzori
dd0b2e5244
Merge branch 'beta' into instance_group_by_url
2021-12-03 09:27:58 +01:00
Claudio Atzori
863a2f9db3
avoid filtering OAF records defined as invisible = true
2021-12-03 09:08:12 +01:00
Claudio Atzori
9cac283bec
implemented Instance serialization features requested in https://support.openaire.eu/issues/7156
2021-12-02 17:20:33 +01:00
Miriam Baglioni
d9f80488cc
[GRAPH DUMP] Add one more test to check the filtering of the relations
2021-12-02 14:15:19 +01:00
Miriam Baglioni
58bc3f223a
[GRAPH DUMP] Add filtering for relation we do not want to dump. It is based on the relclass
2021-12-02 14:09:46 +01:00
Miriam Baglioni
8905a39bf3
merging with branch beta
2021-12-02 13:17:29 +01:00
Miriam Baglioni
87eedad898
-
2021-12-02 13:17:19 +01:00
Claudio Atzori
3b19821f3c
added stats computation on the graph hive DB tables
2021-12-02 10:44:10 +01:00
Claudio Atzori
cfa4560769
minor: fixed hive action name
2021-12-02 10:43:36 +01:00
Claudio Atzori
d85af6fc25
[cleaning wf] fixed OAF record navigation, a mapping defined on a container object would have prevented the navigation to continue on its properties
2021-12-01 15:49:15 +01:00
Claudio Atzori
4fe7888817
code formatting
2021-12-01 15:48:15 +01:00
Claudio Atzori
01e5e0142a
added test to verify the relation inverse lookup operation
2021-12-01 09:46:26 +01:00
Antonis Lempesis
d05210ba99
finished migration to hive only
2021-11-30 19:01:48 +02:00
Claudio Atzori
0df9574a6f
Merge pull request '[stats wf] Added sprint 3&4 of indicators' ( #166 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#166
2021-11-29 10:40:26 +01:00
Claudio Atzori
1de881b796
resolved conflicts for #165
2021-11-26 16:15:11 +01:00
Claudio Atzori
014e872ae1
[resolution wf] added optional parameter to skip the entity resolution
2021-11-26 15:38:56 +01:00
Claudio Atzori
5c6d328537
code formatting
2021-11-26 15:38:16 +01:00
dimitrispie
09fc2afdca
Added indi_funder_country_collab
Kept only indi_pub_has_cc_licence
2021-11-26 16:13:10 +02:00
Antonis Lempesis
0b4163ee0b
added sprint3,4, removed 2, chaos
2021-11-26 15:58:01 +02:00
Antonis Lempesis
12749a0a77
first
2021-11-26 15:40:40 +02:00
dimitrispie
29f69f2f89
Sprint 4
2021-11-26 15:22:04 +02:00
Miriam Baglioni
ac07ed8251
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-11-25 12:32:58 +01:00
Miriam Baglioni
5fd0e610bf
[DOIBOOST Process] fix filtering to filter results with non null id
2021-11-25 12:10:45 +01:00
Sandro La Bruzzo
feea154e89
remove working dir after test
2021-11-25 11:02:38 +01:00
Sandro La Bruzzo
028a8acad8
add test resources
2021-11-25 10:54:47 +01:00
Sandro La Bruzzo
2164a2a889
Datacite: code refactor, created a general Scala SparkApplication from which all the Spark Scala jobs have to inherit
Added a few comments to the Datacite transformation code
2021-11-25 10:54:13 +01:00
Miriam Baglioni
3f9b2ba8ce
[Hosted By Map] fix issue in test
2021-11-22 16:59:43 +01:00
Sandro La Bruzzo
a7cf277d98
Datacite: Removed HostedBy Patch as described on ticket #7219. Now all the records will have hosted by Unknown Repository
2021-11-22 16:03:17 +01:00
Sandro La Bruzzo
483d3039d1
entity resolution: added distcp of missing entities in graph materialization
2021-11-22 15:55:24 +01:00
Sandro La Bruzzo
93fe8ce8b2
entity resolution: fix test
2021-11-22 15:50:43 +01:00
Sandro La Bruzzo
35e20b0647
updated resolution wf:
- generate a new version of the graph
- changed merge from union to join
2021-11-22 11:48:55 +01:00
Miriam Baglioni
fdb75b180e
[Cleaning] added couple of tests for DOIBOOST publications
2021-11-21 16:35:22 +01:00
Miriam Baglioni
0506fa2654
[Graph Dump] changed to mirror the changes in the model
2021-11-19 15:56:25 +01:00
Sandro La Bruzzo
3426451d3f
Merge remote-tracking branch 'origin/beta' into beta
2021-11-19 14:49:04 +01:00
Sandro La Bruzzo
4542a2338b
updated site configuration to deploy on website
2021-11-19 13:44:08 +01:00
Claudio Atzori
e5a2c596b2
Merge branch 'beta' into preserve_openorg_parent_child_relations
2021-11-19 11:35:46 +01:00
Claudio Atzori
f4538f3c4c
cleanup
2021-11-19 11:33:10 +01:00
Claudio Atzori
2b46b87f56
fixed filtering criteria applied in SparkCopyRelationsNoOpenorgs to keep the parent/child relations from OpenOrgs
2021-11-19 11:30:29 +01:00
Miriam Baglioni
9fae872181
[Graph Dump] changed to mirror the changes in the model
2021-11-19 11:25:50 +01:00
Sandro La Bruzzo
fc03c99805
fixed javadocs url after deploying site
2021-11-19 10:46:33 +01:00
Sandro La Bruzzo
0c0d561bc4
added public class into tests to create correct javadoc
2021-11-19 09:54:22 +01:00
Claudio Atzori
62fa61f3cf
merge from beta
2021-11-19 09:23:42 +01:00
Claudio Atzori
bd9a43cefd
Revert to 4094f2bb9a
2021-11-19 09:20:43 +01:00
Claudio Atzori
3a4d925386
Merge branch 'beta' into hierarchical_orgs_relations
2021-11-18 18:07:08 +01:00
Claudio Atzori
3974fa7dc1
Merge branch 'beta' into affiliationPropagation
2021-11-18 18:06:26 +01:00
Claudio Atzori
a24b9f8268
[dedup] trivial refactoring
2021-11-18 17:12:02 +01:00
Claudio Atzori
c0750fb17c
avoid non necessary count operations over large spark datasets
2021-11-18 17:11:31 +01:00
Claudio Atzori
bb5dca7979
cleanup
2021-11-18 17:10:46 +01:00
Miriam Baglioni
793b5a8e5f
Update 'dhp-workflows/dhp-graph-mapper/src/main/java/eu/dnetlib/dhp/oa/graph/dump/ResultMapper.java'
Removing the dump of Measure at the level of the result. We decided not to map it
2021-11-18 14:49:38 +01:00
Miriam Baglioni
5dc5792722
[Graph Dump] Change test resource to mirror the movement of the measure element
2021-11-18 14:39:12 +01:00
Miriam Baglioni
0136a8c266
[Graph Dump] Change test to mirror that measure is at the level of the instance
2021-11-18 14:38:33 +01:00
Miriam Baglioni
1b79c0ee79
merging with branch beta
2021-11-18 11:01:00 +01:00
Antonis Lempesis
cb3adb90f4
Merge branch 'beta' into beta
2021-11-17 14:33:45 +01:00
Antonis Lempesis
c283406829
added Universidad Politécnica de Madrid
2021-11-17 15:33:00 +02:00
Claudio Atzori
e0395719d7
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-11-17 14:17:27 +01:00
Claudio Atzori
82a4e4efae
[cleaning wf] fixed methodology to rule out invalid result titles, based on https://support.openaire.eu/issues/7206
2021-11-17 14:17:22 +01:00
Miriam Baglioni
6d4a1c57ee
[Resolve Entities] Change test dataset to mirror the modification in the creation of the map between the pids and the unresolved
2021-11-17 12:41:52 +01:00
Sandro La Bruzzo
9c82d670b8
make class public in order to create javadoc
2021-11-17 12:31:02 +01:00
Sandro La Bruzzo
1f5ee116ed
code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala
fixed test
2021-11-17 12:23:52 +01:00
Sandro La Bruzzo
2fd9ceac13
code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala
2021-11-17 11:35:22 +01:00
Sandro La Bruzzo
2506d7a679
Merge branch 'mvn_site_documentation' of code-repo.d4science.org:D-Net/dnet-hadoop into mvn_site_documentation
2021-11-17 11:07:24 +01:00
Sandro La Bruzzo
cded363b55
code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala
2021-11-17 11:06:35 +01:00
Miriam Baglioni
4094f2bb9a
added integration md file
2021-11-17 10:04:52 +01:00
Miriam Baglioni
ec8b0219ff
[Documentation] Added first page for Integration via unresolved entities generation
2021-11-16 17:41:34 +01:00
Miriam Baglioni
2bbece2ca5
merging with branch beta
2021-11-16 16:35:40 +01:00
Sandro La Bruzzo
2d67020c59
added dhp-enrichment maven site template
2021-11-16 16:01:08 +01:00
Miriam Baglioni
28ea532ece
[Affiliation Propagation] moved the selection of graph relation as a preparation step
2021-11-16 15:24:19 +01:00
Sandro La Bruzzo
18c1d70ef4
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into mvn_site_documentation
2021-11-16 15:16:49 +01:00
Sandro La Bruzzo
a1cafaf2e3
added mvn site for dnet-hadoop project
2021-11-16 15:16:28 +01:00
Miriam Baglioni
7c96e3fd46
removed not useful dir
2021-11-16 13:57:26 +01:00
Miriam Baglioni
c7c0c3187b
[AFFILIATION PROPAGATION] Applied some SonarLint suggestions
2021-11-16 13:56:32 +01:00
Miriam Baglioni
c6a9f0a1a8
merging with branch beta
2021-11-16 12:04:40 +01:00
Miriam Baglioni
99d86134f5
[Graph Dump] changed the dump since the measures have been moved at the level of the instance
2021-11-16 12:04:21 +01:00
Claudio Atzori
0a727d325d
[dedup] increased number of partitions in the consistency phase
2021-11-16 08:43:41 +01:00
Claudio Atzori
bafa2990f3
code formatting
2021-11-15 17:07:16 +01:00
Claudio Atzori
668ac25224
[graph resolution] using existing argument parser file name
2021-11-15 17:02:45 +01:00
Claudio Atzori
7d0a03f607
[graph resolution] minor
2021-11-15 14:45:54 +01:00
Claudio Atzori
941a50a2fc
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-11-15 14:42:49 +01:00
Claudio Atzori
7c804acda8
[graph resolution] minor
2021-11-15 14:42:43 +01:00
Sandro La Bruzzo
efa09057db
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-11-15 14:32:09 +01:00
Sandro La Bruzzo
48923e46a1
added documentation to Pubmed Class and also added mvn site for dhp-aggregations
2021-11-15 14:32:01 +01:00
Claudio Atzori
d2c787d416
[graph resolution] fixed sequence of the workflow steps
2021-11-15 14:31:15 +01:00
Claudio Atzori
975b10b711
[actionmanager] increased spark.sql.shuffle.partitions to 5000
2021-11-15 12:31:45 +01:00
Miriam Baglioni
4ec88c718c
merge with beta - resolved conflict in pom
2021-11-15 10:52:16 +01:00
Miriam Baglioni
6f1a434e90
[Bypass Action Set] Fixed test to consider the new identifier utils
2021-11-15 09:59:23 +01:00
Miriam Baglioni
157d33ebf9
[Bypass Action Set] Refactoring
2021-11-15 09:58:48 +01:00
Miriam Baglioni
6595135a1a
[Dump Schemas] changed the schema of the dumped result according to the modifications in the bestAccessRight type
2021-11-12 11:45:38 +01:00
Miriam Baglioni
43cae4ad88
Merge branch 'dump' of https://code-repo.d4science.org/D-Net/dnet-hadoop into dump
2021-11-12 11:36:54 +01:00
Miriam Baglioni
b3f9370125
merge with beta - resolved conflict in pom
2021-11-12 11:25:26 +01:00
Miriam Baglioni
92d0e18b55
[Bypass Action Set] used constant DOI instead of "doi"
2021-11-12 10:56:58 +01:00
Miriam Baglioni
881113743f
[Bypass Action Set] refactoring
2021-11-12 10:55:50 +01:00
Miriam Baglioni
47ccb53c4f
[Bypass Action Set] modification for comment D-Net/dnet-hadoop#157 (comment)
2021-11-12 10:54:09 +01:00
Miriam Baglioni
ffb0ce1d59
merge with beta - resolved conflict in pom
2021-11-12 10:19:59 +01:00
Miriam Baglioni
716021546e
[Bypass Action Set] minor fix
2021-11-12 10:18:01 +01:00
Sandro La Bruzzo
3469cc2b1d
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-11-12 09:56:52 +01:00
Sandro La Bruzzo
a7763d2492
removed alternate identifier in resolutionMap
2021-11-12 09:56:45 +01:00
Miriam Baglioni
b8bdabfae9
[Graph Dump] removed OpenAccessRoute from test in best access right
2021-11-11 16:16:48 +01:00
Miriam Baglioni
e5498052e8
[Graph Dump] removed OpenAccessRoute from test in best access right
2021-11-11 16:14:10 +01:00
Miriam Baglioni
935062edec
[Bypass Action Set] creation of unresolved entities
2021-11-11 16:11:25 +01:00
Antonis Lempesis
26f086dd64
removed the too restrictive clause. will discuss again
2021-11-11 12:57:19 +02:00
Claudio Atzori
148289150f
Merge branch 'beta' into doiboost_url
2021-11-11 10:40:19 +01:00
Sandro La Bruzzo
2ca0a436ad
added SparkResolveEntities node to the oozie wf
2021-11-11 10:25:42 +01:00
Sandro La Bruzzo
9cb195314f
implemented and tested resolution of entities
2021-11-11 10:17:40 +01:00
Miriam Baglioni
6d3c4c4abe
merging with branch beta
2021-11-11 08:59:53 +01:00
Miriam Baglioni
8cc50ecee0
[Graph Dump] changed AccessRight with BestAccessRight in the dump and modified the dependency to the schema to the SNAPSHOT
2021-11-11 08:59:20 +01:00
Miriam Baglioni
88b73f4f49
merging with branch beta
2021-11-10 17:00:52 +01:00
Miriam Baglioni
c371b23077
-
2021-11-10 17:00:37 +01:00
Alessia Bardi
fc8fceaac3
create direct link to WT projects as well
2021-11-10 14:11:52 +01:00
Alessia Bardi
6cd91004e3
fixed DOI for Wellcome Trust in mapping relationships from Crossref
2021-11-09 12:22:57 +01:00
Miriam Baglioni
9e214ce0eb
[BypassAS] addition of OC relations
2021-11-09 12:07:19 +01:00
Alessia Bardi
b9d4f115cc
fixed Crossref mapping for SFI projects
2021-11-09 12:04:45 +01:00
Sandro La Bruzzo
6477a40670
implement filter of openCitation
2021-11-09 11:27:12 +01:00
Miriam Baglioni
6f7ca539c6
[BypassAS] update of results for bipFinder and FOS
2021-11-09 11:25:41 +01:00
Miriam Baglioni
a7d50c499b
[BypassAS] prepare FOS subject, test and model for FOS and BipFinder scores
2021-11-08 16:44:19 +01:00
Antonis Lempesis
91354c6068
- fetching all context related results
- storing tables as parquet
2021-11-08 15:15:46 +02:00
Miriam Baglioni
94918a673c
[Graph DUMP] Fix issue for empty originalId list
2021-11-08 10:25:28 +01:00
Claudio Atzori
9cb8e4ad21
Merge branch 'beta' into hierarchical_orgs_relations
2021-11-08 09:40:24 +01:00
Miriam Baglioni
4c70201412
merging with branch beta
2021-11-05 12:29:56 +01:00
Miriam Baglioni
8442efd8d1
[Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE
2021-11-05 12:29:22 +01:00
Claudio Atzori
5681e89544
Update 'dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump/schemas/result_schema.json'
2021-11-05 12:18:24 +01:00
Miriam Baglioni
a22c29fba1
[Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE
2021-11-05 12:08:33 +01:00
Miriam Baglioni
c10ff6928c
[Graph DUMP] add schema of the dump related to the model as in dhp-schemas.2.8.31. Note the measure element at the level of the result has been removed because of issues on where to display it: at the level of the result or at the level of the entity
2021-11-05 11:36:21 +01:00
Miriam Baglioni
0857849a86
[Graph DUMP] Remove dump of measure until it will be clear where to put it (at the level of result or at the level of the instance)
2021-11-05 11:02:37 +01:00
Miriam Baglioni
df7ee77c7a
[DOIBoost Mapping] removed not needed comments
2021-11-04 16:24:07 +01:00
Miriam Baglioni
de63d29b6f
[DOIBoost Mapping] Fix to avoid producing results with null as identifier (probably due to the filtering function in the factory for the creation of the id)
2021-11-04 16:16:40 +01:00
Miriam Baglioni
d50057b2d9
[DOIBoost Mapping] changed the way to create the url for the instance: we use the Crossref guidelines https://doi.org/doi
2021-11-03 16:59:37 +01:00
Miriam Baglioni
edf55395e9
added test resource
2021-11-03 16:49:30 +01:00
Miriam Baglioni
d97ea82a29
[DOIBoost Mapping] Added test to verify the instance created for Crossref will have just the url related to the doi
2021-11-03 16:45:15 +01:00
Miriam Baglioni
96769b4481
[DOIBoost - Mapping] Changed the logic which brought into the instance urls that should not be there: the url of the doi in the json is reachable from the root (json/"URL"); other urls were added from the links element. Now the mapping from the link element has been removed
2021-11-03 16:43:36 +01:00
Miriam Baglioni
683fe093cf
[DOIBoost - Mapping] Remove the addition of the instance to the MAG publication record
2021-11-03 15:51:26 +01:00
Miriam Baglioni
b2bb8d9d79
[DOIBoost - Mapping] selecting the url from Crossref containing the doi
2021-11-03 15:44:57 +01:00
Miriam Baglioni
779318961c
[DOIBoost - Mapping] removed the url from crossref containing the api.elsevier.com... string in the url
2021-11-03 14:38:52 +01:00
Miriam Baglioni
2480e590d1
[DOIBoost - Mapping] changed the type on which to map dissertation from Crossref: from 006 Doctoral thesis to 0044 Thesis since dissertation could be either Doctoral or master thesis
2021-11-03 14:25:23 +01:00
Miriam Baglioni
b9d124bb7c
[Enrichment: Propagation through parent-child relationships] Added counters, and changed constraint to verify if filtering out the relation (from classname = harvested to classid != propagation)
2021-11-03 13:55:37 +01:00
Sandro La Bruzzo
7bd224f051
implement first version of scholexplorer integration for the generation of final graph
2021-11-02 15:58:15 +01:00
Antonis Lempesis
b97b78f874
removed hardcoded reference
2021-11-02 09:12:49 +01:00
Claudio Atzori
7fa49f6956
Merge pull request 'removed hardcoded reference' ( #154 ) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#154
2021-11-02 09:11:30 +01:00
Antonis Lempesis
f78afb5ef9
removed hardcoded reference
2021-11-01 15:42:29 +02:00
Miriam Baglioni
2aca6bfa0a
merging with branch beta
2021-10-29 11:20:45 +02:00
Miriam Baglioni
09f36cffb8
[Enrichment: Propagation through parent-child relationships] First implementation, testing, and wf for propagation of result to organization through semantic relation
2021-10-29 11:20:03 +02:00
Claudio Atzori
1225ba0b92
[resolution] increasing number of partitions to avoid OOM
2021-10-28 16:18:17 +02:00
Sandro La Bruzzo
d9cbca83f7
moved filter on next phase
2021-10-28 16:13:24 +02:00
Claudio Atzori
d02caef185
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-27 15:36:29 +02:00
Sandro La Bruzzo
1be9aa0a5f
Removed filter of datacite items from the raw graph merging phase, Datacite is not an actionset anymore in beta
2021-10-26 17:52:20 +02:00
Sandro La Bruzzo
4acfa8fa2e
Scholexplorer Datasource Aggregation:
...
- Added collectedfrom in the inverse relation generated
Relation resolution:
- increased number of partitions in workflow.xml
- using classid instead of classname to build the pid-dnetId mapping
2021-10-26 17:51:20 +02:00
Miriam Baglioni
d0ef7d91c5
adding test resource
2021-10-26 17:34:11 +02:00
Sandro La Bruzzo
034304b33a
conflict resolved on merge
2021-10-26 09:40:47 +02:00
Michele Artini
d66e20e7ac
added hierarchy rel in ROR actionset
2021-10-21 15:51:48 +02:00
Claudio Atzori
d147295c2f
avoiding java.io.NotSerializableException: java.util.HashMap
2021-10-21 14:15:57 +02:00
Claudio Atzori
3702fe478d
cleanup
2021-10-21 12:05:02 +02:00
Sandro La Bruzzo
ac36aa7d1c
fixed wrong Encoding during a map phase
2021-10-21 11:35:02 +02:00
Sandro La Bruzzo
aeeebd573b
code refactor renamed datacite package
2021-10-20 17:37:42 +02:00
Sandro La Bruzzo
ab3a99d3e9
removed old datacite oozie workflow
2021-10-20 17:19:47 +02:00
Sandro La Bruzzo
ae4e99a471
Adapted workflow of resolution of PID to work into OpenAIRE data workflow
- Added relations in both directions on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Claudio Atzori
cece432adc
[stats] reducing the step22 wait time
2021-10-20 14:16:33 +02:00
Antonis Lempesis
a7376907c2
invalidating metadata before context thingies
2021-10-20 14:16:25 +02:00
Antonis Lempesis
43f4eb492b
fetching affiliated results for 4 orgs in monitor. fixed affiliated orgs in stats db
2021-10-20 14:16:11 +02:00
Claudio Atzori
4f8970f8ed
[stats] reducing the step22 wait time
2021-10-20 14:14:53 +02:00
Claudio Atzori
00b78b9c58
cleanup: mapping contents in the graph already defined in the OAF graph model doesn't require to be aware of the vocabularies
2021-10-20 14:04:45 +02:00
Claudio Atzori
c01dd0c925
registered oaf model classes for the KryoSerializer
2021-10-20 13:55:07 +02:00
Miriam Baglioni
652114c641
[affiliationPropagation] first try. preparation
2021-10-20 11:44:23 +02:00
Claudio Atzori
59f76b50d4
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-20 09:42:35 +02:00
Antonis Lempesis
241dcf6df1
Merge branch 'beta' into beta
2021-10-19 23:54:21 +02:00
Claudio Atzori
515e068a78
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-19 16:46:06 +02:00
Claudio Atzori
512e7b0170
code formatting
2021-10-19 16:19:29 +02:00
Michele Artini
c4fce785ab
fixed a compilation problem of a unit test
2021-10-19 16:18:26 +02:00
Claudio Atzori
e9157c67aa
Merge branch 'beta' into dump
2021-10-19 16:15:03 +02:00
Claudio Atzori
98f37c8d81
WIP: workflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 16:14:40 +02:00
Claudio Atzori
c8850456e9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-19 16:09:54 +02:00
Claudio Atzori
172363e7f1
[broker] integrating PR#147, notification record creation phase separated from indexing on ES
2021-10-19 15:56:27 +02:00
Claudio Atzori
bdffa86c2f
undo last commit
2021-10-19 15:39:38 +02:00
Sandro La Bruzzo
c9870c5122
code formatted
2021-10-19 15:24:59 +02:00
Sandro La Bruzzo
f8329bc110
since dhp-schemas changed, introducing new Relation inverse model, this class has been updated
2021-10-19 15:24:22 +02:00
Claudio Atzori
e471f12d5e
hotfix: recovered implementation removing the hardcoded working_dirs
2021-10-19 12:35:38 +02:00
Claudio Atzori
7a73010acd
WIP: workflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 11:59:16 +02:00