sab
ef6c90cc64
implemented methods to extract fulltext link from an API call
2024-09-11 14:57:38 +02:00
sab
df82f8beb9
code adapted as per Michele's recommendations
2024-09-04 15:29:13 +02:00
sab
53787dbf67
code refactored
2024-08-01 09:52:19 +02:00
sab
bbb79273a3
conversion to Dublin Core has been implemented
2024-08-01 01:23:04 +02:00
sab
7f39375ba8
data fetcher has been implemented
2024-07-31 18:05:11 +02:00
Claudio Atzori
d20a5e020a
[graph provision] log the Solr admin application operations for alias deletion and creation
2024-07-15 16:31:04 +02:00
Claudio Atzori
3d1d8e6036
renamed workflow to better reflect its purpose
2024-07-15 15:24:18 +02:00
Claudio Atzori
0b1c58358b
Merge pull request '[broker] fixing the mapping of ORCID for the identification of the enrichments' ( #458 ) from broker_orcid into main
...
Reviewed-on: D-Net/dnet-hadoop#458
2024-07-15 11:34:01 +02:00
Claudio Atzori
b70a440aca
renamed class, updated criteria to consider the ORCIDs used in the matchers
2024-07-12 17:09:01 +02:00
Michele Artini
36c3df1652
tests
2024-07-12 15:29:45 +02:00
Claudio Atzori
2f13683285
[broker] fine tuned the workflow memory settings
2024-07-12 10:27:24 +02:00
Claudio Atzori
5ab409dcab
[metadata collection] added -Dcom.sun.security.enableAIAcaIssuers=true as a default for metadata collection
2024-07-12 10:26:32 +02:00
Claudio Atzori
b756cfeb85
Merge pull request 'set JAVA_HOME and JAVA_OPTS in metadata collection' ( #457 ) from metadata_collection_java_upgrade into main
...
Reviewed-on: D-Net/dnet-hadoop#457
2024-07-11 15:32:11 +02:00
Claudio Atzori
51d6a541bd
[metadata collection] added the possibility to specify the JAVA_HOME and the JAVA_OPTS parameters
2024-07-11 15:24:29 +02:00
Claudio Atzori
07ce92cef2
[OAI-PMH] fixed node name
2024-07-11 11:00:23 +02:00
Miriam Baglioni
f043b7b096
[Irish Tender] changed the irish.json file according to comments #26, #29, and #34 for 9635
2024-07-04 12:22:56 +02:00
Claudio Atzori
153b56eeff
make entity level pids unique by pidType:pidValue
2024-07-04 09:41:39 +02:00
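A minimal sketch of what "unique by pidType:pidValue" could look like, assuming each pid is a plain dictionary with hypothetical `type` and `value` keys (the actual project code is not shown in this log and most likely works on its own schema classes):

```python
def unique_pids(pids):
    """Keep the first pid seen for every 'type:value' key, preserving order."""
    seen = {}
    for pid in pids:
        # Hypothetical key layout taken from the commit message: pidType:pidValue
        key = f"{pid['type']}:{pid['value']}"
        seen.setdefault(key, pid)
    return list(seen.values())
```

Two pids sharing a value but not a type remain distinct under this key, which is presumably why the type is part of it.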
Claudio Atzori
ed97ba4565
Merge pull request '[prod] Openaire Affiliation Inference' ( #453 ) from affRoFromRawStringmain into main
...
Reviewed-on: D-Net/dnet-hadoop#453
2024-07-03 12:32:26 +02:00
Claudio Atzori
7b398a6d0b
updated import of organization types from OpenOrgs
2024-07-03 11:11:35 +02:00
Claudio Atzori
13f6506ce5
Change the selection criteria for the pivot record of a group so that the best pid type becomes the first criterion. This will have the effect of slowly converging to records having a DOI
2024-07-03 10:44:01 +02:00
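The idea of ranking by best pid type first can be sketched as follows. The ranking table, the record layout and the tie-breaker on the record id are all assumptions made for illustration; only "DOI should win" is suggested by the commit message.

```python
# Hypothetical ranking, lower is better. Only the preference for DOI comes from the log.
PID_RANK = {"doi": 0, "pmid": 1, "arxiv": 2}
WORST_RANK = len(PID_RANK)


def best_pid_rank(record):
    """Rank of the best pid type carried by a record (WORST_RANK if none is known)."""
    ranks = (PID_RANK.get(pid["type"], WORST_RANK) for pid in record["pids"])
    return min(ranks, default=WORST_RANK)


def pick_pivot(group):
    """Choose the pivot: best pid type first, then record id as an assumed tie-breaker."""
    return min(group, key=lambda record: (best_pid_rank(record), record["id"]))
```

Because the pid type is the first element of the sort key, a record with a DOI is picked over any record without one, regardless of the later criteria.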
Claudio Atzori
3d9ddaa23a
importing organization types from OpenOrgs
2024-07-03 10:15:37 +02:00
Claudio Atzori
c06dfdfd86
ignore dates containing 'null's
2024-07-02 15:43:11 +02:00
Claudio Atzori
b822b34abe
code formatting
2024-07-01 09:22:35 +02:00
Michele De Bonis
ea1841fbd2
implementation of countryMatch and addition of workflow parameters
2024-07-01 09:14:32 +02:00
Miriam Baglioni
4dbce39237
[AffiliationInference]Extended the affiliation ingestion from OpenAIRE to include also the links derived from web crawl. Changed the provenance from BIP! to OpenAIRE
2024-06-29 18:51:06 +02:00
Miriam Baglioni
3ee8a7d18a
[WebCrawl]moved to Constants web crawl name and id
2024-06-29 18:47:23 +02:00
Claudio Atzori
ee7deb3f60
[graph provision] publicFormat workflow parameter defined as optional
2024-06-28 14:52:43 +02:00
Claudio Atzori
157cc8be87
[graph provision] fixed serialization of the instancetypes
2024-06-28 14:21:12 +02:00
Claudio Atzori
023099a921
imported from beta
2024-06-26 11:40:16 +02:00
Claudio Atzori
786c217085
Using the updated Solr JSON payload model classes
2024-06-26 11:11:33 +02:00
Lampros Smyrnaios
c858c02111
- Fix not using the "export HADOOP_USER_NAME" statement in "createPDFsAggregated.sh", which caused permission-issues when creating tables with Impala.
...
- Remove unused "--user" parameter in "impala-shell" calls.
- Code polishing.
2024-06-26 10:11:21 +02:00
Claudio Atzori
8220e27110
Merge pull request 'Align Solr JSON records to the explore portal requirements' ( #448 ) from json_payload into beta_to_master_may2024
...
Reviewed-on: D-Net/dnet-hadoop#448
2024-06-25 09:57:40 +02:00
Claudio Atzori
bc993d49c1
Update pom.xml
...
depend on released schema version
2024-06-25 09:57:06 +02:00
Claudio Atzori
1dc7458de2
added JSON payload to the SolrInputDocument, updated unit tests
2024-06-24 14:48:09 +02:00
Claudio Atzori
a7a54aab47
WIP: align Solr JSON records to the explore portal requirements
2024-06-20 15:48:45 +02:00
Miriam Baglioni
eaa00a4199
[IrishFunderList] changes made according to 9635 comments 20, 21, 22 and 23
2024-06-20 12:32:57 +02:00
Claudio Atzori
fb731b6d46
WIP: align Solr JSON records to the explore portal requirements
2024-06-19 15:38:43 +02:00
Miriam Baglioni
b6da35e736
[IrishFunderList] changes made according to 9635 comments 14, 15 and 16
2024-06-19 11:06:58 +02:00
Lampros Smyrnaios
3c9b8de892
Miscellaneous updates to the copying operation to Impala Cluster:
...
- Fix not breaking out of the VIEWS-infinite-loop when the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" is set to "false".
- Exit the script when no HDFS-active-node was found, independently of the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR".
- Fix view_name-recognition in a log-message, by using the more advanced "Perl-Compatible Regular Expressions" in "grep".
- Add error-handling for "compute stats" errors.
2024-06-18 15:59:34 +02:00
Antonis Lempesis
c67ef157d3
filtering out deletedbyinference and invisible results from accessroute
2024-06-18 15:59:00 +02:00
Lampros Smyrnaios
c23f3031ed
Miscellaneous updates to the copying operation to Impala Cluster:
...
- Show some counts and the elapsed time for various sub-tasks.
- Code polishing.
2024-06-18 15:58:46 +02:00
Claudio Atzori
8ec151aa3d
[graph indexing] comment out setting the JSON payload from the SolrInputDocuments
2024-06-18 15:53:24 +02:00
Claudio Atzori
2636936162
[IE OAI-PMH] fixed oozie wf definition
2024-06-14 11:47:37 +02:00
Miriam Baglioni
ef437a8cdf
[Provision] temporarily removed JSON payload from indexed records (Shadow cannot support it)
2024-06-13 16:48:03 +02:00
Miriam Baglioni
86088ef26e
Merge remote-tracking branch 'origin/beta_to_master_may2024' into beta_to_master_may2024
2024-06-11 17:04:07 +02:00
Miriam Baglioni
143c525343
[WebCrawl]remove relations for pid not doi
2024-06-11 17:03:59 +02:00
Claudio Atzori
c371513d43
[graph resolution] use sparkExecutorMemory to define also the memoryOverhead
2024-06-11 14:21:01 +02:00
Claudio Atzori
71927ca818
avoid NPEs
2024-06-11 12:40:50 +02:00
Giambattista Bloisi
46018dc804
Fix UnsupportedOperationException while merging two Result's contexts due to modification of an immutable collection
2024-06-11 10:39:48 +02:00
Miriam Baglioni
3efd5b1308
[SDGActionSet] remove datainfo for the result. It is not needed (qualifier.classid = UPDATE) since subjects do not go at the level of the instance
2024-06-11 10:35:57 +02:00
Miriam Baglioni
196fa55774
Merge remote-tracking branch 'origin/beta_to_master_may2024' into beta_to_master_may2024
2024-06-11 10:26:24 +02:00
Miriam Baglioni
50805e3fc1
[FoSActionSet] remove datainfo for the result. It is not needed (qualifier.classid = UPDATE) since subjects do not go at the level of the instance
2024-06-11 10:25:46 +02:00
Claudio Atzori
d39a1054b8
[actionset promotion] use sparkExecutorMemory to define also the memoryOverhead
2024-06-10 16:15:07 +02:00
Claudio Atzori
576efc1857
hostedby patching to work with the updated Crossref contents
2024-06-10 15:22:33 +02:00
Claudio Atzori
efc1632e16
code formatting
2024-06-06 09:25:26 +02:00
Claudio Atzori
91b49366c6
[graph provision] align serialisation of the usage count measures to the agreed specifications
2024-06-05 16:34:40 +02:00
Claudio Atzori
5e05385d35
minor
2024-06-05 16:31:58 +02:00
Miriam Baglioni
c4d9b5b9d2
[downloadsAndViews]update the test file to consider the new serialization for downloads and views
2024-06-05 16:30:15 +02:00
Miriam Baglioni
bf9a5e6314
[downloadsAndViews]changed the test file to check the indicators are not there if their value is 0
2024-06-05 16:29:40 +02:00
Miriam Baglioni
9d79ddb3dd
[bulkTag] fixed issue that made project disappear in graph_10_enriched
2024-06-05 16:20:40 +02:00
Miriam Baglioni
907aa28c6c
[downloadsAndViews] fixed issue
2024-06-05 16:19:29 +02:00
Miriam Baglioni
3955ceaa76
[downloadsAndViews] changed the serialization for downloads and views
2024-06-05 16:18:46 +02:00
Miriam Baglioni
128c143394
[downloadsAndViews] extended test file with measures for downloads and views
2024-06-05 16:17:59 +02:00
Claudio Atzori
5133993ee5
Merge branch 'beta_to_master_may2024' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta_to_master_may2024
2024-06-05 12:17:48 +02:00
Claudio Atzori
5cf259a851
[graph2hive] use sparkExecutorMemory to define also the memoryOverhead
2024-06-05 12:17:16 +02:00
Claudio Atzori
e1828fc60e
Merge pull request '[PROD] Irish oaipmh exporter' ( #444 ) from irish-oaipmh-exporter into beta_to_master_may2024
...
Reviewed-on: D-Net/dnet-hadoop#444
2024-06-05 10:56:20 +02:00
Claudio Atzori
81090ad593
[IE OAI-PMH] added oozie workflow, minor changes, code formatting
2024-06-05 10:03:33 +02:00
Claudio Atzori
56920b447d
Merge pull request 'Fix for missing collectedfrom after dedup' ( #442 ) from fix_mergedcliquesort into beta_to_master_may2024
...
Reviewed-on: D-Net/dnet-hadoop#442
2024-06-03 15:34:01 +02:00
Giambattista Bloisi
3feab5d92d
Fix MergeUtils.mergeGroup: it could get rid of some records and did not consider all PID authorities while sorting records.
...
ResultTypeComparator is now renamed to MergeEntitiesComparator and can be used as a general comparator for merging groups of records
2024-06-03 15:13:40 +02:00
Claudio Atzori
6be783caec
[graph cleaning] use sparkExecutorMemory to define also the memoryOverhead
2024-05-29 14:36:49 +02:00
Claudio Atzori
b703f94f09
Merge pull request 'changes in copy script - beta2master' ( #439 ) from antonis.lempesis/dnet-hadoop:beta into beta_to_master_may2024
...
Reviewed-on: D-Net/dnet-hadoop#439
2024-05-29 14:29:26 +02:00
Miriam Baglioni
14f275ffaf
[NOAMI] removed Ireland funder id 501100011103. ticket 9635
2024-05-29 11:54:17 +02:00
Claudio Atzori
a428e7be7e
graph cleaning to implement ugly hardcoded rules, avoid NPEs
2024-05-29 09:26:12 +02:00
Lampros Smyrnaios
e3f28338c1
Miscellaneous updates to the copying operation to Impala Cluster:
...
- Assign the WRITE and EXECUTE permissions to the DBs' HDFS-directories, in order to be able to create tables on top of them, in the Impala Cluster.
- Make sure the "copydb" function returns early, when it encounters a fatal error, while respecting the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" config.
2024-05-28 17:51:45 +03:00
Claudio Atzori
8e45c5baa8
graph cleaning to implement ugly hardcoded rules
2024-05-28 15:28:42 +02:00
Claudio Atzori
db5e18c784
hostedby patching to work with the updated Crossref contents
2024-05-28 15:28:13 +02:00
Claudio Atzori
fb266efbcb
[org dedup] avoid NPEs in SparkPrepareNewOrgs
2024-05-26 21:23:30 +02:00
Claudio Atzori
d7daf54333
[org dedup] avoid NPEs in SparkPrepareOrgRels
2024-05-26 16:48:11 +02:00
Claudio Atzori
f99eaa0376
Merge branch 'beta_to_master_may2024' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta_to_master_may2024
2024-05-26 15:45:41 +02:00
Claudio Atzori
23312fcc1e
[org dedup] avoid NPEs in SparkPrepareOrgRels
2024-05-26 15:43:24 +02:00
Miriam Baglioni
b864f0adcf
Update to include a blackList that filters out the results we know are wrongly associated to IE - update workflow definition - the blacklist parameter
2024-05-24 16:01:19 +02:00
Miriam Baglioni
7a44869d87
Update to include a blackList that filters out the results we know are wrongly associated to IE - refactoring
2024-05-24 15:23:42 +02:00
Miriam Baglioni
12ffde023f
Update to include a blackList that filters out the results we know are wrongly associated to IE
2024-05-24 12:28:24 +02:00
Antonis Lempesis
15b54a345a
added fos lvl4
2024-05-24 13:21:28 +03:00
Lampros Smyrnaios
b48ed6e617
Change configuration in the copy-operation to Impala Cluster:
...
Set the "SHOULD_EXIT_WHOLE_SCRIPT_UPON_ERROR" parameter to "false".
2024-05-23 16:58:12 +03:00
Lampros Smyrnaios
68322843e2
Small updates to the copy-operation to Impala Cluster:
...
- Add a configuration-"switch" to control whether the script exits upon an error or not.
- Allow the script to exit when a table could not be created.
- Show the elapsed time for processing each database.
2024-05-23 15:07:49 +03:00
Lampros Smyrnaios
c7b32bbacc
Update CopyDataToImpalaCluster:
...
Update the code of acquiring the entities from Ocean cluster, through hive, in order to optimize the process and account for additional reserved keywords in Impala.
Co-authored-by: Antonis Lempesis <antleb@di.uoa.gr>
2024-05-23 13:00:19 +03:00
Claudio Atzori
c3fe59bc78
fixed conflicts merging from beta, code formatting
2024-05-21 14:50:40 +02:00
Claudio Atzori
1ea67eba82
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-05-21 13:48:48 +02:00
Claudio Atzori
f9fb2fef6e
Merge pull request 'Modification of Microsoft Academic Graph Mapping' ( #435 ) from mag_only_doi into beta
...
Reviewed-on: D-Net/dnet-hadoop#435
2024-05-21 13:48:42 +02:00
Claudio Atzori
834461ba26
[graph provision]fixed wf definition, revised serialization of the usage counts measures
2024-05-21 13:48:06 +02:00
Sandro La Bruzzo
032bcc8279
Since the last beta workflow we decided to introduce in the graph only MAG items with a DOI and set them invisible (this should be the same behaviour as the previous DOIBoost mapping).
...
This commit applies this type of mapping
2024-05-20 09:24:15 +02:00
Claudio Atzori
92f018d196
[graph provision] fixed path pointing to an intermediate data store in the working directory
2024-05-15 15:39:18 +02:00
Claudio Atzori
0611c81a2f
[graph provision] using Qualifier.classNames to populate the corresponding fields in the JSON payload
2024-05-15 15:33:10 +02:00
Michele Artini
2b3b5fe9a1
oai finalization and test
2024-05-15 14:13:16 +02:00
Claudio Atzori
1efe7f7e39
[graph provision] upgrade to dhp-schema:6.1.2, included project.oamandatepublications in the JSON payload mapping, fixed serialisation of the usageCounts measures
2024-05-14 12:39:31 +02:00
Claudio Atzori
53e7bb4336
Merge pull request 'rest-collector-plugin-with-retry' ( #432 ) from rest-collector-plugin-with-retry into beta
...
Reviewed-on: D-Net/dnet-hadoop#432
2024-05-10 09:02:33 +02:00
Claudio Atzori
f7d56e2ef2
Merge branch 'beta' into rest-collector-plugin-with-retry
2024-05-10 09:02:21 +02:00
Claudio Atzori
c1237ab39e
Merge pull request 'Fixes in Graph Provision' ( #434 ) from beta_provision_relation into beta
...
Reviewed-on: D-Net/dnet-hadoop#434
2024-05-09 14:15:05 +02:00
Claudio Atzori
dc3a5858f7
Merge branch 'beta' into beta_provision_relation
2024-05-09 14:14:43 +02:00
Claudio Atzori
55f39f7850
[graph provision] adds the possibility to validate the XML records before storing them via the validateXML parameter
2024-05-09 14:06:04 +02:00
Claudio Atzori
39a2afe8b5
[graph provision] fixed XML serialization of the usage counts measures, renamed workflow actions to better reflect their role
2024-05-09 13:54:42 +02:00
Claudio Atzori
908ed9da7a
Merge pull request 'Various fixes in the stats wf' ( #430 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#430
2024-05-08 13:41:02 +02:00
Antonis Lempesis
0cada3cc8f
every step is run in the analytics queue. Hardcoded for now, will make a parameter later
2024-05-08 13:42:53 +03:00
Antonis Lempesis
90a4fb3547
fixed typos
2024-05-08 13:17:58 +03:00
Claudio Atzori
18aa323ee9
cleanup unused classes, adjustments in the oozie wf definition
2024-05-08 11:36:46 +02:00
Michele Artini
c9a327bc50
refactoring of gzip method
2024-05-08 11:34:08 +02:00
Michele Artini
e234848af8
oaf record: xpath for root
2024-05-08 10:00:53 +02:00
Claudio Atzori
b4e3389432
fixed property mapping creating the RelatedEntity transient objects. spark cores & memory adjustments. Code formatting
2024-05-07 16:25:17 +02:00
Giambattista Bloisi
711048ceed
PrepareRelationsJob rewritten to use Spark Dataframe API and Windowing functions
2024-05-07 15:44:33 +02:00
Michele Artini
70bf6ac415
oai exporter tests
2024-05-07 09:36:26 +02:00
Michele Artini
aa40e53c19
oai exporter parameters
2024-05-07 08:01:19 +02:00
Michele Artini
ed052a3476
job for the population of the oai database
2024-05-06 16:08:33 +02:00
Claudio Atzori
26363060ed
fixed id prefix creation for the fosnodoi records, again
2024-05-03 15:53:52 +02:00
Claudio Atzori
0486227185
[cleaning] deactivating the cleaning of FOS subjects found in the metadata provided by repositories
2024-05-03 14:31:12 +02:00
Claudio Atzori
a5d13d5d27
code formatting
2024-05-03 14:14:34 +02:00
Claudio Atzori
e1a0fb8933
fixed id prefix creation for the fosnodoi records
2024-05-03 14:14:18 +02:00
Giambattista Bloisi
69c5efbd8b
Fix: when applying enrichments with no instance information the resulting merge entity was generated with no instance instead of keeping the original information
2024-05-03 13:57:56 +02:00
Claudio Atzori
00ad21d814
Merge pull request 'preparations for dhp-common beta release 1.2.5' ( #433 ) from beta-release-1.2.5 into beta
...
Reviewed-on: D-Net/dnet-hadoop#433
2024-05-02 11:28:19 +02:00
Claudio Atzori
4355f64810
reverted to version 1.2.5-SNAPSHOT
2024-05-02 11:23:53 +02:00
Claudio Atzori
66680b8b9a
refactoring of common utilities
2024-05-02 11:16:58 +02:00
Claudio Atzori
dcf23b3d06
Merge branch 'beta' into beta-release-1.2.5
2024-05-02 10:01:49 +02:00
Michele Artini
f4068de298
code reindent + tests
2024-05-02 09:51:33 +02:00
Claudio Atzori
11bd89e132
[enrichment] use sparkExecutorMemory to define also the memoryOverhead
2024-05-01 08:32:59 +02:00
Claudio Atzori
e96c2c1606
[ranking wf] set spark.executor.memoryOverhead to fine tune the resource consumption
2024-04-30 16:23:25 +02:00
Claudio Atzori
50c18f7a0b
[dedup wf] revised memory settings to address the increased volume of input contents
2024-04-30 12:34:16 +02:00
Michele Artini
2615136efc
added a retry mechanism
2024-04-30 11:58:42 +02:00
Claudio Atzori
c08a58bba8
Merge pull request 'Miscellaneous related to changes in MergeUtils' ( #429 ) from misc_fixes_merge_entities into beta
...
Reviewed-on: D-Net/dnet-hadoop#429
2024-04-24 08:55:37 +02:00
Claudio Atzori
e2937db385
Merge branch 'beta' into misc_fixes_merge_entities
2024-04-24 08:55:28 +02:00
Giambattista Bloisi
1878199dae
Miscellaneous fixes:
...
- in Merge By ID pick by preference those records coming from delegated Authorities
- fix various tests
- close spark session in SparkCreateSimRels
2024-04-24 08:12:45 +02:00
Lampros Smyrnaios
49af2e5740
Miscellaneous updates to the copying operation to Impala Cluster:
...
- Update the algorithm for creating views that depend on other views; overcome some bash-instabilities.
- Upon any error, fail the whole process, not just the current DB-creation, as those errors usually indicate a bug in the initial DB-creation, that should be fixed immediately.
- Enhance parallel-copy of large files by "hadoop distcp" command.
- Reduce the "invalidate metadata" commands to just the current DB's tables, in order to eliminate the general overhead on Impala.
- Show the number of tables and views in the logs.
- Fix some log-messages.
2024-04-23 17:15:04 +03:00
Antonis Lempesis
d2649a1429
increased the jvm ram
2024-04-23 16:03:16 +03:00
Claudio Atzori
c3053ef34d
using version 1.2.5-beta for the release
2024-04-23 14:52:32 +02:00
Claudio Atzori
b5bcab13ec
using version 1.2.5-beta for the release
2024-04-23 14:36:39 +02:00
Claudio Atzori
425c9afc36
using version 1.2.5-beta for the release
2024-04-23 14:30:04 +02:00
Claudio Atzori
93dd9cc639
code formatting
2024-04-23 11:28:00 +02:00
Miriam Baglioni
6189879643
[NOAMI] removed entry for Irish Research eLibrary (IReL) Care Board from the list of funders.
2024-04-23 11:09:18 +02:00
Claudio Atzori
c57cff2d6d
Merge pull request '[WebCrawl] adding affiliation relations from web information' ( #428 ) from WebCrowlBeta into beta
...
Reviewed-on: D-Net/dnet-hadoop#428
2024-04-23 09:36:15 +02:00
Miriam Baglioni
7de114bda0
[WebCrawl] addressing comments from PR
2024-04-22 13:52:50 +02:00
Claudio Atzori
eb4692e4ee
Merge branch 'beta' into WebCrowlBeta
2024-04-22 11:40:24 +02:00
Claudio Atzori
24a83fc24f
avoid NPEs in common Oaf merge utilities
2024-04-22 11:39:44 +02:00
Miriam Baglioni
776c898c4b
[WebCrawl] adding affiliation relations from web information
2024-04-22 11:04:17 +02:00
Claudio Atzori
5857fd38c1
avoid NPEs in common Oaf merge utilities
2024-04-21 08:29:09 +02:00
Claudio Atzori
0656ab2838
code formatting
2024-04-20 08:10:58 +02:00
Claudio Atzori
ab7f0855af
fixed query reading projects from the aggregator DB
2024-04-20 08:10:32 +02:00
Claudio Atzori
7a7e313157
updated schema version
2024-04-19 17:30:25 +02:00
Claudio Atzori
e5879b68c7
[transformative agreement] including result-funder relations to the information imported from the TRs
2024-04-19 17:14:18 +02:00
Claudio Atzori
3a027e97a7
[graph indexing] sets spark memoryOverhead in the join operations to the same value used for the memory executor
2024-04-19 16:59:58 +02:00
Claudio Atzori
795e1b2629
Merge pull request '[graph indexing] sets spark memoryOverhead in the join operations to the same value used for the memory executor' ( #426 ) from provision_memoryOverhead into master
...
Reviewed-on: D-Net/dnet-hadoop#426
2024-04-19 16:59:45 +02:00
Claudio Atzori
0c05abe50b
[graph indexing] sets spark memoryOverhead in the join operations to the same value used for the memory executor
2024-04-19 16:57:55 +02:00
Sandro La Bruzzo
b72c3139e2
replaced the deprecated Ignore annotation with Disabled
2024-04-19 14:52:40 +02:00
Antonis Lempesis
b52a5a753b
Merge remote-tracking branch 'upstream/beta' into beta
2024-04-19 15:28:28 +03:00
Antonis Lempesis
c3fe9662b2
all indicator tables are now stored as parquet
2024-04-19 12:45:36 +03:00
Claudio Atzori
57c678d904
integrating changes from PR#424
2024-04-18 11:38:35 +02:00
Claudio Atzori
5ab8cd1794
Various fixes for the stats DB update workflow, step16-createIndicatorsTables.sql
2024-04-18 11:28:18 +02:00
Claudio Atzori
8fdd0244ad
Merge pull request 'Various fixes for the stats DB update workflow, step16-createIndicatorsTables.sql' ( #425 ) from stats_step16_fix into master
...
Reviewed-on: D-Net/dnet-hadoop#425
2024-04-18 11:25:24 +02:00
Claudio Atzori
18fdaaf548
integrating suggestion from #9699 to improve the result_country table construction
2024-04-18 11:23:43 +02:00
Antonis Lempesis
0c71c58df6
fixed the definition of gold_oa
2024-04-18 12:01:27 +03:00
Antonis Lempesis
43d05dbebb
fixed the definition of result_country
2024-04-18 11:53:50 +03:00
Antonis Lempesis
e728a0897c
fixed the definition of indi_pub_bronze_oa
2024-04-18 11:07:55 +03:00
Antonis Lempesis
308ae580a9
slight optimization in indi_pub_gold_oa definition
2024-04-18 10:57:52 +03:00
Antonis Lempesis
27d22bd8f9
slight optimization in indi_pub_gold_oa definition
2024-04-17 23:59:52 +03:00
Antonis Lempesis
1f5aba12fa
slight optimization in indi_pub_gold_oa definition
2024-04-17 23:54:23 +03:00
Claudio Atzori
43e123c624
added column alias
2024-04-17 16:40:29 +02:00
Claudio Atzori
62a07b7add
added missing end of statement /*EOS*/
2024-04-17 15:13:28 +02:00
Claudio Atzori
96bddcc921
revised query implementation for indi_pub_gold_oa
2024-04-17 15:06:50 +02:00
Claudio Atzori
b554c41cc7
Merge pull request 'doidoost_dismiss' ( #418 ) from doidoost_dismiss into beta
...
Reviewed-on: D-Net/dnet-hadoop#418
2024-04-17 12:01:11 +02:00
Claudio Atzori
ac8747582c
Merge branch 'beta' into doidoost_dismiss
2024-04-17 12:01:01 +02:00
Claudio Atzori
0db7e4ae9a
Merge pull request 'Refinements to PR #404 : refactoring the Oaf records merge utilities into dhp-common' ( #422 ) from revised_merge_logic into beta
...
Reviewed-on: D-Net/dnet-hadoop#422
2024-04-17 11:58:26 +02:00
Giambattista Bloisi
8ac167e420
Refinements to PR #404 : refactoring the Oaf records merge utilities into dhp-common
2024-04-16 17:18:28 +02:00
Miriam Baglioni
0486cea4c4
removed the funder id : 100011062 Asian Spinal Cord Network, wrongly associated to Ireland
2024-04-16 15:36:40 +02:00
Miriam Baglioni
0625b9061f
removed the funder id : 100011062 Asian Spinal Cord Network, wrongly associated to Ireland
2024-04-16 15:26:53 +02:00
Miriam Baglioni
9eeb9f5d32
merging with branch beta
2024-04-16 15:24:40 +02:00
Claudio Atzori
589bce3520
Merge pull request '[pBETA] Improvements to copying data from ocean to impala' ( #421 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#421
2024-04-16 14:22:32 +02:00
Claudio Atzori
013935c593
Merge pull request 'Improvements to copying data from ocean to impala' ( #420 ) from antonis.lempesis/dnet-hadoop:beta into master
...
Reviewed-on: D-Net/dnet-hadoop#420
2024-04-16 14:17:47 +02:00
Sandro La Bruzzo
a5ddd8dfbb
Added Action set generation for the MAG organization
2024-04-16 13:39:15 +02:00
Giambattista Bloisi
da333e9f4d
Merge pull request 'Enhance Dedup authors matching with algorithms used for ORCID enhancements (task 9690)' ( #419 ) from dedup_authorsmatch_bytoken into beta
...
Reviewed-on: D-Net/dnet-hadoop#419
2024-04-16 10:24:11 +02:00
Claudio Atzori
43fd1de681
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-04-16 09:42:05 +02:00
Claudio Atzori
d070db4a32
added a couple more invalid author names
2024-04-16 09:41:59 +02:00
Michele Artini
78b9d84e4a
test
2024-04-16 09:41:16 +02:00
Giambattista Bloisi
43b454399f
- Bug fix in matchOrderedTokenAndAbbreviations algorithms where tokens with same initial character were always considered equal
...
- AuthorsMatch exploits the new matching strategy used for ORCID enhancements in #PR398: split author names in tokens, order the tokens, then check for matches of ordered full tokens or abbreviations
2024-04-15 18:19:29 +02:00
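The matching strategy described in this commit (split names into tokens, order the tokens, then compare full tokens or abbreviations) can be sketched roughly as below. This is an illustrative approximation, not the project's AuthorsMatch code: the function names, the tokenisation regex and the rule that only single-letter tokens count as abbreviations are assumptions.

```python
import re


def name_tokens(name):
    """Lowercase the name, split on non-word characters, and sort the tokens."""
    return sorted(t for t in re.split(r"[\W_]+", name.lower()) if t)


def tokens_match(a, b):
    """Full tokens must be equal; a single-letter token matches on the initial only."""
    if a == b:
        return True
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    # Two different full tokens never match, even if they share the initial.
    return False


def authors_match(name1, name2):
    """Compare two author names token by token after ordering the tokens."""
    t1, t2 = name_tokens(name1), name_tokens(name2)
    if len(t1) != len(t2):
        return False
    return all(tokens_match(a, b) for a, b in zip(t1, t2))
```

The last branch of `tokens_match` reflects the bug described above: treating any two tokens with the same initial as equal would make "John Smith" and "Jane Smith" match.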
Lampros Smyrnaios
d7da4f814b
Minor updates to the copying operation to Impala Cluster:
...
- Improve logging.
- Code optimization/polishing.
2024-04-12 18:12:06 +03:00
Lampros Smyrnaios
14719dcd62
Miscellaneous updates to the copying operation to Impala Cluster:
...
- Update the algorithm for creating views that depend on other views.
- Add check for successful execution of the "hadoop distcp" command.
- Add a check for successful copy operation of all entities.
- Upon facing an error in a DB, exit the method, instead of the whole script.
- Improve logging.
- Code polishing.
2024-04-12 15:36:13 +03:00
Sandro La Bruzzo
41a42dde64
code formatted
2024-04-11 17:43:48 +02:00
Sandro La Bruzzo
843dc95340
resolved conflict
2024-04-11 17:38:16 +02:00
Sandro La Bruzzo
1e30454ee0
added vocabulary to instanceTypeMapping of MAG
2024-04-11 17:32:30 +02:00
Sandro La Bruzzo
2581672c11
updated wf of MAG and crossref to use transaction
2024-04-11 17:27:49 +02:00
Lampros Smyrnaios
22745027c8
Use the "HADOOP_USER_NAME" value from the "workflow-property", in "copyDataToImpalaCluster.sh", in "stats-monitor-updates".
2024-04-11 17:46:33 +03:00
Lampros Smyrnaios
abf0b69f29
Upgrade the copying operation to Impala Cluster:
...
- Use only hive commands in the Ocean Cluster, as the "impala-shell" will be removed from there to free-up resources.
- Hugely improve the performance in every aspect of the copying process: a) speedup file-transferring and DB-deletion, b) eliminate permissions-assignment, "load" operations and "use $db" queries, c) retry only the "create view" statements and only as long as they depend on other non-created views, instead of trying to recreate all tables and views 5 consecutive times.
- Add error-checks for the creation of tables and views.
2024-04-11 17:12:12 +03:00
Claudio Atzori
3cad4a415d
fixed duplicated property dhp-schemas.version
2024-04-11 15:44:12 +02:00
Sandro La Bruzzo
a0642bd190
added instanceTypeMapping field on MAG
2024-04-11 13:10:12 +02:00
Claudio Atzori
6132bd028e
Merge pull request 'Extend Crossref-funders mapping and datacite hostedbymap' ( #417 ) from CrossrefFundersMap into master
...
Reviewed-on: D-Net/dnet-hadoop#417
2024-04-09 10:30:53 +02:00
Miriam Baglioni
519db1ddef
Extended mapping of funders from Crossref (#9169, #9277) and changed the correspondence files for the Irish funders (#9635). Extended the datacite map to include the association between metadata and the EBRAINS datasource (SciLake)
2024-04-09 09:33:09 +02:00
Sandro La Bruzzo
98dc042db5
mapping generated for MAG,
...
missing generation of Organization Action set
2024-04-05 18:12:53 +02:00
Sandro La Bruzzo
ef582948a7
Updated mapping
2024-04-05 11:10:44 +02:00
Sandro La Bruzzo
5142f462b5
completed mapping from paper to OAF, not tested
2024-04-04 21:06:04 +02:00
Miriam Baglioni
0794e0667b
Merge branch 'doidoost_dismiss' of https://code-repo.d4science.org/D-Net/dnet-hadoop into doidoost_dismiss
2024-04-04 09:16:18 +02:00
Miriam Baglioni
4b1de076ac
[DataciteHostedByMap] added entry for EBRAINS
2024-04-04 09:16:14 +02:00
Miriam Baglioni
c8a88b2187
[DataciteHostedByMap] added entry for EBRAINS
2024-04-04 09:14:58 +02:00
Sandro La Bruzzo
31e152d2bb
Merge remote-tracking branch 'origin/doidoost_dismiss' into doidoost_dismiss
2024-04-03 17:08:35 +02:00
Sandro La Bruzzo
6f3e925cae
Implemented first part of the new MAG mapping
2024-04-03 17:07:14 +02:00
Miriam Baglioni
f0f6abf892
[MapToFunderLink]added references for HFRI and Erasmus+ for the creation of links for funders
2024-04-03 14:59:09 +02:00
Claudio Atzori
26b97aa5ed
Merge pull request '[BETA] fixed the result_country definition and updated the stats DB copy procedure' ( #416 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#416
2024-04-03 12:36:03 +02:00
Claudio Atzori
5add51f38c
Merge pull request 'fixed the result_country definition and updated the stats DB copy procedure' ( #412 ) from antonis.lempesis/dnet-hadoop:beta into master
...
Reviewed-on: D-Net/dnet-hadoop#412
2024-04-03 12:34:17 +02:00
Lampros Smyrnaios
b7c8acc563
- Update the code which acquires the "IMPALA_HDFS_NODE", to test the "tmp"-dir, instead of the base-dir and introduce retries, to overcome potential file-system failures. This change was suggested by "Sebastian Tymkow" and "Grzegorz Bakalarski".
...
- Fix typos.
2024-04-03 13:15:37 +03:00
Miriam Baglioni
50fbebf186
[NOAMI] removed entry for Health and Social Care Board from the list of funders. Modified IRC putting 1596 and 1597 as synonyms, as required in ticket 9635
2024-04-03 11:45:40 +02:00
Michele Artini
71d6e02886
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2024-04-03 09:50:41 +02:00
Michele Artini
02c9a311c8
base datainfo with trust=0.89
2024-04-03 09:50:21 +02:00
Miriam Baglioni
42846d3b91
[OpenCitation] add compression option when writing the sequence file
2024-04-03 09:25:00 +02:00
Miriam Baglioni
4f0a044245
Merge pull request 'Add action set creation for Datacite affiliations' ( #413 ) from 9647_datacite_affiliations into beta
...
Reviewed-on: D-Net/dnet-hadoop#413
2024-04-02 17:33:38 +02:00
Miriam Baglioni
4bb504e693
Merge pull request '[UsageCount] fixed error' ( #415 ) from UsageStatsRecordDS into beta
...
Reviewed-on: D-Net/dnet-hadoop#415
2024-04-02 17:06:12 +02:00
Serafeim Chatzopoulos
cbe13a5c61
Fix datacite input path in properties file
2024-04-02 18:00:35 +03:00
Miriam Baglioni
9c9a9562ae
[UsageCount] fixed error
2024-04-02 16:56:37 +02:00
Miriam Baglioni
2c4440951f
Merge pull request '[UsageCount] add check in case the datasource is not matched against those present in the graph' ( #414 ) from UsageStatsRecordDS into beta
...
Reviewed-on: D-Net/dnet-hadoop#414
2024-04-02 16:30:39 +02:00
Miriam Baglioni
b42bdd5fb3
[UsageCount] add check in case the datasource is not matched against those present in the graph
2024-04-02 16:28:27 +02:00
Miriam Baglioni
64cbd8abe9
Merge pull request '[UsageCount] Usage count per result split by datasource' ( #318 ) from UsageStatsRecordDS into beta
...
Reviewed-on: D-Net/dnet-hadoop#318
2024-04-02 10:21:39 +02:00
Antonis Lempesis
df6e3bda04
added new orgs in monitor
2024-04-01 22:45:29 +03:00
Antonis Lempesis
573b081f1d
added new orgs in monitor
2024-04-01 22:24:46 +03:00
Serafeim Chatzopoulos
0eb0701b26
Add action set creation for Datacite affiliations
2024-04-01 17:23:26 +03:00
Antonis Lempesis
0bf2a7a359
fixed the result_country definition
2024-04-01 15:23:22 +03:00
Claudio Atzori
24227ab598
Merge pull request '[BETA] fixed typo in indicator query' ( #411 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#411
2024-03-27 13:56:43 +01:00
Claudio Atzori
f01390702e
Merge pull request 'fixed typo in indicator query' ( #410 ) from antonis.lempesis/dnet-hadoop:beta into master
...
Reviewed-on: D-Net/dnet-hadoop#410
2024-03-27 13:42:07 +01:00
Antonis Lempesis
9ff44eed96
fixed typo in indicator query
...
added more institutions
2024-03-27 14:39:01 +02:00
Claudio Atzori
cff6040424
Merge pull request '[BETA] added missing EOS, Generate tables with parquet-files, instead of csv in the contexts.sh script' ( #409 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#409
2024-03-27 12:04:04 +01:00
Claudio Atzori
5592ccc37a
Merge pull request 'added missing EOS, Generate tables with parquet-files, instead of csv in the contexts.sh script' ( #408 ) from antonis.lempesis/dnet-hadoop:beta into master
...
Reviewed-on: D-Net/dnet-hadoop#408
2024-03-27 12:02:57 +01:00
Antonis Lempesis
1fee4124e0
added missing EOS
2024-03-27 12:58:25 +02:00
Sandro La Bruzzo
73a67c0e4a
Improved Crossref mapping to include also unpaywall tested
2024-03-26 17:26:47 +01:00
Claudio Atzori
9e700a8b0d
Merge pull request 'adding context information to projects and datasources' ( #407 ) from taggingProjects into beta
...
Reviewed-on: D-Net/dnet-hadoop#407
2024-03-26 14:53:38 +01:00
Claudio Atzori
75551ad4ec
code formatting
2024-03-26 14:53:16 +01:00
Miriam Baglioni
94b931f7bd
[BulkTagging - tag datasource and projects]merging with branch beta
2024-03-26 14:25:19 +01:00
Miriam Baglioni
3b209261f2
[BulkTagging - tag datasource and projects]merging with branch beta
2024-03-26 14:21:27 +01:00
Claudio Atzori
d16c15da8d
adjusted pom files
2024-03-26 14:00:44 +01:00
Lampros Smyrnaios
036ba03fcd
Generate tables with parquet-files, instead of csv, in "dhp-stats-update/.../contexts.sh" script.
2024-03-26 13:29:04 +02:00
Claudio Atzori
09a6d17059
Merge pull request '[Stats wf] #372 , #405 to production' ( #406 ) from antonis.lempesis/dnet-hadoop:beta into master
...
Reviewed-on: D-Net/dnet-hadoop#406
2024-03-26 12:18:26 +01:00
Claudio Atzori
d70793847d
resolving conflicts on step16-createIndicatorsTables.sql
2024-03-26 12:17:52 +01:00
Claudio Atzori
730eaffc85
Merge pull request 'correctly selecting the active hdfs node for the impala cluster' ( #405 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#405
2024-03-26 12:07:46 +01:00
Lampros Smyrnaios
bc8c97182d
Automatically select the ACTIVE HDFS NODE for Impala cluster, in all "copyDataToImpalaCluster.sh" scripts.
2024-03-26 13:01:12 +02:00
Lampros Smyrnaios
92cc27e7eb
Use the ACTIVE HDFS NODE for Impala cluster, in "copyDataToImpalaCluster.sh" script.
2024-03-26 12:34:11 +02:00
Claudio Atzori
ef52128c55
included new stats* workflows in parent pom list of modules, code formatting
2024-03-26 10:42:10 +01:00
Claudio Atzori
bfba71a95c
further follow up changes from integrating the mergeutils branch
2024-03-26 09:01:18 +01:00
Claudio Atzori
d72e7b7487
Merge pull request 'Changes to indicators and funders definition' ( #372 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#372
2024-03-26 08:46:20 +01:00
Sandro La Bruzzo
ece56f0178
update crossref mapping to be transformed together with UnpayWall
2024-03-25 18:18:10 +01:00
Claudio Atzori
414acd4ef4
Merge pull request 'refactoring the Oaf records merge utilities into dhp-common' ( #404 ) from mergeutils into beta
...
Reviewed-on: D-Net/dnet-hadoop#404
2024-03-25 16:16:07 +01:00
Claudio Atzori
ecff0b4825
merge from beta
2024-03-25 16:15:52 +01:00
Claudio Atzori
25c2025223
Merge pull request 'mapped oaf:country from results' ( #403 ) from oaf_country_beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#403
2024-03-25 16:13:31 +01:00
Claudio Atzori
538b180fe0
Merge branch 'beta' into oaf_country_beta
2024-03-25 16:13:20 +01:00
Claudio Atzori
eae88c0fe3
Merge pull request 'Solr JSON payload' ( #399 ) from index_records into beta
...
Reviewed-on: D-Net/dnet-hadoop#399
2024-03-25 16:12:59 +01:00
Claudio Atzori
82fc609c4f
Merge branch 'beta' into index_records
2024-03-25 16:12:49 +01:00
Claudio Atzori
4b978ffa2d
align dhp-schema.version with the beta branch
2024-03-25 16:12:36 +01:00
Claudio Atzori
fa4b3e6d2b
Merge pull request 'Open Citation integration' ( #401 ) from ocnew into beta
...
Reviewed-on: D-Net/dnet-hadoop#401
2024-03-25 16:10:40 +01:00
Claudio Atzori
74e5d05577
Merge branch 'beta' into ocnew
2024-03-25 16:10:31 +01:00
Claudio Atzori
6c3b692f60
integrated minor change from beta branch
2024-03-25 16:10:23 +01:00
Claudio Atzori
e9eb590f87
Merge pull request 'FOS ActionSet for the classification of results without a doi' ( #397 ) from FOSNew into beta
...
Reviewed-on: D-Net/dnet-hadoop#397
2024-03-25 16:07:47 +01:00
Claudio Atzori
9a5b134ddf
Merge branch 'beta' into FOSNew
2024-03-25 16:07:37 +01:00
Claudio Atzori
069803f34a
Merge pull request 'Added exception throwing in Hadoop transformation when TR is not syntactically valid' ( #387 ) from exception_on_invalid_transofmation_rule into beta
...
Reviewed-on: D-Net/dnet-hadoop#387
2024-03-25 16:05:43 +01:00
Claudio Atzori
71c1f81b54
Merge branch 'beta' into exception_on_invalid_transofmation_rule
2024-03-25 16:05:11 +01:00
Claudio Atzori
c3c9bdb59c
Merge pull request 'bulkTaggingPathMapExtention' ( #381 ) from bulkTaggingPathMapExtention into beta
...
Reviewed-on: D-Net/dnet-hadoop#381
2024-03-25 16:02:01 +01:00
Claudio Atzori
91b61687fa
Merge branch 'beta' into bulkTaggingPathMapExtention
2024-03-25 15:50:18 +01:00
Claudio Atzori
63067d4b24
align dhp-schema.version with the beta branch
2024-03-25 15:50:05 +01:00
Claudio Atzori
e0c315b07b
Merge pull request 'Extract Information from Transformative Agreement' ( #371 ) from transformativeagreement into beta
...
Reviewed-on: D-Net/dnet-hadoop#371
2024-03-25 15:42:36 +01:00
Claudio Atzori
54936b7f42
Merge branch 'beta' into transformativeagreement
2024-03-25 15:42:22 +01:00
Claudio Atzori
9fc70a9451
implemented default merge procedure applied to result.instance
2024-03-25 15:39:14 +01:00
Michele Artini
e1149eb5c4
xslt rules and tests
2024-03-25 15:01:42 +01:00
Michele De Bonis
f6601ea7d1
default parameters for openorgs updated
2024-03-25 13:07:04 +01:00
Michele Artini
3f174ad90f
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2024-03-25 12:16:02 +01:00
Michele Artini
6ffb1faf09
fixed a problem with multiple nodes
2024-03-25 12:15:51 +01:00
Giambattista Bloisi
3f22c101d9
Merge pull request 'Enrich authors with ORCID info using new matching algorithm' ( #398 ) from new_orcid_enhancement into beta
...
Reviewed-on: D-Net/dnet-hadoop#398
2024-03-22 17:29:20 +01:00
Claudio Atzori
c8683eb13c
Merge branch 'beta' into mergeutils
2024-03-22 16:36:13 +01:00
Claudio Atzori
aaa73f89d1
refactoring the Oaf records merge utilities into dhp-common
2024-03-22 16:34:03 +01:00
Giambattista Bloisi
0ff7faad72
Fix conditions that prevented ORCID Enrichment
2024-03-22 16:24:49 +01:00
Michele De Bonis
cd4c3c934d
openorgs wf updated
2024-03-22 15:42:37 +01:00
Michele Artini
7faa115ba0
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2024-03-22 11:08:59 +01:00
Michele Artini
f9c74c98fa
fixed an identifier xpath
2024-03-22 11:08:45 +01:00
Claudio Atzori
7ae7e8aa06
Merge pull request 'Unify merge logic of entities in MergeUtils.class' ( #370 ) from mergeutils into beta
...
Reviewed-on: D-Net/dnet-hadoop#370
2024-03-22 10:53:14 +01:00
Antonis Lempesis
4c40c96e30
code cleanup
2024-03-22 10:16:49 +02:00
Antonis Lempesis
459167ac2f
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-03-21 12:44:58 +02:00
Antonis Lempesis
07f634a46d
code cleanup
2024-03-21 12:44:30 +02:00
Antonis Lempesis
9521625a07
code cleanup
2024-03-21 11:45:08 +02:00
Sandro La Bruzzo
58dbe71d39
update crossref mapping to be runnable separately as a single datasource outside doiboost
2024-03-20 17:04:52 +01:00
Antonis Lempesis
67a5aa0a38
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-03-19 11:24:54 +02:00
dimitrispie
a3a570e9a0
Commit monitor-updates-wf
2024-03-19 09:42:21 +02:00
Giambattista Bloisi
664a381d31
Unify merge logic of entities in MergeUtils.class
2024-03-18 16:04:49 +01:00
Michele Artini
cb29b9773c
xslt rules
2024-03-18 15:31:34 +01:00
Michele Artini
85b844d57e
updated BASE filter param
2024-03-15 15:03:27 +01:00
Michele Artini
455f2e1e07
apply commits from master
2024-03-15 14:56:39 +01:00
Michele Artini
30167aa882
mapped oaf:country from results
2024-03-15 11:24:16 +01:00
Michele Artini
88fef367b9
new plugin to collect from a dump of BASE
2024-03-15 10:47:52 +01:00
Claudio Atzori
078169b922
cleanup
2024-03-15 09:56:04 +01:00
Claudio Atzori
af154d4456
implemented changes from #9497 : sort abstracts by string length, included author fullnames in the related results, expanded instance details within each children/result XML element
2024-03-14 16:21:23 +01:00
Claudio Atzori
7863c92466
expanded paper abstract in the result/children XML element (ticket #9497 )
2024-03-13 16:25:31 +01:00
Claudio Atzori
eb5887cb9a
including related organization url in the XML record serialization (ticket #9498 )
2024-03-13 14:46:00 +01:00
Michele Artini
a99942f7cf
filter by base types
2024-03-13 12:12:42 +01:00
Michele Artini
7f7083f53e
updated sql query for filtering BASE records
2024-03-13 11:57:26 +01:00
Sandro La Bruzzo
5281f010a5
applied cherry pick
2024-03-13 09:59:20 +01:00
Sandro La Bruzzo
ee1fcb672b
code refactor
2024-03-13 09:46:31 +01:00
Miriam Baglioni
5a32bb9578
[OC New] last fix
2024-03-13 09:36:18 +01:00
Sandro La Bruzzo
c532831718
Moved Crossref Mapping to dhp-aggregations,
...
refactored code: instead of the utilities defined in DOIBoostMappingUtils for creating parts of the OAF, use the utilities in OafMappingUtils
2024-03-13 06:56:10 +01:00
Miriam Baglioni
48c052215c
[OC New] last fix
2024-03-12 23:12:32 +01:00
Michele Artini
d9b23a76c5
comments
2024-03-12 14:53:34 +01:00
Michele Artini
841ca92246
Merge pull request 'new plugin to collect from a dump of BASE' ( #400 ) from base-collector-plugin into master
...
Reviewed-on: D-Net/dnet-hadoop#400
2024-03-12 12:22:42 +01:00
Michele Artini
3bcfc40293
new plugin to collect from a dump of BASE
2024-03-12 12:17:58 +01:00
Claudio Atzori
db66555ebb
WIP: updated provision workflow to create a JSON based representation of the payload
2024-03-12 09:56:09 +01:00
Antonis Lempesis
f74c7e8689
selecting distinct peer_reviewed
2024-03-12 02:13:04 +02:00
Giambattista Bloisi
9092075760
Enrich authors with ORCID info using new matching algorithm
2024-03-11 13:23:59 +01:00
Sandro La Bruzzo
cbd4e5e4bb
update mag mapping
2024-03-08 16:31:40 +01:00
Claudio Atzori
d4871b31e8
WIP: extended provision workflow to create the JSON based payload
2024-03-08 11:43:20 +01:00
Antonis Lempesis
3c79720342
fixed the irish result subset
2024-03-07 14:08:57 +02:00
Antonis Lempesis
5ae4b4286c
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-03-07 12:15:19 +02:00
Miriam Baglioni
5180b6ec8a
[FOSNEW] removed test class
2024-03-07 10:47:13 +01:00
Miriam Baglioni
7827a2d66b
[OCNEW] added creation of the actionset for the results classified with FoS based on the OpenAIRE identifier
2024-03-07 10:36:30 +01:00
Antonis Lempesis
316d585c8a
using distinct apcs per publication to avoid huge sums
2024-03-07 02:07:59 +02:00
Miriam Baglioni
fd34372c40
[OCNEW] first implementation
2024-03-06 13:42:00 +01:00
Sandro La Bruzzo
d34cef3f8d
Merge remote-tracking branch 'origin/beta' into doidoost_dismiss
2024-03-05 11:45:31 +01:00
Sandro La Bruzzo
3b837d38ce
added oozie workflow
2024-03-05 11:44:59 +01:00
Sandro La Bruzzo
f417515e43
Implemented class that generates a normalized table of MAG, which is the starting point for the creation of the mag source
2024-03-04 17:15:13 +01:00
Giambattista Bloisi
3067ea390d
Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf
2024-03-04 11:13:34 +01:00
Sandro La Bruzzo
ad0e9aa80c
added first part of refactoring of the code generating MAG,
...
make it more readable using spark sql queries
2024-02-29 18:16:15 +01:00
Sandro La Bruzzo
9d94648f3b
code formatted
2024-02-29 18:15:20 +01:00
Giambattista Bloisi
3cd5590f3b
When converting json to XML, remove characters that are not allowed in the XML 1.0 specs, as they will cause xpath failures even if escaped
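The filtering described above can be sketched as follows. This is a minimal Python illustration, not the project's code: the function name `strip_invalid_xml10` is hypothetical, and the character ranges are taken from the `Char` production of the XML 1.0 specification. Characters outside those ranges cannot appear in an XML 1.0 document even as numeric character references, which is why escaping alone does not help.

```python
import re

# Matches any character outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML10 = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml10(text: str) -> str:
    """Remove characters that XML 1.0 does not allow (hypothetical helper)."""
    return _INVALID_XML10.sub("", text)
```

Tab, line feed and carriage return are kept; other control characters such as NUL or vertical tab are dropped.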
2024-02-28 15:14:18 +01:00
Giambattista Bloisi
56dd05f85c
Merge pull request 'Revised procedure when converting json data into xml' ( #395 ) from restiterator_xmlcleanup into beta
...
Reviewed-on: D-Net/dnet-hadoop#395
2024-02-28 10:38:54 +01:00
Claudio Atzori
6fcf872daa
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into index_records
2024-02-28 10:27:28 +01:00
Claudio Atzori
3f07390a58
WIP
2024-02-28 10:10:10 +01:00
Miriam Baglioni
c94d94035c
[BulkTagging] added check to verify if field is present in the pathMap
2024-02-28 09:41:42 +01:00
Sandro La Bruzzo
7d806a434c
formatted code
2024-02-28 09:31:58 +01:00
Sandro La Bruzzo
e468e99100
Merge pull request 'Orcid Update Procedure' ( #394 ) from orcid_update into beta
...
Reviewed-on: D-Net/dnet-hadoop#394
2024-02-28 09:17:30 +01:00
Sandro La Bruzzo
b63994dcc4
Merge remote-tracking branch 'origin/beta' into orcid_update
2024-02-28 09:11:18 +01:00
Sandro La Bruzzo
915a76a796
following the comment on the pull requests:
...
- Added #NUM_OF_THREADS complete job in the queue at the end of the main loop to avoid deadlock
2024-02-28 09:10:55 +01:00
Giambattista Bloisi
773e856550
Revised procedure when converting json data into xml:
...
- json object keys are renamed to be conformant to xml tag elements, special characters are substituted or removed
- json string values are no longer post-processed as they are already escaped by the org.json.XML.toString method
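A rough sketch of the key renaming step, in Python for illustration only. The function name `to_xml_tag`, the ASCII-only rule set and the choice of `_` as the substitute character are assumptions; the commit does not state which characters are substituted and which are removed.

```python
import re


def to_xml_tag(key: str) -> str:
    """Turn a JSON object key into a usable XML element name (hypothetical helper)."""
    # Replace every character outside a conservative ASCII subset of XML name characters.
    tag = re.sub(r"[^A-Za-z0-9_.-]", "_", key)
    # An element name must start with a letter or underscore.
    if not tag or not re.match(r"[A-Za-z_]", tag):
        tag = "_" + tag
    return tag
```

For example, `to_xml_tag("first name")` gives `first_name` and `to_xml_tag("1st")` gives `_1st`.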
2024-02-24 16:54:30 +01:00
Sandro La Bruzzo
a712df1e1d
Merge remote-tracking branch 'origin/beta' into orcid_update
2024-02-23 10:12:25 +01:00
Sandro La Bruzzo
b32a9d1994
Implemented workflow for updating the table, added a step to check that the newly generated table is valid
2024-02-23 10:04:28 +01:00
Michele Artini
3268570b2c
mapping of project PIDs
2024-02-22 14:47:21 +01:00
Michele Artini
4374d7449e
mapping of project PIDs
2024-02-22 14:44:35 +01:00
Miriam Baglioni
72bae7af76
[Transformative Agreement] removed the relations from the ActionSet while waiting for the green light from Ioanna
2024-02-19 16:20:12 +01:00
Miriam Baglioni
43da7e1191
[Tagging Projects and Datasource] changed the way the pathMap parameter is passed. It was too long and was truncated
2024-02-19 16:12:59 +01:00
Serafeim Chatzopoulos
f0dc12634b
Add Action Set creation for affiliations inferred from the OpenAPC data
2024-02-18 18:02:09 +02:00
Claudio Atzori
07d009007b
Merge pull request 'Fixed problem on missing author in crossref Mapping' ( #384 ) from crossref_missing_author_fix_master into master
...
Reviewed-on: D-Net/dnet-hadoop#384
2024-02-15 15:06:17 +01:00
Claudio Atzori
071d044971
Merge branch 'master' into crossref_missing_author_fix_master
2024-02-15 15:04:19 +01:00
Claudio Atzori
b3ddbaed58
fixed import of ORPs stored on HDFS in the internal graph format (e.g. Datacite)
2024-02-15 15:02:48 +01:00
Claudio Atzori
753c2a72bd
Merge pull request 'fix import of ORPs' ( #390 ) from import_orps_fix into beta
...
Reviewed-on: D-Net/dnet-hadoop#390
2024-02-15 15:02:08 +01:00
Claudio Atzori
a63b091bae
Merge branch 'beta' into import_orps_fix
2024-02-15 15:01:56 +01:00
Giambattista Bloisi
85aeff72f1
Merge pull request 'Revised instance type comparisons in dedup phase' ( #393 ) from revisedInstanceType into beta
...
Reviewed-on: D-Net/dnet-hadoop#393
2024-02-15 12:15:37 +01:00
Giambattista Bloisi
d65285da7f
Promote "Research" to a wildcard ("jolly") instanceType in dedup comparisons
...
Compare "Journal" and "Part of book or chapter of book" with "Article"
2024-02-15 12:11:04 +01:00
Giambattista Bloisi
29194472a7
Promote "Research" to a wildcard ("jolly") instanceType in dedup comparisons
...
Compare Part of book or chapter of book with Article
2024-02-15 11:53:46 +01:00
Miriam Baglioni
8dae10b442
-
2024-02-14 14:57:08 +01:00
Miriam Baglioni
83bb97be83
[Tagging Projects and Datasource] added test to check datasource tagging. Fixed issue
2024-02-14 11:23:47 +01:00
Miriam Baglioni
6e1f383e4a
[Tagging Projects and Datasource] first extension of bulktagging to add the context to projects and datasource
2024-02-13 16:37:14 +01:00
Miriam Baglioni
3f7d262a4e
merging with branch beta
2024-02-13 14:05:58 +01:00
Miriam Baglioni
eca021f4d6
[Transformative Agreement] add results with information about the agreement and the country of the organization that paid for it
2024-02-13 12:21:07 +01:00
Miriam Baglioni
bdb6bbb365
merging with branch beta
2024-02-12 15:50:43 +01:00
Claudio Atzori
d85d2df6ad
[graph raw] fixed mapping of the original resource type from the Datacite format
2024-02-09 10:20:20 +01:00
Claudio Atzori
1416f16b35
[graph raw] fixed mapping of the original resource type from the Datacite format
2024-02-09 10:19:53 +01:00
Giambattista Bloisi
b19643f6eb
Dedup aliases, created when a dedup in a previous build has been merged in a new dedup, need to be marked as "deletedbyinference", since they are "merged" in the new dedup
2024-02-08 15:34:59 +01:00
Giambattista Bloisi
ba1a0e7b4f
Merge pull request 'Set deletedbyinference =true to dedup aliases, created when a dedup in a previous build has been merged in a new dedup' ( #392 ) from fix_dedupaliases_deletedbyinference into master
...
Reviewed-on: D-Net/dnet-hadoop#392
2024-02-08 15:29:29 +01:00
Giambattista Bloisi
079085286c
Merge branch 'master' into fix_dedupaliases_deletedbyinference
2024-02-08 15:29:13 +01:00
Giambattista Bloisi
8dd666aedd
Dedup aliases, created when a dedup in a previous build has been merged in a new dedup, need to be marked as "deletedbyinference", since they are "merged" in the new dedup
2024-02-08 15:27:57 +01:00
Claudio Atzori
f21133229a
Merge pull request 'Support for the PromoteAction strategy [master]' ( #391 ) from promote_actions_join_type_master into master
...
Reviewed-on: D-Net/dnet-hadoop#391
2024-02-08 15:12:16 +01:00
Claudio Atzori
d86b909db2
[actionsets] fixed join type
2024-02-08 15:10:55 +01:00
Claudio Atzori
08162902ab
[actionsets] introduced support for the PromoteAction strategy
2024-02-08 15:10:40 +01:00
Claudio Atzori
e6bdee86d1
Merge pull request 'Support for the PromoteAction strategy' ( #389 ) from promote_actions_join_type into beta
...
Reviewed-on: D-Net/dnet-hadoop#389
2024-02-08 15:08:05 +01:00
Antonis Lempesis
dd4c27f4f3
added 2 new institutions in monitor
2024-02-08 12:57:57 +02:00
Claudio Atzori
38c9001147
fixed import of ORPs stored on HDFS in the internal graph format (e.g. Datacite)
2024-02-07 17:02:05 +01:00
Claudio Atzori
fd17c1f17c
[actionsets] fixed join type
2024-02-05 16:55:36 +02:00
Claudio Atzori
009dcf6aea
[actionsets] introduced support for the PromoteAction strategy
2024-02-05 16:43:40 +02:00
Claudio Atzori
bb82052c40
[graph cleaning] rule out datasources without an officialname
2024-02-05 14:59:27 +02:00
Claudio Atzori
e8630a6d03
[graph cleaning] rule out datasources without an officialname
2024-02-05 14:59:06 +02:00
Claudio Atzori
42f5506306
[orcid enrichment] fixed directory cleanup before distcp
2024-02-05 09:45:36 +02:00
Claudio Atzori
f28c63d5ef
[orcid enrichment] fixed directory cleanup before distcp
2024-02-05 09:44:56 +02:00
Alessia Bardi
f2a08d8cc2
test for Italian records from IRS repositories
2024-01-30 19:20:14 +01:00
Antonis Lempesis
a512ead447
changed orcid ids to all capital
2024-01-30 16:54:47 +02:00
Miriam Baglioni
07a373a0bd
[bulkTagging] removing checks while performing the substring action so that it will throw an Exception if the parameters are wrongly set
2024-01-30 13:51:11 +01:00
Miriam Baglioni
ead08b0dd4
merging with branch beta
2024-01-30 12:19:10 +01:00
Claudio Atzori
1a8b609ed2
code formatting
2024-01-30 11:34:16 +01:00
Antonis Lempesis
bb10a22290
merged changes from dnet-hadoop
2024-01-29 21:51:47 +02:00
Miriam Baglioni
4c8706efee
[orcid-enrichment] change the value of parameters.
2024-01-29 18:21:36 +01:00
Miriam Baglioni
a5995ab557
[orcid-enrichment] change the value of parameters.
2024-01-29 18:19:48 +01:00
Miriam Baglioni
a418dacb47
[UsageCount] code extension to also include the name of the datasource
2024-01-29 18:12:33 +01:00
Miriam Baglioni
e9131f4e4a
merging with branch beta
2024-01-29 16:27:18 +01:00
Sandro La Bruzzo
9aebca77a0
Added exception throwing in Hadoop transformation when TR is not syntactically valid
2024-01-29 14:41:02 +01:00
Claudio Atzori
f804c58bc7
Merge pull request 'Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf' ( #386 ) from stats_with_spark_sql into beta
...
Reviewed-on: D-Net/dnet-hadoop#386
2024-01-29 09:11:59 +01:00
Claudio Atzori
926903b06b
Merge branch 'beta' into stats_with_spark_sql
2024-01-29 09:11:45 +01:00
Giambattista Bloisi
078df0b4d1
Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf
2024-01-26 21:56:55 +01:00
Claudio Atzori
4d0c59669b
merged changes from beta
2024-01-26 16:08:54 +01:00
Claudio Atzori
bf99c424fa
Merge pull request 'Fixed problem on missing author in crossref Mapping' ( #383 ) from crossref_missing_author_fix into beta
...
Reviewed-on: D-Net/dnet-hadoop#383
2024-01-26 15:57:23 +01:00
Claudio Atzori
ce3200263e
Merge branch 'beta' into crossref_missing_author_fix
2024-01-26 15:57:04 +01:00
Sandro La Bruzzo
3c8c88bdd3
Fixed problem on missing author in crossref Mapping
2024-01-26 12:29:30 +01:00
Sandro La Bruzzo
e889808daa
Fixed problem on missing author in crossref Mapping
2024-01-26 12:19:04 +01:00
Claudio Atzori
9e8fc6aa88
[collection] increased logging from the oai-pmh metadata collection process
2024-01-26 09:17:20 +01:00
Antonis Lempesis
c548796463
Changed step16-createIndicatorsTables to use a spark oozie action instead of hive
2024-01-26 02:04:48 +02:00
Sandro La Bruzzo
0386f36385
Added workflow to update ORCID and replaced some parsing, because the update works and employments xml differs from the dump one.
2024-01-25 19:40:59 +01:00
Antonis Lempesis
a7115cfa9e
max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%.
2024-01-25 15:13:16 +01:00
Antonis Lempesis
fd43b0e84a
max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%.
2024-01-25 15:06:34 +01:00
Claudio Atzori
2838a9b630
Update 'CONTRIBUTING.md'
2024-01-24 16:07:05 +01:00
Claudio Atzori
da944a5c55
Merge pull request 'code of conduct and contributing' ( #382 ) from contributing into beta
...
Reviewed-on: D-Net/dnet-hadoop#382
2024-01-24 15:40:26 +01:00
Claudio Atzori
0c97a3a81a
minor
2024-01-24 10:56:33 +01:00
Claudio Atzori
2c1e6849f0
added code of conduct and contributing files
2024-01-24 10:36:41 +01:00
Claudio Atzori
9b13c22e5d
[graph provision] retrieve all the context information by adding all=true to the requests issued to the API
2024-01-23 15:36:08 +01:00
Claudio Atzori
3e96777cc4
[collection] increased logging from the oai-pmh metadata collection process
2024-01-23 15:21:03 +01:00
Sandro La Bruzzo
43e0bba7ed
logging added during download
2024-01-23 15:04:49 +01:00
Miriam Baglioni
f7d06dc661
compilation after merging
2024-01-23 11:43:08 +01:00
Miriam Baglioni
6e58d79623
merging with branch beta
2024-01-23 11:36:47 +01:00
Miriam Baglioni
e0ec800d7e
[BulkTagging] extend the definition of the pathMap to also include actions that should be performed on the value extracted from the result before applying the constraint
2024-01-23 11:34:53 +01:00
Claudio Atzori
9812406589
Merge pull request '[graph provision] updated param specification for the XML converter job' ( #380 ) from provision_community_api into beta
...
Reviewed-on: D-Net/dnet-hadoop#380
2024-01-23 08:55:59 +01:00
Claudio Atzori
f87f3a6483
[graph provision] updated param specification for the XML converter job
2024-01-23 08:54:37 +01:00
Claudio Atzori
6fd25cf549
code formatting
2024-01-23 08:47:12 +01:00
Claudio Atzori
bd187ec6e7
Merge pull request 'Implements pivots table update oozie workflow' ( #376 ) from update_pivots_table into beta
...
Reviewed-on: D-Net/dnet-hadoop#376
2024-01-22 16:37:30 +01:00
Claudio Atzori
f76852f385
Merge branch 'beta' into update_pivots_table
2024-01-22 16:37:22 +01:00
Claudio Atzori
b9fcc5ad5e
Merge pull request 'Context API update' ( #379 ) from provision_community_api into beta
...
Reviewed-on: D-Net/dnet-hadoop#379
2024-01-22 15:55:33 +01:00
Claudio Atzori
1c6db320f4
[graph provision] obtain context info from the context API instead from the ISLookUp service
2024-01-22 15:53:17 +01:00
Claudio Atzori
2655eea5bc
[orcid enrichment] drop paths before copying the non-modified contents
2024-01-19 16:28:05 +01:00
Claudio Atzori
c6b3401596
increased shuffle partitions for publications in the country propagation workflow
2024-01-19 10:15:39 +01:00
Miriam Baglioni
bcc0a13981
[enrichment single step] adding <end> element in wf definition
2024-01-18 17:39:14 +01:00
Miriam Baglioni
6af536541d
[enrichment single step] moving parameter file in correct location
2024-01-18 15:35:40 +01:00
Miriam Baglioni
a12a3eb143
-
2024-01-18 15:18:10 +01:00
Claudio Atzori
628fdfb5eb
Merge pull request '[enrichment single step]' ( #378 ) from enrichmentSingleStepFixed into beta
...
Reviewed-on: D-Net/dnet-hadoop#378
2024-01-18 09:41:09 +01:00
Miriam Baglioni
82e9e262ee
[enrichment single step] remove parameter from execution
2024-01-17 17:38:03 +01:00
Miriam Baglioni
67ce2d54be
[enrichment single step] refactoring to fix issues in disappeared result type
2024-01-17 16:50:00 +01:00
Miriam Baglioni
59eaccbd87
[enrichment single step] refactoring to fix issue in disappeared result type
2024-01-15 17:49:54 +01:00
Giambattista Bloisi
21a14fcd80
Reusable RunSQLSparkJob for executing SQL in Spark through Oozie Spark Actions
...
Implements pivots table update oozie workflow
2024-01-15 10:18:14 +01:00
Sandro La Bruzzo
e0753f19da
Fixed error of connection timeout
2024-01-13 09:27:08 +01:00
sandro.labruzzo
e328bc0ade
fixed missing parameter on download update
2024-01-12 16:18:20 +01:00
Claudio Atzori
2d302e6827
Merge pull request '[FoS integration]fix issue on FoS integration. Removing the null values from FoS' ( #375 ) from fosPreparationBeta into beta
...
Reviewed-on: D-Net/dnet-hadoop#375
2024-01-12 10:27:28 +01:00
Miriam Baglioni
f612125939
fix issue on FoS integration. Removing the null values from FoS
2024-01-12 10:20:28 +01:00
Claudio Atzori
c67467723b
Merge pull request 'refined mapping for the extraction of the original resource type' ( #374 ) from resource_types into beta
...
Reviewed-on: D-Net/dnet-hadoop#374
2024-01-11 16:29:47 +01:00
Claudio Atzori
cb9e739484
Merge branch 'beta' into resource_types
2024-01-11 16:29:41 +01:00
Claudio Atzori
2753044d13
refined mapping for the extraction of the original resource type
2024-01-11 16:28:26 +01:00
Giambattista Bloisi
a88dce5bf3
Merge pull request 'Improvements and refactoring in Dedup' ( #367 ) from dedup_increasenumofblocks into beta
...
Reviewed-on: D-Net/dnet-hadoop#367
2024-01-11 11:24:06 +01:00
Giambattista Bloisi
3c66e3bd7b
Create dedup record for "merged" pivots
...
Do not create dedup records for groups that have more than 20 different acceptance dates
2024-01-10 22:59:52 +01:00
Giambattista Bloisi
10e135db1e
Use dedup_wf_002 in place of dedup_wf_001 to make explicit a different algorithm has been used to generate those kind of ids
2024-01-10 22:59:52 +01:00
Giambattista Bloisi
831cc1fdde
Generate "merged" dedup id relations also for records that are filtered out by the cut parameters
2024-01-10 22:59:52 +01:00
Giambattista Bloisi
1287315ffb
No longer use dedupId information from pivotHistory Database
2024-01-10 22:59:52 +01:00
Giambattista Bloisi
02636e802c
SparkCreateSimRels:
...
- Create dedup blocks from the complete queue of records matching cluster key instead of truncating the results
- Clean titles once before clustering and similarity comparisons
- Added support for filtered fields in model
- Added support for sorting List fields in model
- Added new JSONListClustering and numAuthorsTitleSuffixPrefixChain clustering functions
- Added new maxLengthMatch comparator function
- Use reduced complexity Levenshtein with threshold in levensteinTitle
- Use reduced complexity AuthorsMatch with threshold early-quit
- Use incremental Connected Component to decrease comparisons in similarity match in BlockProcessor
- Use new clusterings configuration in Dedup tests
SparkWhitelistSimRels: use left semi join for clarity and performance
SparkCreateMergeRels:
- Use new connected component algorithm that converges faster than Spark GraphX provided algorithm
- Refactored to use Windowing sorting rather than groupBy to reduce memory pressure
- Use historical pivot table to generate singleton rels, merged rels and keep continuity with dedupIds used in the past
- Comparator for pivot record selection now uses "tomorrow" as filler for missing or incorrect date instead of "2000-01-01"
- Changed generation of ids of type dedup_wf_001 to avoid collisions
DedupRecordFactory: use reduceGroups instead of mapGroups to decrease memory pressure
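The "reduced complexity Levenshtein with threshold" item above can be illustrated with a generic early-exit edit distance. This is a minimal Python sketch under stated assumptions, not the implementation used in the dedup module: the name `bounded_levenshtein` and the convention of returning `max_dist + 1` once the threshold is exceeded are hypothetical.

```python
def bounded_levenshtein(a: str, b: str, max_dist: int) -> int:
    """Edit distance between a and b, or max_dist + 1 if it exceeds max_dist."""
    # The length difference is a lower bound on the distance.
    if abs(len(a) - len(b)) > max_dist:
        return max_dist + 1
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        # The smallest value in a row is a lower bound on the final distance,
        # so the comparison can stop as soon as it passes the threshold.
        if min(curr) > max_dist:
            return max_dist + 1
        prev = curr
    return prev[-1] if prev[-1] <= max_dist else max_dist + 1
```

Stopping early avoids filling the rest of the dynamic-programming table for pairs that are already known to be too far apart, which is the kind of saving a thresholded comparison aims for.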
2024-01-10 22:59:52 +01:00
Antonis Lempesis
e024718f73
creating result_instances even when no pids exist for the instance
2024-01-10 22:25:50 +01:00
Sandro La Bruzzo
859babf722
added some useful comments
2024-01-10 19:51:13 +01:00
Sandro La Bruzzo
39ebb60b38
Merge remote-tracking branch 'origin/beta' into orcid_update
2024-01-10 19:50:00 +01:00
Sandro La Bruzzo
9d5a7c3b22
code refactor
2024-01-10 19:42:34 +01:00
Sandro La Bruzzo
8f61063201
Added workflow
2024-01-10 19:42:22 +01:00
Sandro La Bruzzo
1a42a5c10d
Implemented Download update of ORCID
2024-01-10 18:03:20 +01:00
Claudio Atzori
16d858fbf0
Merge pull request 'enrichmentSingleStep' ( #373 ) from enrichmentSingleStep into beta
...
Reviewed-on: D-Net/dnet-hadoop#373
2024-01-10 16:58:49 +01:00
Miriam Baglioni
e711a05229
fixed conflicts
2024-01-10 11:03:42 +01:00
Miriam Baglioni
71d6f30711
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-01-10 10:59:58 +01:00
dimitrispie
b920307bdd
Changes to indicators
2024-01-09 00:47:09 +02:00
dimitrispie
8b2cbb611e
Changes to beta db names
2024-01-09 00:40:56 +02:00
Antonis Lempesis
2e4cab026c
fixed the result_country definition
2024-01-08 16:01:26 +02:00
dimitrispie
6b823100ae
Update buildIrishMonitorDB.sql
...
New indicators added
2024-01-07 22:54:39 +02:00
dimitrispie
75bfde043c
Historical Snapshots Workflow
...
Create historical snapshots db with parameters:
hist_db_name=openaire_beta_historical_snapshots_xxx
hist_db_name_prev=openaire_beta_historical_snapshots_xxx (previous run of wf)
stats_db_name=openaire_beta_stats_xxx
stats_irish_db_name=openaire_beta_stats_monitor_ie_xxx
monitor_db_name=openaire_beta_stats_monitor_xxx
monitor_db_prod_name=openaire_beta_stats_monitor
monitor_irish_db_name=openaire_beta_stats_monitor_ie_xxx
monitor_irish_db_prod_name=openaire_beta_stats_monitor_ie
hist_db_prod_name=openaire_beta_historical_snapshots
hist_db_shadow_name=openaire_beta_historical_snapshots_shadow
hist_date=122023
hive_timeout=150000
hadoop_user_name=xxx
resumeFrom=CreateDB
2024-01-04 15:11:04 +02:00
Miriam Baglioni
cb14470ba6
added properties file in the folder for the workflow of result to organization from inst repo propagation. Changed the path in the classes implementing the propagation
2023-12-22 14:50:05 +01:00
Miriam Baglioni
9f966b59d4
added properties file in the folder for the workflow of result to community from semrel propagation. Changed the path in the classes implementing the propagation
2023-12-22 14:11:47 +01:00
Miriam Baglioni
2f3b5a133d
added properties file in the folder for the workflow of result to community from organization propagation. Changed the path in the classes implementing the propagation
2023-12-22 13:56:40 +01:00
Miriam Baglioni
2f7b9ad815
added properties file in the folder for the workflow of project to result propagation. Changed the path in the classes implementing the propagation
2023-12-22 11:46:15 +01:00
Miriam Baglioni
f2352e8a78
changed in the classes the path for the property files for the propagation of community from project
2023-12-22 11:43:34 +01:00
Miriam Baglioni
009730b3d1
added properties file in the folder for the workflow of orcid propagation. Changed the path in the classes implementing the propagation. Changed the path to the parameter file in the class for entitytoorganization propagation
2023-12-22 11:42:09 +01:00
Miriam Baglioni
89f269c7f4
changed the path to the parameter file in the class for entitytoorganization propagation
2023-12-22 11:37:50 +01:00
Miriam Baglioni
b06aea0adf
adding the bulkTag parameter file in the folder for the oozie workflow for bulkTagging. Changes the path in the class
2023-12-22 11:35:37 +01:00
Miriam Baglioni
3afd4aa57b
adjustments for country propagation
2023-12-22 11:27:30 +01:00
dimitrispie
ffdd03d2f4
Monitor Irish Stats WF
...
Parameters (with examples):
stats_db_name=openaire_beta_stats_20231208
monitor_irish_db_name=openaire_beta_stats_monitor_ie_20231208b
monitor_irish_db_prod_name=openaire_beta_stats_monitor_ie
graph_db_name=openaire_beta_20231208
monitor_irish_db_shadow_name=openaire_beta_stats_monitor_ie_shadow
hive_timeout=150000
hadoop_user_name=dnet.beta
resumeFrom=Step1-buildIrishMonitorDB
2023-12-22 11:05:24 +02:00
dimitrispie
40b98d8182
Changes to indicators and funders definition
...
- Changes result_refereed definition
- Added result_country indicator
- Added indi_pub_green_with_license indicator
- Added country from jurisdiction to funders
2023-12-22 10:29:20 +02:00
Claudio Atzori
62104790ae
added metaresourcetype to the result hive DB view
2023-12-21 12:27:10 +01:00
Claudio Atzori
106968adaa
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-12-21 12:26:29 +01:00
Claudio Atzori
a8a4db96f0
added metaresourcetype to the result hive DB view
2023-12-21 12:26:19 +01:00
Miriam Baglioni
5011c4d11a
refactoring after compilation
2023-12-20 15:57:26 +01:00
Miriam Baglioni
4740c808f7
-
2023-12-20 14:26:54 +01:00
Miriam Baglioni
d410ea8a41
added needed parameter
2023-12-19 12:15:01 +01:00
Sandro La Bruzzo
37e36baf76
updated workflow for generation of Scholix Datasources to use mdstore transactions
2023-12-18 16:05:35 +01:00
Miriam Baglioni
624f5f3f21
[Transformative Agreement] added check to verify the APC were paid by the IReL funder
2023-12-18 15:28:19 +01:00
Miriam Baglioni
354e02e6a9
[Transformative Agreement] removed unneeded class. The json is read directly, with no need to pass through the csv
2023-12-18 15:20:27 +01:00
Miriam Baglioni
b00771c7cc
[Transformative Agreement] added code to extract relations from the transformative agreement file for the IE products obtained from OpenAPC
2023-12-18 15:12:44 +01:00
Sandro La Bruzzo
9d39845d1f
uploaded input parameters on CreateBaseline WF
2023-12-18 12:23:12 +01:00
Sandro La Bruzzo
15fd93a2b6
uploaded input parameters on CreateBaseline WF
2023-12-18 12:21:55 +01:00
Sandro La Bruzzo
9d342a47da
updated the transformation Baseline workflow to include mdstore rollback/commit action
2023-12-18 11:48:57 +01:00
Sandro La Bruzzo
1fbd4325f5
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2023-12-18 11:47:17 +01:00
Sandro La Bruzzo
1f1a6a5f5f
updated the transformation Baseline workflow to include mdstore rollback/commit action
2023-12-18 11:47:00 +01:00
Miriam Baglioni
3eca5d2e1c
-
2023-12-18 09:55:27 +01:00
Miriam Baglioni
01ce0b9c76
[doiboost - preprocess] remove transition to orcid preparation from sequence of steps at the beginning of the workflow
2023-12-15 12:24:55 +01:00
Miriam Baglioni
0d8e496a63
-
2023-12-15 12:16:43 +01:00
Claudio Atzori
c4ec35b6cd
Merge pull request 'Master branch updates from beta December 2023' ( #369 ) from beta_to_master_dicember2023 into master
...
Reviewed-on: D-Net/dnet-hadoop#369
2023-12-15 11:18:30 +01:00
Claudio Atzori
1726f49790
code formatting
2023-12-15 10:37:02 +01:00
Claudio Atzori
a59be5779e
Merge pull request '9078_xml_records_irish_tender' ( #368 ) from 9078_xml_records_irish_tender into beta
...
Reviewed-on: D-Net/dnet-hadoop#368
2023-12-12 12:34:43 +01:00
Claudio Atzori
ff924215b8
[graph provision] added tests for new peerreviewed field
2023-12-12 11:21:30 +01:00
Claudio Atzori
a6d635e695
Merge branch 'beta' into 9078_xml_records_irish_tender
2023-12-12 11:06:42 +01:00
Claudio Atzori
98cce5bfb2
code formatting
2023-12-12 09:59:05 +01:00
Claudio Atzori
84d54643cf
[cleaning] allow enriched orcids to pass the cleaning, rule out non-orcid author pids
2023-12-12 09:57:00 +01:00
Claudio Atzori
7e8eff40c1
[graph provision] added tests for the new model fields
2023-12-12 08:54:15 +01:00
Miriam Baglioni
8752d275fa
removed not needed parameter
2023-12-09 15:24:45 +01:00
Miriam Baglioni
d4eedada71
adjusting workflow definition
2023-12-09 15:20:11 +01:00
Claudio Atzori
aba95ed1d1
code formatting
2023-12-08 17:06:19 +01:00
Claudio Atzori
2877839df0
Merge pull request '[graph cleaning] added cleaning for result.publisher and result.instance.license' ( #366 ) from clean_license_publisher into beta
...
Reviewed-on: D-Net/dnet-hadoop#366
2023-12-08 16:58:37 +01:00
Claudio Atzori
34abd0fc43
Merge branch 'beta' into clean_license_publisher
2023-12-08 16:58:27 +01:00
Claudio Atzori
cb71a7936b
[graph cleaning] avoid stack overflow error when navigating Oaf objects declaring an Enum
2023-12-07 23:09:54 +01:00
Claudio Atzori
70eb1796b2
logging typo
2023-12-07 14:08:04 +01:00
Claudio Atzori
c381bacee0
[enrichment] passing the community API base URL
2023-12-07 14:07:11 +01:00
Miriam Baglioni
336fb31d87
[community_result_propagation] adjusting starting point of workflow
2023-12-07 10:27:25 +01:00
Miriam Baglioni
c0cde53bf6
[bulktagging] setting first step of bulktagging as the copy of the entities and relations not involved in the tagging
2023-12-07 10:08:35 +01:00
Miriam Baglioni
616622d2bb
first version of the workflow single step
2023-12-07 09:59:52 +01:00
Claudio Atzori
259c69e446
[orcid enrichment] fixed workflow definition
2023-12-06 19:41:53 +01:00
Claudio Atzori
431c6bb08a
[dedup] added isLookupUrl to the graph consistency workflow definition, required now by the entity grouping phase
2023-12-06 11:06:46 +01:00
Claudio Atzori
982c0c110b
Merge pull request '[graph provision] added serialization for the new fields imported from the stats DB' ( #365 ) from 9078_xml_records_irish_tender into beta
...
Reviewed-on: D-Net/dnet-hadoop#365
2023-12-05 16:39:44 +01:00
Claudio Atzori
321922772b
added serialization for the new fields imported for the Irish tender
2023-12-05 16:37:04 +01:00
Claudio Atzori
c5b7253130
[community_organization propagation] fixed workflow parameters
2023-12-05 09:13:33 +01:00
Claudio Atzori
3c3bdb8318
[bulktagging] fixed workflow parameters
2023-12-05 09:08:48 +01:00
Claudio Atzori
7c3041b276
avoid NPEs
2023-12-03 16:49:49 +01:00
Claudio Atzori
74b185d07b
avoid NPEs
2023-12-03 16:18:20 +01:00
Claudio Atzori
e6086efc53
avoid NPEs in Vocabulary.getTermBySynonym
2023-12-03 13:33:20 +01:00
Claudio Atzori
2a233a89aa
[graph grouping] added isLookupUrl to the workflow definition, passed to the grouping spark action
2023-12-03 13:32:52 +01:00
Claudio Atzori
178a14c491
code formatting
2023-12-03 13:31:58 +01:00
Sandro La Bruzzo
3caf6ff27e
Extracted the correct original type to pass to instanceTypeMapping in Crossref Mapping
2023-12-01 16:33:56 +01:00
Claudio Atzori
511a98dd80
fixed doiboost process workflow, removed references to the ProcessORCID step
2023-12-01 16:21:53 +01:00
Claudio Atzori
d33f578e54
code formatting
2023-12-01 15:14:17 +01:00
Claudio Atzori
c5ac593c07
Merge pull request 'ORCID Enrichment and Download' ( #364 ) from orcid_import into beta
...
Reviewed-on: D-Net/dnet-hadoop#364
2023-12-01 15:05:44 +01:00
Claudio Atzori
09d061e90b
Merge branch 'beta' into orcid_import
2023-12-01 15:05:35 +01:00
Claudio Atzori
93a700742a
Merge pull request 'Changes for tables and creation of the new indicator indi_is_result_accessible' ( #363 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#363
2023-12-01 15:05:23 +01:00
Claudio Atzori
0c3c9ea43d
Merge pull request 'StatsDB workflow to export actionsets about OA routes, diamond, and publicly-funded' ( #355 ) from dimitris.pierrakos/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#355
2023-12-01 15:03:56 +01:00
Claudio Atzori
33cb483c75
using objectSubType as originalType in Crossref2Oaf, code formatting
2023-12-01 15:03:05 +01:00
dimitrispie
c9d995dde0
New institutions added
2023-12-01 15:44:35 +02:00
dimitrispie
a397112cb8
Add new indicator
...
Add indi_pub_publicly_funded
2023-12-01 15:00:18 +02:00
dimitrispie
76594ded23
Changes to indicators
...
Fixes on open access colours indicators
- indi_pub_green_oa
- indi_pub_gold_oa
- indi_pub_hybrid
- indi_pub_bronze_oa
- indi_pub_diamond
2023-12-01 13:38:19 +02:00
Claudio Atzori
622fafbd2e
Merge branch 'beta' into orcid_import
2023-12-01 12:28:14 +01:00
Sandro La Bruzzo
bf0fd27c36
Removed unused function
...
Applied PR Comment of Giambattista in the PR
2023-12-01 12:16:42 +01:00
dimitrispie
48430a32a6
Update StatsAtomicActionsJob.java
...
Added indi_funded_result_with_fundref indicator
2023-12-01 11:35:01 +02:00
Sandro La Bruzzo
cdfb7588dd
code formatting
2023-11-30 15:31:42 +01:00
Sandro La Bruzzo
5e22b67b8a
Merge remote-tracking branch 'origin/beta' into orcid_import
2023-11-30 15:27:46 +01:00
Sandro La Bruzzo
f718caaac9
Added copy of the untouched entities of the graph
2023-11-30 14:51:00 +01:00
Sandro La Bruzzo
7b5e04f37e
removed Orcid intersection on DOIBoost
2023-11-30 14:36:50 +01:00
Claudio Atzori
4cbabc9fbc
Merge pull request '[ENRICHMENT][BETA] Use of community API in enrichment process AND addition to tagging result for communities through projects' ( #359 ) from propagationapi into beta
...
Reviewed-on: D-Net/dnet-hadoop#359
2023-11-30 14:20:33 +01:00
Claudio Atzori
6f10791e77
Merge branch 'beta' into propagationapi
2023-11-30 14:20:18 +01:00
Claudio Atzori
4e1aac2e2f
resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350
2023-11-29 14:37:52 +01:00
Sandro La Bruzzo
86b5775e08
added vocabulary in instanceTypeMapping for
...
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 13:15:43 +01:00
Sandro La Bruzzo
c96ff54b45
Merge remote-tracking branch 'origin/resource_types' into resource_types
2023-11-29 12:45:41 +01:00
Sandro La Bruzzo
af1c2634b3
added instanceTypeMapping original field in the mapping of
...
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 12:45:30 +01:00
Sandro La Bruzzo
279100fa52
added test
2023-11-29 11:17:58 +01:00
Sandro La Bruzzo
aa239ec673
Changed implementation of check similarity to verify exact match of name instead of the first char
2023-11-29 11:17:41 +01:00
Sandro La Bruzzo
59111713fa
added comment
2023-11-28 09:00:48 +01:00
Sandro La Bruzzo
6f4d0c05ea
Implemented Author Merger for ORCID that takes into account the case where name and surname are swapped
2023-11-28 08:43:56 +01:00
Miriam Baglioni
8eb70e6657
refactoring
2023-11-27 15:13:15 +01:00
Miriam Baglioni
e3cce9a5a0
merging with branch beta
2023-11-27 15:10:55 +01:00
Miriam Baglioni
48e0427a23
changed the parameter from production to baseURL. Fixed issue in tagging configuration
2023-11-27 15:10:27 +01:00
Sandro La Bruzzo
34a4b3cbdf
Implemented ORCID Enrichment
2023-11-24 12:39:58 +01:00
Claudio Atzori
1763d377ad
code formatting
2023-11-23 16:33:24 +01:00
Claudio Atzori
1ba582de3c
[graph cleaning] added cleaning for result.publisher and result.instance.license
2023-11-23 16:27:19 +01:00
dimitrispie
359e81b7a6
Update StatsAtomicActionsJob.java
...
Bug fix for duplicate bronze checks
2023-11-23 10:48:55 +02:00
Claudio Atzori
a0311e8a90
Merge pull request 'Clear working dir in bipranker workflow' ( #360 ) from 9120_bipranker_clean_working_dir into master
...
Reviewed-on: D-Net/dnet-hadoop#360
2023-11-22 14:10:39 +01:00
Claudio Atzori
8fb05888fd
Merge branch 'master' into 9120_bipranker_clean_working_dir
2023-11-22 14:10:30 +01:00
Claudio Atzori
a21617732a
Merge pull request 'graph cleaning, suggestions from ticket 8898 - round 2' ( #356 ) from cleaning_8898 into beta
...
Reviewed-on: D-Net/dnet-hadoop#356
2023-11-22 14:00:37 +01:00
Claudio Atzori
2c77638bf5
Merge branch 'beta' into cleaning_8898
2023-11-22 14:00:10 +01:00
Claudio Atzori
836d7ec724
Merge pull request 'Add Pubmed affiliations (inferred by BIP) as actionsets' ( #353 ) from 9117_pubmed_affiliations into beta
...
Reviewed-on: D-Net/dnet-hadoop#353
2023-11-22 13:53:07 +01:00
Claudio Atzori
745039ad5b
Merge branch 'beta' into 9117_pubmed_affiliations
2023-11-22 13:52:53 +01:00
Claudio Atzori
008fdf9d8a
Merge pull request 'URL Validator to accept double slashes' ( #352 ) from url_validation into beta
...
Reviewed-on: D-Net/dnet-hadoop#352
2023-11-22 13:52:08 +01:00
Claudio Atzori
11a1207f9c
[graph cleaning] applying coar based vocabularies in bulk
2023-11-22 12:22:14 +01:00
dimitrispie
a94a54a2d0
Changes for tables and creation of the new indicator indi_is_result_accessible
...
- Drop table statements for all tables to avoid duplicates in case of wf rerun
- Add pdfsaggregated step to create the indi_is_result_accessible table. This step is executed on the new impala cluster only, since the pdfaggregation_i is updated on this cluster.
2023-11-15 14:32:18 +02:00
Claudio Atzori
2b626815ff
Merge pull request 'Project propagation via communityAPI instead of using IS via IIS' ( #362 ) from projectPropagation into master
...
Reviewed-on: D-Net/dnet-hadoop#362
2023-11-14 16:37:53 +01:00
Miriam Baglioni
b177cd5a0a
Project propagation via communityAPI instead of using IS via IIS
2023-11-14 16:25:09 +01:00
Miriam Baglioni
eaf0a702de
-
2023-11-14 14:53:34 +01:00
Sandro La Bruzzo
6ce36b3e41
Implemented ORCID Workflow on DHP-Aggregation for retrieving ORCID DUMP and generating tables
2023-11-14 12:04:29 +01:00
dimitrispie
d524e30866
Changes to actionsets
...
Resolve comments from
D-Net/dnet-hadoop#355
2023-11-14 09:46:52 +02:00
Serafeim Chatzopoulos
671ba8a5a7
Clear working dir in bipranker workflow
2023-11-07 18:35:05 +02:00
Miriam Baglioni
5bc97615d5
-
2023-11-03 15:35:10 +01:00
Miriam Baglioni
7b1e34f159
refactoring
2023-11-03 15:30:01 +01:00
Miriam Baglioni
638ad9e74f
changing test for new implementation
2023-11-03 15:06:50 +01:00
Miriam Baglioni
edcb17ca98
refactoring and test
2023-11-03 13:01:14 +01:00
Claudio Atzori
5f1ed61c1f
merging from bulkTag branch
2023-11-03 12:51:37 +01:00
Claudio Atzori
8c03c41d5d
applying changes from beta
2023-11-03 12:08:39 +01:00
Claudio Atzori
97454e9594
Merge pull request '9117_pubmed_affiliations_prod' ( #357 ) from 9117_pubmed_affiliations_prod into master
...
Reviewed-on: D-Net/dnet-hadoop#357
2023-11-03 11:45:34 +01:00
Serafeim Chatzopoulos
7e34dde774
Renaming input param for crossref input path
2023-11-02 17:47:04 +02:00
Serafeim Chatzopoulos
24c3f92d87
Change the description of the workflow
2023-11-02 17:46:51 +02:00
Serafeim Chatzopoulos
6ce9b600c1
Add actionset creation for pubmed affiliations
2023-11-02 17:46:39 +02:00
Serafeim Chatzopoulos
94089878fd
Adjust tests to new WF input params
2023-11-02 17:46:13 +02:00
Miriam Baglioni
937ff6a7c7
-
2023-10-31 15:56:08 +01:00
Miriam Baglioni
a737dd47b6
removed not needed test class
2023-10-31 15:54:49 +01:00
Miriam Baglioni
c80b768af0
test for project propagation
2023-10-31 15:49:42 +01:00
Miriam Baglioni
e9a20fc8f6
merging with branch beta
2023-10-31 14:36:03 +01:00
Claudio Atzori
dde2fec035
[graph cleaning] cleanup
2023-10-31 14:35:33 +01:00
Claudio Atzori
262d7c581b
[graph cleaning] implemented further suggestions from https://support.openaire.eu/issues/8898
2023-10-31 14:34:10 +01:00
Serafeim Chatzopoulos
2090003ea9
Adjust tests to new WF input params
2023-10-26 13:47:06 -07:00
Miriam Baglioni
0097f4e64b
Removed Query community testing. Removed package from common related to the interaction with Zenodo since it was moved to the dump-project
2023-10-26 09:38:09 +02:00
Serafeim Chatzopoulos
a82aaf57b2
Renaming input param for crossref input path
2023-10-25 12:05:02 -07:00
Claudio Atzori
b3a61ea955
Merge branch 'beta' into url_validation
2023-10-25 14:22:56 +02:00
dimitrispie
89c4dfbaf4
StatsDB workflow to export actionsets about OA routes, diamond, and publicly-funded
...
A new oozie workflow that reads from the stats db to produce a new actionSet for updating results with:
- green_oa ={true, false}
- openAccesColor = {gold, hybrid, bronze}
- in_diamond_journal={true, false}
- publicly_funded={true, false}
Inputs:
- outputPath
- statsDB
2023-10-24 09:48:23 +03:00
Miriam Baglioni
5c5a195e97
refactoring and fixing issue on property name
2023-10-23 11:26:17 +02:00
Claudio Atzori
a870aa2b09
depending on dhp-schemas:3.17.2
2023-10-20 22:28:39 +02:00
Claudio Atzori
7fc621cdec
added defaults to the graph resolution workflow config-default.xml
2023-10-20 22:28:12 +02:00
Miriam Baglioni
70b78a40c7
removed file from different propagation
2023-10-20 15:50:49 +02:00
Miriam Baglioni
f206ff42d6
modified code to use the API. Removed unneeded parameters. Rewrote the code to exploit the parallel stream on the entity types
2023-10-20 15:49:41 +02:00
Miriam Baglioni
34358afe75
modified resource file, workflow and default-config. Added 3g of memory overhead and specified the shuffle partition in the wf configuration. Removed the multiple instantiation in the wf because of different implementation of the spark job
2023-10-20 15:48:27 +02:00
Miriam Baglioni
18bfff8af3
adding test classes and modifying test for bulktag
2023-10-20 15:47:03 +02:00
Miriam Baglioni
69dac91659
adding the new code to use the API instead of the Information Service
2023-10-20 15:45:52 +02:00
Serafeim Chatzopoulos
aad5982bf1
Change the description of the workflow
2023-10-20 12:48:21 +03:00
Miriam Baglioni
a9ede1e989
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-10-20 10:14:43 +02:00
Miriam Baglioni
a4214ced1e
fixing issue on propagation organization. added --config to workflow definition. added oozie_app to community project
2023-10-20 10:14:20 +02:00
Serafeim Chatzopoulos
6b19dcee80
Add actionset creation for pubmed affiliations
2023-10-19 19:58:25 +03:00
Claudio Atzori
2b9d0416ec
[graph raw] URL Validator to accept double slashes
2023-10-19 16:26:37 +02:00
Claudio Atzori
b0fed1725e
avoid NPEs
2023-10-19 12:13:45 +02:00
Miriam Baglioni
f1b898c6b4
merging with branch beta
2023-10-19 09:04:35 +02:00
Claudio Atzori
a24178cb93
Merge branch 'beta' into resource_types
2023-10-17 11:09:50 +02:00
Claudio Atzori
d28b7085f6
more NPE checks
2023-10-17 11:09:31 +02:00
Claudio Atzori
3b1c8b9fbd
Merge pull request 'FIX: GroupEntitiesSparkJob deletes whole graph outputPath instead of its temporary folder' ( #351 ) from fix_consistency_missing_rels into beta
...
Reviewed-on: D-Net/dnet-hadoop#351
2023-10-17 08:40:23 +02:00
Claudio Atzori
1d594eaffd
Merge branch 'beta' into fix_consistency_missing_rels
2023-10-17 08:40:07 +02:00
Giambattista Bloisi
0e44b037a5
FIX: GroupEntitiesSparkJob deletes whole graph outputPath instead of its temporary folder
2023-10-17 07:54:01 +02:00
Claudio Atzori
6dfcd0c9a2
[raw graph] mapping original resource types
2023-10-16 12:57:18 +02:00
Claudio Atzori
39d24d5469
Merge branch 'beta' into resource_types
2023-10-16 11:56:38 +02:00
Claudio Atzori
389e3fcc59
Merge pull request '[dedup] use common `saveParquet` and `save` methods to ensure outputs are compressed' ( #349 ) from fix_dedup_not_compressed into beta
...
Reviewed-on: D-Net/dnet-hadoop#349
2023-10-16 11:56:18 +02:00
Sandro La Bruzzo
a5a89a702f
new spark parameter updated
2023-10-16 11:46:12 +02:00
Miriam Baglioni
159388f9c2
testing and fix some issues
2023-10-16 11:26:07 +02:00
Claudio Atzori
03670bb9ce
[dedup] use common saveParquet and save methods to ensure outputs are compressed
2023-10-16 10:55:47 +02:00
Claudio Atzori
54fbf09ac6
[raw graph] WIP: mapping original resource types
2023-10-16 08:57:47 +02:00
Claudio Atzori
6cf64d5d8b
[SWH] renamed 'Software Heritage Identifier' to 'Software Hash Identifier'
2023-10-13 10:09:26 +02:00
Claudio Atzori
242d647146
cleanup & docs
2023-10-12 12:23:44 +02:00
Claudio Atzori
76447958bb
cleanup & docs
2023-10-12 12:23:20 +02:00
Claudio Atzori
af3ffad6c4
[AMF] docs
2023-10-12 10:07:52 +02:00
Claudio Atzori
1902728f7e
Merge pull request '[ActionManagerFramework] documentation' ( #347 ) from actionset_docs into beta
...
Reviewed-on: D-Net/dnet-hadoop#347
2023-10-12 10:07:25 +02:00
Claudio Atzori
dda602fff7
[AMF] docs
2023-10-12 10:05:46 +02:00
Claudio Atzori
05ee7d8b09
[graph cleaning] avoid NPEs
2023-10-12 09:13:42 +02:00
Miriam Baglioni
8e9493fad9
merging with branch beta
2023-10-11 18:18:09 +02:00
Miriam Baglioni
89184d5b4f
used the API instead of the IS for bulktagging and propagation for community through organization. Added a new propagation step for communities through projects. Still using the API and not the IS
2023-10-11 18:17:35 +02:00
Claudio Atzori
554551682d
[raw graph] adopting the new COAR based vocabularies for the resource typing
2023-10-11 16:09:19 +02:00
Claudio Atzori
a460ebe215
[UnresolvedEntities] updated action name
2023-10-10 15:50:11 +02:00
Claudio Atzori
ecea58a41c
Merge pull request '[UnresolvedEntities] changing in the creation of the unresolved entities' ( #346 ) from fos into beta
...
Reviewed-on: D-Net/dnet-hadoop#346
2023-10-10 15:10:21 +02:00
Claudio Atzori
66064e99fe
Merge branch 'beta' into fos
2023-10-10 15:07:21 +02:00
Miriam Baglioni
a431b04814
leftover for the properties and removal of bipfinder
2023-10-10 12:53:57 +02:00
Claudio Atzori
ed9282ef2a
removed module dhp-stats-monitor-update
2023-10-10 09:52:03 +02:00
Miriam Baglioni
110ce4b40f
extend the fos model to include the level4 and the scores for level3 and level4. removed bip indicators from the instance
2023-10-10 09:46:40 +02:00
Claudio Atzori
204404b0e3
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-10-10 09:36:13 +02:00
Claudio Atzori
9a98f408b3
code formatting
2023-10-10 09:36:11 +02:00
Claudio Atzori
4e6fccf4f6
Merge pull request 'Beta stats wf updated' ( #332 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#332
2023-10-10 09:35:32 +02:00
Miriam Baglioni
a3d01ccb24
refactoring
2023-10-09 14:52:17 +02:00
Miriam Baglioni
8448b9ebfb
merging with branch beta
2023-10-09 14:27:23 +02:00
Miriam Baglioni
3d6be20989
changes to use the API instead of the IS to get the information for the communities to be used during bulktagging and context propagation
2023-10-09 14:26:33 +02:00
dimitrispie
17586f0ff8
Update step20-createMonitorDB.sql
...
Add result_orcid table to monitor dbs
2023-10-09 14:21:31 +03:00
dimitrispie
489a082f04
Update step16-createIndicatorsTables.sql
...
Change scripts for gold, hybrid, bronze indicators
2023-10-09 14:00:50 +03:00
Claudio Atzori
ef833840c3
[Doiboost] removed linkage to SFI unidentified project
2023-10-06 15:48:18 +02:00
Claudio Atzori
84a58802ab
[OC] using the common pid cleaning function
2023-10-06 14:48:05 +02:00
Claudio Atzori
46034630cf
[OC] compress the output actionset
2023-10-06 14:42:02 +02:00
Claudio Atzori
774e874d18
Merge pull request 'implemented relation to irish funder from a Json list' ( #344 ) from irish_funder into beta
...
Reviewed-on: D-Net/dnet-hadoop#344
2023-10-06 14:26:54 +02:00
Claudio Atzori
3bc44fbf1d
Merge branch 'beta' into irish_funder
2023-10-06 14:26:41 +02:00
Claudio Atzori
11153742c9
Merge pull request 'Extending the coverage of the peer non-unknown refereed instances' ( #342 ) from peer_reviewed into beta
...
Reviewed-on: D-Net/dnet-hadoop#342
2023-10-06 14:22:13 +02:00
Claudio Atzori
8108491722
Merge branch 'beta' into peer_reviewed
2023-10-06 14:21:52 +02:00
Giambattista Bloisi
2f3cf6d0e7
Fix cleaning of Pmid where parsing of numbers stopped at the first non-leading '0' character
2023-10-06 14:20:15 +02:00
Claudio Atzori
ba5475ed4c
Merge pull request 'Fix cleaning of Pmid where parsing of numbers stopped at first not leading 0 (zero) character' ( #345 ) from fix_truncated_pmid into master
...
Reviewed-on: D-Net/dnet-hadoop#345
2023-10-06 14:19:49 +02:00
Claudio Atzori
6856ab28ab
Merge pull request 'SWH_integration' ( #343 ) from SWH_integration into beta
...
Reviewed-on: D-Net/dnet-hadoop#343
2023-10-06 14:15:56 +02:00
Claudio Atzori
3c23d5f9bc
Merge branch 'beta' into SWH_integration
2023-10-06 14:15:38 +02:00
Claudio Atzori
858931ccb6
[SWH] compress the output actionset
2023-10-06 14:03:33 +02:00
Claudio Atzori
f759b18bca
[SWH] aligned parameter name
2023-10-06 13:43:20 +02:00
Giambattista Bloisi
2c235e82ad
Fix cleaning of Pmid where parsing of numbers stopped at the first non-leading '0' character
2023-10-06 12:35:54 +02:00
Claudio Atzori
eed9fe0902
code formatting
2023-10-06 12:31:17 +02:00
Claudio Atzori
7f27111b1f
Merge branch 'importpoci' into beta
2023-10-06 12:23:28 +02:00
Claudio Atzori
73c49b8d26
Merge branch 'beta' into SWH_integration
2023-10-06 12:21:51 +02:00
Sandro La Bruzzo
42a2dad975
implemented relation to irish funder from a Json list
2023-10-06 11:52:33 +02:00
Sandro La Bruzzo
13f332ce77
ignored jenv prop
2023-10-06 10:40:05 +02:00
Serafeim Chatzopoulos
1bb83b9188
Add prefix in SWH ID
2023-10-04 20:31:45 +03:00
Claudio Atzori
ee8a39e7d2
cleanup and refinements
2023-10-04 12:32:05 +02:00
Serafeim Chatzopoulos
e9f24df21c
Move SWH API Key from constants to workflow param
2023-10-03 20:57:57 +03:00
Serafeim Chatzopoulos
cae75fc75d
Add SWH in the collectedFrom field
2023-10-03 16:55:10 +03:00
Serafeim Chatzopoulos
b49a3ac9b2
Add actionsetsPath as a global WF param
2023-10-03 15:43:38 +03:00
Serafeim Chatzopoulos
24c43e0c60
Restructure workflow parameters
2023-10-03 15:11:58 +03:00
Serafeim Chatzopoulos
9f73d93e62
Add param for limiting repo Urls
2023-10-03 14:39:08 +03:00
Claudio Atzori
b446a9ed98
Merge branch 'beta' into peer_reviewed
2023-10-03 10:52:23 +02:00
Claudio Atzori
f344ad76d0
Merge pull request 'extended existing code to import of POCI from open citation' ( #340 ) from importpoci into beta
...
Reviewed-on: D-Net/dnet-hadoop#340
2023-10-03 10:52:11 +02:00
Claudio Atzori
5919e488dd
Merge branch 'beta' into importpoci
2023-10-03 10:43:53 +02:00
Serafeim Chatzopoulos
839a8524e7
Add action for creating actionsets
2023-10-02 23:50:38 +03:00
Claudio Atzori
c9a5ad6a02
extending the coverage of the peer non-unknown refereed instances
2023-10-02 16:28:42 +02:00
Miriam Baglioni
d7fccdc64b
fixed paths in wf to match the req of the pathname
2023-10-02 14:10:57 +02:00
Miriam Baglioni
9898470b0e
Addressing comments in D-Net/dnet-hadoop#340 #issuecomment-10592
2023-10-02 12:54:16 +02:00
Giambattista Bloisi
c412dc162b
Fix bug in conversion from dedup json model to Spark Dataset of Rows: list of strings contained the json escaped representation of the value instead of the plain value, this caused instanceTypeMatch failures because of the leading and trailing double quotes
2023-10-02 11:34:51 +02:00
Claudio Atzori
4ac06c9e37
Merge pull request 'Fix bug in conversion from dedup json model to Spark Dataset of Rows (instanceTypeMatch no longer working)' ( #339 ) from fix_dedupfailsonmatchinginstances into master
...
Reviewed-on: D-Net/dnet-hadoop#339
2023-10-02 11:34:20 +02:00
Claudio Atzori
fa692b3629
Merge branch 'master' into fix_dedupfailsonmatchinginstances
2023-10-02 11:28:16 +02:00
Claudio Atzori
5d09b7db8b
Merge pull request 'SparkPropagateRelation relations do not propagate deletedByInference and invisible' ( #333 ) from consistency_keep_mergerels into beta
...
Reviewed-on: D-Net/dnet-hadoop#333
2023-10-02 11:27:57 +02:00
Claudio Atzori
7b403a920f
Merge branch 'beta' into consistency_keep_mergerels
2023-10-02 11:26:00 +02:00
Claudio Atzori
dc86018a5f
Merge branch 'merge_entities_job' into beta
2023-10-02 11:24:48 +02:00
Giambattista Bloisi
3c47920c78
Use asScala to convert java List to Scala Sequence
2023-10-02 11:04:47 +02:00
Claudio Atzori
7f244d9a7a
code formatting
2023-10-02 11:04:36 +02:00
Giambattista Bloisi
e239b81740
Fix defect #8997 : GenerateEventsJob is generating huge amounts of logs because broker entity similarity calculation consistently failed
2023-10-02 11:04:18 +02:00
Claudio Atzori
ef02648399
Merge pull request 'fixed dedup configuration management in the Broker workflow' ( #341 ) from fix_8997 into master
...
Reviewed-on: D-Net/dnet-hadoop#341
2023-10-02 11:03:50 +02:00
Claudio Atzori
d13bb534f0
Merge branch 'master' into fix_8997
2023-10-02 11:03:18 +02:00
Miriam Baglioni
e84f5b5e64
extended existing code to accommodate import of POCI from open citation
2023-10-02 09:25:16 +02:00
Serafeim Chatzopoulos
ab0d70691c
Add step for archiving repoUrls to SWH
2023-09-28 20:56:18 +03:00
Giambattista Bloisi
775c3f704a
Fix bug in conversion from dedup json model to Spark Dataset of Rows: list of strings contained the json escaped representation of the value instead of the plain value, this caused instanceTypeMatch failures because of the leading and trailing double quotes
2023-09-27 22:30:47 +02:00
Serafeim Chatzopoulos
ed9c81a0b7
Add steps to collect last visit data && archive not found repository URLs
2023-09-27 19:00:54 +03:00
Sandro La Bruzzo
9c3ab11d5b
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2023-09-25 15:29:19 +02:00
Sandro La Bruzzo
423ef30676
minor fix on the aggregation of uniprot and pdb
2023-09-25 15:28:58 +02:00
Giambattista Bloisi
7152d47f84
Use asScala to convert java List to Scala Sequence
2023-09-20 16:14:27 +02:00
Claudio Atzori
4853c19b5e
code formatting
2023-09-20 15:53:21 +02:00
Giambattista Bloisi
1f226d1dce
Fix defect #8997 : GenerateEventsJob is generating huge amounts of logs because broker entity similarity calculation consistently failed
2023-09-20 15:42:00 +02:00
Alessia Bardi
0935d7757c
Use v5 of the UNIBI Gold ISSN list in test
2023-09-20 15:41:35 +02:00
Alessia Bardi
cc7204a089
tests for d4science catalog
2023-09-20 15:38:32 +02:00
Sandro La Bruzzo
76476cdfb6
Added maven repo for dependencies that are not in maven central
2023-09-20 10:33:14 +02:00
Alessia Bardi
6186cdc2cc
Use v5 of the UNIBI Gold ISSN list in test
2023-09-19 14:47:01 +02:00
Alessia Bardi
d94b9bebf7
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-09-19 13:38:45 +02:00
Alessia Bardi
19abba8fa7
tests for d4science catalog
2023-09-19 13:38:25 +02:00
dimitrispie
9ef971a146
Update step16-createIndicatorsTables.sql
...
Fix int year for:
indi_org_openess_year
indi_org_fairness_year
indi_org_findable_year
2023-09-19 14:25:42 +03:00
Serafeim Chatzopoulos
9d44418d38
Add collecting software code repository URLs
2023-09-14 18:43:25 +03:00
Serafeim Chatzopoulos
395a4af020
Run CC and RAM sequentially in dhp-impact-indicators WF
2023-09-13 08:59:40 +02:00
Claudio Atzori
c2f179800c
Merge pull request 'Run CC and RAM sequentially in dhp-impact-indicators WF' ( #338 ) from run_cc_and_ram_sequentially into master
...
Reviewed-on: D-Net/dnet-hadoop#338
2023-09-13 08:52:53 +02:00
Serafeim Chatzopoulos
2aed5a74be
Run CC and RAM sequentially in dhp-impact-indicators WF
2023-09-12 22:31:50 +03:00
Claudio Atzori
8a6892cc63
[graph dedup] consistency wf should not remove the relations while dispatching the entities
2023-09-12 21:27:05 +02:00
Claudio Atzori
4dc4862011
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-09-12 14:34:34 +02:00
Claudio Atzori
dc80ab14d3
[graph dedup] consistency wf should not remove the relations while dispatching the entities
2023-09-12 14:34:28 +02:00
Alessia Bardi
77a2199837
updated test for EOSC community
2023-09-08 11:05:49 +02:00
Claudio Atzori
4786aa0e09
added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map
2023-09-07 11:21:07 +02:00
Claudio Atzori
265180bfd2
added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map
2023-09-07 11:20:35 +02:00
dimitrispie
5f90cc11e9
Update step16-createIndicatorsTables.sql
...
Fix indi_pub_bronze_oa
2023-09-06 14:14:38 +03:00
Claudio Atzori
da0e9828f7
resolved conflicts for PR#337
2023-09-06 11:28:46 +02:00
Claudio Atzori
9f5d16624c
Merge pull request '[graph raw] datainfo.invisible set as true only for entities' ( #336 ) from invisible_relations into beta
...
Reviewed-on: D-Net/dnet-hadoop#336
2023-09-04 16:14:47 +02:00
Claudio Atzori
adec6692ca
Merge branch 'beta' into invisible_relations
2023-09-04 16:13:06 +02:00
Claudio Atzori
15666e86a8
added collectedfrom to the affiliation relations imported from Crossref
2023-09-04 15:56:06 +02:00
Claudio Atzori
7d6bd4f20b
Merge pull request 'Fix import of affiliations relations from Crossref' ( #335 ) from 8876_fix_crossref_affiliation_relations_import into beta
...
Reviewed-on: D-Net/dnet-hadoop#335
2023-09-04 15:19:58 +02:00
Claudio Atzori
5b06c9d06f
[graph raw] datainfo.invisible set as true only for entities
2023-09-04 15:15:24 +02:00
Serafeim Chatzopoulos
7de0164c26
Fix import of affiliations relations from Crossref
2023-09-04 16:04:41 +03:00
Giambattista Bloisi
2caaaec42d
Include SparkCleanRelation logic in SparkPropagateRelation
...
SparkPropagateRelation includes merge relations
Revised tests for SparkPropagateRelation
2023-09-04 11:33:20 +02:00
dimitrispie
964c2f553e
Changes in indicators step, monitor step
...
- graduatedoctorates for observatory
- result_apc_affiliations table
- new indicators
indi_is_funder_plan_s
indi_funder_fairness
indi_ris_fairness
indi_funder_openess
indi_ris_openess
indi_funder_findable
indi_ris_findable
indi_is_project_result_after
- cast year to int in composite indicators
- new institutions
-- Universidade Católica Portuguesa
-- Iscte - Instituto Universitário de Lisboa
-- Munster Technological University
-- Cardiff University
-- Leibniz Institute of Ecological Urban and Regional Development
2023-09-01 10:57:02 +03:00
Giambattista Bloisi
6cc7d8ca7b
GroupEntities and DispatchEntities are now merged in GroupEntitiesSparkJob
2023-08-30 10:43:31 +02:00
Claudio Atzori
488d9a1cea
Merge pull request 'Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set it is defaulted to 1Gb' ( #331 ) from consistencywf_memoryoverhead_conf into beta
...
Reviewed-on: D-Net/dnet-hadoop#331
2023-08-29 16:31:36 +02:00
Giambattista Bloisi
6b1c05d118
Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set it is defaulted to 1Gb
2023-08-29 16:04:19 +02:00
Claudio Atzori
bf35280ea6
code formatting
2023-08-29 11:11:00 +02:00
Claudio Atzori
0515d81c7c
Merge pull request 'Rewrite SparkPropagateRelation exploiting Dataframe API' ( #330 ) from propagate_relation_rewrite into beta
...
Reviewed-on: D-Net/dnet-hadoop#330
2023-08-29 10:47:14 +02:00
Claudio Atzori
58665a246c
Merge branch 'beta' into propagate_relation_rewrite
2023-08-29 10:47:02 +02:00
Claudio Atzori
f437be80ad
[impact indicators] adjusted paths in the bip ranker wf parameters
2023-08-29 09:03:03 +02:00
Giambattista Bloisi
d012aec0b3
Revert PropagateRelation's argument name from outputPath to graphOutputPath in consistency workflow ( #8964 )
2023-08-28 22:44:54 +02:00
Giambattista Bloisi
a860e19423
Fix: ensure all relations are written out, not only those managed by dedup
2023-08-28 15:36:02 +02:00
Giambattista Bloisi
0d7b2bf83d
Rewrite SparkPropagateRelation exploiting Dataframe API
2023-08-28 10:34:54 +02:00
Miriam Baglioni
9c8b41475a
Merge pull request '8172_impact_indicators_workflow' ( #284 ) from 8172_impact_indicators_workflow into beta
...
Reviewed-on: D-Net/dnet-hadoop#284
2023-08-14 15:50:48 +02:00
Serafeim Chatzopoulos
97c1ba8918
Merge actionsets of results and projects
2023-08-11 15:56:53 +03:00
Miriam Baglioni
35b8deb2c6
Merge pull request 'DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag' ( #329 ) from dispatch_filter_invisible_entities into beta
...
Reviewed-on: D-Net/dnet-hadoop#329
2023-08-10 12:56:18 +02:00
Giambattista Bloisi
95cd2b9b1e
Make filterInvisible a mandatory parameter of DispatchEntitiesSparkJob
...
Make filterInvisible a mandatory parameter of both dedup/consistency and graph/group oozie workflows
2023-08-10 11:53:48 +02:00
Giambattista Bloisi
fab9920271
DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
2023-08-09 15:41:43 +02:00
Miriam Baglioni
599828ce35
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-08-09 13:07:13 +02:00
Miriam Baglioni
c25ac21e5e
Merge pull request 'graph cleaning, suggestions from ticket 8898' ( #325 ) from cleaning_8898 into beta
...
Reviewed-on: D-Net/dnet-hadoop#325
2023-08-08 11:14:19 +02:00
Miriam Baglioni
c334fe2438
Merge pull request 'Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities' ( #328 ) from cleanup_relations_after_dedup into beta
...
Reviewed-on: D-Net/dnet-hadoop#328
2023-08-08 09:49:12 +02:00
Miriam Baglioni
0e2f855807
Merge pull request 'Updates Promotion DBs' ( #321 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#321
2023-08-07 12:09:16 +02:00
Miriam Baglioni
18fbe52b20
Merge pull request 'Import affiliation relations from Crossref' ( #320 ) from 8876 into beta
...
Reviewed-on: D-Net/dnet-hadoop#320
2023-08-07 10:45:30 +02:00
Giambattista Bloisi
97b6d1dc45
Filter ids by dataInfo.deletedbyinference and DataInfo.invisible flags
...
Filter relations also by dataInfo.invisible flag
2023-08-07 10:24:11 +02:00
Giambattista Bloisi
af49424b59
Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities
2023-08-04 14:27:39 +02:00
Claudio Atzori
0bc74e2000
code formatting
2023-08-02 11:52:10 +02:00
Claudio Atzori
7180911ded
[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests
2023-08-02 11:44:14 +02:00
Claudio Atzori
b9dddbfe54
rule out records with NULL dataInfo, except for Relations
2023-07-31 17:53:54 +02:00
Claudio Atzori
da1727f93f
rule out records with NULL dataInfo, except for Relations
2023-07-31 17:52:56 +02:00
Claudio Atzori
11ffb9bd68
rule out records with NULL dataInfo
2023-07-31 12:35:33 +02:00
Claudio Atzori
ccac6a7f75
rule out records with NULL dataInfo
2023-07-31 12:35:05 +02:00
Serafeim Chatzopoulos
7cefe2665b
Remove unnecessary classes
2023-07-28 19:14:39 +03:00
Serafeim Chatzopoulos
26a92ce762
Merge branch '8876' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8876
2023-07-28 19:03:57 +03:00
Serafeim Chatzopoulos
ebfba38ab6
Add changes from code review
2023-07-28 19:03:47 +03:00
Serafeim Chatzopoulos
eb8684a8cf
Merge branch 'beta' into 8876
2023-07-28 13:39:33 +02:00
Claudio Atzori
1275a07d45
Merge pull request '[graph indexing] expand the instance level fulltext in the XML records' ( #326 ) from instance_fulltext_xml into beta
...
Reviewed-on: D-Net/dnet-hadoop#326
2023-07-27 15:02:07 +02:00
Claudio Atzori
a72b9e96ac
expand the instance level fulltext in the XML records
2023-07-27 14:57:38 +02:00
Claudio Atzori
d512df8612
code formatting
2023-07-26 09:14:08 +02:00
Claudio Atzori
d8435a6512
inverted condition
2023-07-25 17:39:57 +02:00
Claudio Atzori
59764145bb
cherry picked & fixed commit 270df939c4
2023-07-25 17:39:00 +02:00
Claudio Atzori
270df939c4
partial implementation of the suggestions from https://support.openaire.eu/issues/8898
2023-07-25 17:29:50 +02:00
Claudio Atzori
8c63e4a864
Merge pull request 'Refactor Dedup using Spark Dataframe API, initial support for scala 2.12 and Spark 3.4' ( #324 ) from dedup-with-dataframe-2 into beta
...
Reviewed-on: D-Net/dnet-hadoop#324
2023-07-25 10:17:17 +02:00
Giambattista Bloisi
e64c2854a3
Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
...
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Giambattista Bloisi
bb5b845e3c
Use scala.binary.version property to resolve scala maven dependencies
...
Ensure consistent usage of maven properties
Profile for compiling with scala 2.12 and Spark 3.4
2023-07-24 11:13:48 +02:00
Claudio Atzori
002b24e06f
Merge pull request '[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests' ( #315 ) from pid_cleaning into beta
...
Reviewed-on: D-Net/dnet-hadoop#315
2023-07-24 10:49:44 +02:00
Claudio Atzori
c754397a19
Merge branch 'beta' into pid_cleaning
2023-07-24 10:49:31 +02:00
Claudio Atzori
f0678cda09
Merge pull request 'fix_beta_tests' ( #323 ) from fix_beta_tests into beta
...
Reviewed-on: D-Net/dnet-hadoop#323
2023-07-24 10:47:35 +02:00
Serafeim Chatzopoulos
3a0f09774a
Add script to find score limits
2023-07-21 17:55:41 +03:00
Ilias Kanellos
06b9b71c4e
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-07-21 17:42:49 +03:00
Ilias Kanellos
2374f445a9
Produce additional bip update specific files
2023-07-21 17:42:46 +03:00
Serafeim Chatzopoulos
cb0f3c50f6
Format workflow.xml
2023-07-21 16:07:10 +03:00
Serafeim Chatzopoulos
c64e5e588f
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-07-21 15:27:02 +03:00
Serafeim Chatzopoulos
2cc5b1a39b
Fixes in workflow.xml
2023-07-21 15:26:50 +03:00
Ilias Kanellos
0f96af5d56
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-07-21 13:42:35 +03:00
Ilias Kanellos
03da965162
Format bip-score based file without doi references
2023-07-21 13:42:30 +03:00
Giambattista Bloisi
f03153823a
Update testCitationRelations number of expected citations according to changes made in 0559d8b4
(monodirectional citations)
2023-07-21 10:48:28 +02:00
Giambattista Bloisi
54c1eacef1
SparkJobTest was failing because testing workingdir was not cleaned up after each test
2023-07-21 10:42:24 +02:00
Giambattista Bloisi
5e15f20e6e
Fix entityMerger that was excluding the authors of the first entity in the list to merge
2023-07-21 00:46:54 +02:00
Giambattista Bloisi
0210a14e43
Ignore timestamp differences in PromoteActionPayloadForGraphTableJobTest
2023-07-20 23:45:57 +02:00
Giambattista Bloisi
dba34505de
Fix SparkStatsTest bug where parquet tables were incorrectly read as text files leading to unpredictable count() values
2023-07-19 14:24:52 +02:00
Giambattista Bloisi
e47ed1fdb2
Use DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES in json mapper to avoid that tests fail if they encounter unmapped properties
2023-07-19 14:21:40 +02:00
Giambattista Bloisi
38dfebfbe6
Disable MdStoreClientTest test as it requires a local mongodb running and it does not perform any assertions
2023-07-19 14:18:56 +02:00
Miriam Baglioni
9e8e39f78a
-
2023-07-19 11:35:58 +02:00
Claudio Atzori
373a5f2c83
Merge pull request 'Master branch updates from beta July 2023' ( #317 ) from master_july23 into master
...
Reviewed-on: D-Net/dnet-hadoop#317
2023-07-18 18:22:04 +02:00
Serafeim Chatzopoulos
db4ca43ee8
Resolve conflict
2023-07-18 18:38:26 +03:00
Serafeim Chatzopoulos
be320ba3c1
Indentation fixes
2023-07-17 16:04:21 +03:00
dimitrispie
be4856ef35
Update step15.sql
2023-07-17 15:33:58 +03:00
Serafeim Chatzopoulos
bc1a4611aa
Minor changes
2023-07-17 11:17:53 +03:00
Claudio Atzori
8af129b0c7
merged stats promotion step from antonis/promotion-prod-only
2023-07-13 15:03:28 +02:00
dimitrispie
706092bc19
Update updateProductionViews.sh
2023-07-13 15:48:12 +03:00
dimitrispie
aedd279f78
Updates Promotion DBs
...
- Add a step for promoting the split monitor DBs
2023-07-13 15:35:46 +03:00
dimitrispie
163b2ee2a8
Changes
...
1. Monitor updates
2. Bug fixes during copy to impala cluster
2023-07-13 15:25:00 +03:00
dimitrispie
76901a25f9
Updates Promotion DBs
...
- Add a step for promoting the split monitor DBs
2023-07-12 22:49:08 +03:00
Giambattista Bloisi
ef493681d9
Merge pull request 'Import dnet-pace-core module in this project and use it after renaming to dhp-pace-core' ( #319 ) from beta_with_pace_core into beta
...
Reviewed-on: D-Net/dnet-hadoop#319
2023-07-11 14:03:15 +02:00
Serafeim Chatzopoulos
4eba14a80e
Add oozie workflow
2023-07-06 21:07:50 +03:00
Serafeim Chatzopoulos
c2998a14e8
Add basic tests for affiliation relations
2023-07-06 20:28:16 +03:00
Serafeim Chatzopoulos
bc7b00bcd1
Add bi-directional affiliation relations
2023-07-06 18:29:15 +03:00
Serafeim Chatzopoulos
12528ed2ef
Refactor PrepareAffiliationRelations.java to use OafMapperUtils common functions
2023-07-06 18:08:33 +03:00
Serafeim Chatzopoulos
bbc245696e
Prepare actionsets for BIP affiliations
2023-07-06 15:56:12 +03:00
Ilias Kanellos
0c433eccdd
Fix scores & Workflow
2023-07-06 15:06:28 +03:00
Ilias Kanellos
d5c39a1059
Fix map scores to doi
2023-07-06 15:04:48 +03:00
Ilias Kanellos
772d5f0aab
Make PR and AttRank serial
2023-07-06 13:47:51 +03:00
Giambattista Bloisi
801da2fd4a
New sources formatted by maven plugin
2023-07-06 10:28:53 +02:00
Giambattista Bloisi
bd3fcf869a
rename dnet-pace-core into dhp-pace-core module and use it as dependency in other modules
2023-07-06 10:02:23 +02:00
Serafeim Chatzopoulos
347a889b20
Read affiliation relations
2023-07-06 00:51:01 +03:00
Giambattista Bloisi
3b35db5fbd
Import dnet-pace-core module from dnet-dedup repository
2023-07-05 22:23:06 +02:00
Miriam Baglioni
8dcd028eed
[UsageCount] fixed typo in attribute name for datasource table
2023-07-01 16:07:22 +02:00
Miriam Baglioni
4c9bc4c3a5
refactoring
2023-06-30 19:05:15 +02:00
Miriam Baglioni
8621377917
[UsageCount] fixed typo in attribute name for datasource table
2023-06-30 19:02:44 +02:00
Miriam Baglioni
ef2dd7a980
resolved conflicts
2023-06-30 18:59:47 +02:00
Miriam Baglioni
7738372125
[UsageCount] fixed typo in attribute name for datasource table
2023-06-30 18:56:41 +02:00
Miriam Baglioni
55ea485783
[UsageCount] split the count for result at the level of the datasource. For each indicator, one unit is specified for each datasource contributing to that indicator value. The datasource key is the value of the key element in the unit for the measure, while the count for that datasource is in the value
2023-06-30 18:39:30 +02:00
Claudio Atzori
f3a85e224b
merged from branch beta the bulk tagging (single step, negative constraints), the cleaning workflow (single step, pid type based cleaning), instance level fulltext
2023-06-28 13:33:57 +02:00
Claudio Atzori
4ef0f2ec26
added dependency commons-validator:commons-validator:1.7
2023-06-28 13:32:01 +02:00
Claudio Atzori
288ec0b7d6
[doiboost] merged workflow from branch beta
2023-06-28 09:15:37 +02:00
Claudio Atzori
5f32edd9bf
adopting dhp-schema:3.17.1
2023-06-27 16:57:17 +02:00
Claudio Atzori
e10ce92fe5
[stats wf] merged workflows from branch beta
2023-06-27 14:32:48 +02:00
Claudio Atzori
b93e1541aa
Merge pull request 'update sql query to return distinct pids' ( #301 ) from distinct_pids_from_openorgs into master
...
Reviewed-on: D-Net/dnet-hadoop#301
2023-06-27 12:24:47 +02:00
Claudio Atzori
d029bf0b94
Merge branch 'master' into distinct_pids_from_openorgs
2023-06-27 12:24:35 +02:00
Claudio Atzori
0f5a819f44
[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests
2023-06-23 16:10:49 +02:00
Serafeim Chatzopoulos
60f25b780d
Minor fixes in workflow.xml and job.properties
2023-06-23 12:51:50 +03:00
Michele Artini
009d7f312f
fixed a datasource Id
2023-06-21 16:17:34 +02:00
Miriam Baglioni
e4b27182d0
[master] refactoring
2023-06-21 11:15:53 +02:00
Giambattista Bloisi
758e662ab8
Revert "Remove duplicated code and ensure that load and initialization is done through "DedupConfig.load" method"
...
This reverts commit 485f9d18cb.
2023-06-19 13:08:10 +02:00
Giambattista Bloisi
485f9d18cb
Remove duplicated code and ensure that load and initialization is done through "DedupConfig.load" method
2023-06-19 13:00:02 +02:00
Claudio Atzori
6210f6ee48
Merge pull request 'Precompile blacklists patterns before evaluating clustering criteria' ( #1 ) from optimized-clustering into master
...
Reviewed-on: D-Net/dnet-dedup#1
2023-06-19 12:43:49 +02:00
Giambattista Bloisi
b0ade43608
Precompile blacklists patterns before evaluating clustering criteria
...
Enable Junit 5 tests in maven builds
Make path comparisons platform-independent
Read String resource files assuming they are encoded in UTF-8
Fix a few test conditions
2023-06-16 09:41:11 +02:00
Michele Artini
a92206dab5
re-added the name of a column (pid)
2023-06-13 11:43:10 +02:00
Miriam Baglioni
d9506035e4
[ZenodoApi] gone back to okhttp3 to send the payload.
2023-06-09 12:05:02 +02:00
Alessia Bardi
118e72d7db
Updated officialname of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal
2023-06-06 14:39:12 +02:00
Alessia Bardi
5befd93d7d
test records for Solr indexing
2023-06-06 14:34:33 +02:00
Michele Artini
cae92cf811
update sql query to return distinct pids
2023-06-06 14:06:06 +02:00
Miriam Baglioni
b64a5eb4a5
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-05-24 15:21:58 +02:00
Ilias Kanellos
a1b9187039
Fix syntax error on workflow.xml
2023-05-23 17:17:12 +03:00
Ilias Kanellos
6a7e370a21
Remove unnecessary counts in graph creation
2023-05-23 16:48:58 +03:00
Ilias Kanellos
ec4e010687
End after rankings | Create graph debugged
2023-05-23 16:44:04 +03:00
Claudio Atzori
654ffcba60
Merge pull request '[UsageCount] addition of usagecount for Projects and datasources' ( #296 ) from master_datasource_project_usagecounts into master
...
Reviewed-on: D-Net/dnet-hadoop#296
2023-05-22 16:13:24 +02:00
Claudio Atzori
db625e548d
[UsageCount] addition of usagecount for Projects and datasources
2023-05-22 15:00:46 +02:00
Alessia Bardi
04141fe259
tests for records from D4Science catalogues
2023-05-19 14:28:24 +02:00
Ilias Kanellos
38020e242a
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-05-16 17:34:53 +03:00
Ilias Kanellos
3d69f33c84
Fix selection of columns in graph creation
2023-05-16 17:34:42 +03:00
Ilias Kanellos
3c38f7ba6f
Fix selection of columns in graph creation
2023-05-16 17:32:53 +03:00
Serafeim Chatzopoulos
8ef718c363
Fix workflow application path
2023-05-16 16:28:48 +03:00
Serafeim Chatzopoulos
26328e2a0d
Move job.properties
2023-05-16 14:39:53 +03:00
Serafeim Chatzopoulos
4eec3e7052
Add jobTracker, nameNode && spark2Lib as global params in oozie wf
2023-05-15 22:28:48 +03:00
Serafeim Chatzopoulos
b83135c252
Add missing kill nodes in workflow.xml
2023-05-15 19:55:35 +03:00
Serafeim Chatzopoulos
45f2aa0867
Move end node ... at the end in workflow.xml
2023-05-15 17:52:20 +03:00
Serafeim Chatzopoulos
12a57e1f58
Resolve conflicts
2023-05-15 16:20:11 +03:00
Serafeim Chatzopoulos
82e2a96f51
Resolve conflicts
2023-05-15 15:53:12 +03:00
Serafeim Chatzopoulos
b8e8c959fe
Update workflow.xml && job.properties
2023-05-15 15:50:23 +03:00
Ilias Kanellos
4a905932a3
Spark properties from job.properties
2023-05-15 15:24:22 +03:00
Serafeim Chatzopoulos
07818131ef
Update documentation
2023-05-15 13:04:44 +03:00
Ilias Kanellos
1788ac2d4d
Correct filtering for MAG records
2023-05-12 12:55:43 +03:00
Ilias Kanellos
5ddbb4ad10
Spark properties no longer hardcoded
2023-05-11 15:36:47 +03:00
Ilias Kanellos
3de35fd6a3
Produce 5 classes of ranking scores
2023-05-11 14:42:25 +03:00
Ilias Kanellos
90332439ad
Remove deletion of synonym folder
2023-04-28 13:45:19 +03:00
Ilias Kanellos
a98da54896
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-04-28 13:23:49 +03:00
Ilias Kanellos
09485fbee3
Fixed unicode bug. Workflow ends after first script
2023-04-28 13:09:13 +03:00
Serafeim Chatzopoulos
614cc1089b
Add separate folder for results && project actionsets
2023-04-27 12:37:15 +03:00
Serafeim Chatzopoulos
815a4ddbba
Add actionset creation for project bip indicators in workflow
2023-04-26 20:40:06 +03:00
Serafeim Chatzopoulos
ee04cf92bf
Add actionsets for project impact indicators
2023-04-26 20:23:46 +03:00
Alessia Bardi
b88f009d9f
combined level 4 and 6 for the demo
2023-04-24 12:10:33 +02:00
Alessia Bardi
5ffe82ffd8
aligned to current DMF index layout on production
2023-04-24 12:09:55 +02:00
Alessia Bardi
1c173642f0
removed level5 from test records
2023-04-24 09:32:32 +02:00
Alessia Bardi
382f46a8e4
tests to generate the XML records for the index for the EDITH demo on digital twins, integrating output from the FoS classifier
2023-04-21 16:46:30 +02:00
Miriam Baglioni
9fc8ebe98b
refactoring
2023-04-19 09:32:13 +02:00
Serafeim Chatzopoulos
23f58a86f1
Change jar param in project impact indicators action
2023-04-18 12:26:01 +03:00
Miriam Baglioni
24c41806ac
[ZenodoApiClienttest] change test to mirror change in the implementation
2023-04-18 09:08:09 +02:00
Miriam Baglioni
087b5a7973
[ZenodoAPIClient] new version of the API to connect to Zenodo (change the http client)
2023-04-17 18:59:22 +02:00
Michele De Bonis
cb595c87bb
implementation of the support for authors deduplication: cosinesimilarity comparator and double array json parser
2023-04-17 11:06:27 +02:00
Claudio Atzori
688e3b7936
added eoscifguidelines in the result view; removed compute statistics statements
2023-04-11 11:45:56 +02:00
Claudio Atzori
2e465915b4
[graph to Solr] using dedicated sparkExecutorCores, sparkExecutorMemory, sparkDriverMemory in convert_to_xml
2023-04-11 10:43:44 +02:00
Serafeim Chatzopoulos
7256c8d3c7
Add script for aggregating impact indicators at the project level
2023-04-07 16:30:12 +03:00
Claudio Atzori
4a4ca634f0
Merge pull request 'advConstraintsInBeta' ( #288 ) from advConstraintsInBeta into master
...
Reviewed-on: D-Net/dnet-hadoop#288
2023-04-06 15:24:23 +02:00
Miriam Baglioni
c6a7602b3e
refactoring after compilation
2023-04-06 14:45:01 +02:00
Miriam Baglioni
831055a1fc
change of the property for test purposes, addition of two new verbs, and fix of issue for advanced constraints
2023-04-06 14:41:32 +02:00
Miriam Baglioni
cf3d0f4f83
fixed issue on bulktagging for the advanced constraints
2023-04-06 12:17:35 +02:00
Claudio Atzori
4f67225fbc
Merge pull request 'doiboostMappingExtention' ( #286 ) from doiboostMappingExtention into master
...
Reviewed-on: D-Net/dnet-hadoop#286
2023-04-06 09:25:08 +02:00
Claudio Atzori
e093f04874
Merge pull request 'AdvancedConstraint' ( #285 ) from advConstraintsInBeta into master
...
Reviewed-on: D-Net/dnet-hadoop#285
2023-04-06 09:24:54 +02:00
Miriam Baglioni
c5a9f39141
Extended the association project - result in the mapping from CrossRef
2023-04-05 16:48:36 +02:00
Miriam Baglioni
ecc05fe0f3
Added the code for the advancedConstraint implementation during the bulkTagging
2023-04-05 16:40:29 +02:00
Claudio Atzori
42442ccd39
Merge pull request 'updated the order of the compatibilities' ( #275 ) from compatibility_order into master
...
Reviewed-on: D-Net/dnet-hadoop#275
2023-04-05 12:44:14 +02:00
Michele De Bonis
297eb207a5
minor change in the author match which now can compute count and percentage
2023-04-04 17:10:37 +02:00
Miriam Baglioni
9a9cc6a1dd
changed the way the tar archive is built to support renaming in case we need to change .tt.gz into .json.gz
2023-04-04 11:40:58 +02:00
Serafeim Chatzopoulos
102aa5ab81
Add dependency to dhp-aggregation
2023-03-21 19:25:29 +02:00
Serafeim Chatzopoulos
f3e5abf63b
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-03-21 18:26:09 +02:00
Serafeim Chatzopoulos
3e8a4cf952
Rearrange resources folder structure
2023-03-21 18:25:55 +02:00
Serafeim Chatzopoulos
f992ecb657
Checkout BIP-Ranker during 'prepare-package' && add it in the oozie-package.tar.gz
2023-03-21 18:03:55 +02:00
Ilias Kanellos
9dc8f0f05f
Add ActionSet step
2023-03-21 16:14:15 +02:00
Ilias Kanellos
b5c252865c
Add filtering based on citation source
2023-03-20 15:38:36 +02:00
Serafeim Chatzopoulos
720fd19b39
Add dhp-impact-indicators workflow files
2023-03-14 19:28:27 +02:00
Serafeim Chatzopoulos
c6e39b7f33
Add dhp-impact-indicators
2023-03-14 18:50:54 +02:00
Michele Artini
200098b683
updated the order of the compatibilities
2023-02-22 11:52:59 +01:00
Michele Artini
9c1df15071
null values in date range conditions
2023-02-13 16:05:58 +01:00
Miriam Baglioni
32870339f5
refactoring after compile
2023-02-13 13:06:48 +01:00
Miriam Baglioni
7184cc0804
[FoS] added check for null on level1 subject
2023-02-13 13:03:49 +01:00
Miriam Baglioni
7473093c84
[FoS] changed the default separator from comma to tab to solve the issue in subject value split
2023-02-10 15:34:52 +01:00
Miriam Baglioni
5f0906be60
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-02-02 17:13:14 +01:00
Michele De Bonis
6a6c266dde
implementation of author dedup configuration and lnfi clustering function
2023-01-31 11:53:10 +01:00
Claudio Atzori
1b37516578
[bulk tagging] better node naming
2023-01-20 16:11:26 +01:00
Claudio Atzori
c1e2460293
[cleaning] the datasource master-duplicate fixup should not be brought to production yet
2023-01-20 09:20:26 +01:00
Claudio Atzori
3800361033
[country propagation] fixes error 'cannot resolve countrySet given input columns: []' when there is no prepared information driving the propagation process for a given result type
2023-01-19 15:57:43 +01:00
Michele Artini
699736addc
NPE prevention
2023-01-11 13:14:44 +01:00
Claudio Atzori
f86e19b282
code formatting
2023-01-11 09:53:19 +01:00
Michele Artini
d40e20f437
Considering instance pids and alternative identifiers
2023-01-11 09:37:34 +01:00
Michele Artini
4953ae5649
fixed an invalid char
2023-01-11 08:35:53 +01:00
Miriam Baglioni
c60d3a2b46
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-01-09 17:28:27 +01:00
Claudio Atzori
7becdaf31d
Merge pull request 'Workaround to use new version of intellij on Master' ( #266 ) from master_intellij into master
...
Reviewed-on: D-Net/dnet-hadoop#266
2022-12-23 10:32:21 +01:00
Miriam Baglioni
b713132db7
[Cleaning] adding missing classes
2022-12-21 12:49:08 +01:00
Miriam Baglioni
11f2b470d3
[Cleaning] adding missing classes
2022-12-21 12:42:19 +01:00
Sandro La Bruzzo
91c70b15a5
updated lines function to its implementation linesWithSeparators.map(l => l.stripLineEnd); in this way we force the scala plugin compiler to consider this pipeline scala code and not a java.string.lines() pipeline
2022-12-21 11:14:42 +01:00
Claudio Atzori
f910b7379d
[cleaning] recovering missing resources from D-Net/dnet-hadoop#265
2022-12-21 09:26:34 +01:00
Claudio Atzori
33bdad104e
[cleaning] align parameter names
2022-12-20 21:43:59 +01:00
Claudio Atzori
5816ded93f
code formatting
2022-12-20 10:41:40 +01:00
Claudio Atzori
46972f8393
[orcid propagation] skip empty directory
2022-12-20 10:28:22 +01:00
Claudio Atzori
da85ca697d
Merge pull request 'cleanCountryOnMaster' ( #265 ) from cleanCountryOnMaster into master
...
Reviewed-on: D-Net/dnet-hadoop#265
2022-12-16 15:58:44 +01:00
Miriam Baglioni
059e100ec7
[Clean Country] moving other resources for testing purposes
2022-12-16 15:48:21 +01:00
Miriam Baglioni
fc95a550c3
[Clean Country] moving other resources for testing purposes
2022-12-16 15:46:32 +01:00
Miriam Baglioni
6901ac91b1
[Clean Country] moving source and resources to master
2022-12-16 15:42:49 +01:00
Claudio Atzori
08c4588d47
Merge pull request 'Changes from beta stats wf to prod' ( #264 ) from antonis.lempesis/dnet-hadoop:beta into master
...
Reviewed-on: D-Net/dnet-hadoop#264
2022-12-07 15:56:22 +01:00
Miriam Baglioni
29d3da85f1
[EOSC DUMP] added resources needed for the review as test
2022-11-25 17:16:20 +01:00
Miriam Baglioni
33a2b1b5dc
[Bulk Tag] fixed typo in test configuration
2022-11-23 11:31:17 +01:00
Miriam Baglioni
c6df8327b3
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2022-11-23 11:26:57 +01:00
Miriam Baglioni
935aa367d8
[BulkTag] removed commented code
2022-11-23 11:16:39 +01:00
Miriam Baglioni
43aedbdfe5
[BulkTag] changed verb name in configuration
2022-11-23 11:14:23 +01:00
Miriam Baglioni
b6da9b67ff
[BulkTag] fixed typo in annotation for verb name
2022-11-23 11:13:58 +01:00
Michele De Bonis
14f6346676
implementation of the new software configuration
2022-11-22 17:48:34 +01:00
Claudio Atzori
a34c8b6f81
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2022-11-22 10:22:31 +01:00
Miriam Baglioni
122e75aa17
fixed conflicts
2022-11-21 18:13:12 +01:00
Miriam Baglioni
cee7a45b1d
[Bulk Tag Datasource] fixed issue with verb name and add new test for neanias selection for orcid
2022-11-21 18:10:20 +01:00
Michele De Bonis
9fee2ed611
minor changes
2022-11-21 14:35:46 +01:00
Claudio Atzori
ed64618235
increased spark.sql.shuffle.partitions in the last join phase of the result (publication) to community through semantic relation propagation
2022-11-18 16:06:51 +01:00
Claudio Atzori
8742934843
added spark.sql.shuffle.partitions in the last join phase of the result to community through semantic relation propagation
2022-11-18 11:32:22 +01:00
Claudio Atzori
13cc592f39
code formatting
2022-11-15 09:37:57 +01:00
Claudio Atzori
af15b1e48d
[eosc tag] extending criteria for Jupyter Notebook (adding to ORP the same constraint)
2022-11-14 18:30:43 +01:00
Claudio Atzori
eb45ba7af0
extended mapping from ODF relations (PR#251)
2022-11-14 18:26:13 +01:00
Claudio Atzori
a929dc5fee
integrated changes for mapping ROHub contents in the Graph
2022-11-14 18:15:35 +01:00
Miriam Baglioni
5f9383b2d9
[EOSC TAG] remove redundant check for jupyter notebook
2022-11-11 14:06:19 +01:00
Miriam Baglioni
b18bbca8af
[EOSC TAG] adding search in orp for jupyter notebook criteria
2022-11-11 12:42:58 +01:00
dimitrispie
55fa3b2a17
Hive memory parameters
2022-11-03 15:21:04 +01:00
Claudio Atzori
80c5e0f637
code formatting
2022-09-27 12:51:51 +02:00
Claudio Atzori
c01d528ab2
suppressing hyper verbose spark logs during unit test execution
2022-09-23 15:19:50 +02:00
Claudio Atzori
e6d788d27a
[stats wf] adding missing changes lost in PR#248
2022-09-23 14:38:42 +02:00
Claudio Atzori
930f118673
fixed semantic (subreltype) for ServiceOrganization relations
2022-09-22 16:24:44 +02:00
Claudio Atzori
b2c3071e72
Merge branch 'master' into beta2master_sept_2022
2022-09-22 14:39:15 +02:00
Claudio Atzori
10ec074f79
Merge remote-tracking branch 'antonis.lempesis/beta' into beta2master_sept_2022
2022-09-22 14:12:19 +02:00
Claudio Atzori
7225fe9cbe
integrated changes from discard-non-wellformed
2022-09-22 10:06:07 +02:00
Miriam Baglioni
869e129288
[EOSC BulkTag] refactoring
2022-09-20 16:13:18 +02:00
Miriam Baglioni
840465958b
[EOSC BulkTag] filtering out the datasources registered in the eosc with compatibility different from 3.0, 4.0 for literature, data and CRIS to add the context eosc to the results
2022-09-20 10:30:41 +02:00
Claudio Atzori
bdc8f993d0
[Patch Hosted By] check also the presence of datasource.officialname.value
2022-09-19 15:28:03 +02:00
Miriam Baglioni
ec87149cb3
[Patch Hosted By] added fix to avoid NPE error when datasource official name is not provided. Removing datasources if no officialname has been provided
2022-09-19 14:06:52 +02:00
Miriam Baglioni
b42e2c9df6
[Patch Hosted By] added fix to avoid NPE error when datasource official name is not provided
2022-09-19 12:30:32 +02:00
Miriam Baglioni
1329aa8479
[EOSC BulkTag] modified test to remove association of result to eosc when eoscifguidelines are set
2022-09-19 11:59:48 +02:00
Miriam Baglioni
a0ee1a8640
[EOSC BulkTag] remove addition of eosc context for result with eosc if guidelines set
2022-09-19 11:44:10 +02:00
Claudio Atzori
96062164f9
Merge pull request '[Aggregator graph|master] Discard invalid records' ( #245 ) from discard-non-wellformed into master
...
Reviewed-on: D-Net/dnet-hadoop#245
2022-09-19 09:48:16 +02:00
Claudio Atzori
35bb7c423f
updated dhp-schemas version to 2.12.1
2022-09-16 16:13:15 +02:00
Claudio Atzori
fd87571506
code formatting
2022-09-16 16:05:03 +02:00
Claudio Atzori
c527112e33
Merge commit 'ff6f789b6d9be0567b6ad72f8a0e75fe3f52726a' into beta2master_sept_2022
2022-09-16 15:59:10 +02:00
Claudio Atzori
65209359bc
Merge commit 'b5f7bd30be7f7adaaa28170740da0484b50a77ed' into beta2master_sept_2022
2022-09-16 15:58:11 +02:00
Claudio Atzori
d72a64ded3
Merge commit '690be4482fc84327dc7617acbc8d976d559df512' into beta2master_sept_2022
2022-09-16 15:57:44 +02:00
Claudio Atzori
3e8499ce47
Merge commit '71b069ca90a2f7ec09d64241c60917d3636fc81e' into beta2master_sept_2022
2022-09-16 15:57:20 +02:00
Claudio Atzori
61aacb3271
Merge commit '1203378441dc6d8e8435cacd42e76e11746f6d1b' into beta2master_sept_2022
2022-09-16 15:56:55 +02:00
Claudio Atzori
dbb567251a
merged 853c996fa2
2022-09-16 15:56:28 +02:00
Claudio Atzori
c7e8ad853e
Merge commit '2b5f8c9c9a3611c57ee5febfe262a455a39ad801' into beta2master_sept_2022
2022-09-16 15:55:04 +02:00
Claudio Atzori
0849ebfd80
merged a11eb38065
2022-09-16 15:54:32 +02:00
Claudio Atzori
281239249e
Merge commit 'b7c387c21f946adbc9da90ded95166205195edb0' into beta2master_sept_2022
2022-09-16 15:49:20 +02:00
Claudio Atzori
45fc5e12be
Merge commit 'cb7c07c54e59675e8dffe42b7f2a13f16c956068' into beta2master_sept_2022
2022-09-16 15:48:55 +02:00
Claudio Atzori
1c05aaaa2e
Merge commit '3418ce50ac9b28fed4fa949919e6c8208738cdcf' into beta2master_sept_2022
2022-09-16 15:48:36 +02:00
Claudio Atzori
01d5ad6361
Merge commit 'd85ba3c1a9d7f0e80565742161ff6c9ecffd52b7' into beta2master_sept_2022
2022-09-16 15:48:16 +02:00
Claudio Atzori
d872d1cdd9
Merge commit 'a4815f6bec87f05be8cd740d236707949a0f746e' into beta2master_sept_2022
2022-09-16 15:47:49 +02:00
Claudio Atzori
ab0efecab4
Merge commit '84598c75356cf580de6c81653a9351e9b8173639' into beta2master_sept_2022
2022-09-16 15:47:05 +02:00
Claudio Atzori
725c3c68d0
Merge commit '844f6eb46533cdd4be3210401b10401322079640' into beta2master_sept_2022
2022-09-16 15:46:40 +02:00
Claudio Atzori
300ae6221c
Merge commit '32cee1f619eb30d2e2ac6083435b76b1aba7db09' into beta2master_sept_2022
2022-09-16 15:45:57 +02:00
Claudio Atzori
0ec2eaba35
Merge commit 'c1f2ffc53dc41f1fac3855b2d2df7d6a5ea15e3e' into beta2master_sept_2022
2022-09-16 15:45:27 +02:00
Claudio Atzori
a387807d43
Merge commit 'b78889a0ce27a79c7ab2d8da05b118ee4f1bcb36' into beta2master_sept_2022
2022-09-16 15:44:17 +02:00
Claudio Atzori
2abe2bc137
Merge commit '08ce2cadc2d84aa982726e429c280a905536a715' into beta2master_sept_2022
2022-09-16 15:43:49 +02:00
Claudio Atzori
a07c876922
Merge commit '27a91841e7fa2a1b615b4d1e161d606db5bead96' into beta2master_sept_2022
2022-09-16 15:43:02 +02:00
Claudio Atzori
cbd48bc645
Merge commit 'efd96e7e664e4139321e35e8d172b884ba4b61a1' into beta2master_sept_2022
2022-09-16 15:38:56 +02:00
miconis
9ddd24ba36
implementation of comparators and clustering function for the author deduplication
2022-04-19 10:18:09 +02:00
miconis
97a32faf9b
test implementation for the new fdup version
2022-04-13 09:48:56 +02:00
miconis
10172553ab
[maven-release-plugin] prepare for next development iteration
2022-03-15 15:06:18 +01:00
miconis
bd919ac98d
[maven-release-plugin] prepare release dnet-dedup-4.1.12
2022-03-15 15:06:12 +01:00
miconis
a965233dd0
bug fix in the normalization of a legalname, city map updated and transliteration support added
2022-03-15 14:59:13 +01:00
miconis
ac9708e31b
[maven-release-plugin] prepare for next development iteration
2022-03-09 13:43:48 +01:00
miconis
a5a6054039
[maven-release-plugin] prepare release dnet-dedup-4.1.11
2022-03-09 13:43:44 +01:00
miconis
3bc07c5881
bug fix in the AuthorMatch, implementation of the concat function in the model creation with jpath query
2022-03-09 12:53:09 +01:00
miconis
699612dd17
implementation of the size threshold on authors list match
2022-03-08 16:49:28 +01:00
miconis
8f07f0c537
[maven-release-plugin] prepare for next development iteration
2022-01-13 17:22:16 +01:00
miconis
620e35db28
[maven-release-plugin] prepare release dnet-dedup-4.1.10
2022-01-13 17:22:12 +01:00
miconis
2ff97781d2
minor change
2022-01-13 17:20:20 +01:00
miconis
1ff6a3dc11
[maven-release-plugin] prepare for next development iteration
2022-01-13 15:15:05 +01:00
miconis
003bcf1699
[maven-release-plugin] prepare release dnet-dedup-4.1.9
2022-01-13 15:15:00 +01:00
miconis
2f1ba56f61
bug fix in the authormatch comparator, implementation of tests
2022-01-13 11:58:28 +01:00
miconis
cea8440153
[maven-release-plugin] prepare for next development iteration
2021-12-30 13:11:57 +01:00
miconis
eb48d31ea6
[maven-release-plugin] prepare release dnet-dedup-4.1.8
2021-12-30 13:11:52 +01:00
miconis
a224bf70a4
implementation of new comparators for publication dedup configuration update
2021-12-27 17:35:02 +01:00
miconis
8f1db32921
implementation of the instance type comparator and its tests
2021-11-04 15:20:57 +01:00
miconis
fbb1b66bfb
dedup test implementation & graph drawing tools
2021-09-13 14:53:19 +02:00
miconis
1144d50a11
[maven-release-plugin] prepare for next development iteration
2021-05-03 16:09:56 +02:00
miconis
f33a18ca9d
[maven-release-plugin] prepare release dnet-dedup-4.1.7
2021-05-03 16:09:08 +02:00
miconis
4bce4f2e8e
minor change: version updated
2021-05-03 16:05:39 +02:00
miconis
c6266242e3
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-dedup
2021-05-03 15:38:00 +02:00
miconis
4988e9f80d
implementation of cross comparison for different fields, addition of clustering mechanism to collapse keys from different clustering functions on the same cluster
2021-05-03 15:37:41 +02:00
Claudio Atzori
58d013e24f
[maven-release-plugin] prepare for next development iteration
2021-04-12 16:12:15 +02:00
Claudio Atzori
3a7336157b
[maven-release-plugin] prepare release dnet-dedup-4.0.6
2021-04-12 16:12:10 +02:00
miconis
ed0d5d3e1d
implementation of the wf to dedup entities, addition of the module to run the wf on the cluster
2020-12-04 15:41:31 +01:00
miconis
72116446ec
[maven-release-plugin] prepare for next development iteration
2020-09-29 12:06:38 +02:00
miconis
05a03d97cd
[maven-release-plugin] prepare release dnet-dedup-4.0.5
2020-09-29 12:06:35 +02:00
miconis
2a01022712
minor changes
2020-09-29 12:05:50 +02:00
miconis
dd34e371d7
fixed error in the treeprocessor: it used th=-1 as default value, now it uses th=1
2020-09-29 12:01:25 +02:00
miconis
19c3c90d7b
fixed error in the block processor: entities with orderField=null were not considered
2020-09-19 17:43:41 +02:00
Sandro La Bruzzo
a109ebe287
fixed NPE
2020-08-06 10:27:05 +02:00
miconis
a5a3ea24f8
[maven-release-plugin] prepare for next development iteration
2020-07-16 18:59:25 +02:00
miconis
840fe8f4d3
[maven-release-plugin] prepare release dnet-dedup-4.0.4
2020-07-16 18:59:22 +02:00
miconis
07ab904d60
implementation of the clustering function for the suffixprefix chain
2020-07-16 18:57:55 +02:00
Claudio Atzori
eaf7defe0c
[maven-release-plugin] prepare for next development iteration
2020-07-15 17:57:09 +02:00
Claudio Atzori
ff2c8eba12
[maven-release-plugin] prepare release dnet-dedup-4.0.3
2020-07-15 17:57:04 +02:00
Claudio Atzori
7cc3742a26
removed maven release.property
2020-07-15 17:52:27 +02:00
Claudio Atzori
14611ea450
reverted to 4.0.3-SNAPSHOT
2020-07-15 17:37:36 +02:00
Claudio Atzori
9f20f23870
Revert "wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files"
...
This reverts commit 51d91fa520.
2020-07-15 17:35:56 +02:00
Claudio Atzori
9efcd8e245
Revert "reverted to 4.0.3-SNAPSHOT"
...
This reverts commit ec97983ce1.
2020-07-15 17:28:37 +02:00
Claudio Atzori
ba493f9ab8
[maven-release-plugin] rollback the release of dnet-dedup-4.0.3
2020-07-15 17:24:43 +02:00
Claudio Atzori
6c98d4c436
[maven-release-plugin] prepare release dnet-dedup-4.0.3
2020-07-15 17:24:25 +02:00
Claudio Atzori
ec97983ce1
reverted to 4.0.3-SNAPSHOT
2020-07-15 17:20:12 +02:00
Claudio Atzori
51d91fa520
wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files
2020-07-15 17:13:45 +02:00
Claudio Atzori
b79ea97107
Revert "wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files"
...
This reverts commit d2861950ac.
2020-07-15 17:11:46 +02:00
Claudio Atzori
92aadbfc7b
[maven-release-plugin] prepare release dnet-dedup-4.0.3
2020-07-15 17:04:20 +02:00
Claudio Atzori
d2861950ac
wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files
2020-07-15 16:49:47 +02:00
miconis
244a037a90
implementation of a class to test the clustering functions
2020-07-12 10:13:54 +02:00
miconis
7aa2001a8b
[maven-release-plugin] prepare for next development iteration
2020-07-02 17:06:38 +02:00
miconis
c72055f543
[maven-release-plugin] prepare release dnet-dedup-4.0.2
2020-07-02 17:06:36 +02:00
miconis
f933fd33e0
implemented new function for clustering
2020-07-02 17:04:17 +02:00
miconis
411d1cc24f
implementation of the test for the dedup and addition of new support classes
2020-06-11 10:46:46 +02:00
miconis
48c094f599
[maven-release-plugin] prepare for next development iteration
2020-04-24 14:39:01 +02:00
miconis
4365ba41c9
[maven-release-plugin] prepare release dnet-dedup-4.0.1
2020-04-24 14:38:58 +02:00
miconis
6e9b27f37d
implementation of the mechanism to truncate the string and the lists
2020-04-24 14:36:42 +02:00
Sandro La Bruzzo
8e4211708e
[maven-release-plugin] prepare for next development iteration
2020-02-10 12:51:04 +01:00
Sandro La Bruzzo
24e2ab9092
[maven-release-plugin] prepare release dnet-dedup-4.0.0
2020-02-10 12:50:45 +01:00
Sandro La Bruzzo
46727f5c76
upgraded maven version of commons-lang
2020-02-10 12:38:40 +01:00
miconis
5c8f6febee
minor changes in comparators
2020-01-24 10:01:11 +01:00
miconis
4dce785375
update in the implementation of the tree: addition of new logic aggregations and statistics
2020-01-14 11:42:43 +02:00
miconis
b3748b8d77
minor changes
2019-12-18 16:20:35 +01:00
miconis
b21b1b8f61
implementation of new aggregation in the tree node processing
2019-12-18 16:19:36 +01:00
miconis
20fcfe6328
implementation of new aggregation in the tree node processing
2019-12-18 16:19:26 +01:00
Sandro La Bruzzo
d924f28b93
fixed wrong use of jspath
2019-12-18 09:29:44 +01:00
miconis
84aaa65501
implementation of new json comparator and update of the publication configuration
2019-12-17 09:16:26 +01:00
Sandro La Bruzzo
5c01ae4c92
merged JqMapping branch into tree2
2019-12-13 11:30:02 +01:00
Sandro La Bruzzo
35008fdbf9
fix stuff
2019-12-06 15:28:30 +01:00
Sandro La Bruzzo
16c670a5d5
Improved deduplication
2019-12-05 14:14:25 +01:00
miconis
49f9beb4a8
implementation of romansmatch and re-implementation of the getNumber function. New terms in the translation map and update of the configuration
2019-11-28 16:54:44 +01:00
miconis
f791730330
addition of one term to the translation maps in the configurations
2019-11-27 15:48:37 +01:00
miconis
d2278fe358
minor change in the citymatch
2019-11-21 10:54:02 +01:00
miconis
8c0d346005
the param map has been updated: now it accepts string parameters
2019-11-21 09:37:56 +01:00
miconis
ddd40540aa
jarowinklernormalizedname split into 3 different comparators: citymatch, keywordmatch and jarowinkler. Implementation of the TreeStatistic support functions
2019-11-20 10:45:00 +01:00
miconis
c687956371
code cleaning and implementation of the TreeDedup + minor changes
2019-11-14 10:01:21 +01:00
miconis
0973899865
code cleaning, distribution of the classes in packages and implementation of the new configuration
2019-11-07 12:47:12 +01:00
miconis
30a873265f
put the last modification of the master branch into the tree2. Addition of the configuration as parameter of the comparator. This is to allow the comparator to access it
2019-10-29 16:38:42 +01:00
miconis
1beb776691
minor changes
2019-10-29 15:58:21 +01:00
miconis
075f741d28
[maven-release-plugin] prepare for next development iteration
2019-10-24 11:34:19 +02:00
miconis
ced4bcdd59
[maven-release-plugin] prepare release dnet-dedup-3.0.15
2019-10-24 11:34:12 +02:00
miconis
13f93e6055
Revert "[maven-release-plugin] prepare release dnet-dedup-3.0.15"
...
This reverts commit cf93515d94.
2019-10-24 11:23:01 +02:00
miconis
cf93515d94
[maven-release-plugin] prepare release dnet-dedup-3.0.15
2019-10-24 11:17:07 +02:00
miconis
285ec3ca17
release rollback
2019-10-24 11:11:07 +02:00
miconis
5f249fd56c
minor changes
2019-10-23 16:37:20 +02:00
miconis
c9863debfa
minor changes and configuration updates (synonym field added)
2019-10-23 16:31:45 +02:00
miconis
5499ca17c3
minor changes
2019-10-08 16:49:07 +02:00
miconis
50b7a12b3f
normalization of the term in the translation map added
2019-10-08 15:13:45 +02:00
miconis
26b383fea2
translation map moved in json configuration, support for synonyms added in the configuration, now the configuration is argument of conditions, distancealgos and clusteringfunctions
2019-10-08 14:53:52 +02:00
Claudio Atzori
07355d2811
[maven-release-plugin] prepare for next development iteration
2019-09-25 10:39:46 +02:00
Claudio Atzori
254eb46809
[maven-release-plugin] prepare release dnet-dedup-3.0.14
2019-09-25 10:39:39 +02:00
Claudio Atzori
74c6462b49
updated translation map and some tests
2019-09-25 10:15:13 +02:00
miconis
aed81e4cfa
translation map updated
2019-09-25 09:53:06 +02:00
miconis
afd2b398d5
optimize imports
2019-08-09 15:42:41 +02:00
miconis
d71dae5fd2
implementation of the conditions in tree nodes. get rid of the conditions part of the configuration
2019-08-09 15:41:49 +02:00
miconis
a5c5d2f01b
implementation of the decision tree. It takes the place of the distance algos; necessaryConditions and sufficientConditions are still there. The model contains only path, type and name of the field. ignoreMissing is still in the model because it is used by the conditions.
2019-08-09 10:08:34 +02:00
miconis
f2136e1024
code refactoring: useless module removed
2019-08-07 15:16:59 +02:00
miconis
8c867101ef
addition of a fixSpecial function to address the problem with special characters in organization names, addition of new terms in translation maps
2019-08-06 17:06:05 +02:00
miconis
4502b44337
addition of the BlockUtils class for meta-blocking, implementation of a new local test with edge filtering example
2019-08-06 12:09:34 +02:00
miconis
cffb712a99
Merge branch 'master' of https://github.com/dnet-team/dnet-dedup
2019-07-19 17:10:53 +02:00
miconis
a85576c27e
restyling of the JaroWinklerNormalizedName comparator, now it is optimized. Addition of some translations in the translation maps, addition of a clustering based on keywords in organizations legalnames
2019-07-19 17:10:29 +02:00
Claudio Atzori
6cb846331a
[maven-release-plugin] prepare for next development iteration
2019-07-08 11:12:52 +02:00
Claudio Atzori
c04d2232c2
[maven-release-plugin] prepare release dnet-dedup-3.0.13
2019-07-08 11:12:45 +02:00
miconis
fb5e38db26
Merge branch 'master' of https://github.com/dnet-team/dnet-dedup
2019-07-08 11:02:29 +02:00
miconis
3c6f8d1e44
bug fixing in the keywordsclustering class
2019-07-08 11:01:49 +02:00
Claudio Atzori
a69022617d
[maven-release-plugin] prepare for next development iteration
2019-07-08 10:11:24 +02:00
Claudio Atzori
c6baeb93d4
[maven-release-plugin] prepare release dnet-dedup-3.0.12
2019-07-08 10:11:17 +02:00
miconis
f5de20a508
[maven-release-plugin] rollback the release of dnet-dedup-3.0.12
2019-07-08 10:00:48 +02:00
miconis
ba50aa8654
[maven-release-plugin] prepare for next development iteration
2019-07-08 09:48:10 +02:00
miconis
7065110a21
[maven-release-plugin] prepare release dnet-dedup-3.0.12
2019-07-08 09:48:03 +02:00
miconis
15bec5e876
addition of doi normalization in PidMatch comparator, addition of keywordsclustering (clustering based on terms in the translation maps for the organizations), minor changes
2019-07-08 09:44:02 +02:00
Claudio Atzori
2dcffb965f
[maven-release-plugin] prepare for next development iteration
2019-06-19 10:02:39 +02:00
Claudio Atzori
85126c59f7
[maven-release-plugin] prepare release dnet-dedup-3.0.11
2019-06-19 10:02:32 +02:00
Claudio Atzori
15d7b584f3
optimized classpath resolvers
2019-06-19 10:01:35 +02:00
Claudio Atzori
ff4956def9
[maven-release-plugin] prepare for next development iteration
2019-06-18 14:46:34 +02:00
Claudio Atzori
eb5ce312a3
[maven-release-plugin] prepare release dnet-dedup-3.0.10
2019-06-18 14:46:27 +02:00
Claudio Atzori
f2bc665403
avoid dividing by zero: in case of missing values, return undefined response
2019-06-18 14:45:15 +02:00
Claudio Atzori
e3f86b92c8
cleanup
2019-06-18 14:44:42 +02:00
miconis
54e4d0af04
exact match condition gives undefined if a field is missing; ignoremissing semantics changed: if true, the comparison is performed in any case; if false, it gives -1 when a field is missing
2019-06-18 14:05:31 +02:00
miconis
e8db8f2abb
implementation of the integration test, addition of document blocks to group entities after clustering
2019-05-21 16:38:26 +02:00
Claudio Atzori
f7a3bdf3f8
[maven-release-plugin] prepare for next development iteration
2019-04-03 12:35:00 +02:00
Claudio Atzori
98c179c8fb
[maven-release-plugin] prepare release dnet-dedup-3.0.9
2019-04-03 12:34:52 +02:00
miconis
3e61a90c8f
[maven-release-plugin] rollback the release of dnet-dedup-3.0.9
2019-04-03 12:27:28 +02:00
miconis
15fb9eb883
[maven-release-plugin] prepare for next development iteration
2019-04-03 12:26:05 +02:00
miconis
a1ff4daa7f
[maven-release-plugin] prepare release dnet-dedup-3.0.9
2019-04-03 12:25:56 +02:00
miconis
1d29bae47c
branch cities merged into master
2019-04-03 12:22:33 +02:00
miconis
7e7018c51f
addition of a sparktester test, implementation of 2 different classes for testing in dnet-dedup-test module, addition of new terms in the vocabulary and change in the implementation of the JaroWinklerNormalizedName comparator
2019-04-03 09:40:14 +02:00
miconis
4bd5a9beee
minor changes
2019-03-26 15:48:21 +01:00
Michele De Bonis
662448e584
update of the comparator for legalnames of organizations
2019-03-21 14:27:27 +01:00
Claudio Atzori
f2394fcd9f
[maven-release-plugin] prepare for next development iteration
2019-02-18 09:09:14 +01:00
Claudio Atzori
722431dde1
[maven-release-plugin] prepare release dnet-dedup-3.0.8
2019-02-18 09:09:07 +01:00
Claudio Atzori
470c4b0f20
default configuration includes configurationId
2019-02-18 09:07:23 +01:00
Claudio Atzori
ccb7e83196
[maven-release-plugin] prepare for next development iteration
2019-02-17 12:56:19 +01:00
Claudio Atzori
7d8e62d4cc
[maven-release-plugin] prepare release dnet-dedup-3.0.7
2019-02-17 12:56:11 +01:00
Claudio Atzori
968cd47436
replace existing attributes when loading default configuration
2019-02-17 12:48:25 +01:00
Michele De Bonis
0735f3a822
implementation of the test classes and minor changes
2019-02-08 12:56:47 +01:00
Michele De Bonis
7a8d28991f
implementation of the decision tree for the deduplication of the authors, implementation of multiple comparators to be used in a tree node and definition of the proto for person entity
2018-12-20 09:54:41 +01:00
Michele De Bonis
39613dbbd6
implementation of the decision tree, addition of the dnet-openaire-data-protos module, definition of the person proto, blockprocessor and paceconfig modified with addition of support for the tree processing
2018-12-12 16:30:03 +01:00
Claudio Atzori
f1c68d8ba3
apply limits (length, size) to pace Fields
2018-11-20 10:51:38 +01:00
Claudio Atzori
c5979ffe18
[maven-release-plugin] prepare for next development iteration
2018-11-19 17:41:45 +01:00
Claudio Atzori
9869dff1d2
[maven-release-plugin] prepare release dnet-dedup-3.0.6
2018-11-19 17:41:37 +01:00
Claudio Atzori
c2d4cb3ba6
added new properties to FieldDef (size, length) to limit the information mapped onto each MapDocument
2018-11-19 17:37:57 +01:00
Claudio Atzori
394fcafd41
[maven-release-plugin] prepare for next development iteration
2018-11-17 09:13:16 +01:00
Claudio Atzori
397554130c
[maven-release-plugin] prepare release dnet-dedup-3.0.5
2018-11-17 09:13:09 +01:00
Claudio Atzori
0dfb2ea600
added distance function for software titles
2018-11-17 09:11:38 +01:00
Michele De Bonis
3d4372ced9
addition of cities check
2018-11-16 16:11:03 +01:00
Claudio Atzori
55a9b4f501
[maven-release-plugin] prepare for next development iteration
2018-11-16 09:18:00 +01:00
Claudio Atzori
35ab630493
[maven-release-plugin] prepare release dnet-dedup-3.0.4
2018-11-16 09:17:53 +01:00
Claudio Atzori
399e4bc80f
default (empty) configuration should be aligned with the updated model
2018-11-15 16:52:56 +01:00
Claudio Atzori
59bab8dba4
less verbose logging
2018-11-13 09:07:45 +01:00
Claudio Atzori
478ad72cb8
propagate exceptions in case of serialization errors, removed configuration pretty printing, removed unused class ScoredResult
2018-11-12 15:52:18 +01:00
Claudio Atzori
f7616c7a8a
[maven-release-plugin] prepare for next development iteration
2018-11-12 14:23:36 +01:00
Claudio Atzori
df4b871c8b
[maven-release-plugin] prepare release dnet-dedup-3.0.3
2018-11-12 14:23:29 +01:00
Michele De Bonis
72a9b3139e
Merge branch 'master' of https://github.com/dnet-team/dnet-dedup
2018-11-12 14:11:26 +01:00
Michele De Bonis
b5062f5429
configuration file updated, addition of condition on domain
2018-11-12 14:11:15 +01:00
Claudio Atzori
2a509b18fa
[maven-release-plugin] prepare for next development iteration
2018-11-12 12:46:50 +01:00
Claudio Atzori
e247218987
[maven-release-plugin] prepare release dnet-dedup-3.0.2
2018-11-12 12:46:42 +01:00
Claudio Atzori
b7bc7f0401
getting rid of spark libs from dnet-pace-core
2018-11-12 12:46:06 +01:00
Claudio Atzori
3dacba37ea
[maven-release-plugin] prepare for next development iteration
2018-11-12 11:40:42 +01:00
Claudio Atzori
8cc2517f5d
[maven-release-plugin] prepare release dnet-dedup-3.0.1
2018-11-12 11:40:34 +01:00
Claudio Atzori
851ae5eec3
[maven-release-plugin] rollback the release of dnet-dedup-3.0.1
2018-11-12 11:39:07 +01:00
Claudio Atzori
f283d58a6e
[maven-release-plugin] prepare release dnet-dedup-3.0.1
2018-11-12 11:38:52 +01:00
Claudio Atzori
6d09041288
[maven-release-plugin] rollback the release of dnet-dedup-3.0.1
2018-11-12 11:28:28 +01:00
Claudio Atzori
46cee13596
[maven-release-plugin] prepare for next development iteration
2018-11-12 11:24:06 +01:00
Claudio Atzori
e1c69ad24e
[maven-release-plugin] prepare release dnet-dedup-3.0.1
2018-11-12 11:23:57 +01:00
Michele De Bonis
b247a86e69
configuration files changed: dedupRun instead of run, assertion updated in tests
2018-11-06 11:02:00 +01:00
Michele De Bonis
4c8485d0bb
deleted useless imports
2018-11-06 09:48:22 +01:00
Michele De Bonis
748189af10
implementation of JaroWinklerNormalizedName, addition of various stopwords in different languages and configuration test
2018-11-05 17:22:59 +01:00
Claudio Atzori
e296f7a81c
added DiffPatchMatch utility. Resumed commented tests!
2018-10-31 10:49:11 +01:00
Michele De Bonis
dc41b76643
serialization test added. useless getter methods ignored by json serialization
2018-10-29 16:16:11 +01:00
Michele De Bonis
ea36007d1f
DedupConf parsed using Jackson library
2018-10-29 11:13:55 +01:00
Michele De Bonis
8b4762bf54
implementation of the toString methods changed: from Gson to Jackson
2018-10-26 14:55:59 +02:00
Michele De Bonis
3cf3dc1934
modification in the initialization of clustering functions, distance algos and conditions.
2018-10-25 15:15:40 +02:00
Michele De Bonis
1cbbc3f15a
update in the discovery of clustering, conditions and distance functions (annotated with custom annotations)
2018-10-24 12:09:41 +02:00
Claudio Atzori
4d379c2227
revised PidMatch implementation, cleanup
2018-10-20 08:38:19 +02:00
Claudio Atzori
3197f26691
[maven-release-plugin] prepare for next development iteration
2018-10-18 12:17:34 +02:00
Claudio Atzori
63815be2d6
[maven-release-plugin] prepare release dnet-dedup-3.0.0
2018-10-18 12:17:27 +02:00
Claudio Atzori
ed14476b06
[maven-release-plugin] rollback the release of dnet-dedup-3.0.0
2018-10-18 12:13:03 +02:00
Claudio Atzori
82d5dce114
[maven-release-plugin] prepare release dnet-dedup-3.0.0
2018-10-18 12:12:45 +02:00
Claudio Atzori
4f29124607
[maven-release-plugin] rollback the release of dnet-dedup-3.0.0
2018-10-18 12:00:45 +02:00
Claudio Atzori
5a48937ae1
[maven-release-plugin] prepare for next development iteration
2018-10-18 11:58:43 +02:00
Claudio Atzori
5aec80345f
[maven-release-plugin] prepare release dnet-dedup-3.0.0
2018-10-18 11:58:36 +02:00
Claudio Atzori
1b46966383
updated maven project structure
2018-10-18 11:56:26 +02:00
Michele De Bonis
72ebf7c0f3
update of the spark test
2018-10-18 10:12:44 +02:00
Sandro La Bruzzo
1bb5c26e6d
Added FSpark Implementation of dedup
2018-10-11 15:19:20 +02:00
Sandro La Bruzzo
d1c73bcf90
Added First Implementation of Spark Test
2018-10-02 17:07:17 +02:00
Sandro La Bruzzo
476c3d7b07
added d-net pace core module and ignored target folder
2018-10-02 10:37:54 +02:00