Commit Graph

3665 Commits

Author SHA1 Message Date
Claudio Atzori b502f86523 fixed input path supplemented to GetDatasourceFromCountry; adjusted the various spark.sql.shuffle.partitions 2023-03-24 13:09:12 +01:00
Claudio Atzori c07857fa37 [graph cleaning] unit tests & cleanup 2023-03-23 15:57:47 +01:00
Claudio Atzori 90e61a8aba [graph cleaning] WIP: refactoring of the cleaning stages, unit tests 2023-03-23 15:03:26 +01:00
Claudio Atzori 488d9a5eaa [graph cleaning] WIP: refactoring of the cleaning stages, unit tests 2023-03-23 10:41:13 +01:00
Claudio Atzori 4f5ba0ed52 [graph cleaning] WIP: refactoring of the cleaning stages, unit tests 2023-03-21 14:41:20 +01:00
Claudio Atzori 6d3d18d8b5 [graph cleaning] WIP: refactoring of the cleaning stages 2023-03-16 17:23:36 +01:00
Claudio Atzori 518618f1a9 [graph cleaning] avoid to overwrite the subject class to 'keyword' for those with provenance 'subject:fos' 2023-03-14 15:22:47 +01:00
Claudio Atzori 41e00bcd07 [graph provision] avoid to parse again the XML records, apparently the escaped XML characters get unescaped invalidating the record 2023-03-13 15:19:49 +01:00
Claudio Atzori 24e2fd828b code formatting 2023-03-08 21:17:08 +01:00
Claudio Atzori e28d395e87 [aggregator graph] using dedicated path to sync claims, adjusted paths with wildcards 2023-03-08 21:16:52 +01:00
Claudio Atzori 5b8fd37314 [aggregator graph] using dedicated path to sync claims 2023-03-08 15:28:14 +01:00
Claudio Atzori 7fd89566c2 [aggregator graph] handle paths including wildcards 2023-03-08 12:43:00 +01:00
Miriam Baglioni 588aca5ce4 Merge pull request 'h2020classification' (#280) from h2020classification into beta
Reviewed-on: D-Net/dnet-hadoop#280
2023-03-03 09:29:10 +01:00
Claudio Atzori 8ec0d62d91 pre-group the records in each table before joning the contents from BETA and PROD together 2023-03-02 14:49:19 +01:00
Miriam Baglioni 0fff98a14c [ECclassification] removed print 2023-03-02 11:46:57 +01:00
Miriam Baglioni b0c2f7e526 [ECclassification] removed not needed resources 2023-03-02 11:44:48 +01:00
Miriam Baglioni d4fc62c2f6 mergin with branch beta 2023-03-02 11:14:54 +01:00
Miriam Baglioni de8ad1caef [ECclassification] new implementation for the H2020 classification 2023-03-02 11:14:03 +01:00
Claudio Atzori db9dad4aa7 [actionmanager] increased spark.sql.shuffle.partitions for publication, dataset, relation records 2023-03-02 09:11:37 +01:00
Miriam Baglioni c1f9848953 [ECclassification] added new classes 2023-03-01 15:29:11 +01:00
Claudio Atzori 6f488547a7 ignore non processable records 2023-03-01 14:49:51 +01:00
Claudio Atzori 7d263f265e adjusted logs 2023-03-01 11:58:07 +01:00
Claudio Atzori 16ad42e8f3 code formatting 2023-03-01 10:22:13 +01:00
Claudio Atzori 9c59dac859 followup changes reorganising the mdstore synchronisation mechanism 2023-03-01 10:16:20 +01:00
Miriam Baglioni ad745c0aa3 [CrossrefFunderMapping] fixed issueson funder name 2023-02-28 14:58:27 +01:00
Miriam Baglioni 4f2df876cd [ECclassification] new implementation first try 2023-02-28 14:44:00 +01:00
Claudio Atzori 2f7346e9cf WIP monodirectional citations, Datacite 2023-02-28 13:30:51 +01:00
Claudio Atzori 0559d8b412 WIP monodirectional citations 2023-02-28 10:57:32 +01:00
Sandro La Bruzzo 69fa616490 removed wrong content 2023-02-28 10:27:38 +01:00
Sandro La Bruzzo 832a75d012 added mapping for crossref funder 2023-02-28 10:16:34 +01:00
Sandro La Bruzzo 78e51c182a Added missing parametero to raw all workflow 2023-02-28 10:16:01 +01:00
Claudio Atzori 7aebedb43c code formatting 2023-02-27 11:51:27 +01:00
Miriam Baglioni 80987801d7 [FoS] added check for null on level1 subject 2023-02-27 11:40:22 +01:00
Claudio Atzori 31e97c2a6b [unresolved entities] updated oozie wf node labels 2023-02-27 11:38:29 +01:00
Miriam Baglioni 23112929e9 [FoS] changed the default separator from comma to tab to solve the issue in subject value split 2023-02-27 10:18:39 +01:00
Serafeim Chatzopoulos 0b5bf53b45 Remove unecessary indexed fields from Solr 2023-02-23 12:42:42 +02:00
Michele Artini fddcf701e9 updated the order of the compatibilities 2023-02-22 12:07:09 +01:00
Claudio Atzori 0c1be41b30 code formatting 2023-02-22 10:15:25 +01:00
Claudio Atzori 99cd7761aa cleanup of non necessary dhp-monitor-update workflow 2023-02-22 10:10:22 +01:00
Claudio Atzori cd3a51a15f Merge branch 'beta' into 8232-mdstore-synch-improve 2023-02-22 09:57:07 +01:00
Claudio Atzori 477a7c416f Merge branch 'beta' into UsageCountOnProjectAndDatasource 2023-02-22 09:55:51 +01:00
Claudio Atzori c20c1c9159 Merge pull request 'Added 4 institutions:' (#261) from antonis.lempesis/dnet-hadoop:beta into beta
Reviewed-on: D-Net/dnet-hadoop#261
2023-02-22 09:53:45 +01:00
Miriam Baglioni d617c3e812 [DOIBoost] extended mapping for funder #8407 2023-02-20 14:45:27 +01:00
Miriam Baglioni 016337a0f9 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2023-02-16 15:54:59 +01:00
Sandro La Bruzzo 118c1fc3b3 Merge remote-tracking branch 'origin/beta' into beta 2023-02-15 10:29:28 +01:00
Sandro La Bruzzo a8ac79fa25 Added citation relation on crossref Mapping 2023-02-15 10:29:13 +01:00
Claudio Atzori 9a03f71db1 code formatting 2023-02-13 16:25:47 +01:00
Michele Artini 554df257ab null values in date range conditions 2023-02-13 16:15:32 +01:00
Miriam Baglioni 5cf902a2b0 [UsageCount] changed query to make the sum be computed via sql instead of grouping 2023-02-10 16:16:37 +01:00
Miriam Baglioni f803530df6 [UsageCount] fixed query 2023-02-10 15:50:56 +01:00