Miriam Baglioni
c334fe2438
Merge pull request 'Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities' ( #328 ) from cleanup_relations_after_dedup into beta
...
Reviewed-on: D-Net/dnet-hadoop#328
2023-08-08 09:49:12 +02:00
Miriam Baglioni
0e2f855807
Merge pull request 'Updates Promotion DBs' ( #321 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#321
2023-08-07 12:09:16 +02:00
Miriam Baglioni
18fbe52b20
Merge pull request 'Import affiliation relations from Crossref' ( #320 ) from 8876 into beta
...
Reviewed-on: D-Net/dnet-hadoop#320
2023-08-07 10:45:30 +02:00
Giambattista Bloisi
97b6d1dc45
Filter ids by dataInfo.deletedbyinference and DataInfo.invisible flags
...
Filter relations also by dataInfo.invisible flag
2023-08-07 10:24:11 +02:00
Giambattista Bloisi
af49424b59
Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleyted by inference or that are pointing to dangling entities
2023-08-04 14:27:39 +02:00
Claudio Atzori
0bc74e2000
code formatting
2023-08-02 11:52:10 +02:00
Claudio Atzori
11ffb9bd68
rule out records with NULL dataInfo
2023-07-31 12:35:33 +02:00
Claudio Atzori
ccac6a7f75
rule out records with NULL dataInfo
2023-07-31 12:35:05 +02:00
Serafeim Chatzopoulos
7cefe2665b
Remove unnecessary classes
2023-07-28 19:14:39 +03:00
Serafeim Chatzopoulos
26a92ce762
Merge branch '8876' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8876
2023-07-28 19:03:57 +03:00
Serafeim Chatzopoulos
ebfba38ab6
Add changes from code review
2023-07-28 19:03:47 +03:00
Serafeim Chatzopoulos
eb8684a8cf
Merge branch 'beta' into 8876
2023-07-28 13:39:33 +02:00
Claudio Atzori
a72b9e96ac
expand the instance level fulltext in the XML records
2023-07-27 14:57:38 +02:00
Claudio Atzori
59764145bb
cherry picked & fixed commit 270df939c4
2023-07-25 17:39:00 +02:00
Claudio Atzori
270df939c4
partial implementation of the suggestions from https://support.openaire.eu/issues/8898
2023-07-25 17:29:50 +02:00
Giambattista Bloisi
e64c2854a3
Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
...
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Giambattista Bloisi
bb5b845e3c
Use scala.binary.version property to resolve scala maven dependencies
...
Ensure consistent usage of maven properties
Profile for compiling with scala 2.12 and Spark 3.4
2023-07-24 11:13:48 +02:00
Serafeim Chatzopoulos
3a0f09774a
Add script to find score limits
2023-07-21 17:55:41 +03:00
Ilias Kanellos
06b9b71c4e
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-07-21 17:42:49 +03:00
Ilias Kanellos
2374f445a9
Produce additional bip update specific files
2023-07-21 17:42:46 +03:00
Serafeim Chatzopoulos
cb0f3c50f6
Format workflow.xml
2023-07-21 16:07:10 +03:00
Serafeim Chatzopoulos
c64e5e588f
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-07-21 15:27:02 +03:00
Serafeim Chatzopoulos
2cc5b1a39b
Fixes in workflow.xml
2023-07-21 15:26:50 +03:00
Ilias Kanellos
0f96af5d56
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-07-21 13:42:35 +03:00
Ilias Kanellos
03da965162
Format bip-score based file without doi references
2023-07-21 13:42:30 +03:00
Giambattista Bloisi
f03153823a
Update testCitationRelations number of expected citations according to changes made in 0559d8b4
(monodirectional citations)
2023-07-21 10:48:28 +02:00
Giambattista Bloisi
54c1eacef1
SparkJobTest was failing because testing workingdir was not cleaned up after eact test
2023-07-21 10:42:24 +02:00
Giambattista Bloisi
5e15f20e6e
Fix entityMerger that was excluding the authors of the first entity in the list to merge
2023-07-21 00:46:54 +02:00
Giambattista Bloisi
0210a14e43
Ignore timestamp differences in PromoteActionPayloadForGraphTableJobTest
2023-07-20 23:45:57 +02:00
Giambattista Bloisi
dba34505de
Fix SparkStatsTest bug where parquet tables were incorrectly read as text files leading to unpredictable count() values
2023-07-19 14:24:52 +02:00
Giambattista Bloisi
e47ed1fdb2
Use DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES in json mapper to avoid that tests fail if they encounter unmapped properties
2023-07-19 14:21:40 +02:00
Serafeim Chatzopoulos
db4ca43ee8
Resolve conflict
2023-07-18 18:38:26 +03:00
Serafeim Chatzopoulos
be320ba3c1
Indentation fixes
2023-07-17 16:04:21 +03:00
dimitrispie
be4856ef35
Update step15.sql
2023-07-17 15:33:58 +03:00
Serafeim Chatzopoulos
bc1a4611aa
Minor changes
2023-07-17 11:17:53 +03:00
Claudio Atzori
8af129b0c7
merged stats promotion step from antonis/promotion-prod-only
2023-07-13 15:03:28 +02:00
dimitrispie
706092bc19
Update updateProductionViews.sh
2023-07-13 15:48:12 +03:00
dimitrispie
aedd279f78
Updates Promotion DBs
...
- Add a step for promoting the splitted monitor DBs
2023-07-13 15:35:46 +03:00
dimitrispie
163b2ee2a8
Changes
...
1. Monitor updates
2. Bug fixes during copy to impala cluster
2023-07-13 15:25:00 +03:00
dimitrispie
76901a25f9
Updates Promotion DBs
...
- Add a step for promoting the splitted monitor DBs
2023-07-12 22:49:08 +03:00
Serafeim Chatzopoulos
4eba14a80e
Add oozie workflow
2023-07-06 21:07:50 +03:00
Serafeim Chatzopoulos
c2998a14e8
Add basic tests for affiliation relations
2023-07-06 20:28:16 +03:00
Serafeim Chatzopoulos
bc7b00bcd1
Add bi-directional affiliation relations
2023-07-06 18:29:15 +03:00
Serafeim Chatzopoulos
12528ed2ef
Refactor PrepareAffiliationRelations.java to use OafMapperUtils common functions
2023-07-06 18:08:33 +03:00
Serafeim Chatzopoulos
bbc245696e
Prepare actionsets for BIP affiliations
2023-07-06 15:56:12 +03:00
Ilias Kanellos
0c433eccdd
Fix scores & Workflow
2023-07-06 15:06:28 +03:00
Ilias Kanellos
d5c39a1059
Fix map scores to doi
2023-07-06 15:04:48 +03:00
Ilias Kanellos
772d5f0aab
Make PR and AttRank serial
2023-07-06 13:47:51 +03:00
Giambattista Bloisi
801da2fd4a
New sources formatted by maven plugin
2023-07-06 10:28:53 +02:00
Giambattista Bloisi
bd3fcf869a
rename dnet-pace-core into dhp-pace-core module and use it as dependency in other modules
2023-07-06 10:02:23 +02:00
Serafeim Chatzopoulos
347a889b20
Read affiliation relations
2023-07-06 00:51:01 +03:00
Miriam Baglioni
8dcd028eed
[UsageCount] fixed typo in attribute name for datasource table
2023-07-01 16:07:22 +02:00
Miriam Baglioni
7738372125
[UsageCount] fixed typo in attribute name for datasource table
2023-06-30 18:56:41 +02:00
Sandro La Bruzzo
9963fd6d29
updated log to add subentity
2023-06-28 13:36:05 +02:00
Claudio Atzori
f3a85e224b
merged from branch beta the bulk tagging (single step, negative constraints), the cleanig worflow (single step, pid type based cleaning), instance level fulltext
2023-06-28 13:33:57 +02:00
Sandro La Bruzzo
ed7e2ab6d1
reverted mistake on commit workflow.xml
2023-06-28 11:40:19 +02:00
Sandro La Bruzzo
9910ce06ae
added to CreateSimRel the feature to write time log
2023-06-28 11:38:16 +02:00
Miriam Baglioni
2717edafb7
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-06-28 11:25:14 +02:00
Miriam Baglioni
2f04c9d149
[BulkTagging] fixing left over for test
2023-06-28 11:24:42 +02:00
Sandro La Bruzzo
bd17c3edc8
added to CreateSimRel the feature to write time log
2023-06-28 11:20:58 +02:00
Claudio Atzori
288ec0b7d6
[doiboost] merged workflow from branch beta
2023-06-28 09:15:37 +02:00
Claudio Atzori
e10ce92fe5
[stats wf] merged workflows from branch beta
2023-06-27 14:32:48 +02:00
Claudio Atzori
d029bf0b94
Merge branch 'master' into distinct_pids_from_openorgs
2023-06-27 12:24:35 +02:00
Serafeim Chatzopoulos
60f25b780d
Minor fixes in workflow.xml and job.properties
2023-06-23 12:51:50 +03:00
Michele Artini
88a1cbc37d
fixed a datasource id
2023-06-22 07:56:33 +02:00
Michele Artini
009d7f312f
fixed a datasource Id
2023-06-21 16:17:34 +02:00
Claudio Atzori
b0ebf56367
Merge pull request 'Update step15_5.sql' ( #314 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#314
2023-06-21 10:33:22 +02:00
dimitrispie
2b6370eaee
Update step15_5.sql
...
Bug fix
2023-06-21 11:31:10 +03:00
Claudio Atzori
35e42a86ed
Merge pull request 'Update step15_5.sql' ( #313 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#313
2023-06-21 10:26:16 +02:00
dimitrispie
74cb060bfe
Update step15_5.sql
...
Add "if not exists" clause
2023-06-21 11:24:06 +03:00
Claudio Atzori
85e016df17
Merge pull request 'Update step16-createIndicatorsTables.sql' ( #312 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#312
2023-06-21 09:52:33 +02:00
dimitrispie
a475cfcb7b
Update step16-createIndicatorsTables.sql
...
Rename a field in indi_pub_interdisciplinarity
2023-06-21 10:42:02 +03:00
Claudio Atzori
979cf9cd87
Merge pull request 'Update step15.sql' ( #311 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#311
2023-06-21 09:20:01 +02:00
dimitrispie
4648cd88d4
Update step15.sql
...
Cast score to double
2023-06-21 10:02:19 +03:00
dimitrispie
94d2573c77
Update step15.sql
...
Bug Fix
2023-06-21 09:22:39 +03:00
Claudio Atzori
0561362de2
Merge pull request 'Update step20-createMonitorDB_institutions.sql' ( #309 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#309
2023-06-20 15:07:09 +02:00
Claudio Atzori
50d7dc0078
[graph enrichment] fixed projectOrganizationPath not being passed to the apply_resulttoorganization_propagation node
2023-06-19 15:42:44 +02:00
Claudio Atzori
fbd9bf704e
indent
2023-06-19 15:41:22 +02:00
Giambattista Bloisi
758e662ab8
Revert "REmove duplicated code and ensure that load and initialization is done through "DedupConfig.load" method"
...
This reverts commit 485f9d18cb
.
2023-06-19 13:08:10 +02:00
Giambattista Bloisi
485f9d18cb
REmove duplicated code and ensure that load and initialization is done through "DedupConfig.load" method
2023-06-19 13:00:02 +02:00
dimitrispie
be2caedb04
Update step20-createMonitorDB_institutions.sql
...
Add openorgs____::1624ff7c01bb641b91f4518539a0c28a Vrije Universiteit Amsterdam
2023-06-19 12:12:17 +03:00
dimitrispie
36e0a8fec4
Changes to Promotion Stats WF
...
1. Add new cluster host at impala-shell commands
2. Add a step for splitting monitor dbs
3. Update workflow.xml to included the new splitting monitor dbs step
2023-06-19 09:44:34 +03:00
dimitrispie
4c770a5e29
Update finalizeImpalaCluster.sh
...
Drop views in shadow dbs before dropping the db
2023-06-15 13:25:37 +03:00
dimitrispie
e06d962a6a
Update step15.sql
2023-06-15 12:20:35 +03:00
dimitrispie
afcad08396
Update step20-createMonitorDB_institutions.sql
...
Added openorgs____::c0b262bd6eab819e4c994914f9c010e2 -- National Institute of Geophysics and Volcanology
2023-06-15 10:28:49 +03:00
Claudio Atzori
b9748763e2
Merge pull request '[stats wf] Bug fixes' ( #308 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#308
2023-06-14 21:57:03 +02:00
dimitrispie
42b8ce2ba4
Update copyDataToImpalaCluster.sh
2023-06-14 19:23:42 +03:00
dimitrispie
2032b0df40
Bug fixes
...
1. Remove tables/views from old databases in the new cluster, before dropping the dbs
2. Fix id in result_accessroute, indi_impact_measures, indi_pub_bronze_oa
2023-06-14 19:09:09 +03:00
Michele Artini
a92206dab5
re-added the name of a column (pid)
2023-06-13 11:43:10 +02:00
Claudio Atzori
b76a47b103
[aggregator graph] added column alias when mapping organization PIDs from the OpenOrgs database
2023-06-13 11:38:10 +02:00
Claudio Atzori
ad04f14b81
Merge branch 'beta' into distinct_pids_from_openorgs_beta
2023-06-12 09:58:21 +02:00
Claudio Atzori
55f002f1e9
Merge branch 'beta' into propagationProjectThroughParentChils
2023-06-12 09:56:53 +02:00
Claudio Atzori
4b00a76271
Merge branch 'beta' into fulltext_url_validation
2023-06-12 09:55:25 +02:00
Claudio Atzori
de225c71cd
Merge branch 'beta' into removeTaggingCondition
2023-06-12 09:50:40 +02:00
Claudio Atzori
e1409ffe80
update sql query to return distinct pids
2023-06-12 09:47:45 +02:00
Claudio Atzori
da7b66c542
Merge pull request '[stats wf] Added memory to hive' ( #305 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#305
2023-06-08 08:58:48 +02:00
dimitrispie
c5f42c7f5b
Added memory to hive
2023-06-07 18:18:23 +03:00
Claudio Atzori
afb76ebf0f
Merge pull request '[stats wf] Bug fix on indicators step' ( #304 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#304
2023-06-07 16:49:09 +02:00
dimitrispie
fa24e2e18f
Bug fix on indicators step
...
indi_pub_gold_oa table was missing during the creation of other indicators
2023-06-07 17:43:37 +03:00
Claudio Atzori
01c67e697d
Merge pull request '[ stats wf] Bug fix' ( #303 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#303
2023-06-07 14:41:44 +02:00
dimitrispie
28272c1b0e
Bug fix
2023-06-07 15:34:01 +03:00
Alessia Bardi
d5be6a13e9
Updated officialnmae of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal
2023-06-06 14:43:32 +02:00
Alessia Bardi
118e72d7db
Updated officialnmae of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal
2023-06-06 14:39:12 +02:00
Alessia Bardi
5befd93d7d
test records for Solr indexing
2023-06-06 14:34:33 +02:00
Michele Artini
cae92cf811
update sql query to return distinct pids
2023-06-06 14:06:06 +02:00
Claudio Atzori
8f651f1225
Merge pull request 'Changes to beta stats wf' ( #300 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#300
2023-06-06 11:41:36 +02:00
dimitrispie
ad07fbf053
Add names to organizations for collaboration indicators
2023-06-02 14:13:10 +03:00
dimitrispie
2324670714
Split Monitor DBs-Interdisciplinarity indicators
...
- Split DBs Monitor for faster rendering of visualizations
- Add interdisciplinarity indicators from result_fos
2023-06-02 13:34:16 +03:00
Miriam Baglioni
daf4d7971b
refactoring
2023-05-31 18:56:58 +02:00
Miriam Baglioni
97d72d41c3
finalization of implementation and testing
2023-05-31 18:53:22 +02:00
Miriam Baglioni
0389b57ca7
added propagation for project to organization
2023-05-31 11:06:58 +02:00
Claudio Atzori
e45777e7e1
[aggregator graph] added validation for URLs mapped from oaf:fulltext
2023-05-26 11:33:42 +02:00
dimitrispie
ebe586b1d1
Impact indicators/Unpaywall
...
- Added Impact indicators
- Added unpaywall open access colours
2023-05-26 10:25:28 +03:00
dimitrispie
d6102dd576
Update step16-createIndicatorsTables.sql
...
- Add org names to indi_project_collab_org
- Add indi_pub_bronze_oa
- Changes to indi_pub_hybrid_oa_with_cc
2023-05-25 14:52:34 +03:00
Miriam Baglioni
9097e71853
Added assertion in test
2023-05-24 16:30:53 +02:00
Miriam Baglioni
9567c13bc3
refactoring
2023-05-24 16:20:05 +02:00
Miriam Baglioni
34172455d1
[BulkTag] Adding remove constraints to specify when a community must not appear in the context of a result.
2023-05-24 09:56:23 +02:00
Ilias Kanellos
a1b9187039
Fix syntax error on workflow.xml
2023-05-23 17:17:12 +03:00
Ilias Kanellos
6a7e370a21
Remove unnecessary counts in graph creation
2023-05-23 16:48:58 +03:00
Ilias Kanellos
ec4e010687
End after rankings | Create graph debugged
2023-05-23 16:44:04 +03:00
Claudio Atzori
db625e548d
[UsageCount] addition of usagecount for Projects and datasources
2023-05-22 15:00:46 +02:00
Alessia Bardi
04141fe259
tests for records from D4Science catalogues
2023-05-19 14:28:24 +02:00
Claudio Atzori
a235d2a24a
Merge pull request 'Updates to steps related to transfer data to impala cluster' ( #295 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#295
2023-05-18 08:46:15 +02:00
dimitrispie
86f4f63daf
Updates to steps related to transfer data to impala cluster
...
1. Remove external table definitions in stats_ext
2. Fix the issue where some views are not created.
3. Added two workflow parameters for copying also the usage stats dbs
2023-05-18 09:33:05 +03:00
Claudio Atzori
909729a2fc
[dedup] tweaking num partitions, minor changes
2023-05-17 10:16:22 +02:00
Ilias Kanellos
38020e242a
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-05-16 17:34:53 +03:00
Ilias Kanellos
3d69f33c84
Fix selection of columns in graph creation
2023-05-16 17:34:42 +03:00
Ilias Kanellos
3c38f7ba6f
Fix selection of columns in graph creation
2023-05-16 17:32:53 +03:00
Serafeim Chatzopoulos
8ef718c363
Fix workflow application path
2023-05-16 16:28:48 +03:00
Serafeim Chatzopoulos
26328e2a0d
Move job.properties
2023-05-16 14:39:53 +03:00
Serafeim Chatzopoulos
4eec3e7052
Add jobTracker, nameNode && spark2Lib as global params in oozie wf
2023-05-15 22:28:48 +03:00
Serafeim Chatzopoulos
b83135c252
Add missing kill nodes in workflow.xml
2023-05-15 19:55:35 +03:00
Serafeim Chatzopoulos
45f2aa0867
Move end node ... at the end in workflow.xml
2023-05-15 17:52:20 +03:00
Claudio Atzori
8acad52a0c
Merge branch 'beta' into apc_affiliation
2023-05-15 15:47:33 +02:00
Claudio Atzori
8a463cc3e8
fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common
2023-05-15 15:44:46 +02:00
Serafeim Chatzopoulos
12a57e1f58
Resolve conflicts
2023-05-15 16:20:11 +03:00
Serafeim Chatzopoulos
82e2a96f51
Resolve conflicts
2023-05-15 15:53:12 +03:00
Serafeim Chatzopoulos
b8e8c959fe
Update workflow.xml && job.properties
2023-05-15 15:50:23 +03:00
Ilias Kanellos
4a905932a3
Spark properties from job.properties
2023-05-15 15:24:22 +03:00
Claudio Atzori
0c314d5e09
Merge pull request 'Update copyDataToImpalaCluster.sh' ( #293 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#293
2023-05-15 12:05:54 +02:00
Serafeim Chatzopoulos
07818131ef
Update documentation
2023-05-15 13:04:44 +03:00
dimitrispie
b3f9633205
Update copyDataToImpalaCluster.sh
...
Added option --user to impala-shell command
2023-05-15 12:51:44 +03:00
Miriam Baglioni
78b07400c0
changed test classes
2023-05-15 11:37:08 +02:00
Miriam Baglioni
86fe886c1a
removed the inverse of the Citing relation
2023-05-15 11:20:51 +02:00
Ilias Kanellos
1788ac2d4d
Correct filtering for MAG records
2023-05-12 12:55:43 +03:00
Miriam Baglioni
12cd179d2d
Merge pull request 'Update copyDataToImpalaCluster.sh' ( #291 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: D-Net/dnet-hadoop#291
2023-05-12 11:36:34 +02:00
dimitrispie
00d0d162b6
Update copyDataToImpalaCluster.sh
...
Added a temporary folder to copy the files to avoid permission issues
2023-05-12 12:31:13 +03:00
Ilias Kanellos
5ddbb4ad10
Spark properties no longer hardcoded
2023-05-11 15:36:47 +03:00
Ilias Kanellos
3de35fd6a3
Produce 5 classes of ranking scores
2023-05-11 14:42:25 +03:00
Miriam Baglioni
8c05f49665
moved the version as it was before the change
2023-05-09 10:48:34 +02:00