Claudio Atzori
9f5d16624c
Merge pull request '[graph raw] datainfo.invisible set as true only for entities' ( #336 ) from invisible_relations into beta
...
Reviewed-on: #336
2023-09-04 16:14:47 +02:00
Claudio Atzori
adec6692ca
Merge branch 'beta' into invisible_relations
2023-09-04 16:13:06 +02:00
Claudio Atzori
15666e86a8
added collectedfrom to the affiliation relations imported from Crossref
2023-09-04 15:56:06 +02:00
Claudio Atzori
7d6bd4f20b
Merge pull request 'Fix import of affiliations relations from Crossref' ( #335 ) from 8876_fix_crossref_affiliation_relations_import into beta
...
Reviewed-on: #335
2023-09-04 15:19:58 +02:00
Claudio Atzori
5b06c9d06f
[graph raw] datainfo.invisible set as true only for entities
2023-09-04 15:15:24 +02:00
Serafeim Chatzopoulos
7de0164c26
Fix import of affiliations relations from Crossref
2023-09-04 16:04:41 +03:00
Claudio Atzori
488d9a1cea
Merge pull request 'Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set it is defaulted to 1Gb' ( #331 ) from consistencywf_memoryoverhead_conf into beta
...
Reviewed-on: #331
2023-08-29 16:31:36 +02:00
Giambattista Bloisi
6b1c05d118
Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set it is defaulted to 1Gb
2023-08-29 16:04:19 +02:00
Claudio Atzori
bf35280ea6
code formatting
2023-08-29 11:11:00 +02:00
Claudio Atzori
0515d81c7c
Merge pull request 'Rewrite SparkPropagateRelation exploiting Dataframe API' ( #330 ) from propagate_relation_rewrite into beta
...
Reviewed-on: #330
2023-08-29 10:47:14 +02:00
Claudio Atzori
58665a246c
Merge branch 'beta' into propagate_relation_rewrite
2023-08-29 10:47:02 +02:00
Claudio Atzori
f437be80ad
[impact indicators] adjusted paths in the bip ranker wf parameters
2023-08-29 09:03:03 +02:00
Giambattista Bloisi
d012aec0b3
Revert PropagateRelation's argument name from outputPath to graphOutputPath in consistency workflow ( #8964 )
2023-08-28 22:44:54 +02:00
Giambattista Bloisi
a860e19423
Fix: ensure all relations are written out, not only those managed by dedup
2023-08-28 15:36:02 +02:00
Giambattista Bloisi
0d7b2bf83d
Rewrite SparkPropagateRelation exploiting Dataframe API
2023-08-28 10:34:54 +02:00
Miriam Baglioni
9c8b41475a
Merge pull request '8172_impact_indicators_workflow' ( #284 ) from 8172_impact_indicators_workflow into beta
...
Reviewed-on: #284
2023-08-14 15:50:48 +02:00
Serafeim Chatzopoulos
97c1ba8918
Merge actionsets of results and projects
2023-08-11 15:56:53 +03:00
Miriam Baglioni
35b8deb2c6
Merge pull request 'DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag' ( #329 ) from dispatch_filter_invisible_entities into beta
...
Reviewed-on: #329
2023-08-10 12:56:18 +02:00
Giambattista Bloisi
95cd2b9b1e
Make filterInvisible a mandatory parameter of DispatchEntitiesSparkJob
...
Make filterInvisible a mandatory parameter of both dedup/consistency and graph/group oozie workflows
2023-08-10 11:53:48 +02:00
Giambattista Bloisi
fab9920271
DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
2023-08-09 15:41:43 +02:00
Miriam Baglioni
c25ac21e5e
Merge pull request 'graph cleaning, suggestions from ticket 8898' ( #325 ) from cleaning_8898 into beta
...
Reviewed-on: #325
2023-08-08 11:14:19 +02:00
Miriam Baglioni
c334fe2438
Merge pull request 'Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities' ( #328 ) from cleanup_relations_after_dedup into beta
...
Reviewed-on: #328
2023-08-08 09:49:12 +02:00
Miriam Baglioni
0e2f855807
Merge pull request 'Updates Promotion DBs' ( #321 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #321
2023-08-07 12:09:16 +02:00
Miriam Baglioni
18fbe52b20
Merge pull request 'Import affiliation relations from Crossref' ( #320 ) from 8876 into beta
...
Reviewed-on: #320
2023-08-07 10:45:30 +02:00
Giambattista Bloisi
97b6d1dc45
Filter ids by dataInfo.deletedbyinference and DataInfo.invisible flags
...
Filter relations also by dataInfo.invisible flag
2023-08-07 10:24:11 +02:00
Giambattista Bloisi
af49424b59
Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities
2023-08-04 14:27:39 +02:00
Claudio Atzori
b9dddbfe54
rule out records with NULL dataInfo, except for Relations
2023-07-31 17:53:54 +02:00
Claudio Atzori
11ffb9bd68
rule out records with NULL dataInfo
2023-07-31 12:35:33 +02:00
Serafeim Chatzopoulos
7cefe2665b
Remove unnecessary classes
2023-07-28 19:14:39 +03:00
Serafeim Chatzopoulos
26a92ce762
Merge branch '8876' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8876
2023-07-28 19:03:57 +03:00
Serafeim Chatzopoulos
ebfba38ab6
Add changes from code review
2023-07-28 19:03:47 +03:00
Serafeim Chatzopoulos
eb8684a8cf
Merge branch 'beta' into 8876
2023-07-28 13:39:33 +02:00
Claudio Atzori
1275a07d45
Merge pull request '[graph indexing] expand the instance level fulltext in the XML records' ( #326 ) from instance_fulltext_xml into beta
...
Reviewed-on: #326
2023-07-27 15:02:07 +02:00
Claudio Atzori
a72b9e96ac
expand the instance level fulltext in the XML records
2023-07-27 14:57:38 +02:00
Claudio Atzori
d8435a6512
inverted condition
2023-07-25 17:39:57 +02:00
Claudio Atzori
270df939c4
partial implementation of the suggestions from https://support.openaire.eu/issues/8898
2023-07-25 17:29:50 +02:00
Claudio Atzori
8c63e4a864
Merge pull request 'Refactor Dedup using Spark Dataframe API, initial support for scala 2.12 and Spark 3.4' ( #324 ) from dedup-with-dataframe-2 into beta
...
Reviewed-on: #324
2023-07-25 10:17:17 +02:00
Giambattista Bloisi
e64c2854a3
Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
...
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Giambattista Bloisi
bb5b845e3c
Use scala.binary.version property to resolve scala maven dependencies
...
Ensure consistent usage of maven properties
Profile for compiling with scala 2.12 and Spark 3.4
2023-07-24 11:13:48 +02:00
Claudio Atzori
002b24e06f
Merge pull request '[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests' ( #315 ) from pid_cleaning into beta
...
Reviewed-on: #315
2023-07-24 10:49:44 +02:00
Claudio Atzori
c754397a19
Merge branch 'beta' into pid_cleaning
2023-07-24 10:49:31 +02:00
Claudio Atzori
f0678cda09
Merge pull request 'fix_beta_tests' ( #323 ) from fix_beta_tests into beta
...
Reviewed-on: #323
2023-07-24 10:47:35 +02:00
Serafeim Chatzopoulos
3a0f09774a
Add script to find score limits
2023-07-21 17:55:41 +03:00
Ilias Kanellos
06b9b71c4e
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-07-21 17:42:49 +03:00
Ilias Kanellos
2374f445a9
Produce additional bip update specific files
2023-07-21 17:42:46 +03:00
Serafeim Chatzopoulos
cb0f3c50f6
Format workflow.xml
2023-07-21 16:07:10 +03:00
Serafeim Chatzopoulos
c64e5e588f
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-07-21 15:27:02 +03:00
Serafeim Chatzopoulos
2cc5b1a39b
Fixes in workflow.xml
2023-07-21 15:26:50 +03:00
Ilias Kanellos
0f96af5d56
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-07-21 13:42:35 +03:00
Ilias Kanellos
03da965162
Format bip-score based file without doi references
2023-07-21 13:42:30 +03:00
Giambattista Bloisi
f03153823a
Update testCitationRelations number of expected citations according to changes made in 0559d8b4
(monodirectional citations)
2023-07-21 10:48:28 +02:00
Giambattista Bloisi
54c1eacef1
SparkJobTest was failing because testing workingdir was not cleaned up after each test
2023-07-21 10:42:24 +02:00
Giambattista Bloisi
5e15f20e6e
Fix entityMerger that was excluding the authors of the first entity in the list to merge
2023-07-21 00:46:54 +02:00
Giambattista Bloisi
0210a14e43
Ignore timestamp differences in PromoteActionPayloadForGraphTableJobTest
2023-07-20 23:45:57 +02:00
Giambattista Bloisi
dba34505de
Fix SparkStatsTest bug where parquet tables were incorrectly read as text files leading to unpredictable count() values
2023-07-19 14:24:52 +02:00
Giambattista Bloisi
e47ed1fdb2
Use DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES in json mapper to avoid that tests fail if they encounter unmapped properties
2023-07-19 14:21:40 +02:00
Giambattista Bloisi
38dfebfbe6
Disable MdStoreClientTest test as it requires a local mongodb running and it does not perform any assertions
2023-07-19 14:18:56 +02:00
Serafeim Chatzopoulos
db4ca43ee8
Resolve conflict
2023-07-18 18:38:26 +03:00
Serafeim Chatzopoulos
be320ba3c1
Indentation fixes
2023-07-17 16:04:21 +03:00
Serafeim Chatzopoulos
bc1a4611aa
Minor changes
2023-07-17 11:17:53 +03:00
dimitrispie
76901a25f9
Updates Promotion DBs
...
- Add a step for promoting the split monitor DBs
2023-07-12 22:49:08 +03:00
Giambattista Bloisi
ef493681d9
Merge pull request 'Import dnet-pace-core module in this project and use it after renaming to dhp-pace-core' ( #319 ) from beta_with_pace_core into beta
...
Reviewed-on: #319
2023-07-11 14:03:15 +02:00
Serafeim Chatzopoulos
4eba14a80e
Add oozie workflow
2023-07-06 21:07:50 +03:00
Serafeim Chatzopoulos
c2998a14e8
Add basic tests for affiliation relations
2023-07-06 20:28:16 +03:00
Serafeim Chatzopoulos
bc7b00bcd1
Add bi-directional affiliation relations
2023-07-06 18:29:15 +03:00
Serafeim Chatzopoulos
12528ed2ef
Refactor PrepareAffiliationRelations.java to use OafMapperUtils common functions
2023-07-06 18:08:33 +03:00
Serafeim Chatzopoulos
bbc245696e
Prepare actionsets for BIP affiliations
2023-07-06 15:56:12 +03:00
Ilias Kanellos
0c433eccdd
Fix scores & Workflow
2023-07-06 15:06:28 +03:00
Ilias Kanellos
d5c39a1059
Fix map scores to doi
2023-07-06 15:04:48 +03:00
Ilias Kanellos
772d5f0aab
Make PR and AttRank serial
2023-07-06 13:47:51 +03:00
Giambattista Bloisi
801da2fd4a
New sources formatted by maven plugin
2023-07-06 10:28:53 +02:00
Giambattista Bloisi
bd3fcf869a
rename dnet-pace-core into dhp-pace-core module and use it as dependency in other modules
2023-07-06 10:02:23 +02:00
Serafeim Chatzopoulos
347a889b20
Read affiliation relations
2023-07-06 00:51:01 +03:00
Giambattista Bloisi
3b35db5fbd
Import dnet-pace-core module from dnet-dedup repository
2023-07-05 22:23:06 +02:00
Miriam Baglioni
7738372125
[UsageCount] fixed typo in attribute name for datasource table
2023-06-30 18:56:41 +02:00
Sandro La Bruzzo
9963fd6d29
updated log to add subentity
2023-06-28 13:36:05 +02:00
Sandro La Bruzzo
ed7e2ab6d1
reverted mistake on commit workflow.xml
2023-06-28 11:40:19 +02:00
Sandro La Bruzzo
9910ce06ae
added to CreateSimRel the feature to write time log
2023-06-28 11:38:16 +02:00
Miriam Baglioni
2717edafb7
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-06-28 11:25:14 +02:00
Miriam Baglioni
2f04c9d149
[BulkTagging] fixing left over for test
2023-06-28 11:24:42 +02:00
Sandro La Bruzzo
bd17c3edc8
added to CreateSimRel the feature to write time log
2023-06-28 11:20:58 +02:00
Sandro La Bruzzo
b195da3a83
Added utility to write time logs during the deduplication phase
2023-06-28 11:20:09 +02:00
Claudio Atzori
0f5a819f44
[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests
2023-06-23 16:10:49 +02:00
Serafeim Chatzopoulos
60f25b780d
Minor fixes in workflow.xml and job.properties
2023-06-23 12:51:50 +03:00
Michele Artini
88a1cbc37d
fixed a datasource id
2023-06-22 07:56:33 +02:00
Claudio Atzori
b0ebf56367
Merge pull request 'Update step15_5.sql' ( #314 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #314
2023-06-21 10:33:22 +02:00
dimitrispie
2b6370eaee
Update step15_5.sql
...
Bug fix
2023-06-21 11:31:10 +03:00
Claudio Atzori
35e42a86ed
Merge pull request 'Update step15_5.sql' ( #313 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #313
2023-06-21 10:26:16 +02:00
dimitrispie
74cb060bfe
Update step15_5.sql
...
Add "if not exists" clause
2023-06-21 11:24:06 +03:00
Claudio Atzori
85e016df17
Merge pull request 'Update step16-createIndicatorsTables.sql' ( #312 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #312
2023-06-21 09:52:33 +02:00
dimitrispie
a475cfcb7b
Update step16-createIndicatorsTables.sql
...
Rename a field in indi_pub_interdisciplinarity
2023-06-21 10:42:02 +03:00
Claudio Atzori
979cf9cd87
Merge pull request 'Update step15.sql' ( #311 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #311
2023-06-21 09:20:01 +02:00
dimitrispie
4648cd88d4
Update step15.sql
...
Cast score to double
2023-06-21 10:02:19 +03:00
dimitrispie
94d2573c77
Update step15.sql
...
Bug Fix
2023-06-21 09:22:39 +03:00
Claudio Atzori
0561362de2
Merge pull request 'Update step20-createMonitorDB_institutions.sql' ( #309 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #309
2023-06-20 15:07:09 +02:00
Claudio Atzori
50d7dc0078
[graph enrichment] fixed projectOrganizationPath not being passed to the apply_resulttoorganization_propagation node
2023-06-19 15:42:44 +02:00
Claudio Atzori
fbd9bf704e
indent
2023-06-19 15:41:22 +02:00
Claudio Atzori
6210f6ee48
Merge pull request 'Precompile blacklists patterns before evaluating clustering criteria' ( #1 ) from optimized-clustering into master
...
Reviewed-on: D-Net/dnet-dedup#1
2023-06-19 12:43:49 +02:00
dimitrispie
be2caedb04
Update step20-createMonitorDB_institutions.sql
...
Add openorgs____::1624ff7c01bb641b91f4518539a0c28a Vrije Universiteit Amsterdam
2023-06-19 12:12:17 +03:00
dimitrispie
36e0a8fec4
Changes to Promotion Stats WF
...
1. Add new cluster host at impala-shell commands
2. Add a step for splitting monitor dbs
3. Update workflow.xml to include the new splitting monitor dbs step
2023-06-19 09:44:34 +03:00
Giambattista Bloisi
b0ade43608
Precompile blacklists patterns before evaluating clustering criteria
...
Enable Junit 5 tests in maven builds
Make path comparisons platform-independent
Read String resource files assuming they are encoded in UTF-8
Fix a few test conditions
2023-06-16 09:41:11 +02:00
dimitrispie
4c770a5e29
Update finalizeImpalaCluster.sh
...
Drop views in shadow dbs before dropping the db
2023-06-15 13:25:37 +03:00
dimitrispie
e06d962a6a
Update step15.sql
2023-06-15 12:20:35 +03:00
dimitrispie
afcad08396
Update step20-createMonitorDB_institutions.sql
...
Added openorgs____::c0b262bd6eab819e4c994914f9c010e2 -- National Institute of Geophysics and Volcanology
2023-06-15 10:28:49 +03:00
Claudio Atzori
b9748763e2
Merge pull request '[stats wf] Bug fixes' ( #308 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #308
2023-06-14 21:57:03 +02:00
dimitrispie
42b8ce2ba4
Update copyDataToImpalaCluster.sh
2023-06-14 19:23:42 +03:00
dimitrispie
2032b0df40
Bug fixes
...
1. Remove tables/views from old databases in the new cluster, before dropping the dbs
2. Fix id in result_accessroute, indi_impact_measures, indi_pub_bronze_oa
2023-06-14 19:09:09 +03:00
Claudio Atzori
b76a47b103
[aggregator graph] added column alias when mapping organization PIDs from the OpenOrgs database
2023-06-13 11:38:10 +02:00
Claudio Atzori
744a61a030
depending on dhp-schema:3.17.1
2023-06-12 13:49:44 +02:00
Claudio Atzori
2e4616a251
Merge pull request '[graph cleaning] pid cleaning' ( #307 ) from pid_cleaning into beta
...
Reviewed-on: #307
2023-06-12 13:32:29 +02:00
Claudio Atzori
d6a8b24711
Merge branch 'beta' into pid_cleaning
2023-06-12 13:32:22 +02:00
Claudio Atzori
fdbfb25614
Merge pull request 'update sql query to return distinct pids [beta]' ( #306 ) from distinct_pids_from_openorgs_beta into beta
...
Reviewed-on: #306
2023-06-12 09:59:00 +02:00
Claudio Atzori
ad04f14b81
Merge branch 'beta' into distinct_pids_from_openorgs_beta
2023-06-12 09:58:21 +02:00
Claudio Atzori
a98e6591e2
Merge pull request 'propagation of projects through parent-child relations' ( #299 ) from propagationProjectThroughParentChils into beta
...
Reviewed-on: #299
2023-06-12 09:57:20 +02:00
Claudio Atzori
55f002f1e9
Merge branch 'beta' into propagationProjectThroughParentChils
2023-06-12 09:56:53 +02:00
Claudio Atzori
daa21ddbb5
Merge pull request '[aggregator graph] validation for URLs from oaf:fulltext' ( #298 ) from fulltext_url_validation into beta
...
Reviewed-on: #298
2023-06-12 09:55:35 +02:00
Claudio Atzori
4b00a76271
Merge branch 'beta' into fulltext_url_validation
2023-06-12 09:55:25 +02:00
Claudio Atzori
eb2fa8556b
Merge pull request 'removeTaggingCondition' ( #297 ) from removeTaggingCondition into beta
...
Reviewed-on: #297
2023-06-12 09:53:05 +02:00
Claudio Atzori
de225c71cd
Merge branch 'beta' into removeTaggingCondition
2023-06-12 09:50:40 +02:00
Claudio Atzori
e1409ffe80
update sql query to return distinct pids
2023-06-12 09:47:45 +02:00
Claudio Atzori
1d33074fd1
WIP: pid cleaning
2023-06-09 16:47:25 +02:00
Claudio Atzori
da7b66c542
Merge pull request '[stats wf] Added memory to hive' ( #305 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #305
2023-06-08 08:58:48 +02:00
dimitrispie
c5f42c7f5b
Added memory to hive
2023-06-07 18:18:23 +03:00
Claudio Atzori
afb76ebf0f
Merge pull request '[stats wf] Bug fix on indicators step' ( #304 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #304
2023-06-07 16:49:09 +02:00
dimitrispie
fa24e2e18f
Bug fix on indicators step
...
indi_pub_gold_oa table was missing during the creation of other indicators
2023-06-07 17:43:37 +03:00
Claudio Atzori
01c67e697d
Merge pull request '[ stats wf] Bug fix' ( #303 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #303
2023-06-07 14:41:44 +02:00
dimitrispie
28272c1b0e
Bug fix
2023-06-07 15:34:01 +03:00
Alessia Bardi
d5be6a13e9
Updated officialname of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal
2023-06-06 14:43:32 +02:00
Claudio Atzori
8f651f1225
Merge pull request 'Changes to beta stats wf' ( #300 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #300
2023-06-06 11:41:36 +02:00
dimitrispie
ad07fbf053
Add names to organizations for collaboration indicators
2023-06-02 14:13:10 +03:00
dimitrispie
2324670714
Split Monitor DBs-Interdisciplinarity indicators
...
- Split DBs Monitor for faster rendering of visualizations
- Add interdisciplinarity indicators from result_fos
2023-06-02 13:34:16 +03:00
Miriam Baglioni
daf4d7971b
refactoring
2023-05-31 18:56:58 +02:00
Miriam Baglioni
97d72d41c3
finalization of implementation and testing
2023-05-31 18:53:22 +02:00
Miriam Baglioni
0389b57ca7
added propagation for project to organization
2023-05-31 11:06:58 +02:00
Claudio Atzori
e45777e7e1
[aggregator graph] added validation for URLs mapped from oaf:fulltext
2023-05-26 11:33:42 +02:00
dimitrispie
ebe586b1d1
Impact indicators/Unpaywall
...
- Added Impact indicators
- Added unpaywall open access colours
2023-05-26 10:25:28 +03:00
dimitrispie
d6102dd576
Update step16-createIndicatorsTables.sql
...
- Add org names to indi_project_collab_org
- Add indi_pub_bronze_oa
- Changes to indi_pub_hybrid_oa_with_cc
2023-05-25 14:52:34 +03:00
Miriam Baglioni
9097e71853
Added assertion in test
2023-05-24 16:30:53 +02:00
Miriam Baglioni
9567c13bc3
refactoring
2023-05-24 16:20:05 +02:00
Miriam Baglioni
34172455d1
[BulkTag] Adding remove constraints to specify when a community must not appear in the context of a result.
2023-05-24 09:56:23 +02:00
Ilias Kanellos
a1b9187039
Fix syntax error on workflow.xml
2023-05-23 17:17:12 +03:00
Ilias Kanellos
6a7e370a21
Remove unnecessary counts in graph creation
2023-05-23 16:48:58 +03:00
Ilias Kanellos
ec4e010687
End after rankings | Create graph debugged
2023-05-23 16:44:04 +03:00
Claudio Atzori
a235d2a24a
Merge pull request 'Updates to steps related to transfer data to impala cluster' ( #295 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #295
2023-05-18 08:46:15 +02:00
dimitrispie
86f4f63daf
Updates to steps related to transfer data to impala cluster
...
1. Remove external table definitions in stats_ext
2. Fix the issue where some views are not created.
3. Added two workflow parameters for copying also the usage stats dbs
2023-05-18 09:33:05 +03:00
Claudio Atzori
909729a2fc
[dedup] tweaking num partitions, minor changes
2023-05-17 10:16:22 +02:00
Ilias Kanellos
38020e242a
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-05-16 17:34:53 +03:00
Ilias Kanellos
3d69f33c84
Fix selection of columns in graph creation
2023-05-16 17:34:42 +03:00
Ilias Kanellos
3c38f7ba6f
Fix selection of columns in graph creation
2023-05-16 17:32:53 +03:00
Serafeim Chatzopoulos
8ef718c363
Fix workflow application path
2023-05-16 16:28:48 +03:00
Serafeim Chatzopoulos
26328e2a0d
Move job.properties
2023-05-16 14:39:53 +03:00
Serafeim Chatzopoulos
4eec3e7052
Add jobTracker, nameNode && spark2Lib as global params in oozie wf
2023-05-15 22:28:48 +03:00
Serafeim Chatzopoulos
b83135c252
Add missing kill nodes in workflow.xml
2023-05-15 19:55:35 +03:00
Serafeim Chatzopoulos
45f2aa0867
Move end node ... at the end in workflow.xml
2023-05-15 17:52:20 +03:00
Claudio Atzori
e309688711
Merge pull request 'fix APC affiliation links' ( #294 ) from apc_affiliation into beta
...
Reviewed-on: #294
2023-05-15 15:47:57 +02:00
Claudio Atzori
8acad52a0c
Merge branch 'beta' into apc_affiliation
2023-05-15 15:47:33 +02:00
Claudio Atzori
8a463cc3e8
fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common
2023-05-15 15:44:46 +02:00
Serafeim Chatzopoulos
12a57e1f58
Resolve conflicts
2023-05-15 16:20:11 +03:00
Serafeim Chatzopoulos
82e2a96f51
Resolve conflicts
2023-05-15 15:53:12 +03:00
Serafeim Chatzopoulos
b8e8c959fe
Update workflow.xml && job.properties
2023-05-15 15:50:23 +03:00
Ilias Kanellos
4a905932a3
Spark properties from job.properties
2023-05-15 15:24:22 +03:00
Claudio Atzori
0c314d5e09
Merge pull request 'Update copyDataToImpalaCluster.sh' ( #293 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #293
2023-05-15 12:05:54 +02:00
Serafeim Chatzopoulos
07818131ef
Update documentation
2023-05-15 13:04:44 +03:00
dimitrispie
b3f9633205
Update copyDataToImpalaCluster.sh
...
Added option --user to impala-shell command
2023-05-15 12:51:44 +03:00
Miriam Baglioni
021321ae06
Merge pull request 'removed the inverse of the Citing relation' ( #292 ) from citeOnly into beta
...
Reviewed-on: #292
2023-05-15 11:37:39 +02:00
Miriam Baglioni
78b07400c0
changed test classes
2023-05-15 11:37:08 +02:00
Miriam Baglioni
86fe886c1a
removed the inverse of the Citing relation
2023-05-15 11:20:51 +02:00
Ilias Kanellos
1788ac2d4d
Correct filtering for MAG records
2023-05-12 12:55:43 +03:00
Miriam Baglioni
12cd179d2d
Merge pull request 'Update copyDataToImpalaCluster.sh' ( #291 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #291
2023-05-12 11:36:34 +02:00
dimitrispie
00d0d162b6
Update copyDataToImpalaCluster.sh
...
Added a temporary folder to copy the files to avoid permission issues
2023-05-12 12:31:13 +03:00
Ilias Kanellos
5ddbb4ad10
Spark properties no longer hardcoded
2023-05-11 15:36:47 +03:00
Ilias Kanellos
3de35fd6a3
Produce 5 classes of ranking scores
2023-05-11 14:42:25 +03:00
Miriam Baglioni
8c05f49665
moved the version as it was before the change
2023-05-09 10:48:34 +02:00
Miriam Baglioni
99ac5bab46
added check to avoid NPE when checking the organization country
2023-05-04 19:38:39 +02:00
Claudio Atzori
0704e186f6
Merge pull request 'Stats wf executed on hive only' ( #283 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #283
2023-05-02 14:05:12 +02:00
Claudio Atzori
cd80b200ee
Merge pull request 'Affiliation links from APC' ( #290 ) from apc_affiliation into beta
...
Reviewed-on: #290
2023-05-02 12:00:04 +02:00
Claudio Atzori
d8882c4481
extended mapping applied to datacite records to produce affiliations using the ROR ids. In case of APCs it includes the amount and the currency in the relation
2023-05-02 11:56:51 +02:00
Claudio Atzori
d02916ef82
code formatting
2023-05-02 11:05:37 +02:00
Claudio Atzori
f653640cd9
Merge pull request 'Bulk Tagging single step' ( #289 ) from bulkTagRefactor into beta
...
Reviewed-on: #289
2023-05-02 10:54:14 +02:00
dimitrispie
c3d58e58e1
Bug fixes
2023-05-02 11:54:07 +03:00
Claudio Atzori
abd7ca0c18
Merge branch 'beta' into bulkTagRefactor
2023-05-02 10:50:01 +02:00
Claudio Atzori
de36c7b083
Merge pull request 'Enrichment - result to community through organization' ( #255 ) from organizationToRepresentative into beta
...
Reviewed-on: #255
2023-05-02 10:47:07 +02:00
Claudio Atzori
45f625d14f
Merge branch 'beta' into organizationToRepresentative
2023-05-02 10:46:55 +02:00
Claudio Atzori
cdd33f7445
Merge pull request 'graph cleaning refactoring' ( #282 ) from graph_cleaning_refactoring into beta
...
Reviewed-on: #282
2023-05-02 10:40:02 +02:00
Claudio Atzori
de11edca98
Merge branch 'beta' into organizationToRepresentative
2023-05-02 09:59:41 +02:00
Claudio Atzori
851f664bd9
Merge branch 'beta' into graph_cleaning_refactoring
2023-05-02 09:55:40 +02:00
dimitrispie
e57ecdaf98
Update step20-createMonitorDB.sql
...
Add University of Manitoba
2023-04-30 17:52:23 +03:00
Ilias Kanellos
90332439ad
Remove deletion of synonym folder
2023-04-28 13:45:19 +03:00
Ilias Kanellos
a98da54896
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-04-28 13:23:49 +03:00
Ilias Kanellos
09485fbee3
Fixed unicode bug. Workflow ends after first script
2023-04-28 13:09:13 +03:00
Serafeim Chatzopoulos
614cc1089b
Add separate folder for results && project actionsets
2023-04-27 12:37:15 +03:00
Serafeim Chatzopoulos
815a4ddbba
Add actionset creation for project bip indicators in workflow
2023-04-26 20:40:06 +03:00
Serafeim Chatzopoulos
ee04cf92bf
Add actionsets for project impact indicators
2023-04-26 20:23:46 +03:00
dimitrispie
fdb5d2b39f
Bug fixes
2023-04-23 18:29:00 +03:00
dimitrispie
53ce023035
Bug fixes
2023-04-23 18:23:45 +03:00
Miriam Baglioni
ce03f3ee62
merging with branch beta
2023-04-20 14:50:47 +02:00
dimitrispie
4fa750b719
Bug fixes on monitor-update
2023-04-19 17:39:53 +03:00
dimitrispie
5247cb7115
Bug fix
2023-04-19 11:11:19 +03:00
Miriam Baglioni
efc4f6a658
[bulkTag] refactor to enrich each result single step
2023-04-18 17:39:31 +02:00
Serafeim Chatzopoulos
23f58a86f1
Change jar param in project impact indicators action
2023-04-18 12:26:01 +03:00
Miriam Baglioni
73f77575bd
[ZenodoApiClient] align with master version
2023-04-18 10:25:27 +02:00
Miriam Baglioni
697a134504
-
2023-04-18 10:21:12 +02:00
Miriam Baglioni
6cc95c96a2
-
2023-04-18 09:53:11 +02:00
Michele De Bonis
cb595c87bb
implementation of the support for authors deduplication: cosinesimilarity comparator and double array json parser
2023-04-17 11:06:27 +02:00
dimitrispie
25dafccc24
Merge branch 'hive' into beta
2023-04-12 11:36:59 +03:00
Claudio Atzori
a2dcb06daf
added eoscifguidelines in the result view; removed compute statistics statements
2023-04-11 10:43:32 +02:00
Serafeim Chatzopoulos
7256c8d3c7
Add script for aggregating impact indicators at the project level
2023-04-07 16:30:12 +03:00
dimitrispie
c85de8fa1f
-Added Technological University Dublin
...
-Added project_organization_contribution table
-Add Delft University of Technology
2023-04-07 09:22:59 +03:00
dimitrispie
9b41dff33c
Update step20-createMonitorDB.sql
...
Added Delft University of Technology
2023-04-07 09:21:38 +03:00
Miriam Baglioni
932d07d2dd
[bulkTag] added filtering for datasources in eosctag
2023-04-06 15:08:27 +02:00
Miriam Baglioni
287753417d
better implementation for the fix
2023-04-06 12:22:38 +02:00
Miriam Baglioni
b42abc9904
fixed issue on bulktagging for the advanced constraints
2023-04-06 12:15:00 +02:00
dimitrispie
91e18ac7f4
Added project_organization_contribution table
2023-04-06 10:53:11 +03:00
Miriam Baglioni
b25b401065
added test to verify the advconstraints to dth community. inserted some additional logs.
2023-04-05 12:18:39 +02:00
Claudio Atzori
864f4051d3
[graph cleaning] added missing case
2023-04-05 11:35:47 +02:00
Michele De Bonis
297eb207a5
minor change in the author match which now can compute count and percentage
2023-04-04 17:10:37 +02:00
Claudio Atzori
dead87917f
[graph cleaning] cleanup
2023-04-04 13:13:43 +02:00
Claudio Atzori
2a6ba29b64
[graph cleaning] unit tests & cleanup
2023-04-04 12:34:51 +02:00
dimitrispie
9e1335df4c
-Added Technological University Dublin
...
-Added project_organization_contribution table
2023-04-04 13:22:40 +03:00
Claudio Atzori
63b8bbc015
[graph to Solr] using dedicated sparkExecutorCores, sparkExecutorMemory, sparkDriverMemory in convert_to_xml
2023-03-24 13:43:20 +01:00
Claudio Atzori
b502f86523
fixed input path supplied to GetDatasourceFromCountry; adjusted the various spark.sql.shuffle.partitions
2023-03-24 13:09:12 +01:00
Claudio Atzori
c07857fa37
[graph cleaning] unit tests & cleanup
2023-03-23 15:57:47 +01:00
Claudio Atzori
90e61a8aba
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
2023-03-23 15:03:26 +01:00
Claudio Atzori
308e10d102
serialising: 1. measures for all the entity types and 2. result level fulltext
2023-03-23 11:23:22 +01:00
Claudio Atzori
488d9a5eaa
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
2023-03-23 10:41:13 +01:00
dimitrispie
fad7fa4af8
Added Technological University Dublin
2023-03-22 09:44:00 +02:00
Serafeim Chatzopoulos
102aa5ab81
Add dependency to dhp-aggregation
2023-03-21 19:25:29 +02:00
Serafeim Chatzopoulos
f3e5abf63b
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-03-21 18:26:09 +02:00
Serafeim Chatzopoulos
3e8a4cf952
Rearrange resources folder structure
2023-03-21 18:25:55 +02:00
Serafeim Chatzopoulos
f992ecb657
Checkout BIP-Ranker during 'prepare-package' && add it in the oozie-package.tar.gz
2023-03-21 18:03:55 +02:00
Ilias Kanellos
9dc8f0f05f
Add ActionSet step
2023-03-21 16:14:15 +02:00
Claudio Atzori
4f5ba0ed52
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
2023-03-21 14:41:20 +01:00
Ilias Kanellos
b5c252865c
Add filtering based on citation source
2023-03-20 15:38:36 +02:00
Claudio Atzori
6d3d18d8b5
[graph cleaning] WIP: refactoring of the cleaning stages
2023-03-16 17:23:36 +01:00
dimitrispie
43b23a9bf3
Update step20-createMonitorDB.sql
...
Added Technological University Dublin
2023-03-15 09:57:12 +02:00
Serafeim Chatzopoulos
720fd19b39
Add dhp-impact-indicators workflow files
2023-03-14 19:28:27 +02:00
Serafeim Chatzopoulos
c6e39b7f33
Add dhp-impact-indicators
2023-03-14 18:50:54 +02:00
Claudio Atzori
518618f1a9
[graph cleaning] avoid overwriting the subject class to 'keyword' for those with provenance 'subject:fos'
2023-03-14 15:22:47 +01:00
Claudio Atzori
41e00bcd07
[graph provision] avoid parsing the XML records again; apparently the escaped XML characters get unescaped, invalidating the record
2023-03-13 15:19:49 +01:00
Claudio Atzori
46d2df1c90
Merge pull request '[aggregator graph] handle paths including wildcards' ( #281 ) from aggregator_graph into beta
...
Reviewed-on: #281
2023-03-08 21:17:39 +01:00
Claudio Atzori
24e2fd828b
code formatting
2023-03-08 21:17:08 +01:00
Claudio Atzori
e28d395e87
[aggregator graph] using dedicated path to sync claims, adjusted paths with wildcards
2023-03-08 21:16:52 +01:00
Claudio Atzori
5b8fd37314
[aggregator graph] using dedicated path to sync claims
2023-03-08 15:28:14 +01:00
Claudio Atzori
7fd89566c2
[aggregator graph] handle paths including wildcards
2023-03-08 12:43:00 +01:00
Miriam Baglioni
588aca5ce4
Merge pull request 'h2020classification' ( #280 ) from h2020classification into beta
...
Reviewed-on: #280
2023-03-03 09:29:10 +01:00
Claudio Atzori
8ec0d62d91
pre-group the records in each table before joining the contents from BETA and PROD together
2023-03-02 14:49:19 +01:00
Miriam Baglioni
0fff98a14c
[ECclassification] removed print
2023-03-02 11:46:57 +01:00
Miriam Baglioni
b0c2f7e526
[ECclassification] removed not needed resources
2023-03-02 11:44:48 +01:00
Miriam Baglioni
d4fc62c2f6
merging with branch beta
2023-03-02 11:14:54 +01:00
Miriam Baglioni
de8ad1caef
[ECclassification] new implementation for the H2020 classification
2023-03-02 11:14:03 +01:00
Claudio Atzori
db9dad4aa7
[actionmanager] increased spark.sql.shuffle.partitions for publication, dataset, relation records
2023-03-02 09:11:37 +01:00
Miriam Baglioni
c1f9848953
[ECclassification] added new classes
2023-03-01 15:29:11 +01:00
Claudio Atzori
6f488547a7
ignore non processable records
2023-03-01 14:49:51 +01:00
Claudio Atzori
7d263f265e
adjusted logs
2023-03-01 11:58:07 +01:00
Claudio Atzori
16ad42e8f3
code formatting
2023-03-01 10:22:13 +01:00
Claudio Atzori
9c59dac859
followup changes reorganising the mdstore synchronisation mechanism
2023-03-01 10:16:20 +01:00
Miriam Baglioni
49737f1087
Merge pull request '[CrossrefFunderMapping] fixed issues on funder name' ( #279 ) from doiboostFunderExtention into beta
...
Reviewed-on: #279
2023-02-28 15:08:07 +01:00
Miriam Baglioni
ad745c0aa3
[CrossrefFunderMapping] fixed issues on funder name
2023-02-28 14:58:27 +01:00
Miriam Baglioni
4f2df876cd
[ECclassification] new implementation first try
2023-02-28 14:44:00 +01:00
Claudio Atzori
bc986f66ec
Merge pull request 'monodirectional citations' ( #278 ) from citations_monodirectional into beta
...
Reviewed-on: #278
2023-02-28 13:33:52 +01:00
Claudio Atzori
2f7346e9cf
WIP monodirectional citations, Datacite
2023-02-28 13:30:51 +01:00
Claudio Atzori
0559d8b412
WIP monodirectional citations
2023-02-28 10:57:32 +01:00
Sandro La Bruzzo
69fa616490
removed wrong content
2023-02-28 10:27:38 +01:00
Sandro La Bruzzo
832a75d012
added mapping for crossref funder
2023-02-28 10:16:34 +01:00
Sandro La Bruzzo
78e51c182a
Added missing parameter to raw all workflow
2023-02-28 10:16:01 +01:00
Claudio Atzori
7aebedb43c
code formatting
2023-02-27 11:51:27 +01:00
Miriam Baglioni
80987801d7
[FoS] added check for null on level1 subject
2023-02-27 11:40:22 +01:00
Claudio Atzori
31e97c2a6b
[unresolved entities] updated oozie wf node labels
2023-02-27 11:38:29 +01:00
Miriam Baglioni
23112929e9
[FoS] changed the default separator from comma to tab to solve the issue in subject value split
2023-02-27 10:18:39 +01:00
Claudio Atzori
c4856b4eaa
Merge pull request 'Remove unnecessary indexed fields from Solr' ( #277 ) from 8099_lighten_solr_index into beta
...
Reviewed-on: #277
2023-02-23 11:50:29 +01:00
Serafeim Chatzopoulos
0b5bf53b45
Remove unnecessary indexed fields from Solr
2023-02-23 12:42:42 +02:00
dimitrispie
1547611246
Merge branch 'beta' into hive
2023-02-22 16:57:12 +02:00
Claudio Atzori
9e4ec0023c
Merge pull request 'updated the order of the compatibilities (BETA)' ( #276 ) from compatibility_order_beta into beta
...
Reviewed-on: #276
2023-02-22 14:47:32 +01:00
Michele Artini
fddcf701e9
updated the order of the compatibilities
2023-02-22 12:07:09 +01:00
Claudio Atzori
0c1be41b30
code formatting
2023-02-22 10:15:25 +01:00
Claudio Atzori
3b876d9327
depending on dhp-schemas v. 3.16.0
2023-02-22 10:15:10 +01:00
Claudio Atzori
99cd7761aa
cleanup of non necessary dhp-monitor-update workflow
2023-02-22 10:10:22 +01:00
Claudio Atzori
a590c371a9
Merge pull request '8232-mdstore-synch-improve' ( #272 ) from 8232-mdstore-synch-improve into beta
...
Reviewed-on: #272
2023-02-22 10:02:26 +01:00
Claudio Atzori
cd3a51a15f
Merge branch 'beta' into 8232-mdstore-synch-improve
2023-02-22 09:57:07 +01:00
Claudio Atzori
42b6b5d5ce
Merge pull request 'UsageCountOnProjectAndDatasource' ( #271 ) from UsageCountOnProjectAndDatasource into beta
...
Reviewed-on: #271
2023-02-22 09:56:08 +01:00
Claudio Atzori
477a7c416f
Merge branch 'beta' into UsageCountOnProjectAndDatasource
2023-02-22 09:55:51 +01:00
Claudio Atzori
c20c1c9159
Merge pull request 'Added 4 institutions:' ( #261 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #261
2023-02-22 09:53:45 +01:00
Miriam Baglioni
d617c3e812
[DOIBoost] extended mapping for funder #8407
2023-02-20 14:45:27 +01:00
dimitrispie
90807b60c7
Changes to monitor wf
2023-02-20 10:42:24 +02:00
dimitrispie
d2f9ccf934
Changes to separate monitor wf
2023-02-20 10:41:21 +02:00
dimitrispie
032a401cbf
Bug fixes
2023-02-20 09:29:20 +02:00
Miriam Baglioni
016337a0f9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-02-16 15:54:59 +01:00
Sandro La Bruzzo
118c1fc3b3
Merge remote-tracking branch 'origin/beta' into beta
2023-02-15 10:29:28 +01:00
Sandro La Bruzzo
a8ac79fa25
Added citation relation on crossref Mapping
2023-02-15 10:29:13 +01:00
dimitrispie
595192d510
Bug fix
2023-02-14 16:24:08 +02:00
dimitrispie
f3aaff3688
Remove duplicate orgs
2023-02-14 09:48:36 +02:00
Claudio Atzori
9a03f71db1
code formatting
2023-02-13 16:25:47 +01:00
Michele Artini
554df257ab
null values in date range conditions
2023-02-13 16:15:32 +01:00
dimitrispie
3400133c2f
Bug fix
2023-02-13 09:44:00 +02:00
dimitrispie
935db0ab25
Added organizations for Monitor
2023-02-13 09:29:09 +02:00
dimitrispie
7b78b15c81
Changes for copying to Impala Cluster
2023-02-13 09:27:00 +02:00
Miriam Baglioni
5cf902a2b0
[UsageCount] changed query to make the sum be computed via sql instead of grouping
2023-02-10 16:16:37 +01:00
Miriam Baglioni
f803530df6
[UsageCount] fixed query
2023-02-10 15:50:56 +01:00
Miriam Baglioni
bb5bba51b3
[UsageCount] extended test
2023-02-09 19:08:30 +01:00
Miriam Baglioni
85e53fad00
[UsageCount] addition of usagecount for Projects and datasources. Extension of the action set created for the results with new entities for projects and datasources. Extension of the resource set and modification of the testing class
2023-02-09 18:59:45 +01:00
dimitrispie
d71f5672d3
Add monitor post step
2023-02-09 13:44:14 +02:00
dimitrispie
35ba8bb328
Bug fixes
2023-02-09 12:57:57 +02:00
Sandro La Bruzzo
8920932dd8
Code formatted
2023-02-08 11:34:18 +01:00
Sandro La Bruzzo
0b9819f1ab
Code formatted
2023-02-08 10:32:33 +01:00
Sandro La Bruzzo
6c81a161d2
Merge remote-tracking branch 'origin/beta' into 8231-mdstore-synch-improve
2023-02-08 10:29:09 +01:00
dimitrispie
3ba11d64a1
Changes 07022023
2023-02-07 12:53:51 +02:00
dimitrispie
98c34263ed
Update step20-createMonitorDB.sql
...
Add University of Cape Town organization
2023-02-07 08:14:48 +02:00
dimitrispie
2dc6d47270
Changes 06022023
2023-02-06 13:18:53 +02:00
dimitrispie
973d78a4d6
Update step15_5.sql
...
Added unpaywalls open access colors
2023-02-02 08:03:54 +02:00
Claudio Atzori
d05ca53a14
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-01-31 14:39:53 +01:00
Michele De Bonis
6a6c266dde
implementation of author dedup configuration and lnfi clustering function
2023-01-31 11:53:10 +01:00
Miriam Baglioni
e82e009b46
added missing close tag for XML produced by the xquery to get information for the community from the IS
2023-01-31 10:19:34 +01:00
Miriam Baglioni
b254a0375f
[Affiliation from institutionalrepo] changed the field to check to verify the datasource type. Now it is in the field jurisdiction
2023-01-26 16:51:20 +01:00
dimitrispie
cf58e4a5e4
Added Arts et Métiers ParisTech
2023-01-25 16:03:16 +02:00
dimitrispie
db7d625ba9
Added Arts et Métiers ParisTech organization
2023-01-25 12:22:21 +02:00
Claudio Atzori
505867bce9
[bulk tagging] better node naming
2023-01-20 16:13:16 +01:00
Miriam Baglioni
ecd398fe51
refactoring
2023-01-20 14:23:45 +01:00
Miriam Baglioni
0a5c6010b0
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-01-13 16:14:46 +01:00
dimitrispie
4d7553c9f1
Bug fixes
2023-01-12 17:19:19 +02:00
dimitrispie
dd70c32ad7
Bug fixes
2023-01-12 17:18:05 +02:00
dimitrispie
51f7ab5864
Bug fixes
2023-01-12 17:15:06 +02:00
dimitrispie
34d4bf727c
Bug fixes
2023-01-12 11:28:37 +02:00
dimitrispie
43f6d4f296
-Monitor DB workflow
2023-01-12 11:26:47 +02:00
dimitrispie
686580a220
- New Monitor DB workflow
...
- New Organization added
2023-01-12 11:18:03 +02:00
Claudio Atzori
0a58bc7ba7
[broker] prevent NPEs
2023-01-11 14:44:14 +01:00
Claudio Atzori
04cb96001c
[broker] d40e20f437 adapted to the beta graph model
2023-01-11 10:10:12 +01:00
Michele Artini
91b845f611
Considering instance pids and alternative identifiers
2023-01-11 09:58:54 +01:00
Miriam Baglioni
1f367122e4
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-01-11 09:47:44 +01:00
Michele Artini
7b7520850b
fixed an invalid char
2023-01-11 09:22:18 +01:00
Miriam Baglioni
d6895f0387
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-01-09 17:28:38 +01:00
dimitrispie
becb242c17
Monitor DB only Workflow
2023-01-04 16:50:29 +02:00
dimitrispie
dcb958e146
Changes to execute the stats wf only in hive
2023-01-04 11:39:01 +02:00
Claudio Atzori
18a7aa2d78
Merge pull request 'Workaround to use new version of intellij on Beta' ( #267 ) from beta_intellij into beta
...
Reviewed-on: #267
2022-12-23 10:32:01 +01:00
dimitrispie
592013d5dd
Added more steps in decision node
2022-12-23 09:43:16 +02:00
dimitrispie
2a4bf32d4c
Merge branch 'hive' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into hive
...
# Conflicts:
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step10.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step13.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step14.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step16_1-definitions.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step7.sql
2022-12-22 10:22:46 +02:00
dimitrispie
6449ff4207
1. Added a decision node that enables the workflow to select the execution path to follow
...
2. Added new organization
3. Added 5 new tables from Eurostat
2022-12-22 10:18:21 +02:00
Miriam Baglioni
8893389895
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-12-21 12:42:27 +01:00
Antonis Lempesis
c8309fe18e
added command line params to allow hive actions to run
2022-12-21 12:41:33 +02:00
Antonis Lempesis
028873cc51
added new hive opts
2022-12-21 12:41:33 +02:00
Antonis Lempesis
1ddea4f442
removed 'stored as parquet' from views..
2022-12-21 12:41:33 +02:00
Antonis Lempesis
2754c3dd62
moving data to impala cluster and creating shadow databases there
2022-12-21 12:41:29 +02:00
Antonis Lempesis
778a1a724f
finished migration to hive only
2022-12-21 12:41:25 +02:00
Antonis Lempesis
e84dd5fe26
first
2022-12-21 12:41:23 +02:00
Sandro La Bruzzo
3c9826f186
updated the lines function to its implementation, linesWithSeparators.map(l => l.stripLineEnd); this forces the Scala compiler plugin to treat the pipeline as Scala code rather than as a Java String.lines() pipeline
2022-12-21 11:21:17 +01:00
Claudio Atzori
6aa91204a5
[orcid propagation] skip empty directories
2022-12-20 14:15:46 +01:00
Claudio Atzori
9cf0a98699
[cleaning] set the common subject classid/name
2022-12-20 10:17:33 +01:00
Miriam Baglioni
6674cccb94
[BulkTag] description of parameters more comprehensive for those who do not implement it
2022-12-16 15:33:20 +01:00
Miriam Baglioni
f37113a941
[BulkTag] moving xquery to get community configuration in dedicated file
2022-12-16 15:32:26 +01:00
Miriam Baglioni
8685eaa706
[Clean Country] added test to verify removal of country
2022-12-16 15:31:25 +01:00
Miriam Baglioni
dc0ec88a58
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-12-16 13:18:32 +01:00
Miriam Baglioni
d791840b82
[Clean Country] added test to verify removal of country
2022-12-16 13:18:29 +01:00
Claudio Atzori
7b80b24f82
[cleaning] country cleaning must use both PID and AlternateIdentifier fields
2022-12-15 14:49:04 +01:00
Claudio Atzori
b8bafab8a0
[cleaning] improved vocabulary based mapping, specialization for the strict vocab cleaning
2022-12-12 14:43:03 +01:00
Sandro La Bruzzo
5e4866d033
implemented synch for single mdstore
2022-12-12 11:29:46 +01:00
Claudio Atzori
c18b8048c3
[cleaning] avoid NPE
2022-12-10 11:41:38 +01:00
Claudio Atzori
8b44afe5e5
[cleaning] avoid NPE
2022-12-09 15:44:57 +01:00
Claudio Atzori
389dd25430
[cleaning] avoid NPE
2022-12-08 18:40:48 +01:00
Claudio Atzori
730228d73d
[cleaning] align wf parameter names in test
2022-12-08 18:40:22 +01:00
Claudio Atzori
2094fa6db0
[cleaning] align wf parameter names
2022-12-08 17:22:26 +01:00
Miriam Baglioni
a485a94956
[Cleaning] fixed parameter name in property file
2022-12-08 16:59:34 +01:00
Miriam Baglioni
3d99b78d94
[Cleaning] fixed error in parameter (workingPath to workingDir)
2022-12-08 10:25:02 +01:00
Claudio Atzori
1b8488976b
code formatting
2022-12-07 10:45:38 +01:00
Claudio Atzori
cd1b58483e
[bulk tag] fixed Community configuration parsing to avoid NPE
2022-12-07 10:39:00 +01:00
Claudio Atzori
062abfd669
fixed NPE, removed unused stuff
2022-12-06 12:04:00 +01:00
Claudio Atzori
71b121e9f8
Merge pull request '[graph cleaning] update collectedfrom & hostedby references as consequence of the datasource deduplication' ( #260 ) from graph_cleaning into beta
...
Reviewed-on: #260
2022-12-02 14:49:15 +01:00
Claudio Atzori
8248da40d9
Merge branch 'beta' into graph_cleaning
2022-12-02 14:49:00 +01:00
Claudio Atzori
ddf065756f
Merge pull request 'Two organizations are added for monitor' ( #258 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #258
2022-12-02 14:45:27 +01:00
Claudio Atzori
41f7f1bbc5
Merge pull request '[graph dedup] records stability and testing' ( #44 ) from deduptesting into beta
...
Reviewed-on: #44
2022-12-02 14:43:05 +01:00
Sandro La Bruzzo
5a48a2fb18
implemented synch for single mdstore
2022-12-01 11:34:43 +01:00
Claudio Atzori
a38116546d
Merge branch 'beta' into deduptesting
2022-11-30 11:27:29 +01:00
Miriam Baglioni
ce020f2c83
[EOSC FUTURE] added resources and test for review
2022-11-30 09:57:30 +01:00
Miriam Baglioni
bb0ddc1c44
[BulkTag] adding verb starts_with
2022-11-30 09:56:24 +01:00
Claudio Atzori
8e3edba318
[graph cleaning] testing the collectedfrom and hostedby patch procedure
2022-11-29 16:07:09 +01:00
Claudio Atzori
58c05731f9
[graph cleaning] WIP: testing the collectedfrom and hostedby patch procedure
2022-11-29 11:21:51 +01:00
Miriam Baglioni
7d264a1d69
Merge pull request 'horizontalConstraints' ( #259 ) from horizontalConstraints into beta
...
Reviewed-on: #259
2022-11-28 18:20:17 +01:00
Miriam Baglioni
9c70c5dbd6
[Bulk Tag horizontal] added new path in definition of constraint (to recognize fos subjects) - changed test and resource class to test this new aspect
2022-11-28 14:51:20 +01:00
Miriam Baglioni
0628df7a3a
resolving conflicts
2022-11-28 10:44:56 +01:00
Claudio Atzori
11695ba649
[graph cleaning] patch also the result's collectedfrom and hostedby datasource name according to the datasource master-duplicate mapping
2022-11-28 10:18:43 +01:00
Claudio Atzori
6082d235d3
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into graph_cleaning
2022-11-28 09:54:48 +01:00
Claudio Atzori
24ef301cc1
[graph cleaning] patch the result's collectedfrom and hostedby identifiers according to the datasource master-duplicate mapping
2022-11-28 09:54:18 +01:00
Alessia Bardi
90c8f9cb61
tests for EOSC Future
2022-11-23 12:18:44 +01:00
Miriam Baglioni
0e3edc5018
[Bulk Tag] fixed issue in verb name
2022-11-23 11:26:36 +01:00
Claudio Atzori
a79c47522d
updated ORCID datasource identifier
2022-11-23 10:17:49 +01:00
Alessia Bardi
2832117f23
added eoscifguidelines in test
2022-11-22 18:01:12 +01:00
Michele De Bonis
14f6346676
implementation of the new software configuration
2022-11-22 17:48:34 +01:00
Alessia Bardi
3c08269a4d
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-11-22 17:31:00 +01:00
Alessia Bardi
2687fc9f73
tests for EOSC Future review - ROhub
2022-11-22 17:30:56 +01:00
Claudio Atzori
1d5143b0b6
Merge branch 'beta' into deduptesting
2022-11-22 10:21:30 +01:00
Michele De Bonis
9fee2ed611
minor changes
2022-11-21 14:35:46 +01:00
Claudio Atzori
0aa725083f
extended dedup testing
2022-11-17 16:13:43 +01:00
Claudio Atzori
3dbc637d3e
code formatting
2022-11-17 09:55:41 +01:00
Claudio Atzori
24f99d7310
Merge pull request 'Map oaf:eoscifguidelines from mdstore to the graph' ( #256 ) from eoscifguidelines-from-mdstores into beta
...
Reviewed-on: #256
2022-11-14 15:40:34 +01:00
Claudio Atzori
ddff0e8999
merging duplicates using IdentifierComparator
2022-11-11 16:10:25 +01:00
Claudio Atzori
5af5a8ae42
added IdentifierComparator
2022-11-09 14:20:59 +01:00
Claudio Atzori
0419953470
merge from beta
2022-11-07 12:22:35 +01:00
Claudio Atzori
7c3390ac10
Merge branch 'beta' into eoscifguidelines-from-mdstores
2022-11-07 12:18:40 +01:00
Claudio Atzori
22873c9172
Merge pull request 'Added fields: totalcost, fundedamount, currency, in project table' ( #257 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #257
2022-10-31 13:49:27 +01:00
Sandro La Bruzzo
2b9a20a4a3
Changed the way Scholexplorer filters the relationships: filtering out all relations coming from OpenCitation is wrong, because we lose many relations that intersect OpenCitation but do not come only from there
2022-10-24 12:53:47 +02:00
Alessia Bardi
208ed32315
fixed xpath for semantic relation
2022-10-23 18:18:13 +02:00
Alessia Bardi
ee759ac92d
file format after mvn compile
2022-10-23 18:09:47 +02:00
Alessia Bardi
31a10f000b
Map the field oaf:eoscifguidelines from mdstores. Currently we can find it in ROHub metadata
2022-10-23 18:05:37 +02:00
Claudio Atzori
ec39b84898
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-10-19 15:21:02 +02:00
Claudio Atzori
bca4a61710
suppressing hyper verbose spark logs during unit test execution
2022-10-19 15:20:58 +02:00
Sandro La Bruzzo
72f0d88d6c
formatted code
2022-10-19 14:18:42 +02:00
Claudio Atzori
9b449110c6
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-10-14 15:48:04 +02:00
Claudio Atzori
ae7cd0735a
[graph2hive] more partitions
2022-10-14 15:47:58 +02:00
Sandro La Bruzzo
135cf81151
Merge remote-tracking branch 'origin/beta' into beta
2022-10-13 11:47:25 +02:00
Sandro La Bruzzo
a1f94530a3
added documentation
2022-10-13 11:47:11 +02:00
Claudio Atzori
b47aaf4dd1
[cleaning] subjects declared as belonging to specific vocabularies whose values are not found in the vocab are set to type keyword
2022-10-13 11:23:43 +02:00
Claudio Atzori
6163ecbf63
[cleaning] renamed parameters in wf action
2022-10-11 11:20:03 +02:00
Claudio Atzori
b301e9fdff
[cleaning] renamed action name/description
2022-10-11 11:08:52 +02:00
Claudio Atzori
ece40adc09
[cleaning] fixing NPE in the country cleaning phase
2022-10-11 10:10:20 +02:00
Claudio Atzori
d51275a965
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-10-07 09:52:49 +02:00
Claudio Atzori
8d97949316
[cleaning] fixed loop in wf nodes
2022-10-07 09:52:45 +02:00
Miriam Baglioni
a653e1b3ea
[Enrichment - result to community through organization] reimplementation of the data preparation step using spark
2022-10-04 15:01:28 +02:00
Miriam Baglioni
4d8339614b
Revert "[BipFinder] Fixed issue for wrong escaped char in doi"
...
This reverts commit 188f25eefa.
2022-10-04 14:29:47 +02:00
Miriam Baglioni
7324853a17
Revert "[BipFinder] refactoring"
...
This reverts commit 28dc317350.
2022-10-04 14:29:39 +02:00
Miriam Baglioni
28dc317350
[BipFinder] refactoring
2022-10-04 09:47:27 +02:00
Miriam Baglioni
188f25eefa
[BipFinder] Fixed issue for wrong escaped char in doi
2022-10-03 12:42:52 +02:00
Claudio Atzori
89f7007080
Merge pull request '[stats wf] misc changes' ( #254 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #254
2022-10-03 10:32:05 +02:00
Alessia Bardi
49360770d7
map w3id as instance url
2022-09-28 14:16:39 +02:00
Miriam Baglioni
b5b5a4c192
[CleanCountry] fixed issue
2022-09-28 12:42:51 +02:00
Miriam Baglioni
f1d7d45cf7
[BulkTag] fixed issue
2022-09-28 12:01:43 +02:00
Miriam Baglioni
3ec044600d
[BulkTag] fixed conflicts
2022-09-28 11:58:28 +02:00
Miriam Baglioni
1cb79719a7
[BulkTag] fixed issues
2022-09-28 11:44:55 +02:00
Claudio Atzori
f3f7604e6c
trying to fix a test that fails only on Jenkins
2022-09-27 15:21:37 +02:00
Claudio Atzori
de7bc9350e
Merge pull request 'relation-from-odf' ( #251 ) from relation-from-odf into beta
...
Reviewed-on: #251
2022-09-27 15:08:26 +02:00
Claudio Atzori
3f90d159e3
code formatting
2022-09-27 15:08:00 +02:00
Claudio Atzori
0b3e44e521
Merge branch 'beta' into relation-from-odf
2022-09-27 14:57:01 +02:00
Claudio Atzori
b4b6a4457c
Merge pull request 'BulkTagging extension' ( #250 ) from horizontalConstraints into beta
...
Reviewed-on: #250
2022-09-27 14:56:31 +02:00
Claudio Atzori
57dbeb08d2
code formatting
2022-09-27 14:55:10 +02:00
Claudio Atzori
b60985cf68
Merge branch 'beta' into horizontalConstraints
2022-09-27 14:39:31 +02:00
Claudio Atzori
3b60642ef9
Merge pull request 'Synchronize indicators in stats-db with monitor-db' ( #249 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #249
2022-09-27 14:37:33 +02:00
Claudio Atzori
6ad38ade74
Merge pull request 'Clean Country' ( #241 ) from clean_country into beta
...
Reviewed-on: #241
2022-09-27 14:35:35 +02:00
Claudio Atzori
25e9d92aad
Merge branch 'beta' into clean_country
2022-09-27 14:27:49 +02:00
Alessia Bardi
fd63e9bfac
Mapping all relationships supported in ModelConstants and ModelSupport
2022-09-26 11:24:13 +02:00
Miriam Baglioni
ca216a92ad
[BulkTagging] changed the query to the IS to insert values for FOS and SDG as subject in the configuration used for the tagging
2022-09-23 17:06:07 +02:00
Miriam Baglioni
3e6b0f58bb
[BulkTagging] changed the query to the IS to get also the information for the advancedConstraint from the profile
2022-09-23 16:47:19 +02:00
Miriam Baglioni
4a3e119b73
merging with branch beta
2022-09-23 16:16:06 +02:00
Miriam Baglioni
f0e303abf9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-09-23 16:15:32 +02:00
Miriam Baglioni
55da4d8715
[BulkTagging] modifying code to represent constraints horizontally on all the results. Added subject to the set of fields used to express the constraint. Modified resources to test the new approach. Modified test class
2022-09-23 16:02:19 +02:00
Alessia Bardi
c5eb722170
relationships from relatedIdentifier whose target id type is one of the pid type with an authority
2022-09-23 15:47:05 +02:00
Claudio Atzori
c86cc53520
suppressing hyper verbose spark logs during unit test execution
2022-09-23 15:20:40 +02:00
Alessia Bardi
ba33ff71fd
refactoring for the generation of relationships from related identifier of type 'OPENAIRE'
2022-09-23 15:17:13 +02:00
Alessia Bardi
982bcc1e35
test wrid pid and record identifier
2022-09-23 12:06:06 +02:00
Miriam Baglioni
960cb861a0
refactoring
2022-09-23 11:14:04 +02:00
Claudio Atzori
c42850328e
fixed semantic (subreltype) for ServiceOrganization relations
2022-09-22 16:23:25 +02:00
Miriam Baglioni
33bb79459e
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-09-22 15:55:17 +02:00
Claudio Atzori
e45ec15221
Merge branch 'beta' into clean_country
2022-09-19 11:34:02 +02:00
Claudio Atzori
26e1badded
added instance.url syntactical validation, avoid creating multiple duplicated URLs
2022-09-19 11:19:10 +02:00
Miriam Baglioni
5240ac3d7b
[EOSC Tag] remove addition of eosc context for result with eosc if guidelines set
2022-09-19 11:02:18 +02:00
Claudio Atzori
192215a18e
merged from branch discard-non-wellformed
2022-09-19 10:17:10 +02:00
Claudio Atzori
e370e940d8
[aggregator graph] save invalid records aside for further inspection
2022-09-16 14:06:28 +02:00
Claudio Atzori
465e941214
Merge pull request '[stats wf] Changes to indicators tables' ( #244 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #244
2022-09-16 10:13:58 +02:00
Claudio Atzori
1e42d984e1
[aggregator graph] save invalid records aside for further inspection
2022-09-15 10:49:42 +02:00
Alessia Bardi
9e7ec4198f
fixed test
2022-09-14 18:08:56 +02:00
Claudio Atzori
c48f6e9c57
[aggregator graph] save invalid records aside for further inspection
2022-09-14 17:11:26 +02:00
Claudio Atzori
a0919ed495
[aggregator graph] save invalid records aside for further inspection
2022-09-14 13:27:39 +02:00
Alessia Bardi
b99a011345
return empty Oaf list if record cannot be parsed
2022-09-13 11:51:55 +02:00
Alessia Bardi
27af5122d2
logs for non well formed XML files
2022-09-12 14:25:23 +02:00
Claudio Atzori
5066db3386
Merge pull request 'subjects cleaning' ( #239 ) from clean_subjects into beta
...
Reviewed-on: #239
2022-09-09 15:17:02 +02:00
Claudio Atzori
ff6f789b6d
code formatting
2022-09-09 15:16:31 +02:00
Claudio Atzori
b5d6966c01
Merge branch 'beta' into clean_country
2022-09-09 12:20:19 +02:00
Claudio Atzori
b5f7bd30be
Merge branch 'beta' into clean_subjects
2022-09-09 12:20:04 +02:00
Claudio Atzori
690be4482f
Merge pull request '#7861#note-8 instance url from handle' ( #243 ) from handle_as_instance_urls into beta
...
Reviewed-on: #243
2022-09-09 12:19:17 +02:00
Alessia Bardi
f14107ad77
Merge branch 'handle_as_instance_urls' of https://code-repo.d4science.org/D-Net/dnet-hadoop into handle_as_instance_urls
2022-09-09 12:17:19 +02:00
Alessia Bardi
a539c6ccaf
https for handle URLs
2022-09-09 12:16:28 +02:00
Claudio Atzori
1203378441
Merge branch 'beta' into clean_subjects
2022-09-09 10:38:47 +02:00
Claudio Atzori
14dc909a14
Merge branch 'beta' into clean_country
2022-09-09 10:38:17 +02:00
Claudio Atzori
853c996fa2
Merge branch 'beta' into handle_as_instance_urls
2022-09-09 09:47:16 +02:00
Claudio Atzori
a431e01383
Merge pull request 'orcid_multipleworks_download' ( #242 ) from enrico.ottonello/dnet-hadoop:orcid_multipleworks_download into beta
...
Reviewed-on: #242
2022-09-09 08:45:02 +02:00
Alessia Bardi
9ef063d502
#7861#note-8 instance url from handle
2022-09-07 17:29:54 +03:00
Alessia Bardi
5c45d52af3
testing for RiuNet
2022-09-07 15:40:57 +03:00
Alessia Bardi
a11eb38065
testing for RO-Hub
2022-09-02 16:07:36 +02:00
Enrico Ottonello
bfdf2dc390
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid_multipleworks_download
2022-08-25 12:07:54 +02:00
Enrico Ottonello
da1cf561e6
alignment with beta
2022-08-25 11:57:20 +02:00
Enrico Ottonello
27445ccdaa
cleaned log
2022-08-25 11:56:14 +02:00
Claudio Atzori
b7c387c21f
cleaning of subjects: avoid duplicated subjects, prioritise collected vs inferred or other sources
2022-08-12 15:09:16 +02:00
Claudio Atzori
adb526b0e1
Merge branch 'beta' into clean_subjects
2022-08-12 10:51:17 +02:00
Claudio Atzori
cb7c07c54e
[scholix] added step to create tar archive
2022-08-11 11:25:24 +02:00
Claudio Atzori
2aa16d0432
[scholix] fixed OpenCitation dump procedure
2022-08-10 17:39:29 +02:00
Miriam Baglioni
7dbdd4a0fe
[Clean Country] changes related to #241 (comment)
2022-08-10 15:13:10 +02:00
Claudio Atzori
51ad93e545
[scholix] fixed OpenCitation dump procedure
2022-08-10 11:57:56 +02:00
Miriam Baglioni
62d2138806
[Clean Context] changed a bit the logic. Added the check not to have result hosted by a datasource of type institutional repository from NL. Added also the check that the country should have been included in the result via propagation for it to be removed
2022-08-08 14:10:47 +02:00
Claudio Atzori
3418ce50ac
cleaning of subjects: perform the cleaning when the given value is equivalent to one of the terms in the vocabulary
2022-08-08 12:48:47 +02:00
Claudio Atzori
a78028dabc
Merge branch 'beta' into clean_subjects
2022-08-08 12:34:33 +02:00
Miriam Baglioni
390013a4b2
merging with branch beta
2022-08-08 12:30:31 +02:00
Claudio Atzori
d85ba3c1a9
Merge pull request 'serialising field eoscifguidelines field in the Solr XML records' ( #234 ) from tagEosc into beta
...
Reviewed-on: #234
2022-08-08 10:28:41 +02:00
Claudio Atzori
3937ff04de
Merge branch 'beta' into tagEosc
2022-08-08 09:57:23 +02:00
Claudio Atzori
a4815f6bec
Merge branch 'beta' into clean_subjects
2022-08-05 16:57:03 +02:00
Claudio Atzori
29c4cde42e
Merge branch 'clean_subjects' of https://code-repo.d4science.org/D-Net/dnet-hadoop into clean_subjects
2022-08-05 16:56:37 +02:00
Claudio Atzori
4eaa063b1f
cleaning of subjects
2022-08-05 16:56:09 +02:00
Claudio Atzori
84598c7535
Merge pull request 'restored some collab indicators' ( #240 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #240
2022-08-05 15:50:39 +02:00
Claudio Atzori
844f6eb465
Merge branch 'beta' into clean_subjects
2022-08-05 12:39:05 +02:00
Claudio Atzori
32cee1f619
WIP: cleaning of subjects
2022-08-05 12:32:08 +02:00
Claudio Atzori
c1f2ffc53d
Merge pull request 'commenting out the collab indicators because they still fail' ( #237 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #237
2022-08-05 11:57:36 +02:00
Claudio Atzori
6c0fd9284b
merge from beta
2022-08-05 10:42:53 +02:00
Claudio Atzori
b78889a0ce
WIP: cleaning of subjects
2022-08-05 09:11:37 +02:00
Claudio Atzori
08ce2cadc2
Merge pull request '[Graph Dump] Remove code from dnet-hadoop' ( #235 ) from removeDump into beta
...
Reviewed-on: #235
2022-08-05 09:09:50 +02:00
Miriam Baglioni
a7a18d7630
[Graph Dump] removed code for the dump from the project. Fixed issues in tests when possible
2022-08-04 17:40:40 +02:00
Claudio Atzori
499826ead1
serialising field eoscifguidelines field in the Solr XML records
2022-08-04 12:40:48 +02:00
Claudio Atzori
27a91841e7
WIP: cleaning of subjects
2022-08-04 11:39:39 +02:00
Antonis Lempesis
b09d7ddc74
fixed the datasourceOrganization relations
2022-08-03 12:26:50 +02:00
Claudio Atzori
e62018e95d
[aggregator graph] added more assertions in test
2022-08-03 12:26:05 +02:00
Claudio Atzori
efd96e7e66
Merge pull request 'fixed the datasourceOrganization relations' ( #233 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #233
2022-08-03 12:25:05 +02:00
Claudio Atzori
eb53b52f7c
code formatting
2022-08-02 13:24:47 +02:00
Claudio Atzori
27681cf6bf
Merge pull request '[stats wf] latest version of indicators + added FOS classification' ( #232 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #232
2022-08-02 12:57:15 +02:00
Claudio Atzori
209c7e9dab
[datacite] avoid UnsupportedOperationException
2022-08-01 09:05:35 +02:00
Enrico Ottonello
64311b8be4
removed useless accumulator
2022-07-31 01:03:29 +02:00
Antonis Lempesis
6fc9ef53f6
added command line params to allow hive actions to run
2022-07-29 16:36:20 +03:00
Claudio Atzori
92e48f12f7
[metadata collection] updated collector plugin name
2022-07-29 13:54:00 +02:00
Claudio Atzori
f62c4e05cd
code formatting
2022-07-29 11:56:01 +02:00
Claudio Atzori
0727f0ef48
[EOSC tag] avoid NPEs
2022-07-29 11:55:34 +02:00
Miriam Baglioni
3329b6ce6b
[EOSC TAG] added fix for NPE on subjects
2022-07-29 10:54:20 +02:00
Claudio Atzori
37cfda0fc5
Merge pull request 'participant project contribution' ( #223 ) from project_organization_contribution into beta
...
Reviewed-on: #223
2022-07-28 12:16:30 +02:00
Claudio Atzori
1dd1e4fe3a
extended test for mapping project_organization relations
2022-07-28 11:27:08 +02:00
Claudio Atzori
60e4fbd78b
Merge branch 'beta' into project_organization_contribution
2022-07-28 10:15:43 +02:00
Claudio Atzori
ed98a6d9d0
[Datacite mapping] include the older datacite prefixed OpenAIRE id among the originalId[]
2022-07-28 10:15:14 +02:00
Claudio Atzori
09ccc7b472
Merge branch 'beta' into project_organization_contribution
2022-07-28 09:49:59 +02:00
Sandro La Bruzzo
67525076ec
fixed test, now it compiles after commit a6977197b3
2022-07-26 15:35:17 +02:00
Claudio Atzori
26104826c4
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-07-26 14:34:29 +02:00
Claudio Atzori
c03e20be39
Merge pull request 'EOSC Context Tagging' ( #231 ) from eosc_context_tagging into beta
...
Reviewed-on: #231
2022-07-26 09:20:53 +02:00
Claudio Atzori
d43663d30f
adapted RorActionSet test, it should not create parent/child rels
2022-07-25 17:54:10 +02:00
Miriam Baglioni
35bcd9422d
[EOSC Context Tagging] removed not needed specification in path
2022-07-25 15:45:22 +02:00
Miriam Baglioni
1c82acb168
[EOSC Context Tagging] refactoring: moved EOSC IF tagging in package eosc under bulkTag
2022-07-25 14:26:39 +02:00
Miriam Baglioni
68cb637832
merge with branch beta
2022-07-25 14:24:25 +02:00
Miriam Baglioni
0172bab251
[EOSC Context Tagging] refactoring
2022-07-25 14:16:45 +02:00
Claudio Atzori
3c23d634eb
Merge pull request 'EOSC IF' ( #230 ) from tagEosc into beta
...
Reviewed-on: #230
2022-07-25 14:14:53 +02:00
Claudio Atzori
612b7a5530
Merge branch 'beta' into tagEosc
2022-07-25 14:12:59 +02:00
Claudio Atzori
3f883c4ecc
Merge pull request 'pubmed_update' ( #228 ) from pubmed_update into beta
...
Reviewed-on: #228
2022-07-25 14:10:35 +02:00
Claudio Atzori
c3ede1b379
Merge branch 'beta' into pubmed_update
2022-07-25 14:10:22 +02:00
Miriam Baglioni
144c103b67
[EOSC Context Tagging] add check to avoid the insertion of the context if already present
2022-07-25 13:52:45 +02:00
Enrico Ottonello
657b0208a2
multiple works download (<=100) for single request
2022-07-25 12:37:39 +02:00
Miriam Baglioni
d091866e48
[EOSC Context Tagging] refactoring
2022-07-25 11:12:22 +02:00
Miriam Baglioni
5968ec018d
[Clean Country] modified workflow and added param file
2022-07-22 16:48:38 +02:00
Miriam Baglioni
a12d28c644
[Clean Country] added logic not to remove the country from a result if a hosting datasource with that country exists. Moreover, the country will be removed only if it was added by propagation
2022-07-22 16:23:12 +02:00
Miriam Baglioni
2c933f1158
merging with branch beta
2022-07-22 14:57:41 +02:00
Miriam Baglioni
06a95daf60
[EOSC context TAG] refactoring after compilation
2022-07-22 14:57:06 +02:00
Miriam Baglioni
ffb0ce3fb9
merging with branch beta
2022-07-22 14:55:55 +02:00
Miriam Baglioni
627332526b
[EOSC context TAG] workflow start from reset_outputpath action
2022-07-22 14:55:11 +02:00
Miriam Baglioni
7a1c1b6f53
[EOSC context TAG] Add test class and resources
2022-07-22 14:36:02 +02:00
Sandro La Bruzzo
ddc414b258
fixed wrong json param
2022-07-22 09:43:15 +02:00
Miriam Baglioni
317a4a56ef
[EOSC context TAG] first implementation of the logic to tag results imported from datasources registered in the EOSC
2022-07-21 17:37:48 +02:00
Miriam Baglioni
3be036f290
[EOSC TAG] refactoring after compilation
2022-07-21 14:45:43 +02:00
Miriam Baglioni
e61b8e6b03
merging with branch beta
2022-07-21 14:43:23 +02:00
Miriam Baglioni
56d09e6348
[EOSC TAG] before adding the tag added a step to verify the same tag is not already present
2022-07-21 14:36:48 +02:00
Miriam Baglioni
5143a80232
[EOSC TAG] modification of test class to align with new element
2022-07-21 11:56:51 +02:00
Claudio Atzori
d900a02b74
Merge pull request 'implemented oozie workflow to generate scholix dump filtering relclass semantic' ( #229 ) from opencitation_enrichments into beta
...
Reviewed-on: #229
2022-07-21 10:12:17 +02:00
Sandro La Bruzzo
5f651f2316
changed filter relation on SubRelType
2022-07-21 10:11:48 +02:00
Miriam Baglioni
438abdf96f
[EOSC TAG] adding eosc interoperability guidelines in the specific element in the result. Removed from subjects. Also removed the deletion of EOSC Jupyter Notebook from subjects, since the criteria are now searched for in a different place
2022-07-20 18:07:54 +02:00
Miriam Baglioni
65cc736e2f
[Clean Country] first implementation to remove country NL from results collected from NARCIS when the doi starts with the Mendeley prefix
2022-07-20 17:05:56 +02:00
Sandro La Bruzzo
5b76321d9c
implemented oozie workflow to generate scholix dump filtering relclass semantic
2022-07-20 16:34:32 +02:00
Claudio Atzori
18b505d6a3
Merge branch 'master' into beta
2022-07-19 14:18:02 +02:00
Claudio Atzori
1138b2ac8e
code formatting
2022-07-19 14:15:49 +02:00
Sandro La Bruzzo
00168303db
Added unit test to verify that the originalId includes the old OpenAIRE identifier generated by OAI
2022-07-14 10:19:59 +02:00
Sandro La Bruzzo
0a4f4d98fa
added PMCId to PmArticle
2022-07-13 15:27:17 +02:00
Alessia Bardi
28a32facf6
Merge pull request 'mapping `oaf:fulltext` element in the `result.fulltext` field' ( #226 ) from oaf_fulltext_mapping into beta
...
Reviewed-on: #226
2022-07-12 11:13:08 +02:00
Claudio Atzori
0c1cfee396
mapping oaf:fulltext elements in the result.fulltext field
2022-07-11 17:34:59 +02:00
Miriam Baglioni
fae681fea1
[Country Propagation] add check to avoid NPE on datasource.getDatasourceType().getClassis()
2022-07-03 17:39:58 +02:00
Miriam Baglioni
c09fcdb40b
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-07-01 12:38:03 +02:00
Claudio Atzori
138d1dfbf8
Merge pull request 'score class in the XML serialization' ( #225 ) from measure_serialization into beta
...
Reviewed-on: #225
2022-07-01 10:53:49 +02:00
Claudio Atzori
446699c59d
Merge pull request '[Graph Dump] New funded products dump' ( #222 ) from dump_new_funded_products into master
...
Reviewed-on: #222
2022-07-01 10:51:36 +02:00
Claudio Atzori
0cb1c70788
code formatting
2022-07-01 10:44:08 +02:00
Claudio Atzori
4ec13e2b66
Merge branch 'master' into dump_new_funded_products
2022-07-01 10:30:28 +02:00
Claudio Atzori
2f998b2429
Merge pull request '[Graph DUMP] add code to produce the delta of new projects with respect to the previous delta/dump' ( #221 ) from dump_delta_projects into master
...
IMO looks good, I think it can be integrated in the master branch.
Reviewed-on: #221
2022-07-01 10:30:10 +02:00
Claudio Atzori
072f192853
include the class information in the measure XML serialization
2022-07-01 09:54:56 +02:00
Claudio Atzori
a88103bcf9
[action manager] added more testing
2022-07-01 09:06:59 +02:00
Claudio Atzori
7da24c1dec
added more logging
2022-06-28 13:47:49 +02:00
Miriam Baglioni
ee1f1eeca2
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-06-28 11:06:32 +02:00
Miriam Baglioni
71744a1f52
[DUMP DELTA PROJECTS] refactoring
2022-06-27 18:07:58 +02:00
Miriam Baglioni
1d1fe3b151
[DUMP DELTA PROJECTS] refactoring
2022-06-27 18:04:59 +02:00
Claudio Atzori
a8773af0cb
Merge branch 'beta' into project_organization_contribution
2022-06-27 09:37:40 +02:00
Claudio Atzori
cba9c2b7cc
Merge pull request 'author name parsing' ( #220 ) from author_name_particles into beta
...
Reviewed-on: #220
2022-06-27 09:37:27 +02:00
Claudio Atzori
4829b96bb5
Merge branch 'beta' into author_name_particles
2022-06-27 09:37:03 +02:00
Claudio Atzori
316b0fd73c
added 'von' to the name particles file
2022-06-27 09:36:51 +02:00
Claudio Atzori
5130eac247
mapping by participant project contribution
2022-06-24 17:16:42 +02:00
Claudio Atzori
929b145130
code formatting
2022-06-21 23:07:06 +02:00
Miriam Baglioni
edddfc6c63
[DUMP DELTA PROJECTS] adding test and resource
2022-06-21 18:28:53 +02:00
Miriam Baglioni
f561f13dd9
[Funder Products Dump] fixed names of parameters in workflow
2022-06-21 18:18:17 +02:00
Miriam Baglioni
ff74e73369
[DUMP NEW FUNDED PRODUCTS] change in resources
2022-06-21 18:02:51 +02:00
Miriam Baglioni
b98f904d48
[Funder Products Dump] new way to avoid using hive
2022-06-21 17:52:27 +02:00
Miriam Baglioni
7423577a08
[Graph DUMP] add code to produce the delta of new projects with respect to the previous delta/dump
2022-06-21 14:51:38 +02:00
Claudio Atzori
c76ff6c613
Merge pull request '7096-fileGZip-collector-plugin' ( #211 ) from 7096-fileGZip-collector-plugin into beta
...
Reviewed-on: #211
2022-06-16 15:34:45 +02:00
Claudio Atzori
b295a40d9c
restored use of name_particles when parsing author names
2022-06-16 12:20:43 +02:00
Claudio Atzori
c7b09c6225
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-06-16 09:28:50 +02:00
Claudio Atzori
875ae29961
Merge pull request 'mapping relationship from transformed records based on `oaf:relation`' ( #219 ) from oaf_relation_mapping into beta
...
Reviewed-on: #219
2022-06-16 09:27:19 +02:00
Claudio Atzori
e03c0c7794
Merge branch 'beta' into oaf_relation_mapping
2022-06-16 09:27:01 +02:00
Claudio Atzori
06b5533d4c
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-06-16 09:22:16 +02:00
Claudio Atzori
4c8e820ff0
mapping relationship from transformed records based on oaf:relation
2022-06-14 08:49:02 +02:00
Alessia Bardi
88d531dc91
exclude FAIRsharing records from Datacite
2022-06-13 16:17:17 +02:00
Claudio Atzori
116902c028
mapping relationship from transformed records based on oaf:relation
2022-06-13 14:31:48 +02:00
Claudio Atzori
b8cda65487
code formatting
2022-06-13 09:20:03 +02:00
Michele Artini
634869ce95
deleted hierarchical rels from ror action set
2022-06-13 09:12:21 +02:00
Alessia Bardi
922c6d66ef
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-06-10 17:29:15 +02:00
Alessia Bardi
68bd58d6a4
tests for ROHub
2022-06-10 17:29:11 +02:00
Miriam Baglioni
b229c6e7af
Merge pull request 'beta' ( #218 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #218
2022-06-10 11:03:48 +02:00
Michele Artini
b94a791bc5
unit tests to transform cnr explora
2022-06-09 12:25:34 +02:00
Miriam Baglioni
ab8868bd3a
[ZENODO-API] changed to iterate in all the deposited products and not just the last ten
2022-06-08 17:03:15 +02:00
Miriam Baglioni
4b6913787b
[DOI-BOOST] added one method in test of crossref mapping to oaf and one resource. Related to ticket 7807
2022-06-08 14:55:19 +02:00
Miriam Baglioni
31d4557e8d
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2022-06-06 11:52:29 +02:00
Claudio Atzori
5c2949a864
Merge pull request '[stats wf] added open citations & more orgs in monitor, removed collab indicator' ( #213 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #213
2022-05-20 11:38:43 +02:00
Miriam Baglioni
5e0b8f9b5f
[CountryPropagation] refactoring
2022-05-20 09:15:53 +02:00
Miriam Baglioni
c298c148cb
[CountryPropagation] fix NPE issue
2022-05-20 09:11:46 +02:00
Miriam Baglioni
eaf9385ae5
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-05-17 15:09:37 +02:00
Miriam Baglioni
f5207885e3
[EOSCTag] changed code to remove EOSC Jupyter Notebook and modified test to exclude galaxy + software from the tagging for Galaxy
2022-05-17 15:09:22 +02:00
Claudio Atzori
d098ad0d93
[hb patch] updated map
2022-05-16 15:54:04 +02:00
Claudio Atzori
1dda11e031
[hb patch] updated map
2022-05-16 15:53:27 +02:00
Claudio Atzori
8dd5517548
code formatting
2022-05-16 14:35:24 +02:00
Claudio Atzori
52cb086506
[graph grouping] drop relation target path before copying from source
2022-05-16 12:08:36 +02:00
Claudio Atzori
6442763f97
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-05-16 12:07:45 +02:00
Claudio Atzori
997c50078e
[graph grouping] drop relation target path before copying from source
2022-05-16 12:07:40 +02:00
Sandro La Bruzzo
c1971d52c4
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2022-05-16 10:30:35 +02:00
Sandro La Bruzzo
4c50f35c8b
update publication Date format
2022-05-16 10:29:36 +02:00
Michele Artini
46c07e0724
deleted hierarchical rels from ror action set
2022-05-16 09:39:54 +02:00
Claudio Atzori
6031acb2e3
[openorgs] fixed parent/child query, using the correct semantic labels
2022-05-16 09:20:48 +02:00
Claudio Atzori
0dc33ea391
[openorgs] fixed parent/child query, using the correct semantic labels
2022-05-16 09:20:30 +02:00
Miriam Baglioni
e4eac1d20b
[EOSC TAG] added code to remove EOSC Jupyter Notebook from subjects and put EOSC as classid in the qualifier
2022-05-13 11:01:33 +02:00
Sandro La Bruzzo
22f65680b9
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2022-05-11 15:30:12 +02:00
Sandro La Bruzzo
ca8d26bcb4
added better filter for openCitations
2022-05-11 15:29:57 +02:00
Claudio Atzori
5d3b4a9c25
[graph merge beta] merge datasource originalid, collectedfrom, and pid lists
2022-05-11 14:13:06 +02:00
Claudio Atzori
2a8e0fb72f
[openorgs] mapping parent/child relations without massaging the semantic labels
2022-05-10 08:45:53 +02:00
Claudio Atzori
77bc9863e9
[openorgs] mapping parent/child relations without massaging the semantic labels
2022-05-09 16:06:04 +02:00
Claudio Atzori
378020e30a
[eosc_services] unit test adaptation
2022-05-09 16:05:06 +02:00
Miriam Baglioni
89657a0b78
[UsageCount] refactoring
2022-05-09 14:43:27 +02:00
Miriam Baglioni
a056f59c6e
[UsageCount] make it as an action set as it should be, plus changed the test to make them work as well now
2022-05-09 12:51:35 +02:00
Claudio Atzori
658450d9a3
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-05-05 11:38:08 +02:00
Claudio Atzori
846975c886
[eosc_services] using the correct 'keyword' subject type, as declared in the dnet:subject_classification_typologies vocabulary
2022-05-05 11:37:58 +02:00
Miriam Baglioni
5fe25cc51c
Merge pull request '[eosc tag] set the eosc subjects, rough implementation' ( #215 ) from eosc_tag into beta
...
Reviewed-on: #215
2022-05-04 10:11:14 +02:00
Miriam Baglioni
8a72de4011
[EOSCTag] modified workflow to execute all the steps and not only the last one
2022-05-04 10:10:56 +02:00
Miriam Baglioni
bd1108f98b
merging with branch beta
2022-05-04 10:06:56 +02:00
Miriam Baglioni
3aeedd931a
[EOSCTag] fixed issue in case description is null. Modified test resources and classes
2022-05-04 10:06:38 +02:00
Claudio Atzori
da611cfbbd
[eosc_services] resolved merge conflicts
2022-05-03 13:37:15 +02:00
Claudio Atzori
9e12cb3c92
EOSC Services - removed field knowledgegraph; depending on the released schema module
2022-05-03 11:55:45 +02:00
Miriam Baglioni
a21fe310e5
[EOSCTag] last test and change in the implementation to search in title and description
2022-05-02 17:43:20 +02:00
Claudio Atzori
2ade69dea6
EOSC Services - minor
2022-05-02 17:03:31 +02:00
Claudio Atzori
b6a7ff3a99
EOSC Services - removed fields from mapping, testing preparation
2022-05-02 15:52:33 +02:00
Miriam Baglioni
e37177e1ce
merging with branch beta
2022-05-02 12:31:50 +02:00
Claudio Atzori
a8c51f6f16
EOSC Services - fixed query and testing preparation
2022-05-02 11:09:03 +02:00
Claudio Atzori
05c1ea92e9
EOSC Services - added Service-specific fields in the XML record serialization
2022-04-29 15:56:55 +02:00
Claudio Atzori
f5f532d134
EOSC Services - ongoing update
2022-04-29 12:25:24 +02:00
Antonis Lempesis
0353f93d54
added new hive opts
2022-04-29 12:49:27 +03:00
Serafeim Chatzopoulos
623f7be26d
Fix reading files from HDFS in FileCollector & FileGZipCollector plugins
2022-04-28 16:31:11 +03:00
Claudio Atzori
5ffc24d1ba
EOSC Services - ongoing update
2022-04-26 16:18:41 +02:00
Sandro La Bruzzo
78015a5733
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2022-04-26 09:56:34 +02:00
Sandro La Bruzzo
8c22e5c30a
added fix to include date array with only year or year and month
2022-04-26 09:56:27 +02:00
Claudio Atzori
81c4496d32
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-04-26 09:02:15 +02:00
Miriam Baglioni
e342ec93f0
[EOSCTag] prepared resources for test
2022-04-22 18:35:37 +02:00
Miriam Baglioni
88562c0930
[EOSC TAG] added test for galaxy for title and description criteria
2022-04-22 18:35:03 +02:00
Miriam Baglioni
dfbd2bcbea
[EOSC TAG] added logic in case subject is null
2022-04-22 18:34:03 +02:00
Miriam Baglioni
27c85e901a
[EOSCTag] added resources and finalized test for Jupyter Notebook tagging
2022-04-22 17:38:10 +02:00
Miriam Baglioni
87bff36d9e
merging with branch beta
2022-04-22 15:52:34 +02:00
Claudio Atzori
81242538e6
Merge pull request 'Oozie workflow for cleancontext' ( #216 ) from cleancontext into beta
...
Reviewed-on: #216
Looks good. We need to extend the cleaning workflow parameters to enable the extra step only when it is needed.
2022-04-22 15:46:40 +02:00
Miriam Baglioni
911ce0780a
Merge branch 'cleancontext' of https://code-repo.d4science.org/D-Net/dnet-hadoop into cleancontext
2022-04-22 15:41:42 +02:00
Miriam Baglioni
19d90658fc
[Clean Context] added description to parameters
2022-04-22 15:41:23 +02:00
Claudio Atzori
54162f5c4f
Merge branch 'beta' into cleancontext
2022-04-22 11:49:33 +02:00
Miriam Baglioni
bbb77052d3
[EOSCTag] first test
2022-04-22 11:32:57 +02:00
Claudio Atzori
30105f0722
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-04-22 11:22:21 +02:00
Sandro La Bruzzo
a82ec3aaaf
code formatter
2022-04-22 11:08:13 +02:00
Sandro La Bruzzo
aa12429f50
Modified last intersection since we lost many titles.
2022-04-22 11:05:08 +02:00
Miriam Baglioni
7cb7066472
[EoscTag] first "rough" implementation
2022-04-22 10:44:17 +02:00
Sandro La Bruzzo
d660895b30
fixed wrong mapping type of dataset
2022-04-21 20:41:13 +02:00
Miriam Baglioni
e0915061c2
[Clean Context] fixed issue in param name
2022-04-21 16:32:40 +02:00
Miriam Baglioni
6dc68c48e0
[EOSCTag] -
2022-04-21 16:19:04 +02:00
Miriam Baglioni
9a961a0092
[Clean Context] fixed issue in param name
2022-04-21 15:12:24 +02:00
Claudio Atzori
29150a5d0c
code formatting
2022-04-21 13:31:56 +02:00
Miriam Baglioni
5b7d9e741c
[Clean Context] added logic to cleaning workflow to accommodate also context cleaning
2022-04-21 13:02:14 +02:00
Miriam Baglioni
ccba1a3db1
[Clean Context] added logic to cleaning workflow to accommodate also context cleaning
2022-04-21 13:00:06 +02:00
Claudio Atzori
a289c9eae2
Merge pull request '[Measures] added new measure (UsageCounts)' ( #214 ) from eosc_dimitris into beta
...
Reviewed-on: #214
2022-04-21 12:19:18 +02:00
Miriam Baglioni
20de75ca64
[Measures] removed typo
2022-04-21 12:14:03 +02:00
Miriam Baglioni
bebb2a0560
Merge branch 'eosc_dimitris' of https://code-repo.d4science.org/D-Net/dnet-hadoop into eosc_dimitris
2022-04-21 12:10:19 +02:00
Miriam Baglioni
b61efd613b
[Measures] addressed comments in the PR
2022-04-21 12:09:37 +02:00
Miriam Baglioni
d012d125d7
[EOSCTag] -
2022-04-21 12:02:09 +02:00
Claudio Atzori
88acad76f9
Merge branch 'beta' into eosc_dimitris
2022-04-21 12:00:03 +02:00
Claudio Atzori
eabb40fccc
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-04-21 11:42:43 +02:00
Miriam Baglioni
c304657d91
[Measures] put the logic in common, no need to change the schema
2022-04-21 11:27:26 +02:00
Sandro La Bruzzo
d580e15442
Modified last intersection since we lost many titles.
...
this is my last resource, after that, I've to change my job
2022-04-21 11:06:08 +02:00
Miriam Baglioni
5295effc96
[Measures] fixed issue
2022-04-20 16:20:40 +02:00
Miriam Baglioni
61c0266a44
Merge pull request 'Remove Context from result' ( #208 ) from cleancontext into beta
...
Reviewed-on: #208
2022-04-20 15:45:32 +02:00
Miriam Baglioni
a38f0f5ea7
merging with branch beta
2022-04-20 15:44:18 +02:00
Miriam Baglioni
dbfbe8841a
[Clean Context] changed the description in input parameters
2022-04-20 15:41:03 +02:00
Miriam Baglioni
5feae77937
[Measures] last changes to accommodate tests
2022-04-20 15:13:09 +02:00
Miriam Baglioni
869407c6e2
[Measures] added new measure (usagecounts) as action set. Measure added at the level of the result. Ref #7587
2022-04-20 14:02:05 +02:00
Michele Artini
c96a8613f8
update SQL queries
2022-04-20 12:07:49 +02:00
Michele Artini
4314db55c8
migration to services: update sql queries
2022-04-19 15:05:02 +02:00
miconis
9ddd24ba36
implementation of comparators and clustering function for the author deduplication
2022-04-19 10:18:09 +02:00
Miriam Baglioni
0012e57bf9
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2022-04-14 14:14:44 +02:00
Miriam Baglioni
c5a863132c
[BulkTagging] revert it
2022-04-14 14:14:13 +02:00
Sandro La Bruzzo
d5b29d96a7
fix merging in crossrefAggregator which creates dataInfo null
2022-04-14 11:07:04 +02:00
Miriam Baglioni
8e8933d41a
[BulkTagging] added fix if result.dataInfo is null
2022-04-14 09:04:24 +02:00
miconis
97a32faf9b
test implementation for the new fdup version
2022-04-13 09:48:56 +02:00
Claudio Atzori
b93a141d6c
[Doiboost] fixed fundingReference extraction from the Crossref records
2022-04-12 10:26:05 +02:00
Claudio Atzori
73c172926a
[Doiboost] fixed fundingReference extraction from the Crossref records
2022-04-12 10:25:42 +02:00
Claudio Atzori
48b580b45c
[graph enrichment] fixed country_propagation oozie workflow definition, parameter saveGraph is not needed anymore by the SparkCountryPropagationJob
2022-04-11 08:52:36 +02:00
Claudio Atzori
21f32b83c6
[graph enrichment] fixed country_propagation oozie workflow definition, parameter saveGraph is not needed anymore by the SparkCountryPropagationJob
2022-04-11 08:52:12 +02:00
Claudio Atzori
4eff7856f5
Merge pull request '[stats-wf] computing stats in each step' ( #210 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #210
2022-04-08 14:21:01 +02:00
Serafeim Chatzopoulos
d0b84d3297
Add FileCollectorPlugin and respective test
2022-04-07 15:06:38 +03:00
Claudio Atzori
91e32f12ed
Merge branch 'master' into beta
2022-04-07 13:37:58 +02:00
Serafeim Chatzopoulos
bc1bf55507
Add AbstractSplittedRecordPlugin
2022-04-07 14:33:04 +03:00
Claudio Atzori
c26222623f
[maven-release-plugin] prepare for next development iteration
2022-04-07 13:32:22 +02:00
Claudio Atzori
86585a6b27
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 13:32:19 +02:00
Claudio Atzori
ad85d88eaf
[maven-release-plugin] rollback the release of dhp-1.2.4
2022-04-07 13:28:35 +02:00
Claudio Atzori
598e11dfd7
[maven-release-plugin] prepare for next development iteration
2022-04-07 13:27:02 +02:00
Claudio Atzori
db3d9877a5
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 13:26:58 +02:00
Claudio Atzori
f03dea4f49
allow to skip maven site
2022-04-07 13:22:55 +02:00
Claudio Atzori
3bba6d6e38
[maven-release-plugin] rollback the release of dhp-1.2.4
2022-04-07 12:23:17 +02:00
Claudio Atzori
2ac2d928bd
[maven-release-plugin] prepare for next development iteration
2022-04-07 12:18:47 +02:00
Claudio Atzori
85bc722ff4
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 12:18:43 +02:00
Claudio Atzori
bc05b6168a
[maven-release-plugin] rollback the release of dhp-1.2.4
2022-04-07 11:49:06 +02:00
Claudio Atzori
505420fd61
[maven-release-plugin] prepare for next development iteration
2022-04-07 11:34:06 +02:00
Claudio Atzori
66e718981e
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 11:34:02 +02:00
Serafeim Chatzopoulos
e612489670
Add fileGZip collector plugin and respective test
2022-04-06 19:12:44 +03:00
Claudio Atzori
4190c9f6bc
[graph raw] avoid NPEs importing datasource consent fields
2022-04-06 15:34:31 +02:00
Claudio Atzori
05fafa1408
[graph raw] avoid NPEs importing datasource consent fields
2022-04-06 15:23:50 +02:00
Claudio Atzori
8c457f1b2c
conflicts resolved, merged from beta
2022-04-06 10:27:52 +02:00
Miriam Baglioni
e77d104951
[OC] added / to workflow path
2022-04-05 15:07:11 +02:00
Miriam Baglioni
79336d46c5
[Clean Context] first naive implementation of a functionality to clean unwanted contexts from a result. This implementation simply verifies that the main title of the result starts with a given string
2022-04-04 15:52:31 +02:00
Claudio Atzori
873369af1c
Merge pull request '[stats wf] added apcs in monitor db' ( #207 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #207
2022-03-29 15:40:20 +02:00
Claudio Atzori
de85367695
Merge pull request '[stats wf] fix: views cannot be stored as parquet...' ( #206 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #206
2022-03-29 12:51:02 +02:00
Sandro La Bruzzo
1b11010169
minor fix
2022-03-29 10:59:14 +02:00
Claudio Atzori
0a0ae84c22
[graph raw] DOI based instance URLs on https
2022-03-29 10:52:58 +02:00
Claudio Atzori
eca82e30c9
updated dhp-schema version
2022-03-29 09:46:49 +02:00
Claudio Atzori
9fa3dd78fe
Merge pull request '[stats wf] various fixes, organization ids for inst. dashboard' ( #205 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #205
2022-03-28 22:03:49 +02:00
Claudio Atzori
5d53ac95aa
Merge pull request 'XML serialisation of instances with the same URLs - 2nd round' ( #204 ) from instance_group_by_url into beta
...
Reviewed-on: #204
2022-03-28 09:24:00 +02:00
Claudio Atzori
96aa2a5d0d
Merge branch 'beta' into instance_group_by_url
2022-03-28 09:23:52 +02:00
Claudio Atzori
395ac6ecec
merged pom.xml from beta branch
2022-03-28 09:23:42 +02:00
Claudio Atzori
fa3cb84f77
Merge pull request 'Datasource consent fields' ( #202 ) from datasource_pdf_consent into beta
...
Reviewed-on: #202
2022-03-28 09:21:14 +02:00
Claudio Atzori
741bc99c47
Merge branch 'beta' into datasource_pdf_consent
2022-03-28 09:20:48 +02:00
Claudio Atzori
3610f1749a
merged pom.xml from beta branch
2022-03-28 09:20:27 +02:00
Claudio Atzori
61319b2e83
updated dhp-schema version; set entity-level dataInfo before & after merging the fields from the group of duplicates
2022-03-25 16:38:33 +01:00
Miriam Baglioni
7b8f85692e
[Enrichment country] fixed issues with parameters and workflow args
2022-03-23 17:20:23 +01:00
Claudio Atzori
48d32466e4
instances grouped by URL expose only one refereed
2022-03-23 14:52:03 +01:00
Claudio Atzori
f10066547b
increased spark.sql.shuffle.partitions in affiliation_from_semrel_propagation
2022-03-23 12:22:26 +01:00
Claudio Atzori
43733c1a18
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-03-23 12:14:27 +01:00
Miriam Baglioni
89fd275480
[HostedByMap] added left over from PR and fixed issue on workflow
2022-03-21 09:54:45 +01:00
miconis
c763aded70
dependency updated to the new pace-core version
2022-03-16 16:41:50 +01:00
miconis
c959639bd5
dependency updated to the new pace-core version
2022-03-15 16:33:03 +01:00
miconis
10172553ab
[maven-release-plugin] prepare for next development iteration
2022-03-15 15:06:18 +01:00
miconis
bd919ac98d
[maven-release-plugin] prepare release dnet-dedup-4.1.12
2022-03-15 15:06:12 +01:00
miconis
a965233dd0
bug fix in the normalization of a legalname, city map updated and transliteration support added
2022-03-15 14:59:13 +01:00
Miriam Baglioni
0f7d8ca2e0
[HostedByMap] change on master to align to PR 201 on beta merged as 9f3036c847
2022-03-11 15:16:02 +01:00
Claudio Atzori
f430029596
cleanup
2022-03-11 14:28:28 +01:00
Claudio Atzori
d48ccfd65e
Merge pull request 'enrichment_country' ( #203 ) from enrichment_country into beta
...
Looks good to me
Reviewed-on: #203
2022-03-11 14:27:01 +01:00
Miriam Baglioni
12de9acb0d
[Country Propagation] left out from previous commit
2022-03-11 14:17:02 +01:00
Miriam Baglioni
2fbb35ade5
merging with branch beta
2022-03-11 13:58:10 +01:00
Miriam Baglioni
4437f9345d
[Country Propagation] left out from previous commit
2022-03-11 13:57:47 +01:00
Miriam Baglioni
2b643059fa
[Country Propagation] changed the logic to get the collectedfrom at the result level. To fix issue when no instance is created for a result that should have the country associated. Change the code to use spark instead of hive to prepare the data needed for the propagation step. Added new tests for the intermediate steps and new verification for the propagation itself
2022-03-11 13:56:48 +01:00
Claudio Atzori
f25407bbe2
added mapping for datasource consent fields to integrate them in the graph
2022-03-11 09:32:42 +01:00
miconis
ac9708e31b
[maven-release-plugin] prepare for next development iteration
2022-03-09 13:43:48 +01:00
miconis
a5a6054039
[maven-release-plugin] prepare release dnet-dedup-4.1.11
2022-03-09 13:43:44 +01:00
miconis
3bc07c5881
bug fix in the AuthorMatch, implementation of the concat function in the model creation with jpath query
2022-03-09 12:53:09 +01:00
miconis
699612dd17
implementation of the size threshold on authors list match
2022-03-08 16:49:28 +01:00
Claudio Atzori
9f3036c847
Merge pull request 'HostedByMap' ( #201 ) from hostedByMap_update into beta
...
Reviewed-on: #201
2022-03-04 16:26:27 +01:00
Miriam Baglioni
2c5087d55a
[HostedByMap] download of doaj from json, modification of test resources, deletion of class no more needed for the CSV download
2022-03-04 15:18:21 +01:00
Miriam Baglioni
5d608d6291
[HostedByMap] changed the model to include also oaStart date and review process that could be possibly used in the future
2022-03-04 11:06:09 +01:00
Miriam Baglioni
b7c2340952
[HostedByMap - DOIBoost] changed to use code moved to common since used also from hostedbymap now
2022-03-04 11:05:23 +01:00
Miriam Baglioni
8a41f63348
[HostedByMap] update to download the json instead of the csv
2022-03-04 10:38:43 +01:00
Miriam Baglioni
44b0c03080
[HostedByMap] update to download the json instead of the csv
2022-03-04 10:37:59 +01:00
Miriam Baglioni
3be8737c32
[graph-stats] fixed query after the change in the indicator table related to PR#200
2022-03-02 14:09:05 +01:00
Miriam Baglioni
3970651ee1
Merge pull request 'fixed query after the change in the indicator table' ( #200 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #200
2022-03-02 14:05:58 +01:00
Claudio Atzori
580d904aae
manually merging PR#199 #199
2022-02-25 12:22:50 +01:00
Claudio Atzori
1932a65d1c
Merge pull request '[Stats wf] sprint 6 indicators' ( #198 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #198
2022-02-25 12:09:18 +01:00
Miriam Baglioni
f5b0a6f89c
[master to beta] fixed issues in test files
2022-02-25 10:21:57 +01:00
miconis
8991d097b4
bug fix in the DedupRecordFactory, DataInfo set before merge
2022-02-24 17:13:12 +01:00
miconis
fe1c966cbf
Merge branch 'master_202203' of code-repo.d4science.org:D-Net/dnet-hadoop into master_202203
2022-02-24 17:08:38 +01:00
miconis
b0f369dc78
bug fix in the DedupRecordFactory, DataInfo set before merge
2022-02-24 17:08:24 +01:00
Miriam Baglioni
859cb7ac9d
[DoiBoost AR] changed test resource to be sure the result will always have EMBARGO as value for AccessRight
2022-02-24 16:55:32 +01:00
Miriam Baglioni
a40b59b7d5
[ResultToOrgFromInstRepoTest] fixed issue in model of the input resources
2022-02-24 16:05:57 +01:00
Claudio Atzori
66c09b1bc7
code formatting
2022-02-24 12:58:07 +01:00
Claudio Atzori
e7016c3981
Merge branch 'master_202203' into beta
2022-02-24 12:51:58 +01:00
Claudio Atzori
a87c070447
conflicts resolved, merged from beta
2022-02-24 12:51:31 +01:00
Claudio Atzori
55caa389d5
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2022-02-24 12:16:43 +01:00
Claudio Atzori
ab36154e3e
added more ignores
2022-02-24 12:16:17 +01:00
Claudio Atzori
fbf192d6ba
Merge pull request '[provision wf] serialize measures defined on the result level' ( #196 ) from xml_measures into beta
...
Reviewed-on: #196
2022-02-23 15:56:28 +01:00
Claudio Atzori
86cdb7a38f
[provision] serialize measures defined on the result level
2022-02-23 15:54:18 +01:00
Alessia Bardi
9d6203f79b
test mapping datasource
2022-02-23 15:00:53 +01:00
Claudio Atzori
5226d0a100
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-02-18 15:21:07 +01:00
Claudio Atzori
99f5b14469
[graph raw] invisible records stored among the raw graph rather than the claimed subgraph
2022-02-18 15:20:57 +01:00
Claudio Atzori
401dd38074
code formatting
2022-02-18 15:19:15 +01:00
Claudio Atzori
cf8443780e
added processingchargeamount to the result view
2022-02-18 15:17:48 +01:00
Sandro La Bruzzo
891781ee3f
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2022-02-18 11:11:32 +01:00
Sandro La Bruzzo
d3f03abd51
fixed wrong json path
2022-02-18 11:11:17 +01:00
Claudio Atzori
56e048ef21
Merge pull request 'added hierarchy rel in ROR actionset' ( #153 ) from hierarchical_orgs_relations into beta
...
Reviewed-on: #153
2022-02-17 10:31:56 +01:00
Claudio Atzori
89c7313fc5
Merge branch 'beta' into hierarchical_orgs_relations
2022-02-17 10:30:04 +01:00
Antonis Lempesis
5772f92dba
merged beta changes in hive branch
2022-02-15 13:24:51 +02:00
Sandro La Bruzzo
3aa2020b24
added script to regenerate hostedBy Map following instruction defined on ticket #7539
...
updated hosted By Map
2022-02-15 11:05:27 +01:00
Miriam Baglioni
90e197a563
Merge pull request '[OpenCitation] changed the name of destination folders' ( #195 ) from openCitations into beta
...
Reviewed-on: #195
2022-02-14 15:52:10 +01:00
Miriam Baglioni
be64055cfe
[OpenCitation] changed the name of destination folders
2022-02-14 15:49:44 +01:00
Miriam Baglioni
a1013e62d4
Merge pull request 'openCitations' ( #194 ) from openCitations into beta
...
Reviewed-on: #194
2022-02-14 14:58:28 +01:00
Miriam Baglioni
1490867cc7
[OpenCitation] cleaning of the COCI model
2022-02-14 14:52:12 +01:00
Miriam Baglioni
c191080965
merging with branch beta
2022-02-14 14:49:39 +01:00
Alessia Bardi
6158170334
testing delegated authority and bumped dep to schemas
2022-02-11 18:05:18 +01:00
Alessia Bardi
600ede1798
serialisation of APCs in the XML records
2022-02-11 11:00:20 +01:00
Miriam Baglioni
5c4043dba8
[OpenCitation] refactoring
2022-02-08 16:23:05 +01:00
Miriam Baglioni
759ed519f2
[OpenCitation] added logic to avoid the generation of self-citation relations
2022-02-08 16:15:34 +01:00
Miriam Baglioni
b071f8e415
[OpenCitation] changed to extract each folder in json format just once
2022-02-08 15:37:28 +01:00
Miriam Baglioni
fbc28ee8c3
[OpenCitation] change the integration logic to consider dois with commas inside
2022-02-07 18:32:08 +01:00
Miriam Baglioni
78be2975f0
[stats-wf]fixed another typo related to PR#193
2022-02-07 11:22:08 +01:00
Miriam Baglioni
1f8302dc37
Merge pull request '[stats-wf]fixed yet another typo' ( #193 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #193
2022-02-07 11:19:26 +01:00
Alessia Bardi
b04ecfcf13
Merge pull request 'extendResult' ( #192 ) from extendResult into beta
...
Reviewed-on: #192
2022-02-04 16:43:58 +01:00
Alessia Bardi
ac8b8f224f
Merge branch 'beta' into extendResult
2022-02-04 16:43:27 +01:00
Miriam Baglioni
9fd2ef468e
[APC at the result level] changed dependency in external pom
2022-02-04 16:40:32 +01:00
Miriam Baglioni
493caef358
[stats-wf]fixed the result_result table related to PR#191
2022-02-04 14:51:25 +01:00
Miriam Baglioni
0547fd6ee7
Merge pull request '[stats-wf]fixed the result_result table' ( #191 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #191
2022-02-04 14:47:31 +01:00
Miriam Baglioni
aae667e6b6
[APC at the result level] added the APC at the level of the result and modified test class
2022-02-04 12:34:25 +01:00
Sandro La Bruzzo
bcfdf9a0d7
iis repository with https
2022-02-03 16:49:31 +01:00
Miriam Baglioni
3c60e53a96
[stats-wf]fixed the result_result creation for monitor PR#190 on beta
2022-02-03 14:47:08 +01:00
Miriam Baglioni
89922156c9
Merge pull request '[stats-wf]fixed the result_result creation for monitor' ( #190 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #190
2022-02-03 13:00:56 +01:00
Alessia Bardi
2e215abfa8
test for instances with URLs for OpenAPC
2022-02-02 17:27:44 +01:00
Miriam Baglioni
37784209c9
[dhp-schemas-] updated the version of dhp-schema to 2.10.27 for APC name and id modification
2022-02-02 12:46:31 +01:00
Miriam Baglioni
73eba34d42
[UnresolvedEntities] Changed the way to merge the unresolved because the new merge removed the dataInfo from the merged result. Added also data info for subjects
2022-02-01 08:38:41 +01:00
Miriam Baglioni
dce7f5fea8
[BULK TAGGING] changed to fix issue that should have been fixed already
2022-01-31 08:20:28 +01:00
Claudio Atzori
8eb75ca169
adapted GenerateEntitiesApplicationTest behaviour
2022-01-27 16:24:37 +01:00
Claudio Atzori
db299dd8ab
fixed typo
2022-01-27 16:24:06 +01:00
Claudio Atzori
af61e44acc
ported changes to the GraphCleaningFunctionsTest from 8de9788308
2022-01-27 16:19:14 +01:00
Claudio Atzori
4fc44edb71
depending on dhp-schemas:2.10.26
2022-01-27 16:03:57 +01:00
Miriam Baglioni
a70b0990c9
Merge pull request 'Priority to records from delegated authorities' ( #187 ) from delegated_authorities into beta
...
Reviewed-on: #187
2022-01-26 16:02:49 +01:00
Claudio Atzori
1322379741
Merge branch 'beta' into delegated_authorities
2022-01-25 14:28:25 +01:00
Claudio Atzori
59a250337c
[graph resolution] drop output path at the beginning
2022-01-24 18:02:39 +01:00
Claudio Atzori
97ad94d7d9
[graph resolution] drop output path at the beginning
2022-01-24 18:02:07 +01:00
Claudio Atzori
8de9788308
applied fix for avoiding ruling out the invisible (APC) records during the graph cleaning
2022-01-24 11:29:22 +01:00
Claudio Atzori
c42623f006
added NPE checks
2022-01-21 14:30:09 +01:00
Claudio Atzori
2f385b3ac6
updated dnet workflow profile definitions
2022-01-21 13:59:46 +01:00
Claudio Atzori
dd52bf1bb8
copy relations to the graphOutputPath
2022-01-21 13:59:29 +01:00
Claudio Atzori
4983d6536d
Merge branch 'beta' into delegated_authorities
2022-01-21 13:02:48 +01:00
Sandro La Bruzzo
7a3819144d
Merge pull request 'title types from datacite records' ( #188 ) from datacite_title_mapping into beta
...
Reviewed-on: #188
2022-01-21 11:05:25 +01:00
Claudio Atzori
f0ea2410e5
improved mapping titles from datacite records to consider title types
2022-01-21 10:50:34 +01:00
Claudio Atzori
b37bc277c4
reintroduced the hostedby patching to the datacite records
2022-01-21 09:15:13 +01:00
Claudio Atzori
f2fde5566b
using helper method from ModelSupport to find the inverse relation descriptor
2022-01-20 09:19:07 +01:00
Claudio Atzori
3b9020c1b7
added unit test for the DispatchEntitiesJob
2022-01-19 18:15:55 +01:00
Claudio Atzori
abfa9c6045
code formatting
2022-01-19 17:17:11 +01:00
Claudio Atzori
391aa1373b
added unit test
2022-01-19 17:13:21 +01:00
Claudio Atzori
62f135262e
code formatting
2022-01-19 12:30:52 +01:00
Claudio Atzori
44a937f4ed
factored out entity grouping implementation, extended to consider results from delegated authorities rather than identical records from other sources
2022-01-19 12:24:52 +01:00
miconis
8f07f0c537
[maven-release-plugin] prepare for next development iteration
2022-01-13 17:22:16 +01:00
miconis
620e35db28
[maven-release-plugin] prepare release dnet-dedup-4.1.10
2022-01-13 17:22:12 +01:00
miconis
2ff97781d2
minor change
2022-01-13 17:20:20 +01:00
Miriam Baglioni
42e8f76778
[GraphCleaning] change the return value in the filtering function to avoid to lose the APC entities
2022-01-13 16:06:43 +01:00
miconis
1ff6a3dc11
[maven-release-plugin] prepare for next development iteration
2022-01-13 15:15:05 +01:00
miconis
003bcf1699
[maven-release-plugin] prepare release dnet-dedup-4.1.9
2022-01-13 15:15:00 +01:00
Miriam Baglioni
a7c4d0d16d
[DoiBoost Organizations] added parameter to specify the action in the wf raw_organizations to be able to load the openorgs organization as in the loading step for the construction of the graph
2022-01-13 13:52:00 +01:00
miconis
2f1ba56f61
bug fix in the authormatch comparator, implementation of tests
2022-01-13 11:58:28 +01:00
Miriam Baglioni
7bf12ad24a
Merge pull request 'BipInstance' ( #185 ) from BipInstance into beta
...
Reviewed-on: #185
2022-01-12 18:15:38 +01:00
Miriam Baglioni
a75fb8c47a
[BipFinderInstanceLevel] change pom to align to the dhp-schema release 2.10.24 and refactoring
2022-01-12 18:06:26 +01:00
Miriam Baglioni
4d517ed9ec
merging with branch beta
2022-01-12 17:29:37 +01:00
Miriam Baglioni
e7d5a39c03
[BipFinderInstanceLevel] added tests in test class
2022-01-12 17:25:04 +01:00
Claudio Atzori
dbd6fa1d65
scalafmt: remote referencing the common definition files makes it work compiling the entire project as well as the individual submodules
2022-01-12 17:19:38 +01:00
Miriam Baglioni
4993666d73
[BipFinderInstanceLevel] changed creation of the instance to allow to enrich existing instances with same pid
2022-01-12 16:53:47 +01:00
Claudio Atzori
9acc32faa6
[stats wf] final touches for the integration of PRs #166 , #179 in the master branch
2022-01-12 12:04:31 +01:00
dimitrispie
b053b0178e
Sprint 5 and other changes
2022-01-12 11:23:37 +01:00
Antonis Lempesis
b6b4bc0df9
added first indicator of sprint 5
2022-01-12 11:20:28 +01:00
Antonis Lempesis
e91f06f39b
fixed typos in indicators. Added extra views in monitor
2022-01-12 11:18:28 +01:00
Antonis Lempesis
3ce1976627
fixed column names
2022-01-12 11:14:41 +01:00
Antonis Lempesis
4878d7485c
added usage stats
2022-01-12 11:13:25 +01:00
Antonis Lempesis
a4316bafed
fixed a typo
2022-01-12 11:12:53 +01:00
Antonis Lempesis
bb17e070d8
added result_result relations
2022-01-12 11:09:38 +01:00
Claudio Atzori
a30a98a716
Applying PR#166 in the master branch (Added sprint 3&4 of indicators). Merge commit '0df9574a6f5d9d75bc840decb023561ae941f9d6'
2022-01-12 10:57:19 +01:00
Sandro La Bruzzo
1b9e8378b3
Merge pull request 'scalafmt: code style for scala' ( #184 ) from scalafmt into beta
...
Reviewed-on: #184
2022-01-12 09:58:39 +01:00
Sandro La Bruzzo
57e2c4b749
formatted code
2022-01-12 09:40:28 +01:00
Sandro La Bruzzo
b78d2b71f0
updated scala format configuration
2022-01-12 09:38:34 +01:00
Claudio Atzori
0f2144b5e0
scalafmt: code formatting
2022-01-11 17:03:44 +01:00
Claudio Atzori
dcd282977c
pulled from beta
2022-01-11 16:59:41 +01:00
Claudio Atzori
4f212652ca
scalafmt: code formatting
2022-01-11 16:57:48 +01:00
Sandro La Bruzzo
0163dadb7f
[doiboost]
...
- update MAG schema, new field added in version dec-2021
2022-01-11 11:05:44 +01:00
Miriam Baglioni
904e1c2667
Merge pull request 'Affiliation Propagation through semantic relation' ( #183 ) from enrichment into beta
...
Reviewed-on: #183
2022-01-07 19:18:16 +01:00
Miriam Baglioni
064f9bbd87
[AFFPropSR] added new parameter for the number of iterations and new code for just one iteration
2022-01-07 18:58:51 +01:00
Miriam Baglioni
93f26fb742
Merge pull request '[SDG-FOS] to import SDG file not considering the header' ( #182 ) from SDG into beta
...
Reviewed-on: #182
2022-01-07 16:28:55 +01:00
Miriam Baglioni
b7e450070b
[SDG-FOS] to import SDG file not considering the header
2022-01-07 12:13:26 +01:00
Miriam Baglioni
af8a33638d
Merge pull request 'SDG - FOS' ( #181 ) from SDG into beta
...
Reviewed-on: #181
2022-01-07 11:31:19 +01:00
Miriam Baglioni
639190370a
merging with branch beta
2022-01-07 11:29:25 +01:00
Miriam Baglioni
adccc2346a
[SDG-FOS] to lower case for the doi
2022-01-07 11:28:50 +01:00
Claudio Atzori
8ae46ca789
OAF-store-graph mdstores: further fix for PR#180
2022-01-05 15:52:15 +01:00
Claudio Atzori
908294d86e
OAF-store-graph mdstores: further fix for PR#180
2022-01-05 15:49:05 +01:00
Claudio Atzori
3bd3653be9
OAF-store-graph mdstores: save them in text format
2022-01-04 16:39:39 +01:00
Claudio Atzori
3dc48c7ab5
OAF-store-graph mdstores: save them in text format
2022-01-04 16:39:27 +01:00
Claudio Atzori
f82db765db
OAF-store-graph mdstores: save them in text format
2022-01-04 16:39:15 +01:00
Claudio Atzori
8d13effa31
test for the tolerant deserialisation utility method
2022-01-04 16:38:26 +01:00
Claudio Atzori
9458ee7938
serialise records in the OAF-store-graph mdstores in json format. Read them again in the graph construction phase using a tolerant parser to support backward compatible changes in the evolution of the schema
2022-01-04 16:38:09 +01:00
Claudio Atzori
58f8998e3d
OAF-store-graph mdstores: save them in text format
2022-01-04 15:02:09 +01:00
Claudio Atzori
174c3037e1
OAF-store-graph mdstores: save them in text format
2022-01-04 14:40:16 +01:00
Claudio Atzori
045d767013
OAF-store-graph mdstores: save them in text format
2022-01-04 14:23:01 +01:00
Claudio Atzori
cb30770a0b
Merge pull request 'tolerant parsing of OAF-store-graph mdstores' ( #180 ) from graph_interpretation_mdstores into beta
...
Reviewed-on: #180
2022-01-04 11:32:29 +01:00
Claudio Atzori
bd59b58efb
test for the tolerant deserialisation utility method
2022-01-04 11:26:56 +01:00
Claudio Atzori
a6977197b3
serialise records in the OAF-store-graph mdstores in json format. Read them again in the graph construction phase using a tolerant parser to support backward compatible changes in the evolution of the schema
2022-01-03 17:25:26 +01:00
Miriam Baglioni
4c60ee1718
merging with branch beta
2022-01-03 15:24:02 +01:00
Miriam Baglioni
92fd69e25d
[SDG-FOS] alternative way to get input data to avoid OOM error while getting csv
2022-01-03 15:23:06 +01:00
Claudio Atzori
fe7e5f4748
Merge pull request '[stats wf] result_result relations, usage stats, monitor views, indicator for sprint 5' ( #179 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #179
2022-01-03 14:52:11 +01:00
Claudio Atzori
bcea4e3a9b
added dnet workflow profile for the orchestration of the simplified and complete graph construction and processing pipeline, where the IIS works on the non-deduplicated graph
2022-01-03 14:33:00 +01:00
miconis
cea8440153
[maven-release-plugin] prepare for next development iteration
2021-12-30 13:11:57 +01:00
miconis
eb48d31ea6
[maven-release-plugin] prepare release dnet-dedup-4.1.8
2021-12-30 13:11:52 +01:00
miconis
a224bf70a4
implementation of new comparators for publication dedup configuration update
2021-12-27 17:35:02 +01:00
Miriam Baglioni
a706ba0c08
Merge pull request 'SDG Integration' ( #178 ) from SDG into beta
...
Reviewed-on: #178
2021-12-23 14:50:00 +01:00
Miriam Baglioni
7a1b440413
[SDG] logic to create unresolved entities out of SDG input. This also changes some classes related to FOS to reuse the same code. The code under createunresolvedentities creates results with the merged update of the inputs provided (bip at the level of the instance, fos and sdg for subjects)
2021-12-23 13:24:28 +01:00
Claudio Atzori
278cf08421
Merge pull request 'Normalising DOI urls' ( #177 ) from instance_group_by_url into beta
...
Reviewed-on: #177
2021-12-23 12:40:17 +01:00
Claudio Atzori
cccb16900c
https://support.openaire.eu/issues/7330 normalising DOI urls
2021-12-23 12:33:53 +01:00
Miriam Baglioni
2a67ee13ec
[SDG] added model class
2021-12-23 10:37:52 +01:00
Miriam Baglioni
5c4fee3533
Merge pull request '[Graph Dump] fixed issue on extraction of relation between entities and contexts: the relationship name and type were swapped' ( #176 ) from dump into beta
...
Reviewed-on: #176
2021-12-23 10:16:20 +01:00
Miriam Baglioni
69e9ea9eeb
[Graph Dump] Test for extraction of rels from entities extended
2021-12-23 10:15:30 +01:00
Miriam Baglioni
31b26d48ac
[Graph Dump] fixed issue on extraction of relation between entities and contexts: the relationship name and type were swapped
2021-12-23 10:09:47 +01:00
Miriam Baglioni
bf3a9505e0
Merge pull request 'FOS' ( #175 ) from FOS into beta
...
Reviewed-on: #175
2021-12-23 09:06:56 +01:00
Miriam Baglioni
10579c0dd0
[FOS]fixed doi value in test
2021-12-22 23:10:16 +01:00
Miriam Baglioni
6116fc5d40
[FOS]added logic to include only different subjects. Test refactoring and extension
2021-12-22 23:04:22 +01:00
Miriam Baglioni
b81efb6a9d
[FOS]changed the mapping between the csv and the model. Changed Test classes and resources
2021-12-22 21:40:35 +01:00
Miriam Baglioni
73175ba086
merging with branch beta
2021-12-22 16:45:15 +01:00
Miriam Baglioni
de6c4c8968
[FOS]creation of the unresolved entities: removed the split for the doi, no longer needed since each row is related to one doi
2021-12-22 16:44:44 +01:00
Miriam Baglioni
b352fbe453
Merge pull request 'bipFinder: unresolved entities' ( #174 ) from bipFinder into beta
...
Reviewed-on: #174
2021-12-22 16:42:30 +01:00
Miriam Baglioni
34ac56565d
refactoring
2021-12-22 16:28:11 +01:00
Miriam Baglioni
20ef1d657f
refactoring
2021-12-22 16:26:36 +01:00
Miriam Baglioni
813f856d3f
[BipFinder] removing left over parameter in wf
2021-12-22 16:11:12 +01:00
Miriam Baglioni
2c126ed014
[BipFinder] create unresolved entities with measures at the level of the instance
2021-12-22 16:03:41 +01:00
Miriam Baglioni
bf52a1847b
Merge pull request 'bipFinder at the level of the result' ( #173 ) from bipFinder into beta
...
Reviewed-on: #173
2021-12-22 15:48:03 +01:00
Miriam Baglioni
0807fdb65a
[BipFinder] remove not needed resources
2021-12-22 15:37:00 +01:00
Miriam Baglioni
b5e11a3a0a
[BipFinder] put in common package BipFinder model
2021-12-22 15:33:05 +01:00
Miriam Baglioni
c5739c4266
[BipFinder] create action set for the measures at the level of the result
2021-12-22 15:08:33 +01:00
Miriam Baglioni
da5f6260aa
merging with branch beta
2021-12-22 13:12:02 +01:00
Miriam Baglioni
4849270c55
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-22 13:07:37 +01:00
Claudio Atzori
8d18500069
using dhp-schema:2.9.24
2021-12-22 12:47:21 +01:00
Miriam Baglioni
9d19b057b8
Merge pull request '[GRAPH DUMP]Moving Measures' ( #159 ) from dump into beta
...
Reviewed-on: #159
2021-12-22 12:40:35 +01:00
Miriam Baglioni
be0acccf42
Merge branch 'beta' into dump
2021-12-22 12:39:57 +01:00
Miriam Baglioni
89ea9fa0e1
Merge branch 'dump' of https://code-repo.d4science.org/D-Net/dnet-hadoop into dump
2021-12-22 12:36:32 +01:00
Miriam Baglioni
e24a7f3496
merging with branch beta
2021-12-21 13:57:19 +01:00
Miriam Baglioni
d1ae219cb4
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-21 13:55:53 +01:00
Miriam Baglioni
460e6b95d6
[Graph Dump] -
2021-12-21 13:48:03 +01:00
Sandro La Bruzzo
3920d68992
Fixed workflow generation of delta in datacite
2021-12-21 11:41:49 +01:00
Miriam Baglioni
3cc1b7b153
merging with branch beta
2021-12-15 17:25:02 +01:00
Miriam Baglioni
5e5dfd619c
Merge branch 'beta' into dump
2021-12-15 17:21:55 +01:00
Miriam Baglioni
63b648b0dd
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-15 12:41:15 +01:00
Antonis Lempesis
f0b523cfa7
removed the too restrictive clause. will discuss again
2021-12-15 12:32:15 +01:00
Sandro La Bruzzo
b881ee5ef8
[scholexplorer]
...
- implemented generation of scholix of delta update of datacite
2021-12-15 11:25:32 +01:00
Sandro La Bruzzo
63952018c0
[scholexplorer]
...
-moved SparkRetrieveDataciteDelta in scala folder
2021-12-15 11:25:32 +01:00
Sandro La Bruzzo
e5bff64f2e
[scholexplorer]
...
- Minor fix on SparkConvertRDDtoDataset
-first implementation of retrieve datacite dump
2021-12-15 11:25:32 +01:00
Claudio Atzori
e30e5ac8a8
Merge pull request '[Affiliation Propagation]' ( #162 ) from affiliationPropagation into beta
...
Reviewed-on: #162
2021-12-14 15:28:23 +01:00
Claudio Atzori
1790fa2d44
Merge branch 'beta' into affiliationPropagation
2021-12-14 15:26:56 +01:00
Miriam Baglioni
56409d1281
[Dump] resolved conflicts with beta and merging
2021-12-14 15:03:45 +01:00
Miriam Baglioni
a3592b463a
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-14 14:58:26 +01:00
Miriam Baglioni
22d4b5619b
[BipFinder Result] last changes to test and resources files
2021-12-14 14:54:13 +01:00
Miriam Baglioni
6fb6236cd4
changed the way to produce the AS for bipFinder.
2021-12-14 14:51:14 +01:00
Claudio Atzori
aff3ddc8d2
added cleaning for the format field, removing carriage return and tab characters
2021-12-14 11:41:46 +01:00
Miriam Baglioni
573bd17cbb
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-14 11:12:25 +01:00
Miriam Baglioni
4eb8276493
-
2021-12-14 11:12:17 +01:00
Antonis Lempesis
ddd34087c2
removed 'stored as parquet' from views..
2021-12-13 23:05:00 +02:00
Antonis Lempesis
915f758c82
moving data to impala cluster and creating shadow databases there
2021-12-13 16:26:14 +02:00
Miriam Baglioni
936578aaf1
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-13 15:01:47 +01:00
Miriam Baglioni
8d755cca80
-
2021-12-13 15:01:40 +01:00
Claudio Atzori
98eb292c59
avoid NPEs merging XMLInstance(s)
2021-12-13 13:27:20 +01:00
Claudio Atzori
5e17247bb6
avoid NPEs merging XMLInstance(s)
2021-12-13 11:48:40 +01:00
Claudio Atzori
b70ecccea0
avoid NPEs merging XMLInstance(s)
2021-12-12 12:37:38 +01:00
Claudio Atzori
c1b6ae47cd
cleaning workflow assigns the proper default instance type when a value could not be cleaned using the vocabularies
2021-12-09 16:47:41 +01:00
Claudio Atzori
25dc7929a9
Merge pull request '[graph cleaning] improved instance type defaults' ( #172 ) from graph_cleaning into beta
...
Reviewed-on: #172
2021-12-09 16:47:06 +01:00
Claudio Atzori
eb43eda42a
Merge branch 'beta' into graph_cleaning
2021-12-09 16:46:48 +01:00
Claudio Atzori
41c70c607d
cleaning workflow assigns the proper default instance type when a value could not be cleaned using the vocabularies
2021-12-09 16:44:28 +01:00
Alessia Bardi
8f1e018ceb
Merge pull request 'Serialization of fields in XML records for Sygma (and not only)' ( #171 ) from sygma_indexing into beta
...
Reviewed-on: #171
2021-12-09 15:53:27 +01:00
Alessia Bardi
cba63e9f82
Merge branch 'beta' into sygma_indexing
2021-12-09 15:52:16 +01:00
Alessia Bardi
e53228401b
style
2021-12-09 15:46:22 +01:00
Claudio Atzori
cd9c51fd7a
vocabulary based cleaning considers also the term label when looking up for a synonym
2021-12-09 14:49:24 +01:00
Claudio Atzori
adf17452b0
Merge pull request '[graph cleaning] consider terms as synonyms in the vocabulary lookup' ( #170 ) from graph_cleaning into beta
...
Reviewed-on: #170
2021-12-09 14:45:14 +01:00
Claudio Atzori
e6e177dda0
vocabulary based cleaning considers also the term label when looking up for a synonym
2021-12-09 13:57:53 +01:00
Alessia Bardi
6b5d7688a4
#7275 serialize license information in XML records
2021-12-09 13:46:48 +01:00
Miriam Baglioni
b113586207
resolved conflicts
2021-12-07 10:16:14 +01:00
Sandro La Bruzzo
5d51b3dd4a
Merge pull request 'scala_refactor' ( #169 ) from scala_refactor into beta
...
Reviewed-on: #169
2021-12-06 15:33:44 +01:00
Miriam Baglioni
d9836f0cf3
[OpenCitations] fixed test when executed one after the other
2021-12-06 15:27:09 +01:00
Miriam Baglioni
d1df01ff1e
[Graph Dump] fixed resource for test
2021-12-06 15:15:48 +01:00
Sandro La Bruzzo
ed0c352799
[test-fixing] fixed wrong test
2021-12-06 15:07:41 +01:00
Miriam Baglioni
96a7d46278
[Graph Dump] fixed tests
2021-12-06 15:06:32 +01:00
Sandro La Bruzzo
e9f285ec4d
[scala-refactor] Module dhp-doiboost:
...
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 14:24:03 +01:00
Sandro La Bruzzo
bf880e2508
[scala-refactor] Module dhp-graph-mapper:
...
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 13:57:41 +01:00
Sandro La Bruzzo
81bf604059
[scala-refactor] Module dhp-common:
...
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 11:29:24 +01:00
Sandro La Bruzzo
7af0bbd0b1
[scala-refactor] Module dhp-aggregation:
...
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 11:26:36 +01:00
Claudio Atzori
9132727793
fixed date cleaning test
2021-12-06 10:54:05 +01:00
Claudio Atzori
08795cbd30
using helper method from ModelSupport to find the inverse relation descriptor
2021-12-06 10:39:56 +01:00
Miriam Baglioni
f430688ff7
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-03 12:36:08 +01:00
Miriam Baglioni
4bb1d43afc
-
2021-12-03 12:35:51 +01:00
Sandro La Bruzzo
0fa0ce33d6
removed duplicated on gitignore
2021-12-03 11:47:35 +01:00
Sandro La Bruzzo
f7011b90d8
format code
2021-12-03 11:15:09 +01:00
Claudio Atzori
372633880f
Merge pull request 'XML serialisation of instances with the same URLs' ( #167 ) from instance_group_by_url into beta
...
Reviewed-on: #167
2021-12-03 09:28:06 +01:00
Claudio Atzori
dd0b2e5244
Merge branch 'beta' into instance_group_by_url
2021-12-03 09:27:58 +01:00
Claudio Atzori
c4c705aa46
Merge pull request 'Cleaning of invisible records' ( #168 ) from clean_invisible_records into beta
...
Reviewed-on: #168
2021-12-03 09:27:41 +01:00
Claudio Atzori
863a2f9db3
avoid to filter OAF records defined as invisible = true
2021-12-03 09:08:12 +01:00
Claudio Atzori
9cac283bec
implemented Instance serialization features requested in https://support.openaire.eu/issues/7156
2021-12-02 17:20:33 +01:00
Miriam Baglioni
d9f80488cc
[GRAPH DUMP] Add one more test to check the filtering of the relations
2021-12-02 14:15:19 +01:00
Miriam Baglioni
58bc3f223a
[GRAPH DUMP] Add filtering for relation we do not want to dump. It is based on the relclass
2021-12-02 14:09:46 +01:00
Miriam Baglioni
8905a39bf3
merging with branch beta
2021-12-02 13:17:29 +01:00
Miriam Baglioni
87eedad898
-
2021-12-02 13:17:19 +01:00
Claudio Atzori
3b19821f3c
added stats computation on the graph hive DB tables
2021-12-02 10:44:10 +01:00
Claudio Atzori
cfa4560769
minor: fixed hive action name
2021-12-02 10:43:36 +01:00
Claudio Atzori
d85af6fc25
[cleaning wf] fixed OAF record navigation, a mapping defined on a container object would have prevented the navigation from continuing on its properties
2021-12-01 15:49:15 +01:00
Claudio Atzori
4fe7888817
code formatting
2021-12-01 15:48:15 +01:00
Claudio Atzori
01e5e0142a
added test to verify the relation inverse lookup operation
2021-12-01 09:46:26 +01:00
Antonis Lempesis
d05210ba99
finished migration to hive only
2021-11-30 19:01:48 +02:00
Claudio Atzori
0df9574a6f
Merge pull request '[stats wf] Added sprint 3&4 of indicators' ( #166 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #166
2021-11-29 10:40:26 +01:00
Claudio Atzori
1de881b796
resolved conflicts for #165
2021-11-26 16:15:11 +01:00
Claudio Atzori
014e872ae1
[resolution wf] added optional parameter to skip the entity resolution
2021-11-26 15:38:56 +01:00
Claudio Atzori
5c6d328537
code formatting
2021-11-26 15:38:16 +01:00
Antonis Lempesis
12749a0a77
first
2021-11-26 15:40:40 +02:00
Sandro La Bruzzo
bb7f556eff
Merge remote-tracking branch 'origin/beta' into beta
2021-11-25 13:03:25 +01:00
Sandro La Bruzzo
1e1f5e4fe0
minor fix
2021-11-25 13:03:17 +01:00
Miriam Baglioni
ac07ed8251
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-11-25 12:32:58 +01:00
Miriam Baglioni
5fd0e610bf
[DOIBOOST Process] fix filtering to filter results with non null id
2021-11-25 12:10:45 +01:00
Sandro La Bruzzo
feea154e89
remove working dir after test
2021-11-25 11:02:38 +01:00
Sandro La Bruzzo
028a8acad8
add test resources
2021-11-25 10:54:47 +01:00
Sandro La Bruzzo
2164a2a889
Datacite: Code Refactor generated a general SparkApplication Scala where all the spark scala have to inherit
...
Commented a little the Datacite transformation code
2021-11-25 10:54:13 +01:00
Miriam Baglioni
3f9b2ba8ce
[Hosted By Map] fix issue in test
2021-11-22 16:59:43 +01:00
Sandro La Bruzzo
a7cf277d98
Datacite: Removed HostedBy Patch as described on ticket #7219 , Now all the records will have hosted by Unknown Repository
2021-11-22 16:03:17 +01:00
Sandro La Bruzzo
483d3039d1
entity resolution: added distcp of missing entities in graph materialization
2021-11-22 15:55:24 +01:00
Sandro La Bruzzo
93fe8ce8b2
entity resolution: fix test
2021-11-22 15:50:43 +01:00
Sandro La Bruzzo
35e20b0647
updated resolution wf:
...
- generate a new version of the graph
- changed merge from union to join
2021-11-22 11:48:55 +01:00
Miriam Baglioni
fdb75b180e
[Cleaning] added couple of tests for DOIBOOST publications
2021-11-21 16:35:22 +01:00
Miriam Baglioni
0506fa2654
[Graph Dump] changed to mirror the changes in the model
2021-11-19 15:56:25 +01:00
Sandro La Bruzzo
6110a2b984
reverted version
2021-11-19 15:31:45 +01:00
Sandro La Bruzzo
65ebe1019b
updated wagon-ssh version
2021-11-19 14:59:04 +01:00
Sandro La Bruzzo
155d8bf83f
updated maven site plugin on dhp-code-style
2021-11-19 14:51:08 +01:00
Sandro La Bruzzo
3426451d3f
Merge remote-tracking branch 'origin/beta' into beta
2021-11-19 14:49:04 +01:00
Sandro La Bruzzo
75298ec442
added site.xml to code style
2021-11-19 14:48:44 +01:00
Sandro La Bruzzo
4542a2338b
updated site configuration to deploy on website
2021-11-19 13:44:08 +01:00
Claudio Atzori
90c2a4987e
Merge pull request '[fix] preserve parent/child relations from OpenOrgs' ( #164 ) from preserve_openorg_parent_child_relations into beta
...
Reviewed-on: #164
2021-11-19 11:35:55 +01:00
Claudio Atzori
e5a2c596b2
Merge branch 'beta' into preserve_openorg_parent_child_relations
2021-11-19 11:35:46 +01:00
Claudio Atzori
f4538f3c4c
cleanup
2021-11-19 11:33:10 +01:00
Claudio Atzori
2b46b87f56
fixed filtering criteria applied in SparkCopyRelationsNoOpenorgs to keep the parent/child relations from OpenOrgs
2021-11-19 11:30:29 +01:00
Miriam Baglioni
9fae872181
[Graph Dump] changed to mirror the changes in the model
2021-11-19 11:25:50 +01:00
Sandro La Bruzzo
fc03c99805
fixed javadocs url after deploying site
2021-11-19 10:46:33 +01:00
Sandro La Bruzzo
8a7c7d36db
Merge pull request 'mvn_site_documentation' ( #161 ) from mvn_site_documentation into beta
...
Reviewed-on: #161
2021-11-19 09:54:53 +01:00
Sandro La Bruzzo
0c0d561bc4
added public class into tests to create correct javadoc
2021-11-19 09:54:22 +01:00
Claudio Atzori
62fa61f3cf
merge from beta
2021-11-19 09:23:42 +01:00
Claudio Atzori
bd9a43cefd
Revert to 4094f2bb9a
2021-11-19 09:20:43 +01:00
Claudio Atzori
3a4d925386
Merge branch 'beta' into hierarchical_orgs_relations
2021-11-18 18:07:08 +01:00
Claudio Atzori
3974fa7dc1
Merge branch 'beta' into affiliationPropagation
2021-11-18 18:06:26 +01:00
Claudio Atzori
a24b9f8268
[dedup] trivial refactoring
2021-11-18 17:12:02 +01:00
Claudio Atzori
c0750fb17c
avoid non necessary count operations over large spark datasets
2021-11-18 17:11:31 +01:00
Claudio Atzori
bb5dca7979
cleanup
2021-11-18 17:10:46 +01:00
Miriam Baglioni
793b5a8e5f
Update 'dhp-workflows/dhp-graph-mapper/src/main/java/eu/dnetlib/dhp/oa/graph/dump/ResultMapper.java'
...
Removing the dump of Measure at the level of the result. We decided not to map it
2021-11-18 14:49:38 +01:00
Miriam Baglioni
5dc5792722
[Graph Dump] Change test resource to mirror the movement of the measure element
2021-11-18 14:39:12 +01:00
Miriam Baglioni
0136a8c266
[Graph Dump] Change test to mirror that measure is at the level of the instance
2021-11-18 14:38:33 +01:00
Miriam Baglioni
1b79c0ee79
merging with branch beta
2021-11-18 11:01:00 +01:00
Claudio Atzori
10a32f287f
Merge pull request '[stats wf] RIs, affiliations, parquet' ( #156 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #156
2021-11-17 15:02:25 +01:00
Sandro La Bruzzo
9c82d670b8
make class public in order to create javadoc
2021-11-17 12:31:02 +01:00
Sandro La Bruzzo
1f5ee116ed
code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala
...
fixed test
2021-11-17 12:23:52 +01:00
Sandro La Bruzzo
2fd9ceac13
code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala
2021-11-17 11:35:22 +01:00
Sandro La Bruzzo
60ae874dcb
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into mvn_site_documentation
2021-11-17 11:08:34 +01:00
Sandro La Bruzzo
2506d7a679
Merge branch 'mvn_site_documentation' of code-repo.d4science.org:D-Net/dnet-hadoop into mvn_site_documentation
2021-11-17 11:07:24 +01:00
Sandro La Bruzzo
cded363b55
code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala
2021-11-17 11:06:35 +01:00
Miriam Baglioni
4094f2bb9a
added integration md file
2021-11-17 10:04:52 +01:00
Miriam Baglioni
ec8b0219ff
[Documentation] Added first page for Integration via unresolved entities generation
2021-11-16 17:41:34 +01:00
Miriam Baglioni
2bbece2ca5
merging with branch beta
2021-11-16 16:35:40 +01:00
Sandro La Bruzzo
2d67020c59
added dhp-enrichment maven site template
2021-11-16 16:01:08 +01:00
Miriam Baglioni
28ea532ece
[Affiliation Propagation] moved the selection of graph relation as a preparation step
2021-11-16 15:24:19 +01:00
Sandro La Bruzzo
18c1d70ef4
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into mvn_site_documentation
2021-11-16 15:16:49 +01:00
Sandro La Bruzzo
a1cafaf2e3
added mvn site for dnet-hadoop project
2021-11-16 15:16:28 +01:00
Miriam Baglioni
7c96e3fd46
removed not useful dir
2021-11-16 13:57:26 +01:00
Miriam Baglioni
c7c0c3187b
[AFFILIATION PROPAGATION] Applied some SonarLint suggestions
2021-11-16 13:56:32 +01:00
Miriam Baglioni
c6a9f0a1a8
merging with branch beta
2021-11-16 12:04:40 +01:00
Miriam Baglioni
99d86134f5
[Graph Dump] changed the dump since the measures have been moved at the level of the instance
2021-11-16 12:04:21 +01:00
Miriam Baglioni
6595135a1a
[Dump Schemas] changed the schema of the dumped result according to the modifications in the bestAccessRight type
2021-11-12 11:45:38 +01:00
Miriam Baglioni
43cae4ad88
Merge branch 'dump' of https://code-repo.d4science.org/D-Net/dnet-hadoop into dump
2021-11-12 11:36:54 +01:00
Miriam Baglioni
b3f9370125
merge with beta - resolved conflict in pom
2021-11-12 11:25:26 +01:00
Miriam Baglioni
b8bdabfae9
[Graph Dump] removed OpenAccessRoute from test in best access right
2021-11-11 16:16:48 +01:00
Miriam Baglioni
e5498052e8
[Graph Dump] removed OpenAccessRoute from test in best access right
2021-11-11 16:14:10 +01:00
Miriam Baglioni
8cc50ecee0
[Graph Dump] changed AccessRight with BestAccessRight in the dump and modified the dependency to the schema to the SNAPSHOT
2021-11-11 08:59:20 +01:00
Miriam Baglioni
88b73f4f49
merging with branch beta
2021-11-10 17:00:52 +01:00
Alessia Bardi
fc8fceaac3
create direct link to WT projects as well
2021-11-10 14:11:52 +01:00
Alessia Bardi
6cd91004e3
fixed DOI for Wellcome Trust in mapping relationships from Crossref
2021-11-09 12:22:57 +01:00
Alessia Bardi
b9d4f115cc
fixed Crossref mapping for SFI projects
2021-11-09 12:04:45 +01:00
Miriam Baglioni
94918a673c
[Graph DUMP] Fix issue for empty originalId list
2021-11-08 10:25:28 +01:00
Claudio Atzori
9cb8e4ad21
Merge branch 'beta' into hierarchical_orgs_relations
2021-11-08 09:40:24 +01:00
Miriam Baglioni
4c70201412
merging with branch beta
2021-11-05 12:29:56 +01:00
Miriam Baglioni
8442efd8d1
[Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE
2021-11-05 12:29:22 +01:00
Claudio Atzori
5681e89544
Update 'dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump/schemas/result_schema.json'
2021-11-05 12:18:24 +01:00
Miriam Baglioni
a22c29fba1
[Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE
2021-11-05 12:08:33 +01:00
Miriam Baglioni
c10ff6928c
[Graph DUMP] add schema of the dump related to the model as in dhp-schemas.2.8.31. Note the measure element at the level of the result has been removed because of issues on where to display it: at the level of the result or at the level of the entity
2021-11-05 11:36:21 +01:00
Miriam Baglioni
0857849a86
[Graph DUMP] Remove dump of measure until it will be clear where to put it (at the level of result or at the level of the instance)
2021-11-05 11:02:37 +01:00
miconis
8f1db32921
implementation of the instance type comparator and its tests
2021-11-04 15:20:57 +01:00
Miriam Baglioni
b9d124bb7c
[Enrichment: Propagation through parent-child relationships] Added counters, and changed constraint to verify if filtering out the relation (from classname = harvested to classid != propagation)
2021-11-03 13:55:37 +01:00
Antonis Lempesis
b97b78f874
removed hardcoded reference
2021-11-02 09:12:49 +01:00
Miriam Baglioni
2aca6bfa0a
merging with branch beta
2021-10-29 11:20:45 +02:00
Miriam Baglioni
09f36cffb8
[Enrichment: Propagation through parent-child relationships] First implementation, testing, and wf for propagation of result to organization through semantic relation
2021-10-29 11:20:03 +02:00
Claudio Atzori
d02caef185
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-27 15:36:29 +02:00
Miriam Baglioni
d0ef7d91c5
adding test resource
2021-10-26 17:34:11 +02:00
Michele Artini
d66e20e7ac
added hierarchy rel in ROR actionset
2021-10-21 15:51:48 +02:00
Claudio Atzori
cece432adc
[stats] reducing the step22 wait time
2021-10-20 14:16:33 +02:00
Antonis Lempesis
a7376907c2
invalidating metadata before context thingies
2021-10-20 14:16:25 +02:00
Antonis Lempesis
43f4eb492b
fetching affiliated results for 4 orgs in monitor. fixed affiliated orgs in stats db
2021-10-20 14:16:11 +02:00
Miriam Baglioni
652114c641
[affiliationPropagation] first try. preparation
2021-10-20 11:44:23 +02:00
Michele Artini
c4fce785ab
fixed a compilation problem of a unit test
2021-10-19 16:18:26 +02:00
Claudio Atzori
172363e7f1
[broker] integrating PR#147, notification record creation phase separated from indexing on ES
2021-10-19 15:56:27 +02:00
Claudio Atzori
bdffa86c2f
undo last commit
2021-10-19 15:39:38 +02:00
Claudio Atzori
e471f12d5e
hotfix: recovered implementation removing the hardcoded working_dirs
2021-10-19 12:35:38 +02:00
Claudio Atzori
e15a1969a5
applying fix on the DOIBoost construction process that somehow wasn't part of the merge done in 83c90c7180
2021-10-14 14:33:56 +02:00
Miriam Baglioni
4b1920f008
changed the working path parameter value to depend on the dnet-workflow working dir parameter
2021-10-14 09:18:09 +02:00
Miriam Baglioni
8db39c86e2
added new parameter in the doiboost process workflow to specify a folder for processing the MAG dataset
2021-10-14 09:17:39 +02:00
Claudio Atzori
2f61054cd1
code formatting
2021-10-11 18:29:42 +02:00
Claudio Atzori
83c90c7180
manually merging PR#149 #149
2021-10-11 18:27:05 +02:00
Michele Artini
d6e1f22408
max numbers of workers for indexing
2021-10-05 15:09:18 +02:00
Michele Artini
210d6c0e6d
generateNotificationsJob and indexNotificationsJob
2021-10-05 13:57:46 +02:00
Michele Artini
69008e20c2
log and tests
2021-10-05 11:58:20 +02:00
Michele Artini
8bbaa17335
reimplementation of the conditions cache as a non-static variable
2021-10-05 09:20:37 +02:00
Michele Artini
0a9ef34b56
test
2021-10-04 15:46:12 +02:00
Michele Artini
31a6ad1d79
optimization of verifySubsriptions()
2021-10-04 12:01:56 +02:00
Claudio Atzori
b01cd521b0
removed configuration specifying the limit to 8 for spark.dynamicAllocation.maxExecutors
2021-10-01 11:26:33 +02:00
Claudio Atzori
ec94cc9b93
IndexNotificationsJob test: persist contents on HDFS instead of passing them to ES
2021-10-01 09:41:27 +02:00
miconis
fbb1b66bfb
dedup test implementation & graph drawing tools
2021-09-13 14:53:19 +02:00
Sandro La Bruzzo
370dddb2fa
fix bug in the oai iterator that skips cleaned records
2021-09-07 11:20:41 +02:00
Claudio Atzori
d64a942a76
fixed MappersTest
2021-08-09 12:32:26 +02:00
Claudio Atzori
a45b95ccc1
resolving conflicts for PR#134
2021-08-09 10:50:03 +02:00
miconis
1144d50a11
[maven-release-plugin] prepare for next development iteration
2021-05-03 16:09:56 +02:00
miconis
f33a18ca9d
[maven-release-plugin] prepare release dnet-dedup-4.1.7
2021-05-03 16:09:08 +02:00
miconis
4bce4f2e8e
minor change: version updated
2021-05-03 16:05:39 +02:00
miconis
c6266242e3
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-dedup
2021-05-03 15:38:00 +02:00
miconis
4988e9f80d
implementation of cross comparison for different fields, addition of clustering mechanism to collapse keys from different clustering functions on the same cluster
2021-05-03 15:37:41 +02:00
Claudio Atzori
58d013e24f
[maven-release-plugin] prepare for next development iteration
2021-04-12 16:12:15 +02:00
Claudio Atzori
3a7336157b
[maven-release-plugin] prepare release dnet-dedup-4.0.6
2021-04-12 16:12:10 +02:00
miconis
ed0d5d3e1d
implementation of the wf to dedup entities, addition of the module to run the wf on the cluster
2020-12-04 15:41:31 +01:00
miconis
3f2d3253e4
Merge branch 'stable_ids' into deduptesting
2020-11-05 15:52:57 +01:00
miconis
1699d41d39
relations for openorgs: now it chooses only one master
2020-11-05 15:48:42 +01:00
miconis
72116446ec
[maven-release-plugin] prepare for next development iteration
2020-09-29 12:06:38 +02:00
miconis
05a03d97cd
[maven-release-plugin] prepare release dnet-dedup-4.0.5
2020-09-29 12:06:35 +02:00
miconis
2a01022712
minor changes
2020-09-29 12:05:50 +02:00
miconis
dd34e371d7
fixed error in the treeprocessor. it used th=-1 as default value, now it uses th=1
2020-09-29 12:01:25 +02:00
miconis
19c3c90d7b
fixed error in the block processor: entities with orderField=null were not considered
2020-09-19 17:43:41 +02:00
Sandro La Bruzzo
a109ebe287
fixed NPE
2020-08-06 10:27:05 +02:00
miconis
a5a3ea24f8
[maven-release-plugin] prepare for next development iteration
2020-07-16 18:59:25 +02:00
miconis
840fe8f4d3
[maven-release-plugin] prepare release dnet-dedup-4.0.4
2020-07-16 18:59:22 +02:00
miconis
07ab904d60
implementation of the clustering function for the suffixprefix chain
2020-07-16 18:57:55 +02:00
Claudio Atzori
eaf7defe0c
[maven-release-plugin] prepare for next development iteration
2020-07-15 17:57:09 +02:00
Claudio Atzori
ff2c8eba12
[maven-release-plugin] prepare release dnet-dedup-4.0.3
2020-07-15 17:57:04 +02:00
Claudio Atzori
7cc3742a26
removed maven release.property
2020-07-15 17:52:27 +02:00
Claudio Atzori
14611ea450
reverted to 4.0.3-SNAPSHOT
2020-07-15 17:37:36 +02:00
Claudio Atzori
9f20f23870
Revert "wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files"
...
This reverts commit 51d91fa520.
2020-07-15 17:35:56 +02:00
Claudio Atzori
9efcd8e245
Revert "reverted to 4.0.3-SNAPSHOT"
...
This reverts commit ec97983ce1.
2020-07-15 17:28:37 +02:00
Claudio Atzori
ba493f9ab8
[maven-release-plugin] rollback the release of dnet-dedup-4.0.3
2020-07-15 17:24:43 +02:00
Claudio Atzori
6c98d4c436
[maven-release-plugin] prepare release dnet-dedup-4.0.3
2020-07-15 17:24:25 +02:00
Claudio Atzori
ec97983ce1
reverted to 4.0.3-SNAPSHOT
2020-07-15 17:20:12 +02:00
Claudio Atzori
51d91fa520
wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files
2020-07-15 17:13:45 +02:00
Claudio Atzori
b79ea97107
Revert "wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files"
...
This reverts commit d2861950ac.
2020-07-15 17:11:46 +02:00
Claudio Atzori
92aadbfc7b
[maven-release-plugin] prepare release dnet-dedup-4.0.3
2020-07-15 17:04:20 +02:00
Claudio Atzori
d2861950ac
wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files
2020-07-15 16:49:47 +02:00
miconis
244a037a90
implementation of a class to test the clustering functions
2020-07-12 10:13:54 +02:00
miconis
7aa2001a8b
[maven-release-plugin] prepare for next development iteration
2020-07-02 17:06:38 +02:00
miconis
c72055f543
[maven-release-plugin] prepare release dnet-dedup-4.0.2
2020-07-02 17:06:36 +02:00
miconis
f933fd33e0
implemented new function for clustering
2020-07-02 17:04:17 +02:00
miconis
411d1cc24f
implementation of the test for the dedup and addition of new support classes
2020-06-11 10:46:46 +02:00
miconis
48c094f599
[maven-release-plugin] prepare for next development iteration
2020-04-24 14:39:01 +02:00
miconis
4365ba41c9
[maven-release-plugin] prepare release dnet-dedup-4.0.1
2020-04-24 14:38:58 +02:00
miconis
6e9b27f37d
implementation of the mechanism to truncate the string and the lists
2020-04-24 14:36:42 +02:00
Sandro La Bruzzo
8e4211708e
[maven-release-plugin] prepare for next development iteration
2020-02-10 12:51:04 +01:00
Sandro La Bruzzo
24e2ab9092
[maven-release-plugin] prepare release dnet-dedup-4.0.0
2020-02-10 12:50:45 +01:00
Sandro La Bruzzo
46727f5c76
upgraded maven version of commons-lang
2020-02-10 12:38:40 +01:00
miconis
5c8f6febee
minor changes in comparators
2020-01-24 10:01:11 +01:00
miconis
4dce785375
update in the implementation of the tree: addition of new logic aggregations and statistics
2020-01-14 11:42:43 +02:00
miconis
b3748b8d77
minor changes
2019-12-18 16:20:35 +01:00
miconis
b21b1b8f61
implementation of new aggregation in the tree node processing
2019-12-18 16:19:36 +01:00
miconis
20fcfe6328
implementation of new aggregation in the tree node processing
2019-12-18 16:19:26 +01:00
Sandro La Bruzzo
d924f28b93
fixed wrong use of jspath
2019-12-18 09:29:44 +01:00
miconis
84aaa65501
implementation of new json comparator and update of the publication configuration
2019-12-17 09:16:26 +01:00
Sandro La Bruzzo
5c01ae4c92
merged JqMapping branch into tree2
2019-12-13 11:30:02 +01:00
Sandro La Bruzzo
35008fdbf9
fix stuff
2019-12-06 15:28:30 +01:00
Sandro La Bruzzo
16c670a5d5
Improved deduplication
2019-12-05 14:14:25 +01:00
miconis
49f9beb4a8
implementation of romansmatch and re-implementation of the getNumber function. New terms in the translation map and update of the configuration
2019-11-28 16:54:44 +01:00
miconis
f791730330
addition of one term to the translation maps in the configurations
2019-11-27 15:48:37 +01:00
miconis
d2278fe358
minor change in the citymatch
2019-11-21 10:54:02 +01:00
miconis
8c0d346005
the param map has been updated: now it accepts string parameters
2019-11-21 09:37:56 +01:00
miconis
ddd40540aa
jarowinklernormalizedname split into 3 different comparators: citymatch, keywordmatch and jarowinkler. Implementation of the TreeStatistic support functions
2019-11-20 10:45:00 +01:00
miconis
c687956371
code cleaning and implementation of the TreeDedup + minor changes
2019-11-14 10:01:21 +01:00
miconis
0973899865
code cleaning, distribution of the classes in packages and implementation of the new configuration
2019-11-07 12:47:12 +01:00
miconis
30a873265f
put the last modification of the master branch into the tree2. Addition of the configuration as parameter of the comparator. This is to allow the comparator to access it
2019-10-29 16:38:42 +01:00
miconis
1beb776691
minor changes
2019-10-29 15:58:21 +01:00
miconis
075f741d28
[maven-release-plugin] prepare for next development iteration
2019-10-24 11:34:19 +02:00
miconis
ced4bcdd59
[maven-release-plugin] prepare release dnet-dedup-3.0.15
2019-10-24 11:34:12 +02:00
miconis
13f93e6055
Revert "[maven-release-plugin] prepare release dnet-dedup-3.0.15"
...
This reverts commit cf93515d94.
2019-10-24 11:23:01 +02:00
miconis
cf93515d94
[maven-release-plugin] prepare release dnet-dedup-3.0.15
2019-10-24 11:17:07 +02:00
miconis
285ec3ca17
release rollback
2019-10-24 11:11:07 +02:00
miconis
5f249fd56c
minor changes
2019-10-23 16:37:20 +02:00
miconis
c9863debfa
minor changes and configuration updates (synonym field added)
2019-10-23 16:31:45 +02:00
miconis
5499ca17c3
minor changes
2019-10-08 16:49:07 +02:00
miconis
50b7a12b3f
normalization of the term in the translation map added
2019-10-08 15:13:45 +02:00
miconis
26b383fea2
translation map moved in json configuration, support for synonyms added in the configuration, now the configuration is argument of conditions, distancealgos and clusteringfunctions
2019-10-08 14:53:52 +02:00
Claudio Atzori
07355d2811
[maven-release-plugin] prepare for next development iteration
2019-09-25 10:39:46 +02:00
Claudio Atzori
254eb46809
[maven-release-plugin] prepare release dnet-dedup-3.0.14
2019-09-25 10:39:39 +02:00
Claudio Atzori
74c6462b49
updated translation map and some tests
2019-09-25 10:15:13 +02:00
miconis
aed81e4cfa
translation map updated
2019-09-25 09:53:06 +02:00
miconis
afd2b398d5
optimize imports
2019-08-09 15:42:41 +02:00
miconis
d71dae5fd2
implementation of the conditions in tree nodes. get rid of the conditions part of the configuration
2019-08-09 15:41:49 +02:00
miconis
a5c5d2f01b
implementation of the decision tree. It takes the place of the distance algos; necessaryConditions and sufficientConditions are still there. The model contains only path, type and name of the field. ignoreMissing is still in the model because it is used by the conditions.
2019-08-09 10:08:34 +02:00
miconis
f2136e1024
code refactoring: useless module removed
2019-08-07 15:16:59 +02:00
miconis
8c867101ef
addition of a fixSpecial function to address the problem with special characters in organization names, addition of new terms in translation maps
2019-08-06 17:06:05 +02:00
miconis
4502b44337
addition of the BlockUtils class for meta-blocking, implementation of a new local test with edge filtering example
2019-08-06 12:09:34 +02:00
miconis
cffb712a99
Merge branch 'master' of https://github.com/dnet-team/dnet-dedup
2019-07-19 17:10:53 +02:00
miconis
a85576c27e
restyling of the JaroWinklerNormalizedName comparator, now it is optimized. Addition of some translations in the translation maps, addition of a clustering based on keywords in organizations legalnames
2019-07-19 17:10:29 +02:00
Claudio Atzori
6cb846331a
[maven-release-plugin] prepare for next development iteration
2019-07-08 11:12:52 +02:00
Claudio Atzori
c04d2232c2
[maven-release-plugin] prepare release dnet-dedup-3.0.13
2019-07-08 11:12:45 +02:00
miconis
fb5e38db26
Merge branch 'master' of https://github.com/dnet-team/dnet-dedup
2019-07-08 11:02:29 +02:00
miconis
3c6f8d1e44
bug fixing in the keywordsclustering class
2019-07-08 11:01:49 +02:00
Claudio Atzori
a69022617d
[maven-release-plugin] prepare for next development iteration
2019-07-08 10:11:24 +02:00
Claudio Atzori
c6baeb93d4
[maven-release-plugin] prepare release dnet-dedup-3.0.12
2019-07-08 10:11:17 +02:00
miconis
f5de20a508
[maven-release-plugin] rollback the release of dnet-dedup-3.0.12
2019-07-08 10:00:48 +02:00
miconis
ba50aa8654
[maven-release-plugin] prepare for next development iteration
2019-07-08 09:48:10 +02:00
miconis
7065110a21
[maven-release-plugin] prepare release dnet-dedup-3.0.12
2019-07-08 09:48:03 +02:00
miconis
15bec5e876
addition of doi normalization in PidMatch comparator, addition of keywordsclustering (clustering based on terms in the translation maps for the organizations), minor changes
2019-07-08 09:44:02 +02:00
Claudio Atzori
2dcffb965f
[maven-release-plugin] prepare for next development iteration
2019-06-19 10:02:39 +02:00
Claudio Atzori
85126c59f7
[maven-release-plugin] prepare release dnet-dedup-3.0.11
2019-06-19 10:02:32 +02:00
Claudio Atzori
15d7b584f3
optimized classpath resolvers
2019-06-19 10:01:35 +02:00
Claudio Atzori
ff4956def9
[maven-release-plugin] prepare for next development iteration
2019-06-18 14:46:34 +02:00
Claudio Atzori
eb5ce312a3
[maven-release-plugin] prepare release dnet-dedup-3.0.10
2019-06-18 14:46:27 +02:00
Claudio Atzori
f2bc665403
avoid dividing by zero: in case of missing values, return undefined response
2019-06-18 14:45:15 +02:00
Claudio Atzori
e3f86b92c8
cleanup
2019-06-18 14:44:42 +02:00
miconis
54e4d0af04
exact match condition gives undefined if a field is missing, ignoremissing semantics changed: now performs the comparison in any case if =true, if false gives -1 in case of missing
2019-06-18 14:05:31 +02:00
miconis
e8db8f2abb
implementation of the integration test, addition of document blocks to group entities after clustering
2019-05-21 16:38:26 +02:00
Claudio Atzori
f7a3bdf3f8
[maven-release-plugin] prepare for next development iteration
2019-04-03 12:35:00 +02:00
Claudio Atzori
98c179c8fb
[maven-release-plugin] prepare release dnet-dedup-3.0.9
2019-04-03 12:34:52 +02:00
miconis
3e61a90c8f
[maven-release-plugin] rollback the release of dnet-dedup-3.0.9
2019-04-03 12:27:28 +02:00
miconis
15fb9eb883
[maven-release-plugin] prepare for next development iteration
2019-04-03 12:26:05 +02:00
miconis
a1ff4daa7f
[maven-release-plugin] prepare release dnet-dedup-3.0.9
2019-04-03 12:25:56 +02:00
miconis
1d29bae47c
branch cities merged into master
2019-04-03 12:22:33 +02:00
miconis
7e7018c51f
addition of a sparktester test, implementation of 2 different classes for testing in dnet-dedup-test module, addition of new terms in the vocabulary and change in the implementation of the JaroWinklerNormalizedName comparator
2019-04-03 09:40:14 +02:00
miconis
4bd5a9beee
minor changes
2019-03-26 15:48:21 +01:00
Michele De Bonis
662448e584
update of the comparator for legalnames of organizations
2019-03-21 14:27:27 +01:00
Claudio Atzori
f2394fcd9f
[maven-release-plugin] prepare for next development iteration
2019-02-18 09:09:14 +01:00
Claudio Atzori
722431dde1
[maven-release-plugin] prepare release dnet-dedup-3.0.8
2019-02-18 09:09:07 +01:00
Claudio Atzori
470c4b0f20
default configuration includes configurationId
2019-02-18 09:07:23 +01:00
Claudio Atzori
ccb7e83196
[maven-release-plugin] prepare for next development iteration
2019-02-17 12:56:19 +01:00
Claudio Atzori
7d8e62d4cc
[maven-release-plugin] prepare release dnet-dedup-3.0.7
2019-02-17 12:56:11 +01:00
Claudio Atzori
968cd47436
replace existing attributes when loading default configuration
2019-02-17 12:48:25 +01:00
Michele De Bonis
0735f3a822
implementation of the test classes and minor changes
2019-02-08 12:56:47 +01:00
Michele De Bonis
7a8d28991f
implementation of the decision tree for the deduplication of the authors, implementation of multiple comparators to be used in a tree node and definition of the proto for person entity
2018-12-20 09:54:41 +01:00
Michele De Bonis
39613dbbd6
implementation of the decision tree, addition of the dnet-openaire-data-protos module, definition of the person proto, blockprocessor and paceconfig modified with addition of support for the tree processing
2018-12-12 16:30:03 +01:00
Claudio Atzori
f1c68d8ba3
apply limits (length, size) to pace Fields
2018-11-20 10:51:38 +01:00
Claudio Atzori
c5979ffe18
[maven-release-plugin] prepare for next development iteration
2018-11-19 17:41:45 +01:00
Claudio Atzori
9869dff1d2
[maven-release-plugin] prepare release dnet-dedup-3.0.6
2018-11-19 17:41:37 +01:00
Claudio Atzori
c2d4cb3ba6
added new properties to FieldDef (size, length) to limit the information mapped onto each MapDocument
2018-11-19 17:37:57 +01:00
Claudio Atzori
394fcafd41
[maven-release-plugin] prepare for next development iteration
2018-11-17 09:13:16 +01:00
Claudio Atzori
397554130c
[maven-release-plugin] prepare release dnet-dedup-3.0.5
2018-11-17 09:13:09 +01:00
Claudio Atzori
0dfb2ea600
added distance function for software titles
2018-11-17 09:11:38 +01:00
Michele De Bonis
3d4372ced9
addition of cities check
2018-11-16 16:11:03 +01:00
Claudio Atzori
55a9b4f501
[maven-release-plugin] prepare for next development iteration
2018-11-16 09:18:00 +01:00
Claudio Atzori
35ab630493
[maven-release-plugin] prepare release dnet-dedup-3.0.4
2018-11-16 09:17:53 +01:00
Claudio Atzori
399e4bc80f
default (empty) configuration should be aligned with the updated model
2018-11-15 16:52:56 +01:00
Claudio Atzori
59bab8dba4
less verbose logging
2018-11-13 09:07:45 +01:00
Claudio Atzori
478ad72cb8
propagate exceptions in case of serialization errors, removed configuration pretty printing, removed unused class ScoredResult
2018-11-12 15:52:18 +01:00
Claudio Atzori
f7616c7a8a
[maven-release-plugin] prepare for next development iteration
2018-11-12 14:23:36 +01:00
Claudio Atzori
df4b871c8b
[maven-release-plugin] prepare release dnet-dedup-3.0.3
2018-11-12 14:23:29 +01:00
Michele De Bonis
72a9b3139e
Merge branch 'master' of https://github.com/dnet-team/dnet-dedup
2018-11-12 14:11:26 +01:00
Michele De Bonis
b5062f5429
configuration file updated, addition of condition on domain
2018-11-12 14:11:15 +01:00
Claudio Atzori
2a509b18fa
[maven-release-plugin] prepare for next development iteration
2018-11-12 12:46:50 +01:00
Claudio Atzori
e247218987
[maven-release-plugin] prepare release dnet-dedup-3.0.2
2018-11-12 12:46:42 +01:00
Claudio Atzori
b7bc7f0401
getting rid of spark libs from dnet-pace-core
2018-11-12 12:46:06 +01:00
Claudio Atzori
3dacba37ea
[maven-release-plugin] prepare for next development iteration
2018-11-12 11:40:42 +01:00
Claudio Atzori
8cc2517f5d
[maven-release-plugin] prepare release dnet-dedup-3.0.1
2018-11-12 11:40:34 +01:00
Claudio Atzori
851ae5eec3
[maven-release-plugin] rollback the release of dnet-dedup-3.0.1
2018-11-12 11:39:07 +01:00
Claudio Atzori
f283d58a6e
[maven-release-plugin] prepare release dnet-dedup-3.0.1
2018-11-12 11:38:52 +01:00
Claudio Atzori
6d09041288
[maven-release-plugin] rollback the release of dnet-dedup-3.0.1
2018-11-12 11:28:28 +01:00
Claudio Atzori
46cee13596
[maven-release-plugin] prepare for next development iteration
2018-11-12 11:24:06 +01:00
Claudio Atzori
e1c69ad24e
[maven-release-plugin] prepare release dnet-dedup-3.0.1
2018-11-12 11:23:57 +01:00
Michele De Bonis
b247a86e69
configuration files changed: dedupRun instead of run, assertion updated in tests
2018-11-06 11:02:00 +01:00
Michele De Bonis
4c8485d0bb
deleted useless imports
2018-11-06 09:48:22 +01:00
Michele De Bonis
748189af10
implementation of JaroWinklerNormalizedName, addition of various stopwords in different languages and configuration test
2018-11-05 17:22:59 +01:00
Claudio Atzori
e296f7a81c
added DiffPatchMatch utility. Resumed commented tests!
2018-10-31 10:49:11 +01:00
Michele De Bonis
dc41b76643
serialization test added. useless getter methods ignored by json serialization
2018-10-29 16:16:11 +01:00
Michele De Bonis
ea36007d1f
DedupConf parsed using Jackson library
2018-10-29 11:13:55 +01:00
Michele De Bonis
8b4762bf54
implementation of the toString methods changed: from Gson to Jackson
2018-10-26 14:55:59 +02:00
Michele De Bonis
3cf3dc1934
modification in the initialization of clustering functions, distance algos and conditions.
2018-10-25 15:15:40 +02:00
Michele De Bonis
1cbbc3f15a
update in the discovery of clustering, conditions and distance functions (annotated with custom annotations)
2018-10-24 12:09:41 +02:00
Claudio Atzori
4d379c2227
revised PidMatch implementation, cleanup
2018-10-20 08:38:19 +02:00
Claudio Atzori
3197f26691
[maven-release-plugin] prepare for next development iteration
2018-10-18 12:17:34 +02:00
Claudio Atzori
63815be2d6
[maven-release-plugin] prepare release dnet-dedup-3.0.0
2018-10-18 12:17:27 +02:00
Claudio Atzori
ed14476b06
[maven-release-plugin] rollback the release of dnet-dedup-3.0.0
2018-10-18 12:13:03 +02:00
Claudio Atzori
82d5dce114
[maven-release-plugin] prepare release dnet-dedup-3.0.0
2018-10-18 12:12:45 +02:00
Claudio Atzori
4f29124607
[maven-release-plugin] rollback the release of dnet-dedup-3.0.0
2018-10-18 12:00:45 +02:00
Claudio Atzori
5a48937ae1
[maven-release-plugin] prepare for next development iteration
2018-10-18 11:58:43 +02:00
Claudio Atzori
5aec80345f
[maven-release-plugin] prepare release dnet-dedup-3.0.0
2018-10-18 11:58:36 +02:00
Claudio Atzori
1b46966383
updated maven project structure
2018-10-18 11:56:26 +02:00
Michele De Bonis
72ebf7c0f3
update of the spark test
2018-10-18 10:12:44 +02:00
Sandro La Bruzzo
1bb5c26e6d
Added FSpark Implementation of dedup
2018-10-11 15:19:20 +02:00
Sandro La Bruzzo
d1c73bcf90
Added First Implementation of Spark Test
2018-10-02 17:07:17 +02:00
Sandro La Bruzzo
476c3d7b07
added d-net pace core module and ignored target folder
2018-10-02 10:37:54 +02:00