Claudio Atzori
795e1b2629
Merge pull request '[graph indexing] sets spark memoryOverhead in the join operations to the same value used for the memory executor' ( #426 ) from provision_memoryOverhead into master
...
Reviewed-on: #426
2024-04-19 16:59:45 +02:00
Claudio Atzori
0c05abe50b
[graph indexing] sets spark memoryOverhead in the join operations to the same value used for the memory executor
2024-04-19 16:57:55 +02:00
Claudio Atzori
8fdd0244ad
Merge pull request 'Various fixes for the stats DB update workflow, step16-createIndicatorsTables.sql' ( #425 ) from stats_step16_fix into master
...
Reviewed-on: #425
2024-04-18 11:25:24 +02:00
Claudio Atzori
18fdaaf548
integrating suggestion from #9699 to improve the result_country table construction
2024-04-18 11:23:43 +02:00
Claudio Atzori
43e123c624
added column alias
2024-04-17 16:40:29 +02:00
Claudio Atzori
62a07b7add
added missing end of statement /*EOS*/
2024-04-17 15:13:28 +02:00
Claudio Atzori
96bddcc921
revised query implementation for indi_pub_gold_oa
2024-04-17 15:06:50 +02:00
Miriam Baglioni
0486cea4c4
removed the funder id : 100011062 Asian Spinal Cord Network, wrongly associated to Ireland
2024-04-16 15:36:40 +02:00
Claudio Atzori
013935c593
Merge pull request 'Improvements to copying data from ocean to impala' ( #420 ) from antonis.lempesis/dnet-hadoop:beta into master
...
Reviewed-on: #420
2024-04-16 14:17:47 +02:00
Lampros Smyrnaios
d7da4f814b
Minor updates to the copying operation to Impala Cluster:
...
- Improve logging.
- Code optimization/polishing.
2024-04-12 18:12:06 +03:00
Lampros Smyrnaios
14719dcd62
Miscellaneous updates to the copying operation to Impala Cluster:
...
- Update the algorithm for creating views that depend on other views.
- Add check for successful execution of the "hadoop distcp" command.
- Add a check for successful copy operation of all entities.
- Upon facing an error in a DB, exit the method, instead of the whole script.
- Improve logging.
- Code polishing.
2024-04-12 15:36:13 +03:00
Lampros Smyrnaios
22745027c8
Use the "HADOOP_USER_NAME" value from the "workflow-property", in "copyDataToImpalaCluster.sh", in "stats-monitor-updates".
2024-04-11 17:46:33 +03:00
Lampros Smyrnaios
abf0b69f29
Upgrade the copying operation to Impala Cluster:
...
- Use only hive commands in the Ocean Cluster, as the "impala-shell" will be removed from there to free-up resources.
- Hugely improve the performance in every aspect of the copying process: a) speedup file-transferring and DB-deletion, b) eliminate permissions-assignment, "load" operations and "use $db" queries, c) retry only the "create view" statements and only as long as they depend on other non-created views, instead of trying to recreate all tables and views 5 consecutive times.
- Add error-checks for the creation of tables and views.
2024-04-11 17:12:12 +03:00
Claudio Atzori
6132bd028e
Merge pull request 'Extend Crossref-funders mapping and datacite hostedbymap' ( #417 ) from CrossrefFundersMap into master
...
Reviewed-on: #417
2024-04-09 10:30:53 +02:00
Miriam Baglioni
519db1ddef
Extended mapping of funder from crossref ( #9169 , #9277 ) and changed the correspondence files for the Irish funders ( #9635 ). Extended the datacite map to include the association between metadata and the EBRAINS datasource (SciLake)
2024-04-09 09:33:09 +02:00
Claudio Atzori
5add51f38c
Merge pull request 'fixed the result_country definition and updated the stats DB copy procedure' ( #412 ) from antonis.lempesis/dnet-hadoop:beta into master
...
Reviewed-on: #412
2024-04-03 12:34:17 +02:00
Lampros Smyrnaios
b7c8acc563
- Update the code which acquires the "IMPALA_HDFS_NODE", to test the "tmp"-dir, instead of the base-dir and introduce retries, to overcome potential file-system failures. This change was suggested by "Sebastian Tymkow" and "Grzegorz Bakalarski".
...
- Fix typos.
2024-04-03 13:15:37 +03:00
Antonis Lempesis
df6e3bda04
added new orgs in monitor
2024-04-01 22:45:29 +03:00
Antonis Lempesis
573b081f1d
added new orgs in monitor
2024-04-01 22:24:46 +03:00
Antonis Lempesis
0bf2a7a359
fixed the result_country definition
2024-04-01 15:23:22 +03:00
Claudio Atzori
f01390702e
Merge pull request 'fixed typo in indicator query' ( #410 ) from antonis.lempesis/dnet-hadoop:beta into master
...
Reviewed-on: #410
2024-03-27 13:42:07 +01:00
Antonis Lempesis
9ff44eed96
fixed typo in indicator query
...
added more institutions
2024-03-27 14:39:01 +02:00
Claudio Atzori
5592ccc37a
Merge pull request 'added missing EOS, Generate tables with parquet-files, instead of csv in the contexts.sh script' ( #408 ) from antonis.lempesis/dnet-hadoop:beta into master
...
Reviewed-on: #408
2024-03-27 12:02:57 +01:00
Antonis Lempesis
1fee4124e0
added missing EOS
2024-03-27 12:58:25 +02:00
Claudio Atzori
d16c15da8d
adjusted pom files
2024-03-26 14:00:44 +01:00
Lampros Smyrnaios
036ba03fcd
Generate tables with parquet-files, instead of csv, in "dhp-stats-update/.../contexts.sh" script.
2024-03-26 13:29:04 +02:00
Claudio Atzori
09a6d17059
Merge pull request '[Stats wf] #372 , #405 to production' ( #406 ) from antonis.lempesis/dnet-hadoop:beta into master
...
Reviewed-on: #406
2024-03-26 12:18:26 +01:00
Claudio Atzori
d70793847d
resolving conflicts on step16-createIndicatorsTables.sql
2024-03-26 12:17:52 +01:00
Lampros Smyrnaios
bc8c97182d
Automatically select the ACTIVE HDFS NODE for Impala cluster, in all "copyDataToImpalaCluster.sh" scripts.
2024-03-26 13:01:12 +02:00
Lampros Smyrnaios
92cc27e7eb
Use the ACTIVE HDFS NODE for Impala cluster, in "copyDataToImpalaCluster.sh" script.
2024-03-26 12:34:11 +02:00
Michele De Bonis
f6601ea7d1
default parameters for openorgs updated
2024-03-25 13:07:04 +01:00
Michele De Bonis
cd4c3c934d
openorgs wf updated
2024-03-22 15:42:37 +01:00
Antonis Lempesis
4c40c96e30
code cleanup
2024-03-22 10:16:49 +02:00
Antonis Lempesis
459167ac2f
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-03-21 12:44:58 +02:00
Antonis Lempesis
07f634a46d
code cleanup
2024-03-21 12:44:30 +02:00
Antonis Lempesis
9521625a07
code cleanup
2024-03-21 11:45:08 +02:00
Antonis Lempesis
67a5aa0a38
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-03-19 11:24:54 +02:00
dimitrispie
a3a570e9a0
Commit monitor-updates-wf
2024-03-19 09:42:21 +02:00
Michele Artini
a99942f7cf
filter by base types
2024-03-13 12:12:42 +01:00
Michele Artini
7f7083f53e
updated sql query for filtering BASE records
2024-03-13 11:57:26 +01:00
Michele Artini
d9b23a76c5
comments
2024-03-12 14:53:34 +01:00
Michele Artini
841ca92246
Merge pull request 'new plugin to collect from a dump of BASE' ( #400 ) from base-collector-plugin into master
...
Reviewed-on: #400
2024-03-12 12:22:42 +01:00
Michele Artini
3bcfc40293
new plugin to collect from a dump of BASE
2024-03-12 12:17:58 +01:00
Antonis Lempesis
f74c7e8689
selecting distinct peer_reviewed
2024-03-12 02:13:04 +02:00
Antonis Lempesis
3c79720342
fixed the irish result subset
2024-03-07 14:08:57 +02:00
Antonis Lempesis
5ae4b4286c
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2024-03-07 12:15:19 +02:00
Antonis Lempesis
316d585c8a
using distinct apcs per publication to avoid huge sums
2024-03-07 02:07:59 +02:00
Giambattista Bloisi
3067ea390d
Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf
2024-03-04 11:13:34 +01:00
Miriam Baglioni
c94d94035c
[BulkTagging] added check to verify if field is present in the pathMap
2024-02-28 09:41:42 +01:00
Michele Artini
4374d7449e
mapping of project PIDs
2024-02-22 14:44:35 +01:00
Claudio Atzori
07d009007b
Merge pull request 'Fixed problem on missing author in crossref Mapping' ( #384 ) from crossref_missing_author_fix_master into master
...
Reviewed-on: #384
2024-02-15 15:06:17 +01:00
Claudio Atzori
071d044971
Merge branch 'master' into crossref_missing_author_fix_master
2024-02-15 15:04:19 +01:00
Claudio Atzori
b3ddbaed58
fixed import of ORPs stored on HDFS in the internal graph format (e.g. Datacite)
2024-02-15 15:02:48 +01:00
Claudio Atzori
1416f16b35
[graph raw] fixed mapping of the original resource type from the Datacite format
2024-02-09 10:19:53 +01:00
Giambattista Bloisi
ba1a0e7b4f
Merge pull request 'Set deletedbyinference =true to dedup aliases, created when a dedup in a previous build has been merged in a new dedup' ( #392 ) from fix_dedupaliases_deletedbyinference into master
...
Reviewed-on: #392
2024-02-08 15:29:29 +01:00
Giambattista Bloisi
079085286c
Merge branch 'master' into fix_dedupaliases_deletedbyinference
2024-02-08 15:29:13 +01:00
Giambattista Bloisi
8dd666aedd
Dedup aliases, created when a dedup in a previous build has been merged in a new dedup, need to be marked as "deletedbyinference", since they are "merged" in the new dedup
2024-02-08 15:27:57 +01:00
Claudio Atzori
f21133229a
Merge pull request 'Support for the PromoteAction strategy [master]' ( #391 ) from promote_actions_join_type_master into master
...
Reviewed-on: #391
2024-02-08 15:12:16 +01:00
Claudio Atzori
d86b909db2
[actionsets] fixed join type
2024-02-08 15:10:55 +01:00
Claudio Atzori
08162902ab
[actionsets] introduced support for the PromoteAction strategy
2024-02-08 15:10:40 +01:00
Antonis Lempesis
dd4c27f4f3
added 2 new institutions in monitor
2024-02-08 12:57:57 +02:00
Claudio Atzori
e8630a6d03
[graph cleaning] rule out datasources without an officialname
2024-02-05 14:59:06 +02:00
Claudio Atzori
f28c63d5ef
[orcid enrichment] fixed directory cleanup before distcp
2024-02-05 09:44:56 +02:00
Antonis Lempesis
a512ead447
changed orcid ids to all capital
2024-01-30 16:54:47 +02:00
Claudio Atzori
1a8b609ed2
code formatting
2024-01-30 11:34:16 +01:00
Antonis Lempesis
bb10a22290
merged changes from dnet-hadoop
2024-01-29 21:51:47 +02:00
Miriam Baglioni
4c8706efee
[orcid-enrichment] change the value of parameters.
2024-01-29 18:21:36 +01:00
Claudio Atzori
f804c58bc7
Merge pull request 'Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf' ( #386 ) from stats_with_spark_sql into beta
...
Reviewed-on: #386
2024-01-29 09:11:59 +01:00
Claudio Atzori
926903b06b
Merge branch 'beta' into stats_with_spark_sql
2024-01-29 09:11:45 +01:00
Giambattista Bloisi
078df0b4d1
Use SparkSQL in place of Hive for executing step16-createIndicatorsTables.sql of stats update wf
2024-01-26 21:56:55 +01:00
Claudio Atzori
4d0c59669b
merged changes from beta
2024-01-26 16:08:54 +01:00
Claudio Atzori
bf99c424fa
Merge pull request 'Fixed problem on missing author in crossref Mapping' ( #383 ) from crossref_missing_author_fix into beta
...
Reviewed-on: #383
2024-01-26 15:57:23 +01:00
Claudio Atzori
ce3200263e
Merge branch 'beta' into crossref_missing_author_fix
2024-01-26 15:57:04 +01:00
Sandro La Bruzzo
3c8c88bdd3
Fixed problem on missing author in crossref Mapping
2024-01-26 12:29:30 +01:00
Sandro La Bruzzo
e889808daa
Fixed problem on missing author in crossref Mapping
2024-01-26 12:19:04 +01:00
Claudio Atzori
9e8fc6aa88
[collection] increased logging from the oai-pmh metadata collection process
2024-01-26 09:17:20 +01:00
Antonis Lempesis
c548796463
Changed step16-createIndicatorsTables to use a spark oozie action instead of hive
2024-01-26 02:04:48 +02:00
Antonis Lempesis
a7115cfa9e
max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%.
2024-01-25 15:13:16 +01:00
Antonis Lempesis
fd43b0e84a
max mem of joins (hive.mapjoin.followby.gby.localtask.max.memory.usage) now 80%, up from 55%.
2024-01-25 15:06:34 +01:00
Claudio Atzori
2838a9b630
Update 'CONTRIBUTING.md'
2024-01-24 16:07:05 +01:00
Claudio Atzori
da944a5c55
Merge pull request 'code of conduct and contributing' ( #382 ) from contributing into beta
...
Reviewed-on: #382
2024-01-24 15:40:26 +01:00
Claudio Atzori
0c97a3a81a
minor
2024-01-24 10:56:33 +01:00
Claudio Atzori
2c1e6849f0
added code of conduct and contributing files
2024-01-24 10:36:41 +01:00
Claudio Atzori
9b13c22e5d
[graph provision] retrieve all the context information by adding all=true to the requests issued to the API
2024-01-23 15:36:08 +01:00
Claudio Atzori
3e96777cc4
[collection] increased logging from the oai-pmh metadata collection process
2024-01-23 15:21:03 +01:00
Claudio Atzori
9812406589
Merge pull request '[graph provision] updated param specification for the XML converter job' ( #380 ) from provision_community_api into beta
...
Reviewed-on: #380
2024-01-23 08:55:59 +01:00
Claudio Atzori
f87f3a6483
[graph provision] updated param specification for the XML converter job
2024-01-23 08:54:37 +01:00
Claudio Atzori
6fd25cf549
code formatting
2024-01-23 08:47:12 +01:00
Claudio Atzori
bd187ec6e7
Merge pull request 'Implements pivots table update oozie workflow' ( #376 ) from update_pivots_table into beta
...
Reviewed-on: #376
2024-01-22 16:37:30 +01:00
Claudio Atzori
f76852f385
Merge branch 'beta' into update_pivots_table
2024-01-22 16:37:22 +01:00
Claudio Atzori
b9fcc5ad5e
Merge pull request 'Context API update' ( #379 ) from provision_community_api into beta
...
Reviewed-on: #379
2024-01-22 15:55:33 +01:00
Claudio Atzori
1c6db320f4
[graph provision] obtain context info from the context API instead of from the ISLookUp service
2024-01-22 15:53:17 +01:00
Claudio Atzori
2655eea5bc
[orcid enrichment] drop paths before copying the non-modified contents
2024-01-19 16:28:05 +01:00
Claudio Atzori
c6b3401596
increased shuffle partitions for publications in the country propagation workflow
2024-01-19 10:15:39 +01:00
Miriam Baglioni
bcc0a13981
[enrichment single step] adding <end> element in wf definition
2024-01-18 17:39:14 +01:00
Miriam Baglioni
6af536541d
[enrichment single step] moving parameter file in correct location
2024-01-18 15:35:40 +01:00
Miriam Baglioni
a12a3eb143
-
2024-01-18 15:18:10 +01:00
Claudio Atzori
628fdfb5eb
Merge pull request '[enrichment single step]' ( #378 ) from enrichmentSingleStepFixed into beta
...
Reviewed-on: #378
2024-01-18 09:41:09 +01:00
Miriam Baglioni
82e9e262ee
[enrichment single step] remove parameter from execution
2024-01-17 17:38:03 +01:00
Miriam Baglioni
67ce2d54be
[enrichment single step] refactoring to fix issues in disappeared result type
2024-01-17 16:50:00 +01:00
Miriam Baglioni
59eaccbd87
[enrichment single step] refactoring to fix issue in disappeared result type
2024-01-15 17:49:54 +01:00
Giambattista Bloisi
21a14fcd80
Reusable RunSQLSparkJob for executing SQL in Spark through Oozie Spark Actions
...
Implements pivots table update oozie workflow
2024-01-15 10:18:14 +01:00
Claudio Atzori
2d302e6827
Merge pull request '[FoS integration]fix issue on FoS integration. Removing the null values from FoS' ( #375 ) from fosPreparationBeta into beta
...
Reviewed-on: #375
2024-01-12 10:27:28 +01:00
Miriam Baglioni
f612125939
fix issue on FoS integration. Removing the null values from FoS
2024-01-12 10:20:28 +01:00
Claudio Atzori
c67467723b
Merge pull request 'refined mapping for the extraction of the original resource type' ( #374 ) from resource_types into beta
...
Reviewed-on: #374
2024-01-11 16:29:47 +01:00
Claudio Atzori
cb9e739484
Merge branch 'beta' into resource_types
2024-01-11 16:29:41 +01:00
Claudio Atzori
2753044d13
refined mapping for the extraction of the original resource type
2024-01-11 16:28:26 +01:00
Giambattista Bloisi
a88dce5bf3
Merge pull request 'Improvements and refactoring in Dedup' ( #367 ) from dedup_increasenumofblocks into beta
...
Reviewed-on: #367
2024-01-11 11:24:06 +01:00
Giambattista Bloisi
3c66e3bd7b
Create dedup record for "merged" pivots
...
Do not create dedup records for groups that have more than 20 different acceptance dates
2024-01-10 22:59:52 +01:00
Giambattista Bloisi
10e135db1e
Use dedup_wf_002 in place of dedup_wf_001 to make explicit that a different algorithm has been used to generate those kinds of ids
2024-01-10 22:59:52 +01:00
Giambattista Bloisi
831cc1fdde
Generate "merged" dedup id relations also for records that are filtered out by the cut parameters
2024-01-10 22:59:52 +01:00
Giambattista Bloisi
1287315ffb
No longer use dedupId information from pivotHistory Database
2024-01-10 22:59:52 +01:00
Giambattista Bloisi
02636e802c
SparkCreateSimRels:
...
- Create dedup blocks from the complete queue of records matching cluster key instead of truncating the results
- Clean titles once before clustering and similarity comparisons
- Added support for filtered fields in model
- Added support for sorting List fields in model
- Added new JSONListClustering and numAuthorsTitleSuffixPrefixChain clustering functions
- Added new maxLengthMatch comparator function
- Use reduced complexity Levenshtein with threshold in levensteinTitle
- Use reduced complexity AuthorsMatch with threshold early-quit
- Use incremental Connected Component to decrease comparisons in similarity match in BlockProcessor
- Use new clusterings configuration in Dedup tests
SparkWhitelistSimRels: use left semi join for clarity and performance
SparkCreateMergeRels:
- Use new connected component algorithm that converges faster than the algorithm provided by Spark GraphX
- Refactored to use Windowing sorting rather than groupBy to reduce memory pressure
- Use historical pivot table to generate singleton rels, merged rels and keep continuity with dedupIds used in the past
- Comparator for pivot record selection now uses "tomorrow" as filler for missing or incorrect date instead of "2000-01-01"
- Changed generation of ids of type dedup_wf_001 to avoid collisions
DedupRecordFactory: use reduceGroups instead of mapGroups to decrease memory pressure
2024-01-10 22:59:52 +01:00
Antonis Lempesis
e024718f73
creating result_instances even when no pids exist for the instance
2024-01-10 22:25:50 +01:00
Claudio Atzori
16d858fbf0
Merge pull request 'enrichmentSingleStep' ( #373 ) from enrichmentSingleStep into beta
...
Reviewed-on: #373
2024-01-10 16:58:49 +01:00
Miriam Baglioni
e711a05229
fixed conflicts
2024-01-10 11:03:42 +01:00
Miriam Baglioni
71d6f30711
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2024-01-10 10:59:58 +01:00
dimitrispie
b920307bdd
Changes to indicators
2024-01-09 00:47:09 +02:00
dimitrispie
8b2cbb611e
Changes to beta db names
2024-01-09 00:40:56 +02:00
Antonis Lempesis
2e4cab026c
fixed the result_country definition
2024-01-08 16:01:26 +02:00
dimitrispie
6b823100ae
Update buildIrishMonitorDB.sql
...
New indicators added
2024-01-07 22:54:39 +02:00
dimitrispie
75bfde043c
Historical Snapshots Workflow
...
Create historical snapshots db with parameters:
hist_db_name=openaire_beta_historical_snapshots_xxx
hist_db_name_prev=openaire_beta_historical_snapshots_xxx (previous run of wf)
stats_db_name=openaire_beta_stats_xxx
stats_irish_db_name=openaire_beta_stats_monitor_ie_xxx
monitor_db_name=openaire_beta_stats_monitor_xxx
monitor_db_prod_name=openaire_beta_stats_monitor
monitor_irish_db_name=openaire_beta_stats_monitor_ie_xxx
monitor_irish_db_prod_name=openaire_beta_stats_monitor_ie
hist_db_prod_name=openaire_beta_historical_snapshots
hist_db_shadow_name=openaire_beta_historical_snapshots_shadow
hist_date=122023
hive_timeout=150000
hadoop_user_name=xxx
resumeFrom=CreateDB
2024-01-04 15:11:04 +02:00
Miriam Baglioni
cb14470ba6
added properties file in the folder for the workflow of result to organization from inst repo propagation. Changes the path in the classes implementing the propagation
2023-12-22 14:50:05 +01:00
Miriam Baglioni
9f966b59d4
added properties file in the folder for the workflow of result to community from semrel propagation. Changes the path in the classes implementing the propagation
2023-12-22 14:11:47 +01:00
Miriam Baglioni
2f3b5a133d
added properties file in the folder for the workflow of result to community from organization propagation. Changes the path in the classes implementing the propagation
2023-12-22 13:56:40 +01:00
Miriam Baglioni
2f7b9ad815
added properties file in the folder for the workflow of project to result propagation. Changes the path in the classes implementing the propagation
2023-12-22 11:46:15 +01:00
Miriam Baglioni
f2352e8a78
changed in the classes the path for the property files for the propagation of community from project
2023-12-22 11:43:34 +01:00
Miriam Baglioni
009730b3d1
added properties file in the folder for the workflow of orcid propagation. Changes the path in the classes implementing the propagation. Changed the path to the parameter file in the class for entitytoorganization propagation
2023-12-22 11:42:09 +01:00
Miriam Baglioni
89f269c7f4
changed the path to the parameter file in the class for entitytoorganization propagation
2023-12-22 11:37:50 +01:00
Miriam Baglioni
b06aea0adf
adding the bulkTag parameter file in the folder for the oozie workflow for bulkTagging. Changes the path in the class
2023-12-22 11:35:37 +01:00
Miriam Baglioni
3afd4aa57b
adjustments for country propagation
2023-12-22 11:27:30 +01:00
dimitrispie
ffdd03d2f4
Monitor Irish Stats WF
...
Parameters (with examples):
stats_db_name=openaire_beta_stats_20231208
monitor_irish_db_name=openaire_beta_stats_monitor_ie_20231208b
monitor_irish_db_prod_name=openaire_beta_stats_monitor_ie
graph_db_name=openaire_beta_20231208
monitor_irish_db_shadow_name=openaire_beta_stats_monitor_ie_shadow
hive_timeout=150000
hadoop_user_name=dnet.beta
resumeFrom=Step1-buildIrishMonitorDB
2023-12-22 11:05:24 +02:00
dimitrispie
40b98d8182
Changes to indicators and funders definition
...
- Changes result_refereed definition
- Added result_country indicator
- Added indi_pub_green_with_license indicator
- Added country from jurisdiction to funders
2023-12-22 10:29:20 +02:00
Claudio Atzori
62104790ae
added metaresourcetype to the result hive DB view
2023-12-21 12:27:10 +01:00
Claudio Atzori
106968adaa
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-12-21 12:26:29 +01:00
Claudio Atzori
a8a4db96f0
added metaresourcetype to the result hive DB view
2023-12-21 12:26:19 +01:00
Miriam Baglioni
5011c4d11a
refactoring after compilation
2023-12-20 15:57:26 +01:00
Miriam Baglioni
4740c808f7
-
2023-12-20 14:26:54 +01:00
Miriam Baglioni
d410ea8a41
added needed parameter
2023-12-19 12:15:01 +01:00
Sandro La Bruzzo
37e36baf76
updated workflow for generation of Scholix Datasources to use mdstore transactions
2023-12-18 16:05:35 +01:00
Sandro La Bruzzo
9d39845d1f
uploaded input parameters on CreateBaseline WF
2023-12-18 12:23:12 +01:00
Sandro La Bruzzo
15fd93a2b6
uploaded input parameters on CreateBaseline WF
2023-12-18 12:21:55 +01:00
Sandro La Bruzzo
9d342a47da
updated the transformation Baseline workflow to include mdstore rollback/commit action
2023-12-18 11:48:57 +01:00
Sandro La Bruzzo
1fbd4325f5
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2023-12-18 11:47:17 +01:00
Sandro La Bruzzo
1f1a6a5f5f
updated the transformation Baseline workflow to include mdstore rollback/commit action
2023-12-18 11:47:00 +01:00
Miriam Baglioni
3eca5d2e1c
-
2023-12-18 09:55:27 +01:00
Miriam Baglioni
01ce0b9c76
[doiboost - preprocess] remove transition to orcid preparation from sequence of steps at the beginning of the workflow
2023-12-15 12:24:55 +01:00
Miriam Baglioni
0d8e496a63
-
2023-12-15 12:16:43 +01:00
Claudio Atzori
c4ec35b6cd
Merge pull request 'Master branch updates from beta December 2023' ( #369 ) from beta_to_master_dicember2023 into master
...
Reviewed-on: #369
2023-12-15 11:18:30 +01:00
Claudio Atzori
1726f49790
code formatting
2023-12-15 10:37:02 +01:00
Claudio Atzori
a59be5779e
Merge pull request '9078_xml_records_irish_tender' ( #368 ) from 9078_xml_records_irish_tender into beta
...
Reviewed-on: #368
2023-12-12 12:34:43 +01:00
Claudio Atzori
ff924215b8
[graph provision] added tests for new peerreviewed field
2023-12-12 11:21:30 +01:00
Claudio Atzori
a6d635e695
Merge branch 'beta' into 9078_xml_records_irish_tender
2023-12-12 11:06:42 +01:00
Claudio Atzori
98cce5bfb2
code formatting
2023-12-12 09:59:05 +01:00
Claudio Atzori
84d54643cf
[cleaning] allow enriched orcids to pass the cleaning, rule out non-orcid author pids
2023-12-12 09:57:00 +01:00
Claudio Atzori
7e8eff40c1
[graph provision] added tests for the new model fields
2023-12-12 08:54:15 +01:00
Miriam Baglioni
8752d275fa
removed not needed parameter
2023-12-09 15:24:45 +01:00
Miriam Baglioni
d4eedada71
adjusting workflow definition
2023-12-09 15:20:11 +01:00
Claudio Atzori
aba95ed1d1
code formatting
2023-12-08 17:06:19 +01:00
Claudio Atzori
2877839df0
Merge pull request '[graph cleaning] added cleaning for result.publisher and result.instance.license' ( #366 ) from clean_license_publisher into beta
...
Reviewed-on: #366
2023-12-08 16:58:37 +01:00
Claudio Atzori
34abd0fc43
Merge branch 'beta' into clean_license_publisher
2023-12-08 16:58:27 +01:00
Claudio Atzori
cb71a7936b
[graph cleaning] avoid stack overflow error when navigating Oaf objects declaring an Enum
2023-12-07 23:09:54 +01:00
Claudio Atzori
70eb1796b2
logging typo
2023-12-07 14:08:04 +01:00
Claudio Atzori
c381bacee0
[enrichment] passing the community API base URL
2023-12-07 14:07:11 +01:00
Miriam Baglioni
336fb31d87
[community_result_propagation] adjusting starting point of workflow
2023-12-07 10:27:25 +01:00
Miriam Baglioni
c0cde53bf6
[bulktagging] setting first step of bulktagging as the copy of the entities and relations not involved in the tagging
2023-12-07 10:08:35 +01:00
Miriam Baglioni
616622d2bb
first version of the workflow single step
2023-12-07 09:59:52 +01:00
Claudio Atzori
259c69e446
[orcid enrichment] fixed workflow definition
2023-12-06 19:41:53 +01:00
Claudio Atzori
431c6bb08a
[dedup] added isLookupUrl to the graph consistency workflow definition, required now by the entity grouping phase
2023-12-06 11:06:46 +01:00
Claudio Atzori
982c0c110b
Merge pull request '[graph provision] added serialization for the new fields imported from the stats DB' ( #365 ) from 9078_xml_records_irish_tender into beta
...
Reviewed-on: #365
2023-12-05 16:39:44 +01:00
Claudio Atzori
321922772b
added serialization for the new fields imported for the Irish tender
2023-12-05 16:37:04 +01:00
Claudio Atzori
c5b7253130
[community_organization propagation] fixed workflow parameters
2023-12-05 09:13:33 +01:00
Claudio Atzori
3c3bdb8318
[bulktagging] fixed workflow parameters
2023-12-05 09:08:48 +01:00
Claudio Atzori
7c3041b276
avoid NPEs
2023-12-03 16:49:49 +01:00
Claudio Atzori
74b185d07b
avoid NPEs
2023-12-03 16:18:20 +01:00
Claudio Atzori
e6086efc53
avoid NPEs in Vocabulary.getTermBySynonym
2023-12-03 13:33:20 +01:00
Claudio Atzori
2a233a89aa
[graph grouping] added isLookupUrl to the workflow definition, passed to the grouping spark action
2023-12-03 13:32:52 +01:00
Claudio Atzori
178a14c491
code formatting
2023-12-03 13:31:58 +01:00
Sandro La Bruzzo
3caf6ff27e
Extracted the correct original type to pass to instanceTypeMapping in Crossref Mapping
2023-12-01 16:33:56 +01:00
Claudio Atzori
511a98dd80
fixed doiboost process workflow, removed references to the ProcessORCID step
2023-12-01 16:21:53 +01:00
Claudio Atzori
d33f578e54
code formatting
2023-12-01 15:14:17 +01:00
Claudio Atzori
c5ac593c07
Merge pull request 'ORCID Enrichment and Download' ( #364 ) from orcid_import into beta
...
Reviewed-on: #364
2023-12-01 15:05:44 +01:00
Claudio Atzori
09d061e90b
Merge branch 'beta' into orcid_import
2023-12-01 15:05:35 +01:00
Claudio Atzori
93a700742a
Merge pull request 'Changes for tables and creation of the new indicator indi_is_result_accessible' ( #363 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #363
2023-12-01 15:05:23 +01:00
Claudio Atzori
0c3c9ea43d
Merge pull request 'StatsDB workflow to export actionsets about OA routes, diamond, and publicly-funded' ( #355 ) from dimitris.pierrakos/dnet-hadoop:beta into beta
...
Reviewed-on: #355
2023-12-01 15:03:56 +01:00
Claudio Atzori
33cb483c75
using objectSubType as originalType in Crossref2Oaf, code formatting
2023-12-01 15:03:05 +01:00
dimitrispie
c9d995dde0
New institutions added
2023-12-01 15:44:35 +02:00
dimitrispie
a397112cb8
Add new indicator
...
Add indi_pub_publicly_funded
2023-12-01 15:00:18 +02:00
dimitrispie
76594ded23
Changes to indicators
...
Fixes on open access colours indicators
- indi_pub_green_oa
- indi_pub_gold_oa
- indi_pub_hybrid
- indi_pub_bronze_oa
- indi_pub_diamond
2023-12-01 13:38:19 +02:00
Claudio Atzori
622fafbd2e
Merge branch 'beta' into orcid_import
2023-12-01 12:28:14 +01:00
Sandro La Bruzzo
bf0fd27c36
Removed unused function
...
Applied PR Comment of Giambattista in the PR
2023-12-01 12:16:42 +01:00
dimitrispie
48430a32a6
Update StatsAtomicActionsJob.java
...
Added indi_funded_result_with_fundref indicator
2023-12-01 11:35:01 +02:00
Sandro La Bruzzo
cdfb7588dd
code formatting
2023-11-30 15:31:42 +01:00
Sandro La Bruzzo
5e22b67b8a
Merge remote-tracking branch 'origin/beta' into orcid_import
2023-11-30 15:27:46 +01:00
Sandro La Bruzzo
f718caaac9
Added copy of the untouched entities of the graph
2023-11-30 14:51:00 +01:00
Sandro La Bruzzo
7b5e04f37e
removed Orcid intersection on DOIBoost
2023-11-30 14:36:50 +01:00
Claudio Atzori
4cbabc9fbc
Merge pull request '[ENRICHMENT][BETA] Use of community API in enrichment process AND addition to tagging result for communities through projects' ( #359 ) from propagationapi into beta
...
Reviewed-on: #359
2023-11-30 14:20:33 +01:00
Claudio Atzori
6f10791e77
Merge branch 'beta' into propagationapi
2023-11-30 14:20:18 +01:00
Claudio Atzori
4e1aac2e2f
resolved conflict in pom.xml before applying the changes from [COAR based resource types & Irish tender] #350
2023-11-29 14:37:52 +01:00
Sandro La Bruzzo
86b5775e08
added vocabulary in instanceTypeMapping for
...
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 13:15:43 +01:00
Sandro La Bruzzo
c96ff54b45
Merge remote-tracking branch 'origin/resource_types' into resource_types
2023-11-29 12:45:41 +01:00
Sandro La Bruzzo
af1c2634b3
added instanceTypeMapping original field in the mapping of
...
- DOIBoost
- Datacite
- PubMed
- Scholexplorer Datasource
2023-11-29 12:45:30 +01:00
Sandro La Bruzzo
279100fa52
added test
2023-11-29 11:17:58 +01:00
Sandro La Bruzzo
aa239ec673
Changed implementation of check similarity to verify exact match of name instead of the first char
2023-11-29 11:17:41 +01:00
Sandro La Bruzzo
59111713fa
added comment
2023-11-28 09:00:48 +01:00
Sandro La Bruzzo
6f4d0c05ea
Implemented Author Merger for ORCID that takes into account the case when name and surname are swapped
2023-11-28 08:43:56 +01:00
Miriam Baglioni
8eb70e6657
refactoring
2023-11-27 15:13:15 +01:00
Miriam Baglioni
e3cce9a5a0
merging with branch beta
2023-11-27 15:10:55 +01:00
Miriam Baglioni
48e0427a23
changed the parameter from production to baseURL. Fixed issue in tagging configuration
2023-11-27 15:10:27 +01:00
Sandro La Bruzzo
34a4b3cbdf
Implemented ORCID Enrichment
2023-11-24 12:39:58 +01:00
Claudio Atzori
1763d377ad
code formatting
2023-11-23 16:33:24 +01:00
Claudio Atzori
1ba582de3c
[graph cleaning] added cleaning for result.publisher and result.instance.license
2023-11-23 16:27:19 +01:00
dimitrispie
359e81b7a6
Update StatsAtomicActionsJob.java
...
Bug fix for duplicate bronze checks
2023-11-23 10:48:55 +02:00
Claudio Atzori
a0311e8a90
Merge pull request 'Clear working dir in bipranker workflow' ( #360 ) from 9120_bipranker_clean_working_dir into master
...
Reviewed-on: #360
2023-11-22 14:10:39 +01:00
Claudio Atzori
8fb05888fd
Merge branch 'master' into 9120_bipranker_clean_working_dir
2023-11-22 14:10:30 +01:00
Claudio Atzori
a21617732a
Merge pull request 'graph cleaning, suggestions from ticket 8898 - round 2' ( #356 ) from cleaning_8898 into beta
...
Reviewed-on: #356
2023-11-22 14:00:37 +01:00
Claudio Atzori
2c77638bf5
Merge branch 'beta' into cleaning_8898
2023-11-22 14:00:10 +01:00
Claudio Atzori
836d7ec724
Merge pull request 'Add Pubmed affiliations (inferred by BIP) as actionsets' ( #353 ) from 9117_pubmed_affiliations into beta
...
Reviewed-on: #353
2023-11-22 13:53:07 +01:00
Claudio Atzori
745039ad5b
Merge branch 'beta' into 9117_pubmed_affiliations
2023-11-22 13:52:53 +01:00
Claudio Atzori
008fdf9d8a
Merge pull request 'URL Validator to accept double slashes' ( #352 ) from url_validation into beta
...
Reviewed-on: #352
2023-11-22 13:52:08 +01:00
Claudio Atzori
11a1207f9c
[graph cleaning] applying coar based vocabularies in bulk
2023-11-22 12:22:14 +01:00
dimitrispie
a94a54a2d0
Changes for tables and creation of the new indicator indi_is_result_accessible
...
- Drop table statements for all tables to avoid duplicates in case of wf rerun
- Add pdfsaggregated step to create the indi_is_result_accessible table. This step is executed on the new impala cluster only, since the pdfaggregation_i is updated on this cluster.
2023-11-15 14:32:18 +02:00
Claudio Atzori
2b626815ff
Merge pull request 'Project propagation via communityAPI instead of using IS via IIS' ( #362 ) from projectPropagation into master
...
Reviewed-on: #362
2023-11-14 16:37:53 +01:00
Miriam Baglioni
b177cd5a0a
Project propagation via communityAPI instead of using IS via IIS
2023-11-14 16:25:09 +01:00
Miriam Baglioni
eaf0a702de
-
2023-11-14 14:53:34 +01:00
Sandro La Bruzzo
6ce36b3e41
Implemented ORCID Workflow on DHP-Aggregation for retrieving ORCID DUMP and generating tables
2023-11-14 12:04:29 +01:00
dimitrispie
d524e30866
Changes to actionsets
...
Resolve comments from #355
2023-11-14 09:46:52 +02:00
Serafeim Chatzopoulos
671ba8a5a7
Clear working dir in bipranker workflow
2023-11-07 18:35:05 +02:00
Miriam Baglioni
5bc97615d5
-
2023-11-03 15:35:10 +01:00
Miriam Baglioni
7b1e34f159
refactoring
2023-11-03 15:30:01 +01:00
Miriam Baglioni
638ad9e74f
changing test for new implementation
2023-11-03 15:06:50 +01:00
Miriam Baglioni
edcb17ca98
refactoring and test
2023-11-03 13:01:14 +01:00
Claudio Atzori
5f1ed61c1f
merging from bulkTag branch
2023-11-03 12:51:37 +01:00
Claudio Atzori
8c03c41d5d
applying changes from beta
2023-11-03 12:08:39 +01:00
Claudio Atzori
97454e9594
Merge pull request '9117_pubmed_affiliations_prod' ( #357 ) from 9117_pubmed_affiliations_prod into master
...
Reviewed-on: #357
2023-11-03 11:45:34 +01:00
Serafeim Chatzopoulos
7e34dde774
Renaming input param for crossref input path
2023-11-02 17:47:04 +02:00
Serafeim Chatzopoulos
24c3f92d87
Change the description of the workflow
2023-11-02 17:46:51 +02:00
Serafeim Chatzopoulos
6ce9b600c1
Add actionset creation for pubmed affiliations
2023-11-02 17:46:39 +02:00
Serafeim Chatzopoulos
94089878fd
Adjust tests to new WF input params
2023-11-02 17:46:13 +02:00
Miriam Baglioni
937ff6a7c7
-
2023-10-31 15:56:08 +01:00
Miriam Baglioni
a737dd47b6
removed unneeded test class
2023-10-31 15:54:49 +01:00
Miriam Baglioni
c80b768af0
test for project propagation
2023-10-31 15:49:42 +01:00
Miriam Baglioni
e9a20fc8f6
merging with branch beta
2023-10-31 14:36:03 +01:00
Claudio Atzori
dde2fec035
[graph cleaning] cleanup
2023-10-31 14:35:33 +01:00
Claudio Atzori
262d7c581b
[graph cleaning] implemented further suggestions from https://support.openaire.eu/issues/8898
2023-10-31 14:34:10 +01:00
Serafeim Chatzopoulos
2090003ea9
Adjust tests to new WF input params
2023-10-26 13:47:06 -07:00
Miriam Baglioni
0097f4e64b
Removed Query community testing. Removed package from common related to the interaction with Zenodo since it was moved to the dump-project
2023-10-26 09:38:09 +02:00
Serafeim Chatzopoulos
a82aaf57b2
Renaming input param for crossref input path
2023-10-25 12:05:02 -07:00
Claudio Atzori
b3a61ea955
Merge branch 'beta' into url_validation
2023-10-25 14:22:56 +02:00
dimitrispie
89c4dfbaf4
StatsDB workflow to export actionsets about OA routes, diamond, and publicly-funded
...
A new oozie workflow that reads from the stats db and produces a new actionSet for updating results with:
- green_oa ={true, false}
- openAccesColor = {gold, hybrid, bronze}
- in_diamond_journal={true, false}
- publicly_funded={true, false}
Inputs:
- outputPath
- statsDB
2023-10-24 09:48:23 +03:00
Miriam Baglioni
5c5a195e97
refactoring and fixing issue on property name
2023-10-23 11:26:17 +02:00
Claudio Atzori
a870aa2b09
depending on dhp-schemas:3.17.2
2023-10-20 22:28:39 +02:00
Claudio Atzori
7fc621cdec
added defaults to the graph resolution workflow config-default.xml
2023-10-20 22:28:12 +02:00
Miriam Baglioni
70b78a40c7
removed file from different propagation
2023-10-20 15:50:49 +02:00
Miriam Baglioni
f206ff42d6
modified code to use the API. Removed unneeded parameters. Rewrote the code to exploit the parallel stream on the entity types
2023-10-20 15:49:41 +02:00
Miriam Baglioni
34358afe75
modified resource file, workflow and default-config. Added 3g of memory overhead and specified the shuffle partition in the wf configuration. Removed the multiple instantiation in the wf because of the different implementation of the spark job
2023-10-20 15:48:27 +02:00
Miriam Baglioni
18bfff8af3
adding test classes and modifying test for bulktag
2023-10-20 15:47:03 +02:00
Miriam Baglioni
69dac91659
adding the new code to use the API instead of the Information Service
2023-10-20 15:45:52 +02:00
Serafeim Chatzopoulos
aad5982bf1
Change the description of the workflow
2023-10-20 12:48:21 +03:00
Miriam Baglioni
a9ede1e989
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-10-20 10:14:43 +02:00
Miriam Baglioni
a4214ced1e
fixing issue on propagation organization. added --config to workflow definition. added oozie_app to community project
2023-10-20 10:14:20 +02:00
Serafeim Chatzopoulos
6b19dcee80
Add actionset creation for pubmed affiliations
2023-10-19 19:58:25 +03:00
Claudio Atzori
2b9d0416ec
[graph raw] URL Validator to accept double slashes
2023-10-19 16:26:37 +02:00
Claudio Atzori
b0fed1725e
avoid NPEs
2023-10-19 12:13:45 +02:00
Miriam Baglioni
f1b898c6b4
merging with branch beta
2023-10-19 09:04:35 +02:00
Claudio Atzori
a24178cb93
Merge branch 'beta' into resource_types
2023-10-17 11:09:50 +02:00
Claudio Atzori
d28b7085f6
more NPE checks
2023-10-17 11:09:31 +02:00
Claudio Atzori
3b1c8b9fbd
Merge pull request 'FIX: GroupEntitiesSparkJob deletes whole graph outputPath instead of its temporary folder' ( #351 ) from fix_consistency_missing_rels into beta
...
Reviewed-on: #351
2023-10-17 08:40:23 +02:00
Claudio Atzori
1d594eaffd
Merge branch 'beta' into fix_consistency_missing_rels
2023-10-17 08:40:07 +02:00
Giambattista Bloisi
0e44b037a5
FIX: GroupEntitiesSparkJob deletes whole graph outputPath instead of its temporary folder
2023-10-17 07:54:01 +02:00
Claudio Atzori
6dfcd0c9a2
[raw graph] mapping original resource types
2023-10-16 12:57:18 +02:00
Claudio Atzori
39d24d5469
Merge branch 'beta' into resource_types
2023-10-16 11:56:38 +02:00
Claudio Atzori
389e3fcc59
Merge pull request '[dedup] use common `saveParquet` and `save` methods to ensure outputs are compressed' ( #349 ) from fix_dedup_not_compressed into beta
...
Reviewed-on: #349
2023-10-16 11:56:18 +02:00
Sandro La Bruzzo
a5a89a702f
new spark parameter updated
2023-10-16 11:46:12 +02:00
Miriam Baglioni
159388f9c2
testing and fix some issues
2023-10-16 11:26:07 +02:00
Claudio Atzori
03670bb9ce
[dedup] use common saveParquet and save methods to ensure outputs are compressed
2023-10-16 10:55:47 +02:00
Claudio Atzori
54fbf09ac6
[raw graph] WIP: mapping original resource types
2023-10-16 08:57:47 +02:00
Claudio Atzori
6cf64d5d8b
[SWH] renamed 'Software Heritage Identifier' to 'Software Hash Identifier'
2023-10-13 10:09:26 +02:00
Claudio Atzori
242d647146
cleanup & docs
2023-10-12 12:23:44 +02:00
Claudio Atzori
76447958bb
cleanup & docs
2023-10-12 12:23:20 +02:00
Claudio Atzori
af3ffad6c4
[AMF] docs
2023-10-12 10:07:52 +02:00
Claudio Atzori
1902728f7e
Merge pull request '[ActionManagerFramework] documentation' ( #347 ) from actionset_docs into beta
...
Reviewed-on: #347
2023-10-12 10:07:25 +02:00
Claudio Atzori
dda602fff7
[AMF] docs
2023-10-12 10:05:46 +02:00
Claudio Atzori
05ee7d8b09
[graph cleaning] avoid NPEs
2023-10-12 09:13:42 +02:00
Miriam Baglioni
8e9493fad9
merging with branch beta
2023-10-11 18:18:09 +02:00
Miriam Baglioni
89184d5b4f
used the API instead of the IS for bulktagging and propagation for community through organization. Added a new propagation step for communities through projects. Still using the API and not the IS
2023-10-11 18:17:35 +02:00
Claudio Atzori
554551682d
[raw graph] adopting the new COAR based vocabularies for the resource typing
2023-10-11 16:09:19 +02:00
Claudio Atzori
a460ebe215
[UnresolvedEntities] updated action name
2023-10-10 15:50:11 +02:00
Claudio Atzori
ecea58a41c
Merge pull request '[UnresolvedEntities] changing in the creation of the unresolved entities' ( #346 ) from fos into beta
...
Reviewed-on: #346
2023-10-10 15:10:21 +02:00
Claudio Atzori
66064e99fe
Merge branch 'beta' into fos
2023-10-10 15:07:21 +02:00
Miriam Baglioni
a431b04814
leftover for the properties and removal of bipfinder
2023-10-10 12:53:57 +02:00
Claudio Atzori
ed9282ef2a
removed module dhp-stats-monitor-update
2023-10-10 09:52:03 +02:00
Miriam Baglioni
110ce4b40f
extend the fos model to include the level4 and the scores for level3 and level4. removed bip indicators from the instance
2023-10-10 09:46:40 +02:00
Claudio Atzori
204404b0e3
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-10-10 09:36:13 +02:00
Claudio Atzori
9a98f408b3
code formatting
2023-10-10 09:36:11 +02:00
Claudio Atzori
4e6fccf4f6
Merge pull request 'Beta stats wf updated' ( #332 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #332
2023-10-10 09:35:32 +02:00
Miriam Baglioni
a3d01ccb24
refactoring
2023-10-09 14:52:17 +02:00
Miriam Baglioni
8448b9ebfb
merging with branch beta
2023-10-09 14:27:23 +02:00
Miriam Baglioni
3d6be20989
changes to use the API instead of the IS to get the information for the communities to be used during bulktagging and context propagation
2023-10-09 14:26:33 +02:00
dimitrispie
17586f0ff8
Update step20-createMonitorDB.sql
...
Add result_orcid table to monitor dbs
2023-10-09 14:21:31 +03:00
dimitrispie
489a082f04
Update step16-createIndicatorsTables.sql
...
Change scripts for gold, hybrid, bronze indicators
2023-10-09 14:00:50 +03:00
Claudio Atzori
ef833840c3
[Doiboost] removed linkage to SFI unidentified project
2023-10-06 15:48:18 +02:00
Claudio Atzori
84a58802ab
[OC] using the common pid cleaning function
2023-10-06 14:48:05 +02:00
Claudio Atzori
46034630cf
[OC] compress the output actionset
2023-10-06 14:42:02 +02:00
Claudio Atzori
774e874d18
Merge pull request 'implemented relation to irish funder from a Json list' ( #344 ) from irish_funder into beta
...
Reviewed-on: #344
2023-10-06 14:26:54 +02:00
Claudio Atzori
3bc44fbf1d
Merge branch 'beta' into irish_funder
2023-10-06 14:26:41 +02:00
Claudio Atzori
11153742c9
Merge pull request 'Extending the coverage of the peer non-unknown refereed instances' ( #342 ) from peer_reviewed into beta
...
Reviewed-on: #342
2023-10-06 14:22:13 +02:00
Claudio Atzori
8108491722
Merge branch 'beta' into peer_reviewed
2023-10-06 14:21:52 +02:00
Giambattista Bloisi
2f3cf6d0e7
Fix cleaning of Pmid where parsing of numbers stopped at the first non-leading 0 (zero) character
2023-10-06 14:20:15 +02:00
Claudio Atzori
ba5475ed4c
Merge pull request 'Fix cleaning of Pmid where parsing of numbers stopped at first not leading 0 (zero) character' ( #345 ) from fix_truncated_pmid into master
...
Reviewed-on: #345
2023-10-06 14:19:49 +02:00
Claudio Atzori
6856ab28ab
Merge pull request 'SWH_integration' ( #343 ) from SWH_integration into beta
...
Reviewed-on: #343
2023-10-06 14:15:56 +02:00
Claudio Atzori
3c23d5f9bc
Merge branch 'beta' into SWH_integration
2023-10-06 14:15:38 +02:00
Claudio Atzori
858931ccb6
[SWH] compress the output actionset
2023-10-06 14:03:33 +02:00
Claudio Atzori
f759b18bca
[SWH] aligned parameter name
2023-10-06 13:43:20 +02:00
Giambattista Bloisi
2c235e82ad
Fix cleaning of Pmid where parsing of numbers stopped at the first non-leading 0 (zero) character
2023-10-06 12:35:54 +02:00
Claudio Atzori
eed9fe0902
code formatting
2023-10-06 12:31:17 +02:00
Claudio Atzori
7f27111b1f
Merge branch 'importpoci' into beta
2023-10-06 12:23:28 +02:00
Claudio Atzori
73c49b8d26
Merge branch 'beta' into SWH_integration
2023-10-06 12:21:51 +02:00
Sandro La Bruzzo
42a2dad975
implemented relation to irish funder from a Json list
2023-10-06 11:52:33 +02:00
Sandro La Bruzzo
13f332ce77
ignored jenv prop
2023-10-06 10:40:05 +02:00
Serafeim Chatzopoulos
1bb83b9188
Add prefix in SWH ID
2023-10-04 20:31:45 +03:00
Claudio Atzori
ee8a39e7d2
cleanup and refinements
2023-10-04 12:32:05 +02:00
Serafeim Chatzopoulos
e9f24df21c
Move SWH API Key from constants to workflow param
2023-10-03 20:57:57 +03:00
Serafeim Chatzopoulos
cae75fc75d
Add SWH in the collectedFrom field
2023-10-03 16:55:10 +03:00
Serafeim Chatzopoulos
b49a3ac9b2
Add actionsetsPath as a global WF param
2023-10-03 15:43:38 +03:00
Serafeim Chatzopoulos
24c43e0c60
Restructure workflow parameters
2023-10-03 15:11:58 +03:00
Serafeim Chatzopoulos
9f73d93e62
Add param for limiting repo Urls
2023-10-03 14:39:08 +03:00
Claudio Atzori
b446a9ed98
Merge branch 'beta' into peer_reviewed
2023-10-03 10:52:23 +02:00
Claudio Atzori
f344ad76d0
Merge pull request 'extended existing code to import of POCI from open citation' ( #340 ) from importpoci into beta
...
Reviewed-on: #340
2023-10-03 10:52:11 +02:00
Claudio Atzori
5919e488dd
Merge branch 'beta' into importpoci
2023-10-03 10:43:53 +02:00
Serafeim Chatzopoulos
839a8524e7
Add action for creating actionsets
2023-10-02 23:50:38 +03:00
Claudio Atzori
c9a5ad6a02
extending the coverage of the peer non-unknown refereed instances
2023-10-02 16:28:42 +02:00
Miriam Baglioni
d7fccdc64b
fixed paths in wf to match the req of the pathname
2023-10-02 14:10:57 +02:00
Miriam Baglioni
9898470b0e
Addressing comments in #340 #issuecomment-10592
2023-10-02 12:54:16 +02:00
Giambattista Bloisi
c412dc162b
Fix bug in conversion from dedup json model to Spark Dataset of Rows: list of strings contained the json escaped representation of the value instead of the plain value, this caused instanceTypeMatch failures because of the leading and trailing double quotes
2023-10-02 11:34:51 +02:00
Claudio Atzori
4ac06c9e37
Merge pull request 'Fix bug in conversion from dedup json model to Spark Dataset of Rows (instanceTypeMatch no longer working)' ( #339 ) from fix_dedupfailsonmatchinginstances into master
...
Reviewed-on: #339
2023-10-02 11:34:20 +02:00
Claudio Atzori
fa692b3629
Merge branch 'master' into fix_dedupfailsonmatchinginstances
2023-10-02 11:28:16 +02:00
Claudio Atzori
5d09b7db8b
Merge pull request 'SparkPropagateRelation relations do not propagate deletedByInference and invisible' ( #333 ) from consistency_keep_mergerels into beta
...
Reviewed-on: #333
2023-10-02 11:27:57 +02:00
Claudio Atzori
7b403a920f
Merge branch 'beta' into consistency_keep_mergerels
2023-10-02 11:26:00 +02:00
Claudio Atzori
dc86018a5f
Merge branch 'merge_entities_job' into beta
2023-10-02 11:24:48 +02:00
Giambattista Bloisi
3c47920c78
Use asScala to convert java List to Scala Sequence
2023-10-02 11:04:47 +02:00
Claudio Atzori
7f244d9a7a
code formatting
2023-10-02 11:04:36 +02:00
Giambattista Bloisi
e239b81740
Fix defect #8997 : GenerateEventsJob is generating huge amounts of logs because broker entity similarity calculation consistently failed
2023-10-02 11:04:18 +02:00
Claudio Atzori
ef02648399
Merge pull request 'fixed dedup configuration management in the Broker workflow' ( #341 ) from fix_8997 into master
...
Reviewed-on: #341
2023-10-02 11:03:50 +02:00
Claudio Atzori
d13bb534f0
Merge branch 'master' into fix_8997
2023-10-02 11:03:18 +02:00
Miriam Baglioni
e84f5b5e64
extended existing code to accommodate import of POCI from open citation
2023-10-02 09:25:16 +02:00
Serafeim Chatzopoulos
ab0d70691c
Add step for archiving repoUrls to SWH
2023-09-28 20:56:18 +03:00
Giambattista Bloisi
775c3f704a
Fix bug in conversion from dedup json model to Spark Dataset of Rows: list of strings contained the json escaped representation of the value instead of the plain value, this caused instanceTypeMatch failures because of the leading and trailing double quotes
2023-09-27 22:30:47 +02:00
Serafeim Chatzopoulos
ed9c81a0b7
Add steps to collect last visit data && archive not found repository URLs
2023-09-27 19:00:54 +03:00
Sandro La Bruzzo
9c3ab11d5b
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2023-09-25 15:29:19 +02:00
Sandro La Bruzzo
423ef30676
minor fix on the aggregation of uniprot and pdb
2023-09-25 15:28:58 +02:00
Giambattista Bloisi
7152d47f84
Use asScala to convert java List to Scala Sequence
2023-09-20 16:14:27 +02:00
Claudio Atzori
4853c19b5e
code formatting
2023-09-20 15:53:21 +02:00
Giambattista Bloisi
1f226d1dce
Fix defect #8997 : GenerateEventsJob is generating huge amounts of logs because broker entity similarity calculation consistently failed
2023-09-20 15:42:00 +02:00
Alessia Bardi
0935d7757c
Use v5 of the UNIBI Gold ISSN list in test
2023-09-20 15:41:35 +02:00
Alessia Bardi
cc7204a089
tests for d4science catalog
2023-09-20 15:38:32 +02:00
Sandro La Bruzzo
76476cdfb6
Added maven repo for dependencies that are not in maven central
2023-09-20 10:33:14 +02:00
Alessia Bardi
6186cdc2cc
Use v5 of the UNIBI Gold ISSN list in test
2023-09-19 14:47:01 +02:00
Alessia Bardi
d94b9bebf7
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-09-19 13:38:45 +02:00
Alessia Bardi
19abba8fa7
tests for d4science catalog
2023-09-19 13:38:25 +02:00
dimitrispie
9ef971a146
Update step16-createIndicatorsTables.sql
...
Fix int year for:
indi_org_openess_year
indi_org_fairness_year
indi_org_findable_year
2023-09-19 14:25:42 +03:00
Serafeim Chatzopoulos
9d44418d38
Add collecting software code repository URLs
2023-09-14 18:43:25 +03:00
Serafeim Chatzopoulos
395a4af020
Run CC and RAM sequentially in dhp-impact-indicators WF
2023-09-13 08:59:40 +02:00
Claudio Atzori
c2f179800c
Merge pull request 'Run CC and RAM sequentially in dhp-impact-indicators WF' ( #338 ) from run_cc_and_ram_sequentially into master
...
Reviewed-on: #338
2023-09-13 08:52:53 +02:00
Serafeim Chatzopoulos
2aed5a74be
Run CC and RAM sequentially in dhp-impact-indicators WF
2023-09-12 22:31:50 +03:00
Claudio Atzori
8a6892cc63
[graph dedup] consistency wf should not remove the relations while dispatching the entities
2023-09-12 21:27:05 +02:00
Claudio Atzori
4dc4862011
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-09-12 14:34:34 +02:00
Claudio Atzori
dc80ab14d3
[graph dedup] consistency wf should not remove the relations while dispatching the entities
2023-09-12 14:34:28 +02:00
Alessia Bardi
77a2199837
updated test for EOSC community
2023-09-08 11:05:49 +02:00
Claudio Atzori
4786aa0e09
added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map
2023-09-07 11:21:07 +02:00
Claudio Atzori
265180bfd2
added Archive ouverte UNIGE (ETHZ.UNIGENF, opendoar____::1400) to the Datacite hostedBy_map
2023-09-07 11:20:35 +02:00
dimitrispie
5f90cc11e9
Update step16-createIndicatorsTables.sql
...
Fix indi_pub_bronze_oa
2023-09-06 14:14:38 +03:00
Claudio Atzori
da0e9828f7
resolved conflicts for PR#337
2023-09-06 11:28:46 +02:00
Claudio Atzori
9f5d16624c
Merge pull request '[graph raw] datainfo.invisible set as true only for entities' ( #336 ) from invisible_relations into beta
...
Reviewed-on: #336
2023-09-04 16:14:47 +02:00
Claudio Atzori
adec6692ca
Merge branch 'beta' into invisible_relations
2023-09-04 16:13:06 +02:00
Claudio Atzori
15666e86a8
added collectedfrom to the affiliation relations imported from Crossref
2023-09-04 15:56:06 +02:00
Claudio Atzori
7d6bd4f20b
Merge pull request 'Fix import of affiliations relations from Crossref' ( #335 ) from 8876_fix_crossref_affiliation_relations_import into beta
...
Reviewed-on: #335
2023-09-04 15:19:58 +02:00
Claudio Atzori
5b06c9d06f
[graph raw] datainfo.invisible set as true only for entities
2023-09-04 15:15:24 +02:00
Serafeim Chatzopoulos
7de0164c26
Fix import of affiliations relations from Crossref
2023-09-04 16:04:41 +03:00
Giambattista Bloisi
2caaaec42d
Include SparkCleanRelation logic in SparkPropagateRelation
...
SparkPropagateRelation includes merge relations
Revised tests for SparkPropagateRelation
2023-09-04 11:33:20 +02:00
dimitrispie
964c2f553e
Changes in indicators step, monitor step
...
- graduatedoctorates for observatory
- result_apc_affiliations table
- new indicators
indi_is_funder_plan_s
indi_funder_fairness
indi_ris_fairness
indi_funder_openess
indi_ris_openess
indi_funder_findable
indi_ris_findable
indi_is_project_result_after
- cast year to int in composite indicators
- new institutions
-- Universidade Católica Portuguesa
-- Iscte - Instituto Universitário de Lisboa
-- Munster Technological University
-- Cardiff University
-- Leibniz Institute of Ecological Urban and Regional Development
2023-09-01 10:57:02 +03:00
Giambattista Bloisi
6cc7d8ca7b
GroupEntities and DispatchEntities are now merged in GroupEntitiesSparkJob
2023-08-30 10:43:31 +02:00
Claudio Atzori
488d9a1cea
Merge pull request 'Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set it is defaulted to 1Gb' ( #331 ) from consistencywf_memoryoverhead_conf into beta
...
Reviewed-on: #331
2023-08-29 16:31:36 +02:00
Giambattista Bloisi
6b1c05d118
Add sparkExecutorMemoryOverhead workflow config to set off-heap memory for Spark actions. If not explicitly set it is defaulted to 1Gb
2023-08-29 16:04:19 +02:00
Claudio Atzori
bf35280ea6
code formatting
2023-08-29 11:11:00 +02:00
Claudio Atzori
0515d81c7c
Merge pull request 'Rewrite SparkPropagateRelation exploiting Dataframe API' ( #330 ) from propagate_relation_rewrite into beta
...
Reviewed-on: #330
2023-08-29 10:47:14 +02:00
Claudio Atzori
58665a246c
Merge branch 'beta' into propagate_relation_rewrite
2023-08-29 10:47:02 +02:00
Claudio Atzori
f437be80ad
[impact indicators] adjusted paths in the bip ranker wf parameters
2023-08-29 09:03:03 +02:00
Giambattista Bloisi
d012aec0b3
Revert PropagateRelation's argument name from outputPath to graphOutputPath in consistency workflow ( #8964 )
2023-08-28 22:44:54 +02:00
Giambattista Bloisi
a860e19423
Fix: ensure all relations are written out, not only those managed by dedup
2023-08-28 15:36:02 +02:00
Giambattista Bloisi
0d7b2bf83d
Rewrite SparkPropagateRelation exploiting Dataframe API
2023-08-28 10:34:54 +02:00
Miriam Baglioni
9c8b41475a
Merge pull request '8172_impact_indicators_workflow' ( #284 ) from 8172_impact_indicators_workflow into beta
...
Reviewed-on: #284
2023-08-14 15:50:48 +02:00
Serafeim Chatzopoulos
97c1ba8918
Merge actionsets of results and projects
2023-08-11 15:56:53 +03:00
Miriam Baglioni
35b8deb2c6
Merge pull request 'DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag' ( #329 ) from dispatch_filter_invisible_entities into beta
...
Reviewed-on: #329
2023-08-10 12:56:18 +02:00
Giambattista Bloisi
95cd2b9b1e
Make filterInvisible a mandatory parameter of DispatchEntitiesSparkJob
...
Make filterInvisible a mandatory parameter of both dedup/consistency and graph/group oozie workflows
2023-08-10 11:53:48 +02:00
Giambattista Bloisi
fab9920271
DispatchEntitiesSparkJob: manage all entity types together, support filtering by dataInfo.invisible flag
2023-08-09 15:41:43 +02:00
Miriam Baglioni
599828ce35
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-08-09 13:07:13 +02:00
Miriam Baglioni
c25ac21e5e
Merge pull request 'graph cleaning, suggestions from ticket 8898' ( #325 ) from cleaning_8898 into beta
...
Reviewed-on: #325
2023-08-08 11:14:19 +02:00
Miriam Baglioni
c334fe2438
Merge pull request 'Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities' ( #328 ) from cleanup_relations_after_dedup into beta
...
Reviewed-on: #328
2023-08-08 09:49:12 +02:00
Miriam Baglioni
0e2f855807
Merge pull request 'Updates Promotion DBs' ( #321 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #321
2023-08-07 12:09:16 +02:00
Miriam Baglioni
18fbe52b20
Merge pull request 'Import affiliation relations from Crossref' ( #320 ) from 8876 into beta
...
Reviewed-on: #320
2023-08-07 10:45:30 +02:00
Giambattista Bloisi
97b6d1dc45
Filter ids by dataInfo.deletedbyinference and DataInfo.invisible flags
...
Filter relations also by dataInfo.invisible flag
2023-08-07 10:24:11 +02:00
Giambattista Bloisi
af49424b59
Add a "CleanRelation" action after the PropagateRelation to filter out all relations that have been deleted by inference or that are pointing to dangling entities
2023-08-04 14:27:39 +02:00
Claudio Atzori
0bc74e2000
code formatting
2023-08-02 11:52:10 +02:00
Claudio Atzori
7180911ded
[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests
2023-08-02 11:44:14 +02:00
Claudio Atzori
b9dddbfe54
rule out records with NULL dataInfo, except for Relations
2023-07-31 17:53:54 +02:00
Claudio Atzori
da1727f93f
rule out records with NULL dataInfo, except for Relations
2023-07-31 17:52:56 +02:00
Claudio Atzori
11ffb9bd68
rule out records with NULL dataInfo
2023-07-31 12:35:33 +02:00
Claudio Atzori
ccac6a7f75
rule out records with NULL dataInfo
2023-07-31 12:35:05 +02:00
Serafeim Chatzopoulos
7cefe2665b
Remove unnecessary classes
2023-07-28 19:14:39 +03:00
Serafeim Chatzopoulos
26a92ce762
Merge branch '8876' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8876
2023-07-28 19:03:57 +03:00
Serafeim Chatzopoulos
ebfba38ab6
Add changes from code review
2023-07-28 19:03:47 +03:00
Serafeim Chatzopoulos
eb8684a8cf
Merge branch 'beta' into 8876
2023-07-28 13:39:33 +02:00
Claudio Atzori
1275a07d45
Merge pull request '[graph indexing] expand the instance level fulltext in the XML records' ( #326 ) from instance_fulltext_xml into beta
...
Reviewed-on: #326
2023-07-27 15:02:07 +02:00
Claudio Atzori
a72b9e96ac
expand the instance level fulltext in the XML records
2023-07-27 14:57:38 +02:00
Claudio Atzori
d512df8612
code formatting
2023-07-26 09:14:08 +02:00
Claudio Atzori
d8435a6512
inverted condition
2023-07-25 17:39:57 +02:00
Claudio Atzori
59764145bb
cherry picked & fixed commit 270df939c4
2023-07-25 17:39:00 +02:00
Claudio Atzori
270df939c4
partial implementation of the suggestions from https://support.openaire.eu/issues/8898
2023-07-25 17:29:50 +02:00
Claudio Atzori
8c63e4a864
Merge pull request 'Refactor Dedup using Spark Dataframe API, initial support for scala 2.12 and Spark 3.4' ( #324 ) from dedup-with-dataframe-2 into beta
...
Reviewed-on: #324
2023-07-25 10:17:17 +02:00
Giambattista Bloisi
e64c2854a3
Refactor Dedup process to use Spark Dataframe API and intermediate representation with Row interface
...
JsonPath cache contention fixed by using a ConcurrentHashMap
Blacklist filtering performance improvement
Minor performance improvements when evaluating similarity
Sorting in clustered elements is deterministic (by ordering and identity field, instead of ordering field only)
2023-07-24 15:36:24 +02:00
Giambattista Bloisi
bb5b845e3c
Use scala.binary.version property to resolve scala maven dependencies
...
Ensure consistent usage of maven properties
Profile for compiling with scala 2.12 and Spark 3.4
2023-07-24 11:13:48 +02:00
Claudio Atzori
002b24e06f
Merge pull request '[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests' ( #315 ) from pid_cleaning into beta
...
Reviewed-on: #315
2023-07-24 10:49:44 +02:00
Claudio Atzori
c754397a19
Merge branch 'beta' into pid_cleaning
2023-07-24 10:49:31 +02:00
Claudio Atzori
f0678cda09
Merge pull request 'fix_beta_tests' ( #323 ) from fix_beta_tests into beta
...
Reviewed-on: #323
2023-07-24 10:47:35 +02:00
Serafeim Chatzopoulos
3a0f09774a
Add script to find score limits
2023-07-21 17:55:41 +03:00
Ilias Kanellos
06b9b71c4e
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-07-21 17:42:49 +03:00
Ilias Kanellos
2374f445a9
Produce additional bip update specific files
2023-07-21 17:42:46 +03:00
Serafeim Chatzopoulos
cb0f3c50f6
Format workflow.xml
2023-07-21 16:07:10 +03:00
Serafeim Chatzopoulos
c64e5e588f
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-07-21 15:27:02 +03:00
Serafeim Chatzopoulos
2cc5b1a39b
Fixes in workflow.xml
2023-07-21 15:26:50 +03:00
Ilias Kanellos
0f96af5d56
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-07-21 13:42:35 +03:00
Ilias Kanellos
03da965162
Format bip-score based file without doi references
2023-07-21 13:42:30 +03:00
Giambattista Bloisi
f03153823a
Update testCitationRelations number of expected citations according to changes made in 0559d8b4
(monodirectional citations)
2023-07-21 10:48:28 +02:00
Giambattista Bloisi
54c1eacef1
SparkJobTest was failing because the testing workingdir was not cleaned up after each test
2023-07-21 10:42:24 +02:00
Giambattista Bloisi
5e15f20e6e
Fix entityMerger that was excluding the authors of the first entity in the list to merge
2023-07-21 00:46:54 +02:00
Giambattista Bloisi
0210a14e43
Ignore timestamp differences in PromoteActionPayloadForGraphTableJobTest
2023-07-20 23:45:57 +02:00
Giambattista Bloisi
dba34505de
Fix SparkStatsTest bug where parquet tables were incorrectly read as text files leading to unpredictable count() values
2023-07-19 14:24:52 +02:00
Giambattista Bloisi
e47ed1fdb2
Use DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES in json mapper to avoid that tests fail if they encounter unmapped properties
2023-07-19 14:21:40 +02:00
Giambattista Bloisi
38dfebfbe6
Disable MdStoreClientTest test as it requires a local mongodb running and it does not perform any assertions
2023-07-19 14:18:56 +02:00
Miriam Baglioni
9e8e39f78a
-
2023-07-19 11:35:58 +02:00
Claudio Atzori
373a5f2c83
Merge pull request 'Master branch updates from beta July 2023' ( #317 ) from master_july23 into master
...
Reviewed-on: #317
2023-07-18 18:22:04 +02:00
Serafeim Chatzopoulos
db4ca43ee8
Resolve conflict
2023-07-18 18:38:26 +03:00
Serafeim Chatzopoulos
be320ba3c1
Indentation fixes
2023-07-17 16:04:21 +03:00
dimitrispie
be4856ef35
Update step15.sql
2023-07-17 15:33:58 +03:00
Serafeim Chatzopoulos
bc1a4611aa
Minor changes
2023-07-17 11:17:53 +03:00
Claudio Atzori
8af129b0c7
merged stats promotion step from antonis/promotion-prod-only
2023-07-13 15:03:28 +02:00
dimitrispie
706092bc19
Update updateProductionViews.sh
2023-07-13 15:48:12 +03:00
dimitrispie
aedd279f78
Updates Promotion DBs
...
- Add a step for promoting the split monitor DBs
2023-07-13 15:35:46 +03:00
dimitrispie
163b2ee2a8
Changes
...
1. Monitor updates
2. Bug fixes during copy to impala cluster
2023-07-13 15:25:00 +03:00
dimitrispie
76901a25f9
Updates Promotion DBs
...
- Add a step for promoting the split monitor DBs
2023-07-12 22:49:08 +03:00
Giambattista Bloisi
ef493681d9
Merge pull request 'Import dnet-pace-core module in this project and use it after renaming to dhp-pace-core' ( #319 ) from beta_with_pace_core into beta
...
Reviewed-on: #319
2023-07-11 14:03:15 +02:00
Serafeim Chatzopoulos
4eba14a80e
Add oozie workflow
2023-07-06 21:07:50 +03:00
Serafeim Chatzopoulos
c2998a14e8
Add basic tests for affiliation relations
2023-07-06 20:28:16 +03:00
Serafeim Chatzopoulos
bc7b00bcd1
Add bi-directional affiliation relations
2023-07-06 18:29:15 +03:00
Serafeim Chatzopoulos
12528ed2ef
Refactor PrepareAffiliationRelations.java to use OafMapperUtils common functions
2023-07-06 18:08:33 +03:00
Serafeim Chatzopoulos
bbc245696e
Prepare actionsets for BIP affiliations
2023-07-06 15:56:12 +03:00
Ilias Kanellos
0c433eccdd
Fix scores & Workflow
2023-07-06 15:06:28 +03:00
Ilias Kanellos
d5c39a1059
Fix map scores to doi
2023-07-06 15:04:48 +03:00
Ilias Kanellos
772d5f0aab
Make PR and AttRank serial
2023-07-06 13:47:51 +03:00
Giambattista Bloisi
801da2fd4a
New sources formatted by maven plugin
2023-07-06 10:28:53 +02:00
Giambattista Bloisi
bd3fcf869a
rename dnet-pace-core into dhp-pace-core module and use it as dependency in other modules
2023-07-06 10:02:23 +02:00
Serafeim Chatzopoulos
347a889b20
Read affiliation relations
2023-07-06 00:51:01 +03:00
Giambattista Bloisi
3b35db5fbd
Import dnet-pace-core module from dnet-dedup repository
2023-07-05 22:23:06 +02:00
Miriam Baglioni
8dcd028eed
[UsageCount] fixed typo in attribute name for datasource table
2023-07-01 16:07:22 +02:00
Miriam Baglioni
8621377917
[UsageCount] fixed typo in attribute name for datasource table
2023-06-30 19:02:44 +02:00
Miriam Baglioni
ef2dd7a980
resolved conflicts
2023-06-30 18:59:47 +02:00
Miriam Baglioni
7738372125
[UsageCount] fixed typo in attribute name for datasource table
2023-06-30 18:56:41 +02:00
Sandro La Bruzzo
9963fd6d29
updated log to add subentity
2023-06-28 13:36:05 +02:00
Claudio Atzori
f3a85e224b
merged from branch beta the bulk tagging (single step, negative constraints), the cleaning workflow (single step, pid type based cleaning), instance level fulltext
2023-06-28 13:33:57 +02:00
Claudio Atzori
4ef0f2ec26
added dependency commons-validator:commons-validator:1.7
2023-06-28 13:32:01 +02:00
Sandro La Bruzzo
ed7e2ab6d1
reverted mistake on commit workflow.xml
2023-06-28 11:40:19 +02:00
Sandro La Bruzzo
9910ce06ae
added to CreateSimRel the feature to write time log
2023-06-28 11:38:16 +02:00
Miriam Baglioni
2717edafb7
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-06-28 11:25:14 +02:00
Miriam Baglioni
2f04c9d149
[BulkTagging] fixing left over for test
2023-06-28 11:24:42 +02:00
Sandro La Bruzzo
bd17c3edc8
added to CreateSimRel the feature to write time log
2023-06-28 11:20:58 +02:00
Sandro La Bruzzo
b195da3a83
Added utility to write time logs during the deduplication phase
2023-06-28 11:20:09 +02:00
Claudio Atzori
288ec0b7d6
[doiboost] merged workflow from branch beta
2023-06-28 09:15:37 +02:00
Claudio Atzori
5f32edd9bf
adopting dhp-schema:3.17.1
2023-06-27 16:57:17 +02:00
Claudio Atzori
e10ce92fe5
[stats wf] merged workflows from branch beta
2023-06-27 14:32:48 +02:00
Claudio Atzori
b93e1541aa
Merge pull request 'update sql query to return distinct pids' ( #301 ) from distinct_pids_from_openorgs into master
...
Reviewed-on: #301
2023-06-27 12:24:47 +02:00
Claudio Atzori
d029bf0b94
Merge branch 'master' into distinct_pids_from_openorgs
2023-06-27 12:24:35 +02:00
Claudio Atzori
0f5a819f44
[graph cleaning] fixed regex behaviour for cleaning ROR and GRID identifiers, added tests
2023-06-23 16:10:49 +02:00
Serafeim Chatzopoulos
60f25b780d
Minor fixes in workflow.xml and job.properties
2023-06-23 12:51:50 +03:00
Michele Artini
88a1cbc37d
fixed a datasource id
2023-06-22 07:56:33 +02:00
Michele Artini
009d7f312f
fixed a datasource Id
2023-06-21 16:17:34 +02:00
Miriam Baglioni
e4b27182d0
[master] refactoring
2023-06-21 11:15:53 +02:00
Claudio Atzori
b0ebf56367
Merge pull request 'Update step15_5.sql' ( #314 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #314
2023-06-21 10:33:22 +02:00
dimitrispie
2b6370eaee
Update step15_5.sql
...
Bug fix
2023-06-21 11:31:10 +03:00
Claudio Atzori
35e42a86ed
Merge pull request 'Update step15_5.sql' ( #313 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #313
2023-06-21 10:26:16 +02:00
dimitrispie
74cb060bfe
Update step15_5.sql
...
Add "if not exists" clause
2023-06-21 11:24:06 +03:00
Claudio Atzori
85e016df17
Merge pull request 'Update step16-createIndicatorsTables.sql' ( #312 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #312
2023-06-21 09:52:33 +02:00
dimitrispie
a475cfcb7b
Update step16-createIndicatorsTables.sql
...
Rename a field in indi_pub_interdisciplinarity
2023-06-21 10:42:02 +03:00
Claudio Atzori
979cf9cd87
Merge pull request 'Update step15.sql' ( #311 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #311
2023-06-21 09:20:01 +02:00
dimitrispie
4648cd88d4
Update step15.sql
...
Cast score to double
2023-06-21 10:02:19 +03:00
dimitrispie
94d2573c77
Update step15.sql
...
Bug Fix
2023-06-21 09:22:39 +03:00
Claudio Atzori
0561362de2
Merge pull request 'Update step20-createMonitorDB_institutions.sql' ( #309 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #309
2023-06-20 15:07:09 +02:00
Claudio Atzori
50d7dc0078
[graph enrichment] fixed projectOrganizationPath not being passed to the apply_resulttoorganization_propagation node
2023-06-19 15:42:44 +02:00
Claudio Atzori
fbd9bf704e
indent
2023-06-19 15:41:22 +02:00
Giambattista Bloisi
758e662ab8
Revert "REmove duplicated code and ensure that load and initialization is done through "DedupConfig.load" method"
...
This reverts commit 485f9d18cb.
2023-06-19 13:08:10 +02:00
Giambattista Bloisi
485f9d18cb
REmove duplicated code and ensure that load and initialization is done through "DedupConfig.load" method
2023-06-19 13:00:02 +02:00
Claudio Atzori
6210f6ee48
Merge pull request 'Precompile blacklists patterns before evaluating clustering criteria' ( #1 ) from optimized-clustering into master
...
Reviewed-on: D-Net/dnet-dedup#1
2023-06-19 12:43:49 +02:00
dimitrispie
be2caedb04
Update step20-createMonitorDB_institutions.sql
...
Add openorgs____::1624ff7c01bb641b91f4518539a0c28a Vrije Universiteit Amsterdam
2023-06-19 12:12:17 +03:00
dimitrispie
36e0a8fec4
Changes to Promotion Stats WF
...
1. Add new cluster host at impala-shell commands
2. Add a step for splitting monitor dbs
3. Update workflow.xml to include the new splitting monitor dbs step
2023-06-19 09:44:34 +03:00
Giambattista Bloisi
b0ade43608
Precompile blacklists patterns before evaluating clustering criteria
...
Enable Junit 5 tests in maven builds
Make path comparisons platform-independent
Read String resource files assuming they are encoded in UTF-8
Fix a few test conditions
2023-06-16 09:41:11 +02:00
dimitrispie
4c770a5e29
Update finalizeImpalaCluster.sh
...
Drop views in shadow dbs before dropping the db
2023-06-15 13:25:37 +03:00
dimitrispie
e06d962a6a
Update step15.sql
2023-06-15 12:20:35 +03:00
dimitrispie
afcad08396
Update step20-createMonitorDB_institutions.sql
...
Added openorgs____::c0b262bd6eab819e4c994914f9c010e2 -- National Institute of Geophysics and Volcanology
2023-06-15 10:28:49 +03:00
Claudio Atzori
b9748763e2
Merge pull request '[stats wf] Bug fixes' ( #308 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #308
2023-06-14 21:57:03 +02:00
dimitrispie
42b8ce2ba4
Update copyDataToImpalaCluster.sh
2023-06-14 19:23:42 +03:00
dimitrispie
2032b0df40
Bug fixes
...
1. Remove tables/views from old databases in the new cluster, before dropping the dbs
2. Fix id in result_accessroute, indi_impact_measures, indi_pub_bronze_oa
2023-06-14 19:09:09 +03:00
Michele Artini
a92206dab5
re-added the name of a column (pid)
2023-06-13 11:43:10 +02:00
Claudio Atzori
b76a47b103
[aggregator graph] added column alias when mapping organization PIDs from the OpenOrgs database
2023-06-13 11:38:10 +02:00
Claudio Atzori
744a61a030
depending on dhp-schema:3.17.1
2023-06-12 13:49:44 +02:00
Claudio Atzori
2e4616a251
Merge pull request '[graph cleaning] pid cleaning' ( #307 ) from pid_cleaning into beta
...
Reviewed-on: #307
2023-06-12 13:32:29 +02:00
Claudio Atzori
d6a8b24711
Merge branch 'beta' into pid_cleaning
2023-06-12 13:32:22 +02:00
Claudio Atzori
fdbfb25614
Merge pull request 'update sql query to return distinct pids [beta]' ( #306 ) from distinct_pids_from_openorgs_beta into beta
...
Reviewed-on: #306
2023-06-12 09:59:00 +02:00
Claudio Atzori
ad04f14b81
Merge branch 'beta' into distinct_pids_from_openorgs_beta
2023-06-12 09:58:21 +02:00
Claudio Atzori
a98e6591e2
Merge pull request 'propagation of projects through parent-child relations' ( #299 ) from propagationProjectThroughParentChils into beta
...
Reviewed-on: #299
2023-06-12 09:57:20 +02:00
Claudio Atzori
55f002f1e9
Merge branch 'beta' into propagationProjectThroughParentChils
2023-06-12 09:56:53 +02:00
Claudio Atzori
daa21ddbb5
Merge pull request '[aggregator graph] validation for URLs from oaf:fulltext' ( #298 ) from fulltext_url_validation into beta
...
Reviewed-on: #298
2023-06-12 09:55:35 +02:00
Claudio Atzori
4b00a76271
Merge branch 'beta' into fulltext_url_validation
2023-06-12 09:55:25 +02:00
Claudio Atzori
eb2fa8556b
Merge pull request 'removeTaggingCondition' ( #297 ) from removeTaggingCondition into beta
...
Reviewed-on: #297
2023-06-12 09:53:05 +02:00
Claudio Atzori
de225c71cd
Merge branch 'beta' into removeTaggingCondition
2023-06-12 09:50:40 +02:00
Claudio Atzori
e1409ffe80
update sql query to return distinct pids
2023-06-12 09:47:45 +02:00
Claudio Atzori
1d33074fd1
WIP: pid cleaning
2023-06-09 16:47:25 +02:00
Miriam Baglioni
d9506035e4
[ZenodoApi] gone back to okhttp3 to send the payload.
2023-06-09 12:05:02 +02:00
Claudio Atzori
da7b66c542
Merge pull request '[stats wf] Added memory to hive' ( #305 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #305
2023-06-08 08:58:48 +02:00
dimitrispie
c5f42c7f5b
Added memory to hive
2023-06-07 18:18:23 +03:00
Claudio Atzori
afb76ebf0f
Merge pull request '[stats wf] Bug fix on indicators step' ( #304 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #304
2023-06-07 16:49:09 +02:00
dimitrispie
fa24e2e18f
Bug fix on indicators step
...
indi_pub_gold_oa table was missing during the creation of other indicators
2023-06-07 17:43:37 +03:00
Claudio Atzori
01c67e697d
Merge pull request '[ stats wf] Bug fix' ( #303 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #303
2023-06-07 14:41:44 +02:00
dimitrispie
28272c1b0e
Bug fix
2023-06-07 15:34:01 +03:00
Alessia Bardi
d5be6a13e9
Updated officialname of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal
2023-06-06 14:43:32 +02:00
Alessia Bardi
118e72d7db
Updated officialname of pangaea in hostedbymap for Datacite to avoid duplicate entries in the source filter of the portal
2023-06-06 14:39:12 +02:00
Alessia Bardi
5befd93d7d
test records for Solr indexing
2023-06-06 14:34:33 +02:00
Michele Artini
cae92cf811
update sql query to return distinct pids
2023-06-06 14:06:06 +02:00
Claudio Atzori
8f651f1225
Merge pull request 'Changes to beta stats wf' ( #300 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #300
2023-06-06 11:41:36 +02:00
dimitrispie
ad07fbf053
Add names to organizations for collaboration indicators
2023-06-02 14:13:10 +03:00
dimitrispie
2324670714
Split Monitor DBs-Interdisciplinarity indicators
...
- Split DBs Monitor for faster rendering of visualizations
- Add interdisciplinarity indicators from result_fos
2023-06-02 13:34:16 +03:00
Miriam Baglioni
daf4d7971b
refactoring
2023-05-31 18:56:58 +02:00
Miriam Baglioni
97d72d41c3
finalization of implementation and testing
2023-05-31 18:53:22 +02:00
Miriam Baglioni
0389b57ca7
added propagation for project to organization
2023-05-31 11:06:58 +02:00
Claudio Atzori
e45777e7e1
[aggregator graph] added validation for URLs mapped from oaf:fulltext
2023-05-26 11:33:42 +02:00
dimitrispie
ebe586b1d1
Impact indicators/Unpaywall
...
- Added Impact indicators
- Added unpaywall open access colours
2023-05-26 10:25:28 +03:00
dimitrispie
d6102dd576
Update step16-createIndicatorsTables.sql
...
- Add org names to indi_project_collab_org
- Add indi_pub_bronze_oa
- Changes to indi_pub_hybrid_oa_with_cc
2023-05-25 14:52:34 +03:00
Miriam Baglioni
9097e71853
Added assertion in test
2023-05-24 16:30:53 +02:00
Miriam Baglioni
9567c13bc3
refactoring
2023-05-24 16:20:05 +02:00
Miriam Baglioni
b64a5eb4a5
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-05-24 15:21:58 +02:00
Miriam Baglioni
34172455d1
[BulkTag] Adding remove constraints to specify when a community must not appear in the context of a result.
2023-05-24 09:56:23 +02:00
Ilias Kanellos
a1b9187039
Fix syntax error on workflow.xml
2023-05-23 17:17:12 +03:00
Ilias Kanellos
6a7e370a21
Remove unnecessary counts in graph creation
2023-05-23 16:48:58 +03:00
Ilias Kanellos
ec4e010687
End after rankings | Create graph debugged
2023-05-23 16:44:04 +03:00
Claudio Atzori
654ffcba60
Merge pull request '[UsageCount] addition of usagecount for Projects and datasources' ( #296 ) from master_datasource_project_usagecounts into master
...
Reviewed-on: #296
2023-05-22 16:13:24 +02:00
Claudio Atzori
db625e548d
[UsageCount] addition of usagecount for Projects and datasources
2023-05-22 15:00:46 +02:00
Alessia Bardi
04141fe259
tests for records from D4Science catalogues
2023-05-19 14:28:24 +02:00
Claudio Atzori
a235d2a24a
Merge pull request 'Updates to steps related to transfer data to impala cluster' ( #295 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #295
2023-05-18 08:46:15 +02:00
dimitrispie
86f4f63daf
Updates to steps related to transfer data to impala cluster
...
1. Remove external table definitions in stats_ext
2. Fix the issue where some views are not created.
3. Added two workflow parameters for copying also the usage stats dbs
2023-05-18 09:33:05 +03:00
Claudio Atzori
909729a2fc
[dedup] tweaking num partitions, minor changes
2023-05-17 10:16:22 +02:00
Ilias Kanellos
38020e242a
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-05-16 17:34:53 +03:00
Ilias Kanellos
3d69f33c84
Fix selection of columns in graph creation
2023-05-16 17:34:42 +03:00
Ilias Kanellos
3c38f7ba6f
Fix selection of columns in graph creation
2023-05-16 17:32:53 +03:00
Serafeim Chatzopoulos
8ef718c363
Fix workflow application path
2023-05-16 16:28:48 +03:00
Serafeim Chatzopoulos
26328e2a0d
Move job.properties
2023-05-16 14:39:53 +03:00
Serafeim Chatzopoulos
4eec3e7052
Add jobTracker, nameNode && spark2Lib as global params in oozie wf
2023-05-15 22:28:48 +03:00
Serafeim Chatzopoulos
b83135c252
Add missing kill nodes in workflow.xml
2023-05-15 19:55:35 +03:00
Serafeim Chatzopoulos
45f2aa0867
Move end node ... at the end in workflow.xml
2023-05-15 17:52:20 +03:00
Claudio Atzori
e309688711
Merge pull request 'fix APC affiliation links' ( #294 ) from apc_affiliation into beta
...
Reviewed-on: #294
2023-05-15 15:47:57 +02:00
Claudio Atzori
8acad52a0c
Merge branch 'beta' into apc_affiliation
2023-05-15 15:47:33 +02:00
Claudio Atzori
8a463cc3e8
fixed organization id created when mapping APC affiliations. Factored out ROR constants in dhp-common
2023-05-15 15:44:46 +02:00
Serafeim Chatzopoulos
12a57e1f58
Resolve conflicts
2023-05-15 16:20:11 +03:00
Serafeim Chatzopoulos
82e2a96f51
Resolve conflicts
2023-05-15 15:53:12 +03:00
Serafeim Chatzopoulos
b8e8c959fe
Update workflow.xml && job.properties
2023-05-15 15:50:23 +03:00
Ilias Kanellos
4a905932a3
Spark properties from job.properties
2023-05-15 15:24:22 +03:00
Claudio Atzori
0c314d5e09
Merge pull request 'Update copyDataToImpalaCluster.sh' ( #293 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #293
2023-05-15 12:05:54 +02:00
Serafeim Chatzopoulos
07818131ef
Update documentation
2023-05-15 13:04:44 +03:00
dimitrispie
b3f9633205
Update copyDataToImpalaCluster.sh
...
Added option --user to impala-shell command
2023-05-15 12:51:44 +03:00
Miriam Baglioni
021321ae06
Merge pull request 'removed the inverse of the Citing relation' ( #292 ) from citeOnly into beta
...
Reviewed-on: #292
2023-05-15 11:37:39 +02:00
Miriam Baglioni
78b07400c0
changed test classes
2023-05-15 11:37:08 +02:00
Miriam Baglioni
86fe886c1a
removed the inverse of the Citing relation
2023-05-15 11:20:51 +02:00
Ilias Kanellos
1788ac2d4d
Correct filtering for MAG records
2023-05-12 12:55:43 +03:00
Miriam Baglioni
12cd179d2d
Merge pull request 'Update copyDataToImpalaCluster.sh' ( #291 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #291
2023-05-12 11:36:34 +02:00
dimitrispie
00d0d162b6
Update copyDataToImpalaCluster.sh
...
Added a temporary folder to copy the files to avoid permission issues
2023-05-12 12:31:13 +03:00
Ilias Kanellos
5ddbb4ad10
Spark properties no longer hardcoded
2023-05-11 15:36:47 +03:00
Ilias Kanellos
3de35fd6a3
Produce 5 classes of ranking scores
2023-05-11 14:42:25 +03:00
Miriam Baglioni
8c05f49665
moved the version as it was before the change
2023-05-09 10:48:34 +02:00
Miriam Baglioni
99ac5bab46
added check to avoid NPE when checking the organization country
2023-05-04 19:38:39 +02:00
Claudio Atzori
0704e186f6
Merge pull request 'Stats wf executed on hive only' ( #283 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #283
2023-05-02 14:05:12 +02:00
Claudio Atzori
cd80b200ee
Merge pull request 'Affiliation links from APC' ( #290 ) from apc_affiliation into beta
...
Reviewed-on: #290
2023-05-02 12:00:04 +02:00
Claudio Atzori
d8882c4481
extended mapping applied to datacite records to produce affiliations using the ROR ids. In case of APCs it includes the amount and the currency in the relation
2023-05-02 11:56:51 +02:00
Claudio Atzori
d02916ef82
code formatting
2023-05-02 11:05:37 +02:00
Claudio Atzori
f653640cd9
Merge pull request 'Bulk Tagging single step' ( #289 ) from bulkTagRefactor into beta
...
Reviewed-on: #289
2023-05-02 10:54:14 +02:00
dimitrispie
c3d58e58e1
Bug fixes
2023-05-02 11:54:07 +03:00
Claudio Atzori
abd7ca0c18
Merge branch 'beta' into bulkTagRefactor
2023-05-02 10:50:01 +02:00
Claudio Atzori
de36c7b083
Merge pull request 'Enrichment - result to community through organization' ( #255 ) from organizationToRepresentative into beta
...
Reviewed-on: #255
2023-05-02 10:47:07 +02:00
Claudio Atzori
45f625d14f
Merge branch 'beta' into organizationToRepresentative
2023-05-02 10:46:55 +02:00
Claudio Atzori
cdd33f7445
Merge pull request 'graph cleaning refactoring' ( #282 ) from graph_cleaning_refactoring into beta
...
Reviewed-on: #282
2023-05-02 10:40:02 +02:00
Claudio Atzori
de11edca98
Merge branch 'beta' into organizationToRepresentative
2023-05-02 09:59:41 +02:00
Claudio Atzori
851f664bd9
Merge branch 'beta' into graph_cleaning_refactoring
2023-05-02 09:55:40 +02:00
dimitrispie
e57ecdaf98
Update step20-createMonitorDB.sql
...
Add University of Manitoba
2023-04-30 17:52:23 +03:00
Ilias Kanellos
90332439ad
Remove deletion of synonym folder
2023-04-28 13:45:19 +03:00
Ilias Kanellos
a98da54896
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-04-28 13:23:49 +03:00
Ilias Kanellos
09485fbee3
Fixed unicode bug. Workflow ends after first script
2023-04-28 13:09:13 +03:00
Serafeim Chatzopoulos
614cc1089b
Add separate folder for results && project actionsets
2023-04-27 12:37:15 +03:00
Serafeim Chatzopoulos
815a4ddbba
Add actionset creation for project bip indicators in workflow
2023-04-26 20:40:06 +03:00
Serafeim Chatzopoulos
ee04cf92bf
Add actionsets for project impact indicators
2023-04-26 20:23:46 +03:00
Alessia Bardi
b88f009d9f
combined level 4 and 6 for the demo
2023-04-24 12:10:33 +02:00
Alessia Bardi
5ffe82ffd8
aligned to current DMF index layout on production
2023-04-24 12:09:55 +02:00
Alessia Bardi
1c173642f0
removed level5 from test records
2023-04-24 09:32:32 +02:00
dimitrispie
fdb5d2b39f
Bug fixes
2023-04-23 18:29:00 +03:00
dimitrispie
53ce023035
Bug fixes
2023-04-23 18:23:45 +03:00
Alessia Bardi
382f46a8e4
tests to generate the XML records for the index for the EDITH demo on digital twins, integrating output from the FoS classifier
2023-04-21 16:46:30 +02:00
Miriam Baglioni
ce03f3ee62
merging with branch beta
2023-04-20 14:50:47 +02:00
dimitrispie
4fa750b719
Bug fixes on monitor-update
2023-04-19 17:39:53 +03:00
dimitrispie
5247cb7115
Bug fix
2023-04-19 11:11:19 +03:00
Miriam Baglioni
9fc8ebe98b
refactoring
2023-04-19 09:32:13 +02:00
Miriam Baglioni
efc4f6a658
[bulkTag] refactor to enrich each result single step
2023-04-18 17:39:31 +02:00
Serafeim Chatzopoulos
23f58a86f1
Change jar param in project impact indicators action
2023-04-18 12:26:01 +03:00
Miriam Baglioni
73f77575bd
[ZenodoApiClient] align with master version
2023-04-18 10:25:27 +02:00
Miriam Baglioni
697a134504
-
2023-04-18 10:21:12 +02:00
Miriam Baglioni
6cc95c96a2
-
2023-04-18 09:53:11 +02:00
Miriam Baglioni
24c41806ac
[ZenodoApiClienttest] change test to mirror change in the implementation
2023-04-18 09:08:09 +02:00
Miriam Baglioni
087b5a7973
[ZenodiAPIClient] new version of the API to connect to Zenodo (change the http client)
2023-04-17 18:59:22 +02:00
Michele De Bonis
cb595c87bb
implementation of the support for authors deduplication: cosinesimilarity comparator and double array json parser
2023-04-17 11:06:27 +02:00
dimitrispie
25dafccc24
Merge branch 'hive' into beta
2023-04-12 11:36:59 +03:00
Claudio Atzori
688e3b7936
added eoscifguidelines in the result view; removed compute statistics statements
2023-04-11 11:45:56 +02:00
Claudio Atzori
2e465915b4
[graph to Solr] using dedicated sparkExecutorCores, sparkExecutorMemory, sparkDriverMemory in convert_to_xml
2023-04-11 10:43:44 +02:00
Claudio Atzori
a2dcb06daf
added eoscifguidelines in the result view; removed compute statistics statements
2023-04-11 10:43:32 +02:00
Serafeim Chatzopoulos
7256c8d3c7
Add script for aggregating impact indicators at the project level
2023-04-07 16:30:12 +03:00
dimitrispie
c85de8fa1f
-Added Technological University Dublin
...
-Added project_organization_contribution table
-Add Delft University of Technology
2023-04-07 09:22:59 +03:00
dimitrispie
9b41dff33c
Update step20-createMonitorDB.sql
...
Added Delft University of Technology
2023-04-07 09:21:38 +03:00
Claudio Atzori
4a4ca634f0
Merge pull request 'advConstraintsInBeta' ( #288 ) from advConstraintsInBeta into master
...
Reviewed-on: #288
2023-04-06 15:24:23 +02:00
Miriam Baglioni
932d07d2dd
[bulkTag] added filtering for datasources in eosctag
2023-04-06 15:08:27 +02:00
Miriam Baglioni
c6a7602b3e
refactoring after compilation
2023-04-06 14:45:01 +02:00
Miriam Baglioni
831055a1fc
change of the property for test purposes, addition of two new verbs, and fix of issue for advanced constraints
2023-04-06 14:41:32 +02:00
Miriam Baglioni
287753417d
better implementation for the fix
2023-04-06 12:22:38 +02:00
Miriam Baglioni
cf3d0f4f83
fixed issue on bulktagging for the advanced constraints
2023-04-06 12:17:35 +02:00
Miriam Baglioni
b42abc9904
fixed issue on bulktagging for the advanced constraints
2023-04-06 12:15:00 +02:00
dimitrispie
91e18ac7f4
Added project_organization_contribution table
2023-04-06 10:53:11 +03:00
Claudio Atzori
4f67225fbc
Merge pull request 'doiboostMappingExtention' ( #286 ) from doiboostMappingExtention into master
...
Reviewed-on: #286
2023-04-06 09:25:08 +02:00
Claudio Atzori
e093f04874
Merge pull request 'AdvancedConstraint' ( #285 ) from advConstraintsInBeta into master
...
Reviewed-on: #285
2023-04-06 09:24:54 +02:00
Miriam Baglioni
c5a9f39141
Extended the association project - result in the mapping from CrossRef
2023-04-05 16:48:36 +02:00
Miriam Baglioni
ecc05fe0f3
Added the code for the advancedConstraint implementation during the bulkTagging
2023-04-05 16:40:29 +02:00
Claudio Atzori
42442ccd39
Merge pull request 'updated the order of the compatibilities' ( #275 ) from compatibility_order into master
...
Reviewed-on: #275
2023-04-05 12:44:14 +02:00
Miriam Baglioni
b25b401065
added test to verify the advconstraints to dth community. inserted some additional logs.
2023-04-05 12:18:39 +02:00
Claudio Atzori
864f4051d3
[graph cleaning] added missing case
2023-04-05 11:35:47 +02:00
Michele De Bonis
297eb207a5
minor change in the author match which now can compute count and percentage
2023-04-04 17:10:37 +02:00
Claudio Atzori
dead87917f
[graph cleaning] cleanup
2023-04-04 13:13:43 +02:00
Claudio Atzori
2a6ba29b64
[graph cleaning] unit tests & cleanup
2023-04-04 12:34:51 +02:00
dimitrispie
9e1335df4c
-Added Technological University Dublin
...
-Added project_organization_contribution table
2023-04-04 13:22:40 +03:00
Miriam Baglioni
9a9cc6a1dd
changed the way the tar archive is built to support renaming in case we need to change .tt.gz into .json.gz
2023-04-04 11:40:58 +02:00
Claudio Atzori
63b8bbc015
[graph to Solr] using dedicated sparkExecutorCores, sparkExecutorMemory, sparkDriverMemory in convert_to_xml
2023-03-24 13:43:20 +01:00
Claudio Atzori
b502f86523
fixed input path supplemented to GetDatasourceFromCountry; adjusted the various spark.sql.shuffle.partitions
2023-03-24 13:09:12 +01:00
Claudio Atzori
c07857fa37
[graph cleaning] unit tests & cleanup
2023-03-23 15:57:47 +01:00
Claudio Atzori
90e61a8aba
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
2023-03-23 15:03:26 +01:00
Claudio Atzori
308e10d102
serialising: 1. measures for all the entity types and 2. result level fulltext
2023-03-23 11:23:22 +01:00
Claudio Atzori
488d9a5eaa
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
2023-03-23 10:41:13 +01:00
dimitrispie
fad7fa4af8
Added Technological University Dublin
2023-03-22 09:44:00 +02:00
Serafeim Chatzopoulos
102aa5ab81
Add dependency to dhp-aggregation
2023-03-21 19:25:29 +02:00
Serafeim Chatzopoulos
f3e5abf63b
Merge branch '8172_impact_indicators_workflow' of https://code-repo.d4science.org/D-Net/dnet-hadoop into 8172_impact_indicators_workflow
2023-03-21 18:26:09 +02:00
Serafeim Chatzopoulos
3e8a4cf952
Rearrange resources folder structure
2023-03-21 18:25:55 +02:00
Serafeim Chatzopoulos
f992ecb657
Checkout BIP-Ranker during 'prepare-package' && add it in the oozie-package.tar.gz
2023-03-21 18:03:55 +02:00
Ilias Kanellos
9dc8f0f05f
Add ActionSet step
2023-03-21 16:14:15 +02:00
Claudio Atzori
4f5ba0ed52
[graph cleaning] WIP: refactoring of the cleaning stages, unit tests
2023-03-21 14:41:20 +01:00
Ilias Kanellos
b5c252865c
Add filtering based on citation source
2023-03-20 15:38:36 +02:00
Claudio Atzori
6d3d18d8b5
[graph cleaning] WIP: refactoring of the cleaning stages
2023-03-16 17:23:36 +01:00
dimitrispie
43b23a9bf3
Update step20-createMonitorDB.sql
...
Added Technological University Dublin
2023-03-15 09:57:12 +02:00
Serafeim Chatzopoulos
720fd19b39
Add dhp-impact-indicators workflow files
2023-03-14 19:28:27 +02:00
Serafeim Chatzopoulos
c6e39b7f33
Add dhp-impact-indicators
2023-03-14 18:50:54 +02:00
Claudio Atzori
518618f1a9
[graph cleaning] avoid overwriting the subject class to 'keyword' for those with provenance 'subject:fos'
2023-03-14 15:22:47 +01:00
Claudio Atzori
41e00bcd07
[graph provision] avoid parsing the XML records again; apparently the escaped XML characters get unescaped, invalidating the record
2023-03-13 15:19:49 +01:00
Claudio Atzori
46d2df1c90
Merge pull request '[aggregator graph] handle paths including wildcards' ( #281 ) from aggregator_graph into beta
...
Reviewed-on: #281
2023-03-08 21:17:39 +01:00
Claudio Atzori
24e2fd828b
code formatting
2023-03-08 21:17:08 +01:00
Claudio Atzori
e28d395e87
[aggregator graph] using dedicated path to sync claims, adjusted paths with wildcards
2023-03-08 21:16:52 +01:00
Claudio Atzori
5b8fd37314
[aggregator graph] using dedicated path to sync claims
2023-03-08 15:28:14 +01:00
Claudio Atzori
7fd89566c2
[aggregator graph] handle paths including wildcards
2023-03-08 12:43:00 +01:00
Miriam Baglioni
588aca5ce4
Merge pull request 'h2020classification' ( #280 ) from h2020classification into beta
...
Reviewed-on: #280
2023-03-03 09:29:10 +01:00
Claudio Atzori
8ec0d62d91
pre-group the records in each table before joining the contents from BETA and PROD together
2023-03-02 14:49:19 +01:00
Miriam Baglioni
0fff98a14c
[ECclassification] removed print
2023-03-02 11:46:57 +01:00
Miriam Baglioni
b0c2f7e526
[ECclassification] removed not needed resources
2023-03-02 11:44:48 +01:00
Miriam Baglioni
d4fc62c2f6
merging with branch beta
2023-03-02 11:14:54 +01:00
Miriam Baglioni
de8ad1caef
[ECclassification] new implementation for the H2020 classification
2023-03-02 11:14:03 +01:00
Claudio Atzori
db9dad4aa7
[actionmanager] increased spark.sql.shuffle.partitions for publication, dataset, relation records
2023-03-02 09:11:37 +01:00
Miriam Baglioni
c1f9848953
[ECclassification] added new classes
2023-03-01 15:29:11 +01:00
Claudio Atzori
6f488547a7
ignore non processable records
2023-03-01 14:49:51 +01:00
Claudio Atzori
7d263f265e
adjusted logs
2023-03-01 11:58:07 +01:00
Claudio Atzori
16ad42e8f3
code formatting
2023-03-01 10:22:13 +01:00
Claudio Atzori
9c59dac859
followup changes reorganising the mdstore synchronisation mechanism
2023-03-01 10:16:20 +01:00
Miriam Baglioni
49737f1087
Merge pull request '[CrossrefFunderMapping] fixed issues on funder name' ( #279 ) from doiboostFunderExtention into beta
...
Reviewed-on: #279
2023-02-28 15:08:07 +01:00
Miriam Baglioni
ad745c0aa3
[CrossrefFunderMapping] fixed issues on funder name
2023-02-28 14:58:27 +01:00
Miriam Baglioni
4f2df876cd
[ECclassification] new implementation first try
2023-02-28 14:44:00 +01:00
Claudio Atzori
bc986f66ec
Merge pull request 'monodirectional citations' ( #278 ) from citations_monodirectional into beta
...
Reviewed-on: #278
2023-02-28 13:33:52 +01:00
Claudio Atzori
2f7346e9cf
WIP monodirectional citations, Datacite
2023-02-28 13:30:51 +01:00
Claudio Atzori
0559d8b412
WIP monodirectional citations
2023-02-28 10:57:32 +01:00
Sandro La Bruzzo
69fa616490
removed wrong content
2023-02-28 10:27:38 +01:00
Sandro La Bruzzo
832a75d012
added mapping for crossref funder
2023-02-28 10:16:34 +01:00
Sandro La Bruzzo
78e51c182a
Added missing parameter to raw all workflow
2023-02-28 10:16:01 +01:00
Claudio Atzori
7aebedb43c
code formatting
2023-02-27 11:51:27 +01:00
Miriam Baglioni
80987801d7
[FoS] added check for null on level1 subject
2023-02-27 11:40:22 +01:00
Claudio Atzori
31e97c2a6b
[unresolved entities] updated oozie wf node labels
2023-02-27 11:38:29 +01:00
Miriam Baglioni
23112929e9
[FoS] changed the default separator from comma to tab to solve the issue in subject value split
2023-02-27 10:18:39 +01:00
Claudio Atzori
c4856b4eaa
Merge pull request 'Remove unnecessary indexed fields from Solr' ( #277 ) from 8099_lighten_solr_index into beta
...
Reviewed-on: #277
2023-02-23 11:50:29 +01:00
Serafeim Chatzopoulos
0b5bf53b45
Remove unnecessary indexed fields from Solr
2023-02-23 12:42:42 +02:00
dimitrispie
1547611246
Merge branch 'beta' into hive
2023-02-22 16:57:12 +02:00
Claudio Atzori
9e4ec0023c
Merge pull request 'updated the order of the compatibilities (BETA)' ( #276 ) from compatibility_order_beta into beta
...
Reviewed-on: #276
2023-02-22 14:47:32 +01:00
Michele Artini
fddcf701e9
updated the order of the compatibilities
2023-02-22 12:07:09 +01:00
Michele Artini
200098b683
updated the order of the compatibilities
2023-02-22 11:52:59 +01:00
Claudio Atzori
0c1be41b30
code formatting
2023-02-22 10:15:25 +01:00
Claudio Atzori
3b876d9327
depending on dhp-schemas v. 3.16.0
2023-02-22 10:15:10 +01:00
Claudio Atzori
99cd7761aa
cleanup of non necessary dhp-monitor-update workflow
2023-02-22 10:10:22 +01:00
Claudio Atzori
a590c371a9
Merge pull request '8232-mdstore-synch-improve' ( #272 ) from 8232-mdstore-synch-improve into beta
...
Reviewed-on: #272
2023-02-22 10:02:26 +01:00
Claudio Atzori
cd3a51a15f
Merge branch 'beta' into 8232-mdstore-synch-improve
2023-02-22 09:57:07 +01:00
Claudio Atzori
42b6b5d5ce
Merge pull request 'UsageCountOnProjectAndDatasource' ( #271 ) from UsageCountOnProjectAndDatasource into beta
...
Reviewed-on: #271
2023-02-22 09:56:08 +01:00
Claudio Atzori
477a7c416f
Merge branch 'beta' into UsageCountOnProjectAndDatasource
2023-02-22 09:55:51 +01:00
Claudio Atzori
c20c1c9159
Merge pull request 'Added 4 institutions:' ( #261 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #261
2023-02-22 09:53:45 +01:00
Miriam Baglioni
d617c3e812
[DOIBoost] extended mapping for funder #8407
2023-02-20 14:45:27 +01:00
dimitrispie
90807b60c7
Changes to monitor wf
2023-02-20 10:42:24 +02:00
dimitrispie
d2f9ccf934
Changes to separate monitor wf
2023-02-20 10:41:21 +02:00
dimitrispie
032a401cbf
Bug fixes
2023-02-20 09:29:20 +02:00
Miriam Baglioni
016337a0f9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-02-16 15:54:59 +01:00
Sandro La Bruzzo
118c1fc3b3
Merge remote-tracking branch 'origin/beta' into beta
2023-02-15 10:29:28 +01:00
Sandro La Bruzzo
a8ac79fa25
Added citation relation on crossref Mapping
2023-02-15 10:29:13 +01:00
dimitrispie
595192d510
Bug fix
2023-02-14 16:24:08 +02:00
dimitrispie
f3aaff3688
Remove duplicate orgs
2023-02-14 09:48:36 +02:00
Claudio Atzori
9a03f71db1
code formatting
2023-02-13 16:25:47 +01:00
Michele Artini
554df257ab
null values in date range conditions
2023-02-13 16:15:32 +01:00
Michele Artini
9c1df15071
null values in date range conditions
2023-02-13 16:05:58 +01:00
Miriam Baglioni
32870339f5
refactoring after compile
2023-02-13 13:06:48 +01:00
Miriam Baglioni
7184cc0804
[FoS] added check for null on level1 subject
2023-02-13 13:03:49 +01:00
dimitrispie
3400133c2f
Bug fix
2023-02-13 09:44:00 +02:00
dimitrispie
935db0ab25
Added organizations for Monitor
2023-02-13 09:29:09 +02:00
dimitrispie
7b78b15c81
Changes for copying to Impala Cluster
2023-02-13 09:27:00 +02:00
Miriam Baglioni
5cf902a2b0
[UsageCount] changed query to make the sum be computed via sql instead of grouping
2023-02-10 16:16:37 +01:00
Miriam Baglioni
f803530df6
[UsageCount] fixed query
2023-02-10 15:50:56 +01:00
Miriam Baglioni
7473093c84
[FoS] changed the default separator from comma to tab to solve the issue in subject value split
2023-02-10 15:34:52 +01:00
Miriam Baglioni
bb5bba51b3
[UsageCount] extended test
2023-02-09 19:08:30 +01:00
Miriam Baglioni
85e53fad00
[UsageCount] addition of usagecount for Projects and datasources. Extension of the action set created for the results with new entities for projects and datasources. Extension of the resource set and modification of the testing class
2023-02-09 18:59:45 +01:00
dimitrispie
d71f5672d3
Add monitor post step
2023-02-09 13:44:14 +02:00
dimitrispie
35ba8bb328
Bug fixes
2023-02-09 12:57:57 +02:00
Sandro La Bruzzo
8920932dd8
Code formatted
2023-02-08 11:34:18 +01:00
Sandro La Bruzzo
0b9819f1ab
Code formatted
2023-02-08 10:32:33 +01:00
Sandro La Bruzzo
6c81a161d2
Merge remote-tracking branch 'origin/beta' into 8231-mdstore-synch-improve
2023-02-08 10:29:09 +01:00
dimitrispie
3ba11d64a1
Changes 07022023
2023-02-07 12:53:51 +02:00
dimitrispie
98c34263ed
Update step20-createMonitorDB.sql
...
Add University of Cape Town organization
2023-02-07 08:14:48 +02:00
dimitrispie
2dc6d47270
Changes 06022023
2023-02-06 13:18:53 +02:00
Miriam Baglioni
5f0906be60
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-02-02 17:13:14 +01:00
dimitrispie
973d78a4d6
Update step15_5.sql
...
Added unpaywalls open access colors
2023-02-02 08:03:54 +02:00
Claudio Atzori
d05ca53a14
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-01-31 14:39:53 +01:00
Michele De Bonis
6a6c266dde
implementation of author dedup configuration and lnfi clustering function
2023-01-31 11:53:10 +01:00
Miriam Baglioni
e82e009b46
added missing close tag for XML produced by the xquery to get information for the community from the IS
2023-01-31 10:19:34 +01:00
Miriam Baglioni
b254a0375f
[Affiliation from institutionalrepo] changed the field to check to verify the datasource type. Now it is in the field jurisdiction
2023-01-26 16:51:20 +01:00
dimitrispie
cf58e4a5e4
Added Arts et Métiers ParisTech
2023-01-25 16:03:16 +02:00
dimitrispie
db7d625ba9
Added Arts et Métiers ParisTech organization
2023-01-25 12:22:21 +02:00
Claudio Atzori
505867bce9
[bulk tagging] better node naming
2023-01-20 16:13:16 +01:00
Claudio Atzori
1b37516578
[bulk tagging] better node naming
2023-01-20 16:11:26 +01:00
Miriam Baglioni
ecd398fe51
refactoring
2023-01-20 14:23:45 +01:00
Claudio Atzori
c1e2460293
[cleaning] the datasource master-duplicate fixup should not be brought to production yet
2023-01-20 09:20:26 +01:00
Claudio Atzori
3800361033
[country propagation] fixes error 'cannot resolve countrySet given input columns: []' when there is no prepared information driving the propagation process for a given result type
2023-01-19 15:57:43 +01:00
Miriam Baglioni
0a5c6010b0
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-01-13 16:14:46 +01:00
dimitrispie
4d7553c9f1
Bug fixes
2023-01-12 17:19:19 +02:00
dimitrispie
dd70c32ad7
Bug fixes
2023-01-12 17:18:05 +02:00
dimitrispie
51f7ab5864
Bug fixes
2023-01-12 17:15:06 +02:00
dimitrispie
34d4bf727c
Bug fixes
2023-01-12 11:28:37 +02:00
dimitrispie
43f6d4f296
-Monitor DB workflow
2023-01-12 11:26:47 +02:00
dimitrispie
686580a220
- New Monitor DB workflow
...
- New Organization added
2023-01-12 11:18:03 +02:00
Claudio Atzori
0a58bc7ba7
[broker] prevent NPEs
2023-01-11 14:44:14 +01:00
Michele Artini
699736addc
NPE prevention
2023-01-11 13:14:44 +01:00
Claudio Atzori
04cb96001c
[broker] d40e20f437 adapted to the beta graph model
2023-01-11 10:10:12 +01:00
Michele Artini
91b845f611
Considering instance pids and alternative identifiers
2023-01-11 09:58:54 +01:00
Claudio Atzori
f86e19b282
code formatting
2023-01-11 09:53:19 +01:00
Miriam Baglioni
1f367122e4
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-01-11 09:47:44 +01:00
Michele Artini
d40e20f437
Considering instance pids and alternative identifiers
2023-01-11 09:37:34 +01:00
Michele Artini
7b7520850b
fixed an invalid char
2023-01-11 09:22:18 +01:00
Michele Artini
4953ae5649
fixed an invalid char
2023-01-11 08:35:53 +01:00
Miriam Baglioni
d6895f0387
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2023-01-09 17:28:38 +01:00
Miriam Baglioni
c60d3a2b46
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2023-01-09 17:28:27 +01:00
dimitrispie
becb242c17
Monitor DB only Workflow
2023-01-04 16:50:29 +02:00
dimitrispie
dcb958e146
Changes to execute the stats wf only in hive
2023-01-04 11:39:01 +02:00
Claudio Atzori
7becdaf31d
Merge pull request 'Workaround to use new version of intellij on Master' ( #266 ) from master_intellij into master
...
Reviewed-on: #266
2022-12-23 10:32:21 +01:00
Claudio Atzori
18a7aa2d78
Merge pull request 'Workaround to use new version of intellij on Beta' ( #267 ) from beta_intellij into beta
...
Reviewed-on: #267
2022-12-23 10:32:01 +01:00
dimitrispie
592013d5dd
Added more steps in decision node
2022-12-23 09:43:16 +02:00
dimitrispie
2a4bf32d4c
Merge branch 'hive' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into hive
...
# Conflicts:
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step10.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step13.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step14.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step16_1-definitions.sql
# dhp-workflows/dhp-stats-update/src/main/resources/eu/dnetlib/dhp/oa/graph/stats/oozie_app/scripts/step7.sql
2022-12-22 10:22:46 +02:00
dimitrispie
6449ff4207
1. Added a decision node that enables the workflow to select the execution path to follow
...
2. Added new organization
3. Added 5 new tables from Eurostat
2022-12-22 10:18:21 +02:00
Miriam Baglioni
b713132db7
[Cleaning] adding missing classes
2022-12-21 12:49:08 +01:00
Miriam Baglioni
8893389895
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-12-21 12:42:27 +01:00
Miriam Baglioni
11f2b470d3
[Cleaning] adding missing classes
2022-12-21 12:42:19 +01:00
Antonis Lempesis
c8309fe18e
added command line params to allow hive actions to run
2022-12-21 12:41:33 +02:00
Antonis Lempesis
028873cc51
added new hive opts
2022-12-21 12:41:33 +02:00
Antonis Lempesis
1ddea4f442
removed 'stored as parquet' from views..
2022-12-21 12:41:33 +02:00
Antonis Lempesis
2754c3dd62
moving data to impala cluster and creating shadow databases there
2022-12-21 12:41:29 +02:00
Antonis Lempesis
778a1a724f
finished migration to hive only
2022-12-21 12:41:25 +02:00
Antonis Lempesis
e84dd5fe26
first
2022-12-21 12:41:23 +02:00
Sandro La Bruzzo
3c9826f186
updated the lines function to its implementation, linesWithSeparators.map(l => l.stripLineEnd); in this way we force the Scala plugin compiler to consider this pipeline Scala code and not a java.string.lines() pipeline
2022-12-21 11:21:17 +01:00
Sandro La Bruzzo
91c70b15a5
updated the lines function to its implementation, linesWithSeparators.map(l => l.stripLineEnd); in this way we force the Scala plugin compiler to consider this pipeline Scala code and not a java.string.lines() pipeline
2022-12-21 11:14:42 +01:00
Claudio Atzori
f910b7379d
[cleaning] recovering missing resources from #265
2022-12-21 09:26:34 +01:00
Claudio Atzori
33bdad104e
[cleaning] align parameter names
2022-12-20 21:43:59 +01:00
Claudio Atzori
6aa91204a5
[orcid propagation] skip empty directories
2022-12-20 14:15:46 +01:00
Claudio Atzori
5816ded93f
code formatting
2022-12-20 10:41:40 +01:00
Claudio Atzori
46972f8393
[orcid propagation] skip empty directory
2022-12-20 10:28:22 +01:00
Claudio Atzori
9cf0a98699
[cleaning] set the common subject classid/name
2022-12-20 10:17:33 +01:00
Claudio Atzori
da85ca697d
Merge pull request 'cleanCountryOnMaster' ( #265 ) from cleanCountryOnMaster into master
...
Reviewed-on: #265
2022-12-16 15:58:44 +01:00
Miriam Baglioni
059e100ec7
[Clean Country] moving other resources for testing purposes
2022-12-16 15:48:21 +01:00
Miriam Baglioni
fc95a550c3
[Clean Country] moving other resources for testing purposes
2022-12-16 15:46:32 +01:00
Miriam Baglioni
6901ac91b1
[Clean Country] moving source and resources to master
2022-12-16 15:42:49 +01:00
Miriam Baglioni
6674cccb94
[BulkTag] description of parameters more comprehensive for those who do not implement it
2022-12-16 15:33:20 +01:00
Miriam Baglioni
f37113a941
[BulkTag] moving xquery to get community configuration in dedicated file
2022-12-16 15:32:26 +01:00
Miriam Baglioni
8685eaa706
[Clean Country] added test to verify removal of country
2022-12-16 15:31:25 +01:00
Miriam Baglioni
dc0ec88a58
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-12-16 13:18:32 +01:00
Miriam Baglioni
d791840b82
[Clean Country] added test to verify removal of country
2022-12-16 13:18:29 +01:00
Claudio Atzori
7b80b24f82
[cleaning] country cleaning must use both PID and AlternateIdentifier fields
2022-12-15 14:49:04 +01:00
Claudio Atzori
b8bafab8a0
[cleaning] improved vocabulary based mapping, specialization for the strict vocab cleaning
2022-12-12 14:43:03 +01:00
Sandro La Bruzzo
5e4866d033
implemented synch for single mdstore
2022-12-12 11:29:46 +01:00
Claudio Atzori
c18b8048c3
[cleaning] avoid NPE
2022-12-10 11:41:38 +01:00
Claudio Atzori
8b44afe5e5
[cleaning] avoid NPE
2022-12-09 15:44:57 +01:00
Claudio Atzori
389dd25430
[cleaning] avoid NPE
2022-12-08 18:40:48 +01:00
Claudio Atzori
730228d73d
[cleaning] align wf parameter names in test
2022-12-08 18:40:22 +01:00
Claudio Atzori
2094fa6db0
[cleaning] align wf parameter names
2022-12-08 17:22:26 +01:00
Miriam Baglioni
a485a94956
[Cleaning] fixed parameter name in property file
2022-12-08 16:59:34 +01:00
Miriam Baglioni
3d99b78d94
[Cleaning] fixed error in parameter (workingPath to workingDir)
2022-12-08 10:25:02 +01:00
Claudio Atzori
08c4588d47
Merge pull request 'Changes from beta stats wf to prod' ( #264 ) from antonis.lempesis/dnet-hadoop:beta into master
...
Reviewed-on: #264
2022-12-07 15:56:22 +01:00
Claudio Atzori
1b8488976b
code formatting
2022-12-07 10:45:38 +01:00
Claudio Atzori
cd1b58483e
[bulk tag] fixed Community configuration parsing to avoid NPE
2022-12-07 10:39:00 +01:00
Claudio Atzori
062abfd669
fixed NPE, removed unused stuff
2022-12-06 12:04:00 +01:00
dimitrispie
2a52a42169
Added 4 institutions:
...
-University of Modena and Reggio Emilia
-Bilkent University
-Saints Cyril and Methodius University of Skopje
-University of Milan
2022-12-06 10:10:21 +02:00
Claudio Atzori
71b121e9f8
Merge pull request '[graph cleaning] update collectedfrom & hostedby references as consequence of the datasource deduplication' ( #260 ) from graph_cleaning into beta
...
Reviewed-on: #260
2022-12-02 14:49:15 +01:00
Claudio Atzori
8248da40d9
Merge branch 'beta' into graph_cleaning
2022-12-02 14:49:00 +01:00
Claudio Atzori
ddf065756f
Merge pull request 'Two organizations are added for monitor' ( #258 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #258
2022-12-02 14:45:27 +01:00
Claudio Atzori
41f7f1bbc5
Merge pull request '[graph dedup] records stability and testing' ( #44 ) from deduptesting into beta
...
Reviewed-on: #44
2022-12-02 14:43:05 +01:00
Sandro La Bruzzo
5a48a2fb18
implemented synch for single mdstore
2022-12-01 11:34:43 +01:00
Claudio Atzori
a38116546d
Merge branch 'beta' into deduptesting
2022-11-30 11:27:29 +01:00
Miriam Baglioni
ce020f2c83
[EOSC FUTURE] added resources and test for review
2022-11-30 09:57:30 +01:00
Miriam Baglioni
bb0ddc1c44
[BulkTag] adding verb starts_with
2022-11-30 09:56:24 +01:00
Claudio Atzori
8e3edba318
[graph cleaning] testing the collectedfrom and hostedby patch procedure
2022-11-29 16:07:09 +01:00
Claudio Atzori
58c05731f9
[graph cleaning] WIP: testing the collectedfrom and hostedby patch procedure
2022-11-29 11:21:51 +01:00
Miriam Baglioni
7d264a1d69
Merge pull request 'horizontalConstraints' ( #259 ) from horizontalConstraints into beta
...
Reviewed-on: #259
2022-11-28 18:20:17 +01:00
Miriam Baglioni
9c70c5dbd6
[Bulk Tag horizontal] added new path in definition of constraint (to recognize fos subjects) - changed test and resource class to test this new aspect
2022-11-28 14:51:20 +01:00
Miriam Baglioni
0628df7a3a
resolving conflicts
2022-11-28 10:44:56 +01:00
Claudio Atzori
11695ba649
[graph cleaning] patch also the result's collectedfrom and hostedby datasource name according to the datasource master-duplicate mapping
2022-11-28 10:18:43 +01:00
Claudio Atzori
6082d235d3
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into graph_cleaning
2022-11-28 09:54:48 +01:00
Claudio Atzori
24ef301cc1
[graph cleaning] patch the result's collectedfrom and hostedby identifiers according to the datasource master-duplicate mapping
2022-11-28 09:54:18 +01:00
Miriam Baglioni
29d3da85f1
[EOSC DUMP] added resources needed for the review as test
2022-11-25 17:16:20 +01:00
Alessia Bardi
90c8f9cb61
tests for EOSC Future
2022-11-23 12:18:44 +01:00
Miriam Baglioni
33a2b1b5dc
[Bulk Tag] fixed typo in test configuration
2022-11-23 11:31:17 +01:00
Miriam Baglioni
c6df8327b3
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2022-11-23 11:26:57 +01:00
Miriam Baglioni
0e3edc5018
[Bulk Tag] fixed issue in verb name
2022-11-23 11:26:36 +01:00
Miriam Baglioni
935aa367d8
[BulkTag] removed commented code
2022-11-23 11:16:39 +01:00
Miriam Baglioni
43aedbdfe5
[BulkTag] changed verb name in configuration
2022-11-23 11:14:23 +01:00
Miriam Baglioni
b6da9b67ff
[BulkTag] fixed typo in annotation for verb name
2022-11-23 11:13:58 +01:00
Claudio Atzori
a79c47522d
updated ORCID datasource identifier
2022-11-23 10:17:49 +01:00
Alessia Bardi
2832117f23
added eoscifguidelines in test
2022-11-22 18:01:12 +01:00
Michele De Bonis
14f6346676
implementation of the new software configuration
2022-11-22 17:48:34 +01:00
Alessia Bardi
3c08269a4d
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-11-22 17:31:00 +01:00
Alessia Bardi
2687fc9f73
tests for EOSC Future review - ROhub
2022-11-22 17:30:56 +01:00
Claudio Atzori
a34c8b6f81
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2022-11-22 10:22:31 +01:00
Claudio Atzori
1d5143b0b6
Merge branch 'beta' into deduptesting
2022-11-22 10:21:30 +01:00
Miriam Baglioni
122e75aa17
fixed conflicts
2022-11-21 18:13:12 +01:00
Miriam Baglioni
cee7a45b1d
[Bulk Tag Datasource] fixed issue with verb name and add new test for neanias selection for orcid
2022-11-21 18:10:20 +01:00
Michele De Bonis
9fee2ed611
minor changes
2022-11-21 14:35:46 +01:00
Claudio Atzori
ed64618235
increased spark.sql.shuffle.partitions in the last join phase of the result (publication) to community through semantic relation propagation
2022-11-18 16:06:51 +01:00
Claudio Atzori
8742934843
added spark.sql.shuffle.partitions in the last join phase of the result to community through semantic relation propagation
2022-11-18 11:32:22 +01:00
Claudio Atzori
0aa725083f
extended dedup testing
2022-11-17 16:13:43 +01:00
Claudio Atzori
3dbc637d3e
code formatting
2022-11-17 09:55:41 +01:00
Claudio Atzori
13cc592f39
code formatting
2022-11-15 09:37:57 +01:00
Claudio Atzori
af15b1e48d
[eosc tag] extending criteria for Jupyter Notebook (adding to ORP the same constraint)
2022-11-14 18:30:43 +01:00
Claudio Atzori
eb45ba7af0
extended mapping from ODF relations (PR#251)
2022-11-14 18:26:13 +01:00
Claudio Atzori
a929dc5fee
integrated changes for mapping ROHub contents in the Graph
2022-11-14 18:15:35 +01:00
Claudio Atzori
24f99d7310
Merge pull request 'Map oaf:eoscifguidelines from mdstore to the graph' ( #256 ) from eoscifguidelines-from-mdstores into beta
...
Reviewed-on: #256
2022-11-14 15:40:34 +01:00
Claudio Atzori
ddff0e8999
merging duplicates using IdentifierComparator
2022-11-11 16:10:25 +01:00
Miriam Baglioni
5f9383b2d9
[EOSC TAG] remove redundant check for jupyter notebook
2022-11-11 14:06:19 +01:00
Miriam Baglioni
b18bbca8af
[EOSC TAG] adding search in orp for jupyter notebook criteria
2022-11-11 12:42:58 +01:00
Claudio Atzori
5af5a8ae42
added IdentifierComparator
2022-11-09 14:20:59 +01:00
Claudio Atzori
0419953470
merge from beta
2022-11-07 12:22:35 +01:00
Claudio Atzori
7c3390ac10
Merge branch 'beta' into eoscifguidelines-from-mdstores
2022-11-07 12:18:40 +01:00
dimitrispie
55fa3b2a17
Hive memory parameters
2022-11-03 15:21:04 +01:00
dimitrispie
992fc5b628
Added McMaster University Institution
2022-11-03 11:02:18 +02:00
dimitrispie
7fda05e380
Added Autonomous University of Barcelona
2022-11-01 13:59:40 +02:00
Claudio Atzori
22873c9172
Merge pull request 'Added fields: totalcost, fundedamount, currency, in project table' ( #257 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #257
2022-10-31 13:49:27 +01:00
dimitrispie
7861c472e0
Hive memory parameters
2022-10-28 19:00:32 +03:00
dimitrispie
5df9c63963
Added fields: totalcost, fundedamount, currency, in project table
2022-10-27 16:44:26 +03:00
Sandro La Bruzzo
2b9a20a4a3
Changed the way Scholexplorer filters the relationships: filtering out all relations coming from OpenCitation is wrong, because we lose a lot of relations that intersect OpenCitation but do not come only from there
2022-10-24 12:53:47 +02:00
Alessia Bardi
208ed32315
fixed xpath for semantic relation
2022-10-23 18:18:13 +02:00
Alessia Bardi
ee759ac92d
file format after mvn compile
2022-10-23 18:09:47 +02:00
Alessia Bardi
31a10f000b
Map the field oaf:eoscifguidelines from mdstores. Currently we can find it in ROHub metadata
2022-10-23 18:05:37 +02:00
Claudio Atzori
ec39b84898
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-10-19 15:21:02 +02:00
Claudio Atzori
bca4a61710
suppressing hyper verbose spark logs during unit test execution
2022-10-19 15:20:58 +02:00
Sandro La Bruzzo
72f0d88d6c
formatted code
2022-10-19 14:18:42 +02:00
Claudio Atzori
9b449110c6
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-10-14 15:48:04 +02:00
Claudio Atzori
ae7cd0735a
[graph2hive] more partitions
2022-10-14 15:47:58 +02:00
Sandro La Bruzzo
135cf81151
Merge remote-tracking branch 'origin/beta' into beta
2022-10-13 11:47:25 +02:00
Sandro La Bruzzo
a1f94530a3
added documentation
2022-10-13 11:47:11 +02:00
Claudio Atzori
b47aaf4dd1
[cleaning] subjects declared as belonging to specific vocabularies whose values are not found in the vocab are set to type keyword
2022-10-13 11:23:43 +02:00
Claudio Atzori
6163ecbf63
[cleaning] renamed parameters in wf action
2022-10-11 11:20:03 +02:00
Claudio Atzori
b301e9fdff
[cleaning] renamed action name/description
2022-10-11 11:08:52 +02:00
Claudio Atzori
ece40adc09
[cleaning] fixing NPE in the country cleaning phase
2022-10-11 10:10:20 +02:00
Claudio Atzori
d51275a965
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-10-07 09:52:49 +02:00
Claudio Atzori
8d97949316
[cleaning] fixed loop in wf nodes
2022-10-07 09:52:45 +02:00
Miriam Baglioni
a653e1b3ea
[Enrichment - result to community through organization] reimplementation of the data preparation step using spark
2022-10-04 15:01:28 +02:00
Miriam Baglioni
4d8339614b
Revert "[BipFinder] Fixed issue for wrong escaped char in doi"
...
This reverts commit 188f25eefa.
2022-10-04 14:29:47 +02:00
Miriam Baglioni
7324853a17
Revert "[BipFinder] refactoring"
...
This reverts commit 28dc317350.
2022-10-04 14:29:39 +02:00
Miriam Baglioni
28dc317350
[BipFinder] refactoring
2022-10-04 09:47:27 +02:00
Miriam Baglioni
188f25eefa
[BipFinder] Fixed issue for wrong escaped char in doi
2022-10-03 12:42:52 +02:00
Claudio Atzori
89f7007080
Merge pull request '[stats wf] misc changes' ( #254 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #254
2022-10-03 10:32:05 +02:00
dimitrispie
2c0c3f1806
Cast amount to float for table result_apcs
2022-09-28 19:33:24 +03:00
Alessia Bardi
49360770d7
map w3id as instance url
2022-09-28 14:16:39 +02:00
dimitrispie
bdc46e3eaa
Remove denormalization of results to fix downloads numbers in monitor
2022-09-28 14:59:08 +03:00
dimitrispie
2ebb1459a9
Fixed type in no_downloads
2022-09-28 14:36:57 +03:00
Miriam Baglioni
b5b5a4c192
[CleanCountry] fixed issue
2022-09-28 12:42:51 +02:00
Miriam Baglioni
f1d7d45cf7
[BulkTag] fixed issue
2022-09-28 12:01:43 +02:00
Miriam Baglioni
3ec044600d
[BulkTag] fixed conflicts
2022-09-28 11:58:28 +02:00
Miriam Baglioni
1cb79719a7
[BulkTag] fixed issues
2022-09-28 11:44:55 +02:00
Claudio Atzori
f3f7604e6c
trying to fix a test that fails only on Jenkins
2022-09-27 15:21:37 +02:00
Claudio Atzori
de7bc9350e
Merge pull request 'relation-from-odf' ( #251 ) from relation-from-odf into beta
...
Reviewed-on: #251
2022-09-27 15:08:26 +02:00
Claudio Atzori
3f90d159e3
code formatting
2022-09-27 15:08:00 +02:00
Claudio Atzori
0b3e44e521
Merge branch 'beta' into relation-from-odf
2022-09-27 14:57:01 +02:00
Claudio Atzori
b4b6a4457c
Merge pull request 'BulkTagging extension' ( #250 ) from horizontalConstraints into beta
...
Reviewed-on: #250
2022-09-27 14:56:31 +02:00
Claudio Atzori
57dbeb08d2
code formatting
2022-09-27 14:55:10 +02:00
Claudio Atzori
b60985cf68
Merge branch 'beta' into horizontalConstraints
2022-09-27 14:39:31 +02:00
Claudio Atzori
3b60642ef9
Merge pull request 'Synchronize indicators in stats-db with monitor-db' ( #249 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #249
2022-09-27 14:37:33 +02:00
Claudio Atzori
6ad38ade74
Merge pull request 'Clean Country' ( #241 ) from clean_country into beta
...
Reviewed-on: #241
2022-09-27 14:35:35 +02:00
Claudio Atzori
25e9d92aad
Merge branch 'beta' into clean_country
2022-09-27 14:27:49 +02:00
Claudio Atzori
80c5e0f637
code formatting
2022-09-27 12:51:51 +02:00
Alessia Bardi
fd63e9bfac
Mapping all relationships supported in ModelConstants and ModelSupport
2022-09-26 11:24:13 +02:00
Miriam Baglioni
ca216a92ad
[BulkTagging] changed the query to the IS to insert values for FOS and SDG as subject in the configuration used for the tagging
2022-09-23 17:06:07 +02:00
Miriam Baglioni
3e6b0f58bb
[BulkTagging] changed the query to the IS to get also the information for the advancedConstraint from the profile
2022-09-23 16:47:19 +02:00
Miriam Baglioni
4a3e119b73
merging with branch beta
2022-09-23 16:16:06 +02:00
Miriam Baglioni
f0e303abf9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-09-23 16:15:32 +02:00
Miriam Baglioni
55da4d8715
[BulkTagging] modifying code to represent constraints horizontally on all the results. Added subject to the set of fields used to express the constraint. Modified resources to test the new approach. Modified test class
2022-09-23 16:02:19 +02:00
Alessia Bardi
c5eb722170
relationships from relatedIdentifier whose target id type is one of the pid type with an authority
2022-09-23 15:47:05 +02:00
Claudio Atzori
c86cc53520
suppressing hyper verbose spark logs during unit test execution
2022-09-23 15:20:40 +02:00
Claudio Atzori
c01d528ab2
suppressing hyper verbose spark logs during unit test execution
2022-09-23 15:19:50 +02:00
Alessia Bardi
ba33ff71fd
refactoring for the generation of relationships from related identifier of type 'OPENAIRE'
2022-09-23 15:17:13 +02:00
Claudio Atzori
e6d788d27a
[stats wf] adding missing changes lost in PR#248
2022-09-23 14:38:42 +02:00
Alessia Bardi
982bcc1e35
test wrid pid and record identifier
2022-09-23 12:06:06 +02:00
Miriam Baglioni
960cb861a0
refactoring
2022-09-23 11:14:04 +02:00
Claudio Atzori
930f118673
fixed semantic (subreltype) for ServiceOrganization relations
2022-09-22 16:24:44 +02:00
Claudio Atzori
c42850328e
fixed semantic (subreltype) for ServiceOrganization relations
2022-09-22 16:23:25 +02:00
Miriam Baglioni
33bb79459e
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-09-22 15:55:17 +02:00
Claudio Atzori
b2c3071e72
Merge branch 'master' into beta2master_sept_2022
2022-09-22 14:39:15 +02:00
Claudio Atzori
10ec074f79
Merge remote-tracking branch 'antonis.lempesis/beta' into beta2master_sept_2022
2022-09-22 14:12:19 +02:00
dimitrispie
dcd85f8cd7
- Synchronize indicators in stats-db with monitor-db
...
- added new openorg id for Nanyang Technological University
- changed openorg id for University of Helsinki #8088 ticket
2022-09-22 13:33:07 +03:00
Claudio Atzori
7225fe9cbe
integrated changes from discard-non-wellformed
2022-09-22 10:06:07 +02:00
Miriam Baglioni
869e129288
[EOSC BulkTag] refactoring
2022-09-20 16:13:18 +02:00
Miriam Baglioni
840465958b
[EOSC BulkTag] filtering out the datasources registered in the eosc with compatibility different from 3.0, 4.0 for literature, data and CRIS to add the context eosc to the results
2022-09-20 10:30:41 +02:00
Claudio Atzori
bdc8f993d0
[Patch Hosted By] check also the presence of datasource.officialname.value
2022-09-19 15:28:03 +02:00
Miriam Baglioni
ec87149cb3
[Patch Hosted By] added fix to avoid NPE error when datasource official name is not provided. Removing datasources if no officialname has been provided
2022-09-19 14:06:52 +02:00
Miriam Baglioni
b42e2c9df6
[Patch Hosted By] added fix to avoid NPE error when datasource official name is not provided
2022-09-19 12:30:32 +02:00
Miriam Baglioni
1329aa8479
[EOSC BulkTag] modified test to remove association of result to eosc when eoscifguidelines are set
2022-09-19 11:59:48 +02:00
Miriam Baglioni
a0ee1a8640
[EOSC BulkTag] remove addition of eosc context for result with eosc if guidelines set
2022-09-19 11:44:10 +02:00
Claudio Atzori
e45ec15221
Merge branch 'beta' into clean_country
2022-09-19 11:34:02 +02:00
Claudio Atzori
26e1badded
added instance.url syntactical validation, avoid creating multiple duplicated URLs
2022-09-19 11:19:10 +02:00
Miriam Baglioni
5240ac3d7b
[EOSC Tag] remove addition of eosc context for result with eosc if guidelines set
2022-09-19 11:02:18 +02:00
Claudio Atzori
192215a18e
merged from branch discard-non-wellformed
2022-09-19 10:17:10 +02:00
Claudio Atzori
96062164f9
Merge pull request '[Aggregator graph|master] Discard invalid records' ( #245 ) from discard-non-wellformed into master
...
Reviewed-on: #245
2022-09-19 09:48:16 +02:00
Claudio Atzori
35bb7c423f
updated dhp-schemas version to 2.12.1
2022-09-16 16:13:15 +02:00
Claudio Atzori
fd87571506
code formatting
2022-09-16 16:05:03 +02:00
Claudio Atzori
c527112e33
Merge commit 'ff6f789b6d9be0567b6ad72f8a0e75fe3f52726a' into beta2master_sept_2022
2022-09-16 15:59:10 +02:00
Claudio Atzori
65209359bc
Merge commit 'b5f7bd30be7f7adaaa28170740da0484b50a77ed' into beta2master_sept_2022
2022-09-16 15:58:11 +02:00
Claudio Atzori
d72a64ded3
Merge commit '690be4482fc84327dc7617acbc8d976d559df512' into beta2master_sept_2022
2022-09-16 15:57:44 +02:00
Claudio Atzori
3e8499ce47
Merge commit '71b069ca90a2f7ec09d64241c60917d3636fc81e' into beta2master_sept_2022
2022-09-16 15:57:20 +02:00
Claudio Atzori
61aacb3271
Merge commit '1203378441dc6d8e8435cacd42e76e11746f6d1b' into beta2master_sept_2022
2022-09-16 15:56:55 +02:00
Claudio Atzori
dbb567251a
merged 853c996fa2
2022-09-16 15:56:28 +02:00
Claudio Atzori
c7e8ad853e
Merge commit '2b5f8c9c9a3611c57ee5febfe262a455a39ad801' into beta2master_sept_2022
2022-09-16 15:55:04 +02:00
Claudio Atzori
0849ebfd80
merged a11eb38065
2022-09-16 15:54:32 +02:00
Claudio Atzori
281239249e
Merge commit 'b7c387c21f946adbc9da90ded95166205195edb0' into beta2master_sept_2022
2022-09-16 15:49:20 +02:00
Claudio Atzori
45fc5e12be
Merge commit 'cb7c07c54e59675e8dffe42b7f2a13f16c956068' into beta2master_sept_2022
2022-09-16 15:48:55 +02:00
Claudio Atzori
1c05aaaa2e
Merge commit '3418ce50ac9b28fed4fa949919e6c8208738cdcf' into beta2master_sept_2022
2022-09-16 15:48:36 +02:00
Claudio Atzori
01d5ad6361
Merge commit 'd85ba3c1a9d7f0e80565742161ff6c9ecffd52b7' into beta2master_sept_2022
2022-09-16 15:48:16 +02:00
Claudio Atzori
d872d1cdd9
Merge commit 'a4815f6bec87f05be8cd740d236707949a0f746e' into beta2master_sept_2022
2022-09-16 15:47:49 +02:00
Claudio Atzori
ab0efecab4
Merge commit '84598c75356cf580de6c81653a9351e9b8173639' into beta2master_sept_2022
2022-09-16 15:47:05 +02:00
Claudio Atzori
725c3c68d0
Merge commit '844f6eb46533cdd4be3210401b10401322079640' into beta2master_sept_2022
2022-09-16 15:46:40 +02:00
Claudio Atzori
300ae6221c
Merge commit '32cee1f619eb30d2e2ac6083435b76b1aba7db09' into beta2master_sept_2022
2022-09-16 15:45:57 +02:00
Claudio Atzori
0ec2eaba35
Merge commit 'c1f2ffc53dc41f1fac3855b2d2df7d6a5ea15e3e' into beta2master_sept_2022
2022-09-16 15:45:27 +02:00
Claudio Atzori
a387807d43
Merge commit 'b78889a0ce27a79c7ab2d8da05b118ee4f1bcb36' into beta2master_sept_2022
2022-09-16 15:44:17 +02:00
Claudio Atzori
2abe2bc137
Merge commit '08ce2cadc2d84aa982726e429c280a905536a715' into beta2master_sept_2022
2022-09-16 15:43:49 +02:00
Claudio Atzori
a07c876922
Merge commit '27a91841e7fa2a1b615b4d1e161d606db5bead96' into beta2master_sept_2022
2022-09-16 15:43:02 +02:00
Claudio Atzori
cbd48bc645
Merge commit 'efd96e7e664e4139321e35e8d172b884ba4b61a1' into beta2master_sept_2022
2022-09-16 15:38:56 +02:00
Claudio Atzori
e370e940d8
[aggregator graph] save invalid records aside for further inspection
2022-09-16 14:06:28 +02:00
Claudio Atzori
465e941214
Merge pull request '[stats wf] Changes to indicators tables' ( #244 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #244
2022-09-16 10:13:58 +02:00
Claudio Atzori
1e42d984e1
[aggregator graph] save invalid records aside for further inspection
2022-09-15 10:49:42 +02:00
Alessia Bardi
9e7ec4198f
fixed test
2022-09-14 18:08:56 +02:00
Claudio Atzori
c48f6e9c57
[aggregator graph] save invalid records aside for further inspection
2022-09-14 17:11:26 +02:00
dimitrispie
3bf3127251
Changes to monitor and indicator scripts
2022-09-14 16:36:19 +03:00
Claudio Atzori
a0919ed495
[aggregator graph] save invalid records aside for further inspection
2022-09-14 13:27:39 +02:00
Alessia Bardi
b99a011345
return empty Oaf list if record cannot be parsed
2022-09-13 11:51:55 +02:00
Alessia Bardi
27af5122d2
logs for non well formed XML files
2022-09-12 14:25:23 +02:00
Claudio Atzori
5066db3386
Merge pull request 'subjects cleaning' ( #239 ) from clean_subjects into beta
...
Reviewed-on: #239
2022-09-09 15:17:02 +02:00
Claudio Atzori
ff6f789b6d
code formatting
2022-09-09 15:16:31 +02:00
Claudio Atzori
b5d6966c01
Merge branch 'beta' into clean_country
2022-09-09 12:20:19 +02:00
Claudio Atzori
b5f7bd30be
Merge branch 'beta' into clean_subjects
2022-09-09 12:20:04 +02:00
Claudio Atzori
690be4482f
Merge pull request '#7861#note-8 instance url from handle' ( #243 ) from handle_as_instance_urls into beta
...
Reviewed-on: #243
2022-09-09 12:19:17 +02:00
Alessia Bardi
f14107ad77
Merge branch 'handle_as_instance_urls' of https://code-repo.d4science.org/D-Net/dnet-hadoop into handle_as_instance_urls
2022-09-09 12:17:19 +02:00
Alessia Bardi
a539c6ccaf
https for handle URLs
2022-09-09 12:16:28 +02:00
dimitrispie
71b069ca90
Changes to indicator and monitor scripts
2022-09-09 13:15:58 +03:00
Claudio Atzori
1203378441
Merge branch 'beta' into clean_subjects
2022-09-09 10:38:47 +02:00
Claudio Atzori
14dc909a14
Merge branch 'beta' into clean_country
2022-09-09 10:38:17 +02:00
Claudio Atzori
853c996fa2
Merge branch 'beta' into handle_as_instance_urls
2022-09-09 09:47:16 +02:00
Claudio Atzori
a431e01383
Merge pull request 'orcid_multipleworks_download' ( #242 ) from enrico.ottonello/dnet-hadoop:orcid_multipleworks_download into beta
...
Reviewed-on: #242
2022-09-09 08:45:02 +02:00
Alessia Bardi
9ef063d502
#7861#note-8 instance url from handle
2022-09-07 17:29:54 +03:00
Alessia Bardi
5c45d52af3
testing for RiuNet
2022-09-07 15:40:57 +03:00
dimitrispie
2b5f8c9c9a
comment out duplicate table creation
2022-09-06 12:27:53 +03:00
Alessia Bardi
a11eb38065
testing for RO-Hub
2022-09-02 16:07:36 +02:00
Enrico Ottonello
bfdf2dc390
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid_multipleworks_download
2022-08-25 12:07:54 +02:00
Enrico Ottonello
da1cf561e6
alignment with beta
2022-08-25 11:57:20 +02:00
Enrico Ottonello
27445ccdaa
cleaned log
2022-08-25 11:56:14 +02:00
Claudio Atzori
b7c387c21f
cleaning of subjects: avoid duplicated subjects, prioritise collected vs inferred or other sources
2022-08-12 15:09:16 +02:00
Claudio Atzori
adb526b0e1
Merge branch 'beta' into clean_subjects
2022-08-12 10:51:17 +02:00
Claudio Atzori
cb7c07c54e
[scholix] added step to create tar archive
2022-08-11 11:25:24 +02:00
Claudio Atzori
2aa16d0432
[scholix] fixed OpenCitation dump procedure
2022-08-10 17:39:29 +02:00
Miriam Baglioni
7dbdd4a0fe
[Clean Country]changes related to #241 (comment)
2022-08-10 15:13:10 +02:00
Claudio Atzori
51ad93e545
[scholix] fixed OpenCitation dump procedure
2022-08-10 11:57:56 +02:00
Miriam Baglioni
62d2138806
[Clean Context] slightly changed the logic. Added a check that the result is not hosted by a datasource of type institutional repository from NL. Also added a check that the country must have been added to the result via propagation for it to be removed
2022-08-08 14:10:47 +02:00
Claudio Atzori
3418ce50ac
cleaning of subjects: perform the cleaning when the given value is equivalent to one of the terms in the vocabulary
2022-08-08 12:48:47 +02:00
Claudio Atzori
a78028dabc
Merge branch 'beta' into clean_subjects
2022-08-08 12:34:33 +02:00
Miriam Baglioni
390013a4b2
merging with branch beta
2022-08-08 12:30:31 +02:00
Claudio Atzori
d85ba3c1a9
Merge pull request 'serialising field eoscifguidelines field in the Solr XML records' ( #234 ) from tagEosc into beta
...
Reviewed-on: #234
2022-08-08 10:28:41 +02:00
Claudio Atzori
3937ff04de
Merge branch 'beta' into tagEosc
2022-08-08 09:57:23 +02:00
Claudio Atzori
a4815f6bec
Merge branch 'beta' into clean_subjects
2022-08-05 16:57:03 +02:00
Claudio Atzori
29c4cde42e
Merge branch 'clean_subjects' of https://code-repo.d4science.org/D-Net/dnet-hadoop into clean_subjects
2022-08-05 16:56:37 +02:00
Claudio Atzori
4eaa063b1f
cleaning of subjects
2022-08-05 16:56:09 +02:00
Claudio Atzori
84598c7535
Merge pull request 'restored some collab indicators' ( #240 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #240
2022-08-05 15:50:39 +02:00
Antonis Lempesis
fcef5294e2
restored some collab indicators
2022-08-05 13:45:01 +03:00
Claudio Atzori
844f6eb465
Merge branch 'beta' into clean_subjects
2022-08-05 12:39:05 +02:00
Claudio Atzori
32cee1f619
WIP: cleaning of subjects
2022-08-05 12:32:08 +02:00
Claudio Atzori
c1f2ffc53d
Merge pull request 'commenting out the collab indicators because they still fail' ( #237 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #237
2022-08-05 11:57:36 +02:00
Antonis Lempesis
227e10f4b3
commenting out the collab indicators because they still fail
2022-08-05 12:54:36 +03:00
Claudio Atzori
6c0fd9284b
merge from beta
2022-08-05 10:42:53 +02:00
Claudio Atzori
b78889a0ce
WIP: cleaning of subjects
2022-08-05 09:11:37 +02:00
Claudio Atzori
08ce2cadc2
Merge pull request '[Graph Dump] Remove code from dnet-hadoop' ( #235 ) from removeDump into beta
...
Reviewed-on: #235
2022-08-05 09:09:50 +02:00
Miriam Baglioni
a7a18d7630
[Graph Dump] removed code for the dump from the project. Fixed issues in tests when possible
2022-08-04 17:40:40 +02:00
Claudio Atzori
499826ead1
serialising field eoscifguidelines field in the Solr XML records
2022-08-04 12:40:48 +02:00
Claudio Atzori
27a91841e7
WIP: cleaning of subjects
2022-08-04 11:39:39 +02:00
Antonis Lempesis
b09d7ddc74
fixed the datasourceOrganization relations
2022-08-03 12:26:50 +02:00
Claudio Atzori
e62018e95d
[aggregator graph] added more assertions in test
2022-08-03 12:26:05 +02:00
Claudio Atzori
efd96e7e66
Merge pull request 'fixed the datasourceOrganization relations' ( #233 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #233
2022-08-03 12:25:05 +02:00
Antonis Lempesis
8b0407d8ec
fixed the datasourceOrganization relations
2022-08-03 12:26:59 +03:00
Claudio Atzori
eb53b52f7c
code formatting
2022-08-02 13:24:47 +02:00
Claudio Atzori
27681cf6bf
Merge pull request '[stats wf] latest version of indicators + added FOS classification' ( #232 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #232
2022-08-02 12:57:15 +02:00
Antonis Lempesis
1778d40c40
latest version of indicators
2022-08-02 13:39:34 +03:00
Claudio Atzori
209c7e9dab
[datacite] avoid UnsupportedOperationException
2022-08-01 09:05:35 +02:00
Enrico Ottonello
64311b8be4
removed useless accumulator
2022-07-31 01:03:29 +02:00
Antonis Lempesis
6fc9ef53f6
added command line params to allow hive actions to run
2022-07-29 16:36:20 +03:00
Antonis Lempesis
9886fe87ec
- Added FOS classification
...
- Added extra orgs in monitor
- Fixed result-project and organization-project tables
2022-07-29 16:34:50 +03:00
Claudio Atzori
92e48f12f7
[metadata collection] updated collector plugin name
2022-07-29 13:54:00 +02:00
Claudio Atzori
f62c4e05cd
code formatting
2022-07-29 11:56:01 +02:00
Claudio Atzori
0727f0ef48
[EOSC tag] avoid NPEs
2022-07-29 11:55:34 +02:00
Miriam Baglioni
3329b6ce6b
[EOSC TAG] added fix for NPE on subjects
2022-07-29 10:54:20 +02:00
Claudio Atzori
37cfda0fc5
Merge pull request 'participant project contribution' ( #223 ) from project_organization_contribution into beta
...
Reviewed-on: #223
2022-07-28 12:16:30 +02:00
Claudio Atzori
1dd1e4fe3a
extended test for mapping project_organization relations
2022-07-28 11:27:08 +02:00
Claudio Atzori
60e4fbd78b
Merge branch 'beta' into project_organization_contribution
2022-07-28 10:15:43 +02:00
Claudio Atzori
ed98a6d9d0
[Datacite mapping] include the older datacite prefixed OpenAIRE id among the originalId[]
2022-07-28 10:15:14 +02:00
Claudio Atzori
09ccc7b472
Merge branch 'beta' into project_organization_contribution
2022-07-28 09:49:59 +02:00
Sandro La Bruzzo
67525076ec
fixed test, now it compiles after commit a6977197b3
2022-07-26 15:35:17 +02:00
Claudio Atzori
26104826c4
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-07-26 14:34:29 +02:00
Claudio Atzori
c03e20be39
Merge pull request 'EOSC Context Tagging' ( #231 ) from eosc_context_tagging into beta
...
Reviewed-on: #231
2022-07-26 09:20:53 +02:00
Claudio Atzori
d43663d30f
adapted RorActionSet test, it should not create parent/child rels
2022-07-25 17:54:10 +02:00
Miriam Baglioni
35bcd9422d
[EOSC Context Tagging] removed not needed specification in path
2022-07-25 15:45:22 +02:00
Miriam Baglioni
1c82acb168
[EOSC Context Tagging] refactoring: moved EOSC IF tagging in package eosc under bulkTag
2022-07-25 14:26:39 +02:00
Miriam Baglioni
68cb637832
merge with branch beta
2022-07-25 14:24:25 +02:00
Miriam Baglioni
0172bab251
[EOSC Context Tagging] refactoring
2022-07-25 14:16:45 +02:00
Claudio Atzori
3c23d634eb
Merge pull request 'EOSC IF' ( #230 ) from tagEosc into beta
...
Reviewed-on: #230
2022-07-25 14:14:53 +02:00
Claudio Atzori
612b7a5530
Merge branch 'beta' into tagEosc
2022-07-25 14:12:59 +02:00
Claudio Atzori
3f883c4ecc
Merge pull request 'pubmed_update' ( #228 ) from pubmed_update into beta
...
Reviewed-on: #228
2022-07-25 14:10:35 +02:00
Claudio Atzori
c3ede1b379
Merge branch 'beta' into pubmed_update
2022-07-25 14:10:22 +02:00
Miriam Baglioni
144c103b67
[EOSC Context Tagging] add check to avoid the insertion of the context if already present
2022-07-25 13:52:45 +02:00
Enrico Ottonello
657b0208a2
multiple works download (<=100) for single request
2022-07-25 12:37:39 +02:00
Miriam Baglioni
d091866e48
[EOSC Context Tagging] refactoring
2022-07-25 11:12:22 +02:00
Miriam Baglioni
5968ec018d
[Clean Country] modified workflow and added param file
2022-07-22 16:48:38 +02:00
Miriam Baglioni
a12d28c644
[Clean Country] added logic not to remove the country from a result if there exists a hosting datasource with that country. Moreover, the country will be removed only if it was added via propagation
2022-07-22 16:23:12 +02:00
Miriam Baglioni
2c933f1158
merging with branch beta
2022-07-22 14:57:41 +02:00
Miriam Baglioni
06a95daf60
[EOSC context TAG] refactoring after compilation
2022-07-22 14:57:06 +02:00
Miriam Baglioni
ffb0ce3fb9
merging with branch beta
2022-07-22 14:55:55 +02:00
Miriam Baglioni
627332526b
[EOSC context TAG] workflow start from reset_outputpath action
2022-07-22 14:55:11 +02:00
Miriam Baglioni
7a1c1b6f53
[EOSC context TAG] Add test class and resources
2022-07-22 14:36:02 +02:00
Sandro La Bruzzo
ddc414b258
fixed wrong json param
2022-07-22 09:43:15 +02:00
Miriam Baglioni
317a4a56ef
[EOSC context TAG] first implementation of the logic to tag results imported from datasources registered in the EOSC
2022-07-21 17:37:48 +02:00
Miriam Baglioni
3be036f290
[EOSC TAG] refactoring after compilation
2022-07-21 14:45:43 +02:00
Miriam Baglioni
e61b8e6b03
merging with branch beta
2022-07-21 14:43:23 +02:00
Miriam Baglioni
56d09e6348
[EOSC TAG] before adding the tag added a step to verify the same tag is not already present
2022-07-21 14:36:48 +02:00
Miriam Baglioni
5143a80232
[EOSC TAG] modification of test class to align with new element
2022-07-21 11:56:51 +02:00
Claudio Atzori
d900a02b74
Merge pull request 'implemented oozie workflow to generate scholix dump filtering relclass semantic' ( #229 ) from opencitation_enrichments into beta
...
Reviewed-on: #229
2022-07-21 10:12:17 +02:00
Sandro La Bruzzo
5f651f2316
changed filter relation on SubRelType
2022-07-21 10:11:48 +02:00
Miriam Baglioni
438abdf96f
[EOSC TAG] adding eosc interoperability guidelines in the specific element in the result. Removed from subjects. Also removed the deletion of EOSC Jupyter Notebook from subject, since the criteria are now searched for in a different place
2022-07-20 18:07:54 +02:00
Miriam Baglioni
65cc736e2f
[Clean Country] first implementation to remove country NL from results collected from NARCIS when doi starts with Mendeley prefix
2022-07-20 17:05:56 +02:00
Sandro La Bruzzo
5b76321d9c
implemented oozie workflow to generate scholix dump filtering relclass semantic
2022-07-20 16:34:32 +02:00
Claudio Atzori
18b505d6a3
Merge branch 'master' into beta
2022-07-19 14:18:02 +02:00
Claudio Atzori
1138b2ac8e
code formatting
2022-07-19 14:15:49 +02:00
Sandro La Bruzzo
00168303db
Added unit test to verify the generation, in the OriginalID, of the old openaire Identifier generated by OAI
2022-07-14 10:19:59 +02:00
Sandro La Bruzzo
0a4f4d98fa
added PMCId to PmArticle
2022-07-13 15:27:17 +02:00
Alessia Bardi
28a32facf6
Merge pull request 'mapping `oaf:fulltext` element in the `result.fulltext` field' ( #226 ) from oaf_fulltext_mapping into beta
...
Reviewed-on: #226
2022-07-12 11:13:08 +02:00
Claudio Atzori
0c1cfee396
mapping oaf:fulltext elements in the result.fulltext field
2022-07-11 17:34:59 +02:00
Miriam Baglioni
fae681fea1
[Country Propagation] add check to avoid NPE on datasource.getDatasourceType().getClassis()
2022-07-03 17:39:58 +02:00
Miriam Baglioni
c09fcdb40b
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-07-01 12:38:03 +02:00
Claudio Atzori
138d1dfbf8
Merge pull request 'score class in the XML serialization' ( #225 ) from measure_serialization into beta
...
Reviewed-on: #225
2022-07-01 10:53:49 +02:00
Claudio Atzori
446699c59d
Merge pull request '[Graph Dump] New funded products dump' ( #222 ) from dump_new_funded_products into master
...
Reviewed-on: #222
2022-07-01 10:51:36 +02:00
Claudio Atzori
0cb1c70788
code formatting
2022-07-01 10:44:08 +02:00
Claudio Atzori
4ec13e2b66
Merge branch 'master' into dump_new_funded_products
2022-07-01 10:30:28 +02:00
Claudio Atzori
2f998b2429
Merge pull request '[Graph DUMP] add code to produce the delta of new projects with respect to the previous delta/dump' ( #221 ) from dump_delta_projects into master
...
IMO looks good, I think it can be integrated in the master branch.
Reviewed-on: #221
2022-07-01 10:30:10 +02:00
Claudio Atzori
072f192853
include the class information in the measure XML serialization
2022-07-01 09:54:56 +02:00
Claudio Atzori
a88103bcf9
[action manager] added more testing
2022-07-01 09:06:59 +02:00
Claudio Atzori
7da24c1dec
added more logging
2022-06-28 13:47:49 +02:00
Miriam Baglioni
ee1f1eeca2
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-06-28 11:06:32 +02:00
Miriam Baglioni
71744a1f52
[DUMP DELTA PROJECTS] refactoring
2022-06-27 18:07:58 +02:00
Miriam Baglioni
1d1fe3b151
[DUMP DELTA PROJECTS] refactoring
2022-06-27 18:04:59 +02:00
Claudio Atzori
a8773af0cb
Merge branch 'beta' into project_organization_contribution
2022-06-27 09:37:40 +02:00
Claudio Atzori
cba9c2b7cc
Merge pull request 'author name parsing' ( #220 ) from author_name_particles into beta
...
Reviewed-on: #220
2022-06-27 09:37:27 +02:00
Claudio Atzori
4829b96bb5
Merge branch 'beta' into author_name_particles
2022-06-27 09:37:03 +02:00
Claudio Atzori
316b0fd73c
added 'von' to the name particles file
2022-06-27 09:36:51 +02:00
Claudio Atzori
5130eac247
mapping by participant project contribution
2022-06-24 17:16:42 +02:00
Claudio Atzori
929b145130
code formatting
2022-06-21 23:07:06 +02:00
Miriam Baglioni
edddfc6c63
[DUMP DELTA PROJECTS] adding test and resource
2022-06-21 18:28:53 +02:00
Miriam Baglioni
f561f13dd9
[Funder Products Dump] fixed names of parameters in workflow
2022-06-21 18:18:17 +02:00
Miriam Baglioni
ff74e73369
[DUMP NEW FUNDED PRODUCTS] change in resources
2022-06-21 18:02:51 +02:00
Miriam Baglioni
b98f904d48
[Funder Products Dump] new way to avoid using hive
2022-06-21 17:52:27 +02:00
Miriam Baglioni
7423577a08
[Graph DUMP] add code to produce the delta of new projects with respect to the previous delta/dump
2022-06-21 14:51:38 +02:00
Claudio Atzori
c76ff6c613
Merge pull request '7096-fileGZip-collector-plugin' ( #211 ) from 7096-fileGZip-collector-plugin into beta
...
Reviewed-on: #211
2022-06-16 15:34:45 +02:00
Claudio Atzori
b295a40d9c
restored use of name_particles when parsing author names
2022-06-16 12:20:43 +02:00
Claudio Atzori
c7b09c6225
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-06-16 09:28:50 +02:00
Claudio Atzori
875ae29961
Merge pull request 'mapping relationship from transformed records based on `oaf:relation`' ( #219 ) from oaf_relation_mapping into beta
...
Reviewed-on: #219
2022-06-16 09:27:19 +02:00
Claudio Atzori
e03c0c7794
Merge branch 'beta' into oaf_relation_mapping
2022-06-16 09:27:01 +02:00
Claudio Atzori
06b5533d4c
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-06-16 09:22:16 +02:00
Claudio Atzori
4c8e820ff0
mapping relationship from transformed records based on oaf:relation
2022-06-14 08:49:02 +02:00
Alessia Bardi
88d531dc91
exclude FAIRsharing records from Datacite
2022-06-13 16:17:17 +02:00
Claudio Atzori
116902c028
mapping relationship from transformed records based on oaf:relation
2022-06-13 14:31:48 +02:00
Claudio Atzori
b8cda65487
code formatting
2022-06-13 09:20:03 +02:00
Michele Artini
634869ce95
deleted hierarchical rels from ror action set
2022-06-13 09:12:21 +02:00
Alessia Bardi
922c6d66ef
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-06-10 17:29:15 +02:00
Alessia Bardi
68bd58d6a4
tests for ROHub
2022-06-10 17:29:11 +02:00
Miriam Baglioni
b229c6e7af
Merge pull request 'beta' ( #218 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #218
2022-06-10 11:03:48 +02:00
Antonis Lempesis
ab18c9daa9
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2022-06-09 15:48:21 +03:00
Antonis Lempesis
574492c659
removed double result_apc table creation from monitor
2022-06-09 15:48:13 +03:00
Michele Artini
b94a791bc5
unit tests to transform cnr explora
2022-06-09 12:25:34 +02:00
Miriam Baglioni
ab8868bd3a
[ZENODO-API] changed to iterate over all the deposited products and not just the last ten
2022-06-08 17:03:15 +02:00
Miriam Baglioni
4b6913787b
[DOI-BOOST] added one method in test of crossref mapping to oaf and one resource. Related to ticket 7807
2022-06-08 14:55:19 +02:00
Antonis Lempesis
db088cc69c
fixed *_organization tables
2022-06-07 04:04:28 +03:00
Miriam Baglioni
31d4557e8d
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2022-06-06 11:52:29 +02:00
Claudio Atzori
5c2949a864
Merge pull request '[stats wf] added open citations & more orgs in monitor, removed collab indicator' ( #213 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #213
2022-05-20 11:38:43 +02:00
Miriam Baglioni
5e0b8f9b5f
[CountryPropagation] refactoring
2022-05-20 09:15:53 +02:00
Miriam Baglioni
c298c148cb
[CountryPropagation] fix NPE issue
2022-05-20 09:11:46 +02:00
Miriam Baglioni
eaf9385ae5
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-05-17 15:09:37 +02:00
Miriam Baglioni
f5207885e3
[EOSCTag] changed code to remove EOSC Jupyter Notebook and modified test to exclude galaxy + software from the tagging for Galaxy
2022-05-17 15:09:22 +02:00
Claudio Atzori
d098ad0d93
[hb patch] updated map
2022-05-16 15:54:04 +02:00
Claudio Atzori
1dda11e031
[hb patch] updated map
2022-05-16 15:53:27 +02:00
Claudio Atzori
8dd5517548
code formatting
2022-05-16 14:35:24 +02:00
Claudio Atzori
52cb086506
[graph grouping] drop relation target path before copying from source
2022-05-16 12:08:36 +02:00
Claudio Atzori
6442763f97
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-05-16 12:07:45 +02:00
Claudio Atzori
997c50078e
[graph grouping] drop relation target path before copying from source
2022-05-16 12:07:40 +02:00
Sandro La Bruzzo
c1971d52c4
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2022-05-16 10:30:35 +02:00
Sandro La Bruzzo
4c50f35c8b
update publication Date format
2022-05-16 10:29:36 +02:00
Michele Artini
46c07e0724
deleted hierarchical rels from ror action set
2022-05-16 09:39:54 +02:00
Claudio Atzori
6031acb2e3
[openorgs] fixed parent/child query, using the correct semantic labels
2022-05-16 09:20:48 +02:00
Claudio Atzori
0dc33ea391
[openorgs] fixed parent/child query, using the correct semantic labels
2022-05-16 09:20:30 +02:00
Antonis Lempesis
8160763330
fixed conflict
2022-05-13 14:29:31 +03:00
Antonis Lempesis
3fc9efeab6
fixed typo, added open citations and apcs in monitor
2022-05-13 14:28:13 +03:00
Miriam Baglioni
e4eac1d20b
[EOSC TAG] added code to remove EOSC Jupyter Notebook from subjects and put EOSC as classid in the qualifier
2022-05-13 11:01:33 +02:00
Antonis Lempesis
c25134f28d
fixed typo
2022-05-12 14:55:47 +03:00
Sandro La Bruzzo
22f65680b9
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2022-05-11 15:30:12 +02:00
Sandro La Bruzzo
ca8d26bcb4
added better filter for openCitations
2022-05-11 15:29:57 +02:00
Claudio Atzori
5d3b4a9c25
[graph merge beta] merge datasource originalid, collectedfrom, and pid lists
2022-05-11 14:13:06 +02:00
Antonis Lempesis
23334479bb
removed yet another collab, added more orgs in monitor
2022-05-11 13:05:52 +03:00
Claudio Atzori
2a8e0fb72f
[openorgs] mapping parent/child relations without massaging the semantic labels
2022-05-10 08:45:53 +02:00
Claudio Atzori
77bc9863e9
[openorgs] mapping parent/child relations without massaging the semantic labels
2022-05-09 16:06:04 +02:00
Claudio Atzori
378020e30a
[eosc_services] unit test adaptation
2022-05-09 16:05:06 +02:00
Miriam Baglioni
89657a0b78
[UsageCount] refactoring
2022-05-09 14:43:27 +02:00
Miriam Baglioni
a056f59c6e
[UsageCount] made it an action set, as it should be; also changed the tests so that they work now
2022-05-09 12:51:35 +02:00
Antonis Lempesis
61b4c19e65
restored indi_result_org_country_collab, removed indi_result_org_collab
2022-05-06 12:52:10 +03:00
Antonis Lempesis
cfbbcaf7c4
commented out indi_result_org_country_collab
2022-05-06 12:49:36 +03:00
Claudio Atzori
658450d9a3
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-05-05 11:38:08 +02:00
Claudio Atzori
846975c886
[eosc_services] using the correct 'keyword' subject type, as declared in the dnet:subject_classification_typologies vocabulary
2022-05-05 11:37:58 +02:00
Miriam Baglioni
5fe25cc51c
Merge pull request '[eosc tag] set the eosc subjects, rough implementation' ( #215 ) from eosc_tag into beta
...
Reviewed-on: #215
2022-05-04 10:11:14 +02:00
Miriam Baglioni
8a72de4011
[EOSCTag] modified workflow to execute all the steps and not only the last one
2022-05-04 10:10:56 +02:00
Miriam Baglioni
bd1108f98b
merging with branch beta
2022-05-04 10:06:56 +02:00
Miriam Baglioni
3aeedd931a
[EOSCTag] fixed issue in case description is null. Modified test resources and classes
2022-05-04 10:06:38 +02:00
Claudio Atzori
da611cfbbd
[eosc_services] resolved merge conflicts
2022-05-03 13:37:15 +02:00
Claudio Atzori
9e12cb3c92
EOSC Services - removed field knowledgegraph; depending on the released schema module
2022-05-03 11:55:45 +02:00
Miriam Baglioni
a21fe310e5
[EOSCTag] last test and change in the implementation to search in title and description
2022-05-02 17:43:20 +02:00
Claudio Atzori
2ade69dea6
EOSC Services - minor
2022-05-02 17:03:31 +02:00
Claudio Atzori
b6a7ff3a99
EOSC Services - removed fields from mapping, testing preparation
2022-05-02 15:52:33 +02:00
Miriam Baglioni
e37177e1ce
merging with branch beta
2022-05-02 12:31:50 +02:00
Claudio Atzori
a8c51f6f16
EOSC Services - fixed query and testing preparation
2022-05-02 11:09:03 +02:00
Claudio Atzori
05c1ea92e9
EOSC Services - added Service-specific fields in the XML record serialization
2022-04-29 15:56:55 +02:00
Claudio Atzori
f5f532d134
EOSC Services - ongoing update
2022-04-29 12:25:24 +02:00
Antonis Lempesis
0353f93d54
added new hive opts
2022-04-29 12:49:27 +03:00
Serafeim Chatzopoulos
623f7be26d
Fix reading files from HDFS in FileCollector & FileGZipCollector plugins
2022-04-28 16:31:11 +03:00
Claudio Atzori
5ffc24d1ba
EOSC Services - ongoing update
2022-04-26 16:18:41 +02:00
Sandro La Bruzzo
78015a5733
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2022-04-26 09:56:34 +02:00
Sandro La Bruzzo
8c22e5c30a
added fix to include date array with only year or year and month
2022-04-26 09:56:27 +02:00
Claudio Atzori
81c4496d32
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-04-26 09:02:15 +02:00
Miriam Baglioni
e342ec93f0
[EOSCTag] prepared resources for test
2022-04-22 18:35:37 +02:00
Miriam Baglioni
88562c0930
[EOSC TAG] added test for galaxy for title and description criteria
2022-04-22 18:35:03 +02:00
Miriam Baglioni
dfbd2bcbea
[EOSC TAG] added logic in case subject is null
2022-04-22 18:34:03 +02:00
Miriam Baglioni
27c85e901a
[EOSCTag] added resources and finalized test for Jupyter Notebook tagging
2022-04-22 17:38:10 +02:00
Miriam Baglioni
87bff36d9e
merging with branch beta
2022-04-22 15:52:34 +02:00
Claudio Atzori
81242538e6
Merge pull request 'Oozie workflow for cleancontext' ( #216 ) from cleancontext into beta
...
Reviewed-on: #216
Looks good. We need to extend the cleaning workflow parameters to enable the extra step only when it is needed.
2022-04-22 15:46:40 +02:00
Miriam Baglioni
911ce0780a
Merge branch 'cleancontext' of https://code-repo.d4science.org/D-Net/dnet-hadoop into cleancontext
2022-04-22 15:41:42 +02:00
Miriam Baglioni
19d90658fc
[Clean Context] added description to parameters
2022-04-22 15:41:23 +02:00
Claudio Atzori
54162f5c4f
Merge branch 'beta' into cleancontext
2022-04-22 11:49:33 +02:00
Miriam Baglioni
bbb77052d3
[EOSCTag] first test
2022-04-22 11:32:57 +02:00
Claudio Atzori
30105f0722
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-04-22 11:22:21 +02:00
Sandro La Bruzzo
a82ec3aaaf
code formatter
2022-04-22 11:08:13 +02:00
Sandro La Bruzzo
aa12429f50
Modified last intersection since we lost many titles.
2022-04-22 11:05:08 +02:00
Miriam Baglioni
7cb7066472
[EoscTag] first "rough" implementation
2022-04-22 10:44:17 +02:00
Sandro La Bruzzo
d660895b30
fixed wrong mapping type of dataset
2022-04-21 20:41:13 +02:00
Miriam Baglioni
e0915061c2
[Clean Context] fixed issue in param name
2022-04-21 16:32:40 +02:00
Miriam Baglioni
6dc68c48e0
[EOSCTag] -
2022-04-21 16:19:04 +02:00
Miriam Baglioni
9a961a0092
[Clean Context] fixed issue in param name
2022-04-21 15:12:24 +02:00
Claudio Atzori
29150a5d0c
code formatting
2022-04-21 13:31:56 +02:00
Miriam Baglioni
5b7d9e741c
[Clean Context] added logic to cleaning workflow to accommodate also context cleaning
2022-04-21 13:02:14 +02:00
Miriam Baglioni
ccba1a3db1
[Clean Context] added logic to cleaning workflow to accommodate also context cleaning
2022-04-21 13:00:06 +02:00
Claudio Atzori
a289c9eae2
Merge pull request '[Measures] added new measure (UsageCounts)' ( #214 ) from eosc_dimitris into beta
...
Reviewed-on: #214
2022-04-21 12:19:18 +02:00
Miriam Baglioni
20de75ca64
[Measures] removed typo
2022-04-21 12:14:03 +02:00
Miriam Baglioni
bebb2a0560
Merge branch 'eosc_dimitris' of https://code-repo.d4science.org/D-Net/dnet-hadoop into eosc_dimitris
2022-04-21 12:10:19 +02:00
Miriam Baglioni
b61efd613b
[Measures] addressed comments in the PR
2022-04-21 12:09:37 +02:00
Miriam Baglioni
d012d125d7
[EOSCTag] -
2022-04-21 12:02:09 +02:00
Claudio Atzori
88acad76f9
Merge branch 'beta' into eosc_dimitris
2022-04-21 12:00:03 +02:00
Claudio Atzori
eabb40fccc
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-04-21 11:42:43 +02:00
Miriam Baglioni
c304657d91
[Measures] put the logic in common, no need to change the schema
2022-04-21 11:27:26 +02:00
Sandro La Bruzzo
d580e15442
Modified last intersection since we lost many titles.
...
this is my last resource, after that, I've to change my job
2022-04-21 11:06:08 +02:00
Miriam Baglioni
5295effc96
[Measures] fixed issue
2022-04-20 16:20:40 +02:00
Miriam Baglioni
61c0266a44
Merge pull request 'Remove Context from result' ( #208 ) from cleancontext into beta
...
Reviewed-on: #208
2022-04-20 15:45:32 +02:00
Miriam Baglioni
a38f0f5ea7
merging with branch beta
2022-04-20 15:44:18 +02:00
Miriam Baglioni
dbfbe8841a
[Clean Context] changed the description in input parameters
2022-04-20 15:41:03 +02:00
Miriam Baglioni
5feae77937
[Measures] last changes to accommodate tests
2022-04-20 15:13:09 +02:00
Miriam Baglioni
869407c6e2
[Measures] added new measure (usagecounts) as action set. Measure added at the level of the result. Ref #7587
2022-04-20 14:02:05 +02:00
Antonis Lempesis
b7cd2c6ca1
added open citations
2022-04-20 14:46:55 +03:00
Michele Artini
c96a8613f8
update SQL queries
2022-04-20 12:07:49 +02:00
Michele Artini
4314db55c8
migration to services: update sql queries
2022-04-19 15:05:02 +02:00
miconis
9ddd24ba36
implementation of comparators and clustering function for the author deduplication
2022-04-19 10:18:09 +02:00
Miriam Baglioni
0012e57bf9
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2022-04-14 14:14:44 +02:00
Miriam Baglioni
c5a863132c
[BulkTagging] revert it
2022-04-14 14:14:13 +02:00
Sandro La Bruzzo
d5b29d96a7
fix merging in crossrefAggregator which creates dataInfo null
2022-04-14 11:07:04 +02:00
Miriam Baglioni
8e8933d41a
[BulkTagging] added fix if result.dataInfo is null
2022-04-14 09:04:24 +02:00
miconis
97a32faf9b
test implementation for the new fdup version
2022-04-13 09:48:56 +02:00
Claudio Atzori
b93a141d6c
[Doiboost] fixed fundingReference extraction from the Crossref records
2022-04-12 10:26:05 +02:00
Claudio Atzori
73c172926a
[Doiboost] fixed fundingReference extraction from the Crossref records
2022-04-12 10:25:42 +02:00
Claudio Atzori
48b580b45c
[graph enrichment] fixed country_propagation oozie workflow definition, parameter saveGraph is not needed anymore by the SparkCountryPropagationJob
2022-04-11 08:52:36 +02:00
Claudio Atzori
21f32b83c6
[graph enrichment] fixed country_propagation oozie workflow definition, parameter saveGraph is not needed anymore by the SparkCountryPropagationJob
2022-04-11 08:52:12 +02:00
Claudio Atzori
4eff7856f5
Merge pull request '[stats-wf] computing stats in each step' ( #210 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #210
2022-04-08 14:21:01 +02:00
Serafeim Chatzopoulos
d0b84d3297
Add FileCollectorPlugin and respective test
2022-04-07 15:06:38 +03:00
Claudio Atzori
91e32f12ed
Merge branch 'master' into beta
2022-04-07 13:37:58 +02:00
Serafeim Chatzopoulos
bc1bf55507
Add AbstractSplittedRecordPlugin
2022-04-07 14:33:04 +03:00
Claudio Atzori
c26222623f
[maven-release-plugin] prepare for next development iteration
2022-04-07 13:32:22 +02:00
Claudio Atzori
86585a6b27
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 13:32:19 +02:00
Claudio Atzori
ad85d88eaf
[maven-release-plugin] rollback the release of dhp-1.2.4
2022-04-07 13:28:35 +02:00
Claudio Atzori
598e11dfd7
[maven-release-plugin] prepare for next development iteration
2022-04-07 13:27:02 +02:00
Claudio Atzori
db3d9877a5
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 13:26:58 +02:00
Claudio Atzori
f03dea4f49
allow to skip maven site
2022-04-07 13:22:55 +02:00
Claudio Atzori
3bba6d6e38
[maven-release-plugin] rollback the release of dhp-1.2.4
2022-04-07 12:23:17 +02:00
Claudio Atzori
2ac2d928bd
[maven-release-plugin] prepare for next development iteration
2022-04-07 12:18:47 +02:00
Claudio Atzori
85bc722ff4
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 12:18:43 +02:00
Claudio Atzori
bc05b6168a
[maven-release-plugin] rollback the release of dhp-1.2.4
2022-04-07 11:49:06 +02:00
Claudio Atzori
505420fd61
[maven-release-plugin] prepare for next development iteration
2022-04-07 11:34:06 +02:00
Claudio Atzori
66e718981e
[maven-release-plugin] prepare release dhp-1.2.4
2022-04-07 11:34:02 +02:00
Serafeim Chatzopoulos
e612489670
Add fileGZip collector plugin and respective test
2022-04-06 19:12:44 +03:00
Claudio Atzori
4190c9f6bc
[graph raw] avoid NPEs importing datasource consent fields
2022-04-06 15:34:31 +02:00
Claudio Atzori
05fafa1408
[graph raw] avoid NPEs importing datasource consent fields
2022-04-06 15:23:50 +02:00
Antonis Lempesis
c442c91f89
computing stats in each step
2022-04-06 12:40:02 +03:00
Claudio Atzori
8c457f1b2c
conflicts resolved, merged from beta
2022-04-06 10:27:52 +02:00
Miriam Baglioni
e77d104951
[OC] added / to workflow path
2022-04-05 15:07:11 +02:00
Miriam Baglioni
79336d46c5
[Clean Context] first naive implementation of a functionality to clean unwanted contexts from a result. This implementation simply verifies that the main title of the result starts with a given string
2022-04-04 15:52:31 +02:00
Claudio Atzori
873369af1c
Merge pull request '[stats wf] added apcs in monitor db' ( #207 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #207
2022-03-29 15:40:20 +02:00
Antonis Lempesis
7112806a73
views cannot be stored as parquet...
2022-03-29 16:37:29 +03:00
Antonis Lempesis
fff0b3cc19
added apcs in monitor db
2022-03-29 14:15:31 +03:00
Claudio Atzori
de85367695
Merge pull request '[stats wf] fix: views cannot be stored as parquet...' ( #206 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #206
2022-03-29 12:51:02 +02:00
Antonis Lempesis
ee24f3eb2c
views cannot be stored as parquet...
2022-03-29 13:47:48 +03:00
Sandro La Bruzzo
1b11010169
minor fix
2022-03-29 10:59:14 +02:00
Claudio Atzori
0a0ae84c22
[graph raw] DOI based instance URLs on https
2022-03-29 10:52:58 +02:00
Claudio Atzori
eca82e30c9
updated dhp-schema version
2022-03-29 09:46:49 +02:00
Claudio Atzori
9fa3dd78fe
Merge pull request '[stats wf] various fixes, organization ids for inst. dashboard' ( #205 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #205
2022-03-28 22:03:49 +02:00
Claudio Atzori
5d53ac95aa
Merge pull request 'XML serialisation of instances with the same URLs - 2nd round' ( #204 ) from instance_group_by_url into beta
...
Reviewed-on: #204
2022-03-28 09:24:00 +02:00
Claudio Atzori
96aa2a5d0d
Merge branch 'beta' into instance_group_by_url
2022-03-28 09:23:52 +02:00
Claudio Atzori
395ac6ecec
merged pom.xml from beta branch
2022-03-28 09:23:42 +02:00
Claudio Atzori
fa3cb84f77
Merge pull request 'Datasource consent fields' ( #202 ) from datasource_pdf_consent into beta
...
Reviewed-on: #202
2022-03-28 09:21:14 +02:00
Claudio Atzori
741bc99c47
Merge branch 'beta' into datasource_pdf_consent
2022-03-28 09:20:48 +02:00
Claudio Atzori
3610f1749a
merged pom.xml from beta branch
2022-03-28 09:20:27 +02:00
Claudio Atzori
61319b2e83
updated dhp-schema version; set entity-level dataInfo before & after merging the fields from the group of duplicates
2022-03-25 16:38:33 +01:00
Antonis Lempesis
d8503cd191
added moooar organizations
2022-03-24 14:02:36 +02:00
Miriam Baglioni
7b8f85692e
[Enrichment country] fixed issues with parameters and workflow args
2022-03-23 17:20:23 +01:00
Claudio Atzori
48d32466e4
instances grouped by URL expose only one refereed
2022-03-23 14:52:03 +01:00
Claudio Atzori
f10066547b
increased spark.sql.shuffle.partitions in affiliation_from_semrel_propagation
2022-03-23 12:22:26 +01:00
Claudio Atzori
43733c1a18
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-03-23 12:14:27 +01:00
Antonis Lempesis
62f91b0869
cleanup
2022-03-22 16:17:49 +02:00
Antonis Lempesis
2e8394ecf8
creating aaall tables as parquet
2022-03-22 16:16:08 +02:00
Antonis Lempesis
dcfbeb8142
yet more typos
2022-03-21 12:36:03 +02:00
Miriam Baglioni
89fd275480
[HostedByMap] added left over from PR and fixed issue on workflow
2022-03-21 09:54:45 +01:00
miconis
c763aded70
dependency updated to the new pace-core version
2022-03-16 16:41:50 +01:00
miconis
c959639bd5
dependency updated to the new pace-core version
2022-03-15 16:33:03 +01:00
miconis
10172553ab
[maven-release-plugin] prepare for next development iteration
2022-03-15 15:06:18 +01:00
miconis
bd919ac98d
[maven-release-plugin] prepare release dnet-dedup-4.1.12
2022-03-15 15:06:12 +01:00
miconis
a965233dd0
bug fix in the normalization of a legalname, city map updated and transliteration support added
2022-03-15 14:59:13 +01:00
Miriam Baglioni
0f7d8ca2e0
[HostedByMap] change on master to align to PR 201 on beta merged as 9f3036c847
2022-03-11 15:16:02 +01:00
Claudio Atzori
f430029596
cleanup
2022-03-11 14:28:28 +01:00
Claudio Atzori
d48ccfd65e
Merge pull request 'enrichment_country' ( #203 ) from enrichment_country into beta
...
Looks good to me
Reviewed-on: #203
2022-03-11 14:27:01 +01:00
Miriam Baglioni
12de9acb0d
[Country Propagation] left out from previous commit
2022-03-11 14:17:02 +01:00
Miriam Baglioni
2fbb35ade5
merging with branch beta
2022-03-11 13:58:10 +01:00
Miriam Baglioni
4437f9345d
[Country Propagation] left out from previous commit
2022-03-11 13:57:47 +01:00
Miriam Baglioni
2b643059fa
[Country Propagation] changed the logic to get the collectedfrom at the result level, to fix the issue where no instance is created for a result that should have the country associated. Changed the code to use spark instead of hive to prepare the data needed for the propagation step. Added new tests for the intermediate steps and new verification for the propagation itself
2022-03-11 13:56:48 +01:00
Claudio Atzori
f25407bbe2
added mapping for datasource consent fields to integrate them in the graph
2022-03-11 09:32:42 +01:00
miconis
ac9708e31b
[maven-release-plugin] prepare for next development iteration
2022-03-09 13:43:48 +01:00
miconis
a5a6054039
[maven-release-plugin] prepare release dnet-dedup-4.1.11
2022-03-09 13:43:44 +01:00
miconis
3bc07c5881
bug fix in the AuthorMatch, implementation of the concat function in the model creation with jpath query
2022-03-09 12:53:09 +01:00
miconis
699612dd17
implementation of the size threshold on authors list match
2022-03-08 16:49:28 +01:00
Claudio Atzori
9f3036c847
Merge pull request 'HostedByMap' ( #201 ) from hostedByMap_update into beta
...
Reviewed-on: #201
2022-03-04 16:26:27 +01:00
Miriam Baglioni
2c5087d55a
[HostedByMap] download of doaj from json, modification of test resources, deletion of class no longer needed for the CSV download
2022-03-04 15:18:21 +01:00
Miriam Baglioni
5d608d6291
[HostedByMap] changed the model to include also oaStart date and review process that could be possibly used in the future
2022-03-04 11:06:09 +01:00
Miriam Baglioni
b7c2340952
[HostedByMap - DOIBoost] changed to use code moved to common since used also from hostedbymap now
2022-03-04 11:05:23 +01:00
Miriam Baglioni
8a41f63348
[HostedByMap] update to download the json instead of the csv
2022-03-04 10:38:43 +01:00
Miriam Baglioni
44b0c03080
[HostedByMap] update to download the json instead of the csv
2022-03-04 10:37:59 +01:00
Antonis Lempesis
ad78e505da
yet another fix
2022-03-03 12:28:12 +02:00
Miriam Baglioni
3be8737c32
[graph-stats] fixed query after the change in the indicator table related to PR#200
2022-03-02 14:09:05 +01:00
Miriam Baglioni
3970651ee1
Merge pull request 'fixed query after the change in the indicator table' ( #200 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #200
2022-03-02 14:05:58 +01:00
Antonis Lempesis
efeeebfee1
fixed query after the change in the indicator table
2022-03-02 13:29:25 +02:00
Claudio Atzori
580d904aae
manually merging PR#199 #199
2022-02-25 12:22:50 +01:00
Claudio Atzori
1932a65d1c
Merge pull request '[Stats wf] sprint 6 indicators' ( #198 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #198
2022-02-25 12:09:18 +01:00
Miriam Baglioni
f5b0a6f89c
[master to beta] fixed issues in test files
2022-02-25 10:21:57 +01:00
miconis
8991d097b4
bug fix in the DedupRecordFactory, DataInfo set before merge
2022-02-24 17:13:12 +01:00
miconis
fe1c966cbf
Merge branch 'master_202203' of code-repo.d4science.org:D-Net/dnet-hadoop into master_202203
2022-02-24 17:08:38 +01:00
miconis
b0f369dc78
bug fix in the DedupRecordFactory, DataInfo set before merge
2022-02-24 17:08:24 +01:00
Miriam Baglioni
859cb7ac9d
[DoiBoost AR] changed test resource to be sure the result will always have EMBARGO as value for AccessRight
2022-02-24 16:55:32 +01:00
Miriam Baglioni
a40b59b7d5
[ResultToOrgFromInstRepoTest] fixed issue in model of the input resources
2022-02-24 16:05:57 +01:00
Claudio Atzori
66c09b1bc7
code formatting
2022-02-24 12:58:07 +01:00
Claudio Atzori
e7016c3981
Merge branch 'master_202203' into beta
2022-02-24 12:51:58 +01:00
Claudio Atzori
a87c070447
conflicts resolved, merged from beta
2022-02-24 12:51:31 +01:00
Claudio Atzori
55caa389d5
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2022-02-24 12:16:43 +01:00
Claudio Atzori
ab36154e3e
added more ignores
2022-02-24 12:16:17 +01:00
Claudio Atzori
fbf192d6ba
Merge pull request '[provision wf] serialize measures defined on the result level' ( #196 ) from xml_measures into beta
...
Reviewed-on: #196
2022-02-23 15:56:28 +01:00
Claudio Atzori
86cdb7a38f
[provision] serialize measures defined on the result level
2022-02-23 15:54:18 +01:00
Alessia Bardi
9d6203f79b
test mapping datasource
2022-02-23 15:00:53 +01:00
Antonis Lempesis
3b92a2ab9c
added the rest of sprint 6 in monitor db
2022-02-23 12:05:57 +02:00
dimitrispie
9a75ca1ae4
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2022-02-22 14:47:33 +02:00
Antonis Lempesis
87c91f70a2
added sprint 6 indicators to monitor db
2022-02-22 14:41:48 +02:00
Antonis Lempesis
0bff45e739
added sprint 6 indicators to monitor db
2022-02-18 17:11:23 +02:00
Claudio Atzori
5226d0a100
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2022-02-18 15:21:07 +01:00
Claudio Atzori
99f5b14469
[graph raw] invisible records stored among the raw graph rather than the claimed subgraph
2022-02-18 15:20:57 +01:00
Claudio Atzori
401dd38074
code formatting
2022-02-18 15:19:15 +01:00
Claudio Atzori
cf8443780e
added processingchargeamount to the result view
2022-02-18 15:17:48 +01:00
Sandro La Bruzzo
891781ee3f
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2022-02-18 11:11:32 +01:00
Sandro La Bruzzo
d3f03abd51
fixed wrong json path
2022-02-18 11:11:17 +01:00
Claudio Atzori
56e048ef21
Merge pull request 'added hierarchy rel in ROR actionset' ( #153 ) from hierarchical_orgs_relations into beta
...
Reviewed-on: #153
2022-02-17 10:31:56 +01:00
Claudio Atzori
89c7313fc5
Merge branch 'beta' into hierarchical_orgs_relations
2022-02-17 10:30:04 +01:00
dimitrispie
58c59f46eb
Added Sprint 6
2022-02-17 10:21:09 +02:00
Antonis Lempesis
5772f92dba
merged beta changes in hive branch
2022-02-15 13:24:51 +02:00
Antonis Lempesis
393a4ee956
fixed yet another typo...
2022-02-15 12:56:50 +02:00
Sandro La Bruzzo
3aa2020b24
added script to regenerate hostedBy Map following instruction defined on ticket #7539
...
updated hosted By Map
2022-02-15 11:05:27 +01:00
Miriam Baglioni
90e197a563
Merge pull request '[OpenCitation] changed the name of destination folders' ( #195 ) from openCitations into beta
...
Reviewed-on: #195
2022-02-14 15:52:10 +01:00
Miriam Baglioni
be64055cfe
[OpenCitation] changed the name of destination folders
2022-02-14 15:49:44 +01:00
Miriam Baglioni
a1013e62d4
Merge pull request 'openCitations' ( #194 ) from openCitations into beta
...
Reviewed-on: #194
2022-02-14 14:58:28 +01:00
Miriam Baglioni
1490867cc7
[OpenCitation] cleaning of the COCI model
2022-02-14 14:52:12 +01:00
Miriam Baglioni
c191080965
merging with branch beta
2022-02-14 14:49:39 +01:00
Alessia Bardi
6158170334
testing delegated authority and bumped dep to schemas
2022-02-11 18:05:18 +01:00
Alessia Bardi
600ede1798
serialisation of APCs in the XML records
2022-02-11 11:00:20 +01:00
Miriam Baglioni
5c4043dba8
[OpenCitation] refactoring
2022-02-08 16:23:05 +01:00
Miriam Baglioni
759ed519f2
[OpenCitation] added logic to avoid the generation of self-citation relations
2022-02-08 16:15:34 +01:00
Miriam Baglioni
b071f8e415
[OpenCitation] change to extract in json format each folder just once
2022-02-08 15:37:28 +01:00
Miriam Baglioni
fbc28ee8c3
[OpenCitation] change the integration logic to consider dois with commas inside
2022-02-07 18:32:08 +01:00
Miriam Baglioni
78be2975f0
[stats-wf]fixed another typo related to PR#193
2022-02-07 11:22:08 +01:00
Miriam Baglioni
1f8302dc37
Merge pull request '[stats-wf]fixed yet another typo' ( #193 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #193
2022-02-07 11:19:26 +01:00
Antonis Lempesis
5f762cbd09
fixed yet another typo
2022-02-07 12:09:12 +02:00
Alessia Bardi
b04ecfcf13
Merge pull request 'extendResult' ( #192 ) from extendResult into beta
...
Reviewed-on: #192
2022-02-04 16:43:58 +01:00
Alessia Bardi
ac8b8f224f
Merge branch 'beta' into extendResult
2022-02-04 16:43:27 +01:00
Miriam Baglioni
9fd2ef468e
[APC at the result level] changed dependency in external pom
2022-02-04 16:40:32 +01:00
Miriam Baglioni
493caef358
[stats-wf]fixed the result_result table related to PR#191
2022-02-04 14:51:25 +01:00
Miriam Baglioni
0547fd6ee7
Merge pull request '[stats-wf]fixed the result_result table' ( #191 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #191
2022-02-04 14:47:31 +01:00
Antonis Lempesis
ae633c566b
fixed the result_result table
2022-02-04 15:04:19 +02:00
Miriam Baglioni
aae667e6b6
[APC at the result level] added the APC at the level of the result and modified test class
2022-02-04 12:34:25 +01:00
Sandro La Bruzzo
bcfdf9a0d7
iis repository with https
2022-02-03 16:49:31 +01:00
Miriam Baglioni
3c60e53a96
[stats-wf]fixed the result_result creation for monitor PR#190 on beta
2022-02-03 14:47:08 +01:00
Miriam Baglioni
89922156c9
Merge pull request '[stats-wf]fixed the result_result creation for monitor' ( #190 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #190
2022-02-03 13:00:56 +01:00
Antonis Lempesis
c2b44530a3
typo...
2022-02-03 13:44:07 +02:00
Antonis Lempesis
dbd2646d59
fixed the result_result creation for monitor
2022-02-03 12:37:10 +02:00
Alessia Bardi
2e215abfa8
test for instances with URLs for OpenAPC
2022-02-02 17:27:44 +01:00
Miriam Baglioni
37784209c9
[dhp-schemas-] updated the version of dhp-schema to 2.10.27 for APC name and id modification
2022-02-02 12:46:31 +01:00
Miriam Baglioni
73eba34d42
[UnresolvedEntities] Changed the way to merge the unresolved because the new merge removed the dataInfo from the merged result. Added also data info for subjects
2022-02-01 08:38:41 +01:00
Miriam Baglioni
dce7f5fea8
[BULK TAGGING] changed to fix issue that should have been fixed already
2022-01-31 08:20:28 +01:00
Claudio Atzori
8eb75ca169
adapted GenerateEntitiesApplicationTest behaviour
2022-01-27 16:24:37 +01:00
Claudio Atzori
db299dd8ab
fixed typo
2022-01-27 16:24:06 +01:00
Claudio Atzori
af61e44acc
ported changes to the GraphCleaningFunctionsTest from 8de9788308
2022-01-27 16:19:14 +01:00
Claudio Atzori
4fc44edb71
depending on dhp-schemas:2.10.26
2022-01-27 16:03:57 +01:00
Miriam Baglioni
a70b0990c9
Merge pull request 'Priority to records from delegated authorities' ( #187 ) from delegated_authorities into beta
...
Reviewed-on: #187
2022-01-26 16:02:49 +01:00
Claudio Atzori
1322379741
Merge branch 'beta' into delegated_authorities
2022-01-25 14:28:25 +01:00
Claudio Atzori
59a250337c
[graph resolution] drop output path at the beginning
2022-01-24 18:02:39 +01:00
Claudio Atzori
97ad94d7d9
[graph resolution] drop output path at the beginning
2022-01-24 18:02:07 +01:00
Claudio Atzori
8de9788308
applied fix for avoiding ruling out the invisible (APC) records during the graph cleaning
2022-01-24 11:29:22 +01:00
Claudio Atzori
c42623f006
added NPE checks
2022-01-21 14:30:09 +01:00
Claudio Atzori
2f385b3ac6
updated dnet workflow profile definitions
2022-01-21 13:59:46 +01:00
Claudio Atzori
dd52bf1bb8
copy relations to the graphOutputPath
2022-01-21 13:59:29 +01:00
Claudio Atzori
4983d6536d
Merge branch 'beta' into delegated_authorities
2022-01-21 13:02:48 +01:00
Sandro La Bruzzo
7a3819144d
Merge pull request 'title types from datacite records' ( #188 ) from datacite_title_mapping into beta
...
Reviewed-on: #188
2022-01-21 11:05:25 +01:00
Claudio Atzori
f0ea2410e5
improved mapping titles from datacite records to consider title types
2022-01-21 10:50:34 +01:00
Claudio Atzori
b37bc277c4
reintroduced the hostedby patching to the datacite records
2022-01-21 09:15:13 +01:00
Claudio Atzori
f2fde5566b
using helper method from ModelSupport to find the inverse relation descriptor
2022-01-20 09:19:07 +01:00
Claudio Atzori
3b9020c1b7
added unit test for the DispatchEntitiesJob
2022-01-19 18:15:55 +01:00
Claudio Atzori
abfa9c6045
code formatting
2022-01-19 17:17:11 +01:00
Claudio Atzori
391aa1373b
added unit test
2022-01-19 17:13:21 +01:00
Claudio Atzori
62f135262e
code formatting
2022-01-19 12:30:52 +01:00
Claudio Atzori
44a937f4ed
factored out entity grouping implementation, extended to consider results from delegated authorities rather than identical records from other sources
2022-01-19 12:24:52 +01:00
miconis
8f07f0c537
[maven-release-plugin] prepare for next development iteration
2022-01-13 17:22:16 +01:00
miconis
620e35db28
[maven-release-plugin] prepare release dnet-dedup-4.1.10
2022-01-13 17:22:12 +01:00
miconis
2ff97781d2
minor change
2022-01-13 17:20:20 +01:00
Miriam Baglioni
42e8f76778
[GraphCleaning] change the return value in the filtering function to avoid losing the APC entities
2022-01-13 16:06:43 +01:00
miconis
1ff6a3dc11
[maven-release-plugin] prepare for next development iteration
2022-01-13 15:15:05 +01:00
miconis
003bcf1699
[maven-release-plugin] prepare release dnet-dedup-4.1.9
2022-01-13 15:15:00 +01:00
Miriam Baglioni
a7c4d0d16d
[DoiBoost Organizations] added parameter to specify the action in the wf raw_organizations to be able to load the openorgs organization as in the loading step for the construction of the graph
2022-01-13 13:52:00 +01:00
miconis
2f1ba56f61
bug fix in the authormatch comparator, implementation of tests
2022-01-13 11:58:28 +01:00
Miriam Baglioni
7bf12ad24a
Merge pull request 'BipInstance' ( #185 ) from BipInstance into beta
...
Reviewed-on: #185
2022-01-12 18:15:38 +01:00
Miriam Baglioni
a75fb8c47a
[BipFinderInstanceLevel] change pom to align to the dhp-schema release 2.10.24 and refactoring
2022-01-12 18:06:26 +01:00
Miriam Baglioni
4d517ed9ec
merging with branch beta
2022-01-12 17:29:37 +01:00
Miriam Baglioni
e7d5a39c03
[BipFinderInstanceLevel] added tests in test class
2022-01-12 17:25:04 +01:00
Claudio Atzori
dbd6fa1d65
scalafmt: remote referencing the common definition files makes it work compiling the entire project as well as the individual submodules
2022-01-12 17:19:38 +01:00
Miriam Baglioni
4993666d73
[BipFinderInstanceLevel] changed creation of the instance to allow to enrich existing instances with same pid
2022-01-12 16:53:47 +01:00
Claudio Atzori
9acc32faa6
[stats wf] final touches for the integration of PRs #166 , #179 in the master branch
2022-01-12 12:04:31 +01:00
dimitrispie
b053b0178e
Sprint 5 and other changes
2022-01-12 11:23:37 +01:00
Antonis Lempesis
b6b4bc0df9
added first indicator of sprint 5
2022-01-12 11:20:28 +01:00
Antonis Lempesis
e91f06f39b
fixed typos in indicators. Added extra views in monitor
2022-01-12 11:18:28 +01:00
Antonis Lempesis
3ce1976627
fixed column names
2022-01-12 11:14:41 +01:00
Antonis Lempesis
4878d7485c
added usage stats
2022-01-12 11:13:25 +01:00
Antonis Lempesis
a4316bafed
fixed a typo
2022-01-12 11:12:53 +01:00
Antonis Lempesis
bb17e070d8
added result_result relations
2022-01-12 11:09:38 +01:00
Claudio Atzori
a30a98a716
Applying PR#166 in the master branch (Added sprint 3&4 of indicators). Merge commit '0df9574a6f5d9d75bc840decb023561ae941f9d6'
2022-01-12 10:57:19 +01:00
Sandro La Bruzzo
1b9e8378b3
Merge pull request 'scalafmt: code style for scala' ( #184 ) from scalafmt into beta
...
Reviewed-on: #184
2022-01-12 09:58:39 +01:00
Sandro La Bruzzo
57e2c4b749
formatted code
2022-01-12 09:40:28 +01:00
Sandro La Bruzzo
b78d2b71f0
updated scala format configuration
2022-01-12 09:38:34 +01:00
Claudio Atzori
0f2144b5e0
scalafmt: code formatting
2022-01-11 17:03:44 +01:00
Claudio Atzori
dcd282977c
pulled from beta
2022-01-11 16:59:41 +01:00
Claudio Atzori
4f212652ca
scalafmt: code formatting
2022-01-11 16:57:48 +01:00
Sandro La Bruzzo
0163dadb7f
[doiboost]
...
- update MAG schema, new field added on version dec-2021
2022-01-11 11:05:44 +01:00
Miriam Baglioni
904e1c2667
Merge pull request 'Affiliation Propagation through semantic relation' ( #183 ) from enrichment into beta
...
Reviewed-on: #183
2022-01-07 19:18:16 +01:00
Miriam Baglioni
064f9bbd87
[AFFPropSR] added new parameter for the number of iterations and new code for just one iteration
2022-01-07 18:58:51 +01:00
Miriam Baglioni
93f26fb742
Merge pull request '[SDG-FOS] to import SDG file not considering the header' ( #182 ) from SDG into beta
...
Reviewed-on: #182
2022-01-07 16:28:55 +01:00
Miriam Baglioni
b7e450070b
[SDG-FOS] to import SDG file not considering the header
2022-01-07 12:13:26 +01:00
Miriam Baglioni
af8a33638d
Merge pull request 'SDG - FOS' ( #181 ) from SDG into beta
...
Reviewed-on: #181
2022-01-07 11:31:19 +01:00
Miriam Baglioni
639190370a
merging with branch beta
2022-01-07 11:29:25 +01:00
Miriam Baglioni
adccc2346a
[SDG-FOS] to lower case for the doi
2022-01-07 11:28:50 +01:00
Claudio Atzori
8ae46ca789
OAF-store-graph mdstores: further fix for PR#180
2022-01-05 15:52:15 +01:00
Claudio Atzori
908294d86e
OAF-store-graph mdstores: further fix for PR#180
2022-01-05 15:49:05 +01:00
Claudio Atzori
3bd3653be9
OAF-store-graph mdstores: save them in text format
2022-01-04 16:39:39 +01:00
Claudio Atzori
3dc48c7ab5
OAF-store-graph mdstores: save them in text format
2022-01-04 16:39:27 +01:00
Claudio Atzori
f82db765db
OAF-store-graph mdstores: save them in text format
2022-01-04 16:39:15 +01:00
Claudio Atzori
8d13effa31
test for the tolerant deserialisation utility method
2022-01-04 16:38:26 +01:00
Claudio Atzori
9458ee7938
serialise records in the OAF-store-graph mdstores in json format. Read them again in the graph construction phase using a tolerant parser to support backward compatible changes in the evolution of the schema
2022-01-04 16:38:09 +01:00
Claudio Atzori
58f8998e3d
OAF-store-graph mdstores: save them in text format
2022-01-04 15:02:09 +01:00
Claudio Atzori
174c3037e1
OAF-store-graph mdstores: save them in text format
2022-01-04 14:40:16 +01:00
Claudio Atzori
045d767013
OAF-store-graph mdstores: save them in text format
2022-01-04 14:23:01 +01:00
Claudio Atzori
cb30770a0b
Merge pull request 'tolerant parsing of OAF-store-graph mdstores' ( #180 ) from graph_interpretation_mdstores into beta
...
Reviewed-on: #180
2022-01-04 11:32:29 +01:00
Claudio Atzori
bd59b58efb
test for the tolerant deserialisation utility method
2022-01-04 11:26:56 +01:00
Claudio Atzori
a6977197b3
serialise records in the OAF-store-graph mdstores in json format. Read them again in the graph construction phase using a tolerant parser to support backward compatible changes in the evolution of the schema
2022-01-03 17:25:26 +01:00
Miriam Baglioni
4c60ee1718
merging with branch beta
2022-01-03 15:24:02 +01:00
Miriam Baglioni
92fd69e25d
[SDG-FOS] alternative way to get input data to avoid OOM error while getting csv
2022-01-03 15:23:06 +01:00
Claudio Atzori
fe7e5f4748
Merge pull request '[stats wf] result_result relations, usage stats, monitor views, indicator for sprint 5' ( #179 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #179
2022-01-03 14:52:11 +01:00
Claudio Atzori
bcea4e3a9b
added dnet workflow profile for the orchestration of the simplified and complete graph construction and processing pipeline, where the IIS works on the non-deduplicated graph
2022-01-03 14:33:00 +01:00
miconis
cea8440153
[maven-release-plugin] prepare for next development iteration
2021-12-30 13:11:57 +01:00
miconis
eb48d31ea6
[maven-release-plugin] prepare release dnet-dedup-4.1.8
2021-12-30 13:11:52 +01:00
miconis
a224bf70a4
implementation of new comparators for publication dedup configuration update
2021-12-27 17:35:02 +01:00
Miriam Baglioni
a706ba0c08
Merge pull request 'SDG Integration' ( #178 ) from SDG into beta
...
Reviewed-on: #178
2021-12-23 14:50:00 +01:00
Antonis Lempesis
81ee654271
added result_result relations
2021-12-23 15:46:17 +02:00
Antonis Lempesis
7551e52e95
fixed a typo
2021-12-23 15:33:53 +02:00
Miriam Baglioni
7a1b440413
[SDG] logic to create unresolved entities out of SDG input. This also changes some classes related to FOS to reuse the same code. The code under createunresolvedentities creates results with the merged update of the inputs provided (bip at the level of the instance, fos and sdg for subjects)
2021-12-23 13:24:28 +01:00
Claudio Atzori
278cf08421
Merge pull request 'Normalising DOI urls' ( #177 ) from instance_group_by_url into beta
...
Reviewed-on: #177
2021-12-23 12:40:17 +01:00
Claudio Atzori
cccb16900c
https://support.openaire.eu/issues/7330 normalising DOI urls
2021-12-23 12:33:53 +01:00
Miriam Baglioni
2a67ee13ec
[SDG] added model class
2021-12-23 10:37:52 +01:00
Miriam Baglioni
5c4fee3533
Merge pull request '[Graph Dump] fixed issue on extraction of relation between entities and contexts: the relationship name and type were swapped' ( #176 ) from dump into beta
...
Reviewed-on: #176
2021-12-23 10:16:20 +01:00
Miriam Baglioni
69e9ea9eeb
[Graph Dump] Test for extraction of rels from entities extended
2021-12-23 10:15:30 +01:00
Miriam Baglioni
31b26d48ac
[Graph Dump] fixed issue on extraction of relation between entities and contexts: the relationship name and type were swapped
2021-12-23 10:09:47 +01:00
Miriam Baglioni
bf3a9505e0
Merge pull request 'FOS' ( #175 ) from FOS into beta
...
Reviewed-on: #175
2021-12-23 09:06:56 +01:00
Miriam Baglioni
10579c0dd0
[FOS]fixed doi value in test
2021-12-22 23:10:16 +01:00
Miriam Baglioni
6116fc5d40
[FOS]added logic to include only different subjects. Test refactoring and extension
2021-12-22 23:04:22 +01:00
Miriam Baglioni
b81efb6a9d
[FOS]changed the mapping between the csv and the model. Changed Test classes and resources
2021-12-22 21:40:35 +01:00
Miriam Baglioni
73175ba086
merging with branch beta
2021-12-22 16:45:15 +01:00
Miriam Baglioni
de6c4c8968
[FOS]creation of the unresolved entities: remove the split for the doi: no longer needed since each row is related to one doi
2021-12-22 16:44:44 +01:00
Miriam Baglioni
b352fbe453
Merge pull request 'bipFinder: unresolved entities' ( #174 ) from bipFinder into beta
...
Reviewed-on: #174
2021-12-22 16:42:30 +01:00
Miriam Baglioni
34ac56565d
refactoring
2021-12-22 16:28:11 +01:00
Miriam Baglioni
20ef1d657f
refactoring
2021-12-22 16:26:36 +01:00
Miriam Baglioni
813f856d3f
[BipFinder] removing left over parameter in wf
2021-12-22 16:11:12 +01:00
Miriam Baglioni
2c126ed014
[BipFinder] create unresolved entities with measures at the level of the instance
2021-12-22 16:03:41 +01:00
Miriam Baglioni
bf52a1847b
Merge pull request 'bipFinder at the level of the result' ( #173 ) from bipFinder into beta
...
Reviewed-on: #173
2021-12-22 15:48:03 +01:00
Miriam Baglioni
0807fdb65a
[BipFinder] remove not needed resources
2021-12-22 15:37:00 +01:00
Miriam Baglioni
b5e11a3a0a
[BipFinder] put in common package BipFinder model
2021-12-22 15:33:05 +01:00
Miriam Baglioni
c5739c4266
[BipFinder] create action set for the measures at the level of the result
2021-12-22 15:08:33 +01:00
Miriam Baglioni
da5f6260aa
merging with branch beta
2021-12-22 13:12:02 +01:00
Miriam Baglioni
4849270c55
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-22 13:07:37 +01:00
Claudio Atzori
8d18500069
using dhp-schema:2.9.24
2021-12-22 12:47:21 +01:00
Miriam Baglioni
9d19b057b8
Merge pull request '[GRAPH DUMP]Moving Measures' ( #159 ) from dump into beta
...
Reviewed-on: #159
2021-12-22 12:40:35 +01:00
Miriam Baglioni
be0acccf42
Merge branch 'beta' into dump
2021-12-22 12:39:57 +01:00
Miriam Baglioni
89ea9fa0e1
Merge branch 'dump' of https://code-repo.d4science.org/D-Net/dnet-hadoop into dump
2021-12-22 12:36:32 +01:00
Antonis Lempesis
16539d7360
added usage stats
2021-12-22 02:54:42 +02:00
Antonis Lempesis
3edd661608
fixed column names
2021-12-21 22:55:04 +02:00
Antonis Lempesis
a4c0cbb98c
fixed typos in indicators. Added extra views in monitor
2021-12-21 15:54:38 +02:00
Miriam Baglioni
e24a7f3496
merging with branch beta
2021-12-21 13:57:19 +01:00
Miriam Baglioni
d1ae219cb4
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-21 13:55:53 +01:00
Miriam Baglioni
460e6b95d6
[Graph Dump] -
2021-12-21 13:48:03 +01:00
Sandro La Bruzzo
3920d68992
Fixed workflow generation of delta in datacite
2021-12-21 11:41:49 +01:00
Antonis Lempesis
58996972d9
added first indicator of sprint 5
2021-12-21 03:35:04 +02:00
dimitrispie
c1cdec09a9
Sprint 5 and other changes
2021-12-20 19:23:57 +02:00
Miriam Baglioni
3cc1b7b153
merging with branch beta
2021-12-15 17:25:02 +01:00
Miriam Baglioni
5e5dfd619c
Merge branch 'beta' into dump
2021-12-15 17:21:55 +01:00
Miriam Baglioni
63b648b0dd
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-15 12:41:15 +01:00
Antonis Lempesis
f0b523cfa7
removed the too restrictive clause. will discuss again
2021-12-15 12:32:15 +01:00
Sandro La Bruzzo
b881ee5ef8
[scholexplorer]
...
- implemented generation of scholix of delta update of datacite
2021-12-15 11:25:32 +01:00
Sandro La Bruzzo
63952018c0
[scholexplorer]
...
-moved SparkRetrieveDataciteDelta in scala folder
2021-12-15 11:25:32 +01:00
Sandro La Bruzzo
e5bff64f2e
[scholexplorer]
...
- Minor fix on SparkConvertRDDtoDataset
-first implementation of retrieve datacite dump
2021-12-15 11:25:32 +01:00
Claudio Atzori
e30e5ac8a8
Merge pull request '[Affiliation Propagation]' ( #162 ) from affiliationPropagation into beta
...
Reviewed-on: #162
2021-12-14 15:28:23 +01:00
Claudio Atzori
1790fa2d44
Merge branch 'beta' into affiliationPropagation
2021-12-14 15:26:56 +01:00
Miriam Baglioni
56409d1281
[Dump] resolved conflicts with beta and merging
2021-12-14 15:03:45 +01:00
Miriam Baglioni
a3592b463a
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-14 14:58:26 +01:00
Miriam Baglioni
22d4b5619b
[BipFinder Result] last changes to test and resources files
2021-12-14 14:54:13 +01:00
Miriam Baglioni
6fb6236cd4
changed the way to produce the AS for bipFinder.
2021-12-14 14:51:14 +01:00
Claudio Atzori
aff3ddc8d2
added cleaning for the format field, removing carriage return and tab characters
2021-12-14 11:41:46 +01:00
Miriam Baglioni
573bd17cbb
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-14 11:12:25 +01:00
Miriam Baglioni
4eb8276493
-
2021-12-14 11:12:17 +01:00
Antonis Lempesis
ddd34087c2
removed 'stored as parquet' from views..
2021-12-13 23:05:00 +02:00
Antonis Lempesis
915f758c82
moving data to impala cluster and creating shadow databases there
2021-12-13 16:26:14 +02:00
Miriam Baglioni
936578aaf1
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-13 15:01:47 +01:00
Miriam Baglioni
8d755cca80
-
2021-12-13 15:01:40 +01:00
Claudio Atzori
98eb292c59
avoid NPEs merging XMLInstance(s)
2021-12-13 13:27:20 +01:00
Claudio Atzori
5e17247bb6
avoid NPEs merging XMLInstance(s)
2021-12-13 11:48:40 +01:00
Claudio Atzori
b70ecccea0
avoid NPEs merging XMLInstance(s)
2021-12-12 12:37:38 +01:00
Claudio Atzori
c1b6ae47cd
cleaning workflow assigns the proper default instance type when a value could not be cleaned using the vocabularies
2021-12-09 16:47:41 +01:00
Claudio Atzori
25dc7929a9
Merge pull request '[graph cleaning] improved instance type defaults' ( #172 ) from graph_cleaning into beta
...
Reviewed-on: #172
2021-12-09 16:47:06 +01:00
Claudio Atzori
eb43eda42a
Merge branch 'beta' into graph_cleaning
2021-12-09 16:46:48 +01:00
Claudio Atzori
41c70c607d
cleaning workflow assigns the proper default instance type when a value could not be cleaned using the vocabularies
2021-12-09 16:44:28 +01:00
Alessia Bardi
8f1e018ceb
Merge pull request 'Serialization of fields in XML records for Sygma (and not only)' ( #171 ) from sygma_indexing into beta
...
Reviewed-on: #171
2021-12-09 15:53:27 +01:00
Alessia Bardi
cba63e9f82
Merge branch 'beta' into sygma_indexing
2021-12-09 15:52:16 +01:00
Alessia Bardi
e53228401b
style
2021-12-09 15:46:22 +01:00
Claudio Atzori
cd9c51fd7a
vocabulary based cleaning also considers the term label when looking up a synonym
2021-12-09 14:49:24 +01:00
Claudio Atzori
adf17452b0
Merge pull request '[graph cleaning] consider terms as synonyms in the vocabulary lookup' ( #170 ) from graph_cleaning into beta
...
Reviewed-on: #170
2021-12-09 14:45:14 +01:00
Claudio Atzori
e6e177dda0
vocabulary based cleaning considers also the term label when looking up for a synonym
2021-12-09 13:57:53 +01:00
Alessia Bardi
6b5d7688a4
#7275 serialize license information in XML records
2021-12-09 13:46:48 +01:00
Miriam Baglioni
b113586207
resolved conflicts
2021-12-07 10:16:14 +01:00
Sandro La Bruzzo
5d51b3dd4a
Merge pull request 'scala_refactor' ( #169 ) from scala_refactor into beta
...
Reviewed-on: #169
2021-12-06 15:33:44 +01:00
Miriam Baglioni
d9836f0cf3
[OpenCitations] fixed test when executed one after the other
2021-12-06 15:27:09 +01:00
Miriam Baglioni
d1df01ff1e
[Graph Dump] fixed resource for test
2021-12-06 15:15:48 +01:00
Sandro La Bruzzo
ed0c352799
[test-fixing] fixed wrong test
2021-12-06 15:07:41 +01:00
Miriam Baglioni
96a7d46278
[Graph Dump] fixed tests
2021-12-06 15:06:32 +01:00
Sandro La Bruzzo
e9f285ec4d
[scala-refactor] Module dhp-doiboost:
...
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 14:24:03 +01:00
Sandro La Bruzzo
bf880e2508
[scala-refactor] Module dhp-graph-mapper:
...
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 13:57:41 +01:00
Sandro La Bruzzo
81bf604059
[scala-refactor] Module dhp-common:
...
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 11:29:24 +01:00
Sandro La Bruzzo
7af0bbd0b1
[scala-refactor] Module dhp-aggregation:
...
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 11:26:36 +01:00
Claudio Atzori
9132727793
fixed date cleaning test
2021-12-06 10:54:05 +01:00
Claudio Atzori
08795cbd30
using helper method from ModelSupport to find the inverse relation descriptor
2021-12-06 10:39:56 +01:00
Miriam Baglioni
f430688ff7
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-12-03 12:36:08 +01:00
Miriam Baglioni
4bb1d43afc
-
2021-12-03 12:35:51 +01:00
Sandro La Bruzzo
0fa0ce33d6
removed duplicated on gitignore
2021-12-03 11:47:35 +01:00
Sandro La Bruzzo
f7011b90d8
format code
2021-12-03 11:15:09 +01:00
Claudio Atzori
372633880f
Merge pull request 'XML serialisation of instances with the same URLs' ( #167 ) from instance_group_by_url into beta
...
Reviewed-on: #167
2021-12-03 09:28:06 +01:00
Claudio Atzori
dd0b2e5244
Merge branch 'beta' into instance_group_by_url
2021-12-03 09:27:58 +01:00
Claudio Atzori
c4c705aa46
Merge pull request 'Cleaning of invisible records' ( #168 ) from clean_invisible_records into beta
...
Reviewed-on: #168
2021-12-03 09:27:41 +01:00
Claudio Atzori
863a2f9db3
avoid to filter OAF records defined as invisible = true
2021-12-03 09:08:12 +01:00
Claudio Atzori
9cac283bec
implemented Instance serialization features requested in https://support.openaire.eu/issues/7156
2021-12-02 17:20:33 +01:00
Miriam Baglioni
d9f80488cc
[GRAPH DUMP] Add one more test to check the filtering of the relations
2021-12-02 14:15:19 +01:00
Miriam Baglioni
58bc3f223a
[GRAPH DUMP] Add filtering for relation we do not want to dump. It is based on the relclass
2021-12-02 14:09:46 +01:00
Miriam Baglioni
8905a39bf3
merging with branch beta
2021-12-02 13:17:29 +01:00
Miriam Baglioni
87eedad898
-
2021-12-02 13:17:19 +01:00
Claudio Atzori
3b19821f3c
added stats computation on the graph hive DB tables
2021-12-02 10:44:10 +01:00
Claudio Atzori
cfa4560769
minor: fixed hive action name
2021-12-02 10:43:36 +01:00
Claudio Atzori
d85af6fc25
[cleaning wf] fixed OAF record navigation, a mapping defined on a container object would have prevented the navigation to continue on its properties
2021-12-01 15:49:15 +01:00
Claudio Atzori
4fe7888817
code formatting
2021-12-01 15:48:15 +01:00
Claudio Atzori
01e5e0142a
added test to verify the relation inverse lookup operation
2021-12-01 09:46:26 +01:00
Antonis Lempesis
d05210ba99
finished migration to hive only
2021-11-30 19:01:48 +02:00
Claudio Atzori
0df9574a6f
Merge pull request '[stats wf] Added sprint 3&4 of indicators' ( #166 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #166
2021-11-29 10:40:26 +01:00
Claudio Atzori
1de881b796
resolved conflicts for #165
2021-11-26 16:15:11 +01:00
Claudio Atzori
014e872ae1
[resolution wf] added optional parameter to skip the entity resolution
2021-11-26 15:38:56 +01:00
Claudio Atzori
5c6d328537
code formatting
2021-11-26 15:38:16 +01:00
dimitrispie
09fc2afdca
Added indi_funder_country_collab
...
Kept only indi_pub_has_cc_licence
2021-11-26 16:13:10 +02:00
dimitrispie
8750a71502
Merge remote-tracking branch 'origin/beta' into beta
2021-11-26 16:11:26 +02:00
dimitrispie
25fc8abf77
Sprint 4
2021-11-26 16:10:58 +02:00
Antonis Lempesis
0b4163ee0b
added sprint3,4, removed 2, chaos
2021-11-26 15:58:01 +02:00
Antonis Lempesis
12749a0a77
first
2021-11-26 15:40:40 +02:00
dimitrispie
29f69f2f89
Sprint 4
2021-11-26 15:22:04 +02:00
Sandro La Bruzzo
bb7f556eff
Merge remote-tracking branch 'origin/beta' into beta
2021-11-25 13:03:25 +01:00
Sandro La Bruzzo
1e1f5e4fe0
minor fix
2021-11-25 13:03:17 +01:00
Miriam Baglioni
ac07ed8251
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-11-25 12:32:58 +01:00
Miriam Baglioni
5fd0e610bf
[DOIBOOST Process] fix filtering to filter results with non null id
2021-11-25 12:10:45 +01:00
Sandro La Bruzzo
feea154e89
remove working dir after test
2021-11-25 11:02:38 +01:00
Sandro La Bruzzo
028a8acad8
add test resources
2021-11-25 10:54:47 +01:00
Sandro La Bruzzo
2164a2a889
Datacite: Code Refactor generated a general SparkApplication Scala where all the spark scala have to inherit
...
Commented a little the Datacite transformation code
2021-11-25 10:54:13 +01:00
Miriam Baglioni
3f9b2ba8ce
[Hosted By Map] fix issue in test
2021-11-22 16:59:43 +01:00
Sandro La Bruzzo
a7cf277d98
Datacite: Removed HostedBy Patch as described on ticket #7219 , Now all the records will have hosted by Unknown Repository
2021-11-22 16:03:17 +01:00
Sandro La Bruzzo
483d3039d1
entity resolution: added distcp of missing entities in graph materialization
2021-11-22 15:55:24 +01:00
Sandro La Bruzzo
93fe8ce8b2
entity resolution: fix test
2021-11-22 15:50:43 +01:00
Sandro La Bruzzo
35e20b0647
updated resolution wf:
...
- generate a new version of the graph
- changed merge from union to join
2021-11-22 11:48:55 +01:00
Miriam Baglioni
fdb75b180e
[Cleaning] added couple of tests for DOIBOOST publications
2021-11-21 16:35:22 +01:00
Miriam Baglioni
0506fa2654
[Graph Dump] changed to mirror the changes in the model
2021-11-19 15:56:25 +01:00
Sandro La Bruzzo
6110a2b984
reverted version
2021-11-19 15:31:45 +01:00
Sandro La Bruzzo
65ebe1019b
updated wagon-ssh version
2021-11-19 14:59:04 +01:00
Sandro La Bruzzo
155d8bf83f
updated maven site plugin on dhp-code-style
2021-11-19 14:51:08 +01:00
Sandro La Bruzzo
3426451d3f
Merge remote-tracking branch 'origin/beta' into beta
2021-11-19 14:49:04 +01:00
Sandro La Bruzzo
75298ec442
added site.xml to code style
2021-11-19 14:48:44 +01:00
Sandro La Bruzzo
4542a2338b
updated site configuration to deploy on website
2021-11-19 13:44:08 +01:00
Claudio Atzori
90c2a4987e
Merge pull request '[fix] preserve parent/child relations from OpenOrgs' ( #164 ) from preserve_openorg_parent_child_relations into beta
...
Reviewed-on: #164
2021-11-19 11:35:55 +01:00
Claudio Atzori
e5a2c596b2
Merge branch 'beta' into preserve_openorg_parent_child_relations
2021-11-19 11:35:46 +01:00
Claudio Atzori
f4538f3c4c
cleanup
2021-11-19 11:33:10 +01:00
Claudio Atzori
2b46b87f56
fixed filtering criteria applied in SparkCopyRelationsNoOpenorgs to keep the parent/child relations from OpenOrgs
2021-11-19 11:30:29 +01:00
Miriam Baglioni
9fae872181
[Graph Dump] changed to mirror the changes in the model
2021-11-19 11:25:50 +01:00
Sandro La Bruzzo
fc03c99805
fixed javadocs url after deploying site
2021-11-19 10:46:33 +01:00
Sandro La Bruzzo
8a7c7d36db
Merge pull request 'mvn_site_documentation' ( #161 ) from mvn_site_documentation into beta
...
Reviewed-on: #161
2021-11-19 09:54:53 +01:00
Sandro La Bruzzo
0c0d561bc4
added public class into tests to create correct javadoc
2021-11-19 09:54:22 +01:00
Claudio Atzori
62fa61f3cf
merge from beta
2021-11-19 09:23:42 +01:00
Claudio Atzori
bd9a43cefd
Revert to 4094f2bb9a
2021-11-19 09:20:43 +01:00
Claudio Atzori
3a4d925386
Merge branch 'beta' into hierarchical_orgs_relations
2021-11-18 18:07:08 +01:00
Claudio Atzori
3974fa7dc1
Merge branch 'beta' into affiliationPropagation
2021-11-18 18:06:26 +01:00
Claudio Atzori
a24b9f8268
[dedup] trivial refactoring
2021-11-18 17:12:02 +01:00
Claudio Atzori
c0750fb17c
avoid non necessary count operations over large spark datasets
2021-11-18 17:11:31 +01:00
Claudio Atzori
bb5dca7979
cleanup
2021-11-18 17:10:46 +01:00
Miriam Baglioni
793b5a8e5f
Update 'dhp-workflows/dhp-graph-mapper/src/main/java/eu/dnetlib/dhp/oa/graph/dump/ResultMapper.java'
...
Removing the dump of Measure at the level of the result. We decided not to map it
2021-11-18 14:49:38 +01:00
Miriam Baglioni
5dc5792722
[Graph Dump] Change test resource to mirror the movement of the measure element
2021-11-18 14:39:12 +01:00
Miriam Baglioni
0136a8c266
[Graph Dump] Change test to mirror that measure is at the level of the instance
2021-11-18 14:38:33 +01:00
Miriam Baglioni
1b79c0ee79
merging with branch beta
2021-11-18 11:01:00 +01:00
Claudio Atzori
10a32f287f
Merge pull request '[stats wf] RIs, affiliations, parquet' ( #156 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #156
2021-11-17 15:02:25 +01:00
Antonis Lempesis
cb3adb90f4
Merge branch 'beta' into beta
2021-11-17 14:33:45 +01:00
Antonis Lempesis
c283406829
added Universidad Politécnica de Madrid
2021-11-17 15:33:00 +02:00
Claudio Atzori
e0395719d7
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-11-17 14:17:27 +01:00
Claudio Atzori
82a4e4efae
[cleaning wf] fixed methodology to rule out invalid result titles, based on https://support.openaire.eu/issues/7206
2021-11-17 14:17:22 +01:00
Miriam Baglioni
6d4a1c57ee
[Resolve Entities] Change test dataset to mirror the modification in the creation of the map between the pids and the unresolved
2021-11-17 12:41:52 +01:00
Sandro La Bruzzo
9c82d670b8
make class public in order to create javadoc
2021-11-17 12:31:02 +01:00
Sandro La Bruzzo
1f5ee116ed
code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala
...
fixed test
2021-11-17 12:23:52 +01:00
Sandro La Bruzzo
2fd9ceac13
code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala
2021-11-17 11:35:22 +01:00
Sandro La Bruzzo
60ae874dcb
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into mvn_site_documentation
2021-11-17 11:08:34 +01:00
Sandro La Bruzzo
2506d7a679
Merge branch 'mvn_site_documentation' of code-repo.d4science.org:D-Net/dnet-hadoop into mvn_site_documentation
2021-11-17 11:07:24 +01:00
Sandro La Bruzzo
cded363b55
code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala
2021-11-17 11:06:35 +01:00
Miriam Baglioni
4094f2bb9a
added integration md file
2021-11-17 10:04:52 +01:00
Miriam Baglioni
ec8b0219ff
[Documentation] Added first page for Integration via unresolved entities generation
2021-11-16 17:41:34 +01:00
Miriam Baglioni
2bbece2ca5
merging with branch beta
2021-11-16 16:35:40 +01:00
Sandro La Bruzzo
2d67020c59
added dhp-enrichment maven site template
2021-11-16 16:01:08 +01:00
Claudio Atzori
49f897ef29
[cleaning wf] fixed regex used to spot garbage in result titles; adjusted threshold for filtering titles
2021-11-16 15:24:23 +01:00
Miriam Baglioni
28ea532ece
[Affiliation Propagation] moved the selection of graph relation as a preparation step
2021-11-16 15:24:19 +01:00
Sandro La Bruzzo
18c1d70ef4
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into mvn_site_documentation
2021-11-16 15:16:49 +01:00
Sandro La Bruzzo
a1cafaf2e3
added mvn site for dnet-hadoop project
2021-11-16 15:16:28 +01:00
Miriam Baglioni
7c96e3fd46
removed not useful dir
2021-11-16 13:57:26 +01:00
Miriam Baglioni
c7c0c3187b
[AFFILIATION PROPAGATION] Applied some SonarLint suggestions
2021-11-16 13:56:32 +01:00
Miriam Baglioni
c6a9f0a1a8
merging with branch beta
2021-11-16 12:04:40 +01:00
Miriam Baglioni
99d86134f5
[Graph Dump] changed the dump since the measures have been moved at the level of the instance
2021-11-16 12:04:21 +01:00
Claudio Atzori
0a727d325d
[dedup] increased number of partitions in the consistency phase
2021-11-16 08:43:41 +01:00
Claudio Atzori
bafa2990f3
code formatting
2021-11-15 17:07:16 +01:00
Claudio Atzori
668ac25224
[graph resolution] using existing argument parser file name
2021-11-15 17:02:45 +01:00
Claudio Atzori
7d0a03f607
[graph resolution] minor
2021-11-15 14:45:54 +01:00
Claudio Atzori
941a50a2fc
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-11-15 14:42:49 +01:00
Claudio Atzori
7c804acda8
[graph resolution] minor
2021-11-15 14:42:43 +01:00
Sandro La Bruzzo
efa09057db
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-11-15 14:32:09 +01:00
Sandro La Bruzzo
48923e46a1
added documentation to Pubmed Class and also added mvn site for dhp-aggregations
2021-11-15 14:32:01 +01:00
Claudio Atzori
d2c787d416
[graph resolution] fixed sequence of the workflow steps
2021-11-15 14:31:15 +01:00
Claudio Atzori
975b10b711
[actionmanager] increased spark.sql.shuffle.partitions to 5000
2021-11-15 12:31:45 +01:00
Claudio Atzori
1ecceea788
Merge pull request 'Open Citations' ( #158 ) from openCitations into beta
...
Reviewed-on: #158
2021-11-15 10:59:19 +01:00
Miriam Baglioni
4ec88c718c
merge with beta - resolved conflict in pom
2021-11-15 10:52:16 +01:00
Miriam Baglioni
6f1a434e90
[Bypass Action Set] Fixed test to consider the new identifier utils
2021-11-15 09:59:23 +01:00
Miriam Baglioni
157d33ebf9
[Bypass Action Set] Refactoring
2021-11-15 09:58:48 +01:00
Claudio Atzori
7b81607035
Merge pull request 'PR: Bypass Action Set' ( #157 ) from bypass_acstionset into beta
...
Reviewed-on: #157
2021-11-12 12:01:05 +01:00
Miriam Baglioni
6595135a1a
[Dump Schemas] changed the schema of the dumped result according to the modifications in the bestAccessRight type
2021-11-12 11:45:38 +01:00
Miriam Baglioni
43cae4ad88
Merge branch 'dump' of https://code-repo.d4science.org/D-Net/dnet-hadoop into dump
2021-11-12 11:36:54 +01:00
Miriam Baglioni
b3f9370125
merge with beta - resolved conflict in pom
2021-11-12 11:25:26 +01:00
Miriam Baglioni
92d0e18b55
[Bypass Action Set] used constant DOI instead of "doi"
2021-11-12 10:56:58 +01:00
Miriam Baglioni
881113743f
[Bypass Action Set] refactoring
2021-11-12 10:55:50 +01:00
Miriam Baglioni
47ccb53c4f
[Bypass Action Set] modification for comment #157 (comment)
2021-11-12 10:54:09 +01:00
Miriam Baglioni
ffb0ce1d59
merge with beta - resolved conflict in pom
2021-11-12 10:19:59 +01:00
Miriam Baglioni
716021546e
[Bypass Action Set] minor fix
2021-11-12 10:18:01 +01:00
Claudio Atzori
1f2a3d1af0
depending on dhp-schemas:2.8.22 (release)
2021-11-12 10:15:11 +01:00
Sandro La Bruzzo
3469cc2b1d
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-11-12 09:56:52 +01:00
Sandro La Bruzzo
a7763d2492
removed alternate identifier in resolutionMap
2021-11-12 09:56:45 +01:00
Miriam Baglioni
b8bdabfae9
[Graph Dump] removed OpenAccessRoute from test in best access right
2021-11-11 16:16:48 +01:00
Miriam Baglioni
e5498052e8
[Graph Dump] removed OpenAccessRoute from test in best access right
2021-11-11 16:14:10 +01:00
Miriam Baglioni
935062edec
[Bypass Action Set] creation of unresolved entities
2021-11-11 16:11:25 +01:00
Antonis Lempesis
26f086dd64
removed the too restrictive clause. will discuss again
2021-11-11 12:57:19 +02:00
Claudio Atzori
8bdca3413f
Merge pull request 'DOIBoost Mapping: change the creation of the instance in the DOIBoost result' ( #155 ) from doiboost_url into beta
...
Reviewed-on: #155
2021-11-11 10:40:32 +01:00
Claudio Atzori
148289150f
Merge branch 'beta' into doiboost_url
2021-11-11 10:40:19 +01:00
Sandro La Bruzzo
2ca0a436ad
added SparkResolveEntities node to the oozie wf
2021-11-11 10:25:42 +01:00
Sandro La Bruzzo
9cb195314f
implemented and tested resolution of entities
2021-11-11 10:17:40 +01:00
Miriam Baglioni
6d3c4c4abe
merging with branch beta
2021-11-11 08:59:53 +01:00
Miriam Baglioni
8cc50ecee0
[Graph Dump] changed AccessRight with BestAccessRight in the dump and modified the dependency to the schema to the SNAPSHOT
2021-11-11 08:59:20 +01:00
Miriam Baglioni
88b73f4f49
merging with branch beta
2021-11-10 17:00:52 +01:00
Miriam Baglioni
c371b23077
-
2021-11-10 17:00:37 +01:00
Alessia Bardi
fc8fceaac3
create direct link to WT projects as well
2021-11-10 14:11:52 +01:00
Alessia Bardi
6cd91004e3
fixed DOI for Wellcome Trust in mapping relationships from Crossref
2021-11-09 12:22:57 +01:00
Miriam Baglioni
9e214ce0eb
[BypassAS] addition of OC relations
2021-11-09 12:07:19 +01:00
Alessia Bardi
b9d4f115cc
fixed Crossref mapping for SFI projects
2021-11-09 12:04:45 +01:00
Sandro La Bruzzo
6477a40670
implement filter of openCitation
2021-11-09 11:27:12 +01:00
Miriam Baglioni
6f7ca539c6
[BypassAS] update of results for bipFinder and FOS
2021-11-09 11:25:41 +01:00
Miriam Baglioni
a7d50c499b
[BypassAS] prepare FOS subject, test and model for FOS and BipFinder scores
2021-11-08 16:44:19 +01:00
Antonis Lempesis
91354c6068
- fetching all context related results
...
- storing tables as parquet
2021-11-08 15:15:46 +02:00
Miriam Baglioni
94918a673c
[Graph DUMP] Fix issue for empty originalId list
2021-11-08 10:25:28 +01:00
Claudio Atzori
9cb8e4ad21
Merge branch 'beta' into hierarchical_orgs_relations
2021-11-08 09:40:24 +01:00
Miriam Baglioni
4c70201412
merging with branch beta
2021-11-05 12:29:56 +01:00
Miriam Baglioni
8442efd8d1
[Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE
2021-11-05 12:29:22 +01:00
Claudio Atzori
5681e89544
Update 'dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump/schemas/result_schema.json'
2021-11-05 12:18:24 +01:00
Miriam Baglioni
a22c29fba1
[Graph DUMP] Filtering out from the originalIds the id of the result in OpenAIRE
2021-11-05 12:08:33 +01:00
Miriam Baglioni
c10ff6928c
[Graph DUMP] add schema of the dump related to the model as in dhp-schemas.2.8.31. Note the measure element at the level of the result has been removed because of issues on where to display it: at the level of the result or at the level of the entity
2021-11-05 11:36:21 +01:00
Miriam Baglioni
0857849a86
[Graph DUMP] Remove dump of measure until it will be clear where to put it (at the level of result or at the level of the instance)
2021-11-05 11:02:37 +01:00
Miriam Baglioni
df7ee77c7a
[DOIBoost Mapping] removed not needed comments
2021-11-04 16:24:07 +01:00
Miriam Baglioni
de63d29b6f
[DOIBoost Mapping] Fix to avoid to produce results with null as identifier (probably due to the filtering function in the factory for the creation of the id)
2021-11-04 16:16:40 +01:00
miconis
8f1db32921
implementation of the instance type comparator and its tests
2021-11-04 15:20:57 +01:00
Miriam Baglioni
d50057b2d9
[DOIBoost Mapping] changed the way to create the url for the instance: we use the Crossref guidelines https://doi.org/doi
2021-11-03 16:59:37 +01:00
Miriam Baglioni
edf55395e9
added test resource
2021-11-03 16:49:30 +01:00
Miriam Baglioni
d97ea82a29
[DOIBoost Mapping] Added test to verify the instance created for Crossref will have just the url related to the doi
2021-11-03 16:45:15 +01:00
Miriam Baglioni
96769b4481
[DOIBoost - Mapping] Changed the logic which brought into the instance urls that should not be there: the url of the doi in the json is reachable from the root (json/"URL"), while other urls were added from the links element. Now the mapping from the link element has been removed
2021-11-03 16:43:36 +01:00
Miriam Baglioni
683fe093cf
[DOIBoost - Mapping] Remove the addition of the instance to the MAG publication record
2021-11-03 15:51:26 +01:00
Miriam Baglioni
b2bb8d9d79
[DOIBoost - Mapping] selecting the url from Crossref containing the doi
2021-11-03 15:44:57 +01:00
Miriam Baglioni
779318961c
[DOIBoost - Mapping] removed the url from crossref containing the api.elsevier.com... string in the url
2021-11-03 14:38:52 +01:00
Miriam Baglioni
2480e590d1
[DOIBoost - Mapping] changed the type on which to map dissertation from Crossref: from 006 Doctoral thesis to 0044 Thesis since dissertation could be either Doctoral or master thesis
2021-11-03 14:25:23 +01:00
Miriam Baglioni
b9d124bb7c
[Enrichment: Propagation through parent-child relationships] Added counters, and changed constraint to verify if filtering out the relation (from classname = harvested to classid != propagation)
2021-11-03 13:55:37 +01:00
Sandro La Bruzzo
7bd224f051
implement first version of scholexplorer integration for the generation of final graph
2021-11-02 15:58:15 +01:00
Antonis Lempesis
b97b78f874
removed hardcoded reference
2021-11-02 09:12:49 +01:00
Claudio Atzori
7fa49f6956
Merge pull request 'removed hardcoded reference' ( #154 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #154
2021-11-02 09:11:30 +01:00
Antonis Lempesis
f78afb5ef9
removed hardcoded reference
2021-11-01 15:42:29 +02:00
Miriam Baglioni
2aca6bfa0a
merging with branch beta
2021-10-29 11:20:45 +02:00
Miriam Baglioni
09f36cffb8
[Enrichment: Propagation through parent-child relationships] First implementation, testing, and wf for propagation of result to organization through semantic relation
2021-10-29 11:20:03 +02:00
Claudio Atzori
1225ba0b92
[resolution] increasing number of partitions to avoid OOM
2021-10-28 16:18:17 +02:00
Sandro La Bruzzo
d9cbca83f7
moved filter on next phase
2021-10-28 16:13:24 +02:00
Claudio Atzori
d02caef185
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-27 15:36:29 +02:00
Sandro La Bruzzo
1be9aa0a5f
Removed filter of datacite items from the raw graph merging phase, Datacite is not an actionset anymore in beta
2021-10-26 17:52:20 +02:00
Sandro La Bruzzo
4acfa8fa2e
Scholexplorer Datasource Aggregation:
...
- Added collectedfrom in the inverse relation generated
Relation resolution:
- increased number of partitions in workflow.xml
- using classid instead of classname to build the pid-dnetId mapping
2021-10-26 17:51:20 +02:00
Miriam Baglioni
d0ef7d91c5
adding test resource
2021-10-26 17:34:11 +02:00
Sandro La Bruzzo
aafdffa6b3
resolved conflict
2021-10-26 09:45:46 +02:00
Sandro La Bruzzo
034304b33a
conflict resolved on merge
2021-10-26 09:40:47 +02:00
Michele Artini
d66e20e7ac
added hierarchy rel in ROR actionset
2021-10-21 15:51:48 +02:00
Claudio Atzori
6b34ba737e
minor
2021-10-21 14:16:18 +02:00
Claudio Atzori
d147295c2f
avoiding java.io.NotSerializableException: java.util.HashMap
2021-10-21 14:15:57 +02:00
Claudio Atzori
3702fe478d
cleanup
2021-10-21 12:05:02 +02:00
Sandro La Bruzzo
ac36aa7d1c
fixed wrong Encoding during a map phase
2021-10-21 11:35:02 +02:00
Sandro La Bruzzo
aeeebd573b
code refactor renamed datacite package
2021-10-20 17:37:42 +02:00
Sandro La Bruzzo
ab3a99d3e9
removed old datacite oozie workflow
2021-10-20 17:19:47 +02:00
Sandro La Bruzzo
ae4e99a471
Adapted workflow of resolution of PID to work into OpenAIRE data workflow
...
- Added relations in both verse on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Claudio Atzori
cece432adc
[stats] reducing the step22 wait time
2021-10-20 14:16:33 +02:00
Antonis Lempesis
a7376907c2
invalidating metadata before context thingies
2021-10-20 14:16:25 +02:00
Antonis Lempesis
43f4eb492b
fetching affiliated results for 4 orgs in monitor. fixed affiliated orgs in stats db
2021-10-20 14:16:11 +02:00
Claudio Atzori
4f8970f8ed
[stats] reducing the step22 wait time
2021-10-20 14:14:53 +02:00
Claudio Atzori
00b78b9c58
cleanup: mapping contents in the graph already defined in the OAF graph model doesn't require to be aware of the vocabularies
2021-10-20 14:04:45 +02:00
Claudio Atzori
c01dd0c925
registered oaf model classes for the KryoSerializer
2021-10-20 13:55:07 +02:00
Miriam Baglioni
652114c641
[affiliationPropagation] first try. preparation
2021-10-20 11:44:23 +02:00
Claudio Atzori
d0cf2963f0
Merge pull request 'hierarchical_orgs_relations' ( #150 ) from hierarchical_orgs_relations into beta
...
Reviewed-on: #150
2021-10-20 10:13:47 +02:00
Claudio Atzori
59f76b50d4
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-20 09:42:35 +02:00
Claudio Atzori
bc3372093e
Merge pull request '[stats] affiliations in stats and monitor dbs' ( #152 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #152
2021-10-20 09:40:34 +02:00
Antonis Lempesis
241dcf6df1
Merge branch 'beta' into beta
2021-10-19 23:54:21 +02:00
Claudio Atzori
515e068a78
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-19 16:46:06 +02:00
Claudio Atzori
512e7b0170
code formatting
2021-10-19 16:19:29 +02:00
Michele Artini
c4fce785ab
fixed a compilation problem of a unit test
2021-10-19 16:18:26 +02:00
Claudio Atzori
d517c71458
Merge branch 'dump' into beta
2021-10-19 16:15:42 +02:00
Claudio Atzori
e9157c67aa
Merge branch 'beta' into dump
2021-10-19 16:15:03 +02:00
Claudio Atzori
98f37c8d81
WIP: workflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 16:14:40 +02:00
Claudio Atzori
c8850456e9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-19 16:09:54 +02:00
Claudio Atzori
172363e7f1
[broker] integrating PR#147, notification record creation phase separated from indexing on ES
2021-10-19 15:56:27 +02:00
Claudio Atzori
bdffa86c2f
undo last commit
2021-10-19 15:39:38 +02:00
Sandro La Bruzzo
c9870c5122
code formatted
2021-10-19 15:24:59 +02:00
Sandro La Bruzzo
f8329bc110
since dhp-schemas changed, introducing new Relation inverse model, this class has been updated
2021-10-19 15:24:22 +02:00
Claudio Atzori
e471f12d5e
hotfix: recovered implementation removing the hardcoded working_dirs
2021-10-19 12:35:38 +02:00
Claudio Atzori
7a73010acd
WIP: workflow nodes for including Scholexplorer records in the RAW graph
2021-10-19 11:59:16 +02:00
Miriam Baglioni
c7f6cd2591
added again the setting for saXReader
2021-10-19 10:15:26 +02:00
Sandro La Bruzzo
a894d7adf3
updated version of dhp-schemas
2021-10-19 10:02:55 +02:00
miconis
5f780a6ba1
bug fix in migrate entities: parameter name was wrong
2021-10-18 23:30:40 +02:00
Miriam Baglioni
1315952702
merge with branch beta
2021-10-18 14:17:09 +02:00
Miriam Baglioni
1cc09adfaa
Opencitations: changed the test class to mirror the creation or not of duplicate dois for .refs oc original plus added optional parameter to duplicate the relation
2021-10-18 14:11:27 +02:00
Miriam Baglioni
76d41602be
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-18 10:53:22 +02:00
Miriam Baglioni
46f82c7c8f
removed not needed folder deletion
2021-10-18 10:53:16 +02:00
Sandro La Bruzzo
7b15b88d4c
renamed wrong package, implemented last aggregation workflow for scholexplorer
2021-10-15 15:00:15 +02:00
Antonis Lempesis
41ecb1eb61
invalidating metadata before context thingies
2021-10-15 13:42:55 +03:00
Antonis Lempesis
4b7c8dff2d
fetching affiliated results for 4 orgs in monitor. fixed affiliated orgs in stats db
2021-10-14 18:53:35 +03:00
Claudio Atzori
e15a1969a5
applying fix on the DOIBoost construction process that somehow wasn't part of the merge done in 83c90c7180
2021-10-14 14:33:56 +02:00
Sandro La Bruzzo
51a03c0a50
refactor code for EBI from dhp-graph-mapper into dhp-aggregation
2021-10-14 14:23:13 +02:00
Claudio Atzori
dd568ec88b
Merge pull request 'Refactoring Solr Configuration' ( #148 ) from beta_solr_config into beta
...
Reviewed-on: #148
2021-10-14 12:45:11 +02:00
Claudio Atzori
14fbf92ad6
Merge branch 'beta' into beta_solr_config
2021-10-14 11:08:44 +02:00
Miriam Baglioni
4b1920f008
changed the working path parameter value as dependant from the dnet-workflow working dir parameter
2021-10-14 09:18:09 +02:00
Miriam Baglioni
8db39c86e2
added new parameter in the doiboost process workflow to specify a folder for the process of MAG dataset
2021-10-14 09:17:39 +02:00
Claudio Atzori
b292e4a700
[stats wf] added extra logging in the context data retrieval phase
2021-10-13 17:31:53 +02:00
miconis
995c1eddaf
minor change
2021-10-13 17:07:10 +02:00
Miriam Baglioni
5d9cc2452d
changed the working path parameter value as dependant from the dnet-workflow working dir parameter
2021-10-13 15:33:50 +02:00
miconis
326bf63775
integration of parent child orgs relations
2021-10-13 12:24:48 +02:00
Miriam Baglioni
16b28494a9
added new parameter in the doiboost process workflow to specify a folder for the process of MAG dataset
2021-10-13 11:34:24 +02:00
Miriam Baglioni
63933808d4
added fix for mixing result types, added configuration default to funder subworkflow
2021-10-13 11:28:28 +02:00
Sandro La Bruzzo
f2c8356ccf
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-10-12 12:36:40 +02:00
Sandro La Bruzzo
7387416e90
added params skip update to direct transform in OAF, this should be set to true in production
2021-10-12 12:36:30 +02:00
Claudio Atzori
914b3e92cb
updating graph schema module dependency to version 2.8.20 to include organization parent/child relation constants
2021-10-12 12:00:45 +02:00
Sandro La Bruzzo
511da98d0c
- fixed bug on download pmc Article
...
- removed unused line of code in SparkCreateActionset
2021-10-12 11:47:49 +02:00
Miriam Baglioni
fec40bdd95
merging with branch beta - resolved conflicts
2021-10-12 09:16:36 +02:00
Miriam Baglioni
83f51f1812
refactoring
2021-10-12 09:14:43 +02:00
Sandro La Bruzzo
5606014b17
code refactor see ticket #7065
2021-10-12 08:11:53 +02:00
Claudio Atzori
2f61054cd1
code formatting
2021-10-11 18:29:42 +02:00
Claudio Atzori
83c90c7180
manually merging PR#149
2021-10-11 18:27:05 +02:00
Serafeim Chatzopoulos
201ce71cc1
Add resultsubject, relprojectname and resultacceptanceyear to __all field
2021-10-11 13:16:39 +03:00
Serafeim Chatzopoulos
e468a7b96b
Add tests to query Solr with different configurations
2021-10-08 16:58:51 +03:00
Serafeim Chatzopoulos
de81007302
Add exploreTestConfig, a new Solr configuration folder
2021-10-08 16:54:56 +03:00
Sandro La Bruzzo
8f99d2af86
Make the node of doiBoost to point to the correct OpenAire Organization in relations
2021-10-08 08:35:12 +02:00
Alessia Bardi
c48c43fa9e
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-10-07 17:30:53 +02:00
Alessia Bardi
8d3b60f446
test for patching records for EOSC Future
2021-10-07 17:30:45 +02:00
miconis
611ca511db
set configuration property in openorgs duplicates wf
2021-10-07 15:39:55 +02:00
miconis
9646b9fd98
implementation of the http call for the update of openorgs suggestions
2021-10-07 11:29:11 +02:00
Sandro La Bruzzo
2557bb41f5
Implemented new method for update baseline inside scala node
2021-10-06 16:41:08 +02:00
Sandro La Bruzzo
b84e0cabeb
Implemented new method for update baseline
2021-10-05 16:34:47 +02:00
Michele Artini
d6e1f22408
max numbers of workers for indexing
2021-10-05 15:09:18 +02:00
Michele Artini
210d6c0e6d
generateNotificationsJob and indexNotificationsJob
2021-10-05 13:57:46 +02:00
Michele Artini
69008e20c2
log and tests
2021-10-05 11:58:20 +02:00
Sandro La Bruzzo
f258bbb927
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-10-05 10:21:50 +02:00
Sandro La Bruzzo
991b06bd0b
removed generation of EBI links from old dump, now EBI link dump is created by another wf
2021-10-05 10:21:33 +02:00
Claudio Atzori
cb7efe12ac
Merge pull request 'beta' ( #146 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #146
2021-10-05 10:09:37 +02:00
Michele Artini
8bbaa17335
reimplemented conditions cache as a non static variable
2021-10-05 09:20:37 +02:00
Miriam Baglioni
e653756e3d
applied some suggestions from Sonar Lint
2021-10-04 18:40:07 +02:00
Michele Artini
0a9ef34b56
test
2021-10-04 15:46:12 +02:00
Michele Artini
31a6ad1d79
optimization of verifySubsriptions()
2021-10-04 12:01:56 +02:00
dimitrispie
3f25d2efb2
Merge branch 'beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into beta
2021-10-01 16:03:48 +03:00
dimitrispie
13687fd887
Sprint 3 indicators update
2021-10-01 16:02:02 +03:00
Miriam Baglioni
9814c3e700
merging with branch beta
2021-10-01 13:00:03 +02:00
Miriam Baglioni
c4ccd7b32c
-
2021-10-01 12:59:47 +02:00
Miriam Baglioni
c8321ad31a
merge with branch beta
2021-10-01 12:59:08 +02:00
Claudio Atzori
b01cd521b0
removed configuration specifying the limit to 8 for spark.dynamicAllocation.maxExecutors
2021-10-01 11:26:33 +02:00
Claudio Atzori
ec94cc9b93
IndexNotificationsJob test: persist contents on HDFS instead of passing them to ES
2021-10-01 09:41:27 +02:00
Claudio Atzori
60a6a9a583
[graph2hive] added field 'measures' to the result view
2021-09-30 09:27:26 +02:00
Sandro La Bruzzo
66702b1973
Added node to update datacite
2021-09-28 08:59:06 +02:00
Sandro La Bruzzo
477cb10715
Merge remote-tracking branch 'origin/beta' into beta
2021-09-27 16:57:23 +02:00
Sandro La Bruzzo
be79d74e3d
Fixed DoiBoost generation to point to correct organization in affiliation relation
2021-09-27 16:57:04 +02:00
Claudio Atzori
35619b93ee
Merge pull request 'implementation of the whitelist for similarity relations' ( #144 ) from dedup_whitelist into beta
...
Reviewed-on: #144
2021-09-27 16:47:40 +02:00
Claudio Atzori
474117c2e8
Merge branch 'beta' into dedup_whitelist
2021-09-27 16:41:25 +02:00
Miriam Baglioni
476a4708d6
merging with branch beta
2021-09-27 16:02:32 +02:00
Miriam Baglioni
5ec69889db
OpenCitations: creation of AS from OC
2021-09-27 16:02:06 +02:00
Claudio Atzori
a53acfbc06
Merge pull request '[stats] updates in the mapping, indicators, wf' ( #145 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #145
2021-09-27 15:59:54 +02:00
Alessia Bardi
b924276e18
tests to generate records for the EOSC-Future demo with the EOSC Jupyter Notebook subject
2021-09-24 17:11:56 +02:00
Antonis Lempesis
a1e1cf32d7
fixed an impala error
2021-09-24 12:57:24 +03:00
Antonis Lempesis
f358cabb2b
fixed typo
2021-09-22 21:50:37 +03:00
Miriam Baglioni
eedf7c3310
merging with branch beta
2021-09-22 15:18:34 +02:00
Miriam Baglioni
f2118d771a
first steps in the implementation of the integration of opencitations
2021-09-22 15:18:05 +02:00
Claudio Atzori
df15a4dc9f
Merge pull request 'UnknownHostException handling for orcid collector api' ( #141 ) from enrico.ottonello/dnet-hadoop:beta into beta
...
Reviewed-on: #141
2021-09-22 11:51:13 +02:00
Claudio Atzori
7fa60e166e
Merge branch 'beta' into dedup_whitelist
2021-09-22 11:31:18 +02:00
Antonis Lempesis
421d55265d
created hive action for observatory queries
2021-09-21 03:07:58 +03:00
Enrico Ottonello
92a63f78fe
multiple download attempts handling if a connection to orcid server fails
2021-09-20 18:25:00 +02:00
Enrico Ottonello
0c74f5667e
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-09-20 18:12:31 +02:00
miconis
853333bdde
implementation of the whitelist for similarity relations
2021-09-20 16:21:47 +02:00
Antonis Lempesis
8b681dcf1b
attempt to make the observatory wf run in hive
2021-09-18 00:35:14 +03:00
Claudio Atzori
71cfa386bc
Merge pull request 'cleaning for relation fields' ( #142 ) from clean_relations into beta
...
Reviewed-on: #142
2021-09-17 16:01:03 +02:00
Antonis Lempesis
2943287d10
fixed the definition of cc_licence, part II
2021-09-16 15:59:06 +03:00
Antonis Lempesis
dd2329849f
fixed the definition of cc_licence
2021-09-16 13:50:34 +03:00
Claudio Atzori
09c2eb7f62
Merge branch 'beta' into clean_relations
2021-09-16 11:09:47 +02:00
Claudio Atzori
954a16c213
Merge pull request 'Propagation relations not Cleaned' ( #143 ) from enrichment into beta
...
Reviewed-on: #143
2021-09-15 19:14:38 +02:00
Miriam Baglioni
e9ccdf853f
related to #132
2021-09-15 18:44:54 +02:00
Claudio Atzori
12766bf5f2
Merge branch 'beta' into clean_relations
2021-09-15 17:18:15 +02:00
Claudio Atzori
663b1556d7
manually integrating PR#140 #140
2021-09-15 16:40:25 +02:00
Claudio Atzori
ebf53a1616
added cleaning for relation fields: subRelType & relClass according to dedicated vocabs
2021-09-15 16:10:37 +02:00
Enrico Ottonello
8b804e7fe1
removed unused imports
2021-09-14 17:30:52 +02:00
Enrico Ottonello
aefa36c54b
other task executions go ahead if UnknownHostException happens on a single task
2021-09-14 17:26:15 +02:00
Antonis Lempesis
de9bf3a161
added cc_licences and abstracts in observatory db
2021-09-14 01:29:08 +03:00
Antonis Lempesis
9b1936701c
fixed yet another typo
2021-09-13 21:07:44 +03:00
miconis
fbb1b66bfb
dedup test implementation & graph drawing tools
2021-09-13 14:53:19 +02:00
Antonis Lempesis
8fc89ae822
moved context table creation before indicators
2021-09-13 14:33:23 +03:00
Antonis Lempesis
461bf90ca6
fixed the gold_oa definition
2021-09-13 11:10:30 +03:00
Antonis Lempesis
43852bac0e
creating other::other concept for all contexts
2021-09-13 01:36:41 +03:00
Antonis Lempesis
f13cca7e83
moved dependencies of indicators before them...
2021-09-08 23:07:58 +03:00
Antonis Lempesis
c6ada217a1
fixed typo
2021-09-08 22:34:59 +03:00
Antonis Lempesis
1250ae197f
using new indicators for the definition of peerreviewed, gold, and green
2021-09-08 14:08:43 +03:00
Antonis Lempesis
ccee451dde
added indicators of sprint 2 in monitor db
2021-09-07 23:17:13 +03:00
Sandro La Bruzzo
aed29156c7
changed behavior in transformation job, that doesn't fail at first error
2021-09-07 19:05:46 +02:00
Sandro La Bruzzo
370dddb2fa
fix bug on oai iterator that skip record cleaned
2021-09-07 11:20:41 +02:00
Sandro La Bruzzo
3c6fc2096c
fix bug on oai iterator that skip record cleaned
2021-09-07 10:46:26 +02:00
Sandro La Bruzzo
d4dadf6d77
reduced max number of PID in Relatedentity
2021-09-02 14:21:24 +02:00
Sandro La Bruzzo
9f8a80deb7
fixed wrong import of unresolved relation in openaire
2021-09-01 14:16:27 +02:00
Alessia Bardi
3762b17f7b
added VERSION and PART relationships and re-ordered according to my personal and obviously possibly biased ordering
2021-08-31 20:20:05 +02:00
Sandro La Bruzzo
e8b3cb9147
Implemented method to download delta updates in EBI Links
2021-08-30 09:32:45 +02:00
Alessia Bardi
ccf4103a25
keep the original url if the decoder fails for any reason
2021-08-25 10:07:58 +02:00
Sandro La Bruzzo
45898c71ac
fixed wrong doi in pubmed
2021-08-24 15:20:04 +02:00
Alessia Bardi
00a28c0080
originalId was renamed to acronym
2021-08-23 15:02:21 +02:00
Alessia Bardi
f19b04d41b
code formatting after mvn compile
2021-08-23 14:33:39 +02:00
Alessia Bardi
412d2cb16a
added dependencies to classgraph and opencsv. Bumped version of dhp-schemas
2021-08-23 14:32:00 +02:00
Alessia Bardi
3bcac7e88c
Merge pull request 'towards EOSC datasource profiles' ( #130 ) from datasource_model_eosc_beta into beta
...
Reviewed-on: #130
2021-08-23 11:58:34 +02:00
Alessia Bardi
931f430129
Merge branch 'beta' into datasource_model_eosc_beta
2021-08-23 11:57:21 +02:00
Alessia Bardi
4c1474e693
Dealing with #6859#note-2: we have to decode URLs to avoid & and other chars encoded because of the original XML representation of data
2021-08-20 17:03:30 +02:00
Miriam Baglioni
5f8ccbc365
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-08-20 11:13:47 +02:00
Miriam Baglioni
882abb40e4
CrossrefDump -
2021-08-20 11:12:53 +02:00
Miriam Baglioni
45c62609af
CrossrefDump - modified because parameter file was moved
2021-08-20 11:12:31 +02:00
Miriam Baglioni
35880c0e7b
CrossrefDump - changed the wf to be able to resume from one of the steps
2021-08-20 11:11:35 +02:00
Miriam Baglioni
f3b6c392c1
CrossrefDump - moving parameter file under folder crossref_dump_reader
2021-08-20 11:10:58 +02:00
Miriam Baglioni
65822400ce
CrossrefDump - added new parameter file that was missing
2021-08-20 11:10:35 +02:00
Alessia Bardi
a053e1513c
different funders in blacklist from BETA and PROD aggregator
2021-08-19 11:32:27 +02:00
Alessia Bardi
812bd54c57
different funders in blacklist from BETA and PROD aggregator
2021-08-19 11:30:14 +02:00
Miriam Baglioni
a65d3caaea
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-08-19 10:29:10 +02:00
Miriam Baglioni
e5cf11d088
change open access route to result matching hbm to gold
2021-08-19 10:29:04 +02:00
Claudio Atzori
7c0c67bdd6
added mock pom
2021-08-13 17:45:53 +02:00
Claudio Atzori
82086f3422
fixed directory name
2021-08-13 17:42:14 +02:00
Claudio Atzori
bc7068106c
added crossref download oozie workflow
2021-08-13 17:19:44 +02:00
Claudio Atzori
2c0a05f11a
manually merged PR#139
2021-08-13 17:15:53 +02:00
Claudio Atzori
d43667d857
Merge pull request 'Automatic download of Crossref' ( #138 ) from crossref_dw_wf into beta
...
Reviewed-on: #138
2021-08-13 17:10:10 +02:00
Miriam Baglioni
5856ca8a7b
merging with branch beta - resolved conflicts
2021-08-13 16:45:45 +02:00
Miriam Baglioni
6fec71e8d2
removed the specific of the infra we are running the wf from the wf name
2021-08-13 16:39:02 +02:00
Miriam Baglioni
ed7e28490a
change in sh
2021-08-13 16:19:01 +02:00
Claudio Atzori
7743d0f919
consolidated dnet wf profiles into the same submodule
2021-08-13 16:14:54 +02:00
Miriam Baglioni
6eb7508995
merging with branch beta
2021-08-13 16:07:04 +02:00
Claudio Atzori
f74adc4752
added DownloadCSV2 as alternative implementation of the same download procedure
2021-08-13 15:52:15 +02:00
Claudio Atzori
5f0903d50d
fixed CSV downloader & tests
2021-08-13 14:17:54 +02:00
Claudio Atzori
17cefe6a97
[HBM] removed stale replace option
2021-08-13 12:43:59 +02:00
Claudio Atzori
7ee2757fcd
fixed DownloadCSV parameters spec; workflow patching the hostedby replaces the graph content (publication, datasource) rather than creating a copy
2021-08-13 12:41:01 +02:00
Claudio Atzori
c3ad4ab701
minor fixes
2021-08-13 12:23:15 +02:00
Claudio Atzori
baed5e3337
test classes moved in specific components
2021-08-13 12:14:47 +02:00
Claudio Atzori
3359f73fcf
cleanup & best practices
2021-08-13 12:00:42 +02:00
Claudio Atzori
4e6575a428
Merge pull request 'Moving Download CSV' ( #137 ) from refactoring_download_csv into beta
...
Reviewed-on: #137
2021-08-13 10:41:01 +02:00
Miriam Baglioni
f4ec81c92c
merging with branch beta
2021-08-13 10:31:35 +02:00
Miriam Baglioni
dc8b05b39e
Hosted By Map - changed the association with the datasource id for the hostedby element: there is no longer any need to compute it. With the new HBM it is already the id in the graph
2021-08-13 10:18:25 +02:00
Miriam Baglioni
32fd75691f
refactoring
2021-08-13 10:15:42 +02:00
Miriam Baglioni
dfd1e53c69
added external dependency for version
2021-08-13 10:15:12 +02:00
Miriam Baglioni
01db1f8bc4
GetCSV refactoring - removed not needed import
2021-08-13 10:14:17 +02:00
Miriam Baglioni
964a46ca21
GetCSV refactoring - modified due to movement of classes
2021-08-13 10:11:18 +02:00
Miriam Baglioni
eaf077fc34
GetCSV refactoring - removed not needed dependency
2021-08-13 10:08:58 +02:00
Miriam Baglioni
5f674efb0c
moved dependency version in external pom
2021-08-13 10:07:53 +02:00
Miriam Baglioni
5cd5714530
GetCSV refactoring - added ignore annotation for fields not in input csv
2021-08-13 10:06:49 +02:00
Miriam Baglioni
58f241f4a2
GetCSV refactoring - changed due to change of input resource
2021-08-13 10:04:44 +02:00
Miriam Baglioni
f3d575f749
GetCSV refactoring - changed due to changes in input resource
2021-08-13 10:03:57 +02:00
Miriam Baglioni
a5f6edfa6c
GetCSV refactoring - changed to mirror the original model class
2021-08-13 09:30:03 +02:00
Miriam Baglioni
ed183d878e
GetCSV refactoring - modified test classes due to change in the model of projects and programme
2021-08-13 09:28:51 +02:00
Miriam Baglioni
8769dd8eef
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:20:56 +02:00
Miriam Baglioni
6b9e1bf2e3
GetCSV refactoring - removing not needed dependency
2021-08-12 18:17:50 +02:00
Miriam Baglioni
d57b2bb927
GetCSV refactoring - removing not needed dependency
2021-08-12 18:12:51 +02:00
Miriam Baglioni
9da74b544a
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:12:15 +02:00
Miriam Baglioni
ab8abd61bb
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:11:07 +02:00
Miriam Baglioni
335a824e34
GetCSV refactoring - fixed issue
2021-08-12 18:10:10 +02:00
Miriam Baglioni
f0845e9865
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:04:58 +02:00
Miriam Baglioni
7a789423aa
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:04:27 +02:00
Miriam Baglioni
e9fc3ef3bc
GetCSV refactoring - changed to use the new class to get and write the csv file
2021-08-12 18:03:41 +02:00
Miriam Baglioni
4317211a2b
GetCSV refactoring - refactoring due to movement
2021-08-12 18:03:14 +02:00
Miriam Baglioni
b62cd656a7
GetCSV refactoring - changed the model to store only the information needed
2021-08-12 18:01:10 +02:00
Miriam Baglioni
d36e925277
GetCSV refactoring - moved under model package
2021-08-12 18:00:21 +02:00
Miriam Baglioni
7402daf51a
GetCSV refactoring - added dependency to open-csv lib
2021-08-12 17:59:19 +02:00
Miriam Baglioni
733bcaecf6
GetCSV refactoring - added test class (all the tests are disabled since they refer to remote resource)
2021-08-12 17:58:52 +02:00
Miriam Baglioni
bfe8f5335c
GetCSV refactoring - copied model classes in test path
2021-08-12 17:58:14 +02:00
Miriam Baglioni
6e84b3951f
GetCSV refactoring - moving classes to dhp-common that have dependency with GetCSV class (that was located in graph-mapper)
2021-08-12 17:57:41 +02:00
Claudio Atzori
e91ffcd2f3
Merge pull request 'hostedbymap' ( #136 ) from hostedbymap into beta
...
Reviewed-on: #136
2021-08-12 17:10:55 +02:00
Claudio Atzori
9587d4aee8
Merge branch 'beta' into hostedbymap
2021-08-12 17:04:30 +02:00
Claudio Atzori
86d940044c
added test to verify bad records from FWF-E-Book-Library
2021-08-12 11:32:56 +02:00
Claudio Atzori
8cdce59e0e
[graph raw] let the mapping exceptions propagate
2021-08-12 11:32:26 +02:00
Miriam Baglioni
08dd2b2102
moving the dependency version to the external pom file
2021-08-11 18:09:41 +02:00
Miriam Baglioni
ac417ca798
removed not needed test resource
2021-08-11 17:50:33 +02:00
Miriam Baglioni
e33daaeee8
reverting
2021-08-11 17:46:19 +02:00
Miriam Baglioni
9650eea497
reverting
2021-08-11 17:45:48 +02:00
Miriam Baglioni
785db1d5b2
refactoring
2021-08-11 17:44:07 +02:00
Miriam Baglioni
95e5482bbb
removing not needed dependency
2021-08-11 17:42:26 +02:00
Miriam Baglioni
cc3d72df0e
removing not needed dependency
2021-08-11 17:42:01 +02:00
Miriam Baglioni
b966329833
reverting
2021-08-11 17:37:00 +02:00
Miriam Baglioni
8ad7c71417
reverting
2021-08-11 17:36:12 +02:00
Miriam Baglioni
0e1a6bec20
reverting
2021-08-11 17:32:29 +02:00
Miriam Baglioni
c6a2a780a9
reverting
2021-08-11 17:30:17 +02:00
Miriam Baglioni
b6b58bba28
reverting
2021-08-11 17:25:37 +02:00
Miriam Baglioni
804589eb30
reverting
2021-08-11 17:23:35 +02:00
Miriam Baglioni
d688749ad9
reverting
2021-08-11 17:22:28 +02:00
Miriam Baglioni
524c06e028
reverting
2021-08-11 17:20:30 +02:00
Miriam Baglioni
7aa3260729
reverting
2021-08-11 17:18:45 +02:00
Miriam Baglioni
55fc500d8d
reverting
2021-08-11 17:17:48 +02:00
Miriam Baglioni
f9b6b45d85
reverting
2021-08-11 17:04:48 +02:00
Miriam Baglioni
8229632839
adding assertions to the mapping of the unibi part of gold list
2021-08-11 16:36:01 +02:00
Miriam Baglioni
b1c6140ebf
removed all comments in Italian
2021-08-11 16:23:33 +02:00
Miriam Baglioni
52c18c2697
removed not needed test class. The functionality has been moved
2021-08-11 16:16:55 +02:00
Miriam Baglioni
8da3a25cf6
merging with branch beta
2021-08-11 15:55:34 +02:00
Claudio Atzori
9f4db73f30
updated/fixed unit tests
2021-08-11 15:02:51 +02:00
Claudio Atzori
61d811ba53
suggestions from intellij
2021-08-11 12:18:20 +02:00
Claudio Atzori
2ee21da43b
suggestions from SonarLint
2021-08-11 12:13:22 +02:00
Miriam Baglioni
b954fe9ba8
merging with branch beta
2021-08-11 10:12:46 +02:00
Miriam Baglioni
b688567db5
hostedbymap - modified part of test to check the bestaccessright changed
2021-08-11 10:12:10 +02:00
Miriam Baglioni
9731a6144a
hostedbymap - in case the journal is open access the access may be changed also for the best access right in the result
2021-08-10 17:49:45 +02:00
Miriam Baglioni
a90bac3bc9
Graph Dump - added method to test class to verify addition of validation date in projects for community result
2021-08-09 16:36:54 +02:00
Miriam Baglioni
bd0d7bfba7
Graph Dump - added resources for testing addition of validation date in project for communityresult
2021-08-09 16:36:17 +02:00
Miriam Baglioni
8daaa32e90
Graph Dump - added resources for testing
2021-08-09 15:46:29 +02:00
Miriam Baglioni
bc9e3a06ba
Graph Dump - extended the test class
2021-08-09 15:46:06 +02:00
Claudio Atzori
d64a942a76
fixed MappersTest
2021-08-09 12:32:26 +02:00
Miriam Baglioni
2efa5abda5
refactoring
2021-08-09 12:28:36 +02:00
Claudio Atzori
577f3b1ac8
added dnet workflows responsible for the graph construction, enrichment, provision
2021-08-09 11:53:58 +02:00
Miriam Baglioni
da20fceaf7
removed all the part related to the crossref dump download since it is done in a separate workflow
2021-08-09 11:53:45 +02:00
Claudio Atzori
964f97ed4d
cleanup
2021-08-09 11:53:06 +02:00
Miriam Baglioni
54a6cbb244
CrossrefDump - put token among the parameters
2021-08-09 11:41:10 +02:00
Miriam Baglioni
b7079804cb
CrossrefDump - put token among the parameters
2021-08-09 11:34:35 +02:00
Miriam Baglioni
a5f82f442b
Merge branch 'beta' into doiboost_wf
2021-08-09 11:17:51 +02:00
Miriam Baglioni
b6dcf89d22
merging with branch beta
2021-08-09 11:14:43 +02:00
Miriam Baglioni
eff499af9f
added new tests and changed the test example
2021-08-09 11:12:30 +02:00
Claudio Atzori
a45b95ccc1
resolving conflicts for PR#134
2021-08-09 10:50:03 +02:00
Miriam Baglioni
5d70f842eb
merging with branch beta
2021-08-06 18:57:09 +02:00
Miriam Baglioni
c3931557e3
extended the logic of the dump to consider the validation date in the relation (also in the dumped result for communities and funders at the level of the project), the extension on the instance for the APC, the pid, the alternate identifiers, and the extension of the AccessRight to store the OpenAccessRoute. Added new resource for testing and extended the old class to verify the new dump. Also fixed an issue in the relation dump: only relations whose source and target are entities in the graph are dumped. The same holds for references to projects
2021-08-06 18:56:18 +02:00
Claudio Atzori
66f398fe6f
Merge pull request '[stats] fixed a typo' ( #133 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #133
2021-08-06 14:29:57 +02:00
Miriam Baglioni
6bd1eca7e0
merge branch with beta
2021-08-05 15:23:32 +02:00
Miriam Baglioni
73dc082927
added new dumped field (openaccessroute, pid and alternate identifier at the level of the instance) and the bipFinder measure at the level of the result
2021-08-05 15:20:50 +02:00
Miriam Baglioni
ee13da9258
merge branch with master
2021-08-05 11:34:20 +02:00
Miriam Baglioni
bd096f5170
removed not needed param file
2021-08-05 10:55:43 +02:00
Miriam Baglioni
5faeefbda8
added script to download the dump, changed the workflow input parameters
2021-08-05 10:54:03 +02:00
Miriam Baglioni
1965e4eece
new workflow for downloading the dump of crossref and unpack it
2021-08-04 18:29:03 +02:00
Claudio Atzori
83c04e5d28
mapping test for dataset records adapted to reflect the delegated pid authority (zenodo)
2021-08-04 10:37:57 +02:00
Miriam Baglioni
b4eb026c8b
merging with branch beta
2021-08-04 10:21:37 +02:00
Miriam Baglioni
c7b71647c6
Hosted By Map - modification of the resource for testing the presence of only one entry per datasource id
2021-08-04 10:20:02 +02:00
Miriam Baglioni
eb8c3f8594
Hosted By Map - test modified because of the application of the new aggregator on datasources
2021-08-04 10:19:17 +02:00
Miriam Baglioni
e94ae0b1de
Hosted By Map - extension of the workflow to also consider the application of the map to publications and datasources
2021-08-04 10:18:11 +02:00
Miriam Baglioni
67ba4c40e0
Hosted By Map - added parameter resources
2021-08-04 10:17:28 +02:00
Miriam Baglioni
eccf3851b0
Hosted By Map - refactoring
2021-08-04 10:16:30 +02:00
Sandro La Bruzzo
74afe43c3a
fixed wrong test file
2021-08-04 10:16:17 +02:00
Miriam Baglioni
1e952cccf6
Hosted By Map - refactoring and deletion of not needed methods
2021-08-04 10:15:43 +02:00
Miriam Baglioni
8ba8c77f92
Hosted By Map - refactoring
2021-08-04 10:14:57 +02:00
Miriam Baglioni
8f7623e77a
Hosted By Map - refactoring and application of the new aggregator
2021-08-04 10:14:20 +02:00
Sandro La Bruzzo
3fc820203b
fixed wrong test file
2021-08-04 10:13:59 +02:00
Miriam Baglioni
a7bf314fd2
Hosted By Map - added new aggregator to get just one result per datasource id
2021-08-04 10:13:30 +02:00
Miriam Baglioni
9831725073
Hosted By Map - removed from the workflow a step that is not needed. The HBM will also take care of the integration of the unibi list of gold open access journals
2021-08-03 11:02:17 +02:00
Miriam Baglioni
100e54e6c8
merging with branch beta
2021-08-03 10:47:11 +02:00
Miriam Baglioni
461b8a29a0
removed not needed class
2021-08-03 10:46:51 +02:00
Miriam Baglioni
327cddde33
Hosted By Map - refactoring
2021-08-03 10:44:13 +02:00
Miriam Baglioni
17292c6641
Hosted By Map - resources for testing purposes
2021-08-02 19:37:08 +02:00
Miriam Baglioni
ee7ccb98dc
Hosted By Map - test class to verify the application of the hbm to results and datasource
2021-08-02 19:36:18 +02:00
Miriam Baglioni
90e91486e2
Hosted By Map - test class to verify each step in the preparation process
2021-08-02 19:35:52 +02:00
Miriam Baglioni
1e859706a3
Hosted By Map - Classes to apply the HBM to results and datasources
2021-08-02 19:35:23 +02:00
Miriam Baglioni
72df8f9232
Hosted By Map - removed the aggregator for the datasource (it is no longer needed) and added a new aggregator for the results. Changed also the hostedBYMap aggregator
2021-08-02 19:34:44 +02:00
Miriam Baglioni
ff1ce75e33
Hosted By Map - modification in the code to prepare the info needed to apply the HostedByMap. There is no need to join datasources with the hbm: all the information needed is in the hosted by map already
2021-08-02 19:32:59 +02:00
Claudio Atzori
e826aae848
using constants from ModelConstants
2021-08-02 14:28:59 +02:00
Claudio Atzori
fd55c77d97
updated dependency dhp-schemas:2.7.15
2021-08-02 13:48:42 +02:00
Antonis Lempesis
117c3d5c67
fixed a typo
2021-08-02 12:15:58 +03:00
Miriam Baglioni
1695d45bd4
Hosted By Map - Test class to verify the preparation of the intermediate information
2021-07-30 17:57:01 +02:00
Miriam Baglioni
7c6ea2f4c7
Hosted By Map - first attempt for the creation of intermediate information to be used to apply the hosted by map on the graph entities
2021-07-30 17:56:27 +02:00
Miriam Baglioni
d8b9b0553b
Hosted By Map - model classes to store the intermediate information to be used to apply the hosted by map
2021-07-30 17:55:39 +02:00
Miriam Baglioni
613bd3bde0
Hosted By Map - refactor of the first attempt to prepare a new hosted by map dependent on the datasource in the graph and on two external sources: the gold list from unibi and the doaj list of open access journals. Both lists are downloaded from the provided url parameter
2021-07-30 17:54:45 +02:00
Miriam Baglioni
d1807781c0
merging with branch beta
2021-07-30 14:34:07 +02:00
Miriam Baglioni
1d6ac3715b
merge branch with beta
2021-07-30 11:58:29 +02:00
Claudio Atzori
e244f73165
Update 'README.md'
2021-07-30 11:54:38 +02:00
Claudio Atzori
11e26c020a
Update 'README.md'
2021-07-30 11:54:13 +02:00
Claudio Atzori
19620eed46
applying PR#131, Patch the identifiers (source/target) in the relations, refinements
2021-07-30 11:09:32 +02:00
Claudio Atzori
5219d56be5
Merge pull request 'Patch the identifiers (source/target) in the relations, refinements' ( #131 ) from fct_project_id_replacement into master
...
Reviewed-on: #131
2021-07-30 11:07:54 +02:00
Claudio Atzori
4f78565c04
fixed implementation of PatchRelationsApplication, refined the relative unit test
2021-07-30 11:07:09 +02:00
Claudio Atzori
a6a38cca9e
fixed implementation of PatchRelationsApplication, refined the relative unit test
2021-07-30 11:06:11 +02:00
Miriam Baglioni
9bc4fd3b69
Patch FCT relations - fixed issue with join
2021-07-30 10:34:05 +02:00
Miriam Baglioni
2fc89fc9b5
Merge branch 'fct_project_id_replacement' of https://code-repo.d4science.org/D-Net/dnet-hadoop into fct_project_id_replacement
2021-07-30 10:20:43 +02:00
Claudio Atzori
081fe92a21
Merge branch 'fct_project_id_replacement' of https://code-repo.d4science.org/D-Net/dnet-hadoop into fct_project_id_replacement
2021-07-30 10:13:56 +02:00
Claudio Atzori
576693d782
added unit test for PatchRelationsApplication
2021-07-30 10:13:33 +02:00
Claudio Atzori
55e6470f44
Merge pull request 'added the sprint 2 indicators in monitor db' ( #129 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #129
2021-07-30 10:11:46 +02:00
Sandro La Bruzzo
6358f92c3a
added sleep to solve problem of lost request of creating index
2021-07-30 08:54:37 +02:00
Antonis Lempesis
26af0320d0
added the sprint 2 indicators in monitor db
2021-07-30 00:31:33 +03:00
Claudio Atzori
7b172e7cd9
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-07-29 13:57:06 +02:00
Claudio Atzori
c53d106e80
[provision] lowercase relation filter
2021-07-29 13:57:00 +02:00
Claudio Atzori
6e3554a45e
[provision] lowercase relation filter
2021-07-29 13:56:37 +02:00
Sandro La Bruzzo
b1b0cc3f15
fixed wrong package name
2021-07-29 13:55:08 +02:00
Miriam Baglioni
baad01cadc
hostedbymap
2021-07-29 13:04:39 +02:00
Claudio Atzori
e725c88ebb
[raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations
2021-07-29 13:03:43 +02:00
Claudio Atzori
5d08ad86ae
[raw_all] patching relation identifier phase to be run at the end, i.e. includes also claimed relations
2021-07-29 13:03:16 +02:00
Claudio Atzori
e87e1805c4
[raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset
2021-07-29 12:13:06 +02:00
Claudio Atzori
f83dd70e1c
Merge pull request 'Patch the identifiers (source/target) in the relations' ( #125 ) from fct_project_id_replacement into master
...
Reviewed-on: #125
2021-07-29 12:11:27 +02:00
Claudio Atzori
5f7330d407
Merge branch 'master' into fct_project_id_replacement
2021-07-29 11:38:22 +02:00
Claudio Atzori
1923c1ce21
replaced full join + filtering with a left join
2021-07-29 11:36:20 +02:00
Claudio Atzori
dc55ed4acd
Merge pull request '[beta] stats update workflow' ( #128 ) from antonis.lempesis/dnet-hadoop:beta into beta
...
Reviewed-on: #128
2021-07-29 11:13:21 +02:00
Claudio Atzori
908f57a475
code formatting
2021-07-29 10:49:39 +02:00
Sandro La Bruzzo
3721df7aa6
refactoring create actionset of scholexplorer, moved on package dhp-aggregation
2021-07-29 10:45:35 +02:00
Michele Artini
6aef3e8f46
Merge pull request '[broker] updated relation descriptors' ( #127 ) from broker_relations_upgrade into beta
...
Reviewed-on: #127
2021-07-29 08:16:49 +02:00
Antonis Lempesis
4afa5215a9
fixed a NPE?
2021-07-28 21:59:12 +03:00
Antonis Lempesis
3d1580fa9b
fixed a typo
2021-07-28 18:50:31 +03:00
Claudio Atzori
4c5a71ba2f
[broker] updated relation descriptors, making use of constant values
2021-07-28 17:11:18 +02:00
Claudio Atzori
a9961a1835
[cleaning] title cleaning based on the me.xuender:unidecode library
2021-07-28 16:36:33 +02:00
Claudio Atzori
e1797c0a42
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-07-28 16:21:36 +02:00
Claudio Atzori
6dddad86ee
[cleaning] title cleaning based on the me.xuender:unidecode library
2021-07-28 16:21:29 +02:00
Sandro La Bruzzo
3d8f0f629b
implemented workflow of creation action set for scholexplorer
2021-07-28 16:15:34 +02:00
Antonis Lempesis
9b181ffa73
added the h2020 classification scheme for projects
2021-07-28 16:31:29 +03:00
Alessia Bardi
df8715a1ec
format code after mvn compile
2021-07-28 11:58:26 +02:00
Michele Artini
3e2a2d6e71
added new fields in xml
2021-07-28 11:56:55 +02:00
Alessia Bardi
c806387d4b
tests for enermaps
2021-07-28 11:54:36 +02:00
Alessia Bardi
9594343725
code formatting after mvn compile
2021-07-28 11:41:34 +02:00
Claudio Atzori
2fff24df55
code formatting
2021-07-28 11:34:19 +02:00
Michele Artini
9f1c7b8e17
tests
2021-07-28 11:32:34 +02:00
Claudio Atzori
b346feed36
Merge pull request 'Change the access right in DoiBoost' ( #126 ) from doiboosi_accessright into beta
...
Reviewed-on: #126
2021-07-28 11:29:15 +02:00
Antonis Lempesis
4a9741825d
added result_orcid, result_project provenance, issn in datasources
2021-07-28 12:28:04 +03:00
Miriam Baglioni
3d2bba3d5d
removing not needed classes
2021-07-28 11:25:43 +02:00
Miriam Baglioni
cc0d3d8a7b
merging with branch beta
2021-07-28 11:24:46 +02:00
Michele Artini
e6f1773d63
mapping of new eosc fields
2021-07-28 11:17:11 +02:00
Miriam Baglioni
80d5b3b4de
DoiBoost AccessRight #4362 - removing commented code
2021-07-28 11:16:49 +02:00
Miriam Baglioni
5fe016dcbc
DoiBoost AccessRight #4362 - related to https://code-repo.d4science.org/D-Net/dnet-hadoop/pulls/126/files#issuecomment-4194
2021-07-28 11:14:28 +02:00
Miriam Baglioni
73ed7374a9
merging with branch beta
2021-07-28 11:05:16 +02:00
Miriam Baglioni
43e62fcae9
DoiBoost AccessRight #4362 - related to https://code-repo.d4science.org/D-Net/dnet-hadoop/pulls/126/files#issuecomment-4193
2021-07-28 11:04:55 +02:00
Michele Artini
c72c960ffb
added eosc fields
2021-07-28 11:03:15 +02:00
Michele Artini
1fb572a33a
added eosc fields
2021-07-28 10:52:24 +02:00
Miriam Baglioni
708d0ade34
Merge branch 'beta' into hostedbymap
2021-07-28 10:37:22 +02:00
Sandro La Bruzzo
16c91203bd
implemented workflow of creation action set for scholexplorer
2021-07-28 10:30:49 +02:00
Miriam Baglioni
6c936943aa
merging with branch beta
2021-07-28 10:24:48 +02:00
Miriam Baglioni
0424f47494
HostedByMap fixing issues
2021-07-28 10:24:13 +02:00
Michele Artini
52e2315ba2
removed trick for datasourcetypeui
2021-07-28 10:23:00 +02:00
Claudio Atzori
d267dce520
[raw_all] added extra workflow step for patching the identifiers in the relations, given an id mapping dataset
2021-07-27 17:18:29 +02:00
Sandro La Bruzzo
825d9f0289
fixed datacite workflow starting from Importing delta
2021-07-27 16:09:46 +02:00
Claudio Atzori
5aa7d16d1b
updated assertions in eu.dnetlib.dhp.oa.graph.raw.MappersTest
2021-07-27 15:11:58 +02:00
Claudio Atzori
998b66855a
updated assertions in eu.dnetlib.dhp.oa.graph.raw.MappersTest
2021-07-27 15:11:37 +02:00
Antonis Lempesis
1a28a69cac
changed the citeee in *_citations to cites
2021-07-27 15:14:09 +03:00
Miriam Baglioni
74f801b689
merging with branch beta
2021-07-27 13:18:31 +02:00
Miriam Baglioni
35e395eae8
merge with master
2021-07-27 12:34:59 +02:00
Miriam Baglioni
eb07f7f40f
Hosted By Map
2021-07-27 12:27:26 +02:00
Antonis Lempesis
ed185fd7ed
added missing colons
2021-07-27 11:42:47 +03:00
Antonis Lempesis
f3b9570354
properly invalidating metadata
2021-07-26 13:00:16 +03:00
Sandro La Bruzzo
848aabbb6c
minor fix
2021-07-25 12:06:41 +02:00
Sandro La Bruzzo
8fac10c91e
fixed definition of the wf for the creation of the final infospace of scholexplorer
2021-07-25 11:15:37 +02:00
Sandro La Bruzzo
3920c69bc8
change implementation of resolve Relation to generate jsonRdd in output
2021-07-25 09:51:36 +02:00
Antonis Lempesis
f9fbb0f261
added indicators second sprint
2021-07-24 16:40:28 +03:00
Claudio Atzori
a0393607a7
mapping funding relations from Datacite should be done according to the actual result identifier
2021-07-23 18:15:08 +02:00
Claudio Atzori
5b6844b969
mapping funding relations from Datacite should be done according to the actual result identifier
2021-07-23 18:14:37 +02:00
Sandro La Bruzzo
d9e3b89937
implemented last part of workflows to generate scholixGraph
2021-07-23 16:38:32 +02:00
Sandro La Bruzzo
cfde63a7c3
fixed resolve relation join
2021-07-23 14:17:29 +02:00
Sandro La Bruzzo
4a439c3863
NPE fixed
2021-07-23 14:17:29 +02:00
Claudio Atzori
bc835d2024
[cleaning] fixed filtering function for missing titles
2021-07-23 11:56:13 +02:00
Claudio Atzori
ffdb2a3ea3
[cleaning] fixed filtering function for missing titles
2021-07-23 11:55:55 +02:00
Sandro La Bruzzo
ca74e8dd02
create a separate wf for resolving relation
2021-07-23 11:40:06 +02:00
Sandro La Bruzzo
43e9380cd3
update resolve relation to use the same format of openaire graph
2021-07-23 11:25:18 +02:00
Sandro La Bruzzo
058b636d4d
added control to check if the entity exists
2021-07-22 16:08:54 +02:00
Sandro La Bruzzo
62ae36a3d2
fixed NPE
2021-07-22 15:41:38 +02:00
Miriam Baglioni
63553a76b3
added code to download gold issn list from unibi
2021-07-22 12:01:48 +02:00
Miriam Baglioni
1a5b114906
DoiBoost AccessRight #4362 - refactoring
2021-07-22 12:00:23 +02:00
Sandro La Bruzzo
d94565862a
fixed NPE
2021-07-21 21:23:11 +02:00
Sandro La Bruzzo
31d2d6d41e
Scholexplorer: introduction of dedup openaire
2021-07-21 18:09:32 +02:00
Miriam Baglioni
b226ba4439
merging with branch beta
2021-07-21 09:46:40 +02:00
Alessia Bardi
9069958479
tests for enermaps
2021-07-20 19:31:43 +02:00
Claudio Atzori
10d7b4f0b4
filtering 'old' OpenAIRE ids from the entity.originalId[] array in the OAF -> XML serialization procedure
2021-07-20 11:52:05 +02:00
Claudio Atzori
77e8c6c7f7
filtering 'old' OpenAIRE ids from the entity.originalId[] array in the OAF -> XML serialization procedure
2021-07-20 11:51:33 +02:00
Miriam Baglioni
83fe31c92e
changed the name of the workflows
2021-07-19 18:19:14 +02:00
Miriam Baglioni
dd81c36b60
Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta
2021-07-19 18:18:14 +02:00
Miriam Baglioni
54acc5373b
changed the name of the workflows
2021-07-19 18:18:09 +02:00
Miriam Baglioni
b420b11ed3
duplicate the number of partitions in ProcessMag
2021-07-19 18:16:23 +02:00
Claudio Atzori
65934888a1
adding record identifier among the originalIds regardless of what IdentifierFactory produces
2021-07-19 17:52:52 +02:00
Claudio Atzori
5947cddafc
adding record identifier among the originalIds regardless of what IdentifierFactory produces
2021-07-19 17:52:24 +02:00
Claudio Atzori
0977baf41d
contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph
2021-07-19 17:43:52 +02:00
Miriam Baglioni
13cf444f85
Merge pull request 'force originalId for claimed records' ( #124 ) from forceOrginalId_claims into master
...
Reviewed-on: #124
2021-07-19 17:41:58 +02:00
Claudio Atzori
5e5f65a3c3
contents mapped from the stores with 'claim' interpretation will not change their identifier along their way towards the graph
2021-07-19 15:56:55 +02:00
Miriam Baglioni
662c396354
duplicate the number of partitions in ConvertCrossrefToOaf
2021-07-19 12:41:14 +02:00
Miriam Baglioni
59530a14fb
DoiBoost AccessRight #4362 - set BestAccessRight with the usual comparator
2021-07-19 12:34:35 +02:00
Miriam Baglioni
199123b74b
DoiBoost AccessRight #4362 - Fixed issue on date formatting. Added test method and associated resource
2021-07-16 17:30:27 +02:00
Miriam Baglioni
c4b18e6ccb
changed the download.sh, added a skip step so that a phase can be left out, and changed the workflow sequence of steps
2021-07-16 15:01:25 +02:00
Miriam Baglioni
acd6056330
added shell action to automatically download the new dump and put it in a specified hdfs location
2021-07-16 12:47:10 +02:00
Miriam Baglioni
3bc9a05bc9
merging with branch beta
2021-07-16 10:32:27 +02:00
Miriam Baglioni
34506df1b6
DoiBoost AccessRight #4362 - if the journal is open, the OPEN access right is set to all instances and color is GOLD (overwrite if the color was already set in one of the previous steps)
2021-07-16 10:29:51 +02:00
Claudio Atzori
bf9e0d2d4f
Merge pull request 'orcid-no-doi' ( #123 ) from enrico.ottonello/dnet-hadoop:orcid-no-doi into beta
...
Reviewed-on: #123
2021-07-15 17:59:41 +02:00
Claudio Atzori
9913b6073c
Merge pull request 'orcid-no-doi' ( #123 ) from enrico.ottonello/dnet-hadoop:orcid-no-doi into master
...
Reviewed-on: #123
2021-07-15 17:53:58 +02:00
Sandro La Bruzzo
7e2caafe84
Scholexplorer: fixed mapping typologies
2021-07-15 09:53:12 +02:00
Enrico Ottonello
2dc50c0999
added default value to process path
2021-07-14 17:02:22 +02:00
Enrico Ottonello
66604bb2b4
added absolute path to process folder
2021-07-14 16:44:51 +02:00
Enrico Ottonello
7840cc6526
merged with master
2021-07-14 15:33:59 +02:00
Miriam Baglioni
4da46bb62f
merging with branch beta
2021-07-14 15:08:52 +02:00
Enrico Ottonello
a65667d217
added publication to dataset even if no contributors
2021-07-14 15:07:07 +02:00
Sandro La Bruzzo
10068c00ea
Code refactor:
...
- removed old workflows in doiboost
- split workflow of doiboost in preprocess and process
2021-07-14 14:45:50 +02:00
Miriam Baglioni
09ad7b2a9e
DoiBoost AccessRight #4362 - Unpaywall mapped to OAF with OPEN instance (non oa are filtered out) (unknown hostedby) + map the color as it is
2021-07-14 14:45:21 +02:00
Miriam Baglioni
f4f7c6f9d3
DoiBoost AccessRight #4362 - Unpaywall mapped to OAF with OPEN instance (non oa are filtered out) (unknown hostedby) + map the color as it is
2021-07-14 14:44:54 +02:00
Miriam Baglioni
6222adf176
DoiBoost AccessRight #4362 - added resources and test for crossref mapping (licence part included)
2021-07-14 14:42:34 +02:00
Miriam Baglioni
981b1018f6
DoiBoost AccessRight #4362 - decide access right according to licence. Default access right is Unknown
2021-07-14 14:42:06 +02:00
Sandro La Bruzzo
3d8e2aa146
Code refactor:
...
- removed old workflows in doiboost
- split workflow of doiboost in preprocess and process
2021-07-14 14:37:06 +02:00
Miriam Baglioni
441701c85c
DoiBoost AccessRight #4362 - If multiple licenses are available, take the one applied to 'vor'
2021-07-14 14:14:50 +02:00
Sandro La Bruzzo
c35c117601
fixed process doiboost workflow:
...
- split OrcidToOAF into two phases, preprocess and process
- updated workflow used in production
2021-07-14 12:48:01 +02:00
Miriam Baglioni
1cdd09cd8e
Tentative fix for testing of Jenkins
2021-07-14 11:14:59 +02:00
Sandro La Bruzzo
4cb65bc64a
fixed process doiboost workflow:
...
- split OrcidToOAF into two phases, preprocess and process
- updated workflow used in production
2021-07-14 09:44:32 +02:00
Miriam Baglioni
774cdb190e
changes to mirror the last dump of the graph with the old data model.
2021-07-13 18:57:24 +02:00
Miriam Baglioni
886617afd0
One result linked to more than one project is saved just once
2021-07-13 18:15:35 +02:00
Miriam Baglioni
320cf02d96
Changed the way to find results linked to projects. We verify to actually have the project on the graph before selecting the result
2021-07-13 18:13:32 +02:00
Miriam Baglioni
52ce35d57b
-
2021-07-13 18:08:46 +02:00
Miriam Baglioni
970b387b8d
modification to allow dump of a single community
2021-07-13 18:08:10 +02:00
Miriam Baglioni
eae10c5894
modification to allow the dump for a single community
2021-07-13 18:07:25 +02:00
Miriam Baglioni
c028feef4f
workflow for the dump as sub workflows
2021-07-13 18:06:44 +02:00
Miriam Baglioni
d70f8c96fd
funding contains and not starts with h2020
2021-07-13 17:34:53 +02:00
Miriam Baglioni
5e38c7f42d
dumping only communities with status all
2021-07-13 17:32:38 +02:00
Claudio Atzori
734de62474
[doiboost] added workflow for the ActionSet update dedicated to production
2021-07-13 17:26:04 +02:00
Miriam Baglioni
d418c309f5
removed the part after part-x- in the file name generated by spark. It was too long and created problems while creating the tar entries
2021-07-13 17:11:49 +02:00
Miriam Baglioni
618d2de2da
minor changes and refactoring
2021-07-13 17:10:02 +02:00
Miriam Baglioni
59615da65e
Add test to verify the creation of relation between context and projects
2021-07-13 17:09:15 +02:00
Miriam Baglioni
084b4ef999
added the creation of the openaireId from funder and grant number if the element is not present in the context profile
2021-07-13 17:07:46 +02:00
Claudio Atzori
fa720c1da4
[doiboost] added workflow for the ActionSet update dedicated to production
2021-07-13 16:59:30 +02:00
Miriam Baglioni
8f322a73cb
change because of the renaming of originalId in acronym
2021-07-13 16:22:58 +02:00
Miriam Baglioni
72397ea1ba
Added fix for community of arbitrary name length
2021-07-13 16:18:35 +02:00
Miriam Baglioni
5295d10691
added check not to dump deletedByInference entities
2021-07-13 16:11:46 +02:00
Claudio Atzori
9629569e22
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2021-07-13 16:04:08 +02:00
Claudio Atzori
f13e11e3f7
[aggregation] datacite wf: defined parameter declaring the path used to store the OAF objects produced by the transformation phase
2021-07-13 16:04:02 +02:00
Miriam Baglioni
e9a17ec899
added check to verify not to add void APC
2021-07-13 15:53:35 +02:00
Miriam Baglioni
8429aed6c6
Added resource for testing selection of valid relations
2021-07-13 15:49:38 +02:00
Miriam Baglioni
39b1a6edf6
added test class for the selection of valid relations and description
2021-07-13 15:23:09 +02:00
Miriam Baglioni
9a58f1b93d
added logic to select only the valid relations: those not deletedbyinference and having both parts of the relation as entities in the graph
2021-07-13 15:20:39 +02:00
Miriam Baglioni
13c66e16be
changed logic to split for communities
2021-07-13 15:15:27 +02:00
Miriam Baglioni
6410ab71d8
added APC in the dump and test method
2021-07-13 15:13:58 +02:00
Miriam Baglioni
65a242646d
added resource for APC dump
2021-07-13 14:45:25 +02:00
Miriam Baglioni
4b432fbee8
extended test class
2021-07-13 14:40:39 +02:00
Miriam Baglioni
87a6e2b967
extended test class
2021-07-13 14:38:28 +02:00
Miriam Baglioni
69fd40fd30
modified code to split the Croatian funder
2021-07-13 14:35:26 +02:00
Miriam Baglioni
86e50f7311
modified code to split the Croatian funder
2021-07-13 14:31:45 +02:00
Miriam Baglioni
da88c850c6
changed the logic to verify if a community is contained in the list of context of a result
2021-07-13 14:22:44 +02:00
Miriam Baglioni
2f66fedfec
changed the logic to verify if a community is contained in the list of context of a result
2021-07-13 14:22:23 +02:00
Miriam Baglioni
f5486ffb14
Fixed issues to tests
2021-07-13 14:07:45 +02:00
Claudio Atzori
e0061232e9
[aggregation] datacite wf: conditional creation of links, optional resume from intermediate phases
2021-07-13 13:41:21 +02:00
Claudio Atzori
bc4b86c27c
updated URL in the issueManagement tag
2021-07-13 11:54:32 +02:00
Claudio Atzori
28a66af425
updated URL in the issueManagement tag
2021-07-13 11:52:24 +02:00
Claudio Atzori
783988af06
depending on dhp-schemas:2.6.14 (release)
2021-07-13 11:17:25 +02:00
Claudio Atzori
9038fdc771
depending on dhp-schemas:2.7.14 (release)
2021-07-12 17:46:12 +02:00
Sandro La Bruzzo
bbe8193930
merged stable ids
2021-07-12 17:00:43 +02:00
Claudio Atzori
ae2b47b29d
[broker] added coalesce(1) on the stats dataset before storing it on postgres
2021-07-09 15:47:51 +02:00
Sandro La Bruzzo
57c74c73c6
fixed mistakes in oozie workflow
2021-07-09 12:28:09 +02:00
Sandro La Bruzzo
61ccb54fde
removed wrong loop on oozie wf
2021-07-09 12:17:57 +02:00
Sandro La Bruzzo
9f5a0f3ab6
moved wf indexing of Scholexplorer in dhp-graph-provision
2021-07-09 12:06:43 +02:00
Sandro La Bruzzo
09fccf8000
added workflow to serialize scholix and summary in json
2021-07-09 11:01:42 +02:00
Sandro La Bruzzo
0ea576745f
updated CreateInputGraph because generics don't work on Spark Dataset
2021-07-09 10:29:24 +02:00
Sandro La Bruzzo
cd17e19044
implemented branch workflow to import datacite and crossref in scholexplorer
2021-07-08 21:20:19 +02:00
Miriam Baglioni
c30f3ce647
merge doi normalization
2021-07-08 19:20:02 +02:00
Sandro La Bruzzo
8a034e46e1
updated baseline workflow
2021-07-08 11:11:41 +02:00
Claudio Atzori
b7b8e0986e
[raw_all] The claim merge procedure includes the claimed contexts in the merged result
2021-07-08 10:42:31 +02:00
Sandro La Bruzzo
0799ac9fb6
fixed wrong path
2021-07-08 10:36:37 +02:00
Sandro La Bruzzo
4d53402712
extended ebiLinks to create a dataset before generation of OAF
2021-07-08 10:26:21 +02:00
Sandro La Bruzzo
a4a54a3786
code refactor
2021-07-08 09:08:25 +02:00
Sandro La Bruzzo
a01dbe0ab0
completed workflow of generation of scholix and summaries
2021-07-07 23:10:34 +02:00
Claudio Atzori
fdcff42e46
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
2021-07-07 19:01:59 +02:00
Claudio Atzori
777536ce91
[aggregation] string values used as regular expressions in the OAI collection classes are defined in a single point as constants, to be reused across the code (PR#122)
2021-07-07 11:23:48 +02:00
Claudio Atzori
bc014023c8
Merge pull request 'to solve the scala SI-3623' ( #122 ) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
...
Reviewed-on: #122
2021-07-07 11:13:51 +02:00
Claudio Atzori
32bdfdccbc
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
2021-07-07 11:08:27 +02:00
Andreas Czerniak
ebf3f47a02
from&until more OAI2.0 compl., adding tfs
2021-07-07 09:29:49 +02:00
Claudio Atzori
f580cb77e1
added mapping for claim relation 'resultResult_publicationDataset_isRelatedTo' (present on BETA)
2021-07-06 21:11:11 +02:00
Sandro La Bruzzo
ed684874f2
deleted old scholix project
2021-07-06 17:20:08 +02:00
Sandro La Bruzzo
8535506c22
added scholix generation
2021-07-06 17:18:06 +02:00
Sandro La Bruzzo
4c54bd8742
add test to verify merge scholix on source
2021-07-06 11:32:14 +02:00
Andreas Czerniak
3531802710
to solve the scala SI-3623
2021-07-06 11:30:56 +02:00
Sandro La Bruzzo
7d8db2eb8a
betterRenamingMethod
2021-07-06 09:56:32 +02:00
Sandro La Bruzzo
c952c8d236
generate first side of scholix mapping
2021-07-06 09:53:14 +02:00
Claudio Atzori
70ded407bb
HttpClient used in metadata collection retries also on 404
2021-07-05 18:04:30 +02:00
Miriam Baglioni
7177c25261
added check for null value during doi normalization
2021-07-05 16:22:38 +02:00
Miriam Baglioni
0892cad4e8
the normalization of the content of value was not visible outside the block. Moved doi normalization operation while returning value
2021-07-05 16:21:42 +02:00
Claudio Atzori
350a0823bd
Merge pull request 'using organization ids instead of names in monitor db creation' ( #121 ) from antonis.lempesis/dnet-hadoop:stable_ids into stable_ids
...
Reviewed-on: #121
2021-07-05 11:07:39 +02:00
Antonis Lempesis
89e6f46682
using organization ids instead of names in monitor db creation
2021-07-05 12:00:00 +03:00
Sandro La Bruzzo
e4b84ef5d6
fixed mapping OAF to Scholix summary
2021-07-02 16:48:48 +02:00
Sandro La Bruzzo
8fa0841898
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-07-01 22:14:04 +02:00
Sandro La Bruzzo
c6fa8598e1
massive code refactor:
...
removed modules dhp-*-scholexplorer
2021-07-01 22:13:45 +02:00
Antonis Lempesis
829caee4fd
added the missing indicators files
2021-06-30 17:31:33 +02:00
Sandro La Bruzzo
84b834c893
added test dataset test for pangaea
2021-06-30 17:31:09 +02:00
Sandro La Bruzzo
1a6b398968
implemented Creation of Raw Graph and Resolution
2021-06-30 17:27:55 +02:00
Miriam Baglioni
bc34347643
added assertions to verify doi normalization
2021-06-30 14:37:08 +02:00
Miriam Baglioni
86f47afcc7
slight modification of the resource to accommodate also doi normalization tests
2021-06-30 14:36:49 +02:00
Miriam Baglioni
03767ea8e6
slight modification of the resource to accommodate also doi normalization tests
2021-06-30 13:21:24 +02:00
Miriam Baglioni
f8eec0ca9a
added resource to test the normalization of doi during the import of MAG
2021-06-30 13:19:54 +02:00
Miriam Baglioni
149f85ddf5
added tests for the normalization of the dois
2021-06-30 13:00:52 +02:00
Miriam Baglioni
e487b5544c
added tests for the normalization of the dois
2021-06-30 12:57:11 +02:00
Miriam Baglioni
1503ccbbb5
added tests for the normalization of the dois
2021-06-30 12:55:37 +02:00
Miriam Baglioni
1299bfb357
Added class to test the normalization of doi
2021-06-30 12:53:27 +02:00
Sandro La Bruzzo
623a0c4edb
code Refactor, renaming packages
2021-06-30 11:09:30 +02:00
Miriam Baglioni
cf758f4f91
added normalization step for the doi
2021-06-30 10:03:15 +02:00
Miriam Baglioni
801763a0fa
there is no longer any need to lower case the doi since it is done in the first step. Also changed the creation of the id by using the factory
2021-06-29 19:07:23 +02:00
Miriam Baglioni
a74de1cda2
added normalization step to the doi
2021-06-29 18:51:11 +02:00
Miriam Baglioni
06074ea7d3
added normalization step to the doi
2021-06-29 18:46:08 +02:00
Miriam Baglioni
8b8ffe82dc
added step of normalization for the doi
2021-06-29 18:41:39 +02:00
Miriam Baglioni
50cc21d92e
Added method to normalize doi values (lower case, remove everything preceding 10., filtering out doi not starting with 10.)
2021-06-29 18:35:28 +02:00
Claudio Atzori
6d3f960238
Merge pull request 'added the missing indicators files' ( #120 ) from antonis.lempesis/dnet-hadoop:stable_ids into stable_ids
...
Reviewed-on: #120
2021-06-29 15:57:39 +02:00
Antonis Lempesis
ae18171212
Merge branch 'stable_ids' into stable_ids
2021-06-29 15:33:39 +02:00
Antonis Lempesis
87f14a3899
added the missing indicators files
2021-06-29 16:31:51 +03:00
Sandro La Bruzzo
db933ebd21
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-29 14:16:12 +02:00
Sandro La Bruzzo
7e08655e5f
added relation dates in all scholexplorer Datasources
2021-06-29 12:02:03 +02:00
Sandro La Bruzzo
075055eaca
added relation dates in bio mapping
2021-06-29 10:33:09 +02:00
Sandro La Bruzzo
f36f92287d
implemented mapping from Crossref Event Data to Oaf
2021-06-29 10:21:23 +02:00
Claudio Atzori
986a8011ec
Merge pull request 'copied latest changes from old fork: indicators+monitor institutions' ( #119 ) from antonis.lempesis/dnet-hadoop:stable_ids into stable_ids
...
Reviewed-on: #119
2021-06-29 08:49:12 +02:00
Antonis Lempesis
018c4eb52c
copied latest changes from old fork: indicators+monitor institutions
2021-06-28 23:46:52 +03:00
Sandro La Bruzzo
511ec14c63
implemented mapping from EBI and Scholix Resolved to OAF
2021-06-28 22:04:22 +02:00
Claudio Atzori
af42377d0e
HttpClient used in metadata collection retries on 502, 503, 504
2021-06-28 09:34:30 +02:00
Sandro La Bruzzo
ad50415167
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-24 17:20:50 +02:00
Sandro La Bruzzo
80e15cc455
implemented mapping from uniprot, pdb and ebi links
2021-06-24 17:20:00 +02:00
Claudio Atzori
67afd06cd1
[cleaning] cleaning instance.pid and instance.alternateidentifier using the same procedure used to clean result.pid
2021-06-24 12:10:17 +02:00
Claudio Atzori
2e8fd2c531
cleanup
2021-06-23 14:38:24 +02:00
Claudio Atzori
4dc9ebf217
[raw_all] fixed unit test
2021-06-23 14:38:07 +02:00
Claudio Atzori
50fc5a64a0
[raw_all] Aggregator graph creation merges claims (updates) with the corresponding entity
2021-06-23 11:49:42 +02:00
Claudio Atzori
5edcc6832a
applying sonarLint suggestions
2021-06-23 09:53:29 +02:00
Sandro La Bruzzo
080a280bea
added pdb to Oaf Transformation
2021-06-21 16:23:59 +02:00
Sandro La Bruzzo
1dc0c59e20
merged fix thai dates from stable_ids
2021-06-21 10:39:46 +02:00
Sandro La Bruzzo
dc66cf615b
Merge branch 'stable_id_scholexplorer' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer
2021-06-21 09:38:33 +02:00
Sandro La Bruzzo
507e42102a
added pdb to oaf class
2021-06-21 09:36:40 +02:00
Sandro La Bruzzo
a167543637
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer
2021-06-21 09:14:11 +02:00
Sandro La Bruzzo
4fe7b75644
renamed packages
2021-06-18 16:41:24 +02:00
Sandro La Bruzzo
3990165d05
changed typologies of unresolved relation
2021-06-18 11:43:59 +02:00
Claudio Atzori
2dd5449c13
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-18 10:08:15 +02:00
Claudio Atzori
fd54ecf7bd
bumped dhp-schemas dependency version
2021-06-18 10:08:07 +02:00
Miriam Baglioni
180d671127
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-18 09:46:18 +02:00
Miriam Baglioni
13c96622c9
-
2021-06-18 09:45:16 +02:00
Miriam Baglioni
b486ae498f
added test and test resource to verify the generation of the date of acceptance from the input extracted from the dump
2021-06-18 09:43:32 +02:00
Miriam Baglioni
464c2ddde3
changed to split in two steps the generation of the crossref dataset
2021-06-18 09:42:31 +02:00
Miriam Baglioni
6aca0d8ebb
added kryo encoding for input files
2021-06-18 09:42:07 +02:00
Miriam Baglioni
3585e53da3
changed to split in two steps the generation of the crossref dataset
2021-06-18 09:41:23 +02:00
Claudio Atzori
41b551562e
applying PR#115 (DatePicker) on stable_ids
2021-06-17 09:33:50 +02:00
Sandro La Bruzzo
3100166d29
Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer
2021-06-16 16:22:16 +02:00
Claudio Atzori
74833d04f1
Merge branch 'pids_beta' of https://code-repo.d4science.org/antonis.lempesis/dnet-hadoop into stable_ids
2021-06-16 15:54:18 +02:00
Claudio Atzori
7243a40c88
code formatting
2021-06-16 15:03:03 +02:00
Sandro La Bruzzo
dfcf78cf24
removed wrong code
2021-06-16 14:57:42 +02:00
Sandro La Bruzzo
cc0f2b11fb
Implemented mapping from pubmed baseline to OAF
2021-06-16 14:56:24 +02:00
Miriam Baglioni
95885bcf12
forces executor memory and driver memory to be 7G (trying to avoid OOM)
2021-06-16 10:17:52 +02:00
Miriam Baglioni
2550a73981
-
2021-06-16 10:04:41 +02:00
Miriam Baglioni
1c47c0d786
modified the number of executors trying to avoid OOM exception
2021-06-15 21:05:39 +02:00
Miriam Baglioni
7deac55138
added one option for resume from in the wf
2021-06-15 18:38:20 +02:00
Antonis Lempesis
f7c0b80e35
storing result_instance as parquet
2021-06-15 14:45:48 +03:00
Miriam Baglioni
66e7ef892f
changed the parameter name
2021-06-15 11:08:54 +02:00
Miriam Baglioni
4f47ad0891
no need to rename the folders, just write in overwrite mode, so I changed the name of the output folder
2021-06-15 09:28:31 +02:00
Miriam Baglioni
9f9dd00b94
refactoring
2021-06-15 09:24:46 +02:00
Miriam Baglioni
63d74ee379
refactoring
2021-06-15 09:24:11 +02:00
Miriam Baglioni
6ebc236657
added needed property: outputPath
2021-06-15 09:23:24 +02:00
Miriam Baglioni
f7379255b6
changed the workflow to extract info from the dump
2021-06-15 09:22:54 +02:00
Miriam Baglioni
d6e21bb6ea
creates the crossref dataset used for doiboost together with unpacking part from tar
2021-06-14 17:27:19 +02:00
Miriam Baglioni
4da141bd7c
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-14 13:41:02 +02:00
Miriam Baglioni
ce0cfd79e0
creates the crossref dataset used for doiboost
2021-06-14 13:40:19 +02:00
Miriam Baglioni
93efe4de82
split the construction of crossref dataset in two parts. This one just unpacks the tar entries
2021-06-14 13:39:40 +02:00
Michele Artini
ada063ce70
fixed a problem with empty mdstore list (2)
2021-06-14 12:04:47 +02:00
Michele Artini
83132ee99a
fixed a problem with empty mdstore list
2021-06-14 11:57:00 +02:00
Miriam Baglioni
cf360d7c97
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-14 10:19:49 +02:00
Miriam Baglioni
8873e6b6d1
workflow and parameter
2021-06-14 10:15:57 +02:00
Miriam Baglioni
0f1acdf6b6
workflow and parameter
2021-06-14 10:08:55 +02:00
Sandro La Bruzzo
aeb8132627
Merged branch stable_ids
2021-06-14 10:07:29 +02:00
Sandro La Bruzzo
efbea1e01a
minor fix
2021-06-14 09:45:14 +02:00
Miriam Baglioni
75780fc636
extraction of the tar for the dump of crossref, and creation of the dataset
2021-06-14 09:45:07 +02:00
Claudio Atzori
2039bb9f5f
orcid / orcid_pending cleaning backported from master branch
2021-06-14 09:40:50 +02:00
Claudio Atzori
dd19c4ac5a
Merge pull request 'import_new_mdstores' ( #112 ) from import_new_mdstores into stable_ids
...
Reviewed-on: #112
2021-06-14 09:23:55 +02:00
Claudio Atzori
e9e86a237d
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-06-11 17:00:02 +02:00
Claudio Atzori
10bd6ca194
depending on dhp-schemas:2.5.12 (release)
2021-06-11 16:59:56 +02:00
Claudio Atzori
a900bfb874
delegating the date parsing to https://github.com/sisyphsu/dateparser
2021-06-11 16:53:01 +02:00
Sandro La Bruzzo
dd997c49e0
fix wrong relation id
...
fix date thai ticket #6791
2021-06-10 14:47:18 +02:00
Antonis Lempesis
d413b24611
added instances, orgs for monitor, totalcost for projects, apcs
2021-06-10 02:35:46 +03:00
Claudio Atzori
741077dbca
Merge pull request 'Fix in Affiliation Propagation' ( #113 ) from miriam.baglioni/dnet-hadoop:master into stable_ids
...
Reviewed-on: #113
2021-06-09 18:42:42 +02:00
Miriam Baglioni
32b0c27217
Update 'dhp-workflows/dhp-enrichment/src/main/java/eu/dnetlib/dhp/resulttoorganizationfrominstrepo/PrepareResultInstRepoAssociation.java'
...
fix in SQL query: while writing the blacklist constraint it used d.id to indicate the datasource id, but no alias for the datasource was defined. So I removed the alias
2021-06-09 18:36:11 +02:00
Sandro La Bruzzo
0d1f37302f
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_id_scholexplorer
2021-06-09 09:35:16 +02:00
Miriam Baglioni
dc07f1079b
added check in case the author set to be enriched is null
2021-06-08 12:06:10 +02:00
Miriam Baglioni
8d2e086e48
changes to avoid reassignment to val
2021-06-07 17:50:37 +02:00
Miriam Baglioni
f33521d338
Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
...
to be able to replace the object assigned to author, val has been replaced by var
2021-06-07 17:27:07 +02:00
Miriam Baglioni
bc12e9819e
Update 'dhp-workflows/dhp-doiboost/src/main/java/eu/dnetlib/doiboost/orcid/SparkConvertORCIDToOAF.scala'
...
The change fixes the issue that arises when the same work appears more than once on the same ORCID profile. It avoids replicating the association doi -> author when the orcid id is already associated to the doi.
2021-06-07 16:37:01 +02:00
Sandro La Bruzzo
0cdb7ccdaa
added inverse relations to datacite mapping
2021-06-04 15:10:20 +02:00
Sandro La Bruzzo
5b724d9972
added relations to datacite mapping
2021-06-04 10:14:22 +02:00
Sandro La Bruzzo
e57294ac99
implemented changes on PUBMed dataflow
2021-06-03 10:52:09 +02:00
Michele Artini
ede2749822
orcid pid type
2021-06-01 12:42:43 +02:00
Michele Artini
f0fbfdcfae
Merge branch 'stable_ids' into import_new_mdstores
2021-06-01 12:03:00 +02:00
Michele Artini
e950750262
add nodes to import hdfs mdstores
2021-06-01 10:48:50 +02:00
Michele Artini
03a510859a
removed coalesce(1)
2021-05-31 14:10:51 +02:00
Michele Artini
e9f2b6037c
patch of mdstore records
2021-05-31 11:36:26 +02:00
Sandro La Bruzzo
02ef46535f
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-05-31 09:50:15 +02:00
Sandro La Bruzzo
aeadc5a366
updated wf Datacite Import to retrieve the block size as parameter
2021-05-31 09:49:53 +02:00
Claudio Atzori
96238152cb
added serialization for alternateIdentifiers and pids within each record instance
2021-05-28 16:57:30 +02:00
Michele Artini
ad56a44fda
save as gzipped sequence file
2021-05-28 14:45:39 +02:00
Claudio Atzori
83722ebc47
pull #111 replied on stable_ids
2021-05-28 14:11:46 +02:00
Claudio Atzori
eb6acfbabc
[cleaning] removing non parsable relation.validationDate(s)
2021-05-28 10:50:44 +02:00
Claudio Atzori
6e3a4e9237
updated test expectations
2021-05-28 09:37:50 +02:00
Claudio Atzori
ac3d090e9e
bumped dhp-schemas dependency version
2021-05-27 17:31:12 +02:00
Michele Artini
4fa5671d16
first implementation of Hdfs Mdstores Importer
2021-05-27 16:22:07 +02:00
Claudio Atzori
c3d92247d3
bumped dhp-schemas dependency version
2021-05-27 15:10:51 +02:00
Claudio Atzori
d512062b58
integrating pull #109 , H2020Classification
2021-05-27 12:22:47 +02:00
Claudio Atzori
5e4b91d9ef
more pervasive use of constants from ModelConstants, especially for ORCID
2021-05-26 18:20:23 +02:00
Sandro La Bruzzo
bced804151
updated wf Datacite Import to retrieve the block size as parameter
2021-05-26 17:06:50 +02:00
Claudio Atzori
4f58418184
depending on dhp-schemas:2.4.7 (release)
2021-05-24 10:32:48 +02:00
Miriam Baglioni
abd88f663d
changed test resource to mirror change in the input file
2021-05-21 15:20:47 +02:00
Miriam Baglioni
c844877de2
changed workflow flow to possibly parallelize also the programme and project preparation steps
2021-05-21 14:41:57 +02:00
Miriam Baglioni
073d76864d
refactoring
2021-05-21 14:41:03 +02:00
Miriam Baglioni
4c8b4a774c
removed not needed code
2021-05-21 14:40:07 +02:00
Enrico Ottonello
abdd0ade1f
added temporary output folder as workflow parameter
2021-05-21 12:08:16 +02:00
Miriam Baglioni
53b9d87fec
new prepareProgramme according to the new file
2021-05-21 11:49:31 +02:00
Miriam Baglioni
1ee8f13580
refactoring and added "left" as join type to be 100% sure to get the whole set of projects
2021-05-21 11:49:05 +02:00
Miriam Baglioni
e07c3ba089
due to change in the input file the filtering step is no longer needed
2021-05-21 11:47:43 +02:00
Miriam Baglioni
54f6e2f693
changed to get the needed information to build the action set as parallel jobs
2021-05-21 11:47:00 +02:00
Miriam Baglioni
7180505519
removed non needed variable
2021-05-21 11:46:13 +02:00
Miriam Baglioni
2eb1a8b344
changed because the input file changed
2021-05-21 11:40:20 +02:00
Enrico Ottonello
d0945c3c78
added temporary output folder, because folder access rights are different on beta and prod
2021-05-20 19:14:31 +02:00
Enrico Ottonello
1265dadc90
workflow aligned with stable_ids
2021-05-20 19:01:28 +02:00
Enrico Ottonello
0821d8e97d
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-05-20 18:33:18 +02:00
Enrico Ottonello
ae7bd24d79
removed old workflows
2021-05-20 18:32:22 +02:00
Enrico Ottonello
4d6c473bf1
removed redundant classes contained now in dhp-schema
2021-05-20 18:26:42 +02:00
Claudio Atzori
9d725efdc1
reverted implementation of the mdstore client
2021-05-20 18:26:09 +02:00
Miriam Baglioni
9610224671
added param to workflow property
2021-05-20 18:21:12 +02:00
Claudio Atzori
863b56b6ce
using constants from ModelConstants
2021-05-20 16:23:58 +02:00
Claudio Atzori
ae5c28e54f
code formatting
2021-05-20 16:13:06 +02:00
Miriam Baglioni
aa45b4df9b
-
2021-05-20 15:57:40 +02:00
Miriam Baglioni
052c837843
-
2021-05-20 15:54:44 +02:00
Claudio Atzori
b695932ae4
integrated pull#108
2021-05-20 15:34:04 +02:00
Claudio Atzori
ea9b00ce56
adjusted test
2021-05-20 15:31:42 +02:00
Claudio Atzori
2e70aa43f0
Merge pull request 'H2020Classification fix and possibility to add datasources in blacklist for propagation of result to organization' ( #108 ) from miriam.baglioni/dnet-hadoop:master into master
...
Reviewed-on: #108
The changes look ok, but please drop a comment to describe how the parameters should be changed from the workflow caller for both workflows
* H2020Classification
* propagation of result to organization
2021-05-20 15:25:05 +02:00
Claudio Atzori
b572f56763
Merge branch 'master' into master
2021-05-20 15:22:35 +02:00
Claudio Atzori
2578b7fbb3
code formatting
2021-05-20 14:59:02 +02:00
Miriam Baglioni
dc0ad8d2e0
fixed issue related to change in the file name downloaded. Added sheet name as parameter and also a check if the name should change
2021-05-20 14:53:53 +02:00
Claudio Atzori
232dce83db
fixes #6701 : xpath for titles to support both datacite and Guidelines v4 mapping
2021-05-20 14:41:15 +02:00
Claudio Atzori
aef2977ad0
fixes #6701 : xpath for titles to support both datacite and Guidelines v4 mapping
2021-05-20 14:40:22 +02:00
Miriam Baglioni
02b80cf24f
resolved conflicts
2021-05-20 10:59:39 +02:00
Claudio Atzori
c4a23c2f4d
fix: preserving the old identifier among the originalIds in the doiboost construction process, trying to avoid UnsupportedOperationException while adding elements to the originalIds
2021-05-19 16:01:52 +02:00
Claudio Atzori
ba03f549d7
fix: preserving the old identifier among the originalIds in the doiboost construction process
2021-05-19 15:43:26 +02:00
Claudio Atzori
239d0f0a9a
ROR actionset import workflow backported from branch stable_ids
2021-05-18 16:12:11 +02:00
Antonis Lempesis
168edcbde3
added the final steps for the observatory promote wf and some cleanup
2021-05-18 15:23:20 +03:00
Michele Artini
e56ccec536
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-05-18 14:00:28 +02:00
Michele Artini
c1e20de7cf
fixed the deserialization of a json property
2021-05-18 14:00:14 +02:00
Claudio Atzori
a9f512103b
using constants from ModelConstants
2021-05-18 11:19:07 +02:00
Claudio Atzori
eeb8bcf075
using constants from ModelConstants
2021-05-18 11:10:07 +02:00
Claudio Atzori
2cbf15f4fb
using ModelConstants
2021-05-17 09:54:45 +02:00
Enrico Ottonello
e13926cdd0
merged with master
2021-05-14 18:10:31 +02:00
Claudio Atzori
f19feceaf0
set the old identifier before switching to the new one
2021-05-14 12:53:40 +02:00
Claudio Atzori
1bd70fa2c6
preserving the old identifier among the originalIds in the doiboost construction process
2021-05-14 11:30:41 +02:00
Claudio Atzori
ca3f3a7687
using ModelConstants
2021-05-14 11:29:49 +02:00
Claudio Atzori
0358ae16ce
depending on the latest dhp-schema version
2021-05-14 11:28:33 +02:00
Claudio Atzori
23b8883ab1
applied intellij code cleanup
2021-05-14 10:58:12 +02:00
Claudio Atzori
609eb711b3
IndexRecordTransformerTest for producing a record that can be manually submitted to solr
2021-05-13 16:13:28 +02:00
Claudio Atzori
1517bf7c92
IndexRecordTransformerTest for producing a record that can be manually submitted to solr
2021-05-13 16:11:22 +02:00
Sandro La Bruzzo
d9a0bbda7b
implemented new phase in doiboost to make the dataset Distinct by ID
2021-05-13 12:25:14 +02:00
Sandro La Bruzzo
6424cd9062
Added passing of the following parameters:
...
-varDataSourceId
-varOfficialName
in Each transformation Rule
2021-05-11 15:17:38 +02:00
Sandro La Bruzzo
073dcea2aa
Added passing of the following parameters:
...
-varDataSourceId
-varOfficialName
in Each transformation Rule
2021-05-11 15:05:58 +02:00
Claudio Atzori
d4c3476152
mapping datasource.journal only when an issn is available, null otherwise
2021-05-11 11:08:54 +02:00
Claudio Atzori
da9d6f3887
mapping datasource.journal only when an issn is available, null otherwise
2021-05-11 10:45:30 +02:00
Sandro La Bruzzo
54217d73ff
removed old parameters from oozie workflow
2021-05-11 09:59:02 +02:00
Claudio Atzori
d1cbee8413
imported methods from CleaningFunctions, defined in GraphCleaningFunctions
2021-05-10 16:43:39 +02:00
Claudio Atzori
3797543600
MDStoreManager model classes moved in dhp-schemas
2021-05-10 14:32:05 +02:00
Claudio Atzori
3925eb6a79
MDStoreManager model classes moved in dhp-schemas
2021-05-10 13:58:23 +02:00
Claudio Atzori
25254885b9
[ActionManagement] reduced number of xqueries used to access ActionSet info
2021-05-07 17:32:03 +02:00
Claudio Atzori
8a0de2fc18
[ActionManagement] reduced number of xqueries used to access ActionSet info
2021-05-07 17:31:32 +02:00
Sandro La Bruzzo
7dc824fc23
imported changes in stable_id into master
2021-05-07 12:53:50 +02:00
Michele Artini
d82071ba6c
originalId with prefix
2021-05-06 15:34:48 +02:00
Claudio Atzori
d4a30fabe3
clean up tests
2021-05-05 17:28:15 +02:00
Claudio Atzori
dccaf173cf
fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials
2021-05-05 16:36:15 +02:00
Claudio Atzori
8c96a82a03
fixed mapping applied to ODF records. Added unit test to verify the mapping for OpenTrials
2021-05-05 15:30:06 +02:00
Claudio Atzori
50fc128ff7
alternative way to set timeouts for the ISLookup client
2021-05-05 11:24:44 +02:00
Claudio Atzori
2e1eb96f9a
code formatting
2021-05-05 11:23:57 +02:00
Claudio Atzori
b1785ba77c
alternative way to set timeouts for the ISLookup client
2021-05-05 11:23:46 +02:00
Sandro La Bruzzo
1adfc41d23
merged manually changes on stable_id for doiboost into master
2021-05-05 10:23:32 +02:00
Claudio Atzori
fb930b84d3
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-05-04 18:06:30 +02:00
Claudio Atzori
923d19ea8e
mdstore read lock/unlock when bulk copying records from mongodb to hdfs
2021-05-04 18:06:21 +02:00
Sandro La Bruzzo
714b71bd21
updated pubmed
2021-05-04 14:54:12 +02:00
Claudio Atzori
ba86835951
using common constants from ModelConstants
2021-05-04 11:51:52 +02:00
Claudio Atzori
c00be646f3
Merge pull request 'prepare_ror_actionset' ( #106 ) from prepare_ror_actionset into stable_ids
...
Reviewed-on: #106
Thanks Michele, looks good to me.
2021-05-04 11:41:58 +02:00
Michele Artini
f4bd2b5619
revert file SparkDedupTest.java
2021-05-04 10:26:14 +02:00
Michele Artini
49910aedca
Merge branch 'stable_ids' into prepare_ror_actionset
2021-05-04 10:00:12 +02:00
Claudio Atzori
5cc3e6d61c
bumped pace-core dependency version
2021-05-03 16:40:50 +02:00
miconis
1144d50a11
[maven-release-plugin] prepare for next development iteration
2021-05-03 16:09:56 +02:00
miconis
f33a18ca9d
[maven-release-plugin] prepare release dnet-dedup-4.1.7
2021-05-03 16:09:08 +02:00
miconis
4bce4f2e8e
minor change: version updated
2021-05-03 16:05:39 +02:00
miconis
c6266242e3
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-dedup
2021-05-03 15:38:00 +02:00
miconis
4988e9f80d
implementation of cross comparison for different fields, addition of clustering mechanism to collapse keys from different clustering functions on the same cluster
2021-05-03 15:37:41 +02:00
Michele Artini
b4877da363
Merge branch 'stable_ids' into prepare_ror_actionset
2021-05-03 08:13:55 +02:00
Alessia Bardi
9a20057615
fixed query for organisations' pids
2021-04-29 15:23:39 +02:00
Michele Artini
6692128234
Merge branch 'stable_ids' into prepare_ror_actionset
2021-04-29 13:24:08 +02:00
Alessia Bardi
a801999e75
fixed query for organisations' pids
2021-04-29 12:18:42 +02:00
Michele Artini
a278d67175
parse input file
2021-04-29 11:34:47 +02:00
Claudio Atzori
f6ccd54d87
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-04-29 10:10:01 +02:00
Claudio Atzori
91e7220f20
cleaned up workflow for actionset migration, adjusted dnet|cnr* dependency versions
2021-04-29 10:09:52 +02:00
Michele Artini
f77ba34126
pid types
2021-04-29 09:50:05 +02:00
Michele Artini
7c5cd86927
annotations and tests
2021-04-29 09:29:19 +02:00
Michele Artini
b5cf505cc6
partial implementation of the ROR->actionset workflow
2021-04-28 16:00:24 +02:00
Enrico Ottonello
c537986b7c
deleted folders with merged data immediately before merge phases
2021-04-28 11:25:25 +02:00
Sandro La Bruzzo
2129e9caa7
updated pangaea transformation to parse directly the xml
2021-04-28 10:21:03 +02:00
Claudio Atzori
5afa7d3e0c
core utilities in dhp-common moved in external module dhp-schemas
2021-04-27 15:44:01 +02:00
Alessia Bardi
e6075bb917
updated json schema for results - added instances and accessright definition
2021-04-27 15:15:08 +02:00
Claudio Atzori
ac77a245a3
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-04-27 14:05:00 +02:00
Claudio Atzori
f783e60ff7
cleanup
2021-04-27 14:04:50 +02:00
Sandro La Bruzzo
63c0303137
removed unused import, add log
2021-04-27 12:17:23 +02:00
Sandro La Bruzzo
74484d2823
bug fixing
2021-04-27 12:13:44 +02:00
Claudio Atzori
dd2e0a81f4
added dnet45-bootstrap-snapshot and dnet45-bootstrap-release repositories
2021-04-27 12:08:43 +02:00
Claudio Atzori
233d849f90
added dnet45-bootstrap-snapshot and dnet45-bootstrap-release repositories
2021-04-27 12:03:40 +02:00
Claudio Atzori
fcd13f5350
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-04-27 11:37:45 +02:00
Claudio Atzori
4028176559
enabled snapshots from dnet45-snapshots repository
2021-04-27 11:37:32 +02:00
Sandro La Bruzzo
c74b03d59c
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-04-27 11:31:07 +02:00
Sandro La Bruzzo
7f8848ecdd
added first implementation of Pangaea Mapping
2021-04-27 11:30:37 +02:00
Claudio Atzori
27ab8a704d
adjusted poms to align with the external dhp-schema module
2021-04-27 10:12:27 +02:00
Claudio Atzori
a7cf449b36
cleanup
2021-04-27 10:11:26 +02:00
Claudio Atzori
82de6fb634
dhp-schema moved to dedicated module https://code-repo.d4science.org/D-Net/dhp-schemas
2021-04-27 10:10:50 +02:00
Claudio Atzori
fa42026590
fixed PersonCleaner extension functions
2021-04-27 10:10:06 +02:00
Claudio Atzori
ef4bfd82e2
code formatting
2021-04-27 10:09:31 +02:00
Claudio Atzori
faa8f6f4e2
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-04-27 09:57:03 +02:00
miconis
6d5c14e030
assertions updated in entity merger test
2021-04-27 09:47:49 +02:00
Claudio Atzori
c2bb03c8b5
depending on external dhp-schemas module
2021-04-23 17:57:35 +02:00
Claudio Atzori
7ed107be53
depending on external dhp-schemas module
2021-04-23 17:52:36 +02:00
Claudio Atzori
c25238480c
making ODF record parsing namespace unaware ( #6629 )
2021-04-23 17:34:57 +02:00
Claudio Atzori
99cfb027fa
making ODF record parsing namespace unaware ( #6629 )
2021-04-23 17:09:36 +02:00
Miriam Baglioni
72e5aa3b42
refactoring
2021-04-23 12:10:30 +02:00
Miriam Baglioni
4ae6fba01d
refactoring
2021-04-23 12:09:19 +02:00
Miriam Baglioni
7d1b8b7f64
merge upstream
2021-04-23 11:55:49 +02:00
miconis
d0e3366c34
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-04-22 11:45:19 +02:00
miconis
3c12eeadce
bug fix in propagation of relations
2021-04-22 11:44:33 +02:00
Claudio Atzori
e5abbec2ba
[orcid] download of the lambda file defined in a script
2021-04-22 11:22:10 +02:00
Claudio Atzori
55964cbd81
[orcid] large oozie workflow cleanup; updated workflow for the orcidnodoi actionset creation
2021-04-22 10:18:09 +02:00
Claudio Atzori
8f309b72ff
[dedup] using node names consistently across the workflow
2021-04-21 17:54:51 +02:00
Claudio Atzori
52244f813a
merging from enrico.ottonello/dnet-hadoop:orcid-no-doi
2021-04-21 12:24:09 +02:00
Sandro La Bruzzo
fd29307b84
updated workflow name
2021-04-21 09:21:41 +02:00
Claudio Atzori
815b9f4d56
[openorgs dedup] fixed workflow parameter declarations. Introduced support for resuming the execution from intermediate steps
2021-04-20 17:24:45 +02:00
Claudio Atzori
d0d477cca3
code formatting
2021-04-20 12:50:34 +02:00
miconis
0393cdce42
addition of alternative names in export queries
2021-04-20 12:45:21 +02:00
miconis
cadd0a5de8
modification of the queries for openorgs: they now consider also pending orgs
2021-04-20 12:06:56 +02:00
Sandro La Bruzzo
e06c7f32f6
updated id figshare as described in #6377
2021-04-20 10:18:07 +02:00
Sandro La Bruzzo
dbe0d0378e
resolved ticket #6377
2021-04-20 09:44:44 +02:00
Antonis Lempesis
625d993cd9
added step for observatory db
2021-04-20 02:31:06 +03:00
Antonis Lempesis
25d0512fbd
code cleanup
2021-04-20 01:43:23 +03:00
Sandro La Bruzzo
524e5f3092
Improved parallelization on transformation wf on hadoop
2021-04-19 15:17:25 +02:00
Sandro La Bruzzo
cdfe01bbae
improved parallelization on transformation job
2021-04-19 15:14:52 +02:00
Sandro La Bruzzo
3ae67b7a1d
Merge remote-tracking branch 'origin/stable_ids' into stable_ids
2021-04-16 17:36:57 +02:00
Sandro La Bruzzo
a16e5299f9
applied unique function on the final dataset
2021-04-16 17:36:48 +02:00
Claudio Atzori
45057440c1
code formatting
2021-04-16 17:28:25 +02:00
Enrico Ottonello
34ca792a55
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-04-16 17:18:46 +02:00
Enrico Ottonello
27068aacd1
wf to move the orcid-no-doi dataset to the folder ready for the import
2021-04-16 17:17:47 +02:00
miconis
7ad573d023
bug fix: changed join in propagaterelations without applying filter on the id
2021-04-16 16:40:42 +02:00
Sandro La Bruzzo
67085da305
fixed NPE
2021-04-16 11:05:58 +02:00
Sandro La Bruzzo
644aa8f40c
Merge remote-tracking branch 'origin/stable_ids' into stable_ids
2021-04-16 09:14:26 +02:00
Sandro La Bruzzo
7d6a80e2f2
added new type on MAG mapping
2021-04-16 09:14:15 +02:00
Claudio Atzori
8704d32780
code formatting
2021-04-15 16:52:58 +02:00
Claudio Atzori
ba4b4c74d8
do not make the identifier prefix depend on the Handle
2021-04-15 16:48:26 +02:00
Claudio Atzori
906d50563c
Merge pull request 'properly invalidating impala metadata' ( #105 ) from antonis.lempesis/dnet-hadoop:master into master
...
Reviewed-on: #105
2021-04-15 15:06:22 +02:00
Claudio Atzori
3d58f95522
[stats update] properly invalidating impala metadata
2021-04-15 15:03:05 +02:00
Antonis Lempesis
03d36fadea
properly invalidating impala metadata
2021-04-15 13:34:22 +03:00
miconis
f64e57c112
refactoring of the id generation, sparkcreatemergerels collects entities to create root id after a join
2021-04-15 10:59:24 +02:00
miconis
176a5e493d
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-04-14 18:06:34 +02:00
miconis
3525a8f504
id generation of representative record moved to the SparkCreateMergeRel job
2021-04-14 18:06:07 +02:00
Claudio Atzori
745fa92db8
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-04-14 10:14:00 +02:00
Claudio Atzori
083c2959dc
cleanup
2021-04-14 10:13:53 +02:00
Sandro La Bruzzo
3f77bfceb0
fixed test failure on jenkins
2021-04-14 10:03:01 +02:00
Claudio Atzori
3125cef545
code formatting
2021-04-14 09:11:54 +02:00
Sandro La Bruzzo
44a0064df6
Merge remote-tracking branch 'origin/stable_ids' into stable_ids
2021-04-13 17:48:12 +02:00
Sandro La Bruzzo
479abd10cb
Add into ORCID workflow a method that extracts orcid directly to the dump generated by Enrico
2021-04-13 17:47:43 +02:00
Claudio Atzori
710cd1e8f2
Merge pull request 'add xslt, personname cleaner' ( #104 ) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
...
Reviewed-on: #104
LGTM
2021-04-13 14:43:05 +02:00
Claudio Atzori
d1ca025b0b
[cleaning] removing authors without fullname or providing 'deactivated' keyword. Removing test test titles
2021-04-13 14:32:41 +02:00
miconis
1542196a33
bug fix: starting node of duplicate scan wf changed
2021-04-13 10:15:43 +02:00
miconis
369ed1cd8a
bug fix: lookupurl parameter added to dedup record job
2021-04-13 09:08:05 +02:00
Andreas Czerniak
52fbece3b3
Merge branch 'stable_ids' of https://code-repo.d4science.org/andreas.czerniak/BrStableId_dnet-hadoop into stable_ids
2021-04-13 07:05:09 +02:00
Andreas Czerniak
d7614c1f85
introduce new const
2021-04-13 07:04:27 +02:00
Andreas Czerniak
3b694074ff
add xslt, personname cleaner
2021-04-13 07:04:27 +02:00
Claudio Atzori
511c0521e5
[dedup] avoiding NPEs handling OpenOrg relations
2021-04-12 17:45:11 +02:00
Claudio Atzori
72dcadd8e6
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-04-12 17:32:09 +02:00
Claudio Atzori
902d05f548
[cleaning] avoiding NPEs handling null author PIDs
2021-04-12 17:31:40 +02:00
Claudio Atzori
58d013e24f
[maven-release-plugin] prepare for next development iteration
2021-04-12 16:12:15 +02:00
Claudio Atzori
3a7336157b
[maven-release-plugin] prepare release dnet-dedup-4.0.6
2021-04-12 16:12:10 +02:00
miconis
d442e25cbc
bug fix: ids in self mergerels are not marked deletedbyinference=true
2021-04-12 15:56:22 +02:00
miconis
dcff9cecdf
bug fix: ids in self mergerels are not marked deletedbyinference=true
2021-04-12 15:55:27 +02:00
Andreas Czerniak
34df35926c
add xslt, personname cleaner
2021-04-09 14:35:36 +02:00
miconis
11b22b2d23
bug fix in the query, it now exports only relations with non-hidden organizations
2021-04-08 11:51:47 +02:00
miconis
0857100fb8
implementation of the tests for the openorgs integration in the openaire provision
2021-04-07 18:42:16 +02:00
miconis
bf685d849f
addition of pids in the query for the export of openorgs for the provision, addition of ec_fields in the openorgs model
2021-04-07 14:27:43 +02:00
Miriam Baglioni
70e391d427
merge upstream
2021-04-07 10:38:08 +02:00
miconis
eaaefb8b4c
implementation of the procedure to reuse content of different dbs when creating the raw graph
2021-04-06 14:35:51 +02:00
miconis
c39c82dfe9
modification of the jobs for the integration of openorgs in the provision, dedup records are no longer created by merging but simply by taking the results of the openorgs portal
2021-04-06 14:31:00 +02:00
Claudio Atzori
37b65cc3ad
Merge pull request 'updates on stats-update workflow' ( #100 ) from antonis.lempesis/dnet-hadoop:master into master
...
The workflow integrated in the _stable_ids_ branch has been run correctly on the BETA content, thus IMO this PR can be integrated in the master branch.
Reviewed-on: #100
2021-04-02 16:13:35 +02:00
Claudio Atzori
1e7e5180fa
[Graph model] updated definition of ExternalReference: added alternateLabel, removed description ( #6503 )
2021-04-02 12:32:12 +02:00
Claudio Atzori
e686b8de8d
[ORCID-no-doi] integrating PR#98 #98
2021-04-01 17:11:03 +02:00
Claudio Atzori
ee34cc51c3
[ORCID-no-doi] integrating PR#98 #98
2021-04-01 17:07:49 +02:00
Claudio Atzori
70e49ed53c
[OpenOrgsWf] trivial refactoring
2021-04-01 10:30:51 +02:00
Claudio Atzori
7941d7be29
WIP: using common definitions from ModelConstants
2021-03-31 18:33:57 +02:00
Claudio Atzori
879e8cc7ef
WIP: using common definitions from ModelConstants
2021-03-31 17:12:01 +02:00
Claudio Atzori
72ce741ea6
WIP: using common definitions from ModelConstants
2021-03-31 17:07:13 +02:00
Enrico Ottonello
59ec5137e1
improvement related to https://issue.openaire.research-infrastructures.eu/issues/6501
2021-03-31 16:25:41 +02:00
Sandro La Bruzzo
616d2ecce2
split the workflow collecting datacite into two workflows.
...
Released on beta
2021-03-31 15:45:58 +02:00
Miriam Baglioni
4b6e514f02
merge upstream
2021-03-30 10:27:12 +02:00
Claudio Atzori
27681b876c
code formatting
2021-03-29 17:47:11 +02:00
Claudio Atzori
9237d55d7f
[OpenOrgsWf] cleanup
2021-03-29 17:40:34 +02:00
Claudio Atzori
7f4e9479ec
[OpenOrgsWf] graph construction wf: allow to skip the import openorgs node (importOpenorgs true|false)
2021-03-29 16:59:16 +02:00
Claudio Atzori
940556f6d3
Merge pull request 'OpenOrgs dedup and Integration with OpenAIRE Provision' ( #102 ) from openorgswf into stable_ids
...
Reviewed-on: #102
2021-03-29 16:41:09 +02:00
miconis
2709d08fc2
Merge branch 'stable_ids' into openorgswf
2021-03-29 16:39:07 +02:00
miconis
f446580e9f
code refactoring (useless classes and wf removed), implementation of the test for the openorgs dedup
2021-03-29 16:10:46 +02:00
Claudio Atzori
3becaa5539
[Cleaning] drop alternate identifiers with empty values
2021-03-29 16:01:35 +02:00
Claudio Atzori
a0837ac357
[Stats update] integrating PR#100 for testing #100
2021-03-29 15:59:58 +02:00
Claudio Atzori
48f2b6127e
[Cleaning] drop alternate identifiers with empty values
2021-03-29 14:23:18 +02:00
miconis
2355cc4e9b
minor changes and bug fix
2021-03-29 10:07:12 +02:00
Sandro La Bruzzo
1dfda3624e
improved workflow importing datacite
2021-03-26 13:56:29 +01:00
Claudio Atzori
b5b7dc2104
[Cleaning] drop alternate identifiers with empty values
2021-03-26 12:30:00 +01:00
Enrico Ottonello
91d8660982
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-03-25 11:21:20 +01:00
Enrico Ottonello
ebd67b8c8f
removed duplicate orcid data on authors set
2021-03-25 11:20:52 +01:00
Claudio Atzori
827e7e37db
[Cleaning] drop instance.alternateIdentifier elements when they are available among instance.pid
2021-03-25 11:07:59 +01:00
miconis
28c1cdd132
merged stable_ids into openorgswf
2021-03-25 10:44:49 +01:00
miconis
5dfb66b0fa
minor changes
2021-03-25 10:29:34 +01:00
miconis
348b0ef921
bug fix, implementation of the workflow for the creation of raw_organizations (openorgs dedup), addition of the pid lists to the openorgs postgres db
2021-03-24 15:51:27 +01:00
Claudio Atzori
751125fdf9
[Actionmanager] zero function considers empty entity.id as well as rel.source/rel.target
2021-03-23 17:34:32 +01:00
Claudio Atzori
1e423fdc07
[Actionmanager] remove invalid records from the input graph before groupGraphTableByIdAndMerge
2021-03-23 13:39:24 +01:00
Claudio Atzori
e5ebb500cf
fixed pom versions; included missing workflow modules in dhp-workflows/pom.xml
2021-03-23 12:13:53 +01:00
Claudio Atzori
b75ad76f79
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-03-23 09:59:12 +01:00
Claudio Atzori
8db248aa13
avoiding error on jenkins compilations: java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (on a random free port)!
2021-03-23 09:56:34 +01:00
Sandro La Bruzzo
625e4c29c4
added model constants
2021-03-23 09:39:56 +01:00
Claudio Atzori
b4febed138
updated mapping tests as consequence of the special treatment reserved to Handle PIDs
2021-03-23 09:37:48 +01:00
Claudio Atzori
431cbe9955
handle missing instance.pid during bulk cleaning
2021-03-23 09:28:58 +01:00
Sandro La Bruzzo
c392936b97
fixed error on best access right
2021-03-23 09:23:22 +01:00
Sandro La Bruzzo
c73072079d
fix conflicts
2021-03-22 16:36:31 +01:00
Sandro La Bruzzo
098914dcff
fix wrong relation with source null
2021-03-22 11:35:02 +01:00
miconis
0fe40b08e4
addition of deduplication profiles for the results, double check on pids and the title with a lower threshold
2021-03-19 17:12:05 +01:00
miconis
98854b0124
minor changes
2021-03-19 16:57:40 +01:00
Claudio Atzori
5a043e95ea
code formatting
2021-03-19 11:37:27 +01:00
Claudio Atzori
a4e82a65aa
integrated filter applied when merging BETA & PROD graphs to rule out records from Datacite
2021-03-19 11:34:44 +01:00
Claudio Atzori
3256b9c836
code formatting
2021-03-19 09:36:12 +01:00
Claudio Atzori
75144dacb3
Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2021-03-19 09:07:40 +01:00
Claudio Atzori
9588bfba81
[cleaning] entries available as PIDs must not appear as alternateIdentifier
2021-03-19 09:07:30 +01:00
Claudio Atzori
972d5a3d98
[dedup] Datacite should be authoritative for datasets
2021-03-19 09:04:20 +01:00
Sandro La Bruzzo
25d5663d97
added filter
2021-03-18 10:24:42 +01:00
Sandro La Bruzzo
5f98ea74a9
Added fix for pid generation in stableIds
2021-03-17 15:53:24 +01:00
Sandro La Bruzzo
b4805b989d
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-03-17 15:14:59 +01:00
Claudio Atzori
734232d3b9
identifier factory doesn't depend on pre-existing entity.id
2021-03-17 15:14:53 +01:00
Sandro La Bruzzo
76b10090fc
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-03-17 15:06:46 +01:00
Claudio Atzori
a3dac32f16
pidFilter a bit more permissive
2021-03-17 15:06:05 +01:00
Sandro La Bruzzo
2be0428047
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-03-17 14:54:28 +01:00
Claudio Atzori
8257f9a2bc
result.pid: adjusted the mapping applied to the contents from the aggregator
2021-03-17 12:45:38 +01:00
Sandro La Bruzzo
7c97a4d900
Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids
2021-03-17 12:13:03 +01:00
Sandro La Bruzzo
cc5bbafa5d
some fixes to make the workflows run
2021-03-17 12:12:56 +01:00
Claudio Atzori
3b2da86f0a
added precondition on IdentifierFactory to check the presence of entity.id
2021-03-16 17:05:38 +01:00
Claudio Atzori
640b885706
added instance.alternativeIdentifiers to the graph model, adjusted the mapping applied to the contents from the aggregator
2021-03-16 14:19:32 +01:00
Claudio Atzori
61a2551e74
migrated last changes from svn (dnet45)
2021-03-15 17:17:55 +01:00
Claudio Atzori
9cac6da9bd
added AccessRight and OpenAccessRoute classes to ModelSupport.getOafModelClasses()
2021-03-12 16:31:17 +01:00
Antonis Lempesis
0ba0a6b9da
update promote wf to support monitor&production
2021-03-12 16:42:59 +02:00
Antonis Lempesis
60ebdf2dbe
update promote wf to support monitor&production
2021-03-12 16:34:53 +02:00
Antonis Lempesis
236435b470
following redirects
2021-03-12 14:11:21 +02:00
Antonis Lempesis
3c75a05044
fixed a ton of typos
2021-03-12 13:47:04 +02:00
Claudio Atzori
19f3580b3d
introduced java8-based date parsing
2021-03-11 16:46:23 +01:00
Claudio Atzori
d3cb923f24
introduced java8-based date parsing
2021-03-11 12:37:33 +01:00
Sandro La Bruzzo
4bb3bcafa5
add author sequence number
2021-03-11 11:32:32 +01:00
Sandro La Bruzzo
a8e5d0ea0d
updated test and fixed assign of access right
2021-03-11 10:41:24 +01:00
Sandro La Bruzzo
f5e7c57654
Fixed ticket 6282
2021-03-11 10:32:45 +01:00
Claudio Atzori
f74e464942
create bestaccessright as Qualifier
2021-03-10 15:40:05 +01:00
Antonis Lempesis
fa1ec5b5e9
fixed typo...
2021-03-10 14:05:58 +02:00
Claudio Atzori
c801ab6c1d
minor
2021-03-09 17:22:31 +01:00
Claudio Atzori
9917d7e01c
PID authorities include ArXiv
2021-03-09 17:12:52 +01:00
Claudio Atzori
01630f638d
IdentifierFactory implementation based on the list of datasources authoritative for a given pid type
2021-03-09 17:11:50 +01:00
Claudio Atzori
b3f3b895e5
[ #6282 open access status in the Graph] OAStatus renamed as openAccessRoute
2021-03-09 11:41:11 +01:00
Claudio Atzori
765f9bdee7
merged from dhp_oaf_model
2021-03-09 11:37:41 +01:00
Claudio Atzori
59532b0919
[ #6281 Provenance of product PIDs] Added PIDs to the Instance type; extended mapping for OAF/ODF records
2021-03-09 11:14:45 +01:00
Claudio Atzori
d525785497
[ #6282 open access status in the Graph] Result.Instance.accessRight defined with dedicated data type that includes the open access color.
2021-03-09 11:12:55 +01:00
Sandro La Bruzzo
bbe1a7c69a
[ #6281 Provenance of product PIDs] Added PIDs to the Instance type in Scholexplorer Export
2021-03-09 10:46:36 +01:00
Sandro La Bruzzo
a2169ccf07
// implemented Ticket #6281 added pid to Instance in doiBoost
2021-03-09 10:46:36 +01:00
Claudio Atzori
f468c7f0d7
merged from master
2021-03-09 09:12:41 +01:00
Claudio Atzori
76441f4edd
code formatting
2021-03-09 09:12:26 +01:00
Claudio Atzori
8d2bb24512
merged from master
2021-03-08 15:44:34 +01:00
Claudio Atzori
acbe3119a4
RestCollectorPlugin imported from dnet45
2021-03-08 09:44:09 +01:00
Antonis Lempesis
f40c150a0d
fixed steps...
2021-03-06 00:35:57 +02:00
Claudio Atzori
fa7930d2e2
merging contributions from PR#97
2021-03-05 15:45:28 +01:00
Antonis Lempesis
6147ee4950
assigning correctly hive contexts to concepts
2021-03-05 14:12:18 +02:00
Antonis Lempesis
c5fbad8093
Contexts are now downloaded instead of using the stats_ext db
2021-03-04 00:42:21 +02:00
Claudio Atzori
55f6ff5f55
README.md for aggregation workflows
2021-03-03 16:18:34 +01:00
Claudio Atzori
e8789b0cdb
Merge pull request 'stats DB for monitor' ( #99 ) from antonis.lempesis/dnet-hadoop:master into master
...
Looks good to me, just a note on the parsing of the citations: since the last version, IIS produces citations as proper relationships among results. This is what we got already in the BETA graph
```
count       r.reltype     r.subreltype  r.relclass
62.129.254  resultResult  citation      cites
62.043.309  resultResult  citation      isCitedBy
```
Thus, I suggest moving away from the current property-based implementation for the extraction of the citation links and relying on the relationships instead.
2021-03-03 10:29:09 +01:00
Claudio Atzori
ec80b7ade3
code formatting
2021-03-03 10:22:53 +01:00
Claudio Atzori
36f750cd1d
removed unused classes
2021-03-03 10:22:29 +01:00
Claudio Atzori
b73dce3e3a
more logging on the MDStore mongodb client. Forcing UTF_8 encoding on the content
2021-03-03 10:17:16 +01:00
Antonis Lempesis
27796343ca
crude sleep. hardcoded value
2021-03-03 01:37:47 +02:00
Enrico Ottonello
20c0438f11
reformatted code after compile step
2021-03-01 11:07:01 +01:00
Enrico Ottonello
70cb100647
added updating last orcid dataset folders after completion
2021-03-01 10:17:04 +01:00
Enrico Ottonello
bd3b16402b
added result typologies
2021-03-01 10:16:02 +01:00
Claudio Atzori
e76c4f62c1
MetadataRecord moved in dhp-schemas
2021-02-26 10:58:48 +01:00
miconis
1a85020572
bug fix in graph-mapper, changes in the implementation of the openorgs wf to create relations and populate openorgs db
2021-02-26 10:19:28 +01:00
Enrico Ottonello
ca1800510a
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-02-25 18:45:02 +01:00
Enrico Ottonello
53d7023460
dateOfCollection taken from orcid last_update.txt on hdfs; cleaned wf parameters
2021-02-25 18:43:29 +01:00
Claudio Atzori
7df2461ccc
indent XML records collected from oai-pmh endpoints
2021-02-25 16:19:12 +01:00
Enrico Ottonello
d43ea88caf
aligned orcid result typologies with openaire vocabulary
2021-02-25 15:02:10 +01:00
Claudio Atzori
b830e33392
mdstore collector plugin
2021-02-25 12:30:30 +01:00
Claudio Atzori
dc98c39500
more logging
2021-02-25 12:29:18 +01:00
Claudio Atzori
271e88537b
code formatting
2021-02-25 12:28:56 +01:00
Claudio Atzori
9c899f4433
cleanup on transformation functions and the relative tests
2021-02-24 15:07:59 +01:00
Claudio Atzori
fc3fa5e343
implemented mdstore collector plugin
2021-02-24 15:07:24 +01:00
Enrico Ottonello
975823b968
data from last updated orcid
2021-02-23 15:35:04 +01:00
Miriam Baglioni
896919e735
merge upstream
2021-02-23 10:45:29 +01:00
Antonis Lempesis
d90767c733
correctly invalidating metadata
2021-02-19 03:18:47 +02:00
Antonis Lempesis
3681afbe04
typo
2021-02-19 03:04:27 +02:00
Antonis Lempesis
c5502eba8f
actually moved stats computation in impala instead of hive...
2021-02-19 02:54:39 +02:00
Antonis Lempesis
33c85d4e66
moved stats computation in impala instead of hive
2021-02-18 17:23:34 +02:00
Antonis Lempesis
b8e96c8ae7
moved cache update to the end
2021-02-18 16:42:22 +02:00
Antonis Lempesis
bcbfc052b1
fixed last errors in step 21
2021-02-18 16:32:54 +02:00
Antonis Lempesis
10a29a4b9a
fixes in monitor step
2021-02-18 15:05:59 +02:00
Antonis Lempesis
8ef66452d5
fixed typo
2021-02-17 22:24:44 +02:00
Antonis Lempesis
a8836e2f5f
fixed typo
2021-02-17 19:27:07 +02:00
Claudio Atzori
e7eba9f7e7
WIP: transformation workflow error reporting; cleanup
2021-02-17 16:54:08 +01:00
Claudio Atzori
58467aaf1e
WIP: transformation workflow error reporting
2021-02-17 16:14:41 +01:00
Claudio Atzori
cc88701f29
retry for any Socket exception
2021-02-17 16:13:54 +01:00
Antonis Lempesis
a445c1ac3d
fixed variable names in monitor script
2021-02-17 16:45:09 +02:00
Antonis Lempesis
00d516360f
added missing ;
2021-02-17 16:41:10 +02:00
Claudio Atzori
545f8f3e48
using jackson objectmapper instead of GSon to serialise the aggregation report
2021-02-17 12:15:00 +01:00
Claudio Atzori
b592d78bb4
WIP: collectorWorker error reporting, generalised reported implementation
2021-02-17 10:28:01 +01:00
Antonis Lempesis
cd1b794409
added the monitor db wf
2021-02-17 02:11:55 +02:00
Claudio Atzori
cf27905a71
WIP: collectorWorker error reporting, added report messages
2021-02-16 16:53:14 +01:00
Alessia Bardi
bf2830b981
Merge pull request 'manage merging of Relation validation attributes' ( #95 ) from merge_rel_validation into master
2021-02-16 12:14:27 +01:00
Claudio Atzori
6f9864c564
manage merging of Relation validation attributes
2021-02-16 11:53:44 +01:00
Alessia Bardi
32e81c2d89
non validated rel has null value in validated field
2021-02-16 11:01:42 +01:00
Claudio Atzori
58288a95b8
WIP: collectorWorker error reporting, added report messages
2021-02-15 15:28:53 +01:00
Claudio Atzori
1abe6d1ad7
WIP: collectorWorker error reporting, added report messages
2021-02-15 15:08:59 +01:00
Claudio Atzori
523a6bfa97
Merge pull request 'first commit to the correct branch' ( #94 ) from andreas.czerniak/BrAggr_dnet-hadoop:hadoop_aggregator into hadoop_aggregator
...
Looks good to me, thanks Andreas!
2021-02-15 12:15:31 +01:00
Antonis Lempesis
1c029b9fc0
fixed formatting
2021-02-14 03:14:24 +02:00
Antonis Lempesis
2c4dcc90ba
analyzing tables to produce stats
2021-02-14 02:54:55 +02:00
Sandro La Bruzzo
7edcc87ed4
changed xslt behaviour on failure
2021-02-12 17:27:08 +01:00
Sandro La Bruzzo
6a37c7f175
merge fixed
2021-02-12 16:38:47 +01:00
Sandro La Bruzzo
b3f5c2351d
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into hadoop_aggregator
...
Conflicts:
dhp-workflows/dhp-aggregation/src/test/java/eu/dnetlib/dhp/transformation/TransformationJobTest.java
2021-02-12 16:37:14 +01:00
Sandro La Bruzzo
f216277219
Implemented cleaning date
2021-02-12 16:34:52 +01:00
Andreas Czerniak
5a9017cf18
clone, min. changes, test, run
2021-02-12 14:32:36 +01:00
Claudio Atzori
aa55dedb8a
Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator
2021-02-12 12:31:05 +01:00
Claudio Atzori
29c6f7e255
classes related to the collection workflow moved into common package; implemented MongoDB collection plugins
2021-02-12 12:31:02 +01:00
Sandro La Bruzzo
17e6f1934e
fixed NPE on cleaner
2021-02-12 11:48:11 +01:00
Sandro La Bruzzo
ebcc3ec14f
updated wrong datacite identifier in transformation
2021-02-11 16:25:51 +01:00
Michele Artini
83d815d0bc
only stats
2021-02-11 10:57:23 +01:00
Michele Artini
8c836bf930
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2021-02-11 10:54:41 +01:00
Michele Artini
8c1600398a
added resumeFrom parameter
2021-02-11 10:54:16 +01:00
Claudio Atzori
3f8f78cbfb
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2021-02-11 09:36:10 +01:00
Claudio Atzori
b34b5a39ca
index field authoridtypevalue mixes up different author id-type value pairs, dropped in favour of orcidtypevalue
2021-02-11 09:36:04 +01:00
Michele Artini
7249cceb53
switch of 2 nodes
2021-02-11 09:27:08 +01:00
Claudio Atzori
73393d3c4d
Merge pull request 'validatedLinksToProjects' ( #93 ) from validatedLinksToProjects into master
...
LGTM
2021-02-10 12:32:35 +01:00
Alessia Bardi
986dd969d3
use the proper import for Lists
2021-02-10 12:03:54 +01:00
miconis
4b2124a18e
implementation of the openorgs wfs, implementation of the raw_all wf to migrate openorgs db entities
2021-02-10 11:51:50 +01:00
Alessia Bardi
c4d1feca74
mapper test with validated link to project
2021-02-10 11:22:54 +01:00
Alessia Bardi
09fc7e2f78
serialization of validated flag on relationships
2021-02-10 11:22:09 +01:00
Enrico Ottonello
ee4ba7298b
fix last update read/write from file on hdfs
2021-02-09 23:24:57 +01:00
Claudio Atzori
bc458d1b54
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2021-02-09 16:27:30 +01:00
Claudio Atzori
82e6c50f3f
updated solr fields (authoridtypevalue, resultsubject, resultresourcetypename)
2021-02-09 16:27:04 +01:00
Claudio Atzori
62bd3c53ee
Merge branch 'master' into provision_indexing
2021-02-09 15:46:26 +01:00
Claudio Atzori
bae029f828
collection_java_xmx allows declaring the heap size allocated for the java actions involved in the metadata collection workflow
2021-02-08 18:07:23 +01:00
Claudio Atzori
bebc54d5bf
seq file storing native records is now compressed
2021-02-08 18:06:25 +01:00
Claudio Atzori
50add4c61b
added requestDelay to HttpConnector2 configuration; Aggregation workflow constants moved in dhp-common
2021-02-08 12:19:38 +01:00
Miriam Baglioni
2f5e6647c6
merge upstream
2021-02-08 10:33:11 +01:00
Claudio Atzori
40df0f987d
better logging, WIP: collectorWorker error reporting; common functions moved in DHPUtils
2021-02-06 20:12:00 +01:00
Claudio Atzori
a8a758925e
better logging, WIP: collectorWorker error reporting
2021-02-05 19:18:05 +01:00
Michele Artini
2ee0c3e47e
http entity as json string
2021-02-05 09:45:39 +01:00
Claudio Atzori
730973679a
Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator
2021-02-04 17:25:00 +01:00
Claudio Atzori
deb85706db
imported HttpConnector from https://svn.driver.research-infrastructures.eu/driver/dnet45/modules/dnet-modular-collector-service/trunk/src/main/java/eu/dnetlib/data/collector/plugins/HttpConnector.java as HttpConnector2
2021-02-04 17:24:52 +01:00
Sandro La Bruzzo
4dae5e605d
implemented messaging between collection worker and Dnet
2021-02-04 15:51:15 +01:00
Claudio Atzori
72c57b28fa
switched project version to 1.2.4-branch_hadoop_aggregator-SNAPSHOT
2021-02-04 14:08:18 +01:00
Claudio Atzori
40764cf626
better logging, WIP: collectorWorker error reporting
2021-02-04 14:06:02 +01:00
Enrico Ottonello
c238561001
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2021-02-04 10:44:21 +01:00
Enrico Ottonello
465ce39f75
job execution now based on file last_update.txt on hdfs
2021-02-04 10:44:04 +01:00
Sandro La Bruzzo
69c253710b
fixed test
2021-02-04 10:30:49 +01:00
Michele Artini
3ea8c328ac
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into hadoop_aggregator
2021-02-04 09:46:13 +01:00
Michele Artini
26d2eb946f
messages sender
2021-02-04 09:45:46 +01:00
Claudio Atzori
4758b58aa2
Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator
2021-02-03 17:58:29 +01:00
Claudio Atzori
e04045089f
better logging, WIP: collectorWorker error reporting
2021-02-03 17:58:22 +01:00
Alessia Bardi
c67329d3ad
updated test for EU Open Data portal datasets
2021-02-03 17:06:48 +01:00
Michele Artini
1b9731632b
Message Sender
2021-02-03 16:42:36 +01:00
Michele Artini
820d729e99
recover of Message and MessageType class
2021-02-03 16:20:34 +01:00
Michele Artini
33f4696d6e
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into hadoop_aggregator
2021-02-03 16:08:21 +01:00
Michele Artini
c286d28ad2
logs
2021-02-03 16:07:49 +01:00
Claudio Atzori
0e8a4f9f1a
better logging, WIP: collectorWorker error reporting
2021-02-03 12:33:41 +01:00
Alessia Bardi
fd705404a1
tests for EU Open Data portal dataset mapping
2021-02-03 10:28:17 +01:00
Miriam Baglioni
6190465851
merge upstream
2021-02-03 10:27:27 +01:00
Claudio Atzori
53884d12c2
code formatting
2021-02-02 14:38:03 +01:00
Claudio Atzori
ac46c247d2
code formatting
2021-02-02 14:24:00 +01:00
Claudio Atzori
bde14b149a
fixed transformation target paths
2021-02-02 12:49:29 +01:00
Claudio Atzori
ca4391aa1c
minor changes
2021-02-02 12:44:04 +01:00
Claudio Atzori
bb89b99b24
code formatting
2021-02-02 12:34:14 +01:00
Claudio Atzori
75807ea5ae
factored out constants
2021-02-02 12:28:21 +01:00
Sandro La Bruzzo
4ed1e306b6
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into hadoop_aggregator
2021-02-02 12:12:51 +01:00
Sandro La Bruzzo
0634674add
implemented transformation test
2021-02-02 12:12:14 +01:00
Claudio Atzori
d62ea1490d
cleaned up RabbitMQ stuff
2021-02-02 10:53:19 +01:00
Claudio Atzori
73d772a4b4
added method to list the known vocabulary names
2021-02-02 10:39:47 +01:00
Claudio Atzori
8eaa1fd4b4
WIP: metadata collection in INCREMENTAL mode and relative test
2021-02-01 19:29:10 +01:00
Sandro La Bruzzo
bead34d11a
code refactor
2021-02-01 14:58:06 +01:00
Sandro La Bruzzo
6ff234d81b
Implemented a first prototype of incremental harvesting and transformation using readlock
2021-02-01 13:56:05 +01:00
Sandro La Bruzzo
b6b835ef49
update transformation Factory to get Transformation Rule by Id and not by Title
2021-02-01 08:49:42 +01:00
Sandro La Bruzzo
e423634cb6
RollBack in case of error WORKS!!!
2021-01-29 17:21:42 +01:00
Sandro La Bruzzo
8ee82576c6
Collection on Refresh WORKS!!!
2021-01-29 17:02:46 +01:00
Sandro La Bruzzo
0276180039
WIP mdstore
...
transaction implemented on hadoop side
2021-01-29 16:42:41 +01:00
Michele Artini
d942d0c77d
methods toString(), hashCode() and equals()
2021-01-29 13:16:48 +01:00
Sandro La Bruzzo
0f8e2ecce6
Merged Datacite transform into this branch
2021-01-29 10:45:07 +01:00
Sandro La Bruzzo
99cf3a8ea4
Merged Datacite transform into this branch
2021-01-28 16:34:46 +01:00
Sandro La Bruzzo
2da8bf7429
Merge pull request 'aggregation_on_hadoop' ( #91 ) from sandro.labruzzo/dnet-hadoop:aggregation_on_hadoop into hadoop_aggregator
...
ok
2021-01-28 10:06:49 +01:00
Sandro La Bruzzo
686e7b507c
Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into aggregation_on_hadoop
2021-01-28 10:02:13 +01:00
Sandro La Bruzzo
98b9498b57
Removed old messaging system not quite used from collection and Transformation workflow
...
code refactor
2021-01-28 09:51:17 +01:00
Michele Artini
38f2508c87
new fields in mdstore beans
2021-01-28 08:24:45 +01:00
Sandro La Bruzzo
184e7b3856
Implemented new Transformation using spark
2021-01-27 15:43:08 +01:00
Sandro La Bruzzo
150a617bd1
Merge pull request 'aggregation_on_hadoop' ( #90 ) from sandro.labruzzo/dnet-hadoop:aggregation_on_hadoop into hadoop_aggregator
...
Wonderful code... You're the best, Sandro
2021-01-26 16:00:47 +01:00
Claudio Atzori
f1a852f278
align usage-stats workflow poms with latest snapshot version
2021-01-26 15:42:42 +01:00
Claudio Atzori
9c32119dc2
Merge pull request 'usage-stats-export-wf-v2' ( #89 ) from dimitris.pierrakos/dnet-hadoop:usage-stats-export-wf-v2 into master
...
Thank you Dimitris!
2021-01-26 15:01:41 +01:00
Claudio Atzori
885e0dd926
[Cleaning] filter authors not providing word characters in the fullname
2021-01-26 09:48:53 +01:00
Claudio Atzori
2890511613
[Cleaning] normalise missing Result.country
2021-01-26 09:41:44 +01:00
Claudio Atzori
4eb9ed35b1
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2021-01-25 18:12:24 +01:00
Claudio Atzori
cd379eb5e3
[Cleaning] trying to avoid NPEs, this time by ruling out authors without a defined fullname
2021-01-25 18:11:49 +01:00
Alessia Bardi
505477f36f
format code
2021-01-25 18:02:49 +01:00
Alessia Bardi
ded6ed8d7d
no ',' author, if there are no authors in ODF records
2021-01-25 17:57:51 +01:00
Claudio Atzori
3465c8ccee
[Cleaning] trying to avoid NPEs
2021-01-25 16:54:53 +01:00
Sandro La Bruzzo
a54848a59c
Moved Vocabulary stuff to common module
2021-01-25 15:43:04 +01:00
Sandro La Bruzzo
ffb092b8d3
removed duplicate code HttpConnector.java
2021-01-25 15:05:37 +01:00
Sandro La Bruzzo
cda210a2ca
changed documentation since it didn't reflect the current status
2021-01-25 14:17:42 +01:00
Claudio Atzori
07a0ccfc96
[Cleaning] trying to avoid NPEs
2021-01-25 13:36:01 +01:00
miconis
c7e2d5a59a
minor changes
2021-01-25 12:40:45 +01:00
Claudio Atzori
646dab7f68
trying to avoid NPEs
2021-01-22 18:24:34 +01:00
Claudio Atzori
34d653de41
[Cleaning] updated cleaning rule for DOIs
2021-01-22 14:16:33 +01:00
Miriam Baglioni
fe36895c53
added datasource blacklist for the organization to result propagation through institutional repositories
2021-01-22 11:55:10 +01:00
miconis
8fea29177c
refactoring, minor changes and implementation of the wf for openorgs with integration of organization phases into the scan wf
2021-01-18 16:48:08 +01:00
Dimitris
3e8d2a6b2d
Clean workflows
2021-01-15 16:19:12 +02:00
Michele Artini
f667e94a31
Merge pull request 'broker' ( #88 ) from broker into master
2021-01-14 14:48:13 +01:00
Michele Artini
cfbcdc95bc
fixed a wf param
2021-01-14 14:45:23 +01:00
Michele Artini
69ba3203c0
fixed a conflict
2021-01-14 14:43:25 +01:00
Michele Artini
fafb5b2e08
Merge branch 'broker' of code-repo.d4science.org:D-Net/dnet-hadoop into broker
2021-01-14 14:32:42 +01:00
Michele Artini
b230d44411
fixed conflict
2021-01-14 14:32:31 +01:00
Michele Artini
b9d90e95b8
Added eventId to ShortEventMessage
2021-01-14 14:32:31 +01:00
Michele Artini
64b0b0bfb3
fixed a bug with invalid subject topic
2021-01-14 14:32:31 +01:00
Michele Artini
e3e0ab1de1
fixed a problem with join
2021-01-14 14:32:31 +01:00
Michele Artini
26a941315a
openaireId
2021-01-14 14:32:31 +01:00
Michele Artini
6f4d1a37f0
ES wf properties
2021-01-14 14:32:31 +01:00
Michele Artini
1391341d06
mkdir of output dir
2021-01-14 14:32:31 +01:00
Michele Artini
3c9cbd19f3
whitelist of topics
2021-01-14 14:32:31 +01:00
Michele Artini
467aa77279
workingDir and outputDir
2021-01-14 14:32:31 +01:00
Michele Artini
10f3f7eca7
workingDir and outputDir
2021-01-14 14:32:31 +01:00
Michele Artini
ff41a7b3a4
gzipped output
2021-01-14 14:32:31 +01:00
Michele Artini
223fa660cb
fixed conflict
2021-01-14 14:23:44 +01:00
Michele Artini
ac91e495fc
Added eventId to ShortEventMessage
2021-01-14 13:20:35 +01:00
Claudio Atzori
80cf55ef2e
[Broker] fixed partitionEventsByOpendoarIds workflow parameter names
2021-01-13 16:24:30 +01:00
Claudio Atzori
41500669e2
[BIP! Scores integration] merged missing classes from bipFinder branch
2021-01-11 14:39:47 +01:00
Claudio Atzori
2a7a10809e
[BIP! Scores integration] merged missing classes from bipFinder branch
2021-01-11 10:05:02 +01:00
Claudio Atzori
5bd999efe7
Merge pull request 'bipFinder_master_test' ( #84 ) from bipFinder_master_test into master
2021-01-08 18:16:34 +01:00
Claudio Atzori
d6686dd7cf
merged from master
2021-01-08 18:16:12 +01:00
Claudio Atzori
34229970e6
[BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation
2021-01-08 16:29:17 +01:00
Claudio Atzori
1361c9eb0c
[BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation
2021-01-07 10:07:30 +01:00
Claudio Atzori
ab2fe9266a
[DOIBoost] minor fixes in workflow definition
2021-01-05 10:26:39 +01:00
Claudio Atzori
7c722f3fdc
[DOIBoost] fixed typo
2021-01-05 10:25:54 +01:00
Claudio Atzori
8879704ba0
[DOIBoost] configurable ES server url and index name in crossref importer
2021-01-05 10:00:13 +01:00
Claudio Atzori
26e9d55c13
code formatting
2021-01-05 09:59:26 +01:00
Sandro La Bruzzo
7834a35768
avoid saving intermediate dataset before generation of Sequence file
2021-01-04 17:54:57 +01:00
Sandro La Bruzzo
e79445a8b4
minor fix addressing Claudio's objection
2021-01-04 17:39:25 +01:00
Sandro La Bruzzo
8765020b85
minor fix
2021-01-04 17:37:08 +01:00
Sandro La Bruzzo
b0dc92786f
defined a single oozie workflow for the generation of doiboost
2021-01-04 17:01:35 +01:00
Claudio Atzori
7185158942
ignore missing properties
2020-12-29 11:06:28 +01:00
Claudio Atzori
28460c2cd1
using com.fasterxml.jackson.databind.ObjectMapper instead of org.codehaus.jackson.map.ObjectMapper
2020-12-23 16:59:52 +01:00
Claudio Atzori
60649ac7d2
swapped expected and actual in tests, updated expected number of authors
2020-12-23 12:26:04 +01:00
Claudio Atzori
723b01f9e9
trivial: the less magic numbers and values around, the better
2020-12-23 12:22:48 +01:00
Claudio Atzori
6848d0c3d7
trivial: avoid duplicated code
2020-12-23 12:21:58 +01:00
Claudio Atzori
d8b5f43a7e
code formatting
2020-12-22 14:59:03 +01:00
Claudio Atzori
7bfc35df5e
Merge pull request 'Changed typo in script names' ( #82 ) from antonis.lempesis/dnet-hadoop:master into master
...
no need to! :)
2020-12-22 12:36:21 +01:00
Antonis Lempesis
be5969a8c2
Changed typo in script names
2020-12-22 13:33:32 +02:00
miconis
794e22b09c
bug fix in the authormerge: now authors with higher size have priority, normalization of author name fixed
2020-12-21 17:51:42 +01:00
miconis
1e1aab83e3
implementation of the raw wf for openorgs: still not complete, some functionalities are missing
2020-12-21 11:58:21 +01:00
Claudio Atzori
6cb0dc3f43
extended ORCID cleaning procedure
2020-12-21 11:40:17 +01:00
Claudio Atzori
573a8a3272
Merge pull request 'Changed typo in script names' ( #81 ) from antonis.lempesis/dnet-hadoop:master into master
...
ok! LGTM
2020-12-18 17:44:26 +01:00
Antonis Lempesis
2a074c3b2b
Changed typo in script names
2020-12-18 18:40:48 +02:00
Claudio Atzori
47270d9af5
lenient mock can be lenient
2020-12-18 15:38:59 +01:00
Claudio Atzori
2e503ee101
code formatting
2020-12-17 13:47:38 +01:00
Claudio Atzori
5a3e2199b2
Merge pull request 'Creation of the action set to include the bipFinder! score' ( #80 ) from miriam.baglioni/dnet-hadoop:bipFinder into bipFinder_master_test
2020-12-17 12:26:38 +01:00
Claudio Atzori
03319d3bd9
Revert "Merge pull request 'Creation of the action set to include the bipFinder! score' ( #62 ) from miriam.baglioni/dnet-hadoop:bipFinder into master"
...
This reverts commit add7e1693b, reversing changes made to f9a8fd8bbd.
2020-12-17 12:23:58 +01:00
Claudio Atzori
add7e1693b
Merge pull request 'Creation of the action set to include the bipFinder! score' ( #62 ) from miriam.baglioni/dnet-hadoop:bipFinder into master
2020-12-17 12:09:03 +01:00
Alessia Bardi
f9a8fd8bbd
updated test record for textgrid
2020-12-17 11:59:45 +01:00
Claudio Atzori
4766495f5b
[orcid_to_result_from_semrel_propagation] fixed typo in SQL
2020-12-17 09:15:50 +01:00
Claudio Atzori
de00094ebc
Merge pull request 'FIX on the creation of subject based broker enrichments' ( #79 ) from broker into master
2020-12-15 14:58:31 +01:00
Michele Artini
f9dc1e45fd
fixed a bug with invalid subject topic
2020-12-15 14:54:11 +01:00
Sandro La Bruzzo
f92bd56f56
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-12-15 11:47:29 +01:00
Sandro La Bruzzo
1f6c8a9e83
added orcid_pending type to records coming from Crossref
2020-12-15 11:47:15 +01:00
Enrico Ottonello
b2de598c1a
all actions from download lambda file to merge updated data into one wf
2020-12-15 10:42:55 +01:00
Claudio Atzori
9f1181290e
Merge pull request 'broker' ( #78 ) from broker into master
...
The changes look good to me.
2020-12-15 10:03:45 +01:00
Claudio Atzori
6299f75807
Merge pull request 'validation in claim rels' ( #77 ) from claims_validation into master
...
LGTM
2020-12-15 09:28:24 +01:00
Michele Artini
0a0f62bd01
Merge branch 'master' into broker
2020-12-15 08:30:52 +01:00
Michele Artini
12fa5d122a
fixed a problem with join
2020-12-15 08:30:26 +01:00
Michele Artini
991e675dc6
validation in claim rels
2020-12-14 15:41:25 +01:00
Michele Artini
3e19cf7b4a
openaireId
2020-12-14 15:24:33 +01:00
Claudio Atzori
b6f08ce226
re-adding the old junit:junit dep as solr-test-framework needs it
2020-12-14 15:07:31 +01:00
Claudio Atzori
e8ef8c63d4
delegate merging of OafEntity.dataInfo to the implementation of subclasses
2020-12-14 15:04:44 +01:00
Claudio Atzori
7d325e2c57
using actual result subclasses instead of their parent class
2020-12-14 14:40:54 +01:00
Claudio Atzori
152916890f
renamed test name
2020-12-14 14:40:05 +01:00
Michele Artini
a203aee32a
ES wf properties
2020-12-14 12:02:33 +01:00
Claudio Atzori
1506f49052
Xml record serialization for author PIDs: 1) only one value per PID type is allowed; 2) orcid prevails over orcid_pending
2020-12-14 11:14:03 +01:00
Michele Artini
d03756c962
mkdir of output dir
2020-12-14 11:11:41 +01:00
Michele Artini
399548f221
whitelist of topics
2020-12-14 11:03:55 +01:00
Michele Artini
38da1c282a
Merge branch 'master' into broker
2020-12-14 09:14:02 +01:00
Dimitris
dc9c2f3272
Commit 12122020
2020-12-12 12:00:14 +02:00
Enrico Ottonello
efe4c2a9c5
authors and works are now updated in two separate spark actions of the wf
2020-12-12 02:06:21 +01:00
Enrico Ottonello
858efbfad1
fix dataset creation for downloaded works
2020-12-11 16:49:54 +01:00
Claudio Atzori
61cd129ded
XML serialisation test
2020-12-11 12:44:53 +01:00
Claudio Atzori
ce7a319e01
using the correct assertion import
2020-12-11 12:44:17 +01:00
Claudio Atzori
7fe2433137
excluded transitive older junit dependencies, they can compromise the unit test executions
2020-12-11 12:42:55 +01:00
Claudio Atzori
d9532446eb
imported more diffs from master branch; code formatting
2020-12-10 16:14:16 +01:00
Claudio Atzori
1eaad89a3c
do not fail on unknown properties when grouping entities by ID
2020-12-10 15:56:11 +01:00
Michele Artini
933b4c1ada
workingDir and outputDir
2020-12-10 14:47:51 +01:00
Michele Artini
2e7df07328
workingDir and outputDir
2020-12-10 14:47:22 +01:00
Michele Artini
94bfed1c84
gzipped output
2020-12-10 11:59:28 +01:00
Claudio Atzori
3c10941376
Merge pull request 'bipFinder_resolve_conflicts' ( #73 ) from bipFinder_resolve_conflicts into stable_ids
2020-12-10 11:00:46 +01:00
Claudio Atzori
12e2f930c8
resolved conflicts
2020-12-10 10:57:39 +01:00
Miriam Baglioni
b7adbc7c3e
merge branch with master
2020-12-10 10:35:27 +01:00
Alessia Bardi
112da6d76a
in theory, just auto-formatting after mvn compile
2020-12-09 20:00:27 +01:00
Alessia Bardi
bece04b330
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-12-09 19:54:43 +01:00
Alessia Bardi
426b76ee8e
more asserts for TextGrid record
2020-12-09 19:46:11 +01:00
Claudio Atzori
ff72fcd91a
allow orcid_pending to percolate to the XML graph serialization
2020-12-09 19:04:50 +01:00
Claudio Atzori
4705144918
Merge pull request 'rel_project_validation' ( #69 ) from rel_project_validation into master
...
LGTM
2020-12-09 19:01:20 +01:00
Claudio Atzori
211aa04726
allow orcid_pending to percolate to the XML graph serialization
2020-12-09 18:08:51 +01:00
Claudio Atzori
db4e400a0b
introduced Oaf.mergeFrom method to allow merging of dataInfo(s), with prevalence based on datainfo.trust
2020-12-09 18:01:45 +01:00
Claudio Atzori
ada21ad920
Merge pull request 'dump of the results related to at least one project' ( #61 ) from miriam.baglioni/dnet-hadoop:dump into master
...
LGTM
2020-12-09 17:22:56 +01:00
Miriam Baglioni
6fbc67a959
using ModelConstant.ORCID and removing not used constants
2020-12-09 17:10:20 +01:00
Claudio Atzori
3c5ce1dada
code formatting
2020-12-09 17:07:20 +01:00
Miriam Baglioni
212b52614f
added graph mapper versus community result without context and project in common to be used for the doiboost mapping
2020-12-09 16:59:02 +01:00
Michele Artini
1bc9adc10d
default trust for validated rels
2020-12-09 16:18:37 +01:00
Claudio Atzori
fcd7689b50
promote actions: shouldGroupById parameter marked as optional (default is true)
2020-12-09 13:10:16 +01:00
Michele Artini
5f21a356fd
reindent
2020-12-09 11:24:30 +01:00
Michele Artini
370a5e650b
validation attributes in resultProject relations
2020-12-09 11:18:26 +01:00
Antonis Lempesis
aead9efd24
added the new parameter (stats_tool_api_url) in the workflow parameters
2020-12-09 10:45:24 +01:00
Antonis Lempesis
77a3a6d82e
added the new parameter (stats_tool_api_url) in the workflow parameters
2020-12-09 10:45:24 +01:00
Antonis Lempesis
91226117b3
ignoring deletedbyinference relations
2020-12-09 10:45:24 +01:00
Antonis Lempesis
b7f29db126
finished first implementation of wf
2020-12-09 10:45:24 +01:00
Antonis Lempesis
ded2392275
initial implementation of the promote wf
2020-12-09 10:45:24 +01:00
Antonis Lempesis
1a87a1effd
added last step to update cache
2020-12-09 10:45:24 +01:00
Michele Artini
75bf708351
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-12-09 10:31:33 +01:00
Michele Artini
620e1307a3
indentation
2020-12-09 10:30:47 +01:00
Enrico Ottonello
2233750a37
original orcid xml data are stored in a field of the class that models orcid data
2020-12-09 09:45:19 +01:00
Claudio Atzori
491ad24750
introduced filtering for DOIs in graph cleaning workflow
2020-12-09 09:10:33 +01:00
Claudio Atzori
27e96767e0
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-12-07 21:53:22 +01:00
Claudio Atzori
fba11eef2a
cleanup
2020-12-07 21:53:13 +01:00
Claudio Atzori
2fcc24b36e
code formatting
2020-12-07 21:52:32 +01:00
Claudio Atzori
197f286fa4
removed duplicated dependency (org.apache.httpcomponents:httpclient)
2020-12-07 21:52:17 +01:00
Sandro La Bruzzo
7f8b93de72
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-12-07 19:59:39 +01:00
Sandro La Bruzzo
302baab67b
fixed doiboost mapping and workflows
2020-12-07 19:59:33 +01:00
Enrico Ottonello
5c65e602d3
wf doi_authors generates one json record for each row
2020-12-07 15:28:10 +01:00
Michele Artini
d6934f370e
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-12-07 14:56:23 +01:00
Michele Artini
5de8a7276f
wf to partition opendoar events
2020-12-07 14:56:06 +01:00
Claudio Atzori
5e8509bef7
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-12-07 13:50:08 +01:00
Claudio Atzori
026ad40633
disabled test
2020-12-07 13:50:01 +01:00
Claudio Atzori
21ddcf3a73
actions promotion can optionally avoid grouping objects by id (configured via shouldGroupById parameter)
2020-12-07 13:45:18 +01:00
Enrico Ottonello
fa1855a4b8
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-12-07 11:02:59 +01:00
Enrico Ottonello
b1b589ada1
wf to generate orcid dataset
2020-12-07 11:02:32 +01:00
Sandro La Bruzzo
620e585b63
Merge branch 'master' of code-repo.d4science.org:D-Net/dnet-hadoop
2020-12-07 10:42:53 +01:00
Sandro La Bruzzo
b31dd126fb
fixed crossref workflow added common ORCID Class
2020-12-07 10:42:38 +01:00
Enrico Ottonello
8812ab65e1
completed download function to wf; added accumulators
2020-12-04 21:13:49 +01:00
Claudio Atzori
a104a632df
cleanup
2020-12-04 16:32:47 +01:00
miconis
ed0d5d3e1d
implementation of the wf to dedup entities, addition of the module to run the wf on the cluster
2020-12-04 15:41:31 +01:00
Claudio Atzori
5b4e1142a8
Merge pull request 'added last step to update cache' ( #64 ) from antonis.lempesis/dnet-hadoop:master into master
...
Looks good to me, thanks!
2020-12-04 14:42:31 +01:00
Antonis Lempesis
b1ed1afdcc
added the new parameter (stats_tool_api_url) in the workflow parameters
2020-12-04 13:07:18 +02:00
Antonis Lempesis
7cb113e088
added the new parameter (stats_tool_api_url) in the workflow parameters
2020-12-04 13:04:25 +02:00
Antonis Lempesis
d23ccae0d5
ignoring deletedbyinference relations
2020-12-04 12:42:17 +02:00
Miriam Baglioni
5fb65ffc4a
merge branch with master
2020-12-03 11:24:35 +01:00
Miriam Baglioni
ea88dc3401
fixed issue in property name
2020-12-03 11:24:23 +01:00
Miriam Baglioni
4c58bd1c93
merge with upstream
2020-12-03 11:24:00 +01:00
Miriam Baglioni
05c452f58d
merge with upstream
2020-12-03 10:26:45 +01:00
Enrico Ottonello
53b22c1937
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-12-02 23:21:27 +01:00
Enrico Ottonello
1b1e9ea67c
wf to generate doi_author_list for doiboost; wf to download updated works
2020-12-02 23:20:16 +01:00
Antonis Lempesis
413afcfed5
finished first implementation of wf
2020-12-02 15:57:17 +02:00
Antonis Lempesis
0948536614
initial implementation of the promote wf
2020-12-02 15:41:56 +02:00
Sandro La Bruzzo
7da679542f
fixed wrong projectId
2020-12-02 14:28:09 +01:00
Sandro La Bruzzo
6ba8037cc7
fixed failure to test due to changing of input
2020-12-02 11:34:46 +01:00
Claudio Atzori
cfb55effd9
code formatting
2020-12-02 11:23:49 +01:00
Claudio Atzori
74242e450e
using constants from ModelConstants
2020-12-02 11:23:35 +01:00
Miriam Baglioni
d5efa6963a
using constants in ModelConstants
2020-12-02 11:20:26 +01:00
Claudio Atzori
873c358d1d
Merge pull request 'added extension for new author pid (orcid_pending)' ( #63 ) from miriam.baglioni/dnet-hadoop:master into master
...
LGTM
2020-12-02 11:15:00 +01:00
Miriam Baglioni
cd285e98bc
using the constants defined in the ModelConstants class
2020-12-02 11:13:23 +01:00
Miriam Baglioni
51c582c08c
added orcid class name among the constants set
2020-12-02 11:12:54 +01:00
Miriam Baglioni
4b0d1530a2
merge upstream
2020-12-02 11:05:00 +01:00
Claudio Atzori
faa977df7e
Merge pull request 'orcid-no-doi' ( #43 ) from enrico.ottonello/dnet-hadoop:orcid-no-doi into master
...
The dataset was generated and is now part of the actionsets available in BETA
2020-12-02 10:55:12 +01:00
Claudio Atzori
57f448b7a4
graph cleaning workflow separate orcid_pending from orcid, depending on the author pid provenance
2020-12-02 10:44:05 +01:00
Alessia Bardi
2d15667b4a
testing XML generation from json object (case AMS ACTA)
2020-12-02 10:16:26 +01:00
Alessia Bardi
a417624670
tests for raw graph mapping
2020-12-02 10:15:26 +01:00
Claudio Atzori
943b961cf6
introduced PidBlacklist
2020-12-02 09:30:34 +01:00
Claudio Atzori
893ac4a77b
GenerateEntitiesApplication can be configured to hash the id value or not
2020-12-02 09:30:06 +01:00
Miriam Baglioni
f8468c9c22
added extension for new author pid (orcid_pending)
2020-12-01 20:09:35 +01:00
Miriam Baglioni
888175baf7
added java doc
2020-12-01 18:36:29 +01:00
Miriam Baglioni
3d62d99d5d
fixed issue in workflow variable
2020-12-01 15:02:49 +01:00
Miriam Baglioni
17680296b9
removed unnecessary variable and unused method
2020-12-01 15:02:31 +01:00
Miriam Baglioni
5b3ed70808
refactoring
2020-12-01 14:31:34 +01:00
Miriam Baglioni
62ff4999e3
added workflow and last step of collection and save
2020-12-01 14:30:56 +01:00
Miriam Baglioni
45d06c45c7
collecting all the atomic actions for result type and save them all in the AS path
2020-12-01 14:29:18 +01:00
Miriam Baglioni
0051ebede5
extending test
2020-12-01 12:43:03 +01:00
Miriam Baglioni
719da15f04
added test resources
2020-12-01 12:42:30 +01:00
Miriam Baglioni
e819155eb2
added implements Serializable
2020-12-01 09:51:58 +01:00
Miriam Baglioni
db36e11912
classes test classes and resources for production of the actionset to include bipFinder score in results
2020-11-30 20:14:23 +01:00
Claudio Atzori
349e7246aa
do not consider NCID, GBIF as PIDs candidate for the ID creation
2020-11-30 16:52:40 +01:00
Enrico Ottonello
f2df3ead74
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-30 14:22:46 +01:00
Enrico Ottonello
40c4559e92
added datainfo on authors pid with "sysimport:crosswalk:entityregistry",
2020-11-30 14:19:22 +01:00
Claudio Atzori
2c407e775e
GenerateEntitiesApplication can be configured to hash the id value or not
2020-11-30 12:00:38 +01:00
Antonis Lempesis
815d6b25d9
added last step to update cache
2020-11-30 00:48:10 +02:00
Claudio Atzori
758d27745d
cleaning tab characters from text fields
2020-11-27 16:07:24 +01:00
Claudio Atzori
596a2a459d
added testing class for OafMapperUtils
2020-11-27 12:01:11 +01:00
Claudio Atzori
e731a7658d
cleaning texts to remove tab characters too
2020-11-27 09:00:04 +01:00
Claudio Atzori
fa66e5b6b8
ResultTypeComparator gives priority to Records collectedfrom Crossref
2020-11-26 13:09:19 +01:00
Claudio Atzori
5151850a19
CROSSREF and DATACITE constants moved in common ModelConstants
2020-11-26 13:08:36 +01:00
Claudio Atzori
a104d2b6ad
cleanup
2020-11-26 11:12:00 +01:00
Claudio Atzori
d0d5525d40
minor changes
2020-11-26 11:04:17 +01:00
Claudio Atzori
13eae4b31e
GroupEntitiesSparkJob must read all graph paths but relations
2020-11-26 11:04:01 +01:00
Claudio Atzori
76363a8512
SimpleDateFormat is not thread safe; improved error reporting in case of invalid dates
2020-11-26 11:03:12 +01:00
Claudio Atzori
c1b9a4045a
grouping of records will be performed by the dedup workflow
2020-11-26 10:59:10 +01:00
Miriam Baglioni
124591a7f3
refactoring
2020-11-25 18:23:28 +01:00
Miriam Baglioni
1a89f8211c
#61 (comment)
2020-11-25 18:12:40 +01:00
Miriam Baglioni
5fbe54ef54
#61 (comment)
2020-11-25 18:10:28 +01:00
Miriam Baglioni
ed01e5a5e1
#61 (comment)
2020-11-25 18:09:34 +01:00
Miriam Baglioni
d4ddde2ef2
changed because of #61 (comment)
2020-11-25 18:01:01 +01:00
Miriam Baglioni
f5e5e92a10
changed because of #61 (comment)
2020-11-25 17:58:53 +01:00
Miriam Baglioni
1df94b85b4
changed because of #61 (comment)
2020-11-25 17:57:43 +01:00
Miriam Baglioni
66c0e3e574
changed because of #61 (comment)
2020-11-25 17:52:17 +01:00
Claudio Atzori
db0181b8af
Merge pull request 'added bidirectionality to relations from project and result coming from crossref' ( #60 ) from miriam.baglioni/dnet-hadoop:sxBidirectionality into master
2020-11-25 17:17:40 +01:00
Sandro La Bruzzo
ec3e238de6
Fixed problem on duplicated identifier
2020-11-25 17:15:54 +01:00
Claudio Atzori
1372a4d1bf
fixed merging method
2020-11-25 16:05:51 +01:00
Claudio Atzori
e208b03755
renamed workflow
2020-11-25 14:55:50 +01:00
Claudio Atzori
dfd6205b95
Consistency graph workflow merges all the entities by ID
2020-11-25 14:55:32 +01:00
Miriam Baglioni
90d4369fd2
added test to verify the compression in writing community info on hdfs
2020-11-25 14:34:58 +01:00
Miriam Baglioni
6750e33d69
merge branch with master
2020-11-25 14:09:01 +01:00
Miriam Baglioni
b2c455f883
added java doc
2020-11-25 14:08:09 +01:00
Miriam Baglioni
1f130cdf92
changed the relation (produces -> isProducedBy) due to the change in the code
2020-11-25 14:04:26 +01:00
Miriam Baglioni
e758d5d9b4
refactoring
2020-11-25 13:46:39 +01:00
Miriam Baglioni
87a9f616ae
refactoring and addition of the funder nsp first part as name for the dump instead of the whole nsp
2020-11-25 13:45:41 +01:00
Miriam Baglioni
e7e418e444
added decision node to verify if to upload in Zenodo
2020-11-25 13:44:10 +01:00
Miriam Baglioni
305e3d0c9c
added resource file for relation with relClass = isProducedBy
2020-11-25 13:43:41 +01:00
Miriam Baglioni
21ce175d17
added FilterFunction specification in filter operation
2020-11-25 13:42:31 +01:00
Miriam Baglioni
bde6d337dd
test classes for dump of results related to funders
2020-11-25 13:42:01 +01:00
Miriam Baglioni
b37b9352d7
added constant value for semantic relationship between projects and results
2020-11-25 13:41:08 +01:00
Sandro La Bruzzo
264723ffd8
updated stuff for zenodo upload
2020-11-25 11:56:07 +01:00
Claudio Atzori
36173c13a5
reverted filters in the cleaning process
2020-11-25 10:24:42 +01:00
Claudio Atzori
eeebd5a920
Cleaning workflow: remove newlines from titles, descriptions, subjects
2020-11-24 18:40:25 +01:00
Claudio Atzori
e1a1bb3ee4
moved class CleaningFunctions in the correct package. Remove newlines from titles, descriptions, subjects
2020-11-24 18:34:03 +01:00
Enrico Ottonello
99a086f0c6
max concurrent executors set to 10, according to ORCID Director of Technology mail request
2020-11-24 17:49:32 +01:00
Miriam Baglioni
72bb0fe360
changed directory name
2020-11-24 16:47:07 +01:00
Miriam Baglioni
00874a8ce6
added bidirectionality to relations from project and result
2020-11-24 15:17:23 +01:00
Miriam Baglioni
39f4a20873
changed the path and the name for saving the communities_infrastructures dump file
2020-11-24 14:47:32 +01:00
Miriam Baglioni
7e14452a87
final version of the wf to get the dump of results associated to at least one funder per funder
2020-11-24 14:46:34 +01:00
Miriam Baglioni
c167a18057
added new parameter for the dumpType
2020-11-24 14:45:50 +01:00
Miriam Baglioni
54a309bb6b
refactoring
2020-11-24 14:45:30 +01:00
Miriam Baglioni
35ecea8842
changed to consider the modification for the specification of the type of dump
2020-11-24 14:45:15 +01:00
Miriam Baglioni
b9b6bdb2e6
fixing issue on previous implementation
2020-11-24 14:44:53 +01:00
Miriam Baglioni
7e940f1991
changed to consider the modification for the specification of the type of dump
2020-11-24 14:43:34 +01:00
Miriam Baglioni
62928ef7a5
changed to save the communities_infrastructures information as the other entity dumps: in a json.gz file
2020-11-24 14:42:41 +01:00
Claudio Atzori
33bae02451
reverted behaviour of the cleaning workflow: grouping entities by ID will be managed differently
2020-11-24 14:42:33 +01:00
Claudio Atzori
e43ab07af6
code formatting
2020-11-24 14:41:39 +01:00
Miriam Baglioni
3319440c53
changed the direction of the relation between projects and result considered to select the results linked to projects
2020-11-24 14:41:09 +01:00
Miriam Baglioni
00c377dac2
added specification of MapFunction types in map
2020-11-24 14:40:22 +01:00
Miriam Baglioni
44db258dc4
added enumerated for the dump type
2020-11-24 14:38:06 +01:00
Miriam Baglioni
1832708c42
modified boolean variable with string one which specifies the type of dump we are performing: complete, community or funder
2020-11-24 14:37:36 +01:00
Miriam Baglioni
73dbb79602
removed the check for the community name in the common version on MakeTar
2020-11-24 14:36:15 +01:00
Claudio Atzori
c016cc050a
IdentifierFactory: in case a record provides more than one pid of the same type, the lexicographically lower value is chosen as best pick
2020-11-23 19:16:40 +01:00
Enrico Ottonello
5c17e768b2
set wf configuration with spark.dynamicAllocation.maxExecutors 20 over 20 input partitions
2020-11-23 16:01:23 +01:00
Enrico Ottonello
5c9a727895
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-23 09:49:53 +01:00
Enrico Ottonello
97c8111847
action to convert lambda file in seq file; spark action to download updated authors
2020-11-23 09:49:22 +01:00
Miriam Baglioni
259c67ce36
fixed issue in path name
2020-11-20 12:32:23 +01:00
Miriam Baglioni
0a9db67eec
-
2020-11-20 12:21:33 +01:00
Miriam Baglioni
d362f2637d
merge branch with master
2020-11-19 19:17:20 +01:00
Miriam Baglioni
cf3f47563f
new parameter files
2020-11-19 19:16:05 +01:00
Miriam Baglioni
24c56fa7a3
new logic and workflow for dump of results with link to projects. In this implementation the result matches the model of the communityresult.
2020-11-19 19:15:39 +01:00
Claudio Atzori
d48f388fb2
Merge branch 'provision_indexing'
2020-11-19 15:59:55 +01:00
Claudio Atzori
46bde9c13f
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop
2020-11-19 15:26:27 +01:00
Claudio Atzori
7c9feaf9e7
project attributes removed from the XML record serialization: contactfullname, contactfax, contactphone, contactemail
2020-11-19 15:26:20 +01:00
Claudio Atzori
fcbb05eb21
cleanup
2020-11-19 15:14:33 +01:00
Claudio Atzori
3f34757c63
merged from master
2020-11-19 14:34:54 +01:00
Claudio Atzori
0374d34c3e
introduced configuration param outputFormat: HDFS | SOLR
2020-11-19 10:34:28 +01:00
Miriam Baglioni
fafb688887
-
2020-11-18 18:56:48 +01:00
Miriam Baglioni
906db690d2
-
2020-11-18 17:43:08 +01:00
Miriam Baglioni
5402062ff5
changed parameter file with the one associated to the job
2020-11-18 16:58:20 +01:00
Miriam Baglioni
a172a37ad1
fixed typo
2020-11-18 16:55:07 +01:00
Miriam Baglioni
46ba3793f6
code, workflow and parameters for the dump of the results associated to funders
2020-11-18 16:47:31 +01:00
Miriam Baglioni
57cac36898
changed the workflow name
2020-11-18 13:38:03 +01:00
Enrico Ottonello
2b0c9bbb7e
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-17 18:24:34 +01:00
Enrico Ottonello
c0c2e05eae
added wf to extract authors and works xml data from orcid dump to hdfs; added wf to download the lambda file (containing last orcid update information) from orcid to hdfs
2020-11-17 18:23:12 +01:00
Dimitris
bbcf6b7c8b
Commit 17112020
2020-11-17 08:36:51 +02:00
Enrico Ottonello
c796adae24
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-16 11:57:19 +01:00
Dimitris
3e24c9b176
Changes 14112020
2020-11-14 18:42:07 +02:00
Enrico Ottonello
005f849674
added compression to output dataset
2020-11-13 12:45:31 +01:00
Enrico Ottonello
9a2fa9dc2f
added test for other names parsing from summaries dump
2020-11-13 10:25:34 +01:00
Enrico Ottonello
13f28fa225
moved AuthorData to dhp-schemas; added other names to author data
2020-11-12 17:43:32 +01:00
Enrico Ottonello
2af21150c5
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-12 09:58:33 +01:00
Claudio Atzori
9b0fb9e958
merged from master
2020-11-12 09:27:12 +01:00
Enrico Ottonello
1f861f2b0d
now wf output is a sequence file with the format seq("eu.dnetlib.dhp.schema.oaf.Publication",eu.dnetlib.dhp.schema.action.AtomicActions)
2020-11-11 17:38:50 +01:00
Enrico Ottonello
fea2451658
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-11-10 11:49:43 +01:00
Enrico Ottonello
1513174d7e
added further test case
2020-11-10 11:44:55 +01:00
Sandro La Bruzzo
8e1d43aab2
Implemented ID generation using IdentifierRecordFactory on DOIBoost
2020-11-09 11:53:55 +01:00
Enrico Ottonello
6bc7dbeca7
first version of dataset successfully generated from orcid dump 2020
2020-11-06 13:47:50 +01:00
Claudio Atzori
2d76497488
cleanup
2020-11-05 17:10:24 +01:00
miconis
3f2d3253e4
Merge branch 'stable_ids' into deduptesting
2020-11-05 15:52:57 +01:00
miconis
1699d41d39
relations for openorgs: now it chooses only one master
2020-11-05 15:48:42 +01:00
Claudio Atzori
e5da4ee9b1
dedup workflow using the common PidComparator
2020-11-04 15:02:02 +01:00
Claudio Atzori
ea2a0ea949
IdentifierFactory considers only DOIs matching a given regex
2020-11-03 18:43:37 +01:00
Claudio Atzori
86d6fbe95b
refactoring: CleaningFunctions and OafMapperUtils moved in dhp-common
2020-11-03 12:19:46 +01:00
Claudio Atzori
8471888ad3
Merge branch 'graph_cleaning' into stable_ids
2020-11-03 11:52:47 +01:00
Claudio Atzori
3fcd669e99
result merge operation leverages a custom ResultTypeComparator in the aggregator graph construction
2020-11-03 10:53:23 +01:00
Claudio Atzori
78c3c1b62b
exclude pid values set to 'none'
2020-11-02 14:25:26 +01:00
Claudio Atzori
8e7f81c5f5
code formatting
2020-11-02 14:25:00 +01:00
Claudio Atzori
09e44dabff
Merge branch 'master' into stable_ids
2020-11-02 12:16:01 +01:00
Dimitris
32bf943979
Changes to download only updates
2020-11-02 09:08:25 +02:00
Claudio Atzori
385214eeae
code formatting
2020-10-30 15:47:05 +01:00
Claudio Atzori
04ad8969b2
anticipated execution of the graph cleaning workflow
2020-10-30 15:46:55 +01:00
Claudio Atzori
4ca75d6951
Merge pull request 'Dedup ID creation policy' ( #48 ) from deduptesting into stable_ids
2020-10-30 15:15:32 +01:00
Dimitris
b8a3392b59
Commit 30102020
2020-10-30 14:07:21 +02:00
Claudio Atzori
58f28296ea
ProvisionConstants moved as ModelHardLimits in dhp-common and applied to truncate long abstracts (len > 150000). Further filtering for empty PID values
2020-10-30 10:56:42 +01:00
Enrico Ottonello
9818e74a70
added dependency version in main pom.xml for orcid no doi
2020-10-22 16:38:00 +02:00
Enrico Ottonello
210a50e4f4
replaced null value
2020-10-22 16:24:42 +02:00
Enrico Ottonello
b0290dbcb7
moved all dependencies version to main pom.xml
2020-10-22 16:20:46 +02:00
Enrico Ottonello
a38ab57062
let run test methods
2020-10-22 15:43:50 +02:00
Enrico Ottonello
1139d6568d
replaced null value with a more safe empty string as return value
2020-10-22 15:32:26 +02:00
Enrico Ottonello
c58db1c8ea
added filter on null value after map function
2020-10-22 15:11:02 +02:00
Enrico Ottonello
846ba30873
if typologies mapping fails, an exception will be propagated
2020-10-22 14:36:18 +02:00
Enrico Ottonello
c3114ba0ae
replaced null as return value with a more safe empty string
2020-10-22 14:21:31 +02:00
Enrico Ottonello
c295c71ca0
added comment
2020-10-22 14:07:26 +02:00
Enrico Ottonello
ab083f9946
propagate exception on parsing work (PR request)
2020-10-22 14:02:32 +02:00
miconis
c4a59d1b9a
merge with the master to port the new packages
2020-10-20 16:07:30 +02:00
miconis
708d887e64
minor changes
2020-10-20 15:12:19 +02:00
miconis
0e54803177
bug fix in the id generator and implementation of jobs for organization dedup
2020-10-20 12:19:46 +02:00
Claudio Atzori
266bf1a221
common IdentifierFactory in use on the mapping from the aggregator data; merge the entities sharing the same id; code formatting
2020-10-16 17:02:10 +02:00
Claudio Atzori
34f1d0904b
common IdentifierFactory in use on the mapping from the aggregator data
2020-10-16 16:00:19 +02:00
Claudio Atzori
c188868450
Merge branch 'master' into stable_ids
2020-10-16 12:06:23 +02:00
Claudio Atzori
3e6c8bca39
Merge branch 'master' into stable_ids
2020-10-09 13:53:40 +02:00
miconis
6f8720982c
bug fix in the idgenerator and test implementation
2020-10-09 09:30:23 +02:00
Claudio Atzori
8958f20813
code formatting
2020-10-07 13:14:31 +02:00
Claudio Atzori
1abcabb6e6
WIP stable ids: IdentifierFactory & unit test
2020-10-06 18:55:23 +02:00
miconis
1804c5d809
refactoring: classes moved in the right package
2020-10-06 16:44:51 +02:00
miconis
7093355487
bug fix and minor changes
2020-10-06 16:21:34 +02:00
Claudio Atzori
642b459552
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids
2020-10-06 15:45:02 +02:00
Claudio Atzori
6ce340bd3d
WIP stable ids: IdentifierFactory
2020-10-06 15:44:53 +02:00
miconis
a2ac7e52fb
implementation of the workflow for new organizations in openorgs
2020-10-06 13:58:09 +02:00
miconis
e3f7798d1b
minor changes in dedup tests, bug fix in the idgenerator and pace-core version update
2020-09-29 15:31:46 +02:00
miconis
72116446ec
[maven-release-plugin] prepare for next development iteration
2020-09-29 12:06:38 +02:00
miconis
05a03d97cd
[maven-release-plugin] prepare release dnet-dedup-4.0.5
2020-09-29 12:06:35 +02:00
miconis
2a01022712
minor changes
2020-09-29 12:05:50 +02:00
miconis
dd34e371d7
fixed error in the treeprocessor. it used th=-1 as default value, now it uses th=1
2020-09-29 12:01:25 +02:00
miconis
4cf79f32eb
implementation of the oozie wf to prepare the openorgs input: relations between organizations
2020-09-25 11:29:51 +02:00
Enrico Ottonello
a97ad20c7b
exception is now propagated (PR review)
2020-09-22 10:46:34 +02:00
Enrico Ottonello
fefbcfb106
dependency version moved to main pom (PR review)
2020-09-22 10:20:25 +02:00
miconis
259362ef47
implementation of the job to collect simrels from postgres db
2020-09-22 09:43:27 +02:00
miconis
19c3c90d7b
fixed error in the block processor: entities with orderField=null were not considered
2020-09-19 17:43:41 +02:00
Enrico Ottonello
7cffd14fb0
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-09-15 16:05:55 +02:00
Enrico Ottonello
9e8e7fe6ef
add comments
2020-09-15 11:32:49 +02:00
Enrico Ottonello
538f299767
merged
2020-09-14 12:35:16 +02:00
Enrico Ottonello
eb8c9b2348
Merge remote-tracking branch 'upstream/master' into orcid-no-doi
2020-09-14 12:00:56 +02:00
Sandro La Bruzzo
a109ebe287
fixed NPE
2020-08-06 10:27:05 +02:00
Enrico Ottonello
0377b40fba
output to one parquet file
2020-07-30 18:38:07 +02:00
Enrico Ottonello
196f36c6ed
fix publication dataset creation
2020-07-30 13:38:33 +02:00
Enrico Ottonello
c82b15b5f4
migrate configuration to ocean, fix publication dataset creation
2020-07-28 15:23:52 +02:00
Enrico Ottonello
a6acb37689
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-07-28 08:07:40 +02:00
miconis
d47352cbc7
refactoring of the procedure for the id generation, minor changes and addition of a comparison on the original id and the origin datasource
2020-07-24 20:10:47 +02:00
miconis
b260fee787
implementation of the dedup_id generation using pids to make the graph more stable
2020-07-22 17:29:48 +02:00
miconis
a5a3ea24f8
[maven-release-plugin] prepare for next development iteration
2020-07-16 18:59:25 +02:00
miconis
840fe8f4d3
[maven-release-plugin] prepare release dnet-dedup-4.0.4
2020-07-16 18:59:22 +02:00
miconis
07ab904d60
implementation of the clustering function for the suffixprefix chain
2020-07-16 18:57:55 +02:00
Claudio Atzori
eaf7defe0c
[maven-release-plugin] prepare for next development iteration
2020-07-15 17:57:09 +02:00
Claudio Atzori
ff2c8eba12
[maven-release-plugin] prepare release dnet-dedup-4.0.3
2020-07-15 17:57:04 +02:00
Claudio Atzori
7cc3742a26
removed maven release.property
2020-07-15 17:52:27 +02:00
Claudio Atzori
14611ea450
reverted to 4.0.3-SNAPSHOT
2020-07-15 17:37:36 +02:00
Claudio Atzori
9f20f23870
Revert "wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files"
...
This reverts commit 51d91fa520.
2020-07-15 17:35:56 +02:00
Claudio Atzori
9efcd8e245
Revert "reverted to 4.0.3-SNAPSHOT"
...
This reverts commit ec97983ce1.
2020-07-15 17:28:37 +02:00
Claudio Atzori
ba493f9ab8
[maven-release-plugin] rollback the release of dnet-dedup-4.0.3
2020-07-15 17:24:43 +02:00
Claudio Atzori
6c98d4c436
[maven-release-plugin] prepare release dnet-dedup-4.0.3
2020-07-15 17:24:25 +02:00
Claudio Atzori
ec97983ce1
reverted to 4.0.3-SNAPSHOT
2020-07-15 17:20:12 +02:00
Claudio Atzori
51d91fa520
wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files
2020-07-15 17:13:45 +02:00
Claudio Atzori
b79ea97107
Revert "wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files"
...
This reverts commit d2861950ac.
2020-07-15 17:11:46 +02:00
Claudio Atzori
92aadbfc7b
[maven-release-plugin] prepare release dnet-dedup-4.0.3
2020-07-15 17:04:20 +02:00
Claudio Atzori
d2861950ac
wordssuffixprefix: adjust the token length according to the number of words; removed maven release temporary files
2020-07-15 16:49:47 +02:00
miconis
244a037a90
implementation of a class to test the clustering functions
2020-07-12 10:13:54 +02:00
Enrico Ottonello
ca37d3427b
separate workflow to parse orcid summaries, activities and generate dataset with no doi publications; test
2020-07-03 23:30:31 +02:00
Enrico Ottonello
1729cc5cf3
publication conversion from json to oaf test
2020-07-02 18:46:20 +02:00
miconis
7aa2001a8b
[maven-release-plugin] prepare for next development iteration
2020-07-02 17:06:38 +02:00
miconis
c72055f543
[maven-release-plugin] prepare release dnet-dedup-4.0.2
2020-07-02 17:06:36 +02:00
miconis
f933fd33e0
implemented new function for clustering
2020-07-02 17:04:17 +02:00
Enrico Ottonello
5525f57ec8
converter from orcid work json to oaf
2020-07-01 18:36:14 +02:00
Enrico Ottonello
b7b6be12a5
fixed enriched works generation
2020-06-29 18:03:16 +02:00
Enrico Ottonello
b2213b6435
merged with dnet version
2020-06-26 17:27:34 +02:00
Enrico Ottonello
c5e149c46e
Merge branch 'master' of https://code-repo.d4science.org/D-Net/dnet-hadoop into orcid-no-doi
2020-06-26 16:15:38 +02:00
Enrico Ottonello
d6498278ed
added workflow to generate seq(orcidId,work) and seq(orcidId,enrichedWork)
2020-06-25 18:43:29 +02:00
Enrico Ottonello
fcbb4c1489
parser of orcid publication data from xml original dump
2020-06-24 16:29:32 +02:00
miconis
411d1cc24f
implementation of the test for the dedup and addition of new support classes
2020-06-11 10:46:46 +02:00
miconis
48c094f599
[maven-release-plugin] prepare for next development iteration
2020-04-24 14:39:01 +02:00
miconis
4365ba41c9
[maven-release-plugin] prepare release dnet-dedup-4.0.1
2020-04-24 14:38:58 +02:00
miconis
6e9b27f37d
implementation of the mechanism to truncate the string and the lists
2020-04-24 14:36:42 +02:00
Sandro La Bruzzo
8e4211708e
[maven-release-plugin] prepare for next development iteration
2020-02-10 12:51:04 +01:00
Sandro La Bruzzo
24e2ab9092
[maven-release-plugin] prepare release dnet-dedup-4.0.0
2020-02-10 12:50:45 +01:00
Sandro La Bruzzo
46727f5c76
upgraded maven version of commons-lang
2020-02-10 12:38:40 +01:00
miconis
5c8f6febee
minor changes in comparators
2020-01-24 10:01:11 +01:00
miconis
4dce785375
update in the implementation of the tree: addition of new logic aggregations and statistics
2020-01-14 11:42:43 +02:00
miconis
b3748b8d77
minor changes
2019-12-18 16:20:35 +01:00
miconis
b21b1b8f61
implementation of new aggregation in the tree node processing
2019-12-18 16:19:36 +01:00
miconis
20fcfe6328
implementation of new aggregation in the tree node processing
2019-12-18 16:19:26 +01:00
Sandro La Bruzzo
d924f28b93
fixed wrong use of jspath
2019-12-18 09:29:44 +01:00
miconis
84aaa65501
implementation of new json comparator and update of the publication configuration
2019-12-17 09:16:26 +01:00
Sandro La Bruzzo
5c01ae4c92
merged JqMapping branch into tree2
2019-12-13 11:30:02 +01:00
Sandro La Bruzzo
35008fdbf9
fix stuff
2019-12-06 15:28:30 +01:00
Sandro La Bruzzo
16c670a5d5
Improved deduplication
2019-12-05 14:14:25 +01:00
miconis
49f9beb4a8
implementation of romansmatch and re-implementation of the getNumber function. New terms in the translation map and update of the configuration
2019-11-28 16:54:44 +01:00
miconis
f791730330
addition of one term to the translation maps in the configurations
2019-11-27 15:48:37 +01:00
miconis
d2278fe358
minor change in the citymatch
2019-11-21 10:54:02 +01:00
miconis
8c0d346005
the param map has been updated: now it accepts string parameters
2019-11-21 09:37:56 +01:00
miconis
ddd40540aa
jarowinklernormalizedname split in 3 different comparators: citymatch, keywordmatch and jarowinkler. Implementation of the TreeStatistic support functions
2019-11-20 10:45:00 +01:00
miconis
c687956371
code cleaning and implementation of the TreeDedup + minor changes
2019-11-14 10:01:21 +01:00
miconis
0973899865
code cleaning, distribution of the classes in packages and implementation of the new configuration
2019-11-07 12:47:12 +01:00
miconis
30a873265f
put the last modification of the master branch into the tree2. Addition of the configuration as parameter of the comparator. This is to allow the comparator to access it
2019-10-29 16:38:42 +01:00
miconis
1beb776691
minor changes
2019-10-29 15:58:21 +01:00
miconis
075f741d28
[maven-release-plugin] prepare for next development iteration
2019-10-24 11:34:19 +02:00
miconis
ced4bcdd59
[maven-release-plugin] prepare release dnet-dedup-3.0.15
2019-10-24 11:34:12 +02:00
miconis
13f93e6055
Revert "[maven-release-plugin] prepare release dnet-dedup-3.0.15"
...
This reverts commit cf93515d94.
2019-10-24 11:23:01 +02:00
miconis
cf93515d94
[maven-release-plugin] prepare release dnet-dedup-3.0.15
2019-10-24 11:17:07 +02:00
miconis
285ec3ca17
release rollback
2019-10-24 11:11:07 +02:00
miconis
5f249fd56c
minor changes
2019-10-23 16:37:20 +02:00
miconis
c9863debfa
minor changes and configuration updates (synonym field added)
2019-10-23 16:31:45 +02:00
miconis
5499ca17c3
minor changes
2019-10-08 16:49:07 +02:00
miconis
50b7a12b3f
normalization of the term in the translation map added
2019-10-08 15:13:45 +02:00
miconis
26b383fea2
translation map moved in json configuration, support for synonyms added in the configuration, now the configuration is argument of conditions, distancealgos and clusteringfunctions
2019-10-08 14:53:52 +02:00
Claudio Atzori
07355d2811
[maven-release-plugin] prepare for next development iteration
2019-09-25 10:39:46 +02:00
Claudio Atzori
254eb46809
[maven-release-plugin] prepare release dnet-dedup-3.0.14
2019-09-25 10:39:39 +02:00
Claudio Atzori
74c6462b49
updated translation map and some tests
2019-09-25 10:15:13 +02:00
miconis
aed81e4cfa
translation map updated
2019-09-25 09:53:06 +02:00
miconis
afd2b398d5
optimize imports
2019-08-09 15:42:41 +02:00
miconis
d71dae5fd2
implementation of the conditions in tree nodes. get rid of the conditions part of the configuration
2019-08-09 15:41:49 +02:00
miconis
a5c5d2f01b
implementation of the decision tree. It takes the place of the distance algos, necessaryConditions and sufficientConditions are still there. The model contains only path, type and name of the field. ignoreMissing is still in the model because it is used by the conditions.
2019-08-09 10:08:34 +02:00
miconis
f2136e1024
code refactoring: useless module removed
2019-08-07 15:16:59 +02:00
miconis
8c867101ef
addition of a fixSpecial function to address the problem with special character in organization names, addition of new terms in translation maps
2019-08-06 17:06:05 +02:00
miconis
4502b44337
addition of the BlockUtils class for meta-blocking, implementation of a new local test with edge filtering example
2019-08-06 12:09:34 +02:00
miconis
cffb712a99
Merge branch 'master' of https://github.com/dnet-team/dnet-dedup
2019-07-19 17:10:53 +02:00
miconis
a85576c27e
restyling of the JaroWinklerNormalizedName comparator, now it is optimized. Addition of some translations in the translation maps, addition of a clustering based on keywords in organizations legalnames
2019-07-19 17:10:29 +02:00
Claudio Atzori
6cb846331a
[maven-release-plugin] prepare for next development iteration
2019-07-08 11:12:52 +02:00
Claudio Atzori
c04d2232c2
[maven-release-plugin] prepare release dnet-dedup-3.0.13
2019-07-08 11:12:45 +02:00
miconis
fb5e38db26
Merge branch 'master' of https://github.com/dnet-team/dnet-dedup
2019-07-08 11:02:29 +02:00
miconis
3c6f8d1e44
bug fixing in the keywordsclustering class
2019-07-08 11:01:49 +02:00
Claudio Atzori
a69022617d
[maven-release-plugin] prepare for next development iteration
2019-07-08 10:11:24 +02:00
Claudio Atzori
c6baeb93d4
[maven-release-plugin] prepare release dnet-dedup-3.0.12
2019-07-08 10:11:17 +02:00
miconis
f5de20a508
[maven-release-plugin] rollback the release of dnet-dedup-3.0.12
2019-07-08 10:00:48 +02:00
miconis
ba50aa8654
[maven-release-plugin] prepare for next development iteration
2019-07-08 09:48:10 +02:00
miconis
7065110a21
[maven-release-plugin] prepare release dnet-dedup-3.0.12
2019-07-08 09:48:03 +02:00
miconis
15bec5e876
addition of doi normalization in PidMatch comparator, addition of keywordsclustering (clustering based on terms in the translation maps for the organizations), minor changes
2019-07-08 09:44:02 +02:00
Claudio Atzori
2dcffb965f
[maven-release-plugin] prepare for next development iteration
2019-06-19 10:02:39 +02:00
Claudio Atzori
85126c59f7
[maven-release-plugin] prepare release dnet-dedup-3.0.11
2019-06-19 10:02:32 +02:00
Claudio Atzori
15d7b584f3
optimized classpath resolvers
2019-06-19 10:01:35 +02:00
Claudio Atzori
ff4956def9
[maven-release-plugin] prepare for next development iteration
2019-06-18 14:46:34 +02:00
Claudio Atzori
eb5ce312a3
[maven-release-plugin] prepare release dnet-dedup-3.0.10
2019-06-18 14:46:27 +02:00
Claudio Atzori
f2bc665403
avoid to divide by zero: in case of missing values, return undefined response
2019-06-18 14:45:15 +02:00
Claudio Atzori
e3f86b92c8
cleanup
2019-06-18 14:44:42 +02:00
miconis
54e4d0af04
exact match condition gives undefined if a field is missing, ignoremissing semantics changed: now performs the comparison in any case if =true, if false gives -1 in case of missing
2019-06-18 14:05:31 +02:00
miconis
e8db8f2abb
implementation of the integration test, addition of document blocks to group entities after clustering
2019-05-21 16:38:26 +02:00
Claudio Atzori
f7a3bdf3f8
[maven-release-plugin] prepare for next development iteration
2019-04-03 12:35:00 +02:00
Claudio Atzori
98c179c8fb
[maven-release-plugin] prepare release dnet-dedup-3.0.9
2019-04-03 12:34:52 +02:00
miconis
3e61a90c8f
[maven-release-plugin] rollback the release of dnet-dedup-3.0.9
2019-04-03 12:27:28 +02:00
miconis
15fb9eb883
[maven-release-plugin] prepare for next development iteration
2019-04-03 12:26:05 +02:00
miconis
a1ff4daa7f
[maven-release-plugin] prepare release dnet-dedup-3.0.9
2019-04-03 12:25:56 +02:00
miconis
1d29bae47c
branch cities merged into master
2019-04-03 12:22:33 +02:00
miconis
7e7018c51f
addition of a sparktester test, implementation of 2 different classes for testing in dnet-dedup-test module, addition of new terms in the vocabulary and change in the implementation of the JaroWinklerNormalizedName comparator
2019-04-03 09:40:14 +02:00
miconis
4bd5a9beee
minor changes
2019-03-26 15:48:21 +01:00
Michele De Bonis
662448e584
update of the comparator for legalnames of organizations
2019-03-21 14:27:27 +01:00
Claudio Atzori
f2394fcd9f
[maven-release-plugin] prepare for next development iteration
2019-02-18 09:09:14 +01:00
Claudio Atzori
722431dde1
[maven-release-plugin] prepare release dnet-dedup-3.0.8
2019-02-18 09:09:07 +01:00
Claudio Atzori
470c4b0f20
default configuration includes configurationId
2019-02-18 09:07:23 +01:00
Claudio Atzori
ccb7e83196
[maven-release-plugin] prepare for next development iteration
2019-02-17 12:56:19 +01:00
Claudio Atzori
7d8e62d4cc
[maven-release-plugin] prepare release dnet-dedup-3.0.7
2019-02-17 12:56:11 +01:00
Claudio Atzori
968cd47436
replace existing attributes when loading default configuration
2019-02-17 12:48:25 +01:00
Michele De Bonis
0735f3a822
implementation of the test classes and minor changes
2019-02-08 12:56:47 +01:00
Michele De Bonis
7a8d28991f
implementation of the decision tree for the deduplication of the authors, implementation of multiple comparators to be used in a tree node and definition of the proto for person entity
2018-12-20 09:54:41 +01:00
Michele De Bonis
39613dbbd6
implementation of the decisional tree, addition of the dnet-openaire-data-protos module, definition of the person proto, blockprocessor and paceconfig modified with addition of support for the tree processing
2018-12-12 16:30:03 +01:00
Claudio Atzori
f1c68d8ba3
apply limits (length, size) to pace Fields
2018-11-20 10:51:38 +01:00
Claudio Atzori
c5979ffe18
[maven-release-plugin] prepare for next development iteration
2018-11-19 17:41:45 +01:00
Claudio Atzori
9869dff1d2
[maven-release-plugin] prepare release dnet-dedup-3.0.6
2018-11-19 17:41:37 +01:00
Claudio Atzori
c2d4cb3ba6
added new properties to FieldDef (size, length) to limit the information mapped onto each MapDocument
2018-11-19 17:37:57 +01:00
Claudio Atzori
394fcafd41
[maven-release-plugin] prepare for next development iteration
2018-11-17 09:13:16 +01:00
Claudio Atzori
397554130c
[maven-release-plugin] prepare release dnet-dedup-3.0.5
2018-11-17 09:13:09 +01:00
Claudio Atzori
0dfb2ea600
added distance function for software titles
2018-11-17 09:11:38 +01:00
Michele De Bonis
3d4372ced9
addition of cities check
2018-11-16 16:11:03 +01:00
Claudio Atzori
55a9b4f501
[maven-release-plugin] prepare for next development iteration
2018-11-16 09:18:00 +01:00
Claudio Atzori
35ab630493
[maven-release-plugin] prepare release dnet-dedup-3.0.4
2018-11-16 09:17:53 +01:00
Claudio Atzori
399e4bc80f
default (empty) configuration should be aligned with the updated model
2018-11-15 16:52:56 +01:00
Claudio Atzori
59bab8dba4
less verbose logging
2018-11-13 09:07:45 +01:00
Claudio Atzori
478ad72cb8
propagate exceptions in case of serialization errors, removed configuration pretty printing, removed unused class ScoredResult
2018-11-12 15:52:18 +01:00
Claudio Atzori
f7616c7a8a
[maven-release-plugin] prepare for next development iteration
2018-11-12 14:23:36 +01:00
Claudio Atzori
df4b871c8b
[maven-release-plugin] prepare release dnet-dedup-3.0.3
2018-11-12 14:23:29 +01:00
Michele De Bonis
72a9b3139e
Merge branch 'master' of https://github.com/dnet-team/dnet-dedup
2018-11-12 14:11:26 +01:00
Michele De Bonis
b5062f5429
configuration file updated, addition of condition on domain
2018-11-12 14:11:15 +01:00
Claudio Atzori
2a509b18fa
[maven-release-plugin] prepare for next development iteration
2018-11-12 12:46:50 +01:00
Claudio Atzori
e247218987
[maven-release-plugin] prepare release dnet-dedup-3.0.2
2018-11-12 12:46:42 +01:00
Claudio Atzori
b7bc7f0401
getting rid of spark libs from dnet-pace-core
2018-11-12 12:46:06 +01:00
Claudio Atzori
3dacba37ea
[maven-release-plugin] prepare for next development iteration
2018-11-12 11:40:42 +01:00
Claudio Atzori
8cc2517f5d
[maven-release-plugin] prepare release dnet-dedup-3.0.1
2018-11-12 11:40:34 +01:00
Claudio Atzori
851ae5eec3
[maven-release-plugin] rollback the release of dnet-dedup-3.0.1
2018-11-12 11:39:07 +01:00
Claudio Atzori
f283d58a6e
[maven-release-plugin] prepare release dnet-dedup-3.0.1
2018-11-12 11:38:52 +01:00
Claudio Atzori
6d09041288
[maven-release-plugin] rollback the release of dnet-dedup-3.0.1
2018-11-12 11:28:28 +01:00
Claudio Atzori
46cee13596
[maven-release-plugin] prepare for next development iteration
2018-11-12 11:24:06 +01:00
Claudio Atzori
e1c69ad24e
[maven-release-plugin] prepare release dnet-dedup-3.0.1
2018-11-12 11:23:57 +01:00
Michele De Bonis
b247a86e69
configuration files changed: dedupRun instead of run, assertion updated in tests
2018-11-06 11:02:00 +01:00
Michele De Bonis
4c8485d0bb
deleted useless imports
2018-11-06 09:48:22 +01:00
Michele De Bonis
748189af10
implementation of JaroWinklerNormalizedName, addition of various stopwords in different languages and configuration test
2018-11-05 17:22:59 +01:00
Claudio Atzori
e296f7a81c
added DiffPatchMatch utility. Resumed commented tests!
2018-10-31 10:49:11 +01:00
Michele De Bonis
dc41b76643
serialization test added. useless getter methods ignored by json serialization
2018-10-29 16:16:11 +01:00
Michele De Bonis
ea36007d1f
DedupConf parsed using Jackson library
2018-10-29 11:13:55 +01:00
Michele De Bonis
8b4762bf54
implementation of the toString methods changed: from Gson to Jackson
2018-10-26 14:55:59 +02:00
Michele De Bonis
3cf3dc1934
modification in the initialization of clustering functions, distance algos and conditions.
2018-10-25 15:15:40 +02:00
Michele De Bonis
1cbbc3f15a
update in the discovery of clustering, conditions and distance functions (annotated with custom annotations)
2018-10-24 12:09:41 +02:00
Claudio Atzori
4d379c2227
revised PidMatch implementation, cleanup
2018-10-20 08:38:19 +02:00
Claudio Atzori
3197f26691
[maven-release-plugin] prepare for next development iteration
2018-10-18 12:17:34 +02:00
Claudio Atzori
63815be2d6
[maven-release-plugin] prepare release dnet-dedup-3.0.0
2018-10-18 12:17:27 +02:00
Claudio Atzori
ed14476b06
[maven-release-plugin] rollback the release of dnet-dedup-3.0.0
2018-10-18 12:13:03 +02:00
Claudio Atzori
82d5dce114
[maven-release-plugin] prepare release dnet-dedup-3.0.0
2018-10-18 12:12:45 +02:00
Claudio Atzori
4f29124607
[maven-release-plugin] rollback the release of dnet-dedup-3.0.0
2018-10-18 12:00:45 +02:00
Claudio Atzori
5a48937ae1
[maven-release-plugin] prepare for next development iteration
2018-10-18 11:58:43 +02:00
Claudio Atzori
5aec80345f
[maven-release-plugin] prepare release dnet-dedup-3.0.0
2018-10-18 11:58:36 +02:00
Claudio Atzori
1b46966383
updated maven project structure
2018-10-18 11:56:26 +02:00
Michele De Bonis
72ebf7c0f3
update of the spark test
2018-10-18 10:12:44 +02:00
Sandro La Bruzzo
1bb5c26e6d
Added FSpark Implementation of dedup
2018-10-11 15:19:20 +02:00
Sandro La Bruzzo
d1c73bcf90
Added First Implementation of Spark Test
2018-10-02 17:07:17 +02:00
Sandro La Bruzzo
476c3d7b07
added d-net pace core module and ignored target folder
2018-10-02 10:37:54 +02:00