dnet-hadoop

Commit Graph

Author	SHA1	Message	Date
Claudio Atzori	d8882c4481	extended mapping applied to datacite records to produce affiliations using the ROR ids. In case of APCs, it includes the amount and the currency in the relation	2023-05-02 11:56:51 +02:00
Claudio Atzori	abd7ca0c18	Merge branch 'beta' into bulkTagRefactor	2023-05-02 10:50:01 +02:00
Claudio Atzori	45f625d14f	Merge branch 'beta' into organizationToRepresentative	2023-05-02 10:46:55 +02:00
Claudio Atzori	de11edca98	Merge branch 'beta' into organizationToRepresentative	2023-05-02 09:59:41 +02:00
Claudio Atzori	851f664bd9	Merge branch 'beta' into graph_cleaning_refactoring	2023-05-02 09:55:40 +02:00
Miriam Baglioni	efc4f6a658	[bulkTag] refactor to enrich each result single step	2023-04-18 17:39:31 +02:00
Miriam Baglioni	697a134504	-	2023-04-18 10:21:12 +02:00
Miriam Baglioni	6cc95c96a2	-	2023-04-18 09:53:11 +02:00
Claudio Atzori	a2dcb06daf	added eoscifguidelines in the result view; removed compute statistics statements	2023-04-11 10:43:32 +02:00
Miriam Baglioni	932d07d2dd	[bulkTag] added filtering for datasources in eosctag	2023-04-06 15:08:27 +02:00
Miriam Baglioni	287753417d	better implementation for the fix	2023-04-06 12:22:38 +02:00
Miriam Baglioni	b42abc9904	fixed issue on bulktagging for the advanced constraints	2023-04-06 12:15:00 +02:00
Miriam Baglioni	b25b401065	added test to verify the advconstraints to dth community. inserted some additional logs.	2023-04-05 12:18:39 +02:00
Claudio Atzori	864f4051d3	[graph cleaning] added missing case	2023-04-05 11:35:47 +02:00
Claudio Atzori	dead87917f	[graph cleaning] cleanup	2023-04-04 13:13:43 +02:00
Claudio Atzori	2a6ba29b64	[graph cleaning] unit tests & cleanup	2023-04-04 12:34:51 +02:00
Claudio Atzori	63b8bbc015	[graph to Solr] using dedicated sparkExecutorCores, sparkExecutorMemory, sparkDriverMemory in convert_to_xml	2023-03-24 13:43:20 +01:00
Claudio Atzori	b502f86523	fixed input path supplemented to GetDatasourceFromCountry; adjusted the various spark.sql.shuffle.partitions	2023-03-24 13:09:12 +01:00
Claudio Atzori	c07857fa37	[graph cleaning] unit tests & cleanup	2023-03-23 15:57:47 +01:00
Claudio Atzori	90e61a8aba	[graph cleaning] WIP: refactoring of the cleaning stages, unit tests	2023-03-23 15:03:26 +01:00
Claudio Atzori	308e10d102	serialising: 1. measures for all the entity types and 2. result level fulltext	2023-03-23 11:23:22 +01:00
Claudio Atzori	488d9a5eaa	[graph cleaning] WIP: refactoring of the cleaning stages, unit tests	2023-03-23 10:41:13 +01:00
Claudio Atzori	4f5ba0ed52	[graph cleaning] WIP: refactoring of the cleaning stages, unit tests	2023-03-21 14:41:20 +01:00
Claudio Atzori	6d3d18d8b5	[graph cleaning] WIP: refactoring of the cleaning stages	2023-03-16 17:23:36 +01:00
Claudio Atzori	518618f1a9	[graph cleaning] avoid overwriting the subject class to 'keyword' for those with provenance 'subject:fos'	2023-03-14 15:22:47 +01:00
Claudio Atzori	41e00bcd07	[graph provision] avoid parsing the XML records again; apparently the escaped XML characters get unescaped, invalidating the record	2023-03-13 15:19:49 +01:00
Claudio Atzori	24e2fd828b	code formatting	2023-03-08 21:17:08 +01:00
Claudio Atzori	e28d395e87	[aggregator graph] using dedicated path to sync claims, adjusted paths with wildcards	2023-03-08 21:16:52 +01:00
Claudio Atzori	5b8fd37314	[aggregator graph] using dedicated path to sync claims	2023-03-08 15:28:14 +01:00
Claudio Atzori	7fd89566c2	[aggregator graph] handle paths including wildcards	2023-03-08 12:43:00 +01:00
Miriam Baglioni	588aca5ce4	Merge pull request 'h2020classification' (#280) from h2020classification into beta. Reviewed-on: D-Net/dnet-hadoop#280	2023-03-03 09:29:10 +01:00
Claudio Atzori	8ec0d62d91	pre-group the records in each table before joining the contents from BETA and PROD together	2023-03-02 14:49:19 +01:00
Miriam Baglioni	0fff98a14c	[ECclassification] removed print	2023-03-02 11:46:57 +01:00
Miriam Baglioni	b0c2f7e526	[ECclassification] removed not needed resources	2023-03-02 11:44:48 +01:00
Miriam Baglioni	d4fc62c2f6	merging with branch beta	2023-03-02 11:14:54 +01:00
Miriam Baglioni	de8ad1caef	[ECclassification] new implementation for the H2020 classification	2023-03-02 11:14:03 +01:00
Claudio Atzori	db9dad4aa7	[actionmanager] increased spark.sql.shuffle.partitions for publication, dataset, relation records	2023-03-02 09:11:37 +01:00
Miriam Baglioni	c1f9848953	[ECclassification] added new classes	2023-03-01 15:29:11 +01:00
Claudio Atzori	6f488547a7	ignore non processable records	2023-03-01 14:49:51 +01:00
Claudio Atzori	7d263f265e	adjusted logs	2023-03-01 11:58:07 +01:00
Claudio Atzori	16ad42e8f3	code formatting	2023-03-01 10:22:13 +01:00
Claudio Atzori	9c59dac859	followup changes reorganising the mdstore synchronisation mechanism	2023-03-01 10:16:20 +01:00
Miriam Baglioni	ad745c0aa3	[CrossrefFunderMapping] fixed issues on funder name	2023-02-28 14:58:27 +01:00
Miriam Baglioni	4f2df876cd	[ECclassification] new implementation first try	2023-02-28 14:44:00 +01:00
Claudio Atzori	2f7346e9cf	WIP monodirectional citations, Datacite	2023-02-28 13:30:51 +01:00
Claudio Atzori	0559d8b412	WIP monodirectional citations	2023-02-28 10:57:32 +01:00
Sandro La Bruzzo	69fa616490	removed wrong content	2023-02-28 10:27:38 +01:00
Sandro La Bruzzo	832a75d012	added mapping for crossref funder	2023-02-28 10:16:34 +01:00
Sandro La Bruzzo	78e51c182a	Added missing parameter to raw all workflow	2023-02-28 10:16:01 +01:00
Claudio Atzori	7aebedb43c	code formatting	2023-02-27 11:51:27 +01:00

3684 Commits