Commit Graph

724 Commits

Author SHA1 Message Date
Claudio Atzori d098ad0d93 [hb patch] updated map 2022-05-16 15:54:04 +02:00
Claudio Atzori 1dda11e031 [hb patch] updated map 2022-05-16 15:53:27 +02:00
Claudio Atzori 8dd5517548 code formatting 2022-05-16 14:35:24 +02:00
Michele Artini 46c07e0724 deleted hierarchical rels from ror action set 2022-05-16 09:39:54 +02:00
Miriam Baglioni 89657a0b78 [UsageCount] refactoring 2022-05-09 14:43:27 +02:00
Miriam Baglioni a056f59c6e [UsageCount] make it as an action set as it should be, plus changed the test to make them work as well now 2022-05-09 12:51:35 +02:00
Serafeim Chatzopoulos 623f7be26d Fix reading files from HDFS in FileCollector & FileGZipCollector plugins 2022-04-28 16:31:11 +03:00
Claudio Atzori 30105f0722 Merge branch 'beta' into 7096-fileGZip-collector-plugin 2022-04-22 11:22:21 +02:00
Miriam Baglioni 20de75ca64 [Measures] removed typo 2022-04-21 12:14:03 +02:00
Miriam Baglioni b61efd613b [Measures] addressed comments in the PR 2022-04-21 12:09:37 +02:00
Claudio Atzori eabb40fccc Merge branch 'beta' into 7096-fileGZip-collector-plugin 2022-04-21 11:42:43 +02:00
Miriam Baglioni c304657d91 [Measures] put the logic in common, no need to change the schema 2022-04-21 11:27:26 +02:00
Miriam Baglioni 5295effc96 [Measures] fixed issue 2022-04-20 16:20:40 +02:00
Miriam Baglioni 5feae77937 [Measures] last changes to accomodate tests 2022-04-20 15:13:09 +02:00
Miriam Baglioni 869407c6e2 [Measures] added new measure (usagecounts) as action set. Measure added at the level of the result. Ref #7587 2022-04-20 14:02:05 +02:00
Serafeim Chatzopoulos d0b84d3297 Add FileCollectorPlugin and respective test 2022-04-07 15:06:38 +03:00
Serafeim Chatzopoulos bc1bf55507 Add AbstractSplittedRecordPlugin 2022-04-07 14:33:04 +03:00
Serafeim Chatzopoulos e612489670 Add fileGZip collector plugin and respective test 2022-04-06 19:12:44 +03:00
Claudio Atzori 8c457f1b2c conflicts resolved, merged from beta 2022-04-06 10:27:52 +02:00
Miriam Baglioni e77d104951 [OC] added / to workflow path 2022-04-05 15:07:11 +02:00
Sandro La Bruzzo 1b11010169 minor fix 2022-03-29 10:59:14 +02:00
Claudio Atzori 5226d0a100 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2022-02-18 15:21:07 +01:00
Claudio Atzori 401dd38074 code formatting 2022-02-18 15:19:15 +01:00
Sandro La Bruzzo 891781ee3f Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta 2022-02-18 11:11:32 +01:00
Sandro La Bruzzo d3f03abd51 fixed wrong json path 2022-02-18 11:11:17 +01:00
Claudio Atzori 89c7313fc5 Merge branch 'beta' into hierarchical_orgs_relations 2022-02-17 10:30:04 +01:00
Sandro La Bruzzo 3aa2020b24 added script to regenerate hostedBy Map following instruction defined on ticket #7539
updated hosted By Map
2022-02-15 11:05:27 +01:00
Miriam Baglioni be64055cfe [OpenCitation] changed the name of destination folders 2022-02-14 15:49:44 +01:00
Miriam Baglioni 1490867cc7 [OpenCitation] cleaning of the COCI model 2022-02-14 14:52:12 +01:00
Miriam Baglioni 5c4043dba8 [OpenCitation] refactoring 2022-02-08 16:23:05 +01:00
Miriam Baglioni 759ed519f2 [OpenCitation] added logic to avoid the genration of self citations relations 2022-02-08 16:15:34 +01:00
Miriam Baglioni b071f8e415 [OpenCitation] change to extract in json format each folder just onece 2022-02-08 15:37:28 +01:00
Miriam Baglioni fbc28ee8c3 [OpenCitation] change the integration logic to consider dois with commas inside 2022-02-07 18:32:08 +01:00
Miriam Baglioni 73eba34d42 [UnresolvedEntities] Changed the way to merge the unresolved because the new merge removed the dataInfo from the merged result. Added also data info for subjects 2022-02-01 08:38:41 +01:00
Claudio Atzori b37bc277c4 reintroduced the hostedby patching to the datacite records 2022-01-21 09:15:13 +01:00
Miriam Baglioni a75fb8c47a [BipFinderInstanceLevel] change pom to align to the dhp-schema release 2.10.24 and refactoring 2022-01-12 18:06:26 +01:00
Miriam Baglioni e7d5a39c03 [BipFinderInstanceLevel] added tests in test class 2022-01-12 17:25:04 +01:00
Miriam Baglioni 4993666d73 [BipFinderInstanceLevel] changed creation of the instance to allow to enrich existing instances with same pid 2022-01-12 16:53:47 +01:00
Sandro La Bruzzo 57e2c4b749 formatted code 2022-01-12 09:40:28 +01:00
Claudio Atzori dcd282977c pulled from beta 2022-01-11 16:59:41 +01:00
Claudio Atzori 4f212652ca scalafmt: code formatting 2022-01-11 16:57:48 +01:00
Miriam Baglioni b7e450070b [SDG-FOS] to import SDG file not considering the header 2022-01-07 12:13:26 +01:00
Miriam Baglioni 639190370a mergin with branch beta 2022-01-07 11:29:25 +01:00
Miriam Baglioni adccc2346a [SDG-FOS] to lower case for the doi 2022-01-07 11:28:50 +01:00
Claudio Atzori 58f8998e3d OAF-store-graph mdstores: save them in text format 2022-01-04 15:02:09 +01:00
Claudio Atzori 174c3037e1 OAF-store-graph mdstores: save them in text format 2022-01-04 14:40:16 +01:00
Claudio Atzori 045d767013 OAF-store-graph mdstores: save them in text format 2022-01-04 14:23:01 +01:00
Claudio Atzori a6977197b3 serialise records in the OAF-store-graph mdstores in json format. Read them again in the graph construction phase using a tolerant parser to support backward compatible changes in the evolution of the schema 2022-01-03 17:25:26 +01:00
Miriam Baglioni 92fd69e25d [SDG-FOS] alternative way to get input data to avoid OOM error while getting csv 2022-01-03 15:23:06 +01:00
Miriam Baglioni 7a1b440413 [SDG] logic to create unresolved entities out of SDG input. This changes also some classes related to FOS to reuse the same code. The code under createunresolvedentities create results with the merged update of the the inputs provided (bip at the level of the isntance, fos and sdg for subjects) 2021-12-23 13:24:28 +01:00
Miriam Baglioni 2a67ee13ec [SDG] added model class 2021-12-23 10:37:52 +01:00
Miriam Baglioni 10579c0dd0 [FOS]fixed doi value in test 2021-12-22 23:10:16 +01:00
Miriam Baglioni 6116fc5d40 [FOS]added logic to include only different subjects. Test refactoring and extention 2021-12-22 23:04:22 +01:00
Miriam Baglioni b81efb6a9d [FOS]changed the mapping between the csv and the model. Changed Test classes and resources 2021-12-22 21:40:35 +01:00
Miriam Baglioni de6c4c8968 [FOS]creation of the unresolved entities: remove the split for the doi: no more needed since each row is related to one doi 2021-12-22 16:44:44 +01:00
Miriam Baglioni 34ac56565d refactoring 2021-12-22 16:28:11 +01:00
Miriam Baglioni 20ef1d657f refactoring 2021-12-22 16:26:36 +01:00
Miriam Baglioni 813f856d3f [BipFinder] removing left over parameter in wf 2021-12-22 16:11:12 +01:00
Miriam Baglioni 2c126ed014 [BipFinder] create unresolved entities with measures at the level of the instance 2021-12-22 16:03:41 +01:00
Miriam Baglioni 0807fdb65a [BipFinder] remove not needed resources 2021-12-22 15:37:00 +01:00
Miriam Baglioni b5e11a3a0a [BipFinder] put in common package BipFinder model 2021-12-22 15:33:05 +01:00
Miriam Baglioni c5739c4266 [BipFinder] create action set for the measures at the level of the result 2021-12-22 15:08:33 +01:00
Miriam Baglioni e24a7f3496 mergin with branch beta 2021-12-21 13:57:19 +01:00
Miriam Baglioni d1ae219cb4 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-12-21 13:55:53 +01:00
Sandro La Bruzzo 3920d68992 Fixed workflow generation of delta in datacite 2021-12-21 11:41:49 +01:00
Sandro La Bruzzo b881ee5ef8 [scholexplorer]
- implemented generation of scholix of delta update of datacite
2021-12-15 11:25:32 +01:00
Miriam Baglioni 22d4b5619b [BipFinder Result] last changes to test and resources files 2021-12-14 14:54:13 +01:00
Miriam Baglioni 6fb6236cd4 changed the way to produce the AS for bipFinder. 2021-12-14 14:51:14 +01:00
Miriam Baglioni 4eb8276493 - 2021-12-14 11:12:17 +01:00
Miriam Baglioni b113586207 resolved conflicts 2021-12-07 10:16:14 +01:00
Miriam Baglioni d9836f0cf3 [OpenCitations] fixed test when executed one after the other 2021-12-06 15:27:09 +01:00
Sandro La Bruzzo 7af0bbd0b1 [scala-refactor] Module dhp-aggregation:
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 11:26:36 +01:00
Miriam Baglioni f430688ff7 Merge branch 'beta' of https://code-repo.d4science.org/D-Net/dnet-hadoop into beta 2021-12-03 12:36:08 +01:00
Sandro La Bruzzo f7011b90d8 format code 2021-12-03 11:15:09 +01:00
Miriam Baglioni 87eedad898 - 2021-12-02 13:17:19 +01:00
Sandro La Bruzzo feea154e89 remove working dir after test 2021-11-25 11:02:38 +01:00
Sandro La Bruzzo 028a8acad8 add test resources 2021-11-25 10:54:47 +01:00
Sandro La Bruzzo 2164a2a889 Datacite: Code Refactor generated a general SparkApplication Scala where all the spark scala have to inherit
Commented a little the Datacite transformation code
2021-11-25 10:54:13 +01:00
Sandro La Bruzzo a7cf277d98 Datacite: Removed HostedBy Patch as described on ticket #7219, Now all the records will have hosted by Unknown Repository 2021-11-22 16:03:17 +01:00
Sandro La Bruzzo fc03c99805 fixed javadocs url after deploying site 2021-11-19 10:46:33 +01:00
Claudio Atzori bd9a43cefd Revert to 4094f2bb9a 2021-11-19 09:20:43 +01:00
Claudio Atzori 3a4d925386 Merge branch 'beta' into hierarchical_orgs_relations 2021-11-18 18:07:08 +01:00
Sandro La Bruzzo 2506d7a679 Merge branch 'mvn_site_documentation' of code-repo.d4science.org:D-Net/dnet-hadoop into mvn_site_documentation 2021-11-17 11:07:24 +01:00
Sandro La Bruzzo cded363b55 code refactor, created and moved scala code on the correct maven folder under src/main/scala and src/test/scala 2021-11-17 11:06:35 +01:00
Miriam Baglioni 4094f2bb9a added integration md file 2021-11-17 10:04:52 +01:00
Miriam Baglioni ec8b0219ff [Documentation] Added first page for Integration via unresolved entities generation 2021-11-16 17:41:34 +01:00
Sandro La Bruzzo 2d67020c59 added dhp-enrichment maven site template 2021-11-16 16:01:08 +01:00
Claudio Atzori bafa2990f3 code formatting 2021-11-15 17:07:16 +01:00
Sandro La Bruzzo efa09057db Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta 2021-11-15 14:32:09 +01:00
Sandro La Bruzzo 48923e46a1 added documentation to Pubmed Class and also added mvn site for dhp-aggregations 2021-11-15 14:32:01 +01:00
Miriam Baglioni 4ec88c718c merge with beta - resolved conflict in pom 2021-11-15 10:52:16 +01:00
Miriam Baglioni 6f1a434e90 [Bypass Action Set] Fixed test to consider the new identifier utils 2021-11-15 09:59:23 +01:00
Miriam Baglioni 157d33ebf9 [Bypass Action Set] Refactoring 2021-11-15 09:58:48 +01:00
Miriam Baglioni 92d0e18b55 [Bypass Action Set] used constant DOI instead of "doi" 2021-11-12 10:56:58 +01:00
Miriam Baglioni 881113743f [Bypass Action Set] refactoring 2021-11-12 10:55:50 +01:00
Miriam Baglioni 47ccb53c4f [Bypass Action Set] modification for comment #157 (comment) 2021-11-12 10:54:09 +01:00
Miriam Baglioni 716021546e [Bypass Action Set] minor fix 2021-11-12 10:18:01 +01:00
Miriam Baglioni 935062edec [Bypass Action Set] creation of unresolved entities 2021-11-11 16:11:25 +01:00
Claudio Atzori d02caef185 Merge branch 'beta' into hierarchical_orgs_relations 2021-10-27 15:36:29 +02:00
Sandro La Bruzzo 4acfa8fa2e Scholexplorer Datasource Aggregation:
- Added collectedfrom in the inverse relation generated
Relation resolution:
- increased number of partitions in workflow.xml
- using classid instead of classname to build the pid-dnetId mapping
2021-10-26 17:51:20 +02:00
Sandro La Bruzzo 034304b33a conflict resolved on merge 2021-10-26 09:40:47 +02:00
Michele Artini d66e20e7ac added hierarchy rel in ROR actionset 2021-10-21 15:51:48 +02:00
Sandro La Bruzzo aeeebd573b code refactor renamed datacite package 2021-10-20 17:37:42 +02:00
Sandro La Bruzzo ab3a99d3e9 removed old datacite oozie workflow 2021-10-20 17:19:47 +02:00
Sandro La Bruzzo ae4e99a471 Adapted workflow of resolution of PID to work into OpenAIRE data workflow
- Added relations in both verse on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Miriam Baglioni 1cc09adfaa Opencitations: chenaged the test class to mirror the creation or not of duplicate dois for .refs oc original plus added optional parameter to duplicate the relation 2021-10-18 14:11:27 +02:00
Sandro La Bruzzo 7b15b88d4c renamed wrong package, implemented last aggregation workflow for scholexplorer 2021-10-15 15:00:15 +02:00
Sandro La Bruzzo 51a03c0a50 refactor code for EBI from dhp-graph-mapper into dhp-aggregation 2021-10-14 14:23:13 +02:00
Sandro La Bruzzo 7387416e90 added params skip update to direct transform in OAF, this should be set to true in production 2021-10-12 12:36:30 +02:00
Sandro La Bruzzo 511da98d0c - fixed bug on download pmc Article
- removed unused line of code in SparkCreateActionset
2021-10-12 11:47:49 +02:00
Sandro La Bruzzo 5606014b17 code refactor see ticket #7065 2021-10-12 08:11:53 +02:00
Sandro La Bruzzo 66702b1973 Added node to update datacite 2021-09-28 08:59:06 +02:00
Miriam Baglioni 5ec69889db OpenCitations: creation of AS from OC 2021-09-27 16:02:06 +02:00
Miriam Baglioni f2118d771a first steps in the implementation of the integration of opencitations 2021-09-22 15:18:05 +02:00
Claudio Atzori 663b1556d7 manually integrating PR#140 #140 2021-09-15 16:40:25 +02:00
Sandro La Bruzzo aed29156c7 changed behavior in transformation job, that doesn't fail at first error 2021-09-07 19:05:46 +02:00
Sandro La Bruzzo 3c6fc2096c fix bug on oai iterator that skip record cleaned 2021-09-07 10:46:26 +02:00
Sandro La Bruzzo 9f8a80deb7 fixed wrong import of unresolved relation in openaire 2021-09-01 14:16:27 +02:00
Sandro La Bruzzo e8b3cb9147 Implemented method to download delta updates in EBI Links 2021-08-30 09:32:45 +02:00
Alessia Bardi 931f430129 Merge branch 'beta' into datasource_model_eosc_beta 2021-08-23 11:57:21 +02:00
Claudio Atzori baed5e3337 test classes moved in specific components 2021-08-13 12:14:47 +02:00
Claudio Atzori 3359f73fcf cleanup & best practices 2021-08-13 12:00:42 +02:00
Miriam Baglioni 32fd75691f refactoring 2021-08-13 10:15:42 +02:00
Miriam Baglioni 5cd5714530 GetCSV refactoring - added ignore annotation for fields not in input csv 2021-08-13 10:06:49 +02:00
Miriam Baglioni ed183d878e GetCSV refactoring - modified test classes due to change in the model of projects and programme 2021-08-13 09:28:51 +02:00
Miriam Baglioni 8769dd8eef GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:20:56 +02:00
Miriam Baglioni 6b9e1bf2e3 GetCSV refactoring - removing not needed dependency 2021-08-12 18:17:50 +02:00
Miriam Baglioni 9da74b544a GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:12:15 +02:00
Miriam Baglioni ab8abd61bb GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:11:07 +02:00
Miriam Baglioni 335a824e34 GetCSV refactoring - fixed issue 2021-08-12 18:10:10 +02:00
Miriam Baglioni f0845e9865 GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:04:58 +02:00
Miriam Baglioni 7a789423aa GetCSV refactoring - refactoring due to movement of classes 2021-08-12 18:04:27 +02:00
Miriam Baglioni e9fc3ef3bc GetCSV refactoring - changed to use the new class to get and write the csv file 2021-08-12 18:03:41 +02:00
Miriam Baglioni 4317211a2b GetCSV refactoring - refactoring due to movement 2021-08-12 18:03:14 +02:00
Miriam Baglioni b62cd656a7 GetCSV refactoring - changed the model to store only the information needed 2021-08-12 18:01:10 +02:00
Miriam Baglioni d36e925277 GetCSV refactoring - moved under model package 2021-08-12 18:00:21 +02:00
Miriam Baglioni 6e84b3951f GetCSV refactoring - moving classes to dhp-common that have dependency with GetCSV class (that was located in graph-mapper) 2021-08-12 17:57:41 +02:00
Miriam Baglioni 804589eb30 reverting 2021-08-11 17:23:35 +02:00
Miriam Baglioni d688749ad9 reverting 2021-08-11 17:22:28 +02:00
Miriam Baglioni 524c06e028 reverting 2021-08-11 17:20:30 +02:00
Miriam Baglioni 7aa3260729 reverting 2021-08-11 17:18:45 +02:00
Miriam Baglioni 55fc500d8d reverting 2021-08-11 17:17:48 +02:00
Miriam Baglioni 8da3a25cf6 merging with branch beta 2021-08-11 15:55:34 +02:00
Claudio Atzori 9f4db73f30 updated/fixed unit tests 2021-08-11 15:02:51 +02:00
Claudio Atzori 61d811ba53 suggestions from intellij 2021-08-11 12:18:20 +02:00
Claudio Atzori 2ee21da43b suggestions from SonarLint 2021-08-11 12:13:22 +02:00
Miriam Baglioni 1d6ac3715b merge branch with beta 2021-07-30 11:58:29 +02:00
Sandro La Bruzzo b1b0cc3f15 fixed wrong package name 2021-07-29 13:55:08 +02:00
Sandro La Bruzzo 3721df7aa6 refactoring create actionset of scholexplorer, moved on package dhp-aggregation 2021-07-29 10:45:35 +02:00
Sandro La Bruzzo 3d8f0f629b implemented workflow of creation action set for scholexplorer 2021-07-28 16:15:34 +02:00
Miriam Baglioni cc0d3d8a7b mergin with branch beta 2021-07-28 11:24:46 +02:00
Miriam Baglioni 708d0ade34 Merge branch 'beta' into hostedbymap 2021-07-28 10:37:22 +02:00
Sandro La Bruzzo 16c91203bd implemented workflow of creation action set for scholexplorer 2021-07-28 10:30:49 +02:00
Sandro La Bruzzo 825d9f0289 fixed datacite workflow starting from Importing delta 2021-07-27 16:09:46 +02:00
Miriam Baglioni 74f801b689 mergin with branch beta 2021-07-27 13:18:31 +02:00
Miriam Baglioni eb07f7f40f Hosted By Map 2021-07-27 12:27:26 +02:00
Claudio Atzori a0393607a7 mapping funding relations from Datacite should be done according to the actual result identifier 2021-07-23 18:15:08 +02:00
Sandro La Bruzzo 62ae36a3d2 fixed NPE 2021-07-22 15:41:38 +02:00
Miriam Baglioni 63553a76b3 added code to download gold issn list from unibi 2021-07-22 12:01:48 +02:00
Sandro La Bruzzo bbe8193930 merged stable ids 2021-07-12 17:00:43 +02:00
Sandro La Bruzzo cd17e19044 implemented branch workflow to import datacite and crossref in scholexplorer 2021-07-08 21:20:19 +02:00
Claudio Atzori 777536ce91 [aggregation] string values used as regular expressions in the OAI collection classes are defined in a single point as constants, to be reused across the code (PR#122) 2021-07-07 11:23:48 +02:00
Claudio Atzori bc014023c8 Merge pull request 'to solve the scala SI-3623' (#122) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
Reviewed-on: #122
2021-07-07 11:13:51 +02:00
Andreas Czerniak ebf3f47a02 from&until more OAI2.0 compl., adding tfs 2021-07-07 09:29:49 +02:00
Claudio Atzori 70ded407bb HttpClient used in metadata collection retries also on 404 2021-07-05 18:04:30 +02:00
Sandro La Bruzzo db933ebd21 Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer 2021-06-29 14:16:12 +02:00
Sandro La Bruzzo 7e08655e5f added relation dates in all scholexplorer Datasources 2021-06-29 12:02:03 +02:00
Claudio Atzori af42377d0e HttpClient used in metadata collection retries on 502, 503, 504 2021-06-28 09:34:30 +02:00
Sandro La Bruzzo ad50415167 Merge remote-tracking branch 'origin/stable_ids' into stable_id_scholexplorer 2021-06-24 17:20:50 +02:00
Sandro La Bruzzo 80e15cc455 implemented mapping from uniprot, pdb and ebi links 2021-06-24 17:20:00 +02:00
Claudio Atzori 5edcc6832a applying sonarLint suggestions 2021-06-23 09:53:29 +02:00
Sandro La Bruzzo 1dc0c59e20 merged fix thai dates from stable_ids 2021-06-21 10:39:46 +02:00
Sandro La Bruzzo 507e42102a added pdb to oaf class 2021-06-21 09:36:40 +02:00
Sandro La Bruzzo 3990165d05 changed typologies of unresolved relation 2021-06-18 11:43:59 +02:00
Sandro La Bruzzo cc0f2b11fb Implemented mapping from pubmed baseline to OAF 2021-06-16 14:56:24 +02:00
Sandro La Bruzzo aeb8132627 Merged branch stable_ids 2021-06-14 10:07:29 +02:00
Claudio Atzori e9e86a237d Merge branch 'stable_ids' of https://code-repo.d4science.org/D-Net/dnet-hadoop into stable_ids 2021-06-11 17:00:02 +02:00
Claudio Atzori a900bfb874 delegating the date parsing to https://github.com/sisyphsu/dateparser 2021-06-11 16:53:01 +02:00
Sandro La Bruzzo dd997c49e0 fix wrong relation id
fix date thai ticket #6791
2021-06-10 14:47:18 +02:00
Sandro La Bruzzo 0cdb7ccdaa added inverse relations to datacite mapping 2021-06-04 15:10:20 +02:00
Sandro La Bruzzo 5b724d9972 added relations to datacite mapping 2021-06-04 10:14:22 +02:00
Sandro La Bruzzo 02ef46535f Merge branch 'stable_ids' of code-repo.d4science.org:D-Net/dnet-hadoop into stable_ids 2021-05-31 09:50:15 +02:00
Sandro La Bruzzo aeadc5a366 updated wf Datacite Import to retrieve the block size as parameter 2021-05-31 09:49:53 +02:00
Claudio Atzori d512062b58 integrating pull #109, H2020Classification 2021-05-27 12:22:47 +02:00
Sandro La Bruzzo bced804151 updated wf Datacite Import to retrieve the block size as parameter 2021-05-26 17:06:50 +02:00
Miriam Baglioni abd88f663d changed test resource to mirror change in the input file 2021-05-21 15:20:47 +02:00
Miriam Baglioni c844877de2 changed workflow flow to possibly parallelize also the programme and project preparation steps 2021-05-21 14:41:57 +02:00
Miriam Baglioni 073d76864d refactoring 2021-05-21 14:41:03 +02:00
Miriam Baglioni 4c8b4a774c removed not needed code 2021-05-21 14:40:07 +02:00
Miriam Baglioni 53b9d87fec new prepareProgramme according to the new file 2021-05-21 11:49:31 +02:00
Miriam Baglioni 1ee8f13580 refactoring and added "left" as join type to be 100% sure to get the whole set of projects 2021-05-21 11:49:05 +02:00
Miriam Baglioni e07c3ba089 due to change in the input file the filtering step is no more needed 2021-05-21 11:47:43 +02:00
Miriam Baglioni 54f6e2f693 changed to get the needed information to build the action set as parallel jobs 2021-05-21 11:47:00 +02:00
Miriam Baglioni 7180505519 removed non needed variable 2021-05-21 11:46:13 +02:00
Miriam Baglioni 2eb1a8b344 changed because the input file changed 2021-05-21 11:40:20 +02:00
Claudio Atzori 9d725efdc1 reverted implementation of the mdstore client 2021-05-20 18:26:09 +02:00
Miriam Baglioni 9610224671 added param to workflow property 2021-05-20 18:21:12 +02:00
Miriam Baglioni 052c837843 - 2021-05-20 15:54:44 +02:00
Claudio Atzori b695932ae4 integrated pull#108 2021-05-20 15:34:04 +02:00
Miriam Baglioni dc0ad8d2e0 fixed issue related to change in the file name downloaded. Added sheet name as parameter and also a check if the name should change 2021-05-20 14:53:53 +02:00
Claudio Atzori 239d0f0a9a ROR actionset import workflow backported from branch stable_ids 2021-05-18 16:12:11 +02:00
Michele Artini c1e20de7cf fixed the deserialization of a json property 2021-05-18 14:00:14 +02:00
Claudio Atzori 23b8883ab1 applied intellij code cleanup 2021-05-14 10:58:12 +02:00
Sandro La Bruzzo 6424cd9062 Added passing of the following parameters:
-varDataSourceId
-varOfficialName

in Each transformation Rule
2021-05-11 15:17:38 +02:00
Sandro La Bruzzo 073dcea2aa Added passing of the following parameters:
-varDataSourceId
-varOfficialName

in Each transformation Rule
2021-05-11 15:05:58 +02:00
Claudio Atzori 3797543600 MDStoreManager model classes moved in dhp-schemas 2021-05-10 14:32:05 +02:00
Michele Artini d82071ba6c originalId with prefix 2021-05-06 15:34:48 +02:00
Claudio Atzori 923d19ea8e mdstore read lock/unlock when bulk copying records from mongodb to hdfs 2021-05-04 18:06:21 +02:00
Claudio Atzori ba86835951 using common constants from ModelConstants 2021-05-04 11:51:52 +02:00
Michele Artini a278d67175 parse input file 2021-04-29 11:34:47 +02:00
Michele Artini f77ba34126 pid types 2021-04-29 09:50:05 +02:00
Michele Artini 7c5cd86927 annotations and tests 2021-04-29 09:29:19 +02:00
Michele Artini b5cf505cc6 partial implementation of the ROR->actionset workflow 2021-04-28 16:00:24 +02:00
Claudio Atzori 5afa7d3e0c core utilities in dhp-common moved in external module dhp-schemas 2021-04-27 15:44:01 +02:00
Sandro La Bruzzo 63c0303137 removed unused import, add log 2021-04-27 12:17:23 +02:00
Claudio Atzori fa42026590 fixed PersonCleaner extension functions 2021-04-27 10:10:06 +02:00
Sandro La Bruzzo fd29307b84 updated workflow name 2021-04-21 09:21:41 +02:00
Claudio Atzori d0d477cca3 code formatting 2021-04-20 12:50:34 +02:00
Sandro La Bruzzo e06c7f32f6 updated id figshare as described in #6377 2021-04-20 10:18:07 +02:00
Sandro La Bruzzo dbe0d0378e resolved ticket #6377 2021-04-20 09:44:44 +02:00
Sandro La Bruzzo 524e5f3092 Improved parallelization on transformation wf on hadoop 2021-04-19 15:17:25 +02:00
Sandro La Bruzzo cdfe01bbae improved parallelization on transformation job 2021-04-19 15:14:52 +02:00
Claudio Atzori 3125cef545 code formatting 2021-04-14 09:11:54 +02:00
Andreas Czerniak 3b694074ff add xslt, personname cleaner 2021-04-13 07:04:27 +02:00
Claudio Atzori 7941d7be29 WIP: using common definitions from ModelConstants 2021-03-31 18:33:57 +02:00
Claudio Atzori 879e8cc7ef WIP: using common definitions from ModelConstants 2021-03-31 17:12:01 +02:00
Claudio Atzori 72ce741ea6 WIP: using common definitions from ModelConstants 2021-03-31 17:07:13 +02:00
Sandro La Bruzzo 616d2ecce2 splitted workflow collecting datacite into two workflows.
Released on beta
2021-03-31 15:45:58 +02:00
Sandro La Bruzzo 1dfda3624e improved workflow importing datacite 2021-03-26 13:56:29 +01:00
Claudio Atzori 8db248aa13 avoiding error on jenkins compilations: java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (on a random free port)! 2021-03-23 09:56:34 +01:00
Sandro La Bruzzo c73072079d fix conflicts 2021-03-22 16:36:31 +01:00
Claudio Atzori 61a2551e74 migrated last changes from svn (dnet45) 2021-03-15 17:17:55 +01:00
Claudio Atzori acbe3119a4 RestCollectorPlugin imported from dne45 2021-03-08 09:44:09 +01:00
Claudio Atzori fa7930d2e2 merging contributions from PR#97 2021-03-05 15:45:28 +01:00
Claudio Atzori 36f750cd1d removed unused classes 2021-03-03 10:22:29 +01:00
Claudio Atzori b73dce3e3a more logging on the MDStore mongodb client. Forcing UTF_8 encoding on the content 2021-03-03 10:17:16 +01:00
Claudio Atzori e76c4f62c1 MetadataRecord moved in dhp-schemas 2021-02-26 10:58:48 +01:00
Claudio Atzori 7df2461ccc indent XML records collected from oai-pmh endpoints 2021-02-25 16:19:12 +01:00
Claudio Atzori b830e33392 mdstore collector plugin 2021-02-25 12:30:30 +01:00
Claudio Atzori 271e88537b code formatting 2021-02-25 12:28:56 +01:00
Claudio Atzori 9c899f4433 cleanup on transformation functions and the relative tests 2021-02-24 15:07:59 +01:00
Claudio Atzori fc3fa5e343 implemented mdstore collector plugin 2021-02-24 15:07:24 +01:00
Claudio Atzori e7eba9f7e7 WIP: transformation workflow error reporting; cleanup 2021-02-17 16:54:08 +01:00
Claudio Atzori 58467aaf1e WIP: transformation workflow error reporting 2021-02-17 16:14:41 +01:00
Claudio Atzori cc88701f29 retry for any Socket exception 2021-02-17 16:13:54 +01:00
Claudio Atzori 545f8f3e48 using jackson objectmapper instead of GSon to serialise the aggregation report 2021-02-17 12:15:00 +01:00
Claudio Atzori b592d78bb4 WIP: collectorWorker error reporting, generalised reported implementation 2021-02-17 10:28:01 +01:00
Claudio Atzori cf27905a71 WIP: collectorWorker error reporting, added report messages 2021-02-16 16:53:14 +01:00
Claudio Atzori 1abe6d1ad7 WIP: collectorWorker error reporting, added report messages 2021-02-15 15:08:59 +01:00
Claudio Atzori 523a6bfa97 Merge pull request 'first commit to the correct branch' (#94) from andreas.czerniak/BrAggr_dnet-hadoop:hadoop_aggregator into hadoop_aggregator
Looks good to me, thanks Andreas!
2021-02-15 12:15:31 +01:00
Sandro La Bruzzo 7edcc87ed4 changed xslt behaviour on failure 2021-02-12 17:27:08 +01:00
Sandro La Bruzzo 6a37c7f175 merge fixed 2021-02-12 16:38:47 +01:00
Sandro La Bruzzo b3f5c2351d Merge branch 'hadoop_aggregator' of code-repo.d4science.org:D-Net/dnet-hadoop into hadoop_aggregator
 Conflicts:
	dhp-workflows/dhp-aggregation/src/test/java/eu/dnetlib/dhp/transformation/TransformationJobTest.java
2021-02-12 16:37:14 +01:00
Sandro La Bruzzo f216277219 Implemented cleaning date 2021-02-12 16:34:52 +01:00
Andreas Czerniak 5a9017cf18 clone, min. changes, test, run 2021-02-12 14:32:36 +01:00
Claudio Atzori aa55dedb8a Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator 2021-02-12 12:31:05 +01:00
Claudio Atzori 29c6f7e255 classes related to the collection workflow moved into common package; implemented MongoDB collection plugins 2021-02-12 12:31:02 +01:00
Sandro La Bruzzo 17e6f1934e fixed NPE on cleaner 2021-02-12 11:48:11 +01:00
Sandro La Bruzzo ebcc3ec14f updated wrong datacite identifier in trasformation 2021-02-11 16:25:51 +01:00
Claudio Atzori bae029f828 collection_java_xmx allows to declare the heap size allocated for the java actions involved in the metadata collectionw workflow 2021-02-08 18:07:23 +01:00
Claudio Atzori bebc54d5bf seq file storing native records is now compressed 2021-02-08 18:06:25 +01:00
Claudio Atzori 50add4c61b added requestDelay to HttpConnector2 configuration; Aggregation workflow constants moved in dhp-common 2021-02-08 12:19:38 +01:00
Claudio Atzori 40df0f987d better logging, WIP: collectorWorker error reporting; common functions moved in DHPUtils 2021-02-06 20:12:00 +01:00
Claudio Atzori a8a758925e better logging, WIP: collectorWorker error reporting 2021-02-05 19:18:05 +01:00
Claudio Atzori 730973679a Merge branch 'hadoop_aggregator' of https://code-repo.d4science.org/D-Net/dnet-hadoop into hadoop_aggregator 2021-02-04 17:25:00 +01:00
Claudio Atzori deb85706db imported HttpConnector from https://svn.driver.research-infrastructures.eu/driver/dnet45/modules/dnet-modular-collector-service/trunk/src/main/java/eu/dnetlib/data/collector/plugins/HttpConnector.java as HttpConnector2 2021-02-04 17:24:52 +01:00
Sandro La Bruzzo 4dae5e605d implemented messaging btween collection worker and Dnet 2021-02-04 15:51:15 +01:00
Claudio Atzori 40764cf626 better logging, WIP: collectorWorker error reporting 2021-02-04 14:06:02 +01:00
Sandro La Bruzzo 69c253710b fixed test 2021-02-04 10:30:49 +01:00
Claudio Atzori e04045089f better logging, WIP: collectorWorker error reporting 2021-02-03 17:58:22 +01:00
Claudio Atzori 0e8a4f9f1a better logging, WIP: collectorWorker error reporting 2021-02-03 12:33:41 +01:00
Claudio Atzori 53884d12c2 code formatting 2021-02-02 14:38:03 +01:00
Claudio Atzori ac46c247d2 code formatting 2021-02-02 14:24:00 +01:00
Claudio Atzori bde14b149a fixed transformation target paths 2021-02-02 12:49:29 +01:00
Claudio Atzori ca4391aa1c minor changes 2021-02-02 12:44:04 +01:00
Claudio Atzori bb89b99b24 code formatting 2021-02-02 12:34:14 +01:00
Claudio Atzori 75807ea5ae factored out constants 2021-02-02 12:28:21 +01:00
Sandro La Bruzzo 0634674add implemented transformation test 2021-02-02 12:12:14 +01:00
Claudio Atzori 8eaa1fd4b4 WIP: metadata collection in INCREMENTAL mode and relative test 2021-02-01 19:29:10 +01:00
Sandro La Bruzzo bead34d11a code refactor 2021-02-01 14:58:06 +01:00
Sandro La Bruzzo 6ff234d81b Implemented a first prototype of incremental harvesting and trasformation using readlock 2021-02-01 13:56:05 +01:00
Sandro La Bruzzo b6b835ef49 update transformation Factory to get Transformation Rule by Id and not by Title 2021-02-01 08:49:42 +01:00
Sandro La Bruzzo e423634cb6 RollBack in case of error WORKS!!! 2021-01-29 17:21:42 +01:00
Sandro La Bruzzo 8ee82576c6 Collection on Refresh WORKS!!! 2021-01-29 17:02:46 +01:00
Sandro La Bruzzo 0276180039 WIP mdstore
transaction implemented on hadoop side
2021-01-29 16:42:41 +01:00
Sandro La Bruzzo 0f8e2ecce6 Merged Datacite transfrom into this branch 2021-01-29 10:45:07 +01:00
Sandro La Bruzzo 99cf3a8ea4 Merged Datacite transfrom into this branch 2021-01-28 16:34:46 +01:00
Sandro La Bruzzo 98b9498b57 Removed old messaging system not quite used from collection and Transformation workflow
code refactor
2021-01-28 09:51:17 +01:00
Sandro La Bruzzo 184e7b3856 Implemented new Transformation using spark 2021-01-27 15:43:08 +01:00
Sandro La Bruzzo ffb092b8d3 removed duplicate code HttpConnector.java 2021-01-25 15:05:37 +01:00
Claudio Atzori 41500669e2 [BIP! Scores integration] merged missing classes from bipFinder branch 2021-01-11 14:39:47 +01:00
Claudio Atzori 2a7a10809e [BIP! Scores integration] merged missing classes from bipFinder branch 2021-01-11 10:05:02 +01:00
Claudio Atzori d6686dd7cf merged from master 2021-01-08 18:16:12 +01:00
Claudio Atzori 34229970e6 [BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation 2021-01-08 16:29:17 +01:00
Claudio Atzori 1361c9eb0c [BIP! Scores integration] Create updates as Result rather than subclasses; Result considers also metrics in the mergeFrom operation 2021-01-07 10:07:30 +01:00
Claudio Atzori 2e503ee101 code formatting 2020-12-17 13:47:38 +01:00
Claudio Atzori 03319d3bd9 Revert "Merge pull request 'Creation of the action set to include the bipFinder! score' (#62) from miriam.baglioni/dnet-hadoop:bipFinder into master"
This reverts commit add7e1693b, reversing
changes made to f9a8fd8bbd.
2020-12-17 12:23:58 +01:00
Miriam Baglioni 888175baf7 added java doc 2020-12-01 18:36:29 +01:00
Miriam Baglioni 3d62d99d5d fixed issue in workflow variable 2020-12-01 15:02:49 +01:00
Miriam Baglioni 17680296b9 removed unnecessary variable and unused method 2020-12-01 15:02:31 +01:00