Serafeim Chatzopoulos
623f7be26d
Fix reading files from HDFS in FileCollector & FileGZipCollector plugins
2022-04-28 16:31:11 +03:00
Claudio Atzori
30105f0722
Merge branch 'beta' into 7096-fileGZip-collector-plugin
2022-04-22 11:22:21 +02:00
Miriam Baglioni
20de75ca64
[Measures] removed typo
2022-04-21 12:14:03 +02:00
Miriam Baglioni
b61efd613b
[Measures] addressed comments in the PR
2022-04-21 12:09:37 +02:00
Miriam Baglioni
c304657d91
[Measures] put the logic in common, no need to change the schema
2022-04-21 11:27:26 +02:00
Miriam Baglioni
5295effc96
[Measures] fixed issue
2022-04-20 16:20:40 +02:00
Miriam Baglioni
5feae77937
[Measures] last changes to accommodate tests
2022-04-20 15:13:09 +02:00
Miriam Baglioni
869407c6e2
[Measures] added new measure (usagecounts) as action set. Measure added at the level of the result. Ref #7587
2022-04-20 14:02:05 +02:00
Serafeim Chatzopoulos
d0b84d3297
Add FileCollectorPlugin and respective test
2022-04-07 15:06:38 +03:00
Serafeim Chatzopoulos
bc1bf55507
Add AbstractSplittedRecordPlugin
2022-04-07 14:33:04 +03:00
Serafeim Chatzopoulos
e612489670
Add fileGZip collector plugin and respective test
2022-04-06 19:12:44 +03:00
Claudio Atzori
401dd38074
code formatting
2022-02-18 15:19:15 +01:00
Claudio Atzori
89c7313fc5
Merge branch 'beta' into hierarchical_orgs_relations
2022-02-17 10:30:04 +01:00
Miriam Baglioni
be64055cfe
[OpenCitation] changed the name of destination folders
2022-02-14 15:49:44 +01:00
Miriam Baglioni
1490867cc7
[OpenCitation] cleaning of the COCI model
2022-02-14 14:52:12 +01:00
Miriam Baglioni
5c4043dba8
[OpenCitation] refactoring
2022-02-08 16:23:05 +01:00
Miriam Baglioni
759ed519f2
[OpenCitation] added logic to avoid the generation of self-citation relations
2022-02-08 16:15:34 +01:00
Miriam Baglioni
b071f8e415
[OpenCitation] change to extract in json format each folder just once
2022-02-08 15:37:28 +01:00
Miriam Baglioni
fbc28ee8c3
[OpenCitation] change the integration logic to consider dois with commas inside
2022-02-07 18:32:08 +01:00
Miriam Baglioni
73eba34d42
[UnresolvedEntities] Changed the way to merge the unresolved because the new merge removed the dataInfo from the merged result. Added also data info for subjects
2022-02-01 08:38:41 +01:00
Miriam Baglioni
e7d5a39c03
[BipFinderInstanceLevel] added tests in test class
2022-01-12 17:25:04 +01:00
Miriam Baglioni
4993666d73
[BipFinderInstanceLevel] changed creation of the instance to allow to enrich existing instances with same pid
2022-01-12 16:53:47 +01:00
Miriam Baglioni
b7e450070b
[SDG-FOS] to import SDG file not considering the header
2022-01-07 12:13:26 +01:00
Miriam Baglioni
adccc2346a
[SDG-FOS] to lower case for the doi
2022-01-07 11:28:50 +01:00
Miriam Baglioni
92fd69e25d
[SDG-FOS] alternative way to get input data to avoid OOM error while getting csv
2022-01-03 15:23:06 +01:00
Miriam Baglioni
7a1b440413
[SDG] logic to create unresolved entities out of SDG input. This also changes some classes related to FOS to reuse the same code. The code under createunresolvedentities creates results with the merged update of the inputs provided (bip at the level of the instance, fos and sdg for subjects)
2021-12-23 13:24:28 +01:00
Miriam Baglioni
2a67ee13ec
[SDG] added model class
2021-12-23 10:37:52 +01:00
Miriam Baglioni
10579c0dd0
[FOS] fixed doi value in test
2021-12-22 23:10:16 +01:00
Miriam Baglioni
6116fc5d40
[FOS] added logic to include only different subjects. Test refactoring and extension
2021-12-22 23:04:22 +01:00
Miriam Baglioni
b81efb6a9d
[FOS] changed the mapping between the csv and the model. Changed Test classes and resources
2021-12-22 21:40:35 +01:00
Miriam Baglioni
de6c4c8968
[FOS] creation of the unresolved entities: removed the split for the doi, no longer needed since each row is related to one doi
2021-12-22 16:44:44 +01:00
Miriam Baglioni
20ef1d657f
refactoring
2021-12-22 16:26:36 +01:00
Miriam Baglioni
2c126ed014
[BipFinder] create unresolved entities with measures at the level of the instance
2021-12-22 16:03:41 +01:00
Miriam Baglioni
b5e11a3a0a
[BipFinder] put in common package BipFinder model
2021-12-22 15:33:05 +01:00
Miriam Baglioni
c5739c4266
[BipFinder] create action set for the measures at the level of the result
2021-12-22 15:08:33 +01:00
Miriam Baglioni
6fb6236cd4
changed the way to produce the AS for bipFinder.
2021-12-14 14:51:14 +01:00
Miriam Baglioni
4eb8276493
-
2021-12-14 11:12:17 +01:00
Sandro La Bruzzo
7af0bbd0b1
[scala-refactor] Module dhp-aggregation:
...
Moved all scala source into src/main/scala and src/test/scala
2021-12-06 11:26:36 +01:00
Sandro La Bruzzo
2164a2a889
Datacite: code refactor, created a general SparkApplication Scala class that all the Spark Scala jobs have to inherit from
...
Added some comments to the Datacite transformation code
2021-11-25 10:54:13 +01:00
Sandro La Bruzzo
a7cf277d98
Datacite: Removed HostedBy Patch as described on ticket #7219. Now all the records will have hosted by Unknown Repository
2021-11-22 16:03:17 +01:00
Claudio Atzori
3a4d925386
Merge branch 'beta' into hierarchical_orgs_relations
2021-11-18 18:07:08 +01:00
Claudio Atzori
bafa2990f3
code formatting
2021-11-15 17:07:16 +01:00
Sandro La Bruzzo
efa09057db
Merge branch 'beta' of code-repo.d4science.org:D-Net/dnet-hadoop into beta
2021-11-15 14:32:09 +01:00
Sandro La Bruzzo
48923e46a1
added documentation to Pubmed Class and also added mvn site for dhp-aggregations
2021-11-15 14:32:01 +01:00
Miriam Baglioni
4ec88c718c
merge with beta - resolved conflict in pom
2021-11-15 10:52:16 +01:00
Miriam Baglioni
157d33ebf9
[Bypass Action Set] Refactoring
2021-11-15 09:58:48 +01:00
Miriam Baglioni
92d0e18b55
[Bypass Action Set] used constant DOI instead of "doi"
2021-11-12 10:56:58 +01:00
Miriam Baglioni
881113743f
[Bypass Action Set] refactoring
2021-11-12 10:55:50 +01:00
Miriam Baglioni
47ccb53c4f
[Bypass Action Set] modification for comment #157 (comment)
2021-11-12 10:54:09 +01:00
Miriam Baglioni
716021546e
[Bypass Action Set] minor fix
2021-11-12 10:18:01 +01:00
Miriam Baglioni
935062edec
[Bypass Action Set] creation of unresolved entities
2021-11-11 16:11:25 +01:00
Claudio Atzori
d02caef185
Merge branch 'beta' into hierarchical_orgs_relations
2021-10-27 15:36:29 +02:00
Sandro La Bruzzo
4acfa8fa2e
Scholexplorer Datasource Aggregation:
...
- Added collectedfrom in the inverse relation generated
Relation resolution:
- increased number of partitions in workflow.xml
- using classid instead of classname to build the pid-dnetId mapping
2021-10-26 17:51:20 +02:00
Michele Artini
d66e20e7ac
added hierarchy rel in ROR actionset
2021-10-21 15:51:48 +02:00
Sandro La Bruzzo
aeeebd573b
code refactor renamed datacite package
2021-10-20 17:37:42 +02:00
Sandro La Bruzzo
ae4e99a471
Adapted workflow of resolution of PID to work into OpenAIRE data workflow
...
- Added relations in both directions on all Scholexplorer datasources
2021-10-20 17:12:16 +02:00
Miriam Baglioni
1cc09adfaa
Opencitations: changed the test class to mirror the creation or not of duplicate dois for .refs oc original plus added optional parameter to duplicate the relation
2021-10-18 14:11:27 +02:00
Sandro La Bruzzo
7b15b88d4c
renamed wrong package, implemented last aggregation workflow for scholexplorer
2021-10-15 15:00:15 +02:00
Sandro La Bruzzo
51a03c0a50
refactor code for EBI from dhp-graph-mapper into dhp-aggregation
2021-10-14 14:23:13 +02:00
Sandro La Bruzzo
7387416e90
added params skip update to direct transform in OAF, this should be set to true in production
2021-10-12 12:36:30 +02:00
Sandro La Bruzzo
511da98d0c
- fixed bug on download pmc Article
...
- removed unused line of code in SparkCreateActionset
2021-10-12 11:47:49 +02:00
Sandro La Bruzzo
5606014b17
code refactor see ticket #7065
2021-10-12 08:11:53 +02:00
Sandro La Bruzzo
66702b1973
Added node to update datacite
2021-09-28 08:59:06 +02:00
Miriam Baglioni
5ec69889db
OpenCitations: creation of AS from OC
2021-09-27 16:02:06 +02:00
Miriam Baglioni
f2118d771a
first steps in the implementation of the integration of opencitations
2021-09-22 15:18:05 +02:00
Claudio Atzori
663b1556d7
manually integrating PR#140
2021-09-15 16:40:25 +02:00
Sandro La Bruzzo
aed29156c7
changed behavior in transformation job, that doesn't fail at first error
2021-09-07 19:05:46 +02:00
Sandro La Bruzzo
3c6fc2096c
fix bug in the OAI iterator that skipped cleaned records
2021-09-07 10:46:26 +02:00
Sandro La Bruzzo
e8b3cb9147
Implemented method to download delta updates in EBI Links
2021-08-30 09:32:45 +02:00
Claudio Atzori
3359f73fcf
cleanup & best practices
2021-08-13 12:00:42 +02:00
Miriam Baglioni
32fd75691f
refactoring
2021-08-13 10:15:42 +02:00
Miriam Baglioni
5cd5714530
GetCSV refactoring - added ignore annotation for fields not in input csv
2021-08-13 10:06:49 +02:00
Miriam Baglioni
8769dd8eef
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:20:56 +02:00
Miriam Baglioni
6b9e1bf2e3
GetCSV refactoring - removing not needed dependency
2021-08-12 18:17:50 +02:00
Miriam Baglioni
335a824e34
GetCSV refactoring - fixed issue
2021-08-12 18:10:10 +02:00
Miriam Baglioni
f0845e9865
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:04:58 +02:00
Miriam Baglioni
7a789423aa
GetCSV refactoring - refactoring due to movement of classes
2021-08-12 18:04:27 +02:00
Miriam Baglioni
e9fc3ef3bc
GetCSV refactoring - changed to use the new class to get and write the csv file
2021-08-12 18:03:41 +02:00
Miriam Baglioni
4317211a2b
GetCSV refactoring - refactoring due to movement
2021-08-12 18:03:14 +02:00
Miriam Baglioni
b62cd656a7
GetCSV refactoring - changed the model to store only the information needed
2021-08-12 18:01:10 +02:00
Miriam Baglioni
d36e925277
GetCSV refactoring - moved under model package
2021-08-12 18:00:21 +02:00
Miriam Baglioni
6e84b3951f
GetCSV refactoring - moving classes to dhp-common that have dependency with GetCSV class (that was located in graph-mapper)
2021-08-12 17:57:41 +02:00
Miriam Baglioni
8da3a25cf6
merging with branch beta
2021-08-11 15:55:34 +02:00
Claudio Atzori
9f4db73f30
updated/fixed unit tests
2021-08-11 15:02:51 +02:00
Claudio Atzori
2ee21da43b
suggestions from SonarLint
2021-08-11 12:13:22 +02:00
Miriam Baglioni
1d6ac3715b
merge branch with beta
2021-07-30 11:58:29 +02:00
Sandro La Bruzzo
3721df7aa6
refactoring create actionset of scholexplorer, moved on package dhp-aggregation
2021-07-29 10:45:35 +02:00
Sandro La Bruzzo
3d8f0f629b
implemented workflow of creation action set for scholexplorer
2021-07-28 16:15:34 +02:00
Miriam Baglioni
cc0d3d8a7b
merging with branch beta
2021-07-28 11:24:46 +02:00
Miriam Baglioni
708d0ade34
Merge branch 'beta' into hostedbymap
2021-07-28 10:37:22 +02:00
Sandro La Bruzzo
16c91203bd
implemented workflow of creation action set for scholexplorer
2021-07-28 10:30:49 +02:00
Sandro La Bruzzo
825d9f0289
fixed datacite workflow starting from Importing delta
2021-07-27 16:09:46 +02:00
Miriam Baglioni
74f801b689
merging with branch beta
2021-07-27 13:18:31 +02:00
Claudio Atzori
a0393607a7
mapping funding relations from Datacite should be done according to the actual result identifier
2021-07-23 18:15:08 +02:00
Miriam Baglioni
63553a76b3
added code to download gold issn list from unibi
2021-07-22 12:01:48 +02:00
Sandro La Bruzzo
bbe8193930
merged stable ids
2021-07-12 17:00:43 +02:00
Sandro La Bruzzo
cd17e19044
implemented branch workflow to import datacite and crossref in scholexplorer
2021-07-08 21:20:19 +02:00
Claudio Atzori
777536ce91
[aggregation] string values used as regular expressions in the OAI collection classes are defined in a single point as constants, to be reused across the code (PR#122)
2021-07-07 11:23:48 +02:00
Claudio Atzori
bc014023c8
Merge pull request 'to solve the scala SI-3623' ( #122 ) from andreas.czerniak/BrStableId_dnet-hadoop:stable_ids into stable_ids
...
Reviewed-on: #122
2021-07-07 11:13:51 +02:00
Andreas Czerniak
ebf3f47a02
from&until more OAI2.0 compl., adding tfs
2021-07-07 09:29:49 +02:00