interactive-mining/interactive-mining-backend/madoap/src/static/exampleClarinDocs.json

{"id":"1410.0286","text":"LAF-Fabric: a data analysis tool for Linguistic Annotation Framework with an application to the Hebrew Bible Dirk Roorda\u2217 \u2217\u2217 Gino Kalkman\u2217\u2217\u2217 Martijn Naaijer\u2217\u2217\u2217 Andreas van Cranenburgh\u2217\u2217\u2217\u2217 \u2217\u2217\u2217\u2217\u2217 dirk.roorda@dans.knaw.nl g.j.kalkman@vu.nl m.naaijer@vu.nl andreas.van.cranenburgh@huygens.knaw.nl \u2217 Data Archiving and Networked Services - Royal Netherlands Academy of Arts and Sciences, Anna van Saksenlaan 10; 2593 HT Den Haag, Netherlands \u2217\u2217 The Language Archive - Max Planck Institute for Psycholinguistics, Wundtlaan 1; 6525 XD Nijmegen, Netherlands arXiv:1410.0286v1 [cs.CL] 1 Oct 2014 \u2217\u2217\u2217 Eep Talstra Centre for Bible and Computing - VU University, Faculteit der Godgeleerdheid; De Boelelaan 1105; 1081 HV Amsterdam, Netherlands \u2217\u2217\u2217\u2217 Huygens Institute for the History of the Netherlands - Royal Netherlands Academy of Arts and Sciences, P.O. box 90754; 2509 LT; Den Haag, Netherlands \u2217\u2217\u2217\u2217\u2217 Institute for Logic Language and Computation - University of Amsterdam, FNWI ILLC Universiteit van Amsterdam; P.O. Box 94242; 1090 GE Amsterdam, Netherlands Abstract The Linguistic Annotation Framework (LAF) provides a general, extensible stand-off markup system for corpora. This paper discusses LAF-Fabric, a new tool to analyse LAF resources in general with an extension to process the Hebrew Bible in particular. We first walk through the history of the Hebrew Bible as text database in decennium-wide steps. Then we describe how LAF-Fabric may serve as an analysis tool for this corpus. Finally, we describe three analytic projects/workflows that benefit from the new LAF representation: 1) the study of linguistic variation: extract cooccurrence data of common nouns between the books of the Bible (Martijn Naaijer); 2) the study of the grammar of Hebrew poetry in the Psalms: extract clause typology (Gino Kalkman); 3) construction of a parser of classical Hebrew by Data Oriented Parsing: generate tree structures from the database (Andreas van Cranenburgh). 1. The Hebrew Bible The Hebrew Bible is written in old forms of Hebrew and Aramaic, which are dead languages now. Written manuscripts are our only source for studying these languages. The study of such a body of historical texts involves research questions from different disciplines. Linguistic analysis is a stepping stone which must be followed by questions at higher levels of abstraction, such as literary questions: how did authors use the system of the language to craft their design: i.e. style, literary effect, focus, and all those features of the text that are not dictated by the language system (van Peursen et al. 2010)? Another line of questions falls into historical linguistics: systematically charting linguistic variation in the biblical linguistic corpus can help addressing the question as to whether the variation reflects diachronic development (van Peursen and van Keulen 2006). 1.1 Bible and Computer Naturally, a research program as mentioned above seeks to employ digital tools. In fact, a group of researchers in Amsterdam started compiling a text database in the 1970s: the Werkgroep Informatica Vrije Universiteit (WIVU). This resulted in a database of text and markup, the so-called WIVU database (Talstra and Sikkel 2000), which became widely known because it became incorporated in Bible study software packages. 
The WIVU markup is based on observable characteristics of the texts and refrains from commitment to a particular linguistic theory or framework. There is no explicit grammar to which the marked-up text material has to conform. One of the consequences is that the data cannot conveniently be described in one hierarchical structure, although hierarchy plays a role. There are several incompatible hierarchies implicit in the data. In the 1990s, the groundwork was laid for an analytical tool operating on the WIVU data. In his PhD thesis, Doedens (1994) defined a database model and a pai
{"id":"1501.01866","text":"The Hebrew Bible as Data: Laboratory Sharing Experiences Dirk Roorda\u2217 \u2217\u2217 dirk.roorda@dans.knaw.nl \u2217 Data Archiving and Networked Services - Royal Netherlands Academy of Arts and Sciences, Anna van Saksenlaan 10; 2593 HT Den Haag, Netherlands \u2217 The Language Archive - Max Planck Institute for Psycholinguistics, Wundtlaan 1; 6525 XD Nijmegen, Netherlands arXiv:1501.01866v1 [cs.CL] 8 Jan 2015 Abstract The systematic study of ancient texts including their production, transmission and interpretation is greatly aided by the digital methods that started taking off in the 1970s. But how is that research in turn transmitted to new generations of researchers? We tell a story of Bible and computer across the decades and then point out the current challenges: (1) finding a stable data representation for changing methods of computation; (2) sharing results in inter- and intra-disciplinary ways, for reproducibility and cross-fertilization. We report recent developments in meeting these challenges. The scene is the text database of the Hebrew Bible, constructed by the Eep Talstra Centre for Bible and Computer (ETCBC), which is still growing in detail and sophistication. We show how a subtle mix of computational ingredients enable scholars to research the transmission and interpretation of the Hebrew Bible in new ways: (1) a standard data format, Linguistic Annotation Framework (LAF); (2) the methods of scientific computing, made accessible by (interactive) Python and its associated ecosystem. Additionally, we show how these efforts have culminated in the construction of a new, publicly accessible search engine SHEBANQ, where the text of the Hebrew Bible and its underlying data can be queried in a simple, yet powerful query language MQL, and where those queries can be saved and shared. 1. Introduction The Hebrew Bible is a collection of ancient texts resulting from a ten-centuries long tradition. It is one of the most studied texts in human culture. Information processing by machines is less than two centuries old, but since its inception its capabilities have evolved in an exponential manner up till now (Gleick 2011). We are interested in what happens when the Hebrew Bible as an object of study is brought under the scope of the current methods of information processing. The Eep Talstra Centre for Bible and Computing (ETCBC) formerly known as Werkgroep Informatica Vrije Universiteit (WIVU), has been involved in just this since the 1970s and their members are dedicated to this approach. The combination of a relatively stable set of data and a rapidly evolving set of methods urges for reflection. Add to that a growing set of ambitious research questions, and it becomes clear that not only reflection is needed but also action. Methods from computational linguistics and the wider digital humanities are to be used, hence people from different disciplines have to be involved. How can the ETCBC share its data and way of working productively with people that are used to a wide variety of computational ways? In this article we tell a story of reflection and action, and the characters are databases, data formats, query languages, annotations, computer languages, archives, repositories and social media. This story has a beginning in February 2012, when a group of biblical scholars convened at the Lorentz center at Leiden for the workshop Biblical Scholarship and Humanities Computing: Data Types, Text, Language and Interpretation (Roorda et al. 2012). 
They searched for new ways to obtain computational tools that matched their research interests. The author was part of that meeting and had prepared a demo application: a query saver. It was an attempt to improve the sharing of knowledge. It is a craft to write successful queries for the ETCBC Hebrew Text database, and by publishing their queries, researchers might teach each other how to do it. In the years that followed, this idea materialized through the SHEBANQ project (System for HEBrew text: ANnotations for Queries and markup), a c
{"id":"1501.06412","text":"The Anatomy of Relevance Topical, Snippet and Perceived Relevance in Search Result Evaluation \u02da Aleksandr Chuklin Maarten de Rijke University of Amsterdam, Amsterdam, The Netherlands arXiv:1501.06412v1 [cs.IR] 26 Jan 2015 a.chuklin, derijke@uva.nl ABSTRACT Currently, the quality of a search engine is often determined using so-called topical relevance, i.e., the match between the user intent (expressed as a query) and the content of the document. In this work we want to draw attention to two aspects of retrieval system performance affected by the presentation of results: result attractiveness (\u201cperceived relevance\u201d) and immediate usefulness of the snippets (\u201csnippet relevance\u201d). Perceived relevance may influence discoverability of good topical documents and seemingly better rankings may in fact be less useful to the user if good-looking snippets lead to irrelevant documents or vice-versa. And result items on a search engine result page (SERP) with high snippet relevance may add towards the total utility gained by the user even without the need to click those items. We start by motivating the need to collect different aspects of relevance (topical, perceived and snippet relevances) and how these aspects can improve evaluation measures. We then discuss possible ways to collect these relevance aspects using crowdsourcing and the challenges arising from that. Categories and Subject Descriptors H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval 1. INTRODUCTION For decades the main evaluation paradigm for search engines was the Cranfield methodology [7]. In a typical setting of a TREC conference, the documents are evaluated by human raters who assign relevance labels based on their judgement about the relevance of the document to the user\u2019s topic of interest, expressed as a query. A graded relevance scale is typically used with topical relevance labels ranging from 0 to 4 or from irrelevant to highly relevant. 9\u02daNow at Google Switzerland. Copyright is held by the author/owner(s). SIGIR\u201914 Workshop on Gathering Efficient Assessments of Relevance (GEAR\u201914), July 11, 2014, Gold Coast, Queensland, Australia. These relevance labels can be obtained either from trained experts or using a crowdsourcing approach. Either way, cases of disagreement have to be addressed, and those are usually treated as raters\u2019 mistakes, but may also arise from different interpretations of the user intent or the notion of relevance. In a traditional evaluation approach a single relevance label is chosen for each document-topic pair. These labels are then aggregated to SERP-level quality measures such as DCG [9] or ERR [2]. By using additional inputs from raters, we can (a) refine these quality measures and (b) better understand the performance of retrieval systems. 2. RELATED WORK The idea to separate perceived and topical relevance was suggested by [3] while designing the DBN click model. Unlike earlier click models, it suggests that the likelihood of a user clicking a document depends not on the topical relevance of the document, but rather on its perceived relevance, since the user can only judge based on the result snippet. This idea was later picked up by [12] who showed that while topical and perceived relevance are correlated, there is a noticeable discrepancy between them. 
They performed a simulated experiment by modeling the user click probability and showed that taking it into account would lead to a substantially different ordering of the systems participating in a TREC Web Track. The idea to separate out snippet relevance appeared after the introduction of good abandonment [10]: cases in which users abandon a search result page without clicking any results and yet are satisfied. This may be due to the SERP being rich with instant answers [4], e.g., a weather widget or a dictionary box, or due to the fact that the query expresses a precise informational need that can easily be answered in a result snippet [5]. In fact, as was shown by [11], a big portion
{"id":"oai:arXiv.org:1309.2788","text":"Training in Data Curation as Service in a Federated Data Infrastructure - the FrontO\ufb03ce\u2013BackO\ufb03ce Model Ingrid Dillo, Rene van Horik, and Andrea Scharnhorst arXiv:1309.2788v1 [cs.DL] 11 Sep 2013 Data Archiving and Networked Services, Anna van Saksenlaan 10, 2593 HT The Hague , The Netherlands {ingrid.dillo,rene.van.horik,andrea.scharnhorst}@dans.knaw.nl http://www.dans.knaw.nl Abstract. The increasing volume and importance of research data leads to the emergence of research data infrastructures in which data management plays an important role. As a consequence, practices at digital archives and libraries change. In this paper, we focus on a possible alliance between archives and libraries around training activities in data curation. We introduce a so-called FrontO\ufb03ce\u2013BackO\ufb03ce model and discuss experiences of its implementation in the Netherlands. In this model, an e\ufb03cient division of tasks relies on a distributed infrastructure in which research institutions (i.e., universities) use centralized storage and data curation services provided by national research data archives. The training activities are aimed at information professionals working at those research institutions, for instance as digital librarians. We describe our experiences with the course DataIntelligence4Librarians. Eventually, we re\ufb02ect about the international dimension of education and training around data curation and stewardship. Keywords: data curation, data management, training, data sharing, data archive, digital libraries, education, science policy, documentation 1 Introduction A research archive can be depicted as a safe haven for research data, carefully selected, documented and stored for future consultation. Accordingly, the core tasks of a data archivist could be imagined to be con\ufb01ned to proper documentation, and the care for material preservation. In short: \u201dOur service starts where others drop the data\u201d1 . The current practices of archivists seem to deviate from such an archetype to a large extent. This turn of tables can best be understood by a recall to the history of archival sciences. In general, for archives of research data the same principles hold as for any other archive. In 1898, in the handbook, one of the foundational texts in archival sciences [1], Muller, Feith, and Fruin describe the archive as an organic entirety whose function cannot be determined 1 Personal communication Henk Koning, former Technical Archivist at DANS 2 Dillo Fig. 1. The federated data infrastructure - a collaborative framework. Scheme designed by Peter Doorn based on the Collaborative Data Infrastructure as envisioned in [6, p. 31] . a priori. On the contrary, its function needs to be de\ufb01ned and rede\ufb01ned depending on the development of the institution (i.e., a board or government) whose selected traces it is obliged to archive. In other words, Muller et al. describe a co-evolution of the institution and its archive. This view applied to a research data archive, the corresponding institution is none other than the science system. From out this viewpoint, it is not surprising that the profound changes in scienti\ufb01c practice [2] and scholarly communication [3] in\ufb02uence the expectations placed on a data archive or, more speci\ufb01cally, a sustainable digital archive (Trusted Digital Repository). 
The changing modes of scholarly communication and practice alter the form and content of what is seen as worth preserving [5]. Changing research practices require new negotiations on the division of labor. Who is responsible for setting up digital research infrastructures, including virtual research environments - the information service providers such as Trusted Digital Repositories (TDRs) or the research institutions? Who takes care of the preparation of (meta)data and formats prior to archiving? Who should preserve software tools - the labs which developed them, or the archive, together with the \u2018data\u2019 for which they were developed? The
{"id":"oai:arXiv.org:1310.3370","text":"Talking With Scholars: Developing a Research Environment for Oral History Collections Max Kemman1 , Stef Scagliola1 , Franciska de Jong1,2 , and Roeland Ordelman2,3 arXiv:1310.3370v1 [cs.DL] 12 Oct 2013 1 3 Erasmus University Rotterdam, Rotterdam, The Netherlands {kemman,scagliola}@eshcc.eur.nl 2 University of Twente, Enschede, The Netherlands f.m.g.dejong@utwente.nl Netherlands Institute for Sound and Vision, Hilversum, The Netherlands rordelman@beeldengeluid.nl Abstract. Scholars are yet to make optimal use of Oral History collections. For the uptake of digital research tools in the daily working practice of researchers, practices and conventions commonly adhered to in the subfields in the humanities should be taken into account during development. To this end, in the Oral History Today project a research tool for exploring Oral History collections is developed in close collaboration with scholarly researchers. This paper describes four stages of scholarly research and the first steps undertaken to incorporate requirements of these stages in a digital research environment. Keywords: Oral History, scholarly research, user-centered design, exploration, result presentation, data curation, word cloud, visual facets 1 Introduction The digital turn has profoundly influenced historical culture and has led to a rise in the creation of audio-visual archives with personal narratives, commonly identified as Oral History. For the general public, searching these archives by making use of standard search tools may be sufficient. Yet for scholars, the full value of this type of data cannot be exploited optimally as available tools do not enable scholars to engage with the content for the purposes of research. When working with audio-visual content, the availability of annotations is key to the process of digging up interesting fragments. In the past years, a lot of effort has been put in tools for creating manual annotations and generating annotations (semi-)automatically. But to accelerate scholarly research, tools are required that can take available annotations layers as input and provide means for visualization, compression and aggregation of the data. Thus allowing the researcher to explore and process the data, both at fragment-, item- and collection-level. However, to develop such dedicated data exploration tools, technology specialists and researchers in the humanities have to engage in a process of mutual understanding and joint development. Taking carefully into account the specific set of practices and conventions commonly adhered to within the subfields in the humanities is a minimum requirement for the uptake of the technology in the daily working practice of scholars. In this paper we present a research tool developed in close collaboration with scholars that enables searching and exploration of aggregated, heterogeneous Oral History content. 2 Four stages of scholarly research The user interface development is based upon four stages of scholarly research that were defined on the basis of an investigation of use scenarios reported in [1]. Exploration and selection. In the first stage, the focus is on the exploration and selection of one or more content sets within an archive that may be suitable for addressing a certain scholarly issue. The first steps in content exploration by a researcher often come down to searching for material. Research starts with the search for new or additional data. 
This stage can take the form of plain browsing, but it can also be strongly purpose-driven (e.g., checking details, searching for complementary sources), item-oriented (e.g., finding the first interview with a specific person), or directed towards patterns in a collection, in which case an entire data set is the focus of attention. Exploration and investigation. Once the relevant materials have been identified, the focus in the next stage is mostly on the further exploration of the collected materials, the ordering, comparison (by individual researchers or in joint efforts) and analysis, and
{"id":"oai:arXiv.org:1312.3393","text":"arXiv:1312.3393v2 [cs.LG] 17 Dec 2013 Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem Masrour Zoghi ISLA, University of Amsterdam, The Netherlands m.zoghi@uva.nl Shimon Whiteson ISLA, University of Amsterdam, The Netherlands s.a.whiteson@uva.nl Remi Munos INRIA Lille - Nord Europe, Villeneuve d\u2019Ascq, France remi.munos@inria.fr Maarten de Rijke ISLA, University of Amsterdam, The Netherlands Abstract This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms. Our approach extends the Upper Confidence Bound algorithm to the relative setting by using estimates of the pairwise probabilities to select a promising arm and applying Upper Confidence Bound with the winner as a benchmark. We prove a finite-time regret bound of order O(log t). In addition, our empirical results using real data from an information retrieval application show that it greatly outperforms the state of the art. 1. Introduction In this paper, we propose and analyze a new algorithm, called Relative Upper Confidence Bound (RUCB), for the K-armed dueling bandit problem (Yue et al., 2012), a variation on the K-armed bandit problem, where the feedback comes in the form of pairwise preferences. We assess the performance of this algorithm using one of the main current applications of the K-armed dueling bandit problem, ranker evaluation (Hofmann et al., 2013; Joachims, 2002; Yue & Joachims, 2011), which is used in information retrieval, ad placement and recommender systems, among others. derijke@uva.nl The K-armed dueling bandit problem is part of the general framework of preference learning (F\u00a8 urnkranz & H\u00a8 ullermeier, 2010; F\u00a8 urnkranz et al., 2012), where the goal is to learn, not from real-valued feedback, but from relative feedback, which specifies only which of two alternatives is preferred. Developing effective preference learning methods is important for dealing with domains in which feedback is naturally qualitative (e.g., because it is provided by a human) and specifying real-valued feedback instead would be arbitrary or inefficient (F\u00a8 urnkranz et al., 2012). Other algorithms proposed for this problem are Interleaved Filter (IF) (Yue et al., 2012), Beat the Mean (BTM) (Yue & Joachims, 2011), and SAVAGE (Urvoy et al., 2013). All of these methods were designed for the finite-horizon setting, in which the algorithm requires as input the exploration horizon, T , the time by which the algorithm needs to produce the best arm. The algorithm is then judged based upon either the accuracy of the returned best arm or the regret accumulated in the exploration phase.1 All three of these algorithms use the exploration horizon to set their internal parameters, so for each T , there is a separate algorithm IFT , BTMT and SAVAGET . By contrast, RUCB does not require this input, making it more useful in practice, since a good exploration horizon is often difficult to guess. Nonetheless, RUCB outperforms these algorithms in terms of the accuracy and regret metrics used in the finite-horizon setting. The main idea of RUCB is to maintain optimistic estimates of the probabilities of all possible pairwise out1 These terms are formalized in Section 2. 
outcomes, and (1) use these estimates to select a potential champion, which is an arm that has a chance of being the best arm, and (2) select an arm to compare to this potential champion by performing regular Upper Confidence Bound (Auer et al., 2002) relative to it. We prove a finite-time high-probability bound of O(log t) on the cumulative regret of RUCB, from which we deduce a bound on the expected cumulative regret. These bounds rely on substantially less restrictive assumptions on the K-armed dueling bandit problem than IF and BTM and have better multiplicative constants than those of SAVAGE. Furthermore, our bounds are the first explicitly non-asymptotic results for the K-armed dueling ban
{"id":"oai:arXiv.org:1312.4428","text":"On Constraint Satisfaction Problems below P\u2217 arXiv:1312.4428v2 [cs.CC] 17 Dec 2013 L\u00b4aszl\u00b4o Egri\u2020 Abstract Symmetric Datalog, a fragment of the logic programming language Datalog, is conjectured to capture all constraint satisfaction problems (CSP) in L. Therefore developing tools that help us understand whether or not a CSP can be defined in symmetric Datalog is an important task. It is widely known that a CSP is definable in Datalog and linear Datalog if and only if that CSP has bounded treewidth and bounded pathwidth duality, respectively. In the case of symmetric Datalog, Bulatov, Krokhin and Larose ask for such a duality (2008). We provide two such dualities, and give applications. In particular, we give a short and simple new proof of the result of Dalmau and Larose that \u201cMaltsev + Datalog \u21d2 symmetric Datalog\u201d (2008). In the second part of the paper, we provide some evidence for the conjecture of Dalmau (2002) that every CSP in NL is definable in linear Datalog. Our results also show that a wide class of CSPs\u2013CSPs which do not have bounded pathwidth duality (e.g., the P-complete Horn-3Sat problem)\u2013cannot be defined by any polynomial size family of monotone read-once nondeterministic branching programs. 1 Introduction Constraint satisfaction problems (CSP) constitute a unifying framework to study various computational problems arising naturally in various branches of computer science, including artificial intelligence, graph homomorphisms, and database theory. Loosely speaking, an instance of a CSP consists of a list of variables and a set of constraints, each specified by an ordered tuple of variables and a constraint relation over some specified domain. The goal is then to determine whether variables can be assigned domain values such that all constraints are simultaneously satisfied. Recent efforts have been directed at classifying the complexity of the so-called nonuniform CSP. For a fixed finite set of finite relations \u0393, CSP(\u0393) denotes the nonuniform CSP corresponding to \u0393. The difference between an instance of CSP(\u0393) and an instance of the general CSP is that constraints in an instance of CSP(\u0393) take the form (xi1 , . . . , xik ) \u2208 R for some R \u2208 \u0393. Examples of nonuniform CSPs include k-Sat, Horn-3Sat, Graph H-Coloring, and many others. \u2217 Research supported by NSERC, FQRNT, and ERC Starting Grant PARAMTIGHT (No. 280152). Institute for Computer Science and Control, Hungarian Academy of Sciences (MTA SZTAKI), Budapest, Hungary. {laszlo.egri@mail.mcgill.ca} \u2020 1 For a relational structure B, the homomorphism problem HOM(B) takes a structure A as input, and the task is to determine if there is a homomorphism from A to B. For instance, consider structures that contain a single symmetric binary relation, i.e., graphs. A homomorphism from a graph G to a graph H is a mapping from VG to VH such that any edge of G is mapped to an edge of H. If H is a graph with a single edge then HOM(H) is the set of graphs which are two-colorable. There is a well-known and straightforward correspondence between the CSP and the homomorphism problem. For this reason, from now on we work only with the homomorphism problem instead of the CSP. Nevertheless, we call HOM(B) a CSP and we also write CSP(B) instead of HOM(B), as it is often done in the literature. 
The general CSP is of course NP-complete, and therefore research has focused on identifying \u201cislands\u201d of tractable CSPs. The well-known CSP dichotomy conjecture of Feder and Vardi [13] states that every CSP is either tractable or NP-complete, and progress towards this conjecture has been steady during the last fifteen years. From a complexity-theoretic perspective, the classification of CSP(B) as being in P or NP-complete is rather coarse and therefore somewhat unsatisfactory. Consequently, understanding the fine-grained complexity of CSPs has gained considerable attention during the last few years. Ultimately, one would like to know the precise complex
{"id":"Clusa-et-al.2016.PlosOne","text":"RESEARCH ARTICLE An Easy Phylogenetically Informative Method to Trace the Globally Invasive Potamopyrgus Mud Snail from Rivers eDNA Laura Clusa1*, Alba Ardura2, Fiona Gower3, Laura Miralles1, Valentina Tsartsianidou1, Anastasija Zaiko3,4, Eva Garcia-Vazquez1 1 Department of Functional Biology, University of Oviedo, C/ Julian Claveria s/n 33006, Oviedo, Spain, 2 USR3278-CRIOBE-CNRS-EPHE-UPVD, Laboratoire dExcellence CORAIL, Université de Perpignan CBETM, 58 rue Paul Alduy, 66860, Perpignan Cedex, France, 3 Coastal and Freshwater Group, Cawthron Institute, 98 Halifax Street East, 7010, Nelson, New Zealand, 4 Marine Science and Technology Centre, Klaipeda University, H. Manto 84, LT-92294, Klaipeda, Lithuania a11111 * lauraclusa@gmail.com Abstract OPEN ACCESS Citation: Clusa L, Ardura A, Gower F, Miralles L, Tsartsianidou V, Zaiko A, et al. (2016) An Easy Phylogenetically Informative Method to Trace the Globally Invasive Potamopyrgus Mud Snail from Rivers eDNA. PLoS ONE 11(10): e0162899. doi:10.1371/journal.pone.0162899 Editor: Richard C. Willson, University of Houston, UNITED STATES Received: March 21, 2016 Accepted: August 30, 2016 Published: October 5, 2016 Copyright: 2016 Clusa et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Potamopyrgus antipodarum (New Zealand mud snail) is a prosobranch mollusk native to New Zealand with a wide invasive distribution range. Its non-indigenous populations are reported from Australia, Asia, Europe and North America. Being an extremely tolerant species, Potamopyrgus is capable to survive in a great range of salinity and temperature conditions, which explains its high invasiveness and successful spread outside the native range. Here we report the first finding of Potamopyrgus antipodarum in a basin of the Cantabrian corridor in North Iberia (Bay of Biscay, Spain). Two haplotypes already described in Europe were found in different sectors of River Nora (Nalon basin), suggesting the secondary introductions from earlier established invasive populations. To enhance the surveillance of the species and tracking its further spread in the region, we developed a specific set of primers for the genus Potamopyrgus that amplify a fragment of 16S rDNA. The sequences obtained from PCR on DNA extracted from tissue and water samples (environmental DNA, eDNA) were identical in each location, suggesting clonal reproduction of the introduced individuals. Multiple introduction events from different source populations were inferred from our sequence data. The eDNA tool developed here can serve for tracing New Zealand mud snail populations outside its native range, and for inventorying mud snail population assemblages in the native settings if high throughput sequencing methodologies are employed. Data Availability Statement: All sequences from this work are available in the Genbank database (accession numbers KU932989-KU933010). Funding: This work was supported by the Spanish project MINECO-13-CGL2013-42415-R and the Asturias Regional Grant GRUPIN-2014-093. Laura Clusa holds a PCTI Grant from the Asturias Regional Government, referenced BP14-145. Alba Ardura holds a regional postdoctoral Marie Curie grant COFUND-CLARIN. 
Competing Interests: The authors have declared that no competing interests exist. Introduction Human-mediated translocations of marine organisms have become a widely acknowledged global environmental issue [1, 2]. Maritime activities like merchant shipping or yachting aid the spread of many species out of their native distribution range, and global change may facilitate the success of exotic species in recipient ecosystems until they become invasive, with adverse effects on envir
{"id":"jmedgenet-2011-100468","text":"Genotype-phenotype correlations ORIGINAL ARTICLE Comprehensive sequence analysis of nine Usher syndrome genes in the UK National Collaborative Usher Study Polona Le Quesne Stabej,1 Zubin Saihan,2,3 Nell Rangesh,4 Heather B Steele-Stallard,1 John Ambrose,5 Alison Coffey,5 Jenny Emmerson,5 Elene Haralambous,1 Yasmin Hughes,1 Karen P Steel,5 Linda M Luxon,4,6 Andrew R Webster,2,3 Maria Bitner-Glindzicz1,6 < Additional materials are published online only. To view these files please visit the journal online (http://jmg.bmj. com/content/49/1.toc). 1 Clinical and Molecular Genetics, Institute of Child Health, UCL, London, UK 2 Institute of Ophthalmology, UCL, London, UK 3 Moorfields Eye Hospital, London, UK 4 Audiovestibualar Medicine, Institute of Child Health, UCL, London, UK 5 Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK 6 UCL Ear Institute, London, UK Correspondence to Dr Maria Bitner-Glindzicz, Clinical and Molecular Genetics Unit, Institute of Child Health, UCL, 30 Guilford Street, London WC1N 1EH, UK; mbitnerg@ich.ucl.ac.uk Received 31 August 2011 Revised 13 October 2011 Accepted 15 October 2011 Published Online First 1 December 2011 ABSTRACT Background Usher syndrome (USH) is an autosomal recessive disorder comprising retinitis pigmentosa, hearing loss and, in some cases, vestibular dysfunction. It is clinically and genetically heterogeneous with three distinctive clinical types (IeIII) and nine Usher genes identified. This study is a comprehensive clinical and genetic analysis of 172 Usher patients and evaluates the contribution of digenic inheritance. Methods The genes MYO7A, USH1C, CDH23, PCDH15, USH1G, USH2A, GPR98, WHRN, CLRN1 and the candidate gene SLC4A7 were sequenced in 172 UK Usher patients, regardless of clinical type. Results No subject had definite mutations (nonsense, frameshift or consensus splice site mutations) in two different USH genes. Novel missense variants were classified UV1-4 (unclassified variant): UV4 is probably pathogenic, based on control frequency <0.23%, identification in trans to a pathogenic/probably pathogenic mutation and segregation with USH in only one family; and UV3 (likely pathogenic) as above, but no information on phase. Overall 79% of identified pathogenic/UV4/UV3 variants were truncating and 21% were missense changes. MYO7A accounted for 53.2%, and USH1C for 14.9% of USH1 families (USH1C: c.496+1G>A being the most common USH1 mutation in the cohort). USH2A was responsible for 79.3% of USH2 families and GPR98 for only 6.6%. No mutations were found in USH1G, WHRN or SLC4A7. Conclusions One or two pathogenic/likely pathogenic variants were identified in 86% of cases. No convincing cases of digenic inheritance were found. It is concluded that digenic inheritance does not make a significant contribution to Usher syndrome; the observation of multiple variants in different genes is likely to reflect polymorphic variation, rather than digenic effects. INTRODUCTION Usher syndrome (USH) is an autosomal recessive disease characterised by the association of sensorineural hearing loss, retinitis pigmentosa (RP) and in some cases by vestibular dysfunction. The disorder is divided into three clinical types: type I (USH1) characterised by profound congenital hearing loss, absent vestibular function and onset of RP usually within the first decade of life; type II (USH2), J Med Genet 2012;49:27e36. 
characterised by congenital, moderate to severe hearing loss, with normal vestibular function and onset of RP around or after puberty; and type III (USH3), defined by postlingual progressive hearing loss and variable vestibular response together with RP.1 2 In addition, there remain patients whose disease does not fit into any of these three subtypes because of atypical audiovestibular or retinal findings, who are said to have atypical Usher syndrome. Eleven loci and nine genes are associated with USH and cases of digenic inheritance have been described.3\u201316 For USH1, five associated genes have