Search results: 1-15 of 17 records found for “computational linguistics data”. Query time: 0.5 seconds.
Generating Phrasal and Sentential Paraphrases: A Survey of Data-Driven Methods
Generating Phrasal Sentential Paraphrases Data-Driven Methods
2015/9/8
The task of paraphrasing is inherently familiar to speakers of all languages. Moreover, the task of automatically generating or extracting semantic equivalences for the various units of language—words...
Hybrid Reinforcement/Supervised Learning of Dialogue Policies from Fixed Data Sets
data set Computing language
2015/9/6
We propose a method for learning dialogue management policies from a fixed data set. The method addresses the challenges posed by Information State Update (ISU)-based dialogue systems, which r...
Coherence in Natural Language: Data Structures and Applications
Natural Language Data Structures and Applications
2015/9/2
In his blurb on the back cover, Mark Liberman calls this book “the biggest step forward [in research on discourse structure] since Aristotle.” Given this eminent recommendation, I read the book with g...
Never say “never.” In 1997, most experts would have sworn that text-to-speech (TTS) synthesis technologies had reached a plateau, from which it would be very hard to leave. Five years later, speech ...
LAF-Fabric: a data analysis tool for Linguistic Annotation Framework with an application to the Hebrew Bible
LAF-Fabric data analysis tool Linguistic Annotation Framework the Hebrew Bible
2015/5/6
The Linguistic Annotation Framework (LAF) provides a general, extensible stand-off markup system for corpora. This paper discusses LAF-Fabric, a new tool to analyse LAF resources in general with an ex...
Detection of Runs of Homozygosity from Whole Exome Sequencing Data: State of the Art and Perspectives for Clinical, Population and Epidemiological Studies
Exome sequencing Runs of homozygosity Homozygosity mapping
2015/5/5
Runs of homozygosity (ROH) are sizeable stretches of homozygous genotypes at consecutive polymorphic DNA marker positions, traditionally captured by means of genome-wide single nucleotide polymorphism...
A data infrastructure reference model with applications: towards realization of a ScienceTube vision with a data replication service
Reference model ScienceTube Data infrastructure Replication
2015/4/24
The wide variety of scientific user communities work with data since many years and thus have already a wide variety of data infrastructures in production today. The aim of this paper is thus not to c...
ISOcat Data Categories for Signed Language Resources
signed language resources metadata data categories standardization
2015/4/21
As the creation of signed language resources is gaining speed worldwide, the need for standards in this field becomes more acute. This paper discusses the state of the field of signed language resourc...
LAT Bridge: Bridging tools for annotation and exploration of rich linguistic data
LAT Bridge Bridging tools for annotation rich linguistic data
2015/4/9
We present a software module, the LAT Bridge, which enables bidirectional communication between the annotation and exploration tools developed at the Max Planck Institute for Psycholinguistics as part...
A Data Category Registry- and Component-based Metadata Framework
A Data Category Registry Metadata Framework
2015/4/8
We describe our computer-supported framework to overcome the rule of metadata schism. It combines the use of controlled vocabularies, managed by a data category registry, with a component-based approac...
A COLLECTIVE DATA GENERATION METHOD FOR SPEECH LANGUAGE MODELS
Language models crowdsourcing Amazon Mechanical Turk
2014/11/27
Recently we began using Amazon Mechanical Turk (AMT), an Internet marketplace, to deploy our spoken dialogue systems to large audiences for user testing and data collection purposes. This crowdsourcin...
N-gram Weighting: Reducing Training Data Mismatch in Cross-Domain Language Model Estimation
N-gram Weighting Data Mismatch Cross-Domain Language Model
2014/11/27
In domains with insufficient matched training data, language models are often constructed by interpolating component models trained from partially matched corpora. Since the n-grams from such corpora m...
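The interpolation of component models mentioned in this abstract can be illustrated with a minimal sketch. It shows only plain linear interpolation over unigram models, which is an assumed simplification; it is not the n-gram weighting method the paper proposes, and the names `unigram_model` and `interpolate`, the toy corpora, and the weights are all hypothetical.

```python
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities for one corpus (toy component model)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(models, weights):
    """Linearly interpolate component models: p(w) = sum_i weight_i * p_i(w).

    The weights are assumed to be non-negative and to sum to 1, so the
    result is again a probability distribution over the joint vocabulary.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    vocab = set().union(*models)  # union of the component vocabularies
    return {
        word: sum(lam * model.get(word, 0.0) for lam, model in zip(weights, models))
        for word in vocab
    }


# Hypothetical corpora: a small matched corpus and a partially matched one.
matched = unigram_model("a b a".split())
partially_matched = unigram_model("b c".split())
mixed = interpolate([matched, partially_matched], [0.75, 0.25])
```

In practice the weights are usually tuned on held-out matched data rather than fixed by hand; the values above are chosen only to keep the arithmetic easy to follow.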
Iterative Language Model Estimation: Efficient Data Structure & Algorithms
language modeling smoothing interpolation
2014/11/27
Despite the availability of better performing techniques, most language models are trained using popular toolkits that do not support perplexity optimization. In this work, we present an efficient dat...
SPEAKER VERIFICATION OVER HANDHELD DEVICES WITH REALISTIC NOISY SPEECH DATA
SPEAKER VERIFICATION HANDHELD DEVICES REALISTIC NOISY SPEECH DATA
2014/11/27
We study speaker verification for handheld devices assuming realistic, noisy test conditions and assuming no prior knowledge of the noise characteristics. Data were recorded in office (“quiet”) and...
Managing Fieldwork Data with Toolbox and the Natural Language Toolkit (illustrated)
Managing Fieldwork Data Natural Language Toolkit Toolbox
2009/6/4
This paper shows how fieldwork data can be managed using the program Toolbox together with the Natural Language Toolkit (NLTK) for the Python programming language. It provides background information a...