
Links 1 through 5 of 5 by Vance Stevens tagged corpora

WebCorp LSE is a fully tailored linguistic search engine designed to cache and process large sections of the web. WebCorp LSE offers:

• enhanced sentence boundary detection
• date identification
• 'boilerplate' removal
• collocation and other statistical analyses
• grammatical tagging
• language detection
• full pattern matching and wildcard search
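The listing does not say which statistics WebCorp LSE computes for collocation. As a rough illustration only, the sketch below scores the words found within a fixed window of a node word using the mutual-information (MI) score common in corpus tools, log2(observed / expected), where expected = f(node) × f(collocate) × span / N. The tokenizer and function names are hypothetical simplifications and are not WebCorp's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase tokenizer (assumption); real engines also handle
    # boilerplate removal, sentence boundaries and tagging first.
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens: list[str], node: str, window: int = 3,
               min_count: int = 2) -> list[tuple[str, int, float]]:
    """Return (collocate, observed co-occurrences, MI score) for words within
    `window` tokens either side of `node`, highest MI first."""
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Count every token in the window around this occurrence of the node.
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                observed[tokens[j]] += 1
    results = []
    for word, o in observed.items():
        if o < min_count:
            continue  # rare pairs give unreliable MI scores
        span = 2 * window
        expected = freq[node] * freq[word] * span / n
        results.append((word, o, math.log2(o / expected)))
    results.sort(key=lambda r: r[2], reverse=True)
    return results


tokens = tokenize("strong tea is nice. she drank strong tea. "
                  "powerful computers are fast. strong tea again")
print(collocates(tokens, "strong", window=1))
```

The `min_count` cut-off is there because MI heavily rewards pairs seen only once or twice. Tools often report a second measure, such as t-score or log-likelihood, alongside it.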



The following corpora were created by Mark Davies (Professor of Linguistics at Brigham Young University), and they offer a unique combination of queries, size, speed, and genre balance. They are used by more than 60,000 people each month (making them perhaps the most widely used corpora currently available), and they serve as the basis for an increasing number of publications by researchers from around the world.

The corpora let users investigate how language is used in ways that are not possible with Google, Google Books, or similar resources. They have many uses, including 1) finding out how native speakers actually speak and write, 2) looking at language variation and change, 3) finding the frequency of words, phrases, and collocates, and 4) designing authentic language teaching materials and resources.
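Use 3), finding the frequency of words and phrases, comes down to counting word sequences (n-grams) and normalizing the counts by corpus size. The minimal sketch below runs on a toy text. It shows only what those figures mean and is not how the BYU corpora are queried, since they provide their own web interfaces. The function names and the simple tokenizer are assumptions.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase tokenizer (assumption); real corpora use proper
    # tokenization and part-of-speech tagging.
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count every contiguous n-word sequence (n=1 gives word frequencies)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def per_million(count: int, corpus_size: int) -> float:
    """Normalize a raw count to occurrences per million words, so figures from
    corpora of different sizes can be compared."""
    return count * 1_000_000 / corpus_size


tokens = tokenize("The cat sat on the mat. The cat slept.")
print(ngram_counts(tokens, 2)[("the", "cat")])   # 2
print(ngram_counts(tokens, 1)[("the",)])         # 3
print(per_million(3, len(tokens)))               # "the" per million words
```

Per-million normalization is what makes a count from a 400-million-word corpus comparable to one from a much smaller corpus.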


Despite its drawbacks, the World Wide Web is a mine of language data of unprecedented richness and ease of access.

It is also the only viable source of "disposable" corpora built ad hoc for a specific purpose. These corpora are essential resources for language professionals who routinely work with specialized languages, often in areas where neologisms and new terms are introduced at a fast pace and where standard reference corpora have to be complemented by easy-to-construct, focused, up-to-date text collections.

While it is possible to construct a web-based corpus through manual queries and downloads, this process is extremely time-consuming.

The Perl scripts included in the BootCaT toolkit implement an iterative procedure to bootstrap specialized corpora and terms from the web, requiring only a list of "seeds" (terms that are expected to be typical of the domain of interest) as input.
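Beyond the seed-driven iteration, the description above does not spell out the toolkit's internals. The sketch below illustrates that loop in Python rather than Perl and rests on several assumptions. Seeds are combined into random tuples that serve as search queries, which is the general idea behind BootCaT's query step. The web search is a hypothetical `search` callable that returns (URL, text) pairs. New terms are ranked by an add-one smoothed frequency ratio against a reference corpus, a stand-in that is not necessarily the measure BootCaT itself uses. All names are hypothetical.

```python
import itertools
import random
import re
from collections import Counter
from typing import Callable, Iterable

# Hypothetical search interface: query terms in, (url, page_text) pairs out.
# In the real toolkit this step queried a web search engine.
SearchFn = Callable[[tuple[str, ...]], Iterable[tuple[str, str]]]


def tokenize(text: str) -> list[str]:
    # Naive lowercase tokenizer; stands in for boilerplate removal + tokenization.
    return re.findall(r"[a-z]+", text.lower())


def make_tuples(seeds: set[str], tuple_size: int, n_tuples: int,
                rng: random.Random) -> list[tuple[str, ...]]:
    """Pick up to n_tuples distinct random combinations of seeds as queries.
    Enumerating all combinations is fine for a short seed list only."""
    combos = list(itertools.combinations(sorted(seeds), tuple_size))
    rng.shuffle(combos)
    return combos[:n_tuples]


def extract_terms(tokens: list[str], ref_counts: Counter, ref_total: int,
                  top_k: int, min_count: int = 2) -> list[str]:
    """Rank words by how much more frequent they are here than in a reference
    corpus (add-one smoothed ratio of relative frequencies)."""
    counts = Counter(tokens)
    total = len(tokens)
    scored = [((c / total) / ((ref_counts[w] + 1) / ref_total), w)
              for w, c in counts.items() if c >= min_count]
    scored.sort(reverse=True)
    return [w for _, w in scored[:top_k]]


def bootstrap(seeds: Iterable[str], search: SearchFn, ref_counts: Counter,
              ref_total: int, iterations: int = 2, tuple_size: int = 2,
              n_tuples: int = 10, top_k: int = 3,
              seed: int = 0) -> tuple[dict[str, str], set[str]]:
    rng = random.Random(seed)
    terms = set(seeds)
    pages: dict[str, str] = {}  # url -> text; keying by URL drops duplicate hits
    for _ in range(iterations):
        for query in make_tuples(terms, tuple_size, n_tuples, rng):
            for url, text in search(query):
                pages.setdefault(url, text)
        tokens = [t for text in pages.values() for t in tokenize(text)]
        # Newly extracted terms become extra seeds for the next round.
        terms |= set(extract_terms(tokens, ref_counts, ref_total, top_k))
    return pages, terms
```

To try it offline, pass a function that filters a small dictionary of fake pages in place of a real search engine. The seeded `random.Random` keeps the query sample reproducible between runs.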


SACODEYL presents an innovative ICT-based solution for compiling oral corpora of teen talk and exploiting them pedagogically for language learning.
