Links 1 through 10 of 14 by Vance Stevens tagged textanalysis

Can create cloze exercises where 'academic' words are blanked.

Create Gap-Filling Exercises
Take any text, mark the words you want to learn, and it transforms your text into a cloze deletion test. Seen in Nik Peachey's blog.
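The tool's core idea (marked words become numbered blanks, with an answer key) can be sketched in a few lines of Python; the function name and word list below are illustrative, not the site's actual code.

```python
import re

def make_cloze(text, targets):
    """Replace each target word with a numbered blank and collect the answer key.

    Illustrative sketch only -- the real tool works from words you mark in a UI.
    """
    answers = []

    def blank(match):
        answers.append(match.group(0))
        return "({}) ______".format(len(answers))

    # Match any target word on a word boundary, case-insensitively.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, targets)) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(blank, text), answers

exercise, key = make_cloze("The analysis of the data requires a method.",
                           ["analysis", "method"])
# exercise: "The (1) ______ of the data requires a (2) ______."
# key: ["analysis", "method"]
```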

NEW: Enter entire texts at
We've added a new feature at, the alternative interface for COCA. You can now enter an entire text -- maybe a newspaper article that you've copied from a website, or something you've written -- and it will then give you detailed information about the words and phrases in the text. There's now no need to copy and paste individual words and phrases into the regular COCA interface -- just work from your original text.

Tom's famous site for all kinds of lexical text manipulation and analysis.

Vance's collection of textanalysis sites, to which I append what I found out about concordancing, etc.

WebCorp LSE is a fully tailored linguistic search engine built to cache and process large sections of the web. WebCorp LSE offers:

• enhanced sentence boundary detection
• date identification
• 'boilerplate' removal
• collocation and other statistical analyses
• grammatical tagging
• language detection
• full pattern matching and wildcard search
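Of the features listed, collocation counting is the easiest to illustrate. Below is a minimal window-based collocate counter (names and window size are my own choices, not WebCorp's implementation):

```python
import re
from collections import Counter

def collocates(text, node, window=2):
    """Count words appearing within `window` tokens of the node word.

    A minimal sketch of collocation counting; real tools add statistical
    association measures (MI, log-likelihood) on top of these raw counts.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            neighbours = tokens[lo:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

collocates("the cat sat on the mat and the cat slept", "cat")
# 'the' co-occurs with 'cat' twice within the window; 'sat' and 'slept' once each
```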

The following corpora were created by Mark Davies (Professor of Linguistics at Brigham Young University), and they offer a unique combination of queries, size, speed, and genre balance. They are used by more than 60,000 people each month (making them perhaps the most widely-used corpora currently available), and they serve as the basis for an increasing number of publications by researchers from throughout the world.

The corpora show how language is used, in ways that can't be done with Google, Google Books, or similar resources. The corpora have many different uses, including 1) finding out how native speakers actually speak and write, 2) looking at language variation and change, 3) finding the frequency of words, phrases, and collocates, and 4) designing authentic language teaching materials and resources.

Michigan Corpus of Academic Spoken English

Welcome to our NEW interface to the on-line, searchable part of our collection of transcripts of academic speech events recorded at the University of Michigan.

There are currently 152 transcripts (totaling 1,848,364 words) available at this site.

Browse the corpus according to specified speaker and speech attributes, returning quick file references.

Search the corpus for words or phrases in specified contexts, returning concordance results with references to files, full utterances, and speakers.
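Concordance results of this kind are conventionally displayed as keyword-in-context (KWIC) lines: the hit centred, with a few tokens of context either side. A minimal sketch (not MICASE's own code):

```python
import re

def kwic(text, word, width=3):
    """Return keyword-in-context lines: `width` tokens either side of each hit.

    Illustrative only; a real concordancer also reports file and speaker IDs.
    """
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == word.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

kwic("okay so the results of the study show the results", "results")
# one line per hit, e.g. "okay so the [results] of the study"
```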

Despite drawbacks, the World Wide Web is a mine of language data of unprecedented richness and ease of access.

It is also the only viable source of "disposable" corpora built ad hoc for a specific purpose. These corpora are essential resources for language professionals who routinely work with specialized languages, often in areas where neologisms and new terms are introduced at a fast pace and where standard reference corpora have to be complemented by easy-to-construct, focused, up-to-date text collections.

While it is possible to construct a web-based corpus through manual queries and downloads, this process is extremely time-consuming.

The Perl scripts included in the BootCaT toolkit implement an iterative procedure to bootstrap specialized corpora and terms from the web, requiring only a list of "seeds" (terms that are expected to be typical of the domain of interest) as input.
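The iterative procedure can be sketched as follows (in Python rather than the toolkit's Perl, with a placeholder `search` function standing in for a real search-engine call; all names here are illustrative assumptions):

```python
import itertools
import random

def seed_queries(seeds, tuple_size=3, n_queries=5):
    """BootCaT-style step 1: random tuples of seed terms become search queries."""
    combos = list(itertools.combinations(seeds, tuple_size))
    random.shuffle(combos)
    return [" ".join(c) for c in combos[:n_queries]]

def bootstrap(seeds, search, iterations=2):
    """Iteratively query the web and collect pages into a corpus.

    Sketch only: the real toolkit also extracts new candidate terms from each
    round's corpus and feeds them back in as seeds for the next iteration.
    `search` is a placeholder for an actual search-engine API call.
    """
    corpus = []
    for _ in range(iterations):
        for q in seed_queries(seeds):
            corpus.extend(search(q))
    return corpus
```

With four seed terms and tuples of three, each round issues four queries; a real run would download and clean the pages each query returns.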