Already a member? Log in

Sign up with your...

or

Sign Up with your email address

Add Tags

Duplicate Tags

Rename Tags

Share This URL With Others!

Save Link

Sign in

Sign Up with your email address

Sign up

By clicking the button, you agree to the Terms & Conditions.

Forgot Password?

Please enter your username below and press the send button.
A password reset link will be sent to you.

If you are unable to access the email address originally associated with your Delicious account, we recommend creating a new account.

1-10 of Juan Manuel Caicedo's Bookmarks with the Tag Bundle mining

This is a draft textbook on data analysis methods, intended for a one-semester course for advance undergraduate students who have already taken classes in probability, mathematical statistics, and linear regression.

Share It With Others!

Luke fork for Lucene 4.1 (mavenized)

Share It With Others!

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Share It With Others!

Mavuno is an open source, modular, scalable text mining toolkit built upon Hadoop. It supports basic natural language processing tasks (e.g., part of speech tagging, chunking, parsing, named entity recognition), is capable of large-scale distributional similarity computations (e.g., synonym, paraphrase, and lexical variant mining), and has information extraction capabilities (e.g., instance and semantic relation mining). It can easily be adapted to new input formats and text mining tasks.

Share It With Others!

VisualSearch.js enhances ordinary search boxes with the ability to autocomplete faceted search queries. Specify the facets for completion, along with the completable values for any facet. You can retrieve the search query as a structured object, so you don't have to parse the query string yourself.

Share It With Others!

Scrunch is an experimental Scala wrapper for Crunch, based on the same ideas as the Cascade project at Google, which created a Scala wrapper for FlumeJava.

Share It With Others!

ARFF (Attribute-Relation File Format) reader/writer library for Python.

Share It With Others!

ReVerb is a program that automatically identifies and extracts binary relationships from English sentences. ReVerb is designed for Web-scale information extraction, where the target relations cannot be specified in advance and speed is important.

The source code is available for download and it's released under the GPLv3 license.

Share It With Others!

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python.

Share It With Others!

A feed with the new versions of the Stack Exchange dump.

Share It With Others!