Research Guides: Text and Data Mining: TDM Tools

Vendor Tools

The following library providers have built-in tools for conducting text analysis on their content. Access may be restricted depending on WashU subscriptions and copyright.

HathiTrust Research Center Analytics
Supports large-scale computational analysis of the works in the HathiTrust Digital Library.

HTRC Algorithms are web-based, click-and-run tools to perform computational text analysis on volumes in the HathiTrust Digital Library. The algorithms can help you explore, analyze, and visualize public worksets or those you have created.
For more advanced users, the HTRC Data Capsules provide secure computing environments for performing researcher-driven text analysis on the HathiTrust corpus.
JSTOR Constellate
Perform text analysis on JSTOR and Portico content in a secure Jupyter notebook environment.
Also includes beginner and intermediate series of Jupyter notebooks covering word frequency analysis, topic modeling, sentiment analysis, and other common text mining techniques.
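
As a rough illustration of the word frequency analysis mentioned above, here is a minimal Python sketch. It is not taken from the Constellate notebooks; the function name, the simple regular-expression tokenizer, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n most common lowercase word tokens in text."""
    # Assumed tokenizer: runs of letters and apostrophes, after lowercasing.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)


# Hypothetical sample text, used only to show the shape of the output.
sample = "The cat sat on the mat. The mat was flat."
print(word_frequencies(sample, top_n=2))  # [('the', 3), ('mat', 2)]
```

Real projects usually also remove stop words and punctuation variants before counting, which this sketch does not attempt.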

LC for Robots
The Library of Congress provides machine-readable access to its digital collections via APIs and built-in tools.
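
To give a sense of what machine-readable access can look like, the sketch below builds a search URL that requests JSON from loc.gov and, optionally, fetches it. The endpoint and the `q`, `fo`, and `sp` query parameters reflect our understanding of the public loc.gov JSON API and should be checked against the Library of Congress documentation; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_loc_search_url(query, page=1):
    """Build a loc.gov search URL that asks for a JSON response."""
    # Assumed parameters: q = search terms, fo = output format, sp = page number.
    params = {"q": query, "fo": "json", "sp": page}
    return "https://www.loc.gov/search/?" + urlencode(params)


def fetch_loc_results(query, page=1):
    """Fetch one page of search results as a Python dict (needs network access)."""
    with urlopen(build_loc_search_url(query, page)) as response:
        return json.load(response)


print(build_loc_search_url("baseball cards"))
# https://www.loc.gov/search/?q=baseball+cards&fo=json&sp=1
```

Calling `fetch_loc_results("baseball cards")` would make a live request, so space out repeated calls and follow the Library's published guidance on automated access.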

Tools & Tutorials

Programming Historian
A collection of peer-reviewed tutorials teaching a wide variety of digital tools and techniques, suitable for novice to advanced-intermediate programmers.
JSTOR's Constellate platform
A series of novice to advanced-intermediate Jupyter notebooks covering introductory programming with Python and common text mining tasks.
Voyant Tools
Voyant Tools is a web-based reading and analysis environment for digital texts, designed for users without programming skills.

Building Legal Literacies for Text Data Mining by Rachael Samberg (Editor); Timothy Vollmer (Editor)
ISBN: 9780999797044
Publication Date: 2021-07-01
This book explores the legal literacies covered during the virtual Building Legal Literacies for Text Data Mining Institute, including copyright (both U.S. and international law), technological protection measures, privacy, and ethical considerations. It describes in detail how we developed and delivered the 4-day institute, and also provides ideas for hosting shorter literacy teaching sessions. Finally, we offer reflections and take-aways on the Institute.
Text Mining with R by Julia Silge; David Robinson
ISBN: 9781491981658
Publication Date: 2017-07-18
Much of the data available today is unstructured and text-heavy, making it challenging for analysts to apply their usual data wrangling and visualization tools. With this practical book, you'll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr. You'll learn how tidytext and other tidy tools in R can make text analysis easier and more effective. The authors demonstrate how treating text as data frames enables you to manipulate, summarize, and visualize characteristics of text. You'll also learn how to integrate natural language processing (NLP) into effective workflows.
Text Analytics with Python by Dipanjan Sarkar
ISBN: 1484243536
Publication Date: 2019-05-22
Leverage Natural Language Processing (NLP) in Python and learn how to set up your own robust environment for performing text analytics. This second edition has gone through a major revamp and introduces several significant changes and new topics based on the recent trends in NLP. You'll see how to use the latest state-of-the-art frameworks in NLP, coupled with machine learning and deep learning models for supervised sentiment analysis powered by Python to solve actual case studies. Start by reviewing Python for NLP fundamentals on strings and text data and move on to engineering representation methods for text data, including both traditional statistical models and newer deep learning-based embedding models.