The Data and Methodology

The newspapers used for our research were all drawn from the Library of Congress’s “Chronicling America” collection. Chronicling America is an open-access storehouse of American newspapers from the eighteenth through twentieth centuries, and it is especially strong for the period of our research here (the 1910s). Some states are nonetheless significantly better represented than others. The Library of Congress also regularly updates the collection, so its coverage is constantly evolving. We began building our collection of stories in May of 2018 and stopped almost exactly one year later. Thus the data you’ll find here is based on what the collection looked like as of May 2019. We then used Tableau, a data visualization platform with a free public edition, to create visualizations from that data. I will embed images of these visualizations within my posts here, but they are always best viewed at the Tableau site, where you can hover over the visualization to draw out additional information and sometimes restrict the categories being viewed. To access the original Tableau visualization, just click on the embedded version here.

Below is a visualization of what the collection looked like as of May 2019 and the variation in issue availability from state to state. If viewed at Tableau’s site, you can float your cursor over each state to view its count of available newspaper issues.

Click on the image for a better view in Tableau

There were a total of 453,839 newspaper issues in the collection at the time. Obviously, there is a huge range in the newspaper issues available for each state, with the most coming from New Mexico at 16,538, and the fewest from Alabama at 47. To account for this variation when comparing variables between states, we created a toggle to “normalize” the data in relevant visualizations. I will explain that more fully when we encounter such an example.
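To give a rough sense of why normalizing matters, here is a minimal Python sketch. It assumes that "normalizing" means dividing a state's article count by its number of available issues; the exact formula behind the toggle is not described at this point, so treat that as an assumption. The function name `per_thousand_issues` is a hypothetical helper, and the article counts in the usage note are invented for illustration only. The issue counts are the two given above.

```python
# Issue counts for the two extremes named in the text (as of May 2019).
ISSUES = {"New Mexico": 16538, "Alabama": 47}


def per_thousand_issues(article_count, issue_count):
    """Return articles per 1,000 available issues.

    Assumption: this is one plausible normalization, not necessarily
    the one used by the toggle in the visualizations.
    """
    if issue_count <= 0:
        raise ValueError("issue_count must be positive")
    return 1000 * article_count / issue_count
```

For example, if two states each produced 5 articles (a made-up figure), `per_thousand_issues(5, ISSUES["New Mexico"])` and `per_thousand_issues(5, ISSUES["Alabama"])` would give very different rates, because the same raw count is spread over 16,538 issues in one case and only 47 in the other.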

Our method was relatively simple: we searched the database for every article that mentioned intoxicant cannabis between January 1, 1910, and December 31, 1919. We used a range of search terms, including alternate spellings in some cases: “cannabis,” “cannabis indica,” “Indian hemp,” “hashish,” “hasheesh,” “marihuana,” “mariguana,” “marijuana,” “locoweed.” We did not search “hemp” alone because the overwhelming majority of such articles were related to fiber rather than intoxicant cannabis. The term “hemp” did often turn up in articles that also contained our main search terms, however, and thus you will find plenty of references to “hemp” in the data.
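For readers who want to try something similar on newspaper text they have already downloaded, the term list above could be applied roughly as follows. This is only a sketch, not a record of how the database search itself was run: the function name `matched_terms` is a hypothetical helper, and the choices to ignore case and to tolerate line breaks inside multi-word terms are assumptions.

```python
import re

# The search terms listed in the text. "hemp" alone is deliberately absent.
SEARCH_TERMS = [
    "cannabis", "cannabis indica", "Indian hemp", "hashish", "hasheesh",
    "marihuana", "mariguana", "marijuana", "locoweed",
]


def _term_pattern(term):
    # Allow any whitespace (including a line break) between the words of a term.
    return r"\s+".join(re.escape(word) for word in term.split())


# Longest terms first, so "cannabis indica" is preferred over plain "cannabis".
_PATTERN = re.compile(
    r"\b("
    + "|".join(_term_pattern(t) for t in sorted(SEARCH_TERMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def matched_terms(text):
    """Return the sorted, lowercased set of search terms found in text."""
    found = set()
    for match in _PATTERN.finditer(text):
        # Collapse internal whitespace so "Indian\nhemp" reports as "indian hemp".
        found.add(" ".join(match.group(1).lower().split()))
    return sorted(found)
```

An article that mentions only the fiber crop ("the hemp harvest") returns an empty list, while one that mentions "Indian hemp" is picked up, which mirrors the reasoning for leaving bare "hemp" out of the search.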

There were some false hits and some other idiosyncrasies with the database that I’ll touch on where appropriate, but in the end our collection of cannabis stories included 1,225 legible articles. For each article we broke down its contents into key components: the location of publication, the words used for cannabis, the effects described, the demographics of any users or sellers of the drug, and so forth.

Certain categories, of course, required some subjective decisions on our part. For example, when we began categorizing the effects attributed to cannabis, we encountered dozens of different adjectives from story to story. This resulted in a huge list of “effects,” nearly 150 in all, that was far too unwieldy to be of any use. But because the vast majority of these were just variations on a few key themes, we were able to go back and consolidate them into a list of nine umbrella effects ranging from “altered perception” and “altered emotion” to “addiction” and “violence” (more on this later). We performed a similar procedure with respect to demographic information, consolidating our various descriptions into a dozen or so categories. Another group of researchers might have categorized some of these details somewhat differently, but we did our best to be as objective as possible in all of our decisions. In all cases, the final decisions rested with me, and thus any deficiencies in them are my responsibility.
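The consolidation step described above amounts to a lookup from each raw description to an umbrella category. Here is a minimal Python sketch of that idea. The four umbrella names come from the list above; the raw descriptions on the left-hand side of the mapping are invented examples rather than entries from our actual list, and `consolidate` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical raw descriptions mapped to umbrella effects named in the text.
UMBRELLA = {
    "hallucinations": "altered perception",
    "visions": "altered perception",
    "euphoria": "altered emotion",
    "habit-forming": "addiction",
    "murderous frenzy": "violence",
}


def consolidate(raw_effects):
    """Count umbrella effects and collect descriptions with no mapping yet."""
    counts = Counter()
    unmapped = []
    for effect in raw_effects:
        key = effect.strip().lower()
        if key in UMBRELLA:
            counts[UMBRELLA[key]] += 1
        else:
            # Left for a human to review; this is where the judgment calls live.
            unmapped.append(effect)
    return counts, unmapped
```

Keeping the unmapped descriptions in a separate list, rather than forcing them into a category, makes the subjective decisions visible: each one has to be assigned by hand before it is counted.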

Once we settled on the final categories, we began creating the Tableau visualizations that you will see featured throughout these pages, along with an interactive map that you may manipulate on your own to pursue your own research questions.
