We have been gathering more and more data on the organizations in the Hall Hoag Collection throughout the year, and it is starting to show results. We now have locations for over 1,700 organizations in the collection, and you can see them mapped out by clicking the image below. While exploring the map, click any of the dots to learn more about an organization; each entry includes a hyperlink to the organization's bio page, along with its start date, end date, and category.
It has been one year since I started as the Project Director for the Hall-Hoag grant, and we have accomplished a lot over that first year. It has been a very fun and exciting journey so far, and I am looking forward to the next two years. Here are some of the year-one accomplishments:
Processing: ~60 new boxes were created from unprocessed material. This concluded the work on unprocessed material.
Collation: ~1,100 boxes of material were collated. When the collection was first processed, materials were arranged alphabetically within each box, but materials from each organization remained scattered across multiple boxes (for example, materials related to the National States’ Rights Party are currently in boxes 2C, 3C, 4B, 7B, 12B, 18B, 21B, 23B, 26B, 29B, etc.). The goal of the collation process is to bring together all materials from each organization and then alphabetize the entirety of Part II so that, for example, the first box begins with “A Call to Resist” and the last box ends with “Zygon”. This is a critical step in processing the collection: it will eliminate the need for researchers to search for an organization across multiple boxes and will make it easier for staff to guide researchers in using the collection.
Research: 1647 organizations have been researched by student assistants. They found a variety of information on the organizations including: locations, exist dates, biographical histories, related archival collections, organization members and variant forms of the organization names.
Outreach: Announcements about the grant have been published on listservs and in printed newsletters. A blog has been created and updated at least once a week with the work done on this project, and a presentation was given at the DH: The Next Generation conference.
Organization Names: We created a list of 35,000 unique organizations in Hall Hoag Part II. To create the list, we started with the inventories that were created when processing the collection, which, when combined, represented about 180,000 lines in an Excel spreadsheet. Many of the organizations were repeated throughout the inventory, and the repeats were weeded out using a combination of automated and manual processes.
EAC-CPF Standards: Throughout the first year of the grant, the project team worked to establish a set of minimum requirements for EAC-CPF records. Since there are 35,000 organizations in the collection and the goal is to create an EAC-CPF record for each one, it was important to keep the minimum data requirement low. Also, many of the organizations in the collection are obscure, which makes it difficult to create robust EAC records. It was equally important to set a limit on the maximum amount of information to be collected: some organizations are very well known, and without such standards, students and staff creating EAC records could easily spend too much time on a single record.
- Minimum Requirements:
  - All Control fields
  - Organization name
  - One exist date
  - Location – to the state level at least
- Maximum Requirements (in addition to the minimum fields):
  - Biographical history
  - List of past members (founders)
  - List of all variant forms of the organization name
  - List of related archival collections
  - List of all geographic locations
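The minimum requirements above amount to a completeness check that can run before a record is exported. Below is a minimal Python sketch of that check; the `OrgRecord` fields and the `missing_minimum_fields` helper are hypothetical names for illustration, not the project's FileMaker schema, and `record_id` simply stands in for "all Control fields".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OrgRecord:
    """Hypothetical record shape; field names are illustrative only."""
    record_id: str = ""                  # stands in for the required control fields
    name: str = ""
    exist_dates: List[str] = field(default_factory=list)
    state: Optional[str] = None          # location, at least to the state level
    # "Maximum" fields: collected when available, never required
    biog_hist: str = ""
    members: List[str] = field(default_factory=list)
    variant_names: List[str] = field(default_factory=list)
    related_collections: List[str] = field(default_factory=list)
    locations: List[str] = field(default_factory=list)


def missing_minimum_fields(rec: OrgRecord) -> List[str]:
    """Return the minimum-requirement fields that are still empty."""
    missing = []
    if not rec.record_id.strip():
        missing.append("control")
    if not rec.name.strip():
        missing.append("name")
    if not rec.exist_dates:
        missing.append("exist date")
    if not rec.state:
        missing.append("state-level location")
    return missing
```

A record with a name and one date but no location would come back as `["state-level location"]`; the optional "maximum" fields never block a record.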
EAC-CPF Tool: A tool was built using FileMaker Pro to create EAC-CPF records. The tool has been customized to automatically export valid EAC-CPF records for each organization based on the data that is entered. Fields include name, name authority, locations, dates, members, related archival collections and more. This tool will greatly increase the efficiency of creating EAC-CPF records, and staff at Brown will not have to hand-code any XML. Students can be trained to enter data into FileMaker much more easily than they can be trained on EAC-CPF, which will allow more people to contribute to the creation of EAC-CPF records. The tool can also “export all” records in one batch, which lets us make wholesale changes to the EAC-CPF coding (to fix a mistake or apply an update). Rather than going through each record and editing it, we can make the change once and then re-export every record in a batch.
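To give a sense of what such an export produces, here is a rough Python sketch (standard library only) that turns the minimum-requirement fields into a skeletal EAC-CPF 2010 record. The function name, parameters, and placeholder values are hypothetical, and the output has not been validated against the EAC-CPF schema; the project's FileMaker export may structure its records differently.

```python
import xml.etree.ElementTree as ET
from typing import Optional

EAC_NS = "urn:isbn:1-931666-33-4"  # EAC-CPF 2010 namespace
ET.register_namespace("", EAC_NS)  # serialize EAC elements without a prefix


def _q(tag: str) -> str:
    """Qualify a tag name with the EAC-CPF namespace."""
    return f"{{{EAC_NS}}}{tag}"


def _add(parent: ET.Element, tag: str, text: Optional[str] = None, **attrs: str) -> ET.Element:
    """Append a namespaced child element with optional text and attributes."""
    el = ET.SubElement(parent, _q(tag), attrs)
    if text is not None:
        el.text = text
    return el


def build_eac_cpf(record_id: str, name: str, exist_date: str, state: str,
                  agency: str, created: str) -> str:
    """Build a minimal corporate-body record from the minimum-requirement fields."""
    root = ET.Element(_q("eac-cpf"))

    # Control section: record identity and maintenance information
    control = _add(root, "control")
    _add(control, "recordId", record_id)
    _add(control, "maintenanceStatus", "new")
    _add(_add(control, "maintenanceAgency"), "agencyName", agency)
    event = _add(_add(control, "maintenanceHistory"), "maintenanceEvent")
    _add(event, "eventType", "created")
    _add(event, "eventDateTime", created, standardDateTime=created)
    _add(event, "agentType", "machine")
    _add(event, "agent", "batch export script")

    # Description of the organization itself
    cpf = _add(root, "cpfDescription")
    identity = _add(cpf, "identity")
    _add(identity, "entityType", "corporateBody")
    _add(_add(identity, "nameEntry"), "part", name)
    description = _add(cpf, "description")
    _add(_add(description, "existDates"), "date", exist_date, standardDate=exist_date)
    _add(_add(description, "place"), "placeEntry", state)
    return ET.tostring(root, encoding="unicode")
```

Keeping the template in one function is what makes the batch re-export practical: change the function once, and every regenerated record picks up the change.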
We are out at the Library Collections Annex now! All of the material from the Hall Hoag Collection is stored at the Annex, and we will be working through all 1,600 boxes this summer. The goal of this stage is to collate all of the folders containing items from the same organization. The boxes are currently in the order in which the items were shipped to Brown; we will take the items out of the original boxes and place them in boxes labeled with letters of the alphabet. As a group, we should be able to refile 25-30 boxes each day. The work will include checking the original boxes against a folder inventory to ensure we have the correct boxes (and that they are inventoried), refiling the material into new boxes in alphabetical order, and then updating the old inventories with the new box numbers for each folder. We will have 8 students working full time in 3 teams all summer, and we hope to get through the entire collection. This will also give us the opportunity to see all of the material as we refile it, so we should have some great highlights to share on the blog.
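The refiling step is essentially a sort-and-renumber over the folder inventory. The sketch below assumes a hypothetical inventory of (organization, folder ID, original box) rows and an assumed fixed number of folders per box; it uses sequential box numbers where the team uses lettered boxes, and returns the folder-to-new-box mapping that would be written back into the old inventories.

```python
from typing import Dict, List, Tuple

# Hypothetical inventory row: (organization name, folder ID, original box)
Folder = Tuple[str, str, str]


def collate(folders: List[Folder], folders_per_box: int = 40) -> Dict[str, int]:
    """Order folders alphabetically by organization and assign new box numbers.

    folders_per_box is an assumed capacity; returns folder ID -> new box number.
    """
    # Sorting on the organization name brings an organization's folders together,
    # no matter how many original boxes they were scattered across.
    ordered = sorted(folders, key=lambda f: (f[0].casefold(), f[1]))
    return {folder_id: i // folders_per_box + 1
            for i, (_, folder_id, _) in enumerate(ordered)}


inventory = [
    ("Zygon", "f3", "2C"),
    ("A Call to Resist", "f1", "7B"),
    ("National States' Rights Party", "f2", "2C"),
    ("National States' Rights Party", "f4", "12B"),
]
new_boxes = collate(inventory, folders_per_box=2)
```

With two folders per box, "A Call to Resist" lands first and "Zygon" last, and both National States' Rights Party folders end up adjacent even though they started in boxes 2C and 12B.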
We have finished researching over 1,500 organizations in the collection and found locations for 900 of them. Massachusetts and New York have the highest numbers of organizations, and New York, Massachusetts, Illinois and California combined account for more than half of the organizations in this subset. Although these results cover only 900 organizations, I would not be surprised if the trends held throughout the collection. It is also interesting that so far we have found material from 45 different states and 25 organizations with international locations. The charts below contain the data that we have found so far.
After attending the DH: The Next Generation conference a few weeks back, I decided to pull together some data from Hall Hoag Collection Part I. We have a list of each organization in Part I and how many folders we have for each one, and I organized that data by organization category (the 99 categories were chosen by Gordon Hall). What I found was very interesting. For example, Christian Religious Right groups are by far the largest category in Part I, with 643 organizations and 2,995 folders of material. This is even more surprising given that Christian Identity groups (111 organizations, 311 folders) and Catholic Traditionalism groups (122 organizations, 336 folders) are counted separately from the larger Christian Religious Right category in Hall Hoag Part I. The next largest category is Single Issue Focus Left, with 375 organizations and 1,008 folders. The size of that group reflects the nature of many extremist groups: there are many organizations, and many of them have a very specific focus. The entire list is included below. It is a bit hard to read, but if you click the image it will enlarge, or you can download a PDF version of the list here: Hall Hoag Part I Data Analysis.
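The tally described above is a group-by over per-organization rows. A minimal Python sketch, assuming hypothetical rows of (organization, category, folder count); the category and organization names in the test are placeholders, not collection data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize_by_category(rows: Iterable[Tuple[str, str, int]]) -> Dict[str, Tuple[int, int]]:
    """rows: (organization, category, folder count) -> {category: (organizations, folders)}."""
    orgs = defaultdict(set)
    folders = defaultdict(int)
    for org, category, n_folders in rows:
        orgs[category].add(org)          # a set, so repeated rows don't inflate the org count
        folders[category] += n_folders
    return {cat: (len(orgs[cat]), folders[cat]) for cat in folders}


def largest_first(summary: Dict[str, Tuple[int, int]]) -> List[str]:
    """Categories ordered by folder count, largest first."""
    return sorted(summary, key=lambda cat: summary[cat][1], reverse=True)
```

Dividing folders by organizations on the totals quoted above gives roughly 4.7 folders per organization for Christian Religious Right groups and about 2.7 for Single Issue Focus Left groups, which is one way to compare depth of coverage across categories.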
Periodically, I will be sharing the experiences of students working hands-on with Hall-Hoag Part II. Below is a report from Aimie Kawai, who started working with the Hall-Hoag collection in January 2013. Kawai is a third-year student at Brown studying history.
———————————————————————————————————————————————————————-
I came into this project as a student assistant this semester. As a history concentrator at Brown with a particular interest in social conflicts in the past century, I had a feeling that I would find this project both engaging and beneficial.
During the two months that I have worked so far, I have been pleasantly surprised by the extent to which my initial feelings about this project were correct. We have been wrapping up the inventory part of the project, which means I have spent much of my time digging through old boxes and sorting out the relevant material. The very first box I went through was a surprise: it contained so much garbage, including ice cream containers, ripped-up pieces of parchment paper, and toilet paper boxes.[1] Since then, however, the boxes have been interesting in a way that is more pertinent to the project. I’ve looked at materials from so many different movements and seen thousands of people’s individual passions and views represented in posters and organizations. It’s fascinating how in just one afternoon of work, I can sort through contrasting opinions on abortion, communism, miscegenation, and more.
Due to the nature of my interests, this project has also been applicable to my studies here at Brown. It has been really remarkable to pick up a pamphlet and realize that it is material from an organization I talked about in class earlier that day. Or to be sorting through papers and pull out information on posse comitatus, which I read about for a class the week before. It benefits both my interest in the project and in my classes, encouraging me to pay more attention so that I can find more ties and overlaps in these two aspects of my life.
It feels strange to already be wrapping up this part of the project, when I feel like I just entered the scene. However, I’m looking forward to continuing my involvement and beginning the next phase.
– Aimie Kawai
[1] Note: All of the material shipped to Brown for Hall-Hoag Part II was packaged after Gordon Hall passed away. Based on the contents of the boxes, it seems that they shipped nearly every item in Mr. Hall’s house.
As a deliverable for this grant, we have to create EAC-CPF records for each organization in the collection. The first step in that process is figuring out what organizations are in the collection.
When the materials were shipped to us, they were completely unorganized. As part of organizing the materials into new folders and archival boxes, we also created an Excel inventory of the organizations that came out of each shipping box (see image above).
After years of work, we ended up with inventories of nearly 800 boxes that, when combined, represented about 180,000 lines in an Excel spreadsheet. As you can imagine, many of the organizations are repeated throughout the collection.
We combined the organization name column from all of the inventories (Column D above) into a new spreadsheet. The version of Excel we were using could handle only about 65,000 rows per sheet, so because of the size of our data we had to start with three spreadsheets. After we pulled all of the organizations out of the original spreadsheets, we alphabetized the new list, which let us see which organizations were duplicated throughout the collection.
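The column-gathering and alphabetizing step could also be scripted outside Excel, which sidesteps the row limit entirely. A minimal sketch, assuming each inventory has been exported to CSV with a header row and the organization name in column D (index 3); the box inventories below are made-up examples, and the real inventories' column layout may differ.

```python
import csv
import io
from typing import Iterable, List, TextIO


def collect_org_names(inventories: Iterable[TextIO], column: int = 3,
                      has_header: bool = True) -> List[str]:
    """Pull one column (index 3 = spreadsheet column D) from every inventory and sort it."""
    names: List[str] = []
    for inventory in inventories:
        reader = csv.reader(inventory)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            if len(row) > column and row[column].strip():
                names.append(row[column].strip())
    # Case-insensitive alphabetical order puts repeated names next to each other.
    return sorted(names, key=str.casefold)


# Hypothetical two-box example
box_a = io.StringIO("Box,Folder,Date,Organization\n"
                    "2C,1,1965,Zygon\n"
                    "2C,2,1966,A Call to Resist\n")
box_b = io.StringIO("Box,Folder,Date,Organization\n"
                    "7B,1,1970,Zygon\n")
combined = collect_org_names([box_a, box_b])
```

In practice the `io.StringIO` objects would be replaced by opened CSV files, one per inventory.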
Next, we had to weed out the duplicates. Rather than going through each line in Excel and deleting the duplicates by hand, we used a few tricks. The first step was using the Excel function “Delete All Duplicates.” With this function alone we were able to cut the list from 180,000 lines down to about 75,000. However, it only deleted lines that were exactly the same.
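A rough Python analogue of that first, exact-match pass, offered as a sketch rather than a reproduction of Excel's behavior (for example, this version compares strings case-sensitively):

```python
from typing import Iterable, List


def drop_exact_duplicates(names: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each name; only identical strings are merged."""
    # dict preserves insertion order (Python 3.7+), so an alphabetized list stays alphabetized.
    return list(dict.fromkeys(names))
```

Because only identical strings collapse, `"Zygon (newsletter)"` survives alongside `"Zygon"`, which is the kind of near-duplicate the later steps had to handle.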
When looking through the data, we saw that many duplicates still existed because of misspellings and extra information added to the organization's name. Excel could not recognize these as the same organization and therefore could not automatically delete them.
In the example above you can see that all of these lines are from the same organization, but the data in parentheses makes each one unique. To clean this up, we used the Excel function “Text to Columns” with “(” as the delimiter. In other words, Excel took everything after a “(” symbol and moved it into a separate column. Once the data inside the parentheses was separated out, we ran the duplicate-removal function again, which cut our list from 75,000 down to about 50,000.
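The same split-and-deduplicate step, sketched in Python under the assumption that everything from the first “(” onward can be discarded:

```python
from typing import Iterable, List


def strip_parenthetical(name: str) -> str:
    """Keep only the text before the first "(", as a split on that delimiter would."""
    return name.split("(", 1)[0].strip()


def dedupe_base_names(names: Iterable[str]) -> List[str]:
    """Strip parenthetical notes, then drop exact duplicates (first occurrence wins)."""
    return list(dict.fromkeys(strip_parenthetical(n) for n in names))
```

As with the spreadsheet approach, a name that legitimately contains parentheses would be truncated, so the discarded text is worth spot-checking.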
However, there was still a lot of weeding to be done. In the example above you can see how many different spellings and formats the Minute Women of the USA had over the years of inventorying the collection.
There was no common factor we could use to separate out these organizations, so we chose to use Google Refine to help clean up the lists. Google Refine is a tool that can be used in many different ways to clean up data. As you can see in the screenshot below, after we uploaded the spreadsheet into Google Refine, it was able to point out lines of data that are similar.
Using this tool, you can select which version of the organization's name you would like to use, and Google Refine then updates the other versions of that name. This process cut the list down to about 40,000 lines. The next step was to go through each line of the spreadsheet manually and delete the remaining duplicates.
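Google Refine's clustering offers several methods; one of the simplest, key collision on a "fingerprint" of each value, can be approximated as below. This is a sketch of that idea, not Refine's exact implementation. In particular, the hypothetical `canonicalize` helper picks the most frequent spelling automatically, whereas in the workflow described here a person chose the preferred version.

```python
import re
import unicodedata
from collections import Counter, defaultdict
from typing import Dict, List


def fingerprint(name: str) -> str:
    """Key that ignores case, punctuation, accents, and word order."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^\w\s]", "", ascii_name.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster(names: List[str]) -> Dict[str, List[str]]:
    """Group distinct spellings that share a fingerprint; singletons are dropped."""
    groups = defaultdict(set)
    for name in names:
        groups[fingerprint(name)].add(name)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


def canonicalize(names: List[str]) -> List[str]:
    """Replace every spelling with the most frequent one in its cluster."""
    counts = Counter(names)
    best: Dict[str, str] = {}
    for name in names:
        key = fingerprint(name)
        if key not in best or counts[name] > counts[best[key]]:
            best[key] = name
    return [best[fingerprint(name)] for name in names]


names = [
    "Minute Women of the USA",
    "Minute Women of the U.S.A.",
    "USA, Minute Women of the",
    "Minute Women of the USA",
    "Zygon",
]
```

Here all three Minute Women of the USA variants share the key `"minute of the usa women"`, so they form one cluster, while misspellings (as opposed to punctuation or word-order differences) would still need a fuzzier method or the manual pass.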
In all, the process took about 60 hours to complete and we ended up with about 35,000 unique organization names. The next step will be taking this list of names and creating an authoritative version for each.
As part of this project we are going to be creating EAC-CPF records for each organization contained in the collection.
“Encoded Archival Context – Corporate bodies, Persons, and Families (EAC-CPF) primarily addresses the description of individuals, families and corporate bodies that create, preserve, use and are responsible for and/or associated with records in a variety of ways…It supports the linking of information about one agent to other agents to show/discover the relationships amongst record-creating entities, and the linking to descriptions of records and other contextual entities. EAC-CPF is a communication structure for archival contextual information for individuals, corporate bodies and families. It supports the exchange of ISAAR (CPF) compliant authority records.”[1]
EAC is defined as a document type definition (DTD) as well as in an XML Schema and a RELAX NG schema. EAC elements reflect the ISAAR (CPF) and ISAD(G) standards, both managed by the International Council on Archives. [2]
To save time in creating an XML file for each of the organizations in the collection, we have created a FileMaker Pro database that stores information on each organization and has been customized to export this data as valid EAC-CPF XML. All of the information will either be entered manually or imported from other online sources (we are still exploring our options here). The FileMaker Pro database will track information about the background of the organizations, members of the organizations, publications of the organizations, and other archives that have information on the organizations. We determined what information to track based on what is most readily available to our staff within the collection. We anticipate that most of the data in the FileMaker Pro database will be hand entered (meaning not through a script or importing), which in itself is extremely time consuming. However, being able to export the data as valid EAC XML will save us an immeasurable amount of time.
Below are some screen shots of our FileMaker Pro database. For more information about this work contact Daniel Johnson (Daniel_johnson_1@brown.edu). The database has not been completed and will be worked on throughout the first two years of the grant, but the screen shots below will give a general idea of what will be used to create the EAC-CPF.
[1] http://eac.staatsbibliothek-berlin.de/about/ts-eac-cpf.html
[2] http://en.wikipedia.org/wiki/Encoded_Archival_Context
In the first tab of the database we collect information on the background of the organization and enter authorized versions of the organization’s name. Included here are start dates, end dates, locations, organizational histories, categories, and subjects. The data in this tab describe what the organization is and provide evidence to help identify it. In the other tabs we establish the relationship of this organization to other entities.
In the second tab we establish relationships between the organization and people who were members of it, by looking through the materials (who wrote the articles, who is mentioned as a member) and doing additional research. We can record what a person’s role in the organization was and when they were involved. This helps create connections between different organizations within the collection (many people are members of multiple organizations) and with outside collections. By using authorized forms of a person’s name (from the Library of Congress), these names can eventually be cross-referenced against archival holdings at other institutions as well as other collections held by Brown University.
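One way to see how shared members connect organizations is to invert the membership list. A minimal sketch, assuming hypothetical (authorized person name, organization) pairs; it only works if the same person is always entered under the same authorized name form, which is the point of using authority headings.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def shared_member_links(memberships: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], Set[str]]:
    """memberships: (authorized person name, organization).

    Returns {(org_a, org_b): people who belonged to both}, with org_a < org_b.
    """
    orgs_by_person = defaultdict(set)
    for person, org in memberships:
        orgs_by_person[person].add(org)
    links = defaultdict(set)
    for person, orgs in orgs_by_person.items():
        # Every pair of organizations this person belonged to gets a link.
        for org_a, org_b in combinations(sorted(orgs), 2):
            links[(org_a, org_b)].add(person)
    return dict(links)
```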
This database is also structured to track the publications in the collection that each organization has issued. Due to the size of the collection (700,000 items), it is unlikely that we will be able to input the titles of all of the publications; however, simply entering the dates of the publications will create useful connections. In many cases the organizations in this collection are obscure and very difficult to research, but by looking at when things were published, the major dates of existence for an organization can be established. This also helps establish connections between different organizations. For example, once dates have been entered, researchers can explore questions like “what was being published in 1958?” and “who was publishing it?” on the extreme right and left. This makes issues that were topical at a particular time in history easier to explore across a wide variety of organizations.
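Publication dates support both uses described above: an index from year to publishing organizations, and a rough span of activity for each organization. A sketch, assuming hypothetical (organization, year) pairs; the earliest-to-latest span is only an estimate of existence dates, since an organization may predate or outlive its surviving publications.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Publication = Tuple[str, int]  # hypothetical (organization, year) pair


def index_by_year(pubs: List[Publication]) -> Dict[int, List[str]]:
    """Answer "what was being published in year Y, and by whom?"."""
    by_year = defaultdict(set)
    for org, year in pubs:
        by_year[year].add(org)
    return {year: sorted(orgs) for year, orgs in by_year.items()}


def active_span(pubs: List[Publication], org: str) -> Optional[Tuple[int, int]]:
    """Earliest and latest publication year for one organization, if any."""
    years = [year for o, year in pubs if o == org]
    return (min(years), max(years)) if years else None
```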
This tab will be used to list other archives that have collections created by or about organizations in the Hall-Hoag collection. Although this tab will not be often used (due to the size of the collection and the difficulty in finding collections online) it was developed because in some cases the connections will be obvious and the information would be very valuable for researchers.
The final tab in the database is used to create the XML data. Clicking the “Create EAC-CPF” button fills the text box below with a valid EAC-CPF file built from the data entered into the database. The text in this field can be copied and pasted into Oxygen to be viewed, edited and saved. In addition, the database has scripts that can create EAC-CPF records for every organization in the database and save them to a specified location on a shared drive at Brown University. This allows us to make wholesale changes to our EAC-CPF code and update every record en masse. A PDF version of some sample data is attached.
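The batch export amounts to regenerating one file per record from the current template. A minimal Python sketch, assuming records with a hypothetical `record_id` key and a caller-supplied `build_xml` function; the toy builder and temporary directory stand in for the real template and shared drive. Because every run overwrites the previous files, a fix to the builder propagates to all records at once.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Mapping
from xml.sax.saxutils import escape


def export_all(records: Iterable[Mapping[str, str]],
               build_xml: Callable[[Mapping[str, str]], str],
               out_dir: Path) -> int:
    """Write one XML file per record, overwriting any earlier export."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for record in records:
        path = out_dir / f"{record['record_id']}.xml"
        path.write_text(build_xml(record), encoding="utf-8")
        written += 1
    return written


def toy_builder(record: Mapping[str, str]) -> str:
    """Placeholder template; a real one would emit full EAC-CPF."""
    return f"<eac-cpf><recordId>{escape(record['record_id'])}</recordId></eac-cpf>"


demo_dir = Path(tempfile.mkdtemp())
count = export_all([{"record_id": "hh-0001"}, {"record_id": "hh-0002"}], toy_builder, demo_dir)
```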
Although most of the collection has been processed, we spent the end of last year tying up some loose ends. There were about 20 boxes of unprocessed material that we still had to accession and organize. Materials were taken out of the shipping containers they came in, reviewed, and then placed into folders based on which organization published each document. Upwards of 400 folders are created from each shipping box that we process; the fact that we have received nearly 800 shipping boxes gives you an idea of the scope of this collection.