spinning plates: databases and data collection

The last few weeks have been picking up momentum in terms of digital projects. I haven't made any progress on my front end for the digital repository, but I'll be digging in hard once the WordPress plugin is launched (and I'm also still dabbling in learning PHP). But I can see deadlines approaching, so my nose will be to the grindstone soon.

Another project I'm working on in parallel is an inventory system for our physical artifact collections. While I'm technically responsible for site-level archives data, I'm taking a quick-turnaround funding opportunity to set up a revamped database for our curation folks. As usual, this is a shoestring scenario (with the whole thing planned at a basic level in about 3 days because of administrative deadlines). But I do love a challenge!

Our old "database" was really a non-relational table in Access that hadn't been updated in a while. Values like trinomial site numbers weren't in standardized formats or locations, making locating artifacts and collections tricky business. We were able to fund a small army of folks to take a very basic inventory of boxes in order to assign an identity to each box and get a good handle on the site numbers in each. The information collected by the inventory assistants was transferred to a spreadsheet. Then I took all 6,800+ rows of data and cleaned them up in OpenRefine. Cleanup involved basics like trimming leading and trailing whitespace, correcting spelling and data entry errors, etc. Since the inventory assistants recorded one line per box but there could be many site numbers per box, I used OpenRefine to break all site numbers out individually and standardize formatting. I then took the resulting table (with 11,000+ rows) and split it out into normalized tables for Collection, Box, and Site Number. I then brought these tables back into Access and created relationships, along with another table to identify Row/Shelf ID.
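As a rough illustration of the split-and-normalize idea (not the actual OpenRefine steps), here is a minimal Python sketch. Everything specific in it is an assumption: the column names (`collection`, `box_id`, `site_numbers`), the separators between site numbers, and the target trinomial format (state number, county letters, sequence number zero-padded to four digits) are all hypothetical stand-ins for whatever the real spreadsheet used.

```python
import re

# ASSUMPTION: a trinomial looks like state number + county letters + sequence
# number, possibly with stray spaces or hyphens between the parts.
TRINOMIAL = re.compile(
    r"^\s*(\d{1,2})\s*[- ]?\s*([A-Za-z]{2,3})\s*[- ]?\s*(\d+)\s*$"
)


def standardize_site_number(raw):
    """Return the site number in one consistent format, or None if unparseable."""
    match = TRINOMIAL.match(raw)
    if match is None:
        return None
    state, county, seq = match.groups()
    # ASSUMPTION: zero-pad the sequence number to four digits.
    return f"{int(state)}{county.upper()}{int(seq):04d}"


def normalize(rows):
    """Split one-line-per-box rows into Collection, Box, and Site Number tables.

    `rows` is a list of dicts with the hypothetical keys
    'collection', 'box_id', and 'site_numbers'.
    """
    collections = {}   # collection name -> collection id
    boxes = []         # (box_id, collection_id)
    box_sites = []     # (box_id, standardized site number), one row per pair
    rejects = []       # (box_id, raw text) for values needing a human look

    for row in rows:
        name = row["collection"].strip()
        collection_id = collections.setdefault(name, len(collections) + 1)
        box_id = row["box_id"].strip()
        boxes.append((box_id, collection_id))

        # ASSUMPTION: multiple site numbers are separated by ';' or ','.
        for piece in re.split(r"[;,]", row["site_numbers"]):
            if not piece.strip():
                continue
            site = standardize_site_number(piece)
            if site is None:
                rejects.append((box_id, piece.strip()))
            else:
                box_sites.append((box_id, site))

    return collections, boxes, box_sites, rejects
```

The `rejects` list is the important design choice here: anything that doesn't parse is set aside for review rather than silently dropped or guessed at.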

Since each box now has its own identity and accurately recorded location, the next part of the project will be to create Data Matrix tags for each box. This will allow for easy movement of boxes and inventory updates. I decided on Data Matrix optical codes based on the experience of Sustainable Archaeology (Data Matrix allows for a lot of flexibility and teensy size for artifact-level labeling!). We obtained a thermal transfer printer and several scanners. It couldn't be that hard to get this to sync with the database, right? I could whip this together in no time, right? FAMOUS LAST WORDS.

A DM Code. Scan this with your phone.

The obstacles to getting this running started with the general bureaucratic IT-ticket, hurry-up-and-wait, unsuccessful-remote-application-install kind of stuff. After eating up most of a day trying to install a configuration utility, I gave up on my agency PC and decided to troubleshoot from home for ease of experimentation. Trouble is, my home laptop is Linux without MS Access. But, not to be defeated, I trudged on. It turns out that the algorithm for generating Data Matrix codes is very complex. While it's apparently a piece of cake to find ways to render regular barcodes, this is different. Oh.

Proprietary applications like those from IDAutomation work, but free versions are far too limiting, even though my needs are basic. So I went open source and found libdmtx. I was able to get the associated qdatamatrix GUI to generate one page of codes, but I couldn't get the output to work. I did, however, figure out how to generate .png codes straight from the Linux command line, but only one by one. If I knew what I was doing, I could generate a whole mess of .png codes and then connect them straight into my database. That's on deck. Maybe with Python? Not that I know Python that well. But this is how one learns these things, right?
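For the "whole mess of .png codes" idea, a minimal Python sketch might look like the following. It assumes the command-line tool in question is `dmtxwrite` from the dmtx-utils package (the post doesn't say which command was used), that it reads the message from standard input and writes the image named by `-o`, and that box IDs are safe to use as file names. The function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def dmtxwrite_command(out_path):
    """Build the command line for one code.

    ASSUMPTION: dmtxwrite (dmtx-utils) is installed, takes the message on
    standard input, and writes the encoded image to the file given by -o.
    """
    return ["dmtxwrite", "-o", str(out_path)]


def generate_codes(box_ids, out_dir, run=subprocess.run):
    """Write one Data Matrix image per box ID and return {box_id: path}.

    `run` is injectable so the loop can be dry-run without dmtxwrite installed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = {}
    for box_id in box_ids:
        # ASSUMPTION: box IDs contain no characters that are illegal in file names.
        out_path = out_dir / f"{box_id}.png"
        # check=True raises an error if dmtxwrite exits with a failure code.
        run(dmtxwrite_command(out_path), input=box_id.encode("utf-8"), check=True)
        written[box_id] = out_path
    return written
```

Calling `generate_codes(["B001", "B002"], "tags")` would then leave one image per box in a `tags` folder, and the returned mapping of box ID to file path is the piece that could later be linked back to the Box table.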

So I can define what I want my end result to be; I just need to figure out how to get there. I did figure out how to configure the scanner and input the information encoded on a DM code into an Access form. And I cannot tell you how much I needed that small victory to keep me going. "Beep," numbers appear as if by magic, YAYYYYY!

When I'm back in the office next week I plan to return to battle, this time with the installation of the thermal printer. I'm fairly sure the printer manufacturer's label design application should work until I get the database set up to generate tags directly.

In other not-actually-my-job-but-I-love-it news, I'll be on a four-day field project beginning on June 9. Because I must be bored or something, I suggested to the lead that we use this as an experiment for digital field data collection (I know, I know). So I have a bunch of templates for field forms and I'm planning to plug them into KoboToolbox data collection forms so we can use our own devices and see how it goes. Obviously, since this is the first shot, we'll also be recording on paper, but it should be fun. I'll share my setup and experience as I move along with that.

To be continued...

By @Jolene Smith in
Tags : #tools, #links, #digital-data, #databases,
