Posts Tagged ‘#lawdi’

Online Continuing Education Class Sequence in XML and RDF

October 22, 2013

If I had a) time and b) money, I would certainly be considering taking the classes for the Certificate in XML and RDF-Based Systems at Library Juice Academy. The sequence starts next February and comprises six 4-week sessions at $175 a pop. It’s taught by Robert Chavez, who has a PhD in Classical Studies and worked at Perseus for 8 years. Course descriptions for each session are available at the link, and they seem pretty hands-on – like you might actually build stuff, not just talk about it. Library Juice does continuing education classes, all online and asynchronous, for library professionals.

Note I have no personal experience with Library Juice and don’t know Robert Chavez, but boy those classes look just right for someone who’s interested in LAWDI and Linked Open Data and is not a great self-starter in terms of teaching oneself technical stuff (i.e., me). If only I had a) time and b) professional development money.

TOCS-IN at Zotero: A Project That Didn’t Work

September 20, 2012

So, blogging a project that didn’t work – good idea or not?  Let’s see…

The project was to get the content of the TOCS-IN citation database into the free, open-access bibliographic software Zotero (which David Pettegrew discusses today; his post kicked me over my hesitation about blogging this project). I wanted to do this for two reasons: to draw increased attention to TOCS-IN, which is an excellent, open-access bibliographic resource for Classicists, and make it especially accessible to Zotero users; and to make the TOCS-IN content potentially available as Linked Open Data, because Zotero can export files in BIBO, a linked open data format for bibliographic citations.

My steps were:

1. Get permission from P.M.W. Matheson of the University of Toronto, the manager of the volunteer-driven TOCS-IN project, to use the available data files for this purpose.  She was helpful and supportive – thank you!

2. Write a Python script to convert the data files from a custom SGML markup to RIS format, a common format for bibliographic citations (supported by Zotero, EndNote, and most other reference managers; it was originally developed by Research Information Systems for its Reference Manager software). I am not a programmer, but happily my husband is; this piece would not have been possible without his help, although I did big chunks of it All By Myself.

3. Add the RIS-formatted citations to a Zotero Group library. This turned out to be the problem. In theory, there is no limit to the number of bibliographic citations that can be stored by a Zotero user. In practice, once I got about 40,000 (of the ca. 80,000) citations uploaded, my Zotero Standalone software began freezing every time I attempted to do anything (like stubbornly add another several thousand citations) and refused to sync with the online Group Library. A question posted in the Zotero forums got the swift and helpful confirmation that the sync process simply cannot handle such large datasets well, and that I would not be the only one affected; any users who tried to use this large group library would start crashing their Zotero instances as well.
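
For the curious, here is a minimal sketch of what the RIS-writing half of step 2 might look like. It is not the script we actually wrote: the custom SGML parsing is left out entirely, and the dictionary keys (authors, title, journal, and so on) and the sample citation are hypothetical placeholders. Only the two-letter RIS tags themselves (TY, AU, TI, JO, VL, PY, SP, EP, ER) are standard.

```python
def to_ris(record):
    """Render one journal-article citation (a plain dict) as an RIS record.

    The dict keys used here are hypothetical; a real converter would first
    have to parse them out of the source markup.
    """
    lines = ["TY  - JOUR"]  # record type: journal article
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")  # one AU line per author
    # Map the hypothetical dict keys onto standard RIS tags.
    field_tags = [
        ("title", "TI"),
        ("journal", "JO"),
        ("volume", "VL"),
        ("year", "PY"),
        ("start_page", "SP"),
        ("end_page", "EP"),
    ]
    for key, tag in field_tags:
        value = record.get(key)
        if value:  # skip fields the record does not have
            lines.append(f"{tag}  - {value}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines)


# Placeholder citation, invented purely for illustration.
sample = {
    "authors": ["Author, A.", "Writer, B."],
    "title": "An Example Article",
    "journal": "Example Journal",
    "volume": "12",
    "year": "1999",
    "start_page": "1",
    "end_page": "20",
}

if __name__ == "__main__":
    print(to_ris(sample))
```

Each tag is followed by two spaces, a hyphen, and a space, which is the layout reference managers expect when importing RIS files.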

What now?

It’s possible that Zotero, which is actively under development, will make it possible to create very large citation libraries. Zotero used to be unable to handle a couple of thousand citations in one library, and now it can do that with ease (as, for example, the ASCSA Group Library of 2553 items demonstrates). But it may not be a priority for Zotero’s developers to move in that direction; most people use Zotero for personal citation libraries, not as de facto mirror sites for large bibliographic indices.

I have looked at BibSoup/BibServer, related projects that allow the open-access presentation of bibliographic data online, deal with a wide variety of formats (bibtex, MARC, RIS, BibJSON, RDF), and are relevant to the Linked Open Data goal of this project (full RESTful API).  I really liked Zotero simply because it is already very popular with humanities-oriented users and likely to become more so (it seems especially popular among graduate students). BibSoup is geared toward STEM academics, and currently only has about 17,000 citations total (and I’m a little hesitant about breaking things after my Zotero experience!); BibServer requires a server and IT chops which I lack. I do think these applications have a lot of potential, but I don’t think they will work for my project right now.  I’d welcome an argument on this point, or any other suggestions.

Finally, I’d like to add a quick recap and appreciation of what TOCS-IN is and comprises.  TOCS-IN is a bibliographic database  that is fully open-access (searchable at Toronto and at Louvain) and entirely crowd-sourced – that is to say, made possible by the contributions of volunteers who transcribe or copy and paste journal tables of contents and format them for inclusion in the database.  A list of volunteers is available at the site, as is a list of journals currently needing a volunteer.  Do consider joining us; I am currently covering three journals, and the time burden is minimal, especially if the journal publishes its table of contents online (much less typing!)

The basic portion of TOCS-IN is about 80,000 citations, comprising the tables of contents of about 180 journals, all among those indexed by the subscription database L’Année philologique. The project began in 1992, so chronological coverage mostly starts there. A comprehensive list of titles, volumes, and issue numbers is available at the Toronto site. TOCS-IN at Toronto and Louvain currently also searches an additional ca. 56,000 citations, including tables of contents of some TOCS-IN journals dating before 1992 (listed at Louvain), and edited volumes, festschrifts, etc. (listed at Toronto).

LAWDI 3: Good Linking Practices for Bibliographic Stuff

June 13, 2012

While the following were informed by conversations and presentations at LAWDI, they should be considered my opinions only, and I welcome any (polite!) discussion of why my ideas are wrong-headed  in comments.

So, you’re a scholar putting up information online, and you don’t have the time or IT chops to start learning how to implement RDFa or learn a specialized linked open data vocabulary. The following are some ideas of things you can do that are linked open data friendly, with an emphasis on providing links to stable, authoritative, easy to use URLs. This post covers bibliographic items (secondary scholarship).

I want to emphasize that doing all this linking is work; it takes time. I’ve been trying to link more thoroughly in my blog posts about LAWDI, and it does add to the time burden of writing blog posts. I urge readers to strive to include more (good-quality) links in the things they post online, but please don’t feel guilty if you can’t do it all. Do what you can; every bit is a piece toward our common goals.

Books

  • Link to a WorldCat record using the OCLC number. Permalink URLs are available from records and can be created using the format http://www.worldcat.org/oclc/37663433 .
    WorldCat is my top choice because 1) it welcomes links, 2) it’s the largest and most international open linkable library catalog. Note: sometimes if you look a book up by title you’ll find multiple OCLC records with multiple OCLC numbers, even though you’re looking at the same book, not even different editions. OCLC and its members are probably working to tidy this sort of thing and merge (or at least cross-reference) duplicate records. For now, pick the one that has the largest number of libraries showing in the list in your home/target country (there will often be one US record and one European record, for example.)
  • Link to the US Library of Congress using an LCCN (Library of Congress Control Number). Permalink URLs are shown in records and can be created using the format http://lccn.loc.gov/97040652 (useful, since many books have the LCCN printed on the inside).
    Using the Library of Congress is a fine choice; it’s my second choice because it is US-centric (while WorldCat is working on becoming more international) and the Library of Congress records don’t have the enhancements that WorldCat records do (ability to display holdings in libraries near you, ability to provide a link to online booksellers, etc.)
  • I would not bother linking to, for example, Amazon using an ISBN. WorldCat links using OCLC are more useful in my opinion, and as easy to create.
  • Including the ISBN in a citation can be useful; there are some great browser plug-ins that can identify ISBNs in web pages and link users to libraries or online booksellers (for example, LibX or Book Burro).
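
If you have a spreadsheet full of identifiers rather than a single citation, the two URL patterns above are easy to generate. The following is a rough sketch only; the function names are my own, and the handling of "ocm"-style prefixes and hyphenated LCCNs reflects my understanding of common conventions rather than anything WorldCat or the Library of Congress promise.

```python
import re


def worldcat_url(oclc_number):
    """Build a WorldCat permalink from an OCLC number.

    Non-digit characters (for example a catalog prefix such as "ocm")
    are dropped; this is an assumption about how such numbers appear.
    """
    digits = re.sub(r"\D", "", str(oclc_number))
    if not digits:
        raise ValueError(f"no digits found in OCLC number: {oclc_number!r}")
    return f"http://www.worldcat.org/oclc/{digits}"


def lccn_url(lccn):
    """Build a Library of Congress permalink from an LCCN.

    A hyphenated LCCN such as "97-40652" is normalized by removing the
    hyphen and left-padding the serial part with zeros to six digits.
    """
    value = str(lccn).replace(" ", "")
    if "-" in value:
        prefix, serial = value.split("-", 1)
        value = prefix + serial.zfill(6)
    return f"http://lccn.loc.gov/{value}"


if __name__ == "__main__":
    print(worldcat_url(37663433))
    print(lccn_url("97-40652"))
```

Check a few of the generated links by hand before publishing a whole list; a typo in an identifier produces a perfectly well-formed URL that goes nowhere.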

Digital Books

  • If a book is available in an open-access digital edition, by all means include a link to that, preferably in addition to a link to a WorldCat record for the print edition. For open-access digital books you have two strong choices, neither the clear winner yet in my opinion.
  • Link to the Open Library record. URLs look like this: http://openlibrary.org/books/OL6907393M
    Open Library is the more linked data friendly solution; each record can be downloaded in RDF and JSON. Records also include linked OCLC numbers and LCCNs. The full-text books can be downloaded in a bunch of different formats, from .pdf to MOBI, and are also readable online. Open Library is part of the Internet Archive, and is a “born-open” project. They currently only have about 1 million open-access books, though, and their records aren’t as scholar-friendly – they don’t have all the features of library catalog records (though they are based on them).
  • Link to the Hathi Trust record. URLs look like this: http://catalog.hathitrust.org/Record/001220795
    Hathi Trust’s records have library-provided bibliographic data and they have a large collection (3 million plus) of open-access volumes (as well as many more digital volumes not open-access; availability of formats can also be an issue). They are backed by a bunch of big academic libraries and are likely to stick around. They have an API, but are not as linked-data friendly as Open Library.
  • I would not bother linking to a Google Books record unless you can’t find a match at either of the previous places. Google Books has great content, but their metadata is lacking, and they are a for-profit company who cannot guarantee a future commitment to free open-access products.
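
To make the "downloadable in RDF and JSON" point concrete, here is a small sketch that builds the record links for both repositories without fetching anything. The function names are hypothetical, and the .json and .rdf suffixes are an assumption based on how Open Library has exposed its records; verify them against the live site before relying on them.

```python
import re


def open_library_urls(olid):
    """Return the human-readable and machine-readable URLs for an edition.

    `olid` is an Open Library edition identifier such as "OL6907393M".
    The ".json" and ".rdf" suffixes are assumed, not guaranteed.
    """
    if not re.fullmatch(r"OL\d+M", olid):
        raise ValueError(f"not an Open Library edition ID: {olid!r}")
    base = f"http://openlibrary.org/books/{olid}"
    return {"html": base, "json": base + ".json", "rdf": base + ".rdf"}


def hathitrust_url(record_id):
    """Return the Hathi Trust catalog URL for a record ID, kept as a string
    so that leading zeros are preserved."""
    return f"http://catalog.hathitrust.org/Record/{record_id}"


if __name__ == "__main__":
    for kind, url in open_library_urls("OL6907393M").items():
        print(kind, url)
    print(hathitrust_url("001220795"))
```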

Book Chapters

  • For print-only book chapters, right now you’d do best to link to the whole book.
  • Ditto for book chapters available in full-text digitally, unless you can track down .pdfs at the author’s web site or academia.edu, for example.

Journal Articles

  • Link to the DOI of the article – a long unique identifier, now often included even in print citations – using the format http://dx.doi.org/10.1177/1469605309338428 . Participating publishers have committed to maintaining access to articles via DOIs in perpetuity, even as their online platforms may change. (Remember, though, that a lot of the articles are available by subscription only; many who follow the link will get an abstract but not the full text if their institution does not subscribe.)
  • Available digitally but doesn’t have a DOI? Look for a stable URL or permalink at the page with the article citation. Jstor does a good job with these (http://www.jstor.org/stable/3182036) but so do many other large commercial article databases.
  • Available digitally but not directly linkable? (This might be the case with an article published in a 19th century journal that has been digitized by the volume, but without the individual articles indexed, or an online-only journal with poor linkability.)  Link to the record for the journal in a repository like Hathi Trust or Open Library (above), or to the home page of the online journal, if articles cannot be directly linked.
  • Print-only? (Lots of journal articles still are, especially older, smaller, or foreign ones). Link to the WorldCat record for the whole journal, using the OCLC number or ISSN if there is one: http://www.worldcat.org/oclc/18999240 .
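
The order of preference in the list above can be written down as a simple fallback chain. This is only a sketch of my own recommendations, not a standard; the function and parameter names are made up for illustration.

```python
def doi_url(doi):
    """Turn a bare DOI (optionally prefixed with "doi:") into a resolver URL."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10."):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return "http://dx.doi.org/" + doi


def best_article_link(doi=None, stable_url=None, journal_url=None, journal_oclc=None):
    """Pick the most specific stable link available for a journal article.

    Preference order: DOI, then a stable URL or permalink for the article,
    then a link to the digitized or online journal, then the WorldCat
    record for the journal as a whole. Returns None if nothing is known.
    """
    if doi:
        return doi_url(doi)
    if stable_url:
        return stable_url
    if journal_url:
        return journal_url
    if journal_oclc:
        return f"http://www.worldcat.org/oclc/{journal_oclc}"
    return None


if __name__ == "__main__":
    print(best_article_link(doi="10.1177/1469605309338428"))
    print(best_article_link(stable_url="http://www.jstor.org/stable/3182036"))
    print(best_article_link(journal_oclc=18999240))
```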

Questions? Quibbles? Cases I missed? Ask in comments.

Previous posts here on LAWDI:

Collection of blog posts and other  resources from LAWDI:

Library-Related Presentations at LAWDI

June 6, 2012

LAWDI was set up with half-hour presentations by ‘faculty,’ and 15-minute presentations by the rest of the attendees.  Links to slides for all presentations that used them are being collected here.  In this post I discuss those presentations most relevant to librarians and the issues they love best (bibliographic citation, authority control, scholarly publishing) as well as recapping my own presentation.

Friday we began with a talk by Chuck Jones of the ISAW Library (links he discussed collected at AWOL) and then a powerhouse tour of library linked data and metadata issues by Corey Harper of NYU’s Bobst Library. His slides are here. (For librarians wanting to get up to speed or keep up to date on the issues Corey covers I also strongly recommend following the blog of Ed Summers of the Library of Congress, http://inkdroid.org/journal/ . Half of what I know about linked open data I learned there.)

So, I had a tough act to follow; I think I actually said, “And now for something completely different.” First I described the goals of the Ancient World Open Bibliographies and demonstrated the project. Its origins are covered in a post titled “The Beginning” at that blog, and you can follow the links to the Wiki and Zotero library for the project yourself. In the context of LAWDI, it was important to note that Zotero allows the export of bibliographic citations automatically marked up using the BIBO (Bibliographic Ontology) vocabulary, so keeping bibliographies there gives you a leg up on becoming part of the linked open data world. I also demonstrated an online bibliography on Evagrius Ponticus by Joel Kalvesmaki of Dumbarton Oaks as an example of what can be done with a bibliography based in Zotero, but presented as an inherent part of a digital project.

The second point I wanted to make was that bibliographic information is linked open data friendly.  (Libraries have worked hard to make it so!) Library catalogs are structured data files on books, and while the current structure is out of date, we’re working on that (see Corey Harper’s talk). Most books have a standard number that represents them: an ISBN, an OCLC number (accession number into the OCLC catalog, now online as WorldCat) or a Library of Congress Control Number (LCCN).  Many books have all three!  Articles, book chapters, or other things  scholars want to cite are more problematic.  Many journal publishers now use DOIs (digital object identifiers) for specific articles, but these have not been universally adopted. I demonstrated the DOI resolver at http://dx.doi.org/ (which also lets you create stable URIs for DOIs; I’ll cover this in more explicit detail in a future post.)

My third point was to try to think more broadly about how existing open-access online bibliographic indexes for ancient studies could move in the direction of being linked open data compliant.  At 8am the morning I spoke, without any prompting from me, Tom Elliott posted a manifesto on this same topic at his blog: Ancient Studies Needs Open Bibliographic Data and Associated URIs. So, let me say, what he said, and amen.

Saturday we had two talks that were very exciting to me as a librarian, even though they were actually about scholarly publishing. Sebastian Heath of ISAW talked (without slides I think) about publishing the ISAW Papers series using linked open data principles.  Andrew Reinhard of the American School of Classical Studies (ASCSA) publications office brought forward one of the more resonant metaphors of the conference, that the current scholarly publishing enterprise is essentially steampunk, 21st century work with 19th century models. (This got retweeted a lot!) He was bursting with ways ASCSA plans to change this. Slides are here.

Next up: my recommendations on choosing good links for bibliographic stuff.

Previous post here on LAWDI:

LAWDI Conference on Linked Open Data for Ancient Studies

June 4, 2012

This week I was very fortunate to attend the Linked Ancient World Data Institute (LAWDI) conference, held at the Institute for the Study of the Ancient World (ISAW) at NYU in New York City and sponsored by the NEH Office of Digital Humanities. This is intended to be the first of a few posts in which I discuss the conference topics and some practical outcomes I hope to participate in. (I say this publicly so I have to actually write them!)

LAWDI was a wonderful conference.  The very active twitter feed (#lawdi) was followed by 400 people, and towards the end I began to worry that they would start to think we were all a bit touched in the head, given the levels of enthusiasm that approached a lovefest.

Much credit goes to our ISAW host and general fount of visionary optimism, Sebastian Heath, as well as his co-hosts Tom Elliott of ISAW and John Muccigrosso of Drew University (where a second LAWDI will be held in 2013.) They fostered an atmosphere of collaboration and support that was truly welcoming to attendees at all levels; this is a rare enough feat at any conference, but especially so at one dealing with fairly high-level technological and semantic discussions.  My fellow conference attendees were also a fascinating, bright, energetic and truly nice group of people.  I feel as if I’ve made a bunch of new friends. Thank you all.

So, LAWDI is about Linked Open Data. I am sure I have a lot of general readers who may be wondering what the heck that is.  Here’s my attempt at a basic recap in terms that should be fairly accessible (I just actually tried to explain this to my neighbors, who are neither IT nor ancient studies people).  The internet is all about linking; one of the best ways to draw attention to resources is by linking to them. Links that are stable and short(ish), like http://www.jstor.org/stable/3632121 or http://www.worldcat.org/oclc/235892089 are a lot easier to deal with than 100+ character linksoup with characters like % and ? or websites where you can only link to a landing page but individual documents must be searched for every time you go there. So, people who manage information online should work on making their links resemble those above, for ease of use by everyone.

Second, where possible, links should go to authoritative sources. Pick a place to link to that will be around for a while – forever if possible! There are actually now international authorities for some things – VIAF is a big one for personal names, for example – so if I want to refer to 19th century Classicist Basil Gildersleeve I can link to http://viaf.org/viaf/2490055 and be pretty sure that that’s understandable to both people and computers internationally and will be around for a good long time.  (I’ll make a list of “good places to link to for classical bibliographies” in a subsequent post.)

Beyond that, however, there are some background technologies – not necessarily visible to the human viewer of a web page – that allow computers to figure out links between things. The Wikipedia article linked above gives you a lot of acronyms and links to explain them, but for the non-coder, the gist is as follows. One uses a special markup language to tell any computer that looks that “Basil Gildersleeve” is a human person, and that the URL http://viaf.org/viaf/2490055 is a description of him. The computer can then find other references to the human person Basil Gildersleeve described at http://viaf.org/viaf/2490055 elsewhere, see that they are the same person, and automagically make a link. This is the ultimate goal. Examples of projects in ancient studies that are using this technology to, for example, search across disparate data sets include Pelagios and CLAROS.
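
For readers who do want to peek under the hood, here is a toy sketch of the idea in the paragraph above. It writes the two statements ("this is a person", "this is his name") as N-Triples using the FOAF vocabulary, then shows how two data sets that use different forms of the name can still be matched because they point at the same URL. The vocabulary choice, the function names, and the two data set names are my own assumptions for illustration; real projects such as Pelagios use their own, richer conventions.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def person_triples(uri, name):
    """State, as N-Triples lines, that `uri` is a person called `name`."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f"<{uri}> <{RDF_TYPE}> <{FOAF_PERSON}> .",
        f'<{uri}> <{FOAF_NAME}> "{escaped}" .',
    ]


def link_by_uri(*datasets):
    """Group mentions from several data sets by the URI they point at.

    Each data set is a (dataset_name, [(label, uri), ...]) pair. Mentions
    that share a URI are treated as references to the same thing, however
    the label is spelled.
    """
    linked = defaultdict(list)
    for dataset_name, mentions in datasets:
        for label, uri in mentions:
            linked[uri].append((dataset_name, label))
    return dict(linked)


GILDERSLEEVE = "http://viaf.org/viaf/2490055"

if __name__ == "__main__":
    for line in person_triples(GILDERSLEEVE, "Basil Gildersleeve"):
        print(line)
    # Two hypothetical data sets that spell the name differently.
    catalog = ("library_catalog", [("Gildersleeve, Basil L.", GILDERSLEEVE)])
    blog = ("blog_post", [("Basil Gildersleeve", GILDERSLEEVE)])
    print(link_by_uri(catalog, blog))
```

The point of the exercise is that the computer never compares the names at all; it only compares the URLs, which is why stable, authoritative URLs matter so much.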

Coming next: 1) a recap of my presentation at LAWDI and 2) thoughts about best practices for Linked Open Data related to bibliographies and bibliographic citations specifically, at 2 levels: the low-tech and the higher-tech.