Posts Tagged ‘linked open data’


Online Continuing Education Class Sequence in XML and RDF

October 22, 2013

If I had a) time and b) money, I would certainly be considering taking the classes for the Certificate in XML and RDF-Based Systems at Library Juice Academy.  It starts next February and comprises six four-week sessions at $175 a pop. It's taught by Robert Chavez, who has a PhD in Classical Studies and worked at Perseus for eight years. Course descriptions for each session are available at the link, and they seem pretty hands-on, as if you might actually build stuff, not just talk about it. Library Juice offers continuing education classes for library professionals, all online and asynchronous.

Note that I have no personal experience with Library Juice and don't know Robert Chavez, but boy, those classes look just right for someone who's interested in LAWDI and Linked Open Data and is not a great self-starter when it comes to teaching oneself technical stuff (i.e., me). If only I had a) time and b) professional development money.


TOCS-IN at Zotero: A Project That Didn’t Work

September 20, 2012

So, blogging a project that didn’t work – good idea or not?  Let’s see…

The project was to get the content of the TOCS-IN citation database into the free, open-access bibliographic software Zotero (which David Pettegrew discusses today; his post kicked me over my hesitation about blogging this project). I wanted to do this for two reasons: to draw increased attention to TOCS-IN, which is an excellent, open-access bibliographic resource for Classicists, and to make it especially accessible to Zotero users; and to make the TOCS-IN content potentially available as Linked Open Data, because Zotero can export files in BIBO, a linked open data format for bibliographic citations.

My steps were:

1. Get permission from P.M.W. Matheson of the University of Toronto, the manager of the volunteer-driven TOCS-IN project, to use the available data files for this purpose.  She was helpful and supportive – thank you!

2. Write a Python script to convert the data files from a custom SGML markup to RIS, a common format for bibliographic citations (used by Zotero and EndNote, among others, and originally developed by Research Information Systems). I am not a programmer, but happily my husband is; this piece would not have been possible without his help, although I did big chunks of it All By Myself.

3. Add the RIS-formatted citations to a Zotero Group library. This turned out to be the problem.  In theory, there is no limit to the number of bibliographic citations that can be stored by a Zotero user.  In practice, once I had uploaded about 40,000 (of the ca. 80,000) citations, my Zotero Standalone software began freezing every time I attempted to do anything (like stubbornly adding another several thousand citations), and it refused to sync with the online Group Library.  A question posted in the Zotero forums got a swift and helpful confirmation that the sync process simply cannot handle such large datasets well, and that I would not be the only one affected; any user who tried to use this large group library would start crashing their Zotero instance as well.
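For the curious, the kind of conversion described in step 2 might look roughly like the minimal Python sketch below. It is not the script we actually used, and the input field names are hypothetical: it assumes each TOCS-IN citation has already been parsed out of the SGML into a simple dictionary. What it shows is the RIS side, which is just one `TAG  - value` line per field (TY, AU, TI, T2, PY, VL, IS, SP, EP), closed by an `ER  - ` line.

```python
# Hypothetical sketch, NOT the original TOCS-IN conversion script.
# Assumes each citation was already parsed from the SGML into a dict;
# the key names ("authors", "journal", ...) are made up for illustration.

# RIS tag -> hypothetical dict key, in the order the lines are written.
RIS_FIELDS = [
    ("AU", "authors"),     # repeatable: one AU line per author
    ("TI", "title"),       # article title
    ("T2", "journal"),     # secondary title: the journal name in a JOUR record
    ("PY", "year"),        # publication year
    ("VL", "volume"),
    ("IS", "issue"),
    ("SP", "start_page"),
    ("EP", "end_page"),
]


def to_ris(citation: dict) -> str:
    """Serialize one citation as an RIS record of 'TAG  - value' lines."""
    # Every RIS record opens with a TY (type) line; default to JOUR.
    lines = [f"TY  - {citation.get('type', 'JOUR')}"]
    for tag, key in RIS_FIELDS:
        value = citation.get(key)
        if value is None:
            continue  # skip fields this citation doesn't have
        values = value if isinstance(value, list) else [value]
        for v in values:
            lines.append(f"{tag}  - {v}")
    lines.append("ER  - ")  # ER closes the record
    return "\n".join(lines) + "\n"


example = {
    "authors": ["Smith, J.", "Jones, A."],  # placeholder data
    "title": "An Example Article",
    "journal": "Example Journal",
    "year": 2001,
    "volume": 12,
    "start_page": 1,
    "end_page": 20,
}
print(to_ris(example), end="")
```

Concatenating records like this yields a .ris file of the sort Zotero imports; as step 3 shows, the import format was not the bottleneck, the sync was.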

What now?

It's possible that Zotero, which is actively under development, will eventually support very large citation libraries. Zotero once struggled with a couple of thousand citations in one library, and now it handles that with ease (as, for example, the ASCSA Group Library of 2553 items demonstrates). But moving in that direction may not be a priority for Zotero's developers; most people use Zotero for personal citation libraries, not as de facto mirror sites for large bibliographic indices.

I have looked at BibSoup/BibServer, related projects that allow the open-access presentation of bibliographic data online, deal with a wide variety of formats (bibtex, MARC, RIS, BibJSON, RDF), and are relevant to the Linked Open Data goal of this project (full RESTful API).  I really liked Zotero simply because it is already very popular with humanities-oriented users and likely to become more so (it seems especially popular among graduate students). BibSoup is geared toward STEM academics, and currently only has about 17,000 citations total (and I’m a little hesitant about breaking things after my Zotero experience!); BibServer requires a server and IT chops which I lack. I do think these applications have a lot of potential, but I don’t think they will work for my project right now.  I’d welcome an argument on this point, or any other suggestions.

Finally, I'd like to add a quick recap and appreciation of what TOCS-IN is and comprises.  TOCS-IN is a bibliographic database that is fully open-access (searchable at Toronto and at Louvain) and entirely crowd-sourced; that is to say, it is made possible by volunteers who transcribe or copy and paste journal tables of contents and format them for inclusion in the database.  A list of volunteers is available at the site, as is a list of journals currently needing a volunteer.  Do consider joining us; I am currently covering three journals, and the time burden is minimal, especially if the journal publishes its table of contents online (much less typing!).

The basic portion of TOCS-IN is about 80,000 citations, comprising the tables of contents of about 180 journals, all of them indexed by the subscription database L'Année philologique. The project began in 1992, so chronological coverage mostly starts there.  A comprehensive list of titles, volumes, and issue numbers is available at the Toronto site. TOCS-IN at Toronto and Louvain currently also searches an additional ca. 56,000 citations, including tables of contents of some TOCS-IN journals from before 1992 (listed at Louvain) and of edited volumes, festschrifts, etc. (listed at Toronto).


LAWDI 3: Good Linking Practices for Bibliographic Stuff

June 13, 2012

While the following ideas were informed by conversations and presentations at LAWDI, they should be considered my opinions only, and I welcome any (polite!) discussion in the comments of why my ideas are wrong-headed.

So, you're a scholar putting information online, and you don't have the time or IT chops to learn how to implement RDFa or a specialized linked open data vocabulary. The following are some things you can do that are linked open data friendly, with an emphasis on providing links to stable, authoritative, easy-to-use URLs. This post covers bibliographic items (secondary scholarship).

I want to emphasize that doing all this linking is work; it takes time. I’ve been trying to link more thoroughly in my blog posts about LAWDI, and it does add to the time burden of writing blog posts. I urge readers to strive to include more (good-quality) links in the things they post online, but please don’t feel guilty if you can’t do it all. Do what you can; every bit is a piece toward our common goals.

Books

  • Link to a WorldCat record using the OCLC number. Permalink URLs are shown on records and can be created using the format http://www.worldcat.org/oclc/37663433 .
    WorldCat is my top choice because 1) it welcomes links, and 2) it's the largest and most international open, linkable library catalog. Note: sometimes if you look a book up by title you'll find multiple OCLC records with multiple OCLC numbers, even though you're looking at the same book, not even different editions. OCLC and its members are probably working to tidy this sort of thing up and merge (or at least cross-reference) duplicate records. For now, pick the record with the largest number of holding libraries listed in your home/target country (there will often be one US record and one European record, for example).
  • Link to the US Library of Congress using an LCCN (Library of Congress Control Number).  Permalink URLs are shown in records and can be created using the format http://lccn.loc.gov/97040652 (useful, since many books have the LCCN printed inside).
    Using the Library of Congress is a fine choice; it’s my second choice because it is US-centric (while WorldCat is working on becoming more international) and the Library of Congress records don’t have the enhancements that WorldCat records do (ability to display holdings in libraries near you, ability to provide a link to online booksellers, etc.)
  • I would not bother linking to, for example, Amazon using an ISBN. WorldCat links using OCLC are more useful in my opinion, and as easy to create.
  • Including the ISBN in a citation can be useful; there are some great browser plug-ins that can identify ISBNs in web pages and link users to libraries or online booksellers (for example, LibX or Book Burro).
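Both permalink patterns above are simple enough to generate from the identifiers. Here is a minimal Python sketch. The LCCN normalization (remove spaces, drop a slash and anything after it, zero-pad the part after a hyphen to six digits) is my reading of the Library of Congress's published normalization rules, so treat it as an assumption and check LC's documentation before relying on it; it turns a printed LCCN like 97-40652 into the 97040652 used in the URL above. The function names are hypothetical.

```python
import re


def worldcat_url(oclc_number: str) -> str:
    """WorldCat permalink from an OCLC number, e.g. '37663433' or '(OCoLC)37663433'."""
    digits = re.sub(r"\D", "", oclc_number)  # keep only the digits
    if not digits:
        raise ValueError(f"no digits in OCLC number: {oclc_number!r}")
    return f"http://www.worldcat.org/oclc/{digits}"


def normalize_lccn(lccn: str) -> str:
    """Normalize an LCCN as printed in a book (a sketch of LC's rules)."""
    s = "".join(lccn.split())   # 1. remove all whitespace
    s = s.split("/", 1)[0]      # 2. drop a slash and everything after it
    if "-" in s:                # 3. zero-pad the serial part after the hyphen
        prefix, serial = s.split("-", 1)
        s = prefix + serial.zfill(6)
    return s


def lccn_url(lccn: str) -> str:
    """Library of Congress permalink built from a (possibly printed-form) LCCN."""
    return f"http://lccn.loc.gov/{normalize_lccn(lccn)}"


print(worldcat_url("(OCoLC)37663433"))  # http://www.worldcat.org/oclc/37663433
print(lccn_url("97-40652"))             # http://lccn.loc.gov/97040652
```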

Digital Books

  • If a book is available in an open-access digital edition, by all means include a link to that, preferably in addition to a link to a WorldCat record for the print edition. For open-access digital books you have two strong choices, neither the clear winner yet in my opinion.
  • Link to the Open Library record. URLs look like this: http://openlibrary.org/books/OL6907393M
    Open Library is the more linked data friendly solution; each record can be downloaded in RDF and JSON. Records also include linked OCLC numbers and LCCNs. The full-text books can be downloaded in a bunch of different formats, from .pdf to MOBI, and are also readable online.  Open Library is part of the Internet Archive, and is a “born-open” project. They currently only have about 1 million open-access books, though, and their records aren't as scholar-friendly; they don't have all the features of library catalog records (though they are based on them).
  • Link to the Hathi Trust record. URLs look like this: http://catalog.hathitrust.org/Record/001220795
    Hathi Trust’s records have library-provided bibliographic data and they have a large collection (3 million plus) of open-access volumes (as well as many more digital volumes not open-access; availability of formats can also be an issue). They are backed by a bunch of big academic libraries and are likely to stick around. They have an API, but are not as linked-data friendly as Open Library.
  • I would not bother linking to a Google Books record unless you can't find a match at either of the previous places. Google Books has great content, but its metadata is lacking, and Google is a for-profit company that cannot guarantee a future commitment to free open-access products.
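Part of what makes these services linked-data friendly is that their records have machine-readable versions. The Python sketch below only builds the URLs. Appending .json to an Open Library book URL returns the record as JSON; the .rdf suffix, and the HathiTrust Bibliographic API pattern (/api/volumes/brief/<id type>/<id>.json), are how I understand those services to work, so treat them as assumptions to check against each project's current documentation. Function names are hypothetical. Looking a book up at HathiTrust by OCLC number is a handy way to go from a WorldCat link to a digital copy.

```python
from urllib.parse import quote

# Identifier types assumed to be accepted by the HathiTrust Bibliographic API.
HATHI_ID_TYPES = {"oclc", "lccn", "issn", "isbn", "htid", "recordnumber"}


def open_library_urls(olid: str) -> dict:
    """Human-readable and machine-readable URLs for an Open Library edition."""
    base = f"http://openlibrary.org/books/{olid}"
    return {
        "html": base,
        "json": base + ".json",  # JSON version of the record
        "rdf": base + ".rdf",    # RDF version (suffix is an assumption)
    }


def hathitrust_lookup_url(id_type: str, value: str) -> str:
    """URL for a brief HathiTrust record lookup by a standard identifier."""
    if id_type not in HATHI_ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type!r}")
    return (
        "https://catalog.hathitrust.org/api/volumes/brief/"
        f"{id_type}/{quote(value)}.json"
    )


print(open_library_urls("OL6907393M")["json"])
print(hathitrust_lookup_url("oclc", "37663433"))
```

Fetching either URL (for example with `urllib.request.urlopen`) returns the record; that step is left out so the sketch runs without a network connection.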

Book Chapters

  • For print-only book chapters, right now you’d do best to link to the whole book.
  • Ditto for book chapters available in full-text digitally, unless you can track down .pdfs at the author’s web site or academia.edu, for example.

Journal Articles

  • Link to the DOI (Digital Object Identifier) of the article, a unique identifier that now appears even in many print citations, using the format http://dx.doi.org/10.1177/1469605309338428 . Participating publishers have committed to maintaining access to articles via DOIs in perpetuity, even as their online platforms change. (Remember, though, that a lot of articles are available by subscription only; many who follow the link will get an abstract but not the full text if their institution does not subscribe.)
  • Available digitally but doesn't have a DOI? Look for a stable URL or permalink on the page with the article citation. JSTOR does a good job with these (http://www.jstor.org/stable/3182036), but so do many other large commercial article databases.
  • Available digitally but not directly linkable? (This might be the case with an article published in a 19th-century journal that has been digitized by the volume without the individual articles indexed, or an online-only journal with poor linkability.)  Link to the record for the journal in a repository like Hathi Trust or Open Library (above), or to the home page of the online journal, if articles cannot be directly linked.
  • Print-only? (Lots of journal articles still are, especially older, smaller, or foreign ones). Link to the WorldCat record for the whole journal, using the OCLC number or ISSN if there is one: http://www.worldcat.org/oclc/18999240 .
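The order of preference in this list works like a simple decision list, so here is a minimal Python sketch of it. The function and parameter names are hypothetical, it assumes you already know which identifiers you have, and it strips the usual "doi:" or resolver prefix from a DOI. The worldcat.org/issn/ URL pattern used for the ISSN fallback is an assumption on my part.

```python
import re
from typing import Optional

DOI_PREFIX = re.compile(r"^(doi:|https?://(dx\.)?doi\.org/)", re.IGNORECASE)


def normalize_doi(doi: str) -> str:
    """Reduce 'doi:10.x/y' or 'http://dx.doi.org/10.x/y' to the bare '10.x/y'."""
    return DOI_PREFIX.sub("", doi.strip()).strip()


def article_link(
    doi: Optional[str] = None,
    stable_url: Optional[str] = None,      # e.g. a JSTOR stable URL
    repository_url: Optional[str] = None,  # journal record (Hathi Trust, Open Library) or home page
    oclc: Optional[str] = None,            # OCLC number for the whole journal
    issn: Optional[str] = None,
    resolver: str = "http://dx.doi.org/",  # https://doi.org/ also works
) -> Optional[str]:
    """Return the best available link, in the order of preference listed above."""
    if doi:
        return resolver + normalize_doi(doi)
    if stable_url:
        return stable_url
    if repository_url:
        return repository_url
    if oclc:
        return f"http://www.worldcat.org/oclc/{oclc}"
    if issn:
        return f"http://www.worldcat.org/issn/{issn}"  # URL pattern assumed
    return None  # nothing linkable: give a plain-text citation


print(article_link(doi="doi:10.1177/1469605309338428"))
print(article_link(oclc="18999240"))
```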

Questions? Quibbles? Cases I missed? Ask in comments.

Previous posts here on LAWDI:

Collection of blog posts and other resources from LAWDI:


LAWDI Conference on Linked Open Data for Ancient Studies

June 4, 2012

This week I was very fortunate to attend the Linked Ancient World Data Institute (LAWDI) conference, held at the Institute for the Study of the Ancient World (ISAW) at NYU in New York City and sponsored by the NEH Office of Digital Humanities.  This is intended to be the first of a few posts in which I discuss the conference topics and some practical outcomes I hope to participate in. (I say this publicly so I have to actually write them!)

LAWDI was a wonderful conference.  The very active twitter feed (#lawdi) was followed by 400 people, and towards the end I began to worry that they would start to think we were all a bit touched in the head, given the levels of enthusiasm that approached a lovefest.

Much credit goes to our ISAW host and general fount of visionary optimism, Sebastian Heath, as well as his co-hosts Tom Elliott of ISAW and John Muccigrosso of Drew University (where a second LAWDI will be held in 2013). They fostered an atmosphere of collaboration and support that was truly welcoming to attendees at all levels; this is a rare enough feat at any conference, but especially so at one dealing with fairly high-level technological and semantic discussions.  My fellow conference attendees were also a fascinating, bright, energetic and truly nice group of people.  I feel as if I've made a bunch of new friends. Thank you all.

So, LAWDI is about Linked Open Data. I'm sure a lot of my general readers are wondering what the heck that is.  Here's my attempt at a basic recap in terms that should be fairly accessible (I just actually tried to explain this to my neighbors, who are neither IT nor ancient studies people).  The internet is all about linking; one of the best ways to draw attention to resources is by linking to them. Links that are stable and short(ish), like http://www.jstor.org/stable/3632121 or http://www.worldcat.org/oclc/235892089, are a lot easier to deal with than 100+ character linksoup full of characters like % and ?, or websites where you can only link to a landing page and individual documents must be searched for every time you go there. So, first: people who manage information online should work on making their links resemble those above, for ease of use by everyone.
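As a rough illustration of the difference, here is a tiny Python heuristic that flags links that look like linksoup: long, carrying a ? query string, or full of %-encoded characters. The function name is hypothetical and the 60-character cutoff is an arbitrary assumption; no check like this can tell whether a link will actually stay put, since that depends on the site behind it.

```python
from urllib.parse import urlsplit


def looks_clean(url: str, max_length: int = 60) -> bool:
    """Heuristic only: short, no query string, no percent-encoding."""
    parts = urlsplit(url)
    return len(url) <= max_length and not parts.query and "%" not in url


for link in [
    "http://www.jstor.org/stable/3632121",
    "http://www.worldcat.org/oclc/235892089",
    "http://example.com/search?q=Gildersleeve&sid=a1%2Fb2",  # hypothetical linksoup
]:
    print(looks_clean(link), link)
```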

Second, where possible, links should go to authoritative sources. Pick a place to link to that will be around for a while – forever if possible! There are actually now international authorities for some things – VIAF is a big one for personal names, for example – so if I want to refer to 19th-century Classicist Basil Gildersleeve I can link to http://viaf.org/viaf/2490055 and be pretty sure that that's understandable to both people and computers internationally and will be around for a good long time.  (I'll make a list of “good places to link to for classical bibliographies” in a subsequent post.)

Beyond that, however, there are some background technologies – not necessarily visible to the human viewer of a web page – to allow computers to figure out links between things.  The Wikipedia article linked above gives you a lot of acronyms and links to explain them, but for the non-coder, the gist is as follows.  One uses a special markup language to tell any computer that looks that “Basil Gildersleeve” is a human person, and that the URL http://viaf.org/viaf/2490055 is a description of him.  The computer can then find other references to the human person Basil Gildersleeve described at http://viaf.org/viaf/2490055 elsewhere, see that they are the same person, and automagically make a link.  This is the ultimate goal.  Examples of projects in ancient studies that are using this technology to, for example, search across disparate data sets include Pelagios and CLAROS.
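For readers who do code, here is a toy version of that idea in plain Python, with no RDF library. Statements are subject-predicate-object "triples" and are printed in the N-Triples syntax, using the real RDF, FOAF, and Dublin Core term URIs; the bibliography item URI and both tiny datasets are made up for illustration. Because both datasets use the same VIAF URI for Gildersleeve, a program can join them and answer a question neither answers alone. Real systems use RDF libraries and triple stores rather than Python sets.

```python
from typing import NamedTuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
GILDERSLEEVE = "http://viaf.org/viaf/2490055"    # the VIAF URI from the text
BOOK = "http://example.org/bibliography/item/1"  # hypothetical item URI


class Literal(NamedTuple):
    """A plain string value, as opposed to a URI."""
    value: str


# Dataset A: a (hypothetical) bibliography says who wrote an item.
bibliography = {(BOOK, DCT_CREATOR, GILDERSLEEVE)}

# Dataset B: a (hypothetical) name authority describes the person.
name_authority = {
    (GILDERSLEEVE, RDF_TYPE, FOAF_PERSON),
    (GILDERSLEEVE, FOAF_NAME, Literal("Basil Gildersleeve")),
}


def to_ntriples(triples) -> str:
    """Serialize triples as N-Triples lines, sorted for stable output."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(sorted(lines))


def uris(triples) -> set:
    """Every URI used as a subject or object."""
    found = set()
    for s, _p, o in triples:
        found.add(s)
        if not isinstance(o, Literal):
            found.add(o)
    return found


# The link: both datasets mention the same URI.
print(uris(bibliography) & uris(name_authority))  # {'http://viaf.org/viaf/2490055'}

# Merging is a set union; now we can follow BOOK -> creator -> name.
merged = bibliography | name_authority
creator = next(o for s, p, o in merged if s == BOOK and p == DCT_CREATOR)
name = next(o.value for s, p, o in merged if s == creator and p == FOAF_NAME)
print(name)  # Basil Gildersleeve
print(to_ntriples(name_authority))
```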

Coming next: 1) a recap of my presentation at LAWDI and 2) thoughts about best practices for Linked Open Data related to bibliographies and bibliographic citations specifically, at 2 levels: the low-tech and the higher-tech.