Archive for the ‘Information Management’ Category


TOCS-IN at Zotero: A Project That Didn’t Work

September 20, 2012

So, blogging a project that didn’t work – good idea or not?  Let’s see…

The project was to get the content of the TOCS-IN citation database into the free, open-access bibliographic software Zotero (which David Pettegrew discusses today; his post kicked me over my hesitation about blogging this project). I wanted to do this for two reasons: to draw increased attention to TOCS-IN, which is an excellent, open-access bibliographic resource for Classicists, and make it especially accessible to Zotero users; and to make the TOCS-IN content potentially available as Linked Open Data, because Zotero can export files in BIBO, a linked open data format for bibliographic citations.

My steps were:

1. Get permission from P.M.W. Matheson of the University of Toronto, the manager of the volunteer-driven TOCS-IN project, to use the available data files for this purpose.  She was helpful and supportive – thank you!

2. Write a Python script to convert the data files from their custom SGML markup to RIS, a common format for bibliographic citations (supported by Zotero, EndNote, and most other citation managers). I am not a programmer, but happily my husband is; this piece would not have been possible without his help, although I did big chunks of it All By Myself.

3. Add the RIS-formatted citations to a Zotero Group library. This turned out to be the problem.  In theory, there is no limit to the number of bibliographic citations that can be stored by a Zotero user.  In practice, once I got about 40,000 (of the ca. 80,000) citations uploaded, my Zotero standalone software began freezing every time I attempted to do anything (like stubbornly add another several thousand citations) and refusing to sync with the online Group Library.  A question posted in the Zotero forums got the swift and helpful confirmation that the sync process simply cannot handle such large datasets well, and that I would not be the only one affected; any users who tried to use this large group library would start crashing their Zotero instances as well.
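For readers curious what step 2 involves, here is a minimal sketch of the RIS-writing half of such a script. It is not the script I actually used: the dictionary keys (`authors`, `title`, `journal`, and so on) are hypothetical, and it assumes the TOCS-IN SGML has already been parsed into one dictionary per citation. Only the RIS tags themselves (TY, AU, TI, JO, VL, PY, SP, EP, ER) are standard.

```python
def to_ris(record):
    """Render one citation (a dict with hypothetical keys) as an RIS record."""
    # Every RIS record opens with a type tag; JOUR marks a journal article.
    lines = ["TY  - JOUR"]
    # RIS repeats the AU tag once per author.
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")
    # Map the assumed dict keys onto standard RIS tags, skipping empty fields.
    for key, tag in (("title", "TI"), ("journal", "JO"), ("volume", "VL"),
                     ("year", "PY"), ("start_page", "SP"), ("end_page", "EP")):
        value = record.get(key)
        if value:
            lines.append(f"{tag}  - {value}")
    # Every RIS record closes with an (empty) ER tag.
    lines.append("ER  - ")
    return "\n".join(lines)


# Invented sample record, for illustration only.
sample = {
    "authors": ["Example, A."],
    "title": "An example article",
    "journal": "Example Journal",
    "volume": "1",
    "year": "2012",
    "start_page": "1",
    "end_page": "20",
}
print(to_ris(sample))
```

Writing the records is the easy part; as step 3 shows, the trouble starts when tens of thousands of them are loaded into a single group library.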

What now?

It’s possible that Zotero, which is actively under development, will make it possible to create very large citation libraries. Zotero used to be unable to handle a couple of thousand citations in one library, and now it can do that with ease (as, for example, the ASCSA Group Library of 2553 items demonstrates). But it may not be a priority for Zotero’s developers to move in that direction; most people use Zotero for personal citation libraries, not as de facto mirror sites for large bibliographic indices.

I have looked at BibSoup/BibServer, related projects that allow the open-access presentation of bibliographic data online, deal with a wide variety of formats (bibtex, MARC, RIS, BibJSON, RDF), and are relevant to the Linked Open Data goal of this project (full RESTful API).  I really liked Zotero simply because it is already very popular with humanities-oriented users and likely to become more so (it seems especially popular among graduate students). BibSoup is geared toward STEM academics, and currently only has about 17,000 citations total (and I’m a little hesitant about breaking things after my Zotero experience!); BibServer requires a server and IT chops which I lack. I do think these applications have a lot of potential, but I don’t think they will work for my project right now.  I’d welcome an argument on this point, or any other suggestions.

Finally, I’d like to add a quick recap and appreciation of what TOCS-IN is and comprises.  TOCS-IN is a bibliographic database  that is fully open-access (searchable at Toronto and at Louvain) and entirely crowd-sourced – that is to say, made possible by the contributions of volunteers who transcribe or copy and paste journal tables of contents and format them for inclusion in the database.  A list of volunteers is available at the site, as is a list of journals currently needing a volunteer.  Do consider joining us; I am currently covering three journals, and the time burden is minimal, especially if the journal publishes its table of contents online (much less typing!)

The basic portion of TOCS-IN is about 80,000 citations, comprising the tables of contents of about 180 journals, all among those indexed by the subscription database L’Annee Philologique. The project began in 1992, so chronological coverage mostly starts there.  A comprehensive list of titles, volumes, and issue numbers is available at the Toronto site. TOCS-IN at Toronto and Louvain currently also searches an additional ca. 56,000 citations, including tables of contents of some TOCS-IN journals dating before 1992 (listed at Louvain), and edited volumes, festschrifts, etc. (listed at Toronto).


MARC Records for Packard Humanities Institute Latin Texts

August 2, 2012

Blake Landor, the Classics, Philosophy, Religion and General Humanities Librarian at the George A. Smathers Libraries, University of Florida, has just announced the availability of a set of open-access MARC records for the PHI Classical Latin Texts online (formerly on widely-used CD-Rom).

To download the 605 MARC records, scroll to the bottom of the University of Florida Library’s page about Creative Commons licenses for their work: http://www.uflib.ufl.edu/catmet/creativecommons.html There is a download link for a zip file of the records.

Anyone wanting a view of the way the records look in UF’s catalog can search for ‘Packard Humanities Institute’ in the online catalog: http://uf.catalog.fcla.edu/

Landor thanks Chuck Jones and Karen Green for their support of his project, which was funded by an internal mini-grant, but clearly the biggest thanks are due to Landor for his initiative and public service.  Kudos! Librarians, get ’em in your catalogs ASAP!


LAWDI 3: Good Linking Practices for Bibliographic Stuff

June 13, 2012

While the following were informed by conversations and presentations at LAWDI, they should be considered my opinions only, and I welcome any (polite!) discussion of why my ideas are wrong-headed  in comments.

So, you’re a scholar putting up information online, and you don’t have the time or IT chops to start learning how to implement RDFa or learn a specialized linked open data vocabulary. The following are some ideas of things you can do that are linked open data friendly, with an emphasis on providing links to stable, authoritative, easy to use URLs. This post covers bibliographic items (secondary scholarship).

I want to emphasize that doing all this linking is work; it takes time. I’ve been trying to link more thoroughly in my blog posts about LAWDI, and it does add to the time burden of writing blog posts. I urge readers to strive to include more (good-quality) links in the things they post online, but please don’t feel guilty if you can’t do it all. Do what you can; every bit is a piece toward our common goals.

Books

  • Link to a WorldCat record using the OCLC number. Permalink URLs are shown in records and can be created using the format http://www.worldcat.org/oclc/37663433 .
    WorldCat is my top choice because 1) it welcomes links, 2) it’s the largest and most international open linkable library catalog. Note: sometimes if you look a book up by title you’ll find multiple OCLC records with multiple OCLC numbers, even though you’re looking at the same book, not even different editions. OCLC and its members are probably working to tidy this sort of thing and merge (or at least cross-reference) duplicate records. For now, pick the one that has the largest number of libraries showing in the list in your home/target country (there will often be one US record and one European record, for example.)
  • Link to the US Library of Congress using an LCCN (Library of Congress Control Number).  Permalink URLs are shown in records and can be created using the format http://lccn.loc.gov/97040652 (useful, since many books have the LCCN in print on the inside.)
    Using the Library of Congress is a fine choice; it’s my second choice because it is US-centric (while WorldCat is working on becoming more international) and the Library of Congress records don’t have the enhancements that WorldCat records do (ability to display holdings in libraries near you, ability to provide a link to online booksellers, etc.)
  • I would not bother linking to, for example, Amazon using an ISBN. WorldCat links using OCLC are more useful in my opinion, and as easy to create.
  • Including the ISBN in a citation can be useful; there are some great browser plug-ins that can identify ISBNs in web pages and link users to libraries or online booksellers (for example, LibX or Book Burro).
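If you are building a lot of these links, the two permalink formats above are easy to generate. The following is a small sketch rather than anything official: the function names are my own, and the LCCN handling assumes the common printed form with a hyphen (for example 97-40652), which is normalized by zero-padding the part after the hyphen to six digits.

```python
def worldcat_url(oclc_number):
    """Build a WorldCat permalink from an OCLC number."""
    digits = str(oclc_number).strip()
    if not digits.isdigit():
        raise ValueError(f"OCLC numbers are numeric, got {oclc_number!r}")
    return f"http://www.worldcat.org/oclc/{digits}"


def lccn_url(lccn):
    """Build a Library of Congress permalink from an LCCN."""
    text = str(lccn).replace(" ", "")
    if "-" in text:
        # Printed LCCNs often look like 97-40652; the permalink form
        # zero-pads the serial part to six digits, giving 97040652.
        prefix, serial = text.split("-", 1)
        text = prefix + serial.zfill(6)
    return f"http://lccn.loc.gov/{text}"


print(worldcat_url(37663433))
print(lccn_url("97-40652"))
```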

Digital Books

  • If a book is available in an open-access digital edition, by all means include a link to that, preferably in addition to a link to a WorldCat record for the print edition. For open-access digital books you have two strong choices, neither the clear winner yet in my opinion.
  • Link to the Open Library record. URLs look like this: http://openlibrary.org/books/OL6907393M
    Open Library is the more linked data friendly solution; each record can be downloaded in RDF and JSON. Records also include linked OCLC numbers and LCCNs. The full-text books can be downloaded in a bunch of different formats, from .pdf to MOBI, and are also readable online.  Open Library is part of the Internet Archive, and is a “born-open” project. They currently only have about 1 million open-access books, though, and their records aren’t as scholar-friendly – they don’t have all the features of library catalog records (though they are based on them.)
  • Link to the Hathi Trust record. URLs look like this: http://catalog.hathitrust.org/Record/001220795
    Hathi Trust’s records have library-provided bibliographic data and they have a large collection (3 million plus) of open-access volumes (as well as many more digital volumes not open-access; availability of formats can also be an issue). They are backed by a bunch of big academic libraries and are likely to stick around. They have an API, but are not as linked-data friendly as Open Library.
  • I would not bother linking to a Google Books record unless you can’t find a match at either of the previous places. Google Books has great content, but their metadata is lacking, and they are a for-profit company who cannot guarantee a future commitment to free open-access products.
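To make the "downloadable in RDF and JSON" point concrete, here is a hedged sketch of the URLs involved. It assumes that Open Library serves the machine-readable versions of an edition record at the record URL plus a `.json` or `.rdf` suffix, and that edition identifiers follow the OL…M pattern seen above; both are assumptions worth checking against the current site. The function names are hypothetical.

```python
import re


def open_library_urls(olid):
    """Return the human-readable and (assumed) data URLs for an edition."""
    # Edition identifiers in the example above look like OL6907393M.
    if not re.fullmatch(r"OL\d+M", olid):
        raise ValueError(f"not an Open Library edition ID: {olid!r}")
    base = f"http://openlibrary.org/books/{olid}"
    return {"html": base, "json": base + ".json", "rdf": base + ".rdf"}


def hathitrust_url(record_id):
    """Return the Hathi Trust catalog URL for a record number."""
    return f"http://catalog.hathitrust.org/Record/{record_id}"


for label, url in open_library_urls("OL6907393M").items():
    print(label, url)
print(hathitrust_url("001220795"))
```

Nothing here touches the network; it only assembles the addresses you would link to or hand to another tool.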

Book Chapters

  • For print-only book chapters, right now you’d do best to link to the whole book.
  • Ditto for book chapters available in full-text digitally, unless you can track down .pdfs at the author’s web site or academia.edu, for example.

Journal Articles

  • Link to the DOI of the article – a long unique identifier that is often included even in print citations – using the format http://dx.doi.org/10.1177/1469605309338428 . Participating publishers have committed to maintaining access to articles via DOIs in perpetuity, even as their online platforms may change. (Remember, though, a lot of the articles are available by subscription only; many who follow the link will get an abstract but not full-text if their institution does not subscribe.)
  • Available digitally but doesn’t have a DOI? Look for a stable URL or permalink at the page with the article citation. Jstor does a good job with these (http://www.jstor.org/stable/3182036) but so do many other large commercial article databases.
  • Available digitally but not directly linkable? (This might be the case with an article published in a 19th century journal that has been digitized by the volume, but without the individual articles indexed, or an online-only journal with poor linkability.)  Link to the record for the journal in a repository like Hathi Trust or Open Library (above), or to the home page of the online journal, if articles cannot be directly linked.
  • Print-only? (Lots of journal articles still are, especially older, smaller, or foreign ones). Link to the WorldCat record for the whole journal, using the OCLC number or ISSN if there is one: http://www.worldcat.org/oclc/18999240 .
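The order of preference in the list above can be written down as a simple fallback chain. This is only a sketch of my own recommendations, with hypothetical parameter names; it picks the first kind of link that is available and does not check that any URL actually resolves.

```python
def article_link(doi=None, stable_url=None, journal_record_url=None,
                 journal_oclc=None):
    """Pick the best available link for a journal article, in priority order."""
    if doi:
        # First choice: the DOI, via the resolver.
        return f"http://dx.doi.org/{doi.strip()}"
    if stable_url:
        # Second choice: a stable URL or permalink (JSTOR and similar).
        return stable_url
    if journal_record_url:
        # Third choice: the digitized journal's record or home page.
        return journal_record_url
    if journal_oclc:
        # Last resort for print-only articles: the journal's WorldCat record.
        return f"http://www.worldcat.org/oclc/{journal_oclc}"
    return None


print(article_link(doi="10.1177/1469605309338428"))
print(article_link(stable_url="http://www.jstor.org/stable/3182036"))
print(article_link(journal_oclc=18999240))
```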

Questions? Quibbles? Cases I missed? Ask in comments.

Previous posts here on LAWDI:

Collection of blog posts and other  resources from LAWDI:


Library-Related Presentations at LAWDI

June 6, 2012

LAWDI was set up with half-hour presentations by ‘faculty,’ and 15-minute presentations by the rest of the attendees.  Links to slides for all presentations that used them are being collected here.  In this post I discuss those presentations most relevant to librarians and the issues they love best (bibliographic citation, authority control, scholarly publishing) as well as recapping my own presentation.

Friday we began with a talk by Chuck Jones of the ISAW Library (links he discussed collected at AWOL) and then a powerhouse tour of library linked data and metadata issues by Corey Harper of NYU’s Bobst Library.  His slides are here.   (For librarians wanting to get up to speed or keep up to date on the issues Corey covers I also strongly recommend following the blog of Ed Summers of the Library of Congress, http://inkdroid.org/journal/ Half of what I know about linked open data I learned there.)

So, I had a tough act to follow; I think I actually said, “And now for something completely different.”  First I described the goals of and demonstrated the Ancient World Open Bibliographies. Its origins are covered in a post titled “The Beginning” at that blog, and you can follow the links to the Wiki and Zotero library for the project yourself. In the context of LAWDI, it was important to note that Zotero allows the export of bibliographic citations automatically marked up using the Bibo (Bibliographic ontology) vocabulary, so keeping bibliographies there gives you a leg up on becoming part of the linked open data world.  I also demonstrated an online bibliography on Evagrius Ponticus by Joel Kalvesmaki of Dumbarton Oaks as an example of what can be done with a bibliography based in Zotero, but presented as an inherent part of a digital project.

The second point I wanted to make was that bibliographic information is linked open data friendly.  (Libraries have worked hard to make it so!) Library catalogs are structured data files on books, and while the current structure is out of date, we’re working on that (see Corey Harper’s talk). Most books have a standard number that represents them: an ISBN, an OCLC number (accession number into the OCLC catalog, now online as WorldCat) or a Library of Congress Control Number (LCCN).  Many books have all three!  Articles, book chapters, or other things  scholars want to cite are more problematic.  Many journal publishers now use DOIs (digital object identifiers) for specific articles, but these have not been universally adopted. I demonstrated the DOI resolver at http://dx.doi.org/ (which also lets you create stable URIs for DOIs; I’ll cover this in more explicit detail in a future post.)

My third point was to try to think more broadly about how existing open-access online bibliographic indexes for ancient studies could move in the direction of being linked open data compliant.  At 8am the morning I spoke, without any prompting from me, Tom Elliott posted a manifesto on this same topic at his blog: Ancient Studies Needs Open Bibliographic Data and Associated URIs. So, let me say, what he said, and amen.

Saturday we had two talks that were very exciting to me as a librarian, even though they were actually about scholarly publishing. Sebastian Heath of ISAW talked (without slides I think) about publishing the ISAW Papers series using linked open data principles.  Andrew Reinhard of the American School of Classical Studies (ASCSA) publications office brought forward one of the more resonant metaphors of the conference, that the current scholarly publishing enterprise is essentially steampunk, 21st century work with 19th century models. (This got retweeted a lot!) He was bursting with ways ASCSA plans to change this. Slides are here.

Next up: my recommendations on choosing good links for bibliographic stuff.

Previous post here on LAWDI:


Google Scholar Citations & Wikipedia Initiative

November 20, 2011

I started a temporary job this week, at the University of Cincinnati Classics Library. It was sudden, and is temporary, because it followed the unexpected death of David Ball, the longtime Circulation Supervisor there, and PhD of that department, whom I knew slightly during our overlap in the Blegen in 2000. He is and will continue to be much missed.

In my experience the first week of a new job one is either left alone and bored for long periods while training is being organized, or one is run off one’s feet.  Guess which last week was for me? There’s also some “work for hire” language in the temp agency paperwork that makes me uncomfortable, so I’ll be blogging exclusively on my own time, which has many other demands on it already, such as 3rd grade spelling homework.  Two quick notes, though:

Following swiftly on the heels of the Bing Scholar outreach into Arts and Humanities, Google has opened up its “Citations” program to all comers.  What this means is you can sign up to manage a page for yourself as a Google Scholar author, verify that scholarly works Google Scholar identifies as by you are actually by you, and link out to a web site (hmm, following on Chuck Jones’s post about the prevalence of full-text papers in Institutional Repositories and desirability of an index thereto, why not link to a place scholars can download .pdfs of your work?)  There are also the beginnings of citation metrics, a feature Microsoft Academic Search is also developing, both as a challenge to the most commonly used metrics in (subscription-based) Science Citation Index at Web of Science.

Here’s a link to my citations page, if you want to see what it looks like.  Obviously if your name is as uncommon as mine, you’re probably easily findable in Google Scholar anyway, but if you share a name with many scholars in many fields, Google Scholar Citations is a great way to make your work more easily findable amidst the mass of Karen Joneses out there.

On another topic, I sadly neglected to note who brought to my attention the American Sociological Association’s call for a Wikipedia Initiative among scholars in that field.  Hat tip to somebody, probably Chuck Jones or David Meadows!  The essay linked above can be boiled down to: Think Wikipedia stinks for sociology? Well, people are going to keep using it, so why not make it better?  Gabriel Bodard bruited the idea of a Classics Wikipedia Hack Day on Twitter a while back, but enthusiasm was somewhat limited.  I myself was a bit daunted when I set out to be a one-woman Wikipedia Classics Hacker, and wrote about some reasons why.  But I still think it would be valuable, and one might even argue that it’s necessary, for scholars to improve Wikipedia articles in their fields. I just can’t quite see yet how to make it happen, and I hope the Sociologists find a good way forward with this.


Academic Search from Microsoft (Yup, it’s Bing Scholar)

November 8, 2011

I still get a ton of traffic to this blog from people searching for a Microsoft Bing version of Google Scholar.  Yesterday I got a comment from someone who works at Microsoft linking me to such a service, which now exists in beta (whatever that means, anymore).

Please consider the following assessment of Microsoft Academic Search, as it seems to be formally called, an addendum to my long post Comparing citation searching: Google, Bing, Google Scholar, Web of Science, L’Annee, originally written in October 2009. (You know in my mind it’s just going to be Bing Scholar, forever and ever.)

Short reminder of the method – I vanity searched myself, under the names “Phoebe Acheson” and “Phoebe E. Acheson” (with and without quotes if the search engine supported them) and reported on how much of my professional work was found.

Microsoft Academic Search (http://academic.research.microsoft.com/)
Microsoft Academic Search is a free online scholarly search engine which debuted in late 2009 with limited discipline coverage, but has expanded a great deal over the course of 2011.  (According to this report, there was no Humanities or Social Science content yet as of July 2011; there is now.) It’s very hard to find a statement of where the content indexed by Microsoft Academic Search comes from; it does not appear to come through direct partnerships with academic publishers, as Google Scholar’s does.  Instead, Microsoft Academic Search uses a “focus crawler” and indexes data (including some but not all metadata) from web sites listing citations.  (This information comes from a Microsoft Q&A forum in 2010; a list of the top 100 sites indexed is included as are some specifics about metadata collected.) A major difference from Google Scholar is that Microsoft Academic Search seems to index (and thus search) only citations, not the full text of articles.

As of this writing, Microsoft Academic Search states that it contains 36,684,112 publications by 18,820,566 authors, and is updated weekly, with 123,978 items added last week.  Microsoft Academic Search classifies its content by Domains, which are heavily tilted towards scientific disciplines (Agricultural Science, Materials Science) but now include Arts & Humanities, Business and Economics, and Social Science.  One advantage of the domain classification is that one can limit a search by one or more domains; this fixes a common problem in Google Scholar in which name searches for classical scholars turn up many articles by same or similar-named authors in scientific fields.  Search results can also be narrowed by domain, a very big improvement over Google Scholar.  How the Domains are assigned is not stated, of course, so interdisciplinary topics might be tricky to place accurately.

When I searched for Phoebe Acheson, a box above the results set asked me if I was searching for one of two authors, Phoebe Acheson or Phoebe E. Acheson (I have published under both names).  For a more common name – I used Steve Thompson – a long list of possibilities appears, but at least some of them are distinguished by academic affiliations, and a few have a photo!  It is possible to create an account, log in and add information to Microsoft Academic Search, and one thing a researcher can do is “claim” her own articles and create a profile (and apparently upload a picture.)  (Google Scholar has a feature like this which came out last summer, but I signed up on the wait list to claim my account then and still haven’t heard back from them.)

Microsoft Academic Search found 4 publications for me, and they are all works that I authored or co-authored.  One publication is listed twice; apparently the algorithm is not too good at detecting duplicates, as the only difference is the absence of page numbers in one of the citations. A check of Google Scholar using the search Phoebe Acheson turns up a total of 275 citations, but only the top 5 are actually things I published.  Thus, while Google Scholar includes more erroneous results, it also includes more correct results (and remember, it is searching the full text of articles – so it finds any publications that mention my name).  I would guess that Microsoft Academic Search will improve in this area, as Humanities and Social Science domains are new to the system and presumably growing.  Microsoft Academic Search, like Google Scholar, includes a citation index feature allowing one to see other works which have cited a paper.  This feature also suffers from the limited content of Microsoft Academic Search; a paper listed as cited 14 times in Google Scholar has no citations in Microsoft, and another cited 8 times in Google Scholar is cited once in Microsoft.  Since Microsoft Academic Search is using this citation information to develop citation metrics (see this Nature article), the speedy growth of the material set indexed by Microsoft is urgent to make the numbers have real meaning.
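The duplicate I ran into, two citations differing only in whether page numbers are present, is the kind of thing a simple normalization step can catch. The sketch below is purely illustrative and says nothing about how Microsoft's system actually works: it assumes each citation is a dictionary with hypothetical keys (`title`, `authors`, `year`, `pages`), builds a comparison key that ignores page numbers, and keeps the more complete record of each duplicate pair.

```python
import re


def citation_key(citation):
    """Build a comparison key that deliberately ignores page numbers."""
    # Lowercase the title and collapse punctuation and spacing differences.
    title = re.sub(r"[^a-z0-9]+", " ", citation.get("title", "").lower()).strip()
    authors = tuple(sorted(a.lower().strip() for a in citation.get("authors", [])))
    return (title, authors, str(citation.get("year", "")))


def filled(citation):
    """Count the non-empty fields, as a rough measure of completeness."""
    return sum(1 for value in citation.values() if value)


def deduplicate(citations):
    """Keep one record per key, preferring the more complete one."""
    best = {}
    for citation in citations:
        key = citation_key(citation)
        kept = best.get(key)
        if kept is None or filled(citation) > filled(kept):
            best[key] = citation
    return list(best.values())


# Invented records: the same article, once without page numbers.
records = [
    {"title": "An Example Article", "authors": ["Example, A."], "year": 2005, "pages": ""},
    {"title": "An example article.", "authors": ["Example, A."], "year": 2005, "pages": "1-20"},
]
print(deduplicate(records))
```

A real system would need fuzzier matching (initials versus full names, subtitle variants), but even this much would have merged my two entries.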

So, the content for the Humanities and Social Sciences is very limited still.  Where Microsoft Academic Search shines, and challenges Google Scholar, is the added features.  The ability to facet a search by domain and the existence of author pages (here’s Jack L. Davis) were mentioned above.  There are also pages for journals (here’s Hesperia), built in citation graphs and co-author webs, and various other neat bells and whistles (a Call For Proposal search that can specify by location of the conference – I guess for when you’re dying to visit Florence for work!)

I recommend most classics scholars and students check Google Scholar when searching for articles on a topic, in addition to looking in  discipline-specific bibliographic sources.  (I also LOVE it for citation-checking – when you’ve copied something down wrong, or can’t remember a subtitle, Google Scholar is almost always the fastest way to get the right information.)  Microsoft Academic Search is not yet ready to challenge Google Scholar for classicists, based simply on the content available.  But if the content continues to grow, it could become a strong challenger.  And I think that junior academics seeking to manage their online visibility and findability owe it to themselves to spend an hour logging on, claiming their author page, and adding any missing citations (you can even link to a full-text paper or add a .pdf).  Like Academia.edu, which I have discussed in this space, Microsoft Academic Search is a place you can be found, so it behooves you to make the information about you there as full and accurate as possible.