Free your data and the rest will follow

October 17, 2002

Alice questions the feasibility of my utopian legal research scheme and wonders if it would end up making research harder, by killing off West and Lexis. I highly doubt that courts publishing their cases in a standard electronic format would kill off West or Lexis, or even have a significant effect on their revenues. In fact, it might make it even cheaper for those services to add new data (although probably not; I assume they already get material in some sort of electronic format).

Even if data is available free, there is still room for premium research services.

People are willing to pay for data that is otherwise free, if it is well organized and easy to search. For example, eMarketer (my former employer) does this for internet stats. West and Lexis add significant value to the raw data (the value of human editing and of classifying cases into the relevant materials). What I'm suggesting is a standard for distributing the electronic equivalent of official court reporters (which, according to a law librarian, "you're never going to actually use.")

Open legal data is not going to replace the proprietary databases, and probably won't even make legal research all that much cheaper. What it does is allow law libraries, schools and firms to create their own unique tools based on freely available data. A firm could create an electronic database of case law focused on a certain area, and index it using a much more detailed taxonomy than West's. A tool could spider through cases and pull out the links between them from citations (a la Blogdex). A firm's internal research system could store its own metadata alongside the primary materials. All of this gives entities the opportunity to have research tools that are better suited to their individual needs. Open formats would open legal research to new creative, specialized tools. This is not creating a single monolithic scheme, but opening the door to many specialized and innovative search schemes. Look at the Google API, for example: open legal data could similarly offer lawyers a larger number of unique, specialized tools.
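To make the citation-spidering idea a little more concrete, here is a rough sketch in Python. It assumes opinions are available as plain text keyed by their own citation, and it only recognizes a few federal reporter formats; the pattern, the function names, and the sample texts are all made up for illustration, not taken from any existing tool.

```python
import re
from collections import defaultdict

# Simplified pattern for reporter citations such as "799 F.2d 1219" or
# "479 U.S. 1070". Real citation formats are far more varied; this is an
# assumption that covers only a handful of federal reporters.
CITATION_RE = re.compile(r"\b(\d{1,3})\s+(U\.S\.|F\.[23]d|F\. Supp\.)\s+(\d{1,4})\b")


def extract_citations(text):
    """Return the citations found in text, in order, without duplicates."""
    found = []
    for volume, reporter, page in CITATION_RE.findall(text):
        cite = f"{volume} {reporter} {page}"
        if cite not in found:
            found.append(cite)
    return found


def build_citation_graph(opinions):
    """Map each opinion's own citation to the citations its text refers to."""
    graph = {}
    for cite, text in opinions.items():
        graph[cite] = [c for c in extract_citations(text) if c != cite]
    return graph


def cited_by(graph):
    """Invert the graph: for each citation, list the opinions that cite it."""
    inbound = defaultdict(list)
    for source, targets in graph.items():
        for target in targets:
            inbound[target].append(source)
    return dict(inbound)


# Hypothetical sample data: the keys are citations, the values are
# placeholder text standing in for the full opinions.
sample_opinions = {
    "158 F.3d 693": "Placeholder text. But see 799 F.2d 1219 (8th Cir. 1986).",
    "799 F.2d 1219": "Placeholder text with no recognizable citations.",
}

graph = build_citation_graph(sample_opinions)
inbound_links = cited_by(graph)
```

A sketch like this only records that one case cites another. It says nothing about why the case is cited or how it is treated, which is the kind of value human editors add.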

The biggest obstacle to this being useful anytime soon is not convincing judges of its relevance -- electronic publishing will likely be cheaper than publishing on paper -- but getting the 200+ years of historical data that's in books and proprietary databases into an open, standard format.

EDIT: Donna Wentworth asks:

Let's just say that someone had a copy of the Eldred Supreme Court transcripts, culled from the generous-yet-decidedly-proprietary databanks of Lexis-Nexis. Could that someone then go ahead and publish the transcripts on her weblog?
On the same theme, wouldn't that be easier if there was public access to public court records?

Posted by Andrew Raff at October 17, 2002 04:37 PM
Trackback URL for this entry:
Geeked out!
Excerpt: Andrew has elucidated his thoughts on the super utopian open research scheme. Plus my super-duper comments!
Weblog: a mad tea-party
Tracked: October 18, 2002 06:17 PM

Yes, you're right. Getting cases into non-proprietary format would be the hardest thing. But moving beyond that --

I suppose I would need to see a more detailed specification of what you intended it to do. If I want to see the cases that link to another, I key cite or shepardize or flavor-of-the-week it. I don't see how links a la Blogdex would be helpful, and how that isn't already incorporated.

The problem with that sort of relational linking is it *still* doesn't take into account what kind of link it is (link on standard of review? prior cases in that line's holding, etc) and how the current opinion is treating that link. Any kind of program made to separate out, say, construction law cases is inevitably going to select some summary judgment cases and underselect for the pertinent case law.

But do I need to store my own special metadata on a case that I use infrequently? Perhaps even only once? What incentive is there for me to do that? The point you make about keeping your metadata to yourself is good (I think Mike was envisioning a totally open system). No _lawyer_ will want to share his proprietary info!

Now, what I think you're getting at (and I could be totally off-base, hope you don't think I'm being too presumptive) is knowledge management on the firm end. I could definitely see the utility in something that went through briefs, memos, and whatnot, tagged the sentences being cited to, and arranged them so they could be accessed either on a topical level (or what it hoped was a topical level... again, I have problems seeing how mere programs can really organize these things well) or from the case law. That way you can see what you've said the case said. Throw in some trust metrics (super secret, because the smart partners get more juice than the not-so-great lawyers that nonetheless bring in loads of business ;) and the new lawyer or the lawyer attacking a case a little bit out of his normal area can pick up things more easily. Of course, link the things you've pulled out into your objects to the files they're in, so you can see the context.

For that kind of idea, you wouldn't even need to worry about open source law. Just link it to your favorite research system and all that jazz.

Posted by: Alice on October 17, 2002 10:15 PM

Court records - there is indeed public access to the SCOTUS transcripts (see here for more info -

You'd have to ask a real attorney, but I am pretty sure there is much less law militating in favor of public access to transcripts than for the _law_. There will be a good chunk of cases in your conlaw casebook dealing with the right to appeal, the costs of getting a transcript for that appeal, and people who can't afford to do that.

With that kind of public but not free record, although you might think it would be wonderful to have free and open court records for all, think about all the private information in those records. Quite a bit has been written recently about public records that were private by virtue of their inaccessibility. Laws making certain records public didn't quite foresee new technology that spreads social security numbers and home addresses all over the world.

Posted by: Alice on October 17, 2002 10:26 PM

"But do I need to store my own special metadata on a case that I use infrequently? Perhaps even only once?"

I'd think that this is the situation where you want to store your personal metadata, so that you can remember how you used it last-- 3 years ago.

Of course, this raises the issue of personal KM vs firm-centric KM. As an individual, you want to keep your knowledge as personal and proprietary as possible, but the firm, as a whole, wants to share knowledge and make sure that it permeates throughout the firm and doesn't centralize in one person. That tension is why organizations have trouble adopting KM initiatives. But being able to extend and create one's own system will unquestionably be more elegant and useful than hacking some extensions onto a proprietary research system.

I think the privacy argument is a strong reason to keep public records only semi-public, but the irony of public records not being available to the general public helps to make my point that open access is useful.

And what type of research applications do I envision? I'm not entirely sure. I don't have enough experience with legal research to know what I'd like to see. From the idealistic perspective, open systems offer much more potential for faster innovation and trial and error than closed systems run by an oligopoly with little incentive to innovate.

Posted by: Andrew on October 17, 2002 11:02 PM

Efforts are under way to convince state and federal courts to move to citation systems that do not depend on proprietary reporters like the West Regional Reporter system. Some states have already adopted public domain citation, making it a lot easier to use the case reports released electronically by the state courts.

Mandatory citation to proprietary reporter series is onerous because, thanks to Eighth Circuit and other holdings, the pagination of a compilation is copyrightable even if the compiled text is in the public domain, and "star pagination" in another copy is infringement. There's a circuit split on this matter. See West v. MDC, 799 F.2d 1219 (8th Cir. 1986) (holding that Mead, then the owner of LEXIS, appropriated West's copyright in the arrangement of its articles when it inserted "star pagination" in cases reproduced on LEXIS), cert. denied, 479 U.S. 1070 (1987). Contra Matthew Bender & Co. v. West Pub. Co., 158 F.3d 693, 699-701 (2d Cir. 1998) (holding that Matthew Bender's practice of obtaining case text from LEXIS and reinserting "star pagination" referring to pages of the books of the West Reporter System did not infringe because the arrangement does not meet even the de minimis requirement of creativity), cert. denied, 526 U.S. 1154 (1999).

(Don't worry, I ripped that cite out of a paper I wrote. I didn't spend part of my Saturday evening writing that cite! Besides, I don't have access to Westlaw or LEXIS to do that kind of research right now, which makes public domain opinions that much more important to me.)

The 17th Edition of the Bluebook includes a rule for public domain citation, rule 10.3.3, on page 64.

In states that adopt public domain citation, the courts tend to release opinions in formats designed to make using public domain citation easier -- for example, by numbering paragraphs so that page numbers and typefaces are insignificant.
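As a rough illustration of why paragraph numbering helps, here is a minimal Python sketch that parses a citation of the form "year, court abbreviation, sequential opinion number, paragraph" and looks up the pinpoint paragraph. The pattern is a simplification, and the sample citation, case name, and opinion text are hypothetical; actual state formats vary.

```python
import re
from typing import NamedTuple, Optional


class NeutralCitation(NamedTuple):
    year: int
    court: str
    number: int
    paragraph: Optional[int]


# Assumed simplified format: "2002 ND 45, ¶ 12" (the paragraph part is optional).
NEUTRAL_RE = re.compile(r"\b(\d{4})\s+([A-Z]{2,5})\s+(\d+)(?:,\s*¶\s*(\d+))?")


def parse_neutral_citation(text):
    """Return the first public domain citation found in text, or None."""
    match = NEUTRAL_RE.search(text)
    if match is None:
        return None
    year, court, number, paragraph = match.groups()
    return NeutralCitation(
        int(year), court, int(number), int(paragraph) if paragraph else None
    )


def find_paragraph(opinion_paragraphs, citation):
    """Look up the pinpoint paragraph in a dict of paragraph number -> text."""
    if citation.paragraph is None:
        return None
    return opinion_paragraphs.get(citation.paragraph)


# Hypothetical citation and opinion, for illustration only.
cite = parse_neutral_citation("See Doe v. Roe, 2002 ND 45, ¶ 12.")
opinion = {11: "Placeholder paragraph eleven.", 12: "Placeholder paragraph twelve."}
pinpoint = find_paragraph(opinion, cite)
```

Because the pinpoint is a paragraph number rather than a page in a particular printed volume, the same lookup works on any copy of the opinion, however it is paginated or typeset.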

A few resources:

American Association of Law Libraries, Citation Formats Committee:

American Bar Association Special Committee on Citation Issues Report and Recommendations (1996):, Public Domain Citation Systems:

Google turns up a whole pile of hits for "public domain citation system".


That all said, I agree that a place will remain for the paid legal research services, who can invest substantial sums in system maintenance, database and software development, data entry, and so forth. They can amass data from countless proprietary publications that are not in the public domain. The most sophisticated searches will still come at a cost, but a lot of times one can get by with much less complicated searches. Sometimes, it's just a matter of getting hold of a case that one knows is out there.

Posted by: tph on October 19, 2002 11:35 PM