CURL for pubmed reference data

unixstudent
Joined: 05/03/2013

Is it possible to use curl to retrieve a PubMed entry?
When I type:
curl "http://www.rcsb.org/pdb/files/20CB.pdb"

I get the pdb file (with no other info).

But when I type:
curl "http://www.ncbi.nlm.nih.gov/pubmed/23638186"

or even some version of this such as http://www.ncbi.nlm.nih.gov/pubmed/files/23638186

I get the entire web page for that entry, not just the data associated with PMID 23638186.

Is there some way to format this so only the data associated with that particular PMID is retrieved?

Thanks!

pcfb
Joined: 08/04/2010
Curl for Pubmed

This is a good question, and a way to do it would be very popular, but the short answer is no. It is tricky to get the PDF from PubMed even when you are browsing the web. Usually there is a LinkOut menu below the abstract with URLs for *other* sites where the PDF might actually reside. NCBI doesn't archive any of the PDFs at PubMed itself; it just points to where they are archived.

In general, if you want to find the "curlable" location of a file, you would start with a page like that pubmed link, look at the source for the page (the raw file) and find the full URL.

In the case of PubMed, they *do* list the DOI of the article, which links you to the original source:
curl -s "http://www.ncbi.nlm.nih.gov/pubmed/23638186" | grep doi

If you want to get a little fancier, you can extract just the URL for the DOI itself:
curl -s "http://www.ncbi.nlm.nih.gov/pubmed/23638186" | grep doi | tail -n1 |sed -E 's/.*doi: ([^ ]+)\..*/http:\/\/dx.doi.org\/\1/g'

That URL points to a redirection service that in turn leads to the real site! So it is possible to curl *that* URL (add -L so that curl follows the redirect), but it lands on a different site for each publisher, so you couldn't easily write a single command that would find the PDF on *that* page: it would depend on the journal's web format.

Once you start getting fancier like that, it would be worth writing a Python program to chain things together: find the DOI, retrieve the DOI, find the journal it links to, look for links on the journal site that have PDF in the name, etc.
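
Before moving to Python, the first steps of that chain can be roughed out in the shell. This is only a sketch under assumptions: the function names (find_doi, resolve_doi, pdf_links) are made up for illustration, the DOI pattern is a loose approximation rather than the full DOI grammar, and https://doi.org/ is used as the resolver in place of the dx.doi.org address above. Nothing here knows about any particular journal's page layout, so the last step may well come back empty.

```shell
# Pull the first DOI-looking string out of text read on stdin.
# Rough pattern: "10.", 4-9 digits, "/", then anything up to a space, quote or bracket.
find_doi() {
    grep -Eo '10\.[0-9]{4,9}/[^[:space:]"<>]+' | head -n 1 | sed 's/[.,;]$//'
}

# Follow the resolver's redirects and print the final landing URL.
# -L follows redirects, -o /dev/null discards the page, -w prints the last URL reached.
resolve_doi() {
    curl -Ls -o /dev/null -w '%{url_effective}\n' "https://doi.org/$1"
}

# List href targets that mention "pdf" in HTML read on stdin.
pdf_links() {
    grep -Eio 'href="[^"]*pdf[^"]*"' | sed -E 's/^[^"]*"//; s/"$//'
}

# Possible use (needs network access, so it is left commented out):
#   doi=$(curl -s "http://www.ncbi.nlm.nih.gov/pubmed/23638186" | find_doi)
#   site=$(resolve_doi "$doi")
#   curl -Ls "$site" | pdf_links
```

The links that come back are often relative paths, and some publishers hide the PDF behind scripts or logins, which is where a longer Python program starts to earn its keep.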

In this case, I would just download the Papers program and let it do the link following for you!

unixstudent

One possible answer appears to come from http://www.ncbi.nlm.nih.gov/books/NBK3862/

At the prompt, write:
curl "http://www.ncbi.nlm.nih.gov/pubmed/23638186?report=medline&format=text"

This returns the data associated with the PMID 23638186

Now I just have to figure out how to do this with a query!
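
For the query part, one route that may work is NCBI's E-utilities, which are separate from the web URL above: esearch.fcgi turns a search term into a list of PMIDs, and efetch.fcgi returns the records for those PMIDs. The sketch below assumes that service is reachable at the address shown and that the ESearch reply puts each <Id> element on its own line. The function names (extract_ids, pubmed_search, pubmed_fetch) are made up for illustration.

```shell
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Print the PMIDs found in an ESearch XML reply read on stdin.
# Assumes one <Id>...</Id> element per line.
extract_ids() {
    sed -n 's:.*<Id>\([0-9][0-9]*\)</Id>.*:\1:p'
}

# Search PubMed; $1 is the query, $2 (optional, default 5) caps the number of hits.
# -G sends the data as a GET query string; --data-urlencode escapes spaces and brackets.
pubmed_search() {
    curl -sG "$EUTILS/esearch.fcgi" \
        --data-urlencode "db=pubmed" \
        --data-urlencode "term=$1" \
        --data-urlencode "retmax=${2:-5}" | extract_ids
}

# Fetch MEDLINE-format text for one PMID (or several, comma-separated) given as $1.
pubmed_fetch() {
    curl -sG "$EUTILS/efetch.fcgi" \
        --data-urlencode "db=pubmed" \
        --data-urlencode "id=$1" \
        --data-urlencode "rettype=medline" \
        --data-urlencode "retmode=text"
}

# Possible use (needs network access, so it is left commented out):
#   pubmed_search "bioluminescence" 3 | while read -r id; do pubmed_fetch "$id"; done
```

NCBI asks automated clients to keep their request rate low, so a sleep between fetches is a sensible addition for anything longer than a handful of records.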

pcfb
PUBMED Data

Apologies: I thought you were looking for the actual paper associated with it, not just the metadata.

If you just want to type getpubmed 23638186 and have it return the record, then look into shell functions.
Put this definition in your .bash_profile:

# Print the MEDLINE-format record for the PMID given as the first argument
getpubmed() {
    curl -s "http://www.ncbi.nlm.nih.gov/pubmed/$1?report=medline&format=text"
}

The $1 in place of the PMID inserts the first thing you type after the function name (getpubmed) into the URL. Open a new terminal window (or run source ~/.bash_profile) for the definition to take effect.

If that isn't what you are looking to do, maybe explain a little more about what you want to type/run and what you want to happen as a result.
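
If the goal turns out to be pulling single fields out of the record, a second function can filter the MEDLINE text. This is a sketch, not part of the book's scripts: the name medline_field is made up, and it assumes the usual MEDLINE layout, where each field starts with a short uppercase tag followed by "- " and wrapped lines continue with leading spaces.

```shell
# Print the value(s) of one MEDLINE field from a record read on stdin.
# $1 is the field tag, e.g. TI (title) or AU (author).
# Wrapped continuation lines are joined onto the field they belong to.
medline_field() {
    awk -v tag="$1" '
        function emit() { if (have) print buf; have = 0 }
        /^[A-Z][A-Z]* *- / {
            emit()                                   # a new field ends the previous one
            t = $0; sub(/ *-.*/, "", t)              # t is the tag of this line
            if (t == tag) {
                buf = $0; sub(/^[A-Z][A-Z]* *- /, "", buf)
                have = 1
            }
            next
        }
        /^ / && have { line = $0; sub(/^ +/, "", line); buf = buf " " line; next }
        { emit() }
        END { emit() }
    '
}
```

With getpubmed defined as above, getpubmed 23638186 | medline_field TI would be expected to print just the title, and medline_field AU one author per line.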

unixstudent

Thanks so much for all these great suggestions, I will look into this more as I progress through the book so I can not just copy but (hopefully) understand!

What I was aiming to do with this was to go through the exercises on pages 97-99. I could not get the CrossRef URLs in the example files to return the references as described in the book, so I wanted to try adapting this exercise to PubMed.

pcfb
Crossref

OK, good luck!
If you copy the URLs from these example files, they should work:
scripts/shellscripts.sh
examples/reflist.txt

For instance, these URLs work for me:

http://www.crossref.org/openurl/?pid=demo@practicalcomputing.org&id=doi:10.1103/PhysRev.47.777&redirect=false&format=unixref
http://www.crossref.org/openurl/?title=Nature&date=2008&volume=452&spage=745&pid=demo@practicalcomputing.org
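
To avoid retyping those URLs, they can be assembled from their parts. The two functions below are a sketch with made-up names (crossref_doi_url, crossref_citation_url). They only build strings in the same shape as the two URLs above and do not check whether the CrossRef service still answers them.

```shell
# The pid is the demo address that appears in the URLs above.
CROSSREF_PID="demo@practicalcomputing.org"

# Build the DOI lookup URL; $1 is the DOI.
crossref_doi_url() {
    printf 'http://www.crossref.org/openurl/?pid=%s&id=doi:%s&redirect=false&format=unixref\n' \
        "$CROSSREF_PID" "$1"
}

# Build the citation lookup URL; $1 journal title, $2 year, $3 volume, $4 start page.
# Values are pasted in as-is, so they must already be URL-safe (no spaces).
crossref_citation_url() {
    printf 'http://www.crossref.org/openurl/?title=%s&date=%s&volume=%s&spage=%s&pid=%s\n' \
        "$1" "$2" "$3" "$4" "$CROSSREF_PID"
}

# Possible use (needs network access, so it is left commented out):
#   curl -s "$(crossref_doi_url 10.1103/PhysRev.47.777)"
```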

Have fun, and thanks for the enthusiasm.