Ruby script: download document JPGs from ISSUU

Ian Wishart has released his excellent book ‘The Divinity Code’ onto the wild woolly web. Sign up to issuu and download the document PDF now!

HOWEVER, sometimes publishers don’t give you a PDF. You can download the JPGs individually using laborious methods, but come on! You’re using a computer, a machine that specialises in automating boring tasks. Time for a script, methinks.

This little Ruby snippet downloads all the JPG files that make up a document on issuu.com into the current directory. It needs a wee bit of setup; instructions are provided in the code comments.
NOTE: you’ll need Ruby installed to use it!

#-------------------------------------------------
# get_issuu.rb - retrieve all jpg's for a document
#-------------------------------------------------

# 1. Open the issuu.com document in your web browser, as usual
#    example: http://issuu.com/iwishart/docs/thedivinity
# 2. Read the document page count; set variable $PAGES below
# 3. From browser menu, choose View > Source
# 4. Do a text search for "documentId" 
# 5. Copy string such as "081230122554-f76b0df1e7464a149caf5158813252d9" 
#    to $PUB variable below
# 6. Execute script:
#    ruby get_issuu.rb

require 'open-uri'

$PUB = "081230122554-f76b0df1e7464a149caf5158813252d9"
$PAGES = 20

# Fetch each page and save it under a zero-padded name so the files sort in page order
(1..$PAGES).each do |x|
  px = "page_#{x}.jpg"
  padx = format("page_%03d.jpg", x)
  puts "#{Time.now.strftime('%Y-%m-%d %X')} get #{px} >> #{padx}"
  # URI.open is needed on Ruby 3.0+, where plain open() no longer accepts URLs
  File.binwrite(padx, URI.open("http://image.issuu.com/#{$PUB}/jpg/#{px}").read)
end
puts "#{Time.now.strftime('%Y-%m-%d %X')} DONE"
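
Steps 3 to 5 in the comments (search the page source for "documentId" and copy the ID) could be automated along these lines. This is only a sketch: the helper name extract_document_id is made up for illustration, and it assumes the ID always has the same shape as the example in step 5 (12 digits, a hyphen, 32 hex characters) and follows the text "documentId" closely. The sample string is invented, not real issuu markup.

```ruby
# Hypothetical helper: pull the document ID out of saved page source.
# Assumption: the ID looks like the example in step 5, i.e. 12 digits,
# a hyphen, then 32 hex characters, shortly after the text "documentId".
def extract_document_id(html)
  match = html.match(/documentId\W+(\d{12}-\h{32})/)
  match && match[1]
end

# Invented sample text standing in for the page source
sample = 'var cfg = {"documentId":"081230122554-f76b0df1e7464a149caf5158813252d9"};'
puts extract_document_id(sample)
```

If the page layout differs from this assumption the helper returns nil, and the manual View > Source search still applies.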

Hat tip: rubynoob.

Filed under geek

27 Responses to Ruby script: download document JPGs from ISSUU

  1. Archie

    Sorry a stupid question, but where does it save the downloaded images?
    Could you please email me?

  2. JPGs go to wherever the script is invoked.

    Try this
    1) copy the script “get_issuu.rb” to your desktop
    2) start a command prompt and cd to the desktop (C:\winnt\profiles\???\desktop or C:\users\???\desktop, depending on your OS)
    3) run the script as described
    4) JPGs copied to the desktop :)
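
To make the save location explicit rather than relying on where the script happens to be invoked, the output filename can be joined onto a chosen directory. A minimal sketch follows; page_path is a hypothetical helper, not part of the original script, and the directory "/tmp/issuu" is only an example that would have to exist before writing to it.

```ruby
# Hypothetical helper: build the path a page is saved to.
# With no directory given, the file lands in the current working directory
# (Dir.pwd), which matches what the original script does.
def page_path(page, dir = Dir.pwd)
  File.join(dir, format("page_%03d.jpg", page))
end

puts page_path(1)                # inside the directory the script was invoked from
puts page_path(7, "/tmp/issuu")  # inside an explicitly chosen directory
```

Passing the result of page_path as the filename when writing each page would send the JPGs to that directory.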

  3. Johan M Kjartansson

    I installed macruby on my mac and tried to run the script in terminal and got this message:

    get_issuu.rb:1: syntax error, unexpected $undefined, expecting ‘}’
    {\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf350
    ^
    ???

  4. Johan M Kjartansson

    Disregard.

    It worked after I compiled the .rb file in Dreamweaver instead of TextEdit!?

    But just so you know, the images are saved in my users folder even though I had the get_issuu.rb file located on my desktop and I executed it from there also in Terminal ;)

    Thanks a lot!

  5. Gonzalo

    Many Thanx, dude! This worked just perfectly! I was looking for picture books so badly! Cheers!

  6. Oleg

    Is it possible to download images in their original size?
    For example : on this book http://issuu.com/junkshow/docs/2012_program_special_blend
    script worked just fine, but images scaled down to fit the screen and practically unreadable.
    thanks.

  7. Kostas

    Excellent! Thanks a lot for that.

  8. nicolas

    it works!!!! a m a z i n g….thanks

  9. Herman Sasmita

    Waaa It Worked….. ^^ THank YOu……

  10. give me the step how to do it?

  11. Mr.Thanks

    I second the explanation on downloading Original FULL SIZE images… I’m getting compressed hardly legible images :| totally bumming me out brah…

  12. sundeep

    How do I download the pages in between the book??

  13. S

    Works,

    Seconding 100% original full size image download script proposal.

  14. Thanks for sharing!
    And this is my python code:

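    # get_issuu.py
    # Python 2 only: uses urllib2 and the print statement

    import urllib2
    from datetime import date

    pub = "081230122554-f76b0df1e7464a149caf5158813252d9"
    pages = 20
    # range() stops before its end value, so add 1 to include the last page
    for x in range(1, pages + 1):
        px = "page_%s.jpg" % x
        padx = "page_%03d.jpg" % x
        print date.today().strftime("%Y-%m-%d") + " " + str(x) + " get " + px + " >> " + padx
        with open(padx, "wb") as f:
            f.write(urllib2.urlopen("http://image.issuu.com/" + pub + "/jpg/" + px).read())

    print date.today().strftime("%Y-%m-%d ") + str(pages) + " pages downloaded. Done!"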

    Good luck!

    Hung

  15. afrodite p.

    Hi there. I have come across a different situation, where this code does not work. The issuu document I would like to download is embedded in a website, but the owner has removed it from his/her issuu.com account page. In other words, you can only see the file on the owner’s website, and only if you register (which is free, but requires an email and password). It is obvious that the Ruby code is written so that it only downloads documents from this domain, “http://image.issuu.com/#{$PUB}”, but is it possible to download from other sites, as in the case I described above?

    Any help would be appreciated.
    Thanks.

  16. iodurodisodio

    Thanks, the code works perfectly!

  17. You are amazing… thanks for your script…..

  18. Eks

    Thanks! Using this and convert from ImageMagick, I was able to make a PDF file from a 300-page document.

    convert *.jpg full_document.pdf
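
The same step can be driven from Ruby by building the convert argument list from the downloaded page JPGs. This is a sketch under assumptions: convert_command is a hypothetical helper, the output name full_document.pdf is taken from the comment above, and ImageMagick is assumed to be installed and on the PATH if the command is actually run.

```ruby
# Hypothetical helper: build the argument list for ImageMagick's convert.
# Sorting by name gives page order because the download script zero-pads
# page numbers (page_001.jpg, page_002.jpg, ...).
def convert_command(files, output = "full_document.pdf")
  ["convert", *files.sort, output]
end

pages = Dir.glob("page_*.jpg")
cmd = convert_command(pages)
puts cmd.join(" ")
# Uncomment to run it; requires ImageMagick to be installed:
# system(*cmd) unless pages.empty?
```

Passing the arguments to system as a list avoids shell quoting problems with unusual filenames.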

  19. I installed Python 3.3.0 on my laptop (Win 7 Ult. SP1 x64) and made a script file named get_issuu.py. When I try to run it, it gives me “Invalid syntax”. Is there a solution?
