Extracting the audio & stereo pair from Cardboard Camera 3D panoramic images

Update: I’ve made a web app that splits/joins Cardboard Camera images here: http://cctoolkit.vectorcult.com/

The source is here: https://bitbucket.org/pansapiens/cardboardcam

Today Google released a new 3D photo app for VR – Cardboard Camera. Unlike the 2D (non-stereoscopic) photospheres that can be acquired using Google Camera, this new app takes 3D stereoscopic 360° horizontal panoramas, using a few tricks to generate a stereo pair from a single lens. It also captures the ambient audio, which can be played back while viewing the scene in glorious 3D with a Google Cardboard viewer.


A sunny afternoon in suburban Melbourne (9 MB, 9510×1797)

However, if you download a Cardboard Camera image, you’ll notice it looks like a single panoramic photo – where are the audio and the depth information coming from? It turns out they’re in there, just hiding.

With a bit of digging, first using a hex editor, then using the jhead tool, this is what I discovered. The JPEG header contains some extra data in XMP format, stored alongside the Exif data. This XMP data contains some Base64 encoded audio (GAudio:Mime="audio/mp4a-latm") and a Base64 encoded image (GImage:Mime="image/jpeg").

Note that the XML in the XMP header seems to be broken into 65460 byte blocks, punctuated by something like (where ? are some non-ASCII bytes):

The Python XMP Toolkit handles this just fine, because it turns out this is all part of the XMP specification (in particular, ExtendedXMP), so Google isn’t doing anything fishy or proprietary here.
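For the curious, here is a minimal sketch of how those blocks could be stitched back together using only the Python 3 standard library. It is not what the XMP Toolkit does internally, just my reading of the ExtendedXMP layout: each block lives in its own APP1 segment that starts with a fixed signature, a 32-character GUID, the total length and the offset of that block. The function names are made up for illustration, and the sketch assumes a single ExtendedXMP packet and no unusual markers in the header.

```python
import struct

# Signature that opens every ExtendedXMP APP1 segment (includes the trailing NUL).
EXT_SIG = b"http://ns.adobe.com/xmp/extension/\x00"


def iter_app1_payloads(jpeg_bytes):
    """Yield the payload of each APP1 segment that precedes the image data."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError("expected a segment marker at offset %d" % pos)
        marker = jpeg_bytes[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan / end of image: headers are done
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        if marker == 0xE1:
            yield jpeg_bytes[pos + 4:pos + 2 + length]
        pos += 2 + length


def extended_xmp(jpeg_bytes):
    """Reassemble the ExtendedXMP packet from its chunks, or return b''."""
    chunks = {}
    total = 0
    for payload in iter_app1_payloads(jpeg_bytes):
        if not payload.startswith(EXT_SIG):
            continue  # Exif or the main XMP packet, not an extension chunk
        header = payload[len(EXT_SIG):len(EXT_SIG) + 40]
        # 32-byte ASCII GUID, then the total length and this chunk's offset.
        _guid, total, offset = struct.unpack(">32sII", header)
        chunks[offset] = payload[len(EXT_SIG) + 40:]
    data = b"".join(chunks[offset] for offset in sorted(chunks))
    if len(data) != total:
        raise ValueError("ExtendedXMP chunks are incomplete")
    return data
```

Calling extended_xmp(open(path, "rb").read()) on a Cardboard Camera photo should then give back the XML that carries the Base64 data, with the block headers removed.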

So, knowing this, it wasn’t hard to extract the audio data and the extra image, then convert it from Base64 to bytes.
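The script originally embedded here has not survived, so what follows is a rough reconstruction of the idea rather than the exact code. It assumes python-xmp-toolkit's file_to_dict returns a dictionary of namespace to (name, value, options) tuples, that the payloads live in properties named GImage:Data and GAudio:Data, and that the audio namespace mirrors the image one (the image namespace appears in the comments on this post). The helper names and output file names are made up for illustration.

```python
import base64

# The image namespace appears in the comments on this post; the audio
# namespace is an assumption that follows the same pattern.
GIMAGE_NS = "http://ns.google.com/photos/1.0/image/"
GAUDIO_NS = "http://ns.google.com/photos/1.0/audio/"


def decode_b64(text):
    """Decode Base64 text, restoring any '=' padding that has been stripped."""
    if isinstance(text, bytes):
        text = text.decode("ascii")
    text = text.strip()
    text += "=" * (-len(text) % 4)  # pad up to a multiple of 4 characters
    return base64.b64decode(text)


def get_payload(xmp, namespace, prop_name):
    """Return the decoded bytes of prop_name (e.g. 'GImage:Data').

    xmp is assumed to look like the result of libxmp.utils.file_to_dict:
    {namespace_uri: [(property_name, value, options), ...]}.
    """
    for entry in xmp.get(namespace, []):
        if entry[0] == prop_name:
            return decode_b64(entry[1])
    raise KeyError(prop_name)


def split_vr_jpeg(path):
    """Write the hidden image and audio next to path (needs python-xmp-toolkit)."""
    from libxmp.utils import file_to_dict  # lazy import; requires libexempi

    xmp = file_to_dict(path)
    with open(path + ".second.jpg", "wb") as fh:
        fh.write(get_payload(xmp, GIMAGE_NS, "GImage:Data"))
    with open(path + ".audio.mp4", "wb") as fh:
        fh.write(get_payload(xmp, GAUDIO_NS, "GAudio:Data"))
```

Looking properties up by name rather than by position, and padding the Base64 text before decoding, keeps the sketch working if the property order changes or the padding is missing.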

Voilà! Now we have an audio file and the second image of the stereo pair.


Hopefully this is useful to people who want to view their stereo panoramas using methods other than the official Cardboard Camera app.



  1. very cool. thanks. trying to invent the same thing

  2. anyone made a webbased extractor service yet? it’s been 3 days already!

  3. Andy Modla

    Thanks Andrew,
    Some of the cardboard camera photos I have seen give me eye strain. I would prefer to use my edited stereo photos from twin cameras and convert to vr.jpg files for viewing in Google’s Cardboard camera app. Is anyone working on code to do this?

    • Yes, I’m working on this. The python-xmp-toolkit allows modification of the XMP data, so it’s not too hard to reverse the process. I’ll have something to share soon.

      • Dean Zwikel

        Hi Andrew. I have the same need. I’m curious if you have been able to implement this yet. I was able to use the current join technique with custom XMP properties to map my left/right stereo photos to a sector of the VR180 hemisphere. This makes the view of the photo about 1/2 of the ideal size when viewing it in VR180 in a headset. It would be nice to be able to scale it up 2x but I can’t seem to find a way to do it with the XMP parameters that doesn’t bend the image.

        Also, would it be possible for you to provide a standalone executable version of the “join” function? I have a lot of left and right images and would love to convert them for viewing in VR180. Thanks for your work on this. Very useful!!

        • I haven’t done much with this over the past year, but the source code is here if you’d like to adapt it to do bulk joining as a standalone script: https://bitbucket.org/pansapiens/cardboardcam – it only works on Mac or Linux due to the libexempi requirement.

          • Dean Zwikel

            OK. I’m trying to install this on an Ubuntu system and verify it works before attempting to modify the source code. I’m not familiar with the SECRET_KEY and GOOGLE_ANALYTICS_TRACKING_ID settings in ‘cardboardcam/settings.py’. Could I ask what those are used for and how I know what to set them to?

          • If you are just running on your local machine (not creating a public server), you don’t need to worry about changing SECRET_KEY and GOOGLE_ANALYTICS_TRACKING_ID – it should just work with the current values. If you want to change them, they can be set to any string really (GOOGLE_ANALYTICS_TRACKING_ID is a value provided by your Google Analytics account – if you don’t have one and aren’t using that, don’t worry about changing it).

            I just tested it and needed to make a small update to make it work (one of the dependencies, flask-thumbnails, had made some breaking changes AFAICT). You also need to do mkdir -p cardboardcam/static/uploads/thumbnails, which I added to the README.

            Feel free to file issues on the bitbucket repo.

        • Dean Zwikel

          Just installed and tested it on Ubuntu and it works. Could I ask is there a way to store the XMP properties for JOIN in a file and load them vs having to type them in on the form? Thank you so much!

  4. This is awesome. The split worked great for me, but I couldn’t get the join to work. I tried extracting the two images and tidying them up using gimp, then rejoining them using the same meta-data numbers that the original had. It looks like it works, but the CardboardCamera app says “Not a VR image” or something like that. I tried renaming the file to look like the original, but still no go. Any ideas what I might be doing wrong?

    • Hmm... what you are doing seems like it should work. Could you send me the original .vr.jpg photo and the edited pair you are trying to join so I can try to diagnose what’s going wrong? (contact at vectorcult.com, or just paste some links here if you aren’t worried about making them public)

    • In fact, just testing it now with my own photo, I’m seeing the same issue. I’ll look into it and update you when there’s a fix.

    • Hi Dan – I made a few tweaks and it seems to be working okay for all my examples now. Give it another shot and tell me how it goes.

  5. Dave

    Well it works for me now where it didn’t before (same images).
    Thanks for making it. I tried to figure it out but then it took me two days to even realise that edited images were less than half the file size of the originals :)

  6. Dude! You’re an absolute genius! Works perfectly now! My next step is to try to take two shots; one with the camera angled down, and then one with the camera angled up. I’ll split them apart and try to stitch them and then reassemble them with your webtool and see if I can get a 360 degree 3D pano. Quick question – what’s the maximum filesize that each image is allowed to be to work with your joiner webtool? Again, thanks for all your hard work – you’re a giant among big men.

    • I’ll be interested to see how you go making a 360 degree pano – I haven’t had a chance to play with all the possibilities so I’ve no idea if it will work.

      The maximum file size per image is currently set to 20 MB – I figured this should be high enough for most purposes, but should prevent someone accidentally/maliciously filling up the server by uploading something huge. If you need it raised I can do that.

  7. Just wanted to let you know what a personal life-saver you were. I took a spectacular Cardboard Camera photo of a historic playground where my family spent the day a few weeks ago. My first photo with the app, and when I finally experienced it it was quite emotional to experience what felt like a complete moment trapped in time. I really loved it.

    A couple of weeks later I wiped my Nexus 6 due to performance issues, re-downloaded the photo from my Google Photos backup, and tried to view it. I got the “This is not a VR image” error. Nothing I’ve tried has resolved it, and reports to Google didn’t get me anywhere (naturally).

    I just used your tool to split the file, then used your join tool (with guess settings) to rejoin it. It worked like a charm. Also for a second, less important, file.

    When I have time I’ll dig through the broken and fixed versions to try to figure out what specifically was broken so if this ever comes up again I can fix it. Maybe an additional tool could be added to your site to repair broken files with this information in mind?

    Anyway, thank you. A lot. You’ve restored an overwhelmingly valuable memory that I thought I’d lost for good.

  8. Just wanted to pop in and also say thanks.

    After fumbling around trying to get some 3D renderings to show on the Cardboard (with little success) I was able to get everything working via your split & join scripts. Now I have working equirectangular renderings that are fully panoramic (can look all the way up and down as well as left to right)

    This apparently used to be quite a bit easier, but seems that Google has tweaked the way Cardboard reads the panoramic images, thus breaking any images not having been taken with the most recent Cardboard Camera app (as TurboFool found out above)

    • Hi Steve

      I’m interested in how you made 3D renderings to show on Cardboard. Could you explain the process to get the different panoramic images?

  9. derek

    i discovered your cardboard tools today and wanted to say thanks!
    the join tool has been fun to mess around with.
    the only issue i’ve had was with audio. the first time i tried, i used a random 38 second clip that was converted to mp4 using a free online service. (that file was mp3 48k, 16bit, 160k) that worked perfectly. then next time i tried with a 33 second file (mp3 44.1/16/320k and 48k) but it didn’t work. the third time it didn’t work with a full length song.
    i was wondering if you have any hints at how to get the audio in the proper format to have it play when your jpg loads in cardboard. it was odd it worked the first time, but not the other 2 times. the image always looks amazing!
    i do music production and photography and this is a stellar way to mingle the two!

    • The audio format for original Cardboard Camera .vr.jpg photos is MP4 (AAC) 44.1 kHz/128 kbit/16-bit mono (mimetype is ‘audio/mp4a-latm’). I haven’t tested anything other than that format – I wouldn’t expect MP3 encoded audio to work unless it’s first converted to this. The cctoolkit app doesn’t do any audio conversion of its own, so if you provide it with some type of ‘mp4’ that passes sanity checks it will try to use it. You should be able to convert your audio to MP4 (AAC) 44.1 kHz/128 kbit/16-bit mono using something like ffmpeg (although fre:ac on Windows should also do the trick).
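As a hedged illustration of that ffmpeg conversion, the sketch below builds a command line that targets 44.1 kHz, 128 kbit/s, mono AAC in an MP4 container. The function names are hypothetical, ffmpeg is assumed to be installed and on the PATH, and I haven't verified that Cardboard Camera accepts every file produced this way.

```python
import subprocess


def aac_mono_command(src, dst):
    """Build an ffmpeg command line for 44.1 kHz, 128 kbit/s, mono AAC."""
    return [
        "ffmpeg",
        "-i", src,        # input file in any format ffmpeg can read
        "-vn",            # drop any video or cover-art stream
        "-c:a", "aac",    # ffmpeg's built-in AAC encoder
        "-ar", "44100",   # 44.1 kHz sample rate
        "-ac", "1",       # one channel (mono)
        "-b:a", "128k",   # 128 kbit/s target bitrate
        dst,              # output container is chosen from the extension
    ]


def convert(src, dst):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(aac_mono_command(src, dst), check=True)
```

For example, convert("clip.mp3", "clip.mp4") would produce a file to try with the “Join” operation.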

    • This looks like a top/bottom 360 stereo image. Cut the top and bottom halves into separate left and right files, make sure both images are exactly the same dimensions, then try using them with the “Join” operation on the cctoolkit.vectorcult.com web app – it might work.

      • derek

        also, i use the free app AAA VR CINEMA and it has the option to load images from your gallery and then it lets you select what format your image is in and the app converts it. so the app does the converting and it’s pretty decent if you don’t want to get into splitting, editing and joining a lot of images.

  10. drewp

    The last line “ifh.close” is a no-op without the function call. I wouldn’t even bother with the close lines in a program like this (the files are closed on exit), but you could also switch to this style:

    with open(…, 'wb') as ifh:

    • Thanks – the missing brackets were a cut-n-paste error (I originally had the Github gist embedded here but there were some issues so I just pasted the code directly).
      Python context managers are awesome and I usually use them for files – but when I don’t I try to keep the habit of closing files just in case the code ends up in a function somewhere where it matters. But yeh, right here in this script it doesn’t matter.

  11. Miles Dunkin


    This is super tantalising. I can see the potential, but it doesn’t seem to matter how I “export” my Cardboard Camera photo (copy via Explorer, share from within Cardboard Camera, saved to PC Desktop) I’m getting a “Something went wrong 500” message when I drag it onto your Split tool. It loads, then returns the error message.

    I’m a complete newbie at this stuff, hence I love the idea of drag and drop! Any suggestions gratefully accepted.

    Thanks in advance.

    • Hi Miles – turns out a recent version of Cardboard Camera made a small change to the format of the images (removed the padding on the Base64 data) – I hadn’t detected the issue since I was only testing with older images. Pretty sure the bug is fixed now, so give it another try!

      • Miles Dunkin

        Perfect! You’re a champion.

  12. Tycho

    The web tool looks awesome, however I get a 500 error (Something went wrong) upon uploading my Cardboard Camera photos. I tried several photos, but to no avail. Maybe there has been a change since your last revision? Anyway, awesome tool, and I very much look forward to using it!

    • Hi Tycho – I’ve checked the server and everything seems to be working to split/join images from the latest version of Cardboard Camera, but it will throw that error if there’s something not quite right with the image (eg, bad XMP metadata for splitting, abnormally huge files for joining). If you send me the images that are giving the error I can take a look and try to diagnose what’s going wrong.

  13. Andrew I want to sincerely THANK YOU from the bottom of my heart for making our lives easier!! You’re a genius. Boy I was searching high and low for answers on extracting those files and messing with the elusive exif info. Even Google keeps quiet about this vr.jpg format. Then you came and not only made an extractor but a combiner as well! U totally rock man! :)

  14. This is great! I was looking for a way to see the Cardboard Camera images on my Vive and now (with a little tweaking) I can :) Adding the ability to natively view these plus sound in PC VR would be amazing. Any plans for a Windows application to view in Rift/Vive?

  15. randalter

    Getting a “502 Bad Gateway” error in Chrome and MS Edge browsers when I try to join

  16. Unfortunately does not work for me, cardboard camera says “Not a VR image”.
    I’m using rendered images, the last one a pair of (stereo) 2048×1024 equirectangular panoramas.

  17. Jorgensen

    Hi Andrew

    i have only very little programming experience, but i wonder if this feature (joining images for cardboard) can be made / converted as a local program on a windows computer?

    thanks in advance

    • Hi Jorgensen – in theory it could be done and I’ve certainly explored the possibility, however there is one library (‘Exempi’, libexempi) which only has official builds for Linux and Mac OS X, and this prevents easily creating a Windows standalone version. There are various ways to work around this (figuring out if libexempi can be compiled for python-xmp-toolkit on Windows, or writing new code to replace the need for libexempi), but all these things would take some time and effort.

  18. Matias

    Awesome, thanks!

    I added a few lines to the script after fetching the image, because I had problems with the Base64 padding for one of the images I was working with.

    Here are the lines, just in case someone has the same problem.

    image_b64 = xmp[u'http://ns.google.com/photos/1.0/image/'][1][1]

    # Base64 text must be a multiple of 4 characters long; restore any stripped '='.
    missing_padding = len(image_b64) % 4
    if missing_padding:
        image_b64 += '=' * (4 - missing_padding)

  19. Hello, and thank you Andrew for your online application. Thanks to you I was able to put my stereoscopic 3D photos into VR mode for ‘cardboard camera’, and even add sound to the view.
    To do this I combined 4 photos, which you view by turning 90° on the spot!
    I’m sharing my photos on my drive:

  20. Tobias

    The web service is really cool, but could you perhaps add a small command line tool. That would really help if you try to join or split a lot of images.
