Is it always better to record the A7s via 8 bit UHD and downsample for HD delivery? Or is the in-camera scaling of the 4K sensor to HD, via the built-in XAVC codec, just as good?
And what is the best exposure strategy for this camera too?
Just so there’s a reason for reading the rest of this rather long post, the answer, IMHO, is an emphatic yes: record UHD and downsample yourself. Click the image below to expand it and compare the XAVC version on the left with a ‘properly’ exposed and post-workflowed (if that is a verb) UHD version. Awesome, isn’t it? The image also breaks down the red, green and blue colour channels, as sometimes a particular channel can be quite informative.
Remember that both the internal and the external outputs of this camera are 8 bit only, meaning just 256 levels (255 steps) in each of the R, G and B channels between darkest and lightest. This isn't usually considered ideal and we'll touch on it in this post as well.
Channel Comparisons - Look at the difference between XAVC and UHD; the sharpness and detail are quite different
The ability to do this has been unlocked by the Convergent Design Odyssey 7Q+, which is a fantastic piece of kit. I’m not going to write a review about it because there are many reviews out there already. But you know how sometimes you just pick something up and it feels like it’s just right? Well, that’s this. I had the O7Q before and upgraded recently to get the ability to record from the A7s. I couldn’t be happier with this combination, and if you have a camera it works with then it’s a bargain. Add to that the fact that Convergent Design are a great, helpful bunch of people to work with and they deserve all this praise.
Let’s dial back to the beginning and see how we got there.
XAVC in Linear - This is how the basic XAVC file looks. It is shown in linear or raw (lowercase) format, in the sense that there has been no additional gamma encoding other than sRGB for display. (This is not linear as in straight from the camera sensor, which would be very dark.) Values in the file can run from 0 to 1, but because it is a log format the blacks are raised and the highlight range is compressed into the top of the container. Black starts around 0.09 and the brightest spot appears around 0.93. The waveform also shows the way the colours are compressed within the 0 to 100% container.
If we start with a baseline image: this is a scene which has a full range but is skewed towards the darks. The baseline is ISO 3200 Slog2, recorded to the internal card in XAVC format, basically what you get out of the stock A7s. ISO 3200 is the lowest ISO setting for Slog2 and therefore you would assume the ‘cleanest’ from a noise perspective.
Firstly, I am using NukeX for these tests. This gives me an environment where I know there’s nothing weird going on and I can control every aspect of colourspaces and gamma adjustment.
If I take the .MP4 XAVC file in straight as a linear file, I see the basic data in the image. The data ranges from 0 to 1 in floating point. You can see the effect of the log-recorded file in the low-contrast image and the raised black levels. I’ve marked on the file the luminance value at several key areas. To get each value I analyse an area and take an average (so I’m not recording noise pixels).
You can see what a lot of people first say about the A7s: look how noisy it is. Look at those great big magenta and green splodges. Why is this? I believe it’s because you are looking at the darkest, noisiest part of the sensor here in its full glory. We can see it because we have recorded Slog2 (which raises the blacks) and also because I’m viewing it in a linear space.
Remember that Slog2 needs to be gamma converted back into scene linear space - in other words, we need to undo the log encoding to get back to real world light values.
In scene linear space, 18% grey sits at a value of 0.18 and a white piece of paper is nominally at 0.9.
So let’s tell Nuke that this is an Slog2 source, and Nuke will apply a transfer function to take this flat image back to how the file is supposed to be viewed.
If you’re unsure about what a log curve is and does, I'm in the middle of writing another post about it - drop me a comment or a line and I'll make sure you're updated.
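To make the idea concrete, here’s a tiny Python sketch of a log transfer function and its inverse. This is Sony’s published original S-Log formula, used purely to show the shape of such a curve; it is not the exact S-Log2 curve the A7s records, nor the curve Nuke applies:

```python
import math

# Sony's original S-Log curve, as a shape illustration only -- NOT the
# A7s's S-Log2 encoding. Encode maps scene-linear light to a log code
# value; decode (the 'linearise' step) undoes it exactly.

def slog_encode(linear):
    """Scene-linear light value -> log-encoded code value."""
    return 0.432699 * math.log10(linear + 0.037584) + 0.616596 + 0.03

def slog_decode(code):
    """Log-encoded code value -> scene-linear light value."""
    return 10.0 ** ((code - 0.616596 - 0.03) / 0.432699) - 0.037584

grey = slog_encode(0.18)
print(round(grey, 2))                       # 0.36: mid grey sits well up the curve
print(round(slog_decode(grey), 3))          # 0.18: round-trips back to scene linear
```

Notice how 18% grey lands around 0.36, double its linear value: the shadows are stretched upwards, which is exactly the "raised blacks" look in the XAVC grab above.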
With the file back in scene space you see that the raised blacks are pulled back down and that there is highlight clipping - in other words, we see light that has gone super-bright, more than 100%. If we pull the exposure down in Nuke (closing the standard f8 aperture down to f16 in the Nuke viewer) then we see the highlights the same way we would if we were closing the iris of a lens down.
Sidenote about Nuke: Nuke is a floating point linear compositor and we work with real world light values. It’s important that blurring, defocusing and transforms happen in this space to make them realistic. Typically, at the end of a Nuke chain or as a viewing option, we would decide how we want to compress this full linear range into a deliverable format. In other words, what do we want to do with these >1 light values? Compress them, clip them, roll them off softly? This is the right way to approach it. For this article I am showing them as they really are, without any roll-off or soft clipping, hence the example above comparing two different exposures. I hope it also highlights why log formats are important: they retain all the real world light data.
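As a hedged sketch of what those choices look like in code (just the arithmetic, not Nuke’s actual node implementations; the shoulder function is an arbitrary choice of mine for illustration):

```python
import math

# Two scene-linear operations: an exposure change (one stop = multiply or
# divide by 2) and a naive highlight roll-off that squeezes super-bright
# values towards 1.0 instead of clipping them.

def exposure(value, stops):
    """Open or close the virtual iris: +1 stop doubles the light."""
    return value * (2.0 ** stops)

def soft_clip(value, shoulder=0.8):
    """Pass values below the shoulder straight through, then roll
    anything brighter off so the output approaches 1.0 asymptotically."""
    if value <= shoulder:
        return value
    return 1.0 - (1.0 - shoulder) * math.exp(-(value - shoulder) / (1.0 - shoulder))

highlight = 3.2                      # a super-bright linear value, well over 100%
print(exposure(highlight, -2))       # f8 -> f16 is two stops down: 0.8
print(soft_clip(highlight) < 1.0)    # rolled off rather than clipped: True
```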
Okay, so we have our XAVC file. Let’s compare that to the plain HDMI version of the 1080p image, recorded to ProRes HQ via the Odyssey 7Q+. What differences can we see?
Firstly, we see that the blacks are lower compared to XAVC. Great, so what about this industry standard Slog2 format? Well, the thing about standards is that there are so many to choose from. Welcome to my world. Yes, the Slog2 encoding is different. In fact practically every camera has tweaked it, yet Sony et al do not release the curves with which each Slog is generated. Instead the Nuke curve is based on a maths expression from Sony, which I think refers to the F35. It highlights that Slog2 isn’t as much of a standard as it should be, and we can see that the HDMI version of the scene looks different to the XAVC one. For our purposes we’ll just ignore that, but there are times you can’t, and there are times where the encoding is so different you have to work out how to linearise a particular flavour of Slog2 yourself. I believe that Slog3 is more standardised, and AlexaLogC is pretty set in stone.
Secondly, the image is slightly sharper too, a result of ProRes compressing less destructively than the built-in XAVC. This is a static image, which is the best case for XAVC; if the camera were moving across lots of high detail then the compression would fare worse as it struggled to fit a rapidly changing scene into a very small container.
I’m not too fussed about the HDMI image at HD resolution though; that’s not why we’re here. So here is the 4K UHD version of the same scene at ISO 3200. Colour-wise it’s similar to the HDMI HD version, as expected. And it’s twice the size; see the detail in the colour checker? This is the full sensor now. But we want to compare this to the XAVC version, so we will scale it down and compare side by side.
In theory the camera itself is doing this internally: we know that it takes the full 4K image and scales it to HD rather than line skipping, and then compresses it. So how different is the camera’s route compared to our own version?
Quite different as you can see.
I imagine the difference is that a fairly crude downsampling algorithm is being used, and what detail survives is then destroyed by the XAVC compression. Compare that to our version, where we have taken a 4:2:2 UHD source and scaled it by half, which essentially generates a 4:4:4 version where every pixel is unique. These differences are even greater when we look at the colour channels one by one (as the very first image at the top showed).
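To show the core of what that half-scale does, here’s a minimal sketch using a plain 2x2 box average. Real scalers use fancier filters (cubic, Lanczos and friends), but even this naive version gives every output pixel its own averaged sample:

```python
import numpy as np

# Naive 2x2 box downsample: each output pixel is the float average of a
# 2x2 block of input pixels, so a 3840x2160 UHD frame becomes 1920x1080
# and every HD pixel carries its own colour -- effectively 4:4:4.

def box_downsample_2x(img):
    """img: float array of shape (H, W, C) with H and W even."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

# tiny worked example on a 4x4 single-channel 'image'
img = np.arange(16, dtype=np.float64).reshape(4, 4, 1)
print(box_downsample_2x(img)[:, :, 0])   # 2x2 result of block averages
```

Each output value here is the mean of one 2x2 block (the top-left block 0, 1, 4, 5 averages to 2.5, and so on).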
There are also many different ways to scale an image down, which vary in sharpness and artefacts. Here are a few, shown above.
A common complaint about Sony prosumer cameras is that there are problems with highlights and black edges. I’ve touched on this before, but I believe it is a product of the downsampling used in camera. Some algorithms produce negative lobes (overshoots which can create out-of-range colours). By using a very aggressive sharpening algorithm I can even generate them myself. A sight familiar to some Sony folk, I think.
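Here’s a small, hedged demonstration of the mechanism: a deliberately exaggerated sharpening kernel with negative side lobes, run over a hard edge, overshoots past the original range in both directions. Once those values are clipped back into range they read as dark (or blown) edges:

```python
import numpy as np

# A hard step edge and a crude sharpening kernel with negative lobes.
# The kernel sums to 1 so flat areas pass through unchanged, but at the
# edge the negative lobes push values below 0 and above 1.

edge = np.array([0.1] * 5 + [0.9] * 5)       # dark-to-light step, values 0.1..0.9
kernel = np.array([-0.5, 2.0, -0.5])         # exaggerated sharpen, for illustration
sharpened = np.convolve(edge, kernel, mode='same')
print(sharpened.min() < 0.0, sharpened.max() > 1.0)  # True True: overshoot both ways
```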
So at this point I think we can safely say that scaling a UHD image to HD ourselves, rather than letting the camera do it, produces a substantially better result, plus it gives us the flexibility of choosing how to scale based on the scene.
However that UHD image is still pretty noisy isn’t it?
There’s a technique where we rate a camera at a lower ISO than its setting. This basically means overexposing the image and then adjusting the exposure back down in post. What this does is push more detail up out of the shadows, which are then crushed back down to hide the noise. The dynamic range is lowered: you are crushing the shadows, and as you are overexposing you are losing some of your highlight range. Sounds awful, but in real world cases it can work very well. When you are encoding in a log format you are retaining as much of the range of the camera as possible, meaning that adjusting exposure is very simple - it's just a multiplication. If you divide by 2 you are lowering the exposure by one stop.
So to test this we shoot the same scene at a variety of ISOs. This is a bit counter-intuitive - use a higher, noisier ISO to get less noise? But yes, it’s about using a better part of the sensor. We know that sensors are basically linear, and from this post we know that shadow detail is held in very few code values and that below a certain point the shadows are just noise. So we want to record the shadows further up the luminance curve, away from that noise floor.
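A quick illustration of the ‘few code values in the shadows’ point, using simple arithmetic on a hypothetical linear 8 bit encoding of a ten-stop range:

```python
# In a linear encoding, each stop down halves the signal, so each darker
# stop gets half the code values of the one above. Count the 8 bit code
# values available per stop (stop 0 = brightest) over a ten-stop range.
codes_per_stop = []
for stop in range(10):
    hi = 255 / (2 ** stop)        # top of this stop, in code values
    lo = 255 / (2 ** (stop + 1))  # bottom of this stop
    codes_per_stop.append(int(hi) - int(lo))
print(codes_per_stop)  # [128, 64, 32, 16, 8, 4, 2, 1, 0, 0]
```

The brightest stop hogs half of all the code values while the bottom stops get essentially none, which is exactly why log encodes exist and why exposing the shadows further up the curve helps.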
So the results are here: we’re trying ISO 3200, 6400, 12800, 25600 and 51200. There’s a point at which the noise cannot be minimised any more, and also a point at which we lose too much highlight detail. The workflow is to take each image (which is progressively brighter in camera), make sure the Slog2 is converted to scene linear properly, then simply pop on an exposure node taking the image down by one stop for each doubling of the ISO. The resulting exposure should then be identical between the five versions, but the noise will be different and the highlights will clip more and more.
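The matching step can be sketched like this. The ISO values are the ones from the test above; the toy assumption that recorded linear brightness scales directly with ISO is mine, purely for illustration, and the Slog2-to-linear step is taken as already done:

```python
import math

BASE_ISO = 3200  # the A7s's base ISO for Slog2

def match_exposure(linear_value, shot_iso, base_iso=BASE_ISO):
    """Pull the linearised image down one stop for each doubling of ISO."""
    stops_over = math.log2(shot_iso / base_iso)
    return linear_value / (2.0 ** stops_over)

# a grey card that records 0.18 at base ISO records proportionally
# brighter at higher ISOs (toy model); matching brings them all back
for iso in (3200, 6400, 12800, 25600, 51200):
    recorded = 0.18 * (iso / BASE_ISO)
    print(iso, match_exposure(recorded, iso))   # every version lands back on 0.18
```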
So where should we expose? It’s scene dependent. In this test case, and in fact in other dim interior lighting, I would generally expose up 2 stops. In daylight you probably wouldn’t expose up at all, because you may need every last bit of highlight detail. In a daylight interior scene perhaps I’d expose a single stop up. I've read anecdotally of various people rating their cameras at half the base ISO, which is the same as overexposing a stop. Personally I feel that with the A7s the base exposure is too bright anyway, and rating it at 1600 gives a better exposed image in Slog.
You can minimise noise on the XAVC versions too by doing the same thing; it’s not quite as effective as using the UHD version, but better than not doing it. The problem is that monitoring Slog2 is very difficult: the raised blacks trick you into thinking you have lots of shadow detail when you don’t. You need to use the zebras and expose to the right as best you can, or use something like the waveform and false colour on an external recorder like the Odyssey 7Q+ to maintain a consistent exposure. I believe there’s an update due for this recorder that will let you monitor through a LUT. When that happens I’ll update this with some LUTs to monitor exposure through (if they’re not included by default…)
One step I haven’t carried out is denoising the UHD version before downscaling. This would yield even better results, but denoising is really a separate topic, as there are so many options and so many really aren’t very good. For another time perhaps.
Banding Test Image - A simple shiny smooth surface lit from one side creates a simple graduated test for banding. Interestingly (whether or not the JPG compression here shows it), the 8 bit capture is remarkably smooth as it is.
Another side effect of the downsampling process is that colour definition should also improve. The built-in, HDMI and UHD outputs are all 8 bit only, which is usually considered too low for professional work; most cameras ought to be outputting a 10 bit signal.
Remember though that bit depth isn't dynamic range; it's the number of recorded steps from dark to light. Nor is it the bit depth of the sensor hardware - it's just the output format.
So, to compare with the downsampled version, we shoot a smooth, subtly graduated surface, then push the gradations in grading to maximise any steps. We want to provoke banding, then do the same thing with the downsampled UHD version and compare. This image shows the source: a smooth shiny surface lit from one side.
It's useful to note here that the source images are remarkably smooth. I can't really see any banding and, to be honest, I don't think there's a big issue with only an 8 bit output.
But let's push both the HD and the downsampled UHD version and compare. Quite the difference, isn't it? The UHD is noticeably smoother, which is what I would expect. You can also see the 8 bit bands in the waveform display compared to the downsampled version.
Why is this? When we take the 8 bit UHD version in, we're manipulating it in a 32 bit colourspace, and downsampling averages the pixel colours, basically turning 4 pixels into 1 and smoothing the colour as it goes, so the smoothed colours aren't limited to 8 bit steps. Is this real colour detail? Basically, yes; nothing is being made up, because it was all recorded in the UHD image. There are also techniques which involve scaling an image up, adding noise and then downsampling, which smooths gradients but is fudging with the image (the end result is often worth it, though). These can get quite clever, with scaling routines that interpolate detail, ending up with sharpened images and higher bit depths, but that's not for now.
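A tiny demonstration of where the extra precision comes from, using made-up code values:

```python
import numpy as np

# Four neighbouring 8 bit code values averaged in float land between the
# original 8 bit steps: no single 8 bit pixel could store 101.75, but the
# downsampled float pixel can -- hence the smoother gradients.

quad = np.array([101, 102, 102, 102], dtype=np.uint8)  # one 2x2 block's values
avg = quad.astype(np.float64).mean()
print(avg)  # 101.75, a quarter-step between 8 bit codes
```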
I've seen various reviews and tests on the interwebs claiming incredible dynamic range from this camera. I think these tests highlight that this simply isn't true. In fact, as far as I can see, all cameras totally over-egg their 'dynamic range'. The range from the base ISO (3200) to clipping is around 0 to 8 on a linear scale, which is around 10/11 stops, and we're painfully aware of how much noise there is in the shadows. It could be that the sensor itself is capable of more via RAW, but the only route out that preserves as much of it right now is Slog2, which is what we're using here. Before anyone jumps on this, don't forget I'm saying that basically all cameras out there are the same. The difference between range tested via step wedges and real world point-it-at-something-and-look range is quite big.
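For what it's worth, the stops arithmetic behind that kind of claim: on a linear scale, stops from noise floor to clip are log2(clip / floor). The 0.004 noise floor below is an assumed figure of mine, purely to show the sum:

```python
import math

clip = 8.0           # linear clip point quoted above
noise_floor = 0.004  # assumed usable shadow limit, for illustration only
stops = math.log2(clip / noise_floor)
print(round(stops, 1))  # ~11.0 stops
```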
But I think with the right approach you can produce fantastic results from the A7s. In later posts I will talk through some techniques for minimising rolling shutter, and also the prickly topic of colourspaces and colour workflow, which is another area way more complicated than it should be.