With the launch of the Google Pixel 3, smartphone cameras have taken yet another leap in capability. I had the opportunity to sit down with Isaac Reynolds, Product Director for Camera on Pixel, and Marc Levoy, Distinguished Engineer and Computational Photography Lead at Google, to learn more about the technology behind the new camera in the Pixel 3.
One of the first things you might notice about the Pixel 3 is the single rear camera. At a time when we’re seeing companies add dual, triple, even quad-camera setups, one main camera seems at first an odd choice.
But after speaking to Marc and Isaac I think that the Pixel camera team is taking the right approach – at least for now. Any technology that makes a single camera better will make multiple cameras in future models that much better, and we’ve seen in the past that a single camera approach can outperform a dual camera approach in Portrait Mode, particularly when the telephoto camera module has a smaller sensor and slower lens, or lacks reliable autofocus.
Let’s take a closer look at some of the Pixel 3’s core technologies.
1. Super Res Zoom
Last year the Pixel 2 showed us what was possible with burst photography. HDR+ was its secret sauce, and it worked by constantly buffering nine frames in memory. When you press the shutter, the camera essentially goes back in time to those last nine frames, breaks each of them up into thousands of ‘tiles’, aligns them all, and then averages them.
Breaking each image into small tiles allows for advanced alignment even when the photographer or subject introduces movement. Blurred elements in some shots can be discarded, or subjects that have moved from frame to frame can be realigned. Averaging simulates the effects of shooting with a larger sensor by ‘evening out’ noise. And going back in time to the last 9 frames captured right before you hit the shutter button means there’s zero shutter lag.
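The noise benefit of averaging can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: each pixel reading is modelled as a true value plus independent Gaussian noise (real sensor noise is more complex), and the function name and parameters are hypothetical, not part of HDR+.

```python
import random
import statistics


def simulate_burst_noise(num_frames, true_value=100.0, noise_sigma=10.0,
                         num_pixels=20000, seed=0):
    """Noise (standard deviation) remaining after averaging num_frames
    noisy readings of the same scene value at each pixel."""
    rng = random.Random(seed)
    averaged = []
    for _ in range(num_pixels):
        # Assumption: independent Gaussian noise per frame.
        readings = [rng.gauss(true_value, noise_sigma) for _ in range(num_frames)]
        averaged.append(sum(readings) / num_frames)
    return statistics.pstdev(averaged)


single_frame_noise = simulate_burst_noise(1)
nine_frame_noise = simulate_burst_noise(9)
print(f"noise with 1 frame:  {single_frame_noise:.2f}")
print(f"noise with 9 frames: {nine_frame_noise:.2f}")
```

Under these assumptions the noise falls with the square root of the number of frames, so nine averaged frames carry roughly a third of the noise of a single frame.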
|Like the Pixel 2, HDR+ allows the Pixel 3 to render sharp, low noise images even in high contrast situations.
This year, the Pixel 3 pushes all this further. It uses HDR+ burst photography to buffer up to 15 images, and then employs super-resolution techniques to increase the resolution of the image beyond what the sensor and lens combination would traditionally achieve. Subtle shifts from handheld shake and optical image stabilization (OIS) allow scene detail to be localized with sub-pixel precision, since shifts are unlikely to be exact multiples of a pixel.
In fact, I was told the shifts are carefully controlled by the optical image stabilization system. “We can demonstrate the way the optical image stabilization moves very slightly” remarked Marc Levoy. Precise sub-pixel shifts are not necessary at the sensor level though; instead, OIS is used to uniformly distribute a bunch of scene samples across a pixel, and then the images are aligned to sub-pixel precision in software.
But Google – and Peyman Milanfar’s research team working on this particular feature – didn’t stop there. “We get a red, green, and blue filter behind every pixel just because of the way we shake the lens, so there’s no more need to demosaic” explains Marc. If you have enough samples, you can expect any scene element to have fallen on a red, green, and blue pixel. After alignment, then, you have R, G, and B information for any given scene element, which removes the need to demosaic. That itself leads to an increase in resolution (since you don’t have to interpolate spatial data from neighboring pixels), and a decrease in noise since the math required for demosaicing is itself a source of noise. The benefits are essentially similar to what you get when shooting pixel shift modes on dedicated cameras.
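The ‘enough samples’ argument can be checked with a toy model. The sketch below assumes an RGGB Bayer mosaic and frame-to-frame shifts drawn uniformly from a range of plus or minus two pixels; the shift distribution, the function names, and the trial counts are illustrative assumptions, not details Google shared.

```python
import math
import random

# Assumed RGGB colour filter layout, repeating every 2x2 pixels.
BAYER = (("R", "G"), ("G", "B"))


def filters_seen(x, y, shifts):
    """Set of filter colours that a scene point at (x, y) lands on
    across a burst, given one (dx, dy) shift per frame."""
    seen = set()
    for dx, dy in shifts:
        col = math.floor(x + dx)
        row = math.floor(y + dy)
        seen.add(BAYER[row % 2][col % 2])
    return seen


def fraction_fully_sampled(num_frames, trials=5000, max_shift=2.0, seed=0):
    """Fraction of random scene points sampled through R, G and B filters."""
    rng = random.Random(seed)
    full = 0
    for _ in range(trials):
        x, y = rng.uniform(10, 20), rng.uniform(10, 20)
        shifts = [(rng.uniform(-max_shift, max_shift),
                   rng.uniform(-max_shift, max_shift))
                  for _ in range(num_frames)]
        if filters_seen(x, y, shifts) == {"R", "G", "B"}:
            full += 1
    return full / trials


for frames in (1, 4, 15):
    share = fraction_fully_sampled(frames)
    print(f"{frames:2d} frames: {share:.1%} of points have R, G and B samples")
```

With a single frame no point can have all three colours, which is why demosaicing is normally needed; in this model a 15-frame burst covers the large majority of scene points with all three.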
There’s a small catch to all this – at least for now. Super Res only activates at 1.2x zoom or more, not in the default ‘zoomed out’ 28mm equivalent mode. As expected, the lower your level of zoom, the more impressed you’ll be with the resulting Super Res images, and naturally the resolving power of the lens will be a limitation. But the claim is that you can get “digital zoom roughly competitive with a 2x optical zoom” according to Isaac Reynolds, and it all happens right on the phone.
The results I was shown at Google appeared to be more impressive than the example we were provided above, no doubt at least in part due to the extreme zoom of our example here. We’ll reserve judgement until we’ve had a chance to test the feature for ourselves.
Would the Pixel 3 benefit from a second rear camera? For certain scenarios – still landscapes for example – probably. But having more cameras doesn’t always mean better capabilities. Quite often ‘second’ cameras have worse low light performance due to a smaller sensor and slower lens, as well as poor autofocus due to the lack of, or fewer, phase-detect pixels. One huge advantage of Pixel’s Portrait Mode is that its autofocus doesn’t differ from normal wide-angle shooting: dual pixel AF combined with HDR+ and pixel-binning yields incredible low light performance, even with fast moving erratic subjects.
2. Computational Raw
The Pixel 3 introduces ‘computational Raw’ capture in the default camera app. Isaac stressed that when Google decided to enable Raw in its Pixel cameras, they wanted to do it right, taking advantage of the phone’s computational power.
“There’s one key difference relative to the rest of the industry. Our DNG is the result of aligning and merging [up to 15] multiple frames… which makes it look more like the result of a DSLR” explains Marc. There’s no exaggeration here: we know very well that image quality tends to scale with sensor size thanks to a greater amount of total light collected per exposure, which reduces the impact of the most dominant source of noise in images: photon shot, or statistical, noise.
The Pixel cameras can effectively make up for their small sensor sizes by capturing more total light through multiple exposures, while aligning moving objects from frame to frame so they can still be averaged to decrease noise. That means better low light performance and higher dynamic range than what you’d expect from such a small sensor.
Shooting Raw allows you to take advantage of that extra range: by pulling back blown highlights and raising shadows otherwise clipped to black in the JPEG, and with full freedom over white balance in post thanks to the fact that there’s no scaling of the color channels before the Raw file is written. Even better news? HDR+ independently merges red, green and blue channels, which means the Raws are true Raws – un-demosaiced.
|Pixel 3 introduces in-camera
Such ‘merged’ Raw files represent a major threat to traditional cameras. The math alone suggests that, solely based on sensor size, 15 averaged frames from the Pixel 3 sensor should compete with APS-C sized sensors in terms of noise levels. There are more factors at play, including fill factor, quantum efficiency and microlens design, but needless to say we’re very excited to get the Pixel 3 into our studio scene and compare it with dedicated cameras in Raw mode, where the effects of the JPEG engine can be decoupled from raw performance.
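The back-of-the-envelope math can be sketched as follows. The sensor dimensions are assumptions that do not come from the interview: roughly 5.6 x 4.2 mm for a 1/2.55-inch-type phone sensor and 23.5 x 15.6 mm for a typical APS-C sensor. The comparison also assumes equal exposure per frame and ignores the other factors mentioned above.

```python
import math

# Assumed approximate sensor dimensions in millimetres (not from the article).
PHONE_SENSOR_MM = (5.6, 4.2)    # roughly a 1/2.55-inch-type sensor
APSC_SENSOR_MM = (23.5, 15.6)   # a typical APS-C sensor


def light_gathering_area(width_mm, height_mm, frames=1):
    """Effective light-collecting area when `frames` equal exposures are merged."""
    return width_mm * height_mm * frames


phone_merged = light_gathering_area(*PHONE_SENSOR_MM, frames=15)
apsc_single = light_gathering_area(*APSC_SENSOR_MM)
area_ratio = phone_merged / apsc_single
noise_gain = math.sqrt(15)  # shot-noise improvement from averaging 15 frames

print(f"15 merged phone frames: {phone_merged:.0f} mm^2 equivalent")
print(f"single APS-C exposure:  {apsc_single:.0f} mm^2")
print(f"ratio: {area_ratio:.2f}, noise improvement from merging: {noise_gain:.1f}x")
```

With these assumed dimensions, fifteen merged frames collect light over an effective area close to that of one APS-C exposure, which is the basis for the comparison.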
While solutions do exist for combining multiple Raws from traditional cameras with alignment into a single output DNG, having an integrated solution in a smartphone that takes advantage of Google’s frankly class-leading tile-based align and merge – with no ghosting artifacts even with moving objects in the frame – is incredibly exciting. This feature should prove highly beneficial to enthusiast photographers. And what’s more – Raws are automatically uploaded to Google Photos, so you don’t have to worry about transferring them as you do with traditional cameras.
3. Synthetic Fill Flash
|‘Synthetic Fill Flash’ adds a glow to human subjects, as if a reflector were held out in front of them.
Often a photographer will use a reflector to light the faces of backlit subjects. Pixel 3 does this computationally. The same machine-learning based segmentation algorithm that the Pixel camera uses in Portrait Mode is used to identify human subjects and add a warm glow to them.
If you’ve used the front facing camera on the Pixel 2 for Portrait Mode selfies, you’ve probably noticed how well it detects and masks human subjects using only segmentation. By using that same segmentation method for synthetic fill flash, the Pixel 3 is able to relight human subjects very effectively, with believable results that don’t confuse and relight other objects in the frame.
Interestingly, the same segmentation methods used to identify human subjects are also used for front-facing video image stabilization, which is great news for vloggers. If you’re vlogging, you typically want yourself, not the background, to be stabilized. That’s impossible with typical gyro-based optical image stabilization. The Pixel 3 analyzes each frame of the video feed and uses digital stabilization to steady you in the frame. There’s a small crop penalty to enabling this mode, but it allows for very steady video of the person holding the camera.
4. Learning-based Portrait Mode
The Pixel 2 had one of the best Portrait Modes we’ve tested despite having only one lens. This was due to its clever use of split pixels to sample a stereo pair of images behind the lens, combined with machine-learning based segmentation to understand human vs. non-human objects in the scene (for an in-depth explanation, watch my video here). Furthermore, dual pixel AF meant robust performance with even moving subjects in low light – great for constantly moving toddlers. The Pixel 3 brings some significant improvements despite lacking a second lens.
According to computational lead Marc Levoy, “Where we used to compute stereo from the dual pixels, we now use a learning-based pipeline. It still utilizes the dual pixels, but it’s not a conventional algorithm, it’s learning based”. Google essentially built a ‘frankenphone’ rig consisting of five Pixel 3 phones that could be fired simultaneously to build high quality depth maps from structure from motion and multi-view stereo. These ‘ground truth’ maps were used to train a neural network with depth maps generated from the single Pixel 3 phone in the middle of this rig. There were a number of advantages to this approach: the largely separated phones provided large baselines for more accurate depth estimation, less risk of occluded objects going undetected, and parallax in multiple directions allowed Google to avoid the aperture problem (where detail along the axis of stereo disparity essentially has no measured disparity).
What this means is improved results: more uniformly defocused backgrounds and fewer depth map errors. Have a look at the improved results with complex objects, where many approaches are unable to reliably blur backgrounds ‘seen through’ holes in foreground objects:
Interestingly, this learning-based approach also yields better results with mid-distance shots where a person is further away. Typically, the further away your subject is, the less difference in stereo disparity between your subject and background, making accurate depth maps difficult to compute given the small 1mm baseline of the split pixels. Take a look at the Portrait Mode comparison below, with the new algorithm on the left vs. the old on the right.
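The shrinking disparity can be illustrated with the standard pinhole-stereo relation, disparity = focal length x baseline / depth. This is a simplification of how split pixels actually behave, and the focal length of 3000 pixels and the 4 m subject-to-background gap are hypothetical values chosen for illustration; only the 1mm baseline comes from the text above.

```python
def disparity_px(depth_m, baseline_m=0.001, focal_px=3000.0):
    """Pinhole-stereo disparity in pixels for a point at depth_m.

    baseline_m is the ~1 mm split-pixel baseline; focal_px is an
    assumed focal length expressed in pixels."""
    return focal_px * baseline_m / depth_m


def subject_background_gap(subject_m, background_offset_m=4.0):
    """Disparity difference between a subject and a background that
    sits background_offset_m behind it (an assumed distance)."""
    return disparity_px(subject_m) - disparity_px(subject_m + background_offset_m)


for distance in (1.0, 2.0, 4.0):
    gap = subject_background_gap(distance)
    print(f"subject at {distance:.0f} m: {gap:.2f} px between subject and background")
```

In this model the disparity gap drops well below a pixel once the subject is a few metres away, which is where a conventional stereo match struggles and a learned prior can help.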
5. Night Sight
Rather than simply rely on long exposures for low light photography, ‘Night Sight’ utilizes HDR+ burst mode photography to take usable photos in very dark situations. Previously, the Pixel 2 would never drop below 1/15s shutter speed, simply because it needed faster shutter speeds to maintain that 9-frame buffer with zero shutter lag. That does mean that even the Pixel 2 could, in very low light, effectively sample 0.6 seconds (9 x 1/15s), but sometimes that’s not even enough to get a usable photo in extremely dark situations.
The Pixel 3 now has a ‘Night Sight’ mode which sacrifices the zero shutter lag and expects you to hold the camera steady after you’ve pressed the shutter button. When you do so, the camera will merge up to 15 frames, each with shutter speeds as low as, say, 1/3s, to get you an image equivalent to a 5 second exposure, but without the motion blur that would inevitably result from such a long exposure.
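The exposure arithmetic behind those figures is simple enough to check directly. The sketch below uses the frame counts and shutter speeds quoted in the text (which Google gave only as approximate examples); the function name is hypothetical.

```python
import math
from fractions import Fraction


def total_exposure_s(frames, shutter_s):
    """Total light-gathering time of a burst, in seconds."""
    return frames * shutter_s


pixel2_low_light = total_exposure_s(9, Fraction(1, 15))   # 9 x 1/15s
night_sight = total_exposure_s(15, Fraction(1, 3))        # 15 x 1/3s
gain = night_sight / pixel2_low_light
gain_stops = math.log2(gain)

print(f"Pixel 2 burst:     {float(pixel2_low_light):.1f} s")
print(f"Night Sight burst: {float(night_sight):.1f} s")
print(f"light gain: {float(gain):.1f}x, or about {gain_stops:.1f} stops")
```

On these numbers Night Sight gathers a little over eight times as much light as the Pixel 2’s longest burst, roughly three stops.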
Put simply: even though there might be subject or handheld movement over the entire 5s span of the 15 frame burst, many of the 1/3s ‘snapshots’ of that burst are likely to still be sharp, albeit possibly displaced relative to one another. The tile-based alignment of Google’s ‘robust merge’ technology, however, can handle inter-frame movement by aligning objects that have moved and discarding tiles of any frame that have too much motion blur.
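The align-then-discard idea can be sketched on a one-dimensional toy signal. This is not Google’s algorithm: it assumes whole-pixel shifts, uses the mean absolute difference against a reference frame as a stand-in for a blur or mismatch test, and every name and threshold here is hypothetical.

```python
def best_shift(tile_start, tile_size, ref, frame, max_shift):
    """Find the shift that best aligns a tile of `frame` to the
    reference tile. Returns (shift, mean_abs_error)."""
    best = None
    for shift in range(-max_shift, max_shift + 1):
        start = tile_start + shift
        if start < 0 or start + tile_size > len(frame):
            continue  # shifted tile would fall outside the frame
        error = sum(abs(ref[tile_start + i] - frame[start + i])
                    for i in range(tile_size)) / tile_size
        if best is None or error < best[1]:
            best = (shift, error)
    return best


def robust_merge(ref, frames, tile_size=4, max_shift=2, reject_above=5.0):
    """Average each reference tile with the aligned tiles that match it
    well enough; tiles that match poorly are discarded."""
    merged = []
    for start in range(0, len(ref), tile_size):
        size = min(tile_size, len(ref) - start)
        stack = [ref[start:start + size]]
        for frame in frames:
            shift, error = best_shift(start, size, ref, frame, max_shift)
            if error <= reject_above:
                stack.append(frame[start + shift:start + shift + size])
        merged.extend(sum(column) / len(stack) for column in zip(*stack))
    return merged


reference = [10, 10, 50, 90, 90, 50, 10, 10]
shifted = [12, 8, 12, 48, 92, 88, 52, 10]    # same scene moved one pixel, plus noise
blurred = [40, 40, 40, 40, 40, 40, 40, 40]   # detail smeared away
merged = robust_merge(reference, [shifted, blurred])
print(merged)
```

In this example the first tile of the shifted frame is realigned and averaged in, while the blurred frame and the tile whose content has moved out of the frame are rejected, so those areas fall back to the reference alone.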
Have a look at the results below, which also show you the benefit of the wider-angle, second front-facing ‘groupie’ camera:
Furthermore, Night Sight mode takes a machine-learning based approach to auto white balance. It’s often very difficult to determine the dominant light source in such dark environments, so Google has opted to use learning-based AWB to yield natural looking images.
Final thoughts: simpler photography
The philosophy behind the Pixel camera – and for that matter the philosophy behind many smartphone cameras today – is one-button photography: a seamless experience without the need to activate various modes or features.
This is possible thanks to the computational approaches these devices embrace. The Pixel camera and software are designed to give you pleasing results without requiring you to think much about camera settings. Synthetic fill flash activates automatically with backlit human subjects, and Super Resolution automatically kicks in as you zoom.
Motion Photos turns on automatically when the camera detects interesting activity, and Top Shot now uses AI to automatically suggest the best photo of the bunch, even if it’s a moment that occurred before you pressed the shutter button. Autofocus typically focuses on human subjects very reliably, but when you need to specify your subject, just tap on it and ‘Motion Autofocus’ will continue to track and focus on it very reliably. Perfect for your toddler or pet.
At their best, these technologies allow you to focus on the moment, perhaps even enjoy it, and sometimes even help you to capture memories you might have otherwise missed.
We’ll be putting the Pixel 3 through its paces soon, so stay tuned. In the meantime, let us know in the comments below what your favorite features are, and what you’d like to see tested.
In good light, these last 9 frames typically span the last 150ms before you pressed the shutter button. In very low light, they can span up to the last 0.6s.
We were only told ‘say, maybe 15 images’ in conversation about the number of images in the buffer for Super Res Zoom and Night Sight. It may be more, it could be less, but we were at least told that it is more than 9 frames. One thing to keep in mind is that even if you have a 15-frame buffer, not all frames are guaranteed to be usable. For example, if in Night Sight one or more of these frames have too much subject motion blur, they’re discarded.
You can achieve a similar super-resolution effect manually with traditional cameras, and we describe the process here.