[Computational Photography] An analysis of the core technology behind image restoration

2021-08-09 15:55:40  Author: There are three words

Hello everyone. This is the tenth article in the column "Computational Photography", which covers the interdisciplinary field between computer science and photography.


Author & editor | There are three words

Today's topic is a classic problem in computer vision: image restoration (image inpainting). This article introduces deep-learning-based image restoration and its core techniques.

1 Fundamentals of image restoration  

1.1 Texture synthesis


Texture synthesis is a common vision problem: given a small texture sample, generate a larger image with a similar appearance, as in Figure 1:



Figure 1 Texture synthesis


In Figure 1 above, the framed area in the middle is the original image and the rest is entirely generated. A typical texture synthesis method grows the texture step by step: for each new pixel, it searches the existing image for regions whose neighborhood looks similar. Techniques like this can be used to fill small holes and remove small defects, as in Figure 2:



Figure 2 Hole filling based on texture synthesis
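The search-and-copy idea can be sketched in a few lines. This is a minimal illustration, not the method of any particular paper: it works on a grayscale array, assumes the hole does not touch the image border, and uses a brute-force search. The function name `synthesize_fill` and the `half` parameter (half the window size) are made up for this example.

```python
import numpy as np

def synthesize_fill(image, hole, half=1):
    """Fill pixels where `hole` is True by copying the centre of the known
    patch whose neighbourhood best matches (sum of squared differences)."""
    img = image.astype(float).copy()
    known = ~hole
    h, w = img.shape

    def win(a, y, x):
        return a[y - half:y + half + 1, x - half:x + half + 1]

    # Candidate source patches: windows that contain no hole pixel at all.
    sources = [(y, x) for y in range(half, h - half) for x in range(half, w - half)
               if win(known, y, x).all()]
    src_patches = np.stack([win(img, y, x) for y, x in sources])

    while not known.all():
        # Fill the unknown pixel with the most known neighbours first.
        todo = np.argwhere(~known)
        counts = [win(known, y, x).sum() for y, x in todo]
        y, x = todo[int(np.argmax(counts))]
        valid = win(known, y, x)            # compare only on known neighbours
        target = win(img, y, x)
        ssd = (((src_patches - target) ** 2) * valid).sum(axis=(1, 2))
        img[y, x] = src_patches[int(np.argmin(ssd))][half, half]
        known[y, x] = True
    return img
```

On a regular texture such as vertical stripes, this recovers a small hole exactly; on natural images it only matches appearance and has no notion of semantics.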


1.2  Image restoration


Image restoration here means image inpainting. In photography we often cannot control the scene. The bustling crowds at a scenic spot, for example, make it hard to get a photo with a clean background. Images can also be damaged in storage and need to be repaired. Figure 3 shows a photo before and after repair.



Figure 3 Photo restoration


The simplest image restoration methods rely on image self-similarity, that is, the texture synthesis approach described earlier: the hole is completed by finding matching blocks with similar texture elsewhere in the current image. Structure propagation is a representative of this family, and it already completes small regions fairly well.


For example, the stamp tool in Photoshop can repair local image regions. The technique behind it is PatchMatch, which is also a patch-based filling method and can be applied interactively to patch an image step by step.


The problem with this kind of method is that it considers only image similarity and ignores semantic information, so the completed images often look unrealistic, as with the repaired cat tail in the figure below. It also cannot complete large missing regions.


Figure 4 Photoshop image restoration


2 Image restoration algorithm based on deep learning

Traditional image methods have limited learning ability and cannot repair severely damaged image regions well. Generative adversarial networks (GANs) built on deep learning models are now increasingly applied to image restoration, with good results. This section introduces the related techniques.


2.1 The basic method


Traditional methods usually use a similarity algorithm to select image blocks from other regions of the image for completion. Context Encoder [1] instead trains a neural network to infer the occluded part of an image from its non-occluded part, which automates the process. The network structure is shown in Figure 5:


Context Encoders consists of an encoder, a fully connected layer, and a decoder. It learns image features and generates a prediction for the region to be repaired: the input is the original image including the occluded region, and the output is the prediction for the occluded region.



Figure 5 Context Encoders


The backbone of the encoder is AlexNet. For a 227×227 input, the resulting feature map is 6×6×256.
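The 227×227 to 6×6 reduction can be checked with the usual output-size formula. The sketch below assumes the standard AlexNet kernel, stride, and padding values for the five convolution layers and their pooling layers; the article itself does not list them, so treat the table as an assumption.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial size after one convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, pad): assumed standard AlexNet settings
ALEXNET_LAYERS = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]

def trace_sizes(size):
    """Return the spatial size after each layer, in order."""
    sizes = []
    for name, kernel, stride, pad in ALEXNET_LAYERS:
        size = out_size(size, kernel, stride, pad)
        sizes.append((name, size))
    return sizes

for name, size in trace_sizes(227):
    print(name, size)   # the last line printed is "pool5 6"
```

With 256 output channels in conv5, the final 6×6 map gives the 6×6×256 feature size quoted above.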


The encoder is followed by a channel-wise fully connected layer, adopted to obtain a large receptive field at a small computational cost. Its input size is 6×6×256 and its output size is the same. The channel-wise structure is not strictly necessary here. It is enough to keep the feature size under control while ensuring a large receptive field. A large receptive field is very important for image completion: when it is small, points inside the region to be completed cannot use the valid information outside that region, and the completion quality suffers greatly.
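To see why the channel-wise variant is cheap, here is a small NumPy sketch. Each of the 256 channels gets its own dense layer over the 36 spatial positions, so every output position sees the whole feature map, but channels do not mix. The function names and the weight layout `(C, H*W, H*W)` are illustrative choices, not taken from [1].

```python
import numpy as np

def channelwise_fc(features, weights):
    """features: (H, W, C); weights: (C, H*W, H*W).
    Applies an independent fully connected layer to each channel."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c)                 # (HW, C)
    out = np.einsum("cij,jc->ic", weights, flat)      # per-channel matrix-vector product
    return out.reshape(h, w, c)

def param_counts(h, w, c):
    """(channel-wise FC weights, ordinary FC weights) for an (h, w, c) feature map."""
    n = h * w
    return c * n * n, (n * c) ** 2

channelwise, dense = param_counts(6, 6, 256)
print(channelwise, dense)   # 331776 84934656
```

For the 6×6×256 feature map this is roughly 0.33 million weights instead of about 85 million for an ordinary fully connected layer of the same input and output size.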


The decoder contains several upsampling convolutions and outputs the part to be repaired. The upsampling factor depends on the size of the repaired part relative to the original image.


During network training, the loss function consists of two parts:

The first part is the image reconstruction loss of the encoder-decoder: the L2 distance between the predicted part and the original image. Only the part that needs to be repaired is counted, so the loss is computed under the control of the mask.


The second part is the GAN adversarial loss. When the GAN discriminator can no longer tell whether a predicted image comes from the training set, the network parameters are considered to have reached their optimum.
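The two loss terms can be sketched as follows. The mask convention (1 inside the hole, 0 elsewhere), the non-saturating form of the generator's adversarial term, and the weights `w_rec` and `w_adv` are assumptions chosen for illustration. The article does not specify them.

```python
import numpy as np

def masked_l2(pred, target, mask):
    """Mean squared error over hole pixels only (mask is 1 inside the hole)."""
    diff = mask * (pred - target)
    return float((diff ** 2).sum() / max(mask.sum(), 1))

def generator_adv_loss(d_fake, eps=1e-8):
    """Adversarial term for the generator: small when the discriminator
    assigns a high 'real' probability d_fake to the predicted image."""
    return float(-np.log(d_fake + eps).mean())

def joint_loss(pred, target, mask, d_fake, w_rec=0.999, w_adv=0.001):
    """Weighted sum of both terms; the weights here are illustrative."""
    return w_rec * masked_l2(pred, target, mask) + w_adv * generator_adv_loss(d_fake)
```

Errors outside the mask contribute nothing to `masked_l2`, which matches the point that only the region to be repaired is penalized.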


The generator and discriminator of the Context Encoder model are relatively simple. Although the completed result looks fairly realistic, the boundary is not smooth and local consistency is not satisfied. To address this, researchers [2] improved the Context Encoder model by using a global discriminator together with a local discriminator, and proposed the Globally and Locally Consistent Image Completion model (GLCIC for short).


Figure 6 GLCIC model


The GLCIC model contains three modules: an encoder-decoder image completion network, a global discriminator, and a local discriminator. The global discriminator judges the consistency of the whole reconstructed image, and the local discriminator judges whether the filled image block has good local detail. For the discrimination loss, the feature vectors output by the global and local discriminators are concatenated and then mapped through a sigmoid to judge whether the image is real.
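The fusion step at the end of the two discriminators can be sketched like this. The feature vectors are passed in as plain arrays, and the single linear projection (`weight`, `bias`) applied before the sigmoid is an assumption of this sketch. The text above only states that the concatenated vector is mapped through a sigmoid.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_discriminators(global_feat, local_feat, weight, bias=0.0):
    """Concatenate the global and local feature vectors, project them to a
    single logit, and map it to a probability that the image is real."""
    joint = np.concatenate([global_feat, local_feat])
    return float(sigmoid(joint @ weight + bias))

# Hypothetical 4-dimensional features from each discriminator
p_real = fuse_discriminators(np.ones(4), np.ones(4), np.zeros(8))
print(p_real)   # 0.5 for an all-zero projection
```

Because both vectors feed one output, a completion has to look plausible both as a whole image and inside the filled block to be scored as real.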


2.2 Attention mechanism


Traditional image completion methods are good at sampling from the surrounding image, while CNN models are good at generating new textures directly. To combine the advantages of both and make full use of the redundant information in the image, some researchers have proposed attention-based methods for image completion. Taking [3] as an example, it uses a two-step, coarse-to-fine approach.


The basic process is to complete the image roughly first and then look for similar image blocks in the unoccluded region. The overall network structure is as follows:



Figure 7 Attention-based image restoration model


The basic generative inpainting network (coarse network) is an encoder-decoder model used for the initial rough prediction, and it is trained with a reconstruction loss. The authors weight this reconstruction loss per pixel according to the pixel's distance to the nearest known pixel. Missing pixels close to known pixels are less ambiguous and easier to reconstruct, so they receive larger weights, while pixels deep inside the hole receive smaller weights.
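A distance-based weight mask of this kind can be sketched as below. The weight has the form gamma**d, where d is the distance to the nearest known pixel. The use of Euclidean distance, the brute-force search, and the default `gamma=0.99` are assumptions of this sketch.

```python
import numpy as np

def spatial_discount(hole, gamma=0.99):
    """Per-pixel loss weights: gamma ** d, where d is the distance from a
    hole pixel to the nearest known pixel. Known pixels keep weight 1."""
    known_yx = np.argwhere(~hole)
    weights = np.ones(hole.shape)
    for y, x in np.argwhere(hole):
        d = np.sqrt(((known_yx - (y, x)) ** 2).sum(axis=1)).min()
        weights[y, x] = gamma ** d
    return weights

hole = np.zeros((5, 5), dtype=bool)
hole[1:4, 1:4] = True
w = spatial_discount(hole)
# w[1, 2] (next to the border of the hole) is larger than w[2, 2] (hole centre)
```

The weighted reconstruction loss is then the per-pixel error multiplied by this mask, so ambiguous pixels in the middle of a large hole are penalized less strictly.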


The refinement network takes the coarse network's prediction as input for fine adjustment, and contains two branches.


Figure 8 Refinement network


One of them is the attention branch. It treats convolution as a template matching process: patches from the background (unoccluded region) are used as convolution kernels on the foreground (occluded region) to find, for each foreground patch, similar patches in the background. Concretely, 3×3 patches are sampled from the background to form a set of convolution kernels, which are then convolved with the foreground. The more similar two patches are, the larger the convolution response. This is implemented with an ordinary convolution followed by a channel-wise softmax.


Figure 9 Attention branch
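The matching step of the attention branch can be sketched in NumPy. This is a simplified illustration under several assumptions: similarity is the cosine similarity between flattened 3×3 patches, the softmax is sharpened by a hypothetical `scale` factor, and each foreground pixel is rebuilt as a weighted average of background pixels rather than by pasting whole patches back.

```python
import numpy as np

def contextual_attention(feat, hole, scale=10.0):
    """feat: (H, W, C) feature map; hole: (H, W) bool, True = foreground.
    Returns the attended feature map and the attention weights."""
    h, w, c = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)))
    # patches[y, x] is the flattened 3x3 patch centred on (y, x).
    patches = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)], axis=2)
    patches = patches.reshape(h, w, 9 * c)

    bg = patches[~hole]                       # background patches act as kernels
    fg = patches[hole]                        # foreground patches to be matched
    bg_unit = bg / (np.linalg.norm(bg, axis=1, keepdims=True) + 1e-8)
    fg_unit = fg / (np.linalg.norm(fg, axis=1, keepdims=True) + 1e-8)

    scores = scale * (fg_unit @ bg_unit.T)    # correlation = convolution response
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)   # softmax over background patches

    out = feat.copy()
    out[hole] = attn @ feat[~hole]            # blend background features
    return out, attn
```

Each row of `attn` says how much a foreground position borrows from every background position, which is the information the refinement network uses to copy texture from far away.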


The full training loss of the refinement network is the reconstruction loss plus the adversarial loss. Because the refinement network sees a more complete scene (the coarse prediction) than the original image with its missing region, its encoder can learn a better feature representation than the coarse network.


2.3 Arbitrary shape repair


When deep learning methods are used for image completion, the missing region is usually filled with white pixels or random noise, and convolution layers then extract context features for the subsequent completion. White pixels and random noise carry no valid information, so convolving them indiscriminately together with valid pixels is not reasonable. Completion done this way produces some implausible image blocks and cannot handle regions of arbitrary shape, as with the samples in Figure 10.


Figure 10 Irregular region completion


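To solve this problem, Nvidia proposed Partial Convolution [4], which improves image completion by modifying the convolution operation itself. Specifically, following [4], Partial Convolution is computed as:

$$
x' = \begin{cases} W^{T}(X \odot M)\,\dfrac{\mathrm{sum}(\mathbf{1})}{\mathrm{sum}(M)} + b, & \text{if } \mathrm{sum}(M) > 0 \\ 0, & \text{otherwise} \end{cases}
\qquad
m' = \begin{cases} 1, & \text{if } \mathrm{sum}(M) > 0 \\ 0, & \text{otherwise} \end{cases}
$$

In the formula, W is the convolution kernel, b is its bias, X is the image content under the current kernel window, and M is the corresponding mask matrix carrying the occlusion information, whose elements are only 0 (hole) and 1 (valid). m' is the rule for updating the mask after each layer: a position becomes valid as soon as its window contains at least one valid pixel, so the region to be filled shrinks gradually layer by layer.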
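A minimal single-channel sketch of partial convolution follows. It assumes the masked and renormalized form from [4]: the response is computed from valid pixels only and rescaled by (window size / number of valid pixels), and the updated mask is 1 wherever the window held at least one valid pixel. The function name `partial_conv2d`, the absence of padding, and the scalar bias are choices made for this illustration.

```python
import numpy as np

def partial_conv2d(x, mask, weight, bias=0.0):
    """x, mask: (H, W), with mask 1 for valid pixels and 0 for holes.
    weight: (k, k). Stride 1, no padding. Returns (output, updated mask)."""
    k = weight.shape[0]
    h, w = x.shape
    out = np.zeros((h - k + 1, w - k + 1))
    new_mask = np.zeros_like(out)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            m = mask[i:i + k, j:j + k]
            valid = m.sum()
            if valid > 0:
                window = x[i:i + k, j:j + k]
                # Use valid pixels only, then rescale to compensate for the holes.
                out[i, j] = (weight * window * m).sum() * (k * k / valid) + bias
                new_mask[i, j] = 1.0
    return out, new_mask
```

With an averaging kernel on a constant image, the output stays at that constant wherever the new mask is 1, regardless of what values sit inside the hole. Windows that fall entirely inside the hole produce 0 and stay masked for the next layer.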


Later, Gated Conv [5] improved on Partial Convolution: the mask update of Partial Conv becomes something learned from the image, and the mask values are no longer fixed to 0 and 1 but lie in [0, 1]. The two are compared in the figure below:


Figure 11 Partial Convolution (left) and Gated Conv (right)
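The gating idea can be sketched for a single channel. Two convolutions run on the same input: one produces features, the other produces a soft mask through a sigmoid, and the two are multiplied elementwise. The use of ReLU as the feature activation and the helper `conv2d_valid` are assumptions of this sketch.

```python
import numpy as np

def conv2d_valid(x, weight):
    """Plain single-channel 2D correlation, stride 1, no padding."""
    k = weight.shape[0]
    h, w = x.shape
    out = np.empty((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + k, j:j + k] * weight).sum()
    return out

def gated_conv2d(x, w_feature, w_gate):
    """Output = activation(feature conv) * sigmoid(gating conv).
    The gate is a learned soft mask with values strictly between 0 and 1."""
    feature = np.maximum(conv2d_valid(x, w_feature), 0.0)      # ReLU (assumed)
    gate = 1.0 / (1.0 + np.exp(-conv2d_valid(x, w_gate)))      # soft mask
    return feature * gate, gate
```

Unlike the hard 0/1 mask update of Partial Convolution, `w_gate` is trained along with the rest of the network, so the network decides per position how much of each feature to let through.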


2.4 Edge guided repair


When painters begin a picture, they often draw the overall edge outline first and then color it. Inspired by this, one family of image restoration frameworks first repairs the edges and then repairs the texture content. EdgeConnect [6] is a representative of this idea: it contains two generators and two discriminators to carry out these two steps.



Figure 12 EdgeConnect


Let the intact grayscale image be Igray and its edge detection result be Cgt. The input of the first generator consists of three maps: the grayscale image to be repaired ~Igray, its edge detection result ~Cgt, and the mask M. The output is the repaired edge map Cpred. The task of the first generator can be written as:

$$ C_{pred} = G_1(\tilde{I}_{gray}, \tilde{C}_{gt}, M) $$


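The generator's optimization objective includes the standard adversarial loss and a feature matching loss. Following [6], the feature matching loss can be written as:

$$ L_{FM} = \mathbb{E}\left[ \sum_{i=1}^{L} \frac{1}{N_i} \left\| D^{(i)}(C_{gt}) - D^{(i)}(C_{pred}) \right\|_1 \right] $$

Here D^{(i)} is the activation of the i-th layer of the discriminator D, N_i is the number of elements in that activation, and L is the number of layers used. The definition of LFM is similar to the perceptual loss, but instead of features from an external VGG model it directly uses the activations of each layer of the discriminator D, because the VGG model was not trained for edge detection.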
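A sketch of the feature matching loss as described above: compare the discriminator's activations for the real and the predicted edge map layer by layer. The activations are passed in as plain arrays (hypothetical stand-ins for the layers of D), and the per-layer L1 distance normalized by the number of elements is an assumption based on [6].

```python
import numpy as np

def feature_matching_loss(acts_real, acts_fake):
    """Sum over layers of the mean absolute difference between the
    discriminator activations for the real and the predicted edge map."""
    return float(sum(np.abs(r - f).mean() for r, f in zip(acts_real, acts_fake)))

# Hypothetical activations from three discriminator layers
real = [np.ones((4, 4)), np.ones((2, 2)), np.ones(3)]
fake = [a - 1.0 for a in real]
print(feature_matching_loss(real, fake))   # 3.0: each layer differs by 1 on average
```

Because the features come from the discriminator itself, this term needs no external pretrained network.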


After the edge prediction is obtained, the edge map is fed together with the color image into the second generator, which outputs the final prediction. This generator's optimization objective includes a standard adversarial loss, a perceptual loss, and a style loss. The perceptual loss is the commonly used distance in VGG feature space, and the style loss is the Gram matrix distance commonly used in style transfer networks. Following [6], they can be written as:

$$ L_{perc} = \mathbb{E}\left[ \sum_{i} \frac{1}{N_i} \left\| \phi_i(I_{gt}) - \phi_i(I_{pred}) \right\|_1 \right], \qquad L_{style} = \mathbb{E}_j\left[ \left\| G_j^{\phi}(I_{pred}) - G_j^{\phi}(I_{gt}) \right\|_1 \right] $$

Here \phi_i is the activation map of the i-th selected layer of a pretrained VGG network, N_i is its number of elements, and G_j^{\phi} is the Gram matrix built from the activation map \phi_j.
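The difference between the two losses is easy to see in code. The feature maps below are plain arrays standing in for VGG activations (no real VGG model is loaded), and the Gram matrix normalization is one common convention chosen for this sketch.

```python
import numpy as np

def gram(feat):
    """feat: (H, W, C) -> (C, C) Gram matrix of channel correlations."""
    h, w, c = feat.shape
    f = feat.reshape(h * w, c)
    return f.T @ f / (h * w * c)

def perceptual_loss(feats_pred, feats_gt):
    """Mean absolute difference between feature maps, summed over layers."""
    return float(sum(np.abs(p - g).mean() for p, g in zip(feats_pred, feats_gt)))

def style_loss(feats_pred, feats_gt):
    """Mean absolute difference between Gram matrices, summed over layers."""
    return float(sum(np.abs(gram(p) - gram(g)).mean()
                     for p, g in zip(feats_pred, feats_gt)))

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 4, 3))
flipped = feat[::-1, ::-1]   # same feature vectors, different spatial positions
# perceptual_loss([flipped], [feat]) is positive, style_loss is (numerically) zero
```

The Gram matrix discards spatial layout and keeps only channel statistics, so the style loss constrains texture while the perceptual loss constrains content at each position.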

Because EdgeConnect needs the edges to be computed in advance, for example with the Canny operator, different parameters produce different edge features and thus affect the repair result. Experiments show that more edge information helps content repair.


2.5 Summary


Image inpainting is a fundamental problem with a wide range of applications, but current image restoration models have not been deployed at scale because their generalization ability in real scenes is very limited. Directions currently worth attention include, but are not limited to:


(1) Application and improvement of generative models.

(2) Addition and use of interactive information.

(3) Various applications of image restoration.

(4) Video restoration.


References for this article:

[1] Pathak D, Krahenbuhl P, Donahue J, et al. Context encoders: Feature learning by inpainting[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2536-2544.

[2] Iizuka S, Simo-Serra E, Ishikawa H. Globally and locally consistent image completion[J]. ACM Transactions on Graphics (ToG), 2017, 36(4): 1-14.

[3] Yu J, Lin Z, Yang J, et al. Generative image inpainting with contextual attention[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 5505-5514.

[4] Liu G, Reda F A, Shih K J, et al. Image inpainting for irregular holes using partial convolutions[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 85-100.

[5] Yu J, Lin Z, Yang J, et al. Free-form image inpainting with gated convolution[C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 4471-4480.

[6] Nazeri K, Ng E, Joseph T, et al. EdgeConnect: Structure Guided Image Inpainting using Edge Prediction[C]//Proceedings of the IEEE International Conference on Computer Vision Workshops. 2019: 0-0.

For more details and hands-on practice, please refer to my latest book, "Deep learning of photographic image processing".






This article introduced the problem of image restoration. It is a relatively complex technique that requires both a high-level understanding of the image and fine local editing. Deep-learning-based image restoration models have made some progress but are still some distance from practical application, and the topic is well worth studying for readers interested in vision technology.






This article was written by [There are three words]. Please include a link to the original when reposting. Thanks.