A Brief Review of the R-CNN Family - Region-based CNN for Object Detection

The R-CNN family is an awesome line of work on object detection. It demonstrated the effectiveness of combining region proposals with deep neural networks and has become a state-of-the-art baseline for the object detection task. In this blog post I'll give a brief review of the R-CNN family, from R-CNN to Mask R-CNN, along with several related works based on the idea of R-CNNs. Implementation and evaluation details are not covered here. For those details, please refer to the original papers listed in the References section.

Read More

[MineSweeping] The Long Struggle of DensePose Installation

Update in October 2019:

DensePose has been re-implemented with the brand-new object detection framework Detectron2, which is based on PyTorch and much easier to install and use (you don't have to manually compile Caffe2).
I strongly recommend that you check out the new official DensePose code at https://github.com/facebookresearch/detectron2/tree/master/projects/DensePose.

DensePose is a great work in real-time human pose estimation, built on Caffe2 and the Detectron framework. It extracts a dense 3D surface of the human body from RGB images. The installation instructions are provided here.

These are the problems that took me some time to tackle during installation. I spent one week figuring out solutions to all of them. Lucky for me that I didn't give up too early...

Greetings from Facebook AI Research

Read More

MathJax - Use Math in Hexo, Just Like Tex! (Including Common Issue Solutions)

Sometimes you may want to explain algorithms or principles with beautiful formulae in your blog. How do you do this? Edit them in Microsoft Word, take a screenshot, crop it and put it in the blog post? And when you finish your article and find out that you missed a symbol in one of the pictures - oh man, gotta repeat all that again? Stop using those images now! MathJax, a beautiful math display engine, allows you to code math like a coder.

$$\mathcal{C}\phi \delta e \mathfrak{M}\alpha th \mathit{I}n \mathcal{H}ex\sigma \mathbb{N}o\omega!$$

Read More

A Review of ResNet - Residual Networks

0 Introduction

Deep learning researchers have been constructing skyscrapers in recent years. In particular, VGG nets and GoogLeNet have pushed the depth of convolutional networks to the extreme. But a question remains: if time and money aren't problems, do deeper networks always perform better? Not exactly.

When residual networks were proposed, researchers around the world were stunned by their depth. "Jesus Christ! Is this a neural network or the Dubai Tower?" But don't be afraid! These networks are deep, but their structures are simple. Interestingly, these networks not only defeated all opponents in the classification, detection and localization challenges of ImageNet 2015, but were also the main innovation in the best paper of CVPR 2016.
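To make "deep but simple" a little more concrete, here is a minimal sketch of a single residual block in plain Python. It assumes the identity-shortcut form y = ReLU(F(x) + x); the names `residual_block` and `residual_fn` are hypothetical and used only for illustration, and real blocks apply convolutions to feature maps rather than a function to a flat list.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def residual_block(x, residual_fn):
    """Return ReLU(F(x) + x), where F is `residual_fn`.

    `residual_fn` stands in for the stacked layers inside the block
    (an assumption for illustration). The input x is added back
    through the identity shortcut before the final activation.
    """
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same size as x for an identity shortcut")
    return relu([f + xi for f, xi in zip(fx, x)])


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    # If the stacked layers output zeros, the block simply passes x through.
    print(residual_block(x, lambda v: [0.0 for _ in v]))
```

When the stacked layers output zeros, the block reduces to the identity mapping on non-negative inputs, which is one intuition for why adding more such blocks need not hurt.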

Network Growth

Read More

A Review of VGG net - Very Deep Convolutional Neural Networks

0 Introduction

Convolutional neural networks (CNNs) have enjoyed great success in computer vision research in the past few years. A number of attempts have been made to improve the accuracy and performance of the original CNN architecture. In 2014, Karen Simonyan et al. investigated the effect of depth on CNN accuracy in large-scale image recognition, proposing along the way a series of very deep CNNs that are usually called VGG nets. The results confirmed the importance of depth in visual representations.

1 Background: VGG net's ancestors

Before introducing VGG net, let's take a glance at prior convolutional neural networks.

1.1 LeNet: The Origin

Basic neural network structures (for example, the multi-layer perceptron) learn patterns on 1D vectors, which cannot cope well with 2D features in images. In 1998, LeCun et al. proposed a convolutional network model called LeNet-5. Its structure is fairly simple: two convolution layers, two subsampling layers and a few fully connected layers. This network was used to solve a handwritten digit recognition problem. (If you need to learn more about the convolution operation, please refer to Google or Digital Image Processing by Rafael C. Gonzalez.)
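The structure described above can be traced with a little shape arithmetic. The sketch below is only an illustration: the concrete sizes (32x32 input, 5x5 kernels, 6 and 16 feature maps, 2x2 subsampling, fully connected sizes 120, 84 and 10) are assumptions taken from the classic LeNet-5 layout rather than from this post, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial size after a subsampling (pooling) layer."""
    return (size - window) // stride + 1


def lenet5_shapes(input_size=32):
    """Trace feature-map shapes through a LeNet-5-style stack.

    Layer sizes are assumptions based on the classic LeNet-5 layout.
    """
    feature_maps = []
    size = input_size

    size = conv_out(size, kernel=5)          # first convolution layer
    feature_maps.append(("C1", 6, size, size))
    size = pool_out(size)                    # first subsampling layer
    feature_maps.append(("S2", 6, size, size))
    size = conv_out(size, kernel=5)          # second convolution layer
    feature_maps.append(("C3", 16, size, size))
    size = pool_out(size)                    # second subsampling layer
    feature_maps.append(("S4", 16, size, size))

    # Flatten the last feature maps and feed them to the fully connected layers.
    flattened = 16 * size * size
    fc_sizes = [flattened, 120, 84, 10]
    return feature_maps, fc_sizes


if __name__ == "__main__":
    maps, fc = lenet5_shapes()
    for name, channels, h, w in maps:
        print(f"{name}: {channels} x {h} x {w}")
    print("fully connected:", " -> ".join(str(n) for n in fc))
```

Under these assumptions, the spatial size shrinks from 32 to 28, 14, 10 and finally 5, so the fully connected part receives 16 * 5 * 5 = 400 features.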

Read More

Smartypants is NOT SO SMART

When blogging with Hexo, every time I type a single quotation mark (also called an apostrophe) like this: ' , Hexo converts it to a symbol like this: ’

You could say that this is also an apostrophe, but it really looks UNBEARABLE in the articles. This problem has been bothering me for more than a month. (I'm not saying that this is the reason for not updating my blog, but I don't mind if you think so!)

Read More