
Update in March 2019:

After the TensorFlow developers introduced the TensorFlow 2.0 APIs at TensorFlow Dev Summit 2019, I made up my mind to switch to PyTorch.

TensorFlow is a powerful open-source deep learning framework that supports various languages, including Python. However, its APIs are far too complicated for a beginner in deep learning (especially one who is also new to Python). To ease the pain of having to understand the mess of elements in TensorFlow computation graphs, I wrote this tutorial to help beginners take the first bite of the cake.

ResNets are one of the greatest works in the deep learning field. Although they look scary with their extreme depths, implementing one is not a hard job. Now let's build one of the simplest ResNets - ResNet-56 - and train it on the CIFAR-10 dataset.


First, let's take a look at ResNet-56. It was proposed by Kaiming He et al. to verify the effect of residual networks. It has 56 weighted layers: deep, but simple. The structure is shown in the figure below:

Seems a little bit long? Don't worry, let's do this step by step.

## 1 Ingredients

Python 3.6

TensorFlow 1.4.0

Numpy 1.13.3

OpenCV 3.2.0

CIFAR-10 Dataset

Also prepare some basic knowledge of Python programming, digital image processing and convolutional neural networks. If you can already build, train and validate your own neural networks with TensorFlow, you don't need to read this post.

## 2 Recipe

### 2.0 Prepare the tools

Prepare (import) the tools for our project, including all that I mentioned above. Like this :P

Wait... What's this? TensorChain? Another deep learning framework like TensorFlow?

Uh, nope. This is my own encapsulation of some TensorFlow APIs, for the sake of easing your pain. In the beginning you'll only have to focus on "what's what". We'll look into my implementation of this encapsulation later, once you're clear about how everything works. Please download this file, put it next to your code file, and import it.

### 2.1 Decide the input

Every neural network requires an input; you always have to specify the details of a question before asking the computer to solve it. All variables and constants in TensorFlow are objects of type tf.Tensor, and the tf.placeholder for our input(s) is a special one. Images in the CIFAR-10 dataset are RGB images (3 channels) of size 32x32 (really small), so a single input should be shaped like [32, 32, 3]. Also, we want to feed in a small batch of multiple images, so our input data should be an array of shape [?, 32, 32, 3]. An unknown dimension size can be marked as None; it becomes concrete when we feed the model actual images. It's coded like this:

Ground-truth data also needs to be known in supervised learning, so we have to define a placeholder for it as well:

We want the label data to be in one-hot encoding: an array of length 10, denoting the 10 classes, with a '1' at exactly one position and '0's everywhere else.

### 2.2 Do some operations

For now, let's use our TensorChain to build it fast. Under most circumstances, each computation is based on the input data or on the result of the previous computation, so our network (or most of it) looks more like a chain than a web. Every time we add a new operation (layer), we add it to our TensorChain object. Just remember to get the output_tensor of this object (the output tensor of the last operation on the chain) whenever you need to use a native TensorFlow API.
The constructor of the TensorChain class takes a Tensor object as its parameter, which is also the input tensor of the chain. As mentioned earlier, all we have to do is add operations. See my ResNet-56 code:


This is it? Right, this is it! Isn't it cool? Didn't seem that hard, huh? That's because I encapsulated that huge mess of weights and biases, leaving only a few parameters that decide the structure of the network. Later in this post we'll talk about the actual work these functions do.

### 2.3 Define the loss

In supervised learning, you always have to tell the model its learning target. To make the model optimize, you have to let it know how much, and in which direction, it should change its parameters. This is done with a loss function. Therefore, we need to define a loss function for our ResNet-56 model (which we designed for this classification problem) so that it can learn and optimize.
A commonly used loss function in classification problems is cross entropy, defined below:

$$C=-\frac{1}{n}\sum_x{y\ln a+(1-y)\ln(1-a)}$$

in which $$y$$ is the expected(or say correct) output and $$a$$ is the actual output.
This seems a little complicated, but it's not hard to implement, since TensorFlow has implemented it already! (You can also try implementing it yourself in one line if you want.) For now we use the pre-defined cross-entropy loss function:

and it returns a tf.Tensor that holds an average of cross entropies (don't forget that this is a batch). As for the 'softmax' before the 'cross_entropy': it's a function that maps the values in an array into the range 0~1 (and makes them sum to 1), which allows a comparison between our prediction and the one-hot ground truth. Its definition is simple too:

$$S_i=\frac{e^{V_i}}{\sum_j{e^{V_j}}}$$
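The formula can be sketched in a few lines of NumPy (a plain illustration, not part of the model):

```python
import numpy as np

def softmax(v):
    # Subtracting the max doesn't change the result, but avoids overflow
    e = np.exp(v - np.max(v))
    return e / e.sum()

s = softmax(np.array([2.0, 1.0, 0.1]))
# The outputs lie in 0~1, sum to 1, and keep the original ordering
```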

### 2.4 Define the train op

Now we have the loss function. We'll have to pass its value to an optimizer, which makes our model learn, adjusting its parameters to minimize the loss. Gradient Descent, Adagrad, Adam and Momentum optimizers are all commonly used. Here we use an Adam optimizer as an example; feel free to try any other one.

Also, tell the optimizer what the loss tensor is. The returned object is a train operation.

The neural network is finished. It's time to grab some data and train it.

### 2.5 Feed the model with data, and train it!

Remember how we defined the placeholders? It's time to fetch some data that fits them. See the CIFAR-10 website for how the dataset can be downloaded and unpickled.

The returned value dict is a Python dictionary: every time we unpickle a file, a dictionary is returned. Its 'data' key leads to 10000 RGB images of size 32x32, stored in a [10000, 3072] array (3072 = 32x32x3; I guess you can work out how it's stored now). The 'labels' key leads to 10000 values in the range 0~9. Obviously we have to reshape the data to fit it into the network model:
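A sketch of the reshaping with NumPy, using random stand-in arrays in place of a real batch file. Each 'data' row stores the image channel by channel, so we reshape to [N, 3, 32, 32] and move the channel axis to the end; the one-hot conversion of the labels is shown too:

```python
import numpy as np

# Stand-ins for dict['data'] and dict['labels'] from one unpickled batch file
raw = np.random.randint(0, 256, size=(10000, 3072), dtype=np.uint8)
labels = np.random.randint(0, 10, size=10000)

# Each row is 3 channels of 32x32; bring the channel axis last -> [10000, 32, 32, 3]
image_data = raw.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1)

# Turn each label 0~9 into a one-hot vector of length 10
new_label_data = np.eye(10)[labels]
```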

The details of data processing are not covered here; try running it step by step and inspect the results.
image_data and new_label_data contain 10000 pieces of data each. Let's divide them into 100 small batches (100 elements each, images and labels) and feed them into the model. Do this for all 5 batch files:
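Slicing one file's data into 100 batches might look like this (stand-in arrays in place of the real ones):

```python
import numpy as np

# Stand-ins for the arrays produced from one batch file
image_data = np.zeros([10000, 32, 32, 3], dtype=np.float32)
new_label_data = np.eye(10)[np.zeros(10000, dtype=np.int64)]

# 100 batches of 100 images and 100 one-hot labels each
batches = []
for i in range(100):
    start, end = i * 100, (i + 1) * 100
    batches.append((image_data[start:end], new_label_data[start:end]))
```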

A session - created with tf.Session() - is required every time we run a TensorFlow model, whether we're training or evaluating it. The first time you run a model, you'll need to run session.run(tf.global_variables_initializer()) to initialize the values of the TensorFlow variables defined previously.
When calling session.run(), you first specify a TensorFlow operation (or a list of operations) that you need. If its result depends on actual data (that is, data in one or more placeholders flows into the operation), you must also feed it the actual data through the feed_dict parameter. For example, while training this ResNet-56 model, a loss is calculated from my ground_truth and the prediction that comes from input_tensor. Therefore, I have to give a value for each placeholder defined above (format: "placeholder: corresponding data"), folded into one Python dictionary.

I'm also interested in the loss value at each iteration (feeding one batch and executing one forward pass and one backward pass) of the training process. Therefore, what I pass as the first parameter is not just the train op, but also the loss tensor, and the session.run() above should be modified to:

This is where the return value of session.run() becomes useful. Its values, corresponding to the first parameter of run(), are the actual values of the requested tensors. In our example, loss_value is the actual output of the loss tensor. As for train_, we don't care what it is (fetching an operation returns None); it's there only to match the positions in the list.

Actually, one epoch (training the model once on the whole dataset) is not enough for the model to fully optimize. I trained this model for 40 epochs and added some loop variables to display the results. You can see my code and output below. It's highly recommended that you train this with a high-performance GPU, or it would take a century to train your model to a satisfactory degree.

### 2.6 Conclusion

In a word, building & training neural network models with TensorFlow involves the following steps:

1. Decide the input tensor

2. Add operations(ops) based on existing tensors

3. Define the loss tensor, just like other tensors

4. Select an optimizer and define the train op

5. Process data and feed the model with them


## 3 A Closer Look

Wait, it's too early to leave now!
TensorChain saved you from having to deal with a mess of TensorFlow classes and functions. Now it's time to take a closer look at how TensorChain is implemented, and thereby understand the native TensorFlow APIs.


### 3.1 TensorFlow variables

Let's begin with TensorFlow variables. Variables in TensorFlow are similar to variables in C, Java or other strongly typed programming languages: they have a type, though it's not necessarily explicit at definition. Usually they change as training goes on, approaching an optimal value.
The most commonly used variables in TensorFlow are weights and biases. I guess you have seen formulae like:


$$y=Wx+b$$

The $$W$$ here is the weight, and the $$b$$ is the bias. When implementing common network layers, the two are used as the layers' parameters. For instance, at the very beginning of our ResNet-56, we had a 3x3 convolution layer with 16 channels. Its implementation in TensorChain is:

See? On line 16, we used the tf.nn.conv2d() function, whose parameters include input, filter, strides, padding, etc. As the names suggest, this function convolves our input with the weights (the convolution filter here). A bias is added to the result as the final output. (Many people argue that the bias here is meaningless and should be removed.) One line of code is sufficient for defining a variable:

To define a weight or bias variable, create a tf.Variable object. Usually you give an initial_value, which also decides the shape of the tensor; tf.truncated_normal() and tf.constant() are often used as initial values. Other APIs (the function tf.get_variable() and the package tf.initializers) are frequently used when you need more initialization methods. I strongly recommend trying these APIs yourself.

### 3.2 Tensors and operations

Going on with the parameters of tf.nn.conv2d(): the required parameters also include strides and padding. You should already know what strides mean in convolution, so I'll only talk about their format. strides requires a 1-D vector of length 4, like [1, 2, 2, 1]. The 1st and 4th numbers are always 1 (to match the dimensions of the input), while the 2nd and 3rd are the vertical and horizontal strides.
The 4th parameter, padding, is a little different from its definition in the convolution operation. It requires 'SAME' or 'VALID', denoting 'with' or 'without' zero padding. When it's 'SAME', zero padding is added, equally on every side of the input map, to make the shapes match as needed.

tf.nn.conv2d() is just one example of a TensorFlow operation. Functions like tf.matmul(), tf.reduce_mean(), tf.nn.relu(), tf.nn.batch_normalization(), tf.global_variables_initializer(), tf.losses.softmax_cross_entropy() and tf.truncated_normal() are all operations. Operation functions return tensors (even tf.truncated_normal() returns a tensor, just one with an initializer).


All the functions in the TensorChain class are built from the most basic TensorFlow operations and variables. Now that you've learned these basic TensorFlow concepts, you can actually abandon TensorChain and go try implementing your own neural networks yourself!


## 4 Spices

I wasn't joking just now! But I know there's still a lot you don't understand about using TensorFlow - like "how do I visualize my computation graph", "how do I save/load my model to/from files", "how do I record some tensors' values while training" or "how do I view the loss curves" - after all, TensorFlow's APIs go far beyond just building nets. These are also important techniques in your research. If you'd rather ask me than spend some time experimenting, read on.

The very first thing you may want to do - after training a network model with nice results - is save it. Saving a model is fairly easy: just use a tf.train.Saver object. See my code below:

I saved my model and variable values to 'models/model.ckpt'. But you'll actually find 3 files in the 'models' directory - model.ckpt.data-00000-of-00001, model.ckpt.meta and model.ckpt.index - none of which is named 'model.ckpt'! That's because TensorFlow stores the graph structure separately from the variable values. The .meta file describes the saved graph structure; the .index file records the mapping between tensor names and tensor metadata; and the .data-00000-of-00001 file - which is always the biggest one - holds all the variable values. If you need the graph data together with the variable values, use a Saver to load them after creating a session:

Remember that session.run(tf.global_variables_initializer()) shouldn't be executed here, since the variables are already initialized from your saved .data-00000-of-00001 file.
If you only need the graph to be loaded, only use the .meta file:
If you only need the graph to be loaded, only use the .meta file:

The tf.train.import_meta_graph() function loads (appends) the saved graph into your current computation graph. The tensor values are still uninitialized, so you'll have to execute session.run(tf.global_variables_initializer()) again. The tensors you defined in the model can be retrieved by their names (a property of the Tensor objects, not the Python variable names). For example:


To retrieve an ordinary tensor, append ':0' to the name of its op; this means getting the op's associated (first) output tensor. train is a little special: we only need the op itself, so the function is get_operation_by_name() and the ':0' is not necessary.