STDFusionNet: An Infrared and Visible Image Fusion Network Based on Salient Target Detection

Title: STDFusionNet: An Infrared and Visible Image Fusion Network Based on Salient Target Detection
Journal: IEEE Transactions on Instrumentation and Measurement, Vol. 70, 2021 (Art. no. 5009513)
Authors: Jiayi Ma, Linfeng Tang, Meilong Xu, Hao Zhang, Guobao Xiao
DOI: 10.1109/TIM.2021.3075747
Code: https://github.com/jiayi-ma/STDFusionNet

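1. Abstract

The paper proposes STDFusionNet, an infrared and visible image fusion network based on salient target detection. Its goal is to preserve the thermal targets of the infrared image and the texture structures of the visible image.
The authors introduce a salient target mask that marks the regions of the infrared image "that humans or machines pay more attention to", which provides spatial guidance for integrating the different kinds of information.
The mask is combined with a specific loss function to guide feature extraction and reconstruction: the feature extraction network selectively extracts salient target features from the infrared image and background texture features from the visible image, and the reconstruction network fuses these features and reconstructs the desired result.
The paper stresses that the salient target mask is needed only during training, so STDFusionNet is an end-to-end model at test time, and that the model implicitly performs salient target detection and key information fusion.
Reported result: compared with the state of the art, the method improves the EN, MI, VIF, and SF metrics on public datasets by about **1.25% / 22.65% / 4.3% / 0.89%**, respectively.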

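2. Introduction and motivation

An image captured by a single sensor or under a single shooting setting describes the scene only from a limited perspective. Fusing complementary images from different sensors or settings therefore improves scene understanding, and infrared and visible image fusion is one important case.
The paper describes the problem as follows: some existing fusion methods weaken the "useful information". In its example, U2Fusion weakens the salient targets and FusionGAN weakens the background texture.

Fig. 2: Example comparison for STDFusionNet (GTF / DenseFuse / STDFusionNet)

From left to right: the infrared image, the visible image, the result of the traditional method GTF, the result of the deep method DenseFuse, and the result of STDFusionNet. The red and green boxes show that GTF and DenseFuse suffer from detail loss, blurred edges, and artifacts, whereas STDFusionNet highlights the targets better and keeps rich texture.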

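3. Contributions

Contribution 1: The "desired information" of the fusion process is defined as the combination of the salient targets of the infrared image and the background texture of the visible image. The authors claim this is the first explicit definition of the objective of infrared and visible image fusion.
Contribution 2: The salient target mask is introduced into a specific loss function, which guides the network to detect thermal radiation targets in the infrared image and fuse them with the background texture details of the visible image.
Contribution 3: Extensive experiments show the superiority of the network. The fused results "look like high-quality visible images with highlighted targets", which benefits target recognition and scene understanding.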

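4. Method (STDFusionNet)

4.1 Notation and definition of the "desired information"

In infrared and visible image fusion, the most important information is the salient targets, which come from the infrared image, and the texture structures, which come from the visible image. The "desired information" is therefore defined explicitly as the salient target information of the infrared image plus the background texture structure information of the visible image.
From this the paper derives two key requirements:
1. Determine the salient target regions in the infrared image (usually the regions of objects that emit more heat, such as pedestrians, vehicles, and bunkers); the network has to learn to detect these regions automatically from the infrared image.
2. Extract the desired information accurately from the detected regions, then fuse and reconstruct it effectively, so that the fused result contains the infrared salient targets in the salient regions and keeps the visible texture in the background regions.
For the loss construction, the authors use the salient target mask $I_m$ to define the "desired result" as:

$$I_d = I_m \circ I_{ir} + (1 - I_m) \circ I_{vi} \tag{1}$$

where $\circ$ denotes element-wise multiplication.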
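
As a minimal sketch of Eq. (1), assuming images are plain nested lists of floats and the mask is binary, the desired result can be composed pixel by pixel. The function name `desired_result` and the list-based image representation are illustrative assumptions; they are not taken from the paper or its released code.

```python
def desired_result(mask, ir, vi):
    """Compose I_d = I_m * I_ir + (1 - I_m) * I_vi element-wise (Eq. 1)."""
    return [
        [m * a + (1 - m) * b for m, a, b in zip(mask_row, ir_row, vi_row)]
        for mask_row, ir_row, vi_row in zip(mask, ir, vi)
    ]


# Hypothetical 2x2 example: mask = 1 marks a salient (infrared) pixel.
mask = [[1, 0], [0, 1]]
ir = [[0.9, 0.8], [0.7, 0.6]]
vi = [[0.1, 0.2], [0.3, 0.4]]
I_d = desired_result(mask, ir, vi)
```

Pixels where the mask is 1 take the infrared value and all other pixels take the visible value, so no pixel mixes the two sources.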

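Fig. 3: the "Salient target mask" on the left and its inverted "background mask" pass through the element-wise multiplication nodes of the figure, which split the source images into a "salient region" and a "background region".

4.2 Overall framework

4.2.1 Inputs and the "partition" operation (Fig. 3, left)

Input 1: Visible Image, the input of the visible branch at the top of Fig. 3; in the lower part of Fig. 3 it is also multiplied by the background mask to obtain the "visible background region".
Input 2: Infrared Image, the input of the infrared branch at the top of Fig. 3; in the lower part of Fig. 3 it is also multiplied by the salient mask to obtain the "infrared salient region".
Input 3: Salient target mask $I_m$: its purpose is to highlight the objects in the infrared image that radiate a large amount of heat (e.g., pedestrians, vehicles, bunkers).
Background mask (the inverted mask in Fig. 3): "salient target masks are inverted to obtain the background masks".
Element-wise multiplication (legend of Fig. 3): the paper states that the salient mask and the background mask are multiplied at the pixel level with the infrared and visible images, respectively, to obtain the "source salient target regions" and the "source background texture regions".

Implementation note: this "mask partitioning" step is preprocessing for building the training loss; it does not mean that the mask is fed to the backbone network as an input. The paper stresses that the mask only guides training and does not need to be fed to the network at test time.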

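4.2.2 Feature Extraction Network (top of Fig. 3)

The feature extraction network uses a pseudosiamese architecture to treat source images of different modalities differently, so that it selectively extracts salient target features from the infrared image and background texture features from the visible image.
Top of Fig. 3: both the visible branch and the infrared branch start with one 5×5 convolution and one lrelu (leaky rectified linear unit), followed by ResBlock×3.
The two feature extraction networks of the pseudosiamese architecture have the same structure, but their parameters are trained independently and are not shared, because infrared and visible images have different properties.

4.2.3 Feature Reconstruction Network (top right of Fig. 3)

In Fig. 3, the two feature extraction branches feed the Feature Reconstruction Network (the dashed box), which consists of ResBlock×4 and outputs the fused image $I_f$ ("Fused Image").
The input of the feature reconstruction network is the concatenation of the infrared and visible convolutional features in the channel dimension.
The last layer of the reconstruction network uses a Tanh activation so that the value range of the output image matches that of the input source images.

4.2.4 Loss Function (the "Loss Function" block in the middle of Fig. 3)

The Loss Function block expresses the idea " $L = L_p + L_{grad}$ " in shorthand. The text explains that the loss consists of two kinds of terms: a pixel loss, which constrains the pixel intensity of the fused image to be consistent with the sources, and a gradient loss, which forces the fused image to contain more detail.
The paper stresses that both the pixel loss and the gradient loss are built separately for the salient region and the background region, and that the salient mask $I_m$ is used to split the fused image into $I_m\circ I_f$ (salient region) and $(1-I_m)\circ I_f$ (background region).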

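4.3 Loss formulas (Eqs. (2)-(6)): pixel consistency + gradient consistency + salient/background partition

4.3.1 Pixel loss (salient region / background region)

Pixel loss of the salient region:

$$L^{pixel}_{salient}=\frac{1}{HW}\left\| (I_m\circ I_f)-(I_m\circ I_{ir})\right\|_1 \tag{2}$$

Pixel loss of the background region:

$$L^{pixel}_{back}=\frac{1}{HW}\left\| ((1-I_m)\circ I_f)-((1-I_m)\circ I_{vi})\right\|_1 \tag{3}$$

Here $\|\cdot\|_1$ is the L1 norm, and $H$ and $W$ are the height and width of the image.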
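
The two pixel losses differ only in the mask and the source image they use, which the following sketch makes explicit. It assumes images are nested lists of floats; `masked_l1` and `invert` are hypothetical helper names, not functions from the released code.

```python
def masked_l1(region_mask, fused, source):
    """Region-restricted L1 distance: ||m*f - m*s||_1 / (H*W)."""
    h, w = len(fused), len(fused[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            m = region_mask[y][x]
            total += abs(m * fused[y][x] - m * source[y][x])
    return total / (h * w)


def invert(mask):
    """Background mask 1 - I_m."""
    return [[1 - m for m in row] for row in mask]


# Hypothetical 2x2 inputs: one salient pixel in the top-left corner.
mask = [[1, 0], [0, 0]]
fused = [[0.5, 0.25], [0.0, 0.0]]
ir = [[1.0, 0.0], [0.0, 0.0]]
vi = [[0.0, 0.75], [0.0, 0.0]]

pixel_salient = masked_l1(mask, fused, ir)        # Eq. (2)
pixel_back = masked_l1(invert(mask), fused, vi)   # Eq. (3)
```

Note that both losses are divided by the full image size $HW$, not by the number of pixels inside the region.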

4.3.2 Gradient loss (salient region / background region)

The paper states that the gradient operator $\nabla$ is computed with the Sobel operator.
Gradient loss of the salient region:

$$L^{grad}_{salient}=\frac{1}{HW}\left\| (I_m\circ \nabla I_f)-(I_m\circ \nabla I_{ir})\right\|_1 \tag{4}$$

Gradient loss of the background region:

$$L^{grad}_{back}=\frac{1}{HW}\left\| ((1-I_m)\circ \nabla I_f)-((1-I_m)\circ \nabla I_{vi})\right\|_1 \tag{5}$$
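
The gradient losses can be sketched in the same style. The notes only say that $\nabla$ uses the Sobel operator, so two details below are assumptions of this sketch: the gradient magnitude is taken as $|G_x| + |G_y|$, and the image border is zero-padded so the gradient map keeps the input size. All function names are hypothetical.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def filter3x3(img, kernel):
    """3x3 cross-correlation with zero padding; output has the input size."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out


def sobel_gradient(img):
    """Gradient magnitude approximated as |G_x| + |G_y| (assumption)."""
    gx, gy = filter3x3(img, SOBEL_X), filter3x3(img, SOBEL_Y)
    return [[abs(a) + abs(b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]


def masked_grad_l1(region_mask, fused, source):
    """||m * grad(f) - m * grad(s)||_1 / (H*W), as in Eqs. (4) and (5)."""
    gf, gs = sobel_gradient(fused), sobel_gradient(source)
    h, w = len(fused), len(fused[0])
    total = sum(abs(region_mask[y][x] * (gf[y][x] - gs[y][x]))
                for y in range(h) for x in range(w))
    return total / (h * w)


# Hypothetical 3x3 example: a vertical step edge versus a flat image,
# compared only at the center pixel.
edge = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
flat = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
center_only = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
grad_loss = masked_grad_l1(center_only, edge, flat)
```

The mask is applied after the gradient is computed, as in Eqs. (4) and (5), so edges that cross the region boundary still contribute to the gradient inside the region.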

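4.3.3 Total loss: region weight + equal pixel/grad weighting within a region

The paper points out that, unlike previous methods, the pixel loss and the gradient loss are treated equally within the same region, so the final loss is:

$$L = (L^{pixel}_{back}+L^{grad}_{back})+\alpha(L^{pixel}_{salient}+L^{grad}_{salient}) \tag{6}$$

Here $\alpha$ is the weight that balances the loss of the background region against the loss of the salient region. The paper also notes that adding the salient region constraint to the loss gives the model the ability to "automatically detect and extract infrared salient targets".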
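
A short worked example of Eq. (6) follows. The four loss values and the value of `alpha` are made up for illustration; the notes do not state the value of $\alpha$ used in the paper.

```python
def total_loss(pixel_back, grad_back, pixel_salient, grad_salient, alpha):
    """Eq. (6): background terms plus alpha-weighted salient terms."""
    return (pixel_back + grad_back) + alpha * (pixel_salient + grad_salient)


# Hypothetical loss values and a hypothetical alpha, for illustration only.
loss = total_loss(pixel_back=0.25, grad_back=0.5,
                  pixel_salient=0.125, grad_salient=0.25, alpha=2.0)
background_only = total_loss(0.25, 0.5, 0.125, 0.25, alpha=0.0)
```

With `alpha = 0` only the background terms remain, which shows that $\alpha$ is the single knob that decides how strongly the salient region is enforced.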

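4.4 Obtaining and applying the salient target mask (lower half of Fig. 3 + Fig. 4)

The salient targets in the infrared images are annotated with the LabelMe toolbox and converted to binary masks; the masks are then inverted to obtain the background masks.
After that:
- the salient mask and the background mask are multiplied at the pixel level with the infrared and visible images, respectively, to obtain the "source salient regions" and the "source background texture regions";
- the fused image is multiplied by the salient mask and the background mask in the same way to obtain the "fused salient region" and the "fused background region";
- these regions are finally used to build the specific loss, which guides the network to perform salient target detection and information fusion implicitly.
The salient target mask only guides training and is not fed to the network at test time, so the model is end-to-end.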

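4.5 Network structure details (top of Fig. 3 / ResBlock at the bottom)

4.5.1 Feature Extraction Network (two branches, pseudosiamese)

The feature extraction part contains two networks (infrared and visible) with the same architecture but independently trained parameters, to account for the different properties of the two modalities.
Each feature extraction network consists of:
- Common layer: one 5×5 convolutional layer + one leaky ReLU activation layer;
- 3 ResBlocks, which "reinforce the extracted information".

4.5.2 ResBlock (local structure at the bottom of Fig. 3: every node and operator)

A ResBlock has two paths:
Main branch (upper path): Conv1 (1×1) → lrelu → Conv2 (3×3) → lrelu → Conv3 (1×1);
Shortcut (lower path): identity conv (1×1).
The outputs of the two paths are added at the "+" node and then pass through one more lrelu.
All kernels are 1×1 except Conv2, which is 3×3. Conv1 and Conv2 are each followed by a leaky ReLU; the outputs of Conv3 and the identity conv are added first and then passed through a leaky ReLU.
The identity conv is designed to handle the case where the input and output dimensions of the ResBlock differ.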
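
The role of the identity conv is easiest to see by tracing channel counts. The sketch below lists the layers of one ResBlock for a given input and output width; the channel pattern follows the code in Section 5.1, where a block with equal input and output width adds its input directly and only a block that changes the width uses the 1×1 identity conv. The function `resblock_plan` is a hypothetical helper for illustration.

```python
def resblock_plan(in_ch, out_ch):
    """List the layers of one ResBlock as (name, kernel_size, in_ch, out_ch)."""
    layers = [
        ("conv1", 1, in_ch, in_ch),   # 1x1, followed by leaky ReLU
        ("conv2", 3, in_ch, in_ch),   # 3x3, followed by leaky ReLU
        ("conv3", 1, in_ch, out_ch),  # 1x1, sets the output channel count
    ]
    if in_ch != out_ch:
        # The shortcut needs a 1x1 conv so both paths end with out_ch channels.
        layers.append(("identity_conv", 1, in_ch, out_ch))
    return layers


# Channel progression of one extraction branch, as read from the code in Section 5.1.
branch = [resblock_plan(16, 16), resblock_plan(16, 32), resblock_plan(32, 64)]
```

Each branch therefore ends with 64 channels, and concatenating the two branches gives the 128-channel input of the reconstruction network.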

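4.5.3 Feature Reconstruction Network (fusion and reconstruction)

The feature reconstruction network consists of 4 ResBlocks. Its input is the channel-wise concatenation of the infrared and visible branch features, and its last layer uses a Tanh activation so that the output range matches the input range.

4.5.4 padding/stride ("no downsampling")

Information loss is catastrophic for fusion tasks, so every convolutional layer of STDFusionNet uses padding = SAME and stride = 1. The network therefore introduces no downsampling, and the fused image has the same size as the source images.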
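
The size-preserving property follows from the standard SAME padding rule, where the output size along one dimension is ceil(n / stride). The sketch below evaluates that rule for the kernel sizes used in the network; the helper names are hypothetical.

```python
import math


def same_output_size(n, stride):
    """Spatial output size of a SAME-padded convolution: ceil(n / stride)."""
    return math.ceil(n / stride)


def same_total_padding(n, kernel, stride):
    """Total padding along one dimension needed to reach that output size."""
    out = same_output_size(n, stride)
    return max((out - 1) * stride + kernel - n, 0)


# With stride 1 the size never changes, whatever the kernel size.
out_size = same_output_size(128, 1)
pad_5x5 = same_total_padding(128, kernel=5, stride=1)  # 2 pixels per side
pad_3x3 = same_total_padding(128, kernel=3, stride=1)  # 1 pixel per side
pad_1x1 = same_total_padding(128, kernel=1, stride=1)  # no padding
```

A stride of 2 would halve the size (`same_output_size(128, 2)` is 64), which is exactly the downsampling the design avoids.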

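5. Implementation

Training and inference flow, organized according to Fig. 3 and Eqs. (1)-(6)

Data preprocessing (normalization to [-1, 1]; cropping with stride=24 into 128×128 patches; no cropping at test time)
The code comes from utils.input_setup and train.py. During training, every source image and mask is cut by a sliding window with stride=24 into 128×128 patches and normalized to [-1, 1] with (imread(...) - 127.5)/127.5.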

import cv2
import numpy as np

# imread, prepare_data and make_data are helper functions that are not shown in this excerpt.


def input_setup(sess, config, data_dir, index=0):
    """
    Read image files, cut them into sub-images, and save them in h5 file format.
    """
    # Load data paths (both branches make the same call)
    if config.is_train:
        data = prepare_data(sess, dataset=data_dir)
    else:
        data = prepare_data(sess, dataset=data_dir)

    sub_input_sequence = []

    if config.is_train:
        for i in range(len(data)):
            # Normalize pixel values from [0, 255] to [-1, 1]
            input_ = (imread(data[i]) - 127.5) / 127.5
            if len(input_.shape) == 3:
                h, w, _ = input_.shape
            else:
                h, w = input_.shape
            for x in range(0, h - config.image_size + 1, config.stride):
                for y in range(0, w - config.image_size + 1, config.stride):
                    sub_input = input_[x:x + config.image_size, y:y + config.image_size]
                    # Make channel value
                    if data_dir == "Train":
                        # Integer division: cv2.resize and reshape need integer sizes
                        sub_input = cv2.resize(sub_input, (config.image_size // 4, config.image_size // 4),
                                               interpolation=cv2.INTER_CUBIC)
                        sub_input = sub_input.reshape([config.image_size // 4, config.image_size // 4, 1])
                        print('error')
                    else:
                        sub_input = sub_input.reshape([config.image_size, config.image_size, 1])

                    sub_input_sequence.append(sub_input)

    else:
        input_ = (imread(data[index]) - 127.5) / 127.5  # normalize to [-1, 1]
        if len(input_.shape) == 3:
            h_real, w_real, _ = input_.shape  # RGB: only H and W are needed
        else:
            h_real, w_real = input_.shape  # IR (single channel): only H and W are needed
        # NOTE: padding, padding_h and padding_w are not defined in this excerpt
        input_ = np.lib.pad(input_, ((padding, padding_h), (padding, padding_w)), 'edge')
        h, w = input_.shape
        # print(input_.shape)
        # Numbers of sub-images in height and width of image are needed to compute merge operation.
        nx = ny = 0
        for x in range(0, h - config.image_size + 1, config.stride):
            nx += 1
            ny = 0
            for y in range(0, w - config.image_size + 1, config.stride):
                ny += 1
                sub_input = input_[x:x + config.image_size, y:y + config.image_size]  # [image_size x image_size]
                sub_input = sub_input.reshape([config.image_size, config.image_size, 1])
                # Single-channel output: only the luminance matters
                sub_input_sequence.append(sub_input)
        # Sliding window: 128x128 patches, moving right and then down (the original note mentions 64x64 per step)
    """
    len(sub_input_sequence) : the number of sub_input (image_size x image_size x ch) in one image
    (sub_input_sequence[0]).shape : (image_size, image_size, 1)
    """
    # Convert the list to a numpy array
    arrdata = np.asarray(sub_input_sequence)  # [?, image_size, image_size, 1]
    # print(arrdata.shape)
    make_data(sess, arrdata, data_dir)

    if not config.is_train:
        print(nx, ny)
        print(h_real, w_real)
        return nx, ny, h_real, w_real

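Consistency check: config.image_size=128 and config.stride=24 (see 5.5) match the training details "crop 128×128 patches with stride 24, yielding 6921 patches" and "inputs normalized to [-1, 1]"; because there is no downsampling, the size is preserved.
Train_vi, Train_ir, and Train_ir_mask_blur are turned into training patches by this function. The mask enters only the loss computation (lower half of Fig. 3) and is not needed at test time.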
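
The number of patches per image follows directly from the two nested `range()` loops of the training branch. The sketch below counts them and checks the normalization endpoints; `count_patches` and `normalize` are hypothetical helpers, and the 480×640 image size is only an example, not a property of the dataset.

```python
def count_patches(h, w, size=128, stride=24):
    """Number of size x size windows produced by the two nested range() loops."""
    rows = len(range(0, h - size + 1, stride))
    cols = len(range(0, w - size + 1, stride))
    return rows * cols


def normalize(pixel):
    """Map an 8-bit intensity in [0, 255] to [-1, 1]."""
    return (pixel - 127.5) / 127.5


n_small = count_patches(128, 128)   # exactly one window fits
n_large = count_patches(480, 640)   # hypothetical image size: 15 rows x 22 columns
n_none = count_patches(100, 100)    # image smaller than the patch: no window
lo, hi = normalize(0), normalize(255)
```

An image smaller than 128 pixels in either dimension contributes no training patch, because the corresponding `range()` is empty.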

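5.1 Forward pass (needed for both inference and training)

Code source: train_network.py.
It contains the two branches, for the visible image and the infrared image, plus the reconstruction network.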

import tensorflow as tf


class STDFusionNet():
    def vi_feature_extraction_network(self, vi_image):  # visible-light branch
        # Visible-light encoder; vi_image has shape [N, H, W, 1]
        with tf.compat.v1.variable_scope('vi_extraction_network'):
            with tf.compat.v1.variable_scope('conv1'):
                # First 5x5 convolution extracts low-level features, 16 output channels
                weights = tf.compat.v1.get_variable("w", [5, 5, 1, 16],
                                                    initializer=tf.truncated_normal_initializer(stddev=1e-3))
                #weights = weights_spectral_norm(weights)
                # One bias per output channel
                bias = tf.compat.v1.get_variable("b", [16], initializer=tf.constant_initializer(0.0))
                # Stride 1 with SAME padding keeps the spatial size
                conv1 = tf.nn.conv2d(vi_image, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                # conv1 = tf.contrib.layers.batch_norm(conv1, decay=0.9, updates_collections=None, epsilon=1e-5, scale=True)
                # Leaky ReLU activation (mitigates dead neurons)
                conv1 = tf.nn.leaky_relu(conv1)
            block1_input = conv1
            # state size: 16

            # ResBlock layout: main branch 1x1 -> 3x3 -> 1x1 with leaky ReLU after Conv1 and Conv2;
            # the shortcut is a 1x1 identity conv that raises the channel count (block1 keeps 16
            # channels and adds its input directly). `conv3 + identity_conv` then passes through
            # leaky ReLU, matching the "+" node and the activation of the ResBlock in Fig. 3.

            with tf.compat.v1.variable_scope('block1'):
                with tf.compat.v1.variable_scope('conv1'):
                    # 1x1 convolution mixes channels without changing the spatial resolution
                    weights = tf.compat.v1.get_variable("w", [1, 1, 16, 16],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [16], initializer=tf.constant_initializer(0.0))
                    # Pointwise projection
                    conv1 = tf.nn.conv2d(block1_input, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv1 = tf.nn.leaky_relu(conv1)

                with tf.compat.v1.variable_scope('conv2'):
                    # 3x3 convolution aggregates spatial context
                    weights = tf.compat.v1.get_variable("w", [3, 3, 16, 16],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [16], initializer=tf.constant_initializer(0.0))
                    # SAME padding keeps the feature map size
                    conv2 = tf.nn.conv2d(conv1, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv2 = tf.nn.leaky_relu(conv2)
                with tf.compat.v1.variable_scope('conv3'):
                    # 1x1 convolution produces the residual output
                    weights = tf.compat.v1.get_variable("w", [1, 1, 16, 16],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [16], initializer=tf.constant_initializer(0.0))
                    # Output of the main branch
                    conv3 = tf.nn.conv2d(conv2, weights, strides=[1, 1, 1, 1], padding='SAME') + bias

                # Residual connection: add the block input to the main branch, then activate
                block1_output = tf.nn.leaky_relu(conv3 + block1_input)
            block2_input = block1_output

            with tf.compat.v1.variable_scope('block2'):
                with tf.compat.v1.variable_scope('conv1'):
                    # 1x1 channel mixing before the spatial convolution
                    weights = tf.compat.v1.get_variable("w", [1, 1, 16, 16],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [16], initializer=tf.constant_initializer(0.0))
                    # Pointwise projection
                    conv1 = tf.nn.conv2d(block2_input, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv1 = tf.nn.leaky_relu(conv1)

                with tf.compat.v1.variable_scope('conv2'):
                    # 3x3 convolution enlarges the receptive field
                    weights = tf.compat.v1.get_variable("w", [3, 3, 16, 16],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [16], initializer=tf.constant_initializer(0.0))
                    # SAME padding keeps the resolution
                    conv2 = tf.nn.conv2d(conv1, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv2 = tf.nn.leaky_relu(conv2)
                with tf.compat.v1.variable_scope('conv3'):
                    # 1x1 convolution raises the channel count to 32
                    weights = tf.compat.v1.get_variable("w", [1, 1, 16, 32],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [32], initializer=tf.constant_initializer(0.0))
                    # Output of the main branch
                    conv3 = tf.nn.conv2d(conv2, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                with tf.compat.v1.variable_scope('identity_conv'):
                    # Projection shortcut: maps 16 channels to 32 so the two paths can be added
                    weights = tf.compat.v1.get_variable("w", [1, 1, 16, 32],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    identity_conv = tf.nn.conv2d(block2_input, weights, strides=[1, 1, 1, 1], padding='SAME')
                # Add the two paths, then activate
                block2_output = tf.nn.leaky_relu(conv3 + identity_conv)
                block3_input = block2_output

            with tf.compat.v1.variable_scope('block3'):
                with tf.compat.v1.variable_scope('conv1'):
                    # 1x1 channel mixing, stays at 32 channels
                    weights = tf.compat.v1.get_variable("w", [1, 1, 32, 32],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [32], initializer=tf.constant_initializer(0.0))
                    # Pointwise projection
                    conv1 = tf.nn.conv2d(block3_input, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv1 = tf.nn.leaky_relu(conv1)

                with tf.compat.v1.variable_scope('conv2'):
                    # 3x3 convolution, still 32 channels
                    weights = tf.compat.v1.get_variable("w", [3, 3, 32, 32],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [32], initializer=tf.constant_initializer(0.0))
                    # Spatial convolution with SAME padding
                    conv2 = tf.nn.conv2d(conv1, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv2 = tf.nn.leaky_relu(conv2)
                with tf.compat.v1.variable_scope('conv3'):
                    # 1x1 convolution raises the channel count to 64
                    weights = tf.compat.v1.get_variable("w", [1, 1, 32, 64],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [64], initializer=tf.constant_initializer(0.0))
                    # Output of the main branch
                    conv3 = tf.nn.conv2d(conv2, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                with tf.compat.v1.variable_scope('identity_conv'):
                    # Projection shortcut: maps 32 channels to 64
                    weights = tf.compat.v1.get_variable("w", [1, 1, 32, 64],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    identity_conv = tf.nn.conv2d(block3_input, weights, strides=[1, 1, 1, 1], padding='SAME')
                # Add the two paths and activate: encoded visible-light features
                block3_output = tf.nn.leaky_relu(conv3 + identity_conv)
                encoding_feature = block3_output
        return encoding_feature

    def ir_feature_extraction_network(self, ir_image):  # infrared branch
        # Infrared encoder; ir_image has shape [N, H, W, 1]
        with tf.compat.v1.variable_scope('ir_extraction_network'):
            with tf.compat.v1.variable_scope('conv1'):
                # First 5x5 convolution extracts low-level infrared features, 16 output channels
                weights = tf.compat.v1.get_variable("w", [5, 5, 1, 16],
                                                    initializer=tf.truncated_normal_initializer(stddev=1e-3))
                #weights = weights_spectral_norm(weights)
                # One bias per output channel
                bias = tf.compat.v1.get_variable("b", [16], initializer=tf.constant_initializer(0.0))
                # SAME padding with stride 1 keeps the spatial size
                conv1 = tf.nn.conv2d(ir_image, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                # conv1 = tf.contrib.layers.batch_norm(conv1, decay=0.9, updates_collections=None, epsilon=1e-5, scale=True)
                # Leaky ReLU activation
                conv1 = tf.nn.leaky_relu(conv1)
            block1_input = conv1
            # state size: 16

            # ResBlock layout: main branch 1x1 -> 3x3 -> 1x1 with leaky ReLU after Conv1 and Conv2;
            # the shortcut is a 1x1 identity conv that raises the channel count (block1 keeps 16
            # channels and adds its input directly). `conv3 + identity_conv` then passes through
            # leaky ReLU, matching the "+" node and the activation of the ResBlock in Fig. 3.

            with tf.compat.v1.variable_scope('block1'):
                with tf.compat.v1.variable_scope('conv1'):
                    # 1x1 channel mixing, spatial size unchanged
                    weights = tf.compat.v1.get_variable("w", [1, 1, 16, 16],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [16], initializer=tf.constant_initializer(0.0))
                    # Pointwise projection
                    conv1 = tf.nn.conv2d(block1_input, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv1 = tf.nn.leaky_relu(conv1)

                with tf.compat.v1.variable_scope('conv2'):
                    # 3x3 convolution captures local context
                    weights = tf.compat.v1.get_variable("w", [3, 3, 16, 16],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [16], initializer=tf.constant_initializer(0.0))
                    # SAME padding, stride 1
                    conv2 = tf.nn.conv2d(conv1, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv2 = tf.nn.leaky_relu(conv2)
                with tf.compat.v1.variable_scope('conv3'):
                    # 1x1 convolution produces the residual output (still 16 channels)
                    weights = tf.compat.v1.get_variable("w", [1, 1, 16, 16],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [16], initializer=tf.constant_initializer(0.0))
                    # Output of the main branch
                    conv3 = tf.nn.conv2d(conv2, weights, strides=[1, 1, 1, 1], padding='SAME') + bias

                # Residual connection: add the block input, then activate
                block1_output = tf.nn.leaky_relu(conv3 + block1_input)
            block2_input = block1_output

            with tf.compat.v1.variable_scope('block2'):
                with tf.compat.v1.variable_scope('conv1'):
                    # 1x1 channel mixing, in preparation for the channel increase
                    weights = tf.compat.v1.get_variable("w", [1, 1, 16, 16],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [16], initializer=tf.constant_initializer(0.0))
                    # Pointwise projection
                    conv1 = tf.nn.conv2d(block2_input, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv1 = tf.nn.leaky_relu(conv1)

                with tf.compat.v1.variable_scope('conv2'):
                    # 3x3 convolution enlarges the receptive field
                    weights = tf.compat.v1.get_variable("w", [3, 3, 16, 16],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [16], initializer=tf.constant_initializer(0.0))
                    # SAME padding keeps the size
                    conv2 = tf.nn.conv2d(conv1, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv2 = tf.nn.leaky_relu(conv2)
                with tf.compat.v1.variable_scope('conv3'):
                    # 1x1 convolution raises the channel count to 32
                    weights = tf.compat.v1.get_variable("w", [1, 1, 16, 32],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [32], initializer=tf.constant_initializer(0.0))
                    # Output of the main branch
                    conv3 = tf.nn.conv2d(conv2, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                with tf.compat.v1.variable_scope('identity_conv'):
                    # Projection shortcut: maps 16 channels to 32
                    weights = tf.compat.v1.get_variable("w", [1, 1, 16, 32],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    identity_conv = tf.nn.conv2d(block2_input, weights, strides=[1, 1, 1, 1], padding='SAME')
                # Add the two paths, then activate
                block2_output = tf.nn.leaky_relu(conv3 + identity_conv)
                block3_input = block2_output

            with tf.compat.v1.variable_scope('block3'):
                with tf.compat.v1.variable_scope('conv1'):
                    # 1x1 channel mixing, stays at 32 channels
                    weights = tf.compat.v1.get_variable("w", [1, 1, 32, 32],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [32], initializer=tf.constant_initializer(0.0))
                    # Pointwise projection
                    conv1 = tf.nn.conv2d(block3_input, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv1 = tf.nn.leaky_relu(conv1)

                with tf.compat.v1.variable_scope('conv2'):
                    # 3x3 convolution, stays at 32 channels
                    weights = tf.compat.v1.get_variable("w", [3, 3, 32, 32],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [32], initializer=tf.constant_initializer(0.0))
                    # Convolution with SAME padding
                    conv2 = tf.nn.conv2d(conv1, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv2 = tf.nn.leaky_relu(conv2)
                with tf.compat.v1.variable_scope('conv3'):
                    # 1x1 convolution raises the channel count to 64
                    weights = tf.compat.v1.get_variable("w", [1, 1, 32, 64],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [64], initializer=tf.constant_initializer(0.0))
                    # Output of the main branch
                    conv3 = tf.nn.conv2d(conv2, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                with tf.compat.v1.variable_scope('identity_conv'):
                    # Projection shortcut: 32 -> 64 channels
                    weights = tf.compat.v1.get_variable("w", [1, 1, 32, 64],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    identity_conv = tf.nn.conv2d(block3_input, weights, strides=[1, 1, 1, 1], padding='SAME')
                # Add the two paths and activate: encoded infrared features
                block3_output = tf.nn.leaky_relu(conv3 + identity_conv)
                encoding_feature = block3_output
        return encoding_feature

    def feature_reconstruction_network(self, feature):  # decoder (reconstruction network)
        # Decoder: reconstructs the fused image from the concatenated features
        with tf.compat.v1.variable_scope('reconstruction_network'):
            block1_input = feature
            with tf.compat.v1.variable_scope('block1'):
                with tf.compat.v1.variable_scope('conv1'):
                    # 1x1 channel mixing, stays at 128 channels
                    weights = tf.compat.v1.get_variable("w", [1, 1, 128, 128],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [128], initializer=tf.constant_initializer(0.0))
                    # Pointwise convolution
                    conv1 = tf.nn.conv2d(block1_input, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv1 = tf.nn.leaky_relu(conv1)

                with tf.compat.v1.variable_scope('conv2'):
                    # 3x3 convolution keeps the channel count and refines spatial information
                    weights = tf.compat.v1.get_variable("w", [3, 3, 128, 128],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [128], initializer=tf.constant_initializer(0.0))
                    # Convolution with SAME padding
                    conv2 = tf.nn.conv2d(conv1, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv2 = tf.nn.leaky_relu(conv2)
                with tf.compat.v1.variable_scope('conv3'):
                    # 1x1 convolution reduces the channel count to 64
                    weights = tf.compat.v1.get_variable("w", [1, 1, 128, 64],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [64], initializer=tf.constant_initializer(0.0))
                    # Output of the main branch
                    conv3 = tf.nn.conv2d(conv2, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                with tf.compat.v1.variable_scope('identity_conv'):
                    # Projection shortcut 128 -> 64 so the two paths can be added
                    weights = tf.compat.v1.get_variable("w", [1, 1, 128, 64],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    identity_conv = tf.nn.conv2d(block1_input, weights, strides=[1, 1, 1, 1], padding='SAME')
                # Add the two paths, then apply ELU (the ResBlock description in Section 4.5.2 uses leaky ReLU here)
                block1_output = tf.nn.elu(conv3 + identity_conv)
            block2_input = block1_output
            with tf.compat.v1.variable_scope('block2'):
                with tf.compat.v1.variable_scope('conv1'):
                    # 1x1 channel mixing, stays at 64 channels
                    weights = tf.compat.v1.get_variable("w", [1, 1, 64, 64],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [64], initializer=tf.constant_initializer(0.0))
                    # Pointwise convolution
                    conv1 = tf.nn.conv2d(block2_input, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv1 = tf.nn.leaky_relu(conv1)

                with tf.compat.v1.variable_scope('conv2'):
                    # 3x3 convolution, stays at 64 channels
                    weights = tf.compat.v1.get_variable("w", [3, 3, 64, 64],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [64], initializer=tf.constant_initializer(0.0))
                    # Convolution with SAME padding
                    conv2 = tf.nn.conv2d(conv1, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv2 = tf.nn.leaky_relu(conv2)
                with tf.compat.v1.variable_scope('conv3'):
                    # 1x1 convolution reduces the channel count to 32
                    weights = tf.compat.v1.get_variable("w", [1, 1, 64, 32],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [32], initializer=tf.constant_initializer(0.0))
                    # Output of the main branch
                    conv3 = tf.nn.conv2d(conv2, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                with tf.compat.v1.variable_scope('identity_conv'):
                    # Projection shortcut 64 -> 32
                    weights = tf.compat.v1.get_variable("w", [1, 1, 64, 32],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    identity_conv = tf.nn.conv2d(block2_input, weights, strides=[1, 1, 1, 1], padding='SAME')
                # Add the two paths, then apply ELU
                block2_output = tf.nn.elu(conv3 + identity_conv)
                block3_input = block2_output
372
            with tf.compat.v1.variable_scope('block3'):
                with tf.compat.v1.variable_scope('conv1'):
                    # 1x1 convolution, keeps 32 channels
                    weights = tf.compat.v1.get_variable("w", [1, 1, 32, 32],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [32], initializer=tf.constant_initializer(0.0))
                    # pointwise convolution
                    conv1 = tf.nn.conv2d(block3_input, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv1 = tf.nn.leaky_relu(conv1)

                with tf.compat.v1.variable_scope('conv2'):
                    # 3x3 convolution, keeps 32 channels
                    weights = tf.compat.v1.get_variable("w", [3, 3, 32, 32],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [32], initializer=tf.constant_initializer(0.0))
                    # SAME-padded convolution
                    conv2 = tf.nn.conv2d(conv1, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv2 = tf.nn.leaky_relu(conv2)
                with tf.compat.v1.variable_scope('conv3'):
                    # 1x1 convolution, reduces to 16 channels
                    weights = tf.compat.v1.get_variable("w", [1, 1, 32, 16],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [16], initializer=tf.constant_initializer(0.0))
                    # main residual branch
                    conv3 = tf.nn.conv2d(conv2, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                with tf.variable_scope('identity_conv'):
                    # shortcut projection 32 -> 16
                    weights = tf.compat.v1.get_variable("w", [1, 1, 32, 16],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    identity_conv = tf.nn.conv2d(block3_input, weights, strides=[1, 1, 1, 1], padding='SAME')
                # residual sum followed by leaky ReLU activation
                block3_output = tf.nn.leaky_relu(conv3 + identity_conv)
                block4_input = block3_output
            with tf.compat.v1.variable_scope('block4'):
                with tf.compat.v1.variable_scope('conv1'):
                    # 1x1 convolution, keeps 16 channels
                    weights = tf.compat.v1.get_variable("w", [1, 1, 16, 16],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [16], initializer=tf.constant_initializer(0.0))
                    # pointwise convolution
                    conv1 = tf.nn.conv2d(block4_input, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv1 = tf.nn.leaky_relu(conv1)

                with tf.compat.v1.variable_scope('conv2'):
                    # 3x3 convolution, keeps 16 channels
                    weights = tf.compat.v1.get_variable("w", [3, 3, 16, 16],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [16], initializer=tf.constant_initializer(0.0))
                    # SAME-padded convolution
                    conv2 = tf.nn.conv2d(conv1, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                    conv2 = tf.nn.leaky_relu(conv2)
                with tf.compat.v1.variable_scope('conv3'):
                    # 1x1 convolution producing the single-channel output
                    weights = tf.compat.v1.get_variable("w", [1, 1, 16, 1],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    bias = tf.compat.v1.get_variable("b", [1], initializer=tf.constant_initializer(0.0))
                    # output of the main residual branch
                    conv3 = tf.nn.conv2d(conv2, weights, strides=[1, 1, 1, 1], padding='SAME') + bias
                with tf.variable_scope('identity_conv'):
                    # shortcut projection 16 -> 1 so it can be added to the main branch
                    weights = tf.compat.v1.get_variable("w", [1, 1, 16, 1],
                                                        initializer=tf.truncated_normal_initializer(stddev=1e-3))
                    #weights = weights_spectral_norm(weights)
                    identity_conv = tf.nn.conv2d(block4_input, weights, strides=[1, 1, 1, 1], padding='SAME')
                # residual sum followed by tanh gives the fused image
                block4_output = tf.nn.tanh(conv3 + identity_conv)
                fusion_image = block4_output
        return fusion_image
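
To make the channel schedule of block2 to block4 concrete, the sketch below counts their trainable parameters from the get_variable shapes in the listing. The helper name residual_block_params is hypothetical and not part of the repository; it assumes each block consists of a biased 1x1 convolution, a biased 3x3 convolution, a biased 1x1 projection, and a bias-free 1x1 shortcut, as the listing shows.

```python
def residual_block_params(c_in, c_out):
    """Count trainable parameters of one block as laid out in the listing.

    Assumed layout: 1x1 conv (c_in -> c_in, bias), 3x3 conv (c_in -> c_in, bias),
    1x1 conv (c_in -> c_out, bias), bias-free 1x1 shortcut (c_in -> c_out).
    """
    conv1 = 1 * 1 * c_in * c_in + c_in
    conv2 = 3 * 3 * c_in * c_in + c_in
    conv3 = 1 * 1 * c_in * c_out + c_out
    shortcut = 1 * 1 * c_in * c_out  # identity_conv has no bias term
    return conv1 + conv2 + conv3 + shortcut


# Channel schedule of block2, block3, and block4 in the listing.
schedule = {"block2": (64, 32), "block3": (32, 16), "block4": (16, 1)}
params = {name: residual_block_params(*io) for name, io in schedule.items()}

for name, count in params.items():
    print(name, count)
print("total", sum(params.values()))
```

Under these assumptions the three blocks together hold fewer than 60k parameters, most of them in the 3x3 convolution of block2.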

    def STDFusion_model(self, vi_image, ir_image):
        # Full model: encode visible and infrared images, concatenate, decode the fused image
        with tf.variable_scope("STDFusion_model"):
            # extract visible features
            vi_feature = self.vi_feature_extraction_network(vi_image)
            # extract infrared features
            ir_feature = self.ir_feature_extraction_network(ir_image)
            # concatenate along the channel axis to form the joint feature
            feature = tf.concat([vi_feature, ir_feature], axis=-1)
            # decode and reconstruct the fused image
            f_image = self.feature_reconstruction_network(feature)
        return f_image

5.2 Sobel gradient operator (the ∇ in Eqs. (4) and (5))

def gradient(input):
    # Sobel kernels reshaped to [height, width, in_channels, out_channels]
    filter1 = tf.reshape(tf.constant([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]), [3, 3, 1, 1])
    filter2 = tf.reshape(tf.constant([[-1., -2., -1.], [0., 0., 0.], [1., 2., 1.]]), [3, 3, 1, 1])
    Gradient1 = tf.nn.conv2d(input, filter1, strides=[1, 1, 1, 1], padding='SAME')
    Gradient2 = tf.nn.conv2d(input, filter2, strides=[1, 1, 1, 1], padding='SAME')
    # sum of absolute horizontal and vertical responses
    Gradient = tf.abs(Gradient1) + tf.abs(Gradient2)
    return Gradient

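Source: utils.py.
filter1 and filter2 are the Sobel kernels for the horizontal and vertical gradients, matching the paper's statement "we employ the Sobel operator". The returned Gradient, the sum of the absolute values of the two responses, is the $\nabla I$ used in Eqs. (4) and (5).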
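
As a framework-free illustration of what gradient() computes, here is a minimal pure-Python sketch on a nested-list image. The names correlate_same and sobel_gradient are illustrative, not from the repository. The sketch assumes zero padding at the border and no kernel flip, which is how tf.nn.conv2d with padding='SAME' behaves for a 3x3 kernel at stride 1.

```python
SOBEL_X = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]
SOBEL_Y = [[-1.0, -2.0, -1.0], [0.0, 0.0, 0.0], [1.0, 2.0, 1.0]]


def correlate_same(image, kernel):
    """3x3 cross-correlation with zero padding (output size equals input size)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    y, x = i + di - 1, j + dj - 1
                    if 0 <= y < h and 0 <= x < w:  # pixels outside count as zero
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


def sobel_gradient(image):
    """|G_x| + |G_y|, mirroring the gradient() function above."""
    gx = correlate_same(image, SOBEL_X)
    gy = correlate_same(image, SOBEL_Y)
    return [[abs(a) + abs(b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]


# A 5x5 image with a vertical step edge between columns 1 and 2.
step = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
grad = sobel_gradient(step)
print(grad[2])  # [0.0, 4.0, 4.0, 0.0, 4.0]
```

The middle row responds at the step edge (columns 1 and 2). The last column also responds, because zero padding creates an artificial edge at the image border.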

5.3 Loss computation (Eqs. (2)-(6), with mask partitioning)

with tf.name_scope('g_loss'):
    # rescale the mask from [-1, 1] to [0, 1]
    self.ir_mask = (self.ir_mask + 1) / 2.0
    # per-pixel L1 terms: salient region vs. infrared, background vs. visible
    self.ir_p_loss_train = tf.multiply(self.ir_mask, tf.abs(self.fusion_images - self.ir_images))
    self.vi_p_loss_train = tf.multiply(1 - self.ir_mask, tf.abs(self.fusion_images - self.vi_images))
    # gradient L1 terms on the same two regions
    self.ir_grad_loss_train = tf.multiply(self.ir_mask, tf.abs(gradient(self.fusion_images) - gradient(self.ir_images)))
    self.vi_grad_loss_train = tf.multiply(1 - self.ir_mask, tf.abs(gradient(self.fusion_images) - gradient(self.vi_images)))

    self.ir_p_loss = tf.reduce_mean(self.ir_p_loss_train)
    self.vi_p_loss = tf.reduce_mean(self.vi_p_loss_train)
    self.ir_grad_loss = tf.reduce_mean(self.ir_grad_loss_train)
    self.vi_grad_loss = tf.reduce_mean(self.vi_grad_loss_train)
    # background terms weighted by 1, salient terms by alpha = 7
    self.g_loss_2 = 1 * self.vi_p_loss + 1 * self.vi_grad_loss + 7 * self.ir_p_loss + 7 * self.ir_grad_loss

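Source: model.py.
(mask+1)/2 maps the mask from [-1,1] back to a binary salient-region map in {0,1}; 1 - ir_mask is the background ("inverted") mask. The four terms are the L1 pixel and gradient losses on the salient and background regions, Eqs. (2)-(5), with reduce_mean implementing the $\frac{1}{HW}\|\cdot\|_1$ normalization (averaged over the batch as well). In g_loss_2 the salient-region terms are weighted by $\alpha=7$ and the background terms by 1, matching Eq. (6).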
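
The weighting in g_loss_2 can be checked on a toy example. The following is a minimal sketch in plain Python, not the repository's code: masked_l1_mean and std_fusion_loss are illustrative names, images are flat lists of four pixels, and the gradient maps are passed in precomputed (all zeros here) so that only the pixel terms matter.

```python
def masked_l1_mean(mask, a, b):
    """Mean over all pixels of mask * |a - b| (flat lists of equal length)."""
    return sum(m * abs(x - y) for m, x, y in zip(mask, a, b)) / len(mask)


def std_fusion_loss(fused, ir, vi, grad_fused, grad_ir, grad_vi, mask, alpha=7.0):
    """Weighted sum of the four masked terms, following g_loss_2 above."""
    background = [1.0 - m for m in mask]
    ir_pixel = masked_l1_mean(mask, fused, ir)
    vi_pixel = masked_l1_mean(background, fused, vi)
    ir_grad = masked_l1_mean(mask, grad_fused, grad_ir)
    vi_grad = masked_l1_mean(background, grad_fused, grad_vi)
    return (vi_pixel + vi_grad) + alpha * (ir_pixel + ir_grad)


mask = [1.0, 0.0, 0.0, 0.0]          # one salient pixel out of four
ir = [1.0, 0.0, 0.0, 0.0]
vi = [0.0, 0.5, 0.25, 0.5]
zeros = [0.0] * 4                    # placeholder gradient maps

# Desired result of Eq. (1): infrared inside the mask, visible outside.
desired = [m * a + (1.0 - m) * b for m, a, b in zip(mask, ir, vi)]

loss_desired = std_fusion_loss(desired, ir, vi, zeros, zeros, zeros, mask)
loss_copy_vi = std_fusion_loss(vi, ir, vi, zeros, zeros, zeros, mask)
loss_copy_ir = std_fusion_loss(ir, ir, vi, zeros, zeros, zeros, mask)
print(loss_desired, loss_copy_vi, loss_copy_ir)  # 0.0 1.75 0.3125
```

In this toy case, missing the single salient pixel (copying the visible image) costs more than missing all three background pixels (copying the infrared image). That is the intended effect of the larger salient-region weight.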

5.4 Training procedure ("Training Details")

# hyperparameter flags (train.py)
flags.DEFINE_integer("epoch", 30, "Number of epoch [10]")
flags.DEFINE_integer("batch_size", 32, "The size of batch images [128]")
flags.DEFINE_integer("image_size", 128, "The size of image to use [33]")
flags.DEFINE_integer("stride", 24, "The size of stride to apply input image [14]")
flags.DEFINE_float("learning_rate", 1e-3, "The learning rate of gradient descent algorithm [1e-4]")
# Adam optimizer over the generator variables
with tf.name_scope('train_step'):
    self.train_generator_op = tf.train.AdamOptimizer(config.learning_rate).minimize(self.g_loss_total, var_list=self.g_vars)
# training loop (model.py)
for ep in range(config.epoch):
    # constant learning rate before decay_epoch, linear decay afterwards
    lr = self.init_lr if ep < self.decay_epoch else self.init_lr * (config.epoch - ep) / (config.epoch - self.decay_epoch)
    batch_idxs = len(train_data_ir) // config.batch_size
    for idx in range(0, batch_idxs):
        # slice one mini-batch of visible, infrared, and mask patches
        batch_vi_images = train_data_vi[idx * config.batch_size: (idx + 1) * config.batch_size]
        batch_ir_images = train_data_ir[idx * config.batch_size: (idx + 1) * config.batch_size]
        batch_ir_mask = train_data_ir_mask[idx * config.batch_size: (idx + 1) * config.batch_size]
        # rescale the mask from [-1, 1] to [0, 1]
        batch_ir_mask = (batch_ir_mask + 1.0) / 2.0
        # one optimization step; also fetch the individual loss terms
        _, err_g, batch_vi_p_loss, batch_ir_p_loss, batch_vi_grad_loss, batch_ir_grad_loss, summary_str = self.sess.run(
            [self.train_generator_op, self.g_loss_total, self.vi_p_loss, self.ir_p_loss,
             self.vi_grad_loss, self.ir_grad_loss, self.summary_op],
            feed_dict={self.vi_images: batch_vi_images, self.ir_images: batch_ir_images,
                       self.ir_mask: batch_ir_mask, self.lr: lr})

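Source: train.py (hyperparameters) and model.py (training loop).
epoch=30, batch_size=32, learning_rate=1e-3, stride=24, and image_size=128 match the training settings reported in the paper, and the Adam optimizer corresponds to the paper's "TensorFlow + Adam". The training set consists of 20 TNO image pairs, which the cropping in 5.1 turns into 6921 patch pairs; at test time the images are not cropped and no mask is fed.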
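
The learning-rate expression in the loop is easier to read as a standalone function. The sketch below reproduces only that formula. The value decay_epoch=15 is an assumption for illustration, since the listing does not show how self.decay_epoch is set, nor where the fed self.lr is consumed (the optimizer shown is built with config.learning_rate).

```python
def learning_rate(ep, init_lr, total_epochs, decay_epoch):
    """Constant until decay_epoch, then linear decay towards zero."""
    if ep < decay_epoch:
        return init_lr
    return init_lr * (total_epochs - ep) / (total_epochs - decay_epoch)


# decay_epoch=15 is a hypothetical value, not taken from the repository.
schedule = [learning_rate(ep, 1e-3, 30, 15) for ep in range(30)]

for ep in (0, 14, 15, 22, 29):
    print(ep, schedule[ep])
```

With these values the rate stays at 1e-3 through epoch 15 and then shrinks linearly, reaching 1e-3/15 in the last epoch.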

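6. Experimental setup (Datasets / Metrics / Training Details)

6.1 Datasets (TNO & RoadScene)

The paper uses two datasets in its experiments: TNO and RoadScene.
TNO: contains 60 infrared/visible image pairs together with three sequences of 19, 23, and 32 image pairs (the sequence sizes sum to 74, so they are counted separately from the 60 pairs); Fig. 4 shows typical source images with their corresponding masks.
RoadScene: released by Xu et al. and built from FLIR videos, it contains 221 aligned infrared/visible image pairs covering roads, vehicles, and pedestrians, and is described as alleviating the problems of few samples and low resolution.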

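6.2 Metrics (EN / MI / VIF / SF)

The paper selects four widely used metrics, entropy (EN), mutual information (MI), visual information fidelity (VIF), and spatial frequency (SF), gives their defining formulas, and notes that objective evaluation complements subjective evaluation.
For SF, the paper states that a larger value means the fused image contains richer texture and detail, and hence indicates better fusion performance.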
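
Two of the four metrics are simple enough to sketch directly. The code below uses the common textbook definitions, EN as the Shannon entropy of the grey-level histogram and SF as the root of the summed squared row and column frequencies. It is an illustrative sketch on nested lists, not the evaluation code used in the paper, whose normalization may differ.

```python
import math
from collections import Counter


def entropy(image):
    """Shannon entropy (bits) of the grey-level histogram: -sum p * log2(p)."""
    pixels = [v for row in image for v in row]
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def spatial_frequency(image):
    """SF = sqrt(RF^2 + CF^2) from horizontal and vertical first differences."""
    h, w = len(image), len(image[0])
    row_sq = sum((image[i][j] - image[i][j - 1]) ** 2
                 for i in range(h) for j in range(1, w))
    col_sq = sum((image[i][j] - image[i - 1][j]) ** 2
                 for i in range(1, h) for j in range(w))
    return math.sqrt((row_sq + col_sq) / (h * w))


flat = [[5, 5], [5, 5]]              # constant image: no information, no texture
checker = [[0, 255], [255, 0]]       # two grey levels, maximal local contrast

print(entropy(checker), spatial_frequency(checker))  # 1.0 255.0
```

The constant image scores zero on both metrics, while the checkerboard reaches 1 bit of entropy and a spatial frequency of 255. This matches the reading that larger values indicate more information and richer texture.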

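6.3 Training details

Training: the model is trained on 20 TNO image pairs; to obtain more training data the images are cropped with stride 24 into 128×128 patches, giving 6921 patch pairs.
Testing: 20 TNO pairs are used for the comparative experiments and 20 RoadScene pairs for the generalization experiments; at test time the source images are fed to the network directly, without cropping.
Normalization and optimization: source images are normalized to [-1,1]; the optimizer is Adam and the implementation uses TensorFlow; the batch size is 32, the number of iterations is 30 (the epoch flag in 5.4), and the learning rate is 1e-3.
Choice of $\alpha$: the paper observes that salient regions occupy only a small fraction of the infrared image, so $\alpha=7$ is used to balance the salient and background losses. Hardware: NVIDIA TITAN V GPU and a 2.00-GHz Intel Xeon Gold 5117 CPU.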
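
The cropping and normalization steps above can be sketched as follows. This assumes a plain sliding window that keeps only patches lying fully inside the image, and an 8-bit input range. The function names and the 576×768 example size are illustrative, and the repository's actual cropping code may differ in detail.

```python
def count_patches(height, width, patch=128, stride=24):
    """Number of sliding-window patches that fit fully inside the image."""
    if height < patch or width < patch:
        return 0
    rows = (height - patch) // stride + 1
    cols = (width - patch) // stride + 1
    return rows * cols


def normalize(pixel):
    """Map an 8-bit intensity in [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1.0


# Hypothetical image size, used only to illustrate the arithmetic.
print(count_patches(576, 768))        # 19 rows * 27 columns = 513
print(normalize(0), normalize(255))   # -1.0 1.0
```

Under this assumption a single image of that size already yields several hundred patches, which is how 20 image pairs can produce thousands of training patches.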

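7. Comparative experiments

7.1 Compared methods (9)

The paper compares against nine methods: the traditional methods GTF and MDLatLRR, and the deep learning methods DenseFuse, NestFuse, FusionGAN, GANMcC, IFCNN, PMGI, and U2Fusion. It states that their implementations are publicly available and that parameters follow the original papers.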

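7.2 TNO: qualitative results (Figs. 5-8) and the paper's observations

The paper selects four representative TNO image pairs (bench, Kaptein_1123, Kaptein_1654, Tree_4915) for qualitative evaluation, and in Fig. 5 marks the salient regions with red boxes and enlarges them for comparison.
For Fig. 5, the paper notes that MDLatLRR loses thermal target information, while DenseFuse, IFCNN, and U2Fusion retain it but suffer from severe noise contamination that originates from the visible image.
Across the four scenes, the paper concludes that STDFusionNet both highlights the salient targets and has a clear advantage in preserving background texture details. In Kaptein_1123 the tree-branch texture is the sharpest and the sky is not contaminated by thermal radiation; in Kaptein_1654 the street lamp in the background is almost identical to the visible image; in Tree_4915 other methods can barely separate the shrubs from the background, whereas STDFusionNet highlights the infrared target and keeps the shrubs distinguishable.
The paper attributes this selective preservation of infrared salient targets and visible texture details mainly to the manually extracted salient target masks used during training and to the loss function built on them.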

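7.3 TNO: quantitative results (Fig. 9 + Table I) and the paper's interpretation of the metrics

The caption of Fig. 9 explains that EN, MI, VIF, and SF are compared as curves over 20 TNO image pairs: a point (x, y) on a curve means that (100*x)% of the image pairs have a metric value no greater than y. It also lists the nine compared methods.
Quantitative conclusion on TNO: of the four metrics, STDFusionNet has a clear advantage on EN, MI, and VIF, and trails IFCNN on SF by only a narrow margin.
The paper stresses that STDFusionNet achieves the highest VIF on almost all image pairs, which agrees with the qualitative evaluation and indicates better visual quality. It further explains that the largest EN means the fused images carry more information and the largest MI means more information is transferred from the source images; SF is not the best, but the comparable result indicates that the fused images contain sufficient gradient information.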
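
The curve convention in the Fig. 9 caption, where a point (x, y) means that a fraction x of the image pairs scores no more than y, is an empirical cumulative distribution. A minimal sketch of how such points could be produced follows; the scores are made-up numbers, not values from the paper.

```python
def cumulative_curve(values):
    """Return points (x, y): a fraction x of the samples has a value <= y."""
    ordered = sorted(values)
    n = len(ordered)
    return [((k + 1) / n, v) for k, v in enumerate(ordered)]


scores = [6.9, 7.2, 6.5, 7.0]  # hypothetical per-image metric values
curve = cumulative_curve(scores)
print(curve)  # [(0.25, 6.5), (0.5, 6.9), (0.75, 7.0), (1.0, 7.2)]
```

Read this way, a method whose curve lies further to the right (larger y at the same x) scores higher on more image pairs.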

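7.4 Generalization experiment (RoadScene): fusion strategy for color visible images and the paper's observations

Setup: the model trained on TNO is tested on RoadScene to evaluate its generalization ability.
Because the visible images in RoadScene are in color, the paper uses a color-preserving fusion strategy: convert RGB to YCbCr, fuse the Y channel with the grayscale infrared image, then apply the inverse transform with the visible image's Cb and Cr channels to recover the RGB fused result.
Observations on Figs. 10-13: STDFusionNet selectively preserves the useful information of both modalities. Its fused images are very close to the infrared image in the salient regions and retain the visible texture structure almost completely in the background. Other methods do highlight the targets, but their fused backgrounds are described as highly unsatisfactory; for example, the sky is heavily contaminated by thermal information, which hampers judging time of day and weather. Other methods also preserve background details such as wall lettering, vehicles, tree stumps, fences, and street lamps poorly, whereas STDFusionNet keeps these details while maintaining or enhancing target contrast.

Quantitative conclusion on RoadScene: STDFusionNet has the best average MI, VIF, and SF, and trails NestFuse on EN by only a narrow margin. The paper takes this as evidence of good generalization with little dependence on the characteristics of the imaging sensor.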
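
The color-preserving strategy described above (RGB to YCbCr, fuse Y with the infrared image, invert with the original Cb/Cr) can be sketched per pixel. The conversion uses the full-range BT.601 constants common in JPEG, which is an assumption, since the paper summary does not state which YCbCr variant is used. fuse_y is a hypothetical stand-in for the fusion network.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion (values in [0, 255])."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr; results may need clipping to [0, 255]."""
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return r, g, b


def fuse_color_pixel(rgb, ir, fuse_y):
    """Fuse only the luminance with the infrared value, keep the visible chroma."""
    y, cb, cr = rgb_to_ycbcr(*rgb)
    return ycbcr_to_rgb(fuse_y(y, ir), cb, cr)


visible = (200.0, 100.0, 50.0)
# Identity "fusion" returns the visible pixel (up to rounding of the constants).
unchanged = fuse_color_pixel(visible, 0.0, lambda y, ir: y)
# A toy max rule stands in for the network on a hot infrared pixel.
fused = fuse_color_pixel(visible, 255.0, lambda y, ir: max(y, ir))
print(unchanged)
print(rgb_to_ycbcr(*fused))
```

Only the luminance of the fused pixel changes; its Cb and Cr stay those of the visible image, so the color is preserved. A real pipeline would additionally clip the RGB result to the valid range.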

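7.5 Efficiency comparison (Table II) and the paper's conclusion

The paper notes that running efficiency is also an important factor. Table II reports the average running time of each method on TNO and RoadScene: deep learning methods benefit from GPU acceleration, STDFusionNet in particular, while traditional methods take longer, with MDLatLRR being especially slow because of its decomposition process.
Conclusion: STDFusionNet has the smallest average running time and the smallest standard deviation on both datasets, which the paper takes to show that the network is robust to source images of different resolutions and that its architecture is efficient.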

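8. Visualizing "salient target detection" (Fig. 15)

The paper states that STDFusionNet implicitly performs salient target detection and provides a visualization: the salient regions of the infrared images alongside the difference obtained by subtracting the visible background region from the fused result.
It observes that the difference maps largely coincide with the infrared salient regions, and that the method even detects some additional thermally salient targets, which supports the claim of implicit salient target detection.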
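
Why the difference map should resemble the infrared salient region follows from Eq. (1): if the fused image equals the desired result, subtracting the visible background leaves exactly the masked infrared content. The sketch below checks this on flat four-pixel lists. Using the training mask to delimit the background is an assumption made for illustration, since this summary does not specify how the paper computes the difference.

```python
def desired_result(mask, ir, vi):
    """Eq. (1): I_d = I_m * I_ir + (1 - I_m) * I_vi, element-wise."""
    return [m * a + (1.0 - m) * b for m, a, b in zip(mask, ir, vi)]


def salient_difference(fused, vi, mask):
    """Fused image minus the visible background region (1 - mask) * vi."""
    return [f - (1.0 - m) * b for f, m, b in zip(fused, mask, vi)]


mask = [1.0, 0.0, 0.0, 1.0]
ir = [0.75, 0.5, 0.25, 1.0]
vi = [0.25, 0.5, 0.125, 0.0]

fused = desired_result(mask, ir, vi)
diff = salient_difference(fused, vi, mask)
print(diff)  # [0.75, 0.0, 0.0, 1.0]
```

For an ideal fusion the difference is nonzero only where the mask is 1 and equals the infrared intensities there. A trained network deviates from this ideal, which is consistent with the extra detected targets mentioned above.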

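9. Ablation studies (Fig. 16 + Table III): desired-information definition and gradient loss

9.1 w/o desired information (removing the desired-information definition)

To validate the desired-information definition, two models are trained on TNO that differ mainly in whether the salient target mask enters the loss. Without the mask there is no need to distinguish salient and background regions, so $\alpha$ is set to 1.
In Fig. 16 the paper reports that with the desired-information definition, STDFusionNet highlights the salient targets and maintains the background texture. Without it, the network fuses in a "coarse manner", and neither the thermal radiation information in the salient regions nor the background texture is well preserved.

9.2 w/o gradient loss (removing the gradient loss)

The paper reports alongside Fig. 16 that without the gradient loss the salient regions contain almost no texture, the shapes of the salient targets are severely distorted, and artifacts appear in the background. In Table III all metrics except SF decline. The paper therefore stresses the importance of the gradient loss, which ensures the texture sharpness of salient targets in the fused image.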

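10. Conclusion

The paper proposes STDFusionNet and explicitly defines the desired information in infrared-visible fusion as "infrared salient regions + visible background regions"; on this basis the salient target mask is introduced into the loss to guide network optimization precisely.
The model implicitly performs salient target detection and information fusion, and its fused results contain both salient thermal targets and rich background texture. Extensive qualitative and quantitative experiments support its superiority, and it also runs faster than the compared methods.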
