Caffe data layer: implementing multi-label input by adding a custom data layer

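I recently ran into a sequence-learning problem (CRNN) in which each image corresponds to multiple labels. Caffe does not support multi-label input out of the box: its standard image data layers (Data, which reads `Datum` records from LMDB/LevelDB, and ImageData, which reads a list file) carry only a single integer label per image.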




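One alternative is HDF5 plus a Slice layer: the HDF5Data layer can feed a multi-dimensional label blob, and a Slice layer can split that blob into separate label blobs. Caffe requires each HDF5 file to be smaller than 2 GB, so a large dataset has to be written as several HDF5 files. See also: "Generating HDF5 files for multi-label training".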
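For the HDF5 route, the main bookkeeping is splitting the data so that no single file exceeds the size limit. The sketch below is illustrative only: `shard_ranges` and `write_hdf5_shards` are hypothetical helper names, it assumes float32 `data` of shape (N, C, H, W) and `labels` of shape (N, label_size) already in memory, and the writer uses h5py, imported inside the function so the shard arithmetic runs without it. The dataset names `data` and `label` must match the `top` names of the HDF5Data layer, because that layer reads datasets by top name.

```python
import numpy as np

MAX_BYTES = 2 * 1024 ** 3  # per-file limit mentioned above (2 GB)


def shard_ranges(num_samples, bytes_per_sample, max_bytes=MAX_BYTES):
    """Return (start, end) index pairs so that each shard stays below max_bytes."""
    per_file = max(1, (max_bytes - 1) // bytes_per_sample)
    return [(s, min(s + per_file, num_samples))
            for s in range(0, num_samples, per_file)]


def write_hdf5_shards(data, labels, prefix, list_path):
    """Write data/labels into several .h5 files and list them in list_path.

    data:   array of shape (N, C, H, W)
    labels: array of shape (N, label_size)
    """
    import h5py  # imported here so shard_ranges works without h5py installed

    bytes_per_sample = data[0].astype(np.float32).nbytes + labels[0].astype(np.float32).nbytes
    paths = []
    for k, (start, end) in enumerate(shard_ranges(len(data), bytes_per_sample)):
        path = "%s_%d.h5" % (prefix, k)
        with h5py.File(path, "w") as f:
            # dataset names must match the HDF5Data layer's top names
            f.create_dataset("data", data=data[start:end].astype(np.float32))
            f.create_dataset("label", data=labels[start:end].astype(np.float32))
        paths.append(path)
    # HDF5Data's `source` is a text file with one .h5 path per line
    with open(list_path, "w") as f:
        f.write("\n".join(paths) + "\n")
    return paths
```

With the label blob loaded this way, a Slice layer along axis 1 can then split it into one blob per label position.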


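My own takeaway: a new data layer can reasonably be written in Python, because it is simple and quick to develop and usually does not noticeably slow down training; new computation layers should still be written in C++.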


My approach: use Python to convert the data into LMDB format, then declare a Python-module layer in the prototxt that reads the LMDB data converted earlier.


1. Data preparation

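Data preparation is the same as for single-label classification, except that each image now has several labels. In train.txt and val.txt, put the image path first and separate the labels with spaces, for example: image1.jpg label1 label2 label3 label4 (the data layer below parses each label as an integer).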
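Each line therefore holds an image path followed by a variable number of labels. A minimal parsing sketch, assuming integer labels; the helper name `parse_label_file` is hypothetical. It also reports the longest label sequence, which is a useful figure when choosing the fixed label width of the data layer.

```python
import io


def parse_label_file(f):
    """Parse lines like 'image1.jpg 1 2 3 4' into (path, [int labels]) tuples."""
    samples = []
    for line in f:
        parts = line.split()  # split() also tolerates repeated spaces and tabs
        if not parts:
            continue  # skip blank lines
        samples.append((parts[0], [int(x) for x in parts[1:]]))
    return samples


if __name__ == "__main__":
    demo = io.StringIO("image1.jpg 1 2 3 4\nimage2.jpg 5 6\n\n")
    samples = parse_label_file(demo)
    print(samples)  # [('image1.jpg', [1, 2, 3, 4]), ('image2.jpg', [5, 6])]
    print("max labels per image:", max(len(labels) for _, labels in samples))  # 4
```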

2. Converting the data to LMDB


```python
# -*- coding: utf-8 -*-
import os
import random
import sys

import cv2
import lmdb

train_path = 'train.txt'  # training-set label file
val_path = 'val.txt'      # validation-set label file
train_lmdb = '/path/to/your/data_train_lmdb'  # output directory for the training LMDB
val_lmdb = '/path/to/your/data_val_lmdb'      # output directory for the validation LMDB


def load_txt(txt, shuffle):
    """Read a label file into a list of [path, label1, label2, ...] rows."""
    if txt is None:
        sys.exit("txt path is None")
    if not os.path.exists(txt):
        sys.exit("the txt file does not exist: %s" % txt)
    # store each non-empty line as a list of space-separated fields
    file_content = []
    with open(txt, 'r') as fr:
        for line in fr:
            line = line.strip()
            if line:
                file_content.append(line.split())
    # shuffle the samples
    if shuffle:
        random.shuffle(file_content)
    return file_content


if __name__ == '__main__':
    # This processes the validation set. To build the training set, change
    # val_path/val_lmdb to train_path/train_lmdb and run the script again.
    content = load_txt(val_path, True)
    env = lmdb.open(val_lmdb, map_size=int(1e12))
    n = 0  # samples actually written; keys must stay consecutive for the data layer
    with env.begin(write=True) as txn:
        for row in content:
            pic_path = row[0]
            # read as 8-bit grayscale; cv2.imread returns None if the file cannot be read
            img = cv2.imread(pic_path, cv2.IMREAD_GRAYSCALE)
            if img is None:
                print('skipping unreadable image: %s' % pic_path)
                continue
            # LMDB is a key-value store: keys are ASCII strings, values are bytes.
            # The image is stored JPEG-encoded (lossy; use '.png' for lossless).
            ok, buf = cv2.imencode('.jpg', img)
            if not ok:
                print('failed to encode: %s' % pic_path)
                continue
            txn.put(('image-%d' % n).encode('ascii'), buf.tobytes())
            # the labels are stored space-separated and split on spaces when read back
            multi_labels = ' '.join(row[1:])
            txn.put(('label-%d' % n).encode('ascii'), multi_labels.encode('ascii'))
            n += 1
        # total sample count, read by the data layer
        txn.put(b'num-samples', str(n).encode('ascii'))
    env.close()
    print(n)
```
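Before wiring the database into a network, it can help to read entries back and confirm that the key scheme (`image-N`, `label-N`, `num-samples`) is what the data layer will expect. The sketch below takes any `get(key)` callable returning bytes or None, so it can be used with an LMDB read transaction's `txn.get` (for example inside `with lmdb.open(path, readonly=True).begin() as txn:`) and, for testing, with a plain dict's `get`. The function name `check_lmdb_samples` is hypothetical.

```python
def check_lmdb_samples(get, label_size=None):
    """Check the key layout written by the conversion script.

    get: callable mapping a bytes key to bytes (or None if the key is missing),
         e.g. txn.get from py-lmdb, or dict.get for testing.
    Returns the number of samples; raises ValueError on any inconsistency.
    """
    raw = get(b"num-samples")
    if raw is None:
        raise ValueError("missing 'num-samples' key")
    num = int(bytes(raw).decode("ascii"))
    for i in range(num):
        if get(("image-%d" % i).encode("ascii")) is None:
            raise ValueError("missing image-%d" % i)
        label_raw = get(("label-%d" % i).encode("ascii"))
        if label_raw is None:
            raise ValueError("missing label-%d" % i)
        labels = [int(x) for x in bytes(label_raw).decode("ascii").split()]
        # the data layer stores labels in a fixed-width array, so check the width
        if label_size is not None and len(labels) > label_size:
            raise ValueError("label-%d has %d labels, more than %d"
                             % (i, len(labels), label_size))
    return num
```

Keeping the keys consecutive from 0 is what lets the data layer address samples by index without scanning the database.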


3. Defining the data layer




A Python layer subclasses `caffe.Layer` and implements `setup`, `reshape`, `forward` and `backward`. Here is the loss-layer example that ships with pycaffe:

```python
# An example loss layer: EuclideanLossLayer from Caffe's pycaffe examples
import caffe
import numpy as np


class EuclideanLossLayer(caffe.Layer):
    """
    Compute the Euclidean Loss in the same manner as the C++ EuclideanLossLayer
    to demonstrate the class interface for developing layers in Python.
    """

    # set up parameters; called once when the net is built
    def setup(self, bottom, top):
        # check input pair
        if len(bottom) != 2:
            raise Exception("Need two inputs to compute distance.")

    def reshape(self, bottom, top):
        # check input dimensions match
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        # difference is shape of inputs
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        # loss output is scalar
        top[0].reshape(1)

    # forward pass
    def forward(self, bottom, top):
        self.diff[...] = bottom[0].data - bottom[1].data
        top[0].data[...] = np.sum(self.diff**2) / bottom[0].num / 2.

    # backward pass
    def backward(self, top, propagate_down, bottom):
        for i in range(2):
            if not propagate_down[i]:
                continue
            if i == 0:
                sign = 1
            else:
                sign = -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num
```
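To see exactly what `forward` and `backward` compute, the same math can be checked outside Caffe. The following NumPy-only sketch is not part of pycaffe (`euclidean_loss` and `numeric_grad` are illustrative names): it reproduces the loss Σ(a − b)² / (2N) and the gradient (a − b) / N that `backward` writes into `bottom[0].diff`, then compares that gradient with central finite differences.

```python
import numpy as np


def euclidean_loss(a, b):
    """Loss and gradient w.r.t. a, using the same formulas as EuclideanLossLayer."""
    n = a.shape[0]               # corresponds to bottom[0].num, the batch size
    diff = a - b
    loss = np.sum(diff ** 2) / n / 2.0
    grad_a = diff / n            # sign = +1 for bottom[0]; bottom[1] would get -diff / n
    return loss, grad_a


def numeric_grad(a, b, eps=1e-6):
    """Central finite differences of the loss with respect to each entry of a."""
    grad = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        a_plus, a_minus = a.copy(), a.copy()
        a_plus[idx] += eps
        a_minus[idx] -= eps
        grad[idx] = (euclidean_loss(a_plus, b)[0]
                     - euclidean_loss(a_minus, b)[0]) / (2 * eps)
    return grad


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    a, b = rng.randn(4, 3), rng.randn(4, 3)
    loss, grad = euclidean_loss(a, b)
    print(loss, np.allclose(grad, numeric_grad(a, b), atol=1e-5))
```

The sketch uses float64 so the finite-difference comparison is not dominated by rounding error.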

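Now that we know what the interface looks like, we can model our own layer on it. First, though, let's see how parameters are defined in the prototxt, since that determines what we can pass into the data layer. Here is the official definition from caffe.proto: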

4. Defining the prototxt

```protobuf
message PythonParameter {
  optional string module = 1;
  optional string layer = 2;
  // This value is set to the attribute `param_str` of the `PythonLayer` object
  // in Python before calling the `setup()` method. This could be a number,
  // string, dictionary in Python dict format, JSON, etc. You may parse this
  // string in `setup` method and use it in `forward` and `backward`.
  optional string param_str = 3 [default = '']; // key field: this is how we tell our layer how to read the LMDB data
  // DEPRECATED
  optional bool share_in_parallel = 4 [default = false];
}
```



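The data layer is then declared in the network prototxt as follows (`module` is the Python file name without `.py`, `layer` is the class name):

```protobuf
layer {
  name: "data"
  type: "Python"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  python_param {
    module: "datalayer"
    layer: "crnndatalayer"
    param_str: "{'data' : '/path/to/your/data_train_lmdb', 'batch_size' : 128}"
  }
}
```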
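The `param_str` value is just a string; the layer decides how to interpret it in `setup`. Since the string above is a Python dict literal, `ast.literal_eval` parses it without executing arbitrary code, unlike `eval`. A minimal sketch with required-key checking; `parse_param_str` and the default required keys are illustrative:

```python
import ast


def parse_param_str(param_str, required=("data", "batch_size")):
    """Parse a param_str holding a Python dict literal and check required keys."""
    params = ast.literal_eval(param_str)
    if not isinstance(params, dict):
        raise ValueError("param_str must be a dict literal, got %r" % type(params))
    missing = [k for k in required if k not in params]
    if missing:
        raise ValueError("param_str is missing keys: %s" % ", ".join(missing))
    return params


if __name__ == "__main__":
    p = parse_param_str("{'data' : '/path/to/your/data_train_lmdb', 'batch_size' : 128}")
    print(p["data"], int(p["batch_size"]))  # /path/to/your/data_train_lmdb 128
```

`json.loads` would reject this particular string because it uses single quotes; JSON would need `{"data": "...", "batch_size": 128}` instead.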




The layer itself goes in datalayer.py (the file name must match the `module` field above, and the class name the `layer` field):

```python
# datalayer.py
import ast
import random
import timeit

import caffe
import cv2
import lmdb
import numpy as np


class crnndatalayer(caffe.Layer):

    def setup(self, bottom, top):
        # read the parameters passed through param_str in the prototxt
        params = ast.literal_eval(self.param_str)
        self.env = lmdb.open(params['data'], readonly=True, lock=False)
        self.txn = self.env.begin()
        # total number of samples, stored when the LMDB was generated
        self.max_num = int(self.txn.get(b'num-samples').decode('ascii'))
        self.batch_size = int(params['batch_size'])
        # input size and labels per image: placeholder defaults, set them
        # in param_str to match your own data
        self.height = int(params.get('height', 32))
        self.width = int(params.get('width', 100))
        self.label_size = int(params.get('label_size', 25))
        # two tops: data and label
        if len(top) != 2:
            raise Exception("Need to define two tops: data and label.")
        # data layers have no bottoms
        if len(bottom) != 0:
            raise Exception("Do not define a bottom.")

    def reshape(self, bottom, top):
        # load a batch of image/label pairs
        start = timeit.default_timer()
        self.data, self.label = self.load_data()
        end = timeit.default_timer()
        # print('time used for reshape', end - start)
        # reshape tops to fit the batch: (N, C, H, W) and (N, label_size)
        top[0].reshape(*self.data.shape)
        top[1].reshape(*self.label.shape)

    def forward(self, bottom, top):
        # assign output
        top[0].data[...] = self.data
        top[1].data[...] = self.label

    # a data layer has nothing to back-propagate
    def backward(self, top, propagate_down, bottom):
        pass

    def load_data(self):
        # read a contiguous block of samples starting at a random position
        rnd = random.randint(0, self.max_num - 1)
        # batch buffer: (batch size, channel, height, width); grayscale, so channel = 1
        img_list = np.zeros((self.batch_size, 1, self.height, self.width),
                            dtype=np.float32)
        # label buffer: (batch size, label_size = max number of labels per image);
        # positions beyond an image's label count keep the initial value 1
        label_seq = np.ones((self.batch_size, self.label_size), dtype=np.float32)
        j = 0
        i = 0
        # print('loading data ...')
        while i < self.batch_size:
            idx = (rnd + j) % self.max_num  # wrap around at the end of the database
            imagekey = ('image-%d' % idx).encode('ascii')
            labelkey = ('label-%d' % idx).encode('ascii')
            try:
                img_array = np.frombuffer(self.txn.get(imagekey), dtype=np.uint8)
                imgdata = cv2.imdecode(img_array, cv2.IMREAD_GRAYSCALE)
                # resize to the network input size; cv2.resize takes (width, height)
                image = cv2.resize(imgdata, (self.width, self.height))
                # scale pixels from [0, 255] to roughly [-1, 1)
                image = (image - 128.0) / 128
                img_list[i] = image
                label = self.txn.get(labelkey).decode('ascii')
                label_list = [int(_) for _ in label.split()]
                # put the labels into the array in order
                label_seq[i, :len(label_list)] = label_list
                i += 1
            except Exception as e:
                print(e)
            j += 1
        # print('data loaded')
        return img_list, label_seq
```
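The batching logic in `load_data` can be exercised without Caffe, LMDB or OpenCV. The sketch below isolates the NumPy steps: choosing a contiguous window of sample indices from a random start (wrapping around with a modulo), packing space-separated label strings into a fixed `(batch_size, label_size)` float32 array, and the pixel normalization. `window_indices`, `pack_labels` and the default padding value 0 are assumptions for illustration; the right padding value depends on what your loss layer expects.

```python
import random

import numpy as np


def window_indices(max_num, batch_size, rng=random):
    """Indices of a contiguous batch starting at a random position, wrapping around."""
    start = rng.randint(0, max_num - 1)
    return [(start + j) % max_num for j in range(batch_size)]


def pack_labels(label_strings, label_size, pad_value=0):
    """Turn space-separated label strings into a (batch, label_size) float32 array."""
    out = np.full((len(label_strings), label_size), pad_value, dtype=np.float32)
    for i, s in enumerate(label_strings):
        labels = [int(x) for x in s.split()]
        if len(labels) > label_size:
            raise ValueError("sample %d has %d labels > label_size=%d"
                             % (i, len(labels), label_size))
        out[i, :len(labels)] = labels  # unused trailing positions keep pad_value
    return out


def normalize(image):
    """Map uint8 pixels in [0, 255] to roughly [-1, 1), as load_data does."""
    return (image.astype(np.float32) - 128.0) / 128.0
```

Raising on an over-long label sequence makes the problem visible, whereas the `try/except` in `load_data` silently skips such samples.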

5. Rebuilding Caffe

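Because we added a new Python module, the directory that contains it must be on the PYTHONPATH environment variable; otherwise Caffe will fail to find the module.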

```shell
vim ~/.bash_profile
# add the following line, using the directory that contains datalayer.py
export PYTHONPATH=$PYTHONPATH:/path/to/your/datalayer_dir
source ~/.bash_profile
```
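A quick way to confirm the export took effect is to ask Python whether it can locate the module, from the same shell that will launch Caffe. A minimal sketch using the standard library's `importlib.util.find_spec` (Python 3); `locate_module` is a hypothetical helper name, and `datalayer` is the `module` value from the prototxt above.

```python
import importlib.util
import os


def locate_module(name):
    """Return the file Python would import for `name`, or None if it is not on the path."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    path = locate_module("datalayer")
    if path is None:
        print("datalayer not found; PYTHONPATH is", os.environ.get("PYTHONPATH", "(unset)"))
    else:
        print("datalayer found at", path)
```

`find_spec` only locates a top-level module without executing it, so a syntax error in datalayer.py will still surface later when Caffe imports it.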


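Caffe itself must also be compiled with Python layer support, otherwise a `type: "Python"` layer cannot be created. Either uncomment `WITH_PYTHON_LAYER := 1` in Makefile.config, or pass the flag on the command line (run `make clean` first if Caffe was previously built without it):

```shell
WITH_PYTHON_LAYER=1 make && WITH_PYTHON_LAYER=1 make pycaffe
```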




