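1. Constraints on FPGA implementations of real-time image processing
- Timing constraint
For non-real-time image processing the timing constraint is not tight. Real-time image processing, however, must output one pixel every cycle, so the computation for each pixel has to complete within one pixel period.
- Bandwidth constraint
In real-time image processing, computing one output pixel requires the data of several neighboring pixels; bilinear interpolation, for example, needs the gray levels of 4 adjacent pixels. Because RAM resources and RAM ports are limited, the 4 pixels cannot all be fetched in a single cycle. Possible solutions: 1. Use multiple RAM banks so that the accesses proceed in parallel. 2. Clock the RAM at twice the pixel clock so that it can be accessed twice per pixel period. 3. Use appropriate caching.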
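Solution 1 (multiple RAM banks) can be illustrated with a small software model. This is only a sketch under assumptions that are not in the notes: four banks selected by the parity of the column and row, even image dimensions, and the hypothetical helper names `bank_of`, `addr_in_bank`, `build_banks` and `fetch_2x2`. With this mapping, any 2x2 neighborhood touches each bank exactly once, so the four reads can be issued in the same cycle.

```python
def bank_of(x, y):
    """Bank index 0..3 taken from the LSBs of the row and column (assumed mapping)."""
    return ((y & 1) << 1) | (x & 1)

def addr_in_bank(x, y, width):
    """Word address inside the selected bank; each bank holds a quarter of the image."""
    return (y >> 1) * (width // 2) + (x >> 1)

def build_banks(image):
    """Distribute a 2-D list of pixels (even width and height) over four banks."""
    height, width = len(image), len(image[0])
    banks = [[0] * ((width // 2) * (height // 2)) for _ in range(4)]
    for y in range(height):
        for x in range(width):
            banks[bank_of(x, y)][addr_in_bank(x, y, width)] = image[y][x]
    return banks

def fetch_2x2(banks, x, y, width):
    """Read the 2x2 block whose top-left corner is (x, y).
    Returns the four pixels and the bank used for each read."""
    pixels, used = [], []
    for dy in (0, 1):
        for dx in (0, 1):
            b = bank_of(x + dx, y + dy)
            used.append(b)
            pixels.append(banks[b][addr_in_bank(x + dx, y + dy, width)])
    return pixels, used
```

Because the four reads always land in four different banks, single-port RAMs are sufficient in this model; the cost is the address and data multiplexing around the banks.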
How to handle complex operations
- Fractional arithmetic
Convert all fractional values to fixed-point numbers before processing.
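As a minimal sketch of the fixed-point conversion, assuming a format with 8 fractional bits (the notes do not state the format; `FRAC_BITS` and the helper names are illustrative):

```python
FRAC_BITS = 8  # assumed number of fractional bits

def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a real number to a fixed-point integer."""
    return int(round(x * (1 << frac_bits)))

def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two fixed-point values and rescale the product (truncating)."""
    return (a * b) >> frac_bits

def to_float(a, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a real number for checking."""
    return a / (1 << frac_bits)
```

For example, 0.75 becomes 192 and 1.5 becomes 384; their fixed-point product is 288, which reads back as 1.125.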
- Division
Division is among the most resource-hungry operations. We recommend implementing it with a lookup table.
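2. Techniques for FPGA implementations of real-time image processing
- Lookup tables
Use lookup tables to implement complex operations such as division. A lookup table of limited size can reach a higher precision than its index width suggests by using interpolation: the upper 8 bits of the index address the table, and the value for each setting of the lower 8 bits is obtained by interpolating between adjacent entries.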
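The interpolated lookup table can be sketched as follows. The assumptions here are mine, not the notes': the table holds the reciprocal of a divisor normalized to [1, 2), the index is 16 bits wide, results use a 2^16 scale, and the names `LUT` and `recip_q16` are hypothetical. The upper 8 bits select a table entry and the lower 8 bits interpolate linearly toward the next entry.

```python
# 257 entries so that entry hi + 1 always exists for hi in 0..255.
# LUT[i] is round(2^32 / (2^16 + 256 * i)), i.e. 2^16 / (1 + i / 256).
LUT = [((1 << 32) + (d >> 1)) // d
       for d in ((1 << 16) + (i << 8) for i in range(257))]

def recip_q16(x):
    """Approximate 2^32 / (2^16 + x) for a 16-bit x using the LUT plus interpolation."""
    hi, lo = x >> 8, x & 0xFF           # table index and interpolation fraction
    y0, y1 = LUT[hi], LUT[hi + 1]
    return y0 + (((y1 - y0) * lo) >> 8)  # linear interpolation between neighbours
```

In this model the interpolated result stays within 2 LSBs of the exact reciprocal over the whole 16-bit input range, while only 257 entries are stored.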
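- Methods based on the display pixel scan order
The regularity of the pixel scan order can be used to simplify the following square computation.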
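The formula this item refers to is not reproduced in the notes. As an illustration only, assume the term to be computed is x² at consecutive integer coordinates along a scan line. Since (x+1)² - x² = 2x + 1, each new square follows from the previous one with two additions and no multiplier (the function name is hypothetical):

```python
def squares_along_scanline(x0, count):
    """Return x*x for x = x0, x0+1, ... using only additions after the first value."""
    sq = x0 * x0        # computed once at the start of the line
    step = 2 * x0 + 1   # (x+1)^2 - x^2
    out = []
    for _ in range(count):
        out.append(sq)
        sq += step      # next square
        step += 2       # the difference itself grows by 2 per pixel
    return out
```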
Interface (input, output)
Line buffer
Linear transformation module
Bilinear interpolation module
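3. Our project and the conditions it faces:
Range of the coordinate change under the linear transformation (v: vertical axis of the distorted image, y: vertical axis of the original image)
v | y |
---|---|
0 | (v-50, 0) |
50 | (0, v+5) |
Height-50 | (v-5, v+50) |
Height | (v, v+50) |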
4. Example code: RAM read/write timing
{signal: [
{name: 'clk', wave: 'p.........'},
{name: 'data_enable', wave: '01........'},
{name: 'de_1d', wave: '0.1.......'},
{name: 'dot_cnt', wave: '2.22222222', data: ['0000', '0001', '0010', '0011','0100','0101','0110','0111','1000']},
{name: 'dat', wave: '3.33333333', data: ['d0' ,'d1', 'd2', 'd3','d4','d5','d6','d7','d8']},
{name: 'dot_cnt[0]', wave: '0.10101010'},
{name: 'dot_cnt[0]_falling', wave: '0..1010101'},
{name: 'sram_addr0', wave: '4...4.4.4.4', data: ['0' , '1', '2', '3','4','0101','0110','0111','1000']},
{name: 'sram_wdata0', wave: '3.33333333', data: ['0','0,d0','d1,d0','d1,d2','d3,d2','d3,d4','d5,d4','d5,d6','d7,d6']},
{name: 'sram0_web', wave: '1..0101010'},
{name: 'o_sram_addr0', wave: '4....4.4.4', data: ['0' , '1', '2', '3']},
{name: 'o_sram_wdata0', wave: '3..33333333', data: ['0','0,d0','d1,d0','d1,d2','d3,d2','d3,d4','d5,d4','d5,d6','d7,d6']},
{name: 'o_sram0_web', wave: '1...0101010'},
{name: 'sram0_web_d', wave: '1....0101010'},
{name: 'i_sram0_rdata', wave: '5....5.5.5.', data: ['d1,d0','d3,d2','d5,d4','d7,d6']},
{name: 'i_rdata_odd', wave: '5.....5.5.5.', data: ['d1,d0','d3,d2','d5,d4','d7,d6']},
{name: 'rdata_post', wave: '5......5.5.5.', data: ['d1,d0','d3,d2','d5,d4','d7,d6']},
{name: 'sram0_web_1d', wave: '1...010101'},
{name: 'wr_odd', wave: '0.1........'},
{name: 'toggle', wave: '0..10101010'},
{name: 'toggle_d', wave: '0...10101010'},
{name: 'data_3d', wave: '3.....33333', data: ['d0','d1', 'd2', 'd3','d4','d5']},
{name: 'wr_odd_d', wave: '0....1.....'},
]}
This RAM code has two read channels.
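5. Implementation of each module
5.1 Line Buffer
The 50-line line buffer is split into two 25-line line buffers, LB0 and LB1
LB0 and LB1 are each 25 x 1920 x 48 bit
LB0 and LB1 can be read and written independently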
5.2 Divider LUT
5. Address increment method
type 1 2 3 (v1 == v1_1d)
begin
addr <= addr + (flag_sram - flag_sram_1d) * 1920 + u1 - u1_1d;
end
type 4 5 6
- (v1 == v1_1d + 1)
When flag_sram - flag_sram_1d = 1, advance 2 lines
When flag_sram - flag_sram_1d = 0, advance 1 line
begin
addr <= addr + (flag_sram - flag_sram_1d + 1) * 1920 + u1 - u1_1d;
end
type 4 5 6
- (v1 == v1_1d - 1)
When flag_sram - flag_sram_1d = 0, go back 1 line
When flag_sram - flag_sram_1d = -1, go back 2 lines
begin
addr <= addr + (flag_sram - flag_sram_1d - 1) * 1920 + u1 - u1_1d;
end
In summary, all of the cases above reduce to the following general formula
begin
addr <= addr + (flag_sram - flag_sram_1d + v1 - v1_1d) * 1920 + u1 - u1_1d;
end
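The reduction to a single formula can be checked with a small model. The sketch below treats the differences `flag_sram - flag_sram_1d`, `v1 - v1_1d` and `u1 - u1_1d` as plain integers named `d_flag`, `d_v` and `d_u` (these names and both functions are illustrative, not taken from the design):

```python
LINE_STRIDE = 1920  # address stride of one line, as in the notes

def addr_step_general(d_flag, d_v, d_u):
    """Address increment according to the general formula."""
    return (d_flag + d_v) * LINE_STRIDE + d_u

def addr_step_by_case(d_flag, d_v, d_u):
    """Address increment according to the three separate cases."""
    if d_v == 0:    # type 1 2 3: same line
        return d_flag * LINE_STRIDE + d_u
    if d_v == 1:    # type 4 5 6: next line
        return (d_flag + 1) * LINE_STRIDE + d_u
    if d_v == -1:   # type 4 5 6: previous line
        return (d_flag - 1) * LINE_STRIDE + d_u
    raise ValueError("d_v is expected to be -1, 0 or 1")
```

Both functions agree for every combination of `d_flag` and `d_v` in {-1, 0, 1}, which is why one adder chain can serve all cases.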
6. Data caching method
type 1 2 3 (v == v_1d)
pixel_reg1 <= i_sram_rdata[95:72];
type 4 5 6 (v1_1d == v1_2d + 1) flag_sram == 1
When flag_sram_1d - flag_sram_2d = 1 (advance 2 lines)
pixel_reg1 <= pixel_reg1;
When flag_sram_1d - flag_sram_2d = 0 (advance 1 line)
pixel_reg1 <= i_sram_rdata[95:72];
type 4 5 6 (v1_1d == v1_2d - 1) flag_sram == 0
When flag_sram - flag_sram_1d = 0 (go back 1 line)
pixel_reg1 <= i_sram_rdata[95:72];
When flag_sram - flag_sram_1d = -1 (go back 2 lines)
pixel_reg1 <= pixel_reg1;
6. How the register data is arranged // at stage 4
type 1 2 3 (v1_3d == v1_4d )
if (flag_sram_3d == 0)
part_a <= i_sram_rdat[95:72] * mul_a_2d;
else if (flag_sram_3d == 1)
case (u1_3d - u1_4d)
0: part_a <= pixel_reg0 * mul_a_2d;
1: part_a <= pixel_reg1 * mul_a_2d;
2: part_a <= pixel_reg2 * mul_a_2d;
endcase
if (flag_sram_3d == 0)
part_b <= i_sram_rdat[71:48] * mul_b_2d;
else if (flag_sram_3d == 1)
case (u1_3d - u1_4d)
0: part_b <= pixel_reg1 * mul_b_2d;
1: part_b <= pixel_reg2 * mul_b_2d;
2: part_b <= pixel_reg3 * mul_b_2d;
endcase
if (flag_sram_3d == 0)
case (u1_3d - u1_4d)
0: part_c <= pixel_reg0 * mul_c_2d;
1: part_c <= pixel_reg1 * mul_c_2d;
2: part_c <= pixel_reg2 * mul_c_2d;
endcase
else if ( flag_sram_3d == 1)
part_c <= i_sram_rdat [95:72] * mul_c_2d;
if (flag_sram_3d == 0)
case (u1_3d - u1_4d)
0: part_d <= pixel_reg1 * mul_d_2d;
1: part_d <= pixel_reg2 * mul_d_2d;
2: part_d <= pixel_reg3 * mul_d_2d;
endcase
else if ( flag_sram_3d == 1)
part_d <= i_sram_rdat [71:48] * mul_d_2d;
type 4 5 6 (v1_3d == v1_4d+1 || v1_3d==v1_4d-1)
v1_3d == v1_4d+1 //flag_sram_3d == 1
if (v1_3d == v1_4d+1)
case (u1_3d - u1_4d)
0: part_a <= pixel_reg0 * mul_a_2d;
1: part_a <= pixel_reg1 * mul_a_2d;
2: part_a <= pixel_reg2 * mul_a_2d;
endcase
if (v1_3d == v1_4d+1)
case (u1_3d - u1_4d)
0: part_b <= pixel_reg1 * mul_b_2d;
1: part_b <= pixel_reg2 * mul_b_2d;
2: part_b <= pixel_reg3 * mul_b_2d;
endcase
if (v1_3d == v1_4d+1)
part_c <= i_sram_rdat [95:72] * mul_c_2d;
if (v1_3d == v1_4d+1)
part_d <= i_sram_rdat [71:48] * mul_d_2d;
v1_3d==v1_4d-1 //flag_sram_3d == 0
if (v1_3d == v1_4d - 1)
part_a <= i_sram_rdat [95:72] * mul_a_2d;
if (v1_3d == v1_4d - 1)
part_b <= i_sram_rdat[71:48] * mul_b_2d;
if (v1_3d == v1_4d -1)
case (u1_3d - u1_4d)
0: part_c <= pixel_reg0 * mul_c_2d;
1: part_c <= pixel_reg1 * mul_c_2d;
2: part_c <= pixel_reg2 * mul_c_2d;
endcase
if (v1_3d == v1_4d -1)
case (u1_3d - u1_4d)
0: part_d <= pixel_reg1 * mul_d_2d;
1: part_d <= pixel_reg2 * mul_d_2d;
2: part_d <= pixel_reg3 * mul_d_2d;
endcase
7. Out-of-bounds handling
flag_cross_u1;
flag_cross_u1_1;
flag_cross_v1;
flag_cross_v1_1;
flag_cross = {flag_cross_u1, flag_cross_u1_1, flag_cross_v1, flag_cross_v1_1}
The out-of-bounds condition for a is
flag_cross_u1 || flag_cross_v1
The out-of-bounds condition for b is
flag_cross_u1_1 || flag_cross_v1
The out-of-bounds condition for c is
flag_cross_u1 || flag_cross_v1_1
The out-of-bounds condition for d is
flag_cross_u1_1 || flag_cross_v1_1
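The four conditions can be modelled as below. The sketch assumes a 1920 x 1080 image, that a is the pixel at (u1, v1), b at (u1+1, v1), c at (u1, v1+1) and d at (u1+1, v1+1), and that u1 and v1 are the floors of the transformed coordinate; the function name is hypothetical. How an out-of-bounds pixel is then replaced is not specified here.

```python
import math

WIDTH, HEIGHT = 1920, 1080  # assumed image size

def cross_flags(u, v, width=WIDTH, height=HEIGHT):
    """Return the out-of-bounds flag of each of the four neighbours a, b, c, d."""
    u1, v1 = math.floor(u), math.floor(v)
    flag_cross_u1 = not 0 <= u1 < width
    flag_cross_u1_1 = not 0 <= u1 + 1 < width
    flag_cross_v1 = not 0 <= v1 < height
    flag_cross_v1_1 = not 0 <= v1 + 1 < height
    return {
        "a": flag_cross_u1 or flag_cross_v1,
        "b": flag_cross_u1_1 or flag_cross_v1,
        "c": flag_cross_u1 or flag_cross_v1_1,
        "d": flag_cross_u1_1 or flag_cross_v1_1,
    }
```

For a point just outside the top-left corner such as (-0.2, -0.5), only d lies inside the image; for a point on the right edge such as (1919.5, 50), b and d are out of bounds.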
8. Line Buffer
Use two sets of buffers operated in ping-pong fashion
New design:
T0:
Read in the transformed coordinates i_coord_u, i_coord_v
T1:
Compute u1, v1, u2, v2, du, dv, sub_du, sub_dv
T2:
Compute mul_a, mul_b, mul_c, mul_d
if (i_hsync_1d && i_hsync_2d)
u1_offset=u1-u1_src;
v1_offset=v1- v1_src;
else if (i_hsync_1d && !i_hsync_2d)
u1_offset =0;
v1_offset =0;
assign v1_sub1 =v1-1;
assign v1_plus1=v1+1;
assign v1_plus2=v1+2;
if (u1 > u1_src+6 || v1>v1_src+1 || v1<v1_src-1)
addr1 <= (u1>>3) + v1_sub1*240;
addr2 <= (u1>>3) + v1*240;
addr3 <= (u1>>3) + v1_plus1*240;
addr4 <= (u1>>3) + v1_plus2*240;
u1_src <= u1;
v1_src <= v1;
addr_en <= 1;
T3:
Drive the address: addr <= addr1; web <= 1;
u1_offset_1d, v1_offset_1d
mul_a_1d, mul_b_1d, mul_c_1d, mul_d_1d
T4:
Drive the address: addr <= addr2;
u1_offset_2d, v1_offset_2d
mul_a_2d, mul_b_2d, mul_c_2d, mul_d_2d
T5:
Drive the address: addr <= addr3;
u1_offset_3d, v1_offset_3d;
line1_reg1_a <= sram_data[8*DW-1:7*DW] ..... line1_reg8_a<=sram_data[DW-1:0]
mul_a_3d, mul_b_3d, mul_c_3d, mul_d_3d
T6:
Drive the address: addr <= addr4;
u1_offset_4d, v1_offset_4d;
line2_reg1_a <= sram_data[8*DW-1:7*DW] ..... line2_reg8_a<=sram_data[DW-1:0]
mul_a_4d, mul_b_4d, mul_c_4d, mul_d_4d
T7:
web = 0;
u1_offset_5d, v1_offset_5d;
line3_reg1_a <= sram_data[8*DW-1:7*DW] ..... line3_reg8_a<=sram_data[DW-1:0]
mul_a_5d, mul_b_5d, mul_c_5d, mul_d_5d
T8:
u1_offset_6d,v1_offset_6d;
line4_reg1_a <= sram_data[8*DW-1:7*DW] ..... line4_reg8_a<=sram_data[DW-1:0]
mul_a_6d, mul_b_6d, mul_c_6d, mul_d_6d
T9:
v1_offset_7d
case (u1_offset_6d)
'd0: begin
tmp_line1_reg1 = line1_reg1;
tmp_line1_reg2 = line1_reg2;
tmp_line2_reg1 = line2_reg1;
tmp_line2_reg2 = line2_reg2;
tmp_line3_reg1 = line3_reg1;
tmp_line3_reg2 = line3_reg2;
tmp_line4_reg1 = line4_reg1;
tmp_line4_reg2 = line4_reg2;
end
....
endcase
mul_a_7d, mul_b_7d, mul_c_7d, mul_d_7d
T10:
case (v1_offset_7d)
'd0: begin
pixel_a <= tmp_line2_reg1;
pixel_b <= tmp_line2_reg2;
pixel_c <= tmp_line3_reg1;
pixel_d <= tmp_line3_reg2;
end
endcase
mul_a_8d, mul_b_8d, mul_c_8d, mul_d_8d
T11
part_a_r = pixel_a[23:16] * mul_a_8d;
part_b_r = pixel_b[23:16] * mul_b_8d;
part_c_r = pixel_c[23:16] * mul_c_8d;
part_d_r = pixel_d[23:16] * mul_d_8d;
T12
final_r = part_a_r + part_b_r + part_c_r + part_d_r;
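Steps T11 and T12 amount to a weighted sum of the four neighbors. The sketch below models one color channel under assumptions that the notes do not spell out: `du` and `dv` are 8-bit fractions, `sub_du = 256 - du`, `sub_dv = 256 - dv`, and the weights are `mul_a = sub_du*sub_dv`, `mul_b = du*sub_dv`, `mul_c = sub_du*dv`, `mul_d = du*dv`, with a at the top left, b top right, c bottom left and d bottom right.

```python
FRAC = 8          # assumed number of fractional bits in du and dv
ONE = 1 << FRAC

def weights(du, dv):
    """Four bilinear weights; they always sum to ONE * ONE."""
    sub_du, sub_dv = ONE - du, ONE - dv
    return sub_du * sub_dv, du * sub_dv, sub_du * dv, du * dv

def bilinear(pixel_a, pixel_b, pixel_c, pixel_d, du, dv):
    """One channel of the interpolated pixel (product step, then sum, then rescale)."""
    mul_a, mul_b, mul_c, mul_d = weights(du, dv)
    total = pixel_a * mul_a + pixel_b * mul_b + pixel_c * mul_c + pixel_d * mul_d
    return total >> (2 * FRAC)   # drop the 16 fractional bits (truncation)
```

Because the weights sum to exactly 2^16, a flat region is reproduced without error, and the final shift replaces any division.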
The main issue is how to handle the boundary conditions, that is, when the point to be interpolated lies at the edge of the image.
Possible boundary cases
flag_cross_u1 && ~flag_cross_u2 && flag_cross_v1 && ~flag_cross_v2; (top-left corner)
eg: (-0.2, -0.5)
flag_cross_u1 && ~flag_cross_u2 && ~flag_cross_v1 && ~flag_cross_v2; (left edge)
eg: (-0.2, 50)
flag_cross_u1 && ~flag_cross_u2 && ~flag_cross_v1 && flag_cross_v2; (bottom-left corner)
eg: (-0.2, 1079.5)
~flag_cross_u1 && ~flag_cross_u2 && flag_cross_v1 && ~flag_cross_v2; (top edge)
eg: (50, -0.5)
~flag_cross_u1 && ~flag_cross_u2 && ~flag_cross_v1 && flag_cross_v2; (bottom edge)
eg: (50, 1079.5)
~flag_cross_u1 && flag_cross_u2 && flag_cross_v1 && ~flag_cross_v2; (top-right corner)
eg: (1919.5, -0.5)
~flag_cross_u1 && flag_cross_u2 && ~flag_cross_v1 && ~flag_cross_v2; (right edge)
eg: (1919.5, 50)
~flag_cross_u1 && flag_cross_u2 && ~flag_cross_v1 && flag_cross_v2; (bottom-right corner)
eg: (1919.5, 1079.5)
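Open question: how to assign the address
Approach: derive it from the current SRAM write address (the write side runs 50 lines ahead of the display scan)
To display a video stream, the SRAM has to be expanded to a 150-line line buffer
The problem of determining this address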
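11/6 Work report and plan for the next steps
- The current design still outputs 1 pixel per cycle; the next step is to change it to 4 pixels per cycle
- Verify the current design in detail. The concrete steps are to write an SV model that computes the transformation matrix and then run randomized verification
Perspective transformation
https://blog.csdn.net/xiaowei_cqu/article/details/26471527
11/7 Derivation of the transformation matrix formula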
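11/14 Meeting notes
The SRAM can be 1T16P or 1T32P; make it a discrete type where possible
1T4P implementation
For the coordinate transformation module, note that
the computation of temp_u_1, temp_v_1, temp_w_1, temp_u_2, temp_v_2, temp_w_2 can be optimized: all of them can be implemented with adders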
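One way to read the adder-only remark is forward differencing along the scan line. The sketch assumes (this is not stated in the notes) that the temp values are the rows of a 3x3 transformation matrix applied to (x, y, 1), so that moving one pixel to the right adds a constant matrix entry to each of them. The function name and the matrix layout are illustrative.

```python
def scanline_numerators(m, y, width):
    """m is a 3x3 integer matrix whose rows produce (temp_u, temp_v, temp_w).
    Returns the three values for x = 0 .. width-1 on line y, using one
    multiply-based setup per line and additions only along the line."""
    tu, tv, tw = (row[1] * y + row[2] for row in m)  # values at x = 0
    out = []
    for _ in range(width):
        out.append((tu, tv, tw))
        tu += m[0][0]   # stepping x by 1 adds the first column of the matrix
        tv += m[1][0]
        tw += m[2][0]
    return out
```

The per-line start values can themselves be updated by adding the second matrix column once per line, so no multiplier is needed in steady state under this assumption.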
Interpolation module approach (1T4P)
Cache structure
Input: 4 consecutive transformed coordinates
stage 0
Assign u_a_1, v_a_1, u_a_2, v_a_2, u_a_3, v_a_3, u_a_4, v_a_4
Assign du_1, dv_1 ....... du_4, dv_4
stage 1
reg [3:0] flag_cross_1,...,flag_cross_4;
Assign the out-of-bounds condition of each point
Assign head_v_differ = v_a_2[1:0] - v_a_1[1:0];
Assign u_a_1_1d, v_a_1_1d, v_a_3_1d, v_a_4_1d;
Assign v_flag_1d;
Assign flag_start = u_b_1 < 1919 && !flag_start && !(v1 > 1080 && v1 < 8190);
stage 2
Determine the SRAM access address
assign addr_tmp = (v_a_1 << 7) - (v_a_1 << 3) + (u_a_1 >> 4);
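The shift-and-subtract expression replaces a multiplication by 120, since 128·v - 8·v = 120·v, and 120 is the number of 16-pixel words in a 1920-pixel line. The sketch below checks this against a plain multiply, assuming the intended address is 120·v + (u >> 4); the function names are illustrative. It also shows why the shift needs its own parentheses: in Verilog, as in Python, `+` binds more tightly than `>>`.

```python
def addr_shift_add(v, u):
    """Word address computed with shifts and one subtraction, no multiplier."""
    return ((v << 7) - (v << 3)) + (u >> 4)

def addr_reference(v, u):
    """Same address written with an explicit multiply and divide."""
    return v * 120 + u // 16

# Without parentheses the whole sum is shifted, which is a different value:
unparenthesized = (10 << 7) - (10 << 3) + 35 >> 4   # parsed as (1200 + 35) >> 4
```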
addr1 = addr_tmp;
Assign u_src, v_src
Assign addr_en
Assign head_v_differ_1d;
Assign u_a_1_2d, v_a_1_2d; v_a_3_2d, v_a_4_2d;
Assign flag_start_1d;
stage 3
addr2 = addr_tmp + 120;
Assign head_v_differ_2d;
Assign v_flag_3d;
Assign sram_web;
Assign sram_addr = addr1;
Assign addr_en_1d;
stage 4
assign tail_v_differ = coord_v[1:0] - coord_v[DW+1:DW];
assign head_tail_v_differ = coord_v[1:0] - v_src[1:0];
if (head_tail_v_differ == 'b01)
v_shift <= 120;
else if (head_tail_v_differ == 'b00 && (tail_v_differ == 'b11 || head_v_differ == 'b01))
v_shift <= 120;
else
v_shift <= -240;
if (differ1 || differ2) begin
addr_en_line3 = 1'b1;
end else begin
addr_en_line3 = 1'b0;
end
Assign sram_web =
Assign addr = addr2
stage 5
if (addr_en_line3) begin
sram_addr = sram_addr + v_shift;
end
if (rdata_vld_1)
{line1_reg_a_1 ...... line1_reg_a_16} <= sram_rdata;
stage 6
if (rdata_vld_2)
{line2_reg_a_1 ...... line2_reg_a_16} <= sram_rdata;
stage 7
if (rdata_vld_3)
{line3_reg_1 ...... line3_reg_16} <= sram_rdata;
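Define a 3*5 pre_cache that holds the data from the previous time
stage 8
Define a 3*6 cache; the difference between the u coordinate of the first of the current 4 pixels and u_src decides which part of the data is cached
temp_line1_reg1 .... temp_line1_reg6
temp_line2_reg1 .... temp_line2_reg6
temp_line3_reg1 .... temp_line3_reg6
stage 9
The difference in v relative to v_src, together with the difference in u between the current point and the first of the 4 consecutive points, decides which 4 entries of the temp cache are selected. For the 4 consecutive pixels being computed, the first pixel selects from temp_reg1 to temp_reg3, the second from temp_reg2 to temp_reg4, and so on up to the fourth, which selects from temp_reg4 to temp_reg6.
stage 10
First step of the interpolation
stage 11
Second step of the interpolation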
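12/2 Report
How to balance the number of registers against the complexity of the combinational logic.
Which matters more, SRAM throughput or the number of registers? Which matters more, power or resources?
case statement test
Area utilization
The purpose of this test is to check the resource cost of different ways of writing a case statement.
type2 and type3 are functionally equivalent and consume the same resources; both consume more resources than type1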
type1
always @(posedge i_clk or negedge i_rst_n)
begin
if (!i_rst_n) begin
o_value <= 24'b0;
end else begin
casez (i_shift)
8'b01111:o_value <= i_data[5*24-1:4*24];
8'b10111:o_value <= i_data[4*24-1:3*24];
8'b11011:o_value <= i_data[3*24-1:2*24];
8'b11101:o_value <= i_data[2*24-1:1*24];
8'b11110:o_value <= i_data[1*24-1:0*24];
default :o_value <= 24'hffffff;
endcase
end
end
Combinational area: 106.560001
Buf/Inv area: 1.080000
Noncombinational area: 190.080002
Macro/Black Box area: 0.000000
Net Interconnect area: undefined (Wire load has zero net area)
Total cell area: 296.640003
type2
always @(posedge i_clk or negedge i_rst_n)
begin
if (!i_rst_n) begin
o_value <= 24'b0;
end else begin
casez (i_shift)
8'h00,8'h01,8'h02,8'h03,8'h04,8'h05,8'h06,8'h07,8'h08,8'h09,8'h0a,8'h0b,8'h0c,8'h0d,8'h0e,8'h0f:o_value <= i_data[5*24-1:4*24];
5'd16,8'd17,8'd18,8'd19,8'd20,8'd21,8'd22,8'd23:o_value <= i_data[4*24-1:3*24];
5'd24,8'd25,8'd26,8'd27:o_value <= i_data[3*24-1:2*24];
5'd28,8'd29:o_value <= i_data[2*24-1:1*24];
5'd30:o_value <= i_data[1*24-1:0*24];
default :o_value <= 24'hffffff;
endcase
end
end
Combinational area: 230.040000
Buf/Inv area: 1.080000
Noncombinational area: 190.080002
Macro/Black Box area: 0.000000
Net Interconnect area: undefined (Wire load has zero net area)
Total cell area: 420.120002
type3
always @(posedge i_clk or negedge i_rst_n)
begin
if (!i_rst_n) begin
o_value <= 24'b0;
end else begin
casez (i_shift)
5'b0????:o_value <= i_data[5*24-1:4*24];
5'b10???:o_value <= i_data[4*24-1:3*24];
5'b110??:o_value <= i_data[3*24-1:2*24];
5'b1110?:o_value <= i_data[2*24-1:1*24];
5'b11110:o_value <= i_data[1*24-1:0*24];
default :o_value <= 24'hffffff;
endcase
end
end
Combinational area: 230.040000
Buf/Inv area: 1.080000
Noncombinational area: 190.080002
Macro/Black Box area: 0.000000
Net Interconnect area: undefined (Wire load has zero net area)
Total cell area: 420.120002
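The equivalence of type2 and type3 can be confirmed exhaustively with a small model of `casez` matching. The sketch assumes `i_shift` is 5 bits wide and ignores the mismatched literal widths in the listings; `casez_select`, `type2` and the slot numbers 5..1 (standing for the selected 24-bit slice) are illustrative names.

```python
DEFAULT = "default"

def casez_select(value, width, arms):
    """Return the result of the first arm whose pattern matches; '?' is a wildcard."""
    bits = format(value, "0{}b".format(width))
    for pattern, result in arms:
        if all(p in ("?", b) for p, b in zip(pattern, bits)):
            return result
    return DEFAULT

TYPE1 = [("01111", 5), ("10111", 4), ("11011", 3), ("11101", 2), ("11110", 1)]
TYPE3 = [("0????", 5), ("10???", 4), ("110??", 3), ("1110?", 2), ("11110", 1)]

def type2(value):
    """Explicit value lists, as in the type2 listing."""
    for limit, result in ((15, 5), (23, 4), (27, 3), (29, 2), (30, 1)):
        if value <= limit:
            return result
    return DEFAULT
```

In this model type2 and type3 agree on all 32 input codes, whereas type1 selects a data slice for only 5 of them and sends the rest to the default. That difference in decoded input codes is one plausible reason for the smaller combinational area of type1, though the notes do not state the cause.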
type 4
always @(posedge i_clk or negedge i_rst_n)
begin
if (!i_rst_n) begin
o_value <= 24'hffffff;
end else begin
casez (i_shift)
22'b10??_????_????_????_????_??:o_value <= i_data[22*24-1:21*24];
22'b110?_????_????_????_????_??:o_value <= i_data[21*24-1:20*24];
22'b1110_????_????_????_????_??:o_value <= i_data[20*24-1:19*24];
22'b1111_0???_????_????_????_??:o_value <= i_data[19*24-1:18*24];
22'b1111_10??_????_????_????_??:o_value <= i_data[18*24-1:17*24];
22'b1111_110?_????_????_????_??:o_value <= i_data[17*24-1:16*24];
22'b1111_1110_????_????_????_??:o_value <= i_data[16*24-1:15*24];
22'b1111_1111_0???_????_????_??:o_value <= i_data[15*24-1:14*24];
22'b1111_1111_10??_????_????_??:o_value <= i_data[14*24-1:13*24];
22'b1111_1111_110?_????_????_??:o_value <= i_data[13*24-1:12*24];
22'b1111_1111_1110_????_????_??:o_value <= i_data[12*24-1:11*24];
22'b1111_1111_1111_0???_????_??:o_value <= i_data[11*24-1:10*24];
22'b1111_1111_1111_10??_????_??:o_value <= i_data[10*24-1:9*24];
22'b1111_1111_1111_110?_????_??:o_value <= i_data[9*24-1:8*24];
22'b1111_1111_1111_1110_????_??:o_value <= i_data[8*24-1:7*24];
22'b1111_1111_1111_1111_0???_??:o_value <= i_data[7*24-1:6*24];
22'b1111_1111_1111_1111_10??_??:o_value <= i_data[6*24-1:5*24];
22'b1111_1111_1111_1111_110?_??:o_value <= i_data[5*24-1:4*24];
22'b1111_1111_1111_1111_1110_??:o_value <= i_data[4*24-1:3*24];
22'b1111_1111_1111_1111_1111_0?:o_value <= i_data[3*24-1:2*24];
22'b1111_1111_1111_1111_1111_10:o_value <= i_data[2*24-1:1*24];
default: o_value <= 24'hffffff;
endcase
end
end
Combinational area: 960.840003
Buf/Inv area: 2.520000
Noncombinational area: 198.719994
Macro/Black Box area: 0.000000
Net Interconnect area: undefined (Wire load has zero net area)
Total cell area: 1159.559996
Timing report
type1
always @(posedge i_clk or negedge i_rst_n)
begin
if (!i_rst_n) begin
o_value <= 24'b0;
end else begin
casez (i_shift)
5'b0????:o_value <= i_data[5*24-1:4*24];
5'b10???:o_value <= i_data[4*24-1:3*24];
5'b110??:o_value <= i_data[3*24-1:2*24];
5'b1110?:o_value <= i_data[2*24-1:1*24];
5'b11110:o_value <= i_data[1*24-1:0*24];
default :o_value <= 24'hffffff;
endcase
end
end
Point Incr Path
-----------------------------------------------------------
clock i_clk (rise edge) 0.00 0.00
clock network delay (ideal) 0.00 0.00
input external delay 0.25 0.25 r
i_shift[4] (in) 0.00 0.25 r
U110/Z (INVM1T) 0.19 0.44 f
U112/Z (NR2M2T) 0.60 1.04 r
U205/Z (AOI22M1T) 0.22 1.26 f
U203/Z (ND4M1T) 0.13 1.38 r
o_value_reg[22]/D (DFQRM1T) 0.00 1.38 r
data arrival time 1.38
clock i_clk (rise edge) 2.50 2.50
clock network delay (ideal) 0.00 2.50
clock uncertainty -0.20 2.30
o_value_reg[22]/CK (DFQRM1T) 0.00 2.30 r
library setup time -0.02 2.28
data required time 2.28
-----------------------------------------------------------
data required time 2.28
data arrival time -1.38
-----------------------------------------------------------
slack (MET) 0.90
type 2
always @(posedge i_clk or negedge i_rst_n)
begin
if (!i_rst_n) begin
o_value <= 24'hffffff;
end else begin
casez (i_shift)
22'b10??_????_????_????_????_??:o_value <= i_data[22*24-1:21*24];
22'b110?_????_????_????_????_??:o_value <= i_data[21*24-1:20*24];
22'b1110_????_????_????_????_??:o_value <= i_data[20*24-1:19*24];
22'b1111_0???_????_????_????_??:o_value <= i_data[19*24-1:18*24];
22'b1111_10??_????_????_????_??:o_value <= i_data[18*24-1:17*24];
22'b1111_110?_????_????_????_??:o_value <= i_data[17*24-1:16*24];
22'b1111_1110_????_????_????_??:o_value <= i_data[16*24-1:15*24];
22'b1111_1111_0???_????_????_??:o_value <= i_data[15*24-1:14*24];
22'b1111_1111_10??_????_????_??:o_value <= i_data[14*24-1:13*24];
22'b1111_1111_110?_????_????_??:o_value <= i_data[13*24-1:12*24];
22'b1111_1111_1110_????_????_??:o_value <= i_data[12*24-1:11*24];
22'b1111_1111_1111_0???_????_??:o_value <= i_data[11*24-1:10*24];
22'b1111_1111_1111_10??_????_??:o_value <= i_data[10*24-1:9*24];
22'b1111_1111_1111_110?_????_??:o_value <= i_data[9*24-1:8*24];
22'b1111_1111_1111_1110_????_??:o_value <= i_data[8*24-1:7*24];
22'b1111_1111_1111_1111_0???_??:o_value <= i_data[7*24-1:6*24];
22'b1111_1111_1111_1111_10??_??:o_value <= i_data[6*24-1:5*24];
22'b1111_1111_1111_1111_110?_??:o_value <= i_data[5*24-1:4*24];
22'b1111_1111_1111_1111_1110_??:o_value <= i_data[4*24-1:3*24];
22'b1111_1111_1111_1111_1111_0?:o_value <= i_data[3*24-1:2*24];
22'b1111_1111_1111_1111_1111_10:o_value <= i_data[2*24-1:1*24];
default: o_value <= 24'hffffff;
endcase
end
end
Point Incr Path
-----------------------------------------------------------
clock i_clk (rise edge) 0.00 0.00
clock network delay (ideal) 0.00 0.00
input external delay 0.25 0.25 r
i_shift[17] (in) 0.00 0.25 r
U406/Z (ND4M2T) 0.11 0.36 f
U418/Z (ND2B1M4T) 0.15 0.51 f
U417/Z (NR2B1M4T) 0.06 0.57 r
U441/Z (CKND2M2T) 0.08 0.65 f
U439/Z (NR2B1M2T) 0.10 0.74 r
U438/Z (CKND2M2T) 0.08 0.83 f
U437/Z (NR2B1M2T) 0.10 0.92 r
U405/Z (ND2M2T) 0.09 1.01 f
U414/Z (NR2B1M2T) 0.10 1.11 r
U413/Z (ND2M2T) 0.10 1.21 f
U403/Z (NR2B1M4T) 0.08 1.29 r
U732/Z (NR2B1M2T) 0.52 1.80 r
U770/Z (AO22M1T) 0.31 2.12 r
U694/Z (AOI211M1T) 0.05 2.17 f
U691/Z (ND4M1T) 0.11 2.28 r
o_value_reg[15]/D (DFQSM1T) 0.00 2.28 r
data arrival time 2.28
clock i_clk (rise edge) 2.50 2.50
clock network delay (ideal) 0.00 2.50
clock uncertainty -0.20 2.30
o_value_reg[15]/CK (DFQSM1T) 0.00 2.30 r
library setup time -0.01 2.29
data required time 2.29
-----------------------------------------------------------
data required time 2.29
data arrival time -2.28
-----------------------------------------------------------
slack (MET) 0.01
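The slack figures in the two reports follow from the numbers listed in them: the required time is the clock period minus the clock uncertainty and the library setup time, and the slack is the required time minus the data arrival time. A small check, using decimal arithmetic to avoid floating-point rounding (the function name is illustrative):

```python
from decimal import Decimal

def slack(period, uncertainty, setup, arrival):
    """Setup slack in the same time unit as the report."""
    required = Decimal(period) - Decimal(uncertainty) - Decimal(setup)
    return required - Decimal(arrival)

slack_type1 = slack("2.50", "0.20", "0.02", "1.38")  # 5-bit casez
slack_type2 = slack("2.50", "0.20", "0.01", "2.28")  # 22-bit casez
```

The 22-bit priority chain consumes nearly the whole clock period, leaving 0.01 of slack compared with 0.90 for the 5-bit version.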
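12/3 Summary of the report to Mr. Peng
- For each line, the position where line3 appears is
- Prepare a PPT and present it to Eason on December 18; the talk should take 1.5 h to 2 h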
12/9
12/10
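In the following formula:
M is a dimensionless coefficient. The unit of k depends on the unit of r: if r is measured in mm, the unit of k is 1/mm².
If r is measured in pixel coordinates, the unit of k is 1/pixel².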
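The formula itself is missing from the notes. Purely as an illustration of the unit argument, assume a single-coefficient radial model M = 1 + k·r², so that k·r² must be dimensionless. Converting r from mm to pixels with a pixel pitch p (in mm per pixel) then requires k_px = k_mm · p². The model, the pitch value and the function names are assumptions.

```python
def magnification(k, r):
    """Assumed radial model M = 1 + k * r^2 (M is dimensionless)."""
    return 1 + k * r * r

def k_mm_to_px(k_per_mm2, pitch_mm):
    """Convert k from 1/mm^2 to 1/pixel^2, given the pixel pitch in mm per pixel."""
    return k_per_mm2 * pitch_mm ** 2
```

With k = 0.02 /mm² and a pitch of 0.005 mm, a radius of 400 pixels (2 mm) gives the same M in either unit system, which is the consistency the note describes.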
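12/11 Report outline
- Project background (so far only the linear transformation is implemented)
- Basic theory (introduction to the linear transformation)
- Implementation (module block diagram)
- Implementation (coordinate transformation module)
- Implementation (bilinear interpolation module)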
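12/15
- My proposal needs to include the elliptical coordinate transformation
- My proposal needs to give an addressing scheme that works for a video stream
- Run synthesis to see the resource cost and compare it with other implementations
12/18
- 1. The case where barrel distortion and the linear transformation are mixed.
- 2. First obtain the distorted pattern (linear distortion + barrel distortion). Because the distortion coefficient of the lens (and the distortion center) is known, the 4 vertices of the original linear distortion can be computed from the formula, and from them the standard linear transformation matrix.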
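The last step, obtaining the transformation matrix from 4 vertices, can be sketched as solving an 8x8 linear system. The sketch assumes a 3x3 perspective matrix with its last element fixed to 1 and uses exact fractions so the result can be checked precisely; the function names and the sample points are made up for illustration, and the barrel-distortion correction of the vertices is not shown.

```python
from fractions import Fraction

def solve(a, b):
    """Gauss-Jordan elimination with exact fractions; a is n x n, b has length n."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

def homography(src, dst):
    """3x3 matrix (last element 1) mapping each src (x, y) to dst (u, v)."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y]); rhs.append(v)
    h = solve(rows, rhs) + [Fraction(1)]
    return [h[0:3], h[3:6], h[6:9]]

def apply_h(h, x, y):
    """Map one point through the matrix, including the perspective divide."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

A matrix obtained this way is also a convenient golden reference for randomized verification of the coordinate transformation module.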
radial pipeline:
stage0:
stage1:
stage2:
stage3:
stage4
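12/19
Two's-complement multiplication cannot be performed directly on the raw bits the way addition and subtraction can; the operands need sign extension
In Verilog, signed multiplication requires every operand to be declared signed
-
sign_c is a signed variable
-
sign_d is not a signed variable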
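The effect can be reproduced with a bit-level model. In Verilog, an expression is evaluated as unsigned if any operand is unsigned, so the operands are zero-extended instead of sign-extended. The sketch below uses 8-bit operands and a 16-bit product; the widths and function names are chosen only for illustration.

```python
def to_signed(bits, width):
    """Interpret a width-bit pattern as a two's-complement number."""
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits

def sign_extend(bits, from_w, to_w):
    """Extend a from_w-bit two's-complement pattern to to_w bits."""
    return to_signed(bits, from_w) & ((1 << to_w) - 1)

def mul_unsigned(a_bits, b_bits, width=8):
    """What happens when the operands are treated as unsigned (zero-extended)."""
    return (a_bits * b_bits) & ((1 << (2 * width)) - 1)

def mul_signed(a_bits, b_bits, width=8):
    """Sign-extend both operands to the result width, multiply, then truncate."""
    w2 = 2 * width
    prod = sign_extend(a_bits, width, w2) * sign_extend(b_bits, width, w2)
    return prod & ((1 << w2) - 1)
```

For 8'hFE (-2) times 8'h03, the unsigned path yields 16'h02FA (762), while the sign-extended path yields 16'hFFFA, which reads as -6.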
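12/20
coordinate_trans_module pipeline (one cycle is enough)
input
pre_stage
stage0
interpolation module
The second-level cache can be reduced to 3*6*24 bits
The pre_cache can be reduced to 3*5*24 bits
Deciding whether line3 needs to be enabled only requires the coordinates of 5 cycles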
1/3
img_distortion_v3 (shortened pipeline)
stage 0
Assign sram_addr, sram_web, scan_en, u_src, v_src
sram_addr = addr1
sram_web = (!scan_en && flag_start) || (scan_en && u_comp)
scan_en = flag_start_(formula)
Compute u16 = u1 + delta_h_x *(16+4)
stage 1
Assign sram_addr = addr2
Assign addr_en=sram_web && !sram_web
Assign addr3_line3_en = u16[FLEN]!=u1[FLEN]
Assign flag_cross
stage 2
Assign sram_addr = addr3
Assign line1_cache
stage 3
Assign line2_cache
stage 4
Assign line3_cache
Assign pre_cache (needs sign_v)
stage 5
Assign temp_cache (needs the difference between u and u_src)
stage 6
Assign pix1_a ... pix1_d ... pix4_a ... pix4_d