Spider-OS

Videocore : draw triangles
2019 Feb 22

The Raspberry Pi is equipped with a powerful graphics processor, the Videocore 4. There is nothing better for building a rich and fluid graphical interface. But you have to know how to program it ;-)

In this post, I show you how to program it to display three overlapping triangles, as in the screenshot below (taken on my Raspberry Pi).

(Screenshot: the three triangles)

Comments :

Aran (webmaster)
2019 Feb 22 21:16

First of all, I want to credit my sources, without which I could never have got this far: the work of Peter Lemon, and the Videocore bible published by Broadcom.

1st step: initializing the Framebuffer (framebufferInit)

The visible screen is a memory area called the framebuffer. It is initialized via a message sent to the mailbox. Among other things, this message indicates the width and height of the screen and the number of color bits per pixel. Below is the program to initialize the framebuffer.

; setting the mailbox message
mov r0,1920
mov r1,1080
mov r2,16
str r0,[framebuf.width]
str r0,[framebuf.widthV]
str r1,[framebuf.height]
str r1,[framebuf.heightV]
str r2,[framebuf.bits] 

mov r0,MAIL_TAGS  ; mailbox channel 8
mov r1,framebuf   ; message address
bl mboxCall

align 16
framebuf  MboxFB

struc MboxFB
{
  dw .fin - $
  dw $00000000

  dw Set_Physical_Display
  dw $00000008, $00000008
  .width dw 0
  .height dw 0

  dw Set_Virtual_Buffer
  dw $00000008, $00000008
  .widthV dw 0
  .heightV dw 0

  dw Set_Depth
  dw $00000004, $00000004
  .bits dw 0

  dw Set_Virtual_Offset
  dw $00000008, $00000008
  dw 0
  dw 0

  dw Allocate_Buffer
  dw $00000008, $00000008
  .ptr dw 0
  dw 0
 
  dw $00000000
  .fin:
}
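
To make the layout of this message easier to see, here is a small Python sketch that packs the same sequence of tags into bytes. It is only an illustration: the numeric tag identifiers come from the public Raspberry Pi mailbox property interface documentation (the post only shows their symbolic names), and the helper names tag and build_framebuffer_message are mine. On the real hardware the buffer must also be 16-byte aligned, as the align 16 above ensures.

```python
import struct

# Tag identifiers from the public Raspberry Pi mailbox property interface
# documentation (assumed here; the post only shows their symbolic names).
SET_PHYSICAL_DISPLAY = 0x00048003
SET_VIRTUAL_BUFFER = 0x00048004
SET_DEPTH = 0x00048005
SET_VIRTUAL_OFFSET = 0x00048009
ALLOCATE_BUFFER = 0x00040001


def tag(tag_id, *values):
    """One tag: id, value buffer size, request length, then the values."""
    size = 4 * len(values)
    return struct.pack("<3I", tag_id, size, size) + struct.pack(f"<{len(values)}I", *values)


def build_framebuffer_message(width, height, bits):
    """Pack the same tag sequence as the MboxFB structure (hypothetical helper)."""
    body = b"".join([
        tag(SET_PHYSICAL_DISPLAY, width, height),
        tag(SET_VIRTUAL_BUFFER, width, height),
        tag(SET_DEPTH, bits),
        tag(SET_VIRTUAL_OFFSET, 0, 0),
        tag(ALLOCATE_BUFFER, 0, 0),
        struct.pack("<I", 0),  # end tag
    ])
    total = 8 + len(body)  # the header is two words: total size and request code
    return struct.pack("<2I", total, 0) + body


message = build_framebuffer_message(1920, 1080, 16)
print(len(message), "bytes,", len(message) // 4, "words")
```

The first word plays the role of the dw .fin - $ line: it holds the size of the whole message, end tag included.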

After initialization, the mailbox returns the address of the framebuffer in the message. It is a bus address, so it must be converted into a physical address, as in the following code.

ldr r0,[framebuf.ptr]
and r0,$3FFFFFFF
str r0,[framebuf.ptr]
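
The conversion is just a mask that clears the two top bits of the address. A minimal Python sketch of the same operation, with a made-up bus address as input (the real value depends on your memory split):

```python
BUS_ALIAS_MASK = 0x3FFFFFFF  # same mask as the `and r0,$3FFFFFFF` above


def bus_to_physical(bus_address):
    """Clear the two top bits, which select the VideoCore bus alias."""
    return bus_address & BUS_ALIAS_MASK


# Hypothetical address, in the form the mailbox might return it
example = 0xDE000000
print(hex(example), "->", hex(bus_to_physical(example)))
```

An address that is already physical is left unchanged by the mask, so applying it twice is harmless.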

To know more about the mailboxes it's here.

2nd step: initialisation of Videocore (v3dInit)

The QPU is activated via a message sent to the mailbox. If the initialization is successful, reading the V3D_IDENT0 register into r0 returns the magic number 0x02443356.

mov r0,MAIL_TAGS  ; mailbox channel 8
mov r1,mboxV3D  ; message address
bl mboxCall

; magic number read
mov r4,PERIPHERAL_BASE + V3D_BASE
ldr r0,[r4,V3D_IDENT0]

align 16
mboxV3D  MboxV3D

struc MboxV3D
{
  dw .fin - $
  dw $00000000

  dw Set_Clock_Rate
  dw $00000008
  dw $00000008
  dw CLK_V3D_ID
  dw 250*1000*1000

  dw Enable_QPU
  dw $00000004
  dw $00000004
  dw 1

  dw $00000000
 .fin:
}
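
The magic number is not random. Read as little-endian bytes, it spells an identifier followed by one extra byte, which I take to be the version of the V3D block (that reading of the top byte is my interpretation of the Videocore guide, not something the code above relies on). A quick Python check:

```python
MAGIC = 0x02443356  # value expected in V3D_IDENT0, from the post


def decode_ident0(value):
    """Split the register into its three ASCII bytes and the top byte."""
    raw = value.to_bytes(4, "little")
    return raw[:3].decode("ascii"), raw[3]


ident, version = decode_ident0(MAGIC)
print(ident, version)
```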

3rd step: generation of the binner control list (v3dBinnerPrep)

The Videocore does not write directly to the visible video memory (the framebuffer), but rather to a memory area composed of tiles. This system saves bandwidth, because only the tiles that a figure covers are updated.

The control lists are used to format this memory area. These are codes that will tell the Binner how to proceed. The control lists are detailed in the manual VideoCore IV 3D Architecture Reference Guide section 9.

Here is the structure with all the necessary codes, which we will use later.

struc V3DControlListBinner
{
  .tile			Tile_Binning_Mode_Configuration
  Start_Tile_Binning 
  Increment_Semaphore
  .clip			Clip_Window
  .conf			Configuration_Bits
  .viewport		Viewport_Offset
  .nvShader		NV_Shader_State		; address of vertices
  .vertex			Vertex_Array_Primitives	; info on the shape to draw
  Flush
  .end:
}
virtual at 0
  oVCLBin	V3DControlListBinner
end virtual

We start by indicating how many tiles there are in width and height, and the memory areas used, with the code Tile Binning Mode Configuration.

align 16
v3dCLBin  V3DControlListBinner

mov r4,v3dCLBin

mov r0,VBIN_TILE            ; tile allocation memory address
add r1,r4,oVCLBin.tile.address
strNotAlign32 r0,r1
 
mov r0,0x20000             ; allocated size, at least size state tile * number of tiles
add r1,r4,oVCLBin.tile.size       ; 48 * (1920/32) * (1080/32) = 48 * 60 * 33 = 0x17340
strNotAlign32 r0,r1
 
mov r0,VBIN_TILE_STATE         ; address of tile state
add r1,r4,oVCLBin.tile.baseaddress
strNotAlign32 r0,r1
 
lsr r0,screenW,5               ; number of tiles in width 
strb r0,[r4,oVCLBin.tile.tileWidth]
lsr r0,screenH,5               ; number of tiles in height  
strb r0,[r4,oVCLBin.tile.tileHeight]
 
mov r0,Auto_Initialise_Tile_State_Data_Array + Multisample_Mode_4X
strb r0,[r4,oVCLBin.tile.data]
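
The tile arithmetic deserves a quick check, because 1080 is not a multiple of 32. The following Python sketch only redoes the calculation; the 48 bytes per tile state is the figure quoted in the code comment, and the function names are mine. It shows the tile count obtained with the shift used above (rounded down) next to the count that would also cover the partial row at the bottom of the screen.

```python
TILE_SIZE = 32         # tile edge in pixels, matching the `lsr ...,5` shifts
TILE_STATE_BYTES = 48  # per-tile state size quoted in the code comment


def tiles_floor(pixels):
    """Tile count as computed by `lsr r0,pixels,5` (rounds down)."""
    return pixels >> 5


def tiles_ceil(pixels):
    """Tile count that also covers a partial tile at the edge."""
    return (pixels + TILE_SIZE - 1) // TILE_SIZE


width, height = 1920, 1080
columns = tiles_floor(width)
rows_down, rows_up = tiles_floor(height), tiles_ceil(height)

print(columns, rows_down, rows_up)
print(hex(TILE_STATE_BYTES * columns * rows_down))
print(hex(TILE_STATE_BYTES * columns * rows_up))
```

Either way the result stays below the 0x20000 bytes allocated above, so the reserved area is large enough in both cases.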

You will notice that I use the strNotAlign16 and strNotAlign32 functions to write a 16-bit or 32-bit register to a non-aligned memory address.
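
The code of these two functions is not shown here. One plausible way to do it, and this is only my assumption about the technique, is to write the value one byte at a time in little-endian order, which never requires an aligned address. In Python, with a bytearray standing in for memory:

```python
def store_unaligned(memory, offset, value, size):
    """Write `value` as `size` little-endian bytes, one byte at a time."""
    for i in range(size):
        memory[offset + i] = (value >> (8 * i)) & 0xFF


buffer = bytearray(8)
store_unaligned(buffer, 1, 0x12345678, 4)  # 32-bit value at an odd offset
store_unaligned(buffer, 5, 0xABCD, 2)      # 16-bit value right after it
print(buffer.hex())
```

This is needed because the control codes are packed back to back: a one-byte identifier is often followed directly by a 16-bit or 32-bit field.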

Then we specify the code Start Tile Binning, which indicates the beginning of the binning list.

The code Clip Window restricts the visible area on the screen. We specify the origin, and the width / height of the window. Here the whole screen is used.

mov r0,0			
add r1,r4,oVCLBin.clip.left		; origin x left
strNotAlign16 r0,r1

add r1,r4,oVCLBin.clip.bottom	; origin y at the bottom
strNotAlign16 r0,r1

add r1,r4,oVCLBin.clip.width		; width in pixels
strNotAlign16 screenW,r1

add r1,r4,oVCLBin.clip.height		; height in pixels
strNotAlign16 screenH,r1	

The Configuration Bits control code enables 4x mode for anti-aliasing. It also enables the Z buffer test, which handles the superposition of pixels when figures overlap. Here the function Depth_Test_Function_GE keeps the pixel with the largest Z parameter.

mov r0,Rasteriser_Oversample_Mode_4X + Enable_Forward_Facing_Primitive
strb r0,[r4,oVCLBin.conf.data8]

mov r0,Depth_Test_Function_GE + Z_Updates_Enable
add r1,r4,oVCLBin.conf.data16
strNotAlign16 r0,r1

Then there is the code Viewport Offset, which I do not use at the moment, so it is zeroed.

mov r0,0
add r1,r4,oVCLBin.viewport.x
strNotAlign16 r0,r1

add r1,r4,oVCLBin.viewport.y
strNotAlign16 r0,r1

The NV Shader State control code indicates the address of the Shader Record. This record gives all the information on the shapes to draw, such as the vertices that make up the shapes, and the programs to execute, for example to shade the colors. We will see all this later.

Videocore 4 supports OpenGL and OpenVG. The 3D system can therefore operate in three modes: GL, NV and VG, as shown in the Videocore Guide, page 63. We choose the GL mode, which offers the most processing possibilities, with the Vertex Array Primitives control code.
This command defines three items: the primitive mode, the number of vertices, and the index of the first vertex.

Here is the corresponding code :

mov r0,Mode_Triangles
strb r0,[r4,oVCLBin.vertex.data]	

mov r0,3*3			; number of vertices of the triangle * 3 triangles
add r1,r4,oVCLBin.vertex.length
strNotAlign32 r0,r1

mov r0,0
add r1,r4,oVCLBin.vertex.index
strNotAlign32 r0,r1

The various OpenGL primitives available to us :

(Image: OpenGL primitives)

The binning control list ends with the command Flush.

4th step: generation of the Render control list (v3dRenderPrep)

The binning phase automatically generates a new control list, which will be used by the Render. The Render can then generate the tiles accordingly.

In addition, a new control list is added to indicate the clear colors, to set up the tiles in memory, and to trigger the update of the framebuffer for each tile, when necessary. The following structure indicates the different control codes used to achieve this :

struc V3DControlListRender
{
	Wait_On_Semaphore
	.clrColor		Clear_Colors
	.tileMode		Tile_Rendering_Mode_Configuration
	.tileCoor		Tile_Coordinates
	.strTbufG		Store_Tile_Buffer_General

	times (1920/32) * (1080/32+2) * (sizeof.VCLRenZ) db 0
}
virtual at 0
  oVCLRen V3DControlListRender
end virtual

We start by indicating the colors of the background and the mask with the command Clear Colors. Here, black.

mov r0,0
add r1,r4,oVCLRen.clrColor.clearcolor1
strNotAlign32 r0,r1

add r1,r4,oVCLRen.clrColor.clearcolor2
strNotAlign32 r0,r1

mov r0,0
add r1,r4,oVCLRen.clrColor.clearVGZS
strNotAlign32 r0,r1
strb r0,[r4,oVCLRen.clrColor.clearstencil]

The Tile Rendering Mode Configuration command specifies the framebuffer address, screen width and height, the number of bits per pixel, and the rendering mode.

ldr r0,[framebuf.ptr]			; framebuffer address
add r1,r4,oVCLRen.tileMode.address	
strNotAlign32 r0,r1

add r1,r4,oVCLRen.tileMode.width			; screen width in pixels
strNotAlign16 screenW,r1

add r1,r4,oVCLRen.tileMode.height		; screen height in pixels
strNotAlign16 screenH,r1

mov r0,Frame_Buffer_Color_Format_BGR565_No_Dither + Multisample_Mode_4X
add r1,r4,oVCLRen.tileMode.data			; mode
strNotAlign16 r0,r1	

The Tile Coordinates and Store Tile Buffer General commands are then used to clear the screen. For this we give them 0 as parameters, thanks to the following structures:

struc Tile_Coordinates
{
	.id		db $73
	.column	db 0		; Tile Column Number
	.row		db 0		; Tile Row Number
}

struc Store_Tile_Buffer_General
{
	.id			db $1C
	.data16		dh 0
	.addrData32	dw 0	; Memory Base Address Of Frame
}
virtual at 0
  oStrBufG Store_Tile_Buffer_General
	sizeof.StrBufG = $ - oStrBufG
end virtual

As you can see in the V3DControlListRender structure, I left some memory space with the times command. This will allow us to add additional control codes. Control codes must be written for each tile, and for that we will program a loop.

Here is the information to indicate for each tile: the coordinates of the tile, a jump to the sub-list that the binning generated for this tile, and the command that stores the tile in the framebuffer.

We will use a structure to point to these codes :

struc V3DControlListRender1
{
	.tileCoor		Tile_Coordinates
	.bSubList		Branch_To_Sub_List
	.MSstr		Store_Multi_Sample
}
virtual at 0
  oVCLRen1 V3DControlListRender1
	sizeof.VCLRen1 = $ - oVCLRen1
end virtual

Which gives us the following nice loop :

; to point after the header in the structure
add r4,sizeof.VCLRenH

; tile configuration
tx equ r10
ty equ r11
mov tx,0
mov ty,0
ldr r3,[framebuf.ptr]

.DO1:
;{
	.DO2:
;	{
		; coordinate of the tile
		mov r0,Tile_Coordinates_id
		strb r0,[r4,oVCLRen1.tileCoor.id]		
		strb tx,[r4,oVCLRen1.tileCoor.column]		
		strb ty,[r4,oVCLRen1.tileCoor.row]
		
		; memory address of the tile
		mov r0,VBIN_TILE			; tile allocation memory address
		mul r2,nTuileW,ty
		add r2,tx
		lsl r2,5
		add r2,r0				; r2 : tile address for Branch_To_Sub_List
		
		; writing the jump to the address of the tile
		mov r0,Branch_To_Sub_List_id
		strb r0,[r4,oVCLRen1.bSubList.id]	
		add r0,r4,oVCLRen1.bSubList.address
		strNotAlign32 r2,r0

		; writing the tile
		mov r0,Store_Multi_Sample_id
		strb r0,[r4,oVCLRen1.MSstr]			
		
		; next tile
		add r4,sizeof.VCLRen1
		add tx,1
;	}
	cmp tx,nTuileW
	blo .DO2
	
	; end line reached, next line
	add ty,1
	mov tx,0
;}
cmp ty,nTuileH
blo .DO1

; last tile modification
mov r0,Store_Multi_Sample_End_id
strb r0,[r4, -1]	

; backup address end structure
mov r0,v3dAddrEnd
str r4,[r0]	
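
To check what the loop produces, here is a Python sketch that builds the same bytes in a bytearray. Several things in it are assumptions: the Tile_Coordinates code $73 comes from the structure above, but the three other control codes are taken from the Videocore guide, the VBIN_TILE address is a made-up value, and the entry layout (3 + 5 + 1 bytes) is deduced from the stores in the loop.

```python
import struct

# Control codes: Tile_Coordinates is $73 in the post; the other three values
# are assumed from the VideoCore IV guide and are not shown in the post.
TILE_COORDINATES = 0x73
BRANCH_TO_SUB_LIST = 0x11
STORE_MULTI_SAMPLE = 0x18
STORE_MULTI_SAMPLE_END = 0x19

VBIN_TILE = 0x00400000  # hypothetical tile allocation address


def build_tile_list(tiles_w, tiles_h, tile_base=VBIN_TILE):
    """Emit one 9-byte entry per tile, row by row, like the assembly loop."""
    out = bytearray()
    for ty in range(tiles_h):
        for tx in range(tiles_w):
            # Same address computation as the loop: 32 bytes per tile
            address = tile_base + ((ty * tiles_w + tx) << 5)
            out += struct.pack("<BBB", TILE_COORDINATES, tx, ty)
            out += struct.pack("<BI", BRANCH_TO_SUB_LIST, address)
            out += struct.pack("<B", STORE_MULTI_SAMPLE)
    out[-1] = STORE_MULTI_SAMPLE_END  # the last tile also signals end of frame
    return bytes(out)


tile_list = build_tile_list(60, 33)
print(len(tile_list), "bytes for", 60 * 33, "tiles")
```

The out[-1] line corresponds to the strb r0,[r4,-1] above: only the very last store command is replaced by its "end of frame" variant.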

5th step: preparation of the shader, vertex and fragment data (v3dShaderPrep)

In step 3, the binning was given the address of the Shader State. This is where we configure that record. It specifies all the data needed for OpenGL processing. In our case we choose an NV Shader State Record, as indicated in the Videocore Guide page 80, table 46. Here is the corresponding structure :

struc NV_Shader_State_Record
{
	.flag			db 0			; Flag Bits: 0 = Fragment Shader Is Single Threaded,
							; 1 = Point Size Included In Shaded Vertex Data, 2 = Enable Clipping,
							; 3 = Clip Coordinates Header Included In Shaded Vertex Data
	.stride			db 0		; Shaded Vertex Data Stride
	.nbrUniform		db 0		; Fragment Shader Number Of Uniforms (Not Used Currently)
	.nbrVarying		db 0		; Fragment Shader Number Of Varyings
	.addrCode		dw 0	; Fragment Shader Code Address
	.addrUniform		dw 0	; Fragment Shader Uniforms Address
	.addrData		dw 0	; Shaded Vertex Data Address (128-Bit Aligned If Including Clip Coordinate Header)
}
virtual at 0
  oNVShaderState NV_Shader_State_Record
end virtual

The Shaded Vertex Data Stride indicates the size of a vertex entry. An entry is composed of the x, y, z, w coordinates, as well as the RGB colors, as in the following structure :

struc Vertex_Data_Entry_Varying
{
	.x				dh 0				; X In 12.4 Fixed Point
	.y				dh 0				; Y In 12.4 Fixed Point
	.z				dw 1.0			; Z
	.w				dw 1.0			; 1 / W
	.varying0			dw 0.0			; Varying 0 (Red)
	.varying1			dw 0.0			; Varying 1 (Green)
	.varying2			dw 0.0			; Varying 2 (Blue)
}
virtual at 0
  oVertexDataEV Vertex_Data_Entry_Varying
	sizeof.VertexDataEntryVarying = $ - oVertexDataEV
end virtual
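
To get a feel for this entry, here is a Python sketch that packs one vertex with the struct module and prints the resulting stride. I assume that dh is a 16-bit value and dw a 32-bit value, as the 12.4 fixed-point comments suggest, and I pack x and y as signed 16-bit integers (for on-screen coordinates the sign makes no difference). The example values, a red vertex at pixel (620, 150), are only there for illustration.

```python
import struct

# x, y as 16-bit integers, then z, 1/w and three varyings as 32-bit floats
VERTEX_FORMAT = "<hhfffff"


def to_fixed_12_4(pixels):
    """Convert a pixel coordinate to 12.4 fixed point (4 fractional bits)."""
    return int(round(pixels * 16))


def pack_vertex(x, y, z, inv_w, red, green, blue):
    """Pack one vertex entry in the field order of the structure above."""
    return struct.pack(VERTEX_FORMAT, to_fixed_12_4(x), to_fixed_12_4(y),
                       z, inv_w, red, green, blue)


vertex = pack_vertex(620, 150, 0.9, 1.0, 1.0, 0.0, 0.0)
print(struct.calcsize(VERTEX_FORMAT), to_fixed_12_4(620), to_fixed_12_4(150))
```

Multiplying by 16 is the same as shifting left by 4 bits, which is why the coordinates are written as a pixel value times 16 in the vertex data.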

The Number Of Varyings matches the three RGB color components.

The NV Shader State Record also indicates the address of the data, i.e. the coordinates and varyings of each vertex, via the Shaded Vertex Data Address entry.
In addition, the data is processed by a program whose address is specified by the Fragment Shader Code Address entry.

Here is the program to set these inputs :

mov r4,nvShaderState										
mov r5,vertexData
mov r6,fragmentShaderCode

mov r1,BUS_ADDRESSES_l2CACHE_DISABLED
mov r0,sizeof.VertexDataEntryVarying		; Shaded Vertex Data Stride
strb r0,[r4,oNVShaderState.stride]
mov r0,3							; Fragment Shader Number Of Varyings
strb r0,[r4,oNVShaderState.nbrVarying]

mov r0,r6							; Fragment Shader Code Address
add r0,r1
str r0,[r4,oNVShaderState.addrCode]			

mov r0,r5							; Shaded Vertex Data Address
add r0,r1
str r0,[r4,oNVShaderState.addrData]	

align 16
nvShaderState			NV_Shader_State_Record
align 16
vertexData				Vertex_Data
align 16
fragmentShaderCode		Fragment_Shader_Code

Come on, one more effort, we're almost there ;-)

The data

That's all very nice, but where are the coordinates of my three triangles? Well, at the Shaded Vertex Data Address, which has just been programmed.
For the three red, green and blue triangles of the screenshot, I provided the following data :

struc Vertex_Data
{
  ; Vertex: Top Left
  dh 620 * 16 ; X In 12.4 Fixed Point
  dh 150 * 16 ; Y In 12.4 Fixed Point
  dw 0.9 ; Z
  dw 1.0 ; 1 / W
  dw 1.0 ; Varying 0 (Red)
  dw 0.0 ; Varying 1 (Green)
  dw 0.0 ; Varying 2 (Blue)	

  ; Vertex: Top Right
  dh 1220 * 16 ; X In 12.4 Fixed Point
  dh 130 * 16 ; Y In 12.4 Fixed Point
  dw 0.9 ; Z
  dw 1.0 ; 1 / W
  dw 1.0 ; Varying 0 (Red)
  dw 0.0 ; Varying 1 (Green)
  dw 0.0 ; Varying 2 (Blue)	

  ; Vertex: Bottom Right
  dh 500 * 16 ; X In 12.4 Fixed Point
  dh 962 * 16 ; Y In 12.4 Fixed Point
  dw 0.9 ; Z
  dw 1.0 ; 1 / W
  dw 1.0 ; Varying 0 (Red)
  dw 0.0 ; Varying 1 (Green)
  dw 0.0 ; Varying 2 (Blue)
	
  ; Vertex: Top Left
  dh 212 * 16 ; X In 12.4 Fixed Point
  dh 350 * 16 ; Y In 12.4 Fixed Point
  dw 0.8 ; Z
  dw 1.0 ; 1 / W
  dw 0.0 ; Varying 0 (Red)
  dw 0.0 ; Varying 1 (Green)
  dw 1.0 ; Varying 2 (Blue)	

  ; Vertex: Top Right
  dh 1720 * 16 ; X In 12.4 Fixed Point
  dh 130 * 16 ; Y In 12.4 Fixed Point
  dw 0.8 ; Z
  dw 1.0 ; 1 / W
  dw 0.0 ; Varying 0 (Red)
  dw 0.0 ; Varying 1 (Green)
  dw 1.0 ; Varying 2 (Blue)
	
  ; Vertex: Bottom Right
  dh 720 * 16 ; X In 12.4 Fixed Point
  dh 962 * 16 ; Y In 12.4 Fixed Point
  dw 0.8 ; Z
  dw 1.0 ; 1 / W
  dw 0.0 ; Varying 0 (Red)
  dw 0.0 ; Varying 1 (Green)
  dw 1.0 ; Varying 2 (Blue)
	
  ; Vertex: Top Left
  dh 212 * 16 ; X In 12.4 Fixed Point
  dh 150 * 16 ; Y In 12.4 Fixed Point
  dw 0.85 ; Z
  dw 1.0 ; 1 / W
  dw 0.0 ; Varying 0 (Red)
  dw 1.0 ; Varying 1 (Green)
  dw 0.0 ; Varying 2 (Blue)	

  ; Vertex: Top Right
  dh 1420 * 16 ; X In 12.4 Fixed Point
  dh 730 * 16 ; Y In 12.4 Fixed Point
  dw 0.85 ; Z
  dw 1.0 ; 1 / W
  dw 0.0 ; Varying 0 (Red)
  dw 1.0 ; Varying 1 (Green)
  dw 0.0 ; Varying 2 (Blue)
	
  ; Vertex: Bottom Right
  dh 1220 * 16 ; X In 12.4 Fixed Point
  dh 962 * 16 ; Y In 12.4 Fixed Point
  dw 0.85 ; Z
  dw 1.0 ; 1 / W
  dw 0.0 ; Varying 0 (Red)
  dw 1.0 ; Varying 1 (Green)
  dw 0.0 ; Varying 2 (Blue)
	
}
virtual at 0
  oVertexData Vertex_Data
	sizeof.VertexData = $ - oVertexData
end virtual

Each triangle is thus composed of three vertices. For each vertex we specify its coordinates x, y, z, w and its RGB color. The important parameter here is the Z, which defines the depth between 1.0 and -1.0.
The three triangles do not have the same depth Z. As they overlap, it is necessary to determine which one appears in the foreground. It is the triangle with the largest Z that is in front of the others.

Here the red triangle has a depth Z of 0.9, which is greater than that of the blue triangle (0.8), so the red triangle is in front of the blue one.
The green triangle is between red and blue because it has a Z depth of 0.85.

And this superposition remains the same, whatever the order in which the triangles are drawn.
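
This independence from the drawing order can be checked with a tiny simulation of the depth test for a single pixel covered by all three triangles. It is a simplified model, not the real hardware: I assume the depth buffer starts at 0 (the value written in the clear command) and that a pixel is kept when its Z is greater than or equal to the stored one.

```python
from itertools import permutations

TRIANGLES = {"red": 0.9, "green": 0.85, "blue": 0.8}  # Z values from the vertex data


def visible_color(draw_order, initial_z=0.0):
    """Resolve one pixel covered by every triangle, using a >= depth test."""
    depth, color = initial_z, None
    for name in draw_order:
        z = TRIANGLES[name]
        if z >= depth:  # Depth_Test_Function_GE: keep the larger Z
            depth, color = z, name
    return color


for order in permutations(TRIANGLES):
    print(order, "->", visible_color(order))
```

Every one of the six possible orders ends with the same triangle on top, because the test only compares Z values and never looks at which triangle came first.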

The program

The program tells Videocore what to do with the vertex data. One can, for example, make a color gradient with the varyings. But also, and above all, it takes into account the depth Z that we configured previously !

Here is the magic program :

struc Fragment_Shader_Code
{
	dw $958e0dbf, $d1724823			; load_sm   ; mov r0, vary    ; mov r3.8d, 1.0
	dw $818e7176, $40024821			; sbwait    ; fadd r0, r0, r5 ; mov r1, vary
	dw $818e7376, $10024862			; fadd r1, r1, r5 ; mov r2, vary     
	dw $819e7540, $114248a3			; fadd r2, r2, r5 ; mov r3.8a, r0
	dw $809e7009, $115049e3			; nop             ; mov r3.8b, r1
	dw $809e7012, $116049e3			; nop             ; mov r3.8c, r2
	dw $159cffc0, $10020b27			; mov tlb_z, rb15 ; nop
	dw $159e76c0, $30020ba7			; mov tlbc, r3; nop; thrend
	dw $009e7000, $100009e7			; nop; nop; nop
	dw $009e7000, $500009e7			; nop; nop; sbdone
}
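
The comments show that the three varyings end up in the bytes 8a, 8b and 8c of r3, with 1.0 in byte 8d, before r3 is written to tlbc. Here is a rough Python model of that packing. It rests on two assumptions of mine: that 8a is the least significant byte, and that a float between 0.0 and 1.0 is scaled to 0..255 with clamping. The exact rounding done by the hardware is not modeled.

```python
def to_byte(value):
    """Scale a float in [0.0, 1.0] to 0..255, clamping out-of-range input."""
    return max(0, min(255, round(value * 255)))


def pack_color(red, green, blue, alpha=1.0):
    """Mirror the r3.8a/8b/8c/8d moves: one byte per channel in a 32-bit word."""
    return (to_byte(red)
            | to_byte(green) << 8
            | to_byte(blue) << 16
            | to_byte(alpha) << 24)


print(hex(pack_color(1.0, 0.0, 0.0)))
```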

6th step: running the Binning and Render (v3dBinnerRun & v3dRenderRun)

At this stage, we still have nothing on the screen ! So far, we have only prepared the data for Videocore. We must now launch the processes that will perform the binning and the rendering. Here are the corresponding programs :

v3dBinnerRun

mov r4,PERIPHERAL_BASE + V3D_BASE

mov r0,1
str r0,[r4,V3D_BFC]			; reset the flush counter
str r0,[r4,V3D_RFC]			; reset the frame counter

; thread 0 configuration
mov r0,v3dCLBin				; address of the control list
str r0,[r4,V3D_CT0CA]	
mov r0,v3dCLBin.end			; end address of the control list
str r0,[r4,V3D_CT0EA]			; thread execution

; waiting for thread to finish
.DO1:
	ldr r0,[r4,V3D_BFC]		; flush counter
	tst r0,1					; test if the PTB emptied all lists of tiles in memory
beq .DO1

v3dRenderRun

mov r4,PERIPHERAL_BASE + V3D_BASE

; thread 1 configuration
mov r0,v3dCLRen				; address of the control list
str r0,[r4,V3D_CT1CA]
mov r1,v3dAddrEnd
ldr r0,[r1]						; end address of the control list
str r0,[r4,V3D_CT1EA]			; thread execution

; waiting for thread to finish
.DO1:
	ldr r0,[r4,V3D_RFC]		; frame counter
	tst r0,1					; test if the last tile storage operation is complete
beq .DO1

Last step: displaying the triangles

That's it, we have all the functions needed to display three superimposed triangles as in the screenshot. Here are the functions to execute, in the correct order :

bl v3dInit
bl framebufferInit
bl v3dShaderPrep
bl v3dBinnerPrep
bl v3dRenderPrep
bl v3dBinnerRun
bl v3dRenderRun	

Demo

To test the result directly on your Raspberry, I made you a little demo, which displays the triangles in the following order red, blue, then green.

Simply unzip the file to a blank SD card, and restart the Raspberry Pi : demo1.zip

Voila, your comments are welcome :-)
