Projecting 3D Points into a 2D Screen

Skann.ai
14 min read · Sep 21, 2023


Imagine a painter creating a masterpiece on a two-dimensional canvas that seems to leap into life. In a similar vein, computer graphics uses mathematics to transform a three-dimensional scene into a convincing two-dimensional image. This transformation projects a 3D point from world coordinates onto a 2D image plane. In this article, we will walk through the process step by step and write simple Python code that carries out the transformation.

What is the format of the point we are converting, and what are we trying to obtain?

[Figure: Conversion of a 3D point in world coordinates to a 2D point on the screen]

Our main goal is to take a 3D point in world coordinates and, through a fixed sequence of steps, obtain its 2D coordinates on the image plane.

In the 3D coordinate system, a point’s position is defined by its values along the x, y, and z axes. For example, a point at x=1, y=2, and z=3 is denoted as (1, 2, 3). However, for the sake of simplifying calculations during the transformation process, we prefer to represent a 3D point using a 4D vector. This might seem counterintuitive, but it helps with the math involved.

In this representation, we’re using a 4D vector to describe a 3D point. This choice is more about mathematical convenience, and we’ll get into the reasons behind it later. For now, just know that we’re using this 4D vector to capture the point’s location. Our end goal is to obtain a 3D vector whose first two values are the x’ and y’ coordinates of that point on the 2D image. The third value holds depth, which we will come back to in the viewport step.

In simpler terms, we’re taking a point in a 3D world, doing some calculations, and figuring out where that point would appear on a flat 2D image.
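As a minimal sketch of this representation, the conversion between a 3D point and its 4D homogeneous vector could look like the following. The helper names `to_homogeneous` and `from_homogeneous` are hypothetical and are not part of the code developed later in this article.

```python
import numpy as np

def to_homogeneous(point_3d):
    """Append a weight of 1 to a 3D point, giving a 4D homogeneous vector."""
    return np.append(np.asarray(point_3d, dtype=float), 1.0)

def from_homogeneous(point_4d):
    """Divide by the weight (last value) and drop it, recovering the 3D point."""
    point_4d = np.asarray(point_4d, dtype=float)
    return point_4d[:3] / point_4d[3]

p = to_homogeneous([1, 2, 3])
print(p)                        # [1. 2. 3. 1.]

# Scaling every value by the same nonzero number still describes the same 3D point
print(from_homogeneous(2 * p))  # [1. 2. 3.]
```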

Which transformations turn the 3D point into a 2D point?

[Figure: Conversion of a 3D point in world coordinates to a 2D point on the screen, using three matrices]

Converting a 3D point from its location in the world to a 2D point on a screen involves a process that relies on three matrices. This process is essentially a series of matrix multiplications carried out in a specific order. Each matrix performs a unique type of transformation on the point. Now, we’ll take a closer look at these matrices and delve into how each one alters the position of the 3D point in the world coordinates.

The Camera Matrix

[Figure: Aligning the world coordinate system with the camera coordinate system]

We’re aiming to find the 2D representation of a 3D point as seen by the camera, keeping in mind that the camera has its own coordinate system, distinct from the world’s. The camera can be positioned anywhere within the world and can be oriented in any direction. The camera matrix re-expresses every point relative to the camera, so that whatever the camera is looking at is projected correctly no matter where the camera is or how it is oriented. In simpler terms, all further operations are carried out in the camera’s coordinate system, thanks to this transformation. This guarantees that the camera’s view volume is always built from the camera’s own viewpoint. You may be wondering what view volume means. We will explain this concept while explaining the projection matrix.

[Figure: Camera matrix. Left matrix: rotation; right matrix: translation]

There are two essential steps involved in achieving this alignment. The first step is translation, and the second step is rotation.

In the translation step, we bring the origins of the two coordinate systems together. We do this by shifting every point so that the camera’s position ends up at the origin. To achieve this, we take the negative of the camera’s position in world coordinates and use it as the translation. Now, let’s talk about why we use a 4D vector to represent a 3D point. A plain translation can be done by adding a translation vector to the 3D point vector, but addition cannot be written as a 3×3 matrix multiplication. With a 4D vector (also known as a homogeneous vector), translation becomes a multiplication by a 4×4 matrix, so it can be chained with rotation and projection as a single sequence of matrix products.
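To make the translation step concrete, here is a small sketch. The helper name `make_translation_matrix` and the camera position used below are assumptions chosen only for illustration.

```python
import numpy as np

def make_translation_matrix(camera_position):
    """4x4 matrix that shifts points by the negative of the camera position."""
    cx, cy, cz = camera_position
    return np.array([
        [1.0, 0.0, 0.0, -cx],
        [0.0, 1.0, 0.0, -cy],
        [0.0, 0.0, 1.0, -cz],
        [0.0, 0.0, 0.0, 1.0],
    ])

camera_position = (1.0, 2.0, 3.0)         # hypothetical camera position
point = np.array([4.0, 6.0, 8.0, 1.0])    # homogeneous world point

# One matrix product replaces "point - camera_position"
shifted = make_translation_matrix(camera_position) @ point
print(shifted)  # [3. 4. 5. 1.]
```

Because the weight stays 1, the result is still a valid homogeneous point and can be passed straight into the next matrix.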

Once translation is done, the origins coincide, but the axes of the two coordinate systems might still point in different directions. To achieve complete alignment, we apply rotation. Here’s how it works: we take the unit vectors along the x, y, and z axes of the camera’s coordinate system, expressed in world coordinates, and compute the dot product of each with the translated point vector. Each dot product gives the point’s coordinate along that camera axis, so these three unit vectors form the rows of the rotation matrix.

After completing these two steps, translation and rotation, our point, initially defined in the world’s coordinate system, is now accurately represented according to the camera’s coordinate system.
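The two steps can be sketched together as follows. The function `make_camera_matrix`, the axis names `u`, `v`, `w`, and the example camera are hypothetical. The sketch assumes the three axes are orthonormal, given in world coordinates, and that the camera looks along its own -z axis (that is, along `-w`).

```python
import numpy as np

def make_camera_matrix(camera_position, u, v, w):
    """Return rotation @ translation for a camera with axes u, v, w (world coords)."""
    rotation = np.eye(4)
    rotation[0, :3] = u   # camera x axis
    rotation[1, :3] = v   # camera y axis
    rotation[2, :3] = w   # camera z axis (the camera looks along -w)

    translation = np.eye(4)
    translation[:3, 3] = -np.asarray(camera_position, dtype=float)

    # Translation is applied first, then rotation
    return rotation @ translation

# Hypothetical camera at (5, 0, 0), looking along the world -x direction
camera_matrix = make_camera_matrix(
    camera_position=(5.0, 0.0, 0.0),
    u=(0.0, 0.0, -1.0),
    v=(0.0, 1.0, 0.0),
    w=(1.0, 0.0, 0.0),
)

world_point = np.array([2.0, 1.0, 0.0, 1.0])
camera_point = camera_matrix @ world_point
# In camera coordinates the point is at (0, 1, -3): one unit up and three units in front
print(camera_point)
```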

Projection Matrix

[Figure: Conversion of view volumes to the canonical view volume]

To grasp the concept of Projection, we need to get a handle on two fundamental ideas: the “view volume” and the “canonical view volume.”

The view volume is the region of the 3D scene that the camera can see. Anything outside this volume won’t show up in the final image we generate. The canonical view volume is what we get when we transform this view volume into a cube that is centered at the coordinate origin and extends from -1 to 1 along the x, y, and z axes. Working in this standard cube makes the process more uniform and simplifies the later step of mapping it to the final image.

Projection is essentially a process that converts the view volume into this canonical view volume. Now, let’s dive into two specific types of projection: orthographic and perspective projection.

Orthographic Projection

[Figure: View volume of orthographic projection]

Orthographic projection ensures that no matter how far away an object is from the camera, its size in the image remains the same. To achieve this, we use a view volume shaped like a rectangular prism (a box). Its cross-section is the same at every depth, so an object covers the same fraction of the image whether it is near or far.

Perspective Projection

[Figure: View volume of perspective projection]

On the other hand, perspective projection comes into play when we want objects that are far from the camera to appear smaller in the image. To achieve this effect, we use a view volume shaped like a truncated pyramid (a frustum), whose cross-section grows with distance from the camera. When we transform this volume into the canonical view volume, the large far cross-section is squeezed into the same square as the small near one, so objects that are farther away naturally appear smaller than those that are closer.

Now, let’s delve into projection matrices, starting with the orthographic projection matrix.

Orthographic Projection Matrix

[Figure: Orthographic projection]

[Figure: Orthographic projection matrix]

The purpose of the orthographic projection matrix is to position points within the canonical view volume without altering their proportions. In other words, it ensures that the relative sizes and shapes of objects in the view volume are preserved. I won’t go into the specifics of how this matrix is constructed here, but if you’re interested, you can explore more about it from various sources.
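As a quick check of what this matrix does, the sketch below uses the same orthographic matrix that appears in the Python code later in this article and applies it to two opposite corners of a box-shaped view volume. It assumes the article's convention that `near` and `far` are positive distances and the camera looks along -z, so the near plane sits at z = -near.

```python
import numpy as np

def orthographic_projection(left, right, bottom, top, near, far):
    """Map the box [left, right] x [bottom, top] x [-far, -near] to the cube [-1, 1]^3."""
    return np.array([
        [2 / (right - left), 0, 0, -(right + left) / (right - left)],
        [0, 2 / (top - bottom), 0, -(top + bottom) / (top - bottom)],
        [0, 0, -2 / (far - near), -(far + near) / (far - near)],
        [0, 0, 0, 1],
    ])

matrix = orthographic_projection(-3, 3, -3, 3, 5, 20)

# Opposite corners of the view volume land on opposite corners of the cube
near_corner = matrix @ np.array([-3, -3, -5, 1])
far_corner = matrix @ np.array([3, 3, -20, 1])
print(near_corner)  # x, y, z all equal -1
print(far_corner)   # x, y, z all equal +1

# The same (x, y) at two different depths gets the same projected (x, y)
close = matrix @ np.array([2, 2, -6, 1])
distant = matrix @ np.array([2, 2, -18, 1])
print(np.allclose(close[:2], distant[:2]))  # True
```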

Perspective Projection Matrix

[Figure: Overview of perspective projection]

[Figure: Perspective projection matrix]

Perspective projection involves two main steps.

In the first step, we convert the perspective view volume into an orthographic view volume. During this stage, objects that are far away appear smaller compared to objects that are closer. Although I won’t go into the details of how this matrix is created here, there are resources available if you’re interested in learning more about it.

[Figure: Matrix that converts the perspective view volume to the orthographic view volume]

Once we’ve accomplished the transformation using this matrix, our points go from being situated in the perspective view volume to being positioned in the orthographic view volume. You can think of this step as a process of compressing the view volume. Distant objects are compressed more, while closer objects undergo less compression.

The second step involves converting the orthographic view volume into the canonical view volume. This is the same process we discussed earlier in relation to orthographic projection. By applying the matrix we mentioned earlier, we complete the transformation from the perspective view volume to the final canonical view volume.
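The two-step construction can be checked numerically. The sketch below is an assumption-laden illustration: the function `perspective_to_orthographic` is a hypothetical helper whose matrix is written to match the conventions of the Python code later in this article (positive `near` and `far`, camera looking along -z), and it may differ in sign from the matrix shown in the figure if that figure uses another convention. The other two functions are the projection matrices from the later code.

```python
import numpy as np

def perspective_to_orthographic(near, far):
    """Squeeze the frustum into a box; x and y are scaled by near / (-z) after the divide."""
    return np.array([
        [near, 0.0, 0.0, 0.0],
        [0.0, near, 0.0, 0.0],
        [0.0, 0.0, near + far, near * far],
        [0.0, 0.0, -1.0, 0.0],
    ])

def orthographic_projection(left, right, bottom, top, near, far):
    return np.array([
        [2 / (right - left), 0, 0, -(right + left) / (right - left)],
        [0, 2 / (top - bottom), 0, -(top + bottom) / (top - bottom)],
        [0, 0, -2 / (far - near), -(far + near) / (far - near)],
        [0, 0, 0, 1],
    ])

def perspective_projection(left, right, bottom, top, near, far):
    return np.array([
        [(2 * near) / (right - left), 0, (right + left) / (right - left), 0],
        [0, (2 * near) / (top - bottom), (top + bottom) / (top - bottom), 0],
        [0, 0, -(far + near) / (far - near), -(2 * far * near) / (far - near)],
        [0, 0, -1, 0],
    ])

left, right, bottom, top, near, far = -3, 3, -3, 3, 5, 20

# Step 1 followed by step 2 gives the full perspective projection matrix
squeeze = perspective_to_orthographic(near, far)
composed = orthographic_projection(left, right, bottom, top, near, far) @ squeeze
full = perspective_projection(left, right, bottom, top, near, far)
print(np.allclose(composed, full))  # True

# A point twice as far away as the near plane has its x and y halved
squeezed_point = squeeze @ np.array([2.0, 2.0, -10.0, 1.0])
squeezed_point /= squeezed_point[3]
print(squeezed_point[:2])  # [1. 1.]
```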

[Figure: Perspective projection matrix]

Once we’ve reached this stage, you might notice that the “weight” of the point we’ve obtained (the value at the bottom of the vector) is the negative of the point’s original z value. In homogeneous coordinates, multiplying or dividing all the values in a vector by the same nonzero number still represents the same point in 3D space. In simpler terms, if we take the vector resulting from the matrix multiplication and divide all its values by the weight, -z, we express that same point with a weight of 1.

Once the weight has been normalized to 1, the first three values in this vector are the point’s position within the canonical view volume (CVV). In essence, we’ve completed the transformation from the 3D world into the CVV, and these adjusted values hold the point’s location in this standardized cube-like space.
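Here is a short sketch of that normalization, using the perspective matrix from the Python code later in this article and the same sample point and view volume used there. The variable names `clip_point` and `cvv_point` are only illustrative.

```python
import numpy as np

def perspective_projection(left, right, bottom, top, near, far):
    return np.array([
        [(2 * near) / (right - left), 0, (right + left) / (right - left), 0],
        [0, (2 * near) / (top - bottom), (top + bottom) / (top - bottom), 0],
        [0, 0, -(far + near) / (far - near), -(2 * far * near) / (far - near)],
        [0, 0, -1, 0],
    ])

point = np.array([2.0, 2.0, -10.0, 1.0])
matrix = perspective_projection(-3, 3, -3, 3, 5, 20)

# Before the divide, the weight equals -z of the original point
clip_point = matrix @ point
print(clip_point[3])  # 10.0

# Dividing by the weight gives the position inside the canonical view volume
cvv_point = clip_point / clip_point[3]
print(np.allclose(cvv_point, [1 / 3, 1 / 3, 1 / 3, 1]))  # True
```

All three coordinates fall between -1 and 1, which confirms that the sample point lies inside the view volume.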

Viewport Matrix

Up to this point, we’ve effectively translated a 3D point from its position in the world to the canonical view volume (CVV) in relation to the camera’s viewpoint. In the next step, we must convert the CVV into actual screen coordinates. This step essentially tells us where a point within the CVV will be positioned on the screen.

[Figure: Conversion of the CVV to screen coordinates]

In the image provided, nx represents the screen’s width, and ny stands for the screen’s height. To carry out the mapping, we need to transform points within the [-1, 1] range along the x-axis to fit within the range of [-0.5, nx-0.5], points within the [-1, 1] range along the y-axis to fit within [-0.5, ny-0.5], and points within the [-1, 1] range along the z-axis to fit within [0, 1].

The choice to start the range at -0.5 is because the origin of the screen’s coordinate system is precisely in the middle of the pixel located at the upper left corner. If we consider the length of a pixel to be 1, then the starting point of the screen’s coordinates would be at -0.5.

We also keep the z value, because it stores depth information that tells us whether a point is in front of or behind another point. For instance, after completing the necessary transformations, let’s say we have two points: A(20, 30, 0.1) and B(20, 30, 0.5). Both land on the same pixel, and the point displayed in the final image will be A, because its depth value is smaller, meaning it is closer to the viewer.
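The comparison between A and B can be sketched as a tiny depth buffer. This is only an illustration of the idea, not part of the article's code; the dictionary-based `depth_buffer` is a hypothetical stand-in for the per-pixel depth array a real renderer would use.

```python
# Screen-space points as (x, y, depth); smaller depth means closer to the viewer
points = {"A": (20, 30, 0.1), "B": (20, 30, 0.5)}

# For every pixel, remember only the closest point seen so far
depth_buffer = {}
for name, (x, y, depth) in points.items():
    pixel = (round(x), round(y))
    if pixel not in depth_buffer or depth < depth_buffer[pixel][1]:
        depth_buffer[pixel] = (name, depth)

visible = depth_buffer[(20, 30)][0]
print(visible)  # A
```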

The transformation process is achieved using a matrix, which is provided below. If you’re curious about the origins of this matrix and how it works, you can find more information through research.

[Figure: Viewport matrix]

Notice that the viewport matrix is 3×4, so multiplying it by the 4D homogeneous point gives a 3D vector, as we desired at the beginning of the discussion.
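To see the mapping in action, the sketch below builds the viewport matrix used in the Python code later in this article and applies it to two opposite corners of the CVV. The helper name `make_viewport_matrix` and the 600×400 screen size are assumptions for the example.

```python
import numpy as np

def make_viewport_matrix(nx, ny):
    """3x4 matrix mapping x to [-0.5, nx-0.5], y to [-0.5, ny-0.5], z to [0, 1]."""
    return np.array([
        [nx / 2, 0.0, 0.0, (nx - 1) / 2],
        [0.0, ny / 2, 0.0, (ny - 1) / 2],
        [0.0, 0.0, 0.5, 0.5],
    ])

viewport = make_viewport_matrix(600, 400)

# Opposite corners of the canonical view volume
lowest = viewport @ np.array([-1.0, -1.0, -1.0, 1.0])
highest = viewport @ np.array([1.0, 1.0, 1.0, 1.0])
print(lowest)   # x = -0.5, y = -0.5, depth = 0
print(highest)  # x = 599.5, y = 399.5, depth = 1

# The center of the cube maps to the center of the screen
center = viewport @ np.array([0.0, 0.0, 0.0, 1.0])
print(center)   # x = 299.5, y = 199.5, depth = 0.5
```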

Conclusion

After all these stages, we have found where a point given in world coordinates should be located on the screen. To achieve this, we used three basic matrices: the camera matrix, the projection matrix, and the viewport matrix. We used the camera matrix to find the position of the point relative to the camera coordinate system, the projection matrix to express the point in the camera’s view volume as a point in the CVV, and the viewport matrix to determine where the point in the CVV lands on the screen. In addition, we kept the depth information, so that when several points land on the same pixel, the one closest to the viewer is the one displayed. As a result of all these steps, we have found the 3D point’s position on the 2D screen.

Simple Python Code of the Conversion

Now, let’s write a straightforward Python script that performs the described operations.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection

We import the numpy library to easily perform matrix operations. The reason we imported matplotlib.pyplot and Poly3DCollection is to see the process visually.

# Homogeneous point we want to convert
point_3d = np.array([2, 2, -10, 1])

# Type of the projection we want: "orthographic" or "perspective"
projection_type = "orthographic"

# Coordinates of the view volume (near and far are positive distances along -z)
left = -3
right = 3
bottom = -3
top = 3
near = 5
far = 20

At this stage, we take any point from the world coordinate system that we want to transform. We keep the type of projection we want to apply in a variable. You can freely choose the coordinates of the view volume. Choosing different view volumes will change the properties of the image you will get on the screen. For your chosen point to appear on the screen, it must fall within the boundaries of the view volume.
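A rough way to test that last condition before projecting is sketched below. The function `inside_view_volume` is a hypothetical helper, not part of the article's script. It assumes the point is already in camera coordinates, that the camera looks along -z, and that `left`, `right`, `bottom`, and `top` describe the cross-section at the near plane, which is how the perspective matrix in this article interprets them.

```python
def inside_view_volume(point, left, right, bottom, top, near, far, projection_type):
    """Return True if a camera-space point lies inside the view volume."""
    x, y, z = point[:3]
    depth = -z  # distance in front of the camera, since it looks along -z
    if not (near <= depth <= far):
        return False

    # A box keeps the same cross-section at every depth;
    # a frustum's cross-section grows linearly with depth
    scale = depth / near if projection_type == "perspective" else 1.0
    return (left * scale <= x <= right * scale) and (bottom * scale <= y <= top * scale)

volume = dict(left=-3, right=3, bottom=-3, top=3, near=5, far=20)

print(inside_view_volume((2, 2, -10), projection_type="orthographic", **volume))  # True
print(inside_view_volume((4, 0, -10), projection_type="orthographic", **volume))  # False
print(inside_view_volume((4, 0, -10), projection_type="perspective", **volume))   # True
```

The second and third calls show that the same point can be outside the orthographic box yet inside the perspective frustum, because at depth 10 the frustum is twice as wide as at the near plane.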

# Creating camera matrix
rotation_matrix = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1]
])

translation_matrix = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1]
])

# Translation is applied first, then rotation
camera_matrix = rotation_matrix @ translation_matrix

# Camera is at the origin of world coordinate system, looking towards -z axis

For the sake of simplicity, we assume that there is no need for translation and rotation operations, which means the world coordinate system is already aligned with the camera coordinate system.

# Projection Matrix
def orthographic_projection(left, right, bottom, top, near, far):
    # Maps the box-shaped view volume to the canonical view volume
    op_matrix = np.array([
        [2 / (right - left), 0, 0, -(right + left) / (right - left)],
        [0, 2 / (top - bottom), 0, -(top + bottom) / (top - bottom)],
        [0, 0, -2 / (far - near), -(far + near) / (far - near)],
        [0, 0, 0, 1]
    ])
    return op_matrix

def perspective_projection(left, right, bottom, top, near, far):
    # Maps the frustum-shaped view volume to the canonical view volume
    # (the result still has to be divided by its weight)
    pp_matrix = np.array([
        [(2 * near) / (right - left), 0, (right + left) / (right - left), 0],
        [0, (2 * near) / (top - bottom), (top + bottom) / (top - bottom), 0],
        [0, 0, -(far + near) / (far - near), -(2 * far * near) / (far - near)],
        [0, 0, -1, 0]
    ])
    return pp_matrix

We are creating two different functions for the two different projection types we will perform. Each function takes the necessary parameters for the view volume and turns it into projection matrices.

# ViewPort Matrix
nx = 600
ny = 600

viewport_matrix = np.array([
    [nx / 2, 0, 0, (nx - 1) / 2],
    [0, ny / 2, 0, (ny - 1) / 2],
    [0, 0, 0.5, 0.5],
])

The value “nx” corresponds to the desired screen width, while “ny” represents the screen’s height. We generate the viewport matrix according to the explanation provided previously.

# Choosing projection matrix associated with projection type
if projection_type == "orthographic":
    projection_matrix = orthographic_projection(left, right, bottom, top, near, far)
elif projection_type == "perspective":
    projection_matrix = perspective_projection(left, right, bottom, top, near, far)
else:
    raise ValueError("projection_type must be 'orthographic' or 'perspective'")

# Applying the matrices in the described order.
point_after_CM = camera_matrix @ point_3d
point_after_PM = projection_matrix @ point_after_CM

# Normalization of the projected point (divide by the weight)
point_after_PM /= point_after_PM[3]

point_after_VP = viewport_matrix @ point_after_PM
print(point_after_VP)

We obtain the projection matrix according to the projection type we have chosen. Then we apply the matrices to the 3D point in the order we described: camera, projection, viewport. After projection, we normalize the homogeneous coordinate by dividing by the weight. This is the step that scales the point according to its depth in perspective projection. Notice that this operation has no effect on orthographic projection, because the weight of point_after_PM is already equal to 1.
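For reference, the whole pipeline can also be collected into a single function. This is a compact sketch under the same assumptions as the script above (identity camera matrix by default, 600×600 screen, the same view volume); the function name `project_point` and its parameter layout are hypothetical.

```python
import numpy as np

def project_point(point, projection_type, volume, nx=600, ny=600, camera_matrix=None):
    """Project a homogeneous world point to (x, y, depth) on an nx-by-ny screen."""
    left, right, bottom, top, near, far = volume
    if camera_matrix is None:
        camera_matrix = np.eye(4)  # camera at the origin, looking along -z

    if projection_type == "orthographic":
        projection = np.array([
            [2 / (right - left), 0, 0, -(right + left) / (right - left)],
            [0, 2 / (top - bottom), 0, -(top + bottom) / (top - bottom)],
            [0, 0, -2 / (far - near), -(far + near) / (far - near)],
            [0, 0, 0, 1],
        ])
    elif projection_type == "perspective":
        projection = np.array([
            [(2 * near) / (right - left), 0, (right + left) / (right - left), 0],
            [0, (2 * near) / (top - bottom), (top + bottom) / (top - bottom), 0],
            [0, 0, -(far + near) / (far - near), -(2 * far * near) / (far - near)],
            [0, 0, -1, 0],
        ])
    else:
        raise ValueError("projection_type must be 'orthographic' or 'perspective'")

    viewport = np.array([
        [nx / 2, 0, 0, (nx - 1) / 2],
        [0, ny / 2, 0, (ny - 1) / 2],
        [0, 0, 0.5, 0.5],
    ])

    projected = projection @ (camera_matrix @ np.asarray(point, dtype=float))
    projected = projected / projected[3]  # divide by the weight
    return viewport @ projected

volume = (-3, 3, -3, 3, 5, 20)
ortho_result = project_point([2, 2, -10, 1], "orthographic", volume)
persp_result = project_point([2, 2, -10, 1], "perspective", volume)
print(ortho_result)  # x = 499.5, y = 499.5, depth = 1/3
print(persp_result)  # x = 399.5, y = 399.5, depth = 2/3
```

With perspective projection the point moves toward the center of the screen, because it lies at twice the near-plane distance and its offset from the view axis is scaled down accordingly.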

Great job! We’ve successfully determined how a 3D point gets projected onto a 2D screen. Now, let’s write code that visually demonstrates this process. To do this, we’ll choose the corner points of a cube in the 3D world, and then figure out where these points will appear on the 2D screen after projection.

cube_vertices = np.array([
    [-1, -1, -1, 1],  # Vertex 0
    [1, -1, -1, 1],   # Vertex 1
    [1, 1, -1, 1],    # Vertex 2
    [-1, 1, -1, 1],   # Vertex 3
    [-1, -1, 1, 1],   # Vertex 4
    [1, -1, 1, 1],    # Vertex 5
    [1, 1, 1, 1],     # Vertex 6
    [-1, 1, 1, 1]     # Vertex 7
])

# Translate cube vertices to center at (0, 0, -10)
# (the last value is 0 so the weight of each vertex stays 1)
translation_vector = np.array([0, 0, -10, 0])
cube_vertices = cube_vertices + translation_vector

# Each pair holds the indices of two vertices joined by an edge
cube_edges = [
    [0, 1], [1, 2], [2, 3], [3, 0],
    [4, 5], [5, 6], [6, 7], [7, 4],
    [0, 4], [1, 5], [2, 6], [3, 7]
]

In the provided code, our initial step was to generate a cube. We positioned its center at the origin and made each side 2 units long. Afterward, we shifted the cube’s center to the point (0, 0, -10) to ensure it remains within our designated view volume. The “cube_edges” list holds pairs of vertex indices; each pair names the two vertices that an edge connects.

# Create a figure with a 3D subplot and a 2D subplot
fig = plt.figure(figsize=(10, 6))
ax3d = fig.add_subplot(121, projection='3d')
ax2d = fig.add_subplot(122)

# Plot the cube in 3D
for edge in cube_edges:
    ax3d.plot(cube_vertices[edge, 0], cube_vertices[edge, 1], cube_vertices[edge, 2], color='blue')

# Transformed cube vertices after camera and projection matrices
# (the transpose puts one vertex in each column)
cube_after_CM = camera_matrix @ cube_vertices.T
cube_after_PM = projection_matrix @ cube_after_CM
cube_after_PM /= cube_after_PM[3]
cube_after_VP = viewport_matrix @ cube_after_PM

# Plot the projected cube in 2D
for edge in cube_edges:
    start_idx, end_idx = edge
    start_point = cube_after_VP[:2, start_idx]
    end_point = cube_after_VP[:2, end_idx]
    ax2d.plot([start_point[0], end_point[0]], [start_point[1], end_point[1]], color='red')

# Set labels and title
ax3d.set_xlabel('X')
ax3d.set_ylabel('Y')
ax3d.set_zlabel('Z')
ax3d.set_title('3D Cube Projection')

ax2d.set_xlabel('X')
ax2d.set_ylabel('Y')
ax2d.set_xlim(0, nx)
ax2d.set_ylim(0, ny)
ax2d.set_title('2D Projection on Screen')

plt.tight_layout()
plt.show()

This section draws the 3D view of the cube and its projection on the screen.

You can find the output of the code below for both orthographic and perspective projection.

[Figure: Output of the code when projection_type = "orthographic"]

[Figure: Output of the code when projection_type = "perspective"]

Keep in mind that the camera is oriented along the -z direction, and objects that are farther away from the camera appear smaller in the perspective projection.

By adjusting certain parameters, we can achieve various outcomes. For instance, to move the camera to (-2, -2, -2), we put the negative of that position, (2, 2, 2), in the last column of the translation matrix:

translation_matrix = np.array([
    [1, 0, 0, 2],
    [0, 1, 0, 2],
    [0, 0, 1, 2],
    [0, 0, 0, 1]
])

The output of the code for perspective projection would then be the following:

[Figure: Perspective projection when the camera is at (-2, -2, -2) in the 3D world]

You can change the parameters to get different results. But in order to get a meaningful result, the points you project should lie inside your view volume. Also make sure you don’t change the shapes of the matrices we use: 4×4 for the camera and projection matrices, and 3×4 for the viewport matrix.

Uygar Baran Ülgen
