Introduction
In this short tutorial I present something that cost me weeks of work: how to implement picking with a perspective camera on the Android platform using OpenGL ES 1.0.
The process of picking basically works like this: the user taps a point on the device screen, we take that point and apply the inverse of the transforms OpenGL applies to its 3D scene, and so obtain the point in the world coordinate system (WCS) that the player wanted to click. For the sake of simplicity, we will work on a simple 2D map instead of casting a ray to intersect multiple objects.
Usually, in OpenGL we would use the function gluUnProject to un-project the point and so get the equivalent WCS point, but that function is plagued by errors on the Android platform, and it's very difficult to get hold of the GL projection and modelview matrices.
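In matrix terms, what the function below does is invert the forward pipeline: with projection matrix $P$ and modelview matrix $M$, a point is mapped to clip space as $p_{clip} = P \, M \, p_{world}$, so we recover the world point by applying the inverse and then the perspective divide:

$$p' = (P\,M)^{-1}\, p_{ndc}, \qquad p_{world} = \left( \frac{p'_x}{p'_w},\ \frac{p'_y}{p'_w},\ \frac{p'_z}{p'_w} \right)$$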
Algorithm
So here is my solution. It might not be perfect, but it actually works. (It relies on android.opengl.Matrix and android.util.Log.)
/**
 * Calculates the transform from screen coordinate
 * system to world coordinate system (WCS)
 * coordinates for a specific point, given a
 * camera position.
 *
 * @param touch Vec2 point of screen touch, the
 *              actual position on the physical
 *              screen (e.g. 160, 240)
 * @param cam   camera object with x, y, z of the
 *              camera and screenWidth and
 *              screenHeight of the device.
 * @return position in WCS.
 */
public Vec2 GetWorldCoords(Vec2 touch, Camera cam)
{
    // Initialize auxiliary variables.
    Vec2 worldPos = new Vec2();

    // Screen height and width (e.g. 320 x 480).
    float screenW = cam.GetScreenWidth();
    float screenH = cam.GetScreenHeight();

    // Auxiliary matrices and vectors to deal with OpenGL.
    float[] invertedMatrix = new float[16];
    float[] transformMatrix = new float[16];
    float[] normalizedInPoint = new float[4];
    float[] outPoint = new float[4];

    // Invert the y coordinate: Android uses a top-left
    // origin, while OpenGL uses a bottom-left one.
    int oglTouchY = (int) (screenH - touch.Y());

    // Transform the screen point to clip space (-1..1).
    normalizedInPoint[0] = (float) (touch.X() * 2.0f / screenW - 1.0);
    normalizedInPoint[1] = (float) (oglTouchY * 2.0f / screenH - 1.0);
    normalizedInPoint[2] = -1.0f;
    normalizedInPoint[3] = 1.0f;

    // Obtain the transform matrix (projection * modelview)
    // and then invert it. The Print calls are debug logging.
    Print("Proj", getCurrentProjection(gl));
    Print("Model", getCurrentModelView(gl));
    Matrix.multiplyMM(transformMatrix, 0,
            getCurrentProjection(gl), 0,
            getCurrentModelView(gl), 0);
    Matrix.invertM(invertedMatrix, 0, transformMatrix, 0);

    // Apply the inverse to the point in clip space.
    Matrix.multiplyMV(outPoint, 0,
            invertedMatrix, 0, normalizedInPoint, 0);

    if (outPoint[3] == 0.0)
    {
        // Avoid a division by zero.
        Log.e("World coords", "ERROR!");
        return worldPos;
    }

    // Divide by the fourth (w) component to get
    // the real position (perspective divide).
    worldPos.Set(outPoint[0] / outPoint[3],
                 outPoint[1] / outPoint[3]);
    return worldPos;
}
In my case, I have a renderer, a logic and an application thread; this function is a service provided by the renderer thread, because it needs the GL projection and modelview matrices.
What happens is that the logic thread sends a touch (x, y) position and the current camera (x, y, z, screenH, screenW) to the GetWorldCoords function, and expects back the world position of that point, taking into account the camera position (x, y, z) and the view frustum (represented by the projection and modelview matrices).
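For illustration, here is a minimal sketch of that call from the logic thread, assuming the Vec2 and Camera classes above; the mRenderer reference, the Vec2 constructor and the GetCamera() accessor are hypothetical names, not part of the code in this post.

// Hypothetical call site in the logic thread.
Vec2 touch = new Vec2(160, 240);                   // raw screen tap (assumed constructor)
Camera cam = mRenderer.GetCamera();                // x, y, z plus screen size (illustrative)
Vec2 world = mRenderer.GetWorldCoords(touch, cam); // world position under the tap
Log.d("Picking", "Tapped world position: " + world.X() + ", " + world.Y());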
The first lines get the data ready: they create the auxiliary matrices and access the camera data.
One important point is the line
int oglTouchY = (int) (screenH - touch.Y());
This inversion is needed because Android screen coordinates assume a top-left origin, while OpenGL expects a bottom-left one, so we flip the y value. With that done we can start the picking algorithm proper.
- Transform the point from screen coordinates (e.g. 120, 330) to clip space (for a 320 x 480 Android screen, and after the y inversion, this gives -0.25, -0.375; see the worked sketch after this list).
- Get the transformation matrix (projection * modelView) and invert it.
- Multiply the clip-space point by the inverse transformation.
- Divide the x, y, z coordinates (positions 0, 1, 2) by the w (position 3).
- You've got the world coordinates.
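To check the first step with concrete numbers, here is a minimal sketch of the screen-to-clip-space mapping for that example touch; it just repeats the corresponding lines of GetWorldCoords with the values filled in.

// Worked example: 320 x 480 screen, touch at (120, 330).
float screenW = 320.0f, screenH = 480.0f;
float touchX = 120.0f, touchY = 330.0f;

// Flip y from Android's top-left origin to OpenGL's bottom-left.
float oglTouchY = screenH - touchY;               // 480 - 330 = 150
// Map both axes to the -1..1 clip-space range.
float clipX = touchX * 2.0f / screenW - 1.0f;     // 240/320 - 1 = -0.25
float clipY = oglTouchY * 2.0f / screenH - 1.0f;  // 300/480 - 1 = -0.375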
Notes:
The z coordinate doesn't appear because I have no need for it, but you can get it easily (outPoint[2] / outPoint[3]).
The situation I'm working on is the following: the red and blue lines are the frustum limits, the green line is the world map at an arbitrary point in space, and the camera sits at the tip of the view frustum.
There is one very particular complication when implementing this picking algorithm on the Android platform: accessing the projection and modelview matrices OpenGL uses. We manage it with the following code.
/**
 * Records the current modelview matrix state.
 * Has the side effect of setting the current
 * matrix mode to GL_MODELVIEW.
 * @param gl context
 */
public float[] getCurrentModelView(GL10 gl)
{
    float[] mModelView = new float[16];
    getMatrix(gl, GL10.GL_MODELVIEW, mModelView);
    return mModelView;
}

/**
 * Records the current projection matrix state.
 * Has the side effect of setting the current
 * matrix mode to GL_PROJECTION.
 * @param gl context
 */
public float[] getCurrentProjection(GL10 gl)
{
    float[] mProjection = new float[16];
    getMatrix(gl, GL10.GL_PROJECTION, mProjection);
    return mProjection;
}

/**
 * Fetches a specific matrix from OpenGL.
 * @param gl   context
 * @param mode of the matrix (GL_MODELVIEW or GL_PROJECTION)
 * @param mat  initialized float[16] array to fill with the matrix
 */
private void getMatrix(GL10 gl, int mode, float[] mat)
{
    MatrixTrackingGL gl2 = (MatrixTrackingGL) gl;
    gl2.glMatrixMode(mode);
    gl2.getMatrix(mat, 0);
}
The gl parameter passed to the getCurrent*(GL10 gl) functions is stored as a member variable in the class.
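For reference, a minimal sketch of how that member might be captured in the renderer; storing the GL10 handle is from the text above, but the exact placement in onDrawFrame is my assumption.

// In the GLSurfaceView.Renderer implementation (e.g. DagRenderer).
private GL10 gl;  // handle used by getCurrentProjection/getCurrentModelView

@Override
public void onDrawFrame(GL10 gl10)
{
    // Thanks to the wrapper below, gl10 is really a MatrixTrackingGL.
    this.gl = gl10;
    // ... regular frame rendering ...
}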
The MatrixTrackingGL class is part of the Android samples, and can be found here. Some other classes must be included for it to work (mainly MatrixStack). MatrixTrackingGL acts as a wrapper for the GL context while recording the data we need. For it to work, our custom GLSurfaceView class must set the GLWrapper, something like this.
public DagGLSurfaceView(Context context)
{
    super(context);
    setFocusable(true);

    // Set the wrapper so the renderer can access
    // the GL transformation matrices.
    setGLWrapper(new GLSurfaceView.GLWrapper()
    {
        @Override
        public GL wrap(GL gl)
        {
            return new MatrixTrackingGL(gl);
        }
    });

    mRenderer = new DagRenderer();
    setRenderer(mRenderer);
}
(Where DagRenderer is my GLSurfaceView.Renderer, and DagGLSurfaceView is my GLSurfaceView.)