Friday, October 8, 2010

3D Picking in Android


In this short tutorial I’m presenting something that cost me weeks of work: how to implement picking with a perspective camera on the Android platform using OpenGL ES 1.0.

Picking basically means that the user taps a point on the device screen; we take that point and apply the inverse of the transforms that OpenGL applies to its 3D scene, which gives us the point in the world coordinate system (WCS) that the player wanted to click. For the sake of simplicity, we will work on a simple 2D map, instead of having to cast a ray to intersect multiple objects.

Usually, in OpenGL we would use the function gluUnProject to un-project the point and get the equivalent WCS point, but that function is plagued by errors on the Android platform, and it’s very difficult to get the GL transformations for the projection and model matrices.


So here is my solution. It might not be perfect, but it actually works.

Code Snippet
    /**
     * Calculates the transform from screen coordinate
     * system to world coordinate system coordinates
     * for a specific point, given a camera position.
     *
     * @param touch Vec2 point of screen touch, the
     *   actual position on physical screen (e.g. 160, 240)
     * @param cam camera object with x,y,z of the
     *   camera and screenWidth and screenHeight of
     *   the device.
     * @return position in WCS.
     */
    public Vec2 GetWorldCoords(Vec2 touch, Camera cam)
    {
        // Initialize auxiliary variables.
        Vec2 worldPos = new Vec2();

        // SCREEN height & width (e.g. 320 x 480)
        float screenW = cam.GetScreenWidth();
        float screenH = cam.GetScreenHeight();

        // Auxiliary matrices and vectors
        // to deal with ogl.
        float[] invertedMatrix = new float[16];
        float[] transformMatrix = new float[16];
        float[] normalizedInPoint = new float[4];
        float[] outPoint = new float[4];

        // Invert y coordinate, as android uses
        // top-left, and ogl bottom-left.
        int oglTouchY = (int) (screenH - touch.Y());

        /* Transform the screen point to clip
        space in ogl (-1,1). z = -1 places the
        point on the near plane. */
        normalizedInPoint[0] =
            touch.X() * 2.0f / screenW - 1.0f;
        normalizedInPoint[1] =
            oglTouchY * 2.0f / screenH - 1.0f;
        normalizedInPoint[2] = -1.0f;
        normalizedInPoint[3] = 1.0f;

        /* Obtain the transform matrix
        (projection * modelView) and
        then the inverse. */
        float[] projection = getCurrentProjection(gl);
        float[] modelView = getCurrentModelView(gl);
        Print("Proj", projection);
        Print("Model", modelView);
        Matrix.multiplyMM(
            transformMatrix, 0,
            projection, 0,
            modelView, 0);
        Matrix.invertM(invertedMatrix, 0,
            transformMatrix, 0);

        /* Apply the inverse to the point
        in clip space */
        Matrix.multiplyMV(
            outPoint, 0,
            invertedMatrix, 0,
            normalizedInPoint, 0);

        if (outPoint[3] == 0.0f)
        {
            // Avoid /0 error.
            Log.e("World coords", "ERROR!");
            return worldPos;
        }

        // Divide by the w component (index 3)
        // to find out the real position.
        worldPos.Set(
            outPoint[0] / outPoint[3],
            outPoint[1] / outPoint[3]);

        return worldPos;
    }

In my case, I’ve got a render thread, a logic thread and an application thread. This function is a service provided by the render thread, because it needs the GL projection and modelview matrices.

What happens is that the logic thread sends a touch (x, y) position and the current camera (x, y, z, screenH, screenW) to the GetWorldCoords function, and expects back the world position of that point, taking into account the camera position (x, y, z) and the view frustum (represented by the projection and modelview matrices).

The first lines get the data ready, create auxiliary matrices and access camera data.

One important point is the line

int oglTouchY = (int) (screenH - touch.Y());

This inversion is needed because Android screen coordinates assume a top-left origin, and OpenGL needs a bottom-left one. So we change it. And with that we can start doing the picking algorithm.

  1. Transform the point from screen coordinates (e.g. 120, 330) to clip space (for a 320 x 480 Android screen, after flipping y, this would be -0.25, -0.375).
  2. Get the transformation matrix (projection * modelView), and invert it.
  3. Multiply the inverse transformation by the clip-space point.
  4. Divide the coordinates x, y, z (positions 0, 1, 2) by w (position 3).
  5. You’ve got the world coordinates.
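
Step 1 can be tried in isolation, without a GL context. The following is only a minimal sketch: the class name TouchToNdc and the method toNdc are made up for illustration, and unlike the function above it keeps the flipped y as a float instead of casting it to int.

```java
/** Sketch of step 1: Android touch position to OpenGL coordinates in [-1, 1]. */
final class TouchToNdc {
    /**
     * @param touchX  touch x in pixels, measured from the left edge
     * @param touchY  touch y in pixels, measured from the TOP edge (Android)
     * @param screenW screen width in pixels
     * @param screenH screen height in pixels
     * @return {x, y}, each in [-1, 1], with +y pointing up (OpenGL)
     */
    static float[] toNdc(float touchX, float touchY, float screenW, float screenH) {
        // Flip y so the origin is at the bottom-left, as OpenGL expects.
        float glY = screenH - touchY;
        float x = touchX * 2.0f / screenW - 1.0f;
        float y = glY * 2.0f / screenH - 1.0f;
        return new float[] { x, y };
    }

    public static void main(String[] args) {
        // Middle of a 320 x 480 screen, then the Android origin (top-left).
        float[] center = toNdc(160f, 240f, 320f, 480f);
        float[] topLeft = toNdc(0f, 0f, 320f, 480f);
        System.out.println(center[0] + ", " + center[1]);
        System.out.println(topLeft[0] + ", " + topLeft[1]);
    }
}
```

The middle of the screen maps to (0, 0), and the Android origin in the top-left corner maps to (-1, 1), which is the top-left corner in OpenGL terms too.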


The z doesn’t appear because I don’t have need for it, but you can get it easily (outPoint[2] / outPoint[3]).
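
To see what the multiply and the divide by w actually do, including for z, here is a pure-Java sketch that runs without Android. It makes several assumptions that are not part of my code: the modelview is the identity, the projection is a symmetric glFrustumf(-right, right, -top, top, near, far), and its inverse is written out by hand in the hypothetical helper inverseFrustum. On the device, Matrix.invertM and Matrix.multiplyMV do this work for any matrix.

```java
/** Sketch of steps 3 and 4: multiply by the inverse matrix, then divide by w. */
final class UnprojectSketch {
    /** Returns m * v for a column-major 4x4 matrix (the layout OpenGL uses). */
    static float[] multiplyMV(float[] m, float[] v) {
        float[] out = new float[4];
        for (int row = 0; row < 4; row++) {
            out[row] = m[row] * v[0] + m[4 + row] * v[1]
                     + m[8 + row] * v[2] + m[12 + row] * v[3];
        }
        return out;
    }

    /**
     * Hand-written inverse of a symmetric frustum projection
     * (hypothetical helper, only valid for this kind of projection).
     */
    static float[] inverseFrustum(float right, float top, float near, float far) {
        float c = -(far + near) / (far - near);
        float d = -2.0f * far * near / (far - near);
        float[] inv = new float[16];   // column-major, starts as all zeros
        inv[0] = right / near;         // row 0, column 0
        inv[5] = top / near;           // row 1, column 1
        inv[11] = 1.0f / d;            // row 3, column 2
        inv[14] = -1.0f;               // row 2, column 3
        inv[15] = c / d;               // row 3, column 3
        return inv;
    }

    /** Un-projects a point given in [-1, 1] coordinates; returns {x, y, z}. */
    static float[] unproject(float[] inverse, float x, float y, float z) {
        float[] out = multiplyMV(inverse, new float[] { x, y, z, 1.0f });
        if (out[3] == 0.0f) {
            throw new ArithmeticException("w is zero, cannot divide");
        }
        // The divide by w is what undoes the perspective.
        return new float[] { out[0] / out[3], out[1] / out[3], out[2] / out[3] };
    }

    public static void main(String[] args) {
        float[] inv = inverseFrustum(0.5f, 0.75f, 1.0f, 10.0f);
        // Same screen corner, once on the near plane and once on the far plane.
        float[] onNear = unproject(inv, 1.0f, 1.0f, -1.0f);
        float[] onFar = unproject(inv, 1.0f, 1.0f, 1.0f);
        System.out.println(java.util.Arrays.toString(onNear));
        System.out.println(java.util.Arrays.toString(onFar));
    }
}
```

With these example values the top-right corner of the screen lands at roughly (0.5, 0.75, -1) on the near plane and (5, 7.5, -10) on the far plane. So the x and y you get back depend on the z you feed in: z = -1 gives the point on the near plane, not a point deeper in the scene.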

The situation I’m working on is the following. The red and blue are the frustum limits, the green is the world map, at an arbitrary point in space, and the camera is at the tip of the view frustum.
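
If the map does not sit on the near plane, one option (not what my function does) is to un-project the same touch twice, once with z = -1 and once with z = 1, and intersect the segment between the two results with the plane of the map. A minimal sketch, assuming the map lies on a plane of constant z in world space; the class RayPlaneSketch, the method intersectZPlane and the parameter planeZ are hypothetical names.

```java
/** Sketch: where does the picking ray cross a map lying on the plane z = planeZ? */
final class RayPlaneSketch {
    /**
     * @param near   world-space {x, y, z} of the touch un-projected at z = -1
     * @param far    world-space {x, y, z} of the touch un-projected at z = 1
     * @param planeZ world-space z of the map plane (assumption: constant z)
     * @return {x, y, z} of the hit, or null if the ray runs parallel to the
     *         plane or the plane lies behind the near point
     */
    static float[] intersectZPlane(float[] near, float[] far, float planeZ) {
        float dz = far[2] - near[2];
        if (Math.abs(dz) < 1e-6f) {
            return null;   // ray parallel to the plane, no single hit point
        }
        // t = 0 at the near point, t = 1 at the far point.
        float t = (planeZ - near[2]) / dz;
        if (t < 0.0f) {
            return null;   // plane is behind the near plane
        }
        return new float[] {
            near[0] + t * (far[0] - near[0]),
            near[1] + t * (far[1] - near[1]),
            planeZ
        };
    }

    public static void main(String[] args) {
        float[] near = { 0.5f, 0.75f, -1.0f };
        float[] far = { 5.0f, 7.5f, -10.0f };
        float[] hit = intersectZPlane(near, far, -4.0f);
        System.out.println(java.util.Arrays.toString(hit));
    }
}
```

For the example points the ray crosses z = -4 a third of the way along, at about (2, 3, -4).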


There is one very special complication when doing this picking algorithm on the Android platform, and that is accessing the projection and modelview matrices OpenGL uses. We manage with the following code.

Code Snippet
    /**
     * Record the current modelView matrix
     * state. Has the side effect of
     * setting the current matrix state
     * to GL_MODELVIEW
     * @param gl context
     */
    public float[] getCurrentModelView(GL10 gl)
    {
        float[] mModelView = new float[16];
        getMatrix(gl, GL10.GL_MODELVIEW, mModelView);
        return mModelView;
    }

    /**
     * Record the current projection matrix
     * state. Has the side effect of
     * setting the current matrix state
     * to GL_PROJECTION
     * @param gl context
     */
    public float[] getCurrentProjection(GL10 gl)
    {
        float[] mProjection = new float[16];
        getMatrix(gl, GL10.GL_PROJECTION, mProjection);
        return mProjection;
    }

    /**
     * Fetches a specific matrix from opengl
     * @param gl context
     * @param mode of the matrix
     * @param mat initialized float[16] array
     * to fill with the matrix
     */
    private void getMatrix(GL10 gl, int mode, float[] mat)
    {
        MatrixTrackingGL gl2 = (MatrixTrackingGL) gl;
        gl2.glMatrixMode(mode);
        gl2.getMatrix(mat, 0);
    }


The gl parameter passed to the getCurrent*(GL10 gl) functions is stored as a member variable in the class.

The MatrixTrackingGL class is part of the android samples, and can be found here. Some other classes must be included for it to work (mainly MatrixStack). The MatrixTrackingGL class acts as a wrapper for the gl context, but providing the data we need. For it to work, our custom GLSurfaceView class must have the GLWrapper call, something like this.

Code Snippet
    public DagGLSurfaceView(Context context)
    {
        super(context);

        setFocusable(true);

        // Wrapper set so the renderer can
        // access the gl transformation matrices.
        setGLWrapper(
            new GLSurfaceView.GLWrapper()
            {
                @Override
                public GL wrap(GL gl)
                {
                    return new MatrixTrackingGL(gl);
                }
            });

        mRenderer = new DagRenderer();
        setRenderer(mRenderer);
    }

(Where DagRenderer is my GLSurfaceView.Renderer, and DagGLSurfaceView is my GLSurfaceView)


  1. Hi Jaime,
    I'm stuck with the same problem for over a week now. What you write here is exactly what I'm trying to achieve. It would be a blast, if you could post some compilable code.

  2. Well Judith, the source code for the game I'm making that uses that is here.
    You want to look at the file and the file.

    Hope it can be of use!

    1. Hi Jaime! I'm trying to implement your algorithm but it doesn't work in my project.
      I want to pick a cube or a pyramid and rotate it. After that, I need implement the same algorithm but for my thesis project and my time is finishing X.X, I'm wondering if you could help me?

  3. Hello Jaime!
    thank you for the article!!
    Your blog is the only one I found so far in terms of good explanation of theory and practical (Android) example most others just repeat the same theory and nothing more.
    Thank you again!!!
    It would be nice if you continued your blog with practical explanations of 3D picking algorithms: color pick, name pick, ray pick

  4. Thanks for the comment Alex. I try to share any algorithm or task I have special problems with, so other people don't have to waste their time like I did.

  5. Hi! It really helps a lot!
    But may I know how you utilize the current camera position (x,y,z) to get the result? when I went through your code in GetWorldCoords, I couldnt get how the cam (x,y,z) was involved there?

  6. Well, you don't get to use the x,y,z of the cam, because you've already supplied those to OGL in your matrix transformations to move the camera each frame.
    We access that in getCurrentProjection(gl), so we only use the camera directly for the size of the viewport in this code.

    Hope it helps :3
    If you need more help, feel free to email me directly.

  7. Hi, what If I have set the Frustum like this:

    float size = .01f * (float) Math.tan(Math.toRadians(-45.0) / 2);
    float ratio = (float) w / h;
    gl.glFrustumf(-size, size, -size / ratio, size / ratio, 0.01f, 10.0f);

    ? It gives wrong values..

  8. Aleksandar, take a look at gluPerspective for that, much easier to use.

  9. Is it the same? I use it with gluPerspective now, and then define a, say, rect with vertices:

    float[] vertices = new float[] {
    1, -1, 0,
    1, 1, 0,
    -1, 1, 0,
    -1, -1, 0 };

    and it draws perfectly correct. But then when I touch the points where the rectangle is drawn with the texture, having in mind that the texture is 256x256 (pow of two), the getWorldCoords returns values which are close to but not correct to the defined vertices. And the range of x goes from -2.2 to 2.2 and y from -1.5 to 1.5.

  10. Sorry, one more question... Currently the scene goes from x -2.0 to x 2.0 and from y -1.5 to y 1.5 .. will this be the same for all screen devices? How do you handle different screen sizes?

  11. Thanks~I get it ~ really helps a lot:)

  12. One more question:
    if we want to get the 3D coordinates, then there should be 3 parameters contained in the screen coordinates, right? ScreenX, ScreenY, and 0 or 1 for the two ending points of the ray.
    How will these be involved?

  13. Great code, but I have some questions:

    1. Why normalizedInPoint[2] = - 1.0f?
    2. Where is "an arbitrary point in space" of world map?

    I am trying to understand these transformation but not everything is clear.

  14. +1 for snakeye comment. It is curious it works with the values:
    normalizedInPoint[2] = - 1.0f;
    normalizedInPoint[3] = 1.0f;

    I tried to use the same in my Renderer but can't get results. Guess it only works in your particular OpenGL setting...

  15. Hi,

    I can get this code running, I put the Touch-Screen Co-ordinates and it gives a normalized value b/w -1 and 1 and I can convert it back to Screen Coordinates using the Denormalization.

    My question is if I want to check if I have clicked inside a marker/object area or not, how should I proceed.

    I would really appreciate if you can reply.

  16. Hello!

    This is nice, but how to build a model selector tool with this?

    I have several models, but in Android glPicking/colorPicking is not working.

    How can I change this code to make selections?

    1. This tutorial was useful but there was a lot I had to figure out.

      I have implemented a full tutorial of ray picking on android here:

  17. /* Transform the screen point to clip
    space in ogl (-1,1) */
    normalizedInPoint[0] =
    (float) ((touch.X()) * 2.0f / screenW - 1.0);
    normalizedInPoint[1] =
    (float) ((oglTouchY) * 2.0f / screenH - 1.0);
    normalizedInPoint[2] = - 1.0f;
    normalizedInPoint[3] = 1.0f;

    Can you please explain these lines?

    In the above link they are not using "ray", but using
    glReadPixels( x, int(winY), 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &winZ );
    Will it not work with Android? I am trying to use it.
    Please help.

  19. Hi... Your solution seems to be great.. But I am working on android so i want to do the same in android.. Could you provide the same code for android, please....?

  20. Hi there,
    im still new in this AR. What software are you using? and if i have my own 3D model, how can i import my own model?

  21. Your link for MatrixTrackingGL library is gone.
    This is valid: