Unity Tensorflow



TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications.

This is an example of using a model trained with TensorFlow in a Unity application for image classification and object detection. It’s a quick port of the TF Classify and TF Detect examples from the TensorFlow repo, using TensorFlowSharp to glue it all together.

Using TensorFlowSharp in Unity (Experimental)

Unity now offers the possibility to use pretrained TensorFlow graphs inside the game engine. This was made possible thanks to the TensorFlowSharp project. Notice: this feature is still experimental.

I’m extremely excited about the new Unity3D Machine Learning functionality that’s being added. Setting it up was a little painful though, so I wanted to share the steps I followed, with the specific versions that work (I tried a whole lot of others and nothing else worked). In this guide, I’ll show you everything you need to get set up and ready to start with the 3D ball example. There’s also a video version at the end.

You’ll need to download CUDA 8.0.61 for this to work.

You can view the CUDA download archive here: https://developer.nvidia.com/cuda-toolkit-archive

Select and download CUDA Toolkit 8.0 GA2

Close any open Unity and Visual Studio instances (you’ll have to restart the installer if you don’t do this first)

Run the Installer

Choose Express

Next, we need to grab the CUDA Deep Neural Network library, aka cuDNN.

Visit the cuDNN page here: https://developer.nvidia.com/cudnn

You’ll need to create an NVIDIA account and log in to download the library. It’s easy to do and free though.

Once you’re logged in, click the download button.

Choose the download link for v6.0 for CUDA 8.0: Download cuDNN v6.0 (April 27, 2017), for CUDA 8.0

Open the cuDNN zip file.

Copy the 3 folders (bin, include, and lib) from the zip file into your CUDA 8.0 folder.

The default folder path you’d copy into is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0
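If you would rather script the copy than drag folders around, here is a minimal Python sketch of the same merge. It is only an illustration: the function name merge_folders and the example paths in the comment are hypothetical, and writing into Program Files will most likely need an elevated prompt.

```python
import os
import shutil


def merge_folders(src_root, dst_root, names=("bin", "include", "lib")):
    """Copy each named folder from src_root into dst_root, keeping existing files."""
    copied = []
    for name in names:
        src_dir = os.path.join(src_root, name)
        # os.walk yields nothing for a folder that does not exist, so a
        # missing "include" folder is skipped silently.
        for current, _dirs, files in os.walk(src_dir):
            # Recreate the same relative layout under the destination.
            rel = os.path.relpath(current, src_root)
            target_dir = os.path.join(dst_root, rel)
            os.makedirs(target_dir, exist_ok=True)
            for filename in files:
                shutil.copy2(os.path.join(current, filename), target_dir)
                copied.append(os.path.join(rel, filename))
    return sorted(copied)


# Hypothetical usage (adjust both paths to your machine):
# merge_folders(r"C:\Downloads\cudnn\cuda",
#               r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0")
```

The function returns the relative paths it copied, which makes it easy to confirm that the cuDNN files landed next to the CUDA ones.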

Now you need to add an environment variable and two path entries.

Open the environment variables dialog (hit the Windows key and start typing “envir…” and it’ll pop up for you)

Click the Environment Variables Button

Click the New button to add a new System Variable

Set the Variable Name to CUDA_HOME

Set the Value to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0

Find the Path in the Environment Variables Dialog

Make sure you select the System variables version, not the user variables!

Click Edit

Add the following two folders to the path.

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\extras\CUPTI\libx64

Click Ok a couple times to close out the dialogs.
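It is easy to mistype one of these three entries, so here is a small Python sketch that checks them. The function name missing_cuda_settings is hypothetical, and the sketch assumes the default install path used above. Open a fresh prompt before running it against the real environment, since already open prompts keep the old values.

```python
def missing_cuda_settings(env, cuda_root, sep=";"):
    """Return a list of problems; an empty list means all three settings were found."""
    def norm(path):
        # Windows paths are case-insensitive and may carry a trailing slash.
        return path.strip().rstrip("\\/").lower()

    problems = []
    if norm(env.get("CUDA_HOME", "")) != norm(cuda_root):
        problems.append("CUDA_HOME is not set to " + cuda_root)

    path_entries = {norm(p) for p in env.get("PATH", "").split(sep) if p.strip()}
    for suffix in (r"\lib\x64", r"\extras\CUPTI\libx64"):
        wanted = cuda_root + suffix
        if norm(wanted) not in path_entries:
            problems.append("PATH is missing " + wanted)
    return problems


# Example with an explicit dictionary; pass os.environ to check the real machine.
root = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0"
example = {"CUDA_HOME": root, "PATH": r"C:\Windows;" + root + r"\lib\x64"}
for problem in missing_cuda_settings(example, root):
    print(problem)
```

With the example dictionary, the only line printed is the one about the missing CUPTI folder.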

Next, we need to install Anaconda to configure our python environment.

Visit the download page here: https://www.anaconda.com/download/

Download the 3.6 version (I went with the 64 bit installer, not sure if it matters but I’d go with that one).

Run the Anaconda Installer and choose the default options all the way through.

After the installation completes, open the Anaconda Prompt

Creating the Conda Environment

Next, we need to create an environment with python 3.5.2.

Run the conda create command in the prompt like this:

conda create -n tensorflow-gpu python=3.5.2

Next, activate the newly created environment with this command:

activate tensorflow-gpu

And finally, install tensorflow with this command

pip install tensorflow-gpu

Once the installation completes, you can test that it was successful by launching python (still from that anaconda prompt) by typing:

python

Then run the following import. If it completes without an error, TensorFlow is installed correctly:

import tensorflow as tf
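If you want a check that reports a result instead of throwing an error, a minimal sketch using only the standard library could look like this. The helper name is_installed is hypothetical. Note that it only tells you whether Python can find the package in the active environment; it does not prove that the CUDA and cuDNN libraries load, so the import above is still the real test.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module in the active environment."""
    return importlib.util.find_spec(module_name) is not None


print("tensorflow found:", is_installed("tensorflow"))
```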

To try out the ML agents, you’ll want to download the sample project from the GitHub page here: https://github.com/Unity-Technologies/ml-agents

You can download the zip or use GIT and clone the repository.

Next, open another anaconda prompt as an administrator.

Change directory into the ‘python’ subfolder of the location you cloned or unzipped the project into.

For example, my folder is C:\Git\ml-agents\python because I cloned the repository into my C:\Git folder.

Fixing Html5lib

Next, type the following command: conda install --force html5lib

Now type: pip install .

Make sure you include that period, it’s part of the command.

If there are no errors, you’re good to go and ready to start setting up your Unity project with tensorflow and machine learning!

Next, you’ll want to build the Unity Environment. The steps for this are very clearly laid out here so I won’t repeat them: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Getting-Started-with-Balance-Ball.md

One issue I ran into though was the ENABLE_TENSORFLOW define being cleared out after I installed the tensorflowsharp library. When this happened, the “Internal” option disappeared. Simply re-add it and hit play so it re-compiles, then the internal option should re-appear.

First I wanted to say thanks to the guys at Unity for building this all out. I’m excited to start integrating machine learning into projects for my AI.

I also wanted to thank Nitish Mutha for this awesome blog post that got me 90% of the way through this setup.

A while ago I made an example of how to use TensorFlow models in Unity using the TensorFlow Sharp plugin: TFClassify-Unity. Image classification worked well enough, but object detection had poor performance. Still, I figured it could be a good starting point for someone who needs this kind of functionality in a Unity app.

Unfortunately, Unity stopped supporting TensorFlow and moved to its own inference engine, code-named Barracuda. You can still use the example above, but the latest plugin was built with TensorFlow 1.7.1 in mind, and anything trained with higher versions might not work at all. The good news is that Barracuda should support ONNX models out of the box, and you can convert your TensorFlow model to the supported format easily enough. The bad news is that Barracuda is still in preview and there are some caveats.

Differences

With the TensorFlow Sharp plugin, my idea was to take the TensorFlow example for Android and make a similar one for Unity using the same models: inception_v1 for image classification and ssd_mobilenet_v1 for object detection. I had successfully tried the mobilenet_v1 architecture as well - it's not in the example, but all you need to do is replace the input/output names and std/mean values.

With Barracuda, things are a bit more complicated. There are 3 ways to try a certain architecture in Unity: use an ONNX model that you already have, try to convert a TensorFlow model using the TensorFlow to ONNX converter, or try to convert it to the Barracuda format using the TensorFlow to Barracuda script provided by Unity (you'll need to clone the whole repo to use this converter, or install it with pip install mlagents).

None of those things worked for me with the inception and ssd-mobilenet models. Either there were problems with converting, or Unity wouldn't load the model and complained about unsupported tensors, or it would crash or return nonsensical results during inference. The pre-trained inception ONNX model seemed like it really wanted to get there, but it kept crapping out on the way with either weird errors or weird results (perhaps someone else will have more luck with that one).

But some things did work.

Image classification

MobileNet is a great architecture for mobile inference since, as its name suggests, it was created exactly for that. It's small and fast, and there are different versions that provide a trade-off between size/latency and accuracy. I didn't try the latest mobilenet_v3, but v1 and v2 work great both as ONNX and after tf-barracuda conversion. If you have an .onnx model, you're set; if you have a .pb file (a TensorFlow model in protobuf format), the conversion is easy enough using the tensorflow-to-barracuda converter.

The converter figures out the inputs and outputs itself. Those are good to keep around in case you need to reference them in code later. Here I have the input name 'input' and the output name 'MobilenetV2/Predictions/Reshape_1'. You can also see those in the Unity editor when you select this model for inspection. One thing to note: with mobilenet_v2, the converter and the Unity inspector show the wrong input dimensions (they should be [1, 224, 224, 3]), but this doesn't seem to matter in practice.

Then you can load and run this model using Barracuda as described in the documentation.

The important thing that has to be done with the input image to make inference work is normalization, which means shifting and scaling pixel values so that they go from the [0;255] range to [-1;1].

Barracuda actually has a method to create a tensor from a Texture2D, but it doesn't accept parameters for scale and bias. That's odd, since this is often a necessary step before running inference on an image. Be careful when trying your own models, though: some of them might have scale and bias layers as part of the model itself, so be sure to inspect the model in Unity before using it.
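To make the normalization step concrete, here is a minimal Python sketch of the arithmetic, not the Unity code itself. The mean and std of 127.5 are an assumption that matches the [-1;1] target range; other models may expect different values, and the function names are hypothetical.

```python
def normalize_pixel(value, mean=127.5, std=127.5):
    """Map one 8-bit channel value from [0, 255] to [-1, 1]."""
    return (value - mean) / std


def normalize_image(pixels, mean=127.5, std=127.5):
    """Turn a list of (r, g, b) tuples into a flat list of floats in RGB order."""
    out = []
    for r, g, b in pixels:
        out.extend((normalize_pixel(r, mean, std),
                    normalize_pixel(g, mean, std),
                    normalize_pixel(b, mean, std)))
    return out


# A black pixel and a white pixel, flattened to six floats.
print(normalize_image([(0, 0, 0), (255, 255, 255)]))
```

In Unity the same arithmetic would be applied to each colour channel of the camera texture while filling the input tensor.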

Object detection

There seem to be 2 object detection architectures that are currently used most often: SSD-MobileNet and YOLO. Unfortunately, SSD is not yet supported by Barracuda (as stated in this issue). I had to settle on YOLO v2, but YOLO is originally implemented in DarkNet, and to get either a TensorFlow or an ONNX model you'll need to convert the DarkNet weights to the necessary format first.

Fortunately, already converted ONNX models exist. The full network seemed way too large for mobile inference, so I chose the Tiny-YOLO v2 model available here (opset version 7 or 8). But if you already have a TensorFlow model, then the tensorflow-to-barracuda converter works just as well; in fact, I have one in the "also works" folder in my repository that you can try.

Funnily enough, the ONNX model already has layers for normalizing image pixels, except that they don't appear to actually do anything, because this model doesn't require normalization and works with pixels in the [0;255] range just fine.

The biggest pain with YOLO is that its output requires much more interpretation than SSD-Mobilenet. Here is the description of Tiny-YOLO output from ONNX repository:

'The output is a (125x13x13) tensor where 13x13 is the number of grid cells that the image gets divided into. Each grid cell corresponds to 125 channels, made up of the 5 bounding boxes predicted by the grid cell and the 25 data elements that describe each bounding box (5x25=125).'

Yeah. Fortunately, Microsoft has a good tutorial on using ONNX models for object detection with .NET that we can steal a lot of code from, although with some modifications.
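To show what that interpretation involves, here is a rough Python sketch of decoding a single box from the (125x13x13) output. Treat it as an illustration under assumptions rather than the code from my example: it assumes a flat, channel-major output, a 416x416 input (so 32 pixels per grid cell), the element order x, y, w, h, confidence followed by 20 class scores, and anchors passed in by the caller. All function names are hypothetical.

```python
import math

GRID = 13            # 13x13 grid cells, from the quoted description
BOXES_PER_CELL = 5   # bounding boxes predicted per cell
BOX_SIZE = 25        # x, y, w, h, confidence + 20 class scores (assumed order)
CELL_PIXELS = 32     # assumes a 416x416 input image (416 / 13)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def offset(channel, row, col):
    """Index into a flat, channel-major (125 x 13 x 13) output."""
    return channel * GRID * GRID + row * GRID + col


def decode_box(output, row, col, box, anchor):
    """Decode one of the 5 boxes of one grid cell; anchor is a (width, height) pair."""
    base = box * BOX_SIZE
    tx, ty, tw, th, tc = (output[offset(base + i, row, col)] for i in range(5))
    class_probs = softmax([output[offset(base + 5 + i, row, col)]
                           for i in range(BOX_SIZE - 5)])
    best = max(range(len(class_probs)), key=class_probs.__getitem__)
    return {
        "x": (col + sigmoid(tx)) * CELL_PIXELS,   # box centre, in pixels
        "y": (row + sigmoid(ty)) * CELL_PIXELS,
        "w": math.exp(tw) * anchor[0] * CELL_PIXELS,
        "h": math.exp(th) * anchor[1] * CELL_PIXELS,
        "label": best,                            # index of the most likely class
        "score": sigmoid(tc) * class_probs[best],
    }
```

A full detector would run this for every cell and every box, drop boxes whose score is below a threshold, and then apply non-maximum suppression to remove overlapping duplicates.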

Let's not block things

My TensorFlow Sharp example was pretty dumb in terms of parallelism, since I simply ran inference once a second on the main thread, blocking the camera from playing. There were other examples showing a more reasonable approach, like running the model in a separate thread (thanks MatthewHallberg).

However, running Barracuda in a separate thread simply didn't work and produced some ugly-looking crashes. Judging by the documentation, Barracuda should be asynchronous by default, scheduling inference on the GPU automatically (if available), so you simply call Execute() and then query the result sometime later. In reality, there are still caveats.

The Barracuda worker class has 3 methods that you can run inference with: Execute(), ExecuteAsync() and an extension method, ExecuteAndWaitForCompletion(). The last one is obvious: it blocks. Don't use it unless you want your app to freeze during the process. The Execute() method works asynchronously, so you should be able to call it, do other work, and then read the output a bit later in the same method, or query the output in a different method entirely.

However, I've noticed that there is still a slight delay even if you just call Execute() and nothing else, causing the camera feed to jitter slightly. This might be less noticeable on newer devices, so try before you buy.

ExecuteAsync() seems like a very nice option for running inference asynchronously: it returns an enumerator that you can run with StartCoroutine(worker.ExecuteAsync(inputs)). However, internally this method does yield return null after each layer, which means it executes one layer per frame. Depending on the number of layers in your model and the complexity of the operations in them, that may yield too often and make the model execute much slower than it could (as I found out is the case with the mobilenet model for image classification). The YOLO model does seem to work better with ExecuteAsync() than with the other methods, although the number of devices I can test it on is quite limited.

Playing around with different methods to run a model, I found another possibility: since ExecuteAsync() returns an IEnumerator, you can iterate it manually, executing as many layers per frame as you want.

That's a bit hacky and not at all platform-independent, so judge for yourself, but I found that it actually works better for the mobilenet image classification model, causing minimal lag.
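The idea is easier to see outside Unity. Below is a Python analogue that uses a generator in place of the C# IEnumerator. It is a sketch of the pattern, not Barracuda code: run_layers is a hypothetical stand-in for ExecuteAsync(), and advance plays the role of calling MoveNext() several times in one frame.

```python
def run_layers(layer_count):
    """Stand-in for ExecuteAsync(): yields once after each simulated layer."""
    for index in range(layer_count):
        # ... one layer's worth of work would happen here ...
        yield index


def advance(enumerator, steps):
    """Advance up to `steps` layers; return False once the enumerator is exhausted."""
    for _ in range(steps):
        try:
            next(enumerator)
        except StopIteration:
            return False
    return True


def frames_needed(layer_count, layers_per_frame):
    """Count frames until the enumerator reports that it has finished."""
    enumerator = run_layers(layer_count)
    frames = 1
    while advance(enumerator, layers_per_frame):
        frames += 1
    return frames


print(frames_needed(100, 1))   # one layer per frame, like StartCoroutine
print(frames_needed(100, 20))  # twenty layers per frame, stepped manually
```

For a hypothetical 100-layer model, stepping one layer per frame takes 101 frames before completion is detected, while stepping 20 layers per frame takes 6. The right number of layers per frame depends on the device, which is why this approach is not platform-independent.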

Conclusion

Once again, Barracuda is still in early development, and a lot of things change often and radically. For instance, my example was tested with the 0.4.0-preview version of the plugin, and 0.5.0-preview already breaks it, making object detection produce wrong results (so make sure to install 0.4.0 if you're going to try the example; I'll be looking into newer versions later). But I think that an inference engine that works cross-platform without all the hassle of TensorFlow versions, supports ONNX out of the box and is baked into Unity is a great development.

So does the Barracuda example give better performance than the TensorFlow Sharp one? It's hard to compare, especially with object detection. Different architectures are used, Tiny-YOLO has a lot fewer labels than SSD-Mobilenet, the TFSharp example is faster with OpenGLES while Barracuda works better with Vulkan, the async strategies are different, and it's not yet clear how to get the best async results from Barracuda. But comparing image classification with MobileNet v1 on my Galaxy S8, I got the following inference times running it synchronously: ~400ms for TFSharp with OpenGLES 2/3 vs ~110ms for Barracuda with Vulkan. What's more important is that Barracuda will likely be developed, supported and improved for years to come, so I fully expect even bigger performance gains in the future.

I hope this article and example will be useful for you and thanks for reading!

Check out the complete code on my GitHub: https://github.com/Syn-McJ/TFClassify-Unity-Barracuda