Wednesday, August 19, 2009

.Net framework deadlock detection and prevention at runtime

Long time since last post :)


INTRODUCTION


This time I decided to share some info and tools about one of the worst-case scenarios a developer can face when building high-end multi-threaded applications: deadlocks.




The scenario goes something like this: you have an application structured in multiple layers, each with its own logic, that communicates with a lot of clients of different types, asynchronously to add extra flavor. Because multiple threads are spinning around at the same time, each action you take inside each layer has to be synchronized to avoid stale data or unexpected behavior. (For instance, a bank has to synchronize access to the logic that updates an account balance, otherwise two threads trying to change the balance at once might screw things up.)


The problem the developer might encounter, in most cases a few months after the initial release, is that the application (server and/or client, depending on the client logic versus pending communication) might freeze indefinitely. That is caused by two (or up to N) threads trying to acquire locks on some resources in a cycle.




Enough chit chat... :)

I'll present here a method of detecting deadlocks caused by poorly written code involving the C# lock{} syntax. The detection is done by injecting the detector at runtime and detouring low-level CLR methods. More on this below.





DEADLOCK FOR DUMMIES

The deadlock, simply explained, goes something like this:


Thread T1 acquires the lock on object O1.
Thread T2 acquires the lock on object O2.
Thread T1 tries to acquire the lock on object O2.
Thread T2 tries to acquire the lock on object O1.




e.g.
T1 T2
| \/ |
O1 O2


When the last step happens, we have a deadlock. Neither thread can make progress anymore.

The deadlock can be much more complex, involving up to N threads and N locked objects. The condition is that the waits form a cycle.

E.g. T1 holds O1 and waits for O2, T2 holds O2 and waits for O3, ..., Tk holds Ok and waits for Ok+1, ..., Tn holds On and waits for O1.

This can be easily explained with a classic problem, the dining philosophers.


.NET LOCK
The .Net syntax for locking goes like this:

object o = new object();

void criticalmethod()
{
    lock (o)
    {
        // Do critical code here
    }
}


All good so far. But if inside the lock you call methods you have no control over (code not written by you, or living in other assemblies), or the application is so large that you've ended up locking objects frenzy-style, you will hit a deadlock at some point in the future.


DETECTION AND RESOLUTION

Next step: detection and resolution to the rescue.... But how?


The most obvious step would be to save a dump of the hung process and examine it with windbg. There are lots of resources on the internet about it, and Tess's blog is one of them: http://blogs.msdn.com/tess/archive/2008/06/12/asp-net-case-study-deadlock-waiting-in-gettosta.aspx

But what if the application is already in production, and it's not an IIS-hosted process whose host can detect deadlocks and hangs (and restart it in a timely fashion), but some in-house Windows service that will stay hung until somebody punches it?


MY APPROACH

My approach was to let the application cook while monitoring what is going on inside it, at runtime, with my own tools (after testing the tools, of course :) ).

Then my journey into discovering the inner workings of the locking mechanism began.

BOOKS
First and foremost, the books needed to understand the problems I'm raising here would be:
- Inside Microsoft .NET IL Assembler
- Customizing the Microsoft .NET Framework Common Language Runtime
- Build Your Own .NET Language and Compiler
- Reversing: Secrets of Reverse Engineering
- Other windbg and core debugging books.

.NET LOCK IN IL (Intermediate Language)
A quick look at what the C# lock syntax becomes in IL reveals what is actually happening inside. The actual IL the compiler generates for a lock is (translated back to C#):


System.Threading.Monitor.Enter(yourobject);
try
{
    // your critical code here
}
finally
{
    System.Threading.Monitor.Exit(yourobject);
}


This Monitor class is one of the core synchronization primitives in .Net along with semaphores, mutexes etc.

Using Reflector on the Monitor class from the System.Threading namespace in mscorlib, we see that it's implemented like this:


[MethodImpl(MethodImplOptions.InternalCall)]
public static extern void Enter(object obj);


This means that the method is actually implemented by the CLR, in NATIVE, unmanaged code. Ouch.


OLD BAD SOLUTION - Replace Monitor and recompile

One old solution that might have worked in previous versions of the framework was to compile your application with your own class named Monitor in the System.Threading namespace and hope the compiler would call your methods instead. That was more of a bug than a real solution. (You can find such approaches on http://www.codeproject.com/ .)


OLD BAD SOLUTION 2 - Build your own locker, IDisposable

Another solution, which I also used in smaller applications, goes like this: create your own class that implements IDisposable, call Monitor.Enter in the constructor and Monitor.Exit in Dispose, and use it to replace the lock syntax with something like:

using (new BadLocker(objectlocked))
{
    // critical code here
}

Instead of the lock code:
lock(objectlocked){}.


This solution has the advantages:
- You can write your detection code in .Net
- Easy to write

Disadvantages:
- Requires search and replace across most of the sources
- Might miss some code that calls Monitor.Enter directly (either by mistake, or because it's already compiled and included as a reference, thus code you have no control over).


CLR sources
So, what I needed was a tool to actually catch ALL Monitor.Enter/Exit calls from my application, from all threads and assemblies loaded.
That led me to the CLR sources. I had to see what was going on inside.
Microsoft made the sources public, so it's not hard to get them.
If you search enough, you will find in ecall.cpp what the monitor methods actually map to:

JIT_MonitorEnterStatic
JIT_MonitorEnterWorker
JIT_MonitorEnterWorkerPortable
all of them with the Exit equivalent.


Digging deeper, you can see how the locking works, and that while locking the CLR also checks whether any host objects, implementing certain interfaces, are loaded in the CLR. These hosts make it possible to start the CLR with custom managers (for synchronization, garbage collection etc.).

OLD (NOT BAD) SOLUTION 3:
A good resource on deadlock detection based on CLR hosts is here:

It uses the function CorBindToRuntimeEx to start a new CLR engine in a native process, plugs in some custom hosts that the CLR will consult later, then calls another function, ExecuteInDefaultAppDomain, to run a .Net assembly. That function only accepts a target method with a predefined signature, which looks like this:
<>

Since not all .Net applications have such a main function, the author of that detector uses a shim .Net DLL to actually start the desired assembly by calling
AppDomain.CurrentDomain.ExecuteAssembly.

If you take the detector sources and start analyzing them, you might find some things that are not quite OK:
- the code actually implements new synchronization primitives, then does the detection on top of them
- the application is started from a native process. If your application spawns some foreground threads in its main method and then exits main, the native process will also exit, killing those foreground threads whether you like it or not. In the native world, when the main function of the program returns, it's bye bye to the other threads.

This means you would be trying to detect hangs in a buggy (badly designed) application by using new synchronization code that might lead to even worse deadlocks or bugs.

OLD SOLUTION 4
Use a pile of tools that screw up the actual assembly. Nah, not nice.

FINALLY MY SOLUTION: INJECTION, HOOK(DETOURS), LEAVE THE ASSEMBLY INTACT
One thing was clear. The detection has to be done AT RUNTIME, by somehow making the calls to the methods defined in mscorwks.dll pass through your code, as some sort of proxy.
Hey, that looks like good old INJECTION+DETOUR hooking technique.
What I had to do was inject my own code (a native DLL) into the running process and, when my DLL starts up, detour the desired methods to my own methods, which call back the original ones when done.
To make this happen, I used Microsoft Detours, from Microsoft Research. It has a flaw: it might not work on x64 systems. You could use a plain simple JMP detour instead, but for this detection I could run the process as x86 for the sake of simplicity.
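The proxy idea can be sketched without Detours at all. The snippet below is only a portable stand-in, assuming callers reach the target through a function pointer: the hook saves the original pointer, does its own bookkeeping, then calls the original. Real Detours instead patches the first bytes of the target function and hands back a trampoline pointer, but the hook body has the same shape. All names here (original_enter, hooked_enter, g_enter, g_log) are made up for illustration and are not the CLR's.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical signature standing in for the function we want to intercept.
using EnterFn = void (*)(void* obj);

// Records the order of calls, purely so the effect of the hook is visible.
std::vector<std::string> g_log;

// Stand-in for the original function (in the post: a function inside mscorwks.dll).
void original_enter(void* /*obj*/) { g_log.push_back("original"); }

// Callers go through this pointer; swapping it is our "detour".
EnterFn g_enter = original_enter;

// Filled in by install_hook(); plays the role of the trampoline pointer.
EnterFn g_real_enter = nullptr;

// The proxy: our code runs, then the original, then our code again.
void hooked_enter(void* obj) {
    assert(g_real_enter != nullptr);  // hook must be installed first
    g_log.push_back("before");        // e.g. record "thread wants obj"
    g_real_enter(obj);                // call back the original
    g_log.push_back("after");         // e.g. record "thread owns obj"
}

// Installs the hook once; later calls are ignored.
void install_hook() {
    if (g_real_enter != nullptr) return;
    g_real_enter = g_enter;
    g_enter = hooked_enter;
}
```

After install_hook(), a call through g_enter runs the "before" step, the original function, then the "after" step, which is the place where the lock bookkeeping would go.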
Hey, but wait: if you look at mscorwks.dll with the Depends tool, you will see that there are no exported functions with those names. Hmm, the methods are internal to the DLL.
IDA (the Interactive Disassembler) to the rescue! I used the free version (4.9) and loaded the DLL. After a bit of research I found the methods I wanted and took their offsets, plus the offset of a known exported function. I also took the method definitions, as I have to detour to a function with the same signature.
With a little help from good old Google, I wrote an injector exe that injects my native DLL into the started .Net process.
Then, in my native DLL, I got the address of the exported function with GetProcAddress, and with a plain simple difference of offsets I worked out the entry points of the non-exported functions.
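The "math difference" boils down to a few lines. This is a minimal sketch, assuming you already know from the disassembler the relative addresses (RVAs) of one exported function and of the non-exported target, both for your exact build of the DLL. The function name resolve_internal and the numbers in the note below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// exported_runtime_addr: address of the exported function in the live process,
//                        e.g. the value returned by GetProcAddress on Windows.
// exported_rva:          offset of that exported function from the module base,
//                        as read in the disassembler.
// internal_rva:          offset of the non-exported target from the module base,
//                        read from the same build of the DLL.
std::uintptr_t resolve_internal(std::uintptr_t exported_runtime_addr,
                                std::uintptr_t exported_rva,
                                std::uintptr_t internal_rva)
{
    assert(exported_runtime_addr >= exported_rva);  // otherwise the inputs are inconsistent
    // Where the module actually got loaded in this process.
    const std::uintptr_t module_base = exported_runtime_addr - exported_rva;
    // Same module, different offset: the runtime address of the internal function.
    return module_base + internal_rva;
}
```

For example, with the module loaded at 0x10000000, an export at RVA 0x1234 and a target at RVA 0x5678, resolve_internal(0x10001234, 0x1234, 0x5678) gives 0x10005678. The result is only valid for the exact DLL build the offsets were taken from.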
Then I detoured them. Yes, it works, but... not for Exit. The method header is too small to be detoured. After a bit of fuss, I decided to take parts of the Detours code and discard the error, letting the hook happen even if the header is too small. It worked: no error was thrown and the assembly ran fine.
I made a sync context class that stores the relation between locked object ids and threads. For threads I used the actual native thread id, obtained with GetThreadId(). I wasn't interested in the thread pool fiber-style thread ids, as the unmanaged level is the one of interest. The object id (the .Net one) was acquired by getting the private offset of the GetHashCode method from the CLR object class. As I hadn't implemented the actual objects, working with void * pointers worked pretty well.
A node-style graph was used to store the thread-to-object relations.
I used a critical section on entering and leaving the hooked methods, to make sure multiple threads would not step on my graph structure. I know, I've put another choke point in the process besides the critical section that MonitorEnter uses, but the way I've made it, it can't hang the process.
I tested it, and it detects deadlocks you wouldn't even think of. For instance, if you have an application with 100 threads, of which 50 do some background work and the rest do other things, and the first 50 hang, the others will continue to work and you may have no clue that anything is deadlocked.
The actual detection (the graph traversal that searches for a cycle) runs on a separate thread every 5 seconds, to avoid expensive operations on each lock. After all, if the app is hung, it will still be hung hours from now, so performance-wise it's better not to check too often. :)
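Here is a rough sketch of such a structure and of the cycle search. It is not the actual detector code. It assumes a monitor has at most one owner and a thread waits on at most one object at a time, so the wait-for graph can be walked as a simple chain. ThreadId, ObjectId, LockGraph and its method names are invented for the example. In a real detector the three update calls would come from the detoured Enter/Exit functions, and every call would be made under the critical section mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using ThreadId = std::uint32_t;   // native thread id
using ObjectId = std::uintptr_t;  // opaque id of the locked object

class LockGraph {
public:
    // Call before the original Enter: thread t may be about to block on o.
    void before_enter(ThreadId t, ObjectId o) {
        auto it = owner_.find(o);
        if (it != owner_.end() && it->second.thread == t) return;  // re-entrant, cannot block
        waiting_[t] = o;
    }

    // Call after the original Enter returned: t now owns o.
    void after_enter(ThreadId t, ObjectId o) {
        waiting_.erase(t);
        auto it = owner_.find(o);
        if (it == owner_.end()) {
            owner_[o] = Owner{t, 1};
        } else {
            assert(it->second.thread == t);  // only the owner can acquire again
            ++it->second.depth;
        }
    }

    // Call after the original Exit returned.
    void after_exit(ThreadId t, ObjectId o) {
        auto it = owner_.find(o);
        if (it == owner_.end() || it->second.thread != t) return;  // not tracked
        if (--it->second.depth == 0) owner_.erase(it);
    }

    // Returns the threads that form a wait cycle, or an empty vector if none.
    std::vector<ThreadId> find_cycle() const {
        for (const auto& entry : waiting_) {
            std::vector<ThreadId> path;
            std::unordered_set<ThreadId> seen;
            ThreadId cur = entry.first;
            for (;;) {
                if (!seen.insert(cur).second) {
                    // cur was visited before: the cycle starts at its first occurrence.
                    auto first = std::find(path.begin(), path.end(), cur);
                    return std::vector<ThreadId>(first, path.end());
                }
                path.push_back(cur);
                auto w = waiting_.find(cur);
                if (w == waiting_.end()) break;     // cur is not waiting on anything
                auto own = owner_.find(w->second);
                if (own == owner_.end()) break;     // the object is free
                cur = own->second.thread;           // follow the edge to the owner
            }
        }
        return {};
    }

private:
    struct Owner { ThreadId thread; int depth; };
    std::unordered_map<ObjectId, Owner> owner_;        // object -> owning thread
    std::unordered_map<ThreadId, ObjectId> waiting_;   // thread -> object it waits for
};
```

With the T1/T2 scenario from the beginning of the post (T1 owns O1, T2 owns O2, then each asks for the other's object), find_cycle() returns both thread ids. That returned list is the kind of "chain" a background checker thread could save next to the dump.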
I then decided that once I spot a deadlock, I save a minidump (with the dbghelp.dll minidump function), save the chain, and crash/exit the application. This way, if it's a Windows service it will auto-restart (if you configured it to), and you get the option to analyze the dump with windbg, with the chain already prepared to make the search easier.

I also made a separate process that spawns the .Net process and, immediately afterwards, the injector program that injects the C++ DLL.
Injecting into the process at the very beginning is pretty hard. I tried to start the process in the CREATE_SUSPENDED state and then inject, but that doesn't work, because of the way .Net processes are run: the system loader calls the _CorExeMain() method in mscoree.dll, and mscoree decides which mscorwks version to load and then starts the application.

CONCLUSION
Advantages of using this solution:
- live analysis of the deadlock, prevention of it, automatic saving of the dump.
- the framework itself (mscorlib) takes some locks internally on "this" (WinForms is a good example) that you will never detect without this hook.
- high speed due to native code
- incredibly low memory consumption for the live analysis compared to a .Net version
- you can consider yourself a little cracker by now
Disadvantages of using this solution:
- it's tailored to a specific build of mscorwks.dll for version 2.0, due to the lack of exported functions. For any other DLL version, you have to take the IDA path again and recompile with the new offsets.
- it doesn't detect livelocks, or locks made with semaphores, mutexes etc. (I haven't used them in the application I made this for). That can probably be implemented the same way. I didn't have the time or mood to detect locks for portions of .Net I don't use.
- untested on x64
- my not so good C++ coding experience (yes, I'm a .Net developer, duh), so I might have forgotten to clean up some resources, handles etc.
- the injecting process might miss some locks taken before the injection is done (a matter of milliseconds). To address this, we might use the shim approach and ExecuteAssembly: basically, create a .Net starter application that waits a few seconds until the injection is done, then executes the desired .Net assembly from that .Net process.
- lots of modules, each with its specific role.
- don't know if it's legal, but hey, it saves your skin.
BINARIES FOR TESTS
I will put them live soon.
SOURCECODE
The source code can be made available on request. If it saved your skin and business, you might consider "donating" a few beers for the nights I lost making this happen.