GHC Threading And FFI

Albert Y. C. Lai, trebla [at] vex [dot] net

Information in this article comes from:

the paper Extending the Haskell Foreign Function Interface with Concurrency by Simon Marlow, Simon Peyton Jones, and Wolfgang Thaller. It is out of date on only one point: the number of “capabilities”.
the blog article Concurrency and Foreign Functions in the Glasgow Haskell Compiler by Leon P. Smith. It confirms the current status of the number of “capabilities”.

Terminology

threaded RTS

Obtained by linking with ghc -threaded ... Assumed in this article. Without -threaded, you lose concurrency during FFI calls.

OS thread

At the end of the day, every piece of code, C or Haskell, must be run in some OS thread, aka native thread. The GHC RTS spawns OS threads as it needs them. Your C code can also spawn its own OS threads.

Haskell thread

Created by forkIO, forkOS, and C calling Haskell; aka green thread, lightweight thread. Haskell code is run in Haskell threads (which in turn are run in some OS threads).

unbound Haskell thread: Created by forkIO. Both Haskell code and C calls are run in arbitrarily chosen OS threads as the GHC RTS sees fit.
bound Haskell thread: Created by forkOS and C calling Haskell. A bound Haskell thread is associated permanently with an OS thread. When the bound Haskell thread calls C, the C code is run in that OS thread.

safe, unsafe

foreign import … defaults to foreign import … safe.
foreign import … safe allows the C call to call Haskell, is concurrent if -threaded, and incurs a bit more cost.
foreign import … unsafe is the opposite.
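
As a minimal sketch of the two flavours, here are two libc functions imported both ways. The Haskell-side names c_sin and c_sleep are made up for illustration; which flavour suits a given function is a judgment call, and the comments only state my assumptions.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..), CUInt (..))

-- Assumption: sin returns quickly, never blocks, and never calls back
-- into Haskell, so the cheaper unsafe call is acceptable.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- sleep may block for a long time, so keep it safe: with -threaded,
-- other Haskell threads keep running during the call.
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt
```

Load this in GHCi, or add a main that calls c_sin and c_sleep, to try it out.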

capability

The GHC RTS uses capabilities to assign and re-assign OS threads to run Haskell code, like a token-passing system. The number of capabilities is the number of OS threads that run Haskell code, so for example

+RTS -N2

means that all your thousands of Haskell threads are crammed into 2 OS threads. OS threads running safe C calls do not have capabilities assigned.
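
A program can ask the RTS about itself. The sketch below uses getNumCapabilities and rtsSupportsBoundThreads from Control.Concurrent; the helper name describeRTS is my own invention.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)

-- | Describe the RTS this program is running on.
-- rtsSupportsBoundThreads is True when the program was linked with -threaded.
describeRTS :: IO String
describeRTS = do
  n <- getNumCapabilities
  let flavour = if rtsSupportsBoundThreads then "threaded" else "non-threaded"
  return (flavour ++ " RTS, " ++ show n ++ " capabilities")
```

Calling describeRTS >>= putStrLn from main, in a program started with the -N2 setting above, should report 2 capabilities.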

Concurrent FFI Calls

Without much effort from the Haskell programmer, multiple Haskell threads calling C together already works: they don't block each other, and they don't block unrelated Haskell threads. The Haskell programmer only needs to add -threaded and delete unsafe.

Here is how GHC does it. Suppose an OS thread with a capability is happily churning through Haskell threads. Suddenly one unbound Haskell thread safe-calls C. (The story of the bound case is in the next section.) This Haskell thread is suspended, this OS thread gives up its capability and runs the C code, and some other OS thread gains the capability and picks up the other Haskell threads. Everyone is happy.

An unsafe C call does not involve a transfer of capability. Therefore many other Haskell threads, including garbage collection threads, are put on hold as collateral damage.

When the C call finishes, eventually the original OS thread re-gains a capability to resume the caller Haskell thread (and picks up other Haskell threads).

Some of the above are probably technical details we don't have to worry about. For example, we don't mind which OS thread is chosen to run C, and we don't mind which OS thread is chosen to resume the caller Haskell thread. The point we care about is that one OS thread runs C and another OS thread runs Haskell.

The following example spawns 2 Haskell threads to make 2 slow C calls; meanwhile the main thread still has something to say. All of them have their say at the scheduled times. We also hear that two OS threads run the two C calls.

To compile on Linux: ghc -threaded main.hs slow.c

main.hs:

import Control.Concurrent
import Control.Exception(finally)
import Foreign.C

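-- Fork an unbound thread; return an action that waits for it to finish.
mforkIO :: IO () -> IO (IO ())
mforkIO action = do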
  done <- newEmptyMVar
  forkIO (action `finally` putMVar done ())
  return (takeMVar done)

main = do
  w1 <- mforkIO (thread_code 3)
  threadDelay 100000
  w2 <- mforkIO (thread_code 2)
  threadDelay 1000000
  putStrLn "haskell thread here"
  w2
  w1

thread_code :: CUInt -> IO ()
thread_code n = do
  ht <- myThreadId
  putStrLn (show ht ++ " starts")
  slow n
  putStrLn (show ht ++ " ends")

foreign import ccall safe slow :: CUInt -> IO ()

slow.c (Linux only):

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>

unsigned get_ostid(void) {
  return syscall(SYS_gettid);
}
/* yes, I gamble that pid_t is essentially a word. */

void slow(unsigned n) {
  printf("slow sleeps in OS thread %u for %u seconds\n", get_ostid(), n);
  sleep(n);
}

Result:

ThreadId 4 starts
slow sleeps in OS thread 4323 for 3 seconds
ThreadId 5 starts
slow sleeps in OS thread 4324 for 2 seconds
  [1 second later]
haskell thread here
  [1 second later]
ThreadId 5 ends
  [1 second later]
ThreadId 4 ends

Thread-Local FFI Calls

C calls happening in unpredictably chosen OS threads defeat some C libraries; such a library requires you to choose one OS thread and make all your library C calls there. This is the sole cause of all of the visible complications of GHC threading.

The complication is nicely contained by bound Haskell threads. When a bound Haskell thread is created, it is associated with an OS thread permanently. Every C call from this bound Haskell thread is run in that associated OS thread. So, if you make all library C calls from this bound Haskell thread, they all go to the same OS thread. The library is happy.

(Nominally, Haskell code in this thread may still be run in any OS thread that holds a capability, but this is unlikely in current GHC. So beware that switching context to or from a bound Haskell thread costs more.)

Three ways to obtain a bound Haskell thread:

The Haskell thread that runs main is already a bound Haskell thread.
forkOS creates a fresh OS thread and a bound Haskell thread associated with it.
When C calls Haskell: this is the subject of the next section.

Why does forkOS always create a fresh OS thread for the association? For concurrency: two forkOS'ed Haskell threads calling C at the same time necessitate two OS threads.
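
One way to "make all library C calls from this bound Haskell thread" is to dedicate a forkOS thread as a worker and hand it jobs through a channel. This is only a sketch of that idea, not something from the paper: Worker, startWorker, and runOn are hypothetical names, and I fall back to forkIO when the RTS is not threaded because forkOS fails there.

```haskell
import Control.Concurrent
  (forkIO, forkOS, isCurrentThreadBound, rtsSupportsBoundThreads)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO, try)
import Control.Monad (forever)

-- | Hypothetical handle: a queue of jobs for one dedicated thread.
newtype Worker = Worker (Chan (IO ()))

-- | Start the dedicated thread. With -threaded it is a bound thread,
-- so every C call made by a job runs in the same OS thread.
startWorker :: IO Worker
startWorker = do
  jobs <- newChan
  let fork = if rtsSupportsBoundThreads then forkOS else forkIO
  _ <- fork (forever (do job <- readChan jobs
                         job))
  return (Worker jobs)

-- | Run an action in the worker thread and wait for its result.
-- Exceptions are caught in the worker and re-thrown in the caller,
-- so a failing job does not kill the worker.
runOn :: Worker -> IO a -> IO a
runOn (Worker jobs) action = do
  result <- newEmptyMVar
  writeChan jobs (try action >>= putMVar result)
  outcome <- takeMVar result
  case outcome of
    Left e  -> throwIO (e :: SomeException)
    Right x -> return x
```

Any Haskell thread can then write runOn worker (someLibraryCall args), where someLibraryCall stands for whatever foreign import the library needs, and the call lands in the worker's OS thread.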

The following example first shows that an unbound Haskell thread can make C calls in different OS threads at different times. (I force it by exploiting a technical detail in the previous section.) Then it tests that a forkOS bound Haskell thread makes two C calls in the same OS thread (immune to my exploit); meanwhile another forkOS bound Haskell thread butts in.

To compile on Linux: ghc -threaded main.hs slow.c

main.hs:

import Control.Concurrent
import Control.Exception(finally)
import Foreign.C

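-- Fork an unbound thread; return an action that waits for it to finish.
mforkIO :: IO () -> IO (IO ())
mforkIO action = do
  done <- newEmptyMVar
  forkIO (action `finally` putMVar done ())
  return (takeMVar done)

-- Same, but the new thread is bound to a fresh OS thread.
mforkOS :: IO () -> IO (IO ())
mforkOS action = do
  done <- newEmptyMVar
  forkOS (action `finally` putMVar done ())
  return (takeMVar done)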

main = do
  wait_ibm <- mforkIO ibm
  wait_ibm
  wait_ibm <- mforkOS ibm
  threadDelay 500000
  forkOS (do x <- get_ostid
             putStrLn ("another forkOS calls C in " ++ show x)
         )
  wait_ibm

-- ibm = I've Been Moved!
ibm = do
  b <- isCurrentThreadBound
  let msg = "ibm " ++ (if b then "" else "un") ++ "bound calls C in "
  x <- get_ostid
  putStrLn (msg ++ show x)
  wait_sleep <- mforkIO (sleep 2 >> return ())
  threadDelay 1000000
  x <- get_ostid
  putStrLn (msg ++ show x)
  wait_sleep

foreign import ccall safe get_ostid :: IO CUInt
foreign import ccall safe sleep :: CUInt -> IO CUInt

slow.c (Linux only):

#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

unsigned get_ostid(void) {
  return syscall(SYS_gettid);
}
/* yes, I gamble that pid_t is essentially a word. */

Result:

ibm unbound calls C in 5193
ibm unbound calls C in 5194
ibm bound calls C in 5196
another forkOS calls C in 5197
ibm bound calls C in 5196

Foreign Calls Haskell (Calls Foreign (…

C calling Haskell works without extra effort from the Haskell programmer (or the C programmer). Firstly, calls into Haskell from multiple C-side OS threads run concurrently. Secondly, if the called Haskell calls C, i.e., C → Haskell → C, the 2nd C code is run in the same OS thread as the 1st C code. So C libraries with thread-locality requirements are happy.

The most popular use case of C → Haskell → C is with GUI libraries and OpenGL: the 1st C is the event loop, the Haskell is an event handler you supply, and the 2nd C is your event handler giving commands to the library. The library requires the event loop and the commands to be in the same OS thread.

Here is how GHC does it. When C calls Haskell, the GHC RTS creates a fresh bound Haskell thread associated with the calling OS thread, to run the called Haskell. From what we now know about bound threads, everything just works when the called Haskell calls C.
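
A small self-contained way to watch this happen is to let libc's qsort call a Haskell comparator. This is only a sketch: mkCompare, c_qsort, and sortViaC are names I made up, and the comparator records what isCurrentThreadBound says each time C calls it. With -threaded I expect every recorded flag to be True.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent (isCurrentThreadBound)
import Data.IORef (modifyIORef, newIORef, readIORef)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (FunPtr, Ptr, freeHaskellFunPtr)
import Foreign.Storable (peek, sizeOf)

type Compare = Ptr CInt -> Ptr CInt -> IO CInt

-- Turn a Haskell comparator into a C function pointer.
foreign import ccall "wrapper"
  mkCompare :: Compare -> IO (FunPtr Compare)

-- Must be safe: qsort calls back into Haskell.
foreign import ccall safe "stdlib.h qsort"
  c_qsort :: Ptr CInt -> CSize -> CSize -> FunPtr Compare -> IO ()

-- | Sort with qsort; also return one flag per comparator call saying
-- whether that call ran in a bound Haskell thread.
sortViaC :: [CInt] -> IO ([CInt], [Bool])
sortViaC xs = do
  seen <- newIORef []
  cmp <- mkCompare $ \pa pb -> do
    bound <- isCurrentThreadBound
    modifyIORef seen (bound :)
    a <- peek pa
    b <- peek pb
    -- compare gives LT/EQ/GT; map that to -1/0/1 for C.
    return (fromIntegral (fromEnum (compare a b)) - 1)
  sorted <- withArrayLen xs $ \n p -> do
    c_qsort p (fromIntegral n) (fromIntegral (sizeOf (0 :: CInt))) cmp
    peekArray n p
  freeHaskellFunPtr cmp
  flags <- readIORef seen
  return (sorted, flags)
```

Here the call chain is Haskell → C (qsort) → Haskell (comparator), so each comparator call is a fresh call into Haskell from C.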

This mechanism is also how multiple bound Haskell threads end up sharing the same OS thread. For example if we have this call chain:

C → Haskell → C → Haskell → C → Haskell → C → Haskell

then we have 4 bound Haskell threads associated with the same OS thread. This is harmless because at least 3 of them are suspended; only the last one is active and may make yet another C call. In fact, we also understand that it is important that all 4 C calls and any further ones are in the same OS thread, stacked upon each other.

The following example has a C function and a Haskell function recursively calling each other, showing that every call into Haskell is another bound Haskell thread, and they all use the same OS thread for C calls.

To compile on Linux: ghc -threaded main.hs slow.c

main.hs:

import Control.Concurrent
import Foreign.C
import Foreign.Ptr

foreign import ccall safe get_ostid :: IO CUInt

hthreadinfo prefix = do
  t <- myThreadId
  putStr (prefix ++ ": haskell " ++ show t)
  b <- isCurrentThreadBound
  if b
    then do
    n <- get_ostid
    putStrLn (" bound to os thread " ++ show n)
    else putStrLn " unbound"

main = do
  -- recall that main is also run in a bound thread
  haskell 5

foreign import ccall safe cfunc :: FunPtr (IO ()) -> IO ()

foreign import ccall "wrapper" ptr_for_cfunc :: IO () -> IO (FunPtr (IO ()))

haskell 0 = return ()
haskell n = do
  hthreadinfo ("T minus " ++ show n)
  ptr <- ptr_for_cfunc (haskell (n-1))
  cfunc ptr
  freeHaskellFunPtr ptr
  ht <- myThreadId
  putStrLn (show ht ++ " done")

slow.c (Linux only):

#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>
#include <HsFFI.h>

unsigned get_ostid(void) {
  return syscall(SYS_gettid);
}
/* yes, I gamble that pid_t is essentially a word. */

void cfunc(HsFunPtr callback) {
  callback();
}

Result:

T minus 5: haskell ThreadId 3 bound to os thread 3942
T minus 4: haskell ThreadId 4 bound to os thread 3942
T minus 3: haskell ThreadId 5 bound to os thread 3942
T minus 2: haskell ThreadId 6 bound to os thread 3942
T minus 1: haskell ThreadId 7 bound to os thread 3942
ThreadId 7 done
ThreadId 6 done
ThreadId 5 done
ThreadId 4 done
ThreadId 3 done

I have shown examples with Haskell as the main program. You can use C as the main program too; in fact, you can create OS threads on the C side, and call Haskell from them. All of the above still works.
