This Haskell I/O tutorial started its life as what I wrote for Haskell Wiki - A Brief Introduction to Haskell - I/O. Why I now continue its life on my own website and not improve on Haskell Wiki:
<hask> to mark up Haskell fragments, use plain old <code> instead. But there are too many existing ones to fix.

<hask> because code colouring is oh-so-important to them. I am tired of repairing their damage or educating them; indeed, if they don't proofread and notice the broken rendering they cause, if all they see is their brilliant colouring accomplishment, then I should just walk away and not bother.

This tutorial relies on the following skills:
IO X -> (X -> IO Y) -> IO Y. I just say “parse”: you do not need to know what IO X means (I will tell you), but you need to know how it can be a legal type.

Haskell I/O actions have their own types, of the form IO X. The meaning of these types and how to use them is gradually illustrated in this article. (It cannot be correctly summarized in a few words in layperson language; every sentence you think up now is wrong.) But I can already tell you why we cannot follow other languages and use simply X or () -> X.
In Haskell, if g :: String, then g is the same string every time. If f :: () -> String, then f () :: String is the same string every time. String or () -> String cannot possibly be the type of a read-a-line action, since the whole point of the read action is the possibility of giving you different strings at different times.
Because the read-a-line action is typed as getLine :: IO String, it is free from the requirement of giving you the same string every time. It is required, instead, to be the same I/O action every time, which is exactly right: every time you use getLine, it is the same action of reading a line.
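A minimal sketch of the three types just discussed. The definitions g, f and readLine are made up for this illustration; only getLine comes from the library.

```haskell
-- Hypothetical definitions, only to illustrate the types discussed above.

g :: String
g = "hello"            -- g is this one string, every time

f :: () -> String
f () = "hello"         -- f () is this one string, every time

-- An I/O action is also a fixed value: readLine is the same action every
-- time. What may differ is the string you get each time it is performed.
readLine :: IO String
readLine = getLine
```

Neither g nor f () has any room to vary; readLine has the only type of the three that fits a read-a-line action.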
IO a types enjoy a slightly special status: the entry point of a complete program, main, must have such a type. Naturally, this is because the complete program is expected to perform I/O and interact with you and the world.
putStrLn :: String -> IO () is a function and not an I/O action, since its type is a function type. But once you supply a string parameter, you get an I/O action: putStrLn "hello" :: IO (). That is indeed the type of an I/O action. This action writes hello. You can now use it for main:

    -- put this in file r0.hs, compile and run, don't walk
    main :: IO ()
    main = putStrLn "hello"

    $ ghc r0.hs
    ...
    $ ./r0
    hello
    $

getLine :: IO String already has a right type, and you can use it for main of another program:

    -- put this in file r1.hs, compile and run, enter a line
    main :: IO String
    main = getLine

    $ ghc r1.hs
    ...
    $ ./r1
    Good morning
    $

What about doing things at a REPL prompt? Here are typical sessions:

    Prelude> putStrLn "hello"
    hello
    Prelude>

    Prelude> getLine
    Good morning
    "Good morning"
    Prelude>

The REPL echoes the string you enter for getLine. This is a REPL special service and not representative of standard behaviour; e.g., you cannot expect it from a compiled version of r1.hs.
This and other REPL special services, deviating from standard behaviours, are rationalized as convenience for learning and testing. Ironically, the very deviation itself makes them untrustworthy for learning and testing. Whenever the REPL differs from compiled programs, remember: the REPL is simply wrong.
How to extend r1.hs to process the input line received, more generally how to make use of the a in IO a, and how to build compound actions are the subject of the next section.
How do we write a program that uses getLine to read a line, then uses putStrLn to write it?

Failed attempt #0: putStrLn getLine does not type-check.
Failed attempt #1: Ask for an “extractor” stripIO :: IO String -> String, then write putStrLn (stripIO getLine). Right type, wrong behaviour. As I explained before, since stripIO getLine :: String, this would be the same string every time you use it, contradicting the point of reading input. The idea of extraction is an illogical dead end, and no matter how much you like it, you must give up. You shall not extract.
What do you actually need, as opposed to want, to propagate the line read by getLine to downstream processors such as putStrLn?
I'll use an analogy, much as I am against analogies. I have a radio receiver that reads signals from the environment (analogous to getLine). I have a loudspeaker that takes signals and outputs sound (analogous to putStrLn). Suppose now I say: I have a cunning plan, I go to the radio receiver and extract signals into my own hands, then I go to the loudspeaker and provide signals from my own hands. Is that a cunning plan, or a silly plan?
Apparently, that is exactly the silly plan other programming languages have
always made you do (my hands are analogous to state variables), which you tried to
replicate in Haskell. Why don't I just use a connector to connect the radio
receiver with the loudspeaker, be done with it, and spare my hands? That is
exactly what Haskell will make you do.
The connector is (>>=). Its English name is “bind”. The way you use it is like this:

    -- put this in file b0.hs, compile and run, enter a line
    main :: IO ()
    main = getLine >>= putStrLn

    $ ghc b0.hs
    ...
    $ ./b0
    Good morning
    Good morning

Or in a REPL:

    Prelude> getLine >>= putStrLn
    Good morning
    Good morning
    Prelude>

To make sense of it, first look at the types and see how they match up. The type of (>>=) is

    (>>=) :: IO a -> (a -> IO b) -> IO b

It is a long type, but don't give up, keep calm and analyze.
(>>=) wants two parameters.

1. The first parameter should have type IO a, and we give it getLine :: IO String. That matches up, with a = String.
2. The second parameter should have type a -> IO b, and we give it putStrLn :: String -> IO (). That matches up too, with b = ().

So getLine >>= putStrLn type-checks and has type IO b = IO ().

Officially, (>>=) has a more abstract, general type. It is Monad m => m a -> (a -> m b) -> m b. But concrete before abstract, specific before general. Let us stick to specifically IO today. This tutorial is about I/O, not abstract general monads.
That takes care of the types. Now the behaviour:
getLine >>= putStrLn is a compound action. It consists of two constituent actions, to be performed in the order listed below.

1. getLine.
2. putStrLn, supplied with the line just read. Recall that putStrLn alone is not an action; it is a String -> IO () function, and supplying a parameter gets you an action.

(getLine >>=) is one single unit, and you have to give it a callback. The callback is putStrLn in this example. The behaviour is: read a line, then call your callback with the line as parameter.

Echoing a string is too boring. How do we write a program that reads a line, then writes that line but with "You have entered: " prepended?
There are two ways. This section shows one way, and the next section shows the other way. The difference comes down to: Do you group the prepending with the writing stage or the reading stage? (I am not promoting either way as better. I show both because both teach you something new. You decide, for each particular situation, which grouping is appropriate.) The way shown here groups the prepending with the writing stage.
The trick is to realize that the callback can also be a lambda function (or any function you write) of the right type. The callback can supply an augmented string to putStrLn. It can be

    \s -> putStrLn ("You have entered: " ++ s)

Check that the type is still right: String -> IO (). So let's do it:

    -- put this in file b1.hs, compile and run, enter a line
    main :: IO ()
    main = getLine >>= \s -> putStrLn ("You have entered: " ++ s)

    $ ghc b1.hs
    ...
    $ ./b1
    Good morning
    You have entered: Good morning

Or in a REPL:

    Prelude> getLine >>= \s -> putStrLn ("You have entered: " ++ s)
    Good morning
    You have entered: Good morning
    Prelude>

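The "or any function you write" option can be sketched like this. The names prefixed, announce and echoWithPrefix are made up for this illustration; the behaviour is meant to match b1.hs.

```haskell
-- Hypothetical names; b1.hs uses a lambda instead of a named callback.

-- Pure part: build the augmented string.
prefixed :: String -> String
prefixed s = "You have entered: " ++ s

-- A callback of type String -> IO (), suitable as the right operand of (>>=).
announce :: String -> IO ()
announce s = putStrLn (prefixed s)

-- Read a line, then call the named callback with it.
echoWithPrefix :: IO ()
echoWithPrefix = getLine >>= announce
```

To try it, load the file in a REPL and run echoWithPrefix, or add main = echoWithPrefix and compile. Whether you use a lambda or a named function is a matter of taste; the types are the same either way.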
Once you know how to build a compound action from two constituents, you know how to build from more constituents. The trick is to include further (>>=)s in callbacks. Predict what the following program does, then test your prediction.

    -- put this in file b2.hs, compile and run, enter two lines
    main :: IO ()
    main =
        putStrLn "Enter a line:" >>= \_ ->
            getLine >>= \s0 ->
                putStrLn "Enter one last line:" >>= \_ ->
                    getLine >>= \s1 ->
                        putStrLn ("Total length: " ++ show (length (s0 ++ s1)))
    -- this code layout is only educational, for showing theoretical structure

This code layout is only educational, not practical. The section on do-notation will show a practical layout, which also relieves you from repeating >>= all the time.

Before do-notation was invented, people used this layout in practice:

    putStrLn "Enter a line:" >>= \_ ->
    getLine >>= \s0 ->
    putStrLn "Enter one last line:" >>= \_ ->
    getLine >>= \s1 ->
    putStrLn ("Total length: " ++ show (length (s0 ++ s1)))

Not very attractive, but not too bad, actually.
This is a good time to say more on the meaning of a type like IO String. Well, you already know that it stands for an I/O action. Now what is the String doing there? It is intuitive to say “the I/O action returns a string”; this is right in one sense but wrong in another sense. I am not getting into that muddy water. I prefer to say:

The type act1 :: IO String means: act1 is an I/O action, and as for the String part: in act1 >>= callback2, a string is passed to callback2. I do not talk about returning anything; I talk about passing a string to the callback. This is correct and robust in all senses. It is why I like the callback story more.
And the type putStrLn s :: IO () means: in putStrLn s >>= callback3, a boring () is passed to the callback. This is because the writing action does not really have any information to tell the callback, so () is exactly right. You have seen how I would write callback3 in this case.
How do we write a program that reads a line, then writes that line but with "You have entered: " prepended, and this time we group the prepending with the reading?
This requires the help of another library function. It is:
pure :: a -> IO a
pure had an old name return. It did not mean the control-flow kind of returning. There are lies, damn lies, statistics, and meaningful names.

While I'm at lies and multiple meanings, the official abstract general types are: pure :: Applicative m => a -> m a, return :: Monad m => a -> m a.
The behaviour is this. With a supplied parameter, like pure s, it is a dummy I/O action (it does not read, write, or change anything), but it passes s to the next callback, if there is any. Its sole purpose is to help you control what data to pass on. Here is an example of using it:
getLine >>= \s -> pure ("You have entered: " ++ s)
Its overall behaviour is this: read a line, then—don't pass on that line verbatim—intercept it and pass on the augmented string.
So here is the complete program:

    -- put this in file p0.hs, compile and run, enter a line
    main :: IO ()
    main = (getLine >>= \s -> pure ("You have entered: " ++ s))
           >>= putStrLn

    $ ghc p0.hs
    ...
    $ ./p0
    Good morning
    You have entered: Good morning

You don't usually write code like that. But suppose you are to package up the whole “read a line, then pass on the augmented string” as conceptually one single reusable unit, then it makes slightly more sense:

    -- put this in file p1.hs, compile and run, enter a line
    main :: IO ()
    main = getLine_and_prepend >>= putStrLn

    getLine_and_prepend :: IO String
    getLine_and_prepend = getLine >>= \s -> pure ("You have entered: " ++ s)

Exercise: Check the types. Always check the types.
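One way to practise checking the types is to make the reading action a parameter. This is only a sketch: prependTo and demo are names I made up, and pure "Good morning" stands in for getLine so that no typing is needed.

```haskell
-- Hypothetical generalization of getLine_and_prepend: the reading action is
-- a parameter, so any IO String action can be plugged in.
prependTo :: IO String -> IO String
prependTo act = act >>= \s -> pure ("You have entered: " ++ s)

-- With getLine plugged in, this is getLine_and_prepend from p1.hs.
getLine_and_prepend :: IO String
getLine_and_prepend = prependTo getLine

-- With a dummy action plugged in, no input is needed.
demo :: IO ()
demo = prependTo (pure "Good morning") >>= putStrLn
```

Performing demo writes You have entered: Good morning. Note that prependTo never extracts a string out of act; it only attaches a callback to it.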
If you are to package up a more involved compound action and be selective about what it passes on, it makes even more sense:

    -- put this in file p2.hs
    main :: IO ()
    main = get_email >>= putStrLn

    get_email :: IO String
    get_email =
        putStrLn "Enter email: " >>= \_ ->
        getLine >>= \s0 ->
        putStrLn "Enter again: " >>= \_ ->
        getLine >>= \s1 ->
        if s0 /= s1
          then putStrLn "Different. Re-enter." >>= \_ -> get_email
          else pure s0

To give I/O programming a more conventional, imperative look, Haskell provides special syntactic sugar, and we call it do-notation. Examples illustrate this notation well, and you should be able to extrapolate.
This code

    act0 >>= \x ->    -- act1, act2 may use x
    act1 >>= \_ ->
    act2 >>= \z ->    -- act3 may use x,z
    act3

can be written as
do { x <- act0; act1; z <- act2; act3 }
Extraneous semicolons are allowed for your convenience, for example
do { ; x <- act0; act1; z <- act2; act3; }
For your further convenience, whitespace-sensitive layout is also supported:

    do x <- act0
       act1
       z <- act2
       act3
    -- or
    do
      x <- act0
      act1
      z <- act2
      act3

When you see this fine print, you know what I'm going to say again. Since do-notation is only about representing (>>=), it is good for IO as well as all Monad instances.
Here are some previous examples rewritten using do-notation:

    Prelude> do { s <- getLine; putStrLn ("You have entered: " ++ s) }
    Good morning
    You have entered: Good morning
    Prelude>

    -- put this in file d0.hs
    main :: IO ()
    main = do
        putStrLn "Enter a line:"
        s0 <- getLine
        putStrLn "Enter one last line:"
        s1 <- getLine
        putStrLn ("Total length: " ++ show (length (s0 ++ s1)))

    -- put this in file d1.hs
    main :: IO ()
    main = do
        s <- get_email
        putStrLn s

    get_email :: IO String
    get_email = do
        putStrLn "Enter email: "
        s0 <- getLine
        putStrLn "Enter again: "
        s1 <- getLine
        if s0 /= s1
          then
            -- you need to start a new "do" for this
            do putStrLn "Different. Re-enter."
               get_email
          else
            pure s0

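As a side-by-side check of the translation, here is a small sketch of one compound action spelled both ways. The names totalLengthBind and totalLengthDo are made up for this illustration, and pure "abc" and pure "de" stand in for getLine so that no typing is needed.

```haskell
-- Hypothetical example: the same compound action, written twice.

-- Spelled with (>>=) and lambdas.
totalLengthBind :: IO Int
totalLengthBind =
  pure "abc" >>= \s0 ->
  pure "de"  >>= \s1 ->
  pure (length (s0 ++ s1))

-- Spelled with do-notation.
totalLengthDo :: IO Int
totalLengthDo = do
  s0 <- pure "abc"
  s1 <- pure "de"
  pure (length (s0 ++ s1))
```

Both pass 5 to whatever callback comes next; for example, totalLengthDo >>= print writes 5.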
Two more features of do-notation. First, you can write local definitions in the middle:

    do x <- getLine
       let x1 = x ++ x
           x2 = take 10 x1
       putStrLn x2

This represents:

    getLine >>= \x ->
    let {x1 = x ++ x; x2 = take 10 x1}
    in putStrLn x2

Note that in normal let, you have to include in. But in the let feature of do-notation, do not add in.

Here is d0.hs rewritten again, using this let feature:

    -- put this in file d2.hs
    main :: IO ()
    main = do
        putStrLn "Enter a line:"
        s0 <- getLine
        putStrLn "Enter one last line:"
        s1 <- getLine
        let s = s0 ++ s1
            n = length s
        putStrLn ("Total length: " ++ show n)

The second feature is that you are not restricted to

    variable <- action

you are allowed the general

    pattern <- action

For example, System.IO has this function

    openTempFile :: FilePath -> String -> IO (FilePath, Handle)

Here is how you may use it in do-notation:

    do (path, handle) <- openTempFile "/tmp" "gory.txt"
       hPutStrLn handle ("I am " ++ path)
       ...

It represents:

    openTempFile "/tmp" "gory.txt" >>= \(path, handle) ->
    hPutStrLn handle ("I am " ++ path) >>= \_ ->
    ...

That is a simplified story. If the pattern is non-exhaustive, you need to know the full story. It is gross. It goes like this:

    do x:xs <- getLine
       putChar x

What happens if the string is empty? Here is what happens:

    getLine >>= \y ->
    case y of
      x:xs -> putChar x
      _ -> fail "compiler puts an error message here"

fail :: String -> IO a is one more library function. fail "error message" throws an exception with the given error message. So, what happens if the string is empty? Answer: exception. Can you catch that exception and not let it abort the whole program? Yes, but that's another tutorial.
It's me, fine print again! The official abstract general type was, historically, fail :: Monad m => String -> m a. For other Monad instances, it may do other things instead of throwing exceptions. Now here is the thing: We now widely agree that fail makes no sense for some Monad instances, i.e., non-exhaustive patterns make no sense for some Monad instances. It stayed for a long time for historical reasons; GHC 8.8 and later moved fail into a separate class, so the type there is fail :: MonadFail m => String -> m a. But it is acceptable for IO, and I'm only doing IO here.
Even though using non-exhaustive patterns has a predictable behaviour, you should not rely on it. You should consciously decide how your program responds to all cases, and code up exactly that. You should write your own case branches. Here is my conscious decision:

    loop = do
        s <- getLine
        case s of
          x:xs -> putChar x
          [] -> do putStrLn "Re-enter."
                   loop

With IO popping up whenever your program converses with the outside world, it is tempting to jump to the wrong conclusion that IO creeps into every line of your program, every bloody line.
Suppose you are to write a program that inputs a string, calculates the sum of the number of 'x's and 'y's, and outputs the answer. It is tempting to organize the program this wrong way:

    -- Wrong organization.
    main :: IO ()
    main = do
        ans <- calculate
        print ans

    calculate :: IO Int
    calculate = do
        inp <- getLine
        let ans = length (filter (\c -> c == 'x' || c == 'y') inp)
        pure ans

Or this wrong way:

    -- Wrong organization.
    main :: IO ()
    main = do
        inp <- getLine
        calculate inp

    calculate :: String -> IO ()
    calculate s = do
        let ans :: Int
            ans = length (filter (\c -> c == 'x' || c == 'y') s)
        print ans

Why are they wrong? Because even in C++ you would not write code like the following:

    #include <iostream>
    using namespace std;

    class Complex {
    private:
        double real, imag;
    public:
        Complex() {
            cout << "Please enter the real part: ";
            cin >> real;
            cout << "Please enter the imaginary part: ";
            cin >> imag;
            cout << "Thank you. Would you like to confirm your input?";
            // etc etc
        }
    };

Even you know what's wrong with it. Even you, when using imperative languages, have a perfect sense of which part of your program is responsible for I/O, which part is responsible for internal processing, and the internal processing part does not do the talking or asking.
The distinction carries over to Haskell. The I/O part of your program has the IO type; the internal processing part does not. Here is the right organization.

    main :: IO ()
    main = do
        inp <- getLine
        let ans :: Int
            ans = calculate inp
        print ans

    calculate :: String -> Int
    calculate s = length (filter (\c -> c == 'x' || c == 'y') s)

Inputting data and outputting answers need I/O and IO; calculating answers does not.
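One payoff of this organization, sketched under my own assumptions: since calculate has no IO in its type, it can be spot-checked without typing any input. The names checks and runChecks, and the sample strings, are made up for this illustration.

```haskell
-- Same pure function as in the right organization above.
calculate :: String -> Int
calculate s = length (filter (\c -> c == 'x' || c == 'y') s)

-- Hypothetical spot checks; no input is read, no IO appears in calculate.
checks :: [Bool]
checks =
  [ calculate "" == 0
  , calculate "xyzzy" == 3   -- one 'x', two 'y's
  , calculate "hello" == 0
  ]

-- The only I/O is reporting whether all checks passed.
runChecks :: IO ()
runChecks = print (and checks)
```

Performing runChecks writes True. With either wrong organization, the same checks would need someone (or something) to feed lines to getLine.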

    putStrLn "hello" >>= \_ -> act2

can be simplified to

    putStrLn "hello" >> act2

That is the sole purpose of the (>>) operator. It simply means:

    act1 >> act2 = act1 >>= \_ -> act2

You know the drill. (>>) is also general.
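A small sketch of (>>) in a chain, with made-up names greet and greet' and made-up strings. The second definition is the first one spelled out with (>>=) and ignored parameters.

```haskell
-- Hypothetical example: three writes in sequence, nothing passed along.
greet :: IO ()
greet = putStrLn "one" >> putStrLn "two" >> putStrLn "three"

-- The same compound action spelled with (>>=) and ignored parameters.
greet' :: IO ()
greet' =
  putStrLn "one" >>= \_ ->
  putStrLn "two" >>= \_ ->
  putStrLn "three"
```

Performing either one writes one, two, three on separate lines, in that order.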
You have heard that “Haskell is lazy”. That is a gross oversimplification on multiple axes. The axis I talk about today is I/O. On this axis, the oversimplification is almost false.
Haskell I/O is not lazy, and you have just seen empirical evidence first-hand. When you use getLine, the computer will make you enter a line immediately, whether that line is needed or not. When you use putStrLn, the computer will do the writing immediately, whether that output is needed or not. I/O is where Haskell is emphatically not lazy, where you regain control of what things happen in what order.
There are a few exceptions: a few things in the library perform lazy I/O. But I have not covered them, and I will not cover them. They are very hard to explain and understand correctly. They are also rarely used in practice, since they are hard to use correctly. You are not missing out.
But their names are readFile, getContents, hGetContents, and interact.
If your program appears to postpone an I/O action, there is always a less whimsical explanation than “lazy”. The most likely one is standard buffering: line-buffered for terminals, block-buffered for files. This has been standard behaviour since Unix, and I think MSDOS does it too. If you need, System.IO provides tools for explicit flushing and re-configuring buffering.
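A minimal sketch of those System.IO tools. hFlush, hSetBuffering, NoBuffering and stdout are real System.IO names; prompt and unbufferStdout are helpers I made up for this illustration.

```haskell
import System.IO (BufferMode (NoBuffering), hFlush, hSetBuffering, stdout)

-- Hypothetical helper: write a prompt without a newline, flush it, then
-- read a line. Without the flush, buffered output may hold the prompt back,
-- so it could show up only after you have typed your answer.
prompt :: String -> IO String
prompt msg = putStr msg >> hFlush stdout >> getLine

-- Alternative: turn off buffering of stdout once, e.g. at program start.
unbufferStdout :: IO ()
unbufferStdout = hSetBuffering stdout NoBuffering
```

For example, prompt "Enter email: " >>= putStrLn asks on the same line as the answer. Flushing per prompt keeps buffering for everything else; turning buffering off is simpler but applies to all output on stdout.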
You do not need to know general Monad for coding up I/O. But if you are curious about Monad nonetheless, then nothing wrong with curiosity if you have ample spare time for a rabbit hole. (It is a rabbit hole. It contains no extra information on I/O. You have been warned.) It is really the other way round: knowing specifically Haskell I/O, Maybe, lists, and a few other concrete things helps you learn general Monad. Concrete before abstract, specific before general.
The monad tutorial I like is Phil Wadler's: monads for functional programming.
This is just a tutorial. This is intentionally incomplete. This is just a beginning. There are many more I/O actions and functions in the library for practical use. There are many more design choices, structuring ways, and implementation methods. There are many formal semantics.
I have more Haskell Notes and Examples