haskell interview questions
Top haskell frequently asked interview questions
I came across the following definition as I try to learn Haskell using a real project to drive it. I don't understand what the exclamation mark in front of each argument means and my books didn't seem to mention it.
data MidiMessage = MidiMessage !Int !MidiMessage
Source: (StackOverflow)
I'm having problems getting GHC to specialize a function with a class constraint. I have a minimal example of my problem here: Foo.hs and Main.hs. The two files compile (GHC 7.6.2, ghc -O3 Main
) and run.
NOTE:
Foo.hs
is really stripped down. If you want to see why the constraint is needed, you can see a little more code here. If I put the code in a single file or make many other minor changes, GHC simply inlines the call to plusFastCyc
. This will not happen in the real code because plusFastCyc
is too large for GHC to inline, even when marked INLINE
. The point is to specialize the call to plusFastCyc
, not inline it. plusFastCyc
is called in many places in the real code, so duplicating such a large function would not be desirable even if I could force GHC to do it.
The code of interest is the plusFastCyc
in Foo.hs
, reproduced here:
{-# INLINEABLE plusFastCyc #-}
{-# SPECIALIZE plusFastCyc ::
forall m . (Factored m Int) =>
(FastCyc (VT U.Vector m) Int) ->
(FastCyc (VT U.Vector m) Int) ->
(FastCyc (VT U.Vector m) Int) #-}
-- Although the next specialization makes `fcTest` fast,
-- it isn't useful to me in my real program because the phantom type M is reified
-- {-# SPECIALIZE plusFastCyc ::
-- FastCyc (VT U.Vector M) Int ->
-- FastCyc (VT U.Vector M) Int ->
-- FastCyc (VT U.Vector M) Int #-}
plusFastCyc :: (Num (t r)) => (FastCyc t r) -> (FastCyc t r) -> (FastCyc t r)
plusFastCyc (PowBasis v1) (PowBasis v2) = PowBasis $ v1 + v2
The Main.hs
file has two drivers: vtTest
, which runs in ~3 seconds, and fcTest
, which runs in ~83 seconds when compiled with -O3 using the forall
'd specialization.
The core shows that for the vtTest
test, the addition code is being specialized to Unboxed
vectors over Int
s, etc, while generic vector code is used for fcTest
.
On line 10, you can see that GHC does write a specialized version of plusFastCyc
, compared to the generic version on line 167.
The rule for the specialization is on line 225. I believe this rule should fire on line 270. (main6
calls iterate main8 y
, so main8
is where plusFastCyc
should be specialized.)
My goal is to make fcTest
as fast as vtTest
by specializing plusFastCyc
. I've found two ways to do this:
- Explicity call
inline
from GHC.Exts
in fcTest
.
- Remove the
Factored m Int
constraint on plusFastCyc
.
Option 1 is unsatisfactory because in the actual code base plusFastCyc
is a frequently used operation and a very large function, so it should not be inlined at every use. Rather, GHC should call a specialized version of plusFastCyc
. Option 2 is not really an option because I need the constraint in the real code.
I've tried a variety of options using (and not using) INLINE
, INLINABLE
, and SPECIALIZE
, but nothing seems to work. (EDIT: I may have stripped out too much of plusFastCyc
to make my example small, so INLINE
might cause the function to be inlined. This doesn't happen in my real code because plusFastCyc
is so large.) In this particular example, I'm not getting any match_co: needs more cases
or RULE: LHS too complicated to desugar
(and here) warnings, though I was getting many match_co
warnings before minimizing the example. Presumably, the "problem" is the Factored m Int
constraint in the rule; if I make changes to that constraint, fcTest
runs as fast as vtTest
.
Am I doing something GHC just doesn't like? Why won't GHC specialize the plusFastCyc
, and how can I make it?
UPDATE
The problem persists in GHC 7.8.2, so this question is still relevant.
Source: (StackOverflow)
There are at least three popular libraries for accessing and manipulating fields of records. The ones I know of are: data-accessor, fclabels and lenses.
Personally I started with data-accessor and I'm using them now. However recently on haskell-cafe there was an opinion of fclabels being superior.
Therefore I'm interested in comparison of those three (and maybe more) libraries.
Source: (StackOverflow)
The two Haskell web frameworks in the news recently are Yesod (at 0.8) and Snap (at 0.4).
It's quite obvious that Yesod currently supports a lot more features than Snap. However, I can't stand the syntax Yesod uses for its HTML, CSS and Javascript.
So, I'd like to understand what I'd be missing if I went with Snap instead. For example, doesn't look like database support is there. How about sessions? Other features?
Source: (StackOverflow)
I can't find it now, but I swear there used to be a T-shirt for sale featuring the immortal words:
What part of
do you not understand?
In my case, the answer would be... all of it!
In particular, I often see notation like this in Haskell papers, but I have no clue what the hell any of it means. I have no idea what branch of mathematics it's supposed to be.
I recognise the letters of the Greek alphabet of course, and symbols such as "∉" (which usually means that something is not an element of a set).
On the other hand, I've never seen "⊢" before (Wikipedia claims it might mean "partition"). I'm also unfamiliar with the use of the vinculum here. (Usually it denotes a fraction, but that does not appear to be the case here.)
I imagine SO is not a good place to be explaining the entire Milner Hindley algorithm. But if somebody could at least tell me where to start looking to comprehend what this sea of symbols means, that would be helpful. (I'm sure I can't be the only person who's wondering...)
Source: (StackOverflow)
What's the status of multicore programming in Haskell? What projects, tools, and libraries are available now? What experience reports have there been?
Source: (StackOverflow)
I've to admit that I don't know much about functional programming. I read about it from here and there, and so came to know that in functional programming, a function returns the same output, for same input, no matter how many times the function is called. It's exactly like mathematical function which evaluates to same output for same value of input parameter which involves in the function expression.
For example, consider this:
f(x,y) = x*x + y; //it is a mathematical function
No matter how many times you use f(10,4)
, its value will always be 104
. As such, wherever you've written f(10,4)
, you can replace it with 104
, without altering the value of the whole expression. This property is referred to as referential transparency of an expression.
As Wikipedia says (link),
Conversely, in functional code, the output value of a function depends only on the arguments that are input to the function, so calling a function f twice with the same value for an argument x will produce the same result f(x) both times.
So my question is: can a time function (which returns the current time) exist in functional programming?
If yes, then how can it exist? Does it not violate the principle of functional programming? It particularly violates referential transparency which is one of the property of functional programming (if I correctly understand it).
Or if no, then how can one know the current time in functional programming?
Source: (StackOverflow)
What is the difference between the dot (.)
and the dollar sign ($)
?. As I understand it, they are both syntactic sugar for not needing to use parentheses.
Source: (StackOverflow)
I have taken Problem #12 from Project Euler as a programming exercise and to compare my (surely not optimal) implementations in C, Python, Erlang and Haskell. In order to get some higher execution times, I search for the first triangle number with more than 1000 divisors instead of 500 as stated in the original problem.
The result is the following:
C:
lorenzo@enzo:~/erlang$ gcc -lm -o euler12.bin euler12.c
lorenzo@enzo:~/erlang$ time ./euler12.bin
842161320
real 0m11.074s
user 0m11.070s
sys 0m0.000s
python:
lorenzo@enzo:~/erlang$ time ./euler12.py
842161320
real 1m16.632s
user 1m16.370s
sys 0m0.250s
python with pypy:
lorenzo@enzo:~/Downloads/pypy-c-jit-43780-b590cf6de419-linux64/bin$ time ./pypy /home/lorenzo/erlang/euler12.py
842161320
real 0m13.082s
user 0m13.050s
sys 0m0.020s
erlang:
lorenzo@enzo:~/erlang$ erlc euler12.erl
lorenzo@enzo:~/erlang$ time erl -s euler12 solve
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.7.4 (abort with ^G)
1> 842161320
real 0m48.259s
user 0m48.070s
sys 0m0.020s
haskell:
lorenzo@enzo:~/erlang$ ghc euler12.hs -o euler12.hsx
[1 of 1] Compiling Main ( euler12.hs, euler12.o )
Linking euler12.hsx ...
lorenzo@enzo:~/erlang$ time ./euler12.hsx
842161320
real 2m37.326s
user 2m37.240s
sys 0m0.080s
Summary:
- C: 100%
- python: 692% (118% with pypy)
- erlang: 436% (135% thanks to RichardC)
- haskell: 1421%
I suppose that C has a big advantage as it uses long for the calculations and not arbitrary length integers as the other three. Also it doesn't need to load a runtime first (Do the others?).
Question 1:
Do Erlang, Python and Haskell lose speed due to using arbitrary length integers or don't they as long as the values are less than MAXINT
?
Question 2:
Why is Haskell so slow? Is there a compiler flag that turns off the brakes or is it my implementation? (The latter is quite probable as Haskell is a book with seven seals to me.)
Question 3:
Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more "native" to the language.
EDIT:
Question 4:
Do my functional implementations permit LCO (last call optimization, a.k.a tail recursion elimination) and hence avoid adding unnecessary frames onto the call stack?
I really tried to implement the same algorithm as similar as possible in the four languages, although I have to admit that my Haskell and Erlang knowledge is very limited.
Source codes used:
#include <stdio.h>
#include <math.h>
int factorCount (long n)
{
double square = sqrt (n);
int isquare = (int) square;
int count = isquare == square ? -1 : 0;
long candidate;
for (candidate = 1; candidate <= isquare; candidate ++)
if (0 == n % candidate) count += 2;
return count;
}
int main ()
{
long triangle = 1;
int index = 1;
while (factorCount (triangle) < 1001)
{
index ++;
triangle += index;
}
printf ("%ld\n", triangle);
}
#! /usr/bin/env python3.2
import math
def factorCount (n):
square = math.sqrt (n)
isquare = int (square)
count = -1 if isquare == square else 0
for candidate in range (1, isquare + 1):
if not n % candidate: count += 2
return count
triangle = 1
index = 1
while factorCount (triangle) < 1001:
index += 1
triangle += index
print (triangle)
-module (euler12).
-compile (export_all).
factorCount (Number) -> factorCount (Number, math:sqrt (Number), 1, 0).
factorCount (_, Sqrt, Candidate, Count) when Candidate > Sqrt -> Count;
factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;
factorCount (Number, Sqrt, Candidate, Count) ->
case Number rem Candidate of
0 -> factorCount (Number, Sqrt, Candidate + 1, Count + 2);
_ -> factorCount (Number, Sqrt, Candidate + 1, Count)
end.
nextTriangle (Index, Triangle) ->
Count = factorCount (Triangle),
if
Count > 1000 -> Triangle;
true -> nextTriangle (Index + 1, Triangle + Index + 1)
end.
solve () ->
io:format ("~p~n", [nextTriangle (1, 1) ] ),
halt (0).
factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
where square = sqrt $ fromIntegral number
isquare = floor square
factorCount' number sqrt candidate count
| fromIntegral candidate > sqrt = count
| number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
| otherwise = factorCount' number sqrt (candidate + 1) count
nextTriangle index triangle
| factorCount triangle > 1000 = triangle
| otherwise = nextTriangle (index + 1) (triangle + index + 1)
main = print $ nextTriangle 1 1
Source: (StackOverflow)
Who first said
A monad is just a monoid in the
category of endofunctors, what's the
problem?
and on a less important note is this true and if so could you give an explanation (hopefully one that can be understood by someone who doesn't have much haskell experience).
Source: (StackOverflow)
What are some open source programs that use Haskell and can be considered to be good quality modern Haskell? The larger the code base, the better.
I want to learn from their source code. I feel I'm past the point of learning from small code examples, which are often to esoteric and small-world. I want to see how code is structured, how monads interact when you have a lot of things going on (logging, I/O, configuration, etc.).
Source: (StackOverflow)
What is a good way to design/structure large functional programs, especially in Haskell?
I've been through a bunch of the tutorials (Write Yourself a Scheme being my favorite, with Real World Haskell a close second) - but most of the programs are relatively small, and single-purpose. Additionally, I don't consider some of them to be particularly elegant (for example, the vast lookup tables in WYAS).
I'm now wanting to write larger programs, with more moving parts - acquiring data from a variety of different sources, cleaning it, processing it in various ways, displaying it in user interfaces, persisting it, communicating over networks, etc. How could one best structure such code to be legible, maintainable, and adaptable to changing requirements?
There is quite a large literature addressing these questions for large object-oriented imperative programs. Ideas like MVC, design patterns, etc. are decent prescriptions for realizing broad goals like separation of concerns and reusability in an OO style. Additionally, newer imperative languages lend themselves to a 'design as you grow' style of refactoring to which, in my novice opinion, Haskell appears less well-suited.
Is there an equivalent literature for Haskell? How is the zoo of exotic control structures available in functional programming (monads, arrows, applicative, etc.) best employed for this purpose? What best practices could you recommend?
Thanks!
EDIT (this is a follow-up to Don Stewart's answer):
@dons mentioned: "Monads capture key architectural designs in types."
I guess my question is: how should one think about key architectural designs in a pure functional language?
Consider the example of several data streams, and several processing steps. I can write modular parsers for the data streams to a set of data structures, and I can implement each processing step as a pure function. The processing steps required for one piece of data will depend on its value and others'. Some of the steps should be followed by side-effects like GUI updates or database queries.
What's the 'Right' way to tie the data and the parsing steps in a nice way? One could write a big function which does the right thing for the various data types. Or one could use a monad to keep track of what's been processed so far and have each processing step get whatever it needs next from the monad state. Or one could write largely separate programs and send messages around (I don't much like this option).
The slides he linked have a Things we Need bullet: "Idioms for mapping design onto
types/functions/classes/monads". What are the idioms? :)
Source: (StackOverflow)
From the docs for GHC 7.6:
[Y]ou often don't even need the SPECIALIZE pragma in the first place. When compiling a module M, GHC's optimiser (with -O) automatically considers each top-level overloaded function declared in M, and specialises it for the different types at which it is called in M. The optimiser also considers each imported INLINABLE overloaded function, and specialises it for the different types at which it is called in M.
and
Moreover, given a SPECIALIZE pragma for a function f, GHC will automatically create specialisations for any type-class-overloaded functions called by f, if they are in the same module as the SPECIALIZE pragma, or if they are INLINABLE; and so on, transitively.
So GHC should automatically specialize some/most/all(?) functions marked INLINABLE
without a pragma, and if I use an explicit pragma, the specialization is transitive. My question is:
is the auto-specialization transitive?
Specifically, here's a small example:
Main.hs:
import Data.Vector.Unboxed as U
import Foo
main =
let y = Bar $ Qux $ U.replicate 11221184 0 :: Foo (Qux Int)
(Bar (Qux ans)) = iterate (plus y) y !! 100
in putStr $ show $ foldl1' (*) ans
Foo.hs:
module Foo (Qux(..), Foo(..), plus) where
import Data.Vector.Unboxed as U
newtype Qux r = Qux (Vector r)
-- GHC inlines `plus` if I remove the bangs or the Baz constructor
data Foo t = Bar !t
| Baz !t
instance (Num r, Unbox r) => Num (Qux r) where
{-# INLINABLE (+) #-}
(Qux x) + (Qux y) = Qux $ U.zipWith (+) x y
{-# INLINABLE plus #-}
plus :: (Num t) => (Foo t) -> (Foo t) -> (Foo t)
plus (Bar v1) (Bar v2) = Bar $ v1 + v2
GHC specializes the call to plus
, but does not specialize (+)
in the Qux
Num
instance which kills performance.
However, an explicit pragma
{-# SPECIALIZE plus :: Foo (Qux Int) -> Foo (Qux Int) -> Foo (Qux Int) #-}
results in transitive specialization as the docs indicate, so (+)
is specialized and the code is 30x faster (both compiled with -O2
). Is this expected behavior? Should I only expect (+)
to be specialized transitively with an explicit pragma?
UPDATE
The docs for 7.8.2 haven't changed, and the behavior is the same, so this question is still relevant.
Source: (StackOverflow)
Here's the scenario: I've written some code with a type signature and GHC complains could not deduce x ~ y for some x
and y
. You can usually throw GHC a bone and simply add the isomorphism to the function constraints, but this is a bad idea for several reasons:
- It does not emphasize understanding the code.
- You can end up with 5 constraints where one would have sufficed (for example, if the 5 are implied by one more specific constraint)
- You can end up with bogus constraints if you've done something wrong or if GHC is being unhelpful
I just spent several hours battling case 3. I'm playing with syntactic-2.0
, and I was trying to define a domain-independent version of share
, similar to the version defined in NanoFeldspar.hs
.
I had this:
{-# LANGUAGE GADTs, FlexibleContexts, TypeOperators #-}
import Data.Syntactic
-- Based on NanoFeldspar.hs
data Let a where
Let :: Let (a :-> (a -> b) :-> Full b)
share :: (Let :<: sup,
Domain a ~ sup,
Domain b ~ sup,
SyntacticN (a -> (a -> b) -> b) fi)
=> a -> (a -> b) -> a
share = sugarSym Let
and GHC could not deduce (Internal a) ~ (Internal b)
, which is certainly not what I was going for. So either I had written some code I didn't intend to (which required the constraint), or GHC wanted that constraint due to some other constraints I had written.
It turns out I needed to add (Syntactic a, Syntactic b, Syntactic (a->b))
to the constraint list, none of which imply (Internal a) ~ (Internal b)
. I basically stumbled upon the correct constraints; I still don't have a systematic way to find them.
My questions are:
- Why did GHC propose that constraint? Nowhere in syntactic is there a constraint
Internal a ~ Internal b
, so where did GHC pull that from?
- In general, what techniques can be used to trace the origin of a constraint which GHC believes it needs? Even for constraints that I can discover myself, my approach is essentially brute forcing the offending path by physically writing down recursive constraints. This approach is basically going down an infinite rabbit hole of constraints and is about the least efficient method I can imagine.
Source: (StackOverflow)
I've recently caught the FP bug (trying to learn Haskell), and I've been really impressed with what I've seen so far (first-class functions, lazy evaluation, and all the other goodies). I'm no expert yet, but I've already begun to find it easier to reason "functionally" than imperatively for basic algorithms (and I'm having trouble going back where I have to).
The one area where current FP seems to fall flat, however, is GUI programming. The Haskell approach seems to be to just wrap imperative GUI toolkits (such as GTK+ or wxWidgets) and to use "do" blocks to simulate an imperative style. I haven't used F#, but my understanding is that it does something similar using OOP with .NET classes. Obviously, there's a good reason for this--current GUI programming is all about IO and side effects, so purely functional programming isn't possible with most current frameworks.
My question is, is it possible to have a functional approach to GUI programming? I'm having trouble imagining what this would look like in practice. Does anyone know of any frameworks, experimental or otherwise, that try this sort of thing (or even any frameworks that are designed from the ground up for a functional language)? Or is the solution to just use a hybrid approach, with OOP for the GUI parts and FP for the logic? (I'm just asking out of curiosity--I'd love to think that FP is "the future," but GUI programming seems like a pretty large hole to fill.)
Source: (StackOverflow)