Is this a good way to do JSON validation?

At work, where we use Clojure, we’ve been improving the error messages in our public API so that they

  1. report as many errors as possible in a single response, and
  2. are written in human-readable English.

If one adopts spec, as we have, one gets the former for free, but the output of spec can hardly be called human-readable. For the latter we chose to use phrase.

Given that I’d like to see Haskell used more at work (currently there are 2 minor services written in Haskell and around a score in Clojure) I thought I’d take a look at JSON validation in Haskell. I ended up being less than impressed. We have at least one great library for parsing JSON, aeson, and there are probably a few more that I haven’t noticed. It’s of course possible to mix validation in with the parsing, but parsers, aeson’s included, tend to be monads: a monadic parser stops at the first failure, because later steps may depend on earlier results. That means that item 1 above, finding as many errors as possible, isn’t on the table.
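To make the short-circuiting concrete, here is a minimal base-only sketch. It does not use aeson at all: the association list stands in for a JSON object, and requireKey and nameAndAge are hypothetical names made up for this illustration.

```haskell
-- Base-only illustration; nothing here comes from aeson.
-- An association list stands in for a JSON object.
requireKey :: String -> [(String, String)] -> Either String String
requireKey k kvs =
  maybe (Left ("required key '" <> k <> "' is missing")) Right (lookup k kvs)

-- Monadic composition: the second lookup only runs if the first succeeded,
-- so at most one error can ever be reported.
nameAndAge :: [(String, String)] -> Either String (String, String)
nameAndAge kvs = do
  name <- requireKey "name" kvs
  age  <- requireKey "age" kvs
  pure (name, age)
```

With the input [("nam", "Alice")] both keys are missing, but nameAndAge only reports the missing 'name'.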

A quick look at Hackage showed that

  • there is a package called aeson-better-errors that looked promising but didn’t fit my needs (I explain at the end why it doesn’t pass muster)
  • the support for JSON Schema is very lacking in Haskell: hjsonschema is deprecated, and aeson-schema only supports version 3 of the draft (the current version is 7), while its authors claim that hjsonschema is more modern and more actively maintained

So, a bit disappointed, I started playing with the problem myself and found that, just as is stated in the description of the validation library, I want something that’s isomorphic to Either but accumulates on the error side. That is, something like

data JSONValidationResult = JVRInvalid [JSONValidationFailure]
                          | JVRValid
                          deriving (Eq, Show)

instance Semigroup JSONValidationResult where
  (JVRInvalid es0) <> (JVRInvalid es1) = JVRInvalid $ es0 <> es1
  JVRValid <> r = r
  r <> JVRValid = r
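The failure type itself isn't shown above. The sketch below repeats the result type so that it compiles on its own, and adds an assumed definition of JSONValidationFailure inferred from how JVFDesc and JVFPath are used further down (with String standing in for Text to stay base-only). The Monoid instance and the name combined are additions for illustration only.

```haskell
-- Assumed shape of the failure type, inferred from later usage.
data JSONValidationFailure
  = JVFDesc String                        -- a plain description
  | JVFPath String JSONValidationFailure  -- a failure located under a key
  deriving (Eq, Show)

data JSONValidationResult
  = JVRInvalid [JSONValidationFailure]
  | JVRValid
  deriving (Eq, Show)

instance Semigroup JSONValidationResult where
  JVRInvalid es0 <> JVRInvalid es1 = JVRInvalid (es0 <> es1)
  JVRValid <> r = r
  r <> JVRValid = r

-- Illustration only: JVRValid is the identity of <>, so a Monoid instance
-- is possible and mconcat can combine a whole list of results.
instance Monoid JSONValidationResult where
  mempty = JVRValid

-- Failures from both invalid results are kept, in order.
combined :: JSONValidationResult
combined = mconcat [JVRInvalid [JVFDesc "a"], JVRValid, JVRInvalid [JVFDesc "b"]]
```

Here combined evaluates to JVRInvalid [JVFDesc "a",JVFDesc "b"], which is the accumulating behaviour that a plain Either lacks.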

I decided it was all right to limit validation to proper JSON expressions, i.e. a validator could have the type Value -> JSONValidationResult. I want to combine validators, so I decided to wrap it in a newtype and write a Semigroup instance for it as well:

newtype JSONValidator = JV (A.Value -> JSONValidationResult)

instance Semigroup JSONValidator where
  (JV v0) <> (JV v1) = JV $ \ val -> v0 val <> v1 val

The function to actually run the validation is rather straightforward:

runJSONValidator :: JSONValidator -> A.Value -> JSONValidationResult
runJSONValidator (JV validator) val = validator val

After writing a few validators I realised a few patterns emerged and the following functions simplified things a bit:

mapInvalid _ JVRValid = JVRValid
mapInvalid f (JVRInvalid es) = JVRInvalid $ map f es

valid = JVRValid
invalid s = JVRInvalid [JVFDesc s]

With this in place I started writing validators for the basic JSON types:

isNumber = JV go
  where
    go (A.Number _) = valid
    go _ = invalid "not a number"

isString = JV go
  where
    go (A.String _) = valid
    go _ = invalid "not a string"

isBool = JV go
  where
    go (A.Bool _) = valid
    go _ = invalid "not a bool"

isNull = JV go
  where
    go A.Null = valid
    go _ = invalid "not 'null'"

The number type in JSON is a float (well, in aeson it’s a Scientific), so to check for an integer a bit more than the above is needed:

isInt = JV go
  where
    go (A.Number i) = if i == fromInteger (round i)
                      then valid
                      else invalid "not an integer"
    go _ = invalid "not an integer"

I also wrote functions that check for the presence of a specific key:

reqKey n v = JV go
  where
    go (A.Object obj) = case HM.lookup n obj of
                          Nothing -> invalid $ "required key '" <> n <> "' is missing"
                          Just val -> mapInvalid (JVFPath n) $ runJSONValidator v val
    go _ = invalid "not an object"

optKey n v = JV go
  where
    go (A.Object obj) = case HM.lookup n obj of
                          Nothing -> valid
                          Just val -> mapInvalid (JVFPath n) $ runJSONValidator v val
    go _ = invalid "not an object"

With this in place I can now create a validator for a person with a name and an age:

vPerson = reqKey "name" isString <>
          reqKey "age" isInt

and run it on a Value:

*> runJSONValidator vPerson <$> (decode "{\"name\": \"Alice\", \"age\": 32}" :: Maybe Value)
Just JVRValid

and all failures are picked up

*> runJSONValidator vPerson <$> (decode "{\"name\": \"Alice\", \"age\": \"foo\"}" :: Maybe Value)
Just (JVRInvalid [JVFPath "age" (JVFDesc "not an integer")])

*> runJSONValidator vPerson <$> (decode "{\"name\": \"Alice\"}" :: Maybe Value)
Just (JVRInvalid [JVFDesc "required key 'age' is missing"])

*> runJSONValidator vPerson <$> (decode "{\"nam\": \"Alice\"}" :: Maybe Value)
Just (JVRInvalid [JVFDesc "required key 'name' is missing",JVFDesc "required key 'age' is missing"])
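Since the goal is human-readable messages, the nested failures still need to be flattened into text at some point. One possible way is sketched below. The failure type is the assumed shape inferred from the output above (with String instead of Text), and renderFailure is a hypothetical helper, not something from the code in this post.

```haskell
import Data.List (intercalate)

-- Assumed shape of the failure type, inferred from the GHCi output above.
data JSONValidationFailure
  = JVFDesc String
  | JVFPath String JSONValidationFailure
  deriving (Eq, Show)

-- Hypothetical helper: collect the nested keys into a dotted path and
-- put it in front of the description. Failures without a path are
-- rendered as the bare description.
renderFailure :: JSONValidationFailure -> String
renderFailure = go []
  where
    go path (JVFPath key inner) = go (key : path) inner
    go []   (JVFDesc msg)       = msg
    go path (JVFDesc msg)       = intercalate "." (reverse path) <> ": " <> msg
```

For example, renderFailure (JVFPath "age" (JVFDesc "not an integer")) gives "age: not an integer".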


  1. I quickly realised I wanted slightly more complex validation of course, so all the validators for basic JSON types above have a version taking a custom validator of type a -> JSONValidationResult (where a is the Haskell type contained in the particular Value).
  2. I started out thinking that I want an Applicative for my validations, but slowly I relaxed that to Semigroup. I’m still not sure about this decision, because I can see a real use for an or combinator, which I don’t really have now. Maybe that means I should switch back towards Applicative, just so I can implement an Alternative instance for validators.
  3. Well, I simply don’t know if this is even a good way to implement validators. I’d love to hear suggestions both for improvements and for completely different ways of tackling the problems.
  4. I would love to find out that there already is a library that does all this in a much better way. Please point me in its direction!
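The or mentioned in point 2 doesn't strictly need Applicative or Alternative. It can be written as a plain function on validators, as in the sketch below. To keep the block base-only, a small Value type stands in for aeson's, the failure type is cut down to descriptions, and orElse and nullableString are hypothetical names.

```haskell
-- Small stand-in for aeson's Value, so the block needs only base.
data Value = VNumber Double | VString String | VNull
  deriving (Eq, Show)

data JSONValidationFailure = JVFDesc String
  deriving (Eq, Show)

data JSONValidationResult = JVRInvalid [JSONValidationFailure] | JVRValid
  deriving (Eq, Show)

newtype JSONValidator = JV (Value -> JSONValidationResult)

runJSONValidator :: JSONValidator -> Value -> JSONValidationResult
runJSONValidator (JV validator) = validator

-- Hypothetical "or": valid if either side is valid; if both sides fail,
-- the failures of both alternatives are reported.
orElse :: JSONValidator -> JSONValidator -> JSONValidator
orElse (JV v0) (JV v1) = JV $ \val ->
  case (v0 val, v1 val) of
    (JVRInvalid es0, JVRInvalid es1) -> JVRInvalid (es0 <> es1)
    _                                -> JVRValid

isString, isNull :: JSONValidator
isString = JV go
  where
    go (VString _) = JVRValid
    go _           = JVRInvalid [JVFDesc "not a string"]
isNull = JV go
  where
    go VNull = JVRValid
    go _     = JVRInvalid [JVFDesc "not 'null'"]

-- A string that is allowed to be null.
nullableString :: JSONValidator
nullableString = isString `orElse` isNull
```

Whether reporting the failures of both alternatives is the friendliest behaviour is a design choice. Another option would be to wrap them in a dedicated "none of the alternatives matched" failure.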

Appendix: A look at aeson-better-errors

The issue with aeson-better-errors is easiest to illustrate using the same example as in its announcement:

{-# LANGUAGE OverloadedStrings #-}
module Play where

import           Data.Aeson
import           Data.Aeson.BetterErrors

data Person = Person String Int
  deriving (Show)

asPerson :: Parse e Person
asPerson = Person <$> key "name" asString <*> key "age" asIntegral

With this loaded in GHCi (make sure to either pass -XOverloadedStrings on the command line or :set -XOverloadedStrings in GHCi itself):

*> parse asPerson "{\"name\": \"Alice\", \"age\": 32}"
Right (Person "Alice" 32)
*> parse asPerson "{\"name\": \"Alice\"}"
Left (BadSchema [] (KeyMissing "age"))
*> parse asPerson "{\"nam\": \"Alice\"}"
Left (BadSchema [] (KeyMissing "name"))

Clearly aeson-better-errors doesn’t fulfil the bit about reporting as many errors as possible. I would have realised that right away if I had bothered to read its API reference on Hackage a bit more carefully: the parser type ParseT is an instance of Monad!
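For contrast, here is a minimal sketch of an Either-like type whose Applicative instance accumulates errors instead of stopping at the first one. It is written from scratch with base only and is not the API of aeson-better-errors or any other package. Validation, key and person are names chosen for this illustration, and an association list stands in for the JSON object.

```haskell
-- Either-like type; the Applicative instance below accumulates errors.
data Validation e a = Failure e | Success a
  deriving (Eq, Show)

instance Functor (Validation e) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Failure e0 <*> Failure e1 = Failure (e0 <> e1)  -- keep both errors
  Failure e0 <*> Success _  = Failure e0
  Success _  <*> Failure e1 = Failure e1
  Success f  <*> Success a  = Success (f a)

-- Look up a key in an association list standing in for a JSON object.
key :: String -> [(String, String)] -> Validation [String] String
key k kvs = maybe (Failure ["key missing: " <> k]) Success (lookup k kvs)

-- Same shape as asPerson above, but both lookups always run.
person :: [(String, String)] -> Validation [String] (String, String)
person kvs = (,) <$> key "name" kvs <*> key "age" kvs
```

With the input [("nam", "Alice")], person reports both missing keys. The price is that a type like this is not given a Monad instance, because a later step cannot depend on the result of an earlier one while still running after it failed.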


