
Haskell Bugs - Functional Bugs and their Fixes

When it compiles, it's correct. Unless it isn't. HasBugs is a small, handpicked dataset of bugs from popular Haskell FOSS programs and libraries. Each bug comes in three versions (buggy, tested, and fixed) that you can run with Docker. Oh, and we actually had a look at what is failing!


Problem 1

Information

What is happening? To work effectively with bugs, you need to understand the problem. Many datasets lack descriptions, or define a bug simply as 'failing CI'. That is not enough! A lot of research has run into trouble because information was missing, or because the patches it produced could not be assessed. And different tasks need different information; in many datasets it is only derived from builds or extracted with tools. We deserve better.


Problem 2

Reproducibility

Recently someone asked me whether caring about reproducibility is worth the effort. No one reuses your work, and if one or two people want to, they can just send you an email. I am still thinking a lot about this, and there is some truth to it. However, I have also seen how much time and energy I have spent trying to reproduce results. Maybe exact reproducibility is a myth, but wouldn't it be nice to have things that at least give some output?


Problem 3

Accessibility

Things you can't use are not really useful. We know that working with datasets isn't exactly a walk in the park. It is a hard task by design, and forcing many different projects into the same shape can do harm too. Static approaches, dynamic approaches, grounded theory, and more are all valid ways to bring us further as researchers. Let's try to bring everyone along.


Problem 4

Quality

Not all bugs are the same. Datapoints can vary greatly in their information granularity and quantity. Many datasets mine their datapoints with automatic tools, with no manual inspection or, at best, a sampled check. That leads to big datasets that are great for machine learning, but the resulting models are themselves only evaluated on samples. That is all fun and games, but to really put things to the test you need a human-evaluated gold standard. We've got you covered.

Our main features

Learning from previous datasets, we try to deliver the best quality.


Containerised Builds

Figuring out dependencies is a pain. We provide a Docker image and container that run the build for you --- for the buggy, tested, and fixed versions.

Docker Images

Fault-Reason

We provide a small, high-quality dataset. Everything has been human-evaluated. No more puzzling over whether your Automated Program Repair tool did something good, because we specify what was wrong in the first place.


Detailed Datapoints

Picking up best practices, our dataset gives you tests, multi-location faults, frameworks, related issues, and much more.
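To make that concrete, here is a minimal Haskell sketch of the kind of information one datapoint bundles. The type and field names below are purely illustrative, not the dataset's actual format; see the Data Showcase for the real thing.

-- Illustrative sketch only: these names are made up for this page
-- and do not mirror the dataset's exact format.
module Datapoint where

data FaultLocation = FaultLocation
  { flFile :: FilePath   -- source file inside the project
  , flLine :: Int        -- line of the faulty code
  } deriving (Show)

data Datapoint = Datapoint
  { dpProject        :: String           -- project the bug was taken from
  , dpFaultReason    :: String           -- human-evaluated description of what went wrong
  , dpFaultLocations :: [FaultLocation]  -- one entry per location (multi-location faults)
  , dpFailingTests   :: [String]         -- tests that expose the bug
  , dpTestFramework  :: String           -- test framework the project uses
  , dpRelatedIssues  :: [String]         -- links to the upstream issues
  } deriving (Show)

The real datapoints additionally reference the buggy, tested, and fixed versions that the Docker images build for you.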

Data Showcase

Embrace the Bugs!
