Create your own COCO-style datasets

Learn how to convert your dataset into one of the most popular annotated image formats used today.

In today’s world of deep learning, if data is King, making sure it’s in the right format might just be Queen. Or at least Jack or 10. Anyway, it’s pretty important. After working hard to collect your images and annotating all the objects, you have to decide what format you’re going to use to store all that info. This may not seem like a big decision compared to all the other things you have to worry about, but if you want to quickly see how different models perform on your data, it’s vital to get this step right.

Back in 2014 Microsoft created a dataset called COCO (Common Objects in COntext) to help advance research in object recognition and scene understanding. COCO was one of the first large-scale datasets to annotate objects with more than just bounding boxes, and because of that it became a popular benchmark to use when testing out new detection models. The format COCO uses to store annotations has since become a de facto standard, and if you can convert your dataset to its style, a whole world of state-of-the-art model implementations opens up.

This is where pycococreator comes in. pycococreator takes care of all the annotation formatting details and will help convert your data into the COCO format. Let’s see how to use it by working with a toy dataset for detecting squares, triangles, and circles.

Example shape image and object masks

The shapes dataset has 500 128x128px jpeg images of randomly colored and sized circles, squares, and triangles on a randomly colored background. It also has a binary mask annotation, encoded as a png, for each of the shapes. This binary mask format is fairly easy to understand and create, which is why it’s the format your dataset needs to be in before you can use pycococreator to create your COCO-style version. You might be thinking, “Why not just use the png binary mask format if it’s so easy to understand?” Remember, the whole reason we’re trying to make a COCO dataset isn’t that it’s the best way of representing annotated images, but that everyone else is using it.
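If your annotations currently live in a single label image per photo (each object painted with its own integer id, 0 for background), they first need to be split into one binary mask per object. The sketch below uses a hypothetical helper, split_instance_mask, that is not part of pycococreator; it assumes you already know which class each object id belongs to, and it only builds the arrays and file names, leaving the actual png writing (for example with Pillow) to you:

```python
import numpy as np


def split_instance_mask(instance_mask, image_id, id_to_class):
    """Yield (file_name, binary_mask) pairs, one per object in an instance-id map.

    instance_mask: 2-D integer array, 0 = background, 1..N = individual objects.
    id_to_class:   dict mapping each object id to its class name (an assumption
                   of this sketch; your label images may encode classes differently).
    """
    instance_mask = np.asarray(instance_mask)
    annotation_id = 0
    for object_id in np.unique(instance_mask):  # sorted, so background comes first
        if object_id == 0:
            continue  # background is not an object
        binary_mask = (instance_mask == object_id).astype(np.uint8)
        class_name = id_to_class[int(object_id)]
        file_name = f"{image_id}_{class_name}_{annotation_id}.png"
        yield file_name, binary_mask
        annotation_id += 1
```

To save a mask you could, for example, scale it to 0/255 so the png is viewable and write it with Pillow's Image.fromarray(binary_mask * 255).save(file_name).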

The example script we’ll use to create the COCO-style dataset expects your images and annotations to have the following structure:

    ├───annotations
    │    │ <image_id>_<object_class_name>_<annotation_id>.png
    │    │ ...
    │
    └───<subset><year>
         │   <image_id>.jpeg
         │   ...

In the shapes example, subset is “shapes_train”, year is “2018”, and object_class_name is “square”, “triangle”, or “circle”. You would generally also have separate “validate” and “test” datasets.
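The naming convention is what ties each mask to its image. The full script has its own filter_for_annotations() helper for this; as a rough sketch (group_annotations_by_image is a hypothetical name), the file names could be split back into their three parts and grouped by image like this:

```python
import os
from collections import defaultdict


def group_annotations_by_image(annotation_paths):
    """Group mask files named <image_id>_<object_class_name>_<annotation_id>.png by image id."""
    groups = defaultdict(list)
    for path in annotation_paths:
        stem, extension = os.path.splitext(os.path.basename(path))
        if extension.lower() != ".png":
            continue  # only png masks are annotations
        # image id = text before the first underscore,
        # annotation id = text after the last underscore,
        # class name = everything in between (so it may contain underscores)
        image_id, _, rest = stem.partition("_")
        class_name, _, annotation_id = rest.rpartition("_")
        groups[image_id].append((class_name, annotation_id, path))
    return dict(groups)
```

Splitting on the first and last underscore, rather than on every underscore, keeps multi-word class names such as traffic_light intact.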

COCO uses JSON (JavaScript Object Notation) to encode information about a dataset. There are several variations of COCO, depending on whether it’s being used for object instances, object keypoints, or image captions. We’re interested in the object instances format, which goes something like this:

{
    "info": info,
    "licenses": [license],
    "categories": [category],
    "images": [image],
    "annotations": [annotation]
}

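Each element of the “images” and “annotations” lists is itself a small dictionary. pycococreator builds these for you, but as a minimal sketch, assuming the standard COCO field names (id, file_name, width, height for images; id, image_id, category_id, segmentation, area, bbox, iscrowd for annotations), an entry and its [x, y, width, height] bounding box could be derived from a binary mask like this. The helper names are hypothetical, and real COCO files usually store area and bbox as floats computed by pycocotools, so the numbers may differ slightly:

```python
import numpy as np


def image_entry(image_id, file_name, width, height):
    """One element of the "images" list (core fields only)."""
    return {"id": image_id, "file_name": file_name, "width": width, "height": height}


def mask_bbox_and_area(binary_mask):
    """Return ([x, y, width, height], area) for a 2-D 0/1 mask."""
    mask = np.asarray(binary_mask, dtype=bool)
    rows = np.flatnonzero(mask.any(axis=1))  # row indices touched by the object
    cols = np.flatnonzero(mask.any(axis=0))  # column indices touched by the object
    if rows.size == 0:
        return [0, 0, 0, 0], 0  # empty mask
    x, y = int(cols[0]), int(rows[0])
    width = int(cols[-1]) - x + 1
    height = int(rows[-1]) - y + 1
    return [x, y, width, height], int(mask.sum())


def annotation_entry(annotation_id, image_id, category_id, binary_mask, segmentation):
    """One element of the "annotations" list for a single (non-crowd) object."""
    bbox, area = mask_bbox_and_area(binary_mask)
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "iscrowd": 0,
        "area": area,
        "bbox": bbox,
        "segmentation": segmentation,
    }


mask = np.zeros((4, 5), dtype=np.uint8)
mask[1:3, 2:5] = 1  # a rectangle 3 pixels wide and 2 pixels tall
# the polygon passed here is illustrative only
entry = annotation_entry(1, 1000, 1, mask, [[2, 1, 4, 1, 4, 2, 2, 2]])
```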
The “info” dictionary and the “licenses”, “categories”, and “images” lists are straightforward to create, but the “annotations” can be a bit tricky. Luckily, pycococreator handles that part for us. Let’s get the easy stuff out of the way first. We’ll describe our dataset using Python lists and dictionaries and later export them to JSON.

import datetime

INFO = {
    "description": "Example Dataset",
    "url": "",
    "version": "0.1.0",
    "year": 2018,
    "contributor": "waspinator",
    "date_created": datetime.datetime.utcnow().isoformat(' ')
}

LICENSES = [
    {
        "id": 1,
        "name": "Attribution-NonCommercial-ShareAlike License",
        "url": ""
    }
]

CATEGORIES = [
    {
        'id': 1,
        'name': 'square',
        'supercategory': 'shape',
    },
    {
        'id': 2,
        'name': 'circle',
        'supercategory': 'shape',
    },
    {
        'id': 3,
        'name': 'triangle',
        'supercategory': 'shape',
    },
]
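The loop below maps each mask file to a category with a chain of substring checks. One alternative, sketched here with a hypothetical helper class_id_from_filename, derives a name-to-id lookup from the category list so the two cannot drift apart. It assumes the class name sits between the first and last underscore of <image_id>_<object_class_name>_<annotation_id>.png:

```python
import os

CATEGORIES = [
    {'id': 1, 'name': 'square', 'supercategory': 'shape'},
    {'id': 2, 'name': 'circle', 'supercategory': 'shape'},
    {'id': 3, 'name': 'triangle', 'supercategory': 'shape'},
]

# name -> id lookup derived once from the category list
CATEGORY_IDS = {category['name']: category['id'] for category in CATEGORIES}


def class_id_from_filename(annotation_filename):
    """Return the category id for <image_id>_<object_class_name>_<annotation_id>.png."""
    stem = os.path.splitext(os.path.basename(annotation_filename))[0]
    # drop the leading image id and the trailing annotation id
    class_name = stem.split('_', 1)[1].rsplit('_', 1)[0]
    return CATEGORY_IDS[class_name]  # a KeyError flags an unexpected class name
```

With this approach an unknown or misspelled class name raises a KeyError instead of being silently mislabeled.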

Okay, with the first three done we can continue with images and annotations. All we have to do is loop through each image jpeg and its corresponding annotation pngs and let pycococreator generate the correctly formatted items. The call to pycococreatortools.create_image_info() creates each image entry, while pycococreatortools.create_annotation_info() takes care of the annotations.

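    import os

    import numpy as np
    from PIL import Image
    from pycococreatortools import pycococreatortools

    # IMAGE_DIR, ANNOTATION_DIR, filter_for_jpeg() and filter_for_annotations()
    # are defined elsewhere in the full conversion script
    coco_output = {"info": INFO, "licenses": LICENSES, "categories": CATEGORIES,
                   "images": [], "annotations": []}
    image_id = 1
    segmentation_id = 1  # unique across the whole dataset, not just per image

    # filter for jpeg images
    for root, _, files in os.walk(IMAGE_DIR):
        image_files = filter_for_jpeg(root, files)

        # go through each image
        for image_filename in image_files:
            image = Image.open(image_filename)
            image_info = pycococreatortools.create_image_info(
                image_id, os.path.basename(image_filename), image.size)
            coco_output["images"].append(image_info)

            # filter for associated png annotations
            for root, _, files in os.walk(ANNOTATION_DIR):
                annotation_files = filter_for_annotations(root, files, image_filename)

                # go through each associated annotation
                for annotation_filename in annotation_files:
                    if 'square' in annotation_filename:
                        class_id = 1
                    elif 'circle' in annotation_filename:
                        class_id = 2
                    else:  # triangle
                        class_id = 3

                    category_info = {'id': class_id, 'is_crowd': 'crowd' in image_filename}
                    # load the png as a 0/1 uint8 mask
                    binary_mask = np.asarray(
                        Image.open(annotation_filename).convert('1')).astype(np.uint8)
                    annotation_info = pycococreatortools.create_annotation_info(
                        segmentation_id, image_id, category_info, binary_mask,
                        image.size, tolerance=2)
                    # skip masks that produced no annotation
                    if annotation_info is not None:
                        coco_output["annotations"].append(annotation_info)

                    segmentation_id += 1

            image_id += 1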
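Once the loop finishes, the assembled info/licenses/categories/images/annotations dictionary just needs to be written out with Python’s json module. Before doing that it can be worth running a few cheap consistency checks, for example that annotation ids are unique across the whole dataset and that every annotation points at an existing image and category. check_and_dump is a hypothetical helper, not part of pycococreator:

```python
import json


def check_and_dump(dataset):
    """Run basic id checks on a COCO-style dict, then return it as a JSON string."""
    image_ids = {image["id"] for image in dataset["images"]}
    category_ids = {category["id"] for category in dataset["categories"]}
    annotation_ids = [ann["id"] for ann in dataset["annotations"]]
    if len(annotation_ids) != len(set(annotation_ids)):
        raise ValueError("annotation ids must be unique across the whole dataset")
    for ann in dataset["annotations"]:
        if ann["image_id"] not in image_ids:
            raise ValueError(f"annotation {ann['id']} refers to unknown image {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            raise ValueError(f"annotation {ann['id']} uses unknown category {ann['category_id']}")
    return json.dumps(dataset)


# Writing the result to disk (the file name is only an example):
# with open("instances_shapes_train2018.json", "w") as output_json_file:
#     output_json_file.write(check_and_dump(dataset))
```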

There are two types of annotations COCO supports, and their format depends on whether the annotation is of a single object or a “crowd” of objects. Single objects are encoded using a list of points along their contours, while crowds are encoded using column-major RLE (Run Length Encoding). RLE is a compression method that works by replacing runs of repeating values with the number of times they repeat. For example, 0 0 1 1 1 0 1 would become 2 3 1 1. Column-major just means that instead of reading a binary mask array left-to-right along rows, we read it top-to-bottom along columns.
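To make the run-length idea concrete, here is a minimal NumPy sketch of COCO-style uncompressed RLE. It assumes the convention used in COCO annotation files: counts alternate between runs of 0s and 1s and always start with a run of 0s (possibly of length 0), and the mask shape is stored alongside as "size": [height, width]. binary_mask_to_rle is a hypothetical name; pycocotools’ own mask.encode() produces the compressed string form instead:

```python
import numpy as np


def binary_mask_to_rle(binary_mask):
    """Encode a 2-D mask as COCO-style uncompressed, column-major RLE."""
    mask = (np.asarray(binary_mask) != 0).astype(np.uint8)  # any nonzero pixel is foreground
    # Fortran ("F") order reads each column top to bottom, one column after another.
    pixels = mask.ravel(order="F")
    counts = []
    current_value = 0  # COCO counts always begin with a run of zeros
    run_length = 0
    for value in pixels:
        if value != current_value:
            counts.append(run_length)  # close the finished run (may be 0)
            current_value = value
            run_length = 0
        run_length += 1
    counts.append(run_length)
    return {"counts": counts, "size": list(mask.shape)}
```

Running it on the row from the example above, np.array([[0, 0, 1, 1, 1, 0, 1]]), gives counts [2, 3, 1, 1]. For the 2×2 mask [[0, 1], [0, 1]], column-major order yields [2, 2], whereas reading along rows would have given [1, 1, 1, 1].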

The tolerance option in pycococreatortools.create_annotation_info() controls how precisely contours are recorded for individual objects. The higher the number, the fewer contour points are kept, so the polygon is coarser but the file is smaller. A value of 2 is usually a good place to start.
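Tolerance is roughly a distance in pixels: contour points are dropped as long as the simplified outline stays within that distance of them. As far as I can tell, pycococreator hands this job to scikit-image’s approximate_polygon(), which uses the Douglas-Peucker algorithm. The standalone sketch below reimplements that algorithm in plain Python (simplify and to_coco_polygon are hypothetical helpers) to show why a larger tolerance keeps fewer vertices:

```python
import math


def _distance_to_line(point, start, end):
    """Perpendicular distance from point to the line through start and end."""
    (px, py), (ax, ay), (bx, by) = point, start, end
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # closed contour: start and end coincide
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def simplify(points, tolerance):
    """Douglas-Peucker simplification of a list of (x, y) points (tolerance >= 0)."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # find the point that deviates most from the straight start-end segment
    max_distance, index = 0.0, 0
    for i in range(1, len(points) - 1):
        distance = _distance_to_line(points[i], start, end)
        if distance > max_distance:
            max_distance, index = distance, i
    if max_distance <= tolerance:
        return [start, end]  # everything in between is close enough to drop
    # otherwise keep that point and simplify both halves recursively
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


def to_coco_polygon(points):
    """Flatten [(x, y), ...] into COCO's [x1, y1, x2, y2, ...] layout."""
    return [coordinate for point in points for coordinate in point]


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0)]
corners = simplify(square, tolerance=0.5)
```

On this small square outline, a tolerance of 0.5 keeps just the four corners plus the closing point, while a tolerance of 3 collapses the outline to a single degenerate segment, which is the “lower quality” end of the trade-off.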

After creating your COCO-style dataset you can test it out by visualizing it using the COCO API. Using the example Jupyter Notebook in the pycococreator repo, you should see something like this:
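Under the hood, visualizing a crowd annotation means turning its RLE back into a mask. pycocotools does this for you (COCO.annToMask() handles polygons as well as RLE), but a minimal NumPy sketch for the uncompressed RLE case, using the hypothetical name rle_to_binary_mask, looks like this:

```python
import numpy as np


def rle_to_binary_mask(rle):
    """Decode COCO-style uncompressed RLE {"counts": [...], "size": [h, w]}."""
    height, width = rle["size"]
    if sum(rle["counts"]) != height * width:
        raise ValueError("RLE counts do not cover the whole mask")
    flat = np.zeros(height * width, dtype=np.uint8)
    position, value = 0, 0  # runs alternate, starting with background (0)
    for count in rle["counts"]:
        flat[position:position + count] = value
        position += count
        value = 1 - value
    # the counts were taken column by column, so refill in Fortran order
    return flat.reshape((height, width), order="F")
```

Decoding {"counts": [2, 2], "size": [2, 2]} gives back the mask [[0, 1], [0, 1]].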

Example output using the COCO API

You can find the full script used to convert the shapes dataset along with pycococreator itself on GitHub.

If you want to try playing around with the shapes dataset yourself, download it here: shapes_train_dataset.

Now you’re ready to convert your own dataset into the COCO format and begin experimenting with the latest advancements in computer vision. Take a look below for links to some of the amazing models using COCO.

References and Resources


68 thoughts on “Create your own COCO-style datasets”

  1. Thanks, this came at the perfect time! Looks like it’s working great so far, going to try it on a segmentation task that expects COCO-style input soon.

  2. Thanks Patrick. I followed your tutorial successfully. I was wondering if you had tips on getting the resultant COCO-format data into a TensorFlow image segmentation implementation like Mask RCNN or Deeplab.

  3. @waspinator Thank you so much for your library, it’s very useful for me at the moment. I followed your post and was able to create the COCO-style dataset in your example. However, I have one problem. It’s probably easy for you, but not for me. I would like to create the binary mask annotations, encoded as png, for each object in my pictures, put the .png files in the annotations folder, and then use your tool to create the COCO-style dataset. But I don’t have enough Python skills to write such a tool or to find one. Can you recommend something or give me some advice? Thank you so much.

    1. I’m sorry I don’t have a good solution for you at the moment. I’m working on releasing the tool I use to create my segmentation, but I’m not sure when that will be. You can try the annotation tools listed above, but none of them output the format the shape dataset is in, so you would have to write some code to convert them.

  4. Thanks for the article. Very useful. I have one question:
    How do you get segmentation_id? That is, what does the “id” field in the “annotation” field of the coco json format represent? Is it unique across all images and segments? Or is it unique only for one “image_id”?
    I tried to find out the answer to this question all over the web without any success.

    1. As far as I know it doesn’t have a specific meaning. It’s probably a good idea to make all of them unique, not just for each image. I just increment by 1 when I’m building my datasets.

  5. Thanks for sharing your library.
    But the module pycocotools can’t be found in the Jupyter notebook when I run the example in the shapes folder. I am using an Anaconda 3 environment. Is there anything I need to check for a complete installation?

      1. I have installed, then
        cd E:\Github\coco\PythonAPI
        python install
        Then I run Jupyter Notebook to run pycococreator/examples/shapes/ shapes_to_coco.ipynb

        it gives errors as follows (there is a file called _mask.pyx at E:\Github\pycococreator\examples\shapes\pycocotools):

        ModuleNotFoundError Traceback (most recent call last)
        in ()
        6 from PIL import Image
        7 import numpy as np
        ----> 8 from pycococreatortools import pycococreatortools

        e:\anaconda\envs\mxnet\lib\site-packages\pycococreatortools\ in ()
        9 from PIL import Image
        ----> 10 from pycocotools import mask

        E:\Github\pycococreator\examples\shapes\pycocotools\ in ()
        ----> 3 import pycocotools._mask as _mask

        ModuleNotFoundError: No module named 'pycocotools._mask'

        Please help. Thanks in advance

      1. Thank you, though the discussion doesn’t really solve my question.
        I actually saw you ask a similar question on GitHub or somewhere. I want to make my own dataset and train it on Detectron. However, I only have polygon vertices for the images. I could use the COCO API to convert the polygons to encoded RLE, which I believe is compressed RLE. Looking into the downloaded COCO annotation files, they actually use the uncompressed RLE format, like this one
        [272,2,4,4,4,4,2,9,1,2,16,43,143,24,5,8,16,44,141,….]. So I’m wondering how to convert from compressed RLE to uncompressed RLE. Also, could we directly use annotations in compressed RLE format for training on Detectron? Or do we have to convert them first and then feed the right format to Detectron? Thanks a lot!

          1. Thank you for all the information. I used the first answer from the link you gave to generate a binary mask from a polygon. And then I used your binary_mask_to_polygon function to convert that mask back to polygon. I found if tolerance is set to 0, then the resulting polygon is very different from the original one. If tolerance is set to 1, there is still a little difference (number of vertices) between the original polygon and the resulting polygon. For example, the original polygon is as follows,
            [1269.0, 572.0, 1375.0, 589.0, 1375.0, 589.0, 1483.0, 641.0, 1562.0, 699.0, 1623.0, 781.0, 1615.0, 806.0, 1634.0, 838.0, 1603.0, 822.0, 1569.0, 835.0, 1500.0, 817.0]
            the resulting polygon is below,
            [[1633.0, 837.5, 1603.0, 822.5, 1568.0, 835.5, 1500.0, 817.5, 1268.5, 572.0, 1272.0, 571.5, 1376.0, 588.5, 1485.0, 641.5, 1563.5, 700.0, 1622.5, 779.0, 1623.5, 782.0, 1615.5, 806.0, 1632.5, 834.0, 1633.0, 837.5]]
            Not counting the repeated closing vertex, the resulting polygon has one more vertex than the original one. Maybe different implementations give slightly different results? What does tolerance mean? Why is the resulting polygon so different if tolerance is set to 0? Thank you.

            Also I found the coco APIs actually handle the conversion between the uncompressed RLE and compressed RLE. You could check the functions rleToString and rleFrString in the following link
            According to the comments (line 102 and 118) in the link they do handle the internal conversions between uncompressed one to compressed one.

            1. The tolerance parameter changes how detailed the polygon is. For example, to draw a perfect circle you would need an infinite number of points. The higher the tolerance, the lower the quality of the polygon. It doesn’t really matter what the exact numbers are, as long as the correct area is covered.

              1. Thank you. For the purpose of making a dataset, probably setting tolerance to 1 is good enough. Too high precision will add some unnecessary computation.
                Also, I verified that rleToString and rleFrString do the conversion. If I understand correctly, the conversion between uncompressed and compressed RLEs in the source code only adds some redundant computation if our training dataset uses uncompressed RLEs, because it always uses uncompressed RLEs to compute area and IOU.

  6. If I have a same single object, say an animal, at two different places of an image, should I use two annotations with the same “image_id” and the different “id”, or I should use one annotation containing two polygons? Thank you.

    1. Hi Yogesh, I also have a similar dataset which I need to convert into COCO format. I have original aerial images and mask pngs exported from the ArcGIS tool. I am not sure if these pngs are the same as the ones mentioned in this tutorial. Were you able to convert your dataset into the COCO format somehow?

  7. I prepared my data thanks to your code but I have no idea how to load it into the MASK R-CNN. I would appreciate any help such as an example or a tutorial.

    1. Did you get MASK-RCNN working with the example shapes data? Try training on that first to see how things should work, and then modify the jupyter notebook to point to your dataset.

  8. “It also has binary mask annotations encoded in png of each of the shapes. This binary mask format is fairly easy to understand and create. ”
    How can I encode the binary masks in png.
    Please help

      1. I am using the function
        def polygons_to_mask(img_shape, polygons):
            mask = np.zeros(img_shape[:2], dtype=np.uint8)
            mask = Image.fromarray(mask)
            xy = list(map(tuple, polygons))
            ImageDraw.Draw(mask).polygon(xy=xy, outline=1, fill=1)
            mask = np.array(mask, dtype=bool)
            return mask
        This function creates a binary mask given a polygon coordinates, one at a time.
        Now how can I proceed to create the png and use your pycococreator tool

  9. Hi Folks,
    I cobbled together some code to put together the png annotation files. Hopefully this helps somebody. I am new to computer vision and Python and it took me a while to develop this. To create the polygons, I used the labelme app created by wkentaro. The script consumes the json files generated by that app. Also, some of this code is taken from seahawks8, which can be found here:

    import fnmatch
    import io
    import json
    import os
    import re

    import labelme
    import PIL.Image
    import PIL.ImageDraw

    INPUT_DIR = '/home/david/projects/ir/coco/labelme/valves'
    OUTPUT_DIR = "/home/david/projects/ir/coco/labelme/valves/coco/"

    def filter_for_json(root, files):
        file_types = ['*.json']
        file_types = r'|'.join([fnmatch.translate(x) for x in file_types])
        files = [os.path.join(root, f) for f in files]
        files = [f for f in files if re.match(file_types, f)]
        return files

    def main(input_dir=INPUT_DIR, output_dir=OUTPUT_DIR, starting_image_id=1000):
        # keep ids increasing across all directories so they stay unique
        image_id = starting_image_id
        for root, _, files in os.walk(input_dir):
            # get all of the json files in the root directory
            json_files = filter_for_json(root, files)
            # loop over each json file
            for json_filename in json_files:
                # open the json file
                with open(json_filename) as json_file:
                    data = json.load(json_file)
                # load the image from the json file and save it to disk
                image_data = data['imageData']
                image = labelme.utils.img_b64_to_arr(image_data)
                jpg_file_name = os.path.join(output_dir, str(image_id) + ".jpg")
                PIL.Image.fromarray(image).save(jpg_file_name)
                # Get the shape
                image_shape = image.shape
                # loop over each shape (polygon) in the json file
                annotation_id = 0
                for shape in data['shapes']:
                    label_name = shape['label']
                    polygons = shape['points']
                    mask = labelme.utils.polygons_to_mask(image_shape, polygons)
                    mask = PIL.Image.fromarray(mask)
                    # draw white polygon
                    xy = list(map(tuple, polygons))
                    draw = PIL.ImageDraw.Draw(mask)
                    draw.polygon(xy=xy, outline=1, fill=1)
                    # save the png mask as <image_id>_<label>_<annotation_id>.png
                    image_byte_array = io.BytesIO()
                    mask.save(image_byte_array, format='PNG')
                    png_file_name = os.path.join(
                        output_dir, str(image_id) + '_' + label_name + '_' + str(annotation_id) + ".png")
                    with open(png_file_name, 'wb') as png_file:
                        png_file.write(image_byte_array.getvalue())
                    annotation_id = annotation_id + 1
                image_id = image_id + 1

    if __name__ == "__main__":
        main()

  10. Thanks a lot, this post is of great help.
    I want to ask: if I have both the mask annotations and the actual images in “.png” format, would the script and training model work the same as with JPG & PNG?

      1. Thanks, it worked!
        My problem set has like 20 different categories annotated, but I am focused on only 8 of them, which will make 9 (including n=background). What tolerance value should I select?

  11. My problem set has like 20 different categories annotated, but I am focused on only 8 of them, which will make 9 (including n=background). What tolerance value should I select?

  12. Hey,
    The parts before using pycococreator are a little vague. Is there a tutorial on making files usable by pycococreator from simple png files? As I am new to this, I am having difficulty understanding the binary masks and their creation.

  13. Hi Patrick !
    Thanks a lot for sharing your work. I was wondering if you still have the script to generate mask per object png from the global instance mask ?

  14. Hi, and thank you for the helpful post. I followed the procedure and I can run the script without any error. I got the JSON file, but when I visualize it with the help of the IPython code that you provided, it does not show the mask correctly. Would you please take a look and see if it works for you?

  15. thank you for the help but you have not mentioned after the generation of the json file which parts of the code we have to customize in order to make the mask rcnn works on the new json data set. where should I put the json file and which part of dataset code should I change?

  16. Hi,
    I have my own dataset . The dataset consist of 1 class and each image has many of them.
    I have the ground truth masks for the images but each image has many object (all the objects located on the same mask ).
    I am wondering if I can use your script to get COCO style annotations.

  17. @waspinator, I have some aerial images in my dataset and i have extracted few objects from the each images as mask pngs. These pngs are simple like object area is having white pixels and background is blank. i was confused if mask pngs you are talking about are something special ? or can i use my mask pngs to conversion using your tool ?

    Thanks in advance

  18. dear friend
    I have my own dataset that consists of satellite photos (30000), and each image has a corresponding ground truth (mask); each mask has only one class (Edge Buildings).
    How can I use your script to get annotations automatically for each mask (JSON file), or is there any solution that doesn’t take a long time?
    Best regards

  19. Thank you very much for your tool,
    I have a question. Please answer as quick as possible!
    I have a lot of training images; some have defect(s) which I want to find, and some do not (meaning the image is good). I also have mask images for all the images which have defect(s), and I used your tool to annotate them. But how do I annotate the “good” images, which don’t have a defect (mask) to annotate?

  20. Thank you very much for your tool, but i have a question. Please answer as quick as possible!
    I have a lot of training images; some have defects which I want to find, and some do not (meaning the image is good). I also have mask images for all the images which have defect(s), but how do I annotate the “good” images, which don’t have a defect (mask) to annotate?

      1. Thank you for your reply.
        I already know how to annotate my data. I mean, how can I annotate the good (unmarked) images, since they don’t have anything to mark?

        1. In this case, it would be better if you try to use unsupervised learning rather than MaskRCNN. For MaskRCNN you need an object inside the images.

          1. This is not what I mean. Sorry for not making it clear.
            I have 2 types of images. One has a defect on it and one does not. I already use a CNN to classify my data into 2 classes, but now I want to know where the defect is, so I want to use MaskRCNN.

            1. This is how I would annotate my image. Label the parts that are good and the parts that are defective. If a part is 100% good, just label the entire part “good”.

              Or, if you don’t care about detecting the good areas in the good images, you don’t have to annotate the good areas at all. Then Mask R-CNN will just learn to detect the defects.

              1. But in my case the whole image is good if it does not have defect inside.
                It means my data have 1 class (defect) and background. So you can think the good image itself is just a background image.

                1. okay, so you only really have one class, “defect”. Just label that and leave the rest of the image to be the default “BG” (background) class. For images without a defect, don’t add any labels.

                  1. Sorry for not making clear here.
                    So if i don’t add any labels to the images without a defect, does your tool change it to coco-type for MaskRCNN training?

  21. In that case, you only need to train your MaskRCNN model with one object class (the defect). You can run inference on all of your images; for those that don’t have a defect inside, the number of instances is zero.

      1. You don’t need annotations for the images that don’t have any defect inside. Just train your model with the defect images, meaning that you only have 1 class and background. Later you can test with the images without any defect; the number of instances should be zero.

        1. So I don’t annotate the images which don’t have any defect inside, which means I don’t use those images to train, just the images that have a defect inside?
